Principal component analysis in the spectral analysis of the dynamic laser speckle patterns

Dynamic laser speckle is a phenomenon in which optical interference patterns are formed when a surface undergoing changes is illuminated with coherent light. When these changes are caused by biological material, the dynamic speckle pattern is known as biospeckle. These optical interference patterns evolving in time are usually analyzed by graphical or numerical methods. Analysis in the frequency domain has also been an option, but it involves large computational requirements, which calls for new approaches to filter the images in time. Principal component analysis (PCA) works through the statistical decorrelation of data and can be used for data filtering. In this context, the present work evaluated PCA for filtering biospeckle images in time, aiming to reduce computing time and to improve the robustness of the filtering. A sequence of 64 biospeckle images of a maize seed was used. The images were arranged in a data matrix and statistically decorrelated by PCA, and the reconstructed signals were analyzed using the routine graphical and numerical methods for biospeckle. The results showed the potential of PCA for filtering dynamic laser speckle data. They also allowed principal component markers related to the biological phenomena to be defined, and the method has the advantage of fast computational processing.


INTRODUCTION
Dynamic laser speckle, also known as biospeckle when applied to biological materials, is an optical technique that processes the interference patterns formed when a material is illuminated by coherent light. It is a non-destructive technique that has been validated as a tool for the analysis and quantification of biological activity in the material under study [1].
The term 'biological activity' has no precise definition in the context of speckle. It is understood as the result of phenomena such as the Doppler effect, Brownian motion, variations of the refractive index, and structural and molecular motions occurring in the material analyzed, among others [2,3]. The dynamic laser speckle technique has been used in several areas of research, such as medicine, industrial processes and agriculture. Recent applications include the work of Zakharov et al. [4], imaging blood flow in the rodent brain; Mavilio et al. [5], studying the drying of paint; and Ansari and Nirala [6], monitoring the maturation of Indian fruits. This large number of applications creates a need for image and signal processing techniques that can help to interpret these optical interference patterns and extract additional information from them.
Data from optical interference patterns can be analyzed with graphical and numerical approaches [1]. Cardoso et al. [7] combined graphical and numerical analysis in the frequency domain to create signatures and isolate some phenomena.
Many studies have analyzed the spectral information of biospeckle data from different types of material, and most use either Fourier or wavelet transforms to analyze the data in the frequency domain. Each method has distinct characteristics and properties. The Fourier transform is suited to stationary signals. As reported by Sendra et al. [8], dynamic laser speckle signals are not stationary, which can compromise or limit the use of this technique.
Wavelet transforms, in turn, have given useful results in tissue segmentation, the definition of frequency markers, and data filtering. Sendra et al. [9] demonstrated this in the assessment of apple damage and seed germination, and Cardoso et al. [7] in studies of maize and bean seeds and of animal cancer. However, the wavelet transform demands complex computational operations and requires some subjective choices, such as that of the mother wavelet. Argoud et al. [10] stated that the methodology for selecting the base function is not yet clear.
Despite the success of Fourier and wavelet transforms in frequency analysis, the literature offers other filtering techniques that can be considered as alternatives. These may overcome the limitations of the methods currently used and provide information about this complex pattern of optical interference. Moreover, even though existing methods have made important contributions to dynamic speckle analysis, it remains a complex problem. Alternative methods should therefore be examined in order to undertake a thorough analysis.
In this context, statistical tools such as principal component analysis stand out as an option for analyzing biospeckle data. As described by Rabal et al. [11], statistical techniques are indicated for data that are random in nature and evolve in time, which is the case for dynamic laser speckle. The dynamic scattering of laser light that underlies the phenomenon can be related to a wide range of physical and chemical phenomena. These can be considered the key to understanding the dynamic scattered output and correlating it with the analyzed phenomenon itself [12].
Principal component analysis (PCA) is a classic technique for the multivariate statistical analysis of data. It essentially consists of orthogonally transforming a set of correlated observed variables into a new set of uncorrelated variables, called the principal components. The transformation is accomplished by calculating the eigenvalues and eigenvectors of the data covariance matrix [13,15].
PCA has been used in many applications: to reduce the data volume with the least possible loss of information, to classify and cluster data, to extract and identify patterns, and to filter signals [16,17]. Papers by Souza Filho and Dinniss [18] and by Chen and Qian [19] confirm the potential of principal component analysis as a filtering technique.
In this context, the present work proposes the use of this multivariate statistical tool as an alternative for the spectral analysis of the dynamic laser speckle signal. The proposed method applies PCA as a preprocessing tool for biospeckle signal analysis. The combination of PCA with existing methods, namely the Fujii and GD methods, is shown, and promising results were achieved for real data.
The next section reviews the background theory of the methods used in this work. The first subsection describes principal component analysis. Sections 2.2 and 2.3 describe the Fujii and GD methods for the graphical analysis of biospeckle patterns. The last part presents the use of a logarithmic unit for the numerical interpretation of the data.

Principal component analysis (PCA)
Principal component analysis (PCA) is a multivariate statistical technique that describes a set of correlated observations in terms of a new set of orthogonal and uncorrelated variables, called principal components, which are linear combinations of the original variables [20].
The data are transformed to the PCA domain by decomposing the covariance matrix into eigenvalues and eigenvectors. The technique has been used in several application areas and under different approaches, for example as a denoising method, and it is convenient from a computational viewpoint [14,15,21].
Principal components analysis begins with the organization of the data in a matrix X of dimension M × N, in which M represents the number of observations and N the number of variables, as illustrated in Eq. (1).
To prevent points far from the data center from having a greater influence than nearby points (as would arbitrarily occur when data are in different units), the mean of each variable is removed from the data. This process is called centering of the data and is represented by Eq. (2).
where y_i are the data vectors centered on the mean, x_i are the N sample vectors studied, and µ(x_i) is the mean of each sample vector, which can be calculated by Eq. (3).
The variables, or sample vectors as Zhang et al. [14] call them, are the columns of X and are expressed mathematically by Eq. (4).
The centered data matrix is used to compute the covariance matrix, as shown in Eq. (5).
in which Y and Y^T are, respectively, the centered data matrix and its transpose, and C_Y is the covariance matrix.
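Equations (1) to (5) did not survive in this version of the text. The block below gives one standard way to write them, assuming the convention stated above: each of the N columns of X is a variable and each of the M rows is an observation. The 1/(M-1) normalization in Eq. (5) is an assumption, as is writing the product as Y^T Y, which makes C_Y an N × N matrix. The original scaling cannot be recovered from the text.

```latex
X = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix}
  = \begin{bmatrix}
      x_{11} & x_{12} & \cdots & x_{1N} \\
      \vdots & \vdots & \ddots & \vdots \\
      x_{M1} & x_{M2} & \cdots & x_{MN}
    \end{bmatrix} \tag{1}

y_i = x_i - \mu(x_i)\,\mathbf{1}_M, \qquad i = 1, \dots, N \tag{2}

\mu(x_i) = \frac{1}{M} \sum_{j=1}^{M} x_{ji} \tag{3}

x_i = \begin{bmatrix} x_{1i} & x_{2i} & \cdots & x_{Mi} \end{bmatrix}^{T} \tag{4}

C_Y = \frac{1}{M-1}\, Y^{T} Y, \qquad Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_N \end{bmatrix} \tag{5}
```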
The diagonal elements of C_Y are the variances, while the off-diagonal elements are the covariances between variables. Null off-diagonal covariances mean that the random variables are uncorrelated [22]. However, nothing can be affirmed about statistical independence in the case of biospeckle, since the speckle patterns in time cannot be represented by a Gaussian behavior. Furthermore, the covariance matrix is real and symmetric, which allows C_Y to be decomposed into a set of eigenvalues and orthogonal eigenvectors [15] using Eq. (6):

$$C_Y = V \Lambda V^{T} \qquad (6)$$

where V is the matrix whose columns are the orthonormal eigenvectors of C_Y and Λ is the diagonal matrix of the corresponding eigenvalues. The eigenvectors represent the contribution of each of the original axes to the composition of the new axes, the principal components. The eigenvalues, in turn, give the amount of the original variance described by each of the eigenvectors [13,23].
The last step of the analysis is the construction of the uncorrelated data matrix, also known as the principal component scores. It is formed by the product of the orthonormal eigenvector matrix V and the centered data matrix Y, as expressed by Eq. (7).
in which PC is the matrix of uncorrelated principal component scores.
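The steps of Eqs. (2) to (7) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. It assumes the column-per-variable convention and the 1/(M-1) covariance scaling, and it uses `numpy.linalg.eigh` because C_Y is real and symmetric. The function name `pca_forward` is hypothetical.

```python
import numpy as np


def pca_forward(X):
    """PCA of a data matrix X with M observations (rows) and N variables (columns).

    Returns the scores PC, the eigenvector matrix V (one eigenvector per column),
    the eigenvalues in decreasing order, and the column means mu.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                    # mean of each variable, Eq. (3)
    Y = X - mu                             # centered data, Eq. (2)
    C = (Y.T @ Y) / (X.shape[0] - 1)       # N x N covariance matrix, Eq. (5)
    eigvals, V = np.linalg.eigh(C)         # C is real and symmetric, Eq. (6)
    order = np.argsort(eigvals)[::-1]      # largest variance first
    eigvals, V = eigvals[order], V[:, order]
    PC = Y @ V                             # uncorrelated scores, Eq. (7)
    return PC, V, eigvals, mu
```

Consider the data matrix described in Materials and Methods, which has one column per frame. There C is only 64 × 64, so the eigendecomposition itself is likely to be inexpensive, and most of the cost lies in forming Y^T Y.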
Signal characteristics can be extracted from the data in the PCA domain. According to Zhang et al. [14], the signal and the noise of a data set are better distinguished in the PCA domain, since the signal energy and the noise energy concentrate in different subsets of the uncorrelated data. Because of this property, PCA is referred to as a statistical data filtering method.
The inverse PCA transform back-transforms the principal component scores (uncorrelated data) and thereby reconstructs the original dataset. Eq. (8) presents the mathematical expression of the inverse PCA transform.
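Under the same assumed convention (Y is M × N and V is the N × N matrix with the orthonormal eigenvectors in its columns), Eqs. (7) and (8) can be written as shown below. Because V is orthonormal, its inverse is V^T, so Y = PC V^T. The column means are added back so that X itself is recovered. The last line, keeping only a set S of components, is the partial reconstruction used later for filtering. This ordering of the product is an assumption; the text describes it only as the product of V and Y.

```latex
PC = Y\,V \tag{7}

X = PC\,V^{T} + \mathbf{1}_M\,\mu^{T},
\qquad \mu = \begin{bmatrix} \mu(x_1) & \cdots & \mu(x_N) \end{bmatrix}^{T} \tag{8}

\hat{X}_S = PC_S\,V_S^{T} + \mathbf{1}_M\,\mu^{T}
\qquad \text{(only the columns of } PC \text{ and } V \text{ indexed by } S\text{)}
```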
The inverse PCA transform is useful because reconstructing the original data from only some specific PCs, discarding the rest, can enhance important features that were not easily seen in the data and/or remove the contribution of undesirable features such as noise. The same operation is widely used for data compression.
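The sketch below shows how a partial inverse transform keeps only selected components. It assumes scores `PC`, eigenvectors `V` (sorted by decreasing eigenvalue) and column means `mu` obtained as in Eqs. (2) to (7). Discarded components are simply set to zero before the back-transformation. The function name `pca_reconstruct` is hypothetical.

```python
import numpy as np


def pca_reconstruct(PC, V, mu, keep):
    """Inverse PCA transform, Eq. (8), using only the 0-based components in `keep`.

    PC: (M, N) scores; V: (N, N) eigenvectors in columns; mu: (N,) column means.
    Components not listed in `keep` are treated as zero.
    """
    keep = np.asarray(keep, dtype=int)
    return PC[:, keep] @ V[:, keep].T + mu


# Usage: keep the first g components (largest variance).
# X_hat = pca_reconstruct(PC, V, mu, range(g))
```

Keeping all N components reproduces X exactly, because V is a complete orthonormal basis. Splitting the components between two reconstructions gives two parts that add back to the original, once the duplicated mean is removed.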

Fujii method for biospeckle
One way to analyze the interference patterns of the dynamic laser speckle is the use of graphical methods, which display maps of the spatial variability of the biological activity of the material studied, and the Fujii method is a tool that fits this classification.
Fujii et al. [24] presented this technique for the analysis of a sequence of dynamic laser speckle images. The method sums the weighted differences between each image and the subsequent image (Eq. (9)):

$$\mathrm{Fujii}(x, y) = \sum_{k} \frac{\left| I_k(x, y) - I_{k+1}(x, y) \right|}{I_k(x, y) + I_{k+1}(x, y)} \qquad (9)$$

where Fujii(x, y) is the resulting image and I_k(x, y) is the gray level at coordinates x and y of the k-th image.
The result is a new image, in which it is possible to visualize the spatial variability of biological activity.Regions of high activity are represented in light tonalities while dark areas illustrate regions of low biological activity.
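Eq. (9) can be sketched in NumPy for an image sequence stored as an array of shape (K, rows, cols). The small `eps` in the denominator is an assumption added here to avoid division by zero at fully black pixels; it is not part of the method as described. The function name `fujii` is hypothetical.

```python
import numpy as np


def fujii(stack, eps=1e-12):
    """Fujii activity map, Eq. (9), of an image sequence shaped (K, rows, cols)."""
    I = np.asarray(stack, dtype=float)
    current, following = I[:-1], I[1:]          # each image and the next one
    weighted = np.abs(current - following) / (current + following + eps)
    return weighted.sum(axis=0)                  # sum over k gives one 2-D map
```

A region whose intensity never changes produces zero in the map. The normalization by the sum of intensities is what amplifies activity in dark regions.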
In addition, the Fujii method amplifies activity in the darkest areas, which makes its images clearer than those of other approaches such as the generalized difference method [3].

Generalized difference method (GD)
The generalized difference approach was introduced by Arizaga et al. [25] as an alternative to the Fujii technique.
The method generalizes the summation of intensity differences to all pairs of images in the sequence and eliminates the weighting factor (Eq. (10)):

$$\mathrm{GD}(x, y) = \sum_{k} \sum_{l} \left| I_k(x, y) - I_{k+l}(x, y) \right| \qquad (10)$$

where GD(x, y) is the resulting image and I_k(x, y) is the intensity of the pixel at coordinates x and y of the k-th image.
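Eq. (10) can be sketched by looping over the time lag l. For each lag, a whole-array subtraction covers every valid k, so the double sum runs over all pairs of images in the sequence. This is a minimal illustration for an array shaped (K, rows, cols), and the name `generalized_difference` is hypothetical. The cost grows roughly with K² per pixel, which is one practical difference from the Fujii method's single pass.

```python
import numpy as np


def generalized_difference(stack):
    """Generalized difference map, Eq. (10), of a sequence shaped (K, rows, cols)."""
    I = np.asarray(stack, dtype=float)
    K = I.shape[0]
    gd = np.zeros(I.shape[1:])
    for lag in range(1, K):                          # every time lag l ...
        gd += np.abs(I[:-lag] - I[lag:]).sum(axis=0)  # ... summed over all k
    return gd
```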

Logarithm unit
The results before and after PCA filtering of the biospeckle data were compared on a logarithmic scale, specifically the decibel scale.
The decibel (dB) is defined by a logarithmic relationship that expresses the ratio between a measured value and a reference [26]. Eq. (11) describes the logarithmic unit in decibels:

$$\mathrm{dB} = 10 \log_{10}\left( \frac{W_1}{W_2} \right) \qquad (11)$$

where dB is the result of the logarithmic relationship expressed in decibels, and W_1 and W_2 are the energies of the studied signal and the reference signal, respectively.
Negative dB results indicate that the data processing promoted attenuation of the signal energy, whereas positive values express energy gain after application of the analysis.
The energy of a discrete signal k[n] is the sum of its squared values over time, as shown in Eq. (12):

$$W = \sum_{n} \left| k[n] \right|^{2} \qquad (12)$$
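As a worked illustration of Eqs. (11) and (12), a signal with twice the reference energy gives about +3.01 dB, and one with half the energy gives about -3.01 dB. The helper names below are hypothetical.

```python
import numpy as np


def energy(signal):
    """Energy of a discrete signal, Eq. (12): sum of squared magnitudes."""
    s = np.asarray(signal)
    return float(np.sum(np.abs(s) ** 2))


def decibels(w_studied, w_reference):
    """Energy ratio in dB, Eq. (11); negative values mean attenuation."""
    return float(10.0 * np.log10(w_studied / w_reference))


reference = np.array([1.0, 2.0, 2.0])     # energy 9
attenuated = 0.5 * reference              # energy 9/4
print(decibels(energy(attenuated), energy(reference)))
```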
FIG. 1 Organization of the concatenated images in a new data matrix.

MATERIALS AND METHODS
To evaluate the proposed method, a database of images of a maize fruit illuminated by laser was used [27], acquired in the back-scattering configuration. In this configuration, the laser beam reaches the object, and the light scattered back from the sample is collected by a CCD camera placed on the same side of the sample as the laser. The images acquired over time by the CCD were processed by image analysis and statistical procedures in order to quantify or qualify the biospeckle phenomenon. The database contained 64 gray-level images of the illuminated maize, each with a resolution of 490 by 256 pixels, collected at a sampling interval of 0.08 s. This interval corresponds to a sampling rate of 12.5 Hz, whose Nyquist frequency of 6.25 Hz lies above the biological activity of the maize seed, which is below 6 Hz [7,9]. The sampling was therefore sufficient to capture all the relevant frequencies in the signal. The images were collected with adequate focus on the maize and a clear definition of the speckle grains, avoiding both light saturation and underexposure over the whole sample. Each image was concatenated into a single signal, and these signals were arranged vertically, side by side, following the image sequence. Figure 1 illustrates the construction of the data matrix X from the concatenated images.
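The arrangement of Figure 1 can be sketched as a reshape. Each frame is flattened into one column, so X has one row per pixel (observations, M = 490 × 256 = 125,440 for this database) and one column per frame (variables, N = 64). The row-major flattening order and the (490, 256) frame shape are assumptions, and the helper names are hypothetical. The reverse function is the "inverse process of concatenating" applied after filtering.

```python
import numpy as np

FRAME_INTERVAL_S = 0.08                     # sampling interval given in the text
NYQUIST_HZ = 0.5 / FRAME_INTERVAL_S         # 6.25 Hz


def stack_to_matrix(stack):
    """(K frames, rows, cols) -> X of shape (rows*cols, K): one column per frame."""
    stack = np.asarray(stack)
    return stack.reshape(stack.shape[0], -1).T


def matrix_to_stack(X, frame_shape):
    """Inverse of stack_to_matrix; frame_shape is (rows, cols), e.g. (490, 256)."""
    return X.T.reshape(X.shape[1], *frame_shape)
```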
The data matrix X was transformed into a set of statistically uncorrelated coordinates by PCA, converting the original data to the PCA score domain. To study the contribution of each principal component to the composition of the original signal, some principal components were eliminated before the inverse PCA transform was applied. The PCs were selected using three approaches: a) emphasis on the first g principal components; b) use of only the last h PCs; c) a random choice of some PCs.
After the PCs were selected, the inverse PCA transform was applied.
The concatenation was then reversed to recover the image sequence, and the reconstructed data were analyzed graphically by the Fujii and GD methods. Figure 2 summarizes the proposed methodology in a flow chart.
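The procedure of the flow chart can be sketched end to end, up to the point where the filtered sequence is handed to the Fujii or GD map. The three selection approaches become index sets. For approach c), drawing indices uniformly without replacement is a hypothetical rule, since the text does not specify how the random choice was made. All function names are hypothetical, and random frames stand in for the maize images.

```python
import numpy as np


def select_components(n_components, scheme, count, rng=None):
    """0-based PC indices for the selection approaches a), b) and c)."""
    if scheme == "first":                   # a) the first g PCs
        return np.arange(count)
    if scheme == "last":                    # b) only the last h PCs
        return np.arange(n_components - count, n_components)
    if scheme == "random":                  # c) random choice (hypothetical rule)
        rng = np.random.default_rng() if rng is None else rng
        return np.sort(rng.choice(n_components, size=count, replace=False))
    raise ValueError(f"unknown scheme: {scheme!r}")


def pca_filter_stack(stack, keep):
    """Concatenate frames into X, keep only the PCs in `keep`, apply the inverse
    PCA transform and undo the concatenation. `stack` is (K, rows, cols)."""
    stack = np.asarray(stack, dtype=float)
    K = stack.shape[0]
    X = stack.reshape(K, -1).T                    # one column per frame
    mu = X.mean(axis=0)
    Y = X - mu
    eigvals, V = np.linalg.eigh(Y.T @ Y)          # scaling does not change V
    V = V[:, np.argsort(eigvals)[::-1]]           # decreasing variance order
    keep = np.asarray(keep, dtype=int)
    X_hat = (Y @ V[:, keep]) @ V[:, keep].T + mu  # inverse PCA, selected PCs only
    return X_hat.T.reshape(stack.shape)           # back to an image sequence


rng = np.random.default_rng(0)
frames = rng.random((64, 49, 26))                 # stand-in for the 64 images
filtered = pca_filter_stack(frames, select_components(64, "first", 16))
```

The returned `filtered` sequence has the same shape as the input, so it can be passed directly to a Fujii or GD routine to produce the activity maps.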
To carry out a numerical analysis and assist the interpretation of the processed data, one line was selected from each of the Fujii and GD images produced by the graphical methods, as illustrated in Figure 3. The lines were plotted in the same figure to compare their behavior in terms of amplitude. In addition, quantitative analyses were carried out by calculating the energy of the chosen lines on the dB scale.
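The line comparison can be sketched as follows. One row is taken from an activity map before and after PCA filtering, and its energy ratio is computed in dB (Eqs. (11) and (12)). The Results section also reports a "correlation index", which is not defined in the text. The sketch assumes it is Pearson's correlation coefficient, computed here with `numpy.corrcoef`. The function name `compare_lines` is hypothetical.

```python
import numpy as np


def compare_lines(reference_map, processed_map, row):
    """Compare one row of an activity map (Fujii or GD) before and after filtering.

    Returns (energy ratio in dB, correlation index). The correlation index is
    ASSUMED to be Pearson's r; the text does not define it.
    """
    ref = np.asarray(reference_map, dtype=float)[row]
    out = np.asarray(processed_map, dtype=float)[row]
    db = 10.0 * np.log10(np.sum(out ** 2) / np.sum(ref ** 2))
    r = np.corrcoef(ref, out)[0, 1]
    return float(db), float(r)
```

A row that is merely scaled keeps a correlation of 1 while its dB value changes. This is why the two measures are reported together.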

RESULTS AND DISCUSSION
4.1 Signal reconstruction using the first g principal components
Figure 4 shows the biospeckle activity maps of the maize fruit analyzed with PCA, using the first g principal components to reconstruct the signal. Areas of high biological activity appear in light gray, whereas dark shades indicate low activity (in the pseudocolor rendering, red corresponds to light gray and blue to dark gray). The images labeled Original in Figures 4(a) and 4(b) are, respectively, the Fujii and GD maps of the biospeckle of the maize fruit without PCA processing. They are the reference images for the data analysis.
The total reconstruction of the data, using all 64 principal components in the inverse transform, produced images visually identical to the reference for both graphical methods (Figures 4(a) and 4(b)), as expected. Figure 5 shows the selected rows of the GD images, where the filtering effect in the embryo and endosperm tissues can be observed for different values of g. Table 1 presents the results of the numerical analysis based on the data from Figure 5. The highest activity was expected in the embryo, since it contains live tissues and water movement that contribute to the Doppler beating of the scattered light. The activity in the endosperm was expected to be lower, since it contains no live tissue but only a reserve of nutrients [27]. The outputs were able to tag this difference, at levels that depended on the value of g adopted. The dB values (Table 1) ranged between 0.05 and 6.86 dB for the embryonic tissue. For the endosperm tissue they stayed close to zero, except at g equal to 4, which presented an attenuation of 1.70 dB. These results show numerically a higher attenuation of the embryo data relative to the endosperm. The attenuation can be seen in Figure 5, where the normalized amplitude of the embryo signal changes considerably with g, decreasing as g decreases, whilst the endosperm signal remains close to the original curve. In addition, the correlation index fluctuated less for the endosperm tissue. This also shows that the characteristics of the endosperm signal were preserved while the embryo signal was modified. The noise level and the variations in the signal could be estimated more precisely with techniques such as those in [28], which can validate the filtering output in each case.

Kaiser [29] proposed a statistical criterion to define the optimal number of principal components to represent a dataset. Applying this criterion to the database used here gave an optimal number of 16 principal components, which explain 94% of the variance of the data. Setting g equal to 4, which describes 89% of the data variance, is therefore too low to represent the dataset according to the Kaiser criterion [29]. This explains the attenuation observed in the amplitude of the endosperm signal and its low correlation index.
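The two quantities used above, the number of components retained by a Kaiser-type rule and the fraction of variance carried by the first g components, can be computed from the eigenvalues as sketched below. The text does not state which variant of Kaiser's rule was applied. The sketch assumes the "eigenvalue above the mean eigenvalue" form, which reduces to the familiar "eigenvalue greater than 1" form when PCA is applied to a correlation matrix, since that matrix has mean eigenvalue 1. The helper names are hypothetical.

```python
import numpy as np


def kaiser_count(eigvals):
    """Number of PCs whose eigenvalue exceeds the mean eigenvalue (ASSUMED
    variant of Kaiser's rule; equals 'eigenvalue > 1' for a correlation matrix)."""
    lam = np.asarray(eigvals, dtype=float)
    return int(np.sum(lam > lam.mean()))


def explained_variance(eigvals, g):
    """Fraction of the total variance carried by the g largest eigenvalues."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return float(lam[:g].sum() / lam.sum())


eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]    # illustrative values, not the maize data
print(kaiser_count(eigenvalues), explained_variance(eigenvalues, 2))
```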
In this context, the inverse PCA transform using only the first g PCs implements a low-pass filter on the time evolution of the images. It attenuates the amplitudes associated with the high frequencies (embryo) and preserves the low frequencies, which represent the endosperm activity.
According to Scalassara et al. [30], the first principal components contain a large proportion of the signal variance, while the last ones contain basically the noise variance (high-frequency signal). Consequently, using the first g PCs filters the data by eliminating the high-frequency activity in the time evolution of the images, that is, with respect to the temporal Fourier transform.

Signal reconstruction using the last h principal components
These results show that preprocessing with PCA using the last h principal components acted as a high-pass filter. It highlighted the high frequencies, such as those in the embryonic portion, and filtered out the lowest frequencies, which are linked to the biological activity of the endosperm, as discussed by Cardoso et al. [7].
The quantitative results, summarized in Table 2, show higher attenuation of the endosperm activity for low values of h, reaching -6.97 dB (h = 32) and a correlation index of 0.19 (h = 16). Figure 6(c) shows the filtering effect on the endosperm signal. The reconstructed lines present amplitudes considerably different from those of the reference signal, except at h = 64, which corresponds to the total reconstruction of the original signal.
In contrast, the decibel values for the embryo signal oscillated little, the highest attenuation being -0.99 dB at h = 16. The correlation indices (Table 2) also remained high for the different values of h, except for the last 4 PCs. These results show that the information carried by the high frequencies was preserved, so PCA performed as a high-pass filter. Using a specific, randomly chosen set of principal components is intended to combine the high-pass and low-pass filters obtained from PCA and so improve the results of the Fujii and GD methods. High-pass, low-pass or band-pass filters make it possible to define narrow spectral ranges in which the characteristics of biological, physical or chemical phenomena are concentrated and occur most intensely. Sendra et al. [9] and Cardoso et al. [7] call these ranges frequency markers.
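The low-pass and high-pass behavior can be illustrated on hypothetical synthetic data rather than the maize images. In the example, every "pixel" carries a strong slow oscillation (2 cycles per 64 frames) and a weak fast one (25 cycles), each with a random phase, plus a little white noise. The centered data are then reconstructed from the first g = 2 PCs or from the last h = 62 PCs, and the script measures how much energy survives at each temporal frequency. The amplitudes, frequencies and noise level are assumptions, chosen so that variance ordering coincides with frequency ordering, which is the situation described in the text. PCA itself ranks components only by variance, not by frequency. All function names are hypothetical.

```python
import numpy as np


def synthetic_frames(n_pixels=2000, n_frames=64, noise=0.05, seed=0):
    """Hypothetical data matrix (pixels x frames): strong slow oscillation
    (2 cycles per sequence) + weak fast one (25 cycles), random phases."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_pixels, 2))
    slow = 1.0 * np.sin(2 * np.pi * 2 * t / n_frames + phases[:, :1])
    fast = 0.2 * np.sin(2 * np.pi * 25 * t / n_frames + phases[:, 1:])
    return slow + fast + noise * rng.standard_normal((n_pixels, n_frames))


def reconstruct_centered(X, keep):
    """Centered data rebuilt from the PCs in `keep` (0-based, decreasing variance)."""
    Y = X - X.mean(axis=0)
    eigvals, V = np.linalg.eigh(Y.T @ Y)
    Vk = V[:, np.argsort(eigvals)[::-1]][:, keep]
    return Y @ Vk @ Vk.T, Y


def bin_energy(Z, k):
    """Energy at temporal frequency bin k (FFT along frames), summed over pixels."""
    return float(np.sum(np.abs(np.fft.rfft(Z, axis=1)[:, k]) ** 2))


X = synthetic_frames()
low, Y = reconstruct_centered(X, np.arange(2))        # first g = 2 PCs
high, _ = reconstruct_centered(X, np.arange(2, 64))   # last h = 62 PCs
for label, Z in (("first g=2", low), ("last h=62", high)):
    kept_slow = bin_energy(Z, 2) / bin_energy(Y, 2)
    kept_fast = bin_energy(Z, 25) / bin_energy(Y, 25)
    print(f"{label}: slow kept {kept_slow:.3f}, fast kept {kept_fast:.3f}")
```

If the fast component carried more variance than the slow one, the roles would swap. The "low-pass" and "high-pass" interpretation therefore depends on how variance is distributed over frequency in the data at hand.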

Random selection of principal components for the inverse PCA transform
The terminology of principal component analysis is based on principal component scores and loadings, not on frequency. Nevertheless, reconstructing the signal from a specific, randomly chosen number of PCs makes it possible to define markers of principal components and associate them with biological phenomena, as presented in Figure 7. The characteristics of the biospeckle signal allowed PCA to be used as a filtering tool. PCA has the advantage of performing as a non-parametric and adaptive method, which is desirable for practical implementations. In addition, PCA filtering reduces computation time, which is relevant in quasi-online applications.
The first image in Figure 7 is the signal reconstructed from PCs 1 to 4 followed by GD processing. It highlighted information from the endosperm and filtered out the embryo signals. The PC interval 1-4 can therefore be considered a principal component marker for the biological activity of the endosperm tissue of the maize fruit.
The same behavior appears in the third image of Figure 7, in which PCs 32 to 36 are also principal component markers, but for the biological activity of the embryonic tissue. This reconstruction emphasized the embryo and attenuated information from the endosperm tissue in the GD image.
Finally, the GD image constructed from the signals reconstructed with PCs 8 to 12 (second image of Figure 7) improved the quality of the output but did not act as a marker for either tissue.
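The idea of a principal component marker, a contiguous band of PCs that isolates one kind of activity, can be illustrated on hypothetical synthetic data. In the example, three oscillations with decreasing amplitudes (2, 10 and 25 cycles per 64 frames) each occupy their own pair of PCs. Reconstructing from PCs 3 and 4 (1-based) keeps only the middle frequency. The amplitudes, frequencies and noise level are assumptions chosen to produce that ordering; nothing here is taken from the maize data.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 2000, 64                     # hypothetical pixels x frames
t = np.arange(N)
bins, amps = (2, 10, 25), (1.0, 0.5, 0.2)

# Each oscillation (random phase per pixel) occupies its own pair of PCs:
# 1-2, 3-4 and 5-6 in 1-based numbering, ordered by variance.
X = 0.05 * rng.standard_normal((M, N))
for k, a in zip(bins, amps):
    X += a * np.sin(2 * np.pi * k * t / N + rng.uniform(0, 2 * np.pi, (M, 1)))

Y = X - X.mean(axis=0)
eigvals, V = np.linalg.eigh(Y.T @ Y)
V = V[:, np.argsort(eigvals)[::-1]]

band = V[:, 2:4]                    # PCs 3 and 4 (1-based): a candidate marker
Z = Y @ band @ band.T               # centered reconstruction from that band only


def kept_fraction(k):
    """Share of the energy at temporal bin k that survives the band selection."""
    before = np.abs(np.fft.rfft(Y, axis=1)[:, k]) ** 2
    after = np.abs(np.fft.rfft(Z, axis=1)[:, k]) ** 2
    return float(after.sum() / before.sum())


print({k: round(kept_fraction(k), 3) for k in bins})
```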

CONCLUSION
Principal component analysis was proposed as a tool for the spectral analysis of dynamic laser speckle data and proved to be a powerful tool for analyzing biospeckle data. It allowed filters with different pass bands, with respect to the temporal Fourier transform, to be implemented for data analysis.
The proposed PCA-based method allowed the biological activity in the endosperm and embryo of the maize seed example to be decomposed. It has the advantages of a blind source separation technique with fast computational processing, in which the orthogonal basis functions used for data decomposition are statistically optimal. In addition, compared with conventional low-pass and high-pass filters, PCA-based filtering has the advantage of performing as a nonparametric and adaptive method, which is desirable for practical implementations.
The proposed method also provided tissue segmentation of the biological materials, improved the visual quality of the final images and allowed principal component markers of the biological phenomena to be defined. These results support its potential for biospeckle data analysis.
In Figures 4(a) and 4(b), reducing the number of leading PCs used in the inverse PCA transform attenuated the embryo information while preserving the separation of the endosperm, thereby filtering the data and segmenting the tissues.

FIG. 4 Fujii (a) and GD (b) images obtained by PCA analysis with signal reconstruction using the first g PCs, together with the corresponding original images.

Figure 6 presents the results of the PCA analysis based on signal reconstruction using the last h PCs. It shows the Fujii and GD maps and, for the selected line of each output, the behavior of the signals for different values of h. Figures 6(a) and 6(b) show similar graphical results for the Fujii and GD methods, with maps visually identical to the original images for h = 64. In addition, the embryonic part is emphasized as the number of PCs decreases.

Figure 7 illustrates four GD images in which the signals were reconstructed using a small, randomly chosen number of principal components.

FIG. 6 Biological activity according to the Fujii (a) and GD (b) techniques, and the filtering effect on the embryo and endosperm tissues for different numbers of PCs used in the signal reconstruction (c).

FIG. 7 GD images resulting from signal reconstruction using a small, randomly chosen number of principal components.

Negative decibel values in Table 1 indicate attenuation of the energy, and positive values denote a gain of energy in the acquired line. Null decibel values mean that the two signals compared have the same energy.

TABLE 1 Decibel values and correlation indices of the signals reconstructed using the first g principal components and of the original signal.

TABLE 2 Numerical analysis of the signals reconstructed using the last h principal components.