3.5.3. Principal Component Analysis (PCA)

Principal component analysis (PCA) is a multivariate statistical technique widely used to describe protein dynamics across spatial scales. It is a linear method that extracts the essential features of protein motion from covariance and/or correlation matrices. These matrices are derived from the atomic coordinates, which represent the accessible degrees of freedom (DOF) of the protein in a simulated trajectory. In the current study, Pearson's cross-correlation matrix was employed because it normalizes the protein variables and prevents atoms with large positional variation from skewing the results. The eigenvectors, each associated with a specific variance value, also play an important role in characterizing protein motion across spatial scales. The essential dynamics of the protein were calculated by applying PCA to the protein trajectory. The variables formed tight clusters separated by narrow angles, which indicates that they were correlated with the principal components (PCs) [62]. PCs are the vectors used to describe protein motion with respect to the variables, and the top two PCs were used in the current study to characterize this motion. Figure 16 shows that PC1 and PC2 clearly capture the behavior of the variables; the distribution on the scatter plot indicates that the protein components are tightly clustered with small angles.
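The correlation-based PCA described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `correlation_pca`, the synthetic input, and the layout of one row per trajectory frame and one column per variable are all assumptions made for illustration. The sketch shows how building the Pearson correlation matrix puts variables of very different scales on an equal footing before the eigendecomposition.

```python
import numpy as np

def correlation_pca(X, n_components=2):
    """PCA via eigendecomposition of the Pearson correlation matrix.

    X is assumed to have shape (n_frames, n_variables); this layout
    is a hypothetical choice for the example.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable (zero mean, unit sample variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Pearson correlation matrix between variables (columns).
    R = np.corrcoef(X, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the standardized data onto the leading eigenvectors (PCs).
    scores = Z @ eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return eigvals, eigvecs, scores, explained

# Synthetic stand-in for a trajectory: 500 frames, 6 variables whose
# raw scales differ by two orders of magnitude.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
scales = np.array([1.0, 10.0, 100.0, 1.0, 10.0, 100.0])
X = (latent @ mixing + 0.1 * rng.normal(size=(500, 6))) * scales

eigvals, eigvecs, scores, explained = correlation_pca(X)
print("Fraction of variance explained by PC1 and PC2:", explained)
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables, so each eigenvalue divided by that sum gives the fraction of total standardized variance carried by the corresponding PC.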

**Figure 15.** Residue-wise solvent-accessible surface area (SASA) for NEK7.

**Figure 16.** The correlation between protein variables and two top PCs.

The correlation matrix can also be read in terms of the correlation coefficients between the variables and the PCs. In Pearson's cross-correlation, these coefficients express the percentage of variance in each protein variable that is explained by the PCs. Figure 17 depicts the Pearson correlation graph for NEK7 variables.
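The relationship between variables and PCs can be sketched numerically. The example below is illustrative only and uses synthetic data; the function name `pc_loadings` and the input layout (rows as frames, columns as variables) are assumptions. For a PCA on the correlation matrix, the correlation between variable i and PC j equals the eigenvector entry multiplied by the square root of the eigenvalue, and the squared value is the fraction of that variable's variance explained by that PC.

```python
import numpy as np

def pc_loadings(X):
    """Return (loadings, scores) for a correlation-matrix PCA.

    loadings[i, j] is the Pearson correlation between variable i and PC j.
    X is assumed to have shape (n_frames, n_variables).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Guard against tiny negative eigenvalues caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    # Scale each eigenvector (column) by the square root of its eigenvalue.
    loadings = eigvecs * np.sqrt(eigvals)
    scores = Z @ eigvecs
    return loadings, scores

# Synthetic example: 4 variables, two of them correlated, one on a large scale.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:, 1] += 2.0 * X[:, 0]
X[:, 3] *= 50.0

loadings, scores = pc_loadings(X)
# Fraction of each variable's variance explained by the top two PCs.
explained_by_top2 = (loadings[:, :2] ** 2).sum(axis=1)
print(explained_by_top2)
```

Plotting the first two columns of `loadings` against each other gives a variable-versus-PC correlation plot of the kind shown in Figure 16: variables whose loading vectors are separated by a narrow angle are positively correlated with one another.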

**Figure 17.** Pearson correlation graph for NEK7 variables.
