#### *2.2. Singular Value Decomposition*

The principal component analysis of a matrix *A* is equivalent to the eigenvector analysis of its covariance matrix *A*<sup>T</sup>*A*: the load vectors of the matrix *A* are the eigenvectors of *A*<sup>T</sup>*A*. If the eigenvalues of *A*<sup>T</sup>*A* are arranged as λ<sub>1</sub> ≥ λ<sub>2</sub> ≥ ··· ≥ λ<sub>*m*</sub> ≥ 0, the corresponding eigenvectors *p*<sub>1</sub>, *p*<sub>2</sub>, ··· , *p*<sub>*m*</sub> are the load vectors of the matrix *A*. The SVD of the matrix *A* can be expressed by the equation below.

$$A = U \Sigma V^{\mathrm{T}} \tag{10}$$

In the equation,

$$U = [u\_1, u\_2, \dots, u\_n] \in \mathbb{R}^{n \times n} \tag{11}$$

$$V = [v\_1, v\_2, \dots, v\_m] \in \mathbb{R}^{m \times m} \tag{12}$$

$$\Sigma = \begin{bmatrix} \sigma\_1 & 0 & \cdots & 0 \\ 0 & \sigma\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma\_m \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \in \mathbb{R}^{n \times m} \tag{13}$$

where σ<sub>1</sub> ≥ σ<sub>2</sub> ≥ ··· ≥ σ<sub>*m*</sub> ≥ 0 are the singular values of the matrix *A*. The singular values of the data matrix *A* are the square roots of the eigenvalues of its covariance matrix *A*<sup>T</sup>*A*. Therefore, the following holds.

$$\begin{cases} \sigma\_1 = \sqrt{\lambda\_1} \\ \sigma\_2 = \sqrt{\lambda\_2} \\ \vdots \\ \sigma\_m = \sqrt{\lambda\_m} \end{cases} \tag{14}$$

Since the columns of the matrices *U* and *V* are mutually orthogonal and of unit length, Equation (10) can be expressed as the formula below.

$$A = \sigma\_1 u\_1 v\_1^{\mathrm{T}} + \sigma\_2 u\_2 v\_2^{\mathrm{T}} + \dots + \sigma\_m u\_m v\_m^{\mathrm{T}} \tag{15}$$

If *v*<sub>*i*</sub> is denoted as *p*<sub>*i*</sub> and σ<sub>*i*</sub>*u*<sub>*i*</sub> as *t*<sub>*i*</sub>, Equation (15) is equivalent to Equation (1): σ<sub>*i*</sub>*u*<sub>*i*</sub> is the *i*-th score vector of the data matrix *A*, and *v*<sub>*i*</sub> is the load vector of the *i*-th principal component.
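The identities above can be checked numerically. The following sketch (using NumPy on an arbitrary random matrix, not the paper's data) verifies Equation (14), the rank-one expansion of Equation (15), and the score/load decomposition:

```python
import numpy as np

# A small data matrix A (n = 5 samples, m = 3 variables); the values
# are arbitrary and only serve to illustrate the identities.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Full SVD: A = U @ Sigma @ V.T, as in Equation (10).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Equation (14): singular values equal the square roots of the
# eigenvalues of the covariance matrix A^T A (sorted descending).
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(eigvals))

# Equation (15): A is a sum of rank-one terms sigma_i * u_i * v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)

# Score vectors t_i = sigma_i * u_i, load vectors p_i = v_i,
# so A = T P^T as in Equation (1).
T = U[:, :len(s)] * s   # score matrix, columns t_i
P = Vt.T                # load matrix, columns p_i
assert np.allclose(A, T @ P.T)
```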

#### *2.3. Determination of the Number of Principal Components*

PCA is an analytical method that reduces dimensionality by eliminating the information of independent variables that are strictly or strongly linearly correlated. For *m* independent variables, up to *m* principal component vectors can be obtained. Usually, *k* principal components (*k* < *m*) are used in place of the *m* independent variables, and the information they contain accounts for most of the information provided by the original *m* independent variables. To quantitatively describe the relative amount of information provided by the principal components, the variance contribution rate δ<sub>*i*</sub> of the principal component vector *t*<sub>*i*</sub> is defined by the equation below.

$$\delta\_i = \frac{\lambda\_i}{\sum\_{j=1}^m \lambda\_j} \tag{16}$$

The cumulative contribution rate η<sub>*k*</sub> of the first *k* principal components is defined as:

$$\eta\_k = \frac{\sum\_{i=1}^{k} \lambda\_i}{\sum\_{i=1}^{m} \lambda\_i} \tag{17}$$

where λ<sub>*i*</sub> is the variance of the principal component *t*<sub>*i*</sub>, and δ<sub>*i*</sub> is the variance contribution rate of *t*<sub>*i*</sub>, which represents the share of *t*<sub>*i*</sub> in the total information contained in the *m* variables. The cumulative contribution rate η<sub>*k*</sub> represents the proportion of the information contained in the first *k* principal components to the total information.
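In practice, *k* is often chosen as the smallest number of components whose cumulative contribution rate exceeds a threshold. A minimal sketch of Equations (16) and (17), using illustrative eigenvalues (not from the paper) and an assumed 85% threshold:

```python
import numpy as np

# Hypothetical eigenvalues (variances) of five principal components,
# sorted in descending order; the numbers are illustrative only.
lam = np.array([4.2, 2.1, 0.9, 0.5, 0.3])

# Equation (16): variance contribution rate of each component.
delta = lam / lam.sum()

# Equation (17): cumulative contribution rate of the first k components.
eta = np.cumsum(delta)

# Choose the smallest k whose cumulative rate reaches the threshold.
threshold = 0.85
k = int(np.searchsorted(eta, threshold) + 1)
print(delta.round(4), eta.round(4), k)
```

With these numbers the contribution rates are (0.525, 0.2625, 0.1125, 0.0625, 0.0375), so the first three components already carry 90% of the total variance and *k* = 3.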

#### *2.4. Main Steps of PCA*

The steps of PCA based on Singular Value Decomposition (SVD) are as follows [49].

Input: (1) data matrix *A* = {*x*<sub>1</sub>, *x*<sub>2</sub>, ··· , *x*<sub>*m*</sub>};

(2) dimension *k* of low-dimensional space.

Steps:

(1) Represent the sample data in the form of column vectors, and zero-center all samples: $x\_i \leftarrow x\_i - \frac{1}{m}\sum\_{j=1}^{m} x\_j$;

(2) Calculate the covariance matrix *A*<sup>T</sup>*A* of the sample;

(3) Perform SVD on the matrix *A*, and take the singular vectors corresponding to the *k* largest singular values;

(4) Compute the score vectors *t*<sub>*i*</sub> = σ<sub>*i*</sub>*u*<sub>*i*</sub>, *i* = 1, 2, ··· , *k*.

Output:

(1) Score matrix *T* = [*t*<sub>1</sub>, *t*<sub>2</sub>, ··· , *t*<sub>*k*</sub>].
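The steps above can be sketched compactly with NumPy. The function name `pca_svd` and the sample data are assumptions for illustration; centering is done per variable, and the covariance eigen-analysis is implicit in the SVD itself:

```python
import numpy as np

def pca_svd(A, k):
    """PCA via SVD: zero-center the samples, decompose the centered
    matrix, and return the n x k score matrix T = [t_1, ..., t_k]."""
    A = A - A.mean(axis=0)                            # step (1): zero-center
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # steps (2)-(3): SVD
    T = U[:, :k] * s[:k]                              # step (4): t_i = sigma_i * u_i
    return T

# Illustrative data: 6 samples, 4 variables (not the paper's data set).
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
T = pca_svd(A, k=2)
print(T.shape)  # (6, 2)
```

Because the input is centered, each score vector also has zero mean, and the columns of *T* are mutually orthogonal.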
