2.2.4. Level 4: Generation of Mathematical Relationships between Rainfed Sorghum Yield and Excess Precipitation

Principal component analysis (PCA) is a mathematical tool commonly used to display patterns in multivariate data. It removes correlation within a large set of variables and sorts the resulting components according to importance (explained variance) [17]. While PCA is commonly used for dimensionality reduction, it was not used for that purpose in this study. Total precipitation and maximum 4-day precipitation are somewhat correlated, which could affect the regression relationships. PCA transforms the input variables to remove such correlation. A downside of PCA is that while the original variables have clear interpretations (total growing season precipitation and maximum 4-day precipitation), the PCA-transformed variables do not; they are simply called principal components 1 and 2.
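The decorrelating effect described above can be illustrated with a minimal sketch. It is not the procedure used in this study: the function name `pca_transform`, the synthetic precipitation values, and the choice to center the inputs without scaling them are all assumptions made for illustration.

```python
import numpy as np

def pca_transform(X):
    """Project the columns of X onto principal components,
    ordered by decreasing explained variance."""
    Xc = X - X.mean(axis=0)                   # center each variable (assumption: no scaling)
    cov = np.cov(Xc, rowvar=False)            # covariance matrix of the inputs
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                     # principal components 1 and 2
    explained = eigvals / eigvals.sum()       # fraction of variance per component
    return scores, explained

# Synthetic, hypothetical data: 40 seasons of total and maximum 4-day precipitation,
# constructed so that the two variables are correlated.
rng = np.random.default_rng(0)
total = rng.normal(500.0, 100.0, size=40)
max4day = 0.1 * total + rng.normal(20.0, 10.0, size=40)
X = np.column_stack([total, max4day])

scores, explained = pca_transform(X)
corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"correlation before PCA: {corr_before:.3f}")
print(f"correlation after PCA:  {corr_after:.3e}")
```

Both components are retained, so no information is discarded. The transformation only rotates the inputs so that they are uncorrelated, which matches the use of PCA described here.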

In our regression analysis, the dependent variable was rainfed grain sorghum yield, and the independent variables were growing season total precipitation and maximum 4-day total precipitation (Figure 3). Multiple linear regression (MLR) analysis (Equation (1)) [18] was performed with the data analysis tool available in Microsoft Excel.

$$Y = A + B_1 X_1 + B_2 X_2, \tag{1}$$

where Y is crop yield, A is an intercept, X1 and X2 are growing season total precipitation and maximum 4-day total precipitation, respectively, and B1 and B2 are partial regression coefficients [18].
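For readers who wish to reproduce a fit of the form of Equation (1) outside a spreadsheet, the following is a minimal sketch using ordinary least squares in NumPy. The analysis in this study was performed in Microsoft Excel, so this is a stand-in rather than the tool actually used; the function name `fit_mlr` and all data values are hypothetical and chosen only so that the recovered coefficients are known in advance.

```python
import numpy as np

def fit_mlr(x1, x2, y):
    """Ordinary least-squares fit of y = A + B1*x1 + B2*x2.
    Returns the array [A, B1, B2]."""
    design = np.column_stack([np.ones_like(x1), x1, x2])  # column of ones for the intercept A
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical inputs: growing season total precipitation (x1) and
# maximum 4-day total precipitation (x2), in arbitrary units.
x1 = np.array([400.0, 450.0, 520.0, 600.0, 680.0, 750.0])
x2 = np.array([60.0, 45.0, 90.0, 70.0, 120.0, 95.0])

# Noise-free synthetic yield built from known coefficients, so the fit can be checked.
y = 2.0 + 0.01 * x1 - 0.02 * x2

A, B1, B2 = fit_mlr(x1, x2, y)
print(f"A = {A:.4f}, B1 = {B1:.4f}, B2 = {B2:.4f}")
```

To fit the PCA variant shown in Figure 3, the same function could be called with principal components 1 and 2 in place of `x1` and `x2`.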

**Figure 3.** Schematic for the multiple linear regression (MLR) with and without a principal component analysis (PCA) (level 4 results).
