*4.4. The Correlation Matrix*

In our research, we used a correlation matrix to summarize the linear relations present in our data and to identify the strong, relevant relations that could be modelled further. As part of the analytical framework, all data on the main indicators that characterize the Moldavian agriculture sector were therefore processed with the Python Seaborn library to obtain a correlation matrix (Figure 8).

**Figure 8.** Correlation matrix of main indicators that characterize the Moldavian agriculture sector.
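As an illustration of how such a matrix can be obtained, the following is a minimal sketch and not the exact script used in the study. It assumes the indicators are held in a pandas `DataFrame` with one column per indicator and one row per year. The column names and the values are synthetic placeholders, and `plot_correlation_matrix` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder data, NOT the study's indicators:
# one row per year, one column per (hypothetical) indicator.
rng = np.random.default_rng(0)
n_years = 20
base = rng.normal(size=n_years)
indicators = pd.DataFrame({
    "maize_production": base + 0.3 * rng.normal(size=n_years),
    "wheat_production": base + 0.5 * rng.normal(size=n_years),
    "agri_subsidies": rng.normal(size=n_years),
})
indicators["agri_loans"] = (
    -indicators["agri_subsidies"] + 0.3 * rng.normal(size=n_years)
)

# Pairwise Pearson coefficients, which measure linear association only.
corr = indicators.corr(method="pearson")


def plot_correlation_matrix(corr_matrix):
    """Draw an annotated heatmap of a correlation matrix with Seaborn."""
    # Plotting libraries are imported here so the numeric part above
    # also runs where they are not installed.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(
        corr_matrix,
        annot=True,        # print the coefficient in each cell
        fmt=".2f",
        cmap="coolwarm",
        vmin=-1.0,
        vmax=1.0,          # fix the colour scale to the full [-1, 1] range
        square=True,
        ax=ax,
    )
    ax.set_title("Correlation matrix (synthetic example)")
    fig.tight_layout()
    return fig
```

Calling `plot_correlation_matrix(corr)` returns a Matplotlib figure that can be saved with `fig.savefig(...)`. Fixing `vmin` and `vmax` keeps direct and inverse correlations visually comparable across matrices.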

The correlation matrix shows significant direct correlations between the production and production values of vegetables, maize and wheat, on the one hand, and total agricultural production, on the other (Figure 8). The economic indicators associated with these three major crops (maize, wheat and vegetables) can therefore be used as tools to control and maximize the productivity of the agriculture sector.

The dependence of the Moldavian economy on its agriculture sector is revealed by the significant direct correlations between the total agricultural production value and total GVA, agricultural GVA and agricultural GDP (Figure 8). In addition, maize and vegetable production values show strong direct correlations with agricultural GDP (Figure 8).

Direct, strong correlations in terms of production are also observed among vegetables, maize and wheat (Figure 8). Grape production value is also directly correlated with that of maize (Figure 8). The positive effect of agriculture loans is revealed by the direct correlation of this indicator with total agricultural production (Figure 8). Among the significant negative correlations, the strongest was recorded between the value of agriculture subsidies and agriculture loans, indicating that when the agricultural subsidy value increases, the agricultural loan value decreases (Figure 8). Starting from the correlation matrix, which displayed the possible parameter relations, we further investigated which relations could actually be formalized as linear or non-linear models using curve-fitting and model detection techniques.
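To make the idea of comparing candidate linear and non-linear models concrete, the sketch below fits a straight line and a quadratic to the same series by least squares. It is only an illustration under stated assumptions: the data are synthetic, the choice of polynomial candidates is ours, and `fit_polynomial` is a hypothetical helper rather than the tooling used in the study.

```python
import numpy as np

# Synthetic placeholder series (NOT the study's data):
# a mildly curved relation plus random noise.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 15)
y = 2.0 + 0.5 * x + 0.2 * x**2 + rng.normal(scale=0.5, size=x.size)


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit.

    Returns the coefficients (highest power first) and the
    residual sum of squares of the fitted curve.
    """
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    return coeffs, float(np.sum(residuals**2))


linear_coeffs, linear_sse = fit_polynomial(x, y, degree=1)
quadratic_coeffs, quadratic_sse = fit_polynomial(x, y, degree=2)

print(f"linear SSE:    {linear_sse:.3f}")
print(f"quadratic SSE: {quadratic_sse:.3f}")
```

Because the linear model is nested in the quadratic one, the quadratic fit can never have a larger residual sum of squares. A lower value alone is therefore not evidence of a better model, which is why residual plots and penalized metrics are also needed.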

The following part presents several investigated cases, providing for each a parametric model and the residual plots used to assess whether the observed error (residuals) is consistent with stochastic error (Appendix A). Because yearly data are analyzed, the number of samples is limited, and multiple regression could not be performed: a minimum of 10 samples should be available for each predictor (see Austin and Steyerberg [44]). Residual plots should show that the error cannot be predicted for any given observation, that is, that the residuals are consistent with random error. Residuals should be centered on zero throughout the range of fitted values.

Moreover, under an OLS approach, random errors are assumed to produce normally distributed residuals. The residuals should therefore not be correlated with another variable or with each other (if adjacent); they should fall in a symmetrical pattern and have a constant spread throughout the range. A non-random pattern in the residuals would indicate that the deterministic portion of the model (the predictor variables) is not capturing some explanatory information, which then leaks into the residuals.
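The residual properties described above can also be checked numerically. The sketch below is a simple illustration on synthetic data, not the procedure of Appendix A. The Durbin-Watson statistic and the split-half spread ratio are our own choice of summary measures, and all variable names are hypothetical.

```python
import numpy as np

# Synthetic placeholder data (NOT the study's data): a linear trend plus noise.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 20)
y = 1.0 + 0.8 * x + rng.normal(scale=1.0, size=x.size)

# OLS fit with an intercept, solved as a least-squares problem.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# 1. Centering: the residual mean should be about zero.
mean_residual = float(residuals.mean())

# 2. No linear relation between residuals and fitted values.
resid_fitted_corr = float(np.corrcoef(residuals, fitted)[0, 1])

# 3. Adjacent residuals: Durbin-Watson statistic, which lies in [0, 4];
#    values near 2 suggest little first-order autocorrelation.
durbin_watson = float(np.sum(np.diff(residuals) ** 2) / np.sum(residuals**2))

# 4. Constant spread: compare the residual standard deviation in the
#    lower and upper halves of the fitted range (ratio near 1 is desirable).
median_fit = np.median(fitted)
low = residuals[fitted <= median_fit]
high = residuals[fitted > median_fit]
spread_ratio = float(high.std(ddof=1) / low.std(ddof=1))

print(f"mean residual:        {mean_residual:.2e}")
print(f"corr(resid, fitted):  {resid_fitted_corr:.2e}")
print(f"Durbin-Watson:        {durbin_watson:.2f}")
print(f"spread ratio (hi/lo): {spread_ratio:.2f}")
```

For an OLS fit with an intercept, checks 1 and 2 hold by construction, so they serve only as sanity checks on the fitting code. The informative diagnostics are the adjacent-residual and spread measures, together with visual inspection of the residual plot for curvature.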

For each scenario, several models were tested and those considered most relevant are presented. To assess how well the models fit the data, we used S (the standard error of the regression), R-squared and adjusted R-squared. The standard error of the regression is an absolute measure of the typical distance between the data points and the regression line (Appendix A). S is expressed in the same unit as the dependent variable. Smaller values are generally better, as they indicate that the observations are closer to the fitted line.
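For reference, the three metrics can be computed from the observed and fitted values as sketched below. The code uses the standard textbook definitions for a model with an intercept. The function name `fit_metrics` and the numbers in the usage example are illustrative assumptions, not values from the study.

```python
import numpy as np


def fit_metrics(y, fitted, n_predictors):
    """Return (S, R-squared, adjusted R-squared) for a model with an intercept.

    S      = sqrt(SSE / (n - p - 1))             standard error of the regression
    R2     = 1 - SSE / SST
    adj R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
    where n is the sample size and p the number of predictors.
    """
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = y.size
    dof = n - n_predictors - 1          # residual degrees of freedom
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")

    sse = np.sum((y - fitted) ** 2)     # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares

    s = np.sqrt(sse / dof)
    r_squared = 1.0 - sse / sst
    adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / dof
    return float(s), float(r_squared), float(adj_r_squared)


# Made-up numbers for illustration only.
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_fit = np.array([1.1, 1.9, 3.2, 3.9, 4.9])
s, r2, adj_r2 = fit_metrics(y_obs, y_fit, n_predictors=1)
print(f"S = {s:.4f}, R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

Unlike R-squared, both S and adjusted R-squared account for the residual degrees of freedom, so adding a predictor improves them only if it reduces the error by more than the degree of freedom it costs. This matters with short yearly series.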
