*3.4. Potato Harvest Simulation*

The simulation of the potato harvest shows that the harvest increased until 2018 and decreased thereafter. The cultivation area had a significant influence on the potato harvest. The following equation was adopted in the forecast model of potato area and harvest:

$$\mathbf{y} = 9493\mathbf{x}^6 + 556.45\mathbf{x}^5 - 11.925\mathbf{x}^4 + 115.889\mathbf{x}^3 - 51.300\mathbf{x}^2 + 993.431\mathbf{x} + 0.7 \times 10^{-8} \tag{6}$$

The coefficient of determination of this equation was *R*² = 84.07%, which indicates that the model explains most of the variability in the data and can be considered reliable [32].
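As a minimal sketch of how Equation (6) and its coefficient of determination could be evaluated, the Python fragment below uses Horner's scheme for the polynomial and the usual definition *R*² = 1 − SS_res/SS_tot. The coefficients are copied as printed in Equation (6); the meaning and scaling of x are not stated in this section, so the function names `eval_poly` and `r_squared` and any inputs passed to them are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence

# Coefficients of Equation (6), highest power first, copied as printed.
EQ6_COEFFS = [9493.0, 556.45, -11.925, 115.889, -51.300, 993.431, 0.7e-8]


def eval_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial with Horner's scheme (coeffs: highest power first)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Given observed harvests and the corresponding x values (hypothetical inputs here), `r_squared(observed, [eval_poly(EQ6_COEFFS, x) for x in xs])` would return the fraction of variance explained; multiplying by 100 gives the percentage form used in the text.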

Forecasting crop yields can be used to plan the sowing structure, both on a microscale, i.e., a farm, and on a macroscale, e.g., a country. On this basis, it is also possible to estimate the profitability of growing a given plant. For potato cultivation, forecasting starch yields would be even more important, as starch production is regulated by law in the EU. Exceeding the permitted level reduces the profit of growers and starch plants. Therefore, the use of modern prognostic techniques can bring measurable financial benefits and improve the profitability of growing a given species.

Table 2 shows certain regularities in the analyzed yield characteristics and meteorological data. The yield is characterized by high maximum and minimum values. A high maximum and a relatively low minimum were also observed for rainfall in the April–September period. All meteorological data, with the exception of the April–September air temperature, showed a skewness coefficient lower than one. This coefficient takes negative values for distributions with left-hand asymmetry.

Kurtosis for most of the variables was positive, in a wide range from 0.11 to 10.56, but with a distribution close to normal, which means a more frequent occurrence of extreme values but at the same time a greater probability of the expected values. For tuber yield and air temperature in the April–May period, the kurtosis ranged from −0.99 to −1.33, which indicates lighter tails, i.e., fewer extreme values, than in the normal distribution. The results in this case are less concentrated around the central value (Table 2).

The standard deviation of the examined variables showed relatively little differentiation throughout the year. The highest standard deviations were recorded for the tuber yield and the total rainfall in the April–September period, and the lowest for the April–September hydrothermal coefficient and the air temperature in August–September.

The dispersion of the results was characterized by the coefficient of variation, which, being the quotient of the absolute measure of trait variability (the standard deviation) and the mean, makes it possible to compare several populations in terms of the same feature, or the same data set in terms of several features. The smaller the coefficient of variation, the more stable the feature. The highest variability was found for the sum of precipitation and the Sielianinov hydrothermal coefficient for the April–September period, while the air temperature in June–July and August–September turned out to be the most stable (Table 2).
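The descriptive measures discussed above can be sketched in a few lines of Python. This is an illustration only: it uses the simple moment-based estimators of skewness and excess kurtosis (statistical packages often apply bias corrections, so their values may differ slightly), the sample standard deviation, and the coefficient of variation as standard deviation divided by the mean. The function name `describe` and the returned keys are hypothetical, and the formulas are not confirmed to be those behind Table 2.

```python
import math
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Moment-based descriptive statistics for one variable (needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2 * n / (n - 1))  # sample standard deviation
    return {
        "mean": mean,
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,      # < 0: left-hand asymmetry
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis, 0 for a normal distribution
        "cv_percent": 100.0 * sd / mean,  # coefficient of variation in %
    }
```

Because the coefficient of variation divides by the mean, it is only meaningful for variables measured on a ratio scale with a mean clearly different from zero.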

The regression analysis was based on Pearson's simple correlation coefficients for the investigated dependent and independent variables (Table 3).


**Table 3.** Correlation coefficients of dependent (y) and independent (x) variables.

Y—total yield; X1—air temperature in April–May; X2—air temperature in June–July; X3—air temperature in August–September; X4—rainfall in April–September; X5—hydrothermal coefficient for April–September; \* significant at *p* ≤ 0.05; \*\* significant at *p* ≤ 0.01.
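For illustration, Pearson's correlation coefficient and the asterisk marking used in Table 3 can be sketched as follows. The significance test shown is the standard one for a correlation coefficient, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom; whether the authors used exactly this procedure is an assumption. The two-tailed critical values must be supplied by the caller (for example from a Student's t table), and all function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's simple correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: no correlation (n - 2 degrees of freedom, |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def significance_mark(t: float, t_crit_05: float, t_crit_01: float) -> str:
    """Return '**', '*' or '' by comparing |t| with caller-supplied critical values."""
    if abs(t) >= t_crit_01:
        return "**"
    if abs(t) >= t_crit_05:
        return "*"
    return ""
```

For a toy sample of five pairs (3 degrees of freedom), the two-tailed critical values are 3.182 at *p* = 0.05 and 5.841 at *p* = 0.01, so a correlation with |t| ≈ 2.12 would receive no asterisk.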

The variables that were most strongly correlated with each other were analyzed using multiple, polynomial, linear and partially nonlinear regression methods, which made it possible to determine the influence of many independent features on one selected dependent feature and to build an appropriate regression model. Multiple regression was preceded by an analysis of the coefficient of determination *R*² for the examined features and by the determination of the probability for the absolute value of the t statistic, verified at two significance levels, *p* ≤ 0.05 (statistically significant difference) and *p* ≤ 0.01 (highly statistically significant difference) [19].
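The multiple regression step described above can be sketched, under simplifying assumptions, as an ordinary least squares fit of one dependent variable on several independent variables, followed by the coefficient of determination. The fragment solves the normal equations with Gaussian elimination using only the standard library; it is a didactic sketch with hypothetical function names, not the authors' software, and it does not cover the polynomial or partially nonlinear variants.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimizing the squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: Sequence[float], row: Sequence[float]) -> float:
    """Fitted value for one observation."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


def r_squared(y: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

In this sketch each row would hold the values of the independent variables (for example X1 to X5) for one year and `y` the corresponding yields. Strongly collinear predictors make the normal equations ill-conditioned, in which case a QR-based solver from a numerical library would be the safer choice.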
