This section will first present the random forest regression estimation results. This step will help us choose the most important variables to estimate the production function and the inefficiency equation. Next, we will provide the SFA estimation results under three distributions: half-normal, truncated normal, and exponential. After that, we will present the pairwise Vuong test results to identify the model that best suits the data for each crop. Finally, we will discuss the results of the best model.
4.1. Random Forest Regression Results
In this research, we employed random forest regression to identify the key explanatory variables for estimating the production and inefficiency equations. We used the RandomForestRegressor algorithm from the sklearn library in Python to estimate variable importance. Breiman [33] defines variable importance as the increase in prediction error resulting from removing a specific variable from the predictors. Before using the importance scores, we validated the model by splitting the sample into training (75%) and testing (25%) sets, computing the mean absolute percentage error (MAPE), and obtaining the model accuracy by subtracting the mean MAPE from 100. The accuracy measures ranged from 89.77% for the truncated normal distribution for barley to 93.23% for the half-normal distribution for wheat. The variable importance results of the random forest are presented in Table 3 for the production function and Table 4 for the technical inefficiency estimation under various distributional assumptions.
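The accuracy measure described above (100 minus the MAPE on the held-out sample) can be sketched as follows. The function names and the four test observations are hypothetical and serve only to illustrate the arithmetic; they are not the study's data or code.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def accuracy_from_mape(actual, predicted):
    """Accuracy as described in the text: 100 minus the MAPE."""
    return 100.0 - mape(actual, predicted)


# Hypothetical held-out outputs and random forest predictions.
y_test = [100.0, 200.0, 400.0, 500.0]
y_pred = [90.0, 210.0, 380.0, 525.0]

test_mape = mape(y_test, y_pred)
test_accuracy = accuracy_from_mape(y_test, y_pred)
print(f"MAPE = {test_mape:.2f}%, accuracy = {test_accuracy:.2f}%")
```

Note that the MAPE is undefined when an actual value is zero, so this measure presumes strictly positive output levels.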
Our findings on barley reveal that the most significant variables that explain the variation in barley production are the total utility area, quantity of fertilizer, amount of seed, rent payment, crop protection, and machines and lubricants, contributing 67.33%, 18.84%, 4.99%, 3.58%, 1.11%, and 1.04%, respectively. These six variables together account for approximately 97% of the barley production variation. Consequently, we retained these six variables to estimate the production function under diverse technical inefficiency distributional assumptions.
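The cumulative share of the retained variables can be checked directly from the reported scores. The short sketch below uses the six barley importance values quoted above; the dictionary keys and the helper name are our own labels.

```python
# Barley importance scores (percent) as reported in the text.
barley_importance = {
    "total utility area": 67.33,
    "fertilizer quantity": 18.84,
    "seed quantity": 4.99,
    "rent payment": 3.58,
    "crop protection": 1.11,
    "machines and lubricants": 1.04,
}


def cumulative_shares(importance):
    """Return (variable, running total) pairs, largest importance first."""
    running = 0.0
    result = []
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    for name, score in ranked:
        running += score
        result.append((name, round(running, 2)))
    return result


shares = cumulative_shares(barley_importance)
for name, total in shares:
    print(f"{name}: cumulative {total:.2f}%")
```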
Regarding common wheat, the random forest analysis shows that the largest contributor to the variation in output is the total utility area, which accounts for 67.39%. Other variables, such as seed quantity, crop protection, rent payment, cost of own capital, and amount of energy, explain smaller percentages of the variation (10.32%, 5.93%, 5.78%, 2.31%, and 2.28%, respectively). Due to collinearity issues, we excluded the cost of own capital and the quantity of energy and instead included machine and building upkeep expenses. With these changes, the specification explains approximately 91% of the total variation in wheat production.
For grain maize, the random forest variable importance analysis indicates that the largest contributors to the variation in output are the seed quantity, total area, expenditure on contract work, rent payment, and machine and lubricants quantity, which account for roughly 93%. However, due to multicollinearity, we removed the expenditure on family labor and added the expenditure on crop protection instead.
The random forest results in Table 4 examine the importance of various macroeconomic and agricultural policy variables as technical inefficiency factors for barley output. The analysis was conducted under three distributional assumptions. The results suggest that the percentage of agriculture value added in GDP is the most significant factor affecting the technical efficiency of barley crops, accounting for 33.09%, 30.55%, and 33.15% of the total variation for half-normal, exponential, and truncated normal distributions, respectively. Conversely, the intermediate consumption subsidies are the least important factor, contributing only 1.49% and 1.32% of the total variation under half-normal and truncated normal distributions, respectively. However, for the exponential distribution, total direct payments are the least significant factor, accounting for approximately 1.15% of the total variation in technical efficiency.
Table 4 displays the random forest regression analysis results for technical inefficiency variables of the common wheat crop using three distributions: half-normal, exponential, and truncated normal. The analysis reveals that the unemployment rate (as a percentage of the labor force) is the most significant predictor of the total technical efficiency for common wheat. It accounts for 9.55%, 12.73%, and 8.76% of the total variation in technical efficiency under the three distributions, respectively. Conversely, the imports of goods and services (as a percentage of GDP) have the least impact on the common wheat crop, accounting for only 1.05%, 1.02%, and 1.07% of the total variation in technical efficiency under the three distributions.
The study found that female employment in the industry sector is the most crucial variable in predicting the technical efficiency of grain maize. This variable accounts for 35.49%, 19.85%, and 25.77% for half-normal, exponential, and truncated normal distributions, respectively (refer to Table 4 for details). On the other hand, the least important variable varies with the distribution used. Under the half-normal distribution, it is exports of goods and services (as a proportion of GDP). Under the exponential distribution, it is the total population, which accounts for approximately 0.85% of the technical efficiency variation. Finally, under the truncated normal distribution, it is the total labor force, which accounts for around 0.94% of the total variation in the technical efficiency of grain maize. To sum up, the random forest variable importance provides a ranking of the relevant variables. However, the final specification may deviate from this ranking when some variables exhibit multicollinearity.
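One simple way to screen ranked variables for collinearity is a pairwise correlation check. The text does not state which diagnostic was used, so the sketch below is only an assumption: the 0.9 threshold, the function names, and the three input series are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.9):
    """List variable pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical input series (illustrative values only).
inputs = {
    "own_capital_cost": [1.0, 2.0, 3.0, 4.0, 5.0],
    "energy_quantity": [2.1, 3.9, 6.2, 8.0, 9.9],
    "crop_protection": [5.0, 1.0, 4.0, 2.0, 3.0],
}

flagged = collinear_pairs(inputs)
print(flagged)
```

A pair flagged this way would be a candidate for dropping one member and substituting the next variable in the importance ranking.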
4.2. Stochastic Frontier Analysis Results
Table 5, Table 6, and Table 7 show the maximum likelihood results of estimating the stochastic frontier for the three crops across the countries considered, along with the findings on the effect of macroeconomic and agricultural policy variables on technical inefficiency.
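For reference, a generic log-linear stochastic frontier with the three inefficiency distributions considered here can be written as below. This is a sketch in assumed notation (indices $i$ and $t$, inputs $x_k$, parameters $\beta$, $\sigma_v$, $\sigma_u$, $\lambda$, $\mu$) and not necessarily the exact specification estimated in this study.

```latex
\begin{align}
\ln y_{it} &= \beta_0 + \sum_{k} \beta_k \ln x_{k,it} + v_{it} - u_{it},
  \qquad v_{it} \sim N(0, \sigma_v^2), \quad u_{it} \ge 0, \\
\text{half-normal:} \quad u_{it} &\sim N^{+}(0, \sigma_u^2), \\
\text{exponential:} \quad u_{it} &\sim \operatorname{Exp}(\lambda), \\
\text{truncated normal:} \quad u_{it} &\sim N^{+}(\mu, \sigma_u^2), \\
TE_{it} &= \exp(-u_{it}).
\end{align}
```

Under this form, each $\beta_k$ is an output elasticity, and any variable that raises $u_{it}$ lowers technical efficiency.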
Table 5 displays the maximum likelihood estimation results of the stochastic frontier analysis and the impact of macroeconomic and policy variables on technical inefficiency for barley under three different distributions: half-normal, exponential, and truncated normal. Except for the quantity index of seed, machines, and lubricants, all coefficients exhibit statistical significance and align with the anticipated direction.
Assuming all other factors remain constant (ceteris paribus), the results indicate that a 1% increase in total utility area leads to a 0.835% increase in total barley production for the half-normal distribution. Similarly, a 1% increase in the fertilizer quantity index would result in a 0.207% increase in the overall barley production. Moreover, a 1% increase in rent payment and crop protection quantity index would increase the total barley production by 0.203% and 0.139%, respectively. However, a 1% rise in the seed quantity index reduces barley production by 0.122%, while a 1% increase in the quantity index of machines and lubricants will reduce the barley production by 0.153%.
Under the exponential distribution, a 1% increase in total utility area results in a 0.809% increase in overall barley production. Similarly, a 1% increase in the fertilizer quantity index results in a 0.238% increase in the overall barley production. Likewise, a 1% increase in rent payment and crop protection quantity index results in 0.177% and 0.134% increases in the total barley production, respectively. However, a 1% rise in the seed quantity index lowers the barley production by 0.099%, while a 1% increase in the quantity index of machines and lubricants results in a 0.155% reduction in overall barley production.
Finally, under the truncated normal distribution, the results show that a 1% increase in total utility area enhances the overall barley production by 0.863%. Similarly, a 1% increase in the fertilizer quantity index results in a 0.09% increase in barley production. Additionally, a 1% increase in rent payment and crop protection quantity index increases the total barley production by 0.125% and 0.091%, respectively. However, a 1% increase in the seed quantity index reduces barley production by 0.098%, while a 1% increase in the quantity index of machines and lubricants results in a 0.056% decrease in barley production.
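The percentage interpretations above treat each coefficient as an output elasticity, which holds if the frontier is specified in logarithms of output and inputs (an assumption on our part, since the functional form is not restated in this section). Under that assumption, the sketch below compares the coefficient with the exact output change implied by a 1% input increase, using the barley half-normal coefficients quoted above; the function name is hypothetical.

```python
def pct_output_change(elasticity, pct_input_change=1.0):
    """Exact percentage output change implied by a log-log coefficient."""
    growth_factor = 1.0 + pct_input_change / 100.0
    return 100.0 * (growth_factor ** elasticity - 1.0)


# Barley coefficients reported for the half-normal model.
barley_half_normal = {
    "total utility area": 0.835,
    "fertilizer": 0.207,
    "rent payment": 0.203,
    "crop protection": 0.139,
    "seed": -0.122,
    "machines and lubricants": -0.153,
}

effects = {
    name: pct_output_change(beta) for name, beta in barley_half_normal.items()
}
for name, effect in effects.items():
    print(f"{name}: {effect:+.3f}% output change per 1% more input")
```

For small changes such as 1%, the exact effect and the coefficient are nearly identical, which is why the coefficient can be read directly as a percentage response.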
Table 5 illustrates the influence of different macroeconomic and subsidy variables on the technical efficiency of barley production under three distributional assumptions. The findings indicate that certain variables significantly impact the overall technical efficiency of barley production, while others demonstrate no effect.
In the context of the half-normal distribution, factors such as male employment in the industry, industry value added as a percentage of GDP, male waged and salaried workers, inflation, and imports of goods and services exhibit no statistically significant impact on technical efficiency. Conversely, variables like female wage and salaried workers, female employment in agriculture, and the total labor force show a negative yet statistically significant effect on technical inefficiency. This suggests that these variables contribute to enhancing the technical efficiency of barley production.
By contrast, factors such as forest area, other subsidies, female employment in industry, male employment in agriculture, exports of goods and services as a percentage of GDP, urban population, and total crop subsidies demonstrate a statistically significant adverse impact on the technical efficiency of barley production.
In the context of exponential distribution, the findings indicate that female wages, female employment in agriculture, and the total population exhibit a statistically significant positive impact on the technical efficiency of barley production. Conversely, factors such as forest area, male wages, the total unemployment rate, and exports of goods and services as a percentage of GDP have a statistically significant adverse effect on the technical efficiency of barley production. Additionally, variables like foreign direct investment, other subsidies, female employment in the industry, total support for rural development, and agricultural land as a percentage of total land do not exert a significant influence on technical efficiency in barley production.
The regression analysis presented in Table 5 also examines the impact of macroeconomic and policy variables on technical efficiency under the truncated normal distribution. The results suggest that female wages, foreign direct investment, female employment in agriculture, agricultural land as a percentage of total land, and the total population demonstrate a statistically significant positive effect on the technical efficiency of barley production. Conversely, factors such as male employment in the industry, male wages, exports of goods and services as a percentage of GDP, and the total unemployment rate have a statistically significant negative impact on the technical efficiency of barley. Additionally, other subsidies, industry value added as a percentage of GDP, and total crop subsidies significantly influence the technical efficiency.
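The sign convention used when reading the inefficiency coefficients (a negative effect on inefficiency implies higher efficiency) can be illustrated numerically. The sketch assumes the standard relation TE = exp(-u); the exponential link between the determinant z and u, and the coefficient values, are hypothetical and chosen only to keep u positive.

```python
import math


def technical_efficiency(u):
    """Technical efficiency TE = exp(-u) for an inefficiency term u >= 0."""
    if u < 0:
        raise ValueError("inefficiency must be non-negative")
    return math.exp(-u)


def inefficiency(z, delta0, delta1):
    """Hypothetical link: u rises with z if delta1 > 0 and falls if delta1 < 0."""
    return math.exp(delta0 + delta1 * z)


# A determinant with a negative coefficient in the inefficiency equation.
te_low_z = technical_efficiency(inefficiency(z=0.0, delta0=-1.5, delta1=-0.4))
te_high_z = technical_efficiency(inefficiency(z=1.0, delta0=-1.5, delta1=-0.4))

print(f"TE at z=0: {te_low_z:.3f}, TE at z=1: {te_high_z:.3f}")
```

With a negative coefficient, raising z lowers u and therefore raises technical efficiency; a positive coefficient would have the opposite effect.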
Table 6 reports the stochastic frontier estimates and the influence of macroeconomic and policy variables on technical efficiency for common wheat in selected European Union member states, obtained by maximum likelihood under the three distributional assumptions. The estimation results show that, except for the quantity index of seed under the half-normal and truncated normal distributions, all inputs are statistically significant for wheat production. On a ceteris paribus basis, a 1% increase in total utility area results in a 0.617% increase in total wheat production under the half-normal distribution. Moreover, a 1% increase in the seed quantity index increases wheat production by 0.07%. Additionally, a 1% increase in the crop protection index results in a 0.522% increase in total wheat production, and a 1% increase in the machines and buildings’ upkeep yields a 0.171% increase. However, a 1% increase in the rent payment reduces the total wheat production by 5.101%.
The estimation under the exponential distribution shows that increasing the total utility area by 1% boosts the overall wheat production by 0.538%. In addition, increasing the seed quantity index by 1% increases the wheat production by 0.141%. Furthermore, increasing the crop protection quantity index by the same amount results in a 0.587% increase in total wheat production. In contrast, a 1% increase in rent payment results in a reduction in total wheat production of 5.734%. Finally, a 1% increase in the machines’ and buildings’ upkeep implies a 0.16% increase in total wheat production.
Table 6 also provides the estimation results for the wheat production frontier under a truncated normal distribution. The results, ceteris paribus, reveal that a 1% increase in total utility area increases wheat production by 0.632%. At the same time, the same increase in the seed quantity index results in a 0.037% increase in total wheat production. In addition, a 1% increase in the quantity index of crop protection yields a 0.46% increase in wheat production. However, raising the rent payment by 1% decreases the wheat production by 4.379%. The same increase in the machines and buildings’ upkeep results in a 0.167% decrease in the total wheat production.
Regarding the technical efficiency, Table 6 outlines the impact of various macroeconomic and subsidy factors on the technical efficiency of wheat production under different distributional assumptions. Several variables exhibit a statistically significant effect, while others show no discernible impact. For instance, within the half-normal distribution, total direct payments, total population, female wages, female agricultural employment, and agricultural land as a percentage of total land all exert a statistically significant positive influence on the technical efficiency of wheat production. Conversely, the total labor force, urban population, male employment in the industry, male employment in agriculture, total unemployment rate, other subsidies, and industry value added as a percentage of GDP demonstrate a statistically significant negative effect on the technical efficiency of wheat production. Furthermore, decoupled payments, agricultural value added as a percentage of GDP, forest area as a percentage of total land area, total support for rural development, and imports of goods and services for annual growth do not exhibit a significant influence on technical efficiency for wheat production under a half-normal distribution.
Furthermore, the results within the exponential distribution framework reveal that male wages, agricultural land as a percentage of total land, and annual population growth display a statistically significant positive impact on the technical efficiency of wheat production. Conversely, male employment in the industry, the overall unemployment rate, and total subsidies on livestock exhibit a statistically significant negative effect on the technical efficiency of wheat production. Additionally, the imports of goods and services, female employment in the industry, and foreign direct investment do not exert a significant influence on the technical efficiency of wheat production.
The regression outcomes presented in Table 6 further illustrate the impact of macroeconomic and policy variables on technical efficiency under the truncated normal distribution. The results reveal that total direct payments, total population, male wages, the forest area as a percentage of total land area, and agricultural land as a percentage of total land all exhibit a statistically significant positive effect on the technical efficiency of wheat production. Conversely, decoupled payments, the total labor force, urban population, male employment in the industry, male employment in agriculture, the total unemployment rate, other subsidies, and total subsidies on livestock demonstrate a statistically significant negative influence on the technical efficiency of wheat production. Furthermore, industry value added as a percentage of GDP, female employment, and foreign direct investment do not show a significant influence on the technical efficiency of wheat production.
Regarding the grain maize results, Table 7 provides the results of the SFA and inefficiency estimates. All inputs in grain maize production are statistically significant except for contract work and the quantity index of machines and lubricants under the half-normal and truncated normal distributions, and the quantity index of machines and lubricants under the exponential distribution. Under the half-normal distribution, a 1% increase in the quantity index of seed results in a 0.062% increase in total grain maize production. At the same time, a 1% increase in total utility area results in a 0.888% increase in the total grain maize production. Furthermore, a 1% increase in contract work and rent payment increases grain maize production by 0.047% and 10.796%, respectively. The same increase in the quantity index of machines and lubricants increases the grain maize production by 0.019%. However, a 1% increase in the crop protection quantity index reduces the output of grain maize by 0.896%.
Additionally, under the exponential distribution, the regression findings indicate that increasing the quantity index of seed by 1% increases the output of grain maize by 0.077%. Moreover, increasing the total utility area by 1% increases the grain maize production by 0.893%. Furthermore, increasing contract work, rent payment, and the quantity index of machines and lubricants by 1% improves grain maize production by 0.075%, 9.416%, and 0.01%, respectively. In comparison, a 1% increase in the quantity index of crop protection leads to a 0.82% decrease in grain maize production.
For the regression under a truncated normal distribution, the results indicate that a 1% increase in the quantity index of seed enhances the grain maize production by 0.05%. Similarly, a 1% increase in total utility area enhances it by 0.917%. Furthermore, increasing the contract work by 1% increases the grain maize output by 0.008%. In addition, a 1% increase in rent payment and in the quantity index of machines and lubricants increases overall grain maize production by 8.889% and 0.035%, respectively. Finally, a 1% increase in the crop protection quantity index reduces the grain maize production by 0.741%.
Concerning the estimation of the inefficiency equation, Table 7 delineates the impact of various macroeconomic and subsidy variables on the technical efficiency of grain maize production under different distributional assumptions. Most variables exhibit a statistically significant effect, while others show no discernible impact. For instance, within a half-normal distribution, male employment in the industry, male employment in agriculture, total support for rural development, and male wages all demonstrate a statistically significant positive effect on the technical efficiency of grain maize production. However, female employment in agriculture, female wages, agricultural land as a percentage of total land, and GDP all reveal a statistically significant negative effect on the technical efficiency of grain maize production. Furthermore, female employment in the industry, total direct payments, total subsidies on crops, other subsidies, decoupled payments, the total unemployment rate, and inflation do not exhibit a significant effect on the technical efficiency of grain maize production.
In the context of exponential distribution, male wages, female employment in the industry, male industry employment, and male employment in agriculture all exhibit a statistically significant positive impact on technical efficiency. Furthermore, employment in agriculture and female wages demonstrate a statistically significant negative influence on the technical efficiency of grain maize production. Additionally, the total rural development support, male wages, GDP annual growth, inflation, agricultural value added as a percentage of GDP, exports of goods and services, annual growth, and foreign direct investment do not have a significant effect on the technical efficiency of grain maize production.
Under truncated normal distribution, the results reveal that male employment in agriculture, male wages, female employment in the industry, and male employment in the industry significantly positively affect the overall technical efficiency of grain maize production. However, female employment in agriculture, agricultural land as a percentage of total land, GDP annual growth, and female wages significantly negatively impact the technical efficiency of grain maize production. Moreover, total support for rural development, total direct payment, total subsidies on crops, other subsidies, decoupled payments, and total subsidies on livestock do not have a significant effect on the technical efficiency of grain maize production.
4.4. Efficiency Results
When averaging across all countries and years, the estimates for barley’s technical efficiency scores range from 0.803 under the half-normal distribution to 0.835 under the exponential distribution, with a mean score of 0.812 (refer to Table 9). For wheat, the figures are slightly higher, with average technical efficiency scores of 0.855 under the half-normal distribution, 0.843 under the exponential distribution, and 0.825 under the truncated normal distribution. However, maize exhibits the lowest technical efficiency scores, with values of 0.716, 0.774, and 0.755 under half-normal, exponential, and truncated normal distributions, respectively.
Table 10 presents the distribution of the technical efficiency estimates for barley, wheat, and maize across the countries considered in this study under the half-normal distribution. For barley, the highest efficiency level is in the United Kingdom, while Finland has the lowest technical efficiency. Specifically, the average level of technical efficiency estimates for the United Kingdom is 0.98, with a minimum of 0.97, a maximum of 0.99, and a standard deviation of 0.077. For Finland, the average is 0.65, with a minimum of 0.58, a maximum of 0.74, and a standard deviation of 0.050. It is surprising that other countries, such as Estonia, Lithuania, Romania, and Spain, have technical efficiencies lower than 0.70.
For wheat production, France has the highest technical efficiency, while Greece has the lowest. Using a half-normal distribution, France’s average level of technical efficiency is 0.993, with a minimum of 0.981 and a maximum of 1. Within the same distribution, however, the average for Greece is 0.56, with a range of 0.40–0.65 and a standard deviation of 0.086. The United Kingdom has an average efficiency of 0.985, with a minimum of 0.971, a maximum of 0.994, and a standard deviation of 0.008.
Regarding grain maize, France has the highest technical efficiency, while Romania has the lowest. France’s average level of technical efficiency, using a half-normal distribution, is 0.994, with a minimum of 0.986 and a maximum of 1. The average for Romania is 0.48, ranging from 0.244 to 0.850. Bulgaria is another country with low technical efficiency in producing grain maize. Over the years, its TE averages 0.598, with a minimum of 0.397 and a maximum of 0.819. Similarly, maize production in Poland and Slovakia exhibits a TE lower than 0.70.
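Country-level summaries of this kind (mean, minimum, maximum, and standard deviation of the yearly scores) can be sketched as follows. The records are hypothetical placeholders, not the study's estimates, and the helper name is our own.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (country, year, TE score) records; not the study's data.
records = [
    ("Country A", 2010, 0.97), ("Country A", 2011, 0.98), ("Country A", 2012, 0.99),
    ("Country B", 2010, 0.58), ("Country B", 2011, 0.63), ("Country B", 2012, 0.74),
]


def summarize_by_country(rows):
    """Group yearly TE scores by country and compute summary statistics."""
    by_country = defaultdict(list)
    for country, _year, score in rows:
        by_country[country].append(score)
    return {
        country: {
            "mean": mean(scores),
            "min": min(scores),
            "max": max(scores),
            "sd": stdev(scores),  # sample standard deviation
        }
        for country, scores in by_country.items()
    }


summary = summarize_by_country(records)
best = max(summary, key=lambda c: summary[c]["mean"])
worst = min(summary, key=lambda c: summary[c]["mean"])
print(best, worst)
```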
However, the results tell a different story when examining the variation across the countries and years, as summarized in Figure 4, Figure 5, and Figure 6. The advantage of boxplots over summary statistics is that the former provide more information on the distribution and variability of the variable under consideration. For example, the boxplot provides information on the minimum, the maximum, the median, the first quartile, and the third quartile. It also provides information on the presence of outliers, the symmetry and skewness of the distribution, and whether the data are tightly grouped or not.
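The quantities that a boxplot displays can be computed directly, as in the sketch below. It assumes the common convention of flagging points beyond 1.5 times the interquartile range as outliers; the yearly scores are hypothetical and the function name is our own.

```python
from statistics import quantiles


def boxplot_summary(scores):
    """Five-number summary, whisker ends, and outliers (1.5 * IQR rule)."""
    data = sorted(scores)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Hypothetical yearly TE scores for one country.
yearly_scores = [0.58, 0.62, 0.64, 0.66, 0.68, 0.70, 0.95]
box = boxplot_summary(yearly_scores)
print(box)
```

A short box (small interquartile range) indicates stable scores over the years, while a tall box or distant outliers indicate high year-to-year variability.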
Figure 4 gives the boxplot of barley’s technical efficiency score estimates under the half-normal distribution across the countries considered in the study. In the case of this study, the boxplot graph for each country provides the distribution of the technical efficiency score estimates over the years.
Examining the boxplot above, we notice a significant variability in the technical efficiency estimates. For example, the U.K. has the shortest boxplot, implying that there has not been a lot of variation in technical efficiency scores over the years. On the other hand, countries such as Poland, Romania, Spain, and Sweden have comparatively taller boxplots, indicating the higher variation of the technical efficiency estimates over the years. Another striking result is the inequality of the median values of the technical efficiency score estimates. Western European countries, such as the U.K., Germany, Ireland, and France, show high median values compared to Eastern European countries, such as Estonia, Lithuania, Romania, and Poland. The technical efficiency estimates are lower for Spain than other Western European countries, with scores as low as 0.491 compared to 0.972 in the U.K., 0.835 in Germany, or 0.820 in France.
Figure 5 provides the boxplot for technical efficiency for wheat across producing countries in Europe. We observe that countries such as the U.K., Germany, France, and Italy show little variation in technical efficiency scores over the years. In contrast, countries such as Spain, Greece, Estonia, Croatia, Lithuania, Latvia, Romania, and Slovakia exhibit high variability in technical efficiency scores over the years. It is also striking to observe that Spain and Greece, which have been part of the European Union since the 1980s, produce common wheat at a lower technical efficiency than countries such as Lithuania, Latvia, Croatia, and Romania, which joined the union more recently.
Regarding the variability of the technical efficiency scores for grain maize, Figure 6 displays their boxplots under the half-normal distribution. We notice that countries such as Germany, France, Spain, Austria, and Greece have scores oscillating between 0.9 and 1 over the years, with little variance. In comparison, Romania’s technical efficiency scores vary considerably over the years, between 0.24 and 0.65.