Developing a Sustainable Machine Learning Model to Predict Crop Yield in the Gulf Countries

Assous, Hamzeh F.; AL-Najjar, Hazem; Al-Rousan, Nadia; AL-Najjar, Dania

doi:10.3390/su15129392


¹ Finance Department, School of Business, King Faisal University, Al Ahsa 31982, Saudi Arabia

² Department of Computer, Abdul Aziz Al Ghurair School of Advanced Computing (ASAC), Luminus Technical University College, Amman 11732, Jordan

³ MIS Department, Faculty of Business, Sohar University, Sohar 311, Oman

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(12), 9392; https://doi.org/10.3390/su15129392

Submission received: 10 May 2023 / Revised: 7 June 2023 / Accepted: 8 June 2023 / Published: 11 June 2023

(This article belongs to the Section Sustainable Food)


Abstract

Crop yield prediction is one of the most challenging tasks in agriculture. It is considered to play an important role and be an essential step in decision-making processes. The goal of crop prediction is to establish food availability for the coming years, using different input variables associated with the crop yield domain. This paper aims to predict the yield of five of the Gulf countries’ crops: wheat, dates, watermelon, potatoes, and maize (corn). Five independent variables were used to develop a prediction model, namely year, rainfall, pesticide, temperature changes, and nitrogen (N) fertilizer; all these variables are calculated by year. Moreover, this research relied on one of the most widely used machine learning models in the field of crop yield prediction, which is the neural network model. The neural network model is used because it can predict complex relationships between independent and dependent variables. To evaluate the performance of the prediction models, different statistical evaluation metrics are adopted, including mean square error (MSE), root-mean-square error (RMSE), mean bias error (MBE), Pearson’s correlation coefficient, and the determination coefficient. The results showed that all Gulf countries are affected mainly by four independent variables: year, temperature changes, pesticides, and nitrogen (N) per year. Moreover, the average of the best crop yield prediction results for the Gulf countries showed that the RMSE and R² are 0.114 and 0.93, respectively. This provides initial evidence regarding the capability of the neural network model in crop yield prediction.

Keywords:

crop yield prediction; food security; neural network; gulf countries; Pearson’s correlation

1. Introduction

The recent global COVID-19 pandemic has raised awareness around the topic of food sustainability on both the country level and worldwide. As the global population increases, it is vital to adopt farming and planting practices that ensure a sustainable food supply for the whole world. Food sustainability is important for determining the ability to produce enough food for the world’s current population, as well as for future generations. In addition, food sustainability pertains to the production of food in a manner that safeguards the environment, optimizes the utilization of natural resources, enables farmers to sustain themselves, and improves the well-being of the communities involved in food production, which encompasses both humans and animals. Accordingly, the main concern of food sustainability is to prevent world hunger, and the best way to achieve this is by increasing crop yields across the globe. Accordingly, researchers have increased their focus on analyzing crop productivity and crop yield to understand the effect of direct and indirect variables on crop yield prediction.

Gulf countries have become an area where many scholars investigate different financial factors [1,2,3]. Along with other countries around the world, Gulf countries are increasingly concerned with food sustainability and crop yield prediction. Gulf region countries face serious agricultural challenges, which make it difficult to reach the level of food sustainability needed. The primary reason is the climate of the Gulf. The climate in most of the Gulf region’s countries, such as Saudi Arabia, Qatar, and the UAE, is characterized as a desert climate. This comprises a semi-arid climate with extremely long summers, intense dry heat, and high temperatures. Additionally, Kuwait has a hyper-arid desert climate that is highly variable with recurrent extremes. Bahrain, by contrast, is an arid country with mild, pleasant winters and summers that are very hot and humid. Finally, Oman experiences a subtropical arid climate, characterized by scorching hot and dusty winds, as well as summer monsoons. In addition, water deficiency is the main constraint for the region, as irrigation in the Gulf Cooperation Council (GCC) region suffers from a lack of water supply and rainfall. Moreover, the problem of water deficiency is worsening, with a sharp decline in groundwater levels and no lakes or rivers to compensate. Additionally, the region suffers from a limited amount of arable land with almost no grassland. Consequently, most of the GCC countries have mitigated the effect of this lack of water and arable land by purchasing agricultural land in different countries (e.g., the USA and Argentina). The GCC countries are well known for high levels of exports of crops such as dates, watermelon, and potatoes. All GCC countries except Bahrain export corn, and all GCC countries except Bahrain and the UAE export wheat.

The concept of food sustainability has been extensively researched, and in recent years, it has gained more attention in the public consciousness. Many researchers have tried to develop different models for crop yield prediction in different countries using variables that have a direct or indirect relationship with crop yield. Researchers have attempted to improve the prediction models by enhancing the performance of the predictors, for instance through machine learning optimization [4] and the Internet of Things [5]. Additionally, many scholars have attempted to enhance prediction models through remote sensing [6,7,8]. Other scholars are interested in improving the performance of crop yield prediction by selecting the most important features, partly through prediction model selection. Researchers have attempted to use various methodologies to confirm the capability of these selected features [9,10,11,12].

Moreover, researchers have focused on developing prediction models using various variables, prediction models, and selection techniques. Many authors have focused on European, American, and some Asian countries using meteorological variables [13,14]. It is very rare for research to discuss Gulf countries and the effect of climate change and meteorological variables on their food security and sustainability [15,16,17,18,19,20]. Moreover, many researchers in the field of crop yield prediction have concentrated on developing a prediction system to help decision making and to benefit from the huge amount of information generated from various data sources. Researchers have focused on extracting different variables directly or indirectly related to crop productivity to help managers, researchers, and governments understand the relationship between the selected variables and crop yield, as well as to develop an accurate prediction system for predicting the highest-yielding products of a country.

With this in mind, we reviewed different articles and found knowledge gaps in previous works. Our study makes the following contributions to the literature. First, our study explores the relationship between crop yields (i.e., wheat, dates, watermelon, potatoes, and maize (corn)) and selected independent variables (i.e., rainfall, temperature changes, year, pesticides, and nitrogen (N)) in Gulf countries by using Pearson’s correlation analysis. Second, we develop prediction models for crop yields in all Gulf countries using neural network models. Third, this study extracts the most important variables affecting crop yield and prediction in Gulf countries. Fourth, we summarize the most significant relevant literature that has examined the impact of the pandemic on various equity markets.

2. Literature Review

The prediction of food sustainability and crop yield is not a new field of study; however, a few researchers have tried to develop different models for crop yield prediction in different countries. Generally, researchers concentrate on how to enhance the planting of different crops. Khaki et al. (2021) [14] applied the convolutional neural network model (CNN), for instance. Their dataset contained 1132 datapoints for corn and 1076 datapoints for soybean from different places in the United States. Their findings demonstrated that the prediction model’s performance is very high, and the system can predict the productivity of soybeans and corn from one to four months before the harvest.

Additionally, Rashid et al. (2021) [21] investigated palm oil yield. They analyzed the main features, prediction methods, and methodologies used in palm oil yield. Moreover, their study showed the current and future improvements in the remote sensing of the crop yield area and the major disease recognition in the crop yield. However, to predict the crop yield of different crops (i.e., sunflower, wheat, sugar beet, potatoes, and spring barley), Paudel et al. (2021) [22] developed models that applied both machine learning and agronomic principles of crops. The dataset contains the five crops as well as weather, remote sensing, and soil data. The developed model proved its capability of predicting large-scale crop yields in the Netherlands, Germany, and France.

Many scholars have used machine learning in developing crop yield forecasting models. Yamaç and Todorovic (2020) [23] evaluated the daily potato evapotranspiration prediction system in southern Italy using three machine learning models, including k-nearest neighbor, artificial neural networks, and adaptive boosting models. These machine learning models were used with meteorological variables to enhance the accuracy of the prediction. The findings revealed that when limited meteorological data are available, the k-nearest neighbor model performed better than other models. If sufficient meteorological data were available, the neural network model performed better than other models. Moreover, Burhan (2022) [24] developed multiple machine learning models to predict crop yields using meteorological parameters and pesticides. Their dataset comprised nine major crops in Turkey in the period from 1990 to 2019. After training the machine learning models, the findings demonstrated that decision tree and random forest regression were the most accurate models. Moreover, the support vector regression showed the worst performance and inconsistent predictions.

Other researchers applied deep learning techniques to build crop yield models. Oikonomidis et al. (2022) [25] developed deep learning models to assess the performance of different algorithms in predicting crop yield. The crop yield dataset represents soybean information with 395 features and 25,345 samples. The results showed that convolutional neural network (CNN) and deep neural network (DNN) models outperformed other models in the field. The RMSE, MSE, MAE, and R² results of CNN–DNN were 0.266, 0.071, 0.199, and 0.87, respectively. The results revealed that the combination of neural network models can improve the performance of soybean prediction. Jin et al. (2019) [26] found that combining the leaf area index with 15 vegetative indices made the results of the deep neural network more accurate. Moreover, Khaki and Wang (2019) [13] proposed a deep neural network (DNN) approach to predict the maize yield in different locations. The proposed model produced greater accuracy with a root-mean-square error (RMSE) of 12% of the average yield and 50% of the standard deviation. The model outperformed other approaches in the field due to the ability of the feature selection to minimize input space dimension without degrading the accuracy of the model. The results revealed that environmental factors showed better performance in crop prediction than genotypes.

Furthermore, Srivastava et al. (2022) [27] analyzed the performance of machine learning and deep learning models and compared the two techniques to develop a winter wheat prediction model. To increase the prediction models’ performance, various factors were used (i.e., weather, soil, crop genotype, and management practices). The dataset comprised data from 1999 to 2019 for 271 counties in Germany. The findings revealed that the convolutional neural network outperformed all the tested models. The results revealed that, in predicting the winter wheat yield, the wind speed and the radiation amount were the most important features.

Both Nevavuori et al. (2019) [28] and Haque et al. (2020) [29] used neural networks to build crop yield prediction models. Nevavuori et al. (2019) [28] applied CNNs to build a prediction model for crop yield. The study used the Adadelta training algorithm with six convolutional layers to improve the capability of the prediction model. The outcomes showed that the CNN network was able to predict crop yield. Haque et al. (2020) [29] used a neural network model for 140 datapoints to build a prediction model for crop yields. The performance analysis of the mean square error and standard deviation showed high accuracy compared to other models in the field.

Bhullar et al. (2023) [30] presented a data-driven multi-layer perceptron that predicted the land suitability of several crops in Canada. The crop yield dataset included several types of crops, such as barley, peas, spring wheat, canola, oats, and soy, collected between 2013 and 2020. To develop a fast prediction model and to focus on areas where crops are cultivated and harvested, a multi-layer perceptron was applied to crop yields that were downscaled to the farm level. Through k-fold cross-validation, the multi-crop model showed significant improvements, reducing the mean absolute error by up to 2.82 times compared to single-crop models. The suggested multi-crop model has the potential to assist in evaluating the suitability of northern lands for agriculture and provide valuable insights for cost–benefit analyses.

Ed-Daoudi et al. (2023) [31] investigated the performance of machine learning models in crop yield prediction in Morocco using various factors, including weather patterns, soil moisture levels, and rainfall. The developed prediction models were compared with the most important statistical models in the field to understand the effectiveness of machine learning models. The collected results proved that machine learning models were able to improve the performance of crop prediction in Morocco, which led to enhanced food security for farmers.

Ps and Bhargavi (2019) [32] investigated the relationship between feature selection and machine learning models and their ability to improve crop yield prediction. The authors used artificial neural networks, support vector regression, k-nearest neighbor, and random forest (RF) as prediction models, with 70% of the dataset used for training and 30% for testing. The results showed that random forest had the highest accuracy compared with the other models.

Krithika et al. (2022) [33] developed a prediction model to predict and analyze groundnut crop yield variables by using various factors, including irrigation, rainfall, area, and production data. The authors applied a feature selection model to select the variables most connected with the groundnut crop yield. To validate the capability of the model, the study adopted various performance metrics. The results showed that the least absolute shrinkage and selection operator (LASSO) and ElasticNet achieved the best results compared with the other models in the field.

Only a few researchers have investigated crop yields in any of the Gulf countries. However, Blooshi et al. (2020) [19] examined how climate change affects different agricultural indicators in the United Arab Emirates. Examples of these indicators are groundwater features and the number of livestock. The outcomes showed that farmers believe that managing a farm is easier with higher profits today than in the past. Additionally, the levels and quality of groundwater are now different, while the quality and the quantity of the products have increased compared to 20 years ago. Al-Adhaileh and Aldhyani (2022) [20] explored the effects of agriculture characteristics on crop yield for the period 1994 to 2016 using an artificial intelligence model. The characteristics include environmental and agrotechnical indicators (i.e., temperature, insecticides, and rainfall). The crop yields included potatoes, rice, sorghum, and wheat in Saudi Arabia. The findings revealed that each indicator has an equal influence on the yield prediction model.

A few scholars have investigated the agricultural situation and the ways that it can be used to enhance the productivity of conventional freshwater agriculture in the Gulf region. Researchers have studied the limited water sources (i.e., produced water, groundwater, and seawater) and how farmers can benefit from them when planting different crops [15,16,17,18,30,31,32,34,35,36]. In addition, the majority of the models proved that using a neural network model is more efficient in crop yield prediction. Therefore, the authors suggested using a neural network model with fewer hidden layers to prove the capability of the neural network in crop yield prediction.

Finally, all previous models aimed to investigate the performance of crop yield prediction using various independent variables and by selecting the highest-yielding crops in the country. In this paper, we aimed to investigate the performance of selected crop productivity components in the Gulf countries by using a neural network model with fewer input variables. All the Gulf countries will be analyzed using a neural network model with two layers to decrease the complexity of the proposed model.

3. Research Methodology

The research methodology of the proposed model is divided into three steps: data collection, correlation analysis, and neural network prediction. For data collection, various web sources are used to collect the dataset. After cleaning and removing the outlier data, a correlation analysis is applied. Finally, the independent variables for each crop yield are forwarded to the neural network model to develop a prediction model.

3.1. Data Collection

Various sources are used to collect the dataset for crop yield prediction, including the Food and Agriculture Organization (FAO) and our global data. The data collected include the crop (i.e., dates, wheat, maize (corn), watermelon, and potatoes), year, country, temperature changes per year, total pesticides per year, rainfall per year, and nitrogen (N) fertilizer used per year. The dataset was collected using the yearly information dataset, including average and total values per year. Data collection began by merging the independent and dependent variables on a year-value basis. Thereafter, the analysis was carried out to remove the dataset’s outlier data and missing values. The dataset covered the years 2000 to 2021 with the yields of five crops: dates, wheat, maize (corn), watermelons, and potatoes. All the Gulf countries were covered in order to generalize the study, namely Bahrain, Kuwait, Oman, Qatar, Saudi Arabia, and the United Arab Emirates. To sum up, the proposed models’ parameters are listed in Table 1. The research included the most important and common crops that Gulf countries usually export to other countries. Moreover, the independent variables comprised the average temperature changes for two consecutive days throughout the year, the total amount of pesticides used per year for each studied crop, the amount of rainfall per year, and the total amount of nitrogen (N) fertilizer per year. Finally, the year variable was included after pre-testing the relationship between crop yields and the year variable; the analysis found a strong relationship between year and different crops.

After collecting the dataset, processing techniques were used to clean it. These techniques include removing the missing data, outlier data, and wrong data that were not suitable for the research (e.g., removing variables with a high proportion of missing values). After the cleaning process, the generated data were analyzed and resampled to build a prediction model. Using all the collected data to fit the prediction model is not appropriate, because no data would remain for evaluating it. The generated data were therefore split randomly into two parts, with 70% of the dataset used for training and 30% for testing.
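The cleaning and random 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the field names (`year`, `temp_change`, `yield`), the fixed random seed, and the made-up values are all assumptions, and only the missing-value step of the cleaning is shown.

```python
import random


def clean_records(records, required_fields):
    """Keep only the rows in which every required field has a value."""
    return [
        row for row in records
        if all(row.get(field) is not None for field in required_fields)
    ]


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical yearly records for one country and crop (values are made up).
records = [
    {"year": 2000 + k, "temp_change": 0.5 + 0.05 * k, "yield": 100.0 + 3.0 * k}
    for k in range(22)
]
records[4]["yield"] = None        # simulate a missing crop yield
records[9]["temp_change"] = None  # simulate a missing predictor

cleaned = clean_records(records, ["year", "temp_change", "yield"])
train, test = train_test_split(cleaned, train_fraction=0.7)
print(len(cleaned), len(train), len(test))
```

With only about twenty yearly observations per crop and country, the test part is small, so the reported error metrics can vary noticeably from one random split to another.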

As the second step in developing a prediction model for the proposed crop yields, Pearson’s correlation analysis was used to study the linearity in the included variables. Moreover, the full analysis of all the collected data is shown in the Supplementary Materials.

3.2. Correlation Analysis for Gulf Countries

Pearson’s correlation analysis was used to understand the relationship between independent and dependent variables for each country. This test aims to identify the possibility of finding a linear relationship between independent and dependent variables. This test’s results can help researchers understand the trend in crop yields and the expected growth for the future. To find the Pearson’s correlation, the following equation was used:

\text{Pearson's correlation} = \frac{N \sum x y - (\sum x) (\sum y)}{\sqrt{[N \sum x^{2} - (\sum x)^{2}] [N \sum y^{2} - (\sum y)^{2}]}}

(1)

N, x, and y are the number of samples, the original values of the crop yields, and the predicted values, respectively.
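Equation (1) can be evaluated directly from the five sums it contains. The sketch below follows the formula term by term; the function name and the sample values are illustrative assumptions, not data from the study.

```python
import math


def pearson_correlation(x, y):
    """Pearson's correlation computed with the sums used in Equation (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x <= 0 or spread_y <= 0:
        # A constant series has no spread, so the coefficient is undefined.
        raise ValueError("correlation is undefined for a constant series")
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(spread_x * spread_y)


# Made-up example: a yield series that rises steadily with the year.
years = [2000, 2001, 2002, 2003, 2004]
yields = [2.0, 2.5, 3.1, 3.4, 4.0]
print(round(pearson_correlation(years, yields), 3))
```

The guard for a constant series matters in practice: a variable that does not change over the study period cannot be correlated with anything, which is one reason such a variable may drop out of a correlation table.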

3.3. The Methodology of Developing Neural Network Models for Crop Yields Prediction

Pearson’s correlation analysis captures the relationship between independent and dependent variables in order to understand the linear trend in the collected data. Unfortunately, Pearson’s correlation cannot estimate the dependent variable by combining the independent variables. Therefore, a machine learning model to predict crop yield using independent variables was considered. This research adopted a neural network model to develop prediction models, as shown in AL-Rousan et al. (2021) [37] and AL-Najjar et al. (2022) [11], due to the ability of the neural network architecture to learn nonlinear and complex relationships between independent and dependent variables.

The neural network architecture contains three layers: input, hidden, and output. The input layer is responsible for receiving the independent variables (e.g., the temperature change, year, and rainfall) of the model, whereas the output layer provides the dependent variable of the model (i.e., the crop yield value). The hidden layer combines the independent variables using node weights; thereafter, the activation function is applied in the hidden and output layers to generate a nonlinear function. Finding the optimal weights is known as the training phase, and predicting the future data is denoted as the test phase. Training and testing are required to develop a prediction model for the different crop yields. To develop a prediction model, the collected data were divided into train and test datasets with 70% and 30% of the data, respectively. The architecture of the developed neural network model is shown in Figure 1.

The predicted crop yield values are denoted as the output layer. The prediction output of the k-th node in the output layer is calculated as follows:

\text{Crop yield} = y_{k} = f \left( \sum_{j = 1}^{J} O_{j} w_{j k} + b_{k} \right)

(2)

where O_{j}, w_{j k}, and b_{k} are the output of the j-th node in the last hidden layer, the weight between the j-th node and the k-th node, and the bias of the k-th node, respectively; f is the activation function, and J is the number of nodes in the last hidden layer. The outputs of the hidden layers are calculated as discussed in [37]: they are computed in the same way as Equation (2), adjusted for the number of neurons in each hidden layer and the activation function used for the hidden nodes. To build a neural network model, the architectural parameters of the neural network model must be determined. The tuned neural network parameters are shown in Table 2.
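To make Equation (2) concrete, the following sketch applies it layer by layer in a small feed-forward network. Everything specific here is an assumption for illustration: the sigmoid activation, the layer sizes (5 inputs, hidden layers of 4 and 3 nodes, 1 output), and the random untrained weights are not the tuned values of Table 2.

```python
import math
import random


def sigmoid(z):
    """Assumed activation function f; the paper's choice is given in Table 2."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """Equation (2) for every node k: f(sum_j O_j * w_jk + b_k).

    weights[j][k] is the weight from node j of the previous layer to node k.
    """
    outputs = []
    for k in range(len(biases)):
        z = sum(inputs[j] * weights[j][k] for j in range(len(inputs))) + biases[k]
        outputs.append(activation(z))
    return outputs


def predict(x, layers):
    """Pass the inputs through each (weights, biases) pair in turn."""
    out = x
    for weights, biases in layers:
        out = layer_forward(out, weights, biases, sigmoid)
    return out


rng = random.Random(0)


def random_layer(n_in, n_out):
    """Untrained placeholder weights; training would replace these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


# Five normalized inputs -> two hidden layers -> one crop yield output.
layers = [random_layer(5, 4), random_layer(4, 3), random_layer(3, 1)]
x = [0.1, 0.4, 0.3, 0.0, 0.7]
y = predict(x, layers)
print(y)
```

Because the sigmoid maps every output into the open interval (0, 1), this choice of f only makes sense when the target crop yields are normalized to the same range.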

The collected values ranged from the hundreds of thousands to the millions; therefore, a normalization formula was utilized to improve the prediction model’s performance. The normalization process rescales each variable to the range between zero and one.

\text{Normalized value} = \frac{X - X_{min}}{X_{max} - X_{min}}

(3)
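Equation (3) is a min-max rescaling. A minimal sketch is given below, together with the inverse mapping that converts a normalized prediction back to original units; the inverse function and the sample values are illustrative additions, not part of the paper.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using Equation (3)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant series")
    return [(v - x_min) / (x_max - x_min) for v in values]


def denormalize(value, x_min, x_max):
    """Map a normalized value back to the original scale."""
    return value * (x_max - x_min) + x_min


# Made-up yields spanning hundreds of thousands to a million.
raw_yields = [100000, 550000, 1000000]
scaled = min_max_normalize(raw_yields)
print(scaled)
print(denormalize(scaled[1], min(raw_yields), max(raw_yields)))
```

Error metrics computed on normalized values are unitless and lie on the zero-to-one scale, which is worth keeping in mind when reading RMSE values such as those reported later in the results.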

Moreover, to evaluate the performance of the crop yield predictors, four error metrics were used, namely mean absolute error (MAE), root-mean-square error (RMSE), mean bias error (MBE), and mean square error (MSE), together with the determination coefficient (R²). These are calculated as shown below:

R^{2} = 1 - \frac{\sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}}

(4)

\text{MAE} = \frac{1}{N} \sum_{i = 1}^{N} |y_{i} - \hat{y}_{i}|

(5)

\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}

(6)

\text{MBE} = \frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})

(7)

\text{MSE} = \frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}

(8)

where y_{i}, \hat{y}_{i}, and \bar{y} are the crop yields, the predicted crop yields, and the mean of the crop yields, respectively. In addition, N and i are the number of crop yield samples and the element index in the dataset, respectively.

Finally, the proposed model is designed as shown in Figure 2. The authors began the research by analyzing the models in the literature and extracting the variables most strongly associated with crop yield. The independent variables are collected from various sources to develop a prediction model for crop yields in different Gulf countries. The collected datasets are reanalyzed to remove any outlier data. Thereafter, the datasets for each Gulf country were divided into a training dataset (70%) and a testing dataset (30%). Neural network models with two hidden layers were developed for all Gulf countries using the most important crops in each respective country. To ensure fairness between the different prediction models, the parameters in Table 2 were used. The training phase started by feeding 70% of the data to the neural network model, which learned the relationship between the independent variables and the crop yield while eliminating the unimportant and unused independent variables in the dataset. The output of this step was a trained neural network model, which was then tested with the remaining 30% of the data to assess its capability to predict future data. The predicted data were used to estimate the performance of the prediction model using different performance metrics.
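The performance metrics in Equations (4) to (8) can be computed together from the observed and predicted yields of the test set. The sketch below is illustrative only: the function name and the sample values are assumptions, and the sign convention for MBE (observed minus predicted) follows Equation (7).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE, MBE, and MSE as defined in Equations (4)-(8)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]  # y_i - y_hat_i
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((yt - mean_true) ** 2 for yt in y_true)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,       # undefined if y_true is constant
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(mse),
        "MBE": sum(errors) / n,            # positive means under-prediction
        "MSE": mse,
    }


# Made-up normalized yields and predictions for a four-sample test set.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

MBE can be close to zero even when individual errors are large, because positive and negative errors cancel; it is best read alongside MAE or RMSE rather than on its own.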

4. Results, Discussion, and Analysis

This section presents the results of the proposed model at a country level. Firstly, the correlation results are presented to understand the relationship between the independent variables and crop yields. Thereafter, the developed prediction models based on neural network architecture are presented and analyzed. The analysis contains error functions and R² values with important variable analyses.

4.1. Correlation Analysis

The correlation analysis between crop yields, namely of dates, wheat, maize (corn), watermelon, and potatoes, and the independent variables, namely year, temperature change, total pesticides used, rainfall, and nitrogen (N) total, is shown in Table 3. For each pair of variables, the correlation value and the p-value are recorded in the table (as “correlation” and “p-value”). The rainfall correlations have been removed from the table because the relationship between rainfall and all the studied variables was not significant. The results showed that the year was the most influential variable for crop yield. Temperature changes, nitrogen (N), and pesticides showed significant correlations with the dependent variables at rates of 33%, 30%, and 22%, respectively. The correlation results showed that Saudi Arabia is the country most affected by the independent variables.

The initial correlation analysis showed that dates are most likely to have a linear relationship with the other independent variables. Moreover, the independent variables do not have a linear relationship with all crop yields in the Gulf countries, which increases the chance of a nonlinear relationship with other crop yields. Therefore, a neural network model was used to develop a prediction model using the studied independent variables.

4.2. Development of Neural Network Models for Crop Yield Prediction

After analyzing the linear relationship between the independent and dependent variables using a correlation analysis, the results revealed a weak capability to develop linear regression models for crop yields. Therefore, a nonlinear neural network model was proposed. For the neural network, the collected data were initially divided into training and testing datasets with division ratios of 70% and 30%, respectively. The training dataset was used to develop a trained neural network model; thereafter, the testing dataset was used to test the ability of the proposed model to predict future data. For brevity, only the overall prediction information is reported for each crop yield. The results of the developed models are reported by country to simplify their presentation. All the neural network models in this section were developed using five independent variables, namely year, temperature changes per year, total pesticides per year, rainfall per year, and nitrogen (N) fertilizer per year, as shown in Table 1 and Table 2. The results showed that, in all the developed models, rainfall is not an influential variable in the development of the prediction model, as it remained stable during the years studied.

4.2.1. Developed Crop Yield Prediction for Bahrain

The dataset for Bahrain contains only three crop yields, namely those of dates, potatoes, and watermelons. The collected crop yield data were utilized to create prediction models using a neural network. The results showed that the R² values for dates, potatoes, and watermelons are 0.777, 0.850, and 0.958, respectively, while the RMSE values are 0.222, 0.132, and 0.118, respectively, as shown in Table 4. The results showed that all crop yields could be predicted with medium to high R² values. Moreover, watermelon predictions were more accurate than those of dates and potatoes. The results indicated that the selected independent variables are suitable for developing prediction models for the Bahrain dataset.

Moreover, the independent variables showed varied importance in developing the proposed models. The year variable is the dominant variable and the most important in all the crop yield prediction models, as shown in Figure 3. The least important variables for watermelons, potatoes, and dates are temperature change, nitrogen (N), and total pesticides, respectively. In addition, rainfall was excluded from all the developed models because its values were constant over the last ten years. The correlation analysis and the variable importance analysis indicate that it is possible to develop a realistic and robust model based on four independent variables: nitrogen, pesticides, temperature, and year.

4.2.2. Developed Crop Yield Prediction for Kuwait

The collected dataset for Kuwait contains five crop yields, namely dates, maize (corn), potatoes, watermelons, and wheat. The results demonstrate that the R² values for dates, maize (corn), potatoes, watermelons, and wheat are 0.969, 0.257, 0.849, 0.696, and 0.569, respectively, as shown in Table 5. The prediction models failed to predict maize and wheat; for the rest of the crop yields, R² ranges from about 0.7 to 0.97. The date prediction model achieved the highest performance compared to other crop yields. Moreover, the error functions showed stability in the predicted values. The results indicate that the selected independent variables are suitable for developing prediction models for dates, potatoes, and watermelons in Kuwait. On the other hand, the selected variables must be reconsidered in order to develop prediction models for wheat and maize.
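The text does not state the R² cut-off used to call a model a failure. A threshold of 0.6 is an assumption that happens to reproduce the classification given for Kuwait (and for the other countries in Tables 6 to 9); the sketch below applies it to the Table 5 values. The constant and function names are hypothetical.

```python
# Assumed cut-off; the source does not state the value it used.
ASSUMED_R2_THRESHOLD = 0.6

# R² values copied from Table 5 (Kuwait).
kuwait_r2 = {
    "dates": 0.969,
    "maize": 0.257,
    "potatoes": 0.849,
    "watermelons": 0.696,
    "wheat": 0.569,
}


def split_by_r2(r2_by_crop, threshold=ASSUMED_R2_THRESHOLD):
    """Separate crops whose model meets the threshold from those that do not."""
    usable = sorted(crop for crop, r2 in r2_by_crop.items() if r2 >= threshold)
    failed = sorted(crop for crop, r2 in r2_by_crop.items() if r2 < threshold)
    return usable, failed


usable, failed = split_by_r2(kuwait_r2)
print("usable:", usable)
print("failed:", failed)
```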

The date, potato, and watermelon prediction models showed that the most important variables (from highest to lowest) are year, temperature changes, and nitrogen (N), as shown in Figure 4. In addition, rainfall and pesticides were excluded from all developed models due to fixed or missing values. The correlation analysis and variable importance analysis indicate that it is possible to develop a realistic and robust model for dates, potatoes, and watermelons based on three independent variables: nitrogen, temperature, and year.

4.2.3. Developed Crop Yield Prediction for Oman

The collected dataset for Oman contains five crop yields, namely dates, maize (corn), potatoes, watermelons, and wheat. The results show that the R² for dates, maize (corn), potatoes, watermelons, and wheat are 0.928, 0.281, 0.792, 0.611, and 0.676, respectively, as shown in Table 6. The results show that the prediction models failed to predict maize, and for the rest of the crop yields, the range of R² was between 0.6 and 0.93. The results show that the date prediction model achieved the highest performance compared to other crop yields. Moreover, the error functions show stability in the predicted values. The results indicate that the independent variables can be used to predict the crop yields in Oman. The selected independent variables are suitable for developing prediction models for dates, potatoes, watermelons, and wheat in Oman.

The prediction models for wheat, potatoes, and watermelons showed that the most important variable is pesticides, as shown in Figure 5. For dates, the year variable is the most important when developing a prediction model. In addition, rainfall was excluded from all the developed models because its values were constant over the last ten years. The correlation analysis and variable importance analysis indicated that it is possible to develop a realistic and robust model for dates, potatoes, wheat, and watermelons.

4.2.4. Developed Crop Yield Prediction for Qatar

The collected dataset for Qatar contains five crop yields, namely dates, maize (corn), potatoes, watermelons, and wheat. The results show that the R² for dates, maize (corn), potatoes, watermelons, and wheat are 0.710, 0.068, 0.450, 0.818, and 0.117, respectively, as shown in Table 7. The results show that the prediction models failed to predict maize, potatoes, and wheat. The results show that the watermelon prediction model achieved the highest performance compared to other crop yields. Moreover, the error functions show stability in the predicted values. The results indicate that the independent variables can be used to predict the crop yields in Qatar. The selected independent variables are suitable for developing prediction models for dates and watermelons.

The date and watermelon prediction models showed that the most important variables (from highest to lowest) are nitrogen (N), year, and temperature change, as shown in Figure 6. In addition, rainfall and pesticides were excluded from all developed models due to either fixed or missing values. The correlation analysis and variable importance analysis indicate that it is possible to develop a realistic and robust model for watermelons and dates.

4.2.5. Developed Crop Yield Prediction for Saudi Arabia

The collected dataset for Saudi Arabia contains five crop yields, namely dates, maize (corn), potatoes, watermelons, and wheat. The results show that the R² for dates, maize (corn), potatoes, watermelons, and wheat are 0.974, 0.792, 0.499, 0.814, and 0.930, respectively, as shown in Table 8. The results show that the prediction models failed to predict potatoes. The results show that the dates and wheat prediction models achieved the highest performance compared to other crop yields. Moreover, the error functions showed stability in the predicted values and no high peak for the generated values. The results indicate that the independent variables can be used to predict the crop yields in Saudi Arabia. The selected independent variables are suitable for developing prediction models for most crop yields.

The dates, maize, and watermelon models showed that the most important variables (from highest to lowest) are year, pesticides, temperature, and nitrogen. The wheat model showed that the most important variables (from highest to lowest) are year, temperature, pesticides, and nitrogen. The year variable is therefore the most important variable for all of these models, as shown in Figure 7. On the other hand, the potato model behaved differently from the other models, and its variable importance results are disregarded because the model failed to achieve high performance. Finally, the correlation analysis and variable importance analysis indicated that it is possible to develop a realistic and robust model for crop yields using four independent variables, without considering rainfall.

4.2.6. Developed Crop Yield Prediction for UAE

Only four crop datasets are available for the UAE; wheat was removed from the analysis because wheat data were unavailable. Moreover, only three independent variables were used: nitrogen, temperature, and year. The results show that only the dates model performed well, with an R², MSE, MAE, MBE, and RMSE of 0.978, 0.008, 0.045, 0.028, and 0.089, respectively, as shown in Table 9. The results indicate that the independent variables can be used to predict date yields in the UAE, whereas the models for the other crops failed.

The dates model revealed that the most important variables (from highest to lowest) are year, temperature, and nitrogen, as shown in Figure 8. Finally, the collected results suggest that these three independent variables are sufficient to predict date yields in the UAE.

4.3. The Comparison between the Developed Prediction Models for Gulf Countries

Crop yield prediction models (i.e., for wheat, dates, watermelon, potatoes, and maize (corn)) were developed for the Gulf countries, namely Bahrain, Kuwait, Oman, Qatar, Saudi Arabia, and the UAE. Five independent variables were used to develop the prediction models: year, rainfall, pesticide, temperature changes, and nitrogen (N) fertilizer.

As previously mentioned in the literature review section, few researchers have been interested in studying the Gulf countries, either as a whole region or by country [13,20]. Most researchers studied crop yields, such as maize, potatoes, and wheat, in other countries around the world. Additionally, our independent variables, such as rainfall, pesticide, and temperature, have been used in previous studies [13,20]. As the results of Khaki and Wang (2019) [13] revealed, weather conditions, the accuracy of weather predictions, soil conditions, and management practices can predict the variation in crop yields. Moreover, the results of Al-Adhaileh and Aldhyani (2022) [20] showed that a neural network using temperature, pesticides, and rainfall can predict crop yield accurately and help enhance crop productivity in Saudi Arabia.

According to our findings on crop prediction level, the results showed that the watermelon model was the best for prediction in both Bahrain and Qatar, while the dates model is the best prediction model for Kuwait, Oman, Saudi Arabia, and UAE. These results can be explained by the high levels of dates and watermelon exports from the Gulf countries.

According to the independent variables on the country level, the findings revealed that Bahrain’s most important variable is the year and Qatar’s most important variable is nitrogen (N) fertilizer. Moreover, both models have the temperature variable as the most important variable. However, for the rest of the Gulf countries, the most important variable is the year compared to nitrogen (N) fertilizer, which was found to be the least important variable among independent variables. Moreover, all the developed models in the Gulf countries showed that rainfall per year can be discarded in all of them due to the constant values for the rainfall per year in all Gulf countries.

Regarding our proposed model, Table 10 shows the comparison between proposed neural network models and different crop yield prediction models as in the literature review.

Finally, our study found strong evidence that the neural network model is one of the best to predict crop yield. This result is in line with different previous studies that found the neural network model to be an appropriate model for crop yield prediction (as shown in Table 10).
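The aggregate figures listed for the proposed model in Table 10 (RMSE = 0.114, R² = 0.93) appear to correspond to averaging the best-performing model of each country in Tables 4 to 9. This reading is inferred from the tables rather than spelled out in the text, so the sketch below should be treated as a plausibility check, not as the authors' procedure.

```python
# (country, best crop, R², RMSE) copied from Tables 4 to 9.
best_models = [
    ("Bahrain", "watermelons", 0.958, 0.118),
    ("Kuwait", "dates", 0.969, 0.082),
    ("Oman", "dates", 0.928, 0.138),
    ("Qatar", "watermelons", 0.818, 0.192),
    ("Saudi Arabia", "dates", 0.974, 0.066),
    ("UAE", "dates", 0.978, 0.089),
]

mean_r2 = sum(r2 for _, _, r2, _ in best_models) / len(best_models)
mean_rmse = sum(rmse for _, _, _, rmse in best_models) / len(best_models)

print(f"mean R2 = {mean_r2:.4f}, mean RMSE = {mean_rmse:.4f}")
```

The mean RMSE comes to about 0.1142, matching the reported 0.114. The mean R² comes to about 0.9375, which agrees with the reported 0.93 only if the value was truncated rather than rounded.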

To sum up, the results showed that all Gulf countries are affected mainly by four independent variables. The four variables are year, annual temperature changes, pesticides, and nitrogen (N) used annually. Moreover, the overall results of the proposed models show that the results of the different crops’ productivity in the Gulf countries are high and can be used to develop a prediction model for crop production in the Gulf countries for different crop yields. In addition, the collected results are in line with previous research, which proves that the neural network model is efficient and capable of crop yield prediction. The neural network can be used to create a strong relationship between input and output.

5. Conclusions

This paper aims to develop crop yield prediction models for Gulf countries using a neural network model. To achieve that goal, five crop yields were selected, namely dates, maize (corn), potatoes, watermelons, and wheat, along with five independent variables: year, temperature, pesticides, nitrogen (N), and rainfall. A correlation analysis was applied to understand the relationship between independent and dependent variables. The results indicated a weak relationship between crop yield and independent variables and no relationship with annual rainfall. Thereafter, the neural network models were used to develop crop yield predictions. The results show that the year variable is the most important in developing prediction models, whereas nitrogen is the least important variable in the prediction process. In addition, the results indicated that for all Gulf countries, except Bahrain and Qatar, the dates prediction model has higher performance than the models for other crop yields. Moreover, the results showed that the Gulf countries responded differently to the productivity elements of each crop. In addition, rainfall had no effect on crop prediction for the Gulf countries. The reason for this is that Gulf countries typically experience higher temperatures compared to their neighboring countries. On the other hand, the results indicated that a simple neural network with two hidden layers is able to predict the crop yield efficiently compared to a complex neural network model. This research is in line with previous work, supporting the feasibility of using neural networks in crop yield prediction in different countries.

Moreover, the prediction models for Qatar and Bahrain showed that the watermelon model has the highest predictive performance among the selected crop yields. To sum up, the results showed that all Gulf countries are affected mainly by four independent variables: the year, temperature changes, pesticides, and nitrogen (N) per year. In addition, the combination of independent variables and a neural network model can predict crop yields.

6. Theoretical and Practical Implications

Our research is practically and theoretically important. Regulators should consider our results as important findings that they can benefit from. Policy makers can use the conclusions of the prediction variable analysis to understand the direction of future investments in crop yields. Additionally, policy makers can use a neural network predictor model to forecast future crop production. In addition, this research provides a reference for the policy makers of Gulf countries to understand the impact of temperature changes in terms of increasing or decreasing crop yield production.

Moreover, the authors suggest that policy makers in the Gulf region should concentrate on how to enhance agriculture in their own countries, especially with the current challenges of water deficiency and the Russia–Ukraine war. Policy makers can improve the use of marginal water (e.g., ground water, seawater, and so forth) as it can be the best alternative for freshwater resources when producing crops. Additionally, policy makers should attempt to achieve a balance between converting unused land into farmland and protecting the environment. Moreover, policy makers must invest more in agricultural technology to minimize the production of greenhouse gases. Policy makers should take climate change into consideration when imposing more regulations regarding the side effects of the agriculture industry. Finally, policy makers should find the best methods for supporting organic food suppliers and decreasing food waste.

Additionally, farmers can benefit from our findings by matching a specific crop to the most important variable that it is associated with. This will also allow farmers to focus on one variable per crop, since each type of crop showed different performances based on the independent variables. Moreover, farmers should choose the best quality seeds and fertilizers to enhance agricultural productivity. Additionally, they should study the climate conditions and protect their plants from any possible damage caused by weather extremes or diseases. Finally, farmers must test the soil and perform proper irrigation (i.e., using an appropriate amount of water) in order to support the development of their plants.

7. Limitations and Future Work

This study has limitations. The world changes day by day, with earthquakes, wars, and climate change, among other things, causing sudden and unexpected changes. Even though many crops were taken into consideration in this study, some data were available only for a specific period, and some data were missing within that period. Moreover, there was a lack of previous studies that investigated crop yield in the Gulf region.

Although our results are helpful at the time of publishing, many uncontrollable variables could affect crop yield. Moreover, the overall results showed low prediction performance in a few cases for certain crops, either because the chosen variables were not sufficient to predict the yields or because more complex models must be used. In future work, the authors aim to improve the neural network model by applying a feature selection method to the input variables to enhance its prediction capability. Additionally, the authors plan to develop a crop yield prediction model based on various machine learning models with additional crop productivity components. These additional components would cover soil, environment, weather, and other factors related to crop productivity, to help understand the behavior of various features.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su15129392/s1, Table S1: Saudi Arabia dataset statistical description. Table S2: Kuwait dataset statistical description. Table S3: Bahrain dataset statistical description. Table S4: Qatar dataset statistical description. Table S5: Oman dataset statistical description. Table S6: UAE dataset statistical description.

Author Contributions

Conceptualization, D.A.-N.; Methodology, H.A.-N. and N.A.-R.; Software, H.A.-N.; Validation, H.F.A. and D.A.-N.; Formal analysis, H.A.-N. and N.A.-R.; Investigation, H.A.-N. and N.A.-R.; Data curation, H.A.-N. and N.A.-R.; Writing—original draft, N.A.-R.; Writing—review & editing, D.A.-N.; Supervision, H.F.A.; Funding acquisition, H.F.A. and D.A.-N. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education of Saudi Arabia for funding this research work (project number INST184).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data that can reproduce the results in this study can be requested from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chang, Y.; Latham, J.; Licht, M.; Wang, L. A data-driven crop model for maize yield prediction. Commun. Biol. 2023, 6, 439.
2. Al-Najjar, D.; Al-Najjar, H.; Al-Rousan, N.; Assous, H.F. Developing Machine Learning Techniques to Investigate the Impact of Air Quality Indices on Tadawul Exchange Index. Complexity 2022, 2022, 18–27.
3. Al Najjar, D.; Assous, H.F.; Al-Najjar, H.; Al-Rousan, N. Ramadan effect and indices movement estimation: A case study from eight Arab countries. J. Islam. Mark. 2022; ahead-of-print.
4. Bhimavarapu, U.; Battineni, G.; Chintalapudi, N. Improved Optimization Algorithm in LSTM to Predict Crop Yield. Computers 2023, 12, 10.
5. Ikram, A.; Aslam, W.; Aziz, R.H.H.; Noor, F.; Mallah, G.A.; Ikram, S.; Ahmad, M.S.; Abdullah, A.M.; Ullah, I. Crop Yield Maximization Using an IoT-Based Smart Decision. J. Sens. 2022, 2022, 2022923.
6. Morales, G.; Sheppard, J.W.; Hegedus, P.B.; Maxwell, B.D. Improved Yield Prediction of Winter Wheat Using a Novel Two-Dimensional Deep Regression Neural Network Trained via Remote Sensing. Sensors 2023, 23, 489.
7. Muruganantham, P.; Wibowo, S.; Grandhi, S.; Samrat, N.H.; Islam, N. A Systematic Literature Review on Crop Yield Prediction with Deep Learning and Remote Sensing. Remote Sens. 2022, 14, 1990.
8. Bolaños, J.; Corrales, J.C.; Campo, L.V. Feasibility of Early Yield Prediction per Coffee Tree Based on Multispectral Aerial Imagery: Case of Arabica Coffee Crops in Cauca-Colombia. Remote Sens. 2023, 15, 282.
9. Gonzalez-Sanchez, A.; Frausto-Solis, J.; Ojeda-Bustamante, W. Attribute selection impact on linear and nonlinear regression models for crop yield prediction. Sci. World J. 2014, 2014, 509429.
10. Gupta, S.; Geetha, A.; Sankaran, K.S.; Zamani, A.S.; Ritonga, M.; Raj, R.; Ray, S.; Mohammed, H.S. Machine Learning-and Feature Selection-Enabled Framework for Accurate Crop Yield Prediction. J. Food Qual. 2022, 2022, 6293985.
11. AL-Najjar, D.; Al-Rousan, N.; AL-Najjar, H. Machine Learning to Develop Credit Card Customer Churn Prediction. J. Theor. Appl. Electron. Commer. Res. 2022, 17, 1529–1542.
12. Al-Najjar, D.; Al-Najjar, H.; Al-Rousan, N. Identifying Economic Sectors Influencing the Long-Term General Index Prediction Based on Feature Selection and Search Methods: Amman Stock Exchange Market. Ekon. Reg./Econ. Reg. 2022, 18, 1301–1316.
13. Khaki, S.; Wang, L. Crop yield prediction using deep neural networks. Front. Plant Sci. 2019, 10, 621.
14. Khaki, S.; Pham, H.; Wang, L. Simultaneous corn and soybean yield prediction from remote sensing data using deep transfer learning. Sci. Rep. 2021, 11, 11132.
15. Aleisa, E.; Al-Zubari, W. Wastewater reuse in the countries of the Gulf Cooperation Council (GCC): The lost opportunity. Environ. Monit. Assess 2017, 189, 553.
16. Warshay, B.; Brown, J.J.; Sgouridis, S. Erratum to: Life cycle assessment of integrated seawater agriculture in the Arabian (Persian) Gulf as a potential food and aviation biofuel resource. Int. J. Life Cycle Assess 2017, 22, 1033.
17. Elmi, A.A. Food Security in the Arab Gulf Cooperation Council States. In Sustainable Agriculture Reviews; Lichtfouse, E., Ed.; Springer International Publishing: Cham, Switzerland, 2017; pp. 89–114. ISBN 978-3-319-58679-3.
18. Brown, J.J.; Das, P.; Al-Saidi, M. Sustainable agriculture in the Arabian/Persian Gulf region utilizing marginal water resources: Making the best of a bad situation. Sustainability 2018, 10, 1364.
19. Al Blooshi, L.S.; Ksiksi, T.S.; Aboelenein, M.; Gargoum, A.S. The impact of climate change on agricultural and livestock production and groundwater characteristics in Abu Dhabi, UAE. Nat. Environ. Pollut. Technol. 2020, 19, 1945–1956.
20. Al-Adhaileh, M.H.; Aldhyani, T.H. Artificial intelligence framework for modeling and predicting crop yield to enhance food security in Saudi Arabia. PeerJ Comput. Sci. 2022, 8, e1104.
21. Rashid, M.; Bari, B.S.; Yusup, Y.; Kamaruddin, M.A.; Khan, N. A comprehensive review of crop yield prediction using machine learning approaches with special emphasis on palm oil yield prediction. IEEE Access 2021, 9, 63406–63439.
22. Paudel, D.; Boogaard, H.; de Wit, A.; Janssen, S.; Osinga, S.; Pylianidis, C.; Athanasiadis, I.N. Machine learning for large-scale crop yield forecasting. Agric. Syst. 2021, 187, 103016.
23. Yamaç, S.S.; Todorovic, M. Estimation of daily potato crop evapotranspiration using three different machine learning algorithms and four scenarios of available meteorological data. Agric. Water Manag. 2020, 228, 105875.
24. Burhan, H.A. Crop Yield Prediction by Integrating Meteorological and Pesticides Use Data with Machine Learning Methods: An Application for Major Crops in Turkey. Ekon. Polit. Ve Finans. Araştırmaları Derg. 2022, 7, 1–18.
25. Oikonomidis, A.; Catal, C.; Kassahun, A. Deep learning for crop yield prediction: A systematic literature review. N. Z. J. Crop Hortic. Sci. 2022, 51, 1–26.
26. Jin, X.; Li, Z.; Feng, H.; Ren, Z.; Li, S. Deep neural network algorithm for estimating maize biomass based on simulated Sentinel 2A vegetation indices and leaf area index. Crop J. 2020, 8, 87–97.
27. Srivastava, A.K.; Safaei, N.; Khaki, S.; Lopez, G.; Zeng, W.; Ewert, F.; Gaiser, T.; Rahimi, J. Winter wheat yield prediction using convolutional neural networks from environmental and phenological data. Sci. Rep. 2022, 12, 3215.
28. Nevavuori, P.; Narra, N.; Lipping, T. Crop yield prediction with deep convolutional neural networks. Comput. Electron. Agric. 2019, 163, 104859.
29. Haque, F.F.; Abdelgawad, A.; Yanambaka, V.P.; Yelamarthi, K. Crop Yield Prediction Using Deep Neural Network. In Proceedings of the 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA, 2–16 June 2020; pp. 1–4.
30. Bhullar, A.; Nadeem, K.; Ali, R.A. Simultaneous multi-crop land suitability prediction from remote sensing data using semi-supervised learning. Sci. Rep. 2023, 13, 6823.
31. Ed-Daoudi, R.; Alaoui, A.; Ettaki, B.; Zerouaoui, J. Improving Crop Yield Predictions in Morocco Using Machine Learning Algorithms. J. Ecol. Eng. 2023, 24, 392–400.
32. PS, M.G.; Bhargavi, R. Performance evaluation of best feature subsets for crop yield prediction using machine learning algorithms. Appl. Artif. Intell. 2019, 33, 621–642.
33. Krithika, K.M.; Maheswari, N.; Sivagami, M. Models for feature selection and efficient crop yield prediction in the groundnut production. Res. Agric. Eng. 2022, 68, 131–141.
34. Agarwal, S.; Tarar, S. A hybrid approach for crop yield prediction using machine learning and deep learning algorithms. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2021; Volume 1714, p. 012012.
35. Morales, A.; Villalobos, F.J. Using machine learning for crop yield prediction in the past or the future. Front. Plant Sci. 2023, 14, 1128388.
36. Jhajharia, K.; Mathur, P.; Jain, S.; Nijhawan, S. Crop yield prediction using machine learning and deep learning techniques. Procedia Comput. Sci. 2023, 218, 406–417.
37. AL-Rousan, N.; Mat Isa, N.A.; Mat Desa, M.K.; AL-Najjar, H. Integration of logistic regression and multilayer perceptron for intelligent single and dual axis solar tracking systems. Int. J. Intell. Syst. 2021, 36, 5605–5669.

Figure 1. The architecture of the neural network with one hidden layer.

Figure 2. The methodology used.

Figure 3. The important variables for crop yield in the Bahrain dataset.

Figure 4. The important variables for crop yield in the Kuwait dataset.

Figure 5. The important variables for crop yield in the Oman dataset.

Figure 6. The important variables for crop yield in the Qatar dataset.

Figure 7. The important variables for crop yield in the Saudi Arabia dataset.

Figure 8. The important variables for crop yield in the UAE dataset.

Table 1. Independent and dependent variables for the developed models.

Country	Dependent Variables	Independent Variables
Bahrain	dates, potatoes, watermelons	year, temperature changes per year, total pesticides per year, rainfall per year, and nitrogen (N) fertilizer per year
Kuwait	dates, wheat, maize (corn), watermelon, potatoes
Oman	dates, wheat, maize (corn), watermelon, potatoes
Qatar	dates, wheat, maize (corn), watermelon, potatoes
Saudi Arabia	dates, wheat, maize (corn), watermelon, potatoes
UAE	dates, maize (corn), watermelon, potatoes

Table 2. Tuned parameters for MLP.

Parameter	Value
Number of epochs	1000
Initial learning rate	0.4
Momentum	0.9
Performance	1 × 10⁻¹⁵
Number of neurons (hidden layer/s)	11
Type of training	Batch
Optimization algorithm	Gradient descent
Training percentage	70%
Testing percentage	30%
Number of neurons for each hidden layer	10
Number of inputs	Based on the selected model (as shown in Table 1)
Activation function for the hidden layer	Sigmoid
Activation function for input layer	Pureline
Neural network type	Feedforward neural network
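A forward pass through a network of the kind described in Table 2 can be sketched as below. Several details are assumptions: one hidden layer of 10 sigmoid neurons is used (the table lists both 10 and 11 neurons), the linear activation is applied at the output node, and the weights, input values, and function names are hypothetical. Training by batch gradient descent with momentum is deliberately left out.

```python
import math


def sigmoid(z):
    """Logistic activation used for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Feedforward pass: sigmoid hidden layer followed by a linear output node."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


n_inputs, n_hidden = 4, 10  # e.g. year, temperature change, pesticides, nitrogen

# Hypothetical weights: all-zero hidden weights make every hidden activation 0.5.
hidden_weights = [[0.0] * n_inputs for _ in range(n_hidden)]
hidden_biases = [0.0] * n_hidden
output_weights = [0.1] * n_hidden
output_bias = 0.2

prediction = forward([0.3, 0.5, 0.1, 0.9], hidden_weights, hidden_biases,
                     output_weights, output_bias)
print(prediction)
```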

Table 3. The correlation analysis between crop yields and independent variables.

Country	Item	Year	Temperature_Change	Pesticides_Total	Nitrogen (N)
Bahrain	Dates	(0.67, 0.001)	(0.67, 0.001)	NOT	NOT
	Potatoes	(0.815, 0.000)	(0.686, 0.000)	NOT	NOT
	Watermelons	(0.531, 0.023)	NOT	NOT	NOT
Kuwait	Dates	(−0.657, 0.001)	NOT	NOT	NOT
	Wheat	(0.566, 0.006)	NOT	NOT	(−0.699, 0.001)
	Maize (Corn)	NOT	NOT	NOT	(−0.699, 0.001)
	Watermelon	(0.778, 0.000)	NOT	NOT	(0.468, 0.043)
	Potatoes	(0.693, 0.000)	(0.585, 0.004)	NOT	(0.717, 0.001)
Oman	Dates	(0.866, 0.000)	(0.653, 0.001)	(0.461, 0.041)	NOT
	Wheat	NOT	NOT	(0.533, 0.016)	NOT
	Maize (Corn)	NOT	NOT	NOT	NOT
	Watermelon	NOT	NOT	NOT	NOT
	Potatoes	NOT	NOT	NOT	NOT
Qatar	Dates	NOT	NOT	NOT	(0.578, 0.006)
	Wheat	NOT	NOT	NOT	NOT
	Maize (Corn)	NOT	NOT	NOT	NOT
	watermelon	NOT	(−0.364, 0.096)	NOT	NOT
	Potatoes	NOT	NOT	NOT	NOT
Saudi Arabia	Dates	(0.839, 0.000)	(0.739, 0.000)	(0.956, 0.000)	(0.668, 0.001)
	Wheat	(0.847, 0.000)	(0.642, 0.001)	(0.658, 0.002)	(−0.614, 0.003)
	Maize(Corn)	(0.458, 0.032)	NOT	(0.608, 0.004)	NOT
	Watermelon	(0.756, 0.000)	(0.483, 0.023)	(0.573, 0.008)	(−0.441, 0.045)
	Potatoes	NOT	NOT	NOT	NOT
UAE	Dates	(0.808, 0.000)	(0.508, 0.016)	NOT	NOT
	Maize(Corn)	NOT	NOT	NOT	NOT
	Watermelon	NOT	NOT	NOT	NOT
	Potatoes	NOT	NOT	NOT	NOT
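Each pair in Table 3 appears to give a Pearson correlation coefficient followed by its p-value. A minimal sketch of the coefficient itself is shown below; the p-value is omitted because it requires a t-distribution, and the function name and sample series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical series, not the paper's data.
years = [2001, 2002, 2003, 2004, 2005]
yields_up = [1.0, 1.2, 1.4, 1.6, 1.8]
yields_down = [1.8, 1.6, 1.4, 1.2, 1.0]

r_up = pearson_r(years, yields_up)
r_down = pearson_r(years, yields_down)
print(r_up, r_down)
```

The guard against constant series is relevant here: a variable that does not change over the study period, as reported for rainfall, has no defined correlation with yield.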

Table 4. Crop yield prediction models for the overall Bahrain dataset.

	R²	MSE	MAE	MBE	RMSE
Dates	0.777	0.049	0.117	0.083	0.222
Potatoes	0.850	0.017	0.102	0.010	0.132
Watermelons	0.958	0.014	0.080	−0.037	0.118

Table 5. Crop yield prediction models for the overall Kuwait dataset.

	R²	MSE	MAE	MBE	RMSE
Dates	0.969	0.007	0.065	0.009	0.082
Maize (corn)	0.257	0.053	0.136	−0.016	0.231
Potatoes	0.849	0.023	0.108	0.014	0.153
Watermelons	0.696	0.080	0.202	0.114	0.283
Wheat	0.569	0.043	0.144	−0.012	0.206

Table 6. Crop yield prediction models for the overall Oman dataset.

	R²	MSE	MAE	MBE	RMSE
Dates	0.928	0.019	0.074	0.003	0.138
Maize (corn)	0.281	0.037	0.126	−0.007	0.192
Potatoes	0.792	0.004	0.047	0.003	0.063
Watermelons	0.611	0.053	0.174	0.026	0.230
Wheat	0.676	0.034	0.146	0.001	0.184

Table 7. Crop yield prediction models for the overall Qatar dataset.

	R²	MSE	MAE	MBE	RMSE
Dates	0.710	0.047	0.179	−0.017	0.216
Maize (corn)	0.068	0.085	0.210	0.009	0.292
Potatoes	0.450	0.034	0.086	0.061	0.185
Watermelons	0.818	0.037	0.163	0.010	0.192
Wheat	0.117	0.030	0.098	0.013	0.174

Table 8. Crop yield prediction models for the overall Saudi Arabia dataset.

	R²	MSE	MAE	MBE	RMSE
Dates	0.974	0.004	0.050	0.002	0.066
Maize (corn)	0.792	0.046	0.168	−0.028	0.214
Potatoes	0.499	0.027	0.096	0.009	0.164
Watermelons	0.814	0.014	0.094	0.014	0.117
Wheat	0.930	0.013	0.090	−0.001	0.112

Table 9. Crop yield prediction models for the overall UAE dataset.

	R²	MSE	MAE	MBE	RMSE
Dates	0.978	0.008	0.045	0.028	0.089
Maize (corn)	0.284	0.056	0.178	−0.003	0.236
Potatoes	0.247	0.034	0.129	−0.015	0.184
Watermelons	0.439	0.037	0.150	0.020	0.194

Table 10. Results of neural network compared with different prediction results of crop yield.

Ref	Model	Crop Yield	Region	Results
Khaki and Wang (2019) [13]	Neural network	Maize	United States and Canada	RMSE = 12.18, R² = 84.01
Al-Adhaileh and Aldhyani (2022) [20]	Multilayer perceptron (proposed system)	Maize, potatoes, rice, sorghum, and wheat	Saudi Arabia	RMSE = 0.04493, R² = 96.02
Oikonomidis et al. (2022) [25]	Convolutional neural networks (CNN)–deep neural networks (DNN)	Soybean	United States	RMSE = 0.266, R² = 0.87
Bhullar et al. (2023) [30]	Semi-supervised learning	Multi-crop yield	Canada	The prediction model is high compared with other models
Ed-Daoudi et al. (2023) [31]	Neural networks	Multi-crop yield	Morocco	RMSE = 0.31, R² = 0.90
Ps and Bhargavi (2019) [32]	forward feature selection (FFS) algorithm and neural network	Multi-crop yield	India	RMSE = 0.098, R² = 0.92
Proposed model	Neural network	Multi-crop yield	Gulf countries	RMSE = 0.114, R² = 0.93

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
