1. Introduction
Agriculture is an economic activity that depends heavily on weather conditions [1]. Seasonal agriculture that relies on natural weather conditions is known as rainfed agriculture. Rainfed agriculture accounts for 80% of cropland worldwide and generates good yields when weather conditions are favorable. In many areas where rainfall is scarce, rainfed agriculture is supplemented by irrigation [2]. Nevertheless, agricultural production remains heavily reliant on rainfall and other weather variables, and farmers sometimes do not obtain the expected harvest because there is too little or too much rainfall or because other weather parameters are unfavorable.
Climate change has a great impact on agricultural productivity and may lead to hunger or food insecurity. Food insecurity is a crucial problem in regions prone to droughts or other weather-related disasters. Climate variables that affect crop production include precipitation, air temperature, humidity, and solar radiation [3]. Several studies have shown that climate indices at both global and regional levels affect crop yields and food security [4]. Damien et al. found that reduced crop yields could be associated with either high temperature or abundant precipitation [5]. Extreme temperature harms crop production through factors such as increased evapotranspiration and crop respiration, and higher pest infestation [1]. Increased precipitation intensity increases runoff, which in turn causes floods and raises the risk of crop failure [2]. Crop productivity can also be affected by rising temperature, which increases crop water demand [1,6,7]. In short, climate change can affect agriculture through several different pathways.
Although the climate variables may be the same within a given area, weather requirements differ from one crop to another and from one growing stage to another. In other words, each crop has a different level of resilience to atmospheric variables. When weather variables reach extreme levels, crop production is markedly affected [3]. The influence of climate change on agriculture can be observed everywhere. For example, from March to August 2018, a large part of Europe experienced extreme temperatures, while the southern region of the continent experienced abundant rainfall [5].
In Rwanda, climate change and its impacts on agriculture have been a challenge in provinces that have faced long dry seasons or high rainfall. In 2016, drought left 44,000 poor households food insecure in the Eastern Province [8]. Between 2012 and 2016, landslides, floods, and erosion harmed agricultural production in areas with steep slopes and heavy rainfall, resulting in a loss of 1.4% of Gross Domestic Product (GDP) [9]. According to an assessment carried out by the Ministry of Agriculture and Animal Resources (MINAGRI), more than 3000 families in the Eastern Province (Kayonza, Kirehe, and Nyagatare districts) faced hunger due to drought in 2017 [10]. Analysis of rainfall variability shows that rainy seasons tend to be shorter and more intense, a tendency that affects crop yields through droughts, landslides, and floods [11]. In its Seasonal Agricultural Survey, the National Institute of Statistics of Rwanda (NISR) identified insufficient rainfall as the largest contributor to poor harvests in Rwanda. In addition, the annual reports published by NISR indicate that seasonal harvests vary from one crop to another across agricultural seasons depending on various circumstances, including weather conditions.
Various solutions have been proposed to address the problems caused by variable weather conditions. The investigation by Safieh et al. indicated that climate change affects crop water requirements as well as future crop yields predicted from weather forecast data [4,12]. A study of the impact of extreme weather conditions on different regions of Europe showed that the most reliable weather predictors of agricultural production are rainfall and air temperature and their respective thresholds [5]. Precipitation and air temperature are the climatic parameters used most often. However, other parameters such as solar radiation, air humidity, soil moisture, and wind speed have also been used to predict crop yields with machine learning models (MLMs) such as the Artificial Neural Network (ANN), Semiparametric Neural Network [13], Convolutional Neural Network (CNN) [14,15], Lasso, Kernel Ridge, Elastic Net (Enet) [16], Naïve Bayes, K-Nearest Neighbor [17], Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) [18], and Random Forest [17,19].
In Rwanda, various studies have focused on predicting the yields of different crops. In [20], Rugimbana applied the AquaCrop model to predict maize yields under rainfed agriculture in the Eastern Province of Rwanda. The author carried out a trend analysis of climatic parameters such as maximum and minimum temperatures, rainfall, and evapotranspiration, as well as of maize yield. One finding was that the rainfall trend had no significant impact on yield in the study area during the study period. Ngaruye et al. [21] applied Small Area Estimation (SAE) techniques under a multivariate linear regression model for repeated measures data to produce district-level estimates of bean yield (i.e., bush and climbing beans) in Rwanda for the 2014 agricultural seasons. The authors applied the analysis to NISR microdata from the 2014 Seasonal Agricultural Survey (SAS). Breure et al. [22] predicted maize yield by applying the Quantitative Evaluation of the Fertility of Tropical Soils (QUEFTS) model. Specifically, the authors compared two methods for mapping QUEFTS outputs, i.e., maize yield and the yield-limiting nutrient, with Rwanda as a case study. The study was based on a database of soil analysis results for 999 samples collected across Rwanda. As these studies show, interest in yield prediction research in Rwanda is growing. However, more sophisticated machine learning models have rarely been applied to yield prediction. Furthermore, as the effects of climate change become more apparent, their impact on crop yields will only increase. Hence, the need for extensive research on the impact of climatic factors on crop yields cannot be overemphasized. This study seeks to help bridge this gap.
Various researchers have demonstrated the impact of climate anomalies on crops [1,4,5,6] by showing correlations between crop production and weather variables. Some of these studies went further and identified thresholds of temperature, precipitation, and water requirements for plant development [5]. Studies focused on prediction have reported good predictive performance [16,17]. However, each plant has different weather requirements for good production. Hence, knowledge of the weather conditions under which each crop produces a good harvest should be taken into account when predicting the yield of a particular crop. Unfortunately, previous studies did not quantify the relative contribution of each weather parameter to crop production. Furthermore, they did not indicate at which stage of crop development each weather parameter was more or less needed. Knowing the contribution of each climatic factor and its threshold values for good production is crucial for future crop yield prediction based on weather monitoring using the Internet of Things (IoT).
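One way to express the relative contribution of each weather parameter as a ratio is to normalize permutation importances so that they sum to 1. The sketch below assumes a fitted model exposed as a plain `predict(row)` callable and feature rows such as `[rainfall, temperature]`. The helper names are hypothetical. Permutation importance is a model-agnostic technique chosen here for illustration, not necessarily the method used in this study; Random Forest's built-in impurity-based importance is a common alternative.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_ratios(predict: Callable[[Row], float], X: List[Row],
                                  y: Sequence[float], n_repeats: int = 10,
                                  seed: int = 0) -> List[float]:
    """Share of the error increase attributable to each feature (sums to 1).

    For each feature column, shuffle its values across rows, measure how much
    the MSE increases relative to the unshuffled baseline, and average over
    n_repeats shuffles. Negative increases are clipped to zero.
    """
    rng = random.Random(seed)  # seeded for reproducible shuffles
    base = mse(y, [predict(row) for row in X])
    increases = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse(y, [predict(row) for row in X_perm]) - base
        increases.append(max(total / n_repeats, 0.0))
    s = sum(increases)
    return [inc / s if s > 0 else 0.0 for inc in increases]
```

Because the ratios are normalized, they answer "what share of the model's reliance falls on rainfall versus temperature" rather than giving absolute error increases.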
In this study, several MLMs were explored to determine which is best suited for our future studies on crop yield prediction through weather monitoring using IoT. The MLMs explored include Random Forest (RF), Polynomial Regression (PR), and Support Vector Regressor (SVR). The crops of interest are maize and Irish potatoes, the dominant crops grown in Musanze District, Rwanda. Crop yield data were gathered from NISR and from farmer cooperatives in the study area. Rainfall and temperature were used as predictors. Specifically, the main contributions of this work are as follows:
- (1) identifying the correlation between crop production and weather parameters (i.e., rainfall and temperature);
- (2) determining the feature importance of each weather parameter for crop production; and
- (3) identifying the best MLM for predicting crop production.
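Contribution (1) can be illustrated with Pearson's correlation coefficient between a weather variable and yield. This is a minimal sketch: the text does not state which correlation coefficient was used, so Pearson's r is an assumption, and the sample values below are hypothetical rather than the study's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances without the 1/n factor, which cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical seasonal values (NOT the study's data).
rain_1m = [80.0, 120.0, 150.0, 60.0, 190.0]           # mm, first month after planting
yield_kg_ha = [9000.0, 11000.0, 12500.0, 8500.0, 13000.0]

print(round(pearson_r(rain_1m, yield_kg_ha), 3))
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear association, although a nonlinear effect (for example, both too little and too much rain reducing yield) may still exist.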
The results of this study will inform the design and development of a crop yield prediction system based on IoT and machine learning. Farmers and decision makers will be able to use the system to adapt to reduced crop production.
The rest of the manuscript is organized as follows. Section 2 presents the materials and research methodology, including the study area, data collection, data pre-processing, and the MLMs and the motivation behind their selection. Section 3 presents the results in terms of the performance of the various MLMs. Section 4 discusses the results in detail, their interpretation, and how the model performance compares with previous work. Section 5 concludes the paper.
4. Discussion
Crop yields depend on both controllable and non-controllable factors. The former include crop or seed varieties, tillage practices, fertilizer use, and many others. Non-controllable factors are those beyond human control, such as weather variables: precipitation, air temperature, soil temperature, air humidity, soil moisture, solar radiation, etc. If any of these parameters is above or below the level the plant requires, the plant might not grow well, and productivity will be affected. Good crop yield prediction therefore depends on these weather variables. However, their use can be limited by data availability. For that reason, only precipitation and temperature data were used in this study to predict the yields of Irish potatoes and maize.
Rainfall and temperature requirements differ from one crop to another and from one growing stage to another [12]. Consequently, an inadequate value of a weather variable at a given stage will affect the production level. In the current study, monthly cumulative rainfall and average daily temperature, computed for each month after planting, were used to reflect the crop's growing stages.
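The monthly features described above can be derived from daily weather records roughly as follows. This sketch assumes that each "month" is a 30-day window counted from the planting date and that daily records are `(date, rainfall in mm, mean temperature in °C)` tuples. The feature names follow the `rain_1m`/`temp_1m` pattern used in Section 4.1.1, while the function `monthly_features` itself is hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# (day, rainfall in mm, mean daily temperature in °C)
DailyRecord = Tuple[date, float, float]


def monthly_features(records: List[DailyRecord], planting: date,
                     n_months: int, days_per_month: int = 30) -> Dict[str, float]:
    """Aggregate daily weather into per-month features after planting.

    Month k (1-based) covers the days in [(k-1)*days_per_month, k*days_per_month)
    counted from the planting date. Rainfall is summed (monthly cumulative
    rainfall); temperature is averaged (average daily temperature).
    """
    features: Dict[str, float] = {}
    for k in range(1, n_months + 1):
        start = planting + timedelta(days=(k - 1) * days_per_month)
        end = planting + timedelta(days=k * days_per_month)
        window = [r for r in records if start <= r[0] < end]
        if not window:
            raise ValueError(f"no daily records for month {k} after planting")
        features[f"rain_{k}m"] = sum(r[1] for r in window)
        features[f"temp_{k}m"] = sum(r[2] for r in window) / len(window)
    return features


# Hypothetical example: 60 days of synthetic records starting at planting.
planting_day = date(2020, 9, 1)
daily = [(planting_day + timedelta(days=d), 5.0, 20.0 + (d % 2)) for d in range(60)]
print(monthly_features(daily, planting_day, n_months=2))
```

Using fixed-length windows keeps every month comparable across seasons; calendar months would work too but would mix 28- to 31-day totals into the cumulative rainfall.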
4.1. Crops Growing Stages and Climate Requirements
The amount of water a plant needs depends on the crop's growing stage, the evaporative demand of the atmosphere, the crop species, and other parameters [28]. After planting, Irish potatoes go through the following stages: sprout development, vegetative growth, tuber initiation, tuber bulking (filling), and maturity [29]. The growth stages of maize are establishment, vegetative growth, tasseling, cob setting (or cob filling), and maturity [30].
4.1.1. Climate Requirements of Irish Potatoes and Their Impact on Production
The potato is a cool-season crop that grows well when the air temperature is 16–25 °C during vegetative growth, while the optimum temperature during tuber initiation and bulking is 4–18 °C [30]. In the present study, we determined the rainfall and temperature ranges that Irish potatoes need to achieve optimum production.
Figure 11 shows the range (minimum, medium, and maximum) of weather requirements from the first month to the last month ADP.
The optimum Irish potato yield is 10,000–14,000 kg ha⁻¹. The establishment (sprouting) stage takes 21 days ADP [30], so it occurs during the first month; for an optimum harvest, rainfall in that month (rain_1m) was 50–200 mm and temperature (temp_1m) was 22.5–26.5 °C. The period from the last week of the first month to the end of the second month covers vegetative growth and tuber initiation. According to the analysis, the ideal rainfall and temperature for these stages are 100–250 mm and 24–26 °C, respectively. During tuber bulking, the optimum rainfall and temperature are 150–250 mm and 21–26 °C, respectively, whereas during maturity they are 100–200 mm and 22.5–25 °C, respectively.
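The optimum ranges reported above can be encoded as a lookup table, for example to flag stage-level weather readings from a future IoT monitoring system. The table below transcribes the values from this section; the stage keys and the `out_of_range` helper are hypothetical, and the check simply tests inclusion in each closed interval.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

# Optimum (rainfall in mm, temperature in °C) ranges for Irish potatoes,
# transcribed from Section 4.1.1. Stage keys are illustrative names.
POTATO_OPTIMA: Dict[str, Tuple[Range, Range]] = {
    "establishment": ((50, 200), (22.5, 26.5)),
    "vegetative_and_tuber_initiation": ((100, 250), (24, 26)),
    "tuber_bulking": ((150, 250), (21, 26)),
    "maturity": ((100, 200), (22.5, 25)),
}


def out_of_range(stage: str, rain_mm: float, temp_c: float) -> List[str]:
    """Return the weather variables that fall outside the optimum range for a stage."""
    (rain_lo, rain_hi), (temp_lo, temp_hi) = POTATO_OPTIMA[stage]
    problems = []
    if not rain_lo <= rain_mm <= rain_hi:
        problems.append("rainfall")
    if not temp_lo <= temp_c <= temp_hi:
        problems.append("temperature")
    return problems


# Hypothetical reading during tuber bulking: too little rain, temperature fine.
print(out_of_range("tuber_bulking", 120.0, 23.0))  # ['rainfall']
```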
4.1.2. Climate Requirements of Maize and Their Impact on Production
High temperatures can lower crop productivity by reducing photosynthesis. For instance, maize photosynthesis is optimal at approximately 24 °C, and higher temperatures reduce its efficiency [31]. Depending on the maize variety, harvesting in the study area takes place 5 to 6 months ADP. In this study, weather requirements were analyzed over 5 months because, although the crop can remain in the field for 6 months, its water and temperature needs are most critical during the first five months.
Figure 12 shows the optimum weather requirements for maize yield.
The rainfall and temperature for the establishment stage are 75–200 mm and 23–26.6 °C, respectively. From the second to the third month ADP (vegetative and tasseling stages), the water requirement is 100–300 mm, whereas the temperature is 23–26.5 °C in the second month and 22.5–26 °C in the third month. The rainfall and temperature during the cob setting stage were 100–250 mm and 22.5–25.5 °C, respectively. The maturity stage required 50–150 mm of rainfall and a temperature of 24–25 °C.
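A similar table for maize can quantify how far each month's weather lies outside its optimum range, which may be more informative than a simple in/out flag. The values are transcribed from this section; mapping cob setting to month 4 and maturity to month 5 is an assumption based on the 5-month analysis window, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]

# Optimum (rainfall in mm, temperature in °C) per month after planting,
# transcribed from Section 4.1.2.
# Assumption: cob setting = month 4 and maturity = month 5.
MAIZE_OPTIMA: Dict[int, Tuple[Range, Range]] = {
    1: ((75, 200), (23, 26.6)),     # establishment
    2: ((100, 300), (23, 26.5)),    # vegetative
    3: ((100, 300), (22.5, 26)),    # tasseling
    4: ((100, 250), (22.5, 25.5)),  # cob setting (assumed month)
    5: ((50, 150), (24, 25)),       # maturity (assumed month)
}


def deviation(value: float, bounds: Range) -> float:
    """Distance from value to the closed interval bounds (0 if inside)."""
    lo, hi = bounds
    if value < lo:
        return lo - value
    if value > hi:
        return value - hi
    return 0.0


def monthly_deviations(rain: Dict[int, float],
                       temp: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Per-month (rainfall deviation in mm, temperature deviation in °C)."""
    return {
        m: (deviation(rain[m], r_rng), deviation(temp[m], t_rng))
        for m, (r_rng, t_rng) in MAIZE_OPTIMA.items()
    }


# Hypothetical season (NOT the study's data).
season_rain = {1: 60.0, 2: 150.0, 3: 320.0, 4: 200.0, 5: 100.0}
season_temp = {1: 24.0, 2: 27.0, 3: 24.0, 4: 22.0, 5: 24.5}
print(monthly_deviations(season_rain, season_temp))
```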
4.2. Prediction and Models Performance
Predicting crop yield with machine learning is important because it provides information on productivity trends for decision making. In this study, three models were trained on the Irish potato and maize datasets in order to find the best yield predictor to recommend to system developers and to use in our future work. Although there are many climate-related predictors of crop yield, only rainfall and temperature were used because other variables, such as air humidity, soil moisture, and solar radiation, were unavailable. For Irish potato yield, the trained and tested models achieved RMSEs of 510.8, 740, and 971.6 kg ha⁻¹ for RF, SVR, and PR, respectively. For maize yield, the RMSEs were 129.9, 152.7, and 212.4 kg ha⁻¹ for RF, PR, and SVR, respectively. These results indicate that RF is the best model, as shown in Figure 10. The strength of the relationship between predicted and actual crop yields was evaluated using R², which was 0.875, 0.773, and 0.560 for Irish potatoes and 0.817, 0.716, and 0.549 for maize (RF, PR, and SVR, respectively).
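For reference, the two metrics reported above can be computed as follows. RMSE is expressed in the units of the target (kg ha⁻¹ here). The text describes R² as a measure of the relationship between predicted and actual yields; this sketch assumes the coefficient of determination, `1 - SS_res / SS_tot`, which in general differs from the squared Pearson correlation. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (e.g. kg ha^-1)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r2_score(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)                # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set yields in kg ha^-1 (NOT the study's data).
y_true = [11000.0, 9500.0, 12800.0, 10200.0]
y_pred = [10600.0, 9900.0, 12300.0, 10500.0]
print(round(rmse(y_true, y_pred), 1), round(r2_score(y_true, y_pred), 3))
```

Because RMSE is in absolute yield units, it is best compared across models for the same crop; R² is unitless and can be compared more directly across the two crops.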
These performance results indicate that RF is the model to recommend for developing an early crop yield prediction system. RF is also one of the most widely used MLMs in crop yield forecasting today [26,32,33,34]. Similar studies that used other models, such as SVR, ANN, CNN, DNN, and LSTM [13,35,36,37,38], have also achieved good performance. However, authors select MLMs and predictors (independent variables) for different reasons, such as the nature of the dataset, the type of dependent variable (target), the size of the dataset, and the availability of data. This study considered climatic variables because climate change has a critical impact on crop productivity: crops can be affected by climate change at various growth stages, which may later result in reduced yields. This is why we used rainfall and temperature organized by the crops' growth stages. Although the best MLM produced good predictions, this study was limited by the lack of data on other parameters such as air humidity, soil moisture, and solar radiation. Including these parameters could improve the prediction results. They will be considered in our next study, in which IoT devices will gather these climatic and hydraulic variables to be fed into the MLM to forecast crop yield.