Cascade Parallel Random Forest Algorithm for Predicting Rice Diseases in Big Data Analysis
Abstract
1. Introduction
1.1. Research Background and Meaning
1.2. Research Gap
1.3. Contribution
1.4. Organization of the Paper
2. Background and Related Work
2.1. Background of Rice Yield Prediction
2.2. Related Work
3. Materials and Methods
3.1. Problem Formulation
3.2. Evaluation Indexes and Validation Procedure
3.3. CPRF
4. CPRF Algorithm Optimization
Cascade Random Forest Algorithm
Algorithm 1: Balanced dimension reduction in the cascading process
Input:
  Zmaj: the majority class sample set; Zmin: the minority class sample set;
  z*: the imbalance coefficient;
  TPR*: the true positive rate used to set the threshold of each trained PRF;
  iTree: the number of trees to grow;
  minNode: the minimum node size to split.
Output:
Steps:
Step 0: Initialization: CPRF←Ø; l←1;
Step 1: Create an empty string array Zj
Step 2: Decide whether the training set needs a cascade (CPRF).
  IF |Zmaj| ≤ z*|Zmin|
    Train a PRF, denoted as PRF1, on Zmaj∪Zmin with the prescribed training parameters iTree and minNode; assign an initial threshold T1; go to Step 10
  END IF
Step 3: Train CPRFs.
  WHILE TRUE
    IF |Zmaj|>z*|Zmin|, randomly sample a subset Zl from Zmaj such that |Zl|=|Zmin|
    ELSE IF |Zmaj|>z*)|Zmin|
      then set |Zl||Zmaj|
    ELSE go to Step 2
    END IF
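The imbalance test of Step 2 and the balanced draw in the first branch of Step 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tagged-tuple samples, and the value z* = 2.0 are assumptions made for the example, and the threshold-based filtering between cascade layers is left out because the listing above does not show it.

```python
import random

def needs_cascade(z_maj, z_min, z_star):
    """Step 2 test: a cascade is needed only while |Zmaj| > z* * |Zmin|."""
    return len(z_maj) > z_star * len(z_min)

def draw_balanced_subset(z_maj, z_min, rng):
    """Step 3, first branch: random subset Zl of Zmaj with |Zl| = |Zmin|."""
    return rng.sample(z_maj, len(z_min))

def next_training_set(z_maj, z_min, z_star, rng):
    """Return the samples one PRF layer would be trained on.

    Assumes z_star >= 1, so that |Zmaj| > |Zmin| whenever a subset is drawn.
    """
    if not needs_cascade(z_maj, z_min, z_star):
        # Imbalance is tolerable: train a single PRF on Zmaj U Zmin.
        return list(z_maj) + list(z_min)
    # Imbalance is too large: pair all of Zmin with a balanced draw from Zmaj.
    return draw_balanced_subset(z_maj, z_min, rng) + list(z_min)

# Hypothetical data: 100 majority and 10 minority samples.
rng = random.Random(0)
z_maj = [("maj", i) for i in range(100)]
z_min = [("min", i) for i in range(10)]
layer = next_training_set(z_maj, z_min, z_star=2.0, rng=rng)
print(len(layer))
```

With these hypothetical sizes the layer holds ten majority and ten minority samples, so each PRF in the cascade would see a balanced training set.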
Algorithm 2: Using the CPRF algorithm to train the agricultural data in a cascading way
Input:
  Dj: the jth training dataset;
  Tyield: the yield table of CPRF;
  k: the number of important factors or variables selected by VI;
  m: the number of the selected feature variables.
Output:
  Zj: a set of m important feature variables of Dj;
  CPRFtrained: the trained CPRF model;
  Accuracy: the accuracy of the ensemble cascade parallel random forests algorithm.
Steps:
Step 1: for each feature variable tij in Dj do
    Calculate Entropy(tij) for each input feature factor;
    Calculate the gain G(tij)←Entropy(Dj) − Entropy(tij);
    Calculate the split information;
    Obtain the gain ratio;
  end for
Step 2: Compute the variable importance VI(tij) for each feature variable tij
Step 3: Sort the M feature variables in descending order by VI(tij);
  put the top n feature variables into Fj[0, …, n − 1];
  define c←0;
  for j = n to M − 1 do
    while c < (M − n) do
      select tij randomly from the remaining (M − n) feature variables;
      put tij into Fj[n + c];
      c←c+1;
    end while
  end for
  return Fj
  return CPRFtrained
Step 4: Determine the accuracy of the CPRF algorithm in comparison with other algorithms.
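Steps 1 to 3 of Algorithm 2 can be illustrated with a small gain-ratio ranking. The sketch below is hedged: it assumes discrete feature values, reads Entropy(tij) as the class entropy remaining after splitting on tij, and uses the gain ratio directly as the importance score. The feature names and labels are hypothetical, and the random fill of the remaining M − n slots is omitted.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of one discrete feature with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Class entropy remaining after splitting on the feature.
    remaining = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remaining
    # Split information is the entropy of the feature values themselves.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0

def rank_features(columns, labels):
    """Sort feature names in descending order of gain ratio."""
    scores = {name: gain_ratio(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: four samples, two candidate features.
labels = ["blast", "blast", "healthy", "healthy"]
columns = {
    "soil": ["clay", "sand", "clay", "sand"],
    "humidity": ["high", "high", "low", "low"],
}
ranking = rank_features(columns, labels)
print(ranking)
```

In this toy data the humidity column separates the two classes perfectly while the soil column carries no class information, so humidity ranks first.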
5. Experimental Results and Analysis
5.1. Selected Datasets
5.2. Precipitation and Temperature
5.3. Experimental Design of Rice Diseases
5.4. Advantages of CPRF Algorithm
5.5. Efficiency Comparison for Different Algorithms
5.6. Average Train_Time Comparison for Different Algorithms
5.7. Average Execution Time of Different Datasets
5.8. Accuracy Comparison of Different Algorithms
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Num | Abbreviation | Full Name |
---|---|---|
1 | RF | Random Forest |
2 | PRF | Parallel Random Forest |
3 | CPRF | Cascade Parallel Random Forest |
4 | MLRF | Machine Learning Random Forest |
5 | kNN | K-Nearest Neighbor |
6 | RYCC | Rice Yield Correlation Coefficient |
7 | CPRF-RY | Cascade Parallel Random Forest-Rice Yield |
Training Dataset | | | Agricultural Validation Dataset | | |
---|---|---|---|---|---|
Name | No. of Diseases | (numMin, numMaj) | Name | No. of Sequences | (numMin, numMaj) |
Train-dxs | 19,550 | (5619, 30,709) | Ytest95 | 95 | (1938, 16,319) |
 | | | RYtestset219 | 219 | (6098, 21,996) |
Classification | Predicted Positives | Predicted Negatives |
---|---|---|
Actual positives | True Positives (TP) | False Negatives (FN) |
Actual negatives | False Positives (FP) | True Negatives (TN) |
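The four cells of the confusion matrix above yield the usual evaluation indexes, including the true positive rate that Algorithm 1 uses as a threshold. The sketch below applies the standard textbook definitions; the function name and the counts are illustrative assumptions and are not taken from the paper's experiments.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard indexes derived from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all correct predictions
        "tpr": tp / (tp + fn),           # true positive rate (recall)
        "fpr": fp / (fp + tn),           # false positive rate
        "precision": tp / (tp + fp),     # share of predicted positives that are correct
    }

# Hypothetical counts for 100 samples.
m = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced dataset, accuracy alone can look high even when few minority samples are detected, which is why the true positive rate is tracked separately.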
The Algorithms | Accuracy Value |
---|---|
CPRF | 96.253% |
CRF | 92.321% |
Spark-MLRF | 86.159% |
Random Forest | 75.072% |
Non-Linear | 63.084% |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, L.; Xie, L.; Wang, Z.; Huang, C. Cascade Parallel Random Forest Algorithm for Predicting Rice Diseases in Big Data Analysis. Electronics 2022, 11, 1079. https://doi.org/10.3390/electronics11071079