Classification Efficacy Using K-Fold Cross-Validation and Bootstrapping Resampling Techniques on the Example of Mapping Complex Gully Systems
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data Acquisition and Pre-Processing
2.3. Gully Classification
2.4. Reference Data Collection and Accuracy Assessment
2.5. Statistical Analysis
3. Results
3.1. Spectral Bands, Land Cover Classes and Seasons as Determinants of Reflectance
3.2. Accuracy Assessment of Gully Mapping
3.3. Gully Distribution
4. Discussion
5. Conclusions
- Gullies were spectrally distinct from the other land cover classes in all bands of the PlanetScope images, in both the dry and the wet seasons.
- NDVI values did not differ among the land cover classes; thus, NDVI was not used in gully classification.
- Classification accuracy differed between the dry and wet seasons, but gully extraction was successful in both. RF outperformed the SVM algorithm in terms of overall accuracy (OA), but the OA differences were < 4%: larger in the dry season (3.5%) and smaller in the wet season (~1%).
- Based on the OAs, CV generally performed better than bootstrapping with the RF algorithm (differences of ~1.0–1.5%). At the class level, however, bootstrapping with RF provided the most accurate gully extraction in the dry season, whereas CV with SVM was effective in the wet season.
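
The two resampling schemes compared in these conclusions differ mainly in how training and test samples are drawn. The following is a minimal, standard-library sketch of that difference, not the authors' implementation: the function names `k_fold_splits` and `bootstrap_splits`, the sample count, the number of folds, and the number of bootstrap rounds are all illustrative assumptions.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def bootstrap_splits(n_samples, n_rounds, seed=0):
    """Yield (train_idx, oob_idx) pairs; training indices are drawn with replacement."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(train_idx)
        # Samples never drawn are "out-of-bag" and act as the test set.
        oob_idx = [j for j in range(n_samples) if j not in drawn]
        yield train_idx, oob_idx


# Hypothetical sizes, chosen only for illustration.
cv = list(k_fold_splits(n_samples=100, k=10))
boot = list(bootstrap_splits(n_samples=100, n_rounds=10))

for name, splits in (("k-fold CV", cv), ("bootstrap", boot)):
    unique_train = [len(set(train)) for train, _ in splits]
    print(name, "unique training samples per round:", unique_train)
```

In k-fold CV each round trains on a fixed share of distinct samples, while a bootstrap round trains on a sample of the same size as the data set that contains repeats, so fewer distinct samples are seen and the held-out set varies in size. A classifier such as RF or SVM would be fitted on `train_idx` and scored on the held-out indices in each round.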
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Variables | SS | df | F | p | Partial ω² |
---|---|---|---|---|---|
Model | 6.99 × 10⁹ | 55 | 860.4 | <0.001 | 0.923 |
Bands | 1.00 × 10⁹ | 3 | 2256.1 | <0.001 | 0.633 |
Season | 3.80 × 10⁹ | 1 | 25,715.0 | <0.001 | 0.868 |
Class | 9.79 × 10⁸ | 6 | 1104.2 | <0.001 | 0.629 |
Bands × Season | 4.48 × 10⁸ | 3 | 1010.0 | <0.001 | 0.436 |
Bands × Class | 5.30 × 10⁸ | 18 | 199.3 | <0.001 | 0.477 |
Season × Class | 9.62 × 10⁷ | 6 | 108.5 | <0.001 | 0.141 |
Bands × Season × Class | 1.34 × 10⁸ | 18 | 50.3 | <0.001 | 0.185 |
Residuals | 5.70 × 10⁸ | 3860 | | | |
Total | 2.96 × 10¹⁰ | 3916 | | | |
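
The F and effect-size columns of the ANOVA table above can be approximately recomputed from the SS and df columns. The sketch below assumes the standard fixed-effects estimator of partial omega squared and assumes a sample size of 3916, read off the Total df row; the page does not state which formula or software produced the published values, so this is a consistency check rather than the authors' procedure.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic: mean square of the effect over mean square of the residuals."""
    return (ss_effect / df_effect) / (ss_error / df_error)


def partial_omega_squared(ss_effect, df_effect, ss_error, df_error, n_total):
    """Partial omega squared computed from sums of squares."""
    ms_error = ss_error / df_error
    numerator = ss_effect - df_effect * ms_error
    denominator = ss_effect + (n_total - df_effect) * ms_error
    return numerator / denominator


# Residual row of the table.
SS_ERROR, DF_ERROR = 5.70e8, 3860
# Assumption: number of observations, taken from the Total df row.
N_TOTAL = 3916

# (SS, df) for the three main effects, as printed in the table.
effects = {"Bands": (1.00e9, 3), "Season": (3.80e9, 1), "Class": (9.79e8, 6)}

for name, (ss, df) in effects.items():
    f_value = f_ratio(ss, df, SS_ERROR, DF_ERROR)
    omega = partial_omega_squared(ss, df, SS_ERROR, DF_ERROR, N_TOTAL)
    print(f"{name}: F = {f_value:.1f}, partial omega squared = {omega:.3f}")
```

Because the SS values are printed to three significant figures, the recomputed statistics agree with the table only up to rounding.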
Band | Dry Season Importance Ranking (%) | Wet Season Importance Ranking (%) |
---|---|---|
NIR | 31 | 35 |
Red | 26 | 32 |
Green | 25 | 21 |
Blue | 17 | 12 |
Algorithm | Area (ha) | Standard Error (ha) | ± 95% CI (ha) | PA (%) | UA (%) | F1-Score |
---|---|---|---|---|---|---|
rf-d-b | 88 | 6.1 | 14.4 | 83.6 | 90.6 | 0.92 |
rf-d-cv | 91.3 | 7.6 | 17.1 | 76.3 | 89.3 | 0.91 |
rf-w-cv | 54.6 | 11.3 | 24.3 | 47.9 | 77.9 | 0.82 |
rf-w-b | 55.2 | 11.5 | 25.0 | 46.8 | 77 | 0.82 |
svm-d-cv | 32.6 | 10.1 | 21.1 | 35.4 | 92.3 | 0.86 |
svm-d-b | 31.1 | 10.5 | 21.8 | 32.5 | 93.4 | 0.85 |
svm-w-cv | 57.2 | 3.7 | 18.8 | 89.2 | 81 | 0.88 |
svm-w-b | 57.4 | 6.4 | 19.3 | 74.1 | 79.4 | 0.86 |
Error | Bootstrap (Resampling) | k-Fold CV (Resampling) | RF (Classifier) | SVM (Classifier) | Dry (Season) | Wet (Season) |
---|---|---|---|---|---|---|
Commission (%) | 40.8 | 37.8 | 36.4 | 42.2 | 43.1 | 35.5 |
Omission (%) | 14.9 | 14.9 | 16.3 | 13.5 | 8.6 | 21.2 |
Standard error (ha) | 8.6 | 8.2 | 9.1 | 7.7 | 8.6 | 8.2 |
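
The summary table above appears to be obtainable from the per-algorithm table by averaging its rows within each group. The sketch below illustrates that reading and is not a statement of the authors' procedure. It assumes the row labels encode classifier (`rf`/`svm`), season (`d`/`w`), and resampling technique (`b`/`cv`); the helper `group_summary` is hypothetical. Under this reading, the row printed as Commission corresponds to 100 minus the mean PA and the row printed as Omission to 100 minus the mean UA. In the common convention 100 − PA is the omission error and 100 − UA the commission error, so the result keys below are named neutrally.

```python
# Rows transcribed from the per-algorithm table: (label, SE in ha, PA %, UA %).
ROWS = [
    ("rf-d-b", 6.1, 83.6, 90.6),
    ("rf-d-cv", 7.6, 76.3, 89.3),
    ("rf-w-cv", 11.3, 47.9, 77.9),
    ("rf-w-b", 11.5, 46.8, 77.0),
    ("svm-d-cv", 10.1, 35.4, 92.3),
    ("svm-d-b", 10.5, 32.5, 93.4),
    ("svm-w-cv", 3.7, 89.2, 81.0),
    ("svm-w-b", 6.4, 74.1, 79.4),
]


def group_summary(token, position):
    """Average rows whose label has `token` at `position`
    (0 = classifier, 1 = season, 2 = resampling technique)."""
    picked = [row for row in ROWS if row[0].split("-")[position] == token]
    n = len(picked)
    mean_se = sum(row[1] for row in picked) / n
    mean_pa = sum(row[2] for row in picked) / n
    mean_ua = sum(row[3] for row in picked) / n
    return {
        "se": mean_se,
        "100_minus_pa": 100 - mean_pa,
        "100_minus_ua": 100 - mean_ua,
    }


summary = {
    "Bootstrap": group_summary("b", 2),
    "k-Fold CV": group_summary("cv", 2),
    "RF": group_summary("rf", 0),
    "SVM": group_summary("svm", 0),
    "Dry": group_summary("d", 1),
    "Wet": group_summary("w", 1),
}

for name, values in summary.items():
    print(
        f"{name}: 100 - PA = {values['100_minus_pa']:.1f}, "
        f"100 - UA = {values['100_minus_ua']:.1f}, SE = {values['se']:.1f}"
    )
```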
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Phinzi, K.; Abriha, D.; Szabó, S. Classification Efficacy Using K-Fold Cross-Validation and Bootstrapping Resampling Techniques on the Example of Mapping Complex Gully Systems. Remote Sens. 2021, 13, 2980. https://doi.org/10.3390/rs13152980