Article

Evaluation of the Habitat Suitability for Zhuji Torreya Based on Machine Learning Algorithms

1 School of Ecology and Applied Meteorology, Nanjing University of Information Science and Technology, Nanjing 210044, China
2 Nanning Meteorological Bureau, Nanning 530029, China
3 Fujian Provincial Climate Center, Fuzhou 350025, China
4 Heilongjiang Provincial Climate Center, Harbin 150030, China
5 Zhuji Meteorological Bureau, Zhuji 311800, China
6 Key Laboratory of Transportation Meteorology of China Meteorological Administration, Nanjing Joint Institute for Atmospheric Sciences, Nanjing 210041, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(7), 1077; https://doi.org/10.3390/agriculture14071077
Submission received: 4 June 2024 / Revised: 30 June 2024 / Accepted: 2 July 2024 / Published: 4 July 2024

Abstract

Torreya, with its dual roles in both food and medicine, has faced multiple challenges in its cultivation in Zhuji city due to frequent global climate disasters in recent years. Therefore, conducting a study on suitable zoning for Torreya habitats based on climatic, topographic, and soil factors is highly important. In this study, we utilized the latitude and longitude coordinates of Torreya distribution points and ecological factor raster data. We thoroughly analyzed the ecological environmental characteristics of the climate, topography, and soil at Torreya distribution points via both physical modeling and machine learning methods. Zhuji city was classified into suitable, moderately suitable, and unsuitable zones to determine regions conducive to Torreya growth. The results indicate that suitable zones for Torreya cultivation in Zhuji city are distributed mainly in mountainous and hilly areas, while unsuitable zones are found predominantly in central basins and northern river plain networks. Moderately suitable zones are located in transitional areas between suitable and unsuitable zones. Compared to climatic factors, soil and topographic factors more significantly restrict Torreya cultivation. Machine learning algorithms can also achieve suitability zoning with a more concise and efficient classification process. In this study, the random forest (RF) algorithm demonstrated greater predictive accuracy than the support vector machine (SVM) and naive Bayes (NB) algorithms, achieving the best classification results.

1. Introduction

In contemporary society, as people increasingly pursue healthy and sustainable lifestyles, traditional forestry industries are receiving more attention, which creates new development opportunities [1]. Among these, Torreya, known as the “King of Nuts”, is one of China’s ancient forestry resources and serves multiple purposes, including food, medicine, oil, timber, ornamental use, and environmental protection [2,3]. In recent years, Torreya has gained prominence in the market as a healthy snack nut, attracting widespread consumer favor and sustaining market demand [4,5]. The Torreya industry plays a crucial role in poverty alleviation and farmer prosperity in the Kuaiji Mountain region of Zhejiang Province [6,7]; with an average income of approximately 11,600 yuan per acre, Torreya is considered one of the most profitable tree species per unit area. Natural conditions such as climate, topography, and soil influence the growth of forests and crops [8]. For example, Torreya is a neutral to slightly shade-tolerant tree species that adapts well to warm, humid, high-altitude regions [9]. Therefore, in this study, we identified the most suitable areas for Torreya cultivation in Zhuji city by analyzing the topographic, climatic, soil, and other environmental factors of Torreya planting areas across Zhuji city.
Scholars in China and abroad have conducted a series of related studies using two main methodological approaches: physical modeling and machine learning [10,11,12]. Traditional physical modeling methods typically rely on linear relationships among ecological factors and may overlook nonlinear and complex interactions [13,14]. Machine learning methods can identify nonlinear relationships among multiple factors and thus determine suitability ranges more accurately [15,16]. Compared with traditional physical modeling methods, random forest (RF) and support vector machine (SVM) methods offer significant advantages in classification. RF can handle large, high-dimensional feature sets, performs well on large datasets, and achieves high accuracy and robustness, whereas SVMs are effective at handling nonlinear data and complex decision boundaries [17,18]. Because of their strengths in assessing the importance of feature variables and building predictive models, machine learning algorithms have been widely applied in agriculture, including studies on crop climate suitability, yield simulation, and agricultural risk assessment. For instance, Abel Chemura et al. [19] used extreme gradient boosting with yield data and agronomic variables to model the current climate suitability of maize, sorghum, cassava, and groundnut in Ghana, and used climate projections for the 2050s under two greenhouse gas emission scenarios to forecast changes in the suitable ranges of these crops. Guoyong Leng et al. [20] compared process-based, regression, and machine learning models for simulating US maize yields; machine learning better reproduced yield variability and extreme conditions, enabling more efficient probabilistic risk analysis of climate impacts on crop production. Ying Han et al. [21] constructed a frost risk assessment model for tea in Hangzhou using the RF algorithm. Using indicators such as minimum temperature, altitude, tea planting area, and tea yield, they assessed the frost risk for tea varieties with different cold resistance to guide local agricultural production and provide disaster warnings.
However, most suitability zoning studies rely on traditional physical modeling methods, and machine learning algorithms such as RF are rarely used. Therefore, in this study, based on Torreya samples and three categories of raster data (climate, topography, and soil), we used the RF feature importance attribute ‘feature_importances_’ to evaluate the importance of each feature variable and to construct and select features. By comparing different algorithms, we identified the optimal features and model for rapidly and accurately zoning Torreya suitability in Zhuji city.

2. Data and Methods

2.1. Research Area

As one of China’s major Torreya production areas, Zhuji city has a history of Torreya cultivation spanning more than 1300 years. The cultivated area of Torreya reaches 67 km², and the city has 41,000 Torreya trees more than a hundred years old. Torreya production in Zhuji city is concentrated mainly in the southeastern region, which includes Zhaojia town, Dongbaihu town, Donghe township, Fengqiao town, Huangshan town, and Chenzhai town. These areas have abundant heat resources conducive to the growth and development of Torreya, and adequate precipitation ensures a sufficient water supply for the planting areas. The soils are mainly yellow-red soil, mountain red soil, yellow soil, and red soil, which are weakly acidic and favorable for Torreya root growth and nutrient absorption. Moreover, the existing planting areas are mostly on hilly terrain. The location and administrative divisions of Zhuji city are shown in Figure 1.

2.2. Data

2.2.1. Data Introduction

The data used in this study included climate, terrain, soil, land use, and Torreya sample data. Climate data were obtained from the Zhuji Meteorological Bureau and comprised records from 47 regional automatic stations and 1 national basic station from 2006 to 2020, covering daily average temperature (°C), maximum temperature (°C), precipitation (mm), and average relative humidity (%). The terrain data (elevation (m), slope (°), and aspect) were extracted from a 30 m resolution digital elevation model (DEM) of Zhuji city acquired from the Geographic Spatial Data Cloud Platform. The soil data (soil type, soil layer thickness (m), and soil pH) were obtained from the Basic Attributes Data Set of the China High-resolution National Soil Information Grid of the National Tibetan Plateau Data Center. The land use data were obtained from the National Earth System Science Data Center. We used GPS instruments to record the latitude and longitude of ancient Torreya trees, yielding approximately 1600 Torreya sample points: approximately 1200 in the suitable growth category and approximately 400 in the moderately suitable growth category. In addition, approximately 800 sample points for the unsuitable growth category were randomly generated within the urban area of Zhuji.

2.2.2. Data Processing

In our study, we conducted quality checks and comparisons on soil, terrain, and land use data. During this process, various data sources were used and cross-validated to ensure the accuracy and reliability of the data. Specifically, official data provided by local governments and relevant departments were compared with each other and with the data in the existing studies and literature to validate the data consistency and correctness [13,14].
To spatialize the meteorological data, statistical methods were combined with geographic information system (GIS) technology. For all climate factors except potential evapotranspiration (e.g., annual average temperature, annual accumulated temperature above 10 °C, annual effective accumulated temperature above 10 °C, and annual precipitation), a mixed interpolation method was used. This method combines a multivariate regression model with spatial interpolation of the residuals to establish a spatial estimation model for meteorological data. We used the longitude, latitude, and elevation of the meteorological stations to construct multivariate regression models for the meteorological elements in SPSS software (v.22). Because regression is an inexact interpolator, the differences between the values estimated at the stations and the observed values were treated as residuals, which are typically influenced by small-scale terrain variations. These residuals were then interpolated spatially using the inverse distance weighting (IDW) method in GIS and added back to the regression estimates. For potential evapotranspiration, IDW interpolation was applied directly. We used climate data from 48 meteorological stations in Zhuji city from 2006 to 2020; data from 85% of the stations were used for spatial interpolation, and the remaining 15% were used to validate the accuracy of the interpolation results.
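The mixed interpolation described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' SPSS/GIS workflow: it assumes ordinary least squares on longitude, latitude, and elevation, an IDW power of 2, distances computed directly in degrees (a projected coordinate system would be preferable in practice), and synthetic station data invented for the example.

```python
import numpy as np


def _design(lon, lat, elev):
    # Design matrix for value ~ b0 + b1*lon + b2*lat + b3*elev
    return np.column_stack([np.ones_like(lon), lon, lat, elev])


def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse distance weighting; returns the known value at coincident points."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for q, dq in enumerate(d):
        hit = dq == 0.0
        if hit.any():
            out[q] = z_known[hit][0]
        else:
            w = dq ** -power
            out[q] = np.sum(w * z_known) / np.sum(w)
    return out


def mixed_interpolation(stations, values, targets, power=2.0):
    """stations, targets: arrays of shape (n, 3) with columns lon, lat, elev."""
    X_st = _design(*stations.T)
    coef, *_ = np.linalg.lstsq(X_st, values, rcond=None)  # regression trend
    residuals = values - X_st @ coef                         # small-scale terrain effects
    trend = _design(*targets.T) @ coef
    return trend + idw(stations[:, :2], residuals, targets[:, :2], power)


# Synthetic example: 48 hypothetical stations, 85% for interpolation, 15% for validation
rng = np.random.default_rng(0)
n = 48
stations = np.column_stack([
    rng.uniform(120.0, 120.6, n),   # longitude (hypothetical extent)
    rng.uniform(29.4, 30.0, n),     # latitude
    rng.uniform(5.0, 800.0, n),     # elevation (m)
])
temp = 17.0 - 0.006 * stations[:, 2] + rng.normal(0.0, 0.1, n)  # invented lapse-rate signal

idx = rng.permutation(n)
n_fit = int(round(0.85 * n))            # 41 stations for interpolation
fit, val = idx[:n_fit], idx[n_fit:]     # remaining 7 for validation
pred = mixed_interpolation(stations[fit], temp[fit], stations[val])
rmse = float(np.sqrt(np.mean((pred - temp[val]) ** 2)))
print(f"validation RMSE: {rmse:.3f} °C")
```

Because the residuals are interpolated exactly at the stations, the combined surface reproduces the observed station values, while the regression term carries the elevation signal into areas without stations.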

2.3. Methods

2.3.1. Growth Climate Indicators

Based on a literature review, several growth climate indices were introduced in this study: Kira’s Warmth Index ($WI$) [22], Holdridge’s biotemperature ($BT$) and potential evapotranspiration rate ($PER$) [23], and Xu Wenduo’s Humidity Index ($HI$) [24].
(1) Kira’s Warmth Index ($WI$):
$$WI = \sum_{i=1}^{12} (t_i - 5) \tag{1}$$
where $WI$ represents the warmth index and $t_i$ is the monthly average temperature (°C), counted only for months in which it exceeds 5 °C.
(2) Holdridge’s biotemperature ($BT$) and potential evapotranspiration rate ($PER$):
$$BT = \frac{1}{12}\sum_{i=1}^{12} T_i \tag{2}$$
$$PER = \frac{58.93\, BT}{P} \tag{3}$$
where $BT$ represents the biotemperature, which reflects the temperature range within which organisms can sustain life activities; $T_i$ denotes the average temperature of month $i$, restricted to 0–30 °C (a monthly average below 0 °C is set to 0, and one above 30 °C is set to 30); and $P$ is the annual precipitation (mm).
(3) Xu Wenduo’s Humidity Index ($HI$):
$$HI = \frac{P}{WI} \tag{4}$$
where $P$ represents the annual precipitation (mm) and $WI$ represents the warmth index (°C·month).
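The four indices can be computed from twelve monthly mean temperatures and annual precipitation, as in the following Python sketch. The monthly temperatures and precipitation in the example are invented for illustration, and the function names are hypothetical.

```python
import numpy as np


def warmth_index(monthly_t):
    """Kira's WI: sum of (t_i - 5) over months with t_i > 5 °C."""
    t = np.asarray(monthly_t, dtype=float)
    return float(np.sum(t[t > 5.0] - 5.0))


def biotemperature(monthly_t):
    """Holdridge's BT: monthly means clipped to 0-30 °C, averaged over 12 months."""
    t = np.clip(np.asarray(monthly_t, dtype=float), 0.0, 30.0)
    return float(t.sum() / 12.0)


def pe_rate(bt, annual_p):
    """Holdridge's PER: 58.93 * BT / P, with P in mm."""
    return 58.93 * bt / annual_p


def humidity_index(annual_p, wi):
    """Xu Wenduo's HI: P / WI."""
    return annual_p / wi


# Hypothetical monthly mean temperatures (°C) and annual precipitation (mm)
temps = [4, 6, 10, 16, 21, 25, 31, 29, 24, 18, 12, 6]
precip = 1400.0
wi = warmth_index(temps)        # 143.0 (January's 4 °C does not contribute)
bt = biotemperature(temps)      # 16.75 (July's 31 °C is clipped to 30 °C)
per = pe_rate(bt, precip)
hi = humidity_index(precip, wi)
print(wi, bt, round(per, 3), round(hi, 2))
```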

2.3.2. Physical Model Calculation Method for Climate and Habitat Zoning in Torreya

The physical modeling approach is based on the latitude and longitude coordinates of Torreya and ecological factor raster data. Spatial analysis and mathematical statistics were used to comprehensively analyze the climate, soil, terrain, and other ecological environmental characteristics at the Torreya distribution points (DPs). This analysis identified the significant climate variables and ecological factor indicators affecting Torreya growth, together with their weights. The suitability of Torreya habitats was then quantitatively evaluated, and planting zones were delineated.
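The single-factor suitability is calculated using the following formula:
$$NF_i = \begin{cases} \min\left(\dfrac{x_i}{L_i^{\min}},\ 1,\ \dfrac{L_i^{\max}}{x_i}\right), & S_i^{\min} \le x_i \le S_i^{\max} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
where $NF_i$ represents the suitability value of the $i$-th ecological factor; $L_i^{\min}$, $L_i^{\max}$, and $x_i$ denote the minimum optimal value, maximum optimal value, and measured value of the $i$-th ecological factor, respectively; and $S_i^{\min}$ and $S_i^{\max}$ represent the lower and upper limits of the factor’s suitable range, respectively. Thus, $NF_i = 1$ within the optimal range $[L_i^{\min}, L_i^{\max}]$, decreases toward the limits of the suitable range outside it, and is 0 beyond the suitable range. A tiered assignment method is adopted to determine the suitability values of soil types.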
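Equation (5) can be applied cell by cell to a raster with NumPy, as sketched below. The pH thresholds in the example are hypothetical values chosen only to exercise the three branches, not the thresholds used in this study; the sketch also assumes strictly positive factor values, so it does not cover categorical factors such as soil type, which are scored by tiered assignment.

```python
import numpy as np


def single_factor_suitability(x, l_min, l_max, s_min, s_max):
    """NF_i = min(x/L_min, 1, L_max/x) inside [S_min, S_max], 0 otherwise.

    x may be a scalar or a raster array of strictly positive factor values.
    """
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        nf = np.minimum(np.minimum(x / l_min, 1.0), l_max / x)
    inside = (x >= s_min) & (x <= s_max)
    return np.where(inside, nf, 0.0)


# Hypothetical soil pH thresholds: optimal 5.0-6.5, suitable 4.5-7.5
ph = np.array([6.0, 4.8, 7.0, 8.0])
nf_ph = single_factor_suitability(ph, l_min=5.0, l_max=6.5, s_min=4.5, s_max=7.5)
print(nf_ph)
```

The four sample values fall in the optimal range, below it, above it, and outside the suitable range, respectively.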
Using Equation (5), the suitability index of each evaluation unit is calculated separately for each single factor, with values ranging from 0 to 1. The comprehensive suitability is then calculated using the weighted index summation method:
$$S_j = \sum_{i=1}^{n} w_{ij}\, NF_i \tag{6}$$
where $i$ and $j$ denote the ecological factors and evaluation units, respectively; $n$ is the number of ecological factors; the weights $w_{ij}$ are obtained through the analytic hierarchy process (AHP); and $S_j$ is the comprehensive suitability value of each evaluation unit. Based on the evaluation model results, the single-factor suitability values were aggregated to produce a distribution map of Torreya habitat suitability and the subsequent planting zoning.
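The AHP weights are commonly derived from the principal eigenvector of a pairwise comparison matrix and checked with Saaty's consistency ratio (CR). The sketch below illustrates that approach together with the weighted sum. It assumes a single weight per factor (the same for every evaluation unit) and uses an invented, perfectly consistent comparison matrix for three factors; it is not the comparison matrix used in this study.

```python
import numpy as np

# Saaty's random consistency index (RI) by matrix order n
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Return (weights, consistency ratio) from a reciprocal pairwise matrix."""
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))            # principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                                # normalize weights to sum to 1
    ci = (eigvals[k].real - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0             # CR < 0.1 is the usual acceptance rule
    return w, cr


def composite_suitability(nf_stack, weights):
    """S_j = sum_i w_i * NF_i for every unit j; nf_stack has shape (n_factors, ...)."""
    return np.tensordot(weights, nf_stack, axes=1)


# Invented, perfectly consistent comparisons built from target weights 0.5 / 0.3 / 0.2
target = np.array([0.5, 0.3, 0.2])
pairwise = target[:, None] / target[None, :]
weights, cr = ahp_weights(pairwise)

# NF values of three factors (rows) for two evaluation units (columns)
nf = np.array([[1.0, 0.5],
               [0.8, 0.0],
               [0.6, 1.0]])
print(weights, cr, composite_suitability(nf, weights))
```

Because `nf_stack` may carry any trailing shape, the same call works for a full raster stack of shape (factors, rows, columns).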

2.3.3. Machine Learning Methods for Climate and Habitat Zoning in Torreya

  • Algorithm selection
To compare and validate the classification performance of machine learning models in suitability zoning, we used the SVM algorithm in addition to the RF algorithm for mutual comparison and validation; the naive Bayes (NB) algorithm was also included in the comparison for climate suitability zoning.
  • Sample selection partition
To determine the distribution of the main Torreya samples in Zhuji city, we used the reference sample points provided by the Zhuji Meteorological Bureau and the forestry resource system of the Zhuji Natural Resources and Planning Bureau. To meet the research requirements, Torreya planting suitability in Zhuji city was classified into three categories: suitable, moderately suitable, and unsuitable. Approximately 800 unsuitable sample points were randomly generated in the urban area of Zhuji. Because ecological conditions differ between moderately suitable and suitable areas, and to avoid simply dividing Zhuji city into suitable and unsuitable areas, we divided the Torreya sample points into suitable and moderately suitable categories based on the optimal and suitable ranges of the indicator factors. This yielded approximately 1200 suitable and approximately 400 moderately suitable sample points, for a total of approximately 2400 sample points across the study area. Before model training, 100 sample points were randomly extracted from each of the three categories to validate the results. To ensure the accuracy of model training and testing, the samples were randomly split, with 75% used as the training set and the remaining 25% as the testing set, to establish the RF and SVM models. The sample distribution is shown in Figure 2.
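The sample partition can be sketched with scikit-learn's `train_test_split`. The sketch assumes that the 100 validation points per class are removed before the 75%/25% split, uses invented integer class labels (0 = suitable, 1 = moderately suitable, 2 = unsuitable) with random placeholder features, and performs a plain (unstratified) random split as described in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def partition_samples(X, y, n_holdout_per_class=100, test_size=0.25, seed=42):
    """Hold out n points per class for validation, then split the rest 75/25."""
    rng = np.random.default_rng(seed)
    holdout = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_holdout_per_class, replace=False)
        for c in np.unique(y)
    ])
    keep = np.ones(len(y), dtype=bool)
    keep[holdout] = False
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[keep], y[keep], test_size=test_size, random_state=seed)
    return (X_tr, y_tr), (X_te, y_te), (X[holdout], y[holdout])


# Approximate class sizes from the text; features are random placeholders
y = np.repeat([0, 1, 2], [1200, 400, 800])
X = np.random.default_rng(0).normal(size=(len(y), 16))
train, test, holdout = partition_samples(X, y)
print(len(train[1]), len(test[1]), np.bincount(holdout[1]))
```

Passing `stratify=y[keep]` to `train_test_split` would preserve the class proportions in both sets, which may matter for the smaller moderately suitable class.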
  • Classification feature evaluation
To enhance the effectiveness of Torreya suitability zoning, we comprehensively considered the habitat characteristics of the study area and preselected multiple feature variables. However, not all preselected variables improve the classification of Torreya suitability zones; variables that do not contribute can lengthen algorithm runtime and reduce classification accuracy. Therefore, before classification, we assessed the importance of each feature variable and used the test set accuracy to identify the feature subset with the highest accuracy, thereby improving model precision. Specifically, we used the RF feature importance attribute “feature_importances_” to score each feature variable and sorted the variables in descending order of importance. We then applied sequential forward feature selection: starting from zero features, variables were added to the model one at a time in order of importance to maximize the score of the feature combination. The importance of each indicator is calculated as follows:
$$P_k = \frac{\sum_{i=1}^{n}\sum_{j=1}^{t} DG_{kij}}{\sum_{k=1}^{m}\sum_{i=1}^{n}\sum_{j=1}^{t} DG_{kij}} \times 100\% \tag{7}$$
where $m$, $n$, and $t$ represent the total number of indicators, the number of classification trees, and the number of nodes in a single tree, respectively; $DG_{kij}$ denotes the decrease in the Gini index produced by the $k$-th indicator at the $j$-th node of the $i$-th tree; and $P_k$ represents the importance of the $k$-th indicator.
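The importance formula and the forward selection step can be sketched with scikit-learn. This is an illustrative implementation on synthetic data: it reads the fitted trees' internal arrays (`tree_.impurity`, `tree_.weighted_n_node_samples`, `children_left`, `children_right`, `feature`) and takes $DG_{kij}$ to be the sample-weighted Gini decrease at each split, as scikit-learn does. The built-in `feature_importances_` normalizes each tree before averaging, so its values can differ slightly from the pooled sum in the formula above. Forward selection here scores each subset on the test set, mirroring the test set accuracy curve used for feature selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def pooled_gini_importance(forest, n_features):
    """P_k: Gini decreases summed over all trees and nodes, as % of the grand total."""
    totals = np.zeros(n_features)
    for est in forest.estimators_:
        t = est.tree_
        w = t.weighted_n_node_samples
        for node in range(t.node_count):
            left, right = t.children_left[node], t.children_right[node]
            if left == -1:                 # leaf node: no split, no decrease
                continue
            dg = (w[node] * t.impurity[node]
                  - w[left] * t.impurity[left]
                  - w[right] * t.impurity[right])
            totals[t.feature[node]] += dg
    return 100.0 * totals / totals.sum()


def forward_selection(X_tr, y_tr, X_te, y_te, ranking, seed=0):
    """Add features in ranked order; keep the prefix with the best test accuracy."""
    scores = []
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X_tr[:, cols], y_tr)
        scores.append(rf.score(X_te[:, cols], y_te))
    best_k = int(np.argmax(scores)) + 1
    return ranking[:best_k], scores


# Synthetic stand-in for the 16 candidate features and 3 suitability classes
X, y = make_classification(n_samples=600, n_features=16, n_informative=6,
                           n_redundant=4, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

p_k = pooled_gini_importance(forest, X.shape[1])
ranking = np.argsort(p_k)[::-1]            # descending importance
best_cols, scores = forward_selection(X_tr, y_tr, X_te, y_te, ranking)
print("ranking:", ranking)
print("best subset size:", len(best_cols), "accuracy:", max(scores))
```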
  • Model parameter optimization
To improve the performance of the RF model, we randomly split the dataset into training and testing sets at a ratio of 3:1 and trained the model with the sklearn library. For parameter optimization, we combined random search cross-validation and grid search cross-validation, optimizing both the framework parameters and the decision tree parameters of the RF model. Specifically, we set the number of trees to 100, the maximum number of features to “auto”, the minimum number of samples required to split a node to 2, and the minimum number of samples required at a leaf node to 1. These settings aimed to improve the performance and generalization ability of the RF model, enabling better adaptation to real-world datasets and higher prediction accuracy.
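A two-stage search of this kind (randomized search over a broad space, then a grid search around the best candidate) can be sketched with scikit-learn's `RandomizedSearchCV` and `GridSearchCV`. The search space and the synthetic data are assumptions for illustration. Note that `max_features="auto"` was deprecated in scikit-learn 1.1 and removed in 1.3; for classifiers it was equivalent to `"sqrt"`, which the sketch uses instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Synthetic stand-in for the nine selected features and three classes
X, y = make_classification(n_samples=400, n_features=9, n_informative=5,
                           n_classes=3, random_state=1)
# Training:testing = 3:1
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Stage 1: randomized search over a broad (assumed) parameter space
coarse = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_distributions={
        "n_estimators": [50, 100],
        "max_features": ["sqrt", "log2", None],
        "min_samples_split": [2, 4, 8],
        "min_samples_leaf": [1, 2, 4],
    },
    n_iter=6, cv=3, random_state=1)
coarse.fit(X_tr, y_tr)
best = coarse.best_params_

# Stage 2: grid search in the neighbourhood of the randomized-search optimum
fine = GridSearchCV(
    RandomForestClassifier(random_state=1, n_estimators=best["n_estimators"],
                           max_features=best["max_features"]),
    param_grid={
        "min_samples_split": sorted({2, best["min_samples_split"]}),
        "min_samples_leaf": sorted({1, best["min_samples_leaf"]}),
    },
    cv=3)
fine.fit(X_tr, y_tr)
print(best, fine.best_params_, round(fine.score(X_te, y_te), 3))
```

The randomized stage keeps the number of fits small when the space is large; the grid stage then exhaustively checks a few values near the best candidate.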
Similarly, to ensure an adequate amount of data for training and evaluating the model, the ratio of the testing set to the training set in the SVM model study was set to 1:3. Regarding the parameter settings for the SVM model, we opted not to specify the maximum depth. A linear kernel function was used, and the regularization parameter C was set to 1.
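The SVM configuration described above corresponds to scikit-learn's `SVC(kernel="linear", C=1.0)`. The sketch below compares it with RF and NB on synthetic data. The Gaussian variant of naive Bayes (`GaussianNB`) is an assumption, since the NB variant is not stated, and no feature scaling is applied even though SVMs often benefit from standardized inputs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=9, n_informative=5,
                           n_classes=3, random_state=2)
# Testing:training = 1:3
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=2)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=2),
    "SVM": SVC(kernel="linear", C=1.0),   # linear kernel, regularization C = 1
    "NB": GaussianNB(),                   # assumed NB variant
}
accuracy = {name: model.fit(X_tr, y_tr).score(X_te, y_te) for name, model in models.items()}
print(accuracy)
```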

2.3.4. Machine Learning Model Accuracy Evaluation Index

Overall accuracy (OA) measures the proportion of correctly classified pixels among all classified pixels and thus assesses overall classification accuracy. User accuracy (UA) evaluates the classification performance for each class as the proportion of pixels assigned to that class that actually belong to it. The kappa coefficient measures the agreement between the classification results and the reference data while accounting for agreement expected by chance, which makes it a more comprehensive evaluation metric. Another important metric is the F1-score (F1 value), the harmonic mean of precision and recall, which provides a balanced evaluation of model performance when positive and negative samples are imbalanced. It ranges from 0 to 1, and values closer to 1 indicate a better balance between precision and recall and thus better overall performance.
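These metrics can all be derived from a confusion matrix. The sketch below computes OA, per-class UA, the kappa coefficient, and per-class F1 by hand and cross-checks kappa and F1 against scikit-learn. The labels are a tiny invented example, not the study's validation data, and the sketch assumes that rows of the confusion matrix are reference classes and columns are predicted classes, as in scikit-learn.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, f1_score


def accuracy_report(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: reference, cols: predicted
    total = cm.sum()
    oa = np.trace(cm) / total                        # overall accuracy
    ua = np.diag(cm) / cm.sum(axis=0)                # user's accuracy (precision)
    pa = np.diag(cm) / cm.sum(axis=1)                # producer's accuracy (recall)
    p_e = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2  # chance agreement
    kappa = (oa - p_e) / (1 - p_e)
    f1 = 2 * ua * pa / (ua + pa)                     # harmonic mean of precision and recall
    return oa, ua, kappa, f1


y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
oa, ua, kappa, f1 = accuracy_report(y_true, y_pred, labels=[0, 1, 2])
print(oa, ua, round(kappa, 4), f1)
print(np.isclose(kappa, cohen_kappa_score(y_true, y_pred)),
      np.allclose(f1, f1_score(y_true, y_pred, labels=[0, 1, 2], average=None)))
```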

2.4. Technical Process

The input data include 10 climate indicators: annual mean temperature, ≥10 °C annual active accumulated temperature, ≥10 °C annual effective accumulated temperature, number of days with daily average temperature ≥ 35 °C, annual precipitation, annual average relative humidity, warmth index, biotemperature, potential evapotranspiration rate, and humidity index. They also include 6 soil and topographic indicators: soil pH, soil depth, soil type, slope, aspect, and elevation. The technical workflow is shown in Figure 3.

3. Results

3.1. Climate Suitability Zoning and Habitat Suitability Zoning Analysis Based on Physical Models

3.1.1. Spatial Distribution of Climate Suitability Zoning

We used the latitude and longitude coordinate data of Torreya, along with the raster data of climatic factors, to calculate the climate suitability zoning results through GIS, as depicted in Figure 4.
According to Figure 4, 85.7% of the study area falls within the climate-suitable zone, 5.1% within the moderately suitable zone, and 9.2% within the unsuitable zone. Unsuitable areas are distributed primarily in the southern part of Diankou town, Jiangzao town, the northwestern part of Zhibu town, the central part of Taozhu Street, and the northern part of Huandong Street. Overall, the climate resources of Zhuji city generally meet the planting requirements of Torreya; however, because climatic conditions vary spatially, some areas are unsuitable for Torreya cultivation.

3.1.2. Spatial Distribution of Habitat Suitability Zones

Building on the raster data for the climatic factors, we integrated the raster data for the soil and terrain factors and used the land use data to exclude areas unsuitable for planting, deriving the habitat suitability zoning results depicted in Figure 5.
According to Figure 5, 26.9% of the study area falls within the suitable habitat zone, 17.9% within the moderately suitable habitat zone, and 55.2% within the unsuitable zone. The western, southern, and eastern regions of Zhuji city are mainly suitable or moderately suitable for Torreya cultivation. The main townships in these regions include Dongbaihu town, Donghe township, Zhaojia town, Huangshan town, Lingbei town, Chenzhai town, Yingdian Street town, and Majian town; these habitat zones also occur in Caota town, Ciwu town, Fengqiao town, Wuxie town, and Tongshan town. The central and northern regions of Zhuji city are mainly unsuitable for Torreya cultivation. The main townships and streets in these regions are Shanxiahu town, Jiangzao town, Diankou town, Ruanshi town, Jiyang Street, Huandong Street, Zhibu town, Taozhu Street, Datang town, Wangjiajing town, Anhua town, Paitou town, Jieting town, and Gupu town.

3.2. Climate Suitability Zoning and Habitat Suitability Zoning Analysis Based on Machine Learning Methods

3.2.1. Classification Feature Evaluation Results

The RF feature importance scores yielded the importance ranking of the 16 feature variables (Figure 6). The ranking indicates that terrain and soil features are more important than climatic features. On this basis, we applied the sequential forward feature selection method to the 16 preselected feature variables and evaluated each feature subset on the test set, as illustrated in Figure 7.
As shown in Figure 7, as the number of features increases from 1 to 5, the test set accuracy rises sharply from 65.23% to 93.34%. From 5 to 9 features, the accuracy remains generally stable with a slight upward trend, peaking at 95.63%. However, as the number of features increases from 9 to 16, the overall test set accuracy decreases, which is attributed to data redundancy. Therefore, for habitat suitability zoning model training, the top nine feature variables in the importance ranking shown in Figure 6 should be selected. For climate suitability zoning model training, the top nine climate-related feature variables from the same ranking should be selected.

3.2.2. Spatial Classification Results for Climate Suitability Zones

To compare the classification accuracy of three classifiers, namely, RF, SVM, and naive Bayes (NB), in climate suitability zoning and to select the optimal classification algorithm for suitability zoning in topographically complex areas, we validated the classification results using confusion matrices. The evaluation metrics included the overall accuracy (OA), user accuracy (UA), F1-score, kappa coefficient, and R². The accuracy results are given in Table 1.
According to various model accuracy metrics, the RF algorithm consistently outperforms the SVM and NB algorithms in terms of model classification effectiveness. The classification performance of the SVM algorithm surpasses that of NB. The overall classification accuracy of the RF algorithm reaches as high as 99.08%. The specific classification diagram is shown in Figure 8.
Given that the accuracy and R² values of the SVM and NB algorithms are significantly lower than those of the RF algorithm, we opted to use the classification results generated by the RF algorithm as the final outcome for climate suitability zoning.

3.2.3. Spatial Classification Results for Habitat Suitability Zones

Since NB performed poorly in climate zoning classification and was found through experimentation to be unsuitable for habitat zoning classification, we compared only the classification accuracy of RF and SVM classifiers in habitat suitability zoning. The evaluation metrics were consistent with those used for spatial climate suitability classification. See Table 2 for specific accuracy.
From the perspective of various model accuracy metrics, the RF algorithm consistently outperforms the traditional SVM model in terms of classification effectiveness. Specifically, the RF algorithm exhibits superior performance in terms of OA, UA for each class, the kappa coefficient, and other aspects. The specific classification diagram is shown in Figure 9.
Unsuitable areas are extensive and occupy most of Zhuji city. In terms of topography, unsuitable areas are concentrated mainly in the Puyang River valley basin in the central part of the city and the river network plain in the northern part. In contrast, suitable areas are distributed primarily in the low hills of Mount Kuaiji in the east, the low hills of Longmen Mountain in the west, and some towns in southern Zhuji. Moderately suitable areas are relatively limited and are located mainly in transitional zones between unsuitable and suitable areas. In general, regions with lower elevations and relatively flat terrain are less suitable for Torreya cultivation, whereas low hills and ridges are more suitable for Torreya growth. The classification results of the two machine learning models are broadly consistent with the conclusions drawn from the physical models.

3.3. Comparison of Zoning Results

The habitat suitability zoning results for Torreya in Zhuji obtained with the physical model were resampled to a resolution of 250 m and compared pixel by pixel with the 250 m habitat suitability classifications produced by the SVM and RF methods. The pixel statistics are shown in Table 3.
In terms of pixel counts, the RF predictions for all three categories (suitable, moderately suitable, and unsuitable) were closer to the results of the habitat suitability index model. We then compared the SVM and RF classifications with the zoning results of the habitat suitability index model on a per-pixel basis and calculated the overlap rates for the unsuitable, moderately suitable, and suitable categories.
For the SVM model, the overlap rates with the habitat suitability index model were 80.12% for the suitable category, 71.44% for the moderately suitable category, and 84.51% for the unsuitable category, giving an overall comprehensive overlap rate of 80.98%. For the RF model, the corresponding overlap rates were 85.68%, 71.41%, and 91.58%, with an overall comprehensive overlap rate of 86.38%. Thus, the RF model agreed more closely with the habitat suitability index model than the SVM model did for the suitable and, especially, the unsuitable categories, while the two models performed almost identically for the moderately suitable category. These results support the superior overall performance of the RF model.
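A per-pixel overlap comparison of this kind can be sketched with NumPy. The definitions are assumptions, since the text does not give formulas: the per-class overlap rate is taken as the share of reference (habitat suitability index model) pixels of a class that the machine learning model assigns to the same class, the overall rate as the share of all valid pixels on which the two maps agree, and -1 as a hypothetical no-data code. Both rasters are assumed to be already resampled to the same 250 m grid.

```python
import numpy as np


def overlap_rates(reference, predicted, classes, nodata=-1):
    """Per-class and overall agreement between two co-registered class rasters."""
    valid = (reference != nodata) & (predicted != nodata)
    ref, pred = reference[valid], predicted[valid]
    per_class = {c: float(np.mean(pred[ref == c] == c)) for c in classes}
    overall = float(np.mean(ref == pred))
    return per_class, overall


# Tiny invented rasters: 0 = unsuitable, 1 = moderately suitable, 2 = suitable
reference = np.array([[0, 0, 1],
                      [1, 2, 2],
                      [-1, 2, 0]])
predicted = np.array([[0, 1, 1],
                      [1, 2, 0],
                      [0, -1, 0]])
per_class, overall = overlap_rates(reference, predicted, classes=[0, 1, 2])
print(per_class, round(overall, 4))
```

Under these definitions, the overall rate weights each class by its pixel count, so a large unsuitable class dominates the comprehensive figure.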
In the feature importance ranking (Figure 6), elevation and slope rank first and sixth, respectively, indicating that these two topographic factors strongly influence the habitat suitability of Torreya. Figure 10 shows that the suitable areas for Torreya in Zhuji city are distributed mainly in the eastern, southern, and western parts, which coincide with regions of higher elevation and steeper slopes, whereas the predominantly unsuitable areas in the central and northern parts are characterized by lower elevations and gentler slopes. This finding supports a relationship between habitat suitability and elevation and slope within a certain range.

4. Discussion

The RF classification method based on the optimal features achieved overall classification accuracies above 95% for both the climate and habitat suitability categories. In contrast, the classification accuracy of the SVM algorithm leaves considerable room for improvement, especially for the moderately suitable class. Two main reasons for this shortfall have been suggested. First, SVM performance can be limited by data dimensionality and kernel function selection, which constrains classification performance on complex problems [25,26]. Second, the relatively small number of samples in the moderately suitable class of Torreya leads to lower classification accuracy [27].
Despite significant progress in solving classification problems, machine learning algorithms still face challenges such as insufficient model interpretability, sensitivity to data quality and features, and poor generalization ability [28,29]. With the increasing complexity of algorithms, understanding the decision-making process of models becomes more difficult, which may affect trust and interpretability. Moreover, the problem of good performance on training data but poor generalization ability on new data needs to be addressed, as uncertainties still exist in the practical application of machine learning algorithms [30,31].
Research indicates that classification algorithms such as RF and SVM are gaining attention in the agricultural field [32,33,34]. RF is widely used in crop classification, pest detection, and soil analysis because of its automatic feature selection and its ability to handle large numbers of samples [35,36]. SVMs, in contrast, perform well with high-dimensional and complex data and are used for tasks such as crop growth prediction, soil type identification, and disease prediction [37,38]. However, both algorithms still face practical challenges, such as complex parameter tuning, sensitivity to sample imbalance, and limited model interpretability. In agricultural applications, researchers therefore need to choose classification algorithms suited to the characteristics of the specific problem and data and combine them with domain expertise to improve the accuracy and interpretability of classification models [39,40].
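Sensitivity to sample imbalance, such as the small number of moderately suitable samples noted earlier, is often mitigated by weighting each class inversely to its frequency. A minimal sketch, assuming hypothetical class counts (the study's per-class sample sizes are not given in this section); the function name `balanced_class_weights` is illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class).

    This is the formula scikit-learn documents for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical, imbalanced training labels (not the study's sample counts).
labels = ["suitable"] * 40 + ["moderately suitable"] * 10 + ["unsuitable"] * 50
weights = balanced_class_weights(labels)
for label, w in sorted(weights.items()):
    print(f"{label:20s} weight={w:.3f}")

# After weighting, every class contributes the same total weight (n_samples / n_classes).
totals = {label: w * labels.count(label) for label, w in weights.items()}
print(totals)
```

In scikit-learn, both `RandomForestClassifier` and `SVC` accept such a dictionary through their `class_weight` parameter; whether weighting would have improved the moderately suitable class in this study is untested.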

5. Conclusions

In this study, an evaluation index system for Torreya habitat suitability was constructed from Torreya sample locations and raster data for various ecological factors. Both a physical model and machine learning models were used to simulate climate and habitat suitability zoning. The main research conclusions are as follows:
(1) The suitable Torreya planting habitats in Zhuji city are distributed mainly in mountainous and hilly areas, while unsuitable areas are located mainly in central basins and northern river plains. Moderately suitable areas are found primarily in transitional zones between suitable and unsuitable areas. The main townships suitable for Torreya cultivation include Dongbaihu town, Zhaojia town, Donghe township, Majian town, Huangshan town, Yingdian Street town, Lingbei town, Chenzhai town, Fengqiao town, and Caota town.
(2) In terms of relative area, climate-suitable areas are more extensive than habitat-suitable areas, indicating that climate factors are not the primary factors limiting Torreya cultivation in Zhuji city. Soil and terrain factors play a greater role in limiting Torreya cultivation.
(3) Machine learning classification algorithms can also achieve suitability zoning for cultivation, and their classification process is more concise and efficient. Compared to the SVM and NB algorithms, the RF algorithm demonstrated higher prediction accuracy in this study, achieving the best classification results.
In this research, the selection of indicators was subjective and limited, which may have biased the evaluation outcomes. In addition, ambiguity and subjectivity in the evaluation criteria may reduce their interpretability and credibility. Future studies should therefore make indicator selection more scientifically rigorous and comprehensive and make the evaluation criteria more standardized and operable, which would increase both the scientific validity and the practical utility of habitat assessment methods.

Author Contributions

L.W.: visualization and writing. L.Y.: conceptualization and supervision. Y.L.: data collection and writing. J.S.: software and formal analysis. X.Z. and Y.Z.: guidance on manuscript writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Foundation of Meteorological Technology Innovation Platform, China Meteorological Service Association (CMSA2023MC022).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The model prediction results presented in this study are available on request from the corresponding author. The original observations are not publicly available owing to privacy restrictions.

Acknowledgments

Funding from the National Key Research and Development Program of Foundation of Meteorological Technology Innovation Platform, China Meteorological Service Association (CMSA2023MC022) is gratefully acknowledged. We also thank the editors and reviewers for their comments, which helped improve our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sansavini, S. The role of research and technology in shaping a sustainable fruit industry: European advances and prospects. Rev. Bras. Frutic. 2006, 28, 550–558.
2. Shi, L.K.; Mao, J.H.; Zheng, L.; Zhao, C.W.; Jin, Q.Z.; Wang, X.G. Chemical characterization and free radical scavenging capacity of oils obtained from Torreya grandis Fort. ex. Lindl. and Torreya grandis Fort. var. Merrillii: A comparative study using chemometrics. Ind. Crops Prod. 2018, 115, 250–260.
3. Quan, W.; Zhang, C.; Wang, Z.; Zeng, M.; Qin, F.; He, Z.; Chen, J. Assessment antioxidant properties of Torreya grandis protein enzymatic hydrolysates: Utilization of industrial by-products. Food Biosci. 2021, 43, 101325.
4. Wu, L.; Wu, L.; Ni, R.; Yan, F. Economic benefits of Torreya grandis ‘Merrillii’ plantings. J. Zhejiang AF Univ. 2013, 30, 299–303.
5. Laghari, A.; Kandhro, A.; Memon, A. Cold pressed Torreya grandis kernel oil. In Cold Pressed Oils; Elsevier: Amsterdam, The Netherlands, 2020; pp. 31–38.
6. Chen, X.; Jin, H. Review of cultivation and development of Chinese torreya in China. For. Trees Livelihoods 2019, 28, 68–78.
7. Chen, X.; Jin, H. A case study of enhancing sustainable intensification of Chinese Torreya forest in Zhuji of China. Environ. Nat. Resour. Res. 2019, 9, 53–60.
8. Mohamed, A.; Reich, R.M.; Khosla, R.; Aguirre-Bravo, C.; Briseño, M.M. Influence of climatic conditions, topography and soil attributes on the spatial distribution of site productivity index of the species rich forests of Jalisco, Mexico. J. For. Res. 2014, 25, 87–95.
9. Tang, H.; Hu, Y.-Y.; Yu, W.-W.; Song, L.-L.; Wu, J.-S. Growth, photosynthetic and physiological responses of Torreya grandis seedlings to varied light environments. Trees 2015, 29, 1011–1022.
10. Al-Ruzouq, R.; Shanableh, A.; Yilmaz, A.G.; Idris, A.; Mukherjee, S.; Khalil, M.A.; Gibril, M.B.A. Dam site suitability mapping and analysis using an integrated GIS and machine learning approach. Water 2019, 11, 1880.
11. Radočaj, D.; Jurišić, M. GIS-based cropland suitability prediction using machine learning: A novel approach to sustainable agricultural production. Agronomy 2022, 12, 2210.
12. Taghizadeh-Mehrjardi, R.; Nabiollahi, K.; Rasoli, L.; Kerry, R.; Scholten, T. Land suitability assessment and agricultural production sustainability using machine learning models. Agronomy 2020, 10, 573.
13. El Baroudy, A. Mapping and evaluating land suitability using a GIS-based model. Catena 2016, 140, 96–104.
14. Dengiz, O.; Sezer, İ.; Özdemir, N.; Göl, C.; Yakupoğlu, T.; Öztürk, E.; Sırat, A.; Şahin, M. Application of GIS model in physical land evaluation suitability for rice cultivation. Anadolu Tarım Bilim. Derg. 2010, 25, 184–191.
15. Chen, Y.; Wu, B.; Chen, D.; Qi, Y. Using machine learning to assess site suitability for afforestation with particular species. Forests 2019, 10, 739.
16. Xing, W.; Zhou, C.; Li, J.; Wang, W.; He, J.; Tu, Y.; Cao, X.; Zhang, Y. Suitability evaluation of tea cultivation using machine learning technique at town and village scales. Agronomy 2022, 12, 2010.
17. Morais, R. On the suitability, requisites, and challenges of machine learning. J. Opt. Commun. Netw. 2021, 13, A1–A12.
18. Cao, H.; Li, H.; Sun, W.; Xie, Y.; Huang, B. A boundary identification approach for the feasible space of structural optimization using a virtual sampling technique-based support vector machine. Comput. Struct. 2023, 287, 107118.
19. Chemura, A.; Schauberger, B.; Gornott, C. Impacts of climate change on agro-climatic suitability of major food crops in Ghana. PLoS ONE 2020, 15, e0229881.
20. Leng, G.; Hall, J.W. Predicting spatial and temporal variability in crop yields: An inter-comparison of machine learning, regression and process-based models. Environ. Res. Lett. 2020, 15, 044027.
21. Han, Y.; He, Y.; Liang, Z.; Shi, G.; Zhu, X.; Qiu, X. Risk Assessment and Application of Tea Frost Hazard in Hangzhou City Based on the Random Forest Algorithm. Agriculture 2023, 13, 327.
22. Kira, T. On the altitudinal arrangement of climatic zones in Japan. Kanchi-Nogaku 1948, 2, 143–173.
23. Holdridge, L.R. Life Zone Ecology; CABI: Wallingford, UK, 1967.
24. Xu, W. Ji Liang’s caloric index and its application in Chinese vegetation. J. Ecol. 1985, 3, 35–39.
25. Tharwat, A. Parameter investigation of support vector machine classifier with kernel functions. Knowl. Inf. Syst. 2019, 61, 1269–1302.
26. Ghaddar, B.; Naoum-Sawaya, J. High dimensional data classification and feature selection using support vector machines. Eur. J. Oper. Res. 2018, 265, 993–1004.
27. Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365.
28. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215.
29. Nayak, J.; Naik, B.; Behera, H. A comprehensive survey on support vector machine in data mining tasks: Applications & challenges. Int. J. Database Theory Appl. 2015, 8, 169–186.
30. Löw, F.; Michel, U.; Dech, S.; Conrad, C. Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using support vector machines. ISPRS J. Photogramm. Remote Sens. 2013, 85, 102–119.
31. He, J.; Mattis, S.A.; Butler, T.D.; Dawson, C.N. Data-driven uncertainty quantification for predictive flow and transport modeling using support vector machines. Comput. Geosci. 2019, 23, 631–645.
32. Dehghanisanij, H.; Emami, H.; Emami, S.; Rezaverdinejad, V. A hybrid machine learning approach for estimating the water-use efficiency and yield in agriculture. Sci. Rep. 2022, 12, 6728.
33. Kok, Z.H.; Shariff, A.R.M.; Alfatni, M.S.M.; Khairunniza-Bejo, S. Support vector machine in precision agriculture: A review. Comput. Electron. Agric. 2021, 191, 106546.
34. Son, N.T.; Chen, C.F.; Chen, C.R.; Minh, V.Q. Assessment of Sentinel-1A data for rice crop classification using random forests and support vector machines. Geocarto Int. 2018, 33, 587–601.
35. Tatsumi, K.; Yamashiki, Y.; Torres, M.A.C.; Taipe, C.L.R. Crop classification of upland fields using Random forest of time-series Landsat 7 ETM+ data. Comput. Electron. Agric. 2015, 115, 171–179.
36. Castro-Franco, M.; Costa, J.L.; Peralta, N.; Aparicio, V. Prediction of soil properties at farm scale using a model-based soil sampling scheme and random forest. Soil Sci. 2015, 180, 74–85.
37. Dang, C.; Liu, Y.; Yue, H.; Qian, J.; Zhu, R. Autumn crop yield prediction using data-driven approaches:-support vector machines, random forest, and deep neural network method. Can. J. Remote Sens. 2021, 47, 162–181.
38. Kovačević, M.; Bajat, B.; Gajić, B. Soil type classification and estimation of soil properties using support vector machines. Geoderma 2010, 154, 340–347.
39. Dziugaite, G.; Ben-David, S.; Roy, D. Enforcing interpretability and its statistical impacts: Trade-offs between accuracy and interpretability. arXiv 2020, arXiv:2010.13764.
40. ElShawi, R.; Sherif, Y.; Al-Mallah, M.; Sakr, S. Interpretability in healthcare: A comparative study of local machine learning interpretability techniques. Comput. Intell. 2021, 37, 1633–1650.
Figure 1. Study area map.
Figure 2. Torreya sample distribution map.
Figure 3. Technical process chart.
Figure 4. Climate suitability zoning map.
Figure 5. Habitat suitability zoning map.
Figure 6. Feature importance ranking chart.
Figure 7. Test set accuracy line chart (left: test set accuracy with feature numbers 1–16; right: partial amplification of test set accuracy).
Figure 8. Climate suitability zone (RF) classification diagram.
Figure 9. Habitat suitability zone classification diagram ((a): SVM; (b): RF).
Figure 10. Verification analysis diagram ((a): physical model habitat suitability zoning; (b): RF model habitat suitability zoning; (c): altitude; (d): slope).
Table 1. Climate suitability zone classification model accuracy table.
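| Classification Model | Suitability Class | F1-Score | User Accuracy (%) | Overall Accuracy (%) | Kappa Coefficient | R² |
|---|---|---|---|---|---|---|
| RF | Suitable | 0.99 | 99.32 | 98.08 | 0.9937 | 0.9957 |
| | Moderately suitable | 0.98 | 98.46 | | | |
| | Unsuitable | 0.97 | 97.46 | | | |
| SVM | Suitable | 0.89 | 89.03 | 83.36 | 0.7141 | 0.5554 |
| | Moderately suitable | 0.69 | 69.23 | | | |
| | Unsuitable | 0.79 | 78.80 | | | |
| NB | Suitable | 0.76 | 61.29 | 48.30 | 0.2856 | 0.3019 |
| | Moderately suitable | 0.32 | 98.46 | | | |
| | Unsuitable | 0.15 | 8.70 | | | |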
Table 2. Habitat suitability zone classification model accuracy table.
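| Classification Model | Suitability Class | F1-Score | User Accuracy (%) | Overall Accuracy (%) | Kappa Coefficient | R² |
|---|---|---|---|---|---|---|
| SVM | Suitable | 0.92 | 90.37 | 91.95 | 0.8743 | 0.8933 |
| | Moderately suitable | 0.83 | 88.54 | | | |
| | Unsuitable | 0.96 | 94.61 | | | |
| RF | Suitable | 0.92 | 96.08 | 95.17 | 0.9243 | 0.9268 |
| | Moderately suitable | 0.90 | 91.67 | | | |
| | Unsuitable | 0.97 | 96.30 | | | |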
Table 3. Pixel statistics table.
| Number of Pixels and Overlap Rate | Suitable Category | Moderately Suitable Category | Unsuitable Category | Zhuji (Total) |
|---|---|---|---|---|
| Physical model simulation (pixels) | 9,157 | 6,106 | 18,775 | 34,038 |
| SVM model prediction (pixels) | 8,935 | 8,323 | 16,780 | 34,038 |
| Overlap between SVM and physical models (pixels) | 7,337 | 4,362 | 15,866 | 27,565 |
| SVM model pixel overlap rate (%) | 80.12 | 71.44 | 84.51 | 80.98 |
| RF model prediction (pixels) | 9,304 | 6,494 | 18,240 | 34,038 |
| Overlap between RF and physical models (pixels) | 7,846 | 4,360 | 17,195 | 29,401 |
| RF model pixel overlap rate (%) | 85.68 | 71.41 | 91.58 | 86.38 |
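Table 3 does not state how the overlap rate is defined, but dividing each overlap count by the corresponding physical-model pixel count reproduces every tabulated rate, which suggests that this is the definition used. A short check, with the counts transcribed from the table; the function name `overlap_rates` is illustrative.

```python
# Pixel counts transcribed from Table 3.
PHYSICAL = {"suitable": 9157, "moderately suitable": 6106, "unsuitable": 18775}
OVERLAP = {
    "SVM": {"suitable": 7337, "moderately suitable": 4362, "unsuitable": 15866},
    "RF": {"suitable": 7846, "moderately suitable": 4360, "unsuitable": 17195},
}


def overlap_rates(reference, overlap):
    """Percent of reference (physical-model) pixels per class that the ML model
    assigns to the same class, plus the rate over all pixels in Zhuji."""
    rates = {c: 100 * overlap[c] / reference[c] for c in reference}
    rates["all classes"] = 100 * sum(overlap.values()) / sum(reference.values())
    return rates


for model, counts in OVERLAP.items():
    rates = overlap_rates(PHYSICAL, counts)
    print(model, {c: round(r, 2) for c, r in rates.items()})
```

After rounding, this prints 80.12, 71.44, 84.51, and 80.98 for SVM and 85.68, 71.41, 91.58, and 86.38 for RF, matching the table. Because the rate is normalized by the physical-model count, it measures agreement relative to the physical model and is not symmetric between the two zonings.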

Share and Cite

MDPI and ACS Style

Wu, L.; Yang, L.; Li, Y.; Shi, J.; Zhu, X.; Zeng, Y. Evaluation of the Habitat Suitability for Zhuji Torreya Based on Machine Learning Algorithms. Agriculture 2024, 14, 1077. https://doi.org/10.3390/agriculture14071077

AMA Style

Wu L, Yang L, Li Y, Shi J, Zhu X, Zeng Y. Evaluation of the Habitat Suitability for Zhuji Torreya Based on Machine Learning Algorithms. Agriculture. 2024; 14(7):1077. https://doi.org/10.3390/agriculture14071077

Chicago/Turabian Style

Wu, Liangjun, Lihui Yang, Yabin Li, Jian Shi, Xiaochen Zhu, and Yan Zeng. 2024. "Evaluation of the Habitat Suitability for Zhuji Torreya Based on Machine Learning Algorithms" Agriculture 14, no. 7: 1077. https://doi.org/10.3390/agriculture14071077
