Abstract
Judging and predicting tree suitability is of great significance in the cultivation and management of forests. Background and Objectives: Because of the diversity of afforestation tree species in China and the shortage of experts or the limits of expert knowledge, the site rules of tree species in some regions are missing or incomplete, so the small number of empirical tree suitability site rules can hardly meet the diverse needs of afforestation expert systems. Research Highlights: This paper explores an intelligent method that automatically extracts rules for selecting favorable site conditions (tree suitability site rules) from a large amount of data, in order to solve the problem of acquiring, updating and maintaining site rules in the expert system. Materials and Methods: Based on the method of site quality evaluation and the theory of the decision tree in knowledge discovery and machine learning, the dominant species Chinese fir and Masson pine in the forest resources subcompartment data (FRSD) of Jinping County, Guizhou Province were taken as examples. The important site factors affecting forest quality were selected, an assessment methodology based on the site quality of potential productivity was proposed to determine the tree suitability of a stand site by nonlinear quantile regression, and decision trees were constructed with the ID3, C5.0 and CART algorithms. Results: The best-performing CART algorithm was selected to construct the model, and the extractor of the afforestation rules was built. After validating the rules for selecting favorable site conditions of Chinese fir and Masson pine, the production representation method was used to construct the relationship model of the knowledge base. Conclusions: Intelligent extraction of tree suitability rules for afforestation design in an expert system was realized, which provides a theoretical basis and technical support for afforestation land planning and design.
1. Introduction
Tree suitability involves adapting the afforestation characteristics of a tree species to the site conditions to give full play to the productive potential of forests and achieve a higher level of productivity of afforestation tree species under the current technical and economic conditions of the site. Tree suitability is the embodiment of the principle of suitability to local conditions. To improve the survival rate of afforestation and the growth of trees, it is necessary to select suitable tree species scientifically and reasonably, which is the key part to promote the afforestation quality [1,2].
Basically, three aspects determine the growth of a tree species: climatic water, energy (i.e., either solar radiation or temperature) and soil nutrient availability. These aspects are described by topographic factors (e.g., slope and altitude), soil factors (e.g., soil type and soil layer thickness), meteorological factors (e.g., precipitation, radiation and temperature) and biological factors (e.g., plant biodiversity, composition and stand factors) [3,4,5]. For afforestation, site conditions such as topographic, soil and stand factors may be important determinants of the growth of a tree species. Tree suitability judgment and prediction are highly important links in forest production and management. Currently, the relevant literature uses three primary kinds of quantitative criteria for evaluating afforestation: site index (SI), average volume growth and site rules. Site index is normally used to measure site quality under various site conditions and to reflect the relationship between site performance and tree species growth [6,7,8]. The dominant height model [9,10,11,12,13,14,15] and the site index curve [14,16,17,18,19,20,21] are the two main forms of site index. The disadvantage of the site index is that the SI value does not directly explain the productivity level of a site (i.e., tree suitability performance): because planting density and the relationship between tree height and diameter at breast height (DBH, measured at 1.3 m) depend on the tree species, the relationship between SI value and yield also differs among tree species [1,2]. Average volume growth is another index of tree suitability performance; it measures tree suitability by the average volume growth of a tree species when it reaches maturity. 
Because the average volume growth is affected not only by the site conditions but also by the stand density and management level [1,2,22], complex conditions (different regions, site conditions, tree species and management measures) must be considered when it is used as an evaluation index. Because of this complexity, it is not practical to use the average volume growth as a measure of tree suitability [23]. Site rules can also be used to judge tree suitability performance in the expert system of forest cultivation, and several studies have applied site rules to match afforestation tree species. Schröder [24] developed the Multipurpose Tree and Shrub Database, which contains first-hand, site-specific information about multipurpose tree species. This information can provide decision support when candidate species for specific sites or end-uses are required. Hu Bo [25] used accumulated empirical site rules to establish a forestry knowledge base by tree structure representation and constructed a reasoning afforestation expert system. Ding Quanlong [26] used the tree structure method to build the site rules knowledge base and used the fact base to control the repository, which enhanced the universality of usage and reduced the regional restrictions of the expert system. Wu Baoguo [27,28] implemented a web-based afforestation decision-making consultation system. The knowledge base of afforestation site rules was represented by the production rule method, and the forest site factors provided by users were inferred and analyzed by imitating afforestation experts to select suitable afforestation tree species. 
Ma Chi [29,30] expressed afforestation site rules knowledge through a combination of the production rule method and the frame method, separated the reasoning machine from the knowledge base and improved the practicability of the afforestation expert system. Helton Nonato de Souza [31] selected suitable trees for agroforestry systems based on market accessibility and environmental needs (e.g., soil fertility). Han Yanyun [32] used the credibility of the rules to deal with the uncertainty of forest cultivation knowledge and realized an inference engine algorithm for uncertainty reasoning in the forest cultivation expert system. Prabakaran [33] proposed a fuzzy system structure integrated with expert knowledge. Vásquez Ruben Purroy [34] proposed a fuzzy multicriteria decision support system founded on logic-based decision rules. With the wide application of forestry-service-oriented expert systems for afforestation consultation, site rules have proved to be a more intuitive, simple and practical quantitative criterion for evaluating afforestation.
However, one practical and several theoretical problems have arisen after almost a decade of practical experience with the site rules of the forest cultivation expert system. The practical problem is that site rules are missing or incomplete. The current site rules are summarized by experienced experts through long-term afforestation practice and are commonly referred to as empirical site rules. Because of the diversity of afforestation tree species in China and the shortage of experts or the limits of expert knowledge, the site rules of tree species in some regions are missing or incomplete, so the small number of empirical tree suitability site rules can hardly meet the diverse needs. Two of the theoretical problems are as follows: (i) The empirical site rules of a given tree species rest on the judgment of experts, which is highly subjective, so the tree suitability evaluation lacks scientific accuracy. (ii) Because site conditions change while the rules of the expert system are relatively fixed, the expert system has great difficulty in maintaining the knowledge of the site rules.
In order to remedy these practical and theoretical problems, it is necessary to explore a method of automatically extracting appropriate site rules. The machine learning algorithm features excellent self-organization, self-learning and self-adaptability, and can acquire the implicit knowledge from the data through massive data learning. The long-term-accumulated survey data of forest resources and the statistical data contain a large amount of explicit information (e.g., dominant tree species, landform, gradient, slope direction, slope position, soil type and soil thickness) and, more importantly, the relationships and rules among them [2]. By mining the hidden knowledge behind the forest resource data, the bottleneck problem of obtaining afforestation site rules has been overcome to some extent. The decision tree algorithm in the machine learning algorithm has been widely used for its advantages of easily extracting rules [35,36], displaying important decision attributes [37] and high classification accuracy [38,39,40]. Currently, the decision tree algorithm has rarely been used to extract afforestation suitability rules from a large amount of forest resource survey data. Compared with the site rules obtained from the experience summary, this algorithm has some advantages. First, the algorithm extracts rules from the overall data. The rules are relatively comprehensive which, to some extent, resolves the bottleneck problem of acquiring rules for afforestation sites. Second, extracting rules from the data can overcome the limitation and subjectivity of expert judgment and enhance confidence. Third, the more data, the more accurate the extraction rules. Thus, the extraction rules can be updated and maintained.
The purpose of this study was to solve the problem of acquiring and updating the site rules in the afforestation expert system. The objectives were therefore as follows: (i) to explore an intelligent method to automatically extract the afforestation site rules; and (ii) to represent the knowledge of site rules by the production rule method and apply it in the afforestation expert system.
2. Materials and Methods
2.1. Data Source and Processing
The data were obtained from the 2005 and 2015 forest resources subcompartment database of Jinping County, Guizhou Province. Jinping County is located in the eastern part of Guizhou Province, with terrain gradually declining from west to east. The western part is dominated by low mountain and low-middle mountain landforms, with an elevation of 800–1300 m. The eastern part encompasses low mountains, hills, valleys and basins with an elevation between 500 and 700 m. Yellow soil is the most common soil, followed by red soil, yellow brown soil and rice soil. The county lies in the subtropical evergreen broad-leaved forest area, and the tree species are mainly Chinese fir, followed by Masson pine and broad-leaved tree species such as Quercus and Liquidambar. The dominant tree species Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) and Masson pine (Pinus massoniana Lamb.), which are the main fast-growing timber species in the area, were selected. The forest factors sorted out for these species comprised topographic factors, namely landform (DM), slope direction (PX), slope position (PW), slope gradient (PD) and elevation (HB); soil factors, namely soil type (TRMC), soil parent material (TRMZ) and soil layer thickness (TCHD); and stand factors, namely average age (t), volume per hectare of dominant tree species (YSSZGQXJ), number of trees per hectare of dominant tree species (YSSZGQZS), average height of dominant tree species (YSSZPJG) and average DBH of dominant tree species (YSSZPJXJ). There were 7971 effective subcompartments with Chinese fir as the dominant tree species and 263 effective subcompartments with Masson pine as the dominant tree species. Eight attributes (landform, slope direction, slope position, slope gradient, elevation, soil type, soil parent material and soil layer thickness) were extracted from the forest resources subcompartment database to form the growth site information for Chinese fir and Masson pine.
Ideally, model validation should involve the use of an independent data set [41]. In the present study, the validation data sets were gathered from the 2015 forest resources subcompartment database in the region. After processing, the validation data sets contained 1224 subcompartments of Chinese fir and 171 subcompartments of Masson pine (Table 1). Moreover, variations in stand factors and environmental site factors were included in the data set.
Table 1.
Growth information of the forest resource subcompartment data in Jinping County of Guizhou Province in 2015.
2.2. Extraction of Tree Suitability Site Rules Based on the Decision Tree
The decision tree is an instance-based learning algorithm similar to the tree structure classification of flowcharts. The decision tree is composed of three main parts: the decision node, branch and leaf node. The classification rules represented by the decision tree are inferred from the multiple irregular and disorderly data tuples. The whole decision-making process starts with the root decision node. From top to bottom, each decision node represents a data category or attribute to be classified, and each leaf node represents the result. Each path corresponds to a classification rule, and the set of classification rules constitutes a complete set of decision tree expressions. In this paper, the decision tree algorithm was selected to discover the knowledge of tree suitability, and the decision tree model was used to extract the implicit classification rules between a large number of site factors and tree suitability. Figure 1 is a schematic diagram of the decision tree being transformed into a decision rule.
Figure 1.
Principle of decision rule extraction based on the decision tree algorithm. Input1, Input2 and the output are the variables of the decision tree, and a1, a2, b1, b2, Y1 and Y2 are the values they take.
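The path-to-rule principle in Figure 1 can be sketched in a few lines of Python. This is only an illustration: the variable and value names come from the figure caption, but the shape of the toy tree and the helper names `extract_rules` and `format_rule` are assumptions, and the paper itself built its trees in R.

```python
# Toy decision tree: internal nodes test an input variable, branches carry
# attribute values and leaves carry the class label. The tree shape is an
# assumption made for illustration; only the names follow Figure 1.
tree = {
    "attribute": "Input1",
    "branches": {
        "a1": {"label": "Y1"},
        "a2": {
            "attribute": "Input2",
            "branches": {
                "b1": {"label": "Y1"},
                "b2": {"label": "Y2"},
            },
        },
    },
}


def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and return one (conditions, label) pair per leaf."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    rules = []
    for value, child in node["branches"].items():
        step = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (step,)))
    return rules


def format_rule(conditions, label):
    """Render one extracted path in IF-THEN form."""
    premise = " AND ".join(f"{attr} = {val}" for attr, val in conditions)
    return f"IF {premise} THEN {label}"


rules = extract_rules(tree)
for conds, label in rules:
    print(format_rule(conds, label))
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaf nodes of the tree.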
Based on the above principles, first, the site factors and grading index were constructed to determine the input of the decision tree model. Second, the tree suitability was evaluated, and the output of the decision tree model was determined. Third, the training results of decision tree models trained by different algorithms were compared. The decision tree model of the optimal algorithm was selected as the knowledge rule extraction model of tree suitability, and the extracted rules were the output.
2.2.1. Site Factors and Grading Index
In the actual afforestation operation plan of Guizhou Province, the site conditions mainly involve eight site factors: landform, slope direction, slope position, slope gradient, elevation, soil type, soil parent material and soil layer thickness. These eight site factors therefore served as the input variables of the decision tree model. The site factors were graded according to the “Detailed Rules for the Implementation of Forest Resources Planning and Design Survey in Guizhou Province”; the meanings of the site factors and their grading indicators in the study area are presented in Table 2.
Table 2.
Site factors and their classification indexes.
2.2.2. Determining Tree Suitability Based on Quantile Regression
The decision tree is a classification algorithm. Its output variable corresponded to the classification of tree suitability (most suitable, suitable and unsuitable). The essence of determining tree suitability is to evaluate the site quality. To ensure the accuracy and flexibility of the decision tree model, that is, to make the extracted tree suitability site rules accurate and flexible when applied to natural, uneven-aged and mixed forests of different species in different regions, this paper used the theory of site quality assessment based on potential growth to determine tree suitability. This method assumes that stands on the same site and of the same stand type, with approximately the same structure and density, follow approximately the same growth process, including height growth, area growth and volume growth [42,43]. Quantile regression can comprehensively describe the relationship between independent and dependent variables at different quantiles [44,45], so this hypothesis can be quantified by quantile regression, which captures the tail characteristics of the distribution. When the independent variables affect different parts of the distribution of the dependent variable differently, quantile regression describes the distribution characteristics more comprehensively [46]. The growth process is essentially the distribution of growth (the dependent variable) conditional on age (the independent variable). The theoretical growth equations (Table 3) were fitted to age and growth at the one-third and two-thirds quantiles, and the optimal equations were screened by the Akaike Information Criterion (AIC). For forests of the same age, the two quantile regression lines therefore divided the growth into three types: most suitable, suitable and unsuitable.
Table 3.
Theoretical growth equations.
When quantile regression of forest resource subcompartment survey data was used to determine tree suitability, growth could be expressed in three forms: (1) the average volume growth at a given stand age (AGE-V-Quantile method); (2) the average height growth at a given stand age (AGE-H-Quantile method); and (3) the average diameter at breast height growth at a given stand age (AGE-DBH-Quantile method). Since height growth is less affected by density, the AGE-H-Quantile method was selected in the experiment. Although the average height of the dominant trees reflects the site quality of a sample plot more accurately, the stand survey factors in the actual forest resources subcompartment survey included only the average height, not the average height of the dominant trees. Because the average height is smaller than the average height of the dominant trees, a stand judged “suitable” by average tree height must also be “suitable” when judged by the average height of the dominant trees. Determining tree suitability by the average tree height therefore preserved the accuracy of the tree suitability rules to some extent.
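The AGE-H-Quantile classification can be sketched as follows. The sketch assumes the common textbook forms of the Logistic and Richards growth equations and purely hypothetical parameter values (`LOWER`, `UPPER`); the fitted coefficients of Table 6 are not reproduced here, and the treatment of values lying exactly on a curve is also an assumption. `pinball_loss` is the standard check loss that quantile regression minimizes for a quantile tau.

```python
from math import exp


def pinball_loss(residual, tau):
    """Check (pinball) loss for quantile tau; residual = observed - predicted."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def logistic_height(t, a, b, c):
    """Logistic growth curve H = a / (1 + b * exp(-c * t)) (assumed form)."""
    return a / (1.0 + b * exp(-c * t))


def richards_height(t, a, b, c):
    """Richards growth curve H = a * (1 - exp(-b * t)) ** c (assumed form)."""
    return a * (1.0 - exp(-b * t)) ** c


# Hypothetical parameters, NOT the fitted values of the paper.
LOWER = {"a": 12.0, "b": 6.0, "c": 0.15}   # one-third quantile curve
UPPER = {"a": 18.0, "b": 0.08, "c": 1.5}   # two-thirds quantile curve


def classify(age, mean_height):
    """Assign a suitability class from the position relative to the two curves."""
    lower = logistic_height(age, **LOWER)
    upper = richards_height(age, **UPPER)
    if mean_height >= upper:
        return "most suitable"
    if mean_height >= lower:
        return "suitable"
    return "unsuitable"


for height in (14.0, 11.0, 8.0):
    print(20, height, classify(20, height))
```

With these illustrative parameters the two curves are only ordered correctly for stands older than a few years; independently fitted quantile curves can cross, which is worth checking before classifying very young stands.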
2.2.3. Decision Tree Algorithms Modeling
The decision tree is a data mining technology that can realize functions such as data classification, association rule extraction and regression prediction [36]. Common algorithms include the ID3 algorithm, based on entropy and information gain; the C5.0 algorithm, which selects features by information gain ratio (an improved algorithm based on C4.5); and the CART algorithm, which uses the Gini index as the splitting criterion. This paper used the ID3, C5.0 and CART decision tree algorithms to construct the model and implemented them in the R language. All statistical analyses were performed using the C50 and rpart R packages. Generally, decision tree construction includes three processes: feature selection (information gain for ID3, information gain ratio for C5.0 and the Gini index for CART), decision tree construction and pruning. The decision tree model was constructed in R as follows:
- Step 1. The training data were pretreated: the input variables were discretized by establishing the site factor and grading index, and the output variables were discretized by determining tree suitability.
- Step 2. By adjusting the parameters, the decision tree model was generated and the tree suitability site rules were produced.
- Step 3. The decision tree was pruned, and the tree suitability site rules were the output.
- Step 4. The importance of site factors in the final decision tree model was analyzed.
- Step 5. The accuracy of the decision tree classification was evaluated.
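The three feature selection criteria named above can be compared on a small example. The sketch below is written in Python rather than the R packages used in the study, and the eight toy subcompartments are invented for illustration; only the factor codes TCHD and PW are taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_labels(rows, attr, target="suitability"):
    """Group the target labels by the value of one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    return list(groups.values())


def information_gain(rows, attr, target="suitability"):
    """ID3 criterion: entropy before the split minus weighted entropy after it."""
    groups = split_labels(rows, attr, target)
    after = sum(len(g) / len(rows) * entropy(g) for g in groups)
    return entropy([r[target] for r in rows]) - after


def gain_ratio(rows, attr, target="suitability"):
    """C4.5/C5.0-style criterion: information gain divided by split information."""
    split_info = entropy([r[attr] for r in rows])
    return information_gain(rows, attr, target) / split_info if split_info else 0.0


def weighted_gini(rows, attr, target="suitability"):
    """CART-style criterion: weighted Gini impurity after the split (lower is better)."""
    groups = split_labels(rows, attr, target)
    return sum(len(g) / len(rows) * gini(g) for g in groups)


# Invented toy data: (soil layer thickness, slope position, suitability).
raw = [("thick", "lower", "suitable"), ("thick", "lower", "suitable"),
       ("thick", "upper", "suitable"), ("thick", "upper", "unsuitable"),
       ("thin", "lower", "unsuitable"), ("thin", "lower", "unsuitable"),
       ("thin", "upper", "unsuitable"), ("thin", "upper", "suitable")]
rows = [{"TCHD": t, "PW": p, "suitability": s} for t, p, s in raw]

for attr in ("TCHD", "PW"):
    print(attr, information_gain(rows, attr), gain_ratio(rows, attr),
          weighted_gini(rows, attr))
```

In this toy set all three criteria agree that TCHD is the more informative factor; on real data the criteria can rank attributes differently, which is one reason the study compared the three algorithms empirically.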
2.3. Rules Validation
The accuracy of the tree suitability rules extracted from the model was verified with the validation data. Table 4 summarizes the empirical rules for Chinese fir and Masson pine in the Guizhou Province Forest Management Information Collection of 1989. Table 5 shows the empirical rules for Masson pine collected from the table of afforestation investigation and design used in actual afforestation operations in Guizhou Province. The empirical rules in Table 4 are incomplete and obsolete, which affects the accuracy of judgment. The empirical rules in Table 5 can only roughly determine whether a tree species is suitable and cannot further determine the degree of suitability (most suitable, suitable and unsuitable). The existing site index tables of Chinese fir and Masson pine in Guizhou were therefore divided into three grades according to SI grade: index grades 8–12 are unsuitable, index grades 14–18 are suitable and index grades 20–22 are most suitable. In this paper, the tree suitability judged by the extracted site rules was compared with that given by the existing site index tables of Chinese fir and Masson pine, respectively, to test the consistency of the two judgments.
Table 4.
Summary of empirical site rules from Guizhou Province Forest Management Information Collection in 1989.
Table 5.
Summary of empirical site rules from Guizhou Plantation Survey and Design.
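The consistency check described in Section 2.3 can be sketched as below. The grade boundaries follow the text (8–12, 14–18 and 20–22); the function names and the four toy subcompartments are assumptions made for illustration, and grades outside the stated ranges are rejected because the text does not say how they would be handled.

```python
def si_to_suitability(si_grade):
    """Map a site index grade to a suitability class using the stated ranges."""
    if 8 <= si_grade <= 12:
        return "unsuitable"
    if 14 <= si_grade <= 18:
        return "suitable"
    if 20 <= si_grade <= 22:
        return "most suitable"
    raise ValueError(f"site index grade {si_grade} is outside the stated ranges")


def consistency_rate(rule_judgments, si_grades):
    """Share of subcompartments where the rule-based and SI-based classes agree."""
    if len(rule_judgments) != len(si_grades):
        raise ValueError("both inputs must describe the same subcompartments")
    matches = sum(
        judged == si_to_suitability(grade)
        for judged, grade in zip(rule_judgments, si_grades)
    )
    return matches / len(rule_judgments)


# Invented example: judgments from extracted rules and SI grades for 4 stands.
rule_judgments = ["suitable", "most suitable", "unsuitable", "suitable"]
si_grades = [16, 22, 10, 20]
print(consistency_rate(rule_judgments, si_grades))
```

In the toy data the last stand is judged "suitable" by the rules but "most suitable" by its SI grade, so three of the four judgments agree.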
2.4. Rules Application
Based on the rules extracted by the decision tree algorithm, an expert-assisted decision support system for afforestation could be constructed with five parts. Figure 2 shows the relationship between the rule extraction and the knowledge base of the expert system for afforestation. A database was used to store the original data and intermediate results. In this study, the database stored the forest resource subcompartment data. The Site Rule Intelligent Extractor is the procedure of extracting site rules for afforestation based on decision tree algorithm, including the determination of tree suitability by the quantile method, decision tree rule extraction and rule verification, which is the core module for data-to-knowledge conversion. The Site Rule Knowledge Base was used to store the extracted knowledge of the site rules. The knowledge of site rules was represented by the production rule method in this paper. The Inference Machine transformed site rules into IF-THEN form and deduced the conclusion based on the user input conditions. The Human-computer Interaction Interface was the interface between the system and the user for communication. Through this interface, users input basic information to answer the relevant questions raised by the system, and the system outputs reasoning results and relevant explanations.
Figure 2.
The architecture diagram of the afforestation expert assistant decision support system.
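A minimal sketch of how the Inference Machine might match user input against production rules is given below. The rule for shale (rule number 2) reflects the example discussed in Section 3.4; the other two rules, their identifiers, the value names and the function `infer` are hypothetical and serve only to show the IF-THEN matching step.

```python
# Each production rule lists, per site factor code, the set of values under
# which the rule fires. Rules 901 and 902 are hypothetical placeholders.
RULES = [
    {"id": 2, "species": "Chinese fir",
     "if": {"TRMZ": {"shale"}}, "then": "most suitable"},
    {"id": 901, "species": "Chinese fir",
     "if": {"TRMZ": {"sandstone"}, "TCHD": {"thick", "medium"}}, "then": "suitable"},
    {"id": 902, "species": "Chinese fir",
     "if": {"TRMZ": {"sandstone"}, "TCHD": {"thin"}}, "then": "unsuitable"},
]


def infer(species, facts, rules=RULES):
    """Return (conclusion, explanation) for the first rule whose premises all hold.

    `facts` maps site factor codes to the values entered by the user.
    Returns None when no rule matches.
    """
    for rule in rules:
        if rule["species"] != species:
            continue
        premises = rule["if"]
        if all(facts.get(factor) in allowed for factor, allowed in premises.items()):
            premise_text = " AND ".join(
                f"{factor} IN {sorted(allowed)}"
                for factor, allowed in sorted(premises.items())
            )
            explanation = f"Rule {rule['id']}: IF {premise_text} THEN {rule['then']}"
            return rule["then"], explanation
    return None


print(infer("Chinese fir", {"TRMZ": "shale", "TCHD": "thin"}))
print(infer("Chinese fir", {"TRMZ": "sandstone", "TCHD": "thin"}))
print(infer("Chinese fir", {"TRMZ": "limestone"}))
```

Returning the matched rule as text alongside the conclusion mirrors the "reasoning results and relevant explanations" that the interface is described as showing to the user.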
3. Results
3.1. Determining Tree Suitability Based on the Quantile Method
The fitting results at the one-third and two-thirds quantiles by the AGE-H-Quantile method are shown in Table 6. For both Chinese fir and Masson pine, the Logistic model (Equation 1) fitted best at the one-third quantile and the Richards model (Equation 5) fitted best at the two-thirds quantile. The corresponding stand site tree suitability results for Chinese fir and Masson pine are shown in Figure 3.
Table 6.
Fitting results of AGE-H-Quantile method.
Figure 3.
Distribution of stand site tree suitability of Chinese fir and Masson Pine.
3.2. Model Evaluation
Using the ID3, C5.0 and CART decision tree algorithms, the eight site factors in the subcompartment data were taken as input data according to Table 2, and the tree suitability of Chinese fir and Masson pine determined by quantile regression was used as output data. Decision tree models for classifying the suitability of Chinese fir and Masson pine were established. The models of the three algorithms were used to predict the tree suitability of the subcompartments, and the resulting accuracy is shown in Table 7. For both Chinese fir and Masson pine, the CART decision tree model was slightly superior to those of the other two algorithms.
Table 7.
Comparison results of three decision tree models.
The coincidence matrix of the three algorithms (Table 8) was analyzed further. With the CART algorithm, the number of correct judgments for the suitable growth of Chinese fir was 3308 and the number of wrong judgments was 2039, a correct rate of 61.87%. The number of correct judgments for the suitable growth of Masson pine was 123 and the number of wrong judgments was 44, a correct rate of 73.65%. With the ID3 algorithm, the correct rates for the suitable growth of Chinese fir and Masson pine were 59.40% and 75.45%, respectively. With the C5.0 algorithm, the correct rates were 59.89% and 69.46%, respectively. For Chinese fir, the prediction performance of the three algorithms was similar; CART performed best, followed by C5.0 and ID3. For Masson pine, the numbers of correct judgments for suitable growth were 126, 116 and 123 for ID3, C5.0 and CART, respectively, so ID3 and CART performed slightly better. Taking both species into account, the CART algorithm was chosen to construct the decision tree model.
Table 8.
The coincidence matrix of three algorithms for Chinese fir and Masson Pine.
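The correct rates quoted from the coincidence matrix follow directly from the counts of correct and wrong judgments. The short Python check below recomputes them; it assumes that the Masson pine total of 167 judgments (123 correct plus 44 wrong under CART) also applies to ID3 and C5.0, which is consistent with the percentages reported in the text.

```python
def correct_rate(correct, wrong):
    """Percentage of correct judgments among all judgments of a class."""
    return 100.0 * correct / (correct + wrong)


# Counts reported in the text for the CART algorithm.
cart_fir = correct_rate(3308, 2039)
cart_pine = correct_rate(123, 44)

# Masson pine counts for ID3 and C5.0, assuming the same total of 167.
pine_total = 123 + 44
id3_pine = correct_rate(126, pine_total - 126)
c50_pine = correct_rate(116, pine_total - 116)

for name, rate in [("CART, Chinese fir", cart_fir), ("CART, Masson pine", cart_pine),
                   ("ID3, Masson pine", id3_pine), ("C5.0, Masson pine", c50_pine)]:
    print(f"{name}: {rate:.2f}%")
```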
Based on the CART algorithm, the decision trees for classifying the tree suitability of Chinese fir and Masson pine were established, as shown in Figure 4 and Figure 5. Figure 4 shows that the decision tree for Chinese fir had 10 layers after pruning and 23 leaf nodes, of which 6 were most suitable and 10 were suitable. Table 9 shows the importance of each input site factor of the corresponding decision tree. Slope position, soil parent material, soil layer thickness, elevation and landform were the important factors affecting the growth of Chinese fir in the Jinping area of Guizhou Province, corresponding to the main classification factors in the decision tree. At the first layer, the soil parent material factor, shale was better for tree suitability. At the second layer, the slope position factor, whole slope was more suitable. At the third layer, the soil thickness factor, thick soil was more suitable, and low elevation was also favorable for tree suitability. In the region concerned, the three factors of slope gradient, soil type and slope direction had little influence on tree suitability.
Figure 4.
Classification decision tree model of tree suitability for certain forest site conditions of Chinese fir. Each box contains the following information, from top to bottom and left to right: the rule number, the tree suitability judgment, the percentages of the three suitability classes (most suitable, suitable and unsuitable) and the probability of the rule.
Figure 5.
Classification decision tree model of tree suitability for certain forest site conditions of Masson pine.
Table 9.
Importance of variables for Chinese fir.
Similarly, the decision tree established for Masson pine had nine layers after pruning and 14 leaf nodes, of which four were most suitable and three were suitable. Table 10 shows the importance of the site factors of the corresponding decision tree (Figure 5). The important factors affecting the growth of Masson pine in the Jinping area of Guizhou Province were slope position, soil parent material, soil layer thickness and slope gradient, corresponding to the main classification factors in the decision tree. At the first layer, the slope position factor, flat land and valley were better for tree suitability. At the second layer, the soil parent material factor, shale was more suitable. At the third layer, the soil thickness factor, thick soil was more suitable. For slope direction, the sunny slope was the suitable choice. Because the sample of suitable Masson pine stands in this area was relatively small, some site factor classes were not represented, which affected the model to some extent; this will be improved in future studies.
Table 10.
Importance of variables of Masson pine.
3.3. Rules Validation
Both the site rules and the site index were used to judge tree suitability. The results of rule verification (Table 11) indicated that, for Chinese fir, 920 of 1224 subcompartments were consistent and 304 were inconsistent, a consistency of 75.16%. For Masson pine, 121 of 171 subcompartments were consistent and 50 were inconsistent, a consistency of 70.76%. For Chinese fir and Masson pine in Guizhou, the two methods therefore agreed in more than 70% of the tree suitability judgments, indicating that the rules extracted by this method can be applied to site suitability judgment to some extent, especially for species without a site index. The remaining errors indicate that the lack of appropriate samples affected the model, which should be improved in future studies.
Table 11.
Results of rule verification.
3.4. Rules Application
The production rule method can merge the afforestation reasoning knowledge represented by multiple rules into one rule, which greatly reduces the number of rules recorded in the Site Rule Knowledge Base and makes the knowledge base easier for afforestation experts to maintain. Each production rule is essentially a description of a final leaf node. To illustrate, suppose that a tree suitability site rule covers m values of the landform condition, n values of the soil type condition and p values of the soil thickness condition. The tree structure representation method would have to store m * n * p * ... separate rules, whereas the production rule method needs only one production expression. The relation schema of the afforestation site rules table designed according to this idea is as follows:
Tree Suitability Site Rule Table
(Rule Number, region, DM, PX, PW, PD, HB, TRMC, TRMZ, TCHD, Tree species, Tree Suitability)
The content of a rule is a combination of site conditions. Where a site condition factor takes multiple values, the values are separated by ’*’. The tree suitability site rules extracted for Chinese fir and Masson pine in the Jinping area of Guizhou Province were saved in this form (see Table A1 in Appendix A for the tree suitability site rule table for Chinese fir and Masson pine in Jinping County, Guizhou Province).
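The saving in stored rules can be illustrated with a short Python sketch. The factor codes DM, TRMC and TCHD come from the relation schema above, but the concrete values, the dictionary layout and the helper names are assumptions for illustration; the actual rules are those of Table A1.

```python
from itertools import product

# One compact production rule: multiple values of a factor are joined by '*'.
# The values below are hypothetical.
compact_rule = {
    "conditions": {
        "DM": "low mountain*hill*basin",
        "TRMC": "yellow soil*red soil",
        "TCHD": "thick*medium",
    },
    "species": "Chinese fir",
    "suitability": "suitable",
}


def parse_conditions(conditions):
    """Split every '*'-separated field into its list of admissible values."""
    return {factor: field.split("*") for factor, field in conditions.items()}


def expand(rule):
    """List the single-valued rules a tree structure representation would store."""
    parsed = parse_conditions(rule["conditions"])
    factors = list(parsed)
    return [dict(zip(factors, combo))
            for combo in product(*(parsed[f] for f in factors))]


def matches(rule, site):
    """Check a site description against the compact rule without expanding it."""
    parsed = parse_conditions(rule["conditions"])
    return all(site.get(factor) in values for factor, values in parsed.items())


print(len(expand(compact_rule)))  # m * n * p = 3 * 2 * 2
print(matches(compact_rule, {"DM": "hill", "TRMC": "red soil", "TCHD": "thick"}))
```

Here one stored row stands in for twelve single-valued rules, and a site can be matched directly against the compact form.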
The above classification rules intuitively express the relationship between tree suitability and site factors. The judgment of tree suitability is influenced by many site factors, and the effect of each factor depends on whether it is larger or smaller than a threshold value. For site factors of great importance, only a few threshold values are needed to determine the final suitability or unsuitability. For example, in rule number 2, Chinese fir can be judged most suitable solely from the soil parent material being shale, whereas other rules require several site factors to be judged in turn before the tree suitability result is obtained.
4. Discussion
To construct the decision tree model, tree suitability must first be determined as the output variable of the model. This step was based on the theory of site quality evaluation of potential productivity. The site index is the most commonly used method for site quality evaluation, but it mainly applies to older, even-aged, well-stocked, free-growing, undisturbed and pure or single species-dominated stands [47,48]. This paper quantified the theory using the characteristics of quantile regression and proposed a method to determine tree suitability based on the quantile method. The proposed method extends the application scope of the decision tree model in three ways. (1) It can be applied to different types of forest resource data. The average age and average tree height are generally included in forest resources subcompartment data, permanent sample plot data and parsed tree data, whereas the site index method needs the average height of dominant trees, which is usually missing in forest resource surveys and therefore limits the use of that method. (2) It is suitable for mixed forest types with different origins. In this paper, Chinese fir and Masson pine were selected as the dominant tree species for screening data, and their origins were not limited to natural or artificial. In addition, because the rules are extracted by dominant tree species, further study could analyze the most suitable mixed forest types, since productivity is quite likely to be affected by the mixing of tree species. For example, if two tree suitability rules extracted for Chinese fir were “most suitable” and “suitable”, with corresponding species compositions of “8 fir 2 broad” and “7 fir 2 broad”, respectively, we would have reason to think that the “8 fir 2 broad” mixed forest type is better than the “7 fir 2 broad” mixed forest type. (3) It is suitable for unevenly aged forests. 
The quantile method converts the growth of unevenly aged forests into the growth of the evenly aged forests according to the distribution of different quantiles.
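The quantile idea can be illustrated with a small sketch. Quantile regression fits a curve by minimizing the pinball (check) loss, and for a constant model the minimizer is simply a sample quantile. The code below is an illustration only: the stand heights, the quantile levels 0.25 and 0.75 and the mapping of quantile bands to the three suitability classes are hypothetical and are not the values used in this paper, and a constant model for a single age class stands in for the nonlinear growth curve.

```python
def pinball_loss(y, pred, tau):
    """Mean pinball (check) loss of a constant prediction at quantile level tau."""
    total = 0.0
    for value in y:
        diff = value - pred
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y)


def fit_constant_quantile(y, tau):
    """Return the observed value that minimizes the pinball loss.

    For a constant model a minimizer can always be found among the
    observations, so searching over the data is sufficient.
    """
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, c, tau))


def classify(height, lower, upper):
    """Map a stand height to a suitability class using two quantile levels."""
    if height >= upper:
        return "MostSuitable"
    if height >= lower:
        return "Suitable"
    return "Unsuitable"


# Hypothetical mean stand heights (m) for stands of one age class.
heights = [8.0, 9.5, 10.0, 11.0, 12.5, 13.0, 14.0, 15.5, 16.0]
lower = fit_constant_quantile(heights, 0.25)  # assumed lower quantile level
upper = fit_constant_quantile(heights, 0.75)  # assumed upper quantile level
labels = [classify(h, lower, upper) for h in heights]
```

In a full treatment the constant would be replaced by a height-age growth curve fitted at each quantile level, so that stands of different ages are compared against the same family of curves.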
After the tree suitability of a stand site has been determined, a large amount of explicit information in forest resource data (e.g., dominant tree species, landform, slope, slope direction, slope position, soil species and soil thickness) can be used to judge the afforestation suitability (most suitable, suitable or unsuitable). However, the same site factors may well correspond to inconsistent suitability judgments. The significance of rule extraction by a decision tree algorithm is that the algorithm mines the relationships hidden in a large amount of data in a probabilistic way and summarizes them as rules, which enhances the reliability of the rules. These rules can then be used with existing data to predict and evaluate tree suitability effectively. Compared with empirical rules, the extracted rules not only judge whether a tree species is suitable but also clarify the degree of suitability, which addresses the incomplete and fuzzy knowledge in empirical rules.
Machine learning algorithms and expert systems, two branches of applied artificial intelligence, are currently research hotspots in forestry, but they have mostly been applied separately. Both methods have significant limitations, and few studies have combined the two technologies to achieve complementary advantages. Feng Yuqiang [49,50] developed an integrated neural network and expert system for stock analysis, which minimized the difficulty of knowledge acquisition in the expert system by using the neural network. Han Yanyun [32] noted that a neural network could be used to solve the knowledge acquisition bottleneck in the forest cultivation expert system, but because of problems such as black-box operation without an explanation mechanism and slow training, neural networks have not been applied to afforestation knowledge acquisition. In this paper, the ability of a decision tree algorithm to extract rules was used to tackle the difficulty of acquiring afforestation site rules for an afforestation expert system. At the same time, the knowledge base and reasoning mechanism of the expert system were used to make up for the weaknesses of knowledge expression in the decision tree model. Combining the decision tree algorithm with the afforestation expert system exploited the advantages of both and enhanced the decision support ability of the integrated system. The essence of the CART algorithm is the binary partition of the feature space, and it can split both nominal and continuous attributes. Its core idea is to use the Gini index minimization criterion to select features and generate a binary tree. The larger the Gini index, the greater the uncertainty of the sample set, which is similar to entropy.
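The Gini-based binary partition of a nominal attribute described above can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the function names and the toy sample are hypothetical, and the exhaustive subset search is only practical for attributes with few categories.

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Find the binary partition of a nominal attribute with the lowest
    weighted Gini index. Returns (left_value_set, weighted_gini)."""
    categories = sorted(set(values))
    n = len(labels)
    best = (None, float("inf"))
    # Each partition has one side with at most half of the categories,
    # so subsets up to that size cover every partition at least once.
    for size in range(1, len(categories) // 2 + 1):
        for subset in combinations(categories, size):
            left = [lab for val, lab in zip(values, labels) if val in subset]
            right = [lab for val, lab in zip(values, labels) if val not in subset]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[1]:
                best = (set(subset), score)
    return best


# Hypothetical toy sample: soil parent material versus suitability class.
parent_material = ["Shale", "Shale", "Slate", "Slate", "Sandstone", "Sandstone"]
suitability = ["MostSuitable", "MostSuitable", "Suitable",
               "Unsuitable", "Unsuitable", "Unsuitable"]

left_set, score = best_binary_split(parent_material, suitability)
print(left_set, round(score, 3))
```

On this toy sample the split that isolates shale gives the lowest weighted Gini index, because the shale branch is pure. A tree is grown by applying the same search recursively to each branch.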
Table 7 and Table 8 showed that, for both the Chinese fir and Masson pine decision tree models, the CART algorithm was slightly superior to the other two algorithms. The main difference among the three decision tree algorithms lies in the criterion used for attribute splitting: the ID3 algorithm selects features by information gain, which is based on entropy; the C5.0 algorithm (an improved algorithm based on C4.5) selects features by the information gain ratio; and the CART algorithm uses the Gini index as the splitting criterion. In the binary classification problem, the curves of the Gini index and half of the entropy are very close, and both approximate the classification error rate [51]. Although various classification algorithms continue to emerge, the CART algorithm is the base classifier of many ensemble learning classification algorithms and is one of the most widely used classification techniques.
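The closeness of the three measures in the two-class case can be checked numerically. The sketch below evaluates the Gini index, half of the binary entropy (in bits) and the classification error rate for a node in which one class has probability p; the function names are illustrative only.

```python
import math


def gini_binary(p):
    """Gini index of a two-class node with class probability p."""
    return 2.0 * p * (1.0 - p)


def half_entropy(p):
    """Half of the binary entropy, measured in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -0.5 * (p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def error_rate(p):
    """Classification error when the majority class is predicted."""
    return min(p, 1.0 - p)


for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p={p:.1f}  gini={gini_binary(p):.3f}  "
          f"half_entropy={half_entropy(p):.3f}  error={error_rate(p):.3f}")
```

All three measures are zero for a pure node and reach 0.5 at p = 0.5; in between, the Gini index lies between the error rate and half of the entropy.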
5. Conclusions
To solve the problems of incomplete expert knowledge and difficult knowledge acquisition, this paper proposed determining forest site suitability by quantile regression, constructed decision trees with the ID3, C5.0 and CART algorithms, and finally selected the best-performing CART algorithm to construct the model. The extractor of afforestation rules was built on the site quality evaluation of potential productivity, knowledge discovery and the decision tree theory of machine learning, taking Chinese fir and Masson pine in Jinping County, Guizhou Province as examples. After validating the stand rules of Chinese fir and Masson pine, the production representation method was used to construct the relationship model of the knowledge base. The highlights of the paper are as follows:
- Tree suitability site rules were automatically extracted by decision tree algorithms to solve the problem of acquiring and updating the knowledge in the expert system.
- The knowledge of site rules was represented by the production rule method and then applied in the afforestation expert system.
- Site quality of potential productivity was quantified by quantile regression.
- The consistency between the extracted rules and the stand index was more than 70% for both Chinese fir and Masson pine.
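The production rule representation mentioned in the highlights can be sketched as IF-THEN records that are matched against the factors of a site. The example encodes rules 2 and 57 for Chinese fir from Table A1. The class and function names are hypothetical, and reading `TRMZ`, `PW` and `TCHD` as soil parent material, slope position and soil thickness is an assumption made for this illustration, not a definition given in the text.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    number: int
    species: str
    conditions: dict  # factor code -> set of admissible values
    suitability: str

    def matches(self, site):
        """A rule fires when every listed factor of the site takes an admissible value."""
        return all(site.get(factor) in allowed
                   for factor, allowed in self.conditions.items())


def infer(rules, species, site):
    """Return the suitability of the first matching rule, or None if no rule fires."""
    for rule in rules:
        if rule.species == species and rule.matches(site):
            return rule.suitability
    return None


# Two rules taken from Table A1 (factor code meanings are assumed, see above).
RULES = [
    Rule(2, "Chinese fir", {"TRMZ": {"Shale"}}, "MostSuitable"),
    Rule(57, "Chinese fir",
         {"PW": {"Uphill"}, "TRMZ": {"Slate"}, "TCHD": {"Thick"}}, "Suitable"),
]

shale_site = {"PW": "MidSlope", "TRMZ": "Shale", "TCHD": "Thin"}
slate_site = {"PW": "Uphill", "TRMZ": "Slate", "TCHD": "Thick"}
result_shale = infer(RULES, "Chinese fir", shale_site)
result_slate = infer(RULES, "Chinese fir", slate_site)
result_other = infer(RULES, "Chinese fir", {"TRMZ": "Sandstone"})
```

Keeping the rules as data rather than code is what allows a rule extractor to refresh the knowledge base without changing the inference routine.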
Given the limitations of the data, a consistency of more than 70% between the extracted rules and the stand index means that the expectation was generally met. However, the rule verification results (Table 11) show that, because of the insufficient amount of sample plot data suited to the growth conditions of Masson pine, rule extraction for this species had lower prediction accuracy and poorer stability. Further research should improve the prediction accuracy and stability.
It should be pointed out that the decision tree model and rule extractor should be embedded in the system to realize the dynamic updating of expert knowledge. The next research goal is to develop an online prototype of the afforestation expert decision support system, whose main functions are to transform data into knowledge and to provide users with decision support based on the extracted rules. Further research can also explore the inference engine algorithm of the expert system, so that the order of the questions asked is decided dynamically from the answers of the users, and suitable afforestation tree species are then recommended on the basis of those answers.
Author Contributions
Conceptualization, Y.C., D.C. and B.W.; methodology, Y.C. and D.C.; software, D.C.; validation, Y.C.; formal analysis, Y.C.; investigation, Y.C.; resources, B.W.; data curation, Y.C.; writing—original draft preparation, Y.C. and D.C.; writing—review and editing, Y.C.; visualization, Y.C.; supervision, B.W. and Y.Q.; project administration, Y.C., D.C. and B.W.; funding acquisition, B.W.
Funding
This research was funded by the Key National Research and Development Program of China, grant number 2017YFD0600906.
Acknowledgments
We are grateful to the Guizhou Forestry Department and the Guizhou Forestry Investigation and Planning Institute for supplying valuable modeling data.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Table A1.
Tree suitability site rule table for Chinese fir and Masson Pine in Jinping County, Guizhou Province.
| Rule Number | Region | DM | PD | PX | PW | HB | TRMZ | TRMC | TCHD | Tree Species | Tree Suitability |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Guizhou Jinping | | | | NoSlope | | Sandstone*SandstoneSha*Slate | | | Chinese fir | MostSuitable |
| 1856 | Guizhou Jinping | | GentleSlope*Incline | HalfsunnySlo*ShadySlope | downhill*MidSlope | Low | SandstoneSha | YellowSoil | Thick | Chinese fir | MostSuitable |
| 2 | Guizhou Jinping | | | | | | Shale | | | Chinese fir | MostSuitable |
| 450 | Guizhou Jinping | | AbruptSlope*FlatSlope*GentleSlope*Incline | HalfsunnySlo | downhill*MidSlope*Ridge*valley | | Slate | | Thick | Chinese fir | MostSuitable |
| 902 | Guizhou Jinping | | AbruptSlope*FlatSlope*GentleSlope*Incline | SunnySlope | MidSlope*Ridge*valley | | Slate | | Thick | Chinese fir | MostSuitable |
| 932 | Guizhou Jinping | | AbruptSlope*DangerousSlo*SteepSlope | | MidSlope | Low | Sandstone*SandstoneSha | YellowSoil | Thick | Chinese fir | MostSuitable |
| 224 | Guizhou Jinping | | AbruptSlope*FlatSlope*GentleSlope*Incline | ShadySlope | downhill*MidSlope*Ridge*valley | | Slate | | Thick | Chinese fir | Suitable |
| 903 | Guizhou Jinping | | AbruptSlope*FlatSlope*GentleSlope*Incline | SunnySlope | downhill | | Slate | | Thick | Chinese fir | Suitable |
| 243 | Guizhou Jinping | | | HalfsunnySlo | FlatLand*Ridge*Uphill*valley | Low | SandstoneSha | | Middl*Thin | Chinese fir | Unsuitable |
| 242 | Guizhou Jinping | | | ShadySlope*SunnySlope | FlatLand*Ridge*Uphill*valley | Low | SandstoneSha | | Middl*Thin | Chinese fir | Suitable |
| 120 | Guizhou Jinping | | | | downhill*MidSlope | Low | SandstoneSha | | Middl*Thin | Chinese fir | Suitable |
| 1857 | Guizhou Jinping | | GentleSlope*Incline | SunnySlope | downhill*MidSlope | Low | SandstoneSha | YellowSoil | Thick | Chinese fir | Suitable |
| 933 | Guizhou Jinping | | AbruptSlope*DangerousSlo*SteepSlope | | downhill | Low | Sandstone*SandstoneSha | YellowSoil | Thick | Chinese fir | Unsuitable |
| 929 | Guizhou Jinping | | GentleSlope*Incline | | downhill*MidSlope | Low | Sandstone | YellowSoil | Thick | Chinese fir | Suitable |
| 57 | Guizhou Jinping | | | | Uphill | | Slate | | Thick | Chinese fir | Suitable |
| 117 | Guizhou Jinping | | | | Uphill | Low | Sandstone*SandstoneSha | | Thick | Chinese fir | Unsuitable |
| 61 | Guizhou Jinping | | | | downhill*FlatLand*MidSlope*Ridge*Uphill*valley | Low | Sandstone*Slate | | Middl*Thin | Chinese fir | Unsuitable |
| 465 | Guizhou Jinping | | GentleSlope*Incline | | downhill*MidSlope | Low | Sandstone*SandstoneSha | RedSoil*YellowBrownS | Thick | Chinese fir | Unsuitable |
| 467 | Guizhou Jinping | | AbruptSlope*DangerousSlo*SteepSlope | | downhill*MidSlope | Low | Sandstone*SandstoneSha | RedSoil*YellowBrownS | Thick | Chinese fir | Suitable |
| 31 | Guizhou Jinping | | | | downhill*FlatLand*MidSlope*Ridge*Uphill*valley | Medi | Sandstone*SandstoneSha*Slate | | Middl*Thin | Chinese fir | Unsuitable |
| 119 | Guizhou Jinping | | | | MidSlope | Medi | Sandstone*SandstoneSha | | Thick | Chinese fir | Unsuitable |
| 118 | Guizhou Jinping | | | | Ridge*Uphill | Medi | Sandstone*SandstoneSha | | Thick | Chinese fir | Suitable |
| 113 | Guizhou Jinping | | SteepSlope | | downhill*MidSlope*Ridge*valley | | Slate | | Thick | Chinese fir | Suitable |
| 20 | Guizhou Jinping | | | | MidSlope*Ridge | | | | Thick*Thin | Masson Pine | MostSuitable |
| 8 | Guizhou Jinping | | | | downhill*FlatLand | | Sandstone*Shale*Slate | | | Masson Pine | MostSuitable |
| 12 | Guizhou Jinping | | | ShadySlope | Uphill | | Shale*Slate | | | Masson Pine | MostSuitable |
| 180 | Guizhou Jinping | | AbruptSlope*Incline | HalfsunnySlo | MidSlope*Ridge | | Shale*Slate | | Middl | Masson Pine | MostSuitable |
| 9 | Guizhou Jinping | | | | downhill*FlatLand | | SandstoneSha | | | Masson Pine | Unsuitable |
| 44 | Guizhou Jinping | | | | NoSlope | | Shale*Slate | | Middl | Masson Pine | Suitable |
| 363 | Guizhou Jinping | | Incline | ShadySlope*SunnySlope | MidSlope*Ridge | | Shale*Slate | | Middl | Masson Pine | Unsuitable |
| 362 | Guizhou Jinping | | AbruptSlope | ShadySlope*SunnySlope | MidSlope*Ridge | | Shale*Slate | | Middl | Masson Pine | Suitable |
| 21 | Guizhou Jinping | | | | NoSlope | | | | Thick*Thin | Masson Pine | Unsuitable |
| 26 | Guizhou Jinping | | | HalfsunnySlo*SunnySlope | Uphill | | Shale*Slate | | Middl | Masson Pine | Suitable |
| 91 | Guizhou Jinping | | GentleSlope*SteepSlope | | MidSlope*Ridge | | Shale*Slate | | Middl | Masson Pine | Unsuitable |
| 23 | Guizhou Jinping | | | | MidSlope*NoSlope*Ridge | | Sandstone*SandstoneSha | | Middl | Masson Pine | Unsuitable |
| 7 | Guizhou Jinping | | | | Uphill | | Sandstone*SandstoneSha | | | Masson Pine | Unsuitable |
| 27 | Guizhou Jinping | | | HalfsunnySlo*SunnySlope | Uphill | | Shale*Slate | | Thick | Masson Pine | Unsuitable |
References
1. Shen, G.; Zhai, M. Silviculture, 2nd ed.; Chinese Forestry Press: Beijing, China, 2011.
2. Gong, Y. Study of Site Knowledge Discovery Based on Multivariate Forestry Information; Beijing Forestry University: Beijing, China, 2013.
3. Gillman, L.N.; Wright, S.D.; Cusens, J.; McBride, P.D.; Malhi, Y.; Whittaker, R.J. Latitude, productivity and species richness. Glob. Ecol. Biogeogr. 2015, 24, 107–117.
4. Poorter, L.; Van der Sande, M.T.; Arets, E.J.; Ascarrunz, N.; Enquist, B.J.; Finegan, B.; Licona, J.C.; Martínez-Ramos, M.; Mazzei, L.; Meave, J.A.; et al. Biodiversity and climate determine the functioning of Neotropical forests. Glob. Ecol. Biogeogr. 2017, 26, 1423–1434.
5. Ali, A.; Lin, S.L.; He, J.K.; Kong, F.M.; Yu, J.H.; Jiang, H.S. Climatic water availability is the main limiting factor of biotic attributes across large-scale elevational gradients in tropical forests. Sci. Total Environ. 2019, 647, 1211–1221.
6. Albrektson, A. Needle litterfall in stands of Pinus sylvestris L. in Sweden, in relation to site quality, stand age and latitude. Scand. J. For. Res. 1988, 3, 333–342.
7. Green, R.N.; Marshall, P.L.; Klinka, K. Estimating site index of Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) from ecological variables in southwestern British Columbia. For. Sci. 1989, 35, 50–63.
8. Karlsson, A.; Albrektson, A.; Sonesson, J. Site index and productivity of artificially regenerated Betula pendula and Betula pubescens stands on former farmland in southern and central Sweden. Scand. J. For. Res. 1997, 12, 256–263.
9. Carmean, W.H. Forest Site Quality Evaluation in the United States. Adv. Agron. 1975, 27, 209–269.
10. Zeide, B.; Zakrzewski, W.T. Selection of site trees: The combined method and its application. Can. J. For. Res. 1993, 23, 1019–1025.
11. Palahí, M.; Pukkala, T.; Kasimiadis, D.; Poirazidis, K.; Papageorgiou, A.C. Modelling site quality and individual-tree growth in pure and mixed Pinus brutia stands in north-east Greece. Ann. For. Sci. 2008, 65, 501.
12. Socha, J. Effect of topography and geology on the site index of Picea abies in the West Carpathian, Poland. Scand. J. For. Res. 2008, 23, 203–213.
13. Fu, L.; Lei, X.; Sharma, R.P.; Li, H.; Zhu, G.; Hong, L.; You, L.; Duan, G.; Guo, H.; Lei, Y.; et al. Comparing height–age and height–diameter modelling approaches for estimating site productivity of natural uneven-aged forests. For. Int. J. For. Res. 2018, 91, 419–433.
14. Borders, B.E.; Bailey, R.L.; Clutter, M.L. Forest growth models: Parameter estimation using real growth series. In Forest Growth Modelling and Prediction, Proceedings of the IUFRO Conference, Minneapolis, MN, USA, 23–27 August 1987; Ek, A.R., Shifley, S.R., Burk, T.E., Eds.; U.S. Department of Agriculture, Forest Service, North Central Forest Experiment Station: St. Paul, MN, USA, 1988.
15. Zhao, L.; Ni, C.; Gordon, N. Generalized Algebraic Difference Site Index Model for Ponderosa Pine in British Columbia, Canada. Sci. Silvae Sin. 2012, 48, 74–81.
16. Kimberley, M.O.; Ledgard, N.J. Site index curves for grown in the South Island high country, New Zealand. N. Z. J. For. Sci. 1998, 28, 389–399.
17. Hui, G.; Zhang, L.; Hu, Y.; Zhao, Z. A new Method for Establishing Richards Polymorphic Site Index Model: Parameter Replacement. For. Res. 2010, 23, 481–486.
18. Pienaar, L.V.; Shiver, B.D. Dominant height growth and site index curves for loblolly pine plantations in the Carolina flatwoods. South. J. Appl. For. 1980, 4, 54–59.
19. Upadhyay, A.; Eid, T.; Sankhayan, P.L. Construction of site index equations for even aged stands of Tectona grandis (teak) from permanent plot data in India. For. Ecol. Manag. 2005, 212, 14–22.
20. Rivas, J.J.; González, J.G.; González, A.D.; Von Gadow, K. Compatible height and site index models for five pine species in El Salto, Durango (Mexico). For. Ecol. Manag. 2004, 201, 145–160.
21. Duan, A.; Zhang, J. Modeling of Dominant Height Growth and Building of Polymorphic Site Index Equations of Chinese Fir Plantation. Sci. Silvae Sin. 2004, 40, 13–19.
22. Wang, C. Establishment of Plantation Site Quality Evaluation System; Beijing Forestry University: Beijing, China, 2013.
23. Moisen, G.G.; Frescino, T.S. Comparing five modelling techniques for predicting forest characteristics. Ecol. Model. 2002, 157, 209–225.
24. Schröder, J.M.; Jaenicke, H. A computerized database as decision support tool for the selection of agroforestry tree species. Agrofor. Syst. 1994, 26, 65–70.
25. Hu, B.; Wu, B.; Lu, D. System of Experts on Afforestation Based on ASP.NET. For. Inventory Plan. 2005, 30, 20–23.
26. Ding, Q.; Wu, B. The design and application of an expert system based on generative rule. Agric. Netw. Inf. 2006, 8, 16–18.
27. Wu, B.; Ding, Q.; Hu, B. Study on Consultation System for Experts in Afforestation Based on Web. Sci. Silvae Sin. 2006, 42, 85–89.
28. Wu, B.; Ding, Q.; Wang, L. A forestation planning expert decision advisory system. N. Z. J. Agric. Res. 2007, 50, 1399–1404.
29. Ma, C.; Wu, B. Forestation expert system based on the production and framework knowledge representation. Agric. Netw. Inf. 2009, 5, 22–24.
30. Ma, C. Establishment of Fast-Growing and High-Yielding Forest Cultivation Expert System; Beijing Forestry University: Beijing, China, 2009.
31. Souza, H.N.D.; Graaff, J.D.; Pulleman, M.M. Strategies and economics of farming systems with coffee in the Atlantic Rainforest Biome. Agrofor. Syst. 2012, 84, 227–242.
32. Han, Y.; Wu, B.; Liu, J.; Guo, Y.; Dong, C. Application of uncertainty inference in the forest cultivation expert system. J. Beijing For. Univ. 2014, 36, 88–93.
33. Prabakaran, G.; Vaithiyanathan, D.; Ganesan, M. Fuzzy decision support system for improving the crop productivity and efficient use of fertilizers. Comput. Electron. Agric. 2018, 150, 88–97.
34. Vásquez, R.P.; Aguilar-Lasserre, A.A.; López-Segura, M.V.; Rivero, L.C.; Rodríguez-Duran, A.A.; Rojas-Luna, M.A. Expert system based on a fuzzy logic model for the analysis of the sustainable livestock production dynamic system. Comput. Electron. Agric. 2019, 161, 104–120.
35. Chi, Q. Classification Algorithm Study and Application based on Decision Tree; Shandong Normal University: Jinan, China, 2005.
36. Miao, J.; Gong, Y. Data mining of environmental factors affecting the value of forest assets. Land Resour. North China 2013, 2, 50–54.
37. Torresan, C.; Corona, P.; Scrinzi, G.; Marsal, J.V. Using classification trees to predict forest structure types from LiDAR data. Ann. For. Res. 2016, 59, 281–298.
38. Isaac-Renton, M.G.; Roberts, D.R.; Hamann, A.; Spiecker, H. Douglas-fir plantations in Europe: A retrospective test of assisted migration to address climate change. Glob. Chang. Biol. 2014, 20, 2607–2617.
39. Wang, T.; Campbell, E.M.; O’Neill, G.A.; Aitken, S.N. Projecting future distributions of ecosystem climate niches: Uncertainties and management applications. For. Ecol. Manag. 2012, 279, 128–140.
40. Marchi, M.; Ducci, F. Some refinements on species distribution models using tree-level National Forest Inventories for supporting forest management and marginal forest population detection. iForest Biogeosciences For. 2018, 11, 291.
41. De-Miguel, S.; Mehtätalo, L.; Shater, Z.; Kraid, B.; Pukkala, T. Evaluating marginal and conditional predictions of taper models in the absence of calibration data. Can. J. For. Res. 2012, 42, 1383–1394.
42. Meng, X. Forest Mensuration, 3rd ed.; China Forestry Publishing House: Beijing, China, 2006.
43. Lei, X.; Fu, L.; Li, H.; Li, Y.; Tang, S. Methodology and Applications of Site Quality Assessment Based on Potential Mean Annual Increment. Sci. Silvae Sin. 2018, 54, 116–126.
44. Hao, L.; Naiman, D.Q. Quantile Regression Model; Shanghai People’s Publishing House: Shanghai, China, 2012.
45. Zhang, L. The Change Point Problem in Quantile Regression and Its Application; Economic Management Press: Beijing, China, 2017.
46. Su, Y.; Wan, Y. The Idea and Application of Quantile Regression. Stat. Thinktank 2009, 10, 58–61.
47. Carmean, W.H.; Lenthall, D.J. Height-growth and site-index curves for jack pine in north central Ont. Can. J. For. Res. 1989, 19, 215–224.
48. Huang, S.; Titus, S.J. An index of site productivity for uneven-aged or mixed-species stands. Can. J. For. Res. 1993, 23, 558–562.
49. Feng, Y.; Huang, T.; Hou, C. Design of Expert System and Neural Network Integration System. J. Manag. Sci. China 1999, 1, 82–88.
50. Feng, Y. Research on Intelligent Decision Support System Based on Integration of Neural Network and Expert System; Harbin University of Technology: Harbin, China, 1999.
51. Li, H. Statistical Learning Method, 2nd ed.; Tsinghua University Press: Beijing, China, 2019.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).




