1. Introduction
The tree crown is an important part of a tree. It reflects the growth status of individual trees and the degree to which trees adapt to, and vary across, different growth environments [1,2]. Significant physiological processes such as photosynthesis, respiration, and transpiration take place in the tree crown, and the ecological environment within the crown is also an important component of the forest ecosystem. The shape of the tree crown and the distribution of its leaves affect rainfall interception and the utilization of solar energy [3,4,5,6]. Crown structure affects both the growth of individual trees and the dynamics of forest stands. Therefore, the study of crown shape is of great significance. Crown visualization provides the basis for the dynamic visualization and simulation of forest stands and is a major research topic in forestry informatization. It is also important for three-dimensional (3D) visualization-assisted decision-making systems for plantations, because it permits direct observation of the growth status of plantations and calculation of crown density. The tree crown is often treated as a bounded three-dimensional geometric body [7,8]. When the tree crown is cut by a vertical plane through the trunk, the resulting closed intersection line is called the crown contour envelope. Conversely, rotating the crown contour envelope around the trunk reconstructs the tree crown.
Many methods have been used to study tree crown contour envelope, such as the fractal method, the simple geometry method, and the mathematical model method.
The fractal method is based on fractal geometry, a concept proposed by Mandelbrot in the 1970s that has been widely applied in botany [9]. Paul Henning, Kangning Lu, and other scholars applied fractal-based tree growth modeling to construct crown contour envelope models (CCEMs) [10,11]. However, the fractal method has difficulty representing the crowns of different tree species, and it cannot reflect the influence of a range of tree parameters on crown growth.
The simple geometry method represents crown shape with simple geometric forms, such as cylinders, cones, paraboloids, or combinations of them. In early research, Gill, Biging, Hann, Marshall, and other scholars represented the crown shapes of different tree species, such as Douglas fir (Pseudotsuga menziesii (Mirbel) Franco) and eastern hemlock (Tsuga canadensis (L.) Carrière), and of different growth stages of the same species, as regular geometric bodies such as cones, paraboloids, ellipsoids, and cylinders, and established models to predict crown volume and crown radius at any height within the crown [12,13,14,15]. This method has long been used to study the relationship between crown size and tree growth. Enying Guo and Han represented the trees of a Chinese fir plantation as combinations of cones, truncated cones, and cylinders, constructed a crown morphology model based on diameter at breast height (DBH), tree height, crown radius, and crown length, and used visualization technology to achieve a 3D visual simulation of the growth of a Chinese fir plantation [16,17]. Because crown factors are difficult to measure, many researchers have used this simple and convenient method to predict crown shape. However, because the crown shapes of trees in stands are irregular, this method cannot fully describe them.
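Under the simple geometry method, crown radius at any depth within the crown, and crown volume, follow directly from the chosen solid. The sketch below is a minimal illustration, assuming the crown reaches its maximum radius `R` at its base and has crown length `CL`; the function names are hypothetical, and the shapes are the textbook solids named above, not the fitted models of the cited studies.

```python
import math

# Crown radius at depth d (m) below the crown apex, for a crown of length CL (m)
# whose maximum radius R (m) is reached at the crown base.
SHAPES = {
    "cylinder":   lambda d, CL, R: R,
    "cone":       lambda d, CL, R: R * d / CL,
    "paraboloid": lambda d, CL, R: R * math.sqrt(d / CL),
    # Upper half of an ellipsoid with horizontal semi-axis R and vertical semi-axis CL.
    "ellipsoid":  lambda d, CL, R: R * math.sqrt(1.0 - ((CL - d) / CL) ** 2),
}

# Volume of each solid as a fraction of the enclosing cylinder pi * R^2 * CL.
VOLUME_FACTOR = {"cylinder": 1.0, "cone": 1.0 / 3.0, "paraboloid": 0.5, "ellipsoid": 2.0 / 3.0}

def crown_radius(shape, d, CL, R):
    """Crown radius (m) at depth d for one of the regular solids above."""
    if not 0.0 <= d <= CL:
        raise ValueError("depth must lie within the crown")
    return SHAPES[shape](d, CL, R)

def solid_volume(shape, CL, R):
    """Closed-form crown volume (m^3) for one of the regular solids above."""
    return VOLUME_FACTOR[shape] * math.pi * R ** 2 * CL

for shape in SHAPES:
    print(shape, round(crown_radius(shape, 2.0, 8.0, 2.0), 3), round(solid_volume(shape, 8.0, 2.0), 2))
```

Storing the shapes in a dictionary keeps the comparison across species or growth stages to a change of one key, which mirrors how these studies assign a different solid to each case.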
The mathematical model method uses mathematical equations to simulate the crown contour envelope at given growth stages. Crecente Campo, Hann, Yanrong Guo, and other scholars used simple polynomials and piecewise functions to define the crown contour envelope [18,19]. Other researchers have used modified Kozak, Weibull, and other special curves to define crown shape, together with different approaches to building CCEMs [20]. This method can describe crown shape accurately with only a small number of tree parameters. However, because so few variables are used, such models do not effectively capture differences in tree morphology or the complex dynamic changes of the crown as trees grow. In addition, reparameterizing these models by adding relevant tree factors is complex.
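As an illustration of the mathematical model method, the sketch below fits a two-parameter power envelope, CR = a·DINCT^b, by nonlinear least squares with SciPy's `curve_fit`. Both the model form and the synthetic data are assumptions for demonstration only, not one of the published CCEMs cited above, and DINCT is assumed here to denote depth into the crown measured from the apex.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_envelope(dinct, a, b):
    """Hypothetical crown contour envelope: crown radius CR as a power of DINCT."""
    return a * np.power(dinct, b)

# Synthetic example data (not field measurements), generated with a = 0.9, b = 0.6.
rng = np.random.default_rng(0)
dinct = np.linspace(0.2, 8.0, 40)   # depth into the crown (m)
cr = power_envelope(dinct, 0.9, 0.6) + rng.normal(0.0, 0.02, dinct.size)

# Nonlinear least squares; p0 is a rough starting guess for (a, b).
params, cov = curve_fit(power_envelope, dinct, cr, p0=(1.0, 0.5))
a_hat, b_hat = params
residuals = cr - power_envelope(dinct, a_hat, b_hat)
rmse = float(np.sqrt(np.mean(residuals ** 2)))
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, RMSE = {rmse:.3f} m")
```

Adding a covariate such as DBH would mean rewriting `power_envelope` itself (for example, making `a` a function of DBH), which illustrates why reparameterizing these models is laborious.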
With the development of artificial intelligence technology, machine learning has provided a new approach to forest growth and harvest prediction [21]. Machine learning has several advantages: it makes no assumptions about the distribution of input data, it can reveal hidden structure in the data, and its predictions are robust. It has been widely used in forest research, including the prediction of tree height, DBH, and volume [22,23,24,25,26]. Fitting mathematical models, by contrast, has several drawbacks: the model form, the tree factors, and the regression equation are all difficult to determine, and the model forms or parameters differ among tree species and among regions for the same species, so considerable work is required to revise these models. Machine learning avoids these problems and can quickly generate models suited to the research objectives to simulate tree crowns.
Random forest, proposed by Leo Breiman [27], is an ensemble algorithm built from classification and regression trees. Each tree is trained on a bootstrap sample drawn with replacement from the training data, and the final output is the simple majority vote (classification) or the average (regression) of the outputs of the individual trees. Random forest is relatively robust to outliers and noise, trains faster than boosting algorithms, and is less prone to overfitting. It has been applied to fire prediction and to forest growth and harvest prediction [28].
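The two mechanisms described above, bootstrap sampling and averaging of tree outputs, can be checked directly in scikit-learn. This is a minimal sketch on synthetic data standing in for tree factors and crown radius; the dataset and settings are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data standing in for tree factors (X) and crown radius (y).
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=42)

# Each tree is grown on a bootstrap sample (rows drawn with replacement);
# oob_score=True scores each tree on the rows left out of its bootstrap sample.
rf = RandomForestRegressor(n_estimators=200, bootstrap=True, oob_score=True, random_state=42)
rf.fit(X, y)

# For regression, the forest prediction is the average of the individual trees.
per_tree = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])
forest = rf.predict(X[:5])
print(np.allclose(per_tree.mean(axis=0), forest))  # True
print(f"out-of-bag R^2: {rf.oob_score_:.3f}")
```

The out-of-bag score is a by-product of bootstrapping: roughly a third of the rows are left out of each tree's sample and can serve as a built-in validation set.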
Fujian Province is located in southeastern China and has the highest forest coverage rate (66.8%) in China. Chinese fir is one of the most important plantation tree species in Fujian Province and accounts for 21.35% of the total plantation area in China. To date, few studies have examined the crown contour envelope of Chinese fir plantations. Most CCEMs based on mathematical modeling have used only crown depth as the independent variable to predict crown shape, and few studies have added variables such as tree height and DBH as covariates to improve model accuracy. In addition, machine learning has not yet been used to predict Chinese fir crown shape. Thus, the goals of the present study were (1) to collect classical CCEMs suitable for Chinese fir and fit them to Chinese fir crown data from Fujian Province; (2) to screen the tree factors that affect Chinese fir crown shape using different feature selection methods, construct random forest regression models, and tune their hyperparameters; and (3) to evaluate and compare the CCEMs constructed by mathematical modeling and by random forest regression.
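Goal (2) includes tuning random forest hyperparameters. One common way to do this is an exhaustive grid search with cross-validation, sketched below with scikit-learn's `GridSearchCV`. The search space, the synthetic data, and the RMSE criterion are assumptions for illustration; the optimization scheme actually used in the study is not described in this section.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the screened tree factors and crown radius.
X, y = make_regression(n_samples=200, n_features=8, noise=3.0, random_state=0)

# Hypothetical search space; every combination is evaluated with 5-fold CV.
param_grid = {
    "n_estimators": [50, 150],
    "max_features": [0.5, 1.0],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, so RMSE is negated
)
search.fit(X, y)
print(search.best_params_)
print(f"cross-validated RMSE: {-search.best_score_:.3f}")
```

A grid search is simple and reproducible; for larger search spaces, `RandomizedSearchCV` samples combinations instead of enumerating them all.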
4. Discussion
The tree crown is important for evaluating the growth vigor of trees and their competitive status relative to neighboring trees. Forest stand 3D visualization is also an important part of decision-making systems for plantation growth and harvest. In early 3D visualizations of forest stands, trees were represented only by simple geometric forms, such as cylinders and cones, which could not accurately capture actual tree growth. In the 1980s, some researchers applied the concept of fractals to the visualization of tree crown contours. Although this method captured crown contour shape to a certain extent, it did not differentiate among the crown contours of individual trees, and the fractal parameters were not easy to determine. Researchers then treated the crown contour as a continuous, complete curve expressed by a specific function. Early equations of this kind contained only two variables, DINCT (or DINCB) and CR, so the resulting models described all crowns with a uniform shape, even though crown contour shape differs among growth stages. Consequently, some researchers added variables such as AGE, N, DBH, and CL to the equation. Adding these variables can improve model accuracy and better reflect differences among trees; however, AGE is strongly correlated with DBH, CL, and other variables, determining and modifying the model form is difficult, and the model forms for different tree species and ages must be considered comprehensively. Among the models considered in this paper, adding HT, N, AGE, CH, and other variables did not noticeably improve model accuracy.
The random forest regression results showed that adding multiple tree characteristic factors improved the fitting accuracy of the Chinese fir crown contour envelope, and that random forest regression models built from different combinations of tree characteristics differed in precision. Therefore, single factors such as HT and AGE, together with composite factors such as CR and CLC, could be useful for predicting the Chinese fir crown contour envelope. In both the training set and the test set, the random forest regression model achieved higher fitting accuracy and explanatory power than the mathematical regression model, and its overall performance was better. Variable importance analysis showed that the main factors affecting the crown contour envelope in Chinese fir plantations were LCR, N, AGE, DBH, and HT, with LCR having the most significant effect.
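This section does not state which variable importance measure was used. Permutation importance is one common choice and is swapped in here purely for illustration: it measures how much the test-set R² drops when a single column is randomly shuffled. The sketch below reuses the factor names LCR, N, AGE, DBH, and HT only as labels; the data are synthetic and constructed so that LCR dominates, so the ranking it prints says nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
names = ["LCR", "N", "AGE", "DBH", "HT"]  # labels only; the data below are synthetic
X = rng.uniform(0.0, 1.0, size=(400, len(names)))
# Synthetic response in which the first column ("LCR") dominates by construction.
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0.0, 0.1, 400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_train, y_train)

# Importance = mean drop in test-set R^2 when one column is shuffled (10 repeats).
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=1)
ranking = sorted(zip(names, result.importances_mean), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name:>4}: {score:.3f}")
```

Computing importance on held-out data, rather than relying on the impurity-based `feature_importances_` attribute, avoids the known bias of impurity importance toward features with many distinct values.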
A CCEM based on random forest regression does not need to account for correlations among variables, and the modeling process is relatively simple. Therefore, different combinations of variables can be compared and the best set selected to build the random forest regression model. In this study, the random forest regression models constructed with all four feature selection methods performed well; the best was the model constructed with MI. The features retained by this method were N, AGE, DBH, HT, HBLC, LCR, CLC, DINCT, DINCB, RDINCT, and RDINCB. Among these variables, N and AGE are initial factors; DBH, HT, HBLC, and LCR have well-established growth models with AGE, N, and SI (site index) as variables, as well as distribution models; and the remaining composite factors can be calculated from these single factors [49,50,51,52]. Thus, the random forest CCEM achieves higher accuracy than the CCEMs based on mathematical modeling and can describe different crown shapes at various stages of growth, so it can accurately reflect differences in tree crown morphology among forests. A forest stand 3D model built on it is therefore of great value for 3D visualization of plantations and for the management of plantation growth and harvest.
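MI-based feature screening can be sketched with scikit-learn's `mutual_info_regression` inside `SelectKBest`. This is a minimal illustration on synthetic data with placeholder feature names; the number of retained features (`k=3`) and the data-generating function are assumptions, and the study's own MI procedure and retention threshold are not described in this section.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_regression

rng = np.random.default_rng(7)
names = [f"x{i}" for i in range(8)]  # placeholder names, not the study's tree factors
X = rng.uniform(-1.0, 1.0, size=(1000, 8))
# Synthetic response: depends on x0, |x1| (non-monotonic), and x2; x3..x7 are pure noise.
y = X[:, 0] + 2.0 * np.abs(X[:, 1]) + X[:, 2] + rng.normal(0.0, 0.05, 1000)

# Fix the estimator's random_state so that the MI scores are reproducible.
mi = partial(mutual_info_regression, random_state=7)
selector = SelectKBest(score_func=mi, k=3).fit(X, y)
kept = [n for n, keep in zip(names, selector.get_support()) if keep]
print(dict(zip(names, np.round(selector.scores_, 3))))
print("kept:", kept)
```

Because x1 enters only through |x1|, its linear correlation with y is close to zero, yet its mutual information is clearly positive; this sensitivity to non-monotonic dependence is one reason MI can retain factors that correlation-based screening would discard.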
Figure 11 shows the crown contour envelopes of standard trees aged 5, 10, 15, 20, and 25 years in a Chinese fir plantation. The X-axis is DINCT, and the Y-axis is CR. For the comparison, the best-performing models of each approach were selected: Model (7) for mathematical modeling and the MI-based model for random forest regression. At 5, 10, and 15 years, the CCEM based on the random forest regression model outperforms the mathematical modeling regression model. At 20 years, the random forest prediction is slightly better than the mathematical model, whereas at 25 years the two predictions are close. Overall, the random forest method achieves higher fitting accuracy than mathematical modeling.
One advantage of the mathematical modeling approach is that it is highly generalized, so the CCEM constructed by mathematical modeling is relatively simple. However, it assigns the same crown shape to all trees in a stand, which does not meet the requirements of modern precision forestry. When combined with existing stand distribution models, such as the HT distribution model and the DBH distribution model, a CCEM based on random forest can accurately reflect differences among trees in a stand. Covariates such as HT and N can also be added to the mathematical model to improve prediction accuracy. However, the model form then becomes extremely difficult to determine, fitting becomes harder, and generalization is reduced to some extent. For example, Chengde Wang added the covariates DBH, HT, and N to Models (1) and (9) to obtain Models (12)–(14), and the results showed that adding covariates effectively improved fitting accuracy [38]. In our study, however, the improvements from adding covariates were small, and adding more covariates makes the model form harder to control. For example, Model (7), with three covariates, had the lowest prediction accuracy among all of the CCEMs. Another advantage of random forest is that feature screening helps identify the tree factors most closely related to the crown, which aids the study of crown shape. According to the four feature screening methods used in this study, AGE, DBH, N, and HT significantly affect crown shape. To address the black-box nature of machine learning, several random sampling tests were carried out. Following common practice, the dataset was randomly split 200 times into a training set (70%) and a test set (30%). After the parameters were selected, validated, and compared, the final parameters remained stable across these repeated splits. The ultimate goal of CCEMs is to aid plantation management, and CCEMs constructed with random forest can be tailored to specific areas. As the amount of sample data increases, the prediction accuracy of the CCEM increases, because random forest regression provides a robust way of handling similar samples. For Chinese fir in other areas or for other tree species, the feature combinations and hyperparameter optimization scheme used in this study may not be optimal, but if sufficient sample data are available, the method described in this paper could still be used to construct random forest CCEMs for the target areas or tree species.
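The repeated random-splitting check described above can be sketched with scikit-learn's `ShuffleSplit`, which draws independent 70%/30% partitions, and `cross_val_score`, which refits a fresh copy of the model on each training set. The synthetic data and forest settings are assumptions; only the 200 splits and the 70/30 ratio come from the text.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic data standing in for the crown dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=3)

# 200 independent random partitions: 70% training, 30% test.
splitter = ShuffleSplit(n_splits=200, test_size=0.3, random_state=3)
model = RandomForestRegressor(n_estimators=50, random_state=0)  # hypothetical settings

# One test-set R^2 per split; the model is cloned and refit on each training set.
scores = cross_val_score(model, X, y, cv=splitter, scoring="r2")

# A narrow spread across splits indicates that performance does not hinge on one split.
print(f"test R^2: mean {scores.mean():.3f}, sd {scores.std(ddof=1):.3f}, "
      f"range [{scores.min():.3f}, {scores.max():.3f}]")
```

Unlike k-fold cross-validation, `ShuffleSplit` partitions may overlap between iterations, so the 200 scores describe the distribution of performance over random splits rather than a single partition of the data.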