**3. Results**

#### *3.1. Determination of the Optimal Novel Vegetation Index*

According to the formula for the novel vegetation index (*NDVIRE*), the weighting coefficients "*α*" and "*β*" each range over [0, 1] with a step size of 0.1, which gives 11 × 11 = 121 candidate *NDVIRE* indices. Python 3.10 was used to calculate each *NDVIRE* value for all small class data, together with the Pearson correlation coefficient between each *NDVIRE* and the FSV per unit area. The results are shown in Figure 2 (correlations are significant at the 0.01 level, two-tailed). For comparison, the Pearson correlation coefficient between the traditional NDVI and the FSV per unit area is also plotted in the figure. The 47th *NDVIRE* had the highest correlation coefficient (r = 0.778), higher than that of the traditional NDVI (r = 0.767); its corresponding values of "*α*" and "*β*" were 0.4 and 0.2, respectively. This index was therefore taken as the optimal *NDVIRE* and used for the subsequent modeling analysis.

**Figure 2.** Pearson correlation coefficients of the NDVI and *NDVIRE* with FSV per unit area.

#### *3.2. Major Variables Selection and the Importance Related to the FSV Data*

Two types of variables, bands (B2, B3, B4, B5, B6, B7, B8, and B8A) and vegetation indices (NDVI, DVI, RVI, PVI, TVI, EVI, and *NDVIRE*), were selected for modeling. Figures 3–5 show the variable selection process for the three models (BBM, VBM, and BVBM), and Table 4 lists the final variables selected for each model.

**Figure 3.** Variable selection for the BBM. (**a**,**b**) Removal of the negatively important variables based on the mean and standard deviation of the variable importance (VI), respectively (in (**a**), the horizontal solid red line marks the threshold position; in (**b**), the green piecewise line shows the value predicted by the CART model, and the horizontal dashed red line shows the minimum predicted value). (**c**) Random forests are built stepwise, from the most important variable alone up to all variables retained in the first step, and the variables are selected according to the average OOB error (the vertical solid red line marks the position of the minimum error). (**d**) The number of variables meeting the requirements.

**Figure 4.** Variable selection for the VBM. (**a**,**b**) Removal of the negatively important variables based on the mean and standard deviation of the VI, respectively (in (**a**), the horizontal solid red line marks the threshold position; in (**b**), the green piecewise line shows the value predicted by the CART model, and the horizontal dashed red line shows the minimum predicted value). (**c**) Random forests are built stepwise, from the most important variable alone up to all variables retained in the first step, and the variables are selected according to the average OOB error (the vertical solid red line marks the position of the minimum error). (**d**) The number of variables meeting the requirements.

**Figure 5.** Variable selection for the BVBM. (**a**,**b**) Removal of the negatively important variables based on the mean and standard deviation of the VI, respectively (in (**a**), the horizontal solid red line marks the threshold position; in (**b**), the green piecewise line shows the value predicted by the CART model, and the horizontal dashed red line shows the minimum predicted value). (**c**) Random forests are built stepwise, from the most important variable alone up to all variables retained in the first step, and the variables are selected according to the average OOB error (the vertical solid red line marks the position of the minimum error). (**d**) The number of variables meeting the requirements.


Furthermore, all predictor variables were ranked by their ability to estimate FSV using PercentIncMSE (%IncMSE) and IncNodePurity estimated from the OOB data. The greater the value, the more important the variable (Figure 6). Notably, the novel vegetation index *NDVIRE* ranks first in importance under both evaluation criteria.
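The permutation idea behind %IncMSE can be illustrated with a short sketch: shuffle one predictor, re-evaluate the model, and record how much the mean square error grows. The code below is an assumption-laden toy and not the study's procedure: the data are synthetic, `predict` is a fixed hypothetical function standing in for a fitted random forest, the error is computed on all rows rather than on OOB samples, and the raw increase is reported without the per-tree normalization that random forest software applies.

```python
import random


def mse(predict, rows, targets):
    """Mean square error of `predict` over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_increase(predict, rows, targets, column, repeats=20, seed=0):
    """Average increase in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    increases = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(predict, permuted, targets) - baseline)
    return sum(increases) / repeats


# Synthetic data: the target depends on column 0 only; column 1 is pure noise.
rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]
targets = [3 * r[0] + rng.gauss(0, 0.1) for r in rows]


def predict(row):
    """Hypothetical stand-in for a fitted model (uses column 0 only)."""
    return 3 * row[0]


inc_informative = permutation_increase(predict, rows, targets, column=0)
inc_noise = permutation_increase(predict, rows, targets, column=1)
print(inc_informative, inc_noise)
```

Shuffling the informative column degrades the predictions, whereas shuffling the column the model ignores leaves the error unchanged, which is why a larger value indicates a more important variable.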

**Figure 6.** Importance ranking plot of all variables. Left, %IncMSE (percentage increase in the mean square error, (**a**)), and right, IncNodePurity (increase in NodePurity, (**b**)).

#### *3.3. Optimal Regression Model for the Three Models*

To optimize the RF regression model, the optimal values of two key parameters must be found: "mtry", the number of variables randomly selected as candidates at each split of a decision tree, and "ntree", the total number of trees grown in the forest. An iterative procedure known as an "error rate loop" was used to find the minimum error rate, with the range of candidate values set according to the number of variables participating in each of the three models. Figure 7 shows how the optimal mtry and ntree were determined for the three models. The values of mtry and ntree and the other performance measures of each model are summarized in Table 5.
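A loop of this kind can be sketched in Python with scikit-learn, where `max_features` plays the role of mtry and `n_estimators` the role of ntree. This is a minimal sketch under stated assumptions and not the implementation used in the study: the data are synthetic, the helper name `tune_rf` and the candidate ntree values are hypothetical, and the OOB mean square error is used as the error measure.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def tune_rf(X, y, mtry_values, ntree_values, seed=0):
    """Fit one forest per (mtry, ntree) pair and record its OOB mean square error."""
    results = {}
    for mtry in mtry_values:
        for ntree in ntree_values:
            rf = RandomForestRegressor(
                n_estimators=ntree,   # ntree: number of trees in the forest
                max_features=mtry,    # mtry: candidate variables per split
                oob_score=True,       # keep out-of-bag predictions
                random_state=seed,
            )
            rf.fit(X, y)
            oob_mse = float(np.mean((y - rf.oob_prediction_) ** 2))
            results[(mtry, ntree)] = oob_mse
    best = min(results, key=results.get)
    return best, results


# Synthetic stand-in for eight predictor variables and the FSV response.
X, y = make_regression(n_samples=150, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)
best, results = tune_rf(X, y, mtry_values=range(1, 9), ntree_values=(100, 200))
print("best (mtry, ntree):", best, "OOB MSE:", round(results[best], 2))
```

The upper limit of the mtry range equals the number of predictors, so the range would differ for each of the three models.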
