**1. Introduction**

Vegetation is produced as a result of the interactions among factors such as soil, atmosphere, and moisture [1]. Vegetation is affected by climate because of biophysical responses such as plant respiration, photosynthesis, and evapotranspiration [2]. Recent research found that vegetation plays a key role in future terrestrial hydrologic response, and understanding water stress is of the utmost importance for properly predicting future dryness and water resources [3]. Changes in global climate and associated effects on vegetation condition have received an increasing amount of attention [4]. Among such research, the Normalized Difference Vegetation Index (NDVI) is frequently used to monitor changes in vegetation conditions, because of its close relationship with photosynthetically active radiation, which is absorbed by photosynthesizing tissues [5,6]. With the improvement of remote sensors, the NDVI has been widely applied in continental and regional

research [7]. Continuous NDVI datasets make it possible to trace vegetation conditions changes and explore the underlying climate factor-associated mechanisms [8,9]. The NDVI has been widely exploited to monitor and quantify drought disturbance in semiarid and arid regions with low values corresponding to stressed vegetation [10,11]. As a known covariate with other environmental variables, the NDVI was also applied to soil-loss-prone area identification [12,13], wetland delineation [14], irrigation and soil salinity management [15]. Therefore, quantifying the relationship between NDVI and climate factors, and predicting the NDVI trends will help effectively guide regional water resource managements [16,17].

Yarlung Zangbo River, the longest river on the Tibetan Plateau, has high spatial heterogeneity in vegetation conditions and is the main freshwater resource of local residents and downstream countries. As one of the most important ecosystems in the Tibetan Plateau, the vegetation conditions of the Yarlung Zangbo River Basin (YZRB) have a significant impact on the water balance and biological population of the Tibetan Plateau and surrounding areas [17]. Because of the influence of the plateau's high altitude, YZRB vegetation is extremely fragile and sensitive to global climate change. In recent years, statistically significant warming and intensive drought were observed in the YZRB [18], where the cultivated land accounts for about 62.89% of the area of the Tibet Autonomous Region [19]. Soil erosion is another water resources problem of YLZB, where the vegetation conditions play an important role [20]. Moreover, the changes in vegetation cover also influence the water availability of the YLZB [21,22]. Therefore, investigating and modelling the vegetation responses to climate changes is of great significance to the water resource management of YLRB and the water governance of the transboundary rivers [23]. Han et al. explore the relationship between the NDVI and the meteorological variables of the YZRB [24]. Liu et al. analyzed the spatiotemporal patterns of vegetation during 1998–2014 using the NDVI [25]. Sun et al. investigate the spatial heterogeneity of changes in vegetation growth and their driving forces using the NDVI of the YLZB [26]. Based on these researches, an NDVI prediction model that incorporates a comprehensive understanding of the climate–vegetation–hydrology relationships could be important for integrated water resource management.

A large amount of studies have been devoted to exploring the response of the NDVI to precipitation and temperature on regional and global scales, which are the most common climate factors [27,28]. Most of the studies adopted linear methods, such as partial correlation coefficient [29], complex correlation coefficient [30] and linear regression [31]. Due to the complexity of ecosystem and the uncertainties of vegetation dynamics, nonlinear modes, especially machine learning models, attached the attention of researchers [32–35]. Moreover, because the climate and topography show high heterogeneity from upstream to downstream regions [36,37], it puts forward higher requirements on the universal abilities of prediction models in the YZRB. Furthermore, because of the diversity of ecosystems and climate characters, the correlation between NDVI and climate are diverse in different regions [38]. Therefore, predictor selection is also a challenge for NDVI prediction models. Recently, random forest (RF) has received substantial attention in water resource research [39,40]. RF is advantageous because it can handle large datasets and undergoes predictor selection using a built-in variable importance evaluation method [41,42]. Therefore, RF should be highly suitable for the NDVI prediction of the YZRB. This is the first time RF has been applied to explore the complex relationship between the NDVI and climatic factors to the best of our knowledge.

The objective of this study was to propose feasible NDVI prediction models for the YZRB on the subzone scale. RF was adopted to simulate the relationships between NDVI and climatic factors. A comparison was then conducted between the RF and Artificial Neural Network (ANN) and Support Vector Machines (SVM) models. For comparative study, principal component analysis (PCA) and partial correlation analysis (PAR) were used for predictor selection of the models. This research will improve our knowledge on the climate–vegetation–hydrology relationships of the YZRB, which is an important high-altitude continental plateau basin.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The Yarlung Zangbo River is the largest river on the Tibetan Plateau and one of the most important international rivers [43]. It originates from the Jemayangzong Glacier in southern Tibet and has a total length of 2229 km and its drainage area is 2.42 <sup>×</sup> 105 km<sup>2</sup> [44]. This river is one of the highest rivers in the world, with an average elevation of above 4600 m, and tilts from the west to the east, with an average slope of 2.6◦ [21]. The Yarlung Zangbo River has six major tributaries (the Dogxung Zangbo, Nyangqu, Lhasa, Nyang, Yigong Zangbo, and Purlung Zangbo Rivers). The locations of the YZRB and its tributaries are shown in Figure 1.

**Figure 1.** Location of the Yarlung Zangbo River Basin (YZRB) and its five subzones.

Because of the unique topographic characteristics and high altitude of the plateau, the vegetation and ecological environment of the YZRB are relatively fragile and complex [23] and show obvious changes from upstream to downstream [45]. According to the China Vegetation Atlas (Figure 2), the upstream region is located in an arid zone that is dominated by alpine grassland and meadows [46]. With decreasing elevation, the midstream transitions into a continental climate and is mainly covered by alpine grassland and meadows, and the cultivated vegetation slightly increases. The lower reaches of YZRB have a subtropical climate and are mainly covered by coniferous and broadleaf forests.

**Figure 2.** Vegetation conditions map of the YZRB.

Yarlung Zangbo river has more than 130 tributaries, which is larger than 100 km2, and its major tributaries include Nianchu River, Lhasa River, Nyang River, and Parlung Tsangpo. By considering the hydrological and vegetation similarity, which is shown in Figures 1 and 2, the YLZB is divided into 5 subzones in the research. The area and vegetation conditions of the five subzones are shown in Tables 1 and 2.


**Table 1.** Name and area of the subzone.


**Table 2.** Proportion of different vegetation types in each subzone.

#### *2.2. Data Description*

A quality-controlled NDVI remote sensing product (MOD13A3) is selected in this study, obtained from the observation of MODIS (Moderate Resolution Image Spectroradiometer) data provided by NASA, spanning 16 years (February 2000 to December 2015). MOD13A3 is the third level product, based on the secondary product, corrected the edge distortion (Bowtie effect) produced by the sensor imaging process. The spatial resolution of the product is 1.1 km × 1.1 km, while the time resolution is monthly. The data were processed into the Geostationary Earth Orbit Tag Image File Format (GEO TIFF) by MODIS Reprojection Tool (MRT) software and processed by ArcGIS projection splicing. The monthly mean air temperature and precipitation data covering 2000–2015 from 30 meteorological stations located in the YZRB were collected from the China Meteorological Data Network. The locations of the meteorological stations are shown in Figure 1.

#### *2.3. Methodology*

#### 2.3.1. Random Forest

RF was first proposed by Breiman (2001) [47] as an ensemble learning method that can be used in both classification and regression tasks. The model is considered capable of dealing with small sample sizes and high-dimensional correlation relationships [47]. RF is also advantageous because of its robustness; it does not easily lead to overfitting or provide biased estimates when predictors that do not add information are used [48].

RF includes a group of classification and regression decision trees (CARTs) that are unpruned and generated by bootstrap sampling and random variable selection. The RF algorithm can be divided into the following steps. First, the training dataset is randomly extracted from the original dataset by bootstrap resampling. Second, the CARTs are established for each training set. Compared with the

traditional CART method, RF selects random feature combinations to split each node, and each CART grows to the maximum extent without any pruning. Finally, the RF output is obtained by voting in classification mode or averaging in the regression mode of all of the CART predictions.

RF provides a built-in cross-validation process that occurs in parallel with the training procedure for the out-of-bag (OOB) samples, which are not chosen by the bootstrap process. RF can evaluate variable importance by randomly permuting these variables and observing the difference in model performance using OOB samples. At the end of procedure, RF obtains variable importance by averaging these differences, which is then normalized by the standard deviation.

#### 2.3.2. Model Implementation and Validation

The RF is utilized to simulate the nonlinear relationship between the NDVI and climate factors in the 5 subzones. The monthly area-averaged NDVI datasets for each subzone were obtained by calculating the average values of each pixel. The monthly area-averaged precipitation and temperature of 5 subzones were obtained from actual data. For model development, the monthly average NDVI datasets and monthly area precipitation and temperature datasets of each subzone are divided into two datasets. The datasets from 2000 to 2009 were used for model calibration, and those from 2010 to 2015 were used for model validation.

Two machine learning models, ANN and SVM, which were previously used for NDVI prediction [49,50], were selected to compare with RF performance. A three-layer back propagation (BP) ANN model was used in this research. The BP method has been the most widely used algorithm to design multiple layer neural networks, and has also been successfully used for NDVI prediction [16]. SVM was the first classification machine learning algorithm and was proposed by Vapnik, and then gradually derived to the regression algorithm [51]. SVM has been widely used for hydrological prediction, and most recently for NDVI prediction. In this research, SVM with linear kernel function was used, which has been widely used in former studies.

One of the most important steps in the development of machine learning prediction models is the choice of appropriate predictors. Due to the spatial heterogeneity of the vegetation in the YLZB, the relationships between the NDVI and climate factors are different in the 5 subzones. Therefore, it is important to choose appreciate climate factors as predictors in NDVI prediction. In previous studies, PCA and PAR were used for predictor selections of ANN and SVM models. For comparative study, here, PAR and PCA were both used for predictor selection of the ANN and SVM models [52]. For PCA, the climate factors are standardized by subtracting the mean from the original values and then dividing the results by the standard deviation of the original variables. The PCA method is then applied to the standardized climate factors to extract principal components (PCs) that are orthogonal. The obtained PCs preserve more than 90% of the variances that are selected as predictors. Then, the PCs are used in the ANN and SVM modeling, and these results are marked as, ANN-PCA and SVM-PCA. For PAR, climate factors with a partial correlation coefficient greater than 0.3 were selected as predictors, and the corresponding results are marked as ANN-PAR and SVM-PAR.

#### 2.3.3. Model Evaluation Index

The mean absolute percentage error (MAE), Nash–Sutcliffe coefficient (NASH), root mean square errors (RMSE), and correlation coefficient (R) statistical indicators were used to assess the predictive performance of the ANN, SVM, and RF models. MAE, NASH, RMSE, and R were defined as:

$$NASH = 1 - \frac{\sum\_{i=1}^{n} \left(\boldsymbol{Y}\_{i,\text{obs}} - \boldsymbol{Y}\_{i,\text{sim}}\right)^2}{\sum\_{i=1}^{n} \left(\boldsymbol{Y}\_{i,\text{obs}} - \overline{\boldsymbol{Y}\_{\text{obs}}}\right)^2},$$

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{n} \left(Y\_{i,\text{obs}} - \overline{Y}\_{i,\text{sim}}\right)^{2}}{n}},$$

$$MAE = \frac{\sum\_{i=1}^{n} \left| Y\_{i,\text{obs}} - \overline{Y}\_{i,\text{sim}} \right|}{n},$$

$$r = \frac{\sum\_{i=1}^{n} \left[ \left(Y\_{i,\text{obs}} - \overline{Y}\_{\text{obs}}\right) \left(Y\_{i,\text{sim}} - \overline{Y}\_{\text{sim}}\right) \right]}{\sqrt{\left[\sum\_{i=1}^{n} \left(Y\_{i,\text{obs}} - \overline{Y}\_{\text{obs}}\right)^{2}\right]} \sqrt{\left[\sum\_{i=1}^{n} \left(Y\_{i,\text{sim}} - \overline{Y}\_{\text{sim}}\right)^{2}\right]}}$$

where *Yi*,*obs* is the measured NDVI value of the station, *Yobs* is the mean of the observed NDVI value, *Yi*,*sim* is the vector of the simulated NDVI value, and *Ysim* is the mean of the simulated NDVI value. In general, a higher NASH value indicates better model efficiency; in contrast, smaller RMSE, MAE, and R values indicate higher accuracy.

]

#### **3. Results and Discussion**

#### *3.1. Spatial and Temporal Characteristics of the NDVI in the YZRB*

The inter-annual variations of the NDVI, precipitation, and temperature on the subzone scale from 2000 to 2015 are shown in Table 3. The NDVI and temperature values showed a statistically insignificant increase, whereas the average precipitation of the Yarlung Zangbo River Basin significantly decreased from 528 mm in 2000 to 396 mm in 2015, with a total increase of 0.8 ◦C over the 16 years. This finding is consistent with the results of previous studies [26].


**Table 3.** Rainfall, temperature and the Normalized Difference Vegetation Index (NDVI) perennial change rate.

In the five subzones, NDVI gradually increased from upstream to downstream. The average annual growth of NDVI in the five subzones was 0.1 <sup>×</sup> 10−3, 0.1 <sup>×</sup> 10−3, 0.4 <sup>×</sup> 10−3, 0.7 <sup>×</sup> 10−3, and 0.2 <sup>×</sup> <sup>10</sup><sup>−</sup>3. The precipitation and temperature show similar trends. The average annual growth of precipitation was −3.9, −3.7, −9.86, −13.86, and −12.8; the average annual growth of temperature was 0.02, 0.04, 0.07, 0.04, and 0.01.

#### *3.2. Predictors Selection*

In order to determine the optimal predictors for NDVI prediction models, PCA and PAR were used to analyze the relationships between NDVI and precipitation/temperature at different lead times. The results are shown in Tables 4 and 5, where *Pn* represents the average precipitation with a lead time n month, and *Tn* represents the average precipitation with a lead time n month. With reference to similar studies and the meteorological cycles [32–35], the maximum lead times were set to 6 months.

As shown in Tables 4 and 5, the correlations between the NDVI and precipitation/temperature gradually decayed with the increase of lead time. The PCA results show that the precipitation and temperature whose lead time was shorter than 2 months had major impacts on the NDVI in these subzones. However, the PAR results varied in these subzones. In Sub1 and Sub5, the precipitation in the present month and temperature whose lead time was shorter than 2 months had major impacts on the NDVI. In Sub2, the precipitation whose lead time was shorter than 1 month and temperature whose lead time was shorter than 2 months had major impacts on the NDVI. In Sub3, the precipitation whose lead time shorter than 1 months and temperature whose lead time shorter than 3 months had major impacts on NDVI. In Sub4, the precipitation whose lead time was shorter than 2 months and temperature whose lead time was shorter than 3 months had major impacts on the NDVI. In general, the relationships between the NDVI and temperature were slightly closer than those between NDVI and precipitation in the five subzones.

RF evaluates the relative contribution of each predictor using a built-in variable importance evaluation process. The importance of the precipitation/temperature at different lead times in these subzones are calculated and indicated in Figure 3. As illustrated in Figure 3, although the importance of precipitation and temperature gradually decreased, the increase in lead time and the decreases were not as significant as in the PCA and PAR results. This finding may indicate that RF can use all predictors without overfitting. Thus, the precipitation and temperature whose lead time was shorter than 6 months were used for RF modeling of the five subzones.

**Figure 3.** Important factors (rainfall and temperature) in different months.


**Table 4.** Contribution and cumulative contribution rates for selected principal components.

T0: Temperature of the month, P0: rainfall of the month, T1: temperature of the previous month, P1: rainfall of the previous month, T2: temperature of the first 2 months, P2: rainfall of the last 2 months, T3: temperature of the first 3 months, P3: rainfall in the first 3 months.


**Table 5.** Partial correlation calculation results.

T0: Temperature of the month, P0: rainfall of the month, T1: temperature of the previous month, P1: rainfall of the previous month, T2: temperature of the first 2 months, P2: rainfall of the last 2 months, T3: temperature of the first 3 months, P3: rainfall in the first 3 months.

#### *3.3. Comparative Study*

The calibration and validation results of the RF and comparative models are summarized in Table 6.


**Table 6.** Machine learning calculation results.

The results show that RF was superior to the comparative models in the calibration and validation periods. The NASH RF values for the five subzones were 0.96, 0.97, 0.96, 0.94, and 0.92 in the calibration period, and 0.91, 0.95, 0.96, 0.89, and 0.83 in the validation period. All of the measured criteria were superior to those of the compared models (ANN and SVM).

The results of the two-parameter selection were also compared between the ANN and SVM models. PCA was superior to PAR for both the ANN and SVM models. For the ANN models, the average RMSE and MAE were similar in both the calibration and validation periods. However, the average NASH and R of the results using PAR were superior to those of the PCA by 0.03 and 0.04 in the calibration period, and 0.03 and 0.05 in the validation period, respectively. For the SVM models, the average NASH and R increased by 0.03 and 0.02 in the calibration period, and 0.03 and 0.02 in the validation period, respectively. The average RMSE and MAE decreased by 0.002 and 0.004 in the calibration period, and 0.004 and 0.006 in validation period, respectively. Therefore, PCA was advantageous over PAR, with increases of NASH and R, and decreases of RMSE and MAE.

### **4. Conclusions**

As a key component of ecohydrological processes, vegetation conditions influence the efficiency of plant water use and potentially affect water resources. Therefore, investing the changes of vegetation conditions and exploring the vegetation responses to climate changes will provide essential information for regional water resource management [53,54]. Combining with climate models, NDVI prediction models can assess the effects of future drought events [10]. As a covariate with other environmental variables, NDVI prediction models will also provide essential information for irrigation management [15] and soil-loss-prone area identification [12,13], etc. By exploring the vegetation condition changes of the YZRB and their relationship with climatic factors, we proposed an NDVI prediction model based on RF with area-averaged precipitation and temperature as predictors. The monthly rainfall and temperature observations from 30 meteorological stations in the YZRB and the MODIS NDVI datasets from 2000 to 2015 were selected to calibrate and validate the proposed model. The RF results were also compared with those of ANN and SVM models. The primary conclusions are as follows:


Because of sparse meteorological networks, this research was conducted on a subzone scale. In the future, we will try to explore the relationships between NDVI and climatic factors at a higher resolution with gridded meteorological observations, which will be more applicable for integrated water resource management. The adoption of more vegetation indices, such as leaf area index (LAI), is another important direction.

**Author Contributions:** K.C. used RF, SVM, ANN model for simulation, calibration, and validation; B.P., L.C. and D.P. collected and processed data; K.C., B.P. and wrote the paper; Z.Z., G.Z. and S.S. supervised the research. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by three research programs: (1) National Natural Science Foundation of China (91647202). (2) National Natural Science Foundation of China (51879008); (3) China Scholarship Council (No. 201906045024).

**Conflicts of Interest:** The authors declare no conflict of interest.
