*3.3. The Results of Regression Trees*

This section analyzes the performance of the four regression tree modeling methods (i.e., BT, ET, XGB and LGBM) described in Section 2.3. Figure 7 shows scatter plots of the true and estimated wind speeds; the color (from cool to warm) indicates the density of the points. Table 4 lists the retrieval performance of each regression tree model, with the best results in bold. Many of the high wind speed samples are concentrated in the range of 15–20 m/s, which elevates the inversion accuracy in this range. To keep the data distribution from influencing the analysis, the performance of the high wind speed models was analyzed in three data intervals: (1) overall (15–30 m/s), (2) 15–20 m/s and (3) 20–30 m/s.
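The interval-wise evaluation described above can be sketched in Python as follows. This is a minimal illustration, not the code used in this study: the function name `interval_metrics`, the half-open binning on the true wind speed, and the sign convention MD = mean(estimated − true) are assumptions, and the wind speeds are synthetic.

```python
import math
import random

def interval_metrics(y_true, y_pred, lo, hi):
    """RMSE, correlation coefficient R and mean difference MD for samples
    whose true wind speed lies in [lo, hi) m/s (assumed binning)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if lo <= t < hi]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples in the interval")
    # Assumed convention: estimated minus true, so a negative MD
    # corresponds to underestimation.
    errs = [p - t for t, p in pairs]
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    md = sum(errs) / n
    # Pearson correlation coefficient, computed directly.
    mean_t = sum(t for t, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in pairs)
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t, _ in pairs))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for _, p in pairs))
    return {"n": n, "rmse": rmse, "r": cov / (std_t * std_p), "md": md}

# Synthetic stand-in data, skewed toward 15-20 m/s (not the paper's dataset).
random.seed(0)
true_ws = [15.0 + 15.0 * random.betavariate(1.0, 3.0) for _ in range(2000)]
est_ws = [t + random.gauss(-0.3, 1.5) for t in true_ws]

intervals = {"overall": (15, 30), "15-20": (15, 20), "20-30": (20, 30)}
for name, (lo, hi) in intervals.items():
    print(name, interval_metrics(true_ws, est_ws, lo, hi))
```

Reporting the sample count `n` alongside each metric makes it easier to see how strongly the densely populated 15–20 m/s interval dominates the overall scores.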

**Figure 7.** Results of wind speed retrievals based on regression tree methods. The subgraphs in the first row show the retrieval results at low wind speeds, while those in the second row show the retrieval results at high wind speeds. The black line is the 1:1 line.



As shown in Figure 7, all four regression tree-based modeling methods are able to retrieve wind speeds in the different intervals. As the simplest regression tree modeling method, BT produced the worst retrieval results, i.e., the greatest dispersion. The other three methods all outperformed BT. LGBM performed best in the low wind speed interval: relative to BT, it improved the RMSE and R by 27.97% and 17.27%, respectively. In the high wind speed interval, ET performed best: relative to BT, it improved the RMSE and R by 23.61% and 22.33%, respectively.

It should be noted that the RMSEs of the high wind speed models are generally smaller than those of the low wind speed models, but this does not mean that the former perform better in general. Rather, it mainly reflects the wind speed distribution of the dataset used in this paper. All regression tree modeling methods performed better in the low wind speed interval, which is consistent with the conclusions of many previous studies [26–29]. From the calculated MD, a slight underestimation of the true wind speed was observed in both figures. In addition, both models showed more obvious underestimation at high winds, a result similar to that of [28]. Most research results demonstrate that GNSS-R data are more suitable for retrieving low wind speeds, while significant performance degradation occurs when retrieving high wind speeds [27–29]. This might be due to the reduced sensitivity of the ocean scattering cross-section to high wind speeds and the increased random error in the DDM signal [14].
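The percentage improvements over BT reported in this section are relative changes. A minimal sketch of that arithmetic, assuming the usual convention (relative decrease for RMSE, relative increase for R), is given below; the function name `relative_improvement` and the input values are hypothetical and are not taken from Table 4.

```python
def relative_improvement(baseline, value, lower_is_better):
    """Percentage improvement of `value` over `baseline`.

    Assumed convention: for error metrics such as RMSE a decrease counts
    as an improvement; for R an increase counts as an improvement.
    """
    if lower_is_better:
        change = (baseline - value) / baseline
    else:
        change = (value - baseline) / baseline
    return 100.0 * change

# Hypothetical numbers for illustration only.
rmse_gain = relative_improvement(baseline=2.0, value=1.5, lower_is_better=True)
r_gain = relative_improvement(baseline=0.5, value=0.75, lower_is_better=False)
print(f"RMSE improved by {rmse_gain:.2f}%, R improved by {r_gain:.2f}%")
```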
