### *2.2.4. LightGBM*

LightGBM (LGBM) is an efficient gradient boosting decision tree framework designed to remain efficient when the data samples are high-dimensional and the dataset is large [44]. Compared with XGBoost, LightGBM trains faster and consumes less memory. LightGBM uses an Exclusive Feature Bundling (EFB) strategy to bundle mutually exclusive variables into single features, reducing the number of variables and thereby the dimensionality. Finding the optimal bundling has been proven to be NP-hard, so exhaustive enumeration is infeasible; in practice, EFB uses a greedy algorithm to approximate the optimal solution, reducing the number of variables without affecting the accuracy of the split nodes. Table 3 shows the parameter settings of LightGBM.

**Table 3.** Parameter settings of LightGBM.
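As an illustration, a minimal sketch of training LightGBM on the CYGNSS variables is given below, using the Python `lightgbm` package. The parameter values are placeholders for illustration only, not the tuned settings of Table 3; EFB is applied internally by the library.

```python
# Minimal LightGBM regression sketch for CYGNSS wind speed retrieval.
# Parameter values are illustrative placeholders, not the settings of
# Table 3; Exclusive Feature Bundling runs internally in LightGBM.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))        # 10 CYGNSS variables (dummy data)
y = rng.uniform(0.0, 25.0, size=1000)  # wind speed in m/s (dummy data)

model = lgb.LGBMRegressor(
    num_leaves=31,       # placeholder value
    learning_rate=0.05,  # placeholder value
    n_estimators=200,    # placeholder value
)
model.fit(X[:800], y[:800])
y_pred = model.predict(X[800:])
```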


## *2.3. Artificial Neural Network*

Artificial neural networks (ANNs) are relatively new computational tools that have been used extensively to solve many complex real-world problems [45]. To remove the effects of differing units and orders of magnitude, the CYGNSS variables are normalized before being fed to the ANN:

$$X\_i' = \frac{X\_i - X\_{\min}}{X\_{\max} - X\_{\min}} \tag{8}$$

where $X\_{\min}$ and $X\_{\max}$ are the minimum and maximum values of a CYGNSS variable, and $X\_i'$ is the normalized value of $X\_i$. The fully connected network (FCN) is often used in regression problems, and for GNSS-R wind speed retrieval many researchers have demonstrated a significant improvement over traditional methods [23,25]. Figure 2 shows the ANN structure adopted in this paper, consisting of the input layer, the hidden layers and the retrieved wind speed as output. The input layer takes the 10 CYGNSS variables used in this paper. Three hidden layers are adopted, with N, 2N and N neurons, respectively; Figure 2 shows the structure for N = 5.

**Figure 2.** ANN structure adopted in this paper when N = 5. Small circles represent neurons in the model.
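For concreteness, the following sketch builds the N-2N-N fully connected network of Figure 2 (with N = 5) and applies the min-max normalization of Equation (8). The choice of PyTorch and of ReLU activations here is an illustrative assumption, not a detail fixed by the paper.

```python
# Sketch of the fully connected network of Figure 2 (N = 5) in PyTorch.
# The 10 inputs are min-max normalized as in Eq. (8); the framework and
# the ReLU activations are illustrative assumptions.
import torch
import torch.nn as nn

def minmax_normalize(X):
    """Eq. (8): scale each CYGNSS variable to [0, 1]."""
    x_min = X.min(dim=0, keepdim=True).values
    x_max = X.max(dim=0, keepdim=True).values
    return (X - x_min) / (x_max - x_min)

N = 5
fcn = nn.Sequential(
    nn.Linear(10, N),   nn.ReLU(),  # input (10 variables) -> hidden 1 (N)
    nn.Linear(N, 2 * N), nn.ReLU(),  # hidden 2 (2N neurons)
    nn.Linear(2 * N, N), nn.ReLU(),  # hidden 3 (N neurons)
    nn.Linear(N, 1),                 # retrieved wind speed
)

X = minmax_normalize(torch.randn(32, 10))  # a dummy batch
wind_speed = fcn(X)
```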

The number of neurons in an ANN affects the retrieval results, so N is an important parameter when setting up the network. Herein, we also analyze the impact of different activation functions, which govern the nonlinear mapping between connected neurons, on the performance of the FCN. Purely linear models generally achieve low accuracy, so activation functions improve the performance of ANN models by introducing nonlinearity. Determining the optimal activation function in an artificial neural network is an important task, because it is directly linked to network performance. Unfortunately, it is hard to determine this function analytically; the optimal function is generally found by trial and error or by tuning [46]. Three activation functions are analyzed in this paper, i.e., ReLU, Tanh and Sigmoid:

$$f\_{\text{ReLU}}(v) = \max(0, v) = \begin{cases} 0 & (v < 0) \\ v & (v \ge 0) \end{cases} \tag{9}$$

$$f\_{\text{Tanh}}(v) = \frac{e^{v} - e^{-v}}{e^{v} + e^{-v}} \tag{10}$$

$$f\_{\text{Sigmoid}}(v) = \sigma(v) = \frac{1}{1 + e^{-v}} \tag{11}$$

where *v* is the input to the neuron from the previous layer. The advantages of ReLU include the fast convergence of the trained network, low computational complexity, and the absence of saturation and vanishing-gradient problems when *v* > 0; a single ReLU and compositions of multiple ReLU units are nonlinear. The Tanh function provides stronger nonlinearity but suffers from saturating and vanishing gradients. The advantage of Tanh and Sigmoid is their stability.
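Written out directly, Equations (9)-(11) correspond to the following functions (a minimal NumPy sketch for illustration):

```python
# Eqs. (9)-(11): the three activation functions compared in this paper.
import numpy as np

def relu(v):
    return np.maximum(0.0, v)  # Eq. (9): 0 for v < 0, v otherwise

def tanh(v):
    return (np.exp(v) - np.exp(-v)) / (np.exp(v) + np.exp(-v))  # Eq. (10)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))  # Eq. (11)

v = np.linspace(-3.0, 3.0, 7)
print(relu(v), tanh(v), sigmoid(v), sep="\n")
```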

## *2.4. Stepwise Linear Regression*

Stepwise linear regression (SLR) establishes an optimal multivariable linear regression equation. First, the linear regression model $f\_{SLR}^0$ is constructed with all variables $v\_1, v\_2, \ldots, v\_p$:

$$f\_{SLR}^0 = \beta\_0 + \sum\_{i=1}^p \beta\_i v\_i \tag{12}$$

where $\beta\_0, \beta\_1, \beta\_2, \ldots, \beta\_p$ are constant parameters. The model is then used to estimate the unknown parameter, here wind speed, *n* times, where *n* is the number of observation datasets. The root mean square error (RMSE) of the *n* estimations of model $f\_{SLR}^0$ is calculated and denoted $RMSE\_0$. Next, the first variable, $v\_1$, is removed and the estimation is performed *n* times again, yielding $RMSE\_1$. If $RMSE\_1$ is smaller than $RMSE\_0$, $v\_1$ may be removed; otherwise, it is retained. This process is repeated until all variables have been tested, and the variable subset with the smallest RMSE is selected. This method is therefore efficient for seeking the optimal variables [47]. SLR has good predictive ability and lower computational complexity than other methods [48].
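A minimal sketch of this backward-elimination loop is given below, assuming scikit-learn's `LinearRegression` and a held-out validation set for computing the RMSE. The function names and the validation split are illustrative assumptions, not details from the paper.

```python
# Sketch of the stepwise procedure of Section 2.4: a variable is dropped
# only if the RMSE of the reduced model improves. The validation split
# and all names here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

def val_rmse(cols, X_tr, y_tr, X_val, y_val):
    """Fit on the selected columns and return validation RMSE."""
    m = LinearRegression().fit(X_tr[:, cols], y_tr)
    err = m.predict(X_val[:, cols]) - y_val
    return float(np.sqrt(np.mean(err ** 2)))

def stepwise_select(X_tr, y_tr, X_val, y_val):
    keep = list(range(X_tr.shape[1]))                # start from the full model
    base = val_rmse(keep, X_tr, y_tr, X_val, y_val)  # RMSE_0
    for i in list(keep):
        trial_cols = [j for j in keep if j != i]     # tentatively remove v_i
        trial = val_rmse(trial_cols, X_tr, y_tr, X_val, y_val)
        if trial < base:                             # RMSE improved: drop v_i
            keep, base = trial_cols, trial
    return keep, base                                # selected variables, their RMSE
```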
