2.2.1. Binary Tree

A binary tree (BT) is easy to interpret, fast to fit and to evaluate, and light on memory. It consists of nodes and directed edges. There are two types of nodes: internal and leaf. In this paper, each internal node tests one variable of the CYGNSS data, and each leaf node holds a wind speed value. Each step in a prediction checks the value of a single predictor variable. Figure 1 shows a simple example BT built from 100 CYGNSS-ERA5 matchups. The BT models in the experiments described in Section 3 are much more complex than this example, and their retrieval accuracy is much higher, because far more data are used to build them.

**Figure 1.** An example of a BT model structure.
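The node structure and the one-variable-per-step prediction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the predictor names `x1` and `x2`, the thresholds, and the leaf values are hypothetical placeholders, not the variables or values of the tree in Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    wind_speed: float  # response stored at a leaf node


@dataclass
class Internal:
    variable: str      # name of the single predictor tested at this node
    threshold: float   # samples with value < threshold go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, Internal]


def predict(node: Node, sample: Dict[str, float]) -> float:
    """Walk from the root to a leaf, checking one predictor per step."""
    while isinstance(node, Internal):
        if sample[node.variable] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.wind_speed


# Hypothetical three-leaf tree (placeholder names and values).
tree = Internal(
    "x1", 10.0,
    Leaf(12.0),
    Internal("x2", 0.5, Leaf(7.0), Leaf(3.0)),
)

print(predict(tree, {"x1": 15.0, "x2": 0.2}))
```

Each prediction visits at most one node per tree level, which is why evaluation is fast even for deep trees.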

When a BT is used for regression, a sample is tested starting at the root node and is passed to one of the child nodes according to the test result. The sample is tested and passed on recursively in this way until it reaches a leaf node, and each leaf node corresponds to a wind speed value. The node-splitting criteria are defined to balance predictive power and parsimony [36]. It is necessary to specify the minimum number of training samples used to calculate the response of each leaf node (the minimum leaf size). When growing a regression tree, its simplicity and its predictive power need to be considered together. A very leafy tree tends to overfit, and its validation accuracy is often far lower than its training (or resubstitution) accuracy. In contrast, a coarse tree with fewer, larger leaves does not attain high training accuracy. However, a coarse tree can be more robust, in that its training accuracy can be close to its accuracy on a representative test set. In this paper, the minimum leaf size is set to 4.
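The role of the minimum leaf size can be illustrated with a minimal regression-tree sketch. Several things here are assumptions made for illustration, not details taken from this paper: the split criterion (minimizing the summed squared error of the two children, in the style of CART), the use of a single predictor, the synthetic data, and all function names.

```python
import math
import random


def sse(ys):
    """Sum of squared deviations from the mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def grow(xs, ys, min_leaf):
    """Grow a one-predictor regression tree.

    Returns ("leaf", mean_response, n_samples) or
    ("split", threshold, left_subtree, right_subtree).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    best_cost, best_k = sse(sy), None
    # Only consider cuts that leave at least min_leaf samples on each side.
    for k in range(min_leaf, len(sx) - min_leaf + 1):
        if sx[k - 1] == sx[k]:
            continue  # identical predictor values cannot be separated
        cost = sse(sy[:k]) + sse(sy[k:])
        if cost < best_cost:
            best_cost, best_k = cost, k
    if best_k is None:
        return ("leaf", sum(sy) / len(sy), len(sy))
    return ("split", sx[best_k],
            grow(sx[:best_k], sy[:best_k], min_leaf),
            grow(sx[best_k:], sy[best_k:], min_leaf))


def predict(tree, x):
    while tree[0] == "split":
        tree = tree[2] if x < tree[1] else tree[3]
    return tree[1]


def leaf_sizes(tree):
    """Number of training samples in each leaf."""
    if tree[0] == "leaf":
        return [tree[2]]
    return leaf_sizes(tree[2]) + leaf_sizes(tree[3])


def rmse(tree, xs, ys):
    err = sum((predict(tree, x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(err / len(ys))


def make_data(n):
    # Synthetic stand-in data: a linear trend plus Gaussian noise.
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 + 0.5 * x + random.gauss(0.0, 1.0) for x in xs]
    return xs, ys


random.seed(0)
x_train, y_train = make_data(200)
x_val, y_val = make_data(200)

trees = {m: grow(x_train, y_train, m) for m in (1, 4, 20)}
for m, t in trees.items():
    print(m, len(leaf_sizes(t)),
          round(rmse(t, x_train, y_train), 3),
          round(rmse(t, x_val, y_val), 3))
```

With a minimum leaf size of 1, the tree keeps splitting until it reproduces the training responses, which is the leafy, overfitting extreme described above. Larger minimum leaf sizes give fewer leaves and a higher training error, and on noisy data of this kind the gap between training and validation error would typically be expected to narrow.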
