#### *3.3. Decision Tree Regression (DTR)*

A decision tree corresponds to a partition of the feature space, with an output value assigned to each partition unit, and is constructed by recursive segmentation in which the feature with the highest information gain is split first. The training process consists of feature selection, tree generation, and pruning. For each feature, all candidate split values are traversed and the space is divided at the value that minimizes the loss function, yielding a partition point.
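This exhaustive split search can be illustrated with a minimal sketch; the function name and the squared-error loss are illustrative assumptions, not notation from this paper:

```python
import numpy as np

def best_split(x, y):
    """Traverse all values of one feature and return the partition
    point that minimizes the squared-error loss (illustrative sketch)."""
    best_s, best_loss = None, np.inf
    for s in np.unique(x)[:-1]:          # candidate partition points
        left, right = y[x <= s], y[x > s]
        # loss of a partition: squared deviation from each unit's mean output
        loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s, best_loss

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 0.9, 3.0, 3.2])
print(best_split(x, y))                  # splits between 2.0 and 3.0
```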

The optimal segmentation becomes a node of the decision tree. When generating leaf nodes, the key consideration is whether the growth of the tree should stop. The process continues recursively until a prespecified stopping criterion is reached, such as a maximum depth, which allows only a certain number of splits from the root node to the terminal nodes. In this way, the dataset is broken down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes; the topmost decision node, corresponding to the best predictor, is called the root node.
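For instance, in scikit-learn (assuming that library and synthetic data purely for illustration), the maximum-depth stopping criterion is exposed as the `max_depth` parameter of `DecisionTreeRegressor`:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Growth stops once no path from the root exceeds three splits.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())  # depth <= 3, at most 2**3 leaves
```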

A primary advantage of DTR is that it is easy to follow and understand, and it does not require any transformation of the features, even for nonlinear data. To reduce storage requirements, the size of a decision tree is controlled by parameters such as the maximum depth and the minimum number of samples per leaf node. At each segmentation the features are randomly permuted, so even if the same training dataset is used, the optimal segmentation may differ between runs. The output value of the tree is the average of the training samples in the corresponding leaf node. DTRs nevertheless tend to overfit very easily.
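A hedged scikit-learn sketch of these points (the parameter names are that library's, and the data are synthetic): `max_depth` and `min_samples_leaf` bound the tree size, while the random feature permutation means two fits on identical data need not coincide unless `random_state` is fixed.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

# Bound the tree size (and hence storage) via depth and leaf size.
params = dict(max_depth=5, min_samples_leaf=4)

# Features are randomly permuted at each split; when several splits
# tie on the criterion, two fits on the same data may therefore differ.
t1 = DecisionTreeRegressor(random_state=0, **params).fit(X, y)
t2 = DecisionTreeRegressor(random_state=1, **params).fit(X, y)
print(np.array_equal(t1.predict(X), t2.predict(X)))  # not guaranteed True
```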

#### *3.4. Random Forest Regression*

RFR is one of the most popular algorithms for regression problems because of its simplicity and high accuracy. It is an ensemble technique that combines multiple decision trees, aggregating their outputs through a voting-style mechanism (averaging, in the regression case). The randomness injected during training gives it better generalization performance than DTR and helps to decrease the model's variance. It is usually trained with the bagging method, which combines the predictions of multiple learners to produce more accurate predictions than any individual model. RFRs are less sensitive to outliers in the dataset and do not require much parameter tuning; the only parameter that typically needs experimentation is the number of trees in the ensemble. The prediction is calculated as the average prediction over all decision trees. The key lies in the fact that there is a low correlation between the individual models.
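The averaging can be verified directly; the following sketch assumes scikit-learn's `RandomForestRegressor` and synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# The ensemble prediction is the average of the individual trees' outputs.
per_tree = np.stack([t.predict(X) for t in forest.estimators_])
print(np.allclose(per_tree.mean(axis=0), forest.predict(X)))  # True
```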

RFR is a regressor that adopts a voting-style mechanism, built on decision trees, to obtain its prediction results. RFRs establish multiple decision trees by dividing the training samples: following the bootstrap sampling method, part of the dataset is randomly drawn as the training sample of each decision tree, and the remaining data are used as that tree's validation sample. When regressing unknown samples, each decision tree first outputs its prediction, and all predictions are then synthesized, by simple averaging in the regression setting, to obtain the final prediction.
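This scheme corresponds to what is commonly called out-of-bag (OOB) validation; as an illustrative sketch (assuming scikit-learn, where it is exposed via `oob_score`):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# bootstrap=True: each tree is fit on a random draw with replacement;
# the samples left out of a tree's draw act as its validation sample.
forest = RandomForestRegressor(
    n_estimators=200, bootstrap=True, oob_score=True, random_state=0
).fit(X, y)
print(forest.oob_score_)  # R^2 estimated on the out-of-bag samples
```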

The most apparent benefit of RFR is its built-in ability to correct the overfitting of individual decision trees to their training datasets. By combining the bagging method with random feature selection, the overfitting problem, which often leads to inaccurate outcomes, is greatly reduced.
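As a rough, synthetic illustration of this effect (not a benchmark; scikit-learn assumed), an unpruned single tree typically fits its training set perfectly yet generalizes worse than a forest:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# An unpruned tree memorizes the training data; bagging plus random
# feature selection usually closes much of the train/test gap.
print("tree  ", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```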
