3.3.3. Random Forest

A random forest is a set of tree classifiers {*h*(*x*, *θ<sub>k</sub>*), *k* = 1, 2, ..., *n*}, where each meta-classifier *h*(*x*, *θ<sub>k</sub>*) is a classification and regression tree built with the CART algorithm. Here, *x* is the input vector of the classifier and *θ<sub>k</sub>* is an independent random vector that determines the growth of each decision tree.
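To make the role of the random vector concrete, the following minimal sketch draws one hypothetical *θ<sub>k</sub>* per tree. It assumes that *θ<sub>k</sub>* consists of bootstrap row indices plus a random feature subset, which is the usual random forest construction but is not spelled out in this section. The function name `draw_theta` and all parameter values are illustrative only, and for simplicity the feature subset is drawn once per tree, whereas in practice it is typically redrawn at each node split.

```python
import random


def draw_theta(k, n_samples, n_features, m, seed=0):
    """Draw a hypothetical theta_k: bootstrap row indices plus m candidate features."""
    # Separate, reproducible pseudo-random stream for each tree index k.
    rng = random.Random(seed * 1_000_003 + k)
    # Bootstrap: sample n_samples row indices with replacement.
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Random subspace: choose m distinct feature indices without replacement.
    features = sorted(rng.sample(range(n_features), m))
    return rows, features


# Assumed toy sizes: a forest of n = 3 trees, 8 samples, 5 features, m = 2.
thetas = [draw_theta(k, n_samples=8, n_features=5, m=2) for k in range(1, 4)]
```

Each tree would then be grown by CART on the rows and features selected by its own *θ<sub>k</sub>*, which is what makes the trees in the forest differ from one another.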

A schematic diagram of the random forest algorithm is shown in Figure 3.

**Figure 3.** Schematic diagram of random forest algorithm.

Combined with the proposed oversampling method, the specific electricity theft detection steps are as follows:


$$m = \sqrt[n]{K} \tag{10}$$

(5) Input the test set *Te* into each trained decision tree; the classification result is determined by the votes of the decision trees. The voting classification formula is as follows:

$$f(Te_i) = MV\{h_t(Te_i)\}_{t=1}^{nTree} \tag{11}$$

where *Te<sub>i</sub>* (*i* = 1, 2, ..., *k*) represents each element in the test set, *MV* represents the majority vote, and *h<sub>t</sub>*(*Te<sub>i</sub>*) represents the classification result of element *Te<sub>i</sub>* in decision tree *t*.
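The majority vote in Equation (11) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in this work: the three "trees" are hypothetical threshold rules standing in for trained CART trees, the test values are made up, and label 1 is assumed to mean theft while 0 means normal use.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common label among the per-tree predictions for one sample."""
    # most_common(1) yields [(label, count)]; ties go to the label seen first.
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(trees, test_set):
    """Apply the majority vote over all trees to every element of the test set."""
    return [majority_vote([h(te_i) for h in trees]) for te_i in test_set]


# Hypothetical stand-in trees: threshold rules on a single scalar feature.
trees = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]
test_set = [0.2, 0.6, 0.9]  # made-up test elements
labels = forest_predict(trees, test_set)
print(labels)  # [0, 1, 1]
```

For the second element, two of the three trees vote 1, so the forest labels it 1 even though one tree disagrees.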


Based on the above theory and steps, the proposed electricity theft detection process is shown in Figure 4.

**Figure 4.** The proposed process of electricity theft detection.
