**2. Problem Formulation**

The goal of this paper is to stabilize the voltage of high-voltage buses, provide distributed reactive power compensation, and minimize the total compensation of the SVGs. To this end, the training model combines deep learning and reinforcement learning to derive the installation strategy for the SVGs.

### *2.1. Voltage Stability in EI and Construction of the Simulation Model*

Local microgrids can either be integrated into the large power grid or operate in islanded mode. When the local-area network is disconnected from the large power grid, voltage stability issues can arise, which may affect the reliability of system operation.

Normally, an EI scenario operates in a stable state most of the time, and unstable states caused by short-circuit faults rarely occur. Consequently, the gap between the numbers of stable and unstable samples collected by phasor measurement units (PMUs) in the EI is extremely large. If real data were used for training and the classifier labeled every sample as stable, its accuracy would still be very high (no less than 99%), even though the training would have had no real effect. Hence, for the EI scenario considered in this paper, the simulation data are generated with the BPA software. The power grid simulation model is shown in Figure 1, where the sequence numbers 1–60 represent network nodes 1–60. We consider *n* voltage-grade substations, namely, n*i* kV, and *Ai* substation, 1 ≤ *i* ≤ *n*. Here, *Ai* is a custom symbol.
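The effect of this class imbalance on accuracy can be illustrated with a minimal sketch. The sample counts below are hypothetical, and the coding of stable samples as "1" and unstable samples as "0" is an assumption made only for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical imbalanced data set: 995 stable samples (1), 5 unstable samples (0).
y_true = [1] * 995 + [0] * 5

# A trivial "classifier" that labels every sample as stable.
y_pred_all_stable = [1] * len(y_true)

acc = accuracy(y_true, y_pred_all_stable)

# Number of unstable samples that were actually recognized as unstable.
unstable_detected = sum(
    1 for t, p in zip(y_true, y_pred_all_stable) if t == 0 and p == 0
)

print(f"accuracy = {acc:.3f}, unstable cases detected = {unstable_detected}")
```

In this sketch the accuracy is high although no unstable case is detected, which is why accuracy alone is not an informative measure on such data.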

**Figure 1.** Energy Internet (EI) simulation model.

### *2.2. Judgment of Transient Voltage Stability*

A data extraction program was written to extract the data from each BPA output file. The voltage *U*, frequency *f*, active power *P*, and reactive power *Q* of each node, measured every half cycle during the first *n* cycles, are taken as the input data for training the stability evaluation model. The voltage *U* in the last five cycles is used to determine whether the voltage is finally restored to its stable-state value. This judgment result is used as the output data for training the stability evaluation model.
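The construction of one training sample for a single node might be sketched as follows. The function names, the per-unit reference voltage `u_ref`, the tolerance `tol`, and the coding of a restored voltage as label 1 are all hypothetical; the restoration criterion used here is an assumption and is not taken from the BPA workflow.

```python
SAMPLES_PER_CYCLE = 2  # one measurement every half cycle, as stated in the text


def build_features(u, f, p, q, n_cycles):
    """Concatenate U, f, P and Q of one node over the first n_cycles cycles."""
    k = n_cycles * SAMPLES_PER_CYCLE
    return u[:k] + f[:k] + p[:k] + q[:k]


def stability_label(u, u_ref=1.0, tol=0.1, last_cycles=5):
    """Return 1 if U stays within tol of u_ref over the last cycles, else 0.

    u_ref (per unit) and tol are hypothetical; the criterion is an assumption.
    """
    tail = u[-last_cycles * SAMPLES_PER_CYCLE:]
    return int(all(abs(v - u_ref) <= tol * u_ref for v in tail))


# Hypothetical per-unit voltage traces of one node over 20 cycles (40 samples).
u_recovered = [1.0] * 4 + [0.5] * 6 + [0.98] * 30
u_collapsed = [1.0] * 4 + [0.5] * 6 + [0.60] * 30
flat = [1.0] * 40  # placeholder series for f, P and Q

features = build_features(u_recovered, flat, flat, flat, n_cycles=3)
label_recovered = stability_label(u_recovered)
label_collapsed = stability_label(u_collapsed)
print(len(features), label_recovered, label_collapsed)
```

For a full sample, the feature vectors of all nodes would be concatenated in the same way, while the label is derived only from the final voltage values.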

The stability prediction results are evaluated by three indexes: precision, recall, and f1-score (the harmonic mean of precision and recall [38]), which are defined as follows:

$$\text{precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}}$$

$$\text{recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}$$

$$\text{f1-score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
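As a minimal sketch, the three indexes can be computed directly from the counts of true positives, false positives, and false negatives. The counts in the example are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here rather than part of the definitions.

```python
def precision(tp, fp):
    """precision = TP / (TP + FP); 0.0 by convention if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """recall = TP / (TP + FN); 0.0 by convention if there are no positive samples."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn = 90, 10, 30
print(precision(tp, fp), recall(tp, fn), f1_score(tp, fp, fn))
```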

The meanings of *true positive*, *false positive*, *false negative*, and *true negative* are given in Table 1. *True* means that the classification is correct, and *false* means that the classification is incorrect. *Positive* means that the sample is classified as a positive sample ("1"), and *negative* means that it is classified as a negative sample ("0"). Using a general mathematical framework based on the percolation model, [39] analytically investigates attack robustness in the presence of false positive/negative rates.
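The four outcomes can be counted from a list of true labels and a list of predicted labels, as in the following sketch. The function name `confusion_counts` and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, FN, TN) for binary labels, with 1 = positive and 0 = negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1  # classified positive, and the classification is correct
        elif p == 1 and t == 0:
            fp += 1  # classified positive, but the classification is incorrect
        elif p == 0 and t == 1:
            fn += 1  # classified negative, but the classification is incorrect
        else:
            tn += 1  # classified negative, and the classification is correct
    return tp, fp, fn, tn


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```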

**Table 1.** Evaluation of classification results.

