**5. Conclusions**

Based on the EI architecture, this study proposes a decision optimization algorithm based on state judgment to achieve efficient, safe, and stable operation of the EI. The experimental results show that the deep CNN outperforms conventional machine learning algorithms in feature extraction and prediction accuracy. In addition, a BPA-based data batch processing toolkit was developed to enable semi-automatic batch processing, which improves the efficiency of data processing. Based on the stable state judgment, a deep reinforcement learning algorithm is proposed to optimize the reactive power compensation decisions of the EI. The experimental results show that this algorithm not only achieves the system stability target but also fulfils the expectations of distributed reactive compensation and minimization of total reactive compensation.

Currently, a large amount of simulation data can be generated off-line for training, but simulation and state feedback for continuous action changes cannot yet be realized, so the actions have to be discretized. In the future, a direct interface between the simulation software and the deep learning platform is needed, so that real-time simulation can be performed and deep reinforcement learning algorithms can be used for continuous-action learning and training.
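To make the discretization constraint concrete, the following is a minimal sketch of how a continuous reactive compensation range might be mapped onto a finite action set. It is an illustration only: the node count, the compensation range in Mvar, the number of levels, and the function names `build_action_table` and `nearest_level` are all hypothetical and are not taken from the implementation used in this study.

```python
from itertools import product


def build_action_table(n_nodes, q_max_mvar, n_levels):
    """Enumerate discrete actions as (node index, compensation in Mvar) pairs.

    All parameters are illustrative assumptions, not values from the paper.
    """
    step = q_max_mvar / (n_levels - 1)
    levels = [i * step for i in range(n_levels)]
    return [(node, q) for node, q in product(range(n_nodes), levels)]


def nearest_level(q_mvar, q_max_mvar, n_levels):
    """Snap a continuous compensation value to the index of the closest level."""
    step = q_max_mvar / (n_levels - 1)
    clipped = min(max(q_mvar, 0.0), q_max_mvar)  # keep the value inside the allowed range
    return round(clipped / step)


# Hypothetical setup: 3 compensation nodes, 0 to 20 Mvar, 5 evenly spaced levels.
actions = build_action_table(n_nodes=3, q_max_mvar=20.0, n_levels=5)
```

With such a table, an agent selects an index into `actions` and never outputs a real-valued compensation amount. A finer grid reduces the quantization error but enlarges the action space, which is one reason continuous-action learning through a direct simulation interface is attractive.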

**Author Contributions:** The work presented here was carried out through the cooperation of all authors. W.Z. and J.C. conceived the scope of the paper; W.Z. conceived the analysis and performed the simulations; H.H. and Z.X. wrote the paper; J.C. acquired the funding and performed revisions before submission. All authors read and approved the manuscript.

**Funding:** This work was supported in part by the National Natural Science Foundation of China (grant No. 61472200) and the Beijing Municipal Science & Technology Commission (grant No. Z161100000416004).

**Conflicts of Interest:** The authors declare no conflict of interest.
