*4.3. Extreme Cases*

In this section, the outcome of the DRL-based agent is illustrated for two extreme situations, corresponding respectively to the worst-case over-voltage and under-voltage states. These states result from the combination of extreme consumption and generation conditions, associated with unfavorable parameters of the distribution system (such as high line impedances arising from a temperature increase).

In Figure 8, we select the scenario (from the 2000 test samples) that leads to the worst-case voltage rise. In this case, the load demands are low (globally around 10% of their nominal values) while the active powers of DGs are at 90% of their rated values. The initial system voltages significantly exceed the upper limit of 1.01 p.u. (at almost all nodes), and reach a maximum value of 1.08 p.u. at node 27 (end of feeder 1). The absolute value of the reward associated with the control actions taken by the proposed DDPG algorithm is also represented in the right part of Figure 8.
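The violation check underlying this assessment can be sketched as follows. This is a minimal illustration, not the paper's code: the 1.01 p.u. upper limit and the 1.08 p.u. value at node 27 come from the text, while the other node numbers and voltages are invented for the example.

```python
# Flag nodes whose voltage exceeds the upper limit of the case study.
V_UPPER = 1.01  # p.u., upper voltage limit from the case study

# Example nodal voltages in p.u. (node 27 at 1.08 p.u. per the text;
# the other entries are illustrative assumptions).
voltages = {2: 1.02, 14: 1.05, 27: 1.08}

# Magnitude of each over-voltage violation, in p.u.
violations = {node: round(v - V_UPPER, 3)
              for node, v in voltages.items() if v > V_UPPER}
print(violations)  # {2: 0.01, 14: 0.04, 27: 0.07}
```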

**Figure 8.** Initial nodal voltages as well as the corrected ones obtained by the DRL-based agent in an extreme over-voltage situation. The corresponding absolute value of the reward related to each family of actions is also displayed.

Interestingly, the DRL-based agent completely solved the voltage problem. We see that it did not rely on the curtailment of the active power of distributed generators. Indeed, this solution is more expensive than consuming reactive power (which is therefore the privileged action). However, the transformer tap ratio also had to be modified (i.e., introducing a voltage drop between nodes 1 and 2) to prevent over-voltages at the end of the feeders.
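The cost ordering that drives this behavior can be sketched with a simple control-cost function. The coefficients below are illustrative assumptions (not the paper's values); the only property carried over from the text is that curtailing active power is priced higher than exchanging reactive power.

```python
# Illustrative cost coefficients (assumptions, not the paper's values).
C_Q = 1.0     # cost per unit of reactive power exchanged by DGs
C_CURT = 5.0  # cost per unit of curtailed active power (assumed higher)

def action_cost(q_dg, p_curtailed, tap_moves, c_tap=0.5):
    """Control cost of one set of actions (all coefficients illustrative)."""
    return C_Q * abs(q_dg) + C_CURT * p_curtailed + c_tap * tap_moves

# One unit of reactive support is cheaper than one unit of curtailment,
# so a cost-minimizing agent prefers the former.
print(action_cost(1.0, 0.0, 0) < action_cost(0.0, 1.0, 0))  # True
```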

In Figure 9, the voltage drop condition is analyzed, which corresponds to a situation where load demands are maximum, while the active powers of DGs are equal to zero. This results in under-voltage issues at many nodes of the distribution system.

**Figure 9.** Initial nodal voltages as well as the corrected ones obtained by the DRL-based agent in an extreme under-voltage situation. The corresponding absolute value of the reward related to each family of actions is also displayed.

Similar to the over-voltage case, the privileged action is to modify the reactive power level of DG units (here by injecting capacitive reactive power to compensate for the voltage drops). The corrected situation brings the voltage profile within the desired limits, with the exception of some nodes at the end of feeder 1 that slightly violate the lower bound (of 0.99 p.u.).

In general, after training, the agent is able to make the right decisions. In particular, during testing under new randomly generated conditions, the proposed DRL-based algorithm achieves solutions that are robust against the various sources of uncertainty and that mitigate severe voltage violations using cost-effective actions.

#### **5. Conclusions and Perspectives**

This paper was devoted to the voltage control problem in distribution systems, which is facing new challenges from growing dynamics and uncertainties. In particular, current strategies are hampered by the limited knowledge of the network parameters, which may prevent achieving optimal cost-efficiency. This problem is formulated as a centralized control of resources using deep reinforcement learning, through an actor-critic architecture that enables a proper representation of the continuous environment. This framework bypasses the need for an analytical representation of the electrical system, such that the control performance is decoupled from the accuracy of the system model.
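The core of the actor-critic mechanism can be illustrated with a toy deterministic policy-gradient update, the step that DDPG uses to improve the actor. This is a minimal sketch under strong simplifying assumptions: a 1-D state and action, and a critic Q(s, a) = -(a - s)^2 that is assumed known in closed form rather than learned (unlike the paper's implementation, where the critic is a trained neural network); the optimal policy here is a = s.

```python
# Minimal sketch of the deterministic policy-gradient (actor) update in DDPG,
# on a toy 1-D problem with an assumed (not learned) critic.
import random

w = 0.0    # actor parameter: linear policy a = w * s
lr = 0.05  # actor learning rate

def policy(s):
    return w * s

def dq_da(s, a):
    # Gradient of the assumed critic Q(s, a) = -(a - s)^2 w.r.t. the action.
    return -2.0 * (a - s)

random.seed(0)
for _ in range(500):
    s = random.uniform(0.5, 1.5)  # sampled operating state
    a = policy(s)
    # Chain rule: dQ/dw = (dQ/da) * (da/dw), with da/dw = s.
    w += lr * dq_da(s, a) * s

print(round(w, 2))  # converges towards 1.0, i.e. the optimal policy a = s
```

The update ascends the critic's action-value gradient through the policy parameters, which is exactly the role of the actor network in the full algorithm; the paper additionally learns the critic from experience and uses neural networks for both components.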

The main advantage of the proposed model is that it shifts the computational burden to the pre-processing stage (in a fully data-driven framework), such that the model provides very fast decisions at test time. Interestingly, the developed regulation scheme is not only easy to implement, but also cost-efficient, as we observe that the agent is able to automatically adapt its behavior to varying conditions.

The promising outcomes of the work pave the way towards more advanced strategies, such as the extension to a decentralized approach using a multi-agent formulation (that would prevent the single point of failure of the centralized framework). Similarly, extending the framework to partially observable networks (where the state of the system is not fully known [40]) also offers a valuable area of research for system operators.

**Author Contributions:** Conceptualization, J.-F.T. and F.V.; methodology, J.-F.T.; validation, J.-F.T. and B.B.Z.; writing–original draft preparation, J.-F.T. and B.B.Z.; writing–review and editing, M.H., Z.D.G. and F.V.; supervision, F.V.; project administration, F.V. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** J.-F. Toubeau is supported by FNRS (Belgian National Fund of Scientific Research).

**Conflicts of Interest:** The authors declare no conflict of interest.
