**3. Simulation Environment**

To train the DRL agent, it is necessary to build a simulation environment E that mimics the actual system. This environment is composed of three modules, which (i) generate realistic deviations of the expected nodal load and distributed generation powers for the next time step (to reflect prediction errors), (ii) provide realistic values of the uncertain network parameters, and (iii) simulate the physical flows in the distribution network.

As depicted in Figure 2, the RL agent is trained off-line through interactions with the simulation environment, which allows the RL model to be calibrated from experience and rewards. As previously explained, the starting point is an observation of the state $s_t$ of the environment (e.g., nodal voltage levels of the distribution system). Based on this information, the (target) actor network is used to take an action $a_t = \mu_\phi(s_t) + \epsilon_t$ (where the additional noise $\epsilon_t$ is used during training to boost exploration). It should be noted that, if no voltage problem is observed, the optimal action is to do nothing. The simulator (thoroughly described in the rest of this section) is then used to determine the impact of the action on the environment, which consists in computing the reward $r_t$ as well as the next state $s_{t+1}$. Finally, the (target) critic is used to evaluate the quality of the decisions, and the actor and critic networks are updated using (10) and (9), respectively, to improve the policy of the DRL-based agent.
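The interaction loop described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the Gaussian form of the exploration noise, the clipping of actions to a fixed range, and the names `noisy_action`, `collect_transitions`, `toy_actor` and `toy_env_step` are all assumptions introduced here. The network updates of (9) and (10) are deliberately left out.

```python
import random

def noisy_action(actor, state, noise_std, rng, a_min=-1.0, a_max=1.0):
    """a_t = mu_phi(s_t) + eps_t, with eps_t ~ N(0, noise_std^2) (assumed), then clipped."""
    return [min(max(a + rng.gauss(0.0, noise_std), a_min), a_max)
            for a in actor(state)]

def collect_transitions(env_step, actor, state, n_steps, noise_std, seed=0):
    """Roll the simulator forward and store (s_t, a_t, r_t, s_{t+1}) tuples."""
    rng = random.Random(seed)
    buffer = []
    for _ in range(n_steps):
        action = noisy_action(actor, state, noise_std, rng)
        reward, next_state = env_step(state, action)
        buffer.append((state, action, reward, next_state))
        state = next_state
    return buffer

# Hypothetical stand-ins: the state holds nodal voltage deviations (p.u.),
# and the action shifts them directly. A real simulator would run a load flow.
def toy_actor(state):
    return [-0.5 * dv for dv in state]

def toy_env_step(state, action):
    next_state = [dv + a for dv, a in zip(state, action)]
    reward = -sum(abs(dv) for dv in next_state)  # penalise remaining deviations
    return reward, next_state

buffer = collect_transitions(toy_env_step, toy_actor, [0.06, -0.04],
                             n_steps=5, noise_std=0.01)
```

In an actual training run, the stored transitions would feed the critic and actor updates, and `noise_std` would be set to zero at deployment since only the deterministic actor output is needed.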

Once training is complete, the agent can be deployed for practical power system operation (for which only the actor network is needed). Interestingly, the agent can continue learning after deployment (and thus adapt to potential misrepresentations of the simulation environment) by adjusting its parameters through on-line feedback. This may also serve to calibrate the model to the time-varying conditions of the system.

**Figure 2.** Training of the DRL agent for autonomous voltage control in distribution systems.

#### *3.1. Exogenous Uncertainties on the Network Operating Point*

The first category of uncertainties concerns the network operating point (regarding both the nodal consumptions and generations). Indeed, the output power of renewable-based generators is intermittent owing to the nature of their primary energy sources (mainly wind and solar), such that the generated power can vary quickly within a short interval. Moreover, the nodal consumption and generation levels are not always measurable. Consequently, in practice, the future operating state of the distribution system is not known with certainty, and this stochasticity is represented here with scenarios of representative prediction errors. Practically, for the renewable generation, a database is constructed from the historical prediction errors of the employed forecaster (described in Section 2.1.1), and a sample is randomly drawn from this database to generate the desired scenario. For the nodal loads, the same sampling strategy is used to simulate the (uncertain) changes in the consumption level.
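The sampling strategy can be illustrated with a short sketch. It assumes that each database entry is one historical record containing the prediction error at every node, so that errors are drawn jointly across nodes; the text does not specify this, and independent per-node draws would be an equally plausible reading. The function name `sample_operating_point` and all numerical values are hypothetical.

```python
import random

def sample_operating_point(forecast, error_records, rng):
    """Add one randomly drawn historical error record to the nodal forecast."""
    errors = rng.choice(error_records)
    if len(errors) != len(forecast):
        raise ValueError("error record and forecast must cover the same nodes")
    return [p + e for p, e in zip(forecast, errors)]

# Hypothetical data: forecast powers (MW) at three nodes and three
# historical records of forecast errors (MW) for the same nodes.
forecast_mw = [1.20, 0.80, 0.50]
error_records_mw = [
    [0.05, -0.02, 0.01],
    [-0.10, 0.03, 0.00],
    [0.02, 0.04, -0.03],
]

rng = random.Random(42)  # seeded for reproducible scenarios
scenario = sample_operating_point(forecast_mw, error_records_mw, rng)
```

Drawing whole records preserves any spatial correlation between nodal errors that is present in the historical data, which is the reason for the joint-draw assumption.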

#### *3.2. Endogenous Uncertainties on the Network Component Models and Parameters*

The second category of uncertainties is related to the partial knowledge of network component models and parameters. In general, network analyses and simulations rely on simplified models of network components, which do not fully capture the physical relations and dependencies within the real network. This includes uncertainties associated with the line, load and transformer models [26].

In particular, we model the thermal dependency whereby the line resistance varies with the conductor temperature. The uncertainty associated with the load power factor is also considered, to better reflect the different natures, types and amplitudes of the various load demands. Moreover, as shown in [38], the internal resistance of the transformer can have a significant effect on the node voltages, and it is therefore also incorporated in the network model. Finally, in contrast to typical network models, the shunt admittances of power lines are taken into account using the π line model. Overall, all these (uncertain) parameters are modelled as random variables varying within representative predefined bounds.
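As an illustration, the sketch below draws uncertain parameters within predefined bounds and applies the standard linear temperature model $R(T) = R_{\mathrm{ref}}\,[1 + \alpha\,(T - T_{\mathrm{ref}})]$ to the line resistance. The uniform distribution, the bounds, the reference resistance and the temperature coefficient (a typical value for aluminium at 20 °C) are assumptions made here for the example; the text only states that the parameters vary within representative bounds.

```python
import random

def line_resistance(r_ref, temp_c, alpha=0.00403, t_ref=20.0):
    """Linear thermal model: R(T) = R_ref * (1 + alpha * (T - T_ref))."""
    return r_ref * (1.0 + alpha * (temp_c - t_ref))

def sample_parameters(bounds, rng):
    """Draw each uncertain parameter uniformly (assumed) within its [low, high] bounds."""
    return {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}

# Hypothetical bounds, chosen only for illustration.
bounds = {
    "conductor_temp_c": (10.0, 75.0),
    "load_power_factor": (0.90, 1.00),
    "transformer_r_pu": (0.005, 0.015),
}

rng = random.Random(7)
params = sample_parameters(bounds, rng)

# Resistance (ohm/km) of a line with a hypothetical 0.32 ohm/km at 20 degC.
r_line = line_resistance(r_ref=0.32, temp_c=params["conductor_temp_c"])
```

A new set of parameters would typically be drawn for each simulated time step or episode, so that the agent is exposed to the full range of plausible network models during training.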

#### *3.3. Distribution Network Model*

The operation of the electrical network is modelled through load-flow calculations, which are solved using the Newton-Raphson method.
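To make the load-flow step concrete, the sketch below applies Newton-Raphson to a two-bus system in per unit: a slack bus fixed at $1\angle 0$ feeding a single PQ load through a series impedance $r + jx$. This toy case is an assumption introduced for illustration; it neglects the shunt admittances of the π model and does not represent the network studied here. The function name `solve_two_bus` and the numerical values are hypothetical.

```python
import math

def solve_two_bus(p_load, q_load, r, x, tol=1e-10, max_iter=20):
    """Newton-Raphson load flow for a slack bus (1.0 p.u., 0 rad) feeding one PQ bus.

    Returns the voltage magnitude (p.u.) and angle (rad) at the load bus.
    """
    y = 1.0 / complex(r, x)            # series admittance of the line
    g, b = y.real, y.imag
    p_spec, q_spec = -p_load, -q_load  # a load is a negative injection
    theta, v = 0.0, 1.0                # flat start
    for _ in range(max_iter):
        c, s = math.cos(theta), math.sin(theta)
        # Injected powers at the load bus for the current iterate.
        p = v * v * g - v * (g * c + b * s)
        q = -v * v * b - v * (g * s - b * c)
        dp, dq = p - p_spec, q - q_spec
        if max(abs(dp), abs(dq)) < tol:
            return v, theta
        # Analytic Jacobian of (p, q) with respect to (theta, v).
        j11 = v * (g * s - b * c)
        j12 = 2.0 * v * g - (g * c + b * s)
        j21 = -v * (g * c + b * s)
        j22 = -2.0 * v * b - (g * s - b * c)
        det = j11 * j22 - j12 * j21
        # Newton step: solve J * [d_theta, d_v] = -[dp, dq] by Cramer's rule.
        theta -= (dp * j22 - dq * j12) / det
        v -= (j11 * dq - j21 * dp) / det
    raise RuntimeError("load flow did not converge")

# Hypothetical case: 0.5 + j0.2 p.u. load behind a 0.02 + j0.06 p.u. line.
v_mag, v_ang = solve_two_bus(p_load=0.5, q_load=0.2, r=0.02, x=0.06)
```

For a full distribution network, the same iteration is applied to the vector of all unknown voltage magnitudes and angles, with the Jacobian assembled from the bus admittance matrix.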
