2.1.1. State Space

The state space of the RL agent (i.e., the central controller) is defined by the information that can be measured in real time by SCADA (Supervisory Control and Data Acquisition) or PMU (Phasor Measurement Unit). Accordingly, the state *s<sub>t</sub>* at time *t* contains the voltage level *V<sub>n,t</sub>* of each node *n* ∈ N of the distribution system. This information is complemented by the (predicted) maximum power level ℙ<sub>*g*,*t*+1</sub> of each distributed generator *g* ∈ G at the next time interval *t* + 1. This forecast (reflecting, e.g., the maximum energy contained in the wind) tells the agent the upper limit of its control variables when it selects its actions. In practice, it is obtained from a deterministic single-step-ahead forecaster based on an advanced recurrent neural network architecture, as presented in [34], which predicts the upcoming power level from the past dynamics of the generator output. Finally, the current tap position *Tap<sub>t</sub>* of the transformer (which defines the turns ratio between the primary and secondary voltage levels) is also included in *s<sub>t</sub>*:

$$s_t = (V_{1,t}, \dots, V_{N,t}, \mathbb{P}_{1,t+1}, \dots, \mathbb{P}_{G,t+1}, \operatorname{Tap}_t) \tag{1}$$
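
As an illustration of Equation (1), the state can be assembled as a flat vector by concatenating the node voltages, the forecast maximum generator powers, and the tap position. The following is a minimal sketch and not the implementation used in this work: the function name `build_state`, its argument names, and the numerical values (a 4-node feeder with 2 distributed generators) are hypothetical and chosen only for demonstration.

```python
import numpy as np


def build_state(voltages, p_max_forecast, tap_position):
    """Assemble the state vector of Eq. (1) (hypothetical helper).

    voltages       : voltage V_{n,t} of each node n at time t
    p_max_forecast : forecast maximum power of each generator g at t+1
    tap_position   : current transformer tap position Tap_t
    """
    v = np.asarray(voltages, dtype=float).ravel()
    p = np.asarray(p_max_forecast, dtype=float).ravel()
    # Order follows Eq. (1): voltages, then power forecasts, then the tap.
    return np.concatenate([v, p, [float(tap_position)]])


# Illustrative values only (assumed, not taken from the study).
state = build_state(
    voltages=[1.01, 0.99, 0.98, 1.02],  # node voltages, assumed per-unit
    p_max_forecast=[0.80, 0.35],        # forecast maximum power at t+1
    tap_position=-1,                    # current tap step
)
```

With N nodes and G generators, the resulting vector has N + G + 1 entries, so its dimension is fixed by the network topology rather than by the operating point.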
