#### 2.1.1. Forecasting Model

The forecasting of wind power time series is cast as a Machine Learning regression problem, a particular class of supervised learning problems in which the output is continuous. In this paper, five different models are employed and compared. We first implement two state-of-the-art neural network architectures, namely the traditional feedforward MultiLayer Perceptron (MLP) [31], as well as recurrent extensions such as the Long Short-Term Memory network (LSTM) [32] and its bidirectional variant (BLSTM) [33], recently applied in the energy sector in [34,35]. We then implement two tree-based techniques, namely Random Forests (RF) [36] and Gradient Boosting Decision Trees (GBDT) [37]. Finally, the fifth forecasting model (ENSEMBLE) is an ensemble forecast whose output is simply the average of the four previous models. The employed models represent a snapshot of current state-of-the-art Machine Learning techniques, as exemplified by their high performance in contests such as the Global Energy Forecasting Competition [30].

We focus on the day-ahead prediction of wind power, i.e., we aim at forecasting, at 12:00 p.m. of day *D* − 1, the wind power for the 96 quarters of an hour of day *D*. Figure 1 depicts the overall procedure. Input features are composed of historical data (i.e., past wind power production and meteorological data such as wind speed, temperature and pressure) and of future data (in this case, the publicly available day-ahead onshore wind power forecast made by the Transmission System Operator, or TSO, at the national level). In the case of the tree-based models, i.e., RF and GBDT, one model is trained for each of the 96 quarters of an hour of day *D*, and the quarter-hourly forecasts $WP^{RF}_q$ (or $WP^{GBDT}_q$), $q = 0, \dots, 95$, are combined into a forecast vector $\mathbf{WP}^{RF}_{0-95}$ (or $\mathbf{WP}^{GBDT}_{0-95}$) for the whole day *D*. One single MLP model with an output layer of size 96 predicts the wind power for day *D*, $\mathbf{WP}^{MLP}_{0-95}$, and one single unrolled BLSTM model predicts $\mathbf{WP}^{BLSTM}_{0-95}$. Finally, the output of the ENSEMBLE model, $\mathbf{WP}^{Average}_{0-95}$, is computed as the average of $\mathbf{WP}^{RF}_{0-95}$, $\mathbf{WP}^{GBDT}_{0-95}$, $\mathbf{WP}^{MLP}_{0-95}$ and $\mathbf{WP}^{BLSTM}_{0-95}$.
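As an illustration, the ENSEMBLE output is a simple element-wise average of the four model forecasts. The sketch below uses pure Python with a toy 4-step horizon (the paper's horizon has 96 quarter-hours); the function name and values are ours, not the paper's.

```python
def ensemble_forecast(forecasts):
    """Average a list of equally long forecast vectors element-wise."""
    n_models = len(forecasts)
    horizon = len(forecasts[0])
    return [sum(f[q] for f in forecasts) / n_models for q in range(horizon)]

# Toy forecasts standing in for the RF, GBDT, MLP and BLSTM outputs:
wp_rf    = [1.0, 2.0, 3.0, 4.0]
wp_gbdt  = [2.0, 2.0, 2.0, 2.0]
wp_mlp   = [3.0, 1.0, 1.0, 3.0]
wp_blstm = [2.0, 3.0, 2.0, 3.0]

wp_ensemble = ensemble_forecast([wp_rf, wp_gbdt, wp_mlp, wp_blstm])
# wp_ensemble == [2.0, 2.0, 2.0, 3.0]
```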

**Figure 1.** Wind power forecasting methodology.

#### 2.1.2. Strategy for Dealing with Abnormal Wind Power Data

Abnormal data, or outliers, are quite common in localized wind power data and may have a strong impact on the performance of wind power forecasting models built on such data. They can be detected by analyzing the wind turbine power curve (which depicts wind power as a function of wind speed), and can be classified into four categories depending on their position with respect to the normal power curve, according to [38]: bottom-curve stacked outliers (due to turbine failure, communication equipment failure, measurement terminal failure or unplanned maintenance; see zone 1 of Figure 2), mid-curve stacked outliers (caused by wind curtailment or communication issues; see zone 2 of Figure 2), top-curve stacked outliers (caused by communication error or wind speed sensor failure; see zone 3 of Figure 2) and around-curve stacked outliers (due to random factors such as signal propagation noise and extreme weather conditions; see zone 4 of Figure 2).
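A minimal sketch of the quartile rule underlying this kind of power-curve filtering is given below. This is our own simplified variant, not the exact algorithm of [38]: points are binned by wind speed, and a power value falling outside $[Q_1 - 1.5\,\mathrm{IQR},\, Q_3 + 1.5\,\mathrm{IQR}]$ within its bin is flagged as abnormal.

```python
import statistics

def flag_outliers_by_bin(speeds, powers, bin_width=1.0, k=1.5):
    """Flag power-curve outliers with a simple per-bin quartile rule.

    A point is abnormal if its power lies outside [Q1 - k*IQR, Q3 + k*IQR]
    computed over all points falling in the same wind-speed bin.
    """
    bins = {}
    for i, (v, p) in enumerate(zip(speeds, powers)):
        bins.setdefault(int(v // bin_width), []).append((i, p))
    abnormal = [False] * len(powers)
    for points in bins.values():
        values = [p for _, p in points]
        if len(values) < 4:  # too few points to estimate quartiles
            continue
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        for i, p in points:
            if p < q1 - k * iqr or p > q3 + k * iqr:
                abnormal[i] = True
    return abnormal

# Nine plausible points and one outlier in the same wind-speed bin:
flags = flag_outliers_by_bin([5.1] * 9 + [5.4],
                             [10, 11, 10, 12, 11, 10, 11, 12, 10, 100])
# Only the last point (power 100 at ~5 m/s) is flagged.
```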

In this paper, we propose an original method for taking into account the presence of abnormal wind power data directly in the learning procedure of the neural network wind power forecasting models, in order to improve the forecast performance. In practice (and voluntarily summarizing the process for the sake of clarity), the learning procedure for neural networks, and more generally for supervised learning models, consists in identifying the values of the model parameters $\theta$ (e.g., the weights of a neural network) minimizing a loss function $\mathcal{L}$, which quantifies how well the model fits the training data:

$$\theta^* = \underset{\theta}{\operatorname{arg\,min}}\; \mathcal{L}(\hat{y} = f_{\theta}(\mathbf{x}), y) \tag{1}$$

with $\hat{y}$ the output of model $f_{\theta}(\mathbf{x})$, $\mathbf{x}$ the vector of input features, $y$ the target vector (i.e., the true forecast values extracted from the training set $(\mathbf{x}_i, y_i)$, $i = 1, \dots, N$, with $N$ the number of samples in the training set), and $\theta^*$ the optimal parameter values. For most supervised learning models, Problem (1) is solved using variants of the gradient descent algorithm.
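As an illustration of Problem (1), the sketch below runs plain gradient descent on a one-parameter linear model $f_\theta(x) = \theta x$ with a mean squared error loss. It is a toy example in pure Python; the paper's models have many parameters and are trained with Adam.

```python
def gradient_descent(xs, ys, lr=0.1, n_steps=200):
    """Fit f_theta(x) = theta * x by minimising the MSE over (xs, ys)."""
    theta = 0.0
    n = len(xs)
    for _ in range(n_steps):
        # dL/dtheta for L = (1/n) * sum((theta*x - y)^2)
        grad = (2.0 / n) * sum((theta * x - y) * x for x, y in zip(xs, ys))
        theta -= lr * grad
    return theta

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by y = 2x, so the optimum is theta* = 2
theta_star = gradient_descent(xs, ys)
```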

**Figure 2.** Abnormal data in wind power curves (extracted from [38]).

In that context, the main idea of our adapted learning procedure consists in modifying the loss function $\mathcal{L}$ in order to cancel the contribution of data objects which are tagged as abnormal by an ad hoc abnormal data detection algorithm. The general procedure, depicted in Figure 3, can be described as follows in the case of a neural network model. Each time a training sample $(\mathbf{x}_i, \mathbf{y}_i)$ is presented at the input of the model, the following steps are applied:

1. **Forward pass.** Compute $\hat{\mathbf{y}}_i = f_{\theta}(\mathbf{x}_i)$, i.e., an estimation of the true forecast $\mathbf{y}_i$.
2. **Mask construction.** Build the masking vector $\mathbf{m}_i$, whose components are set to 0 for the entries of $\mathbf{y}_i$ tagged as abnormal by the detection algorithm, and to 1 otherwise.
3. **Loss computation.** Compute the masked loss:

$$\mathcal{L}_i = \frac{1}{2} \|\mathbf{m}_i \odot (\hat{\mathbf{y}}_i - \mathbf{y}_i)\|_2^2 \tag{2}$$

with $\odot$ the element-wise product.

4. **Backward pass.** Update the parameters (i.e., the weights $\mathbf{W} = \{w_{lij}\}$ of the neural network, with $l = 1, \dots, N_L$, $i = 1, \dots, n_{l-1}$, $j = 1, \dots, n_l$, and with $N_L$ the number of layers in the neural network and $n_l$ the number of neurons in layer $l$) according to the standard backpropagation formulas.
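The masked loss of Equation (2) can be sketched in pure Python as follows (the paper's implementation uses Keras; the function name and toy values are ours). A zero in the mask removes the corresponding squared error from the loss, so the abnormal target entry never generates a gradient.

```python
def masked_loss(mask, y_hat, y):
    """Loss of Equation (2): 0.5 * ||m ⊙ (y_hat - y)||^2.
    mask[q] = 0 cancels the contribution of an abnormal target entry."""
    return 0.5 * sum((m * (p - t)) ** 2 for m, p, t in zip(mask, y_hat, y))

# A 4-step toy horizon: the last target is abnormal and masked out.
y     = [1.0, 2.0, 3.0, 99.0]   # 99.0 is an outlier
y_hat = [1.5, 2.0, 2.5, 3.0]
mask  = [1, 1, 1, 0]

loss = masked_loss(mask, y_hat, y)
# Only the first three errors contribute: 0.5 * (0.25 + 0.0 + 0.25) = 0.25
```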

**Figure 3.** Strategy for training neural networks in presence of wind power abnormal data.

#### *2.2. Electricity Consumption Representative Profiles*

Forecasting day-ahead the electricity consumption of individual companies at the required time granularity (quarter-hourly in the present case) is a complex task. Indeed, the consumption of companies in different branches shows a high variance, as exemplified in [39]. An accurate forecast would therefore require explanatory variables which precisely describe the economic activity of the company, and which are difficult to obtain in practice, mainly for privacy reasons. In that context, researchers tend to aggregate the electricity consumption at an appropriate level before performing the forecasting task (see e.g., [26,34]), or focus on longer time spans (such as [39], in which the authors predict the annual electricity consumption of enterprises).

In this paper, instead of developing pure forecasting models for each company of the community, we propose a method for generating representative electricity consumption profiles for each member, which is solely based on their past consumption data. The method is inspired by [40] and adapted to the present context.

In the following, we assume that a dataset $\mathcal{X}$ of daily profiles of electricity consumption, sampled at a quarter hourly rate, is available for each member. Each data object $\mathbf{x}_i$ is therefore a 96-dimensional vector (96 = 4 × 24). The procedure is explained below for one community member.


$$\mu = \operatorname*{arg\,min}_{y \in \mathcal{X}} \sum_{i=1}^N d(y, \mathbf{x}_i) \tag{3}$$

with $\mathcal{X} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ a dataset of *N* consumption profiles, and $d(\cdot, \cdot)$ a distance function between two data objects. In this work, we use the Dynamic Time Warping (DTW) distance, originally developed in the field of speech processing [41] and now commonly employed for comparing time series, particularly in shape-based time series clustering [42].
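The DTW distance and the medoid of Equation (3) can be sketched as follows in pure Python (textbook dynamic-programming DTW on 1-D series; function names and toy profiles are ours, and real profiles are 96-dimensional):

```python
def dtw(a, b):
    """Classic O(len(a)*len(b)) Dynamic Time Warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def medoid(profiles):
    """Equation (3): the element of the dataset minimising the sum of
    DTW distances to all profiles."""
    return min(profiles, key=lambda y: sum(dtw(y, x) for x in profiles))

# Toy 3-point "daily profiles": the medoid is the central shape.
mu = medoid([[0, 1, 2], [0, 1, 3], [5, 5, 5]])
# mu == [0, 1, 3]
```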

4. Generate consumption profiles between two pre-specified dates. Finally, electricity consumption profiles between a start day and an end day are created by 1. generating the sequence of dates between these two days and 2. assigning to each date the corresponding medoid (winter Monday profile, summer Tuesday profile, off-day profile, etc.).
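Step 4 can be sketched as follows. The day-class labels (winter/summer weekday, off-day), the season boundaries and the medoid vectors below are illustrative assumptions; the paper derives them per member from past consumption data.

```python
from datetime import date, timedelta

def generate_profiles(start, end, medoids, holidays=frozenset()):
    """Assign to each day in [start, end] the representative profile
    (medoid) of its class. Class labels are illustrative."""
    out = {}
    day = start
    while day <= end:
        if day.weekday() >= 5 or day in holidays:
            key = "off-day"
        else:
            season = "winter" if day.month in (11, 12, 1, 2, 3) else "summer"
            key = f"{season}-weekday"
        out[day] = medoids[key]
        day += timedelta(days=1)
    return out

# Toy medoids (real ones are 96-dimensional quarter-hourly profiles):
medoids = {"off-day": [0.1], "winter-weekday": [1.0], "summer-weekday": [0.5]}
week = generate_profiles(date(2020, 1, 6), date(2020, 1, 12), medoids)
# Mon 2020-01-06 gets the winter weekday medoid, Sat 2020-01-11 the off-day one.
```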

#### **3. Use Case and Results**

The data analytics modules described in Section 2 are applied in this section to a pilot REC established in Belgium in the framework of the E-Cloud project [17]. We begin by describing the selected use case (Section 3.1), then focus on the performance of the wind power forecasting module (Sections 3.2 and 3.3), and finally quantify the impact of the data analytics modules on the behaviour of the community members and on the operation of the REC, by analyzing the evolution of the members' self-consumption and self-sufficiency (Section 3.4).

## *3.1. Use Case Description*

The E-Cloud project [17,18], led by ORES (one of the main Walloon DSOs) in collaboration with local private and public entities (Luminus, IDETA, Siemens, DAPESCO, N-Side and the University of Mons (UMONS)), established a pilot REC in Tournai (Belgium), in an industrial area connected to the Medium Voltage electricity distribution network.

The REC involved 18 members (mainly Small or Medium Enterprises, or SMEs) and included 18 MW of wind power generation (of which only 2.25 MW was allocated to the community; the rest was sold through traditional market processes and was therefore out of the scope of the present work), as well as 70 kW of peak photovoltaic generation, owned by third-party investors (the community itself could however own the generation assets, which will be studied in future works). A temporary derogation was granted by the regional Walloon regulator ('Commission Wallonne Pour l'Energie', or CWAPE) in order to apply a tailored pricing scheme inside the community; in that way, community members were allowed to purchase, at an advantageous price, (part of) their electricity consumption directly from the local renewable generation when it was available, bypassing the traditional wholesale-retail market structure and favouring local consumption of the locally available generation. A distribution key calculated a priori [18] specified the portion of local renewable energy allocated to each community member every quarter of an hour. For the consumption not covered by the local generation, members were free to establish contracts with suppliers in the classical retail market. In the E-Cloud project, thanks to the data analytics modules presented in this paper, community members were furthermore informed in day-ahead of the prospects in terms of renewable energy production, as well as of their own typical electricity consumption profiles at the concerned time of year. They were in that way incentivized to adapt their consumption to the local generation via the preferential tariff applied in the community.

The project preparatory phase started in 2017, and the pilot was effectively deployed in Tournai, applying the pricing derogation granted by the regulator, from July 2019 to June 2020. During the pilot life, approximately 7500 MWh were produced by the local generation, of which 56% was consumed locally. The total consumption of the 18 involved companies during the full year of the pilot is reported in Table 1.


**Table 1.** Total electricity consumption of the 18 member companies during the pilot (July 2019–June 2020).

#### *3.2. Dealing with Wind Power Abnormal Data*

We first demonstrated the efficiency of the original procedure proposed in Section 2.1.2 for dealing with abnormal wind power data in the training of neural-network-based forecasting models. To that end, we leveraged a dataset made available in the framework of the E-Cloud project, consisting of approximately 1.4 years (January 2018–May 2019) of:


• wind power production of the local wind farm, together with meteorological measurements (wind speed, temperature and atmospheric pressure), sampled at a quarter hourly scale;

• onshore wind power forecasts at the national level, made publicly available by the Belgian Transmission System Operator (TSO) Elia [43] with the objective of benefiting market participants and improving electricity market outcomes [44], sampled at a quarter hourly scale.

The abnormal data detection algorithm [38] presented in Section 2.1.2 was first applied to the farm-level wind power curve, created using the wind power and wind speed data. The outcome of the procedure is depicted in Figure 4, where green circles refer to normal data, and red crosses (respectively, blue stars) correspond to abnormal data detected by the quartile method (respectively, the change point method).

The abnormal data points were employed to create the masking vectors $\mathbf{m}_i$ involved in the modified learning procedure of the wind power neural network forecasting models. A 96-output MultiLayer Perceptron (MLP) aiming at a day-ahead wind power forecast was trained on the dataset described above, according to the procedure of Figure 3. More precisely, in accordance with standard textbooks in Machine Learning [31], a backtesting procedure based on cross-validation, which respected the temporal order of observations (as is common in the field of time series prediction), was performed by decomposing the 1.4-year dataset into three sets: a 13-month training set (January 2018–February 2019) used for estimating the model parameters (e.g., the weights of the neural networks), a 1-month validation set (March 2019) employed for tuning the model hyperparameters (e.g., the number of neurons per layer and the number of layers) and preventing overfitting, and a 2-month test set (April–May 2019) for evaluating the model performance on data not previously seen by the model.
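The chronological train/validation/test split can be sketched as follows. The split dates match those given above; the representation of a sample as a (timestamp, value) pair is our assumption.

```python
from datetime import datetime

def chronological_split(samples, train_end, val_end):
    """Split time-stamped samples into train/validation/test sets while
    respecting the temporal order of observations (no shuffling).
    Each sample is a (timestamp, payload) pair."""
    train = [s for s in samples if s[0] <= train_end]
    val   = [s for s in samples if train_end < s[0] <= val_end]
    test  = [s for s in samples if s[0] > val_end]
    return train, val, test

# Toy samples, one per month of early 2019, split at the paper's boundaries:
samples = [(datetime(2019, m, 15), m) for m in range(1, 6)]
train, val, test = chronological_split(samples,
                                       train_end=datetime(2019, 2, 28),
                                       val_end=datetime(2019, 3, 31))
# Jan+Feb go to train, Mar to validation, Apr+May to test.
```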

The Python libraries Keras [45] and TensorFlow [46] were employed for implementing and training the neural networks. The Adam optimization algorithm [47], a state-of-the-art variant of stochastic gradient descent, was selected as the training algorithm for estimating the neural network weights. The Tree-structured Parzen Estimator (TPE) approach [48] was employed for optimizing the hyperparameters of the neural network (i.e., the number of hidden layers, the number of neurons in each layer, the size of the input feature vector, etc.), with the help of the *Hyperopt* Python library [49]. This led to an MLP architecture with one hidden layer of 32 neurons, and an input layer including 12 past time steps for the wind power, seven past time steps for the wind speed and the atmospheric pressure, and 31 past time steps for the temperature.

**Figure 4.** Abnormal wind power data filtering on the E-Cloud data, according to the procedure exposed in [38]. Normal points are tagged with green circles, and abnormal points are tagged with blue stars (red crosses) if they have been identified using the change point method (quartile method).

We then compared the performance of the trained neural network in two different configurations, i.e., when masking the contribution of wind power abnormal data during training according to the procedure exposed in Section 2.1.2 ('MLP with mask'), and without applying any masking ('MLP without mask'). To that end, we trained 100 neural networks with the best architecture found above, and computed the Root Mean Square Error (RMSE) obtained on the test set by comparing the forecast and the true value of the wind power generation. The Adam training algorithm is indeed stochastic: it minimizes a highly non-convex cost function and therefore ends up in local minima which vary with the initial conditions, training parameter values, etc. [47]. The average, standard deviation, minimum and maximum of the RMSEs are reported in Table 2 for the two approaches (with and without mask). Masking the wind power abnormal data decreased the average RMSE by approximately 10%, which confirms the interest and efficiency of the proposed methodology.
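The RMSE metric and the summary statistics reported over repeated trainings can be sketched as follows (pure Python; the run values below are toy numbers, not the paper's results):

```python
import math
import statistics

def rmse(forecast, actual):
    """Root Mean Square Error between a forecast and the true series."""
    n = len(actual)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / n)

def summarize_runs(rmses):
    """Mean, standard deviation, min and max of the RMSE over repeated
    stochastic trainings of the same architecture."""
    return {"mean": statistics.mean(rmses),
            "std": statistics.stdev(rmses),
            "min": min(rmses),
            "max": max(rmses)}

# Toy RMSEs from three hypothetical training runs:
stats = summarize_runs([1.0, 2.0, 3.0])
# stats == {"mean": 2.0, "std": 1.0, "min": 1.0, "max": 3.0}
```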


**Table 2.** Strategy for dealing with wind power abnormal data: forecasting performance of the trained MultiLayer Perceptrons (MLPs) with and without the proposed mask.
