Consumption–Production Profile Categorization in Energy Communities

Rozas, Wolfram; Pastor-Vargas, Rafael; García-Vico, Angel Miguel; Carpio, José

doi:10.3390/en16196996

Open Access | Feature Paper | Article

¹ Departamento de Sistemas de Comunicación y Control, Escuela Técnica Superior en Ingeniería Informática, Universidad Nacional de Educación a Distancia (UNED), 28040 Madrid, Spain

² Escuela Técnica Superior de Ingenieros Industriales, Universidad Nacional de Educación a Distancia (UNED), 28040 Madrid, Spain

* Author to whom correspondence should be addressed.

Energies 2023, 16(19), 6996; https://doi.org/10.3390/en16196996

Submission received: 10 September 2023 / Revised: 28 September 2023 / Accepted: 1 October 2023 / Published: 8 October 2023

(This article belongs to the Special Issue Design and Implementation of Renewable Energy Systems)


Abstract

The energy transition is increasing the participation of renewable energy in new distributed generation systems such as Local Energy Markets. Because renewable generation is inherently intermittent and variable, forecasting production and consumption load profiles will become more challenging and will demand more complex predictive models. This paper analyzes the production, consumption load profile, and storage headroom% of the Cornwall Local Energy Market, using advanced statistical time series methods to optimize the market opportunity that the storage units provide. These models also help the Energy Community storage reserves to meet contract conditions with the Distribution Network Operator. With this more accurate and detailed knowledge, all sites in this Local Energy Market will benefit more from their installation by optimizing their energy consumption, production, and storage. This better accuracy will make the Local Energy Market more fluid and safer, creating a flexible system that will guarantee the technical quality of the product for the whole community. The training of several SARIMAX, Exponential Smoothing, and Temporal Causal models improved the fitness of the consumption, production, and headroom% time series. These models properly decomposed the time series into trend, seasonality, and stochastic dynamic components that help us to understand how the Local Energy Market consumes, produces, and stores energy. The model design used all power flows and battery energy storage system state-of-charge site characteristics at daily and hourly granularity levels. All model building follows an analytical methodology detailed step by step. A benchmark between these sequence models and the incumbent forecasting models used by the Energy Community shows better performance, measured as model error reduction. The best models present a mean squared error reduction between 88.89% and 99.93%, while the mean absolute error reduction ranges from 65.73% to 97.08%.
These predictive models built at different prediction scales will help the Energy Communities better contribute to the Network Management and optimize their energy and power management performance. In conclusion, the expected outcome of these implementations is a cost-optimal management of the Local Energy Market and its contribution to the needed new Flexibility Electricity System Scheme, extending the adoption of renewable energies.

Keywords: flexibility; local energy market; predictive sequence models; uncertainty

1. Introduction

The energy transition is boosting the integration and development of Distributed Energy Resources (DER) in the power grid, making demand and prices more unstable and less predictable than ever. The technical quality of this new electricity generation system will be challenged by the massive entry of new renewable generation assets, because renewable energy is inherently intermittent and depends on variable meteorological conditions. These limitations call for new flexibility solutions, one of which is the Energy Community (EC). An Energy Community [1] is a grouping of renewable energy producers/consumers in which characterizing consumption and production profiles is essential to achieve highly efficient electric self-consumption. A second objective is to obtain benefits from trading the energy surpluses generated by the community. One successful approach to Energy Communities is the Local Energy Market (LEM) [2].

Ricardo Faia et al. [3] have defined the Local Energy Market (LEM) as the coordination of decentralized energy, storage, transport, conversion, and consumption within a given local geographical area. With automated control and demand-side management strategies, local energy management, especially with local Heating, Ventilation, and Air Conditioning (HVAC) production, promises to increase energy-use efficiency, reduce Greenhouse Gas (GHG) emissions, and enhance energy independence. These new markets are based upon a fully distributed electricity value chain, including generation, distribution, and retail. The new players generate more complex interactions that are a current research hotspot for game theory [4]. As microgrids adopt the LEM concept, they will become increasingly independent from the Central System. At the same time, new problems in the grid, like congestion or lack of storage, will happen more frequently [5]. Under this further complexity, forecasting models for load, renewable generation, and storage reserves are critical to avoid grid instability. These forecasting models will benefit all LEM participants, from system operators to new generators such as Energy Communities that trade their surpluses in Local Energy Markets, by allowing them to optimize energy consumption and production. According to the European Union, Energy Communities are open and voluntary and combine non-commercial aims with environmental and social community objectives. Their aim is to optimize energy costs and minimize system management uncertainties by adopting flexible systems that optimally utilize all available resources.

Capper et al. [6] define the purpose of Local Energy Markets (LEM) as incentivizing small energy consumers, producers, and prosumers to exchange energy with one another in a competitive market and to balance energy supply and demand locally. LEMs are feasible if transaction costs fall. If so, LEMs reduce the need to expand the network and harmonize the system.

LEMs, in general, can deliver market services like demand-side response. They can operate in the flexibility market, including grid stability, balancing/adjustment markets, and response-to-grid. The Peer-To-Peer (P2P) model creates an online marketplace where prosumers and consumers can trade electricity, without an intermediary, at their agreed price [7]. The Transactive Energy model (i.e., a dynamic balance between supply and demand is maintained in the electrical infrastructure using economic and control mechanisms that utilize value as a key operational parameter) is more suitable for the Flexibility Market. As for market design, LEMs can operate with price signals, economic incentives, bilateral contracts, or a competitive market. There are a few documented examples of LEM implementation. A review of the existing literature identified the following:

Cornwall LEM was a four-year trial (2016–2020), aligned with the 2050 net-zero economy objective. The project was based on a flexibility solution. The partners involved in the project were Centrica, N-side, Western Distribution, National Grid ESO, University of Exeter, and Imperial College London.
The Brooklyn Microgrid case is a P2P Microgrid where neighbors in a quarter sell their surpluses to other neighbors. It started in 2012 in response to Hurricane Sandy, during which three-quarters of networks lost electricity. Whole-community residential panels produce 250–400 MWh each unit of small-scale capacity.
Greater Manchester Local Energy Market (GM LEM). This was a two-year project whose mission was to reach a net-zero emission target by 2038. The project was funded by 11 partners (Bruntwood, OVOEnergy, Carbon Co-Op, Hitachi, Daikin, Regent, et al.).
Gothenburg Communities. The project was financed by the Swedish Energy Agency and f3, the Swedish Knowledge Centre for Renewable Transportation Fuels, which includes universities and research institutes and is hosted by Chalmers Industriteknik (CIT). Gothenburg Communities comprise 16 municipalities, members of the Klimatkommunerna (Climate Municipalities).
Empower Hvaler, Norway. Empower was a project of the European Union’s Horizon 2020 Research and Innovation program based on the idea that the local market maximizes social welfare. The consortium included Smart Innovation Østfold AS (NO), Schneider Electric Norge AS (NO), eSmart Systems AS (NO), Fredrikstad Energi Nett AS (NO), University of St. Gallen (CH), Universitat Politècnica de Catalunya (ES), Malta Intelligent Energy Management Agency (MT), and NewEn Projects GmbH (DE) to develop a neighborhood energy using rooftop PV (3–5 kW) on about 100 houses, micro wind turbines, other decentralized energy production, and EVs.
Los Molinos del Rio Aguas (LMRA). Ecological Community in Almería, Southeast of Spain. Off-grid community-owned microgrid interconnecting solar home systems. This LEM is based on Blockchain to exchange energy and contribute to the Community’s social welfare, an example of prosumption. They have achieved a Levelized Cost of Energy (LCOE) reduction of around 30% (around 0.08 €/kWh).

Note that LEM regulation is still pending in most of the analyzed countries, such as Spain, the UK, Norway, Sweden, and the United States.

The Local Energy Market is a highly flexible solution with many capabilities, but it must be managed effectively. Predictive models help us to better anticipate the subsequent cost-optimal actions like discharging surpluses and automating consumption at peak electricity production hours, optimizing BESS Reserves capacity, or optimally designing the distribution network. This research aims to improve the LEM performance by applying advanced statistical methods such as SARIMAX, the Exponential Smoothing Average (ESA), or Temporal Causal Modeling (TCM). The methods are valid for any LEM. This analysis applies them to the Cornwall LEM dataset, as this Energy Community has a whole set of attributes in several analytical dimensions (EPC-unit site, DERs, BESS, appliances, EV, Bill). To the authors' knowledge, no prior forecasting models of this kind have been applied to LEMs, which is the main novelty of this research.

These models will guide the Energy Community in making upward- and downward-flexible decisions while optimally managing the storage reserves to meet the contract conditions with the Distribution Network Operator. In downward flexibility, the Energy Community will purchase energy for consumption or charge BESS at low price signals. In upward flexibility, the Energy Community will discharge BESS and sell energy at high price signals. The predictive sequence models will forecast Consumption, Production, and Headroom% time series for an average site and selected site clusters from Cornwall LEM. Headroom% is defined as the residual BESS energy capacity that has not participated in PV self-consumption [8]. The analysis compares the results achieved by the selected proposed models against the estimates of the incumbent Consumption and Production forecasting models (that is, the models currently used at Cornwall LEM).

This research aims to find better ways to optimize the Local Energy Market performance. A more granular categorization of the consumption–production–headroom% profile at different granularity levels will help us to optimize DERs surpluses revenues and the storage reserves management. Our research validates this hypothesis by benchmarking the proposed new model’s performance with existing forecasting models for an average site or selected site clusters.

The structure of this paper is as follows. The Materials and Methods section describes the LEM dataset used to build the selected sequence models and details the methodology used to obtain the trained models. The Experimentation and Results section elaborates on the statistical analysis and the main results achieved with the Consumption, Production, and Headroom predictive models for an average site or selected meaningful site clusters of the Cornwall LEM. The Discussion section presents the benefits of using these models, benchmarking their results with the incumbent forecasting model used by the Cornwall LEM. Finally, the Conclusions section presents particular findings and opportunities for Future Research.

2. Materials and Methods

As presented before, there are six relevant LEM case studies: Cornwall (UK), Greater Manchester (UK), Brooklyn (USA), Gothenburg (Sweden), Hvaler (Norway), and Los Molinos del Río Aguas (Spain). Their analysis helps to identify the attributes needed to choose a dataset that represents the main characteristics of a Local Energy Market. The analysis compares these LEMs on the following standard features:

1.: Scalability (from small to significant Energy Communities).
2.: LEM Type (namely Transactive Energy, Energy Community, or P2P).
3.: Price Scheme (i.e., how the pricing is computed).
4.: Bidding Strategies chosen (i.e., the market mechanism used to form the price, typically Long Term Auction, Double Auction, Intraday Market, Power Purchase Agreement, and Continuous Double Auction/Two-sided bidding market).
5.: The Forecasting Models used.
6.: Storage optimization.
7.: Smart charging (to execute upward and downward flexibility).
8.: Automation and Blockchain technologies implemented.

The authors selected the Cornwall Local Energy Market [9] dataset due to its attributes and its availability. The dataset provides important features (Solar PV production and energy consumption, among others) and different forecasting models. These models can be used as a baseline against which new approaches and their improvements are compared. This is the only full dataset that can be used as an LEM Dataset Reference. In addition, Cornwall provides a typical example of a LEM (according to the size, scalability, and the other relevant characteristics listed before) and has data at different time scales that allow the development of models at different time granularity levels. Cornwall LEM is a medium-sized Energy Community of 100 dwellings, with metadata available about their owner–occupiers. The sites have two kinds of storage systems. In some sites, there is a Battery Energy Storage System (BESS) that is AC-coupled, with an existing Solar Photovoltaic (PV) array + Inverter, and in others, it is DC-coupled, with a new solar PV array. The BESS Energy Capacity was 5, 7.5, or 10 kWh, while the BESS Power Capacity was 2.5 or 3.3 kW. sonnenBatterie provided all the equipment, including the capability to track the power flows and the state of charge of the battery via web scraping. Users were granted access to production and consumption forecasting model estimates. Distributed Energy Resources included PV panels and heat pumps, and the consumption of different appliances was tracked. The LEM optimizes the battery usage considering the Energy Community’s electric vehicles and regular storage. The Energy Community managed the surpluses in the market, guaranteeing the storage reserves would meet contract conditions with the Distribution Network Operator.

This Local Energy Market trial published a dataset [10] about the sites, including energy and Battery Energy Storage System State of Charge measurements from the equipment at a minute granularity level. All this information is standard in all LEM case studies analyzed, but the site metadata is very detailed in this case. The dataset also includes consumption and production forecasts, weather forecast measurements, the BESS specifications, and partially filled metadata, which included information about the site comprising household information, Energy Performance Certificate (EPC), appliances, DER, electric vehicle, and electricity bills. All these attributes will make up the analytical dimensions of the final crossed dataset.

The Cornwall LEM dataset has an adequately documented dictionary [11]. Figure 1 depicts all energy flows. Trilemma Consulting produced several reports analyzing the Sites Metadata [12], the Fleet Self-Consumption [13], and the BESS Utilization [8]. All that information offers a clear insight into the Energy Community composition and how it consumes, produces, and stores energy.

Time Series Forecasting for Renewable Energy in the solar, wind, hydropower, geothermal, and biomass domains applies multiple regression techniques. The aim is to build sequence models that help manage the unit performance optimally. Algorithm categories include advanced statistical methods, machine learning techniques, deep machine learning techniques, and new hybrid models [14,15,16,17,18]. In this work, the consumption, production, and headroom% forecasting models were developed with advanced statistical methods. The first hypothesis is that forecasting models of consumption, production, and storage headroom% built with these methods can perform well without high computation needs (such as those of more advanced methods like deep learning techniques). The inference model for consumption, production, and storage headroom% can be integrated into low-cost Internet of Things (IoT) devices to support more optimal and automatic energy management [19]. The second hypothesis is that these models improve the prediction of consumption, production, and headroom%, enabling more reliable decisions. To test it, the models developed are compared with those currently used in the LEM.

A plethora of advanced statistical methods is available: Multivariate Regression, Multiple linear regression (MLR), Forward Regression, Quantile regression, Exponential Smoothing Average, Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), Seasonal Autoregressive Integrated Moving Average with eXogenous factors (SARIMAX), Non-linear Autoregressive eXogenous Model (NARX), Autoregressive Fractionally Integrated Moving Average (ARFIMA), Generalized Autoregressive Conditional Heteroskedasticity (GARCH), Maximum likelihood, a Bayesian Approach, and Kernel Density Estimation (KDE), among others.

The methods chosen for this research are SARIMAX, Exponential Smoothing, and Granger Causality–Temporal Causal methods. SARIMAX and Exponential Smoothing methods are based on correlation. They were selected because they can decompose the consumption, production, and storage headroom% time series into their dynamic components (trend, seasonality, and stochastic). These methods have a descriptive ability, unlike more complex algorithms such as Recurrent Neural Networks, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU), and they also present good accuracy and stability. The Granger Causality–Temporal Causal method was selected to verify the cause–effect relationship between the time series to be predicted and its predictor variables.

2.1. Time Series Forecasting

Time Series forecasting assumes that history repeats itself, and that studying the past allows us to make better decisions in the future. The microgrid consumption kWh, production kWh, or BESS Headroom% are time series or ordered collections of measurements taken at regular time intervals (day, hour, half-hour, minute, or second). Voyant et al. [18] present how different granularity levels of these series may be suitable for different purposes: yearly for global management, e.g., capacity/network; monthly/weekly/daily/hourly for energy management, e.g., consumption kWh/production kWh/BESS headroom% optimization, or every half-hour/minute/second for power management, e.g., system safety. Time series measures include dependent and exogenous variables (series that help to explain the series to be forecasted), events (series of recurring incidents), or interventions (series of non-recurring incidents).

Time Series may be decomposed into the following dynamic components: trend, e.g., ascending/descending, local/global, linear/non-linear, multiplicative, etc.; seasonal cycles, e.g., repetitive and predictable series pattern (seasonality); non-seasonal cycles, e.g., repetitive and possibly unpredictable series pattern (periodicity of the cycle varies over time); pulses and steps, e.g., abrupt and sudden level changes (pulse when it is temporary, step if it becomes permanent). Figure 2 shows the Hourly Production kWh for Cluster 1 Decomposition. Frequency graphs (left bi-dimensional and right radial) indicate the number of cases at different levels of Production kWh. The analysis includes the outliers. The upper image shows full Production kWh. The lower image shows the trend, seasonal, and stochastic/irregular components decomposition of Hourly Production kWh.
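The decomposition described above can be illustrated with a minimal sketch of a classical additive decomposition (centered moving-average trend plus per-position seasonal means). This is not the procedure used by the authors, whose software is not stated here; the function names `moving_average_trend` and `decompose_additive`, the synthetic series, and the period of 24 observations are all assumptions made for illustration.

```python
import math


def moving_average_trend(series, period):
    """Centered moving average over one full cycle (None where undefined)."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        if period % 2 == 0:
            # Even period: use period + 1 points with half weight on both ends.
            window = series[t - half:t + half + 1]
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(series[t - half:t + half + 1])
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Split a series into trend + seasonal + residual (additive model)."""
    trend = moving_average_trend(series, period)
    # Collect detrended values for each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    cycle = [s - mean_raw for s in raw]  # center the cycle so it sums to zero
    seasonal = [cycle[t % period] for t in range(len(series))]
    residual = [None if tr is None else y - tr - s
                for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic hourly "production" (NOT Cornwall data): slow drift plus a daily bump.
hours = range(24 * 7)
demo = [0.01 * t + max(0.0, math.sin(2 * math.pi * ((t % 24) - 6) / 24))
        for t in hours]
trend, seasonal, residual = decompose_additive(demo, period=24)
```

Because the synthetic series is an exact linear drift plus an exactly repeating daily pattern, the residual component is numerically zero wherever the trend is defined; real measurements would leave a stochastic residual.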

2.2. ARIMA, SARIMA, and SARIMAX Methods

George E.P. Box and Gwilym Jenkins introduced the Autoregressive Integrated Moving-Average (ARIMA) method family in 1970 [20]. This method is based on correlation, and its autoregressive and moving-average parts are parameterized with two integer values: p and q. Each of these parameters characterizes one kind of process. The parameter p identifies the p-order autoregressive processes, denoted AR(p), as linear combinations of p past observations. The parameter q identifies the q-order moving average processes, denoted MA(q), as a series of q past innovations. Matt Sosna summarizes this parameter as “The second major component of an ARIMA model is not a rolling average, but rather the lags in the white noise” (https://mattsosna.com/ARIMA-deep-dive/#:~:text=the%20moving%20average.-,MA%3A%20Moving%20average,lags%20in%20the%20white%20noise, accessed on 3 October 2023). Both autoregressive and moving average processes are represented by the following equations:

AR(p):

Y_{t} = ϕ_{1} Y_{t - 1} + ϕ_{2} Y_{t - 2} + \dots + ϕ_{p} Y_{t - p} + ϵ_{t}

MA(q):

Y_{t} = ϵ_{t} + θ_{1} ϵ_{t - 1} + θ_{2} ϵ_{t - 2} + \dots + θ_{q} ϵ_{t - q}

In both equations, ϵ_{t} is white noise.
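To make the two definitions concrete, the following minimal sketch simulates both kinds of process from Gaussian white noise. The function names, the unit noise variance, the zero start-up values, and the example coefficients (0.8 and 0.5) are illustrative assumptions, not values taken from the paper.

```python
import random


def simulate_ar(phi, n, seed=0):
    """Simulate AR(p): Y_t = phi_1*Y_{t-1} + ... + phi_p*Y_{t-p} + eps_t."""
    rng = random.Random(seed)
    p = len(phi)
    y = [0.0] * p  # zero start-up values (assumption)
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)  # white-noise innovation
        # y[-1] is Y_{t-1}, y[-2] is Y_{t-2}, and so on.
        y.append(sum(phi[i] * y[-1 - i] for i in range(p)) + eps)
    return y[p:]


def simulate_ma(theta, n, seed=0):
    """Simulate MA(q): Y_t = eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}."""
    rng = random.Random(seed)
    q = len(theta)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + q)]
    return [eps[t] + sum(theta[j] * eps[t - 1 - j] for j in range(q))
            for t in range(q, n + q)]


ar_series = simulate_ar([0.8], n=1000)   # AR(1) with phi_1 = 0.8
ma_series = simulate_ma([0.5], n=1000)   # MA(1) with theta_1 = 0.5
```

An AR process feeds its own past values back into the present, so a shock echoes through many later observations, whereas an MA(q) process only remembers the last q innovations.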

This way, using the following three components, ARIMA models forecast the future based on the past.

1.: AR(p) Autoregression (AR): a regression model that uses past values as input to predict future values.
2.: I(d) Integration: the process of differencing the time series to make it stationary, a necessary condition for using ARIMA models, which assume that the error is white noise. Differencing involves subtracting the previous value of the series from its current value, repeated d times. A Dickey–Fuller test may be run to check the null hypothesis that the series has a unit root (i.e., that it is non-stationary).
3.: MA(q) Moving Average: the moving average component depicts the model’s error as a combination of previous error terms. The order q represents the number of terms included in the model.
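The integration step in the list above can be sketched in a few lines. The helper name `difference` and its `lag` argument (used here to show seasonal differencing as well) are illustrative assumptions.

```python
def difference(series, d=1, lag=1):
    """Apply differencing d times: each pass replaces Y_t with Y_t - Y_{t-lag}."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


quadratic = [1, 4, 9, 16, 25]
print(difference(quadratic, d=1))          # [3, 5, 7, 9]  (still trending)
print(difference(quadratic, d=2))          # [2, 2, 2]     (constant level)
print(difference([1, 2, 3, 4, 5, 6], lag=3))  # [3, 3, 3]  (seasonal difference)
```

Each pass shortens the series by `lag` observations, which is one reason to use the smallest d that yields a stationary series.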

A necessary condition for algorithm training is the process’s invertibility, which is the condition that allows an AR(p) process to be converted into an MA(∞) process or an MA(q) process into an AR(∞) process. This characteristic will be helpful during the identification process. The invertibility condition is met if the modulus of the parameter is less than one ($| θ | < 1$). That means the current value of $Y_{t}$ can be developed as a linear combination of past observations. Autoregressive autocorrelations tend exponentially to zero, meaning that observations from the distant past have less influence than the most recent observations. The bigger $| θ |$, the slower the correlations will decay. This parameter is related to the process memory: the closer it is to zero, the shorter the memory is and the less the process depends on the past. The innovation effect is transitory in a stationary process, but in a non-stationary process, the consequence is permanent (the series does not return to a constant average). If $| θ | = 1$, the effects are endless, and the process becomes a random walk. If $| θ | > 1$, the behavior becomes explosive, and the farthest observations are more important than the closest. This behavior is not typically seen in reality.
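As a worked illustration of the three regimes, consider the first-order case $Y_{t} = θ Y_{t-1} + ϵ_{t}$, written with the same symbol θ as the paragraph above. The derivation below is a standard textbook result added for clarity and is not taken from the paper; k denotes an arbitrary number of back-substitutions.

```latex
\begin{aligned}
Y_t &= \theta Y_{t-1} + \epsilon_t
     = \theta^{2} Y_{t-2} + \theta\,\epsilon_{t-1} + \epsilon_t
     = \dots
     = \theta^{k} Y_{t-k} + \sum_{j=0}^{k-1} \theta^{j}\,\epsilon_{t-j} \\[4pt]
|\theta| < 1 &: \quad \theta^{k} \to 0
   \;\Rightarrow\; Y_t = \sum_{j=0}^{\infty} \theta^{j}\,\epsilon_{t-j}
   \quad \text{(an MA($\infty$) form; a shock decays as } \theta^{j}\text{)} \\[4pt]
|\theta| = 1 &: \quad Y_t = Y_{t-k} + \sum_{j=0}^{k-1} \epsilon_{t-j}
   \quad \text{(random walk; every shock is permanent)} \\[4pt]
|\theta| > 1 &: \quad |\theta^{j}| \to \infty
   \quad \text{(explosive; older shocks weigh more than recent ones)}
\end{aligned}
```

In the first regime the lag-k autocorrelation equals $θ^{k}$, which is why a larger $| θ |$ means slower decay and longer memory.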

Identification of an ARIMA process involves using the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF). The ACF measures the correlation between current and past values, while the PACF measures the correlation at a given lag after removing the effect of the intermediate observations. The AR order (p) is identified using the PACF, and the MA order (q) is identified using the ACF. Correlograms are used to identify meaningful lags where autocorrelation is not random. The Integration (d) parameter represents the number of differences needed to make the series stationary. To test the ARIMA model, the Ljung–Box Q-test is used to check whether the residual autocorrelations are random. If the p-values are bigger than 5%, we cannot reject the null hypothesis of random residuals, which supports the view that the model captures the data process.
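The correlogram reading and the residual check described above can be sketched as follows. The function names are illustrative assumptions; the 1.96/√n band is the usual large-sample 95% approximation, and the statistic is the standard Ljung–Box form Q = n(n + 2) Σ ρ̂_k² / (n − k) summed over lags k = 1..h. Computing a p-value additionally requires the chi-square distribution, which this sketch omits.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic over the first h residual autocorrelations."""
    n = len(residuals)
    r = acf(residuals, h)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, h + 1))


print(acf([1, 2, 3, 4, 5], 1))  # [1.0, 0.4]
```

Under the null hypothesis of random residuals, Q is compared with a chi-square critical value; for example, the 95% critical value with 10 degrees of freedom is about 18.31, and the degrees of freedom are usually reduced by the number of fitted ARMA parameters.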

ARIMA: Y_{t} = c + \sum_{i = 1}^{p} ϕ_{i} Y_{t - i} + \sum_{j = 1}^{q} θ_{j} ϵ_{t - j} + ϵ_{t}

ARIMA is easy to interpret, unlike Deep Learning techniques such as Recurrent Neural Networks. It also has few parameters, which makes it easy to maintain. Conversely, with complex data, reaching the optimal solution for the p and q parameters might be challenging.

A more powerful type of ARIMA model that tries to capture the seasonality is the SARIMA, which stands for the Seasonal Autoregressive Integrated Moving Average. These models explain a given time series based on its past values, whether it is stationary or non-stationary. SARIMA models are denoted as follows:

\underbrace{(p, d, q)}_{\text{non-seasonal}} \times \underbrace{(P, D, Q)_{s}}_{\text{seasonal}}

where s is the number of observations per year.

SARIMA models suit stationary time series like electricity load, renewable generation, or storage headroom%. SARIMA models are specified by six order parameters: p, d, q, P, D, and Q. The first three parameters represent the non-seasonal part of the model, and the last three the seasonal part. p represents the order of the AR term, q the order of the MA term, and d the number of differences required to make the time series stationary; all of them refer to the non-seasonal part of the model. P, D, and Q represent the same in the seasonal part of the model. s represents the number of observations per year. Hence, SARIMA models allow for both seasonal and non-seasonal differencing of the data. SARIMA can be represented as the following equation:

Y_{t} = c + \sum_{i = 1}^{p} ϕ_{i} Y_{t - i} + \sum_{j = 1}^{q} θ_{j} ϵ_{t - j} + \sum_{k = 1}^{P} α_{k} Y_{t - s k} + \sum_{l = 1}^{Q} β_{l} ϵ_{t - s l} + ϵ_{t}

Finally, the most comprehensive form is SARIMAX, which stands for SARIMA with eXogenous variables. These added variables might enhance the model’s fitness. SARIMAX can be represented as the following equation, where M is the number of exogenous variables and R is the number of lags included for each of them:

Y_{t} = c + \sum_{i = 1}^{p} ϕ_{i} Y_{t - i} + \sum_{j = 1}^{q} θ_{j} ϵ_{t - j} + \sum_{k = 1}^{P} α_{k} Y_{t - s k} + \sum_{l = 1}^{Q} β_{l} ϵ_{t - s l} + \sum_{m = 1}^{M} \sum_{r = 1}^{R} γ_{r}^{m} X_{t - r}^{m} + ϵ_{t}
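To show how the terms of the equation above combine, the sketch below evaluates a one-step-ahead prediction for given coefficients, dropping the unknown future innovation (whose expected value is zero). It follows the additive form written in this section; many software implementations use a multiplicative seasonal form instead, so fitted coefficients from such tools would not map onto it one to one. The function name, argument layout, and all numbers in the example are hypothetical.

```python
def sarimax_one_step(y, eps, x, c, phi, theta, alpha, beta, gamma, s):
    """One-step prediction of Y_t from the additive SARIMAX equation.

    y     : past observations, most recent last (y[-1] is Y_{t-1})
    eps   : past innovations aligned with y
    x     : dict name -> past values of an exogenous series aligned with y
    phi   : non-seasonal AR coefficients   (phi[0] multiplies Y_{t-1})
    theta : non-seasonal MA coefficients   (theta[0] multiplies eps_{t-1})
    alpha : seasonal AR coefficients       (alpha[0] multiplies Y_{t-s})
    beta  : seasonal MA coefficients       (beta[0] multiplies eps_{t-s})
    gamma : dict name -> lag coefficients  (gamma[name][0] multiplies X_{t-1})
    s     : seasonal period in observations
    """
    pred = c
    pred += sum(phi[i] * y[-1 - i] for i in range(len(phi)))
    pred += sum(theta[j] * eps[-1 - j] for j in range(len(theta)))
    pred += sum(alpha[k] * y[-s * (k + 1)] for k in range(len(alpha)))
    pred += sum(beta[l] * eps[-s * (l + 1)] for l in range(len(beta)))
    for name, coefs in gamma.items():
        pred += sum(coefs[r] * x[name][-1 - r] for r in range(len(coefs)))
    return pred


# Hypothetical hourly example with a daily period (s = 24).
history = [float(v) for v in range(1, 49)]   # Y_{t-48} .. Y_{t-1}
innovations = [0.0] * 48
exog = {"irradiation": [0.0] * 47 + [3.0]}   # X_{t-1} = 3.0
forecast = sarimax_one_step(history, innovations, exog, c=1.0,
                            phi=[0.5], theta=[], alpha=[0.25], beta=[],
                            gamma={"irradiation": [2.0]}, s=24)
print(forecast)  # 1 + 0.5*48 + 0.25*25 + 2.0*3.0 = 37.25
```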

In this research, we have applied the following methodology to build and assess SARIMAX models:

1.: Select the dependent variable: Consumption kWh, Production kWh, or BESS Headroom %. The Consumption kWh measures the energy consumption required to run appliances and heat a particular site at a given time. Production kWh measures the PV generation. Headroom% measures the storage available as a market opportunity.
2.: Prepare the dependent variable (data cleansing/pre-processing activities): handle null values with linear interpolation, detect and replace all kinds of outliers with cut-off values, and reject variables with a quality score less than 25%.
3.: Select exogenous variables and prepare for modeling (run all data cleansing/pre-processing activities). The power flows, storage headroom%, and meteorological measures have been selected as exogenous variables for all models. The list includes the following: Discharge kWh, Charge kWh, Grid Export kWh, PV Charge kWh, PV Consumption kWh, PV Export kWh, Grid Discharge kWh, Grid Charge kWh, Grid Consumption kWh, Consumption Discharge kWh, precipitation, precipitation probability, sunshine hours, solar irradiation, wind speed, and wind direction. These variables are described in detail in the Experimentation and Results Section 3.
4.: Check dependent variable stationarity using residual analysis and the Augmented Dickey–Fuller test. The Augmented Dickey–Fuller test (ADF Test) checks the null hypothesis that the series has a unit root, i.e., that it is non-stationary:

$Δ Y_{t} = Y_{t} - Y_{t - 1} = α + β t + γ Y_{t - 1} + ϵ_{t}$

ADF Test checks if $γ = 0$ . If so, then $Y_{t}$ is a random walk. It is a stationary process if −1 < 1 + $γ$ < 1. This test helps us to determine integration (d and D) parameters. Figure 3 shows the Augmented Dickey–Fuller Test applied to the Hourly Production kWh. In this case, the natural series is stationary. The same experiment applied to Daily Production kWh data showed non-stationarity and the need to integrate with one difference to make it stationary.
5.: Run SARIMAX models.
6.: Plot ACF and PACF correlograms for SARIMAX models to determine optimal seasonal and non-seasonal autoregressive and moving average (p, q, P, and Q) parameters. Analyze the overall fitness, cause and effects, and impact diagram in TCM models.
7.: Diagnose and assess the randomness of the model residuals using the Ljung–Box Q-test. The Ljung–Box Q-test checks the null hypothesis that the residual autocorrelations are jointly zero, i.e., that the residual errors are random. Rejecting the null hypothesis implies that there is a structure in the observed series that the model does not explain. The more random the errors, the more likely it is to be a good model. Instead of testing for randomness on each lag, the total randomness is tested on all study lags. Not passing this test implies that the time series has not been properly decomposed into its trend and seasonality components and its stochastic component.

$Q = n (n + 2) \sum_{k = 1}^{h} \frac{\hat{ρ}_{k}^{2}}{n - k}$

where n is the number of observations, h is the number of lags tested, and $\hat{ρ}_{k}$ is the residual autocorrelation at lag k.
8.: Analyze all correlogram lags inside the 95% confidence interval to determine more optimal autoregressive and moving average parameters order.
9.: Compute validation metrics to determine model fitness. The validation metrics are RMSE, MSE, MAE, and R².
10.: Graph residuals and residuals Q-Q Plot to verify residuals’ stationarity and normality.
11.: Compute incumbent forecasting model validation metrics (RMSE, MSE, MAE, and R²) to determine model fitness.
12.: Benchmark SARIMAX performance model improvement with incumbent forecasting validation metrics and plot measured dependent variable vs. incumbent forecasting model estimates vs. SARIMAX model estimates.
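The validation and benchmarking steps at the end of the list (steps 9, 11, and 12) can be sketched as follows. The metric definitions are the standard ones; the relative error reduction formula is an assumption about how the reported percentage reductions are computed, and the sample numbers are invented for illustration only.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


def error_reduction_pct(incumbent_error, new_error):
    """Relative improvement over the incumbent model (assumed definition)."""
    return 100.0 * (incumbent_error - new_error) / incumbent_error


# Invented numbers, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
incumbent = [3.0, 4.0, 5.0, 6.0]
candidate = [1.0, 2.0, 3.0, 6.0]
print(error_reduction_pct(mse(measured, incumbent), mse(measured, candidate)))  # 75.0
print(error_reduction_pct(mae(measured, incumbent), mae(measured, candidate)))  # 75.0
```

Reporting both MSE and MAE reductions is useful because MSE penalizes large individual errors more heavily, so the two percentages can differ substantially on real data.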

Exponential Smoothing Average Method (ESA)

This method is understood as a particular case of the ARIMA model, the ARIMA(0,1,1), and was presented by Holt [21] in 1954 to make predictions in data with a trend. In 1960, the Ph.D. student Winters [22] added the seasonality component. This method is suitable for short-term forecasting since it assigns exponentially decreasing weights as the observation ages. In other words, recent observations are given more weight in forecasting than older observations. Good results can be obtained when time series parameters evolve slowly. The Exponential Smoothing Average can be represented as the following equation:

{\hat{Y}}_{t + 1} = α Y_{t + 1} + (1 - α) {\hat{Y}}_{t}

where α is the smoothing weight (0 < α < 1). The closer α is to 0, the less influence each observation has on the smoothed time series, and little weight is given to the most recent observations when forecasting.

The Exponential Smoothing Average involves using past observations in a series and applying weighted values to make projections. Unlike other methods, it does not rely on a theoretical understanding of the data. Instead, it forecasts one point at a time and adjusts predictions as new data become available. This technique is especially helpful for forecasting a series that displays a trend or seasonality.

There are several types of Exponential Smoothing Average techniques, but two usually provide the most accurate results: Brown’s linear trend, a particular case of the Holt linear trend (for series with a linear trend and without seasonality) in which the level and trend smoothing parameters are assumed to be equal, and Winters’ multiplicative, in which the seasonal effect changes with the magnitude of the series and the decomposition is made by level, trend, and seasonality.

As Exponential Smoothing Average methods are a particular case of ARIMA, we applied the same methodology to build and assess the models.
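
To make the smoothing recursion concrete, the following minimal sketch applies simple exponential smoothing, assuming the standard form in which each new forecast blends the latest observation with the previous forecast. The function name, the choice of the first observation as the initial forecast, and the sample values are illustrative assumptions.

```python
def exponential_smoothing_forecasts(observations, alpha, initial_forecast=None):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the forecast of observations[t]; the last element is
    the forecast for the next, still unseen, period.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Assumption: start from the first observation when no forecast is given.
    forecast = observations[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for y in observations:
        forecast = alpha * y + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


# Hypothetical daily consumption values in kWh.
daily_kwh = [10.0, 12.0, 11.0, 13.0]
forecasts = exponential_smoothing_forecasts(daily_kwh, alpha=0.5)
print(forecasts)  # [10.0, 10.0, 11.0, 11.0, 12.0]
```

With α = 0.5 each forecast sits halfway between the previous forecast and the newest observation; a smaller α would react more slowly to the jump from 10 to 12 kWh.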

2.3. Temporal–Causal Models

Temporal-causal modeling (TCM) is a method used to discover critical temporal relationships in time series data. It uses a combination of Granger Causality [23] and regression algorithms for variable selection. Introduced by Sir Clive Granger [24], this method is based on the idea that a cause should precede its effect and that past values can help predict future values. A time series X is said to “Granger-cause” a series Y if regressing Y on past values of both X and Y is more accurate than regressing it on past values of Y alone.

An essential feature of a Temporal Causal Model algorithm is that it measures the influence of independent variables on the dependent variable and the influence of the dependent variable on the independent variables. The model tests the significance of this cause–effect relationship between the production, consumption load profile, headroom% and their predictor variables. The correlation between two variables does not imply causation. Correlation means that two variables are related, but it does not necessarily mean that one causes the other.

Granger was the first to propose a causality test, based on the premise that the future cannot affect the past, but the opposite might be true. The null hypothesis states that the past values of the candidate cause do not help to predict the effect.

X_{t} = α + \sum_{i = 1}^{m} β_{i} X_{t - i} + \sum_{j = 1}^{n} γ_{j} Y_{t - j} + u_{t}

Y_{t} = a + \sum_{i = 1}^{q} b_{i} Y_{t - i} + \sum_{j = 1}^{r} c_{j} X_{t - j} + v_{t}

Two contrasts can be made. The easiest is to test, using the Wald test, whether all the γ_{j} are jointly equal to 0 and whether all the c_{j} are jointly equal to 0. The second is to compare the restricted regressions and calculate whether the statistics comparing the difference between the whole and restricted regressions are significant. This can be tested with the Ljung–Box Q-Test.
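
The comparison between the whole and the restricted regression can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses the classical F statistic on the residual sums of squares (a swapped-in, commonly used statistic for this comparison), ordinary least squares via the normal equations, and synthetic data in which x drives y with a one-step delay. All function names are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_sum_of_squares(rows, targets):
    """Fit ordinary least squares and return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f_statistic(cause, effect, lags=1):
    """F statistic for H0: lagged `cause` adds nothing to predicting `effect`."""
    n = len(effect)
    targets = effect[lags:]
    # Restricted model: intercept plus the effect's own lags.
    restricted = [[1.0] + [effect[t - i] for i in range(1, lags + 1)]
                  for t in range(lags, n)]
    # Whole model: the same regressors plus the lags of the candidate cause.
    whole = [row + [cause[t - j] for j in range(1, lags + 1)]
             for row, t in zip(restricted, range(lags, n))]
    rss_restricted = residual_sum_of_squares(restricted, targets)
    rss_whole = residual_sum_of_squares(whole, targets)
    dof = len(targets) - len(whole[0])
    return ((rss_restricted - rss_whole) / lags) / (rss_whole / dof)


# Synthetic example (assumption): y follows x with a one-step delay.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.1) for t in range(1, 200)]
f_x_causes_y = granger_f_statistic(x, y)
f_y_causes_x = granger_f_statistic(y, x)
```

A large F statistic in one direction and a small one in the other is the pattern expected when the dependence runs only from x to y.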

The methodology used to build and assess the temporal causal model is similar to SARIMAX’s:

1.: Select and prepare the dependent variable. The dependent variables are the Local Energy Market Consumption kWh, Production kWh, and BESS Headroom %.
2.: Select exogenous variables and prepare for modeling. The variables selected are: Discharge kWh, Charge kWh, Grid Export kWh, PV Charge kWh, PV Consumption kWh, PV Export kWh, Grid Discharge kWh, Grid Charge kWh, Grid Consumption kWh, Consumption Discharge kWh, precipitation, precipitation probability, sunshine hours, solar irradiation, wind speed, and wind direction.
3.: Check dependent variable stationarity using residual analysis and Dickey–Fuller test and determine d and D parameters.
4.: Analyze the overall quality model by computing validation metrics (RMSE, MSE, MAE, and R²). Analyze the test of model effects (how exogenous variables affect the dependent variable and vice versa). Analyze inputs’ impact on the dependent variable. Analyze input variables lags’ significance.
5.: Graph residuals and residuals Q-Q Plot to verify residuals normality.
6.: Compute incumbent forecasting model validation metrics to determine model fitness. The validation metrics are RMSE, MSE, MAE, and R².
7.: Compare TCM performance model improvement against incumbent forecasting validation metrics and plot measured dependent variable vs. incumbent forecasting model estimates vs. TCM model estimates.

The full model development for all experiments was guided by the following methodology, using the three advanced statistical methods described above.

1. Data preparation and selection

The Cornwall LEM source dataset joins the Power Flows and Storage State of Charge of each site, measured at minute granularity level, with the site metadata information, weather forecast data, and the incumbent forecasting model’s production and consumption estimates. All analyses were performed for the annual period from 1 April 2019 to 31 March 2020. Information was then aggregated daily and hourly.

2. Stationarity Analysis of Dependent Variable

The Augmented Dickey–Fuller Test has been applied to check stationarity and identify the level of integration needed. Figure 3 shows the Augmented Dickey–Fuller Stationarity Test for hourly Production kWh.

3. Modeling and Assessment

(a): Model Identification
Produce ESA-SARIMAX correlograms. Figure 4 shows a time series correlogram with 24 lags for an hourly time series. The correlograms present the autocorrelation and partial autocorrelation functions with a 95% confidence interval.
(b): ESA-SARIMAX random residual diagnosis with Ljung–Box Q-Test. Check the absence of serial autocorrelation.
(c): Model Assessment
Validation metrics include Root Mean Square Error and Mean Square Error (which show how close the forecasts are to the actual values), Mean Absolute Error or an average of the absolute values of the errors across all records (it indicates the average magnitude of error, independent of the direction), and the coefficient of determination R², which determines the proportion of variance in the dependent variable that the exogenous variables can explain.

$MSE = \frac{1}{n} \sum_{i = 1}^{n} {(Y_{i} - \hat{Y}_{i})}^{2}$

$MAE = \frac{1}{n} \sum_{j = 1}^{n} | Y_{j} - \hat{Y}_{j} |$

$R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(Y_{i} - \hat{Y}_{i})}^{2}}{\sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}}$

4. Residual Normality Visualization

The Q-Q Plot is used to test the normality of the residuals of the selected forecasting models. The Q-Q plot for the Consumption kWh residuals is presented in Figure 5. The Quantile–Quantile Plot helps us to compare two distributions. In the figure, the points represent the residual quantile distribution, while the straight 45° reference line represents the standard normal distribution. If the quantile distribution points follow the straight line, the series is normally distributed. In our case, that means the model residuals are normally distributed.

5. Performance Benchmark

Comparison Table with validation metrics from ESA-SARIMAX-TCM models and incumbent forecasting model for Consumption kWh and Production kWh. Data come from the incumbent forecasting models at postcode and quarter-hourly granularity levels.

6. Error Reduction Report

Comparison Table of MSE and MAE Reduction from selected ESA-SARIMAX-TCM forecasting models and the incumbent forecasting models.
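
The validation metrics of the Model Assessment step and the error reduction of the final report can be sketched in a few lines. The metric definitions follow the formulas given above; the error reduction is assumed here to be the relative decrease of the model error with respect to the incumbent error, expressed as a percentage, and all numbers are made-up illustrative values rather than results from the Cornwall LEM dataset.

```python
import math


def validation_metrics(actual, predicted):
    """Return RMSE, MSE, MAE and R2 for a forecast against measured values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    squared_error_sum = sum(e * e for e in errors)
    mse = squared_error_sum / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    total_sum_of_squares = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - squared_error_sum / total_sum_of_squares
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "R2": r2}


def error_reduction_pct(incumbent_error, model_error):
    """Assumed definition: relative error decrease versus the incumbent, in %."""
    return 100 * (incumbent_error - model_error) / incumbent_error


# Hypothetical measured values and two competing forecasts (kWh).
measured = [10.0, 12.0, 14.0, 16.0]
new_model = [11.0, 12.0, 13.0, 16.0]
incumbent = [14.0, 10.0, 18.0, 12.0]

model_metrics = validation_metrics(measured, new_model)
incumbent_metrics = validation_metrics(measured, incumbent)
mse_reduction = error_reduction_pct(incumbent_metrics["MSE"], model_metrics["MSE"])
mae_reduction = error_reduction_pct(incumbent_metrics["MAE"], model_metrics["MAE"])
print(round(mse_reduction, 2), round(mae_reduction, 2))
```

Because MSE squares the errors, its reduction is typically larger than the MAE reduction when the incumbent forecast makes a few large errors.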

3. Experimentation and Results

The experimentation aims to improve Local Energy Market performance, which is the main novelty in this research field. The analysis develops several SARIMAX, ESA, and temporal–causal forecasting models that forecast the Consumption kWh, Production kWh, and Headroom% of a LEM, applying the methodology described above to the datasets at daily and hourly granularity levels. The models estimate the behavior of the average site and also the behavior of selected site clusters identified with a prior clustering process that used site metadata information (especially the Electricity Performance Certificate, EPC, information). The average site represents the mean behavior of the 100 dwellings that make up the Cornwall LEM. The selected site clusters share significant common characteristics such as site unit, DERs, BESS, appliances, EV, or Electricity Bill.

The exogenous variables list incorporates power flows (energies measured in kWh), storage headroom %, and meteorological data. Table 1 describes all exogenous variables specified in the predictive models.

The methodology states the following steps are required to tune and optimize the hyperparameter selection:

1.: Verify the stationarity of all time series with the Augmented Dickey-Fuller Test (it checks if the dependent variable series is stationary).
2.: Check if the residuals are random; that is, if the model has captured the components of the data dynamic process, e.g., trend, seasonality, and stochastic/irregular, using the Ljung–Box Q-Test, and analyze the residual normality with graphical analysis and residual Q-Q Plots.
3.: In SARIMAX models, the ACF and PACF functions of the model that passes the residual autocorrelation randomness test determine the optimal p and q parameters. Significance tests of these parameters measure their suitability. In the Temporal Causal Model, the Cause and Effects test measures the parameters’ aptitude using significance tests.
4.: Validate the best-selected model with RMSE, MSE, MAE, and R² metrics.
5.: Benchmark with incumbent validation metrics (using RMSE, MSE, MAE, and R² metrics) and present the error reduction in terms of MSE and MAE.
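
The correlogram check used when choosing the autoregressive and moving average orders can be approximated as below. The sketch computes the sample autocorrelation function and flags the lags that fall outside the usual approximate 95% band of ±1.96/√n, which holds under a white-noise assumption. The function names and the toy series with a period of four steps are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def lags_outside_band(series, max_lag, z=1.96):
    """Lags whose autocorrelation exceeds the approximate 95% band."""
    bound = z / math.sqrt(len(series))
    return [k for k, rho in enumerate(acf(series, max_lag), start=1)
            if abs(rho) > bound]


# Toy series repeating every four steps (assumption, for illustration only).
periodic = [0.0, 1.0, 0.0, -1.0] * 12
significant_lags = lags_outside_band(periodic, max_lag=4)
print(significant_lags)  # [2, 4]
```

Lags that remain outside the band in the residual correlogram suggest structure that the chosen orders have not yet captured.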

3.1. Models for the Cornwall LEM Average Site

The analysis starts with the model building for the average Cornwall LEM site. The experimentation table structure presents the model (the dependent variable being Consumption kWh, Production kWh, or Headroom %), the method (ESA, SARIMAX, or TCM), and all validation metrics (RMSE, MSE, MAE, and R²) for the experimented methods, marked with (1), and the incumbent forecasting methods, marked with (2).

a.: Average site daily granularity models

Daily granularity models are trained with a dataset with 31,955 observations produced by the 100 sites during one year (or 365 days). All time series are averaged by date to produce the Consumption, Production, and Headroom% for the average site.
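
The averaging step that produces the average-site series might look like the sketch below. The record layout and the field names (`site`, `date`, `consumption_kwh`) are hypothetical and are not taken from the Cornwall LEM dataset schema.

```python
from collections import defaultdict


def average_site_series(records, value_key):
    """Average `value_key` over all sites for each date."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record["date"]] += record[value_key]
        counts[record["date"]] += 1
    return {date: totals[date] / counts[date] for date in sorted(totals)}


# Hypothetical site-level daily records.
records = [
    {"site": "A", "date": "2019-04-01", "consumption_kwh": 10.0},
    {"site": "B", "date": "2019-04-01", "consumption_kwh": 14.0},
    {"site": "A", "date": "2019-04-02", "consumption_kwh": 8.0},
]
average_consumption = average_site_series(records, "consumption_kwh")
print(average_consumption)  # {'2019-04-01': 12.0, '2019-04-02': 8.0}
```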

The Daily Consumption kWh series is not stationary, as the ADF test obtains a 0.441 p-value. Conversely, the ADF Test of the first difference of this series has a 0.01 p-value, so the series is stationary at this integration level. All ESA-SARIMAX-TCM models perform correctly with good validation metrics compared to the incumbent forecasting model. Still, only SARIMAX(1,1,1) × (1,0,1) passes the Ljung–Box Q-Test, indicating that its residuals show no significant autocorrelation.

Similarly, the Daily Production kWh series is not stationary, as the ADF test returns a 0.336 p-value. The first difference is stationary, as the ADF Test of this series has a 0.01 p-value. All SARIMAX-TCM models perform correctly with good validation metrics compared to the incumbent forecasting model. On this occasion, the ESA Winters Multiplicative model presents a slightly worse MAE than the incumbent forecasting model. Only the SARIMAX(0,1,1) × (0,0,0) model passes the Ljung–Box Q-Test.

Also, the Daily Headroom % series is not stationary, as the ADF test returns a 0.261 p-value. The first difference is stationary, as the ADF Test of this series has a 0.01 p-value. SARIMAX(1,0,4) × (0,1,1) performs correctly with good validation metrics compared to the incumbent forecasting model and passes the Ljung–Box Q-Test.
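
The differencing behind the integration orders can be illustrated with a short sketch: taking the first difference once corresponds to d = 1, and differencing at a seasonal lag corresponds to D = 1. The function name, the sample series, and the seasonal period of 2 are illustrative assumptions, not values from the study.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# A series with a linear trend becomes constant after one first difference.
trending = [3.0, 5.0, 7.0, 9.0, 11.0]
first_difference = difference(trending)
print(first_difference)  # [2.0, 2.0, 2.0, 2.0]

# A series with a repeating two-step pattern plus drift: the seasonal
# difference (lag equal to the period) removes the repeating pattern.
seasonal = [1.0, 10.0, 2.0, 11.0, 3.0, 12.0]
seasonal_difference = difference(seasonal, lag=2)
print(seasonal_difference)  # [1.0, 1.0, 1.0, 1.0]
```

The differenced series, not the original one, is what the stationarity test is re-applied to at each integration level.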

Table 2 presents the Experimentation Table with the outcomes of different ESA, SARIMAX, and Temporal–Causal model experiments for the average site at the daily granularity level.

b.: Average site hourly granularity models

Hourly granularity models are trained with a dataset of 765,939 observations produced by the 100 sites during one year (or 8760 h). All time series are averaged by hour to produce the Consumption, Production, and Headroom% for the average site. All dependent variable time series are stationary, as their ADF tests return a 0.01 p-value.

Table 3 presents the Experimentation Table with the outcomes of different ESA, SARIMAX, and Temporal–Causal model experiments for the average site at hourly granularity level.

All ESA-SARIMAX-TCM models for the Consumption kWh series perform correctly with good validation metrics compared to the incumbent forecasting model. SARIMAX(0,0,8) × (1,1,1) passes the Ljung–Box Q-Test marginally but is overfitted. SARIMAX(9,0,9) × (2,1,2)

As for Production kWh, all ESA-SARIMAX-TCM models perform correctly with good validation metrics compared to the incumbent forecasting model. No model passes the Ljung–Box Q-Test, but the SARIMAX(24,0,24) × (1,0,0) model has a correlogram with almost all lags inside the confidence interval, meaning the residuals are close to random.

SARIMAX(0,1,6) × (2,0,1) performs the best for Headroom % with good validation metrics, but no model passes the Ljung–Box Q-Test. There is no incumbent forecasting model for this variable to compare the models to.

3.2. Models for Meaningful Cornwall LEM Site Clusters

The second level of ESA, SARIMAX, and Temporal Causal model experiments was built for selected meaningful clusters obtained with site metadata, power flows, and storage information. A couple of clustering models were built at daily and hourly granularity levels. All modeling settings share the same pre-selected input variables (main site features, power flows, and storage headroom% at different granularity levels).

The pre-selected input variables list includes the dwelling form (detached, semi-detached, mid-detached, others), dwelling type (bungalow, house, others), floor area, existent DERs (solar PV, solar thermal, biomass boiler, air source heat pump, ground source heat pump, diverter-heat recovery-immersion water heat pump), BESS type and capacity (6 combinations of AC/DC systems with 5, 7.5 or 10 kWh capacity), existent appliances (fridge, freezer, dishwasher, washing machine, oven, TV), and heating characteristics (boiler or heat pump type, fuel, and whether it is electric or not). The clustering methods were the standard k-Means, Kohonen Networks, and the Two-Stage.

a.: Selected cluster daily granularity models

The best target of sites for our estimation purposes of Daily Consumption kWh, Production kWh, and Headroom % has been cluster 4. The best cluster cohesion was obtained with k-Means and a 5-cluster composition. Cluster cohesion and separation are measured by calculating the average of (B − A)/max(A, B) over all records, where A is the record’s distance from its cluster center and B is its distance from the nearest cluster center it does not belong to. K-Means with 5 clusters got the highest cohesion and separation metric, 0.3, meaning a fair clustering according to Kaufman and Rousseeuw [25], as shown in Figure 6.
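
The cohesion and separation measure described above can be written out as a short function. The sketch follows the center-based definition given in the text (distance to the own cluster center versus the nearest other center); the function name and the two-cluster toy data are hypothetical.

```python
import math


def cohesion_separation(points, centers, labels):
    """Average of (B - A) / max(A, B) over all records.

    A is the distance from a record to its own cluster center and B is the
    distance to the nearest center of a cluster it does not belong to.
    Assumes at least two clusters and no record lying on both centers.
    """
    scores = []
    for point, label in zip(points, labels):
        a = math.dist(point, centers[label])
        b = min(math.dist(point, center)
                for index, center in enumerate(centers) if index != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two cluster centers and four records on a line.
centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (2.0, 0.0), (9.0, 0.0), (6.0, 0.0)]
labels = [0, 0, 1, 1]
score = cohesion_separation(points, centers, labels)
print(round(score, 3))  # 0.715
```

Records close to their own center score near 1, while a record such as (6, 0), which sits almost midway between the two centers, pulls the average down.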

Cluster 4 has a size of 16% (6659 records) of the whole dataset at daily granularity level and the following profile:

Most meaningful inputs: Heating Electric/Non-Electric, TV, Freezer, Fridge, Ground source heating pump, Air source heating pump, Solar PV, Floor Area, Dwelling Type, Consumption kWh
Dwelling Type: bungalow, house
Floor Area: mostly big (350–400 m²) with a small portion of small dwellings (0–100 m²)
DER: they have mostly PV panels, air source heat pump, and ground source heat pump
Appliances: fridge, freezer, TV
Heating: all of them have Electric Heating
Power Flows and BESS Headroom% (measured at site and date level):
– Consumption kWh: it ranges between 0 and 157 kWh. The highest frequency is at 110 kWh.
– Grid Consumption kWh: it ranges between 0 and 153 kWh. The highest frequency is at 90 kWh.
– Grid Import kWh: it ranges between 0 and 153 kWh. The highest frequency is at 90 kWh.
– PV Consumption kWh: it ranges between 0 and 28.65 kWh. The highest frequency is at 7 kWh.
– Production kWh: it ranges between 0 and 84.7 kWh. The highest frequency is at 40 kWh.
– Headroom: it ranges between 0.047 and 100%. The highest frequency is between 40 and 90%.
– Grid Charge kWh: it ranges between 0 and 27.96 kWh. The highest frequency is at 27 kWh.
– Discharge kWh: it ranges between 0 and 23.61 kWh. The highest frequency is at 24 kWh.
– Consumption Discharge kWh: it ranges between 0 and 23.6 kWh. The highest frequency is at 23 kWh.
– Grid Export kWh: it ranges between 0 and 56.51 kWh. The highest frequency is at 12 kWh.
– PV Export kWh: it ranges between −0.57 and 56.51 kWh. The highest frequency is at 12 kWh.
– PV Charge kWh: it ranges between 0 and 17.07 kWh. The highest frequency is at 11 kWh.
– Charge kWh: it ranges between 0 and 28.11 kWh. The highest frequency is at 27 kWh.
– Grid Discharge kWh: it ranges between 0 and 27.96 kWh. The highest frequency is at 5 kWh.

Daily granularity models are trained with a dataset of 6659 observations produced by cluster 4 dwellings (16% of the total dataset) during one year. The daily Consumption kWh series for cluster 4 is not stationary, as the ADF test returns a 0.537 p-value. On the other hand, the ADF Test of the first difference of this series has a 0.01 p-value, so it is stationary at this integration level. All ESA-SARIMAX-TCM Consumption kWh models perform correctly with good validation metrics compared to the incumbent forecasting model, as shown in Table.

Table 4 presents the Experimentation Table with the outcomes of different ESA, SARIMAX, and Temporal Causal model experiments for Cluster 4 at the daily granularity level.

SARIMAX(1,0,6) × (0,1,1) in Figure 7 shows the best validation metrics for Daily Consumption kWh in July 2019 and passed the Ljung–Box Q-Test. Figure 8 exhibits the residual plot, and Figure 9 the Q-Q Plot, which verify the residual stationarity and normality in the same period.

Similarly, the Daily Production kWh series for cluster 4 is not stationary, as the ADF test returns a 0.370 p-value. The first difference is stationary, as the ADF Test of this series has a 0.01 p-value. All SARIMAX-TCM models perform correctly with good validation metrics compared to the incumbent forecasting model. On this occasion, the ESA Winters Multiplicative model presents a slightly worse MAE than the incumbent model. Only the SARIMAX(0,1,2) × (0,0,0) model passed the Ljung–Box Q-Test and achieved good validation metrics.

Also, the Daily Headroom % series for cluster 4 is not stationary, as the ADF test returns a 0.459 p-value. The first difference is stationary, as the ADF Test of this series has a 0.01 p-value. SARIMAX(0,1,1) × (0,0,0) had the best validation metrics compared to the incumbent forecasting model and passed the Ljung–Box Q-Test.

b.: Selected cluster hourly granularity models

Clustering models at hourly granularity level were trained with the same pre-selected input variables as in the daily models. After training with K-Means, Kohonen Networks, and the Two-Stage clustering algorithm, the best cluster cohesion was obtained with K-Means and 5 cluster composition. K-Means with 5 clusters achieved a 0.4 cohesion and separation metric, meaning a fair clustering according to Kaufman and Rousseeuw [25].

The best target of sites for our estimation purposes of Consumption kWh, Production kWh, and Headroom % at this granularity level is cluster 1, which has a size of 64.8% (506,676 records) of the whole dataset at the hourly granularity level and the following profile:

Most meaningful inputs: Heating Electric/Non-Electric, TV, Freezer, Fridge, Ground source heating pump, Air source heating pump, Solar PV, Floor Area, Dwelling Type, Headroom.
Dwelling Type: bungalow, house.
Floor Area: the highest frequency with 300–350 m².
DER: they have mostly PV panels and ground source heat pump.
Appliances: fridge, freezer, TV.
Heating: they have mostly Electric Heating.
Power Flows and BESS Headroom% (measured at site and hour level).
1. Consumption kWh: It ranges between 0 and 16 kWh. The highest frequency is at 16 kWh.
2. Grid Consumption kWh: It ranges between 0 and 16 kWh. The highest frequency is at 16 kWh.
3. Grid Import kWh: It ranges between 0 and 16 kWh. The highest frequency is at 16 kWh.
4. PV Consumption kWh: It ranges between 0 and 5.3 kWh. The highest frequency is at 4.2 kWh.
5. Production kWh: It ranges between 0 and 11.2 kWh. The highest frequency is at 7 kWh.
6. Headroom: It ranges between 0.047 and 100%. The highest frequency is between 20 and 90%.
7. Grid Charge kWh: It ranges between 0 and 3.47 kWh. The highest frequency is at 1.5 kWh.
8. Discharge kWh: It ranges between 0 and 3.34 kWh. The highest frequency is at 3.2 kWh.
9. Consumption Discharge kWh: It ranges between 0 and 3.33 kWh. The highest frequency is at 3.2 kWh.
10. Grid Export kWh: It ranges between 0 and 9.95 kWh. The highest frequency is at 6.2 kWh.
11. PV Export kWh: It ranges between −0.038 and 9.95 kWh. The highest frequency is at 6.2 kWh.
12. PV Charge kWh: It ranges between 0 and 3.03 kWh. The highest frequency is at 3.2 kWh.
13. Charge kWh: It ranges between 0 and 3.5 kWh. The highest frequency is at 1.5 kWh.
14. Grid Discharge kWh: It ranges between 0 and 3.21 kWh. The highest frequency is at 2.2 kWh.

Hourly granularity models are trained with a dataset of 506,676 observations produced by Cluster 1 (size: 64.8%). Hourly Consumption kWh, Production kWh, and Headroom % are all stationary time series, as their ADF tests return a 0.01 p-value. All ESA-SARIMAX-TCM Consumption kWh models perform correctly with good validation metrics compared to the incumbent forecasting model. Figure 10 shows the models computed for July 2019. SARIMAX(0,0,2) × (1,1,1) passed the Ljung–Box Q-Test with the best validation metrics, but it is overtrained (no bias in any of the dataset partitions tried). On the other hand, Figure 11 presents the SARIMAX(9,0,9) × (2,1,2) model with reasonable validation metrics.

The Q-test is not available for this experiment, but only a few lags fall outside the correlogram’s confidence interval. The residual analysis and the residuals Q-Q Plot show visual evidence of stationarity and normality for the SARIMAX(9,0,9) × (2,1,2) model in Figure 12 and Figure 13.
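
The points of a residual Q-Q plot such as the one referred to above can be computed without a plotting library. This sketch standardizes the residuals and pairs them with standard normal quantiles, using the (i − 0.5)/n plotting positions, which is one common convention and an assumption here; the function name and the sample residuals are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Pairs of (theoretical normal quantile, standardized sample quantile)."""
    n = len(residuals)
    mu, sigma = mean(residuals), pstdev(residuals)  # sigma must be non-zero
    ordered = sorted((r - mu) / sigma for r in residuals)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals, symmetric around zero.
sample_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
points = qq_points(sample_residuals)
for theoretical_q, sample_q in points:
    print(round(theoretical_q, 3), round(sample_q, 3))
```

Points lying close to the 45° line, where both coordinates are equal, are the visual evidence of normality.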

Table 5 presents the Experimentation Table with the outcomes of different ESA, SARIMAX, and Temporal–Causal model experiments for the average site at the hourly granularity level.

Similarly, the Hourly Production kWh SARIMAX-TCM models perform correctly with good validation metrics compared to the incumbent forecasting model. On this occasion, the ESA Winters Multiplicative model presents a slightly worse MAE than the incumbent model. No identified SARIMAX model passed the Ljung–Box Q-Test. SARIMAX(3,0,4) × (2,0,1) achieved the best validation metrics.

The best model identified for the Headroom % was the SARIMAX(3,1,3) × (2,0,2), which achieved the best validation metrics and passed the Ljung–Box Q-Test.

4. Discussion

As mentioned previously in existing literature on this subject, the authors found no published predictive models on LEM, aside from the Cornwall LEM trial. Most of the LEM study cases reviewed and studied were trials that did not publish their data or forecasting results. This situation was the main reason to start this research with the published open Cornwall LEM dataset. Thus, no previous work on applying automatic learning techniques in LEMs exists. Therefore, the models developed in the Cornwall test will be used as a reference framework to compare the results obtained in the trial (incumbent models) with those developed in this research. The results are described below, comparing the resulting metrics with those published with the reference dataset.

The best ESA-SARIMAX-TCM models present high fitness rates compared to actual-observed time series. However, not all have decomposed the dependent variable time series in their trend, seasonal, and stochastic components. The research compares the ESA-SARIMAX-TCM models’ performance with the incumbent forecasting model for Production kWh and Consumption kWh (no BESS Headroom% incumbent forecasting model included in the Cornwall LEM dataset). The ESA-SARIMAX-TCM performs better with the annual dataset (from April 2019 to March 2020) than the incumbent forecasting models in most cases. This accuracy improvement may optimize the surplus trading and the BESS reserves, which may help to justify an investment business case.

In Table 2 and Table 3, validation metrics (RMSE, MSE, MAE, and R²) compare ESA-SARIMAX-TCM models’ performance with the incumbent forecasting model for an average site at daily and hourly intervals. The last two columns of these tables present the MSE and MAE reduction.

At the daily granularity level, the selected model for Consumption kWh shows a 91.79% MSE reduction and a 69.54% MAE reduction over the incumbent forecasting model’s MSE and MAE. The best Production kWh model produced a 99.93% MSE reduction and a 97.08% MAE reduction. The Consumption and Production kWh models have appropriately identified the data dynamic process at this granularity level. No incumbent forecasting model is available for Headroom%, so a comparison is not possible.

Figure 4 presents the validation metrics from ESA-SARIMAX-TCM models’ performance and the incumbent forecasting model to benchmark for selected site clusters from Cornwall LEM. Daily models for target cluster 4 and hourly models for target cluster 1 use common site characteristics (site, DERs, BESS, appliance, power flows, BESS state of charge) computed at respective granularity levels.

The best daily Consumption kWh model for site cluster 4 exhibits a 99.13% MSE reduction and 91.50% MAE reduction over the incumbent forecasting model’s MSE and MAE. The best Production kWh model for site cluster 4 displays a 99.7% MSE reduction and 95.37% MAE reduction. Both Consumption and Production kWh models properly decompose the data dynamic process at this granularity level.

The best Consumption kWh model for cluster 1 at the hourly granularity level produces a 91.11% MSE reduction and a 69.94% MAE reduction over the incumbent forecasting model’s MSE and MAE. The best Production kWh model shows a 99.19% MSE reduction and a 91.38% MAE reduction. At this granularity level, the models for cluster 1 have not passed the Ljung–Box Q-test, so the existence of residual autocorrelation for a fixed number of lags of these dependent variables cannot be ruled out. However, a deeper residual analysis and the residuals Q-Q Plot have not shown clear evidence of residual autocorrelation. The dynamic data process might be decomposed properly if the study used more sophisticated methods that may help unveil non-linear relationships between the dependent variable and its inputs.

Considering the above results, the authors conclude that the proposed forecasting models are better than the present models used in the Cornwall LEM. This way, the starting hypothesis formulated in the Introduction section is validated. This better knowledge of the consumption–production load profile categorization may help optimize LEM management. The research has analyzed the impact of this accuracy improvement on the BESS reserves’ capacity to meet DSO contract conditions and avoid fines. Also, the models can be applied to create different After Diversity Maximum Demand scenarios to optimize the investment in the distribution network.

Although the forecasting models’ outcomes outperformed the incumbent forecasting models, the authors must recognize several limitations:

1.: Dataset variability: the different data attributes collected at each Energy Community may affect the model comparison.
2.: Weather forecast information: Consumption and production depend heavily on meteorology. Weather forecast quality and granularity are critical issues in forecasting power flows.
3.: Price signals information: It is necessary to count on a fourth forecasting model: the price signals forecasting model. That way, the LEM will understand when to charge the BESS inexpensively and when to discharge with the highest profit for the Energy Community.

The accuracy and stability of these consumption and production categorization models may be improved with new non-linear shallow learning or deep learning method sequences. The non-linear shallow methods to be tested may include Random Forest or eXtreme Gradient Regressor, and deep learning methods such as a Hybrid Model of Convolutional Neural Networks and Recurrent Neural networks (Long Short-Term Memory Networks—LSTM—or Gated Recurrent Unit -GRU-), Bayesian neural networks or Transformers in Sequence like N-Beats [15,17,26,27,28,29,30,31,32,33,34,35]. These deep learning techniques unveil non-linear relationships, and the scientific community is developing new hybrid models for forecasting photovoltaic power like the models created by Guanoluisa et al. [36] using Bayesian optimization in the hyperparameters setting. It should be tested whether the memory associated with Deep Learning Sequence Models may facilitate Transfer Learning thanks to its spatial–temporal feature extraction capabilities. If this is the case, a model trained in a region will be relatively useful for the model to be trained in another region.

Another interesting study would be to mix high-frequency (minute granularity) actual consumption and production data with half-hourly non-linear forecasting consumption models. A more detailed understanding of the consumption of the Cornwall LEM will help the Energy Community to optimize market decisions or manage the storage reserves optimally to meet contract conditions with the Distribution System Operator.

5. Conclusions

This paper uses time series models to characterize the production and consumption profiles of the facilities associated with a particular type of Energy Community, named the Local Energy Market or LEM. There are few existing projects/deployments of this type of LEM, so it is a new field of study in which this paper aims to provide the first solution to the problems of consumption/production optimization using machine learning algorithms. These algorithms are naturally used in environments where the time dimension of the data is very relevant due to the causality between the different data in the considered time intervals or gaps. However, for LEMs, no similar work has been found.

To optimize the consumption and production in a LEM, we first need to obtain reliable models that can predict the corresponding profiles and the efficiency of the battery’s use. To achieve the model, this paper analyzes Consumption kWh, Production kWh, and Headroom % of the Cornwall Local Energy Market 100-dwelling Energy Community. The research has analyzed the Cornwall LEM dataset [10] and developed forecasting models for an average site and selected target site clusters. Models built with ESA, SARIMAX, and Temporal–Causal advanced statistical methods show a better performance benchmarked with the incumbent forecasting model at daily and hourly granularity levels.

As mentioned by Voyant C. et al. [18], having forecasting models available for consumption, production, and storage reserves is important to optimize the performance of an Energy Community, and also of a LEM. This paper remarks on the necessity of these models to guarantee the reliability of an electrical system based on renewable energies. Furthermore, energy management is scaled in terms of monthly, weekly, or daily granularity levels to optimize production, consumption, trading in day-ahead markets, and network operation. Finally, the power management is scaled at an ultra-frequency granularity level, minute or second scale, to guarantee the system’s safety. Models at that prediction scale are suited for real-time monitoring, ramping events, balance markets, or BESS reserves optimization. This paper presents several models that consider these granularity levels (at least, the granularity of the data collected in the Cornwall dataset).

It can concluded that this research has demonstrated performance improvement in model accuracy and stability, hence the possibility of optimizing LEM surplus trading, BESS reserves capacity, or the distribution network investment.

In future work, more artificial intelligence techniques (such as those mentioned above) can be employed and implemented in LEM deployments. These implementations will allow the optimization of consumption and production profiles to be automated not only in theory but also in practice, by integrating smart devices into PV installations.

Author Contributions

Conceptualization, W.R., R.P.-V. and J.C.; methodology, W.R., R.P.-V. and J.C.; software, W.R. and A.M.G.-V.; validation, W.R., R.P.-V. and A.M.G.-V.; formal analysis, W.R. and R.P.-V.; investigation, W.R. and R.P.-V.; data curation, W.R.; writing—original draft preparation, W.R.; writing—review and editing, R.P.-V. and A.M.G.-V.; supervision, J.C.; funding acquisition, R.P.-V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a grant from the UNED50 activities program of the Universidad Nacional de Educación a Distancia (UNED); the APC was also funded by UNED.

Data Availability Statement

Data sharing not applicable.

Acknowledgments

The authors would like to thank UNED for its support and for financing the publication fees. We also wish to thank Dan Nicholls from Centrica PLC and David Kane from Trilemma Consulting Limited for sharing the Cornwall LEM dataset. Additionally, we thank Andrew Peacock from Heriot-Watt University for his support in understanding the datasets and for his reflections on the models.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Braunholtz-Speight, T.; Sharmina, M.; Manderson, E.; Mclachlan, C.; Hannon, M.; Hardy, J.; Mander, S. Business models and financial characteristics of community energy in the UK. Nat. Energy 2020, 5, 169–177.
2. Okwuibe, G.C.; Gazafroudi, A.; Hambridge, S.; Dietrich, C.; Trbovich, A.; Shafie-khah, M.; Tzscheutschler, P.; Hamacher, T. Evaluation of Hierarchical, Multi-Agent, Community-Based, Local Energy Markets Based on Key Performance Indicators. Energies 2022, 15, 3575.
3. Faia, R.; Pinto, T.; Vale, Z.; Corchado, J.M. A Local Electricity Market Model for DSO Flexibility Trading. In Proceedings of the International Conference on the European Energy Market (EEM), Ljubljana, Slovenia, 18–20 September 2019; pp. 1–6.
4. Cheng, L.; Yu, T. Game-Theoretic Approaches Applied to Transactions in the Open and Ever-Growing Electricity Markets from the Perspective of Power Demand Response: An Overview. IEEE Access 2019, 7, 25727–25762.
5. Hu, J.; Wu, J.; Ai, X.; Liu, N. Coordinated Energy Management of Prosumers in a Distribution System Considering Network Congestion. IEEE Trans. Smart Grid 2021, 12, 468–478.
6. Capper, T.; Gorbatcheva, A.; Mustafa, M.A.; Bahloul, M.; Schwidtal, J.M.; Chitchyan, R.; Andoni, M.; Robu, V.; Montakhabi, M.; Scott, I.J.; et al. Peer-to-peer, community self-consumption, and transactive energy: A systematic literature review of local energy market models. Renew. Sustain. Energy Rev. 2022, 162, 112403.
7. Khatib, H. Electricity trading. Economic Evaluation of Projects in the Electricity Supply Industry; Institution of Engineering and Technology: London, UK, 2014; pp. 197–211.
8. Kane, D.; Peacock, A.; McCallum, P. LEM Residential BESS Utilisation Summary Report; Technical Report; Trilemma Consulting Limited: Windsor, UK, 2020; Available online: https://www.centrica.com/media/4637/lem-residential-bess-utilisation-summary-report.pdf (accessed on 4 October 2023).
9. Kane, D.; Peacock, A. Cornwall Local Energy Market Residential Project a Whistle Stop Tour; Technical Report November; Trilemma Consulting Limited: Windsor, UK, 2020.
10. Nicholls, D.; Kane, D. Cornwall LEM Residential Electricity Dataset with Solar Production and Battery Storage, 2018–2020; UK Data Service: Colchester, UK, 2021.
11. Kane, D.; Peacock, A.; McCallum, P. LEM Residential Data Dictionary PUBLIC; Technical Report; Trilemma Consulting Limited: Windsor, UK, 2020.
12. Kane, D.; Peacock, A.; McCallum, P. LEM Residential MetaData Summary Report; Technical Report; Trilemma Consulting Limited: Windsor, UK, 2020.
13. Kane, D.; Peacock, A.; McCallum, P. LEM Residential Fleet Self-Consumption Summary Report; Technical Report; Trilemma Consulting Limited: Windsor, UK, 2020.
14. Zendehboudi, A.; Baseer, M.A.; Saidur, R. Application of support vector machine models for forecasting solar and wind energy resources: A review. J. Clean. Prod. 2018, 199, 272–285.
15. Das, U.K.; Tey, K.S.; Seyedmahmoudian, M.; Mekhilef, S.; Idris, M.Y.I.; Van Deventer, W.; Horan, B.; Stojcevski, A. Forecasting of photovoltaic power generation and model optimization: A review. Renew. Sustain. Energy Rev. 2018, 81, 912–928.
16. Wang, H.; Lei, Z.; Zhang, X.; Zhou, B.; Peng, J. A review of deep learning for renewable energy forecasting. Energy Convers. Manag. 2019, 198.
17. Lai, J.P.; Chang, Y.M.; Chen, C.H.; Pai, P.F. A survey of machine learning models in renewable energy predictions. Appl. Sci. 2020, 10, 5975.
18. Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.L.; Paoli, C.; Motte, F.; Fouilloy, A. Machine learning methods for solar radiation forecasting: A review. Renew. Energy 2017, 105, 569–582.
19. Yin, S.; Ai, Q.; Li, J.; Li, Z.; Fan, S. Energy Pricing and Sharing Strategy Based on Hybrid Stochastic Robust Game Approach for a Virtual Energy Station With Energy Cells. IEEE Trans. Sustain. Energy 2020, 12, 772–784.
20. Box, G.W.; Jenkins, G. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1970.
21. Holt, C. Forecasting seasonals and trends by exponentially weighted moving averages. Int. J. Forecast. 2004, 20, 5–10.
22. Winters, P.R. Forecasting Sales by Exponentially Weighted Moving Averages. Manag. Sci. 1969, 6, 324–342. Available online: http://www.jstor.org/stable/2627346 (accessed on 4 October 2023).
23. Song, X.; Taamouti, A. A Better Understanding of Granger Causality Analysis: A Big Data Environment. Oxf. Bull. Econ. Stat. 2019, 81, 911–936.
24. Granger, C. Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econom. Econom. Soc. 1969, 37, 424–438.
25. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 1990.
26. Mellit, A.; Pavan, A.M.; Ogliari, E.; Leva, S.; Lughi, V. Advanced methods for photovoltaic output power forecasting: A review. Appl. Sci. 2020, 10, 487.
27. Mosavi, A.; Salimi, M.; Ardabili, S.F.; Rabczuk, T.; Shamshirband, S.; Varkonyi-Koczy, A.R. State of the art of machine learning models in energy systems, a systematic review. Energies 2019, 12, 1301.
28. Pérez-Ortiz, M.; Jiménez-Fernández, S.; Gutiérrez, P.A.; Alexandre, E.; Hervás-Martínez, C.; Salcedo-Sanz, S. A review of classification problems and algorithms in renewable energy applications. Energies 2016, 9, 607.
29. Bermejo, J.F.; Fernández, J.F.; Polo, F.O.; Márquez, A.C. A review of the use of artificial neural network models for energy and reliability prediction. A study of the solar PV, hydraulic and wind energy sources. Appl. Sci. 2019, 9, 1844.
30. Khare, V.; Nema, S.; Baredar, P. Solar-wind hybrid renewable energy system: A review. Renew. Sustain. Energy Rev. 2016, 58, 23–33.
31. Ahmed, A.; Khalid, M. A review on the selected applications of forecasting models in renewable power systems. Renew. Sustain. Energy Rev. 2019, 100, 9–21.
32. Alkabbani, H.; Ahmadian, A.; Zhu, Q.; Elkamel, A. Machine Learning and Metaheuristic Methods for Renewable Power Forecasting: A Recent Review. Front. Chem. Eng. 2021, 3, 1–21.
33. Bracale, A.; Caramia, P.; Carpinelli, G.; Di Fazio, A.R.; Ferruzzi, G. A Bayesian method for Short-Term probabilistic forecasting of photovoltaic generation in smart grid operation and control. Energies 2013, 6, 733–747.
34. Ritter, H.; Karaletsos, T. TyXe: Pyro-based Bayesian neural nets for Pytorch. arXiv 2021, arXiv:2110.00276.
35. Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-Beats: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020; pp. 1–21.
36. Guanoluisa-Pineda, R.; Arcos-Aviles, D.; Flores, M.; Ibarra, A.; Motoasca, E.; Martinez, W.; Guinjoan, F. Photovoltaic power forecast using deep learning techniques with hyperparameters based on Bayesian optimization. Sustainability 2023, 15, 12151.

Figure 1. Cornwall LEM power flows and state of charge stored in the dataset.

Figure 2. Production kWh time series decomposition in its trend, seasonal, and stochastic components.

Figure 3. Hourly Production kWh Augmented Dickey–Fuller Stationarity Test.

Figure 4. Autocorrelation (ACF) and Partial Autocorrelation Functions (PACF) for a SARIMAX Hourly Production kWh model.

Figure 5. Q-Q Plot for Residuals SARIMAX Forecasting Model.

Figure 6. Daily Cluster Cohesion & Separation.

Figure 7. SARIMAX(1,0,6) × (0,1,1) Daily Consumption Forecast for Cluster 4 vs. Incumbent Forecast Model vs. Real by Day of Week. July 2019.

Figure 8. SARIMAX(1,0,6) × (0,1,1) Residuals. Blue line is the residuals time series and green line is the smoothing average time series which measures the residuals stability.

Figure 9. SARIMAX(1,0,6) × (0,1,1) Daily Consumption Residuals Q-Q Plot July, 2019.

Figure 10. SARIMAX(0,0,2) × (1,1,1). Hourly Consumption kWh model for cluster 1, July 2019.

Figure 11. SARIMAX(9,0,9) × (2,1,2). Hourly Consumption kWh model for cluster 1, July 2019.

Figure 12. SARIMAX(9,0,9) × (2,1,2) Residuals Hourly Consumption Cluster 1.

Figure 13. SARIMAX(9,0,9) × (2,1,2) Residuals Q-Q Plot Hourly Consumption kWh Cluster 1.

Table 1. Exogenous variables list.

KPI	Description
Discharge kWh	Energy discharged from the battery
Charge kWh	Energy used to charge the battery
Consumption kWh	Energy consumption on the site
Grid Export kWh	Energy exported to the grid
PV Charge kWh	Energy from Solar PV system that is diverted instantaneously to BESS Charge
PV Consumption kWh	Energy from solar PV system that is used instantaneously for Consumption
PV Export kWh	Energy from Solar PV system that is spilled instantaneously to Grid Export
Grid Discharge kWh	Energy from BESS Discharge that is spilled instantaneously to Grid Export
Grid Charge kWh	Energy for BESS Charge that is supplied instantaneously by Grid Import
Grid Consumption kWh	Energy from Grid Import that is used instantaneously for Consumption
Grid Consumption Discharge	Energy from BESS Discharge that is used instantaneously for Consumption
BESS State of Charge % (SOC)	Percentage that indicates relative storage capacity
Precipitation probability %	Probability of precipitation
Precipitation mm	Precipitation measured in mm
Wind direction	Wind direction in degrees (0 is north, 90 is east, 270 is west)
Wind speed	Wind speed measured in knots (one knot is 1.852 km/h)
Solar radiation	Energy density measured in J/cm² (total energy delivered per unit area), and sunshine duration measured in minutes (calculated using the geographical coordinates to compute the sun gradient and, hence, the energy density)
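The power-flow descriptions in Table 1 suggest that site-level totals can be rebuilt from their components. The sketch below assumes, purely from those descriptions, that PV production splits into the PV Charge, PV Consumption, and PV Export flows, and that consumption is served by the PV, grid, and battery-discharge flows. The class name, field names, and sample values are hypothetical, and the dataset documentation should be consulted before relying on these identities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalFlows:
    """Energy flows (kWh) for one site over one metering interval."""
    pv_charge_kwh: float              # PV energy diverted to BESS charge
    pv_consumption_kwh: float         # PV energy used for consumption
    pv_export_kwh: float              # PV energy spilled to grid export
    grid_consumption_kwh: float       # grid import used for consumption
    discharge_consumption_kwh: float  # BESS discharge used for consumption

    def production_kwh(self) -> float:
        # Assumption: all PV output goes to charge, consumption, or export.
        return self.pv_charge_kwh + self.pv_consumption_kwh + self.pv_export_kwh

    def consumption_kwh(self) -> float:
        # Assumption: consumption is met by PV, grid import, or BESS discharge.
        return (
            self.pv_consumption_kwh
            + self.grid_consumption_kwh
            + self.discharge_consumption_kwh
        )


# Hypothetical interval, not taken from the Cornwall dataset.
sample = IntervalFlows(
    pv_charge_kwh=0.25,
    pv_consumption_kwh=0.5,
    pv_export_kwh=0.25,
    grid_consumption_kwh=0.125,
    discharge_consumption_kwh=0.125,
)
```

A check of this kind can serve as a data-quality test before modelling: intervals whose reported totals differ markedly from the sum of their components would deserve a closer look.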

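Table 2. SARIMAX-ESA-TCM Daily Consumption, Production, Headroom Model Performance Comparison Table for an average site. Columns marked (1) refer to the proposed model and columns marked (2) to the incumbent forecasting model; ∇ MSE and ∇ MAE are the relative error reductions of (1) with respect to (2).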

Model	Method	RMSE(1)	MSE(1)	MAE(1)	R²(1)	RMSE(2)	MSE(2)	MAE(2)	R²(2)	∇ MSE	∇ MAE
Daily Consumption kWh	ESA Winters Multiplicative	0.909	0.826	0.672	0.944	3.173	10.071	2.173	0.301	91.80%	69.08%
	SARIMAX(1,1,1) × (1,0,1)	0.909	0.827	0.662	0.946	3.173	10.071	2.173	0.301	91.79%	69.54%
	SARIMAX(0,1,1) × (0,1,1)	0.562	0.316	0.427	0.979	3.173	10.071	2.173	0.301	96.86%	80.35%
	Time Causal Model	0.890	0.792	0.653	0.95	3.173	10.071	2.173	0.301	92.14%	69.95%
Daily Production kWh	ESA Winters Multiplicative	3.775	14.250	3.041	0.59	4.632	21.451	2.983	0.374	33.57%	−1.94%
	SARIMAX(0,1,1) × (0,0,0)	0.119	0.014	0.087	1.00	4.632	21.451	2.983	0.374	99.93%	97.08%
	Time Causal Model	3.790	14.364	2.911	0.630	4.632	21.451	2.983	0.374	33.04%	2.41%
Daily Headroom %	ESA Winters Multiplicative	9.929	98.592	7.734	0.724	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(1,0,4) × (0,1,1)	2.076	4.311	1.582	0.989	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(1,0,4) × (1,1,1)	9.112	83.020	6.739	0.781	N/A	N/A	N/A	N/A	N/A	N/A
	Time Causal Model	9.310	86.676	7.031	0.780	N/A	N/A	N/A	N/A	N/A	N/A
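Reading the (1) columns as the proposed model and the (2) columns as the incumbent forecast (the (2) values are identical across methods, which supports this reading), the ∇ columns appear to be relative error reductions. The sketch below gives the standard definitions of the error metrics and recomputes two of the reductions from the values printed in the first row of the table; the function names are illustrative.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def error_reduction_pct(model_error, incumbent_error):
    """Relative error reduction of the model versus the incumbent, in percent."""
    return 100.0 * (1.0 - model_error / incumbent_error)


# Values copied from the Daily Consumption kWh, ESA Winters Multiplicative row.
mse_reduction = error_reduction_pct(0.826, 10.071)
mae_reduction = error_reduction_pct(0.672, 2.173)

# A model worse than the incumbent yields a negative reduction
# (Daily Production kWh, ESA Winters Multiplicative, MAE columns).
negative_reduction = error_reduction_pct(3.041, 2.983)
```

To two decimal places these reproduce the 91.80%, 69.08%, and −1.94% entries of the table, which is consistent with the assumed reading of the columns.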

Table 3. SARIMAX-ESA-TCM Hourly Consumption, Production, Headroom Model Performance Comparison Table for an average site.

Model	Method	RMSE(1)	MSE(1)	MAE(1)	R²(1)	RMSE(2)	MSE(2)	MAE(2)	R²(2)	∇ MSE	∇ MAE
Hourly Consumption kWh	ESA Winters Multiplicative	0.082	0.007	0.059	0.874	0.190	0.036	0.143	0.301	80.56%	58.74%
	SARIMAX(0,0,8) × (1,1,1)	0.000	0.000	0.000	1.000	0.190	0.036	0.143	0.301	100.00%	100.00%
	SARIMAX(9,0,9) × (2,1,2)	0.065	0.004	0.049	0.918	0.190	0.036	0.143	0.301	88.89%	65.73%
	Time Causal Model	0.085	0.007	0.064	0.864	0.190	0.036	0.143	0.301	80.56%	55.24%
Hourly Production kWh	ESA Brown’s Linear Trend	0.16	0.026	0.087	0.928	0.344	0.118	0.171	0.668	77.97%	49.12%
	SARIMAX(3,0,2) × (2,0,0)	0.032	0.001	0.017	0.997	0.344	0.118	0.171	0.668	99.15%	90.06%
	SARIMAX(3,0,2) × (1,0,0)	0.036	0.001	0.019	0.996	0.344	0.118	0.171	0.668	99.15%	88.89%
	SARIMAX(24,0,24) × (1,0,0)	0.033	0.001	0.017	0.997	0.344	0.118	0.171	0.668	99.15%	90.06%
	SARIMAX(3,0,1) × (2,0,0)	0.033	0.001	0.017	0.997	0.344	0.118	0.171	0.668	99.15%	90.06%
	SARIMAX(3,0,1) × (2,0,2)	0.033	0.001	0.017	0.997	0.344	0.118	0.171	0.668	99.15%	90.06%
	Time Causal Model	0.109	0.012	0.065	0.966	0.344	0.118	0.171	0.668	89.83%	61.99%
Hourly Headroom %	ESA Brown’s Linear Trend	2.454	6.023	1.518	0.991	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(0,1,6) × (2,0,1)	0.358	0.128	0.247	1.000	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(5,1,6) × (2,0,1)	0.757	0.573	0.520	0.999	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(0,1,6) × (3,0,2)	2.150	4.623	1.449	0.993	N/A	N/A	N/A	N/A	N/A	N/A
	Time Causal Model	0.860	0.740	0.526	1.000	N/A	N/A	N/A	N/A	N/A	N/A

Table 4. SARIMAX-ESA-TCM Daily Consumption, Production, Headroom Model Performance Comparison Table for Cluster 4 sites.

Model	Method	RMSE(1)	MSE(1)	MAE(1)	R²(1)	RMSE(2)	MSE(2)	MAE(2)	R²(2)	∇ MSE	∇ MAE
Daily Cluster 4 Consumption kWh	ESA Winters Multiplicative	0.958	0.917	0.730	0.771	5.332	28.434	4.634	−6.21	96.77%	84.25%
	SARIMAX (1,0,6) × (0,1,1)	0.498	0.248	0.394	0.942	5.332	28.434	4.634	−6.21	99.13%	91.50%
	SARIMAX (2,1,6) × (1,0,1)	0.871	0.758	0.663	0.822	5.332	28.434	4.634	−6.21	97.33%	85.69%
	Time Causal Model	0.980	0.960	0.718	0.790	5.332	28.434	4.634	−6.210	96.62%	84.51%
Daily Cluster 4 Production kWh	ESA Winters Multiplicative	3.307	10.937	2.680	0.567	3.880	15.056	2.657	0.396	27.36%	−0.87%
	SARIMAX(0,1,2) × (0,0,0)	0.185	0.034	0.123	0.999	3.880	15.056	2.657	0.396	99.77%	95.37%
	SARIMAX(0,1,2) × (1,0,0)	2.247	5.047	1.759	0.809	3.880	15.056	2.657	0.396	66.48%	33.80%
	Time Causal Model	3.230	10.433	2.445	0.633	3.880	15.056	2.657	0.396	30.71%	7.98%
Daily Cluster 4 Headroom %	ESA Winters Multiplicative	11.053	122.179	8.677	0.701	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(0,1,1) × (0,0,0)	4.554	20.739	3.555	0.950	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(0,1,1) × (1,0,1)	7.685	59.06	6.606	0.860	N/A	N/A	N/A	N/A	N/A	N/A
	Time Causal Model	9.971	99.416	7.882	0.759	N/A	N/A	N/A	N/A	N/A	N/A

Table 5. SARIMAX-ESA-TCM Hourly Consumption, Production, Headroom Model Performance Comparison Table for Cluster 1 sites.

Model	Method	RMSE(1)	MSE(1)	MAE(1)	R²(1)	RMSE(2)	MSE(2)	MAE(2)	R²(2)	∇ MSE	∇ MAE
Hourly Cluster 1 Consumption kWh	ESA Winters Multiplicative	0.095	0.009	0.070	0.870	0.213	0.045	0.163	0.339	80.00%	57.06%
	SARIMAX(0,0,2) × (1,1,1)	0.000	0.000	0.000	1.000	0.213	0.045	0.163	0.339	100.00%	100.00%
	SARIMAX(9,0,9) × (2,1,2)	0.065	0.004	0.049	0.918	0.213	0.045	0.163	0.339	91.11%	69.94%
	Time Causal Model	0.098	0.010	0.075	0.864	0.213	0.045	0.163	0.339	77.78%	53.99%
Hourly Cluster 1 Production kWh	ESA Brown’s Linear Trend	0.163	0.027	0.090	0.927	0.350	0.123	0.174	0.667	78.05%	48.28%
	SARIMAX(3,0,4) × (2,0,1)	0.027	0.001	0.015	0.998	0.350	0.123	0.174	0.667	99.19%	91.38%
	SARIMAX(22,0,22) × (2,0,1)	0.027	0.001	0.015	0.998	0.350	0.123	0.174	0.667	99.19%	91.38%
	SARIMAX(8,0,8) × (2,0,1)	0.032	0.001	0.017	0.997	0.350	0.123	0.174	0.667	99.19%	90.23%
	Time Causal Model	0.112	0.013	0.067	0.966	0.350	0.123	0.174	0.667	89.43%	61.49%
Hourly Cluster 1 Headroom %	ESA Brown’s Linear Trend	2.529	6.397	1.559	0.991	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(2,1,2) × (2,0,1)	0.426	0.182	0.291	0.994	N/A	N/A	N/A	N/A	N/A	N/A
	SARIMAX(3,1,3) × (2,0,2)	0.835	0.698	0.580	0.999	N/A	N/A	N/A	N/A	N/A	N/A
	Time Causal Model	0.932	0.868	0.579	0.999	N/A	N/A	N/A	N/A	N/A	N/A

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
