1. Introduction
Sustainable generation and supply of energy has become one of the biggest challenges faced by policy makers, scientists, and researchers [
1], primarily because of both an increase in energy demand and the technological (infrastructure) improvements required to respond effectively to this growth in demand. In fact, the average electricity demand increased by about 37% between 1990 and 2008 in the European Union (27 EU countries) [
2]. Hence, there is a need for concerted innovative strategies to tackle this increasing demand, estimated at 1.4% per annum [
3], through effective energy policies. Moreover, EU heads of states and governments set three targets in 2007 to be met by 2020: (i) reduction of greenhouse gas emissions by at least 20% compared to 1990 levels; (ii) increase of the share of renewable energy to 20% of EU energy consumption; and (iii) reduction of primary energy usage by 20% through improved energy efficiency [
2]. To achieve these targets, the European Commission (EC) has initiated the European Strategic Energy Technology Plan (SET-Plan), aimed at accelerating the development and deployment of low-carbon technologies for transforming the European energy system to implement the fifth pillar of the Energy Union [
4]. Further, the SET-Plan recommends the optimisation of the current energy- and electricity-grid with federation-based approaches focusing on decentralised micro-grids [
4]. While new decentralised micro-grids are required to be part of the low-and-medium-voltage (LV/MV) electricity grids [
5,
6], the centralised and federation-based grid management approach offers an efficient control of the entire electricity grid [
7]. Traditional electricity grids are static systems; they do not provide detailed information about energy consumption on the demand side, making it difficult to address peak consumptions [
8]. Moreover, both consumer behaviour and electricity markets are evolving rapidly, progressing towards a user-centric direction—transforming the centralised, uni-directional traditional grid into a decentralised energy-sharing grid with a bi-directional flow of information and energy [
6]. This change creates a new form of end users termed “prosumers”, who are both energy producers within their micro-grid (renewable sources such as photovoltaic (PV) and wind, and combined heat and power (CHP) technologies) and energy consumers [
9,
10,
11]. Prosumers add complexity to the management of the entire electricity grid, requiring advanced distributed solutions rather than traditional approaches of centralised energy management [
6]. With feed-in tariffs (FiT) in place, prosumers prefer to maximise their gain from micro electricity grids.
Several smart district energy management models have been proposed in the literature to maximise benefits for the entire grid [
12,
13,
14,
15,
16,
17,
18]. Fonseca et al. [
12] proposed an integrated framework to maximise the utility of the micro-grid concept at the district level. Fanti et al. [
13] proposed a district energy management model to optimise the smart grid with a linear programming approach and to predict negotiated results for the next day’s energy consumption and relevant cost, with a view to reducing the total cost for the entire district. Van Pruissen et al. [
14] compared the efficiency of a multi-agent based energy market management system to traditional systems, and illustrated the benefits of the multi-agent based solutions. One of the main problems to be tackled during the optimisation and control of such large-scale systems is the prediction of loads. The optimisation and the determination of flexibility in district and urban level electricity-grids suffer from a lack of detailed (temporally and spatially) and prior knowledge about demand profiles [
15,
16]. Hence, Jing et al. [
17] proposed a forecasting model for district energy management using empirical models alongside an optimisation system to reduce energy costs. However, empirical models require certain assumptions, which increase model complexity. Further, district energy consumption follows an uncertain pattern: demand may fluctuate across the days of the year and the hours of the day. These fluctuations are mostly related to seasonal effects and socio-economic factors such as occupants’ behaviour and changes in their economic circumstances. In addition, fuel poverty at the household level affects energy demand and needs to be considered in district-level forecasting. Many households fail to keep their homes warm, especially during cold winter days, because of low incomes, thermally inefficient homes and high energy prices [
18,
19]. To deal with these types of complexities, advanced, adaptive and intelligent solutions are often required. Related smart solutions provide promising means in the built environment to control and predict energy consumption. These include artificial neural network [
11,
20], support vector machine [
21], genetic algorithm [
22], and rule- [
23], and ontology-based systems [
24]. They have also been utilised in district energy management problems. Powell et al. [
15] proposed an ANN-based forecasting system prior to the optimisation and control of the district-level energy grid, as well as large-scale systems.
Load forecasting using machine learning (ML) algorithms has become very popular because of the increasing need for cost-effective prediction of demand at a finer temporal resolution to operate and manage the grid in a cost- and energy-efficient manner. Several studies have proposed various approaches for ML-based prediction. Kandananound [
25] presented a forecasting process to predict electricity demand in Thailand using three approaches: the Autoregressive Integrated Moving Average (ARIMA) method, ANN and Multiple Linear Regression (MLR), applied to annual electricity consumption data (1986–2010). The ANN-based approach performed better than the other two in the study. Hernandez et al. [
26] proposed an ANN-based load forecasting system to predict hourly energy generation data using solar radiation information. They found that disaggregated load forecasting increased the complexity of predicting the electricity load for the next hour. Their best-performing ANN predicted short-term electricity load with a Mean Absolute Percentage Error (MAPE) of 15.34%. Similarly, Srinivasan [
27] proposed an evolved ANN-based forecasting system to predict weekday and weekend electric loads using a Genetic Algorithm (GA) as the optimisation engine. The proposed model forecasts the hourly electricity load. The results indicate that the ANN-GA predicts the load better than statistical approaches. However, this model uses the average hourly electricity load, which may differ from the actual hourly load. Further, the study does not address individual consumers’ demand (building-level electricity consumption); hence, the proposed ANN-GA based forecasting system is not well suited to smart microgrid applications. Further, Kalaitzakis et al. [
28] proposed a Gaussian encoding backpropagation-based ANN model for short-term load forecasting, using individual ANNs in parallel. The model was tested on forecasting for the power system of the island of Crete, with relative errors of 1.5–13.4%. However, the authors did not discuss the selection of input variables for the ANN, even though the identification of input variables is crucial and requires a systematic approach such as sensitivity analysis. Rodrigues et al. [
29] proposed a Levenberg-Marquardt algorithm based ANN for forecasting the short-term electricity consumption of 96 buildings. The proposed method predicted daily electricity consumption with an average percentage error of 18.1%. The authors used a single ANN for each building, with the average daily energy consumption of its appliances as input, to forecast daily building energy consumption. This approach did not consider other sensitive variables that affect daily energy consumption. Moreover, they tried to forecast each individual building’s electricity consumption using information from all buildings, which affects the accuracy of the forecast because each building’s energy usage pattern differs owing to its occupants’ characteristics. The authors also did not perform any topology optimisation. Further, Chen et al. [
30] proposed a forecasting system for substation electricity load using ANN to support distribution system operation. The proposed method predicted the electricity load with a mean absolute percentage error of about 2%.
In addition to ANN, several methods have been used to forecast electricity demand; e.g., Gaussian Process Model [
31], Support Vector Machine (SVM) [
32], Mixed Lazy Learning (MLL) [
33], Adaptive Neuro Fuzzy Inference System (ANFIS) [
34] and Fuzzy Logic (FL) [
35].
Gaussian-based methods have two main limitations compared to other techniques: computational complexities and restrictive modelling for large datasets. Their applications using big data in demand-side electricity management are, therefore, challenging. On the other hand, computational intelligence techniques such as SVM, MLL, ANFIS, FL and ANN have better responses for complex problems, because of their autonomous and adaptive approximation methodologies. Among the reviewed techniques, ANN is effective in tackling the forecasting of such complex problems. Hence, this study adopted ANN-based methodology for district-level electricity management.
The main contribution of the proposed approach is the accurate forecasting of sub-hourly electricity consumption for both individual buildings and the substation (aggregator). Moreover, the study aims to demonstrate the forecasting difficulties that arise from different numbers of occupants and from different seasons. Further, this research presents a systematic ANN development process, including the determination of input parameters through a sensitivity analysis and topology optimisation for parallel ANNs, for which detailed explanation is lacking in the related domain. The proposed hierarchical and systematic modelling approach is the main motivation of this research; it is also a necessity for the smart grid domain, which needs an accurate flow of energy information from the building level to the distribution operator level. As stated above, the previous studies did not consider a sensitivity analysis during the ANN development process. Moreover, they did not consider the effect of occupants under the age of 15 on the forecasting of electricity consumption. Further, the forecasting difficulties in different seasons at the individual building level have not been considered in the literature; they are examined in detail in the proposed study.
The proposed research involves the following steps to achieve these objectives: (a) the determination of dependent and sensitive variables for the aggregated energy consumption using Principal Component Analysis (PCA) and Multiple Regression Analysis (MRA), which are then used in the ANN-based forecasting model; (b) ANN topology determination; (c) testing and validation; (d) prediction with the best-performing ANN topology; (e) implementation of the best-performing ANN models in parallel to predict sub-hourly electricity consumption; and (f) analysis of the performance of each ANN in each season alongside the aggregated results.
2. Artificial Neural Network for District Energy Management
Artificial Neural Networks (ANNs) have recently become highly popular for energy management in the built environment, which is highly complex and nonlinear [
20,
36,
37,
38], primarily because of the strength of ANN in modelling complex systems. ANN mimics the biological neural system to find correlations for complex systems without having an explicit functional relationship [
11]. These relations are defined with artificial neurons and their artificial importance (weight) with transfer functions. This process is performed as a non-linear computational process to find the complex relationship between inputs and outputs. ANNs involve high performance, fast and non-linear analytics. The study presented in [
11] utilises ANN as a cost function engine for the optimisation system. One of the key elements to highlight about ANNs is that each developed ANN is problem-specific. Once a dataset with a specific number of inputs and outputs has been modelled with an ANN, the model cannot be applied to another problem with a different configuration. Therefore, ANN-based forecasting systems are problem-specific rather than domain-specific. Moreover, they are not generalised systems and cannot be replicated directly because of the lack of commonality between the datasets of different problems. ANN models typically use different numbers of variables in the input and output layers, as well as different configurations. However, the working principles are the same: every ANN model requires a topology configuration, input variables, output variables and an error target level, and undergoes a training process. Once training is complete, the trained network can be used on the selected problem set. Training an ANN model involves changing the weights of the links iteratively to direct the information down the correct path to the correct output [
11].
The generation of electricity to meet local demand is mostly governed by local consumers’ total peak demand [
39,
40]. Idowu et al. [
41] proposed a forecasting approach to predict substations’ electricity demand at the district level, which varies considerably because of households’ social and financial circumstances. Therefore, predicting household-level electricity consumption at a higher granularity can improve the prediction of substation-level demand, obtained by aggregating the demand of each connected house. Thus, the determination of consumers’ peak electricity demand at the household level becomes critical for district energy management. Like energy management, peak demand determination is highly complex and hard to solve numerically. Since ANNs have been widely implemented for load forecasting, they are also suitable for peak demand determination [
42].
Given their robustness and comparatively higher accuracy, ANN-based forecasting models have the potential to provide a perspective of the future electricity demand of each individual building at the district level. Hence, an advanced and intelligent control and management system will provide a holistic and adaptive control ability to the entire district. In addition, the intelligent controller can be enhanced with optimisation algorithms to minimise energy consumption per household and maximise the utility of the entire grid. A generalised smart grid energy management and control hierarchy is illustrated in
Figure 1, comprising three hierarchical stages and one negotiation and exchange stage. The first stage is the device level, which involves the activation of each individual device in the building. Buildings form the second level, which depends heavily on building-level energy consumption. The third level is district-level energy management, which addresses the energy demand of a specific district; it is sometimes called the aggregator energy management system and organises negotiation and the exchange of information and money between other districts and connected buildings. The final stage relates to the Distribution System Operator’s (DSO) energy management system, which organises the energy distribution between districts/aggregators. To optimise the entire process, the prediction of building-level energy demand becomes critical across the entire value chain. Therefore, an ANN-based forecasting system is proposed to predict the electricity demand of the individual households in a selected district.
3. Methodology
The proposed method involves: (a) the determination of dependent and sensitive variables for the aggregated energy consumption using Principal Component Analysis (PCA) and Multiple Regression Analysis (MRA); (b) topology determination; (c) testing and validation; and (d) prediction with best-performed topology. To implement the proposed methodology, a small district from Cork, Ireland has been selected. The data come from a smart metering trial by the Irish Commission for Energy Regulation (CER) where building energy consumption was logged on a thirty-minute interval [
43]. These data are accessible from the Irish Social Science Data Archive (ISSDA) [
43]. The selected dataset consists of about 7000 residential buildings’ energy consumption for the year 2012 and rich data (obtained through a questionnaire) about householders, such as the number of occupants, the number of children under the age of 15, household income, occupancy patterns (people staying in the house for more than 5 h during the day) and so on. In addition, information on fuel poverty, i.e., the ratio of the total fuel payment to net household income, is present in the data. However, the data related to household financial information are not present. The employment status of the primary earning member and whether the buildings were adequately warmed have been extracted to correlate with building energy consumption, alongside other variables. This work is an extended version of the work presented at a conference in 2016 [
44]. This paper contains further enhancements in terms of the variables considered, the number of experiments conducted, and the tasks accomplished in the pre-processing stage, as well as the detailed consideration of an extended number and scope of social variables.
To validate the proposed concept, six domestic buildings in the same grid with different specifications have been selected. For example, the numbers of rooms in Buildings 1–6 are 3, 4, 3, 3, 4, and 3, respectively. Buildings 1, 2, 5, and 6 use natural gas for space heating; the remaining two buildings use electricity. Buildings 5 and 6 also use the renewable resources installed on site. All buildings have one washing machine, and Buildings 2 and 5 also have a tumble dryer. Buildings 1–5 contain a large TV, whereas Building 6 has a smaller TV. Building 2 has a game console. The next step is to link this dataset with relevant variables such as occupancy types and outdoor weather conditions using factor analysis. The results of the factor analysis, i.e., the most sensitive variables for total district energy consumption, are selected as Artificial Neural Network (ANN) inputs. Then, the best-performing ANN topology is determined by testing several ANN configurations, followed by training and validation. Finally, a district-level aggregated electricity forecasting model is generated, which provides information about the electricity demand of the selected pilot district. The proposed forecasting model is illustrated in
Figure 2.
As shown in
Figure 2, the first step in the process is the collection of data, which are used to train and test the proposed forecasting algorithm. The second step is the analysis of sensitivities, which is divided into four sub-stages: (a) determine the required number of uncorrelated elements, i; (b) apply MRA to the given variables; (c) sort the absolute values of the coefficients for each variable from high to low; and (d) select the top i variables. The last stage is the development of the ANN-based forecasting system using the selected i variables as inputs to the model. The ANN development comprises four sub-stages. First, the topology (the optimum number of hidden layers, process elements and training functions) is determined. Second, the ANN model is trained with the optimum configuration. Third, the trained network is tested and validated. Finally, the tested and validated network is utilised in real-life predictions.
4. PCA and MRA Analysis Based Sensitive Variables Determination
To determine the highly correlated and sensitive variables for the aggregated electricity demand for the pilot site, a PCA based dimension reduction approach [
11] is utilised. PCA is a multivariate orthogonal transformation approach that converts a set of observations of possibly correlated variables into linearly uncorrelated variables set using the eigenvalues of the covariance matrix for the initial set of variables [
45]. The study focuses on determining the most important social and environmental variables for predicting electricity consumption; hence, 23 variables of interest, including weather conditions and social variables, are selected for PCA. Eight out of the 23 variables, as shown in
Figure 3, have been found to be uncorrelated, indicating that these variables affect the outputs independently without sharing information with each other.
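The next step is to determine the coefficients of the selected eight variables using a multiple regression analysis (MRA) as in Equation (1):
$$y = \beta_0 + \sum_{j=1}^{i} \beta_j x_j \qquad (1)$$
where $\beta = (\beta_0, \beta_1, \ldots, \beta_i)$ is the coefficient vector, $x = (x_1, x_2, \ldots, x_i)$ is the variable vector and $y$ is the total grid energy consumption for the next 30 min.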
According to the MRA, the eight highest coefficient values are found for the following variables: current electricity consumption, outdoor air temperature, outdoor humidity, wind speed, outdoor air pressure, visibility, wind direction and the number of occupants under the age of fifteen.
5. Determination of the Best-Performed ANN Topology
As highlighted in
Section 3, the main objective of the topology determination process is to find the best-performing ANN architecture for each individual ANN model; these models run in parallel and together contribute to the aggregated district energy consumption. Each proposed ANN model has eight sensitive variables and four time-related variables as inputs, and one output for the next thirty minutes’ energy consumption. In the proposed forecasting system, six parallel ANN models are used to predict the aggregated energy demand. The cumulative forecasted energy demand provides the expected district energy consumption for this building cluster. The proposed ANN architecture with inputs and outputs is given in
Figure 4.
According to
Figure 4, each proposed ANN model has twelve inputs: month, day, hour, minute, outdoor air temperature, outdoor humidity, outdoor air pressure, wind speed, wind direction, visibility, number of occupants under the age of fifteen (e.g., zero, one, two, three, and so on), and current energy consumption. The single output is the next thirty minutes’ energy consumption. In the proposed parallel ANN model, Buildings 1, 4 and 6 have no occupants under the age of 15; Building 2 has three occupants under the age of 15, all of whom are also under the age of 5 (they stay in the house more than 6 h during the day); Building 3 has two occupants under the age of 15, both above the age of 5 (they do not stay in the house more than 6 h during the day); and Building 5 has one occupant under the age of 15, who is also under the age of 5 (he/she stays in the house more than 6 h during the day). Although the PCA-MRA based pre-processing did not find a correlation between electricity consumption and the occupants’ opinion about the buildings’ temperature (i.e., whether they were adequately warmed), the authors wanted to see whether a relationship existed by investigating the household budget, a proxy indicator for fuel poverty. According to the rich data (i.e., the questionnaire survey), respondents from all households believed that their houses were adequately warmed, and the ratio of annual fuel expenses to annual household income was less than 0.1 (10%), the fuel poverty threshold. The historical energy consumption data cover 18 months. The first year’s data were used for training, while the remaining six months’ electricity consumption data were used for testing and validation.
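The twelve-input layout and the chronological split described above can be sketched as below. This is an illustrative sketch only: the dictionary keys for the weather readings, the function names and the sample tuple layout are assumptions, not part of the original dataset or code.

```python
from datetime import datetime, timedelta


def build_input_vector(timestamp, weather, occupants_under_15, current_consumption):
    """Assemble the twelve ANN inputs in the order listed in the text.

    `weather` is a dict with hypothetical keys chosen for this sketch.
    """
    return [
        timestamp.month, timestamp.day, timestamp.hour, timestamp.minute,
        weather["temperature"], weather["humidity"], weather["pressure"],
        weather["wind_speed"], weather["wind_direction"], weather["visibility"],
        occupants_under_15, current_consumption,
    ]


def chronological_split(samples, split_time):
    """Split (timestamp, inputs, target) samples into training and testing sets.

    Samples before `split_time` go to training; the rest go to
    testing and validation. No shuffling is applied.
    """
    train = [s for s in samples if s[0] < split_time]
    test = [s for s in samples if s[0] >= split_time]
    return train, test
```

With 18 months of half-hourly records, `split_time` would be set to the timestamp that ends the first twelve months, so that the remaining six months are held out.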
The training process for each building started with the determination of the best-performing training algorithm, as illustrated in
Table 1, while keeping the other parameters constant: the maximum number of iterations (epochs) was 5000, the learning rate 0.01 and the momentum coefficient 0.95. In addition, the number of hidden layers was kept at two, with 25 process elements in each hidden layer, and the transfer function in both hidden layers and the output layer was the logarithmic sigmoid. Each configuration was run ten times. Further, the target mean square error (MSE) for parameter tuning during the training stage was set to 0.001, a value found through empirical tests, to keep the training error as low as possible. Moreover, the dataset is normalised between 0 and 1.
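The normalisation between 0 and 1 mentioned above is commonly done with min-max scaling. The sketch below assumes that method; the text does not state which scaling was used, and the function names are hypothetical.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) used to scale a variable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant variable cannot be min-max scaled")
    return lo, hi


def normalise(value, lo, hi):
    """Map a raw value to the 0-1 range defined by lo and hi."""
    return (value - lo) / (hi - lo)


def denormalise(scaled, lo, hi):
    """Map a 0-1 value (e.g. an ANN output) back to the original units."""
    return lo + scaled * (hi - lo)
```

One reasonable design choice, stated here as an assumption, is to fit `lo` and `hi` on the training period only and reuse them for the testing period, so that no information from the test data leaks into training.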
The average results of 10 runs for each ANN model are given in
Table 2. The best performed ANN was found for all six buildings with trainlm based algorithm (No. 9 in
Table 1) [
47]. Hereafter, further experiments are carried out with this algorithm while the other parameters (number of hidden layers, number of process elements in the hidden layers and transfer function) are updated from their initial values one by one in the subsequent stages.
The next stage is to find the required number of the hidden layers for the best performed ANN. To determine the optimum number of hidden layers, one, two, and three hidden layers are tested for each building with ten repetitive runs. The average results of ten runs for each building are presented in
Table 3.
According to
Table 3, some experiments achieved the target with one or three hidden layers for some buildings (coloured in grey), but the best-performing ANN for every building was found with two hidden layers, which reached the target in the lowest number of iterations. The ANN for Building 1 achieved the target MSE (0.001) in 278 epochs. Similarly, the ANNs for Buildings 2–6 reached the target after 413, 372, 291, 276 and 294 epochs, respectively. Based on these experiments, the remaining topology optimisation is carried out using two hidden layers.
The next stage of the experiments is the determination of the transfer function in both hidden and output layers. Three types of transfer functions are considered: the hyperbolic tangent sigmoid function (tansig) and the logarithmic sigmoid function (logsig), which introduce nonlinearity into the learning process, and the linear transfer function (purelin), which provides a linear mapping between inputs and outputs. The experiments were conducted with ten repetitive runs for each selected configuration, and the average results are shown in
Table 4.
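For reference, the three transfer functions can be written in a few lines. The sketch below uses their standard mathematical definitions and borrows the MATLAB names only as labels; it is not the toolbox code.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, i.e. tanh(x)."""
    return math.tanh(x)


def logsig(x):
    """Logarithmic sigmoid: 1 / (1 + exp(-x)), output in (0, 1).

    Written in two branches to avoid overflow for large negative x.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def purelin(x):
    """Linear transfer function: returns the input unchanged."""
    return x
```

Because logsig maps any input to the interval (0, 1), its output range coincides with the range of data normalised between 0 and 1, which may be one reason it suits the output layer here.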
According to the testing results presented in
Table 4, the best-performing topology uses the combination [Logsig-Logsig-Logsig] in the two hidden layers and the output layer. In some cases, other combinations also reached the desired target level, but none of them performed better than the [Logsig-Logsig-Logsig] combination, which satisfied the desired target level for all six ANN models with the lowest epoch numbers. In the following experiments, this combination is used alongside the previously determined best-performing parameters. The last step of the topology determination process is to find the best-performing number of process elements in the hidden layers. To perform this experiment, several combinations of process elements are tested in the hidden layers of each ANN model. These combinations and the average results of the repeated runs are given in
Table 5.
According to
Table 5, the desired MSE was found in most cases; the best-performed ones are coloured in blue and presented in bold font. The others that achieved the desired MSE level are coloured grey. The best-performed ones are the ones that met the desired MSE with the lowest number of the epochs. The selected topology for each ANN model is presented in
Table 6.
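The selection rule used throughout the topology experiments (keep the configurations that reach the target MSE, then choose the one with the fewest epochs) can be sketched as follows. The record layout and the function name are assumptions for illustration, and the figures in the usage example are invented, not taken from the tables.

```python
def select_best_configuration(results, target_mse=0.001):
    """Pick the configuration that meets the target MSE in the fewest epochs.

    `results` is a list of dicts with hypothetical keys:
    'config' (any label), 'mse' and 'epochs' (averages over repeated runs).
    Returns None when no configuration reaches the target.
    """
    reached = [r for r in results if r["mse"] <= target_mse]
    if not reached:
        return None
    return min(reached, key=lambda r: r["epochs"])


# Invented example values, for illustration only.
example = [
    {"config": (10, 10), "mse": 0.0030, "epochs": 5000},
    {"config": (15, 20), "mse": 0.0010, "epochs": 280},
    {"config": (25, 25), "mse": 0.0010, "epochs": 410},
]
best = select_best_configuration(example)
```

Here `best["config"]` is `(15, 20)`, because it is the configuration that reaches the target with the lowest epoch count.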
Since the ANN development is a data driven process, configurations presented in
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6 are dependent on the datasets. Configurations and MSEs are, therefore, not guaranteed for different datasets, and some parameters of this configuration cannot be generalised. However, Yuce et al. [
23] stated that trainlm based training function performs the best in electricity management problems; and that other configurations do not seem to have similar performance.
6. Results and Discussion
The proposed ANN models were developed and tested on a computer with an Intel™ Core i5 2.27 GHz processor and 4 GB of memory. MATLAB 2016a was used as the software platform. As per
Table 6, the best-performing training algorithm is Levenberg-Marquardt for every ANN model, each with two hidden layers. The logarithmic sigmoid transfer function is used in both hidden and output layers. Finally, the numbers of process elements in the two hidden layers are (15 20), (15 20), (15 20), (15 20), (15 30), and (15 30) for the ANNs of Buildings 1–6, respectively. The training performance of these best-performing ANN models is illustrated in
Figure 5,
Figure 6,
Figure 7,
Figure 8,
Figure 9 and
Figure 10 by comparison with the expected electricity consumption (the actual electricity consumption that occurred 30 min after the corresponding prediction time) for a typical day of each season.
Results presented in
Figure 5,
Figure 6,
Figure 7,
Figure 8,
Figure 9 and
Figure 10 demonstrate the accuracy of the developed ANN for each building on a typical day, taken as the middle day of each season. The error rate for the selected day is the average percentage error, computed using Equation (2).
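A common form of the average percentage error, given here as an assumption rather than as the authors' exact definition, averages the absolute relative difference between the expected value $E_t$ and the predicted value $P_t$ over the $N$ half-hourly time steps of the period considered:

```latex
\mathrm{APE} = \frac{100}{N} \sum_{t=1}^{N} \frac{\left| E_t - P_t \right|}{E_t}
```

The symbols $E_t$, $P_t$ and $N$ are introduced for this illustration only. Under this form, time steps with $E_t = 0$ would need separate handling.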
Further, an error analysis is carried out to determine the accuracy of the training process for each ANN using the average percentage error for the selected period (one year); the results are presented in
Table 7. Moreover, the correlation between the predicted demand and the expected electricity consumption (the actual electricity consumption that occurred 30 min after the corresponding prediction time) is statistically analysed using the Pearson correlation coefficient and regression analysis.
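The two statistics named above can be computed as in the following sketch. It uses the textbook formulas for the Pearson correlation coefficient and for a simple least-squares line; the function names, and the choice to regress the expected values on the predicted values, are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(predicted, expected):
    """Least-squares intercept a and slope b in: expected = a + b * predicted."""
    n = len(predicted)
    mx, my = sum(predicted) / n, sum(expected) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, expected))
    sxx = sum((x - mx) ** 2 for x in predicted)
    b = sxy / sxx
    a = my - b * mx
    return a, b
```

A perfect forecast would give a correlation of 1, an intercept close to 0 and a slope close to 1, which is the pattern to look for when reading the regression coefficients.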
According to
Table 7, all Pearson correlation coefficients between the predicted and expected results are greater than 0.90, demonstrating a high correlation between the predicted and expected values. Further, the best-performing ANN is that of Building 1, with an average percentage error of 4.03%. The high accuracy of the ANN for Building 1 can also be seen from the linear regression coefficients shown in
Table 7. In this paper, the linear regression model for this problem is presented as in Equation (3).
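A simple linear form consistent with the two coefficients discussed in this section, stated here as an assumption about the shape of the model, relates the expected consumption $E$ to the predicted consumption $P$ through an intercept $a$ and a slope $b$:

```latex
E = a + b\,P
```

Under this form, a forecast that matches the expected values exactly corresponds to $a = 0$ and $b = 1$.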
Based on Equation (3), the expected energy consumption for Building 1 is almost equal to the predicted energy consumption, with “a” equal to 1.85 × 10⁻⁵ and “b” equal to 0.998. The coefficients for Building 2 also confirm that its ANN is the least accurate; this building has three children under the age of fifteen, all of whom are also under the age of five. Comparing buildings by the number of child occupants, Building 2 (three children under the age of five, who stay at home more than 6 h during the day) has an average percentage error of 15.81%, which is 6.78% higher than that of Building 3 (two children under the age of fifteen and above the age of five, who do not stay at home more than 6 h during the day). According to the comparison between Buildings 3 and 5 (one child, who does not stay at home more than 6 h during the day), the error difference is 1.98%. Thus, the error difference between Buildings 2 and 3 is about three times the error difference between Buildings 3 and 5. Accurate prediction was expected to be challenging for buildings with irregular energy consumption patterns. This expectation is confirmed by both the average percentage error and the linear regression analysis. The buildings with no children under the age of fifteen have a more regular energy consumption pattern than the buildings with children under the age of fifteen. However, the prediction error is lower for the aggregated energy consumption: the aggregated prediction error for these six buildings is 4.33%. This result shows that the error at the building level can vary considerably, whereas the variation is much lower in the aggregated electricity consumption. At a given time stamp, the prediction can be too high for one building and too low for another. Hence, the effect of the prediction variation is reduced by aggregation, as illustrated in
Figure 11.
In
Figure 11, the red and blue lines denote the expected total (i.e., aggregated) energy consumption and the total predicted energy consumption for the six buildings, respectively. The aggregation is obtained by summing the electricity consumption of all buildings. In the training stage, the aggregated forecast closely traces the aggregated expected electricity consumption.
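The offsetting effect of aggregation can be illustrated with a minimal sketch. The consumption values below are invented for illustration and are not taken from the dataset, and the error measure is assumed to be the mean absolute percentage difference between expected and predicted values.

```python
import numpy as np


def average_percentage_error(expected, predicted):
    """Mean absolute percentage difference, in percent (assumed definition)."""
    expected = np.asarray(expected, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(expected - predicted) / np.abs(expected)) * 100.0)


# Hypothetical consumption (kWh) for two buildings over three time stamps.
expected = np.array([[1.0, 2.0, 1.5],
                     [2.0, 1.0, 2.5]])
predicted = np.array([[1.2, 1.8, 1.5],
                      [1.9, 1.1, 2.5]])

# Error of each building's own forecast.
per_building_error = [
    average_percentage_error(e, p) for e, p in zip(expected, predicted)
]

# Aggregate first (sum over buildings), then compute the error of the total.
aggregated_error = average_percentage_error(
    expected.sum(axis=0), predicted.sum(axis=0)
)
```

In this toy case the over-prediction for one building and the under-prediction for the other at the same time stamp partly cancel in the sum, so the aggregated error falls below both individual errors. This is not guaranteed in general: the aggregated error can still exceed that of the most regular buildings.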
Further analysis is carried out for the testing stage of the ANN models developed for each building. The comparison based on the average percentage error, the Pearson correlation coefficient and regression analysis is presented in
Table 8.
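The three comparison measures can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the average percentage error is assumed to be the mean absolute percentage difference, Equation (3) is assumed to have the linear form expected = a + b × predicted, and the function names and sample values are hypothetical.

```python
import numpy as np


def average_percentage_error(expected, predicted):
    """Mean absolute percentage difference, in percent (assumed definition)."""
    expected = np.asarray(expected, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(expected - predicted) / np.abs(expected)) * 100.0)


def pearson_correlation(expected, predicted):
    """Pearson correlation coefficient between expected and predicted values."""
    return float(np.corrcoef(expected, predicted)[0, 1])


def regression_coefficients(expected, predicted):
    """Least-squares fit of expected = a + b * predicted (assumed form)."""
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(expected, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the slope first
    return float(a), float(b)


# Hypothetical values: expected is an exact linear function of predicted.
predicted = np.array([1.0, 2.0, 3.0, 4.0])
expected = 0.5 + 2.0 * predicted

a, b = regression_coefficients(expected, predicted)
r = pearson_correlation(expected, predicted)
```

Under this assumed form, a perfect forecast gives a = 0 and b = 1, so values such as a ≈ 0 and b ≈ 1 indicate that the expected and predicted consumption almost coincide.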
As illustrated in
Table 8, the testing results for Building 1 are the best among all buildings. As Building 1 has no children under the age of 15, its prediction error was expected to be the lowest. The same was expected for Buildings 4 and 6; however, their prediction results are slightly worse than those of Building 1, although they are more accurate than those of Buildings 3 and 5. This result is also verified by the linear regression coefficients. The results for Building 2 remain the worst in
Table 8 (red font). This is also expected, since the occupants of this building are three children under the age of 15 and adults. According to the linear regression coefficients, the best prediction results during the testing stage are obtained for Building 1, where the expected result equals 1.065 times the predicted result. Finally, the aggregated results for the testing stage are illustrated in
Figure 12.
According to the error analysis between the aggregated expected and the aggregated predicted electricity consumption, the average percentage error is 13.41%, which is the average of the errors presented in Column 2 of
Table 8. Although this result is slightly higher than the average errors of Buildings 1 and 6, it is still lower than those of the remaining buildings. This also confirms that the aggregated prediction is less affected by prediction errors than the individual buildings (apart from Buildings 1 and 6). The worst average percentage error, 22.64%, was found for Building 2; it is computed from the average percentage differences between expected and predicted values. A detailed comparison of the existing literature and the proposed study is presented in
Table 9 to demonstrate the main contribution.
Finally, the average error for each building is analysed by season. The predicted and expected consumption over the entire period (1.5 years, including the training and testing periods) are considered in this analysis. The results for the winter, spring, summer and autumn seasons are illustrated in
Figure 13,
Figure 14,
Figure 15 and
Figure 16, respectively. Spring has the highest average percentage error of all seasons for every building and for the aggregation, while autumn has the lowest. Electricity consumption during winter appears to have a consistent profile, whereas the consumption profiles during spring are intermittent, regardless of the occupancy type.
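A seasonal breakdown of this kind can be sketched as below. The month-to-season mapping (meteorological seasons for the northern hemisphere) is an assumption, as the season boundaries are not stated here, and the timestamps and consumption values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed mapping: meteorological seasons, northern hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_average_percentage_error(timestamps, expected, predicted):
    """Average absolute percentage error per season, in percent."""
    errors = defaultdict(list)
    for ts, e, p in zip(timestamps, expected, predicted):
        if e == 0:
            continue  # percentage error is undefined for zero consumption
        errors[SEASON_BY_MONTH[ts.month]].append(abs(e - p) / abs(e) * 100.0)
    return {season: sum(v) / len(v) for season, v in errors.items()}


# Hypothetical readings: one in winter, two in spring.
timestamps = [datetime(2012, 1, 15), datetime(2012, 4, 15), datetime(2012, 4, 16)]
expected = [2.0, 1.0, 4.0]
predicted = [1.0, 1.5, 3.0]

seasonal_errors = seasonal_average_percentage_error(timestamps, expected, predicted)
```

Applying the same function to the summed consumption of all buildings would give the seasonal error of the aggregation.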
As shown in
Table 9, the proposed study demonstrates a systematic approach to ANN development and implementation for the smart grid domain, combining a sensitivity analysis for input parameter selection, the inclusion of social variables and an analysis at the seasonal level (coloured blue in the table). Further, the proposed study uses parallel ANNs at the grid level, where other studies in the literature assume a single ANN model. The detailed conclusion is presented in
Section 7.
7. Conclusions and Limitations
The main objective of this study is to develop accurate and robust ANN-based forecasting models (ANNs operating in parallel) for the sub-hourly prediction of electricity consumption at the district level, where the district consists of multiple building-based consumers; the resulting forecasts are intended for use in the smart grid domain. Further, the average accuracy of the forecasting system will be used to adjust demand flexibility at the Aggregator and DSO levels; hence, high forecasting accuracy is key. To achieve it, the objective of the study was extended with an accuracy analysis across seasons and occupant types. The study also aimed to demonstrate a systematic approach to the development of parallel ANNs, including a sensitivity analysis to determine the ANN inputs and an experimental design to optimise the topology among multiple configuration types. Using a systematic approach during ANN development and stating the accuracy differences between seasons will enable Aggregator and DSO operators to update their demand flexibility and adjust their loads during peak hours within the provided limits. Further, results for buildings with different occupancy types enable Aggregator and DSO operators to adopt adaptive demand flexibilities. Hence, the proposed model is implemented on six buildings with different characteristics and occupancy types. The development of the ANN-based electricity demand forecasting model started with the topology determination, i.e., the identification of the most appropriate ANN inputs. Sensitivity analyses of the electricity consumption and environmental variables are conducted using Principal Component Analysis (PCA) and Multi-Regression Analysis (MRA).
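How PCA and MRA can rank candidate inputs is sketched below. The exact procedure of the study is not detailed in this section, so the sketch assumes standardised variables, PCA through singular value decomposition and an ordinary least-squares regression; the variable names and synthetic data are hypothetical.

```python
import numpy as np


def standardise(values):
    """Centre each column and scale it to unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)


def pca_explained_variance_ratio(inputs):
    """Share of total variance carried by each principal component."""
    z = standardise(inputs)
    singular_values = np.linalg.svd(z, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def standardised_regression_coefficients(inputs, target):
    """Multiple regression on standardised data; larger |coefficient|
    suggests a more influential candidate input."""
    z = standardise(inputs)
    t = standardise(target)
    coefficients, *_ = np.linalg.lstsq(z, t, rcond=None)
    return coefficients


# Synthetic example: consumption driven by temperature, not by humidity.
rng = np.random.default_rng(0)
temperature = rng.normal(size=200)
humidity = rng.normal(size=200)
consumption = 3.0 * temperature + 0.1 * rng.normal(size=200)

candidate_inputs = np.column_stack([temperature, humidity])
variance_ratio = pca_explained_variance_ratio(candidate_inputs)
coefficients = standardised_regression_coefficients(candidate_inputs, consumption)
```

In this synthetic case the regression coefficient for temperature dominates, so temperature would be retained as an ANN input and humidity would be a candidate for removal.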
The remaining topology parameters, such as the number of hidden layers, the number of processing elements in the hidden layers, the transfer function types and the training algorithm, are determined through several parametric experiments. The topology analysis is carried out for each individual ANN model, followed by model training and testing. The model is developed and tested on the Irish Smart Grid dataset, which comprises 18 months of monitored electricity consumption data.
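A parametric topology experiment of this kind can be organised as an exhaustive search over candidate configurations. The sketch below is illustrative only: the search space, the option labels and the `toy_evaluate` scoring function are hypothetical, and in practice the scoring function would train an ANN with the given configuration and return its validation error.

```python
from itertools import product


def select_topology(search_space, evaluate):
    """Evaluate every configuration and return the one with the lowest error."""
    keys = list(search_space)
    best_config, best_error = None, float("inf")
    for values in product(*(search_space[key] for key in keys)):
        config = dict(zip(keys, values))
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


# Hypothetical search space; the labels are placeholders, not the study's options.
search_space = {
    "hidden_layers": [1, 2],
    "neurons_per_layer": [5, 10, 20],
    "transfer_function": ["tanh", "logistic"],
    "training_algorithm": ["algorithm_a", "algorithm_b"],
}


def toy_evaluate(config):
    """Stand-in for training and validation; NOT a real ANN training run."""
    error = abs(config["neurons_per_layer"] - 10) + config["hidden_layers"]
    if config["transfer_function"] == "logistic":
        error += 0.5
    if config["training_algorithm"] == "algorithm_b":
        error += 0.25
    return error


best_config, best_error = select_topology(search_space, toy_evaluate)
```

Because one ANN is developed per building, such a search would be repeated for each building's model, which can lead to a different topology for each.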
Results indicate that electricity consumption is harder to predict for residential buildings with children aged up to fifteen than for buildings occupied only by adults. However, the aggregated electricity demand prediction has a lower average percentage error (i.e., it is less sensitive) than the predictions for individual buildings. With regard to seasonal predictions, the average percentage error is lower during winter, while spring has the highest average percentage error; irregular demand for electricity during spring may be the reason. Since peak demand prediction is critical for district-level electricity management, greater prediction accuracy is important if the district is to incorporate flexibility in its management. The accuracy of the prediction is, therefore, investigated in detail, including the effects of occupancy type and season. The worst average percentage error, 19.18%, is found during spring for buildings occupied by children under the age of 15. The lowest average percentage error, 4.06%, is found in winter for the building with no children under the age of 15. Further, for the aggregated electricity consumption, the lowest average percentage error is found for winter (4.51%) and the highest for spring (8.82%). The accuracy of the proposed model depends strongly on the consistency of the data: if the existing data are not very representative, then the accuracy will be lower than that obtained with a well-correlated dataset. Since the ANN-based forecasting system is also problem specific, a change of scenario, which means a change in the dataset, will affect the ANN topology and configuration even if the inputs and outputs remain the same. Hence, the ANN cannot be generalised to a new problem unless the training process is repeated for that problem with the proposed methodology.
Research reported in this paper is one of the very few studies on the importance of social and/or demographic characteristics for forecasting electricity consumption in the distribution grid. With the increased penetration of variable renewable energy resources, accurate demand forecasting at the substation level and below becomes important. The effectiveness of social variables in predicting average and peak demand is successfully highlighted here; however, the main limitation is the lack of detailed information about fuel poverty. Although the householders’ general opinion was that the indoor temperature was adequate, it was not clear whether and how fuel poverty affected their consumption signature. Details on energy expenses, household budgets, and home-level appliances can provide a better estimation, which needs to be explored in future research. The other challenge was the correlation of the householders’ opinions with electricity consumption. Quantitative estimates or measurements of indoor environmental conditions may ameliorate some of the related limitations. Furthermore, the measurement through the Irish Smart Grid trial was carried out in 2012; hence, further information about the households’ energy consumption and relevant social characteristics could not be gathered.
Finally, the ANN-based forecasting solution is found to perform very well for district-level energy prediction. This approach is very sensitive to irregular patterns; hence, the selection or pre-processing of the data is key to reducing estimation errors. However, pre-processing may still not be enough to achieve a better solution with ANNs on irregular datasets. Statistical or other data mining solutions, such as predictive classification algorithms or high-order time series techniques, may be required to tackle these types of datasets.