A Data-Efficient Building Electricity Load Forecasting Method Based on Maximum Mean Discrepancy and Improved TrAdaBoost Algorithm

Li, Kangji; Wei, Borui; Tang, Qianqian; Liu, Yufei

doi:10.3390/en15238780


by Kangji Li *, Borui Wei, Qianqian Tang and Yufei Liu

School of Electricity Information Engineering, Jiangsu University, Zhenjiang 212013, China

* Author to whom correspondence should be addressed.

Energies 2022, 15(23), 8780; https://doi.org/10.3390/en15238780

Submission received: 24 October 2022 / Revised: 17 November 2022 / Accepted: 18 November 2022 / Published: 22 November 2022

(This article belongs to the Section G: Energy and Buildings)


Abstract

Building electricity load forecasting plays an important role in building energy management, peak demand management and power grid security. In the past two decades, a large number of data-driven models have been applied to building-level and larger-scale energy consumption prediction. Although these models have been successful in specific cases, their performance is strongly affected by the quantity and quality of the building data. Moreover, for older buildings with sparse data, or new buildings with no historical data, accurate predictions are difficult to achieve. To address this data silo problem, which is caused by insufficient data collection for building energy consumption prediction, this study proposes a building electricity load forecasting method based on a similarity judgement and an improved TrAdaBoost algorithm (iTrAdaBoost). The Maximum Mean Discrepancy (MMD) is used to search public datasets for building samples similar to the target building. Unlike general Boosting algorithms, the proposed iTrAdaBoost algorithm iteratively updates the weights of the similar building samples and combines them with the target building samples to improve prediction accuracy. A case study of an educational building is carried out. The results show that even when the target and source samples belong to different domains, i.e., the buildings differ in geographical location and meteorological conditions, the proposed MMD-iTrAdaBoost method achieves better prediction accuracy in the transfer learning process than the BP or traditional AdaBoost models. In addition, compared with other advanced deep learning models, the proposed method has a simple structure and is easy to implement in engineering practice.

Keywords:

electricity load forecasting; data-driven model; transfer learning; MMD; iTrAdaBoost

1. Introduction

In recent years, the building sector has gradually become the largest energy consumer in the world, accounting for about 32% of the world's total energy consumption. In many developed countries, its share is even higher: in Europe, buildings consume 40% of the EU's primary energy, and in the United States the share reaches 39%. With the growing population and increasing requirements for living comfort, the growth rate of building energy consumption is expected to keep rising. Of all building-related energy consumption, operational energy consumption accounts for a major part. Reliable and accurate prediction of operational energy use is key to improving building energy efficiency [1] and has attracted increasing attention. A significant proportion of buildings are already equipped with smart metering devices for data recording, and many predictive models have been designed for energy consumption forecasting. However, the performance of most current models is strongly affected by the quantity and quality of the collected data, and for buildings with sparse historical data, reliable and accurate predictions remain difficult to achieve. This data silo problem, caused by insufficient data collection, is the main obstacle to the widespread use of building energy prediction and is the main focus of this study.

1.1. Related Works

It is well known that data-driven techniques can derive different energy use patterns from reliable time-series data. According to the literature on this topic [2], energy prediction cases can be classified by building type, energy usage type and prediction time scale. In terms of forecasting methods, a large body of literature presents various predictive methods, including statistical analysis, machine learning and deep learning [3,4,5]. The most widely used data-driven models are artificial neural networks (ANN) and their variants [3]. With the rapid development of optimization techniques, many swarm intelligence algorithms, such as the genetic algorithm (GA), particle swarm optimization (PSO) and Teaching-Learning-Based Optimization (TLBO), have also been combined with data-driven models to improve prediction performance, and results have proved their effectiveness for multi-scale predictions of building energy consumption [6,7]. As a predictive approach with strong adaptability and high accuracy, ensemble learning has also been increasingly applied to energy consumption prediction. Its core idea is to train different learners (weak learners) on the same training set and then combine them into a stronger learner (strong learner) for prediction tasks; Boosting-type ensemble models are a typical example. The most successful of these is AdaBoost, which was mainly applied to classification problems and has been extended to regression problems in recent years [8,9]. Xiao et al. [8] proposed an ELM-AdaBoost model for short-term power load forecasting, in which the hybrid of ELM and Boosting improved accuracy. Xiao et al. [9] introduced the AdaBoost algorithm into a hybrid predictive model with a selective ensemble strategy; the results showed that the proposed model outperformed the grouped autoregressive model and seven other hybrid models.
In addition to Boosting models, some parallel ensemble methods have also been applied to building energy forecasting. Li et al. [7,10,11] used tens of different sublearners for ensemble learning, with a variety of intelligent optimization algorithms for parameter tuning. The proposed ensemble strategy showed better adaptability and generalization ability than the compared models.

Although data-driven methods have performed well in building energy prediction, their use is limited by constraints in some practical scenarios [12]. For example, the dimensionality and quantity of the datasets strongly influence prediction accuracy. When dealing with large amounts of multi-feature data, complex neural networks generally yield more accurate predictions than simple linear regressions [13,14]. On the other hand, the reliance of advanced models on large amounts of training data also brings challenges, especially when a building's historical data are sparse or the data distribution changes over time. Before advanced machine learning algorithms can be used, dataset acquisition in real practical scenarios faces the following obstacles [15,16]:

(1) Sufficient data may be expensive or difficult to collect. The performance of a data-driven model depends on sufficient, labeled, high-quality data. However, for some older buildings, the available data are sparse or expensive to obtain (e.g., by manual meter reading), and for new buildings, little historical data has been collected.

(2) Traditional machine learning methods assume that training and testing data come from the same distribution. When a building's occupants, usage schedules or functions change, the data distribution changes, and the trained model must be readjusted before use to adapt to the new data.

(3) When a general predictive model is applied to a specific scenario, it needs to be adapted through retraining; otherwise, it suffers from a cold start problem.

In this context, increasing attention has been paid to transfer learning. Transfer learning is a relatively new machine learning approach that reuses pre-trained model(s) for other, similar tasks. Rather than developing a separate learning model for each task, transfer learning uses knowledge from other domains to solve problems in the target domain, thus reducing the need for data resources [17]. Recent studies have shown that techniques such as transfer learning and semi-supervised learning can significantly improve the performance of machine learning models [18,19]. Transfer learning has been successfully applied in various fields, such as image detection and classification [20,21], fault diagnosis [22] and natural language processing [23].

Owing to these advantages, transfer learning has also gradually been introduced into building energy consumption prediction. When the available data of the target building are insufficient, transfer learning methods can still achieve satisfactory prediction accuracy by adding similar data for training, thereby reducing the data acquisition cost for the target building [24]. In 2022, Yusun Ahn et al. [25] proposed a transfer learning approach based on a reference building's simulation dataset; the resulting transfer learning long short-term memory (TL-LSTM) model was used to forecast the next 24 h of power consumption of an office building. Milan Jain et al. [26] transferred data and parameters from a physics-based simulation framework to a field case to address the problems of sparse and unreliable data. Similar research can be found in [27,28]. In these papers, transfer learning was applied to prediction cases, but the difference between the data domains (source and target domains) was not fully discussed, and the transfer process was somewhat crude. Bowen Li et al. [29] showed that a large overlap between the features of two tasks is required, and that transferring information from an uncorrelated task may be detrimental to training a model for the target task. Chendong Wang et al. [30] proposed a hierarchical transfer model and a combined transfer model for heat load prediction in central heating stations; the effectiveness of transfer learning was verified, but how the similar dataset was selected was not specified. Dai et al. [31] used a re-weighting strategy, similar to the traditional AdaBoost method, to find and transfer useful data from the source domain to the target domain. The resulting TrAdaBoost algorithm can reduce the adverse effect of misleading source-domain data and produce an ensemble classifier in the target domain.
This method is still being actively improved [32,33]. Note that Dai's TrAdaBoost is a binary classifier and cannot be directly used for energy prediction tasks, which are regression problems.

1.2. Contribution and Novelty

According to the literature review, transfer learning is still at an early stage in building energy prediction applications. Most existing studies simply customize a transfer learning toolbox. How to use source-domain data as an auxiliary training set in training the entire model requires more in-depth analysis and modeling; moreover, the procedure for selecting building data from similar domains is either not described or relatively simplistic.

To address the above issues, a transfer learning model based on AdaBoost with automatic weight adjustment is proposed for building electricity consumption prediction. The main contributions of this paper include:

(1) To improve the accuracy of the learning model, an automatic weight adjustment mechanism is established to adjust the weights of the auxiliary training set and screen out the data that are helpful to the target domain;

(2) The Maximum Mean Discrepancy (MMD) is used to judge the similarity of two domains and screen the appropriate dataset for training so as to improve the model accuracy;

(3) A public dataset containing 549 educational buildings is used for model verification. The prediction results of the proposed model are compared with those of previously reported data-driven models and analyzed.

1.3. Paper Organization

Section 2 describes the algorithms used in the proposed model, including the principle of the transfer learning framework. Section 3 introduces the data sources and data preprocessing for the case study. Section 4 presents and analyzes the prediction results. Section 5 gives a brief conclusion.

2. Principles and Methods

2.1. Principle of Transfer Learning

Transfer learning extends the capabilities of machine learning and is a new direction in the field. The basic definition is as follows: given a source domain D_{s} and a learning task T_{s}, and a target domain D_{t} and a learning task T_{t}, transfer learning aims to acquire knowledge from the source domain D_{s} and learning task T_{s} to help improve the learning of the predictive function f_{t}(\cdot) in the target domain D_{t}, where D_{s} \neq D_{t} or T_{s} \neq T_{t} [34]. In a scenario suited to transfer learning, the target domain usually contains only a limited number of labeled and unlabeled samples, which is insufficient for effective model training, so similar training data from the source domain are needed. An example of the transfer learning process is shown in Figure 1.

Inspired by the AdaBoost algorithm, Dai et al. [31] first proposed a Boosting-style transfer learning algorithm, TrAdaBoost [35,36], which trains the learner by re-weighting samples from both the source and target domains. Concretely, to ensure the accuracy of the model in the target domain, the traditional AdaBoost rule is used to increase the weights of target-domain samples with incorrect predictions; to reduce the influence of incorrectly predicted samples in the source domain, a weight-update factor β is introduced to decrease their weights. As reported by Dai et al., the original TrAdaBoost algorithm only handles binary classification problems, and for this reason the target-domain error rate during the training iterations is restricted in the same way as in the traditional AdaBoost algorithm (i.e., it must stay below 1/2).
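For reference, one weight update of Dai's original binary TrAdaBoost can be sketched as a single Python function. This is a minimal illustration of the published rules, not code from this paper: samples are stored source first, labels are 0/1, the weighted error ε_t is measured on the target samples only, misclassified source samples are multiplied by β = 1/(1 + sqrt(2 ln n_src / N)), and misclassified target samples are divided by β_t = ε_t/(1 − ε_t). In Dai's notation the count inside β is the number of source (different-distribution) samples. The function name `tradaboost_update` and the clipping of ε_t are illustrative assumptions.

```python
import numpy as np


def tradaboost_update(w, pred, y, n_src, n_iters):
    """One weight update of Dai et al.'s binary TrAdaBoost (labels in {0, 1}).

    Rows 0..n_src-1 are source-domain samples; the remaining rows are target-domain.
    """
    w = np.asarray(w, dtype=float)
    miss = np.abs(np.asarray(pred) - np.asarray(y)).astype(float)  # 1 = wrong, 0 = right

    # Weighted error is measured on the target-domain samples only.
    w_tgt, miss_tgt = w[n_src:], miss[n_src:]
    eps = np.sum(w_tgt * miss_tgt) / np.sum(w_tgt)
    eps = float(np.clip(eps, 1e-10, 0.499))  # ASSUMED guard: TrAdaBoost needs eps < 1/2

    beta_t = eps / (1.0 - eps)                                    # < 1 because eps < 1/2
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_iters))  # fixed source factor

    new_w = w.copy()
    new_w[:n_src] *= beta ** miss[:n_src]        # misclassified source samples lose weight
    new_w[n_src:] *= beta_t ** (-miss[n_src:])   # misclassified target samples gain weight
    return new_w, eps
```

With three source and three target samples of equal weight, one wrong source prediction and one wrong target prediction give ε_t = 1/3 and β_t = 0.5, so the wrong target sample's weight doubles while the wrong source sample's weight shrinks.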

2.2. The Improved TrAdaBoost Algorithm

To extend the TrAdaBoost algorithm to time-series prediction problems, an improved TrAdaBoost algorithm (iTrAdaBoost) is proposed in this study. For target-domain training data, which share the distribution of the target task, incorrectly predicted samples are given higher weights so that these difficult samples receive more attention in subsequent iterations. For source-domain training data, which have a different but similar distribution, the weights of incorrectly predicted samples are reduced. In this way, the source-domain data that are most similar to the target task can be identified.

Compared with the basic TrAdaBoost algorithm, the main improvements for time-series prediction in this study are as follows: (1) For binary classification, the threshold in the basic TrAdaBoost algorithm is fixed. In the proposed iTrAdaBoost, the threshold γ changes according to the prediction error of the current iteration, so that the prediction accuracy of the model improves over the iterations. (2) Because data from the source and target domains affect the model differently, the two domains use different weight-update rules, as shown in Equations (5) and (6). The details of the improved TrAdaBoost algorithm are specified in Algorithm 1.

Algorithm 1 iTrAdaBoost algorithm

Input: target training dataset T_{a}, source training dataset T_{b}, merged training dataset T = T_{a} \cup T_{b}, target dataset S, weak predictive model (Learner) and number of iterations N.
Initialization: initialize the weight vector ω^{1} = (ω_{1}^{1}, ω_{2}^{1}, \dots, ω_{n + m}^{1}) with

ω_{i}^{1} = \begin{cases} 1 / n, & i = 1, \dots, n \\ 1 / m, & i = n + 1, \dots, n + m \end{cases}

(1)

where n denotes the number of samples from the target domain and m denotes the number of samples from the source domain.
For t = 1, \dots, N:

Set the normalized weight P^{t}:

P^{t} = \frac{ω^{t}}{\sum_{i = 1}^{n + m} ω_{i}^{t}}

(2)

Call Learner with the merged training data T and the sample weights P^{t} to obtain a hypothesis h_{t} : X \to Y.
Calculate the error of h_{t} on T_{b}:

e_{t} = h_{t}(x_{i}) - y(x_{i}),

(3)

where h_{t}(x_{i}) is the predicted result and y(x_{i}) is the actual result.
Set the mean square error E_{t} = \frac{1}{2} \sum_{i = 1}^{n + m} (h_{t}(x_{i}) - y(x_{i}))^{2} and update the threshold γ as

Δγ_{t} = -η \frac{\partial E_{t}}{\partial y_{t}},

(4)

where η is the learning rate.
Calculate the error rate of the prediction. If the error of h_{t} on T_{b} is less than the threshold γ_{t}, the prediction is considered accurate; if it is greater than the threshold γ_{t}, the prediction is considered inaccurate.
Set β_{t} = e / (1 - e) and β = 1 / (1 + \sqrt{2 \ln n / N}), where e = f / m and f is the number of inaccurate predictions.
Update the weight vector ω^{t + 1} as follows:

ω_{i}^{t + 1} = \begin{cases} ω_{i}^{t}, & e_{t} < γ_{t} \\ ω_{i}^{t} β_{t}, & e_{t} \geq γ_{t} \end{cases}, \quad i = 1, \dots, n;

(5)

ω_{i}^{t + 1} = \begin{cases} ω_{i}^{t}, & e_{t} \geq γ_{t} \\ ω_{i}^{t} β_{t}, & e_{t} < γ_{t} \end{cases}, \quad i = n + 1, \dots, n + m.

(6)

Output: The final result at the end of the iteration.
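The loop of Algorithm 1 can be sketched in Python as follows. This is a hedged, minimal sketch, not the authors' MATLAB implementation, and several details are assumptions because the algorithm leaves them open: a sample-weighted linear least-squares fit stands in for the BP weak learner; the per-sample error is the absolute error |h_{t}(x_{i}) − y(x_{i})|; the adaptive threshold γ_{t} is simply the mean absolute error on the target samples (a stand-in for the gradient rule of Equation (4)); e is the fraction of inaccurate target predictions; and the final output averages the hypotheses from the second half of the iterations, in the spirit of Dai's TrAdaBoost, which also keeps only its later hypotheses. The weight updates follow the intent stated at the start of Section 2.2 (inaccurate target samples gain weight through division by β_{t}, inaccurate source samples lose weight through multiplication by β), which is one reading of Equations (5) and (6) rather than a transcription of them. The names `itradaboost_sketch` and `weighted_linear_learner` are hypothetical.

```python
import numpy as np


def weighted_linear_learner(X, y, w):
    """ASSUMED stand-in weak learner: weighted least squares with an intercept."""
    A = np.hstack([X, np.ones((len(X), 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return lambda Xq: np.hstack([Xq, np.ones((len(Xq), 1))]) @ coef


def itradaboost_sketch(X_tgt, y_tgt, X_src, y_src, n_iters=20,
                       learner=weighted_linear_learner):
    n, m = len(X_tgt), len(X_src)
    X = np.vstack([X_tgt, X_src])                 # target rows first, as in Eq. (1)
    y = np.concatenate([y_tgt, y_src])
    w = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])  # Eq. (1)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_iters))        # beta as written in Algorithm 1
    models = []
    for _ in range(n_iters):
        p = w / w.sum()                           # Eq. (2): normalized weights
        h = learner(X, y, p)
        err = np.abs(h(X) - y)                    # ASSUMED per-sample absolute error
        gamma = err[:n].mean()                    # ASSUMED adaptive threshold
        bad = err >= gamma                        # "inaccurate" predictions
        e = np.clip(bad[:n].mean(), 1e-6, 0.499)  # ASSUMED: error rate on target rows
        beta_t = e / (1.0 - e)                    # < 1, so dividing by it up-weights
        w[:n] = np.where(bad[:n], w[:n] / beta_t, w[:n])  # target misses gain weight
        w[n:] = np.where(bad[n:], w[n:] * beta, w[n:])    # source misses lose weight
        models.append(h)
    late = models[n_iters // 2:]                  # ASSUMED output rule: later hypotheses
    return lambda Xq: np.mean([f(Xq) for f in late], axis=0)
```

Any function with the shape `learner(X, y, weights) -> predict_fn` can be passed in, for example a small BP network trained with a weighted loss; the linear stand-in only keeps the sketch short.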

2.3. Maximum Mean Discrepancy (MMD) Similarity Judgment

Maximum Mean Discrepancy (MMD) is frequently used to judge similarity in transfer learning [37]. It is a kernel learning method that measures the distance between two distributions in a reproducing kernel Hilbert space (RKHS). Directly computing the distance between two sets of samples drawn from different distributions is difficult; mapping the data into a suitable feature space simplifies the task of measuring the distance between the distributions of different but related random variables.

The basic principle of MMD is as follows. Assume a dataset X^{s} = [x_{1}^{s}, \dots, x_{n}^{s}] drawn from distribution P and a dataset X^{t} = [x_{1}^{t}, \dots, x_{m}^{t}] drawn from distribution Q. Let H be a reproducing kernel Hilbert space (RKHS) with a mapping function ϕ(\cdot): X \to H from the original space to the Hilbert space. The principle of MMD is illustrated in Figure 2, and the MMD of X^{s} and X^{t} can be formulated as

f(X^{s}, X^{t}) = \left\| \frac{1}{n} \sum_{i = 1}^{n} ϕ(x_{i}^{s}) - \frac{1}{m} \sum_{i = 1}^{m} ϕ(x_{i}^{t}) \right\|_{H}

(7)
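Equation (7) never requires ϕ explicitly. Squaring it and expanding the inner products gives the kernel form

MMD^{2} = \frac{1}{n^{2}} \sum_{i, j} k(x_{i}^{s}, x_{j}^{s}) + \frac{1}{m^{2}} \sum_{i, j} k(x_{i}^{t}, x_{j}^{t}) - \frac{2}{nm} \sum_{i, j} k(x_{i}^{s}, x_{j}^{t}), \quad k(a, b) = \langle ϕ(a), ϕ(b) \rangle_{H}

The Python sketch below evaluates this (biased) estimate with a Gaussian kernel. The paper does not state which kernel or bandwidth it uses, so the Gaussian kernel and `sigma` are assumptions; the features should be normalized first so that one bandwidth suits all columns.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))  # clamp tiny negatives


def mmd(Xs, Xt, sigma=1.0):
    """Biased empirical MMD between samples Xs (n x d) and Xt (m x d), as in Eq. (7)."""
    k_ss = rbf_kernel(Xs, Xs, sigma).mean()   # (1/n^2) sum k(x_s, x_s')
    k_tt = rbf_kernel(Xt, Xt, sigma).mean()   # (1/m^2) sum k(x_t, x_t')
    k_st = rbf_kernel(Xs, Xt, sigma).mean()   # (1/nm)  sum k(x_s, x_t)
    return np.sqrt(max(k_ss + k_tt - 2 * k_st, 0.0))
```

A smaller value means the two samples look more alike under the chosen kernel; identical samples give exactly zero.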

2.4. General Framework

Building energy consumption prediction is essentially a time-series prediction problem, and the performance of transfer learning models depends heavily on the similarity between the source and target tasks. In this study, a combined MMD-iTrAdaBoost predictive model is proposed for building electricity load forecasting. For buildings with sparse or low-quality data, the model finds similar building data in a public dataset using the MMD-based sample selection method. Here, building data refer to building energy consumption and meteorological data. Note that data for the target task belong to the target domain, and data selected by MMD belong to the source domain. After the target-domain data are divided into training and testing data, the training dataset for model construction is formed from the target-domain training data and the source-domain data. The transfer learning Boosting model is then established with the iTrAdaBoost algorithm described above, and its performance is validated on the target-domain testing data. The overall framework of the predictive model is shown in Figure 3.

3. Data Sources and Data Preprocessing

3.1. Data Sources

To verify the performance of the proposed predictive model, short-term electricity load prediction for an educational building is set as the target task. To improve prediction accuracy, data from another, similar building (the source domain for transfer learning) are used as auxiliary data for model training. The auxiliary dataset is selected from a public dataset released for the Great Energy Predictor III competition organized by ASHRAE. The public dataset covers over 1000 buildings in 14 locations belonging to 14 building types. The data include building energy consumption (WBE), outdoor dry-bulb temperature, humidity, wind direction, wind speed and sea-level pressure, recorded from 1 January 2016 to 31 December 2017. The classification of building types in the public dataset is shown in Figure 4.

These 1000+ buildings include 549 educational buildings, 279 office buildings, 184 entertainment buildings, 147 residential buildings and buildings of other types. In this study, the similar building is selected from buildings of the same type, i.e., educational buildings. First, one building is randomly selected from the educational buildings, and its MMD distance to the target building is calculated; a smaller distance indicates higher similarity. This process is repeated once for every educational building, and the building with the smallest distance is taken as the one most similar to the target building. The selected building has a floor area of 75,929 m² (target building: 182,943 m²). The different data distributions of the two buildings are shown in Figure 5 and Figure 6.
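The traversal described above amounts to an argmin of the MMD over all candidate educational buildings. A minimal Python sketch, assuming each building's data are already arranged as a normalized feature matrix with the same columns as the target; the dictionary `candidates`, the building ids and the Gaussian kernel are illustrative assumptions, not details from the paper.

```python
import numpy as np


def mmd_rbf(Xs, Xt, sigma=1.0):
    """Biased empirical MMD with an (ASSUMED) Gaussian kernel."""
    def k(A, B):
        sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))
    return np.sqrt(max(k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean(), 0.0))


def select_most_similar(target, candidates, sigma=1.0):
    """Return (building_id, distance) of the candidate with the smallest MMD to target.

    `candidates` is a hypothetical dict {building_id: feature matrix}.
    """
    distances = {bid: mmd_rbf(X, target, sigma) for bid, X in candidates.items()}
    best = min(distances, key=distances.get)  # smaller distance = higher similarity
    return best, distances[best]
```

Because every candidate is scored once, the visiting order does not affect the result.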

3.2. Data Preprocessing

A total of 2856 sets of hourly data, including energy consumption and meteorological data from January 2016 to April 2016, are selected as the target-domain dataset (called the target dataset). Following the iTrAdaBoost principle, this dataset is divided into two subsets for model training and testing. To simulate the lack of available data in the target building, only 10%–30% of the data are used for model training, and the rest are used for model testing. The role of the source-domain dataset is to assist model training when the target-domain training data are insufficient. A total of 4000 sets of hourly data, including energy consumption and meteorological data from January 2016 to June 2016, are selected from the source domain (the similar building) and denoted as the auxiliary dataset.
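Forming the training set from a 10%–30% share of the target data plus the auxiliary data can be sketched as follows. The paper does not say whether the target split is chronological or random; this Python sketch assumes a chronological split (earliest hours for training), which avoids training on hours that come after the test hours. The function names are hypothetical.

```python
import numpy as np


def split_target(X, y, train_ratio):
    """ASSUMED chronological split: first train_ratio of the hours train, the rest test."""
    n_train = int(len(X) * train_ratio)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


def build_training_set(X_tgt_train, y_tgt_train, X_aux, y_aux):
    """Merged set T = T_a ∪ T_b with target rows first, matching the Eq. (1) ordering."""
    X = np.vstack([X_tgt_train, X_aux])
    y = np.concatenate([y_tgt_train, y_aux])
    n, m = len(X_tgt_train), len(X_aux)
    return X, y, n, m
```

With the 2856-hour target dataset, a 10% ratio gives 285 training hours and 2571 test hours, and the merged training set then holds 285 + 4000 rows.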

Based on the information provided by the public dataset, the predictive model has 8 inputs, formulated as:

\text{Input vector} = [y(t - 1), T_{air}(t), P_{sea}(t), T_{dew}(t), W_{dir}(t), W_{spd}(t), sh, ch],

(8)

where y(t - 1) is the energy consumption in the previous hour, T_{air}(t) is the air temperature, P_{sea}(t) is the sea-level pressure, T_{dew}(t) is the dew-point temperature, W_{dir}(t) is the wind direction, W_{spd}(t) is the wind speed, and sh and ch are the sine and cosine values of the hour of the day. Other factors are known to affect building energy consumption, such as occupancy and heating or cooling degree days (HDD and CDD). From the perspective of transfer learning, only the information in the public dataset is adopted, without adding other manual data; for the same reason, the feature selection process is also omitted. Data normalization is carried out as follows:

y = 2 \times \frac{x - x_{min}}{x_{max} - x_{min}} - 1

(9)
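Assembling the eight inputs of Equation (8) and applying Equation (9) can be sketched in Python as below. The encoding sh = sin(2πh/24), ch = cos(2πh/24) for hour h is an assumption (the paper only says sine and cosine of the hour), as is passing wind direction through unchanged before scaling. The minimum and maximum used for scaling should be taken from the training data and reused for the test data and for mapping predictions back to kWh. All function names are hypothetical.

```python
import numpy as np


def build_inputs(load, air_temp, sea_pressure, dew_temp, wind_dir, wind_speed, hour):
    """Stack the eight inputs of Eq. (8); row k predicts load[k + 1]."""
    sh = np.sin(2 * np.pi * hour / 24)  # ASSUMED 24 h period for the hour encoding
    ch = np.cos(2 * np.pi * hour / 24)
    X = np.column_stack([load[:-1],                       # y(t - 1): previous hour's load
                         air_temp[1:], sea_pressure[1:], dew_temp[1:],
                         wind_dir[1:], wind_speed[1:], sh[1:], ch[1:]])
    return X, load[1:]                                    # inputs at t, target y(t)


def scale_to_pm1(x, x_min, x_max):
    """Eq. (9): map [x_min, x_max] onto [-1, 1]."""
    return 2 * (x - x_min) / (x_max - x_min) - 1


def unscale_from_pm1(z, x_min, x_max):
    """Inverse of Eq. (9), for converting scaled predictions back to physical units."""
    return (z + 1) * (x_max - x_min) / 2 + x_min
```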

In this study, the mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE) are used for performance evaluation. The three metrics are defined as follows:

MAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{| Y_{i} - \hat{Y}_{i} |}{Y_{i}} \times 100\%

(10)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (Y_{i} - \hat{Y}_{i})^{2}}{n}}

(11)

MAE = \frac{\sum_{i = 1}^{n} | Y_{i} - \hat{Y}_{i} |}{n}

(12)

where n is the number of samples in the dataset, Y_{i} is the actual value and \hat{Y}_{i} is the predicted value.
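Equations (10) to (12) translate directly into NumPy. A minimal sketch; MAPE is returned in percent, matching how results are reported in Section 4, and it assumes no actual load value is zero.

```python
import numpy as np


def mape(y_true, y_pred):
    """Eq. (10), in percent; assumes all actual values are nonzero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / y_true)


def rmse(y_true, y_pred):
    """Eq. (11)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Eq. (12)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))
```

For actual loads [100, 200, 400] kWh and predictions [110, 180, 400] kWh, these give MAPE = 6.67%, RMSE = sqrt(500/3) ≈ 12.91 kWh and MAE = 10 kWh.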

4. Results and Analysis

4.1. Parameter Setting

To investigate the effect of the proposed transfer learning method, a case study of short-term building electricity load prediction is carried out. The time step is hourly; a BP network is used as the weak learner in the proposed iTrAdaBoost algorithm, with an 8–10–1 structure (input layer–hidden layer–output layer), and the number of iterations is set to 100. The main parameter settings are specified in Table 1. All calculations are performed in MATLAB R2021a (Intel Core i7, 2.80 GHz, 16 GB RAM).
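The 8–10–1 BP weak learner can be sketched in NumPy as a one-hidden-layer network trained by full-batch gradient descent on a sample-weighted squared error, so that boosting weights such as P^{t} enter the loss directly. Table 1 is not reproduced here, so the tanh activation, learning rate, epoch count and initialization are assumptions, and this Python class (`BPNet8101`, a hypothetical name) is a stand-in for the MATLAB network used in the paper.

```python
import numpy as np


class BPNet8101:
    """Minimal 8-10-1 feed-forward network trained by weighted back-propagation."""

    def __init__(self, n_in=8, n_hidden=10, lr=0.05, epochs=2000, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input -> hidden
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.5, size=(n_hidden, 1))      # hidden -> output
        self.b2 = np.zeros(1)
        self.lr, self.epochs = lr, epochs

    def predict(self, X):
        return (np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2).ravel()

    def fit(self, X, y, sample_weight=None):
        n = len(X)
        if sample_weight is None:
            w = np.full(n, 1.0 / n)
        else:
            w = np.asarray(sample_weight, float) / np.sum(sample_weight)
        for _ in range(self.epochs):
            H = np.tanh(X @ self.W1 + self.b1)                    # hidden activations
            out = (H @ self.W2 + self.b2).ravel()                 # linear output
            g_out = 2.0 * w * (out - y)                           # d(sum w (out - y)^2)/d out
            g_H = np.outer(g_out, self.W2[:, 0]) * (1.0 - H ** 2) # back through tanh
            self.W2 -= self.lr * (H.T @ g_out)[:, None]
            self.b2 -= self.lr * g_out.sum()
            self.W1 -= self.lr * (X.T @ g_H)
            self.b1 -= self.lr * g_H.sum(axis=0)
        return self
```

Because the gradient is computed from the normalized weights, samples up-weighted by the boosting loop pull the fit more strongly, which is the role the weak learner plays in Algorithm 1.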

4.2. Results with Different Ratios of Target-Domain Data

Based on the proposed transfer learning model and the above parameter settings, hourly electricity load prediction for the target building is performed. To evaluate the performance of the proposed MMD-iTrAdaBoost model, the widely used BP artificial neural network and the typical ensemble model AdaBoost are also tested for comparison. The prediction results for three ratios of target-domain data are recorded in Table 2. The table shows that the transfer learning model always achieves the best prediction accuracy when only a small amount of the target building's historical data is available.

Table 2 also compares the predictions obtained with different ratios of target-domain data. When the ratio is set to 10%, i.e., only 10% of the target-domain data are used for model training, the proposed model shows a good transfer learning effect, and its prediction error (MAPE: 8.94%) is much lower than those of the two non-transfer learning models (BP: 16.79% and AdaBoost: 10.01%). When the ratio is increased to 20% and 30%, all three predictive models become more accurate. Concretely, in terms of MAPE, increasing the ratio to 30% decreases the BP model's prediction error from 16.79% to 13.22%, the AdaBoost model's from 10.01% to 7.51%, and the proposed model's from 8.94% to 6.72%.

These results confirm that the more training data from the target domain are available, the better all the predictive models perform. The proposed transfer learning model always achieves the best prediction accuracy among the three models. The reason is that a similar building is picked out by the MMD, and the useful data from the source domain (the similar building) are selected (weighted) by iTrAdaBoost to help train the target prediction model. In contrast, although the AdaBoost model has ensemble learning ability, it cannot effectively select source-domain data, so irrelevant source-domain data also participate in model training and degrade prediction accuracy. The BP neural network yields the lowest accuracy of the three models, which indicates that the performance of a simple BP model degrades when target-domain training data are insufficient.

To show more intuitively the accuracy improvement of the proposed model over the other two models under different ratios of target-domain training data, the percentage improvements in MAPE, RMSE and MAE are shown in Figure 7. Taking MAPE as an example, the improvements of the proposed model over the BP model all exceed 40%, and those over the AdaBoost model exceed 10%. The percentage improvement is calculated as

p = \frac{x - y}{x} \times 100\%

where x is the original value and y is the improved value. Figure 8 shows the real and predicted electricity loads of the three predictive models using 30% of the target-domain data for training. To show the accuracy differences between the proposed model and the other two models more clearly, Figure 9 gives one-week prediction results of the three models.
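As a check of the "over 40%" and "more than 10%" statements, the percentage-improvement formula can be applied to the MAPE values quoted in the text for the 10% and 30% ratios (the 20% values are in Table 2 and are not repeated here). A short Python sketch:

```python
def improvement(x, y):
    """Percentage improvement p = (x - y) / x * 100, x = baseline, y = proposed model."""
    return (x - y) / x * 100.0


# MAPE values (%) quoted in Section 4.2
mape = {"10%": {"BP": 16.79, "AdaBoost": 10.01, "MMD-iTrAdaBoost": 8.94},
        "30%": {"BP": 13.22, "AdaBoost": 7.51, "MMD-iTrAdaBoost": 6.72}}

for ratio, vals in mape.items():
    for base in ("BP", "AdaBoost"):
        p = improvement(vals[base], vals["MMD-iTrAdaBoost"])
        print(f"{ratio}: vs {base}: {p:.2f}%")
# 10%: vs BP: 46.75%
# 10%: vs AdaBoost: 10.69%
# 30%: vs BP: 49.17%
# 30%: vs AdaBoost: 10.52%
```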

4.3. Verification of MMD Effectiveness

To verify the effectiveness of the MMD similarity judgment, another educational building is manually chosen for auxiliary training: instead of using the MMD method, it is randomly selected from the educational building group of the public dataset. To keep the prediction process consistent, the same eight variables are used as model inputs, and the parameter settings remain unchanged. For brevity, the ratio of target-domain training data is set to 30%, and the prediction results of the three models are recorded in Table 3. The results show that even though the auxiliary training data were not selected by the MMD, the proposed iTrAdaBoost model still achieves the best prediction accuracy (MAPE: 12.26%, RMSE: 84.24 kWh and MAE: 74.15 kWh). This indicates that the sample-based transfer learning mechanism (iTrAdaBoost) can still pick out useful data from the source domain (another building) to help train the target model. However, compared with the results in Table 2, the accuracies of all three models deteriorate, which indicates that the MMD similarity judgment substantially helps select similar buildings and thus improves the accuracy of the transfer learning models. The real and predicted electricity loads of the three predictive models (without the MMD) are shown in Figure 10 and Figure 11.

5. Conclusions

In the field of building energy prediction, dataset acquisition usually faces obstacles in real practical scenarios, such as insufficient data for model training and data distributions that change over time and with other factors. A building electricity load forecasting method based on the MMD similarity judgement and the iTrAdaBoost algorithm is proposed. Unlike the basic TrAdaBoost algorithm, the iTrAdaBoost algorithm iteratively updates the weights of the source-domain data so that prediction accuracy is improved. A case study of an educational building is performed. When the target and source samples belong to different domains, the proposed method achieves better prediction accuracy than the BP and basic AdaBoost models. When the ratio of target-domain training data is varied (10%, 20% and 30%), the proposed transfer learning model always achieves the best performance and thus has great potential for practical application.

Note that only the single most similar dataset was chosen for auxiliary training. A multi-source transfer learning scheme may be considered in the future to further improve the model's transfer learning ability.

Author Contributions

K.L. designed the overall framework. B.W. developed the transfer learning model and performed the case studies. Q.T. and Y.L. prepared all the datasets and realized part of the algorithms. B.W. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (Grant No. 61873114), “Six Talents Peak” High-level Talents Program of Jiangsu Province (Grant No. JZ-053) and Youth Program of Agricultural Equipment Faculty of Jiangsu University (Grant No. NZXB20210211).

Data Availability Statement

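The data used in this study are publicly available from the Great Energy Predictor III competition organized by ASHRAE at https://www.kaggle.com/competitions/ashrae-energy-prediction/data (accessed on 14 November 2022).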

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Daut, M.A.M.; Hassan, M.Y.; Abdullah, H.; Rahman, H.A.; Abdullah, M.P.; Hussin, F. Building electrical energy consumption forecasting analysis using conventional and artificial intelligence methods: A review. Renew. Sustain. Energy Rev. 2017, 70, 1108–1118.
Li, K.; Xue, W.; Tan, G.; Denzer, A.S. A state of the art review on the prediction of building energy consumption using data-driven technique and evolutionary algorithms. Build. Serv. Eng. Res. Technol. 2020, 41, 108–127.
Khalil, M.; McGough, A.S.; Pourmirza, Z.; Pazhoohesh, M.; Walker, S. Machine Learning, Deep Learning and Statistical Analysis for forecasting building energy consumption—A systematic review. Eng. Appl. Artif. Intell. 2022, 115, 105287.
Fathi, S.; Srinivasan, R.; Fenner, A.; Fathi, S. Machine learning applications in urban building energy performance forecasting: A systematic review. Renew. Sustain. Energy Rev. 2020, 133, 110287.
Manfren, M.; James, P.A.; Tronchin, L. Data-driven building energy modelling–An analysis of the potential for generalisation through interpretable machine learning. Renew. Sustain. Energy Rev. 2022, 167, 112686.
Etemad, A.; Shafaat, A.; Bahman, A.M. Data-driven performance analysis of a residential building applying artificial neural network (ANN) and multi-objective genetic algorithm (GA). Build. Environ. 2022, 7, 109633.
Tian, J.; Li, K.; Xue, W. An adaptive ensemble predictive strategy for multiple scale electrical energy usages forecasting. Sustain. Cities Soc. 2021, 66, 102654.
Xiao, L.; Li, M.; Zhang, S. Short-term power load interval forecasting based on nonparametric Bootstrap errors sampling. Energy Rep. 2022, 8, 6672–6686.
Xiao, J.; Li, Y.; Xie, L.; Liu, D.; Huang, J. A hybrid model based on selective ensemble for energy consumption forecasting in China. Energy 2018, 159, 534–546.
Li, K.; Zhang, J.; Chen, X.; Xue, W. Building’s hourly electrical load prediction based on data clustering and ensemble learning strategy. Energy Build. 2022, 261, 111943.
Li, K.; Tian, J.; Xue, W.; Tan, G. Short-term electricity consumption prediction for buildings using data-driven swarm intelligence based ensemble model. Energy Build. 2021, 231, 110558.
Nastasi, B.; Manfren, M.; Groppi, D.; Lamagna, M.; Mancini, F.; Garcia, D.A. Data-driven load profile modelling for advanced measurement and verification (M&V) in a fully electrified building. Build. Environ. 2022, 221, 109279.
Li, X.; Jiang, H.; Liu, Y.; Wang, T.; Li, Z. An integrated deep multiscale feature fusion network for aeroengine remaining useful life prediction with multisensor data. Knowl.-Based Syst. 2022, 235, 107652.
Tronchin, L.; Manfren, M.; Nastasi, B. Energy analytics for supporting built environment decarbonisation. Energy Procedia 2019, 157, 1486–1493.
Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 1–40.
Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359.
Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
Reddy, S.; Akashdeep, S.; Harshvardhan, R.; Kamath, S. Stacking Deep learning and Machine learning models for short-term energy consumption forecasting. Adv. Eng. Inform. 2022, 52, 101542.
Peirelinck, T.; Kazmi, H.; Mbuwir, B.V.; Hermans, C.; Spiessens, F.; Suykens, J.; Deconinck, G. Transfer learning in demand response: A review of algorithms for data-efficient modelling and control. Energy AI 2022, 7, 100126.
Begum, N.; Hazarika, M.K. Maturity detection of tomatoes using Transfer Learning. Meas. Food 2022, 7, 100038.
Zhong, J.; Li, J.; Lotfi, A.; Liang, P.; Yang, C. An incremental cross-modal transfer learning method for gesture interaction. Robot. Auton. Syst. 2022, 155, 104181.
Jamil, F.; Verstraeten, T.; Nowé, A.; Peeters, C.; Helsen, J. A deep boosted transfer learning method for wind turbine gearbox fault detection. Renew. Energy 2022, 197, 331–341.
Ruder, S.; Peters, M.E.; Swayamdipta, S.; Wolf, T. Transfer learning in natural language processing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, Minneapolis, MN, USA, 2–7 June 2019; pp. 15–18.
Zhao, W.; Queralta, J.P.; Westerlund, T. Sim-to-real transfer in deep reinforcement learning for robotics: A survey. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, Australia, 1–4 December 2020; IEEE: New York, NY, USA, 2020; pp. 737–744.
Ahn, Y.; Kim, B.S. Prediction of building power consumption using transfer learning-based reference building and simulation dataset. Energy Build. 2022, 258, 111717.
Jain, M.; Gupta, K.; Sathanur, A.; Chandan, V.; Halappanavar, M.M. Transfer-learnt models for predicting electricity consumption in buildings with limited and sparse field data. In Proceedings of the 2021 American Control Conference (ACC), New Orleans, LA, USA, 26–28 May 2021; IEEE: New York, NY, USA, 2021; pp. 2887–2894.
Ye, R.; Dai, Q. A relationship-aligned transfer learning algorithm for time series forecasting. Inf. Sci. 2022, 593, 17–34.
Yang, K.; Lu, J.; Wan, W.; Zhang, G.; Hou, L. Transfer learning based on sparse Gaussian process for regression. Inf. Sci. 2022, 605, 286–300.
Li, B.; Rangarajan, S. A conceptual study of transfer learning with linear models for data-driven property prediction. Comput. Chem. Eng. 2022, 157, 107599.
Wang, C.; Yuan, J.; Huang, K.; Zhang, J.; Zheng, L.; Zhou, Z.; Zhang, Y. Research on thermal load prediction of district heating station based on transfer learning. Energy 2022, 239, 122309.
Dai, W.; Yang, Q.; Xue, G.-R.; Yu, Y. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 193–200.
Liu, B.; Liu, L.; Xiao, Y.; Liu, C.; Chen, X.; Li, W. AdaBoost-based transfer learning with privileged information. Inf. Sci. 2022, 593, 216–232.
Liu, B.; Liu, C.; Xiao, Y.; Liu, L.; Li, W.; Chen, X. AdaBoost-based transfer learning method for positive and unlabelled learning problem. Knowl.-Based Syst. 2022, 241, 108162.
Fan, C.; Sun, Y.; Xiao, F.; Ma, J.; Lee, D.; Wang, J.; Tseng, Y.C. Statistical investigations of transfer learning-based methodology for short-term building energy predictions. Appl. Energy 2020, 262, 114499.
Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the 13th International Conference on Machine Learning (ICML), Bari, Italy, 1996; pp. 148–156.
Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
Fang, X.; Gong, G.; Li, G.; Chun, L.; Peng, P.; Li, W. A general multi-source ensemble transfer learning framework integrate of LSTM-DANN and similarity metric for building energy prediction. Energy Build. 2021, 252, 111435.

Figure 1. An example of the transfer learning process.

Figure 2. Schematic of Maximum Mean Discrepancy.

Figure 3. The overall framework.

Figure 4. Classification of building types in the public dataset from the Great Energy Predictor III competition organized by ASHRAE.

Figure 5. Distributions of two buildings’ electricity loads.

Figure 6. Different meteorological data distributions of the two buildings: (a) air temperature, (b) dew-point temperature, (c) wind speed and (d) sea-level pressure.

Figure 7. Performance improvements of the proposed model over the BP and AdaBoost models under three ratios of target-domain training data: (a) MAPE, (b) RMSE and (c) MAE.

Figure 8. Real and predicted electricity loads using three predictive models, ratio of training data from target domain: 30%, 2000 h data.

Figure 9. Real and predicted electricity loads using three predictive models, ratio of training data from target domain: 30%, one week’s data.

Figure 10. Real and predicted electricity loads using three predictive models without MMD similarity judgment, 2000 h data.

Figure 11. Real and predicted electricity loads using three predictive models without MMD similarity judgment, one week’s data.

Table 1. Main parameter settings for the case study.

Parameter	Value
Prediction task	short-term building electricity load
Building type	educational building
Auxiliary data source	Great Energy Predictor III competition organized by ASHRAE
Similarity judgment method	MMD
Prediction algorithm	iTrAdaBoost
External meteorological data	air temperature, dew-point temperature, wind speed, wind direction and sea-level pressure
Time step	hourly
BP network structure	8–10–1
Learning rate	0.1
Initial threshold	0.1
Iteration number	100
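Table 1 lists the BP network settings. A minimal NumPy sketch of an 8–10–1 back-propagation network trained with those values (learning rate 0.1, 100 iterations) is given below. The sigmoid hidden activation, linear output, Gaussian weight initialization, full-batch gradient descent on the mean squared error, and the reading of "initial threshold" as the initial bias value are all assumptions of this sketch; the class name and the synthetic data are hypothetical.

```python
import numpy as np


class BPNetwork:
    """Minimal 8-10-1 back-propagation network (sigmoid hidden layer, linear output)."""

    def __init__(self, n_in=8, n_hidden=10, n_out=1, init_threshold=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
        self.b1 = np.full(n_hidden, init_threshold)  # assumption: threshold = initial bias
        self.W2 = rng.normal(0.0, 0.5, size=(n_hidden, n_out))
        self.b2 = np.full(n_out, init_threshold)

    def forward(self, X):
        self.h = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # sigmoid hidden layer
        return self.h @ self.W2 + self.b2  # linear output layer

    def fit(self, X, y, lr=0.1, n_iter=100):
        """Full-batch gradient descent on L = 0.5 * mean(err^2); returns the MSE history."""
        y = y.reshape(-1, 1)
        n = len(X)
        losses = []
        for _ in range(n_iter):
            err = self.forward(X) - y
            losses.append(float(np.mean(err**2)))
            grad_W2 = self.h.T @ err / n
            grad_b2 = err.mean(axis=0)
            # Back-propagate through the sigmoid before W2 is updated
            delta_h = (err @ self.W2.T) * self.h * (1.0 - self.h)
            grad_W1 = X.T @ delta_h / n
            grad_b1 = delta_h.mean(axis=0)
            self.W2 -= lr * grad_W2
            self.b2 -= lr * grad_b2
            self.W1 -= lr * grad_W1
            self.b1 -= lr * grad_b1
        return losses

    def predict(self, X):
        return self.forward(X).ravel()


# Hypothetical standardized inputs (8 features) and a synthetic target load
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 0.3 * (X @ rng.normal(size=8))
net = BPNetwork()
losses = net.fit(X, y, lr=0.1, n_iter=100)
preds = net.predict(X)
print(losses[0], losses[-1])
```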

Table 2. Performance comparison of the three predictive models with different ratios of training data from target domain.

Model	Ratio of Target-Domain Training Data	MAPE (%)	RMSE (kWh)	MAE (kWh)
BP	10%	16.79	117.7988	106.3294
AdaBoost	10%	10.01	80.4766	61.1933
iTrAdaBoost	10%	8.94	73.3029	58.3681
BP	20%	16.35	117.6454	105.0066
AdaBoost	20%	8.91	68.2793	52.3476
iTrAdaBoost	20%	7.37	64.8076	48.9637
BP	30%	13.22	110.7241	90.0883
AdaBoost	30%	7.51	64.0791	49.8123
iTrAdaBoost	30%	6.72	55.6497	43.2473
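Figure 7 summarizes the improvements of the proposed model over the BP and AdaBoost baselines. Assuming that an improvement is the relative reduction of each error metric with respect to the baseline (the exact definition behind Figure 7 is not restated in this section), the values can be recomputed from Table 2 as sketched below; the dictionary layout and function name are illustrative.

```python
# Error metrics from Table 2: (MAPE %, RMSE kWh, MAE kWh) per model and target-domain ratio
TABLE2 = {
    "10%": {"BP": (16.79, 117.7988, 106.3294),
            "AdaBoost": (10.01, 80.4766, 61.1933),
            "iTrAdaBoost": (8.94, 73.3029, 58.3681)},
    "20%": {"BP": (16.35, 117.6454, 105.0066),
            "AdaBoost": (8.91, 68.2793, 52.3476),
            "iTrAdaBoost": (7.37, 64.8076, 48.9637)},
    "30%": {"BP": (13.22, 110.7241, 90.0883),
            "AdaBoost": (7.51, 64.0791, 49.8123),
            "iTrAdaBoost": (6.72, 55.6497, 43.2473)},
}


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative error reduction of the proposed model with respect to a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


improvements = {
    ratio: {
        base: tuple(relative_improvement(b, p)
                    for b, p in zip(models[base], models["iTrAdaBoost"]))
        for base in ("BP", "AdaBoost")
    }
    for ratio, models in TABLE2.items()
}

for ratio, per_base in improvements.items():
    for base, (d_mape, d_rmse, d_mae) in per_base.items():
        print(f"{ratio} vs {base}: MAPE {d_mape:.2f}%, RMSE {d_rmse:.2f}%, MAE {d_mae:.2f}%")
```

For example, at the 30% ratio the MAPE reduction relative to AdaBoost is (7.51 − 6.72)/7.51, or about 10.5%.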

Table 3. Performance comparison of the three predictive models without MMD similarity judgment (ratio of training data from target domain: 30%).

Model	MAPE (%)	RMSE (kWh)	MAE (kWh)
BP	20.53	149.0932	135.4407
AdaBoost	14.02	95.4129	84.3469
iTrAdaBoost	12.26	84.2419	74.1522
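For reference, the three error metrics reported in Tables 2 and 3 can be computed as below. This sketch assumes the standard definitions of MAPE (in percent, with no zero actual loads), RMSE and MAE over the test samples; it is not taken from the authors' code, and the two-sample loads are hypothetical.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero actual loads)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the load (e.g., kWh)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the load (e.g., kWh)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical hourly loads in kWh
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE and grows faster when a few hours are badly predicted.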

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

