**1. Introduction**

Accurate, reliable, and timely electricity consumption information is key to ensuring a stable and efficient electricity supply. However, electricity consumption in daily life usually fluctuates with time, region, season, temperature, and social factors. Even within the same city, electricity consumption may vary between areas. Typically, the power company assigns fixed personnel to supply electricity to a fixed location. A local surge in electricity consumption can therefore disrupt the supply in that area and affect residents' daily lives. Forecasting future electricity consumption allows corresponding adjustments to be made in time to avoid this situation. Forecasts fall into three types according to the forecasting duration: short-term forecasting (STF), medium-term forecasting (MTF), and long-term forecasting (LTF). Generally, STF covers a time range from 24 h to one week, MTF covers one week to one month, and LTF covers ranges longer than the other two types [1,2].

Different types of electric power forecasting serve different purposes. Short-term electricity consumption forecasting supports the personnel and equipment arrangement for the next day. Medium-term electricity consumption forecasting gives decision support for the human resource allocation of the power company. Long-term electricity consumption forecasting is a significant basis for decisions from the macro perspective. To deal with emergencies such as line damage and natural disasters, very short-term (VST) power consumption forecasting is also essential. In this paper, we define very short-term electricity forecasting as hourly forecasting.

Different methods have been proposed for power forecasting, which mainly fall into three categories: regression-based, time series-based, and machine learning-based methods [3]. Regression-based methods can be divided into two sub-classes: ordinary regression, such as simple linear regression, lasso regression, and ridge regression, and autoregressive methods, such as vector auto-regression (AR) and vector moving average (MA). In particular, Tang et al. applied a LASSO-based approach to forecast current solar power generation using the past 30 days of data and achieved better results than a support vector machine-based method [4]. Yu et al. applied an improved AR-based method for short-term hourly load forecasting, which was tested on two kinds of real-time hourly data sets [5]. Ordinary regression only considers the relationship among current variables and needs additional related data, whereas the dependent variable is in fact affected by the relevant variables of both the current and past periods. The autoregressive model takes into account the impact of the current and past points, but it requires stationary data. To overcome these disadvantages of regression-based methods, time series-based methods have been introduced for energy consumption forecasting. The autoregressive integrated moving average (ARIMA) model is one of the most prominent time series-based models. It not only considers the impact of the current and past periods but can also be used for non-stationary data. The ARIMA model is denoted *ARIMA*(*p*, *d*, *q*), where *p* is the order of the autoregressive part, *q* is the order of the moving-average part, and *d* is the degree of differencing used to make the time series stationary. Usually, *d* ranges from 1 to 2; *p*, *q* range from 8 to 10 [3,6]. ARIMA has been employed for short-term power forecasting in [7,8]. Mitkov et al. [9] showed that ARIMA could be used for MTF and LTF in electricity forecasting.
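
For reference, the standard textbook form of the *ARIMA*(*p*, *d*, *q*) model can be written with the backshift operator $B$, where $B y_t = y_{t-1}$. The symbols $\phi_i$ (autoregressive coefficients), $\theta_j$ (moving-average coefficients), $c$ (constant), and $\varepsilon_t$ (white-noise error) are notation assumed here for illustration and are not defined elsewhere in this paper:

$$
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right)(1 - B)^{d}\, y_t \;=\; c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right)\varepsilon_t
$$

As a simple check, setting $p = q = 0$ and $d = 1$ gives $y_t - y_{t-1} = c + \varepsilon_t$, a random walk with drift: the series itself is non-stationary, but its first difference is stationary.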

The above regression-based and time series-based methods assume that the relationship between the past and the current time is linear. However, most of the hidden relationships are nonlinear. Machine learning-based methods can overcome this issue by using different nonlinear kernels, as in support vector machines (SVMs). Although some studies have successfully used SVMs to predict energy consumption, overfitting occurs when the data set is large [3,10]. Fortunately, deep learning-based methods can handle the overfitting problem well while giving good forecasting results. Recently, the convolutional neural network (CNN) [11], one of the most powerful deep learning methods, has been widely applied to power forecasting due to its excellent feature extraction capacity. Li et al. [12] reshaped the data into two dimensions as an image and then applied a CNN for short-term electrical load forecasting. A novel multi-scale CNN considering time-cognition was presented in [13] for multi-step short-term load forecasting. Suresh et al. developed a new sliding window algorithm to generate data for forecasting solar PV using a multi-head CNN for STF and MTF [14]. Kim et al. applied a CNN to VST photovoltaic power generation forecasting and compared it with the long short-term memory (LSTM) method, showing that the CNN-based method is better than LSTM for VST forecasting (VSTF) [15]. LSTM, another deep learning-based method, has been used for LTF and STF problems because of its long-term memory [16]. Ma et al. [17–20] employed LSTM for STF in the area of power. For LTF problems, Agrawal et al. presented a novel model combining LSTM and a recurrent neural network (RNN) to predict electricity loads five years ahead [21]. An enhanced deep model was proposed in Han's work [22] for STF and MTF of electric load. The attention mechanism was incorporated into LSTM for short-term photovoltaic power forecasting in Zhou's work [23].

To overcome the shortcomings of a single model, some hybrid models have been proposed for power forecasting. For example, Wang et al. [24] proposed ARIMA–LSTM for daily water level forecasting. It used LSTM to forecast the residuals from the results and then used ARIMA to train the model with the residuals. However, building so many ARIMA models to get the residuals is complex when the data size is massive. Another hybrid model, CNN–LSTM, was proposed in Kim's work [25] for minutely, hourly, daily, and weekly electricity energy consumption forecasting using multiple variables as input. Hu et al. [26] also applied CNN–LSTM to daily urban water demand forecasting using related meteorological data. However, collecting such correlated variables is hard and time-consuming in practice. Although Yan et al. proposed a hybrid CNN–LSTM to predict power consumption from raw time series, it focused only on VSTF (minutely) [27]. Moreover, Yan et al. [28] proposed a hybrid LSTM model in which the wavelet transform (WT) is first applied to preprocess the raw univariate time series, and the stationary parts of the transformation are then selected for VSTF (minutely). However, a drawback of Yan's work [28] is that the stationary parts still need to be selected by hand.

The limitations of current research on energy consumption forecasting are summarized as follows. On the one hand, most of the above methods focus on only one or two types of forecasts among VSTF, STF, MTF, and LTF. However, we need to know various types of future power consumption information to improve power supply efficiency and realize the smart grid. On the other hand, most existing methods rely on multi-variable regression, which requires collecting multiple related data sets. Motivated by this, we present a highly accurate deep model for various types of electricity forecasts that uses only the consumption history itself. We call this deep model multi-channels and scales CNN–LSTM (MCSCNN–LSTM). The proposed MCSCNN–LSTM employs dual channels as input to extract rich, robust feature representations from different domains of the raw data. One channel is the raw sample, and the other is the statistical information corresponding to the raw sample. We adopt a parallel CNN–LSTM structure, which differs from the conventional CNN–LSTM. First, the CNN part of this structure extracts multi-scale and global features from the first channel using multi-scale and wide convolution. Then, the LSTM part extracts features with long-term dependencies from the raw data. Finally, the features extracted by the CNN and the LSTM are combined with the statistics channel into comprehensive features used to forecast electricity consumption.
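
The dual-channel input described above can be illustrated with a minimal sketch. It assumes a simple sliding window over a univariate series and an arbitrary choice of summary statistics (mean, standard deviation, minimum, maximum). The function names, the window length, and the particular statistics are hypothetical and are not taken from the paper, whose actual data generation method is given in Section 2.

```python
from statistics import mean, pstdev


def make_windows(series, window, horizon):
    """Slide a fixed-length window over a univariate series.

    Returns (inputs, targets): each input holds `window` consecutive values,
    and each target is the value `horizon` steps after the window ends.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


def statistics_channel(sample):
    """Summary statistics of one raw sample (an assumed feature set)."""
    return [mean(sample), pstdev(sample), min(sample), max(sample)]


# Toy series for illustration only; not real consumption data.
series = [float(v) for v in range(10)]

# Channel 1: raw samples. Channel 2: statistics of each raw sample.
raw_channel, targets = make_windows(series, window=4, horizon=1)
stats_channel = [statistics_channel(sample) for sample in raw_channel]
```

Each raw sample and its statistics vector share the same index, so a model with two input branches could consume them side by side, while `targets` holds the value to be forecast for each pair.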

The biggest challenge is that a power consumption time series contains far fewer data points than vibration signals, images, or video. Because the obtained data are relatively low-dimensional, the CNN must be applied with care. The strategy of this paper is to use only a few pooling layers to reduce the loss of valuable information.
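
A short arithmetic sketch shows why pooling is costly for such short inputs. It uses the standard output-length formula for unpadded 1-D pooling; the input length of 24 points (one day of hourly readings) and the pool size of 2 are illustrative assumptions, not settings taken from this paper.

```python
def pooled_length(length, pool_size, stride=None):
    """Output length of unpadded 1-D pooling; stride defaults to pool_size."""
    stride = pool_size if stride is None else stride
    if length < pool_size:
        return 0
    return (length - pool_size) // stride + 1


def lengths_after_pooling(length, pool_size, layers):
    """Sequence lengths before pooling and after each successive pooling layer."""
    lengths = [length]
    for _ in range(layers):
        lengths.append(pooled_length(lengths[-1], pool_size))
    return lengths


# Hypothetical 24-point hourly window versus one 224-pixel image row.
hourly_day = lengths_after_pooling(24, pool_size=2, layers=4)
image_row = lengths_after_pooling(224, pool_size=2, layers=4)
```

Under these assumptions, four pooling layers reduce the 24-point window to a single value (24, 12, 6, 3, 1), whereas the 224-pixel row still retains 14 positions, which is why few pooling layers are preferable for short consumption series.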

The main contributions of this paper are summarized as follows:


The rest of the paper is organized as follows. Section 2 formalizes our problem and gives the data generation method. Section 3 introduces the theoretical background of the proposed approach, covering CNN, LSTM, and the statistical components. Section 4 gives the proposed architecture for electricity forecasting; each type of forecasting task is also defined in this section. In Section 5, comparative experimental studies on three datasets are carried out. In Section 6, we discuss the proposed deep model. Section 7 presents the conclusions and future work.
