## *3.1. Data Feature Extraction and Model Inputs*

Historical water demand data from three real DMAs in Beijing, China (DMA1, DMA2, and DMA3) were collected and used to train and test the forecasting model. Water demand was metered at the inlet of each DMA in units of m<sup>3</sup>, recorded every 15 min, and transferred to the database of the Beijing Water Works in real time. The water consumption pattern and customer composition of DMA1 are very different from those of DMA2 and DMA3: DMA1 includes more than 10,000 residential customers, 168 business customers, and 68 industrial customers, whereas DMA2 and DMA3 have 1822 and 1936 water customers, respectively, most of them residential with some business customers. The statistics of the three DMAs' water consumption data are shown in Table 1, and their water consumption at different times in one week is shown in Figure 3. The weekly demand curves in Figure 3 reveal different demand patterns among the three DMAs; for example, there is no obvious peak hour in the evening for DMA1, and there are no obvious morning peak hours on weekends for DMA3.

**Table 1.** Characteristics of water demand data in 2018 for the three case study district metering areas (DMAs).


**Figure 3.** One-week water consumption curves of the case study DMAs. (**a**) DMA1; (**b**) DMA2, and (**c**) DMA3.

In total, 8 weeks of data were collected from the 2018 water demand record for training and testing the forecasting model, giving 5376 observations for each DMA (56 days × 96 readings per day). The first seven weeks were used as training data, and the last week was used for model testing. When the hybrid framework is used to predict the water demand at the 96 time steps of the next day, the water demand data of the current day and all previous days are used for model training. For example, the historical water demand data of the previous 49 days are used to train the model that predicts the demand on the 50th day, the data of the previous 50 days are used to predict the demand on day 51, and so on.
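The growing training window described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `expanding_window_splits` and the placeholder series are assumptions, and only the figures of 96 readings per day, 56 days, and day 50 as the first test day come from the text.

```python
import numpy as np

STEPS_PER_DAY = 96  # one reading every 15 min, as stated in the text


def expanding_window_splits(n_days, first_test_day):
    """Yield (train_slice, test_slice) pairs, one per forecast day.

    Days are 1-based. To forecast day d, the readings of days 1..d-1 form
    the training set and the 96 readings of day d are the targets.
    """
    for day in range(first_test_day, n_days + 1):
        train_end = (day - 1) * STEPS_PER_DAY
        yield slice(0, train_end), slice(train_end, train_end + STEPS_PER_DAY)


# Placeholder series standing in for one DMA's 8-week record (5376 values).
demand = np.arange(56 * STEPS_PER_DAY, dtype=float)

splits = list(expanding_window_splits(n_days=56, first_test_day=50))
for train_idx, test_idx in splits:
    train, test = demand[train_idx], demand[test_idx]
    # A model would be retrained on `train` and evaluated on `test` here.
```

Each of the seven test days therefore gets its own model, trained on one more day of data than the model before it.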

When selecting the input data for the forecasting model from the historical water demand data, Guo et al. [9] categorized the historical data into three fragments (recent time, near time, and distant time) and selected five time-steps in each fragment as the input data. Herrera et al. [1] selected the historical water demand at three time-steps as the input data: the current time, the previous time, and the target time in the previous week. Ordan and Reis [7] selected six time-steps: four continuous time-steps before the target time, the target time on the previous day, and the target time in the previous week. According to these studies, the historical water demand at the current time, the previous time, the target time on the previous day, and the target time in the previous week is usually adopted as the model input in short-term water demand forecasting. In this study, to better model the characteristics of the water demand time series, a correlation analysis [7] is performed on the data of the three DMAs to find the historical data that are highly correlated with the water demand at the target time. Various combinations of the correlated data are then tested as the input for the forecasting model, and the following combination is identified as having the best performance: three continuous time-steps before the target time (*Q*<sub>*t*</sub>, *Q*<sub>*t*–1</sub>, *Q*<sub>*t*–2</sub>), the target time one day and two days earlier (*Q*<sub>*t*–95</sub> and *Q*<sub>*t*–191</sub>), and the target time in the previous week (*Q*<sub>*t*–671</sub>). Therefore, the historical data set (*Q*<sub>*t*</sub>, *Q*<sub>*t*–1</sub>, *Q*<sub>*t*–2</sub>, *Q*<sub>*t*–95</sub>, *Q*<sub>*t*–191</sub>, *Q*<sub>*t*–671</sub>) is adopted as the input data for the initial forecasting model in this study.
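The selected input set can be assembled from a 15-min demand series as in the sketch below. It assumes the target is the demand one step ahead of the current time *t*, which is what the index set implies (one day is 96 steps, one week is 672 steps). The function names `build_lagged_inputs` and `lag_correlation` are hypothetical, and the Pearson coefficient is used only as one plausible form of the correlation analysis.

```python
import numpy as np

# Offsets of each input relative to the target step t+1, in 15-min steps:
# Q_t, Q_{t-1}, Q_{t-2}, Q_{t-95}, Q_{t-191}, Q_{t-671}
LAGS = (1, 2, 3, 96, 192, 672)


def build_lagged_inputs(series, lags=LAGS):
    """Return (inputs, targets); each input row holds the lagged demands."""
    series = np.asarray(series, dtype=float)
    max_lag = max(lags)
    targets = series[max_lag:]
    inputs = np.column_stack(
        [series[max_lag - lag: len(series) - lag] for lag in lags]
    )
    return inputs, targets


def lag_correlation(series, lag):
    """Pearson correlation between the series and its copy shifted by `lag`."""
    series = np.asarray(series, dtype=float)
    return np.corrcoef(series[lag:], series[:-lag])[0, 1]


# Placeholder data: index values make the lag structure easy to inspect.
inputs, targets = build_lagged_inputs(np.arange(1000.0))
```

The first usable target is the 673rd reading, because the weekly lag needs 672 earlier values.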

## *3.2. Model Setup*

In addition to the hybrid forecasting model proposed in this study, two other forecasting models are established for comparison and to validate the performance of the proposed hybrid forecasting approach. As summarized in Table 2, the hybrid model H\_LSSVM\_Chaos is the one established by the hybrid framework of this study (see Figure 2), and the other two are a single forecasting model (S\_LSSVM) and a hybrid forecasting model (H\_LSSVM\_FS). The single forecasting model S\_LSSVM uses the traditional prediction procedure without an error correction module; in other words, only the initial forecasting module is used. The hybrid forecasting models (H\_LSSVM\_Chaos and H\_LSSVM\_FS) adopt both the initial forecasting module and the error correction module. The model inputs of the initial forecasting module are the feature data extracted from the historical water demand data, while the model inputs of the error correction module are the error series of the initial forecasting model. The error series can be evaluated according to Equation (1) and the flowchart in Figure 2. In the hybrid forecasting models, the initial forecasting module is the same one applied in the single forecasting model.
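How the two modules combine can be illustrated with a short sketch. Equation (1) is not reproduced in this section, so the sign convention used here (error equals observed demand minus initial forecast) is an assumption, as are the function names `error_series` and `corrected_forecast` and the numbers in the example.

```python
import numpy as np


def error_series(observed, initial_forecast):
    """Error of the initial forecasting module.

    Assumed convention: error = observed - initial forecast.
    """
    return np.asarray(observed, dtype=float) - np.asarray(initial_forecast, dtype=float)


def corrected_forecast(initial_forecast, predicted_error):
    """Hybrid forecast: initial forecast plus the predicted error.

    A single model such as S_LSSVM corresponds to predicted_error = 0.
    """
    return np.asarray(initial_forecast, dtype=float) + np.asarray(predicted_error, dtype=float)


# Made-up numbers for illustration only.
observed = np.array([10.0, 12.0, 11.0])
initial = np.array([9.5, 12.5, 11.0])
errors = error_series(observed, initial)
hybrid = corrected_forecast(initial, errors)  # perfect error prediction
```

Under this convention, the hybrid forecast improves on the single model only to the extent that the error correction module predicts the error series well.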

**Table 2.** Characteristics of forecasting models.


The hybrid model H\_LSSVM\_FS uses a Fourier series as the forecasting model of the error time series in the error correction module, which is similar to the approach used by Brentan et al. [29] and Ordan and Reis [7]. The inputs of the error correction modules in both hybrid models are the errors of the initial forecast produced by the S\_LSSVM model.
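A Fourier-series error model of this kind can be sketched with an ordinary least-squares fit. The sketch assumes that the fundamental period equals the length of the fitting window. The function names are hypothetical and the error values are random placeholders. The window of 672 values and the 336 harmonics in the example are the settings reported for H\_LSSVM\_FS.

```python
import numpy as np


def fourier_design(t, period, n_harmonics):
    """Design matrix with columns [1, cos(2*pi*k*t/period), sin(2*pi*k*t/period)]."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.arange(1, n_harmonics + 1)[None, :]
    angle = 2.0 * np.pi * k * t / period
    return np.hstack([np.ones_like(t), np.cos(angle), np.sin(angle)])


def fit_fourier(errors, n_harmonics):
    """Least-squares Fourier coefficients; the period is the window length."""
    errors = np.asarray(errors, dtype=float)
    design = fourier_design(np.arange(len(errors)), len(errors), n_harmonics)
    coeffs, *_ = np.linalg.lstsq(design, errors, rcond=None)
    return coeffs


def predict_fourier(coeffs, t, period):
    """Evaluate the fitted series at time indices t (may lie beyond the window)."""
    n_harmonics = (len(coeffs) - 1) // 2
    return fourier_design(t, period, n_harmonics) @ coeffs


rng = np.random.default_rng(0)
past_errors = rng.normal(size=672)  # placeholder: 7 days x 96 steps
coeffs = fit_fourier(past_errors, n_harmonics=336)
next_day_errors = predict_fourier(coeffs, np.arange(672, 768), period=672)
```

With 336 harmonics on a 672-value window, this fit reproduces the window exactly, so its extrapolation is the periodic continuation of the window. A smaller number of harmonics would smooth the error series instead.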

For the error correction module in the H\_LSSVM\_FS model, the error time series of the previous seven days (i.e., 672 values) is used to compute the coefficients of the Fourier series (FS), and the number of FS harmonics is set to 336. The LS-SVMlab Toolbox developed by Brabanter et al. [44] is used to train the LSSVM forecasting models, and the three-level Bayesian inference method is adopted for tuning the LSSVM parameters. Table 3 lists the model parameters used in the LSSVM and chaos methods. Parameters γ and δ<sup>2</sup> in Table 3 were obtained by the Bayesian method during LSSVM model training, while *m* and τ are the essential parameters for constructing the chaotic time series.
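The role of *m* and τ can be illustrated with a delay-embedding sketch. It assumes that *m* denotes the embedding dimension and τ the time delay, as is usual in phase-space reconstruction. The function name `delay_embed` and the values `m=3`, `tau=2` are illustrative only and are not the values of Table 3.

```python
import numpy as np


def delay_embed(series, m, tau):
    """Reconstruct phase-space vectors from a scalar time series.

    Row i is [x_i, x_{i+tau}, ..., x_{i+(m-1)*tau}].
    """
    series = np.asarray(series, dtype=float)
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested m and tau")
    return np.column_stack(
        [series[j * tau: j * tau + n_vectors] for j in range(m)]
    )


# Illustrative values; the error series of the initial forecast would be used here.
vectors = delay_embed(np.arange(10.0), m=3, tau=2)
```

Each row of `vectors` is one reconstructed state, and such rows could serve as inputs to a model that predicts the next value of the error series.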


**Table 3.** Model parameters for the application of LSSVM and chaos methods.
