*5.1. Dataset Introduction*

The data adopted to validate the superiority of the proposed method comes from Pennsylvania–New Jersey–Maryland (PJM), a regional transmission organization in the USA. It is part of the Eastern Interconnection grid and operates an electric transmission system serving all or parts of several states, with different companies supplying different regions. This paper applies three data sets from three companies: American Electric Power (AEP), Commonwealth Edison (COMED), and Dayton Power and Light Company (DAYTON). Each raw data set consists of hourly consumption in megawatts (MW); detailed information is given in Table 3. The data is available at kaggle.com/robikscube/hourly-energy-consumption.


**Table 3.** Description of the data sets.

The original data is utilized to validate the effectiveness of VSTF. For the other forecasting tasks, we use an overlapping sample algorithm, shown in Algorithm 1, to generate the corresponding samples for each forecast. The stride of the algorithm is defined as one. Notably, we adopt window lengths of 24 h, 168 h, and 720 h to generate the samples for STF, MTF, and LTF, respectively. One electricity consumption series at different durations is shown in Figure 3, in which the different types fluctuate differently; the VSTF and STF series fluctuate most frequently.

**Algorithm 1:** Overlapping sample algorithm

Return *dailyfeatures*, *dailylabels*, *weeklyfeatures*, *weeklylabels*, *monthlyfeatures*, *monthlylabels*
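The overlapping sampling above can be sketched as a sliding window with stride one; the window lengths of 24, 168, and 720 hours for the daily, weekly, and monthly samples follow the text, while the function name and the one-step-ahead label are our assumptions:

```python
import numpy as np

def overlapping_samples(series, window, horizon=1, stride=1):
    """Slide a window of length `window` over `series` with the given
    stride; each window becomes a feature vector and the following
    `horizon` points become its label (hypothetical helper)."""
    features, labels = [], []
    for start in range(0, len(series) - window - horizon + 1, stride):
        features.append(series[start:start + window])
        labels.append(series[start + window:start + window + horizon])
    return np.array(features), np.array(labels)

# Toy hourly load series; the real input would be the PJM consumption data.
hourly = np.arange(1000, dtype=float)

daily_features, daily_labels = overlapping_samples(hourly, window=24)
weekly_features, weekly_labels = overlapping_samples(hourly, window=168)
monthly_features, monthly_labels = overlapping_samples(hourly, window=720)
```

With stride one, consecutive samples overlap in all but one point, which is what lets a single hourly series yield enough training samples for the longer-horizon forecasts.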

**Figure 3.** The electricity consumption at different durations. (**a**) Hourly electricity consumption for VSTF. (**b**) Daily electricity consumption for STF. (**c**) Weekly electricity consumption for MTF. (**d**) Monthly electricity consumption for LTF.

A description of each data set for the different forecasts is shown in Table 4. The first 80% of the samples are utilized for training the model; the last 20% are utilized for validation. Before starting the experiments, we adopt Equation (27) to normalize each data set in order to eliminate the impact of different unit scales, where *x*′ is the normalized data point of time series *T* and *x* is the raw data sample.

$$x' = \frac{x - \min(T)}{\max(T) - \min(T)} \tag{27}$$
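A minimal sketch of the min–max normalization in Equation (27), together with the chronological 80/20 split described above (the function name and toy values are ours):

```python
import numpy as np

def min_max_normalize(T):
    """Scale a series into [0, 1] per Equation (27)."""
    return (T - T.min()) / (T.max() - T.min())

# Toy MW values standing in for one PJM consumption series.
series = np.array([15000.0, 18000.0, 21000.0, 24000.0, 30000.0])
normalized = min_max_normalize(series)

# First 80% of samples for training, last 20% for validation.
split = int(0.8 * len(normalized))
train, test = normalized[:split], normalized[split:]
```

Note the split is taken chronologically rather than at random, so the validation set always lies after the training period, as required for time-series forecasting.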


**Table 4.** The description of each data set for different forecasts.
