2.2.3. Demand Fingerprint and Synthetic Demand Generation

Figure 3 shows how the four parameters necessary to generate energy demand are created. The first two swim lanes in the flowchart show the identification of the clusters. The initial goal was to extract a demand shape, or fingerprint, from the data. Since human behavior, through weekly routines, greatly influences demand, the energy demand data were split into weekly samples. The peaks and troughs of the demand patterns were then identified and aligned to emphasize the demand's fingerprint. This alignment moved the peaks and troughs to the hourly locations 0, 3, 6, 15, 18, 21, and 24, since the peaks and troughs were found to occur around these times depending on the season. Actual values were kept for hours 9, 11, 12, and 13, since these values represent the midday demand dip caused by the Japanese lunch hour, which is noticeable year-round. After the alignment, because the goal was to extract the fingerprint rather than the magnitude, the demand was standardized using the z-transform (z-score). The standardized values were then passed to a fast Fourier transform (FFT) to extract the frequency components that make up the fingerprint. To reduce noise, only the daily variations (multiples of 7 cycles per week, i.e., the harmonics of the daily cycle in a weekly sample) were selected as features for the clustering algorithm. These feature values were again standardized using the z-transform to reduce the effect of magnitude in the distance calculation.
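The feature extraction described above can be illustrated with a minimal sketch. It assumes hourly data (168 samples per week), so that FFT bin *k* corresponds to *k* cycles per week and the daily harmonics fall on bins 7, 14, 21, and so on. The function name, the number of harmonics kept, and the choice to keep the real and imaginary parts as separate features are assumptions for illustration and are not taken from the original workflow; the peak and trough alignment step is omitted.

```python
import numpy as np

HOURS_PER_WEEK = 168  # 7 days x 24 hourly samples (assumed sampling rate)


def weekly_fingerprint_features(week, n_harmonics=4):
    """Return Fourier features of the daily harmonics for one week of hourly demand.

    The output holds the real parts of the selected bins followed by their
    imaginary parts, so its length is 2 * n_harmonics.
    """
    week = np.asarray(week, dtype=float)
    if week.shape != (HOURS_PER_WEEK,):
        raise ValueError("expected 168 hourly values for one week")

    # z-score standardization removes the magnitude and keeps only the shape.
    # A perfectly flat week (zero standard deviation) is not handled here.
    z = (week - week.mean()) / week.std()

    # Real-input FFT, normalized by the sample count; bin k = k cycles per week.
    spectrum = np.fft.rfft(z) / HOURS_PER_WEEK

    # Daily variations repeat 7 times per week, so keep bins 7, 14, 21, ...
    daily_bins = 7 * np.arange(1, n_harmonics + 1)
    selected = spectrum[daily_bins]
    return np.concatenate([selected.real, selected.imag])
```

Stacking these vectors for all weeks gives a feature matrix whose columns can then be standardized before clustering.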

These features were then clustered using the K-means algorithm through the *sklearn.cluster.KMeans* method of the *sklearn* Python library. Several values of K were explored, and through experimentation K = 5 was identified as the number of clusters that could explain the data. The clusters' fingerprints are shown in the inset of Figure 4. Once these clusters were identified, the weekday datasets were combined for each cluster, and another FFT was performed to extract the Fourier parameters necessary to represent the waveform. Combining the datasets after clustering emphasized the pattern of each cluster and removed the noise. As with the pre-clustering data, only the daily variations (at different frequencies depending on the sample sizes) were selected.
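A minimal sketch of this clustering step is given here, assuming the features are arranged as one row per week. The helper name, the use of *StandardScaler* for the z-score step, and the `n_init` and `random_state` settings are illustrative assumptions; the source states only that *KMeans* was used with K = 5.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_weeks(features, n_clusters=5, seed=0):
    """Standardize the weekly feature rows and group them with K-means.

    Returns the fitted KMeans model and one cluster label per week.
    """
    features = np.asarray(features, dtype=float)

    # Column-wise z-score so that every feature contributes comparably
    # to the Euclidean distance used by K-means.
    scaled = StandardScaler().fit_transform(features)

    # n_init and random_state are illustrative choices, not source settings.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return model, labels
```

Calling the helper with several values of `n_clusters` and comparing `model.inertia_` is one common way to explore K, although the source does not state which criterion was used.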

A classifier is needed to identify the appropriate fingerprint for each week during energy demand generation. Through data exploration, the maximum weekly temperature, the minimum weekly temperature, and the month in which the week falls were identified as features that could be used to classify each week. The *sklearn.neighbors.KNeighborsClassifier* method of the *sklearn* Python library was used with k = 5 to classify the weekly data. Over 1000 runs of an 80–20 training-test split, the classifier achieved an average accuracy score of 83%, with 66% and 95% as the minimum and maximum scores, respectively. This average accuracy was deemed acceptable, and the model was used as the fingerprint classifier.
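The repeated-split evaluation can be sketched as follows. The function name and the use of the loop index as the random seed are assumptions for illustration; whether the features were scaled or the splits stratified is not stated in the source, so neither is done here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def repeated_split_accuracy(X, y, n_repeats=1000, k=5, test_size=0.2):
    """Average k-NN accuracy over repeated random train/test splits.

    X holds one row per week (e.g., max temperature, min temperature, month);
    y holds the cluster label of each week.
    Returns (mean, minimum, maximum) of the test accuracy.
    """
    scores = np.empty(n_repeats)
    for i in range(n_repeats):
        # A different seed per repetition gives a different 80-20 split.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=i
        )
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        scores[i] = clf.score(X_te, y_te)
    return scores.mean(), scores.min(), scores.max()
```

Reporting the minimum and maximum alongside the mean, as in the text, indicates how sensitive the classifier is to the particular weeks that land in the test set.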

**Figure 3.** Generating the demand fingerprint based on the demand and temperature data.

**Figure 4.** The weekly demand clusters of Kyushu from FY2016–2019.

While the first two swim lanes in Figure 3 provided the demand's fingerprint, the last two provide the minimum and maximum values that stretch or compress the fingerprint.

Through data exploration, it was observed that non-holiday weekday temperature and demand are strongly correlated; thus, these data were extracted and fitted to known functions. Using the *scipy.optimize.curve_fit* method of the *scipy* Python library, as seen in Figure 5, the minimum temperature and minimum demand were fitted to a quadratic curve with an *R*<sup>2</sup> of 0.80. The curve fitting for the maximum temperature and maximum demand required a piecewise linear equation and was similarly fitted, with an *R*<sup>2</sup> of 0.88. Fitting of the weekend and holiday data was explored, but no meaningful functions were derived; thus, a simple weekday-to-weekend ratio was extracted by averaging all the known values. Seasonal variations in the ratio were initially explored, but no meaningful trend was seen; thus, the concept was dropped.
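The two temperature-to-demand fits can be sketched with *scipy.optimize.curve_fit* as follows. The function names, the parameterization, and the assumption that the piecewise linear model has exactly two segments joined at a single breakpoint are illustrative; the source does not give the functional form beyond "quadratic" and "piecewise linear".

```python
import numpy as np
from scipy.optimize import curve_fit


def quadratic(t, a, b, c):
    """Quadratic model for minimum demand versus minimum temperature."""
    return a * t**2 + b * t + c


def piecewise_linear(t, t0, y0, k1, k2):
    """Two line segments meeting at (t0, y0): slope k1 below t0, k2 above."""
    return np.where(t < t0, y0 + k1 * (t - t0), y0 + k2 * (t - t0))


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against data y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_with_r2(func, t, y, p0):
    """Fit func to (t, y) from the initial guess p0; return parameters and R^2."""
    params, _ = curve_fit(func, t, y, p0=p0)
    return params, r_squared(y, func(t, *params))
```

For the piecewise model, a reasonable initial guess for the breakpoint `t0` matters, since the fit is not smooth in that parameter.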

**Figure 5.** Correlation of Temperature and Demand in Kyushu.

Figure 6 shows how yearly demand is generated from temperature. The green input blocks represent the models, values, and functions generated in Figure 3, and the red input block represents the weekly temperature statistics of the selected year. Using these inputs, a fingerprint is assigned to each week and the minimum and maximum demand per day are determined. The fingerprint is then fitted to the daily min-max demand using the *scipy.optimize.curve_fit* method, which provides the *A*<sup>0</sup> and *B*<sup>0</sup> coefficients for the Fourier representation. This is repeated for every week to generate the entire year. When tested against the known values for 2017, 2018, and 2019, the synthetic demand achieved an *R*<sup>2</sup> of 0.8675, 0.8714, and 0.8177, respectively. A sample of the demand curve is shown in Figure 7, where the demand was closely synthesized. Discrepancies around holidays (e.g., New Year) are noticeable, and some weekends are not reproduced accurately. However, the general shape, or *fingerprint*, of the demand fits the actual values well.
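The idea of stretching or compressing a fingerprint to a daily demand range can be illustrated with a simplified sketch. This is a stand-in, not the fitting procedure of the source: it uses a closed-form linear rescaling per day instead of *scipy.optimize.curve_fit*, and the function names and the 7 x 24 hourly layout are assumptions.

```python
import numpy as np


def stretch_to_range(shape, target_min, target_max):
    """Linearly rescale a shape so its minimum and maximum hit the targets."""
    shape = np.asarray(shape, dtype=float)
    span = shape.max() - shape.min()
    if span == 0:
        raise ValueError("a flat shape cannot be stretched to a range")
    unit = (shape - shape.min()) / span  # now between 0 and 1
    return target_min + unit * (target_max - target_min)


def synthesize_week(fingerprint, daily_min, daily_max):
    """Rescale each day of a 168-hour fingerprint to that day's demand range.

    daily_min and daily_max hold 7 values each, e.g., predicted from the
    temperature-to-demand curves.
    """
    days = np.asarray(fingerprint, dtype=float).reshape(7, 24)
    return np.concatenate([
        stretch_to_range(day, lo, hi)
        for day, lo, hi in zip(days, daily_min, daily_max)
    ])
```

Because each day is rescaled independently, this simplified version can introduce small jumps at midnight; fitting a continuous Fourier representation, as described in the text, avoids that.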

**Figure 6.** Temperature-dependent demand generation.

**Figure 7.** Sample demand synthesis for 2018.
