Extraction and Joint Method of PV–Load Typical Scenes Considering Temporal and Spatial Distribution Characteristics

Wang, Xinghua; Zhong, Fucheng; Xu, Yilin; Liu, Xixian; Li, Zezhong; Liu, Jianan; Zhao, Zhuoli

doi:10.3390/en16186458



Department of Electrical Engineering, School of Automation, Guangdong University of Technology, Guangzhou 510006, China

* Author to whom correspondence should be addressed.

Energies 2023, 16(18), 6458; https://doi.org/10.3390/en16186458

Submission received: 2 August 2023 / Revised: 20 August 2023 / Accepted: 31 August 2023 / Published: 6 September 2023

(This article belongs to the Section A: Sustainable Energy)


Abstract:

To address the generation and integration of typical PV and load scenes in urban distribution networks containing photovoltaics, and the insufficient consideration of the spatiotemporal correlation between PV and load, this paper proposes a typical scene extraction method based on local linear embedding and kernel density estimation, together with a joint PV–load typical scene extraction method based on the FP-growth algorithm. First, the daily operation matrices of PV and load are constructed from historical operation data. Then, the typical scenes are extracted by local linear embedding dimensionality reduction followed by kernel density estimation. Finally, strong association rules between PV and meteorological conditions and between load and meteorological conditions are mined separately with the FP-growth algorithm, and the PV and load typical daily operation scenarios are associated using meteorological conditions as the link. The experiment uses one year of operation data from a distribution network containing PV in Qingyuan, Guangdong Province. The joint typical scene extraction method proposed in this paper is compared with the Latin hypercube sampling method and a k-means clustering-based scene generation method. The results show that, compared with the other two scenario generation methods, the typical scenarios obtained by the proposed method have a smaller error relative to the actual operating scenarios of the distribution network. The extracted typical PV and load scenarios fit the actual PV and load operation scenarios better and therefore have more reference value for the operation planning of actual distribution networks containing PV.

Keywords:

PV–load scenario association; FP-growth algorithm; local linear embedding; kernel density estimation; meteorological factors

1. Introduction

With the advancement of China’s dual-carbon target and the National Energy Administration’s PV system whole-county promotion pilot project, more and more rooftop-distributed PV power generation is being continuously connected to the distribution grid [1,2,3,4]. The output of distributed PV power generation is highly stochastic and correlated with meteorological conditions. The high penetration of distributed PV in the distribution network will put forward higher requirements for the optimization and control of power system dispatch [5,6]. How to generate reasonable PV and load daily operation scenarios for the distribution network and reduce the uncertainty of PV output, so as to improve the in situ PV consumption capacity and the safe and reliable operation of the distribution network, has become a difficult problem.

The scenario generation method is currently the main tool for describing PV output uncertainty. This method generates multiple deterministic scenarios based on variables related to PV output uncertainty that correspond to actual output situations. These deterministic scenarios are used to depict the uncertainty of new energy output. They provide a reference basis for the scheduling and optimal dispatch of short-term power generation plans on the grid [7,8].

Currently, there are three main methods for typical scenario generation, namely, mathematical statistical methods, clustering methods, and learning methods combined with artificial intelligence.

(1) Probability density function-based method. This method applies mathematical statistics to historical PV and load data to derive formulas that match the distribution characteristics of those data. Ref. [9] generated the stochastic component of residential load power using the Markov chain Monte Carlo sampling method, and generated the probability distribution of residential load profiles by considering meteorological factors and neighbourhood power usage characteristics. Ref. [10] constructed a PV–wind power correlation model using copula functions, generated a large number of PV–wind power scenes using the improved Latin hypercube sampling method, and finally applied k-means clustering to obtain the typical scenes.

(2) Cluster analysis-based approach. Clustering groups samples so that similarity is high within clusters and low between clusters, which reveals the intrinsic structure of the samples. Ref. [11] used the k-means clustering algorithm to cluster the load, PV, and wind data at each node separately and took the cluster centers as the typical scenarios for that node. Ref. [12] clustered one year of wind and PV power output data into 10 scenarios using k-medoids clustering, with the cluster centers as the typical output scenarios.

(3) Learning method combined with artificial intelligence. This method uses neural networks to learn the distribution characteristics of historical data, so that the network can generate the desired data series. Ref. [13] used conditional generative adversarial networks to generate accurate and reliable day-ahead scenarios, using meteorological information as conditions and historical PV data as input. The day-ahead pattern and seasonal variability of PV output are fully considered. Ref. [14] used conditional generative adversarial networks to generate stochastic sequences of wind and PV outputs, which were combined with k-means clustering to approximate historical wind and PV scenarios, ultimately generating scenarios that preserve the real data and take into account the stochastic nature of wind and solar energy. Ref. [15] divided the load data into different datasets according to meteorological conditions. Then, joint radiation–load–temperature scenarios were formed using a denoising variational autoencoder. Finally, a few typical scenes were formed using clustering.

In terms of scenario association, Ref. [16] constructed a correlation formula between PV and load based on copula theory to complete a PV–load association consistent with historical distribution characteristics, which provided a planning reference for distribution networks containing PV. Ref. [17] clustered wind power and PV historical data to form typical scenarios and then used a copula function to build joint scenarios for multiple wind–PV fields.

The above scenario generation methods share two drawbacks. They do not adequately consider the correlation between multiple meteorological factors and PV, or between multiple meteorological factors and loads, and they do not adequately describe the spatial and temporal distribution among the nodes in a distribution network. Therefore, this paper proposes a joint PV–load typical scenario method that accounts for spatial and temporal distribution correlation. Considering the spatial topology of the distribution network, a matrix structure for daily operation scenarios is established that takes into account the spatial and temporal distribution characteristics of the nodes and reflects the PV and load as a whole. After typical scenes are extracted using local linear embedding and kernel density estimation, the FP-growth algorithm is used to mine the association rules of PV–meteorological factors and load–meteorological factors, and the PV and load typical scenarios are joined with meteorological factors as the link. The result is a set of typical daily operation scenarios of the PV–load combination that can provide a reliable reference for the operation scheduling and planning of the distribution network.

2. Scene Extraction Based on Local Linear Embedding and Kernel Density Estimation

Urban medium-voltage (MV) distribution networks typically have a small number of stations and short feeders, and some stations contain distributed PV. It is therefore reasonable to create one daily operating scenario covering every PV and load node of the entire MV distribution network, instead of considering the daily operating scenario of each node separately. This facilitates the dispatch of the entire distribution network as well as generation plan scheduling and other tasks. To meet these requirements, this paper proposes a joint PV–load typical scenario method based on local linear embedding and kernel density estimation that accounts for spatial and temporal distribution characteristics. The main steps are shown in Figure 1.

2.1. Description Matrix of PV and Load Daily Operation Scenarios

The data required for this paper are the PV and load data at each sampling time, 24 h per day, for a complete year. Dates with a small amount of missing data were filled in using cubic spline interpolation [18], and dates with too much missing data were discarded. The result is PV and load operation data with complete 24 h records for every retained day. This ensures that all daily operation scenario matrices are complete and have identical numbers of rows and columns, which the dimensionality reduction in Section 2.2 and the kernel density estimation in Section 2.3 require: matrices with inconsistent structures could not be reduced to comparable values.

Figure 2 shows a distribution network with a known topology. Each node contains load operation data and may also contain distributed PV output data. The schematic diagram has fourteen load nodes, three of which also contain PV nodes. The daily operation scenario matrix of PV and loads is established based on this distribution network topology, as shown in Equation (1).

\begin{bmatrix} p_{1, 1} & p_{1, 2} & \dots & p_{1, n} \\ p_{2, 1} & p_{2, 2} & \dots & p_{2, n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{24, 1} & p_{24, 2} & \dots & p_{24, n} \end{bmatrix}

(1)

In Equation (1), each column of the daily operation scenario description matrix represents a node, and each row holds the data recorded at one sampling time across the nodes. The matrix thus describes the distribution of load and PV output over the nodes at each time section of the distribution network.

Each column of the scenario description matrix captures the temporal correlation of the PV output or load level at one node. In addition, new PV and load nodes added to the same distribution network structure can be described by adding new rows and columns or by creating new matrices. The matrix therefore reflects the temporal and spatial distribution characteristics of distribution network PV and load while also accommodating their growth.
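As a minimal sketch of how such matrices could be assembled, hourly readings can be grouped by day into 24-row, n-column arrays. The record layout `(day, hour, node, value)` and the function name `build_daily_matrices` are assumptions made for illustration; they are not taken from the original method, and gap filling is assumed to have been done beforehand.

```python
from collections import defaultdict

def build_daily_matrices(records, nodes):
    """Group hourly readings into one 24 x n matrix per day.

    records: iterable of (day, hour, node, value) tuples, hour in 0..23
             (hypothetical layout chosen for this sketch).
    nodes:   ordered list of node names; it fixes the column order.
    Days that are still incomplete are dropped.
    """
    col = {node: j for j, node in enumerate(nodes)}
    raw = defaultdict(dict)
    for day, hour, node, value in records:
        raw[day][(hour, col[node])] = value

    matrices = {}
    for day, cells in raw.items():
        if len(cells) != 24 * len(nodes):
            continue  # incomplete day: discard
        matrices[day] = [[cells[(h, j)] for j in range(len(nodes))]
                         for h in range(24)]
    return matrices

# Synthetic example: two nodes, one complete day and one incomplete day.
records = [(1, h, node, float(h + k))
           for h in range(24)
           for k, node in enumerate(["n1", "n2"])]
records.append((2, 0, "n1", 0.5))  # day 2 has a single reading only
daily = build_daily_matrices(records, ["n1", "n2"])
```

Each row of `daily[1]` is one sampling time and each column is one node, matching the layout of Equation (1).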

2.2. Matrix Dimensionality Reduction Based on Local Linear Embedding Method

Local linear embedding (LLE) is a non-linear dimensionality reduction algorithm based on manifold learning. It achieves dimensionality reduction and visualization of high-dimensional data without destroying the manifold structure of the data [19,20]. The local information of the original data is preserved while redundant information is removed, and the result is almost unaffected by noise [21]. The numerical arrangement inside each scene description matrix contains temporal correlation features for different nodes and times, so each daily scene matrix has a different internal manifold structure. The basic implementation steps are as follows:

Step 1: Select the neighborhood size k. For each data point, it is first necessary to select its k-nearest neighbors. Usually, Euclidean distance

d_{ij} = {‖ x_{i} - x_{j} ‖}_{2}

is used as the distance measure. The value of k needs to be tuned for the specific dataset and the desired dimensionality reduction effect.

Step 2: Calculate the weight matrix W. For each data point

x_{i}

, use linear regression to represent it as a linear combination

x_{i} = \sum_{j = 1}^{k} w_{ij} x_{j}

of its k-nearest neighbors.

Step 3: Determine the objective loss function. Suppose we have m n-dimensional samples

x_{1}, x_{2}, \dots, x_{m}

; we can use the mean square deviation as the loss function of the problem, as shown in the following formula:

J (w) = \sum_{i = 1}^{m} {‖ x_{i} - \sum_{j = 1}^{k} w_{ij} x_{j} ‖}_{2}^{2}

(2)

Normalize the weight coefficient

w_{ij}

in Equation (2), that is, the weight coefficient needs to meet:

\sum_{j = 1}^{k} w_{ij} = 1

. By using these two formulas, the weight coefficients can be determined.

Step 4: Objective function optimization. Because the weights sum to 1, the residual of each sample can be written as a weighted sum of its differences from its neighbors, and

J (w)

takes the following matrix form:

J (w) = \sum_{i = 1}^{m} W_{i}^{T} Z_{i} W_{i}

(3)

Here,

W_{i} = {(w_{i 1}, w_{i 2}, \dots, w_{ik})}^{T}

,

Z_{i} = X_{i} X_{i}^{T}

, and

X_{i} = {(x_{i} - x_{1}, x_{i} - x_{2}, \dots, x_{i} - x_{k})}^{T}

, where x_{1}, \dots, x_{k} denote the k-nearest neighbors of x_{i}. The constraint condition can be written as

\sum_{j = 1}^{k} w_{ij} = W_{i}^{T} l_{k} = 1

, where

l_{k}

is the k-dimensional vector whose entries are all 1.

Step 5: Solve the problem with the Lagrange multiplier method. First, construct a Lagrange function, as shown in Equation (4):

L (W) = \sum_{i = 1}^{m} W_{i}^{T} Z_{i} W_{i} + λ (W_{i}^{T} l_{k} - 1)

(4)

Then, take its derivative and set its value to 0 to obtain

W_{i} = λ^{'} Z_{i}^{- 1} l_{k}

, where

λ^{'}

is a constant. Finally, the weight coefficient

W_{i} = \frac{Z_{i}^{- 1} l_{k}}{l_{k}^{T} Z_{i}^{- 1} l_{k}}

can be obtained.
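For completeness, the step from the Lagrange function to the closed-form weights can be written out as follows. This is a sketch for a single sample i, assuming that Z_{i} is symmetric and invertible and writing the constraint as W_{i}^{T} l_{k} = 1:

```latex
\begin{aligned}
\frac{\partial L}{\partial W_{i}} &= 2 Z_{i} W_{i} + \lambda l_{k} = 0
  \;\Rightarrow\; W_{i} = -\tfrac{\lambda}{2} Z_{i}^{-1} l_{k} = \lambda' Z_{i}^{-1} l_{k}, \\
l_{k}^{T} W_{i} &= \lambda' \, l_{k}^{T} Z_{i}^{-1} l_{k} = 1
  \;\Rightarrow\; \lambda' = \frac{1}{l_{k}^{T} Z_{i}^{-1} l_{k}}, \\
W_{i} &= \frac{Z_{i}^{-1} l_{k}}{l_{k}^{T} Z_{i}^{-1} l_{k}} .
\end{aligned}
```

In practice Z_{i} can be singular (for example when k exceeds the sample dimension), in which case a small multiple of the identity is commonly added before inversion.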

Step 6: Assume that the projections of the above m n-dimensional samples

x_{1}, x_{2}, \dots, x_{m}

into the low-dimensional d-dimensional space are

y_{1}, y_{2}, \dots, y_{m}

. To preserve the local linear relationships, the loss function

J (y)

must be minimized:

J (y) = \sum_{i = 1}^{m} {‖ y_{i} - \sum_{j = 1}^{m} w_{ij} y_{j} ‖}_{2}^{2} = \sum_{i = 1}^{m} {‖ Y^{T} I_{i} - Y^{T} W_{i} ‖}_{2}^{2} = Tr (Y^{T} (I - W) (I - W)^{T} Y) = Tr (Y^{T} MY)

(5)

Here, Y is the m × d matrix whose i-th row is y_{i}^{T}; I_{i} and W_{i} are the i-th columns of the m × m identity matrix I and of the m × m weight matrix W, with w_{ij} = 0 when x_{j} is not a neighbor of x_{i}; and

M = (I - W) (I - W)^{T}

. The constraint conditions of Equation (5) are

\sum_{i = 1}^{m} y_{i} = 0;

\frac{1}{m} \sum_{i = 1}^{m} y_{i} y_{i}^{T} = I

, that is,

Y^{T} Y = mI

.

Next, use the Lagrange multiplier method to solve the above equation. The Lagrange multiplier method formula is as follows:

L (Y) = Tr (Y^{T} MY) + λ (Y^{T} Y - mI)

(6)

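Setting the derivative of Equation (6) with respect to Y to zero yields an eigenvalue problem for M. Finally, the matrix

Y = (y_{1}, y_{2}, \dots, y_{d})

whose columns are the eigenvectors of M corresponding to its d smallest non-zero eigenvalues gives the d-dimensional embedding of the dataset.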

The computational complexity of the subsequent kernel density estimation increases dramatically for high-dimensional data, and the estimate becomes inaccurate because of the curse of dimensionality. KDE works best on one-dimensional data. In this paper, each matrix of 24 rows and n columns (n denotes the number of nodes) is therefore reduced to one dimension. The PV and load datasets used for kernel density estimation are then composed separately from these values. Since each matrix has a different internal manifold structure, the values obtained by applying the same dimensionality reduction to every matrix still characterize the differences between the manifold structures of the matrices.
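The LLE steps of this section can be sketched in NumPy as below. This is an illustrative sketch, not the authors' implementation: it assumes that each daily matrix is flattened into a vector of length 24n and that the days are the samples, and the function name `lle_embed`, the regularization term `reg`, and the synthetic data are all hypothetical.

```python
import numpy as np

def lle_embed(X, k=5, d=1, reg=1e-3):
    """Locally linear embedding of the rows of X into d dimensions."""
    m = X.shape[0]
    W = np.zeros((m, m))                        # row i holds the weights w_ij
    for i in range(m):
        dist = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dist)[1:k + 1]        # k nearest neighbours, skipping x_i
        D = X[i] - X[nbrs]                      # k x n matrix of differences (X_i)
        Z = D @ D.T                             # local Gram matrix Z_i
        # Regularization is an assumption of this sketch; it keeps Z invertible.
        Z += (reg * np.trace(Z) + 1e-12) * np.eye(k)
        w = np.linalg.solve(Z, np.ones(k))      # proportional to Z^{-1} l_k
        W[i, nbrs] = w / w.sum()                # normalize so the weights sum to 1
    # With weights stored in rows, M = (I - W)^T (I - W); this is the same matrix
    # as (I - W)(I - W)^T when the weights are stored in columns.
    I_W = np.eye(m) - W
    M = I_W.T @ I_W
    vals, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    # Skip the first eigenvector (eigenvalue close to 0) and scale so Y^T Y = m I.
    return vecs[:, 1:d + 1] * np.sqrt(m)

rng = np.random.default_rng(0)
days = rng.random((40, 24, 3))                  # 40 synthetic daily matrices, n = 3
X = days.reshape(len(days), -1)                 # flatten each 24 x n matrix
y = lle_embed(X, k=6, d=1)[:, 0]                # one reduced value per day
```

The vector `y` plays the role of the one-dimensional dataset that is passed on to kernel density estimation. The neighborhood size `k` has to be tuned per dataset, as noted in Step 1.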

2.3. Recognition of Candidate-Typical Scenes Based on Kernel Density Estimation

Next, the probability distributions of the PV and load datasets are estimated separately. This paper uses kernel density estimation (KDE), which can estimate the probability distribution of a dataset whose distribution is unknown [22,23,24,25].

The unknown dataset in this article is denoted as

P_{t} = \{p_{1}, p_{2}, \dots, p_{t}\}

(t represents the number of values in the dataset). Each p_{i} is the dimensionality-reduced value of one daily operation scenario description matrix, and the result of KDE is

{\hat{f}}_{h} (x)

. Then, the KDE at any point x is:

{\hat{f}}_{h} (x) = \frac{1}{t} \sum_{i = 1}^{t} \frac{1}{h} K (\frac{x - p_{i}}{h})

(7)

Here,

{\hat{f}}_{h} (x)

is the estimated density function of the random variable, and K(·) is the kernel function, which satisfies

K (x) \geq 0, \int K (x) dx = 1

. The bandwidth h is a preset parameter that controls the smoothness of the estimated density curve.

Considering the ease of use in waveform synthesis calculations, a Gaussian kernel function is generally selected, with the formula:

K (x) = \frac{1}{\sqrt{2 π}} \exp (- \frac{1}{2} x^{2})

(8)

The KDE method described above is applied to the representative value of each dimensionality-reduced daily scene description matrix, which gives the probability distributions of the PV and load scenes, respectively. The scenarios with higher probability densities in the PV and load datasets are selected as the candidate typical scenario sets.
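The Gaussian-kernel estimate and the selection of high-density candidates can be sketched in plain Python as follows. The function names, the bandwidth, and the sample values are hypothetical and chosen only for illustration.

```python
import math

def gaussian_kde(samples, h):
    """Return f(x) = 1/(t*h) * sum_i K((x - p_i)/h) with a Gaussian kernel K."""
    t = len(samples)
    norm = 1.0 / (t * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in samples)

    return density

def top_candidates(values, h, count):
    """Indices of the `count` days whose reduced value has the highest density."""
    f = gaussian_kde(values, h)
    ranked = sorted(range(len(values)), key=lambda i: f(values[i]), reverse=True)
    return ranked[:count]

# Hypothetical reduced values for seven days: five lie near 0, two are outliers.
values = [0.00, 0.05, -0.05, 0.10, -0.10, 2.0, -3.0]
candidates = top_candidates(values, h=0.2, count=3)
```

Days whose reduced values sit in the dense region near 0 are returned, while the two isolated days receive low density and are not selected.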

2.4. Typical Scene Extraction Based on Wasserstein Distance

After the candidate set of typical scenes has been identified, some candidates satisfy the probability threshold but differ little in the data distribution within their scenario matrices. To remove such candidates, improve the representativeness of the typical scenarios, and reduce the complexity of the association rules, this paper uses the Wasserstein distance to measure the similarity between candidate typical scenes [26,27,28]. Redundant scenarios that are too similar to another candidate are discarded even if they satisfy the probability threshold requirement. The result is a set of typical daily operation scenarios that are meteorologically representative and differ significantly from one another.

The Wasserstein distance can be used to measure the similarity between two distributions. The minimum value of the distance that the data moves from distribution p to distribution q is the Wasserstein distance between the two distributions. This article uses Wasserstein distance to measure the similarity between matrices. The smaller the Wasserstein distance between two matrices, the higher the similarity between them. The formula is:

W (p, q) = \inf_{γ \in Π (p, q)} E_{x, y \sim γ} [‖ x - y ‖]

(9)

Here,

Π (p, q)

represents the set of all joint probability distributions whose marginal distributions are p and q. For each possible joint probability distribution

γ

, sample

(x, y) \sim γ

from it to obtain a sample x and y; then, calculate the distance between the two samples. Finally, the expected distance

E_{x, y \sim γ} [‖ x - y ‖]

of the sample under the joint probability distribution

γ

can be calculated. The lower bound of the expected value in all possible joint probability distribution is the Wasserstein distance between the two distributions.

In this article, if the Wasserstein distance between two candidate scene matrices is too small, the two daily scene matrices are considered similar, and only the one with the higher probability density is retained as a typical scenario. The lower limit on the Wasserstein distance can be adjusted so that six to ten typical PV scenes and ten to fifteen typical load scenes are obtained per year. This number of typical PV and load scenarios is suitable for distribution network operation planning [29,30,31,32,33,34].
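A possible form of this screening step is sketched below. It rests on two assumptions that the text does not specify: each scenario matrix is flattened and treated as a one-dimensional empirical distribution, and candidates are screened greedily in order of decreasing density. The names `wasserstein_1d` and `filter_similar` and the numbers are hypothetical.

```python
def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size empirical samples.

    For one-dimensional samples of equal size, the optimal transport plan
    pairs the sorted values, so the distance is the mean absolute difference.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def filter_similar(candidates, densities, min_distance):
    """Keep a candidate only if it is at least min_distance away from
    every candidate already kept; higher-density candidates go first."""
    order = sorted(range(len(candidates)),
                   key=lambda i: densities[i], reverse=True)
    kept = []
    for i in order:
        if all(wasserstein_1d(candidates[i], candidates[j]) >= min_distance
               for j in kept):
            kept.append(i)
    return kept

# Three flattened candidate scenarios with hypothetical values and densities.
scenes = [[0.0, 1.0, 2.0, 1.0], [0.1, 1.1, 2.1, 1.1], [0.0, 3.0, 6.0, 3.0]]
density = [0.9, 0.8, 0.5]
typical = filter_similar(scenes, density, min_distance=0.5)
```

Here the second scenario is almost identical to the first and has the lower density, so only the first and third are retained. Raising or lowering `min_distance` changes how many typical scenes survive.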

3. PV–Load Typical Scene Joint Based on FP-Growth Algorithm

3.1. Variable Quantile Division Considering the Distribution Characteristics of Meteorological Factors

Due to the strong correlation between PV output and meteorological conditions, it is necessary to first perform feature extraction and correlation analysis on some meteorological impact factors. The meteorological factors taken are the radiation, temperature, and light duration corresponding to each day of the year. Meteorological feature

W

can be extracted by the following formula:

\begin{cases} W = \{F, T, S\} \\ F = \{F_{d\_max}, F_{d\_mean}, F_{d\_difmax}, F_{d\_difmean}\} \\ T = \{T_{d\_max}, T_{d\_min}, T_{d\_mean}\} \\ S = \{S_{d\_time}\} \end{cases}

(10)

In Formula (10), F, T, and S are radiation, temperature, and illumination time, respectively. The subscripts

d\_max, d\_min, d\_mean

represent the daily maximum, minimum, and average values, respectively;

d\_difmean

is the mean of the first-order difference, and

d\_difmax

is the maximum value of the first-order difference.

To facilitate the generation of an association rule library with the FP-growth algorithm, the meteorological features must be processed further for association analysis. If fixed quantiles are used to grade the meteorological factors, values that belong to the same region of the distribution may be split across grades. To avoid this, this paper proposes a KDE-based quantile division method that divides each meteorological factor into several grades. First, KDE [35,36,37,38] is carried out for each of the eight meteorological factors in F, T, and S, and the probability density curve of the annual distribution of each factor is obtained. Then, the quantiles are selected according to the probability distribution characteristics of each curve. Because this grading places values with the same distribution characteristics in the same level, the resulting classification follows the data more closely than fixed quantiles.

The following is a classification of the meteorological factors W = {F, T, S} in a certain area of Guangdong using the method presented in this article. Figure 3 shows the KDE-based classification of daily average temperature, daily maximum temperature, daily average radiation, and daily light hours. The classification of the other meteorological factors is similar. The abscissa unit is °C in the temperature plots and W/m² in the radiation plots, and the ordinate is the probability density. In Figure 3f, the abscissa is the illumination time in hours/day and the ordinate is the number of days.
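One way to pick grade boundaries from a density curve is to cut at its local minima, so that each grade covers one hump of the distribution. The sketch below illustrates that idea only; the text does not state the exact rule used to choose the quantiles, so the valley rule, the function names, the bandwidth, and the temperatures are all assumptions.

```python
import math

def kde_on_grid(samples, h, grid):
    """Gaussian-kernel density estimate evaluated at every grid point."""
    c = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
            for x in grid]

def density_valley_cuts(samples, h, steps=200):
    """Grid points where the estimated density has a local minimum."""
    lo, hi = min(samples), max(samples)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    dens = kde_on_grid(samples, h, grid)
    return [grid[i] for i in range(1, steps)
            if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]]

def grade(value, cuts):
    """Grade 1 is the lowest bin; each cut point exceeded raises the grade by one."""
    return 1 + sum(value > c for c in cuts)

# Hypothetical daily mean temperatures (deg C) forming two groups.
temps = [12.0, 13.0, 13.5, 14.0, 15.0, 26.0, 27.0, 27.5, 28.0, 29.0]
cuts = density_valley_cuts(temps, h=1.5)
cool, warm = grade(13.0, cuts), grade(27.0, cuts)
```

With these synthetic values the density has a single valley between the two groups, so the days fall into two grades without splitting either group.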

3.2. Generation of PV–Load Association Rule Library Based on FP-Growth

The frequent pattern growth (FP-growth) algorithm [39,40,41] is an association analysis algorithm proposed by Han et al. that introduces a tree structure, the FP tree, into frequent pattern mining. Compared with the Apriori algorithm, it does not generate candidate sets and traverses the dataset only twice, which greatly improves mining efficiency. In this paper, the FP-growth algorithm is used to mine association rules between PV and meteorological factors and between load and meteorological factors, which are then matched to form the PV–load association rule library. The main steps are described in Section 3.2.1 and Section 3.2.2.

3.2.1. Construct FP Tree and Item Header Table

The PV–meteorological and load–meteorological datasets are constructed from the meteorological factors processed above, where F, T, and S are the radiation, temperature, and illumination time graded by quantile, and M_{i} is the i-th typical daily operation scenario of PV or load. Each sample in a dataset has the following form:

\{F_{(d1 \sim d5)}, T_{(d1 \sim d5)}, S_{(d1 \sim d5)}, M_{i}\}

(11)

Firstly, perform the first scan of the dataset to obtain all frequent single-item sets, while removing items that are less than the set minimum support. Create an item header table and arrange it in descending order of support.

Scan the dataset for the second time. For each transaction in the original dataset, remove the infrequent single-itemset and arrange its elements in descending order of support; then, start to build the FP tree. For each transaction, all its elements form a path from the root node to the leaf node in the FP tree. Insert the dataset into the FP tree in sequence, and if there are shared nodes, increase the corresponding node count by one.

3.2.2. Frequent Itemset Mining and Improvement

Mining starts from the bottom of the item header table. For each frequent item, find its conditional pattern base, recursively build the conditional FP tree, and remove items whose support is below the minimum support. If the resulting tree has a single path, the frequent itemsets are obtained directly from it. Otherwise, continue the recursion. Finally, all frequent k-itemsets are obtained.
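The two scans, the FP tree, and the recursive mining can be sketched compactly in plain Python. This is an illustrative sketch rather than the authors' implementation: it always recurses instead of using the single-path shortcut, and the item labels such as `"F=high"` and `"M1"` are hypothetical stand-ins for graded meteorological factors and typical scenarios.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_tree(transactions, min_count):
    """transactions: list of (items, count). Returns (root, header, frequent)."""
    support = defaultdict(int)
    for items, count in transactions:           # first scan: item supports
        for item in set(items):
            support[item] += count
    frequent = {i: c for i, c in support.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:           # second scan: insert paths
        # Keep frequent items, ordered by descending support (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count                 # shared prefix: raise the count
    return root, header, frequent

def fp_growth(transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every frequent itemset."""
    _, header, frequent = build_tree(transactions, min_count)
    # Visit items from the bottom of the header table to the top.
    for item in sorted(frequent, key=lambda i: (-frequent[i], i), reverse=True):
        itemset = suffix | {item}
        yield itemset, frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)

days = [
    ["F=high", "T=high", "S=long", "M1"],
    ["F=high", "T=high", "S=long", "M1"],
    ["F=high", "T=mid", "S=long", "M1"],
    ["F=low", "T=mid", "S=short", "M2"],
    ["F=low", "T=low", "S=short", "M2"],
]
itemsets = dict(fp_growth([(d, 1) for d in days], min_count=2))
```

`itemsets` maps each frequent itemset to the number of days that contain it, which is the input needed for the support and confidence checks.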

To generate strong association rules, the rules derived from frequent k-itemsets must also satisfy the set requirements of support and confidence. Only rules that meet both the minimum support and the minimum confidence are strong association rules. Support and confidence are defined as follows:

(1): Support

If the association rule is:

R : X \Rightarrow Y

, where

X \subset I

,

Y \subset I

, and

X \cap Y = Ø

, with I the set of all items and X, Y the associated itemsets, and if the proportion of transactions in the database T that contain both X and Y is s, then the support of association rule R in T is s. It can also be expressed as the probability

P (X \cup Y)

, that is, the ratio of the number of transactions in T containing both X and Y to the total number of transactions, as shown in the following equation:

support (X \cup Y) = P (X \cup Y) = \frac{count (X \cup Y)}{| T |}

(12)

(2): Confidence level

If the association rule is:

R : X \Rightarrow Y

, where

X \subset I

,

Y \subset I

, and

X \cap Y = Ø

, with I the set of all items and X, Y the associated itemsets, then the confidence of rule R is the probability that a transaction in the database T containing X also contains Y, which can be expressed by the conditional probability P (Y|X). It is the ratio of the number of transactions containing both X and Y to the number of transactions containing X, expressed as follows:

confidence (X \Rightarrow Y) = P (Y | X) = \frac{support (X \cup Y)}{support (X)}

(13)
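Equations (12) and (13) can be evaluated directly on a small transaction database, as in the sketch below. The transactions and item labels are hypothetical and serve only to illustrate the arithmetic.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X u Y) / support(X); returns 0.0 if X never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

# Hypothetical days: graded meteorological factors plus a scenario label.
T = [
    ["F=high", "T=high", "S=long", "M1"],
    ["F=high", "T=high", "S=long", "M1"],
    ["F=high", "T=mid", "S=long", "M1"],
    ["F=low", "T=mid", "S=short", "M2"],
    ["F=low", "T=low", "S=short", "M2"],
]
s = support({"F=high", "M1"}, T)          # 3 of 5 days
c_strong = confidence({"F=high"}, {"M1"}, T)
c_weak = confidence({"T=mid"}, {"M1"}, T)
```

In this toy database every day with `F=high` belongs to scenario `M1`, so that rule has full confidence, whereas `T=mid` is split evenly between the two scenarios and would be rejected by any minimum confidence above one half.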

When mining PV–load–meteorological factors, only the association rules between meteorological factors and PV or load daily operating scenarios are needed. Association rules among the meteorological factors themselves are useless for forming PV–load typical scenarios. Therefore, during frequent itemset mining, each conditional pattern base is checked as it is obtained to determine whether it contains a daily operation scenario. If it does, the construction of the FP tree continues; otherwise, the next conditional pattern base is considered. In this way, the itemsets that can yield strong association rules for PV–load–meteorological factors are filtered in advance.

After the association rules for PV–meteorological factors and load–meteorological factors have been generated, the meteorological factors in the two association rule libraries are used as links. When the meteorological factors of a typical PV day and a typical load day are the same, the typical PV scenario can be associated with the typical load scenario, which completes the combination of PV–load typical scenes. The schematic diagram of a typical joint scenario is shown in Figure 4.
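The linking step amounts to a join of the two rule libraries on their meteorological conditions. A minimal sketch is shown below, assuming each rule is stored as a pair of a condition set and a scenario label; this representation, the function name `join_rules`, and the labels are hypothetical.

```python
def join_rules(pv_rules, load_rules):
    """Pair PV and load scenarios whose rules share the same conditions.

    Each rule is (conditions, scenario), where conditions is an iterable
    of graded meteorological items.
    """
    by_weather = {}
    for cond, scene in load_rules:
        by_weather.setdefault(frozenset(cond), []).append(scene)
    return [(frozenset(cond), pv, load)
            for cond, pv in pv_rules
            for load in by_weather.get(frozenset(cond), [])]

pv_rules = [({"F=high", "T=high", "S=long"}, "PV1"),
            ({"F=low", "T=low", "S=short"}, "PV2")]
load_rules = [({"F=high", "T=high", "S=long"}, "L3"),
              ({"F=mid", "T=mid", "S=long"}, "L5")]
joint = join_rules(pv_rules, load_rules)
```

Only the PV and load scenarios that share the same graded conditions are paired; rules without a counterpart in the other library produce no joint scenario.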

4. Example Analysis

In order to verify the effectiveness of the method in this paper, the PV and load operation data of a PV-containing MV distribution network in an urban area of Guangdong for a complete year (2022) were selected for experimental simulation. The medium voltage distribution network had a total of seven load nodes, including three PV nodes. The topology diagram is shown in Figure 5:

Firstly, preprocess all collected data, fill in some missing values as described above, and discard dates with too many missing values. Then, establish daily operation scenario matrices for PV and load based on the topology structure of the distribution network.

4.1. Extraction of Typical Scenarios

The experimental simulation in this paper was carried out in the PyCharm integrated development environment. Using the above method, the daily operation scenario matrices of PV and load are first reduced in dimension separately. Kernel density estimation is then applied to the dimensionality-reduced values to estimate the probability density of each matrix, and the results are shown in Figure 6. The two plots in Figure 6 show the probability density values of the PV daily operation scenario matrices by day and the probability density as a function of the dimensionality-reduced values, respectively. Figure 7 shows the corresponding plots for the load daily operation scenario matrices. The horizontal axis of the scatter plots in Figure 6 and Figure 7 represents the 365 days from 1 January to 31 December, and the vertical axis represents the corresponding probability density value. The abscissa of the second plot in Figure 6 and Figure 7 represents the dimensionality-reduced values of the PV and load matrices, while the ordinate represents the corresponding probability density values.

From Figure 6, the 20 daily scenario matrices with the highest probability density can be selected as candidates. After scenes with excessive similarity were screened out using the Wasserstein distance, seven typical PV scenes were ultimately obtained, with the first four accounting for a larger proportion. Applying the same method to Figure 7 yields eleven typical daily load operating scenarios (the candidates ranked eighth, eleventh, and fourteenth are discarded), of which the first three account for a relatively large proportion.

4.2. Comparison of Accuracy of Typical Daily Curves Generated by Different Methods

In order to highlight the advantages of the typical scenarios identified by this method, the verification method is shown in the flowchart below (Figure 8).

Using the daily scene matrices composed of data from a distribution network in Qingyuan, typical scenes are generated with the extraction method of this paper based on local linear embedding and KDE, the k-means clustering method, and the Latin hypercube sampling method. The typical scenarios obtained by the three methods are then compared with the actual operating scenarios.

For PV, the typical PV daily operation scenario corresponding to the 289th meteorological day (red line in Figure 9) was extracted using the method in this article. All PV daily curves in the year with the same meteorological conditions as that day were selected and averaged, and the resulting PV curve is used as the actual operation scenario (blue line in Figure 9). Then, k-means clustering was applied to the original PV daily operation matrices, forming a total of eight typical PV daily operation scenarios, and the one with the same meteorological conditions as the 289th day was selected (orange line in Figure 9). Finally, six typical scenarios were generated using Latin hypercube sampling, and the one with the same meteorological conditions was selected (black line in Figure 9). The results are shown in Figure 9.

The clustering method generates a total of eight typical daily operation scenarios for PV systems, Latin hypercube sampling generates six, and the method in this article extracts seven. For each method, the error between each typical day and the PV scene of the corresponding meteorological day was computed, and the errors were then averaged over the days, giving the three error curves shown in Figure 10. The results show that the typical PV scenes extracted by this method have smaller errors and are closer to the actual operating scenarios. The curves in Figure 9 show the PV output power at each moment, and Figure 10 shows the error between the output power given by each method and the actual value at each moment.
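The averaging that produces one error curve per method can be sketched as follows. The sketch assumes that the error at each moment is the absolute difference between the typical curve and the actual curve of the matching meteorological day; the text does not state the exact error measure, so this choice, the function name, and the toy values are assumptions.

```python
def error_curve(typical_days, actual_days):
    """Average absolute error per time step over paired (typical, actual) days.

    typical_days[i] and actual_days[i] are curves for the same meteorological
    day, sampled at the same time steps.
    """
    n_days = len(typical_days)
    n_steps = len(typical_days[0])
    curve = []
    for t in range(n_steps):
        total = sum(abs(typ[t] - act[t])
                    for typ, act in zip(typical_days, actual_days))
        curve.append(total / n_days)
    return curve


# Two typical days of one method and the matching actual curves (toy values).
typical = [[0.0, 2.0, 4.0, 2.0], [0.0, 1.0, 3.0, 1.0]]
actual = [[0.0, 2.5, 3.5, 2.0], [0.0, 1.5, 3.5, 1.0]]
curve = error_curve(typical, actual)
print(curve)  # [0.0, 0.5, 0.5, 0.0]
```

Running the same function on the typical days of each of the three methods would give three curves that can be compared moment by moment, in the manner of Figure 10.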

For loads, the meteorological day of the 165th load day, which is a working day, is considered. As shown in Figure 11, the average of all load days with the same meteorological day and working-day type is plotted in blue. The red line represents the typical daily load operation scenario extracted by the method in this article, the black curve represents the typical scenario generated by Latin hypercube sampling, and the orange curve represents the typical scenario generated by the clustering algorithm. Figures 11 and 12 show that although all three methods can fit the changes in load, the error of this method is smaller.

4.3. Combination of PV–Load Typical Scenarios

To facilitate the use of the FP-growth algorithm for mining the association rules between PV–meteorological factors and load–meteorological factors, the meteorological factor values need to be discretized into bins. First, the KDE method is used to divide each meteorological factor in $W = \{F, T, S\}$ into quantile levels, and the results are shown in Table 1. Taking $F_{d_\max}$ as an example, when the maximum daily solar radiation is less than or equal to 732.5 W/m², $F_{d_\max} = 1$; at 732.5–867 W/m², $F_{d_\max} = 2$; and so on.
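As an illustration of this binning step, the following sketch maps a maximum daily radiation value to its level using the $F_{d_\max}$ thresholds listed in Table 1. The function name is hypothetical, and placing a value that lies exactly on a threshold in the lower level is an assumption, since Table 1 lists shared endpoints for adjacent levels.

```python
from bisect import bisect_left

# Thresholds for the maximum daily solar radiation (W/m^2), taken from Table 1.
F_D_MAX_EDGES = [732.5, 867.0, 986.0]


def quantile_level(value, edges):
    """Return the 1-based level of `value` given ascending bin edges.

    Assumption: a value equal to an edge is assigned to the lower level.
    """
    return bisect_left(edges, value) + 1


samples = (500.0, 732.5, 800.0, 900.0, 1000.0)
levels = [quantile_level(v, F_D_MAX_EDGES) for v in samples]
print(levels)  # [1, 1, 2, 3, 4]
```

The same function can be reused for the other meteorological factors by passing the corresponding thresholds from Table 1.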

Association rules were mined with the FP-growth algorithm from the daily PV and load operation data in 2022 and the corresponding meteorological factors. With the minimum support set to six and the minimum confidence set to 0.7, 34 strong PV–meteorological factor association rules and 26 strong load–meteorological factor association rules were obtained. These rules include the seven typical PV daily operation scenarios and the eleven typical load daily operation scenarios, which ensures both the precondition for scene association and the strong representativeness of the typical scenes.
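The roles of the two thresholds can be shown with a small sketch. It does not build an FP-tree; it enumerates antecedents by brute force, which yields the same strong rules on small data, and it treats the minimum support as a count of days, in line with the value of six used above. The item labels, scene labels, function name, and toy records are hypothetical.

```python
from itertools import combinations


def mine_rules(days, min_support, min_confidence, max_len=2):
    """Brute-force mining of rules {weather items} -> scene label.

    days: list of (set of binned weather items, scene label).
    A rule is kept if the antecedent and the scene occur together on at
    least `min_support` days and confidence >= `min_confidence`.
    """
    items = sorted({item for weather, _ in days for item in weather})
    scenes = sorted({scene for _, scene in days})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            ante = set(antecedent)
            ante_count = sum(1 for weather, _ in days if ante <= weather)
            if ante_count == 0:
                continue
            for scene in scenes:
                both = sum(1 for weather, s in days
                           if ante <= weather and s == scene)
                confidence = both / ante_count
                if both >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, scene, both, confidence))
    return rules


# Toy records: binned weather items per day and the matched typical scene.
days = (
    [({"F=1", "T=2"}, "PV-A")] * 6
    + [({"F=1", "T=3"}, "PV-B")] * 2
    + [({"F=4", "T=3"}, "PV-B")] * 6
)
rules = mine_rules(days, min_support=6, min_confidence=0.7)
for rule in rules:
    print(rule)
```

In this toy data the rule `("F=1",) -> "PV-A"` is kept with a support count of 6 and a confidence of 0.75, whereas `("F=1", "T=3") -> "PV-B"` is rejected because it occurs on only two days.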

After the strong association rules for PV–meteorological factors and load–meteorological factors were generated, the association of typical daily PV–load operating scenarios in 2022 for this distribution network was completed using the meteorological factors as a link, as shown in Figure 13. The figure indicates which combinations of PV and load typical scenarios are highly likely to occur under the corresponding meteorological day.

The typical scenario method that best fits the real operation scenarios of the PV distribution network should be selected before the combination of PV–load typical scenarios is completed. Only then can scene generation provide a reliable reference for short-term operation scheduling, adjustment of power generation plans, and planning optimization of the distribution network, which is where its practical significance lies.
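Using the meteorological factors as a link amounts to joining the two rule libraries on their shared antecedents. The sketch below assumes that a PV rule and a load rule are paired only when their meteorological antecedents are identical; the actual matching criterion may be looser. The rule tuples, scene names, and confidence values are hypothetical.

```python
def join_scenes(pv_rules, load_rules):
    """Pair PV and load typical scenes whose rules share the same antecedent.

    Each rule is (frozenset of weather items, scene label, confidence).
    Returns a list of (sorted weather items, PV scene, load scene).
    """
    load_by_weather = {}
    for weather, scene, _conf in load_rules:
        load_by_weather.setdefault(weather, []).append(scene)

    pairs = []
    for weather, pv_scene, _conf in pv_rules:
        for load_scene in load_by_weather.get(weather, []):
            pairs.append((tuple(sorted(weather)), pv_scene, load_scene))
    return pairs


# Hypothetical entries of the two rule libraries.
pv_rules = [
    (frozenset({"F=4", "T=4"}), "PV-1", 0.9),
    (frozenset({"F=1", "T=2"}), "PV-5", 0.8),
]
load_rules = [
    (frozenset({"F=4", "T=4"}), "Load-2", 0.85),
    (frozenset({"F=2", "T=3"}), "Load-7", 0.75),
]
pairs = join_scenes(pv_rules, load_rules)
print(pairs)  # [(('F=4', 'T=4'), 'PV-1', 'Load-2')]
```

Only the meteorological day that appears in both libraries produces a joint PV–load scenario, which avoids matching every PV scene with every load scene.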

5. Conclusions and Innovation Points

A discussion of several approaches in the text:

(1): The core of the k-means clustering method is to classify data by measuring the similarity between data objects using the Euclidean distance. However, for the daily operating scenario description matrix constructed in this paper, simply measuring the Euclidean distance between the values of each row or column of the matrix fails to account well for the temporal and spatial correlation of PV output. In contrast, when reducing the dimensionality of the matrix, the method proposed in this paper not only takes into account the distance between values in the matrix but also retains the manifold structure of the matrix; i.e., it takes the spatiotemporal correlation into account. Therefore, for the daily scene description matrix constructed in this paper, its ability to identify typical scenes is relatively good.
(2): The Latin hypercube sampling method approximates random sampling from a multivariate parameter distribution. Its greatest advantage is that even data that occur rarely in the dataset can be sampled with some probability. For a full year of daily scene description matrices, this method is also capable of identifying representative scenes in the dataset. However, it is affected by a small number of extreme scenes, which leads to the extraction of less representative typical scenes. The KDE method used in this paper, on the other hand, represents the typicality of each scene well through the probability density curve.
(3): The FP-growth algorithm is able to mine the correlations between the variables in the dataset, and it fits the PV–meteorological factor and load–meteorological factor correlations well. If PV and load typical scenarios are generated without a reasonable PV–load combination, multiple PV typical scenes can only be matched randomly with multiple load typical scenes, which increases redundancy.
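The limitation noted in point (1) can be checked numerically: the Euclidean distance between two daily curves does not change when the time steps of both curves are shuffled in the same way, so it carries no information about temporal order. The following sketch uses hypothetical five-point curves and an arbitrary permutation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two curves sampled at the same time steps."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permute(curve, order):
    """Reorder the time steps of a curve according to `order`."""
    return [curve[i] for i in order]


day_a = [0.0, 1.0, 4.0, 1.0, 0.0]
day_b = [0.0, 2.0, 3.0, 2.0, 0.0]
order = [2, 0, 4, 1, 3]  # arbitrary shuffle of the time axis

d_original = euclidean(day_a, day_b)
d_shuffled = euclidean(permute(day_a, order), permute(day_b, order))
print(math.isclose(d_original, d_shuffled))  # True
```

A similarity measure that is unchanged by scrambling the time axis cannot distinguish a midday peak from the same values scattered over the day, which is why a representation that preserves the neighborhood structure of the data is preferred here.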

In summary, compared to the typical scenarios formed by the traditional k-means clustering algorithm and Latin hypercube sampling, the typical daily operation scenario extraction method based on local linear embedding and KDE proposed in this paper fits actual operation scenarios better and with smaller errors. Moreover, the FP-growth algorithm completes the construction of dual association rules for PV–meteorological factors and load–meteorological factors. The correlations between multiple meteorological factors and PV, and between meteorological factors and load, were explored, and the combination of typical daily operation scenarios of PV and load was ultimately achieved. Once typical operation scenarios that effectively fit the real distribution grid are obtained, they can be combined with meteorological forecast data to provide a reliable reference for the operation scheduling of urban distribution grids containing PV and for the arrangement of power generation plans.

Innovation points:

(1): For the operation of PV distribution networks, this article proposes a PV and load daily operation scenario matrix, which better describes the PV and load power at each node and time section of the distribution network. A single matrix effectively describes the power of all PV and load nodes in the distribution network, which greatly reduces the complexity of describing the distribution network scenario. When PV or load nodes are added later, new rows and columns can be added directly to the original matrix. Alternatively, to avoid damaging the original matrix structure, a new matrix can be created to represent the operation of the newly added nodes. The description therefore accommodates node growth and is highly flexible.
(2): For typical scene extraction of PV and load, this paper proposes a typical scene extraction method based on local linear embedding and KDE. The effectiveness of this method has been demonstrated through simulation experiments.
(3): For the combination of PV and load typical scenarios, this paper first used KDE to divide the meteorological factors into quantile levels. Then, the FP-growth algorithm was used to establish dual association rules for PV–meteorological factors and load–meteorological factors. With the meteorological factors in the dual rule library serving as a link, the combination of typical daily operation scenarios of PV and load was achieved.

Author Contributions

Conceptualization, F.Z. and X.W.; data curation, F.Z. and Y.X.; formal analysis, F.Z. and X.L.; investigation, F.Z. and X.W.; methodology, F.Z. and Y.X.; resources, F.Z. and Z.L.; software, F.Z.; validation, F.Z. and J.L.; visualization, F.Z. and Z.Z.; writing—original draft, F.Z. and X.W.; writing—review and editing, F.Z., Z.Z. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62273104) and the Science & Technology Program of Guangdong Power Grid Co., Ltd. (030800KK52220016).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Overall process flow of this article.

Figure 2. Schematic diagram of distribution network topology structure.

Figure 3. The quantile of several meteorological factors. (a) Quantile of average temperature. (b) Quantile of maximum temperature. (c) Quantile of minimum temperature. (d) Quantile of average radiation. (e) Quantile of maximum radiation. (f) Statistics of daily light hour frequency.

Figure 4. Schematic diagram of joint typical scenario generation.

Figure 5. Topological structure diagram of a medium voltage distribution network in a certain area.

Figure 6. Identification of typical daily operation scenarios of PV.

Figure 7. Identification of typical daily load operation scenarios.

Figure 8. Three typical scenario simulation verification processes.

Figure 9. Comparison of photovoltaic typical daily curves generated by different methods.

Figure 10. Comparison of typical daily curve errors for PV produced by different methods.

Figure 11. Comparison of load typical daily curves generated by different methods.

Figure 12. Comparison of typical daily curve errors for load produced by different methods.

Figure 13. Joint example of typical scenarios of PV–load.

Table 1. Quantile processing of meteorological factors.

Meteorological Factor	Level 1	Level 2	Level 3	Level 4
$F_{d_\max}$ (W/m²)	≤732.5	732.5–867	867–986	≥986
$F_{d_{\mathrm{mean}}}$ (W/m²)	≤65	65–147.5	147.5–208	≥208
$F_{d_{\mathrm{difmax}}}$ (W/m²)	≤216	216–293.3	293.3–387	≥387
$F_{d_{\mathrm{difmean}}}$ (W/m²)	≤76.4	76.4–94.7	94.7–110.5	≥110.5
$T_{d_{\mathrm{mean}}}$ (°C)	≤10.5	10.5–19.6	19.6–25.5	≥25.5
$T_{d_\max}$ (°C)	≤21.3	21.3–26	26–31.3	≥31.3
$T_{d_\min}$ (°C)	≤8.4	8.4–15.6	15.6–24.8	≥24.8
$S_{d_{\mathrm{time}}}$ (h)	≤7	8–9	10	≥11

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
