Kullback–Leibler Divergence-Based Distributionally Robust Chance-Constrained Programming for PV Hosting Capacity Assessment in Distribution Networks

Shen, Chao; Liu, Haoming; Wang, Jian; Yang, Zhihao; Hai, Chen

doi:10.3390/su17052022

by Chao Shen ¹, Haoming Liu ¹,*, Jian Wang ¹, Zhihao Yang ² and Chen Hai ³

¹ School of Electrical and Power Engineering, Hohai University, Nanjing 211100, China

² College of Electrical, Energy and Power Engineering, Yangzhou University, Yangzhou 225000, China

³ College of Artificial Intelligence and Automation, Hohai University, Nanjing 211100, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(5), 2022; https://doi.org/10.3390/su17052022

Submission received: 15 January 2025 / Revised: 19 February 2025 / Accepted: 25 February 2025 / Published: 26 February 2025

Abstract: This paper addresses the challenge of assessing photovoltaic (PV) hosting capacity in distribution networks while accounting for the uncertainty of PV output, a critical step toward achieving sustainable energy transitions. Traditional optimization methods for dealing with uncertainty, including robust optimization (RO) and stochastic optimization (SO), often result in overly conservative or overly optimistic assessments, hindering the efficient integration of renewable energy. To overcome these limitations, this paper proposes a novel distributionally robust chance-constrained (DRCC) assessment method based on Kullback–Leibler (KL) divergence. First, time-segment adaptive-bandwidth kernel density estimation (KDE), combined with Copula theory, is employed to model the conditional probability density of PV forecasting errors, capturing temporal and output-dependent correlations. The KL divergence is then used to construct a fuzzy set for PV output, quantifying its uncertainty within specified confidence levels. Finally, the assessment results are derived by integrating the fuzzy set into the optimization model. Case studies demonstrate the effectiveness of the method. Key findings indicate that higher confidence levels reduce PV hosting capacities due to broader uncertainty ranges, while larger historical sample sizes improve the accuracy of distribution estimates, thereby increasing assessed capacities. By balancing conservatism and optimism, this method enables safer and more efficient PV integration, directly supporting sustainability goals such as reducing fossil fuel dependence and lowering carbon emissions. The findings provide actionable insights for grid operators to maximize renewable energy utilization while maintaining grid stability, advancing global efforts toward sustainable energy infrastructure.

Keywords: Copula theory; distributionally robust chance-constrained; kernel density estimation; Kullback–Leibler divergence; renewable energy integration

1. Introduction

1.1. Motivation

Traditional power generation, heavily reliant on fossil fuels, not only worsens energy shortages but also harms the environment. In contrast, renewable energy sources, recognized for their low costs and environmental benefits, have grown quickly and are widely used. This shift helps reduce the gap between energy supply and demand while supporting environmental protection and ecosystem health. In 2015, the United Nations adopted 17 Sustainable Development Goals, one of which is affordable and clean energy [1]. To achieve global energy and climate goals, further integration of renewable energy needs to be promoted [2].

However, as a key component of renewable energy, the growing capacity of photovoltaic (PV) generation poses a significant challenge to the stable operation of distribution networks because of its inherent variability and stochasticity [3,4,5]. The resulting problems include frequent changes in power flow direction and magnitude, disruptions to relay protection mechanisms, and degradation of voltage and current quality. Therefore, it is crucial to integrate as much PV as possible while ensuring the stable and efficient operation of the distribution network. This requires a rigorous method that fully considers the safety constraints of distribution networks and the uncertainties in PV output in order to assess the PV hosting capacity objectively.

1.2. Literature Review

The PV hosting capacity is defined as the maximum capacity that distribution networks can accommodate under various safety constraints, including stable node voltages, line current limits, and acceptable power quality [6,7,8]. Current research on PV hosting capacity assessment in distribution networks primarily employs mathematical optimization methods. References [9,10,11] use deterministic methods for assessment, without considering system uncertainties. To address these uncertainties, mathematical optimization methods are further categorized into robust optimization (RO), stochastic optimization (SO), mathematical analysis (MA), and distributionally robust optimization (DRO) [12,13,14].

RO is a method for optimizing under uncertainty that is designed to maintain system performance even under the most adverse conditions [15,16,17]. It characterizes uncertainty by constructing an uncertainty set and seeks solutions that perform well across all realizations within that set. However, RO typically yields conservative results, and handling the uncertainty set is often complex. For example, ref. [16] incorporated adjustable robust models to manage uncertainties in distributed generation, but this approach may underestimate hosting capacity by assuming extreme scenarios in which PV generation and load demand peak simultaneously.

SO is a probability-based analytical method that assesses system performance by modeling and statistically analyzing uncertainties [17,18,19]. However, SO relies heavily on extensive historical data and requires that the probability distribution function (PDF) of the uncertainties be known or estimable. Ref. [17] used Markov Chain Monte Carlo for probabilistic hosting capacity assessment, but real-world data often exhibit non-Gaussian or time-varying patterns, leading to model mismatch. SO also tends to overlook extreme scenarios, which can lead to overly optimistic results; for instance, the stochastic optimization in ref. [18] overlooked the volatility of renewable energy generation.

MA is a method that estimates system performance through analytical approaches or simplified models [20,21,22,23]. It has the advantages of quick computation, intuitive results, and easy interpretation. However, the accuracy of its results may be difficult to guarantee under different uncertainty conditions. Ref. [20] proposed an interval-affine arithmetic-based hosting capacity framework. While the approach enables faster analysis and provides results comparable to time series simulations, it may not fully account for all uncertainties present in real-world scenarios.

RO and SO are two opposing approaches: RO disregards probability distributions, focusing on decision making in the worst-case scenario, while SO relies entirely on probability distributions for modeling. DRO [24,25] is an optimization method that lies between RO and SO. DRO assumes that the PDF of uncertain parameters is unknown, but the distribution set can be limited through specific constraints, such as Kullback–Leibler (KL) divergence [26] or Wasserstein distance [27,28,29,30,31,32,33]. While Wasserstein distance offers advantages like robustness to non-overlapping distributions and spatial awareness, it also introduces specific limitations compared to KL divergence. Ref. [28] combined Wasserstein with moment constraints, but finite data can skew moment estimates, leading to unstable solutions compared to KL’s distributional focus. For multi-terminal VSC-HVDC grids in [29], Euclidean distance in Wasserstein may oversimplify spatial–temporal correlations in renewable generation, unlike KL divergence’s probabilistic focus. Ref. [30] relied on Wasserstein-based ambiguity sets for PV integration capacity, which require iterative optimization over large datasets, increasing runtime compared to KL divergence’s closed-form calculations. KL divergence, in contrast, is computationally efficient as it directly calculates entropy differences without optimization loops. The DRO method optimizes for the worst-case scenario within the distribution set, thereby enhancing the robustness of the system. The robustness of the results from DRO depends on the accurate specification of the distribution set. If the set is too broad, it may lead to overly conservative outcomes. Additionally, the accuracy of estimating the distribution parameters is crucial, as errors in these estimates can significantly affect the assessment results.

1.3. Contributions and Organization

This paper proposes a distributionally robust chance-constrained (DRCC) assessment model for PV hosting capacity in distribution networks that incorporates KL divergence to address gaps in existing research. The main contributions of this paper are as follows:

Different from [17,18,19], which use empirical probability distributions to characterize PV output uncertainty, we employ a time-segmented adaptive kernel density estimation (KDE) method and Copula function to derive the conditional probability density function (CPDF) of PV forecast errors under different PV output conditions. This method eliminates reliance on empirical probability distributions, thereby yielding a more accurate representation of PV output uncertainty.
This paper applies the KL divergence to assess the divergence between the actual and empirical probability distributions of PV output. This divergence is adjusted for confidence level based on the number of historical samples, resulting in a PV output fuzzy set. Compared to the Wasserstein distance used in [27,28,29,30,31,32,33], KL divergence is more suitable for distributions with similarities and overlaps, and its calculation is generally more efficient when the probability density function is known.

The remainder of this paper is organized as follows: Section 2 introduces the uncertainty model of PV output. Section 3 presents the construction of the DRCC fuzzy set. Section 4 describes the PV hosting capacity assessment model. Case studies are conducted in Section 5. Section 6 provides a summary of the paper.

2. Modeling of the Uncertainty of PV Output

Due to the inherent intermittency and randomness of PV generations, their power output is significantly affected by external environmental factors, such as solar irradiance, temperature, and weather conditions, leading to considerable uncertainty. This variability poses challenges to integrating renewable energy into the grid in a sustainable manner. Therefore, this section introduces an adaptive bandwidth KDE method based on time segmentation and utilizes the Copula function to model the probability density of PV forecasting errors, thereby assessing the uncertainty of PV output.

2.1. Time-Segment Adaptive Bandwidth KDE

To analyze the differences in PV forecast errors across different time periods, historical PV forecast errors are plotted in chronological order in Figure 1. The data reveal significant differences in PV forecast errors across time periods. The year is therefore divided into two time periods: Period 1 from April to September, and Period 2 from January to March and October to December. The relationship between PV forecasting errors and PV output is analyzed for each time period separately. The average PV output and forecast error are calculated for each time point within each period, producing daily-cycle PV output and forecast error curves, as shown in Figure 2. Figure 2 shows substantial differences in forecast error magnitudes across periods. Additionally, there is a clear correlation between PV output and forecast error in terms of both time scale and magnitude. Given these observed differences, a time-segmented adaptive bandwidth KDE method is proposed.

Non-parametric KDE avoids the influence of parameter estimation, thereby providing a more accurate approximation of the actual distribution. However, the effectiveness of KDE is contingent upon the selection of an appropriate bandwidth. Therefore, this paper calculates the bandwidth for each time segment individually.

Taking the PV forecasting error sample Δ P as an example, the bandwidth for each time segment is calculated as follows:

h_{Δ P, t} = \sqrt{\frac{1}{N_{t}} \sum_{j = 1}^{N_{t}} {(Δ P_{t, j} - Δ {\bar{P}}_{t})}^{2}}, t = 1, 2, \dots, C_{In}

(1)

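where h_{Δ P, t} represents the estimated bandwidth of the PV forecasting error Δ P in time segment t, N_{t} is the number of samples in that segment, Δ {\bar{P}}_{t} is their mean, and C_{In} is the number of time segments.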

In this paper, the Gaussian kernel function is employed for KDE, and its corresponding estimation function is expressed as follows:

\hat{f} (Δ P_{t}) = \frac{1}{M} \sum_{t = 1}^{C_{In}} \frac{1}{\sqrt{2 π} h_{t}} \sum_{n = 1}^{N_{t}} \exp [- \frac{1}{2} {(\frac{Δ P - Δ P_{t, n}}{h_{t}})}^{2}]

(2)

where \hat{f} (Δ P_{t}) represents the estimated probability density of the PV forecasting error Δ P in time segment t. From (1) and (2), the PDFs of the PV output dataset {P_{t}} and the PV forecasting error dataset {Δ P_{t}} for time segment t can be obtained.
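The per-segment estimate in (1) and (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample values and the names `segment_bandwidth` and `segment_kde` are hypothetical, and each segment density is normalised by N_t so that it integrates to one, which is an assumption about how (2) is meant to be applied within a single segment.

```python
import math

def segment_bandwidth(errors):
    """Bandwidth of one time segment as in (1): population standard deviation."""
    n = len(errors)
    mean = sum(errors) / n
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / n)

def segment_kde(x, errors, h):
    """Gaussian KDE of one segment evaluated at x.

    Assumption: normalised by N_t * h so the segment density integrates to one.
    """
    norm = 1.0 / (len(errors) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - e) / h) ** 2) for e in errors)

# Hypothetical forecast-error samples (MW) for the two periods of the year.
segments = {
    "period_1": [-0.30, -0.10, 0.05, 0.20, 0.40],
    "period_2": [-0.08, -0.02, 0.00, 0.03, 0.06],
}
bandwidths = {name: segment_bandwidth(errs) for name, errs in segments.items()}
density_at_zero = {
    name: segment_kde(0.0, errs, bandwidths[name]) for name, errs in segments.items()
}
```

A segment with widely scattered errors receives a larger bandwidth and therefore a flatter density, which is the behaviour the time-segmented design aims for.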

2.2. PV Conditional Forecasting Errors Model Based on Copula Theory

PV forecasting errors are not only dependent on the time period in which they occur but are also strongly correlated with the PV output during that period. As a result, the CPDF of PV forecasting errors can be derived under varying PV output conditions. Copula theory offers an effective framework for describing the dependency structure between random variables. The Copula function connects the joint distribution of multivariate random variables with their respective marginal distributions.

Based on Copula theory, the probability distributions of the PV output dataset {P_{t}} and the PV forecasting error dataset {Δ P_{t}} for time segment t can be used as marginal distributions, thereby establishing their joint probability distribution function (JPDF):

F_{J} (P_{t}, Δ P_{t}) = C_{P Δ P} (F (P_{t}), F (Δ P_{t}))

(3)

Five types of Copula function are considered: Normal Copula, t-Copula, Gumbel-Copula, Clayton-Copula, and Frank-Copula. To select the most appropriate Copula function for fitting, the Kendall rank correlation coefficient, commonly used for nonlinear dependence, is introduced. This coefficient quantifies the difference between the probabilities of concordance and discordance for two independent random vectors (X_{1}, Y_{1}) and (X_{2}, Y_{2}) that have the same distribution as (X, Y). The Kendall rank correlation coefficient τ is defined as follows:

τ = P [(X_{1} - X_{2}) (Y_{1} - Y_{2}) > 0] - P [(X_{1} - X_{2}) (Y_{1} - Y_{2}) < 0]

(4)
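A sample version of (4) can be computed by counting concordant and discordant pairs. The sketch below is illustrative only: the data, the candidate coefficients in `candidate_tau`, and the function name `kendall_tau` are hypothetical, and ties are simply counted as neither concordant nor discordant.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Sample Kendall coefficient: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical PV output efficiencies and forecast-error magnitudes.
pv_output = [0.1, 0.3, 0.5, 0.7, 0.9]
forecast_error = [0.02, 0.05, 0.04, 0.09, 0.08]
tau_empirical = kendall_tau(pv_output, forecast_error)

# Hypothetical coefficients implied by fitted Copula families; the family
# closest to the empirical value would be selected.
candidate_tau = {"Normal": 0.52, "Clayton": 0.41, "Frank": 0.58}
best_family = min(candidate_tau, key=lambda name: abs(candidate_tau[name] - tau_empirical))
```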

First, the empirical Copula function for PV output and PV forecasting error is calculated. Then, the Euclidean distances between the Kendall rank correlation coefficients of five types of Copula functions and the empirical Copula function are computed. The Copula function with the smallest Euclidean distance is considered to have the highest goodness-of-fit and is subsequently used to calculate the joint probability distribution function of PV output and forecast error. Consequently, the CPDF of PV output forecasting errors is derived.

f_{J} (P_{t}, Δ P_{t}) = c_{P Δ P} (F (P_{t}), F (Δ P_{t})) f (P_{t}) f (Δ P_{t})

(5)

f_{Δ P |P} (Δ P_{t} |P_{t}) = \frac{f_{J} (P_{t}, Δ P_{t})}{f (P_{t})}

(6)

where c_{P Δ P} denotes the density of the selected Copula function C_{P Δ P}.
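Combining (5) and (6), the conditional density reduces to the Copula density multiplied by the marginal density of the forecasting error. The sketch below uses the Frank Copula purely as an example of one of the five families; the dependence parameter `theta` and the function names are hypothetical, and the marginal values are assumed to come from the KDE step.

```python
import math

def frank_copula_density(u, v, theta):
    """Density of the Frank Copula for theta != 0, with u and v in (0, 1)."""
    a = 1.0 - math.exp(-theta)
    num = theta * a * math.exp(-theta * (u + v))
    den = (a - (1.0 - math.exp(-theta * u)) * (1.0 - math.exp(-theta * v))) ** 2
    return num / den

def conditional_error_density(f_err, cdf_output, cdf_err, theta):
    """Conditional density of the error given the output.

    f_err      : marginal density of the forecasting error at the query point
    cdf_output : marginal CDF value of the observed PV output
    cdf_err    : marginal CDF value of the forecasting error at the query point
    """
    return frank_copula_density(cdf_output, cdf_err, theta) * f_err

# Hypothetical inputs: error density 1.8, output at its 30% quantile,
# error at its 60% quantile, dependence parameter 3.0.
example = conditional_error_density(1.8, 0.3, 0.6, 3.0)
```

A quick sanity check is that the Copula density integrates to one over the second argument for any fixed first argument, which is what makes the result a valid conditional density.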

2.3. Evaluation of the Probabilistic Model for PV Forecasting Errors

Hypothesis testing is a method used to determine whether a theoretical distribution aligns with the actual distribution of a sample. If a significant discrepancy exists between the theoretical distribution and the sample’s actual distribution, the model based on the theoretical distribution may result in considerable errors when applied to real-world data. Given that the theoretical distribution of photovoltaic forecasting errors is unknown, this paper employs non-parametric hypothesis testing.

The Cramér-von Mises (CvM) test measures the overall squared deviation between the empirical cumulative distribution function (ECDF) and the target cumulative distribution function (CDF). Its test statistic is defined as follows:

W^{2} = M \int_{- \infty}^{+ \infty} {(F (Δ P) - F_{e} (Δ P))}^{2} d F_{e} (Δ P)

(7)

The Anderson–Darling (AD) test builds upon the CvM test by introducing a weighting function that increases sensitivity to differences in the tails of the distribution. The test statistic is defined as:

A^{2} = \int_{- \infty}^{+ \infty} {(F (Δ P) - F_{e} (Δ P))}^{2} w (Δ P) d F_{e} (Δ P)

(8)

w (Δ P) = {[F (Δ P) (1 - F (Δ P))]}^{- 1}

(9)

The root mean square error (RMSE), denoted R_{MSE}, is used to analyze the difference between the output of the PV forecasting error probabilistic model and the observed data. Its expression is given as follows:

R_{M S E} = \sqrt{\frac{1}{M} \sum_{t = 1}^{N_{t}} {(Δ P_{t} - Δ {\hat{P}}_{t})}^{2}}

(10)
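For reference, the three metrics can be evaluated from samples as sketched below. The CvM and AD functions use the common textbook computational forms, which take the model CDF evaluated at each observed sample; their normalisation may differ from (7) and (8) as printed, so this is an assumption rather than a transcription of the paper's formulas. All names are hypothetical.

```python
import math

def cramer_von_mises(cdf_values):
    """CvM statistic from model CDF values F(x_i) at the observed samples."""
    u = sorted(cdf_values)
    n = len(u)
    return 1.0 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))

def anderson_darling(cdf_values):
    """AD statistic; requires every CDF value to lie strictly inside (0, 1)."""
    u = sorted(cdf_values)
    n = len(u)
    s = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    )
    return -n - s / n

def rmse(observed, predicted):
    """Root mean square error between observed and model values."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

# Hypothetical model CDF values at five observed forecasting errors.
cdf_at_samples = [0.08, 0.27, 0.51, 0.73, 0.94]
w2 = cramer_von_mises(cdf_at_samples)
a2 = anderson_darling(cdf_at_samples)
```

Smaller values of both statistics indicate a closer fit, and the logarithmic terms in the AD statistic are what give the tails their extra weight.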

3. DRCC Fuzzy Set for PV Forecasting Errors

Although the empirical distribution provides a foundation for simulating the actual probability distribution, it is not equivalent to the actual probability distribution. To address this limitation and enhance model robustness, an uncertainty tolerance interval can be constructed. It can ensure that the statistical distance between the empirical and actual probability distributions remains within a predefined allowable threshold. By incorporating this uncertainty tolerance, the model better supports the integration of renewable energy sources, such as PV systems, into the grid, helping to optimize their contribution to a more sustainable and resilient energy system.

3.1. Confidence Interval for PV Output Efficiency

The PV capacity S_{PV} used in the PV output sample data is fixed, but in assessing PV hosting capacity, the PV capacity is an optimization variable. To ensure the applicability of the data, the PV output P is expressed as the PV output efficiency ω, and the PV forecasting error Δ P is expressed as the PV output efficiency forecasting error Δ ω.

\{\begin{matrix} ω = \frac{P}{S_{PV}} \\ Δ ω = \frac{Δ P}{S_{PV}} \end{matrix}

(11)

Based on the CPDF of PV forecasting errors, a chance constraint is introduced to determine the confidence interval for PV output, forming an uncertainty set under chance constraints. Given a confidence level α and the PDF of PV forecasting errors, the confidence interval for PV output efficiency is calculated as follows:

\{\begin{matrix} W_{α} = [ω - Δ {\underline{ω}}_{α}, ω + Δ {\bar{ω}}_{α}] \\ P (Δ ω \leq Δ {\underline{ω}}_{α}) = (1 - α) / 2 \\ P (Δ ω \geq Δ {\bar{ω}}_{α}) = (1 - α) / 2 \end{matrix}

(12)
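The interval in (12) can be approximated from samples by taking the two tail quantiles of the efficiency forecasting error. In this sketch the quantiles are signed deviations that are added to ω, which is one reading of the sign convention in (12) and should be treated as an assumption; the sample values and function names are hypothetical.

```python
def quantile(sorted_samples, p):
    """Linearly interpolated empirical quantile for 0 <= p <= 1."""
    pos = p * (len(sorted_samples) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_samples) - 1)
    frac = pos - lo
    return sorted_samples[lo] * (1.0 - frac) + sorted_samples[hi] * frac

def efficiency_interval(omega, error_samples, alpha):
    """Interval for the PV output efficiency at confidence level alpha."""
    s = sorted(error_samples)
    tail = (1.0 - alpha) / 2.0           # probability mass left in each tail
    lower_dev = quantile(s, tail)        # signed deviation, usually negative
    upper_dev = quantile(s, 1.0 - tail)  # signed deviation, usually positive
    return omega + lower_dev, omega + upper_dev

# Hypothetical forecasting errors in MW for a plant of capacity s_pv (MW),
# converted to efficiency errors as in (11).
s_pv = 2.0
dp_samples = [-0.20, -0.12, -0.05, 0.00, 0.04, 0.10, 0.18]
d_omega = [dp / s_pv for dp in dp_samples]
interval = efficiency_interval(0.5, d_omega, 0.9)
```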

3.2. DRCC Fuzzy Set of PV Forecasting Errors Based on KL Divergence

In practical applications, many parameters that describe system behavior are inherently uncertain, often taking the form of random or fuzzy variables. To ensure the effectiveness and adaptability of optimization results in real-world applications, it is essential to properly address parameter uncertainty within the model. The general form of an uncertain programming model that incorporates random variables is expressed as follows:

\begin{array}{l} \min a {(ξ)}^{T} x_{s} \\ s . t . v (ξ) x_{s} \leq b (ξ) \\ x_{s} \geq 0 \end{array}

(13)

where a represents the objective function, and v and b together define the constraint conditions.

Each realization of the random variable ξ corresponds to a solution x_{s} (ξ), but decisions must be made under uncertainty. At the time of making decisions, the exact state of the uncertain variables is unknown, and their actual values only become evident after the decisions are implemented. To address these challenges, various strategies have been developed in academic research, including SO, RO, and DRO. This study focuses on the DRO method.

Under uncertain conditions, traditional constraints can be reformulated as follows:

g^{0} (x_{s}) + g {(x_{s})}^{T} ξ \leq 0

(14)

where g^{0} represents constraints independent of random variables, and g represents constraints affected by random variables.

The main challenge in uncertain programming problems lies in effectively managing constraint (14). This study adopts a DRCC programming approach to address this, with a specific focus on managing the following constraint:

P [g^{0} (x_{s}) + g {(x_{s})}^{T} ξ \leq 0] \geq 1 - ε, \forall P \in D

(15)

Equation (15) indicates that (14) holds with a probability of at least 1 - ε. In this equation, the random variable ξ follows a JPDF P, where P is uncertain and belongs to a fuzzy set D.

Based on Equation (15), the confidence interval for PV output can be reformulated as a chance constraint.

\{\begin{matrix} W_{α} = [ω - Δ {\underline{ω}}_{α}, ω + Δ {\bar{ω}}_{α}] \\ P (Δ ω \leq Δ {\underline{ω}}_{α}) \geq (1 - α) / 2 \\ P (Δ ω \geq Δ {\bar{ω}}_{α}) \geq (1 - α) / 2 \end{matrix}

(16)

The KL divergence can be used to describe the distance, D_KL, between the actual PDF f and the empirical probability distribution function (EPDF) f₀. The distance between the estimated PDF of PV forecasting errors and the EPDF of PV forecasting errors is given as follows [26]:

D_{KL} (f_{Δ P |P} ∥ f_{0}) = \int_{D} \log \frac{f_{Δ P |P} (ξ)}{f_{0} (ξ)} f_{Δ P |P} (ξ) d ξ

(17)

D_KL represents the KL divergence between the actual PDF, f_{Δ P |P}, and the EPDF, f₀. Therefore, the KL divergence characterizes the distance between the empirical distribution and the actual distribution. The KL divergence fuzzy set for PV forecasting errors is expressed as follows:

D = \{ℂ | D_{KL} (f_{Δ P |P} ∥ f_{0}) ⩽ d_{KL}, f_{Δ P |P} = \frac{d ℂ}{d ξ}\}

(18)
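On a common set of histogram bins, the integral in (17) becomes a finite sum and membership in the fuzzy set (18) is a single comparison. The sketch below is a discrete illustration under that binning assumption; the probability vectors and function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) on shared bins.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def in_fuzzy_set(candidate, empirical, d_kl):
    """True if the candidate distribution lies within tolerance d_kl of the EPDF."""
    return kl_divergence(candidate, empirical) <= d_kl

# Hypothetical empirical histogram and two candidate distributions.
empirical = [0.10, 0.20, 0.40, 0.20, 0.10]
close_candidate = [0.11, 0.20, 0.38, 0.20, 0.11]
far_candidate = [0.40, 0.20, 0.10, 0.10, 0.20]
memberships = [in_fuzzy_set(c, empirical, 0.01) for c in (close_candidate, far_candidate)]
```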

The value of d_KL can be estimated using historical data [34], and the estimation method is as follows:

d_{KL} = \frac{1}{2 M} χ_{N - 1, α}^{2}

(19)

It is evident from the expression that d_KL decreases as the sample size, M, increases, while it increases with the confidence level, α.
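The behaviour of (19) can be checked numerically. The sketch below makes two assumptions that are not stated in the text: N is treated as the number of histogram bins of the empirical distribution, and the chi-square quantile is approximated with the Wilson–Hilferty formula to avoid external dependencies (an exact quantile routine such as `scipy.stats.chi2.ppf` could be substituted). Function names are hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_wh(p, dof):
    """Wilson-Hilferty approximation of the chi-square p-quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def kl_tolerance(alpha, n_samples, n_bins):
    """d_KL from (19), with N read as the number of bins (assumption)."""
    return chi2_quantile_wh(alpha, n_bins - 1) / (2.0 * n_samples)

# Tolerance for a few sample sizes at a 95% confidence level and 10 bins.
tolerances = {m: kl_tolerance(0.95, m, 10) for m in (100, 1000, 10000)}
```

Because the sample size appears only in the denominator, a tenfold increase in historical data shrinks the tolerance tenfold at a fixed confidence level.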

By adjusting the value of d_KL, the range of variation in the actual PDF can be altered. All the actual probability distributions that satisfy the conditions constitute the KL divergence fuzzy set, while the DRCC aims to identify the worst-case scenario within the fuzzy set, ensuring that (20) holds.

\inf_{ℂ \in D} P [c (x_{s}, ξ) \leq 0] ⩾ 1 - α

(20)

where c (x_{s}, ξ) \leq 0 represents the inequality constraint that the random variable satisfies.

3.3. Chance-Constrained Uncertainty Set

Theorem 1 in [35] provides the equivalence transformation formula based on the KL divergence fuzzy set. The DRCC in (20) can be reformulated as the traditional chance constraint.

\{\begin{array}{l} P [c (x_{s}, ξ) \leq 0] ⩾ 1 - α_{1 +} \\ α_{1 +} = 1 - \inf_{z \in (0, 1)} \{\frac{e^{- d_{KL}} z^{1 - α} - 1}{z - 1}\} \end{array}

(21)
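The infimum in (21) has no simple closed form for positive d_KL, but it is a one-dimensional problem that a grid search handles adequately. The sketch below evaluates (21) as printed, with α read as in (20), that is, the constraint must hold with probability at least 1 − α. Clipping the result at zero is an assumption suggested by the "+" subscript, and the grid size and function name are hypothetical.

```python
import math

def adjusted_level(alpha, d_kl, grid=100_000):
    """alpha_{1+} from (21), found by grid search over z in (0, 1)."""
    scale = math.exp(-d_kl)
    best = float("inf")
    for k in range(1, grid):
        z = k / grid
        value = (scale * z ** (1.0 - alpha) - 1.0) / (z - 1.0)
        best = min(best, value)
    # Assumption: the "+" subscript denotes clipping at zero.
    return max(0.0, 1.0 - best)

# With zero tolerance the level is unchanged; a positive tolerance tightens it.
unchanged = adjusted_level(0.05, 0.0)
tightened = adjusted_level(0.05, 0.01)
```

As d_KL shrinks toward zero, for example because more historical samples are available, the adjusted value approaches the original α.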

Equation (21) adjusts the confidence level to α_{1 +} using the KL divergence-based fuzzy set, thereby yielding the PV output efficiency interval.

\{\begin{matrix} W_{α} = [ω - Δ {\underline{ω}}_{α_{1 +}}, ω + Δ {\bar{ω}}_{α_{1 +}}] \\ P (Δ ω \leq Δ {\underline{ω}}_{α_{1 +}}) = (1 - α_{1 +}) / 2 \\ P (Δ ω \geq Δ {\bar{ω}}_{α_{1 +}}) = (1 - α_{1 +}) / 2 \end{matrix}

(22)

Considering that extreme scenarios always occur at the boundaries, the above confidence interval is discretized into the following chance-constrained uncertainty set:

Ω_{α_{1 +}} = \{ω |\begin{array}{l} ω = ω - μ Δ {\underline{ω}}_{α_{1 +}} + ν Δ {\bar{ω}}_{α_{1 +}} \\ 0 \leq μ + ν \leq 1 \\ μ \in \{0, 1\}, ν \in \{0, 1\} \end{array}\}

(23)

where μ and ν represent the state variables for the lower and upper boundaries of the confidence interval of PV output efficiency, respectively. Using the above method, the uncertainty of PV output is represented as different deterministic PV output scenarios.

Ω_{α_{1 +}} = \{ω_{λ_{1}}, ω_{λ_{2}}, \dots, ω_{λ_{q}}\}

(24)

where λ_{1}, λ_{2}, \dots, λ_{q} represent different deterministic PV output scenarios. Each deterministic PV output scenario in the uncertainty set will serve as a constraint in the PV hosting capacity evaluation model.
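For a single forecast value, the binary variables in (23) admit three combinations, so the discretisation in (24) can be enumerated directly. The sketch below is illustrative; the deviations are passed as non-negative magnitudes, which is an assumption about the sign convention, and the function name is hypothetical.

```python
from itertools import product

def efficiency_scenarios(omega, lower_dev, upper_dev):
    """Enumerate the scenarios of (23) for one forecast efficiency omega.

    lower_dev and upper_dev are non-negative magnitudes (assumption).
    """
    scenarios = []
    for mu, nu in product((0, 1), repeat=2):
        if mu + nu <= 1:  # at most one boundary is active
            scenarios.append(omega - mu * lower_dev + nu * upper_dev)
    return scenarios

# Forecast, upper boundary and lower boundary for a hypothetical time interval.
scenarios = efficiency_scenarios(0.5, 0.1, 0.2)
```

The combination in which both variables equal one is excluded, so each time interval contributes the forecast itself plus the two interval boundaries.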

4. PV Hosting Capacity Assessment Model

In assessing the hosting capacity of distribution networks for PV generations, the key is to determine the maximum PV integration capacity that the network can accommodate while ensuring system stability and sustainability. An analytical method is employed as the assessment tool, with its core theoretical foundation rooted in the optimal power flow model.

4.1. Objective Function

The core task of assessing the PV hosting capacity is to determine the maximum PV integration capacity the network can accommodate. DRO aims to find the worst-case result within the fuzzy set. Accordingly, the objective function is defined as follows:

\min_{ω} \max f = \sum S_{PV, i}, i \in Ω^{PV}, ω \in Ω_{α_{1 +}}

(25)

where Ω_{α_{1 +}} represents the chance-constrained uncertainty set established previously. The process of the PV capacity assessment model is shown in Figure 3.

4.2. Constraints

4.2.1. Power Flow

The DistFlow branch model is used to represent the nonlinear power flow model of the distribution network as follows:

\{\begin{array}{l} \sum_{j \in Ω_{i}^{st}} P_{i j} - \sum_{k \in Ω_{i}^{end}} (P_{k i} - I_{k i}^{2} R_{k i}) = P_{inj, i} \\ \sum_{j \in Ω_{i}^{st}} Q_{i j} - \sum_{k \in Ω_{i}^{end}} (Q_{k i} - I_{k i}^{2} X_{k i}) = Q_{inj, i} \end{array}, i \in Ω^{N}

(26)

\begin{matrix} U_{j}^{2} = U_{i}^{2} - 2 (P_{i j} R_{i j} + Q_{i j} X_{i j}) + I_{i j}^{2} (R_{i j}^{2} + X_{i j}^{2}) \\ i, j \in Ω^{N}, i j \in Ω^{L} \end{matrix}

(27)

I_{i j}^{2} U_{i}^{2} = P_{i j}^{2} + Q_{i j}^{2}, i \in Ω^{N}, i j \in Ω^{L}

(28)

The above power flow model is nonlinear, and a second-order cone relaxation can be applied to handle it. In this process, the term U_{i}^{2} in the nonlinear model is replaced with {\tilde{U}}_{i}, and I_{i j}^{2} is replaced with {\tilde{I}}_{i j}. After substitution into the power flow equations, second-order cone relaxation is applied to (28).

{‖\begin{matrix} 2 P_{i j} \\ 2 Q_{i j} \\ {\tilde{I}}_{i j} - {\tilde{U}}_{i} \end{matrix}‖}_{2} ⩽ {\tilde{I}}_{i j} + {\tilde{U}}_{i}, i \in Ω^{N}, i j \in Ω^{L}

(29)
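Squaring both sides of (29) shows that it is equivalent to requiring that the product of the squared current and squared voltage variables is at least P² + Q², that is, (28) with the equality relaxed to an inequality. The sketch below checks this numerically for given values; the function names and the numbers are hypothetical.

```python
import math

def soc_holds(p, q, i_sq, u_sq, tol=1e-9):
    """Check the cone constraint (29) for squared current i_sq and voltage u_sq."""
    lhs = math.sqrt((2.0 * p) ** 2 + (2.0 * q) ** 2 + (i_sq - u_sq) ** 2)
    return lhs <= i_sq + u_sq + tol

def relaxation_gap(p, q, i_sq, u_sq):
    """Slack of the relaxed form of (28); zero means the relaxation is exact."""
    return i_sq * u_sq - (p ** 2 + q ** 2)

# A point satisfying (28) exactly, one violating it, and one with slack.
checks = [
    (soc_holds(3.0, 4.0, 5.0, 5.0), relaxation_gap(3.0, 4.0, 5.0, 5.0)),
    (soc_holds(3.0, 4.0, 4.0, 5.0), relaxation_gap(3.0, 4.0, 4.0, 5.0)),
    (soc_holds(3.0, 4.0, 6.0, 5.0), relaxation_gap(3.0, 4.0, 6.0, 5.0)),
]
```

A positive gap at the optimum would indicate that the relaxed solution does not satisfy the original equality, so the gap is worth inspecting after solving.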

4.2.2. PV Operating Constraints

The active power output of PV generations is determined by the PV grid-connected capacity and the PV output efficiency. Through the PV inverter, reactive power can be either generated or absorbed under a specific power factor angle, which is expressed as follows:

\{\begin{array}{l} P_{PV, i} = ω_{i} S_{PV, i} \\ ω_{i} \in Ω_{α_{1 +}} \\ Ω_{α_{1 +}} = \{ω_{λ_{1}}, ω_{λ_{2}}, \dots, ω_{λ_{q}}\} \\ Q_{PV, i} = \tan φ_{i} P_{PV, i} \end{array}, i \in Ω^{PV}

(30)

By adjusting the state variables μ and ν in the chance-constrained uncertainty set of PV output efficiency Ω_{α_{1 +}}, different values of ω can be obtained, leading to different PV hosting capacity evaluation results. The evaluation result with the minimum PV hosting capacity is taken as the final result.
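The selection rule described above amounts to solving one capacity maximisation per scenario and keeping the smallest result. The sketch below shows only that outer structure: `toy_max_capacity` is a hypothetical stand-in for the full optimal power flow model, in which a single export limit caps the PV output, and it is not part of the paper's formulation.

```python
def worst_case_capacity(scenarios, max_capacity_for):
    """Smallest of the per-scenario maximum PV capacities."""
    return min(max_capacity_for(omega) for omega in scenarios)

def toy_max_capacity(omega, export_limit_mw=3.0):
    """Hypothetical single-bus model: omega * S_PV must not exceed the limit."""
    return export_limit_mw / omega

# Three efficiency scenarios from the uncertainty set (hypothetical values).
hosting_capacity = worst_case_capacity([0.4, 0.5, 0.6], toy_max_capacity)
```

In this toy model the highest efficiency scenario is binding, because it produces the most power per unit of installed capacity.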

4.2.3. Network Safety Constraints

The premise of PV hosting capacity assessment is the satisfaction of system safety constraints. The main constraints include transmission power constraints at interconnected nodes, node voltage constraints, and line transmission capacity constraints, which are expressed as follows:

\{\begin{array}{l} P_{sub, i}^{\min} ⩽ P_{sub, i} ⩽ P_{sub, i}^{\max} \\ Q_{sub, i}^{\min} ⩽ Q_{sub, i} ⩽ Q_{sub, i}^{\max} \end{array}, i \in Ω^{sub}

(31)

U_{i}^{\min} \leq U_{i} \leq U_{i}^{\max}, i \in Ω^{N}

(32)

P_{i j}^{2} + Q_{i j}^{2} ⩽ {(S_{i j}^{2})}^{\max}, i j \in Ω^{L}

(33)

The transmission capacity constraint of the power line in (33) is nonlinear. To simplify the subsequent optimization, a quadratic constraint linearization method can be applied, replacing the circular constraint with two square constraints, one of which is rotated by 45°.

\{\begin{array}{l} - S_{i j}^{\max} \leq P_{i j} \leq S_{i j}^{\max} \\ - S_{i j}^{\max} \leq Q_{i j} \leq S_{i j}^{\max} \\ - \sqrt{2} S_{i j}^{\max} \leq P_{i j} + Q_{i j} \leq \sqrt{2} S_{i j}^{\max} \\ - \sqrt{2} S_{i j}^{\max} \leq P_{i j} - Q_{i j} \leq \sqrt{2} S_{i j}^{\max} \end{array}

(34)
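The eight inequalities in (34) describe a regular octagon that encloses the circle of radius S^max, so the linearised limit is slightly looser than (33) near the octagon's corners. The sketch below checks a few points; the function name and the test values are hypothetical.

```python
import math

def within_linearised_limit(p, q, s_max):
    """Check the linear constraints (34) for branch flows p and q."""
    diag = math.sqrt(2.0) * s_max
    return (
        abs(p) <= s_max
        and abs(q) <= s_max
        and abs(p + q) <= diag
        and abs(p - q) <= diag
    )

# Inside the circle, outside the circle but inside the octagon, and outside both.
results = [
    within_linearised_limit(0.6, 0.6, 1.0),
    within_linearised_limit(1.0, 0.4, 1.0),
    within_linearised_limit(1.0, 1.0, 1.0),
]
```

The second point has an apparent power of about 1.08 times the limit yet passes the linear check, which illustrates the small over-estimate introduced by the approximation.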

5. Case Studies

In this section, the PV hosting capacity assessment method is applied to the IEEE 33-bus system.

5.1. Test System Parameters

The IEEE 33-bus distribution system operates at a standard voltage of 12.66 kV, with node 1 at the substation acting as the interconnection node, assumed to have a voltage of 1 p.u. The day is divided into 24 time intervals, with load and PV output data sourced from the publicly available database of Ireland. The PV output data are real operational data, and the PV forecasting error is based on predicted data. The selection of data samples must first ensure reliability. If there are missing values in the historical data, the mean of adjacent values is used to fill in the gaps. A small number of outliers will not have a significant impact on the results, but if a large number of outliers occur in a concentrated manner, data should be reselected or deleted, provided that the required sample size is met. If the sample size cannot be met, the mean value of the historical dataset at that time is used instead. Since the PV hosting capacity assessment focuses on normal system operation, extreme cases need not be considered separately. If a small number of extreme cases occur in the dataset, such as extreme weather or grid failures, the above method can also be used for data processing.

5.2. Comparative Analysis of the Probabilistic Model for PV Forecasting Errors

This section compares five different probabilistic models for PV forecasting errors. The CvM statistics, AD statistics, and RMSE for each method are shown in Table 1.

Model 4 assumes that PV forecasting errors for each time interval are independent and normally distributed, whereas Model 5 assumes that the overall PV forecasting errors are normally distributed. From Table 1, it can be observed that the models constructed using the KDE method have smaller CvM and AD statistics. This suggests that the PV forecasting error probability distribution derived using the KDE method is more closely aligned with the actual distribution. The method fits especially well when the probability distribution has unknown parameters. Its advantage is clearest in the AD statistic, because the AD statistic effectively captures the fit of the distribution's tails, and the PV uncertainty model presented in this paper focuses on assessing the extreme values of forecasting errors. Therefore, the KDE method can quantify PV uncertainty more precisely.

Furthermore, the time-segmented KDE method takes into account the temporal variations in PV forecasting errors, while the Copula method considers the correlation between PV forecasting errors and PV output. These methods make the model output more closely match observed data, leading to a lower RMSE for the combined time-segmented KDE and Copula model.

5.3. Analysis of the Sensitivity for Model Parameters

Within the obtained PV output efficiency interval, the minimum total PV hosting capacity is calculated as the PV hosting capacity of the system. In this section, PV generations can be installed at any node except for the transformer node. The PV hosting capacity is assessed under different confidence levels and historical sample sizes. The specific assessment results are shown in Figure 4.

As shown in Figure 4, the PV capacity gradually decreases as the confidence level increases. A higher confidence level implies that the uncertainty range of PV output becomes broader, suggesting an increased tolerance for PV uncertainty. Additionally, the PV capacity increases with the number of historical samples. According to Equation (19), a larger number of historical samples results in a smaller KL divergence tolerance, indicating a closer match between the estimated and actual distributions. According to Equation (21), the adjusted confidence level also increases, signifying a reduction in the PV output uncertainty that can be tolerated.

Figure 5 illustrates the adjusted confidence levels for different confidence levels and KL divergence tolerance values. It is worth noting that, regardless of the confidence level, the change in the adjusted confidence level is small. As more historical data become available, the depiction of PV uncertainty becomes more accurate, and the probability of PV output reaching extreme conditions becomes lower. However, at extreme confidence levels, the probability of extreme conditions is accepted regardless of its size, resulting in small changes in the adjusted confidence level.

In special cases, when the confidence level is 0, it indicates that there is no uncertainty in the PV system, and the result corresponds to the deterministic optimization outcome, which is 20.3781 MW. Conversely, when the confidence level is 1, it represents the worst-case scenario under all possible conditions, yielding the result of RO, which is 12.4445 MW. Therefore, when the confidence level is either 0 or 1, the change in PV hosting capacity with varying sample sizes is minimal. From the assessment results, it is evident that the PV hosting capacity obtained through deterministic optimization is greater than that from RO and DRO, as deterministic optimization relies solely on historical data without accounting for uncertainty in PV output, leading to an overly optimistic result. In contrast, the PV hosting capacity obtained from RO is smaller than that from DRO, reflecting its higher conservatism. The reason for this is that RO only considers the uncertainty set and neglects the probabilistic distribution characteristics of uncertainty, leading to a result skewed towards the worst-case scenario with excessive conservatism. In comparison, the DRCC method employed in this study effectively balances the historical data and the EPDF of uncertainty, offering a well-balanced trade-off between optimism and conservatism, and demonstrating significant practical value.
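The two limiting cases can be illustrated with a deliberately simplified single-constraint toy problem, which is not the network model used in this paper. The installed capacity $S$ must satisfy $S ω \le P_{max}$ for every PV output efficiency $ω$ inside a central interval covering a fraction $α$ of the historical samples. The export limit, the synthetic efficiency samples, and the function names are assumptions for illustration only.

```python
import random


def upper_bound(samples, confidence):
    """Upper end of the central interval that covers `confidence` of the samples."""
    xs = sorted(samples)
    # confidence = 0 collapses to the median; confidence = 1 reaches the maximum.
    q = 0.5 + confidence / 2.0
    idx = min(int(round(q * (len(xs) - 1))), len(xs) - 1)
    return xs[idx]


def toy_hosting_capacity(samples, confidence, p_max=10.0):
    """Largest S (MW) with S * omega <= p_max for all omega in the interval."""
    return p_max / upper_bound(samples, confidence)


rng = random.Random(1)
# Hypothetical PV output efficiencies around 0.6, clipped to [0.05, 1].
omega = [min(max(rng.gauss(0.6, 0.1), 0.05), 1.0) for _ in range(1200)]

levels = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
capacities = [toy_hosting_capacity(omega, a) for a in levels]
for a, s in zip(levels, capacities):
    print(f"confidence {a:.1f}: capacity {s:.2f} MW")
```

In this toy setting the capacity at confidence level 0 depends only on the central (median) efficiency, as in a deterministic assessment, while the capacity at confidence level 1 is dictated by the single most extreme sample, as in RO; intermediate levels fall between the two.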

Based on the above analysis, this method faces several challenges when applied to real power grids. A higher confidence level leads to a broader uncertainty range, which reduces the PV integration capacity and results in more conservative outcomes. On the other hand, a lower confidence level may overestimate the PV integration capacity, leading to optimistic but potentially unreliable predictions. The accuracy of this method largely depends on the quantity and quality of historical data. For grid operators, collecting sufficient and accurate historical data is a significant challenge. In real-world scenarios, data may not always be comprehensive or updated in a timely manner, which could lead to uncertainty in the PV integration capacity assessment. This issue is particularly relevant in the context of transitioning toward more sustainable energy systems, where accurate forecasting and integration of renewable energy sources like PV are essential for reducing reliance on fossil fuels and minimizing environmental impact. In response to these challenges, we offer the following recommendations to promote both reliability and sustainability. In the actual operation of the power grid, decision makers can first determine the confidence level based on their risk tolerance and then adjust the confidence level according to the scale and reliability of the available data to obtain PV hosting capacity that meets practical requirements.

5.4. Comparative Analysis of Different Methods

This section will first compare the differences between DRCC, RO, and SO in terms of evaluation results and computational efficiency. The specific results are shown in Table 2. The DRCC sample size is 1200, with a confidence level of 0.8.

From the table, it can be seen that the evaluation result of RO is smaller than that of DRCC. As stated in Section 5.3, the evaluation result of RO is equivalent to the result of DRCC when the confidence level is 1. This is because RO only considers the worst-case scenario in the PV output uncertainty set, so its result is comparable to the minimum result of DRCC. The computation time for RO is only one-third of that for DRCC because RO only needs to calculate the worst-case scenario, so its computational load is smaller than that of DRCC. The PV hosting capacity evaluated by SO is slightly larger than that of DRCC. This is because the worst-case scenario occupies a small proportion in the uncertainty set, so the probability of encountering the worst-case scenario when SO traverses all scenarios is very small, making its result much larger than RO’s and slightly larger than DRCC’s evaluation result. The computation time for SO is much greater than that of DRCC and RO because SO needs to traverse all the scenarios in the uncertainty set and evaluate them sequentially, which often consumes a significant amount of time.
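The relative differences discussed above can be checked directly from Table 2. The short Python sketch below only restates the table values; the dictionary layout and variable names are illustrative.

```python
# Values copied from Table 2: (PV hosting capacity in MW, run time in s).
table2 = {"RO": (12.44, 30.87), "SO": (15.98, 4786.0), "DRCC": (14.31, 88.27)}

cap_drcc, time_drcc = table2["DRCC"]
summary = {}
for method in ("RO", "SO"):
    cap, run_time = table2[method]
    cap_change = 100.0 * (cap - cap_drcc) / cap_drcc   # percent relative to DRCC
    time_ratio = run_time / time_drcc                  # multiple of the DRCC run time
    summary[method] = (cap_change, time_ratio)
    print(f"{method}: capacity {cap_change:+.1f}% vs DRCC, run time x{time_ratio:.2f}")
```

With the Table 2 values, the RO run time is about 0.35 times that of DRCC, in line with the one-third quoted above, while SO takes roughly 54 times as long as DRCC.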

While RO ensures robustness, its conservatism may unnecessarily restrict PV deployment, hindering progress toward renewable energy targets. Conversely, SO’s reliance on historical data overlooks extreme weather events exacerbated by climate change, risking grid instability as PV penetration grows. By balancing robustness and adaptability, the proposed KL divergence-based method supports sustainable energy transitions by avoiding both underutilization of renewables and over-optimism.

This section will also compare the differences between the Wasserstein distance and KL divergence as two distance measurement methods. According to the method proposed in [30], the PV hosting capacity is calculated under different confidence levels. The results are shown in Figure 6.

Confidence level partly represents the risk tolerance. As the confidence level increases, the uncertainty of PV output increases, and the PV hosting capacity evaluation result becomes lower. This result is consistent with the KL divergence-based method. The Wasserstein distance measures the minimum amount of work required to transform one distribution into another. In DRCC, it is required that the probability of the Wasserstein distance not exceeding the tolerance value is greater than or equal to a specified confidence probability. Therefore, as the Wasserstein distance increases, the uncertainty of PV output increases, and the PV hosting capacity evaluation result becomes lower. However, as the distance grows larger, its growth rate slows down. Due to the asymmetry of KL divergence, it requires that the gap between the actual and empirical distributions cannot be too large, which is clearly in line with practical production.

To make a more detailed comparison with the evaluation results in Section 5.3, the maximum Wasserstein distance between the distribution in the uncertainty set and the actual distribution is also calculated for different confidence levels and sample sizes. The specific results are shown in Table 3.

Unlike KL divergence, Wasserstein distance can measure the distance between any distributions, so the tolerance value for Wasserstein distance has a much larger range and includes more extreme cases. Consequently, the range of the evaluated PV hosting capacity is larger than that obtained using KL divergence. While PV output uncertainty is a low-dimensional problem, and KL divergence is easier to compute when extreme cases are not considered, it is also more efficient for practical applications, especially when the tolerance value can be derived from actual data. This makes KL divergence particularly useful for real-world scenarios, where time and computational resources are often limited. If more accurate or extreme-case results are needed, Wasserstein distance can be used. However, small changes in the Wasserstein distance can cause large differences in the evaluation results, which might introduce unnecessary conservatism. In this context, careful consideration is needed to filter the results based on the actual situation, ensuring that the evaluation not only provides accurate assessments but also aligns with sustainability goals by facilitating the efficient integration of renewable energy into the grid while maintaining system stability and minimizing environmental impact.
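The difference in scope between the two measures can be seen on a small discrete example. The Python sketch below uses made-up histograms on a common, equally spaced grid and computes the KL divergence of a candidate distribution from the empirical one together with the 1-Wasserstein distance. The helper names are illustrative, and no claim is made that this matches the implementation in [30].

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions on the same bins."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def wasserstein_1d(p, q, bin_width=1.0):
    """1-Wasserstein distance on an equally spaced grid: sum of |CDF_p - CDF_q|."""
    dist, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        dist += abs(cdf_gap) * bin_width
    return dist


empirical = [0.1, 0.4, 0.4, 0.1, 0.0]  # hypothetical empirical histogram
nearby = [0.1, 0.3, 0.4, 0.2, 0.0]     # small reshuffle inside the same support
shifted = [0.0, 0.1, 0.4, 0.4, 0.1]    # mass on a bin the empirical data never reached

for name, candidate in (("nearby", nearby), ("shifted", shifted)):
    print(
        name,
        "KL =", kl_divergence(candidate, empirical),
        "W1 =", round(wasserstein_1d(candidate, empirical), 3),
    )
```

Here the shifted candidate has an infinite KL divergence from the empirical histogram but a finite Wasserstein distance of one bin width, which is why a Wasserstein ball can contain more extreme distributions than a KL ball built around the same empirical data.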

Table 4 shows the computation time required for the two methods under different dataset sizes. It can be observed from the table that as the data scale increases, the computation time also slightly increases. This is due to the fact that DRCC, when handling the uncertainty set, requires a thorough consideration of historical data to achieve a more accurate data distribution. The uncertainty set of PV output is then determined based on the corresponding confidence level. While this process increases computation time, the increase in time is not significant as the dataset grows, showing that the method can effectively handle large-scale datasets. The table also shows that when using the Wasserstein distance for evaluation, the computation time required is higher than that for KL divergence. While Wasserstein distance offers a more general framework for measuring distributional differences and can handle a wider variety of distributions, it also increases computational complexity, requiring more computation time. KL divergence provides computational simplicity and clear interpretability, but it may be less flexible when dealing with various distributional differences. However, this paper presents a probability estimation method combining KDE and Copula, which produces a more accurate PV output probability distribution, ensuring that the difference between the empirical and actual distributions remains small. This method enhances overall computational efficiency and speed while maintaining the accuracy of the evaluation results.

5.5. Impact of PV Installation Location

In this section, the PV hosting capacity is assessed with a sample size of 1200 and a confidence level of 0.8. To analyze the impact of PV installation locations on PV hosting capacity, the PV hosting capacity is assessed under the following two scenarios.

Scenario 1: PV installation is allowed at node 2 only, as this node is directly connected to the highest-capacity lines in the system.

Scenario 2: PV installation is allowed at end nodes 18, 22, 25, and 33.

Figure 7 illustrates the PV hosting capacity under different confidence levels and sample sizes for the two scenarios. Scenario 1 allows PV installation at a single upstream node, whereas Scenario 2 allows it at four downstream nodes, so the overall PV hosting capacity in Scenario 1 is lower. However, the average hosting capacity per node in Scenario 1 is higher, about twice that of Scenario 2. This discrepancy primarily stems from the locational characteristics of upstream and downstream nodes in the power system, highlighting how the distribution of PV installations impacts grid efficiency and sustainability.

Figure 8 illustrates the PV hosting capacity of each node when PV installation is restricted to a single node. Feeder head nodes, being closer to the power source, have a minimal impact on the system's overall voltage levels, so installing PV at these nodes is less likely to cause voltage fluctuations or overvoltage issues. This characteristic is beneficial for maintaining system stability and minimizing the environmental impact of renewable energy integration. In contrast, feeder end nodes, which are farther from the power source, have a greater influence on the system voltage levels. PV integration at these nodes is more likely to lead to voltage increases, potentially exceeding the system's safe operating limits. Furthermore, feeder head nodes can transmit power not only to feeder end nodes but also back upstream, enabling bidirectional power flow. This flexibility enhances their ability to accommodate PV energy, facilitating a more resilient and sustainable grid. Feeder end nodes, by comparison, are typically limited to unidirectional power flow, since they have no downstream nodes to supply. This limitation restricts their PV hosting capacity and limits their contribution to a greener, more efficient energy system.
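The voltage argument can be made concrete with the linearized DistFlow approximation, in which an injection $P$ at a node raises that node's voltage by roughly $(R P + X Q)/U$, where $R$ and $X$ are the accumulated resistance and reactance of the path back to the substation. The feeder below is a hypothetical line with identical sections, unity-power-factor PV, and a 1.05 p.u. upper voltage limit; none of these values come from the test systems in this paper.

```python
def max_injection_pu(node, r_section=0.01, x_section=0.005,
                     u_source=1.0, u_max=1.05, tan_phi=0.0):
    """Largest active injection (p.u.) at `node` before its voltage hits u_max.

    Linearized DistFlow: delta_U ~ (R_path * P + X_path * Q) / u_source,
    with node 1 directly behind the substation and Q = P * tan_phi.
    Loads are ignored to isolate the effect of location.
    """
    r_path = node * r_section
    x_path = node * x_section
    rise_per_pu = (r_path + x_path * tan_phi) / u_source
    return (u_max - u_source) / rise_per_pu


limits = {node: max_injection_pu(node) for node in range(1, 6)}
for node, p in limits.items():
    print(f"node {node}: voltage-limited injection ~ {p:.2f} p.u.")
```

Under these assumptions the voltage-limited injection falls in inverse proportion to the electrical distance from the substation. Thermal limits and loads, which the full model includes, are ignored in this sketch.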

5.6. The Impact of the Number of Nodes Allowed for PV Installation

In this section, a larger-scale 118-node system is used as the test system. The topology of the 118-node system is shown in Figure 9. The active and reactive power loads at each node, as well as the line parameters, are provided in [36]. The total active load of the system is 22.71 MW, and the total reactive load is 17.04 Mvar. The system's base voltage is 11 kV, and the base capacity is 10 MVA.
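For readers who want to convert the line parameters and loads to per-unit values, the base quantities follow directly from the stated base voltage and base capacity. The sketch below assumes a three-phase system with the 11 kV base given as a line-to-line voltage; the variable names are illustrative.

```python
import math

S_BASE_MVA = 10.0   # base capacity stated in the text
U_BASE_KV = 11.0    # base voltage stated in the text (assumed line-to-line)

z_base_ohm = U_BASE_KV ** 2 / S_BASE_MVA                  # kV^2 / MVA = ohm
i_base_a = S_BASE_MVA * 1e3 / (math.sqrt(3) * U_BASE_KV)  # kVA / kV = A

p_load_pu = 22.71 / S_BASE_MVA   # total active load in p.u.
q_load_pu = 17.04 / S_BASE_MVA   # total reactive load in p.u.
print(f"Z_base = {z_base_ohm:.2f} ohm, I_base = {i_base_a:.1f} A")
print(f"total load = {p_load_pu:.3f} + j{q_load_pu:.3f} p.u.")
```

With these bases, the base impedance is 12.1 Ω and the total system load is about 2.271 + j1.704 p.u.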

This section discusses the impact of the number of PV nodes on PV hosting capacity and system performance. The PV hosting capacity is assessed with a sample size of 1200 and a confidence level of 0.8. The combinations of PV installation capacities for different numbers of PV nodes are shown in Figure 10. Each region in the figure corresponds to the PV capacity installed at a given node. Uncolored regions indicate no PV installation, while colored regions represent PV installation, with the color intensity reflecting the magnitude of the PV capacity installed. From the figure, it can be observed that when the number of PV nodes exceeds twenty, the system’s PV hosting capacity stabilizes and no longer changes with an increase in the number of nodes. This indicates that the system’s PV hosting capacity is significantly affected when the number of PV nodes is limited. However, once the number of nodes reaches a certain threshold, further increasing the number of nodes provides little improvement in the overall PV hosting capacity. These findings underscore the importance of optimizing PV installations for maximum sustainability impact, ensuring that grid capacity is used efficiently and that renewable energy resources are integrated effectively to reduce environmental impacts and improve energy efficiency.

Figure 11, Figure 12 and Figure 13 show the voltage, active and reactive power of Node 67 and Branch 67 at different times, as well as the voltage of all nodes and the active and reactive power of all branches at the 12th hour. The results show that as the number of PV nodes increases, the maximum power values on the branches and the maximum voltages at the nodes both exhibit a decreasing trend. Additionally, power distribution across the branches becomes increasingly balanced. Specifically, when the number of PV nodes is relatively small, prolonged overvoltage and heavy loading are observed around noon at PV nodes and their adjacent branches. As the number of PV nodes increases, the installation capacity at each PV node decreases, which not only reduces the peak node voltages and their duration but also alleviates the transmission burden on the associated branches. These results highlight the importance of distributing PV installations in promoting a more sustainable and resilient power system. By optimizing the integration of renewable energy resources, this approach helps minimize the environmental impact of energy production, reduces the risks of overvoltage and overload typically caused by centralized PV installations, and enhances system safety. Furthermore, it supports the transition towards a cleaner, more efficient grid by improving the utilization of renewable energy while maintaining system stability.

A detailed analysis of the system’s operating state reveals that branch power flows and node voltages remain within the safety constraints. Within the same period, the voltage generally decreases from the feeder’s head to its end, while the voltage at the same node varies across time, with particularly notable fluctuations during midday. Comparing these variations with the PV output curve reveals a strong correlation between periods of significant node voltage fluctuation and times of sharp changes in PV output, highlighting the impact of PV on node voltage stability. Additionally, reverse power flow occurs in branches during these periods, accompanied by a corresponding voltage increase at nearby nodes. These observations illustrate how PV can induce reverse power flow, thereby elevating node voltage.

6. Conclusions

This paper proposes a DRCC method based on KL divergence to assess the PV hosting capacity in distribution networks, addressing the uncertainty of PV output. Through theoretical analysis and case studies, the main conclusions of this work can be summarized as follows:

(1) The proposed time-segmented adaptive bandwidth kernel density estimation method, combined with Copula theory, effectively captures the characteristics of PV forecasting errors as they change over time and with different PV output. Through analysis and comparison, it is demonstrated that this method can accurately estimate the CPDF of PV forecasting errors;
(2) The number of historical samples and the confidence level significantly affect the assessment results of PV capacity. By adjusting the confidence level based on the tolerable range of PV output uncertainty, it is possible to balance the optimism and conservatism of the assessment results;
(3) The installation location of PV has a significant impact on PV capacity levels, with upstream nodes typically having higher capacity than downstream nodes. Furthermore, the number of PV installations also affects capacity, but once the number exceeds a certain threshold, the increase in capacity becomes less significant. However, decentralized PV installations help mitigate the risk of overvoltage and overload in the distribution network.

Author Contributions

Conceptualization, C.S. and H.L.; methodology, C.S.; software, C.S.; validation, C.S.; data curation, C.S. and C.H.; writing—original draft preparation, C.S.; writing—review and editing, J.W., H.L., C.H. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [grant number 52207091] and Natural Science Foundation of Jiangsu Province [grant number BK20220977].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Abbreviation
PV	Photovoltaic
RO	Robust optimization
SO	Stochastic optimization
DRCC	Distributionally robust chance-constrained
KL	Kullback–Leibler
KDE	Kernel density estimation
MA	Mathematical analysis
DRO	Distributionally robust optimization
PDF	Probability distribution function
CPDF	Conditional probability density function
JPDF	Joint probability distribution function
CvM	Cramér-von Mises
ECDF	Empirical cumulative distribution function
CDF	Cumulative distribution function
RMSE	Root mean square error
EPDF	Empirical probability distribution function
Indices and Sets
i	Index of node
ij, kj	Indices of branches that end at node j
n	Index of data point
t	Index of time segment
inj	Index of power injection
PV	Index of photovoltaic
sub	Index of feeder head
$Ω_{α_{1 +}}$	Chance-constrained uncertainty set
$Ω_{i}^{end}$	Set of starting nodes for the line with ending node i
$Ω^{L}$	Set of all branches
$Ω^{N}$	Set of all nodes
$Ω^{PV}$	Set of PV nodes
$Ω_{i}^{st}$	Set of ending nodes for the line with starting node i
$Ω^{sub}$	Set of nodes interconnected with the upstream power grid
Parameters
$μ$ , $ν$	State variables for the lower and upper bounds of the confidence interval
$χ_{N - 1, α}^{2}$	Upper α-quantile of the chi-square distribution with N − 1 degrees of freedom
$ℂ$	Cumulative distribution function of the actual f
$C_{In}$	Number of time segments
$C_{P Δ P}$	Copula distribution function of $P$ and $Δ P$
$D$	Fuzzy set of PDF
$\hat{f}$	Estimated probability density
$f_{e}$	Empirical probability distribution function
$f_{J}$	Joint probability density function
$f$	Marginal probability density function
$f_{Δ P \|P}$	Conditional probability density function of $Δ P$ for the time segment t
$F_{e}$	Empirical cumulative distribution function
$F_{J}$	Joint cumulative distribution function
$F$	Marginal cumulative distribution function
$M$	Total number of samples
$Ρ [A]$	Probability of event A occurring
$R_{i j}$ $, X_{i j}$	Resistances and reactances of the branch ij
$W_{α}$	Confidence interval for PV output efficiency at a confidence level $α$
$P$	Sample of PV output
$Δ P$	Sample of PV forecasting error
Variables
$α$	Confidence level
$α_{1 +}$	Confidence interval adjustment value
$Δ ω$	PV power conversion coefficient error
$Δ {\underline{ω}}_{α}$	Lower bounds of the PV forecasting error at the confidence level $α$
$Δ {\bar{ω}}_{α}$	Upper bounds of the PV forecasting error at the confidence level $α$
$ε$	Probability of failure
$ξ$	Random variable
$τ$	Kendall rank correlation coefficient
$φ_{i}$	Power factor angle of the PV
$ω$	PV output conversion coefficient
$A^{2}$	Anderson-Darling test statistic
$d_{KL}$	Tolerance value of the KL divergence
$D_{KL}$	KL divergence
$h_{Δ P, t}$	Estimated bandwidth of PV forecasting error $Δ P$ for the time segment t
$I_{i j}$	Currents in branch ij
$N_{t}$	Number of data points in the time segment t
$P_{i j}$ $, Q_{i j}$	Active and reactive power flows in branch ij
$P_{inj, i}$ $, Q_{inj, i}$	Active and reactive power injections at node i
$P_{PV, i}$ $, Q_{PV, i}$	Active and reactive power output of the PV
$P_{sub, i}$ $, Q_{sub, i}$	Active and reactive interconnected power at node i
$R_{M S E}$	Root mean square error
$S_{i j}$	Transmission capacity of the power line
$S_{PV, i}$	PV capacity installed at node i
$U_{i}$	Voltages at nodes i
$W^{2}$	Cramér-von Mises test statistic
$Δ {\bar{P}}_{t}$	Mean PV forecasting error in the time segment t
$Δ P_{t, n}$	PV forecasting error of the data point n in the segment t
$x_{s}$	Decision variable
$Δ {\overset{⌢}{P}}_{t}$	Corresponding predicted value from the model
$w$	Weighting function

References

1. United Nations. The 17 Goals. Available online: https://sdgs.un.org/goals (accessed on 1 October 2023).
2. United Nations Department of Economic and Social Affairs. The Sustainable Development Goals Report 2022; United Nations: New York, NY, USA, 2022; Available online: https://unstats.un.org/sdgs/report/2022/The-Sustainable-Development-Goals-Report-2022.pdf (accessed on 7 July 2022).
3. Koirala, A.; Van Acker, T.; Hulst, R.D.; Van Hertem, D. Hosting capacity of photovoltaic systems in low voltage distribution systems: A benchmark of deterministic and stochastic approaches. Renew. Sustain. Energy Rev. 2022, 155, 111899.
4. Iweh, C.D.; Gyamfi, S.; Tanyi, E.; Effah-Donyina, E. Distributed generation and renewable energy integration into the grid: Prerequisites, push factors, practical options, issues and merits. Energies 2021, 14, 5375.
5. Chathurangi, D.; Jayatunga, U.; Perera, S.; Agalgaonkar, A.P.; Siyambalapitiya, T. A nomographic tool to assess solar PV hosting capacity constrained by voltage rise in low-voltage distribution networks. Int. J. Electr. Power Energy Syst. 2022, 134, 107409.
6. Wu, H.; Yuan, Y.; Zhang, X.; Miao, A.; Zhu, J. Robust comprehensive PV hosting capacity assessment model for active distribution networks with spatiotemporal correlation. Appl. Energy 2022, 323, 119558.
7. Son, Y.; Lim, S.; Yoon, S.; Khargonekar, P.P. Residential demand response-based load-shifting scheme to increase hosting capacity in distribution system. IEEE Access 2022, 10, 18544–18556.
8. Zhang, S.; Fang, Y.; Zhang, H.; Cheng, H.; Wang, X. Maximum hosting capacity of photovoltaic generation in SOP-based power distribution network integrated with electric vehicles. IEEE Trans. Ind. Informat. 2022, 18, 8213–8224.
9. Ding, F.; Mather, B. On distributed PV hosting capacity estimation, sensitivity study, and improvement. IEEE Trans. Sustain. Energy 2016, 8, 1010–1020.
10. Rylander, M.; Smith, J.; Sunderman, W. Streamlined method for determining distribution system hosting capacity. IEEE Trans. Ind. Appl. 2016, 52, 105–111.
11. Gush, T.; Kim, C.-H.; Admasie, S.; Kim, J.-S.; Song, J.-S. Optimal Smart Inverter Control for PV and BESS to Improve PV Hosting Capacity of Distribution Networks Using Slime Mould Algorithm. IEEE Access 2021, 9, 52164–52176.
12. Abideen, M.Z.U.; Ellabban, O.; Al-Fagih, L. A review of the tools and methods for distribution networks' hosting capacity calculation. Energies 2020, 13, 2758.
13. Ismael, S.M.; Abdel Aleem, S.H.; Abdelaziz, A.Y.; Zobaa, A.F. State-of-the-art of hosting capacity in modern power systems with distributed generation. Renew. Energy 2019, 130, 1002–1020.
14. Mulenga, E.; Bollen, M.H.; Etherden, N. A review of hosting capacity quantification methods for photovoltaics in low-voltage distribution grids. Int. J. Electr. Power 2020, 115, 105445.
15. Chen, X.; Wu, W.; Zhang, B. Robust capacity assessment of distributed generation in unbalanced distribution networks incorporating ANM techniques. IEEE Trans. Sustain. Energy 2018, 9, 651–663.
16. Mahmoodi, M.; Attarha, A.; Noori, S.M.; Scott, P.; Blackhall, L. Adjustable robust approach to increase DG hosting capacity in active distribution systems. Electr. Power Syst. Res. 2022, 211, 108347.
17. Naveed, Q.; Ammar, A.; John, J.R.M.; Mahmoud, K.; Lehtonen, M. Probabilistic hosting capacity assessment towards efficient PV-rich low-voltage distribution networks. Electr. Power Syst. Res. 2024, 226, 109940.
18. Cho, Y.; Lee, E.; Baek, K.; Kim, J. Stochastic optimization-based hosting capacity estimation with volatile net load deviation in distribution grids. Appl. Energy 2023, 341, 121075.
19. Atmaja, W.Y.; Sarjiya; Putranto, L.M. Development of PV hosting-capacity prediction method based on Markov Chain for high PV penetration with utility-scale battery storage on low-voltage grid. Int. J. Sustain. Energy 2023, 42, 1297–1316.
20. Moro, V.C.; Trindade, F.C.; Costa, F.B.; Bonatto, B.D. Distributed generation hosting capacity analysis: An approach using interval-affine arithmetic and power flow sensitivities. Electric Power Syst. Res. 2024, 226, 109946.
21. Li, J.; Ge, S.; Liu, H.; Hou, T.; Wang, P.; Xing, P. An improved IGDT approach for distributed generation hosting capacity evaluation in multi-feeders distribution system with soft open points. Int. J. Electr. Power Energy Syst. 2023, 154, 109404.
22. Yao, H.; Qin, W.; Jing, X.; Zhu, Z.; Wang, K.; Han, X.; Wang, P. Possibilistic evaluation of photovoltaic hosting capacity on distribution networks under uncertain environment. Appl. Energy 2022, 324, 119681.
23. Chen, X.; Wu, W.; Zhang, B.; Lin, C. Data-driven DG capacity assessment method for active distribution networks. IEEE Trans. Power Syst. 2017, 32, 3946–3957.
24. Nasiri, N.; Zeynali, S.; Ravadanegh, S.N.; Kubler, S. Moment-based distributionally robust peer-to-peer transactive energy trading framework between networked microgrids, smart parking lots and electricity distribution network. IEEE Trans. Smart Grid. 2024, 15, 1965–1977.
25. Yang, L.; Li, Z.; Xu, Y.; Zhou, J.; Sun, H. Frequency constrained scheduling under multiple uncertainties via data-driven distributionally robust chance-constrained approach. IEEE Trans. Sustain. Energy 2023, 14, 763–776.
26. Chen, Y.; Guo, Q.; Sun, H.; Li, Z.; Wu, W.; Li, Z. A distributionally robust optimization model for unit commitment based on Kullback–Leibler divergence. IEEE Trans. Power Syst. 2018, 33, 5147–5160.
27. Zhou, A.; Yang, M.; Wang, M.; Zhang, Y. A linear programming approximation of distributionally robust chance-constrained dispatch with Wasserstein distance. IEEE Trans. Power Syst. 2020, 35, 3366–3377.
28. Zhou, Y.; Wei, Z.; Shahidehpour, M.; Chen, S. Distributionally robust resilient operation of integrated energy systems using moment and Wasserstein metric for contingencies. IEEE Trans. Power Syst. 2021, 36, 3574–3584.
29. Yao, L.; Wang, X.; Li, Y.; Duan, C.; Wu, X. Distributionally robust chance-constrained AC-OPF for integrating wind energy through multi-terminal VSC-HVDC. IEEE Trans. Sustain. Energy 2020, 11, 1414–1426.
30. Zhang, S.; Ge, S.; Liu, H.; Li, J.; Wang, C. Model and observation of the feasible region for PV integration capacity considering Wasserstein-distance-based distributionally robust chance constraints. Appl. Energy 2023, 347, 121312.
31. Zhong, J.; Li, Y.; Wu, Y.; Cao, Y.; Li, Z.; Peng, Y.; Qiao, X.; Xu, Y.; Yu, Q.; Yang, X.; et al. Optimal operation of energy hub: An integrated model combined distributionally robust optimization method with stackelberg game. IEEE Trans. Sustain. Energy 2023, 14, 1835–1848.
32. Zhong, J.; Zhao, Y.; Li, Y.; Yan, M.; Peng, Y.; Cai, Y.; Cao, Y. Synergistic Operation Framework for the Energy Hub Merging Stochastic Distributionally Robust Chance-Constrained Optimization and Stackelberg Game. IEEE Trans. Smart Grid 2024, 16, 1037–1050.
33. Li, J.; Ge, S.; Liu, H.; Zhang, S.; Wang, C.; Wang, P. Distribution locational pricing mechanisms for flexible interconnected distribution system with variable renewable energy generation. Appl. Energy 2023, 335, 120476.
34. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall: London, UK; CRC: Boca Raton, FL, USA, 2005.
35. Jiang, R.; Guan, Y. Data-driven chance-constrained stochastic program. Math. Prog. 2016, 158, 291–327.
36. Zhang, D.; Fu, Z.; Zhang, L. An improved TS algorithm for loss-minimum reconfiguration in large-scale distribution systems. Electr. Power Syst. Res. 2007, 77, 685–694.

Figure 1. Historical data on PV forecast errors.

Figure 2. Daily cycle PV output and forecast error curves.

Figure 3. The mind map of the PV capacity assessment model.

Figure 4. PV hosting capacity under different confidence levels and historical sample sizes.

Figure 5. Adjusted confidence levels for different confidence levels and KL divergence tolerance values.

Figure 6. PV hosting capacity under different confidence levels for two methods.

Figure 7. (a) The PV hosting capacity under different confidence levels in Scenarios 1 and 2; (b) the PV hosting capacity under different sample sizes in Scenarios 1 and 2.

Figure 8. PV hosting capacity at each node.

Figure 9. The topology of the 118-node system.

Figure 10. The combinations of PV installation capacities for different numbers of PV nodes.

Figure 11. (a) Voltage at Node 67 over time. (b) Voltage of all nodes at the 12th hour.

Figure 12. (a) Active power of Branch 67 over time. (b) Active power of all branches at the 12th hour.

Figure 13. (a) Reactive power of Branch 67 over time. (b) Reactive power of all branches at the 12th hour.

Table 1. The CvM statistics, AD statistics, and RMSE for each method.

Model Number	Time Segment	KDE	Copula Theory	$W^{2}$	$A^{2}$	$R_{M S E}$
1	√	√	√	0.0018	0.0035	0.2146
2	√	√	×	0.0018	0.0035	0.3371
3	×	√	√	0.0013	0.0093	0.3594
4	√	×	√	0.0076	3.2516	0.3594
5	×	×	×	0.3258	3.8909	0.4090

√/× denotes that the method is/is not considered; W² represents CvM statistics; A² represents AD statistics.

Table 2. Evaluation results and computation time for RO, SO, and DRCC.

Method	PV Hosting Capacity (MW)	Run Time (s)
RO	12.44	30.87
SO	15.98	4786
DRCC	14.31	88.27

Table 3. The maximum Wasserstein distance for different confidence levels and sample sizes.

	Confidence Level
Sample Size	0	0.2	0.4	0.6	0.8	1
400	0.4839	1.4185	1.5424	1.5585	1.5637	1.5660
600	0.4553	1.3598	1.5403	1.5591	1.5644	1.5660
800	0.5065	1.3632	1.5438	1.5602	1.5642	1.5660
1000	0.4877	1.3000	1.5295	1.5581	1.5641	1.5660
1200	0.5056	1.2805	1.5287	1.5594	1.5644	1.5660

Table 4. The computation time for KL divergence and Wasserstein distance under different sample sizes.

Method \ Sample Size	400	600	800	1000	1200
KL divergence	68.17	70.72	78.55	88.27	94.76
Wasserstein distance	89.87	92.42	100.25	109.97	116.46

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
