To this end, this paper takes a data-driven approach and considers influencing factors such as geographic, meteorological, and power grid information to correct the damage probability of the transmission line-tower system.
3.1. Influence Factor Weight Calculation
To calculate the variable weights, data from the typhoons that damaged the Guangdong power grid in China in past years are selected as samples, and the weights of the variables are obtained by analyzing and processing these samples.
The feature information contained in the sample is as follows:
Geographic information: Elevation, slope direction, slope, slope position, underlying surface, roughness, etc.;
Meteorological information: Gust wind speed;
Power grid information: Design wind speed, running time.
The correction coefficient calculation framework is shown in Figure 2.
As shown in Figure 2, geographic, meteorological, and power grid information from previous typhoon disasters is selected as sample data. The weight of each variable is first obtained by analyzing the sample data, and the correction coefficient is then obtained from the predicted variable data. Three analysis methods are used to evaluate the importance of each variable. Finally, the most reasonable of the three sets of weights is used to calculate the correction coefficient.
1. Variable importance evaluation based on the Gini index.
The random forest (RF) was proposed by Breiman in 2001 and has become one of the most commonly used tools in data mining and bioinformatics [23]. RF can effectively analyze nonlinear, collinear, and interacting data and gives variable importance scores as part of the analysis.
When generating a decision tree, the RF algorithm splits each node according to a splitting criterion, of which the Gini index is one. The Gini index characterizes the importance of a node and thus the importance of the variable used at that node [24]. The importance of the variables can therefore be evaluated based on the Gini index.
Assuming that there are m variables, this paper intends to calculate the weight of the variable Xj (j = 1, 2,…, m) according to the Gini index, based on the random forest algorithm.
In this paper, the damage state of the transmission line-tower system is treated as a binary variable (damaged or not damaged), and the Gini index is then calculated as follows:
where pm is a probability estimate of the sample belonging to any class at node m.
The importance of the variable Xj at node m, denoted VIMjm, is the change in the Gini index before and after node m branches:
where GIl and GIr represent the Gini indices of the two new nodes split from node m, respectively.
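The displayed equations are not reproduced in this text. As a minimal sketch, assuming the standard two-class form GI = 2p(1 − p) and an unweighted difference VIM = GI_m − GI_l − GI_r (both assumptions that agree with the definitions above), the node-level quantities could be computed as follows. The function names are hypothetical.

```python
def gini_index(p):
    """Two-class Gini index of a node; p is the estimated share of one class."""
    return 2.0 * p * (1.0 - p)


def node_importance(p_parent, p_left, p_right):
    """Gini decrease when a node with class share p_parent splits in two.

    Assumption: unweighted difference GI_m - GI_l - GI_r. Many RF
    implementations additionally weight each child by its sample share.
    """
    return gini_index(p_parent) - gini_index(p_left) - gini_index(p_right)


# A perfectly mixed node that splits into two pure children.
print(node_importance(0.5, 0.0, 1.0))
```

A split that separates the two classes completely removes all of the parent's impurity, so the importance equals the parent's Gini index.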
If the variable Xj appears M times in the ith tree, the importance of the variable Xj in the ith tree is as follows:
The Gini importance of the variable Xj in RF is defined as follows:
where n is the number of classification trees in the RF.
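The displayed equations for the two aggregation steps are not reproduced in this text. One reconstruction consistent with the definitions above is given below. It assumes that the tree-level importance sums the node-level values over the M nodes where Xj is used and that the forest-level importance averages over the n trees. The superscript (i) marking the tree is notation introduced here.

```latex
VIM_{ij} = \sum_{m=1}^{M} VIM_{jm}^{(i)}, \qquad
VIM_{j} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}
```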
2. Variable importance assessment based on mean decrease accuracy.
Mean decrease accuracy is a commonly used feature selection method that directly measures the impact of each feature on the accuracy of the model [25]. The main idea is to shuffle the order of the values of each feature and measure how the change affects the accuracy of the model. For unimportant variables, shuffling has little effect on the accuracy of the model. For important variables, shuffling reduces the accuracy of the model.
Assuming that there are m variables, this paper intends to calculate the weight of the variable Xj (j = 1, 2,…, m) according to the mean decrease accuracy based on the random forest algorithm.
The specific steps are as follows:
Step 1: The sample data is divided into a training set and a test set, of which 80% are training samples and 20% are test samples.
Step 2: Train and adjust the RF model to obtain the accuracy rate acc.
Step 3: Calculate the effect of the variable Xj on the test accuracy. The impact score is represented by scorej. The feature data corresponding to the variable Xj are randomly shuffled n times, and the corresponding test accuracy, shuff_accij (i = 1, 2,…, n), is obtained when the feature data are shuffled for the ith time.
The effect of the variable Xj on the test accuracy is represented by scorej.
Step 4: Calculate the importance weight wj of each variable.
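Steps 3 and 4 can be sketched as follows. The displayed equations are not reproduced in this text, so the score and weight formulas in this sketch are assumptions: the score is taken as the relative drop (acc − mean shuff_acc) / acc, negative scores are clipped to zero, and the weights are the scores normalized to sum to one. The toy model stands in for the trained RF, and all names are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which model(row) matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(labels)


def mda_weights(model, rows, labels, n_shuffles=20, seed=0):
    """Mean-decrease-accuracy scores and normalized weights per variable.

    Assumptions (not from the source): score_j is the relative drop
    (acc - mean_i shuff_acc_ij) / acc, negative scores are clipped to zero,
    and w_j = score_j / sum_k score_k. Requires acc > 0.
    """
    rng = random.Random(seed)
    acc = accuracy(model, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        shuffled_accs = []
        for _ in range(n_shuffles):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between X_j and the label
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(rows, column)]
            shuffled_accs.append(accuracy(model, shuffled, labels))
        mean_acc = sum(shuffled_accs) / n_shuffles
        scores.append(max(0.0, (acc - mean_acc) / acc))
    total = sum(scores)
    weights = [s / total if total > 0 else 0.0 for s in scores]
    return scores, weights


# Toy stand-in for the trained RF: damage is predicted from column 0 only.
def toy_model(row):
    return 1 if row[0] > 30.0 else 0


test_rows = [[20.0, 5.0], [25.0, 9.0], [28.0, 2.0], [29.0, 7.0],
             [35.0, 4.0], [40.0, 8.0], [45.0, 1.0], [50.0, 6.0]]
test_labels = [0, 0, 0, 0, 1, 1, 1, 1]
mda_scores, mda_w = mda_weights(toy_model, test_rows, test_labels)
print(mda_scores, mda_w)
```

Because the toy model ignores the second column, shuffling that column leaves the accuracy unchanged and its score is zero, so the whole weight falls on the first column.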
3. Variable importance assessment based on the entropy weight method.
The entropy weight method is commonly used for multi-factor weight analysis [26] because its conclusions are more objective and its calculation process is simple. In the entropy weight method, the information entropy of each index is negatively correlated with the degree of variation in its values. The greater the variation in an index's values, the smaller its information entropy, the more information it carries, and the larger its final weight [27].
The specific steps of the entropy weight method are as follows:
Step 1: Assume there are n damaged samples, each with m variables. The evaluation matrix X is obtained according to Reference [28].
Step 2: Calculate the proportion Pij of the ith sample under the variable Xj, then calculate the entropy ej of the variable Xj.
Step 3: Calculate the entropy weight aj.
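The three steps above can be sketched as follows. The displayed equations are not reproduced in this text, so the sketch assumes the standard form of the entropy weight method, stated in the docstring. The function name and the sample matrix are hypothetical, and the construction of the evaluation matrix in Reference [28] is not modeled.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for an n-samples by m-variables matrix of positive values.

    Assumed standard form (the source equations are not shown):
      P_ij = x_ij / sum_i x_ij
      e_j  = -(1 / ln n) * sum_i P_ij * ln P_ij
      a_j  = (1 - e_j) / sum_k (1 - e_k)
    Requires n > 1 and at least one non-constant column.
    """
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [x / total for x in column]
        # Terms with P_ij = 0 contribute nothing to the entropy.
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    return [d / total_divergence for d in divergences]


# Column 0 is constant (carries no information); column 1 varies.
sample_matrix = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
entropy_w = entropy_weights(sample_matrix)
print(entropy_w)
```

A constant column has the maximum entropy of one, so its weight is essentially zero and the varying column receives nearly all of the weight.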
3.2. Optimal Weight Determination Method
Since each of the three methods for determining variable importance weights has advantages and disadvantages, this paper takes the actual situation and subjective human judgment into account and uses the fuzzy multi-criteria decision-making method [29] to evaluate the three methods and select the relatively superior one.
Suppose there are n schemes (x1, x2,…, xn) for determining weights and m variable factors. The specific steps are as follows:
Step 1: Establish the decision matrix Y based on the results of the n weight determination methods.
where Xmn represents the weight of the mth variable under the nth weight calculation method.
Step 2: Normalize the decision matrix Y based on the minimum value to obtain the relative membership degree matrix R.
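where rij = minj(xij)/xij, that is, the smallest weight that any method assigns to the ith variable divided by the weight assigned by the jth method.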
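Step 2 can be illustrated with a short sketch that applies rij = minj(xij)/xij row by row. The function name and the example weights are hypothetical, and the later steps (weight vector and decision vector) are not covered because their equations are not reproduced in this text.

```python
def relative_membership(decision_matrix):
    """Normalize each row by its minimum: r_ij = min_j(x_ij) / x_ij.

    decision_matrix[i][j] is the weight of variable i under method j.
    All entries are assumed to be strictly positive.
    """
    result = []
    for row in decision_matrix:
        row_min = min(row)
        result.append([row_min / x for x in row])
    return result


# Hypothetical weights of two variables under three methods.
Y = [[0.2, 0.4, 0.1],
     [0.3, 0.3, 0.6]]
R = relative_membership(Y)
print(R)
```

Each entry of R lies in (0, 1], and the method that gives a variable its smallest weight receives a membership degree of one for that variable.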
Step 3: Calculate the weight vector of R.
In this paper, p = m is taken, all the weights are to be determined, and α = 1.
Step 4: Calculate the decision vector D.
The scheme corresponding to the maximum value in the decision vector D is then selected as the relatively optimal scheme. According to the actual situation, the range of values for each variable is shown in Table 1.
3.3. Correction Coefficient Calculation
It is assumed that the weight of each of the m variables is wi and that the predicted value corresponding to each variable is xi (i = 1, 2,…, m).
In this paper, the correction coefficient is calculated based on the value range of the variables and the prediction data. The specific steps are as follows:
Step 1: Comprehensive scoring benchmark.
Because an increase in each variable affects damage differently, the variables are divided into two groups: gust wind speed, running time, slope, and roughness are positively correlated with damage to the transmission line-tower system, whereas design wind speed, elevation, underlying surface, slope direction, and slope position are negatively correlated with it [30]. Accordingly, in the score calculation, positively correlated variables take a positive sign and negatively correlated variables take a negative sign.
where the two benchmark values are the maximum and minimum values of the comprehensive scoring benchmark, respectively, wi is the weight of the ith variable, and the two bounds are the upper and lower bounds of the range of the ith variable, respectively.
Step 2: Comprehensive forecast score.
where W is the comprehensive forecast score and xi is the forecast value of the ith variable.
Step 3: Correction coefficient.
Based on References [7,31], this paper maps the comprehensive prediction score to the interval (0.9, 1.3) and obtains the correction coefficient k as follows:
In this paper, the correction coefficient k is used to correct the damage probability of the transmission line-tower system under a typhoon disaster and the final comprehensive damage probability prediction result is obtained. At the same time, since some of the variables that can be collected are based on a 1 km × 1 km mesh, in order to match the data of each variable and ensure the accuracy of the data, this paper performs a 1 km × 1 km mesh division on the prediction area and calculates the correction coefficients of each grid separately.
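Steps 1 to 3 can be sketched for a single grid cell as follows. The displayed equations are not reproduced in this text, so the sketch rests on assumptions: the score is a signed weighted sum of the variable values, the benchmark extremes are obtained by placing each variable at the bound that raises or lowers the score, and the mapping onto the interval is linear. The function name, weights, and ranges are hypothetical and are not taken from Table 1.

```python
def correction_coefficient(weights, signs, bounds, forecast, low=0.9, high=1.3):
    """Map a signed weighted score linearly onto [low, high].

    Assumptions (the source equations are not shown):
      W     = sum_i s_i * w_i * x_i, with s_i = +1 or -1
      W_max = score when positively correlated variables sit at their upper
              bound and negatively correlated ones at their lower bound
      W_min = the opposite combination
      k     = low + (high - low) * (W - W_min) / (W_max - W_min)
    """
    w_max = w_min = score = 0.0
    for w, s, (lower, upper), x in zip(weights, signs, bounds, forecast):
        score += s * w * x
        if s > 0:
            w_max += w * upper
            w_min += w * lower
        else:
            w_max -= w * lower
            w_min -= w * upper
    return low + (high - low) * (score - w_min) / (w_max - w_min)


# Hypothetical two-variable cell: gust wind speed (+), design wind speed (-).
cell_weights = [0.6, 0.4]
cell_signs = [+1, -1]
cell_bounds = [(20.0, 60.0), (25.0, 45.0)]
k_worst = correction_coefficient(cell_weights, cell_signs, cell_bounds, [60.0, 25.0])
k_best = correction_coefficient(cell_weights, cell_signs, cell_bounds, [20.0, 45.0])
print(k_best, k_worst)
```

Under this linear assumption, the endpoints 0.9 and 1.3 are reached only when every variable sits at a bound of its range; forecasts inside the ranges give values strictly between them. In a gridded prediction area, the function would be called once per 1 km × 1 km cell with that cell's forecast values.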