1. Introduction
At the 2021 Chinese National People’s Congress, the government work report emphasized “carbon peaking and carbon neutrality” as a key task for the first time. China has set ambitious goals: reaching peak carbon emissions by 2030 and achieving carbon neutrality by 2060. Given the significance of loss reduction for these goals, power grid enterprises have made energy saving and emission reduction their top priorities [1].
Line loss is a comprehensive reflection of the power supply quality of power grid enterprises. However, line loss management is a systematic project with large data and strong comprehensiveness. At present, power grid enterprises have a set of established methods and implementation measures for line loss management and evaluation, but due to uneven management levels, imperfect management mechanisms, and single evaluation methods in the evaluation system, the management and implementation of line loss are still not in place. Therefore, power grid enterprises need a more effective line loss management evaluation system to guide loss reduction.
At present, comprehensive evaluation has been widely used in power systems. Yang et al. [
2] employed the Analytic Hierarchy Process (AHP) to evaluate the energy efficiency level of distribution networks based on expert experience. Goh et al. [
3] proposed the Fuzzy-AHP approach to determine the weight of load nodes in a system’s load profile and select control strategies for different load levels. Zhao et al. [
4], aiming to minimize subjective bias, integrated the subjective evaluation values obtained from the entropy method using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method to evaluate the national electricity development status comprehensively. Zhang et al. [
5] utilized the coefficient of variation method and ideal approximation sorting method to evaluate wind farms comprehensively. To evaluate power quality objectively and reasonably, Li et al. [
6] used the improved AHP and entropy weight method to introduce attribute recognition in the literature. Additionally, contingency theory has been employed to improve the Analytic Hierarchy Process (AHP) and enable comprehensive evaluations of power quality and wind power grid connection technology [
7,
8]. The main focus of these works [
2,
3,
4,
5,
6,
7,
8] was to establish a suitable evaluation index system and develop an objective weighting scheme to address the evaluation challenges. However, the aforementioned scholars’ investigations into evaluation methods in the comprehensive evaluation system mainly focus on employing either a single algorithm or a combination of two algorithms for calculation. This narrow approach may result in biased evaluation outcomes, making it challenging to effectively integrate subjectivity and objectivity in the evaluation process.
The theory of comprehensive evaluation has been applied in line loss management. For instance, in [
9], a combination of subjective and objective assignments using the AHP method and sensitivity weighting is utilized to evaluate line loss management in rural network systems. In [
10], a principal component analysis (PCA) method is proposed to comprehensively evaluate the impact of distributed generation on voltage and line loss. Considering the limitations of a single evaluation method, some researchers employed multiple algorithms to assess the level of line loss management. The study in [
11] employs seven different evaluation algorithms to assess municipal power companies’ line loss management systems, providing a comprehensive evaluation of the level of line loss management. Similarly, in [
12], which focuses on line loss management at the county level, multiple indicators are used to measure and track the progress of line loss management, which allows for a more comprehensive assessment of the effectiveness of line loss management efforts. Researchers use multiple evaluation algorithms to capture different aspects and dimensions of line loss management, providing a well-rounded evaluation considering various factors and perspectives. Nonetheless, scholars predominantly rely on the traditional model-driven research paradigm when evaluating line loss management, emphasizing a “cause and effect” perspective. However, in the current digital era, traditional management practices have transitioned to digital management, with decision-making increasingly based on data analysis. Numerous decision scenarios require a comprehensive understanding of causality and correlation. Hence, considering the application of data-driven methods in evaluating comprehensive line loss management is imperative.
Artificial intelligence has witnessed rapid development in the electrical industry in recent years. This technology has the potential to revolutionize grid operations by enabling intelligent evaluations and management based on integrated big data [
13]. The advancement of artificial intelligence has led to the emergence of data-driven methods, particularly machine learning, which offer new possibilities for constructing comprehensive evaluation systems. As computing power continues to improve, intelligent algorithms have been successfully applied in evaluation systems, replacing traditional complex evaluation processes with a more reliable model-based learning approach.
For instance, a study by Yao et al. [
14] proposes an approach based on gradient-boosting decision trees for line loss rate prediction. This model leverages the power of machine learning to provide accurate and reliable forecasts in line loss management. Another study [15] applies an enhanced online random forest model to assess online voltage reliability, offering a dependable and precise evaluation method. These examples demonstrate how artificial intelligence techniques, specifically data-driven methods such as machine learning algorithms, have been successfully utilized in line loss management evaluation. By utilizing such advanced technologies, more robust and effective integrated assessment systems can be developed, improving the reliability and efficiency of grid operations.
Collective intelligence refers to the idea that multiple participants independently provide their evaluation opinions on a particular subject and then aggregate those opinions. The resulting evaluation is often more accurate than the opinions of any individual participant alone [
16]. This concept has garnered interest from businesses seeking to foster collaborative innovation [
17] and researchers aiming to address systemic challenges such as climate change [
18]. These studies demonstrate that groups can exhibit intelligence and, in certain cases, can be as intelligent as experts when they offer independent viewpoints on a given matter.
The collective intelligence paradigm focuses on leveraging the intelligence of groups of individuals to enhance productivity and facilitate better decision-making compared to isolated individuals [
Studies such as Yi’s [20] have provided further insights into the methods and factors that contribute to the stability of labeling behavior. That study quantified the relationship between the stability of the label distribution on a single resource and the number of annotators, finding that the label distribution typically stabilizes once there are around 300 to 400 annotators. Another study conducted by Robu et al. [
21] used the Kullback–Leibler (KL) distance to measure the degree of randomization in label distribution. This measurement was utilized to establish a metric for determining the dynamics of label distribution, which characterizes the group intelligence level. Collective intelligence recognizes the value of collaboration and the diverse perspectives within a group, enabling more comprehensive and effective problem-solving processes. By harnessing collective intelligence, organizations and researchers can tap into the collective intelligence of individuals, leading to improved outcomes and decision-making.
Traditional models face challenges in examining numerous variable combinations to establish correlations within the context of big data. Moreover, these models tend to focus on data causality, potentially overlooking implicit information and yielding low explanatory power. Meanwhile, traditional comprehensive evaluation models often rely on a single algorithm, or a combination of two algorithms, to evaluate line loss management; this approach tends to be one-sided, reflecting only the perspective of the chosen algorithm(s). As a result, the evaluation results may lack scientific rigor and fail to measure the level of line loss management comprehensively. Against this background, this paper proposes a comprehensive evaluation method for lean line loss management based on a big-data-driven paradigm.
The proposed method combines data-driven and model-driven approaches to construct a big-data-driven paradigm. It involves utilizing eight evaluation methods with different focuses to create collective intelligence and establish an effective evaluation model. The model-driven paradigm measures are employed to resolve causal relationships among variables, while data-driven paradigm association mining methods are used to discover associations among variables. Subsequently, the random forest algorithm is applied to build a situation prediction model using the collective-intelligence-based results as training samples. This paper evaluates line loss management based on the data from 61 grid companies in five provinces in southern China. The study contributes to the field in the following ways:
This paper proposes a research method combining “model-driven” and “data-driven” approaches. The data-driven approach is utilized to uncover causal and correlative relationships within the data, while the model-driven method is employed to evaluate the data and generate a comprehensive dataset.
The comprehensive evaluation model developed in this study incorporates the theory of group intelligence. By integrating objective information such as indicator conflicts and information entropy as well as leveraging the collective expertise of experts in relevant fields, the evaluation results obtained provide a scientific assessment of line loss management.
The random forest algorithm is employed to evaluate line loss management in this study. The trained model enables accurate evaluation of the management score for enterprises using indicator characteristics specific to municipal power enterprises, which offers scientific guidance for implementing lean management strategies to minimize line losses.
The rest of the paper is organized as follows:
Section 2 introduces the background of big data and explains the implications of lean line loss management.
Section 3 provides a brief description and organization of the research model.
Section 4 briefly discusses the process of constructing the data-driven evaluation model and describes the validation methods for the evaluation results.
Section 5 examines the prediction model results and provides an analysis. Based on the analyses in
Section 5, conclusions are drawn in
Section 6.
2. Paradigm and Characteristics of Lean Line Loss Management in the Era of Big Data
2.1. Paradigm and Characteristics of the Era of Big Data
The new paradigm in the era of big data can be observed from the “making” perspective and the “using” perspective. In the comprehensive evaluation of line loss management, big data is considered a component of information technology (IT) encompassing data and systems.
From the “making” perspective, the primary focus lies in big data analysis, which involves organizing line loss data and conducting line loss index analysis, among other tasks. Additionally, attention is given to constructing big data systems for line loss evaluation and integrating comprehensive evaluation methods.
On the other hand, from the “using” perspective, key areas of concern include the behavior of utilizing big data. This encompasses establishing line loss management evaluation indices and constructing models for evaluating their effectiveness. Furthermore, big data plays a crucial role in enabling innovation in line loss management, such as predicting line loss situations and implementing closed-loop management strategies.
In traditional models, as the combination of variables increases and the original model’s explanatory power diminishes, the introduction of new variables becomes necessary. These variables are often latent, unpredictable, or currently unavailable. This is where data-driven methods and techniques come into play. Data awareness, technological capabilities, and their integration become the core competencies in research and application within the realm of big data. They form the fundamental elements of the big-data-driven paradigm.
Another significant implication of the big-data-driven paradigm is big-data enablement, which refers to the value creation driven by big-data capabilities. Big-data enablement focuses on developing big-data capabilities to uncover new models and opportunities. This, in turn, drives service and model innovation, ultimately leading to the creation of enterprise value.
The big-data-driven paradigm framework can be examined through the lens of the aforementioned perspectives: the technology methods and the enabling innovation (as shown in
Figure 1).
2.2. Overview and Requirements of the Comprehensive Evaluation System for Line Loss Lean Management
With the rise of the big data era and the increasing complexity of power grid operations, traditional evaluation systems can no longer meet the needs of power grid enterprises. Therefore, the concept of lean management needs to be introduced. In the context of line loss management, “lean” refers to accurately and comprehensively measuring the level of line loss management, identifying weaknesses promptly, and investing resources accurately. One aspect of “lean” management is cost reduction, which involves minimizing inputs and resource consumption. The goal is to achieve greater output benefits while conserving resources. These benefits include economic gains and social benefits, both in the short term and the long term. Ultimately, lean management supports the sustainable and rapid development of the enterprise. Power grid enterprises can enhance their line loss management practices by adopting lean management principles and leveraging big data. Through accurate measurement, identification of weaknesses, and efficient resource allocation, they can optimize their operations, reduce losses, and achieve long-term success.
The demand for big data also necessitates the development of a comprehensive evaluation system for line loss lean management. Within the field of line loss lean management evaluation, the utilization of big data requires consideration of both cause–effect relationships and correlations. Without these elements, making effective decisions to guide reduction becomes challenging. Consequently, enhancing foresight and risk insight in evaluating line loss lean management becomes imperative.
The construction of a comprehensive evaluation system for lean line loss management involves several key steps, including data standardization, establishment of the line loss management indicators, development of an effective evaluation model, creation of a situation prediction model, and closing the loop of line loss management.
The application of the lean management evaluation system holds significant importance for both industrial and service-oriented enterprises. It offers valuable assistance by eliminating inefficient labor within the production management process to achieve optimal long-term benefits. Therefore, in the era of big data, power grid enterprises should leverage artificial intelligence technology to make informed decisions and provide guidance for reducing losses, ultimately maximizing long-term advantages.
3. Artificial Intelligence Technology
3.1. Artificial Intelligence Methods
In the big-data-driven paradigm, numerous innovative technological approaches have emerged in response to the challenges posed by traditional models. Additionally, the development of new digital infrastructures has facilitated the implementation of artificial intelligence methods in power systems [
22].
Figure 2 provides a schematic illustration of the analytical process of the AI approach.
Artificial intelligence is applied to evaluate lean management of power grid line loss, based on
Figure 2. First, actual sample data are collected and processed through dimensionality reduction, clustering, and missing-value imputation. Then, the evaluation model is constructed, its parameters are tuned, and results are derived using machine learning or deep learning methods. This study selects the random forest algorithm as the machine learning method because it can train on the dataset efficiently and produce relatively precise outcomes. Random forest can analyze large volumes of line loss management index data swiftly and accurately, delivering timely, scientifically grounded results, which is especially valuable for time-sensitive enterprises such as power grid companies.
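As a concrete illustration of this preprocessing stage, the sketch below fills missing indicator values and reduces dimensionality with scikit-learn; the placeholder matrix, imputation strategy, and component count are illustrative assumptions rather than the paper’s exact pipeline.

```python
# Illustrative preprocessing sketch (not the paper's exact pipeline): fill in
# missing indicator values and reduce dimensionality before further analysis.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA

raw = np.array([[92.0, 81.5, np.nan, 75.0],
                [88.0, 79.0, 70.5, np.nan],
                [95.5, np.nan, 68.0, 80.2]])      # placeholder indicator matrix

filled = SimpleImputer(strategy="mean").fit_transform(raw)   # missing-value supplementation
reduced = PCA(n_components=2).fit_transform(filled)          # dimensionality reduction
```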
3.2. Data Processing
Data processing is required once the study population is established and the sample set is generated. This processing step aims to prevent individual data points with significantly deviating values from influencing the results. Clustering is performed to uncover and explore potential differences and associations within the data. In this paper, the AP (Affinity Propagation) clustering algorithm is selected to cluster the samples and eliminate values that deviate excessively from other data points.
The AP clustering algorithm is a technique that considers the information exchange between observations. Unlike other clustering algorithms, it does not require the pre-determination of cluster centers. Instead, it considers all sample points as potential centers and automatically determines the location and number of cluster centers through iterative calculations. The algorithm searches for suitable clustering centers in each iteration, continuously refining the clustering assignment until convergence is achieved.
The responsibility $r(i,k)$ is defined as the measure of suitability for point $k$ to serve as a clustering center for point $i$, indicating the extent to which point $i$ considers point $k$ appropriate as its clustering center. The availability $a(i,k)$ reflects how appropriate it is for point $i$ to choose point $k$ as its clustering center, taking into account the support that $k$ receives from other points. To determine the appropriate clustering centers, the AP algorithm continuously gathers evidence in the form of $r(i,k)$ and $a(i,k)$ from the data sample. With $s(i,k)$ denoting the similarity between points $i$ and $k$, the iterative formulas for $r(i,k)$ and $a(i,k)$ are as follows:

$$r(i,k) = s(i,k) - \max_{k' \neq k} \left\{ a(i,k') + s(i,k') \right\} \quad (1)$$

$$a(i,k) = \begin{cases} \min \left\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \right\}, & i \neq k \\[4pt] \sum_{i' \neq k} \max\{0,\, r(i',k)\}, & i = k \end{cases} \quad (2)$$

The AP clustering process iterates Equations (1) and (2) in a continuous loop to update the evidence. Through this iterative competition, AP clustering obtains the optimal clustering centers and the cluster assignment of each sample point.
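The following sketch illustrates how AP clustering could be used to screen evaluation scores with scikit-learn’s AffinityPropagation; the placeholder scores, the damping value, and the small-cluster rule for discarding deviating points are assumptions for demonstration, not the paper’s settings.

```python
# Illustrative sketch only: screen the evaluation scores with Affinity
# Propagation and drop points that fall into very small clusters.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(85, 2, 40),
                         rng.normal(70, 2, 20),
                         [35.0]]).reshape(-1, 1)   # placeholder scores with one outlier

ap = AffinityPropagation(damping=0.9, max_iter=500, random_state=0).fit(scores)
labels = ap.labels_
cluster_sizes = np.bincount(labels)

keep = cluster_sizes[labels] >= 3        # hypothetical rule: tiny clusters are outliers
cleaned_scores = scores[keep]
```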
3.3. Construction of Data-Driven Model
Random forest offers several advantages over traditional networks, including a simple model structure, high accuracy, fast training speed, and robustness against overfitting and noise interference. Hence, random forest is applied as the data-driven method to construct an empirical model for evaluating line loss lean management.
Random forest [
23] is essentially a decision tree algorithm: a well-established and classical data mining method that automatically generates decision rules. It utilizes a partitioning approach to address classification or regression problems. During training, decision trees evaluate the information gained from feature partitioning at each stage, starting from the top and progressing downwards, in order to select the optimal features from the partitioned dataset. Subsequently, the resulting subproblems are recursively processed. The data instances are assigned to different branches within the tree structure, and this process iterates until the recursive stopping condition is met.
Each path within a decision tree model, starting from the root node and extending to the leaf nodes, represents a classification rule. Consequently, a decision tree can be viewed as a collection of classification rules that enable the model to make predictions on unlabeled data.
In the case of classification trees, the Gini impurity is a commonly used metric to evaluate the effectiveness of branching in the tree model. This metric measures the “impurity” of the system by assessing the likelihood that a randomly selected sub-item from the dataset will be incorrectly assigned to another category. The Gini impurity is defined by Equation (3):

$$\mathrm{Gini} = 1 - \sum_{i=1}^{C} p_i^{2} \quad (3)$$

Here, $p_i$ denotes the proportion of samples at the current node belonging to class $i$, and $C$ is the number of classes. The Gini impurity lies within $[0,1]$, where values closer to zero indicate a more favorable classification outcome.
In the case of regression trees, the splitting point is determined using a variance metric: the variance of each candidate leaf node is calculated, and all the variances are summed with weights; the split with the lowest weighted variance is ultimately chosen. The variance of a node is calculated by Equation (4):

$$V = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2} \quad (4)$$

Here, $\bar{x}$ represents the average value of the input variable data at the regression tree node, $V$ denotes the variance used in the weighted sum for that input variable, each $x_i$ corresponds to an individual data point, and $n$ is the total number of data points.
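A minimal sketch of the two split-quality measures described above is given below; the size-based weighting of the leaf variances is an assumption, since the paper does not spell out its weighting scheme.

```python
# Minimal sketch of the two split-quality measures used by CART-style trees.
import numpy as np

def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum_i p_i^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_split_variance(groups):
    """Size-weighted variance over the leaves produced by a candidate split (assumed weighting)."""
    n_total = sum(len(g) for g in groups)
    return sum(len(g) / n_total * np.var(g) for g in groups)

# Example: a classification node and a candidate regression split into two leaves
print(gini_impurity(np.array([0, 0, 1, 1, 1])))                      # 0.48
left, right = np.array([71.0, 73.5, 70.2]), np.array([88.1, 90.4, 86.7])
print(weighted_split_variance([left, right]))
```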
4. Construction of Evaluation Algorithms Based on the Big-Data-Driven Paradigm
4.1. The Data-Driven Model Based on Collective Intelligence Theory
Numerous recent studies have demonstrated that groups exhibit collective intelligence, resulting in more accurate decision-making compared to individuals. The practical application of collective intelligence theory should adhere to four key principles: independence, decentralization, plurality, and integration [
24]. In the evaluation of line loss lean management within power grid enterprises, primary organizations focus on utilizing model-driven methods to assess the level of line loss lean management. By employing model-driven methods, the fundamental nature of the problem can be identified, allowing for the development of new theories. These methods consider the research problem holistically and describe the characteristics of the research object using specific mechanism models or relevant rules. In alignment with the objectives of this paper, the collective intelligence of this study should also adhere to the following four principles:
Independence: Each model-driven method should express its own viewpoint independently regarding the perceived level of line loss management within the grid enterprise. The viewpoints of other model-driven methods should not influence its assessment of the perceived level of line loss management.
Decentralization: Each model-driven approach should be able to focus and apply its knowledge when providing an opinion on the perceived lean level of line loss management.
Plurality: Each model-driven approach possesses unique knowledge regarding the integrated level of line loss management perception, resulting in diverse information among different approaches.
Integration: A unified decision integration mechanism should be employed to aggregate the evaluation opinions of various model-driven methods regarding the comprehensive level of line loss lean management, thereby deriving a collective opinion.
This study aims to satisfy the above criteria through its selection of model-driven methods and decision-integration mechanisms. The evaluation algorithm is then constructed based on the big-data-driven paradigm. The following eight evaluation methods are chosen: the analytic hierarchy process (AHP) [25], the entropy weight method [26], the TOPSIS method [27], the weighted rank-sum ratio [28], the coefficient of variation method [29], the CRITIC weight method [30], the cosine value method [31], and the gray correlation analysis method [32]. These methods are used to construct the effect evaluation model, and collective intelligence is integrated using the random forest approach. Based on collective intelligence theory, the data-driven model is illustrated in
Figure 3.
As shown in
Figure 3, in this study, the relevant raw data within the evaluation index system are collected, and the original line loss index data are quantified based on specific scoring requirements, resulting in the extraction of new feature index data. Subsequently, the evaluation process employs the model-driven approach for scoring, wherein the feature indicator data and model-driven evaluation scores are utilized as a training set for the AI model. The trained AI model is then employed to make predictions and determine the final evaluation results. The integrated model does not simply average the results of individual methods but instead combines the strengths of each evaluation method. By preserving the unique information contributed by each method and incorporating the collective intelligence of the group, the final result becomes objective and realistic.
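To make the training-set construction concrete, the hypothetical sketch below pairs the quantified indicator features with a consensus label built from the eight model-driven score vectors; the simple mean used as the consensus stands in for the group reference score described in Section 5.3, and all variable names are assumptions.

```python
# Hypothetical sketch: assemble the collective-intelligence training set.
# The simple-mean consensus label and the names are illustrative assumptions.
import numpy as np

def build_training_set(features, method_scores):
    """features: (n_enterprises, n_indicators) indicator scores on a 0-100 scale.
    method_scores: (n_enterprises, 8) scores, one column per evaluation method."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(method_scores, dtype=float).mean(axis=1)  # consensus (group) score
    return X, y
```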
4.2. Flow of Random Forest
In the context of lean management decisions for line loss reduction, a well-integrated sample set is created through clustering to ensure a high degree of consensus among different groups. This sample set is then used as the training data for the random forest algorithm.
Random forest is an ensemble learning algorithm that uses decision trees as its base learners, which are combined via the bagging method. Each decision tree is trained on a bootstrap sample of the training data, drawn randomly with replacement. This process is repeated for multiple decision trees, and their predictions are aggregated to obtain the final prediction.
Suppose the original dataset $D$ consists of $N$ samples, each with $M$ input features and a label. The random forest algorithm combines independently trained decision trees to form a forest, and the construction of each tree can be viewed as a partitioning of the data space. Specifically, before constructing a decision tree, the bootstrap method is applied to draw $K$ training datasets from the original dataset $D$. Each decision tree is built using the classification and regression tree (CART) method. At each tree node, $m$ features are randomly selected from the $M$ input features as the candidate split feature set, and the optimal split feature and cut-point are determined by minimizing the mean square error (MSE) criterion. This partitioning process is repeated until a stopping condition is met.
After training $K$ decision trees on the bootstrap sample sets, they are combined into a random forest model, denoted as $\{h_k(x)\}_{k=1}^{K}$. When a test sample $x$ is input into the model, the prediction of each tree is obtained.
The random forest algorithm aggregates the results from each decision tree to make the final prediction. In this study, a simple averaging method is employed to obtain the final regression result $\hat{y} = \frac{1}{K}\sum_{k=1}^{K} h_k(x)$, where the predicted values from all decision trees are averaged. The final output represents the grid enterprise’s line loss lean management score.
The flowchart of the random forest algorithm is shown in
Figure 4.
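A minimal sketch of this training-and-prediction step with scikit-learn’s RandomForestRegressor is shown below; the placeholder data shapes and the hyperparameters (number of trees $K$, candidate features $m$ per split) are illustrative assumptions, not the paper’s settings.

```python
# Minimal sketch of the random forest step; data and hyperparameters are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(60, 100, size=(61, 23))   # placeholder: 61 enterprises x 23 indicators
y = rng.uniform(70, 95, size=61)          # placeholder: consensus scores

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestRegressor(
    n_estimators=200,       # K bootstrap-trained CART trees
    max_features="sqrt",    # m candidate features considered at each split
    random_state=42,
)
rf.fit(X_train, y_train)
predicted_scores = rf.predict(X_test)     # average of the K tree predictions
```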
4.3. Evaluation Criteria for Data-Driven Models
The data-driven model is evaluated and verified by comparing predicted values against observed values to show the accuracy of the model. The observed value refers to the line loss index evaluation of the municipal network enterprise, and the predicted value is the output of the random forest model. The criteria include the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), and the accuracy [33]. The calculation formulas are as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \quad (5)$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \quad (6)$$

Here, $n$ represents the total number of predicted elements, while $\hat{y}_i$ and $y_i$ refer to the predicted value and the actual evaluation value of the $i$th sampling point, respectively. In line loss lean management evaluation training, lower values of MAE and MAPE indicate better prediction performance, whereas a higher accuracy value signifies greater precision.
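The sketch below computes these criteria; MAE and MAPE follow their standard definitions, while the accuracy is assumed here to be the complement of the mean relative error, since the paper does not state its exact accuracy formula.

```python
# Illustrative sketch of the evaluation criteria. MAE and MAPE are standard;
# "accuracy" is ASSUMED to be 1 - MAPE, which may differ from the paper's definition.
import numpy as np

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_pred - y_true))

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_pred - y_true) / y_true))

def accuracy(y_true, y_pred):
    return 1.0 - mape(y_true, y_pred)      # assumed definition of accuracy
```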
4.4. Test Methods of Comprehensive Evaluation
4.4.1. Preliminary Verification of Comprehensive Evaluation
When employing various comprehensive evaluation methods, the correlation degree of the evaluation results is commonly measured using the Kendall concordance coefficient. The formula for calculating the correlation degree $G$ is as follows:

$$G = \frac{\sum_{j=1}^{n}\left(\sum_{i=1}^{m} R_{ij}\right)^{2} - \frac{1}{n}\left(\sum_{j=1}^{n}\sum_{i=1}^{m} R_{ij}\right)^{2}}{\frac{1}{12}\, m^{2}\left(n^{3}-n\right) - m \sum_{i=1}^{m} T_i} \quad (8)$$

Here, $m$ indicates the number of evaluation methods, $n$ is the number of evaluated objects, $R_{ij}$ is the order value (rank) assigned to the $j$th object in the evaluation conclusion of the $i$th evaluation method of line loss management, $T_i = \frac{1}{12}\sum_{j}\left(t_{ij}^{3}-t_{ij}\right)$ is the correction for repeated order values in the evaluation conclusion of the $i$th evaluation method, and $t_{ij}$ is the number of objects sharing the $j$th repeated order value under the $i$th method.
Subsequently, the significance check of $G$ should be conducted using the following statistic:

$$\chi^{2} = m\,(n-1)\,G \quad (9)$$

The requirement is that this statistic approximately follows a $\chi^{2}$ distribution with $n-1$ degrees of freedom, thus satisfying:

$$\chi^{2} > \chi_{\alpha}^{2}(n-1) \quad (10)$$

Meeting the condition in Equation (10) implies significant consistency among the multiple evaluation methods: the combined conclusion not only reflects the information within the initial evaluation conclusions but also captures the characteristics of the multiple evaluation methods.
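The concordance check of Equations (8) to (10) can be computed as in the sketch below; the tie handling (average ranks) and the chi-square threshold follow the standard tie-corrected Kendall coefficient and are assumptions about the paper’s exact implementation. For the data reported in Section 5.5 (m = 8, n = 61, G = 0.946), the statistic m(n − 1)G is approximately 454, consistent with the reported 453.841 once G is rounded.

```python
# Sketch of the concordance check; average ranks for ties are an assumed convention.
import numpy as np
from scipy.stats import rankdata, chi2

def kendall_w(score_matrix):
    """Tie-corrected Kendall concordance coefficient.
    score_matrix: (m methods, n objects) array of evaluation scores."""
    m, n = score_matrix.shape
    ranks = np.vstack([rankdata(row) for row in score_matrix])  # ties get average ranks
    rank_sums = ranks.sum(axis=0)
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    tie_term = 0.0
    for row in ranks:
        _, counts = np.unique(row, return_counts=True)
        tie_term += np.sum(counts ** 3 - counts)
    return 12.0 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)

def concordance_test(score_matrix, alpha=0.05):
    """Chi-square significance check of the concordance coefficient."""
    m, n = score_matrix.shape
    g = kendall_w(score_matrix)
    stat = m * (n - 1) * g                     # approx. chi-square with n-1 d.o.f.
    return g, stat, stat > chi2.ppf(1 - alpha, n - 1)
```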
4.4.2. Post Verification of Comprehensive Evaluation
When multiple combination evaluation methods are employed, the most reasonable method can be selected by assessing the consistency among the conclusions drawn from these methods. To test this consistency, the Spearman rank correlation coefficient method is utilized, calculated as follows:

$$\rho_{kj} = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\left(n^{2}-1\right)} \quad (11)$$

In this equation, $d_i$ represents the rank difference between the two evaluation methods for the $i$th evaluation object, and $\rho_{kj}$ represents the Spearman rank correlation coefficient between the $k$th combination method and the $j$th comprehensive evaluation method. A higher value of $\rho_{kj}$ indicates a stronger correlation between the ranking results of the two methods. The variable $n$ represents the number of schemes, i.e., the municipal network enterprises participating in the evaluation in this paper.
When the number of schemes $n$ is less than 10, the test statistic can be calculated using the equation:

$$\bar{\rho}_k = \frac{1}{m} \sum_{j=1}^{m} \rho_{kj} \quad (12)$$

Here, $\bar{\rho}_k$ represents the average correlation degree between the $k$th combination evaluation method and the original $m$ comprehensive evaluation methods.
However, when the number of schemes $n$ exceeds 10, as in this article where $n$ is 61, the test statistic is determined using the following formula:

$$t = \bar{\rho}_k \sqrt{\frac{n-2}{1-\bar{\rho}_k^{2}}} \quad (13)$$

which approximately follows a $t$ distribution with $n-2$ degrees of freedom. In this case, the value of $\bar{\rho}_k$ is calculated using the same method as in Equation (12). This test statistic provides a measure of the overall consistency between the ranking results of the $k$th combination method and the multiple comprehensive evaluation methods.
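The sketch below computes the Spearman coefficients of Equation (11), their average per Equation (12), and the t-statistic of Equation (13) for one candidate method; the data layout is an assumption.

```python
# Sketch of the post-verification step; array shapes are assumptions.
import numpy as np
from scipy.stats import spearmanr

def post_verification(ai_scores, method_scores):
    """ai_scores: (n,) scores from one AI model;
    method_scores: (n, m) scores from the m comprehensive evaluation methods."""
    ai_scores = np.asarray(ai_scores, dtype=float)
    method_scores = np.asarray(method_scores, dtype=float)
    n, m = method_scores.shape
    rhos = np.array([spearmanr(ai_scores, method_scores[:, j])[0] for j in range(m)])
    rho_bar = rhos.mean()                                        # Equation (12)
    t_stat = rho_bar * np.sqrt((n - 2) / (1 - rho_bar ** 2))     # Equation (13)
    return rho_bar, t_stat
```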
After connecting all the aforementioned processes, the evaluation process of lean line loss management driven by big data is illustrated in
Figure 5. First, the evaluation system is established based on a data-driven approach to determine the indicators it requires; line-loss-related data and information are then collected in a targeted manner by analyzing the characteristics of the relevant power grid structure together with the strategic requirements of contemporary power grid enterprise development. Second, a comprehensive evaluation and analysis is conducted. This study adopts a joint evaluation model combining the “model-driven” and “data-driven” approaches for the lean evaluation of grid line loss management, a choice based on the four principles of collective intelligence theory and on the current context of the big-data era. An eight-method collective intelligence model is selected to evaluate line loss management under the model-driven approach. The evaluation results are then used to train the random forest model and are compared with the LightGBM algorithm and other artificial intelligence algorithms to validate the rationality and effectiveness of the outcomes.
5. Results and Discussion
5.1. Data Introduction and Experimental Platform
In this paper, the actual line loss index data of 61 local and municipal power grids in a southern network area for the years 2015 and 2016 were utilized as the simulation dataset. The line loss index data of 61 municipal network enterprises were evaluated using several methods, including analytical hierarchy process (AHP), entropy weight method, TOPSIS method, weighted rank-sum ratio, coefficient of variation method, CRITIC weight method, cosine value method, and gray correlation analysis method. All evaluation results were then scored and quantified on a scale of 0–100.
The training and testing set for the random forest model consisted of the data from 2015, while the data from 2016 were used as the validation set. These 61 power grids are situated in five southern China provinces: Guangdong, Guangxi, Yunnan, Guizhou, and Hainan. The geographical locations of these power grids are depicted as the yellow area in
Figure 6.
The hardware platform used in this paper consists of an Intel Core i5-7500 CPU and Intel(R) HD Graphics 630 GPU. The software code was developed using Python, and the random forest algorithm was implemented by invoking the random forest machine learning package.
5.2. Construction of Evaluation System and Data Processing
To make the evaluation system reflect the actual situation of line loss management, this paper refers to the relevant standards and schemes for line loss management in power grid enterprises and constructs the evaluation system from four dimensions covering the whole process of line loss management, namely “planning and reducing”, “management loss”, “running loss”, and “technical loss”. These dimensions cover grid structure, marketing management, power grid operation, and equipment configuration. The system fully considers the actual operation and management level of municipal power grids, spans multiple voltage levels, and involves various daily business management scopes. The specific evaluation index system is shown in
Table 1.
The original line loss management data collected were scored based on the indicators listed in
Table 1. All the original line loss management data were then transformed into scores ranging from 0 to 100. Subsequently, the collective intelligence theory was employed to evaluate each city and obtain evaluation results. In order to eliminate values that deviated significantly from other evaluation scores, the AP clustering algorithm was selected for clustering the evaluation scores.
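As an illustration of this quantification step, the sketch below linearly rescales a raw indicator onto the 0 to 100 range; the actual scoring follows indicator-specific rules tied to Table 1, so the linear mapping, bounds, and sample values here are assumptions.

```python
# Hypothetical scoring sketch: linear rescaling of one raw indicator to 0-100.
import numpy as np

def to_score(values, worst, best):
    """Linearly map raw indicator values onto 0-100 (best -> 100, worst -> 0)."""
    values = np.asarray(values, dtype=float)
    return np.clip((values - worst) / (best - worst) * 100.0, 0.0, 100.0)

# Hypothetical cost-type indicator: a lower line loss rate earns a higher score,
# so "best" is the smaller value.
line_loss_rate = np.array([6.8, 4.9, 5.7])                 # raw values in %
scores = to_score(line_loss_rate, worst=8.0, best=3.0)     # e.g. 4.9% -> 62 points
```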
5.3. Evaluation Accuracy Results
The random forest model is employed to train the evaluation results in the proposed comprehensive evaluation method for collective-intelligence-based line loss lean management. Following simulation analysis, the values of the test samples are compared with the comprehensive evaluation value, depicted in
Figure 7, which represents the average value determined by the model-driven algorithm previously discussed.
The close resemblance between the predicted values in the test samples and the true values is demonstrated in
Figure 7. The predicted values are the evaluation scores of municipal power grids obtained from the random forest algorithm, while the true values are the average evaluation scores derived from collective intelligence. The vertical axis shows the predicted and true scores divided by one hundred, and the horizontal axis indexes the power grid enterprises. With an accuracy of 91.07%, the predicted values demonstrate the reliability of the evaluation results derived from the random forest algorithm.
As shown in
Table 2, the test set yields an
MAE of 5.40% and
MAPE of 0.070%, which signifies the high accuracy of the random forest model. Through the random forest algorithm, the trained model effectively combines the assessment perspectives of the diverse model-driven methods on line loss lean management into a collective evaluation.
5.4. Comparative Analysis of Evaluation Methods
The random forest prediction model was validated on the dataset from 2016. The results obtained from all the aforementioned evaluation methods were compared with those presented in this paper. The rankings, which assess the level of lean comprehensive line loss management of grid enterprises using nine different methods, can be found in
Appendix A.
To demonstrate the rationality and validity of the effect evaluation model constructed using collective intelligence in this paper, the ranking of this proposed method was compared and analyzed alongside the other eight methods. Furthermore, to compare the management gaps among different enterprises, MM and SW, as well as LPS and YL, were selected as examples to identify the weak points in line loss management across these companies. The comparison results are illustrated in
Figure 8 and
Figure 9, where the horizontal axis represents the name of each line loss indicator for the respective enterprise. The indicators and their meanings are listed in
Table 1 (e.g., GH1 represents the standard rate of power supply radius of the power grid). The vertical axis shows the scores of each indicator for different firms.
Figure 8 illustrates that MM scores low in the planning dimension on the standard rate of power supply radius and the capacity-load ratio of the main network, and in the technical dimension on the station electricity consumption rate and the energy-saving transformer ratio. On the other hand, in the management dimension, the proportion of old and low electric energy meters, the power consumption ratio of automatic meters, the calculation and measurement fault error rate, and the resolution rate of abnormal line loss all received higher scores. AHP assigns too much weight to the management dimension based on subjective expert experience, resulting in a higher final ranking for MM and overlooking the shortcomings in the planning and technical dimensions. Therefore, introducing an evaluation method that also considers the objective information content of the indicators, as proposed in this paper, is more reasonable.
In comparison with the entropy weight method, the ranking differences for SW and WZ were nine and seven places, respectively, while the ranking differences for the other power grid enterprises were no more than six. The entropy weight method considers only the objective weights of indicators and focuses on their variability. In contrast, the method proposed in this paper considers both the objective weights and the comparative strengths and conflicts between indicators. It combines the advantages of objective weighting methods with those of AHP, which takes into account the subjective intentions of decision-makers and the collective intelligence of experts, allowing for a more comprehensive evaluation that incorporates expert knowledge and experience. The rankings of this paper’s method were also compared with those of the TOPSIS method, weighted rank-sum ratio, coefficient of variation method, and CRITIC weight method. Since these are all objective weighting methods, the comparisons are similar to that of the entropy weight method and are not repeated here.
Finally, since both the cosine value method and gray correlation method are methods for measuring the differences in the degree of association between factors, the ranking of this paper’s method is compared with that of the gray correlation method, using the gray correlation method as a representative. As can be seen from
Appendix A, there are also some differences between the two methods, where ST, JY, YX, CX, DH, YL, and LPS, respectively, are ranked 11, 15, 10, 13, 12, 16, and 19 places apart in the two methods. For the city LPS with the largest difference, see
Figure 9; this grid company’s planning and technical dimension scores are close to 90. The gray correlation analysis method takes the best column of the available data as the reference series and calculates the similarity between LPS and that reference series, so LPS ranks high under the gray correlation method. However, the weights of the management and operation dimensions in the evaluation index system are slightly larger than those of the planning and technical dimensions, and LPS does not perform well enough on some heavily weighted indicators, such as the calculation and measurement fault error rate, the rate of abnormal line loss, the comprehensive voltage rate, and the qualified rate of bus power imbalance. The degree of similarity to the reference series alone therefore cannot accurately measure this enterprise’s comprehensive level of lean line loss management. Conversely, YL is ranked 16 places higher by this paper’s method than by the gray correlation method: although YL has a slightly lower score of 76.93 in the technical dimension, it scores 85.24 in the management dimension and performs well on several key indicators. The method in this paper fully adopts the advantages of both AHP and the objective weighting methods, so the objective information among indicators is measured fairly while the expert group’s experience and opinions are also used in the evaluation.
In conclusion, the method presented in this paper provides an objective assessment of power grid enterprises’ line loss lean management by effectively evaluating various aspects to reflect the gap between these enterprises and the optimal benchmark. As a result, it offers targeted guidance for power grid enterprises seeking to reduce line losses.
5.5. Comparative Analysis of AI Models
The effectiveness of the proposed method in practical application is further illustrated through a comparison with three artificial intelligence models, namely LightGBM, deep forest, and broad learning. The analysis focuses on the 2016 line loss management index data by conducting a correlation test for each evaluation method. Employing Equations (8) and (9), the Kendall concordance coefficient for the eight comprehensive evaluation methods is obtained, with $G = 0.946$ and $\chi^{2} = 453.841$. Since Equation (10) is fulfilled, the values of $G$ and $\chi^{2}$ indicate strong correlation and compatibility among the eight comprehensive evaluation methods employed, highlighting their ability to effectively reflect line loss management information.
Subsequently, the evaluation scores of the eight evaluation methods are clustered using the same method as described earlier. In total, 455 new sample datasets for the year 2016 and 52 sample datasets for the year 2017 are acquired. All 446 line loss management sample datasets from the year 2015 are utilized as the training set for the random forest, Bayesian-optimized LightGBM, deep forest, and broad learning models. The four AI models are then evaluated using all 507 datasets from 2016 and 2017 as the test set.
The evaluation results of each grid company are obtained, as shown in
Figure 10, and the ranking results are shown in
Appendix A.
In
Figure 10, the horizontal axis shows the abbreviation of the name of each power grid enterprise. The vertical axis shows the evaluation scores of each enterprise derived from the four different AI methods. The figure illustrates that the four models yield different evaluation values for the same power grid enterprise and display similar overall trends. To compare the ranking results among these four models, this study employs the widely used consistency test method from the field of comprehensive evaluation and calculates the Spearman rank correlation coefficient for the post-verification of comprehensive evaluation. The Spearman rank correlation coefficients, representing the similarity between the ranking results of each artificial intelligence model and the original comprehensive evaluation method, are calculated using Equation (
11) and are presented in
Table 3.
The $\bar{\rho}_k$ values are recalculated from Table 3 using Equation (12), and the corresponding $t$-statistics are then derived from Equation (13). The calculation results are presented in Table 4.
The $t$-statistics for the four AI models are 14.659, 14.371, 13.625, and 10.122, respectively. These values are greater than the critical $t$ value at a significance level of 0.05, which indicates that all four AI algorithms pass the consistency test. From Table 3 and Table 4, it is evident that the random forest algorithm exhibits the highest Spearman rank correlation coefficient and $t$-statistic among the four artificial intelligence methods. This finding suggests that the data-driven approach employing the random forest algorithm yields the most reasonable, scientifically valid, and effective evaluation results. Consequently, the evaluation outcomes derived from this algorithm can be employed to assess the lean line loss management of power grid companies.
In summary, this paper’s approach incorporates the advantages of methods such as the entropy weight method, AHP, and gray correlation analysis. It evaluates objective information comprehensively by considering the indicators’ comparison strength, conflict, and information entropy, and it extracts the collective intelligence of decision-making experts. Furthermore, it effectively captures the gap between an enterprise and the optimal benchmark. Consequently, it improves the objectivity of the comprehensive evaluation results for line loss lean management in power grid enterprises and offers targeted guidance for reducing line losses.