1. Introduction
With the rapid development of the new power system, a modernized grid paradigm characterized by high penetration of renewable energy, distributed generation, and digitalization, traditional power systems are undergoing significant transformation [1,2]. Compared with traditional power systems, new power systems offer greater flexibility and scalability, but they also introduce issues such as load volatility and system uncertainty [3].
In this context, fault detection and diagnosis have become key tasks for ensuring the safe and stable operation of power systems [4,5]. Among these tasks, identifying and correcting wiring errors in electricity meters has become critical for maintaining the safety and stability of power systems [6]. Three-phase three-wire electricity meters are widely used by high-load users, especially for loads in systems with an insulated neutral point. However, in new power systems, the symptoms of meter wiring errors often resemble those of complex load conditions, such as light load and excessive reactive power compensation, which can easily lead to misjudgment. For example, references [7,8] show that light load and excessive reactive power compensation may produce negative current readings that are not wiring errors but consequences of load volatility and the response characteristics of electrical equipment. As a result, distinguishing genuine wiring errors from such load-induced effects still relies mostly on manual field investigation and analysis, which is time-consuming and labor-intensive.
Reference [9] introduced the active power measurement principle of the three-phase three-wire smart electricity meter and provided both theoretical analysis and field tests. It showed that, with correct wiring, light loading of the transformer lowers the power factor on the primary side, and that a negative current can then appear in one phase of the meter. Reference [10] presented techniques for analyzing wiring errors, offering field-operable guidance for electricity inspectors dealing with miswired three-phase three-wire meters. Reference [11] worked through complete manual analyses of two cases: reversed phase-A current connection and reversed voltage phase sequence. Reference [12] proposed a method for preventing wiring errors in three-phase four-wire meters based on the PEC-H3A calibration instrument; although effective, it still relies on specialized instruments to support manual analysis.
Traditional fault detection methods rely largely on human experience and qualitative analysis [13]. Such methods struggle with the complexity of modern power systems, especially given the high integration of distributed energy and dynamic load characteristics. Therefore, the use of intelligent algorithms for fault detection and security enhancement has become a research hotspot.
In recent years, machine learning-based intelligent fault detection methods have been increasingly applied [14], especially to the detection of wiring errors and system anomalies. For example, reference [15] proposed a data-driven fault detection framework combining random forest (RF) and extreme gradient boosting (XGBoost); by ranking features and training classifiers, the framework efficiently detects faults in wind turbines. Reference [16] trained and tested a random forest model on extracted features and tuned its parameters by grid search, achieving efficient arc fault detection under different loads. Reference [17] used XGBoost to build a temperature regression model for key wind turbine components and used the residual trend between predicted and measured values for early fault warning. These studies show the strong potential of machine learning for fault detection in complex power system environments. However, to the best of our knowledge, no research has yet applied such intelligent techniques to identifying three-phase three-wire wiring errors under light-load and overcompensation conditions.
Against this background, this study applies the light gradient boosting machine (LightGBM) algorithm to the identification of wiring errors in electricity meters. LightGBM is an efficient gradient boosting decision tree (GBDT) algorithm that handles large-scale datasets well and can effectively capture complex anomalies in power systems [18,19]. As an ensemble of decision trees, it offers strong classification performance and fast convergence on large, class-imbalanced power data. The novelty of this study lies in addressing three-phase three-wire wiring-error identification specifically under light-load and capacitive-overcompensation conditions, scenarios that have not previously been studied with intelligent methods, and in using a mechanism-informed data synthesis approach that reproduces the sign-reversal behavior responsible for misidentification. The main contributions of this paper are as follows:
This study analyzes in depth how new load characteristics, such as light load and reactive power compensation, affect the identification of meter wiring errors. Using a mechanistic analysis of electrical characteristics (i.e., applying physical and electrical principles to system behavior under specific conditions such as light load and overcompensation), it explains why light load and reactive power compensation cause wiring errors to be misjudged, and it uses this analysis to supplement the training data of the machine learning model, providing a theoretical basis for training when field data are insufficient.
This paper proposes an intelligent LightGBM-based method for identifying meter wiring errors. The method achieves high-precision wiring-error detection in complex load environments (such as light load and excessive reactive power compensation), removes the dependence of traditional methods on human experience, and improves the accuracy and automation of fault diagnosis in power systems.
The remainder of this paper is organized as follows.
Section 2 presents the measurement principles and the mechanism analysis (light load/overcompensation), and defines the data ranges.
Section 3 describes the LightGBM-based identification model.
Section 4 reports the experiments and results (comparisons and interpretability).
Section 5 concludes the study and outlines future work.
2. Analysis of Wiring Error Principles in Three-Phase Three-Wire Electricity Meters Under the New Power System
In the new power system, the rapid development of distributed energy, particularly the integration of renewable sources such as photovoltaics and wind power, has changed the load characteristics of traditional power systems [20,21]. The volatility and uncertainty of these new loads have intensified grid load fluctuations, posing significant challenges to traditional power control and stability management methods [22]. Meanwhile, identifying wiring errors in electricity meters has become more complex, especially where the share of distributed energy is high, and phenomena such as light load and excessive reactive power compensation have become major causes of misjudged wiring errors [23]. As essential tools for power measurement, electricity meters directly affect the accuracy of power trading and the safe operation of the power system [24]. Accurately and efficiently identifying meter wiring errors, particularly under load fluctuations and improper compensation, has therefore become a major challenge for modern power systems [7,9,25]. An in-depth analysis of the measurement principles of three-phase three-wire meters and of the causes of misjudgment in new load scenarios lays the theoretical foundation for the intelligent identification method developed later.
2.1. Principle of Measurement for Three-Phase Three-Wire Electricity Meters
Three-phase three-wire electricity meters are widely used by high-load users, especially for loads in systems with an insulated neutral point. Assuming symmetric three-phase voltages and currents and a balanced load, the meter calculates power from the measured magnitudes and phases of the voltages and currents. The power calculation formulas of the two measuring elements are as follows.
Power calculation formula for phase A:
$$P_1 = U_A I_A \cos(30^\circ + \varphi_A) \tag{1}$$
where $U_A$ is the voltage of phase A as applied to the first element (the line voltage $U_{AB}$ in the two-element connection); $I_A$ is the current of phase A; and $\varphi_A$ is the phase angle between the current and voltage of phase A.
Power calculation formula for phase C:
$$P_2 = U_C I_C \cos(30^\circ - \varphi_C) \tag{2}$$
where $U_C$ is the voltage of phase C as applied to the second element (the line voltage $U_{CB}$); $I_C$ is the current of phase C; and $\varphi_C$ is the phase angle between the current and voltage of phase C.
The total active power registered by the meter is
$$P = P_1 + P_2 \tag{3}$$
which, for a balanced load with $U_A = U_C = U$, $I_A = I_C = I$, and $\varphi_A = \varphi_C = \varphi$, reduces to $P = \sqrt{3}\,U I \cos\varphi$.
Equations (1)–(3) indicate that the sign of each measured element power is determined by its cosine term: $P_1$ becomes negative when $\varphi_A > 60^\circ$ (so that $30^\circ + \varphi_A > 90^\circ$), and $P_2$ becomes negative when $\varphi_C < -60^\circ$ (so that $30^\circ - \varphi_C > 90^\circ$). Such sign reversals may occur under light-load or capacitive overcompensation conditions and can produce an apparent reverse-current indication even when the meter is wired correctly.
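A minimal Python sketch of Equations (1)–(3) follows, assuming the 30° element offsets written above and angles in degrees (positive for inductive, negative for capacitive loads). The function name `element_powers` is hypothetical; the sketch only illustrates the sign conditions and is not the authors' implementation.

```python
import math


def element_powers(u_a, i_a, phi_a_deg, u_c, i_c, phi_c_deg):
    """Return (P1, P2, P) for a two-element three-phase three-wire meter.

    Assumes P1 = U_A * I_A * cos(30 deg + phi_A) and
            P2 = U_C * I_C * cos(30 deg - phi_C),
    with phi > 0 for inductive and phi < 0 for capacitive loads.
    """
    p1 = u_a * i_a * math.cos(math.radians(30.0 + phi_a_deg))
    p2 = u_c * i_c * math.cos(math.radians(30.0 - phi_c_deg))
    return p1, p2, p1 + p2


# Balanced inductive light load at power factor 0.45 (phi of about 63.3 deg).
phi = math.degrees(math.acos(0.45))
p1, p2, p_total = element_powers(100.0, 0.2, phi, 100.0, 0.2, phi)
print(f"P1 = {p1:.2f} W, P2 = {p2:.2f} W, P = {p_total:.2f} W")

# Balanced capacitive overcompensation at phi = -70 deg.
p1_cap, p2_cap, _ = element_powers(100.0, 0.2, -70.0, 100.0, 0.2, -70.0)
print(f"P1 = {p1_cap:.2f} W, P2 = {p2_cap:.2f} W")
```

In the inductive case $P_1$ is slightly negative while the total still equals $\sqrt{3}\,UI\cos\varphi$; in the capacitive case the sign reversal moves to $P_2$, matching the phase-C negative current discussed in Section 2.2.2.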
2.2. Analysis of the Causes of Misjudgments in Wiring Errors Under New Load Scenarios
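Under the new load scenarios analyzed below, a correctly wired meter can register negative element power, so the current may be misjudged as having reverse polarity, creating a false impression of a wiring error.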
2.2.1. Light Load
Light load is common in new power systems and is characterized by an actual load far below the design load or the rated capacity of the equipment. Under light load, the power factor drops and negative current readings occur frequently, so such operating states are often misjudged as wiring errors. These phenomena are in fact unrelated to wiring errors, which makes it crucial to recognize light-load characteristics accurately and avoid misjudgment. During light load, the power factor typically decreases significantly, especially when reactive power is poorly managed, because reactive power becomes large relative to active power, particularly in systems with large inductive or capacitive loads. This increases the phase angle between current and voltage. Misjudged wiring errors under light load typically occur for high-load users with dedicated transformers when the transformer operates near no-load conditions.
Assume that the transformer has a rated capacity $S$ (in kVA), a no-load loss $P_0$ (in kW), and a no-load current equal to $I_0\%$ of the rated current. The no-load power factor $\cos\varphi_0$ is then calculated as follows:
$$\cos\varphi_0 = \frac{P_0}{S \cdot I_0\% / 100} \tag{4}$$
where $\cos\varphi_0$ is the no-load power factor of the transformer (dimensionless); $P_0$ is the no-load loss (in kW); $S$ is the rated capacity of the transformer (in kVA); and $I_0\%$ is the no-load current expressed as a percentage of the rated current.
Table 1 lists the rated capacity, no-load loss, and no-load current of 10 kV class, 30–200 kVA three-phase two-winding transformers with off-circuit tap changing, together with the calculated no-load power factor. Under light load, the primary side of the transformer therefore operates at a very low power factor, and the measured active power is very small.
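Equation (4) can be evaluated directly. The sketch below uses hypothetical nameplate values (a 0.15 kW no-load loss, a 100 kVA rating, and a 0.5% no-load current), chosen only for illustration and not taken from Table 1; it assumes that the no-load apparent power equals $S \cdot I_0\%/100$ at rated voltage.

```python
import math


def no_load_power_factor(p0_kw, s_kva, i0_percent):
    """Equation (4): cos(phi_0) = P0 / (S * I0% / 100).

    p0_kw      -- no-load loss in kW
    s_kva      -- rated capacity in kVA
    i0_percent -- no-load current as a percentage of rated current
    """
    no_load_kva = s_kva * i0_percent / 100.0  # apparent power drawn at no load
    return p0_kw / no_load_kva


# Hypothetical nameplate values, for illustration only.
pf0 = no_load_power_factor(p0_kw=0.15, s_kva=100.0, i0_percent=0.5)
phi0 = math.degrees(math.acos(pf0))
print(f"cos(phi_0) = {pf0:.3f}, phi_0 = {phi0:.1f} deg")
```

With these assumed values the no-load power-factor angle already exceeds 60°, which is the sign condition for negative first-element power.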
When the power factor is low, the power-factor angle becomes large. According to Equation (1), when the phase angle of the first element lies between $60^\circ$ and $90^\circ$, the power of the first element becomes negative. The meter then indicates a negative current, which may be misjudged as reversed current polarity and hence as a wiring error. Take the 100 kVA S13 series transformer as an example, with a no-load power factor of 0.302 (a power-factor angle of about $72.4^\circ$). In light-load or no-load operation, the primary-side current consists mainly of the no-load current, while the active power delivered on the secondary side is very small or even zero. Nevertheless, the primary side still draws some active power from the grid to supply the hysteresis and eddy-current losses in the core. The primary current is therefore dominated by the magnetizing current, whose reactive component far exceeds its active component, and it serves mainly to maintain the magnetization of the core rather than to transmit power. Under light load, if the transformer's power factor is 0.45, slightly higher than the no-load power factor, the corresponding power-factor angle is about $63.3^\circ$, which still lies between $60^\circ$ and $90^\circ$ and therefore still yields negative power on the first element. As a result, the current may be misjudged as having reverse polarity, creating a false impression of a wiring error.
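The arithmetic of this worked example can be checked with a few lines of Python. The helper `first_element_sign` is hypothetical; it converts an inductive power factor into $\varphi_A$ and reports the sign of $\cos(30^\circ + \varphi_A)$ from Equation (1).

```python
import math


def first_element_sign(power_factor):
    """Return (phi_A in degrees, sign of cos(30 deg + phi_A)) for an inductive load."""
    phi = math.degrees(math.acos(power_factor))
    sign = math.copysign(1.0, math.cos(math.radians(30.0 + phi)))
    return phi, sign


for pf in (0.302, 0.45, 0.9):
    phi, sign = first_element_sign(pf)
    print(f"pf = {pf}: phi_A = {phi:.1f} deg, first-element power sign = {sign:+.0f}")
```

Both 0.302 and 0.45 give angles above 60°, so the first element reads negative power; a normal power factor such as 0.9 (about 25.8°) does not.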
2.2.2. Overcompensation
Capacitive overcompensation occurs when reactive power compensation devices (such as capacitor banks) supply more reactive power than the load actually demands, leaving surplus reactive power in the system. It typically occurs when the load is light or when capacitive loads make up a large share of the load. Overcompensation raises voltage levels and can cause current direction reversal or an excessively high power factor. When the load is capacitive or capacitively overcompensated, $\varphi_C$ may lie between $-90^\circ$ and $-60^\circ$; phase C then exhibits negative power, and the meter shows a negative current for phase C, as shown in Figure 1.
At a building materials company using a three-phase three-wire electricity meter, a negative current appeared in phase C, as shown in Table 2. During the early morning hours, only some security loads were in use, and the transformer was almost at no load. Meanwhile, the automatic switching function of the user's centralized reactive power compensation device had failed, so the device could only be operated manually. With the compensation device running continuously under light load, the capacitor bank supplied excessive reactive power, leaving surplus reactive power in the grid. This manifested as a voltage rise and a reversal of current direction. Because phase C is one of the two measuring points of the meter, when the power measured on that phase reverses sign, the meter records a negative current. It can therefore be concluded that the negative current in phase C was caused by current reversal due to capacitive overcompensation under light load.
2.3. Supplementary Sample Data
The mechanistic analysis of the light-load and capacitive overcompensation scenarios shows that the load power factor is typically low. In particular, when $\varphi_A$ lies between $60^\circ$ and $90^\circ$, the power of phase A is negative, producing a negative current reading. Under capacitive overcompensation, $\varphi_C$ usually lies between $-90^\circ$ and $-60^\circ$, so the power of phase C is negative and a negative current appears. To better match real-world application scenarios, the following phase-angle ranges are defined for generating sample data:
$$60^\circ < \varphi_A < 90^\circ \tag{5}$$
$$-90^\circ < \varphi_C < -60^\circ \tag{6}$$
where $\varphi_A$ and $\varphi_C$ are the phase angles of phases A and C, respectively. The ranges in (5) and (6) operationalize the sign conditions of the two-element meter: $P_1 < 0$ when $\varphi_A > 60^\circ$, and $P_2 < 0$ when $\varphi_C < -60^\circ$. The exact boundary is excluded by using strict inequalities ($> 60^\circ$ / $< -60^\circ$) to avoid ambiguity under measurement noise, and the upper magnitude is capped to reflect typical light-load power-factor angles observed in Table 1 and the field example (e.g., about $63.3^\circ$ at a power factor of 0.45), so that sampling focuses on regimes that actually produce negative element power/current and potential misidentification. Under three-phase balance, assuming $U_A = U_C$ and $I_A = I_C$, the corresponding power-factor angles are generated within these ranges. Throughout this work, complex-load conditions are defined by the phase-angle ranges in (5) and (6), which capture the mechanism most relevant to misidentification under light load and capacitive overcompensation. For the normal-load and wiring-error categories, the meter model is used to programmatically simulate wiring configurations (e.g., voltage-sequence errors and current polarity/terminal reversals) to obtain labeled synthetic samples.
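A minimal sampler for the ranges in (5) and (6) is sketched below. It assumes uniform sampling, rejects the open-interval endpoints explicitly, and, following the balance assumption above, reuses the sampled angle for both elements. The function names, the uniform distribution, and the fixed seeds are illustrative assumptions rather than details from the paper.

```python
import random

# Assumed open intervals in degrees, following Equations (5) and (6).
RANGES = {
    "light_load": (60.0, 90.0),          # phi_A range: first-element power < 0
    "overcompensation": (-90.0, -60.0),  # phi_C range: second-element power < 0
}


def draw_open(low, high, rng):
    """Uniform draw from the open interval (low, high); endpoints are rejected."""
    while True:
        x = rng.uniform(low, high)
        if low < x < high:
            return x


def sample_angles(n, scenario, seed=0):
    """Return n (phi_A, phi_C) pairs; phi_A == phi_C is an assumption of balance."""
    if scenario not in RANGES:
        raise ValueError(f"unknown scenario: {scenario!r}")
    low, high = RANGES[scenario]
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        phi = draw_open(low, high, rng)
        pairs.append((phi, phi))
    return pairs


light = sample_angles(5, "light_load")
over = sample_angles(5, "overcompensation", seed=1)
print(light[0], over[0])
```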
Based on the sample ranges determined above, existing three-phase two-element records from the dataset are mixed with the generated samples. The current three-phase two-element dataset includes six features and one label. The power features are calculated using the following formulas:
$$P_A = U_A I_A \cos(30^\circ + \varphi_A) \tag{7}$$
$$P_C = U_C I_C \cos(30^\circ - \varphi_C) \tag{8}$$
$$Q_A = U_A I_A \sin(30^\circ + \varphi_A) \tag{9}$$
$$Q_C = U_C I_C \sin(\varphi_C - 30^\circ) \tag{10}$$
where $\varphi_A$ and $\varphi_C$ are the phase angles between current and voltage. These formulas generate sample data from the power-factor angles. For example, under light load, the phase voltage is assumed to be 100 V and the current 0.2 A, and the power-factor angles $\varphi_A$ and $\varphi_C$ are generated within the ranges given in (5) and (6). The resulting data reproduce the electrical characteristics of the corresponding meter output.
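Equations (7)–(10) can be turned into a feature row as sketched below, assuming $U_A = U_C$ and $I_A = I_C$ as in the example above. The dictionary keys, the totals `P` and `Q` (sums of the element values), and the 70° angle drawn from range (5) are illustrative choices, not the paper's exact feature schema.

```python
import math


def meter_features(u, i, phi_a_deg, phi_c_deg):
    """Element and total powers from Equations (7)-(10); angles in degrees."""
    pa = u * i * math.cos(math.radians(30.0 + phi_a_deg))
    pc = u * i * math.cos(math.radians(30.0 - phi_c_deg))
    qa = u * i * math.sin(math.radians(30.0 + phi_a_deg))
    qc = u * i * math.sin(math.radians(phi_c_deg - 30.0))
    return {"U": u, "I": i, "phi_A": phi_a_deg, "phi_C": phi_c_deg,
            "P_A": pa, "P_C": pc, "P": pa + pc,
            "Q_A": qa, "Q_C": qc, "Q": qa + qc}


# Light-load example from the text: 100 V, 0.2 A, balanced angle of 70 deg.
row = meter_features(100.0, 0.2, 70.0, 70.0)
for key in ("P_A", "P_C", "P", "Q"):
    print(key, round(row[key], 3))
```

The totals reduce to $\sqrt{3}\,UI\cos\varphi$ and $\sqrt{3}\,UI\sin\varphi$, which is a quick consistency check on the element-level signs.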
Figure 2 summarizes the workflow. First, an equation-based meter model (Equations (5)–(10)) generates synthetic samples by applying the phase-angle ranges in (5) and (6), producing the feature tuple (voltage, current, phase angle, active power, reactive power). These samples are merged with field records to construct the dataset. The data are then split 70%/30% and normalized, with the normalizer fitted on the training split only, after which the LightGBM classifier is trained and evaluated. The outputs comprise the predicted wiring category and the corresponding feature-importance distribution. Modeling and data synthesis were implemented in Python 3.9.21 (Python Software Foundation, Wilmington, DE, USA), reproducing light-load and overcompensation conditions via the phase-angle mechanism specified in Equations (5) and (6).
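The split-then-normalize order can be sketched in plain Python as follows. The paper does not specify the normalizer, so min-max scaling is an assumption here, as are the function name and the seed; the key point is that the scaling statistics come from the training split only.

```python
import random


def split_and_normalize(rows, labels, train_frac=0.7, seed=42):
    """Shuffle, split into train/test, then min-max scale with training statistics only."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * len(idx)))
    train_idx, test_idx = idx[:cut], idx[cut:]

    # Scaling statistics are computed on the training split only (no leakage).
    n_feat = len(rows[0])
    mins = [min(rows[k][j] for k in train_idx) for j in range(n_feat)]
    maxs = [max(rows[k][j] for k in train_idx) for j in range(n_feat)]

    def scale(row):
        # Constant training columns are mapped to 0.0 to avoid division by zero.
        return [(v - mn) / (mx - mn) if mx > mn else 0.0
                for v, mn, mx in zip(row, mins, maxs)]

    x_train = [scale(rows[k]) for k in train_idx]
    x_test = [scale(rows[k]) for k in test_idx]
    y_train = [labels[k] for k in train_idx]
    y_test = [labels[k] for k in test_idx]
    return x_train, x_test, y_train, y_test


rows = [[float(k), 2.0 * k] for k in range(10)]
labels = [k % 2 for k in range(10)]
x_train, x_test, y_train, y_test = split_and_normalize(rows, labels)
print(len(x_train), len(x_test))
```

Test values may fall slightly outside [0, 1]; that is expected when the scaler never sees the test data.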
Equations (1)–(3) give the two-element wattmeter relations that govern when the measured element power changes sign. Under light load, this may occur as the phase angle increases, which explains the power-sign flips discussed earlier. Building on this physical basis, the analyses of light load and capacitive overcompensation identify the sign conditions that trigger negative element power/current ($P_1 < 0$ for $\varphi_A > 60^\circ$; $P_2 < 0$ for $\varphi_C < -60^\circ$), which directly motivate the phase-angle ranges used for sample generation in Equations (5) and (6). A worked example (e.g., $U = 100$ V and $I = 0.2$ A with $\varphi_A$ and $\varphi_C$ within those ranges) illustrates how the equations map to meter outputs and derived features, and Figure 1 provides the vector interpretation of the negative-current case under overcompensation. This section therefore supplies the mechanistic prior that guides data synthesis when field data are scarce and aligns the physics with the features used by the learning model.
3. Three-Phase Wiring Error Identification Model Based on LightGBM
To address misjudgments caused by the similarity between meter wiring errors and changes in load characteristics, which often lead to unnecessary operation and maintenance interventions, this paper proposes a wiring error identification method based on LightGBM. By combining the dynamic characteristics of the power system with machine learning, an efficient classification model is trained to identify meter wiring errors accurately. LightGBM is an efficient implementation of GBDT that performs classification by building an ensemble of decision trees. It trains efficiently on large-scale datasets and supports imbalanced data, which makes it well suited to wiring error identification.
In the LightGBM model, the choice of objective function directly affects performance. Wiring error identification is a multi-class problem in which each class corresponds to a wiring condition, either a specific wiring error or a non-error load condition. The multi-class logarithmic loss (multi_logloss) is therefore used as the loss function:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c}$$
where $y_{i,c}$ is the true label of sample $i$ for class $c$ ($y_{i,c} = 1$ if sample $i$ belongs to class $c$, otherwise $y_{i,c} = 0$); $p_{i,c}$ is the probability predicted by the model that sample $i$ belongs to class $c$; $N$ is the total number of samples; and $C$ is the total number of classes. This loss measures the cross-entropy between the predicted probability distribution and the true labels; minimizing it improves the accuracy of wiring error identification.
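The loss above can be computed directly; because $y_{i,c}$ is one-hot, only the true-class term of each sample contributes. The clipping constant `eps` is an assumption added here to keep the logarithm finite.

```python
import math


def multi_logloss(y_true, probs, eps=1e-15):
    """Mean multi-class log loss for integer labels and per-class probabilities.

    y_true -- class indices in range(C)
    probs  -- one probability list per sample (each list sums to 1)
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # With one-hot y_{i,c}, only the true-class term contributes.
        p_true = min(max(p[label], eps), 1.0 - eps)  # clip to keep log() finite
        total -= math.log(p_true)
    return total / len(y_true)


print(multi_logloss([2, 0], [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]]))
```

A uniform prediction over $C$ classes gives a loss of $\ln C$, a useful reference point when reading convergence curves.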
3.1. Tree Construction and Splitting
LightGBM is a GBDT method that improves model performance by building a sequence of decision trees. In wiring error identification, electrical quantities such as voltage, current, and phase angle serve as input features, and the trees learn to distinguish correct from incorrect wiring on the basis of these features. In each training round, LightGBM grows tree nodes by selecting the best splitting feature and split point. The split criterion is the split gain, i.e., the reduction in the loss function achieved by splitting on a particular feature:
$$\text{Gain} = L(D) - \left[L(D_L) + L(D_R)\right]$$
where $L(D)$ is the loss function value of dataset $D$; $D$ is the dataset before the split; and $D_L$ and $D_R$ are the subsets of the data on the left and right sides of the split, respectively. The gain represents the reduction in model error produced by the split; the larger the gain, the more the split contributes to classification.
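The gain criterion can be illustrated with a single-feature split search. LightGBM itself evaluates gains from histogram-binned gradient and Hessian sums; this sketch instead assumes a squared-error loss for $L(D)$, which keeps the example short while following the same $L(D) - [L(D_L) + L(D_R)]$ structure. The toy angles and targets are invented for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (loss of the best constant leaf)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(feature, target):
    """Return (threshold, gain) maximizing Gain = L(D) - [L(D_L) + L(D_R)]."""
    parent_loss = sse(target)
    pairs = sorted(zip(feature, target))
    best_threshold, best_gain = None, 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal feature values
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        gain = parent_loss - (sse(left) + sse(right))
        if gain > best_gain:
            best_threshold = 0.5 * (pairs[k - 1][0] + pairs[k][0])
            best_gain = gain
    return best_threshold, best_gain


# Toy data: the target changes once the phase angle exceeds about 60 degrees.
angles = [20.0, 35.0, 50.0, 65.0, 75.0, 85.0]
target = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
threshold, gain = best_split(angles, target)
print(threshold, gain)
```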
In practical wiring error identification, the model must judge whether the wiring is correct from the phase relationship between current and voltage. When electrical features such as phase angle or power behave abnormally, the model uses these changes to detect wiring errors. Phase-angle features are particularly important because they directly determine the calculated power and the apparent current direction. Under light load and reactive power compensation in particular, abnormal power factors cause large shifts in phase angle, and the model relies on these features to separate such load effects from genuine wiring errors.
3.2. Regularization and Overfitting Prevention
To prevent overfitting during training, LightGBM uses regularization to control model complexity. Regularization terms penalize complex tree structures, particularly the number of leaf nodes and the leaf weights. Common choices are L1 and L2 regularization, with L2 regularization widely used to control leaf weights. The L2 regularization term is:
$$\Omega = \frac{\lambda}{2}\sum_{j=1}^{T} w_j^{2}$$
where $w_j$ is the weight of the $j$-th leaf node, $T$ is the number of leaf nodes in the tree, and $\lambda$ is the L2 regularization coefficient (lambda_l2 in LightGBM). The regularization term reduces model complexity and prevents overfitting to the training data, thereby improving generalization. This is especially important in wiring error identification, where electrical feature data may be disturbed by external environmental factors and the model must still make accurate predictions under these more complex conditions.
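The effect of $\lambda$ can be seen by combining the penalty with the second-order loss model described in Section 3.3: for a leaf with gradient sum $G$ and Hessian sum $H$, minimizing $G w + \tfrac{1}{2}(H + \lambda) w^2$ gives $w^* = -G/(H + \lambda)$, so a larger $\lambda$ shrinks leaf values toward zero. This closed form is the standard second-order GBDT result; the numbers below are arbitrary.

```python
def l2_penalty(leaf_weights, lam):
    """Omega = (lam / 2) * sum of squared leaf weights over the T leaves."""
    return 0.5 * lam * sum(w * w for w in leaf_weights)


def optimal_leaf_weight(grad_sum, hess_sum, lam):
    """Minimizer of G*w + 0.5*(H + lam)*w**2, i.e. w* = -G / (H + lam)."""
    return -grad_sum / (hess_sum + lam)


for lam in (0.0, 1.0, 10.0):
    w = optimal_leaf_weight(grad_sum=-4.0, hess_sum=2.0, lam=lam)
    print(f"lambda = {lam:>4}: w* = {w:.3f}, penalty = {l2_penalty([w], lam):.3f}")
```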
3.3. Optimization of the Loss Function and Second-Order Taylor Expansion
LightGBM approximates the loss function with a second-order Taylor expansion, which turns each optimization step into a quadratic problem that can be solved efficiently. The Taylor approximation of the objective function $L(\theta + \Delta\theta)$ is:
$$L(\theta + \Delta\theta) \approx L(\theta) + g^{\top}\Delta\theta + \frac{1}{2}\,\Delta\theta^{\top} H\,\Delta\theta$$
where $g$ is the gradient of the objective function with respect to the parameter $\theta$; $H$ is the Hessian matrix of the objective function, containing second-order derivative information; and $\Delta\theta$ is the change in the model parameters. Using this second-order information allows LightGBM to estimate changes in the loss more accurately, which accelerates convergence.
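A one-dimensional analogue of this expansion is sketched below using binary log loss on a raw score $f$, an assumption made for brevity (the task itself uses the multi-class loss). It compares the exact loss after a step with the quadratic model and computes the Newton step $-g/h$ that minimizes that model.

```python
import math


def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def logloss(y, f):
    """Binary log loss of raw score f for a label y in {0, 1}."""
    p = sigmoid(f)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def grad_hess(y, f):
    """First and second derivatives of logloss with respect to f."""
    p = sigmoid(f)
    return p - y, p * (1.0 - p)


y, f, step = 1, 0.0, 0.3
g, h = grad_hess(y, f)
exact = logloss(y, f + step)
approx = logloss(y, f) + g * step + 0.5 * h * step ** 2
newton_step = -g / h  # minimizer of the quadratic model
print(f"exact = {exact:.5f}, quadratic model = {approx:.5f}, Newton step = {newton_step}")
```

For small steps the quadratic model tracks the exact loss closely, which is why a single gradient/Hessian evaluation per round is enough to choose leaf values.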
3.4. Handling Class Imbalance Issues
In wiring error identification, error samples are usually far fewer than normal wiring samples, and the sample counts of the individual error categories also differ, so the data are class-imbalanced. To mitigate this, LightGBM supports sample weights and class weights, allowing minority-class samples to receive higher weights so that the model focuses more on them during training. The weighted loss is:
$$L_w = \sum_{i=1}^{N} w_i\,\ell(y_i, \hat{y}_i)$$
where $w_i$ is the weight of sample $i$; $\ell$ is the loss function; and $y_i$ and $\hat{y}_i$ are the true label and the model output for sample $i$. Assigning higher weights to wiring error samples makes the model focus on these hard-to-classify samples during training, improving the identification rate of wiring errors.
In this work, per-sample weights are assigned by class frequency with normalization. For a sample $i$ of class $c_i$, the weight is:
$$w_i = \frac{N}{C\,N_{c_i}}$$
where $N$ is the total number of samples, $C$ is the number of classes, and $N_{c_i}$ is the number of samples in class $c_i$.
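The class-frequency weights can be computed in a few lines. The sketch below uses toy labels; the function name is hypothetical.

```python
from collections import Counter


def class_frequency_weights(labels):
    """Per-sample weights w_i = N / (C * N_{c_i}) for integer or string labels."""
    n = len(labels)
    counts = Counter(labels)
    n_classes = len(counts)
    return [n / (n_classes * counts[y]) for y in labels]


labels = [0] * 6 + [1] * 3 + [2]
weights = class_frequency_weights(labels)
print([round(w, 3) for w in weights])
```

Each class receives the same total weight $N/C$, so the weights sum to $N$. In LightGBM the resulting list can be passed as the `weight` argument of `lightgbm.Dataset` or as `sample_weight` to `LGBMClassifier.fit`; the scikit-learn interface also accepts `class_weight='balanced'`, which applies the same $N/(C\,N_c)$ formula.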
3.5. Early Stopping Strategy in the Training Process
To prevent overfitting, an early stopping strategy based on validation performance is used during LightGBM training: if the validation loss fails to improve significantly over several consecutive rounds, training stops. This prevents overtraining and improves generalization. The condition for early stopping is:
$$L_{\text{val}}^{(t-1)} - L_{\text{val}}^{(t)} < \varepsilon$$
where $L_{\text{val}}^{(t)}$ is the validation loss after the $t$-th round of training and $\varepsilon$ is the preset tolerance; training stops once this condition holds for the preset number of consecutive rounds (the patience).
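LightGBM provides this behavior through its `lightgbm.early_stopping(stopping_rounds=...)` callback. The plain-Python sketch below illustrates the logic under one assumption: it compares each round against the best loss so far, a common variant of the successive-round condition above. The function name is hypothetical.

```python
def early_stopping_round(val_losses, patience, tol=0.0):
    """Return (best_round, stop_round), both 0-based.

    Training stops once `patience` consecutive rounds pass without the
    validation loss dropping more than `tol` below the best value so far.
    """
    best_loss = float("inf")
    best_round = -1
    for t, loss in enumerate(val_losses):
        if loss < best_loss - tol:
            best_loss, best_round = loss, t
        elif t - best_round >= patience:
            return best_round, t
    return best_round, len(val_losses) - 1


losses = [1.0, 0.8, 0.7, 0.69, 0.695, 0.7, 0.71]
print(early_stopping_round(losses, patience=3))
```

The model kept for evaluation is the one from `best_round`, not from the round at which training stopped.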
3.6. Feature Set Used for Training
The input vector comprises: A-phase current, C-phase current, A-phase voltage, C-phase voltage, A-phase voltage angle, C-phase voltage angle, A-phase current angle, C-phase current angle, total active power, A-phase active power, C-phase active power, total reactive power, A-phase reactive power, and C-phase reactive power. Phase angles and power quantities are measured or derived from the meter equations introduced in Section 2.
Through this training and construction process, the LightGBM model can capture the complex relationships among electrical features and classify the data effectively with its decision tree structure. By optimizing the loss function, regularizing model complexity, and using a second-order Taylor expansion to accelerate optimization, LightGBM can accurately identify wiring errors in complex load scenarios with a high share of distributed energy, providing a reliable intelligent identification method. The overall workflow of the proposed LightGBM method for identifying three-phase three-wire wiring errors is shown in Figure 3.
4. Case Study
To validate the feasibility of LightGBM for three-phase three-wire wiring error identification, this section evaluates the method comprehensively. All experiments were executed on a workstation equipped with an Intel® Core™ i7-14700K CPU and an NVIDIA GeForce RTX 4060 Ti GPU. Based on the theoretical framework of Section 2 and the three-phase meter model, several synthetic datasets were generated and combined with data collected in actual scenarios. To enhance the reliability of the experiment, the datasets include labels such as "normal wiring/overcompensation" and "normal wiring/no overcompensation". Table 3 shows the data distribution across categories. The data were randomly shuffled and split into 70% training and 30% testing sets. To prevent data leakage, the normalizer was fitted on the training set only and then applied to the test set. For easier presentation of subsequent results, the category labels were numbered as listed in Table 3: labels 6–12 correspond to wiring error categories, while labels 1–5 are non-error conditions used to evaluate misidentification under complex loads. Training used multi_logloss with early stopping (patience of 50 rounds) on a validation fold drawn from the test portion; class imbalance was handled with class-frequency per-sample weights as described in Section 3.4. The model parameters are listed in Table 4.
4.1. Training Evaluation
In addition to LightGBM, XGBoost, a decision tree, a random forest, and a Multilayer Perceptron neural network were evaluated on the same test dataset for comparison in three-phase three-wire wiring error identification. Among these baselines, the Multilayer Perceptron serves as the non-tree comparator. XGBoost, LightGBM, and Multilayer Perceptron are trained iteratively by minimizing the multi-class log loss (mlogloss).
Figure 4 presents the convergence curves of the three iteratively optimized models: XGBoost, LightGBM, and Multilayer Perceptron. The loss of all three models decreases as training proceeds and eventually converges to a stable value. LightGBM converged to 0.00317, whereas XGBoost and Multilayer Perceptron converged to approximately 0.18 and 0.17, respectively. LightGBM thus shows a clear advantage in both the speed and the extent of loss reduction: by the 50th round, its loss had fallen to 0.14, lower than the losses of XGBoost and Multilayer Perceptron at the 300th round. Relative to the initial loss, LightGBM's loss decreased by 92.4%, compared with 75.9% for XGBoost and 83.4% for Multilayer Perceptron. In the three-phase three-wire wiring error identification task, LightGBM therefore shows more stable and efficient training, in terms of both the final converged loss and the speed of loss reduction.
The stability and speed of LightGBM during training are key to its shorter training time. To verify this, the training times of the three models were compared. Figure 5a shows the training times as the number of training rounds increases from 10 to 800 in steps of 10. The training time of all three models increases with the number of rounds, most markedly for Multilayer Perceptron: by the third round, its training time had reached 8.75 s, and by the fourth round it exceeded 10 s, which could impose a significant computational burden on practical deployments of three-phase three-wire wiring error identification. In contrast, the computation time of XGBoost increased linearly with the number of rounds, while that of LightGBM stabilized after about 100 rounds, fluctuating within about 0.5 s.
To improve the reliability of these results, we repeated the 300-round training 20 times for each of the three models and summarized the resulting computation times in box plots, as shown in Figure 5. The average computation time of the Multilayer Perceptron is about 80 s, while those of XGBoost and LightGBM both remain under 5 s. LightGBM's computation time is notably smaller, with most runs finishing within 2 s, and its distribution is more stable, with fewer outliers than those of the other two algorithms. This demonstrates a clear computation-time advantage for LightGBM.
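The repeated-timing protocol and its box-plot summary can be sketched as follows. This is a minimal Python illustration under stated assumptions: `train_fn` is a hypothetical stand-in for one 300-round training run of any of the three models, and the whiskers follow the common Tukey convention (1.5 × IQR), which the text does not specify.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_repeated(train_fn: Callable[[], object], repeats: int = 20) -> List[float]:
    """Wall-clock duration (s) of `repeats` independent calls to `train_fn`."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        train_fn()
        durations.append(time.perf_counter() - start)
    return durations


def box_stats(samples: List[float]) -> Dict[str, object]:
    """Mean, quartiles and Tukey outliers (points beyond 1.5 * IQR from the box)."""
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.fmean(samples),
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [x for x in samples if x < low or x > high],
    }


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with a real 300-round training call.
    def dummy_train() -> int:
        return sum(i * i for i in range(200_000))

    print(box_stats(time_repeated(dummy_train, repeats=20)))
```

The `"inclusive"` quantile method corresponds to NumPy's default linear-interpolation percentiles, so the quartiles agree with those drawn by common plotting libraries.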
In summary, LightGBM's fast convergence and short computation time make it particularly well suited to scenarios involving large datasets, a need for lightweight deployment, and otherwise heavy computational workloads.
4.2. Performance Evaluation
For the trained models, we evaluate performance using three metrics: accuracy, precision, and F1-score; recall is also defined below because the F1-score depends on it. Accuracy measures the overall correctness of the model as the ratio of correctly predicted samples to the total number of samples in the dataset:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of true positive predictions (correctly predicted positive samples), TN is the number of true negative predictions (correctly predicted negative samples), FP is the number of false positive predictions (samples incorrectly predicted as positive), and FN is the number of false negative predictions (samples incorrectly predicted as negative).

Precision is the ratio of correctly predicted positive observations to all predicted positives; it measures how reliable the positive predictions are:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall measures the model's ability to identify all relevant positive samples. It is the ratio of correctly predicted positive observations to all actual positive observations in the dataset:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

The F1-score is the harmonic mean of precision and recall. It is a balanced metric, especially useful when the class distribution is imbalanced:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
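A minimal Python sketch of the four formulas follows. Because the task has 12 labels, precision, recall, and F1-score must be computed per label (one-vs-rest) and then averaged; the text does not state the averaging scheme, so the macro averaging used here (unweighted mean over labels) is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1-score from confusion counts.

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int], labels: List[int]) -> Dict[str, float]:
    """Overall accuracy plus macro-averaged (one-vs-rest) precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    per_label = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = len(pairs) - tp - fp - fn
        per_label.append(binary_metrics(tp, tn, fp, fn))
    k = len(labels)
    return {
        # Multi-class accuracy is the share of exact matches, not a per-label mean.
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(m["precision"] for m in per_label) / k,
        "recall": sum(m["recall"] for m in per_label) / k,
        "f1": sum(m["f1"] for m in per_label) / k,
    }
```

For example, `macro_metrics([0, 0, 1, 1], [0, 1, 1, 1], labels=[0, 1])` gives an accuracy of 0.75. Macro averaging counts rare wiring conditions as much as common ones; a weighted average would instead follow class frequencies.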
As shown in Figure 6, we trained the three iterative models with the number of training rounds ranging from 5 to 300 in steps of 5. Decision tree and random forest models, which are not trained iteratively, were included for comparison with fixed parameters. The three metric curves show that LightGBM outperforms the other models in accuracy, precision, and F1-score across training rounds. First, the LightGBM curve is smoother and more stable than that of the Multilayer Perceptron, indicating more consistent performance, which is important in practical recognition scenarios. Second, from around the 60th round onward, LightGBM clearly outperforms the other four models in accuracy, precision, and F1-score, and after approximately 80 training rounds its overall recognition performance exceeds 80%. As shown in Figure 6d, the three models only fully converged by the 300th round (LightGBM was already close to convergence by around the 200th round). We therefore used the results at the 300th round for the model performance comparison; the corresponding data are given in Table 5 and visualized in Figure 6d. LightGBM achieves the best result on all three evaluation metrics.
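The observation that overall recognition performance exceeds 80% after about 80 rounds amounts to finding the first swept round at which the metric curves cross a threshold. A minimal Python sketch, assuming the per-round accuracy, precision, and F1-score values from the 5-to-300 sweep are stored as lists and reading "exceeds 80%" as all three metrics being above 0.8 (the function name and toy values are hypothetical):

```python
from typing import Dict, List, Optional

# Training-round grid of the sweep described above: 5, 10, ..., 300.
ROUNDS: List[int] = list(range(5, 301, 5))


def first_round_all_above(
    rounds: List[int], curves: Dict[str, List[float]], threshold: float = 0.8
) -> Optional[int]:
    """First swept round at which every metric curve strictly exceeds `threshold`.

    `curves` maps a metric name (e.g. "accuracy") to one value per entry in `rounds`.
    """
    for i, round_no in enumerate(rounds):
        if all(values[i] > threshold for values in curves.values()):
            return round_no
    return None  # the threshold is never exceeded by all metrics at once


# Toy curves for illustration only (hypothetical values).
toy_curves = {
    "accuracy": [0.50, 0.79, 0.85, 0.90],
    "precision": [0.40, 0.81, 0.82, 0.90],
    "f1": [0.45, 0.80, 0.83, 0.90],
}
print(first_round_all_above([5, 10, 15, 20], toy_curves))
```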
To evaluate recognition accuracy for each label type, we plotted the confusion matrix and the per-label recognition accuracy, as shown in Figure 7. Figure 7a shows that LightGBM achieved 100% recognition accuracy for labels 1 to 4. These four labels correspond to correct-wiring conditions under new load characteristics, which are often misjudged as common wiring errors, causing unnecessary operational disruption and requiring operators to inspect and confirm them on site. LightGBM's high accuracy on these labels can therefore substantially reduce the workload of manual field inspections. Notably, the “normal wiring—no overcompensation” and “normal wiring—overcompensation” labels are based on data collected in real-world scenarios, and the model also recognizes these accurately, demonstrating good adaptability to real-world conditions. For labels 5 to 12 (the more challenging conditions), the model makes some misclassifications, but LightGBM still achieves the best overall recognition accuracy among the compared models. As shown in Figure 7b, LightGBM achieves the highest accuracy in nearly all label categories.
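Per-label recognition accuracy of the kind shown in Figure 7 can be read off a confusion matrix by dividing each diagonal entry by its row total; with rows as true labels, this equals per-label recall. A minimal Python sketch, assuming labels 1 to 12 are encoded as indices 0 to 11 (the function names are hypothetical):

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_labels: int) -> List[List[int]]:
    """Counts cm[t][p] of samples with true label t predicted as p (0-based labels)."""
    cm = [[0] * n_labels for _ in range(n_labels)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_label_accuracy(cm: List[List[int]]) -> List[float]:
    """Diagonal entry divided by row total, i.e. per-label recall; 0.0 for empty rows."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]
```

A row with no off-diagonal counts yields 1.0, while the off-diagonal entries of the remaining rows show which wiring conditions are confused with which.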
4.3. Feature Importance Analysis
Figure 8 shows the gain-based importance of each input feature, which varies considerably across features. The most important input feature is total reactive power, which accounts for 19.8% of the total feature importance. This is consistent with the theoretical analysis in Section 2, according to which both light load and overcompensation can lead to an increase in reactive power.
The four least important features are the phase A and phase C voltages and their voltage phase angles (the phase A voltage phase angle does not appear in the figure because its importance is zero). Together, these features account for only 1.06% of the total feature importance. In the input data, the phase A and phase C voltages fluctuate around 100 V regardless of the wiring condition, so they carry little information about specific error types. The voltage phase angles serve as references for the corresponding current phase angles and therefore contribute little to error type identification, which explains their low importance. In contrast, the phase A and phase C current phase angles play a crucial role in error type recognition, together accounting for 26.21% of the total feature importance. Because total reactive power is the single most important feature while the voltage features contribute little, suspected wiring errors should first be screened for reactive-power conditions such as light load and overcompensation. This triage helps distinguish special-load phenomena from true miswiring and can reduce unnecessary site visits.
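The percentages in this subsection are shares of the summed gain importance, and group figures such as 1.06% and 26.21% are sums of individual shares. A minimal Python sketch, assuming the raw per-feature gains are already available; with the LightGBM Python package they can be obtained from a trained booster via `Booster.feature_importance(importance_type="gain")` together with `Booster.feature_name()`. The feature names and values below are hypothetical:

```python
from typing import Dict, Iterable


def importance_shares(gains: Dict[str, float]) -> Dict[str, float]:
    """Each feature's percentage share of the summed gain importance."""
    total = sum(gains.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    return {name: 100.0 * gain / total for name, gain in gains.items()}


def group_share(shares: Dict[str, float], names: Iterable[str]) -> float:
    """Combined percentage share of a group of features (e.g. both voltage angles)."""
    return sum(shares[name] for name in names)


# Hypothetical feature names and gain values, for illustration only.
toy_gains = {"Q_total": 50.0, "I_a_angle": 30.0, "U_a": 20.0, "U_a_angle": 0.0}
shares = importance_shares(toy_gains)
print(shares)
print(group_share(shares, ["U_a", "U_a_angle"]))
```

A feature that is never used for a split keeps a 0% share, which is why a zero-importance feature can be omitted from an importance plot without changing the other percentages.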
In summary, LightGBM not only trains quickly and converges rapidly but also maintains high accuracy in identifying various error types. In addition, the feature importance analysis shows that the features LightGBM relies on most, notably total reactive power and the current phase angles, are consistent with the mechanistic analysis, which supports the reliability of its recognition results across different error types.
5. Conclusions
This paper proposes an intelligent method for identifying wiring errors in three-phase three-wire electricity meters based on the LightGBM algorithm. The approach addresses the challenges of wiring-error detection in new power systems, especially under complex load conditions such as light load and overcompensation, and improves both identification accuracy and the degree of automation. The main contributions of this article are as follows:
The study reveals how light load and overcompensation affect wiring error identification. Under light load, the power factor decreases and the A-phase phase angle shifts into the range 60° to 90°, producing negative power that can be misjudged as a wiring error. Under overcompensation, the increased reactive power shifts the C-phase phase angle into the range −90° to −60°, which likewise produces negative power and can be misidentified as a wiring error. This mechanistic analysis supports the data-driven classification model and provides a theoretical basis for model training, especially when field data are insufficient.
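The negative-power mechanism summarized above can be illustrated with the standard two-wattmeter relations that a correctly wired three-phase three-wire meter follows under a balanced load, reading the stated angle ranges as the load angle φ. The sketch below is a simplified Python illustration, not a reproduction of the Section 2 analysis; the balanced-load assumption and the function name are illustrative. In this model element 1 reads negative once the lagging load angle exceeds 60°, and element 2 once the leading angle exceeds 60° (φ < −60°), matching the ranges stated above.

```python
import math
from typing import Tuple


def element_powers(u_line: float, i_line: float, phi_deg: float) -> Tuple[float, float]:
    """Active-power readings of the two metering elements under correct wiring.

    Simplified textbook model (an assumption): balanced, symmetric three-phase load,
    line voltage `u_line`, line current `i_line`, load angle `phi_deg`
    (positive = lagging/inductive, negative = leading/capacitive).
      element 1 (U_ab, I_a): U * I * cos(30 deg + phi)
      element 2 (U_cb, I_c): U * I * cos(30 deg - phi)
    """
    phi = math.radians(phi_deg)
    p1 = u_line * i_line * math.cos(math.radians(30.0) + phi)
    p2 = u_line * i_line * math.cos(math.radians(30.0) - phi)
    return p1, p2


# Normal load, light-load-like lag, and overcompensation-like lead (toy values).
for phi in (0.0, 70.0, -70.0):
    p1, p2 = element_powers(100.0, 1.0, phi)
    print(f"phi={phi:+.0f} deg: P1={p1:.1f} W, P2={p2:.1f} W, total={p1 + p2:.1f} W")
```

In this model the two readings always sum to √3·U·I·cos φ, so the total active power stays positive even while one element reads negative; a negative element reading alone therefore does not prove miswiring.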
The proposed LightGBM-based wiring error identification method has low training cost and fast training, enabling automated detection of wiring errors in complex load environments. Experimental results show that by the 50th round, LightGBM's loss decreased by 92.4%, and that training for 300 rounds typically took less than 2 s, far faster than XGBoost and the Multilayer Perceptron (MLP). Overall identification performance exceeded 80%, with 100% accuracy in the “Correct wiring—photovoltaic,” “Correct wiring—light load,” “Correct wiring—no compensation,” and “Correct wiring—compensation” categories.
Consistent with the theoretical analysis, the feature importance analysis identified total reactive power as the most important input feature, contributing 19.8% of the total feature importance. Meanwhile, the A- and C-phase voltages and their phase angles together contributed only 1.06%, indicating a weak association with error types. These results provide important guidance for further optimization of the model.
Future work will expand field datasets and run cross-site, time-separated external tests to better evaluate generalization. Robustness to measurement noise and partial sensor outages will be systematically assessed, and class decision thresholds will be calibrated. Dependence on synthetic data will be progressively reduced through broader real-world coverage and domain-shift checks. We also plan to extend the framework from three-phase three-wire to three-phase four-wire settings and to surface key drivers (e.g., total reactive power and phase-angle cues) in operator-facing reports to support actionable triage.