1. Introduction
Residential electrification has risen sharply with socioeconomic growth and with scientific and technological progress. The growing number of household appliances increases not only the electrical load but also the potential safety hazards of electrical systems.
The United States fire service [1] reports that about 380,200 residential building fires occurred in the United States from 2013 to 2015. These fires caused 2695 deaths, injured about 12,000 people and led to economic losses of $700 million, causing great harm to society. Electrical fires account for 30.4% of all residential building fires and 46% of extraordinarily serious residential building fires [2], which makes them one of the most frequent types of residential building fire. Electrical fires are caused by the chemical reaction between an ignition source (hot spot, arc or spark), an oxidizer (oxygen) and combustible material close to the ignition source along the line. Analysis of residential building fire data shows that electrical fire accidents [1] are usually caused by arcs, electric leakage, over-currents and overheating of electrical equipment. Circuit breakers protect against over-currents, electric leakage and other electrical faults, but they cannot effectively detect arc faults. Fires caused by arc faults account for 82% [3] of electrical fires, posing a great threat to people's lives and property.
An arc is a luminous electrical discharge between electrodes that generates a large amount of heat. The temperature of the arc column can reach 20,000 K [4], so nearby combustible material is easily ignited. There are three types of arc faults [5]: ground arc faults, series arc faults and parallel arc faults. When a parallel or ground arc fault occurs, the effective value of the current is higher than in normal operation. These cases are equivalent to short-circuit faults, and the breaker can protect the circuit in time. When a series arc fault occurs, however, the effective value of the line current is lower than in normal operation, which makes the fault hard to identify [6,7]. Effectively identifying series arc faults in the line is therefore an urgent problem of great practical significance.
Arc fault detection methods have been studied extensively. Some researchers have modeled arcs mathematically from experimental data [8,9,10] and thereby obtained the parameters of the arc model. However, research on arc fault models has remained at the simulation stage. An arc fault is accompanied by strong light, high temperature, electromagnetic radiation, noise and other physical phenomena, so researchers can identify arc faults by detecting these phenomena [11,12,13]. Based on features extracted from the electromagnetic radiation signal, least squares [14] and the extreme learning machine [15] have been used to identify arc faults by collecting the voltage signals in the circuit. However, arc faults are highly random and their locations are uncertain. Because the sensors are placed in fixed positions, detection methods based on sound, light, heat, voltage and electromagnetic signals are difficult to apply in a real environment.
Much research has addressed fault detection, location and isolation on transmission lines. A fault location method that uses the faulted negative-sequence voltage to locate the faulted sections [16] was proposed by applying the relationship between the fault distance and the clustered measurement groups. For a load change that does not lead to a 180° phase angle change, the current-only method [17] detects the overcurrent fault without requiring voltage measurements. The EnKF-based approach [18] accurately locates short-circuit faults on transmission lines without prior knowledge of either the fault type or an approximate fault location. Compared with short-circuit faults and high-impedance faults, arc faults are highly random and intermittent, so the transmission-line fault detection and location methods mentioned above are unsuitable for detecting arc faults.
Series arc fault detection based on the line current signal is now receiving considerable attention. Lu [19] took the root mean square of the current signal as the fault feature. This method is simple and performs well in real time, but determining the threshold for different loads is difficult. The harmonic power and the peak-to-peak value of the current were extracted under linear and nonlinear loads, and the Mahalanobis distance was employed as a classifier to discriminate AC arc faults [20]. Based on the multi-resolution property of wavelet decomposition, Zhang [21] analyzed the components of the current signal in different frequency bands by wavelet transform (WT) and extracted the wavelet energy as the fault feature. The chirp zeta transform (CZT) [22] and WT [23] were used to analyze the spectrum of the current signal and extract time-frequency domain characteristics for arc fault detection. However, microwave ovens, computers and other nonlinear loads produce rich harmonic components in normal operation, so these methods are prone to misjudgment if only a time-frequency component is used as the fault characteristic. A characteristic matrix based on the singular value decomposition of the wavelet coefficient sequence was defined and used as the basis for series arc fault detection [24]. Because different load types have different characteristics, however, the threshold is hard to determine. Image processing is another route to arc fault detection, and the gray level-gradient co-occurrence matrix was proposed to extract the features [25]. However, constructing the gray image loses detailed information about the current. Sparse representation can directly reduce the dimension of the original signal to construct the high-dimensional feature [26], with a neural network applied to detect the arc fault. However, the features overlap in the feature space and therefore cannot effectively represent the characteristics of arc faults under different load types. Two-dimensional identification features were constructed from the flat shoulder phenomenon and the information dimension of the current signal [27], but nonlinear loads also exhibit flat shoulders, which is likely to cause misjudgments. Based on fault features extracted from the current signals, some researchers have applied machine learning methods such as the support vector machine (SVM) [28], the least squares support vector machine (LSSVM) [29], the back-propagation neural network (BPNN) [30] and the Kalman filter (KF) [31] to detect arc faults. However, SVM, LSSVM and KF are binary classifiers, so several classifiers must be trained when handling multi-class problems. The BPNN is prone to over-fitting and to falling into local extrema.
Permutation entropy (PE), a method for detecting random fluctuations and dynamic changes in time series [32], has been successfully applied in areas such as mechanical fault diagnosis [33,34] and biomedicine [35,36]. PE measures the complexity of a signal effectively, has excellent anti-noise ability, and can handle non-linear and non-stationary signals. The randomness and complexity of the current signal increase when a series arc fault occurs, so PE can be used to measure the change in the current. Like traditional parameter extraction methods, however, PE is calculated on a single-scale time series. Costa [37] proposed multi-scale analysis to measure the complexity and randomness of a time series at different time scales, so that the important information in the current signal at different time scales can be extracted by calculating the permutation entropy at each scale. However, multi-scale analysis averages the original time series within a window of length τ and then downsamples it by a scale factor of τ, which results in an unstable measurement of the permutation entropy. Azami [38] proposed an improved multi-scale permutation entropy (IMPE) to avoid this problem. Because IMPE performs well in extracting the intrinsic features of a signal, this paper uses IMPE as part of the feature set for detecting series AC arc faults.
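To make the difference between plain multi-scale permutation entropy and IMPE concrete, the following Python sketch computes normalized permutation entropy, the coarse-graining step (window averaging followed by downsampling), and an IMPE-style value obtained by averaging the entropy over the τ possible window offsets. It is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names, the embedding order of 3 and the unit delay are hypothetical choices, and the offset-averaging form is one common reading of [38].

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence, in [0, 1]."""
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for i in range(n_patterns):
        window = [series[i + j * delay] for j in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        counts[pattern] += 1
    probs = [c / n_patterns for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(math.factorial(order))


def coarse_grain(series, scale, offset=0):
    """Average non-overlapping windows of length `scale`, starting at `offset`."""
    n_windows = (len(series) - offset) // scale
    return [
        sum(series[offset + k * scale: offset + (k + 1) * scale]) / scale
        for k in range(n_windows)
    ]


def mpe(series, scale, order=3):
    """Multi-scale PE: a single coarse-grained series (offset 0 only)."""
    return permutation_entropy(coarse_grain(series, scale), order)


def impe(series, scale, order=3):
    """IMPE-style value: mean PE over all `scale` coarse-graining offsets."""
    values = [
        permutation_entropy(coarse_grain(series, scale, offset), order)
        for offset in range(scale)
    ]
    return sum(values) / scale
```

At scale 1 the two quantities coincide with ordinary permutation entropy. At larger scales, `impe` uses every possible starting point of the averaging window, which is the intuition behind the more stable estimate mentioned above.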
The wavelet packet transform (WPT) [39] is an improved form of the wavelet transform (WT) that decomposes both the high-frequency and the low-frequency parts of a signal, giving a finer decomposition than WT. When the signal state changes, the proportion of the total energy held by each wavelet packet node changes [40]. Compared with normal operation, the energy distribution of the signal across frequency bands changes when a series arc fault occurs. The wavelet packet energy-entropy is suitable for measuring the uniformity of the energy distribution of the current signal decomposed by the wavelet packet transform, and it can therefore measure the change of state of the current signal when a series arc fault occurs. In this paper, wavelet packet energy-entropy is therefore used as one of the fault features for detecting series arc faults.
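The idea of measuring how evenly the signal energy is spread over wavelet packet bands can be sketched as follows. This is an illustrative example only: it assumes a Haar wavelet, a Shannon-entropy definition H = −Σ p_i ln p_i with p_i the share of the total energy in band i, and hypothetical function names; the wavelet basis, decomposition depth and exact entropy definition used in this paper are given in Section 3 and may differ. The nodes are returned in the natural order of the decomposition tree, not sorted by frequency.

```python
import math


def haar_split(signal):
    """One Haar step: orthonormal low-pass and high-pass halves of the signal."""
    s = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return low, high


def wavelet_packet_nodes(signal, levels):
    """Split every node at every level, giving 2**levels terminal nodes."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes


def band_energies(nodes):
    """Energy of each terminal node: the sum of its squared coefficients."""
    return [sum(c * c for c in node) for node in nodes]


def energy_entropy(energies):
    """Shannon entropy of the normalized band-energy distribution."""
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

A signal whose energy sits in a single band gives an entropy of zero, while energy spread evenly over all 2^levels bands gives the maximum value ln(2^levels). A shift in this value is what signals a change in the band-energy distribution.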
Random forest (RF) [41] is an effective machine learning method that can be applied to both classification and regression. It is an ensemble learning model built on decision trees that combines bagging with random subspace theory. RF is robust and handles high-dimensional features well, and it has been widely used in fields such as machinery [42], power electronics [43], image processing [44] and biology [45]. There are no reports on the application of RF to series AC arc fault detection. In this paper, an RF-based method for series AC arc fault detection is proposed.
In order to improve the detection efficiency of series arc fault detection for different working states of different load types, this paper proposes a novel arc fault detection method that combines IMPE, WPT, singular value decomposition (SVD) and RF. The experimental results indicate that the proposed method accurately detects the working states of the different loads in the test set, and comparison experiments demonstrate that it outperforms the prior methods. The method also avoids incorrect detection in transient event experiments.
The rest of this paper is organized as follows. In Section 2, we introduce the experimental platform and collect the experimental data. Section 3 uses IMPE, WPT and SVD to acquire high-dimensional fault features. In Section 4, the designed RF model is applied to identify the working states of the different load types, a comparison with prior methods is given, and the reliability of RF is further verified by transient event experiments. Section 5 presents the conclusions.
4. Detection of Series Arc Faults
4.1. RF
RF is an ensemble algorithm composed of multiple decision trees [41], and its detection result is determined by a vote among the trees. Constructing the random forest classifier involves three main steps:
1) Use bootstrap resampling to draw J training sets from the original data set and grow one decision tree on each, giving J decision trees;
2) At each internal node of a tree, randomly select f of the F features (f ≤ F) as candidate features. Following the principle of minimum node impurity, choose the best of the f candidates to split and grow the node;
3) Combine the trained decision trees into the random forest classifier, which classifies new data according to the voting results of the decision trees.
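The three steps above can be sketched in a few lines of Python. This is a deliberately small illustration rather than the classifier used in this paper: it assumes Gini impurity as the node impurity measure, grows each tree until no split reduces the impurity, and uses hypothetical names (`train_forest`, `n_candidates` for f, `n_trees` for J). A production implementation would normally come from an established library.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Best (feature, threshold) among the candidate features, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        # Skip the largest value so the right-hand child is never empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, n_candidates, rng):
    """Step 2: at each node, pick the best split among f random features."""
    if len(set(y)) == 1:
        return y[0]
    feature_ids = rng.sample(range(len(X[0])), n_candidates)
    split = best_split(X, y, feature_ids)
    if split is None:  # no candidate split reduces the impurity
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left], n_candidates, rng),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right], n_candidates, rng),
    }


def predict_tree(node, row):
    """Walk from the root to a leaf; leaves store a class label."""
    while isinstance(node, dict):
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def train_forest(X, y, n_trees, n_candidates, seed=0):
    """Step 1: one bootstrap sample (drawn with replacement) per tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_candidates, rng))
    return forest


def predict_forest(forest, row):
    """Step 3: majority vote over the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

For example, `train_forest(X, y, n_trees=23, n_candidates=3)` would build a forest of 23 trees that each consider 3 randomly chosen features per node; the value 3 is purely illustrative.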
4.2. Analysis of Detection Results
In this paper, the 8-dimensional features obtained from the training set are used to train the RF, and the trained RF is then used to detect arc faults. The prior methods BPNN and LSSVM serve as the comparison.
The number of decision trees in RF affects the detection performance [41]. The more decision trees there are, the greater the diversity of the classifiers. Once the number of decision trees reaches a certain value, the performance tends to be stable, but the training time and the complexity of the algorithm increase. If the number of decision trees is too small, the performance is poor and the detection error is large. The influence of the number of decision trees on the detection accuracy is therefore studied first in order to select an appropriate number. The number of decision trees is varied from 1 to 40 with a step size of 1.
Figure 13 shows how the training time (Figure 13a) and the detection accuracy (Figure 13b) change as the number of decision trees increases. As shown in Figure 13a, the training time grows with the number of decision trees, and the relationship is approximately linear. As shown in Figure 13b, the detection accuracy rises with the number of decision trees in the first half of the curve. When the number of decision trees reaches 23, the detection accuracy is 96.42%. As the number of decision trees increases further, the detection accuracy stabilizes between 96% and 97%, while the training time continues to grow linearly. Weighing detection accuracy against training time, the number of decision trees is therefore set to 23.
The trained RF was used to classify the test set. The label assigned to each load and operating condition and the corresponding detection results are shown in Table 4.
Figure 14 is the confusion matrix of the detection results. In Figure 14, the horizontal and vertical axes represent the actual labels and the detected labels, respectively. For example, when the actual label is 4, 42 of the 50 samples in the test set are detected correctly, while 3 samples and 5 samples are detected as label 7 and label 9, respectively. In other words, 8 samples are detected incorrectly.
The following conclusions can be drawn:
1) The total detection accuracy is 96.71% (677/700), well above 90%, indicating that the trained RF effectively detects arc faults for the different load types;
2) During normal operation, the detection accuracy reached 100% for the induction cooker (label 1), incandescent lamp (label 3), hairdryer (label 5), electric hand drill (label 9), electric oven (label 11) and vacuum cleaner (label 13). Only one sample, from the notebook computer (label 7), was misjudged; it was detected as normal operation of the hairdryer, giving a detection accuracy of 98% for this load. The total detection accuracy over the different loads during normal operation is 99.71%. This shows that the trained RF identifies the different conditions of each load well based on the extracted high-dimensional fault features;
3) Under series arc fault conditions, the detection accuracy is 100% for the electric hand drill (label 10) and vacuum cleaner (label 14). The accuracy is 98%, 84%, 94%, 86% and 94% for the induction cooker, incandescent lamp, hairdryer, notebook computer and electric oven, respectively. The total detection accuracy under series arc fault conditions is 93.71%, which is lower than during normal operation. This is because the arc is an unstable gas discharge phenomenon, and the random fluctuation of the current in the circuit is enhanced under series arc fault conditions, which disturbs feature extraction and state detection.
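The accuracies quoted above follow directly from the confusion matrix. The short Python sketch below shows the arithmetic. It assumes a matrix stored as `cm[actual][detected]` and uses a small made-up 3-class matrix for illustration. The per-label counts in `correct_per_label` are not read from Figure 14: they are reconstructed from the percentages stated in the text under the assumption that the 700 test samples are split evenly (50 per label) and that even labels denote the arc fault state of the load whose normal state carries the preceding odd label.

```python
def overall_accuracy(cm):
    """Fraction of all samples on the diagonal of cm[actual][detected]."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_accuracy(cm):
    """Diagonal entry of each row divided by that row's sample count."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Illustrative 3-class matrix (made-up numbers, 50 samples per class).
toy_cm = [
    [50, 0, 0],
    [1, 49, 0],
    [3, 5, 42],
]

# Correctly detected samples per label, reconstructed from the stated
# percentages assuming 50 test samples per label (an assumption).
correct_per_label = {
    1: 50, 3: 50, 5: 50, 7: 49, 9: 50, 11: 50, 13: 50,   # normal operation
    2: 49, 4: 42, 6: 47, 8: 43, 10: 50, 12: 47, 14: 50,  # series arc fault
}

normal = sum(v for k, v in correct_per_label.items() if k % 2 == 1)
arc = sum(v for k, v in correct_per_label.items() if k % 2 == 0)
total_accuracy = (normal + arc) / 700
```

Under these assumptions the reconstruction gives 349/350 for normal operation, 328/350 for series arc faults and 677/700 overall, which agrees with the reported 99.71%, 93.71% and 96.71%.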
4.3. Comparison with Prior Methods
In order to further verify the performance of the trained RF, this section gives a comparison with the prior methods: BPNN and LSSVM. BPNN and LSSVM are trained and tested using the same dataset as RF.
BPNN [30] is composed of an input layer, a hidden layer and an output layer. Neurons in adjacent layers are connected by weights, and neurons in the same layer are not connected. During training, the back-propagation algorithm adjusts the weights between neurons so that the network output approximates the actual output. In this paper, the features have 8 dimensions and there are 14 states for the 7 load types, so the input layer and output layer have 8 and 14 neurons, respectively. The learning rate and the number of hidden-layer nodes remain to be determined. The number of hidden-layer nodes affects the performance of the neural network. With too few hidden-layer nodes, the network cannot learn the training samples adequately and loses its generalization ability. With too many, the complexity of the network increases, the training time becomes too long and over-fitting occurs. If the learning rate is too low, the training cost increases. If it is too high, the training may fail to converge or even diverge.
SVM is a method based on statistical learning theory that uses structural risk minimization (SRM) to construct the optimal separating hyperplane, mapping linearly inseparable data to a high-dimensional space through an inner-product kernel. LSSVM is an improved form of SVM [29]. It replaces the inequality-constrained quadratic programming problem of SVM with a set of linear equations, which greatly increases the computational efficiency. A single LSSVM solves a binary classification problem, whereas the arc fault detection in this paper is a multi-class problem. Here, one-versus-one coding is used to construct multiple LSSVMs to realize multi-class classification. Since there are 14 working states for the 7 loads in this paper, it is necessary to build 14 × (14 − 1)/2 = 91 LSSVMs. The kernel parameter sig2 and the penalty factor gam of the LSSVM affect the detection performance [29]: sig2 mainly affects the distribution complexity of the sample data in the high-dimensional feature space, while gam mainly adjusts the ratio of the confidence range to the empirical risk in the feature space.
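The one-versus-one construction can be illustrated with a short Python sketch that enumerates the label pairs and combines the binary decisions by voting. The binary classifiers here are stand-ins (simple nearest-center rules on a scalar feature) used only to show the voting logic; they are not LSSVMs, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(labels):
    """All unordered label pairs; n labels give n * (n - 1) / 2 pairs."""
    return list(combinations(labels, 2))


def make_nearest_center_classifier(a, b, centers):
    """Stand-in binary classifier: returns whichever of a, b has the closer center."""
    def classify(sample):
        return a if abs(sample - centers[a]) <= abs(sample - centers[b]) else b
    return classify


def one_vs_one_predict(binary_classifiers, sample):
    """Each pairwise classifier casts one vote; the most voted label wins."""
    votes = Counter(clf(sample) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]


# 14 working states require 91 pairwise classifiers.
n_pairs = len(one_vs_one_pairs(range(1, 15)))

# Toy three-class example with made-up class centers.
centers = {1: 0.0, 2: 5.0, 3: 10.0}
classifiers = {
    (a, b): make_nearest_center_classifier(a, b, centers)
    for a, b in one_vs_one_pairs(centers)
}
predicted = one_vs_one_predict(classifiers, 9.0)
```

In the toy example the sample 9.0 receives two votes for label 3 and one for label 2, so label 3 is returned. With 14 labels, the same scheme evaluates all 91 pairwise classifiers for each sample.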
To avoid selecting the parameters blindly, cross-validation is adopted to optimize the parameters of BPNN and LSSVM. The number of hidden-layer nodes and the learning rate of BPNN are 190 and 0.16, respectively. Due to the limited space in this paper, the parameters of the 91 LSSVMs are not listed.
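Parameter selection by cross-validation can be sketched as below. The paper does not state the number of folds or the candidate grid, so both are assumptions here, as are the function names; `fit` and `score` are user-supplied callables, and the toy threshold classifier at the end exists only to make the example runnable.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx); fold i takes every k-th sample from i."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def grid_search(candidates, X, y, k, fit, score):
    """Return the candidate with the highest mean validation score."""
    best_params, best_score = None, float("-inf")
    for params in candidates:
        scores = []
        for train_idx, test_idx in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], params)
            scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy usage: the only "parameter" is a decision threshold on a scalar feature.
def fit_threshold(X_train, y_train, threshold):
    return threshold  # nothing to learn in this toy model


def accuracy(threshold, X_test, y_test):
    hits = sum(int(x > threshold) == label for x, label in zip(X_test, y_test))
    return hits / len(X_test)


X_toy = list(range(10))
y_toy = [0] * 5 + [1] * 5
best = grid_search([2.5, 4.5, 7.5], X_toy, y_toy, k=5, fit=fit_threshold, score=accuracy)
```

For BPNN the candidates would be pairs of hidden-layer size and learning rate, and for each LSSVM pairs of gam and sig2; the selection logic stays the same.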
Table 5 shows the detection results of the different classifiers. Figure 15 and Figure 16 show the detection confusion matrices of LSSVM and BPNN, respectively. The following can be observed:
1) The total detection accuracy of BPNN and LSSVM is 88.71% and 92.43%, respectively, and the total detection accuracy of RF is 96.71%. RF has the best detection performance, followed by LSSVM, and BPNN has the worst detection performance;
2) During normal operation, BPNN misdetected 19 samples, for a detection accuracy of 94.57%; LSSVM misdetected 17 samples, for an accuracy of 95.14%; and RF misdetected 1 sample, for a detection accuracy of 99.71%. Compared with LSSVM and BPNN, RF is more effective in avoiding false alarms (normal operation misreported as an arc fault);
3) Under series arc fault conditions, BPNN misdetected 60 samples, for a detection accuracy of 82.86%; LSSVM misdetected 36 samples, for a detection accuracy of 89.71%; and RF misdetected 22 samples, for a detection accuracy of 93.71%. The detection accuracy of all three classifiers is lower than during normal operation, but RF still has the highest detection accuracy, while the LSSVM and BPNN accuracies are below 90%.
Based on the above analysis, RF outperforms the prior methods LSSVM and BPNN in arc fault detection.
4.4. Transient Event Experiments
The transient events in this section include the start and stop operations of the different load types. For the transient events beyond the operating conditions in Table 5, we train the RF on the training set while ignoring the load type. A sample detected as normal operation counts as correctly detected, and a sample detected as a series arc fault counts as erroneously detected. We record 20 samples of each load type during the start or stop operations, and each recorded sample is 0.01 s long.
Figure 17 shows the waveforms of the different load types during start and stop operations, and the detection results are shown in Table 6. Only 1 sample, from the induction cooker during a stop operation, was erroneously detected, and the total detection accuracy was 99.28%. The randomness and complexity of the current remain at the normal level whether the loads are starting or stopping. The proposed series arc fault detection method therefore performs excellently in avoiding incorrect detection of the different load types during start and stop operations, which supports the reliability of the method.
5. Conclusions
Arc faults are an important cause of electrical fires and pose a great challenge to residential electrical safety. In this paper, a novel arc fault detection method is designed to accurately detect the arc faults of various load types. The main content is summarized as follows:
1) The characteristics of the arc are mainly reflected in the high-frequency part of the current signal. Based on the matrix construction method, high-pass filtering of the current signal is realized using SVD. The matrix construction method is simple and effective and requires no predefined parameters;
2) This paper proposes the application of IMPE to arc fault feature extraction for the first time. The high-dimensional fault features consist of IMPE, wavelet packet energy and wavelet packet energy-entropy, which reflect the complexity of the signal and its distribution characteristics in different frequency bands. Additionally, the characteristics of the different features during normal operation and series arc fault conditions are analyzed in detail;
3) This paper presents the application of RF to arc fault detection for the first time. Based on the extracted high-dimensional fault features, the trained RF effectively detects the normal operation and series arc fault conditions of the different load types. Comparative experiments demonstrate the effectiveness of the designed RF, which performs better in arc fault detection than BPNN and LSSVM. Whether the loads are starting or stopping, the proposed method effectively avoids incorrect detections.