**1. Introduction**

As climate change driven by global warming becomes more frequent [1], environmental problems are attracting growing attention. To reduce carbon dioxide emissions, the use of fossil energy is being limited and green energy is increasingly adopted. Solar energy is a green energy source that adds no pollution to the environment. Photovoltaic (PV) systems convert solar energy into electric energy [2]; they play an essential role in distributed generation systems [3] and are widely used in households and other places where solar energy is plentiful [4]. However, arc faults on the DC side of a PV system can cause severe electrical fires: arc temperatures above 5000 °C may ignite surrounding combustible material [5].

**Citation:** Wang, Y.; Bai, C.; Qian, X.; Liu, W.; Zhu, C.; Ge, L. A DC Series Arc Fault Detection Method Based on a Lightweight Convolutional Neural Network Used in Photovoltaic System. *Energies* **2022**, *15*, 2877. https://doi.org/10.3390/en15082877

Academic Editor: Luis Hernández-Callejo

Received: 1 April 2022 Accepted: 12 April 2022 Published: 14 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Due to the harmfulness of the DC arc faults in PV systems, in 2011 the National Electrical Code (NEC) required that roof PV systems with DC voltage higher than 80 V must be equipped with series DC arc fault circuit breakers. In 2014, this requirement was applied to all types of photovoltaic systems to reduce the fire hazard caused by the DC arc fault [6]. Moreover, the location of arc faults is stochastic, and arc current may be disturbed by high-frequency noise due to the pulse width modulated (PWM) control of the PV inverter [7], which makes it challenging to detect DC arc faults.

DC arc faults in PV systems can be categorized as parallel arc faults and series arc faults [8,9]. Parallel arc faults are generally caused by line-to-line and line-to-ground short circuit faults. The current amplitude of parallel arc faults can be larger than the current amplitude in a normal state, and can be easily detected by current changes [10]. Poor connection of wires or insulation deterioration can result in a series arc fault. Conversely, the current for a series arc fault does not increase due to the limitations of the series load, and the current is more likely to be affected or even masked by the noise from the series load, which makes the detection of series arc faults more challenging than it is for parallel arc faults [11,12]. This research focused on series arc faults.

In order to solve the problem of DC arc fault detection, many scholars have proposed different detection methods. A series of physical characteristics occurs in the process of arc faults, such as arc light and electromagnetic radiation. Murakami et al. [13] used a high-speed camera to observe the light emission of arcs. Yue et al. [14] detected arc fault using intermittent discharges or sparks occurring before series DC arc faults. In [6,15], a method based on high-frequency components of electromagnetic radiation was used to detect DC arc faults. Using physical characteristics to detect an arc fault is not complicated. However, the arc location in a PV system is stochastic, so it is difficult to judge the arc location and detect arc faults accurately.

Since arc current is independent of arc fault location, it is the most common parameter for arc fault detection. Arc current usually has the characteristics of transient and stochastic changes, which can be detected by different time domain methods, frequency domain methods, and time-frequency domain methods. In [16,17], circuit current data were used to identify arc faults by the time-domain method. Park et al. [18] used the time domain method to detect an arc fault initially, then used the frequency domain method to ensure the accuracy of the detection. Gu et al. [19] proposed a method based on fast Fourier transform (FFT) to detect arc faults. However, FFT does not reflect the time domain information, so it is impossible to determine the exact time of arc occurrence [20]. Liu et al. [21] proposed a method combining the time domain and the time-frequency domain to analyze circuit current and PV-side voltage, which improved the anti-interference ability of arc fault detection. Wang et al. [22] and Chen et al. [23] used wavelet transform to analyze arc signals with multiple resolutions in the time-frequency domain. These methods are superior to the methods of detecting the physical characteristics of DC arc faults and are not affected by the location of the arc fault. They are simple and easy to realize, at low cost. However, the thresholds used to judge normal states and arc fault states are set artificially and need to be adjusted, due to different PV systems' current and voltage complexities [24]. Therefore, more efficient arc fault detection methods are required.

In recent years, some scholars have explored artificial intelligence methods for arc fault detection. Neural networks have become the first choice because of their robust feature learning and detail recognition ability. Li et al. [25] proposed an arc fault detection method based on a back propagation (BP) neural network, with an accuracy of 95.23%. Yang et al. [26] proposed a temporal domain visualization convolutional neural network (TDV-CNN) method, in which the current data were filtered and converted into gray images as the input of the CNN; the arc fault detection accuracy was 98.7%. Lu et al. [27] proposed domain adaptation combined with a deep convolutional generative adversarial network (DA-DCGAN) to detect DC arc faults, with the PV loop current data converted into a 2D matrix as the input of DA-DCGAN. Pedersen et al. [28] used a radial basis network to detect DC arc faults; the network's inputs were vectors, which simplified the processing steps of the input data. Other neural networks can also be applied to arc fault detection. Neural network methods have high accuracy and do not require manually set thresholds. To further improve the accuracy of arc fault detection, the depth, width, and input resolution of the network need to be scaled up. However, scaling all three together dramatically increases the demand for computing resources. Therefore, most existing methods scale only one of them to improve accuracy.

Although the existing AI-based arc fault detection methods have achieved good accuracy, higher than 95%, the accuracy needs to be further improved to reduce fire risk. Moreover, existing methods have not considered high current values or the influence of normal operations, such as PV inverter startup and irradiance mutation, on arc fault detection; therefore, the robustness of the methods needs to be improved. Furthermore, existing models have a very large number of parameters, and their computational burden is too heavy for industrial embedded microcontrollers.

In this paper, we propose a lightweight convolutional neural network-based method for detecting DC series arc faults in PV systems.

The main contributions of this paper are as follows:


#### **2. Data Collection and Analysis**

#### *2.1. Arc Fault Experiment Platform*

Since the actual DC arc faults in PV systems have stochastic characteristics, it is challenging to directly capture a large amount of DC arc current data for the arc fault detection algorithm. Therefore, a DC arc fault experimental platform is established to generate DC series arc faults under different working conditions for collecting current data.

The experimental platform mainly includes a PV string, an arc fault generator, signal acquisition devices, and a PV inverter. The UL 1699B standard includes four application examples for different installation positions of the arc fault detection device (AFDD). The first case, in which the AFDD is installed within the inverter, was used in this experimental platform. A GOODWE GW36K-MT three-phase inverter was used as the load, and the AFDD was installed within the inverter. The UL 1699B standard indicates that a PV simulator can replace the actual PV string. Therefore, an ITECH IT6018C PV simulator replaced the PV string to make the experiments more convenient and diversified. Its voltage range was 0–1500 V, and its current range was 0–40 A. The ITECH IT6018C PV simulator can reproduce the I-V curve under various weather conditions, such as different irradiance levels. In accordance with the UL 1699B standard, two circuit forms were used: (1) the circuit of one PV string for a centralized power inverter; (2) the circuit of two PV strings for a centralized power inverter. Figure 1 shows the circuit of the two PV strings for a centralized power inverter.


**Figure 1.** DC series arc fault experimental platform.

Arc faults at different locations have different effects on the DC side current of PV systems. In accordance with the UL 1699B standard, an arc generator was added to the circuit to simulate arc faults at the positions marked 1, 2, and 3 in Figure 1. These positions are between the PV strings, at the end of the PV strings, and at the start of the PV strings. The arc generator was integrated into the system to generate a series arc fault.

To simulate the parasitic capacitance and inductance of the long line (80 m) between the AFDD and the PV string in PV systems, an impedance network module was added to the circuit to reproduce the high-frequency characteristics of the PV system. The impedance network parameters shown in Figure 1 were set in accordance with the UL 1699B standard. The arc fault is most severe when C1 is set to 300 nF or 20 µF, so both values had to be tested. The standard stipulates that a decoupling network should be added in front of the impedance network to control the output capacitance of the PV simulator and simulate the DC characteristics of the PV system. The decoupling network is shown in Figure 2. According to the UL 1699B standard, when *I*<sub>mpp</sub> = 3 A, R3 = R4 = 27 Ω, and when *I*<sub>mpp</sub> = 16 A, R3 = R4 = 4.5 Ω. According to the IEC 63027 standard, when *I*<sub>mpp</sub> = 25 A, R3 = R4 = 2.5 Ω.

**Figure 2.** Decoupling network.

#### *2.2. Different Operating Conditions and Power Spectra of Current Data*

#### 2.2.1. Different Operating Conditions in Experiments

Different operating conditions were used for data collection to verify the generalization ability of the algorithm. In this experiment, we selected three different tests from the UL 1699B and IEC 63027 standards, as shown in Table 1. To simulate the worst arc fault situation, the impedance network component C1 was set to two values for testing, 300 nF and 20 µF. Each test in Table 1 was performed at the three arc fault locations shown in Figure 1 to verify the reliability of the algorithm. The minimum *I*<sub>arc</sub> represents a realistic arc event with one or two strings at low irradiance; *I*<sub>mpp</sub> and *V*<sub>mpp</sub> are the current and voltage at the maximum power point, respectively, and *V*<sub>oc</sub> is the open-circuit voltage. These four parameters can be set on the PV simulator, and the gap and arcing speed can be set by a stepping motor controller.

In this experiment, we added two situations, as shown in Table 1: (1) PV inverter startup, and (2) irradiance mutation, which causes an abrupt change in current. These two situations, which belong to the normal state, were tested to verify the robustness of the algorithm.


**Table 1.** Different operating conditions of DC series arc fault experiments.

<sup>1</sup> The symbol "/" indicates that the test excludes this variable.

The DC arc current in PV systems shows stochastic high-frequency burrs in the time domain, while in the frequency domain its spectrum amplitude increases slightly in a specific frequency band (such as 40–100 kHz). High-frequency noise in a similar frequency band is generated when the PV inverter is in the PWM state, and its spectrum amplitude is the same as or even higher than that of the arc current signal. Therefore, it is difficult to distinguish between the normal state and the arc fault state from the amplitude difference alone. However, the PWM noise generated by power electronic devices is regular because of periodic modulation and system inertia, so the current signals under different working conditions can be distinguished by analyzing the power spectrum.

The power spectrum describes a stochastic signal by defining the power of the current signal as a function of frequency. It is sensitive to changes in the signal and reflects the underlying pattern of those changes. The process of computing the power spectrum is called power spectrum estimation. Modern power spectrum estimation methods mainly include parametric model spectrum estimation and nonparametric model spectrum estimation. Compared with parametric model spectrum estimation, nonparametric model spectrum estimation has better spectrum estimation performance. However, it requires a large amount of calculation and high model complexity, which makes it hard to meet the real-time requirements of DC arc fault detection in practical applications. Therefore, the parametric model power spectrum estimation method, which requires less calculation, is selected. Common parametric models for power spectrum estimation include the autoregressive (AR) model and the autoregressive moving average (ARMA) model.

#### 2.2.2. AR Model

In a *p*-th order AR model, the sample *x*(*n*) of the time series is a linear combination of the previous *p* samples plus white noise:

$$\mathbf{x}(n) = -\sum\_{m=1}^{p} a\_{\mathbf{m}} \mathbf{x}(n-m) + w(n) \tag{1}$$

In Equation (1), *a*<sub>m</sub> is the coefficient of the corresponding time series sample, and *w* is Gaussian white noise with mean 0 and variance *σ*<sup>2</sup>.

The system transfer function expression of the *p*-order AR model is

$$H(z) = \frac{1}{1 + \sum\_{m=1}^{p} a\_m z^{-m}} \tag{2}$$

According to Equation (2), the AR model is an all-pole model, which can directly reflect the peak distribution in the power spectrum. Evaluating the transfer function in Equation (2) on the unit circle, *z* = e<sup>j*ω*</sup>, gives the power spectrum shown in Equation (3):

$$\tilde{S}\_{\mathbf{x}}(\omega) = \frac{k^2}{\left| 1 + \sum\_{m=1}^{p} a\_{\mathbf{m}} e^{-j\omega m} \right|^2} \tag{3}$$
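As a minimal sketch of how Equation (3) can be evaluated numerically, the following pure-Python snippet computes the AR power spectrum on a frequency grid. The function name `ar_power_spectrum`, the argument `noise_var` (standing in for the constant in the numerator of Equation (3)), and the AR(2) test coefficients are illustrative assumptions, not values from the experiments.

```python
import cmath
import math


def ar_power_spectrum(a, noise_var, n_freqs=256):
    """Evaluate Equation (3) at n_freqs angular frequencies in [0, pi).

    a         -- AR coefficients [a_1, ..., a_p], sign convention of Eq. (1)
    noise_var -- numerator constant of Eq. (3) (assumed white-noise variance)
    """
    spectrum = []
    for i in range(n_freqs):
        omega = math.pi * i / n_freqs
        # Denominator polynomial 1 + sum_m a_m * exp(-j * omega * m)
        denom = 1 + sum(a_m * cmath.exp(-1j * omega * m)
                        for m, a_m in enumerate(a, start=1))
        spectrum.append(noise_var / abs(denom) ** 2)
    return spectrum


# Hypothetical AR(2) model with a pole pair at radius r and angle theta:
# 1 + a_1 z^-1 + a_2 z^-2 with a_1 = -2 r cos(theta), a_2 = r^2
r, theta = 0.9, math.pi / 4
a = [-2 * r * math.cos(theta), r ** 2]
psd = ar_power_spectrum(a, noise_var=1.0)
peak_omega = math.pi * psd.index(max(psd)) / len(psd)
print(f"pole angle {theta:.3f} rad, spectral peak near {peak_omega:.3f} rad")
```

The spectral peak sits close to the pole angle, which illustrates why an all-pole model represents spectral peaks directly.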

#### 2.2.3. ARMA Model

The time series of the (*p*, *q*)-order ARMA model is calculated as

$$x(n) = \sum\_{i=0}^{q} b\_i w(n-i) - \sum\_{m=1}^{p} a\_m x(n-m) \tag{4}$$

According to Equation (4), the system transfer function of the ARMA model, with *a*<sub>0</sub> = 1, is

$$H(z) = \frac{\sum\_{i=0}^{q} b\_i z^{-i}}{\sum\_{m=0}^{p} a\_m z^{-m}}\tag{5}$$

According to Equation (5), the ARMA model is a pole-zero model, which can directly reflect the peak and valley distribution in the power spectrum. Evaluating the transfer function in Equation (5) on the unit circle, *z* = e<sup>j*ω*</sup>, gives the power spectrum shown in Equation (6):

$$\tilde{S}\_{\mathbf{x}}(\omega) = \frac{k^2 \left| \sum\_{i=0}^{q} b\_i e^{-j\omega i} \right|^2}{\left| \sum\_{m=0}^{p} a\_m e^{-j\omega m} \right|^2} \tag{6}$$
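To make the peak-and-valley behaviour of Equation (6) concrete, here is a small sketch that evaluates the ARMA power spectrum at a single frequency. The function name, the argument `noise_var` (standing in for the numerator constant of Equation (6)), and the example coefficients are hypothetical; the convention *a*<sub>0</sub> = 1 is assumed.

```python
import cmath
import math


def arma_power_spectrum(b, a, noise_var, omega):
    """Evaluate Equation (6) at one angular frequency omega (rad/sample).

    b -- MA coefficients [b_0, ..., b_q]
    a -- AR coefficients [a_0, ..., a_p], with a_0 = 1 (assumed convention)
    """
    num = sum(b_i * cmath.exp(-1j * omega * i) for i, b_i in enumerate(b))
    den = sum(a_m * cmath.exp(-1j * omega * m) for m, a_m in enumerate(a))
    return noise_var * abs(num) ** 2 / abs(den) ** 2


# With q = 0 the expression reduces to the all-pole AR form of Equation (3).
ar_only = arma_power_spectrum([1.0], [1.0, -0.5], 1.0, omega=0.0)

# A zero on the unit circle at omega = pi produces a valley there.
valley = arma_power_spectrum([1.0, 1.0], [1.0, -0.5], 1.0, omega=math.pi)
print(ar_only, valley)
```

The extra numerator sum is what the AR model avoids, which is the source of its lower computational cost.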

The AR model has a simpler structure and fewer calculations than the ARMA model. Therefore, the AR model is selected as the power spectrum estimation model. After the AR model of the DC current signal is established, the model parameters need to be calculated.

#### 2.2.4. The Selection of Optimal Parameters in the AR Model

Equation (3) shows that the prediction accuracy of the power spectrum depends on the coefficients *a*<sub>m</sub> and the order *p*, so a suitable method for calculating the model parameters must be chosen. Commonly used methods include the Levinson-Durbin algorithm and the Burg algorithm. In this paper, the Burg algorithm was selected to calculate the parameters of the AR model of the PV system current signal, because it minimizes the sum of the forward and backward mean square prediction errors. The calculation process is as follows.

Given *N* samples *x*(1), *x*(2), . . . , *x*(*N*), initialize the forward prediction error *e*<sup>f</sup> and the backward prediction error *e*<sup>b</sup>, where *n* = 1, 2, 3, . . . , *N*.

$$\begin{cases} \begin{aligned} \boldsymbol{e}\_{0}^{\mathbf{f}}(n) &= \boldsymbol{e}\_{0}^{\mathbf{b}}(n) = \mathbf{x}(n) \\ \boldsymbol{e}\_{\mathbf{m}}^{\mathbf{f}}(n) &= \boldsymbol{e}\_{\mathbf{m}-1}^{\mathbf{f}}(n) + k\_{\mathbf{m}} \boldsymbol{e}\_{\mathbf{m}-1}^{\mathbf{b}}(n-1) \\ \boldsymbol{e}\_{\mathbf{m}}^{\mathbf{b}}(n) &= \boldsymbol{e}\_{\mathbf{m}-1}^{\mathbf{b}}(n-1) + k\_{\mathbf{m}} \boldsymbol{e}\_{\mathbf{m}-1}^{\mathbf{f}}(n) \end{aligned} \end{cases} \tag{7}$$

In Equation (7), *k*<sub>m</sub> is the reflection coefficient. The forward and backward prediction error power *ε* is defined as:

$$\boldsymbol{\varepsilon} = \sum\_{n=m}^{N-1} \left[ \boldsymbol{e}\_{\rm m}^{\rm f} \left( n \right)^{2} + \boldsymbol{e}\_{\rm m}^{\rm b} \left( n \right)^{2} \right] \tag{8}$$

To minimize the error power *ε*, set ∂*ε*/∂*k*<sub>m</sub> = 0; the reflection coefficient *k*<sub>m</sub> then follows from Equations (7) and (8):

$$k\_{\mathbf{m}} = -\frac{2\sum\_{n=m}^{N-1} \left[e\_{\mathbf{m}-1}^{\mathbf{f}}(n)\right] \left[e\_{\mathbf{m}-1}^{\mathbf{b}}(n-1)\right]}{\sum\_{n=m}^{N-1} \left\{\left[e\_{\mathbf{m}-1}^{\mathbf{f}}(n)\right]^2 + \left[e\_{\mathbf{m}-1}^{\mathbf{b}}(n-1)\right]^2\right\}}\tag{9}$$
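For readers who want the intermediate step, Equation (9) can be recovered by substituting Equation (7) into Equation (8). The shorthand *f*<sub>n</sub> for the forward error of order *m* − 1 at sample *n* and *b*<sub>n</sub> for the backward error of order *m* − 1 at sample *n* − 1 is introduced here only for brevity, and every sum runs over *n* = *m*, . . . , *N* − 1:

$$
\begin{aligned}
\varepsilon &= \sum \left[ \left( f\_n + k\_{\mathrm{m}} b\_n \right)^2 + \left( b\_n + k\_{\mathrm{m}} f\_n \right)^2 \right] \\
\frac{\partial \varepsilon}{\partial k\_{\mathrm{m}}} &= 4 \sum f\_n b\_n + 2 k\_{\mathrm{m}} \sum \left( f\_n^2 + b\_n^2 \right) = 0 \\
k\_{\mathrm{m}} &= -\frac{2 \sum f\_n b\_n}{\sum \left( f\_n^2 + b\_n^2 \right)}
\end{aligned}
$$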

In Equation (9), *m* = 1, 2, 3, . . . , *p*. Since the reflection coefficient *k*<sub>m</sub> is an unbiased estimate of the partial correlation coefficient, the autocovariance function *R*<sub>xx</sub> of orders 0 to *p*, which is related to the model parameters, can be derived from the Yule-Walker equations:

$$R\_{xx}(m) = \begin{cases} -\sum\_{l=1}^{p} a\_{\mathrm{m}}(l) R\_{xx}(m-l), & m > 0\\ -\sum\_{l=1}^{p} a\_{\mathrm{m}}(l) R\_{xx}(m-l) + \sigma\_{p}^{2}, & m = 0\\ R\_{xx}(-m), & m < 0 \end{cases} \tag{10}$$

In Equation (10), *l* = 1, 2, . . . , *m* − 1. The following Equation (11) can be obtained by cycle calculation.

$$\begin{aligned} \Delta\_{\mathbf{m}} &= R\_{\mathbf{x}\mathbf{x}}(m) + \sum\_{l=1}^{m-1} a\_{\mathbf{m}-1}(l) R\_{\mathbf{x}\mathbf{x}}(m-l) \\ c\_{\mathbf{m}} &= -\Delta\_{\mathbf{m}} / \sigma\_{\mathbf{m}-1} \\ a\_{\mathbf{m}}(l) &= a\_{\mathbf{m}-1}(l) + c\_{\mathbf{m}} a\_{\mathbf{m}-1}(m-l), l = 1, 2, \dots, m-1 \end{aligned} \tag{11}$$

 

The reflection coefficient *k*<sub>m</sub> from Equation (9) can be used as the estimate of *c*<sub>m</sub>. Substituting it into Equation (11) gives the Levinson recursion in Equation (12), from which the AR model coefficients *a*<sub>m</sub> are calculated:

$$\begin{cases} a\_{\mathbf{m}}(m) = k\_{\mathbf{m}} \\ a\_{\mathbf{m}}(l) = a\_{\mathbf{m}-1}(l) + k\_{\mathbf{m}} a\_{\mathbf{m}-1}(m-l) \end{cases} \tag{12}$$

In Equation (12), *l* = 1, 2, . . . , *m* − 1. After each pass, increment *m* by 1 and repeat the above steps until *m* = *p*.
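The recursion in Equations (7), (9), and (12) can be sketched in pure Python as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the function name `burg`, the error-power update by the current-order reflection coefficient (the standard Burg form), and the synthetic AR(2) test signal are all introduced here for demonstration.

```python
import random


def burg(x, order):
    """Estimate AR coefficients with the Burg recursion, Eqs. (7), (9), (12).

    Returns (a, sigma2, k_list): a = [a_1, ..., a_p] in the sign convention
    of Eq. (1), sigma2 = final prediction error power, k_list = [k_1..k_p].
    """
    n = len(x)
    f = list(x)  # forward prediction errors, initialised as in Eq. (7)
    b = list(x)  # backward prediction errors
    a, k_list = [], []
    sigma2 = sum(v * v for v in x) / n  # zero-order error power
    for m in range(1, order + 1):
        # Reflection coefficient, Eq. (9): pair f(n) with b(n - 1)
        pairs = list(zip(f[m:], b[m - 1:n - 1]))
        num = sum(fv * bv for fv, bv in pairs)
        den = sum(fv * fv + bv * bv for fv, bv in pairs)
        k = -2.0 * num / den
        # Error update, Eq. (7); iterate downwards so that b[i - 1] still
        # holds the order-(m - 1) value when it is read
        for i in range(n - 1, m - 1, -1):
            f_old = f[i]
            f[i] = f_old + k * b[i - 1]
            b[i] = b[i - 1] + k * f_old
        # Levinson update, Eq. (12)
        a = [a[l] + k * a[m - 2 - l] for l in range(m - 1)] + [k]
        sigma2 *= 1.0 - k * k  # standard Burg error-power update (assumed)
        k_list.append(k)
    return a, sigma2, k_list


# Synthetic check: simulate Eq. (1) with known (hypothetical) coefficients.
random.seed(0)
true_a = [-1.2, 0.8]
x = [0.0, 0.0]
for _ in range(5200):
    w = random.gauss(0.0, 1.0)
    x.append(-true_a[0] * x[-1] - true_a[1] * x[-2] + w)
x = x[200:]  # discard the start-up transient
a_hat, sigma2_hat, _ = burg(x, order=2)
print(a_hat, sigma2_hat)
```

On a signal generated from a known AR model, the estimated coefficients should land close to the true ones, which gives a quick sanity check before the routine is applied to measured current data.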

After the Burg algorithm has been used to obtain the AR model coefficients *a*<sub>m</sub>, the optimal order *p* of the model must be determined. If the order is not selected correctly, the estimation results will be inconsistent with reality. The Akaike information criterion (AIC) gives an asymptotically unbiased estimate of the discrepancy between the fitted AR model and the true process, so it can be used to determine the best order when the true model is unknown. The smaller the AIC value, the better the model fit.

The general form of the AIC criterion is:

$$\text{AIC} = -2\ln \text{L} + 2k \tag{13}$$

where *k* is the number of parameters and L is the likelihood function. Assuming that the number of current samples is *N* and *SSR* is the sum of squares of residuals, Equation (13) can be converted to:

$$\text{AIC} = N \ln\left(\frac{SSR}{N}\right) + 2k \tag{14}$$

Equation (14) is applied to the order determination of the AR model: *k* is the order *p*, *N* is the number of samples, and *SSR*/*N* is the variance of the prediction error of the AR model, which can be written as *σ*<sub>p</sub><sup>2</sup>. Dividing Equation (14) by *N* then gives:

$$\text{AIC}(p) = \ln \sigma\_p^2 + \frac{2p}{N} \tag{15}$$

In Equation (15), *σ*<sub>p</sub><sup>2</sup> can be calculated from the reflection coefficient obtained by Equation (9) in the Burg algorithm:

$$
\sigma\_p^2 = \left(1 - \left| k\_{p-1} \right|^2\right) \sigma\_{p-1}^2 \tag{16}
$$
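The order selection based on Equation (15) can be sketched as below. The snippet is self-contained and hedged: the helper names are hypothetical, the test signal is a synthetic AR(2) process rather than measured current, and the error power is updated with the reflection coefficient of the current order, as in the standard Burg recursion, which is an assumption of this sketch about the indexing.

```python
import math
import random


def burg_error_powers(x, max_order):
    """Return [sigma_1^2, ..., sigma_max^2] from the Burg recursion."""
    n = len(x)
    f, b = list(x), list(x)
    sigma2 = sum(v * v for v in x) / n  # zero-order error power
    powers = []
    for m in range(1, max_order + 1):
        pairs = list(zip(f[m:], b[m - 1:n - 1]))
        num = sum(fv * bv for fv, bv in pairs)
        den = sum(fv * fv + bv * bv for fv, bv in pairs)
        k = -2.0 * num / den  # reflection coefficient, Eq. (9)
        for i in range(n - 1, m - 1, -1):
            f_old = f[i]
            f[i] = f_old + k * b[i - 1]
            b[i] = b[i - 1] + k * f_old
        sigma2 *= 1.0 - k * k  # assumed standard error-power update
        powers.append(sigma2)
    return powers


def aic_by_order(x, max_order):
    """AIC(p) = ln(sigma_p^2) + 2p / N for p = 1..max_order, Eq. (15)."""
    n = len(x)
    powers = burg_error_powers(x, max_order)
    return {p: math.log(s2) + 2.0 * p / n
            for p, s2 in enumerate(powers, start=1)}


# Synthetic AR(2) signal with 2500 samples (one 10 ms window at 250 kHz).
random.seed(1)
x = [0.0, 0.0]
for _ in range(2700):
    x.append(1.2 * x[-1] - 0.8 * x[-2] + random.gauss(0.0, 1.0))
x = x[202:]
aic = aic_by_order(x, max_order=20)
best_order = min(aic, key=aic.get)
print(best_order)
```

Because the error power never increases with the order, the penalty term 2*p*/*N* is what eventually makes AIC rise, so the minimum marks the point where extra coefficients stop paying for themselves.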

To obtain the optimal order of the AR model, the arc fault experimental platform described above was used to collect eighteen groups of DC side current data in tests no. 1, no. 2, and no. 3 at a 250 kHz sampling rate. The arc current is disordered and stochastic, which affects the calculation result, so the current in the normal state was selected for calculating and analyzing the AIC values. The time window of each group of data was 10 ms. Thus, each time window contained 2500 samples (250 kHz × 10 ms), which is enough for a valid calculation without being excessively large. The order *p* ranged from 1 to 20, and the AIC values corresponding to the different orders were obtained according to Equations (15) and (16). The results are shown in Figure 3.

Figure 3 shows that the AIC value of the current data was smallest at order *p* = 12. As the order *p* increased further, the AIC value changed only slightly, with a mild upward trend. Therefore, the optimal order of the DC current signal AR model was *p* = 12.

The Burg algorithm was used to solve for the coefficients of the 12th-order AR model of the current signal, and the transfer function is:

$$H(z) = \frac{1}{1 + \sum\_{m=1}^{12} a\_m z^{-m}} \tag{17}$$

**Figure 3.** AIC values of current data.

According to Equation (17), the power spectrum estimation expression of the 12th-order AR model of the current signal is:

$$\tilde{S}\_{\mathbf{x}}(\omega) = \frac{k^2}{\left| 1 + \sum\_{i=1}^{12} a\_i e^{-j\omega i} \right|^2} \tag{18}$$

When the AR model is used to calculate the power spectrum of the PV system current data, a suitable time window scale must be selected to enlarge the difference between the power spectra of the current signal in different time windows. This difference reflects the changing characteristics of the arc current, which differ significantly from the normal state. Since the correlation coefficient reflects the relationship between two variables, the correlation coefficients of the power spectra under different time windows were calculated separately for three groups of current data from test no. 1, and the variance of these coefficients was then calculated. The characteristics of tests no. 2 and no. 3 were similar to those of test no. 1. The larger the variance, the more pronounced the power spectrum difference between time windows. Table 2 shows that with the 10 ms and 17 ms time windows, the variance of the correlation coefficient in the arc fault state was considerable (0.24262 and 0.37401, respectively).
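The window-selection criterion described above can be sketched as follows. The text does not specify which windows are paired or which variance estimator is used, so this sketch assumes Pearson correlation between consecutive windows and the population variance; the function names and the toy spectra are hypothetical.

```python
import math
import statistics


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    s_uv = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    s_uu = sum((p - mean_u) ** 2 for p in u)
    s_vv = sum((q - mean_v) ** 2 for q in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def correlation_variance(spectra):
    """Variance of the correlation between consecutive window spectra.

    spectra -- list of power spectra, one per time window (assumed pairing:
               window i is compared with window i + 1)
    """
    coeffs = [pearson(s0, s1) for s0, s1 in zip(spectra, spectra[1:])]
    return statistics.pvariance(coeffs)


# Toy spectra: a steady shape gives zero variance, a changing one does not.
steady = correlation_variance([[1, 2, 3], [2, 4, 6], [3, 6, 9]])
changing = correlation_variance([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
print(steady, changing)
```

A large variance means the spectral shape keeps changing from window to window, which is the behaviour the text associates with the arc fault state.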

**Table 2.** The variance of correlation coefficient of arc fault states' and normal states' power spectrum values under different time window scales.


In contrast, the variance of the normal state correlation coefficient was much smaller than that of the arc fault state. However, the 17 ms time window was too long to process data quickly. Therefore, 10 ms was selected as the time window scale for calculating the DC current power spectrum. 20 0.21711 0.00806

After the time window scale was determined, the power spectrum of current signals was drawn for comparative analysis. Six groups of current data were selected by tests no. 1, no. 2, and no. 3, the 12-order AR model was established, and the power spectrum was calculated. One of the results of test no. 1 is shown in Figure 4. After the time window scale was determined, the power spectrum of current signals was drawn for comparative analysis. Six groups of current data were selected by tests no. 1, no. 2, and no. 3, the 12-order AR model was established, and the power spectrum was calculated. One of the results of test no. 1 is shown in Figure 4.

*Energies* **2022**, *15*, x FOR PEER REVIEW 10 of 21

$$S(\omega) = \frac{\sigma_k^2}{\left|1 + \sum\limits_{i=1}^{12} a_i e^{-\mathrm{j}\omega i}\right|^2} \tag{18}$$

When using the AR model to calculate the power spectrum of the PV system current data, it is necessary to select a suitable time window scale to enlarge the difference in the power spectrum of the current signal under different time windows. In particular, the difference can reflect the changing characteristics in arc current, which is significantly different from the normal state. Since the correlation coefficient can reflect the relationship between two variables, the correlation coefficients of the power spectrum under different time windows were calculated by three groups of current data in test no. 1 separately. The characteristics of tests no. 2 and no. 3 were similar to those of test no. 1, and the variance was also calculated. The larger the variance value, the more pronounced the power spectrum difference in different time windows. It can be seen from Table 2 that when 10 ms and 17 ms time windows were selected for power spectrum estimation, the variance of the correlation coefficient in the arc fault state was considerable. In contrast, the variance of the normal state correlation coefficient was much smaller than that of the arc fault state. However, the 17 ms time window was too long to process data quickly. Therefore, 10 ms was selected as the time window scale for calculating the DC current power spectrum.

**Table 2.** The variance of correlation coefficient of arc fault states' and normal states' power spectrum values under different time window scales.

| Time Window Scale/ms | Arc Fault State | Normal State |
|---|---|---|
| 1 | 0.05939 | 0.01268 |
| 4 | 0.06125 | 0.00185 |
| 6 | 0.07528 | 0.00265 |
| 8 | 0.06233 | 0.00195 |
| 10 | 0.24262 | 0.00140 |
| 12 | 0.02257 | 0.00085 |
| 15 | 0.18921 | 0.00110 |
| 17 | 0.37401 | 0.00151 |
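The window-selection statistic reported in Table 2 can be illustrated with a minimal sketch. It assumes that the correlation coefficient is the Pearson coefficient between the power spectra of adjacent time windows; the text does not state how the windows were paired, so this pairing, the function names, and the toy spectra are illustrative assumptions only.

```python
from statistics import mean, pvariance


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def correlation_variance(spectra):
    """Variance of the correlation coefficients between adjacent-window spectra.

    Assumption: each spectrum is compared with the spectrum of the next window.
    """
    coeffs = [pearson(spectra[i], spectra[i + 1]) for i in range(len(spectra) - 1)]
    return pvariance(coeffs)


# Toy spectra (hypothetical values): a steady signal keeps the same spectral
# shape in every window, while a fluctuating one changes shape between windows.
steady = [[4.0, 3.0, 2.0, 1.0] for _ in range(4)]
changing = [[4.0, 3.0, 2.0, 1.0], [4.0, 3.0, 2.0, 1.0],
            [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]

print(correlation_variance(steady) < correlation_variance(changing))
```

Under this assumption, a larger variance means the spectral shape changes more from window to window, which matches the reading of Table 2 given above.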

**Figure 4.** Comparison of power spectrum between the normal state and the arc fault state.

In Figure 4, the orange line represents the power spectrum of the current in the arc fault state. With the increase of frequency, the power spectrum values decreased gradually. The values of the low-frequency part were significantly higher than those of the high-frequency part. The blue line represents the power spectrum of the current in the normal state. The power spectrum values were basically unchanged with the frequency increase, except for 0–10 kHz. In addition, the power spectrum values of arc fault were higher than those of the normal state. The spike at 32 kHz was due to the noise interference of the PV inverter. Therefore, the power spectrum was significantly different between the arc fault state and the normal state, and could be used as the neural network input to detect arc fault.
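As a rough illustration of the 12-order AR power spectrum estimate in Equation (18), the sketch below fits the AR coefficients with the Yule-Walker equations solved by the Levinson-Durbin recursion. The fitting method is an assumption, since the text does not say how the coefficients were estimated, and the synthetic low-pass test signal and all function names are hypothetical.

```python
import cmath
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the mean-removed signal."""
    n = len(x)
    m = sum(x) / n
    xc = [v - m for v in x]
    return [sum(xc[t] * xc[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns (a, sigma2) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error power
    return a, err


def ar_power_spectrum(x, order=12, n_freq=64):
    """Evaluate sigma2 / |1 + sum_i a_i exp(-j*w*i)|^2 on n_freq points in [0, pi)."""
    a, sigma2 = levinson_durbin(autocorr(x, order), order)
    spectrum = []
    for q in range(n_freq):
        w = math.pi * q / n_freq            # normalized angular frequency
        denom = sum(a[i] * cmath.exp(-1j * w * i) for i in range(order + 1))
        spectrum.append(sigma2 / abs(denom) ** 2)
    return spectrum


# Synthetic low-pass signal standing in for one 10 ms window (2500 samples).
random.seed(0)
signal, prev = [], 0.0
for _ in range(2500):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    signal.append(prev)

spectrum = ar_power_spectrum(signal)
print(spectrum[0] > spectrum[-1])
```

With a 250 kHz sampling rate, the normalized range [0, π) corresponds to 0 to 125 kHz.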

#### *2.3. Data Processing and Creating the Dataset*

The power spectra of the DC current under the normal state and the arc fault state were different, so they could be used as the input of the neural network model for training. We used the experimental platform to collect the current data of the tests shown in Table 1. Tests no. 1 to no. 3 contained eighteen groups of data that included the normal state and the arc fault state. Test no. 4 contained three groups of data, and test no. 5 contained six groups of data. Both test no. 4 and test no. 5 belonged to the normal state.

The original data were split to extract arc fault data and normal data. Since neural network learning requires a large amount of data, the current data collected by the experimental platform were processed into a dataset, as input for the neural network. The dataset's format and size were unified to facilitate network training. In order to unify the size of the dataset, the classified data for the arc fault state and the normal state were processed into the same time scale, and the 10 ms sampling window was taken as the unit time window.

Since the dataset sampling rate was 250 kHz, the number of sampling points in the unit time window was 2500. The 12-order AR model was used to obtain the power spectrum data. According to the different range of power spectrum values under different working conditions, the values were normalized to map the data value between [0, 1]. The deviation standardization was used as the normalization method, and the equation was as follows:

$$x^{*} = \frac{x - \min}{\max - \min} \tag{19}$$

In Equation (19), max and min are the maximum and minimum values of power spectrum data and *x*\* is the normalized value. Each group's normalized power spectrum data has the same order of magnitude.
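The window length and the deviation standardization of Equation (19) can be checked with a short sketch. The function and constant names are hypothetical, and returning zeros for a constant input is an assumption added to avoid division by zero.

```python
SAMPLE_RATE_HZ = 250_000   # dataset sampling rate stated in the text
WINDOW_MS = 10             # unit time window stated in the text

# Number of sampling points in one unit time window (integer arithmetic).
samples_per_window = SAMPLE_RATE_HZ * WINDOW_MS // 1000


def min_max_normalize(values):
    """Map values into [0, 1] using x* = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a flat spectrum carries no contrast, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2.0, 4.0, 6.0])
print(samples_per_window, normalized)
```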

The normalized data were transferred into the two-dimensional image format. In order to improve the training efficiency of the model, the images were processed into the gray images shown in Figure 5. The resolution of the images was converted into 240 × 240 to meet the EfficientNet-B1 input requirement. The total number of images was 10,000 in the dataset, including 6000 images in the training set, 2000 in the validation set, and 2000 in the test set. After the data were processed into images, each group of data was labeled and divided into two types: arc fault and normal.

**Figure 5.** The inputs to the neural network model are two-dimensional images. (**a**) Normal state; (**b**) Arc fault state.

#### **3. Methodology**

Convolutional neural networks (CNNs) have emerged as a fundamental feature extraction tool for image tasks. However, the multiple complex behaviors of arc current in PV systems make some convolutional frameworks suboptimal for the arc fault detection task. Due to the complexity of the DC series arc fault current in PV systems, it is difficult to find a suitable set of CNN parameters, including depth, width, and resolution size, for effectively distinguishing between the arc fault state and the normal state using the current. Inspired by EfficientNet and the attention mechanism, this paper proposes a model based on a lightweight convolutional neural network with a channel and spatial attention mechanism for arc fault detection, and names it ArcDetectionNet (ADNet).

#### *3.1. Lightweight Convolutional Backbone Network Structure*

A lightweight convolutional backbone network structure, referring to the idea of EfficientNet, is shown in Figure 6. H, C, and W represent the height, channel, and width dimensions of the feature map. First, we performed a 1 × 1 point-by-point convolution on the input data and changed the output channel dimension according to the expansion ratio. The global features were obtained in the channel dimension of the feature map, and then k × k depthwise convolution was carried out. Second, we performed an excitation operation on the output result. The 1 × 1 convolution result was multiplied by the activation ratio R, and the original channel dimension was restored at the end of the 1 × 1 point-by-point convolution.

**Figure 6.** Lightweight convolutional backbone network structure.

Finally, the connection deactivation (drop connect) and the input jump (skip) connection were carried out. This structure is called mobile inverted bottleneck convolution (MBConv). Each convolution operation in this module is normalized and uses the swish activation function. The swish activation function equation is as follows:

$$f(x) = x \times \operatorname{sigmoid}(\beta x) \tag{20}$$

In Equation (20), *β* is a constant or trainable parameter, which defaults to 1.

The swish function performs better than the ReLU function on deep network models. It has a lower bound without an upper bound, and it is smooth and non-monotonic. The connection deactivation gives the model stochastic depth, which reduces the time required for model training and improves model performance.
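The stated properties of the swish function (bounded below, unbounded above, non-monotonic) can be checked numerically with a small sketch. It uses *β* = 1, the default named in the text; the helper names are illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Swish activation: f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def relu(x):
    """ReLU activation, shown only for comparison."""
    return max(0.0, x)


# Sample the function on a grid from -10 to 10 in steps of 0.01.
grid = [i / 100.0 for i in range(-1000, 1001)]
lowest = min(swish(x) for x in grid)

# Non-monotonic: the function dips below zero and then rises back toward zero
# as x becomes more negative, whereas ReLU is exactly zero for all x < 0.
print(swish(-1.0) < swish(-5.0) < 0.0, relu(-1.0) == 0.0)
# Bounded below on the sampled grid, and close to x for large positive x.
print(lowest > -0.28, abs(swish(20.0) - 20.0) < 1e-6)
```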

#### *3.2. Arc Detection Attention Mechanism Module*

The neural network uses the attention mechanism to generate different connection weights between layers and obtain the output of this layer, so it can focus on specific input characteristics, reduce the number of network operations, and improve network performance. This paper proposes an arc detection attention mechanism (ADAM) module. ADAM was calculated based on the channel and space dimensions for the feature map generated by the convolutional neural network. The calculation results were multiplied by the input data to carry out adaptive learning of features. Moreover, the module was designed for a convolutional neural network, so it could be combined with various convolutional neural networks for end-to-end training. For example, we placed the channel attention mechanism first and the spatial attention mechanism after it. The structure of the ADAM module is shown in Figure 7.

As shown in Figure 7, the ADAM module extracts data features from two dimensions: channel and space. The channel attention mechanism performs pooling and convolution operations on the input data. The output of these operations is the weight coefficient of each channel, and the weight coefficient is multiplied by the input data to weight and fuse the channels. The output features weighted by the channel attention mechanism are used as the input of the spatial attention mechanism module to weight the crucial regions in the spatial dimension.

The channel attention mechanism module and the spatial attention mechanism module are connected in series. By changing the combination and position of the two modules, the optimal combination was selected to construct the ADNet model.
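The channel-then-spatial weighting described above can be sketched on a tiny feature map in plain Python. This is a simplified illustration, not the ADAM module itself: the learned convolutions are replaced by fixed hypothetical scalar weights (`scale`, `w_avg`, `w_max`), and combining average and max pooling is an assumption, because the text only states that pooling and convolution are used.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, scale=1.0):
    """One weight in (0, 1) per channel from global average and max pooling."""
    weights = []
    for channel in fmap:
        flat = [v for row in channel for v in row]
        pooled = sum(flat) / len(flat) + max(flat)
        weights.append(sigmoid(scale * pooled))   # stand-in for a learned layer
    return weights


def spatial_attention(fmap, w_avg=1.0, w_max=1.0):
    """One weight in (0, 1) per pixel from pooling across channels."""
    n_c, n_h, n_w = len(fmap), len(fmap[0]), len(fmap[0][0])
    weights = [[0.0] * n_w for _ in range(n_h)]
    for i in range(n_h):
        for j in range(n_w):
            column = [fmap[c][i][j] for c in range(n_c)]
            weights[i][j] = sigmoid(w_avg * sum(column) / n_c + w_max * max(column))
    return weights


def arc_attention_cs(fmap):
    """Apply channel attention first, then spatial attention (serial order)."""
    cw = channel_attention(fmap)
    x = [[[v * cw[c] for v in row] for row in fmap[c]] for c in range(len(fmap))]
    sw = spatial_attention(x)
    return [[[x[c][i][j] * sw[i][j] for j in range(len(sw[0]))]
             for i in range(len(sw))] for c in range(len(x))]


# Toy feature map with layout [channel][height][width] (hypothetical values).
fmap = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
out = arc_attention_cs(fmap)
print(len(out), len(out[0]), len(out[0][0]))
```

The output keeps the shape of the input, so a block of this kind can be placed between existing layers without changing the surrounding network.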

The ADAM module could be added at the front of the network, after the 3 × 3 convolution layer, or at the end of the network, after 16 MBConv modules. The optimal method was finally determined through the following experiments.

In addition to adding the ADAM module, it was also necessary to configure the other functions of the ADNet. The adaptive moment estimation algorithm was selected as the weight updating optimization algorithm, and the cross-entropy loss function was chosen as the loss function. The swish function was selected as an activation function.
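The two training components named above can be illustrated with a minimal sketch: a two-class cross-entropy loss and one update step of the adaptive moment estimation optimizer (commonly abbreviated Adam, which is distinct from the ADAM attention module). The values `beta1`, `beta2`, and `eps` are commonly used defaults and are assumptions here; the learning rate 0.001 is the one reported in Section 4.2.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss for one sample whose true class index is `label`."""
    return -math.log(softmax(logits)[label])


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One adaptive moment estimation update for a single scalar parameter."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Classes: index 0 = normal, index 1 = arc fault (hypothetical ordering).
loss_uncertain = cross_entropy([0.0, 0.0], 1)
loss_confident = cross_entropy([-3.0, 3.0], 1)
new_param, m1, v1 = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(loss_confident < loss_uncertain, new_param < 1.0)
```

In the first step the bias-corrected update has a magnitude close to the learning rate regardless of the gradient scale, which is one reason a value such as 0.001 is a common starting point.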

**Figure 7.** The structure of the ADAM module.

#### **4. Experimental Results and Analysis**

This section analyzes the experimental results to select the optimal structure of the proposed ADNet algorithm. The dataset included current data under the arc fault state and the normal state in tests no. 1 to no. 5, and the samples in the test set excluded those in the training and validation sets.

#### *4.1. The Optimal Model Selection Based on EfficientNet*

Since the ADNet network is based on EfficientNet, and the EfficientNet family has eight models, the best model was selected first. Among them, EfficientNet-B1~B7 are improved from the baseline model EfficientNet-B0. In order to get the most suitable network model, PyCharm software (JetBrains, Prague, Czech Republic) was used to build the program, and the environment was Python 3.7 (Guido van Rossum, Harlem, The Netherlands) and TensorFlow 2.4.0 (Google Brain, Mountain View, CA, USA). Due to the size of the dataset, a smaller network structure in the EfficientNet series was selected to reduce the number of parameters and unnecessary calculations and thus improve the training speed. The resolution of the input images becomes larger from EfficientNet-B0 to EfficientNet-B7, so the height and width of the output feature matrix of each layer increase accordingly, and the video memory usage also increases. Therefore, EfficientNet-B0 to B3 were selected and trained on the dataset. The basic model parameters and training results are shown in Table 3.

**Table 3.** EfficientNet-B0–B3 basic parameters and detection accuracy.


It can be seen from Table 3 that the detection accuracy of the EfficientNet-B0~B3 networks can reach more than 95%. The EfficientNet-B1 has the highest detection accuracy, indicating that it is suitable for the DC series arc fault detection in PV systems. At the same time, it avoids the problem of reducing the calculation speed caused by the increasing network complexity, which is the advantage of the EfficientNet series model.

In order to further improve the generalization ability and accelerate the convergence speed of the ADNet, considering that not every part of the power spectrum image is equally important, the channel attention mechanism was used, and different convolution kernels were used to capture various features for channel weighted fusion. In addition, the judgment of whether the circuit has an arc fault mainly depends on some critical areas of the power spectrum image, and the characteristics of each part of the image cannot be treated equally. Therefore, the spatial attention mechanism was used to weight some important regions in space, to strengthen important information and suppress nonimportant information.

We continued with experimental verification to find the optimal ADNet model. The experimental results of different ADAM types used in the ADNet are shown in Table 4. In Table 4, C represents the channel attention mechanism, and S represents the spatial attention mechanism. Q represents putting the attention mechanism in the front of the network, which follows the 3 × 3 convolution layer, and H represents adding the attention mechanism to the end of the network, which follows the 16 MBConv modules.


**Table 4.** The ADNet detection accuracy in different ADAM types.

| ADAM Type | Training Set Accuracy/% | Test Set Accuracy/% |
|---|---|---|
| C–Q | 99.82 | 97.37 |
| C–H | 99.90 | 97.32 |

According to Table 4, the ADNet model, compared with the original EfficientNet-B1 neural network model, improves the feature extraction ability of data samples and the accuracy of arc fault detection. Among the tested variants, the training set accuracy and test set accuracy of the improved CS-H model were the highest: the accuracy of arc fault detection of the training set was 99.96%, and that of the test set was 98.81%. Therefore, adding the channel attention mechanism first and then the spatial attention mechanism at the end of the network model can improve the model's detection accuracy. The ADAM module was more effective when applied to the deep layer of the network than when applied to the shallow layer of the network, because the characteristics of the deep layer of the network are more robust after multiple feature extractions. Thus, the ADNet model could capture some crucial features of power spectrum images with better robustness and performance after ADAM operation.

According to the above analysis, the EfficientNet-B1 and CS-H of the ADAM type were selected; the optimal structure of the ADNet model is shown in Figure 8.

**Figure 8.** The ADNet model's optimal structure.

#### *4.2. The Selection of ADNet Training Parameters*

The learning rate determines the step length of the weight iteration and directly affects the convergence of the network model. The model will not converge when the learning rate is set too large. When the learning rate is set too small, convergence becomes slow and the model may fail to learn effectively. The best initial learning rate is usually found with a search method that trains the model with learning rates from small to large. After many experiments, 0.001 was chosen as the learning rate of the network to accelerate convergence and save training time.

Batch size refers to the number of randomly selected samples used in each step of the gradient descent algorithm, which affects the generalization performance of the convolutional neural network model. Within a specific range, increasing the batch size helps the stability of convergence, improves the memory utilization rate, and speeds up data processing. After many experiments with the ADNet model, the batch size was set to 8.

Since the ADNet model has a complex structure, dropout was used to avoid over-fitting. After many experiments, the dropout rate was set to 0.2, and the number of iterations was 120.
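For reference, the training settings reported in this section can be gathered into one configuration object. The sketch below is only a bookkeeping aid: the class and field names are hypothetical, and the 6000 training images are taken from the dataset description in Section 2.3.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings reported in the text (field names are illustrative)."""
    learning_rate: float = 0.001
    batch_size: int = 8
    dropout_rate: float = 0.2
    iterations: int = 120
    train_images: int = 6000

    def batches_per_pass(self) -> int:
        """Number of mini-batches needed to see every training image once."""
        return math.ceil(self.train_images / self.batch_size)


cfg = TrainConfig()
print(cfg.batches_per_pass())
```

Whether the 120 iterations correspond to full passes over the training set is not stated in the text, so the sketch does not derive a total number of weight updates.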

#### *4.3. Influence of Different Current Values on Detection Results*

In order to study the influence of different current values on the arc fault detection accuracy of the ADNet, we used 3 A, 16 A, and 25 A current data from the dataset to carry out experiments. Moreover, the PV inverter startup and irradiance mutation situations were considered the normal state to improve the robustness of the network. The results are shown in Table 5.


**Table 5.** The ADNet model's detection accuracy of different current values.

| Current Value | Training Set Accuracy | Test Set Accuracy |
|---|---|---|
| 3 A | 100% | 99.97% |
| 16 A | 99.86% | 98.96% |
| 25 A | 99.68% | 97.87% |
| overall | 99.96% | 98.81% |

According to Table 5, with the increase of the current value, the accuracy of the training set and test set decreased gradually. By comparing Figures 4 and 9, it can be seen that with the increase of current values, the power spectrum values of current data also increased. Since the difference in power spectrum values between the high-frequency part and the low-frequency part decreased in the arc fault state, the power spectrum characteristics of the arc fault and normal states became more similar, which had a certain impact on arc fault detection. However, according to Figure 9, whether the original power spectrum or the normalized power spectrum was considered, the power spectrum values in the arc fault state were basically higher than those in the normal state, and arc fault could still be detected accurately by the ADNet model, as shown in Table 5. The ADNet model's overall test set accuracy across the three current levels was 98.81%, indicating that this method can detect arc fault accurately.

**Figure 9.** The power spectrum of 25 A current data. (**a**) Original power spectrum; (**b**) Normalized power spectrum.

#### *4.4. Detection Accuracy of Different Existing Neural Networks*

Existing research has rarely used power spectrum images as the input data for neural networks. Therefore, in order to verify whether the arc fault detection accuracy of the ADNet model is higher than that of existing neural network models, we built GoogLeNet and AlexNet models, trained and tested them on the same dataset as the ADNet model, and also compared the accuracy of several existing arc fault detection networks. The results are shown in Table 6.

**Table 6.** The detection accuracy of different neural network models.

| Model | Training Set Accuracy/% | Test Set Accuracy/% |
|---|---|---|
| GoogLeNet | 96.37 | 96.23 |
| AlexNet | 96.91 | 96.83 |
| BP neural network [25] | \ | 95.23 |
| DA-DCGAN [27] | 98.80 | 97.68 |
| ADNet (ours) | 99.96 | 98.81 |

According to Table 6, the detection accuracy of GoogLeNet, AlexNet, BP neural network, and DA-DCGAN is 96.23%, 96.83%, 95.23%, and 97.68%, respectively, and that of the ADNet model is 98.81%. Therefore, the arc fault detection accuracy of the ADNet model is higher than that of the other existing arc fault detection networks. The results indicate that the ADNet model has a better performance in arc fault detection.

#### *4.5. Feasibility Analysis of Application in the Embedded Modules*

The ADNet model can be used for edge applications based on embedded processors or modules of the arc fault detection equipment, such as Raspberry Pi, because: (1) The AR model-based data preprocessing method is employed to capture the arc features and remove insensitive parts of the power spectrum, which can help to reduce the amount of input data; (2) The ADNet model is based on EfficientNet-B1, a commonly used lightweight convolutional neural network. Moreover, we combined the attention mechanism with the EfficientNet-B1, making the algorithm concentrate more on the arc features while ignoring the remaining information. Specifically, spatial attention was used to locate the more sensitive part of the input signal, while channel attention was used to determine the more valuable channels or layers in the model [29]. Therefore, the proposed method can be made even more lightweight while retaining considerable detection accuracy; (3) Due to the above lightweight design and operation, the total number of parameters of the proposed ADNet model is only 6.58 × 10<sup>6</sup>, which is less than that of other commonly used methods. Meanwhile, the detection accuracy was higher than that of others. Table 7 shows a detailed comparison.



The more parameters a model has, the greater the amount of calculation and the slower the running speed [33]. We compared the number of parameters of the ADNet model with those of the GoogLeNet and AlexNet models we built and of several commonly used networks. In Table 7, the total is the sum of all model parameters. The total numbers of model parameters in GoogLeNet, AlexNet, Inception V3, Xception, and ResNet50, which are commonly used convolutional neural networks, are 10.31 M, 14.59 M, 23.63 M, 22.86 M, and 23.48 M, respectively. These quantities are large, resulting in heavy computation and slower running speed. However, the total number of parameters in the ADNet model, which belongs to the lightweight convolutional neural network, is 6.58 M, which is lower than that of the above networks. The results show that the proposed method achieves the best detection accuracy with the lowest computational burden among the compared networks, due to the well-designed lightweight algorithm. Therefore, the ADNet model is suitable for edge applications and can be implemented with embedded processors or modules, such as the Raspberry Pi 3B with a quad-core 1.2 GHz CPU and 1 GB RAM. This calls for further research in the future.
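The parameter counts quoted above can be turned into a rough weight-storage estimate. The sketch assumes 32-bit floating-point weights (4 bytes per parameter) and counts weights only, ignoring activations and runtime overhead; these assumptions and the helper name are illustrative and are not stated in the text.

```python
# Total parameters in millions, as quoted in the text.
PARAMS_M = {
    "GoogLeNet": 10.31,
    "AlexNet": 14.59,
    "Inception V3": 23.63,
    "Xception": 22.86,
    "ResNet50": 23.48,
    "ADNet": 6.58,
}

BYTES_PER_PARAM = 4  # assumption: float32 weights


def weight_megabytes(params_millions):
    """Approximate weight storage in MB (1 MB = 10**6 bytes)."""
    return params_millions * BYTES_PER_PARAM


smallest = min(PARAMS_M, key=PARAMS_M.get)
adnet_mb = weight_megabytes(PARAMS_M["ADNet"])

for name, count in sorted(PARAMS_M.items(), key=lambda kv: kv[1]):
    ratio = count / PARAMS_M["ADNet"]
    print(f"{name}: {count} M parameters, {weight_megabytes(count):.2f} MB, {ratio:.2f}x ADNet")
```

Under these assumptions the ADNet weights occupy a small fraction of the 1 GB RAM of the Raspberry Pi 3B mentioned above.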

#### **5. Conclusions**

In this paper, we established an experimental platform based on the UL1699B standard to collect DC current data efficiently and create a dataset. The power spectrum image of the current data can clearly distinguish the normal state from the arc fault state, so it can be used as the input for the arc fault detection algorithm. In order to avoid excessive consumption of computing resources due to increasing algorithm complexity, this paper proposed a detection method for DC series arc faults in PV systems based on a lightweight convolutional neural network, which has fewer parameters and a low computational burden. The power spectrum images were normalized and converted into 240 × 240 gray images as the dataset. After comparing the EfficientNet series models, EfficientNet-B1 was selected as the optimal network. The channel attention mechanism and the spatial attention mechanism were added to the deep layer of the EfficientNet-B1 to construct the ADNet model, improving the network's detection accuracy and making it more effective. This method considered the situations of PV inverter startup and irradiance mutation, enhancing the robustness of the network. The results showed that the accuracy of the training set was 99.96%, and that of the test set was 98.81%, which are higher than the accuracies of GoogLeNet, AlexNet, and other commonly used networks. According to the above analysis, this method can be used in PV systems to detect DC series arc faults accurately and to reduce arc fire hazards. Therefore, the safety of PV systems will be improved, and solar energy can be used more fully and stably.

**Author Contributions:** Conceptualization, Y.W.; methodology, C.B. and W.L.; software, C.B.; validation, C.B., C.Z. and X.Q.; formal analysis, L.G.; investigation, Y.W.; resources, Y.W. and X.Q.; data curation, C.B.; writing—original draft preparation, C.B., Y.W. and W.L.; writing—review and editing, Y.W. and C.B.; visualization, C.B., X.Q. and C.Z.; supervision, L.G.; project administration, Y.W.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Natural Science Foundation of Hebei Province, grant number E2020202204, the Key Laboratory of Special Machine and High Voltage Apparatus (Shenyang University of Technology), grant number KFKT202003, the Zhejiang Provincial Natural Science Foundation of China, grant number LGG20E070002, the National Natural Science Foundation of China, grant number 51807134, and the State Key Lab of Reliability and Intelligence of Electrical Equipment (Hebei University of Technology) Open Project Funding Projects, grant number EERI\_KF20200014.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are contained within the article. Data sharing is not applicable to this article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

