1. Introduction
During the previous decade, photovoltaic (PV)-based electricity generation has been a growing area of research in industry [1,2], where grid-connected PV (GCPV) systems have experienced the strongest growth [3]. Moreover, operating PV systems at high efficiency has become both a top priority and a major challenge [4]. Many faults can occur and damage this kind of system; they can be categorized into three main classes: abrupt, incipient, and intermittent faults [5]. Line-to-ground or line-to-line faults, short circuits, connector disconnection, open circuits, hot spots, and junction box failures are abrupt faults that occur instantly, often as a result of damage to the PV array. Because of their slower dynamics and smaller amplitudes, incipient faults are generally considered the most difficult faults to detect. They can cause gradual damage to the PV cells and lead to major problems if not detected early [6]. These kinds of faults can occur on both the DC and AC sides. PV module defects, for instance delamination, yellowing and browning of solar cells, cracks, gaps, bubbles, and defects in the anti-reflective coating, are examples of DC-side incipient faults [7]. Wiring degradation, Insulated Gate Bipolar Transistor (IGBT) faults, islanding, overheating, and aging are all AC-side faults. Environmental stress and partial shading are intermittent failures that vary over time [5,8]. Thus, it is important to detect, diagnose, and forecast these faults early. The demand for fault detection and diagnosis (FDD) algorithms is therefore growing with the rapid progress of information and automation technologies, and data-driven process control approaches are being continuously enhanced. Different techniques and strategies have been developed in the literature. For instance, the authors in [9] present the initial results of an extensive long-term study on forecasting voltage sags in distribution networks. The overriding objective of that research is to give network operators algorithms that allow them to forecast how many voltage sags will occur and the sites at which they are likely to occur. The authors in [10] employed a Time Domain Reflectometry (TDR) technique to locate a failed PV module in a PV array, noting that the technique may also be used for fault localization and detection.
The research in [11] provided a diagnostic method based on observing the magnitudes of various essential measurable frequency components for a DC-DC boost converter and a voltage source full-bridge inverter. A fault detection strategy for GCPV systems using a wavelet transform (WT) is proposed in [12]. Using power loss analysis, the authors in [13] proposed a new statistical signal processing method for PV system (PVS) monitoring and fault detection. A strategy for automatic failure detection in PVS based on parameter extraction techniques was proposed in [14]. To diagnose faults in PVS, a statistical technique based on an exponentially weighted moving average chart was developed in [15], while the authors of [16] presented an approach based on I–V characteristics analysis to detect PV array faults. In [1,17,18,19], additional multivariate and univariate statistical techniques for PV fault detection were reported. In [20], an approach based on estimating the crucial parameters of the PV module was presented. A new algorithm for detecting faults in PV modules was developed in [21]. The method presented in [22] allows the identification of three major categories of faults: faults in the string, faults in the module’s string, and a group of diverse failures, for example, aging, MPPT errors, and partial shading. In [23], a fault detection method that compares the current and previous states of a defective PV array (PVA) was developed. The authors in [24] presented a technique for determining the approximate position of a faulty PV module (PVM) in parallel or series PVAs. For detecting DC cable faults and PV series arc failures, the authors of [25] proposed a novel differential current-based method for quick detection and accurate fault localization. On a GCPV system, FDD has been performed using a reduced Kernel Random Forest (KRF) based on K-means clustering and a Euclidean distance-based KRF [26]. Additionally, in order to optimize the voltage profile of distribution systems, centralized control is adopted in [27] to determine the set points of the controllers of the distributed energy resources connected to the grid. The study in [28] illustrated the use of an artificial neural network (ANN) technique to diagnose GCPV system faults. A fault detection strategy for PV modules under partially shaded conditions is proposed in [29]; it uses an ANN to predict electrical outputs and to detect possible anomalies in the PV module through real-time correlation of estimated and measured performance under variable conditions. The authors in [30] used the ANN in conjunction with the traditional analytical approach to provide string-based PV systems with automatic fault detection and diagnostics. In [31], an ANN-based PV failure detection technique built on bi-directional input parameter integration is developed. A radial basis network-based PV array defect detection method is provided in [32]. The authors in [33] employed a novel ANN-based diagnostic strategy for PV systems to identify and classify the diverse failures occurring in the PV array. The work in [34] presents a novel intelligent algorithm for PV system diagnosis and fault detection (IFD), in which the ANN distinguishes three states of the PV array: healthy, string disconnection, and short circuit. The paper [35] proposes a customized NN algorithm that classifies and identifies eight commonly occurring PV fault scenarios. The authors in [36] introduce the Laterally Primed Adaptive Resonance Theory (LAPART) ANN for PV system fault detection and diagnostics. In [37], the authors used back-propagation ANN, generalized regression ANN, probabilistic ANN, and two radial basis function (RBF) ANNs to detect and locate the most frequently encountered failures in PV installations: short circuit and open circuit string cases in the PV generator.
The current work proposes an intelligent fault detection and diagnosis strategy based on Multiscale Principal Component Analysis (MSPCA) and neural network (NN) classifiers in order to enhance the efficiency of conventional data-driven strategies for monitoring multivariate dynamic systems. Unlike classical diagnosis approaches, the proposed MSPCA-based NN approaches are used to both detect and isolate faults. The proposed strategy involves three major steps. First, the data are pre-processed using a multiscale scheme in order to remove noise and stochastic observations. Second, the new dataset is fed to a PCA method in order to extract and select the most significant attributes from the GCPV system, which accelerates model convergence and improves classification accuracy. Third, the extracted features are fed as inputs to the NN classifiers in order to detect, classify, and distinguish between the different conditions. Additionally, this study addresses the frequent and potential faults that might occur in, and damage, PV systems. A total of 21 fault scenarios, covering line-to-line, line-to-ground, and connectivity faults as well as faults that affect the normal operation of the bypass diodes, are introduced at various levels and locations. These scenarios cover a variety of conditions: simple faults in the first array, simple faults in the second array, multiple faults in the first array, multiple faults in the second array, and mixed faults. Various ML techniques, including Decision Tree (DT), Support Vector Machine (SVM), Discriminant Analysis (DA), k-nearest neighbor (KNN), and Naive Bayes (NB), are employed for comparison in order to evaluate the performance of the suggested strategy in terms of diagnostic precision, recall, accuracy, and computation time. The obtained results demonstrate that the developed strategy not only improves the accuracy compared to conventional ML methods but also provides an efficient reduction in computation time and storage space.
The remainder of this paper is organized as follows: a detailed description of the suggested multiscale PCA-based NNs is provided in Section 2. The main results are presented in Section 3. The paper is concluded in Section 4.
3. Results and Discussion
3.1. Process Description
In this work, the distributed structure has been considered. This structure is a modular application that allows the multiplication and diversification of technologies, so that several different types of photovoltaic sensors can be combined. One possible configuration is shown in Figure 7. A 500 V DC voltage bus is used. All panel and converter components are linked in parallel to the DC voltage bus. Because each panel is optimally controlled individually, the downstream converter does not perform global MPP tracking. In addition, the controllers are robust to external perturbations. Because of the high voltage used, the cable cross-sections can be reduced, which saves copper or aluminum. The PV farm consists of 3 PV arrays, each delivering a maximum of 4 kW. A single PV array block is made up of two parallel strings, each having 24 modules connected in series. Each module contains 20 cells. Each PV array is connected to a DC/DC boost converter. The outputs of the boost converters are connected to a common 500 V DC bus. Each boost converter is individually controlled by a Maximum Power Point Tracker (MPPT). The MPPTs vary the PV array’s terminal voltage using the “perturb and observe” technique in order to obtain the maximum possible power. A three-phase source converter transforms the 500 V DC to 260 V AC and maintains a unity power factor. To connect the converter to the grid, a 100 kVA, 260 V/25 kV three-phase coupling transformer is employed.
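The “perturb and observe” principle mentioned above can be illustrated with a minimal Python sketch. It is not the controller used in the simulated system: the function names, the step size, and the toy power curve (a parabola peaking at 4 kW, with an assumed maximum power point at 300 V) are hypothetical and serve only to show the logic of perturbing the voltage and reversing direction whenever the power drops.

```python
def toy_pv_power(voltage):
    """Hypothetical concave power curve (W): 4 kW peak at an assumed 300 V."""
    return 4000.0 - 0.1 * (voltage - 300.0) ** 2


def perturb_and_observe(power_fn, v_start, step=0.5, iterations=200):
    """Basic perturb-and-observe loop.

    At each iteration the operating voltage is perturbed by `step`.
    If the measured power decreased, the perturbation direction is
    reversed; otherwise the same direction is kept.
    """
    voltage = v_start
    power = power_fn(voltage)
    direction = 1.0
    for _ in range(iterations):
        new_voltage = voltage + direction * step
        new_power = power_fn(new_voltage)
        if new_power < power:
            direction = -direction  # power dropped: perturb the other way
        voltage, power = new_voltage, new_power
    return voltage, power


v_mpp, p_mpp = perturb_and_observe(toy_pv_power, v_start=250.0)
print(f"operating point: {v_mpp:.1f} V, {p_mpp:.1f} W")
```

In steady state this kind of loop oscillates within about one step of the maximum power point, which is the usual trade-off between tracking speed and ripple when choosing the step size.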
3.2. Description of the Input Data
Twenty-one frequent PV faults are treated in this work.
As shown in Table 1, five different types of faults were used to introduce various scenarios into the first and second PV arrays. For example, the simple faults of PV1 include four possible fault scenarios: a line-to-line fault is injected between two distinct points; a line-to-ground fault is injected in String1, between one point and the ground; a connectivity fault is injected in the first string between two modules; and a fourth fault affects the bypass diodes through a variation in resistance. The positions of these faults are shown in Figure 8.
The second PV array receives the same simple fault injections. Then, multiple faults are introduced into one PV array at a time. In addition, mixed faults, which correspond to simultaneous faults in both PV arrays, are injected.
The simulated variables gathered to assess FDD performance are presented in [48].
3.3. Fault Classification Results
The investigated GCPV system operates in 22 working modes (classes), the first of which is the healthy one. Fifty percent of the data were used to train the NNs, and the remaining data were used to validate and evaluate the trained NNs (see Table 2).
In the present work, a method for detecting and diagnosing faults is provided. Almost all stochastic measurements are decorrelated. After the data are normalized to zero mean and unit variance, a PCA model is built. The variances captured by the principal components are computed by eigenvalue decomposition and arranged in decreasing order, and the number of retained components is selected using a 95% cumulative variance criterion. Consequently, five PCs were retained to train the NN classifiers.
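The PCA step described above (normalization, eigenvalue decomposition, and selection by a 95% cumulative variance criterion) can be sketched as follows. This is a minimal NumPy illustration written for this discussion, not the MATLAB implementation used in the study; the function names `pca_fit` and `pca_transform` and the synthetic data are assumptions.

```python
import numpy as np


def pca_fit(X, cum_var=0.95):
    """Fit PCA on standardized data and keep the smallest number of
    components whose cumulative explained variance reaches `cum_var`."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    Z = (X - mean) / std                     # zero mean, unit variance
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n_pc = min(int(np.searchsorted(ratio, cum_var)) + 1, eigvals.size)
    return mean, std, eigvecs[:, :n_pc], eigvals


def pca_transform(X, mean, std, loadings):
    """Project new samples onto the retained principal components."""
    return ((np.asarray(X, dtype=float) - mean) / std) @ loadings


# Synthetic example: six variables driven by two independent latent factors.
rng = np.random.default_rng(1)
t = rng.standard_normal((500, 2))
X = np.hstack([t[:, [0]]] * 3 + [t[:, [1]]] * 3)
X = X + 0.01 * rng.standard_normal(X.shape)
mean, std, loadings, eigvals = pca_fit(X)
scores = pca_transform(X, mean, std, loadings)
print(loadings.shape[1], "components retained")
```

The resulting scores play the role of the reduced feature set that is passed to the classifiers; with the study's data the same criterion led to five retained PCs.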
Therefore, denoising the variables and then selecting and extracting statistical features using an MSPCA tool is crucial for achieving higher accuracy in FDD-based techniques. As a result, in this study, the newly obtained dataset is fed to the NN classifiers. Using labeled training data, the classifiers learn a set of predefined fault types.
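To make the multiscale denoising idea concrete, the following Python sketch decomposes a single variable with a Haar wavelet, soft-thresholds the detail coefficients at each scale, and reconstructs the signal. It is a simplified illustration under stated assumptions rather than the MSPCA procedure of this work: the Haar wavelet, the number of levels, the universal threshold, and all function names are choices made here for the example only.

```python
import numpy as np


def haar_dwt(x):
    """One-level orthonormal Haar transform: (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def multiscale_denoise(x, levels=3):
    """Soft-threshold the Haar detail coefficients at every scale.

    Assumes len(x) is divisible by 2**levels. The noise level is
    estimated from the finest-scale details (median absolute deviation)
    and a universal threshold is applied; both are illustrative choices.
    """
    x = np.asarray(x, dtype=float)
    if x.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = x, []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)                    # details[0] is the finest scale
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(x.size))
    for d in reversed(details):              # rebuild from coarse to fine
        d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
        approx = haar_idwt(approx, d)
    return approx


# Synthetic example: a step change buried in Gaussian noise.
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(64), np.ones(64)])
noisy = clean + 0.1 * rng.standard_normal(clean.size)
denoised = multiscale_denoise(noisy, levels=3)
```

Applying such a filter to each measured variable before PCA is one simple way to obtain a denoised dataset; full MSPCA formulations typically also apply PCA to the wavelet coefficients at each scale.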
Several ML techniques, including DT, SVM, DA, KNN, and NB, are employed to test and evaluate the performance of our suggested strategy in terms of diagnostic precision, recall, accuracy, and computation time.
The different techniques are implemented in a MATLAB environment. Their accuracy is computed using 10-fold cross-validation in order to determine the FDD efficiency of the suggested techniques. The number of hidden neurons selected for the NN and CFNN is 10, and the hidden layer sizes are [10, 10, 10] for the MNN, with a maximum of 50 epochs and full-batch training. The K value for KNN is equal to 3, and the K and C parameters for SVM are those that give the lowest RMSE value. The number of splits for DT is equal to 50.
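For readers unfamiliar with the evaluation protocol, the sketch below shows 10-fold cross-validation around a 3-nearest-neighbor classifier, matching the K = 3 setting quoted above. It is a NumPy illustration on synthetic data, not the MATLAB code used in the study; the function names and the two-class toy dataset are assumptions.

```python
import numpy as np


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean).
    Labels must be non-negative integers."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def cross_val_accuracy(X, y, k_folds=10, k_neighbors=3, seed=0):
    """Mean accuracy over k_folds held-out folds."""
    folds = kfold_indices(len(y), k_folds, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k_neighbors)
        scores.append(float(np.mean(y_pred == y[test_idx])))
    return float(np.mean(scores))


# Synthetic two-class example with well-separated clusters.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(10.0, 1.0, (100, 2))])
y_demo = np.array([0] * 100 + [1] * 100)
print("10-fold accuracy:", cross_val_accuracy(X_demo, y_demo))
```

Each sample is used for testing exactly once, so the averaged score is less sensitive to a single favorable or unfavorable split than a one-off hold-out estimate.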
This work then employs a PCA model with a group of features. Table 3 shows the overall normalized accuracy values for the various extracted features and the NN classifiers. Table 4 shows the normalized accuracy values obtained for the extracted features when the MSPCA technique is combined with the NN classifiers.
The established MSPCA-based NN methods are shown to be efficient alternatives for fault diagnosis when compared to other existing methods. Although the MSPCA tool improves the overall performance of all the techniques used, the conventional methods still have significant drawbacks. For instance, the accuracy of the DA and NB approaches is still poor, and the SVM technique suffers from a difficult training phase and a high time complexity. The suggested approach thus achieves better classification accuracy than the conventional techniques. Indeed, the training and testing accuracy of the ANN classifier increased by 9 and 1.97 percent, respectively; the training and testing accuracy of the MNN classifier improved by 19.75 and 11.99 percent, respectively; and, for the CFNN classifier, the training accuracy increased by 17.48 percent and the testing accuracy by 16.88 percent. In addition, the developed strategy reduces the computation time (CT), which speeds up the NN classifiers and accelerates their convergence. For instance, for the ANN classifier, the CT decreased by 15.97 s and 0.17 s for the training and testing phases, respectively.
Tables 5, 6 and 7 present the testing classification outcomes for the different classes in the form of normalized confusion matrices in order to indicate the efficiency of the developed strategies. Each matrix reports the samples that were correctly classified as well as those that were incorrectly classified for the healthy and faulty modes. Actual classes and predicted process statuses are indicated by the rows and the columns, respectively.
Table 5 shows that for faulty operating mode 1, the ANN classifier correctly recognizes 2863 observations out of 3000 (true positives). For this scenario, the detection precision is 94.05%, the recall is 95.43%, and the misclassification rate is 4.57%. For faulty operating mode 2, assigned to class C2, the precision is 94.44%, the recall is 98.66%, and the misclassification rate is 1.34%. For faulty operating mode 3, the precision is 90.01%, the recall is 95.26%, and the misclassification rate is 4.74%. For faulty operating modes 4 to 21, the misclassification rates are, in order, 8.17%, 1.87%, 6.8%, 2.64%, 6%, 4.24%, 5.97%, 9.93%, 14.07%, 1.37%, 23.74%, 8.3%, 2.27%, 8.07%, 6.14%, 2.37%, 2.6%, and 9.27%.
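The per-class figures quoted from Table 5 follow directly from the confusion matrix, with rows as actual classes and columns as predicted classes. The short Python sketch below shows the computation on a two-class toy matrix. Only the 2863 correctly classified observations out of 3000 come from the text; the remaining counts (in particular the 181 false positives) are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def per_class_metrics(confusion):
    """Compute precision, recall and misclassification rate per class.

    confusion[i][j] is the number of samples of actual class i that
    were predicted as class j (rows: actual, columns: predicted).
    """
    n = len(confusion)
    metrics = []
    for c in range(n):
        true_pos = confusion[c][c]
        actual_total = sum(confusion[c])                          # row sum
        predicted_total = sum(confusion[r][c] for r in range(n))  # column sum
        recall = true_pos / actual_total if actual_total else 0.0
        precision = true_pos / predicted_total if predicted_total else 0.0
        metrics.append({
            "precision": precision,
            "recall": recall,
            "misclassification": 1.0 - recall,
        })
    return metrics


# Toy matrix: class 0 has 3000 samples, 2863 of them correctly classified
# (from the text); the second row is hypothetical.
toy_confusion = [
    [2863, 137],
    [181, 2819],
]
for label, m in enumerate(per_class_metrics(toy_confusion)):
    print(label, {name: round(100 * value, 2) for name, value in m.items()})
```

For the first class this gives a recall of 2863/3000 ≈ 95.43% and a misclassification rate of 1 − recall ≈ 4.57%, which matches the values reported for faulty operating mode 1.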
In Table 6, the misclassification rate is 9.04% for the healthy case and, for faulty operating modes 1 to 21 in order, 4.87%, 1.6%, 1.67%, 2.84%, 2.87%, 4.1%, 7.8%, 9.3%, 6.07%, 9.34%, 6.24%, 10.24%, 3.6%, 9.94%, 9.3%, 2.5%, 5.97%, 5.47%, 2.94%, 4.14%, and 8%.
In Table 7, the misclassification rate is 29.54% for the healthy case and, for faulty operating modes 1 to 21 in order, 4.2%, 4.64%, 10.47%, 8.3%, 0.64%, 10.5%, 5.84%, 10.54%, 32.64%, 98.24%, 6.44%, 3.77%, 3.44%, 9.24%, 8.44%, 2.6%, 12.77%, 38.57%, 33.6%, 14.84%, and 11.17%.
Despite the fact that the introduced faults are numerous, similar, and close, the developed technique, which merges the benefits of multiscale representation and the PCA technique, shows significant efficiency in detecting and diagnosing such frequent failures.
Therefore, overall results show that the evolved approach can improve the performance of a variety of existing techniques, not only in terms of recall, precision, and accuracy but also by significantly reducing computation time and storage space requirements. One can conclude that denoising variables, eliminating stochastic samples, removing irrelevant and correlated samples, and selecting and extracting only informative statistical features using an MSPCA tool are crucial to reducing the misclassification rate and thereby achieving the higher accuracy and reliability of FDD-based techniques.
4. Conclusions and Future Work
This paper investigated the problem of failure detection and diagnosis in grid-connected PV (GCPV) systems. The developed methodologies were based on neural network (NN), multiscale representation, and principal component analysis (PCA) tools. A multiscale PCA strategy was used to remove noise and to extract and select the most relevant features. The extracted features were then fed as inputs to the NN classifiers in order to detect, classify, and distinguish between the different working conditions. In this work, we considered the diagnosis of all potential and frequent faults that may occur in GCPV systems in order to establish a comprehensive analysis and guarantee the efficiency and safety of such systems. Therefore, 21 faulty scenarios, including line-to-line, line-to-ground, and connectivity faults as well as faults that can affect the normal operation of the bypass diodes, were introduced. These scenarios comprise simple, multiple, and mixed faults injected at different levels and locations. To evaluate the robustness of the proposed strategy, various cases were investigated. The suggested solutions were able to diagnose the GCPV operating conditions in both normal and abnormal modes. Nevertheless, the fault diagnosis results obtained with the established approach still showed some missed detections and false alarms, so some faulty conditions were not correctly labeled. Accordingly, one direction for future work is to employ an online and adaptive NN-based tool to enhance the model and reduce the misclassification rate. Another direction is to develop adaptive NN-based techniques that address uncertainties in PV systems using an interval-valued dataset representation. Finally, an ensemble NN-based model that combines multiple NN-based strategies will be developed to raise the precision of the decision-making.