#### *4.2. Regression-Based Methods*

This subsection reviews regression-based methods for PV system fault detection. Table 2 summarizes these methods and highlights their main contributions.

For the detection of abnormal conditions in PV systems, a novel approach based on regression and SVM was employed in [91] to compute the expected power generation. The approach accounts for all three categories of PV system failures, namely failures of PV modules, inverters, and other components. Furthermore, the method uses variables that are already available in a small-scale PV system, eliminating the need to install additional, expensive sensing equipment. Power, voltage, and current are collected from the power conversion system (PCS), solar irradiance on the surface of the PV panels is measured with a pyranometer, and the ambient and PV cell temperatures are monitored with a thermometer. As a result, the proposed abnormal condition detection system can be used efficiently in small-scale PV systems, or as an early warning system that prompts the PV operator/owner to undertake further inspections. The detection system combines a regression model and an SVM model: the regression model calculates the expected power generation for each solar irradiance level, and the SVM model then uses numerous variables, including this expected power generation, to determine whether the PV system is in an abnormal condition. Since the model's input variables are acquired from the PCS, the model does not require additional measurement devices and can be constructed at minimal cost. The accuracy of the detection system is further increased by taking into account the daylight time and the interactions between the independent variables, and by using a multi-stage k-fold cross-validation technique. The system was evaluated using real data from a PV site, and the findings show that it can successfully discern between normal and abnormal PV system conditions using basic measurements.

The authors in [92] proposed a condition monitoring technique based on an online regression PV array performance model, in which PV array production, plane-of-array (POA) irradiance, and module temperature and maximum power point (MPP) measurements are collected during the system's initial learning phase and used to parameterize the model automatically online by regression. Once the model has been parameterized and optimized, the condition monitoring system enters its normal operating phase, in which the performance model is used to predict the PV array's power production. The authors reported that, by comparing the predicted and measured PV array output power, the condition monitoring system based on the Sandia array performance model (SAPM) [92,93] could detect power losses in the PV array greater than 5%. According to the authors, compared with existing model-based condition monitoring systems, the method is unique in that it can take advantage of the I-V scanning capabilities of a new generation of commercial PV inverters [92].
Using the actual MPP extracted from the measured I-V curves, together with the ambient condition sensors, the system can generate an accurate performance model of the PV array during field operation. The authors highlighted the following advantages of the method: simple commissioning and operation requirements, potential applicability to a wide range of PV system configurations, and no need to model and test the PV array prior to installation. It does, however, require an initial commissioning phase, in which ambient conditions and array MPP measurements are used to commission the PV array model automatically. Figure 6 presents the learning (commissioning) phase of the condition monitoring system.

**Figure 6.** Condition monitoring systems' learning phase [92].
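The learn-then-compare idea behind [91] and [92] (fit an expected-power model on healthy data, then compare it with the measured power) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it fits a deliberately simplified linear model, P = G(a + bT), by ordinary least squares instead of the full SAPM used in [92], and the function names and the 5% default threshold (chosen to match the loss level reported for [92]) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical simplified performance model (NOT the SAPM):
#   P = G * (a + b * T)
# G: plane-of-array irradiance (W/m^2), T: module temperature (deg C), P: array power (W).


def fit_expected_power(samples: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Fit (a, b) by least squares from commissioning samples (G, T, P_measured)."""
    # Accumulate the normal equations for regressors x1 = G and x2 = G*T (no intercept).
    s11 = s12 = s22 = r1 = r2 = 0.0
    for g, t, p in samples:
        x1, x2 = g, g * t
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * p
        r2 += x2 * p
    det = s11 * s22 - s12 * s12
    if abs(det) <= 1e-9 * s11 * s22:
        raise ValueError("degenerate commissioning data: temperature must vary")
    # Solve the 2x2 system with Cramer's rule.
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def relative_power_loss(model: Tuple[float, float], g: float, t: float,
                        p_measured: float) -> float:
    """Return (expected - measured) / expected for one operating point."""
    a, b = model
    p_expected = g * (a + b * t)
    if p_expected <= 0:
        raise ValueError("expected power must be positive (check irradiance)")
    return (p_expected - p_measured) / p_expected


def loss_alarm(model: Tuple[float, float], g: float, t: float,
               p_measured: float, threshold: float = 0.05) -> bool:
    """Flag an operating point whose relative power loss exceeds the threshold."""
    return relative_power_loss(model, g, t, p_measured) > threshold
```

The commissioning samples would come from a fault-free learning phase such as the one in Figure 6; excluding very low irradiance points before fitting is likely worthwhile, since the relative loss becomes noisy when the expected power is small.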

Unlike [92], which used only a regression model, [94] used a combination of linear regression and artificial neural networks, together with solar irradiance, ambient temperature, and the maximum power point (MPP) characteristics of PV modules obtained from I-V tracers at the installation site, to predict the performance of soiled PV modules. In the first method, multiple linear models were created based on the time stamp of the last cleaning cycle, predicting the panel's output as a function of solar irradiance. In the second method, a neural network took the date, time, irradiance and, in some cases, temperature data as inputs to predict the output of the solar panel. Both methods predicted PV panel output with high accuracy.

Another regression-based fault detection model is presented in [95], which proposed a smart algorithm for the diagnosis and prognosis of reversed polarity faults in PV generators, using a hybrid of the support vector regression (SVR) technique optimized by a k-NN regression tool (k-NNR). The main contribution of the study is a smart prognosis algorithm that detects, locates, and characterizes reversed polarity faults at the cell, bypass, and string levels. To process the data linearly, the SVR uses a kernel function that transforms the data into a new, higher-dimensional space. To resolve cases in which the SVR output is undetermined, the study uses k-NNR to predict an approximate value for that output. The method was validated using 50 samples of a typical generator, and the results showed homogeneous and reliable predictions.
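To illustrate the k-NNR component of [95], the sketch below implements plain k-nearest-neighbor regression (the mean target of the k closest samples in Euclidean distance) together with a fallback rule. It is a sketch under stated assumptions rather than the method of [95]: no SVR is trained here, `svr_output` simply stands in for an SVR prediction, `None` is a hypothetical encoding of an undetermined output, and the feature vectors are left abstract.

```python
import math
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, target value)


def knn_regress(train: List[Sample], query: Sequence[float], k: int = 3) -> float:
    """k-NN regression: mean target of the k training samples nearest to `query`."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank stored samples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return sum(target for _, target in nearest) / k


def hybrid_predict(svr_output: Optional[float], train: List[Sample],
                   query: Sequence[float], k: int = 3) -> float:
    """Use the SVR output when it is determined; otherwise fall back to k-NNR.

    `svr_output` is a placeholder for a trained SVR model's prediction, and
    None is a hypothetical encoding of an "undetermined" SVR output.
    """
    return svr_output if svr_output is not None else knn_regress(train, query, k)
```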

**Table 2.** Summary of regression-based methods.


#### *4.3. Decision Tree-Based Methods*

This subsection reviews decision tree-based AI methods for fault detection and diagnosis. Table 3 summarizes the methods presented and outlines the main contribution of each study.

A number of decision tree-based fault detection and diagnosis methods are presented in [67,96,97]. In [67], failure detection routines (FDRs) were created that use datasets acquired from grid-tied PV systems to detect and classify faults accurately. The FDRs consist of two stages. The first is the failure detection stage, which combines a comparative algorithm that detects anomalies between the measured and simulated electrical measurements of the PV system with a statistical algorithm that identifies outliers, discrepancies, and the limits of normal system operation. The second is the failure classification stage, also called the decision stage, in which logic and decision trees are used to classify the detected failures. The comparative algorithm identified discrepancies between the measured and simulated dc-side electrical measurements, the latter obtained at each point using empirical parametric models. Significant deviations between the measured and simulated parameters, indicating a noteworthy performance gap, were classified as failures. The detection algorithm required as inputs the incident global irradiance on the POA, the PV system specifications, and the module temperature. The study also used a comparative method based on three-sigma limits, in which failures were recognized whenever the normal operating limits set by defined criteria were exceeded. Three-sigma limits (a statistical calculation referring to data within three standard deviations of the mean) are commonly used to set the upper and lower control limits in statistical quality control charts, and control charts were used here to establish and display the operating region and its boundaries. The normal operating limits of the PV monitoring system were calculated from the ratio of the measured to the simulated electrical data; based on the PV production model, this ratio indicates how close the measurements are to their calculated values, and the closer the ratio is to 1, the closer the measured parameters are to the modeled values. Since system performance depends on irradiance, an extra step was performed: to reduce the high bias errors that occur at low irradiance levels, the datasets were filtered to retain only global irradiance levels above 50 W/m². In the statistical failure detection approach, the local outlier factor (LOF) algorithm was used to find density-based local outliers. LOF compares each point's local density with that of its k-nearest neighbors (k-NN); if the point's density is significantly lower than that of its neighbors, the point lies in a sparser region and is considered an outlier. Moreover, outlier testing was performed using the Bonferroni outlier test, which, based on a linear regression model of the observed and simulated dc power of the array, returns the Bonferroni *p*-value for the most extreme observation. In addition, the seasonal hybrid extreme studentized deviate (S-H-ESD) algorithm was used to find anomalies in time series data that follow an approximately normal distribution.
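The ratio-based three-sigma check described for [67] can be sketched as below. The record layout (irradiance, measured dc power, simulated dc power), the use of a fault-free reference period to set the limits, and the function names are assumptions made for illustration; the 50 W/m² irradiance filter and the mean ± 3σ limits follow the description of [67].

```python
import statistics
from typing import List, Sequence, Tuple

Record = Tuple[float, float, float]  # (irradiance W/m^2, measured dc power, simulated dc power)


def power_ratios(records: Sequence[Record], min_irradiance: float = 50.0) -> List[float]:
    """Measured/simulated power ratios, discarding low-irradiance records."""
    return [m / s for g, m, s in records if g > min_irradiance and s > 0]


def three_sigma_limits(ratios: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper control limits: mean -/+ 3 sample standard deviations."""
    mu = statistics.mean(ratios)
    sigma = statistics.stdev(ratios)  # requires at least two ratios
    return mu - 3 * sigma, mu + 3 * sigma


def flag_failures(records: Sequence[Record], limits: Tuple[float, float],
                  min_irradiance: float = 50.0) -> List[Tuple[float, float]]:
    """Return (irradiance, ratio) for filtered records whose ratio leaves the limits."""
    lo, hi = limits
    flagged = []
    for g, m, s in records:
        if g > min_irradiance and s > 0:
            ratio = m / s
            if not lo <= ratio <= hi:
                flagged.append((g, ratio))
    return flagged
```

A ratio near 1 means the array tracks its model; the irradiance filter keeps low-light points, whose ratios are dominated by model bias, from triggering false alarms.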
The S-H-ESD technique extends the generalized extreme studentized deviate (ESD) algorithm by combining time series decomposition and robust statistical measures with ESD to detect both global and local anomalies [98]. For the classification of the detected failures, logic and decision trees were developed. The decision trees were produced using supervised learning and trained on the acquired datasets, which included the feature patterns observed during normal and faulty operation, with the continuous samples split 70:30 into training and test sets. The results demonstrated the accuracy of the proposed FDRs for three fault types, with accuracy rates of 98.7%, 95.3%, and 96.6% for inverter failures, bypass diode faults, and partial shading, respectively. The authors in [96] presented a decision tree-based fault detection and classification method for PV arrays with a simple and straightforward model training process. To create the decision tree model for an experimental PV system under both normal and faulty operating conditions, the authors used the PV array voltage, current, operating temperature, and irradiance as attributes of the training and test sets. The pre-processed training set was chosen at random from the collected experimental data and used to build the decision tree model in the WEKA software [99], after which the model was evaluated and validated on unseen real data. Fast training and classification, explicit interpretability, and straightforward implementation as an algorithm are all advantages of the proposed decision tree approach.
Another benefit of the model is that it can detect faults in real time, with detection accuracy ranging from 93.56% to 99.98% and classification accuracy ranging from 85.43% to 99.8%, depending on the size of the model.

The authors of [97] proposed a fault detection and diagnosis technique for grid-connected PV systems (GCPVS) based on the C4.5 decision tree algorithm (one of the most popular machine learning algorithms for classification problems [97]), in which a non-parametric model learned from data is used to predict the state of the GCPVS. The final dataset consists of three numerical attributes (ambient temperature, irradiance, and a power ratio calculated from the measured and estimated power) and two targets: the first labels the state as healthy or faulty for detection, and the second contains four class labels (fault-free, string fault, short-circuit fault, and line-line fault) for diagnosis. The dataset was split into two parts, with 66% used for learning and 34% for testing. Additional data were then collected over 5 days to assess the robustness, effectiveness, and efficiency of both models. Because the dataset is required for the learning process that constructs the decision tree, an acquisition system was developed to record and store climatic variables as well as electrical variables such as the current, voltage, and power at the MPP. The power ratio attribute is calculated from the power estimated by the Sandia model and the measured power of the GCPVS. The Sandia model is an empirical relationship used to estimate the power generated by a healthy system at the MPP from STC data. Since this model has unknown parameters, the flower pollination algorithm (FPA) is used to find the parameter values that minimize the root mean square error between the estimated Sandia output and the measured power. Because of the high correlation between the power ratio and the system state, a nominal attribute called target is assigned to each data instance as its class label, so that these faults can be predicted accurately. The construction phase involves two main steps. First, a splitting criterion is used to select the best attribute on which to split, and this step is repeated iteratively, growing the tree until all instances are classified or a stopping criterion is met. Then, once the tree model has been obtained, a pruning process removes unnecessary sub-trees to reduce overfitting, which also reduces model complexity by shrinking the tree. According to the test results, the detection model predicts with high accuracy, while the diagnosis model achieves an accuracy of 99.8%.
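The splitting-criterion step described for [97] can be illustrated with the gain ratio that C4.5 uses to score candidate splits. The sketch below is simplified and hypothetical: it evaluates only binary threshold splits on a single numeric attribute (for example, the power ratio), places thresholds at midpoints between consecutive distinct values, and omits C4.5's additional rules for numeric thresholds, missing values, and pruning.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values: Sequence[float],
                   labels: Sequence[str]) -> Optional[Tuple[float, float]]:
    """Best binary split `value <= t` on one numeric attribute, scored by gain ratio.

    Returns (threshold, gain_ratio), or None if no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # candidate thresholds lie only between distinct values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Information gain of the split, normalized by its split information.
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        split_info = entropy(["left"] * i + ["right"] * (n - i))  # > 0: both sides non-empty
        ratio = gain / split_info
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or ratio > best[1]:
            best = (threshold, ratio)
    return best
```

Growing a full tree would apply this selection recursively to each subset until the instances are classified or a stopping criterion is met, followed by pruning.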

**Table 3.** Summary of decision tree-based methods.


#### *4.4. Support Vector Machine-Based Methods*

This subsection reviews support vector machine (SVM)-based methods. Table 4 summarizes these methods and their contributions.

The authors of [100,101] proposed a system based on support vector machines (SVM) and k-NN tools to build a fault detection and diagnosis algorithm for PV generators. According to the authors, the algorithms are smart because they hybridize the SVM approach with the k-NN tool, which is used to improve the classification rate for observations lying on the classifier itself. The originality of the systems lies in the construction of a smart classifier from data collected from the control system, and in the identification and localization of short-circuits in cells, bypass diodes, and blocking diodes. First, the method uses the SVM algorithm, a two-class classification technique that seeks a hyperplane separating positive examples from negative ones with the maximum margin between the nearest positive and negative examples. This maximum margin helps the classifier generalize to new examples, which may not closely resemble those used to determine the hyperplane but should still fall on the correct side of the boundary. A benefit of this method is the selection of support vectors, the discriminating vectors that determine the hyperplane: only these support vectors are used to classify a new case, and the remaining examples used in the hyperplane search are no longer required. Second, k-NN, a simple and straightforward approach that does not require a training phase, compares new examples of unknown class with the stored examples in its database and assigns each new example to the majority class among its nearest neighbors. In summary, the method uses a Gaussian-type activation function for the SVM and the Euclidean distance between the gravity centers of the database observations for the k-NN method. The simulation results obtained with the proposed smart algorithm in both studies exhibit a high classification rate and a low error rate. However, the algorithms require a longer processing time because of their mathematical computations, and future work should therefore focus on improving this aspect.

The authors of [48] proposed an algorithm to improve the detection accuracy of line-to-line faults in PV arrays under a wide range of conditions, such as low irradiance, high impedance faults, and low mismatch faults. The algorithm is based on pattern recognition (multi-resolution signal decomposition (MSD)) and machine learning (a two-stage SVM classifier): the MSD technique extracts the feature space of line-to-line faults, while the SVM performs the decision making. The system does not require numerous sensors, since it uses only measurements of the overall voltage and current of the PV array, making it an economical and fast option. It detects line-to-line faults quickly and accurately, and it can be combined with fault location techniques so that faults can be cleared quickly. The MSD stage performs digital signal processing (DSP) that allows simultaneous time and frequency analysis of a signal, for example the analysis of both the stationary and transient components of a power quality disturbance. The SVM stage, in turn, improves accuracy. The two-stage SVM is a binary classifier that requires training with only a small amount of historical data from the tested PV system.
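The multi-resolution signal decomposition stage of [48] can be illustrated with a discrete wavelet transform. The description of [48] does not state which wavelet or which features are used, so the sketch below assumes the Haar wavelet and uses the energy of the detail coefficients at each level (plus the final approximation energy) as a hypothetical feature vector that a classifier such as an SVM could consume.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform; a trailing odd sample is dropped."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]  # low-pass (smooth) part
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]  # high-pass (transient) part
    return approx, detail


def msd_features(signal: Sequence[float], levels: int = 3) -> List[float]:
    """Detail-coefficient energy at each level, followed by the final approximation energy."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features
```

Because the orthonormal Haar transform preserves energy, the features sum to the signal's total energy whenever the signal length is divisible by 2 raised to the number of levels; abrupt transients appear mainly in the fine-scale detail energies, while slow changes stay in the coarse levels.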
The authors of [48] suggest that their method is not limited to line-to-line faults and could be used to detect other PV system faults.

A method for detecting faults and monitoring the state of PV modules using a two-class data fusion approach is introduced in [102]. The approach combines monitoring data, transmitted from sensor nodes in wireless sensor networks (WSNs) to a monitoring center, with a new semi-supervised SVM classifier that was devised and trained using the monitoring center's existing solar irradiance big data. The monitoring center was created to access monitoring data from various PV power stations, and multiple applications were developed to use the system on different platforms. A wireless monitoring subnetwork was created to retrieve key data from PV modules in power stations, such as current, voltage, and temperature. A sink node fused the monitoring data received from each sensor with sunlight intensity information, and the fusion results were sent to the monitoring center over the Internet. The data received from the sink node were parsed using the data access interface, and the parsed data were double-checked using the outlier detection technique. Regular data were stored in the private Cloud by the Cloud management module, which was also responsible for secure data transmission between the Cloud and the applications. The authors built the semi-supervised SVM classifier using historical sunlight intensity monitoring data, and it was employed in the outlier identification and solar power forecasting algorithms. Using the prediction model provided by the trained classifier, an outlier identification technique identifies and locates PV module faults by computing the average value of the problematic data.
Furthermore, the authors employed a novel application of the PGKA technique to secure data transmission between the Cloud and the applications; an apparent benefit is that this approach does not require third-party certification to maintain file encryption and encryption keys. Although this research addresses crucial challenges of PV power stations, there is still a long way to go in gathering PV power station data properly and intelligently.
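The k-NN part of the hybrid classifier in [100,101] (majority vote among the nearest stored examples, using Euclidean distance) can be sketched as follows. The SVM stage with its Gaussian function and the exact hybridization rule of [100,101] are not reproduced, and feature vectors such as normalized array voltage and current are a hypothetical choice.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_classify(database: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Assign `query` to the majority class among its k nearest stored examples."""
    if not 1 <= k <= len(database):
        raise ValueError("k must be between 1 and the size of the database")
    # No training phase: each query is compared directly with the stored examples.
    nearest = sorted(database, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

An odd k avoids tied votes in two-class problems; the cost of sorting every stored example per query is one reason such hybrid classifiers can be slow on large databases.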


**Table 4.** Summary of SVM-based methods.

#### *4.5. Neuro-Fuzzy-Based Methods*

This subsection reviews AI methods of fault detection and diagnosis based on the neuro-fuzzy technique. Table 5 summarizes the methods and their contributions.

The authors of [103] proposed an intelligent system for automatic fault detection in PV fields based on a Takagi-Sugeno-Kang fuzzy rule-based system (TSK-FRBS) [104]. The method analyzes the voltages and currents recorded by a PV plant's inverter. The TSK-FRBS acts as a power estimator module: it estimates the instantaneous power that the PV field should produce under normal operating conditions (using temperature and irradiance input signals to assess the DC power the plant should deliver) and compares it with the measured power. If the two power values differ significantly, an alarm signal is issued. In this context, the TSK-FRBS has two advantages: it can describe complex system behavior without requiring a mathematical model, and it can handle noisy and vague data. Figure 7 shows the schematics of the proposed intelligent system connected to a multi-array inverter PV plant; it consists of a data acquisition module, a detection module, and a diagnosis module. The acquisition module measures the temperature and solar irradiance at the PV plant in real time and extracts the DC current and voltage of each array from the corresponding MPPT. Temperature and solar irradiance can be measured with sensors mounted in the PV field or obtained from a remote database linked to a weather station. The acquisition module then feeds these data to the detection module, which estimates the DC power each array should output in the absence of faults. The detection module compares the estimated and measured powers and, if the difference exceeds a threshold, generates an alert signal for each array that produced less power than estimated.
Finally, the alert signals are sent to the diagnosis module, which can automatically provide information on the type of fault that occurred.

**Figure 7.** Proposed intelligent system's schematics [103].
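The power estimator of [103] relies on TSK fuzzy inference, in which each rule's consequent is a function of the inputs and the output is the firing-strength-weighted average of those functions. The sketch below shows first-order TSK inference with irradiance and temperature inputs, followed by a threshold-based per-array alert; the Gaussian membership functions, their widths, the rule base, and the 10% threshold are hypothetical, since the rule base of [103] is not given here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical rule: (irradiance centre, temperature centre, (c0, cG, cT)), meaning
# IF G is near g_c AND T is near t_c THEN P = c0 + cG * G + cT * T.
Rule = Tuple[float, float, Tuple[float, float, float]]


def gauss(x: float, centre: float, width: float) -> float:
    """Gaussian membership degree of x in a fuzzy set centred at `centre`."""
    return math.exp(-0.5 * ((x - centre) / width) ** 2)


def tsk_power(rules: List[Rule], g: float, t: float,
              g_width: float = 250.0, t_width: float = 15.0) -> float:
    """First-order TSK inference: firing-strength-weighted average of rule outputs."""
    num = den = 0.0
    for g_c, t_c, (c0, c_g, c_t) in rules:
        w = gauss(g, g_c, g_width) * gauss(t, t_c, t_width)  # product t-norm for AND
        num += w * (c0 + c_g * g + c_t * t)
        den += w
    if den == 0.0:
        raise ValueError("operating point not covered by the rule base")
    return num / den


def arrays_to_alert(rules: List[Rule], g: float, t: float,
                    measured: Sequence[float], threshold: float = 0.1) -> List[int]:
    """Indices of arrays whose measured power falls short of the estimate by > threshold."""
    expected = tsk_power(rules, g, t)
    return [i for i, p in enumerate(measured) if expected - p > threshold * expected]
```

In a real deployment the consequent coefficients would be identified from fault-free plant data; here they are placeholders.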

The authors of [105] noted that numerous studies have proposed methods for shadow detection and array reconfiguration, and that most of these methods rely on voltage, current, and power information, which they found time-consuming and tedious to monitor. They therefore presented a novel and effective shadow detection method for the reconfiguration process, based on fuzzy logic and computer vision, to help increase the energy production of PV arrays. The method detects the edges of object regions on the panel in images taken with a camera. Using background subtraction and object detection, it first converts the background and foreground images to grayscale. Object edge detection is performed after the background is updated by a mechanism based on background subtraction and illumination variability. First, dilation and erosion operations, which reduce image noise, are applied to the binary mask to determine the edges of the object regions. Next, the edges of the objects in the binary image are detected using the Canny edge detection approach. The last step in determining the object boundaries uses search and draw contour procedures to identify the related object regions, and a relevant pixel region is created for each region whose edge information has been identified. Finally, the method uses a fuzzy decision-making mechanism, with the brightness and color distortion values of each object region as input parameters, to classify whether the region is a shadow. When its output is used as an input parameter for the reconfiguration operations, the proposed method achieves a success rate of 98% and increases energy usage performance by roughly 10–15%.
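The fuzzy decision step of [105] can be sketched with a single rule over its two inputs, brightness distortion and color distortion. Everything beyond those two inputs is an assumption: brightness distortion is taken as the ratio of a region's brightness to the background (below 1 means darker), color distortion as a non-negative chromaticity distance, and the trapezoidal membership breakpoints and the 0.5 decision cut are illustrative values.

```python
def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership with a < b <= c < d: 0 outside (a, d), 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)  # falling edge


def shadow_degree(brightness_distortion: float, color_distortion: float) -> float:
    """Degree of 'IF region is darker AND color is unchanged THEN shadow' (min t-norm)."""
    # Hypothetical breakpoints: a shadow darkens the background (distortion < 1)
    # without shifting its color much.
    darker = trapezoid(brightness_distortion, 0.2, 0.4, 0.8, 0.95)
    color_unchanged = trapezoid(color_distortion, -1.0, 0.0, 0.05, 0.15)
    return min(darker, color_unchanged)


def is_shadow(brightness_distortion: float, color_distortion: float,
              cut: float = 0.5) -> bool:
    """Classify a region as shadow when the rule's degree reaches the cut."""
    return shadow_degree(brightness_distortion, color_distortion) >= cut
```

A region that is much darker and also changes color (for example, debris on the panel) fails the second condition, which is the intuition behind using both inputs.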
The arc fault detection technique described in [106] has low computational requirements and can work with most of the conventional analog-to-digital converters (ADCs) found in microcontrollers, making it suitable for PV applications. The algorithm performs better when integrated into an inverter than when used as a standalone device, and this integration is also a cost-effective way to detect arc faults and improve the long-term safety of PV systems. The short- and long-term measurement results are promising, but further long-term experiments are needed to fine-tune the device for phenomena that have not yet been identified. The detection algorithm uses three indicators as fault detection parameters, namely frequency analysis, peak detection, and observation of the operating point. When an arc fault occurs, the indicators display a characteristic behavior, but this behavior is not always the same: the signal energy sometimes increases slowly and sometimes quickly, and the trip is apparent for an arc fault in a small plant but not in a large one. Because of this vagueness, the authors employed fuzzy logic to detect arc faults; fuzzy logic also makes it easier to incorporate the experience of experts who are unfamiliar with the algorithm. To keep an overview of the rules and input variables, four sub-detectors are created, followed by a master fuzzy arc fault detector (MFAFD). These sub-detectors are the peak evaluator sub-detector (PESD), which analyzes all peaks and delivers a mass output proportional to the probability that a peak stems from an arc fault; the window near sub-detector (WNSD), which analyzes the change in signal energy over a short timeframe; the window wide sub-detector (WWSD), which analyzes the long-term signal energy and can be used when there is no abrupt growth in signal energy; and the power analyzer sub-detector (PASD), which monitors power changes.
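The two energy-based sub-detectors of [106] can be illustrated by the indicators they analyze: the change in signal energy over a short timeframe (WNSD) and the energy over a longer window (WWSD). The sketch below computes both from a sampled signal; the window lengths, the use of the mean squared value as the energy measure, and the function names are assumptions, and the fuzzy rules, the PESD and PASD sub-detectors, and the MFAFD are not reproduced.

```python
from typing import Sequence, Tuple


def window_energy(x: Sequence[float], end: int, size: int) -> float:
    """Mean squared value of the `size` samples ending just before index `end`."""
    segment = x[max(0, end - size):end]
    return sum(v * v for v in segment) / len(segment) if segment else 0.0


def energy_indicators(x: Sequence[float], end: int, short_size: int = 4,
                      long_size: int = 16) -> Tuple[float, float]:
    """Return (short-window energy change, long-window energy) at position `end`.

    The first value compares the latest short window with the one before it
    (abrupt growth, the WNSD idea); the second averages over a longer window
    (slow growth, the WWSD idea).
    """
    if not 2 * short_size <= end <= len(x):
        raise ValueError("need two full short windows before `end`")
    near_now = window_energy(x, end, short_size)
    near_before = window_energy(x, end - short_size, short_size)
    return near_now - near_before, window_energy(x, end, long_size)
```

A sudden burst raises the short-window change sharply, whereas a slowly growing arc signature is more visible in the long-window energy, which is why the two indicators complement each other.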
In [106], the outputs of the four sub-detectors serve as inputs to the MFAFD, which outputs a number between 0 and 1 representing the mass for the arc fault probability; an arc fault is detected if this probability exceeds a predefined threshold.

The authors of [107] presented a method for detecting increases in series resistance using a fuzzy classifier that can distinguish between increased series losses and partial shading conditions for irradiance levels above 400 W/m². As shown in Figure 8, the diagnostic system also implements an optional shadow detection algorithm that acts before the increased series losses detection system and can improve the system's detection accuracy. This strategy is especially significant because increased series losses and partial shading are difficult to distinguish, as both reduce a PV system's peak power output and fill factor. Rather than under controlled laboratory conditions, the study focuses on estimating the increased series resistance in the field. The method was tested using experimental measurements and showed good detection rates across a wide range of irradiance levels and in the presence of partial shadows of diverse sizes and patterns. Moreover, the authors showed that a dedicated partial shadow detection algorithm, implemented in the diagnostic system and operating before the increased series losses detection method, improves the overall detection accuracy of the system.

**Figure 8.** Proposed diagnostic system of partial shadow detection, increased series losses detection, and advanced system analysis and monitoring for the PV module [107].

Under low irradiance conditions, DC-side short-circuit faults in PV arrays consisting of multiple PV panels connected in series/parallel are nearly undetectable, especially when an MPPT algorithm is in use. If they go undetected, these faults can significantly reduce the output energy of PV systems, damage the panels, and potentially cause fire hazards. To address this, the authors of [108] presented a fault detection scheme based on a pattern recognition approach, in which a multi-resolution signal decomposition technique extracts the required features and a fuzzy inference system then assesses whether a fault has occurred. The system's inputs are the PV array output voltage and current and the solar irradiance, from which the multi-resolution signal decomposition technique extracts four distinct features. These features are then fed into the fuzzy inference system, which generates a scalar output based on carefully designed membership functions and the associated rule base, and the magnitude of this output determines whether a line-to-line fault, a line-to-ground fault, or neither has occurred. The performance of the method was demonstrated in simulation- and experiment-based case studies. Fault identification becomes increasingly difficult as the percentage of fault impedance or mismatch declines. The scheme also showed one unintended operation in a typical case in which the irradiance changed drastically within a short period of time, which is unusual in real-world systems. The experimental results support the promising performance of the proposed algorithm. The method is designed to detect line-to-line and line-to-ground faults, whereas open circuits and hot-spot heating, for example, are further types of PV array faults.
Therefore, algorithms suited to a more precise classification of additional PV faults could be applied in parallel with, or in later phases of, the detection system.

Fuzzy logic-based algorithms are presented in [109,110] to detect malfunctioning PV modules and partial shading conditions affecting the DC side of grid-connected PV systems. The algorithm consists of six layers that operate sequentially. The first layer contains the input parameters (solar irradiance and module temperature). The second layer generates the theoretical performance of the grid-connected PV system using the LabVIEW virtual instrumentation software. The third layer computes the power and voltage ratios, and the fourth layer sets the high and low detection limits by applying a 3rd-order polynomial regression model to the power and voltage ratios. The fifth layer contains the input parameters of the examined grid-connected PV systems together with the 3rd-order polynomial detection limits. If the measured voltage ratio versus the measured power ratio falls outside the detection limits, the data are processed by the sixth and final layer, which contains the fuzzy logic classification system. The novelty of the method lies in providing a simple, dependable, and fast fuzzy logic classification system that can be employed with a variety of grid-connected PV systems, and the algorithm is also distinctive in that it is based on fluctuations in the voltage and power of the grid-connected PV system.

According to the study by Sufyan Samara and Emad Natsheh [65], few fault diagnosis techniques can be implemented on integrated circuits, and those that can require expensive and complicated hardware. To address this problem, the authors introduced an effective and implementable fault diagnosis approach based on an AI nonlinear autoregressive exogenous (NARX) neural network and Sugeno fuzzy inference. The NARX network estimates the PV system's maximum output power from the real-time measured output and the surrounding conditions, and the Sugeno fuzzy inference algorithm then uses this estimate to detect, isolate, and classify faults that may develop in the PV system. The algorithm has been demonstrated on a low-cost microcontroller and can detect a variety of faults in the PV system, including open- and short-circuit degradation, faulty MPPT, and partial shading (PS) issues. Furthermore, the algorithm can capture non-linear patterns between predictors, such as irradiance and temperature, as well as other non-linear correlations between them, to calculate the precise maximum power of the PV system.
Fuzzy inference requires the actual sensed PV system output power, the predicted PV system output power, and the sensed surrounding conditions, with the output power predicted by the AI NARX-based neural network. The authors concluded that the ability of the method to diagnose several PV system faults efficiently is an important step toward a complete system capable of diagnosing faults in large PV systems.
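The detection-limit layers of [109,110] can be sketched as a cubic fit of the voltage ratio against the power ratio on fault-free data, with points falling outside the resulting band handed to the fuzzy classification layer. How [109,110] derive the separate high and low limits is not detailed here, so shifting the fitted cubic up and down by a constant margin is an assumption, as are the function names; the fuzzy sixth layer is not reproduced.

```python
from typing import List, Sequence, Tuple


def poly3(coeffs: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + c2*x^2 + c3*x^3 with Horner's scheme."""
    c0, c1, c2, c3 = coeffs
    return ((c3 * x + c2) * x + c1) * x + c0


def fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares cubic fit via the 4x4 normal equations (needs >= 4 distinct xs)."""
    a = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    # Gaussian elimination with partial pivoting.
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 4):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * 4
    for i in range(3, -1, -1):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, 4))) / a[i][i]
    return coeffs


def detection_limits(coeffs: Sequence[float],
                     margin: float) -> Tuple[List[float], List[float]]:
    """Low/high limit curves: the fitted cubic shifted down/up by a constant margin."""
    low = [coeffs[0] - margin, *coeffs[1:]]
    high = [coeffs[0] + margin, *coeffs[1:]]
    return low, high


def outside_limits(power_ratio: float, voltage_ratio: float,
                   low: Sequence[float], high: Sequence[float]) -> bool:
    """True if the point (power ratio, voltage ratio) falls outside the detection band."""
    return not poly3(low, power_ratio) <= voltage_ratio <= poly3(high, power_ratio)
```

Points flagged by `outside_limits` would then be passed to the fuzzy classification layer to decide between, for example, faulty modules and partial shading.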


**Table 5.** Summary of neuro-fuzzy-based methods.
