1. Introduction
Over the last few years, photovoltaic (PV) system technologies have developed rapidly, with a meaningful impact on electrical co-generation systems. Moreover, they are regarded as one of the most significant sources of renewable energy [1, 2]. The grid-connected PV (GCPV) system is considered the most relevant PV system; its main components are the photovoltaic array, the inverter, and the grid [3]. Many researchers are searching for approaches to enhance the performance of GCPV systems through the improvement of their maximum power point tracking (MPPT). However, GCPV systems can be exposed to several faults that cause performance degradation or even a complete breakdown of the system, leading to long downtime and maintenance periods. Generally, faults significantly affect a system’s availability, production rate, and cost of maintenance.
Thus, early fault detection and diagnosis (FDD) is an effective way to improve the operation of these systems and to decrease maintenance costs [4]. The FDD procedure comprises three principal steps: feature extraction, feature selection, and fault classification. Classification-based FDD can in turn be divided into two subclasses: multiclass and one-class classification. In multiclass classification-based FDD, a dataset is classified into several classes, including healthy and faulty groups. One-class classification, on the other hand, distinguishes samples belonging to a particular class from the rest of the data by learning from a training set that comprises only data from that class [5].
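The difference between the two classification settings can be illustrated with a minimal sketch. It is not the classifier used in this paper: the nearest-centroid rule, the distance-threshold one-class rule, the `margin` parameter, the function names, and the two-feature toy samples (hypothetical normalized current and voltage) are all assumptions chosen for brevity.

```python
import math
import statistics


def fit_centroids(samples, labels):
    """Multiclass setting: learn one centroid per labelled class."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [statistics.fmean(col) for col in zip(*xs)]
            for y, xs in groups.items()}


def predict_multiclass(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def fit_one_class(target_samples, margin=1.5):
    """One-class setting: learn from samples of a single class only.

    The model is the class centre plus an acceptance radius
    (margin times the largest training distance; an assumed rule).
    """
    centre = [statistics.fmean(col) for col in zip(*target_samples)]
    radius = margin * max(math.dist(centre, x) for x in target_samples)
    return centre, radius


def predict_one_class(model, x):
    """Return True if x is accepted as a member of the learned class."""
    centre, radius = model
    return math.dist(centre, x) <= radius


# Hypothetical toy features: [normalized current, normalized voltage]
healthy = [[1.00, 1.00], [0.98, 1.01], [1.02, 0.99]]
fault_a = [[0.60, 1.00], [0.62, 0.98]]
fault_b = [[1.00, 0.50], [0.97, 0.52]]

samples = healthy + fault_a + fault_b
labels = ["healthy"] * 3 + ["fault_a"] * 2 + ["fault_b"] * 2

centroids = fit_centroids(samples, labels)      # needs all classes
healthy_model = fit_one_class(healthy)          # needs healthy data only

new_sample = [0.63, 1.00]
multiclass_label = predict_multiclass(centroids, new_sample)
is_healthy = predict_one_class(healthy_model, new_sample)
```

The multiclass model names the fault but needs labelled examples of every class, whereas the one-class model only reports whether a sample belongs to the class it was trained on.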
FDD procedures are usually divided into two major classes: the model-based approach and the data-driven approach [6]. Model-based FDD demands a precise mathematical model, which is notably hard to obtain in the real world [7, 8]. Data-driven approaches, in contrast, seek to extract the most relevant information from the measured signals in order to train a model, which is then used in the testing phase for FDD purposes.
Recently, to improve the reliability and performance of PV systems, several machine learning (ML)-based FDD approaches have been suggested in the literature [2]. The commonly used ML techniques are random forest (RF) [9, 10], artificial neural networks (ANN) [11, 12], support vector machine (SVM) [13, 14], decision tree (DT) [1, 15], and K-nearest neighbors (KNN) [16, 17]. In [18], the authors proposed an approach for detecting and classifying line-to-line and line-to-ground faults at the DC side of PV systems. Their procedure uses fault detection and classification tools to monitor PV systems under healthy and faulty conditions. At first, the faults are grouped through a hierarchical classification platform. Then, ML techniques are applied to detect and classify the line-to-line and line-to-ground faults. The approach aims to achieve data reduction and high accuracy under a low mismatch degree and high fault impedance in comparison with other fault diagnostic techniques. The obtained findings demonstrate the effectiveness of the procedure: it accurately detects and classifies the studied faults over several situations and achieves accuracies of 96.66% and 91.66%. The authors of [19] proposed a new fault diagnostic approach comprising two major steps. First, the most informative features are extracted by analyzing current–voltage (I–V) characteristics under different operating conditions (healthy mode and different line-to-line (LL) fault incidents). Then, a genetic algorithm (GA) is applied to select features and optimize the kernel functions used in the SVM technique. Unlike previous research, the procedure requires only a small sample of data, and it achieves an average accuracy of 97.5% when dealing with LL fault events under low mismatch and high impedance levels. In [16], a real-time FDD technique based on K-nearest neighbors for PV systems is investigated, dealing with four types of faults: line-to-line faults, open circuit faults, partial shading with and without bypass diode faults, and partial shading with inverted bypass diode faults. In that study, a detailed model of the PV system is provided through experimental data, using only the information available in the manufacturer’s data sheet under standard test conditions and normal operating cell temperatures. The simulation results show that the error between the developed model and the measured samples is lower than that of the models available in the literature. Finally, the proposed FDD technique reaches an average accuracy of 98.70% on the dataset collected from the proposed model and the experimental setup. The authors of [20] proposed an improved FDD scheme using principal component analysis (PCA)-based supervised ML (SML). It involves two principal stages: feature extraction and selection through PCA, and decision making using SML classifiers. To provide a comprehensive study, various types of faults are discussed in that work, including inverter faults, grid connection faults, sensor faults, and PV panel faults. The results demonstrate the efficiency and feasibility of the proposed procedure, which reaches an average accuracy of 99.49%.
In [10], the authors developed an improved classifier based on an RF model for diagnosing faults in grid-connected PV systems. The proposed procedure is composed of two steps: feature extraction and selection, followed by fault classification. As a first step, the training data are reduced through two techniques: Euclidean distance and K-means clustering. After that, the most relevant features are extracted and selected, and then fed to an RF classifier. The obtained results show the high classification accuracy of the developed approach, which reaches 100%. The authors of [1] proposed an approach for FDD in the GCPV system based on the DT method. That work deals with three types of faults: string faults, short circuit faults, and line-to-line faults. The first goal is to detect the fault’s occurrence and the second is to classify the different operating modes. The obtained results demonstrate high detection performance along with a diagnosis accuracy reaching 99.80%. In [21], the authors developed a robust and low-cost shading fault detection and classification approach for PV systems. The proposed strategy comprises two steps. At first, the authors built a database of features extracted from different experimental tests under healthy and shading situations. Then, the features were analyzed through PCA. The approach reached a classification rate of over 97% with the studied configurations. However, the fault diagnosis approaches mentioned above were developed under a fixed irradiance level, and variations in this variable are not taken into account.
On the other hand, several researchers have addressed the development of diagnostic solutions that account for variations in climatic conditions. For example, the authors of [22] proposed an approach for the fault diagnosis of PV systems which aims to identify the number of PV modules exhibiting short or open circuits under uniform and non-uniform irradiance operating conditions. In [23], a fault-detection procedure based on multi-resolution signal decomposition (MSD) and a fuzzy inference system (FIS) is developed. The MSD method is applied to extract features from the PV array output current and voltage and the solar irradiance, which are then fed into an FIS for decision making. The authors of [24] proposed a simple, sensorless tracking technique that uses MPPT to detect line-to-line and line-to-ground faults in a PV array. The proposed approach can also detect faults at low irradiance levels and under partial shading conditions with high accuracy. In [25], the authors proposed a fault diagnosis approach for PV systems based on metaheuristic optimization. This approach is capable of identifying and locating open and short-circuited modules in a PV array under non-uniform irradiance and temperature distribution. These studies considered irradiance variations, but only at a few operating points.
Indeed, irradiance varies rapidly depending on the environmental conditions, and the power provided by PV arrays is strongly related to this irradiance. Data accessible via NOAA’s Surface Radiation (SURFRAD) network [26] are used here to demonstrate the high variability of the irradiance. The SURFRAD data are obtained from a seven-station network in the continental U.S. that measures a set of climatic parameters including wind speed, wind direction, relative humidity, air pressure, time of day, air temperature, and solar zenith angle.
Figure 1 shows the variation of irradiance over three days in Bondville, Illinois, United States, in July 2022. These observational data demonstrate that the irradiance changes from one day to the next and, predominantly, within the daytime hours.
Under practical conditions, a PV system in healthy operating mode may, at a specific irradiance level, behave as if it were under faulty conditions, and vice versa. When a healthy operating mode is misclassified as a faulty one, maintenance is triggered without any real justification, wasting time and money. Conversely, when a serious fault (such as a line-to-line fault) occurs but is misclassified as a healthy scenario, no alarm is raised, which can lead to real damage to the GCPV system.
Therefore, to address these issues, we propose in this paper a new GCPV system-based model in which the irradiance variations are introduced into the dynamic GCPV modeling in order to highlight the impact of variations in climatic conditions. To accomplish this, a machine learning (ML)-based PCA technique is investigated. Multivariate and statistical features are extracted from GCPV measurements using the PCA model, and the most relevant and accurate features are selected. Then, ML techniques are applied to the final features, through two classification types (multiclass and one-class), to distinguish between the various faults that can occur during GCPV operation under irradiance variations.
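As a rough illustration of the PCA-based feature extraction step, the sketch below centers a small measurement matrix, estimates its covariance, and obtains the first principal component by power iteration. It is a pure-Python sketch under stated assumptions, not the implementation used in this paper: the function names, the two-variable toy data, and the choice of power iteration (a library eigendecomposition would normally be used) are all hypothetical.

```python
import math


def center(rows):
    """Subtract the column means; return centered rows and the means."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows], means


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def leading_component(cov, iters=200):
    """First principal direction via power iteration.

    Assumes a nonzero covariance matrix and a start vector that is
    not orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, means, component):
    """Scores of each row along the given component (the new feature)."""
    return [sum((v - m) * c for v, m, c in zip(row, means, component))
            for row in rows]


# Hypothetical toy measurements: two perfectly correlated variables
measurements = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

centered, means = center(measurements)
component = leading_component(covariance(centered))
scores = project(measurements, means, component)
```

Here the two variables collapse onto a single score per sample, which is the kind of reduced feature that would then be passed to a classifier.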
To highlight the main challenge, three operating modes are investigated under different irradiance levels. The healthy operating mode is run under an irradiance level of 400 W/m², while the two faulty operating modes are injected under an irradiance level of 550 W/m². This condition is evaluated using two strategies (traditional and proposed). After demonstrating the abilities of the novel strategy, several fault scenarios (e.g., simple faults on one PV array and mixed faults on both sides) are examined over a range of irradiance levels [Low, High] to better assess the robustness of the proposal.
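The reason these two irradiance levels make a challenging test case can be seen with a deliberately simplified calculation. The sketch assumes that array output scales linearly with irradiance and that a fault removes a fixed fraction of the output; the rated power, the loss fraction, the alarm threshold, and all function names are hypothetical and are not taken from the GCPV model of this paper. The irradiance-normalized indicator is shown only to make the ambiguity visible; it is not the proposed strategy.

```python
import math

G_REF = 1000.0  # W/m², standard test condition irradiance


def pv_power(irradiance, rated_power=1000.0, loss=0.0):
    """Toy model: output proportional to irradiance, reduced by a fault loss."""
    return rated_power * (irradiance / G_REF) * (1.0 - loss)


def naive_alarm(power, threshold=450.0):
    """Irradiance-blind rule: raise an alarm when power drops below a fixed level."""
    return power < threshold


def performance_ratio(power, irradiance, rated_power=1000.0):
    """Measured power divided by the power expected at this irradiance."""
    return power / (rated_power * irradiance / G_REF)


# Healthy array at 400 W/m² versus a faulty array at 550 W/m²
# (the loss fraction is chosen so that both deliver the same power).
healthy_power = pv_power(400.0)
faulty_power = pv_power(550.0, loss=150.0 / 550.0)

same_output = math.isclose(healthy_power, faulty_power)
same_alarm = naive_alarm(healthy_power) == naive_alarm(faulty_power)

healthy_ratio = performance_ratio(healthy_power, 400.0)
faulty_ratio = performance_ratio(faulty_power, 550.0)
```

Under these assumptions the two cases are indistinguishable from power alone, while they separate once the irradiance level is taken into account.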
The rest of the paper is outlined as follows: Section 2 details the proposed fault detection and diagnosis procedure using ML-based PCA. The performance of the proposed fault diagnosis strategy is discussed in Section 3. Finally, Section 4 presents the conclusions of the paper.