One-Class Machine Learning Classifiers-Based Multivariate Feature Extraction for Grid-Connected PV Systems Monitoring under Irradiance Variations

Yahyaoui, Zahra; Hajji, Mansour; Mansouri, Majdi; Bouzrara, Kais

doi:10.3390/su151813758


¹ Research Unit Advanced Materials and Nanotechnologies, Higher Institute of Applied Sciences and Technology of Kasserine, Kairouan University, Kairouan 3100, Tunisia

² Electrical and Computer Engineering Program, Texas A&M University at Qatar, Doha 23874, Qatar

³ Laboratory of Automatic Signal and Image Processing, National Engineering School of Monastir, Monastir 5019, Tunisia

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(18), 13758; https://doi.org/10.3390/su151813758

Submission received: 6 April 2023 / Revised: 18 July 2023 / Accepted: 26 July 2023 / Published: 15 September 2023

(This article belongs to the Section Energy Sustainability)


Abstract

In recent years, photovoltaic (PV) energy production has grown rapidly, which has motivated the search for more effective operation. Nevertheless, various PV faults may appear, leading to different stages of degradation. Furthermore, under different irradiance levels, these faults may be misclassified as a healthy mode owing to the strong resemblance between them, causing serious problems in terms of power losses and maintenance costs. Hence, incorporating irradiance variation into the modeling of grid-connected PV (GCPV) systems is important for monitoring tasks, in order to ensure the effective operation of these systems, increase their reliability, and prevent false alarms. Therefore, in this paper, a fault detection and diagnosis (FDD) method for GCPV systems using machine learning (ML) based on principal component analysis (PCA) is proposed in order to ensure the reliability and security of the whole system under irradiance variations. The proposed strategy consists of three main steps: (i) introducing the irradiance variations into the PV system modeling because of their great impact on power production; (ii) feature extraction and selection through PCA; and (iii) fault classification using ML techniques. In this study, we generate a database that is used to compare the proposed strategy with the standard strategy (which considers a fixed irradiance during FDD), in order to provide a complete comparative assessment of fault diagnosis and to demonstrate the efficiency of the proposed strategy. The results show the high effectiveness of the proposed one-class classification-based approach in detecting and diagnosing PV array anomalies, reaching an accuracy of up to 99.68%.

Keywords:

irradiance variations; grid-connected PV system; feature extraction and selection; fault classification; machine learning

1. Introduction

Over the last few years, photovoltaic (PV) system technologies have developed rapidly, with a meaningful impact on electrical co-generation systems. Moreover, they are regarded as one of the most significant sources of renewable energy [1,2]. The grid-connected PV (GCPV) system is considered the most relevant PV system; its main components are the photovoltaic array, the inverter, and the grid [3]. Many researchers are seeking approaches to enhance the performance of GCPV systems by improving their maximum power point tracking (MPPT). However, GCPV systems can be exposed to several faults that cause performance degradation or even a complete breakdown of the system, leading to long downtime and maintenance periods. Generally, faults significantly affect the systems’ availability, production rate, and cost of maintenance.

Thus, early fault detection and diagnosis (FDD) is an effective way to improve the operation of these systems and to decrease maintenance costs [4]. The FDD procedure comprises three principal steps: feature extraction, feature selection, and fault classification. Classification-based FDD can itself be divided into two subclasses: multiclass and one-class classification. In multiclass classification-based FDD, a dataset is classified into several classes, including healthy and faulty groups. The one-class classification approach, on the other hand, distinguishes samples belonging to a particular class from the rest of the data by learning from a training set that comprises only data from that class [5].

FDD procedures are usually divided into two major classes: model-based approaches and data-driven approaches [6]. Model-based FDD demands a precise mathematical model, which is notably hard to obtain in the real world [7,8]. Data-driven approaches, in contrast, seek to extract the most relevant information from the measured signals in order to train a model, which is then used in testing for FDD purposes.

Recently, to improve the PV system’s reliability and performance, several machine learning (ML)-based FDD approaches have been suggested in the literature [2]. The commonly used ML techniques are random forest (RF) [9,10], artificial neural networks (ANN) [11,12], support vector machine (SVM) [13,14], decision tree (DT) [1,15], and K-nearest neighbors (KNN) [16,17]. In [18], the authors proposed a fault detection and classification approach of line-to-line and line-to-ground at the DC side of PV systems. Their proposed procedure utilizes fault detection and classification tools for monitoring PV systems over healthy and faulty conditions. At first, the faults are grouped through the hierarchical classification platform. Then, ML techniques are applied in order to detect and classify faults (line-to-line and line-to-ground faults). The proposed approach aims to obtain data reduction and high accuracy with a low mismatch degree and great fault impedance in comparison with other fault diagnostic techniques. The obtained findings demonstrate the effectiveness of the proposed procedure; the procedure accurately detects and classifies the studied faults over several situations and achieves accuracies of 96.66% and 91.66%. Authors in [19] proposed a new fault diagnostic approach including two major steps: First, the most informative features are extracted by analyzing current–voltage (I–V) characteristics under different operating conditions (healthy mode and different line-to-line (LL) fault incidents). Then, a genetic algorithm (GA) is applied to select features and optimize the kernel functions utilized in the SVM technique. The proposed procedure requires only a small sample of data, unlike previous research, and it achieves high accuracy when dealing with LL fault events under low mismatch and high impedance levels with an average accuracy of 97.5%. 
In [16], a real-time FDD technique based on K-nearest neighbors for PV systems is investigated, dealing with four types of faults, including line-to-line faults, open circuit faults, partial shading with and without bypass diode faults, and, finally, a partial shading with inverted bypass diode faults. In this study, a modeling of the PV systems is provided in detail through experimental data, which uses only the available data from the data sheet of the manufacturer under standard test conditions and normal operating cell temperatures. Thereafter, the error between the developed model and the measured samples is lower than the available models in the literature, as demonstrated by the simulation results. Finally, the proposed FDD reaches high accuracy with an average of 98.70% when using the dataset collected from the proposed model and experimental setup. The authors of [20] proposed an improved FDD scheme using PCA-based supervised ML (SML). It involves two principal stages: feature extraction and selection through PCA and decision making using SML classifiers. In order to guarantee a global analysis and a complete study, various types of faults are discussed in this paper, including inverter faults, grid connection faults, sensor faults, and PV panel faults. Their obtained results demonstrate the efficiency and feasibility of their proposed procedure by reaching an average accuracy of 99.49%.

In [10], the authors developed an improved classifier based on an RF model for diagnosing faults in grid-connected PV systems. The proposed procedure is composed of two steps: feature extraction and selection and fault classification. As a first step, the sampling data of the training phase is reduced through two techniques: Euclidean distance and K-means clustering. After that, the most relevant features are extracted and selected to then be fed to an RF classifier. The obtained results show the high classification accuracy of the developed approach by achieving 100%. The authors of [1] proposed an approach for FDD in the GCPV system based on the DT method. This paper deals with three types of faults, including string faults, short circuit faults, and line-to-line faults. The first goal is to detect the fault’s occurrence and the second one is to classify the different operating modes. The obtained results demonstrate the high detection performance along with a high diagnosis accuracy reaching 99.80%. In [21], the authors developed a robust and low-cost shading fault detection and classification approach in PV systems. The proposed strategy comprises two steps: At first, the authors built a database including extracted features from the different experimental tests under healthy and shading situations. Then, the features were analyzed through PCA. The proposed approach reached a high classification rate of over 97% with the studied configurations. However, the mentioned fault diagnosis approaches have been achieved under a fixed irradiance level, and the variations in this variable are not taken into account.

On the other hand, several researchers have addressed the development of diagnostic solutions under varying climatic conditions. For example, the authors of [22] proposed an approach for the fault diagnosis of PV systems which aims to identify the number of PV modules exhibiting short or open circuits under uniform and non-uniform irradiance operating conditions. In [23], a fault-detection procedure based on a multi-resolution signal decomposition (MSD) and a fuzzy inference system (FIS) is developed. The MSD method is applied to extract features from the PV array output current and voltage and the solar irradiance, which are then fed into an FIS for decision making. The authors of [24] proposed a simple, sensorless tracking technology to detect any line-to-line and line-to-ground faults in a PV array using MPPT. The proposed approach can also detect faults at low irradiance levels and under partial shading conditions with high accuracy. In [25], the authors proposed a fault diagnosis approach for PV systems based on metaheuristic optimization. This approach is capable of identifying and locating open and short-circuited modules in a PV array under non-uniform irradiance and temperature distribution. These studies considered irradiance variations, but at only a few operating points.

Indeed, irradiance varies rapidly depending on the environmental conditions, and the power provided by PV arrays is strongly related to this irradiance. Publicly available data from NOAA’s Surface Radiation (SURFRAD) network [26] are used to illustrate the high variability of the irradiance. The SURFRAD data are obtained from a seven-station network in the continental U.S. that measures a set of climatic parameters including wind speed, wind direction, relative humidity, air pressure, time of day, air temperature, and solar zenith angle. Figure 1 shows the variation of irradiance over three days in Bondville, Illinois, United States, in July 2022. These observational data demonstrate that the irradiance changes from one day to the next and predominantly during the daytime hours.

Under practical conditions, a PV system in healthy operating mode may behave as if under faulty conditions at a specific irradiance level, and vice versa. In the case where this condition is true and the healthy operating mode is misclassified as a faulty one, it leads to maintenance without any real justification, causing a waste of time and fees. On the other hand, when a serious fault occurs (like the line-to-line fault) but is misclassified as a healthy scenario, no alarm is raised, leading to real damage to the GCPV system.

Therefore, to address these issues, we propose in this paper a new GCPV system-based model in which the irradiance variations are introduced in the dynamic GCPV modeling in order to highlight the impact of variations in climatic conditions. To accomplish this, a machine learning (ML)-based PCA technique is investigated. Multivariate and statistical features are adequately extracted from GCPV measurements using the PCA model, whereby the more relevant and accurate features are selected. Then, ML techniques are applied, through two classification types (multi-class and one-class), to the final features to distinguish between various faults that can happen during GCPV operation under irradiance variations.

To highlight the main challenge, three working modes are investigated under different irradiance levels. The healthy operating mode is performed under an irradiance level of 400 W/m$^{2}$, while the two faulty operating modes are injected under a 550 W/m$^{2}$ irradiance level. This condition is evaluated under two strategies (traditional and proposed). After demonstrating the novel strategy’s abilities, several fault scenarios (e.g., simple faults on one PV array and mixed faults on both sides) are examined under a range of irradiance levels [Low, High] to better assess the robustness of this proposal.

The rest of the paper is outlined as follows: Section 2 details the proposed fault detection and diagnosis procedure using ML-based PCA. The performances of the proposed fault diagnosis strategy are discussed in Section 3. Finally, Section 4 presents the conclusions of the paper.

2. Proposed Technique

Figure 2 illustrates the flowchart of the proposed methodology, which comprises three principal phases: feature extraction, feature selection, and fault classification. First, a dataset is collected, comprising a total of 550,000 samples, with 50,000 samples per class, representing healthy and various possible faulty conditions under different irradiance levels (250, 400, 550, 700, 850, 1000, 1150, 1300, 1450, and 1600 W/m$^{2}$), as highlighted in Section 3. Thereafter, a PCA model is constructed using the collected data, which are projected onto a subspace of orthogonal directions that retains most of the information in the data. This subspace has a lower dimension than the initial data and comprises the most informative features for each operating mode, which are then coded using a logic sequencer. The codes take values from −1 to 1 and provide the labels of the set of training features. After that, various classifiers are trained using the features as input and their matching labels as the desired output. To make an efficient decision, the obtained output is compared with the set of feature labels. Furthermore, other ML classification-based experiments have been carried out with various extracted and selected features.

2.1. Feature Extraction and Selection Using Principal Component Analysis (PCA)

PCA is one of the top prevalent multivariate statistical methods, and it is utilized in several scientific disciplines [27,28]. It is also likely to be the oldest multivariate technique. It converts a set of correlated variables into a novel set of uncorrelated variables orthogonal to each other [29]. More precisely, its goal is to extract the important information from the data table and to express this information as a set of new orthogonal variables called principal components. PCA also represents the pattern of similarity of the observations and the variables by displaying them as points in maps.

2.1.1. PCA-Based Feature Extraction

In a conventional data-driven approach for FDD, the considered data form a matrix Z of N observations of m variables [20]:

$$Z = \begin{pmatrix} z_{1}(1) & \dots & z_{m}(1) \\ \vdots & \ddots & \vdots \\ z_{1}(N) & \dots & z_{m}(N) \end{pmatrix} \in ℜ^{N \times m} \qquad (1)$$

Previous studies have considered the strong influence of irradiance on the GCPV system at only a few operating points. We propose a novel strategy that consists of introducing a range of irradiance levels ([low, high]) in each operating mode i (i = 1, …, n). Consider a new data matrix X of O observations and m variables. X represents a set of samples for all n classes, where each class i is represented over irradiance levels 1 to G and contributes $o_{i}$ $(i = 1, \dots, n)$ observations, as presented in Equation (2).

$$X = \begin{pmatrix} x_{1,1,1}(1) & \dots & x_{m,1,1}(1) \\ \vdots & \ddots & \vdots \\ x_{1,1,G}(o_{1}) & \dots & x_{m,1,G}(o_{1}) \\ \vdots & \ddots & \vdots \\ x_{1,i,1}(1) & \dots & x_{m,i,1}(1) \\ \vdots & \ddots & \vdots \\ x_{1,i,G}(o_{i}) & \dots & x_{m,i,G}(o_{i}) \\ \vdots & \ddots & \vdots \\ x_{1,n,1}(1) & \dots & x_{m,n,1}(1) \\ \vdots & \ddots & \vdots \\ x_{1,n,G}(o_{n}) & \dots & x_{m,n,G}(o_{n}) \end{pmatrix} \in ℜ^{O \times m} \qquad (2)$$

First, the sampled data were normalized. Thereafter, a matrix of uncorrelated variables $T \in ℜ^{O \times m}$, representing the data features, was extracted using the PCA transform [20]:

$$T = X P \qquad (3)$$

with $T = [t_{1}, t_{2}, \dots, t_{k}, \dots, t_{O}]^{\top}$, $t_{k} = [t_{k 1}, \dots, t_{k m}]$, and P denoting the loading matrix, which is obtained from the eigendecomposition of the covariance matrix $Φ$:

$$Φ = P Λ P^{\top} \qquad (4)$$

where $Λ = \mathrm{diag}(λ_{1}, λ_{2}, \dots, λ_{m})$ denotes the diagonal matrix of eigenvalues arranged in decreasing order.
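As a rough illustration of Equations (3) and (4), the following Python sketch normalizes a data matrix, eigendecomposes its sample covariance matrix, and computes the score matrix T. The use of NumPy, the function name `pca_scores`, the z-score normalization, and the synthetic data are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def pca_scores(X):
    """Return (T, P, eigenvalues) for a data matrix X of shape (O, m)."""
    # Assumed normalization: zero mean and unit variance for each variable.
    Xn = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Sample covariance matrix Phi (m x m) of the normalized data.
    Phi = np.cov(Xn, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(Phi)
    order = np.argsort(eigvals)[::-1]      # reorder to decreasing eigenvalues
    P, lam = eigvecs[:, order], eigvals[order]
    T = Xn @ P                             # Equation (3): T = X P
    return T, P, lam

# Synthetic correlated data standing in for GCPV measurements (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
T, P, lam = pca_scores(X)
```

In this sketch the columns of `T` are uncorrelated: their sample covariance matrix is the diagonal matrix of eigenvalues, which is what Equation (4) implies.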

2.1.2. PCA-Based Feature Selection

A reduction in data dimensionality can be attained by dividing P and $Λ$ into modeled and non-modeled variation. The first portion, $P_{ℓ} \in ℜ^{m \times ℓ}$ and $Λ_{ℓ} \in ℜ^{ℓ \times ℓ}$, covers the principal subspace, while the other portion, $P_{m - ℓ} \in ℜ^{m \times (m - ℓ)}$ and $Λ_{m - ℓ} \in ℜ^{(m - ℓ) \times (m - ℓ)}$, covers the residual subspace. The columns of $P_{ℓ}$ are the eigenvectors of $Φ$ associated with the ℓ largest eigenvalues, collected in $Λ_{ℓ}$, which correspond to the greatest variation in the data; ℓ is the number of retained principal components (PCs). Finally, the columns of $P_{m - ℓ}$ are the remaining $m - ℓ$ eigenvectors, associated with the eigenvalues in $Λ_{m - ℓ}$ [28].

Accordingly, PCA decomposes the original data set X into:

$$X = \hat{X} + E \qquad (5)$$

with

$$T_{ℓ} = X P_{ℓ} \quad \text{and} \quad \hat{X} = T_{ℓ} P_{ℓ}^{\top} \qquad (6)$$

where $T_{ℓ}$ denotes the selected features, obtained by projecting X onto the first ℓ eigenvectors, which correspond to the greatest variances in the sample covariance matrix. In brief, the PCA model is determined by the eigendecomposition of the covariance matrix $Φ$. The resulting PCA model is applied to extract and select important features, which are classified next. These features need to be appropriately extracted in order to highlight the variations between healthy and faulty operating conditions [28].
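The decomposition in Equations (5) and (6) can be sketched in a few lines of Python. This is only an illustrative sketch: NumPy, the function name `pca_decompose`, the choice of two retained components, and the synthetic data are assumptions, since this section does not state how ℓ was chosen.

```python
import numpy as np

def pca_decompose(Xn, n_pc):
    """Split normalized data Xn (O x m) into a modeled part and a residual."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xn, rowvar=False))
    order = np.argsort(eigvals)[::-1]   # decreasing eigenvalues
    P = eigvecs[:, order]
    P_l = P[:, :n_pc]                   # loadings of the principal subspace
    T_l = Xn @ P_l                      # Equation (6): selected features
    X_hat = T_l @ P_l.T                 # Equation (6): modeled part of the data
    E = Xn - X_hat                      # Equation (5): residual part
    return T_l, X_hat, E, P_l

# Synthetic, normalized data (hypothetical stand-in for GCPV measurements).
rng = np.random.default_rng(1)
raw = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
Xn = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
T_l, X_hat, E, P_l = pca_decompose(Xn, n_pc=2)  # n_pc = 2 is an arbitrary choice
```

Here `E` lies entirely in the residual subspace, so projecting it onto the retained loadings `P_l` gives zero up to rounding error.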

To achieve high performance with ML classification-based techniques, several candidate features are extracted through the PCA model. In this work, the selected features extracted via the PCA model are the $T^{2}$ statistic, the squared prediction error (SPE) statistic, the squared weighted error (SWE) statistic, the first retained principal components ($T_{ℓ}$), and the sampled statistical parameters of $T_{ℓ}$ (mean, variance, skewness, and kurtosis). These features are introduced next [20].

$T^{2}$ Statistic

The $T^{2}$ statistic, which measures the variations in the principal components at different time samples, is defined as [20]

$$T_{k}^{2} = x_{k}^{\top} P_{ℓ} Λ_{ℓ}^{-1} P_{ℓ}^{\top} x_{k} \qquad (7)$$

Q Statistic

The Q statistic, also known as the squared prediction error (SPE), measures the projection of a data sample on the residual subspace, which provides an overall measure of how well a data sample fits the PCA model. It is defined as [20]

$$Q_{k} = \left\| (I - P_{ℓ} P_{ℓ}^{\top}) x_{k} \right\|^{2} \qquad (8)$$

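SWE Statistic

The SWE statistic is an important measure in the subspace of the remaining PCs: it measures the variation of a data sample in the residual subspace, weighted by the inverse of the corresponding eigenvalues. It is defined as [20]

$$SWE_{k} = x_{k}^{\top} P_{m - ℓ} Λ_{m - ℓ}^{-1} P_{m - ℓ}^{\top} x_{k} \qquad (9)$$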
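The three statistics of Equations (7) to (9) can be computed per sample as in the following Python sketch. NumPy, the function name `monitoring_statistics`, the synthetic data, and the choice of two retained components are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def monitoring_statistics(Xn, n_pc):
    """Return per-sample T2, Q (SPE), and SWE for normalized data Xn (O x m)."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xn, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # decreasing eigenvalues
    lam, P = eigvals[order], eigvecs[:, order]
    P_l = P[:, :n_pc]
    T_l = Xn @ P_l                                    # principal-subspace scores
    T_r = Xn @ P[:, n_pc:]                            # residual-subspace scores
    t2 = np.sum(T_l**2 / lam[:n_pc], axis=1)          # Equation (7)
    q = np.sum((Xn - T_l @ P_l.T) ** 2, axis=1)       # Equation (8)
    swe = np.sum(T_r**2 / lam[n_pc:], axis=1)         # Equation (9)
    return t2, q, swe

# Synthetic data with a shared factor so the variables are correlated (hypothetical).
rng = np.random.default_rng(2)
raw = rng.normal(size=(300, 6)) + 0.5 * rng.normal(size=(300, 1))
Xn = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
t2, q, swe = monitoring_statistics(Xn, n_pc=2)
```

A useful sanity check on such a sketch is that `t2 + swe` equals the Mahalanobis distance of each sample with respect to the full covariance matrix, because the two statistics split the eigenvalue-weighted scores between the principal and residual subspaces.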

Statistical Parameters

The statistical measures used are the sampled mean $μ$, variance $σ^{2}$, skewness $sk$, and kurtosis $kur$ of the ℓ retained principal components [20]. They are calculated according to the following equations:

$$μ_{k} = \frac{1}{ℓ} \sum_{j = 1}^{ℓ} t_{k j} \qquad (10)$$

$$σ_{k}^{2} = \frac{1}{ℓ} \sum_{j = 1}^{ℓ} (t_{k j} - μ_{k})^{2} \qquad (11)$$

$$sk_{k} = \frac{1}{ℓ} \sum_{j = 1}^{ℓ} \left( \frac{t_{k j} - μ_{k}}{σ_{k}} \right)^{3} \qquad (12)$$

$$kur_{k} = \frac{1}{ℓ} \sum_{j = 1}^{ℓ} \left( \frac{t_{k j} - μ_{k}}{σ_{k}} \right)^{4} \qquad (13)$$
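Equations (10) to (13) are computed across the ℓ retained components of each sample, using the 1/ℓ normalization throughout. A minimal Python sketch follows; NumPy, the function name `score_statistics`, and the small example matrix are assumptions for illustration only.

```python
import numpy as np

def score_statistics(T_l):
    """Row-wise mean, variance, skewness, and kurtosis of the scores T_l (O x l)."""
    mu = T_l.mean(axis=1)                      # Equation (10)
    centered = T_l - mu[:, None]
    var = (centered**2).mean(axis=1)           # Equation (11), 1/l normalization
    # A row of identical scores has zero variance and would divide by zero here.
    z = centered / np.sqrt(var)[:, None]
    sk = (z**3).mean(axis=1)                   # Equation (12)
    kur = (z**4).mean(axis=1)                  # Equation (13)
    return mu, var, sk, kur

# Two hypothetical samples with l = 4 retained components each.
T_l = np.array([[1.0, 2.0, 3.0, 4.0],
                [0.0, 0.0, 0.0, 4.0]])
mu, var, sk, kur = score_statistics(T_l)
```

For the first row the scores are symmetric about their mean, so the skewness is zero; the second row has one large score, which gives a positive skewness.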

2.2. Faults Classification Using Machine Learning (ML) Techniques

After extracting and selecting the most descriptive and informative features from the data, ML classifiers are applied to these features for fault-classification purposes. These classifiers include support vector machines (SVM), K-nearest neighbors (KNN), decision tree (DT), discriminant analysis (DA), naive Bayes (NB), and random forest (RF).

2.2.1. Support Vector Machines

The SVM classifiers, which were introduced by Vapnik [30], are built on Structural Risk Minimization (SRM), which reduces both the empirical risk and the confidence interval of the ML method in order to reach a favorable generalization capability. They have been demonstrated to be an exceptionally powerful and effective method for regression and classification purposes. The aim of SVM classification is to determine decision boundaries in the feature space that separate data items associated with different classes. It builds an optimal separating hyperplane between two classes that maximizes the margin while simultaneously minimizing a quantity proportional to the number of misclassification errors.

2.2.2. K-Nearest Neighbors

The KNN is a robust non-parametric classification technique that is applied to determine to which of the already known classes an unknown data sample belongs. It is widely known for its simplicity and efficiency [31].
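The idea can be sketched in Python as follows: a query sample is assigned the majority label among its k nearest training samples. The Euclidean distance, the value k = 3, the function name `knn_predict`, and the toy data are assumptions made for illustration; the paper's actual hyper-parameters are those reported in its Table 3.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training sample (assumed metric).
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy two-class feature set (hypothetical values, not taken from the paper).
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
train_y = ["healthy"] * 3 + ["faulty"] * 3

label_near_origin = knn_predict(train_X, train_y, np.array([0.05, 0.05]))
label_far = knn_predict(train_X, train_y, np.array([5.0, 5.0]))
```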

2.2.3. Decision Tree

A DT is a widely known technique that has been applied effectively in various applications [1,32]. It arranges the information extracted from the training samples in a hierarchical architecture consisting of nodes and branches. The main concept of this classifier can be described in two steps: first, the next split of a node in the tree is chosen to reduce the least-squares error; second, the prediction for a sample, whether seen during training or not, is the average of the dependent variable over the training samples covered by the corresponding leaf [32].

2.2.4. Discriminant Analysis

DA is an ML method that estimates the parameters of discriminant functions of the predictor variables using training samples. The discriminant functions establish limits in predictor space among different classes. Based on the predictor samples, the resulting classifier discriminates between the classes [33].

2.2.5. Naive Bayes

The NB [34] is a classification algorithm based on the Bayes theorem, which assumes that the features are all conditionally independent of each other given the class. This assumption greatly simplifies the representation of $p(C_{i} | x)$ and the problem of estimating it from the training dataset [34].

2.2.6. Random Forest

The RF [35] is an ML algorithm dedicated to classification and regression tasks. The implementation of the decision-making procedure in this approach allows for stochastically discriminating between the various classes for a given collection of features. Based on a training dataset, its classification model predicts the assignment of new points to a labeled class by considering a weighted function of the points in their neighborhood [35].

3. Simulation Results

3.1. System Description

The distributed structure was investigated in this study [8], which is a flexible application allowing the diversification of technologies and the fusion of several kinds of PV sensors. A possible configuration among others is shown in Figure 3. All components (panels/converters) are connected in parallel to the DC voltage bus with a voltage value of 500 (V). Since each panel is controlled independently, the downstream converter does not control the whole maximum power point tracking (MPPT). In fact, the MPPTs utilize the “Perturb and Observe” method by varying the voltage across the terminals of the PV array for the purpose of obtaining the maximum possible power. Furthermore, the controllers are robust against external disturbances.

The PV farm is composed of three PV arrays, each providing a maximum of 4 kW and connected to a DC/DC converter; the outputs of the boost converters are connected to the common DC bus. Each PV array contains two parallel strings, in which every string has eight groups, and each group comprises three modules of 20 cells each, connected in series. A three-phase source converter transforms the 500 (V) DC to 260 (V) AC and maintains a unity power factor. To connect the converter to the grid, a three-phase coupling transformer (260 V/25 kV–100 kVA) is used (please refer to Table 1).

In this study, we focused on two PV arrays. We have injected the same types of fault for both fields under different levels of irradiance, as shown in Figure 4, including the following:

Line-to-line (LL) fault: The LL fault is accidental and can occur due to the low resistance between any two points in the PV array. The existence of the LL fault brings on modification in a PV array due to partial or total bypassing of modules through strings led by line-to-line short circuits [37]. The LL is modeled by a fault resistance of 10 $Ω$ [23,38], located between points $A_{i}$ (i = 1, 2) and $B_{i}$ either in $P V_{1}$ or $P V_{2}$ arrays.
Line-to-ground (LG) fault: The PV modules’ LG fault may be a consequence of an accidental short-circuit between at least one current-carrying conductor and the ground. This type of fault may give rise to DC arcs at the fault point, leading to an augmentation of critical safety concerns. During this study, the LG is modeled by a fault resistance of 20 $Ω$ [23], located between point $C_{i}$ and the ground.
Bypass Diode (Bp) fault: The Bp diode is used to protect cells from problems like shading. This diode is connected in anti-parallel with a set of cells. Generally, this type of fault is represented by an open/short circuit, an inverted diode, or an impedance. It causes a mismatch in the I–V characteristics of the cell. In this work, the bypass diode fault in module 1 is represented by a short circuit.
Connectivity fault: The loss of connectivity in PV strings is usually due to degradation or erosion of the contact between two modules. In this work, it is modeled by a high resistance at point $D_{i}$.

The injected faults are examined through four scenarios:

First scenario: The occurrence of more than one fault at the same time is possible in real-world applications. Therefore, we have injected different fault types in the same $P V_{1}$ array to take this condition into account (refer to Table 2).
Second scenario: Identical faults injected in $P V_{2}$ as $P V_{1}$ (refer to Table 2).
Third scenario: During the system operation, faults may occur in both PV arrays simultaneously (refer to Table 2).
Fourth scenario: Here, all the above scenarios were combined and considered.

In healthy conditions and under an irradiance level of 1000 W/m$^{2}$, the PV arrays provide the same current measurements, $x_{1} = x_{2} = 15.92$ A; the output voltage of the $PV_{1,2}$ panels is $x_{3} = x_{4} = 250.90$ V, the bus voltage is $x_{5} = 500$ V, and the grid current phase is $x_{6} = 0.39$ A. Under an LL fault injected in $PV_{1}$ and an irradiance level of 1000 W/m$^{2}$, the current $x_{1}$ decreases by 50%. The variable $x_{6}$ also decreases by 18%, while the output voltage ($x_{3}$) is not significantly affected by this type of fault, decreasing by only 2%. Similarly, for the LL fault injected in $PV_{2}$, the variables $x_{2}$, $x_{4}$, and $x_{6}$ decrease by 50%, 2%, and 18%, respectively. Under the LG fault injected in $PV_{1}$/$PV_{2}$, $x_{1}$ and $x_{2}$ decrease by 26% and $x_{6}$ decreases by 10%, while $x_{3}$ and $x_{4}$ undergo a minor decrease of about 2.8%. For the $Bp$ fault injected in the arrays $PV_{1}$/$PV_{2}$ separately, the variables $x_{3}$, $x_{4}$, and $x_{6}$ decrease by 2.5%, and $x_{1}$/$x_{2}$ undergo negligible decreases. Furthermore, under the $Cn$ fault introduced in $PV_{1}$/$PV_{2}$, the current $x_{1}$/$x_{2}$ experiences a significant decline of 50% and the output voltages undergo a minor increase, while $x_{6}$ decreases by about 18%. The bus voltage $x_{5}$ keeps the same value of 500 V under all of the studied faults. These observations show that the same fault affects the system behavior in a similar manner regardless of the array ($PV_{1}$/$PV_{2}$) into which it is injected.

3.2. Classifiers’ Hyper-Parameters Setting

We tested a variety of ML hyper-parameter values in this study to obtain the optimal ones, which are listed in Table 3. Additionally, a 10-fold cross-validation scheme was adopted in order to evaluate the performance of the learning techniques.
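As a hedged illustration of how a 10-fold cross-validation partitions the data, the following Python sketch generates the train/test index sets. NumPy, the function name `kfold_indices`, the shuffling, and the sample count of 50 are assumptions made for illustration; the paper's experiments relied on Matlab tooling rather than this code.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)      # shuffle once before splitting
    folds = np.array_split(perm, k)        # k folds of nearly equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Each sample is used for testing exactly once across the 10 folds.
splits = list(kfold_indices(50, k=10))
```

The reported performance of a classifier is then typically the average of the scores obtained on the k held-out folds.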

3.3. Computer System Setting

The research was carried out in Matlab 2021b, running on an ASUS computer (ASUSTeK Computer Inc., Taipei, Taiwan) with an Intel Core i7-1165G7 CPU at 2.80 GHz and 8 GB of RAM, using the Simulink platform for the simulation of the grid-connected PV system and the Statistics and Machine Learning Toolbox for the ML models.

3.4. Classification Results

3.4.1. Standard Strategy

At a certain irradiance level, the risk of confusion between the GCPV system’s operating modes increases. Before applying the novel strategy, we demonstrated this challenge by injecting the most frequent faults (the LL fault [39,40] and the connectivity fault [41,42]) into the GCPV system. The studied faults are considered serious faults influencing power generation. To this end, we considered three operating modes under different irradiance levels: the healthy mode was performed under an irradiance level of 400 W/m$^{2}$, while the two faulty modes were injected under a 550 W/m$^{2}$ irradiance level. There are, moreover, other cases of resemblance in which the system in faulty mode at a given irradiance acts like a system in normal operation under other irradiance levels. This problem is examined with two approaches, using the measured variables (see Table 1): one in which the feature extraction and selection phase is executed automatically, and one in which it relies on an external PCA model, as in the PCA-based KNN. The confusion matrices in Table 4 and Table 5, which report the correctly classified and the misclassified observations, clearly show the high misclassification between the studied operating classes. For the healthy class, the PCA-based KNN method correctly classifies 2915 of 5000 samples; the remaining 41.7% are misclassified, with 1914 and 171 samples assigned to the LL fault and connectivity fault classes, respectively. As another example, for the LL fault class, the PCA-KNN approach correctly identifies 2708 observations (true positives), with 45.84% of observations misclassified: 2171 and 121 observations are assigned to the healthy and connectivity fault classes, respectively. Furthermore, the ANN classifier, dealing with the same class, correctly classifies 2874 of 5000 samples, misclassifying 42.52% of the observations.
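The misclassification percentages quoted above follow directly from the confusion-matrix counts. The short Python sketch below reproduces the arithmetic; the function name `misclassified_percent` and the ordering of the columns are assumptions for illustration, while the counts are those stated in the text.

```python
def misclassified_percent(correct, total):
    """Percentage of samples of one true class assigned to another class."""
    return 100.0 * (total - correct) / total

# Confusion-matrix rows quoted in the text, with predictions ordered as
# (healthy, LL fault, connectivity fault); this ordering is an assumption.
healthy_row = [2915, 1914, 171]   # true class: healthy (PCA-based KNN)
ll_row = [2171, 2708, 121]        # true class: LL fault (PCA-based KNN)

healthy_error = misclassified_percent(healthy_row[0], sum(healthy_row))
ll_error = misclassified_percent(ll_row[1], sum(ll_row))
ann_ll_error = misclassified_percent(2874, 5000)   # ANN on the LL fault class

print(round(healthy_error, 2), round(ll_error, 2), round(ann_ll_error, 2))
# prints: 41.7 45.84 42.52
```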

3.4.2. Novel Strategy

To apply the novel strategy, the measured variables listed in Table 1 were collected for one healthy operating mode (class $C_{0}$) and 10 faulty operating modes of the PV system (classes $C_{i}$; $i = 1, \dots, 10$). Each mode was collected under ten irradiance levels (250, 400, 550, 700, 850, 1000, 1150, 1300, 1450, and 1600 W/m$^{2}$), with 5000 samples per level. Every operating mode was therefore represented by 50,000 observations ($5000 \times 10$).

First, to test the robustness of the novel modeling strategy against misclassification, we considered the same operating modes (healthy, line-to-line fault, connectivity fault) studied in Section 3.4.1. The training phase was performed under the 10 irradiance levels, and the testing phase used the same irradiance conditions as above: the healthy case was studied under 400 W/m$^{2}$ and the faulty modes were injected under 550 W/m$^{2}$. Based on the results presented in Table 6 and Table 7, the new strategy increased the accuracy and corrected the high misclassification rates obtained previously (Section 3.4.1). For the healthy class, the PCA-KNN method correctly classified 4888 of 5000 samples, with 2.24% (112 samples) misclassified. For the line-to-line fault class, the PCA-KNN approach correctly classified 4990 samples (true positives), with negligible (0.2%) misclassification. The ANN classifier reached 100% accuracy, with no misclassifications.

Next, the validation is presented for different PCA-based ML methods using four groups of features. The computed features are:

Group 1: sample mean, variance, skewness, and kurtosis of the ℓ retained PCs;
Group 2: $T^{2}$ and $Q$ statistics;
Group 3: $T^{2}$ and $SWE$ statistics;
Group 4: $T_{ℓ}$ (the first ℓ retained PCs).
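Groups 2 and 4 can be illustrated with a minimal sketch. It assumes the standard definitions of the Hotelling $T^{2}$ statistic (sum of squared scores scaled by their variances) and of the $Q$ statistic (squared prediction error), which may differ in detail from the definitions used in Section 2.1.2. The function name `pca_monitoring_features` and the synthetic data are hypothetical; the $SWE$ statistic and the group 1 statistics are not covered.

```python
import numpy as np


def pca_monitoring_features(X_train, X, n_pcs):
    """Fit PCA on X_train and return (scores, T2, Q) for the rows of X."""
    # Scale with the training mean and standard deviation.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z_train = (X_train - mean) / std
    Z = (X - mean) / std

    # Eigendecomposition of the sample covariance of the scaled training data.
    cov = np.cov(Z_train, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    P = eigvecs[:, :n_pcs]                   # loadings of the retained PCs
    lam = eigvals[:n_pcs]                    # variances of the retained PCs

    scores = Z @ P                           # group 4: first n_pcs scores
    t2 = np.sum(scores**2 / lam, axis=1)     # Hotelling T^2 (assumed definition)
    residual = Z - scores @ P.T              # part not explained by the PCA model
    q = np.sum(residual**2, axis=1)          # Q statistic (assumed definition)
    return scores, t2, q


# Synthetic stand-in for the 8 measured variables of Table 1 (not real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 8))
scores, t2, q = pca_monitoring_features(X_demo, X_demo, n_pcs=6)
print(scores.shape, t2.shape, q.shape)
```

In this sketch, the pair (`t2`, `q`) plays the role of the group 2 features and `scores` that of the group 4 features.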

The training phase was performed on 70% of the 50,000 collected observations, while the testing phase was carried out on the remaining 30% of the data, as presented in Table 8.
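The sizes in Table 8 follow from applying the 70%/30% split to each class separately. The sketch below assumes a random per-class split, which the text does not specify; the function name `split_per_class` and the seed are hypothetical.

```python
import random


def split_per_class(n_per_class, n_classes, train_fraction, seed=0):
    """Randomly split the sample indices of each class into train and test parts."""
    rng = random.Random(seed)
    n_train = round(n_per_class * train_fraction)
    split = {}
    for c in range(n_classes):
        indices = list(range(n_per_class))
        rng.shuffle(indices)                 # assumption: random assignment
        split[f"C{c}"] = (indices[:n_train], indices[n_train:])
    return split


# 11 classes (C0..C10) with 50,000 observations each, 70% used for training.
split = split_per_class(50_000, 11, 0.7)
train_total = sum(len(train) for train, _ in split.values())
test_total = sum(len(test) for _, test in split.values())
print(train_total, test_total)
```

Each class contributes 35,000 training and 15,000 testing observations, so the classes stay balanced in both sets.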

Selecting an appropriate number of principal components (PCs) is essential in constructing a PCA model [20], as the number of retained PCs has a considerable effect on each stage of the modeling and monitoring processes. Here, the number of PCs is determined by minimizing the variance of the reconstruction error, as proposed by Qin and Dunia [43] (in this study, $ℓ = 6$). Figure 5 shows the 3D plot of the first three PCs acquired under different operating conditions, and Figure 6 and Figure 7 show the scatter plots of the $T^{2}$ and $Q$ statistics and of the $T^{2}$ and $SWE$ statistics, respectively. The eleven classes can be observed in these figures, but they are not completely separated. The classifiers are therefore trained on the previously mentioned features to further improve the discrimination.

3.4.3. Multi-Class Classification Results

According to the results presented in Table 9, the features of group 1 (sample mean, variance, skewness, and kurtosis of the ℓ retained PCs) and of group 4 (the first ℓ = 6 PCs) provide the highest accuracies. With the group 4 features as inputs, the KNN, SVM, and RF classifiers reach accuracies of 98.06%, 98.00%, and 95.87%, respectively; RF is marginally better with the group 1 features (96.36%). The confusion matrix for the testing phase of the KNN classifier is given in Table 10.

The confusion matrix is determined in order to evaluate the performance of each classifier, by which some metrics are measured. Accuracy is the most important metric; it indicates the correctness of the classification. Furthermore, other metrics are utilized: Recall and Precision. They are, respectively, assigned as:

$$\text{Recall} = \frac{TP}{TP + FN} \quad (14)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (15)$$

where $TP$ (true positive) is a correctly classified positive sample, $FP$ (false positive) is a sample misclassified as positive, $TN$ (true negative) is a properly classified negative sample, and $FN$ (false negative) is a sample misclassified as negative.
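As a worked example of Equations (14) and (15), the sketch below computes Recall, Precision, and Accuracy from the confusion matrix of Table 6 (PCA-KNN under the novel strategy). Each class is treated in turn as the positive class; the function names are hypothetical.

```python
# Confusion matrix from Table 6 (PCA-KNN, novel strategy).
# Rows are true classes, columns are predicted classes.
LABELS = ["Healthy", "Line-to-Line Fault", "Connectivity Fault"]
TABLE_6 = [
    [4888, 91, 21],
    [10, 4990, 0],
    [24, 0, 4976],
]


def recall_precision(matrix):
    """Per-class Recall (Eq. 14) and Precision (Eq. 15), one class as positive at a time."""
    n = len(matrix)
    recalls, precisions = [], []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                        # true class k, predicted otherwise
        fp = sum(matrix[i][k] for i in range(n)) - tp   # predicted k, true class otherwise
        recalls.append(tp / (tp + fn))
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


recalls, precisions = recall_precision(TABLE_6)
acc = accuracy(TABLE_6)
for label, r, p in zip(LABELS, recalls, precisions):
    print(f"{label}: recall={100 * r:.2f}%, precision={100 * p:.2f}%")
print(f"accuracy={100 * acc:.2f}%")
```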

3.4.4. One-Class Classification Results

One-class classification methods have an advantage over multi-class classification methods in that they can avoid the misclassification issue. By learning from a training dataset that contains only the samples of its target class, each algorithm identifies the data of that class among all data. The fault classification task is then to determine to which class the data belong. Compared with multi-class classification algorithms, one-class classification algorithms can be more effective [5].

Therefore, a bank of one-class classifiers is used in this study to further improve the aforementioned results. The bank covers eleven classes (one healthy class and ten faulty classes). As shown in Figure 8 and Table 11, each one-class classifier assigns a 1 (logical TRUE) to its target class and a −1 to all other classes. The class is identified by combining the outputs of all classifiers according to the logic sequences indicated in Table 11. Table 12 reports the results obtained with the bank of classifiers using group 4 as input.
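The decision logic of the classifier bank can be sketched as a lookup against the rows of Table 11. The handling of ambiguous outputs (no classifier, or more than one, returning 1) is not described in the text, so returning `None` in that case is an assumption; the names `code_word` and `decode` are hypothetical.

```python
N_CLASSES = 11  # C0 (healthy) plus C1..C10 (faulty), as in Table 11


def code_word(target):
    """Row of Table 11 for class C<target>: +1 at the target, -1 elsewhere."""
    return [1 if j == target else -1 for j in range(N_CLASSES)]


def decode(outputs):
    """Map the 11 one-class outputs (+1/-1) to a class index.

    Returns None when no classifier, or more than one, claims the sample.
    How such cases are resolved is not stated in the text (assumption).
    """
    claimed = [j for j, out in enumerate(outputs) if out == 1]
    return claimed[0] if len(claimed) == 1 else None


print(decode(code_word(4)))        # outputs matching the C4 row of Table 11
print(decode([-1] * N_CLASSES))    # no classifier claims the sample
```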

The obtained results demonstrate a clear improvement in classification performance compared with the multi-class classifiers using the same group of features. For example, the multi-class DA classifier with the group 4 features gives an accuracy of 35.13%, while the one-class DA classifiers reach an average accuracy of 90.98%. Based on the results presented in Table 12, the combination of one-class classifiers (KNN, DT, SVM, and RF) with the group 4 features yields the highest accuracy rates, with average accuracies of 99.64%, 98.68%, 99.47%, and 99.23%, respectively.
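When reading Table 12, a reference level may be useful. The sketch below assumes that each one-class classifier is scored on a test set containing all eleven classes in equal numbers, as the balanced sizes in Table 8 suggest but the text does not state explicitly. Under that assumption, it computes the accuracy of a detector that always outputs −1; the function name is hypothetical.

```python
N_CLASSES = 11  # one healthy class and ten faulty classes


def always_negative_accuracy(n_classes):
    """Accuracy (%) of a detector that outputs -1 for every sample.

    Assumes balanced classes: the detector is wrong only on its own
    target class, i.e. on 1 out of n_classes of the test samples.
    """
    return 100.0 * (n_classes - 1) / n_classes


baseline = always_negative_accuracy(N_CLASSES)
print(f"{baseline:.2f}%")
```

Per-class accuracies in Table 12 can be compared against this level to judge how much a given one-class classifier adds.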

4. Conclusions

In the present study, the challenge of fault detection and diagnosis (FDD) for grid-connected PV (GCPV) systems under irradiance variations was addressed. The main idea behind this work was to highlight the impact of irradiance variations on GCPV system modeling and on fault diagnosis performance. The proposed approach combines principal component analysis (PCA) with machine learning (ML): PCA is applied for feature extraction and selection, and the ML methods are used for fault diagnosis. Different cases, covering different feature groups and two distinct types of classifiers (multi-class and one-class), were investigated in order to show the robustness and the efficiency of the developed procedure. These techniques were tested using simulated GCPV data under different irradiance levels representing different operating conditions. The obtained results demonstrated the reliability and high performance of the proposed diagnosis paradigms: with the group 4 features, the multi-class KNN classifier reached an accuracy of 98.06%, and the one-class KNN classifiers reached an average accuracy of 99.64%.

Author Contributions

Methodology, Z.Y., M.H., and M.M.; Validation, Z.Y. and M.H.; Investigation, M.H. and K.B; Writing—original draft, Z.Y.; Writing—review and editing, M.H. and M.M.; Supervision, M.H., M.M., and K.B. All authors have read and agreed to the published version of the manuscript.

Funding

Open Access funding provided by the Qatar National Library.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available upon Editor request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FDD	Fault Detection and Diagnosis
PV	Photovoltaic
GCPV	Grid-Connected PV
PCA	Principal Component Analysis
ℓ	Number of retained PCs
ML	Machine learning
SVM	Support Vector Machine
KNN	K-Nearest Neighbors
DT	Decision Tree
DA	Discriminant Analysis
NB	Naive Bayes
RF	Random Forest
CM	Confusion Matrix

References

1. Benkercha, R.; Moulahoum, S. Fault detection and diagnosis based on decision tree algorithm for grid connected PV system. Sol. Energy 2018, 173, 610–634.
2. Mansouri, M.; Trabelsi, M.; Nounou, H.; Nounou, M. Deep learning based fault diagnosis of photovoltaic systems: A comprehensive review and enhancement prospects. IEEE Access 2021, 126286–126306.
3. Blaifi, S.A.; Moulahoum, S.; Benkercha, R.; Taghezouit, B.; Saim, A. M5P model tree based fast fuzzy maximum power point tracker. Sol. Energy 2018, 163, 405–424.
4. Rouani, L.; Harkat, M.F.; Kouadri, A.; Mekhilef, S. Shading fault detection in a grid-connected PV system using vertices principal component analysis. Renew. Energy 2021, 164, 1527–1539.
5. Zhao, Y.; Li, T.; Zhang, X.; Zhang, C. Artificial intelligence-based fault detection and diagnosis methods for building energy systems: Advantages, challenges and the future. Renew. Sustain. Energy Rev. 2019, 109, 85–101.
6. Tidridi, K.; Chatti, N.; Verron, S.; Tiplica, T. Bridging data-driven and model-based approaches for process fault diagnosis and health monitoring: A review of researches and future challenges. Annu. Rev. Control. 2016, 42, 63–81.
7. Abid, A.; Khan, M.T.; Iqbal, J. A review on fault detection and diagnosis techniques: Basics and beyond. Artif. Intell. Rev. 2021, 54, 3639–3664.
8. Yahyaoui, Z.; Hajji, M.; Mansouri, M.; Abodayeh, K.; Bouzrara, K.; Nounou, H. Effective Fault Detection and Diagnosis for Power Converters in Wind Turbine Systems Using KPCA-Based BiLSTM. Energies 2022, 15, 6127.
9. Ziane, A.; Dabou, R.; Sahouane, N.; Necaibia, A.; Mostefaoui, M.; Bouraiou, A.; Slimani, A. Detecting partial shading in grid-connected PV station using random forest classifier. In Proceedings of the International Conference in Artificial Intelligence in Renewable Energetic Systems, Saidia, Morocco, 3–15 April 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 88–95.
10. Chen, Z.; Han, F.; Wu, L.; Yu, J.; Cheng, S.; Lin, P.; Chen, H. Random forest based intelligent fault diagnosis for PV arrays using array voltage and string currents. Energy Convers. Manag. 2018, 178, 250–264.
11. Omran, A.H.; Said, D.M.; Hussin, S.M.; Ahmad, N.; Samet, H. A novel intelligent detection schema of series arc fault in photovoltaic (PV) system based convolutional neural network. Period. Eng. Nat. Sci. PEN 2020, 8, 1641–1653.
12. Chine, W.; Mellit, A.; Lughi, V.; Malek, A.; Sulligoi, G.; Pavan, A.M. A novel fault diagnosis technique for photovoltaic systems based on artificial neural networks. Renew. Energy 2016, 90, 501–512.
13. Miao, W.; Xu, Q.; Lam, K.; Pong, P.W.; Poor, H.V. DC arc-fault detection based on empirical mode decomposition of arc signatures and support vector machine. IEEE Sens. J. 2020, 21, 7024–7033.
14. Ahmadipour, M.; Hizam, H.; Othman, M.L.; Mohd Radzi, M.A.; Chireh, N. A fast fault identification in a grid-connected photovoltaic system using wavelet multi-resolution singular spectrum entropy and support vector machine. Energies 2019, 12, 2508.
15. Zhao, Y.; Yang, L.; Lehman, B.; de Palma, J.F.; Mosesian, J.; Lyons, R. Decision tree-based fault detection and classification in solar photovoltaic arrays. In Proceedings of the 2012 Twenty-Seventh Annual IEEE Applied Power Electronics Conference and Exposition (APEC), Orlando, FL, USA, 5–9 February 2012; pp. 93–99.
16. Madeti, S.R.; Singh, S. Modeling of PV system based on experimental data for fault detection using kNN method. Sol. Energy 2018, 173, 139–151.
17. Patil, M.; Hinge, T. Improved Fault Detection and Location Scheme for Photovoltaic System. In Proceedings of the 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), Vellore, India, 22–23 March 2019; Volume 1, pp. 1–6.
18. Eskandari, A.; Milimonfared, J.; Aghaei, M. Fault detection and classification for photovoltaic systems based on hierarchical classification and machine learning technique. IEEE Trans. Ind. Electron. 2020, 68, 12750–12759.
19. Eskandari, A.; Milimonfared, J.; Aghaei, M.; Reinders, A.H. Autonomous monitoring of line-to-line faults in photovoltaic systems by feature selection and parameter optimization of support vector machine using genetic algorithms. Appl. Sci. 2020, 10, 5527.
20. Hajji, M.; Harkat, M.F.; Kouadri, A.; Abodayeh, K.; Mansouri, M.; Nounou, H.; Nounou, M. Multivariate feature extraction based supervised machine learning for fault detection and diagnosis in photovoltaic systems. Eur. J. Control 2021, 59, 313–321.
21. Fadhel, S.; Delpha, C.; Diallo, D.; Bahri, I.; Migan, A.; Trabelsi, M.; Mimouni, M.F. PV shading fault detection and classification based on IV curve using principal component analysis: Application to isolated PV system. Sol. Energy 2019, 179, 1–10.
22. Gokmen, N.; Karatepe, E.; Silvestre, S.; Celik, B.; Ortega, P. An efficient fault diagnosis method for PV systems based on operating voltage-window. Energy Convers. Manag. 2013, 73, 350–360.
23. Yi, Z.; Etemadi, A.H. Fault detection for photovoltaic systems based on multi-resolution signal decomposition and fuzzy inference systems. IEEE Trans. Smart Grid 2016, 8, 1274–1283.
24. Pillai, D.S.; Rajasekar, N. An MPPT-based sensorless line–line and line–ground fault detection technique for PV systems. IEEE Trans. Power Electron. 2018, 34, 8646–8659.
25. Das, S.; Hazra, A.; Basu, M. Metaheuristic optimization based fault diagnosis strategy for solar photovoltaic systems under non-uniform irradiance. Renew. Energy 2018, 118, 452–467.
26. Zelikman, E.; Zhou, S.; Irvin, J.; Raterink, C.; Sheng, H.; Avati, A.; Kelly, J.; Rajagopal, R.; Ng, A.Y.; Gagne, D. Short-term solar irradiance forecasting using calibrated probabilistic models. arXiv 2020, arXiv:2010.04715.
27. Joe Qin, S. Statistical process monitoring: Basics and beyond. J. Chemom. A J. Chemom. Soc. 2003, 17, 480–502.
28. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
29. Jackson, J.E.; Mudholkar, G.S. Control procedures for residuals associated with principal component analysis. Technometrics 1979, 21, 341–349.
30. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
31. Feng, C.; Zhao, B.; Zhou, X.; Ding, X.; Shan, Z. An Enhanced Quantum K-Nearest Neighbor Classification Algorithm Based on Polar Distance. Entropy 2023, 25, 127.
32. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Abingdon, UK, 2017.
33. Alkan, A.; Günay, M. Identification of EMG signals using discriminant analysis and SVM classifier. Expert Syst. Appl. 2012, 39, 44–47.
34. Muralidharan, V.; Sugumaran, V. A comparative study of Naïve Bayes classifier and Bayes net classifier for fault diagnosis of monoblock centrifugal pump using wavelet analysis. Appl. Soft Comput. 2012, 12, 2023–2029.
35. Lakshmanaprabu, S.; Shankar, K.; Ilayaraja, M.; Nasir, A.W.; Vijayakumar, V.; Chilamkurti, N. Random forest for big data classification in the internet of things using optimal features. Int. J. Mach. Learn. Cybern. 2019, 10, 2609–2618.
36. Hajji, M.; Yahyaoui, Z.; Mansouri, M.; Nounou, H.; Nounou, M. Fault detection and diagnosis in grid-connected PV systems under irradiance variations. Energy Rep. 2023, 9, 4005–4017.
37. Dhoke, A.; Sharma, R.; Saha, T.K. An approach for fault detection and location in solar PV systems. Sol. Energy 2019, 194, 197–208.
38. Aziz, F.; Haq, A.U.; Ahmad, S.; Mahmoud, Y.; Jalal, M.; Ali, U. A novel convolutional neural network-based approach for fault classification in photovoltaic arrays. IEEE Access 2020, 8, 41889–41904.
39. Yi, Z.; Etemadi, A.H. Line-to-line fault detection for photovoltaic arrays based on multiresolution signal decomposition and two-stage support vector machine. IEEE Trans. Ind. Electron. 2017, 64, 8546–8556.
40. Boggarapu, P.K.; Manickam, C.; Lehman, B.; Ganesan, S.I.; Chilakapati, N. Identification of pre-existing/undetected line-to-line faults in pv array based on preturn on/off condition of the pv inverter. IEEE Trans. Power Electron. 2020, 35, 11865–11878.
41. Fadhel, S.; Trabelsi, M.; Bahri, I.; Diallo, D.; Mimouni, M.F. Faults effects analysis in a photovoltaic array based on current-voltage and power-voltage characteristics. In Proceedings of the 2016 17th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), Sousse, Tunisia, 19–21 December 2016; pp. 223–228.
42. Hichri, A.; Hajji, M.; Mansouri, M.; Abodayeh, K.; Bouzrara, K.; Nounou, H.; Nounou, M. Genetic-Algorithm-Based Neural Network for Fault Detection and Diagnosis: Application to Grid-Connected Photovoltaic Systems. Sustainability 2022, 14, 10518.
43. Dunia, R.; Qin, S.J.; Edgar, T.F.; McAvoy, T.J. Identification of faulty sensors using principal component analysis. AIChE J. 1996, 42, 2797–2812.

Figure 1. Behavior of irradiance over three days in Bondville, Illinois, United States (BON).

Figure 2. Flowchart of proposed GCPV system fault detection and diagnosis methodology.

Figure 3. The studied GCPV system structure.

Figure 4. Internal structure of PV array.

Figure 5. The 3D plot of the first retained principal components.

Figure 6. Scattergram of $T^{2}$ and $Q$ statistics.

Figure 7. Scattergram of $T^{2}$ and $SWE$ statistics.

Figure 8. Flowchart of sampling design for one-class classifiers.

Table 1. Description of the monitored system variables [36].

Variables	Descriptions
$x_{1}$	$I_{p v 1}$ : Output current of the $P V_{1}$ panel $(A)$
$x_{2}$	$I_{p v 2}$ : Output current of the $P V_{2}$ panel $(A)$
$x_{3}$	$V_{p v 1}$ : Output voltage of the $P V_{1}$ panel $(V)$
$x_{4}$	$V_{p v 2}$ : Output voltage of the $P V_{2}$ panel $(V)$
$x_{5}$	$V_{d c}$ : Bus voltage $(V)$
$x_{6}$	$i_{a}$ : Grid current phase a $(A)$
$x_{7}$	$i_{b}$ : Grid current phase b $(A)$
$x_{8}$	$i_{c}$ : Grid current phase c $(A)$

Table 2. Description of the injected faults [36].

Type of Fault	Fault Label	Fault Scenario
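Simple Fault in $PV_{1}$	SFC-$PV_{1}$	Line-to-line fault (LL$_{1}$)
Simple Fault in $PV_{1}$	SFC-$PV_{1}$	Line-to-ground fault (LG$_{1}$)
Simple Fault in $PV_{1}$	SFC-$PV_{1}$	Bypass diode fault (Bp$_{1}$)
Simple Fault in $PV_{1}$	SFC-$PV_{1}$	Connectivity fault (Cn$_{1}$)
Simple Fault in $PV_{2}$	SFC-$PV_{2}$	Line-to-line fault (LL$_{2}$)
Simple Fault in $PV_{2}$	SFC-$PV_{2}$	Line-to-ground fault (LG$_{2}$)
Simple Fault in $PV_{2}$	SFC-$PV_{2}$	Bypass diode fault (Bp$_{2}$)
Simple Fault in $PV_{2}$	SFC-$PV_{2}$	Connectivity fault (Cn$_{2}$)
Mixed Fault	MFC	LL$_{1}$ + LL$_{2}$
Mixed Fault	MFC	Bp$_{1}$ + Cn$_{2}$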

Table 3. Hyper-parameters setting.

Classifier	Hyper-Parameters	Types/Values
KNN	k-value	3
DA	discriminant type	linear
NB	distribution	normal
DT	number of splits	50
SVM	kernel	radial basis function
RF	number of bags	50

Table 4. Confusion matrix for PCA-KNN approach under the standard strategy.

True Classes	Predicted Classes
True Classes	Healthy	Line-to-Line Fault	Connectivity Fault
Healthy	2915	1914	171
Line-to-Line Fault	2171	2708	121
Connectivity Fault	2083	2427	490

Table 5. Confusion matrix for ANN approach under the standard strategy.

True Classes	Predicted Classes
True Classes	Healthy	Line-to-Line Fault	Connectivity Fault
Healthy	3126	1874	0
Line-to-line Fault	1054	2874	1072
Connectivity Fault	436	2202	2362

Table 6. Confusion matrix for PCA-KNN approach under the novel strategy.

True Classes	Predicted Classes
True Classes	Healthy	Line-to-Line Fault	Connectivity Fault
Healthy	4888	91	21
Line-to-Line Fault	10	4990	0
Connectivity Fault	24	0	4976

Table 7. Confusion matrix for ANN approach under the novel strategy.

True Classes	Predicted Classes
True Classes	Healthy	Line-to-Line Fault	Connectivity Fault
Healthy	5000	0	0
Line-to-Line Fault	0	5000	0
Connectivity Fault	0	0	5000

Table 8. Collection of data for fault diagnosis system.

Classes	Mode	Training Data	Testing Data
$C_{0}$	Healthy	35,000	15,000
$C_{1}$	SFC-PV$_{1}$	35,000	15,000
$C_{2}$	SFC-PV$_{1}$	35,000	15,000
$C_{3}$	SFC-PV$_{1}$	35,000	15,000
$C_{4}$	SFC-PV$_{1}$	35,000	15,000
$C_{5}$	SFC-PV$_{2}$	35,000	15,000
$C_{6}$	SFC-PV$_{2}$	35,000	15,000
$C_{7}$	SFC-PV$_{2}$	35,000	15,000
$C_{8}$	SFC-PV$_{2}$	35,000	15,000
$C_{9}$	MFC	35,000	15,000
$C_{10}$	MFC	35,000	15,000

Table 9. Accuracies of extracted features with different classifiers under new strategy.

Classifiers	Data Extracted Features
Classifiers	Group 1	Group 2	Group 3	Group 4
KNN	97.41	58.23	79.82	98.06
DA	38.36	31.76	31.60	35.13
NB	58.56	44.78	47.92	58.87
DT	84.76	58.69	66.52	84.76
SVM	97.85	56.78	78.33	98.00
RF	96.36	71.04	78.43	95.87

Table 10. Confusion matrix of KNN in testing phase.

True Classes	Predicted Classes											Recall
True Classes	$C_{0}$	$C_{1}$	$C_{2}$	$C_{3}$	$C_{4}$	$C_{5}$	$C_{6}$	$C_{7}$	$C_{8}$	$C_{9}$	$C_{10}$	Recall
$C_{0}$	14,946	35	0	0	8	10	0	0	1	0	0	99.64
$C_{1}$	6	14,489	0	0	501	4	0	0	0	0	0	96.59
$C_{2}$	0	0	14,949	12	0	0	0	0	0	0	39	99.66
$C_{3}$	0	0	7	14,915	0	0	0	0	0	0	78	99.43
$C_{4}$	78	0	0	827	13,960	0	0	0	135	0	0	93.06
$C_{5}$	0	0	0	0	3	14,700	0	0	297	0	0	98.0
$C_{6}$	0	0	0	0	11	0	14,966	23	0	0	0	99.77
$C_{7}$	0	0	0	0	0	0	13	14,987	0	0	0	99.91
$C_{8}$	93	0	4	0	213	0	0	573	14,117	0	0	94.11
$C_{9}$	8	14	0	0	24	29	0	0	41	14,884	0	99.23
$C_{10}$	0	4	44	64	0	0	0	0	0	0	14,888	99.25
Precision	98.77	99.63	99.63	94.29	97.84	99.71	99.91	96.17	95.86	100.0	99.22	98.06

Table 11. Concept of multiple one-class classifiers for fault diagnosis.

	Classes
	$C_{0}$	$C_{1}$	$C_{2}$	$C_{3}$	$C_{4}$	$C_{5}$	$C_{6}$	$C_{7}$	$C_{8}$	$C_{9}$	$C_{10}$
$C_{0}$	$1$	−1	−1	−1	−1	−1	−1	−1	−1	−1	−1
$C_{1}$	−1	$1$	−1	−1	−1	−1	−1	−1	−1	−1	−1
$C_{2}$	−1	−1	$1$	−1	−1	−1	−1	−1	−1	−1	−1
$C_{3}$	−1	−1	−1	$1$	−1	−1	−1	−1	−1	−1	−1
$C_{4}$	−1	−1	−1	−1	$1$	−1	−1	−1	−1	−1	−1
$C_{5}$	−1	−1	−1	−1	−1	$1$	−1	−1	−1	−1	−1
$C_{6}$	−1	−1	−1	−1	−1	−1	$1$	−1	−1	−1	−1
$C_{7}$	−1	−1	−1	−1	−1	−1	−1	$1$	−1	−1	−1
$C_{8}$	−1	−1	−1	−1	−1	−1	−1	−1	$1$	−1	−1
$C_{9}$	−1	−1	−1	−1	−1	−1	−1	−1	−1	$1$	−1
$C_{10}$	−1	−1	−1	−1	−1	−1	−1	−1	−1	−1	$1$

Table 12. Accuracies of extracted features under the proposed strategy.

Class	Classifiers
Class	KNN	DA	NB	DT	SVM	RF
$C_{0}$	99.86	90.91	90.49	97.75	99.92	99.51
$C_{1}$	99.66	90.91	90.85	99.73	99.33	99.58
$C_{2}$	99.93	92.58	91.16	99.18	99.60	99.18
$C_{3}$	99.40	90.91	92.74	98.97	99.50	99.34
$C_{4}$	98.91	90.91	78.08	97.69	99.02	98.79
$C_{5}$	99.79	90.91	89.59	98.92	99.23	98.96
$C_{6}$	99.97	91.81	90.07	99.64	99.43	99.95
$C_{7}$	99.63	90.91	96.29	99.27	99.84	99.33
$C_{8}$	99.17	90.91	78.85	96.70	99.23	98.56
$C_{9}$	99.92	90.91	93.79	98.43	99.15	99.55
$C_{10}$	99.86	89.09	93.80	99.23	99.92	98.79
Average	99.64	90.98	90.43	98.68	99.47	99.23

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
