Article

A Sustainable Fault Diagnosis Approach for Photovoltaic Systems Based on Stacking-Based Ensemble Learning Methods

1 Faculty of Sciences and Technology, University of Jijel, Jijel 18000, Algeria
2 Department of Electrical and Computer Engineering, College of Engineering, King Abdul Aziz University, Jeddah 22254, Saudi Arabia
3 Department of Computer and Network Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 21959, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(4), 936; https://doi.org/10.3390/math11040936
Submission received: 24 December 2022 / Revised: 4 February 2023 / Accepted: 8 February 2023 / Published: 12 February 2023
(This article belongs to the Section Engineering Mathematics)

Abstract

In this study, a novel technique for identifying and categorizing faults in small-scale photovoltaic systems is presented. First, a supervised machine learning model (a neural network) was developed for the fault detection process based on the estimated output power. Second, an extra-trees supervised algorithm was used for extracting important features from a current–voltage (I–V) curve. Third, a multi-stacking-based ensemble learning algorithm was developed to effectively classify faults in solar panels. In this work, both single and multiple faults are investigated. The benefit of the stacking strategy is that it can combine the strengths of several machine learning algorithms that are known to deliver good results on classification tasks, producing results that are more precise and efficient than those of any single algorithm. The approach was tested using an experimental dataset, and the findings show that it can accurately diagnose faults (a detection rate of around 98.56% and a classification rate of around 96.21%). A comparative study with different ensemble learning algorithms (AdaBoost, CatBoost, and XGBoost) was conducted to evaluate the effectiveness of the suggested method.

1. Introduction

According to the International Energy Agency (IEA), global photovoltaic (PV) capacity installations around the world reached 942 GWp at the end of 2021 [1]. Solar power facilities are exposed to environmental and technical problems that need to be resolved for energy production to be maintained at planned rates and to contribute to the response to constantly increasing energy needs [2]. For instance, solar systems (domestic installations or large-scale solar farms) may experience breakdowns or serious energy losses because of aging or damage-induced failures occurring in the panels or other PV system components [3]. Thus, the huge number of PV plants installed worldwide needs to be monitored and supervised carefully [4,5]. Automatic fault detection and isolation (FDI) is highly sought after by maintenance practitioners, as it saves the time spent on visual checks and manual measurements. Conducting careful FDI on PV panels may contribute to the sustainability of this renewable resource, which is gradually replacing classical fossil fuel energy resources. With the rapid growth in computing tools, data science, and artificial intelligence (AI), data-based (as opposed to physics-based) FDI models for PV solar systems are emerging.
Currently, many machine learning-based techniques (ML is a branch of AI) for diagnosing PV faults are being developed. For example, in [6], the authors developed an ensemble learning (EL) approach for the fault diagnosis of a small-scale photovoltaic installation. Their approach was based on the idea of combining the outputs of individual machine learning algorithms to obtain better predictions. The obtained results show that the EL algorithm is more efficient than individual ML algorithms in terms of faulty operation mode classification. The authors also stated that their approach has high generalization ability. In their most recent research on the application of machine learning (ML) algorithms in solar systems [7], the authors stressed the capabilities of these algorithms in fault detection and isolation as well as their contributions to the preventive maintenance of PV installations. Three dominant trends were investigated in the last few years using ML and deep learning (DL) techniques: (i) the analysis of PV systems based on data, (ii) the processing of infrared images, and (iii) data classification. The developed algorithms ranged from simple artificial neural networks (ANNs) to DL algorithms (CNNs: convolutional neural networks; LSTM: long short-term memory; etc.). Reference [8] focused on the combination of the Internet of Things and machine learning for the diagnosis of PV panels. One key benefit of the proposed framework is that data are collected in the cloud and that the system was implemented practically based on a Raspberry Pi microcontroller. As stated by the authors, the efficiency of the proposed approach was high, reaching more than 0.96 (on a scale from 0 to 1) accuracy in both fault detection and classification.
In [9], a critical study (of relatively recent works) on the use of ANNs for PV fault categorization and detection was conducted. It was reported that the efficiency of both basic and deep neural networks was (globally) more than 90%. The two main limitations faced by the surveyed works are the non-availability of open data and the difficulty of setting up the hyperparameters of training algorithms. Other problems remain open for further investigation, such as FDI based on infrared imagery and convolutional neural networks (CNNs). The work conducted in [10] used ensemble learning (EL) based on voting principles (including linear regression (LR), support vector machine (SVM), and decision tree (DT)) for the FDI of a solar energy installation. The results show that the proposed EL approach is more efficient than other methods in terms of custom performance metrics, including precision and F1-score. An EL approach for fault detection and classification in PV arrays was developed in [11]. The authors investigated a principal component analysis (PCA) algorithm for feature selection, used an LSTM DL algorithm for forecasting the power of the system, and detected/classified the faults using a category boosting (CatBoost) algorithm. The resulting accuracies were more than 98% in both detection and classification.
To date, the most popular supervised machine learning-based techniques in this field are k-nearest neighbors (k-NN), neural networks (NNs), support vector machine (SVM), random forest (RF), decision tree (DT) [12], and other boosting algorithms [13]. From the EL family, algorithms such as extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and light gradient boosting (LightGBM) have been introduced and have a good ability to resolve regression and classification problems [14]. To the best of the authors' knowledge, this is the first time that stacking-based ensemble learning algorithms have been examined for fault diagnosis in solar systems based on I–V curves, considering single and multiple faults. Few works are related to the application of ensemble learning methods to the fault classification of PV systems; most of them focus only on single faults [15].
Here is a summary of the work’s major contributions:
  • We introduce an efficient and novel technique for PV system defect detection and categorization. It consists of combining NNs and a multi-stacking algorithm to recognize and determine the type of the defect.
  • An accurate neural network-based model was developed for PV power prediction; the main role of this model is to make decisions about the state of the PV system (i.e., whether it is healthy or not) by comparing the predicted power with the measured one.
  • To categorize the fault’s kind (i.e., nature of faults), multi-stacking ensemble learning based on boosting algorithms (such as XGBoost, LightGBM, and CatBoost) was designed.
  • Single and multiple faults were investigated and compared with other machine learning techniques. A single fault means only one defect occurs in the PV array (e.g., short-circuited PV modules, an open circuit, dust accumulated on PV modules, etc.), while a multiple fault means that at least two faults occur at the same time (e.g., dust accumulation with a defective bypass diode).
The remainder of the article is structured as follows: In Section 2, the materials and methods are presented, including the dataset, the experimental setup used, a description of the stacking EL algorithms, and their related mathematical formalism. The obtained results are discussed in Section 3. In the final Section 4, the conclusions are provided along with suggestions for future work.

2. Materials and Methods

2.1. Datasets

To develop our fault detection and classification method, two datasets were employed. The first one comprises measured solar irradiance, air temperature, and photovoltaic output power sampled at a 1 h interval. The size of the first dataset is 7000 samples (around 365 days, with some missing data). As an example, Figure 1 shows the measured data for a period of 11 days.
The second dataset contains 1500 measured current–voltage (I–V) curves. Both datasets were collected under various working conditions. The data were collected over a period of 125 days (12 samples each day). The frequent faults that may occur in PV arrays were investigated in this work, namely, dust deposits on PV modules (F1), partial shading (F2), an open-circuit diode with dust accumulation (F3), partial shading with dust accumulation (F4), and a shunted diode with shading (F5) (see Table 1).
Figure 2 shows the considered PV array, the test facility, and the used equipment for collecting the I–V curves (in faulty or normal working conditions).
Figure 3 depicts an example of the measured I–V curves under various working conditions.

2.2. Methods

This section provides the mathematical concepts of the algorithms used for developing the fault detection and classification approach applied to photovoltaic solar panels.

2.2.1. Neural Networks

ANNs are computational structures that have the ability to imitate the human brain's behavior [16,17]. NNs are mainly composed of neurons organized in layers. An NN operates by presenting the information to be processed (known as features) to the input layer. The information provided by the input layer is progressively transferred to the output layer via one (or multiple) hidden layers. Let $X^i$ denote the vector of inputs and $y_{out}^i$ the output of the NN. The basic mathematical description of the NN operation is provided as follows:
$y_{out}^{i} = F(X^{i}, W)$ (1)
$F$ is a nonlinear function, $W$ denotes the vector of weights, and $i$ is an integer index. The training process aims to find the optimal weights that minimize the quadratic error between the NN output $y_{out}^i$ and the target $y^i$. Mathematically, training an NN is translated into an optimization (minimization) problem defined as follows:
$W^{*} = \arg\min_{W} \frac{1}{N} \sum_{i=1}^{N} \left( y^{i} - y_{out}^{i} \right)^{2}$ (2)
Training algorithms usually use gradient descent techniques where the weights are iteratively updated while moving toward the steepest descent direction. The iterative form of the solution is given by:
$W(t) = W(t-1) + \Delta W(t)$ (3)
The gradient of the weight vector is provided by:
$\Delta W(t) = \epsilon(t)\, \frac{\partial \left( y^{i} - F(X^{i}, W) \right)}{\partial W}$ (4)
Training algorithms, such as Levenberg–Marquardt and resilient propagation, differ only in the way the gradient component is calculated [18].
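The iterative update above can be sketched for the simplest possible case. The following is an illustrative example (not the authors' code): a single linear neuron trained by gradient descent on the mean squared error, where the update $W(t) = W(t-1) + \Delta W(t)$ moves the weights toward the steepest-descent direction. The data, step size, and iteration count are assumptions chosen for demonstration.

```python
import numpy as np

# Illustrative sketch: gradient-descent training of a single linear neuron,
# showing the iterative update W(t) = W(t-1) + dW(t) from Equations (3)-(4).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))        # input features X^i
true_W = np.array([2.0, -1.0])
y = X @ true_W                               # targets y^i

W = np.zeros(2)                              # initial weights
eps = 0.1                                    # step size epsilon(t), kept constant here
for t in range(500):
    y_out = X @ W                            # network output F(X^i, W)
    grad = -2.0 / len(X) * X.T @ (y - y_out) # gradient of the mean squared error
    W = W - eps * grad                       # step toward the steepest descent

print(W)  # approaches the true weights [2.0, -1.0]
```

Full training algorithms such as Levenberg–Marquardt replace the constant step with an adaptively damped Gauss–Newton step, but the iterative structure is the same.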

2.2.2. Ensemble Learning

EL is a machine learning approach where several models, referred to as weak learners, are trained to address the same issue and then merged to obtain better outcomes [19]. With reference to the literature, there are mainly three ensemble learning approaches: boosting, bagging, and stacking. Boosting trains homogeneous weak learners sequentially and highly adaptively (each base model depends on the preceding ones). Bagging considers homogeneous weak learners, trains them concurrently and independently of one another, and then combines them through some deterministic mechanism. Stacking makes use of diverse weak learners, combines them through parallel learning, and trains a meta-model to produce forecasts based on the outputs of the several weak models.
The main difference between stacking, bagging, and boosting algorithms can be summarized as follows [20,21]:
1. Stacking utilizes a meta-model to learn how to integrate the base models, whereas boosting (e.g., AdaBoost, GBM, CatBoost, XGBoost, LightGBM, etc.) and bagging (e.g., random forest, the bagging meta-estimator, etc.) follow deterministic approaches to aggregate weak learners.
2. Stacking often considers heterogeneous weak learners, while bagging and boosting focus on homogeneous ones.
A block diagram of a stacking EL algorithm is shown in Figure 4.
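The scheme of Figure 4 (several heterogeneous base models plus one meta-model trained on their predictions) can be sketched with scikit-learn's `StackingClassifier`. The synthetic dataset and the particular choice of base and meta estimators below are illustrative assumptions, not the configuration used in this paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative stacking ensemble: heterogeneous base models trained in
# parallel, and a meta-model that learns to combine their predictions.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

base_models = [
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=42)),
]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_train, y_train)           # base models are cross-validated internally
acc = stack.score(X_test, y_test)     # the meta-model combines their outputs
print(f"stacking test accuracy: {acc:.2f}")
```

Internally, the base models' out-of-fold predictions form the training features of the meta-model, which is exactly the role of the meta-model block in Figure 4.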
The next subsections give brief descriptions of the ensemble learning algorithms that were taken into consideration in this work.

CatBoost

CatBoost was first introduced by Yandex in [22]. It is an open-source ML algorithm for gradient boosting on decision trees. During the training phase, a collection of decision trees is created iteratively. As training goes on, each new tree is constructed with a lower loss than the one before it. CatBoost belongs to the EL family of algorithms, and its main features are good performance without parameter tuning, categorical feature support, and improved accuracy with fast prediction. Similar to neural networks, a gradient boosting algorithm iteratively maps a set of input features to a target value [23]. The function $F^{t}$ (of the same form as the one in Equation (1)) is obtained from its previous instance $F^{t-1}$ according to the following equation:
$F^{t} = F^{t-1} + \alpha h^{t}$ (5)
$\alpha$ is a step size and $h^{t}$ is a base predictor chosen in such a manner that a loss function $L$ is minimized as follows:
$h^{t} = \arg\min_{h \in H} L\left( F^{t-1} + h \right)$ (6)
For more details about the CatBoost algorithm and its flowchart, the interested reader can refer to [22].
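The boosting iteration of Equations (5) and (6) can be made concrete with a short from-scratch loop. This is a generic gradient-boosting sketch for the squared-error loss (for which the base predictor is fitted to the residuals), not CatBoost itself, which additionally uses ordered boosting and categorical feature handling; the data and hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative gradient boosting loop: F_t = F_{t-1} + alpha * h_t, where
# each base predictor h_t is a shallow tree fitted to the current residuals
# (the negative gradient of the squared-error loss).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * X[:, 0])                     # assumed target function

alpha = 0.1                                  # step size
F = np.full_like(y, y.mean())                # F_0: initial constant prediction
for t in range(100):
    residuals = y - F                        # negative gradient of squared loss
    h = DecisionTreeRegressor(max_depth=2).fit(X, residuals)  # base predictor h_t
    F = F + alpha * h.predict(X)             # F_t = F_{t-1} + alpha * h_t

mse = float(np.mean((y - F) ** 2))
print(f"training MSE after boosting: {mse:.4f}")
```

Each iteration reduces the loss relative to the previous ensemble, matching the description of how each new tree is constructed with a lower loss than the one before it.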

XGBoost

XGBoost (eXtreme Gradient Boosting) was first introduced by Tianqi Chen [24]; it is an implementation of gradient boosting machines and belongs to a broader collection of tools under the umbrella of the distributed machine learning community. Unlike plain gradient boosting, XGBoost makes use of regularization parameters that help prevent overfitting. For more details, interested readers can refer to [24].

LightGBM

In 2016, G. Ke et al. at Microsoft created the initial version of the light gradient boosting machine (LightGBM) [25]. It relies on two novel methods: gradient-based one-side sampling, for dealing with large numbers of data instances, and exclusive feature bundling, for dealing with a high number of features.

2.2.3. The Proposed Method

The global flowchart developed in this study is outlined in Figure 5 while the detailed one concerning the detection and classification models is outlined in Figure 6.

Features Extraction and Selection

After collecting the I–V curves, the next step consists of extracting different features (Imp, Isc, Vmp, Voc, Pmp, FF, I′, V′, I″, and V″) from each curve. These variables are defined as follows: Imp is the current at the maximum power point (MPP), Vmp is the voltage at the MPP, Isc is the short-circuit current (at V = 0), Voc is the open-circuit voltage (at I = 0), FF is the fill factor, (I′, V′) is the calculated current–voltage pair at V′ = 2/3·Voc, i.e., I′ = I(V = 2/3·Voc), and (I″, V″) is the pair at V″ = 1/3·Voc, i.e., I″ = I(V = 1/3·Voc); see Figure 7.
As an example, the calculated values of features with labeled classes are listed in Table 2.
Regarding the fault labeling process, based on the I–V curves, we assigned the type of faults manually, for example, F1 = 1, F2 = 2, F3 = 3, F4 = 4, and F5 = 5.
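A hedged sketch of the feature extraction step follows. The synthetic I–V curve below is an assumption standing in for the tracer measurements; the extraction logic (short-circuit current at V = 0, open-circuit voltage where I = 0, maximum power point, fill factor, and the interpolated currents at 2/3·Voc and 1/3·Voc) follows the definitions above.

```python
import numpy as np

# Extract the Section 2.2.3 features from a sampled I-V curve.
# The curve shape below is an illustrative assumption, not measured data.
V = np.linspace(0.0, 20.0, 500)              # voltage sweep (V)
I = 5.0 * (1.0 - (V / 20.0) ** 8)            # illustrative I-V curve (A)

Isc = I[0]                                   # short-circuit current, I(V = 0)
Voc = V[np.argmin(np.abs(I))]                # open-circuit voltage, where I = 0
P = V * I
k = int(np.argmax(P))
Pmp, Vmp, Imp = P[k], V[k], I[k]             # maximum power point
FF = Pmp / (Isc * Voc)                       # fill factor
I1 = np.interp(2.0 / 3.0 * Voc, V, I)        # I' = I(V = 2/3 Voc)
I2 = np.interp(1.0 / 3.0 * Voc, V, I)        # I'' = I(V = 1/3 Voc)

print(round(Isc, 2), round(Voc, 2), round(FF, 3), round(I1, 2), round(I2, 2))
```

On a real curve the same interpolation approach applies, provided the voltage samples are sorted in ascending order.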

Fault Detection Model

To detect a fault, a neural network (NN) was developed based on historical values of the measured PV output power (Pt), solar irradiance (Gt), and air temperature (Tt). The fault detection model estimates the PV power (Pt*) and then compares it with the measured power (Pt). The model takes the actual values of Gt and Tt at time t as inputs, and its output is the predicted power Pt*. A threshold (Thr) was estimated based on extensive experiments. Thus, if the residual (Delta = Pt − Pt*) remains below the reference threshold (Thr), there is no fault; otherwise, a fault is detected. 80% of the observations were used for training and 20% for testing.
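The residual-based detection rule can be summarized in a few lines. This is a minimal sketch: the threshold value and the sample powers below are illustrative assumptions, not the experimentally estimated Thr.

```python
# Residual-based fault detection rule: flag a fault when the difference
# between measured and predicted power exceeds the threshold.
Thr = 5.0   # threshold in watts, assumed here for illustration only

def detect_fault(p_measured, p_predicted, thr=Thr):
    """Return True when the residual |Pt - Pt*| exceeds the threshold."""
    delta = p_measured - p_predicted
    return abs(delta) > thr

print(detect_fault(120.0, 118.0))   # small residual -> healthy (False)
print(detect_fault(120.0, 150.0))   # large residual -> fault detected (True)
```

In practice the threshold must absorb the NN's normal prediction error, which is why it is tuned experimentally on healthy-operation data.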

Fault Classification Model

This work was motivated by the development of the multi-stacking classifier, which combines multiple levels. On the basis of the entire training set, the base-level models (model no. 1, model no. 2, and model no. 3) are trained. In the second level, the predictions are used to feed the two meta-models (meta-model no. 1 and meta-model no. 2). The final model is then trained using the features produced by the meta-models. Figure 8 shows the multi-stacking diagram for classification purposes.
  • The first level contains three base models: LightGBM, CatBoost, and XGBoost algorithms.
  • The second level contains two meta-models: meta-model no. 1 = RF (CatBoost and XGBoost) and meta-model no. 2 = RF (LightGBM and XGBoost).
  • The last level is the final meta-model: meta-model final = RF (meta-model no. 1, meta-model no. 2).
A random forest (RF) algorithm is used as a meta-model. The main assumption considered in this paper is that if base models are combined in the right way, more accurate results may be obtained.
A set of observations (80%) was used to train the multi-stacking EL classifier, while the remaining set (20%) was used to test it. Once the detection model (NN) finds a defect, the next step consists of classifying the type of fault using the developed multi-stacking classifier (MSEL). The inputs of the MSEL-based classifier are the selected features. The outputs are the labeled faults, respectively, F1 (class 1), F2 (class 2), F3 (class 3), F4 (class 4), and F5 (class 5).

Performance Evaluation

Error metrics, such as the correlation coefficient (R, a numerical measure of the statistical link between two variables), the mean absolute error (MAE, a measure of the average difference between matched observations representing the same phenomenon), F1-score, precision, recall, and accuracy, were calculated to assess the performance of the generated models. To obtain more reliable estimates of model performance, the K-fold cross-validation technique was used.
$accuracy = \frac{TH + TF}{TH + TF + FH + FF}$ (7)

$precision = \frac{TH}{TH + FH}$ (8)

$recall = \frac{TH}{TH + FF}$ (9)

$F1\text{-}score = \frac{2 \cdot precision \cdot recall}{precision + recall}$ (10)
where $TH$ is the true healthy estimation, $TF$ the true faulty estimation, $FH$ the false healthy estimation, and $FF$ the false faulty estimation.
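The four metrics in Equations (7)–(10) can be computed directly from the confusion counts. The counts below are illustrative assumptions, not the paper's results (FF is suffixed with an underscore to avoid clashing with the fill-factor symbol used elsewhere).

```python
# Compute accuracy, precision, recall, and F1-score from assumed confusion
# counts: true healthy, true faulty, false healthy, false faulty.
TH, TF, FH, FF_ = 95, 90, 5, 10   # illustrative values

accuracy = (TH + TF) / (TH + TF + FH + FF_)          # Equation (7)
precision = TH / (TH + FH)                           # Equation (8)
recall = TH / (TH + FF_)                             # Equation (9)
f1_score = 2 * precision * recall / (precision + recall)  # Equation (10)

print(round(accuracy, 3), round(precision, 3), round(recall, 3),
      round(f1_score, 3))
```

Note that precision penalizes false healthy calls while recall penalizes false faulty calls, which is why both are reported alongside accuracy.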
All codes were implemented in the Python language using open-source libraries such as Keras and TensorFlow. Google Colab Pro was used to execute the codes online.

3. Results and Discussion

The proposed method was developed in the Python language with other open-source machine learning libraries, such as scikit-learn [24], Keras [25], CatBoost [26], and LightGBM [27].

3.1. Features Selection

Each feature was taken from the I–V characteristic. To see how features are related to each other as well as with the fault class, the correlation matrix with the heat map was plotted (see Figure 9a). The heat map is used to identify which features are most related to the fault class.
In the last row (fault), it can be seen that the fault class is highly correlated with Voc, which in turn seems to be the least correlated with the other parameters. The feature importances are depicted in Figure 9b. As can be seen, the four most important features are FF, V′, Vmp, and Voc. These four features were selected to develop the fault classification model.
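The correlation-based part of this selection step can be sketched with pandas. The synthetic frame below stands in for the real Table 2 feature set (the fault-dependent trends are assumptions chosen so that the ranking is visible); features are ranked by the absolute value of their correlation with the fault class.

```python
import numpy as np
import pandas as pd

# Rank features by |correlation| with the fault class, as read off the
# last row of the correlation heat map. Data below are synthetic.
rng = np.random.default_rng(3)
n = 200
fault = rng.integers(1, 6, size=n)                       # fault classes 1..5
df = pd.DataFrame({
    "Voc": 18.0 - 1.5 * fault + rng.normal(0, 0.3, n),   # strongly fault-dependent
    "FF": 0.7 - 0.05 * fault + rng.normal(0, 0.02, n),   # strongly fault-dependent
    "Isc": rng.normal(5.0, 0.5, n),                      # weakly related
    "fault": fault,
})
corr = df.corr()["fault"].drop("fault").abs().sort_values(ascending=False)
print(corr.index.tolist())   # features ranked by correlation with the fault class
```

Tree-based importance scores (as in Figure 9b) complement this view because they also capture nonlinear and interaction effects that plain correlation misses.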

3.2. Fault Detection Using the NN-Predictor

Figure 10a depicts a comparison of the observed and expected (by the developed NN-based predictor) PV powers and Figure 10b shows the correlation observed between the measured and predicted powers.
These graphs indicate the ability of the model to predict the power with good accuracy. Table 3 shows that the correlation coefficient is equal to 98.56%, and the MAE is equal to 1.76 W. These results demonstrate the capability of the NN model to estimate the PV power.

3.3. Fault Classification Using the Single and Multi-Stacking EL Classifier

To assess how well the single and multi-stacking EL classifiers performed, the following performance metrics (recall, precision, accuracy, and F1-score) were calculated (see Table 4). The precision, recall, and F1-score for both classifiers ranged between 93% and 100%. As can be seen in Table 4, a slight improvement in accuracy (96.21%) was obtained (see the bold numbers). However, in terms of training time, the multi-stacking classifier took longer than the single-stacking classifier. It should be pointed out that several trials were conducted to optimize both classifiers by tuning their hyperparameters (e.g., max_depth, n_estimators, learning_rate, reg_lambda, reg_alpha, num_leaves, max_features, etc.).
Figure 11 depicts the confusion matrix of the developed classifiers.
Values shown on the diagonal of the matrices are the correctly classified samples, while the others are misclassified. The accuracy can be determined by dividing the sum of the matrix's diagonal elements by the total number of samples.
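This diagonal-over-total computation is a one-liner with numpy. The matrix values below are illustrative, not the actual Figure 11 results.

```python
import numpy as np

# Accuracy from a 5-class confusion matrix: sum of the diagonal (correctly
# classified samples) divided by the total number of samples.
cm = np.array([[58, 1, 0, 1, 0],     # illustrative counts, one row per true class
               [2, 55, 1, 0, 0],
               [0, 1, 57, 2, 0],
               [1, 0, 1, 59, 1],
               [0, 0, 0, 1, 60]])
accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.4f}")
```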

3.4. Comparison with Other EL Algorithms

To show the effectiveness of the proposed method in terms of classification accuracy, a comparative study with other EL-based algorithms was conducted (boosting: AdaBoost, CatBoost, and XGBoost; bagging: the bagging meta-estimator and random forest). Table 5 presents the obtained results.
Bold indicates the best results. From Table 5, it can be seen that random forest outperformed the other examined EL algorithms, while the lowest accuracy was obtained by AdaBoost. The other algorithms yielded almost similar results. Random forest had the same efficiency as single-stacking EL, while multi-stacking EL performed slightly better than RF (see Table 4).

4. Conclusions

This paper introduces a novel technique for the identification and categorization of PV system defects. It consists of using a neural network for the detection process and a multi-stacking EL classifier for the classification of the detected faults. Single and multiple faults are considered in this work. Different hyperparameters were investigated in order to improve the constructed classifiers. The accuracy of the fault detection procedure depends on the dataset, which should be updated frequently (primarily as a result of PV cells deteriorating over time). The classification accuracy could also be improved using a larger dataset of better quality.
The present work could be further implemented and verified experimentally. In addition, designing an online-embedded system by integrating such techniques into hardware devices (such as microcontrollers, reconfigurable circuits, or microprocessors) is still an open challenge in this topic.

Author Contributions

Conceptualization, A.M., C.Z. and S.B.; methodology, A.M., C.Z., and S.B.; software, A.M. and S.K.; validation, A.M. and S.B.; formal analysis, A.M.; investigation, A.M., C.Z. and S.B.; resources, C.Z.; data curation, A.M.; writing—original draft preparation, A.M. and S.B.; writing—review and editing, A.M., C.Z., S.B., and S.K.; visualization, A.M.; supervision, C.Z.; project administration, A.M.; funding acquisition, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was funded by Institutional Fund Projects under grant no. (IFPIP: 1136-144-1443). The authors gratefully acknowledge the technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.

Data Availability Statement

Data are not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
ANNs: Artificial Neural Networks
CatBoost: Categorical Boosting
CNNs: Convolutional Neural Networks
DL: Deep Learning
DT: Decision Tree
EL: Ensemble Learning
FDI: Fault Detection and Isolation
GBoost: Gradient Boosting
IEA: International Energy Agency
Imp: Current at the Maximum Power Point
IoT: Internet of Things
Isc: Short-Circuit Current
k-NN: k-Nearest Neighbors
LightGBM: Light Gradient Boosting Machine
LR: Linear Regression
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
ML: Machine Learning
MPP: Maximum Power Point
NNs: Neural Networks
PCA: Principal Component Analysis
PV: Photovoltaic
R: Correlation Coefficient
RF: Random Forest
SVM: Support Vector Machine
Vmp: Voltage at the MPP
Voc: Open-Circuit Voltage
XGBoost: eXtreme Gradient Boosting

References

  1. Snapshot of Global PV Markets. Report IEA-PVPS T1-42:2022. Available online: https://iea-pvps.org/snapshot-reports/snapshot-2022/ (accessed on 25 April 2022).
  2. Al-Dousari, A.; Al-Nassar, W.; Al-Hemoud, A.; Alsaleh, A.; Ramadan, A.; Al-Dousari, N.; Ahmed, M. Solar and wind energy: Challenges and solutions in desert regions. Energy 2019, 176, 184–194. [Google Scholar] [CrossRef]
  3. Leva, S.; Aghaei, M. Failures and Defects in PV Systems. Power Eng. Adv. Chall. Part Electr. Power 2018, 55, 56–84. [Google Scholar]
  4. Mellit, A.; Kalogirou, S. Artificial intelligence and internet of things to improve efficacy of diagnosis and remote sensing of solar photovoltaic systems: Challenges, recommendations and future directions. Renew. Sustain. Energy Rev. 2021, 143, 110889. [Google Scholar] [CrossRef]
  5. Chaouch, H.; Charfeddine, S.; Ben Aoun, S.; Jerbi, H.; Leiva, V. Multiscale Monitoring Using Machine Learning Methods: New Methodology and an Industrial Application to a Photovoltaic System. Mathematics 2022, 10, 890. [Google Scholar] [CrossRef]
  6. Kapucu, C.; Cubukcu, M. A supervised ensemble learning method for fault diagnosis in photovoltaic strings. Energy 2021, 227, 120463. [Google Scholar] [CrossRef]
  7. Tina, G.M.; Ventura, C.; Ferlito, S.; De Vito, S. A state-of-art-review on machine-learning based methods for PV. Appl. Sci. 2021, 11, 7550. [Google Scholar] [CrossRef]
  8. Mellit, A.; Herrak, O.; Rus Casas, C.; Massi Pavan, A. A machine learning and internet of things-based online fault diagnosis method for photovoltaic arrays. Sustainability 2021, 13, 13203. [Google Scholar] [CrossRef]
  9. Li, B.; Delpha, C.; Diallo, D.; Migan-Dubois, A. Application of Artificial Neural Networks to photovoltaic fault detection and diagnosis: A review. Renew. Sustain. Energy Rev. 2021, 138, 110512. [Google Scholar] [CrossRef]
  10. Yang, N.C.; Ismail, H. Voting-based ensemble learning algorithm for fault detection in photovoltaic systems under different weather conditions. Mathematics 2022, 10, 285. [Google Scholar] [CrossRef]
  11. Mellit, A.; Boubaker, S. An effective ensemble learning method for fault diagnosis of photovoltaic arrays. In Proceedings of the 3rd International Conference on Electronic Engineering and Renewable Energy (ICEERE’2022), Saidia, Morocco, 20–22 May 2022. [Google Scholar]
  12. Mandal, R.K.; Kale, P.G. Assessment of different multiclass SVM strategies for fault classification in a PV system. In Proceedings of the 7th International Conference on Advances in Energy Research; Springer: Singapore, 2021; pp. 747–756. [Google Scholar] [CrossRef]
  13. Mellit, A.; Kalogirou, S. Assessment of machine learning and ensemble methods for fault diagnosis of photovoltaic systems. Renew. Energy 2022, 184, 1074–1090. [Google Scholar] [CrossRef]
  14. Dhibi, K.; Mansouri, M.; Bouzrara, K.; Nounou, H.; Nounou, M. An enhanced ensemble learning-based fault detection and diagnosis for grid-connected PV systems. IEEE Access 2021, 9, 155622–155633. [Google Scholar] [CrossRef]
  15. Mellit, A.; Kalogirou, S. Handbook of Artificial Intelligence Techniques in Photovoltaic Systems: Modeling, Control, Optimization, Forecasting and Fault Diagnosis; Elsevier: Amsterdam, The Netherlands, 2022. [Google Scholar]
  16. Haykin, S. Chapter 4 in Neural Networks and Learning Machines, 3/E; Pearson Education India: Delhi, India, 2009. [Google Scholar]
  17. Lichtner-Bajjaoui, A. A Mathematical Introduction to Neural Networks. 2021. Available online: http://diposit.ub.edu/dspace/handle/2445/180441 (accessed on 25 April 2022).
  18. Boubaker, S.; Kamel, S.; Kchaou, M. Prediction of daily global solar radiation using resilient-propagation artificial neural network and historical data: A case study of Hail, Saudi Arabia. Eng. Technol. Appl. Sci. Res. 2020, 10, 5228–5232. [Google Scholar] [CrossRef]
  19. Zhang, C.; Ma, Y. Ensemble Machine Learning: Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  20. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
  21. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef] [PubMed]
  22. CatBoost. Available online: https://CatBoost.ai/ (accessed on 25 April 2022).
  23. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 37. [Google Scholar]
  24. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  25. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
  26. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef] [Green Version]
  27. Keras. Available online: https://keras.io/ (accessed on 25 April 2022).
Figure 1. Measured data for 11 days: (a) PV output power, (b) air temperature, and (c) solar irradiance.
Figure 2. (a) The PV array under study, (b) test facility, and (c) Prova I–V tracer.
Figure 3. Screenshot of the collected I–V curves using the PROVA 210 I–V tracer, under various working conditions; (a) normal and (b) faulty I–V curve.
Figure 4. A block diagram of a stacking method using n models and one meta model.
Figure 5. The global flowchart developed in this study.
Figure 6. The fault detection and classification steps.
Figure 7. Example of extracted features from an I–V curve.
Figure 8. The proposed multi-stacking EL classifier.
Figure 9. (a) Correlation and heat map, (b) importance features.
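The importance ranking of Figure 9b comes from an extra-trees model [26]. A minimal sketch using scikit-learn's ExtraTreesClassifier; the feature names follow Table 2, but the data here are random placeholders, not the experimental dataset:

```python
# Impurity-based feature importance from extremely randomized trees.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
names = ["G", "T", "Isc", "Voc", "Imp", "Vmp", "Pmp", "FF"]
X = rng.random((200, len(names)))          # placeholder I-V features
y = rng.integers(1, 6, size=200)           # five fault classes

model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
# Importances sum to 1; ranking them reproduces a plot like Figure 9b.
for name, score in sorted(zip(names, model.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")
```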
Figure 10. Simulation results: (a) measured versus predicted PV power and (b) scattered plot of the measured and predicted PV powers.
Figure 11. Confusion matrices: (a) single-stacking classifier and (b) multi-stacking classifier.
Table 1. Dataset Description.
Dataset No. 2:

| Type of Fault | Description | Single or Multiple Fault |
|---|---|---|
| F1 | Dust deposit on PV modules | single |
| F2 | Partial shading effect | single |
| F3 | Open-circuit diode with dust accumulation | multiple |
| F4 | Partial shading with dust accumulation | multiple |
| F5 | Shunted diode with shading | multiple |
Table 2. Extracted features from the I–V characteristics with labeled classes at various working conditions.
| G (W/m²) | T (°C) | Isc (A) | Voc (V) | Imp (A) | Vmp (V) | Pmp (W) | FF (−) | I (A) | V (V) | I (A) | V (V) | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 327 | 22.89 | 3.66 | 18.32 | 2.45 | 7.99 | 19.61 | 0.29 | 3.11 | 12.25 | 3.29 | 6.10 | 4 |
| 320 | 28 | 17.27 | 9.77 | 10.42 | 6.43 | 67.04 | 0.39 | 14.68 | 6.52 | 15.54 | 3.25 | 2 |
| 380 | 29 | 5.102 | 16.21 | 3.370 | 6.8 | 23.05 | 0.278 | 4.34 | 10.81 | 4.59 | 5.40 | 3 |
| 420 | 37 | 25.79 | 17.26 | 19.77 | 13.06 | 258.24 | 0.58 | 21.92 | 11.50 | 23.21 | 5.73 | 1 |
| 210 | 17 | 12.89 | 18.26 | 9.88 | 14.06 | 139.01 | 0.59 | 10.96 | 12.172 | 11.65 | 6.08 | 5 |
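The fill factor column in Table 2 follows the standard definition FF = Pmp/(Isc · Voc) with Pmp = Imp · Vmp, which can be checked against the table's last row:

```python
# Fill factor from short-circuit, open-circuit and maximum-power-point values.
def fill_factor(isc, voc, imp, vmp):
    """FF = (Imp * Vmp) / (Isc * Voc), dimensionless."""
    return (imp * vmp) / (isc * voc)

isc, voc, imp, vmp = 12.89, 18.26, 9.88, 14.06   # Table 2, class-5 row
pmp = imp * vmp                                   # ~138.91 W vs. 139.01 W listed
print(f"Pmp = {pmp:.2f} W, FF = {pmp / (isc * voc):.2f}")  # FF ~ 0.59
```

The small gap between the computed and listed Pmp reflects rounding of the tabulated values.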
Table 3. Error metrics of the NN model used for fault detection.
| MLP Model | MAE (W) | R (%) |
|---|---|---|
| Epochs = 2000, optimizer = Adam, activation = logistic, loss = MSE, units = 20 × 10 × 5 | 1.76 | 98.56 |
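The MAE and R values in Table 3 are the usual mean absolute error and correlation coefficient between measured and predicted power. A minimal sketch with illustrative values, not the paper's data:

```python
# Standard error metrics for the power-prediction model of Table 3.
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs (here watts)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def r_percent(y_true, y_pred):
    """Pearson correlation between measured and predicted power, in percent."""
    return float(np.corrcoef(y_true, y_pred)[0, 1] * 100.0)

measured = np.array([120.0, 180.0, 240.0, 200.0, 90.0])   # illustrative
predicted = np.array([118.0, 183.0, 236.0, 204.0, 92.0])
print(f"MAE = {mae(measured, predicted):.2f} W, "
      f"R = {r_percent(measured, predicted):.2f}%")
```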
Table 4. The calculated error metrics of the fault classification method (single and multi-stacking EL).
| Fault Classes | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%) |
|---|---|---|---|---|
| Single-stacking EL classifier | | | | 95.58 |
| F1 (class no. 1) | 100 | 96 | 98 | |
| F2 (class no. 2) | 97 | 93 | 95 | |
| F3 (class no. 3) | 97 | 97 | 97 | |
| F4 (class no. 4) | 93 | 96 | 94 | |
| F5 (class no. 5) | 79 | 94 | 86 | |
| Multi-stacking EL classifier | | | | 96.21 |
| F1 (class no. 1) | 100 | 96 | 98 | |
| F2 (class no. 2) | 97 | 96 | 96 | |
| F3 (class no. 3) | 97 | 97 | 97 | |
| F4 (class no. 4) | 95 | 96 | 96 | |
| F5 (class no. 5) | 79 | 94 | 86 | |
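The per-class precision, recall, and F1-score reported in Tables 4 and 5 follow their standard confusion-matrix definitions (cf. Figure 11). A sketch with a toy three-class matrix, not the paper's data:

```python
# Per-class metrics from a confusion matrix cm, where cm[i, j] counts
# samples of true class i predicted as class j.
import numpy as np

def per_class_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # correct / predicted-as-class
    recall = tp / cm.sum(axis=1)      # correct / actually-in-class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()    # overall, as in the Accuracy column
    return precision, recall, f1, accuracy

cm = [[48,  2,  0],
      [ 3, 45,  2],
      [ 0,  5, 45]]
p, r, f1, acc = per_class_metrics(cm)
print(np.round(p * 100), np.round(r * 100), np.round(f1 * 100),
      round(acc * 100, 2))
```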
Table 5. Accuracies of various classification models.
| Fault Classes | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%) |
|---|---|---|---|---|
| Boosting algorithms | | | | |
| AdaBoost | | | | 93.37 |
| F1 (class no. 1) | 97 | 96 | 97 | |
| F2 (class no. 2) | 94 | 92 | 93 | |
| F3 (class no. 3) | 99 | 95 | 97 | |
| F4 (class no. 4) | 92 | 95 | 93 | |
| F5 (class no. 5) | 63 | 75 | 69 | |
| CatBoost | | | | 95.26 |
| F1 (class no. 1) | 96 | 97 | 97 | |
| F2 (class no. 2) | 93 | 94 | 94 | |
| F3 (class no. 3) | 100 | 97 | 97 | |
| F4 (class no. 4) | 95 | 95 | 95 | |
| F5 (class no. 5) | 81 | 81 | 81 | |
| XGBoost | | | | 94.95 |
| F1 (class no. 1) | 100 | 96 | 98 | |
| F2 (class no. 2) | 96 | 96 | 96 | |
| F3 (class no. 3) | 97 | 95 | 96 | |
| F4 (class no. 4) | 95 | 95 | 95 | |
| F5 (class no. 5) | 67 | 88 | 76 | |
| Bagging algorithms | | | | |
| Meta-bagging | | | | 94.32 |
| F1 (class no. 1) | 100 | 96 | 98 | |
| F2 (class no. 2) | 96 | 96 | 96 | |
| F3 (class no. 3) | 99 | 95 | 97 | |
| F4 (class no. 4) | 93 | 95 | 94 | |
| F5 (class no. 5) | 60 | 75 | 67 | |
| Random forest | | | | 95.58 |
| F1 (class no. 1) | 96 | 99 | 97 | |
| F2 (class no. 2) | 96 | 96 | 96 | |
| F3 (class no. 3) | 100 | 95 | 97 | |
| F4 (class no. 4) | 95 | 95 | 95 | |
| F5 (class no. 5) | 78 | 88 | 82 | |

Share and Cite

Mellit, A.; Zayane, C.; Boubaker, S.; Kamel, S. A Sustainable Fault Diagnosis Approach for Photovoltaic Systems Based on Stacking-Based Ensemble Learning Methods. Mathematics 2023, 11, 936. https://doi.org/10.3390/math11040936
