1. Introduction
In recent years, with the development of the green energy industry and the implementation of sustainable energy policies in many countries, the construction of solar power plants has expanded continuously and the solar energy market has kept growing. However, because solar modules operate for long periods in harsh natural environments, faults are inevitable and can significantly shorten their actual service life. When a module fails, the direct risk is damage to the module itself, which reduces power generation efficiency; the indirect risk is that the entire solar photovoltaic (PV) power generation system may malfunction, adversely affecting the grid and causing severe economic losses.
Image-based solar fault diagnostic methods apply image processing to different kinds of images, such as visible light, infrared thermography [1], and electroluminescence images [2,3], to detect partial shading, hot spot faults, and crack faults in solar modules. These methods require additional sensors to collect data and are therefore time-consuming and costly.
To address the maintenance burden created by the rapidly growing demand for large-scale solar panel installations, this study applies deep learning to the string characteristics of solar systems. It establishes a fault diagnostic system that analyzes the I-V curves of solar strings with deep learning models, reducing manpower costs, allowing solar arrays to be maintained more efficiently, improving power generation efficiency and stability, and reducing the potential hazards caused by faults.
The current–voltage (I-V) curve of a solar string is a graphical representation of the relationship between the current and voltage outputs under fixed irradiance and temperature conditions. The main parameters that can be read from the curve include the open-circuit voltage (Voc), short-circuit current (Isc), maximum power point voltage (Vmpp), maximum power point current (Impp), maximum power (Pmax), and fill factor (FF) [4,5,6].
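As an illustration of how these parameters relate to a sampled curve, the following minimal sketch reads them from a list of (V, I) points. The function name `iv_parameters` and the five-point curve are hypothetical, and the sketch assumes the samples are sorted by rising voltage, start at short circuit (V = 0), and end at open circuit (I = 0).

```python
def iv_parameters(voltage, current):
    """Read the main I-V parameters from sampled points.

    Assumes samples are sorted by rising voltage, the first sample is at
    V = 0 (short circuit), and the last sample is at I = 0 (open circuit).
    """
    isc = current[0]
    voc = voltage[-1]
    powers = [v * i for v, i in zip(voltage, current)]
    k = max(range(len(powers)), key=powers.__getitem__)  # index of maximum power
    pmax = powers[k]
    return {
        "Voc": voc,
        "Isc": isc,
        "Vmpp": voltage[k],
        "Impp": current[k],
        "Pmax": pmax,
        "FF": pmax / (voc * isc),  # fill factor = Pmax / (Voc * Isc)
    }


# Hypothetical five-point curve, for illustration only.
v = [0.0, 10.0, 20.0, 30.0, 40.0]
i = [10.0, 9.9, 9.5, 8.0, 0.0]
params = iv_parameters(v, i)
print(params)
```

For this toy curve the maximum power point lies at 30 V and 8 A, giving a fill factor of 0.6. A measured curve would normally be interpolated near both axes before Voc and Isc are read.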
Most current fault diagnostic methods based on the I-V curve take one of two approaches. The first relies on feature extraction, summarizing the curve as a set of parameters and then evaluating those parameters with statistical methods [7] or support vector machines [8] for fault classification. The second uses the entire I-V curve, with automatic feature extraction through principal component analysis [9], a two-dimensional convolutional neural network (2D-CNN) [10], or long short-term memory (LSTM) networks [11]. Unlike these methods, this study uses a multiple-input model that takes both the entire I-V curve and the feature values extracted from it as inputs, giving the model more information from the curve to aid fault classification. Most of the literature also notes that large quantities of current–voltage curve data for faults are difficult to obtain; this study addresses that issue through simulation.
Because it is difficult to collect a large quantity of fault current–voltage curves from actual field sites, this study uses MATLAB Simulink version 10.7 (R2023a) to simulate a large number of fault characteristic curves of solar strings as training data for the fault diagnostic model. The simulated data include the current–voltage curves, the irradiance, the rated short-circuit current and open-circuit voltage of the solar modules, and the number of modules in the string. The simulated data are preprocessed before being input into the multiple-input model for training. The resulting model can reduce the maintenance and personnel costs of large solar power plants, reduce hazards to personnel, and provide fault information more promptly so that faults can be addressed quickly, enhancing the stability, reliability, and safety of plant operations and improving the economic returns of future solar power plants.
In this paper, Section 2 uses MATLAB Simulink to model various states of solar modules, with the simulated curves serving as training data for the model. Section 3 introduces the data preprocessing methods employed in this study and explains why the training data must be preprocessed before being input into the model. Section 4 presents a model designed to use curve features and curve images simultaneously as input data, followed by an analysis of its performance on real field data.
2. Current–Voltage Curves
Among the mathematical models for solar modules, the single-diode model is the most widely used solar cell model [12]. From its equivalent circuit, the current–voltage characteristic equation of the solar cell can be obtained. The Bishop equation [13], Equation (1), which includes reverse bias characteristics, is as follows:
where I is the output current of the solar cell, V is the terminal voltage of the solar cell, q is the charge of an electron, n1 is a curve-fitting constant, k is the Boltzmann constant, T is the cell temperature, Rs is the series resistance of the cell, Rsh is the parallel resistance of the cell, a and n are curve-fitting coefficients, Ub is the reverse bias voltage of the cell, ID is the diode reverse saturation current, and Iph is the photocurrent of the solar cell.
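For reference, Bishop's model is commonly written in the form below, using the symbols just defined. This is a reconstruction from the standard single-diode equation with Bishop's reverse-bias multiplier on the shunt term and should be checked against [13]; in particular, the sign convention for Ub is assumed here.

```latex
\begin{equation}
I = I_{ph}
  - I_D\left[\exp\!\left(\frac{q\,(V + I R_s)}{n_1 k T}\right) - 1\right]
  - \frac{V + I R_s}{R_{sh}}
    \left[1 + a\left(1 - \frac{V + I R_s}{U_b}\right)^{-n}\right]
\end{equation}
```

The first two terms are the ordinary single-diode equation; the bracket multiplying the shunt current grows rapidly as V + I Rs approaches Ub, which is what reproduces the reverse-bias behavior of the cell.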
The formulas for ID, Equation (2), and Iph, Equation (3), are as follows:
where Ior is the saturation reverse current of the reference diode, Tr is the reference temperature, Ego is the bandgap energy, n2 is a curve-fitting constant, ISCR is the reference photocurrent, Ki is the temperature coefficient, and G is the irradiance.
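With the symbols defined above, the temperature dependence of the saturation current and the irradiance dependence of the photocurrent are commonly written as below. The cubic temperature term, the (T - Tr) factor, and the normalization of G by 1000 W/m² are assumptions taken from the widely used form of these expressions, not confirmed details of Equations (2) and (3).

```latex
\begin{align}
I_D    &= I_{or}\left(\frac{T}{T_r}\right)^{3}
          \exp\!\left[\frac{q\,E_{go}}{n_2 k}
          \left(\frac{1}{T_r} - \frac{1}{T}\right)\right] \\
I_{ph} &= \left[I_{SCR} + K_i\,(T - T_r)\right]\frac{G}{1000}
\end{align}
```

Under these assumed forms, ID equals Ior and Iph equals ISCR when T = Tr and G = 1000 W/m².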
Although MATLAB Simulink uses a two-diode model for simulation, setting the solar cell block to its “5-parameter” mode reduces it to the single-diode model shown in Figure 1, in which the second diode’s saturation current is zero and the parallel resistance is infinite. Only the short-circuit current Isc, the open-circuit voltage Voc, the irradiance Ir0 used for measurement, the quality factor N, and the series resistance Rs then need to be adjusted to simulate different I-V curve scenarios. Because the parallel connection of modules cannot be adjusted within the block in this mode, subsequent curve simulations model the series-parallel connections between modules and strings explicitly.
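To make the role of these parameters concrete, the sketch below evaluates a single-diode model with infinite parallel resistance in closed form. It is an illustration under stated assumptions and not the Simulink block itself: the function names are hypothetical, the photocurrent is approximated by Isc, Rs is treated as a per-cell value, and the quality factor of 1.5 and the Rs value of 5 mΩ are made up for the example.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def string_voltage(i, isc, voc, n_cells, quality=1.5, rs=0.0, temp_k=298.15):
    """Terminal voltage of n_cells identical series cells at current i.

    Single-diode model with infinite shunt resistance, solved for V:
        I = Iph - Is * (exp((Vcell + I*rs) / (N*Vt)) - 1)
    isc and voc are per-cell values; Iph is approximated by isc (assumption).
    """
    vt = K_B * temp_k / Q_E                          # thermal voltage
    i_sat = isc / math.expm1(voc / (quality * vt))   # chosen so that V(0) = voc
    v_cell = quality * vt * math.log((isc - i) / i_sat + 1.0) - i * rs
    return n_cells * v_cell


def max_power(isc, voc, n_cells, steps=2000, **kwargs):
    """Scan the current axis from 0 to isc and return the largest V*I found."""
    best = 0.0
    for k in range(steps + 1):
        i = isc * k / steps
        best = max(best, i * string_voltage(i, isc, voc, n_cells, **kwargs))
    return best


# 330 cells with Voc = 1.35 V per cell (values from the text), Isc = 10 A.
voc_string = string_voltage(0.0, isc=10.0, voc=1.35, n_cells=330)
p_low_rs = max_power(10.0, 1.35, 330, rs=0.0)
p_high_rs = max_power(10.0, 1.35, 330, rs=0.005)  # hypothetical per-cell Rs
print(voc_string, p_low_rs, p_high_rs)
```

In this sketch the open-circuit voltage depends only on the cell count and the per-cell Voc, while a larger Rs lowers the voltage at every nonzero current and therefore lowers the maximum power.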
Using MATLAB Simulink, seven states of the solar string current–voltage curve (normal, aging fault, shading fault, PID fault, short-circuit fault, hot spot fault, and crack fault) are simulated to serve as training data for the fault classification model. Each state is represented by 5000 simulated curves, for a total of 35,000 training samples.
2.1. Normal Condition
To simulate the current–voltage curves of normal string connections using solar modules of different specifications, the number of cells connected in series within each module ranges from 330 to 450, and each cell has an open-circuit voltage of 1.35 V. The resulting simulated current–voltage (I-V) curves exhibit open-circuit voltages ranging from 445.5 V to 607.5 V and short-circuit currents from 8 A to 12 A, so the model can be applied to sites with different specifications, as shown in Figure 2. The irradiance is varied from 600 W/m² to 1000 W/m² to mirror actual field conditions, as shown in Figure 3. The remaining fault conditions are simulated in the same way.
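The parameter ranges above can be sketched as a random sampler. The ranges come from the text; uniform sampling, the fixed seed, and the name `sample_normal_condition` are assumptions made for illustration, since the sampling scheme is not stated here.

```python
import random

CELL_VOC = 1.35  # open-circuit voltage per cell in volts (from the text)


def sample_normal_condition(rng):
    """Draw one normal-condition configuration (uniform sampling is assumed)."""
    n_cells = rng.randint(330, 450)                 # cells in series, inclusive
    return {
        "n_cells": n_cells,
        "voc": n_cells * CELL_VOC,                  # string open-circuit voltage, V
        "isc": rng.uniform(8.0, 12.0),              # short-circuit current, A
        "irradiance": rng.uniform(600.0, 1000.0),   # W/m^2
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
samples = [sample_normal_condition(rng) for _ in range(5000)]
print(min(s["voc"] for s in samples), max(s["voc"] for s in samples))
```

The open-circuit voltage limits follow directly from the cell count: 330 × 1.35 V = 445.5 V and 450 × 1.35 V = 607.5 V.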
2.2. Aging Faults
Aging faults in solar modules refer to the gradual decline in performance and efficiency over time, primarily due to prolonged exposure to sunlight, temperature variations, humidity, and environmental pollution. As the modules age, their energy conversion capacity weakens, leading to reduced output power. Aging faults may also result in increased internal resistance, higher leakage currents, and changes in transient and temperature characteristics [15].
Aging faults are simulated in two ways. The first changes the Rs parameter in the solar cell model so that all cells age. The second connects two solar cell modules in series, where one module has an additional resistor in series with some cells, to simulate partial cell aging. Together these allow string current–voltage curves with different aging faults to be simulated. The two types of aging fault exhibit distinct characteristics on the current–voltage (I-V) curve.
Each type of aging fault is simulated with 2500 instances. Total cell aging is simulated with a single solar module, as shown in Figure 4, where the parameter Rs is adjusted to achieve different levels of overall aging. The number of cells in series ranges from 330 to 450, with the series resistance Rs adjusted from 60 mΩ to 80 mΩ. Partial cell aging divides the cell string into two groups, as shown in Figure 5, one group being normal and the other experiencing aging faults with an additional resistor. The aged cells make up between 18% and 33% of the total, and the resistance connected in series is adjusted from 40 Ω to 60 Ω to simulate varying degrees of partial cell aging. On the I-V curve, total cell aging shows the maximum power point moving slightly towards the origin, while partial cell aging shows a change in slope near the open-circuit voltage, as illustrated in Figure 6.
2.3. Shading Faults
Shading faults in solar modules occur when shadows fall on part of the panel surface, preventing the shaded areas from effectively converting sunlight into electrical energy. This could be due to buildings, trees, or other obstacles obstructing sunlight at certain times or seasons. Shading faults reduce the overall performance of the solar panels, thereby decreasing the output power [16].
The simulation of shading faults is conducted by connecting three solar cell modules and providing a different level of irradiance to each module, as shown in Figure 7. The irradiance for the shaded parts ranges from 100 W/m² to 500 W/m². The simulation also varies the shaded area, with small portions covering 5% to 10% and larger areas covering up to 50%, to produce different degrees of shading fault, as shown in Figure 8.
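The stair-step shape produced by unequal irradiance can be sketched with three series-connected groups. This is a simplified illustration, not the Simulink model: it assumes each group has a bypass diode that clamps its reverse voltage, that the photocurrent scales linearly with irradiance, and that series resistance is negligible. The constants marked hypothetical are made up for the example.

```python
import math

VT = 0.02569      # thermal voltage near 25 degC, V
CELL_VOC = 1.35   # open-circuit voltage per cell, V (from the text)
ISC_REF = 10.0    # short-circuit current at 1000 W/m^2, A (hypothetical)
QUALITY = 1.5     # diode quality factor N (hypothetical)
V_BYPASS = 0.5    # bypass-diode forward drop, V (hypothetical)


def group_voltage(i, n_cells, irradiance):
    """Voltage of one group of series cells at string current i.

    When i exceeds the group's photocurrent, the assumed bypass diode
    conducts and the group sits at -V_BYPASS.
    """
    iph = ISC_REF * irradiance / 1000.0   # assumed linear in irradiance
    if i >= iph:
        return -V_BYPASS
    n_vt = n_cells * QUALITY * VT
    i_sat = ISC_REF / math.expm1(n_cells * CELL_VOC / n_vt)
    return n_vt * math.log((iph - i) / i_sat + 1.0)


def string_voltage(i, groups):
    """Sum the group voltages; groups is a list of (n_cells, irradiance)."""
    return sum(group_voltage(i, n, g) for n, g in groups)


# 330 cells in three groups; the 30-cell group (about 9%) is shaded to 300 W/m^2.
groups = [(150, 1000.0), (150, 1000.0), (30, 300.0)]
for current in (2.0, 5.0, 9.99):
    print(current, round(string_voltage(current, groups), 1))
```

Below the shaded group's photocurrent (3 A here) all three groups contribute voltage; above it the shaded group is bypassed, so the string voltage drops by roughly that group's share and the curve shows a step.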
2.4. PID (Potential Induced Degradation) Faults
In solar systems using transformerless inverters, there is no electrical isolation, and the negative pole of the solar module string does not need to be grounded. In grid-connected solar systems, solar modules are typically connected in series to form high-voltage outputs, and the module frames are grounded for safety reasons. This results in floating potentials; half of the modules are under positive bias and half under negative bias. The potential difference causes leakage currents to flow from the module frames to the solar cells. Under external voltage, sodium ions in the glass migrate and accumulate on or enter the surface of the cells, causing shunting and reducing their efficiency. PID faults occur after long-term operation of the module [17].
When a solar module exhibits a PID fault, characteristics such as decreased Rsh, increased Rs, and reduced open-circuit voltage (Voc) appear, and the current–voltage (I-V) curve shows the characteristics depicted in Figure 9. The fault is simulated by connecting a resistor and a diode in parallel with the solar cell module and adjusting the parallel resistance together with the internal Rs and Voc parameters of the module, as shown in Figure 10. The resulting I-V curve of a PID fault is shown in Figure 11.
2.5. Short-Circuit Faults
Short-circuit faults in solar modules occur when the current finds an unintended low-resistance path and flows directly without passing through the load. They can be caused by faults in the connecting wires or circuit components within the solar panel, or by external factors such as overheating or physical damage that bring two or more electrodes into contact. Because the current no longer follows its normal path, a short-circuit fault can cause system overheating, damage to cell components, or even severe consequences such as fire.
As shown in Figure 12, two solar cell modules are connected in series, each with a diode in parallel, and one module is short-circuited to simulate varying degrees of short-circuit fault, as shown in Figure 13.
2.6. Hot Spot Faults
Hot spot faults in solar modules primarily arise from mismatched power output among some solar cells within the module. Mismatched cells can become reverse-biased, changing from power-generating elements into loads that consume energy and generate significant heat. When a solar module exhibits a hot spot fault [19], its current–voltage (I-V) curve displays characteristics that vary with the distribution of the hot spots. Cells affected by hot spot faults exhibit lower short-circuit currents than normal cells, leading to power mismatches among them.
As shown in Figure 14, hot spot faults are the only fault type simulated with a parallel connection of modules. To simulate the distribution of hot spots in half-cut cells, the solar cell string is divided into three modules. Two of them are connected in series, one normal and the other experiencing a hot spot fault, and this series pair is connected in parallel with the third, normal module. A resistor is connected in parallel with the module experiencing the hot spot fault. By changing that module's internal parameter Isc and the value of the parallel resistor, a hot spot with a distorted curve is simulated, as shown in Figure 15. In the second configuration, shown in Figure 16, three solar cell modules are connected in series; the top module is normal, while the two below exhibit varying degrees of hot spot fault with different parallel resistor values, simulating a hot spot with a double-step curve, as shown in Figure 17.
2.7. Crack Faults
Solar modules may develop cell cracks during transportation from the factory to the installation site, during installation, and subsequently when exposed to repetitive weather conditions such as strong winds. Cracks in solar modules can lead to loss of connection between cells, decreased output power, insulation failure, non-compliance with safety standards, and potential safety hazards such as leakage currents.
When a solar cell module exhibits a crack fault [20,21], the current–voltage (I-V) curve displays stair-step features similar to those seen in shading faults, with the distinctive feature that the steps are convex. As shown in Figure 18, crack faults are simulated by connecting two solar cell modules in series, each with a diode in parallel, providing different levels of irradiance to produce stair-step curves, and adjusting the internal parameter N of the module with the crack fault so that the steps show the convex characteristic seen in Figure 19.
4. Solar Array Fault Classification Model
Convolutional Neural Networks (CNN) are a type of deep learning model primarily used for tasks involving data with a grid structure, such as image recognition and processing. They extract features from input data through convolutional and pooling layers and perform tasks like classification or regression through fully connected layers. CNNs are known for their ability to handle high-dimensional image data and generalize well across different settings. Therefore, in this study, CNNs are employed to extract features from the current–voltage (I-V) curve images of solar strings.
Deep Neural Networks (DNN) are a type of artificial neural network (ANN) with a deep architecture consisting of multiple hidden layers. DNNs learn and analyze input data in-depth, typically used for processing both structured and unstructured data. Through the backpropagation algorithm, DNNs can automatically learn and extract high-level features from input data, facilitating various tasks such as classification, regression, and generation. Since a 2D-CNN alone cannot simultaneously input images and feature values derived from the curves, a DNN is chosen to process the computed feature values.
Multi-input models differ from standard CNN models in that they can handle several types of input data, including numeric data, images, and categorical data, and learn from all of them to produce predictions. To allow images and feature values to be input simultaneously for training, the multi-input model integrates a CNN with a DNN. The architecture of the model is shown in Figure 22. ReLU is used as the activation function for all layers except the output layer, which uses Softmax. The current–voltage curve image serves as the input to the CNN (Input1) and passes through three convolutional layers and three pooling layers that extract features from the curve image. The convolutional layers employ 32, 64, and 128 filters of size 3 × 3, respectively. Max pooling with a 2 × 2 kernel is used to reduce the number of parameters, help prevent overfitting, and retain important features. A flatten layer then transforms the multidimensional output into one dimension, followed by a fully connected layer with 128 neurons. A dropout layer discards 20% of the neurons, and the output is passed through another fully connected layer with eight neurons.
The 10 features mentioned in Section 3.1 are used as the input to the DNN (Input2) and pass through a fully connected layer with 32 neurons. A dropout layer discards 20% of the neurons, followed by another fully connected layer with eight neurons. The outputs of the two branches are combined into 16 neurons by a concatenation layer and then processed by a fully connected layer with 16 neurons. Finally, the output layer uses a Softmax activation function to generate predictions for the seven categories. The overall structure of the study is depicted in Figure 23: the required data are preprocessed and input into the model, which then outputs the status of the solar strings.
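The layer sizes described above can be checked with a short bookkeeping sketch that traces tensor sizes and counts trainable parameters for both branches. It is not a trainable model. The input image size of 64 × 64 with one channel and the use of "same" padding in the convolutions are assumptions, since neither is stated in this section; the function names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return (kernel * kernel * c_in + 1) * c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out


def trace_model(img_size=64, channels=1, n_features=10, n_classes=7):
    """Return (flattened CNN size, concatenated size, per-layer parameter list)."""
    layers = []
    size, c_in = img_size, channels
    for filters in (32, 64, 128):
        layers.append((f"conv3x3-{filters}", conv_params(3, c_in, filters)))
        size //= 2          # 2 x 2 max pooling halves each spatial dimension
        c_in = filters
    flat = size * size * c_in
    layers.append(("cnn-dense-128", dense_params(flat, 128)))
    layers.append(("cnn-dense-8", dense_params(128, 8)))     # after 20% dropout
    layers.append(("dnn-dense-32", dense_params(n_features, 32)))
    layers.append(("dnn-dense-8", dense_params(32, 8)))      # after 20% dropout
    merged = 8 + 8          # concatenation of the two eight-neuron outputs
    layers.append(("merged-dense-16", dense_params(merged, 16)))
    layers.append(("softmax-output", dense_params(16, n_classes)))
    return flat, merged, layers


flat, merged, layers = trace_model()
total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:16s} {count:>9d}")
print("total", total_params)
```

Under the assumed input size, almost all parameters sit in the first fully connected layer after the flatten step, which is where a different image resolution would change the model size most. Dropout and pooling layers add no trainable parameters.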
4.1. Model Training
The 35,000 preprocessed samples were split into 80% for the training set and 20% for the testing set. Figure 24 shows the accuracy and loss curves over 300 epochs, with the model achieving a final training accuracy of 99.9%. During the first 15 epochs, the accuracy and loss change rapidly and the model converges quickly; after that, they gradually stabilize. Based on the results in Figure 24, the number of epochs is set to 100.
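An 80/20 split of this dataset can be sketched as follows. The text does not say whether the split was stratified; the per-class split, the fixed seed, and the function name `stratified_split` are assumptions made so that each of the seven states keeps the same share in both sets.

```python
import random

STATES = ["no", "ag", "ps", "PID", "sc", "hs", "ck"]


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices per class so every class keeps the same ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# 5000 simulated curves per state, 35,000 in total (counts from the text).
labels = [state for state in STATES for _ in range(5000)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With 5000 curves per state, this yields 28,000 training samples and 7000 testing samples, with 1000 test samples per state.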
To visually assess the classification results of the trained model, a confusion matrix is used to display the relationship between predicted results and actual labels. The results for the test set are shown in Figure 25, where the numbers indicate the count of data points for each predicted label and the diagonal represents correct predictions for each category. The labels are ‘no’ for normal, ‘ag’ for aging faults, ‘ps’ for shading faults, ‘PID’ for PID faults, ‘sc’ for short-circuit faults, ‘hs’ for hot spot faults, and ‘ck’ for crack faults. The model predicted all test data correctly except for one instance where a hot spot fault was misclassified as normal. These results demonstrate the high recognition accuracy of the method.
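A confusion matrix of this kind can be built in a few lines. The sketch below reproduces the single reported error (one hot spot curve predicted as normal) on a test set of 7000 samples; the assumption of exactly 1000 test samples per class is illustrative and is not taken from Figure 25.

```python
STATES = ["no", "ag", "ps", "PID", "sc", "hs", "ck"]


def confusion_matrix(actual, predicted, labels=STATES):
    """Rows are actual classes, columns are predicted classes."""
    index = {name: k for k, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Assumed balanced test set: 1000 samples per state.
actual = [state for state in STATES for _ in range(1000)]
predicted = list(actual)
predicted[actual.index("hs")] = "no"   # one hot spot curve predicted as normal

matrix = confusion_matrix(actual, predicted)
test_accuracy = accuracy(matrix)
print(matrix[STATES.index("hs")], test_accuracy)
```

One error in 7000 samples corresponds to a test accuracy of 6999/7000, or about 99.99%.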
4.2. Model Validation
In a real solar field, current–voltage curves were obtained from a string of 11 solar modules connected through an inverter, with the modules having an open-circuit voltage of 40.38 V and a short-circuit current of 10.85 A. The real solar field data comprised 23 entries, including 6 with minor crack faults, 14 with minor hot spot faults, 2 with small-area shading faults, and 1 with a large-area shading fault, all collected under an irradiance of 1000 W/m²; the three types of faults are shown in Figure 26.
After preprocessing, the real field data were input into the trained fault classification model. The results are displayed in Figure 27. The model correctly identified all shading and hot spot faults. However, it correctly identified only one of the crack faults and misclassified the remaining five as hot spot faults, so that 18 of the 23 curves were classified correctly, a prediction accuracy of 78%. The misclassification can be attributed to the fact that, under real-world conditions, minor hot spot and crack faults differ only subtly in their current–voltage curves.
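The reported accuracy can be checked from the counts given in the text. The sketch below also computes per-class recall, which is not reported in the text but follows directly from the same counts; the names `FIELD_RESULTS`, `recall_per_class`, and `overall_accuracy` are hypothetical.

```python
# Field results from the text: actual class -> {predicted class: count}.
FIELD_RESULTS = {
    "ck": {"ck": 1, "hs": 5},   # 6 crack curves, 5 predicted as hot spot
    "hs": {"hs": 14},           # 14 hot spot curves, all correct
    "ps": {"ps": 3},            # 2 small-area + 1 large-area shading, all correct
}


def recall_per_class(results):
    """Fraction of each actual class that was predicted correctly."""
    return {
        actual: preds.get(actual, 0) / sum(preds.values())
        for actual, preds in results.items()
    }


def overall_accuracy(results):
    """Correct predictions divided by all field curves."""
    correct = sum(preds.get(actual, 0) for actual, preds in results.items())
    total = sum(sum(preds.values()) for preds in results.values())
    return correct / total


recall = recall_per_class(FIELD_RESULTS)
field_accuracy = overall_accuracy(FIELD_RESULTS)
print(recall, round(field_accuracy * 100))
```

The overall figure of 18/23 hides a strong imbalance: recall is 100% for hot spot and shading faults but only 1/6 for crack faults, so the error is concentrated in a single class.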
5. Conclusions
This study’s solar array fault diagnostic system utilizes a multi-input model architecture combining Convolutional Neural Networks (CNN) and fully connected layers, enabling simultaneous training with images and feature values. Using MATLAB Simulink, normal and six types of fault current–voltage (I-V) curves for solar strings were simulated. Ten feature values were calculated from the original simulated data, and Gaussian noise was added to the data before they were converted into curve images for training the multi-input model.
In the validation with actual field data, the model correctly identified shading and hot spot faults but misclassified most crack faults as hot spot faults. A possible improvement is to calculate the slope at the steps in the I-V curves of hot spot and crack faults as a new feature, which may help the model distinguish these two faults. The method requires only the current–voltage curves of solar strings to identify faults quickly. Although hot spot and shading faults exhibit similar step features, the model can distinguish between them. The field validation currently covers only shading, hot spot, and crack faults; to confirm the model’s accuracy in real-world applications, future work should validate it with field data from the other fault types and train it with new fault categories.