1. Introduction
The amount of solar radiation that reaches the photovoltaic cells is one of the factors that determine the efficiency of photovoltaic systems. The amount of solar radiation reaching Earth’s surface depends on the angle of incidence of the Sun’s rays, which varies with the time of day, latitude, and season. The terrain, including the altitude, slope, and orientation of the surface, affects local insolation and the intensity of radiation. Atmospheric conditions, such as cloud cover and fog, affect the scattering and absorption of solar radiation in the atmosphere. The transparency of the atmosphere, determined mainly by the concentration of water vapor, carbon dioxide, and other gases, also shapes the amount of energy reaching the Earth’s surface. Reflective surfaces, such as snow, water, and deserts, can further modify the local intensity of radiation through reflection or scattering. Air pollutants (dust and aerosols) from volcanic eruptions, fires, and human activities can reduce the intensity of radiation through increased absorption and scattering of light. The performance of photovoltaic (PV) systems is highly sensitive to environmental variables. Irradiance remains the most critical factor, directly affecting the efficiency of energy conversion. Its variability is driven by atmospheric and astronomical conditions [
1,
2,
3].
The main pollutants that limit the performance of photovoltaic systems are air pollutants, including dust, chemicals, and organic materials. Airborne dust, such as particulate matter (PM2.5 and PM10), forms a layer on the surface of PV panels that blocks light from reaching the cells, resulting in shading and a reduced power output. Chemical contaminants, such as sulfuric and nitric acids from acid rain, can cause corrosion and degradation of materials, shortening the life of the panels. Organic materials, such as leaves, resin, and insects, further reduce light transmission through the surface of the panel [
4,
5]. Air pollution, particularly in the form of particulate matter, can significantly reduce photovoltaic efficiency by scattering and absorbing sunlight, as well as physically accumulating on the panel surfaces [
6,
7,
8]. Studies have shown that dust and aerosols can cause efficiency losses of up to 16%, depending on the pollutant concentration and cleaning frequency [
9,
10,
11,
12,
13]. These losses are compounded in regions with frequent dust storms or urban pollution, underscoring the need for accurate modeling and maintenance planning [
14,
15]. Temperature effects are also well documented, with the PV output decreasing by approximately 0.3 to 0.5% per degree Celsius above the nominal operating temperature. Although moderate wind can cool panels and partially compensate for temperature losses, extreme wind events can cause mechanical damage or increased dust deposition [
11].
Integrating evolutionary algorithms to predict the impact of particulate matter on power generation in PV systems improves the accuracy of the models. The algorithms allow for more accurate predictions to optimize photovoltaic systems in polluted environments, which can improve energy efficiency. In fact, air pollutants have a long-term impact on the power generated by photovoltaic panels, as evidenced by the study [
13] that showed a stable impact of pollutants throughout the year at the analyzed site. Recent research has begun to incorporate advanced computational techniques to better predict PV performance under such variable conditions. Hybrid neural networks and evolutionary algorithms have proven particularly effective as they offer improved prediction accuracy for both photovoltaic output and pollution-related energy loss [
16,
17,
18,
19]. As shown in [
20], statistical tools can help photovoltaic systems adapt to dust storms and environmental variability, while advanced artificial intelligence models, such as hybrid neural networks and evolutionary algorithms, have been successfully used to predict power generation and improve system efficiency, particularly in highly polluted regions [
15]. These methods outperform traditional statistical models and provide new tools to optimize system operation in real time, especially in highly polluted or rapidly changing environments [
20,
21,
22,
23].
To model these complex dependencies more accurately, researchers are rapidly adopting machine learning (ML) techniques for PV performance prediction, system optimization, and monitoring. Several recent studies have benchmarked the performance of ML models, such as Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LGBM), and Multi-Layer Perceptron (MLP), demonstrating their superiority over classical regression methods, particularly in forecasting power output based on multiple weather characteristics [
24,
25]. Beyond classical ML, deep learning models, including Deep Neural Networks (DNNs) and Long Short-Term Memory networks (LSTMs), are increasingly being applied due to their ability to model temporal dynamics and nonlinear relationships in high-resolution photovoltaic datasets. Recent works showed that DNNs not only drastically reduced the computation time but also maintained strong prediction accuracy [
26,
27]. Similarly, LSTM-based multitask models have demonstrated superior forecast performance on hourly, daily, and weekly time horizons [
28]. Evolutionary algorithms represent a complementary strategy that focuses on model structure optimization and weight tuning, especially in complex, nonconvex search spaces. Their recent use in photovoltaic modeling has included feature selection, Maximum Power Point Tracking (MPPT), and hybrid system optimization, offering competitive performance in terms of both accuracy and interpretability [
20,
21,
22,
23,
29]. Explainable artificial intelligence (AI) is also emerging, with novel techniques that combine neural networks and linear regression to balance performance with transparency in model interpretation [
27,
30].
Despite these advances, several research gaps remain. First, many models are trained on synthetic or single-source datasets, which limits their robustness in the real world. Second, environmental conditions are still insufficiently integrated into most ML frameworks, despite their well-established effect on photovoltaic efficiency. Lastly, optimization and forecasting are often addressed in isolation, whereas real-world applications demand simultaneous adaptation, forecasting, and control.
Recent studies have shown that statistical tools can support photovoltaic systems in adapting to environmental variability, including dust storms [
16], while advanced AI methods, such as hybrid neural networks and evolutionary algorithms, have improved the prediction of PV output and system efficiency [
15]. Based on this foundation, the present study proposes a hybrid approach that combines real-world sensor data with correlation analysis and optimization using evolutionary algorithms to evaluate the relative influence of irradiance, temperature, wind, and particulate matter on PV output. Unlike previous works that focused primarily on prediction accuracy, this study emphasizes interpretability and operational relevance, integrating environmental data with a modular SCADA system and using evolutionary techniques for model calibration. This approach provides a robust basis for real-time forecasting and decision support in photovoltaic system management while also laying the groundwork for future integration with advanced machine learning models such as LSTM and XGBoost, enabling adaptive control under dynamic environmental conditions.
2. Materials and Methods
The study of the photovoltaic microgrid focuses on analyzing the effects of selected environmental factors on the performance of photovoltaic panels. Data from the monitored laboratory bench and meteorological stations were used, which included information on irradiance, temperature, wind speed, and pollutant concentration. The use of correlation made it possible to identify relationships between the variables studied. The use of an evolutionary algorithm made it possible to quantify the impact of environmental factors on the power obtained and, consequently, energy production [
31].
Compared to other works, the study stands out primarily for its comprehensive treatment of the topic: it combines real, multifaceted environmental measurements (including not only solar radiation but also temperature, wind speed, and pollution levels) with advanced statistical analysis and the use of evolutionary algorithms. This allows for the precise determination of the importance of individual factors and their interactions for the efficiency of photovoltaic systems, a question that the literature often treats piecemeal or only under laboratory conditions. An additional value is the focus on real operating conditions and the practical aspects of energy production management, so that the results can be applied directly to the design and optimization of photovoltaic systems.
The flowchart shown in
Figure 1 presents the methodological workflow used in this study, beginning with data collection and concluding with model validation.
The first phase involved the deployment of a photovoltaic test system equipped with integrated environmental monitoring devices. Solar irradiance, ambient temperature, wind speed, particulate matter concentration, and photovoltaic power output were collected under real outdoor conditions. In the subsequent data preprocessing stage, the raw data were cleaned, normalized, and synchronized to ensure consistent time intervals. Outliers and missing values were addressed to improve data quality and reliability. Correlation analysis was then performed using Pearson and Spearman coefficients to explore relationships between environmental variables and photovoltaic output, providing initial information on their relative dependencies. A linear model was formulated to represent the photovoltaic power output as a weighted sum of normalized environmental parameters, excluding a constant term to focus exclusively on input-driven variations. To determine the optimal weights, an evolutionary algorithm was employed, functioning as a nature-inspired, population-based optimization technique. This algorithm simulated natural selection processes, using selection, crossover, and mutation operators to iteratively minimize the error between the predicted and actual photovoltaic panel output. The resulting model was evaluated to quantify the influence of each environmental parameter, with irradiance identified as the most significant factor, followed by temperature, air pollution, and wind speed. Finally, to assess the stability of the model, the optimization procedure was repeated in multiple runs. The consistency of the results confirmed the precision of the applied method. To verify the prediction performance further, validation using independent datasets was proposed.
2.1. Instrumentation and Experimental Facility
The object of the study is a photovoltaic microinstallation with a nominal electrical output of 1200 W presented in
Figure 2a.
Figure 2b shows the SCADA interface used to monitor the photovoltaic microinstallation located on the roof of a five-story building.
The photovoltaic microinstallation under study consists of four ML-S6MF/T1-300-992/1639 panels, manufactured by ML System S.A. (Zaczernie, Poland), which use 60 monocrystalline cells with dimensions of 156 × 156 mm and a dark blue color. Two of the panels were mounted at a 35° angle, while the other two were placed at a 15° angle, all facing south (0° S). Each cell is equipped with five busbars to increase current conduction, and its front is covered with tempered glass to protect against mechanical damage and adverse weather conditions. The back of the panel is finished with a multilayer film to protect it from moisture, dirt, and UV radiation, as well as to further protect the panel structure from mechanical damage. The entire structure is reinforced by an aluminum frame, while MC-4-type connectors and an IP65-rated junction box facilitate installation and ensure the durability of the connection. A single module has dimensions of about 1 × 1.6 m and a weight of about 18 kg. Under standard test conditions (STCs), with solar radiation of 1000 W/m², a temperature of 25 °C, and an AM value of 1.5, the panel reaches a nominal power of 300 Wp and has an efficiency of 18.44%. The temperature coefficients are +0.05%/°C for current, −0.33%/°C for voltage, and −0.42%/°C for power, which clearly shows the effect of changing the cell’s temperature on individual electrical parameters. During normal operation, modules undergo gradual degradation caused by exposure to solar radiation, large temperature fluctuations, and other atmospheric factors. Initially, a more dynamic decrease in power is observed, which then slows down. Long-term tests show that, over 25 years, the module power can decrease from about 300 W to 250 W per panel, a decline of about 16%, as shown in
Figure 3. The reduction in power directly translates into a reduction in the amount of electricity generated, so long-term projections of plant performance should take this degradation effect into account. However, the panels retain much of their original efficiency and can effectively produce power for 25–30 years of use.
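The datasheet figures quoted above can be turned into a rough power estimate. The sketch below is only illustrative: it assumes a linear temperature model referenced to the STC temperature of 25 °C and a linear scaling with irradiance, neither of which is stated in the text, and the function names are hypothetical.

```python
# Datasheet values quoted in the text; the linear model itself is an assumption.
P_STC_W = 300.0     # nominal power at STC (Wp)
GAMMA_P = -0.0042   # power temperature coefficient per °C (-0.42 %/°C)
T_STC_C = 25.0      # STC cell temperature (°C)
G_STC = 1000.0      # STC irradiance (W/m^2)


def power_at_temperature(cell_temp_c, irradiance_w_m2=G_STC):
    """Linear estimate of module power for a given cell temperature and irradiance."""
    irradiance_factor = irradiance_w_m2 / G_STC
    temperature_factor = 1.0 + GAMMA_P * (cell_temp_c - T_STC_C)
    return P_STC_W * irradiance_factor * temperature_factor


def relative_loss(initial_w, final_w):
    """Fractional power loss between two power levels."""
    return (initial_w - final_w) / initial_w


print(round(power_at_temperature(45.0), 1))          # module at 45 °C, full irradiance
print(round(100 * relative_loss(300.0, 250.0), 1))   # 25-year degradation in percent
```

Under these assumptions, a cell temperature 20 °C above STC costs about 8.4% of the nominal power, and a drop from 300 W to 250 W corresponds to a loss of about 16.7%, in line with the roughly 16% quoted above.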
Data from two meteorological stations located near the photovoltaic modules were used for analysis. The stations provided information at 30 s intervals on total irradiance; irradiance from the E, W, and S directions; precipitation; temperature; wind speed and direction; sun height; and concentrations of particulate matter pollutants. The electrical parameters of the photovoltaic panels under testing were measured using a calculation system that recorded current, voltage, power, and energy sent to the grid. The computing system provided average, maximum, and minimum values for 30 s updates. A general guideline for the output power (W) of a panel with a maximum rated output of 300 W is shown in
Table 1.
Actual values for illuminance (the amount of light that reaches the panel), incident solar power (the amount of solar radiation hitting the surface of the panel), and actual output power may vary depending on location, weather conditions, and the panel’s angle relative to the Sun. At midday under clear skies, the panel may reach its maximum output of 300 W, whereas, in less optimal conditions, such as cloudy weather, the output power will be lower. The analysis included data on air pollution, output power, illuminance, temperature, and wind speed. During the dust tests, the layer deposited on the photovoltaic panels steadily increased, leading to a reduction in the amount of power produced by the panels. Leaving the dirt in place, without maintenance, was a deliberate measure to show the operation of the plant under conditions of increasing environmental pollution. The study estimated the average energy loss due to contamination of the photovoltaic panel surface to be 5% per year. This estimate was obtained by comparing the power generated by photovoltaic panels in two states: cleaned and untreated. All maintenance, or lack thereof, was aimed at replicating the actual operating conditions of the plant as closely as possible.
As part of a research project focused on the real-world operation of photovoltaic installations, a comprehensive environmental monitoring system was implemented based on the Modbus Remote Terminal Unit (RTU) communication protocol. The system is fully integrated with the SCADA ML System platform, designed to monitor the performance of photovoltaic systems and conduct both energy and environmental analyses.
This monitoring solution combines meteorological data with electrical parameters obtained from PV panel diagnostics. The measurement device is the Elsner P03/3-Modbus-GPS weather station, manufactured by Elsner Elektronik GmbH (Ostelsheim, Germany), mounted on the roof of the building. It provides continuous monitoring of atmospheric parameters, including:
solar illuminance [Lx];
ambient temperature [°C];
wind speed [m/s];
rainfall presence (binary detection);
solar elevation angle (in degrees);
GPS-based location and UTC time synchronization.
Weather data are collected every 30 s and visualized in real time within the SCADA system. The platform enables the following:
mapping sensor readings to Modbus registers (ambient temperature, wind speed, rainfall, and solar irradiance);
configuring real-time dashboards that display current sensor data;
generating time-based plots for selected days, weeks, or months;
archiving the data for comparative and analytical purposes.
Sensor data are transmitted over an RS485 bus using the Modbus RTU protocol. To protect against voltage surges, a dedicated Weidmüller VSSC6 RS485 surge protector, manufactured by Weidmüller (Detmold, Germany), was installed. The entire communication line is built with a shielded Li2YCY (TP) 2 × 2 × 0.5 mm² cable, which is compliant with industrial data transmission standards. The cable terminates in the photovoltaic distribution board, which is equipped with a 12 V DC buffer power supply (IQ7.1), ensuring uninterrupted operation of the weather station during temporary grid outages.
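To make the Modbus RTU framing concrete, the following sketch builds a register read request with its CRC-16 checksum. It is not the SCADA system's code: the slave address, register address, and the use of function code 0x03 (Read Holding Registers) are assumptions made for illustration, and the actual register map of the weather station is not given in the text.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_registers_request(slave_id: int, start_addr: int, count: int,
                           function_code: int = 0x03) -> bytes:
    """Build an RTU request frame: address, function, start, count, CRC."""
    # Address and count are big-endian; the CRC is appended low byte first.
    body = struct.pack(">BBHH", slave_id, function_code, start_addr, count)
    return body + struct.pack("<H", crc16_modbus(body))


# Hypothetical example: slave 1, one register starting at address 0.
frame = read_registers_request(slave_id=1, start_addr=0, count=1)
print(frame.hex())
```

The receiver recomputes the CRC over the first six bytes and discards the frame if it does not match the last two, which is what protects the readings against corruption on the RS485 line.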
To determine the concentration of particulate matter, data were sourced from another monitoring station equipped with the Met One Instruments BAM-1020, manufactured by Beckman Coulter Life Sciences (Indianapolis, IN, USA). This device utilizes the beta attenuation method, employing a small carbon-14 source to deliver highly accurate, near-real-time particulate readings, independent of particle type or chemical composition.
The SCADA system was integrated with four photovoltaic modules (both framed and glass–glass types), enabling real-time correlation between environmental data and electrical parameters such as voltage, current, power, and energy both fed into and drawn from the grid. Additionally, the system accounts for environmental impact metrics, including avoided CO2 emissions, the equivalent distance driven by an electric vehicle, and the estimated number of trees required to offset the same amount of emissions.
All measurement devices and infrastructure components, such as wiring, surge protection, buffered power supplies, the Enphase IQ7+ photovoltaic microinverter, the Enphase Envoy Gateway, and the P30H DC parameter transducer with Ethernet, manufactured by DITEL (Barcelona, Spain), as shown in
Figure 4a,b, are fully compliant with the SCADA system’s communication and logical requirements. The installation was carried out within a dedicated low-voltage electrical distribution panel specifically designed for photovoltaic systems and SCADA-based monitoring applications. Internal wiring clearly separates the communication and power circuits, ensuring proper signal routing, as shown in
Figure 4c. Particular attention was paid to the routing and shielding of the RS485 signal cables, according to the electromagnetic compatibility (EMC) guidelines. This is critical, as the system includes numerous devices that transmit analog and digital signals and are either sources of or susceptible to electromagnetic interference. Based on graphical data analysis and system performance, stable data acquisition, high-quality sensor signals, and full SCADA visualization functionality were confirmed, as shown in
Figure 4d.
The multiresolution data acquisition structure supports both high-frequency temporal analysis and long-term environmental impact assessment. The environmental monitoring and photovoltaic system acquires data at the following intervals:
Weather station: Data are sampled every 30 s, including parameters such as solar irradiance, ambient temperature, wind speed, rainfall presence, and solar elevation angle.
Photovoltaic system parameters: Electrical variables such as voltage, current, and power output are acquired at 30 s intervals and synchronized with environmental measurements to enable real-time correlation and analysis.
Particulate matter concentration: The PM2.5 and PM10 levels are reported as 1 h averaged values, based on continuous measurements using the beta attenuation method.
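Because the electrical and weather data arrive every 30 s while particulate matter is reported hourly, the series have to be brought to a common time base before they can be compared. The sketch below shows one way to do this by averaging the 30 s samples per hour; the averaging choice and the function names are assumptions, since the text does not describe the exact aggregation used.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(samples):
    """Average (timestamp, value) samples per clock hour, skipping missing values."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        if value is None:  # missing reading
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)
        sums[hour][0] += value
        sums[hour][1] += 1
    return {hour: total / n for hour, (total, n) in sorted(sums.items())}


def align(series_a, series_b):
    """Keep only the hours present in both hourly series."""
    common = sorted(set(series_a) & set(series_b))
    return [(hour, series_a[hour], series_b[hour]) for hour in common]


# Hypothetical 30 s power samples (W) and hourly PM10 values (µg/m^3).
power_30s = [
    (datetime(2023, 1, 1, 10, 0, 0), 100.0),
    (datetime(2023, 1, 1, 10, 0, 30), 200.0),
    (datetime(2023, 1, 1, 11, 0, 0), 50.0),
    (datetime(2023, 1, 1, 11, 0, 30), None),
]
pm10_hourly = {datetime(2023, 1, 1, 10): 32.0}

print(align(hourly_means(power_30s), pm10_hourly))
```

Hourly aggregation over a full year yields 8760 points per series, which matches the number of measurement points used later in the fitness function.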
The presented system offers an effective solution to monitor environmental conditions affecting the performance of a photovoltaic system. Its use of an open communication protocol and modular architecture allows for easy expansion with additional sensors, enhanced data analysis, and the integration of weather data into predictive and control algorithms.
2.2. Data Processing
The data collection period was from 1 January to 31 December 2023. The first stage of data processing was the creation of a database containing records from sensors that monitor environmental parameters and instantaneous power. This allowed the creation of a complete database of the operating parameters of the photovoltaic microinstallation under actual atmospheric conditions.
Data analysis was first carried out using two methods to assess the relationship between variables: Pearson’s correlation and Spearman’s correlation. The purpose of the analysis was to perform a correlation assessment to support data preparation and simplification, thereby improving the efficiency of evolutionary algorithms. The correlation analysis was to show which parameters would affect performance so that, based on the selected parameters, the computation time would be minimized. Then, using scripts developed in the Matlab/Simulink R2024b environment, the effect of the environmental parameters on the photovoltaic system’s energy generation was evaluated. The analyses used evolutionary algorithms to predict the impact of the environmental factors under study. Selection, reproduction, and mutation mechanisms were used to gradually generate better solutions to optimization problems. The first step was to determine the parameters of the algorithm, such as the number of crossovers, the number of mutations, the size of the population, and the number of generations, so that the algorithm could operate efficiently. The process began with the creation of an initial population of potential solutions (individual units), which contained information that was evaluated according to a set criterion. At each stage of the algorithm, called generation, solutions considered the weakest were eliminated. The selection did not introduce new elements into the population but selected those that went on to the next stages: mutation and recombination, where the crossover operator was applied. The consequence of these steps was to find the areas closest to the optimum in the search space. 
The newly obtained solutions formed the population for the next generation, and the algorithm continued until a set number of generations was reached. This made it possible to obtain a satisfactory solution and to estimate the relationship between environmental conditions and the solar energy converted to electricity by the photovoltaic installation. The resulting power output depended on different proportions of solar irradiance, air pollution concentration, ambient temperature, and wind speed.
2.3. Calculation of Power Generated by Photovoltaic Installation
Data analysis began with Pearson and Spearman correlations, which differ in how they assess relationships and suit different data types. Pearson’s correlation coefficient, denoted as r, measures the linear relationship between two variables. Its value is in the range [−1, 1], where:
r = 1 indicates a perfect positive correlation;
r = −1 indicates a perfect negative correlation;
r = 0 means no linear relationship.
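The formula for Pearson’s correlation coefficient for two variables, X and Y, is as follows:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where:
$x_i$, $y_i$: the values of the X and Y variables in the i-th measurement;
$\bar{x}$, $\bar{y}$: the average values of the X and Y variables;
$n$: number of observations.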
This formula yields a coefficient that indicates the strength of the linear dependence between variables X and Y, taking into account their deviations from the averages.
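The Spearman correlation coefficient ρ, a measure of the monotonic relationship between two variables, is used when the data are not normally distributed or when they can only be ordered. The formula for the Spearman correlation coefficient is as follows:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}$$
where:
$d_i$: the difference between the ranks of the X and Y values in the i-th measurement;
$n$: number of observations.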
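The difference between the two coefficients is easiest to see on data that are monotonic but not linear. The sketch below is a plain-Python illustration, not the Matlab scripts used in the study; it computes Spearman's coefficient as Pearson's coefficient applied to average ranks, which also handles ties, and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical normalized data: output grows monotonically but not linearly.
irradiance = [0.0, 0.2, 0.5, 0.8, 1.0]
power = [0.0, 0.04, 0.25, 0.64, 1.0]

print(round(pearson(irradiance, power), 3))
print(round(spearman(irradiance, power), 3))
```

For these values the Spearman coefficient equals 1 because the ordering is preserved exactly, while the Pearson coefficient stays below 1 because the relationship is curved.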
To determine how strongly each environmental factor influences the performance of a photovoltaic system, we used an evolutionary algorithm, a class of optimization techniques inspired by the principles of natural selection and evolution. In this approach, each possible solution to the problem, representing a unique combination of weights assigned to solar irradiance, air temperature, wind speed, and air pollution, is treated as an individual in a population. Initially, a population of random solutions is generated. Each individual is then evaluated on how well its combination of weights reproduces the actual photovoltaic power output. The process mimics biological evolution:
crossover combines traits of two individuals to form new ones;
mutation introduces small random changes to explore new possibilities;
selection favors individuals with better performance (that is, those whose output most closely matches the measured data).
This iterative process continues for many generations, gradually improving the quality of the solutions. The fitness function measures the error between the predicted and actual power; lower values indicate better solutions. In our calculation the goal was to minimize the fitness function and thereby find the best combination of weights. Once the process converged, meaning that subsequent generations did not produce significant improvement, we interpreted the final weights as the relative influence of each environmental factor on the performance of the photovoltaic system. The final weights were converted into percentages to make these results easier to interpret. This modeling approach not only provided quantitative insights but also proved to be a repeatable method for analyzing complex environmental interactions with photovoltaic systems. The best performance was achieved using a configuration of 200 individuals, 200 generations and 15 mutations and 15 crossovers per generation, which offered a good balance between accuracy and computational efficiency.
An evolutionary algorithm was then used to find the best solution in the set of possible solutions to the problem [
13]. The algorithm developed to determine the required coefficients is shown in
Figure 5.
The diagram shows a schematic of how this algorithm works where a single solution, called an individual, represents a potential solution to a problem with an assigned evaluation function. The algorithm process is iterative and includes the following steps:
1. an initial population is created, i.e., a set of random solutions to the problem;
2. the quality of the solutions is evaluated, with each solution given an evaluation value;
3. evolutionary operators (mutation and crossover) are applied, which change the population structure;
4. the best solutions are selected based on the evaluation value;
5. steps 3–4 are repeated until the termination condition is met, i.e., finding a solution of satisfactory quality.
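The steps listed above can be sketched as a compact minimization loop. This is an illustrative sketch rather than the algorithm used in the study: the Gaussian mutation, the uniform crossover, the retention of the best individual, and the fixed generation count as the termination condition are assumptions, and the demo target weights are hypothetical. The tournament size of three and the counts of 15 mutations and 15 crossovers per generation follow the description in the text.

```python
import random


def evolve(fitness, n_genes=4, pop_size=200, generations=200,
           n_mutations=15, n_crossovers=15, seed=0):
    """Minimize `fitness` over vectors of n_genes values in [0, 1]."""
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        # Step 3a: mutation perturbs one gene of a randomly chosen parent.
        for _ in range(n_mutations):
            child = list(rng.choice(population))
            gene = rng.randrange(n_genes)
            child[gene] = min(1.0, max(0.0, child[gene] + rng.gauss(0.0, 0.1)))
            offspring.append(child)
        # Step 3b: uniform crossover mixes the genes of two parents.
        for _ in range(n_crossovers):
            a, b = rng.sample(population, 2)
            offspring.append([rng.choice(pair) for pair in zip(a, b)])
        # Step 2: evaluate the enlarged population.
        pool = [(fitness(ind), ind) for ind in population + offspring]
        # Step 4: keep the best individual, fill the rest by tournaments of three.
        best = min(pool, key=lambda item: item[0])[1]
        population = [best] + [
            min(rng.sample(pool, 3), key=lambda item: item[0])[1]
            for _ in range(pop_size - 1)
        ]
    return min(population, key=fitness)


# Demo on a toy problem with hypothetical target weights.
target = [0.7, 0.1, 0.15, 0.05]


def toy_fitness(individual):
    return sum(abs(g - t) for g, t in zip(individual, target))


best = evolve(toy_fitness, pop_size=60, generations=80)
print([round(g, 2) for g in best], round(toy_fitness(best), 3))
```

Keeping the best individual in every generation guarantees that the best fitness value never gets worse, which makes convergence easier to monitor.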
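The individual is represented according to the following formula:
$$O_1 = [w_{11}, w_{12}, w_{13}, w_{14}, F(w_{11}, w_{12}, w_{13}, w_{14})]$$
where:
$w_{11}$, $w_{12}$, $w_{13}$, and $w_{14}$: genes/chromosomes of individual $O_1$;
$F(w_{11}, w_{12}, w_{13}, w_{14})$: value of the fitness function of the individual $O_1$.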
An individual is a vector consisting of five elements: four weights and the value of the fitness function. Task representation consisted of writing the task formula in the form of a vector of numbers. The purpose of the analysis was to study the effects of irradiance, wind speed, dust concentration, and temperature on electricity production. The parameters analyzed were evaluated for their impact on power generation in a photovoltaic system. In the case described, five series of measurements were taken at the same time points (energy production, particulate matter concentration, irradiance, wind speed, and temperature). Each of these series had different units and different maximum values. First, each series was normalized so that all values were within [0, 1]. The energy production series was the baseline. Each of the four remaining series (air pollution, irradiance, wind speed, and temperature) was assigned a weighting coefficient, a numerical measure of the degree of influence of that factor on energy production. The four coefficients were represented as floating-point numbers in the range [0, 1] and calculated using an evolutionary algorithm. Together they formed the first four elements of the vector representing a single individual. The fifth element of the vector was the value of the fitness function (F), defined as follows:
$$F_j = \sum_{i=1}^{n} \left| E_i - \left( w_{j1} S_i + w_{j2} V_i + w_{j3} P_i + w_{j4} T_i \right) \right|$$
where:
$F_j$: value of the fitness function for the current j-th individual;
$n$: the number of measurement points, n = 8760;
$i$: the current time;
$j$: the current individual;
$E_i$: the normalized amount of energy produced at time i;
$S_i$: the normalized amount of sunlight reaching a unit surface area of the Earth at time i;
$V_i$: the normalized wind speed at time i;
$P_i$: the normalized particulate matter concentration at time i;
$T_i$: the normalized temperature at time i;
$w_{j1}$, $w_{j2}$, $w_{j3}$, $w_{j4}$: the sought dependency coefficients.
Five graphs (energy production and the four environmental series) are plotted to analyze the effects of irradiance, wind speed, air pollution, and temperature on electricity production. The goal is to minimize the fitness function and thereby determine the coefficients. In the limiting case, the function F reaches a value of 0. This occurs when the sum of the irradiance, wind speed, particulate matter, and temperature graphs, each multiplied by its corresponding coefficient, reproduces the graph of the power generated by the photovoltaic installation. When this condition is met, the coefficients can be analyzed in detail. Their values make it possible to determine to what extent the various factors affect the electricity production of the panel under analysis, and the percentage contribution of each factor in shaping the power graph can be determined. This approach makes it possible to quantify the influence of each of the studied parameters on the efficiency of the photovoltaic system.
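The normalization and the fitness evaluation described above can be sketched as follows. Two points are assumptions rather than statements from the text: the error is taken as the sum of absolute differences (a squared error would behave similarly), and the weights are ordered as irradiance, wind speed, particulate matter, temperature. The series values are synthetic.

```python
def min_max(series):
    """Scale a series linearly to the range [0, 1]."""
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]
    return [(value - low) / (high - low) for value in series]


def fitness(weights, energy, irradiance, wind, pm, temperature):
    """Sum of absolute differences between measured and modelled output."""
    w_irr, w_wind, w_pm, w_temp = weights
    return sum(
        abs(e - (w_irr * s + w_wind * v + w_pm * p + w_temp * t))
        for e, s, v, p, t in zip(energy, irradiance, wind, pm, temperature)
    )


# Synthetic normalized series; energy is built from known weights so that
# the fitness reaches 0 when those weights are recovered.
irradiance = [0.0, 0.5, 1.0, 0.25]
wind = [0.1, 0.9, 0.3, 0.5]
pm = [1.0, 0.0, 0.4, 0.6]
temperature = [0.2, 0.4, 1.0, 0.0]
energy = [0.8 * s + 0.2 * t for s, t in zip(irradiance, temperature)]

print(fitness((0.8, 0.0, 0.0, 0.2), energy, irradiance, wind, pm, temperature))
print(fitness((0.5, 0.2, 0.2, 0.1), energy, irradiance, wind, pm, temperature))
print(min_max([2.0, 4.0, 6.0]))
```

With real measurements the fitness does not reach zero, so the weights with the lowest value are taken as the best available description of the data.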
The analyzed population is a fixed set of individuals. The initial population is the set of individuals generated and evaluated in the first step of the algorithm. The new population is the set of individuals that will be included in the next generation. The operators of the evolutionary algorithm are responsible for generating new individuals, which are created based on the current population. Depending on its type (crossover or mutation), the evolutionary algorithm operator selects one or more individuals from the population and processes their coordinates (or chromosomes). As a result, a fitness function value is assigned to the new individual (new vector). The newly created individual is then added to the current population. Selection chooses individuals from the current population to form the next iteration of the algorithm. Tournament selection was used in the analyses. In this step, the program increases the number of individuals in the population by the number of individuals generated by the operators of the evolutionary algorithm. The selection process involves selecting three individuals from the current population. Their fitness function values are compared with those of the new population, and the individual with the lowest fitness function value is selected for the next generation. This process is repeated until the assumed number of individuals in the new population is reached. To make the coefficients obtained in the evolutionary algorithm more readable, they are often converted to percentages, especially when the range is [0, 1]. The formula for converting weights into percentages is as follows:
These formulas normalize the values of the weights so that their sum is 100%, making it easier to interpret their relative contributions to the system. After the evolutionary algorithm was implemented, the calculation was repeated ten times to assess the stability of the results. Convergence was considered achieved when the differences in coefficients between successive generations were small enough not to significantly affect the final results and the interpretation of the problem. The convergence criterion was therefore based on the stability of the coefficients. The coefficient values deviated from their average between generations by less than 3.5%, and the average change across all values was 0.62%, which met the tolerance threshold set at 1%, confirming the efficiency of the algorithm and its ability to solve the task in a stable manner.
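The normalization and the stability check described above can be sketched as follows. The sketch assumes non-negative weights and divides each weight by the sum of all weights; the helper names `to_percentages` and `max_relative_change` and the sample weights are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def to_percentages(weights: Sequence[float]) -> List[float]:
    """Scale the weights so that they sum to 100%."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [100.0 * w / total for w in weights]


def max_relative_change(previous: Sequence[float],
                        current: Sequence[float]) -> float:
    """Largest relative change (in %) of any coefficient between two generations."""
    return max(abs(c - p) / abs(p) * 100.0 for p, c in zip(previous, current))


# Hypothetical weights, for illustration only.
shares = to_percentages([0.80, 0.10, 0.06, 0.04])
change = max_relative_change([1.0, 2.0], [1.005, 2.0])

print(shares)
print(change < 1.0)  # compared against a 1% tolerance threshold
```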
Data analysis uses the Pareto principle as a tool for prioritization and decision making. The Pareto principle, also known as the 80/20 rule, describes a phenomenon in which about 80% of the effects are due to 20% of the causes. This concept is particularly useful in data analysis because it allows the identification of the variables or factors that have the greatest impact on the outcome. In the study, the Pareto principle was used to analyze the distribution of indicators to determine which have the least impact on the result and which are the most important for achieving maximum benefits. A Pareto front graph was prepared to represent the trade-off between multiple objectives in multicriteria optimization. The front is visualized as a line in the criteria plane.
Based on the results of analyses that included Pearson’s and Spearman’s correlations and the application of evolutionary algorithms, conclusions are presented regarding the influence of environmental factors on the power obtained from photovoltaic systems.
3. Results and Discussion
The data were first analyzed using two different methods to assess the relationship between variables, i.e., Pearson’s correlation coefficient and Spearman’s rank correlation. Additionally, evolutionary algorithms were used to adjust the simulation parameters, which was necessary due to the difficulty in determining the exact input values. Evolutionary algorithms were employed to solve a complex problem where classical methods proved insufficient due to the extensive search space. The calculations were repeated for different population sizes and numbers of generations, mutations, and crossovers, and yielded stable results. The population of possible solutions gradually evolved through successive iterations toward the best solution, increasing the fit and approaching the desired result. Then, a multivariate analysis of the causes and effects of climatic factors on energy production was conducted.
A mathematical model was developed to describe the relationship between the output and selected measured parameters. This model, presented in Equation (4), takes the form of a linear regression without a constant term, where the basis functions consist of the individual environmental variables: normalized solar irradiance, wind speed, particulate matter concentration, and temperature. The weights of the model were optimized using an evolutionary algorithm. At this stage, the model was used exclusively to assess the relative influence of each parameter on the output. In future work, its predictive performance will be evaluated using additional validation data. In addition, alternative formulations, including nonlinear models, will be explored to potentially enhance both the accuracy and practical applicability of the approach.
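A compact sketch of how an evolutionary algorithm can fit the weights of a linear model without a constant term is given below. It is not the implementation used in the study. The arithmetic crossover, the Gaussian mutation, the elitism step, the squared-error fitness, the function names, and the synthetic data are all assumptions made for illustration. Only the tournament of three individuals and the counts of 15 mutations and 15 crossovers per generation follow the text.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def fitness(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Sum of squared errors of the linear model without a constant term."""
    return sum((yt - sum(wi * xi for wi, xi in zip(w, row))) ** 2
               for row, yt in zip(X, y))


def crossover(a: Vector, b: Vector, rng: random.Random) -> Vector:
    """Arithmetic crossover: a random blend of two parents (assumed operator)."""
    alpha = rng.random()
    return [alpha * ai + (1 - alpha) * bi for ai, bi in zip(a, b)]


def mutate(a: Vector, rng: random.Random, sigma: float = 0.05) -> Vector:
    """Gaussian mutation clipped to [0, 1] (assumed operator)."""
    return [min(1.0, max(0.0, ai + rng.gauss(0.0, sigma))) for ai in a]


def evolve(X, y, n_ind=50, n_gen=60, n_mut=15, n_crs=15, seed=0) -> Tuple[Vector, List[float]]:
    rng = random.Random(seed)
    dim = len(X[0])
    pop = [[rng.random() for _ in range(dim)] for _ in range(n_ind)]
    history = []
    for _ in range(n_gen):
        # Operators enlarge the current population with new individuals.
        children = [crossover(*rng.sample(pop, 2), rng) for _ in range(n_crs)]
        children += [mutate(rng.choice(pop), rng) for _ in range(n_mut)]
        pool = pop + children
        # Elitism (assumption): the best individual is carried over unchanged.
        best = min(pool, key=lambda w: fitness(w, X, y))
        pop = [best]
        while len(pop) < n_ind:
            # Tournament of three: the lowest fitness value wins.
            pop.append(min(rng.sample(pool, 3), key=lambda w: fitness(w, X, y)))
        history.append(fitness(best, X, y))
    return pop[0], history


# Synthetic data generated from hypothetical weights (not measured data).
data_rng = random.Random(1)
true_w = [0.9, 0.05, 0.03, 0.02]
X = [[data_rng.random() for _ in range(4)] for _ in range(30)]
y = [sum(w * x for w, x in zip(true_w, row)) for row in X]

best, history = evolve(X, y)
print(best, history[0], history[-1])
```

Because the best individual is always retained in this sketch, the recorded history of the fitness function never increases from one generation to the next.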
3.1. Correlation Analysis
In order to obtain information about the relationships between variables, which can help formulate appropriate computational strategies, correlation analyses were carried out before applying evolutionary algorithms. Analysis of evolutionary algorithms in combination with correlations allows a more efficient solution of complex problems that require both search mechanisms and evaluation of the quality of the solutions obtained. Evolutionary algorithms, thanks to their ability to explore the solution space, make it possible to identify optimal solutions in difficult problems, while correlations make it possible to assess the connection between variables. This synergistic approach is particularly useful in engineering, where the relationships between parameters can be complex and challenging to capture using traditional methods. The integration of evolutionary algorithms with correlation analysis allows the algorithm parameters to be fine-tuned to the specifics of the problem being solved, especially in systems that require dynamic analysis of large datasets.
Figure 6 shows Pearson’s and Spearman’s correlation results for selected parameters.
Based on the correlation results presented, we can conclude that the main factor affecting the power obtained from photovoltaic panels is solar irradiance, while temperature, air pollution, and wind speed have a weaker effect on the energy produced. The following relationships between environmental factors were noted:
a weak negative correlation between air pollution and solar irradiance suggests that pollution can limit the amount of light reaching the PV surface, which may reduce the efficiency of the panels [11,13,35];
a moderate positive correlation between temperature and irradiance indicates that the two are associated with similar atmospheric conditions and tend to occur together under conditions favorable for solar energy production [11,13,36];
a weak negative correlation between wind speed and air pollutant concentration suggests that the wind can disperse pollutants, which can improve air quality and increase the efficiency of PV panels [11,13,37].
The correlation analysis carried out before the application of evolutionary algorithms allowed a better understanding of the relationships between variables and the prioritization of factors, which, combined with the optimization mechanisms of the algorithms, made it possible to effectively solve the problem of assessing the impact of environmental conditions on the performance of photovoltaic panels.
3.2. Analyzing Data Using Evolutionary Algorithms
A multivariate analysis was performed, and its results are presented in graphs. The graphs in Figure 7 and Figure 8 show how the fitness function changes with the number of individuals in the population and the number of generations for the chosen numbers of mutations and crossovers. In an evolutionary algorithm, the value of the fitness function is closely related to the correctness of the results. In our case, the goal is to minimize the fitness function. With minimization, lower values of the fitness function mean a better solution, indicating a better fit of the model to the data and a closer approach to the best solution in the search area. The fitness function evaluates the quality of the solution, and a lower value signals greater efficiency and relevance to the solution of the problem.
The graphs illustrate the effect of changes in the number of individuals in the population and the number of generations on the value of the fitness function, with a changing number of mutations and number of crossovers, in the context of maximizing the efficiency of the search process in the solution space. Analyzing the graphs, it was noted that:
with a limited number of individuals and generations (25 individuals and 25–200 generations), the algorithm shows high instability and inefficiency, becoming trapped in local optima and returning suboptimal solutions. Increasing the number of generations improves the quality of the results, but with a small number of individuals, the algorithm does not achieve sufficient stability;
increasing the number of individuals in the population (50–200 individuals) improves the quality of the results, allowing more accurate exploration of the search space and better representation of possible solutions. A larger population also improves the stability of the results;
a higher number of generations (50–200) combined with a larger number of individuals promotes the stability of the results and allows deeper exploration of the search space;
the configuration that ensures stability of the results and yields the best solutions is 100–200 individuals and 50–200 generations.
Changes in the value of the fitness function depending on the simulation parameters, such as the number of mutations (nmut), the number of crossovers (ncrs), the number of individuals in the population, and the number of generations, indicate the following relationships:
analyzing the change in the number of generations (from 25 to 200), it was noted that, for a smaller number of generations (Figure 8a,f), the graphs of the fitness function show greater irregularity, suggesting that the population has not managed to optimize the value of the fitness function. There are larger oscillations and smaller maximum values of the function. Increasing the number of generations (Figure 8b–e and Figure 8g–j) leads to higher maximum values of the fitness function, indicating an improvement in the quality of the solution. The higher number of iterations allows the algorithm to better adjust the parameters;
analyzing the effect of the number of individuals in the population (100 vs. 200), it was noted that the graphs for a population of 200 individuals (Figure 8f–j) show more pronounced maxima and higher values of the fitness function compared to a population of 100 individuals (Figure 8a–e). A larger number of individuals provides better coverage of the search space, resulting in higher efficiency of the algorithm;
analyzing changes in the number of mutations and crossovers (0 vs. 25), it was noted that, for a smaller number of generations (Figure 8a,b,f,g), the fitness function is more sensitive to changes in the number of mutations and crossovers, resulting in a large variation in the function area. Increasing the number of generations (Figure 8d,e,i,j) results in stabilization of these parameters, and the extremes of the fitness function appear for more consistent values of the number of mutations and crossovers;
a larger number of individuals in the population (200) allows for a more efficient search of the solution space. Increasing the number of generations allows the evolutionary algorithm to better fit the parameters to the data. The influence of the numbers of mutations and crossovers is more pronounced with fewer generations. In contrast, in a longer evolution, their role diminishes as the algorithm manages to adapt to the complexity of the search space. The best results are obtained with a combination of 200 individuals and 200 generations, as shown in Figure 8j, making this configuration the most efficient solution from the perspective of designing an optimization system.
To achieve the expected results in the evolutionary algorithms, it was important to balance the number of generations with the number of individuals in the population. A higher number of generations was conducive to stabilizing the results, allowing more accurate adjustment of the parameters, while a higher number of individuals allowed more efficient exploration of the solution space. The combination of the two ensured an efficient search of the space and contributed to better optimization results.
The experiments showed that an appropriate balance between the number of generations and the number of individuals was necessary for the best results. Analysis of the experimental results (Figure 7 and Figure 8) indicated that effective outcomes were obtained with a configuration of 100–200 individuals and 50–200 generations, which ensured the stability of the results and an efficient search of the solution space. Although the number of mutations and crossovers affected the quality of the fit, their effectiveness depended on the number of generations and the size of the population. The most effective values of the fitness function were obtained with 200 individuals and 200 generations. For smaller numbers of generations and individuals, the algorithm became unstable, which reduced its efficiency.
The graph with the Pareto front, that is, the set of solutions that are optimal in the Pareto sense, is shown in Figure 9. It is an important tool in the decision-making process, enabling the identification of the parameters that have the greatest impact on the results. This made it possible to direct the evolution process more precisely toward the most efficient solutions. This type of analysis supports the management of parameters and minimizes the negative effects that can occur when searching the solution space.
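For readers unfamiliar with the concept, the following minimal sketch shows how a Pareto front can be extracted from a set of candidate solutions when two objectives are minimized. The objective pairs are hypothetical values chosen for illustration and do not correspond to the data in Figure 9.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in both objectives and strictly
    better in at least one (both objectives are minimized)."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front)


# Hypothetical (objective 1, objective 2) pairs.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 2.0)]
front = pareto_front(candidates)
print(front)  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The points (3.0, 4.0) and (5.0, 2.0) are excluded because another candidate is at least as good in both objectives and better in one.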
Based on the analysis of the Pareto graph, the optimal values for minimizing the fitness function are located in regions where the number of mutations (nmut) is in the range of 10 to 20 and the number of crossovers (ncrs) is in the range of 10 to 20. The Pareto front (orange line) indicates the sets of parameters that are Pareto-optimal, that is, for which one value cannot be improved without worsening another. These values were determined by inspecting the Pareto graph and locating the points where the surface is lowest. The values adopted for minimizing the fitness function are nmut = 15 and ncrs = 15.
The combination of 15 mutations and 15 crossovers leads to the most efficient results in the prepared evolutionary algorithm. Thus, for further calculations, both the number of crossovers per generation and the number of mutations per generation were set to 15, i.e., the middle values of the Pareto front (the orange line) were taken as the point representing the most balanced compromise between the number of crossovers and the number of mutations. This equilibrium point is a reference value and a starting point for further optimization, depending on changes in the other parameters. The history of the evaluation function was used to determine the number of generations and the number of individuals in the population. The expected graph of the evaluation function should resemble a hyperbola, with large values at the beginning followed by a decrease toward a horizontal asymptote. Three graphs were considered for 50 individuals in the population, as shown in Figure 10.
Based on the analysis of the graphs shown in Figure 10a–c, the graphs in Figure 10a,b best fit the assumptions. The presence of an asymptote indicates that the evolutionary algorithm has reached the minimum. The result no longer improves but stabilizes at a constant level at these computational points. To optimize the computational time, the algorithm can be stopped just before the asymptote is reached, which avoids unnecessary calculations that bring no new information to the optimization process. On the other hand, the graph in Figure 10c shows a continuous decrease in value, with no clear asymptote.
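The stopping idea described above can be sketched as a simple plateau test on the history of the evaluation function. The window length, the relative tolerance, the function name `plateau_generation`, and the synthetic histories are assumptions made for illustration; the study does not state a numerical stopping rule.

```python
from typing import Optional, Sequence


def plateau_generation(history: Sequence[float],
                       window: int = 10,
                       rel_tol: float = 0.01) -> Optional[int]:
    """Return the first generation at which the best value improved by less
    than rel_tol (relative) over the preceding `window` generations,
    or None if no such generation exists."""
    for g in range(window, len(history)):
        earlier, now = history[g - window], history[g]
        if earlier == 0:
            return g  # already at the lowest possible value
        if (earlier - now) / abs(earlier) < rel_tol:
            return g
    return None


# Synthetic hyperbola-like history that approaches a horizontal asymptote.
flattening = [1.0 / (g + 1) + 0.05 for g in range(200)]
# Synthetic history that keeps decreasing, with no clear asymptote.
still_falling = [200.0 - g for g in range(100)]

print(plateau_generation(flattening))
print(plateau_generation(still_falling))  # None
```

A history that flattens yields a stopping generation, while a history that is still decreasing steadily yields `None`, mirroring the contrast between the two kinds of graph.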
The population size can be increased to speed up the evolution process and reduce the time needed to achieve the optimal results. However, caution should be exercised, as an excessive population size can lead to increased computation time without improving the final results. A population that is too large compared to the number of mutations and crossovers can not only increase the computation time but also reduce the efficiency of the algorithm.
Figure 11a–c show the analysis of the evaluation function for different configurations: 200 individuals in 200 generations, 200 individuals in 1000 generations, and 1000 individuals in 1000 generations. Figure 11a,b show the presence of asymptotes, which appear at approximately the 150th generation. This means that the algorithm can be stopped at that point, as the result of the calculation no longer improves, and the calculation time can be reduced. For comparison, Figure 11c shows the history of the fitness function for 1000 generations and 1000 individuals. As can be seen, despite the increase in the number of generations and the number of individuals, the accuracy of the calculation has not improved, and the calculation time has increased. Analysis of the above results indicates that the most suitable settings for the number of generations and the number of individuals in the population are the values of 200 adopted for the calculations because:
once the horizontal asymptote is reached, further increase in the computation time becomes unjustified, as it does not improve the results;
in the initial stage, the graph takes the form of a hyperbola, suggesting the effectiveness of these parameters in the initial stages of optimization.
Finally, the following were accepted for further analysis:
number of mutations (nmut) = 15;
number of crossovers (ncrs) = 15;
number of individuals = 200;
number of generations = 200.
The analysis performed determined the percentage effect of air pollution, solar irradiance, wind speed, and temperature on the energy production of photovoltaic panels using an evolutionary algorithm. After developing the algorithm, the calculations were repeated 10 times and gave similar results. The results of these runs are summarized in Table 2.
The obtained concordance of the results confirms the convergence of the algorithm to the minimum, which proves its stability and effectiveness in solving the posed optimization problem. The results obtained are as follows:
solar irradiance: 97.79%;
wind speed: 0.95%;
air pollution: 0.72%;
temperature: 0.54%.
The methodology is a computational approach to quantifying the impact of selected environmental parameters on the efficiency and performance of photovoltaic systems. The calculations are based on data collected throughout the year, providing a comprehensive statistical and temporal analysis. The fitness function serves as a pivotal metric, enabling the evolutionary algorithm to refine and optimize the solution iteratively. The approach is effective in analyzing complex multivariate relationships between environmental variables and energy production in photovoltaic systems.
3.3. Model Evaluation
Figure 12a–d show normalized data for a randomly selected three-day period to test the fitness function, demonstrating how environmental variables change over time and affect the performance of PV panels [38].
These visualizations make it possible to assess the dynamic interaction between the energy produced and the environmental factors analyzed over a specific time interval. The computational results indicate that solar radiation has the most significant impact on the power generation of photovoltaic panels, accounting for 97.79% of the total influence. In comparison, temperature contributes a much smaller effect (0.95%), followed by air pollution (0.72%) and wind speed, which has the least impact at 0.54%. Among the analyzed factors, solar radiation stands out as the primary determinant affecting the energy output of photovoltaic panels.
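The normalization method applied to the series in Figure 12 is not described in this section. As one common possibility, the sketch below assumes min-max scaling to the range from 0 to 1; the function name and the sample readings are hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale a series linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical irradiance readings in W/m^2.
readings = [200.0, 400.0, 600.0]
normalized = min_max_normalize(readings)
print(normalized)  # [0.0, 0.5, 1.0]
```

Scaling every variable to the same range lets series with different physical units be compared on one axis and combined in a single weighted sum.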
3.4. Statistical Analysis Conclusions (Pearson’s and Spearman’s Correlations)
Correlation analysis showed that irradiance is the most important factor affecting the efficiency of photovoltaic systems. Pearson and Spearman correlation coefficients indicated a strong positive correlation, which means that an increase in solar radiation directly translates into higher electricity production.
Temperature showed a moderate positive correlation with PV output. In temperate latitudes, where temperatures do not reach extremes, its effect on panel efficiency can be beneficial. Under suitable climatic conditions, an increase in temperature does not significantly reduce efficiency, a result of the coexistence of heat with adequate sunlight.
Air pollution (particulate matter) showed a weak negative correlation with the photovoltaic output. Although the direct impact is small, pollution reduces the amount of light reaching the surface of the panels, which indirectly reduces their efficiency. The relationship between pollution and solar irradiance suggests that higher levels of dust in the atmosphere reduce the availability of solar radiation.
Wind speed had a weak positive correlation with the efficiency of photovoltaic panels. This means that wind can contribute to the cooling of the panels, which, in some cases, improves their efficiency. However, under the conditions analyzed, its influence is negligible. The following relationships between environmental factors were also observed:
air pollution vs. irradiance: a weak negative correlation indicates that higher levels of pollution reduce the amount of light reaching the panels, which can reduce the efficiency of energy production;
temperature vs. irradiance: a moderate positive correlation suggests that the two often occur together under favorable weather conditions, such as sunny days, which supports energy production;
wind speed vs. air pollution: a weak negative correlation may mean that wind contributes to the dispersion of pollutants, improving air quality and indirectly increasing panel efficiency.
The results of the correlation analysis made it possible to isolate the important variables that affect the performance of photovoltaic systems, which, in turn, enabled a more effective application of evolutionary algorithms. By identifying in advance the factors of fundamental importance, it was possible to optimize the calculation process and better adapt predictive models. The results of the statistical analysis provided guidance on optimizing the design and operation of photovoltaic systems. The strong dependence on irradiance underscores the importance of proper panel location selection, while the influence of pollution and temperature indicates the need to take into account local environmental conditions when planning photovoltaic installations.
In this study, Pearson and Spearman correlation coefficients were calculated, each serving a different purpose based on the characteristics of the data analyzed. The use of both methods was a deliberate and justified choice to capture different types of relationships between environmental variables and the performance of the photovoltaic system.
Table 3 summarizes the type of correlation used and provides the rationale for its selection.
Pearson’s coefficient was applied to identify and quantify the linear relationships between environmental parameters and the output of the photovoltaic system, particularly in cases where a linear dependency was theoretically expected, such as between solar irradiance and the generated power. This method assumes normally distributed continuous variables and was employed where those conditions were approximately satisfied.
In contrast, the Spearman rank correlation coefficient was used to detect monotonic but potentially nonlinear relationships, especially for variables such as wind speed and particulate matter concentrations, which may not exhibit linear dependencies or may include outliers and non-normal distributions. By ranking the data, Spearman’s method offers robustness against such irregularities and provides a more flexible tool to identify hidden patterns of environmental influence.
Combining both types of correlation ensures a comprehensive analysis of dependencies, enhancing the reliability and interpretability of the results under real-world measurement conditions.
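The difference between the two coefficients can be seen in a small, self-contained sketch. The data are synthetic: `y` is a monotonic but nonlinear function of `x`, so Spearman’s coefficient equals 1 while Pearson’s coefficient stays below 1. The helper names are illustrative and do not reflect the software used in the study.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's coefficient computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic monotonic, nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [v ** 3 for v in x]
r_p = pearson(x, y)
r_s = spearman(x, y)
print(round(r_p, 3), round(r_s, 3))
```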
This study combined traditional statistical methods with evolutionary algorithms to evaluate the influence of selected environmental variables on photovoltaic system performance under real-world conditions. While the Pearson and Spearman correlation coefficients provided an initial assessment of variable dependencies, the core novelty lies in the application of the evolutionary algorithms to optimize a multivariate linear model in a complex, high-dimensional search space, where conventional optimization techniques often underperform.
The correlation analyses revealed a strong linear and monotonic relationship between the PV power output and solar irradiance, confirming its role as the dominant factor influencing the performance of the system. However, temperature, particulate matter concentration, and wind speed showed only weak to moderate correlations, suggesting limited or indirect effects under the tested climatic conditions.
These correlations provided useful initial guidance, but they are inherently limited in capturing multivariate interactions and nonlinear effects, common in real-world energy systems. For example, while temperature and solar irradiance are often correlated, their combined impact on PV performance is modulated by factors such as panel thermal dynamics and local ventilation conditions. This underscores the need for a more robust modeling approach capable of handling complex interdependencies.
3.5. Conclusions from an Analysis with Evolutionary Algorithms
To overcome the limitations of correlation-based interpretation, an evolutionary algorithm determined the optimal set of weights in a linear model without a constant term. This method allowed for iterative refinement of parameter influence by minimizing a fitness function based on the difference between predicted and actual PV power output. The analysis, based on 10 independent runs of the algorithm, consistently showed that solar irradiance accounts for an average of 97.79% of the modeled influence on photovoltaic output. The remaining contributions were marginal: temperature (0.95%), air pollution (0.72%), and wind speed (0.54%). These results quantitatively confirm the dominant role of solar irradiance while reinforcing the secondary character of other environmental factors under moderate climatic conditions. In particular, the convergence and consistency between runs underscore the stability and reliability of the genetic algorithm, validating its use in modeling the behavior of solar panels in the real world. However, the model remains linear, and future work should explore whether nonlinear relationships, for example, between temperature and efficiency, could be better captured using more advanced forms, such as kernel regression or neural networks.
Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 illustrate how algorithmic performance critically depends on the configuration of evolutionary parameters. The experiments demonstrated that 200 individuals and 200 generations, combined with 15 mutations and 15 crossovers per generation, strike the optimal balance between computational cost and convergence accuracy. Larger populations or longer generations yielded diminishing returns, while smaller settings led to premature convergence and suboptimal solutions.
Pareto front analysis further helped isolate an efficient region in the search space, identifying parameter settings that simultaneously minimize the fitness function and balance evolutionary diversity. This methodological insight not only supports the accuracy of the current model but also provides a guideline for future algorithm-based optimizations in PV performance modeling.
Although the dominant influence of solar irradiance is consistent with the previous literature and theoretical expectations, the relatively minor role of other variables, particularly air pollution, merits closer scrutiny. The algorithm quantifies the direct influence of particulate matter on the power output as small, but its indirect impact through reduced irradiance can be substantial, especially in urban or industrial settings. This indirect pathway is not explicitly modeled in the linear framework and represents a limitation of the current approach.
The small effects of temperature and wind should not be misinterpreted as universally negligible. In hot climates or systems with poor ventilation, thermal effects can significantly degrade photovoltaic efficiency. Similarly, wind-induced cooling may become relevant in certain geographical regions or mounting configurations. Thus, while the current model provides a valuable general framework, its applicability must be context-sensitive and complemented by more localized or scenario-specific refinements.
The results of the analysis can be used to optimize energy production management strategies for photovoltaic systems, especially in the context of adaptation to changing environmental conditions. Evolutionary algorithms can also be used to predict the performance of photovoltaic systems under different climatic and environmental conditions.
4. Conclusions
The use of an evolutionary algorithm made it possible to determine the impact of individual environmental factors on the performance of photovoltaic panels. The most important of these factors is solar irradiance, which is responsible for more than 90% of the power achieved by photovoltaics.
The results of the correlation analysis between environmental factors and the efficiency of photovoltaic panels confirmed a strong relationship between the quantity of incoming solar radiation and the amount of electricity produced; an increase in irradiance leads to a correspondingly higher output. Other factors, such as temperature, air pollution, and wind speed, show a secondary effect on the efficiency of the panels, but this does not mean that their role can be completely neglected.
The analysis of the relationships between environmental factors has shown that air pollution in particular has a substantial indirect effect on panel performance by reducing the solar input, as confirmed by the research results [38,39,40].
The data obtained provide valuable guidance for decisions on the location and orientation of the photovoltaic panels and allow for more effective forecasting of daily or seasonal electricity output. However, more research is needed to precisely determine how the various factors and their interactions affect the variability in panel performance, especially under complex environmental conditions where there are simultaneous changes in temperature, wind speed, and pollution levels.
This study demonstrated the effectiveness of evolutionary algorithms, specifically genetic optimization, in quantifying the influence of environmental variables on the performance of photovoltaic systems in the real world. The proposed modeling approach proved to be stable, repeatable, and well suited for analyzing complex environmental interactions using real-time data.
From a practical standpoint, the findings of this study can support the development of more intelligent photovoltaic system designs, such as optimizing module orientation and array configuration based on site-specific climatic conditions. Additionally, they enable the implementation of predictive maintenance strategies—particularly in the context of dust accumulation and thermal performance degradation—thereby enhancing the efficiency, reliability, and operational longevity of PV installations operating under diverse environmental conditions. These practical implications form a foundation for future research directions aimed at enhancing photovoltaic system adaptability, forecasting accuracy, and operational intelligence:
apply time series forecasting techniques with a focus on LSTM networks, a type of recurrent neural network capable of modeling temporal and sequential dependencies in data and therefore well suited for PV output prediction based on environmental trends, and on XGBoost, a tree-based gradient boosting algorithm designed to handle complex patterns and interactions in data efficiently and accurately;
incorporate additional environmental parameters to enrich the model input and capture context-specific effects, such as dust and soiling levels, which directly reduce the light reaching the PV cells and lower the overall system efficiency. As part of ongoing research, we are collecting data on Saharan dust events and comparing their impact with that of locally accumulated dust on photovoltaic modules. This will allow for a more detailed understanding of soiling-related losses and their integration into power output prediction models;
investigate the real-time implementation of intelligent control strategies using machine learning for adaptive PV system management under varying weather conditions. This includes designing automated decision-making tools that can adjust system settings, such as inverter control, load shifting, and cleaning schedules, in response to predicted environmental changes, thus maximizing performance and reducing operational losses;
evaluate the transferability of the model across climates and seasons by testing predictive models trained in one region on datasets from others. This will help assess how well machine learning methods capture universal vs. location-specific patterns and guide the development of adaptive retraining mechanisms;
integrate predictive modules with decision support systems for energy management in smart grids, where PV forecasting becomes part of broader scheduling, storage, and dispatch optimization processes.
Together, these directions will improve model accuracy and enhance the adaptive intelligence of photovoltaic systems, enabling them to respond effectively to both long-term climate trends and short-term weather fluctuations. By focusing on time series forecasting models, incorporating context-specific environmental parameters, and evaluating cross-regional applicability, future work will support the development of robust, predictive, and operationally responsive photovoltaic systems. These advances will also facilitate integration into smart grid decision support platforms, optimizing energy management under various environmental conditions.
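The cross-regional transferability test named among the future directions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two "regions" are synthetic, the coefficients and noise levels are hypothetical, and an ordinary least-squares model stands in for the predictive model (it is not the genetic optimization used in this study, nor an LSTM or XGBoost model). The sketch only shows the evaluation protocol: train on one site, then compare the held-out error at that site with the error at a second site.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in for any PV model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def make_region(rng, n, temp_mean, temp_coef):
    """Synthetic, hypothetical site: power depends on irradiance and temperature."""
    irradiance = rng.uniform(100.0, 1000.0, n)    # W/m^2 (assumed range)
    temperature = rng.normal(temp_mean, 5.0, n)   # deg C (assumed spread)
    noise = rng.normal(0.0, 5.0, n)
    power = 0.18 * irradiance + temp_coef * temperature + noise
    return np.column_stack([irradiance, temperature]), power

rng = np.random.default_rng(0)
# Region A: cooler site; region B: hotter site with a stronger thermal penalty.
X_a, y_a = make_region(rng, 400, temp_mean=15.0, temp_coef=-0.4)
X_b, y_b = make_region(rng, 400, temp_mean=30.0, temp_coef=-1.2)

# Train on the first 300 samples of region A only.
coef = fit_linear(X_a[:300], y_a[:300])

rmse_in = rmse(y_a[300:], predict(coef, X_a[300:]))  # held-out, same region
rmse_out = rmse(y_b, predict(coef, X_b))             # unseen region
transfer_gap = rmse_out - rmse_in

print(f"in-region RMSE:    {rmse_in:.2f}")
print(f"cross-region RMSE: {rmse_out:.2f}")
print(f"transfer gap:      {transfer_gap:.2f}")
```

A large positive transfer gap would suggest that the model has learned location-specific relationships, which is the situation in which an adaptive retraining mechanism becomes relevant.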