Article

Optimizing Bifacial Solar Modules with Trackers: Advanced Temperature Prediction Through Symbolic Regression †

by Fabian Alonso Lara-Vargas 1,2, Carlos Vargas-Salgado 2,3,*, Jesus Águila-León 4 and Dácil Díaz-Bello 2

1 Programa de Ingeniería Electrónica, Grupo de Investigación ITEM, Universidad Pontificia Bolivariana Seccional Montería, Montería 230001, Colombia
2 Institute for Energetic Engineering, Universitat Politècnica de València, 46022 Valencia, Spain
3 Electrical Engineering Department, Universitat Politècnica de València, 46022 Valencia, Spain
4 Departamento de Estudios del Agua y de la Energía, Universidad de Guadalajara, Guadalajara 44410, Mexico
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published at the 19th Conference on Sustainable Development of Energy, Water and Environment Systems, Rome, Italy, 8–12 September 2024.
Energies 2025, 18(8), 2019; https://doi.org/10.3390/en18082019
Submission received: 10 March 2025 / Revised: 3 April 2025 / Accepted: 10 April 2025 / Published: 15 April 2025

Abstract

Accurate temperature prediction in bifacial photovoltaic (PV) modules is critical for optimizing solar energy systems. Conventional models struggle to balance accuracy, interpretability, and computational efficiency. This study addresses these limitations by introducing a symbolic regression (SR) framework based on genetic algorithms to model nonlinear relationships between environmental variables and module temperature without predefined structures. High-resolution data, including solar radiation, ambient temperature, wind speed, and PV module temperature, were collected at 5 min intervals over a year from a 19.9 MW bifacial PV plant with trackers in San Marcos, Colombia. The SR model performance was compared with multiple linear regression, normal operating cell temperature (NOCT), and empirical regression models. The SR model outperformed the others, achieving a root mean squared error (RMSE) of 4.05 °C, coefficient of determination (R2) of 0.91, Spearman’s rank correlation coefficient of 0.95, and mean absolute error (MAE) of 2.25 °C. Its hybrid structure combines linear ambient temperature dependencies with nonlinear trigonometric terms capturing solar radiation dynamics. The SR model effectively balances accuracy and interpretability, providing information for modeling bifacial PV systems.

1. Introduction

The global transition toward sustainable energy systems is necessary to address climate change and reduce reliance on fossil fuels. The consumption of fossil fuels contributes over 75% of global greenhouse gas emissions and approximately 90% of carbon dioxide emissions [1]. Consequently, solar energy is increasingly recognized as a clean, sustainable, and essential alternative for mitigating dependence on non-renewable resources [2].
In this context, bifacial photovoltaic (PV) modules have emerged as an advanced technology capable of capturing solar radiation from both surfaces, thereby overcoming the inherent limitations of traditional monofacial PV modules [3]. This innovative approach can potentially increase energy production by up to 20% under optimal conditions, enhancing solar energy efficiency [4].
Bifacial PV modules absorb direct solar radiation on the front surface facing the sun and reflected light on the rear surface, increasing their overall energy production efficiency by 5–20% [5,6]. Although the performance of bifacial PV modules depends on various factors, the angle of incidence is critical because it affects both module surfaces simultaneously [7].
Additionally, dust accumulation on the panels and ambient temperature fluctuations significantly reduce their efficiency and energy generation capacity [8]. High solar irradiation and elevated ambient temperatures in tropical regions lead to thermal management challenges [9]. Given that the module efficiency decreases by 0.52% per °C rise above 25 °C in such climates [10], the module temperature directly impacts electrical performance, lifespan, and operational safety [11]. Therefore, developing accurate temperature prediction models for bifacial PV modules is a technical necessity that impacts the scalability and economics of solar energy as a sustainable long-term solution.
Bifacial PV module temperature prediction has garnered significant interest because it directly impacts solar system efficiency. Existing approaches include empirical models, energy balance frameworks, and machine learning (ML) techniques, each with distinct advantages and limitations. For instance, Meflah et al. [12] reported that the developed empirical models achieve a coefficient of determination (R2) exceeding 0.91 and a root mean square error (RMSE) below 3.74 °C under clear-sky conditions. However, these models assume linear relationships between solar radiation, ambient temperature, and module temperature, which limits their capacity to capture complex nonlinear interactions.
Other studies used the normal operating cell temperature (NOCT) model provided by the PV module manufacturer [13]. These studies presented mathematical models for estimating the solar panel temperature and achieved a coefficient of determination (R2) of 95% [14].
Thermal models based on heat transfer principles, such as those incorporating Grashof, Nusselt, and Rayleigh numbers, predict module temperature with mean absolute errors (MAEs) as low as 3.09 °C [15]. However, validation periods for these models are often limited to short-term intervals (e.g., days), raising concerns about their robustness under annual climatic variability [15]. Kaplani et al. [16] developed a computational approach based on the energy balance equation incorporating PV system orientation and natural and forced convective heat transfer. Even though the predicted temperature profiles closely aligned with measured values with annual temperature differences predominantly ranging within ±5 °C, the test period covered only six days. Zaimi et al. [17] provided analytical expressions for electrical parameter variations of the module as a function of temperature and irradiance; however, they limited their model comparison to a single day, which may not consider the variability of environmental conditions throughout the year.
Haeberle et al. [18] proposed an energy balance approach that incorporates conduction, convection, and radiation mechanisms and achieves mean absolute errors (MAEs) below 8.5% under NOCT conditions. However, the computational complexity of these models, owing to the numerical iterations and the high number of variables involved, may limit their applicability in real-time and diverse operational environments [18].
Moreover, Keddouda et al. [19] reported an R2 of 0.963 and an MAE of 1.883 °C for an implicit simulation model based on heat transfer principles. Similarly, an alternative explicit approach obtained a mean square error (MSE) of 3.505. Despite their higher accuracy, these methods require substantial computational resources and depend heavily on assumptions that may not hold under dynamic conditions.
Furthermore, Kayri et al. [20] employed artificial neural networks (ANNs) to model module temperature, achieving high-precision metrics such as an MAE of 1.45 °C, RMSE of 2.07 °C, mean absolute percentage error (MAPE) of 6.37%, and correlation of 98.87%. Although these results are highly competitive in terms of error, the "black box" nature of ANNs impedes physical interpretation of how each environmental variable affects the module temperature.
However, the study by Kayri et al. [20] did not report additional metrics, such as the coefficient of determination (R2), which would have provided a more comprehensive assessment of the model’s predictive capabilities. Moreover, their model required considerable computational power and specialized knowledge for its creation and deployment, limiting its accessibility to a broader range of researchers and practitioners [20].
Finally, Keddouda et al. [21] compared twelve machine learning algorithms, reporting that the best model based on an ANN achieved an R2 of 0.986, MAE of 0.982 °C, and MSE of 2.181, while another nonlinear model attained an R2 of 0.981, MAE of 1.476 °C, and MSE of 3.464. Despite the high accuracy, these models require an extensive hyperparameter tuning process and lack an interpretative structure that allows for the extraction of physical knowledge from thermal dynamics.
Previous studies have investigated the energy production characteristics of bifacial PV systems and the influence of solar radiation on module temperature [22]. However, research using symbolic regression for temperature prediction in bifacial PV modules with solar trackers in tropical climates remains limited.
The approach proposed in this study is based on symbolic regression using genetic algorithms (GAs) to derive explicit mathematical expressions that describe the temperature of bifacial PV modules with trackers. Its primary advantages lie in the following:
  • Modeling complex relationships without predefined structures: symbolic regression enables the discovery of intrinsic nonlinear interactions among environmental variables without imposing a specific functional format, thereby overcoming the limitations of empirical and linear models [23].
  • Interpretability and transparency: the obtained equations are explicit, facilitating understanding of each factor’s (e.g., solar irradiation’s) influence on bifacial PV module temperature, contrary to ANNs and other machine learning models [24].
  • Efficiency and robustness: because the proposed model does not rely on an extensive training phase or complex adjustments, its implementation in practical applications is considerably simpler [25].
  • Applicability to bifacial systems with trackers: the approach is specifically adapted to the configuration of bifacial PV modules in tracker systems, optimizing energy capture and enhancing system operability.
While symbolic regression has proven effective in modeling nonlinear dynamics across energy systems, such as wind speed characterization in wind power applications [25], its potential remains underexplored for photovoltaic thermal analysis.
This study addresses these gaps by proposing a symbolic regression model for predicting the operating temperature of bifacial PV modules equipped with solar trackers. Unlike traditional methods, this approach balances accuracy, interpretability, and computational efficiency, making it suitable for practical implementation.
Given the unique climatic conditions in tropical regions characterized by high solar radiation and elevated ambient temperatures, this study addresses the need for predictive models specifically adapted to these environments.
The effectiveness of the proposed model was evaluated against three established approaches, viz., a multiple linear regression (MLR) model, the NOCT model, and a mathematical estimation model based on regression analysis, using RMSE, MAE, Spearman’s correlation coefficient (SCC), and R2. Additionally, the influence of solar radiation and ambient temperature on the temperature of the bifacial PV modules was investigated. The symbolic regression model utilizes solar radiation (W/m2), PV module temperature (°C), and ambient temperature (°C) data collected at 5 min intervals over one year.
Although the existing approaches (particularly those based on ANN and other machine learning algorithms) offer highly competitive error metrics, they are limited by their lack of interpretability and the need for complex training configurations. Conversely, this study presents a transparent, interpretable solution that optimizes bifacial PV tracking systems. This study aims to develop a symbolic regression model to predict the operating temperature of bifacial PV modules equipped with solar trackers in tropical climates, balancing accuracy, interpretability, and computational efficiency.
The article is organized as follows: Section 2 outlines the characteristics of the studied PV system, data collection and processing methods, and design of the symbolic regression algorithm. Section 3 presents the results and analysis, comparing the performance of the SR model with those of traditional methods. Section 4 discusses the implications of the findings, including the trade-off among accuracy, interpretability, and computational efficiency. Finally, Section 5 presents the conclusions, implications, and future research directions.

2. Materials and Methods

This section details the methodology used to build the symbolic regression model of the bifacial module temperature. Figure 1 illustrates the general method, which comprises data collection, model construction, and comparison with other existing models.
The study began with the characterization of a bifacial PV plant equipped with solar trackers. Meteorological data, including solar irradiance (W/m2), PV module temperature (°C), ambient temperature (°C), and wind speed (m/s), were collected and preprocessed to remove outliers and incomplete records, ensuring dataset integrity. Normality was assessed using the Anderson–Darling test. Because the data were not normally distributed, SCC was used to analyze the relationships between the variables.
The processed data were employed to train a symbolic regression model using genetic algorithms that were benchmarked against three established methods: MLR, the NOCT model, and an empirical regression-based model. Predictive performance was evaluated using RMSE, MAE, R2, and SCC to quantify accuracy, error magnitude, explained variance, and monotonic agreement with observed values. The sensitivity analysis further validated the robustness of the model under varying environmental conditions.

2.1. Characteristics of the PV System

The dataset was gathered from a bifacial PV installation located in San Marcos, Colombia (8°34′32.5″ N, 74°51′27.7″ W); the red dot in Figure 2 marks the geographic location of the PV plant. This installation consists of a 19.9 MW bifacial PV plant integrated with solar tracking systems.
San Marcos (Sucre) is characterized by a tropical climate with high solar availability and consistently warm temperatures throughout the year. According to [26], the global horizontal irradiance in the sunniest months ranges between 5.5 and 6 kWh/m2 per day, corresponding to an annual average of approximately 5 kWh/m2 per day. Ambient temperatures typically fluctuate between 28 and 32 °C; however, during the dry season, peak temperatures exceeding 38 °C have been recorded, while in the rainy season, temperatures tend to be slightly lower, averaging between 26 and 28 °C [26]. Additionally, the relative humidity in San Marcos generally ranges from 70 to 80%, often rising above 80% during periods of heavy rainfall [26].
The trackers are powered by 30 W solar panels, which drive a DC motor coupled to a mechanical reducer, as illustrated in Figure 3. Each solar tracker is single-axis with a rotation range of ±60°, where 0° corresponds to midday and each degree of rotation corresponds to 4 min. Detailed technical specifications of the bifacial PV panels are provided in Table 1.
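The tracker geometry described above can be illustrated with a short calculation. The sketch below is only one reading of the text: it assumes the angle grows linearly at one degree per 4 min away from midday and is clipped at the ±60° limit. The function name, its parameters, and the sign convention (negative before midday) are hypothetical and do not come from the plant's control system.

```python
def tracker_angle_deg(minutes_from_midday, limit_deg=60.0, minutes_per_degree=4.0):
    """Tracker rotation angle in degrees, clipped to the +/-60 degree range.

    Assumption: negative minutes (before midday) give negative angles.
    """
    angle = minutes_from_midday / minutes_per_degree
    return max(-limit_deg, min(limit_deg, angle))


# Two hours after midday corresponds to 120 min / 4 min per degree = 30 degrees.
angle_two_hours_after = tracker_angle_deg(120)
# Five hours before midday would be -75 degrees, so the limit of -60 applies.
angle_five_hours_before = tracker_angle_deg(-300)
```

Under this reading, the ±60° range is reached 240 min (4 h) before and after midday.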
Data that are fundamental to understanding the thermal behavior of bifacial solar modules were collected from 1 January to 31 December 2023, at 5 min intervals. The data acquisition system takes data from the anemometer every 30 s, weights them, and stores the average value every 5 min in the database.
The dataset comprised solar radiation (W/m2), ambient temperature (°C), PV module temperature (°C), and wind speed (m/s) data. Table 2 lists the instruments used in this study. The 110 PV temperature sensor has an adhesive strip, which allows it to be placed on the back of the photovoltaic module in the center of the module; the outer sheath of the cable is made of Santoprene, used for its resistance to extreme temperatures, humidity, and UV degradation. The sensor disc material is anodized aluminum.
From the total dataset comprising 420,480 records, 10,260 entries (2.44%) were identified as incomplete due to adverse weather conditions or system interruptions. These records were excluded during the data preparation phase to ensure the reliability and integrity of the analysis. The exclusion process was conducted systematically as follows:
  • Identification of Outliers: the maximum and minimum values of each variable were examined to detect and eliminate anomalies, ensuring consistency within the dataset.
  • No Imputation: Missing values were not imputed to prevent the introduction of potential distortions. This approach aligns with recommendations from prior studies [27], which emphasize preserving data integrity over imputation, particularly for high-precision modeling applications [28].
  • Focus on Normal Operations: Data corresponding to periods of maintenance or system failures were excluded to ensure that only records reflecting stable operating conditions were retained. This selective filtering enhanced the relevance and accuracy of the dataset for modeling purposes.
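The exclusion rules listed above can be sketched as a simple filter. This is a minimal illustration, not the preprocessing code used in the study: the field names and the plausibility limits in `LIMITS` are hypothetical, since the bounds applied to each variable are not reported here. The last line checks the reported share of excluded records.

```python
# Hypothetical plausibility limits; the actual bounds used in the study are not stated.
LIMITS = {
    "irradiance": (0.0, 1500.0),    # W/m2
    "ambient_temp": (-10.0, 60.0),  # degrees C
    "module_temp": (-10.0, 100.0),  # degrees C
    "wind_speed": (0.0, 50.0),      # m/s
}


def is_valid(record):
    """Keep a record only if every variable is present and within its limits."""
    for name, (low, high) in LIMITS.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


def clean(records):
    """Drop incomplete or out-of-range records; missing values are not imputed."""
    return [r for r in records if is_valid(r)]


sample = [
    {"irradiance": 820.0, "ambient_temp": 31.0, "module_temp": 55.0, "wind_speed": 2.1},
    {"irradiance": None, "ambient_temp": 30.0, "module_temp": 50.0, "wind_speed": 1.0},
    {"irradiance": 9999.0, "ambient_temp": 30.0, "module_temp": 50.0, "wind_speed": 1.0},
]
cleaned = clean(sample)

# Share of excluded entries reported in the text: 10,260 of 420,480.
excluded_share = 100 * 10_260 / 420_480
```

Only the first sample record survives, and `excluded_share` rounds to the 2.44% stated above.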
A statistical correlation analysis was subsequently performed to investigate the relationship between environmental variables and module temperature. The Anderson–Darling (AD) normality test (Equation (1)) was applied to assess the distribution of data. The results confirmed that the data did not follow a normal distribution, thereby necessitating the use of Spearman’s rank correlation method for further analysis.
$$AD = -N - \frac{1}{N}\sum_{i=1}^{N}(2i-1)\left[\ln F(Y_{i}) + \ln\left(1 - F(Y_{N+1-i})\right)\right] \qquad (1)$$
Here, i is the index of the ith observation in the ordered sample, N is the total number of samples, Yi is the ith ordered value, and F is the cumulative distribution function (CDF) of the reference distribution.
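To make Equation (1) concrete, the following sketch computes the Anderson–Darling statistic with the Python standard library. It assumes the reference distribution is a normal distribution fitted to the sample mean and standard deviation, and it adds a small numerical guard (`eps`) that is not part of the original formula. The two synthetic samples are illustrative only and are not plant data.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(sample):
    """Anderson-Darling statistic of Equation (1) against a fitted normal CDF."""
    y = sorted(sample)
    n = len(y)
    fitted = NormalDist(mean(y), stdev(y))
    eps = 1e-12  # guard (an addition here) so the logarithms stay finite
    total = 0.0
    for i in range(1, n + 1):
        f_i = min(max(fitted.cdf(y[i - 1]), eps), 1.0 - eps)       # F(Y_i)
        f_mirror = min(max(fitted.cdf(y[n - i]), eps), 1.0 - eps)  # F(Y_{N+1-i})
        total += (2 * i - 1) * (math.log(f_i) + math.log(1.0 - f_mirror))
    return -n - total / n


# Synthetic samples: evenly spaced normal quantiles and a strongly skewed version.
standard = NormalDist()
near_normal = [standard.inv_cdf((k + 0.5) / 200) for k in range(200)]
skewed = [math.exp(v) for v in near_normal]

ad_normal = anderson_darling(near_normal)
ad_skewed = anderson_darling(skewed)
```

The near-normal sample yields a statistic close to zero, whereas the skewed sample yields a much larger value, which is the behavior that leads to rejecting normality.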
Pearson’s correlation method is appropriate for assessing linear relationships when the data conform to a normal distribution. Conversely, Spearman’s rank correlation method is employed when the data deviate from normality. The Spearman correlation coefficient denoted by ρ ranges between −1 and +1, where values closer to −1 or +1 indicate a stronger monotonic relationship. It is important to note that while a significant correlation may exist, it does not necessarily imply a linear relationship [29].
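The distinction between monotonic and linear association can be illustrated with a small standard-library sketch of Spearman's coefficient, computed as the Pearson correlation of the ranks (with average ranks for ties). The helper names and the toy data are illustrative and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_cubic = [v ** 3 for v in x]       # monotonic but clearly nonlinear
rho_increasing = spearman(x, y_cubic)
rho_decreasing = spearman(x, [-v for v in y_cubic])
```

The cubic relationship gives a coefficient of +1 even though it is not linear, which is why a high coefficient should be read as evidence of a monotonic trend only.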
The p-value was computed based on the results derived from Equation (1), as summarized in Table 3. This metric provides critical insight into the statistical significance of the observed correlations, enabling robust interpretation of the findings.
The dataset was initially partitioned into 70% for training and 30% for testing, consistent with methodologies reported in previous studies [30]. To further assess the robustness of the model, additional splits of 80:20 and 90:10 were explored, as recommended by [31]. It is worth noting that only the MLR and symbolic regression (SR) models required training data, as these approaches rely on supervised learning techniques. The computational experiments were conducted on a personal computer, as described in Table 4 and Table 5.
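The three partitions can be reproduced with a few lines of code. The sketch below assumes a seeded random shuffle before splitting; whether the study split the records randomly or chronologically is not stated, so this choice, like the function name, is an assumption.

```python
import random


def split(records, train_fraction, seed=0):
    """Shuffle with a fixed seed, then cut into training and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


records = list(range(1000))  # stand-in for the cleaned 5 min records
sizes = {
    fraction: tuple(len(part) for part in split(records, fraction))
    for fraction in (0.7, 0.8, 0.9)
}
```

For 1000 records, the three ratios give 700/300, 800/200, and 900/100 training/test samples, respectively.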

2.2. Proposed Algorithm

Symbolic regression represents a robust methodology for modeling complex nonlinear relationships between variables by discovering the algebraic expressions that most accurately describe the dynamics of a given system. These expressions are derived from a predefined set of mathematical functions embedded within the algorithm [32].
This approach leverages genetic algorithms to simulate evolutionary processes, iteratively refining mathematical solutions and identifying optimal models that align with a specified objective metric [33]. This iterative optimization ensures both the accuracy and interpretability of the resulting models. The development and implementation of the symbolic regression algorithm is presented in Figure 4.
The dataset is organized in a tabular format with three columns. The first two columns contain the predictor variables, i.e., solar irradiance and ambient temperature, which serve as model inputs. The third column corresponds to the target variable: bifacial PV module temperature. Prior to analysis, rigorous quality checks were performed to eliminate incomplete or anomalous records to ensure a coherent and reliable dataset.
The initial population of mathematical equations was generated using basic arithmetic operators (addition, subtraction, multiplication, division) and advanced functions (sine, cosine, natural logarithm, exponential). The population size was systematically optimized within a range of 1000 to 10,000 individuals through preliminary trials that balanced exploratory efficiency with computational costs. Populations smaller than 1000 individuals exhibited premature convergence due to limited genetic diversity, whereas those exceeding 10,000 led to exponentially increased computation times without proportional gains in accuracy.
Random coefficients were constrained within the interval (−5, 5), and potentially divergent functions (e.g., logarithms) incorporated ad hoc regularization terms to ensure numerical stability. The implementation was enhanced through parallelization using multiprocessing (8 cores) and dynamic memory management while prioritizing expressions with fewer than 50 operational nodes to maintain computational feasibility and model interpretability. This approach ensured robust exploration of the solution space and efficient execution of the algorithm.
The algorithm enters an iterative process to evaluate and refine equations until termination criteria are met. Each iteration takes the following steps:
  • The RMSE of each equation is calculated to quantify predictive accuracy.
  • Equations with the lowest RMSE (best fit to observed data) are selected as primary candidates.
  • Poorly performing equations undergo stochastic mutations before reintroduction into the population. This cycle continues until the RMSE stabilizes (e.g., no significant improvement over a defined number of iterations) or the maximum iteration limit is reached.
The two best-performing equations (lowest RMSE) undergo genetic crossover, generating a new “offspring” equation that inherits features from both parents. Subsequently, stochastic mutations are applied to alter coefficients, operators, or algebraic structures within the offspring equation.
The mutated equation is reevaluated; when its RMSE is lower than that of the existing equations in the population, it replaces the less effective solution. This competitive mechanism ensures that only the most promising candidates persist. Convergence is declared when the RMSE shows no significant improvement within the predefined number of iterations or when the specified error threshold is met.
The algorithm delivers the equation with the lowest RMSE, which models the relationship between solar radiation, ambient temperature, and bifacial PV module temperature. Table 6 describes the characteristics of symbolic regression coding with GA.
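The loop described in this section can be sketched in a few dozen lines. The code below is a simplified illustration under stated assumptions, not the implementation used in the study: it uses only a subset of the operators (no division, logarithm, or exponential), represents equations as nested tuples, creates one offspring per iteration from the two best equations, and is run on a hypothetical linear target rather than plant data. All function names are illustrative.

```python
import math
import random

BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"sin": math.sin, "cos": math.cos}
LEAVES = ("x", "c")


def random_tree(rng, depth=3):
    """Grow a random expression over inputs x[0], x[1] and constants in (-5, 5)."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(2)) if rng.random() < 0.5 else ("c", rng.uniform(-5, 5))
    if rng.random() < 0.3:
        return (rng.choice(list(UNARY)), random_tree(rng, depth - 1))
    return (rng.choice(list(BINARY)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree for one input pair x."""
    kind = tree[0]
    if kind == "x":
        return x[tree[1]]
    if kind == "c":
        return tree[1]
    if kind in UNARY:
        return UNARY[kind](evaluate(tree[1], x))
    return BINARY[kind](evaluate(tree[1], x), evaluate(tree[2], x))


def size(tree):
    """Number of nodes in the tree."""
    return 1 if tree[0] in LEAVES else 1 + sum(size(child) for child in tree[1:])


def rmse(tree, data):
    """RMSE of a tree on the data; numerically invalid trees score as infinity."""
    try:
        total = sum((evaluate(tree, x) - y) ** 2 for x, y in data)
        result = math.sqrt(total / len(data))
    except (OverflowError, ValueError):
        return math.inf
    return result if math.isfinite(result) else math.inf


def random_subtree(tree, rng):
    """Walk down from the root and stop at a random node."""
    while tree[0] not in LEAVES and rng.random() < 0.5:
        tree = rng.choice(tree[1:])
    return tree


def replace_random(tree, new_subtree, rng):
    """Return a copy of tree with one randomly chosen subtree replaced."""
    if tree[0] in LEAVES or rng.random() < 0.3:
        return new_subtree
    children = list(tree[1:])
    k = rng.randrange(len(children))
    children[k] = replace_random(children[k], new_subtree, rng)
    return (tree[0], *children)


def evolve(data, population_size=1000, iterations=100, mutation_rate=0.05, seed=0):
    """Cross the two best equations, occasionally mutate, and replace the worst."""
    rng = random.Random(seed)
    scored = [(rmse(t, data), t) for t in (random_tree(rng) for _ in range(population_size))]
    scored.sort(key=lambda pair: pair[0])
    for _ in range(iterations):
        child = replace_random(scored[0][1], random_subtree(scored[1][1], rng), rng)
        if rng.random() < mutation_rate:
            child = replace_random(child, random_tree(rng, depth=2), rng)
        if size(child) > 50:  # keep expressions below 50 nodes
            continue
        score = rmse(child, data)
        if score < scored[-1][0]:
            scored[-1] = (score, child)
            scored.sort(key=lambda pair: pair[0])
    return scored[0]


# Hypothetical target (not plant data): x = (irradiance, ambient temperature).
data = [((g, t), t + 0.03 * g) for g in range(0, 1001, 100) for t in (24.0, 28.0, 32.0)]
best_score, best_tree = evolve(data)
```

Because an offspring only enters the population when it beats the current worst equation, the best RMSE can never get worse from one iteration to the next. A full implementation would add the remaining operators, regularization of divergent functions, and parallel evaluation as described above.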

2.3. Model Evaluation Metrics

The models were validated by cross-validation, which consists of comparing the predicted data with measured data that were not used in the construction of the models (test data) [31]. The balance between the linear and nonlinear components of the symbolic regression (SR) model was determined by the GA-based evolutionary process, which explored combinations of mathematical operators. The algorithm evaluated the performance of each generated equation using root mean square error (RMSE), prioritizing those that integrated a linear dependence on ambient temperature (component 1 in Equation (7)) and nonlinear terms that captured solar radiation dynamics through trigonometric and absolute value functions (component 2 in Equation (7)).
The performance of the model was assessed using a comprehensive set of evaluation metrics, including RMSE, R2, MAE, and SCC, to ensure a robust and multifaceted analysis. The symbolic regression model was benchmarked against three established methodologies, viz., the MLR model, the NOCT model, and an empirical regression-based model. The evaluation was carried out using test datasets to validate the accuracy and reliability of the proposed model.
RMSE: This metric determines the magnitude of the errors between the model predicted (Vpredicted) and real (Vtarget) values, as shown in Equation (2) [34]. A lower RMSE indicates higher model accuracy.
$$RMSE = \sqrt{\frac{\sum_{n=1}^{N}\left(V_{predicted} - V_{target}\right)^{2}}{N}} \qquad (2)$$
R2: This metric assesses the proportion of the variance in the measurements that is predictable from the independent variables in the model and is calculated as shown in Equation (3) [34].
$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_{i,actual} - y_{i,predicted}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i,actual} - \bar{y}_{actual}\right)^{2}} \qquad (3)$$
Here, yi,actual and yi,predicted are the actual and predicted values, respectively, ȳactual is the average of the actual values, and N denotes the number of samples. An R2 value closer to 1 indicates better regression or prediction results.
MAE and SCC provide complementary perspectives on model performance and are commonly used for model validation in various fields. MAE (Equation (4)) measures the average magnitude of the errors in a set of predictions without considering their direction (prediction accuracy), whereas SCC quantifies the strength of the monotonic relationship between the predicted and measured values [35].
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i} - \hat{y}_{i}\right| \qquad (4)$$
Here, n is the number of observations, and yi and ŷi denote the actual and predicted values, respectively.
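As a worked illustration of Equations (2)–(4), the three error metrics can be computed with the standard library as follows. The four measured and predicted temperatures are made-up values chosen so that the arithmetic is easy to follow; they are not results from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, Equation (2)."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination, Equation (3)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mae(actual, predicted):
    """Mean absolute error, Equation (4)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative module temperatures in degrees C (not measured data).
measured = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 52.0, 58.0]

rmse_value = rmse(measured, predicted)  # every error is +/-2, so RMSE = 2.0
mae_value = mae(measured, predicted)    # likewise MAE = 2.0
r2_value = r2(measured, predicted)      # 1 - 16/500 = 0.968
```

Because all four errors have the same magnitude, RMSE and MAE coincide here; RMSE exceeds MAE as soon as the errors differ in size, since squaring weights large errors more heavily.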
The performance of the SR model was compared with that of an MLR model using the same training dataset. MLR is a statistical method that examines the connection between one dependent and multiple independent variables to identify the linear function that minimizes the sum of the squared differences between the observed and predicted values of the dependent variable [36].
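A minimal sketch of such a least-squares fit is given below, assuming two predictors (solar radiation and ambient temperature) plus an intercept. It solves the normal equations directly with Gaussian elimination so that it needs no external libraries; in practice a numerical library would normally be used. The function name, the sample rows, and the coefficients used to generate the targets are hypothetical.

```python
def fit_mlr(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (aug[i][k] - tail) / aug[i][i]
    return coeffs


# Hypothetical rows of (solar radiation in W/m2, ambient temperature in degrees C).
rows = [(0.0, 24.0), (200.0, 30.0), (500.0, 26.0), (800.0, 33.0), (1000.0, 28.0), (600.0, 35.0)]
# Targets generated from an assumed linear law, so the fit should recover it.
targets = [2.0 + 0.03 * g + 1.0 * t for g, t in rows]
intercept, coef_radiation, coef_ambient = fit_mlr(rows, targets)
```

Since the targets are generated without noise, the fit recovers the assumed intercept and slopes up to rounding error.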
The SR model was also compared with the NOCT model. The cell temperature, Tc (°C), is typically determined using the NOCT provided by the PV module manufacturer [37]. The relationship of Tc with the ambient temperature Ta (°C) and solar radiation G (W/m2), as explained in [38], is expressed in Equation (5):
$$T_{c} = T_{a} + \frac{G}{800}\left(NOCT - 20\ ^{\circ}\mathrm{C}\right) \qquad (5)$$
Finally, the SR model was compared with a regression analysis-based mathematical model using data from 2018 to determine the PV module temperature in the solar module power output calculation. Key factors for estimating the solar output include ambient temperature and solar radiation. Therefore, an empirical linear regression model was developed to estimate the solar panel power output. Regression was performed on the datasets, and an empirical formula (Equation (6)) was obtained with an R2 value of 96.02% [14]:
$$T_{c} = 1.0367 + 0.025710\,G + 1.17970\,T_{a} \qquad (6)$$
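The two closed-form benchmarks, Equations (5) and (6), translate directly into code. In the sketch below the function names are illustrative, and the default `noct=45.0` is a placeholder assumption, because the NOCT value of the installed modules is not repeated in this section.

```python
def noct_model(ambient_temp, irradiance, noct=45.0):
    """Equation (5). noct=45 degrees C is a placeholder, not the datasheet value."""
    return ambient_temp + irradiance / 800.0 * (noct - 20.0)


def empirical_model(ambient_temp, irradiance):
    """Equation (6): empirical linear regression from [14]."""
    return 1.0367 + 0.025710 * irradiance + 1.17970 * ambient_temp


# At the NOCT reference irradiance of 800 W/m2, the rise above ambient is NOCT - 20.
tc_noct = noct_model(30.0, 800.0)
# 1.0367 + 0.025710 * 1000 + 1.17970 * 30 = 62.1377 degrees C.
tc_empirical = empirical_model(30.0, 1000.0)
```

Both functions need only the ambient temperature and the solar radiation, which is why, as noted earlier, they require no training data.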
For graphical validation, specific days were selected to represent diverse environmental conditions:
  • The day with maximum solar radiation tested the model’s ability to predict module temperatures under extreme solar radiation.
  • An average sunny day ensured reliability under stable operating conditions.
  • A rainy day assessed robustness in low-irradiance scenarios.
A comparative analysis of the SR model distributions and the actual values clearly represented the medians, quartiles, and atypical values. This facilitated rapid evaluation of the model’s accuracy, revealing the alignment of predictions with actual values and highlighting discrepancies and atypical values that could affect the model’s performance.
Additionally, sensitivity analysis confirmed that the SR model accurately captured ambient temperature’s direct and dominant influence on the solar module temperature. Sensitivity analysis is critical for evaluating model outputs’ robustness by examining how input parameter variation affects the results [39]. The one-at-a-time (OAT) sensitivity analysis represents a straightforward approach wherein a single input parameter is modified while maintaining others at a constant value, thus allowing the observation of resultant changes in the output [40].
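The OAT procedure can be sketched generically as follows. For illustration only, the sketch perturbs the inputs of the NOCT relation of Equation (5), with a placeholder NOCT of 45 °C, rather than the SR model itself; the function names, baseline values, and step sizes are assumptions.

```python
def noct_model(ambient_temp, irradiance, noct=45.0):
    """Equation (5), used here as a stand-in model; noct=45 is a placeholder."""
    return ambient_temp + irradiance / 800.0 * (noct - 20.0)


def oat_sensitivity(model, baseline, deltas):
    """Change one input at a time and report the resulting change in the output."""
    reference = model(**baseline)
    effects = {}
    for name, delta in deltas.items():
        perturbed = dict(baseline)   # all other inputs stay at their baseline
        perturbed[name] += delta
        effects[name] = model(**perturbed) - reference
    return effects


baseline = {"ambient_temp": 30.0, "irradiance": 800.0}
effects = oat_sensitivity(noct_model, baseline, {"ambient_temp": 1.0, "irradiance": 100.0})
```

With these inputs, a 1 °C rise in ambient temperature changes the output by 1 °C, and a 100 W/m2 rise in irradiance changes it by 3.125 °C. The same helper could be applied to any model that takes the two inputs as keyword arguments.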
It is important to note the limitations associated with this model. First, the model’s performance is linked to the study site’s specific climatic and geographic characteristics (San Marcos, Colombia), which may limit its global applicability to other regions with different environmental profiles (e.g., higher altitudes, long droughts, frequent snow).
Second, factors such as panel soiling (dust/pollution accumulation) and shading effects were not incorporated into the analysis despite their potential to alter module temperature and irradiance absorption.

3. Results

This section presents the study’s key findings, focusing on variable correlation analysis and model evaluation. These results provide insights into data reliability and the effectiveness of the developed model.

3.1. Correlation Analysis of Variables

Table 7 summarizes the distribution of acquired data and presents results from the measuring instruments used to assess the solar plant. The substantial volume of the data available for analysis demonstrates the reliability and strengths of this study.
A normality assessment was performed using the AD test to determine whether the measured data were normally distributed. The calculated value exceeded the critical threshold of 0.751 (significance level 0.05), confirming non-normality. Consequently, Spearman’s correlation coefficient was applied. The correlation coefficients are listed in Table 8. The analysis revealed a significant correlation between the module temperature, solar radiation, and ambient temperature.
The correlation matrix shows that the solar panel’s temperature depends primarily on solar radiation and ambient temperature, indicating that both factors contribute significantly to its heating. Conversely, wind speed shows a less pronounced relationship with the other variables, i.e., its influence on the module temperature appears to be less crucial. Figure 5 describes the relationship of the variables over a period of one year; it can be seen that wind speed is only relevant up to 200 W/m2.
Wind speed was excluded to simplify the computation and the equation produced by the symbolic regression algorithm, given its low correlation in this specific context (Table 8) and its interaction with the global dataset (Figure 5).
The data were divided into training and test groups under different ratios, viz., 70:30, 80:20, and 90:10. Because the MLR model with the 80% training group performed best, this group was used to build the SR model (see Table 9).
The similarities between the models indicate a consistent approach in capturing underlying relationships; however, variations in coefficients and intercepts highlight the influence of data differences, likely attributable to climatic fluctuations. Analysis of the MSE and R2 demonstrates that using 80% of the dataset for training provides an appropriate foundation for model development and testing. This partitioning ratio strikes an optimal balance between precision and stability, critical for meteorological research requiring reliable predictions under varying conditions. Furthermore, this approach enhances forecasting accuracy and facilitates a robust examination of the interdependencies among meteorological variables, ensuring the model’s applicability across diverse scenarios.

3.2. Model Evaluation

Different population sizes were evaluated for model selection. The equation obtained with a population of 1000 was selected because of its superior RMSE and R2 metrics, as shown in Table 10. The mutation rate of the genetic algorithm was set at 5%, and the algorithm was run for 100 iterations. The symbolic regression equation is shown in Equation (7).
$$\text{SR model} = 0.99\,AT + \sqrt{\left|\sin(10.76\,SD) - SD\,\cos(0.20\,AT)\right|} \qquad (7)$$
Here, AT denotes the ambient temperature (°C) and SD represents the solar radiation (W/m2). The arguments of the sine and cosine functions are expressed in degrees. The following sections provide detailed analyses of each component of this equation; see Equation (7a,b).
$$\text{Component 1} = 0.99\,AT \qquad (7a)$$
$$\text{Component 2} = \sqrt{\left|\sin(10.76\,SD) - SD\,\cos(0.20\,AT)\right|} \qquad (7b)$$
Figure 6 shows the components of Equation (7) according to the time of day to examine their influence. Components 1 and 2 are essential aspects of the SR model used to predict the temperatures of bifacial PV modules comprehensively. The performance of these components was assessed by comparing them with the actual and predicted temperature obtained through symbolic regression. The data were from 29 November 2023, a day with clear skies.
Grouping the regression components enhances the interpretability of the regression analysis results because it captures diverse data patterns and ensures prediction stability, thereby strengthening the model’s applicability in real-world scenarios.
Component 1: Exhibits a linear relationship directly proportional to the ambient temperature, i.e., the solar panel temperature rises proportionally with an increase in ambient air temperature, scaled by a factor close to 1. This component reflects the fundamental thermal coupling between the environment and the module, serving as a baseline for predictions under varying climatic conditions.
Component 2: Incorporates trigonometric functions (sine and cosine) and the square root of the absolute value of differences between terms, introducing greater complexity and nonlinearity. The trigonometric functions capture periodic fluctuations in the data owing to diurnal variations in solar radiation and their impact on module temperature. Additionally, this component accounts for the influence of solar radiation intensity and its relation with ambient temperature. The observed nonlinearity highlights intricate underlying effects, such as the angle of incidence, the efficiency of radiation-to-heat conversion, and other thermal phenomena.
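The two components can be evaluated directly, as in the minimal sketch below. It assumes that Component 2 is the square root of the absolute difference between sin(10.76 SD) and SD cos(0.20 AT), following the description above, and that the trigonometric arguments are in degrees. The function names are hypothetical and the input values are illustrative.

```python
import math

def sr_components(at, sd):
    """Return (Component 1, Component 2) for ambient temperature `at` (°C)
    and solar radiation `sd` (W/m2). Assumed form of Equation (7a,b)."""
    component_1 = 0.99 * at
    # Trigonometric arguments are interpreted in degrees.
    inner = (math.sin(math.radians(10.76 * sd))
             - sd * math.cos(math.radians(0.20 * at)))
    component_2 = math.sqrt(abs(inner))
    return component_1, component_2

def sr_module_temperature(at, sd):
    """Predicted module temperature (°C) as the sum of both components."""
    return sum(sr_components(at, sd))

c1, c2 = sr_components(30.0, 800.0)
print(f"Component 1 = {c1:.2f} °C, Component 2 = {c2:.2f} °C")
print(f"Predicted module temperature = {sr_module_temperature(30.0, 800.0):.2f} °C")
```

With zero radiation the second component vanishes under this form, so the prediction reduces to 0.99 AT, which matches the role of Component 1 as a baseline.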
By integrating both components, the SR model effectively captures both linear and nonlinear effects, ensuring accurate solar panel temperature predictions across various environmental conditions. The output of the SR model in response to data obtained from an average clear-sky day on 29 November 2023 is shown in Figure 7.
Figure 8 shows the environmental conditions of solar radiation, ambient temperature, and wind speed on 29 November 2023. Here, it is observed that the components of Equation (7) (Figure 6) are similar to the forms of solar radiation and ambient temperature on this day.
Figure 9 shows the difference between the temperature measured in the module and the SR model predictions.
An MLR model was established using the training data and evaluated using the test dataset. The analysis revealed the coefficients as follows:
$$\text{MLR model} = 30.24 + 2.09 \cdot TA + 0.03 \cdot SD + 0.42 \cdot WS$$
Here, TA is the ambient temperature (°C), SD is the solar radiation (W/m2), and WS is the wind speed (m/s).
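For readers who want to reproduce this kind of fit, the sketch below estimates MLR coefficients by ordinary least squares through the normal equations. It is a pure-Python stand-in for whatever library routine was actually used; `solve`, `fit_mlr`, and the synthetic data are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_mlr(features, targets):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X)b = X'y."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)

# Synthetic rows of (TA in °C, SD in W/m2, WS in m/s) with a known linear law.
features = [(ta, sd, ws) for ta in (20.0, 25.0, 30.0, 35.0)
            for sd in (0.0, 400.0, 800.0) for ws in (1.0, 3.0)]
targets = [5.0 + 2.0 * ta + 0.03 * sd - 0.4 * ws for ta, sd, ws in features]
print([round(c, 4) for c in fit_mlr(features, targets)])
```

On noise-free synthetic data the fit recovers the generating coefficients; on field data the same routine returns the least-squares estimate.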
Table 11 summarizes the data obtained from the four models used for predicting the temperature of the PV module. Twenty percent of the global data (test data) were used to evaluate the models. These data were not used for model construction.
The models yield different results with respect to the metrics used. However, the SR model exhibits superior performance compared with the MLR and NOCT models.
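The error metrics used in this comparison follow their standard definitions and can be computed as in this minimal sketch. The function names and the sample values are illustrative and do not come from Table 11.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up module temperatures (°C): measured vs. predicted.
measured = [28.0, 35.5, 48.2, 55.0, 51.3, 40.1]
predicted = [29.1, 34.0, 46.5, 57.2, 50.0, 41.0]
print(f"RMSE = {rmse(measured, predicted):.2f} °C")
print(f"MAE  = {mae(measured, predicted):.2f} °C")
print(f"R2   = {r_squared(measured, predicted):.3f}")
```

RMSE penalizes large deviations more strongly than MAE, so reporting both gives a sense of how much of the error comes from occasional large misses.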
A performance comparison of the different models for data obtained on a sunny day is shown in Figure 10. The proposed SR model exhibits the best performance in modeling the temperature of the bifacial PV module. The MLR and SR models provide results close to the actual module temperature, suggesting that they adequately capture the thermal dynamics under the analyzed conditions. In contrast, the NOCT- and empirical regression-based models exhibit significant deviations, suggesting systemic biases or limitations in their calibration.
Figure 11 compares the model performances using data from the day with the highest solar radiation (4 November 2023, with above 1000 W/m2 radiation). The decision to test the SR model on this day was based on the following:
  • Extreme conditions are key to evaluating the model’s performance under difficult circumstances and ensuring its viability in real-life scenarios. This method ensures the model’s ability to handle substantial changes in the input data, such as fluctuations in solar radiation.
  • Robustness testing demonstrates the stability and reliability of the model under varying conditions, which are essential for practical applications where consistent performance is required.
  • Performance optimization involves evaluating the model under the most demanding conditions to identify possible improvements.
  • Scientific and technical communities value accurate solar radiation predictions during peak hours because these periods considerably influence PV system performance and grid management.
The SR and MLR models show the best average performance during peak periods. In contrast, the temperature predicted by the NOCT model shows substantial deviation from the actual temperature.
The models’ performances on a rainy day were also tested using data from 2 November 2023. The results in Figure 12 establish that the SR model continued to perform well by accurately capturing the temperature fluctuations of the PV module. On the contrary, the other models demonstrated limitations in their prediction abilities, particularly the NOCT model, which exhibited significantly poorer performance results.
A comparative analysis of the temperature distributions in Figure 13 shows that the SR model offers significant advantages in terms of accuracy and consistency. Compared with the other models, the SR model obtained a mean temperature value that was close to the actual temperature of the module, indicating higher precision. In contrast, the MLR model exhibited greater variability, suggesting problems with precision and sensitivity to environmental variations.
Although the NOCT model exhibits remarkable consistency with its narrower IQR, it tends to underestimate the actual temperatures, which can be disadvantageous in applications where accuracy is essential. In contrast, the SR model achieves this consistency and provides accurate predictions without compromising reliability. Compared with the empirical regression-based model, which has a similar distribution and provides higher precision, the SR model is characterized by its refined development and adaptation, which optimize the precision and consistency of predictions.
Sensitivity analysis was conducted to assess the robustness of the SR model. This analysis first examined the relationship with ambient temperature, as illustrated in Figure 14.
The graph illustrates a nearly linear relationship between the ambient temperature and SR model prediction results as the ambient temperature directly influences the temperature of the PV module. An increase in the ambient temperature leads to a corresponding increase in the solar panel’s temperature multiplied by a factor close to 1 (0.988171, according to the model’s formula). This result shows that the model accurately captures ambient temperature’s direct and dominant influences on PV module temperature.
Figure 15 illustrates the nonlinear relationship between solar radiation and SR prediction. As the solar radiation increases, the predicted temperature of the PV module also increases, but in a more complex manner than with the ambient temperature. This nonlinear relationship suggests that other factors, such as the absorption efficiency of sunlight and the conversion of this energy into heat, modulate the effect of solar radiation on module temperature. The temperature increase is more pronounced at lower solar radiation, whereas the temperature increase in the solar module is dampened at higher solar radiation. This behavior can be attributed to thermodynamic phenomena whereby the module dissipates additional heat from solar radiation more efficiently through convection and thermal radiation mechanisms.
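A one-at-a-time sweep of this kind can be sketched as follows. The model function assumes the square-root-of-absolute-difference form of Equation (7) with arguments in degrees; `oat_sweep` and the baseline operating point are hypothetical choices for illustration.

```python
import math

def sr_model(at, sd):
    # Assumed form of Equation (7); trigonometric arguments are in degrees.
    inner = (math.sin(math.radians(10.76 * sd))
             - sd * math.cos(math.radians(0.20 * at)))
    return 0.99 * at + math.sqrt(abs(inner))

def oat_sweep(model, name, values, baseline):
    """Vary the input `name` over `values` while the others stay at `baseline`."""
    results = []
    for value in values:
        inputs = dict(baseline, **{name: value})
        results.append((value, model(**inputs)))
    return results

baseline = {"at": 30.0, "sd": 500.0}  # hypothetical operating point
at_curve = oat_sweep(sr_model, "at", [20.0, 25.0, 30.0, 35.0, 40.0], baseline)
sd_curve = oat_sweep(sr_model, "sd", [0.0, 200.0, 400.0, 600.0, 800.0, 1000.0], baseline)

at_slope = (at_curve[-1][1] - at_curve[0][1]) / (at_curve[-1][0] - at_curve[0][0])
print(f"Mean slope with respect to ambient temperature: {at_slope:.3f}")
for (sd0, t0), (sd1, t1) in zip(sd_curve, sd_curve[1:]):
    print(f"SD {sd0:.0f} -> {sd1:.0f} W/m2: temperature rise {t1 - t0:.2f} °C")
```

Under these assumptions the slope with respect to ambient temperature stays close to 1, while each additional 200 W/m2 of radiation adds less temperature than the previous step, which is the dampening behavior described above.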

4. Discussions

The SR model proposed in this study demonstrates significant advancements in predicting the temperature of bifacial PV modules, effectively addressing the limitations of traditional models by balancing accuracy, interpretability, and computational efficiency. The SR model achieved the following performance metrics: RMSE = 4.05 °C, R2 = 0.91, SCC = 0.95, and MAE = 2.25 °C. A summary of the performance metrics of all four models, including the MLR, NOCT, and empirical models, is presented in Table 11.
Linear empirical models, such as those reported in [12] (R2 > 0.91, RMSE < 3.74 °C under clear skies), and the NOCT model [13] (R2 = 0.95 under controlled conditions), assume simplified relationships between the environmental variables, thereby limiting their applicability to dynamic scenarios. In contrast, the SR model uses heterogeneous data covering a full year, capturing critical nonlinearities (e.g., radiation effects on bifacial PV module temperature, Figure 15). This flexibility allows the SR model to adapt to diurnal and extreme fluctuations, whereas linear models such as the empirical one in [14] (Equation (6)) underestimate the temperature during irradiation peaks (Figure 11).
Even though thermal models based on energy balances are accurate [15,16] (MAE ≈ 3.09 °C in [15]), they require solving differential equations and involve complex parameters such as Grashof number and convection coefficient, making them computationally expensive and impractical for real-time applications.
The SR model overcomes these limitations by generating explicit algebraic equations without numerical iterations (Equation (7)), reducing training time to <30 min (Table 9) and enabling implementation on standard hardware.
While neural network (ANN) models [20,21] achieve superior metrics (MAE = 1.45 °C in [20], R2 = 0.986 in [21]), their opaque nature complicates the extraction of physical relationships between variables. The SR model, however, provides interpretable equations (e.g., Component 1 = 0.99 AT in Equation (7a)), revealing the following:
  • Ambient temperature explains 81% of the module’s thermal variability (correlation coefficient of 0.81, Table 8), aligning with studies in tropical climates [9].
  • Solar radiation induces nonlinear effects (Component 2), such as thermal saturation (Figure 15). Panel heating increases nonlinearly with solar radiation, with a reduced rate of temperature rise at higher irradiation levels, a phenomenon reported in energy models [41].
The SR model demonstrated robustness under diverse conditions:
  • Extreme radiation (Figure 11): accurate predictions above 1000 W/m2, outperforming the NOCT model.
  • Rainy days (Figure 12): captured thermal attenuation due to cloud cover, a challenge for linear models [12].
  • Annual variability: validated with 410,202 filtered data points, overcoming the short-term validation limitations (<6 days) of previous studies [16,19].
However, its applicability is tied to the tropical climate of San Marcos, Colombia. Factors such as wind-driven convection (0.38 in this study) or snow accumulation may require recalibration in arid or snowy regions. Additionally, excluding soiling and shading introduces uncertainty [8], a critical area for future research.
The SR model matches the metrics of established models and addresses practical challenges overlooked by traditional approaches, as listed below.
  • Tracker optimization: the explicit Equation (7) enables integration with tracking algorithms to maximize bifacial energy capture, avoiding the computational cost of iterative simulations [22].
  • Accessibility: its implementation on basic hardware (Table 4) facilitates use in resource-limited regions where complex computational models face adoption barriers [20].

5. Conclusions

This study introduces an innovative symbolic regression model for predicting the temperature of bifacial PV modules equipped with trackers, achieving a unique balance between precision, physical transparency, and computational efficiency. The model demonstrated robust performance by attaining an RMSE of 4.05 °C, R2 of 0.91, SCC of 0.95, and MAE of 2.25 °C, significantly outperforming traditional approaches such as the MLR (RMSE = 6.18 °C) and NOCT models (RMSE = 13.23 °C). These metrics, validated using a year of data under dynamic tropical conditions, underscore its ability to capture nonlinear interactions between environmental variables—a critical challenge for linear empirical models. Unlike opaque methods such as neural networks, the SR model generated explicit equations that revealed key physical mechanisms, including the dominant linear dependence on ambient temperature and nonlinear thermal saturation effects at high irradiation levels.
The model’s efficiency, with training times under 30 min on standard hardware, positions it as a practical alternative to the energy balance-based approaches which rely on numerical iterations and complex parameters. This computational agility enables real-time applications, such as adaptive solar tracker adjustments and integration with digital twins for PV plant optimization. However, its current applicability is limited to tropical climates, and validation in arid or temperate regions—where factors such as wind-driven convection may play a stronger role—is required. Future work should address excluded variables, such as panel soiling, humidity, and shading, to enhance universality.
By combining the rigor of physics-based models with the agility of empirical methods, symbolic regression emerges as a transformative paradigm for designing efficient photovoltaic systems.

Author Contributions

Conceptualization, F.A.L.-V., J.Á.-L., and D.D.-B.; methodology, C.V.-S.; software, J.Á.-L.; validation, F.A.L.-V.; investigation, F.A.L.-V., C.V.-S., J.Á.-L., and D.D.-B.; data curation, D.D.-B.; writing—original draft, F.A.L.-V.; writing—review and editing, C.V.-S. and F.A.L.-V.; supervision, C.V.-S.; project administration, C.V.-S.; funding acquisition, F.A.L.-V. All authors have read and agreed to the published version of the manuscript.

Funding

One of the authors, F.A.L.-V., was granted a scholarship by the Universidad Pontificia Bolivariana through Act 58 of 25 October 2023 for studies at the Universitat Politècnica de València. In addition, another author, D.D.-B., was supported by the Ministry of Universities of Spain under Grant 628 FPU21/00677. This work was also supported by a grant from the Cátedra de Transición Energética Urbana, funded by Ajuntament de València-Las Naves and Fundació València Clima i Energia, and by the RES4CITY project, financed by the European Union under Grant Agreement No. 101075582.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors wish to express their gratitude to Atlantica Colombia SAS and Edison Ortega Oviedo for their support and collaboration in the development of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AD: Anderson–Darling Test
ANN: Artificial Neural Networks
CDF: Cumulative Distribution Function
GA: Genetic Algorithms
IQR: Interquartile Range
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
MLR: Multiple Linear Regression
NOCT: Normal Operating Cell Temperature
OAT: One-at-a-Time (sensitivity analysis)
PV: Photovoltaic
R2: Coefficient of Determination
RMSE: Root Mean Square Error
ROI: Return on Investment
SCC: Spearman’s Rank Correlation Coefficient
SR: Symbolic Regression
STC: Standard Test Conditions

References

  1. Osman, A.I.; Chen, L.; Yang, M.; Msigwa, G.; Farghali, M.; Fawzy, S.; Rooney, D.W.; Yap, P.-S. Cost, environmental impact, and resilience of renewable energy under a changing climate: A review. Environ. Chem. Lett. 2023, 21, 741–764.
  2. AlTimimi, M.J.H. Solar energy. In Quantum Dots–Recent Advances, New Perspectives and Contemporary Applications; IntechOpen: London, UK, 2023; Volume 5, pp. 89–110.
  3. Kahar, N.H.A.; Azhan, N.H.; Alhamrouni, I.; Zulkifli, M.N.; Sutikno, T.; Jusoh, A. Comparative analysis of grid-connected bifacial and standard mono-facial photovoltaic solar systems. Bull. Electr. Eng. Inform. 2023, 12, 1993–2004.
  4. Baloch, A.A.B.; Armoush, M.; Hindi, B.; Bousselham, A.; Tabet, N. Performance assessment of stand-alone bifacial solar panel under real-time conditions. In Proceedings of the 2017 IEEE 44th Photovoltaic Specialist Conference (PVSC), Washington, DC, USA, 25–30 June 2017; pp. 1058–1060.
  5. Yin, H.P.; Zhou, Y.F.; Sun, S.L.; Tang, W.S.; Shan, W.; Huang, X.M.; Shen, X.D. Optical enhanced effects on the electrical performance and energy yield of bifacial PV modules. Sol. Energy 2021, 217, 245–252.
  6. Naik, S.D.; Kulkarni, S.V. Design and analysis of underwater bifacial solar photovoltaic cell. In Proceedings of the IEEE 3rd Global Conference for Advancement in Technology (GCAT), Bangalore, India, 7–9 October 2022; pp. 1–5.
  7. Zhang, Y.; Gao, J.Q.; Yu, Y.; Shi, Q.; Liu, Z. Influence of incidence angle effects on the performance of bifacial photovoltaic modules considering rear-side reflection. Sol. Energy 2022, 245, 404–409.
  8. Liu, X.; Cui, L.; Tao, Q.; Yi, Z.; Li, J.; Lu, L. Dust deposition mechanism and output characteristics of solar bifacial PV panels. Environ. Sci. Pollut. Res. 2023, 30, 100937–100949.
  9. Kaewpraek, C.; Ali, L.; Rahman, M.A.; Shakeri, M.; Chowdhury, M.S.; Jamal, M.S.; Mia, M.S.; Pasupuleti, J.; Dong, L.K.; Techato, K. The effect of plants on the energy output of green roof photovoltaic systems in tropical climates. Sustainability 2021, 13, 4505.
  10. Hudișteanu, V.-S.; Cherecheș, N.-C.; Țurcanu, F.-E.; Hudișteanu, I.; Romila, C. Impact of temperature on the efficiency of monocrystalline and polycrystalline photovoltaic panels: A comprehensive experimental analysis for sustainable energy solutions. Sustainability 2024, 16, 10566.
  11. Stein, J.S.; Riley, D.; Lave, M.; Hansen, C.; Deline, C.; Toor, F. Outdoor field performance from bifacial photovoltaic modules and systems. In Proceedings of the IEEE 44th Photovoltaic Specialist Conference (PVSC), Washington, DC, USA, 25–30 June 2017; pp. 3184–3189.
  12. Meflah, A.; Aouchiche, I.; Berkane, S.; Chekired, F. Estimation models of photovoltaic module operating temperature under various climatic conditions. Indones. J. Electr. Eng. Comput. Sci. 2023, 32, 13–20.
  13. Mattei, M.; Notton, G.; Cristofari, C.; Muselli, M.; Poggi, P. Calculation of the polycrystalline PV module temperature using a simple method of energy balance. Renew. Energy 2006, 31, 553–567.
  14. Viswanath, A.; Krishna, K.; Chandrika, T.; Purushotham, V.; Harikumar, P. Development of a mathematical model for solar power estimation using regression analysis. Adv. Intell. Syst. Comput. 2021, 1227, 589–597.
  15. Vega, M.A.P.; Lopez, O.M.G.; Guarin, A.R.M.; Vasquez, R.D.G.; Bula, A.; Fandino, J.M.M. Estimation of the surface temperature of a photovoltaic panel through a radiation-natural convection heat transfer model in MATLAB Simulink. In Proceedings of the ASME 2016 International Mechanical Engineering Congress and Exposition, Volume 8: Heat Transfer and Thermal Engineering, Phoenix, AZ, USA, 11–17 November 2016.
  16. Kaplani, E.; Kaplani, S. PV module temperature prediction at any environmental conditions and mounting configurations. In Renewable Energy and Sustainable Buildings, Proceedings of the World Renewable Energy Congress WREC 2018, London, UK, 30 July–3 August 2018; Springer: Cham, Switzerland, 2020; pp. 921–933.
  17. Zaimi, M.; Achouby, H.E.; Ibral, A.; Assaid, E.M.; Maliki, M.A.E.; Saadani, R. Temporal monitoring of temperature and incident irradiance for predicting photovoltaic solar module peak power and efficiency using analytical expressions of model physical parameters. In Proceedings of the 6th International Renewable and Sustainable Energy Conference (IRSEC), Rabat, Morocco, 5–8 December 2018; pp. 1–7.
  18. Haeberle, F.; Dias, J.B.; Cardoso, J.T., Jr.; Abe, C.F.; Notton, G. Estimativa da temperatura do módulo FV a partir de um modelo de balanço de energia (Estimate of PV module temperature using an energy balance model). In Proceedings of the Anais Congresso Brasileiro de Energia Solar-CBENS, Brasilia, Brazil, 16 August 2022.
  19. Keddouda, A.; Ihaddadene, R.; Boukhari, A.; Atia, A.; Arici, M.; Lebbihiat, N.; Ihaddadene, N. Experimentally validated thermal modeling for temperature prediction of photovoltaic modules under variable environmental conditions. Renew. Energy 2024, 231, 120922.
  20. Kayri, I.; Aydin, H. ANN-based prediction of module temperature in a single-axis PV system. In Proceedings of the Global Energy Conference (GEC), Batman, Turkey, 26–29 October 2022; pp. 361–367.
  21. Keddouda, A.; Ihaddadene, R.; Boukhari, A.; Atia, A.; Arici, M.; Lebbihiat, N.; Ihaddadene, N. Photovoltaic module temperature prediction using various machine learning algorithms: Performance evaluation. Appl. Energy 2024, 363, 123064.
  22. Ghenai, C.; Ahmad, F.F.; Rejeb, O.; Bettayeb, M. Artificial neural networks for power output forecasting from bifacial solar PV system with enhanced building roof surface Albedo. J. Build. Eng. 2022, 56, 104799.
  23. He, M.; Zhang, L. Machine learning and symbolic regression investigation on stability of MXene materials. Comput. Mater. Sci. 2021, 196, 110578.
  24. Shmuel, A.; Glickman, O.; Lazebnik, T. Symbolic regression as feature engineering method for machine and deep learning regression tasks. Mach. Learn. Sci. Technol. 2024, 5, 025065.
  25. Radwan, Y.A.; Kronberger, G.; Winkler, S. A Comparison of Recent Algorithms for Symbolic Regression to Genetic Programming. arXiv 2024.
  26. Ministerio de Minas y Energía de Colombia and Ministerio de Ambiente; Vivienda y Desarrollo Territorial de Colombia. Atlas de Radiación Solar de Colombia; Gobierno de Colombia: Bogotá, Colombia, 2005.
  27. Roy, M.S.; Roy, B.; Gupta, R.; Sharma, K.D. On-device reliability assessment and prediction of missing photoplethysmographic data using deep neural networks. IEEE Trans. Biomed. Circuits Syst. 2020, 14, 1323–1332.
  28. Storlie, C.B.; Therneau, T.M.; Carter, R.E.; Chia, N.; Berquist, J.R.; Huddleston, J.M.; Romero-Brufau, S. Prediction and inference with missing data in patient alert Systems. J. Am. Stat. Assoc. 2020, 115, 32–46.
  29. Thirumalai, C.; Chandhini, S.A.; Vaishnavi, M. Analysing the concrete compressive strength using Pearson and Spearman. In Proceedings of the International Conference on Electronics, Communication and Aerospace Technology, ICECA 2017, Coimbatore, India, 20–22 April 2017.
  30. Kinaneva, D.; Hristov, G.; Kyuchukov, P.; Georgiev, G.; Zahariev, P.; Daskalov, R. Machine learning algorithms for regression analysis and predictions of numerical data. In Proceedings of the 3rd International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 11–13 June 2021; pp. 1–6.
  31. Huang, F.; Teng, Z.; Guo, Z.; Catani, F.; Huang, J. Uncertainties of landslide susceptibility prediction: Influences of different spatial resolutions, machine learning models and proportions of training and testing dataset. Rock Mech. Bull. 2023, 2, 100028.
  32. Angelis, D.; Sofos, F.; Karakasidis, T.E. Artificial intelligence in physical sciences: Symbolic regression trends and perspectives. Arch. Comput. Methods Eng. 2023, 30, 3845–3865.
  33. Moreno Parra, R.A. Un Uso de Algoritmos Genéticos Para la Búsqueda de Patrones, 1st ed.; Editorial XYZ: Bogotá, Colombia, 2019.
  34. Hodson, T.O. Root-mean-square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. 2022, 15, 5481–5487.
  35. Dayev, Z.; Kairakbaev, A.; Yetilmezsoy, K.; Bahramian, M.; Sihag, P.; Kiyan, E. Approximation of the discharge coefficient of differential pressure flowmeters using different soft computing strategies. Flow Meas. Instrum. 2021, 79, 101913.
  36. Jiang, J. Multiple Linear Regression. In Applied Medical Statistics; Wiley: Hoboken, NJ, USA, 2022; Chapter 15; pp. 345–367.
  37. Nolay, P. Développement d’une Méthode Générale D’analyse des Systèmes Photovoltaïques (Development of a General Method for Analyzing Photovoltaic Systems). Ph.D. Thesis, École Nationale Supérieure des Mines de Paris, Paris, France, 1987.
  38. Bharti, R.; Kuitche, J.; Tamizhmani, M.G. Nominal operating cell temperature (NOCT): Effects of module size, loading and solar spectrum. In Proceedings of the 34th IEEE Photovoltaic Specialists Conference (PVSC), Philadelphia, PA, USA, 7–12 June 2009; pp. 001657–001662.
  39. Tarantola, S.; Ferretti, F.; Piano, S.L.; Kozlova, M.; Lachi, A.; Rosati, R.; Puy, A.; Roy, P.; Vanucci, G.; Kuc-Czarnecka, M.; et al. An annotated timeline of sensitivity analysis. Environ. Model. Softw. 2024, 174, 105977.
  40. Zand, M.; Nasab, M.A.; Padmanaban, S.; Maroti, P.K.; Muyeen, S.M. Sensitivity analysis index to determine the optimal location of multi-objective UPFC for improvement of power quality parameters. Energy Rep. 2023, 10, 1234–1245.
  41. Wang, F.; Li, Z.; Liu, M.; Liu, X.; Pang, D.; Du, W.; Cheng, X.; Zhang, Y.; Guo, W. Heat-dissipation performance of photovoltaic panels with a phase-change-material fin structure. J. Clean. Prod. 2023, 423, 138756.
Figure 1. Overview of the method for developing the proposed symbolic regression model and its evaluation.
Figure 2. Geographic location of the bifacial photovoltaic plant [26].
Figure 3. Tracker used in bifacial solar photovoltaic plant.
Figure 4. Flowchart of the implemented symbolic regression algorithm.
Figure 5. Behavior of meteorological variables during the period of analysis.
Figure 6. Components of Equation (7) vs. time (data are from 29 November 2023).
Figure 7. Comparison of the actual temperature on 29 November 2023 and that obtained using the SR model.
Figure 8. Comparison between environmental variables and SR model on 29 November 2023.
Figure 9. Comparison between the one-month test data and the value predicted by the SR model.
Figure 10. Comparative graph of the models on a sunny day (29 November 2023).
Figure 11. Comparative graph of the models on the day with the highest radiation (4 November 2023).
Figure 12. Comparative graph of the models on a rainy day (2 November 2023).
Figure 13. Comparative graph of temperature distribution measurements.
Figure 14. Sensitivity of the SR model to ambient temperature at constant solar radiation.
Figure 15. Sensitivity of the SR model to solar radiation at constant ambient temperature.
Table 1. Technical specifications of bifacial PV panels.
Parameter | Specification
Power rating (STC) | 395–420 Wp
Open circuit voltage (Voc) | 48.6 V
Short circuit current (Isc) | 10.5 A
Maximum power point voltage (Vmp) | 40.45 V
Maximum power point current (Imp) | 9.90 A
Module efficiency | 19.7%
Bifacial factor | 70 ± 5%
Temperature coefficient (Pmax) | −0.36%/°C
NMOT Power (Pmax) | 306.9 Wp (Nominal Module Operating Temperature = 44 °C ± 2 °C)
Operational temperature | −40 °C to +85 °C
Table 2. Technical characteristics of data acquisition equipment.
Measurement | Range | Accuracy | Resolution
Pyranometer MS-80 | 0–4000 W/m2 | 10 µV/W/m2 | ±0.2%
CR 300 datalogger | −40 to +70 °C | ±1 min per month | 24-bit ADC
Scientific 110 PV CS | −40 to 135 °C | ±0.024 °C | ±0.2 °C
HygroVUE5 | −40 to +70 °C | ±0.4 °C | 0.001 °C
Anemometer CS | 0 to 177 km/h | ±0.5 m/s | 0.02 m/s
Table 3. p-value calculation.
Anderson–Darling (AD) | p-Value
AD ≤ 0.2 | $1 - e^{-13.436 + 101.14\,AD - 223.73\,AD^{2}}$
0.2 < AD ≤ 0.34 | $1 - e^{-8.318 + 42.796\,AD - 59.938\,AD^{2}}$
0.34 < AD ≤ 0.6 | $e^{0.9177 - 4.279\,AD - 1.38\,AD^{2}}$
AD > 0.6 | $e^{1.2937 - 5.709\,AD + 0.0186\,AD^{2}}$
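The piecewise p-value calculation can be sketched as a short function. The coefficients and signs below follow the commonly cited D'Agostino–Stephens approximation for the Anderson–Darling normality test, which is assumed here to be what Table 3 intends; `ad_p_value` is a hypothetical name.

```python
import math

def ad_p_value(ad):
    """Approximate p-value for an Anderson–Darling normality statistic `ad`.

    Piecewise D'Agostino–Stephens approximation (assumed form of Table 3).
    """
    if ad > 0.6:
        return math.exp(1.2937 - 5.709 * ad + 0.0186 * ad ** 2)
    if ad > 0.34:
        return math.exp(0.9177 - 4.279 * ad - 1.38 * ad ** 2)
    if ad > 0.2:
        return 1.0 - math.exp(-8.318 + 42.796 * ad - 59.938 * ad ** 2)
    return 1.0 - math.exp(-13.436 + 101.14 * ad - 223.73 * ad ** 2)

for statistic in (0.1, 0.3, 0.5, 0.751, 1.5):
    print(f"AD = {statistic:5.3f} -> p = {ad_p_value(statistic):.4f}")
```

Under this approximation a statistic of 0.751 corresponds to a p-value of roughly 0.05, which is consistent with the critical threshold quoted for the 0.05 significance level; larger statistics give smaller p-values and lead to rejecting normality.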
Table 4. Specifications of the computing hardware employed.
Item | Specification
RAM | 32 GB DDR4
RAM Speed | 4267 MHz
Software | Spyder 6.04
CPU | 8 × 1.7 GHz
Table 5. Specifications of libraries employed.
Libraries | Specification
SymPy v1.12 | Algebraic manipulation and symbolic expression generation
Scikit-learn v1.3 | Metric calculations (RMSE), stratified data splitting
NumPy v1.24, Pandas v2.0 | Array management and preprocessing
Matplotlib v3.7 | Convergence visualization of the algorithm
Multiprocessing | Parallelization of fitness evaluations
Table 6. Pseudocode of the symbolic regression algorithm with GA.
Symbolic Regression Algorithm with GA
   // Mathematical functions
   FUNCTIONS = [add, subtract, multiply, divide, sin, cos, tan, log]
   // Generate random expression
   GENERATE_EXPRESSION():
     num_terms = random(3, 6)
     terms = []
     for each term:
       expr1 = random_function(x1, x2)
       expr2 = random_function(x1, x2)
       if random > 0.5:
         term = expr1 + expr2
       else:
         term = expr1 * expr2
       add term to terms[]
     return sum(terms[])
   // Mutation process
   MUTATE(expression, mutation_probability):
     if random < mutation_probability:
       replace random subexpression with new expression
     return expression
   // Tournament selection process
   SELECT(population, k):
     selected = []
     for each individual needed:
       randomly select k individuals
       select best according to fitness
       add to selected[]
     return selected[]
   // Crossover process
   CROSSOVER(parent1, parent2):
     select subexpression from each parent
     swap subexpressions
     return new offspring
   // Main genetic algorithm SR
   GENETIC_ALGORITHM(train_data, test_data, parameters):
     population = [generate_expression() for _ in range(population_size)]
     best_rmse = infinity
     for each generation:
       // Evaluate population
       fitness = [evaluate_expression(ind) for ind in population]
       best_individual = individual in population with the best fitness
       // Selection and crossover
       new_population = [best_individual] // elitism
       while len(new_population) < population_size:
         parent1, parent2 = select(population, k)
         child = mutate(crossover(parent1, parent2), mutation_probability)
         new_population.append(child)
       population = new_population
       if best_rmse > best_fitness(fitness):
         best_rmse = best_fitness(fitness)
         best_expression = best_individual
     return best_expression, best_rmse
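The pseudocode in Table 6 can be made concrete with a small, standard-library-only sketch. It is not the authors' implementation: expressions are stored as nested tuples rather than SymPy objects, random trees replace the sum-of-terms generator, the guards on divide and log and the `MAX_NODES` size cap are assumptions added to keep random expressions well behaved, and the data are synthetic. Function names follow the pseudocode where possible; `evaluate`, `subtrees`, and `replace` are hypothetical helpers.

```python
import math
import random

# Function set mirroring FUNCTIONS in Table 6. The guards on divide and log
# are assumptions so that random expressions do not fail on bad inputs.
BINARY = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
UNARY = {
    "sin": math.sin, "cos": math.cos, "tan": math.tan,
    "log": lambda a: math.log(abs(a)) if abs(a) > 1e-9 else 0.0,
}
MAX_NODES = 40  # hypothetical size cap that limits expression bloat

def generate_expression(depth=3):
    """Random expression tree over x1, x2 and numeric constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x1", "x2", round(random.uniform(-5, 5), 2)])
    if random.random() < 0.7:
        return (random.choice(list(BINARY)),
                generate_expression(depth - 1), generate_expression(depth - 1))
    return (random.choice(list(UNARY)), generate_expression(depth - 1))

def evaluate(expr, x1, x2):
    """Evaluate a tree whose leaves are 'x1', 'x2' or a number."""
    if isinstance(expr, tuple):
        op, *args = expr
        values = [evaluate(a, x1, x2) for a in args]
        return (BINARY[op] if op in BINARY else UNARY[op])(*values)
    return {"x1": x1, "x2": x2}.get(expr, expr)

def rmse(expr, data):
    """Fitness: RMSE over (x1, x2, y) rows; invalid expressions score inf."""
    try:
        mse = sum((evaluate(expr, x1, x2) - y) ** 2 for x1, x2, y in data) / len(data)
    except (ValueError, ArithmeticError):
        return math.inf
    return math.sqrt(mse) if math.isfinite(mse) else math.inf

def subtrees(expr, path=()):
    """Yield (path, node) for every node of the tree."""
    yield path, expr
    if isinstance(expr, tuple):
        for i, child in enumerate(expr[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(expr, path, new):
    """Copy of expr with the node at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return expr[:i] + (replace(expr[i], path[1:], new),) + expr[i + 1:]

def mutate(expr, mutation_probability):
    if random.random() >= mutation_probability:
        return expr
    path, _ = random.choice(list(subtrees(expr)))
    return replace(expr, path, generate_expression(2))

def crossover(parent1, parent2):
    path, _ = random.choice(list(subtrees(parent1)))
    _, donor = random.choice(list(subtrees(parent2)))
    return replace(parent1, path, donor)

def select(population, fitness, k):
    """Tournament selection: best of k randomly drawn individuals."""
    drawn = random.sample(range(len(population)), k)
    return population[min(drawn, key=lambda i: fitness[i])]

def genetic_algorithm(data, population_size=100, generations=20, k=3,
                      mutation_probability=0.2):
    population = [generate_expression() for _ in range(population_size)]
    best_expression, best_rmse = None, math.inf
    for _ in range(generations):
        fitness = [rmse(ind, data) for ind in population]
        i_best = min(range(population_size), key=lambda i: fitness[i])
        if fitness[i_best] < best_rmse:
            best_expression, best_rmse = population[i_best], fitness[i_best]
        new_population = [population[i_best]]  # elitism
        while len(new_population) < population_size:
            parent1 = select(population, fitness, k)
            parent2 = select(population, fitness, k)
            child = mutate(crossover(parent1, parent2), mutation_probability)
            if sum(1 for _ in subtrees(child)) > MAX_NODES:
                child = parent1  # reject oversized offspring
            new_population.append(child)
        population = new_population
    return best_expression, best_rmse

random.seed(0)
# Synthetic target (not plant data): linear in x1 plus a sinusoid in x2.
data = [(x1, x2, 1.2 * x1 + 3.0 * math.sin(x2))
        for x1 in range(20, 36, 3) for x2 in range(0, 10, 2)]
best_expression, best_rmse = genetic_algorithm(data)
print(best_expression, round(best_rmse, 3))
```

Because the elite individual is copied unchanged into each new generation, the best RMSE found so far can never get worse from one generation to the next. The population size and generation count here are kept small for a quick run and are far below the populations evaluated in the study.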
Table 7. Acquired data distribution statistics.
Item | Number of Data | Percentage Over Total
Total data collected in the year | 420,480 | 100%
Overall data for the study | 420,476 | 99.99%
Data during maintenance | 10,274 | 2.44%
Filtered data for study | 410,202 | 97.55%
Table 8. Correlation coefficients for critical parameters.
Parameter | Wind Speed | Ambient Temperature | Solar Radiation | Solar Module Temperature
Wind speed | 1 | | |
Ambient temperature | 0.27 | 1 | |
Solar radiation | 0.33 | 0.56 | 1 |
Solar module temperature | 0.38 | 0.81 | 0.80 | 1
Table 9. Evaluation of construction groups for MLR models.
Data Set | MSE | R² | Coefficient Ambient Temperature | Coefficient Solar Radiation | Coefficient Wind Speed | Intercept
70% | 43.36 | 0.84 | 2.11 | 0.03 | 0.45 | −31.10
80% | 40.84 | 0.85 | 2.09 | 0.03 | 0.42 | −30.24
90% | 40.80 | 0.84 | 1.59 | 0.03 | 0.34 | −16.49
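Each row of Table 9 defines a linear model of module temperature in terms of ambient temperature, solar radiation, and wind speed. The sketch below evaluates the 80% row at one operating point. The input values are illustrative assumptions, and the wind speed unit is assumed to match whatever unit was used when fitting, which the table does not state; the function and dictionary names are hypothetical.

```python
# Coefficients from the 80% row of Table 9.
MLR_80 = {"t_amb": 2.09, "radiation": 0.03, "wind": 0.42, "intercept": -30.24}


def mlr_module_temperature(t_amb_c, radiation_w_m2, wind_speed, coef=MLR_80):
    """Linear prediction: weighted sum of the three inputs plus the intercept."""
    return (coef["t_amb"] * t_amb_c
            + coef["radiation"] * radiation_w_m2
            + coef["wind"] * wind_speed
            + coef["intercept"])


# Assumed operating point: 30 °C ambient, 800 W/m², wind speed 2 (unit as fitted).
print(round(mlr_module_temperature(30.0, 800.0, 2.0), 2))  # 57.3
```

The large negative intercept means the model is only meaningful within the range of conditions seen during fitting; at low ambient temperature and zero radiation it would return implausibly low values.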
Table 10. Evaluation of the population for the construction of the symbolic regression model.
Population | RMSE (°C) | R² | Execution Time (min)
1000 | 4.05 | 0.91 | 28.01
5000 | 5.10 | 0.79 | 38.37
10,000 | 5.03 | 0.80 | 48.47
Table 11. Model evaluation.
Model | RMSE (°C) | R² | SCC | MAE (°C)
MLR | 6.18 | 0.79 | 0.91 | 3.19
NOCT | 13.23 | 0.05 | 0.81 | 8.42
Empirical regression-based | 5.33 | 0.85 | 0.93 | 3.89
Symbolic regression | 4.05 | 0.91 | 0.95 | 2.25
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Lara-Vargas, F.A.; Vargas-Salgado, C.; Águila-León, J.; Díaz-Bello, D. Optimizing Bifacial Solar Modules with Trackers: Advanced Temperature Prediction Through Symbolic Regression. Energies 2025, 18, 2019. https://doi.org/10.3390/en18082019
