Article

Mixed Multi-Pattern Regression for DNI Prediction in Arid Desert Areas

1 Systems Engineering Institute, School of Automation, Xi’an Jiaotong University, Xi’an 710049, China
2 Qinghai Photovoltaic Industry Centre Co., Ltd., State Power Investment Corporation, Xining 810000, China
3 Northwest Engineering Corporation Limited, PowerChina, Xi’an 710065, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(17), 12885; https://doi.org/10.3390/su151712885
Submission received: 14 July 2023 / Revised: 12 August 2023 / Accepted: 19 August 2023 / Published: 25 August 2023

Abstract

As a crucial issue in renewable energy, accurate prediction of direct normal solar irradiance (DNI) is essential for the stable operation of concentrated solar power (CSP) stations, especially for those in arid desert areas. In this study, in order to fully explore the laws of climate change and assess the solar resources in arid desert areas, we have proposed a mixed multi-pattern regression model (MMP) for short-term DNI prediction using prior knowledge provided by the clear-sky solar irradiance (CSI) model and time series patterns of key meteorological factors mined using PR-DTW on different time scales. The contrastive experimental results demonstrated that MMP can outperform existing DNI prediction models in terms of three recognized statistical metrics. To address the challenge of limited data in arid desert areas, we presented the T-MMP model involving combined transfer learning and MMP. The experimental results demonstrated that T-MMP outperformed MMP in DNI prediction by exploiting the significant correlation between meteorological time series patterns in similar areas for data augmentation. Our study provided a valuable prediction model for accurate DNI prediction in arid desert areas, facilitating the economical and stable operation of CSP plants.

1. Introduction

With the diminishing supply of fossil fuels and growing concerns about environmental degradation, the development of renewable and clean energy sources has emerged as a global priority [1,2]. Solar energy has grown rapidly and occupies a leading position in the global energy consumption market [3,4]. However, the availability of solar energy, specifically direct normal irradiance (DNI), is subject to intermittent fluctuations due to weather conditions [5]. This inherent variability introduces uncertainty into concentrated solar power (CSP) systems and directly challenges the robustness of their operation. As the demand for renewable energy grows, managing fluctuations in resource availability within power grid operations becomes increasingly complex [6]. Consequently, accurate DNI prediction is crucial for the safe and stable operation of CSP stations and remains an important open problem [7].
Variations in DNI are driven by many meteorological variables, including the solar altitude angle, cloud cover, aerosol absorption, ozone absorption, mixed gas absorption, and water vapor absorption [8]. Among these, changes in cloud cover are the main cause of abrupt short-term fluctuations in DNI [9]. Consequently, many scholars have proposed DNI prediction models based on dynamic sky images, taking the evolving cloud cover as the starting point [10]. Chu et al. developed a probabilistic forecasting model that uses cloud cover information as an exogenous input for intra-hour DNI prediction [11]. Zhu et al. developed a Siamese convolutional neural network–long short-term memory (SCNN-LSTM) model that combines sky images with historical meteorological observations to predict intra-hour DNI [12]. Sanchez-Segura et al. proposed a low-cost sky-imager system to estimate solar irradiance components [13]. Nonetheless, forecasting DNI by analyzing cloud dynamics in sky images faces several challenges. In particular, optical cloud thickness, a parameter directly tied to DNI attenuation, is difficult to acquire directly. Furthermore, cloud motion is often accompanied by cloud formation and dissipation, which complicates the modeling of cloud trajectories. Capturing significant changes in DNI on a scale of seconds is costly, and such capabilities are not widely available [14]. Moreover, in arid desert regions, overcast days, which are the days on which sky images can capture cloud dynamics, are comparatively rare throughout the year. Consequently, the limited availability of data hampers further analysis, making it challenging to predict DNI solely from cloud variations observed in sky images.
Changes in cloud cover are closely linked to distinct meteorological patterns, which shape the formation and trajectory of clouds and thereby govern the amplitude, frequency, and duration of changes in cloud cover [15,16]. Previous studies have explored the relationship between wind power output variability and weather patterns, demonstrating that synoptic weather patterns can effectively predict wind generation variability [17,18,19]. Analyzing such patterns is therefore a promising route to accurate DNI prediction, can reveal the regularities underlying climatic fluctuations, and may offer a practical way around the data scarcity encountered in arid desert regions. So far, the relationship between weather patterns and solar power variability has only been studied in a preliminary way. Köhler et al. identified critical weather patterns for solar power integration in Germany [20]. Francisco et al. discovered four modes of variability of solar resources that specifically affect solar power generation in the Iberian Peninsula [21]. Augustine et al. found that the cyclic dimming and brightening patterns in the US from 1996 to 2019 were due to systematic changes in cloud cover, while atmospheric dust played only a minor role [22]. However, the patterns analyzed for solar radiation generally span long periods (from months to years), which makes them unsuitable for short-term DNI forecasting. Additionally, pattern analysis in solar irradiance research mainly focuses on exploring the regularities of climate change rather than on prediction [19,20,21,22]. Therefore, the use of weather patterns of solar resource variability for accurate DNI prediction remains limited.
There is a consensus that the influence of the meteorological factors affecting DNI is reflected in the changes of the DNI time series itself. Much effort has therefore been devoted to DNI prediction from a time-series perspective. Ardila et al. proposed two fuzzy time series methods for solar radiation prediction [23]. Zhu et al. presented a long short-term memory (LSTM) model based on the attention mechanism and a genetic algorithm (AGA-LSTM) for GHI and DNI prediction [24]. Although these models have certain predictive abilities, they require a large amount of training data and are not suitable for predicting DNI in arid desert areas with limited historical data. The clear-sky solar irradiance (CSI) model, which incorporates well-established physical knowledge of how climatic conditions affect irradiance, is widely used to assess solar resources at different geographic locations [25]. Incorporating the CSI model into model construction may therefore be helpful for predicting DNI.
Therefore, in this study, considering patterns on different time scales together with the CSI model, we developed a mixed multi-pattern regression model (MMP) for DNI prediction. Firstly, a pattern representation method based on dynamic time warping (PR-DTW), proposed by our research team in 2020 [26], was used to extract the short-term, mid-term, and long-term patterns of key meteorological factors, including DNI, global horizontal irradiance (GHI), and diffuse horizontal irradiance (DHI), from both the actual time series and the theoretical time series obtained from the CSI model. Then, by calculating the pattern distance between the theoretical and actual patterns on each time scale, pattern variables representing the pattern difference were constructed and fed into the regression functions together with all the meteorological factors, namely GHI, DNI, DHI, atmospheric pressure (Press), wind direction (Wdir), wind speed (Wspd), and relative humidity (Rhum). Finally, the regression results obtained on the different time scales were integrated by an ensemble model to obtain the final prediction. The experimental results showed that, compared to existing prediction models, MMP delivered the best DNI prediction performance, benefiting from the patterns of key meteorological factors on different time scales and from the prior physical knowledge provided by the CSI model. Additionally, to address the limited observation data available in arid desert areas, we proposed a model that combines transfer learning and MMP (T-MMP) to improve DNI prediction performance.
In summary, to explore the regularities of climate change and assess the solar resources in arid desert areas, time series pattern analysis was shown, for the first time, to be effective for DNI prediction in such areas. To enable real-time data analysis at minimal cost, a mixed multi-pattern regression model (MMP) was developed for DNI prediction. The experimental results demonstrated that MMP predicts the DNI trend better than existing DNI prediction models. Moreover, T-MMP was developed to cope with the small amount of historical data in arid desert areas. The remainder of this paper is organized as follows: Section 2 briefly reviews the related methods and then describes the pattern analysis and the construction of the MMP. Section 3 evaluates the effectiveness of the MMP through experiments on three datasets. Section 4 discusses how transfer learning can be applied to improve the MMP prediction accuracy with less data. Section 5 concludes the paper.

2. Materials and Methods

This section describes the three basic methods used in the construction of the MMP and the overall framework of the MMP.

2.1. PR-DTW

The pattern representation method based on DTW (PR-DTW), proposed by our research team in 2020, is a time series representation method [26]. Owing to its resistance to end effects and noise and its segmentation capability, PR-DTW has proven effective at extracting time series patterns in the financial market, which makes it promising for the trend analysis of other time series. Here, PR-DTW is used to extract the changing patterns of a DNI time series on different time scales. Suppose we have a time series $S = \{s_1, s_2, \ldots, s_n\}$ and an integer $m \in [1, n]$ representing the length of an extracted pattern. The pattern is denoted as $S' = \{s_{t_1}, s_{t_2}, \ldots, s_{t_m}\}$, and the optimization model of PR-DTW can be written as:
$$\min_{S'} f(S') = \mathrm{DTW}(S, S') \quad \text{s.t.} \quad S' \subseteq S, \;\; t_i < t_j, \;\; 1 \le i < j \le m \tag{1}$$
As an elastic similarity measurement, dynamic time warping (DTW) has been widely used in speech recognition, pattern matching, and object tracking [27,28]. The specific calculation for DTW is described as follows:
To calculate DTW, consider two time series $S = \{s_1, s_2, \ldots, s_m\}$ and $Q = \{q_1, q_2, \ldots, q_n\}$. DTW aims to find an optimal warping path $P = \{p_1, p_2, \ldots, p_K\}$, which can be seen as the correspondence between the two time series. The optimal warping path is calculated using a dynamic programming method, such as the Dijkstra algorithm, which obtains the cumulative distance $CD(i, j)$ by adding the point-wise distance $d(i, j) = |s_i - q_j|$ to the minimum of the cumulative distances of the preceding cells.
$$\mathrm{DTW}(S, Q) = \min \sum_{k=1}^{K} d(p_k)$$
Here $K \in [\max(m, n),\, m + n - 1]$.
Since the optimization model (1) is an NP-hard combinatorial optimization problem, a simplified calculation algorithm is used, which is described in detail in [26].
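The two definitions above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names `dtw` and `pr_dtw_brute_force` are hypothetical, the point-wise distance is taken to be the absolute difference, and the pattern search is an exhaustive enumeration of model (1) that is feasible only for very short series. It is not the simplified algorithm of [26].

```python
from itertools import combinations


def dtw(s, q):
    """Cumulative DTW cost between s and q with d(i, j) = |s_i - q_j|."""
    m, n = len(s), len(q)
    inf = float("inf")
    # cd[i][j] holds the cumulative distance CD(i, j); row/column 0 is a border.
    cd = [[inf] * (n + 1) for _ in range(m + 1)]
    cd[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(s[i - 1] - q[j - 1])
            # Add the point-wise distance to the cheapest preceding cell.
            cd[i][j] = d + min(cd[i - 1][j], cd[i][j - 1], cd[i - 1][j - 1])
    return cd[m][n]


def pr_dtw_brute_force(series, m):
    """Exhaustively solve model (1): pick m time-ordered points of `series`
    whose DTW distance to the full series is smallest (exponential cost)."""
    best_idx, best_cost = None, float("inf")
    # combinations() yields index tuples in increasing order, so t_i < t_j holds.
    for idx in combinations(range(len(series)), m):
        cost = dtw(series, [series[t] for t in idx])
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return list(best_idx), best_cost


series = [0.0, 1.0, 5.0, 1.0, 0.0]
idx, cost = pr_dtw_brute_force(series, 3)
print(idx, cost)
```

For this toy series every optimal pattern keeps the peak value 5.0, which matches the intuition that a short pattern preserves the dominant shape of the series.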

2.2. Clear-Sky Solar Irradiance Model

The CSI model provides a theoretical estimate of the solar radiation expected at a specific location under standard meteorological conditions. Establishing the CSI is crucial for accurate DNI prediction [29]. The BIRD model, an offshoot of the spectral irradiance model, is one of the most widely adopted clear-sky models owing to its practicality [30]. It incorporates five distinct direct transmittance functions, accounting for Rayleigh scattering, ozone absorption, water vapor absorption, mixed gas absorption, and aerosol extinction. Furthermore, the model assesses the diffuse transmittance based on the same set of transmittance functions [31,32]. In this study, the latest formulation of the BIRD model was employed, with necessary adjustments made to ensure its applicability in arid desert areas [33,34].

2.3. Gradient Boosting

Gradient boosting, a widely used algorithm in ensemble learning, has demonstrated remarkable effectiveness across various domains, including time series prediction, moving object detection, and natural language processing [35,36,37]. The fundamental principle behind gradient boosting is to train a collection of weak learners, each of which fits the negative gradient of the loss function of the previously accumulated model. By iteratively adding these weak learners, the overall model loss is gradually reduced in the direction of the negative gradient. This iterative process can be viewed as a step-by-step refinement, in which each subsequent weak learner is trained to address the remaining errors of the previous models and thus captures the patterns or relationships that they missed, gradually improving the model’s overall predictive performance. The calculation of gradient boosting is as follows:
Suppose there are training samples $(x_i, y_i)$, and the cumulative output of the first $m-1$ weak learners is $F_{m-1}(x)$. The $m$-th weak learner can be trained using the following formula:
$$F_m(x) = F_{m-1}(x) + \arg\min_{h \in H} \, \mathrm{loss}\big(y_i,\, F_{m-1}(x_i) + h(x_i)\big)$$
That is, a weak learner $h$ is sought in the function space $H$ such that the loss of the accumulated model becomes smaller after $h$ is added. Given the characteristics of solar radiation time series, a gradient boosting regression tree (GBRT) is applied as the regressor in MMP [38].
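As a minimal sketch of this update rule, the following code boosts one-split regression stumps on a single input feature. Several choices are assumptions made for illustration and are not taken from the paper: the squared loss (for which the negative gradient is simply the residual), the shrinkage factor `learning_rate`, the stump as weak learner, and all function names. A full GBRT uses deeper trees and multivariate inputs.

```python
def fit_stump(x, residual):
    """Fit a one-split regression stump h(x) to the residuals by least squares."""
    best = None
    for threshold in sorted(set(x))[:-1]:      # keep both sides non-empty
        left = [r for xi, r in zip(x, residual) if xi <= threshold]
        right = [r for xi, r in zip(x, residual) if xi > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda xi: lv if xi <= threshold else rv


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Build F_M by repeatedly adding a stump fitted to the negative gradient."""
    f0 = sum(y) / len(y)                       # F_0: constant prediction
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F)^2 the negative gradient is the residual y - F.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residual)
        stumps.append(h)
        pred = [pi + learning_rate * h(xi) for xi, pi in zip(x, pred)]
    return lambda xi: f0 + learning_rate * sum(h(xi) for h in stumps)


def mse(model, x, y):
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
model_few = gradient_boost(x, y, n_rounds=5)
model_many = gradient_boost(x, y, n_rounds=100)
print(mse(model_few, x, y), mse(model_many, x, y))
```

The training error shrinks as more weak learners are added, which is the step-by-step refinement described above.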

2.4. Patterns in Solar Radiation Time Series

PR-DTW has emerged as a potent tool in time series prediction due to its exceptional ability to capture and retain the intricate morphological characteristics of the original sequences [26]. In the context of solar radiation analysis, it is crucial to account for the continuous influence of diverse factors that impact solar radiation, which adhere to the underlying physical regularities of atmospheric motion. By applying PR-DTW to analyze the patterns within a DNI time series, a comprehensive understanding of various interfering factors can be obtained. Moreover, this pattern analysis approach facilitates an intuitive exploration of the changing patterns and underlying regularities within the solar radiation time series, ultimately shedding light on the mechanisms governing atmospheric movements.
To illustrate the application of pattern extraction in a solar radiation analysis, we took the minute-level DNI time series from the NREL Solar Radiation Research Laboratory on November 23, 2022 as an example (Figure 1). In subsequent experiments, we used hourly sampling of the original time series. By employing pattern extraction techniques, we gained valuable insights into the morphological characteristics of the entire time series (Figure 2). When the extracted pattern length is relatively short, the pattern extraction algorithm places greater emphasis on capturing the overarching morphological features of the DNI time series. This approach allows us to discern broad patterns and trends, enabling a deeper understanding of the solar radiation behavior during the specified period. As the length of the extraction pattern increases, the algorithm begins to focus more on the finer details of the original time series. In particular, these extracted patterns demonstrated a strong correlation with cloud cover, as observed in the original all-sky images. Cloud cover plays a significant role in modulating solar radiation, acting as a crucial variable that influences the intensity and variability of DNI. Thus, by analyzing the extracted patterns in conjunction with cloud cover information, we could gain deeper insights into the complex relationship between cloud dynamics and solar radiation patterns.
The patterns extracted by the PR-DTW algorithm at different time scales displayed significantly different characteristics in the solar radiation time series analysis. Firstly, they revealed the change regularities within meteorological time series, offering insights into the systematic behavior of solar radiation. Additionally, these patterns showcased the complex variations in cloud cover, transitioning from an encompassing view of the entire sky to specific locations. This pattern decomposition mirrors the cloud pattern decomposition, making it a valuable tool in solar radiation time series analysis. This ability to decompose patterns and discern variations in cloud cover through different time scales not only enhances our understanding of solar radiation dynamics but also facilitates more accurate predictions and modeling. These insights may have practical applications in renewable energy planning, climate studies, and the optimization of solar power generation systems.

2.5. Mixed Multi-Pattern Regression Model

In order to take advantage of the patterns on different time scales and the important prior knowledge provided by the CSI model, a mixed multi-pattern regression model was developed for DNI prediction. There were seven regression factors in this study, denoted as GHI, DNI, DHI, Press, Wdir, Wspd, and Rhum. For GHI, DNI, and DHI, PR-DTW was applied to obtain their short-term, mid-term, and long-term patterns, denoted as GHI_pattern, DNI_pattern, and DHI_pattern. Then, on each time scale, the obtained patterns were compared with the patterns extracted from the theoretical values provided by the CSI model to obtain the short-term, mid-term, and long-term pattern variables. Taking sDNI_pattern_variable as an example, the calculation is as follows:
$$\text{sDNI\_pattern\_variable} = \mathrm{DTW}\left(\text{sDNI\_pattern},\ \text{sDNI\_pattern\_CSI}\right)$$
As a variable representing the short-term DNI pattern, sDNI_pattern_variable denotes the difference between the actual and theoretical short-term DNI patterns, which provides a nonlinear relationship between the patterns for DNI prediction. The remaining pattern variables were obtained using the same method. Then, combined with the original time series, the pattern variables were used in the GBRT regressor to obtain predictions on different time scales. Finally, the predictions were integrated by a further regressor to obtain the final prediction result. The nested GBRT design not only increases the model’s adaptability to datasets with different distributions, but also balances the weights of the DNI change patterns on different time scales to make accurate DNI predictions. The framework of the MMP is shown in Figure 3.
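The same construction can be restated compactly for all factors and time scales. The symbols below are introduced here for illustration only and are not the authors' notation: $P_X^{(\tau)}$ is the PR-DTW pattern of factor $X$ on time scale $\tau$ extracted from the measurements, $P_{X,\mathrm{CSI}}^{(\tau)}$ is the corresponding pattern extracted from the CSI model output, and $v_X^{(\tau)}$ is the resulting pattern variable.

```latex
v_X^{(\tau)} = \mathrm{DTW}\!\left(P_X^{(\tau)},\; P_{X,\mathrm{CSI}}^{(\tau)}\right),
\qquad X \in \{\mathrm{GHI}, \mathrm{DNI}, \mathrm{DHI}\},
\quad \tau \in \{\text{short}, \text{mid}, \text{long}\}
```

In this notation sDNI_pattern_variable corresponds to $v_{\mathrm{DNI}}^{(\text{short})}$, and three factors on three time scales give nine pattern variables in total.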

3. Results

In this section, we compare the MMP with fifteen DNI prediction models to evaluate its performance.

3.1. Data Description

The experimental data are hourly observations provided by Northwest Engineering Corporation Limited and mainly contain seven key meteorological factors: GHI, DNI, DHI, Press, Wdir, Wspd, and Rhum. The data cover three sites in arid desert regions with rich solar energy resources, namely Hami, Dunhuang, and Jinta County, and underwent a thorough quality check. Hami’s climatic conditions are representative of temperate continental climates, characterized by dry air, excellent atmospheric transparency, low cloud coverage, ample sunshine, and abundant solar energy. It is recognized as one of Xinjiang’s prime locations for solar resources. Dunhuang has a long daily sunshine duration, minimal cloud and rainfall levels, consistently high atmospheric transparency throughout the year, and a total sunshine duration surpassing 3257.9 h. Additionally, its vast desert areas provide ample usable space, making it an ideal location for the development of CSP stations. Jinta County in Gansu Province has an average annual sunshine duration of 3321 h, coupled with a solar radiation intensity of 64 million J/m². It stands out as one of the nation’s highest solar radiation areas, with features such as prolonged sunshine, intense solar radiation, zero pollution, and a high conversion rate. The climate resource data from Northwest China are of great value because they are difficult to acquire; they can support the development of renewable energy and help us analyze the regularities of climate change. The details of the three datasets are shown in Table 1. The locations of the surface sites of the three datasets in Northwest China are displayed in Figure 4.
Each of the three datasets contains hourly time series data for a whole year. In the experiments, considering the greater randomness of DNI changes in winter and in order to fully verify the validity of the models, we used the data from January to October as the training dataset and the data from November and December as the test dataset. Seven key meteorological factors, GHI, DNI, DHI, Press, Wdir, Wspd, and Rhum, were used as the inputs. Each meteorological factor of a sample is a time series with a fixed length, segmented from the original time series. The fixed length of each time series in the experiments was 8, and the prediction target was the DNI value one hour later. DNI prediction can therefore be treated as a multidimensional time series prediction task. The details of the data processing procedure are shown in Figure 5.
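The sample construction described above can be sketched as follows. The record layout (one dict per hour with a `month` field), the function names, and the choice to split by month before windowing are assumptions made for illustration; the paper does not state how windows near the train/test boundary are handled.

```python
FACTORS = ["GHI", "DNI", "DHI", "Press", "Wdir", "Wspd", "Rhum"]


def split_by_month(records, test_months=(11, 12)):
    """January to October for training, November and December for testing."""
    train = [r for r in records if r["month"] not in test_months]
    test = [r for r in records if r["month"] in test_months]
    return train, test


def make_samples(records, window=8, horizon=1, target="DNI"):
    """Slice an hourly multivariate series into (input windows, future DNI) pairs."""
    samples = []
    for start in range(len(records) - window - horizon + 1):
        block = records[start:start + window]
        # One fixed-length time series per meteorological factor.
        x = {f: [row[f] for row in block] for f in FACTORS}
        # Target: DNI `horizon` hours after the last hour of the window.
        y = records[start + window + horizon - 1][target]
        samples.append((x, y))
    return samples


# Synthetic records: every factor simply equals the hour index.
records = [dict({f: float(h) for f in FACTORS}, month=1) for h in range(12)]
samples = make_samples(records)
print(len(samples), samples[0][0]["DNI"], samples[0][1])
```

With 12 hourly records and a window of 8, four samples are produced; the first one uses hours 0 to 7 as input and the DNI of hour 8 as target.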

3.2. Benchmark Models and Standard Metrics

In order to highlight the effectiveness of MMP, comprehensive comparisons were made in this study by constructing another fourteen models, including one smart persistence model, four linear regression models, three nonlinear regression models, and the six deep learning models mentioned in [39,40]. Moreover, an ablation model was built to illustrate the effectiveness of the patterns. The details of the models are shown in Table 2.
Model 1 is a smart persistence model, which is often used as the baseline in previous studies. Its main idea is to predict the next-hour DNI by assuming that the next-hour clear-sky index is the same as the current-hour clear-sky index [41]. Models 2, 3, 4, and 5 are four traditional linear regression models: the Bayesian regression model [42], elastic-net regression model [43], lasso regression model [44], and ridge regression model [45], respectively. Models 6 and 7 are two traditional nonlinear regression models: the kernel ridge regression model [46] and nearest neighbors regression model [47], respectively. Model 8 is a gradient boosting regression tree (GBRT), which is a conventional ensemble learning model [48]. Models 9 and 10 are two feed-forward neural networks, and their specific parameter settings are the same as those in [39,40]. Model 11 is a model with a single-layer convolutional neural network (CNN) structure [49]. Model 12 is a model with a single-layer long short-term memory (LSTM) structure [50]. Model 13 is an encoder–decoder LSTM model, and its parameter settings are shown in Figure 6a. Model 14 is a transformer model, and its parameter settings are shown in Figure 6b. Model 15 is the MMP model without pattern variables. Model 16 is the proposed MMP model.
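The smart persistence baseline can be sketched in a few lines. The function name, the argument names, and the fallback used when the clear-sky DNI is zero (for example at night) are assumptions made for illustration; the clear-sky values would come from the CSI model.

```python
def smart_persistence(dni_now, dni_clear_now, dni_clear_next):
    """Forecast next-hour DNI by holding the clear-sky index constant."""
    if dni_clear_now <= 0.0:
        # Assumption: the index is undefined without clear-sky irradiance,
        # so fall back to the clear-sky value of the next hour.
        return dni_clear_next
    k = dni_now / dni_clear_now        # current clear-sky index
    return k * dni_clear_next


# Measured DNI is half of the clear-sky DNI now, so the forecast is
# half of the next hour's clear-sky DNI.
print(smart_persistence(400.0, 800.0, 900.0))
```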
To investigate the prediction performance of MMP and the baseline models, three statistical metrics were used. Suppose the original measured sequence is $S^m = \{s_1^m, s_2^m, \ldots, s_N^m\}$ and the prediction sequence is $S^c = \{s_1^c, s_2^c, \ldots, s_N^c\}$, with $\bar{S}^m$ and $\bar{S}^c$ denoting their respective means. The three metrics are then defined as:
Normalised mean absolute error (nMAE): nMAE is a commonly used metric for quantifying the deviation between a predicted sequence and the original sequence. It provides a normalized measure of the average absolute difference between the predicted values and the corresponding true values. By normalizing the error, nMAE enables fair comparisons across different datasets and scales. The value represents the average percentage error between the predicted and original sequences. nMAE is particularly valuable in assessing the accuracy of predictive models in various fields such as time-series analysis, machine learning, and forecasting. A lower nMAE indicates a more accurate prediction, as it reflects a smaller average discrepancy between the predicted and true values [51].
$$\mathrm{nMAE} = \frac{1}{N} \sum_{i=1}^{N} \left| s_i^c - s_i^m \right| \Big/ \bar{S}^m$$
Normalised root mean square error (nRMSE): nRMSE is a statistical metric commonly used to evaluate the accuracy of predictions by normalizing the square root of the mean squared difference between the predicted and true values. It provides a normalized measure of the average discrepancy between the predicted and original values, allowing for fair comparisons across different datasets and scales. It is particularly important when dealing with datasets that have varying ranges or units of measurement. The resulting nRMSE value represents the average percentage error after accounting for the magnitude of the original values [52].
$$\mathrm{nRMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( s_i^c - s_i^m \right)^2} \Big/ \bar{S}^m \times 100\%$$
R² score (R): The R² score, also known as the coefficient of determination, is a key indicator for assessing the quality of model fitting. It indicates how well the model explains changes in the target variable. Specifically, when the R² score is 1, the model fits perfectly, and when it is 0, the model has no ability to account for changes in the target variable [53]. It is computed from the correlation coefficient R between the predicted and measured sequences:
$$R = \frac{\sum_{i=1}^{N} \left( s_i^c - \bar{S}^c \right)\left( s_i^m - \bar{S}^m \right)}{\sqrt{\sum_{i=1}^{N} \left( s_i^c - \bar{S}^c \right)^2 \sum_{i=1}^{N} \left( s_i^m - \bar{S}^m \right)^2}}$$
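The three metrics can be computed with the short sketch below. It assumes that nMAE is reported as a fraction, that nRMSE is reported in percent, and that R is the Pearson correlation coefficient between the predicted and measured sequences; the function names are hypothetical.

```python
import math


def nmae(pred, meas):
    """Mean absolute error divided by the mean of the measured sequence."""
    mean_m = sum(meas) / len(meas)
    return sum(abs(c - m) for c, m in zip(pred, meas)) / len(meas) / mean_m


def nrmse(pred, meas):
    """Root mean square error divided by the measured mean, in percent."""
    mean_m = sum(meas) / len(meas)
    rmse = math.sqrt(sum((c - m) ** 2 for c, m in zip(pred, meas)) / len(meas))
    return rmse / mean_m * 100.0


def corr(pred, meas):
    """Pearson correlation coefficient between predictions and measurements."""
    mc, mm = sum(pred) / len(pred), sum(meas) / len(meas)
    num = sum((c - mc) * (m - mm) for c, m in zip(pred, meas))
    den = math.sqrt(sum((c - mc) ** 2 for c in pred)
                    * sum((m - mm) ** 2 for m in meas))
    return num / den


measured = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(nmae(predicted, measured), nrmse(predicted, measured), corr(predicted, measured))
```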

3.3. Model Comparison

The prediction results of the MMP were compared with fifteen prediction models, including one smart persistence model, four linear regression models, three nonlinear regression models, six deep learning models, and one ablation model. The contrastive experimental results are given in Table 3.
The experimental results showed that the MMP and its ablation model delivered the best performance among all compared models in terms of the three metrics, outperforming even well-established deep learning algorithms. On all three datasets, the correlation coefficient between the MMP predictions and the true values was above 0.9, indicating that the MMP can effectively predict the trend of DNI. These good results are attributed to the CSI model and the patterns on different time scales. In addition, the construction of the MMP conforms to human cognitive processes and may offer new insights for DNI prediction.
Figure 7 shows a comparison of the prediction results of the different algorithms on a clear-sky day and a cloudy day. The red curve indicates the true values, and the blue curve indicates the predicted values of the MMP. The remaining curves indicate the prediction results of the comparison models. On the clear-sky day (4 November 2013 in Dunhuang), the DTW distance between the MMP prediction and the original sequence is 445, which means that the MMP prediction is morphologically closer to the original sequence than the other models’ predictions. Even on the cloudy day (29 December 2013 in Dunhuang), the DTW distance between the MMP prediction and the original sequence is 609, again closer to the original sequence in terms of trend changes than the other models’ predictions. Compared to the existing models, the MMP could effectively predict the changing trend in DNI and obtain the predictions closest to the true values, especially under rapidly changing hourly weather conditions. The MMP outperformed the other algorithms, benefiting from its use of patterns on different time scales and a well-designed model structure.

4. Discussion

Because CSP generation depends strongly on climatic factors in the natural environment, the quality of historical data may greatly affect the stable operation of CSP stations. Owing to the complex and changeable conditions in arid desert environments, the development of CSP generation varies substantially from region to region, which in turn leads to disparities and gaps in meteorological databases across geographical areas. Data fragmentation and missing data are currently common, so existing models are not readily applicable to precise DNI prediction in arid desert areas. Consequently, making effective use of limited datasets for precise DNI prediction is a prerequisite for the stable operation of CSP stations.
Transfer learning, whose core idea is to assist the learning process in the target domain by exploiting the similarity between the source domain and the target domain, has shown great effectiveness in many fields [54,55,56]. It provides a fruitful approach for learning from existing knowledge in the source domain and applying it to a new domain based on the similarity of the datasets. Compared with traditional machine learning, transfer learning can carry learned knowledge over to the target task, thus reducing the dependence of the target task on a large amount of data. To further improve DNI prediction with limited data, we proposed a model combining the MMP and transfer learning (T-MMP).
The framework of T-MMP is illustrated in Figure 8. Firstly, T-MMP identified a source dataset that is similar to the target dataset, and then the MMP was trained on the source dataset to obtain a pre-trained MMP model. Finally, the pre-trained MMP model was fine-tuned using the target dataset. This approach leverages the similarities between the source and target datasets to enhance the performance of the DNI prediction model, especially in cases where the amount of target data is limited.
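The pre-train and fine-tune workflow can be illustrated with a deliberately simple stand-in. The sketch below uses a one-variable linear model trained by gradient descent on synthetic data, not the MMP or a GBRT; how the authors fine-tune the pre-trained MMP is not specified here, so every name, dataset, and hyperparameter in the code is an assumption chosen only to show the workflow.

```python
def train(x, y, w=0.0, b=0.0, lr=0.01, epochs=200):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    n = len(x)
    for _ in range(epochs):
        err = [w * xi + b - yi for xi, yi in zip(x, y)]
        grad_w = 2.0 * sum(e * xi for e, xi in zip(err, x)) / n
        grad_b = 2.0 * sum(err) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(w, b, x, y):
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Source domain: plentiful data. Target domain: a similar relation, few samples.
source_x = [i / 10 for i in range(50)]
source_y = [2.0 * xi + 1.0 for xi in source_x]
target_x = [0.5, 1.5, 2.5]
target_y = [2.2 * xi + 0.8 for xi in target_x]

# Step 1: pre-train on the source dataset.
w_pre, b_pre = train(source_x, source_y)
# Step 2: fine-tune on the target dataset, starting from the pre-trained weights.
w_ft, b_ft = train(target_x, target_y, w_pre, b_pre, epochs=20)
# Reference: the same short training budget on the target data without transfer.
w_scratch, b_scratch = train(target_x, target_y, epochs=20)

print(mse(w_pre, b_pre, target_x, target_y))
print(mse(w_ft, b_ft, target_x, target_y))
print(mse(w_scratch, b_scratch, target_x, target_y))
```

In this toy setting the fine-tuned model has a lower target error than both the pre-trained model and the model trained from scratch with the same budget, because it starts close to the target relation.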
To verify the effectiveness of T-MMP, we selected the hourly data of Jinta County, Gansu Province, China, in 2014 as the source dataset and the hourly data of Dunhuang, Gansu Province, China, in 2013 as the target dataset, based on their similarity. Here, the similarity calculation mainly considered the geographical distance between the locations of the source and target datasets. Details of the two datasets are given in Table 4.
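The text does not specify how the geographical distance was computed. As one possible choice, the sketch below uses the haversine great-circle distance with a mean Earth radius of 6371 km, applied to the coordinates listed in Table 4. The function name `haversine_km` and the choice of formula are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates (latitude, longitude) as listed in Table 4.
jinta = (40.06, 98.65)     # source dataset
dunhuang = (40.08, 94.47)  # target dataset

distance_km = haversine_km(*jinta, *dunhuang)
print(f"Jinta to Dunhuang: {distance_km:.0f} km")
```

A smaller distance would be read as higher similarity under this assumption. Other criteria, such as climate type or elevation, could be combined with it.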
T-MMP outperformed MMP on all three statistical metrics in Dunhuang (Table 5): nMAE fell from 0.2570 to 0.2357, nRMSE fell from 0.4633 to 0.4210, and R rose from 0.9194 to 0.9236. We attribute this gain to the ability of T-MMP to capture the changes in interfering meteorological factors from a larger dataset. These results also suggest that the patterns of change in meteorological factors are strongly correlated across similar areas, which could be used to explore the regularities of meteorological factors and improve solar radiation prediction.
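For readers who wish to compute comparable scores, the following sketch shows one common way to define the three metrics. The normalization used here (dividing MAE and RMSE by the mean observed value) and the use of the Pearson coefficient for R are assumptions and may differ from the definitions used in this study. The sample values are invented for illustration.

```python
import math

def nmae(y_true, y_pred):
    """Mean absolute error divided by the mean observed value (assumed normalization)."""
    mean_obs = sum(y_true) / len(y_true)
    mae = sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / mean_obs

def nrmse(y_true, y_pred):
    """Root mean squared error divided by the mean observed value (assumed normalization)."""
    mean_obs = sum(y_true) / len(y_true)
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return rmse / mean_obs

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Invented hourly DNI values in W/m^2, for illustration only.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

score_nmae = nmae(observed, predicted)
score_nrmse = nrmse(observed, predicted)
score_r = pearson_r(observed, predicted)
print(score_nmae, score_nrmse, score_r)
```

Lower nMAE and nRMSE and a higher R indicate better agreement, which is the direction of change reported for T-MMP relative to MMP.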

5. Conclusions

Arid desert areas are rich in solar energy resources, but the intermittency of solar irradiance makes CSP systems unstable. Accurate DNI prediction is therefore of practical engineering value because it affects the stable operation of CSP stations. However, owing to the randomness of the time series, existing models have difficulty making accurate short-term DNI predictions. In this study, to explore the regularities of climate variation and assess solar resources in arid desert areas, we showed for the first time that time series pattern analysis is effective for DNI prediction in such areas. To analyze data in real time at minimal cost, we developed a mixed multi-pattern regression model (MMP) for DNI prediction in arid desert areas that combines the CSI model with time series patterns. The MMP achieved the best prediction performance among the compared models in terms of three recognized statistical metrics, owing to its use of prior physical knowledge provided by the CSI model and of time series patterns on different time scales. The experimental results validated the effectiveness of the MMP in DNI prediction and highlighted the potential of time series patterns of meteorological factors for climate change research.
To address the low prediction accuracy caused by limited data, we proposed T-MMP, which combines MMP and transfer learning. The experimental results showed that T-MMP outperformed MMP in DNI prediction and suggested that atmospheric change patterns are similar across local areas with similar climate types. This finding may offer a new perspective for data analysis in regions with limited historical data. By analyzing the patterns in meteorological time series, we can not only achieve more accurate DNI predictions but also gain insight into the regularity of atmospheric movements, which offers a promising direction for future meteorological studies.

Author Contributions

T.H. conducted the research and wrote the manuscript; X.W. and Q.P. contributed to the data acquisition and the analyses; Y.W., K.C., H.P., Z.G., L.C. and W.S. reviewed and revised the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Shaanxi Provincial Department of Education Service Local Special Project (22JE010).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this study are publicly available at: https://github.com/hantian421/Solar_datasets (accessed on 12 July 2023).

Acknowledgments

This study was supported by the Shaanxi Provincial Department of Education.

Conflicts of Interest

Author X.W. is employed by Qinghai Photovoltaic Industry Centre Co., Ltd. Authors K.C., H.P. and Z.G. are employed by Northwest Engineering Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Mitrašinović, A.M. Photovoltaics advancements for transition from renewable to clean energy. Energy 2021, 237, 121510.
  2. Abdulhamed, A.J.; Adam, N.M.; Ab-Kadir, M.; Hairuddin, A.A. Review of solar parabolic-trough collector geometrical and thermal analyses, performance, and applications. Renew. Sustain. Energy Rev. 2018, 91, 822–831.
  3. Nematollahi, O.; Alamdari, P.; Jahangiri, M.; Sedaghat, A.; Alemrajabi, A.A. A techno-economical assessment of solar/wind resources and hydrogen production: A case study with GIS maps. Energy 2019, 175, 914–930.
  4. Corrocher, N.; Cappa, E. The Role of public interventions in inducing private climate finance: An empirical analysis of the solar energy sector. Energy Policy 2020, 147, 111787.
  5. Liang, Z.; Wang, W.; Yu, Z.; Fan, L.; Hu, Y.; Ni, Y.; Fan, J.; Cen, K. An experimental investigation of a natural circulation heat pipe system applied to a parabolic trough solar collector steam generation system. Sol. Energy 2012, 86, 911–919.
  6. Relva, S.G.; Gimenes, A.L.V.; Udaeta, M.E.M.; Galvão, L.C.R. Transmittance index characterization at two solar measurement stations in Brazil. Theor. Appl. Climatol. 2020, 139, 205–219.
  7. Montornès, A.; Codina, B.; Zack, J.W. Analysis of the ozone profile specifications in the WRF-ARW model and their impact on the simulation of direct solar radiation. Atmos. Chem. Phys. 2015, 14, 20231–20257.
  8. Li, Z. Influence of Absorbing Aerosols on the Inference of Solar Surface Radiation Budget and Cloud Absorption. J. Clim. 1998, 11, 5–17.
  9. Tollenaar, M.; Fridgen, J.; Tyagi, P.; Stackhouse, P.W.S., Jr.; Kumudini, S. The contribution of solar brightening to the US maize yield trend. Nat. Clim. Change 2017, 7, 275–278.
  10. Chu, Y.; Li, M.; Pedro, H.T.C.; Coimbra, C.F. Real-time prediction intervals for intra-hour DNI forecasts. Renew. Energy 2015, 83, 234–244.
  11. Chu, Y.; Coimbra, C.F.M. Short-term probabilistic forecasts for direct normal irradiance. Renew. Energy 2017, 101, 526–536.
  12. Zhu, T.; Guo, Y.; Li, Z.; Wang, C. Solar radiation prediction based on convolution neural network and long short-term memory. Energies 2021, 14, 8498.
  13. Sánchez-Segura, C.D.; Valentín-Coronado, L.; Peña-Cruz, M.I.; Díaz-Ponce, A.; Moctezuma, D.; Flores, G.; Riveros-Rosas, D. Solar irradiance components estimation based on a low-cost sky-imager. Sol. Energy 2021, 220, 269–281.
  14. Schreck, S.; Schroedter-Homscheidt, M.; Klein, M.; Cao, K.K. Satellite image-based generation of high frequency solar radiation time series for the assessment of solar energy systems. Meteorol. Z. 2020, 29, 377–393.
  15. Salgueiro, V.; Costa, M.J.; Silva, A.M.; Bortoli, D. Effects of clouds on the surface shortwave radiation at a rural inland mid-latitude site. Atmos. Res. 2016, 178, 95–101.
  16. Tzoumanikas, P.; Nikitidou, E.; Bais, A.F.; Kazantzidis, A. The effect of clouds on surface solar irradiance, based on data from an all-sky imaging system. Renew. Energy 2016, 95, 314–322.
  17. Correia, J.M.; Bastos, A.; Brito, M.C.; Trigo, R. The influence of the main large-scale circulation patterns on wind power production in Portugal. Renew. Energy 2017, 102, 214–223.
  18. Ohba, M.; Kadokura, S.; Nohara, D. Impacts of synoptic circulation patterns on wind power ramp events in East Japan. Renew. Energy 2016, 96, 591–602.
  19. Steiner, A.; Köhler, C.; Metzinger, I.; Braun, A.; Zirkelbach, M.; Ernst, D.; Tran, P.; Ritter, B. Critical weather situations for renewable energies–Part A: Cyclone detection for wind power. Renew. Energy 2017, 101, 41–50.
  20. Köhler, C.; Steiner, A.; Saint-Drenan, Y.M.; Ernst, D.; Bergmann-Dick, A.; Zirkelbach, M.; Ben Bouallègue, Z.; Metzinger, I.; Ritter, B. Critical weather situations for renewable energies–Part B: Low stratus risk for solar power. Renew. Energy 2017, 101, 794–803.
  21. Rodriguez-Benitez, F.J.; Arbizu-Barrena, C.; Santos-Alamillos, F.J.; Tovar-Pescador, J.; Pozo-Vázquez, D. Analysis of the intra-day solar resource variability in the Iberian Peninsula. Sol. Energy 2018, 171, 374–387.
  22. Augustine, J.A.; Hodges, G.B. Variability of Surface Radiation Budget Components Over the US From 1996 to 2019—Has Brightening Ceased? J. Geophys. Res. Atmos. 2021, 126, e2020JD033590.
  23. Serrano Ardila, V.M.; Maciel, J.N.; Ledesma, J.J.G.; Junior, O.H.A. Fuzzy Time Series Methods Applied to (In) Direct Short-Term Photovoltaic Power Forecasting. Energies 2022, 15, 845.
  24. Zhu, T.; Li, Y.; Li, Z.; Guo, Y.; Ni, C. Inter-Hour Forecast of Solar Radiation Based on Long Short-Term Memory with Attention Mechanism and Genetic Algorithm. Energies 2022, 15, 1062.
  25. Ivanova, S.M.; Gueymard, C.A. Simulation and applications of cumulative anisotropic sky radiance patterns. Sol. Energy 2019, 178, 278–294.
  26. Han, T.; Peng, Q.K.; Zhu, Z.B.; Shen, Y.; Huang, H.; Abid, N.N. A pattern representation of stock time series based on DTW. Phys. A Stat. Mech. Its Appl. 2020, 550, 124161.
  27. Mueen, A.; Chavoshi, N.; Abu-El-Rub, N.; Hamooni, H.; Minnich, A.; MacCarthy, J. Speeding up dynamic time warping distance for sparse time series data. Knowl. Inf. Syst. 2018, 54, 237–263.
  28. Sharabiani, A.; Darabi, H.; Harford, S.; Douzali, E.; Karim, F.; Johnson, H.; Chen, S. Asymptotic Dynamic Time Warping calculation with utilizing value repetition. Knowl. Inf. Syst. 2018, 57, 359–388.
  29. Zhao, X.; Wei, H.; Shen, Y.; Zhang, K. Real-time clear-sky model and cloud cover for direct normal irradiance prediction. J. Phys. Conf. Series. IOP Publ. 2018, 1072, 012003.
  30. Ruiz-Arias, J.A.; Gueymard, C.A. Worldwide inter-comparison of clear-sky solar radiation models: Consensus-based review of direct and global irradiance components simulated at the earth surface. Sol. Energy 2018, 168, 10–29.
  31. Bird, R.E. Terrestrial solar spectral modeling. Sol. Cells 1982, 7, 107–118.
  32. Bird, R.E.; Hulstrom, R.L. Availability of SOLTRAN 5 Solar Spectral Model. Sol. Energy 1983, 30, 379.
  33. Bird, R.E.; Hulstrom, R.L. Review, Evaluation, and Improvement of Direct Irradiance Models. J. Sol. Energy Eng. 1981, 103, 182–192.
  34. Gueymard, C.A. Clear-sky irradiance predictions for solar resource mapping and large-scale applications: Improved validation methodology and detailed performance analysis of 18 broadband radiative models. Sol. Energy 2012, 86, 2145–2169.
  35. Nakagawa, K.; Yoshida, K. Time-series gradient boosting tree for stock price prediction. Int. J. Data Min. Model. Manag. 2022, 14, 110–125.
  36. Cao, X.; Wu, C.; Lan, J.; Yan, P.; Li, X. Vehicle Detection and Motion Analysis in Low-Altitude Airborne Video Under Urban Environment. IEEE Trans. Circuits Syst. Video Technol. 2011, 21, 1522–1533.
  37. Li, J.; Zhang, H.; Wei, Z. The weighted word2vec paragraph vectors for anomaly detection over HTTP traffic. IEEE Access 2020, 8, 141787–141798.
  38. Rao, H.; Shi, X.; Rodrigue, A.K.; Feng, J.; Xia, Y.; Elhoseny, M.; Yuan, X.; Gu, L. Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. 2019, 74, 634–642.
  39. Qing, X.; Niu, Y. Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM. Energy 2018, 148, 461–468.
  40. El Boujdaini, L.; Mezrhab, A.; Moussaoui, M.A. Artificial neural networks for global and direct solar irradiance forecasting: A case study. Energy Sources Part A Recovery Util. Environ. Eff. 2021, 1, 1–21.
  41. Yang, D. A guideline to solar forecasting research practice: Reproducible, operational, probabilistic or physically-based, ensemble, and skill (ROPES). J. Renew. Sustain. Energy 2019, 11, 2.
  42. Muth, C.; Oravecz, Z.; Gabry, J. User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. Quant. Methods Psychol. 2018, 14, 99–119.
  43. Su, M.; Wang, W. Elastic net penalized quantile regression model. J. Comput. Appl. Math. 2021, 392, 113462.
  44. Ranstam, J.; Cook, J.A. LASSO regression. J. Br. Surg. 2018, 105, 1348.
  45. Saleh, A.K.M.E.; Arashi, M.; Kibria, B.M.G. Theory of Ridge Regression Estimation with Applications; John Wiley & Sons: Hoboken, NJ, USA, 2019.
  46. Exterkate, P.; Groenen, P.J.F.; Heij, C.; van Dijk, D. Nonlinear forecasting with many predictors using kernel ridge regression. Int. J. Forecast. 2016, 32, 736–753.
  47. Mailagaha Kumbure, M.; Luukka, P. A generalized fuzzy k-nearest neighbor regression model based on Minkowski distance. Granul. Comput. 2022, 7, 657–671.
  48. Yang, H.; Liu, X.; Song, K. A novel gradient boosting regression tree technique optimized by improved sparrow search algorithm for predicting TBM penetration rate. Arab. J. Geosci. 2022, 15, 461.
  49. Liu, C.L.; Hsaio, W.H.; Tu, Y.C. Time series classification with multivariate convolutional neural network. IEEE Trans. Ind. Electron. 2018, 66, 4788–4797.
  50. Hua, Y.; Zhao, Z.; Li, R.; Chen, X.; Liu, Z.; Zhang, H. Deep learning with long short-term memory for time series prediction. IEEE Commun. Mag. 2019, 57, 114–119.
  51. Balamurugan, S.; Mallick, P.S. Error Compensation Techniques for Fixed-Width Array Multiplier Design—A Technical Survey. J. Circuits Syst. Comput. 2017, 26, 1730003.
  52. Yu, S.H.; Lin, C.C.; Cheng, H.W. A note on mean squared prediction error under the unit root model with deterministic trend. J. Time Ser. Anal. 2012, 33, 276–286.
  53. Dutta, K.; Chandra, S.; Gourisaria, M.K.; GM, H. A data mining based target regression-oriented approach to modelling of health insurance claims. In Proceedings of the 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; pp. 1168–1175.
  54. Lu, J.; Behbood, V.; Hao, P.; Zuo, H.; Xue, S.; Zhang, G. Transfer learning using computational intelligence: A survey. Knowl.-Based Syst. 2015, 80, 14–23.
  55. Wang, S.; Zhang, L.; Fu, J. Adversarial transfer learning for cross-domain visual recognition. Knowl.-Based Syst. 2020, 204, 106258.
  56. Tang, H.; Mi, Y.; Xue, F.; Cao, Y. Graph domain adversarial transfer network for cross-domain sentiment classification. IEEE Access 2021, 9, 33051–33060.
Figure 1. Minute-level DNI, GHI, and DHI original time series from the NREL Solar Radiation Research Laboratory (BMS) on 23 November 2022. The red line is the GHI time series (W/m²). The blue line is the DHI time series (W/m²). The green line is the DNI time series (W/m²).
Figure 2. The patterns obtained by PR_DTW on different time scales and their relationship with ‘cloud analyzed’ images (with cloud cover values).
Figure 3. The framework of the MMP.
Figure 4. The geographic locations of three experimental datasets in Northwest China.
Figure 5. The details of the data processing process.
Figure 6. The parameter settings of model 13 and model 14.
Figure 7. (a) Comparison of DNI prediction results for a clear sky day in Dunhuang (4 November 2013). (b) Comparison of the DTW distance between the real DNI sequences and the predicted DNI sequences obtained using different models under sunny conditions. (c) Comparison of DNI prediction results for a clear sky day in Dunhuang (29 December 2013). (d) Comparison of the DTW distance between the real DNI sequences and the predicted DNI sequences obtained using different models under cloudy conditions.
Figure 8. The framework of T-MMP.
Table 1. Details of the three datasets.
Dataset | Position | Year | Latitude (°N) | Longitude (°E)
--- | --- | --- | --- | ---
Dataset 1 | Hami, Xinjiang, China | 2013 | 43.01 | 93.59
Dataset 2 | Dunhuang, Gansu, China | 2013 | 40.08 | 94.47
Dataset 3 | Jinta, Gansu, China | 2014 | 40.06 | 98.675
Table 2. Whether each prediction model uses CSI and pattern variables (✓ = used, × = not used).
Model | Description | CSI | Patterns
--- | --- | --- | ---
Model 1 | Persistence model | × | ×
Model 2 | Bayesian regression | × | ×
Model 3 | Elastic-net regression | × | ×
Model 4 | Lasso regression | × | ×
Model 5 | Ridge regression | × | ×
Model 6 | Kernel ridge regression | × | ×
Model 7 | Nearest neighbors regression | × | ×
Model 8 | GBRT | × | ×
Model 9 | BPNN | × | ×
Model 10 | FFNN | × | ×
Model 11 | CNN | × | ×
Model 12 | LSTM | × | ×
Model 13 | Encoder-decoder-LSTM | × | ×
Model 14 | Transformer | × | ×
Model 15 | Mixed multi-pattern regression without pattern variables | ✓ | ×
Model 16 | Mixed multi-pattern regression | ✓ | ✓
Table 3. The prediction performance of different models on the three datasets.
Model | Dataset 1 nMAE | Dataset 1 nRMSE | Dataset 1 R | Dataset 2 nMAE | Dataset 2 nRMSE | Dataset 2 R | Dataset 3 nMAE | Dataset 3 nRMSE | Dataset 3 R
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Model 1 | 0.3657 | 0.9221 | 0.7168 | 0.3534 | 0.8248 | 0.7446 | 0.4582 | 1.0559 | 0.5197
Model 2 | 0.3970 | 0.6229 | 0.8707 | 0.3517 | 0.5378 | 0.8914 | 0.2359 | 0.3991 | 0.9313
Model 3 | 0.3926 | 0.6166 | 0.8733 | 0.3513 | 0.5391 | 0.8909 | 0.2350 | 0.4002 | 0.9309
Model 4 | 0.3957 | 0.6145 | 0.8742 | 0.3557 | 0.5378 | 0.8914 | 0.2363 | 0.3993 | 0.9312
Model 5 | 0.3956 | 0.6140 | 0.8744 | 0.3568 | 0.5380 | 0.8913 | 0.2360 | 0.3971 | 0.9320
Model 6 | 0.3856 | 0.6116 | 0.8754 | 0.3472 | 0.5355 | 0.8923 | 0.2391 | 0.3976 | 0.9318
Model 7 | 0.2818 | 0.5682 | 0.8924 | 0.2694 | 0.5320 | 0.8937 | 0.2394 | 0.4337 | 0.9189
Model 8 | 0.3211 | 0.5429 | 0.9018 | 0.3114 | 0.5012 | 0.9057 | 0.2476 | 0.4073 | 0.9285
Model 9 | 0.3381 | 0.5697 | 0.8919 | 0.2889 | 0.4857 | 0.9114 | 0.2313 | 0.3907 | 0.9342
Model 10 | 0.3091 | 0.5362 | 0.9042 | 0.2774 | 0.4727 | 0.9161 | 0.2280 | 0.3865 | 0.9356
Model 11 | 0.3837 | 0.6279 | 0.8686 | 0.3452 | 0.5444 | 0.8887 | 0.2272 | 0.4273 | 0.9213
Model 12 | 0.3865 | 0.7735 | 0.8007 | 0.3187 | 0.6522 | 0.8403 | 0.2749 | 0.5225 | 0.8823
Model 13 | 0.3155 | 0.6793 | 0.8463 | 0.2889 | 0.5740 | 0.8763 | 0.2272 | 0.4273 | 0.9213
Model 14 | 0.3454 | 0.6143 | 0.8743 | 0.2961 | 0.5381 | 0.8913 | 0.2263 | 0.3855 | 0.9359
Model 15 | 0.2815 | 0.5499 | 0.8992 | 0.2708 | 0.4818 | 0.9128 | 0.2390 | 0.4060 | 0.9289
Model 16 | 0.2766 | 0.5248 | 0.9082 | 0.2570 | 0.4633 | 0.9194 | 0.2139 | 0.3781 | 0.9384
Table 4. Details of the datasets in T-MMP.
Dataset | Position | Year | Latitude (°N) | Longitude (°E)
--- | --- | --- | --- | ---
Source dataset | Jinta, Gansu, China | 2014 | 40.06 | 98.65
Target dataset | Dunhuang, Gansu, China | 2013 | 40.08 | 94.47
Table 5. Comparison results between MMP and T-MMP in DNI prediction in Dunhuang.
Model | nMAE | nRMSE | R
--- | --- | --- | ---
MMP | 0.2570 | 0.4633 | 0.9194
T-MMP | 0.2357 | 0.4210 | 0.9236
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Han, T.; Wang, Y.; Wang, X.; Chen, K.; Peng, H.; Gao, Z.; Cui, L.; Sun, W.; Peng, Q. Mixed Multi-Pattern Regression for DNI Prediction in Arid Desert Areas. Sustainability 2023, 15, 12885. https://doi.org/10.3390/su151712885

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
