Article

Optimization Hybrid of Multiple-Lag LSTM Networks for Meteorological Prediction

Lin Zhu, Zhihua Zhang, M. James C. Crabbe and Lipon Chandra Das
1 School of Mathematics, Shandong University, Jinan 250100, China
2 Wolfson College, Oxford University, Oxford OX2 6UD, UK
3 Department of Mathematics, University of Chittagong, Chittagong 4331, Bangladesh
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(22), 4603; https://doi.org/10.3390/math11224603
Submission received: 23 September 2023 / Revised: 22 October 2023 / Accepted: 6 November 2023 / Published: 10 November 2023
(This article belongs to the Special Issue Advanced Statistical Techniques in Oceans and Climate Research)

Abstract

Residents in poor regions often depend on rain-fed agriculture, so they urgently need suitable tools for accurate meteorological predictions. Unfortunately, meteorological observations in these regions are usually sparse and irregularly distributed. Conventional LSTM networks handle only temporal sequences and cannot utilize the links of meteorological variables among stations. GCN-LSTM networks capture only local spatial structures through simple, fixed adjacency matrices, and the CNN-LSTM can only mine gridded meteorological observations for further predictions. In this study, we propose an optimization hybrid of multiple-lag LSTM networks for meteorological predictions. Our model makes full use of the observed data at partner stations under different time-lag windows and of the strong links among local observations of meteorological variables to produce future predictions. Numerical experiments on meteorological predictions for Bangladesh demonstrate that our networks are superior to the classic LSTM and its variants GCN-LSTM and CNN-LSTM, as well as to the SVM and DT.

1. Introduction

With the development of the modern industrial and energy sectors, human-caused greenhouse gas emissions have experienced a dramatic rise. This surge has resulted in a substantial elevation of the CO2 level in the atmosphere, which soared from approximately 280 parts per million (ppm) in the 1850s to about 418 ppm in 2023. Consequently, the Earth’s average surface temperature has seen a notable increase of 0.8–1.3 degrees Celsius since the mid-19th century. The escalating global average temperatures raise the likelihood of approaching critical tipping points within the global climate system. Once these thresholds are surpassed, they can initiate self-reinforcing feedback mechanisms that exacerbate global warming. There is an increasing acknowledgment that achieving the goal of limiting the global average temperature increase to 1.5 or 2 degrees Celsius above the pre-industrial level, as established by the 2015 Paris Agreement, appears to be an exceedingly challenging task. Meanwhile, climate-induced damage always scales exponentially, rather than linearly, with mean temperature rises. The adverse impacts of climate change are already surpassing expectations, extending to a wider range of areas and displaying greater severity, especially for poor regions with agriculture-based economies, backward industries, and limited climate observations. High-accuracy climate/meteorological prediction tools, which make full use of limited meteorological observations in poor regions, can help the emergency management departments of these regions with advance preparation and mitigation responses to climate disasters and, in doing so, reduce heavy losses of lives and property [1,2].
The dynamic mechanism of the global climate system is governed mainly by the conservation principles of mass, momentum, and energy. Through the numerical solution of these physical equations, climate models can simulate evolution patterns and trends of the atmosphere, oceans, land, and ice in three spatial dimensions and over time [3,4]. Since many atmospheric processes occur on scales smaller than the grid resolution, climate models always use parameterization schemes that heuristically describe these sub-grid processes on resolved scales, leading to a major source of uncertainty in meteorological and climate simulations. High-accuracy, high-resolution predictions by climate models cannot tell us with certainty which kind of weather will appear next week; they can only provide a likely range of outcomes [5]. The key drawbacks of climate models include an imperfect ability to transform climate dynamics into accurate mathematical equations, the models’ inability to capture complex processes (e.g., Butterfly effects), inaccurate representations of interaction mechanisms, and the use of complex data assimilation techniques to incorporate the latest observations [6,7]. Due to their high computational cost, high-accuracy, high resolution climate simulations must be executed on supercomputers [3,8].
Unlike climate models, data-driven deep learning techniques open up a brand-new pathway for meteorological and climate predictions. These techniques have shown excellent advantages for mining complex links without relying on explicit functions. The well-known neural networks are the most popular deep learning techniques for meteorological predictions [9,10,11]. The core step is to replace the traditional sub-grid parameterizations in climate models by multiscale feature extraction in model training processes. Among the neural networks, Long Short-Term Memory (LSTM) networks are unanimously recognized as the most effective networks for handling and predicting temporal correlations [12,13,14,15]. Since LSTM networks cannot utilize spatial correlations of meteorological data, it has been proposed that LSTM networks should be integrated with Convolutional Neural Networks (CNNs) [16], which can extract coarser-scale spatial features from images or similar grid-based data. Kim et al. [17] demonstrated that precipitation predictions by the CNN-LSTM are obviously more accurate than the classic LSTM. Gamboa-Villafruela et al. [18] conducted precipitation nowcasting by training the CNN-LSTM network architecture using gridded precipitation datasets from the remote-sensing-based NASA Global Precipitation Measurement. Since CNN was initially invented for image learning, hybrid CNN-LSTM networks are much more suitable for regular data than non-regular data [19,20]. For irregularly distributed observations, the coupling of graph convolution networks (GCN) and LSTM networks has also been proposed, where an LSTM is used to capture temporal connections and a graph convolution is used to model its spatial relationships [13,21,22,23]. García-Duarte et al. [13] found that the GCN-LSTM produced more accurate predictions for air temperature than the CNN-LSTM. However, the GCN-LSTM also has drawbacks, including a higher number of hyperparameters, the need for time-consuming hyperparameter tuning, and fixed adjacency matrices.
In this study, we overcome the main drawbacks of the LSTM and two variants (GCN-LSTM, CNN-LSTM) and propose an optimization hybrid of multiple-lag LSTM networks for meteorological predictions. Our model can easily extract spatio-temporal links from the observation data from different stations under different time-lag windows and make full use of these links for high-accuracy predictions. Numerical experiments on meteorological predictions in Bangladesh demonstrate that our networks are superior to the classic LSTM and its variants (GCN-LSTM and CNN-LSTM) as well as to the SVM (support vector machines) and DT (decision tree).

2. LSTM Networks and Their Hybrid Models

The Earth’s climate is a highly intricate, multifaceted system with various physical processes operating on diverse temporal and spatial scales. Consequently, predicting climate/meteorological outcomes has become a formidable endeavor. Although conventional neural networks (NNs) may not directly uncover all of the concealed relationships within meteorological observations, they have the potential to harness these latent connections for meteorological predictions. It is worth emphasizing that neural networks frequently encounter the challenges of vanishing gradients and exploding gradients during the training process. The working principles of the Long Short-Term Memory (LSTM) networks are similar to those of NNs. LSTM networks are equipped with unique components known as memory blocks within their recurrent hidden layers [24]. Memory blocks consist of multiple self-connected memory cells, which retain the network’s temporal state. These cells are regulated by specialized multiplicative components known as gate units, which manage the information flow. The gate units include the input, output, and forget gates [24] (Figure 1).
The input gate is responsible for managing the entry of input activations into the memory cell. Its role is to ensure that the memory cell remains shielded from irrelevant inputs by learning how to filter out such information and retain only relevant data.
The output gate regulates the release of cell activations to the rest of the network. Its goal is to safeguard other units within the network from receiving irrelevant information stored in the memory cell by learning how to filter out unnecessary content.
The forget gate adjusts the magnitude of the internal state of the cell. Its objective is to acquire the ability to govern how long a specific value persists within the memory cell.
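To make the gate mechanics above concrete, the following minimal NumPy sketch performs one LSTM cell step with the standard gate equations; the dimensions, random weights, and variable names are purely illustrative and are not the configuration used later in this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step: input (i), forget (f), and output (o) gates plus
    the candidate cell update (g), using stacked parameters W, U, b."""
    n_hid = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b            # pre-activations, shape (4 * n_hid,)
    i = sigmoid(z[0 * n_hid:1 * n_hid])     # input gate: admit relevant new information
    f = sigmoid(z[1 * n_hid:2 * n_hid])     # forget gate: control how long the state persists
    o = sigmoid(z[2 * n_hid:3 * n_hid])     # output gate: release the cell state to the network
    g = np.tanh(z[3 * n_hid:4 * n_hid])     # candidate cell content
    c_t = f * c_prev + i * g                # updated cell (long-term) state
    h_t = o * np.tanh(c_t)                  # updated hidden (short-term) state
    return h_t, c_t

# toy dimensions and random parameters, for illustration only
rng = np.random.default_rng(0)
n_in, n_hid = 5, 8
W, U = rng.normal(size=(4 * n_hid, n_in)), rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```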
LSTM networks have demonstrated good performance and high accuracy when handling and forecasting temporal correlations [12,25]. However, pure LSTM networks can only make predictions for a single climatic time series from one meteorological station. Since various meteorological variables are closely linked and are observed in a sparse and irregularly distributed network, pure LSTM networks cannot make use of these important links, so the accuracy of their meteorological predictions is not satisfactory.
Dynamic networks are frequently represented as directed or undirected graphs. The Graph Convolutional Network–Long Short-Term Memory (GCN-LSTM) is a hybrid model that combines Graph Convolutional Networks (GCN) and Long Short-Term Memory (LSTM) networks [26]. It is widely used for processing time series data, particularly datasets that exhibit graph-like relationships, such as social networks, geographical spatial data, and more. The GCN-LSTM allows the model to effectively capture spatiotemporal information within the data, as it leverages the GCN to handle spatial relationships and the LSTM to handle temporal sequences, along with their connections through static adjacency matrices [27]. The GCN-LSTM can effectively learn and integrate both the structural and temporal characteristics of dynamic networks, which in turn enables it to predict the future addition and removal of network links [28]. Hence, the GCN-LSTM network is proficient at leveraging high-dimensional, time-dependent, and sparsely structured sequential data as its input. It can then produce predicted network configurations while seamlessly conducting network link predictions. This integrated approach is highly effective and allows good multi-site time series forecasting [27]. Since the link weights of the GCN-LSTM are fixed by adjacency matrices, only the linear links between different meteorological stations can be utilized for predictions. For the GCN layer, we design a multidimensional time series graph of observation stations with an adjacency matrix whose weights are the reciprocals of the distances between stations. The GCN-LSTM thus primarily utilizes the reciprocal-distance relationship among stations, while our approach presented in Section 3 primarily relies on the correlation matrix between stations.
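As a rough illustration of the adjacency construction just described, the sketch below builds a weight matrix whose off-diagonal entries are reciprocals of pairwise great-circle distances between stations; the station coordinates and the haversine helper are assumptions made for the example, not the exact preprocessing of the paper.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (Earth radius ~6371 km)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# illustrative station coordinates (lat, lon); not the real BMD network
stations = {"A": (23.7, 90.4), "B": (22.3, 91.8), "C": (24.9, 91.9)}
names = list(stations)
n = len(names)
adj = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            d = haversine_km(*stations[names[i]], *stations[names[j]])
            adj[i, j] = 1.0 / d      # GCN edge weight = reciprocal of distance
print(np.round(adj, 4))
```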
The other approach to overcoming the limitation of LSTM networks is to merge a CNN into the LSTM [10,29,30]. A Convolutional Neural Network (CNN) is a deep learning model designed for effectively handling multidimensional data with structured, grid-like patterns. The CNN-LSTM network combines the power of Convolutional Neural Network (CNN) layers, which extract spatial features from multidimensional data, with Long Short-Term Memory (LSTM) units, which facilitate sequence predictions [29,30]. The CNN can extract features and capture spatial connections within grid data, while the LSTM is good at capturing long-term temporal features [31]. Since the CNN was initially invented for image learning, hybrid CNN-LSTM networks are much more suitable for regular data than non-regular data [19,20].
The Support Vector Machine (SVM) is a supervised learning model used for classification and regression analyses [32]. It has extensive application within the realm of machine learning and has demonstrated strong performance and a solid theoretical foundation. It employs hyperplanes, i.e., multidimensional boundaries, to effectively separate input data points with different values. The SVM can model complex nonlinear decision boundaries, making it popular for use in meteorological forecasting [33,34].
The Decision Tree (DT) is a tree-like structured supervised learning model that is used for classification and regression tasks [35]. It progressively splits data based on input features to create a decision tree where each leaf node represents a category or a numerical value. This model is non-parametric, making it suitable for effectively handling extensive and intricate datasets without the need for complex parameter specifications. When the dataset is sufficiently large, it can be split into training and validation sets. The training set is used to construct a decision tree model, and the validation set is employed to determine the ideal tree size for achieving the best final model.
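For reference, the SVM and DT regression baselines used later can be set up along the following lines with scikit-learn; the synthetic lagged features and the hyperparameters (kernel, C, max_depth) are placeholders rather than the settings of our experiments.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))                       # e.g. 7 lagged observations per sample
y = X @ rng.normal(size=7) + 0.1 * rng.normal(size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

svm = SVR(kernel="rbf", C=10.0).fit(X_tr, y_tr)           # SVM regression baseline
dt = DecisionTreeRegressor(max_depth=5).fit(X_tr, y_tr)   # decision tree baseline

for name, model in [("SVM", svm), ("DT", dt)]:
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name} test RMSE: {rmse:.3f}")
```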

3. Our Improved LSTM Network

Residents in poor regions often depend on rain-fed agriculture, so they urgently need suitable tools to make accurate meteorological predictions. Unfortunately, meteorological observations in these regions are usually sparse and irregularly distributed. Conventional LSTM networks handle only temporal sequences and cannot utilize the links of meteorological variables among stations, so they are only suitable for predictions at a single station. Although GCN-LSTM networks can capture local spatial structures by embedding graph elements, the simple structure of the fixed adjacency matrices in the GCN has a limited capacity to predict spatial evolution. Meanwhile, since CNNs, which arose from image learning, use gridded data as their inputs, the hybrid CNN-LSTM network cannot take non-grid data as its input and therefore cannot make good meteorological predictions from such data. In summary, the existing types of LSTM networks are not suitable prediction tools for sparse and irregularly distributed meteorological observations.
In this section, we propose the optimization hybrid of multiple-lag LSTM networks for meteorological predictions. The whole prediction process for meteorological variable Y by using this network is as follows:
Step 1 (Selection of Partner Stations). The meteorological variable Y is observed at all meteorological stations. In order to reduce the computational and time costs of the prediction process, for each station we select only three partner stations, such that the correlation coefficients of the observed meteorological variable Y among these four meteorological stations reach their maximal values. In most cases, these three partner stations are the three stations closest to the targeted station.
Step 2 (Selection of Auxiliary Factors). We cannot obtain good predictions of meteorological variable Y by using only its own observed data. At the same time, various meteorological variables are observed at each meteorological station. When a meteorological variable observed at the targeted station and its associated partner stations is significantly linked to meteorological variable Y at the targeted station, we select this meteorological variable as an auxiliary factor for the prediction of Y. The simplest criterion for determining a significant link is that the correlation coefficient must be large.
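A minimal pandas sketch of Steps 1 and 2, assuming the observations are held in DataFrames (one column per station for Step 1, one column per variable for Step 2); the 0.3 threshold is an illustrative stand-in for a "relatively large" correlation, not a value prescribed by the method.

```python
import pandas as pd

def select_partner_stations(obs_y: pd.DataFrame, target: str, k: int = 3):
    """Step 1: choose the k stations whose series of Y is most strongly
    correlated with the series observed at the target station."""
    corr = obs_y.corr()[target].drop(target)
    return corr.nlargest(k).index.tolist()

def select_auxiliary_factors(station_df: pd.DataFrame, target_var: str, threshold: float = 0.3):
    """Step 2: keep the variables whose absolute correlation with Y
    exceeds the chosen threshold at the target (and partner) stations."""
    corr = station_df.corr()[target_var].drop(target_var).abs()
    return corr[corr >= threshold].index.tolist()
```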
Step 3 (Multiple-Lag LSTM Network). Different time-lag windows have different impacts on the meteorological prediction performance. In order to enhance the interpretability of LSTM networks and reveal the weights of different time-lag windows in the prediction accuracy, we consider multiple-lag LSTM networks, each of which takes as input the observed data in a single time window. For example, if we consider seven-lag LSTM networks, the first LSTM network uses the observed data from day t−1 to predict the value of meteorological variable Y on day t; ...; the seventh LSTM network uses the observed data from day t−7 to predict the value of meteorological variable Y on day t.
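The following PyTorch sketch illustrates one way to realize Step 3, assuming each single-lag member is a small LSTM that maps the feature vector observed on day t−j (target station plus partners and auxiliary factors) to Y on day t; the layer sizes, random data, and the helper lag_pairs are illustrative assumptions, not the exact architecture of the paper.

```python
import numpy as np
import torch
import torch.nn as nn

class SingleLagLSTM(nn.Module):
    """One member network of the multiple-lag ensemble: an LSTM fed with the
    feature vector observed at one fixed lag, with a linear readout for Y."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                         # x: (batch, seq_len=1, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)

def lag_pairs(X, y, lag):
    """Align the features observed on day t-lag with the target Y on day t."""
    return X[:-lag], y[lag:]

# seven member networks, one per lag, as in the seven-lag example above
n_features = 5
members = {lag: SingleLagLSTM(n_features) for lag in range(1, 8)}

# the lag-3 member, for instance, is trained on these aligned samples
X = np.random.default_rng(0).normal(size=(400, n_features)).astype("float32")
y = np.random.default_rng(1).normal(size=400).astype("float32")
X3, y3 = lag_pairs(X, y, lag=3)
x3_tensor = torch.from_numpy(X3).unsqueeze(1)     # add the length-1 sequence axis
pred3 = members[3](x3_tensor)                     # predictions of Y on day t
```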
Step 4 (Optimization Hybrid). In order to couple the predictions obtained under different single time windows, we need to search, during the model training process, for optimal weights such that the weighted ensemble of the predictions under the different single time windows produces the best predictions on the training datasets. In detail, we need to solve the following optimization problem:
$$\min_{w_j \ge 0,\; j = 1, 2, \ldots, K} \left\lVert \sum_{j=1}^{K} w_j \, Y_p^{\,t-j} - Y_o^{\,t} \right\rVert^2 \quad \text{such that} \quad \sum_{j=1}^{K} w_j = 1,$$
where Y_o^t denotes the observed value of meteorological variable Y on day t, Y_p^{t−j} denotes the value of Y predicted by the LSTM that uses the observed data from day t−j, and w_j is the weight of the LSTM prediction under the single time window with lag j. These weights not only guarantee a high prediction accuracy but also improve the model's interpretability: they reveal the importance of the observed meteorological data under different single time windows in the final meteorological prediction.
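A hedged sketch of Step 4 follows, reading the problem above as finding simplex weights so that the weighted ensemble of the K single-lag predictions best fits the training observations; SciPy's SLSQP solver handles the nonnegativity and sum-to-one constraints, and the toy data are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def hybrid_weights(P, y_obs):
    """Solve  min_w || P @ w - y_obs ||^2  s.t.  w >= 0,  sum(w) = 1,
    where column j of P holds the training-period predictions of the
    lag-(j+1) LSTM and y_obs holds the observed values of Y."""
    K = P.shape[1]
    obj = lambda w: np.sum((P @ w - y_obs) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(obj, x0=np.full(K, 1.0 / K), bounds=[(0.0, None)] * K,
                   constraints=cons, method="SLSQP")
    return res.x

# toy check: predictions from 7 lags, observed series of length 300
rng = np.random.default_rng(0)
y = rng.normal(size=300)
P = np.stack([y + 0.1 * (j + 1) * rng.normal(size=300) for j in range(7)], axis=1)
w = hybrid_weights(P, y)
print(np.round(w, 3), w.sum())    # weights favour the least noisy lags and sum to 1
```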

4. Meteorological Observation Data in Bangladesh

Bangladesh is located on the deltas of large rivers flowing from the Himalayas. Due to the impacts of the South Asian monsoon, Bangladesh has four distinct seasons: pre-monsoon, monsoon, post-monsoon, and winter. Due to environmental, economic, and topographic restrictions, the distribution of the 34 observation stations of the Bangladesh Meteorological Department is irregular and sparse (Figure 2). Daily observation data for nine meteorological factors (maximum temperature, average temperature, minimum temperature, total rainfall, sea level pressure, relative humidity, wind speed, wind direction, cloud amount) during 1989–2018 were collected from these stations.

5. Case Study: Meteorological Predictions for Bangladesh

The observed meteorological data collected in Bangladesh during 1989–2018 were divided into three parts: the first 70% were used as the training set, the middle 10% as the validation set, and the last 20% as the test set. That is, the data collected during 1989–2009 were used for training, those collected during 2010–2012 for validation, and those collected during 2013–2018 for testing. In this section, we demonstrate the prediction performance of our optimization hybrid of multiple-lag LSTM networks by comparing it with the LSTM, GCN-LSTM, and CNN-LSTM, as well as the SVM and DT. All networks adopted a seven-day time window during the prediction process, i.e., the observed data on days t−1, ..., t−7 were used to predict the value on day t. The quality of the meteorological predictions was measured by four statistical metrics: the root mean square error (RMSE), the correlation coefficient (R), the mean absolute percentage error (MAPE), and the Nash–Sutcliffe efficiency coefficient (NSE).
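For completeness, the four metrics can be computed as in the NumPy sketch below, using their standard definitions; the small epsilon in MAPE is an assumption added to guard against zero observations (e.g., dry days).

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def corr_r(obs, pred):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(obs, pred)[0, 1])

def mape(obs, pred, eps=1e-8):
    """Mean absolute percentage error; eps avoids division by zero."""
    return float(np.mean(np.abs((pred - obs) / (obs + eps))) * 100.0)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - residual variance / observed variance."""
    return float(1.0 - np.sum((pred - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))
```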
In order to reduce the complexity of the model and the risk of overfitting, we set the number of hidden units in the LSTM layer to a relatively small value and set a gradient threshold to prevent gradient explosion. During the training process, we used mini-batches to enhance the training stability and avoid overfitting. At the same time, we used Dropout regularization to randomly turn off a portion of the neurons and thereby reduce the neural network's over-reliance on the training data.
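A minimal PyTorch sketch of the training safeguards just described (a small hidden layer, a gradient-norm threshold, mini-batches, and Dropout); the specific values (32 hidden units, clip = 1.0, dropout = 0.2, and so on) are illustrative assumptions rather than the exact settings of our experiments.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class LagNet(nn.Module):
    """Small LSTM with dropout on the readout, mirroring the safeguards above."""
    def __init__(self, n_features, hidden=32, p_drop=0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(p_drop)          # randomly disable a share of units
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(self.drop(out[:, -1, :])).squeeze(-1)

def train(model, X, y, epochs=10, batch_size=64, clip=1.0, lr=1e-3):
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for xb, yb in loader:                   # mini-batch updates
            loss = loss_fn(model(xb), yb)
            opt.zero_grad()
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), clip)  # gradient threshold
            opt.step()
    return model

# usage sketch with random data (samples, seq_len=1, features)
X = torch.randn(500, 1, 6)
y = torch.randn(500)
model = train(LagNet(n_features=6), X, y, epochs=2)
```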
The training process of our models was as follows: first, for each target station, we selected the three most strongly correlated observation stations as partner stations according to the correlations of the maximum temperature (humidity/precipitation) (Figure 3).
Different observation stations had different indicator correlation matrices. Taking the Barisal observatory as an example, we show in Table 1 the correlation matrix of the maximum temperature (Max), relative humidity (RH), precipitation (Precip), minimum temperature (Min), average temperature (Ave), sea level pressure (SLP), cloud cover (Cc), wind speed (WS), wind direction (WD), season (Ssn), month (Mth), and day.
We retained only the indicators that had relatively large correlation coefficients with the maximum temperature (humidity/precipitation). The inputs selected for the maximum temperature prediction included the maximum temperature, average temperature, minimum temperature, season, and sea level pressure. The indicators selected for the relative humidity prediction included the relative humidity, cloud cover, sea level pressure, month and day, minimum temperature, and precipitation. The indicators selected for the precipitation prediction included the precipitation, relative humidity, cloud cover, and minimum temperature.
The training process for the GCN-LSTM and CNN-LSTM was as follows: For the prediction of the maximum temperature (humidity, precipitation), the resulting prediction performance after inputting all 12 meteorological indicators (Table 1) into the GCN-LSTM or CNN-LSTM was not good. In order to make a fair comparison, we sorted all of the meteorological indicators according to the magnitude of the correlation with the maximum temperature (humidity, precipitation) and gradually selected the indicators with large correlations as the model inputs. Finally, the input indicators for the maximum temperature prediction included the maximum temperature, average temperature, minimum temperature, sea level pressure, and season; the input indicators for the humidity prediction included the humidity, cloud cover, minimum temperature, and precipitation; and the input indicators for the precipitation prediction included the precipitation, cloud cover, humidity, minimum temperature, sea level pressure, and season. The training processes for the LSTM, SVM, and DT were similar to those of the GCN-LSTM and CNN-LSTM.

5.1. Prediction of the Maximum Temperature

Compared with the GCN-LSTM, CNN-LSTM, LSTM, SVM, and DT, our optimization hybrid of multiple-lag LSTM networks demonstrated the best prediction accuracy, followed by the GCN-LSTM and CNN-LSTM, and then the LSTM, SVM, and DT. The average RMSE, R, and MAPE values for the prediction of the maximum temperature using our optimization hybrid of multiple-lag LSTM networks reached 1.593, 0.887, and 3.925, respectively. On the daily scale, the CNN-LSTM has a slightly higher R value (0.886) than the GCN-LSTM, and the MAPE value (3.972) of the GCN-LSTM is lower than those of the other models, except for our model. The SVM has the poorest performance, especially in terms of the R value (0.698) and MAPE value (6.751) on the daily scale. On the weekly, monthly, and seasonal scales, the GCN-LSTM's R value is the same as that of the CNN-LSTM, but its MAPE value is lower than that of the CNN-LSTM. This is because our network can capture the spatial evolution of the climate system well; the GCN-LSTM and CNN-LSTM can only partially capture the local spatial structure of the climate system through simple adjacency matrices of graphs and convolution processes; and the LSTM, SVM, and DT can only capture temporal correlations. Table 2 presents a comparison of the prediction accuracies of the different models on daily, weekly, monthly, and seasonal scales. Our optimization hybrid of multiple-lag LSTM networks was the best prediction model for all temporal scales. Figure 4 shows the prediction curves of the three types of LSTM networks for the maximum temperature time series at Cox's Bazar station. The prediction curve of our optimization hybrid of multiple-lag LSTM networks was closest to the observed data.
Figure 5 demonstrates the distribution of the RMSE values for the maximum temperature predictions at all 34 meteorological stations on daily, weekly, monthly, and seasonal scales. The minimal RMSE was achieved by our optimization hybrid of multiple-lag LSTM networks, whose RMSE distribution was concentrated between 1.5 and 1.7 °C on the daily scale. Our model also performed best on the weekly scale, and the associated RMSE values for the 34 meteorological stations were concentrated between 0.55 and 0.65 °C. It was followed by the GCN-LSTM and CNN-LSTM, and the prediction performance of the GCN-LSTM was better than that of the CNN-LSTM, with one outlier representing a significantly large error appearing in the CNN-LSTM. On the monthly and seasonal scales, our model achieved smaller RMSE values: 0.35 °C on the monthly scale and 0.25 °C on the seasonal scale. Four mechanisms (selection of partner stations, selection of auxiliary factors, multiple-lag LSTM networks, optimization hybrid) can extract the links among meteorological variables at different stations under different temporal scales, so our model achieved the best performance. Several outliers representing significantly large errors appeared in the predictions of the CNN-LSTM and GCN-LSTM.
We observed the spatial distribution of the RMSE values for the prediction of the maximum temperature by comparing our optimization hybrid of multiple-lag LSTM network with the CNN-LSTM and GCN-LSTM (Figure 6).
For the Dinajpur, Sydpur, Bogra, and Rangpur stations in northern Bangladesh, our model reduced the average RMSE of the maximum temperature prediction by 0.32, 0.73, 0.56, and 0.54 over the GCN-LSTM and by 0.35, 0.93, 0.86, and 0.86 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
For the Sylhet, Srimangal, and Comilla stations in eastern Bangladesh, our model reduced the average RMSE of the maximum temperature prediction by 0.04, 0.19, 0.17, and 0.14 over the GCN-LSTM and by 0.09, 0.30, 0.27, and 0.23 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
For the Chuadanga, Khulna, and Satkhira stations in southwestern Bangladesh, our model reduced the average RMSE of the maximum temperature prediction by 0.29, 0.74, 0.70, and 0.72 over the GCN-LSTM and by 0.29, 0.85, 0.93, and 0.89 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
For the Cox's Bazar, Rangamati, and Chittagong stations in southeastern Bangladesh, our model reduced the average RMSE of the maximum temperature prediction by 0.16, 0.43, 0.44, and 0.48 over the GCN-LSTM and by 0.13, 0.46, 0.45, and 0.45 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
Finally, we examined the differences in the season-scale predictions by our model. The prediction accuracy of our model in the winter season was the highest among all four seasons, with the RMSE reaching 0.206 and the R reaching 0.978. The prediction performance during the monsoon season was relatively low, and the prediction accuracy during the post-monsoon season was higher than during the pre-monsoon season. The RMSE and MAPE during the post-monsoon season were both lower than those during the pre-monsoon season, while the R value was greater (Table 3).

5.2. Prediction of the Relative Humidity and Precipitation

Our optimization hybrid of multiple-lag LSTM networks demonstrated the best prediction accuracy, followed by the GCN-LSTM and CNN-LSTM, and then the LSTM, SVM, and DT. The average RMSE, R, and MAPE values for the relative humidity predictions obtained using our optimization hybrid of multiple-lag LSTM networks reached 4.469, 0.823, and 4.390, respectively. The average RMSE, R, and MAE values over all 34 meteorological stations using our model for daily precipitation predictions reached 13.54, 0.66, and 5.73, respectively. On the daily scale, except for our model, the LSTM (Step1 + Step2) and CNN-LSTM had the highest R values for the relative humidity and precipitation predictions, and their MAPE and MAE values were relatively small. At the same time, the R value of the CNN-LSTM outperformed that of the GCN-LSTM. The DT had the worst MAE for the daily precipitation prediction compared to the other models. The SVM had the poorest performance in terms of its R values (0.632 for the relative humidity prediction and 0.33 for the precipitation prediction) compared to the other models. On the weekly, monthly, and seasonal scales, the CNN-LSTM had a slightly higher R value than the GCN-LSTM, and the GCN-LSTM's MAPE/MAE values were lower than those of the CNN-LSTM. The SVM had the lowest R for the humidity and precipitation predictions across all four scales. The DT had the highest MAE for the precipitation predictions on the daily and monthly scales. This is because our network can capture the spatial correlations embedded in meteorological observations well, while the GCN-LSTM and CNN-LSTM can only partially capture the spatial structure through the fixed adjacency matrices of graphs and convolution processes, respectively. Table 4 presents a detailed comparison of the humidity and precipitation prediction accuracies of the different models on daily, weekly, monthly, and seasonal scales. Our optimization hybrid of multiple-lag LSTM networks had the best prediction performance on all temporal scales. Figure 7 further displays the prediction curves of the three types of LSTM networks for the relative humidity and precipitation time series at Barisal station. Clearly, the prediction curve of our optimization hybrid of multiple-lag LSTM networks most closely matches the observed data.
Figure 8 demonstrates the distribution of the RMSE values for the humidity and precipitation predictions in all 34 meteorological stations on daily, weekly, monthly, and seasonal scales. The minimal humidity and precipitation prediction RMSE values were achieved by our model whose RMSE distributions were concentrated between 4.2 and 4.6% and 11 and 15 mm on a daily scale. Our model also performed the best on a weekly scale and the associated RMSE values for the humidity and precipitation predictions were concentrated between 1.6 and 1.8% and 28 and 45 mm. This was followed by the GCN-LSTM and CNN-LSTM, and the prediction performance of the GCN-LSTM was better than that of the CNN-LSTM. On the monthly and seasonal scales, our model also achieved the lowest RMSE values. Four mechanisms (selection of partner stations, selection of auxiliary factors, multiple-lag LSTM network, optimization hybrid) can extract the links among meteorological variables for different stations and different temporal scales well, so our model achieved the best performance.
We observed the spatial distribution of the RMSE values for the prediction of humidity and precipitation by comparing our optimization hybrid of multiple-lag LSTM network with the CNN-LSTM and GCN-LSTM (Figure 9 and Figure 10).
For the Dinajpur, Sydpur, Rangpur, and Bogra stations in northern Bangladesh, our model reduced the average RMSE of the humidity prediction by 0.39, 1.04, 0.74, and 0.37 over the GCN-LSTM and by 0.66, 1.82, 1.72, and 1.26 over the CNN-LSTM and reduced the average RMSE of the precipitation prediction by 7.43, 17.35, 41.15, and 126.30 over the GCN-LSTM and by 6.97, 15.75, 35.98, and 105.83 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
For the Sylhet, Srimangal, and Comilla stations in eastern Bangladesh, our model reduced the average RMSE of the humidity prediction by 0.39, 1.45, 1.35, and 0.99 over the GCN-LSTM and by 0.50, 1.62, 1.39, and 1.14 over the CNN-LSTM and reduced the average RMSE of the precipitation prediction by 9.65, 34.31, 44.59, and 113.02 over the GCN-LSTM and by 9.01, 24.07, 15.21, and 27.04 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
For the Chittagong, Rangamati, and Teknaf stations in southeastern Bangladesh, our model reduced the average RMSE of the humidity prediction by 0.58, 1.40, 1.18, and 1.03 over the GCN-LSTM and by 0.59, 1.63, 1.50, and 1.35 over the CNN-LSTM and reduced the average RMSE of the precipitation prediction by 11.54, 54.09, 116.67, and 283.93 over the GCN-LSTM and by 10.97, 51.77, 134.45, and 356.45 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
For the Bhola, Patuakhali, Hatiya, and Feni stations in southern Bangladesh, our model reduced the average RMSE of the humidity prediction by 0.68, 2.38, 2.31, and 1.98 over the GCN-LSTM and by 0.45, 1.99, 1.75, and 1.45 over the CNN-LSTM and reduced the average RMSE of the precipitation prediction by 14.39, 61.02, 159.72, and 352.88 over the GCN-LSTM and by 14.29, 66.54, 198.17, and 528.69 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
For the Ishurdi, Jessore, and Rajshahi stations in western Bangladesh, our model reduced the average RMSE of the humidity prediction by 0.44, 1.16, 1.08, and 1.04 over the GCN-LSTM and by 0.48, 1.37, 1.40, and 1.29 over the CNN-LSTM; for the Ishurdi, Khulna, and Mongla stations in southwestern Bangladesh, it reduced the average RMSE of the precipitation prediction by 9.13, 29.21, 75.80, and 169.05 over the GCN-LSTM and by 9.41, 32.03, 82.59, and 206.92 over the CNN-LSTM on the daily, weekly, monthly, and seasonal scales, respectively.
Finally, we examined the difference in season-scale predictions by our model. For relative humidity predictions, our model performed better during the winter and post-monsoon seasons. In particular, in the winter season, it achieved the maximal R value (0.989), accompanied by a lower RMSE value (0.693) and MAPE value (0.777). During the monsoon season, the humidity predictions by our model exhibited lower RMSE and MAPE values compared to those obtained for the pre-monsoon season. Similar to the relative humidity prediction, the precipitation prediction accuracy during the monsoon season was relatively poor compared to that during the other three seasons (Table 5).

6. Conclusions

Residents in poor regions are extremely vulnerable to climate change due to their rain-fed agriculture and weak industry base. At the same time, the observation stations in these regions are sparse and irregularly distributed. Conventional LSTM networks only handle temporal sequences. The LSTM variants (GCN-LSTM and CNN-LSTM) can partially capture local spatial structures through the simple structures of fixed adjacency matrices on irregular data or convolution processing on gridded data. In this study, we overcame the drawbacks of LSTM networks and their variants and proposed the novel optimization hybrid of multiple-lag LSTM networks for meteorological predictions. Four mechanisms (selection of partner stations, selection of auxiliary factors, multiple-lag LSTM network, optimization hybrid) embedded into the LSTM network architecture can extract the links among meteorological variables from different stations and on different temporal scales, so our model achieved the best prediction performance. We conducted meteorological prediction experiments on the maximum temperature, relative humidity, and precipitation observed at 34 meteorological stations in Bangladesh. Compared with the GCN-LSTM, CNN-LSTM, LSTM, SVM, and DT, our optimization hybrid of multiple-lag LSTM network demonstrated the best prediction accuracy.

Author Contributions

L.Z. and Z.Z. are co-first authors. Conceptualization, Z.Z.; Methodology, L.Z. and Z.Z.; Software, L.Z.; Formal analysis, Z.Z.; Investigation, L.Z., M.J.C.C. and L.C.D.; Resources, Z.Z. and L.C.D.; Data curation, L.Z. and L.C.D.; Writing—original draft, Z.Z.; Writing—review & editing, Z.Z. and M.J.C.C.; Visualization, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The corresponding author was supported by the European Commission Horizon 2020 Framework Program No. 861584 and the Taishan Distinguished Professor Fund No. 20190910.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sziroczak, D.; Rohacs, D.; Rohacs, J. Review of using small UAV based meteorological measurements for road weather management. Prog. Aerosp. Sci. 2022, 134, 100859.
2. Van Aalst, M.K. The impacts of climate change on the risk of natural disasters. Disasters 2006, 30, 5–18.
3. Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55.
4. Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Barker, D.M.; Duda, M.G.; Powers, J.G. A description of the advanced research WRF version 3. NCAR Tech. Note 2008, 475, 113.
5. Gettelman, A.; Geer, A.J.; Forbes, R.M.; Carmichael, G.R.; Feingold, G.; Posselt, D.J.; Zuidema, P. The future of Earth system prediction: Advances in model-data fusion. Sci. Adv. 2022, 8, eabn3488.
6. Cobb, A.; Steinhoff, D.; Weihs, R.; Delle Monache, L.; DeHaan, L.; Reynolds, D.; Ralph, F.M. West-WRF 34-Year Reforecast: Description and Validation. J. Hydrometeorol. 2023, 24, 2125–2140.
7. Hamill, T.M.; Whitaker, J.S.; Shlyaeva, A.; Bates, G.; Fredrick, S.; Pegion, P.; Woollen, J. The Reanalysis for the Global Ensemble Forecast System, Version 12. Mon. Weather Rev. 2022, 150, 59–79.
8. Kumar, P.; Kishtawal, C.M.; Pal, P.K. Impact of ECMWF, NCEP, and NCMRWF global model analysis on the WRF model forecast over Indian Region. Theor. Appl. Climatol. 2017, 127, 143–151.
9. Chattopadhyay, A.; Nabizadeh, E.; Hassanzadeh, P. Analog forecasting of extreme-causing weather patterns using deep learning. J. Adv. Model. Earth Syst. 2020, 12, e2019MS001958.
10. Rao, A.R.; Wang, Q.; Wang, H.; Khorasgani, H.; Gupta, C. Spatio-temporal functional neural networks. In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 6–9 October 2020; pp. 81–89.
11. Tao, H.; Awadh, S.M.; Salih, S.Q.; Shafik, S.S.; Yaseen, Z.M. Integration of extreme gradient boosting feature selection approach with machine learning models: Application of weather relative humidity prediction. Neural Comput. Appl. 2022, 34, 515–533.
12. Akbari Asanjan, A.; Yang, T.; Hsu, K.; Sorooshian, S.; Lin, J.; Peng, Q. Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res. Atmos. 2018, 123, 12–543.
13. García-Duarte, L.; Cifuentes, J.; Marulanda, G. Short-term spatio-temporal forecasting of air temperatures using deep graph convolutional neural networks. Stoch. Environ. Res. Risk Assess. 2023, 37, 1649–1667.
14. Pathan, M.S.; Jain, M.; Lee, Y.H.; Al Skaif, T.; Dev, S. Efficient forecasting of precipitation using LSTM. In Proceedings of the 2021 Photonics & Electromagnetics Research Symposium (PIERS), Hangzhou, China, 21–25 November 2021; pp. 2312–2316.
15. Tran, T.T.K.; Bateni, S.M.; Ki, S.J.; Vosoughifar, H. A review of neural networks for air temperature forecasting. Water 2021, 13, 1294.
16. Forsyth, D.A.; Mundy, J.L.; di Gesú, V.; Cipolla, R.; LeCun, Y.; Haffner, P.; Bengio, Y. Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision; Springer: Berlin/Heidelberg, Germany, 1999; pp. 319–345.
17. Kim, S.; Hong, S.; Joh, M.; Song, S.K. DeepRain: ConvLSTM network for precipitation prediction using multichannel radar data. arXiv 2017, arXiv:1711.02316.
18. Gamboa-Villafruela, C.J.; Fernández-Alvarez, J.C.; Márquez-Mijares, M.; Pérez-Alarcón, A.; Batista-Leyva, A.J. Convolutional LSTM architecture for precipitation nowcasting using satellite data. Environ. Sci. Proc. 2021, 8, 33.
19. Li, W.; Pan, B.; Xia, J.; Duan, Q. Convolutional neural network-based statistical post-processing of ensemble precipitation forecasts. J. Hydrol. 2022, 605, 127301.
20. Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis. Multimed. Tools Appl. 2019, 78, 26597–26613.
21. Liu, X.; Qin, M.; He, Y.; Mi, X.; Yu, C. A new multi-data-driven spatiotemporal PM2.5 forecasting model based on an ensemble graph reinforcement learning convolutional network. Atmos. Pollut. Res. 2021, 12, 101197.
22. Wilson, T.; Tan, P.N.; Luo, L. A low rank weighted graph convolutional approach to weather prediction. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 627–636.
23. Yu, Y.; Hu, G. Short-term solar irradiance prediction based on spatiotemporal graph convolutional recurrent neural network. J. Renew. Sustain. Energy 2022, 14, 053702.
24. Zhang, Z.; Li, J. Big Data Mining for Climate Change; Elsevier: Amsterdam, The Netherlands, 2019.
25. O’Donncha, F.; Hu, Y.; Palmes, P.; Burke, M.; Filgueira, R.; Grant, J. A spatio-temporal LSTM model to forecast across multiple temporal and spatial scales. Ecol. Inform. 2022, 69, 101687.
26. Qi, Y.; Li, Q.; Karimian, H.; Liu, D. A hybrid model for spatiotemporal forecasting of PM2.5 based on graph convolutional neural network and long short-term memory. Sci. Total Environ. 2019, 664, 1–10.
27. Zhang, B.; Rong, Y.; Yong, R.; Qin, D.; Li, M.; Zou, G.; Pan, J. Deep learning for air pollutant concentration prediction: A review. Atmos. Environ. 2022, 119347.
28. Li, J.; Li, R.; Xu, L. Multi-stage deep residual collaboration learning framework for complex spatial–temporal traffic data imputation. Appl. Soft Comput. 2023, 147, 110814.
29. Qu, J.; Qian, Z.; Pei, Y. Day-ahead hourly photovoltaic power forecasting using attention-based CNN-LSTM neural network embedded with multiple relevant and target variables prediction pattern. Energy 2021, 232, 120996.
30. Song, J.; Zhang, L.; Xue, G.; Ma, Y.; Gao, S.; Jiang, Q. Predicting hourly heating load in a district heating system based on a hybrid CNN-LSTM model. Energy Build. 2021, 243, 110998.
31. Fu, Q.; Niu, D.; Zang, Z.; Huang, J.; Diao, L. Multi-stations’ weather prediction based on hybrid model using 1D CNN and Bi-LSTM. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 3771–3775.
32. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152.
33. Fan, J.; Wang, X.; Wu, L.; Zhou, H.; Zhang, F.; Yu, X.; Xiang, Y. Comparison of Support Vector Machine and Extreme Gradient Boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: A case study in China. Energy Convers. Manag. 2018, 164, 102–111.
34. Wu, Z.; Cui, N.; Gong, D.; Zhu, F.; Li, Y.; Xing, L.; Zha, Y. Predicting daily global solar radiation in various climatic regions of China based on hybrid support vector machines with meta-heuristic algorithms. J. Clean. Prod. 2023, 385, 135589.
35. Balogun, A.L.; Tella, A. Modelling and investigating the impacts of climatic variables on ozone concentration in Malaysia using correlation analysis with random forest, decision tree regression, linear regression, and support vector regression. Chemosphere 2022, 299, 134250.
Figure 1. Structure of the Long Short-Term Memory (LSTM) network.
Figure 2. Locations of 34 meteorological stations in Bangladesh.
Figure 3. Correlation coefficient matrices of the maximum temperature (left), humidity (middle), and precipitation (right) at 34 observation stations.
Figure 4. Prediction of the maximum temperature at Cox’s Bazar station by three types of LSTM networks on daily, weekly, monthly, and seasonal scales.
Figure 5. Box plot of RMSE values for predictions of the maximum temperature observed at all 34 observation stations in Bangladesh.
Figure 6. Spatial distribution of RMSE values for the maximum temperature predictions by different LSTMs, and their differences on daily, weekly, monthly, and seasonal scales.
Figure 7. Prediction of humidity (left column) and precipitation (right column) in Barisal station by three types of LSTM networks under daily, weekly, monthly and seasonal scales.
Figure 8. Box plot of the RMSE values for the prediction of humidity (first row) and precipitation (second row) observed at all 34 observation stations in Bangladesh.
Figure 9. Spatial distribution of the RMSE values for the humidity predictions by different LSTMs and their differences on the daily, weekly, monthly, and seasonal scales.
Figure 10. Spatial distribution of the RMSE values for the precipitation predictions by different LSTMs and their differences on the daily, weekly, monthly, and seasonal scales.
Table 1. Correlation matrix of indicators for the Barisal observation station. Rows and columns are ordered Max, RH, Precip, Min, Ave, SLP, Cc, WS, WD, Ssn, Mth, Day; each row lists the upper-triangular entries starting from the diagonal (self-correlation = 1).
Max: 1, −0.16, −0.04, 0.74, 0.88, −0.51, 0.23, 0.13, 0.13, −0.70, 0.01, 0.01
RH: 1, 0.37, 0.39, 0.18, −0.40, 0.63, 0.08, −0.02, −0.01, 0.40, 0.40
Precip: 1, 0.18, 0.14, −0.30, 0.36, 0.15, 0.11, −0.14, 0.06, 0.06
Min: 1, 0.95, −0.79, 0.70, 0.26, 0.17, −0.71, 0.18, 0.18
Ave: 1, −0.75, 0.57, 0.24, 0.17, −0.77, 0.11, 0.10
SLP: 1, −0.71, −0.34, −0.24, 0.59, −0.06, −0.06
Cc: 1, 0.33, 0.20, −0.46, 0.11, 0.11
WS: 1, 0.55, −0.32, −0.19, −0.19
WD: 1, −0.26, −0.24, −0.24
Ssn: 1, 0.15, 0.16
Mth: 1, 1.00
Day: 1
Table 2. Prediction accuracies of different models for the maximum temperature (°C).
Scale | Model | RMSE | R | MAPE
daily | Our model | 1.593 | 0.887 | 3.925
daily | LSTM (Step1 + Step2) | 1.608 | 0.885 | 3.977
daily | GCN-LSTM | 1.613 | 0.885 | 3.972
daily | CNN-LSTM | 1.636 | 0.886 | 4.073
daily | LSTM | 2.055 | 0.814 | 5.166
daily | SVM | 2.666 | 0.698 | 6.751
daily | DT | 2.218 | 0.803 | 5.573
weekly | Our model | 0.618 | 0.981 | 1.593
weekly | LSTM (Step1 + Step2) | 0.655 | 0.980 | 1.686
weekly | GCN-LSTM | 0.718 | 0.977 | 1.848
weekly | CNN-LSTM | 0.786 | 0.977 | 2.042
weekly | LSTM | 1.098 | 0.949 | 2.869
weekly | SVM | 1.346 | 0.899 | 3.282
weekly | DT | 1.085 | 0.962 | 2.843
monthly | Our model | 0.351 | 0.994 | 0.927
monthly | LSTM (Step1 + Step2) | 0.377 | 0.994 | 0.983
monthly | GCN-LSTM | 0.435 | 0.992 | 1.140
monthly | CNN-LSTM | 0.522 | 0.992 | 1.380
monthly | LSTM | 0.744 | 0.979 | 1.988
monthly | SVM | 0.979 | 0.950 | 2.006
monthly | DT | 0.801 | 0.986 | 2.170
seasonal | Our model | 0.258 | 0.996 | 0.696
seasonal | LSTM (Step1 + Step2) | 0.282 | 0.996 | 0.747
seasonal | GCN-LSTM | 0.347 | 0.995 | 0.946
seasonal | CNN-LSTM | 0.425 | 0.995 | 1.170
seasonal | LSTM | 0.577 | 0.988 | 1.540
seasonal | SVM | 1.165 | 0.943 | 1.923
seasonal | DT | 0.702 | 0.990 | 1.913
Table 3. Seasonal-scale prediction accuracy of our model for the maximum temperature (°C).
Season | RMSE | R | MAPE
winter | 0.206 | 0.978 | 0.672
pre-monsoon | 0.256 | 0.988 | 0.641
monsoon | 0.289 | 0.919 | 0.794
post-monsoon | 0.219 | 0.924 | 0.630
Table 4. Prediction accuracy of different models for the relative humidity (%) and precipitation (mm). Entries are given as relative humidity/precipitation; the last column reports MAPE for humidity and MAE for precipitation.
Scale | Model | RMSE | R | MAPE/MAE
daily | Our model | 4.469/13.54 | 0.823/0.66 | 4.390/5.73
daily | LSTM (Step1 + Step2) | 4.506/15.20 | 0.820/0.56 | 4.434/6.19
daily | GCN-LSTM | 4.564/16.05 | 0.819/0.46 | 4.498/7.24
daily | CNN-LSTM | 4.585/16.01 | 0.820/0.47 | 4.522/7.21
daily | LSTM | 5.558/16.61 | 0.711/0.41 | 5.412/7.78
daily | SVM | 6.319/15.49 | 0.632/0.33 | 6.228/7.79
daily | DT | 6.155/17.26 | 0.689/0.35 | 6.055/8.12
weekly | Our model | 1.740/38.01 | 0.969/0.85 | 1.754/21.77
weekly | LSTM (Step1 + Step2) | 1.854/42.64 | 0.965/0.80 | 1.857/23.55
weekly | GCN-LSTM | 2.142/45.57 | 0.957/0.81 | 2.160/25.09
weekly | CNN-LSTM | 2.211/45.48 | 0.958/0.82 | 2.232/25.28
weekly | LSTM | 2.784/51.07 | 0.909/0.73 | 2.751/29.17
weekly | SVM | 4.061/46.03 | 0.807/0.69 | 4.055/24.14
weekly | DT | 2.509/47.53 | 0.928/0.81 | 2.501/28.41
monthly | Our model | 1.002/88.97 | 0.988/0.92 | 1.013/59.97
monthly | LSTM (Step1 + Step2) | 1.076/96.14 | 0.988/0.91 | 1.084/63.32
monthly | GCN-LSTM | 1.351/99.82 | 0.982/0.92 | 1.384/61.38
monthly | CNN-LSTM | 1.424/102.28 | 0.984/0.93 | 1.465/64.29
monthly | LSTM | 1.620/116.17 | 0.963/0.88 | 1.476/75.51
monthly | SVM | 2.937/143.33 | 0.880/0.74 | 2.927/77.45
monthly | DT | 1.605/113.81 | 0.967/0.93 | 1.609/83.79
seasonal | Our model | 0.759/181.33 | 0.990/0.97 | 0.775/131.69
seasonal | LSTM (Step1 + Step2) | 0.774/198.41 | 0.990/0.97 | 0.797/140.15
seasonal | GCN-LSTM | 1.030/201.97 | 0.986/0.98 | 1.070/133.90
seasonal | CNN-LSTM | 1.098/215.16 | 0.987/0.98 | 1.150/142.86
seasonal | LSTM | 1.519/226.91 | 0.943/0.96 | 1.185/154.59
seasonal | SVM | 2.584/304.63 | 0.867/0.88 | 2.479/209.57
seasonal | DT | 1.364/254.00 | 0.960/0.97 | 1.324/191.71
Table 5. Seasonal-scale prediction accuracy of our model for the relative humidity (%) and precipitation (mm). Entries are given as relative humidity/precipitation; the last column reports MAPE for humidity and MAE for precipitation.
Season | RMSE | R | MAPE/MAE
winter | 0.693/80.44 | 0.989/0.60 | 0.777/70.35
pre-monsoon | 0.841/165.10 | 0.988/0.80 | 0.941/137.67
monsoon | 0.670/279.68 | 0.944/0.79 | 0.686/235.28
post-monsoon | 0.622/97.70 | 0.954/0.77 | 0.668/84.39
