5.2. Qualitative Analysis of the ConvLSTM-TransGAN Model
In this section, we present a qualitative visualization of the behavior of our proposed model. Once the training of the TransGAN is completed, we extract its generator and connect it to the output end of the ConvLSTM network.
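The wiring just described, a frozen TransGAN generator attached to the output of the ConvLSTM, amounts to a simple function composition. The sketch below illustrates only this structure; the two module bodies are toy placeholders, not the paper's actual networks.

```python
import numpy as np

def convlstm_predict(frames):
    """Toy stand-in for the ConvLSTM feature-extraction module:
    returns a coarse next-frame estimate (here, the mean of the inputs)."""
    return np.mean(frames, axis=0)

def transgan_generator(coarse_frame):
    """Toy stand-in for the frozen TransGAN generator: refines the coarse
    prediction (here, just a clip to the valid intensity range)."""
    return np.clip(coarse_frame, 0.0, 1.0)

def convlstm_transgan(frames):
    """Two-stage pipeline: the ConvLSTM output feeds the frozen generator."""
    return transgan_generator(convlstm_predict(frames))

frames = np.random.rand(8, 64, 64)       # 8 input frames of 64x64 pixels
prediction = convlstm_transgan(frames)   # refined next-frame prediction
print(prediction.shape)                  # (64, 64)
```

The key design point is that the generator's parameters stay frozen, so it acts purely as a postprocessing stage for whatever the spatiotemporal predictor emits.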
In this study, the Moving MNIST dataset was used to train the model, and for comparison, traditional models including ConvLSTM, TrajGRU, PredRNN, and PredRNN++ were also tested. The results are presented in Figure 9, where frames 1–10 are the input images and frames 11–20 are the ground truth.
The findings from Figure 9 reveal that the ConvLSTM-TransGAN model is superior at predicting the morphological changes and motion trajectories of the digits in the image sequence, and that it produces sharper, clearer images than the other models. Specifically, compared with ConvLSTM, TrajGRU, PredRNN, and PredRNN++, the ConvLSTM-TransGAN model produces a more complete image of the digit “2” with higher contrast and clarity. Furthermore, the images predicted by ConvLSTM and TrajGRU become increasingly blurry as time progresses, while PredRNN and PredRNN++ lose significant features in the last few frames and fail to predict the final forms of the digits “2” and “7”. In contrast, the ConvLSTM-TransGAN model accurately predicts the basic forms and motion patterns of both digits, yielding images that are not only clearer but also more faithful to the motion trend of the sequence.
Moreover, the model’s performance was validated on the HKO-7 dataset, which comprises 812 days for training, 50 days for validation, and 131 days for testing, with radar data recorded every 6 min (240 frames per day). Because the differences between adjacent frames can be too small on their own, the model takes eight frames as input and predicts the next frame.
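Under the sliding-window scheme implied above (eight consecutive frames in, the next frame out), each day of data yields a fixed number of training pairs. The following sketch shows one plausible way to slice a day into samples; the array shapes are illustrative.

```python
import numpy as np

FRAMES_PER_DAY = 240   # radar scans every 6 min, so 240 frames per day
INPUT_LEN = 8          # 8 frames in, 1 frame out

def make_samples(day_frames, input_len=INPUT_LEN):
    """Slice one day of radar frames into (input, target) training pairs."""
    samples = []
    for t in range(len(day_frames) - input_len):
        x = day_frames[t : t + input_len]   # 8 consecutive input frames
        y = day_frames[t + input_len]       # the single frame to predict
        samples.append((x, y))
    return samples

day = np.random.rand(FRAMES_PER_DAY, 64, 64)   # one synthetic day of frames
pairs = make_samples(day)
print(len(pairs))   # 232 pairs per day (240 - 8)
```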
Figure 10 illustrates an example of the ConvLSTM-TransGAN model’s performance on Hong Kong’s weather on 11 April 2015, a day of heavy rainfall and therefore highly representative and typical. The first row shows the observed input sequence: the model takes 8 frames as input, of which only the last 4 are displayed, starting at 7:36 AM with a frame interval of 6 min. The second row shows the ground-truth future frames. The third and fourth rows show the predictions of the ConvLSTM and TrajGRU models, respectively; the fifth and sixth rows show those of PredRNN and PredRNN++; and the last row shows the predictions of the ConvLSTM-TransGAN model. The images in these last five rows are predictions generated from the 8 real early frames, shown from 8:00 onward at an interval of 18 min to better demonstrate the long-term performance of each model.
Several observations can be made from Figure 10. Firstly, ConvLSTM struggles to make accurate long-term predictions of dynamic features such as the appearance and motion trajectory of the radar echoes. TrajGRU, PredRNN, and PredRNN++, on the other hand, show promising results on the initial frames; however, as the prediction horizon extends to 8:36, the quality of their predicted images deteriorates, leaving only rough outlines with limited internal detail.
In contrast, the ConvLSTM-TransGAN model performs impressively. It accurately predicts the appearance and contour of the radar echoes and remains consistent with the actual images in terms of motion trajectory. Notably, as time progresses, the ConvLSTM-TransGAN model produces increasingly clear images, avoiding the decline in image quality observed in TrajGRU, PredRNN, and the other models. It effectively tackles the prevalent problem of blurry predictions, yielding visually convincing results. In terms of prediction accuracy, the ConvLSTM-TransGAN model surpasses all other models in the comparison; it also performs well on extreme weather events such as heavy precipitation, demonstrating its ability to handle such typical cases effectively.
To evaluate the practicality of the ConvLSTM-TransGAN model in precipitation tasks, this section presents a comparative analysis against four other well-known models (ConvLSTM, TrajGRU, PredRNN, and PredRNN++), all trained on the HKO-7 dataset. The quantitative performance of the five models was assessed with six evaluation metrics, namely MAE, MSE, SSIM, CSI, POD, and FAR, as summarized in Table 3. Each reported value is the average over the last 10 frames of the prediction results, averaged across multiple experimental runs.
The results demonstrate that the ConvLSTM-TransGAN model outperforms the other four models across all evaluation metrics, showing significant improvements. By incorporating the generative network, the ConvLSTM-TransGAN model enhances its prediction capabilities while maintaining the accuracy of spatiotemporal sequence prediction, thereby generating radar echo images of higher quality. Moreover, the ConvLSTM-TransGAN model exhibits promising performance in the three short-term precipitation prediction evaluation metrics, further validating its effectiveness.
Overall, the findings support the feasibility and superiority of the ConvLSTM-TransGAN model in practical precipitation tasks, highlighting its potential as a valuable tool for improved prediction and the generation of radar echo images.
To visually illustrate the disparities in prediction performance among the models, this study compares each model’s output frame by frame and plots the change in each evaluation metric against the prediction time step. Figure 11 shows this comparison for the MAE, MSE, and SSIM metrics. As the prediction time step increases, the performance of every model deteriorates; however, the ConvLSTM-TransGAN model consistently maintains the best scores across all three metrics, indicating that it effectively captures spatiotemporal features while generating clear and accurate radar images. These results confirm the superiority of the ConvLSTM-TransGAN model over ConvLSTM, TrajGRU, PredRNN, and PredRNN++ in spatiotemporal sequence prediction and radar image generation.
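Per-step curves of the kind plotted in Figure 11 are produced by averaging the error over pixels separately at each lead time. A minimal numpy sketch, using synthetic data and illustrative shapes:

```python
import numpy as np

def per_frame_metrics(pred_seq, truth_seq):
    """MAE and MSE at each prediction time step, averaged over pixels,
    suitable for plotting against lead time."""
    err = pred_seq - truth_seq                  # shape (T, H, W)
    mae = np.mean(np.abs(err), axis=(1, 2))     # one value per frame
    mse = np.mean(err ** 2, axis=(1, 2))
    return mae, mse

T, H, W = 10, 32, 32
truth = np.random.rand(T, H, W)
# Synthetic predictions whose error grows with lead time:
pred = truth + np.linspace(0, 0.2, T)[:, None, None]
mae, mse = per_frame_metrics(pred, truth)
print(np.all(np.diff(mae) > 0))   # True: error increases with the time step
```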
5.3. Selection of Spatio-Temporal Prediction Models
In this study, our focus was on selecting an appropriate spatio-temporal sequence model to achieve accurate nowcasting. We compared two candidate models, ConvLSTM and PredRNN: specifically, we conducted comparative analyses between the ConvLSTM-TransGAN and PredRNN-TransGAN models, computing their MSE, MAE, and SSIM, as shown in Table 4.
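SSIM, used in Table 4 alongside MSE and MAE, compares two images in terms of luminance, contrast, and structure. A minimal global (single-window) sketch is given below, assuming images scaled to [0, 1]; real evaluations typically use a sliding Gaussian window rather than this whole-image form.

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two images in [0, 1].
    c1 = (0.01*L)^2 and c2 = (0.03*L)^2 with dynamic range L = 1."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

img = np.random.rand(64, 64)
print(global_ssim(img, img))            # ~1.0 for identical images
print(global_ssim(img, 1.0 - img) < 1)  # True: dissimilar images score lower
```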
From Table 4, it can be observed that the performance metrics of the two models are very close, especially in terms of SSIM. We can therefore conclude that the TransGAN generative module effectively enhances the image quality of both the ConvLSTM and PredRNN models during image augmentation; in this regard, there is no significant difference between ConvLSTM and PredRNN.
Indeed, the ConvLSTM model holds a remarkable position as a milestone in deep learning for nowcasting, exhibiting strong representativeness. It has already demonstrated its ability to capture spatio-temporal information, and subsequent models such as PredRNN and PredRNN++ are based on improvements to the ConvLSTM framework. The TransGAN generative model effectively optimizes information extraction for ConvLSTM.
In our experiment, adding the TransGAN image generation module at the output of ConvLSTM and of PredRNN yielded comparable results. Compared with more complex models such as PredRNN and its successors, ConvLSTM is simpler and offers more efficient inference.
Based on these considerations, we selected ConvLSTM as the spatio-temporal sequence module in this study. Its representativeness, its ability to capture spatio-temporal information, and its inference efficiency make it the optimal choice for achieving accurate nowcasting.
5.4. How Does the Learning Rate Affect the Prediction Performance?
The learning rate is a crucial hyperparameter in deep learning: it governs the magnitude of parameter updates during training. An appropriate learning rate can expedite the convergence of the model, mitigate fluctuations, enhance generalization, and work well with a range of optimization algorithms, significantly improving overall performance. In our experiment, we first train the TransGAN image generation module until it reaches sufficient convergence and balance. We then freeze its parameters and use the trained module as the postprocessing stage for the feature extraction module, which is based on the ConvLSTM architecture and is trained next. During this training, we compare learning rates of 0.00001, 0.0001, 0.001, 0.01, and 0.1. As the results in Table 5 show, a learning rate of 0.0001 yields the best MSE, MAE, and SSIM, indicating that at this setting the model can effectively capture spatiotemporal features while generating clear and accurate radar images. In contrast, with the lower learning rate of 0.00001, training stalls in local optima; this extends the training time and slows the improvement of the evaluation metrics, leading to suboptimal results.
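The slow-convergence effect of too small a learning rate can be illustrated on a toy objective. The sketch below runs plain gradient descent on a 1-D quadratic; this is only an analogy for the training dynamics, not the paper's actual setup.

```python
import numpy as np

def train_quadratic(lr, steps=200, x0=5.0):
    """Gradient descent on f(x) = x^2, a toy stand-in for how the
    learning rate affects convergence speed. Returns the final loss."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x      # gradient of x^2 is 2x
    return x ** 2

# The same learning-rate grid as in the experiment:
for lr in (0.00001, 0.0001, 0.001, 0.01, 0.1):
    print(lr, train_quadratic(lr))
```

With 0.00001 the loss barely decreases after 200 steps, mirroring the stalled training observed in the experiment, while larger rates converge much faster on this simple convex objective.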
5.5. Limitations
In recent years, the rapid advancement of the artificial intelligence industry and deep learning technology has provided novel opportunities for the investigation of short-term and near-term precipitation forecasting. This study employed deep learning methodologies to explore the realm of short-term and near-term precipitation prediction, yielding promising prognostic outcomes. Nevertheless, there remain several aspects that warrant further inquiry.
Firstly, the current body of research predominantly relies on radar echo images as the primary data source for precipitation prediction, disregarding the multifaceted influences that affect precipitation. Such overreliance on radar echo images may introduce errors, thereby potentially compromising the accuracy of predictions.
Secondly, the extraction of long-range spatial correlation features from radar images necessitates the accumulation of information over multiple temporal intervals. However, the ConvLSTM-TransGAN model proposed in this paper adopts a sequential computation approach, progressively attenuating the efficacy of information over time. Consequently, the acquisition of long-range features becomes increasingly limited, leading to a decline in the precision of predictions. To mitigate this issue, the paper aimed to maximize the utilization of radar echo images and capture global features to the greatest extent possible. Consequently, the model exhibits a substantial parameter count and complexity, thereby augmenting the challenges associated with its training process.