Author Contributions
Conceptualization, S.L., Y.Y. and F.G.; data curation, S.L.; funding acquisition, F.G. and C.L.; investigation, S.L., F.G., Y.Y. and C.L.; methodology, S.L.; resources, F.G. and C.L.; software, S.L.; supervision, F.G., Y.Y. and C.L.; validation, F.G., Y.Y. and C.L.; visualization, S.L.; writing—original draft, S.L.; writing—review and editing, F.G., Y.Y. and C.L. All authors have read and agreed to the published version of the manuscript.
Figure 1.
(a) Overall architecture of the perceiving multi-scale spatiotemporal dynamics network (PMSTD-Net), which consists primarily of the Encoder, Translator, and Decoder modules. (b) Structure of the MCAU and MSEA modules.
Figure 2.
Main framework of the MCAU module, in which MSDA and MLSA capture spatial displacements and local-state features, respectively.
Figure 3.
Capture of local-state and spatial-displacement information, followed by fusion of the spatial information and learning of its spatiotemporal patterns of change.
Figure 4.
On the public Moving MNIST dataset, the KTH and Human3.6m video prediction datasets, and the GPM satellite remote sensing precipitation dataset, the predicted targets exhibit two forms of dynamic change. In PMSTD-Net, MLSA and MSDA capture these at different scales, characterising the localised changes in target posture and the dynamics of spatial displacement, respectively, while MSEA fuses the spatial features at different scales and learns their spatiotemporal evolution. The small green arrows in the figure indicate the trend in the spatial location of the predicted targets.
Figure 5.
The main framework and structure of MSEA.
Figure 6.
The spatial extent within the blue dotted line is the area included in the GPM satellite precipitation dataset.
Figure 7.
Visualization of the Moving MNIST dataset.
Figure 8.
Visualization of the prediction results on the dataset.
Figure 9.
Variation of MSE with forecast duration for different models on the GPM satellite remote sensing precipitation dataset.
Figure 10.
Changes in CSI and Precision metrics with increasing forecast length for different models on the GPM satellite precipitation dataset.
Figure 11.
Visualization of the prediction results of different models on the GPM satellite remote sensing precipitation dataset.
Figure 12.
Receptive-field visualizations of the single-layer MLSA module, the MSDA module, and their combination (the MCAU module) on the first ten frames of Moving MNIST.
Figure 13.
MSE plotted against parameter count (Params) and training time for PMSTD-Net and each comparison experiment on the Moving MNIST dataset. PMSTD-Net is shown in red and the comparison experiments in blue. Arrows indicate the direction of model optimisation.
Table 1.
Dataset statistics.
Dataset | N_train | N_test | Input (C,T,H,W) | Output (C,T’,H,W) |
---|
Moving MNIST [4] | 10,000 | 10,000 | (1,10,64,64) | (1,10,64,64) |
KTH [5] | 5200 | 3167 | (1,10,128,128) | (1,20,128,128) |
Human3.6m [6] | 2624 | 1135 | (3,4,128,128) | (3,4,128,128) |
GPM Precipitation | 7280 | 1456 | (1,6,144,144) | (1,6,144,144) |
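The shape columns of Table 1 can be written down as a small lookup, which makes the input and output clip lengths explicit. This is only an illustrative sketch: the names `ClipShape`, `DATASET_SHAPES`, and `values_per_clip` are hypothetical, and reading (C,T,H,W) as the input clip and (C,T’,H,W) as the predicted clip is an assumption based on the column layout.

```python
from typing import NamedTuple


class ClipShape(NamedTuple):
    """Shape of one clip as (channels, frames, height, width)."""
    channels: int
    frames: int
    height: int
    width: int


# (input shape, output shape) per dataset, copied from Table 1.
# Assumption: the first shape is the observed clip, the second the predicted clip.
DATASET_SHAPES = {
    "Moving MNIST": (ClipShape(1, 10, 64, 64), ClipShape(1, 10, 64, 64)),
    "KTH": (ClipShape(1, 10, 128, 128), ClipShape(1, 20, 128, 128)),
    "Human3.6m": (ClipShape(3, 4, 128, 128), ClipShape(3, 4, 128, 128)),
    "GPM Precipitation": (ClipShape(1, 6, 144, 144), ClipShape(1, 6, 144, 144)),
}


def values_per_clip(shape: ClipShape) -> int:
    """Number of scalar values in one clip of the given shape."""
    return shape.channels * shape.frames * shape.height * shape.width


if __name__ == "__main__":
    for name, (inp, out) in DATASET_SHAPES.items():
        print(f"{name}: {inp.frames} frames in, {out.frames} frames out")
```

KTH is the only dataset in the table whose output clip (20 frames) is longer than its input clip (10 frames).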
Table 2.
Setting of training parameters and model parameters on the four datasets.
Parameter Type | Moving MNIST | KTH | Human3.6m | GPM Precipitation |
---|
Seed | 1 | 1 | 1 | 1 |
Batchsize | 16 | 4 | 16 | 16 |
Training epochs | 2000 | 100 | 100 | 30 |
Learning rate | 0.001 | 0.001 | 0.01 | 0.001 |
Layers of Encoder | 4 | 2 | 2 | 2 |
Layers of Decoder | 4 | 2 | 2 | 2 |
Layers of Translator | 8 | 7 | 5 | 5 |
Number of Encoder channels | 64 | 64 | 64 | 64 |
Number of Decoder channels | 64 | 64 | 64 | 64 |
Number of Translator channels | 512 | 256 | 256 | 256 |
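For readers who want the settings of Table 2 in machine-readable form, the sketch below transcribes them into a frozen dataclass. The class name `TrainConfig`, its field names, and the `CONFIGS` mapping are hypothetical and are not taken from the authors' code; only the values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training and model settings for one dataset (values from Table 2)."""
    seed: int
    batch_size: int
    epochs: int
    learning_rate: float
    encoder_layers: int
    decoder_layers: int
    translator_layers: int
    encoder_channels: int
    decoder_channels: int
    translator_channels: int


# One entry per column of Table 2; the field names are illustrative.
CONFIGS = {
    "Moving MNIST": TrainConfig(
        seed=1, batch_size=16, epochs=2000, learning_rate=0.001,
        encoder_layers=4, decoder_layers=4, translator_layers=8,
        encoder_channels=64, decoder_channels=64, translator_channels=512),
    "KTH": TrainConfig(
        seed=1, batch_size=4, epochs=100, learning_rate=0.001,
        encoder_layers=2, decoder_layers=2, translator_layers=7,
        encoder_channels=64, decoder_channels=64, translator_channels=256),
    "Human3.6m": TrainConfig(
        seed=1, batch_size=16, epochs=100, learning_rate=0.01,
        encoder_layers=2, decoder_layers=2, translator_layers=5,
        encoder_channels=64, decoder_channels=64, translator_channels=256),
    "GPM Precipitation": TrainConfig(
        seed=1, batch_size=16, epochs=30, learning_rate=0.001,
        encoder_layers=2, decoder_layers=2, translator_layers=5,
        encoder_channels=64, decoder_channels=64, translator_channels=256),
}

if __name__ == "__main__":
    print(CONFIGS["KTH"])
```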
Table 3.
Comparison of PMSTD-Net with models from recent years on the Moving MNIST dataset on MSE, MAE, and SSIM metrics, where (↓) lower or (↑) higher indicates better predictions.
Method | Conference | MSE (↓) | MAE (↓) | SSIM (↑) |
---|
ConvLSTM [10] | NIPS 2015 | 103.3 | 182.9 | 0.707 |
FRNN [30] | CVPR 2017 | 69.7 | - | 0.813 |
PredRNN [11] | NIPS 2017 | 56.8 | 126.1 | 0.867 |
PredRNN++ [12] | ICML 2018 | 46.5 | 106.8 | 0.898 |
MIM [14] | CVPR 2019 | 44.2 | 101.1 | 0.910 |
LMC [31] | CVPR 2021 | 41.5 | - | 0.924 |
E3D-LSTM [13] | ICLR 2018 | 41.3 | 87.2 | 0.910 |
MAU [15] | NIPS 2021 | 27.6 | - | 0.937 |
MotionRNN [16] | CVPR 2021 | 25.1 | - | 0.920 |
PhyDNet [32] | CVPR 2020 | 24.4 | 70.3 | 0.947 |
SimVP [8] | CVPR 2022 | 23.8 | 68.9 | 0.948 |
CrevNet [33] | ICLR 2020 | 22.3 | - | 0.949 |
MMVP [20] | ICCV 2023 | 22.2 | - | 0.952 |
TAU [21] | CVPR 2023 | 19.8 | 60.3 | 0.957 |
SwinLSTM [17] | ICCV 2023 | 17.7 | - | 0.962 |
PMSTD-Net | - | 14.8 | 49.6 | 0.968 |
Table 4.
Comparison of SSIM and PSNR metrics for different models on KTH, where (↑) higher indicates better predictions.
Method | SSIM (↑) | PSNR (↑) |
---|
ConvLSTM [10] | 0.712 | 23.58 |
DFN [18] | 0.794 | 27.26 |
MCnet [34] | 0.804 | 25.95 |
PredRNN [11] | 0.839 | 27.55 |
PredRNN++ [12] | 0.865 | 28.47 |
E3D-LSTM [13] | 0.870 | 29.31 |
MMVP [20] | 0.906 | 27.54 |
SimVP [8] | 0.905 | 33.72 |
TAU [21] | 0.911 | 34.13 |
PMSTD-Net | 0.916 | 34.65 |
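As a reminder of what the PSNR column measures, the standard definition is PSNR = 10 log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below implements that textbook formula for flattened frames. It is an assumption that the reported values follow this definition; any per-frame averaging or pixel scaling used for Table 4 is not specified here, and the function name `psnr` and the default `max_val=1.0` are illustrative.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Textbook definition: 10 * log10(max_val**2 / MSE).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical frames: the error is zero
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # Every pixel is off by 0.1 on a [0, 1] scale, so MSE is about 0.01.
    print(psnr([0.0, 0.0], [0.1, 0.1]))
```

Because the scale is logarithmic, a gain of about 0.5 dB corresponds to roughly an 11% reduction in MSE under this definition.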
Table 5.
Comparison of MAE, MSE and SSIM metrics under different model outputs on Human3.6m, where (↓) lower or (↑) higher indicates better predictions.
Method | MSE/10 (↓) | MAE/100 (↓) | SSIM (↑) |
---|
ConvLSTM [10] | 50.4 | 18.9 | 0.776 |
PredRNN [11] | 48.4 | 18.9 | 0.781 |
PredRNN++ [12] | 45.8 | 17.2 | 0.851 |
MIM [14] | 42.9 | 17.8 | 0.790 |
E3D-LSTM [13] | 46.4 | 16.6 | 0.869 |
MotionRNN [16] | 34.2 | 14.8 | 0.846 |
PhyDNet [32] | 36.9 | 16.2 | 0.901 |
SimVP [8] | 31.6 | 15.1 | 0.904 |
PMSTD-Net | 30.2 | 13.5 | 0.910 |
Table 6.
Comparison of MSE, CSI, and Precision metrics of different models on the GPM satellite remote sensing precipitation dataset, where (↓) lower or (↑) higher indicates better predictions.
Method | MSE (↓) | CSI 0.1 mm (↑) | CSI 1.0 mm (↑) | CSI 5.0 mm (↑) | Precision 0.1 mm (↑) | Precision 1.0 mm (↑) | Precision 5.0 mm (↑) |
---|
PredRNN++ [12] | 1.555 | 0.463 | 0.455 | 0.281 | 0.511 | 0.618 | 0.578 |
PredRNNv2 [26] | 1.524 | 0.467 | 0.469 | 0.295 | 0.514 | 0.656 | 0.587 |
MotionRNN [16] | 1.507 | 0.459 | 0.475 | 0.300 | 0.499 | 0.659 | 0.583 |
SimVP [8] | 1.403 | 0.470 | 0.485 | 0.310 | 0.508 | 0.645 | 0.617 |
TAU [21] | 1.447 | 0.501 | 0.482 | 0.301 | 0.548 | 0.673 | 0.604 |
PMSTD-Net | 1.382 | 0.515 | 0.486 | 0.312 | 0.577 | 0.666 | 0.629 |
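The CSI and Precision columns are threshold-based scores: each grid cell is labelled as an event when precipitation reaches the threshold, and predictions are then counted as hits, false alarms, or misses. The sketch below uses the standard definitions CSI = hits / (hits + false alarms + misses) and Precision = hits / (hits + false alarms). The function name `csi_and_precision`, the use of `>=` for the threshold test, and the flattened-sequence input are assumptions for illustration, not details confirmed by the table.

```python
def csi_and_precision(pred, obs, threshold):
    """Return (CSI, Precision) for flattened predicted and observed rainfall.

    A cell counts as an event when its value is >= threshold (an assumption).
    """
    hits = false_alarms = misses = 0
    for p, o in zip(pred, obs):
        pred_event = p >= threshold
        obs_event = o >= threshold
        if pred_event and obs_event:
            hits += 1
        elif pred_event:
            false_alarms += 1
        elif obs_event:
            misses += 1
    csi_den = hits + false_alarms + misses
    prec_den = hits + false_alarms
    csi = hits / csi_den if csi_den else float("nan")
    precision = hits / prec_den if prec_den else float("nan")
    return csi, precision


if __name__ == "__main__":
    predicted = [0.0, 0.5, 2.0, 6.0]  # toy values, not data from the paper
    observed = [0.0, 1.5, 0.5, 7.0]
    for thr in (0.1, 1.0, 5.0):  # thresholds used in Table 6
        print(thr, csi_and_precision(predicted, observed, thr))
```

CSI penalises both false alarms and misses, whereas Precision ignores misses, which is one reason a model can rank differently on the two metrics at the same threshold.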
Table 7.
Ablation results for the MLSA, MSDA, and MSEA modules of PMSTD-Net on the Moving MNIST, Human3.6m, and GPM satellite precipitation datasets, where (↓) lower or (↑) higher indicates better predictions.
Dataset | Type/Index | Method 1 | Method 2 | Method 3 | PMSTD-Net |
---|
– | MLSA | √ | | √ | √ |
– | MSDA | | √ | √ | √ |
– | MSEA | | | | √ |
Moving MNIST | MSE (↓) | 39.05 | 34.81 | 32.21 | 27.02 |
Moving MNIST | MAE (↓) | 105.51 | 96.54 | 89.91 | 78.08 |
Moving MNIST | SSIM (↑) | 0.907 | 0.919 | 0.926 | 0.939 |
Human3.6m | MSE/10 (↓) | 38.5 | 31.5 | 30.8 | 30.2 |
Human3.6m | MAE/100 (↓) | 16.2 | 13.6 | 15.3 | 13.5 |
Human3.6m | SSIM (↑) | 0.895 | 0.907 | 0.905 | 0.910 |
GPM Precipitation | MSE (↓) | 1.463 | 1.576 | 1.487 | 1.382 |
GPM Precipitation | CSI 0.1 mm (↑) | 0.448 | 0.269 | 0.467 | 0.515 |
GPM Precipitation | CSI 1.0 mm (↑) | 0.474 | 0.436 | 0.463 | 0.486 |
GPM Precipitation | CSI 5.0 mm (↑) | 0.262 | 0.222 | 0.268 | 0.312 |
GPM Precipitation | Precision 0.1 mm (↑) | 0.484 | 0.278 | 0.513 | 0.577 |
GPM Precipitation | Precision 1.0 mm (↑) | 0.625 | 0.639 | 0.638 | 0.666 |
GPM Precipitation | Precision 5.0 mm (↑) | 0.663 | 0.662 | 0.630 | 0.629 |
Table 8.
Replacement ablation of the modules of PMSTD-Net on the Moving MNIST dataset, where […] consists of convolutional kernels of sizes 3, 5, 7, and 9, and […] and […] consist of 3 convolutional kernels and a dilated convolution with a dilation rate of 3. (↓) lower or (↑) higher indicates better predictions.
– | Type/Index | Method 1 | Method 2 | Method 3 | Method 4 | Method 5 | PMSTD-Net |
---|
Part A | […] | √ | | | | √ | |
Part A | […] | | | √ | | | |
Part A | MLSA | | √ | | √ | | √ |
Part B | […] | √ | √ | | | | |
Part B | […] | | | √ | | | |
Part B | MSDA | | | | √ | √ | √ |
Part C | […] | √ | | | √ | | |
Part C | MSEA | | √ | √ | | √ | √ |
Moving MNIST | MSE (↓) | 28.96 | 28.36 | 27.78 | 27.54 | 27.21 | 27.02 |
Moving MNIST | MAE (↓) | 81.74 | 81.06 | 79.57 | 79.12 | 78.47 | 78.08 |
Moving MNIST | SSIM (↑) | 0.935 | 0.936 | 0.938 | 0.938 | 0.939 | 0.939 |
Moving MNIST | Params (↓) | 60.33 M | 53.49 M | 56.10 M | 56.15 M | 63.10 M | 56.15 M |
Moving MNIST | FLOPs (↓) | 20.04 G | 18.29 G | 18.96 G | 18.97 G | 20.73 G | 18.97 G |
Moving MNIST | Training Time (↓) | 545 s | 369 s | 175 s | 175 s | 383 s | 175 s |