#### 4.4.2. Decomposition Blocks in Dual Encoders

We evaluated the impact of the decomposition blocks in the dual encoders by removing either the series decomposition (SD) blocks or the wavelet transform (WT) blocks. As shown in Table 3, dual encoders without decomposition blocks suffer some performance degradation, which shows that WT and SD enhance the learning capacity of the dual encoders and provide informative features for the decoder. In addition, the RSI encoder without WT performs worse than the LSI encoder without SD, which indicates that the high- and low-frequency features of cloud images are essential for short-term PV power prediction.
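To make the two decomposition operations concrete, the following is a minimal numpy sketch: a moving-average series decomposition that splits a sequence into trend and seasonal parts, and a single-level Haar wavelet split into low- and high-frequency components. The function names, kernel size, and the choice of the Haar wavelet are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def series_decomp(x, kernel=5):
    """Split a 1-D series into a trend (moving average) and a seasonal residual."""
    pad = kernel // 2
    xp = np.pad(x, (pad, pad), mode="edge")               # replicate-pad the ends
    trend = np.convolve(xp, np.ones(kernel) / kernel, mode="valid")
    seasonal = x - trend                                   # residual after removing the trend
    return seasonal, trend

def haar_wt(x):
    """Single-level Haar transform: low-frequency (approximation) and
    high-frequency (detail) halves of the input."""
    x = x[: len(x) // 2 * 2]                               # truncate to even length
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

# Toy PV-like signal: a daily cycle plus noise.
x = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * np.random.randn(64)
seasonal, trend = series_decomp(x)
low, high = haar_wt(x)
print(seasonal.shape, trend.shape, low.shape, high.shape)  # (64,) (64,) (32,) (32,)
```

Note that both splits are exactly invertible here: `seasonal + trend` recovers the input, and the Haar low/high pair reconstructs the even/odd samples, so no information is discarded before the encoders.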

**Table 3.** Ablation studies of the decomposition structures. SD and WT denote the series decomposition block and the wavelet transform block, respectively. W/o means "without".


#### 4.4.3. Different Attention Modules

In DualET, the decoder's self- and cross-attention modules take the ProbSparse form, while the other attention modules are enhanced by the fast Fourier transform (FFT). As shown in Table 4, we evaluated the impact of the different attention modules, including the cross-domain attention module. Making all attention modules the same type (all ProbSparse or all FFT-enhanced) degrades performance; we therefore employed different attention modules, each with suitable modifications, to better capture the relevant correlations. Moreover, the prediction error of the model without cross-domain attention increases significantly, which demonstrates the effectiveness of cross-domain attention in learning the correlation between features of sequence data and image data.
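The idea behind cross-domain attention can be sketched as standard scaled dot-product attention in which the queries come from the temporal (sequence) features and the keys/values come from the image features. The numpy sketch below uses random projection matrices and illustrative shapes; it shows the information flow only, not the paper's trained modules or its ProbSparse/FFT modifications.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_domain_attention(seq_feat, img_feat, d_k=16, rng=None):
    """Attend from temporal features (queries) to image features (keys/values)."""
    rng = np.random.default_rng(0) if rng is None else rng
    d_seq, d_img = seq_feat.shape[-1], img_feat.shape[-1]
    # Illustrative random projections; in the model these would be learned weights.
    Wq = rng.standard_normal((d_seq, d_k)) / np.sqrt(d_seq)
    Wk = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    Wv = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    Q, K, V = seq_feat @ Wq, img_feat @ Wk, img_feat @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))    # (L_seq, L_img) attention weights
    return attn @ V                           # fused features, one row per time step

seq_feat = np.random.randn(24, 32)   # e.g., 24 time steps of local sequence features
img_feat = np.random.randn(49, 64)   # e.g., 49 image patch features
out = cross_domain_attention(seq_feat, img_feat)
print(out.shape)  # (24, 16)
```

Each output row is a weighted mixture of image-patch features, with weights set by how strongly that time step's temporal features align with each patch, which is how the decoder can condition the power forecast on cloud information.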

**Table 4.** Different attention modules. W/o means "without".


#### **5. Conclusions**

To handle satellite images and ground measurements, in this paper, we proposed a novel transformer-based model with dual encoders, named DualET, for short-term PV power prediction. To obtain detailed cloud information from satellite images, a two-dimensional wavelet transform block and a residual block were used in the remote-sensing information encoder. For the local seasonal information encoder, we applied self-attention and series decomposition to learn the temporal patterns from local sequences. For the decoder, we employed three types of attention modules and series decomposition blocks to model the joint features of local and remote-sensing information and output the prediction. Specifically, a cross-domain attention module was proposed to learn the correlation between the temporal features and cloud information. Finally, experiments on real-world datasets, including PV station data and satellite images, were presented to show the prediction performance of DualET, and the ablation studies show the effectiveness of our design. In the future, we will attempt to improve the model architecture so that more data sources (e.g., NWP or other satellite remote-sensing data) can be utilized to predict over longer horizons.

**Author Contributions:** Conceptualization, J.W. and H.H.; methodology, H.C. and J.Y.; software, H.C. and J.Y.; validation, X.Z., T.Y. and Y.W.; writing—original draft preparation, H.C. and X.Z.; writing—review and editing, T.Y., J.W. and Y.W.; visualization, J.Y. and X.Z.; supervision, J.W. and H.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Key R&D Program of China (2021ZD0110403).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The remote-sensing data used in this paper were processed from Himawari-8 satellite data supplied by the P-Tree System, Japan Aerospace Exploration Agency (JAXA). We also gratefully acknowledge the support of the MindSpore team.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
