**1. Introduction**

Facing the depletion of finite fossil fuels and the demands of carbon emission reduction, photovoltaic (PV) energy has attracted widespread interest in recent years [1,2]. As a sustainable and flexible distributed energy source, PV power has become a significant part of the power grid with the rapid growth of its installed capacity worldwide [3]. However, the high intermittency and uncertainty of PV power make accurate prediction difficult and pose technical challenges for reliable power system operation and control [4,5]. Meteorological factors, e.g., solar irradiance, temperature, wind speed and direction, cloud changes, and air pressure, are the primary causes of this irregular fluctuation. Therefore, effective PV prediction requires a model that can take into account and handle various meteorological factors.

For PV power prediction, the choice of prediction model and forecast horizon mainly depends on the data sources. Traditional methods such as ARIMA [6] and exponential smoothing [7] can hardly deal with multivariate data and thus usually take the historical PV power series as the sole input. They are also limited in modeling non-stationary changes. To leverage meteorological information, many previous studies employ local measurements [8] or numerical weather prediction (NWP) data [9]. However, these methods cannot capture the volatility of PV power caused by cloud changes because they ignore data sources that contain cloud conditions. More specifically, the solar irradiance reaching the PV panels mainly depends on the amount and motion of clouds [10,11] recorded in consecutive cloud images. In light of this, ground-based or satellite remote-sensing images are valuable data sources to support accurate short-term PV power prediction.

**Citation:** Cao, H.; Yang, J.; Zhao, X.; Yao, T.; Wang, J.; He, H.; Wang, Y. Dual-Encoder Transformer for Short-Term Photovoltaic Power Prediction Using Satellite Remote-Sensing Data. *Appl. Sci.* **2023**, *13*, 1908. https://doi.org/10.3390/app13031908

Academic Editor: Sergio Toscani

Received: 10 December 2022 Revised: 19 January 2023 Accepted: 31 January 2023 Published: 1 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Several approaches employ ground-based sky images for the prediction of irradiance [12,13] and PV power [14]. With sky images, cloud motion vector (CMV) methods, which calculate the speed and direction of clouds, have been developed to improve irradiance prediction accuracy [15,16]. The value of these methods decreases considerably beyond a 30 min horizon, so they are commonly used in ultra-short-term PV power prediction [17]. Moreover, ground-based images are limited in observation range and require equipment installation. In contrast, satellite images offer top-down observation with greater spatial coverage, as well as advantages for prediction up to a few hours ahead [18]. They are not only suitable for CMV methods [11,19–21] but also provide other valuable information. Cai et al. [22] clustered the interval gray value of satellite images to construct a relationship with PV power. Wang et al. [23] proposed a CNN-based fluctuation pattern prediction model with satellite images as input and applied LSTMs to forecast the different patterns of PV power individually. Besides these methods that adopt only satellite images, researchers are increasingly interested in coupling satellite data with ground measurements to improve forecasting performance [24]. Si et al. [25] extracted a cloud cover factor from satellite images with a modified CNN and then combined it with meteorological information for irradiance prediction. Agoua et al. [26] presented a spatiotemporal model with Lasso to integrate multi-source data for 6-h-ahead prediction. Furthermore, they used the proposed model to assess the impact of each source on forecasting performance, which showed that satellite images improved the accuracy by 3%. Yao et al. [27] proposed an encoder–decoder model that uses U-Net and LSTM to extract spatial features from satellite short-wave radiation data and temporal features from meteorological sequences, respectively. Then, the spatial and temporal features were concatenated for intra-hour PV power prediction.
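To make the CMV idea above concrete, the following minimal sketch estimates a single cloud displacement between two consecutive frames by exhaustive block matching and then shifts the latest frame by that displacement. It is an illustrative stand-in under simplifying assumptions (one global motion vector, toy 8 × 8 images), not the procedure of the cited works; the names `estimate_cloud_motion`, `advect`, and `make_frame` are hypothetical.

```python
import math


def estimate_cloud_motion(prev_frame, next_frame, max_shift=3):
    """Return the (dy, dx) pixel shift that best maps prev_frame onto next_frame.

    Exhaustive block matching: try every shift within +/- max_shift and keep
    the one with the lowest mean squared difference over the overlapping area.
    """
    rows, cols = len(prev_frame), len(prev_frame[0])
    best_shift, best_error = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            error, count = 0.0, 0
            for r in range(rows):
                for c in range(cols):
                    r2, c2 = r + dy, c + dx
                    if 0 <= r2 < rows and 0 <= c2 < cols:
                        diff = next_frame[r2][c2] - prev_frame[r][c]
                        error += diff * diff
                        count += 1
            if count and error / count < best_error:
                best_error, best_shift = error / count, (dy, dx)
    return best_shift


def advect(frame, shift, fill=0.0):
    """Move every pixel by `shift`; pixels that enter the image get `fill`."""
    rows, cols = len(frame), len(frame[0])
    dy, dx = shift
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                out[r2][c2] = frame[r][c]
    return out


def make_frame(top, left, size=8):
    """Hypothetical toy image: a 2x2 'cloud' of ones on a clear background."""
    frame = [[0.0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 1.0
    return frame


# Two consecutive toy frames: the cloud moves 1 pixel down and 2 pixels right.
prev_frame = make_frame(top=2, left=2)
next_frame = make_frame(top=3, left=4)

shift = estimate_cloud_motion(prev_frame, next_frame)
speed = math.hypot(*shift)                                # pixels per frame
direction = math.degrees(math.atan2(shift[0], shift[1]))  # angle in image coordinates
forecast = advect(next_frame, shift)                      # naive one-step-ahead cloud field
```

The sketch assumes that the cloud field moves rigidly and keeps its shape between frames. Real clouds form, dissolve, and deform, which is consistent with the loss of value of CMV methods at longer horizons noted above.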

However, a flexible predictive framework capable of handling data sources with different structures (i.e., image and sequence) has not yet been well studied. Because the characteristics of these two structures are insufficiently considered, the spatiotemporal features extracted by the above methods are still not informative enough for PV power prediction up to several hours ahead. Although various deep neural networks have been applied in PV power prediction, the majority of existing methods consist of conventional deep learning models such as CNNs, RNNs, and fully connected layers. At present, few studies have adopted the state-of-the-art transformer [28] architecture, which has achieved great success in natural language processing [29] and computer vision [30]. Compared with convolutional or recurrent neural networks, transformer-based models with the self-attention mechanism are superior in capturing temporal dependencies. Simeunovic et al. [31] combined a graph neural network with a transformer to propose a graph–convolutional transformer for multi-site PV power prediction based on historical weather data. In addition, most existing models based on hybrid neural networks merely concatenate the spatial and temporal features before passing them as a single input to the following layer. As a result, this coarse treatment cannot effectively learn and utilize the correlation between the features of structured local measurements and unstructured satellite images.
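The difference between plain concatenation and attention-based fusion can be illustrated with a small sketch. The code below is a generic scaled dot-product cross-attention without learned projections, written only to contrast the two treatments; it is not the fusion mechanism of any cited model, and the feature values and the names `seq_features`, `img_features`, and `cross_attention` are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows.

    Learned projection matrices are omitted to keep the sketch short.
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical features: 3 time steps of local measurements and 4 image patches,
# both already embedded into 2 dimensions.
seq_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
img_features = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.3], [0.0, 0.4]]

# Treatment 1: concatenation flattens both sources into one vector and leaves
# any interaction between them to the following layer.
concatenated = [x for row in seq_features + img_features for x in row]

# Treatment 2: cross-attention lets every time step weight the image patches
# by their similarity to that time step.
fused = cross_attention(seq_features, img_features, img_features)
```

In the concatenated vector, the relation between a given time step and a given image patch is not represented explicitly. In the attention output, each time step receives its own weighted mixture of patch features, which is one way a model can express the correlation between the two sources.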

Motivated by the aforementioned issues, we adapt the transformer architecture and propose a novel transformer-based PV power prediction model that can process both locally measured sequences and satellite remote-sensing images. The main contributions are summarized as follows:

1. We propose a novel **dual**-**e**ncoder **t**ransformer (DualET) for short-term PV power prediction. Distinct from the standard transformer architecture, DualET contains dual encoders, namely a local seasonal information (LSI) encoder and a remote-sensing information (RSI) encoder, and a single decoder. The dual encoders are designed to handle sequence and image data, respectively, while the decoder models the joint meteorological features from the encoders and outputs the short-term prediction.

