#### **2. Problem Definition**

This study uses two data sources to predict PV power: cloud images processed from satellite remote-sensing data and in situ data from real PV stations. The in situ data contain historical records of several meteorological factors and of PV power. A detailed description of the data is provided in Section 4.1. We denote the number of historical input steps as $L_{\mathrm{in}}$ and the number of prediction steps as $L_{\mathrm{out}}$. The number of in situ data attributes and the number of remote-sensing data channels are denoted as $D_{\mathrm{LS}}$ and $C_{\mathrm{RS}}$, respectively. In summary, the PV power forecasting problem can be formulated as

$$f(\mathbf{X}_{\mathrm{RS}}; \mathbf{X}_{\mathrm{LS}}) \longrightarrow \hat{\mathbf{y}} \tag{1}$$

where $f(\cdot)$ is the mapping function, i.e., the forecasting model; $\mathbf{X}_{\mathrm{RS}} \in \mathbb{R}^{L_{\mathrm{in}} \times H \times W \times C_{\mathrm{RS}}}$ and $\mathbf{X}_{\mathrm{LS}} \in \mathbb{R}^{L_{\mathrm{in}} \times D_{\mathrm{LS}}}$ are the input remote-sensing data and local sequences (in situ data), respectively; and $\hat{\mathbf{y}} \in \mathbb{R}^{L_{\mathrm{out}}}$ is the PV power prediction. The hidden dimension of the model is denoted as $D$.
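To make the shapes in Eq. (1) concrete, the following is a minimal sketch in plain Python. It is not the proposed model: the mapping $f$ is replaced by a simple persistence baseline that repeats the last observed PV power, purely to illustrate the input and output dimensions. The names `ProblemDims`, `nested_shape`, `zeros`, and `persistence_forecast` are hypothetical, as is the assumption that PV power is stored in the last column of the in situ data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProblemDims:
    """Dimensions appearing in Eq. (1); the concrete values are illustrative."""
    l_in: int   # L_in: number of historical input steps
    l_out: int  # L_out: number of prediction steps
    h: int      # H: first spatial dimension of each cloud image
    w: int      # W: second spatial dimension of each cloud image
    c_rs: int   # C_RS: number of remote-sensing channels
    d_ls: int   # D_LS: number of in situ attributes

    @property
    def x_rs_shape(self) -> Tuple[int, int, int, int]:
        return (self.l_in, self.h, self.w, self.c_rs)

    @property
    def x_ls_shape(self) -> Tuple[int, int]:
        return (self.l_in, self.d_ls)


def nested_shape(x) -> Tuple[int, ...]:
    """Shape of a rectangular nested list, read along the first elements."""
    shape = []
    while isinstance(x, list) and x:
        shape.append(len(x))
        x = x[0]
    return tuple(shape)


def zeros(shape):
    """Build a nested list of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def persistence_forecast(x_rs, x_ls, dims: ProblemDims, power_col: int = -1):
    """Stand-in for f in Eq. (1): repeat the last observed PV power L_out times.

    Assumption: PV power sits in column `power_col` of X_LS. The images X_RS
    are only shape-checked here; a real model would also use their content.
    """
    assert nested_shape(x_rs) == dims.x_rs_shape
    assert nested_shape(x_ls) == dims.x_ls_shape
    last_power = x_ls[-1][power_col]
    return [last_power] * dims.l_out


# Toy example: 4 input steps, 2 output steps, 3x3 single-channel images.
dims = ProblemDims(l_in=4, l_out=2, h=3, w=3, c_rs=1, d_ls=5)
x_rs = zeros(dims.x_rs_shape)
x_ls = zeros(dims.x_ls_shape)
x_ls[-1][-1] = 7.5  # last observed PV power (arbitrary units)
y_hat = persistence_forecast(x_rs, x_ls, dims)
```

Here `y_hat` has length $L_{\mathrm{out}}$, matching $\hat{\mathbf{y}} \in \mathbb{R}^{L_{\mathrm{out}}}$; any forecasting model for this problem has to honor the same input and output shapes.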

#### **3. Methodology**

PV power generation fluctuates substantially with meteorological factors such as solar irradiance, temperature, wind speed and direction, and cloud changes. To predict PV power accurately, we propose a novel dual-encoder transformer (DualET) that captures context features from these factors. As shown in Figure 1, DualET has an encoder–decoder architecture like most transformers, but with two encoders. The first, the local seasonal information encoder (LSI encoder), models temporal dynamics from in situ measurements. The second, the remote-sensing information encoder (RSI encoder), learns the spatial and temporal features of clouds from satellite images. The outputs of the LSI and RSI encoders capture fluctuations in the local measurements and in the clouds, respectively. Combining them provides comprehensive fluctuation information for modeling changes in the PV power series. Finally, we design a joint feature decoder to predict future short-term PV power. The details of DualET are introduced in the following subsections.

**Figure 1.** Model architecture of DualET.
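The data flow described above can be sketched as follows. This is only a shape-level illustration under strong assumptions, not the DualET implementation: both encoders are replaced by simple averaging placeholders, the two encoder outputs are combined by element-wise addition (the text says only that they are combined), and the decoder collapses the joint features into a constant forecast. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def lsi_encoder(x_ls: Matrix, d: int) -> Matrix:
    """Placeholder LSI encoder: (L_in, D_LS) -> (L_in, D).

    Each time step is summarized by the mean of its attributes.
    """
    return [[sum(step) / len(step)] * d for step in x_ls]


def rsi_encoder(x_rs, d: int) -> Matrix:
    """Placeholder RSI encoder: (L_in, H, W, C_RS) -> (L_in, D).

    Each image is summarized by the mean over all pixels and channels.
    """
    feats = []
    for image in x_rs:
        values = [v for row in image for pixel in row for v in pixel]
        feats.append([sum(values) / len(values)] * d)
    return feats


def joint_feature_decoder(lsi_feat: Matrix, rsi_feat: Matrix, l_out: int) -> List[float]:
    """Placeholder decoder: two (L_in, D) feature maps -> (L_out,).

    Assumption: features are combined by element-wise addition.
    """
    joint = [[a + b for a, b in zip(row_l, row_r)]
             for row_l, row_r in zip(lsi_feat, rsi_feat)]
    summary = sum(sum(row) for row in joint) / (len(joint) * len(joint[0]))
    return [summary] * l_out


def dual_encoder_forward(x_rs, x_ls: Matrix, d: int, l_out: int) -> List[float]:
    """Run both encoders, then decode the joint features into a forecast."""
    return joint_feature_decoder(lsi_encoder(x_ls, d), rsi_encoder(x_rs, d), l_out)


# Toy inputs: L_in = 2, D_LS = 2, H = 1, W = 2, C_RS = 1.
x_ls = [[1.0, 3.0], [2.0, 4.0]]
x_rs = [[[[0.0], [2.0]]], [[[4.0], [6.0]]]]
y_hat = dual_encoder_forward(x_rs, x_ls, d=4, l_out=3)
```

The point of the sketch is the interface: both encoders map their inputs to features of a shared hidden size $D$, so the decoder can consume them jointly and emit $L_{\mathrm{out}}$ values.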
