*4.1. Data Cleaning*

The experiments use a historical dataset collected from a photovoltaic power station at a sampling interval of one day [SM]. Each record contains the daily average temperature, maximum temperature, minimum temperature, daily sunshine duration, and daily generated energy. Each input sample is a 4-dimensional vector of the four environmental features above, and the corresponding output value is the daily generated energy.

Data cleaning proceeds in two steps. First, data samples with missing or invalid features must be handled. In this paper, samples with invalid features are removed directly.
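The removal step can be sketched as follows. This is a minimal illustration, not the procedure used in the experiments: the validity criterion (a finite real number, with `None` and NaN treated as invalid), the function names `is_valid` and `drop_invalid_samples`, and the sample rows are all assumptions introduced here.

```python
import math

def is_valid(value):
    """Assumed validity criterion: a finite real number (not None, NaN, or inf)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )

def drop_invalid_samples(samples):
    """Keep only the samples in which every feature and the target are valid."""
    return [row for row in samples if all(is_valid(v) for v in row)]

# Hypothetical rows: (avg temp, max temp, min temp, sunshine duration, daily energy)
raw = [
    (21.5, 27.0, 16.2, 8.4, 310.0),
    (19.8, None, 14.1, 6.0, 250.0),           # missing maximum temperature
    (23.1, 29.4, 17.5, float("nan"), 330.0),  # invalid sunshine duration
    (18.0, 22.3, 13.9, 3.2, 140.0),
]
clean = drop_invalid_samples(raw)
```

Dropping whole samples keeps each remaining input vector paired with its own output value, at the cost of discarding the valid features of a removed day.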

Second, the features span different value ranges, so the feature data must be normalized. The normalized value is calculated as:

$$\begin{cases} \overline{\mathbf{x}} = \frac{1}{n} \sum\limits_{i=1}^{n} \mathbf{x}_{i} \\ \mathrm{std}(\mathbf{x}) = \sqrt{\frac{1}{n} \sum\limits_{i=1}^{n} (\mathbf{x}_{i} - \overline{\mathbf{x}})^{2}} \\ \mathbf{y}_{i} = \frac{\mathbf{x}_{i} - \overline{\mathbf{x}}}{\mathrm{std}(\mathbf{x})} \end{cases} \tag{12}$$

where $\mathbf{x}_i$ is the *i*-th original feature value; $\mathbf{y}_i$ is the *i*-th normalized feature value; and *n* is the number of data samples.
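As a small worked illustration of Eq. (12), the sketch below normalizes one feature column using the population standard deviation (the 1/*n* form of the equation). The function names `zscore` and `normalize_features`, the zero-variance check, the sample values, and the choice to normalize each feature column independently are assumptions made for this example.

```python
import math

def zscore(values):
    """Normalize one feature column following Eq. (12), using the 1/n standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # Eq. (12) is undefined for a constant feature; fail loudly (assumed behaviour).
        raise ValueError("feature is constant; standard deviation is zero")
    return [(v - mean) / std for v in values]

def normalize_features(rows):
    """Apply zscore to each feature column of a list of samples independently."""
    columns = list(zip(*rows))
    normalized = [zscore(list(col)) for col in columns]
    return [list(sample) for sample in zip(*normalized)]

# Hypothetical sunshine-duration values: mean 5.0, standard deviation 2.0
sunshine = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = zscore(sunshine)
# scaled == [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After normalization each feature has zero mean and unit standard deviation, so features measured in different units contribute on a comparable scale.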
