2.4.2. Training and Testing Dates

The days within the study period are divided into one training sample and three testing samples. The training sample is fed into the ANN algorithms to build the photovoltaic power estimation models, while the testing samples are used to evaluate the performance of those models.

First, the pool of complete days is identified. Here, a complete day is defined as any day that contains at least all of its daytime hours (the hours between dawn and dusk, computed as indicated in the Data Preprocessing section) and has no missing data in any of its available hours, in either the monitoring or the GDAS dataset. From this pool of complete days, two days are selected at random from each month: one is added to the first testing sample and the other to the second.
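The completeness criterion defined above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `is_complete_day`, the layout of `records` (a dictionary keyed by hourly timestamps), and the integer parameters `dawn_hour` and `dusk_hour` are all assumptions introduced here.

```python
import math
from datetime import date, datetime


def is_complete_day(day, records, dawn_hour, dusk_hour):
    """Return True if `day` satisfies both completeness conditions.

    Assumed (hypothetical) inputs:
      records   -- dict mapping an hourly datetime to a dict of variable values
                   (monitoring and GDAS variables merged into one row)
      dawn_hour -- first daytime hour of the day, as an integer
      dusk_hour -- last daytime hour of the day, as an integer
    """
    # Keep only the rows that belong to the requested day.
    day_rows = {ts: row for ts, row in records.items() if ts.date() == day}

    # Condition 1: every daytime hour must be present.
    for hour in range(dawn_hour, dusk_hour + 1):
        if datetime(day.year, day.month, day.day, hour) not in day_rows:
            return False

    # Condition 2: no available hour may contain a missing value.
    for row in day_rows.values():
        for value in row.values():
            if value is None:
                return False
            if isinstance(value, float) and math.isnan(value):
                return False
    return True
```

Applying this predicate to every day of the study period would yield the pool of complete days from which the test days are drawn.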

The random selection algorithm ensures that all complete days from each month have the same probability of being included in either test sample. It also ensures that no day belongs to both random test samples. The first and second testing samples each comprise 22 individual days. These samples can be considered representative of the entire temporal span of the study, as all months are guaranteed to be equally represented. Consequently, all seasonal weather conditions are taken into account when testing the performance of the estimation models.
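One way to obtain both properties (equal inclusion probability within a month and disjoint samples) is to draw two days per month without replacement. The sketch below is illustrative only: the function name `draw_test_samples`, the `seed` parameter, and the choice to skip months with fewer than two complete days are assumptions, not details given in the study.

```python
import random
from collections import defaultdict
from datetime import date


def draw_test_samples(complete_days, seed=None):
    """Draw one day per month for each of two disjoint random test samples."""
    rng = random.Random(seed)

    # Group the complete days by calendar month (year and month together).
    by_month = defaultdict(list)
    for day in complete_days:
        by_month[(day.year, day.month)].append(day)

    first_sample, second_sample = [], []
    for month in sorted(by_month):
        days = by_month[month]
        if len(days) < 2:
            # Assumption: a month needs at least two complete days to contribute.
            continue
        # Sampling two days without replacement gives every day of the month
        # the same chance of landing in either sample, and never the same day
        # in both.
        day_a, day_b = rng.sample(days, 2)
        first_sample.append(day_a)
        second_sample.append(day_b)
    return first_sample, second_sample
```

Fixing `seed` makes the split reproducible, which is convenient when several ANN architectures must be compared on exactly the same test days.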

Two complete weeks of data are handpicked from the remaining complete days, that is, those not belonging to either of the two random test samples, to form the last testing sample. They span from 28th April to 5th May 2012 and from 12th April to 19th April 2013. This sample is used to represent graphically the behavior of the weather and power output variables. The two weeks are long enough to capture the multi-day features of the studied variables while still allowing hourly patterns to be distinguished. However, because they were chosen manually, the days in this testing sample should not be considered representative of the entire temporal span of the study.

Finally, the training sample includes all days not belonging to any of the three testing samples, and comprises the majority of the available data. It is fed into the different ANN architectures to build the estimation models.
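The resulting partition amounts to a set difference, which can be sketched as below. The function name `build_training_sample` and its arguments are hypothetical, and the example dates are placeholders rather than the days actually used in the study.

```python
from datetime import date


def build_training_sample(all_days, random_test_1, random_test_2, handpicked_test):
    """Return the sorted days left after removing the three testing samples."""
    tests = [set(random_test_1), set(random_test_2), set(handpicked_test)]

    # The three testing samples must not share any day.
    for i in range(len(tests)):
        for j in range(i + 1, len(tests)):
            if tests[i] & tests[j]:
                raise ValueError("testing samples overlap")

    held_out = set().union(*tests)
    return sorted(set(all_days) - held_out)


# Placeholder example: ten days, four of them held out for testing.
example_days = {date(2012, 4, d) for d in range(1, 11)}
example_train = build_training_sample(
    example_days,
    {date(2012, 4, 1)},
    {date(2012, 4, 2)},
    {date(2012, 4, 3), date(2012, 4, 4)},
)
```

Checking for overlaps before subtracting guards against a day being used both to train and to evaluate a model.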
