*3.1. Dataset Description*

Both the panel data and the meteorological data of the full dataset were used in this study, as detailed in the methodology. The panels log their power readings only during the daytime, i.e., from sunrise to sunset. From sunset to sunrise, the panels do not provide any information about their output power, although their average temperature is still recorded. Therefore, to maximize the information in our data, we filtered out the period between sunset and sunrise, whose length varies between winter and summer. We then added the meteorological data, which consisted of cloudiness, relative humidity, and sun time, to the panel data. Meteorological data were recorded as one sample per day, whereas panel data were recorded every 5 min. Therefore, we created three new columns in every panel's data file and filled them with the meteorological values for the corresponding day, repeating each daily value n times, where n is the number of rows in that panel's data file for that day. In this way, we linked the solar data to the meteorological data. To allow a fair comparison, we used the same number of epochs and the same batch size for both models.
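
The step of repeating each daily meteorological sample across the 5 min panel rows of the same day can be sketched as follows. This is a minimal illustration only, not the code used in the study: the function name `attach_daily_weather`, the field names (`timestamp`, `power_kwh`, `cloudiness`, `relative_humidity`, `sun_time_h`), and the sample values are all hypothetical.

```python
from datetime import datetime, timedelta


def attach_daily_weather(panel_rows, daily_weather):
    """Copy the day's meteorological values onto every panel row of that day.

    panel_rows: list of dicts, each with a "timestamp" (datetime) key.
    daily_weather: dict mapping a date to a dict of daily values.
    """
    merged = []
    for row in panel_rows:
        day = row["timestamp"].date()
        weather = daily_weather[day]  # one meteorological sample per day
        merged.append({**row, **weather})  # panel fields plus three new columns
    return merged


# Hypothetical sample: three consecutive 5-minute readings on a single day.
start = datetime(2020, 6, 1, 6, 0)
panel_rows = [
    {"timestamp": start + timedelta(minutes=5 * i), "power_kwh": 0.1 * i}
    for i in range(3)
]
daily_weather = {
    start.date(): {
        "cloudiness": 3.0,
        "relative_humidity": 55.0,
        "sun_time_h": 11.5,
    }
}

merged = attach_daily_weather(panel_rows, daily_weather)
```

Every row in `merged` keeps its original panel fields and gains the same three daily values, so the 5 min resolution of the panel data is unchanged.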

Deep ANN and LSTM models were used in this study to predict the daily solar radiation. The inputs were the outputs of the amorphous silicon, mono silicon, and poly silicon PV panels (each in kWh), the average atmospheric temperature (°C), the average panel temperature (°C), the daily average cloudiness, the daily average relative humidity (%), and the daily sunshine time (hours). The output was the predicted radiation amount (W/m²).

Four years of data were used in the proposed study. The data were split into 3 years for training and 1 year for testing. Results were computed for each model in terms of mean squared error (MSE). The training set was further split into training and validation sets for both models. Finally, the two trained models were evaluated on the testing set. The assumption was that a model trained on 3 years of data can then be used for prediction throughout the 4th year with the same range of error. For both models, we concatenated the daily meteorological data with the data acquired from the PV panels. To preserve the data acquired by the panels, each daily meteorological value was repeated in the rows corresponding to that day. We trained both models on aggregated data, averaging every 12 rows (= 60 min) and predicting the solar radiation of the following 48 rows (= predicting the solar radiation after 48 h). The data were then normalized to the range 0 to 1, and the normalized data were used for the learning process.
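
The aggregation, normalization, and 48 h target construction described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation. The helper names (`hourly_means`, `min_max_scale`, `make_pairs`) and the synthetic ramp data are hypothetical. The sketch interprets "the following 48 rows" as a target 48 hourly rows ahead, and it drops an incomplete trailing block of fewer than 12 rows. Neither choice is specified in the text. The sketch also normalizes after aggregation for simplicity.

```python
def hourly_means(values, rows_per_hour=12):
    """Average consecutive blocks of 12 five-minute rows into one hourly value."""
    n_full = len(values) // rows_per_hour  # assumption: drop an incomplete last block
    return [
        sum(values[i * rows_per_hour:(i + 1) * rows_per_hour]) / rows_per_hour
        for i in range(n_full)
    ]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_pairs(series, horizon=48):
    """Pair each hourly value with the value `horizon` rows (hours) later."""
    return [
        (series[t], series[t + horizon])
        for t in range(len(series) - horizon)
    ]


# Hypothetical 5-minute radiation readings: 60 hours of a simple ramp.
raw = [float(i) for i in range(60 * 12)]

hourly = hourly_means(raw)    # 60 hourly averages
scaled = min_max_scale(hourly)  # values in [0, 1]
pairs = make_pairs(scaled)    # (input, target 48 h later) pairs
```

With 60 hourly rows and a 48 h horizon, only the first 12 rows have a target available. For the same reason, the last 48 hourly rows of a real training set cannot be used as inputs.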
