**1. Introduction**

Anthropogenic climate change and increasing levels of pollution are critical issues that continue to motivate the transition from fossil fuels to renewable energy sources. The Paris Agreement, signed by 196 countries during the 2015 United Nations Climate Change Conference, still drives most United Nations members to reduce their dependency on non-renewable energy sources. After stalling in 2018, installed renewable power is expected to increase by 1200 GW between 2019 and 2024. Solar PV is expected to account for more than half of this increase, followed by onshore wind systems [1]. Despite the disruptions caused in global energy markets by the COVID-19 pandemic, the total generation of renewable energy is expected to increase by up to 5% in 2020 [2].

Most renewable energy technologies rely on atmospheric conditions to generate electric power. Wind speed and direction determine the performance of wind turbines. Solar irradiation is the key factor that controls the power output of photovoltaic (PV) and thermal solar systems, with other variables like air temperature and humidity also affecting their performance. Reliable weather data is required to either evaluate the best placement for a new renewable installation or predict the future output of an existing system [3,4]. However, meteorological data is not always monitored at the location of renewable energy production systems, so historical local datasets may not be available. When in situ weather measurements are not available, physical or statistical models for computing the required variables [5], spatial interpolations of data from regional networks of weather stations [6], or Numerical Weather Prediction (NWP) models [7] can be used.

Several studies have attempted to forecast weather-related variables using Artificial Neural Network (ANN) models. In [8], nonlinear autoregressive (NAR) neural networks were used to predict the fluid flow rate in shallow aquifers. In [9], changes in precipitable water vapour were estimated by a nonlinear autoregressive approach with exogenous input (NARX). In [10], temperature and wind speed were predicted using different upgraded versions of convolutional neural networks (CNN). In [11], the performance of multilayer perceptron (MLP) and multigene genetic programming (MGGP) neural networks for estimating the solar irradiance in PV systems is compared.

In many publications, different regression and machine learning algorithms have also been used in combination with weather data from NWP models to forecast future PV power. In [12], the GPV-MSM mesoscale model (5 km horizontal resolution) of the Japanese Meteorological Agency was combined with a support vector regression algorithm. In [13], the Global Forecast System (GFS) 0.5 product (0.5° horizontal resolution) of the National Oceanic and Atmospheric Administration of the United States was used to feed a multivariate adaptive regression splines model. In [14], an ANN model was trained with the output of a numerical model from the European Centre for Medium-Range Weather Forecasts, although the name of the particular numerical model was not specified. ANNs are a family of machine learning techniques that are widely used to predict PV system output. An extensive review of forecasting techniques applied to PV power is presented in [15]; in that publication, ANN models are described as the most widely used option for PV forecasting among statistical, physical, and hybrid techniques. Another review on this issue is presented in [16], where ANN forecasting models are again the most widely represented algorithms.

The Global Forecast System is one of the most widely used global-scale NWP models [17–19], providing public, freely licensed, hourly weather forecasts over different grids of horizontal and vertical points covering the entire planet. Four GFS products are currently operated: the 1.00°, 0.50°, and 0.25° horizontal grid resolution models, and the surface flux (sflux) model, with a horizontal resolution of roughly 13 km. The Global Data Assimilation System (GDAS) is another NWP model. It is used to process observational measurements (aircraft, surface, satellite, and radar), which are scattered and irregular in nature, and place them into a regular, gridded space. Those gridded measurements are then used by other models like GFS as a starting point to develop weather forecasts. Four GDAS products are currently active, each one feeding initialisation data to one of the four aforementioned GFS products. Both the GFS and GDAS models are developed and maintained by the National Oceanic and Atmospheric Administration (NOAA), a governmental agency belonging to the United States Department of Commerce. More details about GFS and GDAS physics and subsystems can be found in [20].
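To put the angular resolutions above on the same footing as the roughly 13 km figure quoted for the sflux product, the spacing of a regular latitude-longitude grid can be approximated in kilometres. The following is only an illustrative sketch: it assumes a spherical Earth with a mean radius of 6371 km, and the function name `grid_spacing_km` and the 40° N example latitude are hypothetical choices made here, not values taken from the GFS or GDAS documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def grid_spacing_km(resolution_deg, latitude_deg=0.0):
    """Approximate (north-south, east-west) spacing in km of a regular
    latitude-longitude grid with the given angular resolution."""
    step_rad = math.radians(resolution_deg)
    # Arc length along a meridian does not depend on latitude.
    north_south = EARTH_RADIUS_KM * step_rad
    # Parallels shrink with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    # 40 deg N is an illustrative mid-latitude, not the site of the study.
    for res in (1.00, 0.50, 0.25):
        ns, ew = grid_spacing_km(res, latitude_deg=40.0)
        print(f"{res:.2f} deg: {ns:.1f} km north-south, {ew:.1f} km east-west")
```

Under these assumptions, one degree of latitude spans about 111 km, so the 0.25° grid has a north-south spacing of about 28 km, roughly twice as coarse as the sflux grid.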

One issue with GFS products is the lack of open, long-term data repositories offering access to their data outputs. Although all GFS and GDAS products are publicly available at the NOAA Operational Model Archive and Distribution System (NOMADS) near-real-time repository of NWP input and output data, each individual file is only stored there for 10 days [21]. For long-term storage, the NOAA maintains the Archive Information Request System (AIRS) [22], but only the coarser resolution GFS products (1.00° and 0.50°) are stored. However, this is not the case for GDAS products; outputs from the 0.25° product have been stored at the AIRS repository since June 2015, while sflux files have been stored since February 2012.

In this study, a real photovoltaic installation located in the south of Italy is modelled using ANNs, and its PV power production is predicted. Two data sources are used to feed the neural network models. The first one includes experimental weather and power production data gathered in [23] from the real PV system during 2012 and 2013. The second source uses weather data from the GDAS sflux model, one of the NWP models from the NOAA with the highest spatial resolution, and the one with the highest resolution outputs that are publicly available for the temporal span of the experimental data campaign. Three scenarios are designed with different combinations of monitoring and GDAS weather data to train and test the neural network model. PV power output data from the monitoring campaign are used in all scenarios.

The fundamental idea behind the present study (shared with previous works by the same authors) is to evaluate the performance of global-scale data sources as either complements to or replacements for more local- or regional-scale data sources (either forecasts or measurements) for different uses. This approach implies accepting lower-quality values at any particular location. Performance differences between global-scale models and local sources may be larger or smaller depending on the quality of said models, but the former are not expected to be better except in outlier cases (e.g., when the local source relies on faulty or non-representative measurements, or on ill-tuned local models). However, local data sources have relevant issues related to the lack of standardisation (requiring ad hoc retrieval and processing solutions) and the lack of historical data (e.g., the unavailability or excessive cost of the required amounts of data). If the loss of accuracy and precision incurred by global-scale models can be kept within an acceptable threshold, it may be outweighed by the benefits of said models regarding the aforementioned issues.

Multiple studies have already combined different NWP models with statistical algorithms in order to forecast PV generation outputs, including ANN algorithms. To the best of the authors' knowledge, however, very few studies have combined weather data from the GDAS products with ANN models, and none has used GDAS products for photovoltaic power forecasting. Hence, the novelty and main objective of this study is to evaluate the performance of the GDAS sflux model as a replacement for in situ weather measurements with the aim of predicting photovoltaic power generation.
