1. Introduction
Agricultural productivity can be increased by knowing and predicting crop yields more precisely under various conditions. This is a key concept in both precision agriculture and agricultural modelling. Several authors have studied the techniques applied in precision agriculture and in crop production modelling that involve meteorological variables, with the objective of improving quality, profitability, resource use efficiency and sustainability [1,2,3]. Among these techniques, the application of variable doses of water, fertilizers and agrochemicals (while considering agrometeorological conditions), as well as the estimation of production (based on the evolution of meteorological variables and the physiological response of crops), are the most frequently used and are currently adopted by many farmers. Indeed, in most cases, crop recommendations are based on data recorded in field studies that compile soil and environmental conditions [4].
Global solar irradiation at the Earth’s surface has a significant influence on a country’s economy, including, for example, agricultural productivity, renewable energy use, food security and human health risks [5], as reported in [6,7,8,9,10].
Prediction and estimation studies of meteorological variables usually take measured data as model inputs. Franco et al. [11] found a lack of studies that use artificial neural network (ANN) models to generate data at sites where such data are not available, so that the generated data can be used as inputs to other models.
Solar radiation is a fundamental factor for most physical and biophysical processes because of its contribution to the balance of energy and water. However, interpolation techniques are applied to large areas and do not capture the high variation at finer scales. Fu and Rich [12] calculated insolation maps based on regression analysis of atmospheric conditions, elevation, surface orientation and the influence of surrounding topography, correlating ground temperature with insolation and elevation and explaining the marginal variation of other factors, such as crop canopy. Their study area, in the vicinity of the Rocky Mountain Biological Laboratory, Gunnison, CO, USA, covers approximately 300 km² and has dramatic topographic variation, with elevation ranging from 2500 to 4300 m.
The lack of site-specific global solar radiation data is a significant barrier to most applications of crop models. Indeed, Mavromatis and Jagtap [13] evaluated several empirical methods for estimating daily solar radiation from observed maximum and minimum air temperatures, using data from urban and rural sites in Florida (USA), and using spatially interpolated coefficients to improve the results, which are applied to estimate crop yield potential and evapotranspiration. The Donatelli–Bellocchi model [14,15] achieved the most accurate estimates, with a Root Mean Square Error (RMSE) of 3.1–4.1 MJ/(m² d) in rural areas and 3.2–4.9 MJ/(m² d) in urban areas.
Spatial interpolation is a classical geostatistical operation that aims to predict values at unobserved locations from a defined sample of data on specific substrates. However, the underlying continuity and heterogeneity of spatial data are too complex to be approximated by traditional statistical models. Deep learning models, in particular conditional generative adversarial networks (CGAN) [16], can capture deeper representations of sampled spatial data and their interactions with local structural patterns. Zou et al. [17], in a case study on global solar radiation at elevations in the southeast of China, demonstrated the capacity of an ANN model to achieve outstanding interpolation results compared to the benchmark methods: an ANN model (9-17-1) provided better accuracy (RMSE = 1.34 MJ/m² and R² = 0.91) than the improved Bristow–Campbell model (RMSE = 2.19 MJ/m² and R² = 0.83) and the improved Ångström–Prescott model (RMSE = 2.65 MJ/m² and R² = 0.68).
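The two accuracy measures used throughout this comparison can be sketched as follows. This is a minimal illustration, assuming that R² denotes the coefficient of determination (some studies report the squared correlation coefficient instead, which can differ); the function names and the sample values are hypothetical and are not taken from any of the cited works.

```python
import math

def rmse(observed, predicted):
    """Root Mean Square Error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily irradiation values in MJ/m2 (illustrative only).
measured = [10.0, 12.0, 14.0, 16.0]
estimated = [11.0, 11.0, 15.0, 15.0]
print(rmse(measured, estimated), r_squared(measured, estimated))
```

RMSE keeps the units of the variable (here MJ/m²), so it is only comparable between studies of the same variable and time step, whereas R² is dimensionless.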
Environmental variables are recorded by point sampling. However, precision agriculture requires more precise and specific knowledge of these variables near or within the crop, so spatially continuous data on environmental variables become necessary. Li and Heap [18] classified 25 Spatial Interpolation Methods (SIM) into three categories (non-geostatistical, geostatistical and combined methods) and provided guidelines and suggestions for selecting the appropriate method for a specific environmental dataset.
A typical spatial interpolation method, both simple and efficient, is Inverse Distance Weighting (IDW). Li et al. [19] proposed a new approach, called Dual IDW (DIDW), which takes into account the correlation of the data to avoid unfavourable estimates with unevenly distributed samples. A case study based on the Walker Lake data indicates that DIDW significantly improves interpolation accuracy over traditional IDW, and also slightly outperforms Ordinary Kriging (OK) when the data sample is too small to capture adequate spatial continuity.
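As a point of reference for the methods discussed here, traditional IDW can be sketched in a few lines. This is a minimal illustration of the classical scheme only, not of DIDW; it assumes planar coordinates (for longitude and latitude a great-circle distance would be more appropriate), and the function name, the `power` parameter default and the sample stations are hypothetical.

```python
import math

def idw(stations, target, power=2.0):
    """Classical Inverse Distance Weighting.

    stations: list of ((x, y), value) pairs for the observed points.
    target:   (x, y) of the unobserved location.
    power:    distance exponent; 2 is a common choice (assumed default).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two hypothetical stations, 2 km apart, with daily mean temperatures in degC.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw(stations, (1.0, 0.0)))  # midpoint: both stations weigh the same
```

Because the weights depend only on distance, clustered stations are over-represented in the estimate, which is the weakness that correlation-aware variants aim to address.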
Spatial interpolation of the Earth’s weather variables plays an important role in climate studies, but most traditional spatial interpolation methods do not consider geographical semantics in their practical application. Wu et al. [20] proposed an improved IDW algorithm that considers geographic semantics (SIDW), adding the influence of land use type to the interpolation of land surface temperature data from the Landsat 8 OLI-TIRS satellite over China, and achieving generally higher accuracy and precision than IDW, kriging, natural neighbour and spline function interpolation methods.
Loghmari et al. [21] developed and evaluated two monthly spatial interpolation models for predicting global solar radiation over distances of more than 50 km in southern and central Tunisia: an artificial neural network (ANN), which obtained the better results, and a model based on IDW.
To spatially fill gaps (nowcasting) in micrometeorological data sets (wind, humidity and temperature), Gunawardena et al. [22] employed Multivariate Linear Regression (MLR) and ANN at eight locations, using measurements from three nearby weather stations and covering scales from 100 m to 5 km. The measurements were made from December 2016 to June 2017 in the Cadarache Valley, in southeastern France, a region of complex terrain where spatial variability is high on small length scales. The study demonstrated that both methods are acceptable.
Also notable is the approach described in the wiki of the European Commission’s MARS (Monitoring Agriculture with Remote Sensing) Crop Yield Forecasting System (MCYFS) [23], in which the observed weather is interpolated to the centre of each cell of a 25 by 25 km grid, within which the weather data are considered homogeneous, while temperature, sunshine, humidity and wind speed are expected to change gradually over distances of 50 to 150 km.
Geographic Information Systems (GIS) offer different options to analyze and represent the spatial heterogeneity of the incident solar radiation in a given area. Martín and Dominguez [24] described methods for estimating the distribution of solar radiation over geographical areas from a sample of data, using deterministic techniques (global polynomial interpolation, local polynomial interpolation, inverse distance weighting and radial basis functions) and geostatistical techniques (kriging and co-kriging), and applied them to the 2011 summer solstice with data from 45 stations in Spain. The global polynomial method produced interpolations closer to the real value, while the geostatistical methods generally produced very low squared errors (universal kriging and ordinary co-kriging showed the best adequacy in the results).
Data collected at discrete weather stations become meaningful only when represented by surfaces. Spatial interpolation methods convert the point data into surfaces by estimating values for areas where data are not collected. In addition to the objective, the total number of data points, their location and their distribution in the study area affect the accuracy and efficiency of the interpolation. Keskin et al. [25] investigated the optimal spatial interpolation method for mapping meteorological data (precipitation, temperature and wind speed) in the northern part of Turkey, comparing IDW, kriging, radial basis functions and natural neighbour. This investigation was carried out in January 2005 and yielded a three-location average RMSE for temperature of 0.94 °C with IDW, 0.75 °C with kriging and 0.70 °C with natural neighbour.
Yazar [26] performed spatial interpolation of solar radiation with data from 81 agrometeorological stations over heterogeneous agricultural areas in Southeastern Turkey, which include different crop species, irrigation techniques, and topographical and other conditions. Ordinary Kriging (OK) was applied on its own, and Ordinary Co-Kriging (OCK) with solar radiation related data (air temperature, vapour pressure deficit and digital elevation model) was applied to reduce the error, with an improvement in accuracy of up to 21%, which allowed for better evaluation and management of crop development and yield.
Leirvik and Yuan [5] employed statistical methods (Random Forest (RF), Linear Regression (LR), Generalized Additive Regression (GAM), Least Squares Dummy Variable (LSDV), Ordinary Kriging (OK), and the combinations LR + OK, GAM + OK and LSDV + OK) to interpolate missing values in a monthly dataset spanning nearly five decades of global solar irradiation over the Earth’s surface, highlighting the benefits of using Machine Learning in environmental research.
Antonić et al. [27] used ANN models to estimate monthly mean values of meteorological variables (air temperature, daily minimum and maximum air temperature, relative humidity, precipitation, global solar irradiation and evapotranspiration) using data from 127 meteorological stations in Croatia. The inputs used (elevation, latitude, longitude, month and time series of the respective climatic variables) were from two meteorological stations. The quality of the results allows the construction of spatial distributions of the average climate for a given period, which would be useful for dendroecological analysis.
Siqueira et al. [28] generated synthetic daily solar irradiation series by ANN-based spatial interpolation, employing readily available geographic variables (latitude, longitude and altitude) and meteorological variables (precipitation, maximum and minimum temperature). The data were measured during the months of November (from 2001 to 2006) at seven locations in Pernambuco, Brazil.
Many climate studies need to generate predictions of a climate variable at a given location using values from other locations. Snell et al. [29] conducted a spatial interpolation of daily maximum surface air temperatures using ANNs to generate estimates at 11 locations in the central continental U.S. Information from a network of surrounding stations, for the 4- and 16-point cases and over a 63-year period (from 1931 to 1993), provided the input and output vectors for the ANNs. The results are better than those of the spatial average, nearest neighbour and inverse distance methods, and the potential of using ANNs for downscaling temperature from General Circulation Models (GCMs) is discussed.
Rigol et al. [30] performed a spatial interpolation of daily minimum air temperature using an ANN trained with input variables (date, field variables and neighbouring temperature observations) for a full year, covering an area of 100 km × 100 km in Yorkshire, UK. They analyzed the internal weights of the inputs to estimate the degree of spatial correlation between neighbouring stations and to identify the most influential variables contributing to the trend. In testing, the RMSE was 3.15 °C for ANN (33-1-1), 1.26 °C for ANN (19-4-1) and 1.15 °C for ANN (45-4-1).
Zambon et al. [31] reviewed Industry 4.0 procedures suitable for the agricultural sector, while pointing out that the 4.0 revolution in agriculture is still limited to a few innovative companies. Additionally, environmental variability and stochastic events contribute to a high degree of uncertainty in the supply chain and a lack of predictability in agricultural operations. This is where recent digital-age technologies, such as precision agriculture, which combines positioning technologies with the application of sensors and data, provide digital information in all agricultural processes.
In this paper, the concept of a Virtual Weather Station (VWS) is used: meteorological data from real stations are employed to estimate data at a nearby location that does not have a weather station. As part of the VWS development, the performance of ANN models for interpolating each meteorological variable separately (global solar irradiation, and maximum, average and minimum temperatures) was evaluated. The performance of the models is compared with that obtained by Franco et al. [11], who proposed the use of a VWS in places where meteorological data are needed, as an alternative to their acquisition when it is not possible to install a meteorological station. In that work, the ANN models were used with all the variables of the same place, while in this article the estimation of each variable (solar irradiation and temperatures) is carried out separately (one ANN model for each meteorological variable).
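The idea of training one small network per meteorological variable can be sketched as follows. This is only an illustration under stated assumptions and not the architecture used in this work: the class `TinyMLP`, the three inputs (a normalised reading at a reference station plus normalised longitude and latitude offsets), the hidden-layer size and the synthetic data are all hypothetical, chosen so that the example runs with the Python standard library alone.

```python
import math
import random

class TinyMLP:
    """Hypothetical one-hidden-layer perceptron (tanh units, linear output)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return self.b2 + sum(w * h for w, h in zip(self.w2, self._hidden(x)))

    def fit(self, X, y, epochs=200, lr=0.05):
        """Plain stochastic gradient descent on the squared error."""
        for _ in range(epochs):
            for x, target in zip(X, y):
                h = self._hidden(x)
                err = self.b2 + sum(w * hj for w, hj in zip(self.w2, h)) - target
                for j, hj in enumerate(h):
                    # Gradient w.r.t. the hidden pre-activation (tanh derivative).
                    grad_pre = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_pre
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_pre * xi
                self.b2 -= lr * err
        return self

def rmse(model, X, y):
    return math.sqrt(sum((model.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y))

def make_synthetic(n, seed):
    """Synthetic samples: [reference reading, lon offset, lat offset] -> target."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        ref, dlon, dlat = (rng.uniform(-1, 1) for _ in range(3))
        X.append([ref, dlon, dlat])
        y.append(0.8 * ref + 0.2 * dlon - 0.1 * dlat)  # made-up relationship
    return X, y

# One independent model per variable, instead of one model for all variables.
VARIABLES = ["solar_irradiation", "t_max", "t_mean", "t_min"]
models, rmse_before, rmse_after = {}, {}, {}
for k, name in enumerate(VARIABLES):
    X, y = make_synthetic(60, seed=k)
    model = TinyMLP(n_in=3, n_hidden=4, seed=k)
    rmse_before[name] = rmse(model, X, y)
    models[name] = model.fit(X, y)
    rmse_after[name] = rmse(model, X, y)
    print(name, round(rmse_before[name], 3), "->", round(rmse_after[name], 3))
```

Keeping the models separate means each network has a single output and can be tuned and validated on its own variable, at the cost of training several networks.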
4. Discussion
In this paper, ANNs were used to perform spatial weather forecasts using data measured by SIAR agrometeorological stations in Castilla and León (Spain), one of the largest regions in Europe (94,224 km², more than half of which is agricultural land). Using meteorological data from both the area near the reference station and the neighbouring areas achieved a better performance of the ANN models. Loghmari et al. [21] applied an ANN model using the available meteorological data in the target area, with a recorded Average Relative Root Mean Square Error (ARRMSE) of 6.4%, while the IDW model estimated the global solar radiation measured in nearby areas with an error of 5.11%.
The data set used by Franco et al. [11] to interpolate the values of the most important meteorological variables in agriculture using an ANN comprised daily precipitation (mm), evapotranspiration ETo (mm), mean daily air temperature (°C), maximum temperature (°C), minimum temperature (°C), mean daily relative humidity (%), maximum relative humidity (%), minimum relative humidity (%), mean wind speed (m/s) and total solar irradiation (MJ/m²), recorded during the summer months (June, July and August) by the same SIAR agrometeorological stations in the territory of Castilla and León, Spain.
In this paper, ANN models are built independently for each daily variable studied (global solar irradiation, and maximum, average and minimum temperatures) from the geographic coordinates (longitude and latitude) of the location to be estimated, achieving better RMSE values (1.04 MJ/m², 0.68 °C, 0.58 °C and 0.83 °C, respectively) than the earlier ANN models. Franco et al. [11] analyzed ten meteorological variables simultaneously in the same ANN, during the summer months, obtaining RMSE values of 1.63 MJ/m², 1.28 °C, 0.99 °C and 1.55 °C, respectively, for the same variables.
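The size of the improvement can be read directly from the RMSE values quoted above. The short calculation below is only an arithmetic reading of those figures, with hypothetical names, and gives relative RMSE reductions of roughly 36% to 47% depending on the variable.

```python
# RMSE values quoted in the text: (this work, Franco et al. [11]).
rmse_pairs = {
    "global solar irradiation (MJ/m2)": (1.04, 1.63),
    "maximum temperature (degC)": (0.68, 1.28),
    "average temperature (degC)": (0.58, 0.99),
    "minimum temperature (degC)": (0.83, 1.55),
}

def relative_reduction(new, old):
    """Fractional RMSE reduction of `new` relative to `old`."""
    return (old - new) / old

reductions = {name: relative_reduction(new, old)
              for name, (new, old) in rmse_pairs.items()}

for name, value in reductions.items():
    print(f"{name}: {100 * value:.0f}% lower RMSE")
```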
5. Conclusions
Precision agriculture can improve crop performance, and thus increase agricultural productivity, by drawing on precise knowledge of the meteorological variables that affect crop development. The number of agrometeorological station networks is increasing, but it is still valuable to have data for the specific location of the crops, which can be obtained by interpolating the data measured by the agrometeorological station network. Strong et al. [36] assessed the barriers to the adoption of smart agriculture through the Internet of Things (IoT) among Brazilian farmers in Rio Grande do Sul, and found that compatibility, complexity, testability and visibility were the predictors of farmers’ adoption of innovative solutions. As for ANN models, they were analyzed in this paper to describe the importance of their application for the adoption of climate-smart agriculture.
Kilelu et al. [37] reported on the development of enterprises providing agricultural services in the context of the transformation of agricultural value chains and food systems in the dairy sector in Kenya, where such enterprises have the potential to provide innovation support to entrepreneurial farmers and to contribute to the sustainable growth of the sector.
In this article, ANN models were used to interpolate the data measured daily by the SIAR network of agrometeorological stations in the Region of Castilla and León (Spain) for several meteorological variables (global solar irradiation, and maximum, average and minimum temperatures) from the geographical coordinates of the location where the interpolation was carried out, by means of one ANN model for each of the variables studied. This study uses meteorological data available in the target region (areas close to the reference station) and in neighbouring regions (areas far from the reference station). Having synthetic meteorological data that best represent the local meteorology at each place and time is very important for applying advanced agricultural forecasting techniques, for example those related to the knowledge of the phenological behaviour of plants of productive interest, to the prediction of the necessary irrigation doses and the incidence of pests and diseases, or to the estimation of the potential production of the crops [38,39,40].
The results obtained in this study are better than those obtained previously for the same SIAR network by applying a single ANN model to all ten meteorological variables. The key to this improvement is the use of simpler ANN models, which provide more accurate estimates (Occam’s razor).
In addition, the results obtained from the VWS in this study can be used to predict, at the same location, the global solar irradiation of the next day with the ANN models developed by Diez et al. [34], and to estimate the hourly distribution of the ambient temperature over the 24 h of the day with the ANN models developed by Diez et al. [35], as well as to predict the next-day values of the temperature (maximum, average and minimum).
Future studies that develop these ANN models for the interpolation of meteorological variables from geographic coordinates for crop production could include a predictor variable that directly affects the variable to be estimated (for example, the orientation of sloping terrain when interpolating solar irradiation, or the type of vegetation cover in the case of temperatures), which would increase the accuracy of the ANN models.