**1. Introduction**

Solar energy yield depends on the quantity of radiation received at a specific geographical location, which in turn depends on a number of environmental factors. Other factors, such as temperature, also play a role, but estimating solar radiation remains a fundamental requisite for siting photovoltaic and solar thermal installations. These estimates are calculated from radiation data obtained from meteorological stations or on-site measurements, although satellite data is gaining importance. Good estimates are obtained by combining long-term satellite-based time series with a short-term measuring campaign of at least one year. However, many countries have an insufficient network of meteorological stations, and on-site measurements are costly, especially in remote areas. This is a major obstacle to reliable estimates of solar resources for policy makers and regional planning in these countries.

Accurate modeling of solar radiation is difficult because of the large number of atmospheric parameters and their spatio-temporal variations. It has received special attention over the last three decades due to the rise of solar applications worldwide. Three main groups of models can be identified in the literature; they are reviewed in Sections 1.1 to 1.3.


#### *1.1. Classical Solar Resource Data Modeling Approaches*

These classical methods commonly apply indirect approaches and empirical relationships, such as those of Angstrom [1], Glover and McCulloch [2], Paulescu [3], and Almorox [4]. Improved methods have been described by Perez [5] and Kamali [6]. Models that additionally use ground measurements of global radiation and its components are described in Kumar [7] and Monteiro [8]. Munkhammar [9] uses a copula-based statistical model to represent the dependence among solar radiation data at several locations. Considerable progress has been achieved in this field, and several approaches now exist to estimate solar radiation at a global scale for applications ranging from simple to in-depth. In any case, ground measurements are the fundamental basis of all classical methods. Chelbi [10] and Robaa [11] estimate solar radiation with different models in developing countries where measurement data is difficult to obtain, such as Tunisia and Egypt.
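As an illustration of the kind of empirical relationship meant here, the widely used Angstrom-Prescott form relates global radiation to sunshine duration. The equation below is the standard textbook form; the source does not state which variant the cited works apply, and the symbols are introduced here only for illustration.

$$
\frac{H}{H_0} = a + b\,\frac{n}{N}
$$

Here $H$ is the mean daily global radiation on a horizontal surface, $H_0$ the corresponding extraterrestrial radiation, $n$ the measured sunshine duration, $N$ the maximum possible sunshine duration (day length), and $a$, $b$ empirical coefficients fitted to ground measurements at a site. The dependence on site-specific fitting is why such models rely on ground data.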

#### *1.2. Solar Resource Modeling Based on GIS*

GIS-based solar assessment has become increasingly attractive, as modern desktop computers are able to manage large amounts of data and many powerful GIS software tools are available. Rich [12] and Dubayah [13] aimed to estimate the diffuse sky irradiance in the absence of clouds from the sky image obtained at a point, assuming isotropic conditions (equivalent to a viewshed operation in GIS). Kumar [7] and Gueymard [14] presented algorithms to estimate radiation over a large area under clear-sky conditions; they use digital elevation and latitude data to evaluate radiation changes for different aspects, slopes and positions of adjacent surfaces. Voivontas [15] developed a GIS-based solar model that provides tools to deal with spatial and temporal differences in solar radiation and demand. Monteiro [16] developed a model for calculating solar radiation maps under real-sky conditions using adaptive triangular meshes, focusing specifically on accurately defining the terrain surface and the generated shadows. The model can be used as a local server to interface with GIS tools. Piedallu [17] presented GIS-based programs to calculate solar radiation with and without clouds. Shortwave radiation components are calculated considering three sets of parameters: atmospheric attenuation, topographic parameters from the digital elevation model (DEM), and the geometric relationship between the earth's surface and the sun. Charabi [18] analyses the influence of the sun on the DEM over a time period using ArcGIS tools. These methods combine ground measurements with physical models. The availability of measurement data depends on location, and the algorithms are simplifications of physical processes.

#### *1.3. Solar Resource Modeling Based on Satellite Data*

Currently, several institutions have implemented their own solar databases based on satellite measurement data. The data grids provided are based on cells with a size between 10 km<sup>2</sup> and 100 km<sup>2</sup>. Data values are given as averages over the area of each cell. Their purpose is to fill the gap where ground measurements are missing, although the spatial resolution is low.
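The effect of cell averaging on spatial detail can be sketched with a small example. The function name `block_average` and the grid values below are hypothetical and chosen only for illustration; they are not taken from any of the databases discussed here.

```python
from statistics import fmean


def block_average(fine_grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    fine_grid is a list of equally long rows; the result has one value
    per coarse cell, mimicking a data set that reports cell averages.
    """
    rows, cols = len(fine_grid), len(fine_grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect every fine value that falls inside this coarse cell
            block = [
                fine_grid[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            coarse_row.append(fmean(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 field of annual irradiation values (made-up numbers)
fine = [
    [1800.0, 1820.0, 1500.0, 1520.0],
    [1810.0, 1830.0, 1510.0, 1530.0],
    [2000.0, 2000.0, 1700.0, 1900.0],
    [2000.0, 2000.0, 1700.0, 1900.0],
]
coarse = block_average(fine, 2)
```

In this made-up field, the lower-right coarse cell reports a single average although the underlying values differ by 200 units, which is the kind of within-cell variation a coarse grid cannot represent.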

An outstanding example of a worldwide solar database is the SWERA project [19]. It makes data freely available to the public and is the result of cooperation among renowned organizations such as DLR, NASA, NREL, DTU and UNEP.

The input, consisting of high resolution direct normal irradiation (DNI) and global horizontal irradiation (GHI) data, is checked for data quality and repaired using DLR and SUNY methodologies. Its spatial resolution is a coarse 10 km × 10 km.

In the data set, the primary source of information for the estimation of irradiation is digital imagery for cloud detection. The cloud index, at 10 km × 10 km resolution, is extracted from half-hourly data from Meteosat satellites. The second source of information is the physical atmospheric data set (including aerosol optical thickness, total ozone, transmission of the Rayleigh atmosphere and mixed gases, water vapor, etc.). The final data set is compiled using satellite imagery and ground measurements provided by the project countries. Furthermore, for the solar radiation data processing, data sets from previous processes are combined into geo-referenced maps and site-specific hourly time series of GHI and DNI, using the DLR methodology for direct radiation and the SUNY methodology for global radiation.

Another example of a free-access database is PVGIS of the European Joint Research Centre (JRC). It contains solar resource data for Europe, Africa, and South-West Asia [20].

Other databases provide specific regional data, such as the Renewable Resource Data Center (RReDC) from NREL for the USA [21], Natural Resources Canada [22], and the Australian Bureau of Meteorology [23].

Finally, there are of course commercial products such as Vaisala [24] or SolarGIS [25].

Therefore, for regions where no ground measurements are available, several institutions have implemented their own climatological data modeling systems, based on satellite data and integrated into GIS platforms [7,8,13,15,26]. Huld [27] recently developed PVMAPS, a set of computational tools and climate data for GRASS GIS to calculate solar radiation over large areas.

#### *1.4. Proposed Resource Modeling with Data Mining*

Satellite data is the solution for regions where no ground measurements are available. Its main drawback is the coarse spatial resolution. To address this, the present work proposes a refinement method based on data mining techniques. The method is explained in general terms and illustrated with a case study.

The presented method aims to provide a low-cost tool that can be applied even by entities with a very limited budget to assess solar development options for remote regions. It is considered especially useful where no reliable data is available from meteorological stations.

The paper is structured as follows. The introduction presents the motivation for the present work. Section 2 describes the proposed geo-statistical data mining methodology. The method is illustrated in Section 3 with a case study; this section covers the definition of the Adaptive Neuro-Fuzzy Inference System (ANFIS) training parameters, the sampling process for the training data, and the results. A summary and concluding remarks are given in Section 4.

#### **2. Geo-Statistical Data Mining Methodology**
