*2.1. General Considerations*

As seen before, ground measurements are the fundamental basis for reliable solar resource estimation. However, especially for remote and poor regions, the initial cost of this approach is too high. Furthermore, the conventional approach extrapolating known solar radiation data from reference sites is not feasible. Large distances between meteorological stations render extrapolations too uncertain to be useful.

Existing models such as PVGIS provide irradiation data at a coarse resolution, for example 10 km × 10 km. At the same time, topographic data from high-resolution DEMs is available for almost any location [20]. Consequently, the objective of this work is to develop a method to increase the spatial resolution of available solar radiation maps by combining coarse irradiation data (GHI) with high-resolution topographic data.

The method is based on the hypothesis that atmospheric parameters such as cloud index, water vapor, aerosol and ozone are already incorporated in the initial low-resolution solar irradiation data. In addition, it is assumed that solar irradiation is largely influenced by the local terrain.

Some authors introduce terrain conditions to improve the calculation of the solar radiation received in a given area [10,18,28,29]. On flat terrain under clear sky, solar radiation is almost the same over relatively large areas [7]. In hilly and mountainous terrain, altitude and slope distribution have a greater effect on local climate [30]. Surface radiation can change considerably depending on the frequency and thickness of clouds [31]. As a result, terrain parameters such as elevation are related to solar radiation because they have a direct impact on cloud cover.

Therefore, the present methodology combines topographic data with geo-referenced GHI values. Geo-statistical information is captured from GHI data available in lower resolution (e.g., cell size of 10 km × 10 km) using ANFIS to estimate data with higher resolution (e.g., 1 km × 1 km).

The ANFIS is trained with a reduced, representative training set, which is obtained from a statistical analysis of the original data. A case study demonstrates that the obtained ANFIS model is valid for the whole region under study.

This novel approach differs fundamentally from the above-mentioned models, as it represents an indirect assessment. It is not a complete model in itself, but a tool to enhance existing model output by considering topographical information.

#### *2.2. Data Mining Using ANFIS*

Artificial intelligence techniques, such as the well-known heuristic method ANFIS, have been used successfully in different renewable energy applications [32–35].

The basic steps of the proposed geo-statistical data mining methodology are:


In summary, the relationship of spatial variation of global solar radiation (available in low resolution) and terrain parameters (available in high resolution) is captured in a data mining model. This model is later applied to generate a solar radiation map with considerably higher resolution.

The steps mentioned above are illustrated in the next section with a case study from a remote region in Ethiopia.

#### **3. Case Study: Amhara Region Ethiopia**

In order to illustrate the proposed method, a case study is presented for the Amhara region in North Shewa, Ethiopia. As input, publicly available long-term GHI data from the SWERA project [19] is used, which has a coarse resolution of 10 km × 10 km (100 km<sup>2</sup> cell size). Further, a high-resolution DEM was obtained from the GeoCommunity™ website [36]. Figure 1 shows geo-referenced maps of elevation (above) and irradiation (below) for the studied Amhara region.

In the following sections, the case study is developed following step by step the proposed method. At the end, the result is a refined map of GHI with a resolution of 1 km × 1 km.

**Figure 1.** High-resolution elevation map (above) and low-resolution daily solar radiation map with 100 km<sup>2</sup> cell size (below) of Amhara region, North Shewa (Ethiopia).

#### *3.1. Definition of ANFIS Training Parameters*

The topographical terrain parameters used to describe the study region are terrain elevation, slope and terrain aspect, as proposed in Reference [7].

Note that the terminology used here is based on GIS standards. Translated to the terms commonly used in solar energy assessment, aspect would be azimuth (*γ*) and slope would be tilt (*β*). Nevertheless, the GIS terminology is maintained here, as these parameters refer exclusively to the terrain and not to the solar energy capturing system.

In addition, values of standard deviation (STD) of elevation and slope are added to the training set. Tests (not shown here) have shown that by including these additional parameters the stability of model outputs can be improved. For example, estimated negative radiation values are avoided.

Using the Global Mapper GIS platform, raster maps of 1 km × 1 km resolution are obtained directly from DEM data [36] for elevation and slope, together with their corresponding standard deviation for each cell.
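To make the per-cell statistics concrete, the following is a minimal sketch of how a mean and a standard deviation per 1 km cell could be derived from a finer raster. It is not the Global Mapper procedure used in this work. The function name `block_stats` and the block size are hypothetical, and the finer raster is assumed to divide evenly into square blocks.

```python
import numpy as np

def block_stats(fine, block):
    """Return per-block mean and standard deviation of a 2D raster.

    Assumption: the raster dimensions are exact multiples of `block`.
    """
    fine = np.asarray(fine, dtype=float)
    rows, cols = fine.shape
    if rows % block or cols % block:
        raise ValueError("raster size must be a multiple of the block size")
    # Split the raster into (row block, row in block, col block, col in block).
    blocks = fine.reshape(rows // block, block, cols // block, block)
    return blocks.mean(axis=(1, 3)), blocks.std(axis=(1, 3))

# Placeholder elevation values, not data from the study region.
dem = np.arange(16, dtype=float).reshape(4, 4)
cell_mean, cell_std = block_stats(dem, block=2)
```

The same function could be applied to a slope raster to obtain the STD of slope.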

A continuous raster of the terrain aspect was created in order to facilitate data sampling (see Section 3.2). This was attained using the built-in shader function for slope direction of the Global Mapper GIS platform [37]. In order to obtain useful data, the default grayscale has been adapted. As a simplification of more complex natural phenomena, it is assumed that east and west orientations have the same effect on radiation. Therefore, the scale is defined as ranging from 0 (black = north) to 255 (white = south). West and east are equally represented by the value 127 (gray). As a result, a circular scaling is obtained. From the GIS data, an average aspect value is calculated for each square kilometer of the region. The resulting map is shown in Figure 2.
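The circular gray scale can be illustrated with a short sketch. The text only fixes three anchor values (north = 0, east/west = 127, south = 255), so the linear interpolation between them is an assumption. So are the function name `aspect_to_gray` and the convention that aspect is given in degrees clockwise from north.

```python
import numpy as np

def aspect_to_gray(aspect_deg):
    """Map aspect in degrees (0 = north, clockwise) to a 0-255 gray value.

    East and west are folded onto the same value. The linear scaling
    between north (0) and south (255) is an assumption of this sketch.
    """
    a = np.mod(np.asarray(aspect_deg, dtype=float), 360.0)
    folded = np.minimum(a, 360.0 - a)  # angular distance from north, 0..180
    return np.floor(255.0 * folded / 180.0).astype(int)

gray = aspect_to_gray([0, 90, 180, 270])  # north, east, south, west
```

With this folding, a slope facing 90° and one facing 270° receive the same gray value, which is the simplification described above.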

**Figure 2.** Full grayscale bitmap of aspect of the region terrain.

As a result, five parameters compose the input for the training session: terrain elevation, slope (inclination), aspect (orientation) and the STD of elevation and slope. The output is solar radiation (see Table 1).

**Table 1.** Ranges of Adaptive Neuro-Fuzzy Inference System (ANFIS) training data (matrix of 219 columns and 130 rows).


#### *3.2. Representative Data Sampling for ANFIS Training*

Solar radiation data of the region under study is only available with a resolution of 10 km × 10 km. It is given in a matrix of 13 rows and 22 columns, covering a surface of 130 km × 220 km. On the other hand, terrain data (elevation, slope, aspect) are available with a resolution of 1 km × 1 km. In order to obtain a representative training sample which establishes a relationship between terrain data and solar radiation, an adequate sampling method is required.

Therefore, in a first step, the resolution of the solar data is increased to 1 km × 1 km to match the terrain layers (see Figure 3). Notice that in this step only the number of data points is increased and no information is added. This step is merely a convenience for the following steps, as all four matrices M (one for solar radiation and three for terrain properties) then have the same size and each cell *X<sub>i,j</sub>* is geo-referenced to the same location.
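This replication step can be sketched in a few lines of NumPy. The function name `replicate_cells` and the placeholder values are hypothetical; the 13 × 22 shape follows the coarse matrix described above.

```python
import numpy as np

def replicate_cells(coarse, factor=10):
    """Repeat every coarse cell `factor` times along both axes.

    No interpolation is performed, so no information is added.
    """
    coarse = np.asarray(coarse)
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

# Placeholder values standing in for the 13 x 22 coarse GHI matrix.
coarse_ghi = np.arange(13 * 22, dtype=float).reshape(13, 22)
fine_ghi = replicate_cells(coarse_ghi)  # one value per 1 km x 1 km cell
```

Each 10 × 10 block of `fine_ghi` holds a single repeated value, so the set of distinct radiation values is unchanged.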

**Figure 3.** Illustration of the increase of solar radiation data resolution from 10 km × 10 km to 1 km × 1 km.

In a second step, a simple sampling method is applied, which makes sure that all solar radiation data is included and a representative selection of terrain data is obtained. This sampling method consists of extracting every tenth row of each matrix M. Hence, the new dataset contains exactly 10% of the original data. In the presented case study, 28,470 values are reduced to 2847 values for every training parameter.

The same procedure is repeated for all five matrices and a training set of five reduced matrices is obtained. With this procedure, ten different training sets can be generated, depending on the row where the selection is started.
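A minimal sketch of this row sampling is given below, assuming each layer is stored as a 2D NumPy array of 130 rows and 219 columns (28,470 values). The layer names, the random placeholder data and the helper names are hypothetical.

```python
import numpy as np

def every_tenth_row(matrix, start, step=10):
    """Extract rows start, start + step, start + 2 * step, ..."""
    return np.asarray(matrix)[start::step, :]

def candidate_training_sets(layers, step=10):
    """Build one candidate training set per possible starting row.

    `layers` maps a layer name to a 2D array; all arrays share one shape.
    """
    return [
        {name: every_tenth_row(m, start, step) for name, m in layers.items()}
        for start in range(step)
    ]

# Random placeholder layers, not data from the study region.
rng = np.random.default_rng(0)
layers = {
    "elevation": rng.normal(2000.0, 500.0, size=(130, 219)),
    "slope": rng.uniform(0.0, 45.0, size=(130, 219)),
}
training_sets = candidate_training_sets(layers)
```

Because the same row indices are used for every layer, the extracted cells remain aligned across layers.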

It can be expected that any of these sets is suitable as a training set. In order to verify whether the obtained reduced datasets are representative of the original data and to select the best training set, two additional steps are proposed here. Notice that representativeness for the solar radiation data is given automatically, as exactly one line (ten identical values) is extracted from each cell of 10 km × 10 km. First, histograms are computed for all reduced matrices of terrain data. An example is given in Figure 4, where the histograms of the ten possible training sets for terrain elevation, slope and aspect, as well as for the STD of elevation and slope, are shown.

**Figure 4.** Histograms of the ten possible training sets of terrain parameters from top to bottom: elevation, slope, aspect, STD of elevation, STD of Slope.

In order to evaluate the representativeness of the extracted training sets, in a second step the root mean square error (*RMSE*) is computed for each histogram compared to the histogram of the whole dataset.

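$$RMSE_i = \sqrt{\overline{(f_i - f)^2}},\tag{1}$$

where *f<sub>i</sub>* and *f* are vectors of 10 values, representing the frequencies in the 10 bins of the histogram of training set *i* and of the whole dataset, respectively, and the overline denotes the mean over the bins.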
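The histogram comparison described above can be sketched as follows. Two choices here are assumptions, not taken from the text: the frequencies are normalized to relative frequencies, and the bin edges are derived from the whole dataset and shared by both histograms. The function name `histogram_rmse` and the placeholder data are hypothetical.

```python
import numpy as np

def histogram_rmse(subset, full, bins=10):
    """RMSE between the binned relative frequencies of `subset` and `full`."""
    full = np.asarray(full, dtype=float).ravel()
    subset = np.asarray(subset, dtype=float).ravel()
    # Common bin edges, taken from the whole dataset.
    edges = np.histogram_bin_edges(full, bins=bins)
    f, _ = np.histogram(full, bins=edges)
    f_i, _ = np.histogram(subset, bins=edges)
    f = f / f.sum()
    f_i = f_i / f_i.sum()
    return float(np.sqrt(np.mean((f_i - f) ** 2)))

# Random placeholder elevation layer, not data from the study region.
rng = np.random.default_rng(0)
elevation = rng.normal(2000.0, 500.0, size=(130, 219))
rmse_per_start = [histogram_rmse(elevation[i::10], elevation) for i in range(10)]
best_start = int(np.argmin(rmse_per_start))
```

Averaging such values over all terrain layers for each starting row gives one score per candidate training set.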

The results for all five terrain variables are shown in Figure 5. The numbering of the histograms is related to the first line *i* of extraction. In addition, the average of all 5 curves is shown. From this average, it can be derived that the most representative training sets would be those with index *i* = 1, 2 and 9. Nevertheless, it can also be concluded that any training set is suitable, as all sets have distributions which are very similar to each other, with deviations of less than 1%.

**Figure 5.** RMSE of the histograms of the ten extracted possible training sets from terrain aspect data.

In order to illustrate the obtained training data set, in Figure 6, a cross-correlation plot of available data and the extracted training data for radiation and elevation is shown. The strong correlation of these two parameters can be observed easily.

**Figure 6.** Cross-correlation plot of available (raw) data and the extracted training data for radiation and elevation.

#### *3.3. Training of Neuro-Fuzzy Network*

The data mining model has been implemented with Matlab ANFIS. The sub-clustering architecture in combination with hybrid optimization was chosen, because this combination offered the fastest and best approximation of the solar radiation estimation algorithm.

Training is carried out with Matlab's built-in ANFIS editor, which implements a common modeling technique. After appropriate data sampling (as explained in Section 3.2), the obtained training set can be imported in text format into the ANFIS editor. For effective ANFIS training, the structure of the neuro-fuzzy network should be as simple as possible while still being able to capture the desired information. In a first step, the numbers of membership functions are assigned arbitrarily. Then, a hybrid optimization method optimises the ANFIS structure and parameters until a correct reproduction of the training data is achieved. The number of epochs selected was 80–300 and the training population size was 2144 data sets, with 5 to 8 alternative runs. At the end of this learning process, it was possible to tailor a membership function such that adequate inference and rule surfaces were obtained. For example, the Fuzzy Inference System (FIS) rule surface in Figure 7 shows that radiation values remain defined for all possible combinations of elevation and slope.

**Figure 7.** Fuzzy Inference System (FIS) Rule surface of elevation and slope for the estimation of radiation.

#### *3.4. Estimation of High-Resolution Radiation Data*

With the trained ANFIS model, solar radiation is estimated for the entire region under study with a resolution of 1 km × 1 km. For convenience, the result is exported to a text file with grid format (x, y, radiation). This file can be imported and represented as another layer for the geo-referenced region in any GIS tool.
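As an illustration of the grid format, the sketch below flattens a 2D array of estimates into (x, y, radiation) rows and writes them as delimited text. The origin, the cell size, the comma delimiter, the header line and the convention that row indices increase southwards are all assumptions. A real GIS import would need the projected coordinates of the study region.

```python
import io
import numpy as np

def grid_to_xyz(grid, x0=0.0, y0=0.0, cell=1.0):
    """Flatten a 2D grid into rows of (x, y, value).

    (x0, y0) is the coordinate of the upper-left cell; y decreases with
    the row index. Both conventions are assumptions of this sketch.
    """
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    col_idx, row_idx = np.meshgrid(np.arange(cols), np.arange(rows))
    x = x0 + cell * col_idx
    y = y0 - cell * row_idx
    return np.column_stack([x.ravel(), y.ravel(), grid.ravel()])

# Placeholder radiation values, not model output.
radiation = np.array([[5.1, 5.3, 5.2],
                      [5.6, 5.8, 5.7]])
xyz = grid_to_xyz(radiation)

# Write to an in-memory buffer; a file path could be passed instead.
buffer = io.StringIO()
np.savetxt(buffer, xyz, fmt="%.3f", delimiter=",",
           header="x,y,radiation", comments="")
text = buffer.getvalue()
```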

In Figure 8 the result of the ANFIS model is compared with original coarse resolution data from the study region of Northern Ethiopia. Both maps are generated with the raster extrapolation tool of ArcGIS software.

Comparing the two maps, the refined resolution of the solar radiation data in the lower map is evident. The strong correlation with elevation data is especially noticeable.

**Figure 8.** Comparison of coarse daily solar radiation data (**top**) and refined data of 1 km × 1 km (**bottom**), applying ANFIS estimation model.

#### *3.5. Validation of Results*

In order to give an idea of the validity of the estimation, two validation steps are presented in this section. First, the consistency of training data and estimation results is evaluated. In a second step, results are compared with two benchmark data sources (PVGIS and SWERA).

The consistency of the ANFIS output with the original data is demonstrated by representing the data in several different plots.

In Figure 9, the effect of the modification of the solar radiation data can be observed. Raw data (coarse radiation) is compared with the ANFIS estimation for every 1 km × 1 km grid cell. The graph represents a sequential scan of the map from the upper left corner down to the lower right corner. On the horizontal axis, all the pixels of the map are represented in order. Training data is represented at the same position as raw data. Therefore, training data (black dots) is separated into 13 almost vertical lines, which represent the 13 rows of the coarse radiation data set. This way it can be seen how three-dimensional data has been represented in a two-dimensional plot. It can be observed that the minimum and maximum radiation limits are largely respected and many more intermediate values are generated. In addition, it can be seen how few training data values of solar radiation were employed.

**Figure 9.** ANFIS radiation extrapolation compared to average (raw) and training data.

In Figure 10 a correlation plot is shown of original (10 km × 10 km) against estimated refined data (1 km × 1 km) of GHI for the training set (black) and the final estimate of the entire region (red). Representativeness of training data becomes evident as it covers the complete range of results. In addition, the final estimate shows almost no outliers, which demonstrates the stability of the model.

**Figure 10.** Correlation between 100 km<sup>2</sup> radiation data and 1 km<sup>2</sup> resolution estimated radiation.

Following the same procedure, in Figure 11 three different radiation data sets are plotted against the terrain elevation: the original GHI with coarse resolution, the refined GHI from the training set and the refined GHI from the final estimate of the entire region. Again, the consistency of the model output becomes visible. Also, a remarkable correlation pattern can be observed between elevation and GHI. This pattern is captured by the ANFIS model and, as a result, the estimated data are much more correlated with terrain elevation than the original low-resolution input data.

It must be pointed out that the observed pattern is a specific feature of the region under study. This by itself is a valuable result of the model, as it indicates an interesting subject that is worth investigating further. Additional studies may determine whether this pattern can also be found in regions with different climates and elevation patterns.

**Figure 11.** Comparison of raw radiation data versus ANFIS training and final ANFIS estimation as a function of terrain elevation.

The presented correlation plots give plausible evidence that ANFIS estimation has been able to capture the behaviour of the input data to be assessed and produces coherent results. The next step is a validation with available GHI data for some specific locations within the region under study.

The best way to validate the model would be with random samples of long-term on-site measurements. As these measurements are not available, the validation is carried out with existing data from PVGIS and SWERA as benchmarks. PVGIS was chosen because it is a standard, freely available online database, and SWERA in order to illustrate the modification that the ANFIS model made at certain locations compared to the original data.

In Table 2, values of GHI are shown for some selected locations within the studied region of Ethiopia (sorted in alphabetical order). ANFIS estimation is compared with original data from SWERA (mean value of 10 × 10 km cell) and PVGIS.

The difference of the ANFIS output compared to PVGIS (average difference: 45%) is slightly higher than the difference to SWERA (average difference: 18%). This is mainly due to the fact that SWERA data was used to train the ANFIS model.

It is worth noting that for some locations a large difference in GHI can be observed when data from SWERA is compared to PVGIS. At 45%, the average deviation is in a similar range to the deviations of the ANFIS results compared to both models (18% and 45%, respectively).

It is evident that only on-site measurements can determine more clearly the degree of accuracy that the proposed model provides.

