**4. Methodology**

This study falls within the framework of exploratory spatial data analysis (ESDA) and focuses in particular on the phenomenon of spatial dependence, or spatial autocorrelation. To this end we used ArcGIS 10.3, which, from a geostatistical perspective, allows spatial dependence to be analysed with the most commonly used indexes.

The variables analysed are, on the one hand, the number of beds offered by the different types of accommodation establishments in the region of Extremadura and, on the other, the occupancy level in the region. The data come from a sample of 270 tourist accommodation establishments that reported their occupancy figures to the Extremadura Tourist Observatory in July 2015.

The month and year selected for this study follow a strategic criterion linked to the general objective pursued by the authors. This study is part of a broader research project that aims to analyse spatial distribution patterns over the three-year period from 2015 to 2017. The analysis was therefore performed for the first reference year and, within that year, July was chosen because it is the first month of the quarter (the third of the year) that records the highest occupancy levels in most of the territories analysed. Moreover, of the two months with the highest occupancy, July and August, July shows the lower variability. Once the reference quarter had been selected, the aim was to identify the month that performs well in most of the territories under analysis while not showing great variability across the months studied. To this end, the study of tourist seasonality across the different territories [59] was taken as a reference. Among other aspects, that study analyses tourist density in each of the tourist territories into which the region is divided and finds that, although July and August are the months with the maximum values in most territories, August is also the minimum month in some of them, which does not happen with July. Therefore, although the ultimate objective is to analyse the pattern of tourism efficiency in the territory over the whole year, July was chosen as the starting point of the analysis, in order to confirm, in light of the results obtained for the different territories, whether it is worth extending the analysis to an annual scale.

The findings of this research therefore not only provide detailed knowledge of the distribution pattern of supply and of how well it matches tourist demand in the region in the selected month and year, but also lay the methodological foundations for analysing the remainder of the period considered.

Among the variables available for measuring tourist activity, we selected the two considered most suitable for characterising tourist activity as a whole, since together they allow us to analyse both the distribution pattern of supply and how well supply matches demand, measured by the occupancy level of each establishment.

The autocorrelation, or spatial dependence, of these variables in the territory of Extremadura is analysed from two perspectives: global and local. Global tests of spatial dependence aim to identify spatial trends or structures across a given geographical area. The indicators proposed by Moran [45] and by Getis and Ord [46] are used for this purpose. These indicators were the first statistical measures of spatial autocorrelation proposed in the literature, and they summarise the overall pattern of dependence in a single value [44].

Both tests provide an objective statistical criterion for confirming or rejecting the presence of trends or spatial structures in the distribution of a variable. In both cases, the null hypothesis is the absence of spatial dependence, i.e., a random distribution of the variable across the selected territory.

Moran's I statistic (1948) is given by Equation (1):

$$I = \frac{N}{S\_0} \frac{\sum\_{i=1}^{N} \sum\_{j=1}^{N} w\_{ij} \times (y\_i - \overline{y}) \times \left(y\_j - \overline{y}\right)}{\sum\_{i=1}^{N} (y\_i - \overline{y})^2}, \quad i \neq j, \tag{1}$$

where

$w\_{ij}$: the element of the spatial weights matrix corresponding to the pair (*i*, *j*);

$S\_0$: the sum of all spatial weights, $S\_0 = \sum\_{i} \sum\_{j} w\_{ij}$;

$y\_i$: the value of the variable in spatial unit *i*;

$\overline{y}$: the mean of the variable;

$N$: the number of observations.

When the spatial weights matrix is row-standardised, $S\_0 = N$ and the statistic $I$ takes the form of Equation (2):

$$I = \frac{\sum\_{i=1}^{N} \sum\_{j=1}^{N} w\_{ij} \times (y\_i - \overline{y}) \times \left(y\_j - \overline{y}\right)}{\sum\_{i=1}^{N} (y\_i - \overline{y})^2}, \quad i \neq j. \tag{2}$$

According to Cliff and Ord [60], when the sample is large enough the standardised statistic is asymptotically distributed as a standard normal *N*(0, 1). Inference is therefore based on the standardised value *z*, obtained by subtracting the theoretical mean from the observed value and dividing the difference by the standard deviation, i.e.,

$$z = \frac{\mathbf{I} - \mathbf{E}[\mathbf{I}]}{\mathbf{SD}[\mathbf{I}]}.\tag{3}$$
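Equations (1) to (3) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the ArcGIS implementation used in the study, and the function names are hypothetical. Two assumptions are made explicit: the expectation is $E[I] = -1/(N-1)$, and $Var[I]$ is computed under the normality assumption of Cliff and Ord; GIS packages may use a different variance formula (for example, under a randomisation assumption), so the *z* value can differ slightly.

```python
import numpy as np


def row_standardise(w):
    """Divide each row of a spatial weights matrix by its row sum (empty rows stay 0)."""
    w = np.asarray(w, dtype=float)
    sums = w.sum(axis=1, keepdims=True)
    return np.divide(w, sums, out=np.zeros_like(w), where=sums != 0)


def global_morans_i(y, w):
    """Return Moran's I (Eq. 1), E[I] and the z-score of Eq. (3).

    Assumption: Var[I] is computed under the normality assumption.
    """
    y = np.asarray(y, dtype=float)
    w = np.array(w, dtype=float)          # copy, so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)              # enforce i != j
    n = y.size
    dev = y - y.mean()                    # y_i - mean(y)
    s0 = w.sum()                          # S_0: sum of all weights
    i_stat = (n / s0) * (dev @ w @ dev) / (dev @ dev)

    e_i = -1.0 / (n - 1)                  # expected value under the null hypothesis
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    var_i = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - e_i**2
    z = (i_stat - e_i) / np.sqrt(var_i)
    return i_stat, e_i, z


if __name__ == "__main__":
    # Four units in a row, each contiguous with the next (binary contiguity).
    W = row_standardise([[0, 1, 0, 0],
                         [1, 0, 1, 0],
                         [0, 1, 0, 1],
                         [0, 0, 1, 0]])
    print(global_morans_i([1, 2, 3, 4], W))   # smooth gradient: I = 0.4, above E[I] = -1/3
    print(global_morans_i([1, 4, 1, 4], W))   # alternating values: I = -1.0
```

Because the weights are row-standardised, $S\_0 = N$ and the leading factor in Equation (1) equals 1, so the same function also reproduces Equation (2).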

The test values are interpreted as follows. A non-significant value of *I* means that the null hypothesis of a random spatial distribution of the variable cannot be rejected. Significant positive *z* values (above 1.96 at the 5% significance level) indicate positive spatial autocorrelation: similar values of the variable (high or low) are more spatially clustered than would be expected under a random distribution. Significant negative *z* values (below −1.96 at the 5% significance level) indicate negative spatial autocorrelation: similar values (high or low) are more dispersed, i.e., less clustered, than would be expected under a random spatial pattern.
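The decision rule above can be expressed by converting *z* into a two-sided p-value with the standard normal distribution, for which 1.96 is the critical value at α = 0.05. A minimal sketch using only the Python standard library; the function names and the returned labels are illustrative, not taken from any GIS package.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def interpret_global(z, alpha=0.05):
    """Classify a global Moran's I z-score following the decision rule in the text."""
    if two_sided_p(z) >= alpha:
        return "random (H0 not rejected)"
    return "positive autocorrelation" if z > 0 else "negative autocorrelation"


if __name__ == "__main__":
    print(round(two_sided_p(1.96), 3))   # about 0.05
    print(interpret_global(2.5))         # positive autocorrelation
    print(interpret_global(-0.8))        # random (H0 not rejected)
```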

To complete the global analysis of the distribution of the variables, the family of statistics proposed by Getis and Ord [46] is also used. Unlike the measures presented so far, these statistics assess spatial autocorrelation through distance-based, or spatial concentration, statistics.

Computing the statistic requires a critical distance *d*, which sets a radius of influence: two units are considered neighbours if they lie within this radius of each other.

It is given as follows:

$$G(d) = \frac{\sum\_{i=1}^{N} \sum\_{j=1}^{N} w\_{ij}(d) \, y\_i \, y\_j}{\sum\_{i=1}^{N} \sum\_{j=1}^{N} y\_i \, y\_j}, \quad i \neq j, \tag{4}$$

where $w\_{ij}(d)$ takes the value 1 if spatial units *i* and *j* are neighbours, i.e., lie within distance *d* of each other, and 0 otherwise.
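A minimal NumPy sketch of Equation (4) with binary fixed-distance weights follows. It assumes planar (projected) coordinates in the same units as *d* and strictly positive values of the variable, as is the case for beds and occupancy. It also returns $E[G] = W/(N(N-1))$, where $W$ is the sum of all weights; this is the standard null expectation of the General G but is not stated in the text. The *z*-score, which also needs the variance of *G*, is left out, and the function names are hypothetical.

```python
import numpy as np


def distance_band_weights(coords, d):
    """Binary weights: w_ij(d) = 1 if units i and j lie within distance d, else 0."""
    xy = np.asarray(coords, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise Euclidean distances
    w = (dist <= d).astype(float)
    np.fill_diagonal(w, 0.0)                     # a unit is not its own neighbour
    return w


def general_g(y, w):
    """Getis-Ord General G (Eq. 4) and its expected value under the null hypothesis."""
    y = np.asarray(y, dtype=float)
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)                     # enforce i != j
    n = y.size
    numerator = y @ w @ y                        # sum_ij w_ij(d) y_i y_j
    denominator = y.sum() ** 2 - (y ** 2).sum()  # sum over i != j of y_i y_j
    return numerator / denominator, w.sum() / (n * (n - 1))


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
    W = distance_band_weights(coords, d=1.5)
    # High values sit next to each other, so G exceeds its expectation.
    print(general_g([10, 10, 1, 1, 1, 1], W))
```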

Statistical significance is assessed with the standardised statistic *Z*, which asymptotically follows a standard normal distribution *N*(0, 1). When the test is significant, a positive *z* value above 1.96 indicates a concentration of high values, and a negative *z* value below −1.96 indicates a concentration of low values.

One of the main limitations of global autocorrelation tests is that they cannot detect local spatial structures (hotspots or coldspots), which may or may not be reflected in the global pattern [41,46–52]. Local spatial autocorrelation tests were developed to overcome this limitation. Their objective is to detect particularly high or low values of a variable (hotspots or coldspots) relative to its average. Because they are calculated for each spatial unit, they can identify the units that concentrate higher or lower values than would be expected under a homogeneous distribution.

As Vayá and Suriñach [52] point out, local spatial autocorrelation analysis may reveal two scenarios that global analysis does not. First, no global pattern of concentration or dispersion may be detected across the area as a whole, and yet there may be small clusters in which high (or low) values of the variable are concentrated. Second, when a global distribution pattern does exist, some spatial units may contribute more than others to the global indicator.

For this reason, the analysis of autocorrelation at a local level constitutes a good complement to the study of global distribution.

The local indicators of spatial association (LISA) proposed by Anselin [47] and the Gi family of statistics of Getis and Ord [46] and Ord and Getis [48] are the indicators most frequently used to study spatial autocorrelation at the local level. As explained in section two, in this study we chose Anselin's LISA maps [47], since we consider that their results allow a broader interpretation.

Anselin [47] proposes a set of local indicators of spatial association with two objectives: to identify significant local spatial groupings (clusters) and to detect pockets of spatial instability, understood as the presence of atypical values.

The most prominent of these indicators is the local Moran statistic $I\_i$, defined as:

$$I\_i = \frac{z\_i}{\sum\_{k=1}^{N} z\_k^2 / N} \times \sum\_{j \in J\_i} w\_{ij} \, z\_j, \tag{5}$$

where $z\_i$ is the standardised value of spatial unit *i* and $J\_i$ is the set of spatial units that are neighbours of *i*.

Under the null hypothesis of a random spatial distribution, the expected value of the statistic is:

$$E\_A(I\_i) = -\frac{w\_i}{N - 1},\tag{6}$$

where $w\_i$ is the sum of the elements in row *i* of the spatial weights matrix.
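Equations (5) and (6) can be sketched with NumPy as follows; this is an illustration under stated assumptions, not the software's code, and the function name is hypothetical. The values are standardised with the population standard deviation, so $\sum\_k z\_k^2 / N = 1$; because $I\_i$ is invariant to rescaling $z$, using plain deviations from the mean would give the same result. With row-standardised weights the local values sum to $N$ times the global $I$ of Equation (2).

```python
import numpy as np


def local_morans_i(y, w):
    """Local Moran's I_i (Eq. 5) and E[I_i] = -w_i / (N - 1) (Eq. 6) for every unit."""
    y = np.asarray(y, dtype=float)
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)                # a unit is not its own neighbour
    n = y.size
    z = (y - y.mean()) / y.std()            # standardised values z_i (population std)
    m2 = (z ** 2).sum() / n                 # sum_k z_k^2 / N, equal to 1 here
    lag = w @ z                             # sum over neighbours j of w_ij z_j
    local_i = z / m2 * lag
    expected = -w.sum(axis=1) / (n - 1)     # w_i is the row sum of unit i
    return local_i, expected


if __name__ == "__main__":
    # Row-standardised contiguity for four units in a row.
    W = np.array([[0, 1, 0, 0],
                  [0.5, 0, 0.5, 0],
                  [0, 0.5, 0, 0.5],
                  [0, 0, 1, 0]])
    local_i, expected = local_morans_i([1, 2, 3, 4], W)
    print(local_i)     # approximately [0.6, 0.2, 0.2, 0.6]
    print(expected)    # each entry is -1/3
```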

The standardised $I\_i$ statistic is assumed to follow a standard normal distribution *N*(0, 1).

The standardised statistic is interpreted as follows: a significant positive value (a *z*-score above 1.96 at the 5% significance level) indicates the presence of a cluster of similar (high or low) values of the variable around the unit, whereas a significant negative value (below −1.96 at the 5% significance level) indicates a spatial outlier.
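LISA maps commonly label each significant unit by the sign of its own standardised value and the sign of its spatial lag $\sum\_j w\_{ij} z\_j$: High-High and Low-Low correspond to clusters (positive $I\_i$), and High-Low and Low-High to spatial outliers (negative $I\_i$). A small sketch of that labelling follows. It assumes the *z*-score of each $I\_i$ has already been obtained (for instance, from the GIS software); the function name and labels are illustrative.

```python
def lisa_category(z_i, lag_i, z_score, crit=1.96):
    """Label one unit for a LISA map.

    z_i     : standardised value of the unit
    lag_i   : spatial lag, sum over neighbours j of w_ij * z_j
    z_score : standardised local I_i (assumed computed elsewhere)
    """
    if abs(z_score) <= crit:
        return "Not significant"
    high_self = z_i > 0
    high_neighbours = lag_i > 0
    if high_self and high_neighbours:
        return "High-High"      # cluster of high values
    if not high_self and not high_neighbours:
        return "Low-Low"        # cluster of low values
    return "High-Low" if high_self else "Low-High"   # spatial outliers


if __name__ == "__main__":
    print(lisa_category(1.2, 0.8, 2.5))    # High-High
    print(lisa_category(1.2, -0.8, -2.3))  # High-Low
```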

As discussed in section two, each of the tests described above requires a proximity criterion suited to the particular features of the study area. To make this choice, tests were run with each of the three possible criteria available under the geostatistical perspective adopted. In light of these tests, we chose the criterion most frequently used in the literature to date, the fixed distance band. The distance used is the software's default, which ensures that every spatial unit has at least one neighbour: 15.79 miles.
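The smallest fixed band distance that gives every unit at least one neighbour is the largest of the units' nearest-neighbour distances. The sketch below computes that quantity from planar coordinates; we assume, without having checked the software's internals, that this is the criterion behind the default distance mentioned above, and the function name is hypothetical.

```python
import numpy as np


def min_threshold_distance(coords):
    """Largest nearest-neighbour distance: the smallest band giving every unit a neighbour."""
    xy = np.asarray(coords, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # ignore each unit's zero distance to itself
    return dist.min(axis=1).max()


if __name__ == "__main__":
    # Nearest-neighbour distances are 3, 1, 1 and 6, so the threshold is 6.
    print(min_threshold_distance([(0, 0), (3, 0), (4, 0), (10, 0)]))
```

Any band shorter than this value would leave at least one unit without neighbours, which would make its row of the weights matrix empty.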

Having presented the tests used in this study, the following section reports the main results of the global and local spatial autocorrelation analysis of the two variables, beds available and occupancy level, in the region of Extremadura.
