1. Introduction
One of the major changes that have occurred within the framework of the theory of economic development over the last twenty years is the consolidation of a new paradigm known as territorial, endogenous or local development [1]. According to this paradigm, far from eliminating differences between areas, the process of globalization is stimulating the expansion of difference in all of its forms, consistent with the new spatial logic of global capitalism [2,3,4]. Some authors highlight the strong growth of medium-sized cities and their influence on urban and socioeconomic development [5,6,7].
With regard to national urban networks in Europe, the European Commission [8] highlights small and medium-sized European cities as centers for the development of industrial activities and services for research and technology, and for tourism and leisure. In recent decades, these cities have reaffirmed their role. They function as regional centers that must cooperate as part of a polycentric model in order to ensure their added value compared to other cities located in rural and peripheral areas, as well as in areas with specific geographical challenges and needs [9,10].
In line with what has happened in other parts of Europe, Andalusia (Spain) is embarking on a process of structural change in which intermediate cities are becoming increasingly visible. This process is based, inter alia, on the enhancement of the endogenous resources serving tourism development [11], with the conviction that this activity has a strong dynamic effect on the economy as a whole.
In this way, tourism has favored the proliferation of research on how this process contributes to the social, economic and cultural well-being of the inhabitants of these cultural heritage sites, focusing mainly on large monumental cities and, more recently, on medium-sized cities [12].
The capacity of tourism to act as a lever for development in medium-sized cities has been one of the most important consequences of the progressive incorporation of tourism along the whole of the Spanish Mediterranean coast. This has led to the generation of production specialization processes in many of these cities [13].
In fact, the whole of the Spanish Mediterranean coast has become, after 60 years of continuous development, one of the densest regions in Europe. Some authors speak of a long and compact “linear city”, made up of a conglomerate of hotels, restaurants and leisure facilities based almost exclusively on tourism [14].
Tourism, understood as an economic activity, has spread throughout the settlement system and has contributed directly to the strengthening of the network of medium-sized cities, which in turn drives the phenomenon of urban deconcentration on a regional scale. These medium-sized cities, which have been organized by capturing seasonal (tourist) flows, have given rise to places with favorable conditions for the residential location of certain segments of the population [15]. In this way, the set of tourist municipalities that, in the mid-twentieth century, were towns with temporary visitors has ended up becoming medium-sized cities or places with consolidated urban attributes [13,16].
Thus, tourism is a tool for territorial dynamism in this region, so that, for medium-sized cities, it can represent an important advance in terms of socioeconomic development, especially in the case of the medium-sized cities that are the object of this study. In this context, and taking as a reference the study by Pulido and Parrilla [17], the object of the research is the relationship between tourism development and socioeconomic development, focusing on medium-sized cities.
The hypothesis of this research is that the level of tourism development of an area (in this particular research study, the medium-sized cities of Andalusia) affects its level of socioeconomic development. Put another way, those territories with a higher level of tourism development are also those showing a higher level of socioeconomic development, which would demonstrate that tourism is an important instrument of endogenous development.
To test this hypothesis, several indicators aimed at measuring the level of tourism development and socioeconomic development of the medium-sized cities of Andalusia will be developed. Then, it will be verified whether any relationship exists between both indicators, and these cities will be classified in order to draw some conclusions.
3. Materials and Methods
The methodology of this study takes the research by Pulido and Parrilla [17] as a reference, with the particular difference that, in our case, the study is focused on medium-sized cities. The concept of a medium-sized city depends on the territorial framework and the aspects taken into account for its conceptualization. In the case of Andalusia, and from the economic and geographic perspective given in [20], the following characteristics are considered:
- Population size
- Population growth in recent years
- Capacity for territorial planning in relation to the urban functions performed
- Economic potential, degree of industrialization and specialization
Given these parameters, and acknowledging the importance of the configuration of these types of urban structures in Andalusia, medium-sized cities are those with populations between 10,000 and 90,000 inhabitants that show a rapid rate of population growth and that are sometimes located near large metropolitan areas. Their capacities for territorial planning have been established on the basis not only of the role of each population center within the system of cities, but also of the facilities that operate as intermediate centers with the capacity to organize the surrounding area.
Another important feature is the consideration of newly established companies, the jobs they create and the number of exporting firms, establishing the economic potential of medium-sized cities to determine their economic dynamism linked to the territory.
On the basis of the above, Table 1 shows the medium-sized cities analyzed in this paper, grouped by province of Andalusia.
Thus, first of all, the levels of tourism development and socioeconomic development of the selected cities have been analyzed. Then, it has been determined whether any relationship exists between both indicators. Finally, a classification of these cities according to the type of relationship that exists between these two latent variables has been presented, which allows for conclusions to be drawn that validate our initial hypothesis.
3.1. Selection of Indicators
An empirical work seeking to determine whether the level of tourism development of a territory (in our research, medium-sized cities of Andalusia) determines its level of economic development should be performed using a sufficiently long time horizon that allows meaningful measurement of the influence of the variables used in this research (in this case, 15 years). For this reason, in this study, the time horizon comprises the period from 2004 to 2019. This time horizon was chosen because the indicators are available for it and their evolution can be compared over the period.
In the present study, two latent variables are considered, which are called tourism development and socioeconomic development. These variables are determined by sixty-two manifest variables. Specifically, the tourism development variable is expressed in terms of twenty-nine of these manifest variables, while the socioeconomic development variable has been measured by the remaining thirty-three indicators.
These indicators have been chosen, taking into account the limitations that exist with regard to the availability of local information. The two main statistical sources providing information at the municipal level in Andalusia (National Institute of Statistics and Institute of Statistics and Cartography of Andalusia) have been consulted. These sources have redirected the selection of indicators of tourism development and socioeconomic development to other primary sources which provide individualized information on each area of study (tourism, economy, innovation, society, social welfare, environment), which enable, in general, an approximation of the measurement of the latent variables under study.
The full list of indicators (and their corresponding sources) that has been considered for each of the two latent variables can be found in Table 2.
Having selected the indicators, we calculated the relative rate of change of each one of them for the period 2004–2019. There is a total of n = 140 observations, corresponding to an equal number of medium-sized Andalusian cities. For each locality, p + q = 62 variables of tourism development and socioeconomic development have been measured in two time periods, that is, t_initial and t_final, and their corresponding relative rate of change between these two periods has been calculated.
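The expression for the rate is not reproduced in this text. Assuming the usual definition of a relative rate of change (the symbol x for an observed feature is chosen here for illustration), it would read:

```latex
\mathrm{RRC}(x) = \frac{x_{t_{\mathrm{final}}} - x_{t_{\mathrm{initial}}}}{x_{t_{\mathrm{initial}}}}
```

Under this definition, a value of 0.25 corresponds to a 25% increase of the feature over the period, and a value of −1 to its complete disappearance.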
Needless to say, all the observed features are quantitative in nature, so their relative rates of change are also quantitative variables. Specifically, in contrast to most of the features initially observed, rates are continuous quantitative variables, and their range of variation is the entire real space. These rates are, besides, dimensionless and are expressed as decimal values.
Finally, it should be noted that the positive or negative sign that applies to the direct or inverse relationship of each indicator with the two latent variables analyzed (tourism development and socioeconomic development) has been taken into consideration.
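The calculation of an indicator's rate and the treatment of its sign can be sketched as follows. This is only an illustrative sketch: it assumes the usual definition of a relative rate of change and assumes that an inverse indicator is handled by flipping the sign of its rate, which the text does not state explicitly. The function names are hypothetical.

```python
def relative_rate_of_change(initial, final):
    """Change between two periods relative to the initial value."""
    if initial == 0:
        # The rate is undefined when the feature starts at zero.
        raise ValueError("relative rate of change is undefined for initial == 0")
    return (final - initial) / initial


def oriented_rate(initial, final, direct=True):
    """Rate of change oriented so that larger values mean more development.

    Assumption: indicators inversely related to the latent variable
    (direct=False) simply have the sign of their rate reversed.
    """
    rate = relative_rate_of_change(initial, final)
    return rate if direct else -rate


# Example with made-up figures: a feature growing from 200 to 250
# has a rate of 0.25; an inverse indicator falling from 10 to 8
# contributes a positive oriented rate.
growth = relative_rate_of_change(200, 250)
improvement = oriented_rate(10.0, 8.0, direct=False)
```

The resulting values are dimensionless decimals, as described above.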
3.2. Structural Equation Modeling
Structural equation models (hereinafter referred to as SEM) [38,39,40,41] allow researchers to measure the relationships that occur between a set of independent variables and a set of dependent variables, as well as to determine the level of support that a sample of observations provides to the hypothesis of causality between latent variables. These models are used as confirmatory tools aimed at checking the different dependency relationships existing between the variables, in this case, tourism development and socioeconomic development.
Given that the overall aim is to check the level of support that the sample of observations provides to the hypothesis of causality between tourism development and socioeconomic development, tourism development is considered as an exogenous variable and will be denoted by ξ₁, while socioeconomic development will play the role of an endogenous variable and will be denoted by η₁. It is possible to model this situation by means of a diagram of paths or trajectories, as shown in Figure 1.
3.3. Factor Analysis of Principal Components
In principal component analysis (hereinafter referred to as PCA), the primary objective is to maximize the variance of a linear combination of variables. Suppose we have a sample of n observation vectors y_1, y_2, …, y_n, which together form a point cloud in a p-dimensional space. Assuming y to have an ellipsoidal distribution (only for better geometric visualization, as PCA may be applied with any distribution of y), if the p variables measured in each vector y_i are correlated, the ellipsoidal point cloud is not oriented parallel to any of the axes represented by the variables. So, we try to find the natural axes of the point cloud whose origin coincides with the centroid of the ellipsoid, ȳ, that is, the axes of the ellipsoid. This can be done through the translation of the origin to ȳ and, later, through the rotation of the axes. After this rotation, in which the new axes become the natural axes of the ellipsoid, the new variables, that is, the principal components, are uncorrelated, which means that the principal components’ variance–covariance matrix is diagonal.
The rotation of the axes can be performed by multiplying the vector of variables by an orthogonal matrix A, so that the distance from the origin is invariant.
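In symbols, writing z_i for the rotated vector (a name assumed here, since the original expression is not reproduced) and taking y_i as measured from the centroid, the rotation and the invariance it implies can be stated as:

```latex
z_i = A\,y_i, \qquad A^{\top}A = I \;\Longrightarrow\; z_i^{\top} z_i = y_i^{\top} A^{\top} A\, y_i = y_i^{\top} y_i
```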
It can be shown that the orthogonal matrix that performs this transformation is none other than the transpose of the matrix whose columns are the normalized eigenvectors of the variance–covariance matrix of the original dataset. When the variables present significantly different variances, or when the measurement units vary, the eigenvectors are extracted from the correlation matrix to obtain a more balanced representation.
Thus, it is possible to calculate as many principal components as measured variables, so that the first of these principal components explains the greatest proportion of the variance; the second principal component explains the greatest proportion of the variance that the first component has not been able to explain, and so on. Generally, provided that the variables are highly correlated with one another, the proportion of variance explained by the last principal components will be very small, so it will be possible to discard some of them and represent the sample data using fewer than p dimensions.
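The geometric idea described above (translate the origin to the centroid, then rotate onto the natural axes) can be illustrated with a small sketch restricted to two variables, where the rotation angle has a closed form. This is an illustration only, not the procedure used in the study: it works on the variance–covariance matrix rather than the correlation matrix, and all names are hypothetical.

```python
import math


def sample_covariance(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def principal_components_2d(points):
    """Return the principal component scores of a list of (x, y) points.

    Step 1: translate the origin to the centroid.
    Step 2: rotate by the angle that makes the covariance matrix diagonal.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sxx = sample_covariance(xs, xs)
    syy = sample_covariance(ys, ys)
    sxy = sample_covariance(xs, ys)

    # Angle of the major axis of the point cloud: tan(2θ) = 2·sxy / (sxx − syy).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Rows of the orthogonal matrix A are (cos θ, sin θ) and (−sin θ, cos θ).
    return [
        (cos_t * (x - mean_x) + sin_t * (y - mean_y),
         -sin_t * (x - mean_x) + cos_t * (y - mean_y))
        for x, y in points
    ]


# Made-up, strongly correlated sample.
sample = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.2, 0.8), (4.0, 4.3)]
scores = principal_components_2d(sample)
```

In this sketch the two columns of `scores` are uncorrelated, the first one carries the larger share of the variance, and the total variance is the same as that of the original variables.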
3.4. Statistical Software
In order to carry out this analysis, version 3.0.1 of the free statistical software R has been used. This is a modular program providing basic functionality that can be extended by downloading and installing a variety of additional packages that allow performing many statistical analyses. Within this context, a package can be defined as a group of functions that together solve a common problem. Among these packages, we find lavaan, which enables fitting different models involving latent variables, such as confirmatory factor analysis or structural equation modeling, among others. This has been, therefore, the package used for the data analysis.
The analysis has been carried out in two parts. First, a structural equation modeling (SEM) analysis is carried out to measure the possible relationship between tourism development and socioeconomic development. Then, a principal component analysis is carried out with the aim of obtaining a ranking of medium-sized cities based on tourism development and socioeconomic development, which provides richer results and, therefore, a broader discussion and conclusions.
4. Results and Discussion
After a descriptive analysis of the data, it is observed that most of the variable means fluctuate around zero. This is due to the fact that most of the values of the RRC (relative rate of change) vary between −1 and 1. It is possible to identify a group of variables (Population census, Natural Population Growth, Foreign Population, Annual Personal Income Tax, Commercial Vehicles, Hotel Rooms, Apartment-Hotel Rooms, 1-Key Apartment rooms, Film Screens) showing an unusually high variance. This is explained by the existence of extreme values far from the bulk of the observations, which implies a very different evolution of the medium-sized cities analyzed as far as these variables are concerned. After checking the data and ensuring that these extreme observations are not the result of any kind of error, but are actual values of the variables, the implementation of the structural equation analysis itself is carried out.
We began with the formulation of the model, which consists of a structural model relating the two latent variables and a measurement model relating each latent variable to its indicators.
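The model equations themselves are not reproduced in this text. A form consistent with the notation used here (ξ₁, η₁, γ, λX and λY), assuming the conventional SEM specification with a structural disturbance ζ₁ and measurement errors δ and ε (symbols introduced here for illustration), would be:

```latex
\begin{aligned}
\eta_1 &= \gamma\,\xi_1 + \zeta_1 \\
x_i &= \lambda^{X}_{i}\,\xi_1 + \delta_i, \qquad i = 1,\dots,p \\
y_j &= \lambda^{Y}_{j}\,\eta_1 + \varepsilon_j, \qquad j = 1,\dots,q
\end{aligned}
```

The first line is the structural model and the other two are the measurement model, with p indicators of tourism development, q indicators of socioeconomic development and p + q = 62.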
It remains now to be seen whether it is possible to identify the model. To do this, the necessary condition for the identification will be verified. While it is true that these conditions do not guarantee the identification of the model in all cases (they are only necessary, and not necessary and sufficient), it has been experimentally proven that the vast majority of models that meet these conditions happen to be identifiable.
The most important of these conditions states that the number of parameters to be estimated has to be less than or equal to the number of non-redundant elements of the sample variance–covariance matrix. In this case, the estimation of a total of 125 parameters is required, while the variance–covariance matrix of the 62 indicators includes a total of 1953 non-redundant elements.
Therefore, the first condition is fulfilled.
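The count of non-redundant elements follows from the fact that the variance–covariance matrix is symmetric and of order p + q = 62, so only its diagonal and one of its triangles carry distinct information:

```latex
\frac{(p+q)(p+q+1)}{2} = \frac{62 \times 63}{2} = 1953 \;\geq\; 125
```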
Another important condition is that relating to the number of indicators per latent variable. It is recommended that a minimum of three indicators per latent variable be used, and that each one of the indicators load only on one latent variable.
Moreover, regarding the metrics of the latent variables, their variances have been set to one, not only to satisfy the identification condition, but also to favor the convergence of the method of the parameter estimation. Finally, it is considered that the parameters of the regression coefficients of the indicators on their respective error terms are all equal to one.
Given the compliance of the model with the above four conditions, the probability that it can be identified is very high. We will check whether it is actually possible to estimate all the parameters that make up the model.
The tables presented below include the estimates of the parameters calculated using the maximum-likelihood method. For all parameters, their estimations and their standard errors, which have been calculated using a bootstrap or resampling method, are shown.
The p-value associated with the Z statistic, which tests the significance of each parameter, is also shown for parameters λX (Table 3), λY (Table 4) and γ (Table 5). According to Table 5, parameter γ indicates that the sample of observations would support the hypothesis of causality between tourism development and socioeconomic development, since the value of γ is significantly different from zero.
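As a minimal sketch of how such a significance test works, assume that the Z statistic is the ratio of an estimate to its standard error and that it is compared with a standard normal distribution (the usual Wald-type test). The function names are illustrative and are not taken from lavaan.

```python
import math


def z_statistic(estimate, standard_error):
    """Ratio of a parameter estimate to its standard error."""
    return estimate / standard_error


def two_sided_p_value(z):
    """Two-sided p-value of z under a standard normal distribution.

    Uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(estimate, standard_error, alpha=0.05):
    """True when the parameter differs from zero at confidence level 1 - alpha."""
    return two_sided_p_value(z_statistic(estimate, standard_error)) < alpha
```

With a made-up estimate of 0.5 and a standard error of 0.25, the statistic is 2 and the parameter would be judged significantly different from zero at a 95% confidence level.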
The results obtained should, however, be treated with caution, given that the high number of indicators loaded on each of the latent variables could be masking the true relations between them. As Hoyle [42] points out, there seems to be agreement among researchers regarding the consideration of a minimum of three indicators per latent variable for the structural equation analysis to be carried out without problems. There is no consensus, however, on whether a maximum number of indicators per factor exists. Yet, between five and ten manifest variables are usually considered for each latent variable. In this case, this number is significantly higher, so a re-specification of the model would be advisable.
As can be seen in the previously shown tables, only eight of the tourism development indicators are associated with a parameter that is significant at a 95% confidence level. Something similar happens with socioeconomic development, for which only sixteen out of the thirty-three indicators considered are associated with a significant parameter, considering the same confidence level. The other indicators cannot, therefore, be considered as such, since the parameter that goes with them is not significant, and they do not help to measure the latent variable in question. The variables whose parameters were significant are those listed in Table 6. Taking this into account, a new model that considers only these indicators will be fitted.
After verifying that the identification of this second model, which will be called the reduced model to distinguish it from the general model, is possible, its forty-nine parameters have been estimated using the maximum-likelihood method.
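The counting condition for both models can be checked with a short sketch. The breakdown of free parameters used here (one loading and one error variance per indicator, plus one structural coefficient, with the latent variances fixed to one) is an assumption; it is adopted because it reproduces the totals of 125 and 49 reported in the text. The function names are hypothetical.

```python
def non_redundant_elements(n_indicators):
    """Distinct entries of a symmetric covariance matrix of the given order."""
    return n_indicators * (n_indicators + 1) // 2


def free_parameters(n_x, n_y):
    """Assumed parameter count: loadings + error variances + one path coefficient."""
    n_indicators = n_x + n_y
    return 2 * n_indicators + 1


def satisfies_counting_rule(n_x, n_y):
    """Necessary (not sufficient) condition for identification."""
    return free_parameters(n_x, n_y) <= non_redundant_elements(n_x + n_y)


# General model: 29 tourism indicators and 33 socioeconomic indicators.
general = (free_parameters(29, 33), non_redundant_elements(62))

# Reduced model: 8 tourism indicators and 16 socioeconomic indicators.
reduced = (free_parameters(8, 16), non_redundant_elements(24))
```

Under this assumption, both models have fewer free parameters than non-redundant covariance elements, so the necessary condition holds in each case.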
As can be seen, all parameters λX (Table 7), λY (Table 8) and γ (Table 9) are now significantly different from zero at a 95% confidence level. This means that, on the one hand, given this confidence level, the initial set of indicators of the two latent variables is reduced to those presented in Table 6, while, on the other hand, the estimated relationship between the latent variables provides support for the causal relationship between tourism development and socioeconomic development.
Once the parameters have been obtained, the goodness-of-fit of the reduced model is analyzed, comparing the results with those of the general model. To do this, we will use the measurements outlined in Table 10 as a basis.
In general terms, we can conclude that the reduced model improves the measures of fit of the general model. In both models, the hypothesis that the observed covariance matrix is equal to the reproduced covariance matrix is rejected. Although this may be due to the fact that the model does not adequately reproduce the covariance matrix, this test is severely affected by large sample sizes, as in this case. Moreover, the reduced model improves all relative goodness-of-fit measurements (shown in italics in Table 10), obtaining values closer to one. The reduced model is also associated with smaller values of AIC, BIC and RMR compared to the same values for the general model, which implies a better fit of the former model compared to the latter [43,44,45,46,47,48].
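For reference, the two information criteria mentioned above are conventionally defined as follows, where \hat{L} is the maximized likelihood, k the number of free parameters and n the sample size (symbols chosen here; the text does not give the formulas):

```latex
\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n
```

Both criteria penalize the number of parameters, so a model with fewer parameters and a comparable likelihood obtains lower values.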
4.1. Ranking of Municipalities Based on Tourism Development and Socioeconomic Development
Finally, municipalities will be classified into two categories: the first one will be based on the value of the tourism development index that each place presents, while the second one will be based on the value of the socioeconomic development index. As is well known, these indexes are not directly observable or measurable, so in order to obtain their value in each of the one hundred and forty municipalities that make up the set of observations, the technique known as principal component analysis has been used.
4.1.1. Tourism Development Index
By applying this statistical technique to this specific case, and in order to obtain the values of the tourism development index (TDI), four principal components have been extracted from the correlation matrix (as the variables showed very different variances), which can be considered subindexes or sub-measures of that index. The TDI is then calculated as a weighted sum of these four subindexes, in which each subindex is weighted according to the percentage of variance that it explains. The first four principal components together explain 70.049% of the total variability of the observations, which is distributed as presented in Table 11.
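Written out, with C_k denoting the score on the k-th principal component and w_k its weight (both symbols are assumed here, as the original expression is not reproduced), the index takes the form:

```latex
\mathrm{TDI} = w_1 C_1 + w_2 C_2 + w_3 C_3 + w_4 C_4
```

Each w_k is determined by the percentage of variance explained by the corresponding component.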
Table 12 gives the expression of the four extracted components in terms of the manifest variables.
Once the weight of the TDI variables is known, the value of the TDI for each city is calculated. The thirty municipalities with the highest level of tourism development are listed in Table 13.
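The step from component scores to a ranked index can be sketched as follows. This is a minimal illustration with made-up inputs: it assumes that the weights are the explained-variance percentages rescaled to sum to one (the rescaling does not affect the ranking), and the function names and city labels are hypothetical.

```python
def composite_index(component_scores, explained_variance):
    """Weighted sum of component scores.

    Assumption: each weight is the component's explained variance divided
    by the total variance explained by the retained components.
    """
    if len(component_scores) != len(explained_variance):
        raise ValueError("one explained-variance value is needed per component")
    total = sum(explained_variance)
    return sum(s * v / total for s, v in zip(component_scores, explained_variance))


def rank_by_index(scores_by_city, explained_variance, top=None):
    """Sort cities from highest to lowest index, optionally keeping the top ones."""
    ranked = sorted(
        ((city, composite_index(scores, explained_variance))
         for city, scores in scores_by_city.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked if top is None else ranked[:top]


# Made-up example with three retained components and three cities.
explained = [40.0, 20.0, 10.0]
example_scores = {
    "City A": [1.0, 0.0, 0.0],
    "City B": [0.0, 1.0, 1.0],
    "City C": [-1.0, 0.5, 0.0],
}
ranking = rank_by_index(example_scores, explained)
```

The same routine could be applied to either index by supplying the corresponding component scores and explained-variance percentages.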
On the basis of the data obtained, it is possible to identify three main groups of medium-sized cities according to their spatial characteristics, resources and location. Firstly, we find a group referred to as “synergistic medium-sized cities”, including those cities that are part of the metropolitan area of Andalusian provincial capitals, which in principle, according to the position of many of them in the ranking, seem to have a strong tourist appeal. However, these kinds of cities are not in line with the concept of tourism destination, as their relevance is due to the fact that they offer a wide variety of services at competitive prices, which makes them become dormitory medium-sized cities linked to any of the big tourism capitals of Andalusia (Seville, Malaga and Granada).
In this analysis, this group of cities represents 25% of all medium-sized cities of Andalusia. Among the thirty cities that have shown the highest level of tourism development, we find the municipalities of Bormujos, Mairena del Aljarafe, Espartinas, Churriana de la Vega, Las Gabias, Ogíjares, Atarfe, Huércal de Almería, Cartaya, La Carlota, Chiclana de la Frontera, La Zubia, Gines, Punta Umbría and Tomares. These cities represent 50% of the thirty municipalities with the highest level of tourism development.
A second group, referred to as “coastal medium-sized cities”, has also been identified. It includes those cities that meet the definition of coastal tourism destinations, as they have high tourist appeal and possess a variety of natural tourism resources typical of the Andalusian coast. In the sample, this group represents 19% of the one hundred and forty observations. Out of the thirty municipalities with the highest level of tourism development, this group comprises the municipalities of Mijas, Marbella, Rincón de la Victoria, Níjar, Conil de la Frontera, Vera, Estepona, Torrox, Roquetas de Mar and Isla Cristina, representing 34% of these thirty municipalities.
The third group that has been identified corresponds to “inland medium-sized cities”, which includes those cities located in the interior that are classified as tourism destinations. These cities have a historical and cultural heritage within their territory that makes them unique tourism sites. This group represents 56% of all municipalities analyzed. As seen in Table 13, it includes cities such as Alhaurín de la Torre, Cártama, Arcos de la Frontera, Vícar and Vejer de la Frontera, which account for 16% of the most developed municipalities from the tourism point of view.
4.1.2. Socioeconomic Development Index
In line with the process carried out with the manifest variables of tourism development, five principal components have been extracted from the correlation matrix (since the variables show very different variances), which can be considered subindexes or sub-measures of the socioeconomic development index (SDI). The SDI is calculated as a weighted sum of these five subindexes, in which each subindex is weighted according to the percentage of variance that it explains. Together, the five principal components explain 73.303% of the total variability of the observations.
The values included in the fourth column of Table 14 will play the role of weights to calculate the SDI.
Meanwhile, Table 15 shows the weight of each variable on each one of the five components extracted. Its content enables the expression of each component as a linear combination of the variables, whose values have been previously standardized.
Using the expressions of the principal components, it is possible to calculate the scores of each of the observations of the sample data for each factor and, in turn, to obtain the value of the SDI in each municipality. It can be verified, therefore, that the five municipalities with the highest SDI are those listed in Table 16.
In order to draw conclusions regarding the SDI, it is necessary to compare the results with those discussed above, in the ranking of tourist cities (Table 13). The joint analysis of Table 13 and Table 16 shows that some municipalities change position between the two tables, or even appear in neither of them. This situation is due, according to the variables used in the analysis, to the fact that there are some municipalities with a high level of socioeconomic development which, however, does not correspond to the same level of tourism development; or, on the contrary, there are municipalities with a high level of tourism development which, from a socioeconomic perspective, show a lower level of development.
In short, it can be concluded that the reason for this disparity lies in the use of tourism as a development factor or, on the contrary, in the use of other factors that obviate the possible potential of tourism, as there may be other factors, not related to tourism, that determine, to a greater extent, the economic development level.
4.1.3. Global Index
A general index of tourism and economic development has also been constructed by averaging the values obtained for the two indexes. The thirty municipalities leading the general index are shown in Table 17.
Table 17 presents the ranking of cities by means of a global index representing the sum of the components that mark TDI and SDI. Therefore, the overall result shows major tourism destinations located both in coastal and inland Andalusia (shown in italics in the table), while the others correspond to medium-sized cities within the metropolitan area of large provincial capitals.
As a final point of discussion, it is worth mentioning that the direct relationship between tourism development and socioeconomic development is a matter of importance, especially given the current situation generated by the COVID-19 pandemic, in which the main world economies linked to tourism, such as the Spanish economy, have suffered a greater drop in their indicators of socioeconomic development.
Furthermore, the established ranking makes it possible to clearly see that those cities near the coast or large provincial capitals develop with a very clear pattern that corresponds to medium-sized cities in an area of touristic importance in Andalusia. Some measures to bear in mind in this regard are included in the conclusions.
5. Conclusions
The technique of structural equation modeling was applied to a total of one hundred and forty observations, for which a total of sixty-two relative growth rates were measured, obtained from the measurement of many other features in two different time periods. Twenty-nine of the relative rates of change analyzed make up the group of indicators of a latent variable that has been called “tourism development”, while the remaining thirty-three form a group of indicators related to another latent variable, referred to as “socioeconomic development”.
The maximum likelihood estimation of the parameters of the structural equation model revealed the existence of many non-significant parameters, so a re-specification of the model was performed by eliminating those variables whose parameters could be considered zero.
As a result, most of the parameters of the model were found to be significant at a 95% confidence level, and the hypothesis of causality between tourism development and socioeconomic development was supported at that same confidence level. This is especially relevant in view of the current situation of the COVID-19 pandemic and its effect on those territories that have experienced a decrease in tourism.
Once the ranking of municipalities was obtained, using the analysis of principal components, three lists were elaborated (municipalities with more tourism development, municipalities with more socioeconomic development and municipalities with more tourism and socioeconomic development) which allow for drawing the necessary conclusions for the set hypothesis.
In fact, this research work has demonstrated that, in the cities analyzed, there is a relationship between tourism development and socioeconomic development, or rather, that tourism development influences socioeconomic development. Furthermore, these cities develop with a very clear pattern; they grow around key tourist development areas for the Andalusia region.
It has also been shown that this relationship does not occur with equal intensity in all cities. In fact, it has been found that cities leading the ranking of the TDI do not occupy the same position in the ranking of the SDI, and the other way round, which means that, even having demonstrated this causal relationship, it is conditioned by a number of factors that make it more or less intense.
The next step, and therefore, a future line of research, would be to identify those factors that help or hinder this relationship, which ultimately explain why the position occupied by the cities in the two rankings is not the same.