**Geo Data Science for Tourism**

Editors

**Andrea Marchetti, Angelica Lo Duca**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Andrea Marchetti, Institute of Informatics and Telematics, National Research Council (IIT-CNR), Italy

Angelica Lo Duca, Institute of Informatics and Telematics, National Research Council (IIT-CNR), Italy

*Editorial Office*

MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *ISPRS International Journal of Geo-Information* (ISSN 2220-9964) (available at: https://www.mdpi.com/journal/ijgi/special_issues/GIS_tourism).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-5029-9 (Hbk) ISBN 978-3-0365-5030-5 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.


## **About the Editors**

#### **Andrea Marchetti**

Andrea Marchetti (Senior Technologist) works at the Institute of Informatics and Telematics of the National Research Council, Italy. His main research activities are data science, data quality, and web applications. He applies them to various fields such as cultural heritage, tourism, health, and other topics of national interest. He has participated in several European and national projects including OpeNer, Caper, and GeoMemories. He is currently in charge, on behalf of the IIT, of the Interreg "Pitem Pace Far Conoscere" project and the "Osiris-FO" project. Since 2012, he has been teaching courses on Web Design and Data Journalism at the University of Pisa.

#### **Angelica Lo Duca**

Angelica Lo Duca (Researcher) works at the Institute of Informatics and Telematics of the National Research Council, Italy. She is also an external professor of Data Journalism at the University of Pisa. Her research interests include Data Science, Data Journalism, and Web Applications. She used to work on Network Security, Semantic Web, Linked Data, and Blockchain. She has published more than 40 scientific papers at national and international conferences and journals. She has participated in different national and international projects and events. She has been a member of the Program Committee at different conferences. She is also part of the Editorial Team of the HighTech and Innovation Journal.

## **Preface to "Geo Data Science for Tourism"**

Tourism is one of the largest and most important industries in the world. It directly employs millions of people and generates billions of dollars in revenue each year. Given its importance, it is not surprising that data science is increasingly being used to understand and optimise the tourism industry. Geodata science, in particular, is playing a key role in this effort. By analysing large data sets, geodata scientists are able to identify patterns and trends that can be used to improve the efficiency and effectiveness of the tourism sector.

Geodata Science for Tourism aims at investigating the recent challenges in tourism seen from the point of view of data science. There are many challenges that tourism businesses face when it comes to data science. One of the biggest is understanding how to use data to drive decision making. With so much data available, it can be difficult to identify which features are most relevant and how to use them to improve operations. Additionally, data science can be used to better understand customer behaviour and preferences, which can help businesses tailor their offerings and better meet customer needs. However, collecting and analysing customer data can be costly and time-consuming, making it a challenge for smaller businesses in particular. Another key challenge is staying ahead of the curve in terms of technology and analytics. As the tourism industry evolves, so too do the tools and techniques that data scientists use to understand it. Businesses need to invest in keeping their data science teams up to date with the latest developments in order to stay competitive.

By understanding customer behaviour and preferences, businesses can make more informed decisions about where to open new locations, what type of amenities to offer, and how to price their services. Additionally, data can help businesses track and understand trends in the industry, such as shifts in customer demand or changes in competitors' offerings.

Operationally, data can be used to optimise everything from staffing levels to inventory management. By understanding which times of day are busiest or which services are most popular, businesses can staff accordingly and ensure they have the necessary supplies on hand.

In short, data is essential for businesses in the hospitality industry to succeed. Geodata science can help these businesses make better use of data to improve their decision making and operations.

There are many different types of data that can be used to study tourism. This includes data on tourist destinations, travel patterns, and spending. Geodata science is a relatively new field that uses geographical data to study tourism and its impact on the environment. Geospatial data is data that captures the location and shape of an object on the earth's surface. This type of data can be used to track the movement of people and objects, as well as to identify patterns and trends.

One of the benefits of using geodata science for tourism research is that it can help to identify trends and patterns in tourist behaviour, as you will see in the first four articles of the book. Another benefit of geodata science is that it can help to assess the impact of tourism on the environment. This includes looking at things such as water consumption, energy use, and carbon emissions. By understanding the environmental impact of tourism, we can make more sustainable choices about how we travel. Geodata science for tourism can also be used to predict future trends, as you will read in the last two chapters of the book. This information can be used by policymakers to make decisions about where to allocate resources.

We hope that this book will help you to extend your awareness of the benefits of using geodata science in the tourism industry.

**Andrea Marchetti and Angelica Lo Duca**

## *Article* **Nanjing's Intracity Tourism Flow Network Using Cellular Signaling Data: A Comparative Analysis of Residents and Non-Local Tourists**

**Lingjin Wang, Xiao Wu \* and Yan He**

School of Architecture, Southeast University, Nanjing 210096, China; lingjin\_wang@seu.edu.cn (L.W.); 230189014@seu.edu.cn (Y.H.)

**\*** Correspondence: 101010124@seu.edu.cn; Tel.: +86-138-5178-0536

**Abstract:** With the rapid development of transportation and modern communication technology, "tourism flow" plays an important role in shaping tourism's spatial structure. In order to explore the impact of an urban tourism flow network on tourism's spatial structure, this study summarizes the structural characteristics of the tourism flow networks of 43 scenic spots in Nanjing from three aspects—tourism flow network connection, node centrality, and communities—using cellular signaling data and the social network analysis method. A comparative analysis revealed the tourism flow network structures of residents and non-local tourists. Our findings indicated four points. Firstly, the overall network connectivity was relatively good. Core city nodes displayed high spatial concentration and connection strength. However, suburban nodes delivered poor performance. Secondly, popular nodes were intimately connected, although there were no "bridging" nodes. Lesser-known nodes were marginalized, resulting in severe node polarization. Thirdly, regarding the network community structure, the spatial boundary between communities was relatively clear; the communities within the core city were more closely connected, with some parts encompassing suburban nodes. Most suburban communities were attached to the communities in the core area, with individual nodes existing independently. Fourthly, the primary difference in the tourism flow network structures between residents and non-local tourists was that the nodes for residents manifested a more balanced connection strength and node centrality. Core communities encompassed more nodes with more extensive coverage. Conversely, the nodes for non-local tourists showed wide discrepancies in connection strength and node centrality. Furthermore, core communities were small in scale with clear boundaries.

**Keywords:** tourism flow; cellular signaling data; social network analysis; network connection; node centrality; communities

#### **1. Introduction**

Since the 1960s, due to the continued developments in modern science and technology (including computer and network information technology, advanced transportation, modern communications, globalization, and informatization), global networking has become a significant development tendency. Against this background, by integrating the Marxist theory of globalization, information theory, and postmodern space theory, sociologist Manuel Castells proposed a novel social research theory—the space of flows theory—revealing a new perspective on the organizational logic of the modern social system [1]. Under its influence, "space of flows" has become a research hotspot in the geography domain, and has inspired many research topics, including information flow, traffic flow, knowledge flow, culture flow, and technology flow.

**Citation:** Wang, L.; Wu, X.; He, Y. Nanjing's Intracity Tourism Flow Network Using Cellular Signaling Data: A Comparative Analysis of Residents and Non-Local Tourists. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 674. https://doi.org/10.3390/ijgi10100674

Academic Editors: Andrea Marchetti, Angelica Lo Duca and Wolfgang Kainz

Received: 20 July 2021; Accepted: 1 October 2021; Published: 4 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Tourism space is the projection of tourist activities in space. Regarding tourism destinations, tourists display the characteristic of mobility [2]. Therefore, the concept of tourism flow appeared when the spatial structures of tourism, at urban and regional levels, were studied from the perspective of tourist behavior [3]. It has become an important research topic in western tourism geography, which unfolds mainly from the spatial models of tourism flow, its causes and impact mechanisms, and tourists' spatiotemporal behavior. The study of the spatial models of tourism flow started in 1977 when Hill et al. proposed the "core-edge" model of tourism flow [4]. Most of the literature that followed also began from the spatial analysis angle, and successively proposed such laws or models as distance decay, the gravity model, and spatial dimension [5]. The research on the causes and impact mechanisms of tourism flow began in the 1980s, when knowledge from related subjects (including mathematics, geography, and economics) was introduced to explore the causal mechanism of the economic impact of tourism flow on destinations, and the law of its endogenous occurrence and occurrence pattern simulations. Since the beginning of the 21st century, such research has gradually matured into a system. There is quantitative research focusing on the related political factors and economic constraints [6], and the impacts of various supply-side resources [7]. The research on tourists' spatiotemporal behavior only began in the late 1990s. It focused on the prediction of the direction and quantity of tourism flow, and tourism consumption [8,9]. By comparison, tourism flow-related research in China started late, commencing in the 1980s. It focused mainly on spatiotemporal distribution and the law of evolution. The relevant explorations centered around the laws of the spatiotemporal evolution of tourism flow [10], and its node transfer mechanism on a countrywide, provincewide, and citywide scale [11,12], summarized in the spatial structural model of tourism flow [13,14]. In a preliminary exploration of the driving-force mechanisms of tourism flow, researchers usually relied on annual and seasonal change indices. They based their studies on statistical indicators, such as in-degree and out-degree nodes in social networks [15] and the skewness index [16]. They performed their research using time-series and cluster analyses, while simultaneously drawing on the push-pull theory, origin–destination (OD) distribution theory, and driving-force theory [17,18]. Whether information, traffic, or tourism flow, all of these flows are treated in this study as significant channels through which we may glimpse the structure of urban social and spatial networks.

Tourism flow shapes the structure of the tourism network, which is of great significance to the spatial study of tourism. Tourism networks are fundamentally social networks constituted by tourist behavior. In recent years, scholars have brought social network analysis from the sociological domain into the analysis of tourism flow network structures [19,20]. A social network is a structure made of mutually connected behavioral objects, which is considered to be a structure that is constituted by social relations. It is used extensively in studying social media networks, information communication networks, friend circles, business networks, kinship networks, and disease transmission networks [21]. Social network analysis is a research method that is based on social network theory and is applied to the social interaction between individuals in complex relationships [22]. This method has established rich model parameters for studying node objects, edge objects, and the constitutive features of networks themselves (including centrality analysis, community analysis, network correlation, and core-edge structure). These areas became the research directions for multiple-domain applications in sociology, mathematics, and computer and communication science. With the rapid development of information technology, large-scale statistics and vast amounts of data regarding tourists' spatiotemporal behavior are readily available. These data enable the study of tourism flow network structures, based on the social network analysis method, to move further toward refinement and quantification. Currently, the related big data studies concentrate on national and regional scales [23].
Scholars study tourists' spatial distribution and construct tourism flow networks using big data [24] (such as the GPS location information of pictures [25], independent itineraries [26], network travel notes, and tour routes in online bookings [27]), and further investigate the influencing evolutionary factors or network formation. Some scholars use location-based social networks to understand human mobility and people's behavior by mining check-in patterns, studying the influence of hidden structural patterns in social network nodes, and of changes in the external environment, on user check-in patterns [28]. In addition, due to the current global spread of COVID-19, many scholars have combined the geo-tagged data of social media with epidemic prevention needs to establish a reference model to predict the infection risk of social interaction and travel between residents and tourists [29,30].

Considering the gaps in existing research, this study is innovative in three respects. Firstly, it combines the application of big data with research on tourism flows within cities at the medium and micro scales. Due to the limited precision of big data, the demand for cleaning and screening users' phone signaling data within a city is higher; therefore, the existing research is mainly from a regional perspective, focusing on the tourist flow between cities and the relationship between the cities' tourism resources [31]. This study processed the data multiple times under a variety of experimental filter conditions, so that relatively accurate basic data could be used to analyze the spatial structure of tourism flow within the city. Secondly, it combines the social network analysis method with traditional space theory. Traditional space theory emphasizes static material space expression; however, this study used the social network analysis method to connect human activities with spatial structures, which can reflect the functional connection in urban space more accurately. Thirdly, it analyzes the differences between local residents and tourists in the tourism network structure. The differences in tourism flow network structures caused by different sources of tourists (local and non-local, age structure differences, consumption level differences, etc.) have been ignored in the existing research. This study attempts to conduct a comparative analysis of the tourism flow networks formed by local residents and tourists, and to uncover the differences between them.

Therefore, in order to further explore the impact of an urban tourism flow network on tourism's spatial structure, this study uses the social network analysis method and cellular signaling data. From the perspective of tourists' spatiotemporal behavior, and using a summary of the overall characteristics of the tourism flow network structure of Nanjing city proper, this paper differentiates between resident and non-local tourism flow network structures. In addition to addressing the deficiency in the existing research, regarding the application of big data and the absence of research scale, it provides a scientific basis for the differentiated organization of tourism space and tourism routes, urban infrastructure, transportation planning, and tourism social management.

#### **2. Materials and Methods**

#### *2.1. Research Districts*

Nanjing city has a long history and culture, and unique natural landscapes, making it a major scenic tourist destination at the national level. In 2015, the number of domestic tourists reached 99.9266 million, ranking it among the top tourist cities in the country. According to the 2018 monthly "Report on the Platform for Operation Monitoring of Smart Tourism Big Data in Nanjing", issued by the Nanjing Municipal Bureau of Culture and Tourism, the share of tourists visiting scenic spots graded 2A or above reached 91.8% (under China's "Classification and Evaluation of Tourist Scenic Spot Quality Grade", scenic spots are classified into five grades, with 5A the highest). This study took Nanjing city proper as its spatial scope, selecting all scenic spots graded 3A or above in the city as the specific research districts, comprising 43 popular scenic spots (see Figure 1), for tabulating the statistics of tourist spatial behaviors.

**Figure 1.** Research scope and selected scenic spots: (**a**) entire city of Nanjing; (**b**) core city of Nanjing.

#### *2.2. Data Sources*

This study performed two statistical analyses on different temporal and spatial scales, using cellular signaling data, to examine the phenomenon of tourism flow resulting from the spatial displacement of tourist crowds. This method overcame the insufficient temporal and spatial accuracy of tourist behavior data encountered in previous research that adopted the traditional regional-perspective analytical method. The data in this study came mainly from the cellular signaling data (user ID, number attribution, geographical position of triggering base station, and triggering moment) provided by Nanjing Mobile. The concrete selection steps were as follows. First, data appearing in Nanjing city proper on four statutory holidays in 2015, viz., November 14, 15, 21, and 27, were collected. Second, tourists were monitored at the related scenic spots in Nanjing city, with reference to the average length of visit for each scenic spot in the city. Data were selected for tourists who visited at least two scenic spots within the research scope (see the 43 scenic spots above) for over an hour. These two scenic spots were treated as the spatial departure and destination points. Third, on this basis, the data of local and non-local users were identified by number attribution. Based on the length of evening stays (staying for more than 3 consecutive hours between 24:00 and 07:00) on working days, versus day stays during working hours (staying for more than 4 consecutive hours between 07:00 and 19:00), the residences and workplaces of local users were identified. Based on the time spent on handling business in the daytime (staying for more than 2 consecutive hours between 08:00 and 19:00), the destinations of non-local users on business travel were identified and later excluded. In the end, from the 2.365 million active users (who appeared at least twice on the four statutory holidays), the data of 1.212 million local users and 108,000 non-local users were identified and used as the foundational data for this comparative study on tourism flow structures of local and non-local users.
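The selection of departure and destination points described above can be illustrated with a minimal sketch. It is not the authors' processing pipeline: the record layout, the `is_local` flag (standing in for number attribution), and the reading that each qualifying stay must exceed one hour are all assumptions made for illustration, and the home, workplace, and business-travel filters are omitted.

```python
from collections import defaultdict

# Hypothetical record layout (not from the source):
# (user_id, is_local, scenic_spot, arrival_hour, departure_hour)
MIN_STAY_HOURS = 1.0  # "visited for over an hour" in the text


def build_od_pairs(stays):
    """Return (origin, destination, is_local) for consecutive qualifying stays."""
    by_user = defaultdict(list)
    for user, is_local, spot, start, end in stays:
        if end - start > MIN_STAY_HOURS:  # keep only stays longer than one hour
            by_user[(user, is_local)].append((start, spot))

    pairs = []
    for (_, is_local), visits in by_user.items():
        visits.sort()  # chronological order of the user's qualifying stays
        spots = [spot for _, spot in visits]
        # A user with fewer than two qualifying stays yields no pair.
        for origin, dest in zip(spots, spots[1:]):
            if origin != dest:
                pairs.append((origin, dest, is_local))
    return pairs


stays = [
    ("u1", True, "Xuanwu Lake", 9.0, 11.0),
    ("u1", True, "Confucius Temple", 13.0, 15.5),
    ("u2", False, "Zhongshan Scenic Area", 10.0, 10.5),  # too short, dropped
    ("u2", False, "Confucius Temple", 12.0, 14.0),
]
od_pairs = build_od_pairs(stays)
```

In this toy input only user `u1` contributes a pair, because `u2` has a single stay longer than one hour.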

Nanjing Mobile's market share was 64.7%, meaning the above identification results amounted to a large sample. Based on the permanent resident population of approximately 8 million in Nanjing's 6th population census, the sample of identified local users equaled 23.4% of the total population. Considering the average daily reception of 270,000 non-local tourists in Nanjing, the non-local users in the sample equaled 10%. Such samples are far larger than those in traffic and manual questionnaire surveys. No statistical data were available to verify the accuracy of the tourist spot identification results. However, on the four holidays, the correlation coefficients between the daily distributions of any two recreation areas all exceeded 0.8, indicating that the identification results for the recreation areas were stable and credible and, therefore, representative overall.
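The source does not spell out how these two shares were computed. One reading that reproduces both figures, offered here only as an assumption, scales the local users by the operator's market share before dividing by the census population, and averages the non-local users over the four survey days:

$$\frac{1.212 \times 10^{6} / 0.647}{8 \times 10^{6}} \approx \frac{1.873 \times 10^{6}}{8 \times 10^{6}} \approx 23.4\%, \qquad \frac{108{,}000 / 4}{270{,}000} = \frac{27{,}000}{270{,}000} = 10\%$$

Under this reading the non-local share is not scaled by market share, so the two percentages are not computed on exactly the same basis.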

#### *2.3. Research Methods*

The transfer and diffusion of tourists between tourism destinations produces a certain connection between each destination, giving rise to a dynamic evolutionary system. The social network analysis method can precisely depict the various relationships within a system, from the perspective of macro-structural relations. For this reason, it has been used reliably in recent tourism studies. Therefore, this study, with the aid of the social network analysis software UCINET [32], analyzed three aspects, namely tourism flow network connection, node centrality, and network communities. In addition, the overall characteristics of Nanjing's tourism flow network structure were summarized visually using the ArcGIS digital technology platform. On this basis, this study further compared the differences in the characteristics of tourism flow network structures between local residents and non-local tourists. The concrete research methods are shown in Figure 2.

#### 2.3.1. Methods for Analyzing Tourism Flow Network Connections

Based on social network analysis, statistics on the spatial departure and destination points of users' tourist activities were collected at each scenic spot, per their spatial positions. The scenic spots were set as network nodes, and visiting tourists were sequentially linked to form a network. By collecting the flow volume statistics of each node, we obtained the spatial concentration of each node in the entire tourism flow network to represent the out-degree centrality. Next, an asymmetric adjacency matrix was constructed, the multi-valued matrix was converted to a binary matrix, and the strength of network connection between nodes was analyzed in ArcGIS.
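As a minimal sketch of the matrix construction described above, the following builds a directed (asymmetric) valued adjacency matrix from origin–destination pairs and converts it to a binary one. The node names, the toy pairs, and the binarization cut-off `threshold=0` are assumptions for illustration; the source does not state the cut-off that was used.

```python
def adjacency_matrix(nodes, od_pairs):
    """Count directed flows: entry [i][j] is the number of trips from node i to node j."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for origin, dest in od_pairs:
        matrix[index[origin]][index[dest]] += 1
    return matrix


def binarize(matrix, threshold=0):
    """Convert the multi-valued matrix to 0/1; the threshold is an assumed cut-off."""
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


nodes = ["A", "B", "C"]  # hypothetical scenic spots
od_pairs = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C")]

valued = adjacency_matrix(nodes, od_pairs)
binary = binarize(valued)
out_flow = [sum(row) for row in valued]  # row sums: outgoing flow volume per node
```

The matrix is asymmetric because a flow from A to B is counted separately from a flow from B to A.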

**Figure 2.** Research methodology.

#### 2.3.2. Methods for Analyzing Node Centrality in Tourism Flow Networks

The centrality of nodes is a significant indicator in social network analysis. Through the depiction of the centrality of different standards, the value and status of a node's existence can be reflected in tourism flow network structures. Three forms of centrality have been selected for analysis in this study:

• In-degree centrality (popularity and attractiveness)

In-degree centrality measures the popularity of a certain node, representing the extent of a scenic node's clustering ability. This study selected the in-degree centrality and standardized it to the [0,1] interval. The standard in-degree centrality $C_{RD_i}$ of node *i* is calculated as:

$$C_{RD_i} = \frac{\sum_{j} a_{ji}}{\max\left(\sum a\right)} \tag{1}$$

• Eigenvector centrality (latent attractiveness)

This depends on the direct relation of a node with its adjacent points and represents the connectivity level of tourism nodes. For the adjacency matrix *A*, the relative centrality score $X_i$ of node *i* satisfies:

$$X_i = \frac{1}{\lambda} \sum_{j} a_{ji} X_j \tag{2}$$

Rewriting the expression gives the eigenvector calculating equation:

$$A\mathbf{x} = \lambda\mathbf{x} \tag{3}$$

Solving and standardizing the matrix gives the calculating equation of the standard eigenvector centrality $C_{R\beta}$:

$$C_{R\beta} = \alpha \left( I - \lambda A \right)^{-1} A I \tag{4}$$

In the equation, *α* is the standardized constant, *λ* is the eigenvalue corresponding to the first eigenvector, which determines the importance of the adjacent point to centrality, and *I* is the identity matrix.

• Betweenness centrality (controlling ability of mediation)

This measures the ability of a node to control the movement of tourists between node pairs in the tourism flow network, manifesting the controlling ability of a node, or its network mediating and moderating effects.

Assuming that the number of shortest paths between node *j* and node *k* is $g_{jk}$, and the number of shortest paths between node *j* and node *k* that pass through *i* is $g_{jk}(i)$, then the probability of *i* lying on a shortest path between *j* and *k* indicates the betweenness centrality of node *i*. With *n* denoting the number of nodes in the network, the standard betweenness centrality is calculated as:

$$C_{RB_i} = \frac{2\sum_{j \neq k} \frac{g_{jk}(i)}{g_{jk}}}{(n-1)(n-2)} \tag{5}$$
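The three centrality measures can be sketched in plain Python as follows. This is an illustrative implementation, not UCINET's. Several choices are assumptions: the eigenvector score is obtained by power iteration on Equation (2) and scaled so that the largest score is 1 (the constant $\alpha$ of Equation (4) is not reproduced), and betweenness sums over ordered node pairs and divides by $(n-1)(n-2)$, which on a symmetric network matches Equation (5) if its sum is read over unordered pairs. The four-node example network is hypothetical.

```python
from collections import deque


def in_degree_centrality(a):
    """Eq. (1): column sums of the adjacency matrix divided by the largest column sum."""
    n = len(a)
    in_sums = [sum(a[j][i] for j in range(n)) for i in range(n)]
    top = max(in_sums)
    return [s / top for s in in_sums] if top else [0.0] * n


def eigenvector_centrality(a, iterations=200):
    """Power iteration on Eq. (2), rescaled each step so the maximum score is 1."""
    n = len(a)
    x = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(a[j][i] * x[j] for j in range(n)) for i in range(n)]
        top = max(nxt)
        if top == 0:
            return [0.0] * n
        x = [v / top for v in nxt]
    return x


def shortest_path_counts(a, source):
    """Breadth-first search: distance and number of shortest paths from source."""
    n = len(a)
    dist, count = [None] * n, [0] * n
    dist[source], count[source] = 0, 1
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if a[u][v]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    count[v] += count[u]
    return dist, count


def betweenness_centrality(a):
    """Sum g_jk(i) / g_jk over ordered pairs (j, k), divided by (n - 1)(n - 2)."""
    n = len(a)
    info = [shortest_path_counts(a, s) for s in range(n)]
    scores = [0.0] * n
    for j in range(n):
        dist_j, count_j = info[j]
        for k in range(n):
            if k == j or dist_j[k] is None:
                continue
            for i in range(n):
                if i in (j, k) or dist_j[i] is None:
                    continue
                dist_i, count_i = info[i]
                # i lies on a shortest j-k path when the two legs add up to d(j, k)
                if dist_i[k] is not None and dist_j[i] + dist_i[k] == dist_j[k]:
                    scores[i] += count_j[i] * count_i[k] / count_j[k]
    return [s / ((n - 1) * (n - 2)) for s in scores]


# Hypothetical symmetric network: a triangle 0-1-2 with node 3 attached to node 2.
a = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
in_deg = in_degree_centrality(a)
eig = eigenvector_centrality(a)
btw = betweenness_centrality(a)
```

In this example node 2 scores highest on all three measures, since it has the most links and is the only route between node 3 and the rest of the network.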

Based on the above standard centrality, centrality was divided into three classes (high, middle, and low) through natural breaks, in order to ensure that the internal difference within the same class was minimized, while the differences between different classes were maximized. This paper combined the characteristics of node centrality (in-degree centrality, eigenvector centrality, and betweenness centrality) (see Table 1), and constructed an evaluation system for the node status in the tourism network through the (matrix) model.
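The idea behind a three-class natural-breaks split can be shown with a small brute-force sketch: try every pair of cut positions in the sorted values and keep the one with the smallest total within-class squared deviation. This is an assumed stand-in for illustration only, suitable for a few dozen values, and is not the routine implemented in ArcGIS. The scores below are made up.

```python
def natural_breaks_3(values):
    """Split values into three contiguous classes minimizing within-class squared deviation."""
    data = sorted(values)
    n = len(data)  # requires at least three values

    def sse(chunk):
        mean = sum(chunk) / len(chunk)
        return sum((v - mean) ** 2 for v in chunk)

    best = None
    for i in range(1, n - 1):          # first cut: end of the low class
        for j in range(i + 1, n):      # second cut: end of the middle class
            cost = sse(data[:i]) + sse(data[i:j]) + sse(data[j:])
            if best is None or cost < best[0]:
                best = (cost, data[:i], data[i:j], data[j:])
    return best[1], best[2], best[3]


scores = [0.02, 0.05, 0.04, 0.41, 0.45, 0.38, 0.93, 1.0]  # hypothetical centralities
low, middle, high = natural_breaks_3(scores)
```

Minimizing the within-class deviation is equivalent to maximizing the separation between class means for a fixed data set, which is the property the paragraph above describes.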

**Table 1.** Combination relationship of three types of node centrality in the tourism network.


#### 2.3.3. Methods for Analyzing the Communities in the Tourism Flow Network

Community analysis is another important focus of research in social network analysis. The communities that are demarcated through nodes, and their connections, can reflect the extent of independence and popularity change of network tourism routes. This paper used the CONCOR method of the UCINET analysis software to perform a cohesive subgroup analysis, measuring the strength of node connections in flow volume. The entire network was divided into several sub-networks with powerful internal connections. Through multiple iterations, a correlation coefficient matrix was created. The higher the numerical value of the density matrix, the closer the connection between subgroups. This has guiding significance for tourists in selecting combinations of tourism nodes and designing tourism routes. Furthermore, through the "core-edge" model, we further determined the status of tourism nodes in the overall network, and summarized and analyzed the structural model of Nanjing's tourism flow network.
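The iterated-correlation idea behind CONCOR can be sketched as follows. This is a textbook-style illustration under stated assumptions, not UCINET's implementation: it correlates only the rows (outgoing profiles) of the matrix, performs a single two-way split, and assumes no row is constant (a constant row would make the correlation undefined). The six-node network is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_matrix(rows):
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]


def concor_split(matrix, max_iter=50, tol=1e-6):
    """Correlate the correlation matrix repeatedly, then split nodes by sign."""
    corr = correlation_matrix(matrix)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break  # every entry is close to +1 or -1
        corr = correlation_matrix(corr)
    first = [i for i, v in enumerate(corr[0]) if v > 0]
    second = [i for i, v in enumerate(corr[0]) if v <= 0]
    return first, second


# Hypothetical network with two tightly knit groups: nodes 0-2 and nodes 3-5.
flows = [[0, 1, 1, 0, 0, 0],
         [1, 0, 1, 0, 0, 0],
         [1, 1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1, 1],
         [0, 0, 0, 1, 0, 1],
         [0, 0, 0, 1, 1, 0]]
block_a, block_b = concor_split(flows)
```

Applying the same split again within each block would yield four subgroups, which is how a finer partition is usually obtained with this approach.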

#### **3. Results**

#### *3.1. Analysis of Tourism Network Connections*

#### 3.1.1. Spatial Concentration of the Tourism Flow Network

According to the research methods stated in Section 2.3.1, of the 43 scenic spots comprising the spatial scope, we selected 10,252 pairs of nodes with vectors. We collected the statistics of each node's visit volume. Then, the nodes were divided into five classes, according to natural breaks. The larger the nodes, the higher their spatial concentration. The spatial concentration of the overall–local–non-local networks (see Figure 3) was thus derived, representing their out-degree centrality. The results showed the following.

**Figure 3.** Spatial concentration of the tourism flow network under three types of network context (overall, resident, and non-local tourist): (**a**) overall tourism flow network of entire city; (**b**) resident tourism flow network of entire city; (**c**) non-local tourism flow network of entire city; (**d**) overall tourism flow network of core city; (**e**) resident tourism flow network of core city; (**f**) non-local tourism flow network of core city.

As a whole, the overall connectivity of Nanjing's tourism flow network was relatively good. No nodes were completely isolated. Across the entire city proper, the network density was not high. However, the nodes in the core city were highly popular, with flows exhibiting clustering. Nodes whose spatial concentration was in the first class (Zhongshan Scenic Area, Confucius Temple, Xuanwu Lake, Presidential Palace (1912 District), Nanjing University Gulou District, and Hongshan Forest Zoo) were all located in the core city. Their spatial concentrations were far higher than those in the second class, thus demonstrating their absolute core status. Additionally, all of the nodes whose spatial concentration ranked in the first three classes were in the core city. Most of the 14 nodes with the lowest class spatial concentration were in the suburbs outside the core city (including Gaochun International Cittaslow Tranquil, Pingshan Forest Park, Jinniu Lake Scenic Area, Fangshan Scenic Area, Dajinshan Scenic Area, Tiansheng Bridge Scenic Area, etc.), except for the Nanjing Yangtze River Bridge and Meixian Xincun Memorial Hall, which were in the core city. Generally, the spatial concentration of Nanjing's tourism flow showed a network characteristic of "exceptionally high spatial concentration for core city nodes, low spatial concentration for suburban nodes, and outstanding performance of core nodes".

By comparing the nodes' spatial concentrations formed by the tourism behavior of residents and non-local tourists, we discovered that residents' visits to scenic spots manifested higher densities than those of non-local tourists, although both displayed the characteristic of high spatial concentration for the core city network, and low spatial concentration for the suburban network. The spatial concentration of residents and non-local tourists related to core city nodes was generally similar. Residents displayed higher spatial concentration than non-local tourists in terms of humanistic scenic spots (e.g., the Nanjing Museum, Nanjing University Gulou District, and Yihe Road Republican Architecture Complex). In comparison with non-local tourists, residents showed higher and more comprehensive spatial concentration regarding suburban scenic spots. For non-local tourists, the nodes of Pingshan Forest Park, Fangshan Scenic Area, and Gaochun International Cittaslow Tranquil were isolated. In sum, both groups, residents and non-local tourists, displayed a characteristic of "relatively consistent overall spatial concentration, with slight differences existing in individual nodes".

#### 3.1.2. Analysis of the Connection Strength of the Tourism Flow Network

Based on the above conclusion, this study further analyzed the strength of the network connection between nodes. The lines between tourism nodes represent the volume of tourism flows, with the thickness of the lines indicating the volume size, and the arrows signifying the direction of the flows. The connection strength was obtained from the overall–local–non-local networks (see Figure 4). Our findings showed the following.

Generally, over 85% of Nanjing's node connections were concentrated in the core city area. Fundamentally, the node connections gradually weakened as the distance from the core city area increased. Most of the suburban nodes showed a unidirectional inflow. Such connections resulted from the spillover from the core city nodes. The top five node pairs in the network connections (Zhongshan Scenic Area–Confucius Temple, Zhongshan Scenic Area–Xuanwu Lake, Beiji Ge Park (Jiming Temple)–Xuanwu Lake, Hongshan Forest Zoo–Xuanwu Lake, and Xuanwu Lake–Hongshan Forest Park) were all bidirectionally connected. Among them, Zhongshan Scenic Area, Confucius Temple, and Xuanwu Lake formed strong connections with two or more nodes and, thus, served as the absolute core in the network connection. Most of the second-class node pairs were formed by connections between first-class nodes and other nodes, and so on. Thus, a "tree" network structure manifests in Nanjing city's tourism flow network, where nodes with high connection strength spill over progressively to those with low connection strength. On the whole, the connection strength of Nanjing's tourism flow showed a "high-strength connection between nodes in the core city area, with a progressive decrease towards the periphery spatially".

Comparing the node connection strength that was formed by resident and non-local tourist behaviors, we found that both exhibited high connection strength in the core city, and low connection strength in the suburbs. However, the nodes that residents visited showed a higher overall connection strength than those of non-local tourists. The connection strength of the nodes visited by residents displayed a relatively balanced spatial distribution, and more diverse types. Comparatively, the connection strength of the nodes visited by non-local tourists showed a more concentrated spatial distribution, with blank spaces appearing in the network connection of some suburban nodes. Additionally, the spatial concentration tended to fall on higher-rated and better-known node connections. Resident-visited nodes that fell in the first grade of connection strength included the following four node pairs: Xuanwu Lake–Zhongshan Scenic Area, Xuanwu Lake–Beiji Ge Park (Jiming Temple), Xuanwu Lake–Hongshan Forest Zoo, and Old East Gate–Confucius Temple; the nodes visited by non-local tourists with first-grade connection strength included the following two node pairs: Xuanwu Lake–Zhongshan Scenic Area and Confucius Temple–Zhongshan Scenic Area. This shows that marked differences existed between these two groups. In terms of tourism route selection, residents attached greater priority to proximity in spatial location, whereas non-local tourists tended to choose well-known scenic spots. Generally, concerning node connection strength, "the nodes residents visited displayed a spatial balance in connection strength and diversification of types, while the nodes visited by non-local tourists exhibited a spatial concentration and inclination toward highly-rated nodes. The two groups showed distinct differences in tourism route selection".
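The directed, weighted connections analyzed in this subsection can be illustrated with a minimal Python sketch. It assumes each trip is already available as an ordered list of visited nodes; the trips below are invented for illustration (only the node names come from the text), and the helper names `build_flow_edges` and `bidirectional_pairs` are hypothetical, not part of the authors' workflow.

```python
from collections import Counter

def build_flow_edges(trajectories):
    """Count directed moves between consecutive scenic spots in each trip."""
    flows = Counter()
    for stops in trajectories:
        for origin, destination in zip(stops, stops[1:]):
            if origin != destination:  # skip repeated records at the same spot
                flows[(origin, destination)] += 1
    return flows

def bidirectional_pairs(flows):
    """Return unordered node pairs that carry flow in both directions."""
    return {frozenset(edge) for edge in flows if (edge[1], edge[0]) in flows}

# Hypothetical trips: node names are from the text, the visits are invented.
trips = [
    ["Zhongshan Scenic Area", "Confucius Temple"],
    ["Zhongshan Scenic Area", "Confucius Temple"],
    ["Confucius Temple", "Zhongshan Scenic Area", "Xuanwu Lake"],
    ["Xuanwu Lake", "Zhongshan Scenic Area"],
    ["Confucius Temple", "Fangshan Scenic Area"],
]
flows = build_flow_edges(trips)
pairs = bidirectional_pairs(flows)
```

In this toy data, the suburban node (Fangshan Scenic Area) only receives flow, which mirrors the unidirectional inflow reported for most suburban nodes.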

**Figure 4.** Connection strength of tourism flow networks under three types of network context (overall, resident, and non-local tourist): (**a**) overall tourism flow strength of entire city; (**b**) resident tourism flow strength of entire city; (**c**) non-local tourism flow strength of entire city; (**d**) overall tourism flow strength of core city; (**e**) resident tourism flow strength of core city; (**f**) non-local tourism flow strength of core city.

#### *3.2. Analysis of Node Centrality in the Tourism Flow Network*

#### 3.2.1. Node Centrality Analysis

Centrality is a quantitative statistic of node power, as seen in the analytical methods for tourism flow networks in Section 2.3.2. Centrality can depict the values of nodes in tourism networks. The calculations in this paper produced the differentiated measure values of the three types of centrality under different tourism flow network contexts, namely overall, local, and non-local (see Table 2). The natural break classification categorized them into three levels. The internal difference within the same level was minimized, while the differences between levels were maximized. A spatial visualization was also performed (see Figures 5–7).
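The three-level natural break idea can be sketched in a few lines of Python. This is a brute-force illustration under the assumption that "natural breaks" means choosing the two cut points that minimize the within-class sum of squared deviations; it is not the authors' GIS implementation, and the scores are invented.

```python
def sum_squared_dev(values):
    """Sum of squared deviations from the class mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def natural_breaks_3(values):
    """Split values into three contiguous classes with minimal within-class variation."""
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least three values")
    best = None
    # Try every pair of cut points; each class keeps at least one value.
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            groups = (data[:i], data[i:j], data[j:])
            cost = sum(sum_squared_dev(g) for g in groups)
            if best is None or cost < best[0]:
                best = (cost, groups)
    return best[1]

# Hypothetical centrality scores, for illustration only.
scores = [0.24, 0.02, 0.78, 0.05, 0.21, 0.71, 0.03, 0.26]
low, medium, high = natural_breaks_3(scores)
```

Because the total variation is fixed, minimizing the within-class part is equivalent to maximizing the between-class part, which matches the description above. The exhaustive search is only practical for small node sets such as the 43 scenic spots studied here.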


**Table 2.** Tabulation of three types of node centrality in the tourism flow network.



\* Note: the in-degree centrality is ranked in descending order, and only the first 15 are shown.

• Popularity and Attraction of Network Nodes (In-degree Centrality)

As shown in Figure 5, the popularity and attraction of the core city nodes were far higher than the suburban nodes. A total of 90% of the core city nodes were within the high and medium levels of popularity and attraction, with a relatively small range. As for the suburban nodes, their popularity and attractiveness were low, with only a small number at the high and medium levels of popularity and attractiveness.

A comparison of the popularity and attractiveness of the nodes visited by residents and non-local tourists showed that, in both groups, core city nodes were more popular than suburban nodes. The most popular nodes included Xuanwu Lake, Zhongshan Scenic Area, Confucius Temple, and Nanjing University Gulou District. The nodes that attracted residents were more balanced in their spatial distribution (particularly in the core city), with many of the nodes falling within the high and middle classes. By comparison, the nodes that attracted non-local tourists were more concentrated, and the number of popular nodes was far lower than that of the popular nodes visited by residents.

**Figure 5.** Classification map of the in-degree centrality of nodes under three types of network context (overall, resident, and non-local tourist): (**a**) entire city of Nanjing; (**b**) core city of Nanjing.

• Latent Attraction of Network Nodes (Eigenvector Centrality)

As indicated in Figure 6, the intensity of the activities around the core city nodes was far higher than in the suburban nodes. Six nodes were located in centers of intense activity (Zhongshan Scenic Area, Xuanwu Lake, Hongshan Forest Zoo, Confucius Temple, Nanjing University Gulou District, and Yihe Road Republican Architectural Complex). Qixiashan Scenic Area, in the suburbs, had a high-level latent attraction, which resulted from the high intensity in the core city nodes.

Comparing the latent attraction of the nodes that were visited by residents and non-local tourists, our results showed that the two networks were similar in the core city area. The nodes that were visited by residents achieved a higher rate of latent attraction than those that were visited by non-local tourists. In the suburbs, the nodes in the non-local tourist network displayed a higher level of latent attraction than those that were visited by residents. The findings show that residents had more flexible and varied selections of tourism nodes, while non-local tourists selected popular nodes as often as possible.

**Figure 6.** Classification map of the eigenvector centrality of nodes under three types of network context (overall, resident, and non-local tourist): (**a**) entire city of Nanjing; (**b**) core city of Nanjing.

• Mediating and Controlling Power of Network Nodes (Betweenness Centrality)

As shown in Figure 7, there was an unclear relationship between the mediating and controlling power of the nodes in the city proper and their spatial locations. Almost all of the nodes were at medium and low levels, with Zhongshan Scenic Area being the only essential node. This indicates that all of Nanjing city's nodes have weak mediating and controlling power, meaning each node was relatively independent, or mostly combinations of nodes, with no nodes being necessary. Thus, the network structure was rather loose, with no significantly popular tourism routes (links whose node number was greater than two).

Similarities were found in the comparison of the mediating and controlling power of the nodes visited by residents and non-local tourists. The nodes that non-local tourists selected had slightly stronger mediating and controlling power than those that were selected by residents. This shows that the combinations of nodes in the non-local tourism flow network were relatively uniform and stable, while those in the resident networks were more flexible and varied.

**Figure 7.** Classification map of the betweenness centrality of nodes under three types of network context (overall, resident, and non-local tourist): (**a**) entire city of Nanjing; (**b**) core city of Nanjing.
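As a rough illustration of two of the measures discussed in this subsection, the sketch below computes weighted in-degree and eigenvector centrality for a small directed flow network in plain Python. The edge weights are invented, the function names are hypothetical, and the power iteration with an added identity term is one common way to obtain the leading eigenvector, not necessarily the routine used by the authors' software. Betweenness centrality is left out for brevity.

```python
def in_degree(edges):
    """Weighted in-degree: total flow arriving at each node."""
    nodes = {n for u, v in edges for n in (u, v)}
    score = {n: 0.0 for n in nodes}
    for (u, v), w in edges.items():
        score[v] += w
    return score

def eigenvector_centrality(edges, iterations=200, tol=1e-9):
    """Power iteration over incoming flows: a node scores highly
    when the nodes sending flow to it score highly."""
    nodes = sorted({n for u, v in edges for n in (u, v)})
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Starting from x itself iterates on (A^T + I), which has the same
        # eigenvectors as A^T but avoids oscillation on bipartite-like graphs.
        new = dict(x)
        for (u, v), w in edges.items():
            new[v] += w * x[u]
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {n: val / norm for n, val in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x

# Hypothetical flows (origin, destination) -> volume; names are from the text.
edges = {
    ("Zhongshan Scenic Area", "Xuanwu Lake"): 5,
    ("Xuanwu Lake", "Zhongshan Scenic Area"): 4,
    ("Confucius Temple", "Zhongshan Scenic Area"): 3,
    ("Zhongshan Scenic Area", "Confucius Temple"): 3,
    ("Xuanwu Lake", "Fangshan Scenic Area"): 1,
}
popularity = in_degree(edges)
latent = eigenvector_centrality(edges)
```

In this toy network the suburban node receives a small score on both measures, and its eigenvector score exists only because it is fed by a well-connected core node.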

#### 3.2.2. Evaluation of Node Status Based on Node Centrality

A correlation exists between all of the types of tourism node centrality. Therefore, these three types of node centrality were arranged and combined to create a more comprehensive status evaluation system. Figure 8 depicts a coordinate system of differences in centrality combinations. In this system, axes X, Y, and Z represent in-degree centrality, eigenvector centrality, and betweenness centrality, respectively, and each is divided into high, medium, and low levels. If a node had a combination of high and medium levels, or medium and low levels, the range was one. However, if it consisted of all three levels, or only high and low levels, the range was two. When all three centrality type levels were identical, there was no gradation. When all three centrality types were low, the node was an edge node in the tourism network. When all three were at the medium level, the node was still in balanced development. When all three were high, the node was a popular core node. This coordinate system arranged and combined the three different centrality type levels (high, medium, and low) to highlight tourism significance.
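The range rule described above can be written as a short Python sketch. The numeric encoding of the levels (low = 0, medium = 1, high = 2) and the function name `node_status` are assumptions made for illustration; the labels for identical levels follow the text.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}

UNIFORM_LABELS = {
    0: "edge node",
    1: "balanced development",
    2: "popular core node",
}

def node_status(in_degree, eigenvector, betweenness):
    """Return (range, label) for a combination of three centrality levels.

    The range is the gap between the highest and lowest level; a label is
    given only when all three levels are identical (range 0).
    """
    ranks = [LEVELS[in_degree], LEVELS[eigenvector], LEVELS[betweenness]]
    level_range = max(ranks) - min(ranks)
    label = UNIFORM_LABELS[ranks[0]] if level_range == 0 else None
    return level_range, label

examples = {
    "high/medium/medium": node_status("high", "medium", "medium"),
    "high/medium/low": node_status("high", "medium", "low"),
    "high/low/low": node_status("high", "low", "low"),
    "low/low/low": node_status("low", "low", "low"),
}
```

With three levels on three axes there are 27 possible combinations, so a lookup of this kind is enough to place every node in the coordinate system.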

All of the nodes populated the coordinate system according to the three centrality combination types depicted in the node status evaluation system (see Figure 9). Three nodes with low levels of eigenvector centrality were excluded from the diagram to improve readability. In the overall network, the more common combinations were: (1) medium in-degree, medium eigenvector, and low betweenness: nodes had a certain popularity, as did the adjacent points, although it is very likely that they were the tourist's starting and destination points; (2) high in-degree, medium eigenvector, and low betweenness: nodes were popular, and the adjacent points also had a certain level of popularity, being situated in the core position of non-core communities; (3) high in-degree, high eigenvector, and medium betweenness: nodes and adjacent points were popular, situated at the "bridging" position between edge nodes and the core district, or between small communities.

**Figure 8.** Node status evaluation system for tourism flow networks.

**Figure 9.** Evaluation of the node status in the overall tourism flow network.

This study further compared the resident and non-local tourist networks (see Figure 10). The three most common combinations in the resident network, in rank order, were: (1) medium in-degree, medium eigenvector, and low betweenness; (2) medium in-degree, high eigenvector, and medium betweenness; (3) high in-degree, medium eigenvector, and low betweenness. This showed "high in-degree centrality, generally moderate eigenvector centrality, and seriously polarized betweenness centrality". The results indicate that the popular nodes that were approved by residents were more diversified. Some of the popular nodes showed close connections and balanced development, while others existed independently. The three most common combinations in the non-local tourist network were: (1) medium in-degree, medium eigenvector, and low betweenness; (2) low in-degree, medium eigenvector, and low betweenness; (3) medium in-degree, medium eigenvector, and medium betweenness, manifesting the characteristic of "overall low in-degree centrality, higher eigenvector centrality, and seriously polarized betweenness centrality". Compared to those visited by residents, the popular nodes that were approved by non-local tourists were relatively concentrated and stable, and mostly in a state of moderate popularity or marginalization.

**Figure 10.** Comparison of the node status evaluation in resident networks and non-local tourist networks: (**a**) resident networks; (**b**) non-local tourist networks.

#### *3.3. Analysis of Tourism Flow Network Communities*

#### 3.3.1. Cohesive Subgroup Analysis of the Tourism Flow Network

The CONCOR method in the UCINET software was used to perform a cohesive subgroup analysis. We calculated the correlation coefficients of each row (or column) in the matrix, with the final results shown in Table 3. The higher the numerical value of the density matrix, the closer the subgroup connection.

The results of the cohesive subgroup analysis revealed the substructures within the tourism flows, and showed more tourist route combinations. In the cohesive subgroup density matrix of the overall tourism flow network in Nanjing city (see Table 3), eight substructures formed, each with varying closeness of flow connections between the subgroups. Subgroups 5, 6, and 8 showed the most frequent interactions between internal node members. Subgroup 7 showed a higher frequency in the internal interaction between its subgroup nodes and their external connections. In this light, combination marketing, or combined tourism tickets, may appeal to these two subgroups. The nodes in subgroups 1 and 3 were mainly dependent on their connections with other subgroups. These node types may be treated as additional products to the above subgroups. Subgroups 2 and 4 were relatively independent, with comparatively weak internal and external connections, and can be branded separately as tourism products with unique features.
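To make the notion of a subgroup density matrix concrete, the following Python sketch computes block densities for a binary directed network once the subgroups are known. It assumes the partition has already been produced (for example by CONCOR) and uses a four-node toy matrix; the function name `density_matrix` is hypothetical, and the exact density definition applied by UCINET to valued data may differ.

```python
def density_matrix(adjacency, groups):
    """Share of possible directed ties that are present within and between groups.

    adjacency: square 0/1 matrix as a list of lists.
    groups: list of disjoint lists of node indices.
    """
    k = len(groups)
    result = [[0.0] * k for _ in range(k)]
    for a, rows in enumerate(groups):
        for b, cols in enumerate(groups):
            ties = sum(adjacency[i][j] for i in rows for j in cols if i != j)
            # Self-ties are excluded, so a diagonal block has n * (n - 1) slots.
            possible = len(rows) * len(cols) - (len(rows) if a == b else 0)
            result[a][b] = ties / possible if possible else 0.0
    return result

# Toy network: nodes 0-1 form one subgroup, nodes 2-3 another.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
densities = density_matrix(adjacency, [[0, 1], [2, 3]])
```

The diagonal entries describe how tightly each subgroup is connected internally, and the off-diagonal entries describe flows between subgroups, which is how the values in Table 3 are read in the text.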


**Table 3.** Density matrix of the cohesive subgroups in the tourism flow network.

\* Note: R2 = 0.159, AVG = 0.298. 1: Bailuzhou Park, Nanshan Lake Tourist Resort, Jiangjun Mountain, Pingshan Forest Park; 2: Wulongtan Park, Great Bao'en Tower (Porcelain Tower), Meiyuan Xincun Memorial Hall, Laoshan National Forest Park, Tangshan Scenic Area, Fangshan Scenic Area, Jinniu Lake Scenic Area, Stone City Park, Gaochun International Cittaslow Tranquil; 3: Yuhuatai Gongde Park, Memorial Hall of the Victims in Nanjing Massacre, Yuhuatai, Chrysanthemum Park; 4: Tiansheng Bridge Scenic Area, Niushoushan Forestal Park, Nanjing Yangtze River Bridge, Ginkgo Lake Eco-tourism Resort Leisure; 5: Changjiang Guanyin Scenic Area, Hongshan Forest Zoo, Great Bridge Park, Beigushan, Beiji Ge Park (Jiming Temple), Yuejiang Tower Scenic Area, Xuanwu Lake, Zhongshan Scenic Area; 6: Sipailou Campus of Southeast University, Yihe Road Republican Architectural Complex, Zheng He Treasure Ship Park, Nanjing University Gulou District, Qingliangshan Park; 7: Dajinshan Scenic Area, Confucius Temple, Chaotian Palace Scenic Area, Qixiashan Scenic Area, Old East Gate, Nanjing Museum, Presidential Palace (1912 District); 8: Mochou Lake, Nanhu Lake Park.

Based on the analysis of the eight cohesive subgroups in the overall tourism flow network, these subgroups fall into four classifications: (1) **endogenous agglomeration type**: AVG1 (degree of internal connection) > AVG, and AVG2 (degree of external connection) ≤ AVG, manifesting as the subgroups converging inward, with powerful connections between internal individual members, and weak connections outside of the subgroups; (2) **internal-external balance type**: AVG1 (degree of internal connection) > AVG, and AVG2 (degree of external connection) > AVG, manifesting as the subgroups having strong internal and external connections; (3) **externally attached type**: AVG1 (degree of internal connection) ≤ AVG, and AVG2 (degree of external connection) > AVG, manifesting as weak connections between members inside the subgroup, although forming a strong connection with one or several external subgroups; (4) **individual independence type**: AVG1 (degree of internal connection) ≤ AVG, and AVG2 (degree of external connection) ≤ AVG, with members within subgroups having relatively weak internal and external connections, and failing to form obvious network connections with other subgroups and individuals (or forming a connection with only individual subgroups).
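The four-way classification above reduces to two threshold comparisons, sketched here in Python. The threshold 0.298 is the AVG value reported in the note to Table 3; the internal and external densities passed in the examples are invented, and the function name `classify_subgroup` is hypothetical.

```python
def classify_subgroup(internal, external, avg):
    """Classify a subgroup from its internal (AVG1) and external (AVG2)
    connection degrees relative to the network average (AVG)."""
    if internal > avg and external <= avg:
        return "endogenous agglomeration"
    if internal > avg and external > avg:
        return "internal-external balance"
    if internal <= avg and external > avg:
        return "externally attached"
    return "individual independence"

AVG = 0.298  # average density reported in the note to Table 3

# Hypothetical (AVG1, AVG2) pairs, for illustration only.
labels = [
    classify_subgroup(0.60, 0.10, AVG),
    classify_subgroup(0.55, 0.40, AVG),
    classify_subgroup(0.20, 0.45, AVG),
    classify_subgroup(0.10, 0.05, AVG),
]
```

Values exactly equal to AVG fall on the "weak" side of each comparison, following the ≤ signs in the definitions.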

Likewise, the local and non-local tourism flow networks could be classified according to this criterion, which produced our results for the analysis of the tourism flow networks of the three types of subgroup (overall, local, non-local) (see Table 4, Figure 11).

Overall, in Nanjing city, all of the tourism nodes with a high visit volume (tourism brand image) were within the endogenous agglomeration and internal-external balance types, thereby becoming the absolute cores of these two kinds of subgroups. Spatially, they connected to several adjacent nodes, with well-defined edges to the subgroups, and all were within the core city of Nanjing. Furthermore, some nodes that were on the periphery of the core city existed by attaching to these two kinds of subgroups. There were also some relatively independent, small-scale node groups, with only a single flow path. These results align with the study on the all-for-one tourism policy of cities.


**Table 4.** Cohesive subgroups of tourism flow networks under three types of network context.



From the residents' point of view, more nodes were classified as endogenous agglomeration and internal-external balance types. These form an even greater agglomeration network in the tourism flow, breaking down the spatial boundary of the core city, and absorbing more suburban nodes. Furthermore, the subgroups of externally attached and individual independence types had relatively few nodes, and showed the tendency of gradual absorption by the subgroups with a high agglomerative nature.

From the perspective of non-local tourists, no subgroup emerged from the endogenous agglomeration type. Additionally, almost all of the subgroups of the internal-external balance type were within the core city. Such subgroups were formed by core nodes and a few adjacent nodes, and were, therefore, subject to spatial limitations. More than half of the nodes were situated within the subgroups of the externally attached and individual independence types, indicating that only the core nodes in tourism flow network structures performed prominently, whereas the overall connection was insufficient and fragmented.

**Figure 11.** Cohesive subgroups' spatial distribution of tourism flow networks under three types of network context (overall, resident, and non-local tourist): (**a**) overall cohesive subgroup of entire city; (**b**) resident cohesive subgroup of entire city; (**c**) non-local cohesive subgroup of entire city; (**d**) overall cohesive subgroup of core city; (**e**) resident cohesive subgroup of core city; (**f**) non-local cohesive subgroup of core city.

#### 3.3.2. "Core-Edge" Model

Based on the binarization results of the communities in the tourism flow networks that were extracted using the "core-edge" model of UCINET, the results of the overall tourism flow in Nanjing indicate the following (see Table 5, Figure 12). The core district members included four nodes, namely Confucius Temple, Presidential Palace (1912 District), Xuanwu Lake, and Zhongshan Scenic Area. The remaining 39 nodes were at the district's edge. Regarding the correlation degree, core district members reached 0.78, while district edge members were only 0.08, indicating obvious structural stratification in the tourism network in Nanjing. Furthermore, the correlation degree between core members and edge members reached 0.29, indicating that the connection between the core and the district edge was also relatively close. Regarding spatial distribution, the core district nodes were all located within the core city. Spatially, a diminished connection density was observed between the district edge and the core district nodes, from the core to the periphery. Therefore, aside from actively developing the core tourism districts, it is necessary to simultaneously enhance the overall tourism competitiveness of Nanjing city through positive cultivation, systematic and active expansion, and linkage to edge tourism districts.

By comparing the "core-edge" structures that were formed by the resident and non-local tourist flow networks (see Table 5), this study found that the respective core district members had slightly different compositions. The core district that was formed by resident tourism flow included Zhongshan Scenic Area, Xuanwu Lake, Beiji Ge Park (Jiming Temple), Confucius Temple, and Hongshan Forest Zoo. The non-local tourist core district included Zhongshan Scenic Area, Confucius Temple, Xuanwu Lake, and Presidential Palace (1912 District). These findings show that the approval rate of Beiji Ge Park (Jiming Temple) and Hongshan Forest Zoo was only high within Nanjing, while the Presidential Palace (1912 District) was generally a check-in spot for non-local tourists. In terms of correlation degree, the core districts that were formed by the non-local tourist flow had a higher degree of internal correlation than those that were formed by residents, although residents had a higher degree of internal correlation for the edge districts, and between the core members and edge members. This shows that the node combinations for non-local tourists were relatively stable, and their polarization was more serious. On the other hand, resident tourism routes have begun to develop in a diversified and individualized direction, with some niche scenic spots beginning to enter their horizons. In terms of spatial distribution, the core districts that were formed by non-local tourists in the tourism flow network showed a concentrated and contiguous distribution. Meanwhile, residents have just begun to break through the spatial contiguity, manifesting a spatial form of separation between the two groups. Therefore, the differences between the tourist groups should be considered when a tourism development strategy is formulated for differentiation, individualization, and diversification.


**Table 5.** [Table body not recovered; surviving cell fragments: "Zoo", "(a total of 4, correlation = 0.782)", "Palace (1912 District)".]


**Figure 12.** "Core-edge" model spatial distribution under three types of network context (overall, resident, and non-local tourist): (**a**) overall "core-edge" model of core city; (**b**) resident "core-edge" model of core city; (**c**) non-local "core-edge" model of core city.

#### **4. Discussion**

This study summarized the structural characteristics of the tourism flow network of 43 scenic spots in Nanjing city from three aspects: tourism flow network connection, node centrality, and communities. A comparative analysis revealed the tourism flow network structures that were formed by resident and non-local tourist behavior. The results are presented in Table 6.


**Table 6.** Summary of tourism flow network structures under three context types (overall, resident, and non-local tourist).

The tourism network structure in Nanjing city exhibits a single center (the core city), core-edge differentiation, and spatial agglomeration and diffusion. Furthermore, the resident and non-local tourist networks differ in their characteristics. What explains the formation of such a tourism flow network structure in Nanjing? Generally speaking, at the urban scale, traffic flow is the main carrier of tourism flow, so traffic accessibility and travel mode are likely the most direct influencing factors. Nanjing's road and rail networks, for example, radiate outward from the central city; the road network is much denser inside the core city than in other areas, and this good accessibility attracts tourists from all parts of the city. This directly produced the single-center tourism flow network structure located in the core city. At the same time, because public transportation to scenic spots in the outer suburbs is poor, travel by road has become the main means of arrival. Local residents are better able to use this means of travel, so the tourism flow network of local residents covers virtually the whole city, while non-local tourists are mostly concentrated in the inner part of the central city.

Moreover, the tourism flow network structure is shaped by the combined influence of scenic spot characteristics and tourist characteristics. Scenic spot characteristics (in addition to traffic factors) concretely involve the locations of scenic spots, grade, type differences (historical-cultural, natural landscape, indoors versus outdoors), and popularity (Internet heat). Tourist characteristics include tourist source locations (local and non-local), residence, travel time (travel season, length of commute), travel motives, knowledge of and affection for destinations, information sources, revisit rate, and travel modes.

In light of the aforementioned conclusion, this study suggests that we should grasp the characteristics of the overall structure and nodes of the Nanjing city tourism flow network, and develop the entire region in unison. Efforts could include the following: (1) bringing nodes to maturity to form core tourism subdistricts; (2) energizing developing nodes; (3) devising specialized and themed tourism routes; (4) developing individualized ways of traveling for relatively independent nodes; (5) focusing on group differences among tourists and precisely positioning the audience groups for scenic spots; (6) executing accurate publicity.

First, the network structure obtained in this study represents tourism flow through the movement of people. It therefore cannot reflect all of the characteristics of the spatial structure, and different conclusions may be drawn by using other data, such as information flow, logistics, and traffic flow. Second, the research uses mobile phone signaling data as the basic data, which is less authoritative than traditional official statistics; however, the characteristics of tourism flow networks are difficult to capture with traditional data analysis. Although the data and their processing introduce errors, the identification results were tested to ensure the randomness of the sampling, and can reflect the overall characteristics and rules. In addition, due to the length limit, this study focused on grasping the overall structural characteristics of the tourism flow network in Nanjing city and does not elaborate on the strategies used. Future studies should discuss methods to optimize tourism flow networks in depth.

**Author Contributions:** Conceptualization, Lingjin Wang; methodology, Lingjin Wang; software, Lingjin Wang and Yan He; validation, Lingjin Wang and Xiao Wu; formal analysis, Lingjin Wang; investigation, Lingjin Wang; resources, Lingjin Wang; data curation, Yan He; writing—original draft preparation, Lingjin Wang; writing—review and editing, Lingjin Wang; visualization, Lingjin Wang; supervision, Xiao Wu; project administration, Xiao Wu; funding acquisition, Xiao Wu. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (51878142), National Key Research and Development Program of China (2019YFD1100800), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX20\_0143) and the Fundamental Research Funds for the Central Universities (3207032101D).

**Data Availability Statement:** Restrictions apply to the availability of these data. Data were obtained from China Mobile Communications Group Co., Ltd., and are available from Lingjin Wang, with the permission of China Mobile Communications Group Co., Ltd.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Socioeconomic and Environmental Impacts on Regional Tourism across Chinese Cities: A Spatiotemporal Heterogeneous Perspective**

**Xu Zhang <sup>1,†</sup>, Chao Song <sup>2,3,4,†</sup>, Chengwu Wang <sup>1,\*</sup>, Yili Yang <sup>4</sup>, Zhoupeng Ren <sup>2</sup>, Mingyu Xie <sup>4</sup>, Zhangying Tang <sup>1</sup> and Honghu Tang <sup>1</sup>**


**Abstract:** Understanding geospatial impacts of multi-sourced drivers on the tourism industry is of great significance for formulating tourism development policies tailored to regional-specific needs. To date, no research in China has explored the combined impacts of socioeconomic and environmental drivers on city-level tourism from a spatiotemporal heterogeneous perspective. We collected the total tourism revenue indicator and 30 potential influencing factors from 343 cities across China during 2008–2017. Three mainstream regressions and an emerging local spatiotemporal regression named the Bayesian spatiotemporally varying coefficients (Bayesian STVC) model were constructed to investigate the global-scale stationary and local-scale spatiotemporal nonstationary relationships between city-level tourism and various vital drivers. The Bayesian STVC model achieved the best model performance. Globally, eight socioeconomic and environmental factors, average wage (coefficient: 0.47, 95% credible intervals: 0.43–0.51), employed population (−0.14, −0.17–−0.11), GDP per capita (0.47, 0.42–0.52), population density (0.21, 0.16–0.27), night-time light index (−0.01, −0.08–0.05), slope (0.10, 0.06–0.14), vegetation index (0.66, 0.63–0.70), and road network density (0.34, 0.29–0.38), were identified to have nonlinear effects on tourism. Temporally, the main drivers might have gradually changed from the local macro-economic level, population density, and natural environment conditions to the individual economic level over the last decade. Spatially, city-specific dynamic maps of tourism development and geographically clustered influencing maps for eight drivers were produced. In 2017, China formed four significant city-level tourism industry clusters (hot spots, 90% confidence), the locations of which coincide with China's top four urban agglomerations. 
Our local spatiotemporal analysis framework for geographical tourism data is expected to provide insights into adjusting regional measures to local conditions and temporal variations in broader social and natural sciences.

**Keywords:** Chinese regional tourism; socioeconomic and environmental drivers; spatiotemporal influencing factors; spatiotemporal estimation mapping; Bayesian STVC model; spatiotemporal nonstationary regression; geographical data modeling analysis

**Citation:** Zhang, X.; Song, C.; Wang, C.; Yang, Y.; Ren, Z.; Xie, M.; Tang, Z.; Tang, H. Socioeconomic and Environmental Impacts on Regional Tourism across Chinese Cities: A Spatiotemporal Heterogeneous Perspective. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 410. https://doi.org/10.3390/ijgi10060410

Academic Editors: Andrea Marchetti and Angelica Lo Duca

Received: 29 April 2021 Accepted: 10 June 2021 Published: 14 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Tourism is an underlying industry that promotes the development of the global economy [1]. According to the World Travel & Tourism Council (WTTC), tourism contributed 10.3% (8.9 trillion US dollars) of global GDP and provided one-tenth of the total number of jobs (330 million positions) in 2019 before the pandemic [2]. Through developing the tourism industry, local governments can markedly improve the level of infrastructure construction, increase employment opportunities, improve people's living conditions, and promote urban economic growth [3–5]. In addition, tourism development is a fundamental part of a sustainable development strategy, which is recognized as a green industry by the world due to its low energy consumption and light pollution characteristics in the development process [6].

Despite being one essential force promoting regional economy, regional tourism itself is greatly influenced by socioeconomic status [7,8], including GDP [8], employment status [9], personal income [10], health and hygiene [7], industrial production index [11] and social media [12]. Besides the socioeconomic condition, research also identified the notable role of the environment in affecting regional tourism [13–15], especially climatic conditions, such as temperature [16], precipitation [17], sunshine [18], and relative humidity [19]. Road infrastructure was also a critical environmental driver enhancing the tourism industry [20,21]. However, all these previous studies only adopted a limited number of factors. It is necessary to consider the comprehensive impacts on tourism by combining socioeconomic conditions with environmental conditions.

When investigating relationships between regional tourism and potential explanatory factors, an unrealistic assumption persistently embedded in previous literature was that the variables' relationships were homogeneous, which had been defined as stationarity. For instance, non-spatial tourism studies using qualitative analysis [22], feasible generalized least square (FGLS) regression [19], linear and quantile regression [23], or logit regression [24] are regarded as global-scale analyses and also ignore the existence of spatial effects. Likewise, some geospatial tourism studies using the spatial regression models, such as the exploratory spatial data analysis (ESDA) [25] or spatial econometric models [26], are capable of incorporating spatial effects for intercept or residual but are still unable to estimate a set of space-scale coefficients to characterize the varying region-specific relationships between variables. Hence, a more reasonable assumption in the real world highlights the heterogeneous or varying impacts of explanatory variables on tourism development due to region-specific situations, especially for studies conducted across large domains at finer geospatial scales. Such spatially heterogeneous variables relationships are called spatial nonstationarity in the field of statistics. At present, the geographically weighted regression (GWR) [27] is frequently used in tourism research, aiming at exploring such spatial nonstationarity between tourism and various influencing factors [28,29]. However, to the best of our knowledge, no study has been conducted from the spatiotemporal integrated nonstationary perspective, to fully explore both socioeconomic and environmental drivers on regional tourism development.

In China, the area of interest in this study, there has long been an issue of regional tourism development disparities [30], which obstructed regional tourism sustainability to some extent [31]. Although these geospatial disparities have been extensively discussed at the provincial scale [32] or city-group scale [33], few studies have explored the city-specific disparities of regional tourism, especially over mainland China. Based on tourism connotations and tourism elements, Chinese scholars have established a comprehensive indicator framework of factors influencing the urban tourism industry from multiple dimensions. Socioeconomic and environmental aspects are also considered indispensable indicators reflective of regional tourism industry development [34]. However, no existing study has investigated the joint impacts of socioeconomic and environmental conditions on China's city-level tourism from a spatiotemporal heterogeneous perspective, to provide evidence-based implications for assisting the formulation of tourism-related policies at governmental levels in a timely and effective manner.

In an attempt to find effective factors affecting regional tourism outcomes to provide tourism strategies tailored for specific local spatial conditions and changing temporal circumstances, we constructed an explanatory variable framework composed of 30 variables, including socioeconomic and environmental conditions. We explored spatiotemporal heterogeneous relationships between the regional tourism development and the multi-source explanatory factors from 2008 to 2017 across Chinese cities by employing the Bayesian spatiotemporally varying coefficients (STVC) model [35,36]. The establishment of such an explanatory variable framework in our study also served as a contributor to the current literature in this field in terms of improving the comprehensiveness of the existing research index system as well as adding novel perspectives into this field based on the consideration of both spatial and temporal heterogeneity.

#### **2. Materials and Methods**

#### *2.1. Study Area and Data*

Considering the unbalanced development speed and regional differences in China's tourism industry during the last decade, in this study, 343 prefecture-level areas were selected as the underlying research units (excluding Hong Kong, Macao, and Taiwan). Total tourism revenue was employed as a proxy variable to describe the regional tourism development level from 2008 to 2017 [30]. Figure 1 illustrates the original geographical distribution of city-level total tourism revenue across China in 2017.

**Figure 1.** Geographical distribution of the original city-level total tourism revenue across China in 2017.

Correspondingly, we collected a relatively comprehensive system of 30 explanatory variables at the city level, including 21 socioeconomic factors and nine environmental variables (summarized in Table 1), to detect their impacts on total tourism revenue in China. The total tourism revenue and socioeconomic data were retrieved from the China City Statistical Yearbook and Statistical Bulletin. The climate data (EV1-EV4) were collected from the National Meteorological Information Center (http://data.cma.cn/, accessed on 28 April 2021). The other environmental factors (EV5-EV8) were downloaded from the Resource and Environment Science and Data Center (http://www.resdc.cn/, accessed on 28 April 2021). As a list of environmental variables, including elevation, road network density, slope, and nighttime light index, were not temporally continuous data, these variables were only added as a part of the local-scaled modeling for spatial nonstationary analysis. Other socioeconomic and environmental factors had spatiotemporal variation characteristics, satisfying the hypothesis of spatiotemporal nonstationarity.

**Table 1.** City-level potential explanatory variables of regional tourism in China: SV1-21 denote twenty-one socioeconomic factors, and EV1-9 denote nine environmental factors.


#### *2.2. Statistical Methods*

#### 2.2.1. Variable Selection

Two widely adopted approaches, namely multicollinearity assessment and random forest [37], were employed in a progressive manner as a screening step for identifying the most representative influencing factors on the tourism industry from 30 candidate variables. Precisely, the indicator variance inflation factor (VIF) was first adopted to measure the multicollinearity effect, referring to a correlation between explanatory factors [38]. Commonly, VIF < 10, representing mild and negligible multicollinearity, is adopted as the threshold to screen variables [39]. Here, given the adequacy of candidate variables involved in this analysis, a stricter standard was adopted, indicating that a candidate variable with VIF > 5 was removed. Following the VIF step, random forest, an integrated machine learning approach relying on the decision tree, was adopted for further screening the explanatory variables according to the calculation of an indicator named mean decrease impurity (MDI), which has been commonly used for reflecting the ranking of a factor's relative importance [40]. For a candidate variable, a higher value of MDI is associated with the increased importance of the variable. This random forest step is typically empirical and data-driven, as MDI is not a relative measure [41].
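The VIF screening step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the one-shot (non-iterative) removal rule, and the toy variables `a`, `b`, and `c` are assumptions made here for illustration. The VIF of a variable is computed as 1 / (1 − R²), where R² comes from regressing that variable on all the others.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

def screen_by_vif(X, names, threshold=5.0):
    """Keep variables whose VIF does not exceed the threshold (one-shot rule, assumed)."""
    vifs = variance_inflation_factors(X)
    return [name for name, v in zip(names, vifs) if v <= threshold]

# Synthetic example (hypothetical data): c is almost a copy of a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
X_demo = np.column_stack([a, b, c])
vifs_demo = variance_inflation_factors(X_demo)
kept = screen_by_vif(X_demo, ["a", "b", "c"], threshold=5.0)
```

In this toy case only `b` survives, because `a` and `c` explain each other almost perfectly. The second step (ranking the surviving variables by random forest MDI) would follow on the retained columns.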

#### 2.2.2. Bayesian STVC Model

The Bayesian spatiotemporally varying coefficients (STVC) model is a recently burgeoning local spatiotemporal regression developed under the Bayesian hierarchical modeling (BHM) framework. It is mainly designed to quantitatively characterize structured and heterogeneous spatiotemporal impacts (expressed as local-scale coefficients) of different covariates on the outcomes of the variable of interest, that is, to explore the spatiotemporal nonstationarity inherent in geospatial research phenomena [35,36].

For China's tourism case, *Yit* denotes the space–time monitoring data of the total tourism revenue indicator, in which *i* = 1, ... , *I* (*I* = 343) are the administrative geographical units of the cities. For each city, data are available for a ten-year period from 2008 to 2017, labeled as *t* = 1, ... , *T* (*T* = 10). Then, the structured additive predictor *ζit* = *g*(*Yit*) within a reduced Bayesian STVC model is formulated in Equations (1)–(3), i.e.,

$$\zeta_{it} = g(Y_{it}) = \eta + \sum_{k=1}^{K} f_{\text{space}}(\omega_{ik} \mathit{SX}_{itk}) + \sum_{m=1}^{M} f_{\text{time}}(\varphi_{tm} \mathit{TX}_{itm}),\tag{1}$$

$$\omega_{i} \mid \omega_{-i}, \tau_{\omega}, W \sim N\left(\frac{\sum_{j=1}^{I} w_{ij}\omega_{j}}{\sum_{j=1}^{I} w_{ij}}, \frac{1}{\tau_{\omega}\sum_{j=1}^{I} w_{ij}}\right), \quad i = 1, \dots, I,\tag{2}$$

$$\varphi_{t+1} - \varphi_{t} \mid \tau_{\varphi} \sim N\left(0, \frac{1}{\tau_{\varphi}}\right), \quad t = 1, \dots, T - 1, \text{ or}$$

$$\varphi_{t} - 2\varphi_{t+1} + \varphi_{t+2} \mid \tau_{\varphi} \sim N\left(0, \frac{1}{\tau_{\varphi}}\right), \quad t = 1, \dots, T - 2.\tag{3}$$

In Equation (1), *g*(·) denotes a log-Gaussian likelihood function for this case to link *Yit* and *ζit*, and *η* denotes the fixed-effect intercept. *SX* signifies the *K* main covariates under the spatial nonstationary assumption, and *TX* represents the *M* main covariates that are assumed to be temporally nonstationary. The parameters *ωik* are termed space-coefficients (SCs) and *ϕtm* time-coefficients (TCs); they are the two fundamental outputs of the STVC model. *fspace*(·) and *ftime*(·) signify the spatial and temporal latent Gaussian models (LGMs) that are used for fitting the random effects of spatial and temporal nonstationarity to estimate the local parameters SCs and TCs [42,43].
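To make the structure of Equation (1) concrete, the sketch below evaluates the additive predictor once SCs and TCs are available. It assumes that each term inside *fspace*(·) and *ftime*(·) contributes as a coefficient–covariate product; the function name, array layout, and toy numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def stvc_predictor(eta, omega, SX, phi, TX):
    """Evaluate zeta[i, t] = eta + sum_k omega[i, k] * SX[i, t, k]
                                 + sum_m phi[t, m]   * TX[i, t, m].

    omega: (I, K) space-coefficients, one row per spatial unit
    SX:    (I, T, K) covariates assumed spatially nonstationary
    phi:   (T, M) time-coefficients, one row per time step
    TX:    (I, T, M) covariates assumed temporally nonstationary
    """
    spatial = np.einsum("ik,itk->it", omega, SX)
    temporal = np.einsum("tm,itm->it", phi, TX)
    return eta + spatial + temporal

# Toy example: I = 3 units, T = 2 time steps, K = 2, M = 1 (hypothetical values).
omega_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
phi_demo = np.array([[0.5], [-0.5]])
SX_demo = np.ones((3, 2, 2))
TX_demo = np.ones((3, 2, 1))
zeta_demo = stvc_predictor(2.0, omega_demo, SX_demo, phi_demo, TX_demo)
```

The point of the layout is that `omega` varies only with the spatial index and `phi` only with the temporal index, which is what distinguishes the STVC predictor from a global regression with a single coefficient per covariate.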

In Equation (2), the intrinsic conditional autoregressive (iCAR) prior is adopted as the spatial LGM *fspace*(·) to fit the spatial autocorrelation characteristics, also called the spatially structured random effects within a BHM [44]. Here, *ω*<sub>−*i*</sub> denotes every spatial unit in *ω* apart from the *i*-th spatial unit; *W* = (*wij*) represents the spatial relation matrix, in which *wij* = 1 if spatial units *i* and *j* are neighbors (defined here by spatial adjacency) and *wij* = 0 otherwise; and *τω* is the precision parameter [45].
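As a small worked illustration of Equation (2), the sketch below computes the conditional mean and variance of one unit's coefficient given its neighbors under a binary adjacency matrix. The function name and the four-unit chain are hypothetical; only the formula follows Equation (2).

```python
import numpy as np

def icar_conditional(i, omega, W, tau):
    """Conditional mean and variance of omega[i] given all other units (iCAR prior).

    W is a symmetric 0/1 adjacency matrix with a zero diagonal; tau is the precision.
    """
    w_i = np.asarray(W, dtype=float)[i]
    n_i = w_i.sum()                      # number of neighbours of unit i
    if n_i == 0:
        raise ValueError("unit has no neighbours; the conditional is undefined")
    mean = (w_i @ np.asarray(omega, dtype=float)) / n_i   # neighbour average
    var = 1.0 / (tau * n_i)              # more neighbours -> smaller variance
    return mean, var

# Four units arranged in a chain: 0 - 1 - 2 - 3 (hypothetical layout).
W_demo = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
omega_vals = np.array([0.2, 0.4, 0.9, 0.5])
mean_1, var_1 = icar_conditional(1, omega_vals, W_demo, tau=2.0)
```

For unit 1 the conditional mean is the average of its two neighbors, (0.2 + 0.9) / 2 = 0.55, and the variance is 1 / (2 × 2) = 0.25, which shows how the prior pulls each coefficient toward its neighbors.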

In Equation (3), the prior random walk (RW) model is used as the temporal LGM *ftime*(·) to estimate the temporal autocorrelation characteristics of TCs, where the structured temporal random effect of covariates *ϕ* can be a random walk of order one or two, with *τϕ* being the precision parameter [46]. The prior RW model of order two is more suitable for the research object with a clear linear time trend, compared with the prior RW model of order one.
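The difference between the two random walk priors in Equation (3) can be seen by looking at the increments they penalize. The following sketch is illustrative only; the helper names and the toy linear sequence are assumptions. For a perfectly linear trend, the first-order increments are constant but nonzero, whereas the second-order increments vanish, so the order-two prior does not penalize the trend.

```python
import numpy as np

def rw_increments(phi, order=1):
    """First differences (order 1) or second differences (order 2) of phi."""
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    return np.diff(np.asarray(phi, dtype=float), n=order)

def rw_log_density(phi, tau, order=1):
    """Log density of the increments under independent N(0, 1/tau) terms."""
    d = rw_increments(phi, order)
    return 0.5 * d.size * np.log(tau / (2.0 * np.pi)) - 0.5 * tau * np.sum(d ** 2)

# A purely linear time-coefficient sequence over T = 10 years (hypothetical).
phi_linear = 0.1 * np.arange(10)
d1 = rw_increments(phi_linear, order=1)   # T - 1 increments, all about 0.1
d2 = rw_increments(phi_linear, order=2)   # T - 2 increments, all about 0
lp1 = rw_log_density(phi_linear, tau=100.0, order=1)
lp2 = rw_log_density(phi_linear, tau=100.0, order=2)
```

With the same precision, the linear sequence receives a higher log density under the order-two prior than under the order-one prior, which is consistent with the statement that RW2 suits series with a clear linear time trend.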

#### 2.2.3. Model Implementation and Comparison

To explore both the global homogeneous and local heterogeneous impacts of socioeconomic and environmental factors on city-specific outcomes of China's tourism, we implemented four types of Bayesian regressions, the multiple linear regression (MLR, model 1), the ordinary generalized additive model (GAM, model 2), the global spatiotemporal regression (model 3), and the Bayesian STVC model (model 4), which belongs to the local spatiotemporal regression family. We chose these models based on the following considerations. First, model 1 and model 2 were traditional mainstream models. We used them to fit the overall linear and nonlinear impacts of covariates on tourism [47]. Then, we used a widely applied spatiotemporal regression (model 3), which mainly served as a spatiotemporal descriptive tool, to depict the original smoothed spatial variations and temporal trends of tourism in China [42]. However, models 1–3 are regarded as the global-based type of regression, meaning that covariate impacts (coefficients) were homogeneous across space and over time [35]. Given this underlying limitation of stationarity, model 4 was finally employed to explore the structured heterogeneous (varying) impacts of covariates at both space and time scales [36].

To be specific, the equation of an MLR (model 1) is given by

$$\zeta_{it} = g(Y_{it}) = \eta + \sum_{k=1}^{K} \chi_{k} X_{itk},\tag{4}$$

where *χ<sub>k</sub>* denotes the overall coefficient of the *k*-th covariate *X*, which quantifies the linear numerical impacts of explanatory factors on *Yit*.

An ordinary GAM (model 2) is formulated as [48]

$$\zeta_{it} = g(Y_{it}) = \eta + \sum_{k=1}^{K} f_{\text{GAM}}(\delta_{hk} X_{itk}),\tag{5}$$

where *fGAM*(·) denotes the nonparametric smooth function for fitting a set of coefficients *δhk* with *h* groups, representing the numerical nonlinear impacts of the *k*-th covariate. Unlike model 1, model 2 is useful in identifying response–covariate numerical nonlinear relationships. However, neither model 1 nor model 2 can account for the spatiotemporal effects essential for geospatial analysis.

A global spatiotemporal regression (model 3) can be modeled with [42,46,49]

$$\zeta_{it} = g(Y_{it}) = \eta + \sum_{k=1}^{K} \chi_{k} X_{itk} + f_{\text{space}}(\mu_{i}) + f_{\text{time}}(\lambda_{t}),\tag{6}$$

where *μ<sub>i</sub>* signifies the space-intercepts (SIs) representing the structured spatial distribution of *Yit*, *λ<sub>t</sub>* signifies the time-intercepts (TIs) representing the structured temporal trend of *Yit*, and the LGMs *fspace*(·) and *ftime*(·) are the same as in Equation (1).

Model 4, as fully introduced in Equations (1)–(3), was specified as a reduced Bayesian STVC regression by removing the spatiotemporal random effects of intercepts to ensure noticeable variations of both spatial and temporal nonstationary impacts of different explanatory factors on the target response variable [36,50].

Finally, the optimal model from the above four regressions with the best model fitness and predictability was further utilized to estimate the complete spatiotemporal maps of *Yit*.

#### 2.2.4. Model Inference and Evaluation

Alternative regression models were established using Bayesian statistics based on the advanced hierarchical modeling strategy, that is, a BHM framework. Non-informative priors were assigned to parameters within the BHM to embody the idea of data-driven modeling [47]. The integrated nested Laplace approximation (INLA) algorithm, an approximate Bayesian inference technique, was adopted to estimate these regression models using the R-INLA package in the R environment [51] due to its advantage of producing reliable estimated results with a relatively short computation time [52]. The performances of these alternative regressions were evaluated in terms of three aspects: the degree of model fitting, model complexity, and predictive ability [46]. Specifically, the deviance information criterion (DIC) [53] and the Watanabe–Akaike information criterion (WAIC) were used to reflect the degree of fitting of the Bayesian regression, for which a smaller value indicates a better model fit. Likewise, the complexity of the Bayesian regression was evaluated with two indices (*PDIC* and *PWAIC*) that are obtained together with the DIC and WAIC, for which smaller values indicate a less complex model. In terms of the model predictive power, a logarithmic score (LS) retrieved from the conditional predictive ordinates under a leave-one-out cross-validation was used, with smaller values associated with better predictive capacities [54].
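For readers unfamiliar with these criteria, the sketch below shows how two of them are commonly computed from quantities that a Bayesian fit reports. It assumes the usual definitions, namely LS as the negative mean log of the conditional predictive ordinates (CPO) and DIC as the posterior mean deviance plus the effective number of parameters; the text does not spell these formulas out, and the function names and toy CPO values are hypothetical.

```python
import numpy as np

def logarithmic_score(cpo):
    """LS = -mean(log(CPO_i)); smaller values indicate better predictions."""
    cpo = np.asarray(cpo, dtype=float)
    if np.any(cpo <= 0):
        raise ValueError("CPO values must be strictly positive")
    return -np.mean(np.log(cpo))

def dic(mean_deviance, p_dic):
    """DIC = posterior mean deviance + effective number of parameters (P_DIC)."""
    return mean_deviance + p_dic

# Hypothetical CPO values for two candidate models on the same three observations.
ls_a = logarithmic_score([0.5, 0.5, 0.5])     # equals log(2)
ls_b = logarithmic_score([0.25, 0.25, 0.25])  # equals log(4)
```

Here the first candidate assigns higher leave-one-out predictive density to each held-out observation, so its LS is smaller and it would be preferred on predictive ability.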

#### **3. Results**

#### *3.1. Selected Drivers for Modeling*

As indicated in Figure 2, potential explanatory variables that met the inclusion criterion of VIF < 5 and ranked higher in MDI were selected from the screening outcomes and added into the regression modeling. Specifically, we first removed factors with higher multicollinearity based on the exclusion threshold of 5 for VIF, as shown in Figure 2a. This step left 13 socioeconomic factors (i.e., SV1, SV2, SV3, SV6, SV7, SV9, SV10, SV14, SV17, SV18, SV19, SV20, and SV21) and four environmental factors (i.e., EV5, EV6, EV8, and EV9). Then, using Figure 2b, we selected the top eight factors (i.e., EV9, SV21, EV8, EV5, EV6, SV20, SV1, and SV2), which had relatively higher importance (contribution) to the response variable. Because the choice of an MDI cutoff is generally empirical, our main reason for this cutoff was the apparent sharp drop in MDI between SV2 and SV7. In addition, the top eight factors covered four socioeconomic factors and four environmental factors, which was ideal for exploring the combined impacts of these two critical aspects on tourism development. Hence, the screening threshold applicable to this case was MDI > 200. In summary, a core variable system particularly applicable to China's tourism case was created, which contained a total of eight critical factors (renamed as X1–X8 in Table 3) and was incorporated into the next-step regression analysis.

**Figure 2.** Two-step variables screening procedure: (**a**) remove variables with higher multicollinearity (VIF > 5); (**b**) select variables with the higher relative importance (MDI > 200).

#### *3.2. Model Assessment and Comparison*

We assessed the performances of the four comparative Bayesian regression models by jointly considering model fitness, complexity, and predictive power, for which a total of five representative evaluation indicators are summarized in Table 2. Model 4 (STVC) showed the best performance, with the minimum values of the assessment indicators DIC, WAIC, and LS. However, for *PDIC* and *PWAIC*, model 4 demonstrated a notable deficiency, presenting a much higher complexity than the other three mainstream benchmark models. The complexity (*PDIC*) of model 4 was found to be about 99 times higher than that of a multiple linear regression (model 1) and 2.6 times higher than that of a global spatiotemporal regression (model 3). Two considerations nevertheless justified accepting the increased complexity of the STVC model. First, the STVC model demonstrated both superior model fitness and predictive capacity compared with all the other regressions. Second, the STVC model was the only one that had the capacity for synchronously detecting both temporal and spatial heterogeneous associations between variables to be further interpreted at a space-time scale. Therefore, model 4 (STVC) was selected as the final regression to explore the spatiotemporal heterogeneous relationships between tourism and eight selected explanatory variables, and it was also used for producing a series of estimated spatiotemporal distribution maps reflective of the city-level tourism revenue in China.

**Table 2.** Bayesian modeling evaluations of the alternative regressions for China's tourism case account for model fitness, complexity, and predictive power.


Model 1–4: multiple linear regression, generalized additive model, global spatiotemporal regression, and local spatiotemporal regression STVC model; DIC: deviance information criterion; WAIC: Watanabe–Akaike information criterion; *PDIC*: effective number of parameters from DIC; *PWAIC*: effective number of parameters from WAIC; LS: logarithmic score.

#### *3.3. Global-Scale Impacts of Drivers*

Two kinds of overall impacts of socioeconomic and environmental variables on tourism were estimated: one was the global-scale linear numerical effects based on model 1; the other one was the global-scale nonlinear numerical effects obtained from model 2. We summarized the critical parameters of model 1 in Table 3, including the overall coefficients representing the stationary relationship among variables, standard deviation (SD), and the 2.5% and 97.5% quantiles of Bayesian credible intervals (CIs). In terms of the four socioeconomic variables, X1 and X2 reflected the income level of individual residents, X3 represented the regional macroeconomic development conditions, and X4 represented the population condition. For the other four environmental variables, X5 and X7 represented the city-specific urbanization process and vegetation coverage based on satellite remote sensing data, respectively. X6 and X8 reflected the general geographical situations characterized by topography and transportation, respectively. Except for X2 and X5, the overall coefficients of the other six factors were found to be greater than zero. This finding indicated that most core variables served as positive stimulants for tourism development from a global-scale perspective. Notably, the NDVI (X7), the average wage of employees in urban units (X1), GDP per capita (X3), and road network density (X8) demonstrated more significant impacts on tourism among the eight factors.

**Table 3.** Linear numerical impacts of main drivers on China's city-level tourism industry.


Furthermore, the exponent-scale nonlinear numerical effects of the eight selected drivers are illustrated in Figure 3. We noticed that the variables' numerical influencing curves had a broadly similar upward trend. At the same time, we identified the varying impacts of each variable across their development process. It is worth mentioning that X2 and X5 were negatively linearly correlated with the tourism industry, which could not be explained directly. However, the nonlinear results showed that only X2 and X5 exhibited a significant downward trend, which accounts for their overall negative linear associations in Table 3. This finding suggested that model 2 had a superior interpretation capacity over model 1 in fitting global-scale numerical impacts. However, both linear and nonlinear numerical modeling results were produced based on a stationary assumption. As a result, these global-scale outputs might smooth or hide the local-scale heterogeneous impacts of different variables on the tourism industry over the entire study area and time frame, particularly for a fine-scale space–time dataset.

**Figure 3.** Global-scale nonlinear numerical effects of main drivers on China's city-level tourism industry: X1—average wage of employed persons in urban units, X2—employment density of urban units, X3—GDP per capita, X4—population density, X5—nighttime light index, X6—slope, X7—NDVI, and X8—road network density.

#### *3.4. Temporally Varying Impacts of Drivers*

In Figure 4, we present a TIs graph and five TCs graphs with 95% Bayesian CIs to exhibit the crude temporal dynamic trend of tourism and the temporally heterogeneous impacts of main drivers on tourism in China, as well as the uncertainties of these estimated parameters. According to Figure 4a, China's tourism development level demonstrated a continuously increasing trend from 2008 to 2017, meaning that China's tourism industry maintained a high development speed over the ten years. Furthermore, it can be seen from Figure 4b that the temporal tourism–covariates relationships varied non-linearly over 2008–2017. This visualization of local-scale nonstationary regression relationships over periods is an essential feature of the Bayesian STVC model that cannot be obtained from global-scale coefficients. Generally speaking, X3 (GDP per capita), X4 (population density), and X7 (NDVI) showed a downward trend from 2008 to 2017, which indicated that the impact of these variables on tourism development weakened over time. In contrast, X1 (average wage of employed persons in urban units) and X2 (employment density of urban units) presented an initial downtrend, followed by an upward tendency starting from 2013, suggesting that their roles in promoting tourism development gradually strengthened, especially after 2013. These findings also meant that groups with high quality of living might be a potentially vital force to promote tourism. In addition, we noticed that the TCs of X1 and X2 appeared to have relatively higher uncertainties (wider CIs) because the fluctuation ranges of the TCs of X1 (−0.05–0.05) and X2 (−0.02–0.02) were much narrower than those of the other indicators, e.g., X4 (−0.4–0.2). In fact, when we plotted all the factors within a single graph using the same vertical coordinate, the uncertainties (CIs) of X1 and X2 turned out to be very small; however, the time trends of X1 and X2 could be smoothed out. From this perspective, the uncertainties of all indicators were within an acceptable range.

**Figure 4.** (**a**) Time-intercepts (TIs) graph: the temporal trend of China's tourism industry from 2008 to 2017, and (**b**) time-coefficients (TCs) graphs (covariates' temporal nonstationarity): the impacts of drivers (X1–X4 and X7) on tourism are varying over ten years. Covariates with time-scale variations: X1—average wage of employed persons in urban units, X2—employment density of urban units, X3—GDP per capita, X4—population density, and X7—NDVI. Shadow areas in each facet are the 95% Bayesian CIs to describe the uncertainties of time-scale parameters.

#### *3.5. Spatially Varying Impacts of Drivers*

Spatially, we retrieved the parameters of SIs from model 3 to geographically map the ten-year average tourism revenue distribution across China, as presented in Appendix A Figure A1. In addition, utilizing the SCs parameters from model 4, the variables' spatial nonstationary maps were depicted in Figure 5a. The cluster maps for parameter SCs were also produced to highlight those significant (>90% confidence) hot spots and cold spots at the city level, as shown in Figure 5b.

From Figure A1, the spatial distribution characteristics of China's tourism revenue demonstrated a gradual increase from West China to East China. Simultaneously, we detected diverse geospatial tourism–covariates relationships at the city level from Figure 5a and an apparent spatial agglomeration effect of SC maps in Figure 5b. In fact, for every single factor of interest, city-specific areas with higher sensitivity to this particular covariate could be visually identified in terms of achieving regional tourism development, based on direct analysis of the corresponding SC map produced by that covariate. Furthermore, within each city area, a series of urban policies could be proposed to facilitate its tourism development based on the relative impacts of the eight core variables. The relative effect of each covariate within each city could be assessed by vertically integrating the local-scale information from all the SC maps together [50].

**Figure 5.** (**a**) Space-coefficients (SCs) maps (covariates' spatial nonstationarity): spatially varying impacts of main drivers (X1–X8) on total tourism revenue at city level across China, and (**b**) hot spot analysis for SCs maps: X1—average wage of employed persons in urban units, X2—employment density of urban units, X3—GDP per capita, X4—population density, X5—nighttime light index, X6—slope, X7—NDVI, and X8—road network density.

Looking at the macroscopic regional scale using the hot spot maps in Figure 5b, we may discover the following. In Northeast China, X8 may serve as an essential factor for promoting the development of the local tourism industry, while X5 and X6 may have no impacts, and the other factors may have individual city-specific impacts without having generated geographic hot spot regions. Likewise, the high-level tourism development in China's eastern region may be mainly promoted by X2 and X5, whereas X4 is not entirely essential. In Western China, with low-level tourism development, X4 may be a primary determinant to improve its tourism conditions. Meanwhile, X6 and X7 also present spatially positive clustered effects in some areas, such as Yunnan and Sichuan. In the regions of Central China, tourism development seems to be dominated by socioeconomic factors, including X1, X2, and X3.

#### *3.6. Spatiotemporal Estimated Maps of China's City-Level Tourism Revenue*

A complete series of spatiotemporal distribution maps of China's city-specific tourism development level from 2008 to 2017 was produced by adopting the optimal Bayesian STVC model (model 4), as shown in Figure 6. The newly model-estimated tourism maps highlighted hidden areas (e.g., cities with missing values) and provided more intuitive information (e.g., by smoothing the city-level extreme outliers), which were expected to assist in making policies about the sustainable development of tourism. Generally, the overall growth in the city-level tourism industry was identified over the past decade in China, during which time diverse improvement intensities were found among regions at a local city scale. In Central China, since 2008, about 77% of blue-colored cities with weak tourism industries gradually shifted to yellow/red colors with relatively strong tourism industries. In contrast, such a shifting proportion of East China was about 55%, suggesting that Central China's tourism industry grew faster than that of eastern cities. In terms of West China and Northeast China, the shifting proportions were about 51% and 41%, respectively, which were relatively lower than the other two divisions. In 2017, the low-tourism-level cities with a blue color were mainly distributed in the provinces of Heilongjiang, Gansu, Ningxia, Xinjiang, Qinghai, and Tibet.

**Figure 6.** Estimated spatiotemporal maps for showing dynamic variations of city-level tourism development across China from 2008 to 2017.

Lastly, we performed a hot spot analysis for the newly estimated complete tourism maps in 2008 and 2017, respectively, to detect those significant city clusters (>90% confidence) of the tourism industry, as shown in Figure 7. In 2017, we found four significant tourism industry clusters (hot spots) and one less-developed tourism region (cold spot) at the city level, compared with two clearly formed hot spots of the tourism industry in 2008. These four identified high-level tourism city clusters in 2017 were demonstrated to be consistent with China's top four major urban agglomerations, namely, Beijing–Tianjin–Hebei, the Yangtze River Delta, the Pearl River Delta, and the Sichuan–Chongqing Region. This might reveal that current tourism agglomeration development is closely related to the urbanization degree. Meanwhile, a cold spot was detected in West China, indicating that the tourism development of western cities was relatively slow.
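The text does not name the statistic behind the hot spot analysis. One widely used choice is the Getis–Ord Gi* statistic, and the sketch below assumes it purely for illustration; the function name, the ten-unit chain, and the toy values are hypothetical. Each unit receives a z-score, and |z| > 1.645 corresponds to the two-sided 90% confidence level mentioned above.

```python
import numpy as np

def getis_ord_gi_star(x, W):
    """Getis-Ord Gi* z-scores for values x under a binary adjacency matrix W.

    The Gi* variant counts each unit as its own neighbour, so the diagonal
    of the weight matrix is set to 1 before the statistic is computed.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    Ws = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(Ws, 1.0)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    wsum = Ws.sum(axis=1)
    w2sum = (Ws ** 2).sum(axis=1)
    numerator = Ws @ x - xbar * wsum
    denominator = s * np.sqrt((n * w2sum - wsum ** 2) / (n - 1))
    return numerator / denominator

# Ten units in a chain; the first three have high values (hypothetical data).
n_units = 10
W_chain = np.zeros((n_units, n_units))
for i in range(n_units - 1):
    W_chain[i, i + 1] = W_chain[i + 1, i] = 1.0
values = np.array([10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
z = getis_ord_gi_star(values, W_chain)
hot = [int(i) for i in np.flatnonzero(z > 1.645)]    # hot spots at 90% confidence
cold = [int(i) for i in np.flatnonzero(z < -1.645)]  # cold spots at 90% confidence
```

In this toy chain only the units whose whole neighborhood is high-valued are flagged as hot spots. A unit at the edge of the high-value group falls just short of the threshold, which illustrates that the statistic responds to clusters rather than to single high values.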

**Figure 7.** The urban agglomeration of the tourism industry across Chinese cities in 2008 and 2017: hot spot mapping for the areas' total tourism revenue, estimated by the optimal Bayesian STVC model.

#### **4. Discussion**

In this study, the multidimensional impacts of socioeconomic and environmental variables, including linear and nonlinear numerical effects and spatiotemporal heterogeneous effects, on regional tourism were comprehensively investigated across Chinese cities along with the first production of a set of spatiotemporal maps depicting China's total tourism revenue. These findings may add innovative insights about the mechanisms of how multi-source geospatial factors have affected the regional tourism industry, and are expected to provide a brand-new viewpoint for policymakers. We draw the following conclusions at different scales.

Globally, significant effects of both socioeconomic and environmental variables were identified [28,55–57], which highlighted the necessity of taking a wide range of factors into account throughout the formulation of tourism policies. Tourism is a comprehensive industry composed of multiple elements, including food, shelter, transportation, travel, entertainment, and purchase. However, the importance of some of these elements embedded in the tourism industry, such as food, shelter, and transportation, is often ignored because they are simply regarded as the basic service facilities of a city. Therefore, the positive effects of the socioeconomic and environmental factors on tourism are supposed to be focused on the industrial level, which suggests that the idea of developing industries should always be adopted as the guideline for developing the tourism industry regardless of regional or national levels. At present, the "Travel +" strategy being implemented by the Chinese government is exactly based on this idea [58].

Temporally, the development of China's tourism has mainly benefited from comprehensive time–scale impacts of multiple factors. Based on temporal nonstationarity, the predominant stimulants for tourism development were demonstrated to have gradually switched from the regional economy, population size, and tourism resource attractiveness to personal economic status. These results imply that China's tourism industry is very likely undergoing a transition from sightseeing tourism to leisure and holiday tourism. Meanwhile, residents' affluence has been highlighted as an indispensable contributor to nationwide tourism development [59]. Against this changing background, improving personal income and safeguarding the rights and interests of employees should be adopted as an essential strategy for facilitating nationwide tourism industry development. This might be achieved via the implementation of multiple tourism-related policies at governmental levels, such as approving paid-leave policies for employees, encouraging more flexible work schedules tailored to vacation leave, and encouraging off-peak vacation arrangements.

Spatially, the development of China's tourism could be characterized as "strong in the east and weak in the west" [30], which was affected by various factors. Cities of West China were mainly affected by population size and tourism resources, while personal income, employment, and urbanization contributed more to cities in the east region [60]. The city-level spatial nonstationarity found in this study could serve as a reference for governments at all levels in making more targeted policies. For example, the western region may put forward corresponding talent introduction policies while promoting economic development. In addition, the local government can develop sightseeing and holiday tourism through developing natural landscapes. Cities of East China need to focus on optimizing the protection system of workers' rights and interests and developing characteristic tourism products to provide tourists with high-end, comfortable, and personalized services for stimulating tourism. Northeast China may focus on infrastructure and strengthen the planning and laying of the road networks to enhance regional tourism accessibility. Furthermore, city-level local authorities could utilize local resources rationally and determine the direction of tourism strategies by using the critical drivers' local spatial influencing maps to support ecotourism, sightseeing tourism, vacation tourism, geological tourism, and urban tourism. In addition, the first series of maps displaying China's tourism revenue's spatiotemporal distributions at an administrative city level from 2008 to 2017 was produced, which was further analyzed to provide urbanization-related insights into empirically optimizing the unbalanced development of the tourism industry [61,62].

To sum up, from the multidimensional spatiotemporal heterogeneous perspective, the government should formulate various tourism policies based on region-specific conditions, as well as pursue the development concept of "applying proper measurements in line with local conditions and temporal variations". At present, tourism industry development in areas with relatively high urbanization levels has demonstrated a change from sightseeing tourism to leisure tourism. As a result, socioeconomic status should be continuously considered as a significant factor throughout tourism-related policy-making procedures in these regions. In contrast, regarding cities with low-level urbanization distributed in West China, environmental factors or sightseeing resources, instead of other factors, should be addressed as predominant issues to be considered throughout the formulation of tourism-related policies [60]. Therefore, making city-specific strategies that take city-specific factors into account is expected to improve the accuracy of policy formulation, as well as the effectiveness of strategic implementation, which would further mitigate both "invalid policy" and "weak policy" produced by the "one-size-fits-all" policy.

Finally, we would like to underline the importance of the local spatiotemporal regression approach, namely, the Bayesian STVC model we have selected. As discussed above, introducing a spatiotemporal heterogeneous perspective to regional tourism management could avoid the one-size-fits-all issue by providing multidimensional spatiotemporal information. In the spatial statistics field, local regressions that can deal with such spatiotemporal heterogeneity among variable relationships (spatiotemporal nonstationarity) are relatively rare. They can be generally classified into the frequentist-type model [63–65] and the Bayesian-type model [35,36,66], as they were proposed independently under different statistical traditions. We chose the Bayesian STVC model as the applied local spatiotemporal regression for the following reasons. First, only the Bayesian-based local spatial or spatiotemporal model is an actual "full-map" modeling technique; thus, the results are more reliable [67,68]. Second, the Bayesian STVC model follows a space–time independent nonstationary assumption, dramatically reducing the computational burden and weakening the overfitting problem. Last but not least, owing to its separate fitting of space-coefficients (SCs) and time-coefficients (TCs), the Bayesian STVC model is more user-friendly: stakeholders can directly obtain the spatial and temporal autocorrelated regularities separately [36,50]. Beyond these benefits, the Bayesian STVC model still needs further improvement to solve more complex space–time interaction issues in natural and social sciences.

#### **5. Conclusions**

This study verifies that socioeconomic and environmental factors simultaneously affect tourism development over China, globally and locally, supported by the up-to-date space-time data of city-level tourism statistics and a series of advanced Bayesian regressions. Remarkably, the local impacts of socioeconomic and environmental conditions vary heterogeneously at the city level in both time and space dimensions across China. This was demonstrated by adopting the cutting-edge Bayesian STVC model, which was also used for estimating the first series of spatiotemporal maps of city-level tourism development. These findings provide novel insights into policy-making procedures at multiple levels. Here, the Bayesian STVC model was successfully applied to mine the spatial and temporal autocorrelated nonstationarity inherent in tourism–covariates relationships over China and could serve as an emerging tool to offer new insights on spatiotemporal-oriented influencing factor analysis and high-precision prediction in broader GIScience-related fields of social and natural sciences.

Apart from all these achievements, several concerns should be better addressed in future lines of research. First, the seasonal effect is the main factor affecting tourists' behavior [69], which emphasizes collecting and using quarterly tourism data in tourism research. However, this study is limited because national urban tourism data sources only have annual scale records. Second, other underlying tourism-related factors such as tourism resources were not fully considered in this study [34]. Future studies might focus on a relatively small area with seasonal heterogeneity by using multi-source tourism data to construct more scientific indicators [70] and developing more sophisticated spatiotemporal statistical models for outputting more informative results for regional tourism research.

**Author Contributions:** Conceptualization, Chengwu Wang and Chao Song; data curation, Xu Zhang, Zhangying Tang and Honghu Tang; formal analysis, Xu Zhang, Chao Song, and Chengwu Wang; funding acquisition, Chao Song, Chengwu Wang and Zhangying Tang; investigation, Chengwu Wang and Xu Zhang; methodology, Chao Song and Zhoupeng Ren; project administration, Chao Song; software, Chao Song and Mingyu Xie; visualization, Xu Zhang, Chao Song, Mingyu Xie and Honghu Tang; writing—original draft, Xu Zhang; writing—review and editing, Chao Song, Chengwu Wang and Yili Yang. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was collaboratively supported by the National Natural Science Foundation of China (Grant No. 42071379 and 41701448), the State Key Laboratory of Resources and Environmental Information System (Grant No. 201811), the Applied Basic Research Funded Project of Sichuan Science and Technology Department (Grant No. 2020YJ0117), the Humanities and Social Sciences Research Fund Project of Southwest Petroleum University (Grant No. 2019RW021), the Chengdu Federation of Social Science Association (Grant No. 2021ZC003), and the Fund for Introducing Talents of Sichuan University (Grant No. YJ202157). The funders had no conflicts in study design, data collection and analysis, decision to publish, or manuscript preparation.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. The tourism and socioeconomic data can be found in China City Statistical Yearbook and Statistical Bulletin. The environmental data can be found here: http://data.cma.cn/ (accessed on 28 April 2021) and http://www.resdc.cn/ (accessed on 28 April 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Figure A1.** Space-Intercepts (SIs) map: the model-estimated geographical variations of China's ten-year average total tourism revenue at the city level during 2008–2017.

#### **References**


## *Article* **Spatial Distribution Pattern and Influencing Factors of Sports Tourism Resources in China**

**Yifan Zuo 1,2,†, Huan Chen 2,†, Jincheng Pan 3, Yuqi Si 2, Rob Law <sup>4</sup> and Mu Zhang 2,\***

	- <sup>3</sup> Department of Physical Education, Guizhou University of Finance and Economics, Guiyang 550000, China; 1711621003@student.sus.edu.cn
	- <sup>4</sup> School of Hotel and Tourism Management, The Hong Kong Polytechnic University, Hong Kong, China; rob.law@polyu.edu.hk
	- **\*** Correspondence: zhangmu@jnu.edu.cn; Tel.: +86-755-2693-1865
	- † The authors contribute equally to this article.

**Abstract:** Sports tourism is an emerging tourism product. In the sports and tourism industry, resource mining is the foundation that provides positive significance for theoretical support. This study takes China's sports tourism boutique projects as the study object, exploring its spatial distribution pattern through the average nearest neighbor index, kernel density, and spatial autocorrelation. On the strength of the wuli–shili–renli system approach, the entropy value method and geographic detector probe model are used to identify the driving factors affecting the spatial distribution pattern. Findings reveal the following: (1) From 2013 to 2014, the sports tourism resources in China present a distribution pattern with the Yangtze River Delta urban agglomeration as the high-density core area and the Guizhou–Guangxi border area and the western Hubei ecological circle as the sub-density core areas. (2) From 2014 to 2018, China's sports tourism boutique projects increased by 381, and the regional differences among various provinces tended to converge. The high-density core area remained unchanged. The sub-density cores are now the Yunqian border area of the Karst Plateau, the Qinglong border area of the Qilian Mountains, and the Jinji border area of the Taihang Mountains, shaping the distribution trends of "depending on the city, near the scenery" and "large concentration, small dispersion". (3) The proportion of provincial sports tourism development classified as being in the coordinated stage is 61.29%. (4) The explanatory power of the factors affecting the spatial layout in descending order is natural resource endowment, sports resource endowment, transportation capacity, industrial support and guidance, market cultivation and development, people's living standards, software and hardware services, and economic benefit effects. The explanatory power of the interaction of two different factors is higher than that of the single factor.

**Keywords:** sports tourism; spatial distribution; geographic detector; influencing factors; China

### **1. Introduction**

Sports tourism is defined as "the use of sports as a vehicle for tourism endeavor" [1]. In recent years, with the increase in public leisure time, continuous enhancement of fitness awareness, and rapid expansion of tourism consumption, the Chinese government has vigorously promoted the development of sports tourism to satisfy the people's yearning for a better life. The government has called for the creation of sports tourism demonstration zones and encourages the construction of relevant boutique projects. However, in the course of its rapid development in various places, problems such as unreasonable layout, inadequate resource utilization, insufficient capital investment, lack of stadiums, and poor matching of resources with sports characteristics have frequently occurred. As an emerging spatial and regional unit, effective analysis of sports tourism patterns and influencing factors has important practical significance for its layout optimization as well as the sustainable and moderate development.

**Citation:** Zuo, Y.; Chen, H.; Pan, J.; Si, Y.; Law, R.; Zhang, M. Spatial Distribution Pattern and Influencing Factors of Sports Tourism Resources in China. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 428. https://doi.org/10.3390/ijgi10070428

Academic Editors: Wolfgang Kainz, Andrea Marchetti and Angelica Lo Duca

Received: 22 May 2021 Accepted: 21 June 2021 Published: 23 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Current research on sports tourism mainly focuses on the impact on society, economy, and culture. In addition, the characteristics, needs, behaviors, and markets of sports tourists are the exploration hotspots, along with sports tourism destination planning, product development, and safety management. Cooper and Alderman discussed the influence of canceling sports events on the economy, society, culture, and the environment; in response to COVID-19, relevant sports tourism alternatives are necessary to promote sustainability [2]. Nishio et al. developed a motivational scale for sports fans (social, achievement, relaxation, and games) and a general tourist motivation scale (escape, nature, shopping, and food) for sports tourism [3]. Jin et al. proposed that the event quality affects the respondents' perceived value, destination image, and behavioral intentions. A structural equation model for related tests ultimately showed that the quality of the event and its perceived value have a significant effect on behavioral intentions [4]. Page et al. compared the safety experience of adventure travelers in New Zealand and Scotland, then commented on the adventure tourism accident compensation legislation and jurisprudence. In addition, the study discussed the injury experience and safety management of adventure travel customers in Queensland and analyzed the adventure travel accidents of inbound tourists from 1982 to 1996 in New Zealand [5].

At present, few studies focus on the spatial structure or distribution of sports tourism resources. Fugao and Li construed the ideal evolution of the spatial structure of sports tourism in the entire region, using spatial structure theories such as growth pole, point–axis, patch–corridor–matrix, and network structure to explain the generation and evolution of the spatial structure of "point–line–surface–domain" [6]. Zuo et al. took the lead in exploring the spatial distribution characteristics of Chinese marathon events. The size of urban populations; living standards; and the overall quality of urban residents (including the concept of sports and leisure), the social environment, and other social mechanism factors affect the spatial distribution of marathon events [7]. Additionally, the spatial distribution characteristics of Chinese marathon events are investigated based on the perspective of natural resources [8]. Geneletti took advantage of geographic information system (GIS) technology with biology, physics, landscape, and other indicators to determine the environmental effect assessment of ski tourism destinations [9]. Thus far, only a few studies have analyzed the spatial distribution pattern of sports tourism resources in China [10,11].

In summary, academic circles have not sufficiently probed the issues of sports tourism and have focused mainly on psychology, management, or behavior. Although a few studies involve spatial structure, the scale is mostly limited to regions, provinces, and cities and therefore lacks a macro-level, systematic analysis of the spatial distribution pattern of national sports tourism resources. In view of this, this article systematically sorts out China's sports tourism boutique projects from 2013 to 2018, selecting the nodes in 2014 and 2018. Moreover, the spatial distribution characteristics of China's sports tourism resources are described through the average nearest neighbor index, kernel density analysis, and spatial autocorrelation. In addition, the study uses the entropy weighting method and geographic detector model to identify the driving factors affecting the spatial distribution, making theoretical contributions to the study of the spatial distribution pattern of sports tourism resources. The result builds a systematic index system of influencing factors. It is expected to provide countermeasures for the optimization and healthy development of China's future sports tourism spatial layout, and to provide a reference for the reasonable layout and appropriate development of future sports tourism resources in other countries or regions.

#### **2. Materials and Methods**

*2.1. Methods*

#### 2.1.1. Average Nearest Neighbor

The nearest neighbor distance measures the mutual proximity of sports tourism resources in the spatial distribution. The nearest neighbor index reflects the spatial aggregation characteristics of sports tourism resources, that is, the ratio of the actual to the theoretical nearest neighbor distances [12]. The nearest neighbor index is calculated as

$$ANNI = \frac{ANNO}{ANNE} = 2\sqrt{D} \times ANNO \tag{1}$$

In the above formula, *ANNI* represents the nearest neighbor index, *ANNO* represents the average nearest neighbor distance, *ANNE* represents the theoretical nearest neighbor distance, and *D* represents the nearest neighbor density [12], where

$$ANNE = \frac{1}{2\sqrt{n/A}} = \frac{1}{2\sqrt{D}}\tag{2}$$

In Formula (2), *A* represents the area of the province and *n* represents the number of sports tourism resources. When *ANNI* = 1 and *ANNO* = *ANNE*, the sports tourism resources are randomly distributed; when *ANNI* < 1 and *ANNO* < *ANNE*, the sports tourism resources are in an agglomerated distribution; when *ANNI* > 1 and *ANNO* > *ANNE*, the sports tourism resources are uniformly distributed. The smaller the *ANNI*, the higher the concentration of sports tourism resources. Both Zuo et al. [7] and Wang et al. [13] used average nearest neighbor to determine the distribution state of the studied elements; quantify the spatial relationship; and judge whether the elements are clustered, random, or dispersed.
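As a minimal illustrative sketch of Formulas (1) and (2), the Python snippet below computes *ANNO*, *ANNE*, and *ANNI* for a small point set. It assumes planar (projected) coordinates and a known study-area size; the function name and both sample point sets are hypothetical and are not taken from the study's data.

```python
import math


def average_nearest_neighbor_index(points, area):
    """Return (ANNO, ANNE, ANNI) for planar points in a region of the given area."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")

    # ANNO: mean distance from each point to its nearest other point.
    nearest_sum = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest_sum += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    anno = nearest_sum / n

    # ANNE: theoretical nearest neighbor distance, Formula (2), with D = n / A.
    density = n / area
    anne = 1.0 / (2.0 * math.sqrt(density))

    # ANNI: ratio of observed to theoretical distance, Formula (1).
    return anno, anne, anno / anne


# Hypothetical data: a regular 4 x 4 grid with unit spacing in a 4 x 4 square.
grid = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
grid_result = average_nearest_neighbor_index(grid, area=16.0)

# Hypothetical data: four tightly packed points in a 10 x 10 square.
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
cluster_result = average_nearest_neighbor_index(cluster, area=100.0)
```

Under these assumptions, the regular grid gives an index above 1 (uniform distribution), whereas the tightly packed points give an index well below 1 (agglomerated distribution).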

#### 2.1.2. Kernel Density

Kernel density analysis is a quantitative estimation of the density of dot-like objects using a moving cell. The assumption is that geographic events can occur at any location in space, but with different probabilities at different locations. The probability of event occurrence is high in areas where dot-like objects are dense and low in areas where dot-like objects are sparse [7]. The analytical formula for kernel density is

$$\widetilde{\lambda}(s) = \sum_{i=1}^{n} \frac{1}{\tau^2} k\left(\frac{s - s_i}{\tau}\right) \tag{3}$$

In the above formula, *k*( ) represents the kernel function, *τ* (*τ* > 0) represents the bandwidth, *n* represents the number of sample points, and (*s* − *si*) represents the distance between the dot-like object *s* and the estimated point *si* [7]. After repeated testing, a search bandwidth of 333.6 km was selected to reflect the spatial distribution of sports tourism resources more intuitively. Yoo et al. [14] and Allen et al. [15] made use of kernel density in order to determine the center position of a specific element. The density is highest at the center position, decays with distance, and reaches zero at the limit distance.
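The following Python sketch evaluates Formula (3) at a single location. The text does not state which kernel function *k* was used, so a two-dimensional quartic kernel is assumed here purely for illustration; it is highest at the center and drops to zero at the bandwidth distance, which matches the behavior described above. The function names and the sample coordinates (in km) are hypothetical.

```python
import math


def quartic_kernel(u):
    """Assumed 2-D quartic kernel of the scaled distance u; zero for u >= 1."""
    return (3.0 / math.pi) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0


def kernel_density(s, samples, tau):
    """Evaluate Formula (3) at location s = (x, y) for bandwidth tau > 0."""
    if tau <= 0:
        raise ValueError("bandwidth tau must be positive")
    total = 0.0
    for x, y in samples:
        distance = math.hypot(s[0] - x, s[1] - y)
        total += quartic_kernel(distance / tau) / tau ** 2
    return total


# Hypothetical example: one sample point at the origin, bandwidth from the text.
tau_km = 333.6
samples_km = [(0.0, 0.0)]
density_center = kernel_density((0.0, 0.0), samples_km, tau_km)
density_halfway = kernel_density((tau_km / 2.0, 0.0), samples_km, tau_km)
density_limit = kernel_density((tau_km, 0.0), samples_km, tau_km)
```

In practice the function would be evaluated on a regular grid of locations to produce a density surface; with a different kernel only `quartic_kernel` would need to change.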

#### 2.1.3. Spatial Autocorrelation (Global Moran's I)

Spatial autocorrelation reflects the degree of correlation between a certain geographic phenomenon or attribute value on a regional unit and the same phenomenon or attribute value on adjacent regional units [16]. This study uses Moran's I index, which is

$$I = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \omega_{ij} (X_i - \overline{X}) (X_j - \overline{X})}{S^2 \sum_{i=1}^{n} \sum_{j=1}^{n} \omega_{ij}} \tag{4}$$

In the above formula, *ω<sub>ij</sub>* represents the spatial weight between areas *i* and *j*; *n* represents the number of regions; and *Xi* and *Xj* represent the observation values of locations *i* and *j*, respectively. The value range of Moran's I is [−1, 1]: Moran's *I* > 0 indicates a positive spatial correlation phenomenon, Moran's *I* < 0 indicates a negative correlation phenomenon, and Moran's *I* = 0 indicates an independent random distribution [16]. Zuo et al. [7] and Zhang et al. [17] used global Moran's I to calculate the Moran's *I* value of the research elements on a continuous spatial scale to explore the strength of the spatial correlation of the research elements and their changes with the spatial scale.
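A minimal Python sketch of Formula (4) is given below. It assumes that the mean of the observations is used for the overline term and that *S*<sup>2</sup> is the population variance of the observations, neither of which is spelled out in the text, and it uses a binary adjacency matrix as the spatial weights. The function name and the four-region example are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I following Formula (4).

    values  -- observations X_1..X_n, one per region
    weights -- n x n spatial weight matrix (list of lists), weights[i][j] = w_ij
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Assumption: S^2 is the population variance (1/n) * sum((X_i - mean)^2).
    s2 = sum(d * d for d in deviations) / n
    weight_sum = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return cross_products / (s2 * weight_sum)


# Hypothetical example: four regions in a row, each adjacent to its neighbors.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_gradient = morans_i([1, 2, 3, 4], chain_weights)     # similar values adjoin
i_alternating = morans_i([1, 4, 1, 4], chain_weights)  # dissimilar values adjoin
```

The smoothly increasing series yields a positive index, and the alternating series yields a negative one, in line with the interpretation given above.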

#### 2.1.4. Entropy Method

Compared with the analytic hierarchy process, the entropy method is more objective. The weight is determined mainly based on the information provided by index data, and not by whether the data are linear or not. This method can effectively avoid the interference of human factors and has a higher credibility [18]. Zhang et al. [19] and Li et al. [20] used the entropy method to weight the indicators according to the connection degree of each indicator or the amount of information provided, effectively avoiding the subjective factors of the indicator system results. When constructing the indicator system, this study uses the entropy method to measure the natural resource endowment, sports resource endowment, hardware and software services, transportation capacity, people's living standards, industrial support and guidance, economic benefit effects, and market development in various provinces and municipalities in China to accurately analyze various influencing factors. Thus, this study provides the premise and foundation for the influence of the spatial layout of sports tourism resources.

First, the range standardization is performed on the original data of different magnitudes and dimensions. The formula is

$$q_{ij} = \begin{cases} \frac{x_{ij} - \min(x_{ij})}{\max(x_{ij}) - \min(x_{ij})}, & q_{ij} \text{ is a positive index} \\ \frac{\max(x_{ij}) - x_{ij}}{\max(x_{ij}) - \min(x_{ij})}, & q_{ij} \text{ is a negative index} \end{cases} \tag{5}$$

In the above formula, *qij* represents the data value after standardized processing; *xij* represents the original data value, where *i* (*i* = 1, 2, 3, ... , m) is the sequence number of the evaluation index; *j* (*j* = 1, 2, 3, ... , n) is the number of points; and *max xij* and *min xij* are the maximum and minimum values of the corresponding index of the order parameters at the critical point of system stability, respectively.

The weight of the *i*th index of a data set containing m indexes and *n* samples is calculated as

$$W_i = \frac{1 + \frac{1}{\ln n} \sum_{j=1}^{n} \left( \frac{Q_{ij}}{\sum_{j=1}^{n} Q_{ij}} \ln \frac{Q_{ij}}{\sum_{j=1}^{n} Q_{ij}} \right)}{m + \sum_{i=1}^{m} \left( \frac{1}{\ln n} \sum_{j=1}^{n} \frac{Q_{ij}}{\sum_{j=1}^{n} Q_{ij}} \ln \frac{Q_{ij}}{\sum_{j=1}^{n} Q_{ij}} \right)} \tag{6}$$

In the above formula, *Wi* represents the weight of the *i*th index and *Qij* represents the standardized data value. The weighted values are then summed over the indexes to obtain *Uij*, the comprehensive evaluation value of the factors affecting the spatial layout of sports tourism resources. The formula is

$$U_{ij} = \sum_{i=1}^{m} W_i \times Q_{ij} \tag{7}$$
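The three steps of Formulas (5) to (7) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the standardized values of Formula (5) are taken to be the *Qij* of Formula (6), terms with a zero share are treated as contributing zero to the entropy sum, and the comprehensive value is returned as one number per sample *j*. All function names and the raw values are hypothetical.

```python
import math


def standardize(row, positive=True):
    """Range standardization of one index across n samples, Formula (5)."""
    lo, hi = min(row), max(row)
    if hi == lo:
        raise ValueError("an index with no variation cannot be standardized")
    if positive:
        return [(x - lo) / (hi - lo) for x in row]
    return [(hi - x) / (hi - lo) for x in row]


def entropy_weights(q):
    """Weights W_i of m indexes from an m x n standardized matrix, Formula (6)."""
    n = len(q[0])
    numerators = []
    for row in q:
        total = sum(row)
        # Sum of p * ln(p); shares equal to zero are skipped (0 * ln 0 := 0).
        p_ln_p = sum((v / total) * math.log(v / total) for v in row if v > 0)
        numerators.append(1.0 + p_ln_p / math.log(n))
    # The sum of the numerators equals the denominator of Formula (6).
    denominator = sum(numerators)
    return [value / denominator for value in numerators]


def composite_scores(q, w):
    """Weighted sum over the m indexes for every sample j, Formula (7)."""
    return [sum(w[i] * q[i][j] for i in range(len(q))) for j in range(len(q[0]))]


# Hypothetical raw data: 2 indexes (rows) observed for 4 samples (columns).
raw = [[10.0, 20.0, 30.0, 40.0], [5.0, 5.0, 5.0, 50.0]]
q_matrix = [standardize(row, positive=True) for row in raw]
weights = entropy_weights(q_matrix)
scores = composite_scores(q_matrix, weights)
```

In this toy example the second index, whose values are concentrated in a single sample, carries more information in the entropy sense and therefore receives the larger weight.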

#### 2.1.5. Geodetector

Geodetector is a tool used to analyze and detect spatial differentiation by identifying the extent to which a certain factor explains the spatial differentiation of the result variable, thereby revealing the source of its spatial difference [21]. The formula is

$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} \tag{8}$$

In the above formula, *L* represents the number of strata of the variable, that is, its classification or partition; *N<sub>h</sub>* and *N* represent the number of units in layer *h* and the entire area, respectively; *σ<sub>h</sub>*<sup>2</sup> and *σ*<sup>2</sup> represent the variance of the result variable in layer *h* and the entire area, respectively; and *q* represents the magnitude of influence of a given independent variable on the outcome variable, in the range of [0, 1]. The closer *q* is to 1, the greater the explanatory strength of the independent variable on the outcome variable. Conversely, the closer *q* is to 0, the smaller the explanatory strength. This study uses the geographic detector method to identify the factors affecting the spatial distribution of sports tourism resources in China.

The purpose of interaction detection is to assess whether the explanatory power of the spatial differentiation of China's sports tourism resources increases or decreases when two factors are working together. The evaluation method is to judge the direction and type of interaction between factors by comparing the single-factor and two-factor values of *q*, which can generally be divided into five categories [21]: (1) nonlinear weakening, Q < Min(q(X1), q(X2)); (2) single-factor nonlinear weakening, Min(q(X1), q(X2)) < Q < Max(q(X1), q(X2)); (3) two-factor enhancement, Q > Max(q(X1), q(X2)); (4) independent, Q = X; and (5) nonlinear enhancement, Q > X. Here, Q = q(X1 ∩ X2) is the explanatory power of the two factors acting together, X = q(X1) + q(X2), and q(X1) and q(X2) are the explanatory powers of the individual influencing factors of the spatial differentiation of sports tourism resources in China. Both Chi et al. [22] and Zhang et al. [17] used Geodetector to study the similarity between the independent variable and the dependent variable in the spatial distribution to understand whether different influencing factors have an interactive effect on the spatial distribution.
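A minimal Python sketch of the factor detector in Formula (8) and of the five-way interaction classification is shown below. It is not the Geodetector software itself. It assumes population variances, builds the joint stratification by overlaying the two factors' labels, and uses a small numerical tolerance for the "independent" case; the function names, the tolerance, and the eight-unit example are hypothetical.

```python
def population_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def q_statistic(y, strata):
    """Formula (8): q = 1 - sum_h(N_h * var_h) / (N * var)."""
    groups = {}
    for value, label in zip(y, strata):
        groups.setdefault(label, []).append(value)
    within = sum(len(g) * population_variance(g) for g in groups.values())
    return 1.0 - within / (len(y) * population_variance(y))


def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify the joint q value against the single-factor q values."""
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear enhancement"
    if q12 > max(q1, q2):
        return "two-factor enhancement"
    if q12 < min(q1, q2):
        return "nonlinear weakening"
    return "single-factor nonlinear weakening"


# Hypothetical stratifications of eight units by two factors.
x1 = ["a", "a", "a", "a", "b", "b", "b", "b"]
x2 = ["u", "u", "v", "v", "u", "u", "v", "v"]
overlay = list(zip(x1, x2))  # joint strata X1 ∩ X2

# Outcome built additively from both factors.
y_additive = [1, 1, 2, 2, 3, 3, 4, 4]
q1 = q_statistic(y_additive, x1)
q2 = q_statistic(y_additive, x2)
q12 = q_statistic(y_additive, overlay)
additive_label = interaction_type(q1, q2, q12)

# Outcome that depends only on the combination of the two factors.
y_combined = [0, 0, 1, 1, 1, 1, 0, 0]
combined_label = interaction_type(
    q_statistic(y_combined, x1),
    q_statistic(y_combined, x2),
    q_statistic(y_combined, overlay),
)
```

In the first toy outcome the joint explanatory power equals the sum of the two single-factor values, while in the second neither factor explains anything alone but their overlay explains everything.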

#### *2.2. Index Selection*

This study integrates the particularity of China's sports tourism resources in its development and follows the principles of scientific soundness, representativeness, operability, reliability, and availability in the selection of indicators. The structure of factors affecting the spatial distribution of sports tourism resources is described in view of the wuli–shili–renli (WSR) methodology, a system theory with Eastern philosophy. The basic core of its philosophy and concept is to consider not only the aspects of objects, but also their better applications to material aspects when dealing with complex issues [23]. Given that sports tourism contains many complex components of people and things, involving their composition and relationships, we learn from previous studies and apply the WSR to multidimensional analysis [24]. WSR methodology was proposed by Gu and Zhu. It is not only a methodology, but also a framework tool for solving complex problems. The connection between wuli, shili, and renli is the coordination of the relationship between intention, goal, reality, strategy, plan, and conception, which can coordinate the relationship between input, output, and outcome of system practice [25].

The quantity of sports tourism resources is taken as the dependent variable. At the same time, we construct a model that can explore the main influencing factors of the spatial distribution of sports tourism resources in China. Variables are selected from natural resource endowment, sports resource endowment, hardware and software services, transportation capacity, people's living standards, industrial support and guidance, economic benefit effects, and market cultivation and development, as shown in Figure 1. The four factors of natural resource endowment, sports resource endowment, software and hardware services, and transportation capacity provide conditions for the occurrence of sports tourism activities and also restrict the scale and efficiency of internal operations. They are the internal motivation of sports tourism activities and are at the core, which is in line with the wuli (physical) dimension's understanding of the objective world. Therefore, these four factors belong to the wuli dimension. People's living standards and industrial support and guidance are external influencing factors that provide power for the demand market of the sports tourism industry. They are the prerequisite and foundation for the smooth operation of the industry, which is consistent with the shili dimension's response to events. Therefore, the two factors belong to the shili dimension. The two factors of economic benefit effects and market cultivation and development are internal factors. They act on the people's sports tourism practice and open up the sports tourism market, which can promote the improvement and effect of external influencing factors. In line with the renli dimension's understanding of the actual effects of the incident, they belong to the renli dimension.

**Figure 1.** Impact analysis framework of the spatial distribution of sports tourism resources in China.

As shown in Table 1, the choice of variables is built on the following assumptions:



**Table 1.** Index selection of factors affecting sports tourism resources in China.

#### *2.3. Data Sources*

The data on sports tourism resources came from the recommended list of "China Sports Tourism Boutique Projects" (finalists only) announced by the General Administration of Sports. To promote reasonable regional planning of sports tourism and help sports tourism release new economic momentum more efficiently, the Chinese government began to cultivate sports tourism boutique projects in 2013. Sports tourism boutique projects are operational tourist attractions, scenic spots, routes, events, festivals, and other projects that are reported by provinces, autonomous regions, and municipalities directly under the Central Government and selected by expert appraisal teams. They are market-based and centered on the sports needs of tourists. In addition, they are expected to offer tourists a certain degree of participation and viewing value. Statistics from 2013 to 2018 show a total of 755 sports tourism boutique projects. Supplementary data were obtained from the China Statistics Bureau, provincial (city, district) tourism development statistical bulletins, local tourism industry bulletins, and the "China Sports Tourism Boutique Project Development Report". Repeated declarations were screened out. The number of sports tourism boutique projects was 209 in 2014 and 590 in 2018. Sampling points were taken as the venue for boutique events, the coordinates of the visitor center for boutique scenic spots, the government location for boutique destinations, and the starting point for boutique routes.

Considering the consistency of the statistical caliber of relevant indicators involved in sports tourism, data from 31 provinces in China (excluding Hong Kong, Macao, and Taiwan) in 2018 were selected for analysis. The data were mostly derived from the 2018 China Statistical Yearbook, China Tourism Statistical Yearbook, and China Mass Sports Development Report. Several indicators were supplemented by data from local statistical yearbooks, statistical bulletins of the local sports bureaus, and the official website of the Ministry of Finance of China. In addition, the maps of China were all obtained from the Resource and Environmental Science Data Center of the Chinese Academy of Sciences (http://www.resdc.cn/Default.aspx, accessed on 12 May 2021).

#### **3. Results**

#### *3.1. Pattern of Sports Tourism Resources in 2014*

Figure 2 shows that the distribution level of sports tourism resources of the 31 provincial research units in China can be classified as tentative (cumulative ratio = 0%), low (cumulative ratio 0–6%), medium (cumulative ratio 6–44%), or high (cumulative ratio 44–100%). Specifically, Beijing, Hunan, Liaoning, Ningxia, Sichuan, Tianjin, Yunnan, Chongqing, Jilin, and other places do not have shortlisted sports tourism boutique projects in 2013 and 2014 and are considered blank areas. Jiangxi, Tibet, Guangdong, Zhejiang, Shanghai, and Xinjiang account for 6% of the national sports tourism resources, representing areas with low development levels. Guangxi, Heilongjiang, Hainan, Shaanxi, Fujian, Gansu, Henan, Shanxi, Qinghai, Hebei, and other provinces account for 38% of the national sports tourism resources, belonging to the middle-level development area. Shandong, Jiangsu, Inner Mongolia, Hubei, Guizhou, and Anhui account for 56% of the national sports tourism resources, belonging to the high-level development area.

**Figure 2.** Lorenz curve of the distribution of China's sports tourism resources in 2014.
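The grouping by cumulative ratio can be illustrated with a short sketch. It is not the authors' procedure: the province names and counts are hypothetical, and the rule that a province is classed by the cumulative share reached at its position on the curve is an assumption. The default thresholds (6% and 44%) are the 2014 class boundaries reported in the text.

```python
def lorenz_cumulative_shares(counts):
    """Return (name, cumulative share) pairs, ordered from fewest to most projects."""
    total = sum(counts.values())
    running = 0
    shares = []
    for name, count in sorted(counts.items(), key=lambda item: item[1]):
        running += count
        shares.append((name, running / total))
    return shares


def classify(share, low=0.06, medium=0.44):
    """Map a cumulative share to a level; defaults are the 2014 boundaries."""
    if share == 0:
        return "blank"
    if share <= low:
        return "low"
    if share <= medium:
        return "medium"
    return "high"


# Hypothetical counts for four provinces (illustration only).
counts = {"A": 0, "B": 1, "C": 3, "D": 6}
levels = {name: classify(share) for name, share in lorenz_cumulative_shares(counts)}
```

Plotting the cumulative shares against the cumulative share of provinces would give a curve of the kind shown in the figure; the further it bows away from the diagonal, the more uneven the distribution.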

This study used ArcGIS 10.2 software (ESRI, Inc., Redlands, CA, USA) and average nearest neighbor to analyze China's sports tourism resources in 2014. The results are as follows: average observation distance is 62,725.6701 m, expected average distance is 133,750.1520 m, nearest neighbor ratio R is 0.468976, Z is −14.686483, and the significance level is *p* < 0.001, indicating that China's sports tourism resources in 2014 showed a clear agglomeration distribution in space.
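As an illustration of how the nearest neighbor ratio and Z score relate to the two distances, the sketch below uses the standard average nearest neighbor formulas (expected distance 0.5/√(n/A), standard error 0.26136/√(n²/A)). It is a plain-Python approximation under stated assumptions, not the ArcGIS implementation. The function names are hypothetical, and n = 209 is assumed from the number of projects reported for 2014; the study-area size A is not given in the text, so the second function eliminates it algebraically.

```python
import math


def average_nearest_neighbor(points, area):
    """Average nearest neighbor statistics for planar points in a study area."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    d_obs = sum(nearest) / n                      # observed mean distance
    d_exp = 0.5 / math.sqrt(n / area)             # expected under randomness
    std_err = 0.26136 / math.sqrt(n ** 2 / area)
    return d_obs, d_exp, d_obs / d_exp, (d_obs - d_exp) / std_err


def ann_from_summary(d_obs, d_exp, n):
    """Recover R and z from the two mean distances and the point count."""
    ratio = d_obs / d_exp
    # std_err = 0.26136 * sqrt(area) / n and d_exp = 0.5 * sqrt(area / n),
    # so std_err = 0.52272 * d_exp / sqrt(n) and the area cancels out.
    z = (ratio - 1.0) * math.sqrt(n) / 0.52272
    return ratio, z


# Distances reported for 2014; n = 209 is an assumption taken from the data section.
ratio_2014, z_2014 = ann_from_summary(62725.6701, 133750.1520, n=209)
print(round(ratio_2014, 6), round(z_2014, 3))
```

A ratio below 1 with a strongly negative z indicates clustering, which is the reading given in the text.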

The spatial agglomeration characteristics of sports tourism resources are discussed through kernel density mapping. Figure 3 shows that before 2014, China's sports tourism resources had the Yangtze River Delta city cluster as the high-density core area, while the Guizhou–Guangxi border and the western Hubei ecological circle were the secondary density core areas. Moreover, China's sports tourism resources have a distribution trend of "depending on the city, near the scenery", which means forming a central city based on the surrounding scenery. Central diffusion gradually forms an axial zone, which superimposes with the central radiation and coexists to form a network surface [6]. Moreover, the spatial characteristics reveal large concentration and small dispersion; that is, the resources are mainly concentrated in urban agglomerations and areas with high natural resource endowments, with a small amount scattered in areas with poorer transportation access but unique natural resources. The possible reasons are as follows. On the one hand, the sports tourism industry in the Yangtze River Delta urban agglomeration has a good foundation and clear location advantages. On the other hand, its levels of per capita disposable income and per capita consumption expenditure are much higher than those of other regions in the country. As a result, consumption demand is continuously driven and the supply-side structure is continuously optimized. The sports tourism market there is vast and leads the country in terms of development [31]. The Guizhou–Guangxi border area and the western Hubei ecological circle have superior natural environmental conditions, both being karst geomorphic regions. Natural scenery is their development feature, and this precious sports tourism resource makes them ideal places for exciting and entertaining activities such as rock climbing and bungee jumping [32]. Urban agglomerations and areas with high natural resource endowments have attracted a large number of sports tourism resources due to their superior geographical location, sound infrastructure, convenient transportation routes, and industrial policy support and guidance. Zones with poor transportation access but unique natural resources can still attract sports tourism projects because of the distinctive tourist experiences they offer.

**Figure 3.** Distribution of kernel density of China's sports tourism resources in 2014. The color in the figure changes from gray to red: the redder the color, the more clustered the sports tourism projects. See the kernel density index in the legend for details.
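For readers unfamiliar with density surfaces of this kind, the sketch below evaluates a kernel density at a single location. It assumes the quartic kernel form commonly documented for GIS kernel density tools; the text does not state which kernel or search radius was used, so the function name, the radius, and the coordinates are all hypothetical.

```python
import math


def quartic_kernel_density(location, points, radius):
    """Density at `location` from `points` within `radius`, using a quartic kernel."""
    total = 0.0
    for point in points:
        d = math.dist(location, point)
        if d < radius:
            # Points closer to the location contribute more; beyond the radius, nothing.
            total += (3.0 / math.pi) * (1.0 - (d / radius) ** 2) ** 2
    return total / radius ** 2


# Hypothetical projected coordinates in metres (illustration only).
projects = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (50000.0, 50000.0)]
dense = quartic_kernel_density((300.0, 300.0), projects, radius=5000.0)
sparse = quartic_kernel_density((25000.0, 25000.0), projects, radius=5000.0)
```

Evaluating the function on a regular grid and colouring each cell by its value yields a surface in which clusters of projects appear as high-density cores.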

Global spatial autocorrelation analysis was used to further explore the spatial characteristics and aggregation effects of China's sports tourism resources. Using ArcGIS 10.2, the global Moran's I index was calculated to obtain the global autocorrelation of China's sports tourism resources. Global Moran's I is −0.0084, Z(I) is 0.2503, and P(I) is 0.376, illustrating that sports tourism resources at the provincial level do not show a significant spatial agglomeration trend. The amount of sports tourism resources in each province is not related to that in surrounding provinces. Consequently, the effect of "neighboring dependence" has not formed, which is not conducive to sports tourism development in the entire region.
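The statistic itself can be sketched in a few lines. This is a minimal illustration of the standard global Moran's I formula, not the ArcGIS implementation; the weight matrix and values are hypothetical, and the Z score and P value (which need the variance of I) are omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` with spatial weight matrix `weights` (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def expected_i(n):
    """Expected Moran's I under no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Four hypothetical units in a row; neighbours share a border.
w = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
clustered = morans_i([1, 2, 3, 4], w)        # similar values sit side by side
alternating = morans_i([1, -1, 1, -1], w)    # neighbours always differ
```

With 31 provincial units the expected value is −1/30 ≈ −0.033, so an observed I of −0.0084 lies slightly above expectation, which is consistent with the small positive Z(I) reported above.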

#### *3.2. Pattern of Sports Tourism Resources in 2018*

From 2014 to 2018, China's sports tourism boutique projects increased by 381, with an average annual increase of 127. Most provinces show a certain degree of growth. Jiangsu, Qinghai, Yunnan, Gansu, Shanxi, Anhui, Guizhou, and Hubei each have over 30 new projects, showing a rapid growth trend. Comparing the Lorenz curves of the distributions of sports tourism resources by province in the two years (Figures 2 and 4), it can be seen that the regional differences tend to converge. The division of the resource distribution in the 31 provincial research units also changed to tentative (cumulative ratio = 0%), low (cumulative ratio 0–17%), medium (cumulative ratio 17–46%), and high (cumulative ratio 46–100%). In particular, from 2015 to 2018, no sports tourism boutique projects were shortlisted in Beijing, Hunan, Chongqing, and other places, which are temporarily depicted as blank areas. Sichuan, Tibet, Guangdong, Jilin, Hainan, Ningxia, Tianjin, Shaanxi, Shanghai, Liaoning, Jiangxi, Zhejiang, and other provinces account for 17% of the national sports tourism resources and are areas with low development levels. Henan, Shandong, Heilongjiang, Guangxi, Xinjiang, Hebei, Fujian, and Inner Mongolia are regions with medium development levels and account for 29% of the national sports tourism resources. Yunnan, Hubei, Gansu, Shanxi, Jiangsu, Anhui, Guizhou, and Qinghai account for 54% of the national sports tourism resources and are regions with high development levels. Yunnan, Gansu, Shanxi, and Qinghai have risen from the previous blank, medium, and low to high development levels. Xinjiang has risen from a previous low development level to a medium development level, and Liaoning, Ningxia, Sichuan, Tianjin, and Jilin have risen from the previous blank development level to a low level of development. However, Shandong, Hainan, Shaanxi, and Inner Mongolia have been downgraded.

**Figure 4.** Lorenz curve of the distribution of China's sports tourism resources by province in 2018.

ArcGIS 10.2 (ESRI, Inc., Redlands, CA, USA) was applied to analyze the average nearest neighbors of China's sports tourism resources in 2018. Results show that the average observation distance is 33,333.9247 m, expected average distance is 82,961.2671 m, the nearest neighbor ratio R is 0.401801, Z score is −27.797295, and the significance level is *p* < 0.001. Thus, China's sports tourism resources in 2018 show a clear agglomeration distribution in space.

As shown in Figure 5, compared with the results of 2014, the Yangtze River Delta urban agglomeration remains a high-density core area, while the secondary density core areas are now the Yunqian border area of the Karst Plateau, the Qinglong border area of the Qilian Mountains, and the Jinji border area of the Taihang Mountains. The distribution is still characterized by "depending on the city, near the scenery" and "large concentration, small dispersion". The location of the core area of the Yangtze River Delta urban agglomeration shows no significant change in 2018, but the sub-density core area extends to the northwest toward the Qinglong border area of the Qilian Mountains and to the southwest toward the Yunqiangui border area of the Karst Plateau. The core area of the western Hubei ecosphere shifts to the Jinji border area of the Taihang Mountains. The main possible reason is that the Yangtze River Delta urban agglomeration has the highest level of economic development and the highest living standards in China. The sports tourism market and industrial chain in this area are more mature than in other regions. Moreover, the development and investment prospects of the sports tourism market are broad. Numerous ethnic minorities reside near the Yunnan–Guizhou Plateau, and their habitats, geomorphology, and climatic conditions are distinctive. The surrounding environment is beautiful and scenic, thus ushering in an explosive period of sports tourism development [33]. The Qinglong border area of the Qilian Mountains is situated in the golden section of the Silk Road Economic Belt. Rich in geographic, water, and biological resources, cultural relics, folk customs, sports competitions, and other resources, the area is suitable for the development of sports tourism projects. The relevant resources in the core area have gradually changed from "dispersed" to "intensive" [34]. Natural beauty, historical civilization, and revolutionary historical sites together constitute the unique sports tourism resources of the Jinji border area of the Taihang Mountains. Relying on its complex and changeable geology, geomorphology, hydrology, and meteorology, as well as a long history and a deep and ancient sports culture, this area can provide the foundation and guarantee for the development of sports tourism in the Shanxi–Hebei border area of the Taihang Mountains [35].

**Figure 5.** Distribution of kernel density of China's sports tourism resources in 2018. The color in the figure changes from gray to red: the redder the color, the more clustered the sports tourism projects. See the kernel density index in the legend for details.

The global Moran's I index for 2018 was also calculated with ArcGIS 10.2. Global Moran's I is −0.0897, Z(I) is −0.4782, and P(I) is 0.325, demonstrating that no significant spatial aggregation trend occurred for sports tourism resources at the provincial level across China in 2018.

As a further exploration, the amount of sports tourism resources in each province was superimposed with the growth rate. These two factors were divided into six development stages using the coupling and coordination model [36]. Figure 6 shows that the provinces coordinating the development of sports tourism account for 61.29%. Fewer provinces exhibit extreme incoordination, namely Chongqing, Hunan, Beijing, and Sichuan. Several provinces are between basic incoordination (Shandong, Ningxia, Jilin Province, Zhejiang) and primary coordination (Hainan, Shanghai, Liaoning, Hebei). The main reason for the above situation is that the development of sports tourism has attracted much attention in recent years. Both the sports and tourism industries are strongly advocating sports tourism, and China has promulgated various policies that are conducive to this development. In addition, marathon events and sports hardware facilities in scenic spots have been implemented as its foundation. As such, the coordination stage of sports tourism development in the eastern region is far ahead, which is evidently higher than the national average. The northeast, northwest, and parts of the southwest are at the same level as the national average, which is stable and gradually becoming more coordinated. Sichuan, Chongqing, Hunan, Guangdong, Shaanxi, and other places still have a large room for improvement.

**Figure 6.** Level of coordination of sports tourism in various provinces in China. The color in the picture changes from blue to red: the redder the color, the more coordinated the development of sports tourism. See the legend for details.
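The coupling and coordination model is cited but not written out in this section. The sketch below uses a widely used two-subsystem form as an assumption: coupling degree C = 2√(U₁U₂)/(U₁+U₂), comprehensive index T = αU₁ + βU₂, and coordination degree D = √(C·T). The equal weights, the input scores, and the function name are hypothetical, and the six-stage thresholds used by the authors are not reproduced here.

```python
import math


def coupling_coordination(u1, u2, alpha=0.5, beta=0.5):
    """Return (C, T, D) for two normalised subsystem scores in (0, 1]."""
    c = 2.0 * math.sqrt(u1 * u2) / (u1 + u2)  # coupling degree
    t = alpha * u1 + beta * u2                # comprehensive development index
    d = math.sqrt(c * t)                      # coupling coordination degree
    return c, t, d


# Hypothetical normalised scores: resource amount and growth rate of a province.
balanced = coupling_coordination(0.64, 0.64)
unbalanced = coupling_coordination(0.90, 0.10)
```

Under this form, a province whose resource amount and growth rate are both high and similar scores a higher D than one where the two are far apart, which is the intuition behind the coordination levels mapped in the figure.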

#### *3.3. Factors Influencing the Spatial Distribution of Sports Tourism Resources*

#### 3.3.1. Influencing Factors of the Spatial Distribution of Sports Tourism Resources

The geographic detector model was used to explore the essential mechanism of the differences in the spatial distribution of sports tourism resources in China to seek a more scientifically specific optimization path for regional sports tourism development planning. The rapid cluster analysis method in SPSS 24.0 (SPSS Inc., Chicago, IL, USA, 2019) was used to classify the driving factors, such as natural resource endowments, sports resource endowments, software and hardware services, transportation capacity, people's living standards, industry support and guidance, economic benefit effect, and market cultivation and development into five categories from high to low. Then, the geographic detector analysis was carried out to calculate the q value of each driving factor on the spatial distribution of China's sports tourism resources. Table 2 shows the results.

In Table 2, the q value means the extent to which the detection factor explains and affects the spatial distribution of China's sports tourism resources. The larger the q value, the greater the impact of the factor [22]. In general, among the identified eight driving factors, the order of descending impact on the spatial distribution of China's sports tourism resources is as follows: natural resource endowment > sports resource endowment > transportation capacity > industrial support and guidance > market cultivation and development > people's living standards > software and hardware services > economic benefit effect. The principal factors are natural resource endowment, sports resource endowment, transportation capacity, industry support and guidance, and market cultivation and development.
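To make the meaning of the q value concrete, the sketch below computes the factor detector statistic in its usual form, q = 1 − Σ N_h σ_h² / (N σ²), where the strata h are the classes of a discretized factor. It is a minimal illustration rather than the software used in the study; the counts and class labels are hypothetical.

```python
def q_statistic(values, strata):
    """Geodetector q: share of the variance of `values` explained by `strata`."""
    def sum_sq_dev(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)  # equals N * population variance

    groups = {}
    for value, label in zip(values, strata):
        groups.setdefault(label, []).append(value)
    within = sum(sum_sq_dev(group) for group in groups.values())
    return 1.0 - within / sum_sq_dev(values)


# Hypothetical project counts for six provinces and one factor binned into classes.
counts = [2, 3, 10, 12, 30, 28]
factor_class = ["low", "low", "mid", "mid", "high", "high"]
q = q_statistic(counts, factor_class)
```

A q close to 1 means provinces in the same factor class have nearly the same number of projects, so the factor explains most of the spatial variation; a q close to 0 means the classes explain almost nothing.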

**Table 2.** Detection results of factors affecting the spatial distribution of sports tourism resources in China.


Significance level, *p* < 0.05.

The details are as follows:


#### 3.3.2. Analysis of Detection Factor Interaction Results

Interaction detection was used to reveal whether interactive relationships exist among the abovementioned influencing factors. As shown in Table 3, the explanatory power of every two-factor interaction is higher than that of either single factor. The interaction types are nonlinear enhancement and two-factor enhancement. Specifically, the explanatory powers of Living∩Market, Living∩Industrial, Natural∩Sport, Natural∩Living, and Service∩Living rank in the top five of all interactions. The biggest differences in explanatory power before and after the interaction are those of Living∩Market, Service∩Living, and Living∩Industrial. The reasons are as follows. First, the improvement of people's living standards has greatly satisfied their material needs, allowing people to place greater emphasis on the pursuit of spiritual life. Sports tourism can relax tourists and bring physical and mental pleasure, which satisfies people's pursuit of spiritual life. Second, the sports tourism industry has gradually increased in importance in China's national economic development, which can effectively promote progress in related industries. The sustainable development of sports tourism is inseparable from natural resources, sports resources, and software and hardware services. Consequently, the comprehensive interaction of the above factors can significantly affect the spatial distribution of sports tourism resources. For example, located in the core area of tourism, Anhui has certain advantages in natural resources, transportation, economic benefits, and market cultivation and development. With all influencing factors relatively balanced, the level of sports tourism development in Anhui is comparatively high. Guizhou, Qinghai, Gansu, Shanxi, Inner Mongolia, Yunnan, and Fujian are in similar situations.

Nevertheless, Tibet, Ningxia, and Liaoning, which are the cold spots of sports tourism, are relatively backward in natural resources, sports resources, hardware and software services, transportation capacity, people's living standards, economic benefits, and market cultivation and development. In addition, the levels of these influencing factors are spatially uneven. However, these provinces have great potential for improvement, which suggests that the future development of sports tourism should give full play to the advantages of natural and sports resources and be supported by sound planning and layout. Making full use of financial support and relying on the tourism public service platform can gradually improve the sports tourism service system. Strengthening the connection and extension of expressways and ordinary roads in remote areas is encouraged to provide better transportation services for tourism. Furthermore, accelerating the development of the tertiary industry to optimize the province's tourism market structure is an effective way to create a distinctive sports tourism brand, which can lead to economic gains and promote related industries through internal penetration, extension, and expansion of the industry.


**Table 3.** Interaction results and types of detection factors.
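The interaction types named in this section follow from comparing the joint q value of two factors with their single-factor q values. The sketch below encodes the usual geographic detector rules as an illustration; the function name and the numeric inputs are hypothetical and are not taken from Table 3.

```python
import math


def interaction_type(q1, q2, q12):
    """Classify a two-factor interaction from single-factor and joint q values."""
    low, high = min(q1, q2), max(q1, q2)
    if math.isclose(q12, q1 + q2):
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear enhancement"
    if q12 > high:
        return "two-factor enhancement"
    if q12 > low:
        return "single-factor nonlinear weakening"
    return "nonlinear weakening"


# Hypothetical q values (illustration only).
examples = {
    "X1 and X2": interaction_type(0.20, 0.30, 0.60),
    "X1 and X3": interaction_type(0.20, 0.30, 0.40),
}
```

The joint q value is obtained by overlaying the classes of the two factors, so that each combination of classes forms one stratum, and then computing q on the combined strata.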

#### **4. Conclusions**

This study combined the average nearest neighbor, kernel density, and spatial autocorrelation methods to explore the spatial distribution of 209 and 590 sports tourism boutique projects in 2014 and 2018, respectively. Their influencing factors were determined by combining the entropy method and the geographic detector model. The major takeaways from this study are as follows:


#### **5. Discussion**

The analysis above shows that China's sports tourism resources present an obvious agglomeration distribution. In their research, Fugao and Li proposed that the development of regional sports tourism has gone through four stages: point symbiosis at the core node of sports tourism, intermittent symbiosis of sports tourism short chain, continuous symbiosis of sports tourism industry chain, and integrated symbiosis of sports tourism industry network. The stages embody the generation and evolution process of the spatial structure of "point–line–surface–domain" [6]. As seen from the results of this study, the spatial distribution of China's sports tourism resources can indeed reflect the spatial structure of "point–line–surface–domain". The polarization of "spots" is the embryonic stage of sports tourism. Research has found that sports tourism tends to gather in the "spots" that are more developed. The development form is mainly spontaneous, and the spatial form is mainly scattered. The findings of this article are similar to those of Zuo et al. Both the development of sports tourism and the marathon events have emerged in economically developed areas and spread to surrounding areas. For example, they originated from Beijing, Shanghai, and so on, and then spread to Beijing–Tianjin–Hebei region and the Yangtze River Delta.

The analysis also indicates that natural resource endowments and sports resource endowments are the most important factors affecting the spatial distribution of sports tourism in China. It is worth noting that this viewpoint is supported by Kurtzman et al. and Zuo et al. They believe that natural resource endowment is the main source of demand for the development of sports tourism resources. The richer the natural tourism resources, the better the development of sports tourism [26]. The holding of sports events often also promotes the development of the sports tourism industry [7].

As mountain biking, skiing, rock climbing, sailing, swimming, and other sports activities often occur in natural settings such as rivers, mountains, ski slopes, forests, lakes, seashores, hot springs, and grasslands, the natural resources of tourist destinations are the key attraction for tourists [11]. In addition, our findings are similar to those of Kurtzman and Zauhar, who found that sports events can attract a large number of spectators to participate in tourism [37]. In particular, boutique sports events attract a large number of sports enthusiasts every year. Among them, the more influential and representative events include the upcoming Beijing Winter Olympics and Paralympics, as well as a series of international marathons [7]. A large number of studies have shown that transportation capacity restricts the development of tourism. According to our results, the development of sports tourism is also affected by transportation. The completeness of transportation corridors and other carriers can effectively promote the development of sports tourism activities. Yang et al. believe that the pivotal factor for the sustainable development of sports tourism lies in mass transit planning. Traffic problems often have a great negative impact on sports tourism, while sustainable mass transit planning can reduce the risk of traffic problems [38].

Based on the assessment of the spatial distribution pattern and influencing factors of sports tourism resources in China, the following proposals are put forward:

First, the development of sports tourism resources is concerned with local conditions. It is necessary to consider the advantages of local natural resources in combination with the selection of a sports ontology resource as the core attraction. With the help of highlighting unique characteristics, it becomes possible to create fine products and lift the matching degree of local tourism resources and sports tourism. Notably, emphasis on the importance of resources in a region is vital to the development of its sports tourism. Thus, the so-called unique sports events must rely on natural or sports resources with local characteristics. To develop these unique projects, localities must first clarify their own resource advantages. However, the current actual situation is that such natural resources are insufficient without a deep understanding that the core of the sports tourism industry is composed of natural and sports resources. Although the existing abundant sports resources can support the tourism industry for a period of time, creating a unique brand is not sustainable and even difficult. Hence, establishing a concept of "creating a solid basic resource environment" is indispensable. At this time, nature and sports resources may be coordinated to promote the development of the sports tourism industry. A single type of resource should not be overexploited to further develop surrounding sports projects and corresponding industries.

Second, with the opportunity of region-wide sports tourism, the connection and extension of expressways and ordinary roads in remote areas can be improved. This will not only strengthen the construction of tourist passages in border areas, but also elevate the traffic capacity inside and in neighboring provinces. Simultaneously, accelerating the connection of sports characteristic towns and sports tourism may be safely carried out. The construction of high-quality projects and the national sports tourism demonstration base for dedicated roads can expand the radiation range of popular scenic spots and provide better transportation services. In the core areas, consideration of dependence on excellent tourist cities and 5A-level scenic spots can unite sports tourism resources. For evacuation areas, greater attention is needed on the link function of roads, and emphasis should be placed on integrating sports tourism resources in eco-tourism to adapt to the demands of self-driving groups.

Finally, stronger guidance is necessary for local industries to meet the needs of tourists for food, housing, transportation, travel, shopping, entertainment, and other aspects of sports tourism and thereby promote the development of related industries. As a result, more employment opportunities become available, which generates a virtuous circle. Investment should be matched to local living standards and kept within local capabilities. To strengthen cooperation and sharing with the surrounding sports industry market, the government has to increase public financial expenditures for sports and introduce large-scale strategic investors. The enthusiasm of market players should also be fully mobilized. In the meantime, personnel training needs to be strengthened. Routes for mutual promotion, mutual delivery of tourists, information interchange, and complementary features with other types of tourism activities can be formed. Regarding other factors, supporting amenities and the overall service quality of sports tourism should be improved, for example by increasing the construction of public service facilities such as sports service and consulting centers. However, certain literature recommends clarifying tourism's own positioning before building supporting restaurants or hotels. Cultivating a new driving force for economic development by accelerating the development of sports tourism is a better option. Awareness of the high-quality development of sports tourism can even enhance the quality of sports tourism resources and their capacity for sustainable development.

Through the analysis of the spatial distribution characteristics of sports tourism resources, this study reveals the main driving factors that affect the distribution of sports tourism resources on a national scale. Certain practical significance is provided for the scientific and reasonable layout and the appropriate and sustainable development of sports tourism resources. However, this study still has the following limitations: First, being limited by the availability of selected indicators, tourism boutique projects in recent years could not be selected as the research object, which restricts the topic pertinence and timeliness. Second, as a new mode of integration and development of two industries, sports tourism has been the subject of relatively few quantitative explorations of the coordination and coupling in different regions and the local natural environment, politics, economy, population, and other related factors. Therefore, deepening research on these issues in the future is necessary to enhance the pertinence and effectiveness of sports tourism development measures. For instance, the new pattern of sports tourism resources after 2018 or the factors such as topography, climatic conditions, hydrological conditions, and population economy can be examined. In turn, the effects of reasonable and effective layout, construction according to local conditions, and coordination of the matching resources and sports characteristics can be identified.

**Author Contributions:** Conceptualization, Yifan Zuo and Yuqi Si; methodology, Yifan Zuo and Yuqi Si; software, Yifan Zuo; validation, Jincheng Pan; investigation, Jincheng Pan; data curation, Yifan Zuo and Huan Chen; writing—original draft preparation, Yifan Zuo, Huan Chen and Rob Law; writing—review and editing, Yifan Zuo, Huan Chen, Rob Law and Mu Zhang; visualization, Yifan Zuo. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by the National Social Science Fund of China (grant No. 19BTY066).

**Data Availability Statement:** The raw data supporting the conclusions of this manuscript can be made available by the authors to qualified researchers.

**Acknowledgments:** We would like to thank the three anonymous reviewers and the editors for their valuable comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Identifying the Relatedness between Tourism Attractions from Online Reviews with Heterogeneous Information Network Embedding**

**Peiyuan Qiu 1,2, Jialiang Gao 2,3 and Feng Lu 2,3,4,5,\***


**Abstract:** The relatedness between tourism attractions can be used in a variety of tourism applications, such as destination collaboration, commercial marketing, travel recommendations, and so on. Existing studies have identified the relatedness between attractions by measuring their co-occurrence (attractions being mentioned in a text at the same time) extracted from online tourism reviews. However, the implicit semantic information in these reviews, which contributes to modelling the relatedness from a more comprehensive perspective, is ignored due to the difficulty of quantifying the importance of different dimensions of information and fusing them. In this study, we considered both the co-occurrence and images of attractions, introduced a heterogeneous information network (HIN) to reorganize the online reviews representing this information, and then used HIN embedding to comprehensively identify the relatedness between attractions. First, an online review-oriented HIN was designed to organize the different types of elements in the reviews. Second, a topic model was employed to extract the nodes of the HIN from the review texts. Third, an HIN embedding model was used to capture the semantics in the HIN, which comprehensively represents the attractions with low-dimensional vectors. Finally, the relatedness between attractions was identified by calculating the similarity of their vectors. The method was validated with mass tourism reviews from the popular online platform MaFengWo. It is argued that the proposed HIN effectively expresses the semantics of attraction co-occurrences and attraction images in reviews, and the HIN embedding captures the differences in these semantics, which facilitates the identification of the relatedness between attractions.

**Keywords:** relatedness between attractions; online tourism reviews; heterogeneous information network; embedding; attraction image; topic extraction

#### **1. Introduction**

The relatedness between geographic objects captures a broad relation between objects that can be close or far apart in location, can be linked by interaction, or may simply share a common property [1]. Identifying the relatedness between tourism attractions can be used in a variety of tourism applications, such as (1) destination collaboration, e.g., evaluating the connection between attractions and finding the core attractions in a tourist destination [2]; (2) commercial marketing, e.g., testing how changes in links between destinations influence market equilibrium [3]; and (3) travel recommendation, e.g., recognizing the popular tourist areas for tourism route recommendation based on the interactions between attractions [4].

**Citation:** Qiu, P.; Gao, J.; Lu, F. Identifying the Relatedness between Tourism Attractions from Online Reviews with Heterogeneous Information Network Embedding. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 797. https://doi.org/10.3390/ijgi10120797

Academic Editors: Andrea Marchetti, Angelica Lo Duca and Wolfgang Kainz

Received: 28 August 2021 Accepted: 26 November 2021 Published: 29 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In recent years, with the development of ICT (information and communications technology), big data, such as UGC (user-generated content) data, device data, and transaction data, has made great contributions to improving tourism research [5]. In particular, massive travel reviews of tourists are becoming easily accessible through social networks, such as Yelp, TripAdvisor, Booking, and so on. These reviews provide different types of information about visited attractions, visit times, travel notes and basic profiles of tourists, labels, ranks, review texts, and basic attributes of attractions. Intuitively, the relatedness between attractions can be identified by measuring the co-occurrence of attractions in the above information: the higher the frequency of co-occurrence of attractions (namely, the more often the attractions are mentioned together), the stronger the relatedness between them. On the one hand, the co-occurrence of attractions is reflected in the lists of tourists' visited attractions, which can be used to construct an attraction flow network. Then, the relatedness between attractions can be identified with network analytics. The identified relatedness helps to understand tourism movement patterns [6,7], evaluate the market position of different attractions [7,8], and reveal the factors affecting the network structure of tourist flows [9,10]. On the other hand, the co-occurrence of attractions is expressed in review texts or travel note texts. For example, Haris et al. extracted the semantic relationships between tourist places from travel notes using natural language processing (NLP) techniques, then constructed a point of interest (POI) graph to find popular attractions and popular trip patterns consisting of related attractions [11]. Yuan et al. implemented a frequent pattern mining method to identify a city's popular locations by their sequenced co-occurrences in travel blogs, then developed a max-confidence-based method to detect travel routes from the popular location network [12].

In addition to the co-occurrence of attractions, the implicit semantic information in online tourism reviews contributes to modelling the relatedness from a more comprehensive perspective. The attraction image is one such type of information: it is the impression that an attraction leaves on tourists, and it covers different topics, such as the attractions to be seen (e.g., sand and beach), the environment to be perceived (e.g., weather, public hygiene), and experiences to remember (e.g., surfing, swimming) [13]. Thus, the more similar the images of two attractions, the stronger their relatedness. Because the attraction image is described in review texts and travel note texts, a topic model can be used to "understand" and extract the attraction image topics from these texts and divide the images into different semantic dimensions. The topic model is a probabilistic model for uncovering the underlying semantic structure of a document collection based on a hierarchical Bayesian analysis of the original texts [14]. In tourism research, the topic model is used to discover the abstract "topics" in texts [15,16]. Then, the image that tourists hold of an attraction in different dimensions is obtained by fusing the topics related to this attraction, and the relatedness between attractions can be measured. The extracted attraction images facilitate tourism destination analysis [13,17] or personalized tourism recommendation [18–21].

The key to using multi-dimensional semantic information to comprehensively identify relatedness is to quantify the importance of different dimensions of information and fuse them. That is, if two attractions have a higher frequency of co-occurrence, or more similar images, or both, they should have stronger relatedness. Determining the importance of these from massive online travel reviews manually is difficult. Thus, in this paper, we introduce a heterogeneous information network (HIN) to represent the tourism online reviews to characterise the co-occurrence and images of attractions, then comprehensively identify the relatedness between attractions through the HIN embedding technique automatically.

In an HIN, the number of node (or object) types or edge (or relation, link) types is greater than one [22]. Therefore, an HIN can better model real interacting systems that contain multiple types of relationships. For example, a bibliographic information network can be organized as an HIN, which expresses facts such as "one or more authors wrote a paper", "a paper was published in a venue", and "a paper cited one or more papers" [23]. In this HIN, the types of nodes are "author", "paper", and "venue", and the types of edges are "written" (links author and paper), "published in" (links paper and venue), and "citing" (links paper and paper). Compared with a homogeneous information network, this HIN allows the relationships between authors to be characterized with the semantics of research area topics. Moreover, social networks [24–26] and bioinformatic networks [27–29] have been modelled as HINs. HINs have been applied to many tasks, such as clustering, classification, link prediction, ranking, recommendation, information fusion, influence propagation, and so on [30]. The HIN embedding technique characterizes the nodes of an HIN with low-dimensional vectors, i.e., embeddings [24]. Then, the semantic information is embedded in the low-dimensional vector space and the relationships between nodes can be calculated by vector operations.

Taking the HIN's advantage in expressing the different types of semantics between nodes, we utilize it to represent the tourism online reviews and use HIN embedding to comprehensively identify the relatedness between attractions. First, an online review-oriented HIN is designed to form the different types of elements in the reviews. Second, a topic model is employed to extract the nodes of the HIN from the review texts. Third, an HIN embedding model is used to capture the semantics in the HIN and comprehensively represent the attractions with low-dimensional vectors. Finally, we conduct several experiments to verify the effectiveness of the proposed method.

The remainder of this paper is structured as follows. Section 2 proposes the structure of an online review HIN, the construction method, and the embedding method of this HIN. Section 3 conducts a case study using online tourist review data. Section 4 is devoted to discussions, and Section 5 concludes this work.

#### **2. Materials and Methods**

The procedure of identifying the relatedness between tourism attractions from online reviews with HIN embedding is shown in Figure 1. First, an HIN structure is designed to represent the online tourism reviews. Next, the original online reviews are transformed into the form of the proposed HIN through direct extraction and image topic extraction. Then, the attractions in the HIN are embedded into n-dimensional vectors by the HIN embedding technique. Finally, the relatedness between attractions is calculated based on the vector similarity.

**Figure 1.** Flowchart for identifying the relatedness between tourism attractions with HIN embedding.

#### *2.1. Online Review HIN Structure*

In this research, we built an HIN for representing tourists' online reviews. Online reviews record which attractions were visited and the attraction images held by tourists. Specifically, the attraction image in a review is expressed around one or more topics, such as cost, dining, features of the attraction, traffic, and so on. So, the types of nodes in the proposed online review HIN are "attraction", "tourist", "review", and "topic". The types of edges between these nodes are "havingreview" (an attraction has a review), "reviewof" (a review of an attraction), "writing" (a tourist writes a review), "writtenby" (a review is written by a tourist), "hastopic" (a review has a topic) and "topicof" (an image topic of a review). Figure 2 illustrates an example of the online review HIN built from four reviews involving two tourists, three attractions and two topics.

**Figure 2.** Example of an online review HIN.

In this online review HIN, the node path "attraction"→"review"→"tourist"→"review"→"attraction", that is, the edge path "havingreview"→"writtenby"→"writing"→"reviewof", holds the co-occurrence of attractions visited by the same tourists, and the node path "attraction"→"review"→"topic"→"review"→"attraction", that is, the edge path "havingreview"→"hastopic"→"topicof"→"reviewof", holds the relationship between attractions that share the same attraction image topics. Thus, this online review HIN expresses the co-occurrence of attractions and attraction images through the above long hop paths.
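The two edge paths can be traced on a small example. The following is a minimal sketch, assuming a hypothetical toy data set of four reviews (it is not the example of Figure 2); the identifiers `reviews`, `edges`, and `follow` are illustrative names, not part of the original method.

```python
from collections import defaultdict

# Hypothetical toy reviews: (review, attraction, tourist, image topic).
reviews = [
    ("r1", "a1", "u1", "t1"),
    ("r2", "a2", "u1", "t2"),
    ("r3", "a3", "u2", "t1"),
    ("r4", "a2", "u2", "t2"),
]

# Typed, directed edges of the online review HIN:
# (edge type, source node) -> set of target nodes.
edges = defaultdict(set)
for review, attraction, tourist, topic in reviews:
    edges[("havingreview", attraction)].add(review)
    edges[("reviewof", review)].add(attraction)
    edges[("writing", tourist)].add(review)
    edges[("writtenby", review)].add(tourist)
    edges[("hastopic", review)].add(topic)
    edges[("topicof", topic)].add(review)


def follow(start, edge_path):
    """Return all nodes reachable from `start` along the given edge path."""
    frontier = {start}
    for edge_type in edge_path:
        frontier = {t for node in frontier for t in edges[(edge_type, node)]}
    return frontier


CO_OCCURRENCE = ["havingreview", "writtenby", "writing", "reviewof"]
SHARED_TOPIC = ["havingreview", "hastopic", "topicof", "reviewof"]

# Attractions related to "a1" through a common tourist or a common topic.
via_tourist = follow("a1", CO_OCCURRENCE) - {"a1"}
via_topic = follow("a1", SHARED_TOPIC) - {"a1"}
```

In this toy network, "a1" reaches "a2" only through the shared tourist "u1" and reaches "a3" only through the shared topic "t1", which is the distinction the two paths are meant to express.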

#### *2.2. Topic Extraction and HIN Construction*

The key task of constructing the presented online review HIN is extracting the nodes and edges from the reviews. The nodes "attraction" and "review" and their edge "havingreview"/"reviewof" can be directly extracted from the review list of the attraction. For the nodes "tourist" and "review", their edge "writing"/"writtenby" can be directly parsed from the basic information of the review, which contains tourist name, score given to an attraction, time of posting the review, etc. However, the image topic is not provided as basic information by the online review, so the node "topic" and the edge "hastopic"/"topicof" between "review" and "topic" are not directly extracted from the online review. Meanwhile, the image topic can be represented by certain words which make up the review text, so the image topic can be acquired from the review text through topic extraction.

Topic models are widely used for extracting abstract "topics" and hidden semantic structures from vast textual documents. As unsupervised machine learning models, topic models can automatically analyse the documents in a corpus and extract potential topics according to the co-occurrence of words in documents. For example, particular words such as "train", "subway" and "taxi" would co-occur more frequently in a document about the topic "traffic". In this study, we use the Latent Dirichlet Allocation (LDA) model [31], one of the most widely used topic models, to extract the topics of a review from the review text. Given a set of documents, the two main outputs of the LDA model are the probabilities that each document belongs to the different topics and the high-frequency keywords of each topic. Then, the meaning of each topic can be summed up manually from its high-frequency keywords.

However, the original LDA model experiences large performance degradation over short texts due to the lack of word co-occurrence information in each short text [32]. Meanwhile, most of the online tourism review texts are short texts, with a word count of less than 100. Thus, we introduce the word embedding technique to extend the context of online tourism review texts to meet the word count requirement of the original LDA. In word embedding, the words in the corpus are encoded into a continuous low-dimensional semantic vector space, where each word is represented by a fixed-dimensional real-valued vector [33,34]. For instance, the words "France" and "U.S.A" are each represented by a 200-dimensional real-valued vector through word embedding; then, their distance can be calculated in the vector space. If the distance between two words is small, these words have similar or related semantics [35]. For example, the distance between "France" and "U.S.A" (or "France" and "French") is less than the distance between "France" and "Mountain" in the vector space. Thus, words with similar semantics to the original words in a review text can be obtained through a similarity calculation.

The detailed procedure of acquiring the edges "hastopic"/"topicof" between "review" and "topic" from the online reviews through topic extraction is shown in Figure 3.

**Figure 3.** Flowchart for acquiring the edges "hastopic"/"topicof" between nodes "review" and "topic" through topic extraction.

Firstly, the punctuation, stop words, and emojis are removed from the original review text to reduce the interference of this meaningless information on the subsequent processing. The processed texts form a corpus "C1".

Secondly, a TextRank [36] algorithm is conducted to extract the keywords of each review in the corpus "C1" for highlighting the key information in review. The extracted *k* keywords of each review represent this review and form a new corpus "C2".

Thirdly, we use the word embedding model Word2Vec to obtain the low-dimensional semantic vectors of each word in the corpus "C2". Then, the semantic similarity between words can be measured by the cosine similarity as follows:

$$\text{CosSim}(\mathbf{x}, \mathbf{y}) = \cos(\theta) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|} = \frac{\sum_{i=1}^{n} x_{i} y_{i}}{\sqrt{\sum_{i=1}^{n} x_{i}^{2}} \sqrt{\sum_{i=1}^{n} y_{i}^{2}}} \tag{1}$$

where *x* and *y* are the vectors of two words, and *x<sub>i</sub>* and *y<sub>i</sub>* are the components of the vectors *x* and *y*, respectively.

For one word, the semantic similarities between this word and every other word can be measured by Equation (1) and ranked in descending order. Then, a dictionary "D" is built that records the top *l* most similar words of each word. We can use this dictionary to quickly obtain the set of similar words for an input word.
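The construction of the dictionary can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional vectors in `toy_vectors` are made up, whereas real inputs would be the Word2Vec vectors of the corpus "C2", and the function names are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length vectors, as in Equation (1)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


def build_similar_word_dict(vectors, l):
    """Map each word to its l most similar other words, most similar first."""
    dictionary = {}
    for word, vec in vectors.items():
        scored = [
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != word
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        dictionary[word] = [other for _, other in scored[:l]]
    return dictionary


# Made-up vectors for illustration only.
toy_vectors = {
    "train": [0.9, 0.1],
    "subway": [0.8, 0.2],
    "taxi": [0.7, 0.4],
    "beach": [0.1, 0.9],
}
D = build_similar_word_dict(toy_vectors, l=2)
```

The brute-force comparison is quadratic in the vocabulary size; it is used here only because it mirrors the description directly.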

Fourthly, each word of a review in the corpus "C2" obtains *l* semantically similar words from the dictionary "D" as its extended words. The original words in the corpus "C2" and their extended words constitute a new corpus, i.e., the context of the reviews is extended. To avoid the importance of the original words being diluted by their extended words, each original word is repeated *m* times in the new corpus. The final corpus is named "C3".
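The context extension step can be illustrated with a short sketch. The dictionary contents and the function name `extend_review` are hypothetical, and the order in which repeated and extended words are emitted is an assumption (LDA treats documents as bags of words, so the order should not matter).

```python
def extend_review(keywords, similar_words, m):
    """Build the extended version of one review for the corpus "C3".

    Each original keyword is repeated m times and followed by its
    extended words taken from the similar-word dictionary.
    """
    extended = []
    for word in keywords:
        extended.extend([word] * m)
        extended.extend(similar_words.get(word, []))
    return extended


# Hypothetical dictionary "D" with l = 2 extended words per keyword.
D = {"subway": ["train", "taxi"], "beach": ["sand", "sea"]}
c2_review = ["subway", "beach"]
c3_review = extend_review(c2_review, D, m=3)
```

With two keywords, *m* = 3 and *l* = 2, the extended review contains 2 × (3 + 2) = 10 words.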

Fifthly, the corpus "C3" is used to train the LDA model, which yields *n* topics with their high-frequency keywords and the probabilities that each review belongs to the different topics. The meaning of each topic can be summed up manually from its high-frequency keywords. The topic with the highest probability is taken as the image topic of a review. Then, the edges "hastopic"/"topicof" between reviews and topics are constructed from the reviews and their topics.
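The last step, turning per-review topic probabilities into edges, can be sketched as below. The probabilities are invented for illustration and would come from the trained LDA model in practice; the identifiers are hypothetical.

```python
def dominant_topic(topic_probs):
    """Return the index of the topic with the highest probability."""
    return max(range(len(topic_probs)), key=topic_probs.__getitem__)


# Hypothetical LDA output: review id -> probability of each of 3 topics.
review_topic_probs = {
    "r1": [0.1, 0.7, 0.2],
    "r2": [0.5, 0.3, 0.2],
}

hastopic = []  # (review, topic) edges
topicof = []   # (topic, review) edges
for review_id, probs in review_topic_probs.items():
    topic_id = f"t{dominant_topic(probs)}"
    hastopic.append((review_id, topic_id))
    topicof.append((topic_id, review_id))
```

Because only the most probable topic is kept, every review contributes exactly one "hastopic" edge and one "topicof" edge.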

#### *2.3. HIN Embedding and Identifying the Relatedness between Attractions*

In order to achieve good performance in such tasks as clustering, classification, link prediction, recommendation, etc., the HIN embedding technique is proposed to embed the nodes of HIN into low-dimensional vectors, and then the embedded nodes can be input into the advanced machine learning models. In recent years, many HIN embedding models have been proposed, such as Metapath2Vec [37], HIN2Vec [38], HAN [39], HERec [24], and so on. While these models have been used to represent the nodes in HINs of a bibliography (e.g., from DBLP, AMiner), social media platforms (e.g., from an online blog, Flickr, Yelp, Douban), bioinformatics (e.g., from HMDD, aBiofilm), etc., they have not been applied to the HIN of tourism information before.

In this research, we select the HIN embedding model HIN2Vec to embed the online review HIN. The HIN2Vec model captures the semantic information contained in meta-paths (namely the node path or edge path mentioned in Section 2.1) and the whole network structure. Then, the relevant nodes which have semantic relationships are close to each other in the low-dimensional vector space. Compared with other HIN embedding models, the HIN2Vec model automatically constructs meta-paths with a given path length and captures the semantic information in these meta-paths instead of the limited short hop (one-hop or two-hop) meta-paths in other models. Thus, HIN2Vec can capture the semantic information in the long hop meta-paths of the online review HIN mentioned in Section 2.1: "havingreview"→"writtenby"→"writing"→"reviewof" and "havingreview"→"hastopic"→"topicof"→"reviewof".

Specifically, the HIN2Vec model is a neural network model which learns the low-dimensional vectors of nodes and edges in an HIN through a prediction task: given nodes *x* and *y* and an edge *r* as input, the model predicts whether *r* exists between *x* and *y*. The structure of the HIN2Vec model is shown in Figure 4. The input layer accepts the one-hot vectors $\vec{x}$, $\vec{y}$ and $\vec{r}$ of *x*, *y* and *r*. The latent layer transforms $\vec{x}$, $\vec{y}$ and $\vec{r}$ into the latent vectors $W_X \vec{x}$, $W_Y \vec{y}$ and $f_{01}(W_R \vec{r})$ in the *d*-dimensional vector space. Then, a Hadamard function (element-wise product) is used to aggregate these latent vectors and an Identity function is applied for activation. Finally, the output layer uses Summation as the input function and the Sigmoid function for activation to finish the prediction. The goal of the HIN2Vec model is to learn the optimal vectors $W_X \vec{x}$, $W_Y \vec{y}$ and $f_{01}(W_R \vec{r})$ of *x*, *y* and *r* so that the prediction is true if *r* exists between *x* and *y* in the real HIN, and false otherwise.
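The forward pass described above can be written out in a few lines. This is a sketch under assumptions, not the reference implementation: the weight matrices are tiny made-up examples, training is omitted, and the function applied to the edge vector is assumed here to be a sigmoid that maps values into (0, 1), since the text does not define it.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x_idx, y_idx, r_idx, W_X, W_Y, W_R):
    """Probability that edge r exists between nodes x and y.

    Multiplying a one-hot vector by a weight matrix selects one row,
    so the latent vectors are looked up by index.
    """
    latent_x = W_X[x_idx]
    latent_y = W_Y[y_idx]
    latent_r = [sigmoid(v) for v in W_R[r_idx]]  # assumed form of f01
    # Hadamard (element-wise) product, identity activation.
    hadamard = [a * b * c for a, b, c in zip(latent_x, latent_y, latent_r)]
    # Summation followed by sigmoid activation.
    return sigmoid(sum(hadamard))


# Made-up weights: 2 nodes, 1 edge type, d = 2.
W_X = [[1.0, 0.0], [0.0, 1.0]]
W_Y = [[1.0, 0.0], [0.0, 1.0]]
W_R = [[0.0, 0.0]]

p_aligned = predict(0, 0, 0, W_X, W_Y, W_R)
p_orthogonal = predict(0, 1, 0, W_X, W_Y, W_R)
```

Node vectors that agree dimension by dimension push the output above 0.5, while orthogonal vectors leave it at 0.5, which is why related nodes end up close to each other after training.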

The process of identifying the relatedness between attractions through the HIN embedding model HIN2Vec is shown in Figure 5. Each edge in the online review HIN is re-represented as a tuple ⟨*node<sub>i</sub>*, *node<sub>j</sub>*, *edge<sub>k</sub>*⟩ to meet the input format of the model's prediction task and is used to train a HIN2Vec model, where *node<sub>i</sub>* and *node<sub>j</sub>* are the head node and tail node of the edge *edge<sub>k</sub>*. Then, the vectors of "attraction" nodes are extracted from the trained HIN2Vec model. Finally, the relatedness between two attractions can be identified by a variety of vector similarity measurements, such as Euclidean distance, Manhattan distance, or cosine similarity, depending on the application.
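The three measurements named above can be compared on a pair of vectors as follows. The vectors are made up for illustration; real inputs would be the "attraction" vectors extracted from the trained model.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical attraction embeddings.
a1 = [1.0, 0.0]
a3 = [3.0, 4.0]

d_euclid = euclidean_distance(a1, a3)
d_manhattan = manhattan_distance(a1, a3)
s_cosine = cosine_similarity(a1, a3)
```

Note the opposite directions: smaller distances mean stronger relatedness, whereas a larger cosine similarity means stronger relatedness. Cosine similarity also ignores vector length, which the two distances do not.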

**Figure 5.** The process of identifying the relatedness between attractions through HIN2Vec model.

#### **3. Case Study**

In this section, we verify the performance of the proposed method with the mass tourism reviews. Firstly, the tourism review data and the constructed online review HIN are described. Then, three experiments are conducted: (1) visualization of the HIN embedding result, (2) top related attractions finding, and (3) attractions clustering.

#### *3.1. Review Data*

The tourism online review data were collected from the popular tourist-oriented information sharing platform MaFengWo (www.mafengwo.cn/). We selected attractions with reviews from within China (except Hong Kong, Macao, and Taiwan), and the period of reviews was from 2014 to 2018. Moreover, to ensure that each attraction had enough reviews for extracting tourists and topics to build paths to other attractions, attractions with fewer than 20 reviews were filtered out. The final review data to conduct the experiments contained 11,122 attractions, 202,777 tourists, and 1,087,438 reviews. The spatial distribution of attractions is shown in Figure 6.

**Figure 6.** Spatial distribution of attractions in the review data. (Base map is obtained from Map World: http://lbs.tianditu.gov.cn/server/MapService.html).

#### *3.2. Online Review HIN Construction*

#### 3.2.1. Image Topic Extraction

In the attraction image topic extraction, some model parameters were set considering the amount of data and efficiency. First, the original review text was segmented into word sequences using a Chinese word segmentation tool, because Chinese texts do not use spaces or other symbols to separate words. We used the HanLP2 tool (www.hanlp.com/) to segment the tourist reviews. Then, for TextRank, which is also implemented in the HanLP2 tool, the maximum number *k* of keywords extracted was 50. Next, we used the gensim tool (radimrehurek.com/gensim/) to train the Word2Vec and LDA models. For the Word2Vec model, the dimension of the vector was 250, the window was 5, the minimum word frequency was 5, and the skip-gram model [34] was selected. The number *l* of words used to extend the context of reviews was 25, and the number of repetitions *m* of original words (to avoid their importance being diluted by their extended words) was 12. The average word counts of reviews in the corpora "C1", "C2", and "C3" were 21.98, 12.21, and 440.78, respectively. So, the length of the reviews in corpus "C3" with extended context is suitable for the LDA model to extract image topics. The topic number *n* of the LDA model was set to 200 to fully distinguish the semantic differences between potential image topics. All training was conducted on a computer equipped with two 2.20 GHz Intel Xeon CPUs and 128 GB RAM.
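As a rough consistency check on these settings, assume that every keyword in "C2" contributes its *m* repetitions plus at most *l* extended words. Writing $\bar{n}_{\mathrm{C2}}$ and $\bar{n}_{\mathrm{C3}}$ for the average review lengths (notation introduced here for illustration), the average length in "C3" is then bounded by:

$$\bar{n}_{\mathrm{C3}} \le \bar{n}_{\mathrm{C2}} \, (m + l) = 12.21 \times (12 + 25) = 451.77$$

The reported average of 440.78 lies just below this bound. One possible explanation, which the text does not confirm, is that some keywords have fewer than 25 entries in the dictionary "D".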

The probabilities that each document belongs to the different topics and the high-frequency keywords of each topic are the two main outputs of LDA. The extracted 200 image topics can be further divided into 13 categories and 155 sub-categories by manually interpreting the high-frequency keywords of each image topic. The categories and sub-categories are shown in Table 1. Table 2 shows 20 image topics of the above 155 sub-categories and their top 10 high-frequency keywords. These results indicate that the proposed image topic extraction method can capture the semantic differences in the reviews, which will facilitate the relatedness identification between attractions. Finally, the topic with the highest probability is taken as the image topic of a review. It needs to be emphasized that the interpreted categories were used only for understanding the meanings of images and for verifying the validity of the topic extraction results. So, we kept all 200 extracted image topics in the subsequent HIN construction instead of merging the topics that belonged to the same category, so that the HIN2Vec model could capture the slight semantic differences between the topics.

**Table 1.** Image topic categories and sub-categories (the number of extracted topics belonging to each category is given in brackets).



**Table 2.** Examples of image topics and top 10 high-frequency keywords.

Chinese words translated into the same English word are merged and the number of these Chinese words is shown in brackets.

#### 3.2.2. Online Review HIN

The final online review HIN was constructed based on the nodes and edges acquired from the above original review data and image topic extraction result. The online review HIN contains 1,301,537 nodes and 7,017,522 edges. The number of different types of nodes and edges is shown in Table 3.

**Table 3.** Statistics of nodes and edges in the online review HIN.


#### *3.3. Online Review HIN Embedding*

The HIN2Vec model was implemented using the open code of the author (https://github.com/csiesheep/hin2vec). For training a HIN2Vec model, some important parameters were set considering the amount of data and efficiency: the dimension of vectors was 150, the number of negative samples was 5, and the length of random walks was 1000. The length of meta-paths was set to 4 to capture the semantics in the two edge paths "havingreview"→"writtenby"→"writing"→"reviewof" and "havingreview"→"hastopic"→"topicof"→"reviewof" presented in Section 2.1.

Inspired by the work of Liu et al. [40], we used t-SNE to reduce the 150-dimensional HIN2Vec embedding result to two dimensions for visualization on a two-dimensional plane. The results are shown in Figure 7: (a) is the visualization of all nodes, and (b) is that of the nodes except for the "review" nodes. When all nodes are plotted, the node types are mixed together, but once the "review" nodes are removed, the "attraction", "topic", and "tourist" nodes form separate groups. A possible reason is that the "review" nodes are connected with all the other kinds of nodes in the online review HIN, so the HIN2Vec model cannot discriminate the semantic difference between "review" nodes and other kinds of nodes in the embedding process. Nevertheless, the HIN2Vec model captures the semantic differences between attractions, topics, and tourists, which supports the effectiveness of the relatedness identification between attractions.

**Figure 7.** Visualization of the embedding result.

#### *3.4. Top Related Attractions Finding*

This experiment was conducted to find the top related attractions of a given attraction to verify the presented method. The relatedness *rel*\_*hin*(*a<sub>i</sub>*, *a<sub>j</sub>*) between attractions *a<sub>i</sub>* and *a<sub>j</sub>* based on online review HIN embedding was identified by measuring the cosine similarity between the vectors of the attractions, which is a common metric for measuring the similarity between high-dimensional vectors in machine learning.
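Finding the top related attractions then amounts to a ranked nearest-neighbour query. The sketch below assumes hypothetical unit-length vectors, for which the cosine similarity reduces to a dot product; the names `top_related` and `dot` are illustrative.

```python
import heapq


def dot(x, y):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def top_related(target, vectors, similarity, k):
    """Return the k attractions most similar to `target`, best first."""
    candidates = (
        (similarity(vectors[target], vec), name)
        for name, vec in vectors.items()
        if name != target
    )
    return [name for _, name in heapq.nlargest(k, candidates)]


# Made-up unit vectors standing in for attraction embeddings.
vectors = {
    "a1": [1.0, 0.0],
    "a2": [0.8, 0.6],
    "a3": [0.0, 1.0],
    "a4": [0.6, 0.8],
}
related = top_related("a1", vectors, dot, k=2)
```

`heapq.nlargest` avoids sorting all candidates when only the first *k* are needed, which matters when each of several thousand attractions is queried in turn.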

#### 3.4.1. Comparative Relatedness Identification Methods

We used two comparison relatedness identifying methods based on homogeneous co-occurrence attraction network embedding and image topic distribution as the contrasts of the proposed relatedness identification.

#### (1) Relatedness Identification Based on Homogeneous Network Embedding

We built a homogeneous co-occurrence attraction network on the assumption that "a tourist wrote a review of a tourism attraction" means "this tourist has visited this tourism attraction". Thus, if a tourist wrote reviews of different tourism attractions, this tourist has visited all these tourism attractions. That is, these tourism attractions co-occur, which can be used to identify the relatedness between attractions, as mentioned in Section 1. Specifically, each node in the homogeneous co-occurrence attraction network is an attraction. An edge indicates that its two nodes (attractions) have been visited by the same tourists. Moreover, each edge has a weight equal to the number of tourists the two attractions share. A higher edge weight means that the two attractions are visited together more frequently.
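The construction of this weighted network can be sketched as follows, assuming a hypothetical list of (tourist, attraction) review records.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical review records: (tourist, attraction).
reviews = [
    ("u1", "a1"),
    ("u1", "a2"),
    ("u1", "a1"),  # a second review of the same attraction
    ("u2", "a1"),
    ("u2", "a2"),
    ("u2", "a3"),
]

# Attractions visited by each tourist; a set removes repeated reviews.
visited = defaultdict(set)
for tourist, attraction in reviews:
    visited[tourist].add(attraction)

# Undirected weighted edges: weight = number of tourists shared.
edge_weights = Counter()
for attractions in visited.values():
    for a, b in combinations(sorted(attractions), 2):
        edge_weights[(a, b)] += 1
```

Sorting each tourist's attractions before pairing them gives every undirected edge a single canonical key.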

Then, the homogeneous network embedding model LINE (Large-scale Information Network Embedding) was used to characterize the nodes with low-dimensional vectors. The LINE model is suitable for undirected, directed, and/or weighted networks containing millions of nodes [41]. This model (1) captures the first-order proximity between the nodes of the observed links in the network, and (2) explores the second-order proximity between the nodes, which is not measured through the observed links but through the shared neighborhood structures of the nodes. Thus, the LINE model can solve the problem of sparse edges in the large real homogeneous network, which leads to poor performance of node embedding.

The LINE model was implemented using the open code of the author (https://github.com/tangjianpku/LINE). For training a LINE model, some important parameters were set: the vector dimension was 128, the number of negative samples was 5, the total number of training samples was 10,000, the edges were undirected, and the first-order and second-order proximity were both used. Similarly to the result of the HIN2Vec model, the relatedness *rel*\_*line*(*a<sub>i</sub>*, *a<sub>j</sub>*) between *a<sub>i</sub>* and *a<sub>j</sub>* was identified by calculating their cosine similarity in the vector space embedded by the LINE model.

#### (2) Relatedness Identification Based on Image Topic Distribution

An attraction has many different reviews, and each review has an image topic, so an attraction has different image topics, namely the image topic distribution of this attraction. The image topic distribution of an attraction can form a vector for this attraction: the vector dimension is the number of all image topics, and the value of each dimension is the number of the attraction's reviews that belong to the corresponding topic, counted from the result of topic extraction. Thus, the relatedness between two attractions was identified by these vectors: a high relatedness means that the two attractions have similar image topic semantics. The vector dimension was 200, consistent with the topic number of the LDA model. After normalizing the vector of each attraction, the relatedness *rel*\_*topic*(*a<sub>i</sub>*, *a<sub>j</sub>*) between *a<sub>i</sub>* and *a<sub>j</sub>* was measured by the cosine similarity.
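A small sketch of this baseline follows. The text does not state which normalization was applied; L2 normalization is assumed here, in which case the cosine similarity of two normalized vectors is simply their dot product. The topic assignments are invented and use 3 topics instead of 200.

```python
import math
from collections import Counter


def topic_vector(review_topics, n_topics):
    """L2-normalized count of an attraction's reviews per image topic."""
    counts = Counter(review_topics)
    vec = [counts.get(t, 0) for t in range(n_topics)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def rel_topic(u, v):
    """Cosine similarity of two L2-normalized vectors (a dot product)."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical image topic of each review, per attraction.
v1 = topic_vector([0, 0, 1], n_topics=3)  # counts [2, 1, 0]
v2 = topic_vector([0, 1, 1], n_topics=3)  # counts [1, 2, 0]
v3 = topic_vector([2, 2], n_topics=3)     # counts [0, 0, 2]
```

Here `rel_topic(v1, v2)` is (2·1 + 1·2) / 5 = 0.8, while `rel_topic(v1, v3)` is 0 because the two attractions share no image topic.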

#### 3.4.2. Results

For each attraction, its top 1000 related attractions can be obtained by identifying and sorting the *rel*\_*line*, *rel*\_*topic*, and *rel*\_*hin*, which reflect the perspectives of attraction co-occurrence, image topic semantics, and HIN, respectively. Figure 8 shows the spatial distribution of the top 1000 related attractions sorted by *rel*\_*line*, *rel*\_*topic*, and *rel*\_*hin* (abbreviated as *SD*\_*line*, *SD*\_*topic*, and *SD*\_*hin*, respectively) for five attractions: the Palace Museum, Shanghai Disneyland, Qingdao Trestle, Mount Siguniang, and Potala Palace. To show the difference in spatial distribution more clearly, the Kernel Density Estimation (KDE) surface generated from the top related attractions overlays each map. This figure illustrates that, compared with the attractions in *SD*\_*hin* and *SD*\_*topic*, the attractions in *SD*\_*line* were closer to the given attractions. This phenomenon is consistent with the notion that frequent pairwise occurrences of points of interest (POIs) indicate their geographic proximity [11], because the *SD*\_*line* is derived from the co-occurrence attraction network. Meanwhile, compared with the attractions in *SD*\_*hin* and *SD*\_*line*, the attractions in *SD*\_*topic* were more scattered across China (i.e., the high-density surfaces are larger). The reason is that the geographic proximity of attractions' image topics is not significant. For some image topics relating to certain types of natural terrain, however, the spatial distributions of these topics may show some regularity. For instance, Qingdao Trestle is a wharf that stretches into the sea at Qingdao, so most of its attractions in *SD*\_*topic* are located on the coastline of China. Overall, the *SD*\_*hin* is situated between the *SD*\_*line* and *SD*\_*topic*, showing that the proposed method identifies the relatedness between attractions from the perspectives of both attraction co-occurrence and attraction image topic.

**Figure 8.** Spatial distribution of the top 1000 related attractions of the given attractions sorted by the different relatedness identification methods (the cyan points are the given attractions; the blue points are the top related attractions; the "yellow-red" surfaces are the KDE surfaces generated from the top related attractions: "yellow" indicates a low density of attractions, and "red" indicates a high density of attractions).

#### 3.4.3. Efficiency Analysis

We calculated the *rel*\_*line* and *rel*\_*topic* between each attraction and its top 1000 related attractions, sorted by *rel*\_*hin*. The statistical indicators (average, median, first quartile, and third quartile) of *rel*\_*line*, *rel*\_*topic*, and *rel*\_*hin* at each sorting position are shown in Figure 9. The figure illustrates that both *rel*\_*line* and *rel*\_*topic* tended to decrease as *rel*\_*hin* decreased. This result indicates that the HIN2Vec model effectively fuses the information of attraction co-occurrence and image topic semantics to comprehensively identify the relatedness between attractions.

Furthermore, we calculated the distances between each attraction and its top 1000 related attractions, sorted by *rel*\_*line*, *rel*\_*topic*, and *rel*\_*hin*, respectively. The statistical indicators (average, median, first quartile, and third quartile) of the distances at each sorting position are shown in Figure 10. The distances between attractions and their top 1000 related attractions sorted by *rel*\_*topic* are large, and they differ only slightly across sorting positions. This again illustrates that attractions with similar image topics are not necessarily geographically close. In contrast, the distances between attractions and their top 1000 related attractions sorted by *rel*\_*line* and *rel*\_*hin* increased as the relatedness decreased. Specifically, the distances based on *rel*\_*hin* increased faster than the distances based on *rel*\_*line*; e.g., the median distance between each attraction and its 200th related attraction sorted by *rel*\_*line* is 391.04 km, whereas that sorted by *rel*\_*hin* is 907.71 km. These results show that the HIN2Vec model captures image topic similarity in addition to geographic proximity. That is, the HIN embedding lists as related attractions not only the nearby co-occurring attractions, but also attractions that are far away but have similar image topics.
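The ranking and distance statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the toy vectors, the coordinates, the function names, and the use of the haversine great-circle distance are all assumptions, since the text does not state how the distances were computed.

```python
import math
import statistics


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def top_related(name, vectors, k):
    """Names of the k attractions most related to `name`, best first."""
    scores = [(cosine_similarity(vectors[name], vec), other)
              for other, vec in vectors.items() if other != name]
    scores.sort(reverse=True)
    return [other for _, other in scores[:k]]


def median_distance_at_rank(vectors, coords, rank, k):
    """Median distance between each attraction and its rank-th related one."""
    dists = []
    for name in vectors:
        related = top_related(name, vectors, k)
        if rank <= len(related):
            dists.append(haversine_km(coords[name], coords[related[rank - 1]]))
    return statistics.median(dists)


# Hypothetical 2-D embeddings and coordinates, for illustration only.
vectors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.5, 0.5]}
coords = {"A": (39.9, 116.4), "B": (31.2, 121.5),
          "C": (36.1, 120.3), "D": (29.7, 91.1)}

print(top_related("A", vectors, 2))
print(median_distance_at_rank(vectors, coords, rank=1, k=2))
```

In the case study the same idea would be applied with k = 1000 and with the vectors produced by each of the three relatedness measures.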

**Figure 9.** Statistical indicators of *rel*\_*hin*, *rel*\_*line*, and *rel*\_*topic* between each attraction and each of its top 1000 related attractions sorted by *rel*\_*hin* (the *rel*\_*hin* is calculated by cosine similarity).

**Figure 10.** Statistical indicators of distances between each attraction and its top 1000 related attractions sorted by *rel*\_*hin*, *rel*\_*line*, and *rel*\_*topic* (the *rel*\_*hin* is calculated by cosine similarity).

#### *3.5. Attractions Clustering*

The attractions can be grouped using a clustering algorithm based on the vectors from the HIN embedding. In this case study, the Affinity Propagation (AP) clustering algorithm was selected to group the attractions. AP clustering views each data point as a node in a network and recursively transmits real-valued messages along the edges of the network until a good set of exemplars and corresponding clusters emerges [42]. Specifically, the real-valued messages are divided into responsibility and availability. The former is the message sent from data point *i* to candidate clustering centre point *j*, reflecting how suitable point *j* is as the clustering centre of point *i*. The latter is the message sent from candidate clustering centre point *j* to data point *i*, reflecting how appropriate it is for point *i* to select point *j* as its clustering centre. AP clustering determines the clustering centres of all data points by iteratively calculating these two real-valued messages, which completes the clustering. Thus, the number of clusters does not need to be prespecified in the AP clustering algorithm, which suits the lack of prior knowledge for determining the optimal number of attraction clusters. Finally, 11,122 attractions were clustered into 467 clusters.
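The message passing described above can be illustrated with a small, dependency-free Python sketch. It follows the standard responsibility and availability updates of AP with damping; the toy points, the `preference` value (the self-similarity that controls how many exemplars emerge), the damping factor, and the iteration count are assumptions chosen for illustration, as the text does not report the settings used in the case study.

```python
import math


def affinity_propagation(points, preference, damping=0.5, iterations=200):
    """Return, for each point, the index of its exemplar (cluster centre)."""
    n = len(points)
    # Similarity: negative Euclidean distance; the diagonal holds the preference.
    S = [[-math.dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibility r(i, k): point i -> candidate k
    A = [[0.0] * n for _ in range(n)]  # availability a(i, k): candidate k -> point i

    for _ in range(iterations):
        # Responsibility: how well suited k is as the centre of i,
        # relative to the best competing candidate.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: how appropriate it is for i to choose k,
        # given the support k receives from the other points.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # positive responsibilities from points other than k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Two well-separated toy groups; the number of clusters is not specified in advance.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
labels = affinity_propagation(points, preference=-3.0)
print(labels)
```

Note that the number of clusters is an outcome of the message passing rather than an input; only the preference value influences it indirectly.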

Then, we calculated the average relatedness based on the online review HIN (*ave*\_*rel*), the average distance (*ave*\_*dis*), and the standard deviation of distances (*std*\_*dis*) between all attractions in each cluster. A larger *ave*\_*rel* indicates that the attractions in the cluster have stronger relatedness. A larger *ave*\_*dis* indicates that the attractions in the cluster are distributed over a larger area. A larger *std*\_*dis* indicates that the attractions in the cluster are distributed more unevenly in space. Because the similarity between data points in AP clustering is measured by the negative Euclidean distance between vectors, we also used the negative Euclidean distance to identify the relatedness between attractions in this experiment:

$$relatedness(\mathbf{x}, \mathbf{y}) = -dist(\mathbf{x}, \mathbf{y}) = -\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{2}$$

where ***x*** and ***y*** are the vectors of two attractions, *x<sub>i</sub>* and *y<sub>i</sub>* are the components of ***x*** and ***y***, respectively, and *n* is the dimension of the vectors.
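As a worked illustration of Equation (2) and of the three cluster indicators, the following Python sketch computes *ave*\_*rel*, *ave*\_*dis*, and *std*\_*dis* over all pairs of attractions in one cluster. The function names and toy data are hypothetical, the locations are assumed to be planar coordinates in kilometres, and the population standard deviation is used; none of these choices is stated in the text.

```python
import math
import statistics
from itertools import combinations


def relatedness(x, y):
    """Equation (2): negative Euclidean distance between two embedding vectors."""
    return -math.dist(x, y)


def cluster_stats(vectors, coords_km):
    """Pairwise indicators for one cluster (assumed planar coordinates in km)."""
    pairs = list(combinations(range(len(vectors)), 2))
    rel = [relatedness(vectors[i], vectors[j]) for i, j in pairs]
    dis = [math.dist(coords_km[i], coords_km[j]) for i, j in pairs]
    return {
        "ave_rel": statistics.mean(rel),
        "ave_dis": statistics.mean(dis),
        "std_dis": statistics.pstdev(dis),  # population standard deviation (assumption)
    }


# Toy cluster of three attractions.
stats = cluster_stats(
    vectors=[[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]],
    coords_km=[(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)],
)
print(stats)
```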

Figure 11 shows that *ave*\_*dis* and *std*\_*dis* tend to decrease as *ave*\_*rel* increases, although the decrease is not strictly monotonic. This illustrates that attractions that are spatially close and uniformly distributed have a higher probability of being clustered together. That is, in the proposed online review HIN, the HIN2Vec model treats attraction co-occurrence as a factor that may be more important than image topic in determining the semantic relationship between attractions. However, the HIN2Vec model embeds the attractions from the structure of the online review HIN rather than simply combining the co-occurrence relatedness and image topic relatedness between attractions directly. This process may take advantage of additional latent semantic relationships, which is why the trend is not strictly decreasing.

**Figure 11.** Trend of *ave*\_*dis* and *std*\_*dis* between attractions in each cluster as *ave*\_*rel* increases (the *ave*\_*rel* is calculated by negative Euclidean distance).

We used the Jenks natural breaks classification method to further divide the above 467 clusters into five groups based on the *ave*\_*rel* of these clusters. Then, one cluster from each group was chosen to explore the validity of the clustering results. The spatial distributions of the five groups and of the attractions in the five sample clusters are shown in Figure 12. The figure indicates that the attractions in each cluster became more spatially concentrated as *ave*\_*rel* increased. Most attractions in cluster #6 and all attractions in cluster #20 were concentrated in a city (Harbin and Wuhan). Moreover, even if the attractions of a cluster are distributed over a large area, they may have similar image topics; e.g., cluster #441 is about "museum", cluster #307 is about "beach", and cluster #6 is about "historic towns". Meanwhile, the attractions in cluster #20 and cluster #161 are clustered because attractions distributed over a small area have a higher probability of being co-visited by tourists, which results in a co-occurrence relatedness between them that is stronger than their image topic relatedness. Overall, the attractions in different clusters present co-occurrence relatedness or image topic relatedness, which demonstrates that the HIN embedding automatically adjusts the importance of attraction co-occurrence and attraction image in the final relatedness according to the characteristics of real data. The clustering result helps to further discover attraction communities, within which the attractions can establish close cooperation.
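The grouping step can be sketched as follows. This is a minimal dynamic-programming version of one-dimensional natural breaks classification (the Fisher-Jenks criterion, which minimises the within-class sum of squared deviations); the function name, the toy *ave*\_*rel* values, and the use of three classes instead of five are assumptions for illustration, and the implementation used in the case study is not specified in the text.

```python
def jenks_groups(values, n_classes):
    """Split values into n_classes contiguous groups with minimal
    total within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in data:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Backtrack from the last class to recover the groups.
    groups, j = [], n
    for c in range(n_classes, 0, -1):
        i = split[c][j]
        groups.append(data[i:j])
        j = i
    groups.reverse()
    return groups


# Hypothetical ave_rel values of eight clusters (negative Euclidean distances).
ave_rel = [-9.0, -5.0, -1.0, -8.8, -4.8, -1.2, -8.5, -5.1]
print(jenks_groups(ave_rel, 3))
```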

**Figure 12.** Spatial distributions of the attractions in the clusters with different *ave*\_*rel*. The attractions in the left column are the attractions in all clusters with given range of *ave*\_*rel*. The attractions in the right column are the attractions in the clusters sampled from the corresponding left clusters (the *ave*\_*rel* is calculated by negative Euclidean distance).

#### **4. Discussion**

In this study, the HIN2Vec model was used to embed the online review HIN into a low-dimensional vector space, although there are many other HIN embedding models, as mentioned in Section 2.3, such as Metapath2Vec, HAN, and HERec. These models also perform well in representing the nodes in an HIN. We selected the HIN2Vec model in this research for two reasons. (1) As presented in Section 2.3, the HIN2Vec model can construct meta-paths automatically, which avoids manual meta-path design. Although the two edge paths "havingreview"→"writtenby"→"writing"→"reviewof" and "havingreview"→"hastopic"→"topicof"→"reviewof" express the semantics of attraction co-occurrence and attraction image, as explained in Section 2.1, we think the other edge paths can still give clues for the HIN2Vec model to mine the semantic relationships between nodes, even though these paths may not have meanings that are easy for people to understand. (2) No model has demonstrated undisputed performance on HIN embedding, because the above models were verified on different tasks and evaluation metrics with different pre-processing [43]. Overall, the emphasis of this research is to illustrate that the HIN can retain the differences between relationship semantics when the online reviews are reorganized into a network structure, and that the HIN embedding model can capture and fuse these different relationship semantics, which facilitates identifying the relatedness between attractions from a comprehensive perspective.

The proposed relatedness identification between attractions based on online review HIN is a data-driven approach. The HIN2Vec model can automatically capture and fuse heterogeneous semantic information in the online review HIN and give the attraction vectors through fusing all information, without the need to manually set the weights of attraction co-occurrence and attraction image topic. Specifically, the strength of attraction co-occurrence is reflected by the heterogeneous network structure, rather than the weight of edges, as in the traditional network analytics. That is, if two attractions have more reviews written by the same tourists, the HIN2Vec model will ensure these attractions are closer to each other in the embedding vector space, i.e., these attractions have stronger relatedness. Moreover, the HIN2Vec model generates the training data from HIN based on random walk and negative sampling, which overcomes the data-sparsity problem and outputs the effective embedding vectors of attractions that have a few co-occurrences with other attractions or attraction image topics.

The proposed online review HIN has four node types and six edge types. In the future, more information from the tourism online reviews should be introduced into the online review HIN to better identify the relatedness between attractions, such as the type of attraction, the level of attraction, and the residence of the tourist. Nevertheless, attention must be paid to the quality and reliability of the information to avoid introducing noise into the HIN. For instance, the attraction level "National AAAAA level tourism attraction" is labelled as "National 5A level tourism attraction", "AAAAA attraction", "5A level attraction", etc. in Chinese on MaFengWo, because the information in social networks lacks strict inspection and revision. The model will therefore treat these labels as having different semantics if they are not unified. Furthermore, the data size affects embedding efficiency. The training time of the HIN2Vec model exceeded 15 hours on the constructed online review HIN. If the length of the meta-paths was set to 5, the HIN2Vec model would not complete training within five days. Consequently, while HIN embedding showed good performance in identifying the relatedness between attractions, the HIN structure, data size, data quality, and HIN embedding model need to be carefully selected to ensure the training efficiency.

The related attractions of an attraction can be used as the recommendation information when a user browses this attraction online. Meanwhile, the attraction manager can regard the tourists who visited these related attractions as potential customers and take measures to attract these tourists. In addition, the HIN embedding model embeds not only the attractions, but also tourists and image topics in the online review HIN. Thus, the relatedness between tourists can also be identified, which helps to extract tourist profiles, cluster tourists, and recommend related tourists based on fusing the multiple relationship semantics. Furthermore, the attractions that may be of interest to a tourist can be obtained based on the relatedness between tourists and attractions by the operation of their vectors.

#### **5. Conclusions**

Most studies identify the relatedness between attractions through measuring their co-occurrence extracted from online tourism reviews. However, the implicit semantic information in these reviews, which definitely contributes to modelling the relatedness from a more comprehensive perspective, is ignored due to the difficulty of quantifying the importance of different dimensions of information and fusing them. Thus, this paper introduces HIN to reorganize the tourism online reviews for representing the co-occurrence and images of attractions, and then uses HIN embedding to comprehensively identify the relatedness between attractions. First, an online review-oriented HIN was designed to form the different types of elements in the reviews. Second, a topic model was employed to extract the nodes of the HIN from the review texts. Third, an HIN embedding model was used to capture the semantics in the HIN and comprehensively represent the attractions with low-dimensional vectors. The effectiveness of the presented method was validated by three tasks based on the tourist review data from MaFengWo: (1) the visualization illustrates the HIN2Vec model accurately discriminates the attraction, topic, and attraction image types of elements in an online review HIN; (2) the top 1000 related attraction findings show that the presented method comprehensively identifies the relatedness between attractions from the perspectives of both attraction co-occurrence and attraction image; (3) the result of attraction clustering demonstrates the HIN embedding can automatically adjust the importance of attraction co-occurrence and attraction image in final relatedness based on the characteristics of real data. 
These results indicate that the online review HIN can correctly express the semantics of attraction co-occurrences and attraction images in reviews, and the HIN embedding can capture the differences in these semantics, which facilitates identifying the relatedness between attractions from a comprehensive perspective.

This study also has limitations. Firstly, the structure of the proposed online review HIN contained only four node types and six edge types, whereas the tourism online reviews provide more types of information, such as the type of attraction, the level of attraction, and the residence of the tourist, which could help to identify the relatedness by integrating more semantics. Secondly, we only used the HIN2Vec model to verify the effectiveness of the proposed online review HIN and did not compare the effects of different HIN embedding models. Moreover, while the HIN2Vec model can capture the semantic information in long-hop edge paths, its training time increased significantly with the data size. Therefore, in future work, we would like to (1) extend the online review HIN with more types of information; (2) improve the training efficiency in terms of model selection, model optimization, and HIN structure optimization; and (3) apply the proposed relatedness identification to tourism recommendation and tourism analytics.

**Author Contributions:** Conceptualization, Peiyuan Qiu and Feng Lu; methodology, Peiyuan Qiu; validation, Peiyuan Qiu and Jialiang Gao; formal analysis, Peiyuan Qiu; investigation, Peiyuan Qiu and Jialiang Gao; resources, Peiyuan Qiu and Jialiang Gao; data curation, Jialiang Gao and Peiyuan Qiu; writing—original draft preparation, Peiyuan Qiu; writing—review and editing, Feng Lu; supervision, Feng Lu; project administration, Feng Lu; funding acquisition, Feng Lu and Peiyuan Qiu. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China (Grant No. 41631177, Grant No. 42001341); Doctoral Research Fund of Shandong Jianzhu University, grant number X20084Z; and a grant from State Key Laboratory of Resources and Environmental Information System.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** The authors would like to thank the four anonymous reviewers for their valuable suggestions, which significantly improved the quality of this article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Tour-Route-Recommendation Algorithm Based on the Improved AGNES Spatial Clustering and Space-Time Deduction Model**

**Xiao Zhou <sup>1</sup>, Jiangpeng Tian <sup>2,</sup>\* and Mingzhan Su <sup>2</sup>**

**\*** Correspondence: tjp\_study@infu.ac.cn; Tel.: +86-135-9241-9753

**Abstract:** This study designed a tour-route-planning and recommendation algorithm that was based on an improved AGNES spatial clustering and space-time deduction model. First, the improved AGNES tourist attraction spatial clustering algorithm was created. Based on the features and spatial attributes, city tourist attraction clusters were formed, in which the tourist attractions with a high degree of correlation among attributes were gathered into the same cluster. It formed the precondition for searching tourist attractions that would match tourist interests. Using tourist attraction clusters, this study also developed a tourist attraction reachability model that was based on tourist-interest data and geospatial relationships to confirm each tourist attraction's degree of correlation to tourist interests. A dynamic space-time deduction algorithm that was based on travel time and cost allowances was designed in which the transportation mode, time, and costs were set as the key factors. To verify the proposed algorithm, two control algorithms were chosen and tested against the proposed algorithm. Our results showed that the proposed algorithm had better results for tour-route planning under different transportation modes as compared to the controls. The proposed algorithm not only considered time and cost allowances, but it also considered the shortest traveling distance between tourist attractions. Therefore, the tourist attractions and tour routes that were suggested not only met tourist interests, but they also conformed to the constraint conditions and lowered the overall total costs.

**Keywords:** AGNES clustering; tourist attraction clustering; tourist attraction reachability space model; space-time deduction; tour route searching

#### **1. Introduction**

Tourists are the core of tourism activity. A key issue of smart tourism is how to improve tourist satisfaction and provide the best experience. A complete tourism activity cycle includes pre-travel, traveling, and post-travel activities. The pre-travel experience includes itinerary planning, tour-route searching, etc. The travel process itself includes visiting tourist attractions, traveling between locations, etc. The post-travel activity includes the evaluation of and feedback on the tourism experience as a whole. In the whole tourism activity cycle, the pre-travel experience is the most important factor influencing tourists' satisfaction and, therefore, their subjective evaluations of the quality of their experiences. Tourists spend a certain amount of time and money on their experiences. Therefore, devising and suggesting tour routes according to tourists' needs and desires, while minimizing time and cost and maximizing benefit, is key to optimal tour planning.

**Citation:** Zhou, X.; Tian, J.; Su, M. Tour-Route-Recommendation Algorithm Based on the Improved AGNES Spatial Clustering and Space-Time Deduction Model. *ISPRS Int. J. Geo-Inf.* **2022**, *11*, 118. https://doi.org/10.3390/ijgi11020118

Academic Editors: Andrea Marchetti, Angelica Lo Duca and Wolfgang Kainz

Received: 27 December 2021 Accepted: 2 February 2022 Published: 7 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In a tour, the objects of tourism are tourist attractions. Finding the tourist attractions that accurately match tourists' needs is the critical step in planning and recommending tour routes. Tourists' needs vary widely, while the tourist attractions distributed in a city also have different feature attributes and spatial attributes, so each tourist attraction differs considerably in its capacity to meet tourists' needs. The diversity of tourists' needs and tourist attractions' attributes makes the search process complex. Rapidly identifying the groups of tourist attractions of interest according to tourists' needs can improve the accuracy and efficiency of the search algorithm, so an effective method for generating tourist attraction groups is key to finding the right tourist attractions. In data mining, a clustering algorithm can group spatial points. It gathers spatial points with similar attributes into the same group and reduces a large-scale data mining problem to smaller-scale problems within each group, which improves algorithm efficiency. This paper uses a clustering algorithm to generate city tourist attraction clusters in accordance with tourist attraction attributes, which provides the algorithmic basis for searching for matching tourist attractions. There are many kinds of clustering algorithms. This paper uses AGNES as the basic algorithm to set up the clustering model. The agglomerative nesting (AGNES) algorithm is a hierarchical clustering method that operates from bottom to top. It treats the elements of the spatial distribution as the bottom layer and merges them from bottom to top according to a defined criterion. The AGNES algorithm is a single-link method in which a cluster can be represented by any of its elements. Therefore, the degree of correlation between two clusters is determined by the pair of elements, one from each cluster, with the highest degree of correlation. The clustering process begins at the discretely distributed bottom layer, merges the points into clusters, and ends when the preset number of clusters is reached. A traditional AGNES algorithm uses only spatial distance as the criterion.
This paper uses AGNES for the following reasons. First, AGNES is simple and easy to implement. It is a naive clustering algorithm with a concise principle and process. Its starting and ending conditions are well defined, and selecting the starting seed point is simple. During clustering, it only needs to judge the dispersion between the seed point and the non-seed points. Compared with other clustering algorithms, it is more accessible and easier to implement. Second, AGNES has relatively low space complexity and time complexity; it runs quickly and consumes little memory. Third, AGNES is well suited to clustering small-scale datasets. The research objects in this paper are city tourist attractions, which form a typical small-scale dataset, so AGNES is feasible. Fourth, AGNES is flexible and can produce multi-layer clustering structures at different granularities by setting different parameters. It has no strict requirements on the input order of the samples and can cluster synchronously from different points to reduce the convergence time.
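The traditional, distance-only form of AGNES described above can be sketched in a few lines of Python. This is an illustrative single-link version under stated assumptions: Euclidean distance stands in for the degree of correlation, the function name and toy points are hypothetical, and the attribute-based criterion of the improved algorithm proposed in this paper is not reproduced here.

```python
import math


def agnes_single_link(points, n_clusters):
    """Bottom-up clustering: start with one cluster per point and repeatedly
    merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Single link: distance between the closest pair, one point from each cluster.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical attraction locations on a plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
print(agnes_single_link(points, 3))
```

Stopping at a smaller or larger `n_clusters` yields coarser or finer layers of the same hierarchy, which corresponds to the multi-granularity property mentioned above.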

In tourism clustering research, [1] provided a general introduction to clustering methods, including the AGNES clustering algorithm. In [2], the researchers applied hierarchical cluster analysis with the AGNES algorithm to a set of Indonesian tourism sites in and around Malang City, Malang Regency, and Batu City to optimize a search engine that could assist tourists in choosing tourist attractions under certain constraints. In another study [3], the AGNES algorithm was applied to data from the online platform Airbnb to study the collaborative economy of tourism hosts based on their geographic distribution. The city of Guanajuato, Mexico, was selected as the subject city for convenience, and the main tourist attractions were used as parameters for the analysis. In [4], an ontology-based clustering method was used to analyze qualitative factors from a semantic perspective in order to define tourist segments and understand why tourists travelled to a particular destination in the Catalonia region of Spain; the researchers reported better results with this method than with classic clustering algorithms. The ontology-based clustering method proposed there was derived from an extension of the AGNES clustering algorithm. The researchers in [5] designed an original approach to characterize the daily behaviors of tourists by analyzing the sequences of places visited by tourists per day; the geolocation information of tourists on photo-sharing websites was used as the data, and the AGNES clustering algorithm was used to form clusters in the experiment. The study in [6] proposed a point-of-interest (POI) recommendation method to plan tourism routes.

Different clustering methods, including AGNES clustering, have been developed in the design and implementation of tourist route-information recommendation systems based on user POI indices. In [7], the AGNES clustering algorithm was used to identify residents' dependence on public transport, which provides a potential method for choosing the transportation mode for tourists. The researchers in [8] applied semantic clustering to extract tourist preferences, compared the semantics of tourist preferences with tourist attraction attributes, and provided tourist attraction suggestions. The researchers in [9] used the partitioning clustering method to find the nearest tourism destination according to extracted geotagged photograph-location data. The researchers in [10] studied cluster-mapping procedures for tourism regions based on the fuzzy-clustering method, which was shown to increase the identification accuracy of the tourism clusters. The researchers in [11] developed a Bali tourism information system by using web-scraping and clustering methods. The clustering algorithm was used to process the word-text data and output word clusters, and clustering was then performed on the website. The researchers in [12] proposed a tourist-preference clustering method based on tourist facial and background information extracted from photographs, and used it to generate tourist classifications. The researchers in [13] used spatial clustering methods to mine tourist destinations and preferences, in which the regions of tourist attractions for each tourism category were derived by the clustering algorithm. The researchers in [14] used a density-based spatial clustering algorithm to study tourist behavior and extracted the tourism hot-spots related to that behavior.
In [15], the clustering algorithm was used to generate tourist-attraction clusters via network and geographic information system (GIS) analyses, and three tourist-attraction clusters were extracted.

For tour-route algorithms, the researchers in [16] proposed a tour-route-recommendation method using a multiple-criteria tensor model that fuses time–space information. The researchers in [17] combined factors of time and space and used the tourist-attraction photographs posted on a website by previous tourists to set up a tour-route-recommendation model. The researchers in [18] applied a heuristic method for tour-route recommendation based on urban traffic monitoring. The researchers in [19] employed social-network analysis combined with deep-learning theory to develop a tour-route-recommendation model. The researchers in [20] created a tour-route-recommendation model based on Smart Agent technology. In [21], an individualized tour-route-recommendation model based on POI functionality and accessibility was proposed, which took tourist physiological and physical conditions as the important reference criteria. The researchers in [22] suggested an individualized tour-route-recommendation model based on social networks' geographical context cognition, which used social relationships and trust networks among tourists as the important indices. The researchers in [23] developed a tour-route-recommendation model based on improved collaborative filtering technology. The researchers in [24] designed a tour-route-recommendation algorithm based on dynamic clustering to counter the challenge of data scarcity. In [25], a tour-route-recommendation algorithm was designed based on deep-interest label mining and association rule clustering. The researchers in [26] also designed a tour-route-recommendation model based on a collaborative filtering algorithm. The researchers in [27] suggested a tour-route-recommendation model based on geotagging and temporal divisions, whose core principle included user and group ratings as well as time and distance. The researchers in [28] proposed a tour-route-recommendation method based on tourist time–space behavioral constraints, which used temporal and spatial constraints as the important factors. The researchers in [29] proposed a tour-route-recommendation method based on a combined recommendation algorithm including hybrid-interest modelling and a heuristic tour-route-planning algorithm. In [30], an energy-aware clustering method was used for mobile applications; it showed that efficient routing, resource allocation, and energy management can be achieved by clustering mobile devices into local groups. Ref. [31] collected tourists' traveling data from a website, analyzed the tourists' behaviour, and set up a tour-route-recommendation algorithm based on the tourists' mobile data and the mined POIs. Ref. [32] studied the importance of mobile devices and location-based services. Based on big data, such as tourism data, location prediction could be realized, which could be used to study tourists' mobility and trends in traveling behaviour.

According to the literature review, tourism clustering research has predominantly focused on clustering tourist attractions and tourists. As seen in [1–7], clustering algorithms have been used in tourism research for POI extraction, data mining, algorithm modeling, transportation behavior, etc. The other clustering methods in [8–15] indicated that the spatial and attribute data of tourist attractions were the main targets used to generate proper tourism categories, extract tourist preferences, and recommend appropriate tourist destinations. The studies concerning tourist-attraction data extraction and tour-route algorithms in [16–29] focused on three specific aspects. First, they examined the recommendation algorithm itself, including data scarcity and "cold start" issues. Data scarcity means that, in a database, the most valuable and useful data are missing, or the majority of the data are zero. "Cold start" means that, in a recommendation system, newly registered users and newly added products lack historical data, so suitable recommendations can hardly be made for them. Second, they developed improved algorithms based on traditional recommendation methods such as the collaborative filtering algorithm, in which historical data extracted from users and groups with interests similar to those of the current user are identified to customize the recommendations for the current user. Third, they mined historical tourist-interest data to recommend tour routes for current tourists. Common methods used for this process include tourist label, photo, and evaluation data mining. Refs. [30–32] tended to study the big data obtained from social networks on mobile devices and websites. Such big data could be used as the basic data for clustering tourist attractions and tourists, or for studying tourists' mobility and traveling behaviour. Overall, the existing methods focused on improving algorithm performance, exploiting historical data, and solving problems such as "cold start" and data scarcity, but overlooked tourist needs, attraction attributes, the real-world geospatial environment, and tour-route searching, so they have typically yielded fuzzy results that lacked sufficient accuracy.

As indicated above, challenges in tour-route planning remain. First, research on tourist needs and tourist-attraction attributes is insufficient, especially regarding real-world concerns such as time and cost. Second, although tourist-attraction clustering is a precondition for matching tourists' interests, there is no effective and reasonable mechanism for urban tourist-attraction clustering: the clustering criterion is merely the spatial distance, neglecting inner attributes such as tourist attraction classification, popularity, optimal traveling time, and traveling fee. The traditional AGNES clustering algorithm does not fully consider a specific tourist's individualized needs and interests. Third, research on the space-time deduction of the traveling process is insufficient. Space-time deduction means that a tourist's activity along a whole tour route is constrained by time, space, and cost, so the traveling cost evolves dynamically: the more tourist attractions are visited, the more time, traveling distance, traveling fee, etc., are consumed. Time and cost therefore play key roles in recommending tour routes. Fourth, under a fixed time and cost budget, the transportation mode determines how many tourist attractions can be selected and how the tour route is planned, yet existing methods seldom combine mixed transportation modes with tour-route planning.

Therefore, this study designed and tested a tour-route-recommendation algorithm based on an improved AGNES spatial clustering and space-time deduction model, focusing on precise interest-matching, urban tourist-attraction spatial clustering, space-time deduction of the traveling process, and precise tour-route searching based on the transportation mode. Compared with previous studies, the proposed algorithm has the following differences and novelties. First, the AGNES method is not used merely and directly as a clustering tool. In this study, AGNES itself was the research target and content, and an improved AGNES algorithm was developed; it is the precondition for modeling the tour-route algorithm. Second, in developing the improved AGNES clustering algorithm, the tourist attractions' feature attributes were set as the critical parameters of the clustering criterion function, in line with how tourists match attractions to their interests, whereas previous studies had only considered the spatial attributes. Third, unlike the line of research in which location-based social networks are exploited to understand human mobility and behavior by mining check-in patterns, this research was based on the attributes of a city's tourist attractions and one tourist's specific interests. The former studies were performed on tourism big data and tended to mine tourists' movement behaviors to find tour routes of potential interest. The proposed method works in a one-to-one mode: a tourist's interests are set as the specific preconditions for extracting certain tourist attractions, and a path-searching algorithm is then used to find the optimal tour route. Thus, the two approaches differ in their algorithmic mechanisms. Fourth, existing studies on tourist attraction and tour-route recommendation are based on fuzzy recommendation, while the proposed algorithm takes the constraints of the real-world city tourism environment, road conditions, and transportation modes into account, so it can find the globally optimal routes that match the tourists' interests within limited time and space complexity.

Figure 1 shows the research work and the structure of the paper.

**Figure 1.** The research work and the structure of the paper.

#### **2. The Improved AGNES Tourist Attraction Spatial Clustering Model**

The features and spatial attributes of urban tourist attractions can vary widely. The feature attributes are the characteristics that distinguish one tourist attraction from another, such as tourist attraction classification, popularity, optimal traveling time, and traveling fee. The classification labels represent the characteristics of a tourist attraction and determine its category; they are typically mined from feature mapping data. The popularity is the average attraction capacity of a tourist attraction, which is determined from online "big data" sources; for example, "Ctrip", "Fliggy", and "Qunar", among others, provide popularity data for tourist attractions in China. The optimal traveling time and cost stand for the basic time and cost that tourists need to visit a tourist attraction. Each tourist attraction has various feature attributes that are associated with quantified values. The spatial attributes concern the geospatial location and positioning of a tourist attraction, including the discrete features and the indirectly correlated features. The discrete features should be considered independently for all tourist attractions [33]. The indirectly correlated features express that each tourist attraction is connected with the others by urban roads, so tourists can move freely between two tourist attractions. Tourist attraction attributes determine whether tourist attractions are closely or distantly related to each other, which gives them different capacities for satisfying tourist interests. The precondition for selecting the tourist attractions to be visited is to confirm the classifications that meet the tourist's needs and interests. Therefore, the urban tourist attractions should be clustered first.

#### *2.1. The Foundation of Tourist Attraction Attribute Label Matrix Model*

The preconditions for the clustering algorithm were to confirm the tourist attraction attributes and to develop an association model that measures the degree of correlation among the attractions. This degree of correlation is determined by the attractions' features as well as by their spatial factors. Thus, the clustering model should combine the feature attribute factors and the spatial attribute factors [34].

An arbitrary tourist attraction in a tourism city is the tourist attraction element *s*(*i*), which belongs to a certain tourist attraction classification. All the elements *s*(*i*) together form the research range, namely the tourist attraction research domain **S**. The domain **S** contains different types of tourist attractions and can be divided into several classifications.

The inner characteristics of a tourist attraction are its feature attributes, denoted *t*1(*i*1), where *i*1 is the index of the feature attribute. The feature attributes influence the tourist's interest tendency and the intelligent system's search results for tourist attraction clusters and specific tourist attractions. Meanwhile, the tourist-attraction geolocation is the spatial attribute factor *t*2(*i*2), where *i*2 is the index of the factor. The tourist attraction attributes include *m* feature attributes *t*1(*i*1) and *n* spatial attributes *t*2(*i*2), with *i*1 ∈ (0, *m*] ⊂ ℤ+ and *i*2 ∈ (0, *n*] ⊂ ℤ+. Each factor *t*1(*i*1) or *t*2(*i*2) is a feature attribute label or a spatial attribute label of *s*(*i*); collectively, they are the tourist attraction attribute labels.

The feature attribute *t*1(*i*1) includes *u*(*i*1) classifying indices *t*1(*i*1,*j*1), *j*1 ∈ (0, *α*] ⊂ ℤ+, where *j*1 is the index of *t*1(*i*1,*j*1) and *α* is the maximum number of indices in *t*1(*i*1). The 1 × *u*(*i*1) matrix **t**1(*i*1), formed by the *u*(*i*1) items *t*1(*i*1,*j*1) of the factor *t*1(*i*1), determines the *i*1-th feature attribute and the tourists' interest tendency; it is the tourist attraction feature attribute label vector. Likewise, the spatial attribute *t*2(*i*2) includes *u*(*i*2) classifying indices *t*2(*i*2,*j*2), *j*2 ∈ (0, *α*] ⊂ ℤ+, where *j*2 is the index of *t*2(*i*2,*j*2) and *α* is the maximum number of indices in *t*2(*i*2). The 1 × *u*(*i*2) matrix **t**2(*i*2), formed by the *u*(*i*2) items *t*2(*i*2,*j*2) of the factor *t*2(*i*2), determines the *i*2-th spatial attribute; it is the spatial attribute label vector. The classifying index of the spatial attribute is formed to match the attribute label vector and create the attribute matrix. The vectors' ranks satisfy *rank*(**t**1(*i*1)) = *u*(*i*1) and *rank*(**t**2(*i*2)) = *u*(*i*2).

The tourist attraction attribute label matrix **T** is formed by the *m* vectors **t**1(*i*1) and the *n* vectors **t**2(*i*2); it determines the tourist attraction's feature and spatial attributes and influences the tourists' interest tendency. The matrix **T** meets the following conditions. Each matrix row is a vector **t**1(*i*1) or **t**2(*i*2), and each matrix column holds one element of that vector. The matrix contains *m* + *n* rows and *α* columns. Rows 1 to *m* relate to the vectors **t**1(*i*1), *i*1 ∈ (0, *m*] ⊂ ℤ+, and rows *m* + 1 to *m* + *n* relate to the vectors **t**2(*i*2), *i*2 ∈ (0, *n*] ⊂ ℤ+. Each tourist attraction corresponds to one distribution of matrix elements. Equation (1) is the general formula of the matrix **T** and its element distribution.

$$\mathbf{T} = \begin{bmatrix} \mathbf{t}_{1(1)} & \dots & \mathbf{t}_{1(m)} & \mathbf{t}_{2(1)} & \dots & \mathbf{t}_{2(n)} \end{bmatrix}^{\mathrm{T}} = \begin{bmatrix} t_{1(1,1)} & t_{1(1,2)} & t_{1(1,3)} & \dots & 0 & 0 \\ \vdots & & & & & \vdots \\ t_{1(m,1)} & t_{1(m,2)} & \dots & & \dots & t_{1(m,\alpha)} \\ t_{2(1,1)} & t_{2(1,2)} & \dots & 0 & \dots & t_{2(1,\alpha)} \\ \vdots & & & & & \vdots \\ t_{2(m+n,1)} & t_{2(m+n,2)} & \dots & t_{2(m+n,4)} & \dots & 0 \end{bmatrix} \tag{1}$$

The feature attributes *t*1(*i*1) and the spatial attributes *t*2(*i*2) are quantified. The feature attributes include the tourist attraction classification *t*1(1); the popularity degree *t*1(2), denoted *ho*, *ho* ∈ (0, 1) ⊂ ℝ+, representing the users' average evaluation scores on the website; the optimal travel time *t*1(3), denoted *tb*, in hours; and the traveling fee (cost) *t*1(4), denoted *co*, in CNY (¥, yuan). The spatial attributes are the longitude *t*2(1) ∼ *l* and the latitude *t*2(2) ∼ *B* of the tourist attraction [35]. Each attribute factor covers a specific range of data values, which forms the feature attribute label vector **t**1(*i*1) and the spatial attribute label vector **t**2(*i*2). The classification factor is determined by the tourist attraction's inner attributes; it is the critical index for distinguishing different tourist attractions and an important reference for a smart system when selecting a tourist attraction cluster and specific tourist attractions. The popularity degree represents tourists' average preference for a tourist attraction *s*(*i*). The optimal travel time represents the most suitable length of time for tourists to visit *s*(*i*). The traveling fee represents the minimum cost of visiting *s*(*i*), such as the fee for the entrance ticket. The resulting tourist attraction attribute label matrix is **T** = [**t**1(1), ..., **t**1(*m*), **t**2(1), ..., **t**2(*n*)]<sup>T</sup>, in which each label vector includes the specific indices *t*1(*i*1,*j*1) or *t*2(*i*2,*j*2). The indices *t*1(*i*1,*j*1) and *t*2(*i*2,*j*2) are quantified as follows; the classification factor is also quantified into a specific value.

**T**:{*t*1(1): Tourist attraction classification; *t*1(2): popularity degree; *t*1(3): the optimal travel time; *t*1(4): traveling fee; *t*2(1): longitude; *t*2(2): latitude};

**t**1(1): {*t*1(1,1): nature park (1.00); *t*1(1,2): humanistic history (2.00); *t*1(1,3): amusement park (3.00); *t*1(1,4): leisure shopping (4.00); *t*1(1,5): modern science and technology (5.00); *t*1(1,6): artistic aesthetics (6.00)};

**t**1(2): {*t*1(2,1): *ho* ∈ (0, 0.25]; *t*1(2,2): *ho* ∈ (0.25, 0.50]; *t*1(2,3): *ho* ∈ (0.50, 0.75]; *t*1(2,4): *ho* ∈ (0.75, 1.00)}, *t*1(2,*j*1) ∈ ℝ+;

**t**1(3): {*t*1(3,1): *tb* ∈ (0, 2.0]; *t*1(3,2): *tb* ∈ (2.0, 4.0]; *t*1(3,3): *tb* ∈ (4.0, 6.0]; *t*1(3,4): *tb* ∈ (6.0, +∞)}, *t*1(3,*j*1) ∈ ℝ+;

**t**1(4): {*t*1(4,1): *co* ∈ (0, 100]; *t*1(4,2): *co* ∈ (100, 200]; *t*1(4,3): *co* ∈ (200, 300]; *t*1(4,4): *co* ∈ (300, +∞)}, *t*1(4,*j*1) ∈ ℝ+.
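The interval-based quantification above can be illustrated with a short Python sketch. It only assumes the right-closed intervals listed for *ho*, *tb*, and *co*; the names `BIN_EDGES` and `label_index` are hypothetical and are not part of the original model.

```python
import bisect

# Upper bounds of the right-closed intervals (lo, hi] listed in the text.
# The last interval of each attribute has no finite upper bound here
# (popularity is assumed to stay below 1.00 and is not checked).
BIN_EDGES = {
    "popularity": [0.25, 0.50, 0.75],    # ho, attribute t1(2)
    "travel_time": [2.0, 4.0, 6.0],      # tb in hours, attribute t1(3)
    "fee": [100.0, 200.0, 300.0],        # co in CNY, attribute t1(4)
}


def label_index(attribute, value):
    """Return the 1-based index j1 of the interval that contains value."""
    if value <= 0:
        raise ValueError("attribute values are assumed to be positive")
    # bisect_left assigns a value equal to an edge to the lower interval,
    # which matches the right-closed intervals (lo, hi].
    return bisect.bisect_left(BIN_EDGES[attribute], value) + 1
```

For example, a fee of 150 CNY falls into (100, 200] and is mapped to *j*1 = 2, while a popularity of exactly 0.25 stays in the first interval.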

When all the feature attribute label vectors **t**1(*i*1) for all the elements *s*(*i*) in domain **S** are confirmed, the correction parameter for each vector **t**1(*i*1) is then defined to normalize all the values.

The impact of each feature attribute label vector on the calculated degree of correlation between tourist attractions should be of the same order of magnitude. Thus, the feature attribute label vector normalization parameter *δ*1(*i*1) is introduced, and all the labels are normalized to the range (0, 1]. According to the range of the vector **t**1(*i*1), each normalization parameter *δ*1(*i*1) is set as follows:

$$\delta\_{1(1)} = 0.100, \delta\_{1(2)} = 1.000, \delta\_{1(3)} = 0.100, \delta\_{1(4)} = 0.001.$$

The parameter *δ*1(*i*1) is used to normalize each vector **t**1(*i*1) in the matrix **T** to obtain a new normalized matrix **T***δ*. Compared with the matrix **T**, all the elements of the matrix **T***δ* are normalized except for those of the vectors **t**2(*i*2). Equation (2) is the general formula for the matrix **T***δ*.

$$\mathbf{T}_{\delta} = \begin{bmatrix} \delta_{1(1)} \cdot \mathbf{t}_{1(1)} & \dots & \delta_{1(m)} \cdot \mathbf{t}_{1(m)} & \mathbf{t}_{2(1)} & \dots & \mathbf{t}_{2(n)} \end{bmatrix}^{\mathrm{T}} = \begin{bmatrix} \delta_{1(1)} \cdot t_{1(1,1)} & \delta_{1(1)} \cdot t_{1(1,2)} & \delta_{1(1)} \cdot t_{1(1,3)} & \dots & 0 & 0 \\ \vdots & & & & & \vdots \\ \delta_{1(m)} \cdot t_{1(m,1)} & \delta_{1(m)} \cdot t_{1(m,2)} & \dots & & \dots & \delta_{1(m)} \cdot t_{1(m,\alpha)} \\ t_{2(1,1)} & t_{2(1,2)} & \dots & 0 & \dots & t_{2(1,\alpha)} \\ \vdots & & & & & \vdots \\ t_{2(m+n,1)} & t_{2(m+n,2)} & \dots & t_{2(m+n,4)} & \dots & 0 \end{bmatrix} \tag{2}$$
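The normalization in Equation (2) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes that each row of **T** holds a single non-zero value, so an attraction can be stored compactly as the list [classification code, *ho*, *tb*, *co*] plus its longitude and latitude. The function name `normalize_attraction` is hypothetical.

```python
# Normalization parameters delta_1(1) .. delta_1(4) as given in the text.
DELTA = (0.100, 1.000, 0.100, 0.001)


def normalize_attraction(features, lon, lat):
    """Return the non-zero entries of T_delta for one attraction.

    features: [classification code, popularity ho, travel time tb (h),
               fee co (CNY)], one value per feature attribute.
    The spatial attributes (lon, lat) are passed through unscaled,
    as in Equation (2).
    """
    if len(features) != len(DELTA):
        raise ValueError("expected one value per feature attribute")
    scaled = [d * f for d, f in zip(DELTA, features)]
    return scaled + [lon, lat]
```

With these parameters, a "humanistic history" attraction (code 2.00) with popularity 0.8, a 3 h visit, and a 150 CNY fee is scaled to approximately [0.2, 0.8, 0.3, 0.15], so all four features are of the same order of magnitude.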

Based on the tourist attraction attribute label matrix **T** and the normalized matrix **T***δ*, the tourist attraction research domain **S** clustering algorithm is created.

#### *2.2. The Tourist Attraction Domain Clustering Algorithm Based on the Improved AGNES Algorithm*

The aim of the tourist attraction domain clustering was to obtain clusters such that tourist attractions in the same cluster have a high degree of correlation among their attributes while those in different clusters have a low degree of correlation, and ultimately to guide the smart system in precisely matching tourist interests. The clustering process was an automatic, data-driven process, and the clustering criteria could differ according to the clustering target. When a spatial dot is only a location point in a coordinate system, a traditional clustering algorithm takes the spatial distance as its sole criterion. Tourist attractions have both spatial attributes and feature attributes, and thus the criteria for tourist-attraction clustering should combine both factors.

The *k* elements *s*(*i*) in the domain **S**, *k* ∈ ℕ, are clustered by the clustering algorithm: tourist attractions *s*(*i*) that have a high degree of correlation among their attributes are placed in the same cluster *S*(*i*), while tourist attractions *s*(*i*) and ¬*s*(*i*) that have a low degree of correlation are placed in different clusters *S*(*i*) and ¬*S*(*i*). A cluster's element is denoted *s*(*i*,*j*), where *i* is the index of the cluster *S*(*i*) and *j* is the index of the element within the cluster. Suppose that the clustering algorithm forms *p* clusters, *p* ∈ ℕ and *p* << *k*, and that the cluster *S*(*i*) contains *k*(*i*) elements *s*(*i*,*j*), *j* ∈ (0, *k*(*i*)] ⊂ ℤ+, so that ∑ *k*(*i*) = *k*, summed over *i* = 1, ..., *p*. Every cluster *S*(*i*) contains at least one element, and every element *s*(*i*) in the domain **S** belongs to exactly one cluster. Two different clusters *S*(*i*) and ¬*S*(*i*) have no elements in common, although, from the viewpoint of spatial analysis, their buffers may overlap in the city space. The union of all the clusters *S*(*i*), *i* ∈ (0, *p*] ⊂ ℕ, is the domain **S**. The domain **S** contains at least two clusters, that is, *p* ≥ 2.

Whether the tourist attraction element *s*(*i*) should be absorbed into the cluster *S*(*i*) is determined by the objective function *ξ*(*s*(*i*1),*s*(*i*2)) between *s*(*i*) and the other tourist attractions. The function is determined by several clustering factors, including the feature attribute factors *t*1(*i*1) and the spatial attribute factors *t*2(*i*2). For two independent tourist attractions *s*(*i*1) and *s*(*i*2), the degree of correlation includes their geospatial relationship and the correlation of their feature attributes, and thus their neighborhood relationship is determined jointly by the two factors. Accordingly, the matrix **T** and the matrix **T***δ* both contain the factors classification *t*1(1), popularity degree *t*1(2), optimal travel time *t*1(3), and traveling cost *t*1(4), as well as longitude and latitude (*t*2(1), *t*2(2)) ∼ (*l*, *B*). The improved Minkowski distance is applied as the objective function, so that the clustering criterion considers feature and spatial attributes simultaneously. The pseudo-code of the process to create the function *ξ*(*s*(*i*1),*s*(*i*2)) (Algorithm 1) is shown as follows.

**Algorithm 1:** The process to create the function *ξ*(*s*(*i*1),*s*(*i*<sup>2</sup>))

1: **Step 1:** Confirm **T***δ*(*i*) for *s*(*i*) in **S**.


The Minkowski distance between the two samples **x** and **y** is shown in Equation (3). The Minkowski distance is used to define the objective function *ξ*(*s*(*i*1),*s*(*i*2)), shown as Equations (4) and (5). According to the function *ξ*(*s*(*i*1),*s*(*i*2)), the norm value of the function is used to judge whether the tourist attractions *s*(*i*1) and *s*(*i*2) belong to the same cluster. Therefore, the function *ξ*(*s*(*i*1),*s*(*i*<sup>2</sup>)) value is set as the clustering criterion.

$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_{r} = \left[\sum_{i=1}^{n} \left| x_{(i)} - y_{(i)} \right|^{r} \right]^{1/r} \tag{3}$$

$$\xi_{\left(s_{(i1)}, s_{(i2)}\right)} = \left\| \mathbf{T}_{\delta(i1)}^{\mathrm{T}} - \mathbf{T}_{\delta(i2)}^{\mathrm{T}} \right\|_{2} \tag{4}$$

$$\xi_{\left(s_{(i1)}, s_{(i2)}\right)} = \left[\sum_{i=1}^{m} \left| \delta_{1(i)} \cdot t_{1,i1(i,j)} - \delta_{1(i)} \cdot t_{1,i2(i,j)} \right|^{2} + \sum_{i=1}^{n} \left| t_{2,i1(i,j)} - t_{2,i2(i,j)} \right|^{2} \right]^{1/2} \tag{5}$$

Here, *t*1,*i*1(*i*,*j*) and *t*1,*i*2(*i*,*j*) denote the non-zero element of the *i*-th feature attribute label vector of *s*(*i*1) and *s*(*i*2), respectively, and *t*2,*i*1(*i*,*j*) and *t*2,*i*2(*i*,*j*) denote the corresponding elements of the spatial attribute label vectors.
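As a rough illustration of Equations (3)–(5), the objective function can be written as a Euclidean (*r* = 2) distance over the scaled feature attributes and the raw coordinates. The sketch below assumes the compact representation [classification code, *ho*, *tb*, *co*, longitude, latitude] for each attraction and the normalization parameters given earlier; the function name `xi` is chosen here for illustration only.

```python
import math

# Normalization parameters delta_1(1) .. delta_1(4) as given in the text.
DELTA = (0.100, 1.000, 0.100, 0.001)


def xi(a, b, delta=DELTA):
    """Combined feature/spatial distance between two attractions.

    a, b: sequences [class code, popularity, time (h), fee (CNY), lon, lat].
    The first len(delta) entries are scaled by delta before differencing;
    the remaining (spatial) entries are differenced unscaled.
    """
    m = len(delta)
    feature_part = sum(
        (d * (x - y)) ** 2 for d, x, y in zip(delta, a[:m], b[:m])
    )
    spatial_part = sum((x - y) ** 2 for x, y in zip(a[m:], b[m:]))
    return math.sqrt(feature_part + spatial_part)
```

Two attractions with identical features and identical coordinates have *ξ* = 0; the value grows with differences in either the feature attributes or the location, which is what allows it to serve as a single clustering criterion.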

In the process of generating clusters, the clustering algorithm dynamically stores the *k* elements *s*(*i*) in a matrix **K**∧(*p* × max *k*(*i*)) in cluster-code order. Each row of the matrix dynamically stores the elements of the related cluster. When the clustering algorithm ends, all the tourist attraction elements are stored in the matrix **K**(*p* × max *k*(*i*)) according to the cluster code *i* and the element code *j* within the cluster. The matrix has *p* rows and max *k*(*i*) columns; *k* of its entries store tourist attractions, while the other *p* × max *k*(*i*) − *k* entries are stored as 0. The row ranks satisfy *rank*(**K**∧(*p*•)) ≤ *p* and *rank*(**K**(*p*•)) ≤ *p*. The column ranks satisfy *rank*(**K**∧(•max *k*(*i*))) ≤ max *k*(*i*) and *rank*(**K**(•max *k*(*i*))) ≤ max *k*(*i*). The matrix **K**(*p* × max *k*(*i*)) has at least two non-empty rows. Equations (6) and (7) give the matrices **K**∧(*p* × max *k*(*i*)) and **K**(*p* × max *k*(*i*)); in the formulas, *s*(*i*)∧ represents an element with a random storage value.

$$\mathbf{K}^{\wedge}(p \times \max k_{(i)}) = \begin{bmatrix} s_{(1)}^{\wedge} & s_{(2)}^{\wedge} & \dots & 0 & s_{(i1)}^{\wedge} \\ s_{(i1+1)}^{\wedge} & \dots & 0 & \dots & s_{(i2)}^{\wedge} \\ 0 & \dots & s_{(i)}^{\wedge} & \dots & 0 \\ s_{(i3)}^{\wedge} & \dots & 0 & \dots & s_{(k)}^{\wedge} \end{bmatrix} \tag{6}$$

$$\mathbf{K}(p \times \max k_{(i)}) = \begin{bmatrix} s_{(1,1)} & s_{(1,2)} & \dots & \dots & s_{(1,\max k_{(1)})} \\ s_{(2,1)} & \dots & s_{(2,k_{(2)})} & \dots & s_{(2,\max k_{(2)})} \\ \dots & \dots & s_{(i,j)} & \dots & 0 \\ s_{(p,1)} & \dots & 0 & s_{(p,\max k_{(p)})} & 0 \end{bmatrix} \tag{7}$$
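The zero-padded storage described for the matrix **K** can be sketched as a list of lists in Python. The sketch assumes that attractions are identified by positive integers, so that 0 can be used unambiguously as padding; the function name `build_cluster_matrix` is hypothetical.

```python
def build_cluster_matrix(clusters):
    """Store p clusters row by row in a p x max k(i) matrix, padded with 0.

    clusters: list of p non-empty lists of positive attraction identifiers.
    """
    if not clusters or any(len(c) == 0 for c in clusters):
        raise ValueError("every cluster must contain at least one element")
    width = max(len(c) for c in clusters)  # max k(i)
    return [list(c) + [0] * (width - len(c)) for c in clusters]
```

For three clusters with 3, 1, and 2 elements (*k* = 6), the resulting 3 × 3 matrix contains *p* × max *k*(*i*) − *k* = 3 zero entries, in line with the description above.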

The tourist attraction clustering objective function *ξ*(*s*(*i*1),*s*(*i*2)) is set as the criterion of the improved AGNES clustering algorithm. The *k* elements *s*(*i*) in the domain **S** are clustered into *p* clusters and stored in the matrix **K**(*p* × max *k*(*i*)).

In the improved AGNES clustering algorithm, in a single instance of bottom-up point gathering, a seed point element *s*(*i*)∗ is chosen as the tourist attraction representing a certain cluster *S*(*i*). The element *s*(*i*)∗ is taken as the reference for calculating and judging the objective function *ξ*(*s*(*i*1),*s*(*i*2)) in order to determine the next element to be gathered into the cluster. Tourist attractions that are not the seed point are denoted ¬*s*(*i*)∗.

In one instance of bottom-up gathering, if a point ¬*s*(*i*)∗ belongs to the cluster *S*(*i*), it is absorbed into the cluster and the edge *l*(*s*(*i*), ¬*s*(*i*)) connecting *s*(*i*)∗ and ¬*s*(*i*)∗ is generated. When the clustering algorithm ends, the *k*(*i*) tourist attractions and the *k*(*i*) − 1 gathered topological edges *l*(*s*(*i*), ¬*s*(*i*)) in the cluster *S*(*i*) form a cluster structure tree *Tr*(*S*(*i*)). The spatial range spanned by the tree *Tr*(*S*(*i*)) forms the cluster spatial buffer *ra*(*S*(*i*)). The edge *l*(*s*(*i*), ¬*s*(*i*)) and the tree *Tr*(*S*(*i*)) visualize the process of the improved AGNES algorithm, and the buffer *ra*(*S*(*i*)) visualizes the range of each cluster. Since the objective function *ξ*(*s*(*i*1),*s*(*i*2)) contains both feature attributes and spatial attributes, different buffers *ra*(*S*(*i*)) may intersect. Figure 2 shows the spatial relationship among the cluster *S*(*i*) topological edge *l*(*s*(*i*), ¬*s*(*i*)), the cluster structure tree *Tr*(*S*(*i*)), and the cluster spatial buffer *ra*(*S*(*i*)). Figure 2a is an edge *l*(*s*(*i*), ¬*s*(*i*)), Figure 2b is the tree *Tr*(*S*(*i*)) formed by several edges *l*(*s*(*i*), ¬*s*(*i*)), and Figure 2c is the buffer *ra*(*S*(*i*)) formed by the cluster structure tree *Tr*(*S*(*i*)).

**Figure 2.** The spatial relationship among the cluster *S*(*i*) topological edge *l*(*s*(*i*), ¬*s*(*i*)), cluster structure tree *Tr*(*S*(*i*)), and the cluster spatial buffer *ra*(*S*(*i*)). (**a**) is an edge *l*(*s*(*i*), ¬*s*(*i*)), (**b**) is the tree *Tr*(*S*(*i*)) that is formed by several edges *l*(*s*(*i*), ¬*s*(*i*)), and (**c**) is the buffer *ra*(*S*(*i*)) that is formed by the cluster structure tree *Tr*(*S*(*i*)).
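The bottom-up gathering and the resulting structure trees can be illustrated with a generic agglomerative (AGNES-style) sketch in Python. It is not the authors' improved algorithm: it assumes plain single-linkage merging under the combined distance *ξ* and a user-chosen number of clusters *p*, and it records the pair of attractions that triggers each merge as a topological edge. The names `xi` and `agnes` and the compact attraction representation [classification code, *ho*, *tb*, *co*, longitude, latitude] are assumptions made for illustration.

```python
import math

DELTA = (0.100, 1.000, 0.100, 0.001)  # normalization parameters from the text


def xi(a, b, delta=DELTA):
    """Combined distance: scaled feature attributes plus raw coordinates."""
    m = len(delta)
    feat = sum((d * (x - y)) ** 2 for d, x, y in zip(delta, a[:m], b[:m]))
    spat = sum((x - y) ** 2 for x, y in zip(a[m:], b[m:]))
    return math.sqrt(feat + spat)


def agnes(points, p):
    """Merge the two closest clusters (single linkage) until p remain.

    Returns (clusters, edges): clusters is a list of lists of point
    indices; edges lists the (i, j) pair of points behind each merge.
    """
    if not 1 <= p <= len(points):
        raise ValueError("p must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]
    edges = []
    while len(clusters) > p:
        best = None  # (distance, cluster a, cluster b, point i, point j)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                for i in clusters[a]:
                    for j in clusters[b]:
                        d = xi(points[i], points[j])
                        if best is None or d < best[0]:
                            best = (d, a, b, i, j)
        _, a, b, i, j = best
        edges.append((i, j))             # topological edge of this merge
        clusters[a].extend(clusters[b])  # absorb cluster b into cluster a
        del clusters[b]                  # b > a, so index a stays valid
    return clusters, edges
```

Because every merge adds exactly one edge, the edges inside a final cluster with *k*(*i*) elements form a tree with *k*(*i*) − 1 edges, which mirrors the cluster structure tree *Tr*(*S*(*i*)) described above. The exhaustive pair search is kept for readability and would be too slow for large domains.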

According to the modeling principle, the improved AGNES clustering algorithm has been created. The smart system will search the optimal tourist attractions and tour routes by the *p* number of clusters and tourists' interests, time budget, and cost budget, etc. The pseudo-code of the process to create the improved AGNES clustering algorithm


The proposed AGNES clustering algorithm differs significantly from those used in previous research (see the Introduction section). First, the aim is different: the proposed method identifies the classifications of city tourist attractions by extracting the correlation among different tourist attractions, calculating the degree of correlation between pairs of tourist attractions, and finally outputting the tourist attraction clusters. This clustering process is the critical step in matching tourists' interests with the tourist attractions' attributes. The previous methods were not concerned with tourist attraction clustering; they mainly aimed to find tourist clusters, tourists' behaviour, the relationship between the collaborative economy and tourism, etc. Second, the parameters used in developing the AGNES model are different. Besides the spatial attributes, the proposed AGNES algorithm improves the clustering criterion function by adding the tourist attractions' feature attributes, which makes the tourist attraction clustering more logical, since the clusters and tourist attractions are grouped to match the tourists' interests. The previous methods applied the AGNES algorithm directly on the basis of spatial attributes such as longitude and latitude. Third, since the proposed AGNES algorithm is an improved method, the detailed modeling steps are provided in the paper; they are an important part and precondition of the whole research work. In previous studies, the AGNES algorithm was merely a tool used by the authors without a detailed algorithm modeling process.

#### **3. Tour-Route-Recommendation Algorithm Based on the Space-Time Deduction**

The selection of tourist attractions and tour-route design are the two critical factors for any tour itinerary. Tourists must choose the tourist attractions that best match their interests and then plan the most reasonable route based on their selections. Time and cost are always limiting factors, and both traveling between attractions and visiting them consume a significant share of each. Therefore, with the available time and budget as fixed conditions, a "smart" system should be able to recommend attractions that best match an individual's preferences as well as optimize the transportation route. Since the mode of transportation would largely be determined by the tourists themselves, it was a crucial factor for consideration when developing our model [36,37].

#### *3.1. Tourist Attraction Reachability Space Model Based on Interest Matrix and Geographical Position*

The precondition for the smart system to recommend tour routes to tourists was obtaining the tourist interest data. The interest data were set as the input labels and then matched with the tourist attraction attributes. Each tourist attraction has a different capacity to satisfy a tourist's interests; this capacity was defined as the reachable capacity, and its value determines how likely the attraction is to be recommended by the system. Therefore, creating a reachability space model between the tourist interest data and the tourist attractions was the precondition for searching for the tourist attractions that would best satisfy an individual's interests [38,39].

The tourist interest label vector **n**1(*i*1) and the spatial positioning vector **n**2(*i*2) have the same dimensions as the vectors **t**1(*i*1) and **t**2(*i*2), and they represent the tourist-interest data. The vector **n**1(*i*1) is a 1 × *u*(*i*1) vector. The interest label factor *n*1(*i*1) contains *u*(*i*1) classifying indices *n*1(*i*1,*j*1), *j*1 ∈ (0, *α*] ⊂ ℤ+, where *j*1 is the index of *n*1(*i*1,*j*1) within the factor *n*1(*i*1) and *α* is the maximum number of indices. The vector **n**2(*i*2) is a 1 × *u*(*i*2) vector. The spatial positioning factor *n*2(*i*2) contains *u*(*i*2) classifying indices *n*2(*i*2,*j*2), *j*2 ∈ (0, *α*] ⊂ ℤ+, where *j*2 is the index of *n*2(*i*2,*j*2) within the factor *n*2(*i*2) and *α* is the maximum number of indices. There are *m* vectors **n**1(*i*1) and *n* vectors **n**2(*i*2).

The starting point of a tourist's tour route is *St*. The point *St* determines the dimension and specific values of the spatial location vector **n**2(*i*2). The matrix **N** is formed by the *m* feature attribute label vectors **n**1(*i*1) and the *n* spatial attribute label vectors **n**2(*i*2), and it represents the tourist's interest tendency. Each matrix row is a vector **n**1(*i*1) or **n**2(*i*2), and each column holds one element of that vector. The matrix contains *m* + *n* rows and *α* columns. Rows 1 to *m* relate to the vectors **n**1(*i*1), *i*1 ∈ (0, *m*] ⊂ ℤ+, and rows *m* + 1 to *m* + *n* relate to the vectors **n**2(*i*2), *i*2 ∈ (0, *n*] ⊂ ℤ+. Once the tourist interest data are confirmed, each row **n**1(*i*1) contains exactly one attribute element value *n*1(*i*1,*j*1), *j*1 ∈ (0, *α*] ⊂ ℤ+, and its other elements are 0. Equation (8) is the general formula of **N** and its specific elements.

$$
\mathbf{N} = \left[\mathbf{n}_{1(1)}, \ldots, \mathbf{n}_{1(m)}, \mathbf{n}_{2(1)}, \ldots, \mathbf{n}_{2(n)}\right]^{T} = \begin{bmatrix}
n_{1(1,1)} & n_{1(1,2)} & n_{1(1,3)} & \cdots & 0 & 0 \\
\vdots & \vdots & & & & \vdots \\
n_{1(m,1)} & n_{1(m,2)} & \cdots & \cdots & \cdots & n_{1(m,\alpha)} \\
n_{2(1,1)} & n_{2(1,2)} & \cdots & 0 & \cdots & n_{2(1,\alpha)} \\
\vdots & \vdots & & & & \vdots \\
n_{2(m+n,1)} & n_{2(m+n,2)} & \cdots & n_{2(m+n,4)} & \cdots & 0
\end{bmatrix} \tag{8}
$$

The elements of the matrix **N** correspond to the elements of the matrix **T**, including tourist classification **n**1(1), popularity degree **n**1(2), travel time **n**1(3), traveling fee **n**1(4), longitude **n**2(1), and latitude **n**2(2).

**n**1(1): {*n*1(1,1): nature park (1.00); *n*1(1,2): humanistic history (2.00); *n*1(1,3): amusement park (3.00); *n*1(1,4): leisure shopping (4.00); *n*1(1,5): modern science and technology (5.00); *n*1(1,6): artistic aesthetics (6.00)};

**n**1(2): {*n*1(2,1): *ho* ∈ (0, 0.25]; *n*1(2,2): *ho* ∈ (0.25, 0.50]; *n*1(2,3): *ho* ∈ (0.50, 0.75]; *n*1(2,4): *ho* ∈ (0.75, 1.00)}, *n*1(2,*j*1) ∈ R+;

**n**1(3): {*n*1(3,1): *tb* ∈ (0, 2.0]; *n*1(3,2): *tb* ∈ (2.0, 4.0]; *n*1(3,3): *tb* ∈ (4.0, 6.0]; *n*1(3,4): *tb* ∈ (6.0, +∞)}, *n*1(3,*j*1) ∈ R+;

**n**1(4): {*n*1(4,1): *co* ∈ (0, 100]; *n*1(4,2): *co* ∈ (100, 200]; *n*1(4,3): *co* ∈ (200, 300]; *n*1(4,4): *co* ∈ (300, +∞)}, *n*1(4,*j*1) ∈ R+.

The spatial location vector **n**2(*i*2) of the matrix **N** is determined by the longitude and latitude of the point *St*.
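The interval labels for popularity, travel time, and traveling fee amount to a simple binning rule. The following is a minimal sketch of that rule, not the authors' implementation; the names `label_index`, `POPULARITY_EDGES`, `TIME_EDGES`, and `FEE_EDGES` are hypothetical, and the edge lists simply copy the interval bounds listed for **n**1(2), **n**1(3), and **n**1(4).

```python
from bisect import bisect_left

# Upper bounds of the first three intervals (lo, hi] of each label vector.
# Values above the last bound fall into the fourth interval.
POPULARITY_EDGES = [0.25, 0.50, 0.75]  # n1(2), variable ho
TIME_EDGES = [2.0, 4.0, 6.0]           # n1(3), variable tb
FEE_EDGES = [100, 200, 300]            # n1(4), variable co


def label_index(value, edges):
    """Return the 1-based index j1 of the interval (lo, hi] that contains value."""
    if value <= 0:
        raise ValueError("value must be positive")
    # bisect_left keeps a value equal to an upper bound in the lower interval,
    # which matches the right-closed intervals (lo, hi].
    return bisect_left(edges, value) + 1


print(label_index(0.25, POPULARITY_EDGES))  # boundary value stays in interval 1
print(label_index(4.5, TIME_EDGES))         # falls in (4.0, 6.0]
print(label_index(350, FEE_EDGES))          # falls in (300, +inf)
```

Using `bisect_left` rather than `bisect_right` is what makes each interval closed on the right, as in the definitions.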

The correlation between the tourists' interests and the tourist attraction attributes is determined by the interest quantitative matching objective function *ξ*(*N*,*T*). Take the normalized parameter *δ*1(*i*1) of the feature attribute label vector as the parameter for creating the function *ξ*(*N*,*T*), and then confirm the tourist-interest data. Traverse *j*1, *j*2 ∈ (0, *α*], and search for and extract the non-zero elements *δ*1(*i*1) · *n*1(*i*1,*j*1) and *n*2(*i*2,*j*2) from the label vectors *δ*1(*i*1) · **n**1(*i*1) and **n**2(*i*2) of the matrix **N**. Transpose the matrix **N** to generate the matrix **N**<sup>*T*</sup>. Create the norm relationship of the Minkowski distance between the tourists' interests and the tourist attractions, as shown in Equations (9) and (10). Calculate the function *ξ*(*N*,*T*) between the matrix **N** and the matrix **T**. Use the interest quantitative matching objective function matrix **P***ξ*(*N*,*T*) to store the function values.

$$\xi_{(N,T)} = \left\| \mathbf{N}^T - \mathbf{T}^T \right\|_2 \tag{9}$$

$$\xi_{(N,T)} = \left| \sum_{i_1=1}^{m} \left| \delta_{1(i_1)} \cdot n_{1(i_1,j_1)} - \delta_{1(i_1)} \cdot t_{1(i_1,j_1)} \right|^2 + \sum_{i_2=1}^{n} \left| n_{2(i_2,j_2)} - t_{2(i_2,j_2)} \right|^2 \right|^{1/2} \tag{10}$$

When the interest data remain the same, the tourist attractions *s*(*i*) in different clusters generate different function values *ξ*(*N*,*T*). The values *ξ*(*N*,*T*) are stored in the matrix **P***ξ*(*N*,*T*) following the subscript order of the clusters *S*(*i*). They are stored in ascending order, from the first row and column to the last. When tourists confirm the interest data, the data include the longitude and latitude of the starting point *St*. Taking the point *St* as the center, each row of the matrix **P***ξ*(*N*,*T*) represents the correlation between the tourist attractions and the interest data, and also represents the degree of reachability of the tourist attractions.
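As a rough illustration of Equation (10) and of the ascending storage order, the sketch below computes the matching value for one attraction and then sorts attractions within each cluster. It is an assumption-laden example rather than the authors' code: the function names `xi` and `rank_by_cluster` are hypothetical, each input vector is assumed to hold the single selected (non-zero) element of each row of **N** or **T**, and the scores in the usage example are invented.

```python
import math


def xi(n1, t1, delta1, n2, t2):
    """Weighted Euclidean distance in the spirit of Equation (10).

    n1, t1: feature values of the tourist and of one attraction.
    delta1: normalizing weight for each feature.
    n2, t2: spatial values (for example longitude and latitude).
    """
    feature = sum((d * (a - b)) ** 2 for d, a, b in zip(delta1, n1, t1))
    spatial = sum((a - b) ** 2 for a, b in zip(n2, t2))
    return math.sqrt(feature + spatial)


def rank_by_cluster(scores, clusters):
    """Sort the members of each cluster by ascending matching value."""
    return {
        name: sorted(members, key=scores.__getitem__)
        for name, members in clusters.items()
    }


# Hypothetical values, chosen only to make the arithmetic easy to follow.
print(xi([3.0], [0.0], [1.0], [4.0], [0.0]))  # sqrt(3^2 + 4^2)
scores = {"s1": 0.8, "s2": 0.3, "s3": 0.5}
clusters = {"S1": ["s1", "s2"], "S2": ["s3"]}
print(rank_by_cluster(scores, clusters))
```

A smaller value means the attraction lies closer to the tourist's stated interests and starting point, so the first entry of each sorted cluster is the best match under this reading.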

#### *3.2. The Dynamic Space-Time Deduction Algorithm Based on the Travel Time and Cost*

During a city tour, tourists expect to visit several tourist attractions in one day. Tourists have different interests and levels of desire to visit various kinds of tourist attractions, each of which involves a different time investment and associated cost. Therefore, when the time and cost are fixed, the number of tourist attractions that can be visited is finite. According to the tourist attraction reachability model, the smart system formulates a tour route that best matches the tourist's interests while meeting the time and cost conditions. Furthermore, depending on the chosen mode of transportation, the goal of saving time and cost can result in better attraction recommendations and optimized route planning. Since travel time is directly affected by the mode of transportation and the route between attractions, the precondition for the dynamic space-time deduction of the tour-route-recommendation algorithm is the lowest path-searching cost [37,38].

#### 3.2.1. The Shortest-Path-Searching Algorithm Based on the Space-Vector Lattice

After visiting a tourist attraction, tourists move to the next one. This movement involves three elements. First, tourists use a transportation mode such as walking, cycling, or a taxi service. Second, they travel along city roads to the destination. Third, the movement consumes time and money.

The traffic space between two tourist attractions is the tourist attraction traffic subspace Φ. The space Φ is the interval from point *A* to point *B*, and it is a vector space with coordinates, as shown in Figure 3. The bottom-left corner of the square Φ is the origin of the coordinate system. Each line represents an abstract city road. The line intersection *a*(*i*) represents a road intersection. In Figure 3, the space Φ contains all the city roads between the two points *A* and *B*. The road distances *dis*(*a*(*i*), *a*(*j*)) of the edges *CD*, *DE*, *EF*, and *CF* in the small square *CDEF* may differ.

**Figure 3.** The space-vector lattice between the point *A* and *B* to search the shortest path. (**a**) is the spatial connecting line of the space Φ. (**b**) is the spatial road and lattice relationship as well as the searching process for the series contained in the square *Squ*(*A*, *a*(1), *a*(5), *a*(6)). (**c**) is the spatial road-lattice relationship as well as the searching process for the series that is contained in the square *Squ*(*A*, *a*(2), *a*(10), *a*(12)).

Starting from the point *A*, search the path along the road until the point *B* is reached; in the whole process, all the searched points are listed in the spatial searching series *Seq*. The searching series *Seq* represents a reachable path that is related to a searched distance *dis*(*Seq*). Figure 3 shows the shortest-path algorithm searching process.

The pseudo-code of the process to create the shortest-path algorithm (Algorithm 3) is shown as follows. This searching mode considers all the city roads and intersections between two points and finds the global minimum value, which may be more precise than the other shortest-path algorithms. The shortest path may reduce time and costs and thus increase the number of tourist attractions to be visited.
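To make the idea of searching every road and intersection between *A* and *B* concrete, the following sketch enumerates all simple paths on a small weighted road graph and keeps the shortest. It is not Algorithm 3; it is a brute-force depth-first search under the assumption that road distances are non-negative, and the function name `shortest_path_exhaustive`, the graph layout, and the intersection names `a1` and `a2` are hypothetical.

```python
def shortest_path_exhaustive(graph, start, goal):
    """Enumerate every simple path from start to goal; return (distance, path).

    graph maps each intersection to a dict {neighbour: road distance}.
    """
    best = (float("inf"), [])

    def dfs(node, dist, path):
        nonlocal best
        if dist >= best[0]:
            return  # a longer partial path cannot improve on the best so far
        if node == goal:
            best = (dist, path[:])
            return
        for nxt, weight in graph[node].items():
            if nxt not in path:  # keep the path simple (no repeated points)
                path.append(nxt)
                dfs(nxt, dist + weight, path)
                path.pop()

    dfs(start, 0.0, [start])
    return best


# A tiny hypothetical lattice: two alternative roads from A to B.
roads = {
    "A": {"a1": 2.0, "a2": 1.0},
    "a1": {"A": 2.0, "B": 1.0},
    "a2": {"A": 1.0, "B": 3.0},
    "B": {"a1": 1.0, "a2": 3.0},
}
print(shortest_path_exhaustive(roads, "A", "B"))  # (3.0, ['A', 'a1', 'B'])
```

The recorded path plays the role of the searching series *Seq*, and the returned distance corresponds to *dis*(*Seq*). Exhaustive enumeration grows quickly with the number of intersections, so this form is only practical for small subspaces.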

#### 3.2.2. The Dynamic Space-Time Deduction Tour-Route-Searching Algorithm

Once the tourist attractions have been identified based on tourist interests, the travel time and cost determine the number of tourist attractions to be visited and the optimal travel route, all of which influence a tourist's overall experience. Based on the matrix **P***ξ*(*N*,*T*) and fixed time and cost conditions, the dynamic space-time deduction tour-route-searching algorithm was created. The basic process of the algorithm is as follows. Using a daytrip as an example, a tourist confirms the travel time budget *t* (unit: hour) and the traveling fee *c* (unit: CNY yuan), and then chooses one transportation mode. Starting from the point *St*, search the shortest path between the point *St* and each tourist attraction and confirm the travel time using the chosen transportation mode. Add the visiting time at the tourist attractions and calculate the travel costs and any entrance or activity fees. Find the minimum time and cost between the point *St* and all the tourist attractions, and set the corresponding point as the first tourist attraction *K*(1) to be visited. Starting from the point *K*(1), search for the next tourist attraction *K*(2) with the minimum travel time and cost, and continue until the total travel time or cost reaches the preset time budget *t* or cost budget *c*; the optimal tour route is then defined.

#### **Algorithm 3: The process to create the shortest-path algorithm**


In a tour, the road interval from the point *A* to *B* is a cost iteration sub-unit *Q*(*i*). This sub-unit is the basic unit of the dynamic deduction process. The time consumption *t*(*i*) of the sub-unit is the sum of the travel time from the point *A* to *B* and the visiting time at the tourist attraction *B*. The cost consumption *c*(*i*) of the sub-unit is the sum of the travel costs from the point *A* to *B* and the visiting fees of the tourist attraction. Starting from the point *St*, the tourist passes through *n* sub-units *Q*(*i*) and finally arrives at the terminal tourist attraction *P*; in this process, the total time and costs are denoted as the dynamic deduction time Δ*t* and the dynamic deduction cost Δ*c*, as shown in Equation (11):

$$
\Delta t = \sum_{i=1}^{n} t_{(i)} \quad , \quad \Delta c = \sum_{i=1}^{n} c_{(i)} \quad , \quad i, n \in \mathbb{N} \tag{11}
$$

A 1 × *k* steady vector **T***s* stores the tourist attractions that make up the tour route after the searching process. The order in which the vector **T***s* stores the tourist attractions follows the algorithm rule, and the empty elements are 0. A 1 × *k* vector Δ**T***s* dynamically stores the tourist attractions during the searching process, and its empty elements are 0. The pseudo-code of the process to create the tour-route-searching algorithm (Algorithm 4) is as follows:
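The search process described in this subsection, together with the accumulation in Equation (11), can be sketched as a greedy loop. This is a simplified reading and not Algorithm 4 itself: the function name `plan_route`, the dictionaries `travel` and `visit`, and all numbers in the usage example are hypothetical, and ties between time and cost are resolved by comparing time first and cost second, which is an assumption.

```python
def plan_route(start, attractions, travel, visit, t_budget, c_budget):
    """Greedily extend the route until the time or cost budget would be exceeded.

    travel[(a, b)] -> (travel time, travel cost) from a to b.
    visit[b]       -> (visiting time, visiting fee) at attraction b.
    Returns (route, total time, total cost).
    """
    route, here = [], start
    dt = dc = 0.0
    remaining = set(attractions)
    while remaining:
        def sub_unit(a):
            # Time and cost of one sub-unit Q(i): move to a, then visit a.
            tt, tc = travel[(here, a)]
            vt, vc = visit[a]
            return (tt + vt, tc + vc)

        nxt = min(remaining, key=sub_unit)  # time first, cost as tie-breaker
        t_i, c_i = sub_unit(nxt)
        if dt + t_i > t_budget or dc + c_i > c_budget:
            break  # the next sub-unit does not fit the budgets
        route.append(nxt)
        dt += t_i  # running sum of t(i), as in Equation (11)
        dc += c_i  # running sum of c(i), as in Equation (11)
        remaining.remove(nxt)
        here = nxt
    return route, dt, dc


# Hypothetical data: two attractions P and Q reachable from the start St.
travel = {("St", "P"): (0.5, 0), ("St", "Q"): (1.0, 0),
          ("P", "Q"): (0.5, 0), ("Q", "P"): (0.5, 0)}
visit = {"P": (2.0, 10), "Q": (3.0, 20)}
print(plan_route("St", ["P", "Q"], travel, visit, 6.0, 100))  # (['P', 'Q'], 6.0, 30.0)
```

In this sketch the returned list corresponds to the steady vector **T***s*, while the set `remaining` stands in for the dynamic bookkeeping during the search. The loop stops as soon as the cheapest next sub-unit exceeds a budget, which is a simplification of the stopping rule described in the text.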


#### **4. Sample Experiment and Result Analysis**

To verify the advantages of the proposed algorithm, the tourism city of Chengdu was selected as the subject of the experiment. The basic idea of the experiment is as follows. First, 15 popular tourist attractions in Chengdu were selected. All the tourist attraction feature attributes and spatial attributes were confirmed and quantified. According to the tourist attraction attributes, we used the proposed clustering algorithm to obtain tourist attraction labels and clusters, cluster structure trees, and cluster spatial buffers. Based on these clusters, the tourist-interest data were obtained and the quantified interest-matching objective function matrix was created. According to the tourists' time and cost allowances and preferred mode of travel, the tourist attractions and tour routes were analyzed for optimal matches. For the tour-route optimization, the experiment used two frequently used shortest-path-searching algorithms as a control group to verify the advantages of the proposed algorithm.

#### *4.1. The Collection Result of the Tourist Attraction Attributes*

4.1.1. The Results of the Research Range

The tourist attraction research range in Chengdu was as follows:

**S** = {*s*(1): Chunxi Road and Zhongshan Square; *s*(2): Jinsha Site; *s*(3): Temple of Marquis Wu; *s*(4): The People's park; *s*(5): Wide and Narrow Alley; *s*(6): East Lake Park; *s*(7): Wenshu Temple; *s*(8): Qingyang Taoist Temple; *s*(9): Wangjiang Park; *s*(10): Jinniu Wanda; *s*(11): Tazishan Park; *s*(12): Eastern Suburb Memory; *s*(13): SM Square; *s*(14): Chengdu Zoo; *s*(15): Raffles Square}.

#### 4.1.2. Analysis and Results of the Feature Attribute and Spatial Attribute

Table 1 shows the quantified feature attributes and spatial attributes of each tourist attraction. The symbol *t*1(1) represents the classification, *t*1(2) represents the popularity, *t*1(3) represents the best travel time, *t*1(4) represents the traveling fee, *t*2(1) represents the longitude, and *t*2(2) represents the latitude.

**Table 1.** The collected quantified feature attributes and spatial attributes of each tourist attraction.


*4.2. The Result of the Clustering and Cluster Visualization*

4.2.1. The Results of the Function *ξ*(*s*(*i*1),*s*(*i*2)) Values

Based on the data in Table 1, the proposed improved AGNES spatial clustering algorithm was performed to generate tourist attraction clusters. Table 2 shows the analyzed results of the clustering objective function *ξ*(*s*(*i*1),*s*(*i*2)) values among the tourist attractions.


**Table 2.** The analyzed results of the clustering objective function *ξ*(*s*(*i*1),*s*(*i*2)) values among tourist attractions.

#### 4.2.1.1. The Output Result of the Clusters

Based on the objective function values and clustering algorithm, the analysis resulted in three tourist attraction clusters *S*(1), *S*(2), and *S*(3) as follows:


In the clustering process, the cluster structure trees and cluster spatial buffers were generated, as shown in Figure 4. Figure 4a is the tourist attraction distribution, and Figure 4b–d are the visualization results of the structure trees and spatial buffers for the clusters *S*(1)–*S*(3).

#### *4.3. The Output Result of the Tourist Attractions and Tour Route*

Considering the daytrip example, we chose two tourists as the research objects. Table 3 shows the attribute label values based on the tourist interests. The last two indices were the longitude and latitude of the starting point for each tourist. The first tourist sample *T*(1) chose to use a bicycle for transportation, while the second tourist sample *T*(2) chose to use a taxi service.

#### 4.3.1. The Analyzed Results of the Interest-Matching Objective Function Values

Based on the output cluster results and the tourist interest data, the interest-matching objective function *ξ*(*N*,*T*) values between the tourist interests and each tourist attraction were calculated, as shown in Table 4.

#### 4.3.2. The Sequencing Results of Interest-Matching Objective Function Values

Based on the data shown in Table 4, the values of the function *ξ*(*N*,*T*) were arranged in ascending order within each cluster, following the sequence of the clusters, as shown in Table 5.

**Figure 4.** The tourist attraction distribution and clusters, structure trees, and spatial buffers of the clusters. (**a**) is the tourist attraction distribution. (**b**–**d**) are the visualization results of the structure trees and spatial buffers for the clusters *S*(1)–*S*(3).



**Table 4.** The interest-matching objective function *ξ*(*N*,*T*) values between the tourist samples and each tourist attraction.



**Table 5.** The interest-matching objective function *ξ*(*N*,*T*) ascending values between the tourist samples and the cluster tourist attractions.

(1) Figure 5a shows the distribution of the function *ξ*(*N*,*T*) values of the first tourist, in the order of the tourist attraction subscripts in the research domain **S**.

**Figure 5.** The interest-matching objective function *ξ*(*N*,*T*) between the tourist samples and the tourist attractions. (**a**) shows the interest-matching objective function *ξ*(*N*,*T*) of the first tourist. (**b**) shows the interest-matching objective function *ξ*(*N*,*T*) of the second tourist. (**c**) shows the interest-matching objective function *ξ*(*N*,*T*) of the first tourist in the cluster sequence. (**d**) shows the interest-matching objective function *ξ*(*N*,*T*) of the second tourist in the cluster sequence.


In Figure 5c,d, the tourist attraction objective function values in each cluster are listed in ascending order, following the cluster sequence; the red curve represents the cluster *S*(1), the blue curve represents the cluster *S*(2), and the green curve represents the cluster *S*(3).

4.3.3. The Results of the Tourist Attractions and Tour-Route Planning

According to the data in Table 3 for a one-day tour, the travel-time allowance for the first tourist sample was 9 h and the cost budget was CNY 300 yuan. The travel-time allowance for the second tourist sample was 11 h and the cost budget was CNY 500 yuan. The first tourist chose to travel by bicycle, while the second tourist chose a taxi service. Based on the proposed algorithm, a tourist attraction itinerary and tour route were identified for each tourist sample according to that tourist's interests and chosen mode of transportation (i.e., bicycle and taxi service, respectively), as shown in Table 6.


**Table 6.** The tourist attractions and tour route that best match the tourists' interests.

Table 6 shows the tourist attraction elements *Ts*(*i*) of the tour-route-searching steady vector **T***s* and the cost iteration sub-units *Q*(*i*). The values for each tourist attraction are the required time (unit: hour) and the minimum cost (unit: CNY yuan) to visit it. The values between two tourist attractions represent the travel time (unit: hour) and minimum travel cost (unit: CNY yuan) in the cost iteration sub-unit *Q*(*i*) under the chosen transportation mode. The table also shows the optimal tourist attractions and tour route based on the tourists' requirements.

#### *4.4. The Comparison Results of the Algorithms*

To verify the results of the proposed algorithm, a control set of algorithms was run under the same experimental conditions, and their results were compared with those of the proposed algorithm.

#### 4.4.1. Selecting and Confirming of the Control Algorithms

In tourism research, shortest-path algorithms such as the Dijkstra and A\* algorithms have typically been used to plan tour routes with the shortest traveling distances. They have the benefits of being easily accessed and applied [40–42]. In addition, the shortest-path algorithms are also constrained by tourism factors such as features and spatial attributes. Once the traveling distances between the tourist attractions have been defined by the city roads and road nodes, the shortest-path algorithms can operate. Since the proposed algorithm's experimental environment conforms to these conditions, the Dijkstra algorithm and the A\* algorithm were chosen as controls to plan the travel routes for the sub-unit Φ, and the control group algorithms were defined as Algorithm 1 (A1) and Algorithm 2 (A2). Under the same conditions of algorithm operating time and interest data for the two tourist samples, the control group algorithms were used to dynamically search the same tourist attractions, cost iteration sub-units, and tour routes. Their results were then compared with those of the proposed algorithm (PA), as shown in Table 7, in which the first tourist chose cycling and the second tourist chose a taxi service.

#### 4.4.2. The Comparison Results of the Proposed Algorithm with the Control Algorithms

Table 7 shows the tourist attraction elements *Ts*(*i*) of the steady vector **T***s* and the cost iteration sub-units *Q*(*i*) for each algorithm. The values between two tourist attractions represent the travel time (unit: hour) and minimum moving cost (unit: CNY yuan) in the cost iteration sub-unit *Q*(*i*) with the chosen transportation modes. According to Table 7, the Figure 6 curve results were as follows:


**Table 7.** The tourist attractions and the tour routes that best match tourist interests under the condition of the three algorithms.


With regard to computational efficiency, the Dijkstra algorithm has low efficiency when searching for the shortest route. The A\* algorithm introduces a heuristic function, which improves the efficiency to some extent compared with the Dijkstra algorithm. In comparison, the proposed algorithm is based on multiple-dot parallel searching; it has higher operating efficiency and consumes less operating space than the Dijkstra algorithm and the A\* algorithm. Table 8 shows the comparison of the Dijkstra algorithm, the A\* algorithm, and the proposed algorithm with regard to time complexity (TC) and space complexity (SC). The data in the table show the TC and SC examples when the tourist attraction numbers are *n* = 4, *n* = 5, and *n* = 6. The symbol *ρ*1,1 represents the TC ratio between the Dijkstra algorithm and the proposed algorithm, and the symbol *ρ*1,2 represents the SC ratio between the Dijkstra algorithm and the proposed algorithm. The symbol *ρ*2,1 represents the TC ratio between the A\* algorithm and the proposed algorithm, and the symbol *ρ*2,2 represents the SC ratio between the A\* algorithm and the proposed algorithm.

**Figure 6.** The time- and fee-cost deduction and fluctuating tendency of each algorithm for the two tourist samples. (**a**–**c**) are the deduction and fluctuating tendency of the visiting tourist attraction time, travel time between two tourist attractions, and the total time of the proposed algorithm, Algorithm 1, and Algorithm 2, respectively, for the first tourist sample. (**d**–**f**) are the deduction and fluctuating tendency of the visiting tourist attraction fee, travel fee between two tourist attractions, and the total costs of the proposed algorithm, Algorithm 1, and Algorithm 2, respectively, for the first tourist sample. (**g**–**i**) are the deduction and fluctuating tendency of the visiting tourist attraction time, travel time between two tourist attractions, and the total time of the proposed algorithm, Algorithm 1, and Algorithm 2, respectively, for the second tourist sample. (**j**–**l**) are the deduction and fluctuating tendency of the visiting tourist attraction fee, travel fee between two tourist attractions, and the total costs of the proposed algorithm, Algorithm 1, and Algorithm 2, respectively, for the second tourist sample.

**Figure 7.** The comparison of the total time and costs of the tour routes for the two tourist samples. (**a**,**b**) show the comparison of each algorithm on the total time and the total costs for the first tourist under the condition of the first tourist's interest data and the same tourist attractions and cost sub-units. (**c**,**d**) show the comparison of each algorithm on the total time and the total costs for the second tourist under the condition of the second tourist's interest data and the same tourist attractions and cost sub-units.

**Table 8.** The comparison of the Dijkstra algorithm, A\* algorithm, and the proposed algorithm on the aspect of time complexity (TC) and space complexity (SC).


*4.5. The Analysis and Conclusions of the Experiment Results*

4.5.1. The Analysis and Conclusion on the Collection Results of the Tourist Attractions and Tourist Attraction Attributes

After analyzing Section 4.1 and Table 1 data, the following conclusions were reached.

	- (1) Figure 4a shows the distribution of all the tourist attraction samples with note labels. They were spatially discrete.
	- (2) As to the inner tree structure of the clusters: in Figure 4b, the topological connecting lines among tourist attractions formed the first structure tree and it indicated the searching process of the first cluster. In Figure 4c, the topological connecting lines among the tourist attractions formed the second structure tree, and it indicated the searching process of the second cluster. In Figure 4d, the topological connecting lines among the tourist attractions formed the third structure tree, and it indicated the searching process of the third cluster.
	- (3) As to the structure of the cluster buffer: in Figure 4b, the closed brown space was the first cluster spatial buffer and indicated the spatial range of the first cluster. In Figure 4c, the closed blue space was the second cluster spatial buffer and indicated the spatial range of the second cluster. In Figure 4d, the closed green space was the third cluster spatial buffer and indicated the spatial range of the third cluster.

4.5.3. The Analysis and Conclusion on the Results of the Tourist Attractions and Tour Route

After analyzing Section 4.3, Tables 3–6 data, and Figure 5, the following conclusions were reached.

	- (1) The values differed owing to the preconditions in Table 3 and the operation of the proposed algorithm. This indicated that each tourist attraction's capacity to satisfy a tourist's interests was different. A tourist attraction with a stronger capacity would be preferentially selected for the tour route.

As to the first tourist, the interest-matching objective function values are shown in Figure 5a,c:


As to the second tourist, the interest-matching objective function values are shown in Figure 5b,d.

	- (1) The tourist attractions of the two tour routes all matched the tourist interests.
	- (2) The recommended tour route for the first tourist was 8.77 h long and cost CNY 33 yuan. We interpreted this as showing that the proposed algorithm's tour route conformed to the tourist's requirements.
	- (3) The recommended tour route for the second tourist was 10.25 h long and cost CNY 136 yuan. We interpreted this as showing that the proposed algorithm's tour route conformed to the tourist's requirements.
	- (4) The total time and costs were within the ranges of the tourists' allowances and met their needs. We interpreted this as showing that the algorithm was feasible and accurate.

4.5.4. The Analysis and Conclusion on the Comparison Result of the Algorithms

After analyzing Section 4.4, Table 7, Table 8, and Figures 6 and 7, the following conclusions were reached.

	- (1) The tour routes produced by the Dijkstra and A\* algorithms were less efficient and more expensive than those produced by the proposed algorithm. We interpreted this as showing that the proposed algorithm had an advantage in saving time and costs when planning tour routes, as compared to the controls.
	- (2) From Table 8, it can be concluded that the three algorithms performed differently. In terms of computational performance when searching for the shortest tour route, the proposed algorithm had much lower time complexity and space complexity than the Dijkstra algorithm; compared with the A\* algorithm, it had much lower time complexity and a space complexity of the same order. The ratio *ρ* was obtained by calculation. When the tourist attraction number *n* was larger than 2, the ratios *ρ*1,1, *ρ*1,2, *ρ*2,1, and *ρ*2,2 were all larger than 1. It can be concluded that, once the tourist attractions are confirmed in the searching process for the shortest tour route, the Dijkstra algorithm always had higher time complexity and space complexity than the proposed algorithm, while the A\* algorithm always had higher time complexity than the proposed algorithm and a space complexity of the same order.
	- (3) With the small tourist attraction data set, the proposed algorithm relied on an exhaustive method, and thus it found globally optimal solutions. The Dijkstra and A\* algorithms rely on local "greedy" search methods; they might easily converge on a locally optimal solution and have higher time complexity and space complexity. In one complete tour route, the larger the number of tourist attractions, the more computer operating time and space are required. That is, the weaker the algorithm performance, the more time and space are needed to search for the optimal solution. In the experiment, when the three algorithms were run for the same computer operating time, the proposed algorithm found the optimal tour route more quickly, while the control group algorithms might not find it, since the Dijkstra and A\* algorithms did not perform better than the proposed algorithm with regard to time complexity and space complexity. When the tourist attraction number is sufficiently large, the gap in time and space consumption widens rapidly. Thus, under identical limits on operating time and space consumption, the Dijkstra and A\* algorithms are less able to find the optimal solution, and may even fail to find it and converge on a locally optimal solution. In other words, if the Dijkstra algorithm or the A\* algorithm is set as the embedded algorithm of the smart tourism system, it can also find the optimal tour route, but it will consume more computer operating time and space. Overall, the proposed algorithm performed better than the Dijkstra and A\* algorithms in searching for optimal tour routes.

	- (1) With regard to the first tourist, the proposed algorithm route was 8.77 h long and cost CNY 33 yuan. The Dijkstra algorithm route was 9.14 h long and cost CNY 36.5 yuan. The A\* algorithm route was 9.2 h long and cost CNY 37 yuan. We interpreted this as showing that the proposed algorithm was superior to the control algorithms.
	- (2) With regard to the second tourist, the proposed algorithm route was 10.25 h long and cost CNY 136 yuan. The Dijkstra algorithm route was 10.55 h long and cost CNY 147 yuan. The A\* algorithm route was 10.44 h long and cost CNY 145 yuan. We interpreted this as showing that the proposed algorithm was superior to the control algorithms.
	- (3) For the first tourist, the durations of the tour routes recommended by the Dijkstra and A\* algorithms both exceeded the nine-hour allowance, and thus the results did not conform to the tourist's allowance. In this respect, we interpreted the control algorithms as inferior to the proposed algorithm.
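The reported route totals can be checked against each tourist's allowance with a few lines of code. The sketch below uses only the figures stated in Sections 4.3.3 and 4.5.4; the variable and function names (`budgets`, `routes`, `within_budget`) are hypothetical.

```python
# (time allowance in hours, cost budget in CNY yuan) for each tourist sample.
budgets = {"tourist 1": (9.0, 300.0), "tourist 2": (11.0, 500.0)}

# (route duration in hours, route cost in CNY yuan) reported for each algorithm.
routes = {
    "tourist 1": {"PA": (8.77, 33.0), "Dijkstra": (9.14, 36.5), "A*": (9.2, 37.0)},
    "tourist 2": {"PA": (10.25, 136.0), "Dijkstra": (10.55, 147.0), "A*": (10.44, 145.0)},
}


def within_budget(tourist):
    """Return, for each algorithm, whether its route fits both allowances."""
    t_max, c_max = budgets[tourist]
    return {
        alg: (t <= t_max and c <= c_max)
        for alg, (t, c) in routes[tourist].items()
    }


print(within_budget("tourist 1"))  # {'PA': True, 'Dijkstra': False, 'A*': False}
print(within_budget("tourist 2"))  # {'PA': True, 'Dijkstra': True, 'A*': True}
```

For the first tourist only the proposed algorithm's route stays within the 9 h allowance, whereas all three routes fit the second tourist's allowance, with the proposed algorithm's route being the shortest and cheapest.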

#### **5. Conclusions**

#### *5.1. Contribution*

Based on the current challenges in tour-route planning and attraction recommendations, this study designed a tour-route planning and recommendation algorithm based on an improved AGNES spatial clustering and space-time deduction model. This model improved interest-matching, urban-tourist-attraction clustering, space-time deduction, and tour-route planning based on various modes of transportation. By combining the tourist attraction features and spatial attributes, the improved AGNES tourist attraction clustering algorithm was created, and the cluster structure trees, cluster spatial buffers, and clusters were generated. All the tourist attractions with a high degree of correlation among the attributes were clustered together. Based on the tourist-interest data, the interest-matching objective function was created. This function reflected each tourist attraction's capacity for satisfying the tourist's interests, which formed the precondition when planning the tour route. Under the constraint conditions of time and cost allowance, the proposed algorithm searched for the optimal tourist attractions to match the tourist interests and also considered the optimal tour route. The resultant tour routes met the tourists' needs and interests. Based on the comparison results, the proposed algorithm had advantages when compared to the controls. The proposed algorithm reduced the costs and time investment for tour-route planning. The improved AGNES clustering algorithm considered spatial distance and various tourist attraction attributes. The proposed algorithm integrated mixed (i.e., preferred) transportation modes for different optimized results. Tour-route planning based on space-time deduction was an innovative method that considered not only the time and cost constraints but also the shortest traveling distance between two tourist attractions. Therefore, the resultant tour routes satisfied the tourist's interests and reduced the time and costs invested by tourists.

#### *5.2. Addressing Challenges for Research*

Smart mobile devices have become part of daily life, and many activities and events are now planned using them. Mobile planning is the key to ensuring efficient routing, resource allocation, and energy management. For example, the researchers in [30] considered that efficient routing, resource allocation, and energy management could be achieved through clustering of mobile nodes into local groups. In that study, a clustering scheme was developed to prolong the network lifetime by distributing energy consumption among clusters. In [31], a novel travel route recommendation system was proposed that automatically collected tourists' on-site travel behavior data regarding a specific POI using smartphone and Internet of Things technologies. A tour-route-recommendation algorithm was then created to search and rank the tangible travel routes. The researchers in [32] considered that the prevalence of smart mobile devices and location-based services would lead to an increasing volume of mobility data. Based on big mobile data, they proposed a method for accurately predicting the next location of a traveling object.

In tourism activities, tourists' traveling behaviors also generate massive amounts of data on mobile devices. How to appropriately and accurately use these data is a future challenge for tourism research. Mobile data could be used in tourism data mining, tourist attraction location, tourist interest tendency research, tourism facility evaluations, tour-route planning, and recommendations, etc. It has been deemed the most important, challenging, and valuable research field for the future. How to precisely optimize mobile data acquisition, mine interest data, match tourists' needs, search optimal solutions, etc., are challenges that should be addressed.

#### *5.3. Limitation and Future Work*

When searching the tour routes, the proposed algorithm sets the transportation mode, time allowance, and costs as the constraint conditions. However, the proposed algorithm still has some drawbacks and limitations. First, the AGNES clustering algorithm itself has limitations in efficiency, accuracy, and space complexity. Second, in the tour-route algorithm, the transportation modes were relatively fixed, whereas tourists might choose different transportation modes in the tour process. Third, the proposed method did not involve mobile data; we provided a method under the condition of city tourist attractions' attributes, tourists' specific interests, and an urban tourism environment. Therefore, additional research could expand and validate our proposed algorithm further. First, more precise tourist attraction clustering methods could be studied, which could refine and better target the clustering results based on tourist interests. The clustering objective function criteria and model procedure could be refined further as well. The criteria to select the parameters could add more factors to satisfy more individualized interests. Second, the transportation mode selection for the whole tour should be more flexible and random, which could then consider tourist selection tendency on different cost deduction sub-units between two tourist attractions. In further research, we will study random transportation mode selection in different sub-units, and a more individualized tour-route-searching algorithm will be designed and proposed. Third, mobile data should be used to mine tourist interests and to integrate specialized interests. To some extent, a smart tourism recommendation system could be set up by mining historical tourists' data and finding related knowledge.

**Author Contributions:** Conceptualization, Xiao Zhou, Jiangpeng Tian and Mingzhan Su; methodology, Xiao Zhou and Jiangpeng Tian; validation and formal analysis, Mingzhan Su and Jiangpeng Tian; investigation, data resources and data processing, Mingzhan Su; writing—original draft preparation, Xiao Zhou; writing—review and editing, Xiao Zhou, Jiangpeng Tian, and Mingzhan Su; visualization, Xiao Zhou and Mingzhan Su; supervision, Xiao Zhou and Jiangpeng Tian; project administration and funding acquisition, Jiangpeng Tian. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key Research and Development Program of China (Grant No. 2017YFB0503503), the National Natural Science Foundation of China (Grant No. 41701457), the Military "Double Key" construction project (Grant No. 2021KY05), and the Leshan Science and Technology Project (Grant No. 20RKX0007 and No. 20ZRKX006).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available from the author upon reasonable request.

**Acknowledgments:** The authors would like to thank the postdoctoral innovation practice base of Sichuan province of Leshan vocational and Technical College and Computer Science postdoctoral mobile station of Sichuan University. Meanwhile, we thank the editors and reviewers for their valuable comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Spatiotemporal Dynamic Analysis of A-Level Scenic Spots in Guizhou Province, China**

**Yuanhong Qiu <sup>1,2</sup>, Jian Yin <sup>1,2,3,</sup>\*, Ting Zhang <sup>1</sup>, Yiming Du <sup>1</sup> and Bin Zhang <sup>1,2</sup>**


**Abstract:** A-level scenic spots are a unique evaluation form of tourist attractions in China, which have an important impact on regional tourism development. Guizhou is a key tourist province in China. In recent years, the number of A-level scenic spots in Guizhou Province has been increasing, and the regional tourist economy has improved rapidly. The spatial distribution evolution characteristics and influencing factors of A-level scenic spots in Guizhou Province from 2005 to 2019 were measured using spatial data analysis methods, trend analysis methods, and geographical detector methods. The results showed that the number of A-level scenic spots increased in all counties of Guizhou Province, although it grew slowly in the south. From 2005 to 2019, the spatial distribution of A-level scenic spots was characterized by spatial agglomeration. The spatial distribution equilibrium degree of scenic spots in nine cities in Guizhou Province gradually developed toward the "relatively average" level. By 2019, the kernel density distribution of A-level scenic spots had formed the "two-axis, multi-core" layout. One axis was located in the north central part of Guizhou Province, and the other axis ran across the central part. The multi-core areas were mainly located in Nanming District, Yunyan District, Honghuagang District, and Xixiu District. From 2005 to 2007, the standard deviation ellipses of the scenic spots distribution changed greatly in direction and size. After 2007, the long-axis direction of the ellipses gradually formed a southwest to northeast direction. We chose elevation, population density, river density, road network density, tourism income, and GDP as factors, to discuss the spatiotemporal evolution of the scenic spots' distribution with coupling and attribution analysis. It was found that the river, population distribution, road network density, and the A-level scenic spots' distribution had a relatively high coupling phenomenon.
Highway network density and tourist income have a higher influence on A-level tourist resorts distribution. Finally, on account of the spatiotemporal pattern characteristics of A-level scenic spots in Guizhou Province and the detection results of influencing factors, we put forward suggestions to strengthen the development of scenic spots in southern Guizhou Province and upgrade the development model of "point-axis network surface" to the current "two-axis multi-core" pattern of tourism development. This study can explain the current situation of the spatial development of tourist attractions in Guizhou Province, formulate a regulation mechanism of tourism development, and provide a reference for decision-making to boost the high-quality development of the tourist industry.

**Keywords:** A-level scenic spots; spatiotemporal evolution; trend analysis; Geodetector

**Citation:** Qiu, Y.; Yin, J.; Zhang, T.; Du, Y.; Zhang, B. Spatiotemporal Dynamic Analysis of A-Level Scenic Spots in Guizhou Province, China. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 568. https://doi.org/10.3390/ijgi10080568

Academic Editors: Andrea Marchetti, Angelica Lo Duca and Wolfgang Kainz

Received: 23 June 2021 Accepted: 20 August 2021 Published: 23 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

The planning and development of tourist attractions have become a key link to promote the growth of the local tourist economy. As the important material carrier of tourist supply, tourist scenic spots provide a material basis for the development of the regional tourism industry. Their projection in geographical space shows the spatial attributes and mutual relations of tourist activities, which influences and promotes the development of regional tourism resources and the tourist economy [1]. In order to strengthen the quality assessment and management of tourist scenic spots, the Chinese government has formulated the A-level scenic spot planning (http://zwgk.mct.gov.cn/zfxxgkml/zcfg/gfxwj/202012/t20201204\_906214.html, accessed on 16 June 2021). The grade is divided into A, AA, AAA, AAAA and AAAAA. A-level scenic spots refer to scenic spots that can receive tourists, have the functions of sightseeing and entertainment, and have a relatively complete management system. A-level scenic spots must have a visitor center, basic visitor services, tourist consultations, tourist complaints, and management of all kinds of tourist affairs within the service radius of the visitor center and the visitor center itself. The rating of the scenic spot includes eight aspects: tourist transportation (14%), sightseeing (21%), tourist safety (8%), health (14%), post and telecommunications services (3%), tourist shopping (5%), comprehensive management (19.5%), and protection of resources and environment (15.5%); the rating agency will give a score on each aspect [2]. According to the standard "Classification and Evaluation of Tourist Areas (Spots) Quality Grade" (GB/T 17775–1999), the grade of a participating scenic spot is determined by its scores under the "service quality and environment evaluation system", the "landscape quality evaluation system", and the "tourist opinion evaluation system" [2].

Guizhou Province is a distinctive mountain tourist area in Southwest China. Its unique karst landform and climate characteristics make tourist scenic spots diverse in Guizhou. In the Fourteenth Five-Year Plan for National Economic and Social Development of Guizhou Province and the Outline of the Vision of the Year 2035, it is proposed to actively promote the construction and upgrading of scenic spots [3]. Recently, owing to the booming of tourist industry in Guizhou, it is urgent to explore the development process of scenic spots, especially A-level scenic spots. The study of the scenic spots' spatial distribution and its influencing factors plays a positive role in formulating tourism planning, promoting traffic development, and alleviating the environmental pressure caused by tourism [4,5].

At present, scholars' studies on the evolution pattern of scenic spots mainly focus on the network of scenic spots [6], the spatio–temporal influence of scenic spots on tourists' behavior and emotion [7,8], demand prediction [9,10], optimization of tourists' tourism experience [11–13], landscape changes of scenic spots [14], environmental impact [15–20] and intelligent tourism [21]. Since the China Tourism Administration issued the "A-level scenic spot assessment standard" in 2002, how to rationally plan and develop the A-level scenic spots has become a hot spot for domestic scholars to study scenic spots. From the perspective of research content, it mainly focuses on the spatial planning of scenic spots [22], spatial structure and optimization [23–27], influence factors of scenic spot distribution [28–35], and so on. In terms of research methods, exploratory spatial data analysis and geographic detector were mainly applied. For example, Liu and Hao [5] researched the influencing factors of the spatial distribution evolution of scenic spots in Shanxi Province with the help of a geographic detector model. Peng and Huang [21] analyzed the popularity distribution of Beijing's scenic spots under different temporal and weather contexts. Li and Zhang [28] systematically sorted out the spatial distribution characteristics and influencing factors of 1010 scenic spots in the Yellow River Basin, China. Tang and Sun [29] explored the spatial layout of scenic spots in Beijing–Tianjin–Hebei urban agglomeration and its influencing factors by using spatial data analysis, the Gini coefficient, and geographical detector methods. Jia and Hu [34] used the average nearest neighbor index, kernel density analysis, and geographic detector to analyze the spatial distribution evolution and the influence mechanism of A-level scenic spots in the middle reaches of the Yangtze River through exploratory spatial data analysis and geographic detector. 
Lu and Zhang [35] explored the spatial distribution characteristics, differentiation trend, and driving mechanisms of A-level scenic spots. From the perspective of time scale, some studies analyzed the spatial evolution characteristics of scenic spot distribution from a single time node to a continuous-time point [23–36]. From the perspective of the study region, the studies covered the evolution of the spatial pattern of national A-level scenic spots at the national, urban agglomeration, provincial, and urban levels [23–36]. Liu and Wang [37] analyzed the spatial distribution characteristics of scenic spots in Guizhou Province, China, but did not examine the evolution of the spatial pattern of tourism scenic spots.

Guizhou is a province with unique tourism characteristics in China, and its tourism income accounts for a very large proportion of GDP. In recent years, the number of scenic spots in Guizhou Province has grown rapidly, but research on scenic spots in Guizhou is scarce. The current analysis methods for scenic spot distribution use one or more of exploratory spatial data analysis, density analysis, direction analysis, spatial coupling analysis, and influencing factor analysis, but lack a comprehensive analysis. Tourism is a complex spatial process, and multi-angle analysis is more conducive to the exploration of spatiotemporal dynamic processes.

Against this backdrop, we selected the A-level scenic spots in Guizhou, established a database of A-level scenic spots from 2005 to 2019, and explored the spatiotemporal distribution characteristics and influencing factors of the A-level scenic spots in Guizhou Province by using direction analysis, the Gini coefficient method, trend analysis, and the geographical detector. The aim is to reveal the evolution law of the scenic spots' distribution, clarify the driving mechanism of its spatial dynamic characteristics, and put forward suggestions for optimizing the layout of scenic spots, in order to provide decision support for upgrading A-level scenic spots and promoting the quality of regional tourism development in Guizhou Province.

#### **2. Materials and Methods**

#### *2.1. Data and Area*

Guizhou Province is located in the hinterland of Southwest China and comprises 88 counties and districts (Figure 1). Many ethnic minorities have lived in Guizhou for generations, and the ethnic culture is profound. Guizhou is an important transportation hub in Southwest China, and it is a world-famous mountain tourist destination with a livable climate, good ecological environment, and tourism conditions. Affected by the South Asian monsoon, Guizhou Province has distinct dry and wet seasons. It has a typical low-latitude plateau climate, warm and humid. Because of frequent cloud cover throughout the year, it has less sunshine and more cloudy days, an obvious rainy season, and abundant precipitation, and the rainy and hot periods are mostly concentrated in summer. The average annual precipitation is 682–1134 mm, and the average annual temperature is 14–16 °C [38]. By 2020, it had seven national 5A-level scenic spots, including famous scenic spots such as Huangguoshu Waterfall, Loong Palace, Zhenyuan Ancient Town, Qingyan Ancient Town, National Forest Park of Azalea, and Mount Fanjing.

The data of scenic spots mainly contained the distribution, rating, and geographical coordinates of A-level scenic spots in Guizhou Province from 2005 to 2019, and the influencing factors included tourist income, GDP, the river system, altitude, population distribution, and vegetation coverage rate. With the help of the Baidu API coordinate pickup system, particle coordinates of each scenic spot were calibrated as the spatial position of the scenic spot. Among them, A-level scenic spots in the Guizhou directory data mainly came from the culture and tourism section of the Guizhou hall official website (http://whhly.guizhou.gov.cn/, accessed on 16 June 2021); part of the scenic spot data was from the municipal state tourism administration network. The administrative boundaries, digital elevation, normalized difference vegetation index (NDVI), and population distribution came from the resources and environmental science and data center of the Chinese Academy of Sciences (http://www.resdc.cn, accessed on 16 June 2021). The river data and road data were from an open-source map website (http://www.openstreetmap.org, accessed on 16 June 2021); the tourism income data came from the macroeconomic database (http://hgk.guizhou.gov.cn/index.vhtml#, accessed on 16 June 2021).

**Figure 1.** Map of the counties located in Guizhou Province.

#### *2.2. Methodology*

In the study, the average nearest neighbor index and Gini coefficient were used to calculate the equilibrium degree of the spatial distribution of scenic spots, and the standard deviation ellipse and trend analysis were used to calculate the spatial distribution trend of scenic spots. Kernel density analysis was used to calculate the differences in the spatial distribution of scenic spots, and the factors affecting the spatial distribution were then analyzed with the geographical detector based on the kernel density analysis results. The workflow of the study is shown in Figure 2.

#### 2.2.1. Average Nearest Neighbor Index

In this study, A-level scenic spots in Guizhou Province were taken as point-like targets. The nearest neighbor index is a measurement method to measure the actual point-like distribution based on the condition of random distribution. The nearest neighbor analysis can determine the attributes of point pattern more accurately and objectively [5]. The clustering degree of A-level scenic spots in Guizhou each year was obtained by analyzing the data of A-level scenic spots over the years with the average nearest neighbor index.

$$H\_1 = \frac{\sum\_{i=1}^{n} x\_i}{n}, H\_2 = \frac{1}{2\sqrt{n/S}}, H' = \frac{H\_1}{H\_2} \tag{1}$$

where *n* represents the number of scenic spots, *x<sub>i</sub>* is the distance from scenic spot *i* to its nearest neighboring scenic spot, *S* represents the area of Guizhou Province, *H*<sub>1</sub> denotes the average nearest-neighbor distance of the scenic spots, and *H*<sub>2</sub> represents the theoretical nearest-neighbor distance under a random distribution. *H*′ is the ratio of *H*<sub>1</sub> to *H*<sub>2</sub>, that is, the average nearest neighbor index. When *H*′ > 1, the scenic spots tend to be evenly distributed. When *H*′ = 1, the distribution is random. When *H*′ < 1, the scenic spots are clustered.
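As an illustration of Equation (1), the following minimal sketch computes the three quantities for a set of planar points. The function name, the brute-force neighbor search, and the toy grid are assumptions made for this example; the study itself does not state which software computed the index.

```python
import math


def average_nearest_neighbor_index(points, area):
    """Return (H1, H2, H') for planar points and a study-area size.

    points: list of (x, y) tuples in a projected coordinate system
    area:   study-area size S in the squared unit of the coordinates
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # x_i: distance from point i to its nearest neighbor (brute force, O(n^2))
    nearest = []
    for i, (xi, yi) in enumerate(points):
        d_min = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest.append(d_min)
    h1 = sum(nearest) / n                    # observed mean nearest distance
    h2 = 1.0 / (2.0 * math.sqrt(n / area))   # expected distance if random
    return h1, h2, h1 / h2


# Hypothetical toy data: a regular 3 x 3 grid (spacing 10) in a 30 x 30 area
grid = [(x, y) for x in (5, 15, 25) for y in (5, 15, 25)]
h1, h2, h_ratio = average_nearest_neighbor_index(grid, area=900.0)
print(h1, h2, h_ratio)
```

For this regular grid the ratio is greater than 1, which corresponds to the evenly distributed case; tightly grouped points in the same area would give a ratio below 1.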

**Figure 2.** Workflow of the study.

#### 2.2.2. Gini Coefficient

The Gini coefficient was originally used as a common indicator to measure regional economic income differences, and was later improved by relevant scholars and applied to the measurement of geographic spatial distribution. In this study, the Gini coefficient algorithm proposed by Zhang [39] was used to ensure an accurate measurement of the spatial distribution balance of A-level scenic spots in the nine cities and prefectures of Guizhou Province. Following Liu [40], the Gini coefficient values of scenic spots can be divided into different equilibrium types of spatial distribution.

$$G = 1 - \frac{2\sum\_{i=1}^{d-1} W\_i + 1}{d} \tag{2}$$

where *d* is the number of cities and prefectures. *G* ranges from 0 to 1; the closer *G* is to 1, the less balanced the distribution of A-level scenic spots in Guizhou Province.
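A minimal sketch of Equation (2) follows. The text does not define *W<sub>i</sub>*; the sketch assumes it is the cumulative share of scenic spots held by the *i* regions with the fewest scenic spots, and it reads the upper summation limit as *d* − 1. The function name and the sample counts are hypothetical.

```python
def gini_from_counts(counts):
    """Gini coefficient G = 1 - (2 * sum_{i=1}^{d-1} W_i + 1) / d.

    counts: number of scenic spots in each of the d regions.
    W_i is ASSUMED to be the cumulative share of the i regions with the
    fewest scenic spots (regions sorted in ascending order).
    """
    d = len(counts)
    total = sum(counts)
    if d == 0 or total <= 0:
        raise ValueError("need at least one region and a positive total")
    shares = sorted(c / total for c in counts)
    cumulative = 0.0
    w_sum = 0.0
    for share in shares[:-1]:   # i = 1 .. d-1
        cumulative += share     # W_i
        w_sum += cumulative
    return 1.0 - (2.0 * w_sum + 1.0) / d


# Hypothetical counts for nine regions
even = gini_from_counts([10] * 9)                # perfectly balanced
concentrated = gini_from_counts([0] * 8 + [90])  # everything in one region
print(even, concentrated)
```

Under these assumptions a perfectly even distribution gives *G* = 0, while concentrating all scenic spots in one of *d* regions gives *G* = 1 − 1/*d*, which approaches 1 as *d* grows.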

#### 2.2.3. Kernel Density Analysis

The spatial distribution density of regional elements can clearly reflect their spatial dispersion or agglomeration characteristics and the change of this form. The spatial distribution density of regional elements is usually expressed by kernel density estimation method [24–27]. Kernel density clearly reflected the spatial dispersion and agglomeration characteristics of A-level scenic spots in Guizhou Province. Then, the evolution law of its characteristics was obtained by analyzing the annual kernel density of scenic spots. The analytical formula for kernel density is

$$f\_z(k) = \frac{1}{n\chi} \sum\_{i=1}^{n} h(\frac{k - K\_i}{\chi}) \tag{3}$$

where *n* represents the number of sample points, *h*() represents the kernel function, *χ* > 0 represents the bandwidth, and (*k* − *K<sub>i</sub>*) represents the distance from the estimation point *k* to the event *K<sub>i</sub>*. After repeated tests, a bandwidth of 3 km was selected to reflect the spatial distribution of tourism resources more intuitively.
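The idea behind Equation (3) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the quartic kernel is an assumed choice of *h*(), the coordinates are hypothetical, and the sum is divided by the squared bandwidth, the usual normalization for planar point data, rather than by the bandwidth as printed in Equation (3). Only the 3 km bandwidth is taken from the text.

```python
import math


def quartic_kernel(u):
    """2-D quartic kernel (an assumed choice); zero beyond one bandwidth."""
    return (3.0 / math.pi) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0


def kernel_density(k, events, bandwidth, kernel=quartic_kernel):
    """Density at location k estimated from the event locations K_i."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(events)
    total = 0.0
    for ex, ey in events:
        dist = math.hypot(k[0] - ex, k[1] - ey)  # distance from k to K_i
        total += kernel(dist / bandwidth)
    # Planar normalization: divide by n * bandwidth^2
    return total / (n * bandwidth ** 2)


# Hypothetical scenic-spot coordinates in km; bandwidth of 3 km as in the text
spots = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
near = kernel_density((0.0, 0.0), spots, bandwidth=3.0)
far = kernel_density((20.0, 20.0), spots, bandwidth=3.0)
print(near, far)
```

Locations surrounded by several events within one bandwidth receive a high density, and locations farther than one bandwidth from every event receive zero, which is why the bandwidth controls how smooth the resulting density map looks.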

#### 2.2.4. Direction Distribution Analysis

Direction distribution can reflect the degree of dispersion and evolution of the spatial distribution of scenic spots in both the time and space dimensions. The standard deviation ellipse, which is a common direction distribution analysis method, was employed to reflect the spatial distribution characteristics and the spatial distribution variation of research elements [25–30]. This method can reflect spatial characteristics such as the centrality, distribution, and directionality of the spatial distribution of scenic spots in each year by using index parameters such as the center, long axis, short axis, and azimuth of the standard deviation ellipse.
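The parameters named above (center, long axis, short axis, azimuth) can be derived from the covariance of the point coordinates. The sketch below is one common formulation and is an assumption for illustration; GIS packages may scale the axes differently, and the function name and toy points are hypothetical.

```python
import math


def standard_deviational_ellipse(points):
    """One-standard-deviation ellipse of planar points.

    Returns the mean center, the semi-major and semi-minor axis lengths,
    the azimuth of the major axis (degrees clockwise from north, 0-180),
    and the ellipse area.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the 2 x 2 covariance matrix give the axis variances
    mean_var = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(mean_var + spread)
    minor = math.sqrt(max(mean_var - spread, 0.0))
    # Angle of the major axis, counter-clockwise from the x (east) axis
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    azimuth = (90.0 - math.degrees(theta)) % 180.0
    return {
        "center": (cx, cy),
        "major": major,
        "minor": minor,
        "azimuth": azimuth,
        "area": math.pi * major * minor,
    }


# Hypothetical points stretched along the southwest-northeast diagonal
sde = standard_deviational_ellipse([(-2, -2), (2, 2), (-1, 1), (1, -1)])
print(sde)
```

In this toy case the major axis points northeast (azimuth of about 45 degrees) and is twice as long as the minor axis; comparing these values year by year is what reveals changes in direction and extent.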

#### 2.2.5. Geodetector

The geographical detector (Geodetector) was originally proposed by Wang [41] from a geographical perspective. This study used Wang's Geodetector model for the calculation [42,43]. The algorithm detects spatial differences in the influence of factors on the dependent variable [42–46].

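$$q = 1 - \frac{\sum\_{x=1}^{L} N\_x \sigma\_x^2}{N \sigma^2} \tag{4}$$

where the *q* value represents the influence degree of each detection factor on the distribution of A-level scenic spots in Guizhou; *L* represents the number of strata of the variable, that is, its classification or partition; *N<sub>x</sub>* and *N* represent the number of units in layer *x* and in the entire area, respectively; and *σ<sub>x</sub>*<sup>2</sup> and *σ*<sup>2</sup> are the variances of the dependent variable in layer *x* and in the entire area, respectively. The *q* value ranges from 0 to 1, and a larger value indicates a stronger influence of the factor.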
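The factor detector statistic in Equation (4) can be sketched in a few lines. The function name and the toy values are assumptions for illustration, and the denominator is taken as the total variance of the dependent variable over the entire area; the study used the Geodetector software rather than this code.

```python
def q_statistic(values, strata):
    """Geodetector q = 1 - sum_x(N_x * var_x) / (N * var).

    values: dependent variable for each spatial unit
    strata: class label of the explanatory factor for each unit
    """
    if len(values) != len(strata) or not values:
        raise ValueError("values and strata must be non-empty and equal length")
    n = len(values)
    mean_all = sum(values) / n
    ss_total = sum((v - mean_all) ** 2 for v in values)  # N * sigma^2
    if ss_total == 0:
        raise ValueError("dependent variable has zero variance")
    groups = {}
    for value, label in zip(values, strata):
        groups.setdefault(label, []).append(value)
    ss_within = 0.0
    for members in groups.values():
        mean_x = sum(members) / len(members)
        ss_within += sum((v - mean_x) ** 2 for v in members)  # N_x * sigma_x^2
    return 1.0 - ss_within / ss_total


# Hypothetical kernel-density values for four units
density = [1.0, 1.0, 5.0, 5.0]
q_strong = q_statistic(density, ["low", "low", "high", "high"])
q_none = q_statistic(density, ["a", "b", "a", "b"])
print(q_strong, q_none)
```

When the strata separate the dependent variable perfectly, the within-stratum variance vanishes and *q* equals 1; when the strata carry no information about it, *q* equals 0.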

#### *2.3. Data Preprocessing*

#### 2.3.1. Data of Scenic Spots

Based on the collected data, we built a spatial database containing the name, grade, county and city, geographic coordinates, and grade evaluation time of each A-level scenic spot in Guizhou from 2005 to 2019. Figure 3 shows the spatial distribution of A-level scenic spots in 2005, 2009, 2013, 2016, and 2019 as examples.

**Figure 3.** Spatial distribution of A-level scenic spots in Guizhou Province, (**a**) Spatial distribution of A-level scenic spots in 2005, (**b**) Spatial distribution of A-level scenic spots in 2009, (**c**) Spatial distribution of A-level scenic spots in 2013, (**d**) Spatial distribution of A-level scenic spots in 2016, (**e**) Spatial distribution of A-level scenic spots in 2019.

#### 2.3.2. Geodetector Data

Among the explanatory variables of the Geodetector model, tourism income and Gross Domestic Product (GDP) were statistical data for the 88 counties (districts) in Guizhou, while altitude, population density, river density, and highway network density were vector data. The kernel density of scenic spots in Guizhou Province over the years was the explained variable of the model. All data in the model were reclassified using the natural breaks method. Considering the long construction cycle of scenic spots, the dependent variables in the model were all data with one lag period.

#### **3. Results**

#### *3.1. Spatial Distribution Characteristics of A-Level Scenic Spots*

#### 3.1.1. A-Level Scenic Spots Development

According to the line chart in Figure 4, the number of A-level scenic spots in Guizhou Province increased from 6 to 406 during 2005–2019. The number increased little from 2005 to 2011, rising by only 35 in six years. From 2011 to 2015, the growth rate increased slowly; with the improvement of Guizhou's tourism policy, market, and system, the number of A-level scenic spots increased steadily. From 2016 to 2019, Guizhou Province explicitly proposed to strengthen the three "long boards" of big data, big ecology, and big tourism, and scenic spots showed "blowout" growth, increasing by 73, 122, and 80 in 2017, 2018, and 2019, respectively.

**Figure 4.** Changes in the number of A-level scenic spots in Guizhou Province (2005–2019).

#### 3.1.2. Evolution of Spatial Distribution Types

Table 1 shows the analysis results of the average nearest neighbor index of A-level scenic spots in Guizhou Province over the years. All *p* values were less than 0.05, that is, the results were significant at the 95% confidence level. From 2005 to 2019, the nearest neighbor index was less than 1, so the spatial distribution of scenic spots was clustered in every year. From 2005 to 2009, the nearest neighbor index increased gradually from 0.356 to 0.717. From 2009 to 2016, the index fluctuated in an "M" shape with an overall upward trend, reaching its highest value of 0.854 in 2012. From 2017 to 2019, the value changed only slightly and remained relatively stable at around 0.81.

**Table 1.** The nearest neighbor index of the average spatial distribution of A-level scenic spots in Guizhou Province.


#### 3.1.3. Equilibrium of Spatial Distribution

Table 2 shows the calculation results of the Gini coefficient. The spatial distribution equilibrium degree of A-level scenic spots in Guizhou has changed greatly, developing towards the "relative equality" type [40]. From 2005 to 2019, the Gini coefficient values showed a downward trend on the whole, but the coefficient values were all greater than 0.2 (below 0.2 is "absolute equality"), with a maximum of 0.741 in 2005 and a minimum of 0.215 in 2017. For the A-level scenic spots, the spatial distribution equilibrium degree was mainly "inequality" from 2005 to 2006, "relative inequality" from 2007 to 2011, "reasonable" from 2012 to 2016, and "relative equality" after 2017. The development process of the spatial distribution equilibrium degree could be divided into three periods: "great disparity", "relatively reasonable", and "relative equality".

**Table 2.** Gini coefficient calculation results of A-level scenic spots in Guizhou.


According to the line chart of the Gini coefficient of A-level scenic spots in Guizhou Province from 2005 to 2019 (Figure 5), the Gini coefficient dropped sharply in the three years from 2005 to 2007, which indicates that the regional gap in the spatial distribution of scenic spots was narrowing. From 2007 to 2011, the Gini coefficient was stable at around 0.55, and the balance of the spatial distribution of scenic spots did not improve significantly. In 2012, the Gini coefficient dropped below 0.4, reaching 0.359. From 2013 to 2016, the Gini coefficient was in the range of 0.3–0.4. In 2017, the Gini coefficient dropped to its lowest level over the years, reaching 0.215. From 2018 to 2019, the Gini coefficient rose and finally stabilized at 0.285. In general, the Gini coefficient of the spatial distribution of A-level scenic spots in the nine cities and prefectures of Guizhou Province showed a decreasing trend from 2005 to 2017 and an increasing trend from 2018 to 2019. Moreover, the Gini coefficient decreased significantly in 2012 and 2017. With the steady growth of the number of A-level scenic spots in Guizhou Province, the spatial distribution of the scenic spots gradually shifted from highly concentrated in 2005 to balanced development. However, due to the rapid growth of A-level scenic spots after 2012 and 2017, the spatial distribution of the scenic spots showed a slight degree of concentration.

**Figure 5.** Change process of the Gini coefficient (2005–2019).

#### *3.2. Spatial Distribution Evolution Processes of A-Level Scenic Spots*

#### 3.2.1. Density Change Process

The data of A-level scenic spots in Guizhou Province in 2005, 2009, 2013, 2016, and 2019 were selected for the density analysis. The density evolution of scenic spots in each of these years was investigated to reveal the evolution law of the spatial kernel density of A-level scenic spots in Guizhou Province. The result is shown in Figure 6.

**Figure 6.** Kernel density map of spatial distribution of A-level scenic spots in Guizhou Province, (**a**) Kernel density of spatial distribution of A-level scenic spots in 2005, (**b**) Kernel density of spatial distribution of A-level scenic spots in 2009, (**c**) Kernel density of spatial distribution of A-level scenic spots in 2013, (**d**) Kernel density of spatial distribution of A-level scenic spots in 2016, (**e**) Kernel density of spatial distribution of A-level scenic spots in 2019.

According to Figure 6, the kernel density of A-level scenic spots in Guizhou changed markedly over time. In 2005, there were three high-density areas in Guizhou Province, located in Guiyang City, Zunyi City, and the junction of Zunyi and Bijie, and the kernel density of the scenic spots was relatively low on the whole. In 2009, the high-density areas of A-level scenic spots expanded. Compared with 2005, two high-density areas were added at the junction of Anshun and Liupanshui and at the junction of Qiannan and Qiandongnan Prefectures, while the high-density area at the junction of Zunyi and Bijie disappeared. In 2013, the main core density areas in Guiyang City and Zunyi City were still expanding. Compared with 2009, three high-density areas were added in Bijie, Tongren, and Qianxinan Prefecture. At this time, high-density areas of scenic spots appeared in all nine cities and prefectures in Guizhou Province. In 2016, one main core area and three secondary core areas appeared, and the Zunyi, Qiannan, and Anshun core areas also gradually formed. Other high-density areas expanded significantly, and the overall spatial pattern of "one axis and multiple cores" was formed. The "one axis" was located in the central part of Guizhou, spanning Anshun, Guiyang, Qiannan, and Qiandongnan. By 2019, the spatial distribution of A-level scenic spots in Guizhou Province had formed a "two-axis, multi-core" pattern. One axis was located in the north of Guizhou Province, ranging from Bijie to Zunyi. The other axis crossed the middle of Guizhou along "Anshun–Guiyang–Qiannan–Qiandongnan". The core areas were mainly distributed in Liupanshui, Xingyi, Qiandongnan, and Zunyi. The southern and eastern parts of Guizhou Province were mostly low-density areas.

#### 3.2.2. Directional Distribution

Figure 7 and Table 3, respectively, represent the standard deviation ellipse plot and its attribute table after directional distribution analysis. According to Figure 7, the overall spatial distribution of A-level scenic spots in Guizhou Province showed the obvious southwest to northeast trend. On the whole, the coverage of the ellipse tended to expand. This trend was more obvious after 2007. The standard deviation ellipses in 2005 and 2006 showed significant morphological differences compared with other years. The ellipse range of standard deviation in 2005 includes parts of Guiyang, Zunyi, Bijie, and Qiannan Prefecture. In 2019, it overlapped with some areas of all nine cities in Guizhou Province. In 2005, the standard deviation ellipse was located in the central part of Guizhou Province, its main axis was in the north–south direction, which indicated that the scenic spot expanded and developed more greatly in the north–south direction than in the southeast and northwest directions. In 2006, the shape of the standard deviation ellipse was close to the circle, which indicated that the expansion and development direction of the scenic spot was relatively uniform. From 2017 to 2019, the ellipse centered on Guiyang, the capital of Guizhou Province, and expanded mainly along the east–west direction. According to the migration map of the ellipse center (Figure 7), the migration scope of the center was small from 2005 to 2019, and the centers were almost located in Guiyang.

According to the results in Table 3, the standard deviation ellipse area increased significantly in 2006, 2007, and 2009: the area in 2006 was 132.27% larger than in 2005, the area in 2007 was 21.16% larger than in 2006, and the area in 2009 was 34.49% larger than in 2008. From 2005 to 2019, the standard deviation ellipse area increased year by year, except that there was no change in 2011. The area was smallest in 2005, at 9121.892 km<sup>2</sup>, and reached its maximum of 70589.659 km<sup>2</sup> in 2019, an increase of 61467.767 km<sup>2</sup> over the period. From the point of view of the scenic spot distribution center, the central location moved among Xiuwen County, Kaiyang County, Wudang District, and Baiyun District of Guiyang City. In most years from 2005 to 2019, the standard deviation ellipse center was located in Kaiyang County.

**Figure 7.** Standard deviation ellipse of the spatial distribution of A-level scenic spots in Guizhou Province.


**Table 3.** Calculation results of the standard deviation ellipse.

According to Figure 7 and Table 3, the standard deviation ellipse area was smallest in 2005, when the scenic spots were distributed along the north–south direction, and it grew in the following years. This showed that the development scope of scenic spots in Guizhou Province was increasing. Since 2007, the development direction of scenic spots gradually formed a trend of extending along the southwest–northeast direction while dispersing along the northwest–southeast direction. Both the short axis and the long axis of the standard deviation ellipse showed a "growing trend in fluctuations" during the process of change from 2005 to 2019.
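The quantities discussed above (ellipse center, axis lengths, orientation, and area) can be illustrated with a minimal sketch. It assumes projected point coordinates in consistent units and derives a one-standard-deviation ellipse from the eigen-decomposition of the coordinate covariance matrix. The function name `standard_deviation_ellipse` is hypothetical, and GIS packages may apply additional scaling to the axes, so the absolute areas need not match those in Table 3.

```python
import numpy as np

def standard_deviation_ellipse(xy):
    """Return (center, major, minor, azimuth_deg, area) for 2D points.

    Assumption: a one-standard-deviation ellipse without any software-specific
    axis correction; xy is an (n, 2) array of projected coordinates.
    """
    xy = np.asarray(xy, dtype=float)
    center = xy.mean(axis=0)
    # Population covariance of the centered coordinates (2 x 2 matrix).
    cov = np.cov((xy - center).T, bias=True)
    # eigh returns eigenvalues in ascending order: minor axis first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    minor, major = np.sqrt(np.clip(eigvals, 0.0, None))
    # Orientation of the major axis, measured clockwise from north (the y axis).
    vx, vy = eigvecs[:, 1]
    azimuth_deg = np.degrees(np.arctan2(vx, vy)) % 180.0
    area = np.pi * major * minor
    return center, major, minor, azimuth_deg, area

# Illustrative points only (not data from the study).
points = [(-2, -2), (2, 2), (-1, 1), (1, -1)]
center, major, minor, azimuth_deg, area = standard_deviation_ellipse(points)
```

Comparing `area` between two years gives the kind of relative change reported in the text, and `azimuth_deg` near 45 corresponds to a southwest–northeast orientation.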

#### 3.2.3. Spatial Differentiation Characteristics

Trend surface analysis can intuitively show the general trend of the number of A-level scenic spots in the spatial layout of each county (district) in Guizhou Province [47]. The results are shown in Figure 8, where the *x*- and *y*-axes point to the east and north directions, respectively, and the *z*-axis represents the number of A-level scenic spots in each county of Guizhou Province.

**Figure 8.** The spatial distribution trend surface analysis plots of A-level scenic spots in Guizhou, (**a**) 2010, (**b**) 2015, (**c**) 2019.

In the east–west direction, the A-level scenic spots in 2010 and 2015 presented an inverted U-shaped distribution, and the growth rate of the number of scenic spots was greater in the central counties. In 2019, the distribution of A-level scenic spots tended to be uniform, and the number of A-level scenic spots in western counties such as Xingyi, Panzhou, and Shuicheng increased significantly. The trend curve in the east–west direction was flatter than that in the north–south direction. In the north–south direction, the trend curve in 2010 was relatively flat, with its central part slightly higher. In 2015, the distribution of scenic spots showed a "parabola" that was higher in the north and lower in the south. The growth rates of scenic spots in northern counties such as Xishui, Renhuai, and Chishui were greater. In 2019, the distribution of scenic spots showed a "concave" pattern with a high level in the north and a low level in the south, but the overall number of scenic spots was significantly higher than that of 2015, and the growth rate of the number of scenic spots in the south was higher than that in the north. On the whole, in the east–west direction, the trend curve of county-level A-level scenic spots in Guizhou Province gradually changed from a relatively steep inverted U-shaped curve to a gentle curve that was high in the west and low in the east. In the north–south direction, the trend curve gradually changed from a gentle curve to a steep one, and finally formed a concave curve that was high in the north and low in the south.
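The east–west and north–south trend curves described above can be sketched as polynomial fits of the county counts against each coordinate separately. This is a minimal illustration that assumes a second-order polynomial and county centroid coordinates; the function names `fit_trend` and `describe_curve` are hypothetical, and the order actually used for Figure 8 is not stated in this section.

```python
import numpy as np

def fit_trend(coord, counts, order=2):
    """Least-squares polynomial of counts against one coordinate.

    Returns coefficients with the highest power first (np.polyfit convention).
    """
    coord = np.asarray(coord, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return np.polyfit(coord, counts, order)

def describe_curve(coeffs, tol=1e-9):
    """Classify a second-order trend curve by the sign of its quadratic term."""
    c2 = coeffs[0]
    if c2 < -tol:
        return "inverted U"
    if c2 > tol:
        return "U-shaped"
    return "approximately linear"

# Illustrative 3 x 3 grid of county centroids (not data from the study).
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)  # easting
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)  # northing
z = 5.0 - (x - 1.0) ** 2 + 0.5 * y                      # counts per county

east_west = fit_trend(x, z)    # projection onto the x-z plane
north_south = fit_trend(y, z)  # projection onto the y-z plane
```

Here `describe_curve(east_west)` reports an inverted U, while the north–south fit is a rising line, which mirrors how the text reads the two projected curves.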

#### *3.3. Factors Influencing the Spatial Distribution of A-Level Scenic Spots*

The spatial distribution of A-level scenic spots in Guizhou Province changed significantly from 2005 to 2019. By the end of 2019, there had been 18 scenic spots selected into the fifth batch of the Chinese national representative catalog of intangible cultural heritage.

The distribution of scenic spots is mainly affected by natural and cultural factors. Both natural resources and cultural resources are important driving forces for the development and construction of scenic spots, and the density of scenic spots in resource-rich areas will increase accordingly. The study was based on the perspective of time and space, and summarized the evolution law of the quantitative and spatial differentiation characteristics of A-level scenic spots in Guizhou Province. In addition, we discussed the coupling relationship between natural and human factors and the distribution of scenic spots. On the basis of detecting the impact of various factors on the development and construction of scenic spots in Guizhou Province, we put forward some suggestions for optimizing decision-making on the development planning and layout of scenic spots in Guizhou.

#### 3.3.1. Coupling Analysis of Natural/Human Elements and Scenic Spots Distribution

The distribution of the A-level scenic spots is greatly affected by the factors of topography and altitude, and the scattered topography can create a stronger visual impact and appreciation. Water and vegetation are also important elements of scenic spots, and different water and vegetation landscapes create natural scenic spots with different characteristics. Guizhou Province is located in the Yunnan–Guizhou Plateau in Southwest China with an average altitude of 1100 m. More than 50% of the area is karst landform. The unique climate of "one mountain has four seasons and ten miles with different days" has become one of the natural advantages for the development of regional scenic spots. Based on the terrain slope of Guizhou (Figure 9a), the central and northern parts of the terrain are relatively gentle and the surrounding terrain is relatively steep. There was no obvious coupling phenomenon between the newly added scenic spots and the slope area in 2019. Guizhou is the main birthplace of the Yangtze River Basin and the Pearl River Basin, with a dense river network and broad watershed, and the distribution of A-level scenic spots is highly coincident with the 3 km buffer zone of the rivers (Figure 9b). The NDVI was high in the southeast and northern regions and low in the central and western regions in 2018 (Figure 9c). The A-level scenic spots in Guizhou in 2018 and the new A-level scenic spots in 2019 were mainly distributed in areas with high NDVI value.

The development of the regional economy is the basis for promoting the development of tourism and is also a powerful guarantee for strengthening the construction of tourism infrastructure. From 2005 to 2019, the GDP of Guizhou Province increased from 2005 to 16,769 hundred million yuan (about 200.5 to 1676.9 billion yuan). Rapid economic development promotes the rapid development of tourism, and the share of tourism income has made the tourist industry one of the indispensable key industries for promoting the economic development of Guizhou Province. By 2019, the total tourism income of Guizhou Province had jumped to third place in China, and the added value of tourism income had increased to 11.6% of the province's GDP. The spatial distribution of scenic spots was closely related not only to topography, rivers, and economic development; population distribution and the road network were also important factors. In this study, GDP, population, and traffic were selected to conduct coupling analysis with the distribution of A-level scenic spots. Based on the spatial distribution chart of GDP in 2018 (Figure 9d), the areas with high GDP value in Guizhou Province mainly appeared in Guiyang and Zunyi. However, there was no significant coupling between GDP and either the A-level scenic spots in 2018 or the new A-level scenic spots in 2019. The areas with high population in 2018 were mostly concentrated in central and northern Guizhou, and the A-level scenic spots in 2018 and the new A-level scenic spots in 2019 were also mostly concentrated in the same areas (Figure 9e). It can be seen in Figure 9f that the railway network mainly runs through Guizhou along the east–west direction and extends northward and westward from the centre of Guizhou. The railway density in southern Guizhou is relatively low. There was a high spatial distribution correlation between A-level scenic spots and the railway network.
The expressway network in Guizhou Province is more evenly distributed, although it is denser in the middle and north. The areas of Guizhou Province with concentrated population and dense railway and expressway networks coincide with the areas with dense A-level scenic spots. Building A-level scenic spots in areas with high road network accessibility also makes the scenic spots themselves more accessible, which improves the travel time efficiency of tourists and becomes one of the advantages in attracting them.

**Figure 9.** The typical factors influencing the spatial distribution of A-level scenic spots in Guizhou, (**a**) Slope, (**b**) Rivers, (**c**) NDVI, (**d**) GDP, (**e**) Population, (**f**) Traffic.

According to the topography of Guizhou Province, the elevation is divided into four grades: below 791 m, 792–1169 m, 1170–1682 m and 1683–2885 m (Table 4). From 2015 to 2019, the number of A-level scenic spots at the four different elevations increased. In the areas with an elevation below 791 m, the proportion of scenic spots decreased from 2015 to 2016, then, the number of scenic spots increased gradually from 2016 to 2019. In the areas with an elevation of 792–1169 m, the proportion of A-level scenic spots decreased slowly year by year. In the areas with an elevation of 1170–1682 m, the proportion of A-level scenic spots increased gradually. In the areas with an elevation of 1683–2885 m, the number of A-level scenic spots increased firstly and then decreased, showing an inverted U-shaped downward trend. Generally, the areas with an altitude of less than 791 m and 1170–1682 m gradually became the preferred areas for the construction of A-level scenic spots. The development degree of A-level scenic spots in areas with an altitude of 792–1169 m and 1683–2885 m decreased.


**Table 4.** Distribution of A-level scenic spots at different elevations.
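The per-class proportions summarized in Table 4 can be reproduced in principle by binning each scenic spot's elevation into the four grades named above. The following is a minimal sketch under the assumption that one elevation value (in metres) is available per scenic spot; the names `BREAKS`, `LABELS`, and `elevation_class_shares` are hypothetical and not taken from the study.

```python
import numpy as np

# Upper bounds (m) of the first three elevation grades named in the text.
BREAKS = [791, 1169, 1682]
LABELS = ["<=791", "792-1169", "1170-1682", "1683-2885"]

def elevation_class_shares(elevations_m):
    """Share of scenic spots falling into each elevation grade."""
    elev = np.asarray(elevations_m, dtype=float)
    if elev.size == 0:
        raise ValueError("at least one elevation value is required")
    # right=True makes each upper bound inclusive, e.g. 791 < x <= 1169 -> grade 1.
    idx = np.digitize(elev, BREAKS, right=True)
    counts = np.bincount(idx, minlength=len(LABELS))
    return dict(zip(LABELS, counts / counts.sum()))

# Illustrative elevations only (not data from the study).
shares = elevation_class_shares([500, 791, 800, 1169, 1200, 1682, 1700, 2885])
```

Running the function once per year and comparing the resulting shares gives the kind of year-on-year comparison described in the paragraph above.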

#### 3.3.2. Analysis of Detection Factor Interaction Results

Coupling analysis cannot quantify the influence of the various factors on the distribution of scenic spots. A geographical detector model [41] was therefore used to explore the influence mechanism of the spatial distribution of A-level scenic spots from 2013 to 2019. The *p* values of all influencing factors in the measurement results were less than 0.01, which means that all factors passed the significance test. Table 5 shows that six detection factors influenced the development and construction of A-level scenic spots. The explanatory power (*q*) of road network density and tourism income averaged over 20%, that of GDP and altitude averaged around 10%, and that of population density and river density was small, averaging under 5%.

**Table 5.** Detection results of the spatial evolution influencing factors of A-level scenic spots in Guizhou Province.
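The explanatory power *q* reported in Table 5 is the factor detector statistic of the geographical detector, which compares the within-stratum variance of the response with its total variance, q = 1 − Σ<sub>h</sub> N<sub>h</sub>σ<sub>h</sub><sup>2</sup> / (Nσ<sup>2</sup>). The sketch below is a minimal illustration, assuming the response is a scenic spot density per spatial unit and that each continuous factor has already been discretized into strata; the function name `geodetector_q` and the sample values are hypothetical.

```python
import numpy as np

def geodetector_q(y, strata):
    """Factor detector q-statistic: 1 - (within-strata variance sum) / (total variance sum).

    y      : response per spatial unit (for example, scenic spot density)
    strata : class label of the explanatory factor for each unit
    """
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    total = y.size * y.var()  # N * sigma^2, using the population variance
    if total == 0:
        raise ValueError("response has zero variance; q is undefined")
    within = 0.0
    for label in np.unique(strata):
        group = y[strata == label]
        within += group.size * group.var()  # N_h * sigma_h^2
    return 1.0 - within / total

# Illustrative values only: a factor whose classes separate the response perfectly.
q_perfect = geodetector_q([1, 1, 1, 5, 5, 5], [0, 0, 0, 1, 1, 1])
# A factor whose classes carry no information about the response.
q_none = geodetector_q([1, 5, 1, 5], [0, 0, 1, 1])
```

A value of *q* near 1 means the factor's strata explain almost all of the spatial variance, while a value near 0 means they explain almost none. The significance test mentioned in the text is not covered by this sketch.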


Both the density of the road network and tourism income have a great influence on the spatial layout of scenic spots. The factor explanatory power of road network density increased year by year from 2013 to 2018, reaching more than 20% after 2015, and reached its highest value in 2018, which was 32.5%. The explanatory power of the tourism income factor from 2013 to 2019 was above 20%, and its influence degree fluctuated slightly with the year and finally decreased. Tourism income feeds into local economic growth.

The influence of GDP on the distribution of A-level scenic spots followed an inverted U-shaped pattern, increasing from 2013 to 2017 and decreasing from 2017 to 2019. The influence of altitude on the distribution of A-level scenic spots followed a V-shaped pattern: the explanatory power of the factor reached its highest value, 10.3%, in 2013, declined significantly from 2014 to 2015, and began to rise from 2016.

The influence of population density on the spatial distribution of A-level scenic spots showed a fluctuating upward trend. The explanatory power of population density showed a W-shaped fluctuation from 2013 to 2017 and changed to stabilize after 2017. The influence of river density on the spatial distribution of A-level scenic spots in Guizhou showed a fluctuating upward trend. From 2013 to 2015, the impact was relatively stable. The impact was significantly higher in 2016 than in 2015. The impact began to decline in 2017.

During the study period, the influences of altitude, river density, and population density on the spatial distribution of A-level scenic spots in Guizhou Province were relatively stable. The *q* values of road network density and GDP increased significantly, indicating that the influence of these two factors on the distribution of A-level scenic spots grew significantly. The influence of tourism income on the distribution of A-level scenic spots decreased slightly. Compared with other detection factors, road network density and tourism income had a higher influence on the spatial distribution of A-level scenic spots.

#### **4. Discussion and Suggestions**

#### *4.1. Discussion*

From 2005 to 2019, the number growth of A-level scenic spots in Guizhou Province can be divided into three stages: the early development period from 2005 to 2010, the moderate development period during 2011–2015, and the rapid development period after 2016. According to the calculation through the average nearest neighbor index, the spatial distribution of A-level scenic spots in Guizhou Province has been concentrated for many years. During China's 13th Five Year Plan period, Guizhou Province launched the integration policy of 'big tourism', 'big data' and 'big ecology'. The number of scenic spots has ushered in a blowout development, and tourism income has also achieved considerable growth. Tourism development in Southern Guizhou still lags behind relatively.

The Gini coefficient of the spatial distribution of A-level scenic spots in Guizhou Province showed a significant downward trend from 2005 to 2017 and increased slightly from 2017 to 2019. This shows that the distribution of scenic spots was moving towards equality from 2005 to 2017 and was closest to absolute equality in 2017, whereas from 2017 to 2019 it tended towards inequality, with the characteristics of small-scale concentration.

Based on the spatial distribution kernel density of A-level scenic spots in Guizhou Province from 2005 to 2019, the number and scope of high-density areas in the scenic spots increased year by year, forming the spatial distribution characteristics of "one axis and two cores" in 2017 and "two axes and multiple cores" in 2019. According to the standard deviation ellipse of the distribution of A-level scenic spots in Guizhou Province, the size of the ellipse increased year by year, and Guiyang was always in the center of the ellipse. The long axis direction of the standard deviation ellipse changed significantly from 2005 to 2007, and the long axis direction from 2007 to 2019 was mainly southwest to northeast.

In the east–west direction, the trend curve of the number of county-level scenic spots in Guizhou Province was a gentle curve, high in the west and low in the east. In the north–south direction, the trend curve was a concave curve, high in the north and low in the south. The distributions of rivers and scenic spots showed a coupling phenomenon. The distributions of slope, NDVI, and scenic spots showed a significant coupling phenomenon. The population distribution, the road network, and the scenic spot distribution were highly correlated. In recent years, with the continuous improvement of China's GDP, the government's investment in transportation and tourism has promoted the development of regional tourism and the construction of scenic spots [48–50]. At the same time, the development of tourism promotes the development of transportation and the regional economy [48–50]. It has also driven GDP growth and local investment.

With the help of Geodetector, it was found that the road network density and tourism income had a strong impact on the distribution of A-level scenic spots. The density of road network will directly affect the accessibility of scenic spots, thus affecting the tourism planning of tourists in selecting scenic spots, and then affect the maintenance income and brand effect of scenic spots. Tourism income will stimulate local attention and investment in the tourist industry, and further affect the construction and development of local scenic spots. The influence of altitude on the distribution density of scenic spots in the early years was stronger than that in the later years. The possible reason is that in the early years, the construction of scenic spots with high altitude was difficult and relatively inexperienced, while in the later years, with the progress of technology, the construction difficulty was no longer a large problem in determining the construction of scenic spots. With the growth and change of the regional economy, the mode of economic growth will gradually promote the tourism industry. Therefore, the impact of GDP on the distribution of scenic spots is increasing year by year. The population density is mainly affected by the city, and scenic spots are mainly used as a tourist destination for non-local visitors. The rivers are distributed widely and evenly in Guizhou Province. So, the impact of population density and river distribution on the spatial distribution of scenic spots is relatively weak.

#### *4.2. Suggestions*

Based on the analysis of the spatiotemporal evolution characteristics and influencing factors of A-level scenic spots in Guizhou Province, combined with the regional resource endowment, we put forward suggestions for the development, construction, and layout optimization of scenic spots in Guizhou Province.

According to the evolution characteristics of the spatial distribution of A-level scenic spots, the lagging development of scenic spots in the south of Guizhou is the important problem existing in the development of scenic spots in Guizhou Province. Traffic conditions are one of the main factors affecting scenic spot planning. However, the road network density is small in the south of Guizhou. Therefore, strengthening the traffic construction of Qianxinan, Qiannan, and Qiandongnan can effectively improve the regional accessibility of the three prefectures, strengthen the convenience for tourists in the southern scenic spots, promote the growth of local tourism, and promote the development and construction of scenic spots in the region. The tourism resource endowment in the southern region has not been effectively developed. There are abundant river valleys with excellent water resources. Using river resources to build hydrological scenic spots can be one of the effective ways to develop southern scenic spots.

The construction of A-level scenic spots is an important starting point for tourism development, regional coordination, and urban–rural integration. Therefore, Guizhou should make full use of its regional resource advantages to optimize the layout of the scenic spots. At present, A-level scenic spots in Guizhou present a spatial layout of "two axes and multiple cores". Guiyang has always been the core of scenic spot planning and development. Guizhou can take advantage of the "two axes and multiple cores" pattern, combined with the "point, axis and network" development mode, and rely on its big data technology platform to expand the radiation scope of each A-level scenic spot. This would strengthen inter-regional tourism industry cooperation, remove regional tourism market barriers, promote the development of global tourism in Guizhou Province, and finally realize the effective allocation and rational utilization of tourism resources.

#### **5. Conclusions**

This paper studied the temporal and spatial evolution characteristics of A-level scenic spots in Guizhou Province from 2005 to 2019, including spatial distribution, density, balance degree, temporal change trend, and direction characteristics, and analyzed the natural and human factors influencing the scenic spots' distribution qualitatively and quantitatively.

Overall, the A-level scenic spots in Guizhou Province have shown a good development trend in recent years. However, the development in southern Guizhou province is less optimistic. The rapid growth in the number of A-level scenic spots led to small-scale agglomeration in spatial distribution from 2017 to 2019. Guiyang has always been the center of A-level scenic spots planning in Guizhou Province. The kernel density distribution of A-level scenic spots in Guizhou Province forms the "two-axis, multi-core" layout. The road network density, tourism income, and GDP had a higher influence on the A-level scenic spots distribution. As time goes by, the influence of terrain height on scenic spot construction was gradually reduced. The area with an altitude of 1170 m to 1682 m has gradually become the first choice for the construction of scenic spots in Guizhou Province. Because of the unique terrain and water system in Guizhou Province, population distribution and rivers have little impact on the planning and construction of A-level scenic spots. Finally, we have provided some suggestions for scenic spot layout optimization in Guizhou Province on the basis of the perspective of regional resource endowment and scenic spot spatial layout.

**Author Contributions:** Conceptualization, Jian Yin; methodology, Jian Yin and Yuanhong Qiu; validation, Jian Yin, Yuanhong Qiu and Ting Zhang; formal analysis, Yuanhong Qiu; investigation, Jian Yin; resources, Ting Zhang and Bin Zhang; data curation, Yuanhong Qiu; writing—original draft preparation, Yuanhong Qiu and Yiming Du; writing—review and editing, Jian Yin; visualization, Jian Yin and Yuanhong Qiu. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by the MOE (Ministry of Education in China) Liberal Arts and Social Sciences Foundation (Grant No. 19YJCZH228), and the Scientific Research Project of Guizhou University of Finance and Economics (2020ZXSY08). The authors are grateful to the reviewers for their help and thought-provoking comments.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** We thank the editors and the anonymous reviewers for their valuable comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Spatiotemporal Evolution and Trend Prediction of Tourism Economic Vulnerability in China's Major Tourist Cities**

**Chengkun Huang <sup>1,†</sup>, Feiyang Lin <sup>2,†</sup>, Deping Chu <sup>1,3,</sup>\*, Lanlan Wang <sup>1</sup>, Jiawei Liao <sup>1</sup> and Junqian Wu <sup>4</sup>**


**Abstract:** The evaluation and trend prediction of tourism economic vulnerability (TEV) in major tourist cities are necessary for formulating tourism economic strategies scientifically and promoting the sustainable development of regional tourism. In this study, 58 major tourist cities in China were taken as the research object, and an evaluation index system of TEV was constructed from two aspects of sensitivity and adaptive capacity. On the basis of the entropy weight method, TOPSIS model, obstacle diagnosis model, and BP neural network model, this study analyzed the spatiotemporal patterns, obstacle factors, and future trends of TEV in major tourist cities in China from 2004 to 2019. The results show three key findings: (1) In terms of spatiotemporal patterns, the TEV index of most of China's tourist cities has been on the rise from 2004 to 2019. Cities throughout the coast of China's Yangtze River Delta and the Pearl River Delta urban agglomeration show high vulnerability, whereas low vulnerability has a scattered distribution in China's northeast, central, and western regions. (2) The proportion of international tourists out of total tourists, tourism output density, urban industrial sulfur dioxide emissions per unit area, urban industrial smoke and dust emission per unit area, and discharge of urban industrial wastewater per unit area are the five major obstacles affecting the vulnerability degree of the tourism economy. (3) According to the prediction results of TEV from 2021 to 2030, although the TEV of many tourist cities in China is increasing year by year, cities with low TEV levels occupy the dominant position. Research results can provide reference for tourist cities to prevent tourism crises from occurring and to reasonably improve the resilience of the tourism economic system.

**Keywords:** tourism economic vulnerability; spatiotemporal evolution; obstacle factors; trend prediction; major tourist cities

**Citation:** Huang, C.; Lin, F.; Chu, D.; Wang, L.; Liao, J.; Wu, J. Spatiotemporal Evolution and Trend Prediction of Tourism Economic Vulnerability in China's Major Tourist Cities. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 644. https://doi.org/10.3390/ijgi10100644

Academic Editors: Andrea Marchetti, Angelica Lo Duca and Wolfgang Kainz

Received: 22 July 2021 Accepted: 21 September 2021 Published: 25 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## **1. Introduction**

The concept of "vulnerability" originated from natural science research; it is used to characterize the ability of a system or system combination to withstand and recover from risk events [1]. In the early stages, the vulnerability concept was mainly applied to the assessment of natural disasters such as floods and droughts or ecosystems such as forests and coasts [2–5]. With the gradual integration and penetration of the natural and social systems, the interaction between the natural environment and human social activities has become increasingly obvious [6], and the relevant research on vulnerability has gradually extended to the social and economic fields [7,8]. As one of the important components of the concept and connotation of vulnerability, economic vulnerability refers to the bearing capacity of the regional economy due to the impact of unexpected events in the process of development [9,10]. Economic vulnerability was first proposed by Briguglio in the 1990s and has been gradually deepened in subsequent studies [11]. At present, it has become an important indicator to measure whether the development of a regional or urban economic system is healthy and stable [12].

With the rapid development of China's social economy and the improvement of people's living standards, tourism has been gradually positioned as a "strategic pillar industry and modern service industry," playing an increasingly important role in regional economic development. However, as a typical sensitive industry, tourism will be greatly impacted by financial crises [13], political conflicts [14], social disturbances [15], public health events [16], and natural disasters [17] in the context of the integration of global trade of services. This is especially true in areas where economic development is highly dependent on tourism; although these areas have gained huge profits through the vigorous development of tourism, the instability of tourism will inevitably bring about regional economic shocks, and regional economic development is generally vulnerable to hidden worries [18]. Nowadays, with the increasing role of tourism in national political communication, economic development, and residents' well-being, as well as the pursuit of regional sustainable development goals, the research on TEV is receiving growing attention [19,20].

TEV refers to the inherent property wherein the structure and function of the tourism economy system are easily damaged due to the restriction of its own property and the inability to adapt to various disturbances inside and outside the system [21]. TEV is usually divided into two types: "endogenous" and "exogenous". Endogenous vulnerability is formed under the constraints of certain economic systems and tourism resources and cannot be eliminated by conscious actions, such as policy combining [22,23]. Exogenous vulnerability is a result of "non-systemic causes" from the external environment, such as earthquakes, public health events, financial crises, and social disturbances, which have contingent and sudden characteristics [22,23]. In general, the literature on TEV mainly focused on the following two aspects: (1) The analysis of TEV under the impact of crisis events; such studies focus on the impact of some emergencies on the tourism economy from the perspective of crisis management and take the impact degree of the crisis as the basis for assessing vulnerability. For example, Huang et al. analyzed the long-term impact of the Wenchuan Earthquake on inbound tourists in Sichuan and found a significant increase in inbound tourists after the earthquake, with a "blessing in disguise" effect [24]. Pham et al. used the tourism satellite account approach and tourism CGE model to effectively measure the changes and impacts of COVID-19 on the core and related industries of Australia's inbound tourism [25]. In addition, the recovery and development of the tourism economy in the context of crisis events is also an important research topic [26]. Gurtner used the case of Bali to illustrate that after a tourism crisis, the government, industry, community, and other tourism stakeholders need to strengthen cooperation and adopt a wide range of new strategies to deal with the changing destination environment and potential challenges in the future [27]. Raki et al. 
discussed the role of active and proactive tourism recovery strategies in improving the well-being of tourists, improving the profitability of companies, and reducing employee turnover under the impact of COVID-19 [28]. (2) Assessment of the TEV of typical tourist destinations; this kind of research focuses on the evaluation of tourism economic system shock resistance of various types of tourism destinations. Research on islands, countries, typical tourism cities, national regions, and other traditional tourist destinations is prioritized using the entropy weight method, TOPSIS model, obstacle degree model, and geographical detectors and comprehensive quantitative analysis of vulnerability degree; research contents include TEV measurement, spatiotemporal pattern evolution, and influence factors [29–33].

It can be seen from the above analysis that the existing literature is still mostly limited to discussing the TEV of individual typical tourist destinations. However, with the rapid development of China's tourism industry, a global analysis of TEV in major tourist cities on a national scale is urgently required to optimize the regional pattern of tourism development. In addition, the existing literature usually measures regional TEV in previous years but lacks predictive research on the regional TEV in the future. Such discussion is more conducive to grasping the evolutionary trend of TEVs in order to rationally plan relevant strategies to reduce the regional TEV. In view of this, 58 major tourist cities in China were selected as study areas for this paper. Our objectives were as follows: (1) Clarify the spatiotemporal evolution of TEV in major tourist cities in China. (2) Explore the main obstacles affecting TEV in major tourist cities in China. (3) Forecast the evolution trend of TEV of major tourist cities in China in the next 10 years.

In this study, we first assessed the level of TEV of each city from 2004 to 2019 based on the case studies of 58 major tourist cities in China, using the entropy weight method and TOPSIS model. Then, the obstacle diagnosis model was used to analyze the obstacle factors affecting TEV in major tourist cities in China. Finally, the BP neural network model was used to predict the evolutionary trend of TEV in major tourist cities in China in the future. The research conclusions are of great significance for the detailed understanding of TEV and the future evolutionary trend of major tourist cities in China in the context of high-quality development. These results can provide a reference for regional tourism crisis prevention and effectively enhance the resilience of the urban tourism economy.

The structure of this study can be divided into five parts. The first part is the Introduction, which introduces the research background, research objectives, existing research results, and the value of this research. The second part is the Materials and Methods, which establishes the evaluation index system of TEV, explains the data sources, and introduces the application logic of research methods. The third part is the Results, which expounds on the spatiotemporal evolution of the TEV of major tourist cities in China, the obstacle factors affecting the TEV, and the future evolutionary trend of the TEV. The fourth part is the Discussion, which summarizes the spatiotemporal characteristics and future evolutionary trends of TEV in major tourist cities in China, and puts forward countermeasures to improve the resilience of the urban tourism economy. The fifth part is the Conclusion, which shows the highlights of the results and limitations of the study.

#### **2. Materials and Methods**

#### *2.1. Study Area*

A tourist city considers tourism development to be an important goal that has a prominent function after a certain period of accumulation [34]. The Yearbook of China Tourism Statistics has recorded long-term tracking statistics on the tourism development of 60 major tourist cities in China. However, because statistical data for Yanbian and Lhasa are lacking, this study selected the remaining 58 cities as research objects, as shown in Figure 1. These major tourist cities not only have prominent tourism functions, evident progress in tourism construction, and high popularity at home and abroad; they also differ widely in urban population and economic scale, cover a wide range of regions, and represent diverse urban types. These characteristics make them suitable for exploring the urban TEV.

#### *2.2. Research Framework*

Figure 2 shows the implementation framework of this study, which mainly includes three steps. First, on the basis of relevant research, the evaluation index system of TEV was constructed from the two dimensions of sensitivity and adaptive capacity. Second, the data needed for this study were collected from various statistical yearbooks of China. Finally, the econometric correlation model and spatial visualization methods were used to present the research results.

**Figure 1.** Spatial distribution of 58 major tourist cities in China.

#### *2.3. Index System Construction*

Polsky et al. constructed the vulnerability assessment system of "exposure, sensitivity, and adaptive capacity" in 2007, which provided a solid theoretical basis for vulnerability research [35]. After that, scholars in different fields continued to apply and expand the vulnerability theory model, and the two dimensions of "sensitivity and adaptive capacity" have gradually been taken as the core dimensions of the vulnerability assessment of tourism [23,36]. Sensitivity refers to the degree to which a system is susceptible to damage in the case of internal disorder and external impact [35]. The weaker the sensitivity is, the less vulnerable a system is to damage. Adaptive capacity refers to the ability of a system to quickly adjust from a crisis situation to a safe and stable situation [35]. The stronger the adaptive capacity is, the stronger the self-maintenance ability of a system and its ability to quickly recover from adverse effects. The interaction between sensitivity and adaptive capacity determines the vulnerability of the tourism economic system. When the tourism economy has a high vulnerability, it indicates that the tourism economy has a poor anti-crisis ability, which reduces the speed at which the tourism economy can recover to a stable state; otherwise, the economic system is more secure.

As the sensitivity and adaptive capacity of the tourism economy are multiple structural variables, they involve complex economic environmental factors. To reflect the degree of TEV of major tourist cities in China in a comprehensive way, the evaluation index system proposed in this study was constructed as follows. First, the construction methods and contents involved in the existing research on the index system of TEV were fully utilized for reference [21,23,32,36]. Second, the accessibility of the data of each indicator was ensured. Finally, the index system can be applied to different types of tourist cities in China. On the basis of the above considerations, this study combined the basic elements of the tourism industry, social economy, finance, infrastructure construction, and ecological environment, and a total of 27 indicators from the two aspects of sensitivity and adaptive capacity were selected to construct an evaluation index system for the TEV of major tourist cities in China. Table 1 shows the specific indicators.


#### **Table 1.** Index system of TEV.

In terms of sensitivity, TEV is not only affected by the core elements within the tourism industry; it is also closely related to the external elements of the tourism industry. According to the viewpoints of scholars, sensitivity is positively correlated with TEV; that is, the higher the sensitivity, the higher the TEV, and vice versa [23,37]. Therefore, all sensitivity indicators were treated as positive indicators. In this study, the sensitivity index of TEV was mainly constructed from two levels, industry core elements (S1–S6) and industry-related elements (S7–S10), comprising 10 specific indicators. Among them, S1, S2, S5, and S6 were mainly used to reflect the dependence of urban economic development on the tourism industry; due to the instability of the tourism industry, the higher the dependence proportion, the higher the vulnerability of the urban tourism economy. S3 and S4 mainly reflect the dependence of the urban tourism industry on inbound tourism development. Inbound tourism has many potential uncertainties and is more susceptible to various unexpected factors than domestic tourism. Therefore, the higher the dependency ratio, the higher the vulnerability of the urban tourism economy. S7, S8, and S9 mainly reflect the level of environmental quality of the tourist destination; the higher the pollution level, the higher the vulnerability of the urban tourism economy. S10 mainly reflects the employment situation of tourist cities; if the unemployment rate is higher, it indicates that urban economic development is at a low stage, and the vulnerability of the urban tourism economy is higher.

In terms of adaptive capacity, when the urban tourism economic system is impacted, the development potential of the urban tourism industry and the construction level of the city in terms of economy, ecology, and public services are particularly important for coping with the crisis. According to the viewpoints of scholars, adaptive capacity is negatively correlated with TEV; that is, the higher the adaptive capacity, the lower the TEV, and vice versa [23,37]. Therefore, all adaptive capacity indicators were treated as negative indicators. In this study, the indicators of the adaptive capacity of the tourism economic system were mainly constructed from four aspects of urban tourism, namely industrial potential, economic vitality, environmental protection, and public service, comprising 17 specific indicators. A1 and A2 reflect the growth capacity of the regional tourism industry and the attraction of urban tourism, respectively; the higher the growth rate of the total tourism income and the total number of tourists received, the stronger the adaptive capacity of the tourism economy and the lower the TEV. A3, A4, A5, and A6 objectively reflect the city's overall economic strength and economic development potential. The higher the GDP per capita, GDP growth rate, fixed asset investment per capita, and per capita year-end deposit balance of financial institutions, the higher the level of urban economic development, and the lower the TEV [36]. A7, A8, A9, and A10 reflect the environmental protection level of the city; the more green space and the higher the garbage and sewage treatment rate, the stronger the risk-response ability of the city's tourism economy and the lower the TEV. A11, A12, A13, A14, A15, A16, and A17 reflect the city's public service levels in terms of postal services, medical services, transportation, and communication; the better the public service level, the stronger the city's ability to deal with tourism emergencies and the lower the corresponding TEV.

#### *2.4. Data Sources*

The data sources of this study mainly include the following two aspects. First, data on the economy, environment, and public services of 58 major tourist cities in China, from 2004 to 2019, mainly came from the China City Statistical Yearbook (CCSY). CCSY is an annual publication reflecting the social and economic development of Chinese cities. Each issue contains major statistics on the social and economic development of Chinese cities at all levels in the previous year. Detailed statistics of the development data of 58 major tourist cities in China can be found in CCSY. Where data could not be found in the CCSY, the Statistical Yearbooks (SY) of each tourist city were searched to supplement them. Second, data on the tourism industry and other aspects of 58 major tourism cities in China, from 2004 to 2019, mainly came from the SY of each city, the Statistical Communique of National Economic and Social Development (SCNESD), and the Yearbook of China Tourism Statistics (YCTS). In addition, in order to enhance comparability, some of the data were processed by secondary calculations. The data sources for each case city are detailed in Table A1 (Appendix A).

#### *2.5. Research Methods*

#### 2.5.1. The Weights of Indicators Were Calculated by the Entropy Weight Method

As an objective weight assignment method, the entropy weight method determines the weight based on the variation degree of the data, which can effectively eliminate the interference of human factors and has strong objectivity and reliability. In view of this, this study used the entropy weight method to calculate the weight of each of the 27 indicators in the TEV index system. The formula for each step is as follows [38,39]:

(1) Set the original evaluation matrix as:

$$X = (x_{it})_{m \times n} \tag{1}$$

In the formula, *xit* represents the original value of the *t*-th index in the *i*-th sample; *i* = 1, 2, ... , *m*, where *m* is the sample number; *t* = 1, 2, ... , *n*; *n* is the number of indicators. It should be noted that the sample number *m* in this study is 928, which is composed of 16 years of data (2004–2019) for 58 major tourist cities in China. In addition, the number of indicators *n* in this study is 27, and they are the indicators in Table 1.

(2) Standardize the above original evaluation matrix to form a standardized matrix:

$$Y = (y_{it})_{m \times n} \tag{2}$$

where *yit* represents the standardized value of the *t*-th indicator in the *i*-th sample. Among them, the positive indicators are *yit* = (*xit* − *xmin*)/(*xmax* − *xmin*), and the negative indicators are *yit* = (*xmax* − *xit*)/(*xmax* − *xmin*).

(3) Use the entropy weight method to obtain the weights of indicators. The specific calculation formula is as follows:

$$w_t = (1 - E_t) / \left( n - \sum_{t=1}^{n} E_t \right) \tag{3}$$

$$p_{it} = y_{it} / \sum_{i=1}^{m} y_{it} \tag{4}$$

$$E_t = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{it} \ln p_{it} \tag{5}$$

In the formula, *i* is the sample reference, and *t* is the indicator reference. *pit* represents the feature proportion, *Et* represents the information entropy, and *wt* represents the weight of the *t*-th indicator.
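The standardization and weighting steps can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the function names, the toy matrix `X`, and the handling of constant columns and of zero proportions (treated as 0 · ln 0 = 0) are assumptions introduced here. Sums over samples run to *m* and sums over indicators run to *n*, following the definitions given for Equation (1).

```python
import math

def standardize(X, positive):
    """Min-max standardization; positive[t] is True for positive indicators."""
    m, n = len(X), len(X[0])
    Y = [[0.0] * n for _ in range(m)]
    for t in range(n):
        column = [X[i][t] for i in range(m)]
        x_min, x_max = min(column), max(column)
        span = x_max - x_min
        for i in range(m):
            if span == 0:
                Y[i][t] = 0.0  # assumption: a constant indicator is mapped to 0
            elif positive[t]:
                Y[i][t] = (X[i][t] - x_min) / span
            else:
                Y[i][t] = (x_max - X[i][t]) / span
    return Y

def entropy_weights(Y):
    """Entropy weights w_t from a standardized m-by-n matrix (m >= 2)."""
    m, n = len(Y), len(Y[0])
    entropies = []
    for t in range(n):
        total = sum(Y[i][t] for i in range(m))
        if total == 0:
            entropies.append(1.0)  # assumption: no variation, no information
            continue
        e = 0.0
        for i in range(m):
            p = Y[i][t] / total          # feature proportion p_it
            if p > 0:                    # convention: 0 * ln(0) is taken as 0
                e -= p * math.log(p)
        entropies.append(e / math.log(m))  # information entropy E_t
    redundancy = [1.0 - e for e in entropies]
    total_redundancy = sum(redundancy)     # equals n minus the sum of E_t
    return [d / total_redundancy for d in redundancy]

# Hypothetical toy data: 3 samples, 2 indicators (first positive, second negative).
X = [[10.0, 0.2], [20.0, 0.4], [40.0, 0.3]]
positive = [True, False]
Y = standardize(X, positive)
w = entropy_weights(Y)
```

In the study itself the matrix would have 928 rows and 27 columns; the sketch does not guard against the degenerate case in which every indicator has entropy 1.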

2.5.2. TOPSIS Model Was Used to Calculate the Values of Sensitivity, Adaptive Capacity, and TEV

On the basis of the indicator weights, the TOPSIS model can be used to calculate the value of each evaluation object. The calculation principle of the TOPSIS model is to calculate the distance between each evaluation object and the optimal (worst) solution, and then determine the relative closeness of the evaluation object to the ideal solution, so as to calculate the evaluation value. In this study, it was used to calculate the annual value of the TEV of 58 major tourist cities in China from 2004 to 2019, as well as the values of sensitivity and adaptive capacity, the two components of TEV. The formula of the TOPSIS model used in the calculation of sensitivity, adaptive capacity, and TEV is the same; the only difference is the set of indicators included. The calculation formula for each step of the TOPSIS model is as follows [38,39]:

(1) Construct the weighting matrix:

$$S = Y \times W_t \tag{6}$$

where *Y* is the matrix obtained after standardized processing in the entropy weight method mentioned above, and *Wt* is the weight of indicator *t*.

(2) Determine the optimal solution *S*<sub>*t*</sub><sup>+</sup> and the worst solution *S*<sub>*t*</sub><sup>−</sup> for the *t*-th indicator:

$$\begin{array}{l} S_t^+ = \max \{ S_{1t}, S_{2t}, \dots, S_{mt} \} \\ S_t^- = \min \{ S_{1t}, S_{2t}, \dots, S_{mt} \} \end{array} \tag{7}$$

(3) Calculate the Euclidean distances between each evaluation object and the optimal and worst solutions. *i* is the sample reference, and *t* is the indicator reference:

$$R_i^+ = \sqrt{\sum_{t=1}^{n} \left(S_t^+ - S_{it}\right)^2}; \quad R_i^- = \sqrt{\sum_{t=1}^{n} \left(S_t^- - S_{it}\right)^2} \tag{8}$$

(4) Calculate the proximity *Ci*:

$$C_i = \frac{R_i^-}{R_i^+ + R_i^-} \tag{9}$$

In the formula, *i* is the sample reference. The *Ci* value is within (0, 1). The higher the value of *Ci* is, the better the evaluation object, and vice versa.

It should be noted that when calculating sensitivity, the indicator *t* in the formula contains S1–S10, a total of 10 indicators. When calculating adaptive capacity, the indicator *t* in the formula includes A1–A17, a total of 17 indicators. When calculating TEV, the indicator *t* in the formula includes S1–S10 and A1–A17, a total of 27 indicators.
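The four TOPSIS steps can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `topsis`, the toy inputs, and the choice to return 0 when both distances vanish are assumptions made here.

```python
import math

def topsis(Y, w):
    """Proximity C_i for each row of a standardized matrix Y with weights w."""
    m, n = len(Y), len(Y[0])
    # Step (1): weighted matrix S_it = y_it * w_t
    S = [[Y[i][t] * w[t] for t in range(n)] for i in range(m)]
    # Step (2): optimal and worst value of each indicator
    best = [max(S[i][t] for i in range(m)) for t in range(n)]
    worst = [min(S[i][t] for i in range(m)) for t in range(n)]
    proximity = []
    for i in range(m):
        # Step (3): Euclidean distances to the optimal and worst solutions
        r_plus = math.sqrt(sum((best[t] - S[i][t]) ** 2 for t in range(n)))
        r_minus = math.sqrt(sum((worst[t] - S[i][t]) ** 2 for t in range(n)))
        # Step (4): relative proximity
        denom = r_plus + r_minus
        proximity.append(r_minus / denom if denom > 0 else 0.0)  # assumption for 0/0
    return proximity

# Hypothetical standardized values for 3 samples and 2 equally weighted indicators.
Y = [[0.0, 1.0], [1.0 / 3.0, 0.0], [1.0, 0.5]]
w = [0.5, 0.5]
C = topsis(Y, w)
```

To obtain the sensitivity or adaptive capacity value instead of the overall TEV, the same function would be called with only the S1–S10 or A1–A17 columns and their weights.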

#### 2.5.3. The Main Factors Affecting TEV Were Detected by the Obstacle Diagnosis Model

The obstacle diagnosis model can effectively analyze and identify the obstacles that affect the development level of the regional system elements and has been widely used in many fields. In this study, the obstacle diagnosis model was introduced to clarify two questions. First, which of the 27 indicators have a major impact on TEV. Second, how the obstacle factors affecting TEV differ among cities. The formula is as follows [40]:

$$M_{it} = \frac{R_t \times P_{it}}{\sum_{t=1}^{n} (R_t \times P_{it})} \times 100\% \tag{10}$$

In the formula, *i* is the sample reference, and *t* is the indicator reference. *Mit* is the obstacle degree of the *t*-th indicator to the TEV in the *i*-th sample; *Rt* is the weight of each indicator, representing the contribution degree of the obstacle factors. *Pit* = 1 − *yit* represents the deviation between indicators and development goals, and *yit* is the standardized value of each indicator. In addition, it should be noted that there may be deviations in evaluation results caused by accidental factors in a single year. Therefore, for the 58 major tourist cities in China, the diagnosis results of the obstacle factors were averaged over the 16 years from 2004 to 2019.
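Equation (10) and the multi-year averaging can be sketched as below. The function names, the indicator labels `I1`–`I3`, and all numbers are hypothetical and only illustrate the arithmetic; obstacle degrees are returned as fractions that sum to 1, which correspond to percentages after multiplying by 100.

```python
def obstacle_degrees(y_row, R):
    """Obstacle degree M_it of every indicator for one sample (fractions)."""
    deviation = [1.0 - y for y in y_row]              # P_it = 1 - y_it
    contribution = [r * p for r, p in zip(R, deviation)]
    total = sum(contribution)
    if total == 0:                                    # assumption: no deviation at all
        return [0.0] * len(R)
    return [c / total for c in contribution]

def mean_obstacle_degrees(rows, R):
    """Average the yearly obstacle degrees of one city over all its years."""
    yearly = [obstacle_degrees(row, R) for row in rows]
    return [sum(d[t] for d in yearly) / len(yearly) for t in range(len(R))]

def top_factors(degrees, names, k=5):
    """Return the k indicators with the largest obstacle degree."""
    ranked = sorted(zip(names, degrees), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical city with 3 indicators observed in 2 years.
R = [0.5, 0.3, 0.2]
rows = [[0.2, 0.5, 0.9], [0.4, 0.5, 0.7]]
names = ["I1", "I2", "I3"]
mean_M = mean_obstacle_degrees(rows, R)
ranking = top_factors(mean_M, names, k=2)
```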

#### 2.5.4. The Evolution Trends of TEV Were Predicted by the BP Neural Network Model

(1) Model setting

The BP neural network, also known as the error back-propagation neural network, has developed into the most important and widely used artificial neural network algorithm owing to its advantages such as flexible structure design, multiple training algorithms, and good operability [41]. The BP neural network is a multilayer feedforward network with an input layer, several hidden layers, and an output layer (Figure 3). Neurons are connected by links, each of which has a weight. Weights are the basic parameters of the neural network, and artificial neurons learn by constantly adjusting these weights. Building a neural network involves the following steps [42]. The first is selecting the network architecture; the second is deciding what kind of learning algorithm to use. Finally, the neural network is trained, which involves initializing the weights of the network and changing the weight values through a series of training steps.

**Figure 3.** Architecture of the BP neural network.

#### (2) Model building

The BP neural network with a three-layer structure was adopted. The input variable is the year corresponding to TEV index, the middle is the hidden layer, and the output variable is TEV index. The number of neurons in the hidden layer was determined by experiments. According to the number of neurons in the input layer and the output layer, the number of neurons in the hidden layer was tentatively determined as 8–12. By comparing the prediction errors of different hidden layer networks, the number of hidden layer neurons was finally set as 10.

#### (3) Initial data processing and parameter setting

To prevent neurons from reaching the saturation state, the sample data were first normalized. MATLAB programming was used to normalize the sample data to the interval of 0–1, according to the positive and negative properties of the indicators. These data were taken as the input, and the standardized TEV was taken as the output data to form a training sample for the BP neural network. The prediction result is best when the transfer function of the hidden layer is an S-shaped tangent function and the transfer function of the output layer is a linear function. Considering that the function trainlm converges quickly and the training error of the network is relatively small, the LM algorithm was selected for training; the maximum number of training iterations was set to 1000, the target error was set to 0.0001, and the learning rate was set to 0.01.

#### (4) Model training and testing

The data of 58 major tourist cities in China, from 2004 to 2019, were trained separately. During the training, the sample data were randomly divided into two groups according to the proportions of 80% and 20% and used as training and test data, respectively. Figure 4 shows the regression accuracy of the neural network model. The correlation coefficient *R*<sup>2</sup> of the test samples was higher than 0.95, and the average error rate was 1.49%, showing a good fitting effect. Therefore, the neural network model can be used to better predict the TEV of China's major tourist cities from 2021 to 2030.

**Figure 4.** Accuracy tests of the BP neural network model.
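The three-layer structure described above (one input, 10 hidden neurons with a tangent-type transfer function, one linear output) can be illustrated with a small pure-Python sketch. It is not the authors' MATLAB model: it swaps the Levenberg–Marquardt algorithm (`trainlm`) for plain full-batch gradient descent, and the TEV series, the random seeds, and all function names are assumptions made for illustration only.

```python
import math
import random

def predict(params, x):
    """Forward pass: tanh hidden layer, linear output layer."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

def mse(params, xs, ys):
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_bp(xs, ys, hidden=10, lr=0.01, epochs=1000, goal=1e-4, seed=0):
    """Gradient-descent back-propagation; returns parameters and MSE history."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        g_w1, g_b1, g_w2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        g_b2, squared_error = 0.0, 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            squared_error += err * err
            g_b2 += err
            for j in range(hidden):
                g_w2[j] += err * h[j]
                back = err * w2[j] * (1.0 - h[j] ** 2)  # error propagated backwards
                g_w1[j] += back * x
                g_b1[j] += back
        history.append(squared_error / n)  # MSE before this update
        if history[-1] <= goal:
            break
        step = lr * 2.0 / n                # gradient of the mean squared error
        for j in range(hidden):
            w1[j] -= step * g_w1[j]
            b1[j] -= step * g_b1[j]
            w2[j] -= step * g_w2[j]
        b2 -= step * g_b2
    return (w1, b1, w2, b2), history

# Hypothetical TEV series for one city, 2004-2019 (not data from the study).
years = list(range(2004, 2020))
tev = [0.15 - 0.004 * k + 0.0005 * k * k for k in range(len(years))]

# Normalize inputs and outputs to the interval 0-1.
xs = [(year - 2004) / 15.0 for year in years]
y_min, y_max = min(tev), max(tev)
ys = [(v - y_min) / (y_max - y_min) for v in tev]

# Random 80%/20% split into training and test samples.
indices = list(range(len(years)))
random.Random(0).shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

params, history = train_bp([xs[i] for i in train_idx], [ys[i] for i in train_idx])
train_mse = mse(params, [xs[i] for i in train_idx], [ys[i] for i in train_idx])
test_mse = mse(params, [xs[i] for i in test_idx], [ys[i] for i in test_idx])

# Forecast for 2021, mapped back to the original scale.
forecast_2021 = predict(params, (2021 - 2004) / 15.0) * (y_max - y_min) + y_min
```

Gradient descent converges far more slowly than Levenberg–Marquardt, so this sketch is not expected to reach the accuracy reported for the MATLAB model; it only shows how the forward pass, error back-propagation, and train/test split fit together.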

#### **3. Results**

#### *3.1. Spatiotemporal Evolution of Urban TEV*

3.1.1. The Evaluation of Each Indicator Weight in Urban TEV

In this study, the weights of the 27 indicators were calculated by using the entropy weight method, and the calculation results are shown in Table 2. Among them, S1–S10 are the indicators of the "sensitivity" part of the TEV, and A1–A17 are the indicators of the "adaptive capacity" part of the TEV.

**Table 2.** The Weight of each indicator.


#### 3.1.2. Spatiotemporal Changes of Urban TEV

Figure 5 shows the sensitivity dimension of the urban TEV. During the period from 2004 to 2007, cities with high sensitivity values were mainly distributed in economically developed regions such as Shanghai, Suzhou, Shenzhen, Zhuhai, and Guangzhou in China's Yangtze River Delta and Pearl River Delta. In addition, Tianjin, located in Northern China, has a high sensitivity value. During the period from 2008 to 2011, the sensitivity values of all cities in this stage were basically similar to those in the previous stage, and only a few cities' sensitivity values changed. For example, Shanghai was added as one of the cities with the highest sensitivity values, whereas the sensitivity of Tianjin declined in this stage. During the period from 2012 to 2015, among the cities with high sensitivity values, the value of Zhuhai declined, whereas the value of Taiyuan rose sharply and became one of the cities with the highest sensitivity values. In addition, the sensitivity values of Huangshan, Qinhuangdao, and other cities rose to a higher level. During the period from 2016 to 2019, the cities with the highest sensitivity were Shanghai, Xiamen, Zhangzhou, and Shenzhen, among which the sensitivity value of Zhangzhou increased the most. In addition, compared with the previous stage, the sensitivity of Chongqing has also been greatly improved.

**Figure 5.** Spatiotemporal evolution of the sensitivity index: (**a**) average sensitivity index for 2004–2007; (**b**) average sensitivity index for 2008–2011; (**c**) average sensitivity index for 2012–2015; (**d**) average sensitivity index for 2016–2019.

Figure 6 shows the dimension of adaptive capacity of the urban TEV. During the period from 2004 to 2007, Harbin, Jilin, Chengde, Xining, Luoyang, Chongqing, Changsha, Huangshan, Guiyang, Quanzhou, and Nanning had the highest adaptive capacities. During the period from 2008 to 2011, the adaptive capacity of all cities as a whole declined significantly. Meanwhile, at this stage, Urumqi, Xining, Wenzhou, Guiyang, Beihai, Shantou, and other cities had the highest adaptive capacity. During the period from 2012 to 2015, the adaptive capacity of all cities decreased further on the whole. At this stage, the adaptive capacity of Lianyungang, Nanjing, and Xining were at their highest level. Compared with the previous stage, the adaptive capacity of many cities decreased significantly. During the period from 2016 to 2019, the adaptive capacity of all cities decreased further compared with the previous period. Only Harbin and Nanjing had a high level of adaptive capacity. The adaptive capacity of Lianyungang, Nanning, Xining, and other cities decreased significantly in this stage.

**Figure 6.** Spatiotemporal evolution of the adaptive capacity index: (**a**) average adaptive capacity index for 2004–2007; (**b**) average adaptive capacity index for 2008–2011; (**c**) average adaptive capacity index for 2012–2015; (**d**) average adaptive capacity index for 2016–2019.

As shown in Figure 7, during the period from 2004 to 2007, cities with a high TEV were mainly distributed in the economically developed areas along the eastern coast of China, including Shanghai, Shenzhen, Zhuhai, Guangzhou, Zhongshan, and Tianjin. During the period from 2008 to 2011, Shanghai, Shenzhen, and Zhuhai still had the highest TEV, while the TEVs of the tourist cities in central and western China generally decreased. During the period from 2012 to 2015, the coastal cities of Shanghai, Xiamen, and Shenzhen had the highest TEV. The TEV of Ningbo, Fuzhou, Quanzhou, Zhangzhou, Shantou, and other coastal cities decreased. The TEVs in the central and western regions of China generally changed little, and only the TEV in Taiyuan increased. During the period from 2016 to 2019, the TEV in Jilin, Chengde, Datong, Chengdu, Chongqing, Guilin, Beihai, and other cities increased significantly compared with the previous stage.

**Figure 7.** Spatiotemporal evolution of the TEV index: (**a**) average TEV index for 2004–2007; (**b**) average TEV index for 2008–2011; (**c**) average TEV index for 2012–2015; (**d**) average TEV index for 2016–2019.

#### *3.2. Obstacle Factor Diagnosis of TEV*

Considering the space limitation, we only screened out the top five main obstacle factors of each city for display and explanation (Table 3).

Table 3 shows that among the obstacle factors of the TEV of China's major tourist cities, S3 (proportion of international tourists out of total tourists), S6 (tourism output density), S8 (urban-industrial sulfur dioxide emissions per unit area), S9 (urban-industrial smoke and dust emission per unit area), and S7 (discharge of urban industrial wastewater per unit area) are the five indicators with the highest occurrence frequency. Thus, these were the five most critical factors affecting TEV values. In all 58 major tourist cities in China, S3 was the greatest obstacle factor affecting TEV, and its obstacle degree was above 0.2 in every city except Shenzhen, Taiyuan, and Zhuhai. The second obstacle factor of most cities was S6, and the obstacle degree was between 0.1234 and 0.1650. However, the second obstacle factor of a few cities, such as Guangzhou, Xiamen, Shanghai, and Shenzhen, was S9, and the obstacle degree was between 0.1276 and 0.1362. The third obstacle factors affecting TEV in the 58 major tourist cities were S8 and S9. Among these 58 cities, S8 was the third obstacle factor in 27 cities, and the obstacle degree was between 0.0989 and 0.1308. By comparison, S9 was the third obstacle factor in 31 cities, and the obstacle degree was between 0.1109 and 0.1276. The fourth obstacle factor affecting TEV was similar to the third obstacle factor, and the fourth obstacle factors in most cities were mainly S8 and S9. In 27 cities, the fourth obstacle factor was S8, and the obstacle degree ranged from 0.1033 to 0.1252. In 22 cities, the fourth obstacle factor was S9, and the obstacle degree ranged from 0.1124 to 0.1259. In addition, heterogeneity was observed in some cities. For example, the fourth obstacle factor in Guangzhou, Xiamen, and Shanghai was S6, with the obstacle degree being between 0.0886 and 0.1228, whereas the fourth obstacle factor in Guiyang, Luoyang, Ningbo, Qinhuangdao, Shenzhen, and Taiyuan was S7, with an obstacle degree between 0.0987 and 0.1116. The fifth obstacle factor of 49 cities was S7, and the obstacle degree ranged from 0.0838 to 0.1196. In addition, the distribution of the fifth obstacle factor in other cities was scattered, among which the fifth obstacle factor in Guiyang, Luoyang, Ningbo, and Taiyuan was S8, and the obstacle degree ranged from 0.1034 to 0.1097. The fifth obstacle factor in Suzhou and Wuxi was S8, and the obstacle degrees were 0.0892 and 0.0957, respectively. The fifth obstacle factor in Xiamen was S10 (urban registered unemployment rate), and the obstacle degree was 0.0720. The fifth obstacle factor in Shenzhen was S6, and the obstacle degree was 0.0952. The fifth obstacle factor in Qinhuangdao was S9, and the obstacle degree was 0.1077.

#### *3.3. Prediction of the Evolution Trend of Urban TEV in the Next 10 Years*

In this study, the vulnerability indices of the tourism economy from 2004 to 2019 were taken as sample data and imported into the trained network model to obtain the vulnerability values of the tourism economy from 2021 to 2030; Figure 8 shows the results. During the period of 2021–2030, although the TEV of many major tourist cities in China increases year by year, the cities with low TEV levels still occupy the dominant position. In this period, the cities with high TEV levels will be Shenzhen, Xiamen, Shanghai, and Zhuhai. These cities are all located in the eastern coastal zone of China, and the average values of their TEV will be 0.2911, 0.2621, 0.2510, and 0.2092, respectively. Low-level TEV cities are mostly concentrated in the northeast and western regions of China, such as Yinchuan, Lanzhou, Harbin, and Hohhot, and the average TEV are 0.0310, 0.0483, 0.0513, and 0.0531, respectively. In general, the TEV of high-level and low-level regions differ greatly, indicating that the TEV of major tourist cities in China have strong spatial heterogeneity during this period. The cities with high TEV are mostly distributed in the Yangtze River Delta and Pearl River Delta urban agglomerations along the eastern coast of China, whereas the cities with low TEV are scattered in the northeast, central, and western regions of China. This spatial feature is similar to the existing situation explored above.

**Figure 8.** Predicted index of TEV in 2021–2030.



#### **4. Discussion**

#### *4.1. Internal Logic of Spatiotemporal Evolution of TEV*

The tourism industry is highly sensitive due to location variability, complexity, and comprehensiveness. Under the influence of various factors, such as economy, society, and nature, TEV has formed significant regional differences [43]. During the study period, the cities with high TEV values were mainly distributed in the eastern region of China, with Shanghai, Shenzhen, Zhuhai, and other economically developed cities as typical representatives. These cities are located in the center of China's economy, with convenient transportation and frequent business and trade exchanges at home and abroad. Owing to the large scale of the regional tourism industry and the large number of inbound tourists, the regional tourism economy faces a higher risk of external impact to some extent and thus presents strong vulnerability [44].

The cities with low TEV in China are widely distributed in the northeast, central, and western regions, and they are characterized by a contiguous distribution. First, Harbin, Jilin, and Changchun in northeast China are important old industrial bases. In recent years, the development speed of tourism has been slow compared with that of other regional central cities with developed tertiary industries. In addition, due to the remote geographical location, fewer long-distance tourists, weak ability to earn foreign exchange in tourism, and low dependence on the tourism industry, the cities show a low TEV [36]. Second, cities such as Urumqi, Yinchuan, and Lanzhou are located in the underdeveloped areas in the west of China; thus, the level of social and economic development is relatively weak. In addition, the status of the local tourism industry is not outstanding, and tourism visibility and attraction are not high. As a result, the development level of the tourism industry is low, the industrial correlation is not strong, and the tourism economy is weak [23]. Finally, due to the geographical location, ecological environment, and socioeconomic characteristics of the central Chinese cities, the multiplier effect and ripple effect of the tourism industry are relatively weak, and they do not occupy a dominant position in the economic structure so they exhibit low TEV levels. The above analysis shows that the vulnerability of China's tourism economy generally still follows the distribution characteristics dominated by the economy, which echoes the previous research conclusions to a certain extent [23].

Overall, during the period from 2004 to 2011, the TEV of most tourism cities decreased year by year. At this stage, the tourism industry had not yet formed a mature system, the growth of the tourism market was flat, and the tourism economy had not reached a scale large enough to cause strong economic sensitivity. Moreover, tourism incentive policies accelerated the influx of tourism enterprises and the construction of tourism facilities, so the response capacity of the regional tourism economy grew faster than its sensitivity [45]. During the period from 2012 to 2019, the TEV of most tourism cities showed a slight upward trend, which is closely related to the imbalance of industrial structure caused by the rapid growth of the tourism industry and the external dependence caused by international tourism income.

#### *4.2. Obstacle Factors Affecting TEV*

On the basis of the obstacle diagnosis model, this study measures the obstacle factors that affect TEV. The results show that the proportion of international tourists out of total tourists is the most influential factor, mainly because of the many unstable factors in the international environment, such as natural disasters, economic crises, and social unrest. Such factors may strongly affect inbound tourism and, through it, the whole tourism economic system [46,47]. Tourism output value density is the second major factor affecting TEV. According to Sun, because the tourism economic system is highly sensitive and weakly resistant to internal and external environmental disturbances compared with other industries, it easily loses its original structure, state, and functional attributes under such disturbances, which leads to a fluctuating and unstable state [48]. A high density of the tourism industry in the local area magnifies this inherent defect to a certain extent [49]. In addition, three environmental factors have a great impact on TEV: urban industrial sulfur dioxide emissions per unit area, urban industrial smoke and dust emissions per unit area, and discharge of urban industrial wastewater per unit area. This reflects that the development of the tourism industry places higher requirements on the local ecological environment, which is consistent with the views of Fei et al. [50]. At the same time, it highlights the character of the tourism economy as an "eco-socioeconomic" composite system [51]. The three components are interdependent, adapt to each other, and penetrate, blend, and interact in their development [52]. How to realize the coordinated development of ecology, culture, and economy is a subject to be discussed in the future.
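As an illustration of how an obstacle diagnosis of this kind ranks factors, the following minimal sketch uses a commonly cited form of the obstacle degree model, in which each indicator's obstacle degree is its weighted deviation from the ideal value divided by the sum of all weighted deviations. This form is an assumption (the exact formulation used in this study is not restated here), and the function name, weights, and values are hypothetical.

```python
def obstacle_degrees(weights, normalized_values):
    """Return each indicator's share of the total weighted deviation.

    weights            -- indicator weights w_j (assumed to come from a
                          weighting step such as the entropy weight method)
    normalized_values  -- indicator values x_j scaled to [0, 1], where 1
                          is taken as the ideal state (an assumption)
    """
    deviations = [1.0 - x for x in normalized_values]  # distance from ideal
    weighted = [w * d for w, d in zip(weights, deviations)]
    total = sum(weighted)
    if total == 0:
        # Every indicator already sits at its ideal value.
        return [0.0] * len(weights)
    return [v / total for v in weighted]


# Hypothetical weights and normalized values for three indicators.
degrees = obstacle_degrees([0.5, 0.3, 0.2], [0.2, 0.5, 0.9])

# Indicators ordered from the largest to the smallest obstacle degree.
ranking = sorted(range(len(degrees)), key=lambda j: degrees[j], reverse=True)
```

Under this form, an indicator ranks high either because it carries a large weight or because it lies far from its ideal value, so the ranking combines both effects.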

#### *4.3. Trend Prediction of TEV*

The prediction accuracy of the BP neural network model established in this study is more than 95%. Therefore, the prediction method proposed in this study is applicable to the development analysis of China's urban TEV and can provide an important theoretical basis for the development and decision-making of the tourism industry. According to the evolution trend, TEV values in China's major tourist cities will continue to show an increasing trend in the next 10 years. However, the rise of the TEV will obviously bring many adverse effects, so how to reasonably regulate TEV to achieve sustainable development of the tourism economy is an urgent issue to be discussed at present. For some scholars, TEV is accumulated by two forms of environmental stress: endogenous and exogenous [22]. The endogenous vulnerability factors are formed by the activities of the tourism economic system, including the irrationality of the internal structure of the tourism market structure, tourism income structure, tourism organization structure, tourism investment structure, and tourism product structure. Exogenous environmental stress is the abrupt change and gradual change of external environmental factors of the tourism economic system, such as the political environment, economic environment, natural environment, and tourism industry policy. This understanding means that to reduce TEV, we need to start from two aspects of internal structure optimization and external policy regulation.

#### **5. Conclusions**

At present, the rapid development of China's tourism industry plays an evident role in promoting economic and social development. However, owing to its inherent nature, the tourism industry is vulnerable to the impact of the internal and external environment. Therefore, evaluating and forecasting the TEV of major tourist cities is an objective requirement for promoting the sustainable development of the regional tourism economy. Using panel data from 2004 to 2019, this study constructed a comprehensive evaluation index system for TEV, with 58 major tourist cities in China as the research objects. The TEV was measured and analyzed using the entropy weight method, TOPSIS model, obstacle diagnosis model, and BP neural network model. Finally, the spatiotemporal pattern, obstacle factors, and future trend of TEV were discussed.
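To make the measurement step more concrete, the sketch below combines textbook versions of the entropy weight method and TOPSIS. It is only an illustration under stated assumptions: the input matrix is hypothetical, its values are assumed to be positive and already normalized so that larger means closer to the ideal, and every indicator is assumed to vary across cities. The function names are illustrative and the study's own preprocessing may differ.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for a matrix with rows = cities, columns = indicators."""
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [v / total for v in column]
        # Shannon entropy of the column, scaled to [0, 1] by log(m).
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)  # more dispersion -> more weight
    s = sum(divergences)
    return [d / s for d in divergences]


def topsis_scores(matrix, weights):
    """Relative closeness of each row to the ideal solution (0 = worst, 1 = best)."""
    n = len(weights)
    weighted = [[w * v for w, v in zip(weights, row)] for row in matrix]
    best = [max(row[j] for row in weighted) for j in range(n)]
    worst = [min(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((row[j] - best[j]) ** 2 for j in range(n)))
        d_worst = math.sqrt(sum((row[j] - worst[j]) ** 2 for j in range(n)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical normalized data: three cities, two indicators.
data = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.2]]
w = entropy_weights(data)
scores = topsis_scores(data, w)
```

In this toy input the first city is best on both indicators and the third is worst on both, so their closeness scores are 1 and 0, with the second city in between.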

The contributions of this study to the literature are as follows. Limited by the difficulty of obtaining statistical data, existing studies mostly compare the state of TEV across cities from a horizontal (cross-sectional) perspective, and studies on the evolution process and mechanism of TEV in different cities from a vertical (longitudinal) perspective are lacking [36]. In this study, panel data covering a longer time scale were used to predict the future evolutionary trend of TEV, which can provide a scientific reference for different tourism cities to formulate targeted tourism economic development policies. In addition, prediction is the basis of decision-making, but traditional statistical methods make strong assumptions about the distribution of the data, so the problem of random interference in the economic system has not been addressed [53]. In the prediction of TEV, existing time series analysis methods can only reflect linear patterns with a strong trend and cannot describe nonlinear characteristics. In this study, a BP neural network was used to build a prediction model that can extract and predict the regularity of time series indicators. The application of this method not only enriches the methods available for vulnerability research but also has reference significance for other fields.

The findings of this study have several practical implications for the development of the urban tourism economy. First, in terms of the spatiotemporal pattern of evolution, cities with high TEV are mostly distributed in the eastern coastal urban agglomerations of China, while cities with low TEV are scattered in the northeastern, central, and western regions of China. This is the result of tourism industry dependence and is closely related to location, economy, nature, and other factors. Therefore, promoting the coordination of urban infrastructure, industrial structure, and the ecological environment should become an important measure of urban construction. Second, this study found that the five obstacle factors that have the greatest impact on the vulnerability of the urban tourism economy are the proportion of international tourists out of total tourists, tourism output density, urban industrial sulfur dioxide emissions per unit area, urban industrial smoke and dust emission per unit area, and discharge of urban industrial wastewater per unit area. This shows that accelerating the adjustment of economic structure and the transformation of economic mode [54], as well as the purification and discharge of waste gas, centralized treatment of hazardous waste and wastewater, and the improvement of tourists' awareness of environmental protection, should become key issues to reduce TEV. Third, in the next 10 years, TEV of major tourist cities in China will increase, which is the result of the accumulation of endogenous structure and the stress of the exogenous environment. In order to effectively reduce the vulnerability of tourism economic development, we can adjust the orientation of the tourism industry development, highlight the driving effect of tourism association, and build a multi-pillar industry system.

Although this study measured and analyzed the spatiotemporal evolution, obstacle factors, and future trends of the TEV of China's major tourist cities, it has limitations. The index system of this research is constructed on the basis of the general characteristics of all typical tourist cities in the dataset. However, because China is large and cities in different regions differ in natural environment and sociocultural conditions, the index system ignores the heterogeneity between cities, which introduces uncertainty. Future research may construct an inter-city differentiated index evaluation system according to the unique properties of each city. Such an evaluation system may make the measurement results more accurate. In addition, on the basis of the BP neural network, this study predicted the time series evolution of the future TEV of China's major tourist cities. The overall model shows high precision, but some cases show a poor fit. Scholars point out that combined prediction is better than single-model prediction [55]. Therefore, building a variety of prediction models for comparison, such as a GM (1,1) prediction model, a linear regression prediction model, and a time series prediction model, supplemented by the BP neural network model to support decisions, presents a promising direction for future research.

**Author Contributions:** Conceptualization, Deping Chu; methodology, Chengkun Huang and Feiyang Lin; software, Chengkun Huang and Feiyang Lin; validation, Chengkun Huang, Feiyang Lin, and Deping Chu; formal analysis, Chengkun Huang; investigation, Chengkun Huang, Feiyang Lin, Lanlan Wang, and Jiawei Liao; data curation, Chengkun Huang, Feiyang Lin, Lanlan Wang, and Jiawei Liao; writing—original draft preparation, Chengkun Huang and Feiyang Lin; writing—review and editing, Chengkun Huang, Feiyang Lin, Deping Chu, and Junqian Wu; visualization, Chengkun Huang and Feiyang Lin; supervision, Deping Chu and Junqian Wu; project administration, Deping Chu and Junqian Wu; funding acquisition, Deping Chu. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Natural Science Foundation of Fujian Province, China (grant number 2018J01743).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


#### **References**


## *Article* **Why Is Green Hotel Certification Unpopular in Taiwan? An Analytic Hierarchy Process (AHP) Approach**

**Yen-Cheng Chen <sup>1</sup>, Ching-Sung Lee <sup>2,</sup>\*, Ya-Chuan Hsu <sup>2</sup> and Yin-Jui Chen <sup>2</sup>**


**Abstract:** The main purpose of this study was to investigate the factors that discouraged Taiwan hoteliers from applying for green hotel certification. The analytic hierarchy process (AHP) method was used to perform a weighted analysis that comprehensively identified important hindering factors based on information from hotel industry, government, academic, and consumer representatives. Overall, in order of importance, the five dimensions of hindering factors identified by these experts and scholars were hotel internal environment, consumers' environmental protection awareness, environmental protection incentive policy, hotel laws and regulations policy, and hotel external environment. Among the 26 examined hindering factor indices, the three highest-weighted indices overall for hoteliers applying for green hotel certification were as follows: environmental protection is not the main consideration of consumers seeking accommodations, lack of support by investment owners (shareholders), and lack of relevant subsidy incentives. The major contribution of this study is that hoteliers can understand important hindering factors associated with applying for green hotel certification; therefore, strategies that can encourage or enhance the green certification of hotels can be proposed to improve corporate image in the hotel industry, implement social responsibility in this industry, and obtain consumers' approval of and accommodation-willingness for green hotels.

**Keywords:** green hotel; corporate social responsibility; green hotel certification

#### **1. Introduction**

Green business practices have become very popular with the wave of green and sustainable issues in recent years. Throughout the world, enterprises are adopting a variety of environmentally sustainable activities while managing their business operations [1,2]. One motivation for these changes is that many individuals and corporate customers consider the company's sustainable environmental performance when making purchasing decisions [3,4]. Of course, other reasons also exist, such as government supervision requirements, social responsibility requirements, and mandatory implementation of green practices in enterprises [5–8].

Taiwan is an island area composed of Taiwan Island and 121 small islands. Development on islands is affected by their remoteness, limited natural resources, small markets, marginal decision-making centers, unique internal structure, and vulnerability to natural disasters. The islands of various countries in the world, especially small islands, are geographically isolated from the mainland, resulting in differences in climate, topography, and physical environment, and each has its own natural and cultural characteristics. Taiwan is surrounded by the sea; its fishery resources are rich, the ecological environment is well preserved, and the natural landscape is dominated by ocean features, which constitute the greatest attractions of island tourism. Taiwan has unique natural resources and is an island with the potential to develop international ecological tourism. To help Taiwan move towards green and sustainable development, this research analyzed expert data to identify the obstacles to applications for green hotel certification.

**Citation:** Chen, Y.-C.; Lee, C.-S.; Hsu, Y.-C.; Chen, Y.-J. Why Is Green Hotel Certification Unpopular in Taiwan? An Analytic Hierarchy Process (AHP) Approach. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 255. https://doi.org/10.3390/ijgi10040255

Academic Editors: Andrea Marchetti and Wolfgang Kainz

Received: 30 January 2021 Accepted: 5 April 2021 Published: 10 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The hotel industry used to be regarded as a chimney-free industry [9–11]. However, with the increased range and level of services, its energy consumption, amounts of waste and wastewater, and chemical emissions now have a considerable impact on the environment [12,13]. Most studies on green hotels in Taiwan have examined green hotel-related issues from the perspective of consumers, whereas the difficulties in obtaining green hotel certification from the perspective of the hotelier have rarely been discussed [14,15]. Therefore, how hotels in Taiwan make green changes and reduce energy consumption and damage to the environment, as well as the difficulties encountered in applying for green hotel certification, remain important and unanswered issues in the sustainable operation of green hotels in Taiwan.

Sustainability is currently one of the major priorities of tourism all over the world. One aspect of sustainable tourism is green management. In line with trends in the hotel industry, accommodation facilities have recognized that adopting green practices is beneficial [16]. The Environmental Protection Administration of Taiwan launched the "Green Mark" for the hotel industry in 2008 in response to the global demand for environmental protection and sustainable development; however, there were very few applicants. Therefore, in 2011 the government re-launched the "Green Hotels" project with lower thresholds for environmental protection conditions, to reduce the review time and required investment and to increase hoteliers' willingness to apply. However, by the end of February 2019, among 13,268 hotels in the tourism industry, only 1450 hotels (about 11%) had become green hotels [17], and only 64 hotels held green mark certification [18]. This result indicates that hoteliers in Taiwan generally hold a reserved attitude towards green hotels, worry about the investment and service quality of green facilities, do not know how to proceed, or do not understand the market benefit. What are the important factors that hinder the application for green hotel mark certification, and why is the number of hotels that obtain green mark certification not increasing as expected?

To address these questions, this study used the analytic hierarchy process (AHP) method to analyze the level of importance of each factor hindering the application for green hotel certification based on questionnaire surveys of industrial experts, government unit experts, scholars, and consumers. Finally, quantitative ranking of the level of importance of each hindering factor of application for green hotel mark certification, identified through the expert questionnaire surveys, was performed to provide a reference to aid in increasing the number of applications for green hotel certification in the future.

#### **2. Background and Related Works**

#### *2.1. Impact of Data Science and Geographical Location on Taiwan's Green Certified Hotels*

In recent years, Taiwanese governmental units have integrated tourism and information technology, promoted sustainable green tourism information services, established a "Taiwan Green Certificate Hotel Database" [18], and promoted tourism business models. At the same time, keeping pace with the development of cloud technology combined with social media and mobile technology, and gradually integrating artificial intelligence tourism services, is an important policy of Taiwan's tourism authorities. The O2O (online to offline) model has been widely used in Taiwan's tourism industry. Green hotels use a variety of online channels, including online travel agency (OTA) and hotel websites, to display their green certification and attract customers. The OTA website provides information on the geographical distribution of certified green hotels and on marketing activities, which helps introduce new customers to green hotels and also enriches the OTA website (Figure 1). Therefore, applying for green hotel certification and attracting guests through OTAs have become top priorities for hotel operators in Taiwan.

**Figure 1.** Green Hotel OTA website in Taiwan.

Factors affecting the location of green hotels include traffic conditions, geographical conditions, natural landscapes, and geographical location, all of which have a certain impact on green hotels [19]. Popovic et al. [20] pointed out that a hotel's location should be considered in terms of the following: geographical environment (beautiful and comfortable climate, cultural attraction, recreational opportunities, and surrounding environmental characteristics); accessibility (convenient transportation); natural limitations (topography and slope, hydrology, geology, plants and wildlife); and environmental management (urban area division, building regulations, comfort and convenience, current land use, and restrictions on future land and building changes). Fang et al. [21] found that hotel location factors include the following: transaction advantages (the hotel's proximity to the tourist destination); landscape factors (the landscape and public facilities near the area); convenience (time and distance, highways, and railway connectivity); and hotel environment (appropriateness of the surrounding environment).

Therefore, most hotels with green certification in Taiwan are concentrated in metropolitan areas (Table 1). Additionally, a city's geographical location will become an important key factor for green hotel certification because of the relevant subsidy incentives of cities in Taiwan, strict environmental laws and regulations for geographical locations, regional restrictions in cities, and land cost considerations.


**Table 1.** Distribution of Green Hotel Certificates in Taiwan.

Source: Environmental Protection Administration Executive Yuan, R.O.C. (Taiwan) [18].

#### *2.2. Development of Green Hotel Certification*

In the hotel and tourism field, consumers' understanding of the sustainable development of hotels is also increasing [6,16,22,23]. Therefore, hotel operators and managers also recognize that hotels should actively participate in sustainable operations and environmental protection to attract customers with increasing green consciousness [24,25]. According to the global sustainable travel report released by Booking.com, 65% of global travelers expressed their intention to stay in green hotels [26]. In response to the higher expectations of these environmentally conscious consumers regarding "green accommodations", many hotels have actively adopted green and sustainable environmental protection practices [16,23,27–29]. Most hotel operators have a positive awareness of green hotels and environmentally friendly labels and are willing to implement them. They agree that green hotels can help improve the hotel's image and energy-saving effects and are willing to implement environmental protection measures to help reduce hotel costs [30–32].

Green hotel certification is intended to provide a series of environmentally friendly standards and encourage the hotel industry to increase its environmental performance. Of approximately 140 green certification institutions for hotels, 50 eco-labels focus on green hotel certification [33]. The green hotel certification schemes vary; however, the majority of certifications include the following components: water, energy, waste, sustainable procurement, biodiversity conservation, community engagement, and architecture and design [34]. Reviewing the green hotel certification program is very important to understanding the key structural components of hotel environmental management.

The well-known Hilton hotels launched the corporate responsibility plan, "Travel with Purpose", in 2011. This plan includes not only social impacts but also environmental impacts to focus on effectively reducing energy, water, and waste output in environmental management [35]. TripAdvisor provides information about hotel environmental practices. The company negotiated with international expert organizations for sustainable development, the United Nations Environment Programme, and the International Tourism Organization to develop the TripAdvisor GreenLeaders program. The GreenLeaders program includes 6 components: energy, water, waste, purchasing, site, and innovation and education. The evaluation items include towel and linen reuse, electric car recharge stations, and solar panels. Hotels can apply for the TripAdvisor GreenLeaders program through completion of a self-evaluation survey. Hotels can achieve 1 of the 4 badge levels based on the environmental protection practice level: bronze, silver, gold, or platinum. More than 1000 hotels have obtained the TripAdvisor GreenLeaders award, including all major brands and many independent hotels. In addition, hotel customers can comment on the green practices of hotels to ensure the integrity of the program [36,37].

Green Seal is a nonprofit organization that provides environmental certification standards. These standards represent responsible choices for company purchasers and consumers to promote more effective sustainable development. Green Seal released 33 standards covering 400 product and service categories [38]. Green Seal regulates requirements for hotels and lodging properties. The Green Seal standard GS-33 for lodging properties has three levels: bronze, silver, and gold. Hotels can apply for green certification. Since its first launch in 1999, this green seal standard has become a method to help hotels to improve their environmental practices and develop into environmental leaders. This standard focuses on energy conservation, pollution prevention, waste minimization, water conservation, and freshwater resource management [39].

#### **3. Materials and Methods**

This study used the AHP method to analyze important factors that hinder hoteliers from applying for green hotel certification. In the study design, green hotel scholars and experts and hoteliers that already obtained green hotel certifications were first interviewed through meetings. In addition, previous related literature was collected using the literature analysis method to prepare the AHP questionnaire. Next, AHP was performed to categorize and rank important factors that hindered hoteliers from applying for green hotel certification.

#### *3.1. Sampling Design*

The sample was selected in proportion to geographical distribution, and participants were invited by email or telephone. The experts were asked to complete the survey items from a professional perspective. Moreover, because most of the selected experts support the topic of green hotels, they responded to the invitation enthusiastically and offered suggestions. Because the research topic is specialized, the survey targeted hotel and environmental protection professionals with more than 15 years of experience in four fields to determine the weight of each item: hotel industry professionals (five people), government unit professionals (four people), academic professionals (four people), and consumer professionals (four people). All of the above experts are professionals who are engaged in or have contact with fields closely related to this research topic. The survey was completed in 2020. Expert background information is provided in Table 2 below.


**Table 2.** Description of Participating Experts by the AHP method.

Before the study was officially conducted, all indices that hindered green hotel certification were explained to the respondents in detail to avoid confusion and to ensure that respondents understood each index and the relationships among them. After invalid questionnaires, such as those with missing answers, were excluded, the consistency of the responses was verified statistically. The results showed that all recovered questionnaires met the standard of a consistency ratio (CR) lower than 0.10. In total, 17 valid questionnaires were recovered. This study used the postevent method to calculate the weighted values of each index in the valid recovered questionnaires based on the AHP guidelines. In addition, individual and overall weights were analyzed by the respondents' professional categories to evaluate each index.

#### *3.2. Application of the AHP Method for Indicators That Hindered Green Hotel Certification*

The AHP method is mainly applied in uncertain conditions and decision-making issues with many evaluation criteria. The application scope of AHP is very diverse, especially in planning, prediction, judgement, resource allocation, and portfolio trials [40]. AHP analyses and divides complicated questions into several hierarchies to establish a hierarchical structure with mutual influence. It decomposes step-by-step from high hierarchies to low hierarchies. Through quantitative judgement, AHP simplifies and improves the previous decision-making procedures of decision makers who relied on instinct to obtain priority-weighted values of all schemes. The hierarchical relationship can provide a logical approach to evaluation for decision makers to select appropriate schemes. Schemes with higher priority-weighted values have higher priority orders of acceptance. Therefore, the risk of mistakes in decision making is reduced. The procedure of the AHP method is divided into eight steps [41].

1. Decision-making issues are identified, and evaluation indicators are listed.

The definition of the research topic is determined, the scope of the problems is analyzed and defined, and the purpose of decision making is confirmed. Next, opinions of experts and decision makers are integrated. The relevant evaluation criteria of the decision-making problems are listed, the criteria are defined, and the criteria are categorized into different hierarchies.

2. The hierarchical structure is constructed.

All viewpoints of decision makers are repeatedly amended, using the group discussion method or referencing relevant literature and expert opinions, and summarized to establish the target-scheme hierarchical structure.

3. Pairwise comparisons are performed for evaluation.

After the hierarchical structure is established, the next step is the evaluation task. An index in the upper hierarchy is used as the baseline, and any two indices at the same level beneath it are compared pairwise in terms of their relative importance with respect to that upper-level index. If there are n indices, n(n − 1)/2 pairwise comparisons should be performed to determine the relative importance among all indices at the same level of the hierarchy.

4. The matrix at each level of the hierarchy is developed according to step 3 to construct all evaluation matrices.

This study targeted sub-hierarchies of all hierarchies to perform pairwise comparisons to obtain all evaluation matrices. Evaluation matrices of all hierarchies are constructed according to the following Formula (1):

$$\left[A_k\right] = \left[a_{ij}\right] = \begin{bmatrix} 1 & a_{12} & \dots & a_{1n} \\ 1/a_{12} & 1 & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1/a_{1n} & 1/a_{2n} & \dots & 1 \end{bmatrix}, \quad k = 1, 2, \dots, n, \tag{1}$$

where *Ak* is the evaluation matrix at each hierarchy;


According to the evaluation data obtained in step 3, a pairwise comparison matrix, called the positive reciprocal matrix, is constructed. From this matrix, the eigenvector and the maximum eigenvalue (*λmax*) are calculated. Next, the difference between the maximum eigenvalue and the number of indices n is converted into the consistency index (*CI*), which measures how consistent the pairwise judgements in the evaluation matrix are and serves as the reference for whether the pairwise comparison matrix is acceptable. The consistency is examined using the following Formula (2):

$$CI = (\lambda_{\max} - n) / (n - 1). \tag{2}$$

The consistency index of the randomly produced positive reciprocal matrix is the random index (RI). Using the above *CI* and RI, the consistency ratio of the pairwise comparison matrix is obtained, CR = *CI*/RI.


The relative weights of indices in all hierarchies are integrated to calculate the total priority weight of the overall hierarchy. The calculated weight represents the relative priority order of all decision-making schemes corresponding to the decision-making target. This study targeted experts in four individual fields (industry experts, government unit representatives, scholars, and consumers) to separately calculate the index weights of all hierarchies to evaluate the priority order of all indices and determine the quantitative ranking of the levels of importance of hindering factors of application for green hotel mark certification.

8. The consistency of the overall hierarchy is evaluated.

The consistency ratio of the overall hierarchy (CRH) is the consistency index of the hierarchy (CIH) divided by the random index of the hierarchy (RIH), and it should be smaller than 0.10. If this standard is not met, the evaluation should be revised to improve the CR. In this study, the CRH was smaller than 0.10.

The AHP questionnaire in this research was designed on the basis of the environmental hotel-related literature and a survey of experts (the list of experts is shown in Table 1). A total of 26 indicators that affect the application for environmental hotel certification are classified into five major categories: hotel internal environment, hotel external environment, hotel laws and regulations policy, environmental protection incentive policy, and consumers' environmental awareness, as shown in Figure 2.

#### **4. Results**

#### *4.1. AHP Analysis of Hindering Indices of Green Hotel Certification by Scholars and Experts*

Table 3 shows that "hotel internal environment" (0.262) was the most important index, followed by "consumers' environmental awareness" (0.258) and "environmental protection incentive policy" (0.204). These results indicated that scholars and experts overall considered that the hotel internal environment was the index with the highest level of hindrance in green hotel certification.

**Table 3.** Analysis of hindering indices of application for green hotel certification considered by scholars and experts.


Regarding the index weights within each individual dimension, in the hotel internal environment dimension, "not supported by investment owners (shareholders)" had the highest weight (0.345), followed by "high initial investment in environmental protection" (0.223) and "difficult to improve existing hotel equipment" (0.189). In the hotel external environment dimension, "becoming green does not guarantee improvement in the accommodation rate" had the highest weight (0.348), followed by "lack of widespread acceptance of green hotels by consumers" (0.305) and "no urgency for hotels to apply for green mark certification" (0.131). In the hotel laws and regulations policy dimension, "green hotel application procedure is complicated" had the highest weight (0.311), followed by "lack of environmental protection-related counselling mechanisms" (0.251) and "conflicts between green hotel marks and hotel star reviews" (0.193). In the environmental protection incentive policy dimension, "lack of relevant subsidy incentives" had the highest weight (0.322), followed by "limited resources for promoting green hotels" (0.176) and "lack of integration of green hotel promotion by relevant government departments" (0.162). In the consumers' environmental awareness dimension, "environmental protection is not the main condition for consumers to choose accommodations" had the highest weight (0.375), followed by "green hotel-related marks are not well known" (0.171) and "consumers' cognitive differences regarding green hotels" (0.166).

Furthermore, the relative weights of all indices were determined. Overall, the scholar and expert representatives expressed that "environmental protection is not the main condition for consumers to choose accommodations" (0.099) had the highest relative weight among all indices. This result indicated that these representatives believed that the most critical factor hindering green hotel certification was this index, followed by "not supported by investment owners (shareholders)" (0.093) and "lack of relevant subsidy incentives" (0.075).

#### *4.2. Analysis of the Relative Weighted Ranking of Indices That Hindered Green Hotel Certification Considered by All Scholars and Experts*

In the analysis of the relative weighted ranking of indices that hindered green hotel certification, this study ranked the relative weights assigned by scholars and experts in all fields and compared the rankings of consumer representatives, government representatives, academic representatives, and industry representatives to understand how concentrated the ranking of each index was. Table 4 shows that the results fell into three groups: (1) high-ranking concentration indices (indices concentrated in the top five of all fields), (2) low-ranking concentration indices (indices concentrated in the bottom five of all fields), and (3) differential ranking indices (indices ranked in both the top five and the bottom five).
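The three groups can be expressed as a simple classification rule. The sketch below is one possible reading of the definitions above, not the authors' procedure: the text does not say how many fields must agree for a ranking to count as "concentrated", so the `min_groups` threshold of two fields is an assumption, and the index names and ranks are hypothetical.

```python
def classify(ranks, total=26, band=5, min_groups=2):
    """Classify one index from its rank in each expert field.

    `ranks` maps field name -> rank (1 = most important, `total` = least).
    `min_groups` is an assumed threshold; the paper does not state one.
    """
    top = sum(1 for r in ranks.values() if r <= band)
    bottom = sum(1 for r in ranks.values() if r > total - band)
    if top and bottom:
        return "differential ranking"
    if top >= min_groups:
        return "high-ranking concentration"
    if bottom >= min_groups:
        return "low-ranking concentration"
    return "unclassified"


# Hypothetical ranks for four indices (not data from Table 4).
example_ranks = {
    "index_A": {"consumer": 1, "government": 6, "academic": 3, "industry": 1},
    "index_B": {"consumer": 26, "government": 20, "academic": 26, "industry": 25},
    "index_C": {"consumer": 4, "government": 12, "academic": 15, "industry": 26},
    "index_D": {"consumer": 10, "government": 11, "academic": 12, "industry": 13},
}

labels = {name: classify(r) for name, r in example_ranks.items()}
for name, label in labels.items():
    print(name, "->", label)
```

With 26 indices and a band of five, ranks 1 to 5 count as the top five and ranks 22 to 26 as the bottom five.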

Among the high-ranking indices, "environmental protection is not the main condition for consumers to choose accommodations" was a high-ranking concentration index. Consumer representatives and industry representatives both ranked the relative weight of this index first overall, and academic representatives ranked it third. The second was "not supported by investment owners (shareholders)". Government representatives and academic representatives both ranked the relative weight of this index first overall, and industry representatives ranked it fourth. The next was "lack of relevant subsidy incentives". Government representatives and academic representatives both ranked the relative weight of this index second overall, and consumer representatives ranked it third.

Among the low-ranking indices, "lack of support from the hotel industry association" was a low-ranking concentration index. The results showed that consumer representatives and academic representatives both considered the relative weight of this index to be last in the overall ranking (26th place), and industry representatives considered it to be second to last (25th place). It was followed by "lack of understanding of the green (environmental protection) consumer market trend". The results showed that academic representatives considered the relative weight of this index to be second to last in the overall ranking (25th place), consumer representatives considered it to be third to last (23rd place), and government representatives considered it to be fourth to last (22nd place). The next was "no urgency for hotels to apply for green mark certification". The results showed that government representatives considered the relative weight of this index to be second to last in the overall ranking (25th place), and consumer representatives considered it to be third to last (24th place).



Among the differential ranking indices, "green hotels are not a requirement for government procurement" had the largest difference in overall ranking. Consumer representatives ranked the relative weight of this index fourth among all indices, whereas industry representatives ranked it last (26th place). The next was "conflicts between green hotel marks and hotel star reviews". Consumer representatives ranked the relative weight of this index second overall, whereas government representatives ranked it third to last (24th place). The next was "lack of widespread acceptance of green hotels by consumers". Industry representatives ranked the relative weight of this index fifth among all indices, whereas government representatives ranked it last (26th place).

In summary, this study compared the rankings given by consumer representatives, government representatives, academic representatives, and industry representatives. The rankings of the relative weights of some indices were highly concentrated, showing that scholars and experts in all fields agreed on the degree of hindrance of those indices, whereas opinions on some other indices were opposed. Green hotel promotion thus involves different fields, and the experts in each field had different subjective experiences of the hindrances to green hotel application and different opinions on the levels of hindrance.

#### **5. Discussion**

Green hotels are a current trend in tourism accommodation, and the distribution of green hotels is related to the environmental protection of tourism-related geography. The indicators revealed in this research can provide a reference for increasing applications for green hotels and illustrate the importance of regional geography to the environmental ecology. This study analyzed five important index dimensions: hotel internal environment, hotel external environment, hotel laws and regulations policy, environmental protection incentive policy, and consumers' environmental awareness. The study results are summarized and discussed below.

The "not supported by investment owners (shareholders)" index, "difficult to improve existing hotel equipment" index, and "high initial investment in environmental protection" index ranked second, sixth, and fourth, respectively, in the overall ranking of indices by scholars and experts. These results indicated that the thinking of hoteliers and the recognition of hotels' senior executives were important factors affecting the willingness of hoteliers to participate in green hotel certification. "Not supported by investment shareholders" and "high initial investment in environmental protection" were important factors hindering hoteliers from applying for green marks. This result is consistent with Chan et al. [42], Iorgulescu [43], and Moon et al. [44]. Although the hotel industry has an incentive to invest in environmentally friendly hotels, it is currently adopting a wait-and-see attitude due to input cost considerations and uncertainty in output performance.

Five indices, namely "lack of support from the hotel industry association", "lack of widespread acceptance of green hotels by consumers", "no urgency for hotels to apply for green mark certification", "lack of understanding of the green (environmental protection) consumer market trend", and "becoming green does not guarantee improvement in the accommodation rate", all ranked outside the top 10 in the overall ranking by scholars and experts, and the "hotel external environment" dimension ranked last among the five dimensions.

The "green hotel application procedure is complicated" index ranked 7th in the analysis of overall rankings of indices by scholars and experts. The results indicated that the certification process of many green hotel marks in Taiwan is too complicated, and the content of green mark requirements is not clear. This result is consistent with Suryawan and Aris [45], Nelson et al. [46], and Sharma et al. [47]. Many hoteliers expressed that the label certification procedures are too complicated, and they are not clear about the content of the green label items.

The "lack of relevant subsidy incentives" index ranked third in the analysis of overall ranking of indices by scholars and experts. This result is consistent with Heras-Saizarbitoria et al. [48] and dos Santos et al. [49]. Increasing incentives is a very important planning consideration for the groups targeted by the policy. The government should start with subsidies and environmental protection tax incentives to directly encourage hoteliers to implement green hotels, because the lack of relevant government incentive measures is an important factor hindering hoteliers from applying for green marks.

The "environmental protection is not the main condition for consumers to choose accommodations" and "consumers do not have high environmental protection consciousness" indices ranked first and fifth, respectively, in the analysis of the overall rankings of indices by scholars and experts. The results indicated that when consumers choose hotels for accommodations, whether a hotel is green is not a factor in their purchasing decisions. Consumers consider price and have only economic motivation [50–52].

#### **6. Conclusions**

The "environmental protection is not the main condition for consumers to choose accommodations" index was ranked first by both consumer representatives and hotel industry representatives. Hoteliers therefore understand very clearly that consumers still treat the economic factor as the priority when selecting accommodations [53]. In practical operations, hoteliers should reward environmental behaviors, such as consumers bringing their own toiletries, with substantial price discounts. The "not supported by investment owners (shareholders)" index was ranked first by both government representatives and academic representatives. The major decision makers of hotels are therefore still the owners (shareholders), whose attitude is an important factor determining whether the promotion of government policy succeeds or fails. It is recommended that government units seek assistance from hotel associations in all counties and cities to promote green hotels and strengthen environmental education for owners (shareholders) in the ventures they undertake. In addition, counselling and promotion can be performed immediately when hotels apply for certification, and hotels planned and established to the green hotel standard should be opened. Furthermore, when hotels actually obtain green hotel mark certification after formal operations, they can also apply for subsidies, which would increase the willingness of hoteliers to apply.

The contributions of this research are as follows. The analysis of the indicator data collected from tourism experts shows that the current official government tourism-related units in the Taiwan region recognize the policy effectiveness of the environmentally friendly hotel label system. In discussing the obstacles to the environmental labelling of hotels, this study collected opinions from tourism experts, and the expert data analyzed with AHP can be used by official tourism organizations and tour operators in Taiwan as a reference for exploring and prioritizing the impact of the various related factors.

#### **7. Limitations and Future Studies**

The experts consulted in this study were limited to the Taiwanese area. It is recommended that future researchers expand the sources and categories of experts to make them more representative, so that the study results are not influenced by the regions in which the experts are located. This study only targeted the hindering factors of green mark certification in the Taiwanese area. Because green hotels were implemented earlier in various other countries than in Taiwan, it is suggested that subsequent researchers study the hindrances to implementing green hotels in different countries in depth and compare the results with the factors hindering the implementation of green hotels in Taiwan. In addition, we believe that comparing the perspectives of hoteliers with those of consumers would help us better understand the factors affecting the promotion of green hotel marks.

**Author Contributions:** Conceptualization, Ching-Sung Lee; Data curation, Yin-Jui Chen; Formal analysis, Ching-Sung Lee; Investigation, Ya-Chuan Hsu and Yin-Jui Chen; Methodology, Yen-Cheng Chen and Ching-Sung Lee; Project administration, Yen-Cheng Chen and Ya-Chuan Hsu; Software, Ya-Chuan Hsu; Validation, Yin-Jui Chen; Writing—review & editing, Yen-Cheng Chen. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Ministry of Science and Technology, Taiwan: 109-2410-H-030-047-.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*ISPRS International Journal of Geo-Information* Editorial Office E-mail: ijgi@mdpi.com www.mdpi.com/journal/ijgi


ISBN 978-3-0365-5030-5