1. Introduction
As living standards improve, there is a growing tendency for people to indulge in travel [1]. Among Chinese tourists, mountainous scenic areas have maintained their popularity [2]. Typically, these areas feature exceptional natural surroundings, marked by lush forest cover perfect for sightseeing and recreational pursuits, complemented by a diverse range of tourism amenities and services [3,4]. Moreover, mountain tourism is often intertwined with rich historical and cultural heritage, attracting numerous sightseers and recreational travelers [5,6,7]. Nevertheless, recent years have witnessed a substantial surge in tourist arrivals at scenic locales in China, driven by the sustained growth of cultural tourism, leading to significant challenges in visitor management. To effectively address this issue, forecasting tourist demand and analyzing their spatio-temporal behavior within tourist destinations are crucial [8].
Understanding tourist behavior relies heavily on analyzing their movement patterns, which are essential for gaining insights into their spatio-temporal behavior, thereby influencing the development of scenic locales and the crafting of marketing strategies [9].
Movement pattern analysis typically consists of three dimensions: inter-destination, intra-destination, and intra-attraction tourist behaviors [10,11,12].
The analysis of inter-destination tourist behavior focuses on tourists’ movement between origins and destinations, typically comprising countries, cities, and regions. For example, Chung et al. [13] utilized social media posts to gather data points and investigate the movement patterns of Korean tourists across various European countries and cities, offering insights for European tourism marketing organizations. Phithakkinukoon et al. [14] analyzed tourist mobility in diverse Japanese cities and its correlation with travel behaviors by leveraging cell phone GPS records. Liu et al. [15] employed network analysis and questionnaires to examine tourists’ movement between scenic spots in the Xinjiang Uygur Autonomous Region, China, revealing competitive dynamics among attractions and the influence of strategic resources on attraction popularity. Similarly, Xu et al. [16] examined international tourists’ preferences for different cities in South Korea using mobile location data, uncovering varying destination preferences based on nationality and significant disparities in destination attractiveness.
The analysis of intra-destination tourist behaviors focuses on tourists’ movement patterns among scenic spots (attractions) within a specific area. For instance, Miah et al. [17] utilized a Flickr dataset to investigate tourists’ POIs at various attractions within the metropolitan area of Melbourne, Australia, and predict their behavioral tendencies. Hu et al. [18] employed a Twitter dataset to investigate tourists’ movement patterns among different attractions in the New York metropolitan area, identifying popular attractions within the city. Mou et al. [19] studied the spatial tourism patterns in Qingdao City based on tourists’ digital footprints, revealing the influence of distance and popularity on the spatial distribution of scenic areas. The uneven distribution of core tourism nodes within the city can lead to intense internal competition. Finally, Zhou and Chen [20] collected data from Instagram to analyze tourists’ movement patterns to attractions within various administrative districts of Hong Kong, classifying attractions into four types and exploring the characteristics of each type.
The analysis of intra-attraction tourist behaviors focuses on tourists’ movement among attractions within a scenic area. Xia et al. [21] utilized Semi-Markov processes to simulate tourist movement at Phillip Island Nature Park, Australia, and assessed the marketing potential of each attraction based on tourists’ duration of stay. Smallwood et al. [22] employed face-to-face interviews to investigate the mobility patterns of tourists within the Ningaloo Marine Park in Australia. Their study revealed a notable reliance among visitors on the scenic road network. Moreover, they identified substantial variances in travel distances between first-time international tourists and their domestic counterparts. Birenboim et al. [23] examined the spatio-temporal trajectories of tourists visiting the PortAventura theme park in Catalonia, Spain, employing GPS positioning technology. Their investigation revealed distinct spatio-temporal behavioral patterns within the theme park, reflecting variations in visitors’ duration of stay, time allocations, and intradiurnal temporal trends. Huang et al. [24] explored the spatio-temporal behavior of visitors at Hong Kong’s Ocean Park by utilizing handheld GPS devices. The study identified three distinct spatio-temporal behavioral patterns and introduced a mobile trajectory measurement method based on path length, travel time, area coverage, and ellipse circumference.
Drawing from the research discussed above, it is evident that the analysis of tourist behavior operates on three distinct levels: macro, meso, and micro. At the macro level, the research scope extends to inter-destination tourist behavior analysis. Meanwhile, the meso level delves into intra-destination tourist behaviors. At the micro level, the focus narrows to intra-attraction tourist behaviors. Historically, scholarly research has predominantly concentrated on the macro and meso levels, emphasizing the flows and behaviors of tourists between destinations or within specific destinations. However, in recent years, there has been a noticeable paradigm shift. With the increasing availability and diversity of social media data for analysis, more scholars are now directing their attention toward studying the spatio-temporal behaviors of visitors within individual attractions.
Thus, this study utilizes the Kushan Scenic Area in Fuzhou as a case study and collects data from the 2bulu social media platform to investigate the attraction of distinct zones within the urban forest, analyze tourist movement patterns, and discern preferences regarding travel duration. The study aims to answer the following questions: (1) how the spatial distribution of the Kushan Scenic Area can be categorized based on tourists’ movement trajectories; (2) what categories of patterns emerge from tourists’ movement behaviors within the Kushan Scenic Area; and (3) what preferences tourists exhibit regarding travel durations within the Kushan Scenic Area.
2. Methodology
2.1. Study Area: Kushan Mountain Area, Fuzhou, China
Situated on the southeast coast of China, Fuzhou City in Fujian Province spans 12,000 square kilometers, with a built-up area covering 416 square kilometers. The urbanization rate is 72.5%, while the forest coverage rate reaches 58.41%. Geographically, Fuzhou typifies an estuarine basin characterized by higher terrain in the northwest, gradually descending to lower elevations in the southeast. Mountains and hills collectively dominate 72.68% of the total land area, with mountains encompassing 32.41% and hills 40.27%. Fuzhou is encompassed by a ring of mountain ranges, featuring Kushan Mountain (919.1 m) to the east, Qishan Mountain (820 m) to the west, Wuhu Mountain (700 m) to the south, and Lotus Peak (605.3 m) to the north.
Kushan Mountain, nestled in the northeast of Jin’an District, Fuzhou City, stands as a renowned scenic area cherished for its leisure and recreational offerings, as illustrated in Figure 1. With a history spanning nearly 2200 years of urban development, Kushan Mountain has retained its natural beauty, preserved as an urban forest. Its peripheral location within Fuzhou has shielded it from the disruptions of urbanization, ensuring the preservation of its overall landscape. The forests in the Kushan Scenic Area are characterized by two main categories: the subtropical evergreen tree species Pinus massoniana and the evergreen tree species Acacia confusa Merr. Some deciduous broad-leaved and evergreen broad-leaved tree species and shrubs form transitional zones within the forest. There are many ancient and valuable tree species in Kushan Mountain, including Cryptomeria fortunei Hooibrenk, Pinus massoniana, Albizzia chinensis, Cinnamomum camphora, Bauhinia blakeana, Liquidambar formosana, Cycas revoluta, Osmanthus fragrans, Keteleeria fortunei, Lagerstroemia, etc. Rare and precious plants include Alsophila spinulosa, Rhododendron protistum, Dendrobium officinale Kimura et Migo, Cymbidium dayanum Rchb. f., and various ferns and Cymbidium sinense in the forest [25].
As outlined in the General Plan of Kushan Scenic and Historic Area (2022–2035), the total area spans 49.72 square kilometers, with the core scenic area covering 12.72 square kilometers. The area boasts 164 landscape resources (groups), comprising 48 cultural landscape resources and 116 natural landscape resources. Currently, the scenic areas comprise Cedar and Kuliang Scenic Area, Kushan and Yongquan Scenic Area, Phoenix Pool and White Cloud Scenic Area, Mo Brook and Sword Gorge Scenic Area, White Horses Crossing Shan Brook Scenic Area, and Nanyang Emerald Scenic Area. In order to ensure optimal tour route connectivity and visitor accessibility, this study focuses on analyzing the spatio-temporal behaviors of tourists within the Cedar and Kuliang Scenic Area, Kushan and Yongquan Scenic Area, Phoenix Pool and White Cloud Scenic Area, Mo Brook and Sword Gorge Scenic Area, and White Horses Crossing Shan Brook Scenic Area [26].
2.2. Data Source
2bulu (www.2bulu.com, accessed on 31 October 2023) is a social platform that integrates outdoor travel resource sharing and community interaction. It serves as a go-to tool for outdoor enthusiasts, offering a wide range of features tailored to their needs. The app developed by this website offers professional outdoor maps, navigation, and trajectory route services for travel enthusiasts. Users can record various information, such as time, speed, elevation, photos, and text, while using the app. The platform provides two types of open data for users to download: GPS trajectory data and geotagged photo data. The former enables the determination of tourists’ spatio-temporal status by calculating speeds between trajectory points, thus revealing activity ranges and behavioral patterns within tourist attractions. Additionally, GPS trajectory data facilitates the analysis of tourists’ stay times in scenic areas and the examination of transfer patterns and spatio-temporal behaviors. The latter consists of photographs with geographic coordinates that tourists voluntarily upload, showcasing the locations and areas of interest visited during the tour.
2.2.1. Data Acquisition and Cleaning
We utilized Python to develop a script for collecting GPS trajectory data and photo information uploaded by tourists visiting the Kushan Scenic Area. A total of 4669 raw GPS trajectories and 51,205 raw geotagged photos were acquired from the 2bulu social media platform, covering the timespan from 22 January 2011 to 30 October 2023.
Table 1 presents the raw datasets comprising tourist user IDs, longitude, latitude, elevation, and timestamps.
GPS trajectory data are frequently influenced by various factors, leading to deviations and incompleteness in user trajectories. We excluded four types of problematic GPS trajectories and trajectory points: (1) trajectories situated outside the study area; (2) repeated trajectories from the same user; (3) trajectories deviating from the intended scenic routes; and (4) trajectories not adhering to movement rules. To ensure accuracy in screening problematic trajectories and trajectory points, we implemented the following steps: (a) Given the relatively gentle travel routes of the Kushan Mountain Area, we adopted the walking speed of 5 km/h (approximately 1.4 m/s) reported by Liu et al. [27]. (b) We employed the transformed projected coordinates to calculate the time and distance between two points using the formula presented in Equation (1) [28]. The inputs to the formula are the user’s previous trajectory point and the subsequent trajectory point during movement; the distance between the two points is calculated from the absolute value of the difference between these adjacent points. (c) The user’s movement time was determined by subtracting the arrival time at the previous point from the arrival time at the next point. (d) To assess the variance between the user’s theoretical and actual distance, we multiplied the resulting time by the speed (1.4 m/s) to obtain the theoretical distance. We removed the trajectory point if the actual distance exceeded the theoretical distance. This method allowed us to evaluate whether the user’s movement data adhered to the movement rules. Following data cleaning, we identified 2377 valid trajectory lines, 1,787,323 trajectory points, and 31,802 user-uploaded photos. Moreover, we used the point of interest (POI) acquisition tool from Guihuayun (http://guihuayun.com/poi/, accessed on 31 October 2023) to collect POI data for the primary attractions in the scenic area, obtaining POI information for 37 attractions. After data cleaning, we followed the procedures outlined in Figure 2.
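The speed-based screening in steps (a) to (d) can be illustrated with a minimal Python sketch. It is not the authors' script: the function name `filter_by_speed` and the tuple layout are hypothetical, the straight-line (Euclidean) distance in projected coordinates stands in for Equation (1), and each point is compared against the last retained point, which is one possible reading of step (d).

```python
import math

WALKING_SPEED_M_S = 1.4  # walking speed stated in the text (about 5 km/h)


def filter_by_speed(points, speed=WALKING_SPEED_M_S):
    """Drop trajectory points that cannot be reached at walking speed.

    points: list of (x_m, y_m, t_s) tuples in projected metres and seconds,
    sorted by time (hypothetical layout, assumed for this sketch).
    """
    if not points:
        return []
    kept = [points[0]]
    for x, y, t in points[1:]:
        px, py, pt = kept[-1]
        # Actual distance: straight-line distance in projected coordinates
        # (an assumption; the paper's Equation (1) is not reproduced here).
        actual = math.hypot(x - px, y - py)
        # Theoretical distance: elapsed time multiplied by walking speed.
        theoretical = (t - pt) * speed
        if actual <= theoretical:
            kept.append((x, y, t))
    return kept


track = [(0, 0, 0), (1, 0, 1), (100, 0, 2), (3, 0, 3)]
cleaned = filter_by_speed(track)
```

In this toy track, the third point implies a 99 m jump in one second and is therefore discarded, while the remaining points are consistent with the 1.4 m/s limit.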
2.2.2. Tourist Route Reconstruction
To accurately count the number of tourists visiting various attractions, we employed Python to reconstruct the movement routes of each tourist in chronological order, following a thorough data-cleaning process. This facilitated the calculation of the transfer probabilities of tourists between attractions. Given the plethora of attractions within the scenic areas, visitation rates exhibit considerable variation. Consequently, we identified attractions with higher tourist flow as focal points for our statistical analysis. These attractions encompassed Xie Courtyard, Stone Gate Pavilion, Half-Mountain Pavilion, Observation Tower, Eighteen Scenes Park, Yongquan Temple, Lingyuan Cave, White Cloud Pavilion, Gratitude Pavilion, Buddhist Cave, Bore Nunnery, White Cloud Peak, White Cloud Cave, Jicui Nunnery, Mo Brook, Qingyangzuo, Shan Brook, Kuliang Club, Cryptomeria Fortunei Park, and Keping Reservoir. The ultimate length of the reconstructed tour routes depended on the variable number of attractions visited by tourists within the scenic area, as delineated in Table 2.
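As a rough illustration of the chronological route reconstruction described above, the following Python sketch groups visit records by tourist, sorts them by time, and collapses consecutive records at the same attraction. The record layout `(user_id, timestamp, attraction)` and the function name `reconstruct_routes` are assumptions made for this example; how trajectory points were matched to attractions is not specified in the text, so the sketch starts from already matched records.

```python
from collections import defaultdict
from itertools import groupby


def reconstruct_routes(visits):
    """Build one chronological attraction sequence per tourist.

    visits: iterable of (user_id, timestamp, attraction) records
    (hypothetical layout, assumed for this sketch).
    """
    by_user = defaultdict(list)
    for user_id, timestamp, attraction in visits:
        by_user[user_id].append((timestamp, attraction))

    routes = {}
    for user_id, records in by_user.items():
        records.sort(key=lambda record: record[0])  # chronological order
        names = (attraction for _, attraction in records)
        # Collapse consecutive records at the same attraction into one stop.
        routes[user_id] = [name for name, _ in groupby(names)]
    return routes


visits = [
    ("u1", 3, "Observation Tower"),
    ("u1", 1, "Xie Courtyard"),
    ("u1", 2, "Xie Courtyard"),
    ("u2", 5, "Mo Brook"),
]
routes = reconstruct_routes(visits)
```

The route length naturally varies with the number of attractions each tourist visits, as noted in the paragraph above.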
2.3. Spatio-Temporal Behavior Patterns Analysis
We propose an analytical approach to investigate the spatio-temporal behavioral patterns of tourists. In this phase, we analyzed tourists’ movement patterns within the Kushan Scenic Area using GPS trajectory data and geotagged photographs. By delving into both spatial and temporal dimensions, we aim to effectively uncover the travel preferences of tourists.
Specifically, this approach can be divided into three steps: (1) Spatial distribution of tourists. This initial step involves the spatial gridding of the area of interest (AOI) range within the Kushan Scenic Area to assess tourists’ stay time, number of photographs taken, and trajectory points within each grid. Subsequently, the spatial join function of ArcMap 10.8 is employed to visualize the data, identify tourists’ areas of interest, and clarify the spatial characteristics of tourist behavior. (2) Tourists’ movement laws and patterns. To analyze the movement laws and patterns of tourists, we utilize a Markov chain model to compute the transfer probabilities of tourists between attractions and acquire the final steady-state distribution. (3) Spatio-temporal behavior patterns of tourists. Given the extensive network of roads within the Kushan Scenic Area, tourists have a multitude of route options, and their points of entry may vary. Thus, we designate Xie Courtyard, Gratitude Pavilion, Mo Brook, Shan Brook, Jicui Nunnery, and Kuliang Club as both the starting and ending points for tourists. Recognizing that the Observation Tower and White Cloud Peak serve as essential waypoints along these routes, attracting a significant flow of tourists, we classify these two attractions as the midpoints of the six routes. Leveraging this framework, we performed a K-Means clustering analysis of the arrival and departure times of attractions to investigate the spatio-temporal behavioral patterns of tourists.
2.3.1. Spatial Distribution of Tourists
Spatial data gridding serves as a fundamental technique for geometrically counting and integrating heterogeneous data [
29]. In this process, GPS trajectory points play a pivotal role, providing crucial spatial information. Leveraging ArcMap’s grid analysis tool, researchers can partition the study area into customizable grid cells, which enhances the processing of tourist trajectory data. By gridding the study area, researchers can effectively quantify various aspects of tourist behavior, including the duration of their stay, the frequency of geotagged photos, and the overall tourist count at specific locations. This systematic approach ensures a comprehensive analysis of spatio-temporal patterns within the tourist destination. In this study, we partitioned the Kushan Scenic Area into 100 m × 100 m grid cells using ArcMap’s grid tool. Subsequently, we employed the spatial connectivity function to link GPS trajectory points and geotagged photographs with these grid cells. This method facilitated the computation of total tourist stay time, photo count, and trajectory points across various grid cells, providing valuable insights into tourist behavior within the scenic area.
2.3.2. Tourists’ Movement Laws and Patterns
In studying tourist behavior, we employ the Markov transfer probability matrix to analyze the probability of tourists transitioning between various attractions [30]. This probability is computed as the ratio of the number of visitors transferring from a specific attraction to another attraction to the total number of visitors departing from that specific attraction. We calculated the transfer probability using all available data from 2011 to 2023. Ultimately, the attractions most frequently visited are identified based on the steady-state distribution probability following convergence. In this study, a tour route is regarded as a Markov chain, with attractions connected based on the sequence of tourist visits. For example, a tour route might progress as follows: Xie Courtyard → Stone Gate Pavilion → Half-Mountain Pavilion → Observation Tower. To illustrate this concept, we present the reconstructed tour routes in Table 2.
We hypothesize that tourists’ transfer between various attractions follows a Markov chain model. We define $A = \{a_1, a_2, a_3, \ldots, a_n\}$ as the set of nodes representing tourist transfers, with transfer routes ordered in time as $t_1 < t_2 < t_3 < \cdots < t_k$ for any given time sequence $T = (t_1, t_2, t_3, \ldots, t_k)$. Under the assumption of a steady-state Markov chain, the probability of a tourist transitioning from attraction $a_i$ to attraction $a_j$ at time $t_k$ depends on the state and transfer probabilities at $t_k$. This probability is independent of previously experienced routes and is calculated using the formula presented below in Equation (2).
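To make the transfer probability concrete, the following Python sketch counts consecutive attraction pairs in reconstructed routes and divides each count by the number of departures from the origin attraction. It is a sketch under stated assumptions rather than the authors' implementation: the function name `transition_matrix` is hypothetical, and rows without any departures are left as zeros here (the smoothing step in Section 2.3.3 addresses such cases).

```python
def transition_matrix(routes, attractions):
    """Estimate first-order transfer counts and probabilities.

    routes: iterable of attraction-name sequences, one per tourist.
    attractions: ordered list of attraction names (defines matrix indices).
    Returns (counts, probs) as nested lists indexed [origin][destination].
    """
    index = {name: k for k, name in enumerate(attractions)}
    n = len(attractions)
    counts = [[0] * n for _ in range(n)]
    for route in routes:
        # Each consecutive pair of stops is one observed transfer.
        for origin, destination in zip(route, route[1:]):
            counts[index[origin]][index[destination]] += 1

    probs = []
    for row in counts:
        departures = sum(row)
        probs.append([c / departures if departures else 0.0 for c in row])
    return counts, probs


toy_routes = [["A", "B", "C"], ["A", "B"], ["A", "C"]]
counts, probs = transition_matrix(toy_routes, ["A", "B", "C"])
```

In the toy data, three tourists leave attraction A, two of them toward B, so the estimated probability of the A to B transfer is 2/3.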
2.3.3. Laplace Smoothing
During the computation of Markov transfer probability matrices, certain attractions (nodes) may have events with a visit frequency of zero. However, it would be unreasonable to assume that attractions with a visit probability of zero are necessarily unvisited solely based on this dataset. According to the actual situation, in the Kushan Scenic Area, within the road complex, tourists can start at any attraction and move to another attraction, which is strongly random. Considering the irreducible nature of the Markov chain, we adopt Laplace Smoothing for nodes with zero probability. In statistics, additive smoothing, also recognized as Laplace smoothing, aims to address zero-probability events by employing the “plus one” method to augment each count [
31]. The formula is shown below in Equation (3). Here, $\mathrm{count}(t_{ij})$ denotes the frequency of visitors transitioning from attraction $i$ to attraction $j$ within the dataset, $N$ signifies the total count of transfers, and $A$ denotes the total count of attractions.
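A minimal Python sketch of add-one smoothing followed by a steady-state computation is given below. It assumes the smoothed probability takes the standard form (count(t_ij) + 1) / (N + A) and, so that each row sums to one, it interprets N as the number of transfers departing from attraction i; both points are assumptions of this sketch, as are the function names `laplace_smooth` and `steady_state`. The steady state is approximated by power iteration, which is one common approach and not necessarily the procedure used in the study.

```python
def laplace_smooth(counts):
    """Apply add-one smoothing row by row to a square count matrix."""
    a = len(counts)  # total number of attractions
    smoothed = []
    for row in counts:
        n = sum(row)  # transfers departing from this attraction (assumed N)
        smoothed.append([(c + 1) / (n + a) for c in row])
    return smoothed


def steady_state(matrix, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(u - v) for u, v in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


toy_counts = [[0, 2, 1], [0, 0, 1], [0, 0, 0]]
smoothed = laplace_smooth(toy_counts)
pi = steady_state(smoothed)
```

Because every smoothed entry is strictly positive, the resulting chain is irreducible and aperiodic, so the iteration converges to a unique stationary distribution.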
2.3.4. K-Means Clusters Analysis
Cluster analysis is utilized to identify the proximity of discrete data attributes and to uncover similarities and anomalies within the dataset. The K-Means clustering algorithm is employed for this purpose, which groups multiple informational data objects into several meaningful clusters. Each cluster’s centroid represents the mean value of its members. The primary goal is to maximize the similarity within each cluster while minimizing the dissimilarity between clusters [
32]. The K-means clustering method operates on a dataset comprising $n$ data points $\{x_1, x_2, x_3, \ldots, x_n\}$ and a set of $k$ cluster centers $\{c_1, c_2, c_3, \ldots, c_k\}$. It calculates the Euclidean distance from each point to each centroid, assigning points to the cluster whose centroid is closest. This process iterates until the cluster centers converge [
33]. The formula is shown below in Equation (4):
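The assign-and-update loop described above can be sketched in plain Python as follows. This is an illustrative implementation of the standard K-Means (Lloyd) iteration, not the software used in the study; the function name `kmeans`, the random initialization from data points, and the toy (arrival hour, departure hour) pairs are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of numbers) with the basic K-Means iteration."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(v) / len(members) for v in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers no longer move, so the iteration has converged
        centers = new_centers
    return labels, centers


# Toy (arrival hour, departure hour) pairs: a morning and an afternoon group.
times = [(8, 12), (8.5, 12.5), (9, 13), (14, 18), (14.5, 18.5), (15, 19)]
labels, centers = kmeans(times, k=2)
```

For real data, the number of clusters and the initialization would need to be chosen with care, since K-Means can converge to different local optima from different starting centers.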
5. Conclusions
This study analyzed the spatio-temporal behavioral patterns of tourists in the Kushan Scenic Area by examining their movement trajectory data and geotagged photographs. The main findings are as follows: (1) The grid analysis categorizes the scenic area into three distinct zones: the Traditional Tourist Area, the Recent Development Area, and the Outdoor Adventure Area. (2) Analysis of tourists’ movement trajectories identifies six patterns: (a) Traditional Mountaineering and Sightseeing Patterns; (b) Short-Distance Mountaineering Patterns; (c) Sightseeing and Mountaineering Patterns; (d) Long-Distance Mountaineering Patterns; (e) Long-Distance Mountaineering and Sightseeing Patterns; (f) Outdoor Exploration Pattern. (3) Upon convergence, the steady-state distribution results of the Markov chain reveal that tourists predominantly transfer towards attractions located within the Traditional Tourist Area. (4) Many tourists engage in daytime activities within the scenic area, primarily departing in the morning, while nighttime activities attract fewer visitors. In conclusion, this study aids government departments, scenic area planners, and destination marketing organizations in identifying popular attractions and areas within scenic regions. It also facilitates tourism-related departments’ efforts to enhance the environment and quality of scenic areas, building upon existing infrastructure and enriching tourists’ experiences within the urban forest.
However, this study also has certain limitations. Specifically, it focuses on a mountainous and forested scenic area, which differs somewhat from other scenic areas regarding the tourism experience. It remains uncertain whether the methodology of this study is applicable to other types of scenic areas. Moreover, this study lacks the stringent demographic stratification of survey research. The GPS trajectory data solely reflect the overall spatio-temporal behavior of tourists within scenic areas, leaving unexplored the potential differences resulting from variables such as gender, education, and age. Therefore, further research on mountain scenic areas requires more advanced technical methods to analyze tourists’ spatio-temporal behavior patterns.