Delineating Peri-Urban Areas Using Multi-Source Geo-Data: A Neural Network Approach and SHAP Explanation

Sun, Xiaomeng; Liu, Xingjian; Zhou, Yang

doi:10.3390/rs15164106


by Xiaomeng Sun ¹,², Xingjian Liu ³ and Yang Zhou ¹,²,*

¹ Key Laboratory for Geographical Process Analysis & Simulation of Hubei Province, Wuhan 430079, China

² College of Urban and Environmental Sciences, Central China Normal University, Wuhan 430079, China

³ Department of Urban Planning and Design, University of Hong Kong, Hong Kong SAR, China

* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(16), 4106; https://doi.org/10.3390/rs15164106

Submission received: 24 June 2023 / Revised: 11 August 2023 / Accepted: 18 August 2023 / Published: 21 August 2023

(This article belongs to the Special Issue Geospatial Foundation Model in Urban Environments: Challenges and New Technologies)


Abstract:

Delineating urban and peri-urban areas has often relied on information from multiple sources, including remote sensing images, nighttime light images, and points of interest (POIs). Human mobility information from big geo-spatial data could also be relevant for delineating peri-urban areas, but its use has not been fully explored. Moreover, it is necessary to assess how individual data sources are associated with identification results. To address these gaps, we apply a neural network model that integrates indicators from multiple sources, including land cover maps, nighttime light imagery, and information about human movement from taxi trips, to identify peri-urban areas. SHapley Additive exPlanations (SHAP) values are used as an explanation tool to assess how different data sources and indicators may be associated with delineation results. Wuhan, China is selected as a case study. Our findings highlight that socio-economic indicators, such as nighttime light intensity, have significant impacts on the identification of peri-urban areas. Spatial/physical attributes derived from land cover images and road density have relatively low associations. Moreover, taxi intensity, as a typical human movement dataset, may complement nighttime light and POI datasets, especially in refining boundaries between peri-urban and urban areas. Our study could inform the selection of data sources for identifying peri-urban areas, especially when facing data availability issues.

Keywords:

peri-urban; urban–rural fringe; taxi trajectory; nighttime light images; SHAP values

1. Introduction

Peri-urban areas or urban–rural fringes emerge during rapid urban expansion [1,2,3,4], featuring “multiple rural-to-urban transformations that create a mosaic-like landscape of urban and rural land-uses, livelihoods, and lifestyles” [1] (p. 1). Such ambiguity may pose challenges for development, planning, and management [5] and calls for accurate delineation of peri-urban boundaries [6,7]. This task is challenging considering how land use, population, and economic activities are mixed and evolving in the fringe areas [1,6,8].

Peri-urban areas have been delineated and characterized in terms of land use patterns, socio-economic conditions, and development policies [8,9,10,11]. In addition to these factors, information about human mobility such as urban–rural commuting and public transport could be relevant for delineating peri-urban areas [12,13,14]. As for empirical methods, conventional methods are often transect-line-based [15], remote-sensing-based and land-use-centered [16,17]. Such methods may require prior knowledge in deciding cluster parameters [16] or rely on manually selected mutation points [18]. Furthermore, given the increasing number of datasets that could be potentially useful for identifying peri-urban areas, it may be necessary to assess how individual data sources are associated with identification results.

This study, therefore, aims to delineate peri-urban areas using multi-source data and machine learning while incorporating information about human movement. In particular, we utilize land use and land cover (LULC) maps, nighttime light images, points of interest (POIs), road networks, and taxi trajectories. We group relevant datasets and factors into physical/spatial, socio-economic, and human movement intensity types, so as to explain the potential contributions of individual factors to identifying peri-urban areas through the explanation tool SHapley Additive exPlanations (SHAP) [19].

This study aims to contribute in the following ways. First, we evaluate and interpret how different data sources and indicators may be associated with the delineation results using SHAP. A more solid understanding of such associations may help data selection, especially when facing data availability issues and constraints. Second, we use taxi trip origins/destinations (OD) [14] to explore the potential of jointly applying human mobility data and other socio-economic and land use datasets for differentiating peri-urban and urban areas. Subsequent sections are organized as follows. The next section reviews current approaches to identifying peri-urban areas and common data sources. The study area, datasets, and methodology are detailed in Section 3. Results and comparisons with other studies are discussed in Section 4. The study concludes with key findings and future work.

2. Literature Review

2.1. Peri-Urban Identification

While peri-urban areas have been defined differently [1], their roles as “transitions” between the urban and rural are often highlighted [1,20,21,22]. Importantly, urban peripheries could be dynamic during socio-ecological transitions [4], posing challenges to the delineation of peri-urban boundaries [7,10]. While Mortoja et al. (2020, p. 10) have suggested that “a generic method does not exist to demarcate peri-urban areas as characteristics of peri-urban areas are not uniform across the globe” [10], the following types of methods can be identified: qualitative, transect-line-based, remote-sensing-based and land-use-centered as well as data-driven approaches [1,7,10].

In qualitative (and often case-based) approaches, interviews and document analysis are often used to define and categorize peri-urban areas [3,7,11,21]. Differences in terms of culture, market, and industry may be associated with different types of (in)formal urban expansion and peri-urban patterns [3]. Efforts have been made to integrate quantitative and qualitative approaches as well as to extend these methods across geographical contexts [1,3].

Many studies employ transect lines to characterize urban–rural changes and gradients of population distribution and land use [23]. Transect lines are often generated by creating lines from identified centers towards the boundaries of study areas [24,25]. A key methodological issue, therefore, concerns the detection of major transitions along transect lines [24]. Common methods include constraints from multiple weighted indices [25] and spatial continuous wavelet transform [15,26]. However, in this approach, an urban center needs to be decided first in order to create sample or transect lines, which may limit the method’s application in cities with multiple centers.

Another group of literature has classified urban land use/land cover (LULC) based on remote sensing images. Indicators such as built-up densities [25] and Shannon diversity [27] have been used to assess urban and rural morphological characteristics. Nightlight satellite images, such as those gathered by Suomi National Polar-Orbiting Partnership’s Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) [28], can be adapted and used as a proxy measurement of population and economic intensity [17,29,30,31]. Classification methods such as support vector machine [32], K-means [16,17], and K-means++ [33] have been applied at metropolitan and regional scales.

Recent developments in urban data also open up opportunities for peri-urban area identification. For example, the density of POIs has been integrated to characterize the functional use of urban land [25]. In addition, mobility indicators such as public transport provision, commuting statistics [13], and the “migration-commuting nexus” [12] (p. 118) are used to investigate the character of urban periphery growth. The wide availability of human mobility data has shown its potential in revealing patterns of human activities and urban form [34,35,36]. In this regard, taxis represent one of the most common types of cars cruising within cities and their large numbers of trips are widely analyzed [37]. In He et al. (2021) [26], migration data from the Tencent heatmap platform have been fused with nighttime images and POIs to depict urban agglomeration boundaries.

Overall, different indicators used in various studies have been summarized in other comprehensive reviews [1,7,10]. Peri-urban areas are often “characterized by either the loss of ‘rural’ aspects…or the lack of ‘urban’ attributes” [22] (p. 136). To this end, the delineation of peri-urban areas may employ urbanization-related indicators covering both physical development and socio-economic features [1,10]. Moreover, whether and to what extent the human movement intensity could contribute to describing the variations in the transitional areas, especially at the metropolitan level, needs to be further explored.

2.2. Neural Network in Peri-Urban Study and the SHAP Explanation

Neural networks (NNs) have been adopted to identify peri-urban areas [38,39] and associated urbanization and regional changes [40]. For example, Tayyebi et al. (2011) use an NN model to simulate and predict the urban boundary of Tehran, Iran [41]. Peng et al. (2020) map urban and urban–rural fringe areas in Beijing based on self-organizing features [42]. More recently, Tsagkis et al. (2023) develop an NN model to predict urban growth in selected Greek cities [39]. NN models have the property of nonlinearly combining multiple features in the final classification results, and thus “tend to be fairly insensitive to problems of multicollinearity” [43] (p. 394). Nevertheless, the need to properly explain model results has been highlighted [44]. While there are many insightful reviews, the following discussion draws extensively from sources such as Linardatos et al. (2021) and Molnar (2022) and provides the context for this study [44,45].

Conventional methods often help with interpreting machine learning models via various plots [44,45]. While the Partial Dependence Plot (PDP) represents a global or overall explanation and shows a feature’s average marginal effect considering all training instances [46], it may overlook how effects may vary across samples for the same feature [45,46]. The Individual Conditional Expectation (ICE) plot improves the PDP by providing more (local) information about individual samples or instances, though the plot could be too dense if the training dataset is large [45,47]. Still, the Permutation Feature Importance (PFI) measures and ranks the importance of a feature based on permutation and prediction errors [48]. However, the PFI may be associated with bias as the order of features matters in tree-type models [45,49].

In light of the above limitations, explainable AI tools [44] notably including Local Interpretable Model-agnostic Explanations (LIME) [50] and SHAP have been proposed [44]. LIME is an explanation model which uses local interpretable models to explain individual predictions [50] and SHAP links LIME with Shapley values from coalitional game theory [19,45,51]. SHAP has different variants such as Kernel SHAP which has wide applicability [19] and TreeSHAP which entails a fast implementation [49].

Existing studies have pointed to the following strengths of SHAP [45,49]. First, SHAP reports both the global importance of a feature and effects of such a feature at the individual instance or sample level [19,45]. Second, SHAP is grounded in game theory and has several “describable properties: local accuracy, missingness, and consistency” [44] (p. 12). Third, SHAP can provide additional interaction effect between feature combinations using the SHAP dependence plot with interaction values [19]. Last, SHAP and LIME can work with a wide range of data formats such as text and images [44,45]. Nevertheless, as Molnar (2022) noted, the disadvantage of SHAP is that Kernel SHAP is computationally costly and may overlook feature dependence [45].

Therefore, we use a neural network to identify peri-urban areas and apply SHAP values to assess how different indicators can be linked with identification results. In this regard, it may be deemed an attempt at moving the NN model from a “black box” towards a “white box” [44] (p. 1).

3. Methodology

The framework of this study mainly contains three parts (Figure 1). First, we calculate eight commonly used urbanization-related indicators introduced in existing studies based on road networks, remote-sensing-based LULC map, nighttime light images, POI data, as well as taxi trip OD data. Then, a neural network model is applied to integrate these indicators and capture the non-linear relationships between indicators and peri-urban patterns. Finally, SHAP analysis is used to investigate the association between different indicators and model outputs, so as to explain the potential association between urbanization indicators and peri-urban boundaries. Whenever possible, our results are cross-referenced with the relevant literature and official plans of the city.

3.1. Study Area and Data Sources

We use Wuhan, one of the largest cities in central China with an administrative jurisdiction of about 8569 km² (Figure 2), as the case. Wuhan may serve as a suitable test ground for delineating peri-urban areas, as the city in past years has undergone significant changes in terms of rapid urbanization, population growth, and subsequently shifting peri-urban areas [16,52]. For example, according to the Statistical Bulletin of Wuhan on National Economic and Social Development (http://tjj.wuhan.gov.cn/tjfw/tjgb/, accessed on 13 April 2023), the built-up area of Wuhan has increased from 500 km² in 2010 to 885.11 km² in 2020.

To capture physical land use and land cover, socio-economic development, and human movement, eight indicators from five categories of datasets are employed and listed in Table 1. This study uses the LULC dataset from a global land cover map of 2017 with a 10-m resolution, produced by Gong et al. (2019) [53]. In this dataset, land use and cover are classified (e.g., cropland, water, and impervious) based on Sentinel-2 images and other auxiliary data [53]. The nighttime light dataset is obtained from annual night light datasets for China based on NPP-VIIRS in 2018, released by the China Resource and Environment Science and Data Center [54]. We also gather the POI records in Wuhan in 2018 through application programming interfaces (APIs) of the Gaode navigation map platform. After removing redundant and missing records, there are a total of 601,346 POIs. Then, we follow Dong et al. (2022) [55] and select the following service-oriented POI types for subsequent analysis: administration, institutions, public service, catering, shopping, educational, entertainment, life services, automobile services, hotels, and finance. Moreover, road networks in Wuhan are collected from a local navigation map company instead of OpenStreetMap to avoid uncertainties or gaps in volunteered data, especially in remote rural areas. Furthermore, the taxi trip dataset is acquired for the period of 7–11 May 2018 and includes 407,713 trip records. We keep the pick-up and drop-off points of each occupied trip as trip O/Ds in this study.

3.2. Deriving Urbanization-Related Indicators

In this section, we calculate eight indicators commonly used in the literature to characterize urbanization patterns, namely, built-up land density (BLD), Shannon diversity index (SHDI), road network density (RD), nighttime light intensity (NLI), nighttime light fluctuation (NLF), POI density (PD), service POI density (SPD), and taxi trip intensity (TI; Table 1). The study area is organized into grids of 500-by-500 m to provide common analytical units while accounting for the spatial resolutions of the various datasets.

Built-up land density (BLD) is derived from LULC data and characterizes the proportion of built-ups. The calculation of BLD for individual grids follows that of Yang et al. (2021) [25] and Peng et al. (2018) [15]:

$$\mathrm{BLD}_{i} = \frac{\mathrm{impervious}_{i}}{\mathrm{Area}_{i}} \tag{1}$$

where $\mathrm{impervious}_{i}$ and $\mathrm{Area}_{i}$ represent the impervious area and total area of grid $i$.

With rapid land use transformation, peri-urban areas tend to have a higher degree of land use diversity than urban and rural areas [27,52]. The Shannon diversity index (SHDI) is employed to characterize land use diversity and landscape complexity [27,52].

$$\mathrm{SHDI} = - \sum_{i = 1}^{n} p_{i} \ln p_{i} \tag{2}$$

where $p_{i}$ denotes the proportion of different land use and cover types within individual grids.
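As a minimal sketch of Equation (2), the snippet below computes SHDI for a single grid cell from per-class pixel counts. The function name, the dictionary input, and the example counts are hypothetical and only illustrate the formula; they are not taken from the authors' implementation. Classes with zero area are skipped, following the usual convention that $p \ln p$ tends to 0 as $p$ tends to 0.

```python
import math


def shannon_diversity(class_areas):
    """Return SHDI = -sum(p * ln p) for one grid cell.

    class_areas maps a land cover label to the area (or pixel count)
    that the class occupies inside the cell.
    """
    total = sum(class_areas.values())
    if total <= 0:
        raise ValueError("grid cell has no classified area")
    shdi = 0.0
    for area in class_areas.values():
        if area > 0:  # zero-area classes contribute nothing
            p = area / total
            shdi -= p * math.log(p)
    return shdi


# Hypothetical cells: a 500 m cell holds 2500 pixels at 10 m resolution.
single_class_cell = {"cropland": 2500}
two_class_cell = {"cropland": 1250, "impervious": 1250}
print(shannon_diversity(single_class_cell))  # no diversity
print(shannon_diversity(two_class_cell))     # equals ln(2) for two equal shares
```

A cell covered by one class yields 0, and the value grows as more classes share the cell more evenly.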

Road network density (RD) is defined by dividing total road length over total area of individual grids [56]. It is calculated in Equation (3):

$$\mathrm{RD}_{i} = \frac{\sum_{j = 1}^{n} \mathrm{Length}_{j}}{\mathrm{Area}_{i}} \tag{3}$$

where $\mathrm{Length}_{j}$ denotes the length of individual road links within the corresponding grid.

Nighttime light images used in this study are annually aggregated and averaged to reduce pixel saturation and inter-annual variation (see also He et al. (2021), Sutton et al. (2010)) [26,31]. The monthly nighttime light intensity (NLI) is resampled to the units of analysis in the current study based on average rules. $\mathrm{NLI}_{i}$ is used to denote the nighttime light intensity of grid $i$.

Nighttime light fluctuation (NLF) is calculated from the light intensity values and characterizes the variation therein [17]. Specifically and following Feng et al. (2020) [17], NLF is calculated as follows:

$$\mathrm{NLF}_{i} = \max_{j \in \mathrm{neighbor}(i)} \mathrm{NLI}_{j} - \min_{j \in \mathrm{neighbor}(i)} \mathrm{NLI}_{j} \tag{4}$$

where $\mathrm{NLF}_{i}$ is the nighttime light fluctuation index of grid $i$; the neighborhood $\mathrm{neighbor}(i)$ includes the eight adjacent grids by queen contiguity and the center grid itself; and $\max_{j \in \mathrm{neighbor}(i)} \mathrm{NLI}_{j}$ and $\min_{j \in \mathrm{neighbor}(i)} \mathrm{NLI}_{j}$ denote the maximum and minimum NLI within that neighborhood.
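The neighborhood operation in Equation (4) can be sketched as a 3-by-3 moving window over the NLI grid. The function name and the toy grid below are hypothetical. The treatment of cells on the edge of the study area (the window is simply clipped to cells that exist) is an assumption, since the text does not state how boundaries are handled.

```python
def nighttime_light_fluctuation(nli):
    """Return max - min of NLI over each cell's queen neighborhood.

    nli is a list of equal-length rows. The window covers the cell itself
    and its (up to) eight adjacent cells; at the grid edge the window is
    clipped to existing cells (an assumption, not stated in the text).
    """
    rows, cols = len(nli), len(nli[0])
    nlf = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                nli[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            nlf[r][c] = max(window) - min(window)
    return nlf


# Hypothetical grid: one bright cell surrounded by dark cells.
toy_nli = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
for row in nighttime_light_fluctuation(toy_nli):
    print(row)
```

Cells adjacent to the bright cell receive a high NLF even though their own NLI is low, which is why the index picks out sharp transitions such as those around isolated industrial sites.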

POI density (PD) and service POI density (SPD) are calculated as follows:

$$\mathrm{PD}_{i} = \frac{\mathrm{NPOI}_{i}}{\mathrm{Area}_{i}} \tag{5}$$

$$\mathrm{SPD}_{i} = \frac{\mathrm{NPOI}_{i}(\mathrm{service})}{\mathrm{Area}_{i}} \tag{6}$$

where $\mathrm{PD}_{i}$, $\mathrm{NPOI}_{i}$, $\mathrm{SPD}_{i}$, and $\mathrm{NPOI}_{i}(\mathrm{service})$ represent the overall POI density of grid $i$, the number of POIs falling into grid $i$, the service POI density of grid $i$, and the number of service POIs, respectively.

Taxi intensity (TI) denotes the number of taxi trip OD that falls inside the grid. TI is calculated considering both pick-up and drop-off points, as denoted in Equation (7):

$$\mathrm{TI}_{i} = \frac{\mathrm{Pickup}_{i} + \mathrm{Dropoff}_{i}}{\mathrm{Area}_{i}} \tag{7}$$

where $\mathrm{TI}_{i}$ is the taxi trip OD intensity of grid $i$, $\mathrm{Pickup}_{i}$ is the number of taxi pick-up points falling into grid $i$, and $\mathrm{Dropoff}_{i}$ is the number of taxi drop-off points falling into grid $i$. All of the above indicators are normalized using min-max scaling.
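The counting in Equation (7) and the subsequent min-max scaling can be sketched as follows. This is an illustrative sketch only: the function names, the coordinates, the 2-by-2 grid extent, and the use of km² as the area unit are assumptions, and points are assumed to be already projected to metres. Cells without any trip end are kept with an intensity of zero so that scaling runs over the whole grid.

```python
from collections import Counter

CELL_SIZE_M = 500.0  # grid size stated in the text


def taxi_intensity(pickups, dropoffs, x_min, y_min, n_cols, n_rows,
                   cell_size=CELL_SIZE_M):
    """Count pick-up and drop-off points per cell and divide by cell area."""
    counts = Counter()
    for x, y in list(pickups) + list(dropoffs):
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:  # ignore points off-grid
            counts[(col, row)] += 1
    area_km2 = (cell_size / 1000.0) ** 2  # assumed unit: points per km^2
    return {
        (c, r): counts[(c, r)] / area_km2
        for r in range(n_rows)
        for c in range(n_cols)
    }


def min_max_scale(values):
    """Rescale a {cell: value} mapping linearly to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # constant indicator: map everything to 0
        return {cell: 0.0 for cell in values}
    return {cell: (v - lo) / (hi - lo) for cell, v in values.items()}


# Hypothetical trip ends on a 2-by-2 grid anchored at the origin.
pickups = [(100.0, 100.0), (600.0, 100.0)]
dropoffs = [(120.0, 130.0), (900.0, 900.0)]
ti = taxi_intensity(pickups, dropoffs, 0.0, 0.0, n_cols=2, n_rows=2)
ti_scaled = min_max_scale(ti)
print(ti_scaled)
```

The same `min_max_scale` helper would apply unchanged to the other seven indicators once they are expressed per grid cell.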

3.3. Classifying Peri-Urban Areas with a Neural Network

A neural network is applied to integrate the abovementioned indicators and capture the non-linear relationships between urbanization-related indicators and peri-urban patterns. The neural network framework has four layers (Figure 3), including two hidden layers, similar to Tsagkis et al. (2023) [39]. The sigmoid function is used as the activation function. We input the eight indicators of each grid into the model, with the three output nodes corresponding to urban, peri-urban, and rural, respectively [8].
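To make the layer structure concrete, the following is a plain-Python sketch of a forward pass through a network with eight inputs, two sigmoid hidden layers, and three outputs. It is not the authors' Keras model: the hidden layer widths (16 and 8), the softmax on the output layer, the random untrained weights, the indicator order, and the input values are all assumptions made for illustration.

```python
import math
import random

N_INPUTS, N_OUTPUTS = 8, 3
HIDDEN_SIZES = (16, 8)  # assumption: layer widths are not given in the text
CLASSES = ("urban", "peri-urban", "rural")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turn output scores into class probabilities (assumed output layer)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    """Weighted sum plus bias for every node of one layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def build_network(seed=0):
    rng = random.Random(seed)
    sizes = (N_INPUTS, *HIDDEN_SIZES, N_OUTPUTS)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(network, indicators):
    if len(indicators) != N_INPUTS:
        raise ValueError("expected eight indicator values")
    activations = list(indicators)
    for layer in network[:-1]:  # the two hidden layers use sigmoid
        activations = [sigmoid(z) for z in dense(activations, layer)]
    return softmax(dense(activations, network[-1]))


network = build_network()
# Hypothetical scaled values, assumed order: BLD, SHDI, RD, NLI, NLF, PD, SPD, TI
cell = [0.6, 0.4, 0.5, 0.3, 0.3, 0.2, 0.1, 0.05]
probs = predict_proba(network, cell)
label = CLASSES[max(range(N_OUTPUTS), key=probs.__getitem__)]
print(probs, label)
```

Because the weights here are random, the printed label carries no meaning; training on the labelled sample grids is what would make the three outputs correspond to urban, peri-urban, and rural.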

The selection of grids for training to represent urban, peri-urban, and rural areas is ‘supervised’. Specifically, we select 5 sample areas for each type, and identify 750 sample grids overall (i.e., 250 grids for individual types). The samples are selected based on human inspection of remote sensing images, street views (e.g., those from Baidu map), and plans as well as validated based on results from similar studies such as Ding and Chen (2022) [16]. We have conducted two rounds of independent sample selection to ensure consistency. Specifically, the Jaccard indices between identification results from the two sets of samples are above 80%. Subsequently, 70% of the sampled grids are randomly selected and used for model training. The test set, therefore, uses the rest of the samples. We set the learning rate to be 0.0001 and iterated the training over 3000 times to ensure the overall accuracy of the model is above 95%. As a result, the whole metropolitan area is classified into three categories. The neural network is implemented through the Keras library (version 2.10.0) in Python [57].

3.4. SHAP Explanation

SHAP leverages game theory and calculates Shapley values for each instance (in the training set) to measure the impact of features (i.e., in our case the urbanization-related indicators) on the model predictions (i.e., in our case the identification of urban, peri-urban, and rural areas) [19,45]. Subsequent description of the method draws heavily upon Lundberg and Lee (2017) and Molnar (2022) [19,45]. SHAP values are calculated based on weighted averages of differences between predictions from training the model with all features and with focused features removed [19,45]. Specifically, the Shapley value of a feature is calculated following Equation (8) [19]:

$$\Phi_{i} = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}} (x_{S \cup \{i\}}) - f_{S} (x_{S}) \right] \tag{8}$$

where $F$ and $S$ represent the full set and a subset of features; $f$, $f_{S \cup \{i\}}$, and $f_{S} (x_{S})$ denote the prediction model, the prediction using feature $i$, and the prediction without using feature $i$, respectively. $\Phi_{i}$ thus characterizes the feature’s contribution to the final prediction, with a positive value suggesting that the corresponding feature increases the prediction value and a negative value otherwise. Calculations can also be performed for a feature at the sample level [45]. SHAP feature importance is the magnitude of the feature attribution to the model. It is subsequently calculated by averaging absolute Shapley values across samples, as shown in Equation (9) [45].

$$I_{i} = \frac{1}{n} \sum_{j = 1}^{n} \left| \Phi_{i}^{(j)} \right| \tag{9}$$

where $i$ and $j$ index features and samples, respectively.
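Equations (8) and (9) can be illustrated with an exact, brute-force computation on a toy model. This sketch enumerates every subset and is therefore only practical for a handful of features; it is not Kernel SHAP and not the Shaprpy implementation used in the study. One further assumption is labelled in the code: instead of retraining a model for each subset, features outside the subset are replaced with a fixed baseline value. The toy model, samples, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values via Equation (8).

    value(S) returns the model output when only the features in the
    frozenset S are 'present'.
    """
    features = range(n_features)
    phis = []
    for i in features:
        others = [f for f in features if f != i]
        phi = 0.0
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def make_value_function(model, x, baseline):
    """Assumption: absent features are set to a baseline, not retrained away."""
    def value(subset):
        masked = [x[k] if k in subset else baseline[k] for k in range(len(x))]
        return model(masked)
    return value


def toy_model(features):
    # Hypothetical score with one additive term and one interaction term.
    a, b, c = features
    return 2.0 * a + b * c


def shap_importance(phi_per_sample):
    """Equation (9): mean absolute Shapley value of each feature."""
    n = len(phi_per_sample)
    n_features = len(phi_per_sample[0])
    return [sum(abs(row[i]) for row in phi_per_sample) / n
            for i in range(n_features)]


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 1.0, 1.0], [0.5, 0.0, 1.0]]
phi_per_sample = [
    shapley_values(make_value_function(toy_model, x, baseline), 3)
    for x in samples
]
importance = shap_importance(phi_per_sample)
print(phi_per_sample)
print(importance)
```

For each sample the Shapley values sum to the difference between the model output for that sample and the output at the baseline, which is the local accuracy property mentioned in Section 2.2.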

As mentioned, the conventional Kernel SHAP method assumes features to be independent, which may be hard to satisfy in our study. Therefore, we have applied an extended Kernel SHAP, which can account for dependent features and provide more accurate approximations to the Shapley values [58]. SHAP values and summary plot are produced to assess how individual features are related to the delineation of peri-urban areas. Main calculations are implemented with package Shaprpy (version 0.1) [59] and visualizations are based on Shap (version 0.41.0) [19,60] in Python.

4. Results

4.1. Peri-Urban Area Identification

Figure 4 maps the distribution of normalized indicators. With the concentration of built-up area, economic activities, and population, core urban areas are associated with high levels of BLD, RD, NLI, and PD. NLF highlights areas that have sharp changes in NLI compared to their neighbors, such as areas around the airport, Yangluo Port, and Wuhan Iron and Steel Corporation (Figure 4e). These transport and industrial areas are located on the periphery, with high nighttime light intensity and fluctuation. The values of SPD are positively correlated with those of PD. However, SPD counts only service-type POIs, and its values are thus lower than those of PD. The taxi OD intensity TI is quite concentrated and mainly highlights the urban and peri-urban areas, differentiating them from rural areas (Figure 4h).

Figure 5 characterizes the distribution of identified urban, rural, and peri-urban areas from the output of the NN model. Average values of all indicators for the three types of identified areas are listed in Table 2. Overall, rural areas account for the largest proportion, reaching 63.90% and with an area of 5475 km². Peri-urban areas account for 30.91%, with an area of 2649 km². Urban areas are identified to account for 5.19% and have an area of 445 km².

Figure 4 and Table 2 show that the urban area has the highest values in BLD, RD, PD, SPD, and TI. In other words, the urban core has a high built-up density, an intensive road network, abundant POIs, and intensive taxi trips. The fringe areas show a more fragmented and diverse landscape and thus have the highest mean value of SHDI, which is in accordance with Huang et al. (2016) and Long et al. (2022) [27,52]. A joint reading of remote sensing images and local information suggests that the high values of SHDI in the northeastern district of Xinzhou correspond to an extensive area of ponds and waterbodies, often used for aquaculture. As depicted in Figure 4b, the distribution of high SHDI is quite fragmented and dispersed across the city. This indicates that SHDI may not be able to capture the diversity of the three areas in and of itself. The fringe areas also show high variation in light intensity, with the highest value of NLF compared to the urban core and rural areas, which is also in line with the description in Feng et al. (2020) [17]. Rural areas seem to have the lowest values in all indicators.

We have benchmarked our results against two sets of studies [16,52]. First, the comparison is made against previously published academic research of peri-urban patterns in Wuhan (Table 3). Here we compare our result with Ding and Chen (2022, p. 7) [16], specifically the set of results for the year of 2020 and with a cluster number of 4 in the K-means classification. The Jaccard indices of urban and peri-urban areas between our results and (digitized) Ding and Chen (2022)’s results when k = 4 in K-means are about 64.94% and 71.37%, respectively. Results in Long et al. (2022; Table 3, p. 12) [52] seem to be different from ours and Ding and Chen (2022) [16]. The differences may be attributed to different data sources, methodology, years of analysis, and definitions of peri-urban areas.
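The Jaccard index used in these comparisons is the size of the intersection of two sets of grid cells divided by the size of their union. A minimal sketch follows; the function name and the cell identifiers are hypothetical, and returning 1.0 for two empty sets is a convention assumed here.

```python
def jaccard_index(cells_a, cells_b):
    """Return |A intersect B| / |A union B| for two collections of cell ids."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        return 1.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


# Hypothetical peri-urban cells from two delineations, as (column, row) ids.
ours = {(0, 0), (0, 1), (1, 1), (2, 1)}
reference = {(0, 1), (1, 1), (2, 1), (2, 2)}
print(jaccard_index(ours, reference))  # 3 shared cells out of 5 in the union
```

Because both delineations must be expressed on the same grid before comparison, results digitized from other studies would first need to be resampled to the 500 m cells.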

Second, following Ding and Chen (2022) [16], we have compared the delineation results with the official Wuhan comprehensive plan (http://gtghj.wuhan.gov.cn, accessed on 13 April 2023; Figure 6). The plan characterizes land uses for different purposes such as residential, administrative, commercial, and educational as well as the “zones for development” (as highlighted by red dotted lines in Figure 6). The total of urban and peri-urban areas identified from our study overlaps with the ‘zones for development’, with a Jaccard index of 74.07%.

The differences between the identified peri-urban areas and the plan can be grouped into four main types based on a synthesis of information. The first type is often related to small towns located outside the urban development zone, such as area ① in Figure 6. Even though these towns are far from the city center, they often have clusters of residential buildings and relatively high road network densities. For example, area ① has a BLD value of 0.543 and an RD value of 0.575. Second, some ecological lands may be located inside and near the edge of the development zone but are often classified as rural areas in our results, such as area ② in Figure 6. Such areas are often farmlands and show no significant signs of development. Third, there are areas labelled as urban land uses in the plan but classified as peri-urban in our results, such as area ③ in Figure 6. Zooming into the remote sensing image and street views shows that, for example, area ③, located on the edge of the third ring road, has newly constructed residential zones, with a BLD value of 0.687. However, as shown in Figure 6, the corresponding streetscape may not show high levels of human activity. Specifically, population and socio-economic indices (PD = 0.309, NLI = 0.284, TI = 0.074) have not reached the average levels of urban areas. Fourth, some of the peri-urban areas overlap with ecological land, as in the example of area ④ in Figure 6. We can also find a mixed pattern of built-up land and farmland in the remote sensing images. In fact, area ④ has a high value of NLI (NLI = 0.493) but a low PD (PD = 0.125) and a low TI (TI = 0.001), featuring an industrial park with several logistics companies.

4.2. SHAP Analysis of Individual Factors

To explore the association between different indicators and model outputs, we calculate SHAP values for each indicator. Figure 7 is the SHAP summary plot, which shows the overall impacts of features in identifying urban, peri-urban, and rural areas. Individual points in Figure 7 represent samples, with red and blue ones corresponding to samples with high and low values, respectively. As mentioned above, the horizontal axis shows the SHAP value, with a positive value implying that the corresponding feature increases the prediction value and a negative value otherwise [19,45]. The features are sorted along the vertical axis by SHAP feature importance [45], which is also annotated as bold bars. Moreover, in order to illustrate the impacts of different indicators, we have produced delineation results with different combinations of physical/spatial, socio-economic, and human movement indicators (see comparisons in Figure 8). Additionally, Table 4 summarizes the Jaccard indices between the scenarios plotted in Figure 8a–e and the scenario in Figure 8f, which is produced with all eight urbanization-related indicators.

The result of Figure 8f is used as the benchmark in Table 4 because its model accuracy is the highest among the six scenarios. Moreover, it is suggested that the comprehensive index may work better than single indices compared in Yang et al. (2021) [25]. For example, results from using merely physical spatial features (BLD, SHDI, and RD) seem to identify larger urban areas (Figure 8a), whereas using indicators merely from the socio-economic (NLI, NLF, PD, and SPD) tends to identify more peri-urban areas (Figure 8b).

As revealed by the SHAP values, indicators from nighttime light data (NLI and NLF) and TI are closely related to the identification of the peri-urban area. Jointly applying socio-economic indicators and taxi intensity (TI) (Figure 8e) produces results similar to those in Figure 8f, with a Jaccard index of 83.06% for peri-urban areas (Table 4). Socio-economic attributes, especially NLI and NLF, rank at the top in SHAP values for the classification of all three areas, while PD ranks high in the output for urban areas (Figure 7). By contrast, eliminating socio-economic indicators from the analysis (Figure 8a,d) tends to produce results with smaller urban areas and more scattered peri-urban areas. Consistently, Table 4 shows that scenarios without socio-economic indicators have low Jaccard indices for peri-urban areas, with values of 46.85% in scenario (a) and 52.42% in scenario (d).

Among the physical attributes, the SHAP importance of SHDI and RD ranks low in all three cases. BLD has a moderately high association with the delineation of both peri-urban and rural areas (Figure 7). Peri-urban areas are fragmented and interwoven with urban areas (Figure 8a). It may be that infrastructure has been constructed and that land use has been transformed to built-up land in the periphery (as shown in Figure 4a, where both urban and peri-urban areas have high BLD values), making it difficult to separate urban and peri-urban areas merely based on physical patterns. In this regard, physical indicators may be less relevant than socio-economic indicators in identifying peri-urban areas (e.g., Figure 8a,b). This observation is in line with the claim that “peri-urban demarcation from only satellite images is biased” in Sahana et al. (2023, p. 21) [1]. However, a comparison of Figure 8a,e,f suggests that physical attributes combined with other indicators may help in refining boundaries between peri-urban and rural areas.

5. Discussions

5.1. The Effect of TI

Taxi trip intensity (TI) ranks second in influence for identifying urban areas and third for peri-urban areas (Figure 7a,b). The low impact of taxi ODs in classifying rural areas can be explained by the sharp decrease in taxi OD points in rural areas. Moreover, a comparison of Figure 8a–c with Figure 8d–f suggests that TI may be complementary to other indicators, especially when differentiating urban areas from peri-urban areas. This may be because human mobility reflects socio-economic activity and travel demand inside the city, and thus could help in highlighting areas not picked up by nighttime light imagery and POI density.
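Rankings of this kind can be sketched as follows, assuming that feature importance for one output class is summarized as the mean absolute SHAP value per feature. That is the usual convention for SHAP summary plots, but it is not stated in this section. The function name `rank_by_mean_abs_shap` is hypothetical, and the SHAP values below are invented placeholders, not results from the study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples."""
    n_samples = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder SHAP values for one output class (rows = grid cells).
# These numbers are invented for illustration and are NOT from the study.
feature_names = ["NLI", "PD", "TI", "SHDI"]
shap_rows = [
    [0.30, -0.10, 0.20, 0.01],
    [-0.25, 0.15, -0.10, -0.02],
    [0.35, 0.05, 0.15, 0.00],
]

ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because absolute values are averaged, a feature that pushes some cells strongly towards a class and others strongly away from it still ranks high. The direction of the effect has to be read from per-sample plots.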

This is further illustrated by a comparison between the Qingshan and Jiangxia areas (Figure 9). The Qingshan area covers many old neighborhood communities with few commercial POIs and is connected with other parts of the urban core through commuting trips [61]. By contrast, the Jiangxia area represents an outlying town with solid physical infrastructure (BLD = 0.597, RD = 0.559) and a local economy (NLI = 0.209, PD = 0.331). Without TI data, the model may identify much of Qingshan as peri-urban and parts of Jiangxia as urban (Figure 9). In this case, the distribution of taxi trajectories may help in refining boundaries between peri-urban and core urban areas.

We further produce the force plot for the peri-urban output in Figure 10 to inspect the SHAP values of grids in the Qingshan and Jiangxia areas whose classification differs between Figure 9a and Figure 9b. In the force plot, the x and y axes denote the sample id/number and the output prediction values, respectively [60]. The blue and red lines show, for individual samples (grids), the negative (blue) and positive (red) impacts of features on the model prediction [45,60]. The feature values can also be annotated (as in, for example, sample #11 in Figure 10). The left 18 samples in Figure 10 are located in Jiangxia. TI shows a strong positive impact (red) on classifying these instances as peri-urban. For these samples, the values of TI are relatively low. By contrast, the right part of the figure represents the patterns of SHAP values from Qingshan. TI shows a strong negative impact (blue) on classifying these samples as peri-urban. For these areas, the values of TI are relatively high. In this regard, SHAP allows a closer look at individual predictions to explain how features have different impacts across instances.
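A force plot relies on the additive property of Shapley values: for each sample, the contributions of all features sum to the difference between that sample's prediction and a baseline prediction, so the same feature can push one sample up and another down. The sketch below computes exact Shapley values by enumerating feature coalitions, with features outside a coalition set to a baseline value. The scoring function `toy_score`, the baseline, and the two cells are hypothetical stand-ins. The study's neural network and its SHAP estimator are not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_score(v):
    """Hypothetical peri-urban score; NOT the paper's neural network."""
    nli, poi, ti = v
    return 0.3 + 0.5 * nli + 0.2 * poi - 2.0 * ti - 1.0 * nli * ti


baseline = [0.20, 0.10, 0.05]      # assumed reference values (NLI, PD, TI)
low_ti_cell = [0.25, 0.30, 0.01]   # hypothetical cell with low TI
high_ti_cell = [0.25, 0.05, 0.15]  # hypothetical cell with high TI

phi_low = shapley_values(toy_score, low_ti_cell, baseline)
phi_high = shapley_values(toy_score, high_ti_cell, baseline)
print("TI contribution, low-TI cell:", round(phi_low[2], 4))
print("TI contribution, high-TI cell:", round(phi_high[2], 4))
```

In this toy setting the TI contribution is positive for the low-TI cell and negative for the high-TI cell, mirroring the red and blue pattern described for Jiangxia and Qingshan. Exhaustive enumeration is only practical for a handful of features, so practical SHAP tools rely on approximations.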

Existing literature suggests considering human movement as an indicator of urban structure [34,35,36,62], especially as mobility positioning data become increasingly available. While socio-economic data may be sufficient for identifying the overall patterns, TI may be useful for identifying peri-urban areas in borderline cases (Figure 9).

5.2. Inclusion of Other Indicators

The results in Section 4.2 show that socio-economic indicators, i.e., NLI, NLF, and PD, have larger impacts on identifying peri-urban areas. However, we also tried other indicators, such as population density from LandScan [63], to check whether these indicators can be substituted. When physical indicators (BLD, SHDI, and RD), the mobility indicator TI, and population density are put into the model, the SHAP feature importance of population density ranks first. This may be because population density data at fine resolutions are often estimated from multi-source data including nighttime light images [64]. Furthermore, the spatial distribution of the output shows some misclassified areas within the urban area that have low population density but active socio-economic activities (e.g., urban renewal). Given such cases, socio-economic indicators may not be fully substituted by population density data.

Some studies, such as Dong et al. (2022) [55], have also discussed the use of industrial POIs in identifying the urban fringe area. A key assumption of the analysis in Dong et al. (2022, p. 3) is that the density of manufacturing industry in Beijing follows a pattern of “increasing first and then decreasing”, and that “the sudden change point of its rapid decrease can be regarded as the external fringe of the urban fringe area”. This assumption may not hold for the current study, as many POIs of manufacturing industries are still located in the urban center of Wuhan.

6. Conclusions

Although multi-source datasets have been used in classifying urban, peri-urban, and rural areas, the contribution of individual indicators has not been fully discussed. In this paper, we integrate physical spatial features, socio-economic features, and human movement distribution in a neural network framework to delineate boundaries of peri-urban areas. By applying the SHAP method, we explain the potential association between urbanization-related indicators and peri-urban boundaries.

Our results can be summarized as follows. First, among all indicators, those from the socio-economic group, such as nighttime light intensity (NLI), nighttime light fluctuation (NLF), and POI density (PD), are more relevant for identifying peri-urban areas. Built-up land density (BLD) also has a high impact on identifying rural areas, while other indicators of the physical environment, such as the Shannon diversity index (SHDI) from LULC images and road density (RD), are less associated with the results, as indicated by their comparatively lower SHAP importance. Second, taxi intensity (TI), as a typical human movement dataset, may complement NLI and PD, especially in residential communities at the edge of the urban area that have fewer POIs but high levels of travel/commuting behavior (e.g., Figure 9). For example, taxi travel intensity may identify peri-urban areas that are physically more developed but socio-economically less integrated with the urban core. In this sense, TI may help in refining boundaries between urban and peri-urban areas.

The limitations can be summarized as follows. First, the NN model does not take spatial continuity into account, thus potentially producing more scattered results. Second, all taxi movements are constrained to the road network [36]. Therefore, it may be difficult for taxi trip data to characterize human mobility outside urban areas. Combining taxi data with other human movement data, such as mobile phone data [62] or the rural–urban migration and commuting distance used in Brown et al. (2015) [12], may be considered. Finally, while our current observations and conclusions may be useful for cities similar to Wuhan with rapid socio-economic and physical changes, future work may cover other cities and explore relationships between indicators and identified peri-urban areas under different contexts.

Author Contributions

Y.Z.: Conceptualization, methodology, formal analysis, writing—original draft, reviewing and editing. X.S.: Data curation, investigation, visualization, writing—original draft. X.L.: Conceptualization, methodology, formal analysis, writing—reviewing and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 42001399) and the Hong Kong Research Grants Council (grant number 17600918).

Data Availability Statement

Sources of datasets used in this study are listed in the manuscript in Table 1.

Acknowledgments

The authors thank the journal editor and five anonymous reviewers for their valuable comments. We thank a local expert and Mian Huang (Wuhan University) for their technical assistance and comments. We would also like to thank Hong Chen (Huazhong University of Science and Technology) for generously sharing figure outputs from Ding and Chen (2022), Zhixiang Fang (Wuhan University) for providing the taxi OD dataset, as well as Mingshu Wang (Glasgow University) and Xiaohu Zhang (The University of Hong Kong) for their comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sahana, M.; Ravetz, J.; Patel, P.P.; Dadashpoor, H.; Follmann, A. Where Is the Peri-Urban? A Systematic Review of Peri-Urban Research and Approaches for Its Identification and Demarcation Worldwide. Remote Sens. 2023, 15, 1316.
2. Van Vliet, J. Direct and Indirect Loss of Natural Area from Urban Expansion. Nat. Sustain. 2019, 2, 755–763.
3. Kleemann, J.; Inkoom, J.N.; Thiel, M.; Shankar, S.; Lautenbach, S.; Fürst, C. Peri-Urban Land Use Pattern and Its Relation to Land Use Planning in Ghana, West Africa. Landsc. Urban Plan. 2017, 165, 280–294.
4. La Rosa, D.; Geneletti, D.; Spyra, M.; Albert, C. Special Issue on Sustainable Planning Approaches for Urban Peripheries. Landsc. Urban Plan. 2017, 165, 172–176.
5. Simon, D. Urban Environments: Issues on the Peri-Urban Fringe. Annu. Rev. Environ. Resour. 2008, 33, 167–185.
6. Amirinejad, G.; Donehue, P.; Baker, D. Ambiguity at the Peri-Urban Interface in Australia. Land Use Policy 2018, 78, 472–480.
7. Ahani, S.; Dadashpoor, H. A Review of Domains, Approaches, Methods and Indicators in Peri-Urbanization Literature. Habitat Int. 2021, 114, 102387.
8. Mortoja, M.G.; Yigitcanlar, T. Why Is Determining Peri-Urban Area Boundaries Critical for Sustainable Urban Development? J. Environ. Plan. Manag. 2023, 66, 67–96.
9. Fang, L.; Wang, Y. Multi-Disciplinary Determination of the Rural/Urban Boundary: A Case Study in Xi’an, China. Sustainability 2018, 10, 2632.
10. Mortoja, M.G.; Yigitcanlar, T.; Mayere, S. What Is the Most Suitable Methodological Approach to Demarcate Peri-Urban Areas? A Systematic Review of the Literature. Land Use Policy 2020, 95, 104601.
11. Saastamoinen, U.; Vikström, S.; Helminen, V.; Lyytimäki, J.; Nurmio, K.; Nyberg, E.; Rantala, S. The Limits of Spatial Data? Sense-Making within the Development and Different Uses of Finnish Urban-Rural Classification. Land Use Policy 2022, 120, 106231.
12. Brown, D.L.; Champion, T.; Coombes, M.; Wymer, C. The Migration-Commuting Nexus in Rural England. A Longitudinal Analysis. J. Rural Stud. 2015, 41, 118–128.
13. Gonçalves, J.; Gomes, M.C.; Ezequiel, S.; Moreira, F.; Loupa-Ramos, I. Differentiating Peri-Urban Areas: A Transdisciplinary Approach towards a Typology. Land Use Policy 2017, 63, 331–341.
14. Liu, Y.; Liu, X.; Gao, S.; Gong, L.; Kang, C.; Zhi, Y.; Chi, G.; Shi, L. Social Sensing: A New Approach to Understanding Our Socioeconomic Environments. Ann. Assoc. Am. Geogr. 2015, 105, 512–530.
15. Peng, J.; Hu, Y.; Liu, Y.; Ma, J.; Zhao, S. A New Approach for Urban-Rural Fringe Identification: Integrating Impervious Surface Area and Spatial Continuous Wavelet Transform. Landsc. Urban Plan. 2018, 175, 72–79.
16. Ding, W.; Chen, H. Urban-Rural Fringe Identification and Spatial Form Transformation during Rapid Urbanization: A Case Study in Wuhan, China. Build. Environ. 2022, 226, 109697.
17. Feng, Z.; Peng, J.; Wu, J. Using DMSP/OLS Nighttime Light Data and K-Means Method to Identify Urban-Rural Fringe of Megacities. Habitat Int. 2020, 103, 102227.
18. Zhu, J.; Lang, Z.; Yang, J.; Wang, M.; Zheng, J.; Na, J. Integrating Spatial Heterogeneity to Identify the Urban Fringe Area Based on NPP/VIIRS Nighttime Light Data and Dual Spatial Clustering. Remote Sens. 2022, 14, 6126.
19. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4768–4777.
20. Pryor, R.J. Defining the Rural-Urban Fringe. Soc. Forces 1968, 47, 202.
21. Dadashpoor, H.; Ahani, S. A Conceptual Typology of the Spatial Territories of the Peripheral Areas of Metropolises. Habitat Int. 2019, 90, 102015.
22. Allen, A. Environmental Planning and Management of the Peri-Urban Interface: Perspectives on an Emerging Field. Environ. Urban. 2003, 15, 135–148.
23. Vizzari, M.; Sigura, M. Landscape Sequences along the Urban-Rural-Natural Gradient: A Novel Geospatial Approach for Identification and Analysis. Landsc. Urban Plan. 2015, 140, 42–55.
24. Ortiz-Báez, P.; Cabrera-Barona, P.; Bogaert, J. Characterizing Landscape Patterns in Urban-Rural Interfaces. J. Urban Manag. 2021, 10, 46–56.
25. Yang, J.; Dong, J.; Sun, Y.; Zhu, J.; Huang, Y.; Yang, S. A Constraint-Based Approach for Identifying the Urban-Rural Fringe of Polycentric Cities Using Multi-Sourced Data. Int. J. Geogr. Inf. Sci. 2021, 36, 114–136.
26. He, X.; Yuan, X.D.; Zhang, D.H.; Zhang, R.R.; Li, M.; Zhou, C.S. Delineation of Urban Agglomeration Boundary Based on Multisource Big Data Fusion-A Case Study of Guangdong-Hong Kong-Macao Greater Bay Area (GBA). Remote Sens. 2021, 13, 1801.
27. Huang, J.; Zhou, Q.; Wu, Z. Delineating Urban Fringe Area by Land Cover Information Entropy-An Empirical Study of Guangzhou-Foshan Metropolitan Area, China. ISPRS Int. J. Geo-Inf. 2016, 5, 59.
28. Cao, C.; De Luccia, F.J.; Xiong, X.; Wolfe, R.; Weng, F. Early On-Orbit Performance of the Visible Infrared Imaging Radiometer Suite Onboard the Suomi National Polar-Orbiting Partnership (S-NPP) Satellite. IEEE Trans. Geosci. Remote Sens. 2014, 52, 1142–1156.
29. Doll, C.N.H.; Muller, J.P.; Morley, J.G. Mapping Regional Economic Activity from Night-Time Light Satellite Imagery. Ecol. Econ. 2006, 57, 75–92.
30. Shi, K.; Yu, B.; Huang, Y.; Hu, Y.; Yin, B.; Chen, Z.; Chen, L.; Wu, J. Evaluating the Ability of NPP-VIIRS Nighttime Light Data to Estimate the Gross Domestic Product and the Electric Power Consumption of China at Multiple Scales: A Comparison with DMSP-OLS Data. Remote Sens. 2014, 6, 1705–1724.
31. Sutton, P.C.; Goetz, A.R.; Fildes, S.; Forster, C.; Ghosh, T. Darkness on the Edge of Town: Mapping Urban and Peri-Urban Australia Using Nighttime Satellite Imagery. Prof. Geogr. 2010, 62, 119–133.
32. He, C.; Wei, A.; Shi, P.; Zhang, Q.; Zhao, Y. Detecting Land-Use/Land-Cover Change in Rural-Urban Fringe Areas Using Extended Change-Vector Analysis. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 572–585.
33. Li, G.; Cao, Y.; He, Z.; He, J.; Cao, Y.; Wang, J.; Fang, X. Understanding the Diversity of Urban–Rural Fringe Development in a Fast Urbanizing Region of China. Remote Sens. 2021, 13, 2373.
34. Long, Y.; Han, H.; Tu, Y.; Shu, X. Evaluating the Effectiveness of Urban Growth Boundaries Using Human Mobility and Activity Records. Cities 2015, 46, 76–84.
35. Zhong, C.; Arisona, S.M.; Huang, X.; Batty, M.; Schmitt, G. Detecting the Dynamics of Urban Structure through Spatial Network Analysis. Int. J. Geogr. Inf. Sci. 2014, 28, 2178–2199.
36. Zhou, Y.; Fang, Z.; Thill, J.C.; Li, Q.; Li, Y. Functionally Critical Locations in an Urban Transportation Network: Identification and Space-Time Analysis Using Taxi Trajectories. Comput. Environ. Urban Syst. 2015, 52, 34–47.
37. Liu, Y.; Wang, F.; Xiao, Y.; Gao, S. Urban Land Uses and Traffic “Source-Sink Areas”: Evidence from GPS-Enabled Taxi Data in Shanghai. Landsc. Urban Plan. 2012, 106, 73–87.
38. Luo, R.; Liu, X.; Wu, Z.; Chen, Y. Delineation of the Urban Fringe Using Multi-Indicators and Deep Neural Network. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 7411–7414.
39. Tsagkis, P.; Bakogiannis, E.; Nikitas, A. Analysing Urban Growth Using Machine Learning and Open Data: An Artificial Neural Network Modelled Case Study of Five Greek Cities. Sustain. Cities Soc. 2023, 89, 104337.
40. Wang, J.; Biljecki, F. Unsupervised Machine Learning in Urban Studies: A Systematic Review of Applications. Cities 2022, 129, 103925.
41. Tayyebi, A.; Pijanowski, B.C.; Tayyebi, A.H. An Urban Growth Boundary Model Using Neural Networks, GIS and Radial Parameterization: An Application to Tehran, Iran. Landsc. Urban Plan. 2011, 100, 35–44.
42. Peng, J.; Liu, Q.; Blaschke, T.; Zhang, Z.; Liu, Y.; Hu, Y.; Wang, M.; Xu, Z.; Wu, J. Integrating Land Development Size, Pattern, and Density to Identify Urban–Rural Fringe in a Metropolitan Region. Landsc. Ecol. 2020, 35, 2045–2059.
43. De Veaux, R.D.; Ungar, L.H. Multicollinearity: A Tale of Two Nonparametric Regressions. In Proceedings of the Selecting Models from Data; Cheeseman, P., Oldford, R.W., Eds.; Springer: New York, NY, USA, 1994; pp. 393–402.
44. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 2021, 23, 18.
45. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable; Independently published. 2022. ISBN 9798411463330. Available online: https://christophm.github.io/interpretable-ml-book (accessed on 17 August 2023).
46. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
47. Goldstein, A.; Kapelner, A.; Bleich, J.; Pitkin, E. Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. J. Comput. Graph. Stat. 2015, 24, 44–65.
48. Fisher, A.; Rudin, C.; Dominici, F. All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. J. Mach. Learn. Res. JMLR 2019, 20, 1–81.
49. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
50. Ribeiro, M.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144.
51. Shapley, L.S. A Value for N-Person Games; RAND Corporation: Santa Monica, CA, USA, 1952.
52. Long, Y.; Luo, S.; Liu, X.; Luo, T.; Liu, X. Research on the Dynamic Evolution of the Landscape Pattern in the Urban Fringe Area of Wuhan from 2000 to 2020. ISPRS Int. J. Geo-Inf. 2022, 11, 483.
53. Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable Classification with Limited Sample: Transferring a 30-m Resolution Sample Set Collected in 2015 to Mapping 10-m Resolution Global Land Cover in 2017. Sci. Bull. 2019, 64, 370–373.
54. Xu, X. Annual Night Light Datasets of China. Resources and Environmental Science Data Registration and Publishing System. Available online: https://www.resdc.cn/DOI/DOI.aspx?DOIID=105 (accessed on 9 September 2022).
55. Dong, Q.; Qu, S.; Qin, J.; Yi, D.; Liu, Y.; Zhang, J. A Method to Identify Urban Fringe Area Based on the Industry Density of POI. ISPRS Int. J. Geo-Inf. 2022, 11, 128.
56. Zeng, Z.; Song, B.; Zheng, X.; Li, H. Changes of Traffic Network and Urban Transformation: A Case Study of Xi’an City, China. Land Use Policy 2019, 88, 104195.
57. Chollet, F. Keras. GitHub Repository 2015. Available online: https://github.com/fchollet/keras (accessed on 20 October 2022).
58. Aas, K.; Jullum, M.; Løland, A. Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values. Artif. Intell. 2021, 298, 103502.
59. Sellereite, N.; Jullum, M.; Redelmeier, A.; Lachmann, J. Shapr: Prediction Explanation with Dependence-Aware Shapley Values. 2023. Available online: https://github.com/NorskRegnesentral/shapr (accessed on 17 August 2023).
60. Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.-W.; Newman, S.-F.; Kim, J.; et al. Explainable Machine-Learning Predictions for the Prevention of Hypoxaemia during Surgery. Nat. Biomed. Eng. 2018, 2, 749–760.
61. Zhao, Y.; Zhu, X.; Guo, W.; She, B.; Yue, H.; Li, M. Exploring the Weekly Travel Patterns of Private Vehicles Using Automatic Vehicle Identification Data: A Case Study of Wuhan, China. Sustainability 2019, 11, 6152.
62. Tu, W.; Cao, J.; Yue, Y.; Shaw, S.L.; Zhou, M.; Wang, Z.; Chang, X.; Xu, Y.; Li, Q. Coupling Mobile Phone and Social Media Data: A New Approach to Understanding Urban Functions and Diurnal Patterns. Int. J. Geogr. Inf. Sci. 2017, 31, 2331–2358.
63. Rose, A.; McKee, J.; Urban, M.; Bright, E.; Sims, K. LandScan Global 2018 (Version 2018); Oak Ridge National Laboratory: Oak Ridge, TN, USA, 2019.
64. Dobson, J.; Bright, E.; Coleman, P.; Durfee, R.; Worley, B. LandScan: A Global Population Database for Estimating Populations at Risk. Photogramm. Eng. Remote Sens. 2000, 66, 849–857.

Figure 1. Framework of the methodology.

Figure 2. Location and map of the study area, Wuhan.

Figure 3. Frame diagram of neural network model.

Figure 4. Spatial distribution of different indicators.

Figure 5. Distribution of three areas from the NN model in Wuhan.

Figure 6. The overlay of identified peri-urban areas and Wuhan comprehensive plan. Areas ①–④ indicate corresponding map locations of individual images.

Figure 7. The importance of different urban indicators with SHAP values.

Figure 8. Results of scenarios using different combinations of indicators.

Figure 9. Comparison of delineation with/without human movement data.

Figure 10. Force plot of prediction outputs (classification as peri-urban) for observations of interest in Qingshan and Jiangxia.

Table 1. Description of datasets.

Features	Datasets	Indicators	Description and Sources
Spatial features	LULC images	BLD (Built-up land density); SHDI (Shannon diversity index)	10-m resolution global land cover (GLC10), http://data.ess.tsinghua.edu.cn/ [53], accessed on 9 November 2022.
Spatial features	Road network	RD (Road network density)	2018 Wuhan Navigation Road Network Data
Socio-economic	Nighttime light	NLI (Nighttime light intensity); NLF (Nighttime light fluctuation)	Annual Night Light Datasets for China based on NPP-VIIRS, https://www.resdc.cn/DOI/DOI.aspx?DOIID=105 [54], accessed on 9 September 2022.
Socio-economic	POI	PD (POI overall density); SPD (Service POI density)	Gaode navigation map POI data, https://www.amap.com/, accessed on 25 September 2018.
Human movement	Taxi trip OD	TI (Taxi trip OD intensity)	Taxi data of 5 days in 2018

Table 2. The average value of indicators in different classes.

Identified Areas	BLD	SHDI	RD	NLI	NLF	PD	SPD	TI
Urban	0.674	0.322	0.729	0.535	0.114	0.473	0.468	0.147
Peri-urban	0.298	0.406	0.442	0.134	0.117	0.108	0.047	0.012
Rural	0.050	0.323	0.058	0.009	0.001	0.001	0.001	0.001

Table 3. Comparison with literature that also identifies peri-urban areas in Wuhan.

Study	Study Year	Urban: Terms Used in Corresponding Publications	Urban: Area (km²)	Peri-Urban: Terms Used in Corresponding Publications	Peri-Urban: Area (km²)
This study	2018	Urban area	445	Peri-urban area	2649
Ding and Chen (2022) [16] (k = 4 in K-means)	2020	“Urban core” (p. 2)	425	“Near urban core” and “Near rural area” (p. 2)	2247
Long et al. (2022) [52]	2020	“City center district” (p. 1)	744	“Urban fringe areas” (p. 1)	1220

Table 4. Comparisons between different scenarios in Figure 8. Values are Jaccard indices of scenarios (a)–(e) compared to scenario (f).

Landscape	(a)	(b)	(c)	(d)	(e)
Urban	43.63%	71.00%	80.32%	57.28%	74.93%
Peri-urban	46.85%	81.73%	92.45%	52.42%	83.06%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
