#### *2.2. Research Methods*

#### 2.2.1. R Statistics

The R statistic was first proposed by Clark and introduced into geographical research by Dacey in 1960 [52]. The core idea is to compare the observed mean distance between each point and its nearest neighbor with the mean nearest-neighbor distance expected under a random distribution. This comparison reveals whether the observed point pattern is more aggregated or more dispersed than a random pattern [53]. The theoretical formula is as follows:

$$R = \frac{r_{\text{obs}}}{r_{\text{exp}}}, \quad r_{\text{obs}} = \frac{\sum_{i=1}^{n} d_i}{n}, \quad r_{\text{exp}} = 0.5\sqrt{\frac{A}{n}}$$

where *r*<sub>obs</sub> is the observed mean nearest-neighbor distance; *r*<sub>exp</sub> is the expected mean nearest-neighbor distance under a random pattern; *d<sub>i</sub>* is the nearest-neighbor distance of rural residential area *i*; *n* is the total number of rural residential areas; and *A* is the area of the study region. If *R* > 1, the observed pattern is more dispersed than a random pattern; if *R* < 1, the observed pattern is more clustered than a random pattern.
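The ratio can be illustrated with a minimal sketch, assuming planar coordinates and a brute-force neighbor search. The function name `nearest_neighbor_ratio` and both sample point sets are hypothetical and are not taken from the study data.

```python
import math

def nearest_neighbor_ratio(points, area):
    """Return (R, r_obs, r_exp) for a list of (x, y) points in a region of given area."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    # d_i: distance from point i to its nearest neighbor (brute force, O(n^2))
    nearest = []
    for i, (xi, yi) in enumerate(points):
        d_i = min(math.hypot(xi - xj, yi - yj)
                  for j, (xj, yj) in enumerate(points) if j != i)
        nearest.append(d_i)
    r_obs = sum(nearest) / n             # observed mean nearest-neighbor distance
    r_exp = 0.5 * math.sqrt(area / n)    # expected mean distance under randomness
    return r_obs / r_exp, r_obs, r_exp

# Hypothetical 10 x 10 study area: a regular layout and a tightly packed layout
dispersed = [(2.5, 2.5), (2.5, 7.5), (7.5, 2.5), (7.5, 7.5)]
clustered = [(1.0, 1.0), (1.0, 1.5), (1.5, 1.0), (1.5, 1.5)]
r_dispersed, _, _ = nearest_neighbor_ratio(dispersed, area=100.0)
r_clustered, _, _ = nearest_neighbor_ratio(clustered, area=100.0)
print(r_dispersed, r_clustered)  # R > 1 for the regular layout, R < 1 for the packed one
```

For large point sets a spatial index would replace the quadratic search; the brute-force loop is used here only to keep the correspondence with the formula visible.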

#### 2.2.2. Kernel Density Analysis

Kernel density estimation (KDE) is a nonparametric method for estimating a probability density function, and it is widely used in spatial analysis to study how features are distributed across a region. The density function of the research object is estimated first, and density values are then calculated from it. In theory, the higher the density value, the denser the distribution of the geographic object. The calculation formula is as follows [54,55]:

$$f(x, y) = \frac{1}{nh^2} \sum_{i=1}^{n} K\left(\frac{d_i}{h}\right)$$

where *f*(*x*, *y*) is the density estimate at location (*x*, *y*); *n* is the number of observations; *h* is the smoothing parameter (bandwidth); *K* is the kernel function; and *d<sub>i</sub>* is the distance between location (*x*, *y*) and the *i*th observation. Kernel density estimation was carried out in ArcGIS 10.2.
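As a rough illustration of how a density value is obtained at one location, the sketch below evaluates the standard form of the estimator, (1/(*nh*²)) Σ *K*(*d<sub>i</sub>*/*h*). The quartic kernel is an assumption made for this example, since the text does not state which kernel was used, and the sample points and bandwidth are hypothetical.

```python
import math

def quartic_kernel(u):
    """2-D quartic (biweight) kernel; integrates to 1 over the unit disk (assumed choice)."""
    return (3.0 / math.pi) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0

def kernel_density(x, y, points, h, kernel=quartic_kernel):
    """Density estimate at (x, y) from observed points with bandwidth h."""
    n = len(points)
    # d_i / h: scaled distance from (x, y) to each observation
    total = sum(kernel(math.hypot(x - px, y - py) / h) for px, py in points)
    return total / (n * h * h)

# Hypothetical settlement centroids (arbitrary planar units) and bandwidth
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
density_near = kernel_density(0.0, 0.0, points, h=2.0)    # three points within the bandwidth
density_far = kernel_density(20.0, 20.0, points, h=2.0)   # no points within the bandwidth
print(density_near, density_far)
```

Evaluating `kernel_density` at every cell center of a raster grid would give a surface comparable in spirit to a GIS kernel density map, although a GIS package may apply its own kernel and scaling.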

#### 2.2.3. Hot Spot Detection Analysis

Local spatial autocorrelation analysis is used to identify possible agglomeration patterns in local space. It assesses the spatial correlation between the rural settlement density at a location and the settlement density in the surrounding areas, thereby revealing spatial clustering or dispersion. The theoretical model of the *G<sub>i</sub>*\* index is as follows [56,57]:

$$G_i^*(d) = \frac{\sum_{j=1}^{n} W_{ij}(d)\, X_j}{\sum_{j=1}^{n} X_j}$$

where *W<sub>ij</sub>*(*d*) is the spatial weight matrix, which equals 1 for spatially adjacent units and 0 for non-adjacent units, and *X<sub>j</sub>* is the attribute value of spatial unit *j*. If the standardized *G<sub>i</sub>*\* value is positive and significant, the rural settlement density around the location is concentrated in a high-value cluster. Conversely, if the standardized *G<sub>i</sub>*\* value is negative and significant, the rural settlement density around the location is low.
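The following sketch computes the ratio form given above together with the commonly used standardized Z score of the Getis-Ord statistic. It assumes binary distance-band weights in which each location counts as its own neighbor. The function name, the six-cell transect, and the distance threshold are hypothetical.

```python
import math

def getis_ord_gi_star(coords, values, d):
    """Return (raw, z): the ratio form of Gi*(d) and its standardized Z score per location."""
    n = len(values)
    total = sum(values)
    mean = total / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    raw, z = [], []
    for xi, yi in coords:
        # Binary weights: 1 within distance d (including the location itself), else 0
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= d else 0.0 for xj, yj in coords]
        wx = sum(wj * xj for wj, xj in zip(w, values))
        sw = sum(w)
        raw.append(wx / total)
        var_term = (n * sum(wj * wj for wj in w) - sw * sw) / (n - 1)
        denom = s * math.sqrt(var_term)
        # Z is undefined when every unit is a neighbor or all values are equal
        z.append((wx - mean * sw) / denom if denom > 0 else float("nan"))
    return raw, z

# Hypothetical transect: high values at one end, low values at the other
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [10, 9, 8, 1, 1, 1]
raw, z = getis_ord_gi_star(coords, values, d=1.0)
print(raw[0], z[0], z[5])  # z > 0 at the high-value end, z < 0 at the low-value end
```

The ratio form is always non-negative for non-negative data, so the sign used to separate hot spots from cold spots comes from the standardized score.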

#### 2.2.4. Semivariogram

The form of rural settlements varies with the location and orientation of villages and follows certain rules of spatial differentiation, so it can be treated as a regionalized variable. The semivariogram is an effective tool for describing the spatial variation and spatial structure of regionalized variables. In this paper, a semivariogram based on the landscape shape index (LSI) is used to explore the distribution characteristics of rural settlement morphology. The theoretical formula is as follows [58,59]:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[Z(x_i) - Z(x_i + h)\right]^2$$

The semivariogram is generally represented by a variogram plot (Figure 3), which graphs the semivariogram value *γ*(*h*) against the lag distance *h*. In the formula, *Z*(*x<sub>i</sub>*) and *Z*(*x<sub>i</sub>* + *h*) are the values of the regionalized variable at locations *x<sub>i</sub>* and *x<sub>i</sub>* + *h*, and *N*(*h*) is the number of sample pairs separated by *h*. The semivariogram is defined under the condition that the regionalized variable satisfies the stationarity and intrinsic hypotheses. As the semivariogram value increases, spatial autocorrelation decreases. The lag distance *h* is the most important characteristic of the variogram plot. Another important characteristic is direction, that is, whether the variation is isotropic or anisotropic. Here, *C*<sub>0</sub> is the nugget, which represents discontinuous variation at scales smaller than the observation scale; *C* is the structural variance (partial sill); *C* + *C*<sub>0</sub> is the sill, the stable value that the semivariogram approaches as the lag distance increases to a certain scale; and *a* is the range, the lag distance at which the semivariogram reaches the sill. Commonly used fitting models include the spherical, exponential, Gaussian, power exponential, and logarithmic models.
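A minimal sketch of the experimental semivariogram is given below, assuming that point pairs are grouped into the nearest lag class. It also includes one common parameterization of the Gaussian model; other texts scale the exponent differently. All names, the four-point transect, and the model parameters are hypothetical.

```python
import math

def empirical_semivariogram(coords, values, lag, n_lags):
    """Return {lag distance: gamma} using pairs grouped into the nearest lag class."""
    sums = [0.0] * (n_lags + 1)
    counts = [0] * (n_lags + 1)   # counts[k] plays the role of N(h) for class k
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            k = int(dist / lag + 0.5)          # index of the nearest lag class
            if 1 <= k <= n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return {k * lag: sums[k] / (2 * counts[k])
            for k in range(1, n_lags + 1) if counts[k] > 0}

def gaussian_model(h, nugget, partial_sill, a):
    """Gaussian model: C0 + C * (1 - exp(-h^2 / a^2)) for h > 0, and 0 at h = 0."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-(h * h) / (a * a)))

# Hypothetical transect with a linear trend in the attribute
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [1.0, 2.0, 3.0, 4.0]
gamma = empirical_semivariogram(coords, values, lag=1.0, n_lags=3)
print(gamma)

# The fitted curve starts near the nugget and levels off at the sill (nugget + partial sill)
print(gaussian_model(0.5, 0.1, 0.9, 2.0), gaussian_model(100.0, 0.1, 0.9, 2.0))
```

In practice the model parameters would be estimated by fitting the curve to the experimental values, for example by least squares.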

**Figure 3.** Model variogram.

#### *2.3. Data Collection*

The research data comprised three parts: remote sensing image data, basic geographic data, and economic and social data.

(1) Remote sensing image data. Rural settlement data for Nanjing were derived from Google Earth high-definition remote sensing imagery from 2022 with a resolution of 30 m. The imagery was processed in ArcGIS 10.2 through geometric correction, coordinate registration, visual interpretation, and vectorization (Figure 2). These data were mainly used to analyze the distribution characteristics of rural settlements in Nanjing.

(2) Basic geographic data. DEM data with a resolution of 30 m were obtained from the Geospatial Data Cloud platform (http://www.gscloud.cn, accessed on 2 September 2022); river and traffic data were obtained from the National Geographic Information Resources Directory Service System. These data were mainly used to analyze the factors influencing rural settlement distribution in Nanjing.

(3) Economic and social data. These data were obtained from the Jiangsu Statistical Yearbook, the Nanjing Statistical Yearbook, the Nanjing National Economic and Social Development Statistical Bulletin, and other relevant materials. They were mainly used to analyze the factors influencing rural settlement distribution in Nanjing.

#### **3. Results**

#### *3.1. Distribution Characteristics*

3.1.1. Spatial Distribution Characteristics of Rural Settlements

The spatial distribution of rural settlements presented an "agglomeration" pattern in the metropolitan fringe area. In ArcGIS 10.2, the centroid of each rural settlement patch was extracted and converted to point format. The Near tool was then used to calculate the distance from each rural settlement to its nearest neighbor, from which the R statistic and the standardized Z value were computed. The R statistic of rural settlements in Nanjing was less than 1 and the standardized Z value was less than −1.96, indicating that the clustering of rural settlements was statistically significant and that rural settlements in Nanjing exhibit an "agglomeration-type" spatial pattern.
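The significance check can be sketched as follows, assuming the standard error commonly used for the average nearest neighbor statistic, SE = 0.26136 / √(*n*²/*A*). The function name and the input numbers are hypothetical and do not reproduce the Nanjing results.

```python
import math

def nearest_neighbor_z(r_obs, n, area):
    """Standardized Z value of the observed mean nearest-neighbor distance."""
    r_exp = 0.5 * math.sqrt(area / n)          # expected mean distance under randomness
    se = 0.26136 / math.sqrt(n * n / area)     # standard error of the mean distance
    return (r_obs - r_exp) / se

# Hypothetical case: 100 points in an area of 10,000 square units, observed mean distance 4.0
z_value = nearest_neighbor_z(r_obs=4.0, n=100, area=10000.0)
print(z_value)  # a value below -1.96 indicates significant clustering at the 0.05 level
```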

The density distribution of rural settlements showed a "multi-core" pattern in the metropolitan fringe area, and the high-density areas were located in agricultural counties far from the built-up area. In ArcGIS 10.2, the vector data of rural settlements in Nanjing were converted into raster data, and a density distribution map of rural settlements was generated by kernel density analysis. The density of rural settlements in Nanjing was divided into five grades by the Jenks natural breaks method: low-density area (0–6.43 units/km<sup>2</sup>), sub-low-density area (6.44–12.87 units/km<sup>2</sup>), medium-density area (12.88–19.29 units/km<sup>2</sup>), sub-high-density area (19.30–25.73 units/km<sup>2</sup>), and high-density area (25.74–32.16 units/km<sup>2</sup>). The resulting kernel density map of rural settlements in Nanjing is shown in Figure 4. As shown in Figure 4: (1) The spatial distribution of rural settlements in Nanjing generally showed a "multi-core" pattern, with density decreasing stepwise from the cores to the periphery in a typical "core-edge" structure. (2) The areas with a high density of rural settlements were distributed in Luhe and Jiangning, with density values above 20.08 units/km<sup>2</sup>. These areas are located in plains and polders, with flat terrain and rich hydrothermal resources. Agricultural production and the agricultural economy in these areas also developed rapidly, which contributed to the expansion and development of rural settlements. Medium-density areas were mainly distributed in Lishui, Gaochun, and other areas. Low-density areas were mainly distributed around the urban core area and on the periphery of the new urban area. Villages around the urban core area were strongly influenced by the city: the population urbanized locally and rural settlements gradually evolved into urban settlements, leaving few rural settlements.
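As a small illustration, the sketch below assigns a density value to one of the five grades using the class limits reported above. It does not recompute the Jenks breaks. The function and constant names are hypothetical, and values falling between two reported limits (for example 6.435) are assigned to the higher grade by assumption.

```python
from bisect import bisect_left

# Upper class limits (units/km^2) as reported for the five density grades
UPPER_BOUNDS = [6.43, 12.87, 19.29, 25.73, 32.16]
GRADES = [
    "low-density area",
    "sub-low-density area",
    "medium-density area",
    "sub-high-density area",
    "high-density area",
]

def density_grade(density):
    """Return the grade label for a kernel density value in units/km^2."""
    if density < 0 or density > UPPER_BOUNDS[-1]:
        raise ValueError("density outside the reported range 0-32.16")
    # Index of the first upper limit that is greater than or equal to the value
    return GRADES[bisect_left(UPPER_BOUNDS, density)]

print(density_grade(20.08))  # sub-high-density area
print(density_grade(6.43))   # low-density area
```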

**Figure 4.** Density distribution of rural settlements in Nanjing.

3.1.2. Scale Distribution Characteristics of Rural Settlements

The scale distribution of rural settlements showed clustered spatial autocorrelation in the metropolitan fringe area. Taking the rural settlement patch area as the analysis variable, the global G(d) index was used to detect the global agglomeration characteristics of rural settlement land scale in Nanjing. The calculated G(d) index of rural settlement scale in Nanjing in 2022 was 0.582, showing positive spatial correlation in the distribution of rural settlement size. This indicated significant high-value agglomeration in the scale distribution of rural settlements in Nanjing.

The scale distribution of rural settlements showed a pattern of "hot spot clustering in the near suburbs and cold spot clustering in the far suburbs". The hot spot detection tool was used to analyze local differences in the scale of rural settlements, and the *G<sub>i</sub>*\* statistic of rural settlement land scale was obtained for each administrative village in Nanjing. The *G<sub>i</sub>*\* scores were classified into cold and hot spots, and a hot spot map of rural settlement scale distribution was produced (Figure 5). Figure 5 showed that: (1) There was a significant spatial difference in the size distribution of rural settlements in Nanjing, with the size of rural settlements gradually decreasing as the distance from the central city increased. Overall, rural settlements were large in the near suburbs, moderate in the outer suburbs, and small in remote areas. (2) The large-scale rural settlements in Nanjing were concentrated in the suburban areas of the central urban area. Owing to the strong pull of the urban economy, rural population, capital, technology, and other production factors were drawn to the city and its suburbs, which changed the location characteristics of suburban rural settlements and, in turn, their scale. (3) The small rural settlements in Nanjing were mainly distributed in rural areas far from the built-up areas, where the radiation effect of the metropolis was limited and traditional agriculture still dominated. The lack of external driving forces and limited economic development were not conducive to settlement agglomeration, which led to the small scale of rural settlements.

**Figure 5.** Hot spots pattern of rural settlements scale in Nanjing.

3.1.3. Morphological Distribution Characteristics of Rural Settlements

The morphological distribution of rural settlements had good stability, and the spatial self-organization of the morphological distribution was strong in the metropolitan fringe area. The semivariogram was used to express the morphological distribution characteristics of rural settlements in Nanjing. The landscape shape index (LSI) of rural settlements was taken as the indicator and assigned to the geometric center of each town as attribute data. The sampling step was set to 2000 m, the experimental semivariogram was calculated, the best-fitting model was selected, and Kriging interpolation was carried out (Table 1, Figure 6). (1) In terms of the sill and nugget, the sill *C* + *C*<sub>0</sub> was 0.0402 and the nugget *C*<sub>0</sub> was 0.0378, indicating a medium degree of spatial autocorrelation. This indicated that structural factors (topography, geomorphology, and other geographical and environmental factors) and random factors (economic development, policies and systems, etc.) jointly played a role in the differentiation of rural settlements. (2) In terms of model fitting, the model selected by the least squares method was the Gaussian model, and the coefficient of determination R<sup>2</sup> reached 0.895, indicating that the distribution of rural settlements had good stability and that the spatial self-organization of rural settlements in Nanjing was strong. (3) In the Kriging interpolation fitting diagram, the *γ*(*h*) curve in each direction showed a certain regularity, indicating that the distribution pattern of rural settlement morphology was spatially autocorrelated. The spatial distribution of morphology had a distinct internal structure, showing a "bimodal" morphological distribution.



**Figure 6.** Variation function diagram of rural settlement morphology distribution in Nanjing.

Rural settlement morphology showed significant spatial differentiation in the metropolitan fringe area. To characterize the differences in rural settlement morphology in the metropolitan fringe more accurately, remote sensing images of Nanjing were interpreted and analyzed, and field visits and surveys were conducted in villages at different locations, including Qixia District, Jiangning District, Luhe District, Pukou District, Lishui District, and Gaochun District. The results showed that rural settlement morphology in the metropolitan fringe fell into four main types (Table 2, Figure 7).

**Table 2.** The morphological types and basic characteristics of rural settlements in Nanjing.


**Figure 7.** Distribution of rural settlement types in Nanjing.
