*2.2. Selection of the Variables and Data Sources at Two Scales*

At the regional scale (1 km × 1 km grid cells, 30 m spatial resolution), the landscape character was described by five variables (landform, vegetation, hydrology, soil, and geology) to capture the natural features of the railway over a large area. The five variables were graded into 45 landscape indicators, each coded with the capitalized first letter of its variable (Table 1). Because correlation between variables can affect clustering results, correlation tests were conducted before clustering [27]. Chi-squared and Lambda tests were applied to the five categorical variables; the correlations were low, so the variables were treated as relatively independent. The Digital Elevation Model (DEM, ASTER GDEM 30M) was obtained from the Geospatial Data Cloud website (http://www.gscloud.cn/search, accessed on 21 May 2022). The hydrological data were calculated based on the DEM in ArcGIS. The soil and vegetation datasets were collected from the Institute of Soil Science, Chinese Academy of Sciences. The data on landforms (2016, 1:2,700,000) and geology (2014, 1:2,700,000) were obtained from the China Geological Survey website (https://www.cgs.gov.cn/, accessed on 5 June 2022).
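As an illustration of the two association tests named above, the following minimal sketch computes the Pearson chi-squared statistic and the asymmetric Goodman-Kruskal lambda for two categorical variables recorded per grid cell. The function names and the sample indicator codes are hypothetical; the original tests were run in statistical software, not with this code.

```python
from collections import Counter


def contingency(xs, ys):
    """Count co-occurrences of category pairs across grid cells."""
    return Counter(zip(xs, ys))


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for two categorical sequences."""
    n = len(xs)
    row, col, joint = Counter(xs), Counter(ys), contingency(xs, ys)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            observed = joint.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def goodman_kruskal_lambda(xs, ys):
    """Proportional reduction in error when predicting ys from xs."""
    n = len(ys)
    best_overall = max(Counter(ys).values())
    if n == best_overall:
        return 0.0  # ys is constant, so nothing can be improved
    # For each category of xs, keep the count of its most common ys category.
    best_per_x = Counter()
    for (x, _y), count in contingency(xs, ys).items():
        best_per_x[x] = max(best_per_x[x], count)
    return (sum(best_per_x.values()) - best_overall) / (n - best_overall)


# Hypothetical indicator codes for four grid cells.
landform = ["L1", "L1", "L2", "L2"]
soil = ["S1", "S2", "S1", "S2"]
print(chi_squared(landform, soil), goodman_kruskal_lambda(landform, soil))
```

A lambda near 0 means that knowing one variable barely helps predict the other, which is the sense in which the variables are treated as relatively independent.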

**Figure 1.** Location and two-scaled boundaries of the Yunnan–Vietnam Railway (Yunnan section).

At the corridor scale (0.5 km × 0.5 km grid cells, 12.5 m spatial resolution), we focused on the natural and cultural features of the railway and its surrounding environment. The landscape character was initially described by six variables: altitude, relief, slope, aspect, land use, and heritage density. Pearson correlation analysis was applied to the five continuous variables. The correlation coefficient between slope and relief was 0.974, indicating that the two variables were highly correlated. Considering the influence of this redundancy on the railway landscape character, the relief variable was excluded. The five remaining variables were divided into 24 indicators, which were coded with Greek letters such as α and β (Table 2). The DEM data (ALOS 12.5 M DEM) were obtained from the Alaska Satellite Facility website (ASF, https://search.asf.alaska.edu/, accessed on 3 June 2022). The slope and aspect datasets were calculated from the DEM. Sentinel-2 data (10 m resolution) were obtained from the United States Geological Survey website (USGS, https://earthexplorer.usgs.gov/, accessed on 2 June 2022), and the land use data were derived from them in ENVI. A total of 300 heritage sites were identified along the railway, including industrial railway heritage, Chinese traditional villages, and various national scenic and historic areas. Heritage density was calculated using 1 km × 1 km grid cells.
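The screening step described above can be sketched as follows: compute the Pearson coefficient for every pair of continuous variables and flag the pairs whose absolute correlation exceeds a cut-off. The 0.9 threshold, the function names, and the sample values are assumptions made for illustration; the source reports only the slope and relief coefficient of 0.974.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:  # threshold is an assumed cut-off
                pairs.append((a, b, r))
    return pairs


# Hypothetical per-cell values; real inputs would come from the DEM derivatives.
cells = {
    "slope": [2.0, 10.0, 25.0, 40.0],
    "relief": [21.0, 98.0, 255.0, 395.0],
    "aspect": [350.0, 90.0, 180.0, 45.0],
}
for a, b, r in highly_correlated_pairs(cells):
    print(f"{a} and {b} are redundant (r = {r:.3f})")
```

One variable from each flagged pair would then be dropped before clustering, as was done with relief.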


**Table 1.** Variables used for landscape classification at the regional scale.

**Table 2.** Variables used for landscape classification at the corridor scale.


*2.3. Analysis Methods*

This paper adopted a methodological framework combining the holistic and parametric approaches. The framework comprised a three-stage process: (a) selection of data sources; (b) recognition of landscape character types; and (c) division and description of landscape character areas (Figure 2).

**Figure 2.** Methodological framework used to classify landscape character types and areas.

First, all data variables were imported into ArcGIS to unify the coordinate system, spatial resolution, and grid cells at each scale. A 30 m spatial resolution and 1 km × 1 km grid cells were selected for the regional scale, while a 12.5 m spatial resolution and 0.5 km × 0.5 km grid cells were selected for the corridor scale. Each variable was then assigned to the grid cells using the Extract Multi Values to Points and Spatial Join tools, producing a matrix that links the variables to the grid cells. This ensured that each grid cell carried a unique set of landscape indicators. For example, one grid cell at the regional scale could consist of the indicators L3, V5, H7, S4, and G3. The resulting matrices for the two scales were imported into SPSS 25, where standardization and correlation analysis were performed to eliminate the influence of dimensionality and to check the independence of the variables.
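The structure of the cell-by-variable matrix, and one common form of standardization, can be sketched as below. This is an illustrative stand-in for the ArcGIS and SPSS steps, not a reproduction of them: the per-variable lookups, the function names, and the choice of z-score standardization are all assumptions.

```python
import math


def build_matrix(cell_ids, layers):
    """Join per-variable lookups {cell_id: value} into one row per grid cell."""
    rows = []
    for cell in cell_ids:
        missing = [name for name, layer in layers.items() if cell not in layer]
        if missing:
            raise KeyError(f"cell {cell!r} has no value for {missing}")
        row = {"cell": cell}
        row.update({name: layer[cell] for name, layer in layers.items()})
        rows.append(row)
    return rows


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # a constant column carries no information
    return [(v - mean) / std for v in values]


# Hypothetical lookups for two grid cells at the regional scale.
layers = {
    "landform": {1: "L3", 2: "L1"},
    "vegetation": {1: "V5", 2: "V2"},
}
print(build_matrix([1, 2], layers))
print(zscore([120.0, 480.0, 900.0]))
```

Raising an error on a missing value mirrors the requirement that every grid cell has exactly one indicator per variable.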

Second, the data matrix was imported into Jupyter Notebook, and the k-prototypes clustering algorithm was implemented in Python. The landscape character types were identified from scatter plots and the spatial distribution of the clusters. Each landscape character type was represented by landscape indicator codes: an indicator X accounting for more than 60% of a landscape character type was written as "X", one accounting for 30–60% as "{X}", and one accounting for 10–30% as "(X)"; indicators accounting for less than 10% were not represented. The purpose of clustering is to divide a set of data objects into multiple clusters in such a way that the objects in one cluster are more similar to each other than to those in other clusters [28,29]. The k-prototypes algorithm originated from the k-means algorithm, which is primarily used to analyze numerical data. The k-means algorithm was then extended to the k-modes algorithm to deal with categorical data [30]. The k-prototypes algorithm integrates the k-means and k-modes algorithms and can therefore be applied to mixed numerical and categorical data [31,32]. In this paper, we selected the k-prototypes clustering algorithm as a parameterized method for identifying landscape character types because it fully accounts for the mixed attributes of the landscape variables. The objective function is as follows [33]:

$$E = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij}\, d\left(\mathbf{x}_i, \boldsymbol{u}_j\right) \tag{1}$$

where $w_{ij}$ is an element of the partition matrix $W_{n \times k}$, $\mathbf{x}_i$ ($i = 1, \ldots, n$) are the objects in the dataset, $\boldsymbol{u}_j$ ($j = 1, \ldots, k$) are the prototype observations, or representative vectors, of the clusters, and $d(\mathbf{x}_i, \boldsymbol{u}_j)$ is the degree of dissimilarity, defined below:

$$d\left(\mathbf{x}_{i},\boldsymbol{u}_{j}\right) = \sum_{m=1}^{q} \left(x_{i}^{m} - u_{j}^{m}\right)^{2} + \gamma \sum_{m=q+1}^{p} \delta\left(x_{i}^{m},u_{j}^{m}\right) \tag{2}$$

where the first term is the squared Euclidean distance over the $q$ numerical attributes and the second term is the simple matching dissimilarity over the categorical attributes (indexed from $q+1$ to $p$). Here, $\gamma$ is the weight for categorical attributes, and the simple matching dissimilarity is

$$\delta(a,b) = \begin{cases} 0 & (a=b) \\ 1 & (a \neq b) \end{cases} \tag{3}$$
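The dissimilarity in Equations (2) and (3), the resulting cost in Equation (1), and the indicator notation described earlier in this section can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, and treating the 30% and 10% boundaries as inclusive is an assumption, since the text does not state how boundary values are handled.

```python
from collections import Counter


def dissimilarity(x_num, x_cat, u_num, u_cat, gamma):
    """Squared Euclidean distance on numerical attributes plus
    gamma times the number of mismatching categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, u_num))
    categorical = sum(a != b for a, b in zip(x_cat, u_cat))
    return numeric + gamma * categorical


def assign(objects, prototypes, gamma):
    """Assign each object to its nearest prototype; return labels and cost E."""
    labels, cost = [], 0.0
    for x_num, x_cat in objects:
        dists = [dissimilarity(x_num, x_cat, u_num, u_cat, gamma)
                 for u_num, u_cat in prototypes]
        best = min(range(len(dists)), key=dists.__getitem__)
        labels.append(best)
        cost += dists[best]  # w_ij is 1 only for the chosen prototype
    return labels, cost


def indicator_notation(indicators):
    """Format the indicator shares within one cluster, largest share first."""
    total = len(indicators)
    codes = []
    for code, count in Counter(indicators).most_common():
        share = count / total
        if share > 0.6:
            codes.append(code)
        elif share >= 0.3:  # inclusive boundary is an assumption
            codes.append("{" + code + "}")
        elif share >= 0.1:  # inclusive boundary is an assumption
            codes.append("(" + code + ")")
        # shares below 10% are not represented
    return codes


print(dissimilarity([1.0, 2.0], ["a", "b"], [0.0, 2.0], ["a", "c"], gamma=0.5))
print(indicator_notation(["L3"] * 14 + ["L1"] * 5 + ["L2"]))
```

A full k-prototypes run alternates this assignment step with updating each prototype (means for numerical attributes, modes for categorical ones) until the cost stops decreasing.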

Finally, the landscape character areas were delimited by multiresolution segmentation (MRS) and manual delineation. MRS is a bottom-up region-merging technique that is commonly used for object classification and has frequently been applied to image processing and classification [34]. The control variable method was used to set the MRS parameters. We first fixed the scale and compactness parameters at 100 and 0.5, respectively, and tested shape parameters from 0 to 0.9 in turn to determine the best shape parameter, a. The scale and shape parameters were then fixed at 100 and a, respectively, and compactness parameters from 0 to 1 were tested in turn to determine the best compactness parameter, b. With the shape and compactness parameters established, the scale parameter, c, was determined from the peak value given by the Estimation of Scale Parameter (ESP2) plug-in. In this way, the segmentation parameters a, b, and c were obtained. As the results were always over-segmented, manual delineation was used to adjust them.
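The control variable procedure for the shape and compactness parameters amounts to a one-factor-at-a-time search, sketched below. The `score` callable is hypothetical: it stands in for whatever judgment of segmentation quality is applied to each run, which the text does not specify. The 0.1 step size is also an assumption, and the scale parameter is left to the ESP2 step and is not searched here.

```python
def sweep(candidates, fixed, name, score):
    """Vary one parameter while holding the others fixed; return the best value."""
    return max(candidates, key=lambda value: score({**fixed, name: value}))


def tune_shape_then_compactness(score, scale=100, compactness=0.5):
    """One-factor-at-a-time search for shape (a), then compactness (b)."""
    shapes = [round(0.1 * i, 1) for i in range(10)]          # 0.0 to 0.9
    compactnesses = [round(0.1 * i, 1) for i in range(11)]   # 0.0 to 1.0
    # Step 1: fix scale and compactness, search shape.
    a = sweep(shapes, {"scale": scale, "compactness": compactness}, "shape", score)
    # Step 2: fix scale and the chosen shape, search compactness.
    b = sweep(compactnesses, {"scale": scale, "shape": a}, "compactness", score)
    return a, b


def toy_score(params):
    """Toy quality measure, used only to make the sketch runnable."""
    return -((params["shape"] - 0.3) ** 2 + (params["compactness"] - 0.7) ** 2)


print(tune_shape_then_compactness(toy_score))
```

Because each parameter is searched with the others held fixed, the result depends on the starting values and is not guaranteed to be a joint optimum.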

#### **3. Results**
