1. Introduction
Over the past three decades, China’s rapid economic growth and increasing population have driven industrialization and urbanization in many cities. Pursuing a single type of land use no longer satisfies the needs of people to improve living conditions and shape their futures. As a result, land use and land management have undergone significant changes [1,2]. Accurate prediction of land use can support the government in strengthening the supervision and scientific regulation of urban land use changes [3]. Therefore, it is necessary to explore and summarize the spatial–temporal change characteristics of urban land use and to simulate future changes.
Researchers have successfully built many models to analyze and simulate Land Use and Land-Cover Change (LUCC) at national [3,4,5] and regional [6,7,8] scales. The models are referred to as numerical simulation models, spatial simulation models, or coupling models according to the simulation methods.
Numerical simulation models are based on mathematical theoretical frameworks used to predict quantitative changes in land use, including the Markov model, logistic regression, back-propagation neural networks [9], and the system dynamic model (SD) [10]. The simulation results of these models only reflect the amount of urban land and do not simulate the characteristics of the spatial distribution pattern [3,11]. Common spatial simulation models include the cellular automaton (CA) model [12], CLUE-S model [13], and future land use simulation (FLUS) model [14]. The CA model has the advantage of simplicity and the flexibility of integration with other models [15], but it relies solely on spatial data and has poor simulation accuracy. The CLUE-S model works well for simulating land use changes in small-scale areas but can be limited by the spatial resolution [11]. In general, the accuracy of spatial simulation models alone is not ideal and is improved when combined with numerical models. Thus, the coupled models for quantitative and spatial simulation have been proposed for research on land use simulation, such as the CA-Markov, CA-SD, and CA-SVM models [16,17].
The CA-Markov model is currently the most widely used simulation model for land use changes [18,19], and is capable of predicting not only the total changes in various land use types and the state transition probability matrix, but also the spatial distribution of land use through inter-neighborhood analysis [7,8]. In general, the CA-Markov model is most suitable for short-term simulation. Marwa Waseem et al. used the CA-Markov model to predict the LUCC changes in parts of the northwestern desert of Egypt in 2023, revealing the potential and merit of this method in predicting future land use changes [20]. However, the causes and trends of land use changes in such regions are poorly understood. During the process of a city’s development, there are always some driving factors that dominate the speed and direction of change in urban spatial land use [21]. The traditional CA-Markov model can only simulate the linear evolution process of land use without considering the dynamic and regulatory power of the driving factors of land use patterns.
In order to improve the validity and rationality of land use simulation, the key to improving the CA-Markov model is to transform the complex nonlinear driving factor data into land use transition rules. The logistic model can be used to calculate the regression coefficients of multiple driving factors of land use distribution and then generate regional probability maps of each type of land use. The atlas can be incorporated into the CA-Markov model to achieve multi-scenario simulations of land use changes in the future [22,23,24]. Multiple criteria evaluation (MCE) can be applied to score the suitability of drivers. Li et al. [11] predicted the land use patterns in small and medium cities based on the MCE-CA-Markov model, and the accuracy of the simulation was significantly improved compared with the traditional model. However, determining the weight of driving factors was vulnerable to subjective influence. Fu et al. improved the MCE method based on historical data so that the selection of factors, scores, and weights are determined to reflect actual historical trends, but the limitation of the model was obvious because it assumed that the spatial–temporal variation in land use is a linearly changing trend [25].
With the development of artificial intelligence, machine learning and deep learning methods have been applied to land use simulation. Zhou et al. used the random forest algorithm to explore the transition rules of the CA-Markov model and constructed the RF-CA-Markov model, which can improve prediction accuracy without significantly increasing the calculation cost [26]. Artificial neural networks (ANNs) are some of the most powerful artificial-intelligence-based tools. Since ANNs are adept at handling complex nonlinear relationships through learning, the transition suitability atlases generated by them can obtain better simulation accuracy when being integrated into the CA-Markov model [15,27].
The land use transition rules based on the CA-Markov model include not only the land use suitability atlas but also the transition probability matrix of the Markov chain, which directly determines the transition rules and quantitative change trends between each type of land use. Most existing studies calculate the transition probability matrix based on two-period historical data of land use [8,15]. This method limits the simulation to the time interval of the data period [28,29]. Based on historical statistical data in respect of land use, the linear transformation optimization Markov (LTOM) model proposed in this paper can estimate the transition probability matrix of a Markov chain by building a linear programming model. It breaks the restriction of time intervals and is less affected by data with respect to abnormal land use changes, which is conducive to improving the performance and stability of the model. Coupling the LTOM mathematical prediction model, ANN model, and CA spatial simulation model in the land use simulation and prediction can achieve satisfactory simulation accuracy under the comprehensive consideration of multiple driving factors.
In this paper, we took the Nansha District in Guangzhou City, China, as the study area and collected historical land use data, DEM, traffic and road data, arable land data, and socioeconomic data for this area. Two time periods, 2010–2018 and 2018–2020, were chosen as the research period. Taking 2014 and 2019 as base periods, respectively, we employed the ANN-CA-LTOM-coupled model to simulate the land use changes in 2018 and 2020, compared the prediction results with the observed land use data, and verified the simulation accuracy with kappa coefficients. Finally, taking 2020 as the base period, we simulated and predicted the land use changes from 2021 to 2023 and analyzed the spatial–temporal evolution of the simulated land use structure and distribution. We explored the spatial–temporal dynamic changes of the three major types of land use in Nansha District, aiming to discover the law of land use changes, so as to assist in optimizing the spatial layout of three major types of land use and achieve harmonious human–land relationships and sustainable economic, social, and ecological development.
3. Case Study
3.1. Study Area
The Nansha District (Figure 2) is located in the southernmost part of Guangzhou, Guangdong Province, China. The Nansha District includes nine townships, namely, Lanhe Town, Dongchong Town, Dagang Town, Huangge Town, Hengli Town, Nansha Street, Zhujiang Street, Longxue Street, and Wanqingsha Town. The total area of the administrative division is about 803 km², of which the sea area occupies a quarter. The overall topography of the Nansha District is gentle, and the landform types include low hills, basins, plains, and tidal flats. Currently, Guangzhou is in the deepening stage of industrialization and urbanization, and Nansha District’s economic aggregates and permanent population are growing steadily and continuously. As of November 2020, the resident population of Nansha District was 846,584, which is 3.26 times the resident population in the sixth national population census in 2010, and the average annual growth rate of the resident population from 2010 to 2020 was 12.53%, which is 3.19 times the city’s average annual growth rate (3.93%) in the same period. Driven by the dual role of the urban economy and population growth, the demand for natural resources in Nansha District is showing an apparent increase, and the total amount of construction land has a fluctuating upward trend.
3.2. Data Sources and Pre-Processing
In this study, the land use classification follows the three major categories of construction land, agricultural land, and unused land. Guangzhou carried out its first national land survey, second national land survey, and third national land survey from 1984 to 1997, 2009 to 2012, and 2018 to 2021, respectively, and has conducted annual surveys of land change. The land use data in this study are derived from the data of the third national land survey (2019–2021) and the land change survey data (2010–2018). Compared with the conventional methods of identifying, interpreting, and extracting land use information from remote sensing images, our data ensure higher accuracy and reliability of land use classification.
The driving factors include terrain and social economy. Digital elevation model (DEM) data were derived from Shuttle Radar Topography Mission (SRTM) data with a spatial resolution of 30 m × 30 m. The slope map and aspect map of the study area were generated based on DEM data. The population density data originated from the 1000 m resolution raster dataset of WorldPop (https://www.worldpop.org/, accessed on 4 August 2022), and the population density raster was resampled to 30 m × 30 m resolution using the bilinear interpolation method. Vector data for railways and main highways originated from OpenStreetMap (https://www.openstreetmap.org/, accessed on 4 August 2022). The water area, urban, and rural data were extracted from national land survey data and change survey data. According to the Nansha statistical yearbook data, we calculated the proportion of the cultivated land area in the prime farmland protection area and the industrial output value of each street and town in the Nansha District, and then obtained the corresponding raster data.
In order to facilitate the processing and analysis of the spatial data, all the data in this study were converted into a unified geographic coordinate system (GCS_China_Geodetic_Coordinate_System_2000) and Gauss–Kruger projection (CGCS2000_3_Degree_GK_Zone_38). All the raster data were unified to 30 m spatial resolution and were extracted according to the land survey boundary of Nansha District using a mask.
3.3. Estimating the Transition Probability Matrix
We calculated and counted the area percentages of construction land, agricultural land, and unused land in the Nansha District from 2010 to 2019, and the results are listed in Table 1. It can be seen from the table that the proportion of construction land shows a fluctuating upward trend, while the proportion of agricultural land and unused land shows a fluctuating downward trend. It should be noted that the proportion of construction land decreased significantly in the period 2018–2019, while the proportion of agricultural land and unused land increased. The primary reason for this is that the data for 2019 were generated from the third national land survey, which has a somewhat different classification system compared to the land change survey data for the period 2010–2018.
According to Equation (10), a linear programming solution model with the objective function of minimizing the sum of relative errors was built. Based on the above data from 2010 to 2019 (Table 1), the state transition probability matrix of three types of land use in the Nansha district was estimated as follows:
The value of each item in the matrix is between 0 and 1, and the closer the value is to 1, the more likely it is to shift from one type to another type. The values on the main diagonal of the matrix indicate the probability that the land cover type will remain in its original state. In contrast, the values on the off-diagonal indicate the probability that the land cover type will be converted into another type. According to the matrix, it can be seen that each land use type has the highest probability of remaining unchanged, which is above 98%. Construction land is the most stable category, and the probability of conversion into agricultural land and unused land is small, which is consistent with the actual situation. The conversion probability from agricultural land into construction land is about 1%. This indicates that the increase in construction land comes at the cost of a decrease in agricultural land.
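Equation (10) is not reproduced in this section, so the following is only a generic sketch of how a transition matrix can be estimated by linear programming with a sum-of-relative-errors objective. The notation is an assumption made for illustration and may differ from the authors' formulation: $x_{t,j}$ denotes the observed share of land use type $j$ in year $t$, $p_{ij}$ the one-year probability of type $i$ converting to type $j$, and $e_{t,j}$ an auxiliary variable that bounds the absolute prediction error.

```latex
% Illustrative formulation only; symbols x, p, e are assumed, not taken from the paper.
\begin{aligned}
\min_{p,\,e}\quad & \sum_{t=0}^{T-1}\sum_{j=1}^{3} \frac{e_{t,j}}{x_{t+1,j}} \\
\text{s.t.}\quad
& -e_{t,j} \;\le\; x_{t+1,j} - \sum_{i=1}^{3} x_{t,i}\,p_{ij} \;\le\; e_{t,j},
  && \forall\, t,\ j, \\
& \sum_{j=1}^{3} p_{ij} = 1, && i = 1, 2, 3, \\
& p_{ij} \ge 0, \qquad e_{t,j} \ge 0 .
\end{aligned}
```

At an optimum each $e_{t,j}$ equals the absolute error of the one-step prediction, so the objective is the sum of relative errors over all years and land use types. Because every annual observation enters the constraints, a formulation of this kind is not tied to a single pair of survey years.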
Taking 2015 and 2019 as the base periods, respectively, we used the state transition probability matrix estimated by the LTOM model to predict the area percentages of the three major land categories in Nansha District in 2020. We then calculated the relative error and average relative error of these predictions and compared them with the 2020 proportions predicted using the state transition probability matrix estimated by the Markov model (Table 2 and Table 3). In Table 2, the average relative error of the values predicted with the LTOM-estimated matrix is only 1.93%, which is significantly lower than that of the values predicted with the Markov-estimated matrix (7.95%). Table 3 shows similar results. The LTOM model performs well with high reliability of the prediction results.
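The comparison above rests on two simple operations: projecting land use shares forward with a transition matrix, and scoring the projection by its average relative error. A minimal sketch is given below. The matrix and base-year shares are hypothetical values chosen for illustration; they are not the estimates reported in this paper.

```python
def markov_step(shares, P):
    """Advance land use shares by one period: new_j = sum_i shares_i * P[i][j]."""
    n = len(shares)
    return [sum(shares[i] * P[i][j] for i in range(n)) for j in range(n)]


def project(shares, P, steps):
    """Apply the transition matrix `steps` times to the base-period shares."""
    for _ in range(steps):
        shares = markov_step(shares, P)
    return shares


def mean_relative_error(predicted, observed):
    """Average of |predicted - observed| / observed over all land use types."""
    errors = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)


# Hypothetical one-year transition matrix (rows: from, columns: to) in the
# order construction, agricultural, unused. Each row sums to 1.
P_example = [
    [0.990, 0.005, 0.005],
    [0.010, 0.985, 0.005],
    [0.005, 0.005, 0.990],
]
base_shares = [0.28, 0.52, 0.20]        # hypothetical base-year proportions
observed_next = [0.283, 0.516, 0.201]   # hypothetical observed proportions

predicted_next = project(base_shares, P_example, steps=1)
print(predicted_next, mean_relative_error(predicted_next, observed_next))
```

Projecting several years ahead only requires a larger `steps` value, which is how a one-year matrix can serve base periods at different distances from the target year.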
3.4. Transition Suitability Atlas
Building the transition suitability atlas of land use classification is the key to implementing the ANN-CA-LTOM model for land use simulation. There is a complex relationship between the spatial distribution pattern of land use and regional site conditions. Land use is not only limited by environmental factors but is also influenced by social and economic factors. Taking into account the characteristics of the natural and social environment of the study area, we chose three natural factors (the digital elevation model (DEM), slope, and aspect), four distance factors (the distance from water (Dis_water), the distance from the city (Dis_city), the distance from villages (Dis_village), and the distance from railways/main highways (Dis_traffic)), and three socioeconomic factors (population density (Pop), industrial output value (Industry), and the proportion of farmland in the prime farmland protection area (Farmland)), as the factors for building a transition suitability atlas of land use classification (Figure 3) based on the principles of accessibility, continuity, reliability, diversity, and representativeness of the driving factors. Table 4 lists the details of all driving factors, such as the year and source of the data. We normalized the above ten driving factors and used the land use classification data of 2019 and the normalized driving factors as the input data of the ANN. The training samples were drawn by uniform sampling, and the number of hidden layers of the neural network was set to 12. After training, the probability suitability map of the spatial distribution of each land use type in 2019 was output, as shown in Figure 4. The value of suitability ranges from 0 to 1. A value closer to 1 indicates higher suitability of the specific land use in this region, and a value closer to 0 indicates lower suitability. Finally, the Collection Editor tool of the IDRISI software package was used to generate a land use transition suitability atlas from the suitability maps of the three land use types obtained from the ANN model, which was used for the transition rules of the CA model.
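The text does not state which normalization was applied to the driving factors. As an illustration only, the sketch below assumes min-max scaling to the range [0, 1], which puts factors with very different units (metres of elevation, persons per square kilometre, output value) on a common scale before they are passed to a neural network. The function name and the `nodata` marker are hypothetical.

```python
def min_max_normalize(values, nodata=None):
    """Scale raster cell values to [0, 1], leaving nodata cells unchanged.

    Assumption: min-max scaling is used here for illustration; the paper
    only states that the driving factors were normalized.
    """
    valid = [v for v in values if v != nodata]
    lo, hi = min(valid), max(valid)
    span = hi - lo
    return [
        nodata if v == nodata else ((v - lo) / span if span else 0.0)
        for v in values
    ]


# Hypothetical elevation values for a handful of cells, with -9999 as nodata.
elevation = [-9999, 0, 5, 10]
print(min_max_normalize(elevation, nodata=-9999))
```

A constant raster (zero span) is mapped to 0.0 everywhere rather than raising a division error, which is one possible convention among several.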
3.5. Simulation Result Analysis and Model Validation
Based on the raster data of three major types of land use in the Nansha District in 2014 and 2019, we used the CA-Markov model in IDRISI to import the estimated transition probability matrix of land use and the suitability atlas obtained by the ANN model. At the same time, we set the standard 5 × 5 grid-cell contiguity filter to constitute the CA model filter [19] and the number of iterations to be a multiple of the time interval between the base period and the end time.
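To make the role of the 5 × 5 contiguity filter more concrete, the sketch below computes, for one cell, the weighted share of neighbouring cells that already belong to a given land use class. The diamond-shaped kernel is assumed to be the pattern commonly described as the default 5 × 5 contiguity filter; how the software combines this neighbourhood measure with the suitability atlas internally is not covered here, and the function name is hypothetical.

```python
# Assumed diamond-shaped 5 x 5 contiguity kernel (13 cells with weight 1).
KERNEL = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]


def neighbourhood_share(grid, row, col, land_class, kernel=KERNEL):
    """Weighted share of kernel cells around (row, col) that hold land_class.

    Cells falling outside the grid are ignored, so edge cells are scored
    against the part of the kernel that overlaps the study area.
    """
    half = len(kernel) // 2
    hits = 0
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            weight = kernel[dr + half][dc + half]
            r, c = row + dr, col + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if weight == 0 or not inside:
                continue
            total += weight
            if grid[r][c] == land_class:
                hits += weight
    return hits / total


# Hypothetical 5 x 5 land use grid: 1 = construction, 2 = agricultural.
example_grid = [
    [2, 2, 2, 2, 2],
    [2, 2, 1, 2, 2],
    [2, 1, 1, 1, 2],
    [2, 2, 1, 2, 2],
    [2, 2, 2, 2, 2],
]
print(neighbourhood_share(example_grid, 2, 2, land_class=1))
```

A cell surrounded by construction land receives a high share for that class, which reflects the idea that new construction land tends to appear next to existing construction land.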
Figure 5 and Figure 6, respectively, show the observed and simulated land use distributions in the Nansha District in 2018 and 2020. We randomly and evenly selected five areas, which are distributed in Lanhe Town, Dongchong Town, Dagang Town, Huangge Town, Hengli Town, Zhujiang Street, Longxue Street, and Wanqingsha Town. The results show that the simulated results are highly similar to the spatial patches of the observed land use patterns in Parts 1 to 5. It is worth stating that the simulation results in Part 4 in Figure 5 are different from the spatial patches of the observed land use pattern. The reason for the inconsistency is that a factory has been built in Part 4 according to the government’s plan, which is difficult to simulate with the existing driving factors. Planning data should be introduced to the model in the future to avoid such inconsistencies. Once authoritative planning data have been obtained, it is straightforward to introduce them into the model of this paper as a driving factor.
To further validate the model and assess its reliability in predicting land use classification in future years, we compared the maps of the observed distributions of three major land use types in 2018 and 2020 with the simulated maps for the same periods. Then, a quantitative accuracy test and confusion matrix were used to test the consistency between the observed and simulated land use. According to the confusion matrix in Table 5, the overall accuracy between the simulated cell number and the observed cell number of the three types of land use in 2018 was 94.22% and the kappa value was 0.904565. According to the confusion matrix in Table 6, the overall accuracy between the simulated cell number and the observed cell number of the three land use types in 2020 was 97.72% and the kappa value was 0.962761. In addition, we calculated the figure of merit for the simulation results of the three land use types in 2018 and 2020 (Table 7). According to the table, the value of the figure of merit in 2018 is above 80%, and that in 2020 is above 90%. These results show that the simulation accuracy of the model is high and that the model has a high confidence level.
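The three validation measures used above can be computed from cell counts, as sketched below. Overall accuracy and Cohen's kappa follow their standard definitions for a square confusion matrix. The figure of merit is written in its common hits / (hits + misses + false alarms + wrong hits) form over all cells, which is an assumption; the paper reports it per land use type and may define it differently. All input values are hypothetical.

```python
def overall_accuracy(cm):
    """Share of cells on the main diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def cohen_kappa(cm):
    """Kappa = (p_o - p_e) / (1 - p_e), with p_e the chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(row_sums[k] * col_sums[k] for k in range(n)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


def figure_of_merit(base, observed, simulated):
    """Correctly simulated change divided by all observed or simulated change."""
    hits = misses = false_alarms = wrong_hits = 0
    for b, o, s in zip(base, observed, simulated):
        if o != b and s == o:
            hits += 1            # change observed and simulated to the same class
        elif o != b and s == b:
            misses += 1          # change observed, persistence simulated
        elif o != b:
            wrong_hits += 1      # change observed, simulated to a different class
        elif s != b:
            false_alarms += 1    # persistence observed, change simulated
    changed = hits + misses + false_alarms + wrong_hits
    return hits / changed if changed else 0.0


# Hypothetical 3-class confusion matrix (rows: observed, columns: simulated).
cm_example = [
    [90, 5, 5],
    [4, 180, 6],
    [3, 2, 95],
]
print(overall_accuracy(cm_example), cohen_kappa(cm_example))
```

Kappa discounts the agreement expected by chance, and the figure of merit ignores cells that correctly persist, so both are stricter than overall accuracy when most of the landscape does not change between the base and target years.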
3.6. Model Comparison
In order to compare the performance of our model with the CA-Markov model and CA-LTOM model, we used the distribution map of the three land use types in 2019 as the base period data and the three models to simulate the distribution of land use in Nansha District in 2020. Among them, the transition suitability atlas used in the CA-Markov model and CA-LTOM model was generated by the Markov tool in the IDRISI software. The tool generates a suitability atlas based on two periods of historical land use data without incorporating actual economic and social driving factors. In contrast, the transition suitability atlas used in the ANN-CA-LTOM model was output by the ANN model based on ten driving factors. The quantitative accuracy test and confusion matrix were used to test the consistency of land use classification between the observed and simulated results from different models in 2020. The accuracy statistics of the simulation results are listed in Table 8. According to the table, the overall simulation accuracy of the three models is high, which verifies the CA-Markov model’s excellent performance in land use simulation. Comparatively, the traditional CA-Markov model has the lowest simulation accuracy of 95.50%, with a kappa coefficient of 0.926538. The overall simulation accuracy of the CA-LTOM model is significantly improved, with a value of 97.71% and a kappa coefficient of 0.96256. From the estimated results, the overall accuracy of the ANN-CA-LTOM model built in this paper is slightly higher than that of the CA-LTOM model. Furthermore, the artificial neural network detects the potential interdependencies and captures different variables and dynamics behind land conversion [15]. Coupling the ANN therefore makes our model flexible. We will consider more practical driving factors to enhance the simulation accuracy and achieve more meaningful prediction under different incentives or limited development policies in the future.
3.7. Simulation and Prediction of Land Use Changes in the Period 2021–2023
In our study, we assume that there is no significant change in the natural and socioeconomic factors and land use policy of Nansha District from 2020 to 2023. The land use changes from 2021 to 2023 can be simulated and predicted according to the land use transition rules from 2010 to 2020. The land use simulation maps of the Nansha District from 2021 to 2023 are shown in Figure 7. In order to quantitatively analyze land use changes from 2020 to 2023, Table 9 lists the area and change of observed land use in 2020 and simulated land use types from 2021 to 2023. As shown in Table 9, during the period 2020–2023, the proportion of agricultural land is the largest, exceeding 50% of the total area of the study area, followed by the construction land and unused land. The area of construction land will continue to increase from 191.01 km² (28.16%) in 2020 to 200.07 km² (29.49%) in 2023, but the increase will slow down year by year. The area of agricultural land and unused land will continue to decline. The proportions will decrease from 51.50% (349.34 km²) and 20.35% (138.04 km²) in 2020 to 50.35% (341.60 km²) and 20.15% (136.72 km²) in 2023, respectively, with the decline also slowing down year by year.
According to the simulation results, although agricultural land occupies the most significant proportion among the three types of land use, with the expansion of construction land, the proportion of agricultural land will gradually decline, similar to results reported by Anne Gharaibeh [15]. From the simulated map, the expansion mode of construction land is mainly spreading and filling [35], extending inward or outward along the edge of the original construction land, and the area adjacent to the leading road network is more likely to change into construction land. This result is also consistent with the previous study, probably because the cost of developing and building a city based on the original construction land is the lowest. The flat terrain is also conducive to horizontal expansion and change of construction land. Without restrictions, agricultural land in the future may be covered by linear urban areas along the main roads. In addition, the unused land in Nansha District mainly includes rivers and wetlands. According to the simulation results, the proportion of unused land presents a slowly decreasing trend. Because Nansha District is an essential hub node connecting the Pearl River port city cluster and Hong Kong and Macao, it is necessary to implement protection and restoration measures for its water bodies. Strengthening wetland protection is also an essential part of regional ecological protection. Our research results show that the urbanization process of urban areas in the Nansha District will slow down during 2020–2023. Based on this situation, cities should focus on agricultural protection policies and high-quality urban development, limiting the spatial changes that may lead to urban expansion. Under the finite incremental index, planning departments should reasonably use the construction land index, promote economical and intensive land use, focus on the optimization and structural adjustment of territorial space layout in the urban construction area, and tap the potential of land resource utilization to realize improved functions of land use.
The case study in this paper only divides Nansha into three primary types of land use. However, other case studies may include more changes in land use. Our model is easy to apply to other case studies by adding more land use types into the model. This will not significantly increase data requirements or time costs.
4. Conclusions
In this paper, we aim to develop an efficient coupled model for simulating future urban land use changes. Firstly, the linear transformation optimization Markov (LTOM) model is employed to estimate the state transition probability matrix of land use types at any time interval. Secondly, based on the historical statistics and various driving factors of land use changes, we loosely couple an LTOM quantitative prediction model, a CA spatial prediction model, and an artificial neural network to improve the prediction accuracy with respect to future land use changes. Such a coupled model is flexible and highly extensible. It can reduce the limitations of a single model and enhance the advantages of each model. Taking Nansha District as the research area, we used national land survey data and change survey data to ensure the correctness and authority of the original experimental data. The model only needs historical statistical data of land use without the limitation of requiring two periods of land use data. Furthermore, the estimation results are rarely affected by abnormal data. The results show that the ANN-CA-LTOM model performs well in regional land use simulation with an overall accuracy of 97.72% and a kappa value of 0.962761. The study results can reveal the trends in regional land use change in the future and provide a reference for the formulation of land-planning policies and thus the optimal control of national land space.
However, the approach in our study does have certain limitations in its present state, and these limitations will mark the directions of our future research. The driving factor of the proportion of farmland in prime farmland protection areas is introduced in the study of land use suitability, but it is still not deeply integrated with regional policies, and the driving factors selected from natural conditions and socioeconomic perspectives are not comprehensive enough. In future research, regional planning and policies should be considered, so that the simulated results will conform to the urban development to improve the validity and accuracy of the simulation.