1. Introduction
Rapid urbanization driven by economic development and population growth is increasing the demand for nearby farmland and natural resources, especially near megacities [
1,
2]. In 2008, more than half of the global population lived in cities, and 81% of the global population is projected to live in cities by 2030 [
3]. According to the United Nations, India, China, and Nigeria, the main centers of this surge in urban population, will together account for 37% of the projected urban population growth in the world from 2014 to 2050 [
3]. This rapid urbanization process has created massive changes in land use and further modified the biogeochemical and hydrological cycles of our living environment [
4,
5]. Understanding historical spatiotemporal dynamics of land use change and predicting patterns of future change can provide useful information for local communities and policy decision makers to make sound decisions to enable sustainable development [
6,
7,
8].
One of the typical land use changes caused by urbanization is the conversion of agricultural land to residential, industrial, recreational, and transportation uses, resulting in growing pressure on the sustainability of agricultural production and resources [
9,
10]. For example, the rapid urbanization in China has led to the loss of agricultural land and an increase in land use intensity [
11]. Similarly, the conversion of agricultural land in India caused widespread abandonment, resulting in the degradation of roughly 50% of India’s land resources [
10,
12,
13]. The loss and degradation of agricultural land, together with population growth, have contributed to a decline in the per capita availability of food grains [
10,
14]. Most developing countries are facing the dilemma of an increasing demand on agricultural production and a decreasing availability of productive agricultural land due to the economic development [
15,
16]. Therefore, the sustainable development of agricultural land is a key concern for both the science and policy communities, and a significant step toward resolving this dilemma is to understand the past and future trajectories of farmland loss due to urban expansion.
Remote sensing data provide effective tools to examine the area of agricultural land loss over large geographic areas. With increasing availability and improving quality, satellite images have been widely applied to monitor and analyze land use change during urban expansion in a timely and cost-effective way [
17,
18,
19,
20]. Studies have utilized multi-temporal satellite images to investigate the significant loss of agricultural lands in different Indian cities over defined periods [
21,
22,
23]. The accuracies of historical analysis and future prediction largely depend on the quality of the input data [
24] and remote sensing data have been widely used in landscape change analysis and model development due to their high spatial accuracy and temporal frequency [
24,
25,
26,
27]. Significant improvement of remote sensing data in both spatial and temporal resolution has greatly extended research in examining the trajectory of land use change, as well as its driver analysis [
26,
28].
There is a wide variety of spatial models available to simulate and predict land use change using remote sensing data [
29], ranging from physical models [
30], statistical models [
26,
31], expert models [
32], and cellular models [
24,
33], to hybrid models [
34,
35]. Basically, land use change models can be categorized into two groups: statistical description models [
31,
36,
37] and spatial transition models [
38,
39,
40]. The statistical description models simulate the dynamics of landscape structure through a variety of regression or other statistical techniques, while the spatial transition models incorporate more spatial information, such as the location or state configuration of the landscape [
24]. For example, the early Markovian analysis was used as a descriptive tool to predict the percentage change of landscape types [
41,
42]. By analyzing two or more consecutive land use maps, Markov Chain (MC) models predict future patterns using the probability of change from one state to another [
43]. However, the Markov model alone is not sufficient to predict the spatial distribution of each category, even though it can simulate the magnitude of change.
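The aspatial nature of the Markov projection can be illustrated with a minimal sketch. The function name `project_areas`, the two-class setup, and all numerical values below are hypothetical and are not taken from this study; the sketch only assumes a row-stochastic transition matrix applied to a vector of class areas.

```python
import numpy as np

def project_areas(areas, transition, steps):
    """Project class areas forward with a first-order Markov chain.

    areas:      1-D sequence of class areas at the start (any unit).
    transition: row-stochastic matrix; transition[i, j] is the probability
                that a cell of class i becomes class j in one time step.
    steps:      number of time steps to project.
    """
    areas = np.asarray(areas, dtype=float)
    transition = np.asarray(transition, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    # The result contains class totals only; no cell locations are involved.
    return areas @ np.linalg.matrix_power(transition, steps)

# Hypothetical two-class example (farmland, urban); values are illustrative only.
P = np.array([[0.99, 0.01],
              [0.00, 1.00]])
start = np.array([90.0, 10.0])
projected = project_areas(start, P, steps=10)
```

The output is a vector of class totals, which shows why a Markov model on its own gives the magnitude of change but says nothing about where the change occurs.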
The Cellular Automata (CA) model is one of the most successful spatial transition models used to simulate the complex land use change [
33,
44]. The CA model begins with a homogeneous cell-based grid and updates each cell through transition rules derived from its neighborhood. Because it can implicitly account for unknown or immeasurable spatiotemporal variables, this model is suitable for simulating complex and hierarchical structures at a large scale [
45,
46]. The advantage of the CA model in simulating land use change has been widely recognized since the theoretical abstraction and constraint of the CA model in the real world can be easily achieved within a grid-GIS system using remote sensing data [
47,
48,
49].
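A neighborhood-based transition rule of this kind can be sketched as follows. This is an illustrative toy rule, not the rule used in this study: the function name `ca_urban_step`, the 3 x 3 Moore neighborhood, the threshold of three urban neighbors, and the class codes are all assumptions made for the example.

```python
import numpy as np

def ca_urban_step(grid, urban=1, min_urban_neighbors=3):
    """Apply one synchronous CA step on a 2-D class grid.

    A non-urban cell becomes urban when at least `min_urban_neighbors`
    of its 8 Moore neighbors are urban. Cells beyond the grid edge are
    treated as non-urban.
    """
    is_urban = (grid == urban).astype(int)
    padded = np.pad(is_urban, 1, mode="constant", constant_values=0)
    rows, cols = grid.shape
    neighbor_count = np.zeros_like(is_urban)
    for dr in (0, 1, 2):
        for dc in (0, 1, 2):
            if dr == 1 and dc == 1:
                continue  # skip the cell itself
            neighbor_count += padded[dr:dr + rows, dc:dc + cols]
    new_grid = grid.copy()
    new_grid[(is_urban == 0) & (neighbor_count >= min_urban_neighbors)] = urban
    return new_grid

# Hypothetical 5 x 5 grid: 0 = farmland, 1 = urban.
grid = np.zeros((5, 5), dtype=int)
grid[1:4, 1:3] = 1  # a 3 x 2 urban block
grown = ca_urban_step(grid)
```

In this toy case, only the farmland cells directly beside the long edges of the urban block convert, which mirrors the edge-spreading behavior usually associated with CA-based urban growth models.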
In the last two decades, the hybrid Markov-CA model has been applied as a robust method in the geographic and spatial domains, especially in land use change and urban growth prediction since the remote sensing data can be efficiently incorporated into it [
50,
51]. As a spatial transition model, the Markov-CA model has the advantage of two-way transition predictions among land use classes, by combining the stochastic aspatial Markov techniques with the stochastic spatial cellular automata method [
52,
53]. This model has outperformed regression-based models in predicting land use change [
38,
54].
Although many approaches and techniques have been successfully applied to model and predict land use change, socioeconomic and demographic variables are still rarely incorporated into simulation models [
2], even though land use change is affected by biophysical conditions, socioeconomic development, government policy, etc. [
55]. Research on natural, social, and political phenomena has traditionally been separated into different disciplines, each with its own terminologies and methodologies [
56,
57]. This separation ignores the specific features of human-dominated land use change and is unable to adapt to dynamic system requirements. To monitor and predict this land use change, it is essential to understand the complex interactions between physical and social factors and how these interactions impact the change of landscape structure and patterns.
The research objective of this paper is to integrate multi-temporal land use data derived from Landsat images, the Markov-CA model, and other ancillary factors to explore the historical and future trajectories of farmland loss in the Delhi metropolitan area, one of the most rapidly urbanized areas in the world. During the last two decades, the Delhi metropolitan area and its surrounding satellite cities, called the National Capital Region (NCR), have exhibited soaring rates of landscape change, representing typical characteristics of urbanization in developing countries. The specific objectives of this research are to: (1) obtain land use and land cover (LULC) maps of 1994, 2003, and 2014 for the NCR by classifying multi-temporal Landsat images; (2) develop and calibrate the Markov-CA model based on the Markov transition probabilities of LULC classes, the CA diffusion factor, and ancillary factors; and (3) analyze and compare the past loss of farmland and predict the future loss of farmland in relation to rapid urban expansion to the year 2030. The results from this research will provide an overview of past and future trajectories of farmland loss, help decision makers and planners better manage the expected future farmland loss, and guide sustainable development during a period of rapid urbanization.
4. Results and Discussion
4.1. MC Transition Matrices from the Historical LULC Changes
The LULC maps classified from the 1994, 2003, and 2014 Landsat images are shown in Figure 5. The dominant LULC type is farmland, covering 7925.86 km2 (78.98% of the study area) in 2014. The urban area was very similar in size to grassland during the 1990s, but increased dramatically from 461.35 km2 (4.60%) in 1994 to 1273.55 km2 (12.69%) in 2014. Obvious urban sprawl was found in the North West and South West Districts of Delhi, due to rapid population growth and immigration, as well as in the areas along the major transportation routes. These initial change analyses helped us to identify roads and population as two suitability drivers in the model development.
The total changed areas between 1994 and 2003 and between 2003 and 2014 were 1190.81 km2 and 1311.58 km2, and the percentage changes were 11.87% and 13.07%, respectively. The dominant LULC class, farmland, decreased from 8745.56 km2 in 1994 to 8389.11 km2 in 2003 and to 7949.97 km2 in 2014. The most significant change was the conversion of farmland to urban land, which accounted for 43.54% and 50.64%, respectively, of the total area of farmland loss in the two periods. This result indicates that a significant amount of farmland has been lost in the study area during the last two decades due to rapid urbanization.
The MC transition matrices were calculated based on the classified maps to determine the landscape change (
Table 3 and
Table 4). Within the transition table, each value represents the total changed area and percentage from the starting year (column side) to the ending year (row side). The annual transition probability matrix was further derived using Equations (2) and (3), by accumulating from these two periods (
Table 5). By examining the detailed landscape change data (
Table 3,
Table 4 and
Table 5), it is clear that the change from farmland to urban was the largest change, followed by the change from farmland to natural vegetation (forest and grassland). This result indicates an obvious farmland loss during the last two decades in the study area. The yearly transition probability was incorporated into the Markov-CA model for the annual updates in the model prediction.
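For readers who wish to reproduce this kind of table, the following sketch shows one possible way to cross-tabulate two classified maps and to approximate an annual transition matrix. It is not the procedure of Equations (2) and (3), which are not reproduced here: the matrix-root annualization, the function names, the row = starting-year orientation, and the toy maps are all assumptions made for illustration.

```python
import numpy as np

def transition_counts(start_map, end_map, n_classes):
    """Cross-tabulate two co-registered class maps.

    counts[i, j] is the number of cells that were class i in the start map
    and class j in the end map (rows = starting year is an assumption).
    """
    start = np.asarray(start_map).ravel()
    end = np.asarray(end_map).ravel()
    counts = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(counts, (start, end), 1)
    return counts

def to_probabilities(counts):
    """Convert counts to a row-stochastic transition probability matrix."""
    row_sums = counts.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1.0, row_sums)
    probs = counts / safe
    # Classes absent from the start map are assumed to keep their state.
    idx = np.flatnonzero(row_sums.ravel() == 0)
    probs[idx, idx] = 1.0
    return probs

def annualize(probs, years):
    """Approximate the one-year matrix as the `years`-th root of `probs`."""
    eigvals, eigvecs = np.linalg.eig(probs)
    root = eigvecs @ np.diag(eigvals.astype(complex) ** (1.0 / years)) @ np.linalg.inv(eigvecs)
    annual = np.clip(root.real, 0.0, None)  # drop small negative/complex residue
    return annual / annual.sum(axis=1, keepdims=True)

# Hypothetical toy maps: 0 = farmland, 1 = urban.
map_start = np.array([[0, 0, 0, 0], [0, 0, 1, 1]])
map_end = np.array([[0, 0, 0, 1], [0, 1, 1, 1]])
counts = transition_counts(map_start, map_end, n_classes=2)
probs = to_probabilities(counts)
annual = annualize(probs, years=9)
```

A matrix root does not always yield a valid probability matrix for real data, which is why the sketch clips and renormalizes the result; other annualization schemes are equally plausible.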
4.2. Predicted LULC Change Using Markov-CA Model
Our algorithm (
Figure 3) was scripted and implemented in MATLAB and IDL/ENVI using the landscape maps for 1994, 2003, and 2014. The first set of annual maps was produced using only the Markov-CA model, without considering other driving factors. To illustrate the changing pattern and the model simulation results, the three simulated maps for the years 1995, 2003, and 2014 are shown in
Figure 6. The simulated LULC maps for 2003 and 2014 were further overlaid with the actual classified maps to display the simulation quality of the Markov-CA model. The predicted result for 2003 is better than the result for 2014, particularly for forest and grassland in the suburban area, showing less difference from the empirical maps (
Figure 6). This might be caused by error propagation, as the predicted result for the current year is the input for the next year’s simulation. The error accumulated not only from the model itself, but also from the initial image classification. For example, the low classification accuracy of grassland led to the low accuracy of grassland in our model prediction (
Table 2 and
Figure 6). The landscape types with a less direct connection to the socioeconomic drivers, e.g., grassland, wetland, bare land, and water, have a lower accuracy than the socioeconomic-related landscapes. Meanwhile, the accumulated error increased over time and offset the advantage of our model.
As indicated in the figures, the urban area will continue to grow in the same pattern, since the expansion area will be located next to or close to the existing urban areas. During the last two decades, the dominant LULC class was farmland, as it covers more than 70% of the study area. The initial predictions showed that the increase in urban area from 1995 to 2005 was 401 km2, which was less than the increase (455 km2) from 2005 to 2015. This might have been caused by the rapid urbanization in New Delhi since the 2000s. Urbanization and its subsequent landscape change are usually driven by many other factors, such as population growth, economic development, and government policies. Incorporating more drivers will help to produce better prediction results.
In our simulated landscape maps, the type of urban growth that the Markov-CA model predicts best is the outward expansion of the urban area into the suburban area [
73]. Because the “spreading” runs iteratively in CA, the “changing” area in the suburban area can be easily predicted. In this study, the predicted area can easily turn into a homogeneous and isolated “island”; however, it is hard or even impossible to predict “emergent” centers, which might be caused by local policy or even population growth/migration.
Unlike the widely used Markov-CA model, our model further incorporates socioeconomic drivers (Figure 3). The advantage of our model is that it can incorporate various socioeconomic developments, which might be major drivers of urban sprawl in our study area. To measure the improvement resulting from the incorporation of these drivers, the predicted results from our model (the “full” Markov-CA model) and from the Markov-CA model without drivers (the “only” Markov-CA model) were compared with the empirical maps derived by classifying the Landsat images from 1998 and 2009 (Figure 7). Maps B are the predicted results derived by the “only” Markov-CA model, while maps A are the results from our model. In the predicted maps for 1998, the urban area was 744 km2 in the “only” Markov-CA model and 747 km2 in our model, while the urban area in the empirical map was 774 km2. For 2009, the predicted urban area was 1163 km2 in our model and 1144 km2 in the “only” Markov-CA model, while the area in the empirical map was 1196 km2. This comparison indicates that the improvements in the map derived by our model for 1998 are not as obvious as the improvements for 2009, especially for the “new” urban centers far away from the existing urban area. These differences can be found in the South West and North West districts, the most rapidly developing districts in Delhi. The population growth in these two districts was rapid, contributing 41.98% of the total population growth in Delhi [
63]. The large area of farmland in these two districts provides the potential “developable” area for urban expansion. Rapid urbanization in these two districts led to the development of various new settlements, including the informal Jhuggis and Jhoparis resettlement colonies, refugee resettlement colonies, slum resettlement colonies, authorized/planned colonies, unauthorized-regularized colonies, urbanized village/colonies, etc. [
59]. The incorporated drivers, especially the population change, help our model to predict these changes.
A detailed comparison of the empirical maps and the simulated results from the two models was conducted and the results are shown in
Table 6. Our model agreed more closely with the empirical maps, especially in the predicted farmland and urban areas. Although the RMSE increased from 1998 to 2009 due to error propagation, our model improved the simulation result by incorporating other driving factors. Although the “full” model predicts better than the “only” model, the improvement is not very evident, especially for the early prediction. For example, the RMSE decreased only from 0.61 (“only” model) to 0.60 (“full” model) in 1998, while the decrease in RMSE is more noticeable in 2009 (from 1.96 to 1.89). A possible reason is that the length of the prediction period is a significant factor in determining the RMSE: the longer the prediction period, the larger the RMSE will be. Meanwhile, socioeconomic development is not substantial over a short period of time, which limits the effect of the socioeconomic factors.
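As a rough illustration of how such an RMSE might be computed, the sketch below compares per-class area percentages between a simulated and an empirical map. The exact quantity behind the RMSE in Table 6 is not restated in this section, so the use of class percentages, the function names, and the toy maps are assumptions.

```python
import numpy as np

def class_percentages(class_map, n_classes):
    """Return the percentage of cells in each class of an integer-coded map."""
    counts = np.bincount(np.asarray(class_map).ravel(), minlength=n_classes)
    return 100.0 * counts / counts.sum()

def rmse(simulated, empirical):
    """Root mean square error between two equally sized sequences."""
    diff = np.asarray(simulated, dtype=float) - np.asarray(empirical, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical toy maps with three classes (0, 1, 2).
empirical_map = np.array([[0, 0, 0, 1], [1, 2, 2, 2]])
simulated_map = np.array([[0, 0, 1, 1], [1, 2, 2, 2]])
error = rmse(class_percentages(simulated_map, 3),
             class_percentages(empirical_map, 3))
```

An RMSE of this form summarizes disagreement in class totals only; two maps can have a low RMSE and still disagree on where each class is located.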
In a further comparison of the “only” model and the “full” model in
Table 6, it can be noted that the socioeconomic drivers influenced LULC types differently. The most obvious improvement was found in urban and farmland, which were more closely related to the socioeconomic drivers. The heterogeneous pattern of socioeconomic factors was more evident in urban areas and farmland than in the nature-related LULC classes, which helped the “full” model to predict them. Because the other LULC classes were not highly related to socioeconomic development, their improvements were not very obvious. Another advantage of our model lies in long-term prediction, since socioeconomic development usually takes decades. The input of socioeconomic drivers could help keep the model simulation on the “correct” track.
4.3. Model Validation
In order to validate our model, the simulated map was overlaid with the empirical map for the same year. Two stages were used in the validation: visual inspection and quantitative evaluation.
Figure 8 shows the agreement and disagreement components between the simulated map and the corresponding empirical map in 2017. The most accurately predicted classes were urban and farmland. Although many errors are found along the boundaries of the small towns, the majority of the large city area, especially the most populated area, has a relatively higher accuracy than other classes. With socioeconomic variables being considered, this model is more applicable to human-disturbed landscapes, such as urban area and farmland.
In order to further validate the model prediction, a confusion matrix among land use types between the empirical map and predicted map was developed (
Table 7). The table shows the agreement between the simulated result and the empirical map, together with the user’s accuracy and producer’s accuracy for each class. In this research, the best predicted class is farmland (89.08% user’s accuracy and 95.07% producer’s accuracy), followed by the urban area (82.60% user’s accuracy and 74.04% producer’s accuracy). The natural LULC classes, particularly the classes with small areas, have a relatively low accuracy. A possible reason for this poor performance is the frequent seasonal/annual fluctuations among these natural classes. The largest error was found between forest and grassland (126 km2), followed by the error between grassland and farmland (105 km2). These two errors caused the low accuracy in both grassland and forest. The Yamuna River, the longest and the second largest tributary of the Ganges River in northern India, is the major river in the study area and its major water source is the Yamunotri Glacier. During the last two decades, the Yamuna River has experienced a decrease in water quality due to population growth and irrigation use, as well as fluctuations in water discharge due to the seasonal melting of the glacier [
74]. These unexpected changes in water area, as well as the nearby wetland and bare land, are hard to predict and the simulation for the natural landscapes needs more input drivers or information.
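The user’s and producer’s accuracies discussed above follow the standard definitions and can be computed from a confusion matrix as sketched below. The layout (rows = predicted classes, columns = empirical classes), the function name, and the numbers are assumptions for illustration and do not reproduce Table 7.

```python
import numpy as np

def accuracy_report(confusion):
    """Compute user's, producer's, and overall accuracy.

    confusion[i, j] is the area (or cell count) predicted as class i that
    is class j in the empirical map (this orientation is an assumption).
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    users = correct / confusion.sum(axis=1)      # correct share of each predicted class
    producers = correct / confusion.sum(axis=0)  # correct share of each empirical class
    overall = correct.sum() / confusion.sum()
    return users, producers, overall

# Hypothetical two-class confusion matrix; values are illustrative only.
cm = np.array([[80, 20],
               [10, 90]])
users, producers, overall = accuracy_report(cm)
```

If a table uses the opposite orientation, the two per-class measures simply swap roles, so the layout should be checked before comparing values.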
4.4. Future Farmland Trajectories
As the national capital city of India, Delhi has experienced rapid LULC change as the result of population growth and numerous migrants. Our model was used to predict the LULC change from 1995 to 2030 at yearly steps. From each predicted map, the area of two dominant LULC classes, farmland and urban area, was calculated to analyze the past and future trajectories of farmland loss due to rapid urbanization. In the simulated result, the farmland has a consistent decreasing trend from the 1990s and the trend continues to 2030, while the urban area, on the other hand, keeps on increasing from 1995 to 2030. Specifically, the urban area will increase from 504.13 km2 to 2679.54 km2 and farmland will decrease from 8778.19 km2 to 7242.94 km2. Over the last two decades and next fifteen years, rapid urbanization was and will still be the dominant change in the study area, which is the major reason for the farmland loss.
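To put these projected areas into perspective, the mean annual change implied by the 1995 and 2030 values can be worked out with a few lines of code. The helper functions are hypothetical; only the four area values are taken from the text, and the averages hide the period-to-period variation in the simulated trajectory.

```python
def mean_annual_change(start_area, end_area, start_year, end_year):
    """Average change in area per year (km2 per year) over the period."""
    return (end_area - start_area) / (end_year - start_year)

def compound_annual_rate(start_area, end_area, start_year, end_year):
    """Constant yearly growth rate that links the start and end areas."""
    return (end_area / start_area) ** (1.0 / (end_year - start_year)) - 1.0

# Areas (km2) reported in the text for 1995 and 2030.
urban_change = mean_annual_change(504.13, 2679.54, 1995, 2030)
farmland_change = mean_annual_change(8778.19, 7242.94, 1995, 2030)
urban_rate = compound_annual_rate(504.13, 2679.54, 1995, 2030)
```

Under these figures, the urban area grows by roughly 62 km2 per year on average, while farmland shrinks by roughly 44 km2 per year.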
Although urban area and farmland change in opposite directions, their rates of change differ. Farmland shows a relatively stable decreasing pattern from 1995 to 2030, while the rate of increase in urban area is larger from the 2000s to 2020 than during the other periods (
Figure 9). This predicted result is consistent with the intensive urbanization in Delhi since the 2000s. Rapid urbanization leads to the development of urban forms at the expense of other land uses, particularly farmland, since the 2000s [
75]. Based on the predicted result, this rapid urbanization in Delhi will continue until 2020 and slow down from 2020 to 2030.
Besides the significant change from farmland to urban, the change from farmland to forest/grassland was also obvious. The change from farmland to forest/grassland was caused by the forest recovery policy [
76], e.g., the native forest cover decreased before 2000 and then increased due to the reforestation policy. Therefore, this recovery is the major reason for the farmland loss to forest. Another reason for the apparent farmland loss is misclassification, especially between farmland and grassland. Although the selected images were both acquired in growing seasons (e.g., September in 1994 and March in 2003), a spectral difference between September and March exists, which caused misclassification between cropland and grassland and led to errors in the model prediction. A further phenology analysis of cropland and forest/grassland is needed to improve classification in the future.