1. Introduction
Urbanization has led to massive population migration, contributing to the rapid expansion of informal settlements in many developing countries. These settlements are often characterized by inadequate living conditions, heightened socio-economic vulnerabilities, and a lack of essential services like housing, sanitation, and waste management. Around 55% of the global population now resides in urban areas, with a substantial portion living in informal settlements that exacerbate existing social inequalities and environmental challenges [
1,
2]. While the United Nations Sustainable Development Goal 11 (SDG 11) calls for ensuring access to adequate housing, safe services, and slum upgrading [
3], there remains a critical gap in obtaining accurate baseline data for informal settlements. Resource limitations, poor accessibility, and inadequate urban planning prevent effective intervention. The expansion of these settlements further deepens social inequalities and environmental degradation, hindering progress towards sustainable development. Despite extensive studies on informal settlements, many lack comprehensive, spatially explicit data needed for effective planning and policy making. This study, therefore, seeks to fill this gap by offering an integrated approach to assess both the spatial distribution and habitat quality of informal settlements in Mumbai, aligning with SDG 11 to improve quality of life and promote ecological sustainability.
Informal settlements are characterized by complex and dispersed distributions, complicating efforts to standardize measurement and identification across different types of settlements. These challenges significantly impact the extraction process of informal settlements from remote sensing data [
4]. The United Nations defines informal settlements as unplanned urban areas lacking basic infrastructure, with shelters constructed on unauthorized land [
5]. While this definition is widely accepted, informal settlements also face additional challenges such as limited access to transportation, inadequate waste management, and severe environmental pollution, all of which worsen social disparities and health conditions [
6,
7,
8,
9]. Furthermore, they are also vulnerable to natural disasters like extreme rainfall, floods, earthquakes, and epidemics [
10]. Although previous studies have categorized informal settlements using key dimensions like housing quality, living space, and sanitation, these studies often lack comprehensive methodologies that integrate spatial data and environmental quality. This study builds upon existing frameworks by applying a more integrated approach that combines spatial data extraction with habitat quality assessment. We systematically categorize informal settlements using five key dimensions: insecure tenure, poor housing structure, limited space, insufficient access to water, and poor sanitation infrastructure [
11], and propose targeted interventions to improve these aspects in the context of SDG 11.
Due to the unique distribution and characteristics of informal settlements, traditional data collection methods, such as manual sampling proposed by the Slum Health Improvement Coalition [
12] and periodic housing surveys conducted by international organizations like the United Nations [
13], are insufficient for assessing SDG 11. Recent advancements in remote sensing technology, coupled with machine learning, have greatly improved the precision of classifying slums [
14,
15]. The effectiveness of these technologies has been validated through various studies. For example, Pelizari et al. [
16] utilized feature subset selection and random forest classifiers to enhance the accuracy of detecting refugee camp buildings; Prabhu and Alagu Raja [
17] developed a method combining statistical and spectral techniques for accurate identification of urban slums; Williams et al. [
18] introduced an object-based hierarchical machine learning approach that integrates high-resolution imagery and boundary data to map slums. Additionally, Leonita et al. [
19] demonstrated the effectiveness of support vector machines and random forest algorithms in improving the accuracy of slum mapping, particularly in Bandung, Indonesia. While these methods have shown promise, the balance between classification accuracy and geographic granularity remains a key research challenge [
20]. This study seeks to address this gap by integrating scene-based methods that incorporate spatial context, improving the precision of informal settlement mapping without sacrificing geographic resolution.
While deep learning (DL) has shown significant promise in slum mapping and SDG 11.1.1 monitoring, it faces notable limitations compared to traditional machine learning techniques. High-resolution (HR) and very high-resolution (VHR) imagery, while offering detailed texture features, are resource-intensive, requiring substantial time, labor, and computational power, which limits their scalability for large-scale urban analyses [
21]. Additionally, DL models tend to be less interpretable than traditional machine learning models, making them less reliable for real-world slum mapping tasks. In contrast, machine learning algorithms, by integrating feature engineering with expert knowledge, are better suited to understanding the characteristics of informal settlements in high-density urban areas [
22]. Moreover, DL performs poorly when relying on limited samples from informal settlements. The development history of informal settlements varies across regions, leading to significant differences in their types and characteristics, making it difficult to generate large-scale, high-quality training samples [
23]. Therefore, machine learning, especially small-scale models based on a small number of high-quality samples, is more feasible and reliable for SDG 11.1.1 measurement tasks.
The rapid growth of informal settlements is driven by socio-economic and demographic factors, leading to significant challenges not only for residents’ quality of life but also for the overall livability of cities [
24]. The concept of human settlement suitability, introduced in the 1950s by Greek planner Doxiadis [
25], has evolved into a comprehensive method for assessing livability, emphasizing the scientific and quantitative evaluation of living conditions. While human settlement suitability has been widely studied, many existing studies rely on static surveys and methods that fail to capture the dynamic, multi-dimensional nature of informal settlements. In 1961, the World Health Organization proposed “suitability” as one of the basic requirements for human settlement construction and development, urging governments to focus on improving public service facilities and enhancing the quality of life for residents of informal settlements. For a long time, the human settlement environment has been widely studied by experts in geography, ecology, and sociology, incorporating various regions, methods, and indicators. Most studies combine natural and human environmental factors to construct human settlement suitability indices [
26]. For example, Maimaiti et al. [
27] defined the suitability of human settlements in arid and semi-arid regions using remote sensing imagery and socio-economic data, while Huang et al. [
28] integrated remote sensing data with social perception data to establish frameworks for evaluating urban residential land suitability. Some studies have also developed multi-level, multi-indicator systems [
29] to assess the quality of urban living environments, while methods like ImPACT and trade-off analysis [
30] have been applied to quantitatively identify spatial differentiation in rural living environments. In addition to suitability analysis, many studies have focused on issues related to the quality of life and vulnerability in informal settlements. For instance, Shahraki et al. [
31] surveyed 400 households in informal areas of Kabul, Afghanistan, to assess local residents’ quality of life, finding widespread dissatisfaction with transportation, leisure, and governance, as well as material deprivation in basic services like water and energy. Chitekwe-Biti et al. [
32] used survey data from 328 households in informal areas of Zhanjan City, finding that somatic factors (with a coefficient of 5.61) had the greatest impact on livability conditions. Giri et al. [
33] randomly selected 300 slum households from the hilly areas of Kathmandu, Nepal, to assess residents’ vulnerability, revealing that those in flat areas were more vulnerable than those in hilly areas. Patri et al. [
34] collected structured questionnaire data from 200 slum households to construct a vulnerability index, showing that slum populations are more susceptible to natural disasters due to economic, material, and awareness-related vulnerabilities. Most studies on human settlement evaluation in informal settlements focus on factors such as quality of life, comfort, and habitability, typically relying on survey methods such as questionnaires and interviews. However, these methods are limited in their ability to capture the broader and multi-dimensional environmental dynamics of informal settlements.
Based on the above analysis, informal settlements not only lack accurate baseline data but also exhibit complex spatial variations in their living environments, which are influenced by a range of natural, socio-economic, and policy factors. Existing studies have addressed some of these factors, but they often rely on traditional data collection methods or static surveys, which fail to capture the dynamic and multi-dimensional nature of informal settlements. This study seeks to fill these gaps by integrating spatial data extraction with habitat quality assessment. By combining spatial data extraction with seasonal feature engineering and a Gaussian fuzzy evaluation model, we aim to assess the suitability of human living environments in informal settlements more accurately and holistically. Specifically, we employ random forest (RF) models to classify slum areas in Mumbai. For the habitat assessment, we select 18 secondary indicators across four dimensions (economy, society, environment, and living conditions) and, using multi-source remote sensing data, develop a Gaussian fuzzy evaluation model with the entropy-weighted method (EWM) to assess habitat quality in informal settlements. This integrated approach addresses the limitations of previous studies by incorporating dynamic spatial data and improving the precision of environmental quality assessments. The findings provide insights for urban planning, particularly for improving living conditions in informal settlements, in line with SDG 11.
3. Methodology Overview
This study follows a research process consisting of four key steps, designed to accurately extract informal settlements in Mumbai and thoroughly assess their habitat quality (
Figure 3). First, drawing on prior knowledge and the literature, we extract the main factors influencing informal settlements from multi-source remote sensing data (including Sentinel-1 and Sentinel-2 data) and preprocess the data using Google Earth Engine (GEE) and ArcGIS 10.4. The preprocessing steps include image registration, denoising, and image fusion, which ensure spatial consistency across data from different sources and provide high-quality inputs for subsequent analysis. Next, we apply interpolation methods to construct a complete dataset for identifying informal settlements and assessing habitat quality, combining spectral indices, texture features, and other key geographic data. To reduce feature redundancy and avoid overfitting, we combine hierarchical clustering with the random forest model to optimize feature selection, which keeps the retained features representative and improves the model’s stability and accuracy. Based on the optimized features, we use the random forest model to map the spatial distribution of informal settlements at a 10 m resolution. An entropy-weighted fuzzy evaluation method is then used to determine the weight of each influencing factor, and a weighted overlay of the grid maps for each factor yields a comprehensive habitat suitability assessment. Finally, using spatial overlay analysis, we combine the geographical points of informal settlements with the habitat suitability results and perform a grading analysis to determine the habitat characteristics of different areas and to assess Mumbai’s progress towards SDG 11.1. This workflow supports accurate informal settlement extraction and a rigorous habitat evaluation, providing a basis for subsequent urban planning and sustainable development work.
3.1. Calculation of Spectral Indices and Texture Features
Sentinel-1 radar backscatter data are sensitive to buildings, surface roughness, and urban structures. Combining VV and VH polarization data allows more information on mixed surface features to be extracted, which facilitates detecting building density and layout in informal settlements. Sentinel-2 provides medium-resolution spectral data that can identify vegetation, bare land, and buildings at different scales, enhancing the spectral differentiation between informal settlements and other land covers and thereby improving classification accuracy. All data preprocessing in this section was completed on the GEE cloud platform. Using GEE’s median composite algorithm, gaps left by cloud cover were filled with the median value of pixels over each seasonal period, resulting in high-quality images. A total of 404 scenes were obtained. From these, base index data for four seasons (Spring: Spr, Summer: Sum, Autumn: Aut, Winter: Win) in 2017 and 2022 were derived, as shown in
Table 2 and
Table 3.
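The seasonal median compositing described above can be illustrated per pixel with a minimal sketch. It is not the GEE implementation: the month-to-season mapping in `SEASONS`, the function name `seasonal_median`, and the sample values are all hypothetical, since the source does not state which months define each season.

```python
from statistics import median

# Hypothetical season-to-month mapping; the source does not list the months used.
SEASONS = {
    "Spr": (3, 4, 5),
    "Sum": (6, 7, 8),
    "Aut": (9, 10, 11),
    "Win": (12, 1, 2),
}

def seasonal_median(observations, season):
    """Median of the cloud-free values of one pixel within one season.

    observations: list of (month, value) pairs, where value is None
    when the pixel was masked as cloud in that acquisition.
    Returns None if no cloud-free observation falls in the season.
    """
    months = SEASONS[season]
    valid = [v for m, v in observations if m in months and v is not None]
    return median(valid) if valid else None

# Illustrative index values for a single pixel (made up for this sketch).
pixel = [(3, 0.21), (4, None), (4, 0.25), (5, 0.90), (7, 0.40)]
spring = seasonal_median(pixel, "Spr")
summer = seasonal_median(pixel, "Sum")
winter = seasonal_median(pixel, "Win")
```

The median is less sensitive than the mean to a single bright residual-cloud value (0.90 in the example), which is one common reason for choosing it when building composites.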
In this study, texture features were extracted from the Sentinel-2 imagery using the gray-level co-occurrence matrix (GLCM) method in GEE. We selected the near-infrared (B8), red (B4), and green (B3) bands to calculate a series of texture indices, including contrast, energy, correlation, homogeneity, and entropy, using the built-in ee.Image.glcmTexture() function with a 3 × 3 neighborhood window. Extracting these indices seasonally captures the seasonal variation in the texture of informal settlements and thus reflects more accurately how different seasons affect their spatial structure and environmental conditions. This seasonal extraction enables more detailed urban environmental monitoring and change analysis.
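To make the texture indices concrete, the following is a minimal pure-Python sketch of a GLCM for a single 3 × 3 window and one pixel offset, together with four of the indices named above. It is an illustration under stated assumptions rather than the GEE routine: the function names `glcm` and `texture_metrics`, the horizontal offset, the three gray levels, and the window values are all hypothetical, and energy is taken here as the angular second moment.

```python
import math

def glcm(window, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_metrics(p):
    """Contrast, energy (ASM), homogeneity (IDM), and entropy of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }

# Hypothetical 3 x 3 window already quantized to gray levels 0..2.
window = [[0, 0, 1],
          [0, 1, 1],
          [2, 2, 2]]
metrics = texture_metrics(glcm(window, levels=3))

# A perfectly uniform window has no contrast and maximal energy.
flat = texture_metrics(glcm([[1, 1], [1, 1]], levels=2))
```

Heterogeneous roof materials and irregular layouts raise contrast and entropy and lower homogeneity, which is the intuition behind using these indices to separate informal from formal built-up areas.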
3.2. Informal Settlement Extraction Based on Random Forest Algorithm
The RF model is an ensemble learning method based on decision trees, widely used for classification and regression tasks. It improves the robustness and accuracy of the model by combining the results of multiple decision trees. Its core mechanism is bagging, where subsets of training data are randomly selected to train several decision trees, each generated from different subsets, reducing the risk of overfitting by a single decision tree. As a result, RF is more robust and accurate compared to many traditional classifiers [
47]. When training a random forest model, commonly adjusted parameters include the number of trees and the minimum leaf population. Increasing the number of trees can slightly improve the model’s accuracy but also increases computational cost. Therefore, based on previous studies, the number of trees was set to 100 to balance accuracy and efficiency, and the minimum leaf population, which limits how deep each tree can grow, was set to 10 to prevent overfitting [
48]. Additionally, four other random forest parameters were left at their default values: variablesPerSplit (the number of variables per split; default: the square root of the number of features), bagFraction (the fraction of the data used for each tree; default: 0.5), outOfBagMode (whether the classifier should operate in out-of-bag mode), and seed (random seed). To ensure the model’s robustness and accuracy, the dataset was split into training and testing sets at a ratio of 70% to 30%. For 2017, 45,766 samples were used to build the RF model and 19,614 samples for accuracy assessment. For 2022, 45,794 samples were used to build the RF model and 19,626 samples for accuracy assessment.
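The 70/30 split and the reported sample counts can be checked with a short sketch. The function `split_samples`, the seed value, and the use of integer sample identifiers are assumptions made for illustration; the `RF_PARAMS` dictionary simply restates the settings given in the text and is not passed to any library here.

```python
import random

# Settings restated from the text (number of trees, minimum leaf population,
# and the stated default bag fraction); shown for reference only.
RF_PARAMS = {"numberOfTrees": 100, "minLeafPopulation": 10, "bagFraction": 0.5}

def split_samples(samples, train_tenths=7, seed=0):
    """Shuffle a copy of the samples and split them train_tenths : (10 - train_tenths).

    The seed is a hypothetical choice that only makes the sketch reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding of the 70% boundary.
    n_train = len(shuffled) * train_tenths // 10
    return shuffled[:n_train], shuffled[n_train:]

# Total sample counts implied by the text: training plus testing samples per year.
train_2017, test_2017 = split_samples(range(45766 + 19614))
train_2022, test_2022 = split_samples(range(45794 + 19626))
```

Under these assumptions the split reproduces the counts reported for both years, which confirms that the stated sample sizes are consistent with a 70% to 30% ratio.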
3.3. Accuracy Evaluation Method for Informal Settlement Extraction
A confusion matrix is used to verify the accuracy of the random forest model in extracting informal settlements. After the model completes the classification, its predictions are compared with the actual classification labels in the geospatial data, and the results are divided into four categories: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). From these counts, performance metrics such as overall accuracy, recall, F1 score, and kappa coefficient can be calculated.
In the classification of informal settlements, overall accuracy (OA) reflects the model’s general accuracy in predicting all types of areas, including both residential and non-residential zones:
$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
The kappa coefficient is calculated as
$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{2}$$
In the formula, $p_o$ represents the observed accuracy and $p_e$ is the expected random accuracy. The kappa coefficient is particularly useful for evaluating the classification of binary datasets, with informal settlements being one example.
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$
where F1 is the harmonic mean of Precision and Recall, Precision represents the proportion of correctly classified samples among those predicted as informal settlements, and Recall refers to the proportion of actual informal settlement samples that were correctly predicted.
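As a worked illustration of these metrics, the sketch below computes overall accuracy, kappa, precision, recall, and F1 from the four confusion-matrix counts of a binary classification. The function name `classification_metrics` and the counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy metrics for a binary (informal vs. other) classification."""
    n = tp + fp + tn + fn
    oa = (tp + tn) / n  # observed accuracy
    # Expected chance agreement from the row and column totals of the 2 x 2 matrix.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (oa - p_e) / (1 - p_e)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"oa": oa, "kappa": kappa, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts chosen so the arithmetic is easy to follow by hand.
result = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these counts the observed accuracy is 0.85 and the chance agreement is 0.50, so kappa is 0.70. Kappa is lower than overall accuracy because it discounts the agreement that would be expected from the class proportions alone.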
3.4. Fuzzy Comprehensive Evaluation Based on Entropy-Weighted Method (EWM)
When constructing the above evaluation index system, we encountered the issue of spatial heterogeneity in raster data. Due to varying data resolutions, certain indicators, such as distance-based metrics, exhibited significant gradient changes, while other data, such as PM2.5 and surface temperature, showed notable local fluctuations. This uneven spatial distribution caused by varying resolutions resulted in small differences in the calculated weights when using the EWM, failing to accurately reflect the relative importance of each indicator. Particularly for indicators with similar distributions, such as distances to primary roads, major roads, schools, and stations, the weight differences were not significant, making it difficult to distinguish their different contributions to habitat quality. To address the issue, we introduced Gaussian fuzzification processing on the basis of the EWM. Gaussian fuzzification helps smooth data fluctuations, reduce the impact of extreme values, and preserve the spatial gradient characteristics of distance-based metrics. Specifically, Gaussian fuzzification applies fuzzification to each raster dataset, making spatial data changes more continuous and smooth, thus reducing the impact of local outliers on weight calculations while enhancing the spatial continuity of each indicator. This method effectively improves the shortcomings of the EWM when handling data with spatial gradient variations, enhancing the differentiation of indicator weights.
The EWM itself is an objective allocation method based on the principles of information entropy, which eliminates subjectivity and improves the reliability of weight calculation [
6,
7,
8,
9]. Gaussian fuzzification is applied to the standardized data to smooth fluctuations and reduce the influence of extreme values, further enhancing the stability and reliability of the final weights [
49]. The combination of these two methods ensures that the weight distribution for each indicator is more scientific and objective, while fully considering the spatial heterogeneity of the data, making the final composite score more aligned with the actual habitat environment. Therefore, the entropy values and weights calculated by combining the EWM with Gaussian fuzzification were used to compute the overall habitat environment score for informal settlements in Mumbai based on the index weights.
The study refers to the research of Guo et al. [
50] and Bole et al. [
51], classifying the indicators into positive and negative categories. The distance to points of interest (POIs) such as schools, hospitals, parks, or roads is considered a negative indicator because the farther the distance, the poorer the accessibility, which negatively impacts the residents’ quality of life. On the other hand, indicators such as nighttime lights and green space density, which represent economic and environmental factors, are considered positive indicators because the higher their values, the more positive the impact on the residents’ quality of life.
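Since the evaluation indicators have different dimensions, standardization of each indicator should be performed before constructing the comprehensive index. The methods were as follows. Positive indicator standardization:
$$x'_{ij} = \frac{x_{ij} - \min(x_j)}{\max(x_j) - \min(x_j)} \tag{4}$$
Negative indicator standardization:
$$x'_{ij} = \frac{\max(x_j) - x_{ij}}{\max(x_j) - \min(x_j)} \tag{5}$$
where, in Equations (4) and (5), $x'_{ij}$ represents the normalized data for positive and negative indicators in the habitat environment evaluation of Mumbai (indicator $j$ at evaluation unit $i$), and $\min(x_j)$ and $\max(x_j)$ denote the minimum and maximum values of the corresponding indicators in the evaluation for that region.
Calculate the relative weight of each subindicator, where $n$ is the number of evaluation units:
$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}} \tag{6}$$
Calculate the entropy value of the corresponding indicators:
$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij} \tag{7}$$
Calculate the weight of each corresponding indicator, where $m$ is the number of indicators:
$$w_j = \frac{1 - e_j}{\sum_{j=1}^{m} (1 - e_j)} \tag{8}$$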
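Then apply the Gaussian membership function to each weight value to adjust the weight function:
$$\mu(w_j) = \exp\left(-\frac{(w_j - \bar{w})^2}{2\sigma^2}\right) \tag{9}$$
where $\mu(w_j)$ represents the Gaussian membership of the preliminary weight $w_j$, $\bar{w}$ represents the mean of the preliminary weights, and $\sigma$ represents the standard deviation of the preliminary weights.
For the weights of positive and negative indicators, the mean and standard deviation are as follows, where $m$ is the number of indicators:
$$\bar{w} = \frac{1}{m} \sum_{j=1}^{m} w_j, \qquad \sigma = \sqrt{\frac{1}{m} \sum_{j=1}^{m} (w_j - \bar{w})^2} \tag{10}$$
The final weight values after Gaussian fuzzification are:
$$w_j^{*} = \frac{\mu(w_j)}{\sum_{k=1}^{m} \mu(w_k)} \tag{11}$$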
At this point, the corresponding weights are multiplied by the evaluation indicators and then summed, resulting in a single column of data, which represents the “comprehensive score”—the comprehensive habitat environment index (HEI) for Mumbai.
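The weighting procedure can be sketched end to end in a few lines. This is a minimal illustration under explicit assumptions, not the authors' implementation: the function names, the three example indicators and their values are hypothetical, the standard deviation is taken over all indicators as a population value, and the final weights are assumed to be the Gaussian memberships normalized to sum to one.

```python
import math

def normalize(column, positive=True):
    """Min-max standardization; assumes the column is not constant."""
    lo, hi = min(column), max(column)
    if positive:
        return [(x - lo) / (hi - lo) for x in column]
    return [(hi - x) / (hi - lo) for x in column]

def entropy_weights(columns):
    """Entropy weights for a list of normalized indicator columns."""
    n = len(columns[0])
    divergences = []
    for col in columns:
        total = sum(col)
        shares = [x / total for x in col]
        # Terms with a zero share contribute nothing to the entropy.
        e = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergences.append(1 - e)
    s = sum(divergences)
    return [d / s for d in divergences]

def gaussian_adjust(weights):
    """Assumed form: Gaussian memberships of the weights, normalized to sum to one."""
    m = len(weights)
    mean = sum(weights) / m
    sigma = math.sqrt(sum((w - mean) ** 2 for w in weights) / m)
    if sigma == 0:
        return list(weights)  # all weights equal, nothing to adjust
    mu = [math.exp(-((w - mean) ** 2) / (2 * sigma ** 2)) for w in weights]
    total = sum(mu)
    return [u / total for u in mu]

def composite_score(columns, weights):
    """Weighted sum of the normalized indicators for each evaluation unit."""
    n = len(columns[0])
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n)]

# Hypothetical values for four grid cells.
lights = normalize([10, 20, 30, 40], positive=True)            # nighttime lights
school = normalize([100, 400, 250, 900], positive=False)       # distance to school (m)
green = normalize([0.1, 0.5, 0.3, 0.2], positive=True)         # green space density
columns = [lights, school, green]

preliminary = entropy_weights(columns)
final = gaussian_adjust(preliminary)
hei = composite_score(columns, final)
```

Because both the normalized indicators and the final weights lie between 0 and 1 and the weights sum to one, each composite score also lies between 0 and 1, which makes scores comparable across cells.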
4. Results
4.1. Analysis of the Results of Informal Settlement Indicator Selection
In this study, a series of features were extracted based on Sentinel-1/2 data to identify the spatial distribution of informal settlements. These features include spectral indices (like EVI, BSI, BUI, NBI, BAEI, NDBI, UI, NBAI) as well as 18 band indicators from the GLCM. Additionally, polarization information from radar data, containing VV, VH, and VVH from ascending and descending orbits, and slope information extracted from DEM data were used. These initial features were filtered using random forest algorithms, hierarchical clustering, and Spearman correlation analysis to retain the most representative indicators for classifying informal settlements.
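The redundancy filtering step can be illustrated with a simplified sketch. It swaps in a greedy filter for the hierarchical clustering used in the study: features are visited in order of importance, and a feature is kept only if its absolute Spearman correlation with every feature already kept is below a threshold. The function names, the 0.9 threshold, and all feature values and importance scores below are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def filter_features(features, importance, threshold=0.9):
    """Greedy filter: keep the most important feature of each correlated group."""
    kept = []
    for name in sorted(features, key=importance.get, reverse=True):
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up sample values for three features named as in the text.
features = {
    "Win-BSI": [0.1, 0.4, 0.3, 0.8, 0.6],
    "Win-NDBI": [0.2, 0.5, 0.4, 0.9, 0.7],
    "Spr-diss": [5, 1, 4, 2, 3],
}
importance = {"Win-BSI": 0.30, "Win-NDBI": 0.25, "Spr-diss": 0.45}
kept = filter_features(features, importance)
rho = spearman(features["Win-BSI"], features["Spr-diss"])
```

In this toy example the two winter indices rank the samples identically, so only the more important one survives, while the texture feature is retained because it carries different information.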
From the 2017 feature importance chart (
Figure 4a), the texture features Spr-diss and Spr-idm from the GLCM stood out, indicating that informal settlements exhibited significant spatial heterogeneity compared to formal residential areas. Particularly in terms of gray-level dissimilarity and homogeneity, texture features were key in distinguishing informal settlements. This finding is consistent with Kuffer et al. [
52], which revealed that texture-based methods demonstrated strong robustness in urban images during their investigation of global slum spatial characteristics.
Furthermore, it is important to note that changes in feature importance over time might also reflect underlying factors like the spatial and structural evolution of informal settlements, as influenced by urbanization and infrastructure development. For example, spring (Spr-diss and Spr-idm) and winter (Win-BAEI, Win-NDBI) features showed higher importance in 2017, possibly reflecting seasonal variations in vegetation cover and construction activities. These temporal changes are reflective of the evolving land use patterns and dynamic human activities in informal settlements, which contribute to the prominence of certain features at different times.
In contrast, the feature importance rankings for 2022 (
Figure 4b) changed significantly. Sum-inertia and Spr-contrast became the most important features, indicating that inertia and contrast in the gray-level co-occurrence matrix played a more significant role during this period. This change may reflect the evolution of the morphology and structure of informal settlements in the study area over the years. The 2022 data also showed a different seasonal trend compared to the 2017 data. The importance of autumn features (Aut-inertia and Aut-savg) increased, which may be related to vegetation degradation and increased surface exposure in autumn. Spring and winter features still maintained high importance, but summer features (Sum-inertia and Sum-BAEI) also had a greater influence on classification during this period, possibly because surface coverage and object complexity, influenced by climate change, had a stronger impact on classification results.
Additionally, texture features and indices related to building extraction (Win-BAEI, Win-NDBI, and Win-BSI) retained high importance, with surface coverage features also playing a crucial role and providing strong support in distinguishing informal settlements.
We used the Spearman correlation heatmap (
Figure 5) to show the correlations between different features. The 2017 heatmap showed strong positive correlations between several spectral features (
Figure 5a), exemplified by Win-BSI and Win-NDBI, indicating that these features reflected similar surface information when distinguishing informal settlements. Additionally, radar polarization features such as Spr-VVH and Aut-VH also showed some correlation, suggesting that different polarization combinations of radar data have consistency in reflecting surface characteristics. The 2022 feature heatmap (
Figure 5b) showed changes in the correlation structure between different features, particularly with higher correlations between texture features such as Sum-inertia and Spr-contrast, while correlations between polarization and spectral features decreased. This change indicated that, over time, texture features were more important in reflecting the complexity of informal settlement structures, while the contribution of spectral and radar data to classification results had varied with the expansion or transformation of informal settlements.
By comparing the feature importance and correlation analysis results from different years, the following conclusions can be drawn: first, texture features from the GLCM consistently showed significant advantages across different years and seasons, especially in describing the complexity and structural changes of surface objects. Second, spectral features, namely, Win-BAEI, Win-NDBI, and Win-BSI, had high feature importance in both years, reflecting the close relationship between informal settlements and bare soil and building coverage. Finally, the importance of features changed over time, with texture features becoming more critical in 2022, while the role of polarization features diminished.
4.2. Results and Analysis of Informal Settlement Extraction
The RF model performed exceptionally well in handling high-dimensional data and preventing overfitting, yielding better results than rule-based OBIA methods, making it suitable for recognizing complex geographic information in urban informal settlements. Based on the feature selection and importance analysis mentioned above, we constructed an efficient classification model for informal settlements and trained and validated it using the RF algorithm. The model was trained with positive samples (informal settlements) and negative samples (formal settlements), and the dataset was split into training and test sets in a 7:3 ratio. Model performance was validated using kappa coefficients and overall accuracy (OA) results, as shown in
Table 4, with kappa coefficients above 0.77 and overall accuracy exceeding 0.89.
This study extracted informal settlements using medium-resolution remote sensing data; for comparison, several earlier studies relied on high-resolution or very high-resolution imagery. Kuffer et al. [
53] achieved an overall accuracy of 87% for Mumbai by integrating high-resolution imagery (WorldView). Similarly, Fallatah et al. [
54] combined single-day VHR imagery from the GeoEye-1 sensor with medium-resolution Landsat data, using the RF algorithm to classify six land cover types, including informal settlements, with an accuracy exceeding 90%. Peng et al. [
55] constructed a composite slum spectral index using Sentinel data and achieved a 54.45% intersection-over-union score through machine learning. Matarira et al. [
56] also used machine learning classifiers on Quickbird imagery, achieving relatively high accuracy (>80%). In contrast, Najmi et al. [
57] combined street view and 0.4 m VHR imagery to extract informal settlements, obtaining a lower overall accuracy of 74.5–80.2%. In this study, only medium-resolution Sentinel data were used, yet the overall accuracy exceeded 89% and the F1 score exceeded 90%.
The binary classification results obtained using the random forest algorithm are shown in
Figure 6. In the 2017 map, informal settlements (marked in purple) accounted for 20.6% of the built-up area, mainly concentrated in the northern and central parts of the city. By 2022, informal settlements further expanded southward, with more purple areas visible along the southern edge of the city, indicating the expansion of informal settlements in that region. The total area of informal settlements expanded from 45.37 km² in 2017 to 50.64 km² in 2022, accounting for 23.1% of the built-up area in 2022. Compared to 2017, the distribution of informal settlements in 2022 became more widespread, particularly in the peripheral areas, reflecting a trend of urban expansion along with the growth of informal settlements.
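The area figures above can be related to the 10 m classification grid with a short arithmetic sketch. The function names are hypothetical, the pixel count is back-calculated from the reported 2017 area for illustration, and the built-up areas are only implied by dividing the reported areas by the reported shares.

```python
def area_km2(pixel_count, pixel_size_m=10):
    """Area covered by a number of square pixels, in square kilometres."""
    return pixel_count * pixel_size_m ** 2 / 1_000_000

def relative_growth(area_start, area_end):
    """Growth of the end value relative to the start value."""
    return (area_end - area_start) / area_start

# 453,700 pixels of 10 m x 10 m correspond to the reported 2017 area.
area_2017 = area_km2(453_700)

# Relative growth of the informal settlement area between 2017 and 2022.
growth = relative_growth(45.37, 50.64)

# Built-up areas implied by the reported shares (20.6% and 23.1%).
built_up_2017 = 45.37 / 0.206
built_up_2022 = 50.64 / 0.231
```

Under these figures the informal settlement area grew by roughly 11.6% over five years, while the implied built-up area stayed close to 220 km² in both years, so the higher 2022 share mainly reflects growth of informal settlements rather than a change in the built-up total.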
We utilized Google Earth and Sentinel satellite imagery to monitor the spatial distribution changes of informal settlements (ISs) in Mumbai between 2017 and 2022. Google Earth provides high-resolution localized views, while Sentinel offers medium-resolution wide-area imagery. By comparing the imagery and extraction results from these two data sources, it was demonstrated that free Sentinel data are effective and accurate in capturing the area and spatial distribution of large-scale informal settlements.
Figure 7 and
Figure 8 show the 2017 and 2022 Google Earth and Sentinel imagery, along with the corresponding informal settlement extraction results. Although Sentinel imagery has lower resolution and cannot provide the same level of detail as Google Earth, its spatial distribution trend is generally consistent with the high-resolution imagery and effectively reflects the overall distribution of informal settlements. Particularly in large-scale areas, Sentinel imagery exhibits high extraction accuracy, demonstrating its effectiveness in large-scale monitoring.
The expansion trend of informal settlements in 2022 is evident, especially at the urban edge and near transportation and commercial zones, consistent with the findings of Kohli et al. [58]. A comparison of the 2017 and 2022 imagery shows that the expansion of informal settlements accelerated significantly around the urban periphery, transportation routes, and commercial zones, reflecting the driving force of urbanization. The regions shown in Figure 7a,g,h,i and Figure 8a,g,h,i indicate that informal settlements expanded rapidly, particularly near natural resources like farmland and forests. These settlements are often located in suburban or remote areas, where agriculture and settlement develop alternately [59]. Furthermore, areas lacking government oversight typically form unplanned, dense settlements, especially along rivers and natural resource zones. Informal settlements are often located in hazardous areas, such as floodplains and marshlands, posing significant risks to residents [24]. Regions in Figure 8i,j,k show further expansion of informal settlements along rivers and transportation hubs, highlighting the significant impact of natural resources and infrastructure on the development of informal settlements. The comparison from 2017 to 2022 reveals a close connection between the spatial distribution of informal settlements and natural resources, infrastructure, and the urbanization process. Sentinel’s medium-resolution imagery, checked against Google Earth, showed good accuracy in large-scale monitoring of informal settlements, particularly in capturing spatial distribution trends. Future research can focus on how to combine high-resolution and medium-resolution imagery to improve extraction accuracy and explore more efficient monitoring methods, providing theoretical support for urban planning and the improvement of informal settlements.
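The 2017 to 2022 comparison described above amounts to a per-pixel change analysis between two binary extraction masks. The sketch below illustrates one way to do it, assuming both masks share the same grid and a 10 m pixel size (the Sentinel-2 resolution mentioned in this paper); the function name `change_counts` and the toy masks are hypothetical and do not reproduce the study's data.

```python
PIXEL_SIZE_M = 10.0                          # assumed grid spacing in metres
PIXEL_AREA_KM2 = PIXEL_SIZE_M ** 2 / 1e6     # area of one pixel in km²


def change_counts(mask_t1, mask_t2):
    """Count stable, newly added, and lost settlement pixels between two dates.

    Both masks are equally sized lists of rows holding 0 (other) or 1
    (informal settlement).
    """
    counts = {"stable": 0, "expansion": 0, "loss": 0}
    for row_t1, row_t2 in zip(mask_t1, mask_t2):
        for before, after in zip(row_t1, row_t2):
            if before and after:
                counts["stable"] += 1
            elif after:
                counts["expansion"] += 1
            elif before:
                counts["loss"] += 1
    return counts


# Toy 3 x 4 masks, for illustration only.
mask_2017 = [[1, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
mask_2022 = [[1, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 1, 0]]

counts = change_counts(mask_2017, mask_2022)
for label, n_pixels in counts.items():
    print(f"{label}: {n_pixels} px = {n_pixels * PIXEL_AREA_KM2:.4f} km²")
```

Separating expansion from loss, rather than reporting only the net change, makes it possible to see where settlements appeared and where they were cleared or reclassified.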
The findings of this study highlight that urbanization is not a linear process, with different countries following distinct urbanization pathways. In developing countries, urbanization often leads to overurbanization, where land becomes urbanized at a faster pace than the population, accelerating the growth of informal settlements. This pattern is particularly evident in the rapid expansion of informal settlements at the urban edge and near transportation routes and commercial zones. The spatial distribution of informal settlements around these zones clearly reflects the driving forces of urbanization: land is rapidly transformed, but the population’s settlement patterns struggle to keep up. As a result, informal settlements form in areas that have been urbanized but lack infrastructure and services. This pattern is especially prominent along transportation corridors and areas adjacent to commercial developments, where informal settlements continue to emerge despite the presence of infrastructure and services intended for formal residents.
4.3. Evaluation Result of the HEI in Informal Settlements
The study determined the weights of the habitat environment index evaluation indicators, constructed from basic geographic data and remote sensing data, through the integration of the EWM and Gaussian fuzzification evaluation model. This method effectively improves the differentiation of weight calculations and more accurately reflects the spatial heterogeneity of raster data. The weight results of the living environment evaluation indicators for 2017 and 2022 (Table 5) revealed the impact of various indicators on the habitat environment and their changes over time.
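For readers unfamiliar with the EWM, the sketch below shows the textbook entropy weight calculation: indicators whose values are more dispersed across samples have lower entropy and receive larger weights. It is an illustration under stated assumptions and not the study's implementation. The Gaussian fuzzification step is omitted, the function names are hypothetical, and the indicator values are toy numbers.

```python
import math


def min_max(values, benefit=True):
    """Scale values to [0, 1]; invert when smaller raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)          # a constant indicator carries no information
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if benefit else [1.0 - s for s in scaled]


def entropy_weights(indicators, benefit):
    """Return EWM weights for a dict of indicator name -> list of sample values."""
    n_samples = len(next(iter(indicators.values())))
    if n_samples < 2:
        raise ValueError("at least two samples are required")
    divergence = {}
    for name, values in indicators.items():
        scaled = min_max(values, benefit[name])
        total = sum(scaled)
        if total == 0:
            entropy = 1.0                    # no dispersion, maximum entropy
        else:
            probs = [s / total for s in scaled]
            entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n_samples)
        divergence[name] = 1.0 - entropy
    total_divergence = sum(divergence.values())
    if total_divergence == 0:
        raise ValueError("all indicators are constant; weights are undefined")
    return {name: d / total_divergence for name, d in divergence.items()}


# Toy values for four sample cells; not data from the study.
toy = {
    "nighttime_light": [1.0, 2.0, 3.0, 4.0],
    "pm25": [40.0, 40.0, 40.0, 90.0],        # cost indicator: lower is better
    "green_space_density": [0.2, 0.2, 0.2, 0.2],
}
direction = {"nighttime_light": True, "pm25": False, "green_space_density": True}
weights = entropy_weights(toy, direction)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

Because the weights depend only on how dispersed each indicator is in a given year, they can shift between 2017 and 2022 even when the indicator set is unchanged. Some EWM variants add a small offset after scaling so that the minimum sample does not receive zero probability; that refinement is left out here.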
Overall, economic conditions had the highest weight in the habitat environment evaluation, with significant fluctuations in nighttime light data and housing price data, indicating the importance of economic vitality in evaluating the habitat environment quality. This is because economic development is closely related to urban infrastructure construction, and the improvement of infrastructure usually directly enhances residents’ quality of life. Nighttime light data reflect the intensity of urban economic activity. Highly concentrated commercial areas are often associated with stronger nighttime lighting, which directly impacts residents’ living and working environments. Housing price fluctuations also reflect the level of regional economic activity and residents’ living standards. High housing prices are often associated with better urban facilities and quality of life. Therefore, the high weight of economic condition indicators can also be seen as a reflection of the close connection between urban economic vitality and residents’ quality of life. Another important indicator closely related to economic conditions is urban infrastructure, including transportation convenience, distance to hospitals, and commercial areas in the index system. By comparing the data from 2017 and 2022, we found that the weight changes of transportation infrastructure and healthcare services were minimal. Regardless of urban development, the accessibility of basic social services (such as hospitals and commercial facilities) is crucial for residents’ convenience. Transportation facilities directly affect residents’ mobility and work efficiency, while the accessibility of healthcare facilities such as hospitals is directly related to residents’ health levels. Therefore, the stability and criticality of infrastructure have made these indicators’ weights remain relatively consistent between the two periods.
Despite efforts in recent years to improve air quality, air pollution remains a key factor affecting residents’ health and quality of life. The weight of PM2.5 concentration slightly decreased but remained at a high level, indicating that air quality remains an important long-term factor affecting habitat environment quality. With the progress of urbanization, especially in some informal settlements, air pollution issues may continue to affect residents’ health and living environments. Moreover, because the impact of extreme weather and climate change may persist in the long term, PM2.5 concentration remains one of the key indicators for evaluating habitat environment quality. The weight of green space density dropped from 0.143 to 0.080, indicating that the relative reduction in green spaces during urban expansion had diminished its influence on the habitat environment index evaluation.
As urbanization progresses, the reduction in green spaces might be offset by improvements in other indicators, particularly the economy and infrastructure. However, this also means that future urban development must pay more attention to the protection of green spaces and public areas to improve residents’ quality of life. The weight of urban transportation infrastructure remained stable. Improvements in infrastructure can directly enhance residents’ mobility and improve the quality of life, especially in areas with urban expansion and high-density development.
In summary, some indicators carry higher weights in the habitat environment index mainly because they are directly related to residents’ basic living conditions and quality of life. Economic activities (such as nighttime light, housing prices) and infrastructure (such as transportation, healthcare) are usually closely related to the level of urban development and residents’ quality of life, and improvements in these factors directly drive the enhancement of residents’ living standards. Although environmental quality (such as PM2.5 concentration and green space density) also plays an important role in habitat environments, its impact has changed during urbanization. Therefore, future urban planning should focus more on the coordinated development of the economy and infrastructure, while protecting and restoring environmental resources to ensure the long-term well-being of residents.
The above weight results were comprehensively evaluated to obtain the HEI, which was classified using the natural break method into four categories of habitat environment evaluation (unsuitable, lowly suitable, moderately suitable, highly suitable) and overlaid with the spatial distribution of informal settlements. As shown in Figure 9, the overall suitability of the habitat environment improved, although most informal settlements remained in lowly suitable or unsuitable areas with relatively poor living conditions. In the spatial distribution map, highly suitable areas (green regions) expanded more noticeably in the city center and southern parts, with the suitability of the urban core significantly improving, especially in 2022. This improvement was reflected not only in spatial distribution but also in the statistical data of habitat environment categories and the area proportions (as shown in Figure 10 and Table 6), confirming the enhancement of the habitat environment quality.
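The natural break classification used above can be illustrated with a small exact implementation. The sketch below uses the Fisher-Jenks dynamic programme, which chooses class boundaries that minimise the within-class sum of squared deviations. It assumes that higher HEI values mean higher suitability; the function names and sample values are hypothetical, and GIS packages may use heuristics or sampling that give slightly different breaks.

```python
from bisect import bisect_left

LABELS = ["unsuitable", "lowly suitable", "moderately suitable", "highly suitable"]


def natural_breaks(values, n_classes):
    """Return the upper bounds of all classes except the last (exact Fisher-Jenks)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any slice in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: minimum total SSD when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i
    # Walk back through the stored split points to recover the boundaries.
    bounds, j = [], n
    for k in range(n_classes, 1, -1):
        i = split[k][j]
        bounds.append(data[i - 1])
        j = i
    return sorted(bounds)


def classify(value, bounds):
    """Map an HEI value to its suitability label (a value on a bound joins the lower class)."""
    return LABELS[bisect_left(bounds, value)]


# Toy HEI values, for illustration only.
sample_hei = [0.10, 0.12, 0.15, 0.40, 0.42, 0.70, 0.72, 0.75, 0.95, 0.97]
breaks = natural_breaks(sample_hei, len(LABELS))
print(breaks)
print(classify(0.41, breaks), "|", classify(0.96, breaks))
```

Because the breaks are derived from the data, a natural break classification run separately for 2017 and 2022 yields different thresholds for each year, which is worth bearing in mind when comparing class areas across years.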
Figure 10 shows a significant improvement in Mumbai’s overall habitat environment between 2017 and 2022, particularly with the increase in highly suitable areas and the decrease in lowly suitable areas, reflecting the upgrading of the city center and some secondary centers. However, the improvement in informal settlements was relatively slow, with the proportion of unsuitable areas increasing (from 6.52% to 8.27%), indicating that residents in the city’s periphery and resource-poor areas still face significant challenges.
Despite significant improvements in some areas, unsuitable regions (marked in red) still existed and expanded in the city’s outskirts and some informal settlements. The proportion of highly suitable areas in Mumbai’s urban region increased from 33.25% to 43.42%. Although the total area slightly decreased (from 76.46 km² to 73.04 km²), the more suitable areas became more concentrated, likely due to improvements in infrastructure and the expansion of social services. A similar trend was observed in informal settlements, where the area of highly suitable regions increased from 2.52 km² to 3.47 km², indicating that some of these areas benefited from policy interventions and redevelopment projects, improving the habitat environment quality.
The unsuitable areas in the northern and southern outskirts increased in 2022 (Figure 10 and Table 6). The area of unsuitable regions in Mumbai’s urban area increased from 7.04 km² to 10.57 km², indicating that these areas still face significant challenges, especially due to weak infrastructure, poor air quality, or inconvenient transportation. Similarly, the unsuitable areas in informal settlements increased from 2.52 km² to 3.47 km², with their proportion rising from 6.52% to 8.27%. These areas may have failed to improve due to poor economic conditions, high population density, or insufficient resource allocation. Moderately suitable areas decreased overall in 2022, dropping from 92.33 km² to 73.67 km² in Mumbai’s urban area. This shift is shown in Figure 8, where some moderately suitable areas are replaced by highly suitable ones, reflecting overall improvements in the city center or secondary centers. However, some moderately suitable areas were downgraded to lowly suitable or unsuitable, particularly at the edges of informal settlements. The area of lowly suitable regions decreased from 54.11 km² to 37.04 km², indicating that some areas were upgraded to more suitable living conditions. This is also reflected in Figure 10, where the orange areas gradually shrink, and some peripheral regions are improved and transformed into green areas.
5. Discussion
5.1. Validity and Limitations of the Methodology
The study extracted informal settlements using medium-resolution data combined with seasonal indicators and found that the spatial heterogeneity of informal settlements was expressed far more strongly in textural features than in spectral features. These texture features consistently exhibited stable and significant advantages across different years and seasons, demonstrating their robustness over time. The study utilized the MDI indicator to evaluate feature importance [60], combined with hierarchical clustering to obtain optimal indicators, effectively leveraging the RF model’s feature importance evaluation capability to filter out the best feature subsets, thereby improving classification accuracy and efficiency. However, while the random forest model demonstrated high performance, the potential for overfitting during feature selection was not fully addressed. Overfitting can occur if the model becomes overly complex by retaining too many features that do not generalize well to unseen data. Although hierarchical clustering and the MDI indicator were used to optimize the feature selection process, the model’s reliance on the most informative features could still lead to overfitting, especially when there is high redundancy or correlation between features. To address this issue, future research could explore regularization techniques, such as feature pruning or cross-validation methods, to mitigate overfitting. Moreover, incorporating additional feature selection methods, such as the Boruta algorithm or ReliefF-RFE, could improve the robustness of the model by further reducing dimensionality and retaining only the most relevant features. These approaches would help prevent the model from becoming too sensitive to specific features and improve its generalizability to different spatial and temporal contexts. Choosing the right feature selection algorithm is crucial for the performance of machine learning classifiers in terms of both accuracy and simplicity [61].
The study applied an entropy-weighted Gaussian fuzzy evaluation model to assess the habitat quality in informal settlements and dynamically adjusted the weight of different factors [62] based on the data’s discreteness, adaptively reflecting the impact of different indicators on the HEI at specific time points. This method overcomes the limitations of traditional fixed-weight methods, which may not adapt well to dynamic socio-economic and environmental changes. While this analysis primarily focused on short-term spatial patterns, it provides valuable insights into the habitat quality of informal settlements. Future research could leverage multi-temporal data and long-term monitoring to track changes in habitat suitability over time. This will offer deeper insights into the evolving conditions of informal settlements, providing urban planners and policymakers with the necessary data to implement more effective, adaptive interventions. Previous studies on urban habitat environment quality have typically focused on multi-dimensional indicators to evaluate aspects such as quality of life, environmental suitability, and resilience [63] but often lack a comprehensive approach to account for dynamic changes. In contrast, our approach, by considering both objective environmental indicators and spatial gradient characteristics, offers a more adaptable framework for assessing the evolving nature of informal settlements.
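The redundancy concern raised for the feature selection step in this subsection can be made concrete. The sketch below groups correlated features by single-linkage clustering on the distance 1 − |r|, cut at a fixed threshold, and keeps the feature with the highest importance in each group. It is an illustration and not the study's procedure: the feature names, values, importance scores, and the 0.3 threshold are all hypothetical. In practice the importance scores would come from a fitted random forest, for example the `feature_importances_` attribute of a scikit-learn forest.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(features, importance, max_distance=0.3):
    """Keep the most important feature from each cluster of correlated features.

    Linking every pair with 1 - |r| <= max_distance and taking connected
    components is equivalent to cutting a single-linkage dendrogram at
    max_distance.
    """
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]   # path compression
            name = parent[name]
        return name

    for idx, first in enumerate(names):
        for second in names[idx + 1:]:
            distance = 1.0 - abs(pearson(features[first], features[second]))
            if distance <= max_distance:
                parent[find(first)] = find(second)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(max(members, key=importance.get) for members in clusters.values())


# Hypothetical feature samples and importance scores, for illustration only.
toy_features = {
    "spectral_summer": [0.10, 0.40, 0.35, 0.80, 0.60],
    "spectral_winter": [0.12, 0.38, 0.36, 0.79, 0.63],   # nearly duplicates summer
    "texture_a": [3.0, 1.0, 5.0, 2.0, 4.0],
}
toy_importance = {"spectral_summer": 0.30, "spectral_winter": 0.25, "texture_a": 0.45}

selected = select_features(toy_features, toy_importance)
print(selected)
```

Selecting one representative per cluster reduces the chance that several near-duplicate seasonal features jointly dominate the model, which is one of the overfitting routes discussed in this subsection.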
5.2. Reliability and Suitability of Data Sources
Acquiring high-resolution imagery for remote sensing is often limited by high costs, restricted coverage, and difficulties in data acquisition, particularly in large cities like Mumbai, where the timeliness and availability of such imagery often fail to meet the requirements for long-term and continuous monitoring. However, the accessibility, low cost, and open availability of Sentinel data provided a reliable and universal solution for this study. We used multi-spectral and multi-polarized Sentinel satellite data as the remote sensing data source. The combination of spectral and texture features provided rich information for extracting informal settlements, and the use of multi-seasonal data enhanced the model’s ability to identify complex surface objects. Nevertheless, while Sentinel-1 and Sentinel-2 data are effective for large-scale monitoring, they have limitations in capturing fine urban details, particularly in densely built-up areas with complex urban morphology. Sentinel-1, based on radar imaging, has difficulty distinguishing closely spaced structures, which may lead to misclassification, especially in areas with informal settlements characterized by irregular land use. Similarly, Sentinel-2’s 10 m resolution may not accurately delineate the boundaries of informal settlements, particularly in regions with mixed land use, dense vegetation, or significant shadowing. These challenges highlight the difficulty in accurately identifying informal settlements in cities with high urban complexity. To address these limitations, future research could combine medium-resolution data from Sentinel-1 and Sentinel-2 with higher-resolution commercial satellite imagery, such as WorldView or Pleiades, to refine the spatial boundaries of informal settlements and improve classification accuracy. Additionally, other methods such as OBIA or DL model-based approaches might provide better solutions to address the complexity of urban settlements.
5.3. Socioeconomic Drivers Behind the Expansion of Informal Settlements
The expansion of informal settlements from 2017 to 2022 is closely tied to socio-economic factors such as migration, economic disparity, and limited access to formal housing. Rapid urbanization and migration from rural areas have contributed to the growth of informal settlements, as many low-income individuals and families seek affordable housing. These settlements typically emerge in areas where land is cheaper, often at the urban periphery or near natural resources. Economic disparity plays a significant role, as rising land prices in city centers push low-income populations to the outskirts. The lack of affordable housing options forces many to settle in informal areas, where access to basic services and infrastructure is limited. Moreover, the absence of effective governance in these regions exacerbates the problem. Informal settlements are often left unchecked due to insufficient urban planning and government intervention. This study highlights that the growth of informal settlements is not only a consequence of economic pressures but also the result of broader governance challenges, including the lack of policies to manage urban sprawl and provide adequate infrastructure. Addressing these issues requires integrated urban planning and policies that balance urban growth with social equity and environmental sustainability. In conclusion, the expansion of informal settlements between 2017 and 2022 reflects the complex interplay of socio-economic and governance factors. Future urban planning strategies must focus on improving infrastructure and providing affordable housing to reduce the pressure on informal settlements and promote sustainable urban growth.
5.4. Relevance and Policy Implications of Evaluation Results
In the habitat environment quality evaluation indicators, the weight of economic conditions (especially nighttime light data and population density) has gained significant importance in recent years, reflecting the critical influence of urban economic vitality and population concentration on the overall habitat environment. However, the weight of social and natural conditions has changed little, indicating the continued influence of infrastructure and environmental quality on living standards. The decline in green space density highlights the balance issue between urban expansion and environmental resources. Given these findings, policymakers should adopt a more integrated approach to urban development, ensuring that economic growth, environmental sustainability, and social equity are all prioritized. The findings suggest that informal settlements in areas lacking infrastructure require urgent attention. A balanced approach is needed, with a focus on providing infrastructure in informal settlements while simultaneously protecting green spaces and promoting socio-economic equality. Moreover, inclusive governance should ensure that residents of informal settlements are involved in the planning process. This will enable urban development that is both equitable and sustainable. While the HEI in Mumbai has improved, many informal settlement residents still live in low-suitability conditions, indicating that their living conditions have not kept pace with overall urban development. Informal settlements near urban centers tend to have better access to economic opportunities and infrastructure support, while those relying on natural resources face greater risks due to inadequate infrastructure and lack of resilience to environmental shocks such as floods and pollution. Long-term adaptive interventions are necessary to track changes and guide policy adaptation, ensuring that informal settlements are gradually improved over time.
6. Conclusions
This study provides valuable insights into informal settlements in Mumbai by integrating multi-source remote sensing data and applying an entropy-weighted Gaussian fuzzy evaluation model. The research aimed to address critical gaps in understanding informal settlement dynamics and habitat quality, particularly by providing accurate baseline data for spatial distribution and environmental quality assessment. The following conclusions are drawn:
(1) The RF model, combined with MDI indicator extraction results, achieved kappa coefficients above 0.77, overall accuracy exceeding 89%, and F1 scores above 90%, demonstrating its high reliability and stability in extracting informal settlements. The study found that informal settlements expanded from 45.37 km² to 50.64 km² between 2017 and 2022, particularly in peripheral areas, reflecting ongoing urban expansion. These settlements typically expanded in two main patterns: one gradually near formal residential areas and the other around natural resources such as farmland, forests, and water bodies.
The findings highlight the growing significance of informal settlements in urban development, especially in developing countries where they emerge due to the limitations of state capacity to provide affordable housing. Informality is not a marginal phenomenon but an integral part of urbanization, shaped by both market forces and state policies. Understanding informal settlements requires considering them within broader urban governance processes and addressing underlying issues of social inequality. This study also emphasizes that informal settlements should be recognized as a norm in urban development, particularly in the context of rapid urbanization in the global south.
(2) This study underscores the importance of balanced development between urban and rural areas as a critical factor in addressing the growth of informal settlements. Viewing urban and rural areas as binary opposites exacerbates disparities and contributes to the formation of informal settlements, particularly at the urban periphery. To mitigate these challenges, both urban and rural areas require simultaneous development strategies that promote spatial equity, ensure access to basic services, and reduce the pressures on urban infrastructure and housing. Policymakers must recognize the interconnectedness of urban and rural development and implement strategies that do not favor one at the expense of the other. Urban development must be integrated with rural policies to ensure that investments in infrastructure and housing are distributed across both domains. By fostering synergy between urban and rural development, policymakers can better manage the expansion of informal settlements and promote sustainable development across regions. The findings also highlight that urbanization is not a linear process and that different countries follow distinct urbanization pathways. In developing countries, urbanization often leads to overurbanization, where land undergoes urbanization at a faster pace than the population, leading to further informal settlement growth. Addressing these challenges requires a deeper understanding of the underlying differences in urbanization trajectories and the dynamics of population movements, which often take much longer to align with land development. Additionally, urbanization’s impact on informal settlement growth requires adaptive and inclusive policies that integrate urban planning with rural development and address the long-term dynamics of population shifts and land use change.
(3) The weight results of HEI evaluation indicators from 2017 and 2022 revealed that economic conditions had the most significant impact on habitat quality, with nighttime light data and housing prices reflecting economic vitality. In contrast, the weights of social, natural, and residential conditions remained relatively stable. Over time, the overall suitability of the habitat environment improved, with highly suitable areas increasing from 33.53% to 43.42% and low-suitability areas decreasing from 3.06% to 2.49%. However, resource-poor areas saw slower improvements, with unsuitable areas rising from 6.52% to 8.27%, highlighting the ongoing challenges for residents on the outskirts. While economic factors such as economic vitality play a critical role in improving habitat quality, addressing the challenges of informal settlements requires more than just economic development. Effective urban governance must extend beyond governance effectiveness to focus on governance inclusivity, ensuring that informal settlement residents are involved in decision making and have access to essential resources and opportunities. This research provides evidence for integrating informal settlement areas into urban planning strategies and urban renewal policies, ensuring that resources are directed to areas in critical need. Urban renewal should be part of a broader policy agenda that integrates both social and economic inclusion, addressing the complex needs of marginalized communities. The findings also highlight the importance of recognizing informal settlements not as isolated phenomena but as integral parts of the urbanization process, requiring targeted interventions to ensure sustainable urban development.
This study enhances the understanding of the environmental quality of informal settlements by integrating multi-source remote sensing data and applying an entropy-weighted Gaussian fuzzy evaluation model. By providing detailed spatial and environmental data, it establishes a solid foundation for sustainable urban planning strategies, ensuring that interventions are tailored to the unique challenges of informal settlements while promoting equitable development. The findings are particularly valuable for policymakers and urban planners aiming to implement targeted interventions that address the specific needs of informal settlement areas. By identifying key regions of expansion, the research offers actionable recommendations for directing resources and infrastructure investments to the most vulnerable areas. Additionally, the study underscores the importance of integrating informal settlements into broader urban renewal and governance frameworks, contributing to the achievement of Sustainable Development Goal 11.1, which focuses on improving the living conditions of urban slums. These insights provide a pathway for improving the habitat quality within these settlements, guiding policy decisions that balance economic growth, social equity, and environmental sustainability.