1. Introduction
For decades, sustainable development has been a significant challenge for nations, as evidenced by, among other aspects, the environmental and socio-economic impacts associated with recorded population growth. In 2018, 55% of the world’s population lived in urban areas, a share that is expected to increase to 68% by 2050 [1]. The primary objective in addressing this challenge is to provide an orientation for a sustained improvement in the living conditions of a population that faces poverty, disease (associated with environmental and social determinants), and violence, among other situations. In this regard, the development and implementation of the Millennium Development Goals (MDGs) and the subsequent Sustainable Development Goals (SDGs) play an important role in determining the progress made towards achieving sustainable development.
The concept has been analyzed in different studies from different approaches [2,3,4], based on a broad spectrum of interpretations, primarily founded on the notion established in the report Our Common Future, which states “development that meets the needs of the present while considering the needs of future generations” [5] (p. 16). Notwithstanding the global nature of the term [2,3], studies primarily focus on analyzing three fundamental pillars: the environmental, social, and economic dimensions. Each dimension has its own specific challenges with respect to territorial conditions, and the dimensions must be connected and integrated with one another in order to make sustainable development achievable.
Evaluating sustainability establishes a degree of development for urban ecosystems in which natural and artificial structures interact and coexist. Ecosystem services, provided by natural systems, contribute to urban ecosystems’ sustainability through the provision of goods and services. However, environmental conditions are altered (air emissions, waste, wastewater, among others) as the result of man-made structures and urban communities.
The pillars of sustainable development are looked at from a policy context, with a view towards an interaction between ecology and society, that is, human ecology. The environmental dimension corresponds to natural resources and anthropogenic structures, while the biological community refers to the living components of ecosystems [6]. The social, economic, and institutional dimensions are part of the social system that is modified by technological infrastructure, knowledge, and social organization.
Social systems’ influence on ecosystem services impacts the environmental dimension, not just at the resources level, but also in its biological community. In this manner, these interactions have been measured through indicators, whose objective is to establish conditions for the analyzed resource, in order to make decisions about its resilience.
Population growth and the ensuing pressure on natural systems through the use and exploitation of resources create a need to understand these forms of pressure and the possible measures that can be implemented to promote achieving the Sustainable Development Goals. Knowing the variation of sustainable development in a territory, based on its behavioral pattern, is an indispensable input for planning actions and measures. Natural ecosystems are basic to human life. As such, forecasting the behavior of ecosystems, both natural and urban, can provide tools to protect human ecology.
Several studies have been developed to measure progress levels with respect to sustainability in countries and cities [2,4,7,8,9,10,11], in addition to other studies that have created inputs for forecasting nations’ sustainability levels by using machine learning tools [12,13,14,15]. These studies have established procedures for the calculation, aggregation, and comparison of indicators in different settings, and have also proposed tools that can be useful in decision-making. However, these studies have been developed mainly from a global perspective, for a comparison between the behavior of countries and cities, leaving aside a more detailed territorial level approach, which is useful for the territorial synergy required to implement the SDGs. There is a need to integrate actors in an analyzed territory, in addition to structuring a comprehensive instrument to support the development of urban sustainability processes at the local level.
Machine learning tools have been used for decades in different settings to forecast the future behavior of input information. With the generation of large volumes of data, these tools have become more useful in developing improvement strategies and analyzing sustainability from the smallest urban setting (organizations, households) to the territorial level. Therefore, it is essential to understand that achieving sustainable development depends not only on a national policy perspective but also on understanding the actions of the territories that are part of cities and regions, that is, urban micro-territories [16].
In this vein, this study seeks to establish a methodology for forecasting sustainability levels of an urban ecosystem through supervised modeling with machine learning tools. For the case study described here, the locality of Kennedy was selected, which is an urban territory in the city of Bogotá, the capital of Colombia. Kennedy has 1.2 million inhabitants and a rapidly growing population (38% growth from 1993 to 2017). Additionally, 5.3% of this population lives in multidimensional poverty, within which the health dimension (60%) is where most people are affected [17]. Kennedy is characterized by being one of the most polluted zones in Bogotá in terms of air quality, in addition to having high levels of insecurity. Several economic and service activities with contrasting environmental, social, and economic behavior interact in this urban micro-territory. The analysis period for this study was 2009–2017.
Developing the aspects contained herein is innovative in that it applies machine learning tools to a territorial analysis approach. This study analyzed the dimensions of sustainable development in a more specific territorial scope that addresses aspects such as the difficulty in accessing information, a common characteristic in Latin America. This study is pioneering as it not only includes opinions from experts and community residents in the territory, but also an analysis of complaints and requests in the context of urban needs. The territorial scope established for sustainability analysis, in the field of human ecology, is a perspective that nations need to take into account in order to achieve better results related to sustainable development goals and targets. In general, there is a lack of machine learning models that forecast the sustainability behavior of urban territories, starting at the micro-territorial level, to support national and global perspectives for informed decision-making.
This study is structured as follows: Following this introduction, a description is given of the different steps undertaken for the supervised modeling of sustainability levels. These include collecting information by evaluating sustainability levels, applying machine learning tools, and an analysis of the same according to evaluation metrics. Afterwards, the results from applying this methodology in the case study are presented, in which the conditions of the micro-urban territory were identified, along with an indicator correlation within the framework of the sustainability dimensions. The territory’s behavior over the years analyzed is presented through a categorization of sustainability levels, as well as the behavior of the machine learning models that were used. The study concludes with an analysis and discussion of the results, putting forth a suggested method to forecast sustainability levels in urban territories.
3. Results
3.1. Characterization of the Study Area
A set of 81 indicators was established to be used as inputs for the process. The table presented in the Supplementary Material describes the indicator set according to the dimension to which each indicator belongs, the intersection if the indicator is part of one (livable, equitable, viable), as well as the related sustainable development goal and target. Each indicator has an identification code, a combination of one or two letters and a number. The letter E identifies indicators belonging to the environmental dimension, the letter S identifies indicators belonging to the social dimension, the letters EC identify indicators belonging to the economic dimension, and the letter I identifies indicators of the institutional dimension.
Table 2 presents an outline of the indicator set, displaying the number of indicators according to the characteristics established for each cell.
With regard to the environmental dimension, over the analysis period, the study zone has improved in terms of its indicators on air quality, waste collection, and areas allocated for green spaces. However, domestic wastewater generated in the locality is discharged into water sources without any type of treatment. On the other hand, while some indicators behave in a relatively constant manner, the importance of their improvement is noteworthy, specifically km² of green areas and recreational spaces.
With respect to the social dimension, a substantial number of indicators (25%) are related to the subject of health, given the influence exercised by socio-environmental determinants. These indicators’ behavior does not reflect a marked upward or downward trend but responds specifically to the health determinant conditions present each year in the study area. Despite the variability, improvements are seen in indicators such as the child malnutrition rate, under-five mortality rate, all-cause infant mortality rate, and maternal mortality ratio.
Regarding the education indicators, gross education coverage decreased in 2016 and 2017 in the study area. However, the indicator behavior improved for areas such as years of schooling completed, illiteracy rate, population with middle and high school level education, and school attendance rate during the analysis period. Furthermore, with respect to population, the number of inhabitants per square kilometer has seen an upward trend, but the number of square kilometers with informal settlements has decreased, while coverage of the storm drainage system and the number of passengers transported by the mass transportation system have increased.
The study area is noted for having many security concerns, shown in indicators such as theft, aggravated robbery, and reports of domestic, family and child abuse, indicators which had a negative behavior trend during the study period.
Concerning its economic structure, a large share of the locality’s population lives under the poverty line; the highest recorded value was in 2015, with 183,966 inhabitants in this condition. This indicator decreased by nearly 10% in the final two years of the study period, during which there was also a higher risk of water shortages (on average, 171 ± 42 people). However, there was an improvement in indicators such as access to electricity (a yearly increase of nearly 2%), per capita household income, and improvements to the road network in the urban area.
Lastly, the institutional dimension is supported by policies and actions from the institutional sphere to meet the needs of the other pillars. The indicators that comprise this dimension had stable behavior during the analysis period.
As shown by the indicators, these characteristics are consistent with the frequency analysis of complaints filed by community members, in which safety concerns had high values (15% of the 46,800 written complaints analyzed). They are also consistent with the canonical correlation analysis that enabled the indicators to be combined, which is described below.
Canonical Correlation
In the correlation analysis of the 81 indicators with an annual frequency in the period 2009–2017, the comparison between environmental protection and economic growth (see Figure 3) found a relation between indicators such as PM10, PM2.5, access to public services and the unemployment rate. The upper right-hand margin of Figure 3 shows an important grouping of economic indicators. All have positive behavior, in the sense of increased per capita household income (EC5), an increase in energy consumption (EC12), and growth of the employed population (EC3), for example. In this grouping, there are environmental indicators such as the average annual concentration of PM10 (E1), the number of trees per hectare (E13), and the water quality of the Tunjuelito River (E10). Furthermore, the same quadrant includes indicators regarding PM2.5 (E2) and the road network in good condition (EC15), both with improving trends.
The second chart (Figure 3b) shows an initial grouping of indicators that measure mortality rates: All-cause infant mortality (S6), under-five mortality from pneumonia (S4), under-five mortality (S10), perinatal mortality (S18), and life expectancy at birth (S28). The air quality index (E5) is included within this set of indicators in Figure 3b. There is also a set of health indicators such as acute malnutrition in children under five (S7) and the infant death rate (S21), indicators that characterize the physical conditions of the study area such as km² of areas susceptible to flooding (S38), as well as service indicators, which include the number of passengers who commute via the mass transportation system (S35) and households with access to natural gas service (S42). Furthermore, there are education indicators such as school attendance rate (S23), average years of schooling completed (S22), and population with a middle and high school education (S26). Another social indicator in this grouping corresponds to deaths due to firearms (S31). In addition to this set, there is the average annual concentration of PM10 (E1) and closely related indicators such as the water quality of the Tunjuelito River (E10) and the number of trees per hectare (E13). This same chart shows the closeness of indicators that report excesses of PM10 (E3) and PM2.5 (E4), as well as the indicator that corresponds to the mortality rate due to cardiopulmonary disease, pulmonary circulation diseases and other forms of heart disease (S1).
Lastly, the third graph (see Figure 3c) shows a comparison between social inclusion and economic growth in which there is a correlation between indicators such as access to public services, the economically active population, and education level.
3.2. Progress Level of Sustainable Development
Applying Equations (1) to (4) (see Table 1), the sustainability categories were calculated for each analysis year in Kennedy. The locality has had low to medium sustainability levels (see Figure 4). However, the behavior in 2016 and 2017 surpassed the medium sustainability level (0.33–0.66). Moreover, the biogram presented in Figure 5 shows the behavior of the environmental, social, economic, and institutional sub-indices for the study area.
Figure 5 shows the influence of the institutional and economic dimensions, with a lag seen in the environmental pillar when compared with the other dimensions. In general, the behavior related to the SDI has improved for each dimension from 2015 to 2017.
3.3. Machine Learning Model
As mentioned in the methodological description, yearly and monthly information was used to develop the models. Each model was calibrated based on specific parameters for each machine learning tool, following the selection criteria provided by the kappa and accuracy measurements, as presented in Table 3.
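The two selection criteria named above can be illustrated with a minimal sketch, assuming that the kappa measurement refers to Cohen's kappa. Accuracy is the share of observations whose forecasted category matches the observed one, and kappa corrects that share for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The function names and the toy labels are illustrative assumptions; they are not the study's code or data.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of observations whose predicted label equals the observed label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal label frequencies, summed over labels
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is present")
    return (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical, not taken from the study)
observed_levels = ["high", "high", "medium", "medium", "low", "low"]
forecast_levels = ["high", "high", "medium", "low", "low", "medium"]
print(accuracy(observed_levels, forecast_levels))     # 4 of 6 labels match
print(cohen_kappa(observed_levels, forecast_levels))  # approximately 0.5
```

A kappa close to zero indicates that a model agrees with the observed categories no more often than chance would, which is why it is a useful complement to accuracy when categories are unevenly represented.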
By applying the models, we found that due to the limited number of observations (nine data points for each indicator), models based on yearly information turn out to be inconclusive. Given the low volume of observations entered, it was not possible to forecast sustainability levels. However, using a monthly scale increased the number of observations, which enabled a greater volume of information to be available to train and validate the models.
Table 4 presents the results for the three models developed. The labels high, medium and low correspond to the classification categories of the sustainability level assigned to the model for training and subsequent forecasting. Values with results in the 0.67–1 range belong to the high sustainability category, values with results in the 0.34–0.66 range correspond to the medium category, and values with results ranging from 0 to 0.33 belong to the low category.
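The category ranges described above can be expressed as a small labeling function. This is a minimal sketch: the function name is hypothetical, and rounding the index to two decimals before comparison is an assumption made here so that the stated ranges (0–0.33, 0.34–0.66, 0.67–1) leave no gaps.

```python
def sustainability_category(index_value: float) -> str:
    """Map a sustainability index in [0, 1] to the low / medium / high label."""
    if not 0.0 <= index_value <= 1.0:
        raise ValueError("the index must lie in [0, 1]")
    # Assumption: ranges are stated to two decimals, so round before comparing
    value = round(index_value, 2)
    if value <= 0.33:
        return "low"
    if value <= 0.66:
        return "medium"
    return "high"


print(sustainability_category(0.55))  # medium
```

For instance, the value of 0.55 reported for Bogotá in prior studies falls in the medium category under these ranges.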
Considering the multi-class model as a whole, the decision tree model yields the best metrics (see Table 4). Decision trees and neural networks were 95% and 96% accurate, respectively. The high and medium territory sustainability categories were 81% and 80% accurate, respectively. While the support vector machine was not as accurate, it performed well in the classification, with values of 79% for the high category and 70% for the medium category.
The accuracy of the low classification category indicates that neural networks and the support vector machine classify the information for this category in a random manner. Only the decision tree model reached 60% accuracy in the low classification category.
These values are consistent with the results established by the precision metric, in which the decision tree and neural network models correctly predicted 75% of the labels in the high category. According to the recall metric, 100% of the labels for this category were forecasted. With respect to the medium sustainability category, the precision metric shows that 90% of the forecasted labels were correct in the decision tree model, and according to the recall metric, 82% of the category was forecasted.
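The precision and recall metrics discussed above can be computed per category as in the following minimal sketch. Precision is the share of forecasted labels of a category that were correct, and recall is the share of observed labels of that category that were forecasted. The function name and the toy labels are illustrative assumptions; the labels are not the study's data and were chosen only to show how 75% precision can coexist with 100% recall.

```python
def precision_recall(actual, predicted, label):
    """Return (precision, recall) for one category of a multi-class forecast."""
    true_positive = sum(a == label and p == label for a, p in zip(actual, predicted))
    forecast_count = sum(p == label for p in predicted)  # labels forecasted as `label`
    observed_count = sum(a == label for a in actual)     # labels that truly are `label`
    precision = true_positive / forecast_count if forecast_count else 0.0
    recall = true_positive / observed_count if observed_count else 0.0
    return precision, recall


# Toy labels (hypothetical): every observed "high" is found, plus one false alarm
observed_levels = ["high", "high", "high", "medium", "medium", "medium", "medium", "low"]
forecast_levels = ["high", "high", "high", "high", "medium", "medium", "medium", "medium"]
print(precision_recall(observed_levels, forecast_levels, "high"))  # (0.75, 1.0)
```

In this toy case, all three observed high labels are recovered (recall of 100%), but one of the four high forecasts is wrong (precision of 75%).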
Variable Importance Based on the Gini Index
For the decision tree model, the variables with the greatest importance were: Population with access to health services (S47), residential per capita water consumption (EC16), and excess PM10 (E3) (see Figure 6). For the neural network model, the variables with the greatest importance were: Reports of violence and domestic abuse (S32), excess PM10 (E3), theft and aggravated robbery (S33), mortality rate due to pneumonia in adults older than 64 years of age (S3), and average annual concentration of PM2.5 (E2) (see Figure 6). With respect to the support vector machine model, the most influential variables that exceeded 60% importance were: Population with access to health services (S47), passengers who commute via the public mass transportation system (S35), reports of violence and domestic abuse (S32), energy consumption (EC13), average annual concentration of PM2.5 (E2), excess PM10 (E3), and residential per capita water consumption (EC16). The above can be seen in Figure 6a–c, related to each forecasted level of sustainable development.
When comparing the most influential variables in the models, the excess of PM10 variable (E3) is present in the three applied models, with the following levels of importance: 64% for ANN, 78.4% for SVM, and 37.8% for DT, for the high and medium sustainability categories (see Figure 6a,b). Additionally, its importance drops by 19 percentage points in the low category for the SVM model (see Figure 6c). While the population with access to health services variable (S47) is the most important variable in the DT and SVM models, it scores less than 30% in the ANN model. The role of the social dimension’s variables related to security stands out, given their influence on the classification of sustainability levels of the urban area.
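As background for the Gini-based importance reported in this subsection, the following minimal sketch shows the Gini impurity of a set of category labels and the impurity decrease produced by a single threshold split, which is the quantity that decision-tree importance measures commonly accumulate for each variable. The function names, the threshold, and the toy values are assumptions for illustration; the exact importance computation used in the study, particularly for the neural network and support vector machine models, is not specified in this section.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared category proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity reduction obtained by splitting one variable at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Toy data (hypothetical): an indicator that separates the two categories perfectly
indicator_values = [1.0, 2.0, 3.0, 4.0]
levels = ["high", "high", "medium", "medium"]
print(gini_decrease(indicator_values, levels, threshold=2.0))  # 0.5
```

A variable whose splits produce large impurity decreases across the tree receives a high importance score, whereas a variable that never improves the separation of categories receives a score near zero.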
4. Discussion
The canonical correlation analysis found that the urban area has different needs regarding the sustainability pillars and residents’ quality of life. This is reflected in the interactions between indicators that seemingly do not show a direct relationship, yet describe specific determinants of the micro-territory’s reality in the habitable and equitable interactions in the urban area [10].
There is an interaction between indicators such as the employed population between 12 and 64 years old (EC3), the economically active population (EC2), and indicators related to the habitable interaction, such as water quality of the Tunjuelito River (E10) and trees per hectare (E13). In addition, the analysis shows a connection between indicators regarding economic issues and those that address social characteristics in the area, in terms of education and security (theft and violence). The grouping with the canonical correlation reflects behavior as described by Tanguay (2017) [10], for each of the pillars’ interactions. Furthermore, sustainability indicators such as passengers transported (S35), aging rate (S30), households with access to water (S42), energy consumption (EC12) and acute malnutrition of children (S7) are grouped together and, despite being classified under specific issues, reflect the interaction of sustainability dimensions in the territory. With respect to these interactions, it is important to note that the priorities in evaluating and measuring urban sustainability are determined by the territorial characteristics themselves [2]. That said, it is necessary to establish a comparison line in order to identify territories’ evolution. To this end, the Sustainable Development Goals and their targets are an appropriate platform that brings together common goals.
Previous studies on the city of Bogotá have determined that the most relevant variables in the sustainable development index are poverty, crime, and unemployment [4], in which the index was calculated by applying a sustainability assessment by fuzzy evaluation. These variables are consistent with the results from this study in the complaints analysis as an input to prioritize indicators and calculate the Sustainable Development Index. However, it is considered that they should not be the only factor of interest as sustainable development is achievable only to the extent that interactions are addressed and balanced, such as the livable, viable and equitable dimensions [7,11], as shown by the canonical correlation analysis.
These indicators’ behavior establishes that the population increase in the urban area and its resulting impacts substantiate the need to advance a process of continuous feedback in order to support improving the conditions of the environmental, social, economic and institutional dimensions in territories. These are the results obtained from evaluating the Sustainable Development Index.
Kennedy is the second most populated territory in Bogotá. According to the SDI evaluation, the SDI of the urban area moved from the low to the medium category over the period 2009–2015, with values that surpassed the medium sustainability category in 2016 and 2017 (see Figure 4). Prior studies have determined that Bogotá has reached a medium sustainability level (0.55, on a 0–1 scale), ranking 88 among 106 European, African, Asian, and Latin American cities [4]. Another study that applied multivariate statistical techniques [8] identified a medium sustainability level for Kennedy. Despite the difference in the methods applied to evaluate sustainability, these studies were consistent with the results presented in this paper. Furthermore, the variation in the numerical values recorded is limited, which is in line with studies that analyzed the variation in results with respect to the methodological variation in calculating sustainability and obtained similar results even with different methodologies applied [10]. That said, it is important to note the role of indicator selection in a relevant evaluation of sustainability.
Furthermore, a comparison of the influence of a micro-territory with better socio-economic behavior than Kennedy found that the results obtained through the SDI evaluation for Kennedy in this study are consistent with results from prior studies [8]. Teusaquillo is another micro-territory in Bogotá, which, unlike Kennedy, is characterized by having greater purchasing power, more employed people, as well as having better educational, financial, cultural, and recreational services. In this vein, according to Carrillo and Toca (2013) [8], Teusaquillo achieved a high sustainable level in the evaluation. These are aspects that, despite the difference in methodologies, influence territories’ progress towards sustainability.
Moreover, it has been noted that the development and implementation of a machine learning model require enough observations to ensure adequate training and validation of its behavior. This project faced limitations associated with not having enough information. Some of the available information corresponds to specific data concerning the city of Bogotá, primarily corresponding to the periods in which surveys, reports on the implementation of government plans, or the gathering of information for specific purposes were carried out. Planning and territorial evaluation processes do not consider creating range indicators for urban sustainability dimensions at the micro-urban territory level. In the face of these limitations, the following three specific aspects stand out:
(1) Benchmarking was used to select the indicators for this study, which was carried out by examining many existing studies on these types of indicators, in addition to reviewing the framework of the SDGs to achieve congruity amongst the indicators. The analyses presented herein are consistent with those presented by Shen et al. (2011), Shen et al. (2013), and Verma et al. (2018), regarding the need to have valid objectives and targets for each territory as a clear support mechanism to evaluate progress made towards sustainability [2,3,26]. The indicators are matters of governance, but not issued by the government [8]. As such, it is necessary to develop a collection of historical data on territorial behavior, as this provides evidence of territories’ evolution and support for sustainable development processes. Furthermore, given that population is an essential component of urban activities [2], participation from interest groups and the inclusion of their needs in determining the set of indicators are necessary.
(2) The evolution of territories, as a goal of sustainable development in which human beings are the central axis of governments, requires coherence and coordination to identify, collect, and process information. Several studies use national statistics that have been published on various platforms for years prior to the implementation of the Millennium Development Goals as the basis for their information sources. Unfortunately, a clear example of the need to prioritize indicators can be seen in Latin American territories, where a greater impulse is required in information management, as demonstrated in the micro-territory analyzed in this study. It is also a mitigating circumstance for the capital city’s position in the ranking of cities with the lowest sustainability levels, according to the results from Phillis et al. (2017) [4].
(3) At the international level, proposals for forecasting sustainable development in different cities and countries have been developed using indicators with a yearly scale [12,13,15]. However, the present study was not able to yield conclusive results for this time scale. When applying DTs, as one of the simplest tools for this type of classification problem, and the SVM and ANNs, as robust tools, nine observations were not enough to properly train the models and validate their results. As stated above, 70% of the data was used for training, and 30% for behavioral validation. Therefore, using these types of tools requires large amounts of information, which prevents generalization problems and ensures the information’s quality to support decision-making. In this vein, the model for this study reduced the working scale to monthly indicators, finding that the decision trees had the best behavior, with neural networks having the potential for improvement.
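The effect of the 70/30 partition on the two time scales can be illustrated with a minimal sketch. The function name, the random shuffling, and the fixed seed are assumptions made for illustration (the section does not state how the partition was drawn), and the figure of 108 monthly observations assumes complete monthly coverage of the nine years from 2009 to 2017.

```python
import random


def train_validation_split(observations, train_fraction=0.7, seed=0):
    """Shuffle observations and split them into training and validation sets."""
    shuffled = list(observations)
    # Assumption: a seeded random shuffle; a chronological split is also possible
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


yearly = list(range(9))        # nine yearly observations per indicator
monthly = list(range(9 * 12))  # assumed 108 monthly observations

train_y, valid_y = train_validation_split(yearly)
train_m, valid_m = train_validation_split(monthly)
print(len(train_y), len(valid_y))  # 6 3
print(len(train_m), len(valid_m))  # 75 33
```

With only three yearly observations left for validation, a three-category forecast cannot be assessed reliably, whereas the monthly scale leaves several validation observations per category under the stated assumptions.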
Lastly, the method applied and structured through this study established a logical procedure that begins with identifying the most influential parameters in an urban territory and concludes with forecasting their behavior in terms of sustainable development (see Figure 7). This procedure collected experiences developed in various studies that combine community participation in the territory, the technical expertise of professionals in areas of sustainable development, and the robustness offered by machine learning tools such as decision trees, neural networks, and support vector machines. This study was innovative in that it took a methodological step forward by integrating the community affected by its government’s decisions, while including experiences from different studies and the vision of the SDGs. It also integrated different tools for decision-making, to be used for annual and statistical collection plans, as well as to manage the different resources that characterize the sustainability pillars.
Future studies should focus on the importance of having spatialized information, which enables the identification of the behavior of habitability interactions and the viability of sustainable development in different territories. This information can be used to forecast sustainability categories with machine learning tools as additional support for decision-making. Similarly, it can resolve difficulties in accessing information [2], even at the level of an urban micro-territory analysis, which was chosen for this study.
5. Conclusions
As shown in the present research, urban ecosystems include a combination of diverse micro-ecosystems, whose interaction supports economic development, yet leads to environmental damage and the deterioration or improvement of the population’s quality of life. In this manner, the continuous evaluation and forecasting of this behavior contribute to developing strategies to improve the habitability, viability, and equity of urban territories with a view towards meeting the targets established by the SDGs.
While some studies have been developed to forecast sustainable development, these have focused either on specific sustainability dimensions or on understanding countries’ evolution regarding the same. The latter are analyzed from a global perspective based on behavior in different territories. Along these lines, this study, which includes coordinating a series of procedures, contributes to the advancement of sustainability at the urban micro-territory scale. Its comprehensive method contributes to the academic and public arenas in the sense that it puts forth a tool that forecasts the category level of future sustainability in a micro-territory, such as Kennedy. It provides an opportunity to develop information-gathering strategies and action plans, as well as monitor their implementation.
This instrument stands out in the sense that it reduces the territorial and temporal scope of information, in order to have a better territorial observation and to make use of systematized tools to analyze the portfolio of governmental proposals as techniques in different fields of sustainability, thus contributing to habitability, viability, and equity interactions.
The micro-territory analyzed as a case study in this research is representative of different environmental, social, and economic conditions in Bogotá. Kennedy is one of the most populated areas of the city and one of the most polluted zones in Bogotá in terms of air quality, in addition to having high levels of insecurity. It also represents an important share of the city’s economically active population. The results from this study show consistent progress in implementing several policies and show the value of using statistical and machine learning tools to identify behavioral patterns of variables that influence the performance of micro-territories in the city, which is useful for decision-makers. Currently, decision-makers need to understand future situations regarding the implementation of current measures. Knowing which indicators influence sustainable development enables leaders to make more informed decisions.
Concerning the results of the statistical analysis and the important variables identified through the Gini index in the machine learning models, it is important to note that the latter reinforce the results from traditional methods.
This study found limitations on information availability for indicators that describe the behavior of sustainability dimensions in the micro-territory. A significant amount of information is necessary both for an appropriate characterization of each sustainability dimension and to feed the machine learning models. Therefore, the information-gathering phase required the most time and resources of this study.
Further research studies will be able to apply the methodology developed herein, in conjunction with machine learning models, to each micro-territory in Bogotá. These studies would analyze micro-territories and how sustainability dimensions and their interactions are influenced by socio-economic aspects. This will enable a comparative analysis of the behavior of micro-territories, taking into account indicators on the environmental, social, and economic dimensions, as useful tools for decision-making related to resource prioritization and allocation. Additionally, conducting research that considers spatialized information will identify the behavior of habitability interactions and the viability of sustainable development in different territories.