In this section, the application of the attribute-weighting method to a real case study is presented.
4.2. Implementation of the Case Study
The methodology used in the case study is depicted in Figure 1. To implement the proposed method and the multicriteria decision-making process, the R programming language was employed to perform the necessary calculations in all phases [40]. The FactoMineR library was chosen for its capability to facilitate multivariate data analysis, particularly Principal Component Analysis [41]. For the Entropy Method, the operations were performed manually.
Data Collection: The data used in this study were obtained from the National Population Council (CONAPO) [42]. These data underwent a comprehensive cleaning process, which included the removal of redundant records, the imputation of missing values, and the standardization of the dataset. In total, the dataset comprises 1814 instances, each representing observations associated with a different locality and characterized by nine specific attributes.
Table 2 provides a detailed summary of each attribute.
Table 3 presents an excerpt with the dataset values to facilitate a comprehensive and transparent understanding of the data used in this analysis.
Next, a data normalization technique was applied to ensure that the dataset attributes were on the same scale. The technique used was z-score normalization, in which the dataset values are transformed to have a distribution with a mean of 0 and a variance of 1. Normalization plays a crucial role because it affects the homogeneity and comparability of the attributes, essential factors for the subsequent attribute-weighting process.
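As an illustration, the z-score step can be sketched as follows. This is a minimal NumPy sketch on made-up stand-in data, not the CONAPO dataset; the authors carried out the actual calculations in R.

```python
import numpy as np

# Toy stand-in for the dataset: rows are localities, columns are attributes
# (values here are made up for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(6, 3))  # 6 localities, 3 attributes

# z-score normalization: each column is transformed to mean 0, variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.round(Z.mean(axis=0), 10))  # ~0 for every column
print(np.round(Z.std(axis=0), 10))   # ~1 for every column
```

After this step, every attribute contributes on the same scale, which is what makes the covariance-based weighting in the following steps meaningful.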
The attribute-weighting phase is the most crucial stage in the multicriteria decision-making process. To assign weights to the attributes, the proposed method, based on Principal Component Analysis and the Entropy Method, was applied as detailed in Section 3.1. The steps followed during the application of the proposed method for attribute weighting are outlined below:
The first step involves calculating the covariance matrix. The covariance matrix provides a detailed perspective on the relationship among the various attributes that make up the dataset.
Table 4 displays the values obtained from the covariance matrix.
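This first step can be sketched as follows, again as an illustrative NumPy translation on stand-in data (the study used R). With z-scored columns, the covariance matrix coincides, up to the sample-size correction, with the correlation matrix of the attributes.

```python
import numpy as np

# Stand-in data: 100 localities, 4 attributes (not the CONAPO dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score normalization step

# Covariance matrix of the standardized attributes (attributes as columns).
C = np.cov(Z, rowvar=False)

print(C.shape)  # (4, 4): one row/column per attribute
```

Each off-diagonal entry of `C` quantifies how strongly a pair of attributes varies together, which is the relationship Table 4 summarizes for the real dataset.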
From the covariance matrix, the vector of eigenvalues associated with the matrix was obtained. These eigenvalues are of paramount importance, as they encapsulate valuable information about the distribution of variance in the dataset. In other words, eigenvalues reflect the amount of variance present in each direction, indicating which dimensions of the dataset are more significant.
Table 5 displays the obtained vector of eigenvalues and the percentage of explained variance for each eigenvalue.
This step involves obtaining the matrix of eigenvectors. Each eigenvector, constituting a column in the matrix, is linked to a specific eigenvalue and provides information about the precise direction along which the greatest variability in the original data occurs. Each row of the eigenvector matrix corresponds to an attribute of the dataset. Thus, each value in the matrix reflects the relative importance of each attribute in relation to the overall variability of the data, enabling a deeper understanding of the relationships among the attributes. See Table 6 for details.
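The eigenvalue and eigenvector steps can be sketched together, since both come from the same decomposition of the covariance matrix. This is an illustrative NumPy sketch on stand-in data; `np.linalg.eigh` is appropriate because the covariance matrix is symmetric.

```python
import numpy as np

# Stand-in data: 200 localities, 5 attributes.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(Z, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix,
# with eigenpairs sorted in descending order of eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Percentage of variance explained by each eigenvalue (as in Table 5).
explained = 100 * eigvals / eigvals.sum()
print(np.round(explained, 2))
```

Each column of `eigvecs` is one eigenvector; each of its rows corresponds to an attribute, mirroring the layout described for Table 6.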
This step involves obtaining the decision matrix. To ensure a meaningful representation of the inherent variability in the data, a threshold of 95% explained variance was established; this choice justifies the selection of the first five eigenvectors. The result is a 9 × 5 decision matrix, with one row per attribute and one column per selected eigenvector. Table 7 provides a clear visualization of this matrix. The values of CP1 indicate that attributes 2, 7, and 8 have the greatest impact, while for the CP4 vector, attribute 3 has a higher impact than the rest.
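The threshold-based selection can be sketched as follows (illustrative NumPy code; because the data here are random stand-ins, the number of retained components will generally differ from the five obtained in the case study).

```python
import numpy as np

# Stand-in data: 300 localities, 9 attributes, as in the case study's shape.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 9))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the smallest number of leading components whose cumulative
# explained variance reaches the 95% threshold.
cum = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cum, 0.95) + 1)

# Decision matrix: one row per attribute, one column per retained component.
D = eigvecs[:, :k]
print(D.shape)  # (9, k)
```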
Subsequently, the values of the decision matrix were normalized to the range [0, 1], and the Entropy was calculated for each column of the normalized matrix. See Table 8.
Entropy emerges as a crucial tool that enables the assessment of information and uncertainty present in the dataset.
Table 9 displays the vector of Entropy values.
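One standard way to carry out this min-max normalization and column-wise Shannon entropy is sketched below (NumPy, stand-in data; the exact formulation used by the authors may differ in detail). Scaling by 1/ln(m), where m is the number of rows, keeps each entropy value in [0, 1].

```python
import numpy as np

# Stand-in decision matrix: 9 attributes x 5 components.
rng = np.random.default_rng(4)
D = rng.normal(size=(9, 5))

# Min-max normalization of each column into [0, 1].
N = (D - D.min(axis=0)) / (D.max(axis=0) - D.min(axis=0))

# Shannon entropy per column, treating each normalized column as a
# probability distribution; 1/ln(m) scales entropy into [0, 1].
m = N.shape[0]
P = N / N.sum(axis=0)
with np.errstate(divide="ignore", invalid="ignore"):
    terms = np.where(P > 0, P * np.log(P), 0.0)
entropy = -terms.sum(axis=0) / np.log(m)

print(np.round(entropy, 3))  # one entropy value per column
```

Columns whose values are spread uniformly have entropy close to 1 (little discriminating information), while columns with concentrated values have lower entropy.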
This process culminates in the assignment of weights to the different attributes on a scale from 0 to 1. Allocating weights on this scale provides a precise measure of the influence of each attribute, allowing a detailed interpretation of its impact in the multicriteria analysis.
Table 10 displays the weight values for each attribute and their level of importance within the dataset.
It is observed that the most relevant attribute is the one related to the percentage of the population aged 15 and over who are illiterate. Another highly relevant attribute is the one related to the percentage of households without access to drainage or toilet facilities. In contrast, the least important attribute in the dataset is the one related to the percentage of households without a refrigerator.
This stage combines the contribution of each attribute in a weighted manner, providing a comprehensive representation of the alternatives. During this phase, the exponential weighted aggregation model was implemented, as detailed in Equation (10), where x_i represents variable i and w_i is the weight associated with variable i. For the classification of the alternatives, based on the results obtained in the alternative evaluation, the Jenks method, a well-known interval classification method, was employed.
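Assuming Equation (10) takes the common weighted-product ("exponential") form, E_j = Π_i x_ji^{w_i}, the aggregation can be sketched as follows. The data and weights below are hypothetical placeholders, and the exact form of the paper's Equation (10) is given in Section 3.1.

```python
import numpy as np

# Hypothetical evaluation data: 8 alternatives x 4 attributes, values in (0, 1].
rng = np.random.default_rng(5)
X = rng.uniform(0.1, 1.0, size=(8, 4))

# Hypothetical attribute weights summing to 1 (the real ones come from Table 10).
w = np.array([0.4, 0.3, 0.2, 0.1])

# Weighted-product aggregation: each alternative's score is the product of
# its attribute values raised to the corresponding weights.
scores = np.prod(X ** w, axis=1)

print(np.round(scores, 3))  # one aggregate score per alternative
```

The resulting scores could then be cut into classes (e.g., the five marginalization levels) with an interval-classification method such as Jenks natural breaks.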
To carry out a comprehensive evaluation of the proposed method's performance, measurements were conducted using specific metrics that enable a detailed understanding of its effectiveness. In this context, the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE) were employed; their mathematical expressions are presented in Equations (11) and (12). These metrics were chosen to accurately determine the proximity of the estimated results to the values given by the marginalization index. The primary objective of this evaluation is to understand the extent to which the obtained results align with the actual values of the marginalization index.
Mean Absolute Error (MAE):
MAE = (1/n) Σ_{i=1..n} |y_i − ŷ_i|
where n is the total number of observations; y_i is the real value of observation i; ŷ_i is the estimated value of observation i.
Mean Absolute Percentage Error (MAPE):
MAPE = (100/n) Σ_{i=1..n} |y_i − ŷ_i| / |y_i|
where n is the total number of observations; y_i is the real value of observation i; ŷ_i is the estimated value of observation i.
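Both metrics can be computed directly, as in the following sketch (toy values, not the study's data).

```python
import numpy as np

# Hypothetical real and estimated marginalization values for five localities.
y_true = np.array([2.0, 1.0, 3.0, 2.0, 4.0])
y_pred = np.array([2.0, 1.0, 2.0, 2.0, 4.0])

# MAE: mean of the absolute differences.
mae = np.mean(np.abs(y_true - y_pred))

# MAPE: mean of the absolute relative differences, expressed as a percentage.
mape = 100 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

print(mae)   # 0.2
print(round(mape, 2))  # 6.67
```

Note that MAPE is undefined whenever a real value y_i is zero, which is one practical reason to report MAE alongside it.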
Based on the calculations performed, an MAE of 0.244 and an MAPE of 11.33% are obtained. These results indicate that the estimates are approximately 0.24 units away from the actual values, with a deviation of 11.3%. In a dataset comprising 1814 observations, it is noteworthy that, for 1381 of them, the absolute distance between the estimate and the reference value is zero. For another 422 observations, the distance is one, and finally, for the remaining 11 observations, the distance is two. These findings demonstrate that the estimates closely align with the actual data, supporting the effectiveness and precision of the proposed method. Additionally, it is observed that the estimated values follow a similar trend to the behavior of the marginalization index.
Figure 3 provides a graphical representation of the comparison between the estimated and actual values of the marginalization level for 100 locations.
In this phase, a sensitivity analysis was conducted to assess the robustness and stability of the obtained model. The sensitivity analysis aims to understand how the model's results vary when sensitive parameters, such as the attribute weights, are adjusted. Random numbers in the interval [0, 0.5] were generated for each attribute and added to the weights of the obtained model; this process was repeated over 10 experiments. The best result exhibited a Mean Absolute Error of 0.15 and a Mean Absolute Percentage Error of 10.03%, improving on the values from the previous stage. Specifically, approximately 77% of the data showed an absolute distance of zero between the estimated value and the reference value, 23% showed a distance of 1, and 0.3% showed a distance of 2. This analysis provides valuable information about the model's performance in different scenarios, contributing to a more detailed understanding of its behavior.
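The perturbation scheme can be sketched as follows. The base weights are hypothetical (the real values come from Table 10), and `model_error` is a stand-in for re-running the full aggregation and computing the MAE against the marginalization index.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical base attribute weights for illustration.
w = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

def model_error(weights):
    # Placeholder for re-evaluating the model with perturbed weights and
    # computing the MAE against the reference index; here a toy function.
    return float(np.abs(weights - w).sum())

errors = []
for _ in range(10):  # 10 experiments, as described in the text
    # Add a random value in [0, 0.5] to each attribute's weight.
    perturbed = w + rng.uniform(0.0, 0.5, size=w.shape)
    errors.append(model_error(perturbed))

print(min(errors))  # error of the best (lowest-error) experiment
```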
Based on the obtained results, and with the help of the Geographic Information System QGIS version 3.34.3, a cartographic representation of the localities and their respective levels of marginalization in Mexico City was produced, using a five-color scale corresponding to the marginalization levels: very low, low, medium, high, and very high (see Figure 4). This color scale makes it possible to observe which localities are more and less vulnerable based on various social aspects.