In this section, the application of the attribute-weighting method to a real case study is presented.
4.2. Implementation of the Case Study
The methodology used in the case study is depicted in Figure 1. To implement the proposed method and the multicriteria decision-making process, the R programming language was employed to perform the necessary calculations in all phases [40]. The FactoMineR library was chosen for its capability to facilitate multivariate data analysis, particularly Principal Component Analysis [41]. For the Entropy Method, the operations were performed manually.
Data Collection: The data used in this study were obtained from the National Population Council (CONAPO) [42]. These data underwent a comprehensive cleaning process, which included the removal of redundant records, the imputation of missing values, and the standardization of the dataset. In total, the dataset comprises 1814 instances, each representing observations associated with a different locality and characterized by nine specific attributes.
Table 2 provides a detailed summary of each attribute.
Table 3 presents an excerpt with the dataset values to facilitate a comprehensive and transparent understanding of the data used in this analysis.
Next, a data normalization technique was applied to ensure that the dataset attributes were on the same scale. The technique used was z-score normalization, in which the dataset values are transformed to have a distribution with a mean of 0 and a variance of 1. Normalization plays a crucial role because it affects the homogeneity and comparability of the attributes, essential factors for the subsequent attribute-weighting process.
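As an illustration, the z-score step can be sketched as follows. This is a minimal NumPy sketch on made-up stand-in data, not the CONAPO dataset; the authors carried out the actual calculations in R.

```python
import numpy as np

# Toy stand-in for the dataset: rows are localities, columns are attributes
# (values here are made up for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(6, 3))  # 6 localities, 3 attributes

# z-score normalization: each column is transformed to mean 0, variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.round(Z.mean(axis=0), 10))  # ~0 for every column
print(np.round(Z.std(axis=0), 10))   # ~1 for every column
```

After this step, every attribute contributes on the same scale, which is what makes the covariance-based weighting in the following steps meaningful.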
The attribute-weighting phase is the most crucial stage in the multicriteria decision-making process. To assign weights to the attributes, the proposed method, based on Principal Component Analysis and the Entropy Method, was applied as detailed in Section 3.1. The steps followed during the application of the proposed method for attribute weighting are outlined below:
The first step involves calculating the covariance matrix. The covariance matrix provides a detailed perspective on the relationship among the various attributes that make up the dataset.
Table 4 displays the values obtained from the covariance matrix.
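This first step can be sketched as follows, again as an illustrative NumPy translation on stand-in data (the study used R). With z-scored columns, the covariance matrix coincides, up to the sample-size correction, with the correlation matrix of the attributes.

```python
import numpy as np

# Stand-in data: 100 localities, 4 attributes (not the CONAPO dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score normalization step

# Covariance matrix of the standardized attributes (attributes as columns).
C = np.cov(Z, rowvar=False)

print(C.shape)  # (4, 4): one row/column per attribute
```

Each off-diagonal entry of `C` quantifies how strongly a pair of attributes varies together, which is the relationship Table 4 summarizes for the real dataset.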
From the covariance matrix, the vector of eigenvalues associated with the matrix was obtained. These eigenvalues are of paramount importance, as they encapsulate valuable information about the distribution of variance in the dataset. In other words, eigenvalues reflect the amount of variance present in each direction, indicating which dimensions of the dataset are more significant.
Table 5 displays the obtained vector of eigenvalues and the percentage of explained variance for each eigenvalue.
This step involves obtaining the matrix of eigenvectors. Each eigenvector, constituting a column in the matrix, is linked to a specific eigenvalue and provides information about the precise direction along which the greatest variability in the original data occurs. Each row of the eigenvector matrix corresponds to an attribute of the dataset. Thus, each value in the matrix reflects the relative importance of each attribute in relation to the overall variability of the data, enabling a deeper understanding of the relationships among the attributes. See Table 6 for details.
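The eigenvalue and eigenvector steps can be sketched together, since both come from the same decomposition of the covariance matrix. This is an illustrative NumPy sketch on stand-in data; `np.linalg.eigh` is appropriate because the covariance matrix is symmetric.

```python
import numpy as np

# Stand-in data: 200 localities, 5 attributes.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(Z, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix,
# with eigenpairs sorted in descending order of eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Percentage of variance explained by each eigenvalue (as in Table 5).
explained = 100 * eigvals / eigvals.sum()
print(np.round(explained, 2))
```

Each column of `eigvecs` is one eigenvector; each of its rows corresponds to an attribute, mirroring the layout described for Table 6.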
This step involves obtaining the decision matrix. To ensure a meaningful representation of the inherent variability in the data, a threshold of 95% explained variance was established; this choice justifies the selection of the first five eigenvectors. The result is a 9 × 5 decision matrix, with one row per attribute and one column per selected eigenvector. Table 7 provides a clear visualization of this matrix. The values of CP1 indicate that attributes 2, 7, and 8 have the greatest impact, while for the CP4 vector, attribute 3 has a higher impact than the rest.
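The threshold-based selection can be sketched as follows (illustrative NumPy code; because the data here are random stand-ins, the number of retained components will generally differ from the five obtained in the case study).

```python
import numpy as np

# Stand-in data: 300 localities, 9 attributes, as in the case study's shape.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 9))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the smallest number of leading components whose cumulative
# explained variance reaches the 95% threshold.
cum = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cum, 0.95) + 1)

# Decision matrix: one row per attribute, one column per retained component.
D = eigvecs[:, :k]
print(D.shape)  # (9, k)
```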
Subsequently, the values of the decision matrix were normalized to the range [0, 1], and the Entropy was calculated for each column of the normalized matrix. See Table 8.
Entropy emerges as a crucial tool that enables the assessment of information and uncertainty present in the dataset.
Table 9 displays the vector of Entropy values.
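One standard way to carry out this min-max normalization and column-wise Shannon entropy is sketched below (NumPy, stand-in data; the exact formulation used by the authors may differ in detail). Scaling by 1/ln(m), where m is the number of rows, keeps each entropy value in [0, 1].

```python
import numpy as np

# Stand-in decision matrix: 9 attributes x 5 components.
rng = np.random.default_rng(4)
D = rng.normal(size=(9, 5))

# Min-max normalization of each column into [0, 1].
N = (D - D.min(axis=0)) / (D.max(axis=0) - D.min(axis=0))

# Shannon entropy per column, treating each normalized column as a
# probability distribution; 1/ln(m) scales entropy into [0, 1].
m = N.shape[0]
P = N / N.sum(axis=0)
with np.errstate(divide="ignore", invalid="ignore"):
    terms = np.where(P > 0, P * np.log(P), 0.0)
entropy = -terms.sum(axis=0) / np.log(m)

print(np.round(entropy, 3))  # one entropy value per column
```

Columns whose values are spread uniformly have entropy close to 1 (little discriminating information), while columns with concentrated values have lower entropy.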
This process culminates in the assignment of weights to the different attributes on a scale from 0 to 1. Allocating weights on this scale provides a precise measure of the influence of each attribute, allowing a detailed interpretation of its impact in the multicriteria analysis.
Table 10 displays the weight values for each attribute and their level of importance within the dataset.
It is observed that the most relevant attribute is the one related to the percentage of the population aged 15 and over who are illiterate. Another highly relevant attribute is the one related to the percentage of households without access to drainage or toilet facilities. In contrast, the least important attribute in the dataset is the one related to the percentage of households without a refrigerator.
This stage combines the contribution of each attribute in a weighted manner, providing a comprehensive representation of the alternatives. During this phase, the exponential weighted aggregation model was implemented, as detailed in Equation (10), where x_i represents variable i and w_i is the weight associated with variable i. For the classification of the alternatives, based on the results obtained in the alternative evaluation, the Jenks method, a well-known interval classification method, was employed.
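Assuming Equation (10) takes the common weighted-product ("exponential") form, E_j = Π_i x_ji^{w_i}, the aggregation can be sketched as follows. The data and weights below are hypothetical placeholders, and the exact form of the paper's Equation (10) is given in Section 3.1.

```python
import numpy as np

# Hypothetical evaluation data: 8 alternatives x 4 attributes, values in (0, 1].
rng = np.random.default_rng(5)
X = rng.uniform(0.1, 1.0, size=(8, 4))

# Hypothetical attribute weights summing to 1 (the real ones come from Table 10).
w = np.array([0.4, 0.3, 0.2, 0.1])

# Weighted-product aggregation: each alternative's score is the product of
# its attribute values raised to the corresponding weights.
scores = np.prod(X ** w, axis=1)

print(np.round(scores, 3))  # one aggregate score per alternative
```

The resulting scores could then be cut into classes (e.g., the five marginalization levels) with an interval-classification method such as Jenks natural breaks.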
To carry out a comprehensive evaluation of the proposed method's performance, measurements were conducted using specific metrics that enable a detailed understanding of its effectiveness. In this context, the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE) were employed; their mathematical expressions are presented in Equations (11) and (12). These metrics were chosen to accurately determine the proximity of the estimated results to the values given by the marginalization index. The primary objective of this evaluation is to understand the extent to which the obtained results align with the actual values of the marginalization index.
Mean Absolute Error (MAE):
MAE = (1/n) Σ_{i=1..n} |y_i − ŷ_i|
where n is the total number of observations; y_i is the real value of observation i; ŷ_i is the estimated value of observation i.
Mean Absolute Percentage Error (MAPE):
MAPE = (100/n) Σ_{i=1..n} |y_i − ŷ_i| / |y_i|
where n is the total number of observations; y_i is the real value of observation i; ŷ_i is the estimated value of observation i.
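Both metrics can be computed directly, as in the following sketch (toy values, not the study's data).

```python
import numpy as np

# Hypothetical real and estimated marginalization values for five localities.
y_true = np.array([2.0, 1.0, 3.0, 2.0, 4.0])
y_pred = np.array([2.0, 1.0, 2.0, 2.0, 4.0])

# MAE: mean of the absolute differences.
mae = np.mean(np.abs(y_true - y_pred))

# MAPE: mean of the absolute relative differences, expressed as a percentage.
mape = 100 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

print(mae)   # 0.2
print(round(mape, 2))  # 6.67
```

Note that MAPE is undefined whenever a real value y_i is zero, which is one practical reason to report MAE alongside it.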
Based on the calculations performed, an MAE of 0.244 and an MAPE of 11.33% are obtained. These results indicate that the estimates are approximately 0.24 units away from the actual values, with a deviation of 11.3%. In a dataset comprising 1814 observations, it is noteworthy that, for 1381 of them, the absolute distance between the estimate and the reference value is zero. For another 422 observations, the distance is one, and finally, for the remaining 11 observations, the distance is two. These findings demonstrate that the estimates closely align with the actual data, supporting the effectiveness and precision of the proposed method. Additionally, it is observed that the estimated values follow a similar trend to the behavior of the marginalization index.
Figure 3 provides a graphical representation of the comparison between the estimated and actual values of the marginalization level for 100 locations.
In this phase, a sensitivity analysis was conducted to assess the robustness and stability of the obtained model. The sensitivity analysis aims to understand how the model's results vary when sensitive parameters, such as the attribute weights, are adjusted. Random numbers in the interval [0, 0.5] were generated for each attribute and added to the weights of the obtained model; this process was repeated over 10 experiments. The best result exhibited a Mean Absolute Error of 0.15 and a Mean Absolute Percentage Error of 10.03%, improving on the values from the previous stage. Specifically, approximately 77% of the data showed an absolute distance of zero between the estimated value and the reference value, 23% showed a distance of 1, and 0.3% showed a distance of 2. This analysis provides valuable information about the model's performance in different scenarios, contributing to a more detailed understanding of its behavior.
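The perturbation scheme can be sketched as follows. The base weights are hypothetical (the real values come from Table 10), and `model_error` is a stand-in for re-running the full aggregation and computing the MAE against the marginalization index.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical base attribute weights for illustration.
w = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

def model_error(weights):
    # Placeholder for re-evaluating the model with perturbed weights and
    # computing the MAE against the reference index; here a toy function.
    return float(np.abs(weights - w).sum())

errors = []
for _ in range(10):  # 10 experiments, as described in the text
    # Add a random value in [0, 0.5] to each attribute's weight.
    perturbed = w + rng.uniform(0.0, 0.5, size=w.shape)
    errors.append(model_error(perturbed))

print(min(errors))  # error of the best (lowest-error) experiment
```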
Based on the obtained results, and with the help of the Geographic Information System QGIS version 3.34.3, a cartographic representation of the localities and their respective levels of marginalization in Mexico City was produced, using a five-color scale corresponding to the marginalization levels: very low, low, medium, high, and very high (see Figure 4). This color scale makes it possible to observe which localities are more and less vulnerable based on various social aspects.