1. Introduction
The goal of multiple criteria decision aiding (MCDA) is to support a decision maker (DM) in choosing a subset of alternatives that are judged as the most satisfactory from the whole set, ranking alternatives from the best to the worst in a preference pre-order, or assigning all alternatives to predefined and preferentially ordered categories [1,2,3]. These types of problems are referred to as choice, ranking, and sorting, respectively [4,5]. Refs. [6,7] provided comprehensive reviews of MCDA approaches for dealing with various decision scenarios and reported new trends and developments from the last several years.
In this paper, we focus on the multiple criteria sorting (MCS) problem. The MCS problem is of major practical interest in many domains, such as supply chain management [8], ecological planning [9], renewable energy [10], and education [11]. Like the choice and ranking problems, the MCS problem requires the DM to provide preference information in either a direct or an indirect manner. In the direct case, the DM is guided by an analyst to specify the parameters of the preference model, which could be either precise values for or constraints on weights, trade-offs, category profiles, and so on [12]. Such an elicitation process is usually structured as a series of interactive sessions in which the analyst guides the DM to express preferences progressively. Thus, the willingness of the DM to cooperate with the analyst and the ability of the analyst to elicit preference information from the DM strongly influence the success of this approach [13]. In the indirect case, by contrast, the DM is required to provide a set of assignment examples on reference alternatives, which could either come from the DM's previous decisions or be specified by the DM on a subset of the alternatives under consideration. Such indirect preference information is usually exploited through preference disaggregation analysis, a general methodological framework for constructing a preference model that is as consistent as possible with the assignment examples provided by the DM [14]. The constructed preference model is then applied to new decision instances to arrive at an assignment recommendation. MCS approaches that rely on indirect preference information and the preference disaggregation paradigm are regarded as more user-friendly than methods based on direct preference information, because they reduce the cognitive effort required of the DM when providing preference information [14].
Recent developments in information technology have resulted in explosive growth in the data gathered from various fields. Ninety percent of the data in the world has been created in the last two years, at a rate of 2.5 quintillion bytes per day; this is commonly referred to as big data (see footnote 1). Retrieving useful information and knowledge from big data can enable organizations to reach better decisions, including deepening customer engagement, optimizing operations, preventing threats and fraud, and capitalizing on new sources of revenue. An estimated 89 percent of enterprises believe that those who do not take advantage of big data analytics risk losing their competitive edge in the market (see footnote 2), and thus they are gearing up to leverage their information assets to gain a competitive advantage. MCS is also gaining importance in the era of big data, because it can help firms, organizations, and governments reach better data-driven decisions. For example, in the field of finance, a rating agency can adopt MCS approaches to evaluate the credit risks of thousands of firms and assign a credit rating to each firm [15,16,17]. As another example, in the field of e-business, firms can utilize big data analytics to analyze the preferences of a large population of consumers, and then apply MCS approaches to divide a large market into segments in order to tailor marketing policies to targeted segments [18,19].
However, it is challenging for existing MCS methods to deal with problems that involve a large set of alternatives and massive preference information. In fact, most MCS methods were originally designed to address problems with several dozen alternatives; scalability to different problem sizes is not at the core of such approaches [14]. More specifically, for MCS methods based on the preference disaggregation paradigm, the construction of a preference model from a set of assignment examples usually involves linear programming (LP) and integer programming (IP) formulations. LP models are typically employed to minimize the sum or the maximum of real-valued error variables that represent the degrees of violation of preference relations. IP formulations are used to minimize the number of deviations between the actual evaluations of the DM on the reference alternatives and the recommendations of the inferred preference model [13,20]. To the best of our knowledge, all LP and IP formulations in existing MCS methods require the data to fit into operational memory. This requirement stems from the fact that most LP and IP solvers search for the optimal solution in the operational memory of a computer, so the constraint matrix must be loaded in advance. Huge amounts of data therefore exceed the processing capabilities of existing MCS methods in terms of memory consumption and/or computational time. Thus, new techniques must be introduced to redesign existing MCS methods so that they scale well with the new storage and time requirements.
In this study, we aim to propose a new MCS approach for addressing a large set of alternatives and massive preference information. The preference information refers to assignment examples on reference alternatives, which may come from past decision examples, such as historical credit ratings of firms or past customer segmentations over a certain period. In order to deal with such large-scale data sets efficiently, the proposed approach is implemented with the MapReduce framework in a parallel manner. MapReduce is a popular parallel computing paradigm designed to process large-scale data sets. It divides the original data set into disjoint splits, each of which can easily be handled by a single working node; the intermediate solutions from the different subsets are then aggregated to obtain the final outcomes. Such a computing paradigm allows us to parallelize applications on a computer cluster in a highly scalable manner.
Note that the decision scenario considered in this study differs slightly from those encountered by most traditional MCS methods. In traditional research, constructing a preference model based on the preference disaggregation paradigm is implemented as an interactive procedure in which the DM, with the assistance of the analyst, provides assignment examples incrementally and thereby calibrates the constructed preference model progressively [14]. In this study, however, we assume that the assignment examples come from past decision examples, and we aim to infer a preference model from such preference information directly and then apply the constructed model to new decision alternatives. Therefore, we need a method for constructing a preference model in an automatic way, without the participation of the DM. Such a model should not only be consistent with the provided assignment examples, but also robust in the sense that it can tolerate imperfections in the considered data sets, since we care about the predictive ability of the preference model on new decision alternatives.
In the proposed approach, we employ an additive piecewise-linear value function as the preference model and infer the model's parameters from the assignment examples using the preference disaggregation paradigm. Building on a principle of assignment consistency, we propose a convex optimization model to derive a preference model that restores the preference information as consistently as possible. This optimization model uses the Sigmoid function to measure the degree of inconsistency when the assignment of a pair of reference alternatives violates the consistency principle, and the margin to the decision boundary when the assignment accords with the consistency principle, thus ensuring that the derived value function model is both consistent and robust. In contrast to traditional MCS methods, this optimization model avoids using slack variables to specify the degree of inconsistency for each assignment example and does not translate the preference information into linear constraints of the optimization model. It is therefore especially suitable for addressing large-scale data sets, since it does not need to load a huge constraint matrix into operational memory. In order to solve the optimization model efficiently, we implement Zoutendijk's feasible direction method [21] with the MapReduce framework. The algorithm iteratively searches for the optimal solution by following a direction that is both feasible and descent. During the iterative process, the MapReduce framework is used to accelerate, in a parallel manner, the evaluation of the objective function and its gradient at the current solution.
We summarize the contributions of our work as follows. First, our work presents a new MCS approach implemented with the MapReduce framework to address a large set of alternatives and massive preference information. Although data-driven decision making has become a popular contemporary subject, no previous method can deal efficiently with such an MCS problem. Our approach displays high scalability and an ability to tackle MCS problems with a large set of alternatives and massive preference information. Second, we propose a convex optimization model to derive a preference model that addresses consistency and robustness concerns simultaneously. Differently from traditional techniques, this optimization model avoids using slack variables to specify the degree of inconsistency for each assignment example and thus contains a relatively small set of variables and linear constraints, which makes it especially suitable for addressing large-scale MCS problems. Finally, we propose a parallel method to solve the developed optimization model efficiently. This parallel method does not need to load the whole data set into main memory and thus places no specific requirements on the processing capabilities of the working nodes. It proceeds by scanning the preference information sequentially, which lends itself to a parallel implementation with the MapReduce framework to accelerate the computation. Moreover, it is robust to the splitting and combining operations in the MapReduce framework: the final results are independent of the parallel implementation.
This paper is an extended version of the paper submitted to the DA2PL 2018 conference (From Multiple Criteria Decision Aid to Preference Learning) [12]. In the original paper, we proposed the prototype of the developed scalable decision-making approach and discussed its basic properties. The extensions in this paper include: (a) a generalized formulation of the problem space, (b) a comprehensive performance evaluation across different problem settings, and (c) a formal complexity analysis of the proposed approach.
The remainder of the paper is organized as follows. Section 2 provides a literature review on MCS methods based on indirect preference information and the preference disaggregation paradigm. In Section 3, we describe the proposed approach for dealing with the MCS problem with a large set of alternatives and massive preference information, as well as its MapReduce implementation. In Section 4, we apply the proposed approach to a real-world data set and compare the results to those of the UTADIS method. Section 5 presents and discusses the experimental results of applying the proposed approach to artificially generated data. Section 6 concludes the paper and discusses future research.
2. Literature Review
The MCS problem has attracted significant interest over the past decade, and various MCS approaches have been proposed in the literature [22,23]. According to the employed preference model, which reflects the value system of the DM, MCS approaches in the literature can be categorized into three types: (1) methods based on value functions, such as the UTADIS method and its variants [5,24,25,26,27,28]; (2) methods based on outranking relations, such as the ELECTRE TRI-B methods [29,30,31,32], the ELECTRE TRI-C methods [33,34], other ELECTRE-based methods [35,36], and PROMETHEE-based methods [36,37]; and (3) rule induction-oriented procedures, such as the DRSA method and its extensions [38,39].
Because the approach proposed in this paper belongs to the class of methods based on indirect preference information and the preference disaggregation paradigm, we review the development of this family of MCS methods below. First, we focus on value-driven sorting methods. The earliest study in this line is the UTADIS method [24,40]. Based on a set of assignment examples provided by the DM, UTADIS estimates an additive value function, as well as comprehensive value thresholds separating consecutive categories, with the minimum misclassification error. For each assignment example on a particular reference alternative, UTADIS introduces two slack variables to measure the differences between the reference alternative's comprehensive value and the upper and lower value thresholds that delimit the corresponding category. UTADIS then organizes all assignment examples as linear constraints and uses the LP technique to minimize the sum of the slack variables over all assignment examples. The constructed preference model with the minimal sum of misclassification errors is applied to assign a non-reference alternative by comparing its comprehensive value to the estimated value thresholds. Another representative MCS method in this line is the MHDIS method [24]. Differently from UTADIS, the MHDIS method employs a sequential/hierarchical process to perform the assignment decision. It uses several LP and IP models to construct a value function model from assignment examples by minimizing the total number of misclassifications and maximizing the clarity of the correct classifications. Over the last decade, the methodology of Robust Ordinal Regression (ROR) has been prevailing in the MCDA community. The ROR methods for sorting problems include UTADIS^GMS [5], UTADIS^GMS-GROUP [41], and MCHP for sorting [42]. These methods consider the whole set of compatible instances of value functions and derive necessary and possible assignments for a non-reference alternative. Since the necessary and possible assignments are obtained from the whole set of compatible instances of value functions, a consistent set of assignment examples must be obtained from the DM. In case of inconsistency, it is usual to identify the inconsistent subset of assignment examples using IP formulations and to ask the DM to revise or remove them so as to restore consistency.
The second stream of MCS methods based on indirect preference information and the preference disaggregation paradigm comprises the ones based on outranking relations. Ref. [43] first applied preference disaggregation analysis to infer the parameters (including criteria weights, category profiles, and thresholds) of the ELECTRE TRI-B method [29] from given assignment examples. One can refer to [30,31,44,45,46] for more methods utilizing various techniques to infer the parameters of ELECTRE TRI-B from assignment examples. In addition, Ref. [47] reformulated the ELECTRE TRI-C methods [33,34] and developed a disaggregation method for estimating a compatible outranking model from the provided preference information. Apart from the preference disaggregation approaches based on ELECTRE TRI-B and ELECTRE TRI-C, Ref. [35] proposed an assignment principle for outranking-based sorting and presented a disaggregation procedure to compare non-reference alternatives to reference ones and to assign them to categories accordingly. Ref. [36] extended the work of Ref. [35] and developed a method for robustness analysis within the framework of ROR. Due to the relational nature of outranking relations and the involvement of several thresholds for each criterion, the above outranking-based preference disaggregation methods utilize integer/non-linear programming techniques or evolutionary algorithms to infer the parameters of an outranking model from the given preference information.
When it comes to rule induction-oriented preference disaggregation procedures for sorting problems, let us mention the DRSA method [38], which infers a preference model composed of a set of "if…, then…" decision rules from the assignment examples. DRSA utilizes a dominance relation for the rough approximation of categories and applies a rule induction strategy for representing the knowledge underlying the assignment examples and for performing the assignment of non-reference alternatives. Ref. [39] adapted the methodology of ROR to the rule-based preference model and provided, for each non-reference alternative, the necessary and possible assignments. Note that the complexity of generating decision rules from assignment examples is exponential, so a considerable amount of computational time and operational memory is needed [39].
To sum up, the above MCS methods exhibit two inherent features that distinguish them from other data analysis techniques (e.g., machine learning). The first is that the number of alternatives in MCS is usually limited to several dozen. The optimization techniques (including LP, IP, evolutionary algorithms, and so on) used in these methods can work out the optimal solutions in a reasonable time for small data sets; thus, computational efficiency is not a concern of such MCS methods. The second feature is the active participation of the DM. In most of these MCS methods, constructing a preference model is organized as interactive sessions in which the DM expresses their preferences in the form of decision examples in order to progressively calibrate a model that fits the DM's preferences. However, recall that the decision scenario considered in this study is different from those encountered by traditional MCS methods. When facing a large set of alternatives and massive preference information, traditional MCS methods cannot address the problem properly due to limitations of computational time and operational memory. Moreover, without the participation of the DM, it is difficult for traditional MCS methods to derive a preference model that both fits the decision examples well and has good predictive ability on new decision alternatives. Thus, it is necessary to propose a new MCS approach that deals with large-scale data sets and constructs a preference model from assignment examples automatically, without the participation of the DM.
3. Proposed Approach
3.1. Problem Description
The aim of this study is to classify a finite set of $m$ alternatives $A = \{a_1, a_2, \dots, a_m\}$ into $p$ preferentially ordered categories $C_1, C_2, \dots, C_p$, such that $C_h$ is preferred to $C_{h-1}$ (denoted by $C_h \succ C_{h-1}$), $h = 2, \dots, p$. Each alternative from set $A$ has to be assigned to exactly one category. Such a classification decision is built on preference information provided by the DM, which involves a set of assignment examples concerning a finite set of reference alternatives $A^R = \{a^R_1, a^R_2, \dots, a^R_{m^R}\}$. Let $m^R$ be the cardinality of the reference set $A^R$. An assignment example corresponds to the specified assignment of a reference alternative $a^R \in A^R$ to a category $C_h$. All the alternatives are evaluated in terms of $n$ criteria $g_1, g_2, \dots, g_n$. The performance of $a$ on $g_j$, $j = 1, \dots, n$, is denoted by $g_j(a)$. Without loss of generality, all the criteria are assumed to have a monotonically increasing direction of preference, i.e., the greater $g_j(a)$, the more preferred $a$ is on $g_j$, $j = 1, \dots, n$ (cost-type criteria can be handled by reversing their scales).
To perform the assignment of an alternative $a \in A$, we use as the preference model an additive value function $U$ of the following form:

$$U(a) = \sum_{j=1}^{n} u_j(g_j(a)),$$

where $U(a)$ is the comprehensive value of $a$, and $u_j(\cdot)$, $j = 1, \dots, n$, are the marginal value functions for the individual criteria. Additive value functions are the most common preference model in MCDA due to their intuitive interpretation and relatively easy computation, despite the underlying assumptions of preferential independence of criteria and compensation between criteria [48]. In many practical applications, this assumption is commonly adopted because it significantly reduces the complexity of the decision model. For instance, the UTADIS method, a well-established approach to multiple criteria sorting, also relies on this assumption to construct additive value functions. Our choice of an additive piecewise-linear value function is primarily motivated by the need to balance model complexity with computational efficiency, especially when dealing with large-scale data sets.
In this paper, a piecewise-linear function is used to estimate the actual value function $u_j$ of each criterion $g_j$, $j = 1, \dots, n$ (see Figure 1). Let $[\alpha_j, \beta_j]$ denote the evaluation scale of criterion $g_j$, such that $\alpha_j$ and $\beta_j$ are the minimal and maximal evaluations, respectively. In defining the piecewise-linear form of the marginal value function $u_j$, $[\alpha_j, \beta_j]$ is divided into $\gamma_j$ equal sub-intervals $[x_j^0, x_j^1], [x_j^1, x_j^2], \dots, [x_j^{\gamma_j - 1}, x_j^{\gamma_j}]$, where $x_j^t = \alpha_j + \frac{t}{\gamma_j}(\beta_j - \alpha_j)$, $t = 0, 1, \dots, \gamma_j$. Then, we can use linear interpolation to estimate marginal values: for $g_j(a) \in [x_j^{t-1}, x_j^t]$,

$$u_j(g_j(a)) = u_j(x_j^{t-1}) + \frac{g_j(a) - x_j^{t-1}}{x_j^t - x_j^{t-1}} \left( u_j(x_j^t) - u_j(x_j^{t-1}) \right). \quad (1)$$

According to (1), we only need to determine the marginal values $u_j(x_j^t)$ at the characteristic points for the piecewise-linear value function $u_j$ to be fully specified. Note that a piecewise-linear value function with a sufficient number of characteristic points can approximate any non-linear value function, which enhances the appropriateness of this model for a wide range of applications.
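To make the interpolation in (1) concrete, here is a minimal Python sketch (our illustration, not the authors' code) that evaluates a piecewise-linear marginal value function from its values at the characteristic points:

```python
import numpy as np

def marginal_value(g_a, alpha, beta, gamma, u_points):
    """Evaluate a piecewise-linear marginal value function at performance g_a.

    alpha, beta : bounds of the evaluation scale [alpha, beta]
    gamma       : number of equal sub-intervals
    u_points    : array of gamma + 1 marginal values at the characteristic points
    """
    x = np.linspace(alpha, beta, gamma + 1)          # characteristic points x^0 .. x^gamma
    t = min(max(np.searchsorted(x, g_a), 1), gamma)  # sub-interval containing g_a
    frac = (g_a - x[t - 1]) / (x[t] - x[t - 1])      # position within [x^{t-1}, x^t]
    return u_points[t - 1] + frac * (u_points[t] - u_points[t - 1])

# Example: three sub-intervals on the scale [0, 100]
print(marginal_value(55.0, 0.0, 100.0, 3, np.array([0.0, 0.1, 0.25, 0.4])))
```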
Then, we introduce new variables $\Delta u_j^t$ to denote the differences between the marginal values at consecutive characteristic points:

$$\Delta u_j^t = u_j(x_j^t) - u_j(x_j^{t-1}), \quad t = 1, \dots, \gamma_j, \; j = 1, \dots, n.$$

Therefore, with the use of $\Delta u_j^t$ and setting $u_j(x_j^0) = 0$, $u_j(g_j(a))$ can be formulated in the following way:

$$u_j(g_j(a)) = \sum_{s=1}^{t-1} \Delta u_j^s + \frac{g_j(a) - x_j^{t-1}}{x_j^t - x_j^{t-1}} \Delta u_j^t, \quad \text{for } g_j(a) \in [x_j^{t-1}, x_j^t].$$

Definition 1 ([49]). The characteristic vector of alternative $a$ is a column vector $V(a)$ whose entries are the coefficients of the variables $\Delta u_j^t$ in the marginal values on each criterion $g_j$, $j = 1, \dots, n$:

$$V(a)_{(j,t)} = \begin{cases} 1, & \text{if } g_j(a) \ge x_j^t, \\ \dfrac{g_j(a) - x_j^{t-1}}{x_j^t - x_j^{t-1}}, & \text{if } x_j^{t-1} \le g_j(a) < x_j^t, \\ 0, & \text{otherwise}. \end{cases}$$

Having defined $V(a)$, we can compute the comprehensive value $U(a)$ as follows:

$$U(a) = \mathbf{w}^T V(a),$$

where $\mathbf{w} = (\Delta u_1^1, \dots, \Delta u_1^{\gamma_1}, \dots, \Delta u_n^1, \dots, \Delta u_n^{\gamma_n})^T$. To normalize the value of alternative $a$ such that $U(a) \in [0, 1]$, we can set

$$\mathbf{e}^T \mathbf{w} = 1, \quad \mathbf{w} \ge \mathbf{0},$$

where $\mathbf{e}$ is a column vector whose entries are all equal to 1. In this case, the trade-off weight of each criterion $g_j$ can be retrieved as $w_j = \sum_{t=1}^{\gamma_j} \Delta u_j^t$, $j = 1, \dots, n$.
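As a small illustration of Definition 1 under the notation above (the helper names are ours), the following sketch builds the characteristic vector $V(a)$ and evaluates $U(a) = \mathbf{w}^T V(a)$:

```python
import numpy as np

def characteristic_vector(perf, alphas, betas, gammas):
    """Stack the per-criterion coefficient patterns of Definition 1 into one vector."""
    entries = []
    for g_a, alpha, beta, gamma in zip(perf, alphas, betas, gammas):
        x = np.linspace(alpha, beta, gamma + 1)
        for t in range(1, gamma + 1):
            if g_a >= x[t]:
                entries.append(1.0)                                   # full increment counted
            elif g_a >= x[t - 1]:
                entries.append((g_a - x[t - 1]) / (x[t] - x[t - 1]))  # partial increment
            else:
                entries.append(0.0)                                   # increment not reached
    return np.array(entries)

# Two criteria on [0, 1], each with 2 sub-intervals; w holds the increments Delta u
perf = [0.75, 0.30]
V = characteristic_vector(perf, alphas=[0, 0], betas=[1, 1], gammas=[2, 2])
w = np.array([0.2, 0.3, 0.4, 0.1])    # normalized: sums to 1, non-negative
print(V, float(w @ V))                # comprehensive value U(a)
```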
3.2. Model for Estimating Value Function
Definition 2. Consider any pair of reference alternatives $a$ and $b$ assigned to categories $C_h$ and $C_k$, respectively (denoted by $a \to C_h$ and $b \to C_k$). A given value function $U$ is said to be consistent with the assignment of $a$ and $b$ if and only if

$$C_h \succsim C_k \Rightarrow U(a) \ge U(b), \quad (2)$$
$$C_h \precsim C_k \Rightarrow U(a) \le U(b), \quad (3)$$

where ≿ and ≾ mean "at least as good as" and "at most as good as", respectively. Observe that (2) and (3) are equivalent to

$$C_h \succsim C_k \Rightarrow \mathbf{w}^T (V(a) - V(b)) \ge 0 \quad \text{and} \quad C_h \precsim C_k \Rightarrow \mathbf{w}^T (V(a) - V(b)) \le 0,$$

since $U(a) = \mathbf{w}^T V(a)$ and $U(b) = \mathbf{w}^T V(b)$.

For many MCS problems, we cannot find a value function consistent with the whole set of reference alternatives because of inconsistent assignment examples. In this case, traditional methods based on the preference disaggregation paradigm introduce a set of slack variables specifying how much inconsistency there is for any pair of reference alternatives that violates the above consistency principle, and then minimize the sum of all the slack variables by solving an LP model, so as to derive a value function that is as consistent with the preference information as possible. However, for a large-scale problem containing a huge number of reference alternatives, the number of linear constraints and the number of slack variables in the objective of the LP model become very large, which exceeds the processing capabilities of most LP solvers. In this paper, differently from traditional methods, we propose a new model to estimate an additive value function, as follows.
First, let us introduce a set of indicators $r_{ab}$ for any pair of reference alternatives $(a, b)$ such that $a \to C_h$, $b \to C_k$, and $C_h \neq C_k$, which are defined as

$$r_{ab} = \begin{cases} 1, & \text{if } C_h \succ C_k, \\ 0, & \text{if } C_h \prec C_k. \end{cases}$$

According to the above consistency principle for assignment, we aim to find a vector $\mathbf{w}$ such that, for any pair of reference alternatives $(a, b)$ with $C_h \neq C_k$, we have

$$\mathbf{w}^T (V(a) - V(b)) \ge 0 \text{ if } r_{ab} = 1, \quad \text{and} \quad \mathbf{w}^T (V(a) - V(b)) \le 0 \text{ if } r_{ab} = 0.$$

Then, instead of using slack variables, we can transform $\mathbf{w}^T (V(a) - V(b))$ into a value $\phi(\mathbf{w}^T (V(a) - V(b)))$ for any pair of reference alternatives $(a, b)$, so that we can use the difference between $\phi(\mathbf{w}^T (V(a) - V(b)))$ and $r_{ab}$ to measure the inconsistency. The function $\phi(x)$ should satisfy the following conditions:

(a) $\phi(x)$ is monotone and increasing with respect to $x$;
(b) $\phi(x)$ is bounded within the interval $[0, 1]$;
(c) $\phi(x) = 0.5$ if $x = 0$; $\phi(x) > 0.5$ if $x > 0$; and $\phi(x) < 0.5$ if $x < 0$.

In this way, we can use the difference between $\phi(\mathbf{w}^T (V(a) - V(b)))$ and $r_{ab}$ to measure (a) how much inconsistency there is for any pair of reference alternatives $a$ and $b$ that violates the consistency principle and (b) how large the margin to the decision boundary is when the assignment of $(a, b)$ accords with the consistency principle. Specifically, for the case $r_{ab} = 1$, if an inconsistency occurs (i.e., $\mathbf{w}^T (V(a) - V(b)) < 0$), the difference between $\phi(\mathbf{w}^T (V(a) - V(b)))$ and $r_{ab}$ is greater than 0.5 because $\phi(\mathbf{w}^T (V(a) - V(b))) < 0.5$; the greater this difference, the larger the degree of inconsistency for the assignment of $a$ and $b$. On the contrary, if the assignment of $(a, b)$ accords with the consistency principle (i.e., $\mathbf{w}^T (V(a) - V(b)) \ge 0$), the difference between $\phi(\mathbf{w}^T (V(a) - V(b)))$ and $r_{ab}$ is smaller than 0.5 because $\phi(\mathbf{w}^T (V(a) - V(b))) \ge 0.5$; the smaller this difference, the larger the margin to the decision boundary for the assignment of $a$ and $b$. The case $r_{ab} = 0$ can be analyzed analogously. Therefore, we propose to minimize the difference between $\phi(\mathbf{w}^T (V(a) - V(b)))$ and $r_{ab}$, so as to derive a value function that not only minimizes the degree of inconsistency when the assignment of $(a, b)$ violates the consistency principle (consistency), but also maximizes the margin to the decision boundary when the assignment of $(a, b)$ accords with the consistency principle (robustness), so that the derived preference model will tolerate imperfections in the considered data set.

In this paper, we use the following Sigmoid function to instantiate the function $\phi$:

$$\phi(x) = \frac{1}{1 + e^{-x}}.$$

The Sigmoid function satisfies all the above requirements on $\phi$. Then, we can consider the following non-linear optimization model to derive a value function that is as consistent and robust as possible:

$$\max_{\mathbf{w}} \;\; \prod_{(a,b): C_h \neq C_k} \phi(\mathbf{w}^T (V(a) - V(b)))^{r_{ab}} \left( 1 - \phi(\mathbf{w}^T (V(a) - V(b))) \right)^{1 - r_{ab}} \quad (8)$$

$$\text{s.t.} \;\; \mathbf{e}^T \mathbf{w} = 1, \quad \mathbf{w} \ge \mathbf{0}. \quad (9)$$
For any pair of reference alternatives $(a, b)$, when $r_{ab} = 1$, the above optimization model aims to maximize $\phi(\mathbf{w}^T (V(a) - V(b)))$ so as to minimize the difference between $\phi(\mathbf{w}^T (V(a) - V(b)))$ and $r_{ab}$, and it neglects the term $(1 - \phi(\mathbf{w}^T (V(a) - V(b))))^{1 - r_{ab}}$ because $1 - r_{ab} = 0$. On the contrary, when $r_{ab} = 0$, the optimization model attempts to maximize $1 - \phi(\mathbf{w}^T (V(a) - V(b)))$ so as to minimize the difference between $\phi(\mathbf{w}^T (V(a) - V(b)))$ and $r_{ab}$, and it neglects the term $\phi(\mathbf{w}^T (V(a) - V(b)))^{r_{ab}}$ because $r_{ab} = 0$. Note that the objective (8) is in multiplicative rather than additive form, because the former avoids compensation between the different terms in the objective, which is especially useful when the number of pairs of reference alternatives is large. The most important benefit of choosing the Sigmoid function is that the resulting optimization model (8)–(9) is a convex optimization problem, since (the negative logarithm of) the objective (8) is convex, as proved in [50]. This inspires us to extend classical convex optimization algorithms to address the problem. Note that an objective of the form (8) is also used in logistic regression, a machine learning model for binary classification in the probabilistic framework [50]. One can observe that the optimization model (8)–(9) contains only a small number of linear constraints, and all the preference information is involved in the objective (8). In this way, we avoid loading a huge constraint matrix into operational memory.
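Since all preference information enters through the objective, evaluating it reduces to a single pass over the pairwise data. A minimal sketch (assuming the pairs are pre-processed into a matrix `D` of difference vectors $V(a) - V(b)$ and a vector `r` of indicators; the names are ours) computes the negative log of (8) and its gradient:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective_and_gradient(w, D, r):
    """Negative log of objective (8) and its gradient.

    w : (q,) current weight vector (increments Delta u)
    D : (P, q) rows are difference vectors V(a) - V(b) over all pairs
    r : (P,) indicators r_ab in {0, 1}
    """
    z = D @ w
    p = sigmoid(z)
    eps = 1e-12                                   # guard against log(0)
    F = -np.sum(r * np.log(p + eps) + (1 - r) * np.log(1 - p + eps))
    grad = D.T @ (p - r)                          # standard logistic-loss gradient
    return F, grad
```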
3.3. Algorithm Based on the MapReduce Framework
In order to solve the non-linear optimization model (8)–(9) for large-scale problems efficiently, we propose a new parallel implementation of Zoutendijk's feasible direction method [21] based on the MapReduce framework. Zoutendijk's feasible direction method is an algorithm for solving constrained optimization problems; given a current feasible solution, it iteratively searches for a direction that is both feasible and descent until the optimal solution is found. Although the problem (8)–(9) contains only a small set of linear constraints, the objective (8) involves the whole set of pairs of reference alternatives $(a, b)$ with $C_h \neq C_k$, whose size is quite large. Thus, we propose to utilize the MapReduce framework to accelerate the iterative search procedure of Zoutendijk's feasible direction method in a parallel manner, so that it can address large-scale MCS problems.
3.3.1. Zoutendijk’s Feasible Direction Method for Solving the Optimization Problem
By taking the logarithm of (8), the non-linear optimization problem (8)–(9) can be organized as follows:

$$\max_{\mathbf{w}} \;\; \sum_{(a,b): C_h \neq C_k} \left[ r_{ab} \log \phi(\mathbf{w}^T (V(a) - V(b))) + (1 - r_{ab}) \log \left( 1 - \phi(\mathbf{w}^T (V(a) - V(b))) \right) \right] \quad (10)$$

$$\text{s.t.} \;\; \mathbf{e}^T \mathbf{w} = 1, \quad \mathbf{w} \ge \mathbf{0}. \quad (11)$$

Then, the model (10)–(11) can be reformulated as the following minimization problem:

$$\min_{\mathbf{w}} \;\; F(\mathbf{w}) = -\sum_{(a,b): C_h \neq C_k} \left[ r_{ab} \log \phi(\mathbf{w}^T (V(a) - V(b))) + (1 - r_{ab}) \log \left( 1 - \phi(\mathbf{w}^T (V(a) - V(b))) \right) \right] \quad (12)$$

$$\text{s.t.} \;\; \mathbf{e}^T \mathbf{w} = 1, \quad \mathbf{w} \ge \mathbf{0}. \quad (13)$$
In order to apply the Zoutendijk’s feasible direction method, we need to define the feasible direction and the descent direction for a feasible solution at first.
Definition 3 ([51]). Let $\bar{\mathbf{w}}$ be a feasible solution of the model (12)–(13). A vector $\mathbf{d}$ is said to be a feasible direction at $\bar{\mathbf{w}}$ if there exists a positive number δ such that $\bar{\mathbf{w}} + \lambda \mathbf{d}$ is a feasible solution for any scalar $\lambda \in (0, \delta)$.

Proposition 1. Let $\bar{\mathbf{w}}$ be a feasible solution of the model (12)–(13), and let $J_0 = \{ j : \bar{w}_j = 0 \}$ and $J_+ = \{ j : \bar{w}_j > 0 \}$. A vector $\mathbf{d}$ is a feasible direction at $\bar{\mathbf{w}}$ if and only if $\mathbf{e}^T \mathbf{d} = 0$ and $d_j \ge 0$ for $j \in J_0$.

Proof. On the one hand, suppose that $\mathbf{d}$ is a feasible direction. According to the definition of a feasible direction, there exists a positive scalar $\lambda$ for which $\bar{\mathbf{w}} + \lambda \mathbf{d}$ is a feasible solution, i.e., $\mathbf{e}^T (\bar{\mathbf{w}} + \lambda \mathbf{d}) = 1$ and $\bar{\mathbf{w}} + \lambda \mathbf{d} \ge \mathbf{0}$. Because $\bar{\mathbf{w}}$ is a feasible solution, we have $\mathbf{e}^T \bar{\mathbf{w}} = 1$, $\bar{w}_j = 0$ for $j \in J_0$, and $\bar{w}_j > 0$ for $j \in J_+$. Therefore, we have $\mathbf{e}^T \mathbf{d} = 0$ and $d_j \ge 0$ for $j \in J_0$. On the other hand, assume that $\mathbf{e}^T \mathbf{d} = 0$ and $d_j \ge 0$ for $j \in J_0$. Because $\bar{w}_j > 0$ for $j \in J_+$, there exists a positive number δ such that $\bar{w}_j + \lambda d_j \ge 0$, $j \in J_+$, for any scalar $\lambda \in (0, \delta)$. Moreover, as $d_j \ge 0$ for $j \in J_0$, we have $\bar{w}_j + \lambda d_j \ge 0$ for $j \in J_0$. Thus, $\bar{\mathbf{w}} + \lambda \mathbf{d} \ge \mathbf{0}$. Additionally, since $\mathbf{e}^T \bar{\mathbf{w}} = 1$ and $\mathbf{e}^T \mathbf{d} = 0$, it must be that $\mathbf{e}^T (\bar{\mathbf{w}} + \lambda \mathbf{d}) = 1$. Therefore, $\bar{\mathbf{w}} + \lambda \mathbf{d}$ is a feasible solution and $\mathbf{d}$ is a feasible direction. □
Definition 4 ([21]). Let $\bar{\mathbf{w}}$ be a feasible solution of the model (12)–(13). A vector $\mathbf{d}$ is said to be a descent direction at $\bar{\mathbf{w}}$ if there exists a positive number δ such that $F(\bar{\mathbf{w}} + \lambda \mathbf{d}) < F(\bar{\mathbf{w}})$ for any scalar $\lambda \in (0, \delta)$.

Zoutendijk's feasible direction method iteratively searches for a direction $\mathbf{d}$ that is both feasible and descent given a current feasible solution $\bar{\mathbf{w}}$. According to Definition 4 and Proposition 1, searching for such a direction $\mathbf{d}$ can be addressed by solving the following LP model:

$$\min_{z, \mathbf{d}} \;\; z \quad (14)$$
$$\text{s.t.} \;\; \nabla F(\bar{\mathbf{w}})^T \mathbf{d} \le z, \quad (15)$$
$$\mathbf{e}^T \mathbf{d} = 0, \quad d_j \ge 0 \text{ for } j \in J_0, \quad (16)$$
$$-1 \le d_j \le 1 \text{ for all } j, \quad (17)$$

where (17) guarantees that a bounded solution is derived. Obviously, $(z, \mathbf{d}) = (0, \mathbf{0})$ is a feasible solution of the LP model (14)–(17); thus, $z$ at the optimum must not be greater than zero. If $z < 0$ at the optimum, then $\mathbf{d}$ is a direction that is both feasible and descent; otherwise, $\bar{\mathbf{w}}$ is the global optimal solution of the model (12)–(13), as proved by the following proposition.
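As an illustration of the direction-finding step, the LP (14)–(17) can be solved with any LP solver; the sketch below uses scipy.optimize.linprog and assumes `grad` holds $\nabla F(\bar{\mathbf{w}})$ and `J0` the active index set $\{j : \bar{w}_j = 0\}$ (a sketch, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import linprog

def find_direction(grad, J0):
    """Solve LP (14)-(17): min z s.t. grad^T d <= z, e^T d = 0,
    d_j >= 0 for j in J0, -1 <= d_j <= 1."""
    q = len(grad)
    # Decision variables: [z, d_1, ..., d_q]; objective: minimize z
    c = np.zeros(q + 1); c[0] = 1.0
    # grad^T d - z <= 0
    A_ub = np.concatenate(([-1.0], grad)).reshape(1, -1)
    b_ub = np.array([0.0])
    # e^T d = 0
    A_eq = np.concatenate(([0.0], np.ones(q))).reshape(1, -1)
    b_eq = np.array([0.0])
    bounds = [(None, None)] + [((0.0, 1.0) if j in J0 else (-1.0, 1.0)) for j in range(q)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    z, d = res.x[0], res.x[1:]
    return z, d   # z < 0 => d is a feasible descent direction
```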
Proposition 2. Let $\bar{\mathbf{w}}$ be a feasible solution of the model (12)–(13), and let $J_0 = \{ j : \bar{w}_j = 0 \}$ and $J_+ = \{ j : \bar{w}_j > 0 \}$. Then, $\bar{\mathbf{w}}$ is the global optimal solution of the model (12)–(13) if and only if, for the LP model (14)–(17), the objective $z$ at the optimum is equal to zero.

Proof. For the model (12)–(13), according to the Karush–Kuhn–Tucker conditions [21], $\bar{\mathbf{w}}$ is a local optimal solution if and only if there exist multipliers $\mu_j \ge 0$, $j \in J_0$, and $\nu$ such that

$$\nabla F(\bar{\mathbf{w}}) - \sum_{j \in J_0} \mu_j \mathbf{e}_j + \nu \mathbf{e} = \mathbf{0}, \quad \mu_j \ge 0, \; j \in J_0, \quad (18)–(20)$$

where $\mathbf{e}_j$ denotes the $j$-th unit vector. According to Farkas' lemma [51], (18)–(20) is feasible if and only if the system $\{\nabla F(\bar{\mathbf{w}})^T \mathbf{d} < 0, \; \mathbf{e}^T \mathbf{d} = 0, \; d_j \ge 0 \text{ for } j \in J_0\}$ is infeasible, i.e., if and only if there is no feasible descent direction at $\bar{\mathbf{w}}$. The latter holds exactly when the optimum of the LP model (14)–(17) is $z = 0$. Thus, $\bar{\mathbf{w}}$ is a local optimal solution of the model (12)–(13) if and only if, for the LP model (14)–(17), the objective $z$ at the optimum is equal to zero. Considering that the model (12)–(13) is a convex optimization problem, $\bar{\mathbf{w}}$ is then the global optimal solution. □
With the current feasible solution $\mathbf{w}^k$ and the obtained direction $\mathbf{d}^k$ that is both feasible and descent, Zoutendijk's feasible direction method proceeds to the next iteration according to the following equation:

$$\mathbf{w}^{k+1} = \mathbf{w}^k + \lambda_k \mathbf{d}^k, \quad (21)$$

where $\mathbf{w}^{k+1}$ is the feasible solution for the next iteration. Let us consider how to determine the step size $\lambda_k$ in (21), which should guarantee that (1) $\mathbf{w}^{k+1}$ is feasible and (2) the objective $F$ of the model (12)–(13) decreases as quickly as possible. This can be achieved by solving the following optimization problem:

$$\min_{\lambda} \;\; F(\mathbf{w}^k + \lambda \mathbf{d}^k) \quad (22)$$
$$\text{s.t.} \;\; \mathbf{e}^T (\mathbf{w}^k + \lambda \mathbf{d}^k) = 1, \quad (23)$$
$$\mathbf{w}^k + \lambda \mathbf{d}^k \ge \mathbf{0}, \quad (24)$$
$$\lambda \ge 0. \quad (25)$$
Proposition 3. The model (22)–(25) is equivalent to the following optimization problem:

$$\min_{\lambda} \;\; F(\mathbf{w}^k + \lambda \mathbf{d}^k) \quad (26)$$
$$\text{s.t.} \;\; 0 \le \lambda \le \lambda_{\max}, \quad (27)$$

where

$$\lambda_{\max} = \begin{cases} \min \left\{ -\dfrac{w_j^k}{d_j^k} : d_j^k < 0 \right\}, & \text{if } \mathbf{d}^k \not\ge \mathbf{0}, \\ +\infty, & \text{if } \mathbf{d}^k \ge \mathbf{0}. \end{cases}$$

Proof. Since $\mathbf{w}^k$ is a feasible solution and $\mathbf{d}^k$ is a feasible direction, we have $\mathbf{e}^T \mathbf{w}^k = 1$ and $\mathbf{e}^T \mathbf{d}^k = 0$. Thus, Constraint (23) can be eliminated. Moreover, considering that $w_j^k + \lambda d_j^k \ge 0$ holds for any $\lambda \ge 0$ whenever $d_j^k \ge 0$, Constraint (24) can be reduced to $w_j^k + \lambda d_j^k \ge 0$ for $j$ such that $d_j^k < 0$. Then, we can derive the upper bound for $\lambda$ as $\lambda_{\max}$ given above. Therefore, the model (22)–(25) can be transformed equivalently into the model (26)–(27). □
Since the model (26)–(27) contains only one variable $\lambda$, we can use the golden section method [21], a derivative-free line search algorithm, to address this optimization problem; it is given in Algorithm 1. Based on the above analysis, Zoutendijk's feasible direction method for addressing the problem (12)–(13) is described in Algorithm 2.
Algorithm 1 The golden section method for addressing the model (26)–(27).

Input: search interval $[l, u] = [0, \lambda_{\max}]$ and stopping tolerance $\varepsilon > 0$.
1: Set $\lambda' = l + 0.382(u - l)$ and $\mu' = l + 0.618(u - l)$.
2: if $u - l < \varepsilon$ then
3: Stop; $\lambda^* = (l + u)/2$ is the optimal solution.
4: else
5: Calculate $F(\mathbf{w}^k + \lambda' \mathbf{d}^k)$ and $F(\mathbf{w}^k + \mu' \mathbf{d}^k)$.
6: if $F(\mathbf{w}^k + \lambda' \mathbf{d}^k) > F(\mathbf{w}^k + \mu' \mathbf{d}^k)$ then
7: Set $l = \lambda'$ and go to step 1.
8: else
9: Set $u = \mu'$ and go to step 1.
10: end if
11: end if
Output: The optimal solution $\lambda^*$.
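A compact Python rendering of Algorithm 1 (a sketch under the assumptions above; `F` is the one-dimensional objective $\lambda \mapsto F(\mathbf{w}^k + \lambda \mathbf{d}^k)$):

```python
def golden_section(F, lam_max, eps=1e-6):
    """Minimize the single-variable function F on [0, lam_max] (Algorithm 1)."""
    l, u = 0.0, lam_max
    while u - l >= eps:
        lam = l + 0.382 * (u - l)   # interior points at the golden-section ratios
        mu = l + 0.618 * (u - l)
        if F(lam) > F(mu):
            l = lam                  # minimum lies in [lam, u]
        else:
            u = mu                   # minimum lies in [l, mu]
    return 0.5 * (l + u)

# Example: minimize (x - 1)^2 on [0, 3]
print(golden_section(lambda x: (x - 1.0) ** 2, 3.0))  # ~1.0
```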
Algorithm 2 The Zoutendijk’s feasible direction method for addressing the problem (12)–(13). |
Input: Initial feasible solution . - 1:
Determine and according to the current feasible solution . - 2:
Calculate . - 3:
Solve the LP model ( 14)–( 17) and obtain the optimal solution . - 4:
if
then - 5:
Stop and is the global optimal solution. - 6:
else - 7:
Use Algorithm (1) to solve the model ( 26)–( 27) and obtain the optimal solution . - 8:
Update . - 9:
Go to step 1. - 10:
end if
Output: The optimal solution . |
3.3.2. Implementing the Algorithm with the MapReduce Framework
Considering that the number of reference alternatives in a large-scale problem is quite large, the objective (12) is composed of a huge number of terms (one per pair of reference alternatives $(a, b)$ with $C_h \neq C_k$). This inspires us to utilize the MapReduce framework to accelerate the computation of $F(\mathbf{w})$ and $\nabla F(\mathbf{w})$ for Zoutendijk's feasible direction method.
The MapReduce framework is a parallel computing paradigm proposed by Google Inc. It allows large data sets to be processed over a cluster of computers in a parallel manner, regardless of the underlying hardware. It is based on the philosophy that most computing tasks can be performed in the same way on separate parts of the data, after which the intermediate results can be aggregated to generate the final results. Such a programming paradigm is implemented via the Map and Reduce functions, which are inherited from the classical functional programming paradigm. Specifically, the cluster of computers is composed of a master node and several slave working nodes. In the Map phase, the input data set is divided into independent and disjoint subsets by the master node and then distributed to the slave nodes. Smaller problems are then handled by the slave nodes, and the answers to all sub-problems are returned to the master node. Finally, in the Reduce phase, the answers to all sub-problems are combined in some manner by the master node to generate the final output. Thus, in order to utilize the MapReduce framework, one only needs to specify what should be calculated in the Map and Reduce functions, while the issue of how to distribute the data among the cluster of working nodes is addressed by the system automatically [52]. It is worth noting that, while alternative frameworks such as Apache Spark offer advantages in iterative tasks and in-memory computation, MapReduce was chosen due to its suitability for handling large-scale data sets with high fault tolerance and low resource requirements. MapReduce's simplicity and maturity make it an ideal choice for implementing the proposed optimization algorithm in a distributed environment.
Both the Map and Reduce functions employ ⟨key, value⟩ pairs as input and output. The Map phase takes each ⟨key, value⟩ pair as input and generates a set of intermediate ⟨key, value⟩ pairs as output. This could be presented as follows:

Map: ⟨key1, value1⟩ → list(⟨key2, value2⟩).

Then, the master node merges all the values that share the same intermediate key into a list (known as the shuffle phase). The Reduce phase takes a key and its associated value list as input and produces the final values. This could be presented as follows:

Reduce: ⟨key2, list(value2)⟩ → ⟨key3, value3⟩.

Figure 2 presents the flowchart of the MapReduce framework.
In order to utilize the MapReduce framework to accelerate the computation of $F(\mathbf{w})$ and $\nabla F(\mathbf{w})$, we need to divide the whole set of pairs of reference alternatives $(a, b)$ with $C_h \neq C_k$ into disjoint subsets. Each subset is then replicated and transferred to the working nodes to be handled by a Map task in parallel. A Map task calculates the sum of the increments of $F(\mathbf{w})$ and $\nabla F(\mathbf{w})$ based on the subset it receives. Finally, the Reduce task sums over all the outputs generated by the Map tasks and obtains the final results. In this parallel manner, the MapReduce framework helps to accelerate the computation of $F(\mathbf{w})$ and $\nabla F(\mathbf{w})$. Algorithms 3 and 4 describe the Map and Reduce phases for calculating $F(\mathbf{w})$, respectively. Since the computation of $\nabla F(\mathbf{w})$ is similar to that of $F(\mathbf{w})$, we do not present the corresponding algorithm here to save space. Remark that, because $F(\mathbf{w})$ and $\nabla F(\mathbf{w})$ are additive with respect to the pairs of reference alternatives $(a, b)$ with $C_h \neq C_k$, the final outcomes are independent of the parallel implementation, and the method is robust to the Map and Reduce phases in the MapReduce framework.
Algorithm 3 Calculate $F(\mathbf{w})$: Map phase.

Input: ⟨$i$, $P_i$⟩, where $i$ is the index of the subset and $P_i$ is the subset of pairs of reference alternatives $(a, b)$ such that $C_h \neq C_k$; the current feasible solution $\mathbf{w}$.
1: $F_i = 0$.
2: for each pair of reference alternatives $(a, b)$ in $P_i$ do
3: $F_i = F_i - r_{ab} \log \phi(\mathbf{w}^T (V(a) - V(b))) - (1 - r_{ab}) \log \left( 1 - \phi(\mathbf{w}^T (V(a) - V(b))) \right)$.
4: end for
Output: ⟨key, $F_i$⟩.

Algorithm 4 Calculate $F(\mathbf{w})$: Reduce phase.

Input: ⟨key, list($F_i$)⟩.
1: $F = 0$.
2: for each $F_i$ in list($F_i$) do
3: $F = F + F_i$.
4: end for
Output: ⟨key, $F$⟩, where $F$ is equal to $F(\mathbf{w})$.
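The following self-contained sketch simulates Algorithms 3 and 4 sequentially in Python (the toy data and names are ours); because $F(\mathbf{w})$ is additive over pairs, the aggregated result equals that of a sequential scan regardless of how the pairs are split:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def map_task(subset, w):
    """Algorithm 3: partial sum of F over one subset of pairs (d_ab, r_ab)."""
    F_i = 0.0
    for d_ab, r_ab in subset:
        p = sigmoid(w @ d_ab)
        F_i -= r_ab * np.log(p) + (1 - r_ab) * np.log(1 - p)
    return F_i

def reduce_task(partials):
    """Algorithm 4: aggregate the partial sums into F(w)."""
    return sum(partials)

# Toy data: 1000 pairs with 4-dimensional difference vectors, split into 4 subsets
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=4), rng.integers(0, 2)) for _ in range(1000)]
w = np.full(4, 0.25)
subsets = [pairs[i::4] for i in range(4)]          # disjoint splits, as in the Map phase
F = reduce_task(map_task(s, w) for s in subsets)   # same result as a sequential scan
print(F)
```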
With the optimal solution $\mathbf{w}^*$, we can calculate the comprehensive value $U(a)$ of each alternative $a \in A$. Because the value function constructed from the optimal solution $\mathbf{w}^*$ does not guarantee perfect consistency (i.e., reproduction of all assignment examples), we cannot apply the standard example-based, value-driven sorting procedure [5] to perform the assignment of the alternatives in $A$; we need a new method to deal with the potential inconsistencies when suggesting an assignment. For this purpose, we calculate a consistency degree quantifying the assignment of $a$ to $C_h$, $h = 1, \dots, p$. This degree indicates the proportion of reference alternatives assigned to a class worse or better than $C_h$ that attain comprehensive values, respectively, lower or greater than that of $a$ according to the estimated preference model. Clearly, the greater the consistency degree, the more justified the assignment of $a$ to $C_h$. Hence, we select the category $C_h$ with the maximal consistency degree as the recommended category for $a$.
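Reading this definition literally (our reconstruction; the paper's exact normalization may differ), the consistency degree and the induced assignment rule can be sketched as follows:

```python
def consistency_degree(U_a, h, ref):
    """Fraction of reference alternatives in classes other than C_h whose
    comprehensive values are ordered consistently with assigning a to C_h.

    U_a : comprehensive value of alternative a
    h   : candidate category index (1 = worst, p = best)
    ref : list of (U_b, k) pairs for reference alternatives b assigned to C_k
    """
    others = [(U_b, k) for U_b, k in ref if k != h]
    if not others:
        return 1.0
    ok = sum(1 for U_b, k in others
             if (k < h and U_b <= U_a) or (k > h and U_b >= U_a))
    return ok / len(others)

def assign(U_a, p, ref):
    """Choose the category with the maximal consistency degree."""
    return max(range(1, p + 1), key=lambda h: consistency_degree(U_a, h, ref))
```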
4. Application to University Classification
In this section, we apply the proposed approach to university classification and examine its performance by analyzing the experimental results. The experimental analysis is based on the Best Chinese Universities Ranking (BCUR) of 2018 (see footnote 3), which provides an overall ranking of 600 universities in China. BCUR ranks universities according to their performance in four dimensions: teaching and learning, research, social service, and internationalization. The specification of the considered criteria and the corresponding indicators in each dimension are summarized in Table 1.
The BCUR 2018 dataset was chosen for several key reasons that make it particularly suitable for validating our proposed method:
Comprehensive and Multi-Dimensional Ranking: BCUR ranks universities based on their performance in four key dimensions: teaching and learning, research, social service, and internationalization. This multi-dimensional approach aligns well with the nature of multiple criteria sorting (MCS) problems, where decisions are based on multiple criteria.
Transparency and Reliability: BCUR is known for its transparency, as it not only publishes the detailed methodology and data sources but also provides the raw data of all evaluated indicators. This allows for thorough validation and further analysis, which is crucial for our experimental setup.
Representativeness: The dataset includes 600 universities from mainland China, providing a broad and representative sample. This ensures that the results obtained from our method are generalizable and can be applied to a wide range of decision-making scenarios.
Relevance and Practical Application: The ranking criteria used in BCUR are highly relevant to real-world decision-making processes. For example, the “Quality of Incoming Students” indicator reflects the recognition of universities by parents and students, while the “Education Outcome” indicator measures the acceptance of graduates by employers. These criteria are directly applicable to the types of decisions our method aims to support.
Data Quality and Availability: BCUR provides high-quality, structured data that is publicly available. This makes it an ideal dataset for validating our method, as it ensures the reliability of the results and allows for reproducibility of the experiments.
Although this data set is not especially large, it allows us to compare the performance of the proposed approach to that of the classical UTADIS method. Since both methods employ an additive value function as the preference model, one can directly compare their results in terms of several measures as well as the shapes of the constructed value functions. First, we divide the 600 universities into five categories according to their total scores. Each category is composed of 120 universities, with $C_5$ and $C_1$ being the best and worst categories, respectively. For the convenience of the subsequent experimental analysis, the evaluation scale of each criterion is divided into the same number of sub-intervals $\gamma$, and we examine the results for several settings of $\gamma$. The performance of the proposed approach is compared with that of the UTADIS method through cross-validation: the reference set is divided into $S$ (here $S = 10$) folds of equal size, ensuring that the proportions between categories are the same in each fold as in the whole reference set. Then, $S - 1$ of the folds are used to train the model, which is then evaluated on the remaining fold. This procedure is repeated for all $S$ possible choices, and the classification accuracies from the $S$ runs are averaged. Since the problem contains only 600 universities, we implement the proposed approach without the MapReduce framework. The UTADIS method and the proposed approach are implemented in Java with the IBM ILOG CPLEX® solver.
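The stratified cross-validation protocol described above can be sketched with scikit-learn's StratifiedKFold; `fit_sorting_model` and `accuracy` are hypothetical stand-ins for training and scoring either method:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, fit_sorting_model, accuracy, S=10, seed=42):
    """Stratified S-fold CV: train on S-1 folds, test on the held-out fold."""
    skf = StratifiedKFold(n_splits=S, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = fit_sorting_model(X[train_idx], y[train_idx])  # e.g., solve (12)-(13)
        scores.append(accuracy(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```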
Table 2 reports the average accuracies of the ten-fold cross-validation. It is evident that the proposed approach has a significant advantage in classification accuracy over the UTADIS method for the same $\gamma$. Both the proposed approach and the UTADIS method achieve their best performance for the same setting of $\gamma$, which is thus the optimal number of sub-intervals on each criterion for this problem. Moreover, the results of a one-tailed paired t-test at the 0.05 significance level confirm that the proposed approach performs significantly better than the UTADIS method. The reason underlying this significant difference is that the proposed approach not only minimizes the degree of inconsistency when the assignment of a pair of reference alternatives violates the consistency principle, but also maximizes the margin to the decision boundary when the assignment accords with the consistency principle, producing a model that is as consistent and robust as possible. By contrast, the UTADIS method only focuses on minimizing the sum of the inconsistency degrees between the comprehensive values of the reference alternatives and the respective category thresholds. When the assignment examples of the reference alternatives are consistent (as in the considered university classification problem), the UTADIS method admits multiple optimal solutions; in this case, the value function selected by UTADIS is very likely to differ from the true one.
Then, we use the whole reference set as the training data to derive value functions using the different methods; the marginal value functions on each criterion are depicted in Figure 3, and the trade-off weights of the criteria derived by the different methods are reported in Table 3. We observe that the shapes of the marginal value functions derived by the UTADIS method for different $\gamma$ differ from each other significantly. Moreover, the trade-off weights for the same criterion vary significantly with $\gamma$. This indicates that the UTADIS method is not robust to the way the evaluation range is divided into sub-intervals, which makes the trade-off weights of the criteria unstable and the results difficult to interpret. In contrast, the marginal value functions derived by the proposed approach are smoother than those generated by the UTADIS method, and the trade-off weights for the same criterion do not vary significantly with $\gamma$. Thus, the proposed approach appears to be robust to the way the evaluation range is divided into sub-intervals.
5. Simulation Experiments
This section presents the experimental analysis performed to examine the performance of the proposed approach in dealing with large-scale MCS problems. First, Section 5.1 describes the performance measures used to evaluate the proposed approach. Section 5.2 presents the configuration details of the hardware and software used in our experiments. Section 5.3 describes the generation of the data, the factors considered in the implementation, and the general experimental process. Section 5.4 reports and discusses the obtained results.
5.1. Performance Measures
In this work, we study the implementation of the MapReduce framework for the proposed approach. Thus, we can evaluate the performance of the proposed approach according to the following measures.
Accuracy: This measure quantifies how many alternatives are correctly classified by the proposed approach. In our experiments, we only focus on the classification accuracy on the set of non-reference alternatives A.
Runtime: This metric quantifies the overall computational time required by the proposed approach.
Speedup: This measure compares the efficiency of the proposed approach implemented on a parallel system to that achieved by the sequential version on a single node. It can be calculated as

$$\text{Speedup} = \frac{T_{\text{sequential}}}{T_{\text{parallel}}},$$

where $T_{\text{sequential}}$ and $T_{\text{parallel}}$ are the runtimes of the sequential and parallel implementations, respectively.
5.2. Hardware and Software
The MapReduce framework in this study is implemented on a cluster of seven nodes: one master node and six slave working nodes. All the working nodes share the same configuration, as follows:
Processors: 2 × Intel Xeon E5-2620.
Cores: 6 per processor (12 threads).
Clock speed: 2 GHz.
Cache: 15 MB.
Network connection: Intel 82579LM Gigabit.
Hard drive: 1 TB (7200 rpm).
RAM: 32 GB.
The software for implementing the MapReduce framework and associated settings are as follows:
5.3. Experimental Design
The investigation is based on the generation of random data following a uniform distribution. The relevant factors in the simulation are described in Table 4.
We adopt a ceteris paribus design in the following experimental analysis, beginning from a benchmark problem setting that fixes the number of alternatives, criteria, categories, and working nodes. Then, we consider different levels of one factor while keeping the others fixed, to investigate its impact on the results. For each problem setting, 50 data sets are randomly generated to test the performance of the proposed approach. Each data set is divided equally into two subsets: the reference set $A^R$ and the test set $A$, which thus contain the same number of alternatives. The reference set is used to develop the sorting model, and the test set is used to evaluate the performance of the proposed approach.

In the data-generation phase of each experiment, we first specify the actual value function and the actual category thresholds. The actual assignment of the alternatives can then be determined by calculating their comprehensive values and comparing them to the category thresholds. We specify the actual preference model in such a way that the alternatives are distributed equally over all the categories, for both $A^R$ and $A$. Note that, for each generated data set, we mislabel the assignment of 5% of the reference alternatives in $A^R$, considering them to be inconsistent assignment examples.
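A sketch of this data-generation protocol, using a linear value function as a simple special case of the actual additive model (all names and sizes are illustrative):

```python
import numpy as np

def generate_dataset(m=1000, n=5, p=4, noise=0.05, seed=0):
    """Generate alternatives, an 'actual' linear value model, and noisy labels."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(m, n))        # evaluation matrix
    w = rng.dirichlet(np.ones(n))                 # actual trade-off weights (sum to 1)
    U = X @ w                                     # comprehensive values
    # Thresholds chosen so alternatives are spread equally over the p categories
    thresholds = np.quantile(U, np.linspace(0, 1, p + 1)[1:-1])
    y = np.searchsorted(thresholds, U) + 1        # categories 1..p
    # Mislabel 5% of the reference alternatives to simulate inconsistent examples
    flip = rng.choice(m, size=int(noise * m), replace=False)
    y[flip] = rng.integers(1, p + 1, size=flip.size)
    return X, y, w
```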
The general experimental process is described as follows:
Select a specific problem setting. For a problem involving $m$ alternatives and $n$ criteria, draw values randomly from a uniform distribution on [0, 1] to generate the evaluation matrix.
Specify the actual preference model, including the actual value function and the actual category thresholds.
Obtain the actual assignment outcomes for the alternatives in $A^R$ and $A$ by calculating their comprehensive values and comparing them to the category thresholds.
Apply the proposed approach to develop a preference model.
Determine the assignment outcomes by applying the developed preference model to the set of alternatives $A$.
Compare the two assignment outcomes, and evaluate the performance of the proposed approach by collecting the accuracy, runtime, and speedup.
5.4. Experimental Results
This section describes and analyzes the results generated in the experimental study. The performance of the proposed approach is evaluated in terms of the accuracy, runtime, and speedup. In the following, we present the results and analysis for various levels of the considered factors.
5.4.1. Performance Evaluation on Different Numbers of Alternatives
Table 5 and Figure 4 summarize the results obtained on the generated data sets with different numbers of alternatives; other than the number of alternatives, the settings remain the same as in the benchmark. Table 5 presents the minimum, maximum, average, and standard deviation of the accuracy and the average computational time. Figure 4 plots a box-and-whisker graph showing the distribution of the accuracy for different numbers of alternatives. From Table 5, we observe no noticeable tendency in the variation of the classification accuracy as the number of alternatives varies. This is because we use the same method for generating the experimental data sets, so all of the generated data sets share a similar problem structure, although they differ in the number of alternatives. On the other hand, we observe from Table 5 that a longer runtime is required as the number of alternatives increases: more alternatives entail more pairwise comparisons between reference alternatives, and the proposed approach needs more runtime to handle large-scale reference sets. Note, however, that traditional MCS methods cannot deal with such large-scale problems at all, because these problems exceed their processing capabilities in terms of memory consumption and computational time. By contrast, the proposed approach provides a feasible way of handling huge amounts of data.
5.4.2. Performance Evaluation on Different Numbers of Criteria
In this part, we focus on the analysis of the performance of the approach on data sets with different numbers of criteria. Table 6 and Figure 5 present the obtained results. As we can see, the accuracy gradually decreases, and its standard deviation increases, as the number of criteria increases. Specifically, the proposed approach achieves an average accuracy of 0.930 when there are only two criteria; the average accuracy drops below 0.860 when the number of criteria reaches seven. The reason is that, with more criteria, the preference model contains more parameters to be estimated, so it becomes more difficult for the proposed approach to infer a preference model sufficiently close to the actual one, which increases the number of incorrect assignments. On the other hand, it can be observed from Table 6 that, although the difficulty of the problem grows with the number of criteria, the computational time of the proposed approach does not increase dramatically, ranging from 952.251 s for two criteria to 1271.060 s for seven criteria.

To analyze this phenomenon further, we measure the difference between the inferred preference model and the actual one by calculating the Euclidean distance between the two models' weight vectors. The outcomes are summarized in Table 7, showing that, with more criteria, the Euclidean distance between the two weight vectors becomes larger. Thus, the ability of the proposed approach to infer a weight vector close to the actual one deteriorates as the number of criteria grows, which is the main reason for the larger number of incorrect assignments. Note, however, that this argument assumes that the considered criteria are uncorrelated; when the criteria are highly correlated, the inferred preference model will barely differ from the actual one.
5.4.3. Performance Evaluation on Different Numbers of Categories
This section focuses on the performance of the approach in dealing with different numbers of categories. As Table 8 and Figure 6 clearly show, the average accuracy decreases and the variation of the accuracy increases when there are more categories. More precisely, the average accuracy decreases from 0.917 to 0.810, while the standard deviation of the accuracy increases from 0.030 to 0.053. This reveals that the difference between the inferred preference model and the actual model increases when dealing with more categories, because the presence of more categories introduces more flexibility into the considered problem. On the other hand, the runtime of the proposed approach increases with the number of categories, because more pairwise comparisons between reference alternatives must be considered when inferring the parameters of the preference model.
In addition, we count the number of incorrect assignments for various differences between the predicted assignment and the actual one, where this difference is defined as the absolute difference between the indices of the two categories. For example, if the actual assignment of an alternative $a$ is category $C_2$ but it is assigned to category $C_4$ by the proposed approach, then the difference between the predicted and actual assignments of $a$ is two. Table 9 summarizes the percentage of incorrect assignments with various differences between the predicted assignment and the actual one for different numbers of categories. One can observe that, for most incorrect assignments, the difference from the actual assignment is one, even in the case of six or seven categories. This indicates that, although the performance of the proposed approach deteriorates as the number of categories increases, misclassified alternatives are assigned to categories that are close to their actual ones, and such outcomes can be accepted by the DM.
5.4.4. Performance Evaluation on Different Numbers of Working Nodes
In this section, we study the influence of the number of working nodes on the performance of the proposed approach. Table 10 presents the average runtime and the speedup achieved by the proposed approach. Note that all comparisons in this section are performed on the same data sets. Given the results, we make the following comments. On the one hand, because the MapReduce framework is used only to accelerate the computation of the coefficients of the LP model and to work out the recommended assignments for non-reference alternatives, the accuracy of the final outcomes is independent of the number of working nodes used in the implementation. On the other hand, as the computational workload remains the same for a given problem, more working nodes accelerate the computational process and improve the efficiency of the approach. Thus, one can observe that the runtime decreases and the speedup increases as the number of working nodes increases.

Table 11 and Figure 7 report the runtime for different numbers of alternatives and working nodes. The runtime under different settings varies significantly: when six working nodes are used to deal with 10 K alternatives, less than one minute is needed to solve the problem, whereas addressing 400 K alternatives with only one working node takes the proposed approach more than 10,000 s. On the other hand, although more alternatives induce more pairwise comparisons, whose number grows quadratically with the number of alternatives, the runtime does not increase quadratically. This is due to characteristics of the Hadoop platform, such as its load balancing, which enhance the performance of the proposed approach and improve computational efficiency.
6. Conclusions
In this paper, we develop a new MCS approach based on the MapReduce framework for dealing with a large set of alternatives and massive preference information. This approach utilizes the preference disaggregation paradigm to construct a convex optimization model, which infers a preference model in the form of a piecewise-linear additive value function from assignment examples on reference alternatives. Such a model addresses the consistency and robustness concerns of the preference model simultaneously and avoids using slack variables to specify the degrees of inconsistency for assignment examples, which makes it especially suitable for addressing large-scale MCS problems. A new parallel algorithm is then developed to solve this optimization model, and the MapReduce framework is utilized to process the set of reference alternatives and the associated preference information in a parallel manner in order to accelerate the computation.
The main advantage of the proposed approach is its scalability in dealing with large sets of alternatives and massive preference information. It avoids using slack variables to construct a preference model and thus contains a relatively small set of variables and linear constraints. It also utilizes the MapReduce framework to process the set of reference alternatives and the associated preference information in a parallel manner, and it places no specific requirements on the processing capabilities of the working nodes. Moreover, the proposed approach is robust to the parallel implementation: the final outcomes are independent of the splitting and combining operations in the MapReduce framework. Therefore, it is not necessary to specify particular operations for splitting the original data set and aggregating the intermediate results.
We acknowledge that our approach relies on an additive piecewise-linear value function, which assumes preferential independence among criteria. While this assumption simplifies the model and enhances computational efficiency, it may not adequately capture complex, non-linear preference structures that are common in real-world decision-making scenarios. Non-linear preference structures can arise when the interaction effects between criteria significantly influence the overall preference. For example, in certain multi-criteria optimization problems, the combined effect of two criteria may be greater than the sum of their individual effects. Our method, as currently formulated, does not account for such interactions, which could lead to suboptimal decision outcomes in scenarios where non-linear preferences are prevalent. Moreover, the performance of our approach is inherently dependent on the quality of the input data. Inaccurate or noisy data can lead to misleading preference models and, consequently, incorrect decision recommendations. For instance, if the assignment examples provided by the decision maker are inconsistent or contain errors, the inferred value function may not accurately reflect the true preferences. This sensitivity is a common challenge in data-driven decision-making and underscores the importance of robust data preprocessing and validation steps. While our method includes mechanisms to handle some degree of inconsistency in the data, it is not immune to the detrimental effects of poor data quality.
We consider the extension of this approach in several interesting directions. First, a real-world application to a large-scale MCS problem is required to validate the proposed approach. Second, in cases where significant interactions among criteria are expected, future extensions of our work could incorporate more sophisticated preference models that account for such interactions, such as non-additive value functions or models that explicitly consider criterion interdependencies. Third, another interesting direction would be to employ other types of preference models, such as outranking preference models and rule induction models, to address the considered MCS problem with the MapReduce framework. This calls for developing hybrid models that combine the strengths of outranking, rule-based, and value function approaches; such models could offer greater flexibility and robustness in handling diverse decision scenarios. It also requires exploring how to further enhance the scalability of outranking and rule-based models when applied to large-scale data sets, for instance by optimizing their computational efficiency or developing novel parallelization strategies. Finally, by incorporating multiple data sets and rigorous validation techniques in future research, we aim to strengthen the external validity and generalizability of our findings, ensuring that our method is not only scalable but also reliable and applicable across a wide range of decision-making scenarios.