A Method to Handle the Missing Values in Multi-Criteria Sorting Problems Based on Dominance Rough Sets

Topal, Ahmet; Guler Bayazit, Nilgun; Ucan, Yasemen

doi:10.3390/math12182944

by Ahmet Topal ¹,²,*, Nilgun Guler Bayazit ² and Yasemen Ucan ²

¹ Department of Mathematics Engineering, Istanbul Technical University, Sariyer, 34469 Istanbul, Türkiye
² Department of Mathematics Engineering, Yildiz Technical University, Esenler, 34220 Istanbul, Türkiye
* Author to whom correspondence should be addressed.

Mathematics 2024, 12(18), 2944; https://doi.org/10.3390/math12182944

Submission received: 23 August 2024 / Revised: 15 September 2024 / Accepted: 19 September 2024 / Published: 22 September 2024

(This article belongs to the Special Issue Fuzzy Decision Making and Soft Computing Applications: Future Perspectives)

Abstract

The handling of missing attribute values remains a challenging and problematic issue in data analysis. Imputation techniques are key procedures used to deal with missing attribute values. However, although these methods are widely used, they cause data bias. Rough set theory, a unique mathematical tool for decision making under uncertainty, overcomes this problem by properly adjusting the relationships. Rough sets are often preferred in both classification and sorting problems. The aim of sorting problems is to sort the objects in the decision table (DT) from best to worst and/or to select the best one. For this purpose, it is necessary to obtain a pairwise comparison table (PCT) from the DT. However, in the presence of missing values, the transformation from DT to PCT is not feasible because there are no ranking methods in the literature for sorting problems based on rough sets. To address this limitation, this paper presents a way to transform from DT to PCT and introduces a generalization of the relation belonging to the “do not care” type of missing values in the dominance-based rough set approach (DRSA) to the decision support tool jRank. We also adapted the DomLEM algorithm to enable it to work in PCT with missing values. We applied our method step by step to a decision table with 11 objects and investigated the effect of missing values. The experimental results showed that our proposed approach captures the semantics of “do not care” type missing values.

Keywords:

decision tables; pairwise comparison table (PCT); missing attribute values; dominance-based rough sets; ranking

MSC:

03E75; 68V30

1. Introduction

Decision-making under uncertainty is a fundamental challenge in many fields, ranging from economics and business to science and everyday life [1,2,3]. Unlike decisions made under certainty, where outcomes and probabilities are known, uncertainty requires individuals and organizations to make decisions with incomplete or ambiguous information. This introduces a level of complexity that requires a careful assessment of risks, potential rewards, and unforeseen consequences. Understanding how to approach decisions in uncertain environments is crucial for minimizing risks and maximizing outcomes [1].

In today’s business environment, data are critical elements in making organizations more competitive and developing more effective strategies [4]. As organizations increasingly rely on data-driven insights to guide their operations, the ability to collect, analyze, and interpret data has become a key differentiator [4,5]. When used properly, data enable organizations to make more informed, knowledge-based decisions. In parallel with the increasing reliance on data, the need for effective data management has become more prominent. Organizations need to focus not only on collecting large amounts of data but also on ensuring the quality and integrity of the data they collect.

One of the biggest challenges at this point is that datasets often suffer from incomplete or missing information due to various factors, such as human error, technical glitches, or limitations in data collection tools. These missing values pose significant challenges, particularly in decision-making processes where each criterion plays a crucial role in determining outcomes. These missing values need to be filled in order to apply an appropriate multi-criteria decision-making technique.

Many techniques have been developed in the literature to make it possible to analyze datasets with missing values [6,7,8]. One of the simplest and most commonly used approaches is to delete the samples with the missing values. This method, which is usually preferred for datasets with a small number of missing values, results in a loss of information in the datasets, as it also deletes the values of other available attributes of the relevant instances [9]. Another commonly used approach is to fill the missing attribute values with an appropriate value (e.g., mean, median) or by using advanced methods such as machine learning techniques and model-based methods [10]. However, these approaches introduce data bias [11]. Therefore, the use of these techniques in decision making is in some cases undesirable or should be treated with caution.

Rough set theory [12], proposed by Pawlak in 1982, allows for analysis without forcing missing attribute values in data tables. The problem is solved by correctly defining the relationship that forms the information granules [13]. Kryszkiewicz [14] adapted the concepts of dispensable and indispensable attributes, core, decision rules, and reductions from classical rough set theory to incomplete information systems with respect to a tolerance relation. Stefanowski and Tsoukias [15] proposed two extensions of classical rough set theory to deal with the missing values: valued tolerance relations and non-symmetric similarity relations. Wang [16] generalized the classical rough set theory based on the limited tolerance relation, making it suitable for incomplete data tables. On the other hand, adaptations of the dominance-based rough set approach (DRSA) under incomplete information systems were presented in [17,18]. Błaszczyński, Słowiński, and Szelag [18] presented alternative ways of handling missing values in dominance-based rough sets. In their experimental study, Naive Bayes, C4.5, Ripper, VC-DomLEM-mv2, VC-DomLEM-smv, SVM, OLM, and OSDL classifiers were compared in terms of mean absolute error on datasets with the percentages of missing values ranging from 5% to 50%. Experimental results revealed that VC-DomLEM-mv2, SVM, and Naive Bayes were the best-performing classifiers.

Unlike the classification problem in dominance-based or classical rough sets, it may be desirable to rank objects in a decision table and choose the best one among them [19]. However, the most important point is how to rank objects under missing attribute values. A gap in the literature is the absence of rule-based ranking for information systems with “do not care” type missing values. The key idea of this paper is to define a methodology for dealing with a multi-criteria ranking problem based on the DRSA using a decision table that contains “do not care” type missing values. Our proposed approach consists of three main steps: transformation from decision table to PCT, generalization of dominance relations under incomplete PCT, and adaptation of the DomLEM algorithm. Our contributions are detailed as follows.

We constructed a PCT from an incomplete decision table by defining transformation formulas for both ordinal and cardinal attributes.
We introduced a generalization of dominance relations to compute the dominating set, dominated set, and set approximations under an incomplete PCT.
We adapted the DomLEM algorithm to deal with “do not care” missing values for the purpose of extracting decision rules from a PCT.

The remaining parts of the paper are structured as follows. In Section 2, we provide some basic mathematical notions of the DRSA and its extension to the “do not care” type missing values. In Section 3, we present how a multi-criteria ranking method based on the dominance rough set approach can be applied to an information system with missing values of the “do not care” type. Furthermore, we establish the relationships concerning the transformation of attributes from the decision table to the pairwise comparison table in the presence of missing values. Additionally, we address the adaptation of the rule induction algorithm DomLEM to pairwise comparison tables with missing values. In Section 4, we present the experimental design and its outcomes through an example. The last section presents our concluding remarks.

2. Dominance-Based Rough Set Approach

DRSA, which was introduced by Greco, Matarazzo, and Słowiński [20] in the early 2000s, is a version of rough set theory proposed for multi-criteria decision analysis. Unlike classical rough set theory, DRSA does not require the discretization of continuous condition attributes, and it has other distinctive characteristics, such as handling inconsistencies with respect to the dominance principle and taking into account the preference order of the attributes [20]. These improvements eliminate some of the limitations of the classical approach. Basically, DRSA attempts to extract useful information from the ordinal evaluations of objects on preference-ordered attributes (or criteria). In this approach, the decision attribute has a preference-ordered domain, and the attributes are divided into two categories depending on their monotonic relationship with the decision attribute: gain-type attributes and cost-type attributes. A gain-type attribute is one in which the high values in its domain are favored over the low values in terms of the decision. In contrast, a cost-type attribute is a criterion (i.e., its domain has preference-ordered values) that is not gain-type, so its low values are favored over its high values. For example, consider the credit risk (decision attribute) of a person with respect to the value of their assets (condition attribute). Here, low credit risk is preferred to high credit risk. Since the person has a low (high) credit risk when the value of assets is high (low), high values of the condition attribute are at least as good as low values according to the decision. So, it is a gain-type criterion. In the remainder of this section, the basic mathematical concepts of the DRSA and its adaptation to the “do not care” kind of missing values are explained in detail [17,20,21,22,23].

2.1. Basic Concepts

A decision table is a simple visual representation that describes objects in terms of various attributes. In this table, objects or elements are placed into rows, while attributes are placed into columns. An object’s value for a particular attribute is entered into the cell where the corresponding row and column intersect. In mathematical terms, a decision table is described by the following quadruple [20]:

A finite non-empty set of objects (samples or elements) E;
A finite set of attributes $X = X_{c} \cup X_{d}$ , where $X_{c}$ and $X_{d}$ are condition and decision attribute sets, respectively;
The set of values taken by all attributes in the decision table (denoted by V);
Information function $f : E \times X \to V$ .

To simplify matters, we assume that $X_{d} = \{x_{d}\}$. For $E = \{e_{1}, e_{2}, e_{3}\}$ and $X_{c} = \{x_{c 1}, x_{c 2}, x_{c 3}\}$, Table 1 illustrates a sample decision table.

Let $O_{x}$ represent an outranking relation on the universe set E according to the attribute x. For $e_{1}, e_{2} \in E$ and $x \in X$, the expression $e_{1} O_{x} e_{2}$ means that object $e_{1}$ is at least as good as object $e_{2}$ according to the attribute x. Since $O_{x}$ is transitive and reflexive, it is a pre-order relation. Also, $e_{1} D_{Q} e_{2}$ refers to “$e_{1}$ Q-dominates $e_{2}$” if object $e_{1}$ is at least as good as object $e_{2}$ for all attributes in a subset Q of the set of condition attributes. Similar to the outranking relation, $D_{Q}$ is a pre-order relation. However, since not every pair of objects in the universe set E can be compared with respect to the relation $D_{Q}$, it is a partial pre-order relation. The relationship between the outranking relation and dominance relation can be expressed as follows:

$e_{1} D_{Q} e_{2} \Leftrightarrow \bigwedge_{q \in Q \subseteq X_{c}} e_{1} O_{q} e_{2}$

(1)

where ∧ denotes the logical AND operator.

Using the dominance relation, it becomes possible to identify the set of objects dominated by an element $e_{i} \in E$. This set is known as the Q-dominated set, denoted by $D_{Q}^{-} (e_{i})$, and defined as $D_{Q}^{-} (e_{i}) = \{e_{j} \in E : e_{i} D_{Q} e_{j}\}$. On the other hand, the set of objects dominating object $e_{i}$ can also be obtained. This set is known as the Q-dominating set, denoted by $D_{Q}^{+} (e_{i})$, and defined as $D_{Q}^{+} (e_{i}) = \{e_{j} \in E : e_{j} D_{Q} e_{i}\}$.
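The dominance relation of Equation (1) and the two sets defined above can be illustrated with a minimal Python sketch. The table values, the attribute names, and the assumption that every criterion is gain-type are hypothetical and chosen only for illustration.

```python
# Hypothetical toy decision table; both criteria are assumed gain-type
# (higher values are preferred).
TABLE = {
    "e1": {"x_c1": 3, "x_c2": 2},
    "e2": {"x_c1": 2, "x_c2": 2},
    "e3": {"x_c1": 1, "x_c2": 3},
}


def outranks(a, b, q, table=TABLE):
    """a O_q b: a is at least as good as b on the gain-type criterion q."""
    return table[a][q] >= table[b][q]


def dominates(a, b, Q, table=TABLE):
    """a D_Q b: a outranks b on every criterion in Q (Equation (1))."""
    return all(outranks(a, b, q, table) for q in Q)


def dominated_set(e, Q, table=TABLE):
    """D_Q^-(e): the objects that e dominates."""
    return {o for o in table if dominates(e, o, Q, table)}


def dominating_set(e, Q, table=TABLE):
    """D_Q^+(e): the objects that dominate e."""
    return {o for o in table if dominates(o, e, Q, table)}


Q = ["x_c1", "x_c2"]
print(dominated_set("e1", Q))
print(dominating_set("e2", Q))
```

In this toy table, neither `e1` nor `e3` dominates the other, which illustrates why $D_{Q}$ is only a partial pre-order.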

The decision attribute $x_{d}$ forms a partition of the universal set E. Let $K = \{1, 2, \dots, p\}$ be the set of values taken by the decision attribute $x_{d}$. It partitions the universal set E into p decision classes, defined as $C l_{n} = \{e \in E : f (e, x_{d}) = n\}$. The upward union of preference-ordered classes, denoted by $C l_{n}^{\geq}$, and the downward union of preference-ordered classes, denoted by $C l_{n}^{\leq}$, are defined as $\bigcup_{s \geq n} C l_{s}$ $(n = 2, \dots, p)$ and $\bigcup_{s \leq n} C l_{s}$ $(n = 1, \dots, p - 1)$, respectively. In other words, the upward union of a decision class contains the decision classes that are at least as good as itself, and its downward union contains the decision classes that are at most as good as itself.

With respect to $Q \subseteq X_{c}$, for $n = 2, \dots, p$, the Q-lower approximation of $C l_{n}^{\geq}$, denoted by $\underline{Q} (C l_{n}^{\geq})$, and the Q-upper approximation of $C l_{n}^{\geq}$, denoted by $\bar{Q} (C l_{n}^{\geq})$, are:

$\underline{Q} (C l_{n}^{\geq}) = \{e_{i} \in E : D_{Q}^{+} (e_{i}) \subseteq C l_{n}^{\geq}\},$

(2)

$\bar{Q} (C l_{n}^{\geq}) = \{e_{i} \in E : D_{Q}^{-} (e_{i}) \cap C l_{n}^{\geq} \neq \emptyset\} .$

(3)

Analogously, the Q-lower approximation and the Q-upper approximation of $C l_{n}^{\leq}$ for $n = 1, \dots, p - 1$ are defined, respectively, as follows:

$\underline{Q} (C l_{n}^{\leq}) = \{e_{i} \in E : D_{Q}^{-} (e_{i}) \subseteq C l_{n}^{\leq}\},$

(4)

$\bar{Q} (C l_{n}^{\leq}) = \{e_{i} \in E : D_{Q}^{+} (e_{i}) \cap C l_{n}^{\leq} \neq \emptyset\} .$

(5)

The definitions of the Q-boundary region of $C l_{n}^{\geq}$ for $n = 2, \dots, p$ and $C l_{n}^{\leq}$ for $n = 1, \dots, p - 1$ are

$B N_{Q} (C l_{n}^{\geq}) = \bar{Q} (C l_{n}^{\geq}) - \underline{Q} (C l_{n}^{\geq}),$

(6)

$B N_{Q} (C l_{n}^{\leq}) = \bar{Q} (C l_{n}^{\leq}) - \underline{Q} (C l_{n}^{\leq}) .$

(7)

The lower approximation of a set comprises the objects that definitely belong to that set, whereas the upper approximation of a set includes the objects that possibly belong to that set. Objects present in the upper approximation but absent from the lower approximation constitute the boundary region. The objects in the boundary regions are referred to as Q-inconsistent elements, as their class membership is uncertain.
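As a rough illustration of Equations (2), (3), and (6), the following Python sketch computes the approximations of an upward union on a complete decision table. The four objects, their values, and their classes are hypothetical; `e3` is deliberately given a worse class than `e2` although it dominates it, so that an inconsistency appears.

```python
# Hypothetical complete decision table: object -> (criteria values, class).
# Criteria are assumed gain-type; classes are ordered 1 < 2 < 3.
TABLE = {
    "e1": ({"a": 3, "b": 3}, 3),
    "e2": ({"a": 2, "b": 2}, 2),
    "e3": ({"a": 2, "b": 3}, 1),  # dominates e2 but has a worse class
    "e4": ({"a": 1, "b": 1}, 1),
}
Q = ["a", "b"]


def dominates(x, y):
    """x D_Q y on a table without missing values."""
    return all(TABLE[x][0][q] >= TABLE[y][0][q] for q in Q)


def d_plus(e):
    """D_Q^+(e): objects dominating e."""
    return {o for o in TABLE if dominates(o, e)}


def d_minus(e):
    """D_Q^-(e): objects dominated by e."""
    return {o for o in TABLE if dominates(e, o)}


def upward_union(n):
    """Cl_n^>=: objects whose class is at least n."""
    return {e for e, (_, cls) in TABLE.items() if cls >= n}


def lower_up(n):
    """Equation (2)."""
    return {e for e in TABLE if d_plus(e) <= upward_union(n)}


def upper_up(n):
    """Equation (3)."""
    return {e for e in TABLE if d_minus(e) & upward_union(n)}


def boundary_up(n):
    """Equation (6)."""
    return upper_up(n) - lower_up(n)


print(lower_up(2), upper_up(2), boundary_up(2))
```

For $n = 2$, only `e1` belongs to the lower approximation, while `e2` and `e3` form the boundary region because of the inconsistency between them.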

2.2. Incomplete Information Systems

The symbol “*” will be used in decision tables to indicate any missing attribute values. We will assume that the value of at least one condition attribute for each element e in the universal set is known. In other words, for all $e \in E$, it is possible to find a condition attribute $x_{c}$ in set $X_{c}$ such that $f (e, x_{c}) \neq *$. We will also assume that the value of each object on the decision attribute is known.

We now define the dominance relations $D_{Q}$ and $ᗡ_{Q}$ $(Q \subseteq X_{c})$ before adapting DRSA to information systems with missing values. Given $e_{1}, e_{2} \in E$ without any missing attribute values, if $e_{2} O_{q} e_{1}$ for all $q \in Q$, then the object $e_{2}$ dominates the object $e_{1}$, which is denoted by $e_{2} D_{Q} e_{1}$. On the other hand, if $e_{1} O_{q} e_{2}$ for all $q \in Q$, then the object $e_{2}$ is dominated by the object $e_{1}$, which is denoted by $e_{2} ᗡ_{Q} e_{1}$. Therefore, $e_{2} D_{Q} e_{1}$ if and only if $e_{1} ᗡ_{Q} e_{2}$. However, in the presence of missing values, the dominance relation may lose some of its properties, such as transitivity and a specific kind of symmetry [17].

The following outlines the extension of the dominance-based rough set approach to handle missing attribute values of the “do not care” kind [23]:

A subject element $\tilde{e}$ dominates a referent object e (denoted by $\tilde{e} D_{Q} e$ ) if and only if $\forall q \in Q$ , $\tilde{e} O_{q} e$ , or $f (\tilde{e}, q) = *$ , or $f (e, q) = *$ .
A subject element $\tilde{e}$ is dominated by a referent object e (denoted by $\tilde{e} ᗡ_{Q} e$ ) if and only if $\forall q \in Q$ , $e O_{q} \tilde{e}$ , or $f (\tilde{e}, q) = *$ , or $f (e, q) = *$ .
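The two relations above can be sketched in Python as follows. The table, the attribute names, and the gain-type assumption are hypothetical; the example is only meant to show how a missing value on either side makes an elementary comparison hold, and how transitivity can then fail.

```python
MISSING = "*"

# Hypothetical incomplete decision table; criteria are assumed gain-type.
TABLE = {
    "e1": {"a": 3, "b": MISSING},
    "e2": {"a": MISSING, "b": 1},
    "e3": {"a": 1, "b": 2},
}
Q = ["a", "b"]


def geq_or_missing(u, v):
    """True if either value is missing ("do not care") or u is at least v."""
    return u == MISSING or v == MISSING or u >= v


def dominates(subject, referent, Q=Q):
    """subject D_Q referent under the "do not care" semantics."""
    return all(
        geq_or_missing(TABLE[subject][q], TABLE[referent][q]) for q in Q
    )


def is_dominated_by(subject, referent, Q=Q):
    """subject ᗡ_Q referent under the "do not care" semantics."""
    return all(
        geq_or_missing(TABLE[referent][q], TABLE[subject][q]) for q in Q
    )


# e3 dominates e2 and e2 dominates e1, yet e3 does not dominate e1.
print(dominates("e3", "e2"), dominates("e2", "e1"), dominates("e3", "e1"))
```

In this toy table, `e3` dominates `e2` and `e2` dominates `e1` only because of the missing values, while `e3` does not dominate `e1` on attribute `a`.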

Dominance relations need to be redefined in order to handle information systems containing missing values. Therefore, the generalized definitions of rough approximations will be utilized.

For an object $e_{i} \in E$ and $Q \subseteq X_{c}$, the Q-dominated set (denoted by $ᗡ_{Q}^{-} (e_{i})$) and the Q-dominating set (denoted by $ᗡ_{Q}^{+} (e_{i})$) corresponding to the relation $ᗡ_{Q}$, as well as $D_{Q}^{+} (e_{i})$ and $D_{Q}^{-} (e_{i})$, should also be considered. These sets are called negative and positive dominance cones with respect to $ᗡ_{Q}$, and their definitions are [23]:

$ᗡ_{Q}^{-} (e_{i}) = \{e_{j} \in E : e_{j} ᗡ_{Q} e_{i}\},$

(8)

$ᗡ_{Q}^{+} (e_{i}) = \{e_{j} \in E : e_{i} ᗡ_{Q} e_{j}\} .$

(9)

$ᗡ_{Q}^{-} (e_{i})$ is the set of objects dominated by $e_{i}$, whereas $ᗡ_{Q}^{+} (e_{i})$ is the set of objects dominating $e_{i}$. The important point to emphasize here is that the equalities $ᗡ_{Q}^{-} (e_{i}) = D_{Q}^{-} (e_{i})$ and $ᗡ_{Q}^{+} (e_{i}) = D_{Q}^{+} (e_{i})$ are always satisfied for each $e_{i} \in E$ in decision tables that do not contain missing attribute values.

Now, we present the generalized lower and upper approximations under incomplete information systems to address the missing values [13,23]. The generalized Q-lower and Q-upper approximations of the upward union of the decision classes $(C l_{n}^{\geq}, n = 2, \dots, p)$ are defined, respectively, as follows:

$\underline{Q} (C l_{n}^{\geq}) = \{e \in E : ᗡ_{Q}^{+} (e) \subseteq C l_{n}^{\geq}\},$

(10)

$\bar{Q} (C l_{n}^{\geq}) = \{e \in E : D_{Q}^{-} (e) \cap C l_{n}^{\geq} \neq \emptyset\} .$

(11)

Analogously, the generalized definitions of the Q-lower and Q-upper approximations of the downward union of the decision classes $(C l_{n}^{\leq}, n = 1, \dots, p - 1)$ are as follows:

$\underline{Q} (C l_{n}^{\leq}) = \{e \in E : D_{Q}^{-} (e) \subseteq C l_{n}^{\leq}\},$

(12)

$\bar{Q} (C l_{n}^{\leq}) = \{e \in E : ᗡ_{Q}^{+} (e) \cap C l_{n}^{\leq} \neq \emptyset\} .$

(13)

The definitions of the Q-boundary regions of the upward and downward unions of the decision classes are the same as Equations (6) and (7), respectively.
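The generalized approximations of Equations (10)–(13) can be sketched in Python under the “do not care” semantics. The four objects, their values, and their classes are hypothetical, all criteria are assumed gain-type, and the function names are illustrative only.

```python
MISSING = "*"

# Hypothetical incomplete table: object -> (criteria values, class).
TABLE = {
    "e1": ({"a": 3, "b": MISSING}, 2),
    "e2": ({"a": 2, "b": 2}, 2),
    "e3": ({"a": MISSING, "b": 1}, 1),
    "e4": ({"a": 1, "b": 1}, 1),
}
Q = ["a", "b"]


def geq_or_missing(u, v):
    return u == MISSING or v == MISSING or u >= v


def dominates(s, r):
    """s D_Q r."""
    return all(geq_or_missing(TABLE[s][0][q], TABLE[r][0][q]) for q in Q)


def is_dominated_by(s, r):
    """s ᗡ_Q r."""
    return all(geq_or_missing(TABLE[r][0][q], TABLE[s][0][q]) for q in Q)


def rev_plus(e):
    """ᗡ_Q^+(e) of Equation (9): objects e_j with e ᗡ_Q e_j."""
    return {o for o in TABLE if is_dominated_by(e, o)}


def d_minus(e):
    """D_Q^-(e): objects e_j with e D_Q e_j."""
    return {o for o in TABLE if dominates(e, o)}


def up(n):
    return {e for e, (_, c) in TABLE.items() if c >= n}


def down(n):
    return {e for e, (_, c) in TABLE.items() if c <= n}


def lower_up(n):    # Equation (10)
    return {e for e in TABLE if rev_plus(e) <= up(n)}


def upper_up(n):    # Equation (11)
    return {e for e in TABLE if d_minus(e) & up(n)}


def lower_down(n):  # Equation (12)
    return {e for e in TABLE if d_minus(e) <= down(n)}


def upper_down(n):  # Equation (13)
    return {e for e in TABLE if rev_plus(e) & down(n)}


print(lower_up(2), upper_up(2), lower_down(1), upper_down(1))
```

In this toy table, the boundary regions of $C l_{2}^{\geq}$ and $C l_{1}^{\leq}$ coincide and contain exactly the two objects with a missing value, `e1` and `e3`.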

Decision rules, which provide a concise description of the decision table, are derived from the objects in rough approximations. A decision rule consists of two parts: a cause clause and a decision clause. Decision rules are articulated using the quantifiers “at least” and “at most”, as attributes have preference-ordered domains. For instance, suppose we have the following decision rule in consideration:

IF Fever of a patient is at least 38.5 °C and loss of sense of taste is at most 60% THEN the patient has at least moderate degree of carrying the SARS-CoV-2 virus.

There are two elementary conditions in the decision rule. First, the patient’s fever is no less than 38.5 °C, and second, the loss of sense of taste is no more than 60%. The conjunction of these elementary conditions forms the cause clause of the decision rule. In the decision clause of the rule, it is stated that the virus-carrying status of a patient with the above characteristics will belong to a middle or higher-level decision class. In our study, we consider the exact decision rules induced from the objects in the lower approximations. The properties and syntax of these rules are as follows [24,25]:

Exact ${Dec}_{\geq}$-rule (Type-1) is extracted from the objects in $\underline{Q} (C l_{n}^{\geq})$ . Namely, objects that pertain to the lower approximation of the upward union of decision classes are positive, while all the others are negative.
IF $f (e, x_{c 1}) \geq r_{1}$ and $f (e, x_{c 2}) \geq r_{2}$ and ⋯ and $f (e, x_{c s}) \geq r_{s}$ THEN $e \in C l_{n}^{\geq}$ , where $\{x_{c 1}, x_{c 2}, \dots, x_{c s}\} \subseteq X_{c}, (r_{1}, r_{2}, \dots, r_{s}) \in V_{c 1} \times V_{c 2} \times \dots \times V_{c s}$ and $n = 2, \dots, p$ .
Exact ${Dec}_{\leq}$-rule (Type-3) is extracted from the objects in $\underline{Q} (C l_{n}^{\leq})$ . Namely, objects that pertain to the lower approximation of the downward union of decision classes are positive, while all the others are negative.
IF $f (e, x_{c 1}) \leq r_{1}$ and $f (e, x_{c 2}) \leq r_{2}$ and ⋯ and $f (e, x_{c s}) \leq r_{s}$ THEN $e \in C l_{n}^{\leq}$ , where $\{x_{c 1}, x_{c 2}, \dots, x_{c s}\} \subseteq X_{c}, (r_{1}, r_{2}, \dots, r_{s}) \in V_{c 1} \times V_{c 2} \times \dots \times V_{c s}$ and $n = 1, \dots, p - 1$ .

If all elementary conditions of a rule r are satisfied by the object e, then the rule r covers the object e. Furthermore, if the object e satisfies both the cause clause and the decision clause of the rule r, then the rule r is supported by the object e.

The DomLEM algorithm [26] needs to be modified in some ways in order to extract decision rules under information systems with the “do not care” kind of missing attribute values [18]. Elementary conditions must be created from the non-missing attribute values of the objects in the rough approximations. Moreover, when the value of an object on an attribute is missing, that object is covered by any elementary condition created from that attribute. In other words, if $f (e, x_{c}) = *$ for $e \in E$ and $x_{c} \in X_{c}$, then all elementary conditions created from the attribute $x_{c}$ cover the object e. The remaining portions of the algorithm are unchanged from the original version.

3. Material and Methods

Multi-criteria decision analysis includes selection and sorting problems, as well as classification problems. jMAF [27], Jamm [28], and 4emka [25] are all software tools for classification problems, while jRank [19] is a software tool for selection and sorting problems. These tools are convenient and useful in managing the decision-making process. The purpose of jRank is to rank objects in a decision table from best to worst and/or choose the best one [29]. In this section, we consider selection and sorting problems under incomplete decision tables and propose an adaptation of jRank to handle missing attribute values of the “do not care” type.

The dominance relation stands as the only objective information that can be utilized to compare objects [19]. However, many of the objects in the decision table might not be comparable to each other, as the dominance relation is a partial pre-order relation. To make it possible to compare chosen objects pairwise, this weakness of the dominance relation can be remedied with a domain expert. In this case, a pairwise comparison table (PCT) can be prepared by conducting a comprehensive comparison of the reference objects selected in the original decision table by a decision maker [19]. It should also be emphasized that the PCT is a decision table containing pairs of objects:

Pairs of objects $\{(e_{i}, e_{j}) : (e_{i} \in A) \land (e_{j} \in A) \land (A \subseteq E)\}$ are placed in the rows.
Derived attributes from the original ones are placed in the columns. In the PCT, $X_{P C T} = X_{P C T}^{C} \cup X_{P C T}^{D}$ , where $X_{P C T}^{C}$ and $X_{P C T}^{D}$ represent the set of condition attributes and the set of decision attributes, respectively.
$\bar{V}$ denotes the set of all values taken by the attributes in the PCT.
$g : (A \times A) \times X_{P C T} \to \bar{V}$ is an information function.

Dealing with ranking problems that involve missing values requires transforming the decision table into a PCT. We propose the following definitions to describe how the PCT is constructed from a decision table containing missing values:

The value of pairs of objects on a cardinal attribute is determined by the difference operation. If the value of objects on the cardinal attribute is not missing, then the difference of these values is taken. However, when at least one of the values of the objects on the cardinal attribute is missing, the object pair’s value on that attribute is also missing. This case is valid when the objects in a pair are different from each other. On the other hand, the difference operation consistently yields zero for a pair composed of identical objects.

Definition 1.

Let $e_{i}$ and $e_{j}$ be any two objects, $x_{C}$ be any cardinal attribute, and g and f be information functions in the PCT and in an ordinary DT (a DT that contains individual objects), respectively. Then, the transformation on the cardinal attribute for a pair of objects is defined in the following manner:

$g ((e_{i}, e_{j}), x_{C}) = \begin{cases} 0 & \text{if } i = j \\ f (e_{i}, x_{C}) - f (e_{j}, x_{C}) & \text{if } f (e_{i}, x_{C}) \neq * \text{ and } f (e_{j}, x_{C}) \neq * \text{ and } i \neq j \\ * & \text{if } (f (e_{i}, x_{C}) = * \text{ and/or } f (e_{j}, x_{C}) = *) \text{ and } i \neq j \end{cases}$

(14)

On the other hand, the value of a pair of objects on an ordinal attribute is the ordered pair of their values in the original decision table.

Definition 2.

Let $e_{i}$ and $e_{j}$ be any two objects, $x_{O}$ be any ordinal attribute, and g and f be information functions in the PCT and in an ordinary DT (a DT that contains individual objects), respectively. Then, the transformation on the ordinal attribute for a pair of objects is defined in the following manner:

$g ((e_{i}, e_{j}), x_{O}) = \begin{cases} (f (e_{i}, x_{O}), f (e_{j}, x_{O})) & \text{if } f (e_{i}, x_{O}) \neq * \text{ and } f (e_{j}, x_{O}) \neq * \\ (*, f (e_{j}, x_{O})) & \text{if } f (e_{i}, x_{O}) = * \text{ and } f (e_{j}, x_{O}) \neq * \\ (f (e_{i}, x_{O}), *) & \text{if } f (e_{i}, x_{O}) \neq * \text{ and } f (e_{j}, x_{O}) = * \\ (*, *) & \text{if } f (e_{i}, x_{O}) = * \text{ and } f (e_{j}, x_{O}) = * \end{cases}$

(15)
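Definitions 1 and 2 can be sketched together in Python to build the condition part of a PCT. The decision table, the attribute names `price` (cardinal) and `grade` (ordinal), and the helper names are hypothetical and serve only as an illustration.

```python
from itertools import product

MISSING = "*"

# Hypothetical decision table: "price" is cardinal, "grade" is ordinal.
DT = {
    "e1": {"price": 7.5, "grade": "high"},
    "e2": {"price": MISSING, "grade": "low"},
    "e3": {"price": 4.0, "grade": MISSING},
}
CARDINAL = ["price"]
ORDINAL = ["grade"]


def cardinal_value(i, j, attr, dt=DT):
    """Definition 1: 0 for a pair of identical objects, '*' if either
    value is missing, otherwise f(e_i, x) - f(e_j, x)."""
    if i == j:
        return 0
    vi, vj = dt[i][attr], dt[j][attr]
    if vi == MISSING or vj == MISSING:
        return MISSING
    return vi - vj


def ordinal_value(i, j, attr, dt=DT):
    """Definition 2: the ordered pair of the original values. A missing
    value is carried over as '*', which yields all four cases of
    Equation (15)."""
    return (dt[i][attr], dt[j][attr])


def build_pct(objects, dt=DT):
    """Condition part of the PCT for all ordered pairs of the objects."""
    pct = {}
    for i, j in product(objects, repeat=2):
        row = {a: cardinal_value(i, j, a, dt) for a in CARDINAL}
        row.update({a: ordinal_value(i, j, a, dt) for a in ORDINAL})
        pct[(i, j)] = row
    return pct


PCT = build_pct(["e1", "e2", "e3"])
print(PCT[("e1", "e3")])
```

For the pair `(e1, e3)`, the sketch yields a price difference of 3.5 and the ordinal pair `("high", "*")`, whereas every pair that combines `e2` with a different object has a missing price difference.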

The decision class of a pair of objects is related to either relation S (comprehensive outranking) or relation $S^{c}$ (comprehensive non-outranking), depending on the opinion of the decision maker. Given a pair of objects, these relations indicate whether or not the former is preferred over the latter for the decision maker. For example, $e_{1} S e_{2}$ states that object $e_{1}$ is at least as good as $e_{2}$, whereas $e_{1} S^{c} e_{2}$ states that object $e_{1}$ is at most as good as $e_{2}$.

Definition 3.

Let $e_{i}$ and $e_{j}$ be any two objects, $x_{D}$ be the decision attribute, and g be an information function in the PCT. Then, the decision class of a pair of objects is defined in the following manner:

$g ((e_{i}, e_{j}), x_{D}) = \begin{cases} S & \text{if } e_{i} S e_{j} \\ S^{c} & \text{if } e_{i} S^{c} e_{j} \end{cases}$

(16)

The set of condition attributes $X_{P C T}^{C}$ can consist of the union of three subsets: the set of regular attributes $X_{P C T}^{C, R}$ (attributes with no preference-ordered domain), the set of ordinal attributes $X_{P C T}^{C, O}$, and the set of cardinal attributes $X_{P C T}^{C, C}$. In our study, we ignore the set of regular attributes (i.e., $X_{P C T}^{C, R} = \emptyset$). For $Q \subseteq X_{P C T}^{C}$, we propose to generalize the dominance relations in [19] under an incomplete PCT as follows:

With respect to set Q, the pair of objects $(e_{x}, e_{y})$ dominates the pair of objects $(e_{w}, e_{z})$ (denoted by $(e_{x}, e_{y}) D_{Q} (e_{w}, e_{z})$) if and only if:

$[(e_{x}, e_{y}) D_{X_{P C T}^{C, C}} (e_{w}, e_{z}) \Leftrightarrow \forall q_{i} \in X_{P C T}^{C, C}, ((e_{x}, e_{y}) O_{q_{i}} (e_{w}, e_{z}) \text{ or } g ((e_{x}, e_{y}), q_{i}) = * \text{ or } g ((e_{w}, e_{z}), q_{i}) = *)]$

$\bigwedge$

$[(e_{x}, e_{y}) D_{X_{P C T}^{C, O}} (e_{w}, e_{z}) \Leftrightarrow \forall q_{i} \in X_{P C T}^{C, O}, (e_{x} O_{q_{i}} e_{w} \text{ or } f (e_{x}, q_{i}) = * \text{ or } f (e_{w}, q_{i}) = *) \land (e_{z} O_{q_{i}} e_{y} \text{ or } f (e_{z}, q_{i}) = * \text{ or } f (e_{y}, q_{i}) = *)] .$

With respect to set Q, the pair of objects $(e_{x}, e_{y})$ is dominated by the pair of objects $(e_{w}, e_{z})$ (denoted by $(e_{x}, e_{y}) ᗡ_{Q} (e_{w}, e_{z})$) if and only if:

$[(e_{x}, e_{y}) ᗡ_{X_{P C T}^{C, C}} (e_{w}, e_{z}) \Leftrightarrow \forall q_{i} \in X_{P C T}^{C, C}, ((e_{w}, e_{z}) O_{q_{i}} (e_{x}, e_{y}) \text{ or } g ((e_{x}, e_{y}), q_{i}) = * \text{ or } g ((e_{w}, e_{z}), q_{i}) = *)]$

$\bigwedge$

$[(e_{x}, e_{y}) ᗡ_{X_{P C T}^{C, O}} (e_{w}, e_{z}) \Leftrightarrow \forall q_{i} \in X_{P C T}^{C, O}, (e_{w} O_{q_{i}} e_{x} \text{ or } f (e_{w}, q_{i}) = * \text{ or } f (e_{x}, q_{i}) = *) \land (e_{y} O_{q_{i}} e_{z} \text{ or } f (e_{y}, q_{i}) = * \text{ or } f (e_{z}, q_{i}) = *)],$

where ∧ denotes the logical AND operator, and $D_{X_{P C T}^{C, C}}$, $ᗡ_{X_{P C T}^{C, C}}$ and $D_{X_{P C T}^{C, O}}$, $ᗡ_{X_{P C T}^{C, O}}$ denote dominance with respect to set $X_{P C T}^{C, C}$ and dominance with respect to set $X_{P C T}^{C, O}$, respectively. When the PCT contains no missing attribute values, one can easily observe that $(e_{x}, e_{y}) D_{Q} (e_{w}, e_{z})$ implies $(e_{w}, e_{z}) ᗡ_{Q} (e_{x}, e_{y})$ and vice versa. However, the dominance relations defined above may lose some properties under an incomplete PCT. Therefore, we provide the following generalized definitions of the rough approximations:

Given a pair of objects $(e_{x}, e_{y}) \in A \times A$ and a subset of condition attributes $Q \subseteq X_{P C T}^{C}$, positive and negative dominance cones with respect to relations $D_{Q}$ and $ᗡ_{Q}$ are as follows:

The set of objects that dominates the pair of objects $(e_{x}, e_{y})$ with respect to relation $D_{Q}$ is:

$D_{Q}^{+} (e_{x}, e_{y}) = \{(e_{w}, e_{z}) \in A \times A : (e_{w}, e_{z}) D_{Q} (e_{x}, e_{y})\} .$

(17)
The set of objects that dominates the pair of objects $(e_{x}, e_{y})$ with respect to relation $ᗡ_{Q}$ is:

$ᗡ_{Q}^{+} (e_{x}, e_{y}) = \{(e_{w}, e_{z}) \in A \times A : (e_{x}, e_{y}) ᗡ_{Q} (e_{w}, e_{z})\} .$

(18)
The set of objects that is dominated by the pair of objects $(e_{x}, e_{y})$ with respect to relation $D_{Q}$ is:

$D_{Q}^{-} (e_{x}, e_{y}) = \{(e_{w}, e_{z}) \in A \times A : (e_{x}, e_{y}) D_{Q} (e_{w}, e_{z})\} .$

(19)
The set of objects that is dominated by the pair of objects $(e_{x}, e_{y})$ with respect to relation $ᗡ_{Q}$ is:

$ᗡ_{Q}^{-} (e_{x}, e_{y}) = \{(e_{w}, e_{z}) \in A \times A : (e_{w}, e_{z}) ᗡ_{Q} (e_{x}, e_{y})\} .$

(20)

It should be noted that $D_{Q}^{+} (e_{x}, e_{y}) = ᗡ_{Q}^{+} (e_{x}, e_{y})$ and $D_{Q}^{-} (e_{x}, e_{y}) = ᗡ_{Q}^{-} (e_{x}, e_{y})$ are always valid when the PCT has no missing attribute values.
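The generalized dominance relations between pairs of objects and the cone of Equation (17) can be sketched in Python as follows. The sketch assumes gain-type attributes and numerically coded ordinal values; the attribute names `price_diff` and `grade`, the row layout, and the PCT rows themselves are hypothetical.

```python
MISSING = "*"


def geq_or_missing(u, v):
    return u == MISSING or v == MISSING or u >= v


def pair_dominates(p, r, cardinal, ordinal):
    """p D_Q r for PCT rows p = (e_x, e_y) and r = (e_w, e_z).
    Cardinal values are differences (or '*'); ordinal values are ordered
    pairs (value of first object, value of second object)."""
    card_ok = all(geq_or_missing(p[q], r[q]) for q in cardinal)
    ord_ok = all(
        geq_or_missing(p[q][0], r[q][0])      # e_x O_q e_w
        and geq_or_missing(r[q][1], p[q][1])  # e_z O_q e_y
        for q in ordinal
    )
    return card_ok and ord_ok


def pair_dominated_by(p, r, cardinal, ordinal):
    """p ᗡ_Q r for PCT rows p = (e_x, e_y) and r = (e_w, e_z)."""
    card_ok = all(geq_or_missing(r[q], p[q]) for q in cardinal)
    ord_ok = all(
        geq_or_missing(r[q][0], p[q][0])      # e_w O_q e_x
        and geq_or_missing(p[q][1], r[q][1])  # e_y O_q e_z
        for q in ordinal
    )
    return card_ok and ord_ok


def dominating_cone(key, pct, cardinal, ordinal):
    """D_Q^+ of Equation (17): the pairs that dominate pct[key]."""
    return {
        k for k, row in pct.items()
        if pair_dominates(row, pct[key], cardinal, ordinal)
    }


# Hypothetical PCT rows (condition part only).
PCT = {
    ("e1", "e2"): {"price_diff": 2.0, "grade": (3, 1)},
    ("e3", "e4"): {"price_diff": MISSING, "grade": (2, MISSING)},
    ("e5", "e6"): {"price_diff": 1.0, "grade": (3, 2)},
}
C, O = ["price_diff"], ["grade"]
print(dominating_cone(("e5", "e6"), PCT, C, O))
```

In this toy PCT, the pair `(e1, e2)` dominates `(e3, e4)` only because the missing entries are treated as “do not care”, and the dominating cone of `(e5, e6)` consists of `(e1, e2)` and `(e5, e6)` itself.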

Recall that there are two classes, S and $S^{c}$, in the pairwise comparison table. Since there are no other classes that are at least as good as S except itself, the upward union of the class S is equal to itself only (i.e., ${(S)}^{\geq} = S$). Similarly, the downward union of the class $S^{c}$ is also equal to itself (i.e., ${(S^{c})}^{\leq} = S^{c}$) because no other classes exist that are at most as good as it. We shall, therefore, provide the definitions of the lower and upper approximations for the decision classes themselves.

The Q-lower approximation of the outranking relation S, denoted by $\underline{Q} (S)$, and the Q-upper approximation of the outranking relation S, denoted by $\bar{Q} (S)$, are:

$\underline{Q} (S) = \{(e_{x}, e_{y}) \in A \times A : ᗡ_{Q}^{+} (e_{x}, e_{y}) \subseteq S\},$

(21)

$\bar{Q} (S) = \{(e_{x}, e_{y}) \in A \times A : D_{Q}^{-} (e_{x}, e_{y}) \cap S \neq \emptyset\} .$

(22)

The Q-lower approximation of the non-outranking relation $S^{c}$, denoted by $\underline{Q} (S^{c})$, and the Q-upper approximation of the non-outranking relation $S^{c}$, denoted by $\bar{Q} (S^{c})$, are:

$\underline{Q} (S^{c}) = \{(e_{x}, e_{y}) \in A \times A : D_{Q}^{-} (e_{x}, e_{y}) \subseteq S^{c}\},$

(23)

$\bar{Q} (S^{c}) = \{(e_{x}, e_{y}) \in A \times A : ᗡ_{Q}^{+} (e_{x}, e_{y}) \cap S^{c} \neq \emptyset\} .$

(24)

The Q-boundary region of the outranking relation S, denoted by $B N_{Q} (S)$, and the Q-boundary region of the non-outranking relation $S^{c}$, denoted by $B N_{Q} (S^{c})$, are as follows:

$B N_{Q} (S) = \bar{Q} (S) - \underline{Q} (S),$

(25)

$B N_{Q} (S^{c}) = \bar{Q} (S^{c}) - \underline{Q} (S^{c}) .$

(26)

Note that when $(e_{x}, e_{y}) D_{Q} (e_{w}, e_{z})$ implies $(e_{w}, e_{z}) ᗡ_{Q} (e_{x}, e_{y})$ and vice versa, then:

The lower and upper approximations of the outranking relation S, as defined in [19], are identical to the definitions of the lower and upper approximations provided by (21) and (22).
The lower and upper approximations of the non-outranking relation $S^{c}$ , as defined in [19], are identical to the definitions of the lower and upper approximations provided by (23) and (24).

Decision rules can be used to offer a broad description of the preference-ordered information in PCTs. Thus, a PCT can be viewed as the set of decision rules in the form of “IF {elementary condition(s)} THEN {decision(s)}”. A decision rule is simply a combination of the elementary condition(s) and the decision(s). There are three kinds of rules that can be induced from rough approximations: exact decision rules, possible decision rules, and approximate decision rules [26]. In our experimental work in the next section, we will employ exact decision rules, whose syntaxes are given below [19].

Decision rules derived from the objects belonging to the lower approximation of the outranking relation (Type-1):

IF $f (e_{x}, q_{i 1}) - f (e_{y}, q_{i 1}) \geq r_{1}$ and ⋯ and $f (e_{x}, q_{i e}) - f (e_{y}, q_{i e}) \geq r_{e}$ and $f (e_{x}, q_{i (e + 1)}) \geq r_{i (e + 1)}$ and $f (e_{y}, q_{i (e + 1)}) \leq s_{i (e + 1)}$ and ⋯ and $f (e_{x}, q_{i p}) \geq r_{i p}$ and $f (e_{y}, q_{i p}) \leq s_{i p}$ , THEN $e_{x} S e_{y}$ , where $X_{P C T}^{C, C} = \{q_{i 1}, \dots, q_{i e}\} \subseteq Q, X_{P C T}^{C, O} = \{q_{i (e + 1)}, \dots, q_{i p}\} \subseteq Q, (r_{1}, \dots, r_{p}) \in {\bar{V}}_{q_{i 1}} \times \dots \times {\bar{V}}_{q_{i p}}$ , and $(s_{i (e + 1)}, \dots, s_{i p}) \in {\bar{V}}_{q_{i (e + 1)}} \times \dots \times {\bar{V}}_{q_{i p}}$ .

Decision rules derived from the objects belonging to the lower approximation of the non-outranking relation (Type-3):

IF $f (e_{x}, q_{i 1}) - f (e_{y}, q_{i 1}) \leq r_{1}$ and ⋯ and $f (e_{x}, q_{i e}) - f (e_{y}, q_{i e}) \leq r_{e}$ and $f (e_{x}, q_{i (e + 1)}) \leq r_{i (e + 1)}$ and $f (e_{y}, q_{i (e + 1)}) \geq s_{i (e + 1)}$ and ⋯ and $f (e_{x}, q_{i p}) \leq r_{i p}$ and $f (e_{y}, q_{i p}) \geq s_{i p}$ , THEN $e_{x} S^{c} e_{y}$ , where $X_{P C T}^{C, C} = \{q_{i 1}, \dots, q_{i e}\} \subseteq Q, X_{P C T}^{C, O} = \{q_{i (e + 1)}, \dots, q_{i p}\} \subseteq Q, (r_{1}, \dots, r_{p}) \in {\bar{V}}_{q_{i 1}} \times \dots \times {\bar{V}}_{q_{i p}}$ , and $(s_{i (e + 1)}, \dots, s_{i p}) \in {\bar{V}}_{q_{i (e + 1)}} \times \dots \times {\bar{V}}_{q_{i p}}$ .

If a pair of objects $(e_{x}, e_{y}) \in A \times A$ satisfies all elementary conditions of a rule r, it is covered by the rule r. Moreover, if $(e_{x}, e_{y})$ is covered by the rule r and belongs to the decision class suggested by r, then it supports the rule r.

Our method introduces the changes needed for the algorithm to extract decision rules from a PCT with missing attribute values. These changes are similar to those in the modified DomLEM algorithm used in [18]. Two steps of the algorithm must be adapted: the construction of the set of candidate elementary conditions, and the test of whether a pair of objects is covered by the elementary conditions of a rule under an incomplete PCT. The set of candidate elementary conditions must consist of non-missing attribute values of the related objects. In other words, a candidate elementary condition can be created from attribute $q_{i}$ of a pair of objects $(e_{x}, e_{y})$ when $f (e_{x}, q_{i}) \neq *$ and $f (e_{y}, q_{i}) \neq *$ (i.e., $g ((e_{x}, e_{y}), q_{i}) \neq *$). On the other hand, a pair of objects meets all elementary conditions on the cardinal attribute $q_{i}$ if at least one of the values of the objects on $q_{i}$ is missing. Furthermore, a pair of objects meets all elementary conditions on the ordinal attribute $q_{i}$ if both of the values of the objects on $q_{i}$ are missing. When only one of the values of the objects on the ordinal attribute $q_{i}$ is known, the object that has a missing value on $q_{i}$ satisfies all the elementary conditions created from that attribute. To determine whether the pair of objects is covered by the elementary conditions created from $q_{i}$, the other object, whose value on $q_{i}$ is known, must still be tested against these elementary conditions. The rest of the algorithm is the same as the original one.
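The coverage test described above can be sketched as follows. This is a minimal illustration rather than the modified DomLEM implementation: the data layout (a `"diff"` slot for cardinal attributes, `"x"` and `"y"` slots for ordinal ones), the helper names, the numeric coding of comfort, the pair (L1, L9), and the price condition are all assumptions made for the example.

```python
import operator

MISSING = None  # stands for the '*' ("do not care") symbol
COMFORT = {"Basic": 0, "Medium": 1, "Good": 2}  # assumed coding, worst to best
OPS = {">=": operator.ge, "<=": operator.le}


def holds(value, op, threshold):
    """A missing value satisfies every elementary condition; a known one is tested."""
    return value is MISSING or OPS[op](value, threshold)


def covers(rule, pair):
    """rule: list of (attribute, slot, op, threshold) elementary conditions.

    slot is "diff" for a cardinal attribute (difference of evaluations) and
    "x" or "y" for the two components of an ordinal attribute.
    """
    return all(holds(pair[attr][slot], op, thr) for attr, slot, op, thr in rule)


def candidate_attributes(pair):
    """Attributes whose values are all known, so conditions may be built from them."""
    return [a for a, slots in pair.items()
            if all(v is not MISSING for v in slots.values())]


# Row 4 of Table 3, (L1, L6): distance -47, price missing, comfort (Good, Medium).
pair_4 = {"q1": {"diff": -47}, "q2": {"diff": MISSING},
          "q3": {"x": COMFORT["Good"], "y": COMFORT["Medium"]}}
# Hypothetical pair (L1, L9) built from Table 2, where L9 has no comfort value.
pair_l1_l9 = {"q1": {"diff": 3 - 15}, "q2": {"diff": 60 - 50},
              "q3": {"x": COMFORT["Good"], "y": MISSING}}

rule_1 = [("q3", "x", ">=", COMFORT["Good"]), ("q3", "y", "<=", COMFORT["Medium"])]
price_rule = [("q2", "diff", "<=", -10)]  # hypothetical condition on price

print(covers(rule_1, pair_4))        # both comfort conditions are tested and hold
print(covers(rule_1, pair_l1_l9))    # L9's missing comfort passes, L1's value is tested
print(covers(price_rule, pair_4))    # a missing cardinal value meets any condition
print(candidate_attributes(pair_4))  # price is excluded as a source of conditions
```

Storing the ordinal attribute as two separate slots keeps the "one value known, one missing" case simple: the known component is tested and the missing one passes automatically.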

4. Experimental Results and Discussion

In this section, we consider the incomplete version of the house location problem given in [19]. This example illustrates the consistency of the generalized relations in our proposed approach under the semantics of ‘do not care’ type missing values. Table 2 was obtained by deleting some of the attribute values of several locations from the decision table associated with the original house location problem. In this incomplete decision table, both distance and price are cardinal attributes, whereas comfort is an ordinal attribute. The symbols ‘↓’ and ‘↑’ stand for cost-type attributes and gain-type attributes, respectively. As can be seen from Table 2, distance ($q_{1}$) and price ($q_{2}$) are cost-type attributes, while comfort ($q_{3}$) is a gain-type attribute. Regarding the domain of the attribute comfort, medium is worse than good but better than basic. The opinion of the decision maker on the first seven referent objects (L1–L7) in the decision table is illustrated in Figure 1 in the form of a graph. Each node represents an object in the house location problem (such as L1, L2, etc.), and the arc between two nodes represents the preference information between the objects joined by that arc. For example, node 1 stands for the city Poznan. The arc between node 1 and node 2 means that the city Poznan is at least as good as the city Kapalica (i.e., L1 $S$ L2). It is also assumed that the preference information provided by the decision maker is symmetric (i.e., L1 $S$ L2 ⇔ L2 $S^{c}$ L1).

The pairwise comparison table in accordance with the preference information shown in Figure 1 is presented in Table 3. For instance, the attribute value for the pair of objects (Poznan, Malbork) in terms of distance was obtained by subtracting the distance value of the latter from that of the former (3 − 50 = −47). Since the value of the location Malbork on the cardinal attribute price is missing, the value of row 4 on this attribute in the PCT is also missing. On the other hand, the value of comfort for the pair of objects (Poznan, Malbork) in the PCT is expressed as an ordered pair. As seen from Figure 1, the location Poznan is preferred over the location Malbork by the decision maker, so the decision class of this pair is the outranking relation S. The Q-dominated sets for the relation $D_{Q}$ and the Q-dominating sets for the relation $ᗡ_{Q}$ are presented in Appendix A. From Equations (23) and (24), the Q-lower and Q-upper approximations of the non-outranking relation $S^{c}$ were computed as:

$\underline{Q} (S^{c}) = \{13, 14, 15, 16, 17, 18, 19, 20, 21, 22\},$

$\bar{Q} (S^{c}) = \{11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 27, 29, 30, 31\} .$

From Equations (21) and (22), the Q-lower and Q-upper approximations of the outranking relation S were computed as:

$\underline{Q} (S) = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 26, 28\},$

$\bar{Q} (S) = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 23, 24, 25, 26, 27, 28, 29, 30, 31\} .$

By means of the lower and upper approximations of the decision classes in the PCT, the pairs of objects in the Q-boundary regions are:

$B N_{Q} (S) = B N_{Q} (S^{c}) = \{11, 12, 23, 24, 27, 29, 30, 31\} .$
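These approximations can be re-derived mechanically from the sets listed in Appendix A and the decision column of Table 3. The sketch below applies Equations (21)–(26) directly; the variable names are illustrative assumptions, while the set contents are copied from Appendix A.1 and A.2.

```python
S = set(range(1, 13)) | set(range(25, 32))  # rows labelled S in Table 3
SC = set(range(13, 25))                     # rows labelled S^c in Table 3

T = {11, 14, 15, 16, 17, 20, 21, 22, 24, 27, 29, 30, 31}  # shared by pairs 27, 29-31
D_MINUS = {  # Appendix A.1: Q-dominated sets with respect to D_Q
    1: {1, 15, 16, 18, 19, 20},
    2: {2, 5, 11, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31},
    3: {2, 3, 5, 9, 10, 11, 13, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27,
        28, 29, 30, 31},
    4: {1, 2, 4, 5, 6, 8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
        25, 26, 27, 29, 30, 31},
    5: {5, 11, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31},
    6: {6, 11, 13, 16, 22, 23, 24},
    7: {6, 7, 10, 11, 13, 16, 22, 23, 24},
    8: {2, 5, 6, 8, 11, 13, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31},
    9: {9, 11, 14, 15, 16, 17, 20, 21, 22, 24, 27, 28, 29, 30, 31},
    10: {9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 27, 28, 29, 30, 31},
    11: {11, 12, 14, 15, 16, 17, 18, 19, 20, 22, 24},
    12: {11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 27, 29, 30, 31},
    13: {13, 16}, 14: {14, 15, 16, 20}, 15: {15}, 16: {16},
    17: {14, 15, 16, 17, 20}, 18: {16, 18, 19, 20}, 19: {19}, 20: {16, 20},
    21: {15, 21, 22}, 22: {15, 19, 22},
    23: {11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 27, 29, 30, 31},
    24: {11, 16, 22, 23, 24},
    25: {14, 15, 16, 17, 20, 25, 26}, 26: {14, 15, 16, 17, 20, 25, 26},
    27: T, 28: {15, 21, 22, 28}, 29: T, 30: T, 31: T,
}

V = {2, 3, 4, 5, 8, 9, 10, 12, 23, 27, 29, 30, 31}  # shared by pairs 27, 29-31
REV_PLUS = {  # Appendix A.2: Q-dominating sets with respect to the reversed relation
    1: {1, 4}, 2: {2, 3, 4, 8}, 3: {3}, 4: {4}, 5: {2, 3, 4, 5, 8},
    6: {4, 6, 7, 8}, 7: {7}, 8: {4, 8}, 9: {3, 9, 10}, 10: {3, 7, 10},
    11: {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 23, 24, 27, 29, 30, 31},
    12: {4, 10, 11, 12, 23},
    13: {3, 4, 6, 7, 8, 13},
    14: {2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 17, 23, 25, 26, 27, 29, 30, 31},
    15: {1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 15, 17, 21, 22, 23, 25, 26, 27, 28,
         29, 30, 31},
    16: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 23, 24,
         25, 26, 27, 29, 30, 31},
    17: {2, 3, 4, 5, 8, 9, 10, 11, 12, 17, 23, 25, 26, 27, 29, 30, 31},
    18: {1, 4, 10, 11, 12, 18, 23},
    19: {1, 4, 10, 11, 12, 18, 19, 22, 23},
    20: {1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 17, 18, 20, 23, 25, 26, 27, 29, 30, 31},
    21: {2, 3, 4, 5, 8, 9, 10, 12, 21, 23, 27, 28, 29, 30, 31},
    22: {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 21, 22, 23, 24, 27, 28, 29, 30, 31},
    23: {2, 3, 4, 5, 6, 7, 8, 10, 12, 23, 24},
    24: {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 23, 24, 27, 29, 30, 31},
    25: {2, 3, 4, 5, 8, 25, 26}, 26: {2, 3, 4, 5, 8, 25, 26},
    27: V, 28: {3, 9, 10, 28}, 29: V, 30: V, 31: V,
}

lower_S = {p for p, cone in REV_PLUS.items() if cone <= S}    # Eq. (21)
upper_S = {p for p, cone in D_MINUS.items() if cone & S}      # Eq. (22)
lower_SC = {p for p, cone in D_MINUS.items() if cone <= SC}   # Eq. (23)
upper_SC = {p for p, cone in REV_PLUS.items() if cone & SC}   # Eq. (24)
boundary_S = upper_S - lower_S                                # Eq. (25)
boundary_SC = upper_SC - lower_SC                             # Eq. (26)

print(sorted(boundary_S))  # [11, 12, 23, 24, 27, 29, 30, 31]
```

Both boundary regions come out identical, matching the set reported above.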

Six exact decision rules induced from the Q-lower approximations of S and $S^{c}$ are listed below. The first three decision rules assign pairs to the decision class S, and the last three rules assign pairs to the decision class $S^{c}$.

IF $f (e_{x}, q_{3}) \geq$ Good and $f (e_{y}, q_{3}) \leq$ Medium THEN $e_{x} S e_{y}$ .
IF $f (e_{x}, q_{3}) \geq$ Basic and $f (e_{y}, q_{3}) \leq$ Basic THEN $e_{x} S e_{y}$ .
IF $f (e_{x}, q_{1}) - f (e_{y}, q_{1}) \leq 0$ and $f (e_{x}, q_{3}) \geq$ Good and $f (e_{y}, q_{3}) \leq$ Good THEN $e_{x} S e_{y}$ .
IF $f (e_{x}, q_{3}) \leq$ Medium and $f (e_{y}, q_{3}) \geq$ Good THEN $e_{x} S^{c} e_{y}$ .
IF $f (e_{x}, q_{3}) \leq$ Basic and $f (e_{y}, q_{3}) \geq$ Medium THEN $e_{x} S^{c} e_{y}$ .
IF $f (e_{x}, q_{1}) - f (e_{y}, q_{1}) \geq 32$ and $f (e_{x}, q_{3}) \geq$ Good and $f (e_{y}, q_{3}) \leq$ Good THEN $e_{x} S^{c} e_{y}$ .

Considering the assignment part of the rules that cover the pairs of objects, a preference graph was generated using MATLAB and is presented in Figure 2. Nodes in this graph represent all objects in the house location problem, and directed arcs between any two nodes indicate either the relation S or the relation $S^{c}$. An S-arc between a pair of locations means that this pair of locations is covered by a rule that suggests the assignment to decision class S. Similarly, an $S^{c}$-arc between a pair of locations means that this pair of locations is covered by a rule that suggests the assignment to decision class $S^{c}$. Directed blue arcs between two locations symbolize the outranking relation S, while directed red arcs symbolize the non-outranking relation $S^{c}$. Each directed arc in the graph has a weight of 1.

Six ranking techniques can be applied to the preference graph depicted in Figure 2 to obtain the final ranking of the locations in the decision table [19]. In this experimental study, the Net Flow Score, which is the default technique of the jRank tool, was used, and the final ranking is provided in Table 4. The Net Flow Score is denoted by $N F S (S, S^{c})$ and calculated as follows [19]:

We shall first define the terms positive and negative scores. A positive score refers to the sum of the number of $S^{c}$-arcs entering a node and the number of S-arcs leaving that node. A negative score refers to the sum of the number of $S^{c}$-arcs leaving a node and the number of S-arcs entering that node. The Net Flow Score for any node is defined as the difference between its positive score and its negative score. Thus,

$N F S (S, S^{c}) = P S (S, S^{c}) - N S (S, S^{c}),$

(27)

where $P S$ and $N S$ are the functions giving the positive and the negative scores, respectively. For example, consider the location Lublin (L10). As shown in Figure 2, this location has 10 $S^{c}$-arcs entering and 10 S-arcs leaving, so its positive score is 20 (i.e., $P S (S, S^{c}) = 20$). On the other hand, it has 3 S-arcs entering and 10 $S^{c}$-arcs leaving, resulting in a negative score of 13 (i.e., $N S (S, S^{c}) = 13$). Therefore, $N F S (S, S^{c}) = 20 - 13 = 7$.
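The score in Equation (27) amounts to counting arcs, which can be sketched as follows. The function name `net_flow_scores` and the three-node graph are hypothetical and not taken from Figure 2; only the Lublin counts in the last lines come from the text.

```python
from collections import Counter


def net_flow_scores(nodes, s_arcs, sc_arcs):
    """Return {node: (PS, NS, NFS)} for directed S-arcs and S^c-arcs of weight 1."""
    s_out = Counter(src for src, _ in s_arcs)
    s_in = Counter(dst for _, dst in s_arcs)
    sc_out = Counter(src for src, _ in sc_arcs)
    sc_in = Counter(dst for _, dst in sc_arcs)
    scores = {}
    for n in nodes:
        ps = s_out[n] + sc_in[n]  # S-arcs leaving + S^c-arcs entering
        ns = s_in[n] + sc_out[n]  # S-arcs entering + S^c-arcs leaving
        scores[n] = (ps, ns, ps - ns)
    return scores


# Hypothetical three-node preference graph.
s_arcs = [("a", "b"), ("a", "c")]    # a S b, a S c
sc_arcs = [("b", "a"), ("c", "b")]   # b S^c a, c S^c b
scores = net_flow_scores(["a", "b", "c"], s_arcs, sc_arcs)
print(scores)  # {'a': (3, 0, 3), 'b': (1, 2, -1), 'c': (0, 2, -2)}

# Arc counts quoted in the text for Lublin (L10).
lublin_nfs = (10 + 10) - (3 + 10)
print(lublin_nfs)  # 7
```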

As observed from Table 4, locations are sorted in descending order according to their Net Flow Scores. Among the locations, Poznan has the highest score, while Warszawa has the lowest score. Therefore, Poznan and Warszawa are the best and worst locations, respectively. Five locations have the same Net Flow Score: rank 6 is shared by Krakow, Malbork, Gdansk, Kornik, and Torun.
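The ranks in Table 4 correspond to a dense ranking of the Net Flow Scores, in which tied locations share a rank and no rank is skipped. A minimal sketch, assuming the scores are copied from Table 4 and using an illustrative helper name `dense_rank`:

```python
# Net Flow Scores copied from Table 4.
nfs = {"L1": 17, "L2": 12, "L10": 7, "L5": -1, "L9": -2,
       "L3": -3, "L6": -3, "L7": -3, "L8": -3, "L11": -3, "L4": -18}


def dense_rank(scores):
    """Rank 1 is the highest score; tied objects share a rank and no rank is skipped."""
    distinct = sorted(set(scores.values()), reverse=True)
    position = {score: i + 1 for i, score in enumerate(distinct)}
    return {obj: position[score] for obj, score in scores.items()}


ranks = dense_rank(nfs)
tied = sorted(obj for obj, rank in ranks.items() if rank == 6)
print(ranks["L1"], ranks["L4"])  # 1 7
print(tied)                      # ['L11', 'L3', 'L6', 'L7', 'L8']
```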

By comparing the incomplete decision table with the decision table presented in [19], we can make some important inferences. For instance, in the decision table provided in [19], the value of the location Rogalin for comfort is basic, which is the worst value in the domain of the attribute comfort. However, the value of Rogalin for comfort is missing in Table 2. Since this missing attribute value is of the “do not care” kind, it can take any value within the domain of comfort, and every such value is at least as good as basic. As a result, the location Rogalin has a better score in this experimental study than in [19]. Furthermore, Wroclaw receives almost the best value in the domain of the attribute distance in the decision table provided in [19]. The value of Wroclaw for the attribute distance, however, is missing in Table 2, and this missing value can take any value within the domain of distance. Almost all of these values are no better than the value Wroclaw receives in [19]. This leads to the Net Flow Score of Wroclaw being lower than its Net Flow Score computed in [19].

A ‘do not care’ type of missing attribute value can take any possible value within its domain. The absence of an attribute value for an object therefore directly affects its final ranking score, and an increase in the number of missing attribute values can influence the final score either positively or negatively. The effect depends strongly on where the removed values lie within the attribute domain. When the values that are missing mostly correspond to the worst (or best) values of the attribute domain, the object’s score will increase (or decrease) compared to the table without missing values.

Our approach reduces the computational cost of several phases, such as rule extraction and the computation of dominating and dominated sets, by reducing the number of comparison operations. For example, in dominating and dominated set computations, when an object is compared with other objects on an attribute and its value on that attribute is missing, the object is considered to dominate all other objects with respect to that attribute, without any comparison being necessary.

Remark 1.

Computational efficiency is improved by reducing the number of comparisons.

A strong correlation exists between the final ranking and the consistent and inconsistent pairs of objects in the PCT. The final ranking of consistent pairs aligns with the decision maker’s opinion-based ranking. For instance, in the PCT, the object pair (Poznan, Kapalica) is consistent, and as anticipated, Poznan scores higher than Kapalica in the final ranking. On the other hand, (Gdansk, Wroclaw) is an inconsistent pair of objects, and in the final ranking these two locations are ranked contrary to the decision maker’s opinion. As seen in Table 4, Wroclaw’s ranking is at least as good as Gdansk’s. A similar case can also be observed for the other inconsistent pair (Gdansk, Malbork). This indicates that the findings of our methodology align with those of the study [30].

5. Conclusions

Data collected to support decision-making processes may in some cases contain missing values. Although imputation techniques allow data analysis on incomplete datasets, they can lead to data distortion. Rough set theory is able to tackle the data distortion issue by appropriately adapting the relations it employs to construct rough approximations. In the literature, there is no ranking method for information systems with missing attribute values that uses the dominance-based rough set approach. Motivated by this gap, in this paper we present how a multi-criteria ranking method based on the dominance-based rough set approach can be applied to an information system with missing values of the “do not care” type. Its impact on the final ranking of objects is studied and discussed in detail through an example. Experimental results showed that an object with missing attribute values may experience a change, either an increase or a decrease, in its ranking score compared to when all attribute values are present. The results suggest that when the values that are missing are close to the worst values within the domain of the attributes, the score tends to increase, whereas if they are almost the best, the score tends to decrease. Moreover, the final ranking supports the decision maker’s choices for consistent pairs, but opposes those for inconsistent pairs. Our approach also reduces the number of comparison operations, which lowers the computational cost.

Author Contributions

Conceptualization, A.T.; Methodology, A.T.; Formal Analysis, A.T., N.G.B. and Y.U.; Software, A.T.; Writing—Original Draft Preparation, A.T., N.G.B. and Y.U.; Visualization, A.T.; Writing—Review and Editing, A.T., N.G.B. and Y.U.; Supervision, N.G.B. and Y.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The authors confirm that the data that support the findings of this study are available within the article. Raw data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This research has been supported by Yildiz Technical University, Scientific Research Coordinatorship under project code FYL-2020-4001.

Conflicts of Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Abbreviations

The following abbreviations are used in this manuscript:

DT	Decision Table
PCT	Pairwise Comparison Table
DRSA	Dominance-based Rough Set Approach
SVM	Support Vector Machines
OLM	Ordinal Learning Model
OSDL	Ordinal Stochastic Dominance Learner
VC	Variable Consistency
NFS	Net Flow Score
PS	Positive Score
NS	Negative Score

Appendix A

Appendix A.1. Q-Dominated Sets with Respect to Relation D_Q

Pair #1: 1, 15, 16, 18, 19, 20
Pair #2: 2, 5, 11, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31
Pair #3: 2, 3, 5, 9, 10, 11, 13, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
Pair #4: 1, 2, 4, 5, 6, 8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31
Pair #5: 5, 11, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31
Pair #6: 6, 11, 13, 16, 22, 23, 24
Pair #7: 6, 7, 10, 11, 13, 16, 22, 23, 24
Pair #8: 2, 5, 6, 8, 11, 13, 14, 15, 16, 17, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31
Pair #9: 9, 11, 14, 15, 16, 17, 20, 21, 22, 24, 27, 28, 29, 30, 31
Pair #10: 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 27, 28, 29, 30, 31
Pair #11: 11, 12, 14, 15, 16, 17, 18, 19, 20, 22, 24
Pair #12: 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 27, 29, 30, 31
Pair #13: 13, 16
Pair #14: 14, 15, 16, 20
Pair #15: 15
Pair #16: 16
Pair #17: 14, 15, 16, 17, 20
Pair #18: 16, 18, 19, 20
Pair #19: 19
Pair #20: 16, 20
Pair #21: 15, 21, 22
Pair #22: 15, 19, 22
Pair #23: 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 27, 29, 30, 31
Pair #24: 11, 16, 22, 23, 24
Pair #25: 14, 15, 16, 17, 20, 25, 26
Pair #26: 14, 15, 16, 17, 20, 25, 26
Pair #27: 11, 14, 15, 16, 17, 20, 21, 22, 24, 27, 29, 30, 31
Pair #28: 15, 21, 22, 28
Pair #29: 11, 14, 15, 16, 17, 20, 21, 22, 24, 27, 29, 30, 31
Pair #30: 11, 14, 15, 16, 17, 20, 21, 22, 24, 27, 29, 30, 31
Pair #31: 11, 14, 15, 16, 17, 20, 21, 22, 24, 27, 29, 30, 31

Appendix A.2. Q-Dominating Sets with Respect to Relation ᗡ_Q

Pair #1: 1, 4
Pair #2: 2, 3, 4, 8
Pair #3: 3
Pair #4: 4
Pair #5: 2, 3, 4, 5, 8
Pair #6: 4, 6, 7, 8
Pair #7: 7
Pair #8: 4, 8
Pair #9: 3, 9, 10
Pair #10: 3, 7, 10
Pair #11: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 23, 24, 27, 29, 30, 31
Pair #12: 4, 10, 11, 12, 23
Pair #13: 3, 4, 6, 7, 8, 13
Pair #14: 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 17, 23, 25, 26, 27, 29, 30, 31
Pair #15: 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 15, 17, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31
Pair #16: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 23, 24, 25, 26, 27, 29, 30, 31
Pair #17: 2, 3, 4, 5, 8, 9, 10, 11, 12, 17, 23, 25, 26, 27, 29, 30, 31
Pair #18: 1, 4, 10, 11, 12, 18, 23
Pair #19: 1, 4, 10, 11, 12, 18, 19, 22, 23
Pair #20: 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 17, 18, 20, 23, 25, 26, 27, 29, 30, 31
Pair #21: 2, 3, 4, 5, 8, 9, 10, 12, 21, 23, 27, 28, 29, 30, 31
Pair #22: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 21, 22, 23, 24, 27, 28, 29, 30, 31
Pair #23: 2, 3, 4, 5, 6, 7, 8, 10, 12, 23, 24
Pair #24: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 23, 24, 27, 29, 30, 31
Pair #25: 2, 3, 4, 5, 8, 25, 26
Pair #26: 2, 3, 4, 5, 8, 25, 26
Pair #27: 2, 3, 4, 5, 8, 9, 10, 12, 23, 27, 29, 30, 31
Pair #28: 3, 9, 10, 28
Pair #29: 2, 3, 4, 5, 8, 9, 10, 12, 23, 27, 29, 30, 31
Pair #30: 2, 3, 4, 5, 8, 9, 10, 12, 23, 27, 29, 30, 31
Pair #31: 2, 3, 4, 5, 8, 9, 10, 12, 23, 27, 29, 30, 31

References

1. Mishra, S. Decision-making under risk: Integrating perspectives from biology, economics, and psychology. Personal. Soc. Psychol. Rev. 2014, 18, 280–307.
2. Zavadskas, E.K.; Turskis, Z. Multiple criteria decision making (MCDM) methods in economics: An overview. Technol. Econ. Dev. Econ. 2011, 17, 397–427.
3. Von Winterfeldt, D. Bridging the gap between science and decision making. Proc. Natl. Acad. Sci. USA 2013, 110 (Suppl. S3), 14055–14061.
4. Choi, T.M.; Chan, H.K.; Yue, X. Recent development in big data analytics for business operations and risk management. IEEE Trans. Cybern. 2016, 47, 81–92.
5. Provost, F. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2013; Volume 355.
6. Ramezani, R.; Maadi, M.; Khatami, S.M. A novel hybrid intelligent system with missing value imputation for diabetes diagnosis. Alex. Eng. J. 2018, 57, 1883–1891.
7. Strike, K.; El Emam, K.; Madhavji, N. Software cost estimation with incomplete data. IEEE Trans. Softw. Eng. 2001, 27, 890–908.
8. Beretta, L.; Santaniello, A. Nearest neighbor imputation algorithms: A critical evaluation. BMC Med. Inform. Decis. Mak. 2016, 16, 197–208.
9. Lin, W.C.; Tsai, C.F. Missing value imputation: A review and analysis of the literature (2006–2017). Artif. Intell. Rev. 2020, 53, 1487–1509.
10. Khan, M.A. A Comparative Study on Imputation Techniques: Introducing a Transformer Model for Robust and Efficient Handling of Missing EEG Amplitude Data. Bioengineering 2024, 11, 740.
11. Vidal-Paz, J.; Rodríguez-Gómez, B.A.; Orosa, J.A. A Comparison of Different Methods for Rainfall Imputation: A Galician Case Study. Appl. Sci. 2023, 13, 12260.
12. Pawlak, Z. Rough set theory and its applications to data analysis. Cybern. Syst. 1998, 29, 661–688.
13. Greco, S.; Matarazzo, B.; Slowinski, R. Dealing with missing data in rough set analysis of multi-attribute and multi-criteria decision problems. In Decision Making: Recent Developments and Worldwide Applications; Springer: Berlin/Heidelberg, Germany, 2000; pp. 295–316.
14. Kryszkiewicz, M. Rough set approach to incomplete information systems. Inf. Sci. 1998, 112, 39–49.
15. Stefanowski, J.; Tsoukias, A. Incomplete information tables and rough classification. Comput. Intell. 2001, 17, 545–566.
16. Wang, G. Extension of rough set under incomplete information systems. In Proceedings of the 2002 IEEE World Congress on Computational Intelligence, 2002 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE’02. Proceedings (Cat. No. 02CH37291), Honolulu, HI, USA, 12–17 May 2002; IEEE: Piscataway, NJ, USA, 2002; Volume 2, pp. 1098–1103.
17. Szeląg, M.; Błaszczyński, J.; Słowiński, R. Rough set analysis of classification data with missing values. In Proceedings of the Rough Sets: International Joint Conference, IJCRS 2017, Olsztyn, Poland, 3–7 July 2017; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2017; pp. 552–565.
18. Błaszczyński, J.; Słowiński, R.; Szeląg, M. Induction of ordinal classification rules from incomplete data. In Proceedings of the Rough Sets and Current Trends in Computing: 8th International Conference, RSCTC 2012, Chengdu, China, 17–20 August 2012; Proceedings 8. Springer: Berlin/Heidelberg, Germany, 2012; pp. 56–65.
19. Szeląg, M.; Słowiński, R.; Greco, S.; Błaszczyński, J.; Wilk, S. jRank—Ranking using Dominance-based Rough Set Approach. Newsl. Eur. Work. Group Mult. Criteria Decis. Aiding 2010, 3, 13–15.
20. Greco, S.; Matarazzo, B.; Slowinski, R. Rough sets theory for multicriteria decision analysis. Eur. J. Oper. Res. 2001, 129, 1–47.
21. Slowinski, R.; Greco, S.; Matarazzo, B. Rough set and rule-based multicriteria decision aiding. Pesqui. Oper. 2012, 32, 213–270.
22. Uçan, Y.; Topal, A.; Bayazit, N.G. A new method for obtaining the inconsistent elements in a decision table based on dominance principle. Turk. J. Math. 2020, 44, 561–568.
23. Greco, S.; Matarazzo, B.; Słowinski, R. Handling missing values in rough set analysis of multi-attribute and multi-criteria decision problems. In Proceedings of the New Directions in Rough Sets, Data Mining, and Granular-Soft Computing: 7th International Workshop, RSFDGrC’99, Yamaguchi, Japan, 9–11 November 1999; Proceedings 7. Springer: Berlin/Heidelberg, Germany, 1999; pp. 146–157.
24. Greco, S.; Matarazzo, B.; Slowinski, R. A new rough set approach to evaluation of bankruptcy risk. In Operational Tools in the Management of Financial Risks; Springer: Berlin/Heidelberg, Germany, 1998; pp. 121–136.
25. Greco, S.; Matarazzo, B.; Slowinski, R. Multicriteria Classification by Dominance-Based Rough Set Approach; Politechnika Poznańska: Poznan, Poland, 2000.
26. Greco, S.; Matarazzo, B.; Slowinski, R.; Stefanowski, J. An algorithm for induction of decision rules consistent with the dominance principle. In Rough Sets and Current Trends in Computing, Proceedings of the Second International Conference, RSCTC 2000, Banff, AB, Canada, 16–19 October 2000; Revised Papers 2; Springer: Berlin/Heidelberg, Germany, 2001; pp. 304–313.
27. Błaszczyński, J.; Greco, S.; Matarazzo, B.; Słowiński, R.; Szelag, M. jMAF-Dominance-based rough set data analysis framework. In Rough Sets and Intelligent Systems-Professor Zdzisław Pawlak in Memoriam: Volume 1; Springer: Berlin/Heidelberg, Germany, 2013; pp. 185–209.
28. Slowinski, R. The International Summer School on MCDM 2006. Class Note. Kainan University, Taiwan. Software. 2006. Available online: https://fcds.cs.put.poznan.pl/IDSS/software/jamm.htm (accessed on 10 June 2024).
29. Alvarez, P.A.; Ishizaka, A.; Martinez, L. Multiple-criteria decision-making sorting methods: A survey. Expert Syst. Appl. 2021, 183, 115368.
30. Szeląg, M.S. Application of the Dominance-Based Rough Set Approach to Ranking and Similarity-Based Classification Problems. Ph.D. Dissertation, Poznań University of Technology, Poznan, Poland, 2015.

Figure 1. Opinion of the decision maker for the first seven referent objects [19].

Figure 2. Preference graph for incomplete house location problem.

Table 1. A sample decision table.

	Condition and Decision Attributes ( $X = X_{c} \cup X_{d}$ )
Objects ( $E$ )	$x_{c 1}$	$x_{c 2}$	$x_{c 3}$	$x_{d}$
$e_{1}$	$f (e_{1}, x_{c 1})$	$f (e_{1}, x_{c 2})$	$f (e_{1}, x_{c 3})$	$f (e_{1}, x_{d})$
$e_{2}$	$f (e_{2}, x_{c 1})$	$f (e_{2}, x_{c 2})$	$f (e_{2}, x_{c 3})$	$f (e_{2}, x_{d})$
$e_{3}$	$f (e_{3}, x_{c 1})$	$f (e_{3}, x_{c 2})$	$f (e_{3}, x_{c 3})$	$f (e_{3}, x_{d})$

Table 2. Incomplete information system for house location problem.

	Distance ( $q_{1}, ↓$ )	Price ( $q_{2}, ↓$ )	Comfort ( $q_{3}, ↑$ )
L1-Poznan	3	60	Good
L2-Kapalica	35	30	Good
L3-Krakow	7	85	Medium
L4-Warszawa	10	90	Basic
L5-Wroclaw	*	60	Medium
L6-Malbork	50	*	Medium
L7-Gdansk	5	70	Medium
L8-Kornik	50	40	Medium
L9-Rogalin	15	50	*
L10-Lublin	*	60	Good
L11-Torun	100	50	Medium

Note: Asterisk (*) indicates the ‘do not care’ type missing value for the respective attributes.

Table 3. PCT for the house location problem under missing attribute values.

No	Pair	Distance	Price	Comfort	Decision
1	(L1, L2)	−32	30	(Good, Good)	S
2	(L1, L3)	−4	−25	(Good, Medium)	S
3	(L1, L4)	−7	−30	(Good, Basic)	S
4	(L1, L6)	−47	*	(Good, Medium)	S
5	(L1, L7)	−2	−10	(Good, Medium)	S
6	(L2, L3)	28	−55	(Good, Medium)	S
7	(L2, L4)	25	−60	(Good, Basic)	S
8	(L2, L6)	−15	*	(Good, Medium)	S
9	(L3, L4)	−3	−5	(Medium, Basic)	S
10	(L5, L4)	*	−30	(Medium, Basic)	S
11	(L7, L5)	*	10	(Medium, Medium)	S
12	(L7, L6)	−45	*	(Medium, Medium)	S
13	(L2, L1)	32	−30	(Good, Good)	$S^{c}$
14	(L3, L1)	4	25	(Medium, Good)	$S^{c}$
15	(L4, L1)	7	30	(Basic, Good)	$S^{c}$
16	(L6, L1)	47	*	(Medium, Good)	$S^{c}$
17	(L7, L1)	2	10	(Medium, Good)	$S^{c}$
18	(L3, L2)	−28	55	(Medium, Good)	$S^{c}$
19	(L4, L2)	−25	60	(Basic, Good)	$S^{c}$
20	(L6, L2)	15	*	(Medium, Good)	$S^{c}$
21	(L4, L3)	3	5	(Basic, Medium)	$S^{c}$
22	(L4, L5)	*	30	(Basic, Medium)	$S^{c}$
23	(L5, L7)	*	−10	(Medium, Medium)	$S^{c}$
24	(L6, L7)	45	*	(Medium, Medium)	$S^{c}$
25	(L1, L1)	0	0	(Good, Good)	S
26	(L2, L2)	0	0	(Good, Good)	S
27	(L3, L3)	0	0	(Medium, Medium)	S
28	(L4, L4)	0	0	(Basic, Basic)	S
29	(L5, L5)	0	0	(Medium, Medium)	S
30	(L6, L6)	0	0	(Medium, Medium)	S
31	(L7, L7)	0	0	(Medium, Medium)	S
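The condition part of a row of Table 3 can be reproduced from Table 2 along the following lines. This is a sketch under stated assumptions: the names `DT`, `diff`, and `pct_row` are illustrative, only the five locations needed for the two printed rows are included, and a missing value is represented by `None`.

```python
MISSING = None  # the '*' entries of Table 2

# (distance, price, comfort) for a few referent objects, copied from Table 2.
DT = {
    "L1": (3, 60, "Good"),
    "L2": (35, 30, "Good"),
    "L5": (MISSING, 60, "Medium"),
    "L6": (50, MISSING, "Medium"),
    "L7": (5, 70, "Medium"),
}


def diff(a, b):
    """Cardinal attribute: difference of evaluations, missing if either value is missing."""
    return MISSING if a is MISSING or b is MISSING else a - b


def pct_row(x, y):
    """Condition part of the PCT row for the ordered pair (x, y)."""
    (dx, px, cx), (dy, py, cy) = DT[x], DT[y]
    return diff(dx, dy), diff(px, py), (cx, cy)


print(pct_row("L1", "L6"))  # (-47, None, ('Good', 'Medium')), row 4 of Table 3
print(pct_row("L7", "L5"))  # (None, 10, ('Medium', 'Medium')), row 11 of Table 3
```

The ordinal attribute is kept as an ordered pair rather than a difference, so a missing comfort value on one side leaves the other side available for testing.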

Table 4. Final rankings of the objects according to Net Flow Score.

Rank	Locations	$NFS (S, S^{c})$
1	L1	17
2	L2	12
3	L10	7
4	L5	−1
5	L9	−2
6	L3, L6, L7, L8, L11	−3
7	L4	−18

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Topal, A.; Guler Bayazit, N.; Ucan, Y. A Method to Handle the Missing Values in Multi-Criteria Sorting Problems Based on Dominance Rough Sets. Mathematics 2024, 12, 2944. https://doi.org/10.3390/math12182944
