1. Introduction
Decision trees are used in many areas of computer science as a means for knowledge representation, as classifiers, and as algorithms to solve different problems of combinatorial optimization, computational geometry, etc. [1,2,3]. They are studied, in particular, in test theory initiated by Chegis and Yablonskii [4], rough set theory initiated by Pawlak [5,6,7], and exact learning initiated by Angluin [8,9]. These theories are closely related: attributes from rough set theory and test theory correspond to membership queries from exact learning. Exact learning additionally studies so-called equivalence queries. The notion of a "minimally adequate teacher" that allows both membership and equivalence queries was discussed by Angluin in Reference [10]. Relations between exact learning and PAC learning, proposed by Valiant [11], are discussed in Reference [8].
In this paper, which is an extension of two conference papers [12,13], we add the notion of a hypothesis to the model that has been considered in rough set theory, as well as in test theory. This model allows us to use an analog of equivalence queries. Our goal is to check whether it is possible to reduce the time and space complexity of decision trees if we additionally use hypotheses. Decision trees with lower complexity are more understandable and more suitable as a means for knowledge representation. Note that, to improve understandability, we should try to minimize not only the number of nodes in a decision tree but also its depth, which is an unimprovable upper bound on the number of conditions describing the objects accepted by a path from the root to a terminal node of the tree. In this paper, we concentrate only on the complexity of decision trees and do not study many recent problems considered in machine learning [14,15,16,17].
Let T be a decision table with n conditional attributes having values from the set in which rows are pairwise different, and each row is labeled with a decision from . For a given row of T, we should recognize the decision attached to this row. To this end, we can use decision trees based on two types of queries. We can ask about the value of an attribute on the given row. We will obtain an answer of the kind , where is the number in the intersection of the given row and the column . We can also ask if a hypothesis is true, where are numbers from the columns , respectively. Either this hypothesis will be confirmed or we obtain a counterexample in the form , where , and is a number from the column different from . The considered hypothesis is called proper if is a row of the table T.
In this paper, we study four cost functions that characterize the complexity of decision trees: the depth, the number of realizable nodes relative to T, the number of realizable terminal nodes relative to T, and the number of working nodes. We consider the depth of a decision tree as its time complexity, which is equal to the maximum number of queries in a path from the root to a terminal node of the tree. The remaining three cost functions characterize the space complexity of decision trees. A node is called realizable relative to T if, for a row of T and some choice of counterexamples, the computation in the tree will pass through this node. Note that, in the considered trees, all working nodes are realizable.
Decision trees using hypotheses can be essentially more efficient than decision trees using only attributes. Let us consider an example: the problem of computation of the conjunction x_1 ∧ ⋯ ∧ x_n. The minimum depth of a decision tree solving this problem using the attributes is equal to n. The minimum number of realizable nodes in such decision trees is equal to 2n + 1, the minimum number of working nodes is equal to n, and the minimum number of realizable terminal nodes is equal to n + 1. However, the minimum depth of a decision tree solving this problem using proper hypotheses is equal to 1: it is enough to ask only about the hypothesis x_1 = 1, …, x_n = 1. If it is true, then the considered conjunction is equal to 1. Otherwise, we obtain a counterexample x_i = 0 for some i, and the conjunction is equal to 0. The obtained decision tree contains one working node and n + 1 realizable terminal nodes, altogether n + 2 realizable nodes.
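To make the comparison concrete, here is a small Python sketch (hypothetical code, not from the paper) that simulates both query strategies for the conjunction x_1 ∧ ⋯ ∧ x_n: the attribute-only strategy asks up to n membership queries, while the hypothesis-based strategy asks a single query and either receives a confirmation or a counterexample x_i = 0.

```python
# Sketch: comparing query strategies for computing x_1 AND ... AND x_n.
# The row is a tuple of 0/1 values; "queries" counts the questions asked.

def attribute_strategy(row):
    """Ask about attributes x_1, x_2, ... until a 0 is found (depth n in the worst case)."""
    queries = 0
    for value in row:
        queries += 1
        if value == 0:
            return 0, queries        # conjunction is 0
    return 1, queries                # all values are 1

def hypothesis_strategy(row):
    """Ask a single query about the proper hypothesis x_1 = 1, ..., x_n = 1."""
    queries = 1
    zero_positions = [i for i, value in enumerate(row) if value == 0]
    if not zero_positions:
        return 1, queries            # hypothesis confirmed
    # Otherwise the answer is a counterexample x_i = 0, so the conjunction is 0.
    return 0, queries

if __name__ == "__main__":
    for row in [(1, 1, 1, 1), (1, 0, 1, 1)]:
        print(row, attribute_strategy(row), hypothesis_strategy(row))
```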
We study the following five types of decision trees:
1. Decision trees that use only attributes.
2. Decision trees that use only hypotheses.
3. Decision trees that use both attributes and hypotheses.
4. Decision trees that use only proper hypotheses.
5. Decision trees that use both attributes and proper hypotheses.
For each cost function, we propose a dynamic programming algorithm that, for a given decision table and a given type of decision trees, finds the minimum cost of a decision tree of the considered type for this table. Note that dynamic programming algorithms for the optimization of decision trees of the type 1 were studied in Reference [18] for decision tables with one-valued decisions and in Reference [19] for decision tables with many-valued decisions. Dynamic programming algorithms for the optimization of decision trees of all five types were studied in References [12,13] for the depth and for the number of realizable nodes.
It is interesting to consider not only specially chosen examples such as the conjunction of n variables. For each cost function, we compute the minimum cost of a decision tree of each of the considered five types for eight decision tables from the UCI ML Repository [20]. We do the same for randomly generated Boolean functions with n variables for several values of n.
From the obtained experimental results, it follows that, generally, the decision trees of the types 3 and 5 have less complexity than the decision trees of the type 1. Therefore, such decision trees can be useful as a means for knowledge representation. Decision trees of the types 2 and 4 have, generally, too many nodes.
Based on the experimental results, we formulate and prove the following hypothesis: for any decision table, we can construct a decision tree with the minimum number of realizable terminal nodes using only attributes.
The motivation for this work is related to the use of decision trees for knowledge representation: we try to reduce the complexity of decision trees (and improve their understandability) by using hypotheses. The main achievements of the work are the following: (i) we have proposed dynamic programming algorithms for optimizing five types of decision trees relative to four cost functions, and (ii) we have shown cases in which the use of hypotheses leads to a decrease in the complexity of decision trees.
2. Decision Tables
A decision table is a table T with columns filled with numbers. Columns of this table are labeled with conditional attributes f_1, …, f_n. Rows of the table are pairwise different. Each row is labeled with a number that is interpreted as a decision. Rows of the table are interpreted as tuples of values of the conditional attributes.
Each decision table can be represented by a word (sequence) over the alphabet {0, 1, ;, |}: numbers are given in binary representation, the symbol ";" is used to separate two numbers, and the symbol "|" is used to separate two rows (for each row, we add the corresponding decision as the last number in the row). The length of this word is called the size of the decision table.
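The following Python sketch (a hypothetical helper, not from the paper) encodes a decision table as such a word and computes its size under the encoding just described.

```python
# Sketch: encode a decision table as a word over {0, 1, ";", "|"} and compute its size.
# rows: list of tuples of attribute values; decisions: list of decisions (one per row).

def table_to_word(rows, decisions):
    """Return the word encoding the table; the decision is appended to each row."""
    encoded_rows = []
    for row, decision in zip(rows, decisions):
        numbers = list(row) + [decision]
        encoded_rows.append(";".join(format(x, "b") for x in numbers))
    return "|".join(encoded_rows)

def table_size(rows, decisions):
    """The size of the table is the length of its word encoding."""
    return len(table_to_word(rows, decisions))

if __name__ == "__main__":
    rows = [(0, 1), (1, 0), (1, 1)]
    decisions = [0, 0, 1]
    print(table_to_word(rows, decisions))  # "0;1;0|1;0;0|1;1;1"
    print(table_size(rows, decisions))
```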
A decision table T is called empty if it has no rows. The table T is called degenerate if it is empty or all rows of T are labeled with the same decision.
We consider the set of decisions attached to the rows of T. For any conditional attribute f_i, we consider the set of values of the attribute f_i in the table T. We also consider the set of conditional attributes of T that have at least two different values in the table T.
A system of equations over T is an arbitrary equation system of the kind {f_{i_1} = δ_1, …, f_{i_m} = δ_m}, where f_{i_1}, …, f_{i_m} are conditional attributes of T and δ_1, …, δ_m are numbers (if m = 0, then the considered equation system is empty).
Let T be a nonempty table. A subtable of T is a table obtained from T by the removal of some rows. With each equation system S over T, we associate a subtable of the table T. If the system S is empty, then this subtable coincides with T. Let S be nonempty and S = {f_{i_1} = δ_1, …, f_{i_m} = δ_m}. Then, the corresponding subtable consists of the rows of T that, in the intersection with the columns f_{i_1}, …, f_{i_m}, have the numbers δ_1, …, δ_m, respectively. Such nonempty subtables, including the table T, are called separable subtables of T. We denote by SEP(T) the set of separable subtables of the table T.
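As an illustration, here is a small Python sketch (hypothetical code, with rows represented as tuples) that builds the subtable corresponding to an equation system and checks degeneracy.

```python
# Sketch: subtables of a decision table defined by equation systems.
# A table is a list of (row, decision) pairs; an equation system is a dict
# mapping attribute indices to required values, e.g. {0: 1, 2: 0}.

def subtable(table, system):
    """Keep the rows that satisfy every equation of the system."""
    return [(row, d) for row, d in table
            if all(row[i] == v for i, v in system.items())]

def is_degenerate(table):
    """A table is degenerate if it is empty or all its rows share one decision."""
    return len({d for _, d in table}) <= 1

if __name__ == "__main__":
    table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    print(subtable(table, {0: 1}))                 # rows with the first attribute equal to 1
    print(is_degenerate(subtable(table, {0: 0})))  # True: all decisions are 0
```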
3. Decision Trees
Let T be a nonempty decision table with n conditional attributes f_1, …, f_n. We consider decision trees with two types of queries. We can choose an attribute f_i and ask about its value; the answers to this query are the values of f_i in the table T. We can also formulate a hypothesis H over T that assigns a value to each of the attributes f_1, …, f_n and ask about this hypothesis; the answers to this query are H itself and the counterexamples of the form f_i = σ, where σ is a value of f_i in the table T different from the value assigned by H to f_i. The answer H means that the hypothesis is true. The hypothesis H is called proper for T if the tuple of values it assigns is a row of the table T.
A decision tree over T is a marked finite directed tree with the root in which:
Each terminal node is labeled with a number from the set .
Each node that is not terminal (such nodes are called working) is labeled with an attribute or with a hypothesis over T.
If a working node is labeled with an attribute, then, for each answer to the corresponding query, there is exactly one edge labeled with this answer that leaves this node, and there are no other edges leaving this node.
If a working node is labeled with a hypothesis over T, then, for each answer to the corresponding query, there is exactly one edge labeled with this answer that leaves this node, and there are no other edges leaving this node.
Let us consider a decision tree over T and a node v of this tree. We now define an equation system over T associated with the node v: consider the directed path from the root to the node v; if there are no working nodes in this path, then the associated system is the empty system; otherwise, it is the union of the equation systems attached to the edges of this path.
A decision tree over T is called a decision tree for T if, for any node v of this tree,
The node v is terminal if and only if the subtable of T corresponding to the equation system associated with v is degenerate.
If v is a terminal node and this subtable is empty, then the node v is labeled with the decision 0.
If v is a terminal node and this subtable is nonempty, then the node v is labeled with the decision attached to all rows of this subtable.
A complete path in a decision tree is an arbitrary directed path from the root to a terminal node. As the time complexity of a decision tree, we consider its depth, which is the maximum number of working nodes in a complete path in the tree or, equivalently, the maximum length of a complete path in the tree.
As the space complexity of a decision tree, we consider the number of its nodes that are realizable relative to T. A node v of the tree is called realizable relative to T if and only if the subtable corresponding to it is nonempty. We also consider two more cost functions related to the space complexity: the number of terminal nodes that are realizable relative to T and the number of working nodes. Note that all working nodes of a decision tree for T are realizable relative to T.
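A minimal Python sketch (a hypothetical representation, not from the paper) may help fix these four cost functions. It assumes attribute-only trees, represents a tree as nested dictionaries, and propagates the set of rows reaching each node in order to decide realizability.

```python
# Sketch: the four cost functions for an attribute-only decision tree.
# A terminal node is {"decision": d}; a working node is
# {"attribute": i, "children": {value: subtree, ...}}.

def costs(tree, rows):
    """Return (depth, realizable nodes, realizable terminal nodes, working nodes)."""
    if "decision" in tree:
        realizable = 1 if rows else 0
        return 0, realizable, realizable, 0
    i = tree["attribute"]
    depth, nodes, terminals, working = 0, 1, 0, 1   # the working node itself
    for value, child in tree["children"].items():
        child_rows = [r for r in rows if r[i] == value]
        d, n, t, w = costs(child, child_rows)
        depth = max(depth, 1 + d)
        nodes += n if child_rows else 0   # only realizable nodes are counted
        terminals += t
        working += w
    return depth, nodes, terminals, working

if __name__ == "__main__":
    # Tree computing x_0 AND x_1: test x_0, then x_1.
    tree = {"attribute": 0, "children": {
        0: {"decision": 0},
        1: {"attribute": 1, "children": {0: {"decision": 0}, 1: {"decision": 1}}}}}
    rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(costs(tree, rows))  # (2, 5, 3, 2)
```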
We will use the following notation:
For k ∈ {1, …, 5}, the minimum depth of a decision tree of the type k for T.
For k ∈ {1, …, 5}, the minimum number of nodes realizable relative to T in a decision tree of the type k for T.
For k ∈ {1, …, 5}, the minimum number of terminal nodes realizable relative to T in a decision tree of the type k for T.
For k ∈ {1, …, 5}, the minimum number of working nodes in a decision tree of the type k for T.
4. Construction of Directed Acyclic Graph
Let T be a nonempty decision table with n conditional attributes f_1, …, f_n. We now describe an algorithm for the construction of a directed acyclic graph (DAG) that will be used for the study of decision trees. Nodes of this graph are separable subtables of the table T. During each iteration, we process one node. We start with the graph that consists of one node T, which is not processed, and finish when all nodes of the graph are processed. This algorithm can be considered as a special case of the algorithm for DAG construction considered in Reference [18].
Algorithm (construction of the DAG).
Input: A nonempty decision table T with n conditional attributes f_1, …, f_n.
Output: The directed acyclic graph for T.
Step 1. Construct the graph that consists of one node T, which is not marked as processed.
Step 2. If all nodes of the graph are processed, then the algorithm halts and returns the resulting graph. Otherwise, choose a node (table) Θ that has not been processed yet.
Step 3. If Θ is degenerate, then mark the node Θ as processed and proceed to Step 2. If Θ is not degenerate, then, for each attribute f_i that has at least two values in Θ, draw a bundle of edges from the node Θ: if δ_1, …, δ_k are the values of f_i in Θ, then draw k edges from Θ, label these edges with the equation systems {f_i = δ_1}, …, {f_i = δ_k}, and let them enter the nodes corresponding to the subtables of Θ defined by these systems, respectively. If some of these nodes are not present in the graph, then add them to the graph. Mark the node Θ as processed and return to Step 2.
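The DAG can be built by a straightforward memoized traversal of separable subtables. The following Python sketch (hypothetical code, not the authors' implementation) follows the steps of the algorithm, with subtables keyed by their tuples of rows.

```python
# Sketch: construct the DAG of separable subtables of a decision table.
# A table is a tuple of (row, decision) pairs; edges are stored per node.

def build_dag(table):
    """Return {node: {(attribute, value): child_node}} over separable subtables."""
    n = len(table[0][0])                       # number of conditional attributes
    dag, stack = {}, [tuple(table)]
    while stack:                               # nodes still to be processed
        node = stack.pop()
        if node in dag:
            continue
        dag[node] = {}
        if len({d for _, d in node}) <= 1:     # degenerate: no outgoing edges
            continue
        for i in range(n):
            values = {row[i] for row, _ in node}
            if len(values) < 2:                # skip attributes constant on this node
                continue
            for v in values:                   # bundle of edges labeled {f_i = v}
                child = tuple((row, d) for row, d in node if row[i] == v)
                dag[node][(i, v)] = child
                stack.append(child)
    return dag

if __name__ == "__main__":
    table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    dag = build_dag(table)
    print(len(dag), "separable subtables (nodes of the DAG)")
```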
The following statement about the time complexity of the algorithm follows immediately from Proposition 3.3 of Reference [18].
Proposition 1. The time complexity of the algorithm is bounded from above by a polynomial in the size of the input table T and the number of different separable subtables of T.
In general, the time complexity of the algorithm is exponential in the size of the input decision tables. Note that, in Section 3.4 of the book [18], classes of decision tables are described for each of which the number of separable subtables of decision tables from the class is bounded from above by a polynomial in the number of columns of the tables. For each of these classes, the time complexity of the algorithm is polynomial in the size of the input decision tables.
Note that similar results can be obtained for the space complexity of the considered algorithm.
5. Minimizing the Depth
In this section, we consider some results obtained in Reference [12]. Let T be a nonempty decision table with n conditional attributes f_1, …, f_n. We can use the DAG to compute the minimum depths of decision trees of the five types for T. Let k ∈ {1, …, 5}. To find the minimum depth of a decision tree of the type k for T, we compute, for each node Θ of the DAG, the minimum depth of a decision tree of the type k for Θ. It will be convenient for us to consider not only subtables that are nodes of the DAG but also the empty subtable of T and the subtables that contain only one row r of T and are not nodes of the DAG. We begin with these special subtables and the terminal nodes of the DAG (nodes without leaving edges), which are degenerate separable subtables of T, and step-by-step move to the table T.
Let Θ be a terminal node of the DAG or a subtable that contains only one row r of T. Then, the minimum depth is equal to 0: the decision tree that contains only one node labeled with the decision attached to all rows of Θ is a decision tree for Θ. If Θ is empty, then the minimum depth is also equal to 0: the decision tree that contains only one node labeled with 0 will be considered as a decision tree for Θ.
Let Θ be a nonterminal node of the DAG such that, for each child of Θ, we already know the minimum depth of a decision tree of the type k for this child. Based on this information, we can find the minimum depth of a decision tree for Θ that uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled:
With an attribute that is not constant on Θ (we consider the minimum depth of such a decision tree).
With a hypothesis over T (we consider the minimum depth of such a decision tree).
With a proper hypothesis over T (we consider the minimum depth of such a decision tree).
Since Θ is nondegenerate, the set of attributes that are not constant on Θ is nonempty. We now describe three procedures for computing these three values; we refer to them as the attribute case, the hypothesis case, and the proper hypothesis case, respectively.
Let us consider a decision tree for Θ in which the root is labeled with an attribute f_i. For each value δ of this attribute, there is an edge that leaves the root and enters a node; this edge is labeled with the equation system {f_i = δ}, and the entered node is the root of a decision tree of the type k for the corresponding subtable of Θ. It is clear that the depth of the considered decision tree is equal to one plus the maximum of the depths of these subtrees, and the subtrees corresponding to values that do not occur in Θ have depth 0. Evidently, for any value of f_i occurring in Θ, the corresponding subtable is a child of Θ in the DAG, i.e., we already know the minimum depth of a decision tree of the type k for it.
One can show that one plus the maximum of these minimum depths is the minimum depth of a decision tree for Θ in which the root is labeled with the attribute f_i and which uses, for the subtables corresponding to the children of the root, decision trees of the type k.
We should not consider attributes that are constant on Θ since, for each such attribute, the subtable corresponding to its only value coincides with Θ, i.e., based on this attribute, we cannot construct an optimal decision tree for Θ. As a result, the value for the attribute case is the minimum, over the attributes that are not constant on Θ, of the values described above.
Computation of the attribute case. Construct the set of attributes that are not constant on Θ. For each attribute from this set, compute one plus the maximum of the already-computed minimum depths of the children of Θ corresponding to its values. The value for the attribute case is the minimum of the computed numbers.
Remark 2. Let Θ be a nonterminal node of the DAG such that, for each child of Θ, we already know the corresponding minimum depth. Then, the procedure of computation described above has polynomial time complexity depending on the size of the decision table T.
A hypothesis over T is called admissible for Θ and an attribute f_i if none of the counterexample answers for f_i leaves the subtable Θ unchanged. The hypothesis H is not admissible for Θ and an attribute f_i if and only if f_i has only one value on Θ and the value assigned to f_i by H is different from it. The hypothesis H is called admissible for Θ if it is admissible for Θ and any attribute f_i.
Let us consider a decision tree for Θ in which the root is labeled with a hypothesis H that is admissible for Θ. The set of answers for the query corresponding to the hypothesis H consists of H itself and of all counterexamples. For each answer S, there is an edge that leaves the root and enters a node; this edge is labeled with the equation system S, and the entered node is the root of a decision tree of the type k for the corresponding subtable of Θ. It is clear that the depth of the considered decision tree is equal to one plus the maximum of the depths of these subtrees. The subtable corresponding to the answer H is either empty or contains only one row r of T; therefore, the corresponding subtree has depth 0. Since H is admissible for Θ, no counterexample leaves the subtable Θ unchanged. Moreover, the subtables corresponding to counterexamples whose values do not occur in Θ are empty, and the corresponding subtrees also have depth 0. Therefore, the depth of the considered decision tree is equal to one plus the maximum of the depths of the subtrees corresponding to the counterexamples whose values occur in Θ (this maximum is considered to be 0 if there are no such counterexamples). It is clear that, for any attribute that is not constant on Θ and any of its values occurring in Θ, the corresponding subtable is a child of Θ in the DAG, i.e., we already know the minimum depth of a decision tree of the type k for it.
One can show that the value described above is the minimum depth of a decision tree for Θ in which the root is labeled with the hypothesis H and which uses, for the subtables corresponding to the children of the root, decision trees of the type k.
We should not consider hypotheses that are not admissible for Θ since, for each such hypothesis H, the corresponding query has an answer that leaves the subtable Θ unchanged, i.e., based on this hypothesis, we cannot construct an optimal decision tree for Θ.
Computation of the hypothesis case. First, we construct a hypothesis H that assigns a value to each conditional attribute as follows. If an attribute has only one value on Θ, then H assigns this value to it. If an attribute has at least two values on Θ, then H assigns to it the minimum of these values for which the corresponding child of Θ has the maximum already-computed minimum depth. It is clear that H is admissible for Θ. Compute the value for H as described above. Simple analysis shows that this value is the minimum over all hypotheses admissible for Θ, i.e., it is the value for the hypothesis case.
Remark 3. Let Θ be a nonterminal node of the DAG such that, for each child of Θ, we already know the corresponding minimum depth. Then, the procedure of computation described above has polynomial time complexity depending on the size of the decision table T.
Computation of the proper hypothesis case. For each row r of the decision table T, we check if the corresponding proper hypothesis is admissible for Θ. For each proper hypothesis admissible for Θ, we compute the value as described above. One can show that the minimum among the obtained numbers is the value for the proper hypothesis case.
Remark 4. Let Θ be a nonterminal node of the DAG such that, for each child of Θ, we already know the corresponding minimum depth. Then, the procedure of computation described above has polynomial time complexity depending on the size of the decision table T.
We now describe an algorithm that, for a given nonempty decision table T and a given k ∈ {1, …, 5}, calculates the minimum depth of a decision tree of the type k for the table T. During the work of this algorithm, we find, for each node Θ of the DAG, the minimum depth of a decision tree of the type k for Θ.
Algorithm (computation of the minimum depth).
Input: A nonempty decision table T, its directed acyclic graph, and a number k ∈ {1, …, 5}.
Output: The minimum depth of a decision tree of the type k for T.
Step 1. If a number is attached to each node of the DAG, then return the number attached to the node T and halt the algorithm. Otherwise, choose a node Θ of the graph without an attached number, which is either a terminal node of the DAG or a nonterminal node for which all children have attached numbers.
Step 2. If Θ is a terminal node, then attach to it the number 0 and proceed to Step 1. If Θ is not a terminal node, then, depending on the value k, do the following: in the case k = 1, compute the attribute case value and attach it to Θ; in the case k = 2, compute the hypothesis case value and attach it to Θ; in the case k = 3, compute both the attribute case and the hypothesis case values and attach their minimum to Θ; in the case k = 4, compute the proper hypothesis case value and attach it to Θ; in the case k = 5, compute both the attribute case and the proper hypothesis case values and attach their minimum to Θ. Proceed to Step 1.
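The bottom-up computation can be sketched in Python as follows. This is hypothetical code, not the authors' implementation: it treats subtables as tuples of (row, decision) pairs, uses memoization in place of the explicit DAG, assumes the recurrences described above (one plus the maximum over children for an attribute query; one plus the maximum over realizable counterexample answers for a hypothesis query), and handles only the types k = 1 and k = 3 for brevity.

```python
# Sketch: minimum depth of a decision tree for a table, with and without hypotheses.
from functools import lru_cache

def min_depth(table, use_hypotheses):
    """Minimum depth of a decision tree for `table` (type 1 or type 3 trees)."""
    n = len(table[0][0])

    @lru_cache(maxsize=None)
    def h(node):
        if len({d for _, d in node}) <= 1:          # degenerate subtable: one terminal node
            return 0
        branching = [i for i in range(n) if len({r[i] for r, _ in node}) >= 2]
        # Attribute query: 1 + max over the children of the chosen attribute.
        best = min(1 + max(h(child(node, i, v)) for v in values(node, i))
                   for i in branching)
        if use_hypotheses:
            # Hypothesis query: assign to each branching attribute the value whose
            # child is deepest; the remaining (counterexample) children bound the depth.
            worst = 0
            for i in branching:
                depths = sorted(h(child(node, i, v)) for v in values(node, i))
                worst = max(worst, depths[-2])      # deepest child is excluded by the choice
            best = min(best, 1 + worst)
        return best

    def values(node, i):
        return {r[i] for r, _ in node}

    def child(node, i, v):
        return tuple((r, d) for r, d in node if r[i] == v)

    return h(tuple(table))

if __name__ == "__main__":
    conj = [((a, b, c), a & b & c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    print(min_depth(conj, False), min_depth(conj, True))   # expected: 3 and 1
```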
Using Remarks 2–4, one can prove the following statement.
Proposition 5. The time complexity of the algorithm is bounded from above by a polynomial in the size of the input table T and the number of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
6. Minimizing the Number of Realizable Nodes
In this section, we consider some results obtained in Reference [13]. Let T be a nonempty decision table with n conditional attributes f_1, …, f_n. We can use the DAG to compute the minimum numbers of realizable nodes in decision trees of the five types for T. Let k ∈ {1, …, 5}. To find this value for T, we compute, for each node Θ of the DAG, the minimum number of nodes realizable relative to Θ in a decision tree of the type k for Θ. We will consider not only subtables that are nodes of the DAG but also the empty subtable of T and the subtables that contain only one row r of T and are not nodes of the DAG. We begin with these special subtables and the terminal nodes of the DAG (nodes without leaving edges), which are degenerate separable subtables of T, and step-by-step move to the table T.
Let Θ be a terminal node of the DAG or a subtable that contains only one row r of T. Then, the minimum number of realizable nodes is equal to 1: the decision tree that contains only one node labeled with the decision attached to all rows of Θ is a decision tree for Θ, and the only node of this tree is realizable relative to Θ. If Θ is empty, then this number is equal to 0: the decision tree that contains only one node labeled with 0 will be considered as a decision tree for Θ, and the only node of this tree is not realizable relative to Θ.
Let Θ be a nonterminal node of the DAG such that, for each child of Θ, we already know the corresponding minimum number of realizable nodes. Based on this information, we can find the minimum number of nodes realizable relative to Θ in a decision tree for Θ that uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled
With an attribute that is not constant on Θ (we consider the minimum number of nodes realizable relative to Θ in such a decision tree).
With a hypothesis over T (we consider the minimum number of nodes realizable relative to Θ in such a decision tree).
With a proper hypothesis over T (we consider the minimum number of nodes realizable relative to Θ in such a decision tree).
We now describe three procedures for computing these three values (the attribute case, the hypothesis case, and the proper hypothesis case). Since Θ is nondegenerate, the set of attributes that are not constant on Θ is nonempty.
Let us consider a decision tree for Θ in which the root is labeled with an attribute f_i. For each value δ of this attribute, there is an edge that leaves the root and enters a node; this edge is labeled with the equation system {f_i = δ}, and the entered node is the root of a decision tree of the type k for the corresponding subtable of Θ. It is clear that the number of nodes realizable relative to Θ in the considered decision tree is equal to one plus the sum of the corresponding numbers for these subtrees, and the subtrees corresponding to values that do not occur in Θ contribute nothing to this sum. Evidently, for any value of f_i occurring in Θ, the corresponding subtable is a child of Θ in the DAG, i.e., we already know the corresponding minimum number of realizable nodes. One can show that one plus the sum of these minimum numbers is the minimum number of nodes realizable relative to Θ in a decision tree for Θ which uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled with the attribute f_i.
We should not consider attributes that are constant on Θ since, for each such attribute, the subtable corresponding to its only value coincides with Θ, i.e., based on this attribute, we cannot construct an optimal decision tree for Θ. As a result, the value for the attribute case is the minimum, over the attributes that are not constant on Θ, of the values described above.
Computation of the attribute case. Construct the set of attributes that are not constant on Θ. For each attribute from this set, compute one plus the sum of the already-computed minimum numbers for the children of Θ corresponding to its values. The value for the attribute case is the minimum of the computed numbers.
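For the attribute case, the recurrence just described can be sketched as follows (hypothetical Python, same table representation as in the earlier sketches; only attribute-only trees, i.e., the type 1, are handled).

```python
# Sketch: minimum number of realizable nodes in an attribute-only decision tree.
from functools import lru_cache

def min_realizable_nodes(table):
    n = len(table[0][0])

    @lru_cache(maxsize=None)
    def L(node):
        if not node:                                   # empty subtable: terminal node, not realizable
            return 0
        if len({d for _, d in node}) <= 1:             # degenerate: one realizable terminal node
            return 1
        best = None
        for i in range(n):
            values = {r[i] for r, _ in node}
            if len(values) < 2:                        # skip attributes constant on this subtable
                continue
            # Root (1 realizable working node) plus the realizable nodes of the children.
            cost = 1 + sum(L(tuple((r, d) for r, d in node if r[i] == v)) for v in values)
            best = cost if best is None else min(best, cost)
        return best

    return L(tuple(table))

if __name__ == "__main__":
    conj = [((a, b), a & b) for a in (0, 1) for b in (0, 1)]
    print(min_realizable_nodes(conj))   # expected: 5 for the conjunction of two variables
```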
Let us consider a decision tree for Θ in which the root is labeled with a hypothesis H admissible for Θ. For each answer S of the corresponding query, there is an edge that leaves the root and enters a node; this edge is labeled with the equation system S, and the entered node is the root of a decision tree of the type k for the corresponding subtable of Θ. It is clear that the number of nodes realizable relative to Θ in the considered decision tree is equal to one plus the sum of the corresponding numbers for these subtrees.
Denote by r the tuple of values assigned by the hypothesis H to the attributes. It is easy to show that the subtree corresponding to the answer H contains no realizable nodes if r is not a row of Θ and exactly one realizable node if r is a row of Θ. Since H is admissible for Θ, no counterexample leaves the subtable Θ unchanged. It is clear that the subtrees corresponding to counterexamples whose values do not occur in Θ contain no realizable nodes. Therefore, the number of nodes realizable relative to Θ in the considered decision tree is equal to one, plus one if r is a row of Θ, plus the sum, over the counterexamples whose values occur in Θ, of the numbers of realizable nodes in the corresponding subtrees.
Evidently, for any attribute that is not constant on Θ and any of its values occurring in Θ, the corresponding subtable is a child of Θ in the DAG, i.e., we already know the corresponding minimum number of realizable nodes. It is easy to show that the value described above, computed with these minimum numbers, is the minimum number of nodes realizable relative to Θ in a decision tree for Θ which uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled with the hypothesis H.
We should not consider hypotheses that are not admissible for Θ since, for each such hypothesis H, the corresponding query has an answer that leaves the subtable Θ unchanged, i.e., based on this hypothesis, we cannot construct an optimal decision tree for Θ. As a result, the value for the hypothesis case is the minimum of the values described above over the set of hypotheses admissible for Θ.
For each attribute, we consider the set of its values whose choice in a hypothesis minimizes the total contribution of the counterexamples for this attribute, and we consider the set of tuples that assign to each attribute one of these values. It is clear that, for each such tuple, the corresponding hypothesis is admissible for Θ. Simple analysis shows that the set of these hypotheses coincides with the set of hypotheses admissible for Θ that minimize the total contribution of the counterexamples. We denote by C one plus this minimum total contribution (the part of the value that does not depend on whether the tuple is a row of Θ).
Let there be a tuple from the considered set that is not a row of Θ. Then, the corresponding hypothesis does not get an additional realizable node for the answer H, and the value for the hypothesis case is equal to C. Let all tuples from the considered set be rows of Θ. We now show that, in this case, the value for the hypothesis case is equal to C + 1. For any tuple from the set, the answer H contributes one realizable node; therefore, the value is at most C + 1. Let us assume that it is smaller. Then, there exists a hypothesis admissible for Θ whose value is at most C and whose tuple is not a row of Θ; but such a hypothesis minimizes the total contribution of the counterexamples, so its tuple belongs to the considered set, which is impossible.
As a result, the value for the hypothesis case is equal to C if not all tuples from the considered set are rows of Θ, and to C + 1 if all tuples from the set are rows of Θ.
Computation of the hypothesis case. For each attribute, we compute the minimum contribution of its counterexamples and construct the set of its values for which this minimum is attained. We compute the value C described above. Then, we count the number N of rows of Θ that belong to the considered set of tuples and compute the cardinality of this set, which is equal to the product of the cardinalities of the sets of selected values. As a result, the value for the hypothesis case is equal to C if N is less than this cardinality and to C + 1 if N is equal to this cardinality.
Computation of the proper hypothesis case. For each row r of the decision table T, we check if the corresponding proper hypothesis is admissible for Θ. For each proper hypothesis admissible for Θ, we compute the value as described above. One can show that the minimum among the obtained numbers is the value for the proper hypothesis case.
We now consider an algorithm that, for a given nonempty decision table T and a given number k ∈ {1, …, 5}, calculates the minimum number of nodes realizable relative to T in a decision tree of the type k for the table T. During the work of this algorithm, we find, for each node Θ of the DAG, the minimum number of nodes realizable relative to Θ in a decision tree of the type k for Θ.
The description of the algorithm is similar to the description of the algorithm for the minimum depth. Instead of the minimum depth values, we should use the minimum numbers of realizable nodes, and the three case values should be computed by the procedures described in this section. In particular, each terminal node of the DAG receives the number 1.
One can show that the procedures of computation of the three case values have polynomial time complexity depending on the size of the decision table T. Using this fact, one can prove the following statement.
Proposition 6. The time complexity of the algorithm is bounded from above by a polynomial in the size of the input table T and the number of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
7. Minimizing the Number of Realizable Terminal Nodes
The procedure considered in this section is similar to the procedure of the minimization of the number of realizable nodes. The main difference is that decision trees with the minimum number of realizable terminal nodes may contain constant attributes and hypotheses that are not admissible. Fortunately, for any decision table and any type of decision trees, there is a decision tree of this type with the minimum number of realizable terminal nodes for the considered table that does not use such attributes and hypotheses. We will omit many details and describe only the main steps.
Let T be a nonempty decision table with n conditional attributes f_1, …, f_n and let k ∈ {1, …, 5}. To find the minimum number of realizable terminal nodes in a decision tree of the type k for T, we compute, for each node Θ of the DAG, the minimum number of terminal nodes realizable relative to Θ in a decision tree of the type k for Θ. We begin with the terminal nodes of the DAG, which are degenerate separable subtables of T, and step-by-step move to the table T.
Let Θ be a terminal node of the DAG. Then, the minimum number of realizable terminal nodes is equal to 1: the decision tree that contains only one node labeled with the decision attached to all rows of Θ is a decision tree for Θ, and the only node of this tree is a terminal node realizable relative to Θ.
Let Θ be a nonterminal node of the DAG such that, for each child of Θ, we already know the corresponding minimum number of realizable terminal nodes. Based on this information, we can find the minimum number of terminal nodes realizable relative to Θ in a decision tree for Θ that uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled
With an attribute (we consider the minimum number of terminal nodes realizable relative to Θ in such a decision tree).
With a hypothesis over T (we consider the minimum number of terminal nodes realizable relative to Θ in such a decision tree).
With a proper hypothesis over T (we consider the minimum number of terminal nodes realizable relative to Θ in such a decision tree).
We now describe three procedures for computing these three values (the attribute case, the hypothesis case, and the proper hypothesis case). Since Θ is nondegenerate, the set of attributes that are not constant on Θ is nonempty.
Computation of the attribute case. Construct the set of attributes that are not constant on Θ. For each attribute from this set, compute the sum of the already-computed minimum numbers of realizable terminal nodes for the children of Θ corresponding to its values. The value for the attribute case is the minimum of the computed sums.
Computation of the hypothesis case. For each attribute, we compute the minimum contribution of its counterexamples and construct the set of its values for which this minimum is attained; we then consider the set of tuples that assign to each attribute one of these values and compute the corresponding total contribution. Then, we count the number N of rows of Θ that belong to this set of tuples and compute the cardinality of this set. As a result, the value for the hypothesis case is equal to the computed total contribution if N is less than this cardinality and to this contribution plus one if N is equal to this cardinality.
Computation of the proper hypothesis case. For each row r of the decision table T, we check if the corresponding proper hypothesis is admissible for Θ. For each proper hypothesis admissible for Θ, we compute the corresponding value. One can show that the minimum among the obtained numbers is the value for the proper hypothesis case.
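A sketch of the attribute case recurrence for this cost function (hypothetical Python, same table representation as above, attribute-only trees): the root is a working node, so only the children contribute realizable terminal nodes.

```python
# Sketch: minimum number of realizable terminal nodes in an attribute-only decision tree.
from functools import lru_cache

def min_realizable_terminal_nodes(table):
    n = len(table[0][0])

    @lru_cache(maxsize=None)
    def N(node):
        if not node:
            return 0                               # empty subtable: terminal node not realizable
        if len({d for _, d in node}) <= 1:
            return 1                               # one realizable terminal node
        candidates = []
        for i in range(n):
            values = {r[i] for r, _ in node}
            if len(values) >= 2:                   # only attributes that actually split the node
                candidates.append(sum(N(tuple((r, d) for r, d in node if r[i] == v))
                                      for v in values))
        return min(candidates)
    return N(tuple(table))

if __name__ == "__main__":
    conj = [((a, b), a & b) for a in (0, 1) for b in (0, 1)]
    print(min_realizable_terminal_nodes(conj))     # expected: 3 (= n + 1 for the conjunction)
```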
We now consider an algorithm that, for a given nonempty decision table T and a given number k ∈ {1, …, 5}, calculates the minimum number of terminal nodes realizable relative to T in a decision tree of the type k for the table T. During the work of this algorithm, we find, for each node Θ of the DAG, the minimum number of terminal nodes realizable relative to Θ in a decision tree of the type k for Θ.
The description of the algorithm is similar to the description of the algorithm for the minimum depth. Instead of the minimum depth values, we should use the minimum numbers of realizable terminal nodes, and the three case values should be computed by the procedures described in this section. In particular, each terminal node of the DAG receives the number 1.
One can show that the procedures of computation of the three case values have polynomial time complexity depending on the size of the decision table T. Using this fact, one can prove the following statement.
Proposition 7. The time complexity of the algorithm is bounded from above by a polynomial in the size of the input table T and the number of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
8. Minimizing the Number of Working Nodes
The procedure considered in this section is similar to the procedure of the minimization of the depth. We will omit many details and describe main steps only.
Let T be a nonempty decision table with n conditional attributes f_1, …, f_n and let k ∈ {1, …, 5}. To find the minimum number of working nodes in a decision tree of the type k for T, we compute, for each node Θ of the DAG, the minimum number of working nodes in a decision tree of the type k for Θ. We begin with the terminal nodes of the DAG, which are degenerate separable subtables of T, and step-by-step move to the table T.
Let Θ be a terminal node of the DAG. Then, the minimum number of working nodes is equal to 0: the decision tree that contains only one node labeled with the decision attached to all rows of Θ is a decision tree for Θ; this tree has no working nodes.
Let Θ be a nonterminal node of the DAG such that, for each child of Θ, we already know the corresponding minimum number of working nodes. Based on this information, we can find the minimum number of working nodes in a decision tree for Θ that uses, for the subtables corresponding to the children of the root, decision trees of the type k and in which the root is labeled
With an attribute (we consider the minimum number of working nodes in such a decision tree).
With a hypothesis over T (we consider the minimum number of working nodes in such a decision tree).
With a proper hypothesis over T (we consider the minimum number of working nodes in such a decision tree).
We now describe three procedures for computing these three values (the attribute case, the hypothesis case, and the proper hypothesis case). Since Θ is nondegenerate, the set of attributes that are not constant on Θ is nonempty.
Computation of the attribute case. Construct the set of attributes that are not constant on Θ. For each attribute from this set, compute one plus the sum of the already-computed minimum numbers of working nodes for the children of Θ corresponding to its values. The value for the attribute case is the minimum of the computed numbers.
Computation of the hypothesis case. First, we construct a hypothesis H that assigns a value to each conditional attribute as follows. If an attribute has only one value on Θ, then H assigns this value to it. If an attribute has at least two values on Θ, then H assigns to it the minimum of these values for which the corresponding child of Θ has the maximum already-computed minimum number of working nodes. Then, the value for the hypothesis case is equal to one plus the sum, over the counterexamples whose values occur in Θ, of the minimum numbers of working nodes for the corresponding children.
Computation of the proper hypothesis case. For each row r of the decision table T, we check if the corresponding proper hypothesis is admissible for Θ. For each proper hypothesis admissible for Θ, we compute the corresponding value. One can show that the minimum among the obtained numbers is the value for the proper hypothesis case.
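For completeness, here is a hypothetical Python sketch of the working-node count for attribute-only trees (the type 1); the hypothesis cases would follow the same pattern with the query answers described above.

```python
# Sketch: minimum number of working nodes in an attribute-only decision tree.
from functools import lru_cache

def min_working_nodes(table):
    n = len(table[0][0])

    @lru_cache(maxsize=None)
    def W(node):
        if len({d for _, d in node}) <= 1:     # degenerate (or empty): a single terminal node
            return 0
        best = None
        for i in range(n):
            values = {r[i] for r, _ in node}
            if len(values) < 2:
                continue
            cost = 1 + sum(W(tuple((r, d) for r, d in node if r[i] == v)) for v in values)
            best = cost if best is None else min(best, cost)
        return best
    return W(tuple(table))

if __name__ == "__main__":
    conj = [((a, b, c), a & b & c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    print(min_working_nodes(conj))             # expected: 3 (= n for the conjunction)
```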
We now consider an algorithm that, for a given nonempty decision table T and a given k ∈ {1, …, 5}, calculates the minimum number of working nodes in a decision tree of the type k for the table T. During the work of this algorithm, we find, for each node Θ of the DAG, the minimum number of working nodes in a decision tree of the type k for Θ.
The description of the algorithm is similar to the description of the algorithm for the minimum depth. Instead of the minimum depth values, we should use the minimum numbers of working nodes, and the three case values should be computed by the procedures described in this section. In particular, each terminal node of the DAG receives the number 0.
One can show that the procedures of computation of the three case values have polynomial time complexity depending on the size of the decision table T. Using this fact, one can prove the following statement.
Proposition 8. The time complexity of the algorithm is bounded from above by a polynomial in the size of the input table T and the number of different separable subtables of T.
A similar bound can be obtained for the space complexity of the considered algorithm.
9. On Number of Realizable Terminal Nodes
Based on the results of experiments, we formulated the following hypothesis: for any decision table T, the minimum number of realizable terminal nodes is the same for decision trees of the types 1, 3, and 5. In this section, we prove it. First, we consider a simple lemma.
Lemma 9. Let T be a decision table and Θ be a subtable of the table T. Then, the minimum number of terminal nodes realizable relative to Θ in a decision tree of the type 3 for Θ does not exceed the minimum number of terminal nodes realizable relative to T in a decision tree of the type 3 for T.
Proof. It is easy to prove the considered inequality if Θ is degenerate. Let Θ be nondegenerate and let Γ be a decision tree of the type 3 for T with the minimum number of terminal nodes realizable relative to T. Then, the root r of Γ is a working node. It is clear that the subtable of Θ corresponding to the root is nondegenerate. For each working node v of Γ such that the subtable of Θ corresponding to v is degenerate and the subtable of Θ corresponding to the parent of v is nondegenerate, we do the following. We remove all nodes and edges of the subtree of Γ with the root v, with the exception of the node v. If the subtable of Θ corresponding to v is empty, then we label the node v with the number 0. If this subtable is nonempty, then we label the node v with the decision attached to each row of this subtable. We denote by Γ' the obtained decision tree. One can show that Γ' is a decision tree of the type 3 for the table Θ and that the number of its terminal nodes realizable relative to Θ does not exceed the number of terminal nodes of Γ realizable relative to T. Therefore, the considered inequality holds. □
Proposition 10. For any decision table T, the minimum number of realizable terminal nodes is the same for decision trees of the types 1, 3, and 5.
Proof. It is clear that, for any decision table T, this minimum for the type 3 does not exceed the minimum for the type 5, which in turn does not exceed the minimum for the type 1. To prove the considered statement, it is enough to show that the minimum for the type 1 does not exceed the minimum for the type 3 for any decision table T. We will prove this inequality by induction on the number of attributes that are not constant on T.
We now show that the inequality holds for any decision table T with no non-constant attributes. In this case, either the table T is empty or the table T contains one row. Let T be empty. In this case, the decision tree that contains only one node labeled with 0 is considered as a decision tree for T. The only node of this tree is not realizable relative to T. Therefore, the considered minima are equal to 0. Let T contain one row. In this case, the decision tree that contains only one node labeled with the decision attached to the row of T is a decision tree for T. The only node of this tree is realizable relative to T. Therefore, the considered minima are equal to 1.
Let m ≥ 0 and, for any decision table with at most m non-constant attributes, the considered inequality hold. Let T be a decision table with m + 1 non-constant attributes and columns labeled with the attributes f_1, …, f_n. If T is a degenerate table, then, as it is easy to show, the considered minima are equal to 1. Let T be nondegenerate.
We denote by Γ a decision tree of the type 3 for the table T that has the minimum number of realizable terminal nodes and, among such decision trees, the minimum number of nodes. One can show that the root of Γ is either labeled with an attribute that is not constant on T or with a hypothesis over T that is admissible for T. We now prove that the tree Γ can be transformed into a decision tree of the type 1 for the table T with at most as many realizable terminal nodes.
Let the root of Γ be labeled with an attribute f_i. Then, for each value δ of f_i in T, the root of Γ has a child corresponding to the subtable T{f_i = δ}, and the root of Γ has no other children. Since f_i is not constant on T, each of these subtables has fewer non-constant attributes than T. Using the inductive hypothesis, we obtain that, for each such subtable, there is a decision tree of the type 1 whose number of realizable terminal nodes is at most that of the corresponding subtree of Γ. For each child of the root of Γ, we replace the subtree of Γ rooted in this child with such a tree. As a result, we obtain a decision tree of the type 1 for the table T whose number of realizable terminal nodes is at most that of Γ.
Let the root of Γ be labeled with a hypothesis H over T that is admissible for T; see Figure 1, which depicts a prefix of the tree Γ. The root of Γ has a child corresponding to the answer H. For each attribute and each counterexample value of this attribute, the root of Γ has a child corresponding to this counterexample. The root of Γ has no other children.
We transform the tree Γ into a decision tree of the type 1 for the table T; see Figure 2, which depicts a prefix of this tree in which the attributes are queried one after another along the tuple of values assigned by H. For each attribute and each counterexample value of this attribute, the node of this prefix labeled with the attribute has a child corresponding to this value. It is clear that the subtable corresponding to this child is a subtable of the subtable corresponding to the respective child of the root of Γ. By Lemma 9, the minimum number of realizable terminal nodes for the former does not exceed that for the latter. It is also clear that this subtable has fewer non-constant attributes than T. Using the inductive hypothesis, we obtain that there is a decision tree of the type 1 for this subtable whose number of realizable terminal nodes does not exceed that of the corresponding subtree of Γ.
We now transform the prefix depicted in Figure 2 into a decision tree of the type 1 for the table T. First, we transform the last node of the prefix into a terminal node labeled with the number 0 if the tuple of values assigned by H is not a row of T and labeled with the decision attached to this tuple if it is a row of T. Next, for each attribute and each counterexample value of this attribute, we replace the corresponding node of the prefix with the decision tree of the type 1 constructed above for its subtable. It is clear that the obtained tree is a decision tree of the type 1 for the decision table T, and its number of realizable terminal nodes does not exceed that of Γ.
We proved that, for any decision table T, the minimum number of realizable terminal nodes for decision trees of the type 1 does not exceed that for decision trees of the type 3; hence, the three considered minima coincide. □
10. Results of Experiments
We conducted experiments with eight decision tables from the UCI ML Repository [20]. Table 1 contains information about each of these decision tables: its name, the number of rows, and the number of attributes. For each of the considered four cost functions, each of the considered five types of decision trees, and each of the considered eight decision tables, we find the minimum cost of a decision tree of the given type for the given table.
For each considered number n of variables, we randomly generate 100 Boolean functions with n variables. We represent each Boolean function f with n variables as a decision table with n columns labeled with the variables considered as attributes and with rows that are all possible n-tuples of values of the variables. Each row is labeled with the decision that is the value of the function f on the corresponding n-tuple. We consider decision trees for this table as decision trees computing the function f.
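The representation of a Boolean function as a decision table can be sketched as follows (hypothetical Python; the random generation simply draws a decision for each of the 2^n rows).

```python
# Sketch: represent a (random) Boolean function with n variables as a decision table.
import itertools
import random

def random_boolean_function_table(n, seed=None):
    """Rows are all n-tuples over {0, 1}; each decision is the function value on the row."""
    rng = random.Random(seed)
    return [(row, rng.randint(0, 1)) for row in itertools.product((0, 1), repeat=n)]

def function_table(f, n):
    """Decision table of a given Boolean function f taking an n-tuple of 0/1 values."""
    return [(row, f(row)) for row in itertools.product((0, 1), repeat=n)]

if __name__ == "__main__":
    conj = function_table(lambda row: int(all(row)), 3)
    print(len(conj), conj[-1])     # 8 rows; the last row (1, 1, 1) has decision 1
    print(random_boolean_function_table(3, seed=0))
```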
For each of the considered four cost functions, each of the considered five types of decision trees, and each of the generated Boolean functions, using its decision table representation, we find the minimum cost of a decision tree of the given type computing this function.
The following remarks clarify some experimental results considered later.
From Proposition 10, it follows that, for any decision table T, the minimum number of realizable terminal nodes is the same for decision trees of the types 1, 3, and 5.
Let f be a Boolean function with n variables. Since each hypothesis over the decision table representing f is proper, the minimum costs of decision trees of the types 2 and 4 coincide, as do the minimum costs of decision trees of the types 3 and 5.
10.1. Depth
In this section, we consider some results obtained in Reference [12]. Results of experiments with eight decision tables from Reference [20] and the depth are represented in Table 2. The first column contains the name of the considered decision table T. The last five columns contain the corresponding values for the five types of decision trees (minimum values for each decision table are in bold).
Decision trees with the minimum depth using attributes (type 1) are optimal for 5 decision tables, using hypotheses (type 2) are optimal for 4 tables, using attributes and hypotheses (type 3) are optimal for 8 tables, using proper hypotheses (type 4) are optimal for 3 tables, using attributes and proper hypotheses (type 5) are optimal for 7 tables.
For the decision table soybean-small, we must use attributes to construct an optimal decision tree. For this table, it is enough to use only attributes. For the decision tables breast-cancer and nursery, we must use both attributes and hypotheses to construct optimal decision trees. For these tables, it is enough to use attributes and proper hypotheses. For the decision table tic-tac-toe, we must use both attributes and hypotheses to construct optimal decision trees. For this table, it is not enough to use attributes and proper hypotheses.
Results of experiments with Boolean functions and the depth are represented in Table 3. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about the corresponding values for the 100 generated functions.
From the obtained results, it follows that, generally, the decision trees of the types 2 and 4 are better than the decision trees of the type 1, and the decision trees of the types 3 and 5 are better than the decision trees of the types 2 and 4.
10.2. Number of Realizable Nodes
In this section, we consider some results obtained in Reference [13]. Results of experiments with eight decision tables from Reference [20] and the number of realizable nodes are represented in Table 4. The first column contains the name of the considered decision table T. The last five columns contain the corresponding values for the five types of decision trees (minimum values for each decision table are in bold).
Decision trees with the minimum number of realizable nodes using attributes (type 1) are optimal for 4 decision tables, using hypotheses (type 2) are optimal for 0 tables, using attributes and hypotheses (type 3) are optimal for 8 tables, using proper hypotheses (type 4) are optimal for 0 tables, and using attributes and proper hypotheses (type 5) are optimal for 8 tables.
Decision trees of the types 3 and 5 can be a bit better than the decision trees of the type 1. Decision trees of the types 2 and 4 are far from the optimal.
For the decision tables hayes-roth-data, soybean-small, tic-tac-toe, and zoo-data, we must use attributes to construct optimal decision trees. For these tables, it is enough to use only attributes. For the rest of the considered decision tables, we must use both attributes and hypotheses to construct optimal decision trees. For these tables, it is enough to use attributes and proper hypotheses.
Results of experiments with Boolean functions and the number of realizable nodes are represented in Table 5. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about the corresponding values for the 100 generated functions.
From the obtained results, it follows that, generally, the decision trees of the types 3 and 5 are slightly better than the decision trees of the type 1, and the decision trees of the types 2 and 4 are far from the optimal.
10.3. Number of Realizable Terminal Nodes
Results of experiments with eight decision tables from Reference [20] and the number of realizable terminal nodes are represented in Table 6. The first column contains the name of the considered decision table T. The last five columns contain the corresponding values for the five types of decision trees (minimum values for each decision table are in bold).
Decision trees of the types 1, 3, and 5 are optimal for each of the considered tables. Decision trees of the types 2 and 4 are far from the optimal.
Results of experiments with Boolean functions and the number of realizable terminal nodes are represented in Table 7. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about the corresponding values for the 100 generated functions.
From the obtained results, it follows that, generally, the decision trees of the types 1, 3, and 5 are optimal, and the decision trees of the types 2 and 4 are far from the optimal.
10.4. Number of Working Nodes
Results of experiments with eight decision tables from Reference [20] and the number of working nodes are represented in Table 8. The first column contains the name of the considered decision table T. The last five columns contain the corresponding values for the five types of decision trees (minimum values for each decision table are in bold).
Decision trees with the minimum number of working nodes using attributes (type 1) are optimal for 2 decision tables, using hypotheses (type 2) are optimal for 0 tables, using attributes and hypotheses (type 3) are optimal for 8 tables, using proper hypotheses (type 4) are optimal for 0 tables, using attributes and proper hypotheses (type 5) are optimal for 7 tables.
Decision trees of the types 3 and 5 can be a bit better than the decision trees of the type 1. Decision trees of the types 2 and 4 are far from the optimal.
For all decision tables with the exception of soybean-small and zoo-data, we must use both attributes and hypotheses to construct optimal decision trees. Moreover, for tic-tac-toe, it is not enough to use attributes and proper hypotheses. For soybean-small and zoo-data, it is enough to use only attributes to construct optimal decision trees.
Results of experiments with Boolean functions and the number of working nodes are represented in Table 9. The first column contains the number of variables in the considered Boolean functions. The last five columns contain information about the corresponding values for the 100 generated functions.
From the obtained results, it follows that, generally, the decision trees of the types 3 and 5 are better than the decision trees of the type 1, and the decision trees of the types 2 and 4 are far from the optimal.
We can now sum up the results of the experiments. Generally, the decision trees of the types 3 and 5 are slightly better than the decision trees of the type 1. Decision trees of the types 2 and 4 have, generally, too many nodes.
11. Conclusions
In this paper, we studied modified decision trees that use both queries based on one attribute each and queries based on hypotheses about values of all attributes. We designed dynamic programming algorithms for minimization of four cost functions for such decision trees and considered results of computer experiments. The main result of the paper is that the use of hypotheses can decrease the complexity of decision trees and make them more suitable for knowledge representation. In the future, we are planning to compare the length and coverage of decision rules derived from different types of decision trees constructed by the dynamic programming algorithms. Unfortunately, the considered algorithms cannot work together to optimize more than one cost function. In the future, we are also planning to consider two extensions of these algorithms: (i) sequential optimization relative to a number of cost functions and (ii) bi-criteria optimization that allows us to construct for some pairs of cost functions the corresponding Pareto front.