Article

A Two-Parameter Fractional Tsallis Decision Tree

by Jazmín S. De la Cruz-García 1,†, Juan Bory-Reyes 2 and Aldo Ramirez-Arellano 1,*,†

1 SEPI-UPIICSA, Instituto Politécnico Nacional, Mexico City C.P. 08400, Mexico
2 SEPI-ESIME-ZACATENCO, Instituto Politécnico Nacional, Mexico City C.P. 07738, Mexico
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2022, 24(5), 572; https://doi.org/10.3390/e24050572
Submission received: 28 February 2022 / Revised: 2 April 2022 / Accepted: 15 April 2022 / Published: 19 April 2022
(This article belongs to the Special Issue Information-Theoretic Data Mining)

Abstract

Decision trees are decision support data mining tools that create, as the name suggests, a tree-like model. The classical C4.5 decision tree, based on the Shannon entropy, is a simple algorithm that calculates the gain ratio and then splits the attributes based on this entropy measure. Tsallis and Renyi entropies (instead of Shannon) can be employed to generate a decision tree with better results. In practice, the entropic index parameter of these entropies is tuned to outperform the classical decision trees. However, this process is carried out by testing a range of values for a given database, which is time-consuming and unfeasible for massive data. This paper introduces a decision tree based on a two-parameter fractional Tsallis entropy. We propose a constructionist approach to the representation of databases as complex networks that enables an efficient computation of the parameters of this entropy using the box-covering algorithm and renormalization of the complex network. The experimental results support the conclusion that the two-parameter fractional Tsallis entropy is a more sensitive measure than its parametric Renyi and Tsallis precedents and the Gini index for a decision tree classifier.

1. Introduction

Entropy is a measure of the unpredictability of the state of a physical system, quantifying the degree of disorder of its full microstructure. Claude Elwood Shannon [1] defined an entropy measure to quantify the amount of information in a digital system in the context of communication theory; it has since been applied in a variety of fields such as information theory, complex networks, and data mining techniques.
The most widely used form of the Shannon entropy is given by
$$I = -\lim_{t \to 1} \frac{d}{dt} \sum_{i=1}^{N} p_i^{t} = -\sum_{i=1}^{N} p_i \ln p_i, \qquad (1)$$
where $N$ is the number of possibilities $p_i$ and $\sum_{k=1}^{N} p_k = 1$.
Two celebrated generalizations of Shannon entropy are Renyi [2] and Tsallis entropies [3]. Alfred Renyi proposed a universal formula to define a family of entropy measures given by the expression [2]
$$I_{q}^{R} = \frac{1}{1-q} \ln \sum_{i=1}^{N} p_i^{q}, \qquad (2)$$
where q denotes the order of moments.
Constantino Tsallis proposed the q -logarithm defined by
$$\ln_q(x) = \frac{x^{1-q}-1}{1-q}, \qquad (3)$$
to introduce a physical entropy given by [3]
$$I_{q}^{T} = -\sum_{i=1}^{N} p_i^{q} \ln_q p_i = \frac{1}{q-1}\left(1-\sum_{i=1}^{N} p_i^{q}\right). \qquad (4)$$
Tsallis entropy could be rewritten [4,5,6] as
$$I_{q}^{T} = -\lim_{t \to 1} D_{q}^{t} \sum_{i=1}^{N} p_i^{t}, \qquad (5)$$
where $D_{q}^{t} f(t) = \frac{f(qt)-f(t)}{(q-1)t}$, $t \neq 0$, stands for the Jackson [7] q-derivative; this reflects that the Tsallis entropy is an extension of the Shannon entropy.
The Renyi and Tsallis entropy measures depend on the parameter q, which describes their deviation from the standard Shannon entropy; both converge to the Shannon entropy in the limit q → 1. For complex network applications [8] and data mining techniques [9,10,11,12,13,14,15,16,17], the parameter q is varied over a range of values. On the other hand, the computation of the entropic index q of the Tsallis entropy has been implemented for physics applications in [18,19,20,21,22,23,24,25].
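These closed forms are straightforward to evaluate numerically. The following minimal sketch (ours, not taken from the paper; the three-state distribution is purely illustrative) computes the three entropies with NumPy and illustrates the convergence to the Shannon entropy as q approaches 1:

```python
import numpy as np

def shannon(p):
    """Shannon entropy, eq. (1): -sum_i p_i ln p_i."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def renyi(p, q):
    """Renyi entropy of order q, eq. (2): ln(sum_i p_i^q) / (1 - q)."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p ** q)) / (1.0 - q)

def tsallis(p, q):
    """Tsallis entropy, eq. (4): (1 - sum_i p_i^q) / (q - 1)."""
    p = np.asarray(p, dtype=float)
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

p = [0.5, 0.3, 0.2]                          # hypothetical distribution
print(shannon(p))                            # ~1.0297
print(renyi(p, 0.999), renyi(p, 1.001))      # both ~1.0297: Renyi -> Shannon as q -> 1
print(tsallis(p, 0.999), tsallis(p, 1.001))  # both ~1.0297: Tsallis -> Shannon as q -> 1
```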
The Shannon and Tsallis entropies can thus be obtained by applying the standard derivative or the Jackson q-derivative, respectively, to the same generating function $\sum_{i=1}^{N} p_i^{t}$ with respect to the variable t and then letting t → 1. This approach can be used to derive different entropy measures based on the actions of appropriate fractional-order differentiation operators [26,27,28,29,30,31,32].
The major goal of this work is to introduce a new decision tree based on a two-parameter fractional Tsallis entropy. This new kind of tree is tested on twelve databases for a classification task. The structure of the paper is as follows. Section 2 focuses on the notion of two-parameter fractional Tsallis entropy. In Section 3, two-parameter fractional Tsallis decision trees and a constructionist approach to the representation of databases as complex networks are introduced; the basic facts on the box-covering algorithm for a complex network are reviewed, and we compute an approximation set of the parameters q, β, and α of the two-parameter fractional Tsallis entropy. Section 4 describes the methodology for testing the two-parameter fractional Tsallis decision trees on twelve databases, where the approximations of the q values are also tested in Renyi and Tsallis entropies; the results are presented in Section 5. A discussion of the findings of this study and concluding remarks are offered in Section 6.

2. Two-Parameter Fractional Tsallis Entropy

Based on the actions of fractional-order differentiation operators, several entropy measures of fractional order were introduced in [26,27,28,29,30,33,34,35,36,37,38,39,40,41,42,43]. Following this approach, the two-parameter fractional Tsallis entropy was introduced in [22] by merging two typical examples of fractional entropies.
The first fractional entropy, of order δ ∈ (0, 1], is introduced as
$$I_{\delta}^{1} = -\lim_{t \to 1} \frac{d^{\delta}}{dt^{\delta}} \sum_{i=1}^{N} p_i^{t} = -\sum_{i=1}^{N} p_i^{\delta} \ln p_i, \qquad (6)$$
and the second one by
$$I_{\delta}^{2} = \lim_{t \to 1}\left(-\frac{d}{dt}\, D_{t}^{\delta-1} \sum_{i=1}^{N} e^{t \ln p_i}\right) = \sum_{i=1}^{N} p_i\,(-\ln p_i)^{\delta}, \qquad (7)$$
where
$$\frac{d^{\delta} f(t)}{dt^{\delta}} = \lim_{h \to 0} \frac{f^{\delta}(t+h)-f^{\delta}(t)}{(t+h)^{\delta}-t^{\delta}} \qquad (8)$$
and
$$D_{t}^{\delta-1} f(t) = \frac{1}{\Gamma(1-\delta)} \int_{-\infty}^{t} \frac{f(t')}{(t-t')^{\delta}}\, dt', \qquad (9)$$
where Γ denotes the gamma function.
Combining (6) with (7) yields a two-parameter fractional relative entropy as follows [31]:
$$I_{\beta}^{\alpha} = \sum_{i=1}^{N} p_i^{\alpha}\,(-\ln p_i)^{\beta}, \qquad (10)$$
for $0 < \alpha, \beta$. The entropy (10) reduces to (6) when β → 1 and reduces to (7) when α → 1, yielding the Shannon entropy when both parameters approach 1.
Analogously, two extra-parameter-dependent Tsallis entropies are introduced [22]:
$$T_{\delta,q}^{1} = -\sum_{i=1}^{N} p_i^{\delta}\, \ln_q p_i \qquad (11)$$
and
$$T_{\delta,q}^{2} = \sum_{i=1}^{N} p_i\,(-\ln_q p_i)^{\delta}. \qquad (12)$$
Combining these entropies, and motivated by (10), we obtain the following two-parameter fractional Tsallis entropy [22]:
$$T_{q}^{\alpha,\beta} = \sum_{i=1}^{N} p_i^{\alpha}\,(-\ln_q p_i)^{\beta}, \qquad (13)$$
for $0 < \alpha, \beta$.
Note that the Tsallis entropy is recovered in the limit α, β → 1 of $T_q^{\alpha,\beta}$. This implies that the non-extensivity of $T_q$ [44] carries over to $T_q^{\alpha,\beta}$.
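As a quick illustration (a sketch under the sign convention reconstructed in (3) and (13), not code from the paper; the distribution and parameter values are arbitrary), eq. (13) can be evaluated directly:

```python
import numpy as np

def ln_q(x, q):
    """Tsallis q-logarithm, eq. (3): (x^(1-q) - 1) / (1 - q); ordinary ln as q -> 1."""
    x = np.asarray(x, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def fractional_tsallis(p, q, alpha, beta):
    """Two-parameter fractional Tsallis entropy, eq. (13): sum_i p_i^alpha (-ln_q p_i)^beta."""
    p = np.asarray(p, dtype=float)
    return np.sum(p ** alpha * (-ln_q(p, q)) ** beta)

p = [0.5, 0.3, 0.2]                                          # hypothetical distribution
print(fractional_tsallis(p, q=1.0, alpha=1.0, beta=1.0))     # Shannon limit, ~1.0297
print(fractional_tsallis(p, q=0.17, alpha=0.85, beta=0.87))  # an illustrative fractional setting
```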

3. Parametric Decision Trees

A decision tree is a supervised data mining technique that creates a tree-like structure in which each non-leaf node tests a given attribute [45]. The outcome of each test gives the path to follow until a leaf node, where the classification label is found, is reached. For example, let (x = 3, y = 1) be a tuple to be classified by the decision tree of Figure 1. If we test x = 1, we must follow the left path to reach y = 1 and finally arrive at the leaf node with the classification label “a”.
In general, the cornerstone of the construction process of decision trees is the evaluation of all attributes to find the best node and the best split condition on this node, so as to classify the tuples with the lowest error rate. This evaluation is carried out via the information gain of each attribute a [45]:
$$G_{a} = I(D) - I_{c}(D), \qquad (14)$$
where $I(D)$ is the entropy of the database $D$ and $I_c(D)$ is the entropy of $D$ after being partitioned by a condition c on a given attribute a. The tree's construction evaluates several partition conditions c on all attributes of the database and then chooses the attribute–condition pair with the highest gain. Once a pair is chosen, the process evaluates the partitioned database recursively using a different attribute–condition pair. The reader is referred to [45] for details on decision tree construction and the computation of (14).
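A schematic sketch of (14) for a single candidate split may help fix ideas (this is not the authors' implementation; the toy labels, the attribute x, and its threshold are hypothetical). The entropy argument is the point where the Shannon, Renyi, Tsallis, or two-parameter fractional Tsallis measure is plugged in:

```python
import numpy as np
from collections import Counter

def class_probs(labels):
    """Class probability distribution of a set of records."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    return counts / counts.sum()

def info_gain(labels, condition, entropy):
    """Eq. (14): G_a = I(D) - I_c(D). `condition` is a boolean mask over the records
    describing one candidate split; I_c(D) is the record-weighted entropy of the parts."""
    labels = np.asarray(labels)
    condition = np.asarray(condition, dtype=bool)
    i_d = entropy(class_probs(labels))
    i_c = sum(
        len(part) / len(labels) * entropy(class_probs(part))
        for part in (labels[condition], labels[~condition])
        if len(part) > 0
    )
    return i_d - i_c

# Hypothetical attribute x and split condition x <= 2, scored here with the Shannon entropy.
labels = ["a", "a", "b", "b", "b", "a"]
x = [1, 2, 3, 4, 5, 1]
print(info_gain(labels, [v <= 2 for v in x], entropy=lambda p: -np.sum(p * np.log(p))))
```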

3.1. Renyi and Tsallis Decision Trees

In classical decision trees, I in (14) denotes Shannon entropy; however, other entropies such as Renyi or Tsallis can replace it. Thus, (14) can be written using Renyi entropy (2) as
$$G_{R,a} = I_{q}^{R}(D) - I_{q,c}^{R}(D), \qquad (15)$$
and using the Tsallis entropy (4) as follows:
$$G_{T,a} = I_{q}^{T}(D) - I_{q,c}^{T}(D). \qquad (16)$$
The parametric decision trees generated by (15) or (16) have been studied in [9,10,11,12,13,14].

3.2. Two-Parameter Fractional Tsallis Decision Tree

In a similar fashion, a two-parameter fractional Tsallis decision tree can be induced by the information gain obtained by rewriting (14) using (13):
$$G_{T_q^{\alpha,\beta},a} = T_{q}^{\alpha,\beta}(D) - T_{q,c}^{\alpha,\beta}(D). \qquad (17)$$
An alternative informativeness measure for constructing decision trees is the Gini index, or Gini coefficient, which is calculated by
$$\mathrm{Gini} = 1 - \sum_{i=1}^{N} p_i^{2}. \qquad (18)$$
The Gini index can be deduced from Tsallis entropy (4) using q = 2 [14]. On the other hand, the two-parameter fractional Tsallis entropy with q = 2 , α = 1 , β = 1 reduces to the Gini index. Hence, Gini decision trees are a particular case of both Tsallis and two-parameter fractional Tsallis trees.
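This special case is easy to verify numerically (a small check of ours, with an arbitrary distribution):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])                       # hypothetical distribution
tsallis_q2 = (1.0 - np.sum(p ** 2)) / (2.0 - 1.0)   # Tsallis entropy (4) at q = 2
gini = 1.0 - np.sum(p ** 2)                         # Gini index (18)
print(np.isclose(tsallis_q2, gini))                 # True
```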
The main issue with Renyi and Tsallis decision trees is the estimation of the q value that yields a better classification than the one produced by classical decision trees. Trial and error is the accepted approach for this purpose: several values in a given interval, usually $[-10, 10]$, are tested, and the classification rates are compared. This approach becomes unfeasible for two-parameter fractional Tsallis decision trees, since q, α, and β must all be tuned. A representation of the database as a complex network is introduced to face this issue. This representation lets us compute α and β following the approach in [22], which is the basis for determining the fractional decision tree parameters.

3.3. Network’s Construction

A network is a powerful tool to model the relationships among the entities or parts of a system. When those relationships are complex, i.e., when they give rise to properties that cannot be found by examining single components, the resulting structure is called a complex network. Thus, networks, as the skeleton of complex systems [46], have attracted considerable attention in different areas of science [47,48,49,50,51]. Following this approach, a representation of the relationships among the attributes (system entities) of a database (system) as a network is obtained.
The attribute's name is concatenated before the value of a given row to distinguish the same value that might appear in different attributes. Consider the first record of the database shown at the top of Figure 2. The first node will be NAME.Bruce Dickinson, the second node will be PHONE.54-76-90, and the third node will be ZIP.08510. These nodes belong to the same record, so they must be connected; see the dotted lines of the network in the middle of Figure 2. We next consider the second record; the nodes NAME.Michael Kiske and PHONE.87-34-67 will be added to the network. Note that the node ZIP.08510 was already added in the previous step. We may now add the links between these three nodes. This procedure is repeated for each record in the database.
The outcome is a complex network that exhibits non-trivial topological features [52], which cannot be predicted by analyzing single nodes, as occurs in random graphs or lattices [53].
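A minimal sketch of this construction with networkx (our illustration, not the authors' code; the records mirror Figure 2, but the third record's name and any value not quoted in the text are hypothetical) is:

```python
import itertools
import networkx as nx

def database_to_network(records):
    """Nodes are 'ATTRIBUTE.value' strings; the nodes of one record form a clique."""
    g = nx.Graph()
    for row in records:
        nodes = [f"{attribute}.{value}" for attribute, value in row.items()]
        g.add_nodes_from(nodes)
        g.add_edges_from(itertools.combinations(nodes, 2))
    return g

# Records mirroring Figure 2; the third name is hypothetical.
records = [
    {"NAME": "Bruce Dickinson", "PHONE": "54-76-90", "ZIP": "08510"},
    {"NAME": "Michael Kiske",   "PHONE": "87-34-67", "ZIP": "08510"},
    {"NAME": "Andre Matos",     "PHONE": "54-76-90", "ZIP": "08510"},
]
g = database_to_network(records)
print(g.number_of_nodes(), g.number_of_edges())  # 6 nodes, 8 edges for this toy data
```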

3.4. Computation of Two-Parameter Fractional Tsallis Decision Tree Parameters

By the technique introduced in [22], the parameters α and β of the two-parameter fractional Tsallis decision tree are computed on the network representation of the database as
$$\alpha_{l,i}^{1} = 1 + \frac{|G_i|\,\mathrm{innerdeg}(G_i)}{n \sum_{i=1}^{N_b} \mathrm{innerdeg}(G_i)}, \qquad (19)$$
$$\alpha_{l,i}^{2} = 1 - \frac{|G_i|\,\mathrm{innerdeg}(G_i)}{n \sum_{i=1}^{N_b} \mathrm{innerdeg}(G_i)}, \qquad (20)$$
where $|G_i|$ is the number of nodes in the box $G_i$ obtained by the box-covering algorithm [54], n is the number of nodes of the network, and $\mathrm{innerdeg}(G_i)$ is the average degree of the nodes of the box $G_i$. Similarly, two values of β are computed as follows [22]:
$$\beta_{l,i}^{1} = 1 + \frac{\mathrm{outerdeg}(G_i)\, l}{\Delta \sum_{i=1}^{N_b} \mathrm{outerdeg}(G_i)}, \qquad (21)$$
$$\beta_{l,i}^{2} = 1 - \frac{\mathrm{outerdeg}(G_i)\, l}{\Delta \sum_{i=1}^{N_b} \mathrm{outerdeg}(G_i)}, \qquad (22)$$
where $l \ge 2$ is the diameter of the box $G_i$, Δ is the diameter of the network, and $\mathrm{outerdeg}(G_i)$ is the number of links between the box $G_i$ and the other boxes. The computation of innerdeg and outerdeg will be explained later.
Inspired by the right-hand term of (19) and (20) (named α), together with the fact that $\delta = \frac{l\,N_b(l)}{\Delta \sum_{l=2}^{\Delta} N_b(l)}$ is a normalized measure of the number of boxes needed to cover the network [20], an approximation of the q value for the two-parameter fractional decision tree is introduced:
$$q_{\alpha} = \frac{\delta}{\alpha} = \frac{n\, l\, N_b(l) \sum_{i=1}^{N_b} \mathrm{innerdeg}(G_i)}{\Delta\, |G_i|\, \mathrm{innerdeg}(G_i) \sum_{l=2}^{\Delta} N_b(l)}. \qquad (23)$$
Similarly, from the right-hand term of (21) and (22) (named β), a second approximation of the q value is given by
$$q_{\beta} = \frac{\delta}{\beta} = \frac{N_b(l) \sum_{i=1}^{N_b} \mathrm{outerdeg}(G_i)}{\mathrm{outerdeg}(G_i) \sum_{l=2}^{\Delta} N_b(l)}, \qquad (24)$$
where $N_b(l)$ is the minimum number of boxes of diameter l needed to cover the network, and n, Δ, and $|G_i|$ are as defined above.
The process to compute the minimum number of boxes $N_b$ of diameter l needed to cover the network G is shown in Figure 3. A dual network G′ is created with only the nodes of the original network; see Figure 3b. Then, the links in G′ are added following the rule: two nodes i, j in the dual network are connected if the distance $l_{ij}$ between them in G is greater than or equal to l. In our example, l = 3, and node one is selected to start. Node one will be connected in G′ with nodes five and six, since their distances are four and three, respectively. The procedure is repeated with the remaining nodes to obtain the dual network shown in Figure 3b. Next, the nodes are colored as follows: two directly connected nodes in G′ must not have the same color (Figure 3c). Finally, the colored nodes of G′ are mapped to G; see Figure 3d. The minimum number of boxes to cover the network given l equals the number of colors in G′, and nodes of the same color belong to the same box. In practice, l ranges over [1, Δ]; the resulting $N_b(l)$ values for the example are shown in Table 1. For details of the box-covering algorithm, the reader is referred to [54].
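A compact sketch of this dual-network colouring (our approximation of the box-covering algorithm of [54], not the authors' code; greedy colouring only approximates the minimum number of boxes, and the example graph is hypothetical rather than the network of Figure 3) is:

```python
import networkx as nx

def box_covering(g, l):
    """Dual-network colouring of [54]: connect two nodes in the dual if their distance
    in g is >= l, colour the dual greedily, and return the colour classes as boxes."""
    dist = dict(nx.all_pairs_shortest_path_length(g))
    dual = nx.Graph()
    dual.add_nodes_from(g.nodes())
    for u in g:
        for v in g:
            if u != v and dist[u].get(v, float("inf")) >= l:
                dual.add_edge(u, v)
    colours = nx.greedy_color(dual)      # greedy approximation of the chromatic number
    boxes = {}
    for node, colour in colours.items():
        boxes.setdefault(colour, set()).add(node)
    return list(boxes.values())          # N_b(l) is the number of boxes returned

# Hypothetical six-node example (not the network of Figure 3).
g = nx.path_graph([1, 2, 3, 4, 5, 6])
print(len(box_covering(g, l=3)))         # 2 boxes for this example
```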
Now we are ready to compute innerdeg. Following the previous example, two boxes were found for l = 3; see Figure 4a. Here, $\mathrm{innerdeg}(G_1) = 2$ is the average number of links per node among the nodes of this box; for this reason, the link between nodes four and six is omitted in this computation. Similarly, $\mathrm{innerdeg}(G_2) = 1$. The outerdeg is the degree of each node of the renormalized network; see the network of Figure 4b.
In our example, $\mathrm{outerdeg}(G_1) = \mathrm{outerdeg}(G_2) = 1$. The renormalization converts each box into a super node, preserving the connections between boxes. On the other hand, it is known that $N_b(1) = n$ and $N_b(\Delta+1) = 1$; in the first case, each box contains a single node, and in the second, a single box covering the network contains all nodes. For this reason, innerdeg and outerdeg are not defined for l = 1 and l = Δ + 1, respectively. This forces l to the range [2, Δ], as stated in (19)–(24). Additionally, note that the right-hand terms of (19) and (20) (α) and of (21) and (22) (β) are "pseudo matrices", where each row has $N_b(l)$ values; see Table 1. Consequently, $q_\alpha$ and $q_\beta$ are also "pseudo matrices".
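Assuming the boxes have already been obtained, and that every box has at least one intra-box link and at least one link to another box (otherwise the ratios below are undefined), a sketch of (19)–(24) for a single box diameter l might read:

```python
import numpy as np
import networkx as nx

def box_statistics(g, boxes):
    """innerdeg(G_i): average number of intra-box links per node of box G_i.
    outerdeg(G_i): degree of the box in the renormalized network."""
    membership = {v: i for i, box in enumerate(boxes) for v in box}
    renorm = nx.Graph()
    renorm.add_nodes_from(range(len(boxes)))
    for u, v in g.edges():
        if membership[u] != membership[v]:
            renorm.add_edge(membership[u], membership[v])
    inner = np.array([2 * g.subgraph(box).number_of_edges() / len(box) for box in boxes])
    outer = np.array([renorm.degree(i) for i in range(len(boxes))], dtype=float)
    return inner, outer

def parameters_for_l(g, boxes, l, delta, nb):
    """Eqs. (19)-(24) for one box diameter l; nb maps each diameter to N_b(l)."""
    n = g.number_of_nodes()
    sizes = np.array([len(box) for box in boxes], dtype=float)
    inner, outer = box_statistics(g, boxes)
    alpha1 = 1 + sizes * inner / (n * inner.sum())               # eq. (19)
    alpha2 = 1 - sizes * inner / (n * inner.sum())               # eq. (20)
    beta1 = 1 + outer * l / (delta * outer.sum())                # eq. (21)
    beta2 = 1 - outer * l / (delta * outer.sum())                # eq. (22)
    total = sum(nb[k] for k in range(2, delta + 1))
    q_alpha = n * l * nb[l] * inner.sum() / (delta * sizes * inner * total)  # eq. (23)
    q_beta = nb[l] * outer.sum() / (outer * total)                           # eq. (24)
    return alpha1, alpha2, beta1, beta2, q_alpha, q_beta
```

Averaging the rows of these pseudo matrices over l = 2, ..., Δ gives the ⟨·⟩ values used in Section 4.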
The network represents the relationships between the attribute values (nodes) of each record and the relationships between different database records. For example, the dotted lines in Figure 2 show the relationships between the first record's attribute values. The links of the node ZIP.08510 are the relationships between the three records, and the links of PHONE.54-76-90 are the relationships between the first and third ones. The box-covering algorithm groups these relationships into boxes (network records). The network in the middle of Figure 2 shows three boxes (in orange, green, and blue), which coincide with the number of records in the database. However, the attribute values in each box do not coincide entirely with the records in the database, since box covering finds the minimum number of mutually exclusive boxes, each with the maximum number of attributes.
The nodes in each box (network record) are enough to differentiate the records in the database. For example, the first network record consists of name, phone, and zip values (nodes in orange). The second record in the database can be differentiated from the first by its name and phone (the values of those attributes form the second network record, in green). The third one can be distinguished from the two others by its name (the third network record, in blue). The cost of differentiating the first network record (measured by innerdeg) is the highest, while the lowest is for the third. Thus, α measures the local differentiation cost of the network records.
On the other hand, β measures the global differentiation cost (by outerdeg). For example, the global cost for the first network record is two, and it is one for the second and third; see the renormalized network at the bottom of Figure 2. This means that the first network record needs to be differentiated from two network records, while the second and third only need to be distinguished from the first. Note that α and β for a given l rely on the network topology, which captures the relationships of the records and their values. Finally, $q_\alpha$ is the ratio between the network records (normalized number of boxes δ) and the local differentiation cost, while $q_\beta$ is the ratio between the network records and the global differentiation cost.

4. Methodology

Twelve databases (from biological, technological, and social disciplines) from the UCI repository [55] were used in the experiments; see Table 2. Their numbers of records, attributes, and classes are representative. Once a network was obtained from a database, the q, α, and β parameters of the fractional Tsallis decision tree were approximated by the following four sets: (⟨q_α⟩, ⟨α^1⟩, ⟨β^1⟩), (⟨q_α⟩, ⟨α^2⟩, ⟨β^2⟩), (⟨q_β⟩, ⟨α^1⟩, ⟨β^1⟩), (⟨q_β⟩, ⟨α^2⟩, ⟨β^2⟩), where ⟨·⟩ denotes the average value of the pseudo matrices obtained by (19)–(24).
The network can be obtained from a raw database or after it has been discretized. Since the classification, measured by the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC), was better using the approximations computed on the networks from discretized databases, only these approximations are reported. The attribute discretization of a database can be found in [56]. The discretization technique is unsupervised and uses equal-frequency binning. The discretized databases were only used to obtain the networks; the classification task was carried out using the original databases. The networks obtained from the discretized and non-discretized databases turned out to be different; see Figure 5.
The classification task was performed by the classical, Renyi, Tsallis, Gini, and two-parameter fractional Tsallis decision trees on each database. We used 10-fold cross-validation repeated ten times to calculate the AUROC and MCC. The best AUROC and MCC values, produced by one of the four parameter sets used to approximate q, α, and β for the fractional Tsallis decision trees, were chosen and compared with those of the classical and Gini decision trees. In the same way, $q_\alpha$ or $q_\beta$ was chosen for the q parameter of the Renyi and Tsallis trees, and their AUROCs and MCCs were compared with those of the classical trees. It is known that decision trees can produce non-normally distributed AUROC and MCC measures [57]. Hence, normality was verified by the Kolmogorov–Smirnov test, and the measures were compared using a t-test or a Mann–Whitney U test, according to their normality [10,57,58,59].
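A sketch of this evaluation protocol with scikit-learn and SciPy is given below (not the authors' code: the stock DecisionTreeClassifier stands in for the classical tree, the parametric trees would require a custom split criterion, the AUROC scorer shown assumes a binary task, and FractionalTsallisTree is a hypothetical class name):

```python
from scipy import stats
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate(model, X, y, scoring="roc_auc"):
    """Scores over 10-fold cross-validation repeated ten times."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring=scoring)

def compare(scores_a, scores_b, alpha=0.05):
    """t-test when both score samples pass a Kolmogorov-Smirnov normality check,
    Mann-Whitney U test otherwise; returns the p-value of the comparison."""
    normal = all(
        stats.kstest(stats.zscore(s), "norm").pvalue > alpha for s in (scores_a, scores_b)
    )
    test = stats.ttest_ind if normal else stats.mannwhitneyu
    return test(scores_a, scores_b).pvalue

# scores_ct = evaluate(DecisionTreeClassifier(), X, y)
# scores_tftt = evaluate(FractionalTsallisTree(q, alpha, beta), X, y)  # hypothetical class
# print(compare(scores_ct, scores_tftt))
```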

5. Applications

The approximations of the q, α, and β parameters computed on the discretized databases are shown in Table 3. Table 4 shows the AUROC and MCC of the classical and two-parameter fractional Tsallis decision trees and the results of the statistical comparison. In addition, the values of the parameters of the fractional Tsallis decision trees are reported.
The two-parameter fractional Tsallis decision tree outperforms the AUROC and MCC of the classical trees for eight databases. The statistical results of the two measures disagree for Car and Haberman. The AUROC of the two-parameter fractional Tsallis tree was equal to that of the classical trees for Car, Image, Vehicle, and Yeast; meanwhile, for Haberman, Image, Vehicle, and Yeast, the MCC of both trees showed no difference.
Tsallis entropy is a non-extensive measure [60], as is the two-parameter fractional Tsallis entropy [22]. On the contrary, the Shannon entropy is extensive. The super-extensive property is given by q < 1 and the sub-extensive property by q > 1. Note that the approximations of the q parameter for all databases (see Table 3) are less than 1, except for Yeast. Thus, they can be considered candidates for being named super-extensive databases. We say that a database is super-extensive if q < 1 and this value produces a better classification (AUROC, MCC, or another measure) than the classical trees (based on Shannon entropy). Similarly, a database is sub-extensive if q > 1 and this value produces a better classification. Otherwise, the database is extensive, since in this case the Shannon entropy (the cornerstone of classical trees) is a less complex measure than the two-parameter fractional Tsallis entropy; hence, the Shannon entropy must be preferred. The two-parameter fractional Tsallis trees produce classifications equal to or better than those of the classical trees. Following those conditions, based on MCC, Breast Cancer, Car, Cmc, Glass, Hayes, Letter, Scale, and Wine are super-extensive, whereas Haberman, Image, Vehicle, and Yeast can be classified as extensive.
The AUROC and MCC of the Renyi and Tsallis decision trees are compared with the baseline of the classical ones. The $q_\alpha$ and $q_\beta$ values were tested as the entropic index of both parametric decision trees. The parameters of Renyi ($q_r$) and Tsallis ($q_t$) that produce the best AUROCs and MCCs are reported in Table 5. The results show that the AUROC of the Renyi trees was better for Breast Cancer, Glass, Letter, and Yeast and worse for Cmc and Haberman than that of the classical trees. The results are quite similar for MCC, where Car's classification also outperforms the classical tree classification; on the contrary, the MCC for the Vehicle database was statistically lower than that of the classical tree. The Tsallis AUROCs were better for Cmc, Glass, Haberman, Hayes, and Wine and worse for Yeast than those of the classical trees. Additionally, the MCCs of Car, Cmc, Glass, and Scale were higher, and that of Yeast lower, than the classical trees' MCCs. Based on MCC, Car, Cmc, Glass, and Scale are super-extensive, which is a subset of the classification obtained by the two-parameter fractional Tsallis trees.
Finally, the Gini and two-parameter fractional Tsallis decision trees are compared using AUROC and MCC. The results are shown in Table 6 and indicate that the two-parameter fractional Tsallis trees outperform the AUROC of the Gini trees in six databases and the MCC in ten. This underpins the fact that Gini trees are a particular case of two-parameter fractional Tsallis trees with q = 2. In summary, the two-parameter fractional Tsallis trees yield better classifications than both the classical and Gini trees.

6. Conclusions

This paper introduces two-parameter fractional Tsallis decision trees underpinned by fractional-order entropies. The three parameters of this new decision tree need to be tuned to produce better classifications than the classical ones. The trial-and-error approach is the standard method to adjust the entropic index for Renyi and Tsallis decision trees; however, it is unfeasible for two-parameter fractional Tsallis trees. From a representation of the database as a complex network, it was possible to determine a set of values for the parameters q, α, and β. The experimental results on twelve databases show that the proposed values yield better classifications (AUROC, MCC) for eight of them, and for the four remaining, the classification was equal to that produced by the classical trees.
Moreover, the two values ($q_\alpha$, $q_\beta$) were tested in Renyi and Tsallis decision trees. The results show that Renyi outperforms the classical trees in four (AUROC) and five (MCC) out of twelve databases. Similarly, Tsallis decision trees produced better classifications for five (AUROC) and four (MCC) databases. The classification was worse in at most three databases for Renyi and in one for Tsallis. The overall results suggest that both parametric decision trees outperform the classical trees in seven databases, which is less favorable than the eight databases improved by the two-parameter fractional Tsallis decision trees. In addition, the databases with a better classification using Tsallis decision trees are a subset of those for which the two-parameter fractional Tsallis trees produced a better classification. This supports the conjecture that the two-parameter fractional Tsallis entropy is a finer measure than parametric entropies such as Renyi and Tsallis.
The approximation technique for the tree parameters introduced here is a valuable alternative for practitioners. Furthermore, the classification of the databases based on the non-extensive properties of the Tsallis and two-parameter fractional Tsallis entropies reveals that the relationships between the records and their attribute values (modeled by a network) are complex. Such complex relationships are better measured by the two-parameter fractional Tsallis entropy, the cornerstone of the proposed decision tree.
The results pave the way for using the two-parameter fractional Tsallis entropy in other data mining techniques, such as K-means, generic MST, Kruskal MST, and dimension reduction algorithms, in the future. Our research has the limitation that the databases used in the experiments are not large enough to reveal the reduction in time compared with the trial-and-error approach to setting the tree parameters. However, we conjecture that our method works on large databases, which will be the scope of future research.

Author Contributions

Conceptualization, A.R.-A.; formal analysis, J.S.D.l.C.-G.; investigation, J.S.D.l.C.-G. and J.B.-R.; methodology, J.S.D.l.C.-G.; supervision, A.R.-A.; writing—original draft, J.S.D.l.C.-G.; writing—review and editing, J.B.-R. and A.R.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Secretaria de Investigación de Posgrado grant number SIP20220415.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This work was partially supported by Secretaria de Investigación de Posgrado under Grant No. SIP20220415.

Conflicts of Interest

The authors declare no conflict of interest.


References

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
2. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics; The Regents of the University of California, University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
3. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
4. Abe, S. A note on the q-deformation-theoretic aspect of the generalized entropies in nonextensive physics. Phys. Lett. A 1997, 224, 326–330.
5. Johal, R.S. q calculus and entropy in nonextensive statistical physics. Phys. Rev. E 1998, 58, 4147.
6. Lavagno, A.; Swamy, P.N. q-Deformed structures and nonextensive-statistics: A comparative study. Phys. A Stat. Mech. Appl. 2002, 305, 310–315.
7. Jackson, D.O.; Fukuda, T.; Dunn, O.; Majors, E. On q-definite integrals. Q. J. Pure Appl. Math. 1910, 41, 193–203.
8. Duan, S.; Wen, T.; Jiang, W. A new information dimension of complex network based on Rényi entropy. Phys. A Stat. Mech. Appl. 2019, 516, 529–542.
9. Maszczyk, T.; Duch, W. Comparison of Shannon, Renyi and Tsallis Entropy Used in Decision Trees. In Artificial Intelligence and Soft Computing—ICAISC 2008; Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 643–651.
10. Ramirez-Arellano, A.; Bory-Reyes, J.; Hernandez-Simon, L.M. Statistical Entropy Measures in C4.5 Trees. Int. J. Data Warehous. Min. 2018, 14, 1–14.
11. Gajowniczek, K.; Orlowski, A.; Zabkowski, T. Entropy Based Trees to Support Decision Making for Customer Churn Management. Acta Phys. Pol. A 2016, 129, 971–979.
12. Lima, C.F.L.; de Assis, F.M.; Cleonilson Protásio, C.P. Decision Tree Based on Shannon, Rényi and Tsallis Entropies for Intrusion Tolerant Systems. In Proceedings of the 2010 Fifth International Conference on Internet Monitoring and Protection, Barcelona, Spain, 9–15 May 2010; pp. 117–122.
13. Wang, Y.; Song, C.; Xia, S.T. Improving decision trees by Tsallis Entropy Information Metric method. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 4729–4734.
14. Wang, Y.; Xia, S.T.; Wu, J. A less-greedy two-term Tsallis Entropy Information Metric approach for decision tree classification. Knowl.-Based Syst. 2017, 120, 34–42.
15. Sharma, S.; Bassi, I. Efficacy of Tsallis Entropy in Clustering Categorical Data. In Proceedings of the 2019 IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India, 26–28 July 2019; pp. 1–5.
16. Zhang, L.; Cao, Q.; Lee, J. A novel ant-based clustering algorithm using Renyi entropy. Appl. Soft Comput. 2013, 13, 2643–2657.
17. Wang, Y.; Xia, S.T. Unifying attribute splitting criteria of decision trees by Tsallis entropy. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 2507–2511.
18. Tsallis, C.; Tirnakli, U. Non-additive entropy and nonextensive statistical mechanics – Some central concepts and recent applications. J. Phys. Conf. Ser. 2010, 201, 012001.
19. Tsallis, C. Introduction to Non-Extensive Statistical Mechanics: Approaching a Complex World; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
20. Ramirez-Arellano, A.; Hernández-Simón, L.M.; Bory-Reyes, J. A box-covering Tsallis information dimension and non-extensive property of complex networks. Chaos Solitons Fractals 2020, 132, 109590.
21. Ramirez-Arellano, A.; Sigarreta-Almira, J.M.; Bory-Reyes, J. Fractional information dimensions of complex networks. Chaos Interdiscip. J. Nonlinear Sci. 2020, 30, 093125.
22. Ramirez-Arellano, A.; Hernández-Simón, L.M.; Bory-Reyes, J. Two-parameter fractional Tsallis information dimensions of complex networks. Chaos Solitons Fractals 2021, 150, 111113.
23. Ramírez-Reyes, A.; Hernández-Montoya, A.R.; Herrera-Corral, G.; Domínguez-Jiménez, I. Determining the Entropic Index q of Tsallis Entropy in Images through Redundancy. Entropy 2016, 18, 299.
24. Chen, X.; Zhou, J.; Liao, Z.; Liu, S.; Zhang, Y. A Novel Method to Rank Influential Nodes in Complex Networks Based on Tsallis Entropy. Entropy 2020, 22, 848.
25. Zhang, Q.; Li, M.; Deng, Y. A new structure entropy of complex networks based on non-extensive statistical mechanics. Int. J. Mod. Phys. C 2016, 27, 1650118.
26. Shafee, F. Lambert function and a new non-extensive form of entropy. IMA J. Appl. Math. 2007, 72, 785–800.
27. Ubriaco, M.R. Entropies based on fractional calculus. Phys. Lett. A 2009, 373, 2516–2519.
28. Ubriaco, M.R. A simple mathematical model for anomalous diffusion via Fisher’s information theory. Phys. Lett. A 2009, 373, 4017–4021.
29. Karci, A. Fractional order entropy: New perspectives. Optik 2016, 127, 9172–9177.
30. Karci, A. Notes on the published article “Fractional order entropy: New perspectives” by Ali KARCI, Optik-International Journal for Light and Electron Optics, Volume 127, Issue 20, October 2016, Pages 9172–9177. Optik 2018, 171, 107–108.
31. Radhakrishnan, C.; Chinnarasu, R.; Jambulingam, S. A Fractional Entropy in Fractal Phase Space: Properties and Characterization. Int. J. Stat. Mech. 2014, 2014, 460364.
32. Ferreira, R.A.C.; Tenreiro Machado, J. An Entropy Formulation Based on the Generalized Liouville Fractional Derivative. Entropy 2019, 21, 638.
33. Machado, J.T. Entropy analysis of integer and fractional dynamical systems. Nonlinear Dyn. 2010, 62, 371–378.
34. Machado, J.T. Fractional order generalized information. Entropy 2014, 16, 2350–2361.
35. Wang, Q.A. Extensive Generalization of Statistical Mechanics Based on Incomplete Information Theory. Entropy 2003, 5, 220–232.
36. Wang, Q.A. Incomplete statistics: Nonextensive generalizations of statistical mechanics. Chaos Solitons Fractals 2001, 12, 1431–1437.
37. Kaniadakis, G. Maximum entropy principle and power-law tailed distributions. Eur. Phys. J. B 2009, 70, 3–13.
38. Tsallis, C. An introduction to nonadditive entropies and a thermostatistical approach to inanimate and living matter. Contemp. Phys. 2014, 55, 179–197.
39. Kapitaniak, T.; Mohammadi, S.A.; Mekhilef, S.; Alsaadi, F.E.; Hayat, T.; Pham, V.T. A New Chaotic System with Stable Equilibrium: Entropy Analysis, Parameter Estimation, and Circuit Design. Entropy 2018, 20, 670.
40. Jalab, H.A.; Subramaniam, T.; Ibrahim, R.W.; Kahtan, H.; Noor, N.F.M. New Texture Descriptor Based on Modified Fractional Entropy for Digital Image Splicing Forgery Detection. Entropy 2019, 21, 371.
41. Ibrahim, R.W.; Jalab, H.A.; Gani, A. Entropy solution of fractional dynamic cloud computing system associated with finite boundary condition. Bound. Value Probl. 2016, 2016, 94.
42. He, S.; Sun, K.; Wu, X. Fractional symbolic network entropy analysis for the fractional-order chaotic systems. Phys. Scr. 2020, 95, 035220.
43. Machado, J.T.; Lopes, A.M. Fractional Rényi entropy. Eur. Phys. J. Plus 2019, 134, 217.
44. Beck, C. Generalized information and entropy measures in physics. Contemp. Phys. 2009, 50, 495–510.
45. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2011.
46. Hilpert, J.C.; Marchand, G.C. Complex Systems Research in Educational Psychology: Aligning Theory and Method. Educ. Psychol. 2018, 53, 185–202.
47. Karuza, E.A.; Thompson-Schill, S.L.; Bassett, D.S. Local Patterns to Global Architectures: Influences of Network Topology on Human Learning. Trends Cogn. Sci. 2016, 20, 629–640.
48. Ramirez-Arellano, A. Students learning pathways in higher blended education: An analysis of complex networks perspective. Comput. Educ. 2019, 141, 103634.
49. Zhao, X.; Wang, X.Y. An Approach to Compute Fractal Dimension of Color Images. Fractals 2017, 25, 1750007.
50. Stanisz, T.; Kwapień, J.; Drożdż, S. Linguistic data mining with complex networks: A stylometric-oriented approach. Inf. Sci. 2019, 482, 301–320.
51. RamirezArellano, A. Classification of Literary Works: Fractality and Complexity of the Narrative, Essay, and Research Article. Entropy 2020, 22, 904.
52. Kim, J.; Wilhelm, T. What is a complex graph? Phys. A Stat. Mech. Appl. 2008, 387, 2637–2652.
53. van Steen, M. Graph Theory and Complex Networks: An Introduction; Cambridge University Press: Cambridge, MA, USA, 2010.
54. Song, C.; Gallos, L.K.; Havlin, S.; Makse, H.A. How to calculate the fractal dimension of a complex network: The box covering algorithm. J. Stat. Mech. Theory Exp. 2007, 2007, P03006.
55. Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 22 February 2022).
56. Yang, Y.; Webb, G.I. Proportional k-Interval Discretization for Naive-Bayes Classifiers. In European Conference on Machine Learning (ECML 2001); Springer: Berlin/Heidelberg, Germany, 2001; Volume 2167, pp. 564–575.
57. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30.
58. Sprent, P.; Smeeton, N.C. Applied Nonparametric Statistical Methods, 3rd ed.; Texts in Statistical Science; Chapman & Hall/CRC: Boca Raton, FL, USA, 2001.
59. Montgomery, D.C.; Runger, G.C. Applied Statistics and Probability for Engineers; John Wiley & Sons: Hoboken, NJ, USA, 2003.
60. Tsallis, C. Entropic nonextensivity: A possible measure of complexity. Chaos Solitons Fractals 2002, 13, 371–391.
Figure 1. A decision tree for the classification task.
Figure 2. Network construction from a database. The nodes in the same color belong to the same box for l = 2.
Figure 3. Box covering of a network for l = 3. (a) Original network. (b) Dual network. (c) Colouring process. (d) Mapping colours to the original network.
Figure 4. Renormalization of a network. (a) Grouping nodes into boxes. (b) Converting boxes into supernodes.
Figure 5. The networks from the (a) non-discretized and (b) discretized Vehicle database.
Table 1. The results of N_b(l) and δ from the network of Figure 3, and the "pseudo matrix" of α.

| l | N_b(l) | δ     | α_{l,1} | α_{l,2} | α_{l,3} |
|---|--------|-------|---------|---------|---------|
| 1 | 6      | -     | -       | -       | -       |
| 2 | 3      | 0.107 | α_{2,1} | α_{2,2} | α_{2,3} |
| 3 | 2      | 0.107 | α_{3,1} | α_{3,2} | -       |
| 4 | 2      | 0.143 | α_{4,1} | α_{4,2} | -       |
| 5 | 1      | -     | -       | -       | -       |
Table 2. Database and network features. N = nominal, U = numerical, M = mixed.

| Database      | Records | Attributes | Type | Classes | Balanced | Nodes  | Edges  |
|---------------|---------|------------|------|---------|----------|--------|--------|
| Breast Cancer | 699     | 9          | N    | 2       | No       | 737    | 1276   |
| Car           | 1728    | 6          | N    | 4       | Yes      | 25     | 70     |
| Cmc           | 1473    | 9          | M    | 3       | No       | 74     | 264    |
| Glass         | 214     | 10         | U    | 7       | No       | 1159   | 1743   |
| Haberman      | 306     | 3          | U    | 2       | No       | 94     | 395    |
| Hayes         | 160     | 5          | N    | 3       | No       | 150    | 186    |
| Image         | 2310    | 19         | U    | 7       | Yes      | 12,705 | 24,411 |
| Letter        | 20,000  | 16         | U    | 16      | Yes      | 282    | 2700   |
| Scale         | 625     | 4          | N    | 3       | No       | 23     | 90     |
| Vehicle       | 946     | 18         | U    | 4       | Yes      | 1434   | 8064   |
| Wine          | 178     | 13         | U    | 3       | No       | 1279   | 2239   |
| Yeast         | 1484    | 9          | M    | 10      | No       | 1917   | 4907   |
Table 3. The parameters of the fractional Tsallis decision tree obtained using the networks from the discretized databases.

| Database      | ⟨q_α⟩ | ⟨q_β⟩ | ⟨α^1⟩ | ⟨β^1⟩ | ⟨α^2⟩ | ⟨β^2⟩ |
|---------------|-------|-------|-------|-------|-------|-------|
| Breast Cancer | 0.173 | 0.189 | 1.147 | 1.134 | 0.853 | 0.866 |
| Car           | 0.303 | 0.347 | 1.137 | 1.120 | 0.863 | 0.880 |
| Cmc           | 0.169 | 0.185 | 1.152 | 1.138 | 0.848 | 0.862 |
| Glass         | 0.171 | 0.187 | 1.154 | 1.141 | 0.846 | 0.859 |
| Haberman      | 0.344 | 0.420 | 1.333 | 1.273 | 0.667 | 0.727 |
| Hayes         | 0.269 | 0.310 | 1.231 | 1.200 | 0.769 | 0.800 |
| Image         | 0.117 | 0.123 | 1.056 | 1.054 | 0.944 | 0.946 |
| Letter        | 0.155 | 0.165 | 1.05  | 1.047 | 0.950 | 0.953 |
| Scale         | 0.352 | 0.421 | 1.217 | 1.182 | 0.783 | 0.818 |
| Vehicle       | 0.092 | 0.096 | 1.106 | 1.101 | 0.894 | 0.899 |
| Wine          | 0.119 | 0.127 | 1.147 | 1.138 | 0.853 | 0.862 |
| Yeast         | 4.574 | 5.081 | 1.003 | 1.003 | 0.997 | 0.997 |
Table 4. The AUROC and MCC of classical (CT) and two-parameter fractional Tsallis decision trees (TFTT) and their parameters q, α, β. + means that the AUROC or MCC is statistically greater than the AUROC or MCC of CT.

| Database      | CT AUROC | TFTT AUROC | CT MCC | TFTT MCC | q     | α     | β     | Param. Set |
|---------------|----------|------------|--------|----------|-------|-------|-------|------------|
| Breast Cancer | 0.959    | 0.964 +    | 0.889  | 0.967 +  | 0.173 | 0.853 | 0.866 | (⟨q_α⟩, ⟨α^2⟩, ⟨β^2⟩) |
| Car           | 0.981    | 0.982      | 0.892  | 0.912 +  | 0.347 | 0.863 | 0.880 | (⟨q_β⟩, ⟨α^2⟩, ⟨β^2⟩) |
| Cmc           | 0.691    | 0.714 +    | 0.315  | 0.349 +  | 0.169 | 1.152 | 1.138 | (⟨q_α⟩, ⟨α^1⟩, ⟨β^1⟩) |
| Glass         | 0.794    | 0.874 +    | 0.56   | 0.673 +  | 0.171 | 1.154 | 1.141 | (⟨q_α⟩, ⟨α^1⟩, ⟨β^1⟩) |
| Haberman      | 0.579    | 0.610 +    | 0.18   | 0.156    | 0.344 | 1.333 | 1.273 | (⟨q_α⟩, ⟨α^1⟩, ⟨β^1⟩) |
| Hayes         | 0.869    | 0.895 +    | 0.578  | 0.645 +  | 0.269 | 1.231 | 1.200 | (⟨q_α⟩, ⟨α^1⟩, ⟨β^1⟩) |
| Image         | 0.994    | 0.992      | 0.982  | 0.978    | 0.123 | 1.056 | 1.054 | (⟨q_β⟩, ⟨α^1⟩, ⟨β^1⟩) |
| Letter        | 0.969    | 0.974 +    | 0.912  | 0.934 +  | 0.155 | 0.950 | 0.953 | (⟨q_α⟩, ⟨α^2⟩, ⟨β^2⟩) |
| Scale         | 0.845    | 0.861 +    | 0.678  | 0.703 +  | 0.421 | 1.217 | 1.182 | (⟨q_β⟩, ⟨α^1⟩, ⟨β^1⟩) |
| Vehicle       | 0.762    | 0.755      | 0.395  | 0.387    | 0.092 | 0.894 | 0.899 | (⟨q_α⟩, ⟨α^2⟩, ⟨β^2⟩) |
| Wine          | 0.968    | 0.977 +    | 0.933  | 0.957 +  | 0.119 | 1.147 | 1.138 | (⟨q_α⟩, ⟨α^1⟩, ⟨β^1⟩) |
| Yeast         | 0.743    | 0.733      | 0.462  | 0.463    | 4.574 | 0.997 | 0.997 | (⟨q_α⟩, ⟨α^2⟩, ⟨β^2⟩) |
Table 5. AUROC and MCC of classical (CT), Renyi (RT), and Tsallis (TT) decision trees. + means that the AUROC or MCC is statistically greater than that of CT, and − means the opposite.

| Database      | CT AUROC | RT AUROC | TT AUROC | CT MCC | RT MCC  | TT MCC  | q_r | q_t |
|---------------|----------|----------|----------|--------|---------|---------|-----|-----|
| Breast Cancer | 0.959    | 0.971 +  | 0.963    | 0.889  | 0.901 + | 0.887   | ⟨q_α⟩ = 0.173 | ⟨q_α⟩ = 0.173 |
| Car           | 0.981    | 0.983    | 0.982    | 0.892  | 0.906 + | 0.912 + | ⟨q_α⟩ = 0.303 | ⟨q_β⟩ = 0.347 |
| Cmc           | 0.691    | 0.676 −  | 0.712 +  | 0.315  | 0.256 − | 0.35 +  | ⟨q_β⟩ = 0.185 | ⟨q_α⟩ = 0.169 |
| Glass         | 0.794    | 0.838 +  | 0.835 +  | 0.56   | 0.622 + | 0.599 + | ⟨q_β⟩ = 0.187 | ⟨q_α⟩ = 0.171 |
| Haberman      | 0.579    | 0.500 −  | 0.610 +  | 0.18   | 0.024 − | 0.152   | ⟨q_α⟩ = 0.344 | ⟨q_α⟩ = 0.334 |
| Hayes         | 0.869    | 0.869    | 0.895 +  | 0.578  | 0.579   | 0.587   | ⟨q_α⟩ = 0.269 | ⟨q_α⟩ = 0.269 |
| Image         | 0.994    | 0.997    | 0.995    | 0.982  | 0.984   | 0.978   | ⟨q_α⟩ = 0.117 | ⟨q_β⟩ = 0.123 |
| Letter        | 0.969    | 0.980 +  | 0.967    | 0.912  | 0.939 + | 0.913   | ⟨q_β⟩ = 0.165 | ⟨q_α⟩ = 0.155 |
| Scale         | 0.845    | 0.839    | 0.857    | 0.678  | 0.651   | 0.706 + | ⟨q_β⟩ = 0.421 | ⟨q_β⟩ = 0.421 |
| Vehicle       | 0.762    | 0.776    | 0.748    | 0.395  | 0.297 − | 0.371   | ⟨q_β⟩ = 0.096 | ⟨q_α⟩ = 0.092 |
| Wine          | 0.968    | 0.963    | 0.976 +  | 0.933  | 0.923   | 0.924   | ⟨q_α⟩ = 0.119 | ⟨q_α⟩ = 0.119 |
| Yeast         | 0.743    | 0.789 +  | 0.578 −  | 0.462  | 0.505 + | 0.098 − | ⟨q_β⟩ = 5.081 | ⟨q_α⟩ = 4.574 |
Table 6. AUROC and MCC of Gini decision trees (GT) and two-parameter fractional Tsallis decision trees (TFTT). + means that the AUROC or MCC is statistically greater than that of GT.

| Database      | GT AUROC | TFTT AUROC | GT MCC | TFTT MCC |
|---------------|----------|------------|--------|----------|
| Breast Cancer | 0.963    | 0.964      | 0.888  | 0.967 +  |
| Car           | 0.981    | 0.982      | 0.897  | 0.912 +  |
| Cmc           | 0.58     | 0.714 +    | 0.357  | 0.349    |
| Glass         | 0.712    | 0.874 +    | 0.437  | 0.673 +  |
| Haberman      | 0.52     | 0.61 +     | 0.068  | 0.156 +  |
| Hayes         | 0.871    | 0.895 +    | 0.655  | 0.645    |
| Image         | 0.988    | 0.992      | 0.946  | 0.978 +  |
| Letter        | 0.962    | 0.974 +    | 0.894  | 0.934 +  |
| Scale         | 0.866    | 0.861      | 0.654  | 0.703 +  |
| Vehicle       | 0.71     | 0.755 +    | 0.294  | 0.387 +  |
| Wine          | 0.932    | 0.977      | 0.847  | 0.957 +  |
| Yeast         | 0.728    | 0.733      | 0.414  | 0.463 +  |
