Proceeding Paper

Research on a Streamlined Causal Tree Algorithm Based on Factor Space Theory †

1 College of Science, Liaoning Technical University, Fuxin 123000, China
2 Institute of Intelligent Engineering and Math, Liaoning Technical University, Fuxin 123000, China
* Authors to whom correspondence should be addressed.
Presented at the 2023 Summit of the International Society for the Study of Information (IS4SI 2023), Beijing, China, 14–16 August 2023.
Comput. Sci. Math. Forum 2023, 8(1), 72; https://doi.org/10.3390/cmsf2023008072
Published: 16 August 2023
(This article belongs to the Proceedings of 2023 International Summit on the Study of Information)

Abstract

Decision rule extraction is an important tool for artificial intelligence and data mining, but redundant decision rules reduce the generalization ability of causal trees. To reduce the size of causal trees and improve classification accuracy, this paper proposes, on the basis of factor space theory, a streamlined causal tree algorithm: noise and special samples are eliminated from the dataset using an extended determining degree criterion, the conditional factor with the optimal extended determining degree is taken as the branch node of the tree, the abnormal-state objects are removed under that conditional factor, and the procedure is applied recursively. Comparison with other classification algorithms shows that the streamlined causal tree algorithm produces the smallest causal tree, the fewest redundant rules, and the best classification accuracy.

1. Introduction

In 1982, Wang Peizhuang [1] proposed the idea of factor space from the origin of object cognition and, based on it, established a mathematical theory of knowledge representation, factor space theory, which is the earliest basic theory of artificial intelligence in intelligence mathematics internationally. In 2014, Wang Peizhuang et al. [2] started from the logical nature of reasoning to extract causal rules rapidly and proposed the factor analysis method, one of the core algorithms of factor space, which provides an important tool for artificial intelligence and data mining. Bao Yanke et al. [3] proposed subtraction and rotation operations to improve the factor analysis method's use of the sample information in the training set. Liu Haitao et al. [4] provided a reasoning model for the factor analysis method, which solved the object recognition problem caused by incomplete training samples and improved the accuracy of the method. Wang Huadong [5] adopted a column-by-column advancement strategy when selecting factors for superposition division, improving the accuracy and running speed of the factor analysis method.
However, existing studies have not significantly reduced the size of the causal tree produced by the factor analysis method. The main method for reducing the size of a causal tree is pruning [6]. The literature shows that pruning can shrink causal trees to a certain extent, but the resulting trees are still not streamlined. The size of a causal tree reflects, to a certain extent, its generalization ability: the more complex the rules extracted from the dataset, the larger the tree, and rule redundancy leads to overfitting and weakens generalization. It is therefore particularly important to minimize the size of the causal tree without affecting classification accuracy. This paper proposes a streamlined causal tree algorithm in which, using a self-defined threshold, the noise samples in the training set are filtered out and the optimized causal tree is trained in the same step, greatly reducing the size of the causal tree and improving its classification performance. In addition, the deletion of the determining region is key to reducing the computational complexity of the algorithm and achieving fast convergence: the streamlined causal tree algorithm finds a larger determining region, enabling the algorithm to converge faster under the optimal threshold.

2. Basic Knowledge

A factor is a key to describing things and can be understood as a generalized gene. From a mathematical perspective, a factor is a special mapping that maps objects onto their phases (states). The basic concepts of factor space theory [7] are as follows:
Factors influence, restrict, and cause and affect one another. In the factor analysis method, the factor $g$ of interest is called the result factor, and the factors $\{f_1, f_2, \ldots, f_n\}$ that influence it are called conditional factors.
The causal analysis table takes objects as rows and the conditional and result factors as columns, as shown in Table 1. The element in the $i$-th row and $j$-th column of Table 1 represents the state of the $i$-th object under the $j$-th factor.
Definition 1.
Given a conditional factor $f_j$ and a state $a_t$ taken by that factor, write $[a_t] = \{u_i \mid f_j(u_i) = a_t\}$. If all objects in $[a_t]$ have the same result, that is, if there is a state (level) $g_l$ of the result factor $g$ such that $[a_t] \subseteq [g_l] = \{u_i \mid g(u_i) = g_l\}$, then $[a_t]$ is called a determining class of the factor $f_j$. The union of all determining classes of the factor $f_j$ is called its determining region for the result factor. The ratio of the number of objects $h$ in the determining region of $f_j$ to the total number of objects $m$ (the number of rows in the table) is called the determining degree of $f_j$ on the result factor $g$, denoted $d(f_j) = h/m$.
Definition 2.
If the class $[a_t]$ of a conditional factor $f_j$ is a determining class, so that all objects in $[a_t]$ have the same unique, definite result $g_l$, then the sentence "if $f_j$ is $a_t$, then the result $g$ is $g_l$" is an inference sentence determined by the conditional factor $f_j$, denoted $f_j = a_t \rightarrow g = g_l$.
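To make Definitions 1 and 2 concrete, here is a minimal Python sketch (not from the paper; the function name and the dictionary-per-row table representation are illustrative assumptions) that computes the determining classes, the inference sentences, and the determining degree of a single conditional factor from a causal analysis table:

```python
from collections import defaultdict

def determining_degree(objects, factor, result):
    """Determining classes and degree d(f_j) = h/m (Definitions 1 and 2).

    objects : list of dicts, one per row of the causal analysis table
    factor  : name of the conditional factor f_j (a column key)
    result  : name of the result factor g (a column key)
    """
    # Group the objects into classes [a_t] by their state under f_j.
    classes = defaultdict(list)
    for u in objects:
        classes[u[factor]].append(u)

    rules, h = [], 0
    for state, members in classes.items():
        results = {u[result] for u in members}
        if len(results) == 1:  # all of [a_t] shares one result: determining class
            rules.append((factor, state, results.pop()))  # f_j = a_t -> g = g_l
            h += len(members)
    return rules, h / len(objects)  # inference sentences and d(f_j)

# Toy table: "outlook" determines "play" only for the state "overcast".
table = [
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rain",     "play": "yes"},
    {"outlook": "rain",     "play": "no"},
]
print(determining_degree(table, "outlook", "play"))
# ([('outlook', 'overcast', 'yes')], 0.5)
```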

3. The Streamlined Causal Tree Algorithm

The factor analysis method in factor space can quickly and concisely analyze the causal relationships contained in a dataset, establish causal rules, and obtain a causal tree. However, when the dataset contains many conditional factors, or the conditional factors take many states, the trained causal tree contains redundant rules and predicts poorly. Because the determining degree is computed under an absolute criterion, noisy objects and special objects, arising from input errors, measurement equipment failures, and similar causes, have a significant negative impact on training with the factor analysis method. The method therefore cannot cope with noisy data, has poor robustness, and cannot fully exploit the decision effectiveness of the factors, which limits its application. Even if pre- and post-pruning are applied to the trained causal tree, this negative impact is unavoidable.
To counter the interference of noisy data, improve the robustness of the causal tree algorithm, reduce the size of the causal tree, and improve classification accuracy, a streamlined causal tree algorithm is proposed.

3.1. Algorithm Principle

The purpose of factor analysis in factor space is to transform a table into a set of inference sentences (decision rules). Since each determining class is contained in a result class, an inference sentence is formed from the determining class to the result class containing it, and a causal tree of rules from the conditional factors to the result factor is finally obtained.

3.1.1. Theoretical Knowledge

Definition 3.
(Extended determining class) Given a conditional factor $f_j$ and a state $a_t$ taken by that factor, write $[a_t] = \{u_i \mid f_j(u_i) = a_t, u_i \in U\}$ and, for each state $g_l$ of the result factor $g$ $(l = 1, 2, \ldots, s)$, $[g_l] = \{u_i \mid g(u_i) = g_l, u_i \in [a_t]\} \subseteq [a_t]$. Given a threshold $\alpha \in (0.5, 1]$, if $\frac{|[g_l]|}{|[a_t]|} > \alpha$ for some $l$, then $[a_t]$ is called an extended determining class of the factor $f_j$. The union of all extended determining classes of the factor $f_j$ is called the extended determining region for the result factor $g$.
Definition 4.
(Extended determining degree) The ratio of the number of objects $q$ in the extended determining region of the factor $f_j$ to the total number of objects $m$ is called the extended determining degree of $f_j$ on the result factor, denoted $d(f_j) = q/m$.
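A minimal sketch of Definitions 3 and 4 under the same assumed table representation as above; the only change from `determining_degree` is that the absolute purity test is relaxed to the $\alpha$ threshold:

```python
from collections import Counter, defaultdict

def extended_determining_degree(objects, factor, result, alpha):
    """Extended determining classes and degree (Definitions 3 and 4).

    A class [a_t] is extended determining when its majority result g_l
    covers more than the fraction alpha of the class, alpha in (0.5, 1].
    Returns the rules f_j = a_t -> g = g_l and d(f_j) = q / m.
    """
    classes = defaultdict(list)
    for u in objects:
        classes[u[factor]].append(u)

    rules, q = [], 0
    for state, members in classes.items():
        g_l, n_l = Counter(u[result] for u in members).most_common(1)[0]
        if n_l / len(members) > alpha:  # |[g_l]| / |[a_t]| > alpha
            rules.append((factor, state, g_l))
            q += len(members)  # the whole class joins the extended region
    return rules, q / len(objects)
```

With $\alpha = 0.9$, for example, a class in which 19 of 20 objects share one result counts as extended determining, whereas under Definition 1 the single dissenting object would disqualify it.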

3.1.2. Algorithm Principle

The algorithm adopts the extended determining degree, which keeps the set-logic essence of reasoning, as its selection criterion: at each node it selects the optimal conditional factor and, by enlarging the determining region, achieves fast convergence.

3.2. Setting of the α Threshold

If the α threshold is too low, the condition is easily met during training, so too many non-noise and special objects are deleted; this easily leads to underfitting, yielding an overly simple set of decision rules and a loss of decision value.
If the α threshold is too high, the condition is difficult to satisfy, so too few noisy and special objects are deleted, and the goals of optimizing the training set and reducing the size of the causal tree are not achieved.
Experiments show that a suitable α threshold generally lies in the range 0.8 to 0.95.

3.3. Algorithm Steps

Let the domain be $U = \{u_1, u_2, \ldots, u_m\}$, the conditional factors $F = \{f_1, f_2, \ldots, f_n\}$ with state spaces $I(f_j) = \{a_{j1}, a_{j2}, \ldots, a_{jk}\}$ $(j = 1, 2, \ldots, n)$, and the result factor $g$ with state space $I(g) = \{g_1, g_2, \ldots, g_s\}$. The steps of the streamlined causal tree algorithm are as follows (a Python sketch of the whole procedure is given after the list):
Input: Dataset.
Step 1. Divide the dataset into Train_data and Test_data, and fix an initial α threshold.
Step 2. Calculate $q(a_{jt})$ and $q(a_{jt}, g_l)$. Traverse all conditional factors; for each conditional factor $f_j$, count the number of objects $q(a_{jt})$ $(j = 1, \ldots, n;\ t = 1, \ldots, k)$ in each of its states, and the number of objects $q(a_{jt}, g_l)$ $(l = 1, \ldots, s)$ in each state of $f_j$ whose result factor takes the state $g_l$.
Step 3. Calculate the ratios $r_{a_{jt}, l} = q(a_{jt}, g_l) / q(a_{jt})$.
Step 4. Determine the extended determining classes. For each state $a_{jt}$ of the conditional factor $f_j$, compare $r_{a_{jt},1}, r_{a_{jt},2}, \ldots, r_{a_{jt},s}$ with $\alpha$. If $r_{a_{jt},l} > \alpha$, the class of objects whose state under $f_j$ is $a_{jt}$ is an extended determining class with result state $g_l$.
Step 5. Determine the extended determining regions. For each conditional factor, take the union of its extended determining classes to obtain its extended determining region.
Step 6. Calculate the extended determining degrees. From the number of objects in the extended determining region of each conditional factor, obtain $d = \{d_1, d_2, \ldots, d_n\}$ and the maximum $d_{\max} = \max\{d_1, d_2, \ldots, d_n\}$.
Step 7. Update the training set. Suppose the conditional factor corresponding to $d_{\max}$ is $f_j$. If a state $a_{jt}$ of $f_j$ forms an extended determining class with result state $g_l$, label the $q(a_{jt}, g_l)$ objects of that class as normal. The remaining $Q = \sum_{l' \neq l} q(a_{jt}, g_{l'})$ objects in that state do not belong to the extended determining class and are marked abnormal, i.e., as noise objects and special objects to be deleted. Delete the objects marked abnormal from Train_data to obtain a new training set Train_data1.
Step 8. Extract rules. On Train_data1, use the conditional factor corresponding to $d_{\max}$ to extract decision rules and split the dataset into sub-datasets.
Step 9. Build the causal tree. Repeat Steps 2 to 8 on each sub-dataset to construct the causal tree under the threshold α. Each node of the causal tree satisfies the condition α, and each branch is grown on the training set updated under α.
Step 10. Select the optimal α threshold. With a step size $step = 0.01$, repeat Steps 2 to 9 for each candidate threshold, analyze the relationship between α and the prediction accuracy of the causal tree, and select the optimal α on the training set.
Output: The causal tree under the optimal α threshold.
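The sketch below assembles Steps 2 to 9 into a recursive tree builder and adds the α sweep of Step 10. It is an illustrative reconstruction, not the authors' code: the dictionary-per-row table representation, the function names, the first-factor tie-breaking, and the stopping rules are all assumptions.

```python
from collections import Counter, defaultdict

def class_ratios(rows, factor, result):
    """Steps 2-3: group rows into classes [a_jt] under `factor` and return,
    per state, the majority result g_l, the ratio r = q(a_jt, g_l) / q(a_jt),
    and the member objects of the class."""
    classes = defaultdict(list)
    for u in rows:
        classes[u[factor]].append(u)
    out = {}
    for state, members in classes.items():
        g_l, n_l = Counter(u[result] for u in members).most_common(1)[0]
        out[state] = (g_l, n_l / len(members), members)
    return out

def build_tree(rows, factors, result, alpha):
    """Steps 2-9: grow one causal tree under a fixed alpha threshold."""
    if len({u[result] for u in rows}) == 1 or not factors:
        # Pure node or no factor left: leaf with the majority result state.
        return Counter(u[result] for u in rows).most_common(1)[0][0]

    # Steps 4-6: pick the factor with the maximal extended determining degree.
    best_f, best_d = None, -1.0
    for f in factors:
        q = sum(len(members) for _, r, members in
                class_ratios(rows, f, result).values() if r > alpha)
        if q / len(rows) > best_d:
            best_f, best_d = f, q / len(rows)

    # Steps 7-9: extended determining classes become rules (their abnormal
    # minority objects are dropped); the other classes are split recursively.
    node = {}
    for state, (g_l, r, members) in class_ratios(rows, best_f, result).items():
        if r > alpha:
            node[state] = g_l  # rule: best_f = state -> g = g_l
        else:
            node[state] = build_tree(members,
                                     [f for f in factors if f != best_f],
                                     result, alpha)
    return (best_f, node)

def best_alpha_tree(rows, factors, result, lo=0.80, hi=0.95, step=0.01):
    """Step 10: sweep alpha over [lo, hi] and keep the tree whose training
    accuracy is highest."""
    def predict(tree, u):
        while isinstance(tree, tuple):  # internal node: (factor, branches)
            factor, node = tree
            tree = node.get(u[factor])  # unseen state -> None, no rule fires
            if tree is None:
                return None
        return tree

    best = None
    alpha = lo
    while alpha <= hi + 1e-9:
        tree = build_tree(rows, list(factors), result, alpha)
        acc = sum(predict(tree, u) == u[result] for u in rows) / len(rows)
        if best is None or acc > best[0]:
            best = (acc, alpha, tree)
        alpha += step
    return best  # (training accuracy, optimal alpha, causal tree)
```

In this sketch the deletion of Step 7 happens implicitly: once a class is covered by a rule, none of its objects, normal or abnormal, are passed to deeper branches, which is what lets a larger extended determining region shrink the remaining work at each level.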

3.4. Instance Analysis

Five classification datasets from the UCI repository were analyzed using the streamlined causal tree algorithm, the factor analysis method, the ID3 algorithm, and the C4.5 algorithm. Tenfold cross-validation was used to obtain the number of decision rules, accuracy, precision, recall, F1-measure, and running time. The running time reported for the streamlined causal tree algorithm is the training time of the causal tree under the optimal threshold. The experimental results are shown in Table 2.

3.5. Conclusions

The causal tree trained by the streamlined causal tree algorithm has the smallest size, the best classification accuracy and F1-measure, and the fewest redundant rules. The streamlined causal tree algorithm can therefore not only reduce rule redundancy and significantly shrink the causal tree but also improve its classification performance to a certain extent, extending the theory and application of factor space.

Author Contributions

K.L. designed the algorithm, wrote the software for the verification analysis, and wrote the paper; F.Z. provided guidance on the writing and the preparation of the first draft; X.L., K.Z. and Y.W. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Liaoning Provincial Department of Education Project, grant number LJ2019JL019, and the Key Research Projects of Basic Scientific Research Projects in Higher Education Institutions of the Liaoning Provincial Department of Education, grant number LJKZZ20220047.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All data used in this article are from the UCI repository.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, P.Z.; Sugeno, M. Background structure of factor field and Fuzzy Set. Fuzzy Math. 1982, 2, 45–54. [Google Scholar]
  2. Wang, P.Z.; Guo, S.C.; Bao, Y.K.; Liu, H.T. Factor Analysis Method in Factor Space. J. Liaoning Tech. Univ. (Nat. Sci.) 2014, 33, 865–870. [Google Scholar]
  3. Bao, Y.K.; Ru, H.Y.; Jin, S.J. A new algorithm of knowledge mining in factor space. J. Liaoning Tech. Univ. (Nat. Sci.) 2014, 33, 1141–1144. [Google Scholar]
  4. Liu, H.T.; Guo, S.C. Reasoning model of factor analysis method. J. Liaoning Tech. Univ. (Nat. Sci.) 2015, 34, 124–128. [Google Scholar]
  5. Wang, H.D.; Wang, P.Z.; Guo, S.C. Improved factor analysis on factor spaces. J. Liaoning Tech. Univ. (Nat. Sci.) 2015, 34, 539–544. [Google Scholar]
  6. Fan, S.B.; Zhang, Z.J.; Huang, J. Association Rule Classification Method for Decision Tree Pruning Strengthening. Comput. Eng. Appl. 2023, 59, 87–94. [Google Scholar]
  7. Wang, P.Z.; Liu, H.T. Factor Space and Artificial Intelligence, 1st ed.; Beijing University of Posts and Telecommunications Press: Beijing, China, 2021; pp. 89–92. [Google Scholar]
Table 1. Causal analysis table.

U      f1          f2          …    fn          g
u1     f_1(u_1)    f_2(u_1)    …    f_n(u_1)    g(u_1)
u2     f_1(u_2)    f_2(u_2)    …    f_n(u_2)    g(u_2)
…      …           …           …    …           …
um     f_1(u_m)    f_2(u_m)    …    f_n(u_m)    g(u_m)

Each row of the causal analysis table is the coordinate of an object in the factor space; the entry in row $i$ under factor $f_j$ is the state $f_j(u_i)$, and the entry under $g$ is $g(u_i)$. A finite number of objects constitutes the domain $U = \{u_1, u_2, \ldots, u_m\}$; the conditional factors are $F = \{f_1, f_2, \ldots, f_n\}$ with state spaces $I(f_j) = \{a_{j1}, a_{j2}, \ldots, a_{jk}\}$ $(j = 1, 2, \ldots, n)$; the result factor is $g$ with state space $I(g) = \{g_1, g_2, \ldots, g_s\}$.
Table 2. Comparison of experimental results.

Dataset        Index                      Factor Analysis   Streamlined Causal     ID3      C4.5-PEP
                                          Method            Tree Algorithm
Lymphography   Number of decision rules   45                24                     50       28
               Accuracy                   0.7614            0.8514                 0.7438   0.7567
               Precision                  0.7822            0.8781                 0.8106   0.8074
               Recall                     0.7614            0.8514                 0.7438   0.7567
               F1                         0.7619            0.8530                 0.7549   0.7627
               Time/ms                    55                40                     57       74
Dermatology    Number of decision rules   98                26                     125      31
               Accuracy                   0.7593            0.9167                 0.7011   0.9134
               Precision                  0.8238            0.9525                 0.8073   0.9487
               Recall                     0.7593            0.9167                 0.7011   0.9134
               F1                         0.7776            0.9290                 0.7359   0.9224
               Time/ms                    201               133                    216      358
Cancer         Number of decision rules   86                34                     85       75
               Accuracy                   0.9312            0.9634                 0.9224   0.9313
               Precision                  0.9394            0.9608                 0.9326   0.9389
               Recall                     0.8622            0.9297                 0.8413   0.8637
               F1                         0.8950            0.9441                 0.8838   0.8983
               Time/ms                    90                50                     108      132
Australian     Number of decision rules   263               38                     222      160
               Accuracy                   0.7348            0.8768                 0.7739   0.8116
               Precision                  0.7795            0.9220                 0.8045   0.8506
               Recall                     0.7236            0.8540                 0.7834   0.8077
               F1                         0.7479            0.8840                 0.7917   0.8259
               Time/ms                    209               90                     203      274
Tic-tac-toe    Number of decision rules   271               76                     190      122
               Accuracy                   0.7828            0.8487                 0.8476   0.7975
               Precision                  0.8367            0.8481                 0.8939   0.8378
               Recall                     0.8277            0.9395                 0.8687   0.8588
               F1                         0.8306            0.8903                 0.8805   0.8470
               Time/ms                    240               120                    193      257