3.3.1. Entropy

Entropy measures the degree of randomness (impurity) in a given input set. A branch with zero entropy is chosen as a leaf node; if the entropy is nonzero, the branch is split further. The entropy E(S), measured in "bits", is given by

$$E(S) = -\sum_{i=1}^{K} p_i \log_2 p_i \tag{4}$$

where $p_i$ is the proportion (probability) of class $i$ in the node, and the index $i$ runs from 1 to the number of classes $K$. An attribute is split repeatedly until the entropy of the resulting subsets falls below that of the parent (training) set, eventually yielding leaf nodes of zero entropy. Minimizing entropy is desirable because it reduces the number of rules in the Decision Tree, i.e., it produces trees with fewer branches. Entropy is thus an information-theoretic measure of the 'uncertainty' present in the dataset due to the presence of multiple classes [35].
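For concreteness, a minimal sketch (not from the source) of how Eq. (4) can be evaluated in Python with NumPy is given below; the function name `entropy` and the toy label lists are illustrative assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy, in bits, of the class distribution in `labels` (Eq. 4)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()               # class proportions p_i
    # Only observed classes appear, so p_i > 0 and log2 is always defined.
    return -np.sum(p * np.log2(p)) + 0.0    # "+ 0.0" normalizes -0.0 to 0.0

# A pure node has zero entropy; a balanced two-class node has 1 bit.
print(entropy(["a", "a", "a", "a"]))   # 0.0  -> would become a leaf node
print(entropy(["a", "a", "b", "b"]))   # 1.0  -> would be split further
```

A split is kept when the weighted entropy of the resulting subsets is lower than the entropy of the parent node, consistent with the stopping rule described above.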
