3.3.2. Information Gain

The information gain of an input attribute measures how strongly it relates to the target output. A higher information gain indicates that splitting on the attribute separates the training data according to the target output to a greater extent. Information gain falls as the entropy remaining after the split rises: the more randomness the split leaves in the resulting subsets, the lower the information gain. The ID3 model [36] uses information gain as its criterion for selecting the splitting attribute.

$$Information\ Gain = Entropy(before) - \sum_{j=1}^{K} \frac{N_j}{N}\, Entropy(j, after) \tag{5}$$

where the index *j* runs over the *K* subsets produced by splitting on the attribute, and $N_j/N$ is the fraction of training samples falling into subset *j*. Maximizing the information gain is therefore equivalent to minimizing the weighted entropy after the split. In Equation (5), the first term on the right-hand side is fixed: it is the entropy before any split. The attribute selected for the first split is the one that minimizes the second term on the right-hand side, which maximizes the information gain for that attribute.
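As a minimal sketch of Equation (5), the computation can be written directly from the definitions above; the function names and the toy data here are illustrative, not part of the original paper:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Information gain of splitting `labels` by attribute `values`:
    parent entropy minus the size-weighted entropy of each subset
    produced by the split, as in Equation (5)."""
    n = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical example: a binary attribute that perfectly separates
# two equally frequent classes yields the maximum gain of 1 bit.
gain = information_gain(["a", "a", "b", "b"], [0, 0, 1, 1])
print(round(gain, 3))  # 1.0
```

An attribute whose split leaves the class mixture unchanged in every subset would instead give a gain of zero, which is why ID3 ranks attributes by this quantity.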
