*4.5. Decision Tree Classifier Based Decision Support System (DTCDSS)*

A Decision Support System (DSS) can be built on the decision tree models discussed above, as shown in Figure 10.

**Figure 10.** Decision Tree Classifier based Decision Support System (DTCDSS).

The input parameters and the sample water quality observations are drawn from the input databases and passed to the decision tree classifier (DTC). The decision tree models discussed above classify the water quality data based on the input parameters and the respective decision tree algorithms. The results are stored in the output database as "water quality predictions/formulae/bounds". The outputs are assigned to classes according to the US EPA water usage classification. From the output classifications, one can identify under- or over-reporting issues and check against maximum contaminant levels (MCLs). In this process, the US EPA data warehouse system helps compare the output water quality parameters with the stipulated MCLs. The inner working of any of the decision tree classifiers discussed above is shown in the flow chart in Figure 11.
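The MCL comparison step can be illustrated with a minimal sketch. The parameter names and threshold values below are hypothetical placeholders chosen only for illustration, and the function `check_against_mcls` is not part of the DSS described here; actual MCLs would have to be taken from the applicable US EPA regulations.

```python
# Hypothetical thresholds for illustration only; these are NOT real
# US EPA maximum contaminant levels.
ILLUSTRATIVE_MCLS = {
    "parameter_a": 10.0,
    "parameter_b": 0.5,
}


def check_against_mcls(predictions, mcls):
    """Return an overall label and the sorted list of parameters above their MCL."""
    exceeded = sorted(
        name
        for name, value in predictions.items()
        if name in mcls and value > mcls[name]
    )
    return ("unsafe" if exceeded else "safe", exceeded)


# One made-up sample of predicted output parameters.
sample = {"parameter_a": 12.3, "parameter_b": 0.2}
label, exceeded = check_against_mcls(sample, ILLUSTRATIVE_MCLS)
print(label, exceeded)
```

In this sketch a sample is flagged as soon as any one parameter exceeds its threshold; other aggregation rules are equally possible.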

First, the DTC receives the input data from the input databases, such as climate, land use, and water quality data. The splitting criterion (entropy or information gain, gain ratio, or Gini index) is computed according to the decision tree model chosen in the DTC. For example, if the decision tree is ID3, then entropy or information gain is computed. If the decision tree is C4.5, then the gain ratio is computed. Similarly, if the decision tree is CART, then the Gini index is computed. The input parameters and the output parameters of a data sample are presented at the root node first. The tree is then split according to "if-else" decisions that minimize the heterogeneity of the data, or equivalently increase its homogeneity. The tree grows as new data samples are presented at the root node, and the leaf nodes are gradually formed. At every node of the tree, the chosen splitting criterion is computed so that the data can be split and traversed down to the leaf nodes. Thus, at the leaf nodes, the DTC classifies the data into four output classes, namely body contact and recreation, fishing and boating, domestic utilization, and dangerous. The model performance can be measured using metrics such as accuracy, precision, recall, and F1-score. The DT model in the classifier can be any one of CART, ID3, and C4.5 with their bagging and boosting variants, Random Forest, and Extremely Randomized Trees. The output performance of the DSS can be specific to the DT model chosen and may also be sensitive to the data. The best DT model for the DSS can be determined only by experimenting with the decision tree models stated above on data from several watersheds consisting of stream networks of varied conditions.

If the DTC is replaced with a Decision Tree Regressor (DTR), the output parameter, such as fecal coliform, can be predicted from input parameters such as climate and land use. The output parameter predictions can be further generalized or recast using the regression formulae obtained from the DTR and the output parameter bounds. The current Decision Support System is an adapted version of the DSS discussed in [29]: the Artificial Neural Network (ANN) model is replaced by the DTC to suit the current problem of classifying stream waters into four classes. The performance of the DSS using DTR and the comparison of the DTR and ANN models are outside the scope of the current work and are pursued elsewhere as separate research studies. Comparing output parameter predictions with the respective MCLs within the DSS framework indicates the suitability of stream waters broadly as "safe" or "unsafe," thus supporting a helpful decision.
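The splitting criteria named in this section (entropy and information gain for ID3, gain ratio for C4.5, Gini index for CART) can be written out as a short sketch using their standard textbook definitions. The function names and the toy labels are assumptions made for illustration and are not taken from the study's implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels (criterion used by CART)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy reduction achieved by splitting parent into children (ID3)."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    """Information gain normalized by the split information (C4.5)."""
    n = len(parent)
    split_info = -sum(
        (len(ch) / n) * log2(len(ch) / n) for ch in children if ch
    )
    if split_info == 0:
        return 0.0
    return information_gain(parent, children) / split_info


# Toy node with two of the four output classes, split into two pure children.
parent = ["fishing and boating"] * 2 + ["dangerous"] * 2
children = [["fishing and boating"] * 2, ["dangerous"] * 2]

ig = information_gain(parent, children)
gr = gain_ratio(parent, children)
g = gini(parent)
```

A split that separates the classes completely, as in this toy node, removes all of the parent's entropy, which is why a tree-growing procedure would prefer it over a split that leaves mixed children.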

**Figure 11.** Flow Chart of a Decision Tree Classifier Model.
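The flow described for the classifier (compute a splitting criterion at each node, split with an "if-else" rule, repeat until leaf nodes are formed) can be sketched as a small recursive procedure. This is a minimal CART-style illustration using the Gini index on numeric features; the two features, the toy data, the `max_depth` limit, and all function names are hypothetical and do not reproduce the models evaluated in this work.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf is simply the majority class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the if-else rules from the root node down to a leaf node."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Made-up samples with two hypothetical input features.
rows = [[0.1, 3.0], [0.2, 1.0], [0.8, 2.0], [0.9, 4.0]]
labels = ["body contact and recreation", "body contact and recreation",
          "dangerous", "dangerous"]

tree = build_tree(rows, labels)
accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
```

Only two of the four output classes appear in the toy data to keep the example short; the same procedure applies unchanged with all four classes, and swapping `gini` for an entropy-based score would give an ID3-style variant.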
