*2.2. Dataset Preprocessing*

Categorical and continuous data were separated into two groups. Classification and regression are considered supervised machine learning algorithms used for accuracy assessment and instruction. The selected dataset had noise—some values were small and some had a considerable enough amount of data for the supervised, trained machine learning model. Data samples containing outliers were discarded; the remaining data were filtered with data mining. Data mining involves clustering, classification, feature selection, association, calculation, outlier analysis, and pattern discovery. Incomplete data were removed during the data cleaning process. Similarly, several steps were involved in the data post-processing, such as pattern interpretation, pattern evolution, pattern visualization, and pattern selection. 1. K-fold assists in the accuracy of the ML (machine learning) model after training. 2. Spyder IDE helps establish a Python environment data science application using anaconda distribution.
