1. Introduction
Because pollution has many causes, most of them linked to human activity, ensuring that water is safe to drink is difficult. The overexploitation of environmental resources is one of the primary factors contributing to water quality problems [1]. Rapid industrialization, a growing emphasis on agricultural expansion, the use of agricultural fertilizers, and weak enforcement of regulations have led to a substantial increase in water pollution [2]. The uneven distribution of rainfall can make the situation significantly worse. In addition, individual behavior contributes considerably to the overall pattern of water quality. Both point and non-point sources of pollution have a detrimental effect on water quality [3,4]. Typical sources of pollution include sewage discharge, factory discharge, agricultural run-off, and urban run-off. A lack of awareness and education among water consumers may also contribute to the problem, as can flooding and droughts.
User involvement in sustaining the quality of groundwater consumption, together with attention to factors such as cleanliness, sanitary conditions, preservation, and waste treatment, is essential for preserving groundwater quality [5]. Poor water quality contributes significantly to the spread of illness and slows socioeconomic development. Worldwide, waterborne infections are responsible for around 5 million deaths each year. Rain can carry the fertilizers and pesticides that farmers apply through the soil and deposit them in rivers [6]. Additionally, industrial waste is flushed down drains and into lakes and rivers. These contaminants enter the food chain, where they accumulate until they reach levels that are hazardous, and eventually fatal, to animals, fish, and birds. Waste from chemical plants is often discarded into nearby bodies of water [7]. Factories also use river water to power or clean machinery, and in doing so they raise the water temperature.
Pathogens, or disease-causing agents, are eliminated from water using disinfectants such as chlorine. One difficulty in freshwater management is the residue left in the water when excessive amounts of chlorine are added [8]. When residual chlorine is consumed in large quantities, it reacts within the stomach and damages the cells of specific organs. Various adverse health effects are associated with drinking chlorinated tap water, and some of them stem from chlorine's propensity to form trihalomethanes. Trihalomethanes (THMs), such as chloroform, are produced when chlorine reacts with small organic particles in drinking water [9]. These compounds have been linked to various adverse health effects, including respiratory infections, gastrointestinal issues, congenital anomalies, and cancers of the prostate, rectum, and caecum.
Furthermore, prolonged contact with chlorine at low concentrations can irritate the eyes, nose, and throat. At higher concentrations, inhaling chlorine gas can alter a person's breathing rate and cause coughing and lung problems [10]. The signs and symptoms of chlorine poisoning can be severe. Workers who are exposed to chlorine risk health problems, and the dose, duration, and task being performed all determine the degree of exposure [11,12]. In addition, the chlorine decay rate in drinking water sources depends on several variables, particularly temperature. Hence, real-time quantitative analysis of drinking water is required.
The main aim of the proposed framework is to sustain a continuous supply of drinkable water, so the chlorine content of the water must be checked in real time [13]. Existing approaches rely mostly on laboratory-based testing, which is time-consuming and expensive. Although water monitoring systems have advanced [14], they still rely on wireless sensor networks or wireless network technology, which have several drawbacks, including inadequate information security, limited transmission coverage, and poor control of power consumption [15]. In this respect, the Internet of Things (IoT) is beneficial because it enables the construction of real-time systems that are more effective, reliable, and cost-efficient. The proposed model continuously reports chlorine level values to the sensor hub. That information is then transferred through a communication channel for data analysis, which evaluates the chlorine level in drinking water and the residual chlorine concentration over time.
2. Related Work
Pollution caused by industrialization and urbanization is detrimental to the environment and to life on Earth, and water in particular has become contaminated [16]. Heavily polluted water can cause food contamination, diarrhea, short-term digestive disturbances, lung ailments, skin problems, and other significant health effects. In an emerging nation such as Bangladesh, ready-made garment manufacturing is one of the most important contributors to the overall Gross Domestic Product (GDP). Most waste from garment manufacturers is dumped into adjacent rivers or canals [3,17]. Consequently, the water in these rivers and streams has become unsuitable for living creatures and is now one of the greatest threats to the environment and human health [18].
Monitoring water quality in real time is critical to reducing the risk of water pollution. As a further consequence of water pollution in the sub-continent, the number of fish in the country's rivers and canals is dwindling daily [19,20]. To conserve fish, other aquatic species, and the ecosystem, one must monitor the water's quality and determine the source of the contamination. Most strategies for reducing water pollution are biological and laboratory-based, which requires a significant investment of time and materials. Researchers have developed mobile applications with an Internet of Things (IoT) platform to create real-time groundwater quality assessment frameworks [21,22,23,24,25].
An IoT-based platform for monitoring water quality includes a processing unit, sensors, and a communication module. The processing unit is a microcontroller, the sensor module handles data collection, and the access point transmits the data [26]. Through Wi-Fi modules, the gathered data can be sent either to a cloud storage service for further analysis or to an interface suitable for personal computers and smartphones [27]. Several framework designs recommend solar panels as the system's primary power source.
The published research contains many proposals for IoT-based groundwater pollution monitoring. These frameworks focus on portability, overall efficiency, completeness of data, cost-effectiveness, and the communication capacity of the components [28,29]. IoT is used to observe chemical or physical changes in rivers and streams in order to determine pollutant concentrations in real time, and to detect hydrologic variation in coastal and marine ecosystems for early-warning platforms and rapid response to harmful algal blooms. Developing nations such as Nigeria use the IoT to monitor the aquatic environment, groundwater, rain, and commercially available water so that rural communities can access potable water.
One study presents a technique for identifying the substantial effect of non-point sources and also assesses the implications of micronutrients and confinement. An aquatic environment surveillance network uses a sensor network to detect alkalinity, dissolved oxygen, conductivity, and humidity [30]. A remote monitoring system based on Zigbee technology is presented to lower costs and maximize adaptability; it delivers water-related data to the user. A system that uses temperature and pH sensors to check water quality is presented for the autonomous observation of artificial lake water [31]. That investigation also examines the relevance of the Zigbee module for remote agriculture monitoring and the low power consumption of Zigbee during wireless data transfer. A remote surveillance communication approach is formulated to track how water is distributed; its model uses the flat-earth two-ray framework to demonstrate feasibility.
A pH sensor, a dissolved oxygen (DO) sensor, and GPS are combined in a system for measuring and tracking water quality that focuses specifically on groundwater resources [32]. Temperature, phosphate concentration, dissolved oxygen, resistance, pH, turbidity, and water depth are among the parameters that can be monitored using Intelligent Coast Multi-Sensor Technology [33]. A framework is presented for evaluating freshwater using the Internet of Things (IoT) in a real-world environment. To assess marine systems in real-world settings, a pH sensor coupled with a Wi-Fi module is used [34]. In addition, long-range communication is now widely used because it offers low-power, long-range, and dependable communication when sending data from edge devices to the server. The recommended system can carry out long-range IoT water quality evaluation without data loss; the platform uses various sensors, including those that measure pH, humidity, sedimentation, and conductivity.
A literature review of techniques for measuring water quality parameters identified the following work. The researchers in [35] used web scraping to gather the many factors that make up water quality, and they used a YSI 600 module and a Raspberry Pi to measure these parameters. Other work was limited to IoT setups that run a Raspberry Pi with a few sensors [36]. Another investigation devised an approach for assessing water quality and built a packaged system with a microprocessor and sensors, using novel sensor components that monitor and broadcast real-time information. Combined with a Zigbee module, the sensors form a system that can offer both a visual and an audible representation of the measurements taken in the water [37]. Water quality assessment also incorporates computer vision and neural network techniques: Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models are used to build water quality classifiers.
The remainder of the paper is organized as follows. Section 2 briefly covers pertinent literature. Section 3 describes the proposed methodology, Chlorine Level Assessment and Prediction in Water Monitoring System, which uses fuzzy sets together with a decision tree algorithm. Section 4 details the data processing workflow of the proposed model. Section 5 presents the results and discussion. Section 6 concludes the paper.
3. Proposed Model
In IoT-enabled ML for Water Quality Assessment, we use a temperature sensor, a flow sensor, a chlorine sensor, and a solenoid valve. These sensors are connected to a controller board such as the Raspberry Pi 3, which transmits the sensor readings to the ThingSpeak cloud service. After the data have been compiled in the cloud service, a machine learning model is used to predict the chlorine content, together with a forecasting strategy, which is a branch of predictive modeling. The proposed model is used primarily to measure and analyze the current level of pollution and to estimate its future level.
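As a minimal sketch of the upload step, the snippet below builds a request for the ThingSpeak channel-update endpoint (`https://api.thingspeak.com/update`, which accepts `api_key` and `field1` to `field8`). The assignment of temperature, chlorine, and flow to fields 1 to 3, the function names, and the placeholder key are assumptions made for illustration; the paper does not state how the channel is configured.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(api_key, temperature_c, chlorine_ppm, flow_l_per_min):
    """Build a ThingSpeak channel-update URL for one set of sensor readings."""
    # Assumption: which reading goes into which field depends on the channel setup.
    params = {
        "api_key": api_key,
        "field1": f"{temperature_c:.2f}",
        "field2": f"{chlorine_ppm:.2f}",
        "field3": f"{flow_l_per_min:.2f}",
    }
    return f"{THINGSPEAK_UPDATE_URL}?{urlencode(params)}"


def send_reading(url, timeout_s=10):
    """Send the update over HTTP and return the response body as text."""
    with urlopen(url, timeout=timeout_s) as response:
        return response.read().decode().strip()


# "YOUR_WRITE_API_KEY" is a placeholder, not a real key.
url = build_update_url("YOUR_WRITE_API_KEY", 24.5, 3.2, 1.8)
print(url)
```

On the sensor hub, `send_reading(url)` would be called once per sampling interval; it is not invoked here so that the sketch runs without network access.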
Chlorine level prediction using machine learning involves building a model that predicts the amount of chlorine in a sample from various inputs, as shown in Figure 1. The process starts with collecting data on chlorine levels and on factors that may affect them, such as temperature, pH, and other chemicals. The data are then preprocessed to remove outliers and missing values and, if necessary, normalized. The preprocessed data are divided into training and testing datasets. The machine learning algorithm is trained on the training dataset, which means adjusting the model parameters so that the model predicts the chlorine level for a given set of inputs. The model is then evaluated on the testing dataset, and the accuracy of its predictions is measured. If the accuracy is not satisfactory, the model may need to be adjusted, or a different machine learning algorithm may need to be used. Once optimized, the model can be used to make predictions for new data points, providing estimates of the chlorine level in a sample. This information supports informed decisions about water treatment processes and helps ensure that water is safe for consumption.
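The steps above (cleaning, normalization, train/test split, training, and evaluation) can be sketched as follows. This is an illustration only: the data are synthetic, the k-nearest-neighbour regressor is a stand-in chosen for brevity rather than the model used in the paper, and all function names are hypothetical.

```python
import random


def drop_missing(rows):
    """Remove rows that contain a missing (None) value."""
    return [r for r in rows if None not in r]


def min_max_normalize(rows, n_features):
    """Scale the first n_features columns to [0, 1]; the target column is kept as is."""
    lows = [min(r[i] for r in rows) for i in range(n_features)]
    highs = [max(r[i] for r in rows) for i in range(n_features)]
    scaled = []
    for r in rows:
        feats = [(r[i] - lows[i]) / (highs[i] - lows[i]) if highs[i] > lows[i] else 0.0
                 for i in range(n_features)]
        scaled.append(feats + [r[n_features]])
    return scaled


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and split into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, features, k=3):
    """Predict the target as the mean of the k nearest training targets."""
    nearest = sorted(
        train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[:-1], features))
    )[:k]
    return sum(r[-1] for r in nearest) / len(nearest)


# Synthetic rows of (temperature in C, pH, chlorine in PPM); not measured data.
raw = [(float(t), 7.0 + 0.1 * (t % 3), round(6.0 - 0.2 * (t - 15), 2))
       for t in range(15, 31)]
raw.append((22.0, None, 4.6))  # one record with a missing pH reading

clean = drop_missing(raw)
# For brevity the scaling uses all rows; a stricter setup would fit it on the training split only.
scaled = min_max_normalize(clean, n_features=2)
train, test = train_test_split(scaled)
mae = sum(abs(knn_predict(train, r[:-1]) - r[-1]) for r in test) / len(test)
print(f"test MAE: {mae:.3f} PPM")
```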
Temperature can affect the solubility of the chlorine compounds in the water and the reaction kinetics between the chlorine and other substances in the water. For example, warmer temperatures can increase the solubility of chlorine compounds and can speed up the reaction kinetics, leading to a higher chlorine aggregate ratio. It is important to monitor the groundwater’s temperature and consider it when assessing the chlorine aggregate ratio. This can help ensure that accurate predictions are made and that the water quality is properly maintained.
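To make the temperature dependence concrete, the sketch below uses a first-order bulk decay model with an Arrhenius rate constant. Both the model form and the parameter values are assumptions introduced for illustration; the paper does not specify a decay equation, and the constants would have to be fitted to measurements for a real water source.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def decay_rate(temp_c, pre_exponential, activation_energy):
    """Arrhenius rate constant k = A * exp(-Ea / (R * T)), with T in kelvin."""
    return pre_exponential * math.exp(
        -activation_energy / (GAS_CONSTANT * (temp_c + 273.15))
    )


def residual_chlorine(initial_ppm, rate_per_hour, hours):
    """First-order decay: C(t) = C0 * exp(-k * t)."""
    return initial_ppm * math.exp(-rate_per_hour * hours)


# Hypothetical parameters, chosen only so that the example produces readable numbers.
A = 1.0e8      # pre-exponential factor, per hour
EA = 50_000.0  # activation energy, J/mol

k_cold = decay_rate(10.0, A, EA)
k_warm = decay_rate(30.0, A, EA)
for label, k in (("10 C", k_cold), ("30 C", k_warm)):
    print(label, f"residual after 24 h: {residual_chlorine(2.0, k, 24):.3f} PPM")
```

Under these assumptions the rate constant grows with temperature, so the same initial dose leaves less residual chlorine in warmer water after the same holding time.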
3.1. Phase 1
Temperature is one factor that affects the chlorine aggregate ratio in groundwater. The chlorine bulk decay rate in drinking water is predominantly influenced by temperature, and the temperature sensor provides the reading used to determine this rate. The chlorine sensor measures the amount of chlorine dissolved in the water in parts per million (PPM). In the proposed model, values range from 0 to 10 PPM: values 1, 2, and 3 indicate low chlorine content, values 4, 5, and 6 indicate moderate chlorine content, and values 7, 8, 9, and 10 indicate high chlorine content. The PPM data are transmitted to the sensor hub (Raspberry Pi) in real time. The real-time PPM values are also sent from the sensor hub to the Data Acquisition Centre (DAC) via a communication network.
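The PPM bands described above can be expressed as a small lookup. The text only assigns the integer values 1 to 10, so treating 0 and fractional readings by the cut-offs 4 and 7 is an assumption of this sketch, as is the function name.

```python
def chlorine_category(ppm):
    """Map a chlorine reading in PPM (0 to 10) to low, moderate, or high."""
    if not 0 <= ppm <= 10:
        raise ValueError(f"reading {ppm} is outside the 0-10 PPM range")
    # Assumption: readings below 4 (including 0 and fractions) count as low,
    # readings from 4 up to but not including 7 as moderate, the rest as high.
    if ppm < 4:
        return "low"
    if ppm < 7:
        return "moderate"
    return "high"


for reading in (2, 5, 9):
    print(reading, chlorine_category(reading))
```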
3.2. Phase 2
Fuzzy logic can be used to predict chlorine levels in the water. Fuzzy rules describe the relationship between input variables (e.g., temperature, pH, etc.) and the output variable (chlorine level). These rules can be combined to form a fuzzy inference system, making predictions based on input data. The accuracy of the predictions depends on the quality of the fuzzy rules and the accuracy of the input data. It is important to validate the fuzzy inference system using data to ensure accuracy.
In phase two, fuzzy rules are employed to assess the amount of chlorine in the drinking water. If the chlorine content is low, more bleach must be added; if the chlorine content is moderate or high, no additional bleach is required. The fuzzy algorithm then signals the solenoid valve according to the determined chlorine range. The solenoid valve is an electromechanical component that releases bleach or chlorine into the water tank, as shown in Figure 2. Flow meters are used to measure the volume of chlorine, in liters, that has to be added to the storage tank. The solenoid valve is connected to the flow sensor.
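The dosing step can be sketched as a simple control routine: open the valve only when the chlorine category is low, accumulate the volume reported by the flow meter, and close the valve once the target volume is reached. `SimulatedValve`, `dose_bleach`, and the list of flow samples are hypothetical stand-ins for the real hardware interfaces, which the paper does not describe.

```python
class SimulatedValve:
    """Stand-in for a solenoid valve driver; only tracks open/closed state."""

    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


def dose_bleach(category, valve, flow_samples_liters, target_liters):
    """Dispense bleach when chlorine is low; return the volume dispensed in liters."""
    if category != "low":
        return 0.0  # moderate or high chlorine: no additional bleach
    dispensed = 0.0
    valve.open()
    try:
        # Each sample is the volume the flow meter measured in one interval.
        for liters in flow_samples_liters:
            dispensed += liters
            if dispensed >= target_liters:
                break
    finally:
        valve.close()  # always close the valve, even if reading the meter fails
    return dispensed


valve = SimulatedValve()
print(dose_bleach("low", valve, [0.25] * 10, target_liters=0.5))
```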
The decision tree algorithm is an intuitive and interpretable method for making predictions. It can also handle missing data and is not sensitive to outliers. However, it can become complex and overfit the data if many branches and input variables exist. To address this issue, decision trees can be pruned by removing branches that do not significantly improve the accuracy of the predictions. Pruning reduces the model's complexity and improves its ability to generalize. The decision tree algorithm is a popular choice for chlorine level assessment and prediction in water monitoring systems because of its ease of use and its ability to handle complex data. However, it is important to validate the model's performance using appropriate metrics to ensure that the predictions are accurate.
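A minimal sketch of a decision tree with a simple form of pruning is given below. It uses Gini impurity and pre-pruning (a split is rejected when its impurity reduction falls below `min_gain` or the depth limit is reached), which is one common way to limit tree size; the paper does not say which pruning scheme it uses. The training samples are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) for the best binary split, or None."""
    parent = gini(labels)
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, f, threshold)
    return best


def build_tree(rows, labels, min_gain=0.01, max_depth=3):
    """Grow a tree; branches with too little gain become majority-class leaves."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None or split[0] < min_gain:
        return Counter(labels).most_common(1)[0][0]
    _, f, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return (
        f,
        threshold,
        build_tree([r for r, _ in left], [y for _, y in left], min_gain, max_depth - 1),
        build_tree([r for r, _ in right], [y for _, y in right], min_gain, max_depth - 1),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


# Invented samples of (temperature in C, chlorine in PPM) with category labels.
rows = [(20, 1.0), (22, 2.5), (25, 3.0), (21, 4.5),
        (24, 5.0), (26, 6.5), (23, 7.5), (27, 9.0)]
labels = ["low", "low", "low", "moderate", "moderate", "moderate", "high", "high"]

tree = build_tree(rows, labels)
print(predict(tree, (22, 5.5)))
```

Raising `min_gain` or lowering `max_depth` yields a smaller tree, at the cost of a coarser fit to the training data.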
The chlorine level is predicted from two input variables: temperature and chlorine values. Chlorine sensors measure the amount of chlorine dissolved in the water in the buffer. Because multiple sensors measure different values over a given time interval, the readings are combined through fuzzy rules. The workflow for predicting chlorine levels using fuzzy rules is shown in Figure 3.
The initial step is fuzzification, which takes the chlorine level measured by the sensor as input and converts it into linguistic variables of fuzzy sets comprising five terms: Very High, High, Medium, Low, and Very Low. Fuzzy rules are formed and evaluated by Fuzzy Associative Mapping (FAM), which is illustrated in Table 1.
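Fuzzification can be illustrated with triangular membership functions over the 0 to 10 PPM range. The shape of the functions and their breakpoints (peaks at 0, 2.5, 5, 7.5, and 10 PPM) are assumptions of this sketch; the paper names the five linguistic terms but does not give their membership functions.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at left and right, 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed breakpoints in PPM: (left, peak, right) for each linguistic term.
CHLORINE_SETS = {
    "Very Low": (-2.5, 0.0, 2.5),
    "Low": (0.0, 2.5, 5.0),
    "Medium": (2.5, 5.0, 7.5),
    "High": (5.0, 7.5, 10.0),
    "Very High": (7.5, 10.0, 12.5),
}


def fuzzify(ppm):
    """Return the degree of membership of a reading in each linguistic term."""
    return {term: triangular(ppm, *abc) for term, abc in CHLORINE_SETS.items()}


print(fuzzify(6.25))
```

With these breakpoints a reading between two peaks belongs partly to both neighbouring terms, which is what lets the fuzzy rules respond gradually rather than at hard thresholds.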
In the system model, temperature and chlorine are the two input variables, and they are mapped into a two-dimensional matrix, which is shown in Table 2. Chlorine is represented column-wise, and temperature is represented row-wise. Fuzzy Associative Mapping reduces the rate of false negatives during the prediction phase.
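The sketch below shows how a two-dimensional fuzzy associative matrix can be evaluated, with temperature terms as rows and chlorine terms as columns. The matrix entries, the three temperature terms, and the output actions "Dose" and "Hold" are hypothetical placeholders, not the contents of Table 2. Rule strength is computed with the usual min operator for AND, and rules that share an output are combined with max.

```python
TEMP_TERMS = ("Low", "Medium", "High")                              # rows (assumed)
CHLORINE_TERMS = ("Very Low", "Low", "Medium", "High", "Very High")  # columns

# Placeholder rule matrix: FAM[i][j] is the action for temperature term i
# and chlorine term j. These entries are illustrative only.
FAM = [
    ["Dose", "Dose", "Hold", "Hold", "Hold"],  # Low temperature
    ["Dose", "Dose", "Hold", "Hold", "Hold"],  # Medium temperature
    ["Dose", "Dose", "Dose", "Hold", "Hold"],  # High temperature
]


def infer(temp_degrees, chlorine_degrees):
    """Fire every rule in the matrix and return the strength of each action."""
    strengths = {}
    for i, t_term in enumerate(TEMP_TERMS):
        for j, c_term in enumerate(CHLORINE_TERMS):
            # AND of the two antecedents: minimum of their membership degrees.
            fire = min(temp_degrees.get(t_term, 0.0), chlorine_degrees.get(c_term, 0.0))
            action = FAM[i][j]
            # Rules with the same consequent are aggregated with maximum.
            strengths[action] = max(strengths.get(action, 0.0), fire)
    return strengths


result = infer({"Medium": 0.3, "High": 0.7}, {"Medium": 1.0})
print(result)
```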
4. Data Processing Workflow
Several challenges must be addressed when using the proposed model to predict chlorine levels in water monitoring systems. The first is data quality and availability: machine learning algorithms require large amounts of accurate data to train the model, so ensuring that the collected data are of high quality and available in sufficient quantity is crucial for developing an accurate prediction model. Second, feature selection is needed to identify the key factors that affect chlorine levels in the water, as these become the input variables of the prediction model; careful selection of the input variables is crucial for obtaining accurate predictions. Third, an overly simple model may not capture the complex relationships between the input variables and chlorine levels, whereas an overly complex model may overfit the data and predict poorly on new data. Finally, the model's performance must be evaluated using appropriate metrics such as mean absolute error, root mean square error, or R-squared. This evaluation determines the accuracy of the predictions and identifies areas for improvement.
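The three regression metrics named above follow directly from their standard definitions, as in the sketch below. The sample values are invented for illustration and are not results from the paper.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented chlorine readings (PPM) and model outputs.
actual = [2.0, 4.0, 6.0]
predicted = [2.0, 5.0, 5.0]
print(mean_absolute_error(actual, predicted),
      root_mean_square_error(actual, predicted),
      r_squared(actual, predicted))
```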
The problem statement and the combination of attributes considered for the water quality domain must be comprehensive. Consider the chlorine level data set, which includes all attributes of the water quality data set; it facilitates feature engineering and enables the selection of the best features. Pre-processing techniques such as data cleaning are applied to eliminate erroneous or inconsistent data and to handle missing data, for example by discarding records with missing values, filling them with arbitrary values, or predicting the missing values. Finally, unnecessary entries are removed from the pre-processed data. The Pearson, Spearman, and Kendall coefficients can be used to measure the correlation between the target class labels and the features, and this correlation affects the model's performance. Dimension reduction is then performed on the pre-processed dataset to extract the vital attributes from the current dataset; it reduces the dimension via matrix factorization. Both feature extraction and feature selection decrease the number of features: feature extraction generates new features from existing ones.
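As an illustration of two of the steps above, the sketch below fills missing values with the column mean (one of several possible strategies) and computes the Pearson and Spearman coefficients between a feature and the target. The Kendall coefficient is omitted for brevity, the data are invented, and the function names are hypothetical.

```python
import math


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented readings: temperature (C) and chlorine (PPM).
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
chlorine = [5.0, 4.2, 3.1, 2.5, 1.0]
print(pearson(temperature, chlorine), spearman(temperature, chlorine))
```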
Feature selection picks a subset of the existing features. The aim is to eliminate irrelevant and noisy information, reduce computing time and complexity, and enhance the performance of machine learning algorithms. Three types of feature selection methods are distinguished. The filter method is scalable and runs quickly, before classification. The wrapper method uses a black-box evaluator to determine the optimal subset of features; it is more accurate than the filter technique but quite costly, so the filter technique is used more often than wrappers. Finally, the hybrid approach bridges the gap between the filter and wrapper methods: the filter provides the wrapper with a reduced feature set, allowing the wrapper to scale efficiently to larger datasets. The hybrid strategy thus merges the wrapper and filter approaches and uses the ranking information provided by the filter method.
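The hybrid strategy can be sketched in two stages: a filter that ranks features by the absolute value of their Pearson correlation with the target and keeps the top `k`, followed by a wrapper that scores every subset of the survivors with a black-box evaluator. The ranking criterion, the exhaustive subset search, the sample data, and the stand-in scoring function are all assumptions of this sketch.

```python
import math
from itertools import combinations


def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 when either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return abs(cov / denom) if denom else 0.0


def filter_top_k(columns, target, k):
    """Filter stage: keep the k features most correlated with the target."""
    ranked = sorted(columns, key=lambda name: abs_pearson(columns[name], target),
                    reverse=True)
    return ranked[:k]


def wrapper_best_subset(candidates, score_subset):
    """Wrapper stage: evaluate every non-empty subset with a black-box scorer."""
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = score_subset(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Invented feature columns and target (chlorine in PPM).
columns = {
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0],
    "ph": [7.0, 7.1, 7.3, 7.2, 7.4],
    "temperature": [10.0, 15.0, 20.0, 25.0, 30.0],
}
target = [5.0, 4.0, 3.0, 2.0, 1.0]

kept = filter_top_k(columns, target, k=2)


def demo_score(subset):
    # Stand-in evaluator; in practice this would be the validation accuracy
    # of a model trained on the given subset.
    return ("temperature" in subset) - 0.1 * len(subset)


best, score = wrapper_best_subset(kept, demo_score)
print(kept, best)
```

Because the wrapper only sees the `k` features that pass the filter, the number of subsets it must evaluate stays small even when the original feature set is large.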
Model validation is essential in chlorine level assessment and prediction in water monitoring systems. Its purpose is to assess the accuracy of the model by comparing its predictions with the actual chlorine levels in the water. As mentioned above, the most common validation approach divides the data into two parts: a training dataset and a testing dataset. The training dataset is used to train the proposed algorithm, while the testing dataset is used to validate the model's performance, as shown in Figure 4. The model is trained on the training dataset, and its predictions are compared with the actual chlorine levels in the testing dataset. This comparison yields metrics such as recall, precision, and F-score. These metrics give a quantitative measure of the accuracy of the predictions and can be used to compare the proposed model with alternative models.
5. Results and Discussion
The performance evaluation of the proposed framework for determining the chlorine level focuses on classification metrics such as recall, precision, and F-score, which are among the most prominent metrics for classification algorithms. One way to evaluate a machine learning framework is to look at how accurately it makes positive predictions; this metric is called precision. Precision is the ratio of the number of correct positive predictions (true positives) to the total number of positive predictions, which comprises both true positives and false positives. Another essential measure is recall, which is the proportion of a class's samples that the model correctly predicts, that is, the ratio of true positives to the sum of true positives and false negatives.
One of the most common metrics in machine learning is the F1 score, which is the harmonic mean of the precision and recall values. The proposed water quality algorithm proved effective in comparison with KNN and SVM, achieving a recall of 90%, a precision of 92%, and an F-score of 89%, as shown in Figure 5.
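The three classification metrics can be computed from confusion-matrix counts as in the sketch below. The counts in the example are invented for illustration and are not the values behind Figure 5.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented counts: 90 true positives, 10 false positives, 30 false negatives.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```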
Thresholds are the probability cut-offs used in binary classification to discriminate between the two classes. The receiver operating characteristic (ROC) curve uses these probabilities to show how well a model distinguishes between the classes. The ROC curve of the proposed model is skewed toward the true positive rate, with an AUC value of 0.9108, which compares favorably with SVM (Support Vector Machine) and KNN (K-Nearest Neighbor), as shown in Figure 5. The area under the ROC curve (AUC) is an additional standard measure, as shown in Figure 6.
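For reference, the sketch below computes one point of the ROC curve at a given threshold and the AUC through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented and are unrelated to the reported AUC of 0.9108.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (true positive rate, false positive rate) for one cut-off."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs that are ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented binary labels and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
tpr, fpr = rates_at_threshold(labels, scores, threshold=0.65)
print(f"AUC={auc:.3f} TPR={tpr:.3f} FPR={fpr:.3f}")
```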
The evaluation of the suggested model, including the fusion performance within a range of 10 to 30 m, is centered on Perception, Precision, Accuracy, and the Matthews Correlation Coefficient (MCC). In addition, we calculate the F-curve, the P-curve, the R-curve, and the PR-curve. Finally, we compare performance on metrics such as training dataset vs. box loss, obj loss, cls loss, and other similar metrics. The precision of the recommended approach can then be evaluated by dividing the number of true positive (+VE) predictions by the total number of positive (+VE) predictions, as seen in Figure 7.
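Accuracy and the Matthews Correlation Coefficient can both be derived from the four confusion-matrix counts, as sketched below using their standard definitions. The counts in the example are invented and do not reproduce the results in Figure 7.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when a row or column of the matrix is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented counts for a binary chlorine-level classifier.
print(accuracy(90, 85, 15, 10), matthews_corrcoef(90, 85, 15, 10))
```

Unlike accuracy, MCC takes all four cells of the confusion matrix into account, which makes it more informative when the classes are imbalanced.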