1. Introduction
It has long been recognized that water pollution has negative impacts on human health and ecosystems [1]. Regarding human health, the most immediate and most severe impact is the lack of improved sanitation, together with the lack of safe drinking water, which currently affects more than one third of the world’s population [2]. Water pollution is caused by anthropogenic activities, such as industrial accidents [3,4] and urbanization [5,6], and by natural phenomena such as soil erosion [7,8]. Additional threats include exposure to pathogens or chemical toxicants via the food chain (e.g., as a result of irrigating plants with contaminated water and of the bioaccumulation of toxic chemicals by aquatic organisms, including seafood and fish) or during recreation (e.g., swimming in contaminated surface water) [9,10]. Serious water pollution also causes varying degrees of loss to industry. Therefore, the prevention and control of water pollution is a top priority for both daily life and industry [11].
To effectively address water pollution at several spatial scales, water quality modeling is highly relevant for evaluation. In this work, a Multiple Criteria Decision-Making (MCDM) model is developed for evaluating water quality. The measurement of water quality is one way of comparing degrees of water pollution [12]. Several works have been devoted to evaluating water pollution. For example, Reference [13] used MCDM techniques to present different aspects of water pollution control and reported the results of a case study for developing a master plan for water resources pollution control in Isfahan Province in Iran. Reference [14] reported the application of fuzzy MCDM for ranking different types of industries based on water pollution potential in the State of Gujarat, India. The fuzzy analytic hierarchy process, one of the most versatile MCDM methods, was used for managerial decision-making in complex surface water pollution situations with multiple and varied measures [15]. An alternative evaluation index system was developed to establish the priorities of a range of water pollution alternatives using MCDM techniques [16].
However, existing methods may produce different evaluation results for the same case, so it is difficult for policy-makers to decide which method to employ. In this circumstance, ensemble vote, a core component of ensemble learning, can help to solve the multi-MCDM model problem [17]. Ensemble learning can not only resolve inconsistent evaluation results in water pollution assessment, but also support water pollution environmental management.
MCDM methods have been applied to a wide range of application areas. For instance, the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) is used primarily in the manufacturing, industry, and government sectors. The Analytic Hierarchy Process (AHP) is used for deriving weights, and it is still the most popular method despite some criticism in recent years [18]. Hybrid approaches have also been applied: one study suggested that an integrated TOPSIS model could be used to solve the truck selection problem of a land transportation company [19]. A novel conjunctive MCDM approach that combines AHP and TOPSIS was used to develop an innovation evaluation system for location selection [20]. Similarly, a hybrid ensemble approach was used in a supplier selection problem, applying ensemble methods to obtain a better predictive performance than that achieved by an individual algorithm [21]. A novel hybrid ensemble learning paradigm integrating ensemble empirical mode decomposition was proposed for nuclear energy consumption forecasting, and empirical results demonstrated that the ensemble learning paradigm could outperform some forecasting models in both prediction and directional forecasting [22]. Furthermore, water quality, which is of great importance to our daily lives, has been forecasted using a multi-task multi-view learning method that blends multiple datasets from different domains [23]. However, evaluation results are often inconsistent between different models, even for the same case. Therefore, the proposed ensemble learning paradigm is used for the evaluation of water quality in this paper.
To address the aforementioned concerns, this paper proposes a hybrid ensemble learning approach for a data-mining evaluation framework, which combines TOPSIS [24], Grey Relational Analysis (GRA) [25], AHP [26], and the Takagi-Sugeno (TS) fuzzy neural network [27,28].
The main objective of this work is to select and combine algorithms in an optimum configuration so as to integrate multiple MCDM models in a framework based on multi-criteria evaluation. The main motivation is to evaluate water quality at six sites. Related work is discussed in Section 2. Section 3 presents the framework and the methodology. The results and discussion are given in Section 4. The conclusions of this work are summarized in Section 5.
2. Related Work
Generally, most of the criteria required in MCDM cannot be evaluated accurately, since it is impossible to obtain precise data concerning the assessment of the decision makers. Moreover, some of the criteria are only evaluated subjectively, which can lead to incorrect results [29]. MCDM is widely used in many evaluation cases, and it is used for the evaluation of water quality in our research. We therefore focus on comparing water quality at different sites in order to compare their degrees of water pollution according to the required water criteria [30]. Water quality evaluation refers to the understanding of water quality conditions, which is a complex problem involving multiple disciplines. Many countries have established relevant standards for different types of water systems. In the USA, water quality standards for surface waters are the foundation of the water quality-based pollution control program under the Clean Water Act (CWA) [31]. Meanwhile, China promulgated the surface water quality standards (GB3838-2002) for the surface water quality evaluation of rivers and lakes in 2002 [32]. Evaluation theory and evaluation methods have developed considerably in recent years. For instance, the index evaluation method [33], the fuzzy evaluation method [34], the Grey theory evaluation method [35], and the neural network evaluation method [36] have been put forth by different researchers.
In this active research area, several evaluation methods have been proposed. For example, TOPSIS has been used in green supplier selection. This is a complex multi-criteria problem that includes both quantitative and qualitative factors, which may involve conflicts and uncertainty. The identified components were integrated into a novel hybrid fuzzy MCDM model that combines TOPSIS with the Analytical Network Process [37]. Hybrid approaches have also been applied, and some research has suggested integrating the TOPSIS model to help industrial practitioners with performance evaluation in a fuzzy environment [38]. Similarly, a novel conjunctive MCDM approach that combines AHP and TOPSIS was employed to develop an innovation support system that considers the interdependence of higher education institutes to comprehensively evaluate their innovation performance [39]. Another study presented the application of multi-criteria evaluation in the selection of an optimal allocation for an air quality model [40], which was developed to support managers in exploring the strengths and weaknesses of each alternative. Another approach identified a preferred result based on the integration of Bayesian networks and fuzzy logic to rank and evaluate suppliers [41].
Approaches that integrate MCDM with machine learning algorithms have been developed in recent years, including some hybrid techniques that have been used effectively to conduct multi-attribute inventory analyses. For instance, naive Bayes, Bayesian network, artificial neural network (ANN), and support vector machine (SVM) algorithms have successfully dealt with inventory classification problems [42]. In financial risk prediction, MCDM methods (e.g., TOPSIS) were used to develop a two-step approach to evaluate classification algorithms, and the results showed that linear logistic, Bayesian network, and ensemble methods were the top-three classifiers [43]. In the assessment of machine learning algorithms, selecting the appropriate classifier from the list of available candidates is a time-consuming and challenging task. Therefore, an accurate MCDM methodology evaluates and ranks classifiers based on experience, allowing end-users or experts to select the top-ranked classifier for their applications to learn and build classification models for their specific purposes [44].
3. Methods
To address the above concerns, we propose a hybrid ensemble multi-model for a data-mining evaluation framework combining GRA, TS Fuzzy Neural Network, TOPSIS, and AHP, as shown in Figure 1. As Figure 1 shows, our framework consists of several key phases, including indicator preprocessing, a weights model, and an evaluation training model. The main contribution of this work is shown in node 3, which resolves the differing results obtained from multiple model evaluations; that is, a consistent result is obtained by ensemble vote. We describe the phases in detail in the following sections.
3.1. Ensemble Learning Approach
Ensemble learning accomplishes a learning task by building and combining multiple learners [45]. This method has two main advantages. The first is statistical: when the hypothesis space of a learning task is large, multiple learners may achieve the same performance on the training set, so relying on a single model risks a wrong selection, and combining multiple models reduces this risk. The second is computational: learning algorithms tend to fall into local minima, which leads to poor generalization performance, and combining multiple models and learning algorithms reduces the risk of falling into a bad local minimum. In addition, the true hypothesis of some learning tasks may lie outside the hypothesis space of the current learning algorithm, in which case a single model is bound to be invalid. Combining multiple models expands the corresponding hypothesis space, so that a better approximation may be learned. Therefore, ensemble learning is often adopted to improve the overall accuracy of regression and classification methods [46]. We used ensemble vote to integrate the results of the multi-MCDM model evaluation and obtain a more accurate result. The ensemble includes $T$ models $\{h_1, h_2, \ldots, h_T\}$, where the output of model $h_i$ on a sample $x$ from the training set is $h_i(x)$. Voting takes place in the testing stage, when the independent predictions are combined to predict an unseen instance. Some popular voting methods include equal voting, weighted voting, and naive Bayesian voting. This paper employs the weighted voting method, in which the weights of the individual models differ, as shown in Equation (1).
$$H(x) = \sum_{i=1}^{T} w_i h_i(x) \qquad (1)$$
where $h_i(x)$ is the result of a single trained model, $w_i$ is the weight of that model's result, $H(x)$ is the ensemble value of the models' results, and $T$ is the number of models.
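As a minimal sketch of weighted voting, assuming the ensemble value is the weighted sum of the individual model results, the combination step could look as follows. The function name `weighted_vote`, the normalization of the weights, and the example scores and weights are illustrative assumptions and are not taken from the study.

```python
def weighted_vote(scores, weights):
    """Combine per-model results h_i(x) into one ensemble value sum_i w_i * h_i(x)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Assumption: weights are rescaled to sum to 1 so the result stays on the score scale.
    return sum((w / total) * s for w, s in zip(weights, scores))


# Hypothetical scores of one site from four evaluators (TOPSIS, GRA, AHP, TS fuzzy NN).
site_scores = [0.62, 0.58, 0.70, 0.66]
model_weights = [0.3, 0.2, 0.2, 0.3]
print(weighted_vote(site_scores, model_weights))
```

Applying the same function to every site gives one ensemble score per site, from which a single ranking can be derived.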
3.2. Evaluation Based on TS Fuzzy Neural Network System
L. A. Zadeh, an American cybernetics expert at the University of California, pioneered the concept of fuzzy sets in 1965 [47]. Fuzzy theory has captured the characteristics of the ambiguity of human thinking and can thus be used to solve conventional problems.
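TS fuzzy systems have a very strong self-adaptive ability and can automatically update and modify the membership functions of fuzzy subsets. Therefore, the model error is corrected gradually during operation, and the result is comparatively better. A TS fuzzy system is defined by a set of “if-then” rules. For rule $R^i$, the fuzzy reasoning is as follows:
$$R^i:\ \text{if } x_1 \text{ is } A_1^i,\ x_2 \text{ is } A_2^i,\ \ldots,\ x_n \text{ is } A_n^i,\ \text{then } y^i = p_0^i + p_1^i x_1 + \cdots + p_n^i x_n$$
where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, here and hereinafter; $A_j^i$ are the fuzzy sets of the fuzzy system; $p_j^i$ are the parameters of the fuzzy system; and $y^i$ is the output according to fuzzy rule $i$. The input part (i.e., “if”) is fuzzy, whereas the output part (i.e., “then”) is determined.
For an input vector $x = [x_1, x_2, \ldots, x_n]$, the membership degree of each input variable $x_j$ according to the fuzzy rules is:
$$\mu_{A_j^i}(x_j) = \exp\left(-\frac{(x_j - c_j^i)^2}{b_j^i}\right)$$
where $c_j^i$ and $b_j^i$ represent the membership function center and width, respectively; $n$ is the number of input parameters; and $m$ is the number of fuzzy subsets.
The fuzzy membership degrees are combined using the multiplication operator as the fuzzy operator:
$$\omega^i = \mu_{A_1^i}(x_1)\,\mu_{A_2^i}(x_2)\cdots\mu_{A_n^i}(x_n)$$
According to the fuzzy results, the model output value $y_c$ is computed as:
$$y_c = \frac{\sum_{i=1}^{m} \omega^i \left(p_0^i + p_1^i x_1 + \cdots + p_n^i x_n\right)}{\sum_{i=1}^{m} \omega^i}$$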
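The inference steps of a TS fuzzy system can be sketched in a few lines. This is a minimal illustration under stated assumptions: Gaussian membership functions of the form exp(-(x - c)^2 / b), the product as the fuzzy operator, and a firing-strength-weighted average of linear rule outputs. The function name `ts_fuzzy_output` and all numerical values are hypothetical and do not come from the study.

```python
import math


def ts_fuzzy_output(x, centers, widths, coeffs):
    """Forward pass of a TS fuzzy system.

    x            -- list of n input values
    centers[i][j], widths[i][j] -- membership center and (positive) width of rule i, input j
    coeffs[i]    -- [p0, p1, ..., pn], linear consequent of rule i
    """
    firing = []        # product of membership degrees per rule
    rule_outputs = []  # linear "then" part per rule
    for c_i, b_i, p_i in zip(centers, widths, coeffs):
        strength = 1.0
        for x_j, c, b in zip(x, c_i, b_i):
            # Assumed Gaussian membership function.
            strength *= math.exp(-((x_j - c) ** 2) / b)
        firing.append(strength)
        rule_outputs.append(p_i[0] + sum(p * x_j for p, x_j in zip(p_i[1:], x)))
    total = sum(firing)  # may underflow to 0 if x is far from every center
    return sum(w * y for w, y in zip(firing, rule_outputs)) / total


# Hypothetical two-rule system on two indicators (e.g., pH and NH4-N).
x = [7.2, 0.4]
centers = [[7.0, 0.2], [8.5, 1.5]]
widths = [[1.0, 0.5], [1.0, 0.5]]
coeffs = [[1.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
print(ts_fuzzy_output(x, centers, widths, coeffs))
```

Because the input lies much closer to the centers of the first rule, the output stays near that rule's constant consequent.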
However, when applied to fuzzy comprehensive evaluation, the model lacks the ability to learn, because optimizing the model parameters is complex. ANNs, in contrast, have the abilities of self-learning, self-organization, and self-adaptation. Combining the two allows each to contribute its respective advantages.
To obtain the evaluation result, the back-propagation (BP) neural network was combined with the TS fuzzy system for training. Figure 2 displays a neural network training procedure.
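There is also a bias $b_k$ (or threshold $\theta_k$). The above can be expressed mathematically:
$$u_k = \sum_{j=1}^{n} w_{kj} x_j, \qquad y_k = \varphi(u_k - \theta_k)$$
where $x_1, x_2, \ldots, x_n$ are the input signals; $w_{k1}, w_{k2}, \ldots, w_{kn}$ are the weights of neuron $k$; $u_k$ is the linear combination of the inputs; $\theta_k$ is the threshold; $\varphi(\cdot)$ is the activation function; and $y_k$ is the output of neuron $k$.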
The TS Fuzzy Neural Network is divided into the input layer, the fuzzy classification layer, the fuzzy rule calculation layer, and the output layer. The input layer is connected to the input vector $x$, and its number of nodes equals the dimension of the input vector. The fuzzy classification layer applies the membership functions to the input values to obtain the fuzzy membership values. The fuzzy rule calculation layer is based on the fuzzy multiplication formula. The output layer uses the output formula to calculate the output of the fuzzy neural network. The learning algorithm of the fuzzy neural network is as follows.
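- (1)
Error calculation
$$e = \frac{1}{2}\left(y_d - y_c\right)^2$$
where $y_d$ is the expected output of the network; $y_c$ is the actual output of the network; and $e$ is the error between the expected output and the actual output.
- (2)
Coefficient correction
$$p_j^i(k) = p_j^i(k-1) - \alpha \frac{\partial e}{\partial p_j^i}, \qquad \frac{\partial e}{\partial p_j^i} = -\left(y_d - y_c\right) \frac{\omega^i}{\sum_{l=1}^{m} \omega^l}\, x_j$$
where $p_j^i$ is the coefficient of the neural network; $\alpha$ is the network learning rate; $x_j$ is the network input parameter; and $\omega^i$ is the product of the membership degrees of the input parameters.
- (3)
Parameter correction
$$c_j^i(k) = c_j^i(k-1) - \beta \frac{\partial e}{\partial c_j^i}, \qquad b_j^i(k) = b_j^i(k-1) - \beta \frac{\partial e}{\partial b_j^i}$$
where $c_j^i$ and $b_j^i$ represent the membership function center and width, respectively, and $\beta$ is the learning rate for these parameters.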
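The coefficient correction step can be illustrated with a small sketch. It assumes a squared error e = 0.5 * (y_d - y_c)^2, an output that is the firing-strength-weighted average of linear rule outputs, and plain gradient descent. The firing strengths are held fixed here for simplicity, and the function name `update_coefficients`, the learning rate, and the data are hypothetical.

```python
def update_coefficients(x, y_d, firing, coeffs, alpha=0.05):
    """One gradient-descent step on the rule coefficients p_j^i.

    x      -- list of n input values
    y_d    -- expected output
    firing -- firing strength of each rule (product of membership degrees)
    coeffs -- coeffs[i] = [p0, p1, ..., pn] for rule i
    Returns the updated coefficients and the error before the update.
    """
    total = sum(firing)
    inputs = [1.0] + list(x)  # the leading 1.0 pairs with the constant term p0
    y_c = sum(
        w * sum(p * v for p, v in zip(p_i, inputs))
        for w, p_i in zip(firing, coeffs)
    ) / total
    err = y_d - y_c
    # de/dp_j^i = -(y_d - y_c) * (w_i / total) * x_j, so descending adds alpha * err * ...
    new_coeffs = [
        [p + alpha * err * (w / total) * v for p, v in zip(p_i, inputs)]
        for w, p_i in zip(firing, coeffs)
    ]
    return new_coeffs, 0.5 * err ** 2


# Hypothetical training loop on a single sample with two rules.
coeffs = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
firing = [0.6, 0.4]
errors = []
for _ in range(20):
    coeffs, e = update_coefficients([1.0, 2.0], 1.0, firing, coeffs)
    errors.append(e)
print(errors[0], errors[-1])
```

In a full training procedure the membership centers and widths would be updated in the same manner, and the firing strengths would be recomputed after each update.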
3.3. Evaluation Based on TOPSIS
TOPSIS was first proposed by Hwang and Yoon in 1981 and further developed by Chen and Hwang in 1992 [39]. In this paper, the TOPSIS model was used in accordance with the characteristics of the comprehensive evaluation index system of water quality and the purpose of the evaluation. The basic idea of the TOPSIS method is to calculate the distances to the best scheme and the worst scheme in the continuous time series of samples and to use the relative closeness to the ideal solution as the standard of comprehensive evaluation. That is, the TOPSIS method sorts the evaluation objects by their distances to the optimal solution and to the worst solution. An evaluation object is best when it is closest to the optimal solution and as far away as possible from the worst solution; otherwise, it is not optimal. Firstly, we standardized the original data sets. Then $Z = (z_{ij})_{m \times n}$ was employed as the normalized dimensionless data matrix, where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.
Secondly, we determined the positive ideal and negative ideal solutions. The best and worst values are respectively represented by $Z^+$ and $Z^-$. Then:
$$Z^+ = \left(z_1^+, z_2^+, \ldots, z_n^+\right),\ z_j^+ = \max_i z_{ij}; \qquad Z^- = \left(z_1^-, z_2^-, \ldots, z_n^-\right),\ z_j^- = \min_i z_{ij}$$
Thirdly, we determined the relative closeness to the ideal solution. The relative closeness $C_i$ to the ideal solution can be expressed as follows:
$$C_i = \frac{D_i^-}{D_i^+ + D_i^-}$$
where $D_i^+$ and $D_i^-$ are the separation of each alternative from the positive ideal solution and the negative ideal solution, respectively. Each satisfies $0 \le C_i \le 1$. The larger the $C_i$, the closer the sample is to the optimal sample point.
Fourth, a fused neural network and entropy method was used in this paper to determine the weights of the indicators. The main idea is summarized in Reference [24].
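The TOPSIS steps can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: it uses vector normalization, Euclidean separation, externally supplied weights, and a per-indicator flag that marks whether larger values are better. The function name `topsis` and the example data are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the relative closeness C_i of each alternative (row of matrix).

    matrix  -- rows are alternatives (e.g., sites), columns are indicators
    weights -- one weight per indicator
    benefit -- True if larger is better for that indicator, False if smaller is better
    Assumes no indicator column is all zeros and the alternatives are not all identical.
    """
    n = len(weights)
    # Vector normalization followed by weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    columns = list(zip(*v))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    closeness = []
    for row in v:
        d_pos = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, best)))
        d_neg = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, worst)))
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness


# Hypothetical data: three sites, dissolved oxygen (larger is better) and NH4-N (smaller is better).
sites = [[7.5, 0.3], [6.0, 0.9], [4.2, 1.8]]
print(topsis(sites, weights=[0.5, 0.5], benefit=[True, False]))
```

The site that is best on both indicators receives a closeness of 1 and the site that is worst on both receives 0, which matches the interpretation that a larger closeness means a sample nearer to the optimal point.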
3.4. Evaluation Based on Grey Relational Analysis
The Grey system theory has been widely applied in various fields [25], and has been proven to be useful in dealing with poor, incomplete, and uncertain information. We used GRA to evaluate water quality. Firstly, we determined the evaluation matrix, then determined each indicator’s weights, as shown in Section 3.3. We then calculated the Grey relational coefficient:
$$\xi_i(j) = \frac{\min_i \min_j \left|x_0(j) - x_i(j)\right| + \rho \max_i \max_j \left|x_0(j) - x_i(j)\right|}{\left|x_0(j) - x_i(j)\right| + \rho \max_i \max_j \left|x_0(j) - x_i(j)\right|}$$
where $\xi_i(j)$ is the Grey relational coefficient between $x_0(j)$ and $x_i(j)$; $\rho$ is the distinguishing coefficient, $\rho \in [0, 1]$. Finally, the weighted Grey relational grade was calculated,
$$r_i = \sum_{j=1}^{n} w_j\, \xi_i(j)$$
where $w_j$ is the weight of attribute $j$, while $r_i$ is the Grey relational grade between $x_0$ and $x_i$, which represents the level of correlation between the reference sequence and the comparability sequence.
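A short sketch of the Grey relational computation is given below. It assumes the commonly used form of the Grey relational coefficient, with the global minimum and maximum absolute differences and a distinguishing coefficient of 0.5, followed by a weighted sum over the attributes. The function name `grey_relational_grades` and the example sequences are hypothetical.

```python
def grey_relational_grades(reference, sequences, weights, rho=0.5):
    """Weighted Grey relational grade of each comparability sequence.

    reference -- reference sequence x_0 (one value per attribute)
    sequences -- comparability sequences x_i, already normalized
    weights   -- one weight per attribute
    rho       -- distinguishing coefficient (0.5 is a common choice, assumed here)
    """
    deltas = [[abs(r - v) for r, v in zip(reference, seq)] for seq in sequences]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every sequence equals the reference, so every coefficient is 1.
        return [sum(weights)] * len(sequences)
    grades = []
    for row in deltas:
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * xi for w, xi in zip(weights, coefficients)))
    return grades


# Hypothetical normalized indicator values for three sites against an ideal reference.
reference = [1.0, 1.0, 1.0]
sequences = [[0.9, 0.8, 1.0], [0.5, 0.6, 0.4], [0.2, 0.1, 0.3]]
print(grey_relational_grades(reference, sequences, weights=[0.4, 0.3, 0.3]))
```

A higher grade indicates that a site's indicator profile is closer to the reference sequence.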
3.5. Evaluation Based on AHP
AHP was put forth for operations research by Professor Saaty at the University of Pittsburgh in the 1970s [48]. The method builds on an in-depth analysis of the nature of a complicated decision-making problem, its influencing factors, and their internal relationships. It uses relatively little quantitative information to formalize the decision-making process mathematically, providing a simple decision-making method for complex problems that are multi-objective, multi-criteria, or unstructured. It is therefore well suited to complex systems to which a fully quantitative decision-making model or method is difficult to apply.
AHP can be succinctly summarized for water quality modeling as follows: structuring the decision hierarchy of interrelated decision elements, collecting input data by pairwise comparisons of decision elements, evaluating the consistency of managerial judgements, and applying the eigenvector method to compute relative weights.
The details of the process are reported in the literature [49,50]; the main difference in this work lies in how the judgment matrix is established. We determined the matrix in accordance with the theory of scale relation and the relationships between indicators, as shown in Table 1. Using this scale, we also aimed to overcome the difficulty and bias of relying on expert scoring alone.
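The eigenvector step and the consistency check of AHP can be sketched as follows. This is a minimal illustration, assuming a reciprocal pairwise comparison matrix, power iteration for the principal eigenvector, and Saaty's commonly tabulated random index values. The function name `ahp_weights` and the example matrix are hypothetical and are not the matrix of Table 1.

```python
# Commonly tabulated random index (RI) values for matrix orders 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a pairwise comparison matrix of order 1 to 9."""
    n = len(matrix)
    w = [1.0 / n] * n
    # Power iteration converges to the principal eigenvector of a positive matrix.
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical pairwise comparisons of three indicators on the 1-9 scale.
judgments = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights, cr = ahp_weights(judgments)
print(weights, cr)
```

A consistency ratio below 0.1 is the usual rule of thumb for accepting the pairwise judgments; otherwise the comparisons are revised.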
5. Conclusions and Future Work
Surface water quality has a major impact on the sustainable development of cities, especially in developing countries. How to effectively evaluate the impacts of surface water quality on social and economic development is an important issue for sustainable urban development. Studies have shown that integrating ensemble vote with MCDM is an effective means of improving traditional MCDM techniques. This study illustrates the usefulness of the proposed multi-MCDM approach in combination with ensemble vote as a tool for the evaluation of surface water quality. By comparing the results of four comprehensive evaluation methods, we found that different methods evaluating the same sample can result in different evaluation scores. However, a combined approach requires consistent ranking results across the various evaluation methods, and ensemble vote can solve this problem. According to the characteristics of the comprehensive evaluation index system of water quality and the purpose of the evaluation, this paper used TOPSIS, GRA, AHP, and the TS Fuzzy Neural Network to evaluate the surface water quality of the study samples.
Although the results of this study illustrate the effectiveness of multi-MCDM methods in extracting characteristics from surface water quality datasets, there are some limitations to be noted. The main limitation of this study is that the pollution sources are not identified, which prevents a full understanding of the temporal and spatial variations of surface water quality. Further studies on surface water quality parameters, including TEMP, pH, and NH4–N, should be carried out [55]. These parameters can be monitored and controlled more closely to improve accuracy. In addition, to obtain the contribution of different pollutants in each area, it is necessary to quantitatively evaluate pollution sources.
The purpose of this work is to explore water pollution evaluation, with the additional aim of improving water quality in the future. In light of the surface water quality evaluation, we hope that the government can put forward countermeasures for the management of water pollution in the future. Furthermore, based on the compatibility found between the compliance assessments and the practical surface water quality evaluation, a compatible grading evaluation and management scheme has been developed for better private and public decision-making.