Article

Comparative Study of Hydrochemical Classification Based on Different Hierarchical Cluster Analysis Methods

1
School of Environmental Studies, China University of Geosciences, No. 68 Jincheng Street, Wuhan 430078, China
2
Technology Innovation Center of Geo-Environmental Restoration, Ministry of Natural Resources, No. 388 Lumo Road, Wuhan 430074, China
3
Institute of Geological Survey, China University of Geosciences, No. 388 Lumo Road, Wuhan 430074, China
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2020, 17(24), 9515; https://doi.org/10.3390/ijerph17249515
Submission received: 3 November 2020 / Revised: 14 December 2020 / Accepted: 15 December 2020 / Published: 18 December 2020
(This article belongs to the Special Issue Environmental Chemistry and Technology)

Abstract

Traditional methods for hydrochemical analysis are effective but limited in variety, and they apply only to a restricted range of objects and conditions. Because their accuracy and reliability are limited, they are often used together with other methods to solve practical problems. Cluster analysis is a multivariate statistical technique that extracts useful information from complex data. It provides new ideas and approaches for hydrogeochemical analysis, especially for groundwater hydrochemical classification. Hierarchical cluster analysis is the most widely used form of cluster analysis. This study compared the advantages and disadvantages of six hierarchical cluster analysis methods and analyzed their objects, conditions, and scope of application. The six methods are single linkage, complete linkage, median linkage, centroid linkage, average linkage (including between-groups linkage and within-groups linkage), and Ward’s minimum-variance. The results showed that single linkage and complete linkage are unsuitable for complex practical conditions. Median and centroid linkages are prone to reversals in dendrograms. Average linkage is generally suitable for classification tasks with many samples and large data sets, whereas Ward’s minimum-variance achieved better results for fewer samples and variables.

1. Introduction

Traditional methods for graphical analysis of hydrochemical data include Piper (trilinear) diagrams, scatter plots, quadrilateral diagrams, rhombus diagrams, triangle diagrams, Schuka Lev classification, Broski classification, Kurllov’s (KypmoBa) formula, etc. [1,2,3,4,5]. Studies that rely on only one of these methods may produce limited and biased results. For example, the classification of water samples using Piper diagrams tends to be vague and ineffective because it plots only a few major anions and cations [6,7]. The Schuka Lev classification has clear indices (for chemical constituents in groundwater) but relies on a subjective, predetermined threshold in milliequivalents (mEq) for ions. Therefore, this method obscures the fuzziness in water quality to some extent, and the variation of water quality is not detailed enough in the classification results [8,9,10,11].
In recent years, cluster analysis (CA) and other multivariate statistical methods have been increasingly used in the classification of foundations. They can effectively extract useful information from complex datasets, and provide a reasonable and efficient approach to the study of chemical characteristics of groundwater [12,13]. The main factors affecting the hydrochemical field can be effectively identified using information regarding major ionic and nonionic components of groundwater that are extracted through multivariate statistical methods, which may further facilitate the understanding of the formation mechanism in the hydrochemical field [7,14,15,16,17,18,19]. Furthermore, clustering methods provide comprehensive analysis of the hydrochemical properties and improve the rationality in hydrochemical analysis by showing the sources of recharge, hydraulic relations, transport laws of groundwater, and the interaction characteristics between groundwater and its surrounding environment to a certain extent [20,21,22].
Moreover, CA covers many topics and is flexible. There are many theories and techniques related to CA, which may be applied to various objects and conditions. If the selected technique is unsuitable for a task, characterizing the nature and internal laws of the data will be difficult, and the results may deviate from reality and from the original intention of the research. Therefore, the core issues that urgently need to be addressed are: (a) selecting one or several clustering methods for analysis under specific conditions; (b) comparing the advantages and disadvantages of the various methods; (c) approximating the actual composition of the objects and reflecting the objective laws of the data; and (d) achieving the optimal process and results through CA.
Therefore, in this study we performed a CA on 19 groups of leakage water samples collected from the Bayi Tunnel in Chongqing (municipality directly under the Central Government) to investigate the internal relationship between the sample data using six hierarchical cluster analysis (HCA) methods, i.e., single linkage, complete linkage, median linkage, centroid linkage, average linkage (including between-groups and within-groups linkage), and Ward’s minimum-variance. In addition, this study compared the advantages and disadvantages of the aforementioned methods and analyzed their objects, conditions, and scope of application.

2. Materials and Methods

2.1. General Setting of the Study Area

The Bayi Tunnel is located between the Lianglukou Subdistrict and the Shangqingsi Subdistrict of Yuzhong District in Chongqing, Southwestern China. The entrance of the Bayi Tunnel is located in Jianxinpo, and the exit is at the southeast of the Chongqing Municipal Facilities Administration Bureau. This tunnel passes beneath the Chongqing Emergency Medical Center (CEMC), Chongqing Sports Bureau, and Lines 1 and 3 (Jianxinpo Tunnel) of the Chongqing Rail Transit. The tunnel was constructed in 1984 and is surrounded by roads in all directions. Traffic in the surrounding areas is convenient, with dense flows of people and vehicles. It is an important tunnel in the Chongqing traffic hub. However, the tunnel suffers from water leakage and other issues, partly because of its long service life and partly because of intense human activities and complex natural conditions in the surrounding areas.
The soil in the study area is mainly composed of Quaternary gray brown clay and gray purple silty sand, mixed with gravel, with good hydraulic conductivity. The outcropping strata are fluvial and lacustrine sedimentary rocks, mainly composed of Jurassic fine sand and silty mudstone. The weathering fracture depth is generally 0.2–1.5 m. The groundwater is mainly distributed in the pores of Quaternary loose layer and weathered fissures of bedrock, which is mainly recharged by precipitation.

2.2. Sample Collections

After a rainfall event, a total of 19 water sample sets were collected: One sample set of underground sewer water (USW) from CEMC above the Bayi Tunnel; one set of precipitation (rain) samples from the atmosphere near the tunnel periphery; one sample set of the bedrock fissure water (BFW) and a set of pumping pipeline water (PPW) from superjacent Jianxinpo Tunnel; fifteen leakage water sample sets were collected from the Bayi Tunnel. Three sets of the fifteen were collected from the drain hole in the lining (at 272 m) of the Bayi Tunnel on three consecutive days. Twelve sets were collected on four consecutive days from three leakage points of the tunnel lining, at 327.5, 347, and 355 m, respectively.
Polyethylene bottles with 50-mL capacity were used as sample containers. The bottles were cleaned with distilled water before sampling and then rinsed 2 to 3 times with the water sample to be taken. Each sample set comprised two portions: A sample for cation analysis, to which dilute nitric acid (HNO3) was added until its pH was less than 2; and the other sample for anion analysis, which was unprocessed. The sampling process was in line with the relevant specifications and requirements in the Guidance of Collection and Preservation of Groundwater Sample for Quality Control (DZ/T 0064.2-93).

2.3. Chemical Analyses

HCO3− was measured in the field using a simple titration device with an analysis precision of 0.03 mmol/L (1.83 mg/L). The pH, temperature, and electrical conductivity (EC) measurements were conducted in-field using a Hanna HI8733 portable conductivity meter and Hanna HI8242 portable pH/mV meter, with analysis precisions of 0.01 (pH), 0.1 °C (temperature), and 1 µS/cm (EC).
Water samples were sent to the State Key Laboratory of Biogeology and Environmental Geology in China University of Geosciences (Wuhan) for cation and anion analyses in one week after the rainfall event. Cations were measured using inductively coupled plasma optical emission spectrometry (ICP-OES, IRIS Intrepid II XSP, Thermo Fisher Scientific, Waltham, MA USA) with a precision of 1 × 10−3 mg/L, and anion analysis was performed using an ion chromatograph (IC, DX-120, Dionex, Sunnyvale, CA USA) with a precision of 0.01 mg/L (Table 1).

2.4. Data Quality Assurance

National reference materials (NRM) of China, GSBZ 50017-90 (202158 pH = 4.12, 202164 pH = 7.35, 202160 pH = 9.04), GBW(E) 130285 (EC = 12.88 mS/cm), GBW(E) 130415 (EC = 1000 µS/cm), and GBW(E) 130416 (EC = 100 µS/cm) were used to calibrate the Hanna HI8242 and HI8733 meters. GSBZ 50017-90, GSB 04-1720-2004, GSB 04-1733-2004, GSB 04-1735-2004 (a), GSB 04-1738-2004, GSB 04-1770-2004, GSB 04-1771-2004, GSB 04-1772-2004, and GSB 04-1773-2004 (a) were used for measuring pH, Ca2+, K+, Mg2+, Na+, Cl−, F−, NO3−, and SO42−, respectively. Six concentration gradients of NRM ranging from 1 to 200 mg/L (1, 5, 10, 50, 100, 200 mg/L) were established as calibration standards for cation measurement, and eight concentration gradients of NRM ranging from 0.1 to 200 mg/L (0.1, 0.5, 2, 5, 10, 50, 100, 200 mg/L) were selected as calibration standards for anion measurement.
WS 02 and WS 08 represent USW from the CEMC and rain from the tunnel periphery, respectively. Owing to the particular nature of these two samples, NO3− was not detected in WS 02 and Mg2+ was not detected in WS 08. Because of the sampling time (before and after the rainfall), neither Ca2+ nor Mg2+ was detected in WS 07 and WS 14. To explore the internal relationship between different water sample types, as well as the temporal pattern of change within the same water sample type, these four water samples with missing value(s) were retained for CA. Because the contents of these variables were below the detection limits, the missing values were replaced with 0 in CA.
The charge-balance error (CBE), calculated for each water sample as the cation–anion difference expressed as a percentage of the summed ions, was within ±5% (Table 1). All analyses yielded analytical errors <5% and external precision of known–unknown analytical standards. To better ensure the quality of the raw data, EC was also compared with total dissolved solids (TDS) [23,24,25]. A simple linear regression of TDS (y) on EC (x) gave y = 0.7117x with R2 = 0.9906. All procedures of sampling, preservation, and transportation to the laboratory were strictly conducted in accordance with standard methods [26].
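The two quality checks described above can be sketched as follows. This is only an illustration: the article does not spell out its CBE formula, so the common definition 100 × (Σcations − Σanions) / (Σcations + Σanions) in meq/L is assumed, the function names are hypothetical, and all numbers are made-up values rather than data from Table 1.

```python
import numpy as np

def charge_balance_error(cations_meq, anions_meq):
    """Percent charge-balance error from ion concentrations in meq/L (assumed definition)."""
    c, a = float(np.sum(cations_meq)), float(np.sum(anions_meq))
    return 100.0 * (c - a) / (c + a)

def slope_through_origin(x, y):
    """Least-squares slope b of the no-intercept model y = b * x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.dot(x, y) / np.dot(x, x))

# Hypothetical sample (meq/L): Ca, Mg, Na, K versus HCO3, SO4, Cl, NO3
cbe = charge_balance_error([4.0, 1.0, 0.8, 0.1], [3.5, 1.6, 0.6, 0.1])

# Hypothetical EC (µS/cm) and TDS (mg/L) pairs
b = slope_through_origin([200, 400, 800], [140, 290, 570])
```

A sample would pass the ±5% criterion when `abs(cbe) < 5`; the slope `b` plays the role of the 0.7117 coefficient reported in the text.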

2.5. Cluster Analysis (CA)

2.5.1. Concept

CA is a multivariate statistical method that gradually classifies samples based on their similarity. It regards the samples as points in a multidimensional space, and the similarity between points is indicated using statistics [13,27]. Objects with a high degree of similarity are classified into a small cluster, while those with a low degree of similarity are classified into a large cluster. This classification continues until all data objects are classified. In CA, a data set is divided into several clusters, and the objects in the same cluster have a higher degree of similarity to one another than to those in other clusters [12,28,29]. CA is seen as a typical combinatorial optimization problem, which is described by the following mathematical model.
In a given set of pattern samples {X}, there are n samples and k classes of patterns {S_j, j = 1, 2, …, k}. Each sample contains m variables. The set X can be expressed by a matrix as:
$$X = (x_1, x_2, \ldots, x_n) = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
Each column of X is a sample, where $x_{1i}, x_{2i}, \ldots, x_{mi}$ denote the first, second, …, m-th variable of the i-th sample. To classify samples, the distance between each sample and its cluster center is taken as the similarity or distance metric, and the objective function minimizes the total of these distances:
$$T = \min \sum_{j=1}^{k} \sum_{X \in S_j} \lVert X - m_j \rVert, \qquad m_j = \frac{1}{\sum_{i=1}^{n} y_{ij}} \sum_{i=1}^{n} y_{ij} X_i$$
where k is the number of clusters; $m_j$ denotes the mean vector of the j-th cluster ($S_j$); and $\sum_{j=1}^{k} y_{ij} = 1$, implying that sample i is assigned to exactly one cluster center. The classification rule is that if sample i is assigned to the j-th cluster center, then $y_{ij} = 1$; otherwise, $y_{ij} = 0$.
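As a rough illustration of this objective, the sketch below evaluates the total sample-to-center distance for a given assignment. The function name `clustering_objective`, the label vector (a compact stand-in for the indicator y_ij), and the toy data are assumptions made for the example; samples are stored as columns, as in the matrix X above.

```python
import numpy as np

def clustering_objective(X, labels):
    """Sum of distances from each sample (column of X) to its cluster mean."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = X[:, labels == j]                 # m x n_j block for cluster j
        m_j = members.mean(axis=1, keepdims=True)   # mean vector of cluster j
        total += np.linalg.norm(members - m_j, axis=0).sum()
    return float(total)

# Four hypothetical samples with two variables each
X = np.array([[0.0, 2.0, 10.0, 12.0],
              [0.0, 0.0, 0.0, 0.0]])
good = clustering_objective(X, [0, 0, 1, 1])   # nearby samples grouped together
bad = clustering_objective(X, [0, 1, 0, 1])    # distant samples grouped together
```

The assignment that groups nearby samples gives the smaller objective value, which is the behavior the minimization is meant to reward.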

2.5.2. Hierarchical Cluster Analysis

Existing clustering algorithms mainly include hierarchical clustering, partitioning, density-based clustering, grid-based clustering, model-based clustering, and fuzzy clustering. In particular, hierarchical clustering consists of hierarchical decomposition of a given set of data objects. Each object is initially regarded as an individual cluster. Then, objects with the shortest distance are joined into a new cluster until all are joined together in one large cluster.
Depending on the definition of the nearest (neighbor) distance and the recursion equation for clustering, hierarchical clustering can be subdivided into single linkage, complete linkage, median linkage, centroid linkage, average linkage, and Ward’s minimum-variance [30]. At present, hierarchical clustering is the most widely used clustering method. The related calculation and analysis modules have been integrated into many statistical analysis software packages or systems, such as SPSS, SAS, and S-PLUS, so that the users can directly invoke relevant functions.
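For readers without access to the packages named above, a comparable comparison can be sketched with SciPy. This is not the workflow used in this study: the mapping of SciPy's `method` names to the linkages discussed here is an assumption (`average` is taken to correspond to between-groups linkage, and within-groups linkage appears to have no direct SciPy counterpart), and the random 19 × 9 matrix is only stand-in data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Stand-in data: 19 samples (rows) with 9 standardized variables (hypothetical shape)
rng = np.random.default_rng(0)
X = rng.normal(size=(19, 9))

# Method names as accepted by scipy.cluster.hierarchy.linkage
methods = ["single", "complete", "median", "centroid", "average", "ward"]

# Each linkage matrix has one row per merge: (cluster a, cluster b, height, size)
trees = {m: linkage(X, method=m, metric="euclidean") for m in methods}

# Height of the final merge, where all samples join one cluster
final_heights = {m: float(Z[-1, 2]) for m, Z in trees.items()}
```

Comparing `final_heights` across methods gives a quick sense of how strongly the choice of linkage stretches or compresses the dendrogram axis.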

Single Linkage

In single-linkage clustering, the two closest clusters are joined into a new cluster, and the shortest distance between members (in different clusters) is the distance between the new cluster and another cluster. Two clusters with the shortest distance are joined until one large cluster remains (Figure 1).
Let the distance between xi and xj, i.e., d(xi, xj), be represented as dij. Let Gp and Gq denote two clusters containing np and nq objects, respectively. D(Gp, Gq), or Dpq, represents the distance between clusters Gp and Gq. Let Gr = {Gp, Gq} represent the new cluster formed by joining Gp and Gq.
The distance between clusters Gp and Gq is defined as the distance between their closest members, which is referred to as the shortest distance. It is calculated as:
$$D(G_p, G_q) = \min\{d_{ij} \mid i \in G_p,\ j \in G_q,\ p \neq q\}$$
After Gp and Gq are joined into a new cluster Gr, the distance between Gr and another cluster Gk (k ≠ p, q) is calculated based on the single-linkage clustering using the formula below:
$$\begin{aligned} D(G_r, G_k) &= \min\{d_{ij} \mid i \in G_r,\ j \in G_k\} \\ &= \min\{\min\{d_{ij} \mid i \in G_p,\ j \in G_k\},\ \min\{d_{ij} \mid i \in G_q,\ j \in G_k\}\} \\ &= \min\{D(G_p, G_k),\ D(G_q, G_k)\} \end{aligned}$$
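The recursion can be traced with a small from-scratch sketch. The helper `single_linkage_heights` and the four collinear points are hypothetical and serve only to show the min-update at work; a production analysis would use a statistical package instead.

```python
import numpy as np

def single_linkage_heights(D):
    """Merge heights obtained by repeatedly joining the two closest clusters and
    updating distances with D(Gr, Gk) = min(D(Gp, Gk), D(Gq, Gk))."""
    D = np.array(D, dtype=float)
    np.fill_diagonal(D, np.inf)            # ignore self-distances
    active = list(range(len(D)))           # indices of clusters still alive
    heights = []
    while len(active) > 1:
        sub = D[np.ix_(active, active)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        p, q = active[a], active[b]
        heights.append(float(D[p, q]))
        # Row/column p now stands for the merged cluster Gr
        D[p, :] = np.minimum(D[p, :], D[q, :])
        D[:, p] = D[p, :]
        D[p, p] = np.inf
        active.remove(q)
    return heights

# Four hypothetical points on a line at 0, 1, 3, and 7
pts = np.array([0.0, 1.0, 3.0, 7.0])
D = np.abs(pts[:, None] - pts[None, :])
heights = single_linkage_heights(D)
```

Each merge height equals the gap to the nearest neighbouring point, which is why single linkage tends to absorb samples one at a time into a growing chain.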

Complete Linkage

This method joins the two closest clusters into a new cluster and takes the longest distance between members of different clusters as the distance between the new cluster and another cluster. At each step, the two clusters whose farthest-apart members are closest are joined, until all members are in the same cluster (Figure 2).
The distance between clusters Gp and Gq is defined as the distance between their farthest-apart members, which is referred to as the longest distance. It is calculated as:
$$D(G_p, G_q) = \max\{d_{ij} \mid i \in G_p,\ j \in G_q,\ p \neq q\}$$
After Gp and Gq are joined into a new cluster Gr, the distance between Gr and another cluster Gk (k ≠ p, q) is calculated using the complete-linkage clustering through the following formula:
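$$\begin{aligned} D(G_r, G_k) &= \max\{d_{ij} \mid i \in G_r,\ j \in G_k\} \\ &= \max\{\max\{d_{ij} \mid i \in G_p,\ j \in G_k\},\ \max\{d_{ij} \mid i \in G_q,\ j \in G_k\}\} \\ &= \max\{D(G_p, G_k),\ D(G_q, G_k)\} \end{aligned}$$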
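A quick numerical check, assuming random two-dimensional points and a hypothetical helper `complete_distance`, suggests that the recursive form agrees with the direct definition over all member pairs:

```python
import numpy as np

def complete_distance(D, A, B):
    """Longest pairwise distance between members of clusters A and B."""
    return float(D[np.ix_(A, B)].max())

# Six hypothetical samples and their Euclidean distance matrix
rng = np.random.default_rng(1)
pts = rng.normal(size=(6, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

Gp, Gq, Gk = [0, 1], [2], [3, 4, 5]

# Direct definition on the merged cluster Gr = Gp ∪ Gq
direct = complete_distance(D, Gp + Gq, Gk)

# Recursive update from the two pre-merge distances
recursive = max(complete_distance(D, Gp, Gk), complete_distance(D, Gq, Gk))
```

Because the update only needs the two previous cluster distances, the full member-level distance matrix does not have to be rescanned after each merge.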

Median Linkage

The shortest and longest distances in single and complete linkages represent two extremes in distance measurement. In contrast, median linkage calculates the distance between clusters in a way that falls between single and complete linkage (Figure 3).
After Gp and Gq join into a new cluster Gr, the distance between Gr and another cluster Gk (k ≠ p, q) is calculated based on median linkage using the equation below:
$$D^2(G_r, G_k) = \frac{1}{2}\left(D_{pk}^2 + D_{qk}^2\right) + \beta D_{pq}^2, \qquad -\frac{1}{4} \le \beta \le 0$$
where β is often set to $\beta = -\frac{1}{4}$. In that case, Drk is the median drawn to the side Dpq of the triangle formed by Dpk, Dqk, and Dpq.
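The geometric reading of the β = −1/4 case can be checked numerically. In this sketch the three points P, Q, and K are hypothetical single-member clusters, and `median_update` is an illustrative name for the recursion above.

```python
import numpy as np

def median_update(d_pk, d_qk, d_pq, beta=-0.25):
    """Squared distance D^2(Gr, Gk) from the median-linkage recursion."""
    return 0.5 * (d_pk**2 + d_qk**2) + beta * d_pq**2

# Hypothetical triangle: clusters p and q merge, k is the third cluster
P, Q, K = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([1.0, 3.0])
midpoint = (P + Q) / 2                     # position representing the merged cluster

d2_formula = median_update(np.linalg.norm(P - K),
                           np.linalg.norm(Q - K),
                           np.linalg.norm(P - Q))
d2_geometric = float(np.sum((K - midpoint) ** 2))   # squared length of the triangle median
```

Both values come out as 10 for this triangle, matching the statement that the updated distance is the length of the median to the side joining p and q.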

Centroid Linkage

From a physical perspective, representing a cluster with its centroid is more reasonable. In centroid linkage, the distance between the centroids of two clusters is used to measure the distance between clusters. The distance between clusters is defined as the distance between their centroids. In object classification, the centroid for a cluster is considered to be the mean value of objects in that cluster (Figure 4).
After Gp and Gq are joined into a new cluster Gr, they contain np, nq, and nr (nr = np + nq) objects, respectively. Their centroids are denoted as $\bar{X}^{(p)}$, $\bar{X}^{(q)}$, and $\bar{X}^{(r)}$, respectively. We obtain:
$$\bar{X}^{(r)} = \frac{1}{n_r}\left(n_p \bar{X}^{(p)} + n_q \bar{X}^{(q)}\right)$$
The distance between Gr and another cluster Gk (k ≠ p, q) is:
$$D^2(G_r, G_k) = \left(\bar{X}^{(k)} - \bar{X}^{(r)}\right)^{T}\left(\bar{X}^{(k)} - \bar{X}^{(r)}\right) = \frac{n_p}{n_r} D_{pk}^2 + \frac{n_q}{n_r} D_{qk}^2 - \frac{n_p n_q}{n_r^2} D_{pq}^2$$
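The equality between the direct centroid distance and the recursive form can be verified numerically. The sketch below uses random hypothetical clusters; `centroid_update` and `sq` are illustrative helper names.

```python
import numpy as np

def centroid_update(d2_pk, d2_qk, d2_pq, n_p, n_q):
    """Squared centroid distance D^2(Gr, Gk) from the recursion above."""
    n_r = n_p + n_q
    return (n_p / n_r) * d2_pk + (n_q / n_r) * d2_qk - (n_p * n_q / n_r**2) * d2_pq

def sq(u, v):
    """Squared Euclidean distance between two vectors."""
    return float(np.sum((u - v) ** 2))

# Three hypothetical clusters with 3, 5, and 2 members in four variables
rng = np.random.default_rng(2)
Gp, Gq, Gk = rng.normal(size=(3, 4)), rng.normal(size=(5, 4)), rng.normal(size=(2, 4))
cp, cq, ck = Gp.mean(axis=0), Gq.mean(axis=0), Gk.mean(axis=0)

# Centroid of the merged cluster as the size-weighted mean of the two centroids
cr = (len(Gp) * cp + len(Gq) * cq) / (len(Gp) + len(Gq))

direct = sq(ck, cr)
recursive = centroid_update(sq(cp, ck), sq(cq, ck), sq(cp, cq), len(Gp), len(Gq))
```

The negative third term is what allows a merged centroid to sit closer to another cluster than either parent did, which is the origin of the reversals discussed later.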

Average Linkage

Average linkage considers the average distance between members in two clusters, which can be further subdivided into two types: Between-groups linkage and within-groups linkage. When calculating the distance between clusters, between-groups linkage considers the average distance between members in different clusters, while within-groups linkage considers the distance between all members.
The distance between Gp and Gq is defined as the average distance between their member pairs, which is referred to as the average distance between clusters. It is calculated as:
$$D^2(G_p, G_q) = \frac{1}{n_p n_q} \sum_{i \in G_p} \sum_{j \in G_q} d_{ij}^2$$
The distance between the new cluster Gr and another cluster Gk (k ≠ p, q) is calculated as:
$$D^2(G_r, G_k) = \frac{n_p}{n_r} D_{pk}^2 + \frac{n_q}{n_r} D_{qk}^2$$
where nr = np + nq.
a. Between-groups linkage
This method defines the distance between two clusters as the average distance between their member pairs, and the two members are from different clusters. At each step, two clusters with the shortest average distance are merged until all members are joined into a large cluster (Figure 5). In other words, the average distance between each member pairs of two clusters is the shortest after they merge into a new cluster using between-groups linkage.
b. Within-groups linkage
This method defines the distance between two clusters as the average distance between any two members of the clusters, including the distance between any two members, irrespective of the cluster. At each step, two clusters with the shortest average distance are merged until all members are joined into a large cluster (Figure 6). This means that after two clusters merge into a new cluster, the average distance between their members in the new cluster is minimized.
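The difference between the two variants can be made concrete with a short sketch. It follows the squared-distance form of the equations above; the helper names, the random data, and the reading of within-groups linkage as the mean over all pairs in the merged cluster are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def between_groups(D2, A, B):
    """Mean squared distance over pairs with one member in A and one in B."""
    return float(D2[np.ix_(A, B)].mean())

def within_groups(D2, A, B):
    """Mean squared distance over all pairs in the union of A and B."""
    union = list(A) + list(B)
    return float(np.mean([D2[i, j] for i, j in combinations(union, 2)]))

# Seven hypothetical samples and their squared Euclidean distances
rng = np.random.default_rng(3)
pts = rng.normal(size=(7, 3))
D2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=2)

Gp, Gq, Gk = [0, 1, 2], [3], [4, 5, 6]
n_p, n_q = len(Gp), len(Gq)

# Between-groups distance of the merged cluster, directly and via the recursion
direct = between_groups(D2, Gp + Gq, Gk)
recursive = (n_p * between_groups(D2, Gp, Gk) + n_q * between_groups(D2, Gq, Gk)) / (n_p + n_q)

# Within-groups criterion for merging Gp and Gq also counts pairs inside each cluster
within = within_groups(D2, Gp, Gq)
```

The between-groups value ignores pairs inside a cluster, whereas the within-groups value includes them, so the latter also penalizes merging clusters that are internally dispersed.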

Ward’s Minimum-Variance

This method is based on the analysis of variance (ANOVA). For the correct classification, the ANOVA results show small within-groups sum of squares and large between-groups sum of squares.
Assuming that n samples are categorized into k groups, the i-th sample in the cluster Gt is denoted as $X_i^{(t)}$, and nt represents the number of samples in Gt. Let the centroid of the cluster be $\bar{X}^{(t)}$. Then, the sum of squares within Gt is:
$$S_t = \sum_{i=1}^{n_t} \left(X_i^{(t)} - \bar{X}^{(t)}\right)^{T} \left(X_i^{(t)} - \bar{X}^{(t)}\right)$$
The total sum of squares for k groups is:
$$S = \sum_{t=1}^{k} S_t = \sum_{t=1}^{k} \sum_{i=1}^{n_t} \left(X_i^{(t)} - \bar{X}^{(t)}\right)^{T} \left(X_i^{(t)} - \bar{X}^{(t)}\right)$$
In Ward’s minimum-variance method, the n samples are initially considered as separate clusters. Each time two clusters merge, the number of clusters decreases by one and S increases. At each step, the two clusters whose merger produces the smallest increase in S are merged, until all samples are joined into the same cluster.
The distance between Gp and Gq is defined as the sum of squares between the two clusters:
$$D^2(G_p, G_q) = S_r - S_p - S_q$$
where Gr = {Gp, Gq}. The distance between the new cluster Gr and another cluster Gk (k ≠ p, q) is calculated as:
$$D^2(G_r, G_k) = \frac{n_k + n_p}{n_r + n_k} D_{kp}^2 + \frac{n_k + n_q}{n_r + n_k} D_{kq}^2 - \frac{n_k}{n_r + n_k} D_{pq}^2$$
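Both the definition of Ward's distance as an increase in the within-cluster sum of squares and its recursive update can be checked numerically. The helper names `within_ss` and `ward_d2` and the random clusters below are assumptions made for the sketch.

```python
import numpy as np

def within_ss(G):
    """Within-cluster sum of squares S_t for samples stored as rows of G."""
    return float(np.sum((G - G.mean(axis=0)) ** 2))

def ward_d2(A, B):
    """Ward distance: increase in the sum of squares when A and B merge."""
    return within_ss(np.vstack([A, B])) - within_ss(A) - within_ss(B)

# Three hypothetical clusters with 3, 4, and 2 members in two variables
rng = np.random.default_rng(4)
Gp, Gq, Gk = rng.normal(size=(3, 2)), rng.normal(size=(4, 2)), rng.normal(size=(2, 2))
n_p, n_q, n_k = len(Gp), len(Gq), len(Gk)
n_r = n_p + n_q

# Distance between the merged cluster Gr and Gk, computed directly
direct = ward_d2(np.vstack([Gp, Gq]), Gk)

# The same distance from the recursion, using only pre-merge quantities
recursive = ((n_k + n_p) * ward_d2(Gk, Gp)
             + (n_k + n_q) * ward_d2(Gk, Gq)
             - n_k * ward_d2(Gp, Gq)) / (n_r + n_k)
```

The agreement of `direct` and `recursive` shows that the sum-of-squares criterion can be tracked through the merge sequence without recomputing centroids from the raw samples.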

2.5.3. Data Standardization

Because the observed values of the variables differ in order of magnitude and in measurement units, the data must be transformed into dimensionless values to avoid inefficient classification and to improve classification accuracy. In this study, the raw data were standardized using Z-scores, so that each transformed variable had a mean of 0 and a standard deviation of 1 (Table 2):
$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_i}{S_i} \qquad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n)$$
where $\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}$ and $S_i = \sqrt{\frac{1}{n-1} \sum_{j=1}^{n} \left(x_{ij} - \bar{x}_i\right)^2}$ (i = 1, 2, …, m).
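A minimal sketch of this standardization is given below, with variables stored as rows and samples as columns to match the indices i and j above. The function name `zscore_rows` and the two example variables (a pH-like and an EC-like row) are hypothetical.

```python
import numpy as np

def zscore_rows(X):
    """Standardize each variable (row) to mean 0 and sample standard deviation 1."""
    X = np.asarray(X, float)
    mean = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)   # ddof=1 gives the n - 1 denominator
    return (X - mean) / sd

# Two hypothetical variables on very different scales, four samples each
X = np.array([[7.1, 7.4, 8.0, 6.9],
              [310.0, 455.0, 120.0, 290.0]])
Z = zscore_rows(X)
```

A variable that is constant across all samples has a standard deviation of 0 and would need separate handling before this transformation is applied.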

2.5.4. Euclidean Distance

Distance is often used as a quantitative indicator of the degree of similarity between samples. Each sample is regarded as a point in an m-dimensional space. By defining a distance between points in this space, we can assign closer points to the same cluster and more distant points to different clusters. This study uses Euclidean distance (Table 3):
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} \left(x_{ki} - x_{kj}\right)^2}$$
All calculations and classification results in this study were obtained using SPSS (IBM, Armonk, NY, USA).
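Outside SPSS, the Euclidean distance matrix can be reproduced with a few lines of NumPy. The sketch below assumes samples are stored as columns, as in the matrix X of Section 2.5.1; the function name and the three toy samples are hypothetical.

```python
import numpy as np

def euclidean_matrix(X):
    """Pairwise Euclidean distances between samples stored as columns of X."""
    X = np.asarray(X, float)
    diff = X[:, :, None] - X[:, None, :]     # m x n x n array of variable-wise differences
    return np.sqrt(np.sum(diff ** 2, axis=0))

# Three hypothetical samples with two variables each
X = np.array([[0.0, 3.0, 0.0],
              [0.0, 4.0, 1.0]])
D = euclidean_matrix(X)
```

The result is a symmetric n × n matrix with zeros on the diagonal, which is the input that every linkage method in Section 2.5.2 operates on.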

3. Results

3.1. Single Linkage Method

According to Figure 7, if a line (Line A) is drawn at a Euclidean distance of 2.33, six clusters are formed: water leaked from the Bayi Tunnel; running water from the drain hole; BFW from the Jianxinpo Tunnel; PPW from the Jianxinpo Tunnel; rain; and USW from the CEMC. At a distance of 4.76, three clusters are formed, and only one large cluster remains at a distance of 6.871.
If a line (Line B) was drawn at the distance of 2.643, leaked water from the tunnel and the running water from the tunnel drain hole would join into a cluster, indicating a correlation between the two. However, these two types of water samples were distinguished at a distance less than 2.643, showing difference between the running water through the tunnel drainage system and the water in the hydrochemical process during leakage.
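Reading clusters off a dendrogram by drawing a line at a chosen distance can be mimicked in SciPy with `fcluster` and `criterion="distance"`. The sketch below uses five hypothetical one-dimensional samples, not the tunnel data, and the helper `n_clusters_at` is an illustrative name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five hypothetical samples described by a single variable
pts = np.array([[0.0], [0.5], [5.0], [5.4], [20.0]])
Z = linkage(pts, method="single")

def n_clusters_at(Z, height):
    """Number of clusters obtained by cutting the dendrogram at `height`."""
    labels = fcluster(Z, t=height, criterion="distance")
    return int(labels.max())

# Cut the same tree at several heights, as with Lines A, B, ... in a dendrogram
counts = {h: n_clusters_at(Z, h) for h in (0.45, 1.0, 10.0, 20.0)}
```

Raising the cut height can only merge clusters, so the counts decrease from four to one as the line moves up the tree.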

3.2. Complete Linkage Method

According to Figure 8, if a line (Line B) is drawn at the Euclidean distance of 3.691, six clusters are made, four clusters at the distance of 5.551 (Line C), while only one large cluster at the distance of 8.881. At a distance of 5.551, water leaked from the tunnel and the running water from the tunnel drain hole were joined, indicating a certain correlation between water leaked from different parts of the tunnel. At the distance of 2.9 (Line A), water leaked from the tunnel was clearly divided into three types: (a) The running water from the tunnel drain hole at +272 m; (b) water leaked near the point at +327.5 m; and (c) water leaked near the point at +355 m. The gradual changes in hydrochemistry of water samples with different sampling locations were reflected in the clustering process and the dendrogram.

3.3. Median Linkage Method

Single linkage underestimated the distance between clusters, while complete linkage exaggerated it. Median linkage provided an approach that fell between these two linkages. According to Figure 9, if a line (Line A) is drawn at a Euclidean distance of 2.062, six clusters are formed: water leaked from the Bayi Tunnel; the running water from the drain hole in the tunnel; BFW from the Jianxinpo Tunnel; PPW from the Jianxinpo Tunnel; rain; and USW from the CEMC. At a distance of 3.614 (Line B), three clusters were formed: one cluster included the water leaked from the tunnel, the running water from the tunnel drain hole, and BFW and PPW from the Jianxinpo Tunnel; one cluster only included rain; and another cluster only included USW. This result suggests a difference in composition between rain from the atmosphere and USW of the CEMC. In contrast, there was only one large cluster at a distance of 5.567.

3.4. Centroid Linkage Method

From a physical perspective, it is more reasonable to represent a cluster with its centroid. In centroid linkage, the distance between the centroids of two clusters is used to represent the distance between clusters. In object classification, the centroid for a cluster is considered to be the mean of objects in that cluster.
According to Figure 10, if a line (Line A) is drawn at a Euclidean distance of 2.626, five clusters are formed: Water leaked and the running water from the drain hole in Bayi Tunnel; BFW from the Jianxinpo Tunnel; PPW from the Jianxinpo Tunnel; rain; and USW from the CEMC. In median linkage, water leakage from the tunnel and the running water from the drain hole were considered as two different types of water. This differentiation reflects a slight difference between median linkage and centroid linkage, though they were joined at a different distance in centroid linkage.
At a distance of 4.163 (Line B), three clusters were formed, which is consistent with the classification results of median linkage. Specifically, one cluster included water leaked from the tunnel, the running water from the drain hole in the tunnel, and BFW and PPW from Jianxinpo Tunnel. One cluster only included rain, while another cluster only included USW of the CEMC. The above results show the similarities between centroid linkage and median linkage. In contrast, there was only one large cluster at a distance of 5.793.

3.5. Average Linkage Method

3.5.1. Between-Groups Linkage

According to Figure 11, if a line (Line A) is drawn at an average Euclidean distance of 2.916, the 19 samples will be categorized into six clusters: Water leaked from the Bayi Tunnel; the running water from the drain hole in the tunnel; BFW from the Jianxinpo Tunnel; PPW from the Jianxinpo Tunnel; rain; and USW from the CEMC. At a distance of 4.401 (Line C), 4 clusters were formed. One cluster included the water leaked from the tunnel, the running water from the drain hole in the tunnel, and the BFW from the Jianxinpo Tunnel. One cluster included the PPW from the Jianxinpo Tunnel, while another cluster included rain and USW from the CEMC. In contrast, only one large cluster existed at a distance of 7.553.

3.5.2. Within-Groups Linkage

According to the dendrogram in Figure 12, 19 groups of samples were classified into three clusters at a distance of 3.316 (Line B). One cluster included the water leaked from the tunnel, PPW from Jianxinpo Tunnel, and rain. This classification suggests that the water loss from leakage in the Jianxinpo Tunnel and the Bayi Tunnel may be replenished through rainfall. One cluster included the running water from the drain hole in the Bayi Tunnel and the BFW from the Jianxinpo Tunnel. This indicates a connection between the two and a certain hydraulic relation in rock mass between the two tunnels. Another cluster only included the USW from the CEMC. It showed poor connection with other types of water samples, which were observed in results with other methods. This is because USW is human sewage or wastewater with complex composition, which is completely different from the composition of water samples that are naturally produced.

3.6. Ward’s Minimum-Variance Method

According to the dendrogram in Figure 13, if a line (Line B) is drawn at the sum of squares of 27.467, the 19 groups of water samples will be classified into two large clusters: A cluster with only the water leaked from Bayi Tunnel, and the other cluster with other water samples. The 19 groups of water samples could be further classified into six clusters at the sum of squares of 10.837 (Line A): Water leaked near the point at +327.5 m; water leaked near the point at +355 m; the running water from the drain hole; BFW and PPW from the Jianxinpo Tunnel; rain and USW from the CEMC.

4. Discussion

4.1. Single Linkage Method

In Figure 7, the leaked water from the tunnel only joins BFW from the Jianxinpo Tunnel and rain at distances of 4.76 (Line C) and 5.357 (Line D), respectively. This indicates the absence of a close direct correlation and the significant effects of delayed or lagged rainfall. The water leaked from the tunnel finally joined USW at the late stage of clustering, showing composition differences between water samples. It is inferred that the pipeline was unlikely to be the source of water leak.
The single linkage method is simple and easy to use, which reflects the basic idea of hierarchical clustering in the most intuitive way. The obtained clustering results were consistent with the water samples determined at the initial sample collection stage. This finding suggests that without external influence and interference, single-linkage clustering showed great performance in data classification and characterization, and could be used to produce relatively clear and accurate clustering results.
However, owing to the inherent limitations of the method, the closest distance was selected at each step. Sometimes, over a long run of clustering steps, these shortest distances were very close to one another. This may result in little differentiation between clustering steps (see the joint marked by “I” in Figure 7), which may further interfere with the clustering process and classification mapping.
Moreover, the dendrogram of data through this method is in a ladder-like shape and shows an extended-chain structure, implying that links are inevitable. Therefore, the internal connections among samples may be obscured to some extent. This is because the distance between clusters was the shortest. After the two clusters were joined into a new cluster, the distance between the new cluster and any other clusters was shortened, so it was easier to form a large cluster, and most samples were joined in the same cluster. In addition, existing literature shows that single linkage is significantly affected by outliers [31], which limits its application in processing complex data.

4.2. Complete Linkage Method

BFW and PPW from the Jianxinpo Tunnel, USW of the emergency center, and rain appeared to have greater distance from the water leaked from the tunnel, suggesting a gradual weakening of the relationship. A relatively strong relationship between the water from the tunnel drainage system and water leaked in the tunnel could be inferred. However, their chemical composition was still slightly different because of different paths and seepage time.
In the complete linkage method, the distance between clusters was defined as the longest distance between the clusters, which made adjustments and improvement on the basis of single linkage. It avoided the inevitable generation of links in single linkage. After the two clusters merged, their distance to other clusters was considered to be the distance from one of the two clusters that had the largest distance. This method increased the distance between the merged cluster and other clusters, and avoided the inevitable generation of links and a ladder-like pattern. Compared to single linkage, the horizontal axis of the dendrogram was extended and covered a larger range in the complete linkage, which produced a more refined clustering result. Objects were further classified into small clusters, and could be used to better characterize the data. Despite its advantages, relevant literature shows that this method may result in many clusters and data distorted by outliers, when dealing with data having large dispersions [32].

4.3. Median Linkage Method

The sample order was the same in dendrograms of median linkage and single linkage. Furthermore, results showed the integrity of water leaks in the tunnel and a connection between the running water from the drain hole and BFW. This information was unclear in the previous results, indicating that this method is better in portraying certain details.
Nevertheless, anomalies were detected during clustering. In steps 9, 11, and 16 of the dendrogram (Figure 9), the distance for merging was less than the distance in the previous step. Reversals (labeled “I, II, and III”) were observed, which resulted in crossing lines and closed links. Given the non-monotonicity of median linkage, the clustering results were often unsatisfactory, and it was difficult to track links using the dendrogram [33]. Therefore, this method is rarely used.
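A reversal can be detected mechanically by comparing each merging distance with the one before it. The sketch below assumes the merging distances are available as a list ordered by step; the numbers are hypothetical and are not the agglomeration schedule of this study.

```python
def find_reversals(merge_distances):
    """Return 1-based step numbers whose merging distance is smaller
    than the distance of the previous step."""
    return [
        step
        for step in range(2, len(merge_distances) + 1)
        if merge_distances[step - 1] < merge_distances[step - 2]
    ]


# Hypothetical merging distances, one value per clustering step.
heights = [0.2, 0.5, 0.9, 0.8, 1.4, 1.3, 2.0]
reversal_steps = find_reversals(heights)
print(reversal_steps)
```

A monotone method such as single or complete linkage would always return an empty list here.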

4.4. Centroid Linkage Method

In centroid linkage, the sample order in the dendrogram was similar to that of single linkage and median linkage. In addition, its clustering process was similar to that of median linkage, especially for the samples of leakage water in small clusters. The centroid linkage differed from median linkage in the middle stage of clustering. The median linkage strengthened the relationship between the running water from the drain hole and PPW from the Jianxinpo Tunnel, which was stronger than the connection with the water leaked from the tunnel. However, the water leaked from the tunnel and the running water from the tunnel drain hole were considered to be within the same large cluster, so their correlation with BFW from the Jianxinpo Tunnel was poor.
Three anomalies were observed during centroid linkage clustering, where the distance for merging was less than the distance in the previous step. These anomalies occurred in steps 9, 11, and 16, the same steps as in median linkage clustering. Even the first anomalous merging distance (0.786) was the same. These small values inevitably caused partial reversals in the dendrogram. The three abnormal merging distances were 0.786, 1.053, and 4.163, which correspond to the closed links labeled “I, II, and III”, respectively, in the dendrogram (Figure 10).
Centroid linkage requires the Euclidean distance. Each time two clusters were merged, the cluster centroid had to be recalculated. Therefore, this method is less affected by outliers. While clusters were well represented by centroid linkage, reversals were likely to occur in dendrograms because the distance between clusters did not increase monotonically [27,34]. It is difficult to track links in the dendrogram, and the symbols may change frequently. In addition, it may involve complex calculation, which further limits its applications.
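Why recalculating the centroid can produce a reversal is easiest to see with three points. The coordinates below are hypothetical and chosen only to make the geometry obvious; they are not taken from the study data.

```python
import math

# Three hypothetical samples forming a nearly equilateral triangle.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, 0.9)


def centroid(points):
    """Arithmetic mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Step 1: the closest pair is merged (A and B, distance 1.0).
first_merge = min(math.dist(A, B), math.dist(A, C), math.dist(B, C))

# Step 2: the centroid of {A, B} is recalculated and compared with C.
second_merge = math.dist(centroid([A, B]), C)

print(first_merge, second_merge)
```

The second merging distance is smaller than the first because the recalculated centroid lies closer to the third point than either original sample did, which is the kind of non-monotonic step that appears as a reversal in the dendrogram.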

4.5. Average Linkage Method

4.5.1. Between-Groups Linkage

According to the clustering results with between-groups linkage, the relationship between the running water from the drain hole and BFW from the Jianxinpo Tunnel was strengthened, though such an effect only occurred in step 14 of merging at the average Euclidean distance of 3.844 (Line B). Based on the clustering analysis with the aforementioned methods, it can be inferred that BFW from the Jianxinpo Tunnel had a closer connection with the water leaked and the running water in the Bayi Tunnel than other water samples.
As shown in the dendrogram (Figure 11), between-groups linkage resolved the issue in single and complete linkages where the distance between clusters was easily affected by extreme values. It defined the distance between two small clusters as the average distance between all sample pairs drawn from the two clusters, which utilized the distance information of all sample pairs [35].
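The between-groups definition can be sketched with four samples from Table 3. The grouping into two clusters is chosen here for illustration only and is not claimed to match a specific step of the reported clustering.

```python
# Pairwise Euclidean distances copied from Table 3.
dist = {
    ("WS06", "WS10"): 1.565,
    ("WS07", "WS10"): 1.728,
    ("WS06", "WS20"): 1.436,
    ("WS07", "WS20"): 1.583,
}


def between_groups_distance(cluster_a, cluster_b):
    """Average distance over all pairs with one member in each cluster."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist[pair] for pair in pairs) / len(pairs)


avg_between = between_groups_distance(["WS06", "WS07"], ["WS10", "WS20"])
print(round(avg_between, 3))
```

Because all four pairs contribute equally, a single unusually short or long pair shifts the result much less than it would under single or complete linkage.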

4.5.2. Within-Groups Linkage

Similar to between-groups linkage, the results of clustering with within-groups linkage showed a correlation between BFW from the Jianxinpo Tunnel and the running water from the drain hole in the Bayi Tunnel at an average Euclidean distance of 2.309 (Line A). During the within-group linkage clustering, the correlation between PPW from the Jianxinpo Tunnel, rain, and the water leaked from Bayi Tunnel was improved, which was not observed in the clustering results with the aforementioned methods.
The within-groups linkage method calculates the average distance over all sample pairs in the cluster that would result from a merge, including pairs between the two small clusters and pairs within each of them. Compared to between-groups linkage, it also considers the similarity of objects within the same cluster in each clustering step. This method makes use of the known information and considers all samples and individuals. As shown in the dendrogram (Figure 12), this clustering method achieves good clustering results and has wide applications in practice.
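A minimal sketch of the within-groups definition follows, using four samples from Table 3. The grouping is illustrative only, and `within_groups_distance` is a hypothetical helper name.

```python
from itertools import combinations

# Pairwise Euclidean distances copied from Table 3.
dist = {
    frozenset(("WS06", "WS07")): 0.593,
    frozenset(("WS10", "WS20")): 0.311,
    frozenset(("WS06", "WS10")): 1.565,
    frozenset(("WS07", "WS10")): 1.728,
    frozenset(("WS06", "WS20")): 1.436,
    frozenset(("WS07", "WS20")): 1.583,
}


def within_groups_distance(cluster_a, cluster_b):
    """Average distance over every pair in the union of both clusters."""
    union = list(cluster_a) + list(cluster_b)
    pairs = list(combinations(union, 2))
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


avg_within = within_groups_distance(["WS06", "WS07"], ["WS10", "WS20"])
print(round(avg_within, 3))
```

The two tight within-cluster pairs (0.593 and 0.311) pull the average below the between-groups value computed from the four cross pairs alone.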

4.6. Ward’s Minimum-Variance Method

Compared to the aforementioned methods, the results of clustering with Ward’s minimum-variance method were the most consistent with the original types of the collected samples. This is because the method requires the distance between samples to be Euclidean, and the distance between two clusters is significantly affected by the number of samples in the two clusters. Clusters therefore tended to be far apart, which made them difficult to merge. Nevertheless, this approach often met the actual requirements of practical clustering. This method therefore performs well in differentiating objects and shows great resistance to interference. The classification results obtained with this method were less affected by outliers. Its dendrogram was often clearly structured, straightforward, and accurate, and it represented the classification results well.
In dealing with the classification of small samples, Ward’s minimum-variance method makes full use of the sample information to explore the internal connection in the data. In the event of little differentiation in samples, this method enlarges the differences between clusters and captures the essential attributes of clusters, thereby providing accurate and reliable classification results [27,36]. In the past, the application of Ward’s minimum-variance method was restricted by the complicated calculations. With the growth of computational technology, it is no longer a great challenge to manage such calculations. Therefore, this method is a very effective clustering method in theory and practice.
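The dependence on cluster size mentioned above follows from the standard form of Ward's merging cost, the increase in within-cluster sum of squares, n_A·n_B/(n_A + n_B) times the squared Euclidean distance between the two centroids. The sketch below uses that textbook formula with hypothetical centroids; it is not taken from the software output of this study.

```python
def ward_increase(n_a, n_b, centroid_a, centroid_b):
    """Increase in within-cluster sum of squares if clusters A and B merge."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return n_a * n_b / (n_a + n_b) * squared_distance


# Same centroid separation, different cluster sizes (hypothetical values).
small_pair = ward_increase(1, 1, (0.0, 0.0), (2.0, 0.0))
large_pair = ward_increase(5, 5, (0.0, 0.0), (2.0, 0.0))
print(small_pair, large_pair)
```

With identical centroids, merging two clusters of five samples costs five times as much as merging two single samples, which is why well-populated clusters resist being merged.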

4.7. Hydrochemical Characteristics

Traditional methods of hydrochemical analysis, namely the Piper trilinear diagram, Schuka Lev classification, and Kurllov’s formula, were also applied to interpret the genesis, connections, and classification of these water samples. As shown in Figure 14, the leakage water samples from the Bayi Tunnel plot closely together and move toward the rain sample over time, which shows that the tunnel leakage water is strongly mixed with rainfall and that rainfall has an extremely important impact on the leakage water of the tunnel. According to the different classifications in Table 4, the leakage water types of the Bayi Tunnel remained essentially the same and differed significantly from the rain, the CEMC USW, and the Jianxinpo Tunnel BFW and PPW, which is consistent with the results of CA. This indicates that the CA results of multivariate statistical methods and the results of traditional hydrochemical analysis were strongly comparable and could be mutually verified.

5. Conclusions

(1) In the HCA, single linkage was the most basic, comprehensible, and accessible method, which reflected the concept of hierarchical clustering directly. However, it was limited by little differentiation between clustering steps and the inevitable chaining tendency (as seen from the ladder-like shapes in dendrograms). Complete linkage adjusted and improved on single linkage. It avoided the inevitable generation of links and ladder-shaped dendrograms. By increasing the distance between clusters for merging, clustering with complete linkage was more refined and data sensitive. However, both single and complete linkage were significantly affected by outliers, and were therefore ineffective when processing data with large dispersions;
(2) Unlike single and complete linkage, median linkage avoided measuring extreme distances, whereas centroid linkage emphasized the representativeness of a cluster. The centroids of clusters had to be recalculated each time two clusters merged; therefore, centroid linkage performed more stably when dealing with outliers. However, given the non-monotonicity of these two methods, the distance for merging could be less than the distance in the previous step, which may have led to reversals, partially closed and crossing links, or other issues in dendrograms. Therefore, these two methods were not recommended;
(3) Average linkage was the default method in the HCA module in SPSS. It included two techniques (i.e., between-groups linkage and within-groups linkage), and both could make full use of known information. All samples and indicators were considered, and the clustering process was not easily affected by outliers. Average linkage performed well in clustering and was recommended for dealing with a large number of samples, complex variables, and indicators;
(4) Ward’s minimum-variance method could capture and enlarge the differences between clusters that were subtle, hidden, and difficult to identify using other methods, which was conducive to data classification. Using this method, more information could be delivered and expressed, which increased the classification accuracy. For classification tasks with fewer objects and variables, this method could effectively improve the accuracy and classification sensitivity, which could help to explore the essential attributes of data.

Author Contributions

Conceptualization, J.B.; formal analysis, J.B.; funding acquisition, W.L.; investigation, J.B.; methodology, J.B. and Z.P.; software, Z.P.; supervision, W.L.; validation, K.L.; visualization, K.L.; writing—original draft, J.B.; writing—review and editing, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundations of China: 41907177, 42007178; Fundamental Research Funds for the Central Universities: CUGL180817, CUGL180837; Open Research Program of Groundwater Remediation Technology Transformation Pilot Base of Hubei Province: GRTT202003; Natural Science Foundation of Hubei Province: 2019CFA013, 2020CFB463.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liang, Y.; Ma, R.; Wang, Y.; Wang, S.; Qu, L.; Wei, W.; Gan, Y. Hydrogeological controls on ammonium enrichment in shallow groundwater in the central Yangtze River Basin. Sci. Total Environ. 2020, 741, 140350.
  2. Hu, Y.; Ma, R.; Wang, Y.; Chang, Q.; Wang, S.; Ge, M.; Bu, J.; Sun, Z. Using hydrogeochemical data to trace groundwater flow paths in a cold alpine catchment. Hydrol. Process. 2019, 33, 1942–1960.
  3. Chang, Q.; Ma, R.; Sun, Z.; Zhou, A.; Hu, Y.; Liu, Y. Using isotopic and geochemical tracers to determine the contribution of glacier-snow meltwater to streamflow in a partly glacierized alpine-gorge catchment in northeastern Qinghai-Tibet Plateau. J. Geophys. Res. Atmos. 2018, 123, 10037–10056.
  4. Ma, R.; Sun, Z.; Hu, Y.; Chang, Q.; Wang, S.; Xing, W.; Ge, M. Hydrological connectivity from glaciers to rivers in the Qinghai–Tibet Plateau: Roles of suprapermafrost and subpermafrost groundwater. Hydrol. Earth Syst. Sci. 2017, 21, 4803–4823.
  5. Lin, J.; Ma, R.; Hu, Y.; Sun, Z.; Wang, Y.; McCarter, C.P. Groundwater sustainability and groundwater/surface-water interaction in arid Dunhuang Basin, northwest China. Hydrogeol. J. 2018, 26, 1559–1572.
  6. Güler, C.; Thyne, G.D. Hydrologic and geologic factors controlling surface and groundwater chemistry in Indian Wells-Owens Valley area, southeastern California, USA. J. Hydrol. 2004, 285, 177–198.
  7. Bu, J.; Sun, Z.; Ma, R.; Liu, Y.; Gong, X.; Pan, Z.; Wei, W. Shallow Groundwater Quality and Its Controlling Factors in the Su-Xi-Chang Region, Eastern China. Int. J. Environ. Res. Public Health 2020, 17, 1267.
  8. Zhang, B.; Song, X.; Zhang, Y.; Han, D.; Tang, C.; Yu, Y.; Ma, Y. Hydrochemical characteristics and water quality assessment of surface water and groundwater in Songnen plain, Northeast China. Water Res. 2012, 46, 2737–2748.
  9. Zhang, Q.; Wang, S.; Yousaf, M.; Wang, S.; Nan, Z.; Ma, J.; Wang, D.; Zang, F. Hydrochemical characteristics and water quality assessment of surface water in the northeast Tibetan Plateau of China. Water Sci. Technol. 2018, 18, 1757–1768.
  10. Gu, H.; Chi, B.; Li, H.; Jiang, J.; Qin, W.; Wang, H. Assessment of groundwater quality and identification of contaminant sources of Liujiang basin in Qinhuangdao, North China. Environ. Earth Sci. 2015, 73, 6477–6493.
  11. Zhang, Q.; Wang, S.; Yousaf, M.; Nan, Z.; Wang, S.; Ma, J.; Wang, D.; Zang, F. Hydrochemical Characteristics and Water Quality Assessment of Surface Water at Xiahe County in Tibetan Plateau Pastoral of China. Preprints 2016.
  12. Miranda, J.; Andrade, E.; López-Suárez, A.; Ledesma, R.; Cahill, T.A.; Wakabayashi, P.H. A receptor model for atmospheric aerosols from a southwestern site in Mexico city. Atmos. Environ. 1996, 30, 3471–3479.
  13. Vega, M.; Pardo, R.; Barrado, E.; Debán, L. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res. 1998, 32, 3581–3592.
  14. Chen, K.; Jiao, J.J.; Huang, J.; Huang, R. Multivariate statistical evaluation of trace elements in groundwater in a coastal area in Shenzhen, China. Environ. Pollut. 2007, 147, 771–780.
  15. Güler, C.; Thyne, G.D.; McCray, J.E.; Turner, K.A. Evaluation of graphical and multivariate statistical methods for classification of water chemistry data. Hydrogeol. J. 2002, 10, 455–474.
  16. Goné, D.L.; Douagui, A.G.; Bai, L.; Kamagaté, B.; Ligban, R. Using Graphical and Multivariate Statistical Methods for Geochemical Assessment of Groundwater Quality in Oumé Department (Côte d’Ivoire). J. Environ. Prot. 2014, 5, 1265.
  17. Aruga, R.; Negro, G.; Ostacoli, G. Multivariate data analysis applied to the investigation of river pollution. Fresenius J. Anal. Chem. 1993, 346, 968–975.
  18. Ritzi, R.W., Jr.; Wright, S.L.; Mann, B.; Chen, M. Analysis of Temporal Variability in Hydrogeochemical Data Used for Multivariate Analyses. Groundwater 2010, 31, 221–229.
  19. Usunoff, E.J.; Guzmán-Guzmán, A. Multivariate Analysis in Hydrochemistry: An Example of the Use of Factor and Correspondence Analyses. Groundwater 1989, 27, 27–34.
  20. Ashley, R.P.; Lloyd, J.W. An example of the use of factor analysis and cluster analysis in groundwater chemistry interpretation. J. Hydrol. 1978, 39, 355–364.
  21. Panda, U.C.; Sundaray, S.K.; Rath, P.; Nayak, B.B.; Bhatta, D. Application of factor and cluster analysis for characterization of river and estuarine water systems-A case study: Mahanadi River (India). J. Hydrol. 2006, 331, 434–455.
  22. Swanson, S.K.; Bahr, J.M.; Schwar, M.T. Two-way Cluster Analysis of Geochemical Data to Constrain Spring Source Waters. Chem. Geol. 2001, 179, 73–91.
  23. Walton, N.R.G. Electrical Conductivity and Total Dissolved Solids—What is Their Precise Relationship? Desalination 1989, 72, 275–292.
  24. Atekwana, E.A.; Atekwana, E.A.; Rowe, R.S.; Werkema, D.D., Jr.; Legall, F.D. The relationship of total dissolved solids measurements to bulk electrical conductivity in an aquifer contaminated with hydrocarbon. J. Appl. Geophys. 2004, 56, 281–294.
  25. Marickar, Y.M.F. Electrical conductivity and total dissolved solids in urine. Urol. Res. 2010, 38, 233–235.
  26. APHA/AWWA/WEF. Standard Methods for the Examination of Water and Wastewater, 21st ed.; American Public Health Association: Washington, DC, USA, 2005.
  27. Bu, J.; Sun, Z.; Zhou, A.; Xu, Y.; Ma, R.; Wei, W.; Liu, M. Heavy metals in surface soils in the upper reaches of the Heihe River, northeastern Tibetan Plateau, China. Int. J. Environ. Res. Public Health 2016, 13, 247.
  28. Díaz, R.V.; Aldape, F.; Flores, M.J. Identification of airborne particulate sources, of samples collected in Ticomán, Mexico, using pixe and multivariate analysis. Nucl. Instrum. Methods Phys. Res. 2002, 189, 249–253.
  29. Han, Y.M.; Du, P.X.; Cao, J.J.; Posmentier, E.S. Multivariate analysis of heavy metal contamination in urban dusts of Xi’an, central China. Sci. Total Environ. 2006, 355, 176–186.
  30. Bu, J.W.; Zhou, J.W.; Zhou, A.G.; Kong, F.L. The Comparison of Different Methods in Hydrochemical Classification Using Hierarchical Clustering Analysis. In Proceedings of the 2011 International Conference on Remote Sensing, Environment and Transportation Engineering (RSETE), Nanjing, China, 24–26 June 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1783–1787.
  31. Suk, H.; Lee, K.K. Characterization of a Ground Water Hydrochemical System Through Multivariate Analysis: Clustering. Ground Water 1999, 37, 358.
  32. Rafighdoust, Y.; Eckstein, Y.; Harami, R.M.; Gharaie, M.H.M.; Mahboubi, A. Using inverse modeling and hierarchical cluster analysis for hydrochemical characterization of springs and Talkhab River in Tang-Bijar oilfield, Iran. Arab. J. Geosci. 2016, 9, 241.
  33. Tay, C.K.; Hayford, E.; Hodgson, I.O.; Kortatsi, B.K. Hydrochemical appraisal of groundwater evolution within the Lower Pra Basin, Ghana: A hierarchical cluster analysis (HCA) approach. Environ. Earth Sci. 2015, 73, 3579–3591.
  34. Gorman, B.S.; Primavera, L.H. The Complementary Use of Cluster and Factor Analysis Methods. J. Exp. Educ. 1983, 51, 165–168.
  35. Li, G.; Wang, X.; Meng, Z.; Zhao, H. Seawater inrush assessment based on hydrochemical analysis enhanced by hierarchy clustering in an undersea goldmine pit, China. Environ. Earth Sci. 2014, 71, 4977–4987.
  36. Helstrup, T.; Jørgensen, N.O.; Banoeng-Yakubo, B. Investigation of hydrochemical characteristics of groundwater from the Cretaceous-Eocene limestone aquifer in southern Ghana and southern Togo using hierarchical cluster analysis. Hydrogeol. J. 2007, 15, 977–989.
Figure 1. Conceptual diagram of the single linkage.
Figure 2. Conceptual diagram of the complete linkage.
Figure 3. Conceptual diagram of the median linkage.
Figure 4. Conceptual diagram of the centroid linkage.
Figure 5. Conceptual diagram of the between-groups linkage.
Figure 6. Conceptual diagram of the within-groups linkage.
Figure 7. Dendrogram of data through single-linkage clustering.
Figure 8. Dendrogram of data through complete-linkage clustering.
Figure 9. Dendrogram of data through median linkage.
Figure 10. Dendrogram of data through the centroid linkage.
Figure 11. Dendrogram of data through between-groups linkage.
Figure 12. Dendrogram of data through the within-groups linkage.
Figure 13. Dendrogram of data through the Ward’s minimum-variance method.
Figure 14. Piper diagram of 19 water samples.
Table 1. Chemical analyses of water samples (unit: mg/L except pH).

| Sample Number | Sampling Location | Water Type | pH | Na+ | K+ | Ca2+ | Mg2+ | Cl− | SO42− | CO32− | HCO3− | F− | NO3− | TDS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WS 02 | CEMC | USW | 7.03 | 123.396 | 16.348 | 58.096 | 14.251 | 231.87 | 68.41 | - | 357.60 | 0.31 | - | 897.484 |
| WS 03 | Jianxinpo tunnel | BFW | 9.27 | 93.060 | 14.991 | 103.398 | 0.926 | 24.35 | 81.95 | 83.52 | 9.57 | 1.39 | 67.92 | 523.827 |
| WS 04 | Jianxinpo tunnel | PPW | 9.62 | 77.533 | 13.630 | 233.702 | 3.771 | 26.33 | 117.23 | 153.52 | 29.30 | 0.74 | 6.03 | 728.329 |
| WS 05 | +327.5 m | LW | 9.43 | 225.495 | 128.598 | 6.641 | 0.185 | 43.92 | 158.71 | 271.16 | 13.16 | 1.12 | 4.48 | 925.866 |
| WS 06 | +347 m | LW | 8.64 | 242.497 | 104.697 | 2.948 | 0.252 | 45.85 | 190.27 | 142.34 | 220.06 | 1.10 | 3.64 | 1038.044 |
| WS 07 | +355 m | LW | 8.69 | 233.404 | 103.904 | - | - | 44.98 | 220.18 | 145.28 | 211.09 | 1.11 | 0.94 | 1042.903 |
| WS 08 | Tunnel periphery | Rain | 5.37 | 6.161 | 2.555 | 0.908 | - | 4.46 | 10.77 | - | 34.68 | 0.08 | 7.88 | 71.649 |
| WS 09 | +272 m | DHRW | 8.72 | 111.902 | 11.661 | 0.848 | 0.103 | 44.74 | 82.59 | 65.29 | 81.33 | 1.28 | 11.60 | 445.418 |
| WS 10 | +355 m | LW | 8.58 | 261.199 | 134.796 | 10.964 | 0.941 | 47.43 | 197.10 | 94.11 | 400.66 | 1.04 | 5.47 | 1234.969 |
| WS 11 | +327.5 m | LW | 8.82 | 213.104 | 119.203 | 2.292 | 1.194 | 42.98 | 154.31 | 249.98 | 43.65 | 1.20 | 10.63 | 906.029 |
| WS 12 | +347 m | LW | 8.66 | 233.002 | 98.785 | 3.634 | 0.795 | 58.67 | 195.06 | 83.52 | 429.36 | 0.24 | 9.80 | 1178.830 |
| WS 13 | +272 m | DHRW | 8.84 | 120.696 | 14.640 | 0.303 | 0.171 | 43.67 | 82.79 | 88.23 | 41.26 | 1.35 | 15.43 | 445.471 |
| WS 14 | +327.5 m | LW | 9.48 | 212.302 | 110.803 | - | - | 41.35 | 139.92 | 131.76 | 13.16 | 1.21 | 2.77 | 718.916 |
| WS 15 | +347 m | LW | 8.41 | 218.403 | 82.868 | 1.517 | 0.084 | 46.87 | 148.38 | 54.11 | 134.55 | 1.12 | 2.47 | 759.789 |
| WS 16 | +355 m | LW | 8.51 | 239.996 | 118.797 | 3.695 | 2.016 | 42.06 | 170.79 | 200.57 | 188.97 | 1.02 | 5.73 | 1042.38 |
| WS 17 | +272 m | DHRW | 8.73 | 134.504 | 15.554 | 2.054 | 0.230 | 43.62 | 84.86 | 68.23 | 56.21 | 1.29 | 26.87 | 470.354 |
| WS 18 | +327.5 m | LW | 9.14 | 217.597 | 98.575 | 0.305 | 0.483 | 40.60 | 111.51 | 123.52 | 17.94 | 1.03 | 2.39 | 670.903 |
| WS 19 | +347 m | LW | 8.38 | 230.903 | 84.268 | 0.728 | 0.046 | 45.88 | 145.72 | 52.94 | 146.51 | 1.08 | 2.22 | 779.646 |
| WS 20 | +355 m | LW | 8.53 | 258.095 | 125.765 | 4.475 | 0.411 | 44.26 | 197.35 | 108.82 | 397.67 | 1.06 | 3.96 | 1221.649 |

TDS: Total dissolved solids; CEMC: Chongqing Emergency Medical Center; LW: Leakage water; USW: Underground sewer water; BFW: Bedrock fissure water; PPW: Pumping pipeline water; DHRW: Drain hole running water.
Table 2. Results of dimensionless standardization of water variables.

| Sample Number | pH | Na+ | K+ | Ca2+ | Mg2+ | Cl− | SO42− | HCO3− | F− | NO3− | TDS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WS 02 | −1.61573 | −0.79248 | −1.15515 | 0.61412 | 3.96129 | 3.99725 | −1.19004 | 1.41 | −1.80109 | −0.65175 | 0.33212 |
| WS 03 | 0.73282 | −1.20678 | −1.18254 | 1.40611 | −0.13248 | −0.58216 | −0.9467 | −0.93993 | 1.06835 | 3.76961 | −0.87724 |
| WS 04 | 1.09978 | −1.41884 | −1.20993 | 3.68417 | 0.74037 | −0.53847 | −0.31265 | −0.80671 | −0.65863 | −0.25922 | −0.21535 |
| WS 05 | 0.90057 | 0.60173 | 1.10549 | −0.28557 | −0.35991 | −0.1503 | 0.43283 | −0.91569 | 0.35099 | −0.36012 | 0.42401 |
| WS 06 | 0.07229 | 0.83387 | 0.62416 | −0.35008 | −0.34147 | −0.10771 | 1.00003 | 0.48132 | 0.29785 | −0.4148 | 0.78706 |
| WS 07 | 0.12471 | 0.70961 | 0.60805 | −0.40166 | −0.41831 | −0.12691 | 1.53757 | 0.42075 | 0.32442 | −0.59056 | 0.80279 |
| WS 08 | −3.35617 | −2.39341 | −1.43287 | −0.38578 | −0.41831 | −1.02108 | −2.22594 | −0.77038 | −2.41218 | −0.13879 | −2.34078 |
| WS 09 | 0.15617 | −0.94951 | −1.2496 | −0.38683 | −0.38757 | −0.13221 | −0.9352 | −0.4554 | 0.77609 | 0.10337 | −1.13103 |
| WS 10 | 0.00938 | 1.08923 | 1.23035 | −0.21004 | −0.12941 | −0.07285 | 1.12278 | 1.70075 | 0.13844 | −0.29568 | 1.42445 |
| WS 11 | 0.26101 | 0.43241 | 0.91618 | −0.36159 | −0.05257 | −0.17105 | 0.35375 | −0.70982 | 0.56354 | 0.04022 | 0.35979 |
| WS 12 | 0.09326 | 0.70415 | 0.50514 | −0.33818 | −0.17243 | 0.17519 | 1.08611 | 1.89453 | −1.98707 | −0.01381 | 1.24275 |
| WS 13 | 0.28198 | −0.82934 | −1.18959 | −0.39636 | −0.36606 | −0.15582 | −0.9316 | −0.72595 | 0.96207 | 0.35269 | −1.13086 |
| WS 14 | 0.95299 | 0.42148 | 0.74701 | −0.40166 | −0.41831 | −0.20701 | 0.09514 | −0.91569 | 0.59011 | −0.47144 | −0.24581 |
| WS 15 | −0.16886 | 0.50478 | 0.18452 | −0.37514 | −0.39372 | −0.0852 | 0.24718 | −0.09605 | 0.35099 | −0.49097 | −0.11353 |
| WS 16 | −0.06401 | 0.7178 | 0.90812 | −0.33706 | 0.20252 | −0.19135 | 0.64993 | 0.2714 | 0.0853 | −0.27875 | 0.80111 |
| WS 17 | 0.16665 | −0.6409 | −1.17126 | −0.36575 | −0.34762 | −0.15692 | −0.8944 | −0.62501 | 0.80266 | 1.09739 | −1.05034 |
| WS 18 | 0.59652 | 0.49385 | 0.5007 | −0.39633 | −0.27078 | −0.22357 | −0.41545 | −0.88341 | 0.11187 | −0.49617 | −0.40123 |
| WS 19 | −0.20031 | 0.67547 | 0.21271 | −0.38893 | −0.40294 | −0.10705 | 0.19938 | −0.0153 | 0.24471 | −0.50724 | −0.04925 |
| WS 20 | −0.04304 | 1.04689 | 1.04849 | −0.32344 | −0.2923 | −0.1428 | 1.12727 | 1.68056 | 0.19158 | −0.39397 | 1.38134 |
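Table 3. Euclidean distance matrix of water samples.

| Sample Number | WS02 | WS03 | WS04 | WS05 | WS06 | WS07 | WS08 | WS09 | WS10 | WS11 | WS12 | WS13 | WS14 | WS15 | WS16 | WS17 | WS18 | WS19 | WS20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WS02 | 0 | 8.881 | 7.455 | 7.927 | 7.435 | 7.667 | 8.04 | 7.283 | 7.508 | 7.53 | 6.871 | 7.484 | 7.92 | 7.167 | 7.08 | 7.481 | 7.491 | 7.158 | 7.577 |
| WS03 | 8.881 | 0 | 5.119 | 5.727 | 6.156 | 6.42 | 7.271 | 4.208 | 6.959 | 5.296 | 7.13 | 3.953 | 5.442 | 5.522 | 6.017 | 3.369 | 5.363 | 5.631 | 6.946 |
| WS04 | 7.455 | 5.119 | 0 | 5.349 | 5.697 | 5.834 | 7.09 | 4.76 | 6.422 | 5.306 | 6.26 | 4.824 | 5.212 | 5.208 | 5.537 | 4.925 | 5.031 | 5.289 | 6.428 |
| WS05 | 7.927 | 5.727 | 5.349 | 0 | 1.841 | 2.02 | 7.532 | 3.663 | 3.08 | 0.912 | 3.968 | 3.593 | 0.903 | 1.74 | 1.726 | 3.691 | 1.405 | 1.775 | 3.044 |
| WS06 | 7.435 | 6.156 | 5.697 | 1.841 | 0 | 0.593 | 7.511 | 3.937 | 1.565 | 1.633 | 2.781 | 4 | 2.215 | 1.443 | 0.804 | 4.018 | 2.397 | 1.368 | 1.436 |
| WS07 | 7.667 | 6.42 | 5.834 | 2.02 | 0.593 | 0 | 7.737 | 4.185 | 1.728 | 1.913 | 2.9 | 4.252 | 2.418 | 1.759 | 1.218 | 4.303 | 2.708 | 1.727 | 1.583 |
| WS08 | 8.04 | 7.271 | 7.09 | 7.532 | 7.511 | 7.737 | 0 | 5.357 | 8.342 | 7.091 | 7.606 | 5.591 | 7.129 | 6.43 | 7.25 | 5.624 | 6.474 | 6.467 | 8.229 |
| WS09 | 7.283 | 4.208 | 4.76 | 3.663 | 3.937 | 4.185 | 5.357 | 0 | 5.139 | 3.275 | 5.371 | 0.453 | 2.988 | 2.716 | 3.903 | 1.063 | 2.674 | 2.866 | 5.004 |
| WS10 | 7.508 | 6.959 | 6.422 | 3.08 | 1.565 | 1.728 | 8.342 | 5.139 | 0 | 2.909 | 2.33 | 5.239 | 3.555 | 2.83 | 1.745 | 5.182 | 3.705 | 2.717 | 0.311 |
| WS11 | 7.53 | 5.296 | 5.306 | 0.912 | 1.633 | 1.913 | 7.091 | 3.275 | 2.909 | 0 | 3.875 | 3.203 | 1.176 | 1.337 | 1.352 | 3.215 | 1.426 | 1.403 | 2.869 |
| WS12 | 6.871 | 7.13 | 6.26 | 3.968 | 2.781 | 2.9 | 7.606 | 5.371 | 2.33 | 3.875 | 0 | 5.551 | 4.363 | 3.54 | 2.801 | 5.414 | 4.217 | 3.411 | 2.346 |
| WS13 | 7.484 | 3.953 | 4.824 | 3.593 | 4 | 4.252 | 5.591 | 0.453 | 5.239 | 3.203 | 5.551 | 0 | 2.909 | 2.79 | 3.954 | 0.805 | 2.643 | 2.945 | 5.113 |
| WS14 | 7.92 | 5.442 | 5.212 | 0.903 | 2.215 | 2.418 | 7.129 | 2.988 | 3.555 | 1.176 | 4.363 | 2.909 | 0 | 1.539 | 2.155 | 3.107 | 0.855 | 1.634 | 3.481 |
| WS15 | 7.167 | 5.522 | 5.208 | 1.74 | 1.443 | 1.759 | 6.43 | 2.716 | 2.83 | 1.337 | 3.54 | 2.79 | 1.539 | 0 | 1.482 | 2.908 | 1.385 | 0.237 | 2.697 |
| WS16 | 7.08 | 6.017 | 5.537 | 1.726 | 0.804 | 1.218 | 7.25 | 3.903 | 1.745 | 1.352 | 2.801 | 3.954 | 2.155 | 1.482 | 0 | 3.944 | 2.201 | 1.402 | 1.718 |
| WS17 | 7.481 | 3.369 | 4.925 | 3.691 | 4.018 | 4.303 | 5.624 | 1.063 | 5.182 | 3.215 | 5.414 | 0.805 | 3.107 | 2.908 | 3.944 | 0 | 2.831 | 3.041 | 5.073 |
| WS18 | 7.491 | 5.363 | 5.031 | 1.405 | 2.397 | 2.708 | 6.474 | 2.674 | 3.705 | 1.426 | 4.217 | 2.643 | 0.855 | 1.385 | 2.201 | 2.831 | 0 | 1.434 | 3.63 |
| WS19 | 7.158 | 5.631 | 5.289 | 1.775 | 1.368 | 1.727 | 6.467 | 2.866 | 2.717 | 1.403 | 3.411 | 2.945 | 1.634 | 0.237 | 1.402 | 3.041 | 1.434 | 0 | 2.584 |
| WS20 | 7.577 | 6.946 | 6.428 | 3.044 | 1.436 | 1.583 | 8.229 | 5.004 | 0.311 | 2.869 | 2.346 | 5.113 | 3.481 | 2.697 | 1.718 | 5.073 | 3.63 | 2.584 | 0 |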
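The link between Table 2 and Table 3 can be checked for one pair of samples. The sketch below recomputes the Euclidean distance between WS 10 and WS 20 from their standardized values in Table 2; the function name `euclidean` is illustrative.

```python
import math

# Standardized values copied from Table 2, in column order:
# pH, Na+, K+, Ca2+, Mg2+, Cl-, SO4(2-), HCO3-, F-, NO3-, TDS.
ws10 = [0.00938, 1.08923, 1.23035, -0.21004, -0.12941, -0.07285,
        1.12278, 1.70075, 0.13844, -0.29568, 1.42445]
ws20 = [-0.04304, 1.04689, 1.04849, -0.32344, -0.2923, -0.1428,
        1.12727, 1.68056, 0.19158, -0.39397, 1.38134]


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d_10_20 = euclidean(ws10, ws20)
print(round(d_10_20, 3))
```

The result should agree, to three decimals, with the WS10 and WS20 entry of Table 3, the second-smallest off-diagonal distance in the matrix.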
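Table 4. Classifications of traditional hydrochemical analysis methods.

| Sample Number | Sampling Location | Schuka Lev Classification | Kurllov’s Formula |
|---|---|---|---|
| WS 08 | Tunnel periphery | HCO3-(Na+K), 7-A | $M_{0.06}\,\dfrac{\mathrm{HCO_3}\,69\;\mathrm{SO_4}\,22}{(\mathrm{Na+K})\,91}\,T_{18.8\,^{\circ}\mathrm{C}}$ |
| WS 02 | CEMC | HCO3·Cl-(Na+K)·Ca, 25-A | $M_{0.87}\,\dfrac{\mathrm{HCO_3}\,54\;\mathrm{Cl}\,35\;\mathrm{SO_4}\,10}{(\mathrm{Na+K})\,66\;\mathrm{Ca}\,27}\,T_{20.1\,^{\circ}\mathrm{C}}$ |
| WS 03 | Jianxinpo Tunnel | SO4-(Na+K)·Ca, 32-A | $M_{0.33}\,\dfrac{\mathrm{SO_4}\,71\;\mathrm{Cl}\,21}{(\mathrm{Na+K})\,51\;\mathrm{Ca}\,49}\,T_{21.5\,^{\circ}\mathrm{C}}$ |
| WS 04 | Jianxinpo Tunnel | SO4-Ca·(Na+K), 32-A | $M_{0.50}\,\dfrac{\mathrm{SO_4}\,68\;\mathrm{HCO_3}\,17\;\mathrm{Cl}\,15}{\mathrm{Ca}\,71\;(\mathrm{Na+K})\,28}\,T_{20.8\,^{\circ}\mathrm{C}}$ |
| WS 09 | +272 m | SO4·HCO3-(Na+K), 14-A | $M_{0.33}\,\dfrac{\mathrm{SO_4}\,40\;\mathrm{HCO_3}\,39\;\mathrm{Cl}\,21}{(\mathrm{Na+K})\,99}\,T_{21.9\,^{\circ}\mathrm{C}}$ |
| WS 13 | +272 m | SO4·Cl·HCO3-(Na+K), 21-A | $M_{0.30}\,\dfrac{\mathrm{SO_4}\,49\;\mathrm{Cl}\,26\;\mathrm{HCO_3}\,25}{(\mathrm{Na+K})\,100}\,T_{23.2\,^{\circ}\mathrm{C}}$ |
| WS 17 | +272 m | SO4·HCO3-(Na+K), 14-A | $M_{0.34}\,\dfrac{\mathrm{SO_4}\,46\;\mathrm{HCO_3}\,30\;\mathrm{Cl}\,24}{(\mathrm{Na+K})\,99}\,T_{23.4\,^{\circ}\mathrm{C}}$ |
| WS 05 | +327.5 m | SO4-(Na+K), 35-A | $M_{0.58}\,\dfrac{\mathrm{SO_4}\,74\;\mathrm{Cl}\,20}{(\mathrm{Na+K})\,98}\,T_{22.0\,^{\circ}\mathrm{C}}$ |
| WS 11 | +327.5 m | SO4-(Na+K), 35-A | $M_{0.58}\,\dfrac{\mathrm{SO_4}\,64\;\mathrm{HCO_3}\,18\;\mathrm{Cl}\,18}{(\mathrm{Na+K})\,99}\,T_{22.6\,^{\circ}\mathrm{C}}$ |
| WS 14 | +327.5 m | SO4-(Na+K), 35-A | $M_{0.52}\,\dfrac{\mathrm{SO_4}\,72\;\mathrm{Cl}\,21}{(\mathrm{Na+K})\,100}\,T_{22.8\,^{\circ}\mathrm{C}}$ |
| WS 18 | +327.5 m | SO4-(Na+K), 35-A | $M_{0.49}\,\dfrac{\mathrm{SO_4}\,66\;\mathrm{Cl}\,24\;\mathrm{HCO_3}\,11}{(\mathrm{Na+K})\,100}\,T_{22.9\,^{\circ}\mathrm{C}}$ |
| WS 06 | +347 m | HCO3·SO4-(Na+K), 14-A | $M_{0.81}\,\dfrac{\mathrm{HCO_3}\,48\;\mathrm{SO_4}\,42\;\mathrm{Cl}\,10}{(\mathrm{Na+K})\,99}\,T_{23.5\,^{\circ}\mathrm{C}}$ |
| WS 12 | +347 m | HCO3·SO4-(Na+K), 14-A | $M_{1.02}\,\dfrac{\mathrm{HCO_3}\,63\;\mathrm{SO_4}\,29}{(\mathrm{Na+K})\,99}\,T_{22.4\,^{\circ}\mathrm{C}}$ |
| WS 15 | +347 m | SO4·HCO3-(Na+K), 14-A | $M_{0.63}\,\dfrac{\mathrm{SO_4}\,45\;\mathrm{HCO_3}\,41\;\mathrm{Cl}\,14}{(\mathrm{Na+K})\,99}\,T_{22.4\,^{\circ}\mathrm{C}}$ |
| WS 19 | +347 m | HCO3·SO4-(Na+K), 14-A | $M_{0.65}\,\dfrac{\mathrm{HCO_3}\,43\;\mathrm{SO_4}\,43\;\mathrm{Cl}\,14}{(\mathrm{Na+K})\,100}\,T_{22.6\,^{\circ}\mathrm{C}}$ |
| WS 07 | +355 m | SO4·HCO3-(Na+K), 14-A | $M_{0.81}\,\dfrac{\mathrm{SO_4}\,46\;\mathrm{HCO_3}\,44}{(\mathrm{Na+K})\,100}\,T_{22.9\,^{\circ}\mathrm{C}}$ |
| WS 10 | +355 m | HCO3·SO4-(Na+K), 14-A | $M_{1.05}\,\dfrac{\mathrm{HCO_3}\,62\;\mathrm{SO_4}\,31}{(\mathrm{Na+K})\,97}\,T_{22.3\,^{\circ}\mathrm{C}}$ |
| WS 16 | +355 m | HCO3·SO4-(Na+K), 14-A | $M_{0.76}\,\dfrac{\mathrm{HCO_3}\,47\;\mathrm{SO_4}\,43}{(\mathrm{Na+K})\,98}\,T_{23.1\,^{\circ}\mathrm{C}}$ |
| WS 20 | +355 m | HCO3·SO4-(Na+K), 14-A | $M_{1.03}\,\dfrac{\mathrm{HCO_3}\,62\;\mathrm{SO_4}\,31}{(\mathrm{Na+K})\,99}\,T_{23.5\,^{\circ}\mathrm{C}}$ |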
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Bu, J.; Liu, W.; Pan, Z.; Ling, K. Comparative Study of Hydrochemical Classification Based on Different Hierarchical Cluster Analysis Methods. Int. J. Environ. Res. Public Health 2020, 17, 9515. https://doi.org/10.3390/ijerph17249515
