3.1. Variable Description
Data in this research were collected in the field and summarized by water source: groundwater (GW) or surface water (SW). Summary statistics for the numerical parameters are shown in Table 1 by water source. Most variables have an asymmetric distribution with positive or negative skewness, and the Shapiro–Wilk test rejects normality for all parameters except pH.
Figure 4 shows the sample values of four log-transformed water quality parameters (DO, TDS, Cl, and NH4+) versus the distance from the Tumpun pumping station for the two water sources (groundwater (GW) and surface water (SW)) and the sampling seasons (ST2, ST3, and ST4).
Figure 5 shows bubble plots of the sample values at each sample location for the same four parameters (DO, TDS, Cl, and NH4+). Scatterplots and bubble plots for the remaining variables are presented in Figures A1–A8 in Appendix A. Each variable has two panels, one for each water source (groundwater (GW) and surface water (SW)).
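The per-source skewness and normality checks described above can be sketched as follows. This is a minimal illustration on synthetic placeholder data, not the study's measurements or the authors' code; the column names (`source`, `pH`, `TDS`), the helper `summarize_by_source`, and the 0.05 significance level are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats


def summarize_by_source(df, source_col="source", alpha=0.05):
    """Skewness and Shapiro-Wilk p-value for each numeric column, per water source."""
    rows = []
    value_cols = df.select_dtypes("number").columns
    for source, group in df.groupby(source_col):
        for col in value_cols:
            x = group[col].dropna().to_numpy()
            _, p_value = stats.shapiro(x)  # H0: the sample is normally distributed
            rows.append({
                "source": source,
                "parameter": col,
                "n": len(x),
                "skewness": stats.skew(x),
                "shapiro_p": p_value,
                "normality_rejected": p_value < alpha,
            })
    return pd.DataFrame(rows)


# Synthetic placeholder data (assumption): a roughly symmetric pH column
# and a right-skewed TDS column, split evenly between GW and SW.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "source": ["GW"] * 40 + ["SW"] * 40,
    "pH": rng.normal(7.0, 0.3, 80),
    "TDS": rng.lognormal(5.0, 1.0, 80),
})
summary = summarize_by_source(demo)
print(summary)
```

A small p-value in the `shapiro_p` column corresponds to rejecting normality for that parameter and source.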
Dissolved oxygen (DO) is the concentration of gaseous oxygen dissolved in water [13]. In Figure 4 and Figure 5, surface water DO shows a clear pattern: values are lower in the northern part of the study area, where they fall below the water quality standard (WQS) for Cambodia (2 mg/L) [14]. A low DO value indicates that the water body has a reduced capacity to decompose or remove contaminants. This pattern is expected, because the city's waste effluent is discharged into the northern part of the study area. In contrast, DO in groundwater is relatively stable across the study region. The surface water sampling points ESW22 and WSW22 have higher DO than other points, and they are very close to the groundwater sampling points 2PW22 and 1PW22 (locations refer to the labels in Figure 2). This might indicate outflow from groundwater to surface water.
The role of pH in modulating most chemical and biological processes in water is well known [15].
Total dissolved solids (TDS) measure the total dissolved inorganic salts, such as calcium, magnesium, potassium, and sodium, together with dissolved organic substances [16]. As shown in Figure 4, surface water has a relatively higher level of TDS in the northern area and a relatively lower level in the central area and at the southern outlet. In contrast, variations in TDS in groundwater are negligible.
Nutrients such as nitrate and phosphorus are essential for aquatic plants [17]. Cambodia’s WQS for lakes/reservoirs requires total nitrogen below 0.6 mg/L and total phosphate between 0.005 and 0.05 mg/L [14]. NO3− in surface water has a maximum value of 34.6 mg/L. This highest nitrate content in our study was found on the Prek Thnout River at SWB35, a location downstream of where the southern outlet stream, which drains the wetland, meets the Prek Thnout River during the dry season. It represents the mixing of the southern outlet stream and the Prek Thnout River water. PO4 in surface water also exceeds the upper bound of the standard, with a maximum value of 20.6 mg/L. Phosphate is more concentrated in the northern to central areas, where the municipal wastewater enters (Figure A4). Excess phosphate in water may lead to excessive algae growth and degrade water quality. Phosphate is known to bind to soil because of its high adsorption coefficient, which explains the complete absence of dissolved-phase phosphate in groundwater [18]. Sources of nitrate and phosphate may include septic systems, agricultural fertilizers from the farmer peninsula near sample location 2PW22, and garbage dumps.
Ammonia (NH4+) is a naturally occurring chemical derived from decomposed animal and plant matter. Ammonia is used in small concentrations to disinfect drinking water. However, too much ammonia in water is harmful to humans and wildlife. The EPA’s most recent guidance, from 2013, recommends that total ammonia nitrogen (TAN) not exceed 17 mg/L at pH 7 and 20 °C, as a one-hour average, more than once every three years on average. NH4+ levels are as high as 50.61 mg/L at location 36RW (Figure 2) near the Tumpun pumping station. The concentration decreases with distance from this point, and the decline is steeper in the rainy and post-rainy seasons than in the dry season.
From the descriptive analysis shown in Table 1, it is evident that most water quality indicators show a high degree of skewness. This lack of symmetry suggests the need for a response transformation for all variables except pH. The logarithmic transformation, a special case of the Box–Cox transformation family, is widely used to reduce data variability and normalize the response so that statistical model assumptions are met. In this study, log transformation was used as a normalizing and variance-stabilizing transformation for all variables except pH.
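To illustrate why the log transformation counts as a special case of the Box–Cox family, the sketch below fixes the Box–Cox parameter at zero and compares the result with a plain natural logarithm. The data are synthetic and strictly positive (an assumption; the log is undefined for zero or negative values), and the snippet is only an illustration, not the authors' code.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive, right-skewed sample (placeholder for a
# concentration-like variable; not the study's data).
rng = np.random.default_rng(1)
x = rng.lognormal(mean=2.0, sigma=0.8, size=200)

# Box-Cox: (x**lam - 1) / lam for lam != 0, and log(x) for lam == 0.
log_x = np.log(x)
boxcox_zero = stats.boxcox(x, lmbda=0)  # lambda fixed at 0, no estimation

same = np.allclose(log_x, boxcox_zero)
skew_raw = stats.skew(x)
skew_log = stats.skew(log_x)

print("Box-Cox(lambda=0) equals log:", same)
print("skewness before:", skew_raw, "after:", skew_log)
```

On a right-skewed sample like this one, the skewness of the log-transformed values is much closer to zero than that of the raw values.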
3.2. Principal Component Analysis
Principal component analysis (PCA) is a well-known multivariate analysis technique used to reduce the dimensionality of a data set, i.e., to find the directions of highest variation in a multivariate population. The goal is to explain most of the variation in the original, highly correlated variables with a small set of uncorrelated components, each of which is a linear combination of the original variables. The first principal component accounts for the highest variation among the original variables; each subsequent component accounts for the largest share of the remaining variation. Because the water quality parameters are linked by complex chemical and biological reactions, PCA was used as a tool to build a water quality index from the new uncorrelated components. Since the water quality assessment problem is high-dimensional in nature and many variables in the original dataset are correlated (Figure 6 (left)), PCA is a convenient way to account for most of the variation in the original dataset.
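The idea of replacing correlated variables with uncorrelated components can be sketched with an eigendecomposition of the correlation matrix. This is a generic, unrotated PCA on synthetic data, given only as an illustration; the function name `pca_from_correlation` and the demo variables are assumptions, and the sketch does not reproduce the analysis in this study.

```python
import numpy as np


def pca_from_correlation(X):
    """Unrotated PCA on the standardized columns of X (rows are samples)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    R = np.corrcoef(X, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                               # uncorrelated component scores
    explained = eigvals / eigvals.sum()                # proportion of total variance
    return eigvals, eigvecs, scores, explained


# Synthetic demo (assumption): three variables driven by one shared signal
# plus one independent variable.
rng = np.random.default_rng(2)
shared = rng.standard_normal(150)
X_demo = np.column_stack([
    shared + 0.3 * rng.standard_normal(150),
    shared + 0.3 * rng.standard_normal(150),
    shared + 0.3 * rng.standard_normal(150),
    rng.standard_normal(150),
])
eigvals, eigvecs, scores, explained = pca_from_correlation(X_demo)
print("proportion of variance explained:", explained)
```

Each column of `eigvecs` holds the weights of one linear combination of the standardized variables, and the columns of `scores` are mutually uncorrelated.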
We used a scree plot and parallel analysis to determine the number of significant components that explain most of the data variation. Three components are sufficient to explain most of the variation observed in the data from this study. Different rotation methods, including orthogonal and oblique rotations, were tested. The Promax rotation was chosen because it first performs an orthogonal Varimax rotation and then relaxes the assumption of independence, allowing correlations between the factors to improve the fit [19]. Among the rotated components (RCs), the first (RC1) explains 41.2% of the total variation, the second (RC2) explains 30.6%, and the last explains 28% (Table 2).
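As an illustration of how parallel analysis can suggest the number of components to retain, the sketch below compares the eigenvalues of an observed correlation matrix with those obtained from random normal data of the same shape. It is one common variant of the procedure, shown on synthetic data; the function name, the 95th-percentile threshold, and the number of simulations are assumptions, not settings reported in this study.

```python
import numpy as np


def parallel_analysis(X, n_sims=200, quantile=0.95, seed=0):
    """Count leading eigenvalues that exceed those expected from pure noise."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))            # uncorrelated reference data
        noise_eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[i] = np.sort(noise_eig)[::-1]
    threshold = np.quantile(simulated, quantile, axis=0)

    exceeds = observed > threshold
    # Keep components up to (not including) the first one that fails the test.
    n_keep = p if exceeds.all() else int(np.argmin(exceeds))
    return n_keep, observed, threshold


# Synthetic demo (assumption): six variables generated from two latent factors.
rng = np.random.default_rng(3)
f1, f2 = rng.standard_normal(300), rng.standard_normal(300)
X_demo = np.column_stack(
    [f1 + 0.5 * rng.standard_normal(300) for _ in range(3)]
    + [f2 + 0.5 * rng.standard_normal(300) for _ in range(3)]
)
n_keep, observed, threshold = parallel_analysis(X_demo)
print("components retained:", n_keep)
```

Plotting `observed` and `threshold` against the component index gives a scree plot with the noise reference line overlaid.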
The first rotated component (RC1) is weighted mainly by inorganic solids (TDS, phosphate, and chloride). Meanwhile, RC2 has its larger weights mostly from pH and nitrate and a negative contribution from F.