Article
Peer-Review Record

Discrimination of Chemical Oxygen Demand Pollution in Surface Water Based on Visible Near-Infrared Spectroscopy

Water 2022, 14(19), 3003; https://doi.org/10.3390/w14193003
by Xueqin Han 1,†, Xiaoyan Chen 2,†, Jinfang Ma 1, Jiaze Chen 1, Baiheng Xie 1, Wenhua Yin 2, Yanyan Yang 2, Wenchao Jia 2, Danping Xie 2,* and Furong Huang 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 24 August 2022 / Revised: 10 September 2022 / Accepted: 20 September 2022 / Published: 23 September 2022
(This article belongs to the Special Issue Surface Water Quality Modelling)

Round 1

Reviewer 1 Report

The comments and suggestions for the authors are included in the attached PDF.

Comments for author File: Comments.pdf

Author Response

Response to reviewers’ comments

 

Dear Editor and Dear reviewer:

 

On behalf of my co-authors, I thank you very much for giving us the opportunity to revise our manuscript. We greatly appreciate the positive and constructive comments and suggestions of the editor and reviewers on our manuscript entitled “Discrimination of chemical oxygen demand pollution in surface water based on visible near-infrared spectroscopy” (ID: water-1906610).

 

We have addressed all the questions and revised our manuscript accordingly. Please note: the reviewers’ comments are in black; our comments are in red italic; the original paper text is in red and the revised text is in blue.

 

We would like to express our great appreciation to you and reviewers for comments on our paper. Looking forward to hearing from you.

Sincerely,

Xueqin Han, Xiaoyan Chen, Jinfang Ma, Jiaze Chen, Baiheng Xie, Wenhua Yin, Yanyan Yang, Wenchao Jia, Danping Xie, Furong Huang.

Corresponding author: Furong Huang

E-mail address: [email protected]

 

 

Reviewer #1:

 

1. The format of the citations and reference list is not the one recommended by the journal. They should be converted to the journal's style.

Response: Thanks for the reviewer’s comment. We have revised the format of the citations and reference list.

2. The bibliography of the article is very limited; it must be enriched.

Response: Thanks for the reviewer’s comment. We have added 9 citations, for a total of 34 citations.

 

3. The discussion of the article should be more extensive. In particular, a direct comparison with the results obtained in the authors' previous article, "Estimation of chemical oxygen demand in different water systems by near-infrared spectroscopy", should be performed, as the surface samples used in this article are the same as those used in your previous article and a comparison can be obtained.

Response: Thanks for the reviewer’s comment. We have added a comparison of results with our previous article in the introduction, as follows.

 

.......However, it has not been validated whether this algorithm can effectively discriminate if the COD of surface water exceeds the threshold through Vis-NIR.

The purpose of this study was to explore the best comprehensive modeling approach of Vis-NIR to diagnose whether the COD of surface water exceeds its management value. The following objectives were considered: .......

.......However, it has not been validated whether this algorithm can effectively discriminate if the COD of surface water exceeds the threshold through Vis-NIR.

In our last article, we achieved quantitative predictions for surface water, but the predictions were not very good for COD values higher than 120 mg/L. In this experiment, more seriously polluted samples, with COD greater than 600 mg/L, were added, and a qualitative discrimination method was tried in order to achieve high-accuracy online COD discrimination, which provides new ideas for surface water quality management.

The purpose of this study was to explore the best comprehensive modeling approach of Vis-NIR to diagnose whether the COD of surface water exceeds its management value. The following objectives were considered: .........

 

 

4. In the discussion section, the adequacy of the water samples for the calibration, prediction and validation of the model should be discussed, since the samples were collected during a very limited time (4 months) and there is a question of whether they are representative enough to calibrate and evaluate the model.

Response: Thanks for the reviewer’s comment. For the adequacy of model calibration and prediction, etc., we have placed this part in the last paragraph of the discussion section in Section 4.2, as follows.

 

4.2. Implication of proposed strategy

 ........ Combining the CARS selection algorithm with the SMOTE algorithm not only improved the discrimination accuracy of the model but also reduced the input of the discrimination model.

 ........ Combining the CARS selection algorithm with the SMOTE algorithm not only improved the discrimination accuracy of the model but also reduced the input of the discrimination model.

In this study, the surface water samples were collected over a total of 4 months, covering both the rainy and non-rainy seasons in Guangzhou. COD changes with the rainy season because the runoff generated by rainfall carries land-based pollutants into the water, which increases COD. According to the principle of chemical COD detection, these pollutants are all oxygen-consuming substances. The oxygen-consuming substances in surface water follow the same general pattern in the rainy and non-rainy seasons, and their composition does not change substantially because of the rainy season. We carried out Vis-NIR detection on a large number of samples and used the surface water model to capture, as far as possible, the quantitative relationship between all oxygen-consuming substances and the COD values. We used the CARS-SMOTE-PLSDA model to realize online monitoring of high COD values, which provides a new means of discrimination for the management of seriously polluted surface water.

 

5. Lines 64-66: A short description of partial least squares discriminant analysis (PLSDA) should be included.

Response: Thanks for the reviewer’s comment. We have added a brief description of the PLS-DA algorithm here, as follows.

 

The analytical method of this technique mainly involves the establishment of a calibration model using the spectra and conventional values of the target components. Linear discriminant models such as partial least squares discriminant analysis (PLS-DA) are commonly used in spectral modeling owing to their simple structure and ease of operation (Yuan et al., 2022).

The analytical method of this technique mainly involves the establishment of a calibration model using the spectra and conventional values of the target components. Linear discriminant models such as partial least squares discriminant analysis (PLS-DA) are commonly used in spectral modeling owing to their simple structure and ease of operation (Yuan et al., 2022). PLS-DA is a classification technique based on partial least squares. Its mathematical basis is principal component analysis: a regression model between the independent variables and the categorical variable of the training samples is established from the sample information during feature selection, and the characteristic variables related to the classification are then effectively extracted (Zhang et al., 2020).
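As an illustration of the description above, the following is a minimal sketch of PLS-DA, assuming a binary 0/1 class label (for example, COD above or below the management value), a PLS1 regression fitted with the NIPALS algorithm, and a 0.5 cut-off on the predicted score. The function names, the cut-off, and the toy three-band "spectra" are hypothetical and are not taken from the manuscript, which ran its models in MATLAB.

```python
# Minimal PLS-DA sketch (assumption: PLS1 via NIPALS on a 0/1 label, 0.5 cut-off).
# Pure Python; all names and data are illustrative only.

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_plsda(X, y, n_components=1):
    """Fit PLS1 on mean-centred spectra X and a 0/1 label vector y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred spectra
    f = [yi - y_mean for yi in y]                              # centred labels
    components = []
    for _ in range(n_components):
        cols = [[E[i][j] for i in range(n)] for j in range(p)]
        w = [_dot(col, f) for col in cols]        # direction most covariant with y
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:                          # nothing left to explain
            break
        w = [wj / norm for wj in w]
        t = [_dot(row, w) for row in E]           # sample scores
        tt = _dot(t, t)
        loadings = [_dot(col, t) / tt for col in cols]
        q = _dot(f, t) / tt                       # regression of y on the scores
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, loadings, q))
    return x_mean, y_mean, components

def predict_score(model, x):
    """Continuous PLS prediction for one spectrum."""
    x_mean, y_mean, components = model
    e = [xj - m for xj, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, loadings, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, loadings)]
    return y_hat

def predict_class(model, x, threshold=0.5):
    """Assign class 1 when the predicted score reaches the cut-off."""
    return 1 if predict_score(model, x) >= threshold else 0

# Toy data: class 1 absorbs more strongly in the second band.
X_train = [[0.10, 0.20, 0.30], [0.12, 0.22, 0.29], [0.11, 0.19, 0.31],
           [0.10, 0.60, 0.30], [0.13, 0.62, 0.28], [0.11, 0.58, 0.32]]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_plsda(X_train, y_train, n_components=1)
```

The continuous score returned by `predict_score` is the kind of value that can be swept over thresholds to draw a ROC curve, whereas `predict_class` gives the hard decision used for accuracy, sensitivity, and specificity.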

 

 

6. Lines 140-142: Bibliographic references should be added for the four selected preprocessing methods used to minimize the sources of spectral variability.

Response: Thanks for the reviewer’s comment. We have added a citation here, as follows.

 

The measured spectrum was inevitably affected by instrument noise and the ambient environment. Therefore, four spectral pre-processing methods were used for the spectra of the water samples: first derivative (FD), second derivative (SD), multiplicative scatter correction (MSC), and standard normal variate (SNV).

The measured spectrum was inevitably affected by instrument noise and the ambient environment. Therefore, four spectral pre-processing methods were used for the spectra of the water samples: first derivative (FD), second derivative (SD), multiplicative scatter correction (MSC), and standard normal variate (SNV) (Hong et al., 2019).
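The four pre-processing methods named above can be sketched as follows. This is only an illustration under stated assumptions: the derivatives use plain central finite differences (the manuscript does not say whether a smoothing derivative such as Savitzky-Golay was used), MSC uses the mean spectrum as its reference, and all function names are hypothetical.

```python
# Illustrative spectral pre-processing in pure Python (assumptions noted above).

def snv(spectrum):
    """Standard normal variate: centre and scale each spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = (sum((v - mean) ** 2 for v in spectrum) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        # Least-squares fit s = offset + slope * ref, then undo both effects.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_var
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected

def first_derivative(spectrum, step=1.0):
    """Central-difference first derivative on the interior points."""
    return [(spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
            for i in range(1, len(spectrum) - 1)]

def second_derivative(spectrum, step=1.0):
    """Central-difference second derivative on the interior points."""
    return [(spectrum[i + 1] - 2 * spectrum[i] + spectrum[i - 1]) / step ** 2
            for i in range(1, len(spectrum) - 1)]
```

SNV and MSC both target multiplicative and additive scatter effects, while the derivatives remove baseline offsets and sharpen overlapping bands at the cost of amplifying noise.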

 

7. Line 145: Add a bibliographic reference for the joint X-Y distance (SPXY) method for sample partitioning.

Response: Thanks for the reviewer’s comment. We have added two citations here, as follows.

 

The sample set partitioning based on joint X-Y distance (SPXY) was used.

The sample set partitioning based on joint X-Y distance (SPXY) (Xu and Goodacre, 2018, Galvao et al. 2005) was used.
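For readers unfamiliar with SPXY, a minimal sketch is given here. It assumes the usual formulation: pairwise distances in spectral space (X) and in the reference value (y) are each divided by their maximum and summed, and samples are then picked by the Kennard-Stone max-min rule. The function name `spxy_split` and the toy data are hypothetical.

```python
# Illustrative SPXY partitioning in pure Python (requires n_train >= 2).

def spxy_split(X, y, n_train):
    """Return (train_indices, test_indices) chosen by joint X-Y distance."""
    n = len(X)
    dx = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) ** 0.5 for j in range(n)]
          for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    # Joint distance: both parts normalised so X and y carry equal weight.
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)] for i in range(n)]

    # Start from the two most distant samples.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda ij: d[ij[0]][ij[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the sample farthest from its nearest selected neighbour.
    while len(selected) < n_train:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return sorted(selected), remaining

# Toy example: one-band "spectra" with COD-like reference values.
X_demo = [[0.0], [1.0], [2.0], [3.0], [10.0]]
y_demo = [4, 5, 6, 7, 50]
train_idx, test_idx = spxy_split(X_demo, y_demo, n_train=3)
```

Because the extremes are selected first, the training set tends to span the full range of both spectra and reference values, which is the motivation for using SPXY rather than a random split.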

 

8. Lines 150-165: The indices chosen to assess the effectiveness of the model for prediction should be presented in a separate section. Additionally, you should justify their selection instead of other widely used indices such as the linear correlation coefficient (R2), root mean-squared error of calibration (RMSEC) and root mean-squared error of validation (RMSEV). Also add bibliographic references for the indices used.

Response: Thanks for the reviewer’s comment. We have refined the expression here and put the model evaluation parameters in a new section, as follows.

 

2.4 Sample set partitioning and model evaluation parameters

The sample set partitioning based on joint X-Y distance (SPXY) (Xu and Goodacre, 2018, Galvao et al. 2005) was used. The training and test sets were partitioned with a ratio of 3:1. The training set could identify different classes of spectral patterns upon fitting the classification model, whereas the test set was used to evaluate the performance of the model. The specific partitioning results with the surface water sample information are shown in Table 1.

The performance of a classification model is generally evaluated using the accuracy, sensitivity, and specificity of the prediction set. When the accuracy, specificity, and sensitivity are closer to 1, the classification model has better performance. The classification accuracy refers to the ratio of the number of samples correctly discriminated to the total number of samples in the classification model when testing the established model using the prediction set. Sensitivity and specificity are two key metrics for the classification model that indicate the percentage of positive and negative samples correctly classified, respectively.

2.4 Sample set partitioning

The sample set partitioning based on joint X-Y distance (SPXY) (Xu and Goodacre, 2018, Galvao et al. 2005) was used. The training and test sets were partitioned with a ratio of 3:1. The training set could identify different classes of spectral patterns upon fitting the classification model, whereas the test set was used to evaluate the performance of the model. The specific partitioning results with the surface water sample information are shown in Table 1.

2.5 Evaluation of the model performance

The accuracy, sensitivity, and specificity were used to evaluate the overall performance of the classification models. The classification accuracy refers to the ratio of the number of samples correctly discriminated to the total number of samples in the classification model when testing the established model using the prediction set. The sensitivity and specificity are two key metrics for the classification model that indicate the percentage of positive and negative samples correctly classified, respectively. When the accuracy, specificity, and sensitivity are closer to 1, the classification model has better performance.
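The three metrics defined above can be computed from the counts of a confusion matrix. The short sketch below assumes that the positive class (label 1) denotes samples whose COD exceeds the threshold and that both classes occur in the data; the function name and the example labels are hypothetical.

```python
# Accuracy, sensitivity, and specificity from true and predicted labels.

def classification_metrics(y_true, y_pred, positive=1):
    """Assumes at least one positive and one negative sample in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),  # correctly discriminated / total
        "sensitivity": tp / (tp + fn),       # positives correctly classified
        "specificity": tn / (tn + fp),       # negatives correctly classified
    }

# Hypothetical example: 3 polluted and 5 unpolluted samples.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 0, 0, 0, 0, 1])
```

With imbalanced classes, accuracy alone can look good while sensitivity is poor, which is why the three values are reported together.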

 

 

9. Lines 261-262: The attached reference does not correspond to the text, since the article does not refer to COD in surface water.

Response: Thanks for the reviewer’s comment. We have changed the citations here. Although we have cited literature from other fields, we believe that the approximate locations of these bonds are consistent in the NIR region.

 

Since there are large peaks and troughs near 1800 nm, the spectra after SD pre-processing were locally amplified to obtain Figures 2(c) and 2(d). These figures show more pronounced absorptions at 1400, 1450, and 1980 nm, which may be caused by the stretching vibrations of the O-H, C-H, and N-H bonds, respectively (Cen and He, 2007; Rady and Guyer, 2015).

Since there are large peaks and troughs near 1800 nm, the spectra after SD pre-processing were locally amplified to obtain Figures 2(c) and 2(d). These figures show more pronounced absorptions at 1400, 1450, and 1980 nm, which may be caused by the stretching vibrations of the O-H, C-H, and N-H bonds, respectively (Cen and He, 2007; Rossel and Behrens, 2010; Xu et al., 2020). They also show that the uncontaminated and contaminated samples exhibited large differences in these three bands.

 

10. Line 313: “...the Receiver operating characteristic (ROC) curves...”. Add reference.

Response: Thanks for the reviewer’s comment. We have refined the expression here and added a reference, as follows.

 

To further investigate the performance of the three models, the Receiver operating characteristic (ROC) curves of the four different pre-processing methods were plotted and analyzed, as shown in Figure 4.

To further investigate the performance of the three models, the receiver operating characteristic (ROC) curves of the four different pre-processing methods were plotted and analyzed, as shown in Figure 4. The ROC curve is a comprehensive evaluation index that reflects how the sensitivity and specificity vary continuously in a classification problem (Daniel et al., 2021).
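To make the link between the ROC curve and sensitivity and specificity concrete, the sketch below sweeps a decision threshold over continuous model scores and records the false positive rate (1 − specificity) against the true positive rate (sensitivity). The function names and the example scores are hypothetical, and the area under the curve is computed with the trapezoidal rule.

```python
# Illustrative ROC curve and AUC in pure Python.

def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, from (0, 0) to (1, 1)."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical discriminant scores for three polluted (1) and three clean (0) samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
```

An area close to 1 indicates that polluted samples almost always receive higher scores than unpolluted ones, regardless of where the cut-off is placed.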

 

11. Line 347: The legend of Figure 5 is missing.

Response: Thanks for the reviewer’s comment. We have added the legend of Figure 5, as follows. 

 

Figure 5. (a) Feature bands selected by competitive adaptive reweighted sampling (CARS) after second derivative (SD) pre-processing, (b) score plot of the feature bands

 

Author Response File: Author Response.docx

Reviewer 2 Report

1. Add a table containing the surface water characteristics (TDS, pH, etc.).

2. Does this method remain applicable if the water samples were collected in the rainy season, since the COD will change during winter?

3. Write the equations of the simplified model used in this study.

4. Add a caption under Figure 5.

5. In Table 3, remove the reference from the table title and add it in the text instead.

6. In line 297, specify the four different methods used.

7. Add a curve that compares the measured COD values with those analyzed using the Vis-NIR method.

8. Write the references according to the MDPI reference format (both in the text and at the end of the manuscript).

Author Response


Reviewer #2:

  • Add a table containing the surface water characteristics (TDS, pH, etc…).

Response: Thanks for the reviewer’s comment. Based on our measurements, we have added the range of pH values to Table 1, as follows.

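| Sample Type | Set | Number of Samples | Min (mg/L) | Max (mg/L) | Mean (mg/L) | Median (mg/L) | Samples with COD > 40 mg/L | Samples with COD < 40 mg/L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Surface water | All | 127 | 4 | 688 | 61.98 | 27 | 39 | 88 |
| | Training set | 95 | 4 | 688 | 58.65 | 20 | 25 | 70 |
| | Testing set | 32 | 5 | 313 | 50.25 | 18 | 14 | 18 |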

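| Sample Type | Set | pH Range | Number of Samples | Min (mg/L) | Max (mg/L) | Mean (mg/L) | Median (mg/L) | Samples with COD > 40 mg/L | Samples with COD < 40 mg/L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Surface water | All | 5.63-8.92 | 127 | 4 | 688 | 61.98 | 27 | 39 | 88 |
| | Training set | 5.63-7.85 | 95 | 4 | 688 | 58.65 | 20 | 25 | 70 |
| | Testing set | 6.52-8.92 | 32 | 5 | 313 | 50.25 | 18 | 14 | 18 |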

 

  • Does this method remain applicable if the water samples were collected in the rainy season, since the COD will change during winter?

Response: Thanks for the reviewer’s comment. COD changes with the rainy season because the runoff generated by rainfall carries land-based pollutants into the water, which increases COD. According to the principle of COD detection, these pollutants are all oxygen-consuming substances. The oxygen-consuming substances in surface water follow the same general pattern in the rainy and non-rainy seasons, and their composition does not change substantially because of the rainy season. We performed near-infrared detection on a large number of samples and used the model to summarize the patterns in the spectra, that is, to capture as far as possible the quantitative relationship between all oxygen-consuming substances and the COD values. Therefore, whether the surface water is sampled in the rainy or non-rainy season will not greatly influence the detection of COD by near-infrared spectroscopy.

 

  • Write the equations of the simplified model used in this study

Response: Thanks for the reviewer’s comment. We used the CARS algorithm to simplify the model. Therefore, we now introduce the principle of the algorithm in more detail in the description of the CARS algorithm and add the formulas behind it, as follows.

 

CARS is a wavelength selection method that adopts the Darwinian evolution theory of “survival of the fittest”. The key wavelengths selected are those with relatively large absolute coefficients in the multiple linear regression model. This selection method conducts wavelength selection based on the exponential decay function (EDF), and then selects the key wavelengths based on the competitive wavelength selection of adaptive reweighted sampling (Li et al., 2009; Yang et al., 2019).

The aforementioned algorithms were run in MATLAB (R2018a, MathWorks, Inc., Natick, MA, USA).

CARS is a wavelength selection method that adopts the Darwinian evolution theory of “survival of the fittest”. The key wavelengths selected are those with relatively large absolute coefficients in the multiple linear regression model. This selection method conducts wavelength selection based on the exponential decay function (EDF), and then selects the key wavelengths based on the competitive wavelength selection of adaptive reweighted sampling (Li et al., 2009; Yang et al., 2019). The algorithm is implemented in the following four steps:

  • Perform Monte Carlo sampling and select a certain proportion of samples to build a calibration model.
  • Use the EDF to remove wavelengths with low absolute values of regression coefficients.
  • Calculate the root mean square error of cross-validation (RMSECV) and screen the significant wavelengths using adaptive reweighted sampling (ARS).
  • Select the subset with the lowest RMSECV as the best wavelength combination.

The EDF enables rapid elimination and selection of wavelengths. In each sampling run, the ratio of wavelengths to be retained is calculated using the EDF as follows:

$$r_i = a e^{-k i}$$

where $r_i$ is the ratio of wavelengths retained in the $i$-th sampling run, and $a$ and $k$ are two constants determined by the number of spectral wavelengths $p$ and the number of sampling runs $N$ in CARS:

$$a = \left(\frac{p}{2}\right)^{1/(N-1)}, \qquad k = \frac{\ln(p/2)}{N-1}$$

After forced wavelength reduction by EDF, ARS is used to imitate the principle of survival of the fittest, and wavelengths are eliminated in a competitive manner. In ARS, variables will be randomly weighted and sampled, and variables with larger weights will be selected.
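The two selection mechanisms described above can be sketched in a few lines. The sketch assumes the standard CARS form of the EDF, r_i = a·exp(−k·i) with a = (p/2)^(1/(N−1)) and k = ln(p/2)/(N−1), where p is the number of wavelengths and N the number of sampling runs, so that all wavelengths are kept in the first run and only two in the last. The function names, the example coefficients, and the use of Python's `random.choices` for the weighted draw are illustrative assumptions rather than the authors' MATLAB implementation.

```python
import math
import random

def edf_ratios(p, n_runs):
    """Ratio of wavelengths retained in runs 1..n_runs (assumed standard CARS EDF)."""
    a = (p / 2) ** (1 / (n_runs - 1))
    k = math.log(p / 2) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]

def ars_select(abs_coefs, n_draws, rng):
    """Adaptive reweighted sampling: draw wavelength indices with replacement,
    weighted by absolute regression coefficient, and keep the distinct ones."""
    total = sum(abs_coefs)
    weights = [c / total for c in abs_coefs]
    drawn = rng.choices(range(len(abs_coefs)), weights=weights, k=n_draws)
    return sorted(set(drawn))

# Hypothetical setting: 200 wavelengths, 50 sampling runs.
ratios = edf_ratios(p=200, n_runs=50)
kept_per_run = [round(r * 200) for r in ratios]

# Hypothetical absolute coefficients: wavelengths 2 and 3 dominate.
survivors = ars_select([0.0, 0.0, 5.0, 5.0], n_draws=4, rng=random.Random(0))
```

The EDF removes many wavelengths in the early runs and few in the later ones, while the weighted draw lets wavelengths with small coefficients drop out by chance rather than by a hard cut, which is the competitive element of the method.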

The aforementioned algorithms were run in MATLAB (R2018a, MathWorks, Inc., Natick, MA, USA).

 

 

 

  • Add a caption under figure 5

Response: Thanks for the reviewer’s comment. We have added the caption of Figure 5, as follows. 

 

Figure 5. (a) Feature bands selected by competitive adaptive reweighted sampling (CARS) after second derivative (SD) pre-processing, (b) score plot of the feature bands.

 

  • In table 3, remove the reference from the table title and add it in the text instead

Response: Thanks for the reviewer’s comment. We've moved the reference into the text above.

Table 3. Basic chemical bonds, absorption wavelengths, and possible associated water pollution components of main spectral bands screened by competitive adaptive reweighted sampling (CARS) for visible near-infrared region (Cen and He, 2007; Cozzolino et al., 2003).

 

The chemical bonds corresponding to the main bands of the Vis-NIR region screened by CARS and the possible corresponding contamination components are shown in Table 3. The band most screened by CARS was near 400-860 nm; this may arise from the vibration of C-H and N-H chemical bonds, such as those in aromatic hydrocarbons (Cen and He, 2007; Cozzolino et al., 2003).

Table 3. Basic chemical bonds, absorption wavelengths, and possible associated water pollution components of main spectral bands screened by competitive adaptive reweighted sampling (CARS) for visible near-infrared region. 

 

  • In line 297, specify the four different methods used.

Response: Thanks for the reviewer’s comment. We have refined the expression here and enriched the description in this section.

Compared with those of the PLS-DA model, the modeling results of SMOTE-PLS-DA with SD pre-processing were the best. The training and test set accuracies of the model improved by 9% and 6%, respectively. The sensitivity and specificity also improved, whereas the test set accuracy with MSC and SNV pre-processing did not significantly improve.

Compared with the PLS-DA model, the accuracy of the SMOTE-PLS-DA model improved with the FD, SD, and MSC pre-processing methods. For the FD method, the training and test set accuracies of the model improved by 7% and 7%, respectively; for the SD method, they improved by 9% and 6%; and for the MSC method, they improved by 12% and 3%. However, with the SNV pre-processing method, the accuracy of the SMOTE-PLS-DA model did not improve, although the sensitivity of the model improved greatly.
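For context on why oversampling can change sensitivity, a minimal sketch of the synthetic minority oversampling technique (SMOTE) follows. It assumes the basic form of the method, in which each synthetic sample is interpolated between a minority-class sample and one of its k nearest minority-class neighbours. The function name, k, and the toy data are hypothetical; the count of 45 new samples is only an example based on the 25 versus 70 training samples reported in Table 1, and the manuscript does not state the oversampling amount actually used.

```python
import random

def smote(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample (squared Euclidean).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority-class "spectra" with two bands.
minority_demo = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_demo, n_new=45, k=2, rng=random.Random(1))
```

Because the synthetic samples lie between existing minority samples, the classifier sees a denser minority region, which tends to raise sensitivity for the minority class but cannot add information that the original samples lack.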

  • Add a curve that compares the measured COD values with those analyzed using Vis-NIR method.

Response: Thanks for the reviewer’s comment. We have placed the fitting diagrams of the PLS-DA, SMOTE-PLS-DA, and CARS-SMOTE-PLS-DA models with the best pre-processing method together with the ROC curves in Figure 4, as follows.

 

 

Figure 4. Receiver operating characteristic (ROC) curves and surface water score map: (a), (d) partial least squares discriminant analysis (PLS-DA) model, (b), (e) synthetic minority oversampling technique (SMOTE)-PLS-DA model, (c), (f) competitive adaptive reweighted sampling (CARS)-SMOTE-PLS-DA model.

*FD: first derivative; SD: second derivative; MSC: multiplicative scatter correction; SNV: standard normal variate.

 

  • Write the references according the MDPI reference format (both in the text and in the end of the manuscript).

Response: Thanks for the reviewer’s comment. We have refined our manuscript according to the MDPI reference format.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors have revised the manuscript well and addressed all the comments; thus, it can be accepted in its revised form.
