Article
Peer-Review Record

A Neural Network-Based Method for Respiratory Sound Analysis and Lung Disease Detection

Appl. Sci. 2022, 12(8), 3877; https://doi.org/10.3390/app12083877
by Luca Brunese 1,†, Francesco Mercaldo 1,2,*,†, Alfonso Reginelli 3,† and Antonella Santone 1,†
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 9 February 2022 / Revised: 24 March 2022 / Accepted: 1 April 2022 / Published: 12 April 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Dear Authors,

Thank you for your resubmission and for addressing all of the issues. The paper looks great now.

Author Response

We are really thankful to the reviewer for the interesting suggestions and for the opportunity to improve the quality and the presentation of the proposed manuscript.

Reviewer 2 Report

This article describes a method to detect lung disease from respiratory sound recordings and to further classify the recordings into different types of lung disease. The described method is an interesting application of machine learning in medical diagnostics and can be of interest to the journal’s audience. The study design and method are overall sound, but the presentation needs some improvement. In particular, some important details seem to be missing, as follows.

- The features used for training the models are not described in sufficient detail. For example, in Section 2.2, what specific frequency range is used for calculating SC, how is the “bandwidth” feature calculated, and what percentage of the total spectral energy is used for SR? Providing the mathematical formula for each of the features would be helpful. Also, providing justification and discussion around the selection of these particular features would be useful.

- The train/test split strategy could also be described in more detail. For instance, what percentage of the data was used for testing vs. training (i.e., the value of k in k-fold cross-validation) vs. validation, and was the splitting stratified (i.e., the proportion of the classes preserved in the splits)?

- The classes for disease classification seem highly imbalanced (e.g., 1 asthma vs. 64 COPD cases); how is this handled to ensure the classes are equally represented during the training phase? Also, the performance metrics are reported as overall values. Given the imbalance, it would be helpful to see the metrics for each class, since the overall metrics can be heavily biased toward the larger classes (COPD in this case).

- How was the hyperparameter tuning conducted? My impression from lines 208-212 is that this was conducted manually. I would recommend that this step be conducted in a systematic way (e.g., grid search) and described in more detail.

- From the last paragraph of the article (lines 388-392) it seems that no feature normalization was conducted before feeding the feature vectors into the models. This is usually a necessary step, especially with features that span different numerical ranges, which appears to be the case from Figures 2-5 (e.g., F5 is in the range of 500-2500 while F9 is in the range of ~0-0.04). Also, there is no mention of regularization in the method. Was any regularization used in the models?

- Finally, I think some of the basics are explained in too much detail, e.g., how the CSV data file is structured. Also, I would recommend proofreading the article for English language and style corrections.

Author Response

Comment #1: The features used for training the models are not described in sufficient detail. For example, in Section 2.2, what specific frequency range is used for calculating SC, how is the “bandwidth” feature calculated, and what percentage of the total spectral energy is used for SR? Providing the mathematical formula for each of the features would be helpful. Also, providing justification and discussion around the selection of these particular features would be useful.
Response: In the revised version of the paper, we added several references to better justify the rationale behind the selected features and to provide the mathematical background for the features we considered. Moreover, several references were introduced to explain how each feature is computed.
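For illustration only, a minimal sketch of how spectral features such as the centroid (SC), bandwidth, and rolloff (SR) can be computed with the librosa library is shown below; the synthetic signal and the 85% rolloff threshold are assumptions for the example, not values reported in the paper.

```python
# Illustrative sketch only: computing the spectral features discussed above
# (spectral centroid, bandwidth, rolloff) with librosa. The synthetic signal
# and the 85% rolloff threshold are assumptions, not values from the paper.
import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a respiratory sound recording

centroid = librosa.feature.spectral_centroid(y=y, sr=sr)    # SC per frame, in Hz
bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)  # spread around the centroid
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85)  # SR: 85% energy cutoff

# One summary value per recording, e.g. the frame-wise mean
print(np.mean(centroid), np.mean(bandwidth), np.mean(rolloff))
```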
 
Comment #2: The train/test split strategy could also be described in more detail. For instance, what percentage of the data was used for testing vs. training (i.e., the value of k in k-fold cross-validation) vs. validation, and was the splitting stratified (i.e., the proportion of the classes preserved in the splits)?
Response: We are really thankful for the observation. In the revised version of the manuscript, we clarified this aspect: we adopted an 80:20 train/test split strategy. Thank you again for the opportunity to improve the proposed manuscript.
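As a minimal sketch of such an 80:20 split (the use of scikit-learn, the placeholder data, and the stratification option are assumptions for the example, not details from the paper):

```python
# Illustrative sketch of an 80:20 train/test split (scikit-learn assumed;
# X and y below are placeholder data, not the paper's dataset).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)  # placeholder features/labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.20,   # 20% held out for testing
    stratify=y,       # preserve class proportions in both splits (an assumption)
    random_state=42,  # reproducibility
)
print(X_train.shape, X_test.shape)
```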
 
Comment #3: The classes for disease classification seem highly imbalanced (e.g., 1 asthma vs. 64 COPD cases); how is this handled to ensure the classes are equally represented during the training phase? Also, the performance metrics are reported as overall values. Given the imbalance, it would be helpful to see the metrics for each class, since the overall metrics can be heavily biased toward the larger classes (COPD in this case).
Response: We are really thankful to the reviewer for the opportunity to improve the proposed manuscript. In the experimental analysis, we considered a version of k-fold cross-validation that preserves the imbalanced class distribution in each fold. This is called stratified k-fold cross-validation: it enforces that the class distribution in each split of the data matches the distribution in the complete training dataset. In other words, the folds are selected so that each fold contains roughly the same proportion of each class label as the original dataset. In the revised version of the paper, we added a better explanation of the cross-validation and of the stratified cross-validation we considered in the experimental analysis. Regarding the performance metrics, in the revised version of the paper we added the performance metrics for the individual diseases. Thank you again for your suggestions.
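To make the procedure concrete, a hedged sketch of stratified k-fold cross-validation with per-class metrics follows; the use of scikit-learn, k=5, the MLP classifier, and the synthetic imbalanced dataset are assumptions for illustration, not details reported in the paper.

```python
# Illustrative sketch: stratified k-fold cross-validation with per-class metrics.
# scikit-learn, k=5, and the MLP classifier are assumptions for this example.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Placeholder imbalanced dataset standing in for the respiratory-sound features
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_true_all, y_pred_all = [], []
for train_idx, test_idx in skf.split(X, y):
    clf = MLPClassifier(max_iter=500, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    y_true_all.extend(y[test_idx])
    y_pred_all.extend(clf.predict(X[test_idx]))

# Per-class precision/recall/F1, which addresses the class-imbalance concern
print(classification_report(y_true_all, y_pred_all))
```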
 
Comment #4: How was the hyperparameter tuning conducted? My impression from lines 208-212 is that this was conducted manually. I would recommend that this step be conducted in a systematic way (e.g., grid search) and described in more detail.
Response: To tune the hyperparameters, we exploited the exhaustive grid search provided by the Orange data mining tool. In particular, we exploited GridSearchCV, which exhaustively considers all parameter combinations in order to find the best ones. We are really thankful to the reviewer for the opportunity to improve the quality of the proposed manuscript.
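For readers unfamiliar with exhaustive grid search, a sketch using scikit-learn's GridSearchCV is given below; the parameter grid, the MLP classifier, and the placeholder data are illustrative assumptions and are not the grid the authors actually searched.

```python
# Illustrative sketch of exhaustive grid search with GridSearchCV.
# The parameter grid and the MLP classifier are assumptions for the example;
# they are not the values searched by the authors.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

param_grid = {
    "hidden_layer_sizes": [(50,), (100,), (100, 50)],
    "alpha": [1e-4, 1e-3, 1e-2],        # L2 regularization strength
    "learning_rate_init": [1e-3, 1e-2],
}
search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```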
 
Comment #5: From the last paragraph of the article (lines 388-392) it seems that no feature normalization was conducted before feeding the feature vectors into the models. This is usually a necessary step, especially with features that span different numerical ranges, which appears to be the case from Figures 2-5 (e.g., F5 is in the range of 500-2500 while F9 is in the range of ~0-0.04). Also, there is no mention of regularization in the method. Was any regularization used in the models?
Response: We are really thankful to the reviewer for the opportunity to improve the quality of the proposed manuscript. As highlighted by the reviewer, we do not include a feature normalization step. We are aware that feature normalization is beneficial in many cases: it improves the numerical stability of the model and often reduces training time. However, it can harm the performance of distance-based algorithms by assuming equal importance of all features; when there are inherent differences in importance between features, normalization is typically not applied.
For instance, neural networks can counteract standardization in the same way as regressions, so, in theory, data standardization should not affect the performance of a neural network. These are the reasons why we did not consider feature normalization. We added these sentences in the revised version of the paper. Regarding regularization, we added the following sentences to the paper: "Regularization is used in machine learning as a solution to overfitting by reducing the variance of the ML model under consideration. Regularization can be implemented in multiple ways, by modifying either the loss function, the sampling method, or the training approach itself. With the aim of avoiding overfitting, we exploited cross-validation."
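Purely as an illustration of the alternative the reviewer describes (not what was done in the paper), feature standardization can be combined with explicit L2 regularization without leaking test statistics by placing a scaler inside a pipeline; the scikit-learn pipeline, the alpha value, and the placeholder data below are assumptions for the example.

```python
# Illustrative sketch only: how feature standardization and L2 regularization
# could be combined, as the reviewer suggests. This is NOT what the paper does;
# the scaler, classifier, and alpha value are assumptions for the example.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

model = make_pipeline(
    StandardScaler(),            # fitted on training folds only, applied to test folds
    MLPClassifier(alpha=1e-3,    # L2 penalty acts as regularization
                  max_iter=500, random_state=0),
)
print(cross_val_score(model, X, y, cv=5).mean())
```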
 
Comment #6: Finally, I think some of the basics are explained in too much detail, e.g., how the CSV data file is structured. Also, I would recommend proofreading the article for English language and style corrections.
Response: We are really thankful to the reviewer for the opportunity to improve the presentation of the proposed manuscript. In the revised version of the paper, we removed the unnecessary details pointed out by the reviewer, and we performed a thorough proofread to correct grammatical errors and, in general, to improve the English language.

Round 2

Reviewer 2 Report

The changes the authors have made to the article have clarified most of the points I brought up in my first review. I am still not quite convinced why feature normalization was not conducted, but I can recommend the article for publication in its current form.
