Article
Peer-Review Record

Exploring Machine Learning for Predicting Cerebral Stroke: A Study in Discovery

Electronics 2024, 13(4), 686; https://doi.org/10.3390/electronics13040686
by Rajib Mia 1,*, Shapla Khanam 1, Amira Mahjabeen 1, Nazmul Hoque Ovy 1, Deepak Ghimire 2, Mi-Jin Park 3,*, Mst Ismat Ara Begum 4 and A. S. M. Sanwar Hosen 5
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 18 November 2023 / Revised: 14 January 2024 / Accepted: 1 February 2024 / Published: 7 February 2024
(This article belongs to the Special Issue Machine Learning in Electronic and Biomedical Engineering, Volume II)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

 

General comment:

This manuscript describes a study for predicting cerebral stroke using state-of-the-art machine learning algorithms. The work is relevant in signal processing and data science for biomedical applications. Furthermore, the proposal is well-motivated and represents an advance in the knowledge for researchers and professionals working with advanced algorithms for biomedicine. The experimental framework is clear and the results are well supported. The manuscript is interesting and well-written. I have some points that should be addressed before the manuscript can be accepted.

 

Comment 1:

In section 3.4, the title has a typo in the word strokes (it says stokes).

 

Comment 2:

The authors claim to use ML techniques. However, there is no mention of unsupervised learning algorithms. For instance, what about data clustering and dimensionality reduction?

 

Comment 3:

It would be better to present the performance results graphically, instead of giving numbers as in Table 2.

 

Comment 4:

In the perspectives (future work) of the paper, it would be valuable to add some statements about deep learning algorithms and generative AI models for stroke prediction.

Author Response

Please see the attachment. 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This research investigates the application of robust Machine Learning (ML) algorithms, including Logistic Regression (LR), Random Forest (RF), and K-Nearest Neighbor (KNN), to the prediction of cerebral strokes. The data is generated using the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and Random Over-Sampling Technique (ROSE) to address class imbalances to improve the accuracy of minority classes. To address the challenge of forecasting strokes from partial and imbalanced physiological data, this study introduces a novel hybrid ML approach. The research work reported is of interest to the community. Some suggestions are listed below to improve the manuscript's quality (major revision):

1. The manuscript's motivations should be further highlighted, e.g., what problems did previous works have, and how does this work solve them?

2. The authors must clearly explain the difference(s) between the proposed method and similar works in the introduction.

3. The authors should further highlight the manuscript's innovations and contributions.

4. In Section 1 (Introduction), the main contributions of this paper should be further summarized and clearly demonstrated.

5. All figures are missing from this paper; please add them to the revised paper.

6. The literature review in this paper is poor. I hope that the authors can add some new references in order to improve the review. For example, https://doi.org/10.1109/JIOT.2023.3296460; https://ieeexplore.ieee.org/document/8846596; http://dx.doi.org/10.1109/TCSS.2022.3152091 and so on.

7. At Lines 135 and 136, "In the pursuit of exceptional precision, the dataset is thoughtfully partitioned into two segments: the training data, comprising 80%, and the testing data, making up the remaining 20%." Why is the training data 80% and the testing data 20%? Could it be 70% and 30%, or 60% and 40%?

8. In Equation (1), what are the physical meanings of the parameters, variables, and constants? Please provide them.
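
The question in comment 7 can be made concrete with a minimal sketch. It assumes a stratified split, which is not something the manuscript is stated to use, and the helper `stratified_split` and the toy labels (980 controls, 20 stroke cases) are hypothetical. It shows only how the split ratio changes the number of minority-class cases left for testing.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), sampling the same fraction from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy labels (not from the manuscript): 0 = control, 1 = stroke.
labels = [0] * 980 + [1] * 20

splits = {}
for frac in (0.2, 0.3, 0.4):
    train_idx, test_idx = stratified_split(labels, frac)
    positives_in_test = sum(labels[i] for i in test_idx)
    splits[frac] = (len(train_idx), len(test_idx), positives_in_test)
    print(f"test={frac:.0%}: train={len(train_idx)}, "
          f"test={len(test_idx)}, stroke cases in test={positives_in_test}")
```

With these toy labels, an 80/20 split leaves 4 stroke cases in the test set, 70/30 leaves 6, and 60/40 leaves 8. A larger test share gives a steadier estimate of minority-class performance at the cost of less training data, which is the trade-off the authors would need to justify.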

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Please see the attachment. 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors compared the performance of three machine learning models with three oversampling techniques on the Harvard Dataverse Repository stroke dataset.

Below are my comments.

Abstract.

The data is generated using the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and Random Over-Sampling Technique (ROSE) to address class imbalances to improve the accuracy of minority classes. – What data were you sampling? The dataset is not introduced.

To address the challenge of forecasting strokes from partial and imbalanced physiological data, this study introduces a novel hybrid ML approach. – Elaborate on the approach.

Introduction

The scientific community places a strong emphasis on creating predictive models for stroke with the aim of prevention, considering its significant societal impact. – Using what type of data?

To facilitate the application of ML models in clinical practice, we selected data that physicians can readily monitor. – such as?

Related work

The 2nd paragraph is unclear – the authors cite 12, continue discussing that paper, and then cite 13 (an entirely different study).

Table 1 provides insights into previous works and their respective methodologies and accuracies, underscoring the ongoing advancements in this critical domain. – There is no Table 1 with this information.

Furthermore, it’s noted that addressing the 3% missing information related to BMI is essential to enhance execution assessment. – This sentence is disconnected from the rest of the paragraph.

Overall, for the related work, I suggest elaborating on ML method performance and on the source/type and size of the data (e.g. EHR, -omics, etc.) used to build the models for every study cited (some have it, but many are missing).

Materials and Methods

The dataset encompasses 43,400 samples, characterized as a standard class unbalanced type – What is a "standard class unbalanced type"?

What is the case/control distribution of the dataset?

No referenced figures are available

The relationship between body mass index and intermediate glucose level is so minimal that it could be considered negligible. – Why is this relevant to the data analysis section?

Notably, only one conceivable outcome exists for the correlation coefficient, demonstrating a negative but statistically insignificant association between BMI and stroke. – This sentence should be in the results.

The entire section 3.1 is confusing and needs to be rewritten. It looks like the authors tried to describe the distribution of missing values in the dataset. A simple table/barplot would be more efficient.

In this study, missing values are effectively imputed by leveraging the mean of other available values. – There are better methods (KNN, iterative imputer).
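
A small sketch may clarify the difference between mean and KNN imputation. It is pure Python on a made-up five-row table with hypothetical columns `[age, glucose, bmi]`, and the helpers `mean_impute` and `knn_impute` are illustrative names, not the authors' code. It assumes that only the imputed column has gaps and that features are on comparable scales, whereas real data would normally be standardized first. In practice a library implementation such as scikit-learn's `KNNImputer` or `IterativeImputer` would be the usual choice.

```python
import math


def mean_impute(rows, col):
    """Replace None in column `col` with the mean of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    return [r[:col] + [mean if r[col] is None else r[col]] + r[col + 1:]
            for r in rows]


def knn_impute(rows, col, k=2):
    """Replace None in column `col` with the mean of that column over the
    k nearest complete rows; distance is Euclidean on the other columns."""
    complete = [r for r in rows if r[col] is not None]

    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2
                             for j in range(len(a)) if j != col))

    out = []
    for r in rows:
        if r[col] is None:
            nearest = sorted(complete, key=lambda c: dist(r, c))[:k]
            value = sum(c[col] for c in nearest) / len(nearest)
            out.append(r[:col] + [value] + r[col + 1:])
        else:
            out.append(list(r))
    return out


# Hypothetical toy rows: [age, glucose, bmi]; the last BMI is missing.
rows = [
    [25.0, 85.0, 21.0],
    [27.0, 88.0, 22.0],
    [60.0, 140.0, 31.0],
    [62.0, 145.0, 33.0],
    [61.0, 142.0, None],
]

bmi_mean = mean_impute(rows, col=2)[4][2]
bmi_knn = knn_impute(rows, col=2, k=2)[4][2]
print("mean imputation:", bmi_mean)
print("KNN imputation: ", bmi_knn)
```

On this toy table the global mean fills the gap with 26.75, a value between the two groups. The two nearest neighbours are the older, high-glucose rows, so KNN fills it with 32.0.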

To tackle data imbalance, three oversampling techniques are employed to refine the final output. – What is the imbalance proportion?

In addition to oversampling techniques, I suggest using a class-imbalance-sensitive metric on the original dataset (balanced accuracy, precision-recall curve). Depending on the degree of class imbalance, this will provide a more realistic estimate of model performance.
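
The point about imbalance-sensitive metrics can be shown with a short sketch. The labels and predictions are made up (20 stroke cases among 1000 samples), because the manuscript's actual imbalance proportion is not given here. Balanced accuracy is computed as the mean of sensitivity and specificity.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the stroke class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Hypothetical data: 20 stroke cases followed by 980 controls.
y_true = [1] * 20 + [0] * 980

# A degenerate model that never predicts stroke.
always_negative = [0] * 1000

# A hypothetical screening model: finds 15 of 20 cases, raises 49 false alarms.
screening = [1] * 15 + [0] * 5 + [1] * 49 + [0] * 931

for name, pred in [("always negative", always_negative), ("screening", screening)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f}, "
          f"balanced accuracy={balanced_accuracy(y_true, pred):.3f}")
```

On this toy data the degenerate model scores higher on plain accuracy (0.980 against 0.946), while balanced accuracy ranks the two the other way round (0.500 against 0.850). This is why plain accuracy alone can mislead on an imbalanced stroke dataset.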

The dataset has a little over 20 features; feature selection is unnecessary.

3.4. Classification of Stokes using Machine Learning Models – I don’t think the manuscript needs this section. All three methods are well known and considered basic knowledge in the ML field.

The same goes for the Evaluation Method section: the correlation coefficient and performance metrics are common knowledge from statistics textbooks.

I don’t think that Table 3 provides a fair comparison, since it was done for different datasets (e.g. you can’t compare image and EHR models). Also, the authors used an oversampling technique that could lead to inflated performance values.
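
One common route to such inflation is oversampling before the train/test split, so that copies of the same minority sample land on both sides. This is offered as an illustration only, not as a claim about what the authors did. The sketch below uses hypothetical helpers (`oversample_minority`, `every_fifth_split`) and deterministic toy data so the effect is easy to trace. A real pipeline would shuffle and stratify, and might use SMOTE or ADASYN rather than plain duplication.

```python
def oversample_minority(indices, labels):
    """Repeat minority-class (label 1) indices in a fixed cycle until both
    classes have the same number of entries."""
    minority = [i for i in indices if labels[i] == 1]
    majority = [i for i in indices if labels[i] == 0]
    extra = [minority[j % len(minority)]
             for j in range(len(majority) - len(minority))]
    return list(indices) + extra


def every_fifth_split(items):
    """Deterministic 80/20 split: every fifth position goes to the test set."""
    train = [x for pos, x in enumerate(items) if pos % 5 != 0]
    test = [x for pos, x in enumerate(items) if pos % 5 == 0]
    return train, test


# Hypothetical toy data: samples 0-3 are stroke cases, 4-99 are controls.
labels = [1] * 4 + [0] * 96
all_idx = list(range(100))

# Leaky order: oversample first, then split. Duplicates straddle the split.
leaky_train, leaky_test = every_fifth_split(oversample_minority(all_idx, labels))
leaked = sorted(set(leaky_train) & set(leaky_test))

# Safer order: split first, then oversample the training part only.
train, test = every_fifth_split(all_idx)
balanced_train = oversample_minority(train, labels)
clean_overlap = sorted(set(balanced_train) & set(test))

print("samples in both train and test (oversample, then split):", leaked)
print("samples in both train and test (split, then oversample):", clean_overlap)
```

In the leaky order all four stroke samples appear in both sets, so the test score partly measures memorization. In the safer order the overlap is empty and the test set keeps the original class distribution.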

All figures are missing.

Author Response

Please see the attachment. 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

This paper can be accepted now.

Comments on the Quality of English Language

This paper can be accepted now.

Author Response

There are no review comments, so I have no attachment. Thanks.
