Article
Peer-Review Record

Design and Implementation of an Atrial Fibrillation Detection Algorithm on the ARM Cortex-M4 Microcontroller

Sensors 2023, 23(17), 7521; https://doi.org/10.3390/s23177521
by Marek Żyliński *, Amir Nassibi and Danilo P. Mandic
Reviewer 1:
Reviewer 2:
Reviewer 3:
Reviewer 4:
Submission received: 21 July 2023 / Revised: 25 August 2023 / Accepted: 26 August 2023 / Published: 30 August 2023

Round 1

Reviewer 1 Report

Additional testing and analysis may clarify the robustness of your work more sufficiently 

Revision of your writing is needed 

Author Response

Comment #1:

Additional testing and analysis may clarify the robustness of your work more sufficiently.

Comment #2:

Revision of your writing is needed.

 

Authors’ answer to comment #1 and #2:

The authors would like to thank the reviewer for raising this important point regarding additional testing and analysis. During the revision, we improved both the Methodology and Results sections, and we hope that the robustness of our work is now evident. Furthermore, we have revised the writing throughout the paper, in accordance with the reviewer's request.

 

 

Reviewer 2 Report

The manuscript presents an initial report on migrating AF detection to a microcontroller that could be used on an ECG recording device and thus eliminate the need to transmit raw ECG to the cloud for AF detection. Although performed on pre-measured and annotated ECG, the results demonstrate the viability of the approach.

The work is described in a clear manner and I find only minor problems, which I list below:

SVM should be briefly introduced (and the acronym explained), as should Naive Bayes. The machine learning approaches should not be described in detail, but a short description with adequate references is needed.

The explanation of how the timing of the code is performed could be made simpler to improve readability.

- The pseudo code that demonstrates positioning of timing calls (Algorithm 1) is simple enough (trivial) for it to be replaced by a simple description (the existing accompanying description is already fine).

- No need to specify timer ticks in Table 1, they bear no additional information.

- Execution time for 1 recording does not seem like a good measure, since recordings are supposedly of different lengths (as specified in Method section). Perhaps a more informative measure can be constructed, e.g. execution time for classification of 1 s of ECG recording?

Discussion overlaps the topics that should be moved to Introduction, such as other approaches including CNNs, optimizations, communication with Cloud as an alternative to edge-computing etc. Overall the discussion part is far too long, it should only focus on the discussion of the methodology and the results, not related work.

There is clearly an error in the following statement, the value should probably be in seconds: "Transferring such a substantial amount of data via Bluetooth Low Energy would require at least 0.024 ms". There is also an error in the presented reasoning. If the proposed algorithms only deal with RR intervals, then those are implicitly available to the microcontroller and only those should be transferred up the computational hierarchy for further processing, not the whole raw ECG.

I see some problems also with the methodology:

- Statistics on the RR intervals are used as features, but RR intervals are not a part of the raw input - in a real-life use case, the microcontroller on the ECG recorder must deduce RR peaks first and RR intervals later by itself. This task might not be trivial (depends on the level of noise on the input) and errors in detection will occur. While I believe discussing that part in depth would be off topic for the presented manuscript, some discussion is required. For example, it matters a great deal how the reference articles deal with ECG [at least references 18-21] - do they take RR as input or calculate it themselves, do they require it at all? Are there alternatives that work on raw ECG? Please discuss.

- Statistics on RR might not be a trustworthy feature on short measurements. Measurements that are 6 seconds long can contain less than 10 R peaks. Moreover, the statistics on measurements of different length are not exactly comparable, the worst being pNN50. Please discuss.

- There is an inherent error in using manually determined R wave peaks as input. Machine labeled data will behave differently, usually a regular ECGs will be labeled significantly better than the irregular ECGs. While the presented approach is correct for testing the algorithms (mainly their execution times), the reported accuracy of the algorithms will be quite different in real-life use case and their ranking by accuracy might be affected.

It should be pointed out that the execution time comparison is made between a compiled native (C) implementation on Cortex M4 and partly interpreted implementation (python) on the rest of the devices. The latter could be (significantly) optimized if required.

Cabernets plus application was unknown to me, I had to google it to find out it is probably Carnets Plus misspelled. There should be a reference to all the applications mentioned in the manuscript just to make the methods unambiguous.

The linked GitHub repository does not seem to be public.

There are minor mistakes present throughout the article, please re-check your spelling; e.g.: "a execution time".

Author Response

Comment #1:

The work is described in a clear manner and I find only minor problems, which I list below:

- SVM should be briefly introduced (and the acronym explained), as should Naive Bayes. The machine learning approaches should not be described in detail, but a short description with adequate references is needed.

Authors’ answer to comment #1:

 

The authors would like to sincerely thank the reviewer for the positive opinion on the paper and for pointing out this issue. To address it, we added a short introduction to both methods on page 3.

 

Comment #2 and #3:

-The explanation of how the timing of the code is performed could be made simpler to improve readability. 

- The pseudo code that demonstrates positioning of timing calls (Algorithm 1) is simple enough (trivial) for it to be replaced by a simple description (the existing accompanying description is already fine). 

Authors’ answer to comment #2 and #3:

 

In response to the concerns raised by the reviewer, we have removed the pseudocode. This adjustment enhances simplicity and improves overall readability.

 

Comment #4:

- No need to specify timer ticks in Table 1, they bear no additional information.

Authors’ answer to comment #4:

We have removed the column containing the number of timer ticks from Table 1. We agree that it carried the same information as the adjacent column giving the execution time in seconds.

Comment #5:

- Execution time for 1 recording does not seem like a good measure, since recordings are supposedly of different lengths (as specified in Method section). Perhaps a more informative measure can be constructed, e.g. execution time for classification of 1 s of ECG recording?

Authors’ answer to comment #5:

The authors appreciate these concerns. It should be noted that the classification was conducted using a fixed set of features, which remains constant regardless of the recording duration. In this context, the execution time for a single recording is an appropriate measure, because the classification time is not expected to depend on the length of the recording. The proposed metric, execution time per second of ECG recording, is, however, more appropriate for the feature-calculation stage.
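For illustration, a minimal sketch of how such a per-second metric could be obtained on a Cortex-M4 is given below. It uses the CMSIS DWT cycle counter as the time base; the core clock, sampling rate, and the compute_features() routine are assumptions made for this sketch, not the paper's actual instrumentation:

    /* Requires a device CMSIS header (e.g. "stm32f4xx.h") for DWT/CoreDebug. */
    #include <stdint.h>

    #define CPU_HZ 168000000u  /* assumed core clock */
    #define FS_HZ  300u        /* assumed ECG sampling rate */

    extern void compute_features(const int16_t *ecg, uint32_t n_samples);

    float time_per_ecg_second(const int16_t *ecg, uint32_t n_samples)
    {
        CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable trace unit */
        DWT->CYCCNT = 0;
        DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;            /* start cycle count */

        uint32_t start  = DWT->CYCCNT;
        compute_features(ecg, n_samples);                /* timed section */
        uint32_t cycles = DWT->CYCCNT - start;

        float exec_s = (float)cycles / (float)CPU_HZ;    /* execution time (s) */
        float ecg_s  = (float)n_samples / (float)FS_HZ;  /* signal length (s) */
        return exec_s / ecg_s;  /* execution time per second of ECG */
    }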

Comment #6:

Discussion overlaps the topics that should be moved to Introduction, such as other approaches including CNNs, optimizations, communication with Cloud as an alternative to edge-computing etc. Overall the discussion part is far too long, it should only focus on the discussion of the methodology and the results, not related work.

Authors’ answer to comment #6:

In response to the reviewer's concerns, we have made the discussion section more concise, with a primary focus on our results. Passages of a more general nature, such as the discussion of potential model improvements, have been relocated to another part of the paper.

 

Comment #7:

There is clearly an error in the following statement, the value should probably be in seconds: “Transferring such a substantial amount of data via Bluetooth Low Energy would require at least 0.024 ms”. There is also an error in the presented reasoning. If the proposed algorithms only deal with RR intervals, then those are implicitly available to the microcontroller and only those should be transferred up the computational hierarchy for further processing, not the whole raw ECG.

Authors’ answer to comment #7:

 

We acknowledge the presence of an error. The correct transfer time for the data is 24 ms.

 

It is important to note that this example is not intended as a comprehensive system analysis but rather a simplified illustration that may contain certain limitations. The purpose of this example is to highlight the benefits of edge computing, which encompass power efficiency, reduced costs linked to data transfer, and enhanced user privacy.
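For the record, the underlying estimate has the simple form transfer time = payload bits / effective throughput. The sketch below shows the calculation with purely illustrative numbers; the sampling rate, recording length, resolution, and BLE throughput are assumptions, not the paper's parameters:

    #include <stdio.h>

    int main(void)
    {
        const double fs_hz       = 300.0;  /* assumed ECG sampling rate */
        const double duration_s  = 30.0;   /* assumed recording length */
        const double bits_sample = 16.0;   /* assumed ADC resolution */
        const double ble_bps     = 1.0e6;  /* assumed effective BLE throughput */

        double bits = fs_hz * duration_s * bits_sample;  /* payload size */
        double t_ms = 1e3 * bits / ble_bps;              /* transfer time */
        printf("raw ECG payload: %.0f bits, transfer time >= %.1f ms\n", bits, t_ms);
        return 0;
    }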

 

Comment #8:

- Statistics on the RR intervals are used as features, but RR intervals are not a part of the raw input - in a real-life use case, the microcontroller on the ECG recorder must deduce RR peaks first and RR intervals later by itself. This task might not be trivial (depends on the level of noise on the input) and errors in detection will occur. While I believe discussing that part in depth would be off topic for the presented manuscript, some discussion is required. For example, it matters a great deal how the reference articles deal with ECG [at least references 18-21] - do they take RR as input or calculate it themselves, do they require it at all? Are there alternatives that work on raw ECG? Please discuss.

Authors’ answer to comment #8:

The authors would like to note that various approaches have been employed across different studies:

  • Tuboly et al. start from the raw ECG signal and use the Christov detector for R-peak detection.
  • Tison et al. use heart rate and step-count data obtained from Apple Watches as input.
  • Tateno and Glass take RR intervals as input; for each beat, they determine density histograms of RR and ΔRR over 100-beat segments centred on each beat.
  • Petmezas et al. use a CNN-LSTM network whose first 1-D convolutional layer takes the raw ECG signal as input.

We have included information regarding the methodologies utilized in the cited papers.

Indeed, it is feasible to perform AF classification based on the raw signal. Convolutional neural networks (CNNs) can process raw ECG signals and autonomously extract a unique set of features. This approach, as demonstrated by Petmezas et al., yielded the highest accuracy. However, it is worth noting that it requires a larger dataset for proper training than a machine-learning approach based on selected features.

 

Comment #9:

- Statistics on RR might not be a trustworthy feature on short measurements. Measurements that are 6 seconds long can contain less than 10 R peaks. Moreover, the statistics on measurements of different length are not exactly comparable, the worst being pNN50. Please discuss. 

Authors’ answer to comment #9:

pNN50 is the number of successive RR-interval differences exceeding 50 ms in a recording. It is a widely used measure of heart rate variability; a difference of 50 ms corresponds to a change between successive RR intervals of more than 5% (at a heart rate of 60 beats per minute). Usually, this happens when an extra beat occurs, which is a characteristic phenomenon in AF. We agree that this feature is not comparable across recordings of different lengths: the lack of standardization results in a higher pNN50 for longer recordings. However, for normal rhythms, pNN50 is usually equal to 0.

The main issue with a short recording is that it may not capture an AF episode, thereby rendering the entire analysis inconclusive.
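A short sketch of the computation may make the normalization point concrete. The count form below follows the description above, while the percentage form divides by the number of successive pairs and thereby removes the dependence on recording length:

    #include <math.h>
    #include <stddef.h>

    /* Number of successive RR-interval differences exceeding 50 ms. */
    unsigned nn50_count(const double *rr_ms, size_t n)
    {
        unsigned count = 0;
        for (size_t i = 1; i < n; i++)
            if (fabs(rr_ms[i] - rr_ms[i - 1]) > 50.0)
                count++;
        return count;
    }

    /* Length-normalized form: percentage of successive pairs exceeding 50 ms. */
    double pnn50_percent(const double *rr_ms, size_t n)
    {
        return (n > 1) ? 100.0 * nn50_count(rr_ms, n) / (double)(n - 1) : 0.0;
    }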

Comment #10:

- There is an inherent error in using manually determined R wave peaks as input. Machine labeled data will behave differently, usually a regular ECGs will be labeled significantly better than the irregular ECGs. While the presented approach is correct for testing the algorithms (mainly their execution times), the reported accuracy of the algorithms will be quite different in real-life use case and their ranking by accuracy might be affected.

Authors’ answer to comment #10:

We agree with this statement. Moreover, different R-peak detection algorithms may yield different results in real-life scenarios. Evaluating R-peak detection algorithms is an interesting topic for future work, but it is beyond the scope of this paper.

Comment #11:

It should be pointed out that the execution time comparison is made between a compiled native (C) implementation on Cortex M4 and partly interpreted implementation (python) on the rest of the devices. The latter could be (significantly) optimized if required.

Authors’ answer to comment #11:

We have highlighted this in the Results section of the revised paper.

Comment #12:

Cabernets plus application was unknown to me, I had to google it to find out it is probably Carnets Plus misspelled. There should be a reference to all the applications mentioned in the manuscript just to make the methods unambiguous.

Authors’ answer to comment #12:

The authors express their gratitude to the reviewer for bringing this matter to our attention. We agree with the reviewer's observation that the application's name is Carnets Plus. We have rectified this and included a link to the application in the Apple App Store.

Comment #13:

The linked GitHub repository does not seem to be public.

Authors’ answer to comment #13:

We have updated the repository; it is now publicly accessible.

Once again, we extend our appreciation for the thorough revision of our work.

Reviewer 3 Report

The work is good. However, I have some severe concerns to resolve before acceptance.

1-Abstract would be enhanced by adding performance measure scores. Also, please highlight your findings.

2- The introduction section is too large and needs to be concise.

3-please highlight your contributions in the introduction section.

4-please proofread the manuscript. I found many typos and grammatical errors. There is a need for extensive English revision.

5-Why do you use the existing dataset? Provide justification.

6- The proposed study lacks a literature work section. How did you come up with the idea without doing a related literature analysis? 

7-Provide a step-by-step workflow diagram in the METHODS section that would be better to understand the proposed methodology.

8-The proposed methods/models need to be described in the manuscript.

9-The manuscript is missing the future work section.

Minor editing of English language required

Author Response

Comment #1:

The work is good. However, I have some severe concerns to resolve before acceptance.

1-Abstract would be enhanced by adding performance measure scores. Also, please highlight your findings.

Authors’ answer to comment #1:

The authors would like to sincerely thank the reviewer for reviewing our paper. We are glad that you appreciate our work.

To address the reviewer's concerns, we now provide in the abstract the accuracy, sensitivity, specificity, and execution time of the SVM classifier with an RBF kernel evaluated on the ARM Cortex-M4 microcontroller. We have also added highlights of the contributions of our work in the Introduction section.

Comment #2:

2- The introduction section is too large and needs to be concise.

Authors’ answer to comment #2:

We have shortened the Introduction section and relocated paragraphs related to edge computing to another section.

Comment #3:

3-please highlight your contributions in the introduction section.

Authors’ answer to comment #3:

In response to the reviewer's concerns, we have incorporated a paragraph that highlights the contributions of the paper at the end of the Introduction section.

Comment #4:

4-please proofread the manuscript. I found many typos and grammatical errors. There is a need for extensive English revision.

Authors’ answer to comment #4:

In response to the reviewer's comment, we conducted a thorough English revision.

Comment #5:

5-Why do you use the existing dataset? Provide justification.

Authors’ answer to comment #5:

The use of an existing dataset has many advantages and is well suited to the evaluation of methods. It allows a direct comparison of different methods, and the dataset we used is publicly available to everyone. Using a public dataset also does not require conducting a complicated clinical study. The dataset employed comprises over 6,000 recordings; assembling such a large trial would involve substantial time, expense, and other complexities. Naturally, conducting a clinical trial is an integral part of developing clinical methodologies, but it is not necessary for establishing a proof of concept.

Comment #6:

6- The proposed study lacks a literature work section. How did you come up with the idea without doing a related literature analysis? 

Authors’ answer to comment #6:

The authors would like to note that this work represents a continuation of our prior research, which involved participation in the PhysioNet 2020 challenge. In the current paper, we have referenced our previous work: Zylinski, M.; Cybulski, G. "Selected features for classification of 12-lead ECGs." In Proceedings of the IEEE Computing in Cardiology, 2020, Vol. 47, pp. 1–4. During that study, one of the authors conducted a literature review.

It is important to highlight that AF classification based on heart rate variability is not a novel concept. To further support this notion, we have included an additional reference to a book chapter that provides a detailed description of the features utilized in our study.

Comment #7:

7-Provide a step-by-step workflow diagram in the METHODS section that would be better to understand the proposed methodology.

Authors’ answer to comment #7:

In response to the reviewer's concerns, we have included a workflow diagram in the methods section.

Comment #8:

8-The proposed methods/models need to be described in the manuscript.

Authors’ answer to comment #8:

We have included a description of the support vector machine and naïve Bayes classifiers that were used.
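As a generic illustration of what such a classifier reduces to at run time on a microcontroller, the sketch below evaluates the RBF-kernel SVM decision function f(x) = sum_i alpha_i * y_i * exp(-gamma * ||x - sv_i||^2) + b. The support vectors, coefficients, gamma, and bias are placeholders, not the trained model from the paper:

    #include <math.h>
    #include <stddef.h>

    #define N_SV   3         /* illustrative number of support vectors */
    #define N_FEAT 2         /* illustrative feature count */
    #define GAMMA  0.5f      /* assumed RBF width parameter */
    #define BIAS   (-0.1f)   /* assumed bias term */

    static const float sv[N_SV][N_FEAT] = {{0.1f, 0.9f}, {0.4f, 0.2f}, {0.8f, 0.7f}};
    static const float dual_coef[N_SV]  = {0.7f, -1.0f, 0.3f};  /* alpha_i * y_i */

    /* Sign convention assumed: +1 for AF, -1 for non-AF. */
    int svm_rbf_predict(const float x[N_FEAT])
    {
        float f = BIAS;
        for (size_t i = 0; i < N_SV; i++) {
            float d2 = 0.0f;
            for (size_t j = 0; j < N_FEAT; j++) {
                float d = x[j] - sv[i][j];
                d2 += d * d;
            }
            f += dual_coef[i] * expf(-GAMMA * d2);  /* alpha_i*y_i*K(x, sv_i) */
        }
        return (f >= 0.0f) ? 1 : -1;
    }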

Comment #9:

9-The manuscript is missing the future work section.

Authors’ answer to comment #9:

We have now included a "Future Work" section, where we outline our intention to explore the application of more sophisticated models, such as neural networks, on edge devices. Additionally, we aim to integrate the classifiers with online ECG signal analysis conducted on microcontrollers.

Reviewer 4 Report

The work reports about an AF detection classifier. The MS is not complete.

a) Introduction has to be more succinct.

b) What is RR? It has not been defined.

c) The confusion matrices are not explained properly.

d) There are many parameters authors have covered such as normal, AF and others. Out of these, how the AF is incorporated and subsequent analysis are done is not evident from the work.

e) Authors are contradicting their own statement. At one point, they mention that comparison cannot be useful as the implementations are on different datasets. Then, what is the point of Table 4?

f) Linkage between paragraphs needs to be rational, and more interpretation is necessary.

 

Minor editing is required.

Author Response

Comment #1:

The work reports about an AF detection classifier. The MS is not complete.

a) Introduction has to be more succinct.

Authors’ answer to comment #1:

The authors would like to extend sincere gratitude to the reviewer for their valuable comments. While revising the paper, we enhanced the Methods section and incorporated additional explanations regarding the methodology employed. Additionally, we have condensed the Introduction section.

Comment #2:

b) What is RR? It has not been defined.

Authors’ answer to comment #2:

RR is the time elapsed between two successive R peaks in an ECG signal. This metric serves as a standard measure for evaluating ECG recordings. In response to the reviewer's comment, we have incorporated a definition of this interval.
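For concreteness, a minimal sketch of how RR intervals follow from R-peak positions is given below. The sampling rate is an assumption, and in a deployed system the peak indices would come from an upstream R-peak detector:

    #include <stddef.h>
    #include <stdint.h>

    #define FS_HZ 300.0  /* assumed ECG sampling rate */

    /* Writes n_peaks - 1 RR intervals (ms) into rr_ms; returns the count. */
    size_t rr_from_peaks(const uint32_t *peak_idx, size_t n_peaks, double *rr_ms)
    {
        for (size_t i = 1; i < n_peaks; i++)
            rr_ms[i - 1] = (peak_idx[i] - peak_idx[i - 1]) * 1000.0 / FS_HZ;
        return (n_peaks > 0) ? n_peaks - 1 : 0;
    }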

 

Comment #3:

c) The confusion matrices are not explained properly.

Authors’ answer to comment #3:

Confusion matrices are a standard representation of classifier performance. We employed them to present the results of the tested classifiers, and we hold the view that further elaboration on the confusion matrix is unnecessary. This output was used both for the comparison of the tested classifiers and for the discussion of sensitivity issues within atrial fibrillation classifiers.
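For readers less familiar with the convention, the sketch below shows how sensitivity and specificity are read off a binary confusion matrix, with AF taken as the positive class; the counts are illustrative, not the paper's results:

    #include <stdio.h>

    int main(void)
    {
        /* rows: true class, columns: predicted class; index 0 = AF, 1 = non-AF */
        const double cm[2][2] = {{ 90.0, 10.0 },   /* TP, FN */
                                 {  5.0, 95.0 }};  /* FP, TN */

        double sens = cm[0][0] / (cm[0][0] + cm[0][1]);  /* TP / (TP + FN) */
        double spec = cm[1][1] / (cm[1][0] + cm[1][1]);  /* TN / (FP + TN) */
        printf("sensitivity = %.1f%%, specificity = %.1f%%\n",
               100.0 * sens, 100.0 * spec);
        return 0;
    }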

Comment #4:

d) There are many parameters authors have covered such as normal, AF and others. Out of these, how the AF is incorporated and subsequent analysis are done is not evident from the work.

Authors’ answer to comment #4:

The authors would like to thank the reviewer for their comment. It is important to note that we utilized an open dataset containing atrial fibrillation annotations provided by clinicians; the authors did not introduce atrial fibrillation artificially. We have revised the Methods section and added a step-by-step workflow diagram that provides information on the subsequent analysis performed in our work.

Comment #5:

e) Authors are contradicting their own statement. At one point, they mention that comparison cannot be useful as the implementations are on different datasets. Then, what is the point of Table 4?

Authors’ answer to comment #5:

Table 4 presents the sensitivity and specificity of various atrial fibrillation classifiers as mentioned in the literature. The table demonstrates the alignment of our classifier's accuracy with that reported in the literature. Overall, atrial fibrillation classifiers exhibit strong sensitivity and specificity levels (>90%).

 

We have indicated that conducting a direct comparison of classifiers using this table is not feasible. This limitation arises due to the utilization of different datasets in each paper, rendering these scores unsuitable as a universal benchmark for evaluating classifiers. Our intention is to prevent any erroneous implications that might arise from interpreting this table, suggesting superiority of one classifier over another.

 

Comment #6:

f) Linkage between paragraphs needs to be rational, and more interpretation is necessary.

Authors’ answer to comment #6:

To address the reviewer's concerns, we have extensively revised the paper, which improves the linkage between paragraphs and the substance of the paper.

Round 2

Reviewer 3 Report

All comments addressed and so accepted.

Author Response

Thank you for accepting our response.

Reviewer 4 Report

It can be accepted.

Minor language editing is required.

Author Response

Thank you for accepting our response.
