Peer-Review Record

OneBitPitch (OBP): Ultra-High-Speed Pitch Detection Algorithm Based on One-Bit Quantization and Modified Autocorrelation

Appl. Sci. 2023, 13(14), 8191; https://doi.org/10.3390/app13148191
by Davide Coccoluto, Valerio Cesarini and Giovanni Costantini *
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 5:
Submission received: 4 June 2023 / Revised: 10 July 2023 / Accepted: 11 July 2023 / Published: 14 July 2023
(This article belongs to the Special Issue New Advances in Audio Signal Processing)

Round 1

Reviewer 1 Report

The article proposed by the authors deals with signal processing analysis in acoustics. The authors present a novel algorithm for ultra-fast pitch detection in real-time applications, based on a modified autocorrelation computed on a single-bit signal. The article includes a comparison of the algorithm with some of the most effective existing techniques, which is undoubtedly an advantage of this paper.

The authors treat the presented issues quite broadly, including an analysis of the state of the art, and the list of cited sources is almost sufficient. Unfortunately, the article lacks a broader indication of topics for future research.

In my opinion, a weakness of this paper is the shortage of recent bibliographic sources: only 28% of the references are less than 3 years old. Supplementing them would certainly increase the value of the paper. The authors use an autocorrelation function.

A much more recent article on the autocorrelation function could also be cited in this context. The analysis of the state of the art would be worth expanding to include the following topics:

- Properties of selected frequency estimation algorithms in accurate sinusoidal voltage measurements,

- Probabilistic Properties of Deterministic and Randomized Quantizers.

Moreover, the summary of the state of the art is vague, to say the least.

In my opinion, the paper requires proofreading by a native English speaker, because in places there are simple errors such as "one of the signal". The paper fits the theme of the Applied Sciences journal perfectly, is remarkably interesting, and I have no fundamental objections to it.

Overall, the work is well conceived. In my opinion, it can be accepted for publication after all the specified errors are corrected.

Additionally, the authors made some small mistakes, hopefully only typographical ones, which they absolutely must correct.

Strengths

The undoubted advantage of this paper is the excellent graphic form, especially the legibility of the charts.

Noticed errors/remarks

  • Line 51 contains a "wholesale" list of literature references without even a brief characterization of each. This is not good practice for more than 2 or 3 references. Additionally, several sources cited together should be placed in one square bracket, e.g., not [6]-[9] but [6-9]. This applies to the entire paper.

Small errors

  • Line 122: is "96ksps", should be "96 ksps". This applies to the entire paper.

  • Line 143: is “one of the signal”, should be “one of the signals”.

  • Lines 132 to 145: some sentences end with a period "." and some with a semicolon ";". This should be standardized; here, a bulleted list would be even more appropriate. Correcting this will improve the readability of the paper.

  • Line 173: several sources cited together should be placed in one square bracket, e.g., is [19]–[21], should be [19–21].

  • Line 200: the format of variables differs between the text and the formulas (see Equation 3). Variables should be presented in italics, and this should be standardized throughout the paper.

  • Line 260: is “functio”, should be “function”.

Author Response

RESPONSE TO REVIEWER #1

 

The article proposed by the authors deals with signal processing analysis in acoustics. The authors present a novel algorithm for ultra-fast pitch detection in real-time applications, based on a modified autocorrelation computed on a single-bit signal. The article includes a comparison of the algorithm with some of the most effective existing techniques, which is undoubtedly an advantage of this paper.

The authors treat the presented issues quite broadly, including an analysis of the state of the art, and the list of cited sources is almost sufficient. Unfortunately, the article lacks a broader indication of topics for future research.

In my opinion, a weakness of this paper is the shortage of recent bibliographic sources: only 28% of the references are less than 3 years old. Supplementing them would certainly increase the value of the paper. The authors use an autocorrelation function.

A much more recent article on the autocorrelation function could also be cited in this context. The analysis of the state of the art would be worth expanding to include the following topics:

- Properties of selected frequency estimation algorithms in accurate sinusoidal voltage measurements,

- Probabilistic Properties of Deterministic and Randomized Quantizers.

Moreover, the summary of the state of the art is vague, to say the least.

 

Thank you very much for taking the time to evaluate our paper.

We agree that the paper fell a bit short on references, so we thoroughly expanded the whole Introduction section with more relevant and up-to-date citations.

We found the topics you mentioned very interesting and decided to add them to the paper: thank you for this.

We also explored some relevant alternative approaches to pitch detection, with references from 2020, analysing in particular the methodologies and techniques used by other researchers for real-time estimation and comparing some of their performance results with ours. We also strived to better state the nature of the problem and the way others have addressed it, although the field of pitch detection is very diverse, lacks a specific standard and is highly application-specific.

 

In my opinion, the paper requires proofreading by a native English speaker, because in places there are simple errors such as "one of the signal". The paper fits the theme of the Applied Sciences journal perfectly, is remarkably interesting, and I have no fundamental objections to it.

Overall, the work is well conceived. In my opinion, it can be accepted for publication after all the specified errors are corrected.

Additionally, the authors made some small mistakes, hopefully only typographical ones, which they absolutely must correct.

Strengths

The undoubted advantage of this paper is the excellent graphic form, especially the legibility of the charts.

Noticed errors/remarks

  • Line 51 contains a "wholesale" list of literature references without even a brief characterization of each. This is not good practice for more than 2 or 3 references. Additionally, several sources cited together should be placed in one square bracket, e.g., not [6]-[9] but [6-9]. This applies to the entire paper.

 

Thank you for your kind words and for the time spent evaluating our work. We revised the English of the whole paper, correcting all minor spelling mistakes and typos, and we thank you for pointing them out.

We agree regarding citations [6–9], and we reorganized them so that each one is clearly presented after the category it refers to (e.g., “phonatory” diseases).

 

Small errors

  • Line 122: is "96ksps", should be "96 ksps". This applies to the entire paper.
  • Line 143: is “one of the signal”, should be “one of the signals”.

 

This was not actually an error, as we were referring to “the one of the signal”, meaning that the SNR is measured with respect to that specific signal. However, we agree that the sentence was poorly worded: to improve readability, we changed it to “…after measuring the power of the signal with added harmonics/partials”.
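The SNR convention described in this response, where the reference power is measured on the signal after its harmonics/partials have been added, can be sketched as follows. This is a hypothetical stdlib-Python illustration, not the authors' MATLAB dataset generator: the 1/k partial amplitudes, the number of partials, the use of white Gaussian noise and all helper names are assumptions.

```python
import math
import random

def harmonic_tone(f0, fs, n_samples, n_partials=5):
    """Fundamental plus harmonics with 1/k amplitudes (illustrative choice)."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / fs) / k for k in range(1, n_partials + 1))
        for n in range(n_samples)
    ]

def mean_power(x):
    """Mean power (average squared value) of a sequence."""
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so that 10*log10(P_signal / P_noise) = snr_db.

    P_signal is measured on the complete signal, i.e. after the partials
    have been added, not on the fundamental alone.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_power = mean_power(signal) / 10 ** (snr_db / 10)
    scale = math.sqrt(target_power / mean_power(noise))
    noise = [scale * v for v in noise]
    return [s + v for s, v in zip(signal, noise)], noise

# Hypothetical example: a 220 Hz tone at 96 ksps, 50 ms long, at 10 dB SNR.
tone = harmonic_tone(220.0, 96000, 4800)
noisy, noise = add_noise_at_snr(tone, snr_db=10.0)
print(10 * math.log10(mean_power(tone) / mean_power(noise)))
```

Measuring the reference power on the fundamental alone would give a lower P_signal, and hence less added noise for the same nominal SNR; that is why the reference signal has to be stated explicitly.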


  • Lines 132 to 145: some sentences end with a period "." and some with a semicolon ";". This should be standardized; here, a bulleted list would be even more appropriate. Correcting this will improve the readability of the paper.

 

Thank you for the suggestion: we changed it to a bulleted list ending with semicolons.

 

  • Line 173: several sources cited together should be placed in one square bracket, e.g., is [19]–[21], should be [19–21].
  • Line 200: the format of variables differs between the text and the formulas (see Equation 3). Variables should be presented in italics, and this should be standardized throughout the paper.
  • Line 260: is “functio”, should be “function”.

 

We also corrected the formatting of all remaining citations, while improving the reference list itself by adding new and more recent works. Thank you again for helping us notice the spelling mistakes: we corrected them all.

Reviewer 2 Report

The topic of the paper is very interesting, and my comments are as follows:

(1)   The theoretical contributions should be stressed in detail in the Introduction.

(2)   Please check carefully all notations and equations. Moreover, the usage of English should be improved.

(3)   Advantages of the proposed algorithm over the well-known algorithms should be stressed. I suggest comparing the simulations with the results of recent, relevant and valid references.

(4)   In the Introduction, it is not enough to state the current work. This section should be expanded and restructured: the motivation, the main difficulties, the main work and the improvements compared with previous related works should be emphasized in it.

(5)   The importance of the problem considered in this paper should be further addressed.

(6)   The types of software employed for solving the problem and also simulation experiments should be stated clearly.

(7)   Directions for extending and improving the work should be added as a future-recommendations section after the Conclusions section.

Author Response

RESPONSE TO REVIEWER #2

The topic of the paper is very interesting, and my comments are as follows:

  • The theoretical contributions should be stressed in detail in the Introduction.

Thank you for the suggestion: at the end of the Introduction section, we added a short but comprehensive summary of all the main theoretical contributions, which we report here.

The main contributions of this paper lie in the presentation of a novel pitch detection algorithm, based on a partially unexplored approach focused on high-speed and low bit depth, very suitable for hardware implementations. All of these characteristics make it a good candidate for live performance applications or MIDI instruments, which rely on real-time note detection. The mathematical and signal processing theories behind our novel algorithm explore the characteristics of the autocorrelation function, its maximization and its approximations, as well as the effect of quantization on the fundamental frequency of a signal.

Along with the new algorithm, a testing paradigm for evaluating the speed and computational complexity of pitch detection algorithms is proposed, and a custom, synthetic dataset is produced and made available to the public. State-of-the-art, pre-existing pitch detection algorithms, especially those focusing on speed, are thoroughly explored and tested.

  • Please check carefully all notations and equations. Moreover, the usage of English should be improved.

Thank you for pointing this out: we strived to correct all major and minor spelling mistakes and typos, and we reviewed the whole paper to improve the English. We also corrected inconsistencies in notations and equations, as well as in acronyms.

(3)   Advantages of the proposed algorithm over the well-known algorithms should be stressed. I suggest comparing the simulations with the results of recent, relevant and valid references.

The advantages of the proposed algorithm lie mainly in its speed (and inherently low computational complexity), its ease of hardware implementation and its good speed vs. accuracy compromise. These can be shown numerically against other notable algorithms through the tests we performed. However, tests reported elsewhere are performed on different datasets, so their results are not directly comparable to ours. In the “Future Improvements” section, which we added to the Conclusion as per your suggestion, we clearly state that we are working on testing our algorithm in other situations, such as vocal signals or pre-existing datasets.
On the other hand, computational complexity is an exact mathematical metric: from the O(N^2) complexity of the standard autocorrelation function, we arrive at a single-bit XOR (implementing the difference autocorrelation), whose complexity is constant, O(1). To clarify this, we specified it in the “Methods” section describing our proposed OBP algorithm. Moreover, speed performance is directly measured by the provided time benchmarks (TE).
To clarify the picture, and following your suggestions, we also included more recent references detailing the current state of the art for algorithms and applications similar to what we are working on.
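To make the complexity argument above concrete, here is a minimal Python sketch of pitch estimation on a one-bit signal. It is not the authors' OBP algorithm (whose modified autocorrelation is defined in the paper and was implemented in MATLAB); the function names, the 0.1 threshold, the first-dip rule and the 50 to 1000 Hz search range are assumptions made for illustration. It shows only the core idea: after one-bit quantization, every per-sample comparison between the frame and its lagged copy is a single XOR.

```python
import math

def one_bit(x):
    """One-bit quantization: 1 for non-negative samples, 0 for negative ones."""
    return [1 if v >= 0 else 0 for v in x]

def xor_mismatch(bits, lag, window):
    """Fraction of positions where the frame and its lag-shifted copy differ.

    Each per-sample comparison is a single XOR; the result behaves like a
    difference function that dips towards 0 near multiples of the period.
    """
    return sum(bits[n] ^ bits[n + lag] for n in range(window)) / window

def estimate_f0(x, fs, f_min=50.0, f_max=1000.0, threshold=0.1):
    """Hypothetical one-bit pitch estimator (illustrative, not OBP itself)."""
    bits = one_bit(x)
    lag_min = max(1, math.floor(fs / f_max))
    lag_max = math.ceil(fs / f_min)
    window = len(bits) - lag_max
    if window <= 0:
        raise ValueError("frame is too short for the requested f_min")
    scores = [xor_mismatch(bits, lag, window) for lag in range(lag_min, lag_max + 1)]
    for i, s in enumerate(scores):
        if s < threshold:
            # Take the first dip below the threshold and slide to its bottom;
            # this avoids picking a multiple of the period (octave error).
            while i + 1 < len(scores) and scores[i + 1] < scores[i]:
                i += 1
            return fs / (lag_min + i)
    # No dip below the threshold: fall back to the global minimum.
    best = min(range(len(scores)), key=scores.__getitem__)
    return fs / (lag_min + best)

fs = 8000
tone = [math.sin(2 * math.pi * 220.0 * n / fs + 0.3) for n in range(2048)]
print(estimate_f0(tone, fs))
```

Because the lag grid is in whole samples, the estimate is quantized to fs / L; at 8 kHz a 220 Hz tone should come out within a couple of percent of the true value, and higher sampling rates shrink that grid error. As written, the sketch still loops over lags and window samples; with bits packed into machine words, many of these XORs can be evaluated at once, which is where one-bit processing pays off in hardware.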

(4)   In the Introduction, it is not enough to state the current work. This section should be expanded and restructured: the motivation, the main difficulties, the main work and the improvements compared with previous related works should be emphasized in it.

Thank you for noticing this: we thoroughly expanded the whole Introduction section, adding motivations and useful, up-to-date citations.

We also explored some relevant alternative approaches to pitch detection, with references from 2020, analysing in particular the methodologies and techniques used by other researchers for real-time estimation and comparing some of their performance results with ours. We also strived to better state the nature of the problem and the way others have addressed it, although the field of pitch detection is very diverse, lacks a specific standard and is highly application-specific.

(5)   The importance of the problem considered in this paper should be further addressed.

 

Thank you for the suggestion: we added this to the Introduction. Pitch detection is crucial in all applications that rely on knowing the fundamental frequency in order to perform periodicity-related computations, such as acoustic feature extraction based on prosodic metrics (e.g., HNR, jitter or shimmer) that evaluate “cycle-to-cycle” variations. Moreover, professional audio relies on pitch detection to build tuners for real instruments and for real-time pitch-correction applications, especially for vocal tuning. Real-time detectors are thus necessary to enable performers to monitor pitch accuracy and trigger events in real time, especially in MIDI applications. Additionally, fast, real-time pitch detection is valuable in interactive audio applications, such as games and virtual reality, where it enables dynamic audio synthesis and effects as well as responsive processing. […] Detecting the fundamental frequency in real time is crucial whenever live performances are involved, as even minuscule latencies of a few milliseconds can be perceived by the musician or operator.

(6)   The types of software employed for solving the problem and also simulation experiments should be stated clearly.

We used MATLAB for the whole procedure. This is specified at the end of the “Results” section, along with the specifications of the machine used for the simulations (useful for interpreting the TE): all the analyses, data creation, tests, simulations and algorithm implementations were performed in MATLAB® R2023a (MathWorks Inc., Natick, Massachusetts) on a Dell Latitude E5550 computer with an Intel Core i5-5200U processor and 16 GB of dual-channel DDR3 RAM.

(7)   Directions for extending and improving the work should be added as a future-recommendations section after the Conclusions section.

Thank you: we added a whole sub-section called “4.1. Future works”, detailing the state of our current work on improving this algorithm (testing on other datasets) and the future perspectives (alternative implementations and hardware), together with a summarizing bullet list. Since the Applied Sciences template does not provide for sub-sections in the Conclusions, which is supposed to be the last section, we added the Future Works at the end of the Discussion (Section 4).

Reviewer 3 Report

What is the original perspective or outlook of the presented study? The concept is similar to that of other works. What kind of information does the readership of the Journal gain from this study?

The main objectives of this study and novelty point should be clearly discussed in detail.

Recent studies from high-impact-factor journals (see https://www.scimagojr.com/), such as IEEE Transactions or journals published by Springer, MDPI and Elsevier, should be cited in the Introduction or in a related-work section.

All acronyms must be defined, including YIN, SWIPE, NLS, and FPGA in the abstract.

Figs. 2, 5 and 6 need labels in both axes.

A full statistical analysis of the results related to Tables 2–6 must be presented.


English needs improvements.

Author Response

RESPONSE TO REVIEWER #3

What is the original perspective or outlook of the presented study? The concept is similar to that of other works. What kind of information does the readership of the Journal gain from this study?

The main objectives of this study and novelty point should be clearly discussed in detail.

Thank you for your efforts in evaluating our work and for the useful suggestions. We strived to clarify our main contributions, and we therefore expanded the Introduction section. In our paper, we present a novel pitch detection algorithm that is not just an alternative implementation, but the fastest one. Moreover, a dataset has been built and made available to the public, along with a thorough testing pipeline for pitch detectors, and a comparison and overview of some of the fastest alternatives is presented. Citing from our manuscript: “The main contributions of this paper lie in the presentation of a novel pitch detection algorithm, based on a partially unexplored approach focused on high-speed and low bit depth, very suitable for hardware implementations. All of these characteristics make it a good candidate for live performance applications or MIDI instruments, which rely on real-time note detection. The mathematical and signal processing theories behind our novel algorithm explore the characteristics of the autocorrelation function, its maximization and its approximations, as well as the effect of quantization on the fundamental frequency of a signal. Along with the new algorithm, a testing paradigm for evaluating the speed and computational complexity of pitch detection algorithms is proposed, and a custom, synthetic dataset is produced and made available to the public. State-of-the-art, pre-existing pitch detection algorithms, especially those focusing on speed, are thoroughly explored and tested.”

Recent studies from high-impact-factor journals (see https://www.scimagojr.com/), such as IEEE Transactions or journals published by Springer, MDPI and Elsevier, should be cited in the Introduction or in a related-work section.

Thank you for noticing this: we thoroughly expanded the whole Introduction section, adding motivations and useful, up-to-date citations.

We also explored some relevant alternative approaches to pitch detection, with references from 2020, analysing in particular the methodologies and techniques used by other researchers for real-time estimation and comparing some of their performance results with ours. We also strived to better state the nature of the problem and the way others have addressed it, although the field of pitch detection is very diverse, lacks a specific standard and is highly application-specific.

All acronyms must be defined, including YIN, SWIPE, NLS, and FPGA in the abstract.

Thank you for the suggestion: we have now defined the acronyms wherever possible. However, YIN is not an acronym but an arbitrary name, so we had to leave it as is. Please see: http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf

Figs. 2, 5 and 6 need labels in both axes.

Corrected.

A full statistical analysis of the results related to Tables 2–6 must be presented.

We performed it and reported it in the Results section.

 

Reviewer 4 Report

The authors discuss a new method for obtaining the fundamental pitch of a sound signal. The applications are vast and cover many relevant issues, for instance in the music industry. The paper is very clearly written and the results support the authors' conclusions. I definitely recommend its publication in Applied Sciences. I only have minor presentation points, which are listed below:

Minor points: Figures and Tables.
(1) Line 358: It should be 'Figure 6'.
(2) Line 411: It should be 'Figure 7'.
(3) Line 429: (see Table 3 for the numerical values).

Minor points: Text
(1) Line 37: 'predictability'
(2) Line 105: '..version of the'

General comments:
(1) Eq. (2): does it mean that S.D. = 1? Perhaps it does not matter. Please comment.

(2) Explain the range of variability for the lag L in Eq.(3). Must L be smaller than T? Comment please.

(3) Which RAE values are acceptable for obtaining good results in most applications you are interested in? RAE < 0.01, 0.05, 0.1?

See above list.

Author Response

RESPONSE TO REVIEWER #4

The authors discuss a new method for obtaining the fundamental pitch of a sound signal. The applications are vast and cover many relevant issues, for instance in the music industry. The paper is very clearly written and the results support the authors' conclusions. I definitely recommend its publication in Applied Sciences. I only have minor presentation points, which are listed below:

Minor points: Figures and Tables.
(1) Line 358: It should be 'Figure 6'.
(2) Line 411: It should be 'Figure 7'.
(3) Line 429: (see Table 3 for the numerical values).

Minor points: Text
(1) Line 37: 'predictability'
(2) Line 105: '..version of the'

Thank you so much for helping us find these mistakes. We have corrected them.


General comments:
(1) Eq. (2): does it mean that S.D. = 1? Perhaps it does not matter. Please comment.

Equation 2 just refers to the common formula for discrete-sequence autocorrelation. No assumptions are needed.
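For reference, one common (unnormalized) form of the discrete-sequence autocorrelation is shown below; the indexing and normalization of the paper's Eq. (2) may differ, so this is a sketch rather than a transcription. Being a plain sum of products, it needs no assumption on the standard deviation of x. The second line is a standard identity for one-bit signals: mapping bits b[n] in {0, 1} to s[n] = 2b[n] - 1 in {-1, +1}, each product s[n]s[n+L] equals 1 - 2(b[n] XOR b[n+L]), so the autocorrelation reduces to counting XOR mismatches over the N - L overlapping samples.

```latex
r_x[L] = \sum_{n=0}^{N-1-L} x[n]\, x[n+L], \qquad L = 0, 1, \dots, N-1

r_s[L] = \sum_{n=0}^{N-1-L} s[n]\, s[n+L]
       = (N - L) - 2 \sum_{n=0}^{N-1-L} \bigl( b[n] \oplus b[n+L] \bigr)
```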

(2) Explain the range of variability for the lag L in Eq.(3). Must L be smaller than T? Comment please.

The lag is the independent variable as a function of which the autocorrelation is computed. Since we are dealing with digital signals (discrete sequences), its unit is samples. We clarified this in the text.
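As a sketch of how the lag range is typically bounded in practice (an assumption, since the exact limits used in the paper are not stated here): a lag of L samples corresponds to a candidate fundamental f0 = fs / L, so the highest pitch of interest fixes the smallest lag and the lowest pitch the largest one, and the largest lag must stay below the frame length so that x[n + L] exists. The function name and the 50 to 2000 Hz range are hypothetical.

```python
import math

def lag_range(fs, f_min, f_max, frame_len):
    """Return (lag_min, lag_max) in samples for an f0 search in [f_min, f_max] Hz.

    A lag of L samples corresponds to a candidate f0 = fs / L, so the highest
    frequency gives the smallest lag and the lowest frequency the largest one.
    """
    if not 0 < f_min < f_max:
        raise ValueError("need 0 < f_min < f_max")
    lag_min = max(1, math.floor(fs / f_max))
    lag_max = math.ceil(fs / f_min)
    # x[n + L] must exist for at least one n, so L must be shorter than the frame.
    if lag_max >= frame_len:
        raise ValueError("frame must be longer than the largest lag")
    return lag_min, lag_max

# Hypothetical example: 96 ksps input, 50 Hz to 2000 Hz search, 4096-sample frame.
lo, hi = lag_range(96000, 50.0, 2000.0, 4096)
print(lo, hi, 96000 / lo, 96000 / hi)  # 48 1920 2000.0 50.0
```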

(3) Which RAE values are acceptable for obtaining good results in most applications you are interested in? RAE < 0.01, 0.05, 0.1?

Acceptable RAE values are approximately below 0.025, because most musical applications use discretized pitches, which do not result in note errors when the estimate falls within ±2.5% of the starting pitch. We chose 2% as a safety margin: this range is captured by the “ACC-2” metric, i.e., the percentage of instances in which the algorithm's error is equal to or lower than 2%. We clarified this further in the Methods section; it is also mentioned in the Discussion.
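A minimal sketch of the two quantities discussed above, assuming RAE is the relative absolute error |f_est - f_ref| / f_ref and ACC-2 is the percentage of instances with RAE at most 0.02 (the function names are hypothetical). The last lines check the musical reasoning: in 12-tone equal temperament, the midpoint to the next note up lies 2^(1/24) - 1 ≈ 2.93% above a pitch and the midpoint to the next note down lies 1 - 2^(-1/24) ≈ 2.85% below it, so, for a reference sitting exactly on an equal-tempered note, an error within 2.5% (and a fortiori 2%) cannot round to the wrong note.

```python
def rae(f_est, f_ref):
    """Relative absolute error of a single pitch estimate (assumed definition)."""
    return abs(f_est - f_ref) / f_ref

def acc(estimates, references, tol=0.02):
    """Percentage of instances with RAE <= tol; tol = 0.02 gives ACC-2."""
    if len(estimates) != len(references) or not references:
        raise ValueError("need equally long, non-empty lists")
    hits = sum(1 for e, r in zip(estimates, references) if rae(e, r) <= tol)
    return 100.0 * hits / len(references)

# Half-semitone margins around an equal-tempered note, as fractions of its pitch.
up_margin = 2 ** (1 / 24) - 1      # about 0.0293
down_margin = 1 - 2 ** (-1 / 24)   # about 0.0285

print(acc([441.0, 430.0, 452.0, 440.0], [440.0] * 4))  # 50.0
print(0.025 < down_margin < up_margin)  # True
```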

Reviewer 5 Report

A high-speed pitch detection technique is built using one-bit quantization in this paper. Three of the most extensively used algorithms are compared to the suggested algorithm. The paper is well-written and well-presented. However, the following factors should be considered before acceptance:

  • Some abbreviations should be defined before use.

  • The authors should be sure that all the symbols are defined in the paper since there is no nomenclature.
  • Figures 6 and 7 should be explained by the authors in the text of the paper.
  • Figure 7 should include the time unit.
  • A more quantitative comparison of the suggested approach and the comparable methods should be included in the conclusion.
  • The references are up-to-date.

Minor editing of English language required

Author Response

RESPONSE TO REVIEWER #5 

A high-speed pitch detection technique is built using one-bit quantization in this paper. Three of the most extensively used algorithms are compared to the suggested algorithm. The paper is well-written and well-presented. However, the following factors should be considered before acceptance:

  • Some abbreviations should be defined before use.
  • The authors should be sure that all the symbols are defined in the paper since there is no nomenclature.
  • Figures 6 and 7 should be explained by the authors in the text of the paper.
  • Figure 7 should include the time unit.
  • A more quantitative comparison of the suggested approach and the comparable methods should be included in the conclusion.
  • The references are up-to-date.

 

Thank you for taking the time to evaluate our paper and for the kind words. We agree with all the points you raised, and we have corrected all minor mistakes, typos, etc.

We strived to define all abbreviations, even in the Abstract, and we explained Figures 6 and 7 in the text of the corresponding sections (Results and Discussion, respectively). We added the time unit to Fig. 7 and updated the Conclusion (and, in a similar fashion, the end of the Discussion). Thank you again.

 

Round 2

Reviewer 1 Report

The authors have corrected errors and inaccuracies noted. In my opinion, the article can be accepted for publication in its current form.

Author Response

Thank you for your review.

Reviewer 3 Report

Suggestion (before Section 2): The remainder of this paper is organized as follows. Section 2 presents ... Section 3 describes ...

All variables in the equations must be described.

English needs some adjustments.

Author Response

Thank you for the valuable suggestions. We added a short paragraph at the end of the Introduction section that outlines the content to follow and explains the structure of the manuscript. We also defined every variable used in each equation and, where variables were already defined, made small changes for readability.

Lastly, we improved the quality of the English, especially in the Introduction, reviewing spelling/grammar mistakes.

Reviewer 5 Report

The authors have addressed all the points.

 Minor editing of English language required

Author Response

Thank you for the valuable suggestions. We improved the quality of the English, especially in the Introduction, reviewing spelling/grammar mistakes.
