Article
Peer-Review Record

Algorithms to Reduce the Data File Size and Improve the Write Rate for Storing Sensor Reading Values in Hard Disk Drives for Measurements with Exceptionally High Sampling Rates

Appl. Sci. 2024, 14(16), 7410; https://doi.org/10.3390/app14167410 (registering DOI)
by Quang Dao Vuong 1, Kanghyun Seo 2, Hyejin Choi 2, Youngmin Kim 2, Ji-woong Lee 1 and Jae-ung Lee 1,*
Reviewer 1: Anonymous
Submission received: 8 July 2024 / Revised: 18 August 2024 / Accepted: 20 August 2024 / Published: 22 August 2024
(This article belongs to the Section Acoustics and Vibrations)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study aimed to improve the efficiency of storing sensor reading values in tasks with exceptionally high sampling rates, such as AE measurements. This was achieved by introducing novel algorithms designed to reduce the file size and increase the write rate. These algorithms maximize the benefits of binary formats with the goal of storing large amounts of data rapidly and accurately without requiring expensive storage upgrades. It is a meaningful work.

 

[1]    Formula (1) should not appear in the introduction section of a paper (see line 44).

[2]    Figures 2, 12, and 13 should be presented in a more formal form.

[3]    Lines 387, 436, 464 and 512 should not be No.x. Why write methods 1,2,3,4; discussion 1,2; solution 1,2 in this form? Why not list subheadings, similar to 4.2.1...methods, 4.2.2 ...methods?

[4]    What is the specific difference between method 4 and methods 1,2,3? Why is there no comparison between method 4 and methods 1,2,3?

 

[5]    There are some grammatical errors in this article, which need to be further modified.

Comments on the Quality of English Language

There are some grammatical errors in this article, which need to be further modified.

Author Response

Dear Reviewer,

We are extremely grateful for your detailed review of our manuscript, which improved the completeness of this paper not only technically but also grammatically. We have revised the manuscript in accordance with your comments. We hope all our responses make sense to you. Thank you for your valuable time spent on this manuscript. You truly deserve our deepest gratitude.

Please take a look at our responses as follows:

 

[1]    Formula (1) should not appear in the introduction section of a paper (see line 44).

Response 1: Yes, thank you for pointing this out. The formula was removed as per your suggestion.

 

[2]    Figures 2, 12, and 13 should be presented in a more formal form.

Response 2: Dear Reviewer, thank you for pointing this out. We have removed Figure 2 from the manuscript. Figures 12 and 13 (now 11 and 12) are screenshots of a working window. We are unsure how to alter their forms. However, we believe this issue can be addressed in the next editing step through discussion with the journal's assistant editors.

 

[3]    Lines 387, 436, 464 and 512 should not be No.x. Why write methods 1,2,3,4; discussion 1,2; solution 1,2 in this form? Why not list subheadings, similar to 4.2.1...methods, 4.2.2 ...methods?

Response 3: Yes, thank you for pointing this out. “Approach No. x” has been revised as “Approach x” as per your recommendation.

In the manuscript, we introduced four approaches to identify the optimal data-writing algorithm. As a result, Approaches 1 and 3 were found to be ineffective, while Approaches 2 and 4 are recommended. Since not every approach is effective, we have chosen not to use subheadings for each approach, to avoid giving the impression that all approaches are equally effective.

Besides, there is a Discussion for Approaches 1, 2 and 3. Since Approaches and Discussions are not the same type of content, we believe it is unsuitable to use the same type of subheadings for them. It is better to introduce them sequentially, as presented in the manuscript.

We highly appreciate your understanding.

 

[4]    What is the specific difference between method 4 and methods 1,2,3? Why is there no comparison between method 4 and methods 1,2,3?

Response 4: Dear Reviewer, Approaches 1, 2, and 3 are for a single drive, while Approach 4 is for multiple drives. The manuscript includes comparisons between Approach 4 and the other approaches. Approach 4 is almost the same as Approach 2, except that the data is written across multiple drives (Line 516). Consequently, the write rate reached 472 MB/s, which is the combined write rate of all drives (Line 538). We believe these points highlight the differences between Approach 4 and the other approaches. We highly appreciate your understanding.
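To illustrate the idea of writing across multiple drives, the following is a minimal sketch and not the authors' implementation. It assumes that each buffer is saved as its own file and that files are assigned to drives in round-robin order; the function name `write_buffers_round_robin`, the file naming, and the use of temporary folders as stand-ins for physical drives are all hypothetical.

```python
import os
import tempfile


def write_buffers_round_robin(buffers, drive_dirs, prefix="chunk"):
    """Write each buffer as its own file, cycling through the target folders."""
    paths = []
    for i, buf in enumerate(buffers):
        # Assumed distribution rule: buffer i goes to drive (i mod number of drives).
        target = drive_dirs[i % len(drive_dirs)]
        path = os.path.join(target, f"{prefix}_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(buf)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Temporary folders stand in for separate physical drives.
    with tempfile.TemporaryDirectory() as d0, tempfile.TemporaryDirectory() as d1:
        written = write_buffers_round_robin([b"\x00" * 1024] * 4, [d0, d1])
        for p in written:
            print(p)
```

This sketch writes the files one after another. Approaching a combined write rate across drives would presumably require the writes to be issued concurrently (for example, one writer per drive), which is omitted here for brevity.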

 

[5]    There are some grammatical errors in this article, which need to be further modified.

Response 5: Dear Reviewer, the manuscript has been proofread by Editage, an expert English editing service. However, some errors may still be present. We have revised the manuscript. Please inform us if any errors remain.

 

[6] Comments on the Quality of English Language

There are some grammatical errors in this article, which need to be further modified.

Response 6: Dear Reviewer, the manuscript has been proofread by Editage, an expert English editing service. However, some errors may still be present. We have revised the manuscript. Please inform us if any errors remain.

 

We greatly appreciate your efforts in helping to improve our manuscript.

Thank you so much and Best Regards.

Reviewer 2 Report

Comments and Suggestions for Authors

For comments and suggestions please see the attached pdf

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

We are extremely grateful for your detailed review of our manuscript, which improved the completeness of this paper not only technically but also grammatically. We have revised the manuscript in accordance with your comments. We hope all our responses make sense to you. Thank you for your valuable time spent on this manuscript. You truly deserve our deepest gratitude.

Please take a look at our responses as follows:

 

General remarks

               Overall, the text is well written and comprehensible. There is a suspicion that parts of the text were written with the help of a large language model such as ChatGPT. According to the MDPI guidelines, this is also okay (https://www.mdpi.com/about/announcements/5687), but should be mentioned accordingly in the paper, ideally in the Acknowledgement section.

Response: Yes, you are right. Several parts of the text were corrected for grammar with the help of ChatGPT, but the volume of these corrections is not significant enough to mention. Besides, the manuscript was thoroughly proofread by Editage, an expert English editing service, funded by the Ministry of Oceans and Fisheries (MOF, Korea) and the Korea Maritime & Ocean University Research Fund, as stated in the Funding section. Therefore, we believe it is unnecessary to mention this further in the Acknowledgement section.

 

               The article is of large volume. Large volume is not always an advantage and can hinder readers from fully understanding the essence of the article. Therefore, I suggest highlighting the most important parts in the article, and eliminating elements that do not contribute scientific value.

Response: Yes, thank you for your suggestion. The manuscript has been revised according to your recommendation. Some elements were eliminated to highlight the most important parts.

 

               A central point of the paper is the proposal to store DAQ data in binary format. This is common practice in most (industrial) applications (such as wav: https://en.wikipedia.org/wiki/WAV) and has no novelty value.

Response: Yes, using binary formats in GENERAL is not a novel idea. However, in this study, we introduced OPTIMIZED binary formats for AE measurements, requiring just 2 bytes to store a value. Additionally, we explored the FASTEST METHODS for writing this data on HDDs. A series of tests employing various writing approaches was conducted to identify the optimal data-writing algorithm. As a result, Approaches 2 and 4 were recommended. We highly appreciate your understanding of our work.
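The 2-byte idea can be illustrated with a minimal sketch, which is not the authors' actual format. It assumes a 16-bit system with a symmetric ±10 V input range and maps each voltage to a signed 16-bit integer code; the names `volts_to_code`, `encode`, and `decode` are hypothetical.

```python
import struct

FULL_SCALE_V = 10.0            # assumed input range of +/-10 V
BITS = 16                      # assumed DAQ resolution
LSB = 2 * FULL_SCALE_V / (2 ** BITS)   # smallest voltage step


def volts_to_code(v):
    """Quantize a voltage to a signed 16-bit code, clamped to the valid range."""
    code = round(v / LSB)
    return max(-(2 ** (BITS - 1)), min(2 ** (BITS - 1) - 1, code))


def encode(values):
    """Pack readings as little-endian signed 16-bit integers (2 bytes each)."""
    codes = [volts_to_code(v) for v in values]
    return struct.pack(f"<{len(codes)}h", *codes)


def decode(blob):
    """Recover the quantized voltages from the packed bytes."""
    n = len(blob) // 2
    return [c * LSB for c in struct.unpack(f"<{n}h", blob)]


if __name__ == "__main__":
    blob = encode([0.0, 1.5, -3.25])
    print(len(blob), decode(blob))
```

Under these assumptions, each reading occupies exactly 2 bytes, and the decoded value differs from the original by at most half a quantization step.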

 

               Although a reasonable number of references are listed, most of them refer to the introduction, which is largely concerned with listing applications of acoustic emission techniques. A state of the art discussion of common storage, streaming and compression principles is completely missing. What about streaming technologies such as direct streaming or Apache Kafka?

Response: Thank you for your recommendation. Adding these contents may help enrich the manuscript. However, this study focuses on how to store data on HDDs effectively. Transferring data from a source to a destination (streaming) is not of interest in this situation. We highly appreciate your understanding.

 

Special notes

 

  1. Lines 40 - 45 Redundancy (once in words, once as a formula)

Response 1: Yes, thank you for pointing this out. The formula was removed as per your suggestion.

 

  2. Lines 46 / 47 - Why?

Response 2: Dear Reviewer, a sampling rate of at least 10 times the maximum frequency of interest is needed to reproduce the signal's waveform accurately. A higher sampling rate offers a more accurate representation of the waveform. The factor of 10 is based on practical experience. A reference ([4]) has been added to provide the source of this information, as follows:

https://www.dataq.com/data-acquisition/general-education-tutorials/what-you-really-need-to-know-about-sample-rate.html

 

  3. Line 49: Reference to a general NDT book is a bit crude at this point. NDT covers many techniques from ultrasound to penetrant testing to X-ray. If you do, please reference at least one relevant chapter in the book.

Response 3: Yes, you are right. The reference was replaced by another suitable book as follows:

Moore, P.O.; Miller, R.K.; Hill, E. Nondestructive Testing Handbook, Vol. 6-Acoustic Emission Testing. Ohio, American Society for NDT Inc, pp.147-190. 2005.

 

  4. Lines 65 / 66: Equation 1 only says that the sampling rate is 2.5 times the highest frequency. This has nothing (directly) to do with the need for high sampling rates.

Response 4: Yes, thank you for pointing this out. This was removed from the manuscript.

 

  5. Lines 71 - 84: Not absolutely necessary, but it would still be good to cite references for the write rates mentioned for HDDs and SSDs.

Response 5: Thank you for pointing this out. A reference ([22]) has been added for the write rates mentioned for HDDs and SSDs as follows:

https://tekie.com/blog/hardware/ssd-vs-hdd-speed-lifespan-and-reliability/

 

  6. Lines 114 - 127: As described above, AE techniques cover a very wide range of applications. For most of these applications, the technique described here will not be necessary. It would be good if some specific use cases and the resulting data volumes could be described in the introduction, such as multi-channel ultrasound microscopy. This would make the application here a little more concrete.

Response 6: Dear Reviewer, thank you so much for your recommendation. I believe that ultrasound microscopy and scanning acoustic microscopy (SAM) refer to the same thing. As per your suggestion, I have added some information about SAM to the introduction (Lines 124-127). Due to our limited experience with this technique, the discussion is brief. We greatly appreciate your understanding.

 

  7. Lines 139 - 164: This part is trivial and should be accepted as general knowledge. If the authors consider it absolutely necessary, they should at least shorten this digression to a minimum.

Response 7: Yes, this part was minimized as per your recommendation.

 

  8. Lines 166-169: Please underpin this statement with literature. In my practice, I have encountered many cases of binary data formats in sound and vibration measurements. Figure 2 also contains no relevant information for the text.

Response 8: Yes, thank you for pointing this out. Figure 2 was removed. Section 2 has been shortened as per your suggestion.

 

  9. The entire section "2.2 Advantages of binary file formats" (166-234) is state of the art. Please shorten it to the minimum necessary.

Response 9: Yes, thank you for pointing this out. Section 2 has been shortened as per your suggestion.

 

  10. Lines 270 / 271: Most DAQ devices that I know of save data in binary format. In addition, the storage is not (solely) dependent on the DAQ device, but on the software used.

Response 10: Yes, you are right. However, in this scenario, we were referring to the reading values that software can read from a DAQ, which are typically real numbers. We highly appreciate your understanding.

 

  11. Lines 265 - 283: Textbook knowledge. The explanations and the illustration do not provide any added value. By the way: non-linear scaling makes it possible to capture small sensor values with high precision even at low resolutions. An example of non-linear quantization curves is to keep the quantization noise constant over the entire value range.

Response 11: Yes, you are right. This is textbook knowledge, detailing how to calculate the smallest signal change that a DAQ can detect. It is important to determine how many digits are needed to store a value in text format compared to a binary format. We highly appreciate your understanding.
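As a hedged illustration of this calculation, the sketch below computes the smallest detectable voltage step as the input span divided by the number of quantization levels. The ±10 V span (20 V in total) and the two resolutions shown are assumptions chosen for illustration; the function name `smallest_step` is hypothetical.

```python
def smallest_step(full_span_volts, bits):
    """Smallest voltage change resolvable by an ideal linear converter."""
    return full_span_volts / (2 ** bits)


if __name__ == "__main__":
    # Assumed input span of 20 V (i.e. +/-10 V).
    for bits in (16, 24):
        step = smallest_step(20.0, bits)
        print(f"{bits}-bit: {step:.7f} V")
```

Under these assumptions, a 16-bit converter resolves roughly 0.0003 V and a 24-bit converter roughly 0.000001 V, which indicates how many decimal places a text format would have to carry to preserve the same resolution.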

 

  12. Lines 292 - 293: The illustration does not add any value to the understanding of the text. Lines 294 - 298: Where do the values a1 = 6; a2 = 30; a3 = 150 come from? This will become clear later, but it should at least be mentioned briefly at this point.

Response 12: Yes, thank you for pointing this out. This part was revised in response to your comment (Lines 254 and 258). It is just an example to illustrate the configuration of the 3-byte number format.

 

  13. Line 323: Use of power notation is easier to read.

Response 13: Yes, use of power notation is easier to read. However, we prefer the current number format (“0.000001”) to demonstrate that SEVEN DIGITS are needed to represent a value in text format.

 

  14. Lines 322 - 328: A standard industry DAQ device stores in binary format; discussion unnecessary.

Response 14: Dear Reviewer, this part is important as it compares the use of text format with the 3-byte number format.

You are correct that a standard industry DAQ device can store data in binary format. However, this binary format may not be optimized for the reading values. The most common binary formats are Floating Point (64-bit, double precision), which requires 8 bytes to represent a reading value, or Integer (32-bit), which requires 4 bytes. This was described in Lines 212-228.
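The per-value sizes mentioned above can be checked with a short sketch using Python's standard `struct` module. The dictionary `BYTES_PER_VALUE` and the helper `file_size_bytes` are hypothetical names introduced only for this illustration.

```python
import struct

# Standard sizes of the fixed-width types discussed above ("<" = no padding).
BYTES_PER_VALUE = {
    "float64": struct.calcsize("<d"),  # double-precision floating point
    "int32": struct.calcsize("<i"),    # 32-bit integer
    "int16": struct.calcsize("<h"),    # 16-bit integer
}


def file_size_bytes(n_values, fmt):
    """Raw payload size for n_values readings stored in the given format."""
    return n_values * BYTES_PER_VALUE[fmt]


if __name__ == "__main__":
    for name in BYTES_PER_VALUE:
        print(name, file_size_bytes(1_000_000, name), "bytes per million values")
```

For the same number of readings, a 2-byte representation therefore needs a quarter of the space of a 64-bit floating-point representation and half that of a 32-bit integer.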

 

  15. Lines 329 - 330: Document/reference? With all devices I have worked with so far, up to 32 bits could be set.

Response 15: Dear Reviewer, this part refers to DAQ devices used for AE measurements. These DAQs support multiple channels with high sampling rates (at least 1 megasample per second per channel). My experience is limited and is primarily with National Instruments DAQs. The DAQs that meet these requirements offer up to 16-bit resolution. From my internet searches, other DAQs that meet these requirements also typically offer up to 16-bit resolution.

Perhaps 32-bit DAQs for AE measurements are available, but I believe that they are not common. That is why I mentioned “most DAQ devices”, not “all DAQ devices”.

 

  16. Line 335: Use power format (x.xx * 10^y)

Response 16: Dear Reviewer, use of power notation is easier to read. However, we prefer the current number format (“0.0003”) to demonstrate that FIVE DIGITS are needed to represent a value in text format.

 

  17. Lines 342 - 362: In general, it is good engineering practice not to exceed the working range at all. Increasing the operating range by 7% is a small gain compared to the potential damage to the DAQ device if the range is exceeded too much. In addition, the problem of clipping is merely postponed and not eliminated when the working range is increased.

Response 17: Dear Reviewer, we did not intend to expand the operating range of the DAQ device. However, the actual operating range of the NI-9233 DAQ is wider than its nominal range. This is why we introduced two Solutions for this issue.

 

  18. Table 1: Takes up a lot of space, while some of the information is irrelevant for the specific case here (rotational speed, form factor, year).

Response 18: Dear Reviewer, all the specifications listed in Table 1 are meaningful for describing the write performance capabilities of the HDD, as follows:

-             Rotational speed: a 7200 rpm HDD performs better than a 5400 rpm HDD.

-             Form factor: a 3.5-inch HDD offers better performance compared to a 2.5-inch HDD.

-             Year: a newer HDD generally has better write performance than older models.

 

  19. Lines 422 - 475: The figures in this section show the transfer rate over time, but the average write rate is discussed. This is confusing. Perhaps the average write rate could be added to the diagrams as a line?

Response 19: Yes, thank you for pointing this out. The manuscript has been revised as per your suggestion.

 

  20. Lines 436 - 461: What is the total amount of data stored in the 6 examples? Is it the same in each case? If so, why is a buffer size of 8 MB not optimal? This is the shortest period of time for storing the data.

Response 20: Yes, you are right.  The total amount of data stored in each example is consistently 5 GB, as stated in Line 348. Consequently, using an 8 MB file size results in a higher number of files compared to the other configurations. This configuration resulted in the lowest average write speed of 136 MB/s, as shown in Table 2.
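The relationship between file size and file count can be sketched as follows. This is an illustrative calculation only: it assumes binary units (1 GB = 1024 MB), and the 64 MB comparison size is a hypothetical value rather than one taken from the manuscript.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def files_needed(total_bytes, file_bytes):
    """Number of files required to hold total_bytes at file_bytes per file."""
    return math.ceil(total_bytes / file_bytes)


if __name__ == "__main__":
    total = 5 * GIB  # total data volume per test, as stated in the response
    for size_mib in (8, 64):  # 64 MiB is a hypothetical comparison point
        print(f"{size_mib} MiB files: {files_needed(total, size_mib * MIB)}")
```

Under these assumptions, 8 MB files require 640 file creations for the 5 GB test, eight times as many as 64 MB files would, which is consistent with the per-file overhead described in the response.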

 

  21. Lines 436 - 461: Approach No. 2 discusses the buffer size. Although it is mentioned in the text that the File Size is equal to the Buffer Size, it would be better to write Buffer Size instead of File Size in the caption to Figure 9.

Response 21: Yes, we agree. The manuscript has been revised as per your recommendation.

 

  22. Line 527: Spelling error ("anew" instead of "a new"). Figure 13 provides no recognizable added value.

Response 22: Yes, thank you for pointing this out. The spelling error was corrected in the manuscript.

Figure 13 not only shows the program's operating window but also illustrates how all measurement channels are displayed in real time, partially demonstrating the software's capabilities. Besides, from my perspective, it is more effective to discuss a program with an overview image.

 

  23. As far as I understand it, only harmonic signals are used in the simulations. Why? Why wasn't noise also examined as a test signal?
  • Noise signals, particularly white noise, contain a much more random distribution of amplitudes across all frequencies. This randomness could potentially test the DAQ system's ability to handle data with high entropy. Harmonic signals, being periodic and pr[…] ability to handle random data, which is common in real-world scenarios.
  • Noise signals are typically less compressible than harmonic signals due to their randomness. This characteristic might impact the system's storage efficiency as compressibility can influence write speeds. The less predictable the data, the harder it is to compress, potentially leading to slower write speeds due to larger file sizes being written.
  • The efficiency of the DAQ system’s buffering strategies could be differently impacted by noise versus harmonic signals. Noise, with its high variability, might challenge caching algorithms which rely on data predictability to efficiently manage memory.
  • All in all, I also recommend adding a comparison of the writing speed between different types of signals (harmonic signals vs. noise signals).

Response 23: Dear Reviewer, no compression is applied to the data in this study. The writing procedure can be described as follows:

  1. Read data from the DAQ: The DAQ continuously provides reading values in real time. The performance or efficiency of the DAQ is not the focus of this research.
  2. Encode data into a binary format: This process is independent of the data's amplitude. There is no difference between harmonic signals and noise signals in this step.
  3. Write binary data to HDDs: Similarly, this process is unaffected by the data's amplitude, with no difference between harmonic signals and noise signals.

In conclusion, we believe that it is not necessary to add a comparison of the writing speed between different types of signals (harmonic signals vs. noise signals). We highly appreciate your understanding.
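The argument above can be illustrated with a minimal sketch, assuming a fixed-width, uncompressed 16-bit encoding (an assumption made for illustration, not a description of the authors' code). With such an encoding, the number of bytes written depends only on the number of samples, not on their values. The helper `encode_int16` and the synthetic test signals are hypothetical.

```python
import math
import random
import struct


def encode_int16(samples, full_scale=1.0):
    """Pack samples as signed 16-bit integers without any compression."""
    codes = [
        max(-32768, min(32767, round(s / full_scale * 32767)))
        for s in samples
    ]
    return struct.pack(f"<{len(codes)}h", *codes)


def make_sine(n, cycles=50):
    """Synthetic harmonic test signal with amplitude 1."""
    return [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


def make_noise(n, seed=0):
    """Synthetic uniform noise in [-1, 1] from a seeded generator."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    n = 10_000
    print(len(encode_int16(make_sine(n))), len(encode_int16(make_noise(n))))
```

Both signals produce the same number of bytes, so the amount of data handed to the drive is identical for harmonic and noise inputs when no compression is involved.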

 

  24. Figure 16: plotting a - c in one figure makes it easier to compare the signals and requires less space

Response 24: Dear Reviewer, plotting only a-c in one figure does not provide sufficient information to detect the potential crack position, as the cross section is a circle. It is necessary to compare the signals from all three sensors as described in the manuscript. We highly appreciate your understanding.

 

  25. Section 5.2 Actual measurement with the application of acoustic emission monitoring (Lines 702 - 764)
  • It's great to show a concrete example application for the proposed system.
  • However, the example does not seem to require a truly high data rate. Why was this example chosen, and how does it relate to the aim of the paper?

Response 25-1: The measurement sampling rate was set to 2 MS/s per channel, providing a more accurate representation of the waveform, which is crucial for detecting an AE event. Additionally, the Kistler sensors operated within a frequency range of 100–900 kHz. As mentioned in the manuscript, a sampling rate of 2.56 times the highest frequency component is recommended.

The example was chosen to demonstrate the proposed system's performance in an actual AE measurement. The high sampling rate was configured for 8 channels over a measurement period exceeding 29 hours. This test confirmed the program's capability to perform an AE measurement, successfully detecting potential cracks. All data was written efficiently on the HDD, using minimal storage.
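For orientation, the data volume implied by these figures can be estimated with a short calculation. It is a sketch under stated assumptions: all 8 channels are taken to be active for the whole period, each value is stored in 2 bytes, and 29 hours is used as a lower bound for the duration; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 2_000_000   # 2 MS/s per channel, as stated in the response
CHANNELS = 8                 # assumed to be active for the whole measurement
BYTES_PER_VALUE = 2          # assumed 16-bit storage format
HOURS = 29                   # lower bound ("exceeding 29 hours")


def data_rate_bytes_per_s(sample_rate_hz, channels, bytes_per_value):
    """Sustained raw data rate produced by the acquisition."""
    return sample_rate_hz * channels * bytes_per_value


def total_bytes(rate_bytes_per_s, hours):
    """Raw data volume accumulated over the given duration."""
    return rate_bytes_per_s * hours * 3600


if __name__ == "__main__":
    rate = data_rate_bytes_per_s(SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_VALUE)
    print(f"{rate / 1e6:.1f} MB/s, {total_bytes(rate, HOURS) / 1e12:.2f} TB")
```

Under these assumptions, the sustained rate is 32 MB/s and the raw volume over 29 hours is about 3.34 TB (decimal units).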

 

  • "Furthermore, the data write algorithms demonstrated a remarkable improvement in the write performance of traditional HDDs." --> This is not evident from the actual text.

Response 25-2: Thank you for pointing this out. It was removed from the manuscript as per your recommendation.

 

  26. Lines 773 - 781: As mentioned several times before: industrial standard

Response 26: You are correct that a standard industry DAQ device stores data in binary format. However, this binary format may not be optimized for the reading values. The most common binary formats are Floating Point (64-bit, double precision), which requires 8 bytes to represent a reading value, and Integer (32-bit), which requires 4 bytes. An optimized binary format requires just 2 bytes for a value in a 16-bit resolution system.

 

We greatly appreciate your efforts in helping to improve our manuscript.

Thank you so much and Best Regards.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

General remarks

Remark 1

From my first Review: “A central point of the paper is the proposal to store DAQ data in binary format. This is common practice in most (industrial) applications (such as wav: https://en.wikipedia.org/wiki/WAV) and has no novelty value.”

Answer: “Yes, using binary formats in GENERAL is not a novel idea. However, in this study, we introduced OPTIMIZED binary formats for AE measurements, requiring just 2 bytes to store a value. Additionally, we explored the FASTEST METHODS for writing this data on HDDs. A series of tests employing various writing approaches were conducted to identify the most optimal data-writing algorithm. As a result, Approaches 2 and 4 were recommended. We highly appreciate your understanding of our work.”

Reply: “I appreciate the exploration into optimizing data storage efficiency in your study. However, storing DAQ data directly in binary formats matching the acquisition hardware’s resolution is common practice in the industry, as seen with companies like Mecalc Technologies, gfai tech or Dewesoft just to name a few. Given this context, the use of "optimized binary formats" is not a notably new innovation. The extensive discussion spanning lines 137 - 326 must be substantially condensed. A succinct half-page explanation as a part of the methods is appropriate.”

Remark 2

From my first review: Although a reasonable number of references are listed, most of them refer to the introduction, which is largely concerned with listing applications of acoustic emission techniques. A state of the art discussion of common storage, streaming and compression principles is completely missing. What about streaming technologies such as direct streaming or Apache Kafka?

Response: Thank you for your recommendation. Adding these contents may help enrich the manuscript. However, this study focuses on how to store data on HDDs effectively. Transferring data from a source to a destination (streaming) is not of interest in this situation. We highly appreciate your understanding.

Reply: Okay, I accept that streaming is not the focus. However, I still lack a classification of the state of the art in relation to the topic presented, particularly in relation to storage.

Remark 3

From my first review: The article is of large volume. Large volume is not always an advantage and can hinder readers from fully understanding the essence of the article. Therefore, I suggest highlighting the most important parts in the article, and eliminating elements that do not contribute scientific value.

Response: Yes, thank you for your suggestion. The manuscript has been revised according to your recommendation. Some elements were eliminated to highlight the most important parts.

Reply: The revised manuscript is only 21 lines shorter than the original version. I must therefore repeat myself: The article is of large volume. Large volume is not always an advantage and can hinder readers from fully understanding the essence of the article. Therefore, I suggest highlighting the most important parts in the article, and eliminating elements that do not contribute scientific value.

Special notes

Review 1 – Bullet Point 23

From my first Review: “As far as I understand it, only harmonic signals are used in the simulations. Why? Why wasn't noise also examined as a test signal?

Noise signals, particularly white noise, contain a much more random distribution of amplitudes across all frequencies. This randomness could potentially test the DAQ system's ability to handle data with high entropy. Harmonic signals, being periodic and pr[…] ability to handle random data, which is common in real-world scenarios.

Noise signals are typically less compressible than harmonic signals due to their randomness. This characteristic might impact the system's storage efficiency as compressibility can influence write speeds. The less predictable the data, the harder it is to compress, potentially leading to slower write speeds due to larger file sizes being written.

The efficiency of the DAQ system’s buffering strategies could be differently impacted by noise versus harmonic signals. Noise, with its high variability, might challenge caching algorithms which rely on data predictability to efficiently manage memory.

All in all, I also recommend adding a comparison of the writing speed between different types of signals (harmonic signals vs. noise signals).”

Answer: Dear Reviewer, in this study there is no compression on the data. The writing procedure can be described as follows:

1. Read data from the DAQ: The DAQ has a duty to continuously provide reading values in real time. The performance or efficiency of the DAQ is not the focus of this research.

2. Encode data into a binary format: This process is regardless of the data's amplitude. There is no difference between harmonic signals and noise signals in this step.

3. Write binary data to HDDs: Similarly, this process is unaffected by the data's amplitude, with no difference between harmonic signals and noise signals.

In conclusion, we believe that it is not necessary to add a comparison of the writing speed between different types of signals (harmonic signals vs. noise signals). We highly appreciate your understanding.

Reply: I am fine with that explanation. However, I recommend including a short discussion of why you chose harmonic signals as test signals.

Review 1 – Bullet Point 25

From my first review: Section 5.2 Actual measurement with the application of acoustic emission monitoring (Lines 702 - 764)

It's great to show a concrete example application for the proposed system.

However, the example does not seem to require a truly high data rate. Why was this example chosen, and how does it relate to the aim of the paper?

Response: The measurement sampling rate was set to 2 MS/s per channel, providing a more accurate representation of the waveform, which is crucial for detecting an AE event. Additionally, the Kistler sensors operated within a frequency range of 100–900 kHz. As mentioned in the manuscript, a sampling rate of 2.56 times the highest frequency component is recommended.

The example was chosen to demonstrate the proposed system's performance in an actual AE measurement. The high sampling rate was configured for 8 channels over a measurement period exceeding 29 hours. This test confirmed the program's capability to perform an AE measurement, successfully detecting potential cracks. All data was written efficiently on the HDD, using minimal storage.

Reply: The manuscript does not specify whether the acoustic emission measurements were conducted using eight channels, as mentioned in your answer above. Clarifying whether all eight channels were active during the tests would help in understanding the full scope of the data acquisition and the system's capabilities. This detail is crucial for assessing the validity of the results and the performance of the DAQ system under review.

Review 1 – Bullet Point 2

From my first review: Lines 46 / 47 - Why?

Response: Dear Reviewer, a sampling rate of at least 10 times the maximum frequency of interest is needed to reproduce accurate waveform of the signal. A higher sampling rate offers a more accurate representation of the waveform. Number 10 is based on practical experience. A reference ([4]) has been added to provide the source of information as follows:

https://www.dataq.com/data-acquisition/general-education-tutorials/what-you-really-need-to-know-about-sample-rate.html

Reply: There is no theoretical basis for choosing a sampling rate that is 10 times the highest frequency. In the industry, this is often used as a rule of thumb to avoid artifacts resulting from the filter design of anti-aliasing filters or the SNR, for example. However, the term "needed" should not be used here. It is better to use "recommended".

By the way: the link in the reference does not work when I click it. It sends me to https://www.dataq.com/data-acquisition/general-edu-803 and not to https://www.dataq.com/data-acquisition/general-education-tutorials/what-you-really-need-to-know-about-sample-rate.html

Figure Reference in wrong Format

Line 261: Fig 5 -> Figure 4

Review 1 – Bullet Point 17

From my first review: Lines 342 - 362 In general, it is good engineering practice not to exceed the working range at all. Increasing the operating range by 7% is a small gain compared to the potential damage to the DAQ device if the range is exceeded too much. In addition, the problem of clipping is merely postponed and not eliminated when the working range is increased.

Response: Dear Reviewer, we did not intend to expand the operating range of the DAQ device. However, the actual operating range of the NI-9233 DAQ is wider than its nominal range. This is why we introduced two Solutions for this issue.

Reply: You write that either the nominal range (+/- 10 V) can be used with clipping detection (Solution 1) or the full operating range (+/- 10.7 V) (apparently) without clipping detection (Solution 2). However, these are two completely different considerations.

1. You can increase the operating range to gain additional headspace.

2. You can do without clipping detection for performance reasons. This requires careful tuning of the input gain to ensure that the signals are not clipped. However, this is independent of whether the nominal or the actual working range is used.

Author Response

Dear Reviewer,

We are extremely grateful for your detailed review of our manuscript, which improved the completeness of this paper not only technically but also grammatically. We have revised the manuscript in accordance with your comments. We hope all our responses make sense to you. Thank you for your valuable time spent on this manuscript. You truly deserve our deepest gratitude.

Please take a look at our responses as follows:

 

General remarks

 

Remark 1

 

From my first Review: “A central point of the paper is the proposal to store DAQ data in binary format. This is common practice in most (industrial) applications (such as wav: https://en.wikipedia.org/wiki/WAV) and has no novelty value.”

 

Response R1: “Yes, using binary formats in GENERAL is not a novel idea. However, in this study, we introduced OPTIMIZED binary formats for AE measurements, requiring just 2 bytes to store a value. Additionally, we explored the FASTEST METHODS for writing this data on HDDs. A series of tests employing various writing approaches were conducted to identify the most optimal data-writing algorithm. As a result, Approaches 2 and 4 were recommended. We highly appreciate your understanding of our work.”

 

Reply: “I appreciate the exploration into optimizing data storage efficiency in your study. However, storing DAQ data directly in binary formats matching the acquisition hardware’s resolution is common practice in the industry, as seen with companies like Mecalc Technologies, gfai tech or Dewesoft just to name a few. Given this context, the use of "optimized binary formats" is not a notably new innovation. The extensive discussion spanning lines 137 - 326 must be substantially condensed. A succinct half-page explanation as a part of the methods is appropriate.”

 

Response R2 for Remark 1:

Dear Reviewer, thank you for pointing this out. In lines 178-179, I mentioned: “If a DAQ device allows the reading of raw binary values from its ADC, these data can be stored directly without encoding, leading to efficient storage utilization.” I believed this task could be easily managed by the DAQ manufacturer. However, the introduced algorithm for optimizing binary formats is intended for DAQ devices that present reading values in the form of real numbers. Understanding this algorithm helps programmers reduce the file size significantly.

To avoid any further arguments, I have removed the mention of “optimized binary formats” as an invention. You can see the modifications in Lines 17, 124, and 125. We hope this modification satisfies your comment. Your understanding is highly appreciated.

Additionally, based on your comment, the section “Advantages of binary file formats” has been removed, and the section “Algorithms for data encoding to minimize the file size” has been shortened. The remaining content is retained to provide adequate information on the principle and advantages of the algorithm. Your understanding is highly appreciated.

Thank you so much.

 

 

Remark 2

 

From my first review: Although a reasonable number of references are listed, most of them refer to the introduction, which is largely concerned with listing applications of acoustic emission techniques. A state of the art discussion of common storage, streaming and compression principles is completely missing. What about streaming technologies such as direct streaming or Apache Kafka?

 

Response R1: Thank you for your recommendation. Adding these contents may help enrich the manuscript. However, this study focuses on how to store data on HDDs effectively. Transferring data from a source to a destination (streaming) is not of interest in this situation. We highly appreciate your understanding.

 

Reply: Okay, I accept that streaming is not the focus. However, I still lack a classification of the state of the art in relation to the topic presented, particularly in relation to storage.

 

Response R2 for Remark 2:

Yes, thank you for pointing this out. Solid-state hybrid drives have been discussed in lines 82-92, as per your comment. We believe that including SSDs, HDDs, and SSHDs sufficiently covers the discussion of local storage solutions for AE measurements.

 

 

Remark 3

 

From my first review: The article is of large volume. Large volume is not always an advantage and can hinder readers from fully understanding the essence of the article. Therefore, I suggest highlighting the most important parts in the article, and eliminating elements that do not contribute scientific value.

 

Response R1: Yes, thank you for your suggestion. The manuscript has been revised according to your recommendation. Some elements were eliminated to highlight the most important parts.

 

Reply: The revised manuscript is only 21 lines shorter than the original version. I must therefore repeat myself: The article is of large volume. Large volume is not always an advantage and can hinder readers from fully understanding the essence of the article. Therefore, I suggest highlighting the most important parts in the article, and eliminating elements that do not contribute scientific value.

 

Response R2 for Remark 3:

Dear Reviewer, thank you for pointing this out. Based on your comment, the section “Advantages of binary file formats” has been removed, and the section “Algorithms for data encoding to minimize the file size” has been shortened. The remaining content is retained to provide adequate information on the principle and advantages of the algorithm. Your understanding is highly appreciated.

 

 

Special notes

 

Review 1 – Bullet Point 23

 

From my first Review: “As far as I understand it, only harmonic signals are used in the simulations. Why? Why wasn't noise also examined as a test signal?

  • Noise signals, particularly white noise, contain a much more random distribution of amplitudes across all frequencies. This randomness could potentially test the DAQ system's ability to handle data with high entropy. Harmonic signals, being periodic and predictable, may not test the system's ability to handle random data, which is common in real-world scenarios.

 

  • Noise signals are typically less compressible than harmonic signals due to their randomness. This characteristic might impact the system's storage efficiency as compressibility can influence write speeds. The less predictable the data, the harder it is to compress, potentially leading to slower write speeds due to larger file sizes being written.
  • The efficiency of the DAQ system’s buffering strategies could be differently impacted by noise versus harmonic signals. Noise, with its high variability, might challenge caching algorithms which rely on data predictability to efficiently manage memory.
  • All in all, I also recommend adding a comparison of the writing speed between different types of signals (harmonic signals vs. noise signals).”

 

Response R1: Dear Reviewer, in this study there is no compression on the data. The writing procedure can be described as follows:

  1. Read data from the DAQ: The DAQ has a duty to continuously provide reading values in real time. The performance or efficiency of the DAQ is not the focus of this research.
  2. Encode data into a binary format: This process is regardless of the data's amplitude. There is no difference between harmonic signals and noise signals in this step.
  3. Write binary data to HDDs: Similarly, this process is unaffected by the data's amplitude, with no difference between harmonic signals and noise signals.
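The amplitude independence of steps 2 and 3 can be illustrated with a minimal sketch. It is not the authors' implementation: the linear mapping to an unsigned 16-bit code, the +/-10 V range, and the names `encode_2byte`, `FULL_SCALE_V`, and `MAX_CODE` are assumptions made for illustration only. It shows that a fixed-width encoding produces the same number of bytes for a harmonic signal and for a noise signal of equal length.

```python
import math
import random
import struct

FULL_SCALE_V = 10.0   # assumed input range of +/-10 V (illustrative)
MAX_CODE = 2**16 - 1  # largest unsigned 16-bit code, i.e. 2 bytes per value


def encode_2byte(samples):
    """Map each voltage linearly to 0..65535 and pack it as little-endian uint16."""
    codes = []
    for v in samples:
        v = min(max(v, -FULL_SCALE_V), FULL_SCALE_V)  # keep the value inside the range
        codes.append(round((v + FULL_SCALE_V) / (2 * FULL_SCALE_V) * MAX_CODE))
    return struct.pack(f"<{len(codes)}H", *codes)


n = 1000
# Harmonic test signal: 50 periods of a 5 V sine wave.
harmonic = [5.0 * math.sin(2 * math.pi * 50 * k / n) for k in range(n)]
# Noise test signal: uniformly distributed values, seeded for repeatability.
rng = random.Random(0)
noise = [rng.uniform(-5.0, 5.0) for _ in range(n)]

harmonic_bytes = encode_2byte(harmonic)
noise_bytes = encode_2byte(noise)

# Both buffers have the same length, so the amount of data written is identical.
print(len(harmonic_bytes), len(noise_bytes))
```

Because no compression is applied, the byte count depends only on the number of samples, not on their values.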

In conclusion, we believe that it is not necessary to add a comparison of the writing speed between different types of signals (harmonic signals vs. noise signals). We highly appreciate your understanding.

 

Reply: I am fine with that explanation. However, I recommend to include a short discussion, why you took harmonic signals as test signals.

 

Response R2 for Bullet Point 23:

Yes, thank you for pointing this out. A discussion has been added in Lines 535-537 as per your recommendation.

 

 

Review 1 – Bullet Point 25

 

From my first review: Section 5.2 Actual measurement with the application of acoustic emission monitoring (Lines 702 - 764)

  • It's great to show a concrete example application for the proposed system.
  • However, the example does not seem to require a really high data rate. Why was this example chosen, and how does it relate to the aim of the paper?

 

Response R1: The measurement sampling rate was set to 2 MS/s per channel, providing a more accurate representation of the waveform, which is crucial for detecting an AE event. Additionally, the Kistler sensors operated within a frequency range of 100–900 kHz. As mentioned in the manuscript, a sampling rate of 2.56 times the highest frequency component is recommended.

 

The example was chosen to demonstrate the proposed system's performance in an actual AE measurement. The high sampling rate was configured for 8 channels over a measurement period exceeding 29 hours. This test confirmed the program's capability to perform an AE measurement, successfully detecting potential cracks. All data was written efficiently on the HDD, using minimal storage.
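As a rough check of the data rate implied by these figures, the arithmetic can be sketched as follows. The sampling rate, channel count, and duration are taken from the response above; the 2 bytes per value and the assumption that all eight channels were recorded continuously for the whole period are illustrative assumptions, not statements from the manuscript.

```python
SAMPLE_RATE_HZ = 2_000_000  # 2 MS/s per channel (from the response)
CHANNELS = 8                # eight channels (from the response)
DURATION_H = 29             # lower bound; the measurement exceeded 29 hours
BYTES_PER_SAMPLE = 2        # assumption: 2-byte binary format


def required_write_rate(sample_rate_hz, channels, bytes_per_sample):
    """Sustained write rate in bytes per second for continuous recording."""
    return sample_rate_hz * channels * bytes_per_sample


def total_bytes(rate_bytes_per_s, hours):
    """Total data volume in bytes for a recording of the given duration."""
    return rate_bytes_per_s * hours * 3600


rate = required_write_rate(SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_SAMPLE)
total = total_bytes(rate, DURATION_H)

print(f"{rate / 1e6:.0f} MB/s sustained, {total / 1e12:.2f} TB in {DURATION_H} h")
```

Under these assumptions the disk has to sustain 32 MB/s and the recording occupies about 3.34 TB; a 3-byte format would scale both figures by 1.5.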

 

Reply: The manuscript does not specify whether the acoustic emission measurements were conducted using eight channels, as mentioned in your answer above. Clarifying whether all eight channels were active during the tests would help in understanding the full scope of the data acquisition and the system's capabilities. This detail is crucial for assessing the validity of the results and the performance of the DAQ system under review.

 

Response R2 for Bullet Point 25:

Yes, the manuscript has been revised according to your comment. You can see the modifications in Lines 628-629. We believe it is unnecessary to describe all eight activated channels, so we have only discussed the channels that are meaningful for the research.

 

 

Review 1 – Bullet Point 2

 

From my first review: Lines 46 / 47 - Why?

 

Response R1: Dear Reviewer, a sampling rate of at least 10 times the maximum frequency of interest is needed to reproduce an accurate waveform of the signal. A higher sampling rate offers a more accurate representation of the waveform. The factor of 10 is based on practical experience. A reference ([4]) has been added to provide the source of information as follows:

https://www.dataq.com/data-acquisition/general-education-tutorials/what-you-really-need-to-know-about-sample-rate.html

 

Reply: There is no theoretical basis for choosing a sampling rate that is 10 times the highest frequency. In the industry, this is often used as a rule of thumb to avoid artifacts resulting from the filter design of anti-aliasing filters or the SNR, for example. However, the term "needed" should not be used here. It is better to use "recommended".

By the way: the link in the reference does not work, if I click it. It sends me to https://www.dataq.com/data-acquisition/general-edu-803 and not to https://www.dataq.com/data-acquisition/general-education-tutorials/what-you-really-need-to-know-about-sample-rate.html

 

Response R2 for Bullet Point 2:

Thank you so much for pointing this out. A modification was made in Line 44 as per your recommendation.

The link in the reference was typed as text only. It cannot be opened by clicking it directly on my computer, and the same issue may occur on other computers. I believe the assistant editors can resolve this in the final step.

 

 

Comment: Figure Reference in wrong Format

Line 261: Fig 5 -> Figure 4

 

Response R2: Thank you so much for pointing this out. It was corrected in Line 202.

 

 

Review 1 – Bullet Point 17

 

From my first review: Lines 342 - 362 In general, it is good engineering practice not to exceed the working range at all. Increasing the operating range by 7% is a small gain compared to the potential damage to the DAQ device if the range is exceeded too much. In addition, the problem of clipping is merely postponed and not eliminated when the working range is increased.

 

Response R1: Dear Reviewer, we did not intend to expand the operating range of the DAQ device. However, the actual operating range of the NI-9233 DAQ is wider than its nominal range. This is why we introduced two Solutions for this issue.

 

Reply: You write that either the nominal range (+/- 10 V) can be used with clipping detection (Solution 1) or the full operating range (+/- 10.7 V) (apparently) without clipping detection (Solution 2). However, these are two completely different considerations.

 

  1. You can increase the operating range to gain additional headspace.

  2. You can do without clipping detection for performance reasons. This requires careful tuning of the input gain to ensure that the signals are not clipped. However, this is independent of whether the nominal or the actual working range is used.

 

Response R2 for Bullet Point 17:

Dear Reviewer, you are right that adjusting the input gain can control the range of signals. However, unwanted noise from sensors, amplifiers, or cables may still be present in the measurements. Additional headspace might not be sufficient to cover this noise. In this case, the noise must be clipped, as outlined in Solution 1. For better performance, we recommend Solution 2, which utilizes the actual range of the DAQ to cover all possible reading values, including noise.

 

 

We greatly appreciate your efforts in helping to improve our manuscript.

Thank you so much and Best Regards.

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

Please see my attached comments.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

We are extremely grateful for your detailed review of our manuscript. We have revised the manuscript in accordance with your comments. We hope all our responses make sense to you. Thank you for your valuable time spent on this manuscript.

Please take a look at our responses as follows:

 

 

1)         Statistical evaluations

In the first two rounds of review, I missed an essential aspect of your paper. In the section "Algorithms to improve data write rate" you discuss various methods and show the data write transfer rate as a function of time. The processes shown look like one-off runs. This raises the question of whether these processes are 100% repeatable. For reliable statements, it would have been very important to include a statistical discussion (with mean value and standard deviation). The results here should therefore be treated with a good deal of skepticism. In view of the fact that this is the third round of the review, however, it would be unfair to ask you to do so.

 

Response:

Dear Reviewer, thank you for your kind understanding. For every configuration in each approach, the writing test was conducted multiple times. There was no notable difference between the average write rates of these tests and those described in Table 2. This stability is attributed to the 5 GB of data used in each test, which is sufficiently large to ensure generalization. This content has been added to the manuscript in Lines 413-417. Thank you so much for pointing this out.
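The statistical summary requested by the reviewer (mean value and standard deviation over repeated runs) could be gathered with a harness along the following lines. This is a minimal sketch, not the test program used in the study: the function name `measure_write_rates`, the 1 MiB payload, and the five runs are placeholders chosen so the example finishes quickly, whereas the tests described above wrote 5 GB per run.

```python
import os
import statistics
import tempfile
import time


def measure_write_rates(payload_bytes=1 << 20, runs=5):
    """Write the same payload `runs` times and return each run's rate in MB/s."""
    payload = os.urandom(payload_bytes)
    rates = []
    for _ in range(runs):
        with tempfile.NamedTemporaryFile() as f:
            start = time.perf_counter()
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
            elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
        rates.append(payload_bytes / elapsed / 1e6)
    return rates


rates = measure_write_rates()
mean_rate = statistics.mean(rates)
std_rate = statistics.stdev(rates)  # sample standard deviation over the runs
print(f"mean = {mean_rate:.1f} MB/s, std = {std_rate:.1f} MB/s over {len(rates)} runs")
```

Reporting the standard deviation alongside the mean makes it possible to judge whether differences between approaches exceed the run-to-run scatter.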

 

2)         Binary formats

I am not completely convinced, but I respect your objections and adjustments to the binary formats. However, I expect at least the following final changes / adjustments:

Line 16 and line 125: Please remove the word "novel". This is not new.

 

Response:

Dear Reviewer, we agree that the concept of “optimized binary formats” is not entirely new, which is why we did not use the term “novel” to describe them. However, the term “novel” in Lines 16 and 25 was applied to the approaches of “handling binary formats” which involve a combination of reducing data file size and increasing write rates. We believe this aspect is new.

Nevertheless, to avoid further debate, we have removed the word "novel" as per your suggestion. We trust that readers will recognize the innovation in our algorithms.

 

 

Line 163: “some (or at least many) common methods” instead of most

 

Response:

Dear Reviewer, thank you for pointing this out. The manuscript has been revised accordingly. The phrase “a common method” was used in place of “the most common method”. The changes can be found in Line 160.

 

 

Line 185: “Some (or at least many) DAQ devices” instead of most DAQ devices

 

Response:

Dear Reviewer, thank you for pointing this out. The sentence has been revised to avoid using the phrase “most DAQ devices.” In this context, “12-, 16-, or 24-bit” are simply examples of a resolution, as every DAQ device has a measurement resolution. The changes can be found in Line 175.

 

 

3)         Utilization of the headroom

You have correctly emphasized that adjusting the input gain factor keeps the signal within the working range. However, it remains unclear how the additional headroom due to the actual voltage range of the DAQ is to be used without the risk of clipping, especially if clipping detection is omitted, as this is cited as a limitation when using the nominal working range. It would be helpful if you could explain in more detail the risks and impact on signal processing when using the extended voltage range. I am particularly interested in how clipping is avoided when the actual voltage range is used. I would also like to see a discussion about the amplification in connection with the two options.

 

Response:

Dear Reviewer, we believe that we have described this clearly in Lines 240-260. For more detail, please consider the following discussion:

DAQ device: National Instruments’ NI-9223 has a nominal analog input range of +/-10 V but can actually measure up to +/-10.7 V.

-           Case 1: using the nominal range, that is, +/-10 V and the 3-byte number format conversion algorithm. This algorithm describes values in the -10 to 10 V range by a scaled range 0 to 16,777,215.

For example, a reading value of 10.3 V exceeds the range of the conversion algorithm. This can occur due to various factors, such as incorrect amplifier settings, electrical noise, cable interference, or the impact of nearby electrical equipment. Using Equation 5, the scaled value would be 17,028,873, which lies outside the algorithm's range, leading to conversion errors. In this case, clipping is required.

-           Case 2: using the actual range, that is +/-10.7 V and the 3-byte number format conversion algorithm. This algorithm describes values in the -10.7 to 10.7 V range by a scaled range 0 to 16,777,215.

For example, suppose the input voltage (analog) is 11 V. In this case, the DAQ reads the value (digital) as 10.7 V, corresponding to the scaled value of 16,777,215, which is the maximum allowed by the device's actual range. The DAQ clips the value itself. Therefore, all reading values are always within the conversion range of the algorithm, and a separate clipping algorithm is not needed.
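The two cases can be reproduced with a short sketch. Equation 5 is not shown in this record, so the linear mapping below is an assumption; it is chosen because it reproduces the quoted value of 17,028,873 for 10.3 V. The names `scale_to_24bit`, `to_3_bytes`, and `MAX_CODE_24` are hypothetical.

```python
MAX_CODE_24 = 2**24 - 1  # 16,777,215, the largest value that fits in 3 bytes


def scale_to_24bit(voltage, v_min, v_max):
    """Assumed linear mapping of [v_min, v_max] onto the codes 0..16,777,215."""
    return round((voltage - v_min) / (v_max - v_min) * MAX_CODE_24)


def to_3_bytes(code):
    """Pack a code into 3 bytes; raises OverflowError if it does not fit."""
    return code.to_bytes(3, "little")


# Case 1: nominal range of +/-10 V. A 10.3 V reading overflows the 3-byte range.
code_case1 = scale_to_24bit(10.3, -10.0, 10.0)
print(code_case1)  # 17028873
try:
    to_3_bytes(code_case1)
    case1_overflows = False
except OverflowError:
    case1_overflows = True
# An explicit software clip is needed before the value can be stored.
clipped_case1 = min(max(code_case1, 0), MAX_CODE_24)

# Case 2: actual range of +/-10.7 V. The largest reading the device can return
# maps exactly onto the largest 3-byte code, so no software clip is needed.
code_case2 = scale_to_24bit(10.7, -10.7, 10.7)
print(code_case2, to_3_bytes(code_case2).hex())  # 16777215 ffffff
```

In Case 2 the clipping still happens, but in the DAQ hardware rather than in the encoding step, which is why the conversion can skip the range check.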

We hope the above discussion addresses your concern.

 

 

4)         Missing Bullet Points

Between Line 166 and Line 197, the content was previously organized in bullet points. I am not sure whether another reviewer criticized that, but I found the previous presentation better.

 

Response

Dear Reviewer, we have reviewed all previous versions of the manuscript, and bullet points were not used between Lines 159 and 208. Regardless, we are satisfied with the current format. Thank you so much for your suggestion.

 

 

5)         Actual measurement with the application of acoustic emission monitoring

Thanks for clarifying in line 637 that eight channels were activated. However, it is still not clear that all eight channels were recorded. The reported test case is interesting to read, but the focus of this paper lies in writing high data rates. So please point out that you also recorded the unused signals to prove your algorithm.

 

Response:

Dear Reviewer, thank you for pointing this out. The manuscript has been revised as per your recommendation. The changes can be found in Lines 643-649.

 

 

We greatly appreciate your efforts in helping to improve our manuscript. You truly deserve our deepest gratitude.

Thank you so much and Best Regards.
