Article

Algorithms to Reduce the Data File Size and Improve the Write Rate for Storing Sensor Reading Values in Hard Disk Drives for Measurements with Exceptionally High Sampling Rates

1
Division of Marine System Engineering, Korea Maritime and Ocean University, Busan 49112, Republic of Korea
2
Division of Marine Information Technology, Korea Maritime and Ocean University, Busan 49112, Republic of Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7410; https://doi.org/10.3390/app14167410
Submission received: 8 July 2024 / Revised: 18 August 2024 / Accepted: 20 August 2024 / Published: 22 August 2024
(This article belongs to the Section Acoustics and Vibrations)

Featured Application

This study developed algorithms to enhance the data write performance for exceptionally high sampling rate measurements, such as acoustic emission measurements, using conventional hard disk drives.

Abstract

This study aimed to enhance the data write performance in measurements with exceptionally high sampling rates, such as acoustic emission measurements. This is particularly crucial when employing conventional hard disk drives to store data. This study introduced algorithms for handling binary formats, thereby reducing the data file size, increasing write rates, and ultimately shortening data write times during measurements. The suggested approaches included utilizing specialized binary formats and implementing self-created buffers. These approaches resulted in a remarkable write time reduction of up to 40×. Furthermore, employing multiple drives for writing significantly enhanced performance compared with that of using a single drive. Therefore, the proposed algorithms offer promising results for managing large amounts of data in real time.

1. Introduction

Condition monitoring is a critical maintenance approach in which condition parameters of machines and structures are measured. These parameters, including vibration, noise, acoustic emission (AE), and temperature, indicate machine health. An abnormal parameter value can signal a developing defect, and timely identification can prevent major breakdowns or failures. One of the essential configurations of a condition-monitoring system is the sampling rate. The sampling rate, or sampling frequency, determines the number of samples per second (S/s) that are digitized by a data acquisition (DAQ) device. The required sampling rate depends on the maximum frequency of interest in the application. For instance, for structural vibration measurements in two-stroke diesel engines, the frequency range of interest is 2–100 Hz. For sound measurements, the frequency range of interest is typically 20 Hz–20 kHz, which covers the audible range for humans. Nyquist’s theorem and antialiasing filters play crucial roles in determining the minimum required sampling rate. According to Nyquist’s theorem, the sampling rate should be at least twice the highest frequency component of interest. In practice, a sampling rate of 2.56 times the highest frequency component is recommended to account for antialiasing filter effects [1,2,3]. When an accurate representation of the signal waveform is required, a sampling rate of at least 10 times the maximum frequency of interest is recommended [4].
The AE technique [5,6] is a special case of condition monitoring. In industry, AE testing is a preferred nondestructive testing technique [7]; such techniques allow an object or material to be examined without damaging it. The AE technique offers real-time monitoring and rapid fault recognition with high sensitivity. The installation of AE sensors is simple and does not disturb nearby sensors or the operation of the target mechanism. Using AE sensors, we can “listen” to the “sounds” generated by various types of faults, such as leaks [8,9] and growing cracks [10,11,12], as well as faults that have not yet fully developed [13,14]. By analyzing the AE parameters, the stage of material or structural degradation can be accurately determined. AE technology can also be used to locate [15,16] and characterize [17] impending failures, making it possible to prevent accidents before they occur.
AE frequency is an essential parameter for assessing the health and performance of materials or structures. This is based on the frequency of the elastic energy [18] released by irreversible changes in the internal structure of materials, such as crack formation or plastic deformation. Typically, the frequency of an AE ranges from a few kilohertz to several megahertz, although lower and higher frequencies can also be generated depending on the specific situation. In rare cases, AE has been reported at frequencies up to 100 MHz [19,20,21]. To ensure the accurate reconstruction of the signals after digitization, AE measurements require a very high sampling rate. Consequently, a large amount of data is acquired for each short consecutive period. Moreover, when multiple channels are simultaneously monitored, the amount of data that must be acquired and stored becomes even more significant. The write process must be sufficiently rapid to store the entire data being continuously streamed in real time.
Advancements in computer technology have resulted in significant improvements in the transfer rates of hard drives. A standard hard disk drive (HDD) can now offer a sequential write rate of approximately 160 MB/s [22], the maximum speed at which data can be continuously written to the drive. By contrast, solid-state drives (SSDs) demonstrate excellent write performance: a standard Serial Advanced Technology Attachment (SATA) SSD can provide a sequential write rate of over 500 MB/s, whereas a Nonvolatile Memory Express (NVMe) SSD can offer a write rate of more than 5000 MB/s. Given this performance, SSDs appear to be the best choice for applications requiring high-speed data processing and storage. However, their cost remains considerably higher: although SSDs have become cheaper, a significant price difference remains between HDDs and SSDs, and the capacity of an SSD is relatively limited compared with that of an HDD. Therefore, HDDs remain an efficient and economical solution for AE measurements, which require a large storage space for measurement data.
Among storage solutions, solid-state hybrid drives (SSHDs) [23] combine the advantages of traditional HDDs and SSDs, offering the large storage capacity of HDDs and the faster data write speeds of SSDs. The improved performance of an SSHD depends on its firmware and caching algorithms. When the firmware identifies certain data as frequently accessed, it stores those data in the NOT-AND (NAND) flash memory, achieving speeds comparable to entry-level SSDs. However, measurement data are typically not categorized as frequently accessed. In addition, once the NAND flash cache is full, the SSHD writes data to the traditional HDD platters, resulting in write speeds similar to those of standard HDDs. Therefore, for AE measurements, SSHDs do not provide significant write performance improvements over HDDs. Consequently, this study focuses solely on HDDs as the storage solution.
Although the sequential write rate of a standard HDD is approximately 160 MB/s, the actual write rate is generally lower when performing common tasks, particularly when dealing with text formats. Text formats are commonly employed to store sensor readings because they are accessible through various text editor programs and convenient for general computer users. However, this convenience comes at the cost of slow and inefficient data storage, owing to high memory consumption and suboptimal write rates. Hence, text formats are not preferable for AE measurements, particularly those employing HDDs.
To address these issues, binary file formats are recommended. These formats offer significant performance benefits over text file formats. Binary file formats are designed for direct access, offering much higher speeds than formats that require parsing and interpretation. Values stored in binary formats occupy the same space on disk as they occupy in memory, ensuring efficient utilization of storage resources. Furthermore, most general-purpose programming languages support encoding a number as a series of bytes in binary format, and vice versa, simplifying the process of working with binary file formats across different programming environments.
The ISO/IEC 60559:2020 [24] (IEEE 754 [25,26]) standard is the most widely used for representing real numbers in binary form. This standard defines binary encoding algorithms that can represent real numbers over large ranges, thereby enabling the storage of a wide range of sensor reading values. However, this may not be the most efficient method of encoding sensor reading values, as these values typically have a limited range for each measuring channel. For instance, sensor reading values are generally acquired as real numbers within a range of ±10 V (voltage) or 4–20 mA (current), depending on the specification of the DAQ device employed. In addition, each DAQ device has a specific measurement resolution that limits the number of possible measurement values [27]. Using the standard algorithms to store sensor reading values can therefore waste storage resources. To optimize data storage, data-encoding algorithms should be specialized with respect to the characteristics of the target values, ensuring that the required memory is minimized.
To this end, this study aimed to improve the efficiency of storing sensor reading values in tasks with exceptionally high sampling rates, such as AE measurements. This was achieved by introducing algorithms for handling binary formats, thereby reducing the data file size and increasing write rates. These algorithms maximize the benefits of binary formats with the goal of storing large amounts of data rapidly and accurately without requiring expensive storage upgrades. This allows multichannel AE measurements to perform well when employing a conventional HDD, at a considerably lower cost than investing in more expensive storage solutions. Accordingly, the number of bytes required to store a value can be minimized by encoding data using a specialized binary format based on the measurement range and resolution of the employed DAQ device, resulting in a smaller file size. In addition, the algorithms exhibited a remarkable enhancement in write performance, allowing sensor readings to be stored at a rate of hundreds of mega-samples per second (MS/s) or even higher. This capability supports not only AE measurements but also scanning acoustic microscopy (SAM) [28,29] using HDDs, which are acknowledged for their limited write performance. High sampling rates ensure that a SAM system can accurately reconstruct acoustic signals, producing detailed images of a specimen’s internal structure or surface.
When obtaining measurements, monitoring programs are often necessary to perform multiple tasks concurrently. These tasks include acquiring, analyzing, visualizing, encoding, and writing sensor readings in real time. Furthermore, in high-performance measurements, such as those required for acoustic emission applications, the workload is considerably higher. This requires advanced configurations to ensure the efficient functioning of all tasks, even when computer resources are limited. To address these challenges, the proposed algorithms were used to develop a condition-monitoring program. A series of measurements were conducted to verify its ability to adapt to high-performance measurements.

2. Algorithms for Data Encoding to Minimize the File Size

Figure 1 shows a schematic of the measurement system that employs the algorithms developed in this study. This measurement system is tasked with simultaneously executing various functions, such as acquiring, analyzing, visualizing, encoding, and writing sensor readings in real time. Each task must be optimized to improve the efficiency of the system. First, data-encoding algorithms should be customized to match the characteristics of the target values, thereby minimizing the required data storage.
In typical vibration measurements, sensor reading values are acquired in real time. A common method for storing these values is to use text formats. However, binary formats can achieve a smaller file size while retaining the same information as plain text formats. In a binary file, data are stored in the same manner as in main memory, which provides direct file access. In addition, binary formats allow data to be processed in large chunks, significantly improving read and write performance. Some standard formats represent real numbers in binary form. For example, the double-precision floating-point (or Double) format is generally applied to represent real numbers with high precision. In this study, we use terms from the C# programming language; they may have different names but the same functionality in other languages. A Double value can be represented in binary format and requires eight bytes of memory [30]. However, the range covered by this standard format is extensive, approximately ±5.0 × 10^−324 to ±1.7 × 10^308, making it inefficient for representing sensor reading values, which typically have limited ranges and specific resolutions. Consequently, specialized encoding algorithms that optimize the storage of sensor readings based on their specific ranges and resolutions are needed.
Every DAQ device has a measurement resolution, typically 12-, 16-, or 24-bit, which indicates that the device’s analog-to-digital converter (ADC) returns the given number of bits for each input sample. This implies that ADCs already represent values in binary. If a DAQ device allows the reading of raw binary values from its ADC, these data can be stored directly without encoding, leading to efficient storage utilization. However, for user convenience, most DAQ devices present reading values as real numbers. Therefore, specialized data-encoding algorithms are needed to efficiently convert these values into binary data. To achieve this, the resolution and analog input range of the DAQ must be considered. The resolution determines the number of levels used to digitize the sensor reading values, which in turn determines the smallest signal change the device can detect. With a 16-bit resolution, an ADC can represent 2^16 = 65,536 evenly spaced discrete levels, whereas a 24-bit ADC can represent 2^24 = 16,777,216 levels. For example, a typical DAQ device has an analog input range of −10 to 10 V with a 16-bit resolution for each channel. In this case, the smallest signal change that the DAQ can detect is:
$$\frac{10-(-10)}{2^{16}-1}=\frac{20}{65{,}535}=3.05\times10^{-4}\ \text{V}\tag{1}$$
If the resolution is 24-bit, the smallest detectable signal change is:
$$\frac{10-(-10)}{2^{24}-1}=\frac{20}{16{,}777{,}215}=1.19\times10^{-6}\ \text{V}\tag{2}$$
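The two resolution formulas above can be captured in a one-line helper. The paper works in C#, but the arithmetic is language-independent, so a Python sketch is used here (the function name is illustrative, not from the paper):

```python
def smallest_change(v_min: float, v_max: float, bits: int) -> float:
    """Smallest detectable signal change: the input span divided by
    the number of code steps, 2**bits - 1."""
    return (v_max - v_min) / (2 ** bits - 1)

lsb_16 = smallest_change(-10.0, 10.0, 16)   # ~3.05e-4 V for a 16-bit ADC
lsb_24 = smallest_change(-10.0, 10.0, 24)   # ~1.19e-6 V for a 24-bit ADC
```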
For a DAQ device with a 24-bit resolution and a ±10 V analog input range, the data storage format used to represent the sensor reading values should satisfy this range and resolution. The 3-byte number format is suitable for this purpose. Thus, each sensor reading value is encoded as a scaled value consisting of three bytes. Figure 2 illustrates the configuration of an example value encoded in the 3-byte format, where the scaled value is determined by a group of three bytes with values $a_0$, $a_1$, and $a_2$, respectively.
The 3-byte number format provides a 24-bit resolution, where each byte, containing 8 bits, can represent 1 of 256 distinct values ranging from 0 to 255. This gives the 3-byte number format a total of $256^3 = 16{,}777{,}216$ possible values ($2^{24}$). In the example described in Figure 2, the scaled value, $X$, is calculated as follows:
$$X = a_0\times256^0 + a_1\times256^1 + a_2\times256^2 = 6\times256^0 + 30\times256^1 + 150\times256^2 = 9{,}838{,}086\tag{3}$$
The minimum possible scaled value is $X_{min} = 0$, which occurs when $a_0 = a_1 = a_2 = 0$. The maximum possible scaled value is $X_{max} = 16{,}777{,}215 = 2^{24}-1$, which occurs when $a_0 = a_1 = a_2 = 255$. To represent the full input range of acquired values, $A$, the lower limit, $A_{min} = -10$ V, is assigned to $X_{min}$, and the upper limit, $A_{max} = 10$ V, is assigned to $X_{max}$. The conversion algorithm is depicted in Figure 3 and can be mathematically expressed as follows:
$$X = \mathrm{Round}\!\left(16{,}777{,}215\times\frac{A+10}{20}\right)\tag{4}$$
or:
$$A = \frac{20X}{16{,}777{,}215} - 10\tag{5}$$
In this case, the scaled value $X = 9{,}838{,}086$ represents the acquired value $A = 1.727913$. The Round method in Equation (4) rounds a value to the nearest integer. During the measurements, the value $A$ is acquired from the analog inputs, and $X$ is then determined using Equation (4). The next issue is how to store these values in the 3-byte number format. The three components can be calculated as follows:
$$a_2 = \mathrm{Floor}\!\left(\frac{X}{256^2}\right)\tag{6}$$
$$a_1 = \mathrm{Floor}\!\left(\frac{X - a_2\times256^2}{256}\right)\tag{7}$$
$$a_0 = X - a_2\times256^2 - a_1\times256\tag{8}$$
The floor method returns the largest integer value that is less than or equal to a given number. Equations (6) and (7) can be understood by considering the integer part after the division.
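The scale-and-decompose round trip described above can be sketched as follows. The paper’s implementation is in C#; this Python sketch (with illustrative function names) reproduces the same arithmetic for a 24-bit, ±10 V device. Note that Python’s `round`, like C#’s default `Math.Round`, rounds midpoints to the nearest even integer:

```python
FULL_24 = 2 ** 24 - 1  # 16,777,215, the maximum scaled value X

def encode_3byte(a: float, a_min: float = -10.0, a_max: float = 10.0) -> bytes:
    """Scale an acquired value A into X = Round(FULL * (A - Amin)/(Amax - Amin)),
    then split X into its three base-256 digits a0, a1, a2."""
    x = round((a - a_min) / (a_max - a_min) * FULL_24)
    a2, rem = divmod(x, 256 ** 2)   # Floor(X / 256^2) and the remainder
    a1, a0 = divmod(rem, 256)
    return bytes([a0, a1, a2])

def decode_3byte(b: bytes, a_min: float = -10.0, a_max: float = 10.0) -> float:
    """Inverse conversion: rebuild X from its base-256 digits, then map
    it back onto the acquired-value range."""
    x = b[0] + b[1] * 256 + b[2] * 256 ** 2
    return (a_max - a_min) * x / FULL_24 + a_min

# Worked example from the text: A = 1.727913 maps to X = 9,838,086,
# i.e., bytes (a0, a1, a2) = (6, 30, 150).
assert encode_3byte(1.727913) == bytes([6, 30, 150])
```

The `round` and `divmod` calls correspond directly to the Round and Floor operations of the conversion formulas.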
The combination of a 24-bit resolution and a ±10 V analog input range offers a smallest detectable signal change of approximately 0.000001 V, according to Equation (2). Thus, representing an acquired value in text format needs seven digits, along with a dot symbol, a plus/minus sign, and a delimiter, totaling ten characters. Therefore, 10 bytes are required to store each value acquired by the DAQ. However, the 3-byte number format can represent the same value using only 3 bytes, thereby reducing the file size by 3.3 times while maintaining the precision of the sensor reading values.
Indeed, most DAQ devices used for AE measurements have a 16-bit resolution, meaning that they require a number system that can represent 2^16 = 65,536 discrete levels. The 2-byte number format fully meets this requirement: its algorithm is the same as that of the 3-byte number format, but with $a_2$ always equal to zero (Equation (6)). Therefore, this format uses only two bytes, $a_0$ and $a_1$, to represent a value, following Equations (7) and (8). A DAQ device with a 16-bit resolution and a ±10 V analog input range offers a smallest detectable signal change of 0.0003 V, as presented in Equation (1). Representing an acquired value as text requires five digits; in the same manner as described above, eight bytes are required to store that value. Therefore, the 2-byte number format reduces the file size by 4 times compared with the text format. The 2-byte format can also be employed to represent sensor reading values with a 12-bit resolution. In addition, if a 32-bit resolution is needed in special cases, the 4-byte number format satisfies the data storage requirement in the same manner.
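For the common 16-bit case, the same scaling followed by a two-byte split can be written compactly with the standard `struct` module; again, this is a hedged Python sketch rather than the paper’s C# code, and a little-endian unsigned short gives the same $(a_0, a_1)$ byte order as the base-256 decomposition:

```python
import struct

FULL_16 = 2 ** 16 - 1  # 65,535

def encode_2byte(a: float, a_min: float = -10.0, a_max: float = 10.0) -> bytes:
    """16-bit variant: scale A into X in [0, 65535], then emit the two
    base-256 digits (a0, a1) as a little-endian unsigned short."""
    x = round((a - a_min) / (a_max - a_min) * FULL_16)
    return struct.pack("<H", x)

def decode_2byte(b: bytes, a_min: float = -10.0, a_max: float = 10.0) -> float:
    (x,) = struct.unpack("<H", b)
    return (a_max - a_min) * x / FULL_16 + a_min
```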
Notably, the actual range of a DAQ device is often wider than its nominal range. For example, National Instruments’ NI-9223 DAQ device has a nominal analog input range of ±10 V but can actually measure up to ±10.7 V. If the acquired value falls outside the configuration range, the conversion algorithm encounters errors. Several solutions are available to address this issue, including:
Solution 1: Using the nominal range; that is, ±10 V. If a value exceeds this range, such as −10.2 or 10.3, it is clipped to a limit value of −10 or 10, respectively. However, this solution requires additional signal processing. Condition-monitoring programs must check every value in real time, whether the value is in or out of range, and clip it if necessary. This task is insignificant for one or even thousands of values. However, with large amounts of data (tens of millions of values per second), it can become a significant workload that requires a long time to process. This process may require tens of milliseconds and should be considered carefully because the monitoring system must complete several other tasks within a limited time.
Solution 2: Using the actual range of the DAQ device. In this solution, test engineers must manually enter the actual analog input range of the connected DAQ devices or load them from a database of condition-monitoring programs. This approach ensures that all acquired values are within the configured range, thus eliminating the need for a monitoring program to check every value. Hence, the sensor reading values can be directly converted into binary formats, thereby reducing the processing time and optimizing performance.
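Solution 1’s clipping step can be sketched as below (Python, with an illustrative function name). The point of Solution 2 is that this per-value branch disappears entirely once the configured range already covers the device’s actual range:

```python
def clip(a: float, a_min: float = -10.0, a_max: float = 10.0) -> float:
    """Solution 1: clamp an out-of-range reading to the nominal limits.
    At tens of millions of values per second, this extra comparison per
    value is exactly the workload that Solution 2 avoids."""
    return a_min if a < a_min else a_max if a > a_max else a
```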

3. Algorithms to Improve the Data Write Rate

3.1. Data Write Performance Tests Using Text Formats

Write performance tests were conducted using a popular Western Digital HDD. The drive specifications are listed in Table 1, where the drive is identified as HDD #1. In C#, various methods are available for writing text to a file; one of the most commonly used is the StreamWriter class in the System.IO namespace [31]. The testing procedure involved the following steps:
  • Step 1: A total of 10,000,000 values were created as the Double type.
  • Step 2: A new blank file was created.
  • Step 3: The values were written into the file. Two options are available:
    - Each value is converted into a string and then written into the file individually.
    - All values are converted into strings and concatenated into one large string, which is then written into the file in a single operation.
  • Step 4: The file was closed.
The highest observed average write rate was 16 MB/s. This rate is insufficient to satisfy high write performance demands, such as those required for AE measurements.
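The paper’s test uses C#’s StreamWriter; an equivalent text-format benchmark can be sketched in Python as follows (function name and value generation are illustrative). It follows the second option in Step 3: all values are converted to strings and written as one large string, and the achieved rate is reported in MB/s:

```python
import os
import time

def text_write_benchmark(path: str, n: int = 1_000_000) -> float:
    """Convert n Double-like values to strings, write them to a text
    file as one large string, and return the achieved rate in MB/s."""
    values = [i * 1e-6 for i in range(n)]   # simulated sensor readings
    t0 = time.perf_counter()
    with open(path, "w") as f:
        f.write("\n".join(str(v) for v in values))
    elapsed = time.perf_counter() - t0
    return os.path.getsize(path) / elapsed / 1e6
```

Actual rates depend on the drive and the string-conversion overhead; the paper observed at most 16 MB/s on HDD #1.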

3.2. Data Write Performance Tests Using Binary Formats

Using the same HDD, a series of tests adopting various writing approaches were performed to identify the most optimal data-writing algorithm. Prior to each test, a data pack was created comprising several one-dimensional byte arrays, totaling 5 GB (5,368,709,120 bytes). The tests utilized the FileStream class [31] from the System.IO namespace, which is a widely used and efficient approach to write a sequence of bytes to a file stream. The entire dataset can be written to either a single file or multiple files of equal size.

3.2.1. Approach 1. Writing the Values into Files Immediately after They Are Acquired Using Internal Buffers

The measurement typically begins by creating a new blank file. The sensor reading values are then continuously written into this file as soon as they are acquired. The file is eventually closed when the measurement is complete. This process can be repeated for multiple measurements. The testing procedure involved the following steps:
  • Step 1: A new blank file was created.
  • Step 2: Overall, 1 MB of data was read from the data pack and written into the data file. This step simulates the process of acquiring the sensor reading values and then writing them into a file.
This step was repeated until the file size reached 128 MB.
  • Step 3: The current file was closed, and the process was started again from Step 1. A new file was created to save the data.
  • The process was repeated from Step 2 until the entire 5 GB of data had been written.
In this manner, the entire dataset was written into multiple equal-sized 128 MB files, each named in the format “DATA-xxx.bin”, where “xxx” is the file number (e.g., “DATA-001.bin”, “DATA-002.bin”, etc.).
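Approach 1 can be sketched as a simple file-rotation loop (Python, with illustrative names): each incoming chunk is written immediately, and a new “DATA-xxx.bin” file is started once the current file reaches the configured size:

```python
import os

def write_rotated(chunks, out_dir, file_size=128 * 2 ** 20, prefix="DATA"):
    """Approach 1 sketch: write each chunk as soon as it is available,
    rolling over to DATA-001.bin, DATA-002.bin, ... every `file_size`
    bytes. Returns the number of files produced."""
    f, idx, written = None, 0, 0
    for chunk in chunks:
        if f is None:
            idx += 1
            f = open(os.path.join(out_dir, f"{prefix}-{idx:03d}.bin"), "wb")
            written = 0
        f.write(chunk)          # relies only on the stream's internal buffer
        written += len(chunk)
        if written >= file_size:
            f.close()
            f = None
    if f is not None:
        f.close()
    return idx
```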
In fact, when data are written into a file, they are not immediately stored on a physical disk. Instead, they are temporarily copied into the buffer memory. A buffer can be understood as a “waiting area” for data before writing. It refers to a chunk of memory (typically in RAM) where the data are temporarily gathered. Data that arrive earlier must wait for a certain amount of data to arrive before being sent for storage in a file. When a new file is created, an internal buffer of a configurable size is generated. If no specific buffer size is specified, the default buffer size is used during the writing process. Various buffer configurations were employed for testing, as outlined below.
Option 1: Using the default buffer size. The default buffer size used was 4096 bytes. As described in Step 2, the size of the data acquired for each write was 1 MB, which was larger than the buffer size. With this configuration, the data were written immediately into the files for every 1 MB acquired. The write performance is illustrated in Figure 4. The average data write rate was 124 MB/s.
In another test, the size of each file was increased to 256 MB, resulting in an average data write rate of 125 MB/s. Additional tests were conducted using the following configurations:
  • Increasing the size of each file to 512 MB.
  • Increasing the read data chunk size to 5, 10, and 20 MB.
Nevertheless, these tests did not reveal any notable difference in write performance.
Option 2: Using a configured buffer size. Similar to the previous tests, the file size was first configured to 128 MB, and then to 256 MB. The buffer sizes were 8, 16, 32, 64, 128, and 256 MB. With these configurations, the acquired data were temporarily gathered in buffers until sufficient data were accumulated for writing into the files. The results indicated that larger buffer sizes resulted in a better write performance for the same file size configuration. For each file size setting, the best performance was achieved when the buffer size matched the file size. The average write speeds were 150 and 154 MB/s for buffer sizes of 128 and 256 MB, respectively, as illustrated in Figure 5. However, setting the buffer size larger than the file size did not yield any significant improvements.
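The effect of a configurable internal buffer can be reproduced with the `buffering` argument of Python’s `open`, the rough analog of the FileStream buffer-size parameter in the paper’s C# tests (sizes here are illustrative): small writes accumulate in RAM and reach the disk in large batches.

```python
def write_with_buffer(path: str, buffer_size: int, chunk: bytes, count: int):
    """Write `count` copies of `chunk` through an internal buffer of
    `buffer_size` bytes; data is gathered in RAM until the buffer fills
    or the file is closed."""
    with open(path, "wb", buffering=buffer_size) as f:
        for _ in range(count):
            f.write(chunk)
```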

3.2.2. Approach 2. Storing the Values in a Self-Created Buffer before Writing

In this approach, a buffer memory is manually created to temporarily store the values. These values are only written into a file when the buffer becomes full. The test procedure is described below:
  • Step 1: An array was created to serve as a buffer memory.
  • Step 2: A new blank file was created.
  • Step 3: The values were read from the data pack and copied into the buffer. When the buffer became full, all the values were copied into a data block.
  • Step 4: The data block was written into the file, which was then closed.
  • Step 5: The buffer was reused as a new buffer for new values. A new file was created to store the data.
The process was repeated from Step 3 until the entire 5 GB of data were written.
A series of tests were conducted with different buffer size settings. The results are shown in Figure 6 and presented in Table 2. The tests revealed that increasing the buffer size improved the write rate. The improvement observed when the buffer size was increased from 128 to 256 MB was less significant than that observed from 64 to 128 MB, whereas the improvement from 256 to 512 MB was again more pronounced. Furthermore, increasing the buffer (and file) size beyond 512 MB did not result in a significant improvement in write performance.
When conducting actual measurements, Step 5 must begin immediately after Step 3 so that newly acquired data can be processed. However, Step 4 requires some time to complete. To overcome this problem, Step 4 is performed in a new thread; by utilizing multithreading, Steps 4 and 5 can run simultaneously.
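Approach 2, including the threaded Step 4, can be sketched as follows (Python; the class and method names are illustrative, not from the paper). The buffer is self-created, each full buffer is written to its own new file, and the write runs on a separate thread so that acquisition can continue:

```python
import os
import threading

class BufferedRecorder:
    """Approach 2 sketch: values accumulate in a self-created buffer;
    when it fills, the block is handed to a writer thread so that
    acquisition (Step 5) continues while the write (Step 4) runs."""

    def __init__(self, out_dir: str, buffer_size: int):
        self.out_dir = out_dir
        self.buffer_size = buffer_size
        self.buf = bytearray()        # the self-created buffer (Step 1)
        self.idx = 0
        self.threads = []

    def push(self, chunk: bytes) -> None:
        """Step 3: copy newly acquired values into the buffer."""
        self.buf.extend(chunk)
        if len(self.buf) >= self.buffer_size:
            block, self.buf = bytes(self.buf), bytearray()  # Step 5: fresh buffer
            self.idx += 1
            path = os.path.join(self.out_dir, f"DATA-{self.idx:03d}.bin")
            t = threading.Thread(target=self._write, args=(path, block))
            t.start()                 # Step 4 runs on its own thread
            self.threads.append(t)

    def _write(self, path: str, block: bytes) -> None:
        with open(path, "wb") as f:   # one new file per write: no thread conflicts
            f.write(block)

    def close(self) -> None:
        # join writer threads (a trailing, partially filled buffer is
        # ignored in this sketch)
        for t in self.threads:
            t.join()
```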

3.2.3. Approach 3. Parallel Writing

Several tests were conducted to verify whether writing data in parallel into multiple files improves write performance. Accordingly, the 5 GB data pack was split into multiple equal-sized blocks and then written into separate files concurrently using parallel processing. Each data block was written into its own file by a separate thread. The buffer and file sizes were set equal, at 8, 16, 32, 64, 128, 256, and 512 MB, as in Approach 2. The results, as illustrated in Figure 7, showed that file size configurations of 128 and 256 MB resulted in average write rates of 110 and 120 MB/s, respectively. These rates were lower than the rates of 157 and 160 MB/s achieved with sequential file writes, as described in Approach 2. This indicates that writing multiple files concurrently is less efficient than writing each file sequentially. Similar results were observed when testing smaller file sizes, such as 8, 16, 32, and 64 MB.
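For reference, the parallel-write test of Approach 3 can be sketched with a thread pool (Python, illustrative names). On a single HDD, the paper found this slower than sequential writes, presumably because concurrent streams force extra head seeks:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def write_parallel(blocks, out_dir):
    """Approach 3 sketch: each equal-sized block goes to its own file,
    written concurrently by a separate thread."""
    def write_one(item):
        i, block = item
        with open(os.path.join(out_dir, f"DATA-{i:03d}.bin"), "wb") as f:
            f.write(block)

    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        list(pool.map(write_one, enumerate(blocks, start=1)))
```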

3.2.4. Discussion

In applications such as AE and vibration monitoring, sensor reading values are frequently acquired in real time as data chunks. Writing several small chunks into a single file requires more time than writing a few large chunks of the same total size. Storing every data chunk in a file as soon as it becomes available therefore leads to poor write performance, so Approach 1 is not recommended. To achieve better write performance, it is recommended to temporarily store the available data in a buffer and wait for more data to arrive before writing all of them into a file. The buffer size should be configured large enough to hold the data temporarily before writing. Typically, an internal buffer is generated when a new file is created (or an existing file is opened). However, this internal buffer is used for both reading and writing, so the write performance is not as good as expected. Instead, a self-created buffer is recommended, as described in Approach 2. Using this approach, the average write rate reached its best value, 162 MB/s, with a 512 MB buffer.
As described above, each write should be performed using a new thread. To avoid access conflicts between the threads, a new file is created for each write. Creating separate threads to write to a single file presents the risk of one thread trying to access the file while another thread is still accessing it, resulting in errors. However, multiple-file parallel writing should be avoided due to poor performance, as demonstrated by the results of Approach 3.
Compared with the 16 MB/s write rate when storing data as text files, binary formats provide a roughly 10× faster write rate. In addition, the 2-byte binary format reduces the file size by 4× at the same precision. Combined, the faster write rate and smaller file size yield a 40× reduction in write time, that is, a 40× better write performance. At a 16-bit resolution, 2 bytes store one value, so a write rate of 162 MB/s supports a measurement rate of 81 MS/s. At a 24-bit resolution, 3 bytes store one value, so the same write rate supports 54 MS/s. These capabilities satisfy the high sampling rate requirements of AE measurements.
In this study, the buffer sizes were selected as powers of two, such as 512, 256, or 128 MB, for ease of implementation. However, the tests showed that the difference in the data write rates between the 130 MB and 128 MB configurations was not remarkable. Therefore, it is not mandatory to use a buffer size with a power of 2, as the write rate increases regardless of the buffer size. Additionally, for every configuration in each approach, the writing test was conducted multiple times. There was no remarkable difference between the average write rates of these tests and those described in Table 2. This stability is attributed to the 5 GB of data used in each writing test, which was sufficiently large to ensure generalization.

3.2.5. Approach 4. Multidrive Write

In computer science, a technology called redundant array of independent disks (RAID) combines two or more hard drives. Various RAID levels are available, with RAID 0 offering the best performance: the data are divided into blocks and written alternately across multiple drives simultaneously, as shown in Figure 8. This technique significantly reduces write latency, leading to a significant boost in write performance. A combination of two similar drives in RAID 0 offers twice the write rate and double the capacity of a single drive; with three similar drives, the gain triples, and so on.
RAID 0 offers a superior write performance. However, it is difficult for engineers to implement this technique owing to the following disadvantages:
  • The entire data on the hard drives are erased during the RAID setup.
  • A lack of flexibility in the configuration, such as adding or removing a drive, necessitates starting anew and erasing all the existing data.
  • Data cannot be read if one of the drives is damaged.
  • To maximize the write performance, all hard drives must have the same capacity and write rate. For example, combining a drive with a 512 GB capacity and a 120 MB/s write rate and a drive with a 128 GB capacity and a 150 MB/s write rate in a RAID 0 system results in a capacity of 128 × 2 = 256 GB and a write rate of 120 × 2 = 240 MB/s; the achievable level is limited by the lowest capacity and the lowest rate.
Thus, this study proposes a solution that addresses these concerns. Like RAID 0, it requires at least two drives to store the data. Three HDDs were employed for testing; their specifications are listed in Table 3. HDD #1 was the hard drive used in the previous tests. Write performance tests were conducted for each HDD following the procedure described in Section 3.2.2: the 5 GB of data were written into multiple equal-sized 512 MB files. Although HDDs #1 and #2 had the same manufacturing specifications, their write performances differed owing to their different runtimes (power-on hours). The tests showed average data write rates of 162, 172, and 138 MB/s for the three drives, respectively.
Herein, we describe a solution that utilizes a combination of three drives to enhance write performance. The write algorithm involved modifications to Approach 2. In addition, a 15 GB data pack was created for testing. The test procedure was performed as follows:
  • Step 1: The configurations were set, as described in Figure 9:
    - Three distinct folders were selected to store the data, each located on a different drive.
    - The buffer size was set (the configuration of the self-created buffer).
  • Step 2: A new data file located in folder #1 was created.
  • Step 3: The values were read from the data pack and copied into the buffer. When the buffer became full, all the values were copied to a data block.
  • Step 4: A new thread was created to write the data block to the file and then close it.
  • Step 5: The buffer was reused as a new buffer for the new values. A new file located in folder #2 was created to store the data.
The process was repeated with folders #2 and #3 and then returned to folder #1 until all data had been written.
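Steps 1–5 above amount to a round-robin, buffered writer. A compact sketch follows; the paths, file naming, and sizing are illustrative, not the authors' implementation.

```python
import itertools
import os
import threading

def multidrive_write(chunks, folders, buffer_size):
    # Cycle through the folders (one per drive) so that consecutive
    # buffered blocks land on different drives, as in Figure 9.
    buffer = bytearray()
    threads = []
    folder_cycle = itertools.cycle(folders)
    file_index = 0

    def flush(block, folder, index):
        # One new file per block avoids access conflicts between threads.
        path = os.path.join(folder, f"data_{index:04d}.bin")
        with open(path, "wb") as f:
            f.write(block)

    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) >= buffer_size:
            t = threading.Thread(
                target=flush, args=(bytes(buffer), next(folder_cycle), file_index)
            )
            buffer.clear()
            file_index += 1
            t.start()
            threads.append(t)
    if buffer:  # write any remaining data as a final, smaller block
        t = threading.Thread(
            target=flush, args=(bytes(buffer), next(folder_cycle), file_index)
        )
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```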
Finally, the entire dataset was written into multiple equal-sized 512 MB files located in order across the three drives, resembling RAID 0 technology. A simple algorithm was implemented to ensure that, at any given time, only one file was written into a drive. This implementation aims to prevent the reduction in write performance caused by parallel writing, as discussed in Approach 3. Consequently, this approach resulted in an excellent write performance, with an average rate of 472 MB/s, which is the summed write rate of all three drives.
This approach provides a solution that offers a high write performance of RAID 0 while addressing all its drawbacks, as follows:
  • All the existing data on the hard drives are maintained.
  • The configuration is flexible, allowing the addition or removal of a hard drive without affecting the existing data.
  • Data on the remaining drives are still accessible if one of the drives is damaged.
  • The use of hard drives with similar write rates is recommended, although the same capacity is not necessary.
During the test, we observed that initially, when all three drives were operating, the overall rate equaled the sum of their individual write rates. However, HDD #3, which had the slowest write rate, required more time to complete its tasks. Consequently, the overall write rate decreased to align with that of HDD #3. To address this issue, solutions can be implemented to distribute workloads based on the write rates of each drive, as outlined below.
Solution 1: Utilizing three separate buffers, each configured for a specific drive. As mentioned above, the average data write rates of the three drives were 162, 172, and 138 MB/s, respectively. For all drives to finish their work concurrently, the ratio of buffer size to write rate must be equal across the drives. Using the intended buffer size of 512 MB as a reference, the buffer sizes for the three drives can be configured as 482, 512, and 410 MB, respectively. The best drive, HDD #2, was assigned the full 512 MB buffer, which minimized the workload on drives #1 and #3, which had lower write rates.
To save memory, a single 512 MB buffer can be used instead of three separate buffers, although this method is more complex. Step 3 of this approach is then tailored to the writing turn of each drive, as follows: read values from the data pack and copy them to the buffer; once the buffer reaches the designated size for the current drive (482, 512, or 410 MB), copy these values to a data block.
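Solution 1's buffer sizing reduces to keeping the buffer-size-to-write-rate ratio constant across drives. Rounding down, as sketched below, reproduces the 482/512/410 MB figures quoted above; the function name is an assumption for the example.

```python
def proportional_buffer_sizes(write_rates, reference_size):
    # Give the fastest drive the reference buffer size and scale the
    # others down in proportion to their write rates, so that all
    # drives finish writing their buffers at the same time.
    fastest = max(write_rates)
    return [reference_size * rate // fastest for rate in write_rates]
```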
Solution 2: Multiplying the work on good drives. This solution is suitable when a significant difference exists in the write rates of the drives. For instance, let us assume that the write rate of drive #1 is almost the same as that of drive #2 but is approximately twice that of drive #3. In this scenario, the workloads for drives #1 and #2 should be twice that of drive #3. For this purpose, in Step 1, five folders are selected as the drives in the following order for the write cycle: (#1)–(#2)–(#3)–(#1)–(#2).
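Solution 2 simply repeats the faster drives within one write cycle. A sketch of building the folder order is shown below; the weights are the per-cycle write counts estimated from the drives' relative rates, and the function name is illustrative.

```python
def build_write_cycle(workload_weights):
    # workload_weights maps a drive label to the number of writes it
    # should receive per cycle; interleave the drives round by round.
    max_weight = max(workload_weights.values())
    cycle = []
    for round_index in range(max_weight):
        for drive, weight in workload_weights.items():
            if weight > round_index:
                cycle.append(drive)
    return cycle
```

For the example in the text, weights of 2, 2, and 1 for drives #1, #2, and #3 yield the cycle (#1)–(#2)–(#3)–(#1)–(#2).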
Distributing the workloads based on the write rates of each drive ensures that the maximum write rate is always achieved; however, the solutions are somewhat complex. To simplify this process, the use of drives with similar write rates is recommended.
Approach 4 allows data to be written across multiple drives, yielding a combined write rate equal to the sum of the write rates of all the drives. A configuration log file should be created to store the parameters, options, and settings of each measurement; it provides sufficient information to gather the individual files and combine them into one large file containing the entire measurement, although each data file can also be read independently. The configuration log file likewise supplies the information needed to convert the binary data files into text formats. This approach facilitates data storage rates of hundreds of MS/s using HDDs, and even higher rates when more drives are used. Furthermore, it enables AE measurements using older HDDs with lower write rates.

4. Verification of High-Performance Measurements

All the presented algorithms were utilized to develop a condition-monitoring program called the Vibration Monitoring and Analysis System (VMAS), designed to operate with DAQ devices from National Instruments. In the aforementioned tests, the data pack was generated in advance and required only reading and writing. The process is more complex in actual measurements, however: a monitoring program must perform multiple tasks simultaneously, including:
  • Acquiring sensor reading values from the DAQ device.
  • Analyzing the data in real time to detect any anomalies or trends.
  • Visualizing the data to provide meaningful insights.
  • Encoding the data and writing them into storage.

4.1. Virtual Measurement Using Simulated DAQ Devices

To test and validate the capabilities of the program, various DAQ devices were simulated using the NI Measurement and Automation Explorer (MAX) program. National Instruments provides a wide range of DAQ devices for various measurement applications. The simulated signals were harmonic with limited noise; the write performance was independent of the amplitude of the values and the signal waveform.
AE measurement systems, like other high-performance measurement systems, must acquire and store large amounts of data from numerous channels at a high sampling rate. To test this capability, a DAQ configuration was created in MAX using the NI cDAQ-9719 chassis with 16 NI-9223 modules. All 56 available channels were measured at a sampling rate of 1 MS/s per channel. The three HDDs described in Section 3 were employed. The operating window of the program with 56 measurement channels is shown in Figure 10.
Several tests were conducted, accompanied by multiple configurations, as outlined below:
(1) Store data. The NI-9223 module type has a 16-bit resolution, making the 2-byte number format the recommended choice for storing the measured data. However, the 3-byte number format was deliberately selected to stress the capabilities of the program: it provides a 24-bit resolution, greater than that of the DAQ devices, which does not improve the accuracy of the stored values but increases the amount of data generated for testing. The actual input range of ±10.7 V was set. As configured, the system acquired 56 mega-samples every second across all channels, producing 168 MB of data per second and thus requiring a write rate of at least 168 MB/s.
The average data write rates of the three HDDs were reported in Section 3 as 162, 172, and 138 MB/s, respectively. Therefore, only HDD #2 supported the required write rate and was selected to store the data. The target buffer size was set to 512 MB to maximize the write performance; the program automatically calculated and applied an appropriate buffer size of 508 MB, the closest to the target. With this configuration, the buffer was filled every 3 s, at which point its entire contents were written to the hard drive. This configuration performed the measurements well.
Another setting was also tested: a combination of HDDs #1 and #3 was selected to store the data instead of HDD #2, using the multidrive write. The other parameters were configured as in the previous test. The test ran well and successfully wrote the data into multiple files located across HDDs #1 and #3, even though neither drive could individually support the required write rate.
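The storage requirement quoted in this test follows from simple arithmetic, sketched below; the function names are illustrative.

```python
def required_write_rate_mb_s(channels, rate_ms_per_channel, bytes_per_sample):
    # channels x (MS/s per channel) x (bytes per sample) -> MB/s
    return channels * rate_ms_per_channel * bytes_per_sample

def buffer_fill_time_s(buffer_mb, rate_mb_s):
    # How often a full buffer must be flushed to disk.
    return buffer_mb / rate_mb_s
```

For 56 channels at 1 MS/s with the 3-byte format, this gives 168 MB/s, and a 508 MB buffer fills roughly every 3 s, matching the behavior described above.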
(2) Skipping samples for visualization. The number of samples acquired every second was so large that visualizing the signals consumed a large amount of computing resources. Plotting every sample can be time-consuming and can degrade the computer's performance on other tasks; notably, all tasks on the current data must be completed before new data become available. The program therefore provides an option to skip samples during signal plotting. For instance, with the skip count set to 10, only the first sample in each group of 10 is plotted, and the others are skipped. This reduces the number of samples to be plotted by a factor of 10 and accelerates the plotting process, so that a plot is generated more rapidly while still providing a reasonable representation of the signal. This feature helps conserve computer resources and improves the performance of other tasks.
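Sample skipping is plain decimation; a one-line sketch (the function name is illustrative):

```python
def decimate_for_plot(samples, skip):
    # Keep only the first sample of every group of `skip` samples,
    # reducing the plotting workload by a factor of `skip`.
    return samples[::skip]
```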
(3) Alarm. In actual measurements using sensors, it is not always easy to recognize abnormal signals simply by observing the signal plots, and this becomes even more difficult when sample skipping is employed for visualization. Abnormal signals, such as AEs, can be detected automatically by activating the alarm feature of the program. Various configuration parameters are available for alarm supervision, such as the peak, mean, and root mean square (RMS) values. However, because the measurements in this study were conducted using simulated devices, the signal waveforms were almost sinusoidal; the noise they contained was insignificant, as shown in Figure 10, and no trends or abnormalities were observed. Nevertheless, the alarm feature was deliberately activated solely to add workload to the program under test, and the program still performed the tests successfully.
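The alarm quantities mentioned above (peak, mean, and RMS) can be computed per data block as sketched below; the thresholding rule is an assumed example, not the program's actual logic.

```python
import math

def signal_metrics(samples):
    # Summary statistics commonly used for alarm supervision.
    peak = max(abs(s) for s in samples)
    mean = sum(samples) / len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, mean, rms

def rms_alarm(samples, rms_limit):
    # Trigger when the block's RMS value exceeds a configured limit.
    _, _, rms = signal_metrics(samples)
    return rms > rms_limit
```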
(4) Scheduled measurement. For long-term condition-monitoring applications, storing sensor data over an extended period can require a significant amount of storage space, particularly in AE monitoring. To address this issue, the sensor reading values can be stored only at specific times and for specific durations; for instance, data can be stored for 1 min out of every 5 min. This approach saves storage space without sacrificing critical information, while still allowing trends, patterns, and anomalies in the signals to be identified.
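The storage saving of a recording schedule is easy to quantify; a sketch under the assumption of a constant data rate (function name illustrative):

```python
def scheduled_storage_mb(rate_mb_s, record_s, period_s, total_s):
    # Record `record_s` seconds out of every `period_s` seconds;
    # storage grows only during the recording windows.
    cycles = total_s // period_s
    return cycles * record_s * rate_mb_s
```

Recording 1 min out of every 5 min cuts the required storage to one-fifth of that for continuous recording.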
These tests successfully demonstrated the ability of the monitoring program with various configurations to conduct high-performance measurements. Although some of the configurations were superfluous for the simulated signals, they could still serve as valuable tips for conducting actual measurements and sharing experiences for developing new condition-monitoring applications.

4.2. Actual Measurement with the Application of Acoustic Emission Monitoring

In this case study, the program was used for the early detection and localization of crack sources by monitoring AEs during an in situ load test of a new hydraulic accumulator. The accumulator configuration is shown in Figure 11. During the test, the operational oil was cyclically pumped into the accumulator: in each cycle, the oil pressure rose to a peak of 400 bar, remained at this level for 0.2 s, and then declined to 100 bar, with each cycle completed within approximately 0.8 s. These pressure settings were significantly more severe than those of the operational design. The test was continued until the new accumulator was completely broken.
A simulation performed using ANSYS Workbench demonstrated that the greatest stress concentration occurred near the bottom thread of the lower shell under the test pressure conditions, as illustrated in Figure 11. Consequently, during the actual measurement, AE sensors were positioned around the accumulator at the bottom thread, as shown in Figure 12. Three Kistler 8152C sensors were installed at positions P1, P2, and P3, and two Vallen VS30-V sensors were placed at positions P1 and P3. The frequency range of the Kistler sensors was 100–900 kHz, whereas that of the Vallen sensors was 20–80 kHz. The sampling rate was configured to 2 MS/s per channel for eight activated channels using the DAQ device NI USB-6366. The 2-byte number format was selected to store the measurement data, matching the 16-bit resolution of the DAQ device. This measurement configuration required an average write rate of 32 MB/s; consequently, only HDD #1, with the specifications outlined in Section 3, was used for the measurements. Data were recorded for 10 s every 10 min, resulting in a data file size of 320 MB. The program performed the measurements successfully.
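The write rate and file size quoted for this measurement follow directly from the configuration; a quick check (function name illustrative):

```python
def ae_measurement_requirements(channels, rate_ms_per_channel, bytes_per_sample, record_s):
    # 8 channels x 2 MS/s x 2 bytes -> 32 MB/s; a 10 s recording -> 320 MB.
    write_rate_mb_s = channels * rate_ms_per_channel * bytes_per_sample
    file_size_mb = write_rate_mb_s * record_s
    return write_rate_mb_s, file_size_mb
```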
After 18 h of testing, corresponding to 81,000 load cycles, the Kistler sensors detected AE events from the accumulator. Figure 13 illustrates a measured AE event indicating a potential crack in the accumulator that may not yet have been visible. In this event, a burst wave was generated with frequencies greater than 120 kHz, beyond the detection range of the Vallen sensors. Although all eight channels were recorded to gather extensive information about the accumulator's condition and to verify the effectiveness of the data-writing algorithms, only the AE waves measured by the Kistler sensors revealed the AE events. Figure 13d compares the arrival times of the AE event waves at positions P1, P2, and P3, represented by black, red, and blue, respectively. The wave arrived first at P2, then at P1, and finally at P3. The difference between the arrival times at P1 and P3 was minimal, suggesting that the fracture location was adjacent to P2 in the direction of P1.
The load test was continued until the accumulator was completely broken, as depicted in Figure 14. The operational lifetime was approximately 29 h, equivalent to approximately 130,000 load cycles. A subsequent inspection revealed that the initial cracking point was adjacent to P2 in the direction of P1, as previously indicated by the AE sensors. This test confirmed the ability of the program to perform AE measurements: the potential crack was detected at an early stage, at approximately 62% of the total operational lifetime. The total data size for all measurements was only approximately 55.7 GB, highlighting the benefit of using a binary format for measurements with exceptionally high sampling rates.

5. Conclusions

AE measurements adopting multiple channels require a high data write rate to store all the acquired sensor reading values in real time. To employ HDDs in such cases, it is necessary to boost the data write performance. Binary file formats offer many advantages over plain text file formats, including a smaller file size and unmatched speed. This study proposed new algorithms to maximize these benefits by optimizing data encoding and writing. A series of writing tests were conducted on three conventional HDDs to validate these algorithms. The following conclusions were drawn:
  • To reduce the file size, values can be encoded using a specialized binary format based on the measurement range and resolution of the employed DAQ device. The 3-byte number format requires 3 bytes to represent a value at a 24-bit resolution, reducing file sizes by 3.3× compared with the text format at the same precision. For AE measurements, most DAQ devices have a 16-bit resolution, making the 2-byte format ideal: it preserves the full 16-bit resolution while reducing file sizes by 4× compared with the text format at the same precision. Additionally, the 4-byte number format can satisfy the data storage requirements of special cases requiring a 32-bit resolution.
  • Regarding the write rate, storing sensor reading values into a file immediately after they become available results in poor write performance. To overcome this issue, the use of a larger self-created buffer is recommended to achieve a better write rate. The optimal buffer size range was 128 MB to 512 MB, beyond which the improvement became less significant. Furthermore, each write process should be performed by a new thread to allow the processing of newly acquired data while the previous data are being written. A new file can also be created for each write to avoid errors. Implementing these strategies resulted in a 10× faster write rate, which, in combination with a 4× smaller file size, achieved a 40× reduction in the write time or a 40× better write performance. HDD #2 exhibited a write rate of 172 MB/s, providing the capability to perform measurements at 86 MS/s.
  • Combining multiple hard drives to achieve the sum of the write rates of all drives can lead to a significant increase in write performance by multiples compared with a scenario using a single drive. This approach is similar to the RAID 0 technique but addresses all its drawbacks. Accordingly, the measurement data were written into multiple individual files located in order across the selected drives, and a configuration file was created to provide sufficient information to collect the individual files and subsequently merge them into a single large file. Alternatively, each data file could be read independently. This provides a simple yet effective approach to enhance the write performance for data storage, allowing for faster data acquisition and processing.
In actual measurements, monitoring programs must perform multiple tasks simultaneously, in addition to simply writing data. In this study, the condition-monitoring program developed utilizing the proposed algorithms was successfully tested. These tests demonstrated the capability of the program to conduct high-performance measurements in various configurations, indicating its potential to provide an efficient and robust system for monitoring and collecting data.

Author Contributions

Conceptualization, Q.D.V., J.-w.L., and J.-u.L.; data curation, K.S. and H.C.; formal analysis, H.C. and J.-u.L.; funding acquisition, J.-u.L.; investigation, K.S., Y.K., and J.-w.L.; methodology, Q.D.V. and J.-u.L.; project administration, J.-u.L.; resources, Q.D.V. and J.-u.L.; software, K.S. and Y.K.; supervision, J.-u.L.; validation, Q.D.V., J.-w.L., and J.-u.L.; visualization, Q.D.V., Y.K., and J.-w.L.; writing—original draft, Q.D.V.; writing—review and editing, J.-w.L. and J.-u.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the ‘Development of Autonomous Ship Technology (20200615)’, funded by the Ministry of Oceans and Fisheries (MOF, Korea). This work was supported by the Korea Maritime and Ocean University Research Fund in 2023. This research was supported by Korea Institute of Marine Science & Technology Promotion (KIMST) funded by the Ministry of Oceans and Fisheries (20220630).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Schematic diagram of the measurement system utilizing the algorithms developed in this study.
Figure 2. Composition of an example scaled value of the 3-byte number format.
Figure 3. Conversion algorithm of 2-byte and 3-byte number formats.
Figure 4. Data write performance in the tests using the default buffer size. (a) File size = 128 MB and (b) file size = 256 MB.
Figure 5. Data write performance in tests using the configured buffer size. (a) File size = 128 MB and (b) file size = 256 MB.
Figure 6. Data write performance of tests using a self-created buffer. (a) Buffer size = 8 MB, (b) buffer size = 32 MB, (c) buffer size = 64 MB, (d) buffer size = 128 MB, (e) buffer size = 256 MB, and (f) buffer size = 512 MB.
Figure 7. Data write performance of the multi-threaded write. (a) Buffer size = 128 MB and (b) buffer size = 256 MB.
Figure 8. Data distribution across the three drives, based on the principle of RAID 0.
Figure 9. Configuration to store data on multiple drives.
Figure 10. Operating window of the program in a high-performance measurement test.
Figure 11. Configuration of the hydraulic accumulator.
Figure 12. Installation of acoustic emission sensors.
Figure 13. AE event wave measured by the Kistler sensors. (a) At position P1, (b) at position P2, and (c) at position P3. (d) Comparison of the arrival times of the AE event wave to positions P1, P2, and P3.
Figure 14. Failure of the hydraulic accumulator after 29 h of the load test.
Table 1. Specifications of HDD #1.
Manufacturer: Western Digital
Model: WD10EZEX
Storage capacity: 1000 GB
Free capacity: 750 GB
Connectivity technology: SATA 6 Gb/s
Form factor: 3.5 inches
Rotational speed: 7200 rotations per minute (rpm)
Power-on hours: 31,280 h
Year: 2012
Table 2. Data write performance results.
| Approach | Configuration | Average Data Write Rate (MB/s) |
|---|---|---|
| No. 1 | Default buffer size, file size = 128 MB | 124 |
| No. 1 | Default buffer size, file size = 256 MB | 125 |
| No. 1 | Configured buffer size, buffer size = file size = 128 MB | 150 |
| No. 1 | Configured buffer size, buffer size = file size = 256 MB | 154 |
| No. 2 | Self-created buffer, buffer size = file size = 8 MB | 136 |
| No. 2 | Self-created buffer, buffer size = file size = 32 MB | 145 |
| No. 2 | Self-created buffer, buffer size = file size = 64 MB | 151 |
| No. 2 | Self-created buffer, buffer size = file size = 128 MB | 157 |
| No. 2 | Self-created buffer, buffer size = file size = 256 MB | 160 |
| No. 2 | Self-created buffer, buffer size = file size = 512 MB | 162 |
| No. 3 | Parallel writing, buffer size = file size = 128 MB | 110 |
| No. 3 | Parallel writing, buffer size = file size = 256 MB | 120 |
Table 3. Specifications of HDDs for the write performance tests.
| Specification | HDD #1 | HDD #2 | HDD #3 |
|---|---|---|---|
| Manufacturer | Western Digital | Western Digital | Toshiba |
| Model | WD10EZEX | WD10EZEX | DT01ACA200 |
| Storage capacity | 1000 GB | 1000 GB | 2000 GB |
| Free capacity | 750 GB | 750 GB | 1200 GB |
| Connectivity technology | SATA 6 Gb/s | SATA 6 Gb/s | SATA 6 Gb/s |
| Form factor | 3.5 inches | 3.5 inches | 3.5 inches |
| Rotational speed | 7200 rpm | 7200 rpm | 7200 rpm |
| Power-on hours | 31,280 h | 8180 h | 8170 h |
| Year | 2012 | 2012 | 2013 |
Vuong, Q.D.; Seo, K.; Choi, H.; Kim, Y.; Lee, J.-w.; Lee, J.-u. Algorithms to Reduce the Data File Size and Improve the Write Rate for Storing Sensor Reading Values in Hard Disk Drives for Measurements with Exceptionally High Sampling Rates. Appl. Sci. 2024, 14, 7410. https://doi.org/10.3390/app14167410