Article

A Baseline Drift-Elimination Algorithm for Strain Measurement-System Signals Based on the Transformer Model

by Yusen Wang 1,2, Lei Zhang 1,2, Xue Qi 1,2, Xiaopeng Yang 1,2 and Qiulin Tan 1,2,*
1 Key Laboratory of Micro/Nano Devices and Systems, Ministry of Education, North University of China, Taiyuan 030051, China
2 State Key Laboratory of Dynamic Measurement Technology, North University of China, Taiyuan 030051, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4447; https://doi.org/10.3390/app14114447
Submission received: 16 April 2024 / Revised: 18 May 2024 / Accepted: 22 May 2024 / Published: 23 May 2024

Abstract:
Strain measurements are vital in engineering trials, testing, and scientific research. In the process of signal acquisition, baseline drift has a significant impact on the accuracy and validity of the data. Traditional solutions, such as the discrete wavelet transform and empirical mode decomposition, cannot be used in real-time systems. To solve this problem, this paper proposes a Transformer-based model to eliminate the drift in the signal. A self-attention mechanism is used in the encoder of the model to learn the interrelationships between the components of the input signal and to capture the key features. Then, the decoder generates a corrected signal. Meanwhile, a high-precision strain acquisition system is constructed. The experiments tested the model’s ability to remove drift from simulated voltage signals with and without Gaussian noise. The results demonstrated that the Transformer model excels at eliminating signal baseline drift. Additionally, the performance of the model was investigated under different temperature conditions and with different levels of force applied by an electronic universal testing machine to produce strain. The experimental results indicate that the Transformer model can largely eliminate drift in dynamic signals and has great potential for practical applications.

1. Introduction

The use of Wheatstone bridges for measuring minute resistance changes in strain gauges and other resistive transducers is prevalent in engineering and scientific research. They play a key role in gauging physical parameters like strain, pressure, and displacement [1,2,3]. Accurate and real-time acquisition of strain data is crucial in engineering projects and scientific research, which require a comprehensive and efficient measurement system. Consequently, high-precision measurement systems are highly valuable. In practical industrial production, strain measurement systems usually combine resistance strain gauges with Wheatstone bridges to translate minor deformations into electrical resistance changes, to facilitate strain measurement and analysis through electrical signal variations. However, resistance strain gauges generate only minor resistance changes upon force application, and it is necessary to ensure the signal processing system’s accuracy for precise strain-signal measurement.
Baseline drift, especially temperature-induced baseline drift, is one of the most important factors affecting the accuracy of signal acquisition, and it eventually leads to poor stability and reduced reliability of the obtained data during prolonged strain measurements [4,5,6,7]. Therefore, it is critical to eliminate baseline drift in the processing of dynamic signals [8]. Traditional methods usually adopt circuit calibration to address baseline drift in signals, with high-pass filtering as a classical approach. However, high-pass filtering can cause the distortion of low-frequency signals, and the additional electronic devices required can complicate circuit design and diminish system portability [9]. In addition to traditional methods, some time-series analysis methods have also been proven to be effective in removing noise from signals. For example, the discrete wavelet transform (DWT) has been shown to be an effective tool for eliminating clinical motion artifacts in thoracic electrical impedance tomography, including common artifacts like baseline drift [10]. A sensor drift detection method based on the DWT and the grey model has been demonstrated to be effective both in analog temperature sensor output data from a continuous stirred-tank reactor sensor model and in measurements from a physical temperature sensor at a nuclear power control test facility [11]. Empirical mode decomposition (EMD), which offers a higher degree of adaptability than the wavelet transform, has also been shown to be effective in removing power line interference and baseline wander noise from electrocardiogram signals [12]. An EMD-based fusion algorithm has been demonstrated to be effective in eliminating the effects of external environments and accurately compensating for temperature drift in MEMS gyroscopes [13]. However, such methods do not perform well in the face of highly randomized drift [14,15] and are difficult to apply in real-time systems [16]. In recent years, some improved methods have also been proposed. For example, the empirical wavelet transform has shown good results in handling non-stationary signals and has been successfully applied to eliminate noise from electromyography signals [17]. With the continuous development of neural networks, various models have become powerful tools for processing time series data and have achieved significant results in dealing with signal drift [18,19,20,21]. Therefore, using deep learning is a viable solution for complex signal modeling tasks like drift elimination.
The Recurrent Neural Network (RNN) is a classic neural network architecture designed to capture temporal information in time series data. However, RNNs suffer from problems like gradient explosion and gradient vanishing, limiting their ability to learn long-term dependencies [22]. To solve these problems, several RNN variants like Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs) have been developed [23,24]. Though these variants mitigate the gradient issue to a degree, they do not completely resolve it, nor do they allow for focusing on variable information at distinct time steps [25]. Moreover, to more effectively solve sequence-to-sequence problems, an encoder–decoder architecture based on the attention mechanism has been presented [26]. This architecture is now extensively used in time series processing and shows improved efficacy in removing signal drifts [27,28,29]. However, the problem of distance dependence in these systems remains unresolved. The Transformer model abandons the traditional RNN and CNN structures, relying solely on the attention mechanism [30]. This unique structure enables the Transformer’s self-attention mechanism to simultaneously focus on all positions in a sequence, thereby capturing global dependencies and effectively solving the problem of long-term dependency [31]. In addition, the Transformer’s multi-head attention mechanism facilitates parallel computation, greatly improving computational efficiency and reducing training time. In drift removal tasks, the network takes a sampled signal as input and outputs a signal with the drift removed. The drifted signal differs from the desired useful signal in both the time and frequency domains. The Transformer’s self-attention mechanism can discern these global dependencies in the complete sequence and assign weights accordingly. Given that the voltage signals obtained by the system carry rich temporal information, using the Transformer model to eliminate drift from these signals is a reasonable approach.
In this paper, a Transformer-based model is designed to eliminate baseline drift from the signals acquired by the strain-signal acquisition system. This approach aims to reduce the negative impact of baseline drift on the accuracy of the signals. The paper is organized as follows. The workflow of the strain acquisition system and the theory of the Transformer model are described in detail in Section 2. In Section 3, multiple models are compared on training sets without and with Gaussian noise, respectively, demonstrating the Transformer model’s excellence in removing drifts. The effectiveness of the proposed model is also verified using real strain acquisition signals. The article concludes in Section 4, summarizing the results and discussing future work.

2. Method

2.1. Measurement System Description

The schematic diagram of the strain measurement system is shown in Figure 1. The resistive strain gauge used in the test has a polyimide substrate with dimensions of 6.4 mm × 3.5 mm. The resistance of the strain gauge is 120 Ω, with a sensitivity factor of 1.8. The resistive strain gauge is connected to the Wheatstone bridge as the variable resistor of a quarter bridge. In the experiment, considering the distance between the test piece and the instrument, a three-wire connection is chosen to offset the thermal output of the long lead wires.
Before the signal acquisition process starts, the relevant parameters of the acquisition system, including the sampling frequency, transmission rate, and communication interface type, are set through the software running on the upper computer. After the configuration is completed, the system starts to collect data. When the test specimen is deformed by force, the resistance value of the resistance strain gauges pasted on its surface will change accordingly. This small resistance change is converted by the Wheatstone bridge into an electrical signal, which is then amplified and filtered by the conditioning circuit. The final processed signal is converted to a digital signal by an analog-to-digital converter (ADC).
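As a point of reference, the bridge output can be related to strain through the standard quarter-bridge small-strain approximation V_out ≈ (GF·ε/4)·V_exc. The short sketch below illustrates this conversion; it is not part of the authors' acquisition software, and the excitation voltage is an assumed value, since it is not stated in the paper.

```python
# Illustrative sketch (not the authors' code): relating the quarter-bridge output
# voltage to strain via the standard small-strain approximation
# V_out ≈ (GF * strain / 4) * V_exc.

GAUGE_FACTOR = 1.8     # sensitivity factor of the strain gauge (given in Section 2.1)
V_EXCITATION = 5.0     # bridge excitation voltage in volts (assumed; not stated in the paper)

def bridge_voltage_to_strain(v_out: float) -> float:
    """Approximate strain (dimensionless) from the quarter-bridge output voltage."""
    return 4.0 * v_out / (GAUGE_FACTOR * V_EXCITATION)

# Example: a 1.7 mV bridge output corresponds to roughly 756 microstrain.
print(bridge_voltage_to_strain(1.7e-3) * 1e6)
```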
To ensure that the data are not lost during processing or transmission in the case of high sampling rates and long-time acquisition operations, the system utilizes two DDR3 SDRAMs as high-speed, high-capacity storage to cache the acquired data. After frame encoding, the collected data is initially cached into DDR3. Then, these data are retrieved from the cache and transmitted to the software on the supervisory computer through the Ethernet interface module. The system program resides in QSPI Flash. Upon device activation, it automatically retrieves the program and configuration bitstream from this non-volatile storage, loading them into the processor and FPGA for initialization. The setting of circuit parameters of the measuring system is shown in Table 1.
By combining the acquisition system and the proposed algorithm, the accurate acquisition of strain data can be achieved. The data are transmitted from the hardware system to the host computer, where they undergo drift correction through a Transformer model that has been previously trained.

2.2. Drift Elimination

The structure of the model is shown in Figure 2. Firstly, the sampling voltage is normalized before it is input into a linear layer, which transforms the initial features of the signal into a higher-dimensional representation space. Then, the data are combined with positional encoding and sequentially processed by the encoder and decoder of the Transformer model. Finally, another linear layer remaps the processed high-dimensional vectors back to the original data dimensions for further analysis and application.
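To make this data flow concrete, the following is a minimal PyTorch sketch of the pipeline in Figure 2 (not the authors' released code). It assumes a univariate voltage sequence shaped (batch, length, 1); the hyperparameter defaults match those reported in Section 3.2, and the positional-encoding buffer implements Equations (1) and (2) below.

```python
import math
import torch
import torch.nn as nn

class DriftEliminationTransformer(nn.Module):
    """Sketch of the pipeline in Figure 2: linear embedding -> sinusoidal
    positional encoding -> Transformer encoder/decoder -> linear output layer."""

    def __init__(self, d_model=64, nhead=8, num_layers=2,
                 dim_feedforward=256, dropout=0.1, max_len=5000):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)    # lift 1-D voltage samples to d_model
        self.output_proj = nn.Linear(d_model, 1)   # map back to the signal dimension
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, dropout=dropout, batch_first=True)
        # Precompute the sinusoidal positional encoding (Equations (1) and (2)).
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, src, tgt):
        # src: drifted input sequence, tgt: decoder input; both (batch, length, 1).
        src = self.input_proj(src) + self.pe[: src.size(1)]
        tgt = self.input_proj(tgt) + self.pe[: tgt.size(1)]
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.output_proj(out)
```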

2.2.1. Transformer Encoder

As illustrated in Figure 2, the encoder part of the Transformer model is formed by stacking multiple encoder layers of the same structure. Each encoder layer consists of two sub-layers: a multi-head attention module and a feed-forward neural network module. Residual connections are used around the sublayers [32], and each sublayer is followed by a layer normalization module [33].

2.2.2. Positional Encoding

Unlike traditional RNNs, the Transformer’s self-attention mechanism does not inherently encode the order of the elements in the input sequence. This means that, without additional processing of the input data, the model cannot distinguish the positional relationships of the elements in the sequence or capture the dependencies between signals. The purpose of positional encoding is to add, at the input stage of the model, an encoding related to each element’s position in the sequence to that element’s representation, so that the model can use this information to understand the sequential and temporal relationships in the sequence. There are generally two types of positional encoding: sinusoidal functions and learnable position embeddings. The sinusoidal version, using sine and cosine functions at different frequencies, allows the model to handle sequences longer than those seen during training [30]. In contrast, learnable positional encoding involves adding a trainable weight matrix of the same dimension to the input data and optimizing these positional encoding parameters throughout model training. This method enables the model to learn the positional relationships of elements within the sequence [34]. Since both perform equally well, this paper chooses the sine–cosine version of the positional encoding. It can be expressed as
PE_(pos, 2i) = sin(pos / 10000^(2i/d_model))  (1)
PE_(pos, 2i+1) = cos(pos / 10000^(2i/d_model))  (2)
where pos represents the position of the element in the sequence, i represents the dimension index in the embedding vector, and d_model represents the dimension of the vector. In this way, each element of the positional encoding is assigned a corresponding sine wave, which enables the Transformer model to understand the relative positional relationships among elements in the sequence [35].
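For illustration, the encoding matrix defined by Equations (1) and (2) can be generated with a few lines of NumPy; the sequence length and model dimension below are arbitrary example values, not values prescribed by the paper.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return the (seq_len, d_model) matrix defined by Equations (1) and (2)."""
    pos = np.arange(seq_len)[:, None]                  # element positions
    two_i = np.arange(0, d_model, 2)[None, :]          # even embedding dimensions (2i)
    angles = pos / np.power(10000.0, two_i / d_model)  # pos / 10000^(2i/d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=100, d_model=64)
print(pe.shape)  # (100, 64)
```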

2.2.3. Scaled Dot-Product Attention

Capturing the long-term dependence and dynamics of the drifting signal is a prominent feature of the Transformer model, and this function is realized by scaled dot-product attention, whose detailed structure is shown in Figure 3a. The inputs are mapped into three channels, namely, query, key, and value, with lengths d_q, d_k, and d_v, respectively. The system’s sampled voltage, denoted as u_s(t), serves as the input and is multiplied by distinct weight matrices W_Q, W_K, and W_V to obtain the query vector Q, key vector K, and value vector V, respectively. The specific formulas are shown in Equations (3)–(5):
Q = W_Q u_s(t)  (3)
K = W_K u_s(t)  (4)
V = W_V u_s(t)  (5)
The commonly used attention functions in attention mechanisms are additive attention [26] and dot-product attention. Additive attention employs a feedforward network with a single hidden layer to compute the compatibility function. In contrast, dot-product attention is faster and more space-efficient than additive attention because it can be implemented using a highly optimized matrix multiplication code. Thus, this paper opts for dot-product attention.
The purpose of scaled dot-product attention is to apply the weights generated from the query and key vectors to the value vectors, thereby focusing on the more relevant locations in the input data. First, the degree of matching is determined by calculating the dot product between the query vector and the key vector. Then, the weights of the value vectors are obtained after appropriate scaling and processing by the SoftMax function. The scaling process involves dividing each dot product by √d_k to prevent gradient vanishing during training. Finally, the output of the attention mechanism is obtained by multiplying the weights with the value vector [36]. This can be expressed as
Attention(Q, K, V) = softmax(QK^T / √d_k) V  (6)
where softmax represents the SoftMax function, Q represents the query vector, K represents the key vector, V represents the value vector, and d_k represents the length of the key vector.
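Equation (6) can be implemented directly in a few lines; the sketch below operates on tensors shaped (batch, length, dimension) and is meant only to illustrate the computation, not to reproduce the authors' code.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Equation (6): softmax(Q K^T / sqrt(d_k)) V for tensors of shape (batch, len, dim)."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # pairwise query-key similarity
    weights = torch.softmax(scores, dim=-1)              # attention weights over positions
    return weights @ V

Q = torch.randn(1, 10, 64)   # example query, key, and value tensors
K = torch.randn(1, 10, 64)
V = torch.randn(1, 10, 64)
print(scaled_dot_product_attention(Q, K, V).shape)       # torch.Size([1, 10, 64])
```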

2.2.4. Multi-Head Attention

A single-head self-attention layer limits the model’s ability to focus on multiple important locations in the data simultaneously, potentially causing some information to be missed [35]. The multi-head attention mechanism addresses this by enabling the model to use multiple sets of weight matrices, projecting the input into different subspaces using different query, key, and value matrices. This allows the model to access information from various locations in different subspaces. In the multi-head attention module, the scaled dot-product attention module is stacked in parallel, as illustrated in Figure 3b. The query, key, and value are linearly projected h times to the dimensions d_q, d_k, and d_v, respectively. Then, the attention function is executed h times in parallel on the projected queries, keys, and values, and output values of dimension d_v are obtained. Here, each head is a scaled dot-product attention module, with h representing the number of heads. The outputs of all heads are concatenated and projected to obtain the final value. The multi-head attention function can be expressed as
MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O  (7)
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)  (8)
where the linear projection matrices are W_i^Q ∈ ℝ^(d_model × d_q), W_i^K ∈ ℝ^(d_model × d_k), W_i^V ∈ ℝ^(d_model × d_v), and W^O ∈ ℝ^(h·d_v × d_model).
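In PyTorch, this project–attend–concatenate scheme is available as the built-in nn.MultiheadAttention module; the short self-attention example below uses the head count and model dimension later adopted in Section 3.2 (h = 8, d_model = 64), purely as an illustration.

```python
import torch
import torch.nn as nn

# Multi-head attention with h = 8 heads over a d_model = 64 representation.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

x = torch.randn(4, 100, 64)            # (batch, sequence length, d_model)
out, attn_weights = mha(x, x, x)       # self-attention: query = key = value = x
print(out.shape, attn_weights.shape)   # torch.Size([4, 100, 64]) torch.Size([4, 100, 100])
```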

2.2.5. Feedforward Layer

Each layer of the encoder and decoder includes a feedforward network (FFN) composed of two fully connected layers, with one layer utilizing a nonlinear activation function [37]. In this paper, the activation function used is ReLU, a commonly employed activation function in neural networks. The FFN can be expressed as
FFN(x) = max(0, x W_1 + b_1) W_2 + b_2  (9)
where W_1 and W_2 are the parameter matrices of the two linear transformation layers, respectively, and max(0, ·) denotes the ReLU activation function.
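With d_model = 64 and a feed-forward dimension of 256 (the configuration reported in Section 3.2), the position-wise FFN of Equation (9) reduces to the following small module; this is a sketch, not the authors' code.

```python
import torch
import torch.nn as nn

# Position-wise feed-forward network of Equation (9): Linear -> ReLU -> Linear,
# applied identically at every sequence position.
ffn = nn.Sequential(
    nn.Linear(64, 256),   # x W_1 + b_1
    nn.ReLU(),            # max(0, .)
    nn.Linear(256, 64),   # (.) W_2 + b_2
)

x = torch.randn(4, 100, 64)   # (batch, sequence length, d_model)
print(ffn(x).shape)           # torch.Size([4, 100, 64])
```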

2.2.6. Transformer Decoder

The structure of the Transformer decoder is similar to that of the encoder: the decoder also stacks multiple decoder layers of the same structure, as shown in Figure 2. Compared to the encoder layer, the decoder layer replaces the multi-head attention sublayer with a masked multi-head attention mechanism. This prevents future information from being leaked when generating the output sequence, i.e., the output generated at each moment depends only on previous outputs and not on future ones. Additionally, the decoder layer inserts a third sublayer, another multi-head attention module, between the masked multi-head attention sublayer and the feed-forward network sublayer. This sublayer is referred to as the cross-attention sublayer; its keys and values come from the output of the encoder, while its queries are obtained from the output of the preceding masked self-attention sublayer.
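The masking described above can be illustrated with an upper-triangular matrix of −∞ values added to the attention scores before the SoftMax, so each position can attend only to itself and to earlier positions; a minimal sketch:

```python
import torch

# Causal mask for a length-5 sequence: position t may attend only to positions <= t.
seq_len = 5
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0.,   0., -inf, -inf, -inf],
#         [0.,   0.,   0., -inf, -inf],
#         [0.,   0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.,   0.]])
```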

3. Experiments and Results

3.1. Construction of Test Platform

To validate the accuracy of the baseline drift elimination algorithm proposed in this study, a test system that can simultaneously control temperature and strain was developed, as illustrated in Figure 4. The system consists of an electronic universal testing machine, a high-temperature extensometer, a high-frequency induction heating device, mullite and graphite sleeves, a resistance strain gauge, an acquisition system, and a DC power supply.
At first, a resistance strain gauge is fixed to a nickel-based high-temperature alloy substrate and connected to the system as a variable resistor in a quarter-bridge configuration. The nickel-based high-temperature alloy substrate, with fixed resistance strain gauges, is encased in a graphite sleeve and further enveloped by a mullite sleeve. Subsequently, the top and bottom ends of the nickel-based high-temperature alloy substrate are secured within the fixture of the electronic universal testing machine.
An electronic universal testing machine is used to perform tensile testing of the specimen by continuously applying force to produce strain. The machine is entirely controlled through a computerized system. As the force is applied, a high-temperature extensometer accurately measures the change in length of the specimen so that the resulting strain can be calculated. The extensometer is connected to the control software of the electronic universal testing machine, and its two ceramic rods are in contact with the nickel-based high-temperature alloy substrate, so the strain generated by the substrate is transmitted back to the control software in real time. The electronic universal testing machine then adjusts the applied force according to the strain value reported by the extensometer, ensuring that the test follows the predetermined program until the set strain value is reached.
High-frequency induction heating equipment uses a high-frequency electromagnetic field to heat conductive materials. Placed within the induction coil, the graphite sleeve acts as a conductor and is heated by eddy currents, while the mullite sleeve provides thermal insulation. Standard K-type thermocouples inserted into the graphite sleeve transmit the internal temperature back to the heating equipment in real time for temperature control.

3.2. Drift-Elimination Experiment

A specialized dataset was constructed to train the Transformer model for drift elimination. The input to the model consists of two parts. The first is the drift data collected by the resistance strain gauge while it is not being pulled by the electronic universal testing machine, with a duration of 1500 s. The other part is a simulated voltage signal, a cluster of sinusoids represented as u_r(t) = A sin(2πft), t ∈ [0, 1500 s], where f = 0.5, 0.6, and 0.75 and A = 0.05, 0.07, and 0.1, respectively. The Transformer model aims to use the encoder to capture the drift characteristics of the signal and then uses the decoder to eliminate the drift while retaining the other components of the signal. Therefore, it is reasonable to use simulated voltages of finite frequency and amplitude to train the network to eliminate drift from similar signals.
In the experiment, the simulated sinusoidal signal is superimposed with the acquired drift to obtain a drift-containing signal that simulates the sampled voltage u_s(t). The dataset was split into training and validation sets at a ratio of 8:2. For the Transformer model, the numbers of encoder and decoder layers were both set to 2, the number of heads (h) in the multi-head attention mechanism was set to 8, the output dimensions of the embedding layer and each sublayer were set to d_model = 64, the dimensionality of the feedforward network was set to dim_feedforward = 256, and the dropout rate was set to 0.1. This configuration yielded the best performance on the validation set.
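As an illustration of how such training pairs could be assembled, the sketch below superimposes a drift record on the simulated sinusoids and performs an 8:2 split. The sampling rate, the use of a random-walk stand-in for the measured drift, the pairing of frequencies with amplitudes, and the time-based split are assumptions made so the example runs; they are not specified in this form in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 100.0                                    # sampling rate in Hz (assumed; not stated in the paper)
t = np.arange(0.0, 1500.0, 1.0 / fs)          # 1500 s time axis

# Clean reference signals u_r(t) = A sin(2*pi*f*t); pairing f with A is one
# reading of "respectively" in the text.
freqs = [0.5, 0.6, 0.75]
amps = [0.05, 0.07, 0.1]
clean = np.stack([A * np.sin(2.0 * np.pi * f * t) for f, A in zip(freqs, amps)])

# Stand-in for the measured baseline drift (in the paper this is the voltage recorded
# by the unloaded strain gauge); a slow random walk is used here so the sketch runs.
drift = np.cumsum(rng.normal(0.0, 1e-4, size=t.shape))

# Drifted model inputs u_s(t) = u_r(t) + drift; the clean sinusoids are the targets.
inputs = clean + drift
targets = clean

# 8:2 split of each sequence in time into training and validation segments.
split = int(0.8 * len(t))
train_x, val_x = inputs[:, :split], inputs[:, split:]
train_y, val_y = targets[:, :split], targets[:, split:]
```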
Figure 5 and Table 2 compare the performance of different models in eliminating the drift of the simulated voltage signal. The simulated voltage signal is u_r(t) = 0.1 sin(1.5πt), while the drift data are obtained by the device. This study compares the RNN model, the GRU model, the encoder–decoder model (AGRU) using both the attention mechanism and the GRU, and the Transformer model. The time step for the RNN and the GRU is set to 15, and the size of the hidden state is set to 64. The AGRU model consists of an encoder, an attention mechanism, and a decoder; the GRU is used for both the encoder and decoder, and the attention mechanism employs additive attention as the scoring function. In the AGRU model, the time step is set to 15, and the hidden state size of both the encoder and decoder is 64. In Table 2, performance metrics such as the RMSE, MAE, and MAPE are used to compare the ability of the different models to eliminate drift. According to the results shown in Figure 5 and Table 2, the RNN and GRU models are less effective at removing drift, resulting in distortion of the output signal. These problems mainly stem from the fact that these models suffer from gradient explosion and gradient vanishing, so they have difficulty capturing long-term dependencies. In addition, the RNN and GRU models struggle to retain all important information due to their fixed-length context vectors; this limitation leads to the loss of key information and critical details in the temporal features. The introduction of the attention mechanism and the sequence-to-sequence architecture allows the model to dynamically assign different weights to various parts of the data. This preserves the hidden states of important time-frequency features in the signal data, significantly enhancing the model’s learning capability. From the model outputs in Table 2 and Figure 5c, it is evident that the AGRU model significantly improves drift elimination compared to the RNN and GRU models. However, the traditional sequence-to-sequence model suffers from the distance-dependence problem, and its accuracy decreases greatly with increasing sequence length. Although attempts have been made to solve this problem by introducing the attention mechanism or by using RNN variants such as the LSTM and GRU, the problem still cannot be completely solved. The self-attention and multi-head attention mechanisms of the Transformer model not only solve the problem of long-term dependence but also enable parallel computation, which significantly improves computational efficiency. For these reasons, the Transformer model shows a further improvement in eliminating signal baseline drift compared to the AGRU model, based on the available experimental results.
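For reference, the three metrics reported in Table 2 can be computed as follows, where y_true is the clean reference signal and y_pred is the model output. This is a generic sketch; MAPE as defined here is the usual percentage error and is sensitive to reference values near zero.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error in volts."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error in volts."""
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error; eps guards against division by near-zero values."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps)))
```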
Gaussian noise was added to the sinusoidal signal to compare the ability of the four models to remove drift in a more realistic setting. The simulated voltage signal is still u_r(t) = 0.1 sin(1.5πt), to which Gaussian noise with mean μ = 0 and standard deviation σ = 0.005 is added to generate a new signal u_e(t). The drift captured by the real sensor in the experiments of Figure 5 is then superimposed on u_e(t) to obtain a signal that contains both drift and Gaussian noise, thus simulating a sampled voltage u_g(t) that more closely resembles a real environment. The hyperparameters of the four models are unchanged, and the experimental results are shown in Figure 6 and Table 3. Since the RNN and GRU models handle long-term dependencies poorly, both still perform poorly in removing drift after Gaussian noise is added and lead to severe distortion of the signal. Both the AGRU model and the Transformer model perform well in removing the baseline drift from the signal. The characteristics of the drift in both the time and frequency domains differ significantly from those of the sinusoidal signal and the Gaussian noise. Therefore, with the introduction of the attention mechanism, the model can effectively distinguish which signal features need to be preserved and which should be forgotten, allowing it to retain important signal components while removing the drift. The Transformer model is more stable and efficient than the AGRU model in handling long-term dependencies. As shown in Figure 6 and Table 3, the Transformer model demonstrates a superior ability to eliminate drift, even with the addition of Gaussian noise.
Then, real sampled voltages are used to evaluate the ability of the model to eliminate baseline drift in the signal. In the experiment, high-temperature strain gauges were first pasted onto nickel-based high-temperature alloy substrates. Next, the substrates were placed in graphite and mullite sleeves; finally, they were fixed in the fixture of the electronic universal testing machine, and the temperature inside the graphite sleeve was gradually increased by controlling the high-frequency induction heating equipment. Tests were carried out at ambient temperatures of 23 °C, 100 °C, 170 °C, and 210 °C. In Figure 7, the black curve represents the sampled voltage u_s(t), whereas the red curve illustrates the drift obtained through low-pass filtering. It can be observed that as the ambient temperature increases, the drift in the signal becomes more pronounced. The signals u_s(t) acquired at 170 °C and 210 °C are separately input into the trained Transformer model, and the outputs are shown as the blue curves in Figure 7, where u_r(t) is the signal after the drift has been removed.
This study also evaluated the model’s efficacy in eliminating drift under stress-induced strain conditions. Before the experiment, parameters were configured in the control software of the electronic universal testing machine, including the specimen shape, test speed, control mode, and target strain value. These settings enabled the machine to follow a predefined program, gradually increasing the tensile force until the specimen reached the predetermined strain value. Once the high-temperature extensometer detected that the strain had reached the initial set value, the electronic universal testing machine temporarily maintained the force and stabilized the strain at this value for a duration before incrementally increasing the tensile force toward the subsequent strain value. After the test was started, the voltage was recorded without applying force. Then, the electronic universal testing machine progressively increased the tensile force, stretching the nickel-based high-temperature alloy substrate to 500 με, 1000 με, and, finally, 1500 με. Each time the strain reached a predetermined value, the machine held the current force and maintained the strain for 90 s. When the strain value reached 1500 με, the machine held the force for a period of time, and the test was then ended. In Figure 8, the black curve represents the recorded voltage, whereas the red curve depicts the voltage after drift elimination. The experimental results indicate that drift was effectively mitigated under the application of force.
The Transformer model was trained using the deep learning framework PyTorch 1.13.1 in Python 3.7 on a personal computer equipped with an AMD Ryzen 7 5800H CPU at 3.20 GHz (Advanced Micro Devices, Inc., Santa Clara, CA, USA), an NVIDIA GeForce RTX 3060 GPU (NVIDIA Corporation, Santa Clara, CA, USA), and 16 GB of RAM (Micron Technology, Inc., Boise, ID, USA).

4. Conclusions

High-precision strain measurement systems are crucial in industrial production and scientific research. However, baseline drift caused by factors such as ambient temperature can significantly affect signal acquisition accuracy, and traditional methods like high-pass filtering, the discrete wavelet transform, and empirical mode decomposition fail to effectively address this issue. With the continuous development of artificial intelligence, various models have become powerful tools for processing time series data. In this paper, a Transformer-based model is designed to remove baseline drift from the signal. The encoder of this model uses a self-attention mechanism to learn the interrelationships within the input signal and to capture its key features. The decoder then generates a corrected signal. The network can be trained to capture drift signals hidden within time- and frequency-domain features. The ability of the RNN, GRU, AGRU, and Transformer models to eliminate drift was tested on training sets of simulated voltage signals, both with and without Gaussian noise. The experimental results show that the Transformer model has a superior ability to eliminate signal baseline drift compared to the other three models. To further verify the model’s validity, an electronic universal testing machine and a high-frequency induction heating device were used to test the model under different temperatures and forces. The experimental results in Figure 7 and Figure 8 demonstrate that the proposed Transformer model can effectively eliminate drift in dynamic signals and has significant potential for practical applications.
In the future, to improve the speed and accuracy of data processing in the acquisition system and enhance the system’s reliability and real-time performance, we will deploy the algorithm designed in this paper on embedded hardware platforms, such as the NVIDIA Jetson, to verify the feasibility of the proposed method.

Author Contributions

Conceptualization, Y.W.; methodology, Y.W.; software, Y.W.; resources, L.Z.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W., X.Q. and X.Y.; funding acquisition, Q.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Research and Development Plan of Shanxi Province (No. 202102030201005), the National Natural Science Foundation of China (Grant No. 52105594), and the Fundamental Research Program of Shanxi Province (Grant No. 20210302124274).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors have no conflicts of interest to disclose.

References

  1. Bose, A.K.; Zhang, X.Z.; Maddipatla, D.; Masihi, S.; Panahi, M.; Narakathu, B.B.; Bazuin, B.J.; Williams, J.D.; Mitchell, M.F.; Atashbar, M.Z. Screen-printed strain gauge for micro-strain detection applications. IEEE Sens. J. 2020, 20, 12652–12660. [Google Scholar] [CrossRef]
  2. Jindal, S.K.; De, R.T.; Kumar, A.; Raghuwanshi, S.K. Novel MEMS piezoresistive sensor with hair-pin structure to enhance tensile and compressive sensitivity and correct non-linearity. J. Electron. Test. 2020, 36, 509–517. [Google Scholar] [CrossRef]
  3. Rana, S.; George, B.; Kumar, V.J. Assay of a resistive displacement transducer with a floating wiper. IEEE Sens. J. 2015, 15, 6611–6618. [Google Scholar] [CrossRef]
  4. Zhang, C.; Wang, W.; Pan, Y.; Cheng, L.N.; Zhai, S.P.; Gao, X. A two-stage method for real-time baseline drift compensation in gas sensors. Meas. Sci. Technol. 2022, 33, 045108. [Google Scholar] [CrossRef]
  5. He, J.B.; Xie, J.; He, X.P.; Du, L.M.; Zhou, W. Analytical study and compensation for temperature drifts of a bulk silicon MEMS capacitive accelerometer. Sens. Actuator A Phys. 2016, 239, 174–184. [Google Scholar] [CrossRef]
  6. Miao, S.F.; Koenders, E.; Knobbe, A. Automatic baseline correction of strain gauge signals. Struct. Control Health Monit. 2015, 22, 36–49. [Google Scholar] [CrossRef]
  7. Kanaparthi, S.; Singh, S.G. Drift independent discrimination of H2S from other interfering gases with a metal oxide gas sensor using extracted adsorption-desorption noise. Sens. Actuator B Chem. 2021, 344, 130146. [Google Scholar] [CrossRef]
  8. Gao, L.; Li, D.H.; Yao, L.L.; Gao, Y.N. Sensor drift fault diagnosis for chiller system using deep recurrent canonical correlation analysis and k-nearest neighbor classifier. ISA Trans. 2022, 122, 232–246. [Google Scholar] [CrossRef] [PubMed]
  9. Wu, D.Y.; Zhang, G.J.; Zhu, S.; Liu, Y.; Liu, G.C.; Jia, L.; Wu, Y.D.; Zhang, W.D. A baseline drift removal algorithm based on cumulative sum and downsampling for hydroacoustic signal. Measurement 2023, 207, 112344. [Google Scholar] [CrossRef]
  10. Yang, L.; Qu, S.Y.; Zhang, Y.W.; Zhang, G.; Wang, H.; Yang, B.; Xu, C.H.; Dai, M.; Cao, X.S. Removing Clinical Motion Artifacts During Ventilation Monitoring with Electrical Impedance Tomography: Introduction of Methodology and Validation With Simulation and Patient Data. Front. Med. 2022, 9, 10. [Google Scholar] [CrossRef]
  11. Han, X.J.; Jiang, J.; Xu, A.D.; Bari, A.; Pei, C.; Sun, Y. Sensor Drift Detection Based on Discrete Wavelet Transform and Grey Models. IEEE Access 2020, 8, 204389–204399. [Google Scholar] [CrossRef]
  12. Tan, X.; Chen, X.X.; Hu, X.Y.; Ren, R.; Zhou, B.; Fang, Z.; Xia, S.H. EMD-based electrocardiogram delineation for a wearable low-power ECG monitoring device. Can. J. Electr. Comput. Eng. 2014, 37, 212–221. [Google Scholar] [CrossRef]
  13. Li, Z.; Cui, Y.C.; Gu, Y.K.; Wang, G.D.; Yang, J.; Chen, K.; Cao, H.L. Temperature Drift Compensation for Four-Mass Vibration MEMS Gyroscope Based on EMD and Hybrid Filtering Fusion Method. Micromachines 2023, 14, 15. [Google Scholar] [CrossRef] [PubMed]
  14. Yan, K.; Zhang, D. Correcting instrumental variation and time-varying drift: A transfer learning approach with autoencoders. IEEE Trans. Instrum. Meas. 2016, 65, 2012–2022. [Google Scholar] [CrossRef]
  15. Yan, K.; Zhang, D. Calibration transfer and drift compensation of e-noses via coupled task learning. Sens. Actuator B Chem. 2016, 225, 288–297. [Google Scholar] [CrossRef]
  16. Kelly, J.W.; Siewiorek, D.P.; Smailagic, A.; Wang, W. An adaptive filter for the removal of drifting sinusoidal noise without a reference. IEEE J. Biomed. Health Inform. 2016, 20, 213–221. [Google Scholar] [CrossRef] [PubMed]
  17. Elouaham, S.; Nassiri, B.; Dliou, A.; Zougagh, H.; El Kamoun, N.; El Khadiri, K.; Said, S. Combination time-frequency and empirical wavelet transform methods for removal of composite noise in EMG signals. TELKOMNIKA (Telecommun. Comput. Electron. Control) 2023, 21, 1373–1381. [Google Scholar] [CrossRef]
  18. Weerakody, P.B.; Wong, K.W.; Wang, G.J.; Ela, W. A review of irregular time series data handling with gated recurrent neural networks. Neurocomputing 2021, 441, 161–178. [Google Scholar] [CrossRef]
  19. Wu, Z.K.; Zaghloul, M.A.S.; Carpenter, D.; Li, M.J.; Daw, J.; Mao, Z.H.; Hnatovsky, C.; Mihailov, S.J.; Chen, K.P. Mitigation of radiation-induced fiber bragg grating (FBG) sensor drifts in intense radiation environments based on long-short-term memory (LSTM) network. IEEE Access 2021, 9, 148296–148301. [Google Scholar] [CrossRef]
  20. Zou, Y.A.; Lv, J.H. Using recurrent neural network to optimize electronic nose system with dimensionality reduction. Electronics 2020, 9, 2205. [Google Scholar] [CrossRef]
  21. Badawi, D.; Agambayev, A.; Ozev, S.; Cetin, A.E. Real-time low-cost drift compensation for chemical sensors using a deep neural network with hadamard transform and additive layers. IEEE Sens. J. 2021, 21, 17984–17994. [Google Scholar] [CrossRef]
  22. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [Google Scholar] [CrossRef] [PubMed]
  23. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  24. Cho, K.; Merrienboer, B.v.; Gülçehre, Ç.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014. [Google Scholar]
  25. Yuan, X.F.; Li, L.; Shardt, Y.A.W.; Wang, Y.L.; Yang, C.H. Deep learning with spatiotemporal attention-based LSTM for industrial soft sensor model development. IEEE Trans. Ind. Electron. 2021, 68, 4404–4414. [Google Scholar] [CrossRef]
  26. Bahdanau, D.; Cho, K.; Bengio, Y.J.C. Neural machine translation by jointly learning to align and translate. arXiv 2016, arXiv:1409.0473. Available online: https://arxiv.org/abs/1409.0473 (accessed on 1 September 2014).
  27. Qin, Y.; Song, D.J.; Chen, H.F.; Cheng, W.; Jiang, G.F.; Cottrell, G.W. A dual-stage attention-based recurrent neural network for time series prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017. [Google Scholar]
  28. Li, A.; Xiao, F.; Zhang, C.; Fan, C. Attention-based interpretable neural network for building cooling load prediction. Appl. Energy 2021, 299, 117238. [Google Scholar] [CrossRef]
  29. Yao, B.; Dai, Y.; Xia, G.M.; Zhang, Z.H.; Zhang, J.X. High-sensitivity and wide-range resistance measurement based on self-balancing wheatstone bridge and gated recurrent neural network. IEEE Trans. Ind. Electron. 2023, 70, 5326–5335. [Google Scholar] [CrossRef]
  30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  31. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41. [Google Scholar] [CrossRef]
  32. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  33. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. Available online: https://arxiv.org/abs/1607.06450 (accessed on 21 July 2016).
  34. He, Y.C.; Wang, X.; Yang, Z.J.; Xue, L.B.; Chen, Y.M.; Ji, J.Y.; Wan, F.; Mukhopadhyay, S.C.; Men, L.; Tong, M.C.F.; et al. Classification of attention deficit/hyperactivity disorder based on EEG signals using an EEG-Transformer model. J. Neural Eng. 2023, 20, 13. [Google Scholar] [CrossRef]
  35. Han, K.; Wang, Y.H.; Chen, H.T.; Chen, X.H.; Guo, J.Y.; Liu, Z.H.; Tang, Y.H.; Xiao, A.; Xu, C.J.; Xu, Y.X.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110. [Google Scholar] [CrossRef] [PubMed]
  36. Li, C.; Zhang, Z.Z.; Zhang, X.D.; Huang, G.N.; Liu, Y.; Chen, X. EEG-based Emotion Recognition via Transformer Neural Architecture Search. IEEE Trans. Ind. Inform. 2023, 19, 6016–6025. [Google Scholar] [CrossRef]
  37. Li, C.; Huang, X.Y.; Song, R.C.; Qian, R.B.; Liu, X.; Chen, X. EEG-based seizure prediction via Transformer guided CNN. Measurement 2022, 203, 15. [Google Scholar] [CrossRef]
Figure 1. The schematic diagram of the strain measurement system.
Figure 2. The Transformer-based model architecture.
Figure 3. (a) Structure of scaled dot-product attention. (b) Structure of multi-head attention.
Figure 4. The strain- and temperature-testing platform.
Figure 5. Drift-elimination results of different algorithms for simulated signals. (a) RNN. (b) GRU. (c) AGRU. (d) Transformer.
Figure 6. Drift-elimination results of different algorithms for simulated signals with Gaussian noise. (a) RNN. (b) GRU. (c) AGRU. (d) Transformer.
Figure 7. Signals at different temperatures and drift elimination. (a) Ambient temperature of 23 °C. (b) Ambient temperature of 100 °C. (c) Ambient temperature of 170 °C. (d) Ambient temperature of 210 °C.
Figure 8. Drift-elimination results when subjected to force.
Table 1. Measurement System Circuit Parameters.

Electronic Components               | Device Parameters
Resistors in the Wheatstone Bridge  | 120 Ω
Amplifier                           | AD8230 (Analog Devices, Inc., Wilmington, MA, USA)
Filter                              | MAX7480 (Maxim Integrated Products, Inc., San Jose, CA, USA)
ADC                                 | AD7667, 16 bits, 1 MSPS (Analog Devices, Inc., Wilmington, MA, USA)
System on Chip                      | XC7Z020CLG400 (Xilinx, Inc., San Jose, CA, USA)
DDR3                                | MT41K128M16JT-125 (Micron Technology, Inc., Boise, ID, USA)
Ethernet                            | RTL8211E (Realtek Semiconductor Corp., Hsinchu, China)
QSPI Flash                          | W25Q256FV (Winbond Electronics Corporation, Taichung, China)
Table 2. Performance of different algorithms on simulated signals.

Methods      | MAPE (%) | RMSE (V) | MAE (V)
RNN          | 1.2126   | 0.0411   | 0.0325
GRU          | 1.0377   | 0.0352   | 0.0278
AGRU         | 0.2343   | 0.0076   | 0.0062
Transformer  | 0.1872   | 0.0060   | 0.0049
Table 3. Performance of different algorithms on simulated signals with Gaussian noise.

Methods      | MAPE (%) | RMSE (V) | MAE (V)
RNN          | 1.3477   | 0.0421   | 0.0351
GRU          | 1.0650   | 0.0358   | 0.0286
AGRU         | 0.3234   | 0.0109   | 0.0085
Transformer  | 0.2634   | 0.0085   | 0.0069
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
