Article

Data-Driven Feature Extraction-Transformer: A Hybrid Fault Diagnosis Scheme Utilizing Acoustic Emission Signals

1 China Oil & Gas Pipeline Network Corporation Central China Branch, Wuhan 430000, China
2 School of Mechanical & Electrical Engineering, Wuhan Institute of Technology, Wuhan 430205, China
3 School of Mechatronic Engineering, Southwest Petroleum University, Chengdu 610500, China
* Authors to whom correspondence should be addressed.
Processes 2024, 12(10), 2094; https://doi.org/10.3390/pr12102094
Submission received: 28 August 2024 / Revised: 22 September 2024 / Accepted: 25 September 2024 / Published: 26 September 2024

Abstract

This paper introduces a novel network, DDFE-Transformer (Data-Driven Feature Extraction-Transformer), for fault diagnosis using acoustic emission signals. The DDFE-Transformer network integrates two primary modules: the DDFE module, which focuses on noise reduction and feature enhancement, and the Transformer module. The DDFE module employs two techniques: the Wavelet Kernel Network (WKN) for noise reduction and the Convolutional Block Attention Module (CBAM) for feature enhancement. The wavelet function in the WKN reduces noise, while the attention mechanism in the CBAM enhances features. The Transformer module then processes the feature vectors and sends the results to the softmax layer for classification. To validate the proposed method’s efficacy, experiments were conducted using acoustic emission datasets from NASA Ames Research Center and the University of California, Berkeley. Results were compared using four key metrics derived from confusion matrix analysis. Experimental results show that the proposed method performs excellently in fault diagnosis using acoustic emission signals, achieving a high average accuracy of 99.84% and outperforming several baseline models, such as CNN, CNN-LSTM, CNN-GRU, VGG19, and ZFNet; the best-performing baseline, VGG19, achieved an accuracy of only 88.61%. Additionally, the findings suggest that integrating noise reduction and feature enhancement in a single framework significantly improves the network’s classification accuracy and robustness when analyzing acoustic emission signals.

1. Introduction

Milling tools are critical components in machining processes, and their performance directly impacts machining efficiency and productivity. High-quality milling tools enable high-speed cutting, reduce processing time, and enhance production rates [1]. However, tool failures during operation can lead to property damage, production downtime, and even pose risks to personnel. Therefore, effective fault diagnosis of milling tools is crucial to ensuring smooth industrial production [1,2,3]. Tool wear monitoring methods can be broadly categorized into direct and indirect measurements [4]. However, direct measurement methods increase costs and are less efficient in industrial environments, often failing to provide accurate real-time assessments of tool conditions, particularly in complex working conditions. In contrast, indirect measurement methods, which monitor physical signals associated with tool conditions, offer more cost-effective and real-time feedback. Consequently, indirect methods are predominantly employed in tool monitoring. These methods infer tool wear through the detection of physical signals such as vibration, cutting force, acoustic emission, and current signals [5,6,7]. Studies indicate that 7% to 20% of machine downtime is caused by excessive tool wear or damage, while the cost of tools and their replacement accounts for 3% to 12% of total production costs. Therefore, developing an effective fault diagnosis method for milling tools is of paramount importance [8].
Current and vibration signals are widely used in the field of fault diagnosis due to their rich information content and ease of detection [9,10,11]. However, these two signals often lack sufficient sensitivity when detecting small cracks or defects. In contrast, acoustic emission (AE) signals can effectively compensate for these limitations, providing enhanced sensitivity in such cases. AE is a non-destructive monitoring technique based on the elastic waves generated when internal stress in a material is redistributed due to structural changes, a process that converts mechanical energy into sound energy. AE spans a broad frequency range, from a few Hz up to 1 MHz, and can even extend to 10 MHz. Secondary AE sources, which are not directly related to microscopic dislocations, macroscopic deformations, or material fractures, primarily include fluid flow, leakage, friction, impact, and combustion [12]. In the realm of artificial intelligence, machine learning was initially employed for fault detection and diagnosis based on AE signals. For example, Unterberg et al. [13] investigated the use of AE signals for monitoring milling tool wear during precision blanking processes. By applying both linear and nonlinear dimensionality reduction techniques to visualize the extracted feature space, they successfully unveiled the temporal progression of tool wear. Similarly, Twardowski et al. [14] investigated tool wear identification based on acoustic emission signals and machine learning methods. They monitored edge wear on tools using acoustic emission signals and employed machine learning techniques, such as decision trees, to classify tool wear states. However, the model used in that study is relatively simplistic and lacks robustness. Cui et al. [15] presented an enhanced method for pipeline leak detection using AE signals, integrating improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) and probabilistic neural networks (PNNs). This approach, validated through experimental research, demonstrates superior stability, anti-interference capabilities, and detection precision, achieving a 98% accuracy rate in identifying pipeline leaks, even in noisy environments. Jiang et al. [16] proposed an improved RepVGG technique for diagnosing faults in extremely low-speed, heavy-load bearings using AE signals. This method converts normalized and noise-reduced bearing signals into Mel frequency cepstrum coefficients (MFCCs) and employs an optimized deep neural network for analysis.
To address increasingly complex challenges, deep learning has been integrated into fault diagnosis research based on AE [17,18,19]. Hou et al. [20] utilized a fingerprint feature recognition method based on AE technology to diagnose and predict faults in high-speed train wheel set bearings. This approach accurately identifies and forecasts bearing issues using AE signals. Wang et al. [21] developed a bearing fault diagnosis method that combines vibro-acoustic data from accelerometers and microphones within a 1D-CNN network. Extensive experiments on ten groups of bearings showed that this method offers higher accuracy and robustness compared to single-modal approaches. The study also includes a visualization analysis to clarify the method’s internal mechanisms. Choudhary et al. [22] introduced a vibro-acoustic fusion technique using a multi-input convolutional neural network (MI-CNN) for the accurate fault diagnosis of induction motors under varying conditions. Its effectiveness and reliability were demonstrated through experiments on six motor conditions and validated with additional bearing and gearbox datasets. Liu et al. [23] proposed a novel cross-scale data-based methodology for damage identification in CFRP laminates by integrating AE and deep learning. The study employed wavelet packet transform (WPT) and continuous wavelet transform (CWT) to preserve frequency domain information. Using a CNN, the methodology achieved 96.3% accuracy in detecting and classifying damage modes. Li et al. [24] introduced a physics-guided deep-learning framework for monitoring the loading conditions of high-strength bolt connections using AE data. By integrating supervised and unsupervised learning, this method achieved high accuracy in diagnosing damage mechanisms and identifying loading stages. It outperformed traditional models by successfully distinguishing between static friction failure, shear failure, and other damage mechanisms. Wang et al. [25] presented an advanced early fault detection (EFD) method for rotating machinery using AE signals, enhanced with a convolutional generative adversarial network (GAN) and a novel history-state ensemble (HSE) technique. This method effectively extracted deep information from AE signals, improving the robustness and performance of EFD without requiring additional computing time or specialized network architecture, as demonstrated by durability tests. Nashed et al. [26] introduced a methodology for classifying gas turbine failures using AE, wavelet analysis, and deep learning. By extracting time-series envelopes and generating time-frequency features with continuous wavelet transform, the study trained a deep convolutional neural network to accurately classify normal and faulty turbine conditions. This approach demonstrated effectiveness in early fault detection and condition monitoring. Huang et al. [27] proposed a lightweight neural network architecture for monitoring pipeline weld crack leakage using AE signals. This method reduced sampling points by 80% while maintaining the high characterization capabilities of AE data. Validated through three experiments with different crack leaks, the method demonstrated superior performance metrics compared to state-of-the-art techniques, offering a promising solution for industrial pipeline leak monitoring. Finally, Wang et al. [28] introduced a novel method for early sub-surface fault detection in rolling element bearings using AE signals. This approach incorporated a hybrid parameter, the information entropy penalty factor (IEPF), combining entropy theory and deep learning to overcome the limitations of traditional parameters, as demonstrated in roller-bearing contact fatigue experiments.
In this paper, a network named DDFE-Transformer is proposed, comprising two key modules: DDFE and Transformer. The DDFE module performs data noise reduction and feature enhancement using WKN and CBAM, where WKN’s wavelet function is employed for noise reduction and CBAM’s attention mechanism for feature enhancement. Subsequently, the Transformer module computes the feature vectors and forwards the results to the softmax layer for classification. The uniqueness of this network lies in the integration of noise reduction and feature enhancement within a single framework, thereby improving classification accuracy and robustness.
The remainder of the paper is organized as follows: Section 2 discusses the theoretical background; Section 3 details the proposed DDFE-Transformer fault diagnosis method; Section 4 covers the experiments and analysis; and Section 5 concludes the paper.

2. Theoretical Background

2.1. Wavelet Kernel Network

A CNN comprises an input layer, multiple hidden layers for feature extraction, and an output layer for generating results [29,30,31]. The hidden layers typically include a convolutional layer, a pooling layer, an activation layer, and a fully connected layer. Among these, the convolutional layer plays a pivotal role in feature extraction. It utilizes a set of learnable filters, or convolutional kernels, to perform convolution operations across the input data using a sliding window approach [22]. Each filter in the convolutional layer functions as a small weight matrix undergoing element-wise multiplication with localized regions of the input data [20]. The results are then summed to produce an output feature map. By sliding the filters across different positions and consistently applying the same filters, the convolutional layer effectively localizes and extracts specific features from the input data [21].
Convolutional layers exhibit two key characteristics: parameter sharing and translation invariance [32,33]. Parameter sharing refers to the use of the same filter across the entire input data, reducing the number of parameters to be learned and improving model efficiency. Translation invariance is the convolutional layer’s ability to detect identical features regardless of their location within the input data. This property is particularly valuable when dealing with localized patterns in data, such as images, speech, and text. However, convolutional layers have inherent limitations [24]. Firstly, conventional convolutional layers use fixed-size convolutional kernels, which limits their ability to extract features at varying scales and capture long-term dependencies effectively. Secondly, these layers often operate as a “black box”, making it difficult to interpret their internal decision-making processes.
To address these challenges, the Wavelet Kernel Network (WKN) is introduced [25]. This network incorporates an internal Continuous Wavelet Convolutional (CWConv) layer, effectively replacing the initial convolutional layer found in a standard CNN. This adaptation enables the CWConv layer to extract more meaningful features [26]. The significance of this conversion lies in WKN’s ability to better handle complex signals, especially in applications such as tool wear monitoring. Through CWConv, WKN captures more meaningful and detailed feature representations, thereby improving classification accuracy and robustness. Regarding its applicability, while WKN shows clear advantages in processing non-stationary signals (such as acoustic emission or vibration data), it is not necessarily suitable for all types of data. For instance, in applications where spatial invariance or global feature extraction is more critical, a CNN may still be the preferred choice. Therefore, the use of WKN should be determined based on the specific application and the characteristics of the data. The structural transition from CNN to WKN is visually depicted in Figure 1.
The CWConv layer is defined as follows:
$$W = \psi_{v,r}(t) \ast x$$
where $\psi_{v,r}(t)$ is a predefined wavelet function with an explicit time-domain expression, $x$ is the input signal, and $W$ denotes the output of the CWConv layer. The translation and scaling parameters of the CWConv layer are updated as
$$\nabla\theta_{v_i} = \frac{\partial W}{\partial v_i} = \frac{\partial W}{\partial z_i}\frac{\partial z_i}{\partial w_i}\frac{\partial w_i}{\partial \psi_{v,r}^{i}}\frac{\partial \psi_{v,r}^{i}}{\partial v_i}, \qquad \nabla\theta_{r_i} = \frac{\partial W}{\partial r_i} = \frac{\partial W}{\partial z_i}\frac{\partial z_i}{\partial w_i}\frac{\partial w_i}{\partial \psi_{v,r}^{i}}\frac{\partial \psi_{v,r}^{i}}{\partial r_i}$$
$$v_i = v_i - \eta\,\nabla\theta_{v_i}, \qquad r_i = r_i - \eta\,\nabla\theta_{r_i}$$
where $\nabla$ is the derivative operator, $\psi_{v,r}^{i}$ is the $i$-th wavelet kernel of the CWConv layer of length $L$, and $r_i$ and $v_i$ denote the scale parameter and translation parameter, respectively. Parameters $v$ and $r$ are updated by subtracting the gradient $\nabla\theta$ scaled by the learning rate $\eta$.
The CWConv layer performs convolutional operations using wavelet functions at varying scales. This capability allows for multi-scale characterization, enhancing the model’s ability to capture intricate details and diverse frequency components within the signal. Additionally, the integration of translation parameters with the wavelet functions effectively accommodates different time scales within the WKN network. As a result, the network efficiently processes extensive time series data while adeptly capturing the long-term temporal dependencies inherent in such data.
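For concreteness, the sketch below shows one way such a CWConv layer can be implemented in PyTorch. The Morlet mother wavelet, the parameter initializations, and the class name are illustrative assumptions rather than the authors’ released code; the point is that the scale and translation parameters are ordinary learnable tensors, so they receive gradient updates exactly as in the chain-rule expressions above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CWConv(nn.Module):
    """Continuous Wavelet Convolutional layer (illustrative sketch).

    Each output channel owns a learnable scale r_i and translation v_i; the
    kernel is a Morlet wavelet sampled with those parameters, so backprop
    updates v and r via the update rule given above.
    """
    def __init__(self, out_channels: int, kernel_size: int):
        super().__init__()
        self.kernel_size = kernel_size
        self.r = nn.Parameter(torch.linspace(1.0, 10.0, out_channels))  # scales r_i
        self.v = nn.Parameter(torch.zeros(out_channels))                # translations v_i

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, length)
        t = torch.linspace(-1.0, 1.0, self.kernel_size, device=x.device)
        r = self.r.abs().clamp(min=1e-3)                  # keep scales positive
        u = (t[None, :] - self.v[:, None]) / r[:, None]   # (t - v_i) / r_i
        psi = torch.cos(1.75 * u) * torch.exp(-u ** 2 / 2)  # Morlet mother wavelet
        psi = psi / r[:, None].sqrt()                     # energy normalization
        return F.conv1d(x, psi.unsqueeze(1), padding="same")

x = torch.randn(8, 1, 1024)        # a batch of AE signal segments
print(CWConv(32, 16)(x).shape)     # -> torch.Size([8, 32, 1024])
```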

2.2. Convolutional Block Attention Module

Attention models in deep learning help systems identify the most relevant information for a given task, much like the human visual system focuses on essential details while ignoring irrelevant ones [17]. These models filter inputs to isolate the most useful data, thereby improving the network’s feature extraction capabilities and prediction accuracy. However, integrating attention mechanisms with CNN can complicate neuron connections. To address this, researchers developed the CBAM, which integrates seamlessly with convolutional layers and has proven effective in diagnosing mechanical device faults. The structure of CBAM is shown in Figure 2a.
CBAM comprises two main components: the channel attention module (CAM) and the spatial attention module (SAM). These components enhance the network’s feature extraction capability, leading to more accurate diagnostic outcomes. Figure 2b,c illustrate the generation processes of SAM and CAM, respectively.
The channel attention module (CAM) targets feature channels, using global maximum pooling and global average pooling to obtain feature vectors. A multilayer perceptron processes these vectors to reduce dimensionality and extract relevant signal features, which are then combined into weight coefficients using a Sigmoid function to weight the input features. Conversely, the spatial attention module (SAM) uses a shared convolutional layer to extract features, focusing on capturing useful information within the feature space. It first obtains a feature vector of size 1 × 1 × N through a global pooling layer; a multilayer perceptron then refines the useful feature information, and a Sigmoid function fuses these features into weight coefficients for the spatial dimensions of the input features. Integrating the channel and spatial attention modules forms CBAM, significantly enhancing the extraction of valuable information from input features and thereby improving diagnostic accuracy. Equations (4) and (5) summarize the computations of CAM and SAM, respectively:
$$M_C(S) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(S)) + \mathrm{MLP}(\mathrm{MaxPool}(S))\right)$$
$$M_S(S) = \sigma\left(f([\mathrm{AvgPool}(S); \mathrm{MaxPool}(S)])\right)$$
where $\sigma$ denotes the Sigmoid activation function, $\mathrm{MLP}$ represents a multi-layer perceptron model, $\mathrm{AvgPool}$ and $\mathrm{MaxPool}$ denote average pooling and max pooling, respectively, $f(\cdot)$ denotes the convolution operation, $S$ is the intermediate-layer feature map, $M_C(S)$ denotes the channel attention feature map, and $M_S(S)$ denotes the spatial attention feature map.
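The two attention computations map directly onto code. The following PyTorch sketch is a minimal 1D variant of CBAM for sequence features; the reduction ratio of 8 and the 7-point spatial convolution are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: squeeze each channel by avg- and max-pooling, weight via a shared MLP (Eq. (4))."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, s):                        # s: (batch, C, L)
        avg = self.mlp(s.mean(dim=-1))           # AvgPool over the length axis
        mx = self.mlp(s.amax(dim=-1))            # MaxPool over the length axis
        return torch.sigmoid(avg + mx).unsqueeze(-1) * s

class SpatialAttention(nn.Module):
    """SAM: concatenate channel-wise avg/max maps, convolve, apply Sigmoid (Eq. (5))."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, s):
        pooled = torch.cat([s.mean(dim=1, keepdim=True),
                            s.amax(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled)) * s

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.cam, self.sam = ChannelAttention(channels), SpatialAttention()

    def forward(self, s):
        return self.sam(self.cam(s))
```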

2.3. Transformer

The Transformer addresses limitations of traditional neural networks, such as the lack of parallel computation, thereby enhancing training efficiency [34]. Moreover, it excels at capturing internal data correlations and extracting temporal features, overcoming the feature extraction shortcomings of CNN models. By utilizing the self-attention mechanism, the Transformer effectively captures long-distance dependencies, a challenge that traditional recurrent neural networks struggle to address. Due to these advantages, the Transformer has become highly popular in NLP and demonstrates strong performance across various tasks.
The Transformer model comprises two main components: an encoder and a decoder. Given the need to compute a continuous representation of the input sequence for fault diagnosis, only the encoder and positional encoding components of the Transformer are utilized. As illustrated in Figure 3, the key components of the Transformer model include position encoding and the multi-head attention mechanism. Position encoding captures the ordering information of the input data, while the multi-head self-attention mechanism encodes the sequence to extract temporal features.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $N$ specifies the length of the queries, $M$ the length of the keys (and values), $D_k$ the dimension of the queries and keys, $D_v$ the dimension of the values, and $1/\sqrt{d_k}$ the scaling factor.
The scaling factor is used for normalization, which decouples the vector dimension from the distribution of softmax, allowing the gradient to remain stable during training and avoiding the problem of vanishing gradients. The dimensions of the query and keys need to be consistent, and the lengths of the keys and values need to be the same.
Unlike architectures that use only a single attention module, the Transformer runs multiple attention modules in parallel: the original $D_m$-dimensional queries, keys, and values are projected by learned matrices into $H$ different sets of $D_k$-, $D_k$-, and $D_v$-dimensional vectors, respectively. Attention is computed for each projected query, key, and value by Equation (9), and the outputs of all heads are then concatenated and projected back to the $D_m$ dimension.

2.3.1. Position Code

In the fault diagnosis of mechanical equipment, the output of the model varies with the order of the input sequences, so the positional information of the sequence is very important to the Transformer. However, the self-attention mechanism does not by itself learn the positions of sequence elements. To solve this problem, position coding is used to capture positional information in the Transformer model; the encoding functions are calculated as follows:
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)$$
$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)$$
where $pos$ denotes the position of the feature vector and $d_{\mathrm{model}}$ denotes the dimension of the feature vector. The position of each feature vector is thus encoded by sine and cosine functions of different frequencies.
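A minimal implementation of this sinusoidal position code, assuming an even $d_{\mathrm{model}}$, might look as follows.

```python
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal position code: sine on even dimensions, cosine on odd ones."""
    assert d_model % 2 == 0, "assumes an even model dimension"
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimension indices
    angle = pos / torch.pow(10000.0, i / d_model)                   # pos / 10000^(2i/d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)    # even indices: sine
    pe[:, 1::2] = torch.cos(angle)    # odd indices: cosine
    return pe

print(positional_encoding(64, 30).shape)   # -> torch.Size([64, 30])
```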

2.3.2. Self-Attention

The self-attention mechanism is a common technique in natural language processing, widely employed for sequence data processing. Essentially, it is a network structure akin to a non-local filtering operation that is proficient at establishing connections between different segments of a sequence, thereby generating a newly encoded sequence. Furthermore, the self-attention mechanism can assign varying weights to different segments, prioritizing the more important parts of the sequence. Its schematic diagram is depicted in Figure 4. In comparison to traditional RNNs and CNNs, the self-attention mechanism models dependencies without imposing distance limitations and naturally handles collections of vectors of varying size, making it a versatile solution for various data representations.
The Transformer model utilizes a scaled dot-product attention mechanism to calculate the attention values of the feature matrix, as illustrated in Figure 4. In this mechanism, the query and key matrices are first subjected to a dot-product operation and normalized using softmax to compute the weight coefficients; the value matrix is then weighted and summed according to these coefficients. Given the input $X = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{n \times d}$ and the final output $Y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^{n \times d}$, the specific calculations are shown in Equations (12) and (13).
$$Q = X_f W_Q, \qquad K = X_f W_K, \qquad V = X_f W_V$$
$$Y = \mathrm{SA}(Q, K, V) = \mathrm{Softmax}(A)\,V = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$
where $d$ represents the number of dimensions, $Q$, $K$, and $V$ are the query, key, and value matrices, $X_f$ is the input feature matrix, and $W_Q$, $W_K$, and $W_V$ are the learned weight matrices that produce the query, key, and value matrices, respectively.
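Equations (12) and (13) translate into a few lines of code. The sketch below is a minimal dense implementation, with randomly initialized weight matrices standing in for the learned $W_Q$, $W_K$, and $W_V$.

```python
import torch

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """Scaled dot-product self-attention, following Equations (12) and (13)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # Q = X_f W_Q, etc.
    a = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # A = QK^T / sqrt(d)
    return torch.softmax(a, dim=-1) @ v               # weighted sum of the values

n, d = 16, 30
x = torch.randn(n, d)                                 # n feature vectors of dimension d
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
y = self_attention(x, w_q, w_k, w_v)                  # (16, 30), same shape as x
```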

2.3.3. Multi-Head Self-Attention Mechanism

A key component in the Transformer model is the multi-head self-attention mechanism, an enhancement of the self-attention mechanism designed to better handle sequential data. In the traditional self-attention mechanism, each query, key, and value is generated from the same vector. In contrast, in the multi-head self-attention mechanism, these vectors are divided into multiple heads that learn the information in different subspaces separately, each head generating its own set of query $Q$, key $K$, and value $V$. The multiple attention outputs are then concatenated and linearly transformed to obtain the final attention value, which better represents the information in the sequence. The multi-head self-attention mechanism improves the performance of the self-attention layer, in a manner similar to the multi-channel convolutional kernels in a CNN.
The self-attention mechanism allows the model to focus on important information within the input features. However, a single attention mechanism is limited to learning relevant information within a single representation space, which may restrict the model’s ability to simultaneously attend to multiple aspects of the input. To synthesize the significance of information contained in the input sequence, this paper employs a multi-head self-attention mechanism, which jointly attends to information from different representation subspaces at various locations, as illustrated in Figure 5. During fault diagnosis, the multi-head attention mechanism can effectively capture the time features related to the faults. The corresponding mathematical expressions are provided in Equations (14) and (15).
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(h_1, \ldots, h_m)\,W$$
$$h_i = \mathrm{Attention}\left(XW_i^{Q}, XW_i^{K}, XW_i^{V}\right)$$
where $W$ represents the output weight matrix of the multi-head attention, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ denote the query, key, and value weight matrices of the $i$-th attention head, $m$ stands for the number of attention heads, and $\mathrm{Concat}(\cdot)$ denotes the concatenation of the attention head outputs.
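Rather than re-implementing Equations (14) and (15) by hand, a sketch can lean on PyTorch’s built-in module, which performs exactly this head-wise projection, attention, concatenation, and output projection. The sizes below match the d_model = 30 and nhead = 3 configuration reported later in Table 4; the batch and sequence sizes are illustrative.

```python
import torch
import torch.nn as nn

d_model, num_heads, seq_len = 30, 3, 64   # d_model = 30, nhead = 3 as in Table 4
x = torch.randn(8, seq_len, d_model)      # (batch, sequence, feature)

# each of the 3 heads attends in a 10-dimensional subspace; the head outputs
# are concatenated and projected back to d_model, as in Equations (14) and (15)
mha = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
out, weights = mha(x, x, x)               # self-attention: query = key = value = x
print(out.shape)                          # -> torch.Size([8, 64, 30])
```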

3. DDFE-Transformer Fault Diagnosis Method

The flowchart of the DDFE-Transformer fault diagnosis method is illustrated in Figure 6. The specific steps are as follows:
Step 1: AE data collected from NASA Ames Research Center and the University of California, Berkeley, were organized into an appropriate dataset format. This dataset, primarily used to study tool wear, records data from milling experiments under different operating conditions. Preprocessing of the raw data, including denoising and normalization, was performed to enhance data quality and minimize the impact of noise on subsequent analysis. Additionally, data augmentation was conducted using an overlapping sampling strategy to increase the number of samples and improve the model’s generalization ability.
Step 2: The preprocessed data are fed into the DDFE module, which comprises the WKN and CBAM. The WKN module applies wavelet transform to reduce noise in the acoustic emission signal. Through multi-scale analysis, the wavelet transform effectively removes noise while preserving essential features. The CBAM module enhances the noise-reduced signal using channel attention and spatial attention mechanisms. CBAM extracts global features through global average pooling and max pooling operations, followed by feature weight adjustment using a sigmoid activation function, thereby improving the network’s feature extraction capability.
Step 3: The data processed by the DDFE module are passed to the Transformer module, which consists of multiple encoder layers and utilizes the multi-head self-attention mechanism for feature computation and processing. This step enhances the efficiency and accuracy of feature extraction.
Step 4: The feature vectors processed by the Transformer module are input to the softmax layer for classification, yielding results for tool wear level and fault type. During training, the cross-entropy loss function is used, and parameters are updated using the Adam optimization algorithm. Hyperparameters such as learning rate, weight decay, number of training rounds, and batch size are optimized using the Bayesian optimization method.
Step 5: Ablation and comparison experiments are conducted to evaluate the importance of the Transformer and CBAM modules by observing changes in classification accuracy and robustness when either module is removed. The DDFE-Transformer is then compared with other deep learning models to validate its performance.
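Under the assumption that the CWConv and CBAM sketches from Section 2 are in scope, the steps above might be wired together roughly as follows. The pooling to a fixed token width and the classifier head are simplifications for illustration, not the authors’ exact architecture; layer sizes loosely follow Table 4.

```python
import torch
import torch.nn as nn

class DDFETransformer(nn.Module):
    """End-to-end sketch of Figure 6: WKN + CBAM front end, Transformer encoder,
    softmax classifier. Reuses the CWConv and CBAM sketches defined above."""
    def __init__(self, num_classes: int = 12, d_model: int = 30):
        super().__init__()
        self.ddfe = nn.Sequential(
            CWConv(32, kernel_size=16),                    # wavelet-based denoising
            CBAM(32),                                      # channel + spatial attention
            nn.Conv1d(32, 64, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(d_model))                 # -> (batch, 64, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=3, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(64 * d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, 1, length)
        z = self.encoder(self.ddfe(x))                     # 64 tokens of width d_model
        return self.classifier(z.flatten(1))               # logits; softmax is in the loss

model = DDFETransformer()
logits = model(torch.randn(4, 1, 1024))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))  # training objective
```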

4. Experiments and Analysis

4.1. Experimental Setup and Data Processing

4.1.1. Data Presentation

The dataset utilized in this study was obtained from the NASA Ames Research Center and the University of California, Berkeley, as part of research conducted by Kai Goebel and Alice Agogino [35]. It comprises data from milling experiments conducted under various operating conditions, primarily aimed at investigating tool wear. The dataset includes sixteen cases, each with a different number of experimental runs, as detailed in Table 1. The number of runs was determined based on the recorded flank wear, which was measured at irregular intervals until the wear limit was either reached or exceeded. It should be noted that some instances lack entries due to the irregularity of the flank wear measurements.
The process of data collection is as follows:
  • Acoustic Emission Sensor (Spindle): The sensor is mounted on the spindle to detect acoustic emissions during the milling process.
  • Preamplifier: The signal from the acoustic emission sensor is first fed into a preamplifier to enhance the signal strength and quality.
  • RMS: The amplified signal is then processed through an RMS (Root Mean Square) device to smooth the signal and make it suitable for further analysis.
  • Computer: Finally, the processed signal is sent to a computer for data acquisition, storage, and analysis.
This sequence ensures that the acoustic emission data are accurately captured and processed for reliable analysis of tool wear and milling conditions.
The specific test setup is shown below:
An acoustic emission sensor model WD 925 (PHYSICAL ACOUSTIC GROUP, up to 2 MHz) is mounted on the table, adhered to a custom base attached to the clamping support. Figure 7a illustrates the sensor’s layout on the clamping device. For industrial purposes, a 70 mm face mill with six inserts was selected (Figure 7b). The recommended inserts, KC710 (Kennametal, Latrobe, PA, USA, 1985), are specifically designed for roughing operations. These inserts feature multiple layers of titanium carbide, titanium carbonitride, and titanium nitride (TiC/TiC-N/TiN), providing the toughness of tungsten carbide along with enhanced resistance to cratering and edge wear. Additionally, they reduce face friction, making them suitable for heavy roughing tasks.
Acoustic emissions, high-frequency oscillations that occur spontaneously within metals during deformation or fracture, result from the release of strain energy as the material’s microstructure rearranges. These emissions originate in primary and secondary shear zones (Figure 7c), occurring at the chip/tool interface through bulk deformation and sliding, as well as at the tool flank/workpiece interface due to friction. Although the fundamental acoustic emission signal is sinusoidal, it becomes random due to reflection and scattering from structural defects. The frequency of these oscillations ranges from 50 kHz to several MHz. It is preferable to position the acoustic emission sensor near the cutting zone because the signal attenuates with distance, though sensor protection can be challenging.
Vibration emissions, characterized by low-frequency oscillations (0–40 kHz), result from object acceleration due to dynamic cutting force changes, periodic tool geometry changes, chip formation, and built-up edges. Similar to acoustic emissions, the vibration sensor should be placed close to the cutting zone to minimize signal attenuation. While the Matsuura machining center’s rigidity reduces vibrations compared to the upright Bridgeport milling machine, vibrations still contribute to tool wear and must be considered.

4.1.2. Dataset Split

The data from case 1 were chosen and labeled according to the tool’s wear level, as shown in Table 2. There are 55 samples for each label, divided into training, validation, and test sets in a 6:2:2 ratio. In Figure 8, the horizontal axis represents a complete machining pass from tool engagement to retraction, with 0, 2000, 4000, and 8000 marking the sampling-point indices; the vertical axis shows the acoustic emission (AE) signal amplitude normalized to [0, 1], which makes the signal’s strength variation at different time points more intuitive and facilitates comparison and analysis of different signals. Each plot corresponds to a tool with a different degree of wear, illustrating how the AE signal, captured by monitoring the high-frequency sound generated by the milling tool during operation, fluctuates throughout the complete milling process.

4.1.3. Data Preprocessing

By increasing the number of samples, the model’s generalization ability is significantly boosted, enabling it to learn effectively from a large dataset and iteratively refine network parameters. To thoroughly characterize signals, overlapping sampling for data augmentation is employed. An overlapping sampling strategy with a step size of 128 data points is shown in Figure 9. The blue signal represents a segment of the captured acoustic signal, while the red boxes (Sample1 and Sample2) indicate two samples extracted from the signal using a sliding window. Each sample contains a portion of the original signal data. The “step” denotes the stride length of the sliding window, and the “overlap” refers to the shared signal data between adjacent samples. This overlapping technique ensures continuity between samples, allowing the capture of subtle variations within the signal.
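A minimal implementation of this overlapping sampling is a sliding window whose stride is smaller than its length. In the sketch below, the window length of 1024 points is an assumption for illustration; only the 128-point step is taken from the text.

```python
import numpy as np

def overlap_sample(signal: np.ndarray, window: int = 1024, step: int = 128) -> np.ndarray:
    """Slide a window of length `window` with stride `step`; step < window makes
    adjacent samples share (window - step) points, as in Figure 9."""
    n = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])

sig = np.random.randn(8000)      # stand-in for one machining pass
samples = overlap_sample(sig)    # (55, 1024); adjacent rows overlap by 896 points
```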

4.2. Parameter Setting

During training, the DDFE-Transformer network model utilizes the cross-entropy loss function, and the Adam optimization algorithm updates the network weights. Bayesian optimization is applied to determine the best values for the ‘learning rate’, ‘weight decay’, ‘epoch’, and ‘batch size’ parameters, as shown in Table 3. The model’s architecture parameters, detailed in Table 4, include ‘filter’ defining the number of convolutional kernels, ‘stride’ indicating the step length of the convolution kernel sliding over the input feature map, ‘kernel_size’ referring to the convolutional kernel size, and ‘activation’ specifying the activation function. ‘D_model’ represents the dimension of input and output features, ‘num_heads’ indicates the number of heads in the Transformer’s multi-head attention mechanism, and ‘num_layers’ refers to the number of encoder layers in the Transformer. The softmax function is used in the output layer.
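The paper does not specify the optimization library; as one plausible realization, the search over Table 3’s space could be driven by Optuna, whose default sampler performs Bayesian-style (TPE) optimization. Here train_and_validate is a hypothetical stand-in for one full training run returning validation accuracy.

```python
import optuna

def train_and_validate(batch_size, lr, epochs, weight_decay) -> float:
    """Hypothetical stand-in: train DDFE-Transformer once, return validation accuracy."""
    return 0.0  # placeholder; a real run would fit the model and evaluate it

def objective(trial: optuna.Trial) -> float:
    # the search space mirrors Table 3
    return train_and_validate(
        batch_size=trial.suggest_categorical("batch_size", [8, 16, 32, 64, 128, 256]),
        lr=trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
        epochs=trial.suggest_int("epoch", 15, 50),
        weight_decay=trial.suggest_float("weight_decay", 0.0, 0.4),
    )

study = optuna.create_study(direction="maximize")   # maximize validation accuracy
study.optimize(objective, n_trials=50)
print(study.best_params)                            # e.g. lr close to 1.094e-4, as in Table 3
```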

4.3. Metrics

To thoroughly assess the proposed method’s effectiveness and classification accuracy, four evaluation metrics are used. In binary classification, correct predictions are labeled True Positive (TP) for the positive class and True Negative (TN) for the negative class, while predictions that contradict the actual label are termed False Positive (FP) and False Negative (FN) [36]. The model’s accuracy and precision are calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall, also known as sensitivity or True Positive Rate (TPR), measures a classification model’s ability to accurately identify positive class samples, as defined below. Ranging from 0 to 1, higher recall values indicate better detection of positive samples. A high recall rate is vital to ensure all positive samples are detected. Evaluating classification models requires considering both recall and accuracy; hence, using multiple metrics provides a more comprehensive assessment than relying on a single metric.
$$\mathrm{Recall} = \mathrm{TPR} = \frac{TP}{TP + FN}$$
False Positive Rate (FPR) represents the proportion of negative samples that are incorrectly classified as positive. The formula is as follows:
$$\mathrm{FPR} = \frac{FP}{FP + TN}$$
The Receiver Operating Characteristic (ROC) curve is plotted by gradually changing the classification threshold and calculating TPR and FPR at different thresholds. The X-axis represents the FPR, and the Y-axis represents the TPR. Typically, a good classifier will have the curve close to the top-left corner (high TPR and low FPR). The Area Under the Curve (AUC) is the area under the ROC curve and is used to evaluate the overall performance of a model. The AUC value ranges from [0, 1], where
AUC = 1 indicates a perfect classifier.
AUC = 0.5 indicates that the classifier is no better than random guessing.
AUC < 0.5 indicates that the classification performance is worse than random guessing.
In practice, the ROC curve is generated by progressively lowering the classification threshold from 1 (where all samples are predicted as negative) to 0 (where all samples are predicted as positive), yielding different TPR and FPR pairs.
The F1 score balances recall and precision, averaging these potentially conflicting metrics. It is valuable for assessing model performance, as defined below.
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
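All four metrics, plus the per-class AUC used for Figure 11, are available off the shelf. The sketch below computes them with scikit-learn on a small illustrative 3-class example, using macro averaging to extend the binary definitions above to the multi-class setting.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# illustrative softmax outputs for a 3-class problem (rows sum to 1)
y_true = np.array([0, 1, 2, 2, 1, 0])
proba = np.array([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6],
                  [0.1, 0.6, 0.3], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05]])
y_pred = proba.argmax(axis=1)                            # predicted class labels

print(accuracy_score(y_true, y_pred))                    # accuracy
print(precision_score(y_true, y_pred, average="macro"))  # macro-averaged precision
print(recall_score(y_true, y_pred, average="macro"))     # macro-averaged recall (TPR)
print(f1_score(y_true, y_pred, average="macro"))         # harmonic mean of the two
print(roc_auc_score(y_true, proba, multi_class="ovr"))   # one-vs-rest AUC, as in Figure 11
```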

4.4. Experimental Results and Analysis

Figure 10 illustrates the convergence of the proposed DDFE-Transformer model during training on the acoustic emission signal dataset for tool wear. The model exhibits excellent convergence, with significant progress observed by the fifth epoch, where it nearly completes the convergence process. By the 10th epoch, the model achieves full convergence, attaining 100% recognition accuracy on both the training and validation sets. This performance highlights the effectiveness of the proposed method in classifying acoustic emission signals related to tool wear.
Figure 11 illustrates the ROC curves for multiple classes of the proposed model, with each color representing a different class. In this figure, all the ROC curves are close to the top-left corner, and the AUC for each class is 1.00, indicating excellent classification performance for every category. The black dashed line represents the diagonal, which corresponds to the performance of a random classifier. The further the ROC curves are from this diagonal, the better the classifier’s performance. As shown here, all the ROC curves are well above the diagonal, demonstrating that the classification performance of this model is excellent.

4.4.1. Ablation

Through the ablation experiments, the contribution of each module within the proposed method was systematically validated. Since the CBAM-Transformer does not directly process the original signal, the comparison with the CBAM-Transformer model was excluded from the ablation experiments. Specifically, two ablation experiments were designed: (1) removing the CBAM while retaining the WKN-Transformer network; and (2) removing the Transformer module while retaining the DDFE network.
Table 5 presents the average results of five tests performed on the tool acoustic emission dataset for the three network models and documents the detailed outcomes of these experiments. As shown in Table 5, the proposed network model achieved consistently high values across all four metrics, with all metrics exceeding 99%, demonstrating its outstanding classification capability and robust performance. While the WKN-Transformer model also performed well, it showed slightly lower values across the four metrics compared to the proposed method. When the Transformer module was removed, the model’s accuracy, precision, and other metrics dropped significantly, with the accuracy dropping to as low as 77.92%.
Figure 12 provides an intuitive comparison of the performance of different models, clearly showing that the proposed model exhibits more stability and better robustness.
The confusion matrix is a commonly used evaluation method for multi-class classification tasks, providing a clear way to assess the model’s classification performance and identify which fault types are prone to misclassification. This is crucial for further model optimization. As shown in Figure 13a, the proposed method performed excellently in distinguishing samples with varying degrees of wear from the tool acoustic emission dataset. In contrast, Figure 13b reveals some misclassifications in certain classes (highlighted in red boxes), while Figure 13c shows that the WKN-CBAM model had a significantly higher number of classification errors, with multiple categories misclassified. It even failed to recognize the “0.44” and “0.45” categories entirely, further confirming the critical importance of each module.
The t-SNE plot provides an intuitive visualization of the clustering effect of data points in low-dimensional space, helping to evaluate the separability between different fault types and further validate the model’s effectiveness in distinguishing different wear conditions. As shown in Figure 14a, the proposed method effectively separated signal feature points from different categories, demonstrating its superior performance in handling high-dimensional signal features. In Figure 14b, while certain classes (as indicated by the red arrows) are well separated, others are more sparsely distributed, suggesting that the model, though capable of extracting some features, does not perform as well in classification compared to the proposed method. In Figure 14c, it is evident that WKN-CBAM produced the most scattered feature distribution, with signal feature points from different categories mixed together and overlapping significantly.
Overall, these results further confirm the severe negative impact on classification performance when the Transformer module and CBAM are removed from the model.

4.4.2. Comparison

To validate the effectiveness of the proposed method, this paper conducted a comparison with five classical models: CNN, CNN-LSTM [37], CNN-GRU [38], VGG19 [39], and ZFNet [40]. Table 6 presents the average performance of these models on the test set after individual training. As shown in the table, the proposed model demonstrates a significant advantage over the others. Among the compared models, VGG19 performs best, yet it still trails the proposed method by more than 11 percentage points in accuracy. The remaining models, such as CNN, CNN-LSTM, and CNN-GRU, achieved accuracies of 62.05%, 69.31%, and 76.39%, respectively, indicating that their performance in the classification task is significantly inferior to the proposed method. Figure 15 further visualizes these metric data, providing an intuitive display of the models’ performance across multiple tests.
Figure 16 shows box plots illustrating the accuracy distribution of different models. The proposed method not only achieved the best accuracy but also exhibited the smallest range of variation, further demonstrating its robustness and stability. In contrast, models such as VGG19 and CNN-GRU showed more dispersed accuracy distributions with greater fluctuations. The CNN model, in particular, displayed poor consistency and significant performance variability.
The confusion matrix in Figure 17 reveals the classification performance of each model on the test set. Compared to the proposed model, CNN (Figure 17b), CNN-LSTM (Figure 17c), and CNN-GRU (Figure 17d) exhibited higher rates of misclassification, especially in certain categories. For instance, the CNN model failed to identify category “0”; CNN-LSTM could not recognize categories “0.11”, “0.44”, “0.45”, and “0.50”; and CNN-GRU failed to identify categories “0.20”, “0.40”, and “0.43”. In addition, both VGG19 and ZFNet also exhibited classification errors in certain categories.
The t-SNE plots in Figure 18 provide a visual representation of how well the models cluster feature points in low-dimensional space. Compared to the proposed model, the distributions in Figure 18b,d,f appear more chaotic, with significant overlap and mixing between categories. This indicates that the CNN, CNN-GRU, and ZFNet models struggled to accurately distinguish between these categories, reflecting their limitations in classification ability. In Figure 18c, the CNN-LSTM model’s feature points are more dispersed, with some categories overlapping, indicating a limited ability to differentiate signal features. Figure 18e shows that the t-SNE plot of the VGG19 model exhibited relatively good clustering for some categories, with closely grouped feature points. However, overlap remained between different categories, particularly on the right-hand side (e.g., the pink and orange categories), suggesting that while VGG19 performed well, it still had shortcomings in distinguishing certain categories.
Overall, the results of the comparison experiments clearly demonstrate that the proposed model outperforms other classical models in all key metrics, exhibiting higher accuracy, consistency, and robustness.

5. Conclusions

This paper proposes a network named DDFE-Transformer for fault diagnosis based on acoustic emission signals. The DDFE-Transformer network comprises two key modules: DDFE and the Transformer. This method integrates data noise reduction and feature enhancement within a single network structure, effectively addressing the suboptimal signal processing capabilities of previous approaches. Additionally, the use of a Transformer enhances the processing of feature vectors, leading to more accurate fault identification. To validate the efficacy of the proposed method, experiments were conducted using acoustic emission datasets collected by NASA Ames Research Center and the University of California, Berkeley. The experimental results demonstrate that the DDFE-Transformer network excels in fault diagnosis using acoustic emission signals, achieving an accuracy of 99.84% in identifying potential fault types. These results significantly outperform traditional models such as CNN, CNN-LSTM, and VGG19, which achieved accuracies of 62.05%, 69.31%, and 88.61%, respectively. This approach provides an effective solution for condition monitoring in industrial production processes. Future work will explore the application of transfer learning to the model to achieve domain adaptation across multiple working conditions.

Author Contributions

Conceptualization, C.M. and Z.W.; methodology, Z.W.; software, C.M.; validation, J.G. (Jiuyang Gao), M.L. and Z.Z.; formal analysis, M.L.; investigation, Z.Z.; resources, J.Y.; data curation, J.Y.; writing—original draft preparation, C.M. and J.Z.; writing—review and editing, C.M.; visualization, C.M.; supervision, C.M.; project administration, J.G. (Junyu Guo); funding acquisition, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Natural Science Foundation of Sichuan, China (Grant No. 2023NSFSC856), China Oil & Gas Pipeline Network Corporation Central China Branch Stabilization Support.

Data Availability Statement

Data available on request due to restrictions (e.g., privacy, legal or ethical reasons).

Conflicts of Interest

Authors Chenggong Ma, Jiuyang Gao, Zhenggang Wang, Ming Liu, Jing Zou, Zhipeng Zhao, Jingchao Yan were employed by the company China Oil & Gas Pipeline Network Corporation Central China Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

DDFE	Data-Driven Feature Extraction
WKN	Wavelet Kernel Network
CBAM	Convolutional Block Attention Module
AE	Acoustic emission
CWT	Continuous wavelet transform
CNN	Convolutional neural network
CWConv	Continuous Wavelet Convolutional
CAM	Channel attention module
SAM	Spatial attention module
LSTM	Long Short-Term Memory
GRU	Gated Recurrent Unit
VGG19	Visual Geometry Group 19
ZFNet	Zeiler and Fergus Network

References

1. Kaliyannan, D.; Thangamuthu, M.; Pradeep, P.; Gnansekaran, S.; Rakkiyannan, J.; Pramanik, A. Tool Condition Monitoring in the Milling Process Using Deep Learning and Reinforcement Learning. J. Sens. Actuator Netw. 2024, 13, 42.
2. Mohanraj, T.; Kirubakaran, E.S.; Madheswaran, D.K.; Naren, M.L.; Ibrahim, M. Review of advances in tool condition monitoring techniques in the milling process. Meas. Sci. Technol. 2024, 35, 092002.
3. Natarajan, S.; Thangamuthu, M.; Gnanasekaran, S.; Rakkiyannan, J. Digital twin-driven tool condition monitoring for the milling process. Sensors 2023, 23, 5431.
4. Fu, G.Z.; Zhang, X.; Li, W.; Guo, J. Bayesian Fusion of Degradation and Failure Time Data for Reliability Assessment of Industrial Equipment Considering Individual Differences. Processes 2024, 12, 268.
5. Jin, C.; Chen, X. An end-to-end framework combining time–frequency expert knowledge and modified transformer networks for vibration signal classification. Expert Syst. Appl. 2021, 171, 114570.
6. Xia, P.; Huang, Y.; Tao, Z.; Liu, C.; Liu, J. A digital twin-enhanced semi-supervised framework for motor fault diagnosis based on phase-contrastive current dot pattern. Reliab. Eng. Syst. Saf. 2023, 235, 109256.
7. Li, Y.; Sun, L.; Geng, J.; Zhao, X. Semi-analytical investigation on hydrodynamic efficiency and loading of perforated breakwater-integrated OWCs. Ocean Eng. 2024, 309, 118460.
8. Zhou, Y.; Sun, B.; Sun, W. A tool condition monitoring method based on two-layer angle kernel extreme learning machine and binary differential evolution for milling. Measurement 2020, 166, 108186.
9. Wu, H.; Wei, J.; Wu, P.; Zhang, F.; Liu, Y. Dynamic response analysis of high-speed train gearboxes excited by wheel out-of-round: Experiment and simulation. Veh. Syst. Dyn. 2024, 1–27.
10. Li, F.; Wu, H.; Liu, L.; Ye, Y.; Wang, Y.; Wu, P. Nonlinear optimal frequency control for dynamic vibration absorber and its application. Mech. Syst. Signal Process. 2025, 223, 111932.
11. Yang, Y.; Xu, Q.; Chen, Y.; Yu, T.; Fu, G.; Huang, S. A fast nonlinear equivalent magnetic network model for magnetic jack type control rod drive mechanism in reactor. Prog. Nucl. Energ. 2024, 169, 105058.
12. Hu, J.; Yu, Y.; Yang, J.; Jia, H. Research on the generalisation method of diesel engine exhaust valve leakage fault diagnosis based on acoustic emission. Measurement 2023, 210, 112560.
13. Unterberg, M.; Voigts, H.; Weiser, I.F.; Feuerhack, A.; Trauth, D.; Bergs, D. Wear monitoring in fine blanking processes using feature based analysis of acoustic emission signals. Procedia CIRP 2021, 104, 164–169.
14. Twardowski, P.; Tabaszewski, M.; Wiciak–Pikuła, M.; Felusiak-Czyryca, A. Identification of tool wear using acoustic emission signal and machine learning methods. Precis. Eng. 2021, 72, 738–744.
15. Cui, J.; Zhang, M.; Qu, X.; Zhang, J.; Chen, L. An Improved Identification Method of Pipeline Leak Using Acoustic Emission Signal. J. Mar. Sci. Eng. 2024, 12, 625.
16. Jiang, P.; Sun, W.; Li, W.; Wang, H.; Liu, C. Extreme-low-speed heavy load bearing fault diagnosis by using improved RepVGG and acoustic emission signals. Sensors 2023, 23, 3541.
17. Guo, J.; Yang, Y.; Li, H.; Li, H.; Wang, J.; Tang, A.; Shan, D.; Huang, B. A hybrid deep learning model towards fault diagnosis of drilling pump. Appl. Energy 2024, 372, 123773.
18. Shao, Z.; Yin, Y.; Lyu, H.; Soares, G.C. A robust method for multi object tracking in autonomous ship navigation systems. Ocean Eng. 2024, 311, 118560.
19. Wang, X.; Liu, X.; Yang, H.; Wang, Z.; Wen, X.; He, X.; Qing, L.; Chen, H. Degradation Modeling for Restoration-enhanced Object Detection in Adverse Weather Scenes. IEEE Trans. Intell. Veh. 2024, 1–17.
20. Hou, D.; Qi, H.; Wang, C.; Han, D. High-speed train wheel set bearing fault diagnosis and prognostics: Fingerprint feature recognition method based on acoustic emission. Mech. Syst. Signal Process. 2022, 171, 108947.
21. Wang, X.; Mao, D.; Li, X. Bearing fault diagnosis based on vibro-acoustic data fusion and 1D-CNN network. Measurement 2021, 173, 108518.
22. Choudhary, A.; Mishra, R.K.; Fatima, S.; Panigrahi, B.K. Multi-input CNN based vibro-acoustic fusion for accurate fault diagnosis of induction motor. Eng. Appl. Artif. Intell. 2023, 120, 105872.
23. Liu, Y.; Huang, K.; Wang, Z.X.; Li, Z.; Chen, L.; Shi, Q.; Yu, S.; Li, Z.; Zhang, L.; Guo, L. Cross-scale data-based damage identification of CFRP laminates using acoustic emission and deep learning. Eng. Fract. Mech. 2023, 294, 109724.
24. Li, D.; Nie, J.H.; Wang, H.; Ren, W.X. Loading condition monitoring of high-strength bolt connections based on physics-guided deep learning of acoustic emission data. Mech. Syst. Signal Process. 2024, 206, 110908.
25. Wang, Y.; Vinogradov, A. Improving the performance of convolutional GAN using history-state ensemble for unsupervised early fault detection with acoustic emission signals. Appl. Sci. 2023, 13, 3136.
26. Nashed, M.S.; Renno, J.; Mohamed, M.S.; Reuben, R.L. Gas turbine failure classification using acoustic emissions with wavelet analysis and deep learning. Expert Syst. Appl. 2023, 232, 120684.
27. Huang, J.; Zhang, Z.; Qin, R.; Yu, Y.; Wen, G.; Cheng, W.; Chen, X. Lightweight neural network architecture for pipeline weld crack leakage monitoring using acoustic emission. IEEE Trans. Instrum. Meas. 2023, 72, 1–10.
28. Wang, Y.; Hestmo, R.H.; Vinogradov, A. Early sub-surface fault detection in rolling element bearing using acoustic emission signal based on a hybrid parameter of energy entropy and deep autoencoder. Meas. Sci. Technol. 2023, 34, 064008.
29. Guo, J.; Yang, Y.; Li, H.; Dai, L.; Huang, B. A parallel deep neural network for intelligent fault diagnosis of drilling pumps. Eng. Appl. Artif. Intell. 2024, 133, 108071.
30. Shao, Z.; Yin, Y.; Lyu, H.; Soares, G.C.; Cheng, T.; Jing, Q.; Yang, Z. An efficient model for small object detection in the maritime environment. Appl. Ocean Res. 2024, 152, 104194.
31. Guo, J.; Wang, Z.; Li, H.; Yang, Y.; Huang, G.C.; Yazdi, M.; Kang, H.S. A hybrid prognosis scheme for rolling bearings based on a novel health indicator and nonlinear Wiener process. Reliab. Eng. Syst. Saf. 2024, 245, 110014.
32. Wang, X.; Chen, H.; Gou, H.; He, J.; Wang, Z.; He, X.; Qing, L.; Sheriff, R.E. RestorNet: An efficient network for multiple degradation image restoration. Knowl.-Based Syst. 2023, 282, 111116.
33. Zhao, Y.; Teng, Q.; Chen, H.; Zhang, S.; He, X.; Li, Y.; Sheriff, R.E. Activating more information in arbitrary-scale image super-resolution. IEEE Trans. Multimed. 2024, 26, 7946–7961.
34. Wang, X.; Wang, H.; Zhang, M.; Zhang, F. Combining optical flow and Swin Transformer for Space-Time video super-resolution. Eng. Appl. Artif. Intell. 2024, 137, 109227.
35. Mo, X.; Wang, T.; Zhang, Y.; Hu, X. A cumulative descriptor enhanced ensemble deep neural networks method for remaining useful life prediction of cutting tools. Adv. Eng. Inf. 2023, 57, 102094.
36. Wang, L.; Mao, Z.; Xuan, H.; Ma, T.; Hu, C.; Chen, J.; You, X. Status diagnosis and feature tracing of the natural gas pipeline weld based on improved random forest model. Int. J. Press. Vessel. Pip. 2022, 200, 104821.
37. Du, J.; Zeng, J.; Wang, H.; Ding, H.; Wang, H.; Bi, Y. Using acoustic emission technique for structural health monitoring of laminate composite: A novel CNN-LSTM framework. Eng. Fract. Mech. 2024, 309, 110447.
38. Wang, H.; Hu, D.; Yang, C.; Wang, B.; Duan, B.; Wang, Y. Model construction and multi-objective performance optimization of a biodiesel-diesel dual-fuel engine based on CNN-GRU. Energy 2024, 301, 131586.
39. Ferdousi, J.; Lincoln, S.I.; Alom, M.K.; Foysal, M. A Deep Learning Approach for White Blood Cells Image Generation and Classification using SRGAN and VGG19. Telemat. Inform. Rep. 2024, 16, 100163.
40. Fu, L.; Feng, Y.; Majeed, Y.; Zhang, X.; Zhang, J.; Karkee, M.; Zhang, Q. Kiwifruit detection in field images using Faster R-CNN with ZFNet. IFAC-PapersOnLine 2018, 51, 45–50.
Figure 1. Transition in structure from CNN to WKN.
Figure 2. The structure of the CBAM model.
Figure 3. Position encoding and encoders using Transformer.
Figure 4. Structure of the self-attention mechanism.
Figure 5. Multiple attention mechanisms.
Figure 6. The flowchart of the proposed fault diagnosis method.
Figure 7. Schematic of the experimental setup and tool wear.
Figure 8. Visualization of acoustic emission signals.
Figure 9. Data augmentation.
Figure 10. Model performance on training and validation sets.
Figure 11. ROC curve of DDFE-Transformer.
Figure 12. Metrics in the ablation experiment: (a) Accuracy, (b) Precision, (c) Recall, (d) F1 score.
Figure 13. Confusion matrices for the models in the ablation experiment.
Figure 14. t-SNE plots for each model in the ablation experiment.
Figure 15. Metrics in the comparison experiment: (a) Accuracy, (b) Precision, (c) Recall, (d) F1 score.
Figure 16. Accuracy of different models.
Figure 17. Confusion matrices for the models in the comparison experiment.
Figure 18. t-SNE plots for each model in the comparison experiment.
Table 1. Structure field names and descriptions.

Case | Depth of Cut | Feed | Material
1 | 1.5 | 0.5 | 1—cast iron
2 | 0.75 | 0.5 | 1—cast iron
3 | 0.75 | 0.25 | 1—cast iron
4 | 1.5 | 0.25 | 1—cast iron
5 | 1.5 | 0.5 | 2—steel
6 | 1.5 | 0.25 | 2—steel
7 | 0.75 | 0.25 | 2—steel
8 | 0.75 | 0.5 | 2—steel
9 | 1.5 | 0.5 | 1—cast iron
10 | 1.5 | 0.25 | 1—cast iron
11 | 0.75 | 0.25 | 1—cast iron
12 | 0.75 | 0.5 | 1—cast iron
13 | 0.75 | 0.25 | 2—steel
14 | 0.75 | 0.5 | 2—steel
15 | 1.5 | 0.25 | 2—steel
16 | 1.5 | 0.5 | 2—steel
Table 2. Data segmentation.

Number | Label | Train Set | Valid Set | Test Set
1 | 0.00 | 33 | 11 | 11
2 | 0.11 | 33 | 11 | 11
3 | 0.20 | 33 | 11 | 11
4 | 0.24 | 33 | 11 | 11
5 | 0.28 | 33 | 11 | 11
6 | 0.29 | 33 | 11 | 11
7 | 0.38 | 33 | 11 | 11
8 | 0.40 | 33 | 11 | 11
9 | 0.43 | 33 | 11 | 11
10 | 0.44 | 33 | 11 | 11
11 | 0.45 | 33 | 11 | 11
12 | 0.50 | 33 | 11 | 11
Table 3. Training configuration parameters of DDFE-Transformer.

Parameter | Optimizing Space | Value
Batch size | [8, 16, 32, 64, 128, 256] | 32
Learning rate | [0.00001–0.01] | 0.0001094
Epoch | [15–50] | 35
Weight decay | [0–0.4] | 0.0318
Table 4. Architecture parameters of DDFE-Transformer.

Layer | Hyperparameters
CWConv layer | filters = 32, kernel_size = 16, strides = 4, activation = ReLU
CBAM layer 1 | in_channel = 32
Convolutional layer 1 | filters = 32, kernel_size = 9, strides = 4, activation = ReLU
CBAM layer 2 | in_channel = 32
Convolutional layer 3 | filters = 64, kernel_size = 7, strides = 4, activation = ReLU
Transformer encoder | d_model = 30, nhead = 3, num_layers = 2
Dense 1 | 1024, activation = ReLU
Dense 2 | 16, activation = Softmax
Table 5. Metrics in the ablation experiment.

Model | ACC (%) | PRE (%) | REC (%) | F1_S (%)
The Proposed Method | 99.84 | 99.93 | 99.90 | 99.90
WKN-Transformer | 98.90 | 98.79 | 98.61 | 98.46
DDFE | 77.92 | 83.98 | 77.24 | 73.16
Table 6. Metrics in the comparison experiment.

Model | ACC (%) | PRE (%) | REC (%) | F1_S (%)
The Proposed Method | 99.84 | 99.93 | 99.90 | 99.90
CNN | 62.05 | 70.97 | 57.96 | 54.61
CNN-LSTM | 69.31 | 78.89 | 64.50 | 61.28
CNN-GRU | 76.39 | 86.43 | 73.05 | 67.99
VGG19 | 88.61 | 89.08 | 85.11 | 84.23
ZFNet | 78.05 | 81.52 | 71.97 | 70.14
