1. Introduction
In modern manufacturing, ensuring the reliability and precision of milling machines is vital due to their wide use in various industries [
1]. Milling machines play a key role in transforming raw metal workpieces into finished products by performing complex operations [
2,
3]. These operations, characterized by high speeds and heavy loads, make milling machines susceptible to unexpected faults, particularly in main components such as cutting tools, gears, and bearings. Among the various faults encountered, mechanical failure accounts for 57% of all issues, with bearing failures representing 42% of these cases. Furthermore, faults in gears and bearings can disrupt the functionality of the spindle system, leading to downtime and financial losses. For instance, cutting tool failures contribute to approximately 20% of unexpected downtime, significantly impacting operational efficiency [
4]. Mechanical component failures in milling machines not only increase maintenance costs but also lead to longer downtime, injuries, fatalities, and reduced cutting speeds [
5]. As such, addressing these faults promptly is essential for maintaining operational efficiency and minimizing financial impacts. This study focuses on developing an intelligent fault diagnosis technique to detect and identify mechanical component failures in milling machines, specifically targeting gears, bearings, and cutting tools.
Over the past decade, intelligent condition monitoring techniques have transformed manufacturing by enhancing efficiency, sustainability, and profitability [
6]. Condition monitoring involves the real-time assessment of machine health using sensors and artificial intelligence (AI). The literature identifies two major technologies for machining condition monitoring: direct and indirect methods. Direct methods analyze variations in tool geometry, such as flank wear and surface quality, using techniques like machine vision [
7]. Despite their accuracy, direct methods are less favored due to their long downtime, susceptibility to cutting fluids, limited access to machine parts, and dependence on lighting conditions. In contrast, indirect methods utilize sensors to capture physical signals, such as vibrations [
8] and acoustic emissions (AEs) [
9], which are then processed using advanced signal processing and AI algorithms. AE-based monitoring has several advantages over conventional techniques, as it operates at higher frequencies, reducing interference from machine vibrations and environmental noise [
10]. The frequency range of the AE signal is above that of vibrations from machine tools and any noise interference from the surroundings of the milling machine.
Acoustic emissions are generated by the shear stress in the material that results from the interaction between the tool and the workpiece during machining. These emissions can characterize changes in material removal, chip formation, and tool wear. Furthermore, because AE is non-directional, AE signal analysis also allows critical rotating machinery within the milling machine to be monitored. AE sensors are easy to install and are less affected by tool geometry and cutting conditions. These qualities make AE an appealing option for industrial applications compared with other direct and indirect non-destructive testing techniques. Therefore, this study uses AE-based non-destructive testing to monitor essential components, such as bearings, gears, and cutting tools, within milling machines.
Related Literature
AE refers to the release of elastic energy from materials undergoing deformation or fracture, detected by AE sensors as acoustic emission hits (AEHs). The presence of faults in a milling machine changes the distribution of the AE signals. Researchers are concentrating on extracting fault-related information from AE signals across the temporal domain (TD), spectral frequency domain (SD), and multiresolution time-frequency domain (TFD) [
11]. Once fault indicators have been extracted, AI approaches are used to identify faults within milling machines. Twardowski et al. [
12] developed a framework for tool condition monitoring utilizing a temporal domain indicator, i.e., RMS, extracted from a fault-specific frequency band, and classified using a decision tree. Medina et al. [
13] employed AE Poincaré plots with a random forest classifier to detect broken teeth, pitting, scuffing, and crack defects in gears. Li et al. [
14] combined long short-term memory (LSTM) and support vector data description (SVDD) to detect wear states from one-dimensional TD signals. Faults in the mechanical components of milling machines alter the frequency spectrum of the AE signal, which makes spectral domain analysis a viable approach for identifying health conditions. Bai X. [
15] introduced a lightweight deep-learning model for diagnosing faults in CNC electromechanical systems, which outperformed conventional neural network classifiers in accuracy when trained on TD and SD statistical indicators such as mean, mean frequency, variance, RMS, and RMS frequency. However, TD and SD indicators have limited sensitivity because the AE signals from milling machines are non-stationary [
16]. To overcome these limitations, it is necessary to use modern TFD techniques to extract significant fault-related information. These techniques include wavelet transform (WT) [
17], variational mode decomposition (VMD) [
18], and empirical mode decomposition (EMD) [
19]. Hussain et al. [
20] diagnosed faults in a machining center using statistical indicators (mean, max, min, standard deviation, kurtosis, skewness, and Shannon entropy) computed from signals decomposed by an optimal WT. Among the fifty mother wavelets considered, their study identified coif8 and db14 at decomposition level 3 as the most effective for assessing the health condition of the machining center. Ding et al. [
21] proposed a lightweight model for identifying faults in the tools and spindles using discrete WT and lightweight CNN. Wang et al. [
22] introduced a tool fault diagnosis framework using marine predator-optimized random forest trained on features extracted from machining center signals decomposed by EMD.
Despite the high performance demonstrated for the diagnosis of faults in essential milling machine components such as bearings, gears, and tools, the literature has several limitations and challenges. The sensitivity of TD statistical indicators is compromised by interference noise in AE signals, making them less effective for diagnosing the health of machining centers. While SD analysis is suitable for stationary signals, the complexity and non-stationary nature of AE signals from milling machines limit the effectiveness of SD indicators for fault detection and identification. TFD techniques, though powerful, are computationally intensive when processing AE signals [
23]; moreover, EMD, though self-adaptive, suffers from issues such as extrema interpolation errors and mode mixing.
Faults in the mechanical components of milling machines result in changes in stiffness, which release elastic energy that is detected by AE sensors as acoustic emission hits (AEHs) [
19,
20]. AEH features, such as hit peak, hit counts, rise time, decay time, and hit average frequency, are independent of the overall AE signal distribution. These features provide direct and valuable information about AE events caused by defects. They outperform traditional TD, SD, and TFD indicators in various applications, including railways [
24], structural damage monitoring [
25], pipelines [
26], and rotating machinery [
27]. In the milling process, AEHs arise from multiple sources, such as machine vibrations, tool interactions, and environmental noise, making it challenging to isolate AEH features for accurate fault identification. To overcome these challenges, this study adopts a different approach. First, the AE signals are transformed into continuous wavelet transform (CWT) scalograms, which provide a rich time-frequency representation of the AE signals and capture both the transient and non-stationary characteristics of the faults. Noise in the AE signals can nevertheless introduce uncertainty and reduce the reliability of the fault diagnosis. To mitigate this, Gaussian filtering is applied to the scalograms, which smooths the images and reduces noise while retaining essential fault-related information [
28].
In recent years, deep learning techniques have outperformed traditional machine learning approaches due to their ability to independently learn discriminative features from raw data, without the need for manual feature engineering [
29]. Among the most widely used deep learning models for defect identification are convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and deep belief networks (DBNs). CNNs are highly effective in extracting spatial features from data by using local receptive fields and shared weights. These properties reduce computational complexity and minimize the risk of overfitting. CNNs have already shown good results in diagnosing faults in mechanical components such as bearings, gears, and cutting tools [
30]. However, while CNNs are effective in spatial feature extraction, they are less capable of capturing the temporal dependencies inherent in sequential data, such as AE signals. To address this limitation, the proposed hybrid model combines a CNN with an LSTM network. The hybrid approach allows the model to utilize the strengths of both CNN for spatial feature extraction and LSTM for capturing long-range dependencies in time-series data, making it particularly well-suited for analyzing AE signals collected during milling operations. In the proposed approach, the enhanced CWT scalograms are fed into a hybrid deep learning model, which consists of a CNN, a BiLSTM, a GA, and a fully connected layer. The CNN extracts high-level patterns from the denoised scalograms, while the BiLSTM captures the temporal dependencies in AE signals. Additionally, a GA is added to optimize feature selection, reduce redundancy, and improve classification accuracy. The fully connected layer combines the optimized features from the hybrid model and performs the final classification of the fault classes in the milling machines. This approach is validated on AE data obtained from a milling machine, and the experimental results show that the proposed method achieves superior fault detection and classification performance compared to traditional methods. The main contributions of this paper can be summarized as follows.
A Gaussian filter is applied to enhance the energy color intensity variations across multiple scales and frequencies in AE scalograms.
A hybrid deep learning framework that combines enhanced AE scalograms with CNN-BiLSTM models and GA for feature optimization is proposed to enhance fault detection in milling machines.
The proposed approach is validated using AE data collected from a controlled laboratory milling machine testbed, demonstrating its effectiveness in detecting various fault conditions.
The remainder of this paper is organized as follows. Section 2 describes the methodology of the proposed approach, Section 3 presents the findings, and Section 4 offers the study’s conclusions.
2. Proposed Method for Fault Diagnosis in Milling Machines
The proposed method involves converting AE signals into CWT scalograms, applying Gaussian filtering, using a CNN architecture with BiLSTM, enhancing it with GA, and following this with a fully connected layer for multiclass fault classification. The complete workflow of the proposed hybrid model is shown in Figure 1. The proposed method can be explained as follows.
Step 1: This method starts with the preprocessing of AE signals from milling machines. AE signals are converted into CWT scalograms. This transformation allows us to capture both time and frequency information from the AE signals, which is critical for detecting transient patterns unique to each fault type. To improve the quality of the CWT images, a Gaussian filter is applied. This filtering process smooths the images, reduces the noise, and helps in improving the robustness of the feature extraction process.
Step 2: After denoising, the filtered scalograms are passed through a CNN to extract high-level features. The VGG16 model, pre-trained on the ImageNet dataset, is used for feature extraction. This model comprises several convolutional and pooling layers designed to capture spatial features from the CWT scalograms. The convolutional base of VGG16 is frozen so that the pre-trained weights are reused and the entire model need not be retrained, which saves computational resources and time. Custom layers, including a flattened layer, are added on top of VGG16 to prepare the features for further processing. A BiLSTM network is then placed after the CNN to capture temporal dependencies in the CNN-extracted features. The BiLSTM processes the sequential data in both the forward and backward directions, capturing long-range dependencies and enhancing the feature extraction capability. This integration addresses the loss of information along the time axis, which is important for accurately detecting faults in dynamic systems such as milling machines. To further refine the features, a GA is applied to the BiLSTM output. The GA optimizes feature selection, reducing redundancy and focusing on the most informative features. For example, features associated with certain time-frequency patterns, such as transient bursts in tool faults or repeated impact signatures in bearing faults, are prioritized, allowing the model to identify the key characteristics that distinguish each fault type. The process involves initializing a population of candidate feature subsets, evaluating their performance using a fitness function, and applying selection, crossover, and mutation operations to evolve the population over several generations. This optimization improves the model’s accuracy and efficiency by ensuring that only the most useful features are fed into the final layers.
Step 3: Finally, the selected discriminative features are fed into a fully connected layer for fault classification. The proposed model is trained on the selected features using the training dataset. The model is compiled with the Adam optimizer and the categorical cross-entropy loss function. Training is conducted for several epochs, and performance is monitored using validation accuracy. This layer assigns one of four classes (Tool Fault, Bearing Fault, Gear Fault, or Normal) based on the learned patterns, and the combination of spatial features, temporal features, and optimized feature selection yields high validation accuracy.
2.1. Continuous Wavelet Transform
CWT is a powerful signal-processing technique used to analyze non-stationary signals, such as AE signals, in the time-frequency domain. Unlike the Fourier transform, which only provides frequency information, CWT preserves both temporal and frequency information [31], making it ideal for capturing transient and localized features in signals. Mathematically, the CWT of a signal $x(t)$ is defined as shown in Equation (1):
$$W_x(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t \tag{1}$$
where $a$ and $b$ are the scale and translation parameters, respectively, $\psi(t)$ is the mother wavelet, and $\psi^{*}$ denotes its complex conjugate. The scale parameter $a$ controls the dilation of the wavelet and therefore the frequency band being analyzed, while the translation parameter $b$ controls the position of the wavelet in time. The relationship between the scale $a$ and frequency $f$ can be expressed as $f = f_c/a$, where $f_c$ is the central frequency of the wavelet. By applying the CWT to AE signals, we generated scalograms that visualize the energy distribution of the signals across various scales and frequencies.
Figure 2 illustrates these scalograms under normal and distinct faulty conditions. In the scalogram of the normal condition, the energy distribution appears relatively uniform and stable, with lower intensities across frequencies, indicating consistent and steady operation without disruptions. In contrast, the scalograms of the fault conditions reveal high-energy regions concentrated at specific frequencies and time intervals, reflecting transient bursts and irregularities that correspond to the faulty states. This clear contrast in energy patterns between normal and defective conditions highlights the ability of CWT scalograms to capture localized and transient features that are indicative of faults. The differences in energy distribution across the scalograms enable us to identify distinct fault patterns, which are then used for feature extraction and classification in the subsequent stages of the fault diagnosis process.
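As an illustration of Equation (1) and the relation f = fc/a, the following dependency-free Python sketch evaluates the CWT by direct summation. It is not the implementation used in this study: the complex Morlet wavelet, its centre frequency `W0 = 6`, the 1 kHz sampling rate, and the 50 Hz test tone are all assumptions chosen only to keep the example small.

```python
import cmath
import math

W0 = 6.0  # assumed Morlet angular centre frequency (a common default, not from the text)


def morlet(u):
    """Complex Morlet mother wavelet evaluated at dimensionless time u."""
    return math.pi ** -0.25 * cmath.exp(1j * W0 * u) * math.exp(-0.5 * u * u)


def scale_to_frequency(a):
    """f = fc / a, where fc = W0 / (2*pi) is the wavelet's centre frequency."""
    return W0 / (2.0 * math.pi) / a


def cwt(signal, scales, dt):
    """Riemann-sum version of Equation (1); one row of coefficients per scale."""
    n = len(signal)
    rows = []
    for a in scales:
        half = int(5.0 * a / dt)      # the wavelet is negligible beyond |t - b| = 5a
        row = []
        for k in range(n):            # translation b = k * dt
            acc = 0j
            for m in range(max(0, k - half), min(n, k + half + 1)):
                acc += signal[m] * morlet((m - k) * dt / a).conjugate()
            row.append(acc * dt / math.sqrt(a))
        rows.append(row)
    return rows


fs = 1000.0                           # hypothetical sampling rate (Hz)
dt = 1.0 / fs
signal = [math.sin(2.0 * math.pi * 50.0 * m * dt) for m in range(300)]
scales = [W0 / (2.0 * math.pi) / f for f in (25.0, 50.0, 100.0)]
coeffs = cwt(signal, scales, dt)

# Scalogram energy |W|^2, averaged over time for each scale
energy = [sum(abs(c) ** 2 for c in row) / len(row) for row in coeffs]
for a, e in zip(scales, energy):
    print(f"{scale_to_frequency(a):6.1f} Hz  mean energy {e:.3e}")
```

The scale that maps to 50 Hz carries most of the energy, which is the kind of localized high-energy band the scalograms are meant to reveal. For real AE records sampled at 1 MHz, a vectorized or FFT-based routine would be needed; the triple loop here is for readability only.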
2.2. Gaussian Filter
The Gaussian filter is a linear filter used in image processing to smooth images by reducing noise while keeping key structural features. It operates by applying a Gaussian function, defined as shown in Equation (2):
$$G(x,y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right) \tag{2}$$
where $x$ and $y$ are pixel coordinates, and $\sigma$ is the standard deviation controlling the extent of smoothing. In this work, the Gaussian filter is applied to CWT scalograms generated from AE signals obtained from milling machines. These scalograms provide a detailed time-frequency representation of the AE signals, which is necessary for detecting and classifying faults. The application of the Gaussian filter smooths the scalograms, reducing noise and random pixel fluctuation while maintaining the essential energy patterns associated with different fault conditions [32]. Because the filter attenuates high-frequency pixel noise without distorting the primary structure of the CWT images, the subsequent CNN processing focuses on meaningful data, leading to more accurate fault detection and classification. The CWT scalograms after Gaussian filtering are shown in Figure 3.
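A minimal sketch of Equation (2) in use is given below. The kernel size (5 × 5), σ = 1, the edge-replication border handling, and the toy 7 × 7 "scalogram" are assumptions for illustration; the text does not state the values used in the study.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample Equation (2) on an odd size x size grid and normalise to sum 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))          # normalising makes the 1/(2*pi*sigma^2) factor cancel
    return [[v / total for v in row] for row in k]


def smooth(image, kernel):
    """Convolve a 2-D list with the kernel, replicating edge pixels at the border."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, kv in enumerate(krow):
                    rr = min(max(r + i - half, 0), h - 1)
                    cc = min(max(c + j - half, 0), w - 1)
                    acc += kv * image[rr][cc]
            out[r][c] = acc
    return out


# Toy scalogram: a single noisy pixel on a flat background
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
kernel = gaussian_kernel(5, 1.0)
smoothed = smooth(image, kernel)
print(f"peak before: {image[3][3]:.3f}, after: {smoothed[3][3]:.3f}")
```

The isolated spike is spread over its neighbourhood while the total intensity is preserved, which is how the filter suppresses pixel-level noise without shifting the broader energy patterns. A larger σ widens the kernel and smooths more aggressively.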
2.3. Convolutional Neural Network
A CNN is a specialized class of deep learning model particularly adept at processing data with a grid-like structure, such as images. In the context of this research, a CNN is utilized to analyze CWT scalograms generated from AE signals. These scalograms provide a detailed time-frequency representation of the signals, which is important for identifying faults in milling machines. A traditional CNN architecture is mainly composed of several key components: convolutional layers, pooling layers, and flattened layers [
33].
The convolutional layer is the fundamental building block of a CNN, responsible for automatically learning spatial hierarchies of features from the input images. This layer applies a set of filters across the input data, performing a convolution operation to produce feature maps. Each filter is designed to detect specific patterns, such as edges, textures, or shapes within the scalograms. The operation of a convolutional layer can be mathematically described by Equation (3) as follows:
$$x_j^{l} = \sum_{i} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \tag{3}$$
where $x_j^{l}$ represents the output feature map at layer $l$ for the $j$-th filter, $x_i^{l-1}$ denotes the $i$-th input feature map at the previous layer $l-1$, $k_{ij}^{l}$ is the convolutional kernel applied between the $i$-th input and the $j$-th output feature map, $b_j^{l}$ is the bias term added to the output, and $*$ denotes the convolution operation. This layer can detect important features at different levels of abstraction and is key to the success of CNNs in fault detection from CWT scalograms.
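The operation in Equation (3) can be sketched in a few lines of plain Python. The example assumes no padding, a stride of 1, and the cross-correlation form that most deep learning frameworks call convolution; the input map, kernel, and bias are hypothetical values chosen so that the result is easy to check by hand.

```python
def conv2d_valid(x, k):
    """Slide kernel k over feature map x without padding (stride 1)."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(len(x[0]) - kw + 1)]
            for r in range(len(x) - kh + 1)]


def conv_layer(inputs, kernels, biases):
    """Equation (3): output map j = sum over i of inputs[i] * kernels[i][j], plus biases[j]."""
    outputs = []
    for j, b in enumerate(biases):
        maps = [conv2d_valid(x, kernels[i][j]) for i, x in enumerate(inputs)]
        outputs.append([[sum(m[r][c] for m in maps) + b
                         for c in range(len(maps[0][0]))]
                        for r in range(len(maps[0]))])
    return outputs


# One 3 x 4 input map containing a vertical edge between columns 1 and 2
inputs = [[[0, 0, 1, 1]] * 3]
kernels = [[[[-1, 1]]]]      # kernels[i][j]: a 1 x 2 horizontal-difference filter
biases = [0.5]
feature_maps = conv_layer(inputs, kernels, biases)
print(feature_maps[0])       # each row responds most strongly at the edge position
```

The filter output peaks exactly where the intensity changes, which is the sense in which a learned kernel "detects" an edge or burst boundary in a scalogram.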
Pooling layers are typically inserted after convolutional layers to reduce the spatial dimensions of the feature maps while retaining the most important features. This down-sampling process helps to make the network more efficient by reducing the number of parameters, thus reducing the risk of overfitting and speeding up computation. The operation of a pooling layer, particularly max pooling, can be expressed as shown in Equation (4):
$$p_j^{l} = \max_{(m,n)\in R} x_j^{l-1}(m,n) \tag{4}$$
where $p_j^{l}$ is the pooled output corresponding to the $j$-th feature map at layer $l$, $x_j^{l-1}$ is the input feature map of the pooling layer, and $R$ is the pooling window over which the maximum is taken. The pooling layer helps in summarizing the feature maps, thereby preserving the essential patterns while reducing the spatial resolution, which is beneficial for further layers.
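A short worked example of max pooling follows. The 2 × 2 window with stride 2 is an assumption (it is the configuration used in VGG-style networks), and the 4 × 4 feature map is made up for illustration.

```python
def max_pool(x, size=2, stride=2):
    """Keep the largest value in each size x size window of feature map x."""
    return [[max(x[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(x[0]) - size + 1, stride)]
            for r in range(0, len(x) - size + 1, stride)]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4 × 4 map shrinks to 2 × 2, yet the strongest response in each region survives, which is why pooling reduces computation without discarding the dominant patterns.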
The flattened layer serves as the bridge between the convolutional/pooling layers and the fully connected layers that follow. It takes the multi-dimensional output of the convolutional and pooling layers and converts it into a one-dimensional vector. Mathematically, if the output of the final pooling layer is a feature map of size $h \times w \times d$ (height, width, depth), the flattened layer reshapes it into a vector of length $h \times w \times d$, as shown in Equation (5):
$$v = \mathrm{flatten}(Z), \qquad v \in \mathbb{R}^{h \cdot w \cdot d} \tag{5}$$
where $Z$ is the input feature map tensor of shape ($h \times w \times d$), and $v$ is the resulting one-dimensional vector, as can be seen in Figure 4.
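The reshaping in Equation (5) is illustrated below on a tiny 2 × 2 × 3 tensor. Row-major ordering (height, then width, then depth) is assumed here because it matches the default of common deep learning frameworks; the text does not specify an ordering.

```python
def flatten(z):
    """Reshape an h x w x d nested list into a vector of length h*w*d (row-major)."""
    return [value for row in z for cell in row for value in cell]


h, w, d = 2, 2, 3
# Fill the tensor so that each entry equals its own position in the flattened vector
z = [[[(r * w + c) * d + k for k in range(d)] for c in range(w)] for r in range(h)]
v = flatten(z)
print(len(v), v)
```

No values are changed or combined; only the indexing is collapsed, so the vector length always equals h × w × d.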
In this work, CNN extracts and processes features from CWT scalograms derived from AE signals. By using a convolutional layer for feature detection, pooling layers for dimensionality reduction, and flattened layers for preparing the data for the next round of processing, CNN enables efficient fault diagnosis in milling machines.
Figure 4 is the architecture diagram of the CNN for the proposed method. The specific CNN architecture used in this study is based on the VGG16 model, a well-established CNN known for its depth and performance in image recognition tasks. The VGG16 model’s convolutional base is pre-trained on the ImageNet dataset, and its weights are frozen to preserve the learned features. Custom layers, including a flattened layer, are added on top of the VGG16 base to adapt the model to specific fault detection.
2.4. Bidirectional Long Short-Term Memory
BiLSTM networks are an advanced type of recurrent neural network specifically designed to handle sequential data while holding long-term dependencies effectively. Traditional LSTM networks process input data in a single direction, either forward (from past to future) or backward (from future to past), which might limit their ability to fully capture the context of the sequence. In contrast, BiLSTM networks process data in both forward and backward directions [
34], allowing the model to understand the context from both the past and the future simultaneously.
In this paper, BiLSTM networks are integrated with CNN to enhance the temporal feature extraction from CWT scalograms of AE signals. The CNN extracts spatial features from the CWT images, while the BiLSTM captures temporal dependencies in these features, enabling the model to consider the sequence of events over time. The forward and backward hidden states generated by the BiLSTM are combined to produce a comprehensive representation of the sequence. This is mathematically represented as shown in Equation (6):
$$h_t = \left[\overrightarrow{h_t};\ \overleftarrow{h_t}\right] \tag{6}$$
where $\overrightarrow{h_t}$ is the hidden state from the forward LSTM pass, $\overleftarrow{h_t}$ is the hidden state from the backward LSTM pass, and $h_t$ is the concatenated output of the BiLSTM at the time step $t$, as can be seen in Figure 5. By capturing these bidirectional dependencies, the BiLSTM improves the model’s ability to accurately detect faults in the milling machine by considering both preceding and succeeding signal characteristics.
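The concatenation in Equation (6) can be made concrete with a deliberately small sketch: a single-unit LSTM with scalar inputs, run once forward and once over the reversed sequence. The weight values and the input sequence are hypothetical, and a practical BiLSTM would use vector-valued states and learned weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_pass(xs, w):
    """Run a single-unit LSTM over scalar inputs; return the hidden state at every step."""
    h, c, states = 0.0, 0.0, []
    for x in xs:
        i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
        f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
        o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
        g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
        c = f * c + i * g
        h = o * math.tanh(c)
        states.append(h)
    return states


def bilstm(xs, w_fwd, w_bwd):
    """Equation (6): pair the forward and backward hidden states at each time step."""
    forward = lstm_pass(xs, w_fwd)
    backward = lstm_pass(xs[::-1], w_bwd)[::-1]  # reverse again to re-align with t
    return list(zip(forward, backward))


# Hypothetical weights, chosen only so the example runs
W_FWD = dict(wi=0.5, ui=0.1, bi=0.0, wf=0.4, uf=0.2, bf=1.0,
             wo=0.3, uo=0.1, bo=0.0, wg=0.8, ug=0.3, bg=0.0)
W_BWD = dict(wi=-0.4, ui=0.2, bi=0.1, wf=0.3, uf=0.1, bf=1.0,
             wo=0.6, uo=-0.2, bo=0.0, wg=0.7, ug=0.2, bg=0.0)

xs = [0.1, 0.5, -0.3, 0.8]
for t, (h_fwd, h_bwd) in enumerate(bilstm(xs, W_FWD, W_BWD)):
    print(f"t={t}: forward={h_fwd:+.4f}  backward={h_bwd:+.4f}")
```

At the first time step the forward state has seen only the first input, while the backward state already summarizes the whole remaining sequence; pairing the two gives every step access to both past and future context.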
2.5. Genetic Algorithm
A GA is a type of evolutionary algorithm used to solve optimization problems by mimicking the process of natural selection. A graphical flow of the steps involved in the GA is provided in
Figure 6. In this study, a GA is utilized after the BiLSTM layer to optimize the selection of features that have been extracted from the AE signals. The process begins with the initialization of a population of feature subsets, each represented as a binary chromosome. The GA then iteratively evolves this population through the following steps. (I) Individuals with higher fitness (i.e., better performance in terms of classification accuracy) are selected to form a new generation. (II) Selected individuals are combined to create new offspring by exchanging genetic material, promoting diversity in the population. (III) Random changes are introduced to some individuals to explore new feature subsets and avoid local optima [
35].
Initialization: GA begins by creating a population of feature sets, each represented as a binary string e.g., [1, 0, 1, 0], where 1 includes a feature, and 0 excludes it.
Fitness Evaluation: Each feature set’s classification accuracy is tested on a simplified model. Higher accuracy indicates a better-performing subset.
Selection, Crossover, and Mutation: During selection, the best-performing feature sets are chosen as “parents.” In the crossover step, parents combine to create new “offspring” feature sets. Finally, during mutation, small random changes are introduced, allowing the exploration of new feature combinations.
To ensure optimal performance and computational efficiency, key GA parameters were carefully selected based on experimental testing and are described as follows. Population size was set to 50, balancing diversity and computational load and allowing for effective exploration of the feature space. Generations were set to 8, providing sufficient opportunity for convergence. Crossover probability was set at 0.5, fostering genetic diversity and broad search space exploration. Mutation probability was set at 0.2, allowing for the exploration of varied feature combinations and helping to prevent local optima. These optimized parameters, as given in
Table 1 (population size = 50, generations = 8, crossover probability = 0.5, mutation probability = 0.2), ensure both computational efficiency and classification accuracy, focusing on discriminative feature selection.
The fitness function used in this GA is designed to evaluate the classification performance of each feature subset, specifically targeting the ability to accurately distinguish among different fault conditions. The goal is to maximize this fitness function, thereby identifying the most relevant features for fault diagnosis in milling machines. Once the GA process converges, the optimized feature subset is fed into the fully connected layer of the proposed model. GA refines the feature set by selecting the most relevant features, enhancing classification accuracy. Discarding irrelevant features reduces training time and enhances accuracy, making the model well-suited for real-time fault diagnosis.
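The GA loop described above can be sketched with the Python standard library alone. The population size, number of generations, crossover probability, and mutation probability are the values reported in Table 1; everything else is an assumption made for illustration: the toy fitness function (a stand-in for the classification accuracy of a simplified model), the 20-feature chromosome, tournament selection of size 3, one-point crossover, single-bit mutation, and elitism.

```python
import random

POP_SIZE, GENERATIONS, CX_PROB, MUT_PROB = 50, 8, 0.5, 0.2  # values from Table 1
N_FEATURES = 20                       # hypothetical number of candidate features
INFORMATIVE = set(range(0, 20, 4))    # hypothetical "useful" features: 0, 4, 8, 12, 16


def fitness(mask):
    """Toy stand-in for classification accuracy: reward useful bits, penalise the rest."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    extras = sum(1 for i, bit in enumerate(mask) if bit and i not in INFORMATIVE)
    return hits - 0.25 * extras


def tournament(pop, rng, k=3):
    """Selection: the fittest of k randomly drawn chromosomes becomes a parent."""
    return max(rng.sample(pop, k), key=fitness)


def run_ga(seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(GENERATIONS):
        children = [best[:]]                      # elitism (an assumption): keep the best
        while len(children) < POP_SIZE:
            a, b = tournament(pop, rng)[:], tournament(pop, rng)[:]
            if rng.random() < CX_PROB:            # one-point crossover
                cut = rng.randrange(1, N_FEATURES)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            for child in (a, b):
                if rng.random() < MUT_PROB:       # flip one randomly chosen bit
                    child[rng.randrange(N_FEATURES)] ^= 1
                children.append(child)
        pop = children[:POP_SIZE]
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


best_mask, history = run_ga()
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("best fitness per generation:", history)
```

In the actual pipeline, `fitness` would train or evaluate a classifier on the features selected by the mask, which is far more expensive than this toy score and is the main reason for keeping the population and generation counts modest.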
2.6. Fully Connected Layer
The FC layer, also known as a dense layer, is an essential component of deep learning models where the final decision-making process takes place [36]. After the most optimized features have been selected by the GA, they are fed into one or more fully connected layers for classification. In the FC layer, every neuron is connected to all the neurons in the previous layer, as shown in Figure 7, allowing the network to learn complex relationships between features. The operation of the FC layer can be described by Equation (7) as follows:
$$y_i = \sum_{j} w_{ij}\,x_j + b_i \tag{7}$$
where $y_i$ is the output of the $i$-th neuron in the fully connected layer, $x_j$ represents the $j$-th input feature, $w_{ij}$ is the weight associated with the connection between the $j$-th input and the $i$-th neuron, and $b_i$ is the bias term.
The FC layer synthesizes the selected features to produce a final classification decision. In this approach, the output layer of the FC network uses a softmax activation function to generate probabilities for each of the four fault classes: Tool Fault, Gear Fault, Bearing Fault, and Normal. The category with the highest probability is selected as the model’s prediction, ensuring accurate fault classification in milling machines.
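The final classification step can be illustrated with a small numerical example. The three input features, the weights, and the biases below are hypothetical; only the four class names and the use of softmax come from the text.

```python
import math

CLASSES = ["Tool Fault", "Gear Fault", "Bearing Fault", "Normal"]


def dense(x, weights, biases):
    """Equation (7): y_i = sum_j w_ij * x_j + b_i for every output neuron i."""
    return [sum(w * xj for w, xj in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw outputs into probabilities that sum to 1."""
    m = max(logits)                        # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical GA-selected feature vector and layer parameters
x = [1.0, 0.0, 2.0]
weights = [
    [0.2, -0.1, 0.4],
    [0.5, 0.3, -0.2],
    [-0.3, 0.8, 1.0],
    [0.1, 0.1, 0.1],
]
biases = [0.0, 0.1, -0.5, 0.2]

probs = softmax(dense(x, weights, biases))
predicted = CLASSES[probs.index(max(probs))]
for name, p in zip(CLASSES, probs):
    print(f"{name:14s} {p:.3f}")
print("prediction:", predicted)
```

Because softmax is monotonic, the predicted class is simply the neuron with the largest output from Equation (7); the probabilities are useful mainly for judging how confident that prediction is.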
2.7. Establishment of a Hybrid Model
In this study, a hybrid deep learning model for fault diagnosis and classification in milling machines, combining the strengths of a CNN, BiLSTM networks, and a GA, is proposed. The architecture of the hybrid model is shown in
Figure 8, which provides a visual overview of the proposed model’s key components and data flow.
The process begins with an input image of size 224 × 224 × 3, representing scalograms of the AE signals. These images are fed into a pre-trained VGG16 CNN model, which has been modified by removing its top layer to serve as a feature extractor. The VGG16 model processes the input through multiple convolutional and pooling layers (Conv1 to Conv5), progressively obtaining complex spatial features of the input data. The final output of the CNN block is a 7 × 7 × 1024 feature map, which is then flattened into a one-dimensional vector.
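The 7 × 7 spatial size quoted above follows from simple arithmetic, sketched below under the assumption of the standard VGG16 layout: same-padded 3 × 3 convolutions that leave height and width unchanged, and five 2 × 2 max-pooling stages with stride 2. The depth is passed in as a parameter and set to the value stated in the text.

```python
def spatial_size_after_pools(input_size=224, n_pools=5):
    """Each 2 x 2, stride-2 max-pool halves the height and width."""
    size = input_size
    for _ in range(n_pools):
        size //= 2
    return size


def flattened_length(side, depth):
    """Length of the vector produced by flattening a side x side x depth map."""
    return side * side * depth


side = spatial_size_after_pools()          # 224 -> 112 -> 56 -> 28 -> 14 -> 7
length = flattened_length(side, 1024)      # depth as stated in the text
print(side, length)                        # 7 50176
```

The flattened vector length scales linearly with the channel depth, so the dimensionality of the features passed to the BiLSTM is fixed entirely by the input size and the chosen backbone.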
The flattened feature vector is further passed through the BiLSTM network. The BiLSTM network is designed to capture temporal dependencies in the extracted features by processing data in both the forward and backward direction. This bidirectional approach enables the model to better understand the temporal dynamics of the AE signals, which is important for accurate fault detection. Following the BiLSTM layer, the feature vectors are fed into a GA block. The GA is used to perform an optimal selection of the most relevant features, thereby enhancing the model’s performance while reducing computational complexity. By iteratively selecting, crossing over, and mutating the features, the GA refines the feature set to improve the overall classification accuracy.
The optimized features selected by the GA are then passed into a fully connected layer, where the final classification is performed. This layer consists of densely connected neurons that aggregate the features and map them to the output classes, which correspond to the different conditions of the milling machine: Gear Fault, Tool Fault, Bearing Fault, and Normal. The complete proposed hybrid model is summarized in Table 2. The output is generated through a softmax activation function, providing a probabilistic prediction for each class. Integrating a CNN, a BiLSTM, and a GA into a single hybrid model leverages the advantages of each component, resulting in a powerful and efficient system for fault detection in milling machines. This architecture handles the complexity of AE signals and diagnoses and classifies the different fault types with high accuracy. The proposed model’s scalability and adaptability make it suitable for real-time industrial applications, where reliable fault detection is important for maintaining operational efficiency.
2.8. Feature Importance Analysis
To enhance interpretability, we assessed feature importance by analyzing the contributions of the selected features to the final classification decision. Figure 9a shows the distribution of features, and Figure 9b shows the features associated with specific frequencies and temporal patterns that the GA prioritized during selection, indicating their high relevance in distinguishing between fault conditions. This focus on key features supports a rule-based interpretation, allowing the model to use these discriminative characteristics to diagnose faults accurately. This insight provides a basis for understanding which signal features are most indicative of each fault type, enhancing the interpretability of the model’s decisions.
3. Results and Performance Evaluation
The effectiveness of the proposed method is assessed using AE data obtained from an actual milling machine. Since the primary aim of this method is to detect and diagnose faults in milling machines, this section begins by comparing the fault detection capability of the proposed method with existing time-domain indicators. Subsequently, the method’s performance in fault classification is evaluated against other current approaches.
3.1. Experimental Setup and Data Acquisition
AE signals were recorded from a real milling machine setup; the experimental setup is shown in
Figure 10. The milling operations were conducted on an INTER-SIEG X1 Micro Mill Drill built from cast iron, which functions similarly to a small-scale pillar drill. The experiment focused on straight parallel milling operations performed on steel workpieces, a process typically used for shaping and machining hard materials. The cutting tool was a two-flute carbide end mill, worn to an average flank wear of 0.3 mm, in accordance with ISO-8688-2 standards [
37] for tool lifespan. The machining was conducted under the following conditions: motor speed of 1320 RPM (22 Hz), spindle speed of 660 RPM (11 Hz), and bed feed rate of 0.4 mm/s. These conditions ensure stable, consistent AE signal generation, capturing distinct fault signatures. Five steel pieces, each measuring 20 mm × 35 mm × 35 mm, were used during the experiment. The initial state of these workpieces is shown in
Figure 11a, with a processed example displayed in
Figure 11b.
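The stated machining conditions can be cross-checked with a few lines of arithmetic. The sketch below converts the reported speeds to rotational frequencies and derives some quantities that are not stated in the text (tooth-passing frequency, feed per revolution, feed per tooth); these derived values follow from standard definitions and are shown only for illustration.

```python
def rpm_to_hz(rpm):
    """Revolutions per minute to revolutions per second."""
    return rpm / 60.0

# Values reported in the text.
motor_hz = rpm_to_hz(1320)    # motor rotational frequency
spindle_hz = rpm_to_hz(660)   # spindle rotational frequency
feed_mm_per_s = 0.4           # bed feed rate
flutes = 2                    # two-flute end mill

# Derived quantities (not stated in the text).
reduction_ratio = motor_hz / spindle_hz          # motor-to-spindle speed ratio
tooth_passing_hz = flutes * spindle_hz           # cutting-edge engagements per second
feed_per_rev_mm = feed_mm_per_s / spindle_hz     # table travel per spindle revolution
feed_per_tooth_mm = feed_per_rev_mm / flutes     # table travel per cutting edge

print(motor_hz, spindle_hz, reduction_ratio, tooth_passing_hz)
print(round(feed_per_rev_mm, 4), round(feed_per_tooth_mm, 4))
```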
The two channels (channels 1 and 2) were set to identical acquisition conditions, including the same bandpass filter range and threshold settings, to ensure consistency across both channels. A bandpass filter was applied to each channel to remove low-frequency noise and high-frequency interference, which improves signal quality for subsequent analysis. Data from both channels were collected simultaneously to capture synchronous signals across different components, allowing accurate, time-aligned comparisons in fault analysis. To monitor the AE signals, an R15I-AST sensor from MISTRAS, Inc., USA, was attached to the milling machine using industrial-grade adhesive. The signals were collected using the NI-9223 data acquisition system from National Instruments, with custom software developed by the Ulsan Industrial Artificial Intelligence Laboratory in Python 3.11. The AE data were collected at a sampling frequency of 1 MHz, so each 1 s sample contains one million data points. Before data acquisition began, the Hsu–Nielsen (pencil-lead break) test was conducted to verify the proper functioning of the AE sensors. Both sensors successfully detected AE events during the test, confirming their readiness for the experiment. Two AE sensors were used in the experiment: the main sensor was fastened to the spindle, while a secondary sensor was attached to the motor. The sensor on the motor acted as a guard transducer, as depicted in
Figure 10, which helped to filter out irrelevant signals and noise, ensuring that the primary sensor focused on collecting vital data related to the tool, bearing, and gear conditions. The data collection began under the normal operating conditions of the milling machine. According to ISO-8688-2 standards, a tool’s lifespan is characterized by an average flank wear of 0.3 mm. However, in practice, tools may fail catastrophically even in the early stages of their life, especially when machining hard materials. To simulate these conditions, the carbide tool was intentionally worn to an average flank wear of 0.3 mm, and data were collected under this defective condition. Additionally, an initial defect was introduced into the outer race of the bearing supporting the tool, and AE signals were recorded during machining. Finally, a small metal fragment was removed from one tooth of the gear that transmits torque from the motor to the spindle, creating a gear fault, and AE signals were similarly recorded during operation.
In total, 100 samples were recorded for each operating condition, with simultaneous data acquisition on both channels to maintain data integrity and consistency across conditions.
Table 3 provides an overview of the dataset taken from the milling machine. For ease of reference, the normal condition is labeled N, while the tool, bearing, and gear faults are labeled TF, BF, and GF, respectively.
Table 4 provides a clear, high-level overview of the experimental setup and dataset for reference. A 1 s AE signal recorded under N, TF, BF, and GF conditions is depicted in
Figure 12a–c. The AE frequency domain signals for different fault conditions (Bearing Fault sample, Gear Fault sample, Normal sample, and Tool Fault sample) are shown in
Figure 13a–d. The faulty tool, bearing, and gear components used in the experiment are shown in
Figure 14a–c.
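The size of the dataset described above follows directly from the stated acquisition settings. The sketch below builds a simple sample index and estimates the raw data volume; the 4-byte storage format is an assumption made only for the size estimate and is not stated in the text.

```python
# Values taken from the text: four conditions, 100 samples each,
# 1 s per sample at 1 MHz, two synchronously recorded channels.
CONDITIONS = ["N", "TF", "BF", "GF"]
SAMPLES_PER_CONDITION = 100
SAMPLING_RATE_HZ = 1_000_000
SAMPLE_DURATION_S = 1
CHANNELS = 2

# One index entry per recorded sample: (condition label, sample number).
index = [(c, i) for c in CONDITIONS for i in range(SAMPLES_PER_CONDITION)]

points_per_channel = SAMPLING_RATE_HZ * SAMPLE_DURATION_S
points_per_sample = points_per_channel * CHANNELS
total_points = points_per_sample * len(index)

# Assumption: 4 bytes per data point, used only to estimate raw volume.
BYTES_PER_POINT = 4
total_gigabytes = total_points * BYTES_PER_POINT / 1e9

print(len(index), points_per_sample, total_points, total_gigabytes)
```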
3.2. Performance Metrics for Comparisons
In this study, we used a hybrid framework designed for fault diagnosis in milling machines, combining CNN and BiLSTM networks with a GA. This hybrid approach capitalizes on CNNs for feature extraction from CWT scalograms of AE signals, followed by temporal feature processing through BiLSTM. The final classification step is optimized using a GA, ensuring the most important and relevant features are utilized for accurate fault classification. To evaluate the effectiveness of this approach, we considered several performance metrics, including accuracy, precision, recall, and F1-score, as shown in Equations (8)–(11). These metrics are useful for assessing the model’s ability to correctly classify different fault conditions within the milling machine. The mathematical expressions for these metrics are detailed below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{10}$$
$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11}$$
Here, TP (true positive) refers to the instances correctly classified as faulty conditions, while TN (true negative) indicates instances correctly identified as non-faulty. FP (false positive) represents cases where the model incorrectly classifies a non-faulty instance as faulty, and FN (false negative) indicates faulty instances incorrectly identified as non-faulty by the model. Together, these metrics provide a comprehensive evaluation of the model’s classification performance and indicate how robustly and accurately the hybrid CNN-BiLSTM-GA approach detects and diagnoses faults in milling machine operations. The use of a GA for feature selection enhances the model’s ability to distinguish between different fault types effectively, leading to improved diagnostic accuracy and reliability.
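For a multi-class problem such as this one, TP, TN, FP, and FN are usually counted per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows one way to do this; the confusion matrix is made up for illustration and does not reproduce the results of this study.

```python
def class_metrics(cm, k):
    """One-vs-rest metrics for class k; cm[i][j] = count of true i predicted j."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # class k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp   # other classes predicted as class k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def overall_accuracy(cm):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(r) for r in cm)

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
labels = ["N", "TF", "BF", "GF"]
cm = [
    [20, 0, 0, 0],
    [0, 19, 1, 0],
    [0, 0, 20, 0],
    [1, 0, 0, 19],
]

tf = class_metrics(cm, labels.index("TF"))
print(overall_accuracy(cm), tf)
```

Recall computed this way is the per-class true positive rate.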
3.3. Comparative Analysis of Fault Diagnosis Methods
To evaluate the effectiveness of our proposed fault detection method, we compare it with three other methods: CWT-CNN [
26], FFT-CNN [
38], and STFT-CNN [
39]. The performance of each method is assessed using the metrics defined in Section 3.2. The results obtained from the proposed and reference methods are presented in
Table 5, while the per-class true positive rate is presented in
Table 6.
In the proposed method, AE signals are first transformed into CWT scalograms, providing a detailed time-frequency representation. The CWT images are denoised using a Gaussian filter, which smooths the images and reduces noise. This step improves the clarity of the CWT images, so that the feature extraction process focuses on meaningful data for more accurate fault diagnosis. The denoised CWT scalograms are then processed by the VGG16 model, whose convolutional base is used for feature extraction, leveraging its deep architecture and pre-trained weights. The features extracted by the VGG16 are passed to a BiLSTM network, which captures temporal dependencies in both forward and backward directions. This BiLSTM step enhances the model’s ability to recognize sequential fault patterns, contributing to improved classification accuracy. A GA is then applied to optimize feature selection, reducing redundancy and retaining only the most discriminative features, which improves the model’s efficiency and classification accuracy. The optimized features are fed into a fully connected layer for final classification. To assess the contribution of each stage, additional experiments are conducted to compare results before and after applying the key steps: Gaussian filtering, BiLSTM for temporal dependencies, and GA for optimized feature selection. The proposed method achieves an accuracy of 99.6%, outperforming the reference methods, as shown in
Table 5. Additionally, the per-class true positive rate of the proposed method exceeds that of the reference methods, as detailed in
Table 6.
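The GA-based feature selection step can be illustrated with a small binary-mask genetic algorithm. This is a minimal sketch under stated assumptions, not the configuration used in this study: the population size, rates, tournament selection, elitism, and the toy fitness function (made-up relevance scores minus a per-feature penalty) are all hypothetical. In practice, the fitness would typically be the validation performance of the classifier trained on the selected features.

```python
import random

def run_ga(n_features, fitness, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve a 0/1 feature mask; returns (best mask, best fitness per generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask over unchanged
        while len(new_pop) < pop_size:
            # Tournament selection: best of three randomly drawn masks.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [b ^ 1 if rng.random() < mutation_rate else b
                     for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy fitness (assumption): reward relevant features, penalize mask size.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02, 0.6, 0.01]
PENALTY = 0.3

def toy_fitness(mask):
    gain = sum(r for r, b in zip(RELEVANCE, mask) if b)
    return gain - PENALTY * sum(mask)

best_mask, history = run_ga(len(RELEVANCE), toy_fitness)
selected = [i for i, b in enumerate(best_mask) if b]
print(selected, round(history[-1], 3))
```

Because the best mask is always carried into the next generation, the best fitness never decreases from one generation to the next.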
In the CWT-CNN method, AE signals are transformed into CWT scalograms, which are then fed into the CNN for feature extraction and classification. The CNN architecture includes convolutional layers, pooling layers, and a flatten layer, followed by fully connected layers for fault classification. The CWT-CNN method achieves an accuracy of 96.5%, which demonstrates the effectiveness of the CWT in capturing important features of AE signals. However, the method does not explicitly capture temporal dependencies in the data, which likely explains its lower accuracy compared with the proposed method. The per-class true positive rate of the CWT-CNN is higher than 95% except for the GF, as can be seen in
Table 6. Therefore, this method can be used for BF and TF detection in the milling machine.
The FFT-CNN method transforms AE signals using the fast Fourier transform (FFT) to obtain frequency-domain features. It achieves an accuracy of 87.5%, with a per-class true positive rate (TPR) lower than that of the proposed method, as presented in
Table 5 and
Table 6. The FFT provides valuable frequency-domain insights, but it does not capture the transient and non-stationary nature of AE signals as effectively as the CWT. As a result, time-localized information that is important for fault detection is lost.
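The loss of time localization in a magnitude spectrum can be illustrated with a small example. In the sketch below, two synthetic signals contain the same short burst at different times, yet their discrete Fourier transform magnitudes are identical. The signals are toy data, not AE recordings, and a direct DFT is used in place of an FFT for brevity; both give the same spectrum.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform, computed directly."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

n = 64
# A short burst: one sine cycle lasting 8 samples.
burst = [math.sin(2 * math.pi * t / 8) for t in range(8)]

early = burst + [0.0] * (n - 8)               # burst at the start
late = [0.0] * 40 + burst + [0.0] * (n - 48)  # same burst, 40 samples later

mag_early = dft_magnitude(early)
mag_late = dft_magnitude(late)
max_difference = max(abs(a - b) for a, b in zip(mag_early, mag_late))
print(max_difference)  # differs from zero only by floating-point round-off
```

The timing of the burst is carried by the phase of the transform, so features built from the magnitude spectrum alone cannot tell these two signals apart, whereas a time-frequency representation can.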
The STFT-CNN method uses the short-time Fourier transform (STFT) to convert AE signals into a time-frequency representation. It achieves an accuracy of 93.5%, outperforming FFT-CNN but falling short of CWT-CNN. Because the STFT uses a fixed window length, its time and frequency resolution are the same across the whole spectrum, whereas the CWT adapts its resolution with scale. This limitation likely contributes to the lower accuracy.
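The fixed resolution trade-off of the STFT can be made concrete with a short calculation at the 1 MHz sampling rate used in this study. The window lengths below are illustrative assumptions, not the settings of the reference method.

```python
SAMPLING_RATE_HZ = 1_000_000  # sampling rate reported for the AE data

# Hypothetical STFT window lengths (in samples).
window_lengths = (256, 1024, 4096)

resolutions = []
for n in window_lengths:
    freq_bin_hz = SAMPLING_RATE_HZ / n        # spacing between frequency bins
    window_ms = 1000.0 * n / SAMPLING_RATE_HZ  # time span covered by one window
    resolutions.append((n, freq_bin_hz, window_ms))
    print(f"N={n}: bin spacing {freq_bin_hz:.2f} Hz, window {window_ms:.3f} ms")
```

A longer window narrows the frequency bins but blurs short transients over a longer time span, and the chosen compromise applies to every frequency at once.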
The results of our study demonstrate the effectiveness of the proposed method, which integrates AE signal processing with advanced deep learning and evolutionary algorithms. By addressing the limitations of traditional approaches, our method provides a reliable and accurate solution for fault diagnosis in milling machines, achieving higher accuracy than the CWT-CNN, FFT-CNN, and STFT-CNN methods. This research paves the way for improved predictive maintenance and operational efficiency in industrial settings.
4. Conclusions
This study introduced a fault detection method for milling machines by combining high-frequency AE detection with an advanced DL algorithm. The primary goal was to diagnose faults in critical components such as cutting tools, gears, and bearings with high accuracy and efficiency. To achieve this, a Gaussian filter was applied to CWT scalograms to remove noise, ensuring that feature extraction and classification were based on clean data. The use of a VGG16-based CNN and a BiLSTM network enhanced the model’s ability to capture both spatial and temporal features, while the addition of a GA for feature selection further refined the model, resulting in a diagnostic accuracy of 99.6%.
This approach provides a robust solution for predictive maintenance by reducing unplanned downtime and maintenance costs and enhancing the operational efficiency of milling machines. To adapt this model to real-time industrial monitoring, AE sensors should be installed on key machine components to continuously capture signals, which are then sent in real time to a processing unit. A Gaussian filter should be applied immediately to reduce noise while retaining essential features for accurate fault analysis. Using edge computing or industrial GPUs, the proposed CNN-BiLSTM-GA model can process the signals rapidly for real-time classification. The model can also be updated with new data to remain effective as machine conditions change, thus enhancing long-term reliability. An alert system would notify operators immediately upon fault detection, allowing for timely maintenance and minimizing downtime. Such a setup would make the model suitable for real-world applications, providing both accuracy and responsiveness in monitoring machine health. In future work, the proposed hybrid model can be extended by incorporating additional sensor data, such as force and temperature signals, to improve the reliability of fault diagnosis in milling machines. Furthermore, expanding the dataset with a wider variety of fault types and operating conditions will help to refine the model’s generalizability for real-time applications in industrial environments.
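The real-time monitoring setup outlined above could be organized as a simple loop over incoming signal windows. The sketch below is hypothetical throughout: `monitor`, `toy_classify`, and the rule of confirming a fault over consecutive windows before alerting are illustrative choices, and the threshold-based classifier merely stands in for the trained model.

```python
from collections import deque

def monitor(windows, classify, confirm=2):
    """Yield (window index, label) once a fault label repeats `confirm` times in a row."""
    recent = deque(maxlen=confirm)
    for index, window in enumerate(windows):
        label = classify(window)
        recent.append(label)
        is_fault = label != "N"
        confirmed = len(recent) == confirm and len(set(recent)) == 1
        if is_fault and confirmed:
            yield index, label

def toy_classify(window):
    # Stand-in for the trained model (assumption): flag a bearing fault
    # when the peak amplitude of the window exceeds a fixed threshold.
    return "BF" if max(abs(v) for v in window) > 1.0 else "N"

# Four short, made-up windows standing in for successive AE samples.
stream = [[0.1, -0.2], [0.3, 1.5], [2.0, 0.1], [0.2, 0.1]]
alerts = list(monitor(stream, toy_classify))
print(alerts)
```

Requiring agreement over consecutive windows is one way to reduce spurious alerts from a single misclassified window, at the cost of a short detection delay.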