1. Introduction
In contemporary manufacturing, tool wear represents a critical issue that demands considerable attention. Excessive tool wear not only compromises machining precision but also diminishes product yield rates, thereby substantially escalating production costs. Tool wear prediction is inherently challenging due to the dynamic coupling of time-varying cutting parameters, heterogeneous workpiece materials, and environmental vibrations. Traditional mechanistic models, while interpretable, may encounter challenges in fully capturing the nonlinear interdependencies among these factors and could exhibit constrained real-time adaptability to dynamic degradation patterns, especially when handling asynchronous multi-sensor data streams. Consequently, developing a novel tool wear prediction method that improves accuracy remains a critical need in industrial applications.
Currently, the mainstream methods for predicting tool wear can be categorized into three groups: physical model-based approaches [1], data-driven methods [2], and intelligent algorithm-based techniques [3]. The physical model-based method predicts the wear state of tools by analyzing and modeling physical phenomena during the cutting process, such as cutting force and temperature. Zhang et al. (Mechanical Systems and Signal Processing, 2023) [4] established an accurate physical model of tool wear based on the relationship between cutting force and tool wear, achieving effective prediction of tool wear. Yang et al. [5] constructed a model correlating tool wear rate with cutting parameters based on wear influencing factors and accurately predicted the tool wear curve by stretching the abscissa. Zhang et al. (Metals, 2023) [6] proposed a tool wear rate model based on the positive feedback relationship between tool geometry and wear rate. While these models provide interpretable insights, they rely on assumptions that may reduce their applicability to complex real-world conditions. Data-driven methods, including Convolutional Neural Networks (CNNs) and power signal-based models, have gained traction due to their ability to extract patterns from multi-sensor data. For instance, Zhang et al. (The International Journal of Advanced Manufacturing Technology, 2024) [7] demonstrated the effectiveness of CNNs in mapping vibration and current signals to tool wear states. Similarly, Wang et al. (The International Journal of Advanced Manufacturing Technology, 2023) [8] achieved promising results using power signals. However, their performance appears dependent on the availability of comprehensive labeled datasets, and they face challenges in generalizing across diverse machining scenarios. Recent advancements, such as the integration of physical and data-driven models by Fan et al. [9], aim to mitigate these limitations but still lack robustness in noisy environments. Zhou et al. [10] developed a monitoring method employing a Multi-scale Edge-labeled Graph Neural Network (MEGNN), though its technical emphasis resides in the feature learning potential of the deep graph architecture for few-shot classification scenarios. Intelligent algorithms, including deep neural networks and attention-enhanced LSTM networks, further push the boundaries of prediction accuracy. Zhao et al. [11] utilized multi-sensor fusion with deep learning to achieve high precision, while Tian et al. [12] improved temporal feature extraction using attention mechanisms. Despite their success, computational complexity remains a consideration, and there are overfitting risks. Zhang et al. (Sensors (Basel, Switzerland), 2016) [13] proposed a wireless triaxial accelerometer method using wavelet denoising and NFNs to predict tool wear and remaining life, showing improved accuracy over traditional neural networks. However, it relies on single vibration sensors without integrating multi-source data (e.g., cutting forces), and subsequent studies could explore enhanced multi-sensor fusion strategies. Recent studies, such as Bampoula et al. [14], highlight the potential of LSTM autoencoders and transformer encoders for condition monitoring, yet their application to tool wear prediction remains underexplored. Similarly, Cakir et al. [15] compared machine learning algorithms for predictive maintenance but overlooked the unique challenges of tool wear dynamics. Wu et al. [16] demonstrated the efficacy of transformers in time-series forecasting, offering insights for improving sequential data modeling in tool wear contexts. Support Vector Regression (SVR)-based methods, optimized via evolutionary algorithms [17] or hybridized with particle filters [18], have shown robustness in specific scenarios. For example, Kong et al. [19] combined kernel PCA with v-SVR to enhance feature fusion, while Benkedjouh et al. [20] employed nonlinear feature reduction for life prediction. Nevertheless, their application to tool wear prediction warrants deeper investigation to fully exploit the nonlinear relationships and dynamic couplings inherent in tool wear data. Chen et al. [21] developed a DBO-optimized 1DCNN–LSTM model for high-accuracy prediction of bearing surface roughness, demonstrating the strengths of deep learning models in dynamic signal processing, while requiring substantial labeled data and complex network architectures.
Despite remarkable progress in recent studies, critical aspects of tool wear research remain underexplored. First, existing models predominantly focus on either linear (e.g., Support Vector Regression, SVR) or nonlinear (e.g., deep learning) relationships and rarely exploit their synergistic potential, which limits the characterization of complex tool wear dynamics. Second, data-driven methods are highly susceptible to environmental and operational noise; even with more sophisticated signal processing techniques, their prediction accuracy can still be compromised under real-world machining conditions. Third, conventional neural network training typically employs gradient-based algorithms, which may face challenges related to local optima convergence and computational efficiency, especially when operating under noisy or non-stationary conditions. These limitations collectively hinder the development of robust and efficient prediction systems. Mishra et al. [22] proposed an unsupervised GMM approach that demonstrates high versatility in tool condition clustering, but this method relies on physical features and did not quantify prediction errors.
The integration of Support Vector Regression (SVR) and Autoencoder (AE) in this work is motivated by two critical research gaps in tool wear prediction: (1) Existing methods predominantly focus on either linear relationships (e.g., SVR) or nonlinear modeling (e.g., deep learning), and the two have not been fully combined to address the intertwined linear–nonlinear dynamics of tool wear caused by time-varying cutting parameters, material heterogeneity, and environmental noise. While standalone linear models lack the capacity to capture complex wear patterns, purely nonlinear approaches often overfit limited industrial data. (2) Conventional neural network parameter optimization relies on gradient-based methods, which suffer from local optima and inefficiencies under noisy, varying operational conditions. As a result, single-model methods fail to comprehensively capture complex data characteristics, which limits their prediction accuracy. To address this limitation, this study proposes a hybrid model integrating SVR and AE. The proposed model explicitly captures linear patterns in the data through SVR while leveraging the AE to extract nonlinear latent features, so that the combined linear and nonlinear features within multidimensional data are represented more adequately. Furthermore, to enhance model performance, we incorporate the Ant Colony Optimization (ACO) algorithm for parameter optimization. Experimental results demonstrate high prediction accuracy, which validates the feasibility of the proposed method.
In view of this, the study focuses on the in-depth exploration of complex multi-sensor data, specifically triaxial cutting force signals (in the X, Y, and Z directions), vibration signals, and acoustic emission signals, aiming to improve prediction accuracy in dynamic machining environments. To better capture both the nonlinear and the linear characteristics of the tool wear dataset, this study proposes a two-stage prediction framework: Ant Colony Optimization–Support Vector Regression–Autoencoder (ACO–SVR–AE), which synergistically integrates nonlinear error compensation with swarm intelligence optimization. The rationale for combining linear Support Vector Regression (SVR) and Autoencoder lies in their complementary capabilities: Linear SVR is adept at capturing linear relationships and exhibits robustness with limited training data, whereas Autoencoder effectively models intricate nonlinear patterns and mitigates input noise. The framework initially leverages Support Vector Regression (SVR) to estimate linear trends, followed by Autoencoder-based compensation of nonlinear residuals. Additionally, the Ant Colony Optimization (ACO) algorithm is introduced to optimize Autoencoder parameters, significantly improving convergence efficiency and predictive accuracy. Extensive experiments on the PHM2010 high-speed milling dataset validate the proposed model against multiple benchmarks, including linear Support Vector Regression (SVR), standalone Autoencoder, and conventional models (e.g., random forest, neural networks). The results highlight its exceptional performance in both prediction precision and noise robustness, demonstrating practical potential for industrial applications.
The paper is organized as follows:
Section 1 introduces the research background of tool wear prediction and identifies limitations of existing approaches.
Section 2 provides theoretical foundations for Support Vector Regression (SVR), Autoencoder, and ACO.
Section 3 details the proposed ACO–SVR–AE framework, including its two-stage linear-nonlinear modeling architecture and parameter optimization strategy.
Section 4 validates the model’s superiority through comprehensive experiments on PHM2010 datasets, demonstrating significant improvements in MAE/MSE metrics and noise robustness.
Section 5 concludes with research contributions and future directions.
4. Experimental Results and Analysis
4.1. Introduction of the Dataset
The data are from the Prognostics and Health Management (PHM) Society’s Prognostics and Health Management Competition for High-speed CNC Machine Tool Cutting Tools [27].
The data size: The dataset contains 315 files, each corresponding to a full milling cycle from tool initiation to failure. Each run recorded three synchronized sensor signals (force, vibration, and acoustic emission).
The data type: The dataset comprises time-series sensor signals (numerical data including cutting forces in Newtons, vibration in g, and acoustic emission in volts), tool wear labels (numerical values in micrometers quantified via post-process microscopy), and metadata capturing operational parameters such as spindle speed, feed rate, and depth of cut.
The data source: Collected by the PHM Society for the 2010 Data Challenge.
The experimental conditions are presented in Table 1. The PHM2010 dataset was used to verify the effectiveness of the method proposed in this paper. The equipment for collecting experimental data is shown in Figure 5. The spindle speed of the CNC milling machine was 10,400 r/min, the cutting depth in the Y-direction (radial) was 0.125 mm, the feed rate was 1555 mm/min, and the cutting depth along the Z-axis was 0.2 mm. The PHM2010 dataset contains 945 data points from three tool life experiments. Measurements included triaxial cutting forces (F_X, F_Y, F_Z), vibration, acoustic emission (AE) signals, and spindle motor current (21 statistical features per pass), with flank wear width (VB, µm) as the target variable. Subsequently, after the three signals collected during the cutting process were amplified by a signal amplifier, the original time domain signals were collected. The signal sampling frequency was 50 kHz, and a total of 7 signals were collected.
Experimental data: Acoustic emission signals, milling force signals, and vibration signals in three directions, namely the tool feed direction X, the spindle radial direction Y, and the spindle axial direction Z, were collected. Under fixed working conditions, full life-cycle data of six tools (C1, C2, C3, C4, C5, and C6) were obtained, with each tool cutting 315 times. Off-line wear measurements were carried out for each of the three milling cutters, C1, C4, and C6, and the average wear amount of the three cutting edges of each milling cutter was taken as the tool wear result. In this paper, three groups of experiments were set up, as shown in Table 2. Specifically, any two of these three tools were used for training, and the other one was used for testing.
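The train/test rotation described above can be sketched in a few lines of Python. This is only an illustration: the function name `leave_one_tool_out` is hypothetical, and the labels E2 and E3 are assumed by analogy with E1, which the text defines as testing on C1.

```python
# Tools with off-line wear measurements, as named in the text.
LABELED_TOOLS = ["C1", "C4", "C6"]

def leave_one_tool_out(tools):
    """Return one (train_tools, test_tool) pair per tool, holding that tool out."""
    return [([t for t in tools if t != held_out], held_out) for held_out in tools]

# Experiment labels E1..E3 are an assumption; only E1 is named in the text.
for i, (train, test) in enumerate(leave_one_tool_out(LABELED_TOOLS), start=1):
    print(f"E{i}: train on {train}, test on {test}")
```

Holding out a whole tool, rather than random passes, keeps every cut of the test tool unseen during training.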
The open dataset provides the experimental conditions, tool wear data, multi-sensor signal data, and information on the dataset structure. In Table 3, “flute” represents the groove wear data.
4.2. Data Pre-Processing and Feature Extraction
During milling operations, the data collected by sensors often fluctuate significantly. Such signals are affected by interference from factors such as the environment and temperature and cannot be used effectively to monitor the wear state of the milling cutter, which makes them invalid. Moreover, the signals collected by the sensors also contain redundant information. Therefore, to build a reliable tool wear monitoring model, it is necessary to eliminate the useless information in the original signals. In this experiment, each tool ran 315 times. The amount of signal data collected in each run varied greatly, with a maximum of 230,000 data points and a minimum of 120,000 data points. To obtain stable signals, the number of data points selected each time was controlled to between 40,000 and 90,000. The original data are natural time-series signals, that is, physical quantities recorded as they change over time. Based on this characteristic, we could extract some statistical time domain features from the original data and use them as discriminative features to input into the deep learning system. In practical applications, statistical features such as the mean value, Root Mean Square (RMS), standard deviation, and variance are widely used. Given the stationarity of this signal, skewness and kurtosis were also extracted and incorporated into the model as input features [28]. Next, the frequency domain is a coordinate system used to describe the frequency characteristics of signals. In mechanical failures, periodic pulses are common, and the main frequency components contain information and discriminative features. By means of the Fast Fourier Transform (FFT), the time domain vibration signal can be converted into a frequency domain vibration signal, clearly showing the frequency characteristics of the signal [29]. For non-stationary signals, time–frequency domain features are of great practical value.
In signal processing, the Short-Time Fourier Transform (STFT), wavelet transform/decomposition, and Empirical Mode Decomposition (EMD) are all commonly used and effective methods for converting a one-dimensional signal into a two-dimensional representation that combines time and frequency. The time–frequency domain, simply put, integrates the concepts of the time domain and the frequency domain. Its principle is to sequentially extract the corresponding frequency domain information within different time windows in the time domain as time progresses so as to analyze the signal features more comprehensively and in detail [30]. Wavelet analysis, STFT, and the Hilbert–Huang Transform all belong to time–frequency domain methods, which can analyze the waveform signals in both the time domain and the frequency domain simultaneously. In addition, some research attempts to fuse the features of the time domain, frequency domain, and time–frequency domain together to present the feature information more comprehensively [31]. In this paper, 12 features were extracted from each signal, including those in the time, frequency, and time–frequency domains. These features reflect the characteristics of the sensor signals from different perspectives. In the time domain, 7 feature quantities, namely the root mean square, variance, maximum value, minimum value, skewness, kurtosis, and peak-to-peak value, were extracted. In the frequency domain, 4 feature quantities, namely the centroid frequency, average frequency, root mean square frequency, and frequency standard deviation, were extracted. In the time–frequency domain, wavelet energy entropy was extracted. A total of 84 feature quantities (12 features for each of the 7 signals) were extracted to form a feature set. The specific characteristic information is shown in Table 4.
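As a rough illustration of the time and frequency domain features listed above, the following NumPy sketch computes 11 of the 12 features for one signal. Several points are assumptions rather than the authors' definitions: the function name `extract_features` is hypothetical, the frequency features use one common convention (amplitude-weighted centroid, power-weighted mean, RMS, and standard deviation), kurtosis is the non-excess form, and the wavelet energy entropy is omitted so that the sketch depends on NumPy only.

```python
import numpy as np

def extract_features(x, fs=50_000):
    """Return 11 time/frequency features of a 1-D signal x sampled at fs Hz.

    Assumes a non-constant signal (a zero standard deviation would divide by zero).
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    std = x.std()
    feats = {
        "rms": np.sqrt(np.mean(x ** 2)),
        "variance": x.var(),
        "max": x.max(),
        "min": x.min(),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,  # non-excess kurtosis
        "peak_to_peak": x.max() - x.min(),
    }
    # One-sided spectrum; amplitude and power act as weights over frequency.
    amplitude = np.abs(np.fft.rfft(x))
    power = amplitude ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    mean_freq = np.sum(freqs * power) / np.sum(power)
    feats.update({
        "centroid_frequency": np.sum(freqs * amplitude) / np.sum(amplitude),
        "mean_frequency": mean_freq,
        "rms_frequency": np.sqrt(np.sum(freqs ** 2 * power) / np.sum(power)),
        "frequency_std": np.sqrt(np.sum((freqs - mean_freq) ** 2 * power) / np.sum(power)),
    })
    return feats

# Usage on a synthetic 1 kHz tone (illustrative data, not from PHM2010).
t = np.arange(5000) / 50_000
tone = np.sin(2 * np.pi * 1000 * t)
features = extract_features(tone)
print(len(features), "features;", "mean frequency =", round(features["mean_frequency"], 1))
```

Applying such a function to each of the 7 channels of a cutting pass and concatenating the results gives one feature vector per pass; with the wavelet energy entropy added, this would reach the 84 features reported above.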
4.3. Effectiveness of ACO–SVR–Autoencoder
MAE and MSE are two main methods for evaluating the prediction accuracy of a model. In this study, the Mean Absolute Error (MAE) and the Mean Squared Error (MSE) were adopted. Specifically, the MAE and MSE of all time periods in the test sequence were calculated to achieve an overall assessment of the prediction quality of the entire test-set sequence, thus enhancing comparability.
The Mean Absolute Error is the average of the absolute values of the differences between the predicted values and the true values. Its calculation formula is as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
In this formula, $n$ represents the number of samples, $y_i$ is the true value of the $i$-th sample, and $\hat{y}_i$ is the predicted value of the $i$-th sample.
The advantage of MAE is that it is simple to calculate and has an intuitive meaning. It measures the error in the unit of the original data and can reflect the average magnitude of the prediction error. The Mean Squared Error is the average of the squares of the differences between the predicted values and the true values. Its calculation formula is:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
Compared with the MAE, the MSE amplifies the impact of larger errors by squaring the errors. This makes the model pay more attention to reducing larger errors during the training process, as larger errors contribute more to the MSE.
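Both metrics can be computed directly from their definitions. The short Python sketch below uses made-up wear values purely for illustration; the function names `mae` and `mse` are not taken from the paper.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (true - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical flank-wear values in micrometers, for illustration only.
wear_true = [100.0, 110.0, 120.0, 130.0]
wear_pred = [102.0, 108.0, 121.0, 140.0]

print(mae(wear_true, wear_pred))  # (2 + 2 + 1 + 10) / 4 = 3.75
print(mse(wear_true, wear_pred))  # (4 + 4 + 1 + 100) / 4 = 27.25
```

The single 10 µm miss contributes two-thirds of the MAE but more than 90% of the MSE, which is the amplification effect described above.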
In this study, the MAE and MSE were calculated to evaluate the performance of the Ant Colony Optimization–Support Vector Regression–Autoencoder (ACO–SVR–Autoencoder) tool wear prediction method. In the first stage, after the Support Vector Regression (SVR) model with a linear kernel was used to predict the data, the MAE and MSE between the predicted values and the true tool wear values were calculated to measure the preliminary prediction error of the SVR model. In the second stage, the Autoencoder was used to perform non-linear modeling on this error and achieve error compensation. Then, the MAE and MSE between the compensated predicted values and the true values were calculated again. By comparing the indicators before and after compensation, the effect of error compensation and the effectiveness of the entire two-stage prediction method can be evaluated. Through in-depth analysis of these error indicators, a more comprehensive understanding of the model’s performance at different stages can be obtained, providing strong data support for further model optimization.
The parameters of the Support Vector Regression (SVR) and the Autoencoder were determined through experiments, and a prediction model based on the combination of linear Support Vector Regression and Autoencoder was constructed and tested on the dataset of a high-speed CNC milling machine. This method aimed to reveal the long-term trend of the data while minimizing the impact of short-term oscillations. The effectiveness of this method was verified by comparison with other algorithms.
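The two-stage idea (a linear estimate followed by a non-linear model of its residuals) can be sketched with NumPy alone. This is a structural illustration under stated assumptions, not the authors' implementation: ordinary least squares stands in for the linear-kernel SVR, fixed random tanh features with ridge regression stand in for the Autoencoder, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a linear trend plus a non-linear component (illustrative only).
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y = 3.0 * X[:, 0] + np.sin(3.0 * X[:, 0])

def add_bias(A):
    """Append a column of ones so the linear model has an intercept."""
    return np.hstack([A, np.ones((A.shape[0], 1))])

# Stage 1: linear estimate (least squares stand-in for the linear-kernel SVR).
w_lin, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
y_lin = add_bias(X) @ w_lin
residual = y - y_lin

# Stage 2: non-linear model of the stage-1 residuals
# (random tanh features + ridge regression, a stand-in for the Autoencoder).
n_hidden, lam = 50, 1e-3
W = rng.normal(size=(1, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)
w_res = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ residual)

# Final prediction = linear estimate + predicted residual.
y_final = y_lin + H @ w_res

# Errors are measured on the training data to keep the sketch short.
mse_stage1 = np.mean((y - y_lin) ** 2)
mse_stage2 = np.mean((y - y_final) ** 2)
print(f"MSE after stage 1: {mse_stage1:.4f}")
print(f"MSE after stage 2: {mse_stage2:.4f}")
```

Comparing the two printed values mirrors the before/after comparison of error indicators described above; a real evaluation would compute them on a held-out tool rather than on the training data.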
Figure 6 presents the visualization of experimental results from three datasets used for validating the ACO–SVR–Autoencoder method. C1 served as the test set, while C4 and C6 served as the training sets. C4 served as the test set, while C1 and C6 served as the training sets. C6 served as the test set, while C1 and C4 served as the training sets.
4.3.1. Parameter Determination Process
The configuration of model hyperparameters was determined through a systematic experimental validation approach. Specifically, we implemented a stratified cross-validation strategy by reserving 20% of the original training set as an independent validation set to assess the generalization capabilities of different parameter combinations. Multiple rounds of iterative experiments were conducted to monitor variations in both loss functions and evaluation metrics on the validation set. The regularization coefficient (C = 1) was selected to optimally balance model complexity with fitting performance. The determination of hidden layer neuron numbers (100/140) achieved an effective compromise between representational capacity and overfitting prevention. Training for 30 epochs with a batch size of 24 was optimized to strike a balance between convergence rate and computational resource utilization. The heuristic factor range [0, 1] was validated through controlled experiments, demonstrating its effectiveness. All hyperparameter settings were ultimately finalized based on the consistent performance stability observed across the validation cycles.
- (1)
Determination of Parameters for Linear Support Vector Regression (SVR)
The linear kernel was selected as the kernel function for linear Support Vector Regression (SVR). The regularization parameter was explicitly set to 1.
During the experiment, relatively good results were achieved by adjusting the regularization parameter C through cross-validation; given the need for subsequent error modeling, the remaining hyperparameters were configured using the methodology proposed in reference [32].
- (2)
Determination of Parameters for Autoencoder
The parameters of the Autoencoder model were determined using the Ant Colony Optimization (ACO) algorithm. The following parameter ranges were defined:
The number of training epochs for the Autoencoder ranged from 10 to 100.
The batch size of the Autoencoder ranged from 16 to 128.
The number of pre-training epochs for the Autoencoder ranged from 10 to 50.
The number of neurons in the hidden layers of the Autoencoder ranged from 50 to 120 and from 120 to 300 for different architectures.
These parameters were randomly generated in each iteration of the ACO algorithm and subsequently updated and optimized based on the model’s performance, as evaluated by the Mean Squared Error (MSE).
- (3)
Determination of Parameters for the Ant Colony Optimization (ACO) Algorithm
The parameters of the Ant Colony Optimization (ACO) algorithm were manually configured as follows:
The number of ants was set to 10.
The number of iterations was set to 10.
The pheromone importance factor was set to 1.
The heuristic information importance factor was set to 2.
The pheromone evaporation rate was set to 0.5.
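To make the search procedure concrete, the following pure-Python sketch runs an ACO-style search over the Autoencoder parameter ranges listed above, using the stated settings (10 ants, 10 iterations, importance factors 1 and 2, evaporation rate 0.5). Several elements are assumptions and not taken from the paper: the discretization step of each range, the treatment of the two hidden-size ranges as two layers, the uniform heuristic (which makes the heuristic factor inert here), the deposit rule 1 / (1 + score), and above all `toy_validation_mse`, a hypothetical placeholder for training the Autoencoder and returning its validation MSE.

```python
import random

# Ranges from the text; the step sizes are assumptions made for this sketch.
SEARCH_SPACE = {
    "epochs": list(range(10, 101, 10)),         # 10..100
    "batch_size": list(range(16, 129, 16)),     # 16..128
    "pretrain_epochs": list(range(10, 51, 5)),  # 10..50
    "hidden_1": list(range(50, 121, 10)),       # 50..120
    "hidden_2": list(range(120, 301, 20)),      # 120..300
}
N_ANTS, N_ITERATIONS = 10, 10
ALPHA, BETA, RHO = 1.0, 2.0, 0.5  # pheromone factor, heuristic factor, evaporation

def toy_validation_mse(cfg):
    """Hypothetical stand-in for 'train the Autoencoder with cfg, return its MSE'."""
    target = {"epochs": 40, "batch_size": 32, "pretrain_epochs": 20,
              "hidden_1": 90, "hidden_2": 160}  # arbitrary optimum of the toy surface
    return sum(((cfg[k] - target[k]) / target[k]) ** 2 for k in target)

def aco_search(objective, space, seed=0):
    rng = random.Random(seed)
    tau = {k: [1.0] * len(v) for k, v in space.items()}  # pheromone per candidate
    eta = {k: [1.0] * len(v) for k, v in space.items()}  # uniform heuristic
    best_cfg, best_score, history = None, float("inf"), []
    for _ in range(N_ITERATIONS):
        trails = []
        for _ in range(N_ANTS):
            # Each ant picks one candidate per parameter, biased by pheromone.
            choice = {}
            for k, values in space.items():
                weights = [t ** ALPHA * h ** BETA for t, h in zip(tau[k], eta[k])]
                choice[k] = rng.choices(range(len(values)), weights=weights)[0]
            cfg = {k: space[k][i] for k, i in choice.items()}
            score = objective(cfg)
            trails.append((choice, score))
            if score < best_score:
                best_cfg, best_score = cfg, score
        # Evaporate, then deposit pheromone in proportion to solution quality.
        for k in tau:
            tau[k] = [(1.0 - RHO) * t for t in tau[k]]
        for choice, score in trails:
            for k, i in choice.items():
                tau[k][i] += 1.0 / (1.0 + score)
        history.append(best_score)
    return best_cfg, best_score, history

best_cfg, best_score, history = aco_search(toy_validation_mse, SEARCH_SPACE)
print(best_cfg, round(best_score, 4))
```

With 10 ants and 10 iterations the search evaluates at most 100 configurations, so in practice the cost is dominated by the 100 model trainings rather than by the ACO bookkeeping.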
Given that nearly all prediction models necessitate random parameter initialization, in order to ensure the reliability and stability of the experimental results, each model was tested 30 times. The specific experimental parameter settings are elaborated on in Table 5. Moreover, in this experiment, two metrics, namely the Mean Absolute Error (MAE) and the Mean Squared Error (MSE), were chosen to comprehensively assess the prediction performance of the models.
The parameter configurations are grounded in established literature and validated through empirical studies:
Regularization (C = 1): We adopted dropout regularization (applied after the second hidden layer), as recommended by Srivastava et al. (2014) [33], to prevent overfitting in deep autoencoders. A dropout rate of 0.5 (equivalent to C = 1 in our framework) was selected via grid search on the validation data to balance model capacity and generalization.
Hidden Layers (100→140): The hierarchical architecture followed the layer-wise dimension increase proposed by Zhang et al. (2011) [34], where incremental expansion (input→100→140) enhances feature disentanglement. Ablation studies showed that smaller layer dimensions (e.g., 50→120) increased reconstruction MSE by 32% compared to baseline, while larger dimensions (200→300) caused validation loss deterioration (overfitting).
Training Protocol (30 epochs, batch size = 24):
Pre-training (25 epochs): This aligns with Hinton (2006) [35], where greedy layer-wise pre-training stabilizes weight initialization.
Batch Size 24: This follows the small-batch heuristic from Kingma (2014) [36], balancing gradient noise and convergence efficiency for Autoencoders (AEs).
Heuristic Factor [0, 1]: Input normalization to a unit range aligns with guidance from Yann et al. (2015) [37] on feature scaling for neural networks, ensuring domain knowledge inputs (e.g., expert scores) do not dominate latent representations.
4.3.2. Reasons for Algorithm Selection
- (1)
Reasons for Selecting Linear Support Vector Regression (SVR)
Support Vector Regression (SVR) excels in handling linear relationships and has moderate robustness. In tool wear prediction, data such as the cutting speed of the tool and the amount of wear often exhibit a linear trend. Support Vector Regression (SVR) can accurately capture these linear features, providing a reliable basic prediction for the entire prediction process. This not only reduces errors caused by improper handling of linear relationships but also lays a solid foundation for subsequent error compensation work. As a result, Support Vector Regression (SVR) serves as a crucial first step in the overall prediction framework.
- (2)
Reasons for Selecting Autoencoder
Autoencoder has excellent capabilities in non-linear modeling and unsupervised learning. Tool wear is comprehensively affected by various factors such as cutting force and temperature, and there are complex non-linear relationships between these factors and the amount of wear. Autoencoder can automatically learn the feature representation of input data, deeply explore the complex non-linear relationships in prediction errors, accurately model the errors, and effectively compensate for the deficiencies of Support Vector Regression (SVR)’s linear prediction. For example, Windrim et al. [38] demonstrated the superiority of Autoencoders in unsupervised feature learning for hyperspectral data, with their model capturing non-linear dependencies and improving prediction accuracy compared with traditional linear methods. Consequently, this significantly improves the overall prediction accuracy, complementing Support Vector Regression (SVR)-based linear prediction.
- (3)
Reasons for Selecting Ant Colony Optimization (ACO) Algorithm
The global search ability of ACO and its characteristics of simulating intelligent behavior make it an ideal choice for optimizing the parameters of Autoencoder. In a complex parameter space, ACO can conduct extensive searches, avoid getting trapped in local optimal solutions, and help find the parameter combination that optimizes the performance of Autoencoder, giving full play to Autoencoder’s non-linear modeling capabilities. By simulating the cooperation and information-exchange mechanism of ant colonies, ACO continuously updates pheromones during the iterative process to guide the search direction and can adaptively adjust the search strategy. It optimizes the key parameters of Autoencoder (the number of training epochs, batch size, and the dimensions of the hidden layer), avoiding the limitations and blindness of manual parameter adjustment, maximizing the performance of Autoencoder, and ensuring that its non-linear modeling and error compensation effects reach the best state. Thus, it significantly improves the accuracy of tool wear prediction as well as the performance and stability of the entire prediction method. ACO’s optimization of Autoencoder further enhances the overall effectiveness of the combined algorithm. In summary, the three algorithms cooperate with each other, giving full play to their respective advantages, jointly forming the ACO–SVR–Autoencoder method, which realizes efficient data prediction and error compensation. Through comparative research, this paper aims to better illustrate the superiority of the ACO–SVR–Autoencoder method in improving prediction accuracy, thereby providing robust support for solving practical problems. To comprehensively verify the effectiveness of this optimized two-stage combined error-compensation prediction method, we carefully designed comparative experiments. The final prediction results of the ACO–SVR–Autoencoder method were rigorously compared with two categories of baseline approaches (as illustrated in
Figure 6, Figure 7 and Figure 8): (1) single-stage methods (predictions using solely linear Support Vector Regression (SVR) or Autoencoder alone) and (2) classic machine learning algorithms such as neural networks and random forests. Comparative analysis demonstrated that the proposed two-stage combination method achieved significantly higher prediction accuracy than both single-stage methods and traditional algorithms. This marked improvement underscores the practical feasibility of the two-stage framework in real-world applications.
4.4. Analysis of the Prediction Results of the Optimized Linear Regression and Non-Linear Error Compensation Model
The innovations achieved in the prediction method of this study are mainly reflected in the following three key aspects:
First, a two-stage prediction method, ACO–SVR–Autoencoder (Ant Colony Optimization–Support Vector Regression–Autoencoder), is proposed through the integration of linear regression and non-linear error compensation. Specifically, the SVR model, utilizing a linear kernel function, demonstrates superior performance in modeling linear relationships by efficiently capturing data trends, thereby establishing a robust prediction baseline. Conversely, the autoencoder architecture excels at extracting complex non-linear features through its deep representation learning capability. This hybrid approach leverages complementary strengths—linear modeling fidelity and non-linear residual compensation—to achieve significant improvements in prediction accuracy. Second, in the error compensation stage, Autoencoder demonstrates unique advantages. It can deeply explore the complex error mapping relationships and effectively tackle the challenges confronted by traditional methods in dealing with non-linear problems. This powerful non-linear processing ability enables the prediction model to compensate for errors more accurately, thus further optimizing the prediction results. Finally, the introduction of the Ant Colony Optimization (ACO) algorithm to optimize the key parameters of Autoencoder is another innovative highlight of this study. ACO breaks through the limitations of traditional parameter adjustment methods. By simulating the intelligent behavior of ant colonies, it can adaptively search for the optimal solution in the complex parameter space, maximizing the performance of Autoencoder. This not only further enhances the effectiveness of the prediction model but also significantly improves the model’s stability, ensuring good prediction performance under different data conditions. In conclusion, the integrated ACO–SVR–Autoencoder optimization approach delivers an efficient, accurate, and stable solution for the prediction field.
To comprehensively evaluate the effectiveness of this optimized two-stage combined error compensation prediction method, comparative experiments were designed. The proposed methodology was evaluated on the PHM2010 dataset to validate its performance characteristics under standardized benchmarking conditions. The final prediction results obtained through the proposed method were compared with those from single approaches [11] (Support Vector Regression (SVR) and Autoencoder) and other classical algorithms (neural networks [39] and random forest [40]). The comparative analysis revealed that the prediction results obtained through the two-stage combined approach demonstrated enhanced performance compared to those generated by standalone methods or classical algorithms. These findings indicate the effectiveness of the two-stage integrated prediction framework.
E1: The test set used in this experiment was C1, with C4 and C6 serving as the training sets. As depicted in Figure 7, ACO–SVR–Autoencoder outperformed the conventional methods (linear Support Vector Regression (SVR), Autoencoder, random forest, neural networks) in prediction accuracy. Compared with our baseline SVR–Autoencoder, the result is mixed (Table 6): the baseline achieved the lowest MSE (83.87), indicating fewer large-magnitude errors, whereas ACO–SVR–Autoencoder attained the lowest MAE (7.50), indicating a smaller average deviation. The marginally higher MSE of ACO–SVR–Autoencoder (88.75 vs. 83.87) can be attributed to the Ant Colony Optimization (ACO) algorithm’s focus on MAE minimization during hyperparameter selection.
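A model can have a lower MAE yet a higher MSE because MSE squares each error and is therefore dominated by the largest ones. The sketch below uses made-up residuals (not values from Table 6) to show the effect.

```python
def mse(errors):
    """Mean squared error of a list of residuals."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical residuals, chosen only to illustrate the effect.
model_a = [6.0, 6.0, 6.0, 6.0]    # uniformly moderate errors
model_b = [1.0, 1.0, 1.0, 13.0]   # mostly small errors, one large outlier

print(f"A: MAE={mae(model_a):.2f}, MSE={mse(model_a):.2f}")
print(f"B: MAE={mae(model_b):.2f}, MSE={mse(model_b):.2f}")
```

Model B wins on MAE (4 vs. 6) but loses on MSE (43 vs. 36). A search that selects hyperparameters by MAE alone can therefore accept a configuration of this kind.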
Although the absolute error reduction from ACO integration appears modest, its advantages extend beyond direct metric improvements. First, ACO automates the optimization of key Autoencoder hyperparameters (training epochs, batch size, and hidden layer dimensions), eliminating manual tuning while ensuring model stability. Second, the ACO-optimized framework exhibited enhanced generalization, as evidenced by its lower performance variance across repeated trials, indicating reduced sensitivity to initialization and noise.
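As a rough illustration of how ACO can select discrete hyperparameters, the sketch below keeps one pheromone value per candidate setting, lets each ant sample a configuration in proportion to pheromone, and reinforces settings that gave a low cost. The candidate values, the ACO settings (`n_ants`, `n_iters`, `rho`), and the objective are all assumptions for illustration. In a real run the objective would train the Autoencoder and return its validation MAE.

```python
import random

# Hypothetical candidate values; the paper's actual search ranges are not assumed here.
SEARCH_SPACE = {
    "epochs": [50, 100, 150, 200],
    "batch_size": [16, 32, 64, 128],
    "hidden_dim": [8, 16, 32, 64],
}

def stand_in_validation_mae(cfg):
    """Placeholder objective (assumption): replace with Autoencoder training + validation MAE."""
    return (abs(cfg["epochs"] - 150) / 50
            + abs(cfg["batch_size"] - 32) / 32
            + abs(cfg["hidden_dim"] - 16) / 16)

def aco_search(objective, space, n_ants=10, n_iters=30, rho=0.2, seed=0):
    """Return (best configuration, its cost, list of all evaluated costs)."""
    rng = random.Random(seed)
    pheromone = {name: [1.0] * len(values) for name, values in space.items()}
    best_cfg, best_cost, history = None, float("inf"), []
    for _ in range(n_iters):
        trails = []
        for _ in range(n_ants):
            # Each ant picks one index per hyperparameter, weighted by pheromone.
            picks = {name: rng.choices(range(len(values)), weights=pheromone[name])[0]
                     for name, values in space.items()}
            cfg = {name: space[name][i] for name, i in picks.items()}
            cost = objective(cfg)
            history.append(cost)
            trails.append((picks, cost))
            if cost < best_cost:
                best_cfg, best_cost = cfg, cost
        # Evaporation: old pheromone fades by a factor (1 - rho).
        for name in pheromone:
            pheromone[name] = [(1.0 - rho) * tau for tau in pheromone[name]]
        # Deposit: a lower cost leaves more pheromone on the chosen settings.
        for picks, cost in trails:
            for name, i in picks.items():
                pheromone[name][i] += 1.0 / (1.0 + cost)
    return best_cfg, best_cost, history

best_cfg, best_cost, history = aco_search(stand_in_validation_mae, SEARCH_SPACE)
print(best_cfg, best_cost)
```

Because the objective here is MAE-like, the search optimizes that metric only, which is consistent with the MSE not necessarily improving at the same time.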
E2: The test set was C4, and the training sets were C1 and C6. Figure 8 visualizes the prediction results of each method. As Table 7 shows, the traditional methods (linear Support Vector Regression (SVR), Autoencoder, random forest, and neural networks) performed worse than the SVR–Autoencoder and ACO–SVR–Autoencoder proposed in this study on both MSE and MAE. ACO–SVR–Autoencoder had the lowest MSE and MAE of all methods, demonstrating the best prediction accuracy in this experiment.
E3: The test set was C6, and the training sets were C1 and C4. Figure 9 visualizes the prediction results of each method. The proposed ACO–SVR–Autoencoder method demonstrated higher prediction accuracy than conventional approaches, including linear Support Vector Regression (SVR), Autoencoder, random forest, and neural networks (Table 8). The baseline SVR–Autoencoder framework achieved comparable performance metrics (MSE: 96.50; MAE: 8.11), but ACO–SVR–Autoencoder obtained the lowest prediction errors, with an MSE of 96.412 and an MAE of 7.75. The MSE is thus essentially unchanged, and the gain is concentrated in the MAE. This improvement originated primarily from the ACO algorithm’s tuning of critical parameters in the Autoencoder architecture. The experimental results support the effectiveness of the proposed approach in enhancing prediction accuracy.
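The three experiments follow a leave-one-cutter-out scheme over the cutters C1, C4, and C6. A minimal sketch of how the splits can be enumerated is given below; the function name is hypothetical.

```python
CUTTERS = ["C1", "C4", "C6"]

def leave_one_out_folds(cutters):
    """Return (experiment name, training cutters, held-out test cutter) triples."""
    return [
        (f"E{i}", [c for c in cutters if c != held_out], held_out)
        for i, held_out in enumerate(cutters, start=1)
    ]

folds = leave_one_out_folds(CUTTERS)
for name, train_cutters, test_cutter in folds:
    print(f"{name}: train on {train_cutters}, test on {test_cutter}")
```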
In this study, we employed two distinct models: Support Vector Regression (SVR) was first used to model the linear components of the data, followed by an Autoencoder model to capture the non-linear patterns in the SVR residuals. This hybrid framework integrates both linear and non-linear data characteristics. Additionally, the ACO algorithm was applied to adaptively tune the Autoencoder parameters based on the dataset’s intrinsic properties. Together, these steps improved the predictive accuracy of the integrated framework.
This study validated the effectiveness of the Ant Colony Optimization (ACO) algorithm through three sets of comparative experiments, using Mean Squared Error (MSE) and Mean Absolute Error (MAE) as evaluation metrics. Relative to the SVR–Autoencoder baseline, the MSE increased in one experimental group (E1), whereas the MAE decreased in all three groups. This indicates that the ACO algorithm is effective in practice, at least with respect to the MAE. The findings suggest that the ACO algorithm can adaptively optimize model parameters for different datasets, providing insights for applications in other domains.
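The size of these changes can be expressed as a relative reduction, (reference − proposed) / reference. The sketch below applies this to the SVR–Autoencoder and ACO–SVR–Autoencoder values quoted in the text for E1 and E3. The helper name is hypothetical, and only figures stated above are used.

```python
def relative_reduction(reference, proposed):
    """Percentage by which `proposed` is lower than `reference` (negative if it is higher)."""
    return 100.0 * (reference - proposed) / reference

# (SVR–Autoencoder, ACO–SVR–Autoencoder) values quoted in the text.
e1_mse = relative_reduction(83.87, 88.75)   # E1 MSE: negative, i.e. the MSE rose
e3_mse = relative_reduction(96.50, 96.412)  # E3 MSE
e3_mae = relative_reduction(8.11, 7.75)     # E3 MAE

print(f"E1 MSE: {e1_mse:.2f}%  E3 MSE: {e3_mse:.2f}%  E3 MAE: {e3_mae:.2f}%")
```

With these figures the E1 MSE rises by roughly 5.8%, the E3 MSE falls by under 0.1%, and the E3 MAE falls by about 4.4%.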
5. Conclusions
To address accuracy limitations in tool wear process modeling and remaining useful life (RUL) estimation, this study proposes a two-stage ACO–SVR–AE framework that integrates linear regression and nonlinear error compensation, and verifies its effectiveness through three groups of comparative experiments. The two prediction stages, together with a parameter optimization step, are as follows:
(1) Linear Modeling: A baseline Support Vector Regression (SVR) model predicts tool wear trends, generating preliminary results and residual error distributions.
(2) Nonlinear Compensation: An Autoencoder (AE) is employed to learn and model the nonlinear error patterns, thereby compensating for the residuals in SVR predictions and forming the integrated SVR–AE hybrid model.
(3) Parameter Optimization: The Ant Colony Optimization (ACO) algorithm adaptively tunes three critical AE hyperparameters (the number of training epochs, batch size, and hidden layer dimensions), yielding the ACO–SVR–AE model.
Experimental results on the PHM2010 dataset demonstrate the feasibility of the proposed method. Compared to standalone Support Vector Regression (SVR) and Autoencoder (AE) models, the framework achieves significant average reductions of 26.1% in Mean Squared Error (MSE) and 14.5% in Mean Absolute Error (MAE). When benchmarked against traditional approaches such as random forest and neural networks, the improvements are even more pronounced, with a 32.3% lower MSE and a 25.3% lower MAE. By synergistically integrating linear modeling and nonlinear error compensation, this method offers an innovative solution for predictive tasks in complex industrial systems.
This optimized two-stage framework provides an effective solution for practical prediction challenges, particularly in scenarios requiring both linear pattern capture and non-linear residual correction.
The proposed algorithm’s performance may degrade in scenarios with significant variations in working conditions (e.g., cutting speed, feed rate, or depth of cut). If training and testing data derive from distinct parameter settings, the statistical distributions of vibration/acoustic emission signals may diverge substantially. As the model is trained on condition-specific signal patterns, such divergence would require retraining with new data to ensure accuracy in altered environments.
Future work will focus on developing systematic prediction and optimization models for the cutting process, explicitly accounting for variations in working conditions and cutting parameters.