*Article* **Tool Wear Monitoring for Complex Part Milling Based on Deep Learning**

#### **Xiaodong Zhang 1, Ce Han 1, Ming Luo 1,2,\* and Dinghua Zhang 1,2**


Received: 1 September 2020; Accepted: 29 September 2020; Published: 2 October 2020

**Abstract:** Tool wear monitoring is necessary for cost reduction and productivity improvement in the machining industry. Machine learning has been proven to be an effective means of tool wear monitoring. Feature engineering is the core of the machine learning model. In complex part milling, cutting conditions are time-varying due to the variable engagement between the cutting tool and the complex geometric features of the workpiece. In such cases, the features for accurate tool wear monitoring are tricky to select. Besides, usually few sensors are available in an actual machining situation. This causes a high correlation between the hand-designed features, leading to low accuracy and weak generalization ability of the machine learning model. This paper presents a tool wear monitoring method for complex part milling based on deep learning. The features are pre-selected based on a cutting force model and wavelet packet decomposition. The pre-selected cutting force, cutting vibration and cutting condition features are input to a deep autoencoder for dimension reduction. Then, a deep multi-layer perceptron is developed to estimate the tool wear. The dataset is obtained with a carefully designed varying cutting depth milling experiment. The proposed method achieves an average error of 8.2% on the testing samples, which is a clear advantage over the classic machine learning method used as the baseline.

**Keywords:** tool wear monitoring; milling; complex part; deep learning; autoencoder; deep multi-layer perceptron

#### **1. Introduction**

Tool wear is a cost driver in machining that affects quality and productivity and adds unscheduled downtime for tool changes and the reworking of damaged parts. Accurate tool wear monitoring is necessary to avoid these unnecessary costs. Tool wear monitoring methods can be categorized into two main groups: direct and indirect methods [1]. Direct methods measure the actual wear value with optical, laser or ultrasonic devices. Although direct methods measure tool wear precisely, they are difficult to implement in real-time machining because, in most cases, the tool wear area is unreachable due to occlusion by the workpiece structure and flood coolant. Indirect methods monitor in-process physical parameters to evaluate the wear state, such as force, vibration, acoustic emission, current, power and temperature signals [2]. Nowadays, indirect methods are the most widely used in tool wear monitoring because they are easy to conduct in real time and can obtain acceptable accuracy with proper monitoring signals and modeling methods.

Indirect tool wear monitoring methods can be divided into two categories: physical-based and data-driven methods. Physical-based methods first develop a hand-designed physical model, i.e., the mathematical relationship between tool wear and measurable physical quantities derived from the mechanism of machining, and then estimate the tool wear value through monitoring signals based on the model. Choudhury and Rath [3] proposed an approach for evaluating milling tool flank wear based on the relationship between average tangential cutting force coefficients and tool wear. Cui [4] discussed the influences of process parameters, tool parameters, and tool wear on tangential cutting force coefficients in his dissertation. A recognition approach for milling tool wear was proposed, based on the relationship between tangential cutting force coefficients and tool wear. This approach needed to solve the cutting force coefficients from the actual cutting forces, and then the tool wear was recognized through the solved tangential cutting force coefficients. The complex calculations meant that the recognition speed was low, and the result was not precise enough. Shao et al. [5] established a cutting power model in face milling, which included the cutting conditions and the tool flank wear, and then proposed a tool wear monitoring approach based on this power model. Hou et al. [6] developed the relationship between flank wear and average milling force based on the stress distribution in the tool wear zone and applied this relationship to estimate the flank wear width in the milling process. Han et al. [7,8] proposed a mechanistic cutting force model considering various wear types for the drilling of difficult-to-cut materials, which can be used for tool wear monitoring and process parameter optimization when multiple wear types exist on different cutting edges.

Due to the complexity of tool wear mechanism and randomness of the machining process, the physical-based monitoring methods are faced with the problem of low accuracy and weak universality [9,10]. In recent years, many data-driven tool wear monitoring methods have been proposed based on machine learning such as fuzzy logic, artificial neural network (ANN), support vector machine (SVM), and Bayesian networks. Yu [11] used logistic regression with penalization and manifold regularization for tool condition monitoring. Kilickap et al. [12] used ANN with the inputs of cutting speed, feed rate and depth of cut for tool wear prediction in the milling of Ti-6242S. Patra et al. [13] used ANN with thrust force signals for tool wear prediction in micro-drilling. Karam et al. [14] built an ANN-based cognitive decision-making system to extract signal features for online tool life prediction. In other studies, the ANNs with different inputs and architectures are employed to predict tool wear [15–17]. Madhusudana et al. [18] used SVM with the features extracted from discrete wavelet transformation in sound signals for tool condition monitoring in face milling. Benkedjouh et al. [19] used support vector regression (SVR) with the features extracted from multi-sensor signals to predict tool life. Zhang and Zhang [20] used a least-square SVM to develop a nonlinear regression model for tool wear prediction in the milling process. Yu et al. [21] proposed a weighted hidden Markov model for tool life prediction. Zhu and Liu [22] proposed a hidden semi-Markov model with dependent durations through cutting force signals for tool wear monitoring. Tobon-Mejia et al. [23] established a dynamic Bayesian network for tool condition monitoring and remaining useful life estimation. Kong et al. [24] proposed a Gaussian process regression model for tool wear prediction. Wu et al. 
[25] made a comparative study of different machine learning methods, including ANN, SVR and random forest, for tool wear monitoring. Liu et al. [26] built an Elman\_Adaboost predictor with Elman neural networks for milling tool wear assessment, using several statistical features selected from multi-sensor data including spindle current, force, vibration and acoustic emission.

The accuracy and generalization ability of the above-mentioned conventional feature-based machine learning models are highly affected by the quality of the hand-designed features [27]. However, feature selection is problem-dependent and somewhat subjective in practice. Different from feature-based machine learning methods, deep learning can achieve adaptive feature learning, which helps to improve the adaptability of prediction methods. Moreover, layer-by-layer feature learning in a deep network is more likely to learn essential features hidden in the monitoring data and thus to improve prediction accuracy [28]. Common deep learning methods include the deep multi-layer perceptron (DMLP), deep autoencoder (DAE), convolutional neural network (CNN) and long short-term memory (LSTM) network. Serin et al. [29] used DMLP neural networks to predict surface roughness and specific energy consumption during 5-axis milling. Ou et al. [30] proposed an online sequential extreme learning machine for tool wear state recognition, with a stacked denoising autoencoder (SDAE) put forward to extract abstract features. Cao et al. [31] proposed a 2-D CNN for milling tool wear monitoring, with the spectrum of the high signal-to-noise ratio vibration signals obtained from the derived wavelet frames as input features. Aghazadeh et al. [32] employed a CNN with a hybrid feature extraction method using wavelet time-frequency transformation and spectral subtraction algorithms for tool wear estimation. Martínez-Arellano et al. [33] built a CNN model with a time series imaging technique to transform the input raw signals. Sun et al. [34] designed an LSTM network to predict multiple flank wear values using raw signals of cutting force, vibration and acoustic emission. Zhao et al. [35] used convolutional bi-directional LSTM networks for tool condition monitoring in the milling process.

From the above, most existing deep learning methods use raw signals without feature pre-selection or hand-design, in order to maximize the merits of deep learning. In the actual machining process, cutting conditions are strongly time-varying, especially for the milling of parts with complex geometry, for which the allowance and cutting depth change drastically along the whole tool path [36]. In most cases, only a few types of monitoring signals can be acquired in the actual machining environment, and the input features from these signals are probably highly correlated, leading to low prediction accuracy and poor generalization ability in time-varying cutting conditions. Besides, a model that takes the raw signal as input has weak interpretability, and it is difficult to analyze the source of its error.

To solve this problem, a tool wear monitoring method for complex part milling based on deep learning is proposed in this paper. In contrast to the existing approaches, the features are pre-selected with a cutting force model and wavelet packet decomposition. The cutting depth, cutting forces, cutting force coefficients and the energy of the cutting vibration are selected as the input parameters, and the output is the flank wear. Then, a deep autoencoder is employed to extract the highly correlated features from the pre-selected features, followed by a deep neural network to predict the wear value. The dataset for training and testing of the deep learning model is obtained with a carefully designed varying depth milling experiment. The testing results demonstrate the effectiveness of the proposed method.

The remainder of this paper is organized as follows. The proposed monitoring method based on deep learning is introduced in Section 2. The experiment is conducted in Section 3. The application and results of the proposed method are shown and discussed in Section 4. The conclusions are summarized in Section 5.

#### **2. Method**

#### *2.1. Overall Monitoring Method*

Tool wear monitoring usually uses force, vibration, acoustic emission, current, power and temperature signals. In this study, to reflect the limited number of sensors available in a real machining environment, cutting force is selected as the single signal used for tool wear monitoring. In complex part milling, the cutting depth usually varies with the profile of the part. In this case, the cutting force and stability are time-varying, and it is difficult to design features for accurate tool wear estimation. To address this problem, a feature extractor based on pre-selection and deep learning is developed. The hand-designed features from the cutting force signal are further extracted with a deep autoencoder and then input to the deep multi-layer perceptron to estimate the tool wear value. The framework of the proposed tool wear monitoring method is illustrated in Figure 1.

#### *2.2. Feature Pre-Selection*

Nine features are pre-selected from the cutting force signals as the input of the deep learning model: the radial cutting depth, the magnitudes of the two cutting force components in the tangential and radial directions, four cutting force coefficients, and the two cutting vibration features in the tangential and radial directions.

**Figure 1.** Framework of tool wear monitoring method for complex part milling based on deep learning.

#### 2.2.1. Cutting Force Features

Cutting force is the most sensitive quantity reflecting the changes in tool wear. The magnitudes of the cutting forces are therefore selected as features for tool wear monitoring. Cutting force coefficients can also be used as cutting-condition-independent features for tool wear monitoring in complex part milling. Thus, the cutting force coefficients are also selected as features.

The cutting forces in the milling process are illustrated in Figure 2. The cutting force model can be expressed as

$$\begin{cases} F\_{\rm t}(\phi) = K\_{\rm tc} b h(\phi) + K\_{\rm te} b \\ F\_{\rm r}(\phi) = K\_{\rm rc} b h(\phi) + K\_{\rm re} b \\ F\_{\rm a}(\phi) = K\_{\rm ac} b h(\phi) + K\_{\rm ae} b \end{cases} \tag{1}$$

where *F*<sub>t</sub>, *F*<sub>r</sub> and *F*<sub>a</sub> are the tangential, radial and axial cutting forces, *K*<sub>tc</sub>, *K*<sub>rc</sub> and *K*<sub>ac</sub> are the cutting force coefficients contributed by the shearing action in the tangential, radial and axial directions, *K*<sub>te</sub>, *K*<sub>re</sub> and *K*<sub>ae</sub> are the edge constants, φ is the instantaneous angle of immersion, *b* is the edge contact length, and *h* is the uncut chip thickness, given by

$$h = f\_z \sin \phi \tag{2}$$

where *fz* is feedrate per tooth. According to Equations (1) and (2), the cutting force has a linear relationship with the uncut chip thickness *h*. When the spindle speed and feedrate are fixed, the uncut chip thickness *h* changes with the sine of the instantaneous angle of immersion φ. The linear function can be rewritten as

$$\begin{cases} F\_{\rm t}/b = K\_{\rm tc} f\_z \sin \phi + K\_{\rm te} \\ F\_{\rm r}/b = K\_{\rm rc} f\_z \sin \phi + K\_{\rm re} \\ F\_{\rm a}/b = K\_{\rm ac} f\_z \sin \phi + K\_{\rm ae} \end{cases} \tag{3}$$

As per Equation (3), the cutting force coefficients can be calibrated by linear regression of the cutting force value against sin φ. Letting the slope and intercept of the fitted line be *P*<sub>*i*</sub> and *Q*<sub>*i*</sub> (*i* = t, r, a), respectively, the cutting force coefficients can be obtained as

$$\begin{cases} K\_{i{\rm c}} = P\_i / f\_z \\ K\_{i{\rm e}} = Q\_i \end{cases} \tag{4}$$

In complex part milling, the part is usually cut layer by layer in the axial direction. The tangential and radial forces vary significantly compared with the axial cutting force. Hence, in this study the average tangential and radial cutting forces *F*<sub>t</sub>, *F*<sub>r</sub>, and the tangential and radial cutting force coefficients *K*<sub>tc</sub>, *K*<sub>rc</sub>, *K*<sub>te</sub>, *K*<sub>re</sub> are selected as the input features.
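The calibration described by Equations (3) and (4) can be illustrated with a minimal sketch. It assumes that force samples and the corresponding immersion angles are already available for one direction; the function names and the numerical values are hypothetical and are not taken from the experiment.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def calibrate_coefficients(phis, forces, b, f_z):
    """Return (K_c, K_e) for one force direction, following Equations (3)-(4).

    phis   : immersion angles in radians
    forces : force samples in that direction
    b      : edge contact length
    f_z    : feed per tooth
    """
    xs = [math.sin(phi) for phi in phis]
    ys = [force / b for force in forces]
    p, q = fit_line(xs, ys)   # slope P_i and intercept Q_i
    return p / f_z, q         # K_ic = P_i / f_z, K_ie = Q_i

# Synthetic check with made-up values (not experimental data)
K_TC, K_TE, B, F_Z = 2000.0, 30.0, 1.0, 0.05
phis = [math.radians(a) for a in range(10, 91, 10)]
forces = [K_TC * B * F_Z * math.sin(p) + K_TE * B for p in phis]
k_tc, k_te = calibrate_coefficients(phis, forces, B, F_Z)
```

Because the synthetic forces are generated from the model itself, the fit should recover the assumed coefficients; with measured forces the residual of the fit would also indicate how well the linear model holds.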

**Figure 2.** Interaction of tool and workpiece in milling process.

#### 2.2.2. Cutting Vibration Features

Cutting vibration has high correlation with tool wear. In this study, the cutting vibration features are extracted from cutting force signals for tool wear monitoring. To this end, a three-layer wavelet packet decomposition is used to decompose the cutting force fluctuation components from the original cutting force signal, as shown in Figure 3.

**Figure 3.** Structure of the three-layer wavelet packet decomposition for reconstruction of cutting force signals.

Wavelet packet decomposition (WPD) is a time-frequency analysis method. In WPD, an orthonormalized scaling function ψ(*x*) is used, and with the two-scale difference equations, the orthogonal wavelet packet is generated recursively as

$$\begin{cases} w\_{2n}(x) = \sqrt{2} \sum\_{k \in \mathbb{Z}} h\_k w\_n(2x - k) \\ w\_{2n+1}(x) = \sqrt{2} \sum\_{k \in \mathbb{Z}} g\_k w\_n(2x - k) \end{cases} \tag{5}$$

where *w*<sub>0</sub>(*x*) = ψ(*x*), and *h*<sub>*k*</sub> and *g*<sub>*k*</sub> are a pair of conjugate quadrature filter coefficients derived from ψ(*x*). For a signal *s*(*t*), the discrete orthogonal WPD is defined as the projection coefficient of *s*(*t*) on the orthogonal wavelet packet base, which can be expressed as

$$P\_s(n,j,k) = \left\langle s(t), w\_{n,j,k}(t) \right\rangle = \int\_{-\infty}^{+\infty} s(t) \left[ 2^{-j/2} w\_n\left(2^{-j}t - k\right) \right] dt \tag{6}$$

where {*P*<sub>*s*</sub>(*n*, *j*, *k*)} is the WPD coefficient sequence of *s*(*t*) on the orthogonal wavelet packet space *U*<sub>*j*</sub><sup>*n*</sup>. By setting a group of conjugate quadrature filter coefficients {*h*<sub>*k*</sub>} and {*g*<sub>*k*</sub>}, the wavelet packet transform coefficients can be expressed as

$$\begin{cases} P\_s(2n,j,k) = \sum\_{l \in \mathbb{Z}} h\_{l-2k} P\_s(n,j-1,l) \\ P\_s(2n+1,j,k) = \sum\_{l \in \mathbb{Z}} g\_{l-2k} P\_s(n,j-1,l) \end{cases} \tag{7}$$

Then, the energy distribution of the reconstructed signal in the time-frequency domain is expressed as

$$E(j, n) = \sum\_{k \in \mathbb{Z}} \left[ P\_s(n, j, k) \right]^2 \tag{8}$$

In this study, the vibration-induced force component reconstructed with WPD is adopted as the input feature for tool wear monitoring.
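The recursion of Equation (7) and the band energy of Equation (8) can be sketched for a three-level decomposition as follows. For brevity, the sketch assumes the two-tap Haar filter pair rather than the wavelet used in this study, and the test signal is made up; the function names are hypothetical.

```python
import math

def haar_split(signal):
    """One analysis step with the Haar filter pair: (low-pass, high-pass) halves."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    high = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    return low, high

def wavelet_packet_energies(signal, levels=3):
    """Split `signal` into 2**levels packet nodes and return the energy of each.

    Every node of the previous level is filtered again (Equation (7)); the
    energy of a node is the sum of its squared coefficients (Equation (8)).
    Nodes come out in natural filter-bank order, which differs from
    increasing-frequency order.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]

# Toy signal: a slow component plus a faster ripple (made-up values)
signal = [math.sin(2 * math.pi * 2 * t / 64) + 0.3 * math.sin(2 * math.pi * 20 * t / 64)
          for t in range(64)]
energies = wavelet_packet_energies(signal, levels=3)
```

Since the Haar packet basis is orthonormal, the eight band energies sum to the energy of the original signal, which is a convenient sanity check for any implementation of the decomposition.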

#### *2.3. Deep Learning for Tool Wear Monitoring*

#### 2.3.1. Structure of the Deep Learning Network

The deep learning network for tool wear monitoring in variable cutting depth milling is shown in Figure 4. It consists of two parts, a deep autoencoder and a deep multi-layer perceptron. The deep autoencoder is used to reduce the dimension of the input vectors (the pre-selected features) and learn the highly correlated features. Then, the following deep multi-layer perceptron is used to learn the effect of the extracted features on the tool wear and predict the tool wear value.

#### 2.3.2. Deep Autoencoder

The pre-selected features are all obtained from cutting force signals. They are highly coupled and superfluous for tool wear prediction. This affects the accuracy and generalization ability of the deep learning model. In this study, the deep autoencoder (DAE) is used to learn, from the original features, low-dimensional abstract features that have a high correlation with the tool wear. The structure of the DAE is illustrated in Figure 5. A deep autoencoder is an unsupervised deep learning network for dimensionality reduction and feature extraction. It has a symmetric structure of multiple layers with the same input and output data. It consists of an encoder and a decoder. The transformation from the input layer to the middle hidden layer is called encoding. This is a dimensionality reduction process, which transforms the high-dimensional original data into a low-dimensional space. The transformation from the middle hidden layer to the output layer is called decoding. This is the reconstruction process, the inverse of encoding, which reconstructs the encoded vector back to the original input data. In the encoding process, the obtained middle hidden layer reflects the essential features of the high-dimensional input data and is the core of the DAE. In this study, a six-layer DAE is developed with three layers for both the encoder and decoder.

**Figure 4.** Architecture of deep learning network for tool wear prediction combining autoencoder and multi-layer perceptron.

**Figure 5.** Symmetrical structure of deep autoencoder consisting of three-layer encoder and three-layer decoder.

The encoding process can be expressed as

$$\mathbf{h} = f\_{\boldsymbol{\theta}}(\mathbf{x}) = \mathbf{s}\_f(\mathbf{W}\mathbf{x} + \mathbf{b}) \tag{9}$$

The decoding process can be expressed as

$$\hat{\mathbf{x}} = g\_{\theta'}(\mathbf{h}) = s\_g\left(\mathbf{W}'\mathbf{h} + \mathbf{b}'\right) \tag{10}$$

where *x* = [*x*<sub>1</sub>, *x*<sub>2</sub>, ···, *x*<sub>*n*</sub>]<sup>T</sup> is the input vector, *h* and *x̂* = [*x̂*<sub>1</sub>, *x̂*<sub>2</sub>, ···, *x̂*<sub>*n*</sub>]<sup>T</sup> are the encoded and decoded vectors, *f*<sub>θ</sub> and *g*<sub>θ′</sub> are the encoding and decoding functions, *W* is the weight matrix from the input layer to the hidden layer, *W*′ is the weight matrix from the hidden layer to the output layer, *b* is the bias vector of the hidden layer, *b*′ is the bias vector of the output layer, θ is the set of parameters in the encoding function, θ′ is the set of parameters in the decoding function, and *s*<sub>*f*</sub> and *s*<sub>*g*</sub> are the activation functions of the encoder and decoder, which are both selected as the sigmoid function.

In the training process, to measure the error between the input *x* and the reconstructed input *x̂*, the squared Euclidean distance is used as the loss function for this regression problem, which can be expressed as

$$\text{Loss}(\mathbf{W}, \mathbf{b}, \mathbf{W}', \mathbf{b}') = \sum\_{i=1}^{n} \left\| \mathbf{x}\_{i} - \hat{\mathbf{x}}\_{i} \right\|^{2} \tag{11}$$

After the DAE has been trained, the encoder (the first three layers) is employed as a feature extractor in front of the deep learning model.
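The forward pass of Equations (9) and (10) and the reconstruction loss of Equation (11) can be sketched as below. This is an untrained illustration only: the layer widths (9, 7, 5, 3), the random initialization and the toy inputs are assumptions, and the helper names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b), with W stored row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_autoencoder(sizes, rng):
    """sizes = [input, ..., code]; the decoder mirrors the encoder."""
    encoder = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    mirrored = sizes[::-1]
    decoder = [init_layer(a, b, rng) for a, b in zip(mirrored, mirrored[1:])]
    return encoder, decoder

def forward(x, layers):
    for weights, bias in layers:
        x = dense(x, weights, bias, sigmoid)
    return x

def reconstruction_loss(samples, encoder, decoder):
    """Equation (11): sum of squared Euclidean distances over the samples."""
    total = 0.0
    for x in samples:
        x_hat = forward(forward(x, encoder), decoder)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total

rng = random.Random(0)
encoder, decoder = build_autoencoder([9, 7, 5, 3], rng)          # assumed widths
samples = [[rng.random() for _ in range(9)] for _ in range(4)]   # normalized toy inputs
code = forward(samples[0], encoder)    # low-dimensional features
loss = reconstruction_loss(samples, encoder, decoder)
```

After training, only `encoder` would be kept, and its output `code` would play the role of the extracted features passed on to the predictor.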

#### 2.3.3. Deep Multi-Layer Perceptron

The deep multi-layer perceptron is used to predict tool wear with the input of the extracted features from the DAE. The deep multi-layer perceptron consists of an input layer, a hidden layer and an output layer. The input layer and hidden layer are followed by a dropout layer. The number of neurons in each layer is tuned in the training process. The Rectified Linear Unit (ReLU) is used as the activation function, which can be expressed as

$$f(\mathbf{x}) = \max(0, \mathbf{x}) \tag{12}$$
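A minimal sketch of the inference-time forward pass of such a perceptron is given below. It assumes a single linear output neuron for the regression target and uses hand-picked toy weights; dropout is inactive at inference and is therefore omitted. The names `predict_wear`, `hidden` and `output` are hypothetical.

```python
def relu(z):
    """Equation (12): rectified linear unit."""
    return max(0.0, z)

def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b), with W stored row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def predict_wear(features, hidden, output):
    """Hidden ReLU layer followed by one linear output neuron (assumed)."""
    h = dense(features, hidden[0], hidden[1], relu)
    return dense(h, output[0], output[1], lambda z: z)[0]

# Toy weights: 3 extracted features -> 2 hidden neurons -> 1 output
hidden = ([[1.0, -1.0, 0.5], [-2.0, 0.0, 1.0]], [0.0, 0.5])
output = ([[10.0, 20.0]], [5.0])
wear = predict_wear([0.2, 0.4, 0.6], hidden, output)
```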

#### 2.3.4. Loss Function

The tool wear prediction is a regression problem. The target of network training is to minimize the error between the predicted and measured wear values. In this study, the mean squared error (*MSE*) is used to evaluate the prediction error, which can be expressed as

$$MSE = \frac{1}{n} \sum\_{i=1}^{n} \left\| y\_i - \hat{y}\_i \right\|^2 \tag{13}$$

where *y* = [*y*<sub>1</sub>, *y*<sub>2</sub>, ···, *y*<sub>*n*</sub>]<sup>T</sup> is the measured output vector, which refers to the tool wear value, *ŷ* = [*ŷ*<sub>1</sub>, *ŷ*<sub>2</sub>, ···, *ŷ*<sub>*n*</sub>]<sup>T</sup> is the predicted output vector, and *n* is the number of samples.

#### 2.3.5. Regularization

Regularization is an effective technique used to enhance the generalization ability of a deep neural network. L1 and L2 regularization are commonly used methods. They add the L1 norm or the L2 norm of the weights to the loss function in order to constrain the weights of the neural network. In this way, large weight values are penalized in the training process. In this study, to be consistent with the squared-error form of the MSE loss, L2 regularization is adopted. Then, the final loss function of the deep neural network can be expressed as

$$\text{Loss} = \frac{1}{n} \sum\_{i=1}^{n} \left\| y\_i - \hat{y}\_i \right\|^2 + \frac{\lambda}{2} \left\| \mathbf{w} \right\|^2 \tag{14}$$

where ‖*w*‖ is the L2 norm of the weight vector *w*, and λ is the regularization parameter. The regularization parameter is tuned in the training process.
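As a small worked illustration of Equation (14), the regularized loss can be evaluated directly. The wear values, weights and λ below are made-up numbers chosen so that the two terms are easy to check by hand.

```python
def regularized_loss(y_true, y_pred, weights, lam):
    """Equation (14): MSE plus (lambda / 2) times the squared L2 norm of the weights."""
    n = len(y_true)
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n
    penalty = 0.5 * lam * sum(w * w for w in weights)
    return mse + penalty

# MSE = (10**2 + 10**2) / 2 = 100; penalty = 0.5 * 0.01 * (3**2 + 4**2) = 0.125
loss = regularized_loss([100.0, 200.0], [110.0, 190.0], [3.0, 4.0], lam=0.01)
```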

Another regularization approach is Dropout. In the Dropout method, neurons in the network are randomly dropped in the training process, as illustrated in Figure 6. This can significantly reduce the interaction between the feature detectors (the hidden layer nodes). Detector interaction means that, in a deep network, some detectors only take effect in combination with other detectors. In this study, the dropout layers are inserted after the input and hidden layers to diminish overfitting in the training process. The dropout rate is set as 0.2, which means each neuron has a 20% probability of being dropped in each training step.
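The effect of a dropout rate of 0.2 can be sketched as follows. The sketch uses the common "inverted dropout" formulation, in which surviving activations are rescaled by 1/(1 − rate) during training so that no rescaling is needed at inference; whether this matches every detail of the implementation used in this study is an assumption.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors by 1 / (1 - rate); pass through at inference."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(42)
hidden = [1.0] * 10000
dropped = dropout(hidden, rate=0.2, rng=rng)
fraction_zero = dropped.count(0.0) / len(dropped)   # close to the dropout rate
```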

**Figure 6.** Schematic representation of dropout trick in a fully connected network.

#### **3. Experiment**

To acquire the training and testing dataset for the deep learning model, a milling experiment is performed. The experimental setup is shown in Figure 7. The experiment is carried out on a YHVT850Z three-axis machining center. The work material is GH4169. Up milling is used with a 12 mm diameter indexable carbide insert tool APMT113PDER-H2. To control random factors in the machining process, the milling process is performed without coolant. The cutting force is measured with a Kistler 9123C rotating dynamometer. The flank wear of the cutting tool is measured with an Alicona InfiniteFocus optical measurement device.

**Figure 7.** Experimental setup: (**a**) cutting force online measurement system, (**b**) layout of cutting tool and workpiece.

To simulate the actual milling process with time-varying cutting depth, the milling process is designed with variable radial cutting depth. The tool path is illustrated in Figure 8. A total of 12 cutting passes are performed. For each pass, the radial cutting depth varies from 0.5 to 1.5 mm. Tool wear increases gradually with the cutting length. Each cutting pass is divided into 10 segments. The cutting process is paused after each segment, and the tool wear is measured offline. The average cutting forces in each segment are recorded as the input features of tool wear monitoring. The cutting parameters are listed in Table 1.

**Figure 8.** Tool path of varying depth milling: (**a**) tool path design, (**b**) tool path, (**c**) cutting process simulation.


**Table 1.** Cutting parameters in milling experiment.

#### **4. Results and Discussion**

#### *4.1. Features Extracted from Cutting Force Signals*

The raw cutting force signal in a single segment is shown in Figure 9, which shows that the magnitude of the force increases with the cutting depth. The average cutting force and the cutting force coefficients are selected as the input cutting force features of the deep learning model for tool wear monitoring. The average value of the cutting force in each segment is calculated, and the cutting force coefficients are calibrated by applying linear regression as per Equations (3) and (4).

**Figure 9.** Raw signal of cutting force in a single segment.

By reconstructing the cutting force signal with the WPD, the cutting force components caused by cutting vibration are obtained. The features related to tool wear are extracted from these two kinds of signals. With the three-layer WPD, the cutting force signals in the 0~1.5 kHz range are decomposed into eight frequency bands. By using 8 db wavelet packet decomposition, the width of each frequency band is 150 Hz. Taking the cutting force signal in three cycles as an example, the reconstructed signals in each band after WPD are shown in Figure 10.

From Figure 10, the reconstructed signals in the different frequency bands can be analyzed intuitively. According to the cutting force model in Equation (1), the signal in the 0~150 Hz frequency band is related to the static cutting force component; the signals in the higher frequency bands can be regarded as the dynamic components caused by cutting vibration and noise. From the results of WPD, the tangential force signal in 150~300 Hz is related to cutting vibration, and the reconstructed force is denoted as *V*<sub>t</sub>. The radial force signal in 450~600 Hz is related to cutting vibration, and the reconstructed force is denoted as *V*<sub>r</sub>. These two parameters are selected as the input cutting vibration features of the deep learning model for tool wear monitoring.

**Figure 10.** Reconstructed cutting force signals in each frequency band with wavelet packet decomposition.

#### *4.2. Training and Validation of Deep Neural Network*

After the feature pre-selection and measurement described above, the dataset for deep learning is given in Appendix A. There are a total of 120 samples (acquired from 10 segments in each of the 12 cutting passes). The Keras deep learning library is employed with TensorFlow as the back-end to implement the proposed deep learning networks. The dataset is randomly split into an 80% training set (96 samples) and a 20% testing set (24 samples). Before training, all the input parameters are normalized. Then, the grid search technique is employed to select the optimal hyper-parameters, including epochs, neuron numbers, optimizer, learning rate and regularization parameter. The obtained optimal hyper-parameters are given in Table 2.

The training process is carried out with the deep learning model proposed in Section 2 and the above optimal hyper-parameters. In the training process, four-fold validation is used, with 24 samples in each fold. The mean absolute error averaged over all folds is recorded as the training error.
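The data partition described above can be sketched with index bookkeeping alone. The shuffling seed and the function name are hypothetical; only the sample counts (120 samples, 96 for training, 24 for testing, four folds of 24) come from the text.

```python
import random

def split_and_fold(n_samples, test_fraction=0.2, n_folds=4, seed=0):
    """Shuffle sample indices, hold out a test set, and cut the rest into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    fold_size = len(train) // n_folds
    folds = [train[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    return train, test, folds

train, test, folds = split_and_fold(120)
# For each fold: validate on that fold and train on the remaining three.
splits = [([i for f in folds if f is not fold for i in f], fold) for fold in folds]
```

Each entry of `splits` pairs 72 training indices with 24 validation indices; the held-out `test` indices are not touched until the final evaluation.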

The training curve is shown in Figure 11. The final mean absolute error after 1000 epochs is 34.93 μm, and the corresponding *R*<sup>2</sup> value is 0.9826. This indicates that the trained deep learning model fits the training data accurately.
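For reference, the two metrics quoted above can be computed as in the following sketch. It assumes that *R*<sup>2</sup> denotes the usual coefficient of determination, which the text does not state explicitly; the wear values are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

measured = [50.0, 150.0, 250.0, 350.0]    # toy flank wear values in micrometers
predicted = [60.0, 140.0, 260.0, 340.0]
mae = mean_absolute_error(measured, predicted)   # 10 micrometers
r2 = r_squared(measured, predicted)              # 1 - 400 / 50000
```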


**Table 2.** Optimal hyper-parameters of deep neural network obtained by grid search.

**Figure 11.** Mean absolute error of proposed deep learning model in training process.

The mean absolute error of the trained network in the four folds of the validation set is shown in Figure 12. The prediction error of the tool wear value is less than 37 μm, which is sufficient to monitor the tool wear state in a real machining process. This supports the effectiveness of the trained deep learning model.

**Figure 12.** Mean absolute error in four-fold validation set.

#### *4.3. Testing of Deep Neural Network*

The trained deep learning model is used for tool wear prediction on the testing set. The average error in percentage is used to evaluate the performance of the trained model, which is calculated as

$$error = \frac{1}{m} \sum\_{i=1}^{m} \frac{\left| VB\_i^\* - VB\_i \right|}{VB\_i} \tag{15}$$

where *VB*<sub>*i*</sub><sup>∗</sup> is the predicted tool wear value, *VB*<sub>*i*</sub> is the measured tool wear value, and *m* is the total number of the testing samples.
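Equation (15) amounts to a mean relative error, which can be sketched in a few lines. The wear values below are made-up numbers for illustration, not testing samples from this study.

```python
def mean_relative_error(predicted, measured):
    """Equation (15): average of |VB*_i - VB_i| / VB_i over the testing samples."""
    m = len(measured)
    return sum(abs(p - v) / v for p, v in zip(predicted, measured)) / m

# Relative errors of 10%, 10% and 0% average to about 6.7%
error = mean_relative_error([110.0, 180.0, 300.0], [100.0, 200.0, 300.0])
```

Because each term is divided by the measured value, the same absolute deviation weighs more heavily for a nearly new tool than for a heavily worn one.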

The testing result is shown in Figure 13. The average error of the 24 testing samples is 8.2%, as per Equation (15). Among the testing samples, the maximum relative error occurs in test No. 13, where the prediction is 64.8 μm larger than the measured value. In this case, the new tool has just begun to cut, with a measured flank wear of 25.7 μm. The maximum absolute error occurs in test No. 15, where the prediction is 82.3 μm less than the measured value. In this case, the cutting edge is seriously worn and the measured flank wear reaches 402.9 μm, far beyond the ISO tool wear criterion of 300 μm. Except for these two extreme cases, where the actual tool wear value is very small or very large, the trained deep learning model predicts well for all the testing samples. For the extreme cases, the model has a larger prediction error because the training set obtained from the experiment covers few samples with such extreme tool wear values, and thus the model is somewhat under-fitted in these regimes. The proposed deep learning model can be made more accurate with more varied training data obtained through additional experiments. This will be investigated in future research work. In this study, the testing result shows that the proposed model works well in most cases, which indicates the generalization ability of the trained deep learning model.

**Figure 13.** Testing result in 24 samples of the trained deep learning model.

Then, the radial basis function neural network (RBF NN) is implemented on the same dataset as the baseline to compare it with the performance of the proposed method. It turns out that the average error of RBF NN is 13.8%. Compared with the RBF NN, the proposed method shows higher accuracy in the tool wear prediction of varying cutting depth milling.

#### **5. Conclusions**

In this paper, a monitoring method based on deep learning is presented to give a solution to the feature extraction and accurate prediction of tool wear in complex part milling. A deep neural network is developed to estimate tool wear with the features pre-selected based on the cutting force model and wavelet packet decomposition, and further extracted using a deep autoencoder. The conclusions of this paper are summarized as: (1) the feature pre-selection based on the cutting force model and wavelet packet decomposition, coupled with extraction by a deep autoencoder, is able to learn the highly correlated features for tool wear monitoring from a single cutting force signal; (2) the proposed deep learning model with the optimal settings has high accuracy for predicting tool wear; (3) the proposed deep learning method is shown to have higher accuracy than the radial basis function neural network in varying cutting depth milling.

**Author Contributions:** Conceptualization, X.Z. and C.H.; methodology, X.Z. and C.H.; software, C.H.; validation, X.Z. and C.H.; formal analysis, M.L.; investigation, X.Z. and M.L.; resources, M.L. and D.Z.; data curation, M.L.; writing – original draft preparation, X.Z.; writing – review and editing, C.H. and M.L.; visualization, C.H.; supervision, D.Z.; project administration, D.Z.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Natural Science Foundation of China, grant number 91860137, the Fundamental Research Funds for the Central Universities, grant number 31020200504002 and the Shaanxi Key Research and Development Project (Grant No. 2019KW-018).

**Acknowledgments:** Thanks are due to Tao Li for assistance with the experiments and data processing.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


**Table A1.** Experimental data in varying cutting depth milling.


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Applied Sciences* Editorial Office E-mail: applsci@mdpi.com www.mdpi.com/journal/applsci
