Fault Intelligent Diagnosis for Distribution Box in Hot Rolling Based on Depthwise Separable Convolution and Bi-LSTM

Guo, Yonglin; Zhou, Di; Chen, Huimin; Yue, Xiaoli; Cheng, Yuyu

doi:10.3390/pr12091999


by Yonglin Guo ¹, Di Zhou ¹,*, Huimin Chen ¹, Xiaoli Yue ¹ and Yuyu Cheng ²

¹ School of Mechanical Engineering, Donghua University, Shanghai 200051, China

² School of Information Science and Technology, Donghua University, Shanghai 200051, China

* Author to whom correspondence should be addressed.

Processes 2024, 12(9), 1999; https://doi.org/10.3390/pr12091999

Submission received: 14 August 2024 / Revised: 15 September 2024 / Accepted: 16 September 2024 / Published: 17 September 2024

(This article belongs to the Special Issue Industrial IoT-Enabled Modeling and Optimization for the Process Industry)


Abstract:

The finishing mill is a critical link in the hot rolling process, influencing the final product’s quality, and even economic efficiency. The distribution box of the finishing mill plays a vital role in power transmission and distribution. However, harsh operating conditions can frequently lead to distribution box damage and even failure. To diagnose faults in the distribution box promptly, a fault diagnosis network model is constructed in this paper. This model combines depthwise separable convolution and Bi-LSTM. Depthwise separable convolution and Bi-LSTM can extract both spatial and temporal features from signals. This structure enables comprehensive feature extraction and fully utilizes signal information. To verify the diagnostic capability of the model, five types of data are collected and used: the pitting of tooth flank, flat-headed sleeve tooth crack, gear surface crack, gear tooth surface spalling, and normal conditions. The model achieves an accuracy of 97.46% and incorporates a lightweight design, which enhances computational efficiency. Furthermore, the model maintains approximately 90% accuracy under three noise conditions. Based on these results, the proposed model can effectively diagnose faults in the distribution box, and reduce downtime in engineering.

Keywords: fault diagnosis; finishing mill; distribution box; depthwise separable convolution; Bi-LSTM

1. Introduction

As an important component of the hot rolling process, the finishing mill plays a key role in controlling the final quality of products [1]. Because of harsh external environmental conditions and complex internal mechanisms, finishing mills inevitably experience performance degradation and failure [2]. In a finishing mill, the distribution box is responsible for power transmission and distribution in the rolling system. Its variable speed and load conditions also result in a higher probability of failure than in other mechanisms of the finishing mill [3]. Failure to diagnose faults in a timely manner may increase downtime and cause economic losses. To mitigate the economic impact of downtime, it is essential to develop effective fault diagnosis methods for distribution boxes. The difficulties in fault diagnosis for distribution boxes stem from two primary issues. First, the hot rolling process is characterized by lengthy procedures and interconnected structures, introducing numerous uncertainties and nonlinearities [4]. Second, the complex working environment of the distribution box makes fault pulses easily masked by ambient noise. Consequently, traditional physical models and machine learning (ML) methods may fall short of addressing these challenges effectively. Specifically, the manual feature extraction process and the limited depth of models restrict their ability to accurately classify faults in noisy environments.

With the enhancement of data acquisition and processing capabilities, big data approaches have emerged as new methods for fault diagnosis [5]. In recent years, methods based on machine learning have demonstrated promising initial results in the fault diagnosis of distribution boxes in finishing mills. Considering the environmental conditions in hot rolling operations, Yuan [6] proposed a novel approach combining multiwavelet sliding window neighboring coefficient denoising and optimal blind deconvolution techniques. Zhao [7] combined improved multivariate variational mode decomposition with multivariate composite multiscale weighted permutation entropy to extract bearing fault features. Shin [8] proposed a new diagnosis method for roll eccentricity under roll speed changes. Liu [9] constructed the architecture of a remote fault diagnosis system based on empirical mode decomposition and support vector machines for heavy mills. Zhang [10] employed a KPCA-based approach to nonlinear activation issues in a hot rolling automation system. Chen [11] suggested a customized maximal-overlap multiwavelet denoising technique for fault identification. Although the above methods have achieved promising results in fault diagnosis for rolling mills, machine learning and signal processing methods still face some difficulties. In machine learning, feature extraction and fault classification are usually performed separately. In particular, the manual feature extraction process limits the accuracy of fault classification. Additionally, such models employ only a limited number of layers, which makes it difficult to extract features in a noisy environment.

Considering the shortcomings of traditional ML, recent optimizations and innovations in neural networks have shown promising results. Hinton [12] introduced the concept of deep learning (DL). Since its introduction, DL has emerged as a beneficial technology widely applied in computer vision, natural language processing, and other fields, and has demonstrated remarkable performance. Compared to traditional ML, DL can deal with large-scale data and extract deeper features. By increasing the number of neurons in the neural network and the depth of the network, the accuracy of classification can be improved. Additionally, in the fault diagnosis of a rolling mill, DL has gradually become a research focus [13]. To enhance the feature extraction capabilities, Zhang [14] integrated the attention mechanism into both CNN and LSTM. Yu [15] fused signals from multiple sensors as inputs into the DBN, which enhanced the performance under limited datasets. Considering the coupling relationship of the signals of the finishing mill, Hou [16] built a graph transformer model for fault diagnosis. Shi [17] used time–frequency images as inputs for a dual attention-guided feature enhancement network, improving the model’s focus on both temporal and spectral features. Zhao [18] presented a multisource domain adversarial graph convolutional networks framework to overcome variable working conditions. That model can achieve the fault diagnosis of a rolling mill under complex conditions. From these studies, we see that DL can mine deeper features in the input data, reduce the impact of manual feature extraction, and better address the challenges that traditional ML faces to achieve desirable feature extraction and fault recognition.

Benefiting from multilayer convolution and pooling operations, a CNN can more effectively mine deeper failure features from data [19]. Chen [20] employed group normalization in CNN to normalize the feature maps of the network, which reduces the impact of data distribution discrepancy. Dash [21] combined the bond graph technique with CNN to enhance fault diagnosis, even with a minimal amount of labeled data. Considering the calculation cost, Zhao [22] proposed an efficient and lightweight model for fault diagnosis. Zhang [23] investigated the use of spatial dropout regularization to control gradient explosion and improve network stability. Zhou [24] combined inverted residuals with a lightweight network to reduce interference from unclear and small datasets. Liu [25] created a method with depthwise separable convolutions to simultaneously extract different features from vibration signals. Wang [26] developed a lightweight CNN for fault diagnosis of bearings, which can satisfy the need for fewer parameters and storage space. In addition, considering the temporal nature of fault signals, several time series-based models were investigated for fault diagnosis. Lei [27] presented an end-to-end long short-term memory model to learn features directly from multivariate time series data. Zou [28] linked multiscale weighted entropy morphological filtering and Bi-LSTM to overcome the low degree of fault discrimination and high computational complexity. Cao [29] suggested a method employing deep bidirectional long short-term memory to address time-varying and non-stationary operating conditions. Based on the above research, CNNs excel at extracting local features from signals, while LSTMs are adept at capturing temporal information. Consequently, many scholars combined CNNs and LSTMs to enhance the feature extraction capability of the model. 
Huang [30] constructed a CNN-LSTM model to extract feature information and time delay information, demonstrating both the accuracy and noise immunity of the model. Zhi [31] integrated joint wavelet regional correlation threshold denoising with a CNN-LSTM model, finding that the model effectively mines hidden features after denoising. Wang [32] employed a CNN and an LSTM network to address feature nonlinearity and complex conditions in the motor drive control system. Considering the varying scales of fault features, Chen [33] proposed an MRJDCNN-LSTM model that effectively reduces the loss of essential features. Qiao [34] constructed a model that employs two different convolutions and two LSTMs, enhancing diagnostic ability under variable load and noise conditions. Liu [35] established a Siamese CNN-Bi-LSTM model to address imbalanced sample classification and varying working conditions. Several related fields have also adopted CNN-LSTM-type models because of their superior feature extraction capabilities. Ullah [36] constructed a CNN and Bi-LSTM model for real-time anomaly detection in complex surveillance scenarios. Sun [37] proposed a novel intrusion detection model based on CNN-LSTM with an attention mechanism, improving both convergence speed and prediction accuracy. Xia [38] proposed an ensemble framework that fuses convolutional bidirectional long short-term memory with multiple time windows for accurate remaining useful life prediction. Huang [39] used a transfer depthwise separable convolutional recurrent network to estimate the remaining useful life of the bearing with incomplete data. The widespread application and effective results of CNN-LSTM models demonstrate their enhanced performance in feature extraction.

Considering that engineering fault data are often limited, irreproducible, and prone to noise interference, we designed a fault diagnosis model that integrates a depthwise separable convolution block with a Bi-LSTM block. This structure allows the model to extract both spatial and temporal features separately from the input signals, maximizing the information captured from the signals. By leveraging spatial features through convolution and temporal features through Bi-LSTM, the model takes full advantage of the inherent information in the signals. The Bi-LSTM improves the ability to capture time-related dependencies more effectively than a standard LSTM. Meanwhile, depthwise separable convolution reduces the number of parameters and computational complexity, ensuring the model remains lightweight without reducing accuracy.

To address the issue of timely fault diagnosis and minimize downtime for distribution boxes, this paper introduces a novel intelligent fault diagnosis model for the distribution box. The proposed model incorporates a depthwise separable convolution block and a Bi-LSTM block, designed to separately capture spatial and temporal features from fault signals. This model is capable of accurately diagnosing four types of faults: the pitting of tooth flanks, flat-headed sleeve tooth cracks, gear surface cracks, and gear tooth surface spalling. Compared to existing diagnostic models, the proposed model demonstrates superior accuracy, enhanced noise resistance, and a lightweight design.

This paper is organized as follows: Section 2 describes the distribution box data detection condition and diagnostic process within a hot rolling process. Section 3 introduces fundamental algorithm theories and the construction of the proposed model. The performance of the proposed model and comparison results are shown in Section 4. Conclusions are presented in Section 5.

2. Fault Diagnosis for Distribution Box

2.1. Distribution Box in Rolling Mill

The hot rolling process is regarded as the key stage for ensuring the quality and performance of the final product. As depicted in Figure 1, this process generally comprises four major stages: ironmaking, the continuous casting of steel, hot rolling, and cold rolling. Among these stages, the finishing mill plays a significant role in shaping steel processing. Additionally, relevant information about the hot rolling process is provided in Table 1.

Considering extensive and continuous production demands, the finishing mill is vital for maintaining the quality of strip steel. The finishing process uses seven finishing mills, labeled F1–F7, and mill F2 was chosen as the object of this study. The finishing mill train consists of four-roll irreversible horizontal mills in a continuous arrangement; a schematic diagram of the finishing mill structure is shown in Figure 2.

As depicted in Figure 2, a finishing mill consists primarily of seven components: a motor, a reduction gearbox, the main coupling, a distribution box, a hydraulic AGC system, upper and lower work rolls, and upper and lower support rolls. The motor drives the upper work roll rotation via the gearbox and the main coupling. The upper work roll is linked to the lower work roll through gears in the distribution box. The above mechanism enables the upper and lower work rolls to process steel plates. As a critical component in the transmission system, the distribution box is always working at variable speeds and load conditions. Therefore, it has a higher probability of failure compared to other equipment in the finishing mill.

2.2. Installation of Sensors

During hot rolling production, various sensors are used to monitor the real-time conditions of the hot rolling process to ensure safety and stability. The distribution box contains several rotating parts, such as gears, bearings, and couplings. These rotating parts generate periodic mechanical vibration pulses during operation. Acceleration vibration sensors can effectively capture the pulse features of vibration signals in the rotating component. Triaxial vibration acceleration sensors are typically installed on the shell of the rolling mill to collect operational data. The locations of the sensors are shown in Figure 3. To ensure the accuracy and completeness of the detected signals, four sensors are installed near the rotating components. The collected data are then stored in a database for subsequent studies.

2.3. Intelligent Fault Diagnosis for Distribution Box

With the development of intelligent technologies, digital intelligence technology has been applied to monitoring numerous types of industrial equipment. As an important part of machinery monitoring, fault diagnosis can detect faults in time, ensuring operating conditions and improving the reliability of the equipment. The diagnostic process of the distribution box is illustrated in Figure 4. When the distribution box is operational, sensors continuously collect signals, which are then stored in a database for further processing. The system applies a fault diagnosis model, depicted in the upper right section, to analyze the abnormal signals. Once a specific fault is identified, it can be quickly addressed to minimize downtime. Finally, the diagnostic results are fed back into the engineering system for further action.

3. Framework of Proposed Model

3.1. Spatial Feature Extraction Based on Convolutional Neural Networks

CNNs offer robust spatial feature extraction through convolutional layers, where kernels slide over the input signals to capture local features [40]. A CNN model can be established from pooling layers, activation layers, linear layers, and so on [41].

The convolutional layer processes the input data using kernels to extract features and output the computed properties. The mathematical expressions of the convolution operation are shown in Equations (1) and (2):

x_{j}^{l} = \sum_{i \in M_{j}} x_{i}^{l - 1} * ω_{i j}^{l} + b_{j}^{l}

(1)

and

y_{j} = f (x_{j}^{l}) = f (\sum_{i \in M_{j}} x_{i}^{l - 1} * ω_{i j}^{l} + b_{j}^{l}),

(2)

where x_{j}^{l} and x_{i}^{l - 1} are the output of layer l and the input of layer l, respectively; M_j is the feature set of layer l − 1; ω_{i j}^{l} is the weight parameter of the convolutional kernel; b_{j}^{l} is the offset; and f (·) indicates the activation function.
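As a minimal sketch of Equations (1) and (2), the following plain-Python example computes one output channel of a 1D convolution and applies ReLU as the activation f. It assumes "valid" padding, a stride of 1, and the sliding dot product (cross-correlation) used by most deep learning libraries; the function names and the toy values are illustrative and are not taken from the paper.

```python
def conv1d(inputs, kernels, bias):
    """One output channel x_j^l of a 1D convolution (valid padding, stride 1).

    inputs:  list of input channels x_i^{l-1}, each a list of floats
    kernels: one kernel w_ij^l (list of floats) per input channel
    bias:    scalar offset b_j^l
    """
    k = len(kernels[0])
    out_len = len(inputs[0]) - k + 1
    out = []
    for t in range(out_len):
        s = bias
        # Sum the contributions of all input channels i in M_j.
        for x, w in zip(inputs, kernels):
            s += sum(x[t + m] * w[m] for m in range(k))
        out.append(s)
    return out


def relu(values):
    """Activation f(.) applied element-wise."""
    return [max(0.0, v) for v in values]


# Toy input: two channels of four points each (hypothetical values).
x = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.0, 1.0]]
w = [[-1.0, 1.0], [0.5, 0.5]]
y = relu(conv1d(x, w, bias=-0.8))
print(y)
```

With these toy values, the first two positions are negative before activation and are clipped to zero by ReLU, while the third remains positive.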

The pooling layer reduces parameters and mitigates overfitting by downsampling data from the previous layer [42]. The mathematical expression for the maximum pooling is shown in Equation (3):

P_{i}^{l + 1} (j) = \max_{(j - 1) W + 1 \leq t \leq j W} {q_{i}^{l} (t)},

(3)

where P_{i}^{l + 1} (j) represents the output value of layer l + 1; W is the width of the pooling kernel; and q_{i}^{l} (t) denotes the output of the t-th neuron in the i-th channel of layer l.
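The pooling rule in Equation (3) can be sketched in a few lines of Python. The example assumes non-overlapping windows (stride equal to the kernel width W) and drops any incomplete trailing window; the function name and the input values are illustrative.

```python
def max_pool1d(channel, width):
    """Non-overlapping max pooling over one channel.

    Output j (1-based) is the maximum of inputs (j-1)*W+1 .. j*W,
    which corresponds to the slice [j*W, (j+1)*W) with 0-based j.
    """
    n_out = len(channel) // width
    return [max(channel[j * width:(j + 1) * width]) for j in range(n_out)]


pooled = max_pool1d([0.2, 0.9, 0.1, 0.4, 0.7, 0.3], width=2)
print(pooled)  # [0.9, 0.4, 0.7]
```

Each output keeps only the largest value in its window, which halves the sequence length here while preserving the strongest local response.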

Activation functions introduce nonlinearity, allowing the network to learn complex patterns [43]. ReLU, a rectification function, enhances generalization, and its activation function is described in the following equation:

σ_{ReLU} (x) = \max (0, x)

(4)

Generally, the output values of ReLU are normalized via batch normalization (BN) [44]. The mathematical form of BN is shown in Equations (5) and (6):

{\hat{x}}_{i j} = \frac{x_{i j} - μ_{i}}{\sqrt{σ_{i}^{2} + ε}}

(5)

and

y_{i j} = γ_{i} {\hat{x}}_{i j} + β_{i}

(6)

where {\hat{x}}_{i j} represents the normalized input feature; μ_{i} and σ_{i}^{2} denote the mean and variance of the i-th feature, respectively; γ_{i} and β_{i} are learnable parameters that scale and translate the normalized features, enabling the model to represent richer features; and ε is a small constant added to prevent division by zero.
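The normalization and the learnable scale-and-shift step can be illustrated with a small plain-Python sketch. It assumes training-mode statistics (mean and biased variance computed over the current batch) and a default ε of 1e-5; both choices, the function name, and the toy batch are assumptions for illustration only.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Batch normalization over a batch of feature vectors.

    batch: list of samples, each a list of feature values
    gamma, beta: per-feature scale and shift parameters
    """
    n = len(batch)
    n_feat = len(batch[0])
    out = [[0.0] * n_feat for _ in range(n)]
    for i in range(n_feat):
        col = [sample[i] for sample in batch]
        mu = sum(col) / n                              # mean of feature i
        var = sum((v - mu) ** 2 for v in col) / n      # biased variance of feature i
        for j in range(n):
            x_hat = (col[j] - mu) / math.sqrt(var + eps)   # normalization step
            out[j][i] = gamma[i] * x_hat + beta[i]         # scale and shift
    return out


# Two samples with two features on very different scales (hypothetical values).
normed = batch_norm([[1.0, 10.0], [3.0, 30.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(normed)
```

After normalization, both features have zero mean and nearly unit variance regardless of their original scale.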

3.2. Temporal Feature Extraction Based on Long Short-Term Memory Network

LSTM (long short-term memory), a type of RNN, excels at capturing long-term dependencies [45]. With more parameters to control retained and discarded information, LSTM effectively analyzes periodicity, trends, and patterns in time series [46].

An LSTM mainly consists of the input gate, the forget gate, and the output gate. In an LSTM, the forget gate manages memory by determining which information to retain or discard. The output of the forget gate is expressed as follows:

f_{t} = σ (W_{f} \cdot x_{t} + R_{f} \cdot h_{t - 1} + b_{f})

(7)

where x_t is the data value at time t; h_{t−1} is the output at time t − 1; W_f and R_f are the weight matrices associated with f_t; b_f is the corresponding bias vector; σ is the sigmoid activation function; and · indicates the dot product operation.

The input gate i_t, which determines the information to be input, and the candidate {\tilde{C}}_{t} for the state value of the unit are computed on the basis of Equations (8) and (9):

i_{t} = σ (W_{i} \cdot x_{t} + R_{i} \cdot h_{t - 1} + b_{i})

(8)

and

{\tilde{C}}_{t} = φ (W_{c} \cdot x_{t} + R_{c} \cdot h_{t - 1} + b_{c})

(9)

where W_i and R_i are the weight matrices associated with i_t; b_i is the corresponding bias vector; W_c and R_c are the weight matrices associated with {\tilde{C}}_{t}; and b_c is the corresponding bias vector.

By integrating f_t, i_t, {\tilde{C}}_{t}, and the state value C_{t−1} of the unit at time t − 1, the state value C_t of the unit at time t can be obtained via Equation (10):

C_{t} = f_{t} \times C_{t - 1} + i_{t} \times {\tilde{C}}_{t}

(10)

where × denotes the element-wise multiplication of vectors.

Finally, combining the state value C_t, the input x_t of the unit, and the output h_{t−1} of the previous unit produces the final output h_t of the unit, as shown in Equations (11) and (12):

O_{t} = σ (W_{o} \cdot x_{t} + R_{o} \cdot h_{t - 1} + b_{o})

(11)

and

h_{t} = O_{t} \times \frac{e^{C_{t}} - e^{- C_{t}}}{e^{C_{t}} + e^{- C_{t}}}

(12)
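The gate equations can be traced step by step with a small sketch of one LSTM time step. To keep it short, the example assumes a single hidden unit and a scalar input, so each dot product reduces to an ordinary multiplication. It also assumes that φ is the hyperbolic tangent and that the output is the output gate multiplied by tanh(C_t), as in the standard LSTM. The parameter dictionary and its keys are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step for a single hidden unit with scalar input."""
    f_t = sigmoid(p["W_f"] * x_t + p["R_f"] * h_prev + p["b_f"])        # forget gate, Eq. (7)
    i_t = sigmoid(p["W_i"] * x_t + p["R_i"] * h_prev + p["b_i"])        # input gate, Eq. (8)
    c_tilde = math.tanh(p["W_c"] * x_t + p["R_c"] * h_prev + p["b_c"])  # candidate state, Eq. (9)
    c_t = f_t * c_prev + i_t * c_tilde                                  # state update, Eq. (10)
    o_t = sigmoid(p["W_o"] * x_t + p["R_o"] * h_prev + p["b_o"])        # output gate, Eq. (11)
    h_t = o_t * math.tanh(c_t)                                          # unit output, Eq. (12)
    return h_t, c_t


# Hypothetical parameters: every weight 0.5, every bias 0.
names = ["W_f", "R_f", "b_f", "W_i", "R_i", "b_i", "W_c", "R_c", "b_c", "W_o", "R_o", "b_o"]
params = {name: (0.0 if name.startswith("b") else 0.5) for name in names}

h, c = 0.0, 0.0
for x_t in [1.0, -1.0, 0.5]:   # a short toy sequence
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Because the output gate lies in (0, 1) and tanh lies in (−1, 1), the output h_t always stays strictly between −1 and 1, whatever the input sequence.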

3.3. Fault Diagnosis Model Based on Spatiotemporal Feature Extraction

Considering the irreproducibility of fault signals in distribution boxes, a model combining depthwise separable convolution with Bi-LSTM (bidirectional long short-term memory) was established to utilize the fault signals fully. Compared to a standard convolution, depthwise separable convolution captures local features in each channel individually, which helps prevent subtle fault features from being overlooked. Additionally, incorporating Bi-LSTM can capture the temporal features of signals. A Bi-LSTM consists of two independent LSTM networks: one processes the sequence forward, and the other processes it in reverse. Compared to LSTM, a Bi-LSTM can extract features more comprehensively. The framework of the proposed model is shown in Figure 5. The main components are the input layer, the 1D convolution layer, the depthwise separable convolution, the Bi-LSTM, the linear layer, and the output layer.

As input data for the proposed model, the raw signals need to be divided into standardized samples. These divided signals are then passed through the 1D convolution layer to generate several feature maps, which are subsequently fed into the depthwise separable convolution.

Depthwise separable convolution consists of a depthwise convolution followed by a pointwise convolution. During the depthwise convolution, each feature map is processed by an independent kernel, which captures the local features of that map individually. The pointwise convolution then integrates information across channels so that the feature maps can interact.
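To make the lightweight effect concrete, the sketch below counts the trainable parameters of a standard 1D convolution and of its depthwise separable counterpart. The channel counts and kernel size are hypothetical and are not the values of the proposed model; the formulas assume one bias term per output channel of each convolution.

```python
def standard_conv1d_params(c_in, c_out, k, bias=True):
    """Every output channel has one length-k kernel per input channel."""
    return c_in * c_out * k + (c_out if bias else 0)


def depthwise_separable_conv1d_params(c_in, c_out, k, bias=True):
    """Depthwise stage: one length-k kernel per input channel.
    Pointwise stage: a 1x1 convolution mixing c_in channels into c_out."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer: 32 input channels, 64 output channels, kernel size 3.
std = standard_conv1d_params(32, 64, 3)
sep = depthwise_separable_conv1d_params(32, 64, 3)
print(std, sep)  # 6208 2240
```

For this hypothetical layer, the separable form needs roughly a third of the parameters, and the saving grows with the kernel size.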

After the depthwise separable convolution, the raw signals are transformed into feature maps with spatial features. To analyze the temporal information from these feature maps, a Bi-LSTM is constructed to extract temporal features. The Bi-LSTM contains the forward LSTM layer and the backward LSTM layer. The forward LSTM deals with data from the start to the end of a time series, while the backward LSTM processes data from the end to the beginning.

Finally, the resulting feature maps are fed into a linear layer. The linear layer transforms the input feature maps into a one-dimensional vector, which is then used for the final classification task.

4. Model Performance Analysis of Fault Diagnosis

4.1. Dataset Introduction

The distribution box we investigated had a working component rotational speed of approximately 600 rpm, and the sensor response frequency was 1250 Hz. From December 2022 to April 2023, engineers monitored and assessed the vibration signals of four different faults. These signals were collected from sensors installed at Detection Point 3, as shown in Figure 3, ensuring that they accurately reflected real-world operating conditions. The four fault conditions included the pitting of tooth flanks, flat-headed sleeve tooth cracks, gear surface cracks, and gear tooth surface spalling. These are common faults in distribution boxes, and accurately diagnosing them is the primary objective of this study. Additionally, the signal of the normal condition was recorded. In the remainder of the experiments, the fault classes are referred to by their corresponding labels. The signals collected for the four faults and the normal condition were organized and segmented. There were 322 samples for each class, with each sample containing 512 data points. The ratio of the training set to the test set for each class was 7:3. The final segmentation results are shown in Table 2.
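The segmentation and split described above can be sketched as follows. The example assumes non-overlapping windows and a seeded random shuffle before splitting; the source does not state either detail, so both are assumptions, as are the function names and the placeholder signal.

```python
import random


def segment(signal, length=512):
    """Cut a 1D signal into consecutive, non-overlapping samples of fixed length."""
    n = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n)]


def split_train_test(samples, train_ratio=0.7, seed=0):
    """Shuffle sample indices reproducibly, then split by the given ratio."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = round(len(samples) * train_ratio)
    train = [samples[i] for i in idx[:n_train]]
    test = [samples[i] for i in idx[n_train:]]
    return train, test


# Placeholder signal long enough for 322 samples of 512 points each.
signal = [0.0] * (322 * 512)
samples = segment(signal)
train, test = split_train_test(samples)
print(len(samples), len(train), len(test))
```

With this rounding rule, 322 samples per class are divided into 225 for training and 97 for testing; a different rounding convention would shift the counts by one.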

Considering that a static 7:3 split can be a simplistic evaluation setup, we implemented leave-P-out cross-validation for a more rigorous performance assessment. In our experiments, we conducted five iterations, each time removing a different subset of 50 samples from each class, leaving 272 samples for training and 82 for validation. The final accuracy of our proposed model, as well as the comparison models, are reported as the average accuracy obtained from cross-validation results.

The time domain signals are shown in Figure 6, illustrating the variations in signal amplitude over time.

In addition, a spectrogram of the Short-Time Fourier transform (STFT) is shown in Figure 7. The horizontal axis represents time, the vertical axis represents frequency, and the color intensity represents the magnitude of the frequency components.

4.2. Model Parameters and Evaluation Metrics

The proposed model consists of 12 layers. The main computational components include an input layer, a 1D convolution layer, a depthwise separable convolution layer, pointwise convolution layers, a Bi-LSTM layer, an adaptive average pooling layer, and a linear layer. Additionally, we incorporated dropout layers, ReLU activation layers, batch normalization layers, and two permute layers to appropriately adjust the inputs for subsequent layers. The model framework parameters are shown in Table 3.

In classification problems, common metrics for evaluating model performance include accuracy, precision, recall, and F1 score. As a type of classification problem, the performance of the diagnosis model is also evaluated using the aforementioned metrics. The specific formulas are presented in Equations (13), (14), (15), and (16):

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%

(13)

and

\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%

(14)

and

\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%

(15)

and

\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}

(16)

(16)

where TP represents the number of true positive samples, which are samples correctly predicted as positive by the classifier; TN represents the number of true negative samples, which are samples correctly predicted as negative by the classifier; FN represents the number of false negative samples, which are samples incorrectly predicted as negative by the classifier; and FP represents the number of false positive samples, which are samples incorrectly predicted as positive by the classifier.
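For a multi-class problem such as this one, these quantities are usually computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below does so in plain Python. It assumes that overall accuracy is the fraction of correctly classified samples and returns fractions rather than percentages; the three-class matrix is made-up illustration data, not a result from the paper.

```python
def per_class_metrics(cm):
    """cm[i][j]: number of samples with true class i predicted as class j.

    Returns overall accuracy and a (precision, recall, F1) tuple per class.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        fn = sum(cm[k]) - tp                        # true k, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1))
    return accuracy, metrics


# Made-up confusion matrix for three classes.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
accuracy, metrics = per_class_metrics(cm)
print(accuracy, metrics)
```

Averaging the per-class tuples gives macro-averaged precision, recall, and F1 score, which is one common way to report a single value per metric.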

For this study, all models were run on a computer equipped with an Intel Core i7-9750H CPU, an Nvidia GeForce GTX 1660 Ti GPU, and 16 GB of RAM, and the program was written with Python 3.7 and PyTorch. Additionally, all models were run under the same conditions: the learning rate was set to 0.01, the batch size was 32, and the number of epochs was 100. The training history of the proposed model is plotted in Figure 8, where the horizontal axis represents the number of epochs and the vertical axis represents accuracy.

Figure 8 illustrates that the proposed model achieved acceptable accuracy in the fault diagnosis of the distribution box. The loss function is commonly used to evaluate and monitor the training process of a model, and comparing the training and test losses helps reveal overfitting. If the loss value stabilizes after a number of training iterations, the model has converged. The loss curves of the models are presented in Figure 9, which shows how the loss values on the training and test datasets changed during training. After 40 iterations, the loss of the model dropped below 0.06, indicating good convergence and stability.

Accuracy provides an overall measure of diagnostic performance. To illustrate the diagnostic results for each individual fault, we plotted a confusion matrix. The confusion matrix is a tool used to evaluate the performance of a classification model. The X-axis represents the predicted labels, while the Y-axis represents the true labels. Intuitively, elements on the diagonal indicate the number of samples correctly classified by the model, while elements off the diagonal indicate the number of samples incorrectly classified by the model. As shown in Figure 10, the proposed model correctly classified the vast majority of fault categories. All but 12 fault samples were recognized correctly.

To analyze the distribution of each class, t-SNE was used to visually examine the clustering patterns, classification boundaries, and anomalies in the data. As shown in Figure 11, the left image was generated from the original data, where data points were evenly distributed without distinct clustering or structural patterns. The right image displays the t-SNE plot generated after training. Purple represents the pitting of tooth flanks, blue represents flat-headed sleeve tooth cracks, cyan represents gear surface cracks, yellow represents gear tooth surface spalling, and green represents normal conditions.

As shown in the t-SNE visualization, the classes form separate clusters in two dimensions. Among them, the samples of normal conditions, gear tooth surface spalling, and flat-headed sleeve tooth cracks are well differentiated, whereas the clusters for the pitting of tooth flanks and gear surface cracks lie close to each other. This is consistent with the confusion matrix, in which misclassification occurred in these two classes.

4.3. Model Comparison

To verify the capability of the proposed method, five other methods were used to diagnose the same fault dataset for comparison. The compared models included CNN-LSTM [29], a multi-channel convolutional neural network (MCCNN), WDCNN [47], Bi-LSTM, and a standard convolutional neural network (CNN). The same test set was used for evaluation, and the classification accuracies were calculated. Figure 12 depicts the training process of the six diagnostic models. The accuracies of the five comparison models, obtained through cross-validation, were 86.78%, 90.08%, 87.81%, 76.86%, and 90.08%, and the accuracy of the proposed model was 97.46%.

For a comprehensive comparison, the metrics of each model, including precision, recall, F1 score, and accuracy, were collected and are listed in Table 4. The proposed model outperforms all other models, achieving over 97% in all metrics.

Considering computer hardware and optimization schemes, parameters and FLOPs were introduced to measure model complexity. In deep learning, a smaller value indicates greater computational efficiency. As shown in Table 5, the proposed model had fewer parameters and FLOPs, indicating better efficiency compared to other models.

The confusion matrices of each model are plotted in Figure 13. The confusion matrices of the comparison models show that the proposed model had better classification performance than the other methods.

4.4. Verification of Noise Robustness

In the working environment of a distribution box, there may be varying levels of white noise interference, which can be simulated with white Gaussian noise. To validate the performance of the models, white Gaussian noise was added to the signals at three signal-to-noise ratios (SNRs): 4 dB, 6 dB, and 8 dB. The accuracy of each model under these SNR conditions is shown in Table 6. The accuracy of the proposed model remained at approximately 90%, showing better resistance to noise than the other models.
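A common way to construct such noisy test signals is to scale zero-mean Gaussian noise so that the ratio of signal power to noise power matches the target SNR. The sketch below follows that approach using the relation SNR(dB) = 10·log10(P_signal / P_noise). The sine wave stands in for a vibration signal, and the function name and seed are illustrative; the paper does not describe its exact noise-injection procedure.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, seed=0):
    """Add zero-mean white Gaussian noise at a target SNR (in dB)."""
    rng = random.Random(seed)
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))   # from the SNR definition
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


# Stand-in signal: a sine wave with a period of 125 samples (arbitrary choice).
clean = [math.sin(2 * math.pi * t / 125) for t in range(5000)]
noisy = add_white_gaussian_noise(clean, snr_db=6)

# Check the SNR actually obtained for this noise realization.
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr = 10 * math.log10(sum(c * c for c in clean) / sum(e * e for e in noise))
print(round(measured_snr, 2))
```

The measured SNR fluctuates slightly around the target because the noise is random; longer signals bring it closer to the nominal value.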

Considering the complex working environment of distribution boxes, noise always impacts the quality of extracted signals. The proposed model can resist noise to a certain extent, so different faults can still be distinguished and engineering accidents can be analyzed. The proposed model can therefore be used for intelligent fault diagnosis of the distribution box in the hot rolling process.

5. Conclusions

Under high-load operating conditions, distribution boxes are prone to gradual failure in the hot rolling process. To keep operation stable, it is crucial to develop an effective method that diagnoses faults in the distribution box in a timely manner. This paper proposes an intelligent fault diagnosis model that integrates depthwise separable convolution with Bi-LSTM. The model achieves an accuracy of 97.46% in classifying normal conditions and four types of faults: the pitting of tooth flanks, flat-headed sleeve tooth cracks, gear surface cracks, and gear tooth surface spalling. Additionally, the model maintains approximately 90% accuracy under different SNR levels, indicating its ability to resist noise. Therefore, the proposed model is suitable for engineering applications. It effectively diagnoses these faults and facilitates timely repairs, which reduces downtime and enhances safety in the hot rolling process.

Compared to the five existing diagnostic models, the proposed model offers significant advantages. It not only delivers superior diagnostic performance but also has the fewest parameters and a low FLOP count, addressing the limitations associated with the computational load. Future research will focus on reducing the model's inference time, which is longer than that of the other approaches. We will optimize the model to improve its computational efficiency without compromising accuracy.

Author Contributions

Methodology, writing, Y.G.; software, formal analysis, D.Z.; supervision, validation, H.C.; supervision, investigation, X.Y.; conceptualization, methodology, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This project is supported by the Shanghai Scientific Research project (No. 22511103604) and the Fundamental Research Fund for the Central Universities (2232023D-17). Also, this project is supported by the Foundation of Key Laboratory of Vibration and Control of Aero-Propulsion System, Ministry of Education (VCAME202104), which received support from Northeastern University.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Hu, B.; Chen, Z. System-Level Predictive Maintenance Optimization for No-Wait Production Machine–Robot Collaborative Environment under Economic Dependency and Hybrid Fault Mode. Processes 2024, 12, 1690.
2. Yildiz, S.K.; Forbes, J.F.; Huang, B.; Zhang, Y.; Wang, F.; Vaculik, V.; Dudzic, M. Dynamic modelling and simulation of a hot strip finishing mill. Appl. Math. Model. 2009, 33, 3208–3225.
3. Zhou, X.; Ben, X. Maintenance modelling for work rolls in hot finishing mill group with constraint of thermal character. Int. J. Prod. Res. 2023, 62, 1846–1861.
4. He, H.-N.; Shao, J.; Wang, X.-C.; Yang, Q.; Liu, Y.; Xu, D.; Sun, Y.-Z. Research and application of approximate rectangular section control technology in hot strip mills. J. Iron Steel Res. Int. 2021, 28, 279–290.
5. Hu, X.; Cao, Y.; Tang, T.; Sun, Y. Data-driven technology of fault diagnosis in railway point machines: Review and challenges. Transp. Saf. Environ. 2022, 4, 36.
6. Yuan, J.; He, Z.; Zi, Y.; Liu, H. Gearbox fault diagnosis of rolling mills using multiwavelet sliding window neighboring coefficient denoising and optimal blind deconvolution. Sci. China Technol. Sci. 2009, 52, 2801–2809.
7. Zhao, C.; Sun, J.; Lin, S.; Peng, Y. Rolling mill bearings fault diagnosis based on improved multivariate variational mode decomposition and multivariate composite multiscale weighted permutation entropy. Measurement 2022, 195, 111190.
8. Lee, C.W.; Kang, H.K.; Park, C.J.; Shin, K.H. Fault diagnosis of roll shape under the speed change in hot rolling mill. IFAC Proc. Vol. 2005, 38, 45–50.
9. Liu, J.F.; Chen, M.; Gu, J.Y.; Cheng, L. Remote fault diagnosis system based on EMD and SVM for heavy rolling-mills. Adv. Mater. Res. 2014, 889–890, 681–686.
10. Zhang, F.; Zong, S.; Ling, Z. Fault diagnosis using kernel principal component analysis for hot strip mill. J. Eng. 2017, 2017, 527–535.
11. Chen, J.; Wan, Z.; Pan, J.; Zi, Y.; Wang, Y.; Chen, B.; Sun, H.; Yuan, J.; He, Z. Customized maximal-overlap multiwavelet denoising with data-driven group threshold for condition monitoring of rolling mill drivetrain. Mech. Syst. Signal Process. 2016, 68–69, 44–67.
12. Hinton, G.E.; Osindero, S.; Teh, Y.-W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554.
13. Hu, X.; Tang, T.; Tan, L.; Zhang, H. Fault Detection for Point Machines: A Review, Challenges, and Perspectives. Actuators 2023, 12, 391.
14. Zhang, K.; Zhang, X.; Peng, K. A novel parallel feature extraction-based multibatch process quality prediction method with application to a hot rolling mill process. J. Process. Control 2024, 135, 103166.
15. Yu, Y.; Shi, P.; Tian, J.; Xu, X.; Hua, C. Rolling mill health states diagnosing method based on multi-sensor information fusion and improved DBNs under limited datasets. ISA Trans. 2023, 134, 529–547.
16. Hou, D.; Zhang, B.; Chen, J.; Shi, P. Improved GNN based on Graph-Transformer: A new framework for rolling mill bearing fault diagnosis. Trans. Inst. Meas. Control 2024.
17. Shi, P.; Gao, H.; Yu, Y.; Xu, X.; Han, D. Intelligent fault diagnosis of rolling mills based on dual attention-guided deep learning method under imbalanced data conditions. Measurement 2022, 204, 111993.
18. Zhao, S.; Bao, L.; Hou, C.; Bai, Y.; Yu, Y. Multi-source domain adversarial graph convolutional networks for rolling mill health states diagnosis under variable working conditions. Struct. Health Monit. 2024.
19. Hu, X.; Zhang, X.; Wang, Z.; Chen, Y.; Xia, J.; Du, Y.; Li, Y. Railway Switch Machine Fault Diagnosis Considering Sensor Abnormality Scenarios. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; Volume 9, pp. 4834–4839.
20. Chen, Z.; Mauricio, A.; Li, W.; Gryllias, K. A deep learning method for bearing fault diagnosis based on Cyclic Spectral Coherence and Convolutional Neural Networks. Mech. Syst. Signal Process. 2020, 140, 106683.
21. Dash, B.M.; Bouamama, B.O.; Boukerdja, M.; Pekpe, K.M. Bond Graph-CNN based hybrid fault diagnosis with minimum labeled data. Eng. Appl. Artif. Intell. 2024, 131, 107734.
22. Zhao, Z.; Jiao, Y. A Fault Diagnosis Method for Rotating Machinery Based on CNN With Mixed Information. IEEE Trans. Ind. Inform. 2023, 19, 9091–9101.
23. Zhang, J.; Kong, X.; Li, X.; Hu, Z.; Cheng, L.; Yu, M. Fault diagnosis of bearings based on deep separable convolutional neural network and spatial dropout. Chin. J. Aeronaut. 2022, 35, 301–312.
24. Zhou, S.; Liu, J.; Fan, X.; Fu, Q.; Goh, H.H. Thermal Fault Diagnosis of Electrical Equipment in Substations Using Lightweight Convolutional Neural Network. IEEE Trans. Instrum. Meas. 2023, 72, 5005709.
25. Ling, L.; Wu, Q.; Huang, K.; Wang, Y.; Wang, C. A Lightweight Bearing Fault Diagnosis Method Based on Multi-Channel Depthwise Separable Convolutional Neural Network. Electronics 2022, 11, 4110.
26. Wang, Y.; Yan, J.; Sun, Q.; Jiang, Q.; Zhou, Y. Bearing Intelligent Fault Diagnosis in the Industrial Internet of Things Context: A Lightweight Convolutional Neural Network. IEEE Access 2020, 8, 87329–87340.
27. Lei, J.; Liu, C.; Jiang, D. Fault diagnosis of wind turbine based on Long Short-term memory networks. Renew. Energy 2019, 133, 422–432.
28. Zou, F.; Zhang, H.; Sang, S.; Li, X.; He, W.; Liu, X. Bearing fault diagnosis based on combined multi-scale weighted entropy morphological filtering and bi-LSTM. Appl. Intell. 2021, 51, 6647–6664.
29. Cao, L.; Qian, Z.; Zareipour, H.; Huang, Z.; Zhang, F. Fault Diagnosis of Wind Turbine Gearbox Based on Deep Bi-Directional Long Short-Term Memory Under Time-Varying Non-Stationary Operating Conditions. IEEE Access 2019, 7, 155219–155228.
30. Huang, T.; Zhang, Q.; Tang, X.; Zhao, S.; Lu, X. A novel fault diagnosis method based on CNN and LSTM and its application in fault diagnosis for complex systems. Artif. Intell. Rev. 2022, 55, 1289–1315.
31. Zhi, Z.; Liu, L.; Liu, D.; Hu, C. Fault Detection of the Harmonic Reducer Based on CNN-LSTM With a Novel Denoising Algorithm. IEEE Sens. J. 2021, 22, 2572–2581.
32. Wang, T.; Zhang, L.; Wang, X. Fault Detection for Motor Drive Control System of Industrial Robots Using CNN-LSTM-based Observers. CES Trans. Electr. Mach. Syst. 2023, 7, 144–152.
33. Chen, Y.; Yin, X.; Zhang, R.; Gao, F. Reinforced convolutional neural network fault diagnosis of industrial production systems. Chem. Eng. Sci. 2024, 299, 120466.
34. Qiao, M.; Yan, S.; Tang, X.; Xu, C. Deep Convolutional and LSTM Recurrent Neural Networks for Rolling Bearing Fault Diagnosis Under Strong Noises and Variable Loads. IEEE Access 2020, 8, 66257–66269.
35. Liu, X.; Chen, G. A Siamese CNN-BiLSTM-based method for unbalance few-shot fault diagnosis of rolling bearings. Meas. Control 2024, 57, 551–565.
36. Ullah, W.; Ullah, A.; Haq, I.U.; Muhammad, K.; Sajjad, M.; Baik, S.W. CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimedia Tools Appl. 2020, 80, 16979–16995.
37. Sun, H.; Chen, M.; Weng, J.; Liu, Z.; Geng, G. Anomaly Detection for In-Vehicle Network Using CNN-LSTM With Attention Mechanism. IEEE Trans. Veh. Technol. 2021, 70, 10880–10893.
38. Xia, T.; Song, Y.; Zheng, Y.; Pan, E.; Xi, L. An ensemble framework based on convolutional bi-directional LSTM with multiple time windows for remaining useful life estimation. Comput. Ind. 2020, 115, 103182.
39. Huang, G.; Zhang, Y.; Ou, J. Transfer remaining useful life estimation of bearing using depth-wise separable convolution recurrent network. Measurement 2021, 176, 109090.
40. Ilesanmi, A.E.; Ilesanmi, T.O. Methods for image denoising using convolutional neural network: A review. Complex Intell. Syst. 2021, 7, 2179–2198.
41. Zhao, B.; Zhang, X.; Zhan, Z.; Pang, S. Deep multi-scale convolutional transfer learning network: A novel method for intelligent fault diagnosis of rolling bearings under variable working conditions and domains. Neurocomputing 2020, 24, 24–38.
42. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
43. Samek, W.; Montavon, G.; Lapuschkin, S.; Anders, C.J.; Muller, K.-R. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proc. IEEE 2021, 109, 247–278.
44. Huang, Y.; Zhang, J.; Liu, R.; Zhao, S. Improving Accuracy and Interpretability of CNN-Based Fault Diagnosis through an Attention Mechanism. Processes 2023, 11, 3233.
45. Cui, Z.; Ke, R.; Pu, Z.; Wang, Y. Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values. Transp. Res. Part C Emerg. Technol. 2020, 118, 102674.
46. Zhang, X.; Yang, J.; Yang, X. Residual Life Prediction of Rolling Bearings Based on a CEEMDAN Algorithm Fused with CNN–Attention-Based Bidirectional LSTM Modeling. Processes 2024, 12, 8.
47. Zhang, A.; Li, S.; Cui, Y.; Yang, W.; Dong, R.; Hu, J. Limited Data Rolling Bearing Fault Diagnosis With Few-Shot Learning. IEEE Access 2019, 7, 2169–3536.

Figure 1. Hot rolling line.

Figure 2. Finishing mill structure.

Figure 3. Sensor spatial layout.

Figure 4. Flow of intelligent fault diagnosis for distribution box.

Figure 5. Structure of the proposed model.

Figure 6. Time domain waveform of original signals: (a) pitting of tooth flanks; (b) flat-headed sleeve tooth crack; (c) gear surface crack; (d) gear tooth surface spalling; (e) normal conditions.

Figure 7. STFT analysis of raw signals: (a) pitting of tooth flanks; (b) flat-headed sleeve tooth crack; (c) gear surface crack; (d) gear tooth surface spalling; (e) normal conditions.

Figure 8. Accuracy curve of the proposed model.

Figure 9. Loss curve of the proposed model.

Figure 10. Confusion matrices of the proposed model.

Figure 11. t-SNE visualization of the proposed model.

Figure 12. Training history of different models: (a) accuracy curves; (b) loss curves.

Figure 13. Confusion matrix of the compared models: (a) CNN-LSTM; (b) MCCNN; (c) WDCNN; (d) Bi-LSTM; (e) CNN; (f) Proposed model.

Table 1. Specifications of the hot rolling line.

Specifications	Parameters
Production line name	Hot rolling 1580
Capacity	3 million tons per year
Thickness range	1.2 mm to 12.5 mm
Maximum width	1580 mm
Production speed	5 m/s to 15 m/s
In operation	Since 2000
Number of stands	7 finishing mills

Table 2. Description of sample distribution.

Fault Location	Label	Sample Number	Training/Testing
Pitting of tooth flanks	A	322	225/97
Flat-headed sleeve tooth crack	B	322	225/97
Gear surface crack	C	322	225/97
Gear tooth surface spalling	D	322	225/97
Normal conditions	E	322	225/97

Table 3. Structural parameters of the proposed model.

Network Layer	Parameter	Output Size
Input		512 × 1
ConV	Kernel Size = 32; Stride = 4; Channel Size = 24	121 × 24
Deep_ConV	Kernel Size = 16; Stride = 1; Channel Size = 24; Padding = same	121 × 24
ReLU
BatchNormal
Point_ConV	Kernel Size = 1; Stride = 1; Channel Size = 24; Padding = same	121 × 24
Dropout	Dropout = 0.5
Permute
BiLSTM	Input size = 24; hidden size = 16; batch first = True; bidirectional = True	2 × 16
Permute
AdaptiveAvgPool
Linear	16 × 2; out features = 5	1 × 5
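
The output sizes in Table 3 follow from the usual one-dimensional convolution length formula. The short check below is a sketch under two assumptions: the first ConV layer uses no padding (none is listed in the table), and "same" padding with stride 1 leaves the length unchanged. `conv1d_out_len` is an illustrative helper, not code from this study.

```python
def conv1d_out_len(n_in, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution: floor((n + 2p - k) / s) + 1."""
    return (n_in + 2 * padding - kernel) // stride + 1


# First ConV layer from Table 3: input length 512, kernel 32, stride 4.
after_conv = conv1d_out_len(512, kernel=32, stride=4)

# Deep_ConV and Point_ConV use "same" padding with stride 1, so the
# sequence length is preserved and only the channel dimension is processed.
after_depthwise = after_conv
after_pointwise = after_depthwise

print(after_conv, after_depthwise, after_pointwise)
```

The result reproduces the 121 × 24 output size listed for the three convolutional layers.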

Table 4. Performance comparison of different models.

Model	Precision	Recall	F1-Score	Accuracy
CNN-LSTM	86.99%	86.78%	86.87%	86.78%
MCCNN	90.13%	90.08%	90.11%	90.08%
WDCNN	88.11%	87.81%	87.87%	87.81%
Bi-LSTM	76.73%	76.86%	76.57%	76.86%
CNN	90.62%	90.08%	90.10%	90.08%
Proposed Model	97.64%	97.46%	97.46%	97.46%

Table 5. Comparison of model parameters and FLOPs.

Model	Parameters	FLOPs
CNN-LSTM	6.2 × 10⁴	3.2 × 10⁷
MCCNN	5.7 × 10⁴	3.2 × 10⁷
WDCNN	8.7 × 10⁴	5.1 × 10⁶
Bi-LSTM	1.2 × 10⁶	2.1 × 10⁶
CNN	2.5 × 10⁵	4.7 × 10⁶
Proposed Model	7.1 × 10³	4.6 × 10⁶

Table 6. Accuracy of each model under different SNR conditions.

Model	4 dB	6 dB	8 dB
CNN-LSTM	61.36%	68.18%	58.06%
MCCNN	83.88%	72.52%	70.87%
WDCNN	88.22%	73.55%	78.10%
Bi-LSTM	69.42%	66.94%	70.04%
CNN	85.74%	44.21%	59.09%
Proposed model	88.22%	87.40%	91.12%


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
