1. Introduction
As a key component of the hot rolling process, the finishing mill plays a decisive role in controlling the final quality of the product [1]. Because of harsh external operating conditions and complex internal mechanisms, finishing mills inevitably suffer performance degradation and failures [2]. Within a finishing mill, the distribution box transmits and distributes power across the rolling system. It operates under variable speed and load conditions, so it is more likely to fail than other mechanisms in the finishing mill [3]. Failure to diagnose faults promptly prolongs downtime, and this downtime causes economic losses. To reduce these losses, effective fault diagnosis methods for distribution boxes are essential. Fault diagnosis of distribution boxes is difficult for two main reasons. First, the hot rolling process involves long process chains and interconnected structures, which introduce numerous uncertainties and nonlinearities [4]. Second, in the complex working environment of the distribution box, fault-induced impulses are easily masked by ambient noise. Consequently, traditional physical models and machine learning (ML) methods may not address these challenges effectively. In particular, manual feature extraction and limited model depth restrict their ability to classify faults accurately in noisy environments.
As data acquisition and processing capabilities have improved, big data approaches have emerged as new methods for fault diagnosis [5]. In recent years, methods based on signal processing and machine learning have shown promising initial results in diagnosing faults in the distribution boxes of finishing mills. Considering the environmental conditions in hot rolling operations, Yuan [6] proposed a novel approach that combines multiwavelet sliding window neighboring coefficient denoising with optimal blind deconvolution. Zhao [7] combined improved multivariate variational mode decomposition with multivariate composite multiscale weighted permutation entropy to extract bearing fault features. Shin [8] proposed a new method for diagnosing roll eccentricity under varying roll speed. Liu [9] designed a remote fault diagnosis system for heavy mills based on empirical mode decomposition and support vector machines. Zhang [10] used a kernel principal component analysis (KPCA)-based approach to address nonlinear activation issues in a hot rolling automation system. Chen [11] proposed a customized maximal-overlap multiwavelet denoising technique for fault identification. Although these methods have achieved promising results in rolling mill fault diagnosis, machine learning and signal processing methods still face several difficulties. In machine learning, feature extraction and fault classification are usually performed separately, and manual feature extraction limits the achievable classification accuracy. In addition, these models use only a few layers, which makes it difficult to extract features in noisy environments.
To overcome the shortcomings of traditional ML, researchers have developed new neural network designs and optimizations, and these have shown promising results. Hinton [12] introduced the concept of deep learning (DL). Since then, DL has been widely applied in computer vision, natural language processing, and other fields, where it has performed remarkably well. Compared with traditional ML, DL can handle large-scale data and extract deeper features. Increasing the number of neurons and the depth of the network can improve classification accuracy. DL has also gradually become a research focus in rolling mill fault diagnosis [13]. To enhance feature extraction, Zhang [14] integrated an attention mechanism into both a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Yu [15] fused signals from multiple sensors as inputs to a deep belief network (DBN), which improved performance on limited datasets. Considering the coupling among the signals of the finishing mill, Hou [16] built a graph transformer model for fault diagnosis. Shi [17] used time–frequency images as inputs to a dual attention-guided feature enhancement network, improving the model’s focus on both temporal and spectral features. Zhao [18] presented a multisource domain adversarial graph convolutional network framework that copes with variable working conditions and achieves rolling mill fault diagnosis under complex conditions. These studies show that DL can mine deeper features from input data, reduce reliance on manual feature extraction, and better address the feature extraction and fault recognition challenges that traditional ML faces.
Because a CNN stacks multiple convolution and pooling operations, it can effectively mine deep fault features from data [19]. Chen [20] applied group normalization to the feature maps of a CNN, reducing the impact of data distribution discrepancies. Dash [21] combined the bond graph technique with a CNN to improve fault diagnosis even when very little labeled data is available. To reduce computational cost, Zhao [22] proposed an efficient, lightweight fault diagnosis model. Zhang [23] investigated spatial dropout regularization as a way to control gradient explosion and improve network stability. Zhou [24] combined inverted residuals with a lightweight network to reduce interference from unclear and small datasets. Liu [25] developed a method that uses depthwise separable convolutions to extract different features from vibration signals simultaneously. Wang [26] developed a lightweight CNN for bearing fault diagnosis that requires fewer parameters and less storage space. In addition, given the temporal nature of fault signals, several time-series models have been investigated for fault diagnosis. Lei [27] presented an end-to-end long short-term memory model that learns features directly from multivariate time series data. Zou [28] combined multiscale weighted entropy morphological filtering with a bidirectional long short-term memory (Bi-LSTM) network to overcome low fault discrimination and high computational complexity. Cao [29] proposed a method based on deep bidirectional long short-term memory to handle time-varying and non-stationary operating conditions. As this research shows, CNNs excel at extracting local features from signals, whereas LSTMs are adept at capturing temporal information. Consequently, many researchers have combined CNNs and LSTMs to enhance feature extraction. Huang [30] constructed a CNN-LSTM model to extract feature information and time-delay information, and showed that the model was both accurate and robust to noise. Zhi [31] integrated joint wavelet regional correlation threshold denoising with a CNN-LSTM model and found that the model effectively mines hidden features after denoising. Wang [32] employed a CNN and an LSTM network to address feature nonlinearity and complex conditions in a motor drive control system. Considering the varying scales of fault features, Chen [33] proposed an MRJDCNN-LSTM model that effectively reduces the loss of essential features. Qiao [34] constructed a model with two different convolutions and two LSTMs, enhancing diagnostic ability under variable load and noise conditions. Liu [35] built a Siamese CNN-Bi-LSTM model to handle imbalanced samples and varying working conditions. CNN-LSTM-type models extract features well, and several related fields have also adopted them. Ullah [36] constructed a CNN and Bi-LSTM model for real-time anomaly detection in complex surveillance scenarios. Sun [37] proposed a novel intrusion detection model based on CNN-LSTM with an attention mechanism, improving both convergence speed and prediction accuracy. Xia [38] proposed an ensemble framework that fuses convolutional bidirectional long short-term memory networks with multiple time windows for accurate remaining useful life prediction. Huang [39] used a transfer depthwise separable convolutional recurrent network to estimate the remaining useful life of bearings from incomplete data. The wide application and good results of CNN-LSTM models demonstrate their strong feature extraction performance.
Because engineering fault data are often limited, difficult to reproduce, and prone to noise interference, we designed a fault diagnosis model that integrates a depthwise separable convolution block with a Bi-LSTM block. The convolution block extracts spatial features and the Bi-LSTM block extracts temporal features from the input signals, so the model exploits as much of the information in the signals as possible. A Bi-LSTM processes each sequence in both the forward and backward directions, so it captures time-related dependencies more effectively than a standard LSTM. Meanwhile, depthwise separable convolution factorizes a standard convolution into a per-channel (depthwise) convolution followed by a 1 × 1 (pointwise) convolution. This reduces the number of parameters and the computational cost and keeps the model lightweight without reducing accuracy.
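The sketch below makes the parameter saving concrete. It counts weights and multiply-accumulates (MACs) for a standard 1D convolution and for its depthwise separable counterpart. It assumes 1D input signals such as vibration data, a depth multiplier of 1, stride 1, and the same output length for both layers. The layer sizes (kernel length 9, 32 input channels, 64 output channels, 1024 output steps) and all function names are hypothetical. They do not describe the configuration of the proposed model.

```python
"""Parameter and multiply-accumulate (MAC) counts for a standard 1D convolution
versus a depthwise separable 1D convolution (depthwise + pointwise)."""


def conv1d_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Standard 1D convolution: one k x c_in kernel for every output channel."""
    return k * c_in * c_out + (c_out if bias else 0)


def dsconv1d_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Depthwise (one k-tap filter per input channel) plus pointwise (1x1, c_in -> c_out)."""
    depthwise = k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def conv1d_macs(l_out: int, k: int, c_in: int, c_out: int) -> int:
    """MACs of a standard convolution that produces l_out output steps."""
    return l_out * k * c_in * c_out


def dsconv1d_macs(l_out: int, k: int, c_in: int, c_out: int) -> int:
    """MACs of depthwise + pointwise convolution with the same output length."""
    return l_out * (k * c_in + c_in * c_out)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out, l_out = 9, 32, 64, 1024
    std = conv1d_params(k, c_in, c_out, bias=False)
    sep = dsconv1d_params(k, c_in, c_out, bias=False)
    print(f"standard: {std} weights, separable: {sep} weights")
    # Without biases the parameter ratio equals 1/c_out + 1/k.
    print(f"parameter ratio: {sep / std:.3f} (1/c_out + 1/k = {1 / c_out + 1 / k:.3f})")
    mac_ratio = dsconv1d_macs(l_out, k, c_in, c_out) / conv1d_macs(l_out, k, c_in, c_out)
    print(f"MAC ratio: {mac_ratio:.3f}")
```

Ignoring biases, the separable layer needs k·C_in + C_in·C_out weights instead of k·C_in·C_out. That is a fraction 1/C_out + 1/k of the original, so wider layers and longer kernels save more. With the sizes above, the script reports 18432 versus 2336 weights, a ratio of about 0.127 for both parameters and MACs.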
To enable timely fault diagnosis and minimize downtime, this paper introduces a novel intelligent fault diagnosis model for the distribution box. The proposed model incorporates a depthwise separable convolution block and a Bi-LSTM block, which separately capture spatial and temporal features from fault signals. The model accurately diagnoses four types of faults: tooth flank pitting, flat-headed sleeve tooth cracks, gear surface cracks, and gear tooth surface spalling. Compared with existing diagnostic models, the proposed model achieves higher accuracy, stronger noise resistance, and a more lightweight design.
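The sketch below shows why a bidirectional LSTM sees more temporal context than a unidirectional one. It runs the standard LSTM cell equations over a scalar sequence once from left to right and once from right to left, then pairs the two hidden states at each time step. This pairing is equivalent to the concatenation a Bi-LSTM layer outputs. It is deliberately simplified: the hidden size is 1, and the gate weights `W_FWD` and `W_BWD` are hand-picked hypothetical values. A real Bi-LSTM block uses vector-valued states and learned weights, and this is not the configuration of the proposed model.

```python
"""Minimal bidirectional LSTM over a scalar sequence (hidden size 1) in pure Python."""
import math
from typing import Dict, List, Tuple

# Each gate maps to (w_x, w_h, b): pre-activation = w_x * x_t + w_h * h_{t-1} + b.
GateWeights = Dict[str, Tuple[float, float, float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h: float, c: float, w: GateWeights) -> Tuple[float, float]:
    """One LSTM time step; returns the new hidden state h_t and cell state c_t."""
    def pre(gate: str) -> float:
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h + b

    i = sigmoid(pre("i"))    # input gate
    f = sigmoid(pre("f"))    # forget gate
    o = sigmoid(pre("o"))    # output gate
    g = math.tanh(pre("g"))  # candidate cell update
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(xs: List[float], w: GateWeights) -> List[float]:
    """Run the LSTM left to right from zero states; return the hidden state at each step."""
    h, c = 0.0, 0.0
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        hs.append(h)
    return hs


def bilstm(xs: List[float], w_fwd: GateWeights, w_bwd: GateWeights) -> List[Tuple[float, float]]:
    """Pair the forward and backward hidden states for every time step."""
    fwd = run_lstm(xs, w_fwd)
    bwd = run_lstm(xs[::-1], w_bwd)[::-1]  # reverse back into the original time order
    return list(zip(fwd, bwd))


# Hypothetical weights for illustration only; a trained model would learn these.
W_FWD: GateWeights = {"i": (0.5, 0.1, 0.0), "f": (0.3, 0.2, 0.1), "o": (0.4, -0.1, 0.0), "g": (0.8, 0.3, 0.0)}
W_BWD: GateWeights = {"i": (0.6, -0.2, 0.1), "f": (0.2, 0.4, 0.2), "o": (0.5, 0.1, -0.1), "g": (0.7, -0.3, 0.0)}

if __name__ == "__main__":
    xs = [0.1, -0.4, 0.9, 0.2]
    for t, (h_f, h_b) in enumerate(bilstm(xs, W_FWD, W_BWD)):
        print(f"t={t}: forward={h_f:+.4f} backward={h_b:+.4f}")
```

Changing only the last input value alters the backward state at t = 0 but leaves the forward state at t = 0 untouched. That dependence on future samples is the extra context the backward direction contributes.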
The remainder of this paper is organized as follows. Section 2 describes the data acquisition conditions and the diagnostic process for the distribution box in the hot rolling process. Section 3 introduces the fundamental algorithms and the construction of the proposed model. Section 4 presents the performance of the proposed model and the comparison results. Section 5 presents the conclusions.