1. Introduction
Diesel engines are widely used in construction machinery, automobiles, vessels, and other production areas. Owing to their complex structure and long-term operation in harsh environments, various failures inevitably occur, leading to prolonged equipment downtime and increased maintenance costs. Consequently, diesel engine fault diagnosis has become an active research direction, and various intelligent data-driven approaches have been suggested, achieving satisfactory research outcomes [1,2,3].
Fault diagnosis approaches based on machine learning generally comprise three stages: feature extraction, feature dimensionality reduction, and pattern recognition. For feature extraction, time-frequency analysis of the vibration signal is commonly employed, the most widely implemented algorithms being the Hilbert-Huang Transform (HHT), Wavelet Transform (WT), and Short-Time Fourier Transform (STFT) [4,5,6], which aim to extract feature parameters in both the time and frequency domains [7,8]. For feature dimensionality reduction, principal component analysis (PCA), kernel principal component analysis (k-PCA), and autoencoders are generally implemented [9,10,11]. For pattern recognition, support vector machine (SVM), random forest (RF), and k-nearest neighbor (k-NN) methods are commonly utilized [12,13,14]. These feature extraction and dimensionality reduction techniques are combined with classifiers to recognize the patterns of various failures, and various diagnostic methods for detecting faults in diesel engines have been suggested on this basis [15,16,17].
Conventional intelligent diagnosis methods combine a small set of machine learning techniques and can achieve high diagnosis accuracy. However, they are accompanied by the following restrictions:
(1) Due to the large amount of noise and interference in sampled vibration signals, weak fault signals are generally difficult to detect, so more advanced signal preprocessing techniques must be employed [18].
(2) If the feature parameters are set improperly, or rely heavily on expert prior knowledge [19,20], the accuracy of fault diagnosis is affected.
As artificial intelligence technology rapidly advances, deep learning has gradually become an efficient method that overcomes the deficiencies of conventional fault diagnosis approaches; a key advantage is its ability to extract useful fault features directly from raw data. Hence, the deep belief network (DBN), convolutional neural network (CNN), and long short-term memory (LSTM) network are commonly implemented to diagnose faults in mechanical applications [21,22]. Xu et al. [23] implemented a DBN to diagnose air-path faults of turbofan engines with high classification accuracy. Zhou et al. [24] optimized the parameters of a CNN by sorting the input measurement parameters and applied it to detect gas-circuit faults of an engine with a relatively good diagnostic effect. Han et al. [25] constructed a data-driven fault prediction model using an LSTM network, applied it to marine diesel engines, and obtained better fault prediction results.
However, the problems expressed below still exist and need to be dealt with:
(1) Since diesel engines work in complex environments for a long time, weak fault features are masked by stronger noise and interference signals, which greatly increases the difficulty of direct fault diagnosis using one-dimensional vibration signals.
(2) The number of parameters in network training increases with the number of hidden layers. Training a multi-layer deep network model from scratch requires not only a large number of labeled samples but also considerable computational power and time. In practical engineering applications, however, the equipment mostly runs in a normal, steady condition with few failures, which makes it almost impossible to collect a large amount of data samples with fault labels.
(3) Hyperparameter optimization and selection for deep learning network models consumes a lot of time in training and thus directly affects model performance.
To address the issues mentioned above, transfer learning (TL) has been introduced into mechanical fault diagnosis. Xu et al. [26] suggested an approach employing transfer component analysis for fault diagnosis under various working conditions. Zhao et al. [27] realized cross-domain aero-engine fault diagnosis by combining extreme learning machines with the TL method. Xiong et al. [28] suggested a methodology utilizing stacked autoencoders and feature transfer to diagnose diesel engine faults. However, the existing TL research still requires training and optimizing the deep learning network model, which restricts its application in engineering practice.
To resolve the issues presented above, this manuscript proposes a fault diagnosis methodology that combines optimized variational mode decomposition (VMD) with deep transfer learning (DTL). Firstly, the VMD decomposition level K is optimized using dispersion entropy to reduce the noise in the original vibration signal. Then, the noise-reduced vibration signal is converted into a two-dimensional time-frequency map by the STFT. To lower both the training time and the computational complexity of the deep learning model, a TL method based on the ResNet18 network model is proposed, which can effectively extract useful features from the two-dimensional time-frequency map and quickly achieve accurate fault classification. Finally, the experiments conducted demonstrate both better-extracted features and higher diagnostic accuracy.
The key contributions of the manuscript are expressed as follows:
(1) The optimized VMD can not only retain weak fault signals, but also better exclude noise and interference from the original vibration signal, resulting in successful noise reduction.
(2) By employing the ResNet18 network model pre-trained on the ImageNet dataset as the transfer object and fine-tuning its network structure, the training time of the network is effectively reduced and the accuracy of fault identification is improved.
(3) This research combines optimized VMD and DTL to extract and classify fault features of diesel engines in strong-noise environments. In experiments, the proposed method exhibited greatly improved performance and a practically usable form for fault diagnosis applications.
The remaining sections of the manuscript are organized as follows: Section 2 describes the noise reduction method based on the optimized VMD. Section 3 details the basic theory of time-frequency images and DTL models. Section 4 describes the fault diagnosis process for a diesel engine utilizing both optimized VMD and DTL. Section 5 analyzes and validates the proposed method through experiments. Section 6 concludes the manuscript. The Abbreviations section lists the commonly used symbols.
2. A Noise Reduction Approach Utilizing the Optimized VMD
Variational mode decomposition (VMD), a novel signal processing methodology for non-stationary and non-linear signals, was proposed in [29]. By solving a constrained variational problem, it replaces earlier methods such as empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), and local mean decomposition (LMD) [30], thereby alleviating the modal aliasing and end-effect problems of these conventional decomposition methods. However, the VMD itself also has certain limitations: its decomposition level K must be set manually, which introduces a degree of subjectivity and randomness that affects the modal decomposition. Therefore, this paper adopts a dispersion-entropy-based method for optimizing the VMD decomposition level K, realizing adaptive decomposition and noise-reducing reconstruction of the signal.
2.1. The VMD
The function of the VMD is to establish and solve a variational problem that decomposes the input signal $f$ into $K$ intrinsic mode function (IMF) components $\{u_k\}$. Both the center frequency and the bandwidth of each component are revised continuously through an iterative process, under the constraint that the sum of the components equals the input signal, until the set of IMF components minimizing the sum of the estimated bandwidths is obtained, as follows:
(1) The Hilbert transform is applied to obtain the unilateral spectrum of the analytic signal of each IMF component, namely,
$$\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t);$$
then, each mode's spectrum is shifted to its corresponding baseband by mixing with an exponential tuned to the estimated center frequency:
$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t},$$
where $\delta(t)$ denotes the impulse function, the sign "$*$" represents the convolution calculation, and $\omega_k$ represents the center frequency of the $k$-th IMF component;
(2) The squared $L^2$ norm of the gradient of the demodulated signal is computed to estimate the bandwidth of each IMF component. The constrained variational model is then expressed by
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t),$$
where $\|\cdot\|_2$ denotes the $L^2$ norm, $u_k$ is the $k$-th IMF component, $f(t)$ represents the original time-domain signal, and $\omega_k$ represents the central frequency of each component;
(3) A quadratic penalty factor $\alpha$ and a Lagrange multiplier $\lambda$ are introduced to solve the variational problem. The constrained variational problem is transformed into an unconstrained one, whose augmented Lagrangian is expressed by
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle,$$
where $\alpha$, the quadratic penalty factor, is usually chosen as a large positive number to enhance the accuracy of the reconstructed signal, and $\lambda(t)$, the Lagrange multiplier, guarantees the strictness of the constraint. The alternate direction method of multipliers (ADMM) is employed to update $u_k$, $\omega_k$, and $\lambda$ in turn, seeking the saddle point of the augmented Lagrangian; that is, the constrained variational model is optimized to realize the modal decomposition.
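The ADMM updates described above can be sketched compactly. The following is a minimal, illustrative numpy implementation of the VMD loop (Wiener-filter-like mode update, spectral-centroid center-frequency update, and dual ascent on the multiplier); it omits the signal mirror extension and convergence tolerance used in full implementations, and every parameter value shown is an illustrative assumption, not a setting prescribed by this manuscript:

```python
import numpy as np

def vmd(f, K=2, alpha=2000.0, tau=0.1, n_iter=300):
    """Minimal VMD via ADMM in the Fourier domain (no mirror extension)."""
    N = len(f)
    f_hat = np.fft.fftshift(np.fft.fft(f))        # full, shifted input spectrum
    freqs = np.fft.fftshift(np.fft.fftfreq(N))    # normalized freqs in [-0.5, 0.5)
    u_hat = np.zeros((K, N), dtype=complex)       # mode spectra
    omega = 0.5 * (np.arange(K) + 0.5) / K        # spread initial center frequencies
    lam_hat = np.zeros(N, dtype=complex)          # Lagrange multiplier spectrum
    for _ in range(n_iter):
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # mode update: Wiener-like filter from the augmented Lagrangian
            u_hat[k] = (f_hat - others + lam_hat / 2) / \
                       (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # center frequency: spectral centroid over positive frequencies
            p = np.abs(u_hat[k, N // 2:]) ** 2
            omega[k] = (freqs[N // 2:] @ p) / p.sum()
        # dual ascent enforcing sum(u_k) = f
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
    u = np.real(np.fft.ifft(np.fft.ifftshift(u_hat, axes=-1), axis=-1))
    return u, omega
```

In a quick check on a two-tone test signal, the estimated center frequencies settle near the normalized tone frequencies and the sum of the recovered modes closely tracks the input.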
2.2. Dispersion Entropy
The principle of VMD signal decomposition was introduced in the previous section. However, the VMD decomposition level, K, must be set manually, and K directly affects the decomposition of a complex signal. If K is set too large, over-decomposition occurs and redundant components are obtained; if K is set too small, under-decomposition occurs and the useful signal cannot be effectively separated. In this paper, K is optimized by employing dispersion entropy so that the decomposition of the signal is conducted adaptively.
Dispersion entropy (DE) is a novel method proposed by Rostaghi and Azami in 2016 to measure the complexity of time series [31]. It remedies the fact that conventional permutation entropy does not consider the magnitude of amplitudes, offering better stability and faster calculation. The calculation steps are as follows:
(1) The cumulative distribution function of the normal distribution is selected as the nonlinear normalization function. The sequence $x = \{x_1, x_2, \ldots, x_N\}$ is normalized using the mean and standard deviation of $x$ as parameters, yielding $y = \{y_1, y_2, \ldots, y_N\}$ with $y_j \in (0,1)$, where $N$ represents the sequence length.
(2) Map $y$ to integers in the range $[1, c]$ through the linear mapping
$$z_j^{c} = \mathrm{int}(c\, y_j + 0.5),$$
where $c$ and $\mathrm{int}$ represent the number of classes and rounding, respectively.
(3) Compute the embedding vectors $z_i^{m,c} = \{z_i^c, z_{i+d}^c, \ldots, z_{i+(m-1)d}^c\}$ and map each to a dispersion pattern $\pi_{v_0 v_1 \cdots v_{m-1}}$ ($v = 1, 2, \ldots, c$); then compute the probability $p$ of each dispersion pattern, defined by
$$p(\pi_{v_0 v_1 \cdots v_{m-1}}) = \frac{\mathrm{Number}(\pi_{v_0 v_1 \cdots v_{m-1}})}{N-(m-1)d},$$
where $\mathrm{Number}(\pi_{v_0 v_1 \cdots v_{m-1}})$ is the number of embedding vectors mapped to the pattern $\pi_{v_0 v_1 \cdots v_{m-1}}$, and $m$ and $d$ represent the embedding dimension and the time delay, respectively.
(4) The dispersion entropy of the original sequence is calculated by utilizing the definition of information entropy as follows:
$$DE(x, m, c, d) = -\sum_{\pi} p(\pi)\, \ln p(\pi).$$
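Steps (1)-(4) can be sketched in code. The following is an illustrative Python implementation; the defaults for the class count c, embedding dimension m, and delay d are assumptions for demonstration, not values prescribed by this manuscript:

```python
import numpy as np
from scipy.stats import norm

def dispersion_entropy(x, c=6, m=3, d=1):
    """Dispersion entropy of a 1-D series, following steps (1)-(4)."""
    x = np.asarray(x, dtype=float)
    # (1) normal-CDF normalization to y in (0, 1)
    y = norm.cdf(x, loc=x.mean(), scale=x.std())
    # (2) linear mapping to classes 1..c: z_j = int(c*y_j + 0.5)
    z = np.clip(np.floor(c * y + 0.5).astype(int), 1, c)
    # (3) count the dispersion patterns of the embedding vectors
    n = len(x) - (m - 1) * d
    counts = {}
    for i in range(n):
        pattern = tuple(z[i + k * d] for k in range(m))
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    # (4) Shannon entropy over the observed patterns
    return float(-np.sum(p * np.log(p)))
```

As expected from the theory, a broadband random series spreads its embedding vectors over many patterns and yields a higher DE than a smooth periodic series of the same length.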
According to this calculation method, the dispersion entropy reaches its maximum value when all dispersion patterns are equally probable: the larger the dispersion entropy, the greater the complexity of the time series. This manuscript therefore employs the dispersion entropy to optimize the VMD decomposition level K. The turning point of the dispersion entropy change across the IMF components of successive VMD decompositions is used to determine K. Then, the IMF components carrying useful information are selected to reconstruct the signal, which provides the noise reduction of the vibration signal.
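The turning-point rule can be sketched as below. Since the manuscript does not spell out the exact detection rule, this helper is a hypothetical illustration (its name and logic are the editor's assumption): given the dispersion entropy computed for increasing decomposition levels, it returns the level at which the sequence first changes direction:

```python
import numpy as np

def first_turning_point(de_values):
    """1-based level K at which the DE sequence first changes direction.

    de_values: dispersion entropy evaluated for K = 1, 2, 3, ...
    (hypothetical selection rule, assumed for illustration)
    """
    diffs = np.diff(np.asarray(de_values, dtype=float))
    for i in range(1, len(diffs)):
        if diffs[i] * diffs[i - 1] < 0:   # sign change => turning point
            return i + 1                  # the level sitting at the vertex
    return len(de_values)                 # monotone: fall back to the largest K tried
```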
4. Fault Diagnosis of the Diesel Engine Employing Both Optimized VMD and DTL
Figure 7 depicts the fault diagnosis process utilizing both optimized VMD and DTL, which consists of data preprocessing, transformation to a two-dimensional time-frequency map, network model training, and fault classification. More detailed descriptions are given below.
Step 1 Preprocess data. The VMD algorithm decomposes the original vibration signal, and the dispersion entropy of each IMF component is computed. The decomposition level at which the change of dispersion entropy first exhibits a turning point is taken as the optimal decomposition level. The IMF components carrying valuable information are screened out according to their dispersion entropy values, and then superimposed to reconstruct a noise-reduced signal.
Step 2 Transform to the two-dimensional time-frequency map. Taking the noise-reduced signal as input, the STFT algorithm is employed to produce the corresponding data set of two-dimensional time-frequency graphs. The data set is then split into training, test, and validation sets.
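As an illustration of this step, the following sketch uses scipy.signal.stft to turn a 1-D signal into a normalized 2-D magnitude map suitable for saving as an image; the sampling rate, test tones, window length, and log scaling are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy import signal

# stand-in for a noise-reduced vibration signal (assumed sampling rate)
fs = 12000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

# STFT: rows are frequency bins, columns are time frames
f, frames, Zxx = signal.stft(x, fs=fs, nperseg=256, noverlap=128)
image = np.abs(Zxx)                                   # magnitude spectrogram
image = np.log1p(image)                               # compress dynamic range
image = (image - image.min()) / (image.max() - image.min())  # scale to [0, 1]
```

The resulting `image` array can then be rendered or saved as a picture and fed to the 2-D network input in the next step.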
Step 3 Train the network model. The parameters of the ResNet18 network model pre-trained on ImageNet are utilized as the transfer object. The parameters of the conv1 layer and the residual layers are frozen, and the fully connected layer is fine-tuned. The training and test data sets are imported into the pre-trained model, and the fine-tuned network is retained as the new DTL-ResNet18 network model.
Step 4 Classify faults. The new test and validation data sets are imported into the trained DTL-ResNet18 network model, and the Softmax activation function is utilized to obtain the final fault diagnosis result.
6. Conclusions
This research proposed a fault diagnosis approach combining optimized VMD and DTL. The ResNet18 model pre-trained on the ImageNet samples was successfully transferred to diagnose diesel engine faults in a complex noise environment. The manuscript contributes the following:
(1) For the selection of the VMD decomposition level, a dispersion-entropy-based level selection method was adopted and a better noise reduction effect was attained.
(2) The STFT method was employed to convert the noise-reduced one-dimensional vibration signal into a two-dimensional time-frequency image, turning the feature extraction task into an image processing problem.
(3) A DTL-based approach was suggested, with the ResNet18 network selected as the transfer object and fine-tuned. The proposed approach can derive key fault features directly from two-dimensional time-frequency images and perform fault diagnosis, reducing the difficulty of manual feature extraction and the dependence on expert experience, and thereby greatly improving both diagnostic efficiency and accuracy.
The experiments showed that the DTL performed better in diagnosing faults of mechanical parts than the other network models. As artificial intelligence technology advances rapidly, DTL is expected to play a key role in various engineering applications, and the results of this research should provide useful reference value and help expand knowledge in this field.