Article

A Time–Frequency Image Quality Evaluation Method Based on Improved LIME

School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(7), 2917; https://doi.org/10.3390/app14072917
Submission received: 27 February 2024 / Revised: 22 March 2024 / Accepted: 27 March 2024 / Published: 29 March 2024

Abstract

Deep learning offers high efficiency, high accuracy, and low dependence on expert knowledge, and has become a popular method in rotating machinery fault diagnosis research. The time–frequency transform can show both the time and frequency characteristics of the vibration signal, so the time–frequency image is often used as the input of deep learning networks. At present, there are many time–frequency transform methods, and how to choose among them is a question worth discussing. This paper proposes a time–frequency image quality evaluation method based on improved local interpretable model-agnostic explanations (LIME). With the input of deep learning networks as the application background, this method evaluates the time–frequency image quality of rotating machinery vibration signals from two aspects: the accuracy of the diagnosis results and the consistency of the interpretation results with prior knowledge. The feasibility of the proposed evaluation method is verified by experiments on a measured data set, and engineers’ trust in the deep learning model is improved.

1. Introduction

Rotating machinery is an important part of mechanical systems, and monitoring its health status is of great significance to the overall operation and maintenance of mechanical systems. Fault diagnosis is an important means to establish the relationship between monitoring data and machine conditions [1]. Since the 21st century, deep learning methods based on artificial intelligence have been widely used in the fault diagnosis of rotating machinery for their intelligent and efficient characteristics [2,3,4,5]. In these studies, most researchers focused on the design of the network structure, to some extent ignoring the important influence of the input on the diagnosis outcome. But the input is crucial for network training [6,7,8]. Not only does the input affect the time required for network training, but how well the data set describes the relevant features also affects the accuracy of the network results [9,10].
Vibration signals are the most widely used monitoring data in rotating machinery fault diagnosis. In engineering practice, mechanical equipment has many parts and complex structures, and its operating conditions often change with time, so its vibration signal changes nonlinearly. Compared with simple time domain data or frequency domain data, time–frequency image data obtained through time–frequency transformation can express both global and local features of signals in the time and frequency domains and accurately reveal changes in vibration signals over time, so these data are widely used as the input of deep learning models [11]. Since the 1940s, the pace of development of time–frequency transform technology has never diminished [12,13], and in the last decade, it has seen even more rapid development. Several new time–frequency transform methods have been proposed to achieve higher time–frequency aggregation [14,15,16]. This goal is undoubtedly important in time–frequency transformation research, but when the result is used as an input to deep learning for fault diagnosis, this goal does not always lead to good results; images with higher time–frequency aggregation as the input do not necessarily lead to higher diagnostic accuracy. This is because fault information is often contained in the sidebands of the characteristic frequency. Whether it is a preprocessing method for the time–frequency transform, represented by mode decomposition, or a postprocessing method, represented by energy rearrangement, the time–frequency image loses part of the fault information while improving the time–frequency aggregation. Therefore, in the field of intelligent diagnosis of rotating machinery, how to evaluate the quality of time–frequency images becomes an urgent problem to be solved.
Under the same network structure, diagnostic accuracy is an important index for measuring input quality. However, past examples show that high accuracy does not always mean a machine learning model is truly reliable. Sometimes, a model’s predictions are highly accurate but for the wrong reasons [17]. There is a conflict between the predictive accuracy and the descriptive accuracy of machine learning, and the decision-making process of machine learning often cannot be described in a clear logical language. This limits the expert’s understanding of the model’s behavior. As a result, when deep learning models are applied to decision making in real engineering problems, people often cannot trust the models on the basis of diagnostic accuracy alone. Only when the model is explained and the interpretation results are consistent with prior knowledge can the engineers’ trust in the model be improved [18,19].
Aiming to solve this problem, this paper proposes an improved LIME method to evaluate the time–frequency image quality of vibration signals of rotating machinery from two aspects: the accuracy of the diagnosis results and the consistency of the interpretation results with prior knowledge. The rest of this paper is organized as follows: The second section describes the relevant theories and technical routes. The third section describes the experiment and analyzes the results. The fourth section outlines the conclusions and future prospects.

2. Materials and Methods

2.1. LIME

LIME (local interpretable model-agnostic explanations) is an interpretive technique for deep learning. By learning a locally interpretable model, the classifier’s predictions are interpreted in an explainable and trustworthy way [20]. The basic principle of LIME is shown in Figure 1. LIME is a general paradigm of interpretable methods rather than an algorithm, and its process can be expressed as follows. In order to explain a complex model f, a sample x is selected first, and several samples {z1, z2, …} are generated by perturbing the sample x to form a perturbed sample data set Z, and the corresponding predicted values {f(z1), f(z2), …} of the original model f are obtained. Next, interpretable features are extracted from samples {z1, z2, …} to obtain interpretable samples {z1′, z2′, …}. Then, an interpretable model g is selected and trained with {z1′, z2′, …}, which are taken as the input, and {f(z1), f(z2), …}, which are taken as the labels. Finally, a model with similar decision-making ability to the original model can be obtained in the neighborhood of sample x. By describing the interpretable model g, the interpretation of the original model f can be obtained.
The explanation produced by LIME is obtained by the following:
$$\xi(x) = \underset{g \in G}{\arg\min}\; L(f, g, \pi_x) + \Omega(g)$$
where f represents the original model, that is, the model that needs to be explained. g is an interpretable model selected for the interpretation of the original model. G is a class of potentially interpretable models. π_x is the weight function, which measures proximity to x.
The expression consists of two parts: a fidelity function L and a complexity measure Ω.
The fidelity function is defined as follows:
$$L(f, g, \pi_x) = \sum_{z, z' \in Z} \pi_x(z)\,\big(f(z) - g(z')\big)^2$$
where z is a perturbed sample, z′ is the interpretable version of z, and Z is the data set of perturbed samples. π_x is an exponential kernel defined on a distance function D; for images, D is the L2 distance, and σ is the kernel width:
$$\pi_x(z) = \exp\!\big(-D(x, z)^2 / \sigma^2\big)$$
The complexity measure Ω is a measure of complexity (as opposed to interpretability) of the explanation g. For linear models, Ω is the number of weights.
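The optimization above amounts to a weighted least-squares fit of g over the perturbed samples. A minimal sketch in Python (the `predict` and `featurize` callables, the Gaussian perturbation scale, and the omission of an explicit Ω(g) term, handled here by fixing a small feature count, are all illustrative assumptions):

```python
import numpy as np

def lime_explain(x, predict, featurize, n_samples=500, sigma=1.0, rng=None):
    """Fit a local linear surrogate g to a black-box model f around sample x.

    predict(z)   -> scalar prediction f(z) of the original model
    featurize(z) -> 1-D vector of interpretable features z'
    """
    rng = np.random.default_rng(rng)
    Z = [x + rng.normal(scale=0.1, size=x.shape) for _ in range(n_samples)]
    y = np.array([predict(z) for z in Z])              # labels f(z_i)
    X = np.array([featurize(z) for z in Z])            # interpretable samples z_i'
    # exponential kernel pi_x(z) = exp(-D(x, z)^2 / sigma^2), D = L2 distance
    w = np.exp(-np.array([np.sum((x - z) ** 2) for z in Z]) / sigma**2)
    # weighted least squares: minimize sum_i w_i * (f(z_i) - g(z_i'))^2
    Xb = np.hstack([X, np.ones((len(Z), 1))])          # append bias column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return coef[:-1], coef[-1]                         # feature weights, bias
```

The returned weight vector is the description of g from which the interpretation of f is read off.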

2.2. GLCM

Textural features are a concept in computer vision that is mainly used to describe the surface properties of images. The GLCM (gray-level co-occurrence matrix) is a method proposed by Haralick et al. in 1973 to describe the texture features of gray images [21]. Texture is formed by the repeated occurrence of gray levels at spatial positions and expresses the relationship between pixels separated by a certain distance. The GLCM expresses the texture features of an image through the statistics of the joint gray distribution of pixel pairs; it is defined as the probability that a pixel (x, y) with gray level i is paired with a pixel at distance d in direction θ with gray level j:
$$\mathrm{GLCM} = [P]_{G \times G}$$
$$P(i, j \mid d, \theta) = \#\{(x, y) \mid h(x, y) = i,\; h(x + d_x, y + d_y) = j\}$$
where d is the relative distance measured in pixels, and dx and dy are the components of this distance in the x and y directions, respectively. θ is the relative direction in degrees, usually taking one of the four directions 0°, 45°, 90°, and 135°, each counted bidirectionally. x and y are the pixel coordinates in the image, and h is the gray value at those coordinates.
The size of the GLCM is determined by the maximum grayscale level G of the image. The default grayscale range of a grayscale image is 0–255, so its maximum grayscale level is 256. For an image of size N × N, the expected amount of computation when calculating the gray co-occurrence matrix is about N² × G². For example, if the image size is 128 × 128, the basic calculation requires on the order of a billion operations (128² × 256² ≈ 1.07 × 10⁹). To reduce the amount of computation, in practical applications, it is generally necessary to reduce the gray level of the image through regularization while retaining the image texture characteristics as much as possible:
$$F(x, y) = \mathrm{INT}\big(f(x, y) \times G / G_{\max}\big) + 1$$
where F is the regularized image, INT is the integer function, Gmax is the maximum gray level of the original image, and G is the maximum gray level of the target, which is generally 8.
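The quantization and co-occurrence counting above can be sketched as follows (a single offset d = 1 at θ = 0° and G = 8 target levels are assumed for illustration; counts are normalized to joint probabilities, and levels are indexed from 0 rather than 1):

```python
import numpy as np

def quantize(img, g=8):
    """Reduce the gray levels of img to g levels (values 0 .. g-1)."""
    gmax = img.max() if img.max() > 0 else 1
    return np.minimum((img.astype(float) * g / gmax).astype(int), g - 1)

def glcm(img, g=8, dx=1, dy=0):
    """Count co-occurrences of gray levels i and j at pixel offset (dy, dx)."""
    q = quantize(img, g)
    h, w = q.shape
    p = np.zeros((g, g))
    for y in range(h - dy):
        for x in range(w - dx):
            p[q[y, x], q[y + dy, x + dx]] += 1
    return p / p.sum()          # normalize counts to joint probabilities
```

In practice a library routine (e.g. scikit-image's `graycomatrix`) would replace the explicit loops, but the logic is the same.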

2.3. Technical Route

This paper proposes a time–frequency image quality evaluation method based on improved LIME to evaluate the time–frequency image quality of rotating machinery vibration signals from two aspects: the accuracy of the diagnosis results and the consistency of the interpretation results with prior knowledge. The technical route is shown in Figure 2. The technical route consists of four main parts that are described below.

2.3.1. Time–Frequency Transform

The acquired vibration signal is transformed by a time–frequency transform to obtain a time–frequency matrix, and the time–frequency image is obtained after grayscale processing. There are many kinds of time–frequency transform methods, and it is unrealistic to evaluate all of them individually. This paper selects five common time–frequency transform methods for evaluation and comparison, as follows.
1. Short-time Fourier transform
The short-time Fourier transform (STFT) is one of the earliest time–frequency transform methods. The STFT, which approximates the instantaneous frequency of time-varying signals through window processing, is a common means of analyzing nonstationary signals. For the signal x(t), the STFT is defined as follows:
$$\mathrm{STFT}(t, \omega) = \int x(\tau)\, H^*(\tau - t)\, e^{-j\omega\tau}\, d\tau$$
where ω is the angular frequency and H is the window function.
2. Wavelet transform
The wavelet transform adds time translation and scaling parameters to the basis function and has an adaptive, microscope-like ability to resolve frequency changes, which often makes it better at reflecting the characteristics of time-varying signals. For the signal x(t), its wavelet transform is defined as follows:
$$\mathrm{WT}(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi^*\!\left(\frac{t - b}{a}\right) dt$$
3. Wigner–Ville transform
The Wigner–Ville transform is also one of the earliest time–frequency transform methods; it was first applied in quantum mechanics and later introduced into signal analysis. It has the advantages of symmetry, time-shift invariance, and frequency-modulation invariance. For the signal x(t), its Wigner–Ville transform is defined as follows:
$$\mathrm{WVT}(t, \omega) = \int x(t + \tau/2)\, x^*(t - \tau/2)\, e^{-j 2\pi \omega \tau}\, d\tau$$
4. Empirical mode decomposition
Since 2000, as a data-adaptive processing technique, empirical mode decomposition (EMD) has occupied a pivotal position in the field of nonstationary signal processing. As a representative preprocessing method for the time–frequency transform, EMD decomposes the signal according to the timescale characteristics of the data themselves, without presetting a basis function. For the signal x(t), its EMD is as follows:
$$x(t) = \sum_{n=1}^{N} \mathrm{imf}_n(t) + \mathrm{res}(t)$$
where imf_n is the nth-order intrinsic mode function and res is the residual.
5. Synchrosqueezing transform
In 2011, Daubechies et al. proposed the synchrosqueezing transform (SST) as a postprocessing step for the time–frequency transform [22]. By compressing the values in the frequency interval [ω_i − Δω, ω_i + Δω] toward ω_i along the frequency direction of the time–frequency matrix, the time–frequency energy aggregation is improved. Its process can be expressed as follows:
$$\mathrm{SST}(\omega_i) = \sum_{\omega_l : \,|\omega_l - \omega_i| \le \Delta\omega / 2} \mathrm{TF}_x(\omega_l)$$
where TF is the time–frequency matrix obtained by the time–frequency transformation of the signal x(t).
A one-dimensional vibration signal can be transformed into a two-dimensional time–frequency matrix by using a time–frequency transform method. However, the time–frequency matrices formed by different methods may be very different in amplitude range, as shown in Figure 3. Even for the same method, the amplitude range differs under different preset parameters. As a result, when the same time–frequency transform method with different preset parameters or different time–frequency transform methods are adopted, the time–frequency images generated from the same signal differ greatly in color performance. Therefore, in engineering applications, the time–frequency matrix is usually transformed into a grayscale image by grayscale processing before being used as the input of the model. A grayscale image is a single-channel image form, and compared with three-channel RGB images, it has great advantages in transmission, storage, and computing efficiency.
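As a sketch of this step, the following computes an STFT with SciPy and min–max maps the magnitude matrix onto 8-bit gray levels (the window length and the min–max mapping are assumptions, since the paper does not specify its grayscale procedure):

```python
import numpy as np
from scipy.signal import stft

def tf_gray_image(x, fs, nperseg=256):
    """Time-frequency matrix via STFT, min-max scaled to an 8-bit gray image."""
    _, _, tf = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(tf)
    span = mag.max() - mag.min()
    # min-max normalization removes the method-dependent amplitude range,
    # so images produced by different transforms become comparable
    gray = (mag - mag.min()) / span if span > 0 else np.zeros_like(mag)
    return np.round(gray * 255).astype(np.uint8)
```

The same normalization applies to the matrix produced by any of the five transforms above.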

2.3.2. Neural Network

This paper focuses on the influence of the input on model decision making, so the model structure is not described in much detail. The classic VGG network, with its simple structure and wide applicability, is selected as the basic diagnosis model. VGG is a deep convolutional neural network architecture developed by Oxford University’s Visual Geometry Group and Google DeepMind, which took first place in the localization task and second place in the classification task of the ImageNet Large-Scale Visual Recognition Challenge in 2014. Due to its simple and powerful architecture, VGG is widely used as a backbone in detection frameworks such as Fast R-CNN and SSD. An 8-layer VGG structure is built, and the network structure diagram is shown in Figure 4. The network contains 6 convolution layers with a 3 × 3 kernel size, and the number of kernels is increased layer by layer. After every two convolutional layers, a max pooling layer and a batch normalization layer are connected to keep the activations well scaled. The classical ReLU activation function is selected to ensure the nonlinearity of the network, and at the end of the network, two fully connected layers make the final classification decision.
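A PyTorch sketch of the 8-layer structure described above (the channel progression and the width of the first fully connected layer are assumptions, since Figure 4 is not reproduced here):

```python
import torch
import torch.nn as nn

class VGG8(nn.Module):
    """6 conv layers (3x3), a max-pool + batch-norm after every two, 2 FC layers."""
    def __init__(self, n_classes=5):
        super().__init__()
        layers, in_ch = [], 1                      # single-channel grayscale input
        for out_ch in (16, 32, 64):                # assumed channel progression
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
                       nn.MaxPool2d(2), nn.BatchNorm2d(out_ch)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(           # 128 -> 64 -> 32 -> 16 after 3 pools
            nn.Flatten(), nn.Linear(64 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, x):
        return self.classifier(self.features(x))
```

With a 128 × 128 grayscale input, three pooling stages reduce the feature maps to 16 × 16 before the fully connected classifier.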

2.3.3. Improved LIME

The improvements to LIME in this paper concern two aspects: how the sample is perturbed and how features are selected for the interpretable model used for interpretation.
  • How to perturb the sample
To learn the local behavior of f near the sample x, a sample set Z needs to be generated in the neighborhood by perturbing the sample x. The perturbation should ensure that the original model f assigns the samples in Z to at least two distinct classes, as this ensures that the selected model g has a meaningful solution. For image samples, this goal is usually achieved by adding salt-and-pepper noise, that is, setting the gray value of random pixels to 0. This works because the features of ordinary images are usually local, and adding salt-and-pepper noise effectively destroys these local features. However, for time–frequency images generated by vibration signals, in the frequency direction, fault information exists in multiple frequency bands and in their sidebands in the form of modulation, and in the time direction, the fault characteristics exist over the whole period with a certain periodicity. Therefore, the features of time–frequency images are global features, and adding local noise often fails to achieve an effective disturbance, so global noise must be added instead.
Taking the time–frequency image of the vibration signal in a healthy state as the target sample x, global Gaussian noise with a variance of 0–1 is added to simulate the impact response generated in the vibration signal when root cracks, missing teeth, and broken teeth occur on the time–frequency image, and global multiplicative noise with a variance of 0–1 is added to simulate the impact on the time–frequency image when tooth surface wear occurs.
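The two global perturbations can be sketched as follows (the image is assumed to be scaled to [0, 1], and a fresh variance is drawn from [0, 1] per call; clipping back into range is an implementation choice):

```python
import numpy as np

def perturb(img, rng=None):
    """Globally perturb a [0, 1] grayscale time-frequency image.

    Returns one additive-Gaussian-noise version (simulating the impact
    responses of root crack, missing tooth, and broken tooth faults) and one
    multiplicative-noise version (simulating tooth surface wear).
    """
    rng = np.random.default_rng(rng)
    var = rng.uniform(0.0, 1.0)                    # noise variance drawn from [0, 1]
    additive = img + rng.normal(0.0, np.sqrt(var), img.shape)
    multiplicative = img * (1.0 + rng.normal(0.0, np.sqrt(var), img.shape))
    return np.clip(additive, 0.0, 1.0), np.clip(multiplicative, 0.0, 1.0)
```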
  • How to select features
The linear model g can be expressed as follows:
$$g(X) = W \times X + b$$
$$W = [w_1 \; w_2 \; \cdots \; w_n]$$
$$X = [x_1 \; x_2 \; \cdots \; x_n]$$
where W is the coefficient matrix, b is the bias, X is the feature matrix, and n is the number of features. The larger the absolute value of a coefficient in the coefficient matrix, the more important the corresponding feature is in the network’s decision making. For example, when weight w1 is very large and weight w2 is very small, the interpretation of the original model f through the interpretable model g can be expressed as follows: when the model makes a decision, it focuses on feature x1 but not on feature x2.
To explain the local behavior of f through g, in addition to requiring that the interpretable model g is simple enough to be understood, it is also necessary that the features of g are understandable. The understandability of input features includes two aspects: the feature has practical significance to the task objective and the number of features must be limited. Usually, for image classification tasks, LIME adopts the superpixel segmentation of the original image as a new feature, because these superpixels are often of practical significance. However, for the time–frequency image of the vibration signal, its superpixel is not of practical significance, so other features need to be selected.
Two Haralick features generated from the gray-level co-occurrence matrix are selected to describe the texture of the time–frequency images:
Angular Second Moment is the feature that measures the roughness of image texture. The larger the value is, the more concentrated the distribution of elements in the grayscale co-occurrence matrix is near the diagonal line, and the more regular the texture change in the original image is. The expression is as follows:
$$F_{\mathrm{ASM}} = \sum_{i,j} \mathrm{GLCM}(i, j)^2$$
Contrast is a feature that measures the total amount of local gray change in an image. The larger the value, the richer and clearer the image texture. The expression is as follows:
$$F_{\mathrm{Con}} = \sum_{n=0}^{G-1} n^2 \sum_{|i - j| = n} \mathrm{GLCM}(i, j)$$
When a rotating machine fails, its characteristic frequency and the vicinity of its sideband change, which is reflected in the change in texture in the time–frequency image, so these two features are related features in the fault identification task.
In addition, two global features of the original image are selected, which are as follows:
The mean of a gray image is a characteristic quantity used to measure the overall brightness of an image. The larger the value, the higher the overall brightness of the image. Its expression is as follows:
$$F_{\mathrm{Mean}} = \frac{1}{S} \sum_{i,j} h(i, j)$$
where S is the total number of pixels in the image.
The variance of a gray image is a feature used to measure the dispersion degree of the image’s gray distribution. The larger the value is, the larger the gray distribution range of the image will be. Its expression is as follows:
$$F_{\mathrm{Var}} = \frac{1}{S} \sum_{i,j} \big(h(i, j) - F_{\mathrm{Mean}}\big)^2$$
In fault diagnosis tasks, the overall light and shade of the gray distribution of time–frequency images are not the focus that engineers want to pay attention to, because these features are more affected by factors such as speed rather than fault conditions. Therefore, these two features are irrelevant features in the fault identification task.
The above four features together form the feature matrix. To prevent the result from being affected by the large numerical gap between the features, the feature matrix is treated with 0–1 normalization.
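The four features and the 0–1 normalization of the feature matrix can be sketched as follows (the GLCM is passed in already normalized, computed as described in Section 2.2; function names are illustrative):

```python
import numpy as np

def features(img, glcm_p):
    """Feature vector [F_ASM, F_Con, F_Mean, F_Var] from a gray image and its
    normalized G x G gray-level co-occurrence matrix glcm_p."""
    g = glcm_p.shape[0]
    i, j = np.indices(glcm_p.shape)
    f_asm = np.sum(glcm_p ** 2)                              # angular second moment
    f_con = sum(n**2 * glcm_p[np.abs(i - j) == n].sum() for n in range(g))
    f_mean = img.mean()                                      # overall brightness
    f_var = img.var()                                        # gray-level dispersion
    return np.array([f_asm, f_con, f_mean, f_var])

def normalize_features(feature_matrix):
    """0-1 normalize each feature column across the perturbed-sample set."""
    fmin, fmax = feature_matrix.min(axis=0), feature_matrix.max(axis=0)
    return (feature_matrix - fmin) / (fmax - fmin + 1e-12)
```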

2.3.4. Quality Evaluation

The quantitative result of the quality evaluation in this paper is a quality score Q, and the higher the score, the higher the quality of the corresponding time–frequency image in the target task. It consists of accuracy A and consistency C:
$$Q = A - C$$
where accuracy A is the performance of the trained network on the test set. After the neural network is trained on the training set with preset hyperparameters, the network’s predicted classes for the test set data are compared with the test set labels. Accuracy is recorded as the proportion of predictions consistent with the labels out of the total number of samples. The interval of A is [0, 1], and the higher the value, the better the diagnostic performance.
Consistency C is used to assess the consistency of the model’s interpretation results with prior knowledge. The interpretation result of the model is represented as the coefficient matrix of the interpretable model g, and the prior knowledge is also represented in matrix form. Considering that consistency measures are more concerned with the consistency of trends than the consistency of values, consistency C is obtained by calculating the cosine distance between the absolute value of the coefficient matrix W of the interpretable model g and the prior matrix P, as follows:
$$C = 1 - \frac{|W| \cdot P}{\|W\| \, \|P\|} = 1 - \frac{\sum_{i=1}^{n} |W_i| \, P_i}{\sqrt{\sum_{i=1}^{n} W_i^2} \times \sqrt{\sum_{i=1}^{n} P_i^2}}$$
The prior matrix P is related to the feature matrix X. When X = [FASM FCon FMean FVar], P = [1 1 0 0]′. The interval of C is [0, 1], and the smaller the value, the smaller the difference, that is, the better the consistency.
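The score can be computed directly from the surrogate coefficients. A minimal sketch, taking Q as accuracy minus the cosine-distance consistency term, with the prior P = [1, 1, 0, 0] marking the texture features as relevant:

```python
import numpy as np

def quality_score(accuracy, w, p=(1.0, 1.0, 0.0, 0.0)):
    """Q = A - C, with C the cosine distance between |W| and the prior P."""
    w_abs = np.abs(np.asarray(w, dtype=float))
    p = np.asarray(p, dtype=float)
    cos_sim = (w_abs @ p) / (np.linalg.norm(w_abs) * np.linalg.norm(p))
    consistency = 1.0 - cos_sim        # smaller C means better consistency
    return accuracy - consistency
```

A model that leans on the texture features (W aligned with P) keeps its full accuracy as its score; one that leans on brightness features is penalized.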

3. Experiment and Analysis

3.1. Data Collection

In this paper, the proposed method is verified with measured data collected from the Drivetrain Diagnostics Simulator, as shown in Figure 5. The equipment simulates faults in engineering practice by replacing parts. In this paper, the simulated fault occurs at the sun gear in the planetary gearbox. The measured signals in five states are tested: a healthy state, root crack state, tooth surface wear state, broken tooth state, and missing tooth state. In each state, a total of 50 random data acquisition cycles are performed under 20–50 Hz speed and 0–20 V electronic damping load as the initial data. The sampling frequency of the system is 24 kHz, and the sampling time is 30 s.

3.2. Data Processing

The structure of the planetary gearbox is shown in Figure 6. According to the calculation, the ratio of the sun gear fault characteristic frequency to the spindle speed is 3.125, so under a 20–50 Hz spindle speed, the characteristic frequency does not exceed 200 Hz. Therefore, with the Nyquist sampling theorem still satisfied, the initial data are downsampled at a resampling frequency of 400 Hz to improve computational efficiency. Then, the data are sliced into 3 s segments to obtain 500 samples in each state, giving a total of 2500 samples. The five time–frequency transforms are applied to the samples, and the generated time–frequency matrices are converted to grayscale to obtain 128 × 128 pixel time–frequency images. The result is five input sets, each containing 2500 time–frequency images. Each set is divided into a training set and a test set at a 4:1 ratio. Finally, five one-to-one corresponding training sets and test sets are obtained; the size of each training set is 128 × 128 × 2000, and the size of each test set is 128 × 128 × 500. This process is shown in Table 1.
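The downsampling and slicing described above can be sketched with SciPy (splitting the factor 24 000 / 400 = 60 into two decimation stages, 6 × 10, is an implementation choice that keeps the anti-aliasing filters well behaved):

```python
import numpy as np
from scipy.signal import decimate

def make_samples(record, fs=24_000, fs_new=400, slice_s=3):
    """Downsample one acquisition record to fs_new, then cut it into
    non-overlapping slice_s-second samples (rows of the returned array)."""
    x = decimate(decimate(record, 6), 10)      # 24 kHz -> 4 kHz -> 400 Hz
    n = fs_new * slice_s                       # 1200 points per sample
    return x[: (len(x) // n) * n].reshape(-1, n)
```

Each 30 s record yields 10 samples, so 50 records per state give the 500 samples per state reported above.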

3.3. Experimental Result

A sample is randomly selected, and its original vibration signal is shown in Figure 7. The time–frequency images generated by the five time–frequency transform methods followed by grayscale processing are shown in Figure 8. As can be seen from the figure, the STFT image shows that the four frequency bands are relatively complete and the sidebands of each band are also relatively rich. The wavelet transform image performs better for the middle two frequency bands, but the other two frequency bands are fuzzy. The Wigner–Ville image is sharper, but the aliasing phenomenon is severe. The image of the signal decomposed by EMD ignores the lowest frequency component. The frequency bands of the SST image are the clearest, but the accompanying loss of most of the sideband components is a major issue.
The same network structure is trained with five input sets generated by five time–frequency methods, respectively. The design of network hyperparameters is shown in Table 2.
The improved LIME method is used to explain the trained network, and the interpretation results are shown in Figure 9. The Y-axis coordinate in the figure is the feature name, and the X-axis coordinate is the coefficient value of the trained model g. This value reflects how much the network depends on the corresponding feature when making a decision. The larger the value, the more attention the network pays to that feature when making a decision. It can be found that when the time–frequency images with rich texture features formed by STFT and wavelet transform are input into the network, the main basis of network classification is related to texture features. When the clearer time–frequency image formed by SST is used as the network input, the main basis of network classification is more inclined toward the irrelevant feature of image brightness.
The quality scores of the five time–frequency image types are calculated, as shown in Table 3. It can be seen that the classification accuracy of the network is highest when the time–frequency images obtained by STFT and wavelet transform are input, but the consistency of the latter is weaker than that of the former. Although the accuracy of the Wigner–Ville transform is relatively low, its final quality score is higher than that of the wavelet transform because of its high consistency. The accuracy of EMD is much lower than that of the other methods because a frequency component is ignored. The accuracy of SST is high, but its final quality score is very low because of its low consistency.

4. Conclusions

Automatic and intelligent diagnosis with deep learning technology has become a hot topic in the field of fault diagnosis. However, there is a lack of discussion on network input. Different from general image input, time–frequency images often lack easy-to-describe feature components, so a new image quality evaluation method is urgently needed. To solve this problem, this paper proposes a time–frequency image quality evaluation method based on an improved LIME method, which comprehensively considers the accuracy of the diagnosis results and the consistency of the interpretation results with prior knowledge. Accuracy shows the decision-making ability of the model for the target task, while consistency explains the decision-making basis of the model, and this interpretation behavior gives the engineer a reason to choose or not to choose the model. The two together constitute the quality evaluation standard of time–frequency images.
Experiments have been carried out on the measured data set, and the quality of the images obtained by five kinds of time–frequency transform has been evaluated, which proves the feasibility of the proposed method.
It should be noted that the evaluation method proposed in this paper applies only to the scenario in which time–frequency images are used as inputs for deep learning networks. The experimental results show that in this application scenario, more complex time–frequency transform methods do not necessarily result in higher quality. However, in other application or scientific research scenarios, the pursuit of higher time–frequency resolution and energy concentration remains a very important research direction.

Author Contributions

Conceptualization, Y.B.; methodology, Y.B.; software, Y.B. and Y.L.; validation, Y.B. and Y.L.; data curation, Y.B.; writing—original draft preparation, Y.B.; writing—review and editing, W.W.; supervision, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the anti-wear and anti-fatigue mechanism of aeroengine bearing steels with strength and toughness improved by the synergism of deep cryogenic temperature and ion implantation project, grant number 52375163.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the School of Mechanical and Electronic Control Engineering, Beijing Jiaotong University, and are available from the authors with the permission of the School of Mechanical and Electronic Control Engineering, Beijing Jiaotong University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dai, X.; Gao, Z. From Model, Signal to Knowledge: A Data-Driven Perspective of Fault Detection and Diagnosis. IEEE Trans. Ind. Inform. 2013, 9, 2226–2238. [Google Scholar] [CrossRef]
  2. Liu, R.; Yang, B.; Zio, E.; Chen, X. Artificial Intelligence for Fault Diagnosis of Rotating Machinery: A Review. Mech. Syst. Signal Process. 2018, 108, 33–47. [Google Scholar] [CrossRef]
  3. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep Learning and Its Applications to Machine Health Monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237. [Google Scholar] [CrossRef]
  4. Hoang, D.-T.; Kang, H.-J. A Survey on Deep Learning Based Bearing Fault Diagnosis. Neurocomputing 2019, 335, 327–335. [Google Scholar] [CrossRef]
  5. Cerrada, M.; Sánchez, R.-V.; Li, C.; Pacheco, F.; Cabrera, D.; Valente de Oliveira, J.; Vásquez, R.E. A Review on Data-Driven Fault Severity Assessment in Rolling Bearings. Mech. Syst. Signal Process. 2018, 99, 169–196. [Google Scholar] [CrossRef]
  6. Shi, L. The Selection of Neural Network Input Parameters Based on Association Rules; Atlantis Press: Amsterdam, The Netherlands, 2018; pp. 317–319. [Google Scholar]
  7. Munkhdalai, L.; Munkhdalai, T.; Park, K.H.; Amarbayasgalan, T.; Batbaatar, E.; Park, H.W.; Ryu, K.H. An End-to-End Adaptive Input Selection with Dynamic Weights for Forecasting Multivariate Time Series. IEEE Access 2019, 7, 99099–99114. [Google Scholar] [CrossRef]
  8. Egwu, N.; Mrziglod, T.; Schuppert, A. Neural Network Input Feature Selection Using Structured L2—Norm Penalization. Appl. Intell. 2023, 53, 5732–5749. [Google Scholar] [CrossRef]
  9. Fernández Jaramillo, J.M.; Mayerle, R. Sample Selection via Angular Distance in the Space of the Arguments of an Artificial Neural Network. Comput. Geosci. 2018, 114, 98–106. [Google Scholar] [CrossRef]
  10. Kowalski, P.A.; Kusy, M. Determining Significance of Input Neurons for Probabilistic Neural Network by Sensitivity Analysis Procedure. Comput. Intell. 2018, 34, 895–916. [Google Scholar] [CrossRef]
  11. Bai, Y.; Cheng, W.; Wen, W.; Liu, Y. Application of Time-Frequency Analysis in Rotating Machinery Fault Diagnosis. Shock. Vib. 2023, 2023, e9878228. [Google Scholar] [CrossRef]
  12. Sejdić, E.; Djurović, I.; Jiang, J. Time–Frequency Feature Representation Using Energy Concentration: An Overview of Recent Advances. Digit. Signal Process. 2009, 19, 153–183. [Google Scholar] [CrossRef]
  13. Feng, Z.; Liang, M.; Chu, F. Recent Advances in Time–Frequency Analysis Methods for Machinery Fault Diagnosis: A Review with Application Examples. Mech. Syst. Signal Process. 2013, 38, 165–205. [Google Scholar] [CrossRef]
  14. Barbosh, M.; Singh, P.; Sadhu, A. Empirical Mode Decomposition and Its Variants: A Review with Applications in Structural Health Monitoring. Smart Mater. Struct. 2020, 29, 093001. [Google Scholar] [CrossRef]
  15. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-Stationary Time Series Analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
  16. Li, Z.; Gao, J.; Li, H.; Zhang, Z.; Liu, N.; Zhu, X. Synchroextracting Transform: The Theory Analysis and Comparisons with the Synchrosqueezing Transform. Signal Process. 2020, 166, 107243. [Google Scholar] [CrossRef]
  17. Yudkowsky, E. Artificial Intelligence as a Positive and Negative Factor in Global Risk. In Global Catastrophic Risks; Oxford University Press: Oxford, UK, 2008; ISBN 978-0-19-857050-9. [Google Scholar]
  18. Dzindolet, M.T.; Peterson, S.A.; Pomranky, R.A.; Pierce, L.G.; Beck, H.P. The Role of Trust in Automation Reliance. Int. J. Hum. Comput. Stud. 2003, 58, 697–718. [Google Scholar] [CrossRef]
  19. Herlocker, J.L.; Konstan, J.A.; Riedl, J. Explaining Collaborative Filtering Recommendations. In Proceedings of the 2000 ACM conference on Computer Supported Cooperative Work, New York, NY, USA, 1 December 2000; Association for Computing Machinery: New York, NY, USA, 2000; pp. 241–250. [Google Scholar]
  20. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144. [Google Scholar]
  21. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621. [Google Scholar] [CrossRef]
  22. Daubechies, I.; Lu, J.; Wu, H.-T. Synchrosqueezed Wavelet Transforms: An Empirical Mode Decomposition-like Tool. Appl. Comput. Harmon. Anal. 2011, 30, 243–261. [Google Scholar] [CrossRef]
Figure 1. The basic principle of LIME. The pink and blue areas are the two parts that need to be distinguished. f is the original model to be explained and g is the explainable model used to explain f. x is the selected sample and Z is the area obtained by perturbing x. In this area, model g has a similar decision-making ability to f.
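The caption of Figure 1 describes LIME's local-surrogate principle: perturb x to obtain a neighbourhood Z, query the original model f there, and fit an interpretable model g weighted by proximity to x. A minimal NumPy sketch of that loop (all names hypothetical; an illustration of the principle, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def lime_explain(f, x, n_samples=500, kernel_width=0.75):
    """Fit a local linear surrogate g to a black-box model f around sample x.

    f: callable mapping an (n, d) array to scores of shape (n,)
    x: (d,) sample to explain
    Returns one importance weight per feature (the coefficients of g).
    """
    d = x.shape[0]
    z = x + rng.normal(scale=0.5, size=(n_samples, d))   # neighbourhood Z of x
    y = f(z)                                             # query the original model f
    dist = np.linalg.norm(z - x, axis=1)
    w = np.exp(-dist**2 / kernel_width**2)               # proximity weights
    Z1 = np.hstack([z, np.ones((n_samples, 1))])         # add an intercept column
    sw = np.sqrt(w)[:, None]                             # weighted least squares
    coef, *_ = np.linalg.lstsq(Z1 * sw, y * sw.ravel(), rcond=None)
    return coef[:-1]                                     # drop the intercept

# Toy "black box" whose decision depends only on feature 0.
f = lambda z: 1.0 / (1.0 + np.exp(-3.0 * z[:, 0]))
weights = lime_explain(f, np.array([0.2, -0.1, 0.4]))
```

On this toy model the surrogate assigns its largest weight to feature 0, mirroring how, on time–frequency images, LIME highlights the superpixels that drive the network's decision.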
Figure 2. Quality evaluation method based on improved LIME.
Figure 3. The amplitude ranges of the time–frequency matrices obtained by three different time–frequency transforms of the same vibration signal differ greatly.
Figure 4. A diagram of the network structure. This structure contains the input and output layers, eight effective computation layers, and several auxiliary layers for optimization.
Figure 5. Drivetrain Diagnostics Simulator. This device consists of five main parts labeled in the figure.
Figure 6. A schematic diagram of the planetary gearbox structure.
Figure 7. The original vibration signal of one sample. The acquisition time of the signal is 3 s.
Figure 8. Five time–frequency images of one sample.
Figure 9. The interpretation of the judgment basis of the target task in the original model with five kinds of time–frequency images as the input. The larger the value in the x direction, the greater the influence of the corresponding feature on the model’s decision making.
Table 1. Data preprocessing.
Name | Size | Number
Original signal | 72,000 × 1 | 250
Downsampling | 12,000 × 1 | 250
Data slicing | 1200 × 1 | 2500
Time–frequency image | 128 × 128 | 2500 × 5
Training set | 128 × 128 × 2000 | 5
Test set | 128 × 128 × 500 | 5
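The pipeline in Table 1 (downsample each 72,000-point record by a factor of 6, slice the result into ten 1200-point segments, then compute one 128 × 128 time–frequency image per segment) can be sketched as follows. The short-window FFT here is only a stand-in for whichever of the five time–frequency transforms is being evaluated:

```python
import numpy as np

def tf_image(seg, img_size=128):
    """Stand-in time-frequency map: magnitudes of short-window FFTs."""
    win = 2 * img_size                       # 256-point analysis window
    hops = np.linspace(0, len(seg) - win, img_size).astype(int)
    img = np.empty((img_size, img_size))
    for i, h in enumerate(hops):
        spec = np.abs(np.fft.rfft(seg[h:h + win]))
        img[:, i] = spec[:img_size]          # keep the first 128 frequency bins
    return img

# One synthetic record standing in for a measured 3 s vibration signal.
signal = np.random.default_rng(1).standard_normal(72_000)

x = signal[::6]                              # downsampling: 72,000 -> 12,000 points
segments = x.reshape(10, 1200)               # data slicing: ten 1200-point segments
images = np.stack([tf_image(s) for s in segments])   # ten 128 x 128 images
```

Applied to all 250 records, this yields the 2500 slices per transform listed in Table 1, which are then split 2000/500 into training and test sets.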
Table 2. The network hyperparameters.
Parameter | Value
Mini batch size | 64
Epoch | 8
Initial learning rate | 0.001
Decay | 0.1
Momentum | 0.9
Table 3. Quality evaluation of five time–frequency images.
Time–Frequency Transform Method | Accuracy | Consistency | Quality Score
Short-Time Fourier Transform | 0.9300 | 0.0449 | 0.8851
Wavelet Transform | 0.9560 | 0.2407 | 0.7153
Wigner–Ville Transform | 0.8820 | 0.0827 | 0.7993
Empirical Mode Decomposition | 0.8100 | 0.1364 | 0.6736
Synchrosqueezing Transform | 0.8960 | 0.4689 | 0.4262
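In four of the five rows of Table 3 the quality score equals accuracy minus the consistency term to the quoted precision, which suggests the consistency column acts as a penalty (a distance from prior knowledge, so lower is better). The ranking can then be reproduced as below; note that the subtraction rule is our reading of the table for illustration, not a formula quoted from the paper:

```python
# Accuracy and consistency values taken from Table 3.
results = {
    "Short-Time Fourier Transform": (0.9300, 0.0449),
    "Wavelet Transform": (0.9560, 0.2407),
    "Wigner-Ville Transform": (0.8820, 0.0827),
    "Empirical Mode Decomposition": (0.8100, 0.1364),
    "Synchrosqueezing Transform": (0.8960, 0.4689),
}

def quality(accuracy, consistency):
    # Assumed combination: reward diagnostic accuracy and subtract the
    # consistency term, which behaves as a distance from prior knowledge.
    return accuracy - consistency

ranked = sorted(results, key=lambda k: quality(*results[k]), reverse=True)
```

Under this reading the short-time Fourier transform ranks first and the synchrosqueezing transform last, matching the quality-score column.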
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bai, Y.; Cheng, W.; Wen, W.; Liu, Y. A Time–Frequency Image Quality Evaluation Method Based on Improved LIME. Appl. Sci. 2024, 14, 2917. https://doi.org/10.3390/app14072917
