**The Effect of the Color Filter Array Layout Choice on State-of-the-Art Demosaicing**

**Ana Stojkovic \*,†, Ivana Shopovska \*,†, Hiep Luong, Jan Aelterman \*, Ljubomir Jovanov and Wilfried Philips**

TELIN-IPI, Ghent University—imec, 9000 Ghent, Belgium


Received: 20 June 2019; Accepted: 18 July 2019; Published: 21 July 2019

**Abstract:** Interpolation from a Color Filter Array (CFA) is the most common method for obtaining full color image data. Its success relies on the smart combination of a CFA and a demosaicing algorithm. Demosaicing, on the one hand, has been extensively studied. Algorithmic development in the past 20 years ranges from simple linear interpolation to modern neural-network-based (NN) approaches that encode the prior knowledge of millions of training images to fill in missing data in an inconspicuous way. CFA design, on the other hand, is less well studied, although it is still recognized to strongly impact demosaicing performance. This is because demosaicing algorithms are typically limited to one particular CFA pattern, impeding straightforward CFA comparison. This is starting to change with newer classes of demosaicing that may be considered generic or CFA-agnostic. In this study, by comparing the performance of two state-of-the-art generic algorithms, we evaluate the potential of modern CFA-demosaicing. We test the hypothesis that, with the increasing power of NN-based demosaicing, the influence of optimal CFA design on system performance decreases. This hypothesis is supported by the experimental results. Such a finding would herald the possibility of relaxing CFA requirements, providing more freedom in the CFA design choice when producing high-quality cameras.

**Keywords:** demosaicing; debayering; color filter array; image interpolation; image reconstruction

### **1. Introduction**

Since Bayer's original patent [1], (Bayer) Color Filter Array (CFA) demosaicing has established itself as the de facto standard method of acquiring multi-dimensional color images. General demosaicing in this sense is defined as the reconstruction of a (multi-dimensional) color signal from an inherently single-channel array of (e.g., Charge-Coupled Device (CCD) or Complementary Metal-Oxide Semiconductor (CMOS)) sensors. The mosaiced image is obtained by using a planar sensor covered by an interleaved pattern of different color filters, resulting in sensor output that is an interleaved pattern of signal components representing different parts of the color spectrum. A demosaicing algorithm reconstructs this into a (three-dimensional) full color signal. An optimal demosaicing system design therefore consists of an optimal interleaving pattern (the color filter array or CFA) together with an optimal demosaicing algorithm that achieves the highest color reconstruction quality.
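As a concrete illustration of the sampling process described above, the following sketch simulates a CFA-filtered (mosaiced) image with NumPy. The particular Bayer phase chosen here (G R / B G) is an assumption for illustration; actual sensors may use any of the four phase-shifted variants.

```python
import numpy as np

def bayer_mask(h, w):
    """Boolean sampling masks for one 2x2 Bayer tile phase (G R / B G)."""
    mask = np.zeros((h, w, 3), dtype=bool)
    mask[0::2, 1::2, 0] = True   # R at even rows, odd cols
    mask[0::2, 0::2, 1] = True   # G at even rows, even cols
    mask[1::2, 1::2, 1] = True   # G at odd rows, odd cols
    mask[1::2, 0::2, 2] = True   # B at odd rows, even cols
    return mask

def mosaic(rgb):
    """Simulate the CFA-filtered image: exactly one color sample per pixel."""
    m = bayer_mask(*rgb.shape[:2])
    return np.where(m, rgb, 0.0).sum(axis=2)  # single-channel raw image

rgb = np.random.rand(4, 4, 3)
raw = mosaic(rgb)  # each pixel of `raw` holds one of the three samples
```

Demosaicing is then the inverse problem: recovering the full (H, W, 3) signal from `raw` and knowledge of the mask.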

### *1.1. CFA Pattern Design*

A good CFA pattern design satisfies the criteria presented in [2]: cost-effective image reconstruction, robustness to color aliasing, robustness to image sensor imperfections and robustness to optical or electrical influence between neighboring pixels. The Bayer CFA exploits the fact that the human eye is more sensitive to wavelengths corresponding to green than to blue and red: it consists of a repeating 2 × 2 pattern of one red, one blue and two green color filters. From the analysis of the spectra of different CFA patterns performed by Hirakawa et al. [3], it can be concluded that the Bayer CFA pattern is susceptible to introducing aliasing artifacts at both vertical and horizontal high spatial frequencies. Therefore, other CFA designs were proposed to mitigate aliasing at horizontal and/or vertical edges. These designs are based either on the assumption that the physical world is predominantly horizontally rather than vertically oriented, or on the opposite assumption. Examples of such oriented patterns are the Yamanaka CFA pattern [4], the Lukac CFA pattern [2], the Vertical CFA pattern, the Modified Bayer CFA pattern, the Diagonal CFA pattern, etc. To increase the quality of low-light photography, Sony introduced the Quad Bayer CFA sensor. To reduce sensitivity to noise, many camera systems use multi-frame photography, which introduces ghosting artifacts that affect the quality of the captured video. To deal with this problem, Sony introduced the IMX586 smartphone sensor with the Quad Bayer design and 48 MP resolution [5], obtaining high performance and quality gain (as explained in [6]) when used in high dynamic range (HDR) mode in low-light conditions. According to Sony, in normal light conditions, the camera achieves image quality similar to that of a 12 MP sensor with the Bayer design.
Furthermore, to deal with the problem of low sensitivity in low-light conditions, many panchromatic CFA designs were introduced, e.g., Sony 4-Color [7], Kodak Ver.1-3 [8], etc. There also exist works on the multispectral filter array design [9,10] that find application in different fields of multispectral imaging.

In our analysis, we test only the influence of Bayer-like CFA patterns (with the same sampling ratio as the Bayer CFA), considering them sufficient to show that the difference in quality between different CFA designs decreases with the increasing power of demosaicing algorithms.

### *1.2. Demosaicing*

Since demosaicing has been extensively studied, the field abounds with algorithms that may be classified into different groups. The earliest works are based on simple interpolation techniques (bilinear, bicubic, spline interpolation, etc.). These reconstruction techniques are prone to introducing artifacts (aliasing) in regions with high spatial frequencies, i.e., regions with edges and details. Consequently, research in this field progressed towards algorithms based on more sophisticated reconstruction techniques; numerous survey studies document this progression [11–13]. In [14], demosaicing algorithms were roughly classified into five categories, depending on the techniques used and on the reconstruction domain (spatial, frequency, wavelet, etc.). According to this classification, *classical* demosaicing includes: frequency-domain algorithms (good representatives are [15,16]), algorithms based on directional interpolations (among which residual interpolation (RI) algorithms, specifically [14], show superior performance), wavelet-based algorithms (among which good representatives are presented in [17–19]), and reconstruction-based algorithms (with [20,21] as good representatives of this group). Another group are the *learning-based* algorithms: dictionary-learning-based [22], regression-based [23], reconstruction-based with machine learning procedures for multispectral demosaicing (with [24] as a good representative) and neural-network-based algorithms, which we will refer to as *modern learning-based* demosaicing algorithms (with [25] as a representative).
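To make the "simple interpolation" baseline concrete, here is a minimal bilinear-style demosaicer written as a normalized convolution. This is a generic restatement of the classic scheme under our own conventions (it accepts any RGB sampling mask, not only Bayer), and, as the text notes, it aliases near edges.

```python
import numpy as np

def conv2_same(img, k):
    """Zero-padded 'same' 2D convolution (small helper, no SciPy needed)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def bilinear_demosaic(raw, mask):
    """Fill each channel by averaging nearby samples of the same channel.

    raw:  (H, W) CFA image;  mask: (H, W, 3) boolean sampling pattern.
    """
    k = np.array([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
    out = np.empty(mask.shape, dtype=float)
    for c in range(3):
        m = mask[..., c].astype(float)
        out[..., c] = conv2_same(raw * m, k) / np.maximum(conv2_same(m, k), 1e-12)
    return out

# sanity check on a constant image: interpolation of a constant is exact
mask = np.zeros((6, 6, 3), dtype=bool)
mask[0::2, 1::2, 0] = mask[0::2, 0::2, 1] = mask[1::2, 1::2, 1] = True
mask[1::2, 0::2, 2] = True
rec = bilinear_demosaic(np.full((6, 6), 0.5), mask)
```

On textured regions the same averaging blurs across edges, which is exactly the aliasing failure mode that motivated the more sophisticated methods discussed next.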

A good overview of the performance of the *classical* demosaicing algorithms (those that are not deep-learning-based) is given in the graph from [14], shown in Figure 1. As can be seen, ARI [14] outperforms all previous algorithms. Its good performance is due to the fact that it is an iterative approach based on the assumption of color consistency along edges and in smooth areas. Additionally, the algorithm considers the color differences in creating the final demosaiced image. However, despite its good performance, this method is not generic: it is applicable only to the Bayer pattern CFA and the multispectral filter array (MSFA) presented earlier in [10]. *Classical* demosaicing techniques have the advantage of being applicable to any type of image without requiring training data, while the *modern learning-based* algorithms show superior performance and can be modified to be applied to different CFA patterns, which makes them generic.

**Figure 1.** Performance of the demosaicing algorithms over the years (from the analysis presented in [14]). Image source: [14].

### *1.3. CFA-Demosaicing Co-Design*

The co-design of CFA and demosaicing has received little attention. A reason for this is that many *classical* demosaicing techniques are intricate interpolation schemes that are finely tuned to a particular CFA pattern, typically the ubiquitous Bayer pattern. This intricacy precludes applying a demosaicing algorithm to a different CFA pattern and therefore inhibits the application of well-performing demosaicing techniques to CFA patterns they were not designed for. There exist approaches in which a particular CFA design is introduced and a matching demosaicing algorithm is proposed (e.g., the pseudo-randomized CFA pattern [26] and the demosaicing algorithm presented in [20]). Similarly, in [2], a Bayer-like CFA design is proposed and a generic algorithm is devised [27]. Demosaicing algorithms that are generic, in the sense that they may be applied successfully to any CFA, do exist, but are much less common and are typically outperformed by CFA-specific demosaicing. Examples of such algorithms are [20,21], which belong to an earlier generation of demosaicing algorithms. These algorithms were outperformed by ARI (for the Bayer CFA pattern) and by another state-of-the-art algorithm, known as ACUDe [28], which belongs to the group of classical demosaicing methods and is also generic.

Another generic algorithm belonging to the newer generation of *classical demosaicing* algorithms is presented in [29], where the authors use a linear minimum mean square error (LMMSE) model for reconstruction. The LMMSE approach was tested on different periodic RGB CFA patterns, and the experimental analysis showed that, for some random CFA patterns, it achieves better performance than for the Bayer CFA pattern. Furthermore, this analysis showed that, for all analyzed CFA patterns, the reconstruction quality increases as a larger neighborhood around the pixel to be estimated is taken into consideration.
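The LMMSE idea can be sketched in a few lines: for a fixed phase of the pattern, one solves a linear least-squares problem mapping a flattened CFA neighborhood to the center pixel's RGB value. The data below is synthetic (an assumption purely for illustration); a real implementation such as [29] would estimate the same kind of linear operator from natural-image statistics, one per pattern phase, and a larger neighborhood (larger `K`) gives the estimator more context.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 2000, 5                    # training samples, neighborhood size
X = rng.random((N, K * K))        # flattened 5x5 CFA neighborhoods (stand-in data)
W_true = rng.random((K * K, 3))   # hidden linear relation to the RGB center
Y = X @ W_true                    # target RGB values at the center pixel

# LMMSE / least-squares fit: W = argmin ||X W - Y||^2
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ W                     # linear estimate of the center RGB values
```

Because the synthetic targets are exactly linear in the inputs, the fitted `W` recovers `W_true` up to numerical precision; on real images the residual error is what the neighborhood size trades off against.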

More advanced generic algorithms are the learning-based algorithms, among which neural-network-based algorithms are superior. Such an algorithm is presented in [30], where the authors propose a demosaicing algorithm based on a simple neural-network architecture. The authors rely on the previously proposed concept of using a superpixel (neighboring area) for estimating the unknown pixel values. This technique was tested on different CFAs and on the MSFA presented in [10].

Convolutional neural networks (CNNs) have shown great success in many image processing and computer vision tasks. One of the main strengths of CNNs is that they allow learning features specific for a given domain, compared to earlier approaches where the features were predefined based on domain knowledge. For example, for the problem of demosaicing, the same CNN can be applied for different CFAs by retraining it with different input data, without any structure modifications [25,31–36]. Moreover, some CNN-based algorithms learn an optimal color filter layout jointly with a model for demosaicing of images obtained with the learned pattern [37,38].

### *1.4. Structure*

This paper studies the effect of the CFA design choice on the overall CFA-demosaicing reconstruction quality. Specifically, we test the hypothesis that the impact of the CFA layout choice on the quality performance decreases with the emergence of more sophisticated, *modern learning-based* demosaicing algorithms.

The paper is structured as follows: in Section 2, we describe demosaicing algorithms that are state-of-the-art with respect to quality and that lend themselves well to adaptation towards generic CFA support. In Section 3, we describe the adaptations we made to these methods to allow evaluation of different CFAs and to achieve better reconstruction quality. In Section 4, starting with the motivation for the analysis, we explain the performed experiments, the image data-sets used and the CFA patterns used. In Section 5, we present the qualitative and quantitative results of the analysis, and in Section 6 we summarize the conclusions.

### **2. State-of-the-Art Demosaicing**

In order to demonstrate a trend of state-of-the-art demosaicing becoming less sensitive to the CFA pattern (i.e., to show that the performance of *modern learning-based* demosaicing is less affected by the CFA design), we selected two algorithms as representatives of two different groups of demosaicing algorithms: the first group contains *classical* demosaicing techniques that are not based on machine learning, and the second group contains *modern learning-based* techniques. In this section, we describe the representative of each group. The first algorithm, known as ACUDe, belongs to the group of directional interpolation demosaicing algorithms; to reconstruct the full color image, it exploits the color consistency in real images via the inter-chrominance dependency. The second algorithm is a neural-network-based algorithm called CDMNet [25], chosen as a representative method with publicly available code.

### *2.1. Universal Demosaicing of CFA (ACUDe)*

The method proposed by Zhang et al. [28], known as ACUDe, is devised to be applied to different CFA designs. Considered a state-of-the-art generic algorithm among those based on directional interpolations, it performs similarly to the state-of-the-art ARI algorithm proposed by Monno et al. [14], which outperforms all previously proposed demosaicing algorithms. However, as implemented, ARI is only applicable to the Bayer CFA pattern and the five-band multispectral filter array described in [10,14]; therefore, we consider only ACUDe in our analysis. Although the main idea and theory of ACUDe can be applied to multispectral filter arrays, the algorithm was tested and implemented for the three-primary-color system. As an adaptive generic method, it exploits the inter-chrominance dependence and the measured CFA response to estimate the chrominance components. It is implemented in three steps: estimation of the color/demosaicing transform matrix (dependent only on the CFA pattern) and the chrominance direction; distance-dependent per-pixel weight generation based on inter-pixel chrominance capture and edge sensitivity; and chrominance component estimation by weighting over the CFA image, followed by the demosaicing transformation to the three primary colors. The method was evaluated on the KODAK data-set [39] (because it abounds with images with both high and low spatial frequency content) and the IMAX data-set [40] (considered more challenging for demosaicing methods because of its high color diversity and because many of its images contain high spatial frequencies in the chrominance components). In [28], the authors compared their algorithm with two other generic demosaicing algorithms for RGB CFA patterns (Condat's generic method [20] and Menon and Calvagno's regularization approach to demosaicing (RAD) [21]). The obtained results (in terms of measured color peak signal-to-noise ratio (CPSNR) and CIE LAB error) indeed show that ACUDe outperforms the two mentioned algorithms for six different CFA patterns.

### *2.2. Demosaicing Using a CNN (CDMNet)*

The baseline neural-network-based algorithm in this work is CDMNet [25]. We use this algorithm because of its high reconstruction quality among the *modern learning-based* demosaicing techniques and because our experiments show that it outperforms the state-of-the-art *classical* demosaicing algorithms. Furthermore, its publicly available code and retrainable neural network make it suitable for modifications towards a generic version (adaptive to any repetitive CFA pattern). In its original design, CDMNet is a three-stage neural network that is among the state-of-the-art according to evaluations on different datasets. CDMNet relies on inter-channel correlation to interpolate the missing values. The problem of RGB demosaicing is split into three stages: (1) reconstruction of the green channel, (2) separate reconstructions of the red and blue channels jointly with the high-quality green channel and (3) joint fine-tuning of all three channels.
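The three-stage decomposition can be sketched structurally as follows. This is only a data-flow outline under loud assumptions: each `stage*` stub below stands in for a trained convolutional sub-network in the real CDMNet and performs no actual learning or interpolation here.

```python
import numpy as np

def stage1_green(raw, mask):
    """Stage 1: reconstruct the green channel first (it is sampled twice
    as densely as red or blue in Bayer-like patterns)."""
    return raw * mask[..., 1]   # a trained network would fill the gaps

def stage2_red_blue(raw, mask, green):
    """Stage 2: reconstruct red and blue jointly with the high-quality
    green estimate, exploiting inter-channel correlation."""
    return raw * mask[..., 0], raw * mask[..., 2]

def stage3_refine(r, g, b):
    """Stage 3: joint fine-tuning of all three channels."""
    return np.stack([r, g, b], axis=-1)

def cdmnet_like(raw, mask):
    g = stage1_green(raw, mask)
    r, b = stage2_red_blue(raw, mask, g)
    return stage3_refine(r, g, b)

# tiny smoke run on a 4x4 mask with one assumed Bayer phase
mask = np.zeros((4, 4, 3), dtype=bool)
mask[0::2, 1::2, 0] = mask[0::2, 0::2, 1] = mask[1::2, 1::2, 1] = True
mask[1::2, 0::2, 2] = True
out = cdmnet_like(np.ones((4, 4)), mask)
```

The point of the staging is that the dense, well-reconstructed green channel anchors the sparser red and blue reconstructions before a joint refinement pass.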

### **3. Modifying State-of-the-Art Demosaicing Algorithms**

Here, we present the modifications we made to ACUDe and CDMNet. Since ACUDe is already designed to be generic, our modifications yield a slight improvement in reconstruction quality. The adaptations to CDMNet generalize the algorithm to arbitrary repetitive RGB CFA patterns.

### *3.1. Modifying ACUDe*

The flowchart of the modified ACUDe is presented in Figure 2. The inputs are the CFA pattern and the CFA-filtered image (CFA input). The green arrows represent the CFA-pattern-dependent data-flow, and the blue arrows represent the CFA-filtered-image-dependent data-flow. For more details about the algorithm and each procedure, we refer to the explanation given in [28]. Our modification consists of inserting the original values of the CFA-filtered image at the corresponding locations in the output image. This procedure is applied after obtaining the rough estimate of the demosaiced image (non-adaptive demosaicing). For the main source code used in our implementation and the results obtained for the Bayer pattern with the original, unmodified ACUDe (as explained in [28]), we direct the reader to the web-site [41].
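The modification itself, reinserting the measured CFA samples into the rough estimate, amounts to a single masked assignment. The sketch below uses our own array conventions, not ACUDe's internal representation:

```python
import numpy as np

def reinsert_measured(estimate, raw, mask):
    """Copy the measured CFA values back into the demosaiced estimate.

    estimate: (H, W, 3) rough demosaiced image (non-adaptive step output)
    raw:      (H, W)    CFA-filtered input image
    mask:     (H, W, 3) boolean sampling pattern (True = measured here)
    """
    out = estimate.copy()
    raw3 = np.repeat(raw[..., None], 3, axis=-1)  # broadcast raw to 3 channels
    out[mask] = raw3[mask]                        # exact at sampled locations
    return out

# demo on random data with an assumed 4x4 Bayer-phase mask
mask = np.zeros((4, 4, 3), dtype=bool)
mask[0::2, 1::2, 0] = mask[0::2, 0::2, 1] = mask[1::2, 1::2, 1] = True
mask[1::2, 0::2, 2] = True
rng = np.random.default_rng(1)
estimate = rng.random((4, 4, 3))
raw = rng.random((4, 4))
out = reinsert_measured(estimate, raw, mask)
```

After this step the output agrees exactly with the sensor wherever a sample was actually measured, and keeps the estimate everywhere else.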

**Figure 2.** Flowchart of ACUDe and our modifications. Modifications are represented by the procedure in the orange block. Green arrows represent the data-flow between procedures (given as blocks) where only the color filter array (CFA) pattern is used. Blue arrows represent the data-flow between procedures (given as blocks) where a CFA input image is also used.

### *3.2. Making CDMNet Generic*

For this study, we need to compare the performance of the same demosaicing algorithm using different CFA patterns. To make the CDMNet algorithm agnostic of the pattern, we made several modifications to the original version of CDMNet, shown in Figure 3: we remove the initial bilinear demosaicing, use pixel shuffling, and work at a lower spatial resolution with a fixed 4 × 4 downscaling factor.

**Figure 3.** Flowchart of CDMNet and our modifications. The orange color represents locations where we modified the original algorithm to make it agnostic of any CFA pattern: no bilinear demosaicing, pixel shuffling and working in a lower spatial resolution with fixed downscaling factor of 4 × 4.


These adaptations provide the same conditions when training the neural network for different patterns, so that any difference in reconstruction quality is only a consequence of the pattern used. To train the neural network, we relied on the Waterloo Exploration Dataset (WED) [43], following the same practice as the original CDMNet method. Initially, we randomly selected 4644 images to create the training dataset; the remaining 100 images from WED are used as a validation set. We then fixed the selected training and validation sets and used the same ones during re-training for all of the patterns.

Approximately 360,000 patches were extracted from the training images. Instead of the originally proposed patch size of 50 × 50, we extracted patches of size 48 × 48 to keep the dimensions divisible by the downscaling factor. For each pattern, the neural network was randomly initialized and re-trained for 81 epochs using batches of 64 patches. As in the original version [25], the learning rate was decreased five times every 20 epochs, ranging from 10<sup>−3</sup> to 10<sup>−5</sup>. Applying the modified CDMNet model to any camera with a three-color, repetitive (periodic) CFA pattern would require only one offline re-training and a scaling factor proportional to the pattern size.
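The retraining hyper-parameters above can be expressed as a small schedule function. The decay factor is an assumption on our part: the text fixes only the endpoints (10⁻³ and 10⁻⁵) and the 20-epoch step, and a uniform ×10 drop clipped at the stated floor is one plausible reading.

```python
def learning_rate(epoch, base=1e-3, floor=1e-5, step=20, factor=10.0):
    """Step-decay learning rate, clipped at the stated floor."""
    return max(base / factor ** (epoch // step), floor)

# 81 epochs of training, as in the re-training setup described above
schedule = [learning_rate(e) for e in range(81)]

# patch size chosen divisible by the fixed 4x4 downscaling factor
PATCH, DOWNSCALE = 48, 4
assert PATCH % DOWNSCALE == 0
```

The divisibility check at the end is the reason the patch size was changed from 50 × 50 to 48 × 48.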

### **4. Experimental Analysis**

The objective of the analysis is to show that, as reconstruction quality increases with the introduction of more sophisticated techniques, it becomes less affected by the CFA design.

### *4.1. Experiments and Materials*

In our study, we tested the two modified generic state-of-the-art algorithms (which belong to different classes of algorithms) on Bayer-like patterns (where the sampling ratio between the green, red and blue channels is 2:1:1). We limit our analysis to Bayer-like patterns in order to achieve a fair comparison and to obtain conclusions that are unbiased (towards more sophisticated and novel CFA patterns) and uninfluenced (by different designs of demosaicing algorithms). According to the analysis presented in [29], some random patterns perform better than Bayer for specific LMMSE demosaicing methods. Our assumption is that the worst-performing CFAs (like Quad Bayer) will present the largest differences in output quality as a function of the demosaicing algorithm used. Our experiments show that when neural networks are applied for demosaicing, the choice of the CFA pattern does not significantly influence the quality of the results. Based on our findings and the results of the generic algorithms presented in [28] (which are similar for the Bayer CFA and the Pseudo-Randomized CFA), we expect this trend to hold for random patterns as well. For this reason, the paper focuses on how the performance of the worst-performing CFAs, as the ones with the largest differences in performance, changes as a function of the chosen demosaicing algorithm. We thus test the hypothesis that, if the exact CFA pattern becomes irrelevant or less relevant, this largest difference diminishes with it.

The patterns (see Figure 4) on which the modified state-of-the-art algorithms were tested are: Bayer pattern, Lukac pattern, Yamanaka pattern, Modified Bayer pattern and Quad Bayer pattern.

Following common practice for quality evaluation, we used two well-known image data-sets for the two modified state-of-the-art generic algorithms: KODAK [39] (highly diverse in content, with plenty of detail), consisting of 24 images in total (18 images with a resolution of 768 × 512 pixels and six images with a resolution of 512 × 768 pixels), and IMAX [40] (highly diverse in content and color, with plenty of detail), consisting of 18 images with a resolution of 500 × 500 pixels. The second data-set, IMAX, is the more challenging one for evaluating demosaicing algorithms. These image data-sets do not coincide with WED and are therefore new, previously unseen data for the modified CDMNet. For testing the modified CDMNet, each of the five trained neural network models (one model for every CFA design presented in Figure 4) was applied to the input mosaic images obtained with the corresponding CFA pattern.

**Figure 4.** Bayer-like CFA patterns used in the experimental analysis: Bayer, Lukac, Yamanaka, Modified Bayer and Quad Bayer.
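Two of the five tiles can be written down directly from the descriptions in the text (0 = R, 1 = G, 2 = B): the Bayer tile follows the 2 × 2 layout described in the introduction, and the Quad Bayer tile repeats each Bayer cell as a 2 × 2 same-color block. The exact phase (which corner holds which color) is our assumption, and the remaining tiles (Lukac, Yamanaka, Modified Bayer) are omitted here because their layouts are not spelled out in the text.

```python
import numpy as np

BAYER = np.array([[1, 0],
                  [2, 1]])
QUAD_BAYER = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [2, 2, 1, 1],
                       [2, 2, 1, 1]])

def sampling_ratio(tile):
    """Return (G, R, B) counts for one tile; Bayer-like means 2:1:1."""
    counts = np.bincount(np.asarray(tile).ravel(), minlength=3)
    return int(counts[1]), int(counts[0]), int(counts[2])
```

Checking `sampling_ratio` on both tiles confirms they share the 2:1:1 G:R:B ratio that defines the Bayer-like family studied here.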

In order to test our hypothesis that a generic (in terms of CFA pattern) *modern learning-based* demosaicing technique (such as the modified CDMNet) reconstructs better and is less affected by the choice of CFA pattern than a generic *classical* demosaicing technique that is not learning-based (such as the modified ACUDe), we perform three experiments. The first two experiments form the quantitative analysis of the two algorithms: objective evaluation (using the average CPSNR and the average PSNR for each color channel, with the standard deviation of the mean, calculated over the reconstructed images of each data-set) and perceptual evaluation (using the SSIM metric, with the standard deviation of the mean calculated over the reconstructed images of each data-set). When each quality metric was calculated, 11 pixels were excluded along each image border. In the third experiment, which forms the qualitative analysis, we analyze and compare images reconstructed with both algorithms for the five different Bayer-like CFA patterns.
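The objective metric can be made precise with a short function: CPSNR is the PSNR computed jointly over all three channels, after discarding an 11-pixel band along every border (per-channel PSNR is the same formula restricted to one channel). Image values are assumed normalized to [0, 1].

```python
import numpy as np

def cpsnr(reference, reconstructed, border=11, peak=1.0):
    """Color PSNR over all channels, excluding `border` pixels per side."""
    r = reference[border:-border, border:-border]
    x = reconstructed[border:-border, border:-border]
    mse = np.mean((r - x) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# worked check: a uniform error of 0.1 gives MSE 0.01, i.e., 20 dB
gt = np.zeros((64, 64, 3))
rec = gt + 0.1
value = cpsnr(gt, rec)
```

Excluding the border keeps boundary-handling artifacts of the interpolation out of the score, which would otherwise penalize all methods unevenly.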

### **5. Results**

In what follows, we present the results and conclusions of the experiments on the quality of the two representative state-of-the-art algorithms (the modified ACUDe representing the *classical* demosaicing algorithms and the modified CDMNet representing the *modern learning-based* demosaicing algorithms).

### *5.1. Results from the Quantitative Analysis*

In Figure 5, we present the average CPSNR results with the standard deviation of the mean (over the images of each data-set) obtained for each analyzed Bayer-like CFA pattern with the representative techniques (the modified ACUDe and the modified CDMNet). In Figures 6–8, we present the average PSNR results with the standard deviation of the mean (over the images of each data-set) for each channel separately. In the same way, in Figure 9, we present the SSIM results.

**Figure 5.** Average color peak signal-to-noise ratio (CPSNR) results with standard deviation of the mean: for the two modified generic algorithms ("CDMNet modified" and "ACUDe modified"), for the two data-sets ("KODAK" and "IMAX"), for five different Bayer-like patterns (Bayer, Lukac, Yamanaka, Modified Bayer and Quad Bayer).

**Figure 6.** Average PSNR (red channel) results with standard deviation of the mean: for the two modified generic algorithms ("CDMNet modified" and "ACUDe modified"), for the two data-sets ("KODAK" and "IMAX"), for five different Bayer-like patterns (Bayer, Lukac, Yamanaka, Modified Bayer and Quad Bayer).

**Figure 7.** Average PSNR (green channel) results with standard deviation of the mean: for the two modified generic algorithms ("CDMNet modified" and "ACUDe modified"), for the two data-sets ("KODAK" and "IMAX"), for five different Bayer-like patterns (Bayer, Lukac, Yamanaka, Modified Bayer and Quad Bayer).

**Figure 8.** Average PSNR (blue channel) results with standard deviation of the mean: for the two modified generic algorithms ("CDMNet modified" and "ACUDe modified"), for the two data-sets ("KODAK" and "IMAX"), for five different Bayer-like patterns (Bayer, Lukac, Yamanaka, Modified Bayer and Quad Bayer).

From the CPSNR graph (see Figure 5), it can be noticed that the difference in quality achieved with the modified CDMNet across the different Bayer-like CFA patterns (especially on the KODAK data-set) is smaller than the corresponding difference for the modified ACUDe. As expected, the largest quality difference, observable in both the PSNR and SSIM values, is between the Bayer CFA pattern and the Quad Bayer CFA pattern. The color pixels in the Bayer CFA are more densely distributed around the pixel of interest (the pixel whose value is to be estimated), and therefore the reconstruction quality with both algorithms is better for the Bayer CFA pattern. For the KODAK data-set, this difference of ≈3 dB when the modified CDMNet is applied is significantly lower than the ≈6 dB difference when the modified ACUDe is applied. Although less pronounced, a similar trend is recognized in the results for the IMAX data-set. The same conclusion can be drawn from the PSNR results for each color channel separately (see Figures 6–8).

**Figure 9.** Average SSIM results with standard deviation of the mean: for the two modified generic algorithms ("CDMNet modified" and "ACUDe modified"), for the two data-sets ("KODAK" and "IMAX"), for five different Bayer-like patterns (Bayer, Lukac, Yamanaka, Modified Bayer and Quad Bayer).

To examine whether this difference (between the reconstruction quality across the different CFA patterns) becomes more apparent when the modified ACUDe is applied, we analyze the SSIM results (see Figure 9). As expected, the conclusion drawn from the CPSNR and PSNR results is even more notably supported by the SSIM results. This analysis leads to a more general conclusion about the improvement of reconstruction quality with the emergence of new and more advanced demosaicing techniques: the reconstruction quality becomes less affected by the CFA pattern as the demosaicing techniques become more sophisticated and more reliant on powerful deep-learning-based approaches.

Although the absolute overall quality of the two representative demosaicing algorithms (the modified ACUDe and the modified CDMNet) is not the main focus of this analysis, for thoroughness we discuss the major differences between the two algorithms for each data-set. Compared to the IMAX data-set, the KODAK data-set consists of natural images that are more color-consistent, whereas the IMAX data-set abounds with images containing more high spatial color frequencies. Therefore, the better quality of both demosaicing algorithms on the KODAK data-set is expected and justified. Moreover, color constancy in natural images is one of the assumptions underlying the design of many *classical* demosaicing techniques (including the state-of-the-art algorithms ACUDe and ARI). Our assumption is therefore that the significantly better quality of the modified ACUDe on the KODAK data-set than on the IMAX data-set is due to the fact that ACUDe, as originally designed, is inherently biased towards image content with higher color constancy. If the PSNR results for each color channel are analyzed (see Figures 6–8), it can be noticed that both algorithms are better at reconstructing the green channel than the blue and red channels. It can also be noticed that the difference in reconstruction quality between the content of the two data-sets is smaller when the modified CDMNet is applied. Moreover, the standard deviation of the mean (calculated over the reconstructed images of each data-set and each analyzed CFA pattern) is smaller (especially in the SSIM results) when the modified CDMNet is applied.

These results show that the modified CDMNet, as a representative of the *modern learning-based* demosaicing techniques, is more adaptive to different types of scenes and at the same time succeeds in achieving high-quality reconstruction for different CFA patterns.

### *5.2. Results from the Qualitative Analysis*

Here, we visually present the results (cropped parts of the reconstructed images) for selected representative examples from each data-set (KODAK and IMAX). The ground-truth (GT) images, with the corresponding cropped parts (marked with rectangles), are presented in Figure 10. In Figure 11, we show the results obtained with both algorithms (the modified ACUDe and the modified CDMNet) for the five Bayer-like CFA patterns presented in Figure 4. The differences between the images reconstructed from the different CFA inputs are visually more pronounced when the modified ACUDe is applied. Note the color aliasing artifacts in the result for the KODAK example with the Yamanaka CFA pattern, and the color aliasing artifacts accompanied by zippering artifacts for the KODAK example with the Quad Bayer CFA pattern, presented in Figure 11. Some (negligible) color aliasing artifacts may also be noticed in the images reconstructed with the modified CDMNet (see the result for the KODAK example with the Yamanaka CFA pattern in Figure 11). If we compare the results for the Quad Bayer and Bayer CFA patterns obtained with the modified CDMNet, we notice that no demosaicing artifacts are present; the only difference between the two results may be perceived as insignificant blurriness in the case of the Quad Bayer CFA pattern. Furthermore, in the results obtained with the modified CDMNet for the IMAX example, almost no differences between the reconstructed images can be noticed.

The consistently higher reconstruction quality achieved by CDMNet over ACUDe across different patterns and data-sets can be attributed to the strong representational power of convolutional neural networks. Compared to classical methods, CNN models are capable of modeling a distribution of natural images and learning the spatial and spectral correlations in the data. The large number of trainable parameters in CDMNet provides sufficient model complexity to solve the demosaicing problem with similar quality for different input mosaic configurations.

The presented experimental results, together with the results of the experimental evaluation in [30], strongly support our initial hypothesis that, when *modern learning-based* techniques are applied for demosaicing, the overall reconstruction quality is less influenced by the choice of the CFA pattern.


**Figure 10.** Examples of ground truth images from KODAK and IMAX image data-sets and their cropped parts.

Columns, left to right: results for the Bayer, Lukac, Yamanaka, Modified Bayer, and Quad Bayer CFA patterns.

**Figure 11.** Results obtained for the representative examples from KODAK and IMAX data-sets. Two algorithms were analysed: (1) the modified ACUDe based on the algorithm presented in [28], as a state-of-the-art generic algorithm among the *classical* demosaicing techniques and (2) the modified CDMNet based on the algorithm presented in [25] as a state-of-the-art generic algorithm among the *learning-based* demosaicing techniques. The algorithms were tested on five Bayer-like CFA patterns. There are no big visual differences between the reconstructed images from different CFA inputs when the modified CDMNet (the representative of the *modern learning-based* demosaicing techniques) is applied.

### **6. Conclusions**

Within this study, we analysed the quality performance of two state-of-the-art generic demosaicing algorithms for different Bayer-like CFA patterns. The first one (the modified ACUDe) belongs to the group of *classical* demosaicing algorithms, while the second one (the modified CDMNet) belongs to the group of *modern learning-based*, i.e., neural-network-based, demosaicing algorithms. The aim of the analysis was to test the hypothesis that *modern learning-based* (NN-based) demosaicing techniques overcome the large differences in quality performance between different CFA patterns, while at the same time achieving higher quality than a state-of-the-art generic algorithm from the group of *classical* demosaicing techniques (here, the modified ACUDe). The presented results indeed confirm this hypothesis. From this study, we conclude that the constantly increasing reconstruction power of *modern learning-based* demosaicing algorithms makes them adaptive to any CFA design without loss in reconstruction quality (which used to depend on the quality of the CFA design). This conclusion points to future opportunities for camera manufacturing and image reconstruction, specifically in combining lower hardware requirements with powerful reconstruction techniques. In other words, with *modern learning-based* demosaicing methods, camera manufacturers have more freedom in the choice of the CFA pattern layout, without a noticeable loss in image quality. The patterns can then be adapted to improve other image properties and facilitate various imaging tasks, as with the Quad Bayer pattern, which was designed to improve noise reduction in low-light imaging. Furthermore, this conclusion points towards the advantage of using easily adaptable and retrainable neural-network-based demosaicing techniques in various applications of multispectral imaging.

**Author Contributions:** A.S. and I.S. made the modifications of the two state-of-the-art demosaicing algorithms and conducted the experimental analysis. A.S., I.S. and J.A. wrote the paper. H.L. and J.A. proposed the hypothesis for which the experimental analysis was conducted. H.L., J.A., L.J. and W.P. supervised the research.

**Funding:** This work was partially funded by EU Horizon 2020 ECSEL JU research and innovation program under grant agreement 662222 (EXIST).

**Acknowledgments:** We would like to thank Pengwei Hao for making the implementation of the algorithm presented in [28] publicly available at http://www.eecs.qmul.ac.uk/~phao/CFA/acude/ and for the help provided with our analysis and the derived conclusions about the algorithm presented in [28]. We would also like to thank the authors of the algorithm presented in [25] for making their implementation publicly available. Many thanks to all authors of the publicly available image data-sets that were used in the performed analysis.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **A Novel Method for Early Gear Pitting Fault Diagnosis Using Stacked SAE and GBRBM**

### **Jialin Li <sup>1</sup>, Xueyi Li <sup>1</sup>, David He <sup>1,2,</sup>\* and Yongzhi Qu <sup>3</sup>**


Received: 16 January 2019; Accepted: 9 February 2019; Published: 13 February 2019

**Abstract:** Research on data-driven fault diagnosis methods has received much attention in recent years. The deep belief network (DBN) is a commonly used deep learning method for fault diagnosis. In the past, when DBN was used to diagnose gear pitting faults, it was found that the diagnosis results were not good with continuous time-domain vibration signals as direct inputs into the DBN. Therefore, most researchers extracted features from the time-domain vibration signals as inputs into the DBN. However, it is desirable to use raw vibration signals as direct inputs to achieve good fault diagnosis results. Therefore, this paper proposes a novel method that stacks a sparse autoencoder (SAE) and a Gauss-Binary restricted Boltzmann machine (GBRBM) for early gear pitting fault diagnosis with raw vibration signals as direct inputs. The SAE layer is used to compress the raw vibration data, and the GBRBM layer is used to effectively process continuous time-domain vibration signals. Vibration signals of seven early gear pitting faults collected from a gear test rig are used to validate the proposed method. The validation results show that the proposed method maintains a good diagnosis performance under different working conditions and gives higher diagnosis accuracy compared to other traditional methods.

**Keywords:** early gear pitting fault diagnosis; vibration signals; SAE; GBRBM

### **1. Introduction**

Gears play an important role in mechanical transmission systems. It is necessary to diagnose gear faults to ensure stable and reliable operation of the systems. The methods of fault diagnosis can be roughly divided into two categories: model-driven methods and data-driven methods [1]. Model-based diagnostic methods require a deep understanding of the systems, and many parameter adjustments need to be performed to build the model. Therefore, this paper applies data-driven methods to diagnose gear faults. The data-driven diagnostic process involves two steps: (1) establish a data model based on known state data, (2) use the established model to diagnose mechanical faults. The fault diagnosis process can be regarded as the process of applying the model for pattern recognition. When building a fault diagnosis model, there are generally two processes: feature extraction and pattern recognition [2]. The purpose of feature extraction is to convert high-dimensional data into low-dimensional features, which enable better pattern recognition. There are many methods for feature extraction, such as statistical analysis methods, fast Fourier transform (FFT), Hilbert–Huang transform (HHT) [3], empirical mode decomposition (EMD) [4], wavelet transform (WT) [5], principal components analysis (PCA) [6], and so on. There are many traditional pattern recognition methods, including the Bayesian classifier [7], K-nearest neighbor (KNN) algorithms [8], artificial neural network (ANN), support vector machine (SVM) [9], etc. Traditional ANNs can only distinguish less complex features, and the diagnosis results are greatly affected by the process of feature extraction and feature selection. In recent years, research on deep learning has become popular. Deep learning can overcome the shortcomings of traditional neural networks. Many types of deep learning methods are applied in fault diagnosis; they can be divided into supervised methods, such as the deep neural network (DNN) and convolutional neural network (CNN) [10], and unsupervised methods, such as the deep belief network (DBN) and autoencoder (AE).

Zhang et al. [11] used a DNN to diagnose bearing faults and directly used the collected vibration signals as the inputs of the neural network, removing the error caused by the feature extraction process. Tested on two publicly available data sets from the University of Cincinnati Center for Intelligent Maintenance Systems (IMS) and Case Western Reserve University (CWRU), their proposed method was shown to be able to effectively diagnose bearing faults. Chen et al. [12] used a CNN to diagnose gearbox faults. FFT was performed on the vibration signals, and statistical methods were used to extract features from the time-domain and frequency-domain signals as inputs to the CNN. Chen et al. [13] applied ensemble empirical mode decomposition (EEMD) to extract features, and then applied a DBN to classify the gear faults. Wang et al. [14] applied an unsupervised continuous sparse autoencoder (CSAE) for feature learning and connected a layer of back propagation (BP) networks behind the CSAE. The training process first applied the CSAE for unsupervised learning, and then used BP for supervised learning.

DBN has been used for fault diagnosis; the research using DBN for fault diagnosis is reviewed next. Tran et al. [15] used the Teager–Kaiser energy operator (TKEO) and a DBN to diagnose reciprocating compressor valve faults. In their paper, TKEO was proposed to estimate the amplitude envelopes. The collected vibration signal was denoised by WT, and then the time-domain signal was converted into the frequency domain. Finally, statistical methods were used to extract the features used as inputs of the DBN. The diagnostic method used by Han et al. [16] was similar to the method in reference [15], except that it added a particle swarm optimization-support vector machine (PSO-SVM) to classify the extracted features. Shao et al. [17] applied the dual-tree complex wavelet packet for feature extraction, and then used statistical methods for feature selection; finally, an adaptive DBN was used for fault classification. Wang et al. [18] also applied statistical methods to process time-frequency domain signals, and then used a DBN to detect multiple faults in axial piston pumps. Lee et al. [19] used a DBN to diagnose faults of the air handling unit (AHU). Ahmed et al. [20] combined a DBN and softmax classifiers to diagnose rolling bearing faults. Tao et al. [21], He et al. [22], and Chen et al. [23] also applied statistical methods for feature extraction, and then applied a DBN for fault classification.

Deutsch et al. [24] integrated a DBN and a particle filter for bearing remaining useful life (RUL) prediction. Geng et al. [25] were inspired by glial chains to improve the structure of restricted Boltzmann machines (RBMs); an improved greedy layer-wise learning algorithm was used to improve the diagnostic accuracy. Ren et al. [26] combined deep belief networks and multiple models (DBN-MMs) to diagnose complex system faults. Shao et al. [27] combined a CNN with a DBN to process compressed sensing (CS) data; in addition, the exponential moving average (EMA) technique was used to improve the diagnostic accuracy of the constructed deep model. Jiang et al. [28] proposed a feature fusion DBN method to diagnose rotating machinery faults; moreover, locality preserving projection (LPP) was used to fuse deep features to further improve their quality.

SAE has been used for fault diagnosis recently; the research using SAE to diagnose faults is reviewed next. Shao et al. [29] proposed ensemble deep auto-encoders (EDAEs) to diagnose bearing faults; the effects of different activation functions and various AEs on the diagnostic results were discussed in the article. Maurya et al. [30] used a stacked autoencoder to fuse low-level features, and then a multi-class SVM was used as the classifier. Shao et al. [31] applied a deep autoencoder to diagnose rotating machinery faults; the maximum cross entropy was used as the loss function, and the artificial fish swarm (AFS) algorithm was applied to optimize the key parameters of the deep autoencoder. Meng et al. [32] used a denoising autoencoder to diagnose bearing faults. They improved the fault diagnosis rate by reusing the data points between adjacent samples, and the hyper-parameters were adjusted by changing the number of units per layer to adapt to the different resilience of each layer.

This paper proposes a method that integrates SAE with GBRBM to diagnose early gear pitting faults. The SAE is used to convert high-dimensional data into low-dimensional data, and the GBRBM is used to accommodate the continuous distribution of the inputs. The rest of this paper is organized as follows. In Section 2, the proposed method based on the stacked SAE and GBRBM is explained in detail. In Section 3, the experimental test rig used for collecting the vibration data of the seven gear pitting faults is described. In Section 4, the validation results and their discussion are presented. Finally, Section 5 concludes the paper.

### **2. The Proposed Method**

### *2.1. Framework of Proposed Method*

Most data-driven diagnosis methods involve a separate manual feature extraction process. Manual feature extraction mostly relies on human expertise, and the process is time-consuming and labor intensive. Moreover, the diagnostic results are greatly affected by the feature extraction method. Therefore, diagnostic methods that do not include a separate manual feature extraction process are more desirable. Inspired by the unsupervised learning process, this paper proposes a diagnostic method that combines supervised learning with unsupervised learning. The framework of the diagnostic method is shown in Figure 1.

**Figure 1.** The framework of the proposed method.

As shown in Figure 1, the framework of the proposed method includes three parts: (1) unsupervised feature learning, (2) transfer of the learned useful information to the new network, and (3) supervised fine-tuning of the restructured network. The stacked SAE, GBRBM and RBMs are combined to work as a simultaneous signal processing and unsupervised feature extraction process. The blue circles in the figure represent the input layer neurons, the red circles represent the hidden layer neurons, and the green circles represent the output layer neurons. The entire diagnostic model has a total of 6 layers of neurons. The specific training process contains 6 steps, as shown in Figure 1: raw vibration signals are first used for feature extraction through unsupervised learning, the data is forwarded through the SAE, GBRBM, two RBM layers and the softmax layer, and then the weights and biases obtained from the unsupervised learning process of each layer are fine-tuned according to the cross-entropy error function.

The network training in Figure 1 consists of 6 steps. Table 1 shows the detailed calculation principle for the 6 steps, including the input values, output values, and parameters transferred for each layer. Figure 1 and Table 1 in combination give a general understanding of the training procedure. First, unsupervised learning is performed layer by layer. Then, the learned useful information is transferred into the new network. Finally, supervised fine-tuning is performed to adjust the entire network. The detailed equations are shown in Table 1.

**Table 1.** Detailed process of proposed method.



### *2.2. Sparse Autoencoder*

Sparse autoencoder (SAE) [33,34] is an unsupervised learning network mainly used for data dimensionality reduction and feature extraction. The SAE includes three layers: an input layer (*n* + 1 neurons), a hidden layer (*m* + 1 neurons, *m* < *n*), and an output layer (*n* neurons). Figure 1 shows the structure of the SAE, which contains two processes: encoding and decoding.

The encoding process of SAE can be implemented by Equation (1), and the decoding process can be implemented by Equation (2).

$$\mathbf{h} = \text{sigm}(\mathbf{W}\_1 \mathbf{x} + \mathbf{b}\_1) \tag{1}$$

$$\hat{\mathbf{x}} = \text{sigm}(\mathbf{W}\_2 \mathbf{h} + \mathbf{b}\_2) \tag{2}$$

where **x** is the input matrix, **W**<sub>1</sub> and **b**<sub>1</sub> are the weight matrix and bias vector between the input layer and the hidden layer, **h** is the hidden matrix, **W**<sub>2</sub> and **b**<sub>2</sub> are the weight matrix and bias vector between the hidden layer and the output layer, and **x**ˆ is the output matrix; the function *sigm*(*z*) = 1/(1 + *e*<sup>−*z*</sup>).
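The encoding/decoding pass of Equations (1) and (2) can be sketched as follows; the function and variable names (`sae_forward`, `W1`, `b1`, ...) and the shapes are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigm(z):
    # sigm(z) = 1 / (1 + e^(-z)), the sigmoid used in Equations (1) and (2)
    return 1.0 / (1.0 + np.exp(-z))

def sae_forward(x, W1, b1, W2, b2):
    """Encode then decode one input vector x.

    Assumed shapes for illustration: x is (n,), W1 is (m, n), W2 is (n, m).
    Returns the hidden code h and the reconstruction x_hat.
    """
    h = sigm(W1 @ x + b1)       # encoding, Equation (1)
    x_hat = sigm(W2 @ h + b2)   # decoding, Equation (2)
    return h, x_hat
```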

When the mean square error (MSE) is used as the loss function of SAE, the expected processing results usually cannot be achieved. In order to make SAE perform better, a new loss function is designed as Equation (3), which consists of three parts: *J*MSE, *J*weight, and *J*sparse [35]. The purpose of using *J*weight is to control the value of the connected weights to avoid overfitting [36]. The added *J*sparse is a sparsity penalty term, which can make SAE learn more features from the input by forcing SAE to maintain a degree of sparsity [37,38].

$$J\_{\rm SAE} = J\_{\rm MSE} + \lambda \cdot J\_{\rm weight} + \beta \cdot J\_{\rm sparse} \tag{3}$$

where *J*<sub>MSE</sub> is the mean square error term as shown in Equation (4), *J*<sub>weight</sub> is the weight penalty term as shown in Equation (5), *J*<sub>sparse</sub> is the sparsity penalty term as shown in Equation (6), *λ* is the regularization parameter of the weight term, and *β* is the coefficient of the sparsity penalty term.

$$J\_{\rm MSE} = \frac{1}{2s} \sum\_{i=1}^{s} \|\mathbf{x}\_i - \hat{\mathbf{x}}\_i\|^2 \tag{4}$$

$$J\_{\text{weight}} = \frac{1}{2} \sum\_{l=1}^{k-1} \sum\_{j=1}^{m\_l} \sum\_{i=1}^{n\_{l-1}} \left(W\_{ij}^{l}\right)^2 \tag{5}$$

$$J\_{\text{sparse}} = \sum\_{j=1}^{m} \text{KL}(\rho \| \hat{\rho}\_j) = \sum\_{j=1}^{m} \left(\rho \log \left(\frac{\rho}{\hat{\rho}\_j}\right) + (1 - \rho) \log \left(\frac{1 - \rho}{1 - \hat{\rho}\_j}\right)\right) \tag{6}$$

$$\hat{\rho}\_j = \frac{1}{s} \sum\_{i=1}^{s} h\_{ij} \tag{7}$$

where *s* is the sample size of the training set, *k* is the number of layers in the network, *n*<sub>*l*</sub> is the number of neurons in layer *l*, *ρ* is the set neuron sparsity parameter, and *ρ*ˆ<sub>*j*</sub> is the sparsity of the *j*-th hidden neuron as shown in Equation (7).
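A minimal sketch of the loss in Equations (3)–(7), assuming the reconstructions `X_hat` and hidden activations `H` have already been computed by a forward pass (the names and shapes are illustrative, not the authors' code):

```python
import numpy as np

def sae_loss(X, X_hat, H, weights, lam, beta, rho):
    """J_SAE = J_MSE + lam * J_weight + beta * J_sparse, Equation (3).

    X, X_hat: (s, n) inputs and reconstructions; H: (s, m) hidden activations;
    weights: list of weight matrices; rho: target sparsity in (0, 1).
    """
    s = X.shape[0]
    j_mse = np.sum((X - X_hat) ** 2) / (2.0 * s)            # Equation (4)
    j_weight = 0.5 * sum(np.sum(W ** 2) for W in weights)   # Equation (5)
    rho_hat = np.mean(H, axis=0)                            # Equation (7)
    # KL-divergence sparsity penalty, Equation (6)
    j_sparse = np.sum(rho * np.log(rho / rho_hat)
                      + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return j_mse + lam * j_weight + beta * j_sparse
```

When every hidden unit's mean activation equals the target *ρ*, the KL term vanishes and only the reconstruction and weight-decay terms remain.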

### *2.3. Developing the GBRBM Based on the RBM*

Restricted Boltzmann Machine (RBM) is the basic component of the deep belief network (DBN) [39,40]. Similar to the SAE, it is also an unsupervised learning network that can be used for feature extraction. The RBM contains two layers: a visible layer (containing *n* visible units) and a hidden layer (containing *m* hidden units). Neurons in the same layer are not connected, while neurons in different layers are fully connected to each other. The weight matrix connecting the two layers is denoted by **W**, the bias vector of the visible layer is denoted by **c**, and the bias vector of the hidden layer is denoted by **b**.

Inspired by statistical physics, any probability distribution can be transformed into an energy-based model. The joint probability distribution of the visible layer and the hidden layer is determined by the energy function [41], shown in Equation (8), and the joint probability distribution of **v** and **h** can be obtained as shown in Equation (9).

$$E(\mathbf{v}, \mathbf{h} | \theta) = -\sum\_{i=1}^{n} c\_i v\_i - \sum\_{j=1}^{m} b\_j h\_j - \sum\_{i=1}^{n} \sum\_{j=1}^{m} v\_i w\_{ij} h\_j \tag{8}$$

$$P(\mathbf{v}, \mathbf{h} | \theta) = \frac{1}{Z(\theta)} \exp(-E(\mathbf{v}, \mathbf{h} | \theta)) \tag{9}$$

where *v*<sub>*i*</sub> is a visible layer unit, *h*<sub>*j*</sub> is a hidden layer unit, *w*<sub>*ij*</sub> are the weights between the visible and hidden layers, and *c*<sub>*i*</sub> and *b*<sub>*j*</sub> are the biases of the two layers; there are *m* hidden units in the hidden layer and *n* visible units in the visible layer, *θ* = {*w*<sub>*ij*</sub>, *c*<sub>*i*</sub>, *b*<sub>*j*</sub>} are the parameters of the RBM, and Z(*θ*) = ∑<sub>**v**</sub> ∑<sub>**h**</sub> exp(−E(**v**, **h**|*θ*)) is the partition function.

The probability function of the visible layer is given by Equation (10).

$$\begin{array}{lcl}P(\mathbf{v};\theta) &= \sum\_{h} P(v,h;\theta) \\ &= \frac{1}{Z(\theta)} \sum\_{h} \exp\left(\sum\_{i=1}^{n} c\_{i}v\_{i} + \sum\_{j=1}^{m} b\_{j}h\_{j} + \sum\_{i=1}^{n} \sum\_{j=1}^{m} v\_{i}w\_{ij}h\_{j}\right) \\ &= \frac{1}{Z(\theta)} \exp\left(\sum\_{i=1}^{n} c\_{i}v\_{i}\right) \times \prod\_{j=1}^{m} \sum\_{h} \exp\left(b\_{j}h\_{j} + \sum\_{i=1}^{n} v\_{i}w\_{ij}h\_{j}\right) \end{array} \tag{10}$$

Combining Equations (9) and (10), the conditional probability of the hidden layer can be obtained as shown in Equation (11).

$$\begin{array}{lcl}P(\mathbf{h}|\mathbf{v};\theta) &= \frac{P(\mathbf{v},\mathbf{h};\theta)}{P(\mathbf{v};\theta)}\\ &= \prod\_{j} \frac{\exp\left(b\_{j}h\_{j} + \sum\_{i=1}^{n} v\_{i}w\_{ij}h\_{j}\right)}{\sum\_{h\_{j}} \exp\left(b\_{j}h\_{j} + \sum\_{i=1}^{n} v\_{i}w\_{ij}h\_{j}\right)} = \prod\_{j} P(h\_{j}|\mathbf{v})\end{array} \tag{11}$$

Similarly, the conditional probability of the visible layer can be obtained as the joint probability of **v** and **h** divided by the independent probability of the hidden layer, as shown in Equation (12).

$$P(\mathbf{v}|\mathbf{h};\theta) = \frac{P(\mathbf{v},\mathbf{h};\theta)}{P(\mathbf{h};\theta)} = \prod\_{i} P(v\_i|h) \tag{12}$$

The neurons in the same layer are not connected, meaning that the units are conditionally independent. So the conditional probability of the visible layer and hidden layer can be calculated by Equations (13) and (14).

$$P(v\_i = 1 | \mathbf{h}) = \text{sigm}\left(c\_i + \sum\_j w\_{ij} h\_j\right) \tag{13}$$

$$P(h\_j = 1 | \mathbf{v}) = \text{sigm}\left(b\_j + \sum\_i v\_i w\_{ij}\right) \tag{14}$$

where *sigm*(*x*) = 1/(1 + exp(−*x*)) is the sigmoid function.

The parameters of the RBM can be updated by performing stochastic gradient descent on the negative log-likelihood of the training data. The gradients of the log probability of the visible layer with respect to the network parameters can be calculated by Equations (15)–(17). The value of ⟨·⟩<sub>data</sub> is easy to obtain, but the value of ⟨·⟩<sub>model</sub> is difficult to obtain. Therefore, the contrastive divergence (CD) algorithm was proposed by Hinton [42].

$$\frac{\partial \log p(\mathbf{v}; \theta)}{\partial w\_{ij}} = \langle v\_i h\_j \rangle\_{\text{data}} - \langle v\_i h\_j \rangle\_{\text{model}} \tag{15}$$

$$\frac{\partial \log p(\mathbf{v}; \theta)}{\partial b\_j} = \langle h\_j \rangle\_{\text{data}} - \langle h\_j \rangle\_{\text{model}} \tag{16}$$

$$\frac{\partial \log p(\mathbf{v}; \theta)}{\partial c\_i} = \langle v\_i \rangle\_{\text{data}} - \langle v\_i \rangle\_{\text{model}} \tag{17}$$

where ⟨·⟩<sub>data</sub> denotes the expectation under the data distribution and ⟨·⟩<sub>model</sub> denotes the expectation under the distribution defined by the model.
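The CD algorithm approximates the intractable model expectations in Equations (15)–(17) with a single Gibbs reconstruction step. A minimal CD-1 sketch for a binary RBM follows; the function name, shapes, and update form are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 parameter update for a binary RBM.

    v0: (n,) binary visible vector; W: (n, m) weights; b: (m,) hidden bias;
    c: (n,) visible bias; lr: learning rate. Returns updated (W, b, c).
    """
    # Positive phase: hidden probabilities given data, Equation (14)
    ph0 = sigm(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct visibles (Equation (13)), then hiddens again
    pv1 = sigm(c + W @ h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigm(b + v1 @ W)
    # Approximate <.>_data - <.>_model of Equations (15)-(17) with one step
    W = W + lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b = b + lr * (ph0 - ph1)
    c = c + lr * (v0 - v1)
    return W, b, c
```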

Both the visible layer and the hidden layer of the RBM are binary. It is not appropriate to use a binary visible layer when the input is continuous-valued data. Therefore, this paper develops the Gauss-Binary RBM (GBRBM) [43–45] instead of the standard RBM, and the energy function of the standard RBM in Equation (8) is changed to Equation (18).

$$E(\mathbf{v}, \mathbf{h} | \theta) = \sum\_{i=1}^{n} \frac{\left(v\_i - c\_i\right)^2}{2\sigma\_i^2} - \sum\_{j=1}^{m} b\_j h\_j - \sum\_{i=1}^{n} \sum\_{j=1}^{m} \frac{v\_i}{\sigma\_i^2} w\_{ij} h\_j \tag{18}$$

where *σ*<sub>*i*</sub><sup>2</sup> is the variance of the Gaussian distribution.

With the energy function in Equation (18), the conditional probabilities between the visible layer and the hidden layer can be obtained following the derivation process above.

$$P(v\_i = v | \mathbf{h}) = N\left(v; \; c\_i + \sum\_j w\_{ij} h\_j, \; \sigma\_i^2\right) \tag{19}$$

$$P(h\_j = 1 | \mathbf{v}) = \text{sigm}\left(b\_j + \sum\_i \frac{v\_i}{\sigma\_i^2} w\_{ij}\right) \tag{20}$$

where *N*(·; *μ*, *σ*<sub>*i*</sub><sup>2</sup>) is the Gaussian distribution, also called the normal distribution, *μ* is the mean, and *σ*<sub>*i*</sub><sup>2</sup> is the variance.
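Sampling from the GBRBM conditionals in Equations (19) and (20) can be sketched as follows; the helper names and the per-unit standard deviation vector `sigma` are illustrative assumptions:

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def gbrbm_sample_v(h, W, c, sigma, rng):
    """Sample visible units per Equation (19):
    v_i ~ N(c_i + sum_j w_ij h_j, sigma_i^2)."""
    mean = c + W @ h
    return mean + sigma * rng.standard_normal(mean.shape)

def gbrbm_p_h(v, W, b, sigma):
    """Hidden activation probabilities per Equation (20)."""
    return sigm(b + (v / sigma ** 2) @ W)
```

The only changes versus the binary RBM are the Gaussian visible units and the variance-scaled visible input to the hidden layer.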

The softmax classification layer is commonly used in the last layer of the neural network, and its working principle is shown in Equations (21) and (22).

$$y\_j = \text{softmax}\left(\sum\_{i=1}^p h\_i w\_{ij} + d\_j\right) \tag{21}$$

$$\text{softmax}(z\_j) = e^{z\_j} \Big/ \sum\_{k=1}^{q} e^{z\_k} \tag{22}$$

where *w*<sub>*ij*</sub> and *d*<sub>*j*</sub> are the weights and biases of the softmax layer, *h*<sub>*i*</sub> is the input of the softmax layer, *p* is the number of neurons in the input layer, and *q* is the number of neurons in the output layer.
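A minimal sketch of Equations (21) and (22); the max-shift inside `softmax` is a standard numerical-stability detail not stated in the text, and the names are illustrative:

```python
import numpy as np

def softmax(z):
    """Equation (22): softmax(z_j) = e^{z_j} / sum_k e^{z_k}.

    Subtracting max(z) before exponentiating avoids overflow and
    leaves the result unchanged.
    """
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def softmax_layer(h, W, d):
    """Equation (21): class probabilities y = softmax(h W + d)
    for one input vector h, with W of shape (p, q) and d of shape (q,)."""
    return softmax(h @ W + d)
```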

### **3. Experiment Setup and Data Acquisition**

In this paper, vibration data collected from experiments of seven gears with early gear pitting faults on a gear test rig were used to validate the proposed method. Figure 2 shows the gear test rig and the seven gears with the early gear pitting faults. The gearbox in the test rig consists of a pair of spur gears. The pinion gear is the driving gear (including 40 teeth, module 3 mm), and the large gear is the driven gear (including 72 teeth, module 3 mm). The gearbox is powered by two Siemens servo motors with a power of 45 kW. Motor 1 is the driving motor and motor 2 is the loading motor. The gearbox is equipped with a lubrication and cooling system. The tri-axial acceleration sensor was mounted on the gearbox housing (the red box in the figure) with a sampling rate of 10240 Hz, and the vibration signals in the three directions of X, Y and Z were collected.

**Figure 2.** (**a**) Experimental test rig (**b**) gear pitting type.

The gear pitting faults were artificially created with a drill on the driven gear surface. The specific conditions of the gear pitting faults are shown in Table 2. The fault degree gradually increases, and each subsequent fault includes all of the previous fault conditions.


**Table 2.** Driven gear pitting type.

The vibration signals were collected under 25 working conditions, combining five speeds (100–500 rpm) and five torque levels (100–500 Nm). Taking the 500 rpm–500 Nm working condition as an example, five independent data acquisitions were performed for each of the seven gear types, resulting in a total of 35 sets of data (120,000 data points per set). 80% of the data was used for training and the remaining data for testing. Hence, a training data matrix of 120,000 × 28 and a testing data matrix of 120,000 × 7 were generated.

If the data matrix is directly used as the input, the network will be complex and the training will be slow. Therefore, each data set was divided into several segments. For a sampling rate of 10240 Hz and a rotation speed of 500 rpm, approximately 1200 data points are acquired per gear rotation. Each segment contains 300 data points (a quarter of the data collected per gear rotation) [46]. In this case, the training data matrix dimension was 300 × 11,200 and the test data matrix dimension was 300 × 2800. Figure 3a shows sample vibration signals of the seven gears in the *Z*-axis under the 500 rpm–500 Nm working condition, and Figure 3b shows one segment of the corresponding sample vibration signals.
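The segmentation described above can be sketched as follows; `segment_signal` is a hypothetical helper that turns one 120,000-point record into a 300 × 400 matrix of quarter-rotation segments (stacking such matrices over all training sets would give the 300 × 11,200 training matrix):

```python
import numpy as np

def segment_signal(signal, seg_len=300):
    """Split a 1-D vibration record into non-overlapping segments of
    seg_len points and stack them as columns (seg_len x n_segments).
    Any trailing samples that do not fill a segment are dropped."""
    n_seg = len(signal) // seg_len
    return signal[:n_seg * seg_len].reshape(n_seg, seg_len).T
```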

**Figure 3.** Sample vibration signals of seven gear types in *Z*-axis under 500 rpm–500 Nm: (**a**) signal with length of 0.5 s, (**b**) signal segment containing 300 data points.

### **4. Results and Discussion**

### *4.1. PCA Data Visualization During the Training Process*

To show the effectiveness of stacking SAE and GBRBM for extracting useful gear pitting fault information from the raw vibration signals, the network was trained with data from the 500 rpm–500 Nm working condition. A total of six layers of neurons constitute the proposed diagnostic model, as shown in Figure 1. The proposed diagnostic model had the following structure: SAE: 300 × 300 (300 neurons in the input layer and 300 neurons in the hidden layer), GBRBM: 300 × 200 (300 neurons in the visible layer and 200 neurons in the hidden layer), RBM 1: 200 × 100 (200 neurons in the visible layer and 100 neurons in the hidden layer), RBM 2: 100 × 50 (100 neurons in the visible layer and 50 neurons in the hidden layer), Softmax: 50 × 7 (50 neurons in the input layer and seven neurons in the output layer). The sizes of the weight matrices and biases were determined by the structure of the proposed model. The initial weights (W1 and W2) of the SAE layer were randomly generated between 0 and 1. The initial weights of the softmax layer were randomly generated between 0 and 0.5. The remaining initial weights and biases were set to 0. The proposed diagnostic model was trained layer by layer. Steps 1, 2, 3, 4, and 6 were each trained for 300 epochs. The parameter *λ* of the SAE layer was set to 0.005, *β* to 1.5, and *ρ* to 0.1. The learning rate of the GBRBM was set to 0.005, the learning rate of the RBMs to 0.5, and the learning rate of the back propagation process to 0.05. The minimum training error of the back propagation process was set to 0.05. The entire network was trained on mini-batches with the batch size set to 100. Many related parameters affect the performance of the diagnostic model. The key parameters with a large impact on the diagnostic results, such as the learning rate, the structure of the network, and the number of training epochs, are discussed in Section 4.3 below.

The outputs of each layer in the network structure were obtained and these outputs were further processed by PCA. The first two principal components of the PCA results are used to draw a scatter plot in Figure 4 to show the changes of data. The effectiveness of each layer of the network can be judged by observing the changes in the data through each layer of the neural network. In the experiment, the training and testing of the diagnostic model were performed using MATLAB 2014a software. The PCA results shown in Figure 4 were also obtained using the MATLAB codes. All the computational experiments were carried out on a PC with Windows 7 system and a CPU of Intel(R) Core i5-6500 @ 3.2GHz.
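The layer-wise visualization can be reproduced, under the assumption that plain PCA of each layer's outputs is used, with a small NumPy sketch (the authors used MATLAB; this is an illustrative equivalent, and the function name is hypothetical):

```python
import numpy as np

def first_two_pcs(X):
    """Project samples (rows of X) onto their first two principal components,
    as used for the layer-output scatter plots in Figure 4.

    PCA via SVD of the mean-centered data: the rows of Vt are the
    principal directions, ordered by explained variance.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T
```

Applying `first_two_pcs` to the output of each layer and scatter-plotting the two columns, colored by fault label, reproduces the kind of per-layer separation plots shown in Figure 4.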

**Figure 4.** Each layer PCA result of three methods.

In Figure 4, three methods are shown. The first column represents a standard DBN. The middle column represents the method with the first RBM layer of the standard DBN replaced with a GBRBM. The third column represents the proposed method obtained by adding the SAE layer. As can be seen in Figure 4, the proposed method has the best fault separation result, and the separation result of the middle method is better than that of the standard DBN. Also, as the data moves from top down through the layers, the level of fault separation improves. Figure 5 shows the confusion matrices of the gear pitting fault diagnosis results of the three methods. As shown in Figure 5, the proposed method has the best diagnosis accuracy of 0.9346, the method with the first RBM layer of the standard DBN replaced with a GBRBM has a diagnosis accuracy of 0.8939, and the standard DBN has the worst accuracy of 0.3954. Even though the confusion matrix in Figure 5b looks similar to that of the proposed method in Figure 5c, their diagnostic accuracies are 0.8939 and 0.9346, respectively; the proposed method therefore gives more accurate diagnosis results. In Figure 4, the graph in the 2nd row of the middle column represents the PCA result without the SAE layer, while the graph in the 2nd row of the 3rd column represents the PCA result after processing by the SAE layer. Comparing these two graphs shows that the SAE layer in the proposed method gives a better pitting fault separation. These results demonstrate the effectiveness of the SAE layer in the proposed method for extracting useful fault features from the vibration signals.

**Figure 5.** Confusion matrix of the fault diagnosis results: (**a**) standard DBN, (**b**) the first RBM layer of DBN replaced by GBRBM, (**c**) the proposed method.

### *4.2. Diagnostic Results of Proposed Method*

Figure 6 shows the diagnostic accuracy of the proposed method and seven traditional methods: (1) the first RBM layer of the DBN replaced by a GBRBM, (2) standard DBN, (3) standard DNN, (4) ANN with time domain vibration features, (5) ANN with frequency domain vibration features, (6) SVM with time domain vibration features, and (7) SVM with frequency domain vibration features. The results include the diagnostic accuracy for each gear pitting fault condition under the 500 rpm–500 Nm working condition and the accuracy averaged over the seven gear pitting fault conditions. From Figure 6, the performance of the proposed method is significantly better than that of the other methods. It can also be seen that the diagnostic accuracy for gear pitting conditions C4 and C5 remains high for all methods, indicating that they are easier to diagnose than the other fault conditions. This can be explained by observing the vibration signals in Figure 3b: the vibration signals of C4 and C5 are clearly distinguished from the other gear pitting fault signals.

**Figure 6.** Diagnosis accuracy for all seven gear pitting conditions and the averaged accuracy under 500 rpm–500 Nm working condition.

Figure 7 shows the diagnostic accuracy averaged over all seven gear pitting conditions under the 500 rpm–500 Nm working condition in ten trials with the eight methods. It can be seen that the proposed method has the highest diagnostic accuracy. The accuracy of the method with the first RBM layer of the standard DBN replaced with a GBRBM is slightly lower, and the standard DNN also yields relatively strong diagnosis results.

**Figure 7.** Averaged diagnosis accuracy of the ten trials under the 500 rpm–500 Nm working condition.

As shown in Figures 6 and 7, among the methods compared with the proposed method, the standard DNN has shown a competitive performance under the 500 rpm–500 Nm working condition. To compare the performance of the proposed method with the DNN under all working conditions, the vibration signals under 25 working conditions were used to compute the diagnostic accuracy for both the proposed method and the standard DNN. The results are provided in Figure 8 and Tables 3 and 4.

In Figure 8, the averaged diagnosis accuracy over seven gear pitting conditions under 25 working conditions is provided for both the proposed method and the standard DNN. Further, the average accuracy over all five torque levels for each speed in Figure 8 is computed and provided in Table 3. The average accuracy over all five speeds for each torque level in Figure 8 is computed and provided in Table 4.

**Figure 8.** Diagnostic accuracy under 25 working conditions.


**Table 3.** Averaged accuracy under 5 speeds.

**Table 4.** Averaged accuracy under 5 torques.


It can be seen from Figure 8, Tables 3 and 4 that the average diagnostic accuracy of the proposed method is higher than that of the standard DNN under various speeds and torque conditions.

It can be seen from Tables 3 and 4 that the diagnostic accuracy under the 100 Nm working condition reaches 0.9729. To verify the repeatability of the diagnosis results, five consecutive diagnoses were performed for the five working conditions under 100 Nm. The diagnostic results are shown in Table 5. The averaged diagnostic accuracy of the five diagnoses under the 100 Nm working condition is 0.9744, indicating that the proposed diagnostic method has high diagnostic reliability.


**Table 5.** Diagnosis accuracy of 5 trials under 5 working conditions.

### *4.3. The Effect of the Parameters on the Diagnostic Accuracy*

To investigate the effect of the parameters of the proposed method on the performance of the gear pitting fault diagnosis, several experiments were performed. In the first experiment, diagnostic accuracy results were obtained with the number of epochs increased from 30 to 300 in increments of 5. In the network structure of the proposed method, the numbers of neurons in the input layer and the output layer were 300 and 7, respectively.

To investigate the impact of the network structure on the performance of the proposed method, a structure parameter *N*λ was introduced to represent the middle layers. Let *N*λ be an integer coefficient between 1 and 10; the network structure of the proposed method can then be represented as 300-*N*λ×(30-20-10-5)-7. In the second experiment, diagnostic accuracy results were obtained with *N*λ increased from 1 to 10 in increments of 1. The results of the first and second experiments are provided in Figure 9. From Figure 9a, the average accuracy of ten trials gradually increases as the number of training epochs is raised from 30 to 120, and reaches a constant level after 120 epochs. Figure 9b shows the effect of the parameter *N*λ on the diagnostic accuracy of the network structure. When *N*λ is increased from 1 to 4, the diagnostic accuracy improves greatly; beyond 4, the improvement becomes insignificant.
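As a sketch, the layer widths implied by the notation 300-*N*λ×(30-20-10-5)-7 can be generated as follows (assuming *N*λ multiplies each of the four middle-layer widths):

```python
def layer_sizes(n_lambda, base=(30, 20, 10, 5), n_in=300, n_out=7):
    """Layer widths for the structure 300-Nlambda x (30-20-10-5)-7."""
    return [n_in] + [n_lambda * b for b in base] + [n_out]

print(layer_sizes(1))  # [300, 30, 20, 10, 5, 7]
print(layer_sizes(4))  # [300, 120, 80, 40, 20, 7]
```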

**Figure 9.** Parameters affecting the diagnosis accuracy: (**a**) epochs, (**b**) *N*λ.

To investigate the impact of the learning rate on the performance of the proposed method, in the third experiment, diagnostic accuracy results were obtained with different learning rates (lr) in the RBM and GBRBM. The results are provided in Figure 10. As seen from Figure 10, the learning rate of the GBRBM has the greater impact on the diagnostic accuracy: when it exceeds 0.03, the accuracy decreases rapidly.

**Figure 10.** The influence of learning rate on the diagnosis accuracy.

### **5. Conclusions**

In this paper, a novel method for early gear pitting fault diagnosis with raw vibration signals as direct inputs was presented. The method was developed by stacking a sparse autoencoder (SAE) and a Gaussian-binary restricted Boltzmann machine (GBRBM). The vibration data collected from the gear test rig were used to validate the diagnostic capability of the proposed method. The validation results have shown that the proposed method is capable of gear pitting fault diagnosis with high accuracy. The performance of the proposed method was also compared with seven other methods: (1) the first RBM layer of the DBN replaced by a GBRBM, (2) standard DBN, (3) standard DNN, (4) ANN with time domain vibration features, (5) ANN with frequency domain vibration features, (6) SVM with time domain vibration features, and (7) SVM with frequency domain vibration features. The comparison has shown that the proposed method outperforms the other methods in terms of gear pitting fault diagnostic accuracy. The effect of the parameters of the proposed method on its diagnostic performance was also investigated and discussed in the paper.

**Author Contributions:** Conceptualization, D.H. and J.L.; methodology, J.L.; software, J.L.; validation, D.H., Y.Q., J.L. and X.L.; formal analysis, J.L.; investigation, D.H. and Y.Q.; resources, D.H. and Y.Q.; data curation, J.L. and X.L.; writing—original draft preparation, J.L.; writing—review and editing, D.H.; visualization, J.L.; supervision, D.H.; project administration, J.L. and X.L.; funding acquisition, D.H. and Y.Q.

**Funding:** This work was funded by the National Natural Science Foundation of China (No. 51675089 and No. 51505353).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Self-Adaptive Spectrum Analysis Based Bearing Fault Diagnosis**

### **Jie Wu, Tang Tang \*, Ming Chen and Tianhao Hu**

School of Mechanical Engineering, Tongji University, Shanghai 201804, China; 1710020@tongji.edu.cn (J.W.); chen.ming@tongji.edu.cn (M.C.); sirius.hu@tongji.edu.cn (T.H.)

**\*** Correspondence: tang.tang@tongji.edu.cn; Tel.: +86-139-1645-8293

Received: 16 September 2018; Accepted: 1 October 2018; Published: 2 October 2018

**Abstract:** Bearings are critical parts of rotating machines, making bearing fault diagnosis based on signals a long-standing research focus. In real application scenarios, bearing signals are normally non-linear and unstable, and thus difficult to analyze in the time or frequency domain alone. Meanwhile, fault feature vectors conventionally extracted with fixed dimensions may cause insufficiency or redundancy of diagnostic information and result in poor diagnostic performance. In this paper, Self-adaptive Spectrum Analysis (SSA) and an SSA-based diagnosis framework are proposed to solve these problems. Firstly, signals are decomposed into components with better analyzability. Then, SSA is developed to extract fault features adaptively and construct feature vectors of non-fixed dimension. Finally, a Support Vector Machine (SVM) is applied to classify the different fault features. Data collected under different working conditions are selected for the experiments. The results show that the diagnosis method based on the proposed framework has better performance. In conclusion, combined with signal decomposition methods, the SSA method proposed in this paper achieves higher reliability and robustness than the other tested feature extraction methods. Simultaneously, the diagnosis methods based on SSA achieve higher accuracy and stability under different working conditions with different sample division schemes.

**Keywords:** fault diagnosis; feature extraction; self-adaptive spectrum analysis; bearing

### **1. Introduction**

Bearings are critical parts in rotating machines, and their health condition has a great impact on production. However, because of non-linear factors such as friction, clearance and stiffness, vibration signals of bearings acquired in real application scenarios are characterized by non-linearity and instability, which make bearing fault diagnosis difficult [1].

The general fault diagnosis process involves three main steps, namely signal acquisition and processing, fault feature extraction and fault feature classification [2]. Sensors are utilized to acquire noisy signals, and signal processing techniques are subsequently applied to improve the signal-to-noise ratio [3]. In particular, ideal fault feature extraction expresses the feature information of the filtered signals comprehensively and efficiently, and it is the basis for accurate fault feature classification. Therefore, reasonable and efficient fault feature extraction plays an important role in fault diagnosis. Current feature extraction methods mainly include time domain, frequency domain and time-frequency domain analysis [4].

Time domain analysis is one of the earliest methods studied and applied. It calculates various statistical parameters in the time domain, such as peak amplitude, kurtosis and skewness [5–7], to construct feature vectors. Frequency domain analysis first transforms signals from the time domain into the frequency domain, mainly via the Fourier Transform (FT) [8]; then the periodical, frequency and distribution features of the signals are extracted with methods such as cepstrum analysis and envelope spectrum analysis to construct feature vectors [9,10].

However, time domain or frequency domain analysis only extracts the information in the corresponding domain, resulting in the loss of information in the other domain. With in-depth study, time-frequency domain fault feature extraction methods were developed accordingly. They can extract both time and frequency information. They also have shown superiority for analyzing nonlinear and unstable signals.

As a typical time-frequency domain analysis method, the Short Time Fourier Transform (STFT) [11] improves the analysis capability for unstable signals by introducing a fixed-width time window function. However, a fixed-width time window cannot guarantee optimal time and frequency resolution simultaneously. The Wavelet Transform (WT) [12] introduces time and frequency scale coefficients to overcome this drawback of STFT. WT is based on the theory of inner product mapping, and a reasonable basis function is the key to guaranteeing its effectiveness; however, it is difficult to select a proper basis function. Therefore, to improve the adaptive analysis capability, the Empirical Mode Decomposition (EMD) [13] and Local Mean Decomposition (LMD) [14] methods were successively studied and applied. According to the local characteristics of the signals themselves, EMD and LMD adaptively decompose a signal into components with better statistical characteristics for later analysis. Compared with each other, EMD is a mature tool with a long history of study and usage, while LMD has an improved decomposition process and better decomposition results with physical explanations [15].

In recent years, EMD and LMD have been extensively studied and implemented. Mejia-Barron et al. [16] developed a method based on EMD to decompose signals and extract features, completing the fault diagnosis of winding faults. Saidi et al. [17] introduced a synthetical application of bi-spectrum and EMD to detect bearing faults. Cheng et al. [18] combined EEMD and entropy fusion to extract fault features for planetary gearboxes, and furthermore implemented fault diagnosis successfully. Yi et al. [19] also utilized EEMD to pre-process signals for further fault diagnosis for bearings. Liu and Han et al. [20] applied LMD and multi-scale entropy methods to extract fault features and analyzed faults successfully. Yang et al. [21] proposed an ensemble local mean decomposition method and applied it in rub-impact fault diagnosis for rotor systems. Han and Pan et al. [22] integrated LMD, sample entropy and energy ratio to process vibration signals and realized the fault feature extraction and fault diagnosis in rolling element bearings. Yasir and Koh et al. [23] adopted LMD and multi-scale permutation entropy and realized bearing fault diagnosis. Guo et al. [24] studied an improved fault diagnosis method for gearbox combining LMD and a synchrosqueezing transform.

Fault feature classification is implemented after fault feature extraction. Nowadays, shallow machine learning methods are extensively utilized to solve the classification problem; Support Vector Machines, Artificial Neural Networks and Fuzzy Logic Systems are widely applied in condition monitoring and fault diagnosis [25]. In particular, SVM is based on statistics and structural risk minimization, and it has better classification performance when dealing with practical problems with small numbers of non-linear samples. To solve multi-class classification problems based on SVM, Cherkassky [26] proposed a one-against-all (oaa) strategy, transforming an *N*-class classification problem into *N* binary classification problems. Kressel [27] transformed an *N*-class classification problem into *N*(*N* − 1)/2 binary classification problems, namely the one-against-one (oao) strategy. Wu et al. [28] adopted SVM for diagnosis by analyzing the full spectrum to extract fault features. Saimurugan et al. [29] improved diagnosis performance by integrating SVM and a decision tree. Santos et al. [30] selected SVM for classification in wind turbine fault diagnosis after several trials with different kernels.

Currently, researchers all over the world have carried out extensive studies on bearing fault diagnosis. To the best of our knowledge, fault diagnosis methods still need further study, although various solutions have been investigated from different aspects. The main problems to be solved in this paper are summarized as follows:


To improve fault diagnosis performance, SSA is proposed in this paper to adaptively extract fault features and construct feature vectors of unfixed dimension according to the local characteristics of signals. SSA is then implemented within the designed framework. Signals are first decomposed to obtain components with better analyzability; both LMD and EEMD are utilized to decompose signals into different components from different analysis perspectives. SSA is used to extract fault features adaptively, and feature vectors with non-fixed dimensions are constructed subsequently. Finally, SVM is selected to classify the fault features, considering its inherent advantages with small training sample sets.

### **2. Methodology**

### *2.1. Self-Adaptive Spectrum Analysis*

Aiming at the problems that conventional feature extraction methods neglect local details of signals and that fault information may be redundant or insufficient because of fixed-dimension feature vectors, Self-adaptive Spectrum Analysis (SSA) is proposed. With the SSA method, feature vectors of unfixed dimension are constructed by adaptively extracting the local characteristics of signals.

At first, a number of signals corresponding to different categories of fault types are selected. To implement the SSA method efficiently, the Fast Fourier Transform (FFT) is used to transform the signals into the frequency domain to obtain the corresponding spectra for better readability. Then an overall frequency window is set for all spectra according to the fluctuation in the spectra, and the local feature information inside the frequency window is extracted to construct feature vectors.

In order to implement the proposed SSA, some definitions are given:

**Definition 1.** *Differential frequency f*z.

*f*z is the minimum frequency unit in SSA. Normally, feature information is extracted at the points corresponding to *n f*z (*n* = 1, 2, 3, . . . ), where *f*z is calculated as follows:

Firstly, in each spectrum, the maximum amplitude and its corresponding frequency value are found. All the frequency values are denoted as *f*1, *f*2, *f*3, ... , *fm*, where *m* is the number of signals. More than two fault categories must be included among the selected signals.

Secondly, the frequency values are arranged into different vectors according to the categories of samples; vectors are denoted as:

$$v\_i = \left[ f\_{(i-1)m/k+1}, f\_{(i-1)m/k+2}, \dots, f\_{(i-1)m/k+m/k} \right] \tag{1}$$

where *k* is the number of fault categories and *i* = 1, 2, . . . , *k*. Here we assume that the different categories have the same number of signals. Then, the average value of the elements in each vector is computed; these averages are denoted as *v*1, *v*2, ... , *vk*, respectively, and a vector *f* = [*v*1, *v*2, ... , *vk*] is constructed.

Thirdly, the minimum frequency value *f*min and the maximum frequency value *f*max are selected from the vector *f*. Then, the two neighboring frequency values in *f* with the maximum difference between them are selected; the lower frequency is denoted *f*low and the higher one *f*high.

Finally, *f*min, *f*max, *f*low and *f*high are arranged in ascending order, and the absolute values *f*diff of the differences between every two neighboring frequencies are calculated. The minimum non-zero *f*diff value is taken as the value of *f*z:

$$f\_{\mathbf{z}} = \min(f\_{\text{diff}}) \tag{2}$$
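A minimal NumPy sketch of Definition 1, under the stated assumption of equally many signals per category (sorting *f* before taking neighboring differences is our interpretation of the definition):

```python
import numpy as np

def differential_frequency(peak_freqs, k):
    """Compute f_z from the per-signal peak frequencies f_1..f_m,
    grouped into k fault categories (m/k signals per category)."""
    groups = np.array_split(np.asarray(peak_freqs, dtype=float), k)
    f = np.array([g.mean() for g in groups])   # category averages v_1..v_k
    f_sorted = np.sort(f)
    gaps = np.diff(f_sorted)
    i = int(np.argmax(gaps))                   # largest neighboring difference
    f_low, f_high = f_sorted[i], f_sorted[i + 1]
    cand = np.sort([f.min(), f.max(), f_low, f_high])
    diffs = np.abs(np.diff(cand))
    nonzero = diffs[diffs > 0]
    return float(nonzero.min())                # Eq. (2): minimum non-zero f_diff

print(differential_frequency([10, 10, 30, 30, 100, 100], k=3))  # 20.0
```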

**Definition 2.** *Frequency Window W* = [ *f*l, *f*r].

The frequency window is a specific frequency section for extracting feature information; *f*l is the left boundary and *f*r is the right boundary. The frequency window is determined with fixed boundaries, and feature information is extracted within the window. The boundaries are calculated as follows:

$$f\_{\mathbf{l}} = \operatorname{floor}\left(\frac{f\_{\text{min}}}{f\_{\mathbf{z}}}\right) \* f\_{\mathbf{z}} \tag{3}$$

$$f\_{\mathbf{r}} = \operatorname{ceil}\left(\frac{f\_{\text{max}}}{f\_{\mathbf{z}}}\right) \* f\_{\mathbf{z}} \tag{4}$$

where *floor*(∗) is a round-down function and *ceil*(∗) is a round-up function.

**Definition 3.** *Tolerance μ*.

Tolerance *μ* means that, in a section centered at *n f*z, *μ* is taken as the radius determining the search section (*n f*z − *μ*, *n f*z + *μ*], and the maximum amplitude value at a frequency within this section is regarded as the amplitude value for *n f*z. *μ* is calculated as follows:

$$\mu = \operatorname{floor}\left(\frac{f\_{\mathbf{z}}}{2}\right) \tag{5}$$

**Definition 4.** *Peak value ratio coefficient h*.

*h* denotes the degree of the peak amplitude value. It is utilized to judge whether the amplitude value is normal or not, and the *h* values construct the fault feature vectors. *h* is calculated as follows:

Firstly, the average of all the amplitude values in the frequency window [*f*l, *f*r] is calculated and denoted *A*ave; the maximum amplitude value in the section (*n f*z − *μ*, *n f*z + *μ*] is selected and denoted *A*max. Finally, *h* is calculated as follows:

$$h = \frac{A\_{\text{max}}}{A\_{\text{ave}}} \tag{6}$$

Figure 1 gives a description of the definitions mentioned above.

**Figure 1.** Sketch map of parameters in adaptive spectrum analysis.

Combined with Figure 1, SSA is implemented on each spectrum as follows:
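A minimal sketch of the per-spectrum SSA procedure, combining Definitions 1–4 (the frequency grid, window and tolerance handling are our interpretation of the definitions; the spectrum below is synthetic):

```python
import numpy as np

def ssa_features(spectrum, freqs, f_z, f_min, f_max):
    """Extract the unfixed-dimension feature vector of peak value
    ratio coefficients h for one amplitude spectrum."""
    f_l = np.floor(f_min / f_z) * f_z            # Eq. (3), left boundary
    f_r = np.ceil(f_max / f_z) * f_z             # Eq. (4), right boundary
    mu = np.floor(f_z / 2)                       # Eq. (5), tolerance
    in_win = (freqs >= f_l) & (freqs <= f_r)
    a_ave = spectrum[in_win].mean()              # average amplitude in window
    h = []
    n = 1
    while n * f_z <= f_r:                        # scan the points n*f_z
        sect = (freqs > n * f_z - mu) & (freqs <= n * f_z + mu)
        if sect.any():
            h.append(spectrum[sect].max() / a_ave)   # Eq. (6)
        n += 1
    return np.array(h)

# Synthetic spectrum: flat except for one peak at 50 Hz.
freqs = np.arange(101.0)
spec = np.ones(101)
spec[50] = 5.0
feats = ssa_features(spec, freqs, f_z=10.0, f_min=10.0, f_max=90.0)
print(feats.shape)  # (9,)
```

The returned vector has as many entries as there are grid points *n f*z inside the window, so its dimension adapts to *f*z rather than being fixed in advance.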


### *2.2. Framework Construction of Fault Diagnosis*

The overall framework of the proposed SSA-based fault diagnosis method in our research is shown in Figure 2.

**Figure 2.** SSA-based diagnostic framework.

The proposed fault diagnosis method includes three parts, namely data processing, fault feature extraction and fault feature classification.

### 2.2.1. Data Processing

As shown in Figure 3, a signal segment containing 120,000 points is selected and segmented into 100 parts of the same length, so that 100 samples are extracted from one signal segment. These 100 samples are then separated into a training sample set and a test sample set.
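The segmentation step can be sketched as follows (the signal here is synthetic; 120,000 points over 100 parts gives 1200 points per sample):

```python
import numpy as np

# Synthetic stand-in for one acquired signal segment of 120,000 points.
signal = np.random.default_rng(1).normal(size=120_000)
samples = signal.reshape(100, 1200)        # 100 samples of equal length

# Example 20/80 division: 20 training and 80 test samples per fault.
train, test = samples[:20], samples[20:]
print(train.shape, test.shape)  # (20, 1200) (80, 1200)
```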

Each sample is decomposed into a set of components with better analyzability using a time-frequency analysis method; LMD and EEMD are two commonly used ones. The first component in each set is chosen for fault feature extraction because it accumulates the main part of the energy.

**Figure 3.** Segmentation of samples.

### 2.2.2. Fault Feature Extraction

The FFT is utilized to transform each decomposed component into the frequency domain, and SSA is then implemented to extract fault features. The components of the training samples are first used to calculate *f*z, and this *f*z is used for both the training samples and the test samples to extract fault features.

### 2.2.3. Fault Feature Classification

Fault feature vectors are classified into different fault patterns. The vectors extracted from the training samples are used to train the classification model, and its parameters are tuned to optimize it. SVM is selected here because of its better classification performance with small sample sets. Finally, categories are output by the well-trained model.

### *2.3. Experiment Preparation*

### 2.3.1. Data Selection and Processing

Vibration signals acquired from bearings are utilized for validation. In this paper, the bearing data published by Case Western Reserve University were used [32]. Single-point faults were introduced on different parts of the test bearings (ball, inner race and outer race) to simulate different kinds of faults. Vibration signals for different kinds of faults with different failure degrees were collected under different loads to construct the experimental data set.

The data set consists of vibration data collected on SKF bearings with a sampling frequency of 12 kHz. The combinations of four loads (0, 1, 2 and 3 hp) and three failure degrees (0.007, 0.014 and 0.021 inch) form 12 different working conditions.

Under each working condition, four fault modes (normal, ball fault, inner race fault and outer race fault) are simulated, and four time-varying signals corresponding to the faults are collected, respectively. Each signal is processed with the method shown in Figure 3 to extract 100 samples, and 100 feature vectors are subsequently constructed. Eventually, 400 feature vectors are obtained under every working condition.

### 2.3.2. Parameter Determination

Parameters corresponding to decomposition methods, fault feature extraction process and fault feature classification modeling process are determined as follows:

Parameters to be determined in signal decomposition methods:


Parameters to be determined in SSA method:


Parameters to be determined in pattern recognition method:

(1) In SVM, the cost *c* is a basic parameter, while *g* is specific to the RBF kernel. In this paper, grid search [35] is applied, and the overall accuracy is taken into consideration to tune the two parameters.
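The grid search over (*c*, *g*) can be sketched as below; the scoring function here is a toy stand-in for the LIBSVM cross-validation accuracy that the actual tuning would use:

```python
import itertools

def grid_search(score_fn, c_grid, g_grid):
    """Pick the (c, g) pair with the highest score over an exhaustive grid."""
    return max(itertools.product(c_grid, g_grid),
               key=lambda cg: score_fn(*cg))

def toy_score(c, g):
    # Hypothetical accuracy surface peaking at c = 8, g = 0.5.
    return -(c - 8) ** 2 - (g - 0.5) ** 2

c_grid = [2 ** p for p in range(-2, 6)]    # 0.25 ... 32
g_grid = [2.0 ** p for p in range(-5, 2)]  # 0.03125 ... 2.0
print(grid_search(toy_score, c_grid, g_grid))  # (8, 0.5)
```

Exponentially spaced grids like these are the conventional choice for SVM hyperparameter search, since *c* and *g* act multiplicatively.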

### **3. Experiments and Results**

### *Experiment Results and Analysis*

In this subsection, a simulated signal *x*(*t*) is utilized to evaluate the effectiveness of the decomposition methods [33]. *x*(*t*) consists of two superimposed component signals:

$$x(t) = (1 + 0.5\cos(9\pi t))\cos(200\pi t + 2\cos(10\pi t)) + 3\cos\left(20\pi t^2 + 6\pi t\right), \quad t \in [0, 1]$$

The LMD and EEMD methods are used to decompose the signal. Figure 4 illustrates the results of the decomposition.
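For reference, the simulated signal can be generated with NumPy (the second, chirp-like component is read here as 3cos(20πt² + 6πt), our reconstruction of the formula):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
# Amplitude- and phase-modulated component.
x1 = (1 + 0.5 * np.cos(9 * np.pi * t)) * np.cos(200 * np.pi * t + 2 * np.cos(10 * np.pi * t))
# Chirp-like component with quadratic phase.
x2 = 3 * np.cos(20 * np.pi * t ** 2 + 6 * np.pi * t)
x = x1 + x2
print(x.shape)  # (1000,)
```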

**Figure 4.** Results of decomposition for simulated signal.

Figure 4a shows the oscillograph of the simulated signal. In Figure 4b,c, the oscillographs in red are the two original components of the raw simulated signal, and the ones in blue are the Product Function (PF) components extracted with the LMD method. Obviously, the original components and extracted PF components have a high similarity except for several end points on the right. In Figure 4d,e, the oscillographs in blue are the first two Intrinsic Mode Function (IMF) components extracted with the EEMD method; both of them have less similarity with the original ones. These results prove that the LMD adopted in the research can effectively decompose the raw signal into PF components which have physical significance, and that EEMD decomposes the raw signal into IMF components by another mechanism [18].

Four experimental sets are designed by combining two signal decomposition methods, two feature extraction methods and one fault feature classification method. The four experimental sets are arranged as shown in Table 1. LMD and EEMD are utilized to decompose the signals; the fault feature extraction methods are the proposed SSA and the combination of Sample Entropy (SE) and Energy Ratio (ER) [36]; and the LIBSVM [37] software package is used for pattern classification.


**Table 1.** Arrangement of experiments.

In each experiment set, 12 kinds of working conditions are considered (a working condition is denoted as a load–failure degree combination) and five sample division schemes are tested (a sample division scheme is denoted as the number of training samples out of the 100 samples for every fault / the number of test samples, i.e., 5/95, 10/90, 20/80, 40/60 and 60/40); with each scheme, 10 independent experiments are repeated. Ultimately, 2400 experiments are carried out in total within the four experiment sets. Table 2 shows the *f*z values and the dimensions of the feature vectors under the 12 working conditions with the sample division schemes of 5/95 and 60/40, respectively, in experimental set 1.


**Table 2.** Values of differential frequency and dimensions of character vectors.

The results illustrate that when the working condition or division scheme changes, the differential frequency *f*z value and dimension value of the feature vectors change accordingly.

The overall diagnostic capability of the proposed model is first evaluated without considering the sample division schemes. The average values and variances of the accuracies of all independent experiments (50 per condition) under each working condition are listed in Tables 3 and 4.


**Table 3.** Average diagnostic accuracy of all independent experiments corresponding to 12 different working conditions respectively in 1st–4th experiment sets.

**Table 4.** Variances of diagnostic accuracy of all independent experiments corresponding to 12 different working conditions respectively in 1st–4th experiment sets.


Figure 5a,b presents Tables 3 and 4 graphically.

**Figure 5.** (**a**) Average diagnostic accuracy of all independent experiments corresponding to 12 different working conditions respectively in 1st–4th experimental sets. (**b**) Variance of diagnostic accuracy of all independent experiments corresponding to 12 different working conditions respectively in 1st–4th experimental set.

Table 3 and Figure 5a show that Set 3 achieves the best average accuracy under six working conditions and Set 1 under five, while Set 4 ranks first under only one working condition. Overall, the average accuracies of Set 1 and Set 3 are 97.42% and 98.27%, maintaining a high level, while those of Set 2 and Set 4 are only 92.21% and 80.36%, and are especially poor in severe failure situations. Meanwhile, the variance of the average accuracies under different working conditions is 1.99 for Set 1, and 37.84, 4.08 and 195.72 for Set 2, Set 3 and Set 4, respectively. The statistics of Set 1 and Set 3 clearly indicate better performance than those of Set 2 and Set 4.

Table 4 and Figure 5b show that Set 3 obtains the smallest variances under nine working conditions and Set 1 under two, while Set 4 performs best under only one working condition. The averages of the variance values under the 12 working conditions for Set 1, Set 2, Set 3 and Set 4 are 5.81, 18.12, 3.17 and 20.52, respectively, so Set 1 and Set 3 outperform Set 2 and Set 4. Meanwhile, the variances of Set 2 and Set 4 fluctuate noticeably under severe failure conditions.

To simulate the situation in real application scenarios where labeled data are rare, a small-sample division scheme is tested. The average values and variances of the accuracies of all independent experiments (10 per condition) under each working condition with the (5/95) sample division scheme are shown in Tables 5 and 6.

**Table 5.** Average diagnostic accuracy to all independent experiments corresponding to 12 different working conditions respectively with the scheme (5/95) in 1st–4th experiment sets.


**Table 6.** Variances of diagnostic accuracy to all independent experiments corresponding to 12 different working conditions respectively with the scheme of (5/95) in 1st–4th experiment sets.


Figure 6a,b presents the results of Tables 5 and 6 graphically.

**Figure 6.** (**a**) Average diagnostic accuracy to all independent experiments corresponding to 12 different working conditions respectively with the scheme of (5/95) in the 1st–4th experimental sets. (**b**) Variance of diagnostic accuracy to all independent experiments corresponding to 12 different working conditions, respectively, with the scheme of (5/95) in the 1st–4th experimental sets.

Table 5 and Figure 6a show that Set 3 achieves the best average accuracy under eight working conditions, and Set 1 and Set 4 each rank first under two. Overall, the average accuracies of Set 1 and Set 3 are 95.94% and 97.59%, still high with only slight decreases compared to the overall averages, while those of Set 2 and Set 4 are only 87.22% and 76.15%, especially poor in severe failure situations and sharply lower than the overall averages. Meanwhile, the variances of the average accuracies under different working conditions are 10.65 and 4.00 for Set 1 and Set 3, and 20.51 and 15.4 for Set 2 and Set 4, respectively. The results of Set 1 and Set 3 are clearly better than those of Set 2 and Set 4.

Table 6 and Figure 6b show that the variances of the 50 independent experiments in Set 3 remain steadily low under every working condition, with a mean of 4.00 over the 12 conditions. In Set 1, the variance fluctuates noticeably only under condition 3–0.014, with a mean of 10.65; the variances in Set 2 and Set 4 fluctuate wildly, with means of 20.51 and 14.54, respectively.

Convergence behaviour as the number of training samples increases is another key indicator for evaluating a model. Experiments were conducted under different working conditions with different sample division schemes; the means and variances of the accuracies of all independent experiments (10 runs) are listed in Table 7, and Figure 7 presents Table 7 graphically.

In Table 7, 60 comparisons are conducted under different working conditions with different sample division schemes. Set 1 and Set 3 achieve better average accuracy in 56 of the 60 comparisons, while Set 2 or Set 4 achieves higher accuracy only under working condition 0–0.007, with four of the schemes. As Figure 7 shows, as the number of training samples increases, the average accuracies of Set 1 and Set 3 converge toward the highest value noticeably faster than those of Set 2 and Set 4.


**Table 7.** Average diagnostic accuracy to all independent experiments corresponding to 12 different working conditions respectively with different schemes in 1st–4th experiment sets.

Considering all the results together, further analysis was carried out. LMD and EEMD can decompose nonlinear and unstable signals into sets of time-domain components that are easier to analyse. The proposed SSA method adaptively extracts feature information according to local characteristics and constructs fault feature vectors of unfixed dimension, and it proves more efficient and robust. SSA-based fault diagnosis obtains higher accuracies under different working conditions and sample division schemes in most comparisons (56/60), and its accuracies fluctuate less between conditions. As the number of samples increases, the accuracies achieved with the SSA-based method converge towards the highest values faster. In particular, with the small (5/95) division scheme, the SSA-based methods still maintain high accuracy and stability, making them especially suitable for practical scenarios with small amounts of training data.

**Figure 7.** Average diagnostic accuracy to all independent experiments corresponding to 12 different working conditions, respectively, with different schemes in the 1st–4th experimental sets.

### **4. Conclusions**

To improve fault extraction performance, SSA is proposed in this paper. Combined with signal decomposition methods, SSA effectively extracts fault features from non-linear and unstable signals, which are then classified with an SVM. Bearing data under 12 different working conditions obtained from CWRU are used to evaluate the diagnosis methods. The conclusions may be summarized as follows:


### **5. Future Lines of Work**

In recent years, deep learning has gradually been adopted in fault diagnosis. Its multi-layer structure allows it to extract fault features automatically, which can further improve feature extraction. At the same time, transfer learning [38] has achieved great success in many fields; its generalization capability can also be exploited in fault diagnosis to carry diagnostic theory into applications. Our future work will therefore focus on combining deep learning and transfer learning for fault diagnosis.

**Author Contributions:** J.W. conceived the original ideas, designed and conducted all the experiments, and drafted the manuscript. T.T., T.H. and L.W. contributed to writing, review and editing. M.C. supervised the project.

**Funding:** This research was funded by the Ministry of Industry and Information Technology of the People's Republic of China (MIIT; Projects: 2017 Prefabricated Building New Material Intelligent Manufacturing Construction Project, 2018 Microwave Component Digital Factory New Mode Application, 2018 Remote Operation and Maintenance Standards and Experimental Verification of Key Equipment for Integrated Circuit Packaging), and the Young Excellent Talent Cultivation Project at Tongji University (2016KJ020).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Ultrasonic Flaw Echo Enhancement Based on Empirical Mode Decomposition**

### **Wei Feng <sup>1</sup>, Xiaojun Zhou <sup>1</sup>, Xiang Zeng <sup>2</sup> and Chenlong Yang <sup>1,</sup>\***


Received: 3 December 2018; Accepted: 6 January 2019; Published: 9 January 2019

**Abstract:** The detection of flaw echoes in backscattered signals in ultrasonic nondestructive testing can be challenging due to the existence of backscattering noise and electronic noise. In this article, an empirical mode decomposition (EMD) methodology is proposed for flaw echo enhancement. The backscattered signal was first decomposed into several intrinsic mode functions (IMFs) using EMD or ensemble EMD (EEMD). The sample entropies (SampEn) of all IMFs were used to select the relevant modes. Otsu's method was used for interval thresholding of the first relevant mode, and a window was used to separate the flaw echoes in the relevant modes. The flaw echo was reconstructed by adding the residue and the separated flaw echoes. The established methodology was successfully employed for simulated signal and experimental signal processing. For the simulated signals, an improvement of 9.42 dB in the signal-to-noise ratio (SNR) and an improvement of 0.0099 in the modified correlation coefficient (MCC) were achieved. For experimental signals obtained from two cracks at different depths, the flaw echoes were also significantly enhanced.

**Keywords:** ultrasonic flaw echo enhancement; empirical mode decomposition; sample entropy; Otsu's method for thresholding; flaw echo separation

### **1. Introduction**

The ultrasonic technique has been widely used in nondestructive testing. Usually, the backscattered signal is complex due to the existence of electronic noise and backscattering noise. Consequently, flaw echo detection may be challenging. Numerous methods have been proposed to enhance flaw echoes, such as split spectrum processing [1–4], wavelet transforms [5–10], the Stockwell transform [11–14], and empirical mode decomposition (EMD) [15–22] (including the so-called ensemble EMD, i.e., EEMD [23]).

Split spectrum processing has significant advantages for ultrasonic signals contaminated with scattered noise. It divides the spectrum of the signal into several sub-bands and applies nonlinear de-noising criteria (such as thresholding) to the signal in each sub-band. Its difficulty lies in determining the filter type, central frequency, bandwidth and other parameters. In addition, split spectrum processing lacks the capability of multiresolution analysis.

The wavelet transform is a classical multiresolution analysis method. Its difficulty lies in choosing an appropriate wavelet basis function and decomposition level, and the conventional wavelet transform loses the phase information of the signal.

The S transform is a development of the short-time Fourier transform and the wavelet transform: it combines the multiresolution capability of the wavelet transform with the phase retention of the short-time Fourier transform. The S transform adopts a Gaussian window function that satisfies the normalization property, so it is invertible; the original signal can be recovered from the transformed time spectrum. However, because the standard deviation of the Gaussian window in the S transform is inversely proportional to frequency and lacks flexibility, the S transform may yield poor time-frequency resolution. To compensate for this limitation, researchers have introduced additional parameters that control the window shape, giving the generalized S transform the ability to adjust the time-frequency resolution flexibly.

For a single-component signal, the Hilbert transform can be applied to obtain its analytical signal, from which the envelope spectrum and instantaneous frequency can be obtained. Detecting a flaw signal from the envelope spectrum is a common method in ultrasonic nondestructive testing. A multi-component signal must first be decomposed into single-component signals, whose analytical signals are then obtained separately. Empirical mode decomposition (EMD) is an adaptive decomposition method that decomposes the original signal into a series of intrinsic mode functions. The main disadvantages of EMD are its endpoint effect and mode mixing. Ensemble empirical mode decomposition (EEMD) can solve the mode mixing issue: it decomposes copies of the original signal with added Gaussian white noise and takes the ensemble average as the mode function. The main difficulty of EEMD is selecting the intensity of the added Gaussian white noise and the number of ensemble averages.

In this article, an EMD-based methodology for ultrasonic flaw echo enhancement was established. The proposed methodology enhanced the flaw echo through six steps. First, the backscattered signal was adaptively decomposed into several intrinsic mode functions (IMFs) by EMD or EEMD. Second, the sample entropies (SampEn) [24,25] of all IMFs were calculated, and the differences between consecutive SampEn values were studied. Third, IMFs with a large SampEn were considered irrelevant modes and discarded, significantly suppressing the electronic noise. Fourth, the intervals containing the flaw echo were determined based on IMF interval thresholding and mode cell merging; Otsu's method [26] was used to search for the threshold in IMF interval thresholding. Fifth, a Tukey-Hanning window was used to separate the flaw echo in each relevant mode. Finally, the denoised signal was reconstructed by combining the separated flaw echoes and the residue.

The remainder of this article is organized as follows. In Section 2, reviews of the required tools, including EMD and EEMD, EMD-based denoising methods, SampEn, and Otsu's method for thresholding, are presented. An analysis of the modes extracted from the backscattered signal, including the mixing of noise and flaw echoes, and the SampEn is given in Section 3. The proposed EMD-based methodology for ultrasonic flaw echo enhancement is given in Section 4 and tested using a simulated signal in Section 5. In Section 6, experimental validations of the proposed EEMD-based methodology are presented. Finally, conclusions are drawn in Section 7.

### **2. Required Tools**

### *2.1. EMD and EEMD*

In EMD, a signal is adaptively decomposed into a collection of IMFs. The EMD results can be presented as

$$\mathbf{x}(t) = \sum\_{i=1}^{L} h^{(i)}(t) + r(t) \tag{1}$$

where *x*(*t*) is the observed signal, *h*<sup>(*i*)</sup>(*t*) (*i* = 1, . . . , *L*) are the extracted IMFs, and *r*(*t*) is the residue.

Unfortunately, EMD is susceptible to mode-mixing. EEMD is an effective technique for alleviating mode-mixing in EMD by repeatedly adding Gaussian white noise and finding the mean of the individual ensemble IMFs as the final IMF.
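The ensemble averaging at the heart of EEMD can be sketched independently of any particular sifting routine. In the sketch below, `decompose` is a placeholder argument standing in for an actual EMD implementation, so the function and its names are illustrative, not a library API.

```python
import numpy as np

def eemd(x, decompose, n_trials=50, noise_std=0.2, rng=None):
    """EEMD ensemble averaging (sketch): decompose many noise-added copies
    of x and average the resulting IMFs. `decompose` is any EMD routine
    returning an array of IMFs of fixed shape."""
    rng = np.random.default_rng(rng)
    acc = None
    for _ in range(n_trials):
        imfs = decompose(x + noise_std * rng.standard_normal(x.size))
        acc = imfs if acc is None else acc + imfs
    return acc / n_trials
```

Averaging over trials cancels the added white noise while stabilizing the mode assignment, which is why the added noise intensity and the number of trials are the two parameters that must be tuned.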

### *2.2. Denoising Strategies Based on EMD*

Partial reconstruction, direct thresholding, and interval thresholding are three typical strategies adopted in EMD-based denoising.

Partial reconstruction removes noise from the observed signal by discarding irrelevant modes, which can be expressed as

$$\hat{x}(t) = \sum\_{i=M\_1}^{L} h^{(i)}(t) + r(t) = x(t) - \sum\_{i=1}^{M\_1 - 1} h^{(i)}(t) \tag{2}$$

where *h*<sup>(*i*)</sup>(*t*) (*i* < *M*<sub>1</sub>) are the irrelevant modes.
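Equation (2) amounts to summing the retained modes and the residue; a minimal numpy sketch (function name is ours, IMFs stacked row-wise and indexed 1-based as in the text):

```python
import numpy as np

def partial_reconstruction(imfs, residue, M1):
    """Eq. (2): drop irrelevant modes h^(1)..h^(M1-1), keep the rest.

    imfs: array of shape (L, N), one IMF per row (1-based index M1).
    """
    return np.sum(imfs[M1 - 1:], axis=0) + residue
```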

Direct thresholding is a direct application of wavelet thresholding in the EMD case. For the hard thresholding case, the denoised IMF is given by

$$\hat{h}^{(i)}(t) = \begin{cases} h^{(i)}(t), & \left| h^{(i)}(t) \right| > T\_i \\ 0, & \left| h^{(i)}(t) \right| \le T\_i \end{cases} \tag{3}$$

where *T*<sub>*i*</sub> is the threshold of *h*<sup>(*i*)</sup>(*t*). The denoised signal can be given by

$$\widetilde{x}(t) = \sum\_{i=M\_1}^{M\_2} \hat{h}^{(i)}(t) + \sum\_{i=M\_2+1}^{L} h^{(i)}(t) + r(t) \tag{4}$$
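The hard-thresholding rule of Equation (3) is a one-liner in numpy; a hedged sketch (function name is ours):

```python
import numpy as np

def hard_threshold(h, T):
    """Eq. (3): zero every sample of an IMF whose magnitude is at most T."""
    h = np.asarray(h, dtype=float)
    return np.where(np.abs(h) > T, h, 0.0)
```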

The interval thresholding divides an IMF into several mode cells and thresholds each mode cell as a whole. Generally, a mode cell is defined as the signal between two adjacent zero-crossings. For the interval **z**<sub>*j*</sub><sup>(*i*)</sup> = [*z*<sub>*j*</sub><sup>(*i*)</sup>, *z*<sub>*j*+1</sub><sup>(*i*)</sup>] defined by two zero-crossings *z*<sub>*j*</sub><sup>(*i*)</sup> and *z*<sub>*j*+1</sub><sup>(*i*)</sup>, the denoised IMF in the hard thresholding case is given as

$$\widetilde{h}^{(i)}(\mathbf{z}\_{j}^{(i)}) = \left\{ \begin{array}{ll} h^{(i)}(\mathbf{z}\_{j}^{(i)}), & \left| h^{(i)}(r\_{j}^{(i)}) \right| > T\_{i} \\ 0, & \left| h^{(i)}(r\_{j}^{(i)}) \right| \le T\_{i} \end{array} \right. \tag{5}$$

where *h*<sup>(*i*)</sup>(**z**<sub>*j*</sub><sup>(*i*)</sup>) denotes all of the samples from *z*<sub>*j*</sub><sup>(*i*)</sup> to *z*<sub>*j*+1</sub><sup>(*i*)</sup>, and *h*<sup>(*i*)</sup>(*r*<sub>*j*</sub><sup>(*i*)</sup>) is the single extremum of *h*<sup>(*i*)</sup>(*t*) in the interval **z**<sub>*j*</sub><sup>(*i*)</sup>.

Interval thresholding generally outperforms direct thresholding as it avoids catastrophic consequences for the continuity of the reconstructed signal, which are inevitable in direct thresholding.
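Interval thresholding of Equation (5) can be sketched with numpy by locating sign changes and keeping or zeroing each cell as a unit; this is a minimal sketch under the assumption that the IMF has no exact-zero samples, and the function name is ours.

```python
import numpy as np

def interval_threshold(imf, T):
    """Eq. (5) sketch: hard-threshold an IMF mode cell by mode cell.

    A mode cell is the span between two adjacent zero-crossings; the whole
    cell is kept if its single extremum exceeds the threshold T, else zeroed.
    """
    imf = np.asarray(imf, dtype=float)
    out = np.zeros_like(imf)
    # indices where the sign changes mark zero-crossings
    crossings = np.where(np.diff(np.sign(imf)) != 0)[0] + 1
    bounds = np.concatenate(([0], crossings, [len(imf)]))
    for a, b in zip(bounds[:-1], bounds[1:]):
        cell = imf[a:b]
        if cell.size and np.max(np.abs(cell)) > T:
            out[a:b] = cell  # keep the whole mode cell
    return out
```

Keeping or discarding whole cells is what preserves the continuity that sample-by-sample direct thresholding destroys.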

### *2.3. Sample Entropy*

Approximate entropy [27] and SampEn are two popular metrics for signal complexity measurement. Entropy values increase with increased signal complexity. It has been reported that SampEn outperforms approximate entropy in many aspects, such as reduced bias, independence from the signal, and relative consistency. SampEn was used here for signal complexity assessment.

The SampEn of a specified time series {*u*(*i*), *i* = 1, 2, ··· , *N*} can be obtained through the following steps.

Step 1: Form *m*-dimensional vectors as

$$U\_m(i) = [u(i), u(i+1), \dots, u(i+m-1)]\tag{6}$$

where *i* = 1, 2, . . . , *N* − *m* + 1.

Step 2: Define the distance of two such vectors:

$$d\left[U\_m(i), U\_m(j)\right] = \max\_{0 \le k \le m-1} |u(i+k) - u(j+k)|\tag{7}$$

Step 3: Consider the first *N* − *m* vectors of length *m* so that for *i* = 1, ··· , *N* − *m*, both *U*<sub>*m*</sub>(*i*) and *U*<sub>*m*+1</sub>(*i*) are defined.

Step 4: Given a threshold *r* > 0, define

$$\begin{array}{l}B\_i^m(r) = \frac{1}{N-m-1} \sum\_{j=1,\, j \neq i}^{N-m} \Theta(r - d[U\_m(i), U\_m(j)])\\A\_i^m(r) = \frac{1}{N-m-1} \sum\_{j=1,\, j \neq i}^{N-m} \Theta(r - d[U\_{m+1}(i), U\_{m+1}(j)])\end{array} \tag{8}$$

where *j* = 1, 2, . . . , *N* − *m*, *j* ≠ *i*, and Θ(·) is defined as

$$\Theta(\mathbf{x}) = \begin{cases} \ 1, & \mathbf{x} \ge 0 \\\ 0, & \mathbf{x} < 0 \end{cases} \tag{9}$$

Step 5: Calculate *B*<sup>*m*</sup>(*r*) and *A*<sup>*m*</sup>(*r*):

$$\begin{aligned} B^m(r) &= \frac{1}{N-m} \sum\_{i=1}^{N-m} B\_i^m(r) \\ A^m(r) &= \frac{1}{N-m} \sum\_{i=1}^{N-m} A\_i^m(r) \end{aligned} \tag{10}$$

Step 6: For a limited series, the SampEn (denoted by *s*) is estimated as

$$s = SampEn(m, r) = -\ln \frac{A^m(r)}{B^m(r)}\tag{11}$$

In general, the dimension and threshold are often set to *m* = 2 and *r* = (0.1 ∼ 0.25)*SD*<sub>*u*</sub>, where *SD*<sub>*u*</sub> is the standard deviation of the time series {*u*(*i*)}.
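Steps 1–6 translate directly into a small numpy routine. The sketch below is a didactic O(N²) implementation under the usual defaults (*m* = 2, *r* = 0.15·SD); the function name is ours.

```python
import numpy as np

def sample_entropy(u, m=2, r=None):
    """Sample entropy of a 1-D series, following Steps 1-6 above (sketch)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    if r is None:
        r = 0.15 * np.std(u)          # common choice: r = (0.1~0.25)*SD_u

    def count_matches(order):
        # embed into vectors U_order(i); use N - order of them so that
        # the (order+1)-length extension of each template also exists
        X = np.array([u[i:i + order] for i in range(N - order)])
        hits = 0
        for i in range(len(X)):
            # Chebyshev distance of template i to every other vector, Eq. (7)
            d = np.max(np.abs(X - X[i]), axis=1)
            hits += np.sum(d <= r) - 1   # exclude the self-match (j != i)
        return hits

    B = count_matches(m)       # length-m matches
    A = count_matches(m + 1)   # length-(m+1) matches
    return -np.log(A / B)      # Eq. (11)
```

Regular signals (many length-*m* matches that extend to length *m*+1) give values near zero; noise gives large values, which is exactly the property used below to rank the IMFs.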

### *2.4. Otsu's Method for Thresholding*

Otsu's method determines a threshold by maximizing the between-class variance *σ*<sub>B</sub><sup>2</sup>. For a histogram with *H* levels (i.e., bins), the probability at each level can be first obtained:

$$p\_i = \frac{n\_i}{N}, i = 1, 2, \dots, H \tag{12}$$

where *n*<sub>*i*</sub> is the number of elements in the *i*th level, and *N* is the total number of elements in the histogram. Obviously, *p*<sub>*i*</sub> ≥ 0 and ∑ *p*<sub>*i*</sub> = 1 are satisfied.

The histogram can be divided into two classes, *C*<sub>1</sub> and *C*<sub>2</sub>, by a threshold. *σ*<sub>B</sub><sup>2</sup> is defined as

$$
\sigma\_\mathcal{B}^2 = \omega\_1 (\mu\_1 - \mu\_0)^2 + \omega\_2 (\mu\_2 - \mu\_0)^2 \tag{13}
$$

where 
$$\omega\_1 = \sum\_{\mathbb{C}\_1} p\_i, \quad \omega\_2 = \sum\_{\mathbb{C}\_2} p\_i, \quad \omega\_1 + \omega\_2 = 1 \tag{14}$$

*μ*<sub>0</sub>, *μ*<sub>1</sub>, and *μ*<sub>2</sub> are the means of the whole histogram, class *C*<sub>1</sub>, and class *C*<sub>2</sub>, respectively. Therefore, Equation (15) can be obtained:

$$
\mu\_0 = \omega\_1 \mu\_1 + \omega\_2 \mu\_2 \tag{15}
$$

According to Equations (13)–(15), *σ*<sub>B</sub><sup>2</sup> can also be expressed as

$$
\sigma\_\text{B}^2 = \omega\_1 \omega\_2 (\mu\_1 - \mu\_2)^2 \tag{16}
$$
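Equations (12)–(16) can be sketched as an exhaustive search over histogram splits; the simplified form of Equation (16) is what is maximized. The function name and the default bin count are ours.

```python
import numpy as np

def otsu_threshold(values, H=64):
    """Otsu's threshold over a histogram of `values` (sketch of Eq. (16)).

    Returns the bin edge maximizing the between-class variance
    sigma_B^2 = w1 * w2 * (mu1 - mu2)^2.
    """
    counts, edges = np.histogram(values, bins=H)
    p = counts / counts.sum()                  # p_i of Eq. (12), sums to 1
    centers = 0.5 * (edges[:-1] + edges[1:])
    best, best_t = -1.0, edges[1]
    for k in range(1, H):                      # C1 = bins < k, C2 = the rest
        w1, w2 = p[:k].sum(), p[k:].sum()      # omega_1, omega_2 of Eq. (14)
        if w1 == 0 or w2 == 0:
            continue
        mu1 = (p[:k] * centers[:k]).sum() / w1
        mu2 = (p[k:] * centers[k:]).sum() / w2
        var_b = w1 * w2 * (mu1 - mu2) ** 2     # Eq. (16)
        if var_b > best:
            best, best_t = var_b, edges[k]     # edges[k] separates C1 and C2
    return best_t
```

In the methodology below, `values` would be the absolute extrema of the first relevant mode, so the returned threshold separates flaw-echo cells from noise cells.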

### **3. Analysis of Modes from Ultrasonic Signals**

### *3.1. The Clutter Model*

For metallic materials, when an incident ultrasonic wave propagates into the specimen, the backscattered signal will be primarily composed of three components: (1) the flaw echo signal *s*(*t*), (2) backscattering noise *v*(*t*) due to the grains, and (3) electronic noise *n*(*t*) due to the instruments and the environment. *n*(*t*) can be approximated as Gaussian white noise. The frequency spectra of *s*(*t*) and *v*(*t*) can be expressed as [2]

$$V(\omega) = H\_{\rm t}^{2}(\omega) \sum\_{k=1}^{K} \beta\_{k} \frac{\omega^{2}}{x\_{k}}\, e^{-\alpha\_{\rm s} 2x\_{k}\omega}\, e^{-\mathrm{i}\omega \frac{2x\_{k}}{c\_{0}}} \tag{17}$$

$$S(\omega) = H\_{\rm t}^{2}(\omega) \exp(-\alpha\_{\rm s} 2d\_{\rm flaw}\,\omega) \exp\left(-\mathrm{i}\omega \frac{2d\_{\rm flaw}}{c\_{0}}\right) \tag{18}$$

where *H*<sub>t</sub>(*ω*) is the frequency response of the ultrasonic transducer, *d*<sub>flaw</sub> is the location of the flaw, *α*<sub>s</sub> is the material attenuation coefficient, *c*<sub>0</sub> is the velocity of the longitudinal waves, and *K* is the total number of scatterers. *β*<sub>*k*</sub> and *x*<sub>*k*</sub> are the scattering coefficient and the position of the *k*th scatterer, respectively. The amplitudes of *s*(*t*) and *v*(*t*) are often normalized for brevity.

The similarity function obtained by deconvolution can be used to distinguish flaw signals in ultrasonic inspection from non-flaw signals, such as geometric reflections from the specimen. It was found that the deconvolution patterns of geometric reflections were impulse-like, whereas those of flaws were bipolar [28]. Therefore, geometric reflection was not considered in the model.

The observed backscattered signal *x*(*t*) can be expressed as

$$x(t) = s(t) + \mu v(t) + \sigma\_n n\_0(t) \tag{19}$$

where *n*<sub>0</sub>(*t*) is standard Gaussian white noise, and *μ* and *σ*<sub>*n*</sub> are scale factors.

Typical simulated results are depicted in Figure 1. The centre frequency of the ultrasonic transducer was 5 MHz, and the sampling frequency was 100 MHz. The scale factors were set to *μ* = 0.3 and *σ*<sub>*n*</sub> = 0.2. Figure 1a shows that the flaw echo was polluted by intense noise. In addition, frequency aliasing of the flaw echo, backscattering noise, and wide-band Gaussian white noise can be seen in Figure 1b.
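A signal of the form of Equation (19) can be sketched directly in the time domain. The sketch below is a simplified analogue, not the frequency-domain clutter model of Equations (17) and (18): the pulse shape, bandwidth, and scatterer statistics are illustrative assumptions, with only the mixing weights *μ* = 0.3 and *σ*<sub>*n*</sub> = 0.2 and the 5 MHz / 100 MHz setup taken from the text.

```python
import numpy as np

fs, fc = 100e6, 5e6            # sampling rate and transducer centre frequency
t = np.arange(2000) / fs       # 20 us record

def pulse(t0, amp, bw=0.6e-6):
    """Gaussian-modulated echo arriving at time t0 (illustrative shape)."""
    return amp * np.exp(-((t - t0) / bw) ** 2) * np.cos(2 * np.pi * fc * (t - t0))

rng = np.random.default_rng(2)
s = pulse(10e-6, 1.0)                                   # flaw echo s(t)
v = sum(pulse(t0, rng.uniform(0.2, 1.0))                # grain scatterers
        for t0 in rng.uniform(0, t[-1], 40))
v = v / np.max(np.abs(v))                               # normalized, as in the text
n0 = rng.standard_normal(t.size)                        # electronic noise n_0(t)
x = s + 0.3 * v + 0.2 * n0                              # Eq. (19), mu=0.3, sigma_n=0.2
```

With these weights the flaw echo is buried in the clutter, reproducing the situation of Figure 1a.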

**Figure 1.** Simulated results. (**a**) Waveforms and (**b**) frequency spectra.

### *3.2. Signal Decomposition*

The EMD results for the observed signal *x*(*t*) are shown in Figure 2.

**Figure 2.** Intrinsic mode functions (IMFs) extracted. (**a**) IMF 1 ~ IMF 4; (**b**) IMF 5 ~ IMF 8.

As shown in Figure 2, intense white noise was found in the low-order IMFs; in particular, IMF 1 resembled pure noise. The flaw echo was clearly detected in IMF 3. Consequently, the flaw echo was significantly enhanced by discarding IMF 1 and IMF 2 from the observed signal. However, partial reconstruction is often inadequate; further processing is required. Here, we take IMF 3 as an example.

The flaw echo is an instant signal. Ideally, oscillations can only be detected in the interval in which the flaw echo is located in IMF 3, in contrast to Figure 2. The difference mainly arises from frequency aliasing of the flaw echo, backscattering noise and wide-band Gaussian white noise. In addition, even in the noiseless case, the IMFs still contain false oscillations, as they resemble AM-FM modulated sinusoids. Consequently, further denoising is required to suppress the mixed noise lying outside the interval in which the flaw echo is located.

### *3.3. SampEn Values of IMFs*

The SampEn values of all of the IMFs were calculated and are listed in Table 1, where the threshold *r* in the SampEn calculations was set to (0.1, 0.15, 0.2)*SD*<sub>*u*</sub>.


**Table 1.** SampEn of the IMFs.

It can be noted that the SampEn tends to decrease with increasing IMF order, which indicates that the noise intensity in each IMF decreases as the IMF order increases. It is noteworthy that the SampEn values of the first two IMFs, i.e., *s*<sub>1</sub> and *s*<sub>2</sub>, were much higher than the others. In addition, a sharp drop between *s*<sub>2</sub> and *s*<sub>3</sub> was detected.

### **4. Proposed Methodology**

According to Section 3.2, flaw echo enhancement can be achieved by discarding irrelevant modes and suppressing the mixed noise in the remaining relevant modes. Specifically, the relevant modes are determined by the SampEn of all IMFs or their differences, and the mixed noise is suppressed by separating the flaw echo from those relevant modes by windowing.

### *4.1. Relevant Mode Selection*

The IMFs with intense noise were of much higher complexity than the other IMFs. Consequently, the relevant modes were determined according to the SampEn of the IMFs. Specifically, the parameter *M*<sub>1</sub>, which identifies the first relevant mode, was determined by *s*<sub>*i*</sub> or by the difference of *s*<sub>*i*</sub>.

Given a predefined SampEn threshold *T*<sub>s</sub>, *M*<sub>1</sub> can be determined as

$$M\_1 = \max\{i \mid s\_i > T\_{\rm s}\} + 1 \tag{20}$$

Using the difference of *s*<sub>*i*</sub>, i.e., d*s*<sub>*i*</sub>, *M*<sub>1</sub> can also be determined:

$$M\_1 = \underset{i}{\text{argmin}} \{ \text{ds}\_i \} = \underset{i}{\text{argmin}} \{ s\_i - s\_{i-1} \} \tag{21}$$

To determine *T*<sub>s</sub>, we should note that the SampEn depends on the threshold *r*. For example, Table 1 shows that *s*<sub>1</sub> and *s*<sub>2</sub> decreased rapidly with increasing *r*. Consequently, the selection of *T*<sub>s</sub> depends on *r*. For this article, *r* = 0.15*SD*<sub>*u*</sub> and *T*<sub>s</sub> = 1 were selected.
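Both selection rules, Equations (20) and (21), reduce to a few lines given the SampEn sequence of the IMFs; a hedged sketch assuming the SampEn values are roughly decreasing with IMF order, as observed in Table 1 (function names are ours, and the returned index is 1-based like the IMF numbering):

```python
import numpy as np

def first_relevant_mode(sampens, T_s=1.0):
    """Eq. (20): M1 is one past the last IMF whose SampEn exceeds T_s."""
    s = np.asarray(sampens, dtype=float)
    above = np.where(s > T_s)[0]               # 0-based indices with s_i > T_s
    return int(above[-1]) + 2 if above.size else 1   # +1 to 1-based, +1 per Eq. (20)

def first_relevant_mode_diff(sampens):
    """Eq. (21): M1 is the index with the sharpest drop ds_i = s_i - s_{i-1}."""
    s = np.asarray(sampens, dtype=float)
    return int(np.argmin(np.diff(s))) + 2      # diff[k] = s_{k+2} - s_{k+1}, 1-based
```

For a Table-1-like sequence both rules agree, which is the cross-check used in Section 5.2.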

### *4.2. Mixed Noise Suppression*

Two steps are required to suppress mixed noise: determine the location of the flaw echo in the first relevant mode and separate the flaw echo from all of the relevant modes.

For the first relevant mode, i.e., *h*<sup>(*M*<sub>1</sub>)</sup>(*t*), a collection of mode cells is built according to the zero-crossings. The set of absolute values of the extrema in *h*<sup>(*M*<sub>1</sub>)</sup>(*t*), denoted |**r**<sup>(*i*)</sup>| = [|*r*<sub>1</sub><sup>(*i*)</sup>|, |*r*<sub>2</sub><sup>(*i*)</sup>|, ···], can then be determined. Otsu's method is used to search for the threshold of |**r**<sup>(*i*)</sup>|, i.e., *T*<sub>*i*</sub>. It is applied to the first relevant mode, and all of the mode cells in which the flaw echoes are located can be detected.

All of the adjacent mode cells in which the flaw echo is located are further merged, yielding an interval in which the flaw echo is located. For example, three adjacent mode cells **z**<sub>*j*</sub><sup>(*i*)</sup> = [*z*<sub>*j*</sub><sup>(*i*)</sup>, *z*<sub>*j*+1</sub><sup>(*i*)</sup>], **z**<sub>*j*+1</sub><sup>(*i*)</sup> = [*z*<sub>*j*+1</sub><sup>(*i*)</sup>, *z*<sub>*j*+2</sub><sup>(*i*)</sup>], and **z**<sub>*j*+2</sub><sup>(*i*)</sup> = [*z*<sub>*j*+2</sub><sup>(*i*)</sup>, *z*<sub>*j*+3</sub><sup>(*i*)</sup>] can be merged into one interval [*z*<sub>*j*</sub><sup>(*i*)</sup>, *z*<sub>*j*+3</sub><sup>(*i*)</sup>]. In other words, the interval in which the flaw echo is located is often composed of several mode cells containing the flaw echo.

Thus far, the location of the flaw echo in the first relevant mode has been detected. Next, the flaw echoes in all relevant modes will be separated.

A window was determined and used for separating the flaw echoes in the relevant modes. Herein, the Tukey-Hanning window was used, defined as

$$\omega(m) = \begin{cases} \frac{1-\cos\left(2\pi \frac{m}{M+1}\right)}{2}, & 1 \le m \le \frac{M}{2} \\ 1, & \frac{M}{2} < m \le \frac{M}{2} + W \\ \frac{1-\cos\left(2\pi \frac{m-W}{M+1}\right)}{2}, & \frac{M}{2} + W < m \le M+W \end{cases} \tag{22}$$

where *W* determined the width of the pass zone and *M* determined the width of the transition zone. As an example, Figure 3 shows the waveform of a Tukey-Hanning window.

**Figure 3.** Waveform of a Tukey-Hanning window.

The parameters *W* and *M* were determined according to the width of the interval in which the flaw echo was located. To preserve the flaw echo, *W* was set to the width of this interval. *M* was selected more flexibly. In this article, *M* = [*W*/4] was adopted, where the operator "[]" indicated rounding.
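Equation (22) and the parameter choices above can be sketched as follows; the function name is ours, and the example values of *W* and *M* are illustrative rather than taken from the experiments.

```python
import numpy as np

def tukey_hanning(W, M):
    """Tukey-Hanning window of Eq. (22): raised-cosine ramps of total
    width M around a flat pass zone of width W (total length M + W)."""
    m = np.arange(1, M + W + 1, dtype=float)   # 1-based sample index as in Eq. (22)
    w = np.ones_like(m)                        # pass zone: M/2 < m <= M/2 + W
    rise = m <= M / 2
    fall = m > M / 2 + W
    w[rise] = (1 - np.cos(2 * np.pi * m[rise] / (M + 1))) / 2
    w[fall] = (1 - np.cos(2 * np.pi * (m[fall] - W) / (M + 1))) / 2
    return w

# e.g. W set to the detected flaw-interval width, M = round(W / 4) as in the text
win = tukey_hanning(W=200, M=50)
```

The smooth ramps avoid the discontinuities that a rectangular cut-out would introduce at the interval boundaries.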

### *4.3. Summarization of the Proposed Methodology*

In this methodology, the flaw echo was enhanced in six steps.

Step 1. Signal decomposition. Decompose the observed signal *x*(*t*) into a collection of the residue *r*(*t*) and the IMFs *h*<sup>(*i*)</sup>(*t*) (*i* = 1, 2, ··· , *L*) using EMD or EEMD.

Step 2. SampEn calculation. Obtain the SampEn of all of the IMFs.

Step 3. Relevant mode selection. Determine the first relevant mode using the SampEn or the differences between them.

Step 4. Determine the interval in which the flaw echo is located in the first relevant mode. Otsu's method is used for threshold selection. Any two adjacent mode cells in which the flaw echo is located are merged into one interval.

Step 5. Separate the flaw echo using the Tukey-Hanning window in each relevant mode, which yields a collection of denoised modes *h̃*<sup>(*i*)</sup>(*t*) (*M*<sub>1</sub> ≤ *i* ≤ *L*).

Step 6. Reconstruct the denoised signal:

$$\tilde{\mathfrak{x}}(t) = \sum\_{i=M\_1}^{L} \tilde{h}^{(i)}(t) + r(t) \tag{23}$$

### **5. Simulated Signal Processing**

### *5.1. Performance Assessment*

To assess the performance of the EMD-based methodology, two metrics are introduced.

The signal-to-noise ratio (SNR) is the first metric. Suppose that *s*(*t*) is a noiseless signal; the SNR of signal *x*(*t*) is then defined as

$$SNR = 10 \lg \frac{\sum\_{i=1}^{N} s^2(i)}{\sum\_{i=1}^{N} \left( x(i) - s(i) \right)^2} \tag{24}$$

Well-preserved flaw echoes are expected in practice. Specifically, the amplitudes and shapes of flaw echoes are expected to be unchanged. This can be assessed by the modified correlation coefficient (MCC), which is defined as

$$\text{MCC} = \left| \frac{A\_x - A\_s}{A\_s} \right| \left( 1 - \frac{\sum\_{i=m}^{n} \left( s(i) - \overline{s} \right) \left( x(i) - \overline{x} \right)}{\sqrt{\sum\_{i=m}^{n} \left( s(i) - \overline{s} \right)^2 \sum\_{i=m}^{n} \left( x(i) - \overline{x} \right)^2}} \right) \tag{25}$$

where *m* and *n* are instants defining the interval in which the flaw echo is located. *A*<sub>*x*</sub> and *A*<sub>*s*</sub> are the amplitudes of the flaw echoes in *x*(*t*) and *s*(*t*), respectively.

A high SNR and low MCC are expected.
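Both metrics are direct to compute; a hedged numpy sketch, with the MCC implemented as printed in Equation (25) and the echo amplitude read as the peak magnitude within the interval (our reading; function names are ours):

```python
import numpy as np

def snr_db(x, s):
    """SNR of Eq. (24): s is the noiseless reference, x the (denoised) signal."""
    return 10 * np.log10(np.sum(s ** 2) / np.sum((x - s) ** 2))

def mcc(x, s, m, n):
    """MCC of Eq. (25) as printed; [m, n) bounds the flaw-echo interval.

    Amplitude A is taken as the peak magnitude within the interval.
    """
    xs, ss = x[m:n], s[m:n]
    A_x, A_s = np.max(np.abs(xs)), np.max(np.abs(ss))
    amp_err = abs((A_x - A_s) / A_s)
    corr = np.corrcoef(ss, xs)[0, 1]           # shape-similarity term
    return amp_err * (1 - corr)
```

A perfect reconstruction gives an infinite SNR and an MCC of zero, so denoising quality is read off as SNR up, MCC down.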

### *5.2. Signal Processing Results*

The relevant modes were first determined. On one hand, with the predefined parameters *r* = 0.15*SD*<sub>*u*</sub> and *T*<sub>s</sub> = 1, Table 1 shows that *s*<sub>1</sub>, *s*<sub>2</sub> > *T*<sub>s</sub> and *s*<sub>3</sub> < *T*<sub>s</sub> were satisfied. Consequently, *M*<sub>1</sub> = 3 was determined according to Equation (20). On the other hand, Table 1 also shows that d*s*<sub>3</sub> = *s*<sub>3</sub> − *s*<sub>2</sub> < d*s*<sub>*k*</sub> (*k* ≠ 3) was satisfied, which likewise indicates *M*<sub>1</sub> = 3 according to Equation (21). Thus, the two methods yielded the same relevant mode selection. The electronic noise could therefore be significantly suppressed by reconstructing the flaw echo via partial reconstruction, discarding *h*<sup>(1)</sup>(*t*) and *h*<sup>(2)</sup>(*t*). The corresponding reconstructed signal, *x*ˆ(*t*), is depicted in Figure 4a.

**Figure 4.** Processing results of the simulated signal. (**a**) The reconstructed signal given by partial reconstruction; (**b**) the detection interval in which the flaw echo was located; (**c**) the denoised signal.

*h*<sup>(3)</sup>(*t*) was used to determine the interval in which the flaw echo was located. The corresponding result is depicted in Figure 4b; the threshold given by Otsu's method was 0.2902. The flaw echoes in all relevant modes *h*<sup>(*i*)</sup>(*t*) (3 ≤ *i* ≤ 8) were separated by the Tukey-Hanning window, and the denoised signal was reconstructed. The denoised signal is depicted in Figure 4c, indicating that the electronic noise and backscattering noise were significantly suppressed and the flaw echo was well preserved.

The performance of the EMD-based methodology was assessed using the SNR and the MCC. The results are provided in Table 2, which shows that the methodology achieved a high SNR and low MCC as expected. The improvements in SNR and MCC were 12.89 dB and 0.0099, respectively.


**Table 2.** Performance assessment.

### **6. Experimental Study**

The ultrasonic testing system was primarily composed of an ADVANTECH industrial personal computer (IPC), an Olympus ultrasonic probe with a centre frequency of 5 MHz, and a PCIUT3100 ultrasonic acquisition card installed on the IPC. A 6061 aluminium alloy specimen with two artificial cracks was used for ultrasonic testing. The specimen material density was 2.7 × 10<sup>3</sup> kg/m<sup>3</sup> and the sound wave velocity was 6300 m/s. The two cracks, denoted F1 and F2, were machined using wire electrical discharge machining. The buried depths of F1 and F2 were 60 mm and 35 mm, respectively, and the sampling frequency was 100 MHz. The ultrasonic testing system and the specimen are shown in Figure 5.

**Figure 5.** The ultrasonic test system and specimen. (**a**) The ultrasonic testing system. (**b**) The specimen. (**c**) Geometry of the specimen. (**d**) Schematic diagram of flaws location and size, with dimensions in millimeters.

The ultrasonic signals acquired from cracks F1 and F2 are shown in Figure 6. The backscattered signals from instant 1001 to instant 3000 were used for further analysis.

**Figure 6.** The ultrasonic signals. (**a**) Acquired from F1; (**b**) acquired from F2.

The processing results of the backscattered signal from crack F1 are shown in Figure 7. The backscattered signal was first decomposed into a collection of IMFs, as shown in Figure 7a. For brevity, only the first five IMFs are shown. Correspondingly, the SampEn values of the first five IMFs were calculated and are listed in Table 3. As the first relevant mode, the third IMF was used to determine the interval in which the flaw echo was located, based on Otsu's method for thresholding and merging of adjacent mode cells, as shown in Figure 7b. The denoised signal is shown in Figure 7c, indicating that both the backscattering noise and the electronic noise were significantly suppressed.

**Figure 7.** Processing results of the backscattered signal from F1. (**a**) The IMFs extracted; (**b**) the interval in which the flaw echo was located; (**c**) the denoised signal.

**Table 3.** SampEn of the first five IMFs.


As supplementary information, Figure 8 shows the denoised signal of the backscattered signal from F2, with a significant reduction of the noise.

**Figure 8.** The denoised signal of the ultrasonic signal from F2.

### **7. Conclusions**

An EMD-based methodology was proposed for ultrasonic flaw echo enhancement. The observed signal was decomposed into IMFs by EMD or EEMD. The relevant modes were determined according to the SampEn of the IMFs. Otsu's method was used for interval thresholding of the first relevant mode, yielding the interval in which the flaw echo was located. The flaw echoes in all relevant IMFs were separated by the Tukey-Hanning window. The separated flaw echoes and the residue were added together, yielding the denoised signal. Simulation results demonstrated that the EMD-based methodology achieved a high SNR and low MCC. Applications of the EMD-based methodology to flaw echo enhancement in experimental signals were also presented.

**Author Contributions:** Conceptualization, W.F., X.Z. (Xiaojun Zhou) and C.Y.; formal analysis, W.F.; funding acquisition, C.Y.; investigation, W.F., X.Z. (Xiang Zeng) and C.Y.; methodology, W.F. and X.Z. (Xiang Zeng); resources, X.Z. (Xiaojun Zhou) and C.Y.; supervision, X.Z. (Xiaojun Zhou); writing—original draft, W.F.; writing—review and editing, W.F., X.Z. (Xiaojun Zhou), X.Z. (Xiang Zeng) and C.Y.

**Funding:** This research was funded by 1: The Fundamental Research Funds for the Central Universities under Grant No.2018QNA4001; 2: Zhejiang Provincial Natural Science Foundation of China under Grant No.LY18E050002.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
