


### *Editorial* **Advanced Fault Diagnosis and Health Monitoring Techniques for Complex Engineering Systems**

**Yongbo Li 1,\*, Bing Li 1, Jinchen Ji 2 and Hamed Kalhori 3**


### **1. Introduction**

Fault diagnosis and health condition monitoring have always been critical issues in the engineering research community. Over the past decade, with the rapid development of artificially structured materials, advanced sensing and data-driven intelligence algorithms, fascinating technical possibilities have been reported in the area of fault diagnosis and in the health condition monitoring of complex engineering systems. However, with the development of highly efficient intelligent algorithms, recent fault diagnosis and health monitoring strategies have become highly automated and are encountering sophisticated problems in terms of data availability, computational complexity, accuracy, etc. Meanwhile, combined with advanced intelligent algorithms, flourishing developments such as new sensing techniques, diagnostic approaches and the design of new types of metamaterials have also enabled significant advances and emerging opportunities in the field of system health condition monitoring. These studies will doubtlessly promote the reliability, availability and robustness of systems for the fault diagnosis and health monitoring of complex engineering systems.

This Special Issue of *Sensors* aims to collect research works encompassing the whole area of fault diagnosis and health monitoring techniques for engineering systems. This collection contains a total of 11 papers representing the current status of the research related to different methods of monitoring the health and reliability of engineering systems.

Targeting the limitations of the original transition permutation entropy (TPE) method, Guo et al. [1] propose a multiscale transition permutation entropy (MTPE) method. Furthermore, considering the weaknesses of the proposed multiscale approach, the feature extraction ability of the MTPE method is further improved by proposing a composite multiscale transition permutation entropy (CMTPE) approach. Lastly, the researchers input the features extracted using the CMTPE method into an extreme learning machine (ELM) to perform the fault diagnosis of a bearing.

Bykerk et al. [2] used vibro-acoustic sensors for detecting leaks in the water distribution mains of an urban area. The real-time data collected from the extensive deployment of the vibro-acoustic sensors across a sprawling metropolitan city were used to monitor the presence and absence of pipe leaks using a convolutional neural network (CNN) after pre-processing via short-time Fourier transform (STFT). Different external factors, such as pipeline size, pipeline material and the soil condition around the pipe, are also taken into consideration.

**Citation:** Li, Y.; Li, B.; Ji, J.; Kalhori, H. Advanced Fault Diagnosis and Health Monitoring Techniques for Complex Engineering Systems. *Sensors* **2022**, *22*, 10002. https://doi.org/10.3390/s222410002

Received: 10 November 2022 Accepted: 11 November 2022 Published: 19 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

https://www.mdpi.com/journal/sensors

Asadi et al. [3] designed a Takagi–Sugeno (TS) fuzzy-based sliding mode observer (SMO) to reconstruct the faults in actuators and sensors installed in a nonlinear system subjected to unknown external disturbance. A non-quadratic Lyapunov function (NQLF) and the fmincon function, a MATLAB optimization tool, were used to guarantee the stability of the proposed SMO. The influence of unknown disturbances and uncertainties is minimized by utilizing a performance criterion. The proposed method provides better accuracy, less conservative optimization conditions and improved generality in comparison to other existing state-of-the-art methods.

Hu et al. [4] combined piecewise aggregate approximation (PAA) with complete ensemble empirical mode decomposition (CEEMDAN) to alleviate the high memory requirements and low computational efficiency of the CEEMDAN method in bearing fault diagnosis. Vibration signals were used to study the efficacy of the proposed method. An enhanced bearing fault diagnosis performance was obtained using the proposed method.

Aiming to solve the problem of unavailable data in online fault detection in rolling element bearings, a multiscale deep support vector data description (Deep-SVDD) approach is proposed by Kou et al. [5]. By utilizing data enhancement technology, training data were transformed into multiple subspaces. Then, a subsequent clustering algorithm was utilized to enhance the robustness of the features. Lastly, the proposed Deep-SVDD model was constructed to achieve the online monitoring of the health of rolling element bearings. The proposed method can be utilized to detect incipient faults in a bearing.

A new oversampling algorithm, namely, MeanRadius-SMOTE, is proposed by Duan et al. [6] for diagnosing mechanical faults regarding unbalanced data. The newly proposed method can effectively avoid the generation of useless and noisy samples and solve the multiclassification problem regarding different mechanical faults. A complete diagnosis of the faults in mechanical equipment can be achieved using the proposed method.

Mao et al. [7] addressed the challenges of an incomplete training dataset using a cross-domain intelligent fault diagnosis approach and proposed a novel deep learning approach called the partial transfer ensemble learning framework (PT-ELF). After substituting the missing health states with another dataset, the proposed method was able to address the variable data distribution challenge by training a weak global classifier and two partial domain adaptation classifiers. Lastly, a specific ensemble strategy was used to combine these classifiers for fault diagnosis.

Aldawood et al. [8] developed a self-vibration-powered energy harvester sensor system to tackle the environmental threat posed by unused batteries in battery-powered sensors in wireless sensor networks (WSN). Dual moving magnets bordered by coil windings were used for power and signal generation in a harvester sensor unit. A radio frequency (RF) transmitter is operated using the power generated from the harvester, and the generated signal from this sensor is transmitted as the vibration signal. Lastly, a custom-made APP is utilized to detect faults in this system.

A 1D dilated convolutional neural network (1-DDCNN) is proposed by Chen et al. [9] for the fault diagnosis in an aircraft retraction/extension (R/E) system. Aiming to solve the limited feature information extraction and fault diagnosis ability of 1-DCNNs, multiple feature parameters have been used. Moreover, the main fault mode of the R/E system for aircraft landing gears has been studied, specifically exploring its working principle and the influence of convolutional kernel size on the classification accuracy.

Lee et al. [10] studied the optimal sensor selection criteria in a multi-sensor-based fault diagnosis of a roll-to-roll printed electronics system. Data are collected for four major defects of a Gravure roll-to-roll printed electronic system with three triaxial acceleration signals. Smart data were formed from the collected raw data obtained by a sensor data efficiency evaluation; a sensitivity evaluation for axis selection considering the directional nature of faults; and feature variable optimization using the feature combination matrix method. The progressive application of the aforementioned phases enhanced the fault diagnosis results in terms of accuracy, calculation time, predictive ability and data storage capacity.

Pan et al. [11] proposed a new method to investigate the mitigation of commonly occurring rotor–stator rub impact faults in aero-engines. A pre-strained, two-way shape memory alloy (SMA) wire was used in the design of a current-driven active control actuator to mitigate the occurrence of rub impact faults. The feasibility of the proposed scheme is verified by different properties of the used NiTi wires. Finally, a prototype of the schemed actuator was designed and manufactured for testing under various conditions. The status of the rub impact fault was monitored using an acoustic emission sensor.

On behalf of all the editors of this Special Issue, we would like to extend our heartiest gratitude for the contributions from the authors to this project. We would also like to extend our sincere thanks to all the reviewers and members of the editorial board of *Sensors*.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


#### *Article* **Impact of Sensor Data Characterization with Directional Nature of Fault and Statistical Feature Combination for Defect Detection on Roll-to-Roll Printed Electronics**

**Yoonjae Lee 1, Minho Jo 1, Gyoujin Cho 2, Changbeom Joo 3 and Changwoo Lee 4,\***


**Abstract:** Gravure printing, a roll-to-roll printed electronics process suitable for high-speed patterning of functional layers, has the advantage of being applicable to large-area flexible webs. As each printing procedure, from inking to doctoring followed by ink transfer and setting, influences the quality of the pattern geometry, it is necessary to detect and diagnose the factors causing printing defects beforehand. Data were acquired with three triaxial acceleration sensors for the fault diagnosis of four major defects, such as the doctor blade tilting fault. To improve the diagnosis performance, optimal sensor selection with Sensor Data Efficiency Evaluation, sensitivity evaluation for axis selection with Directional Nature of Fault, and feature variable optimization with the Feature Combination Matrix method were applied to the raw data to form Smart Data. Each phase carried out on the raw data progressively enhanced the diagnosis results in terms of accuracy, positive predictive value, diagnosis processing time, and data capacity. In the case of the doctor blade tilting fault, when the input was changed from raw data to Smart Data, the diagnosis accuracy increased from 48% to 97%, the processing time decreased from 3640 s to 16 s, and the data capacity decreased from 100 Mb to 5 Mb.

**Keywords:** defect detection; Directional Nature of Fault; gravure printing; fault diagnosis; roll-to-roll printed electronics; sensor data characterization

### **1. Introduction**

Roll-to-roll processing is highly advantageous because it results in multiple functional layers of electronic circuitry printed on large flexible materials (i.e., web) [1–3]. Gravure printing is the desirable mode for fabricating these printed electronic devices, owing to its characteristic high-speed patterning of component layers [4–6]. Gravure printing can be classified into the following four phases: inking, doctoring, ink transfer, and ink setting [7,8]. Printing defects can be generated by undesired printing conditions and ink characteristics during each printing phase [9–12]. For example, during the doctoring phase, the misalignment of the doctor blade at either side can degrade the ink uniformity in the engraved patterns in the width direction (i.e., transverse direction (TD)). Moreover, non-uniform nip roll pressure can negatively affect the uniformity of the pattern thickness in the TD. To derive high-quality patterns with uniform thickness using the roll-to-roll gravure printing process, it is necessary to recognize and diagnose these faults.

**Citation:** Lee, Y.; Jo, M.; Cho, G.; Joo, C.; Lee, C. Impact of Sensor Data Characterization with Directional Nature of Fault and Statistical Feature Combination for Defect Detection on Roll-to-Roll Printed Electronics. *Sensors* **2021**, *21*, 8454. https://doi.org/10.3390/s21248454

Academic Editors: Yongbo Li, Bing Li, Jinchen Ji and Hamed Kalhori

Received: 19 November 2021 Accepted: 14 December 2021 Published: 18 December 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this study, a method of data characterization using sensor data efficiency evaluation (*SE*), directional nature of fault (DNF), and feature combination matrix (FCM) is proposed to diagnose these major faults. The aim is to recognize defects in advance and improve the diagnosis results by optimizing the training (input) data acquired from multiple sensors for the machine-learning fault diagnosis model. We find that the misalignment of the doctor blade, eccentricity of the nip and printing rolls, and non-uniform nip pressure can be indirectly measured via the vibration of the doctor blade, the nip roll, and the frames supporting the printing module. Through the acquisition of vibration data using multiple sensors, a vibration dataset (i.e., raw data) is acquired. The smart data, which clearly show the characteristics of the vibration caused by the factors mentioned above, are selected from the raw dataset using the proposed methods in three phases to maximize performance efficiency. The evaluation criteria include diagnosis accuracy, positive predictive value (PPV), processing time for diagnosis, and data capacity. The performance of the machine-learning model developed using smart data was compared to that of the model using only the raw dataset.
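The two classification criteria named above, diagnosis accuracy and PPV, can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical values chosen only for illustration and do not come from the experiments in this paper; the function names are likewise assumptions.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all diagnoses that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def positive_predictive_value(tp, fp):
    """Fraction of predicted faults that are actual faults."""
    return tp / (tp + fp)


# Hypothetical counts (not taken from the paper):
# tp = faults correctly flagged, fp = normal samples flagged as faults,
# tn = normal samples correctly passed, fn = faults that were missed.
tp, fp, tn, fn = 45, 5, 40, 10

print(f"accuracy = {accuracy(tp, fp, tn, fn):.2f}")
print(f"PPV      = {positive_predictive_value(tp, fp):.2f}")
```

A high PPV means that few normal operating conditions are misreported as faults, which complements the overall accuracy figure.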

With the significant growth in industrial machinery, recent studies have raised concerns regarding the maintenance of operating conditions, and fault diagnosis based on sensor data acquisition has attracted considerable interest. Xia et al. presented convolutional neural network-based feature extraction approaches for the fault diagnosis of rotating machines with multiple sensors [13]. Duan et al. reviewed multi-sensor-based fault diagnosis and condition monitoring for rolling bearings, presenting the foundational knowledge [14]. Zhao et al. and Huang et al. studied fault diagnosis with multirate data and sensors using feature-extracting deep learning models [15,16]. Bazan et al. and Wang et al. investigated fault diagnosis based on data optimization [17,18]. Lee et al. proposed quantification methods of fault features for rotary machine fault diagnosis [19]. Most studies on fault diagnosis have presented feature extraction methods that improve the machine learning results obtained from sensor data.

As shown in the abovementioned studies, diagnosing abnormal conditions with multiple sensors yields promising fault diagnosis results; however, the efficiency of the diagnosis performance has not been considered. Whereas those studies focus on methods or strategies for reaching a diagnosis, this paper proposes methods to optimize multiple-sensor data by selecting an optimal sensor. Furthermore, in comparison with Bazan et al. [17], the evaluation of diagnosis performance extends beyond accuracy and data reduction to positive predictive value and diagnosis processing time. Building on Lee et al. [19], this paper proposes strategies based on quantification methods to evaluate the efficiency of each phase.

#### **2. Methodology of Data Characterization**

#### *2.1. Procedure of Data Characterization from Raw Data to Smart Data*

The data characterization procedure begins with data acquisition using three acceleration sensors attached to the doctor blade and the frame of the gravure printing system. Each sensor acquires data along three axes. The experimentally acquired raw data are then processed in three phases, as shown in Figure 1. During Phase 1, the acquired sensor data are evaluated for efficiency (*SE*), and the most efficient (optimal) sensor is chosen for DNF processing in Phase 2, which extracts the most sensitive of the three axes of that sensor. Then, a list of feature variables is compiled for the training data using the FCM method in Phase 3. Finally, the processed smart data are used as input to the machine-learning fault diagnosis model to classify the operating conditions of the printing process during the major fault occurrences. Phases 1–3 of the smart data characterization are described in detail in Sections 2.2–2.4.

#### *2.2. Sensor Data Efficiency Evaluation*

The optimally efficient sensor is selected using an evaluation procedure based on Equation (1), which leverages three variables: *α* is the ratio of the data capacity, *β* is the ratio of the data processing time, and *γ* is the ratio of the misclassification rate. Each of the three is a ratio between the raw data and the single-sensor data. Since the value of *SE* in Equation (1) depends on these three ratios between the two datasets being compared, the sensor with the highest *SE* is selected as the optimal single sensor. In other words, the sensor that differs most clearly from the raw data in the three aspects mentioned above is likely to score the highest *SE*.

$$S_E = \frac{\alpha + \beta}{2\gamma} \tag{1}$$

In the case of this experiment, the diagnosis results from the raw data of three triaxial sensors were compared.
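As a minimal sketch of the Phase 1 selection, the snippet below evaluates Equation (1) for each candidate sensor and picks the one with the highest score. The direction of each ratio is an assumption, since the text only states that each is a ratio between raw data and single-sensor data: here *α* and *β* are taken as raw over single-sensor, and *γ* as single-sensor over raw, so that a smaller, faster, and more accurate sensor scores higher. All numbers, dictionary keys, and function names are hypothetical.

```python
def sensor_efficiency(alpha, beta, gamma):
    """Equation (1): S_E = (alpha + beta) / (2 * gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return (alpha + beta) / (2.0 * gamma)


def ratios(raw, single):
    """Build (alpha, beta, gamma) from raw and single-sensor measurements.

    ASSUMPTION: ratio directions are not specified in the text; these are
    chosen so that a larger S_E indicates a more efficient sensor.
    """
    alpha = raw["capacity_mb"] / single["capacity_mb"]
    beta = raw["time_s"] / single["time_s"]
    gamma = single["misclassification"] / raw["misclassification"]
    return alpha, beta, gamma


def select_optimal_sensor(raw, sensors):
    """Return the sensor name with the highest S_E and all scores."""
    scores = {name: sensor_efficiency(*ratios(raw, m)) for name, m in sensors.items()}
    return max(scores, key=scores.get), scores


# Hypothetical measurements, for illustration only.
raw = {"capacity_mb": 100.0, "time_s": 1000.0, "misclassification": 0.5}
sensors = {
    "sensor_1": {"capacity_mb": 25.0, "time_s": 100.0, "misclassification": 0.25},
    "sensor_2": {"capacity_mb": 50.0, "time_s": 500.0, "misclassification": 0.5},
}
best, scores = select_optimal_sensor(raw, sensors)
print(best, scores)
```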

**Figure 1.** Smart data characterization procedure from raw data.

#### *2.3. Directional Nature of Fault*

The DNF method extracts valid data from raw data by evaluating the sensitivity of the axial information from a single sensor. After Phase 1, the DNF method evaluates axes X, Y, and Z to extract valid data for fault diagnosis. The DNF method is defined in Equation (2), where *α* and *β* are weight factors defining the relative ratio between kurtosis and standard deviation (they are distinct from the ratios *α* and *β* in Equation (1)), $k_f$ and $k_n$ are the kurtosis of the fault and normal conditions, respectively, and $\text{std}_f$ and $\text{std}_n$ are the standard deviation of the fault and normal conditions, respectively. Based on the probability distribution curve, the abnormal condition data have a wider distribution of data points and hence a larger standard deviation [20,21]. The kurtosis of an abnormal condition reflects an imbalanced distribution [22]. The DNF number based on Equation (2) can thus be acquired from each axis. The axis with the highest DNF number provides the most sensitive and valid data for training.

$$D_N = \frac{1}{\alpha + \beta} \left(\alpha \frac{k_f}{k_n} + \beta \frac{\text{std}_f}{\text{std}_n}\right) \tag{2}$$
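Equation (2) can be sketched in a few lines of NumPy. Two choices here are assumptions rather than details given in the text: kurtosis is computed in its Pearson form (a Gaussian signal gives three), and the weight factors default to equal values. The function names and the dictionary layout for the axis data are hypothetical.

```python
import numpy as np


def kurtosis(x):
    """Pearson kurtosis m4 / m2**2 (ASSUMPTION: non-excess form)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered**2)
    m4 = np.mean(centered**4)
    return m4 / m2**2


def dnf_number(fault, normal, alpha=1.0, beta=1.0):
    """Equation (2): weighted mean of the kurtosis ratio and the std ratio."""
    k_ratio = kurtosis(fault) / kurtosis(normal)
    std_ratio = np.std(fault) / np.std(normal)
    return (alpha * k_ratio + beta * std_ratio) / (alpha + beta)


def most_sensitive_axis(fault_xyz, normal_xyz, alpha=1.0, beta=1.0):
    """Score each axis (dict: axis name -> 1-D signal) and return the best."""
    scores = {
        axis: dnf_number(fault_xyz[axis], normal_xyz[axis], alpha, beta)
        for axis in fault_xyz
    }
    return max(scores, key=scores.get), scores


# Toy signals for illustration only: the fault condition scales the
# normal signal by a different factor on each axis.
base = np.array([-1.0, 1.0, -1.0, 1.0])
normal_xyz = {"X": base, "Y": base, "Z": base}
fault_xyz = {"X": 2.0 * base, "Y": base, "Z": 3.0 * base}
axis, scores = most_sensitive_axis(fault_xyz, normal_xyz)
print(axis, scores)
```

With equal weights, an axis whose fault signal is identical to the normal signal scores exactly one, so values above one indicate axes on which the fault changes the signal statistics.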

#### *2.4. Feature Combination Matrix*

The FCM method selects and extracts statistical feature variables. As shown in Figure 2, feature extraction is performed once the list of statistical feature variables is acquired from the Phase 2 dataset [23,24]. The extracted features are then combined three at a time, so that each combination spans a three-dimensional volume. As mentioned in Section 2.3, relative to a normal distribution, the data points of an abnormal condition are likely to be imbalanced, broad, skewed, or irregular [25–27]. Consequently, for a given combination of three features, the volume of the normal condition data is smaller than that of the abnormal condition data. Hence, the combination producing the largest difference between the two volumes is expected to yield higher classification accuracy. The distance between the two datasets also improves classification performance because it separates the normal and abnormal conditions. The Mahalanobis distance is applied to evaluate the distance between two datasets in a multivariate space, including correlated points for multiple variables, considering the densities of the datasets [28–31]. Using the volumes of the normal and abnormal feature combinations and the Mahalanobis distance, the Feature Variable's Dimensional Coordination number (*FDCN*) can be obtained. As shown in Equation (3), the *FDCN* evaluates the combinations of extracted features to rank them according to efficiency. *V*1 represents the volume of the normal condition feature combination, *V*2 represents the volume of the abnormal condition feature combination, and *Md* represents the Mahalanobis distance between *V*1 and *V*2.

$$FDC_N = M_d \left(\frac{V_2 - V_1}{V_2 + V_1}\right) \tag{3}$$

The feature combination selected through evaluation of the *FDCN* is then used as training data for developing a machine learning fault diagnosis model.
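The following sketch shows one way Equation (3) could be evaluated for a single three-feature combination. The text does not specify how the volumes or the Mahalanobis distance between the two datasets are computed, so both are assumptions here: each volume is the axis-aligned bounding box of the feature cloud, and the distance is taken between the two cloud means using the average of the two covariance matrices. The synthetic data and function names are hypothetical.

```python
import numpy as np


def bounding_volume(points):
    """Axis-aligned bounding-box volume of an (n, 3) feature cloud.

    ASSUMPTION: the paper does not define how the volume is measured.
    """
    return float(np.prod(points.max(axis=0) - points.min(axis=0)))


def mahalanobis_between(a, b):
    """Mahalanobis distance between the means of two clouds.

    ASSUMPTION: uses the average of the two sample covariance matrices.
    """
    pooled = (np.cov(a, rowvar=False) + np.cov(b, rowvar=False)) / 2.0
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))


def fdc_number(normal, abnormal):
    """Equation (3): FDC_N = M_d * (V2 - V1) / (V2 + V1)."""
    v1 = bounding_volume(normal)
    v2 = bounding_volume(abnormal)
    return mahalanobis_between(normal, abnormal) * (v2 - v1) / (v2 + v1)


# Synthetic feature clouds for illustration only: the abnormal cloud is
# shifted and more widely spread than the normal cloud.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 3))
abnormal = rng.normal(2.0, 3.0, size=(200, 3))
print(fdc_number(normal, abnormal))
```

Under these assumptions the score grows both when the abnormal cloud occupies a larger volume than the normal cloud and when the two clouds lie farther apart, which matches the two criteria described in this section.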

**Figure 2.** Feature engineering for feature combination matrix.

#### **3. Experimental Data Acquisition**

The experimental data acquisition for major fault diagnosis of the gravure printing system is shown in Figure 3. As shown, acceleration Sensors 1, 2, and 3 were installed on both sides of the doctor blade and the frame supporting the printing module. All sensor outputs were obtained using a data acquisition module (NI-9230, National Instruments). Table 1 lists the specifications of the acceleration sensor and the NI-9230 module. The vibration data obtained by the sensors were transferred to the LabVIEW software for monitoring and storage.

**Figure 3.** Experimental data acquisition by sensor position designation within the printing section.


**Table 1.** Specifications of acceleration sensor and NI-9230 module.

The possible main faults during the printing process of the gravure printing system are shown in Figure 4. The four main faults of the experimental design include doctor blade tilting, printing roll eccentricity, nip roll eccentricity, and nip force non-uniformity. To detect the main faults for diagnosis, the experimental variables included the doctor blade, nip force, and tension. Cases with and without doctoring, and cases with and without nipping were tested under tensions of 2, 4, and 6 kgf. Regarding the nip force, the nipping cases were tested under 5 and 10 kgf, as shown in Table 2.

As shown in Table 3, each case was tested under different tension, nip force, and doctoring conditions. The data used for diagnosing the doctor blade tilting fault required Cases 1 and 2 at an operating tension of 2 kgf, Cases 7 and 8 at an operating tension of 4 kgf, and Cases 13 and 14 at an operating tension of 6 kgf. Cases 1, 7, and 13 had different operating tensions; however, they were all tested without doctoring. Cases 2, 8, and 14 also had different operating tensions with and without doctoring. The data for the fault diagnosis of the doctor blade tilting fault were acquired from the comparison of each case at the same operating tension. The data for diagnosing printing roll eccentricity were acquired from Cases 1, 7, and 13, which lack nipping and doctoring. Case comparison for nip roll eccentricity required conditions without doctoring; hence, Cases 5, 11, and 17 with a nip force of 10 kgf were compared to Cases 1, 7, and 13. Nip force non-uniformity cases were selected using the nip force data shown in Figure 5. Cases 9 and 15 with uniform nip forces were compared to Cases 11 and 17.

**Figure 4.** Possible main faults during gravure printing process: (**a**) Doctor blade tilting fault; (**b**) Printing roll eccentricity fault; (**c**) Nip roll eccentricity fault; and (**d**) Nip force non-uniformity fault.



**Table 3.** Case comparison for fault diagnosis of possible main faults during printing process of gravure printing system.


**Figure 5.** Nip force uniformity data of Cases 1–18.

### **4. Results**

#### *4.1. Doctor Blade Tilting Fault Diagnosis*

#### 4.1.1. Doctor Blade Tilting Fault Diagnosis Based on Raw Data

The fault diagnosis results of the doctor blade tilting fault based on the raw data are presented in Table 4. The raw data in this case include all data acquired from Sensors 1, 2, and 3. At an operating tension of 2 kgf, the diagnosis of the doctor blade tilting fault showed an accuracy of 58.2% with a processing time of 1508.9 s and a data capacity of 115 Mb. At a tension of 4 kgf, the accuracy was 48.1% with a processing time of 3640.4 s and a data capacity of 100 Mb. At a tension of 6 kgf, the accuracy was 67.2%, the highest among the three tensions, with a processing time of 368.4 s and a data capacity of 113 Mb.

**Table 4.** Doctor blade tilting fault diagnosis based on raw data (i.e., Sensors 1, 2, and 3).


#### 4.1.2. Optimal Sensor Selection Based on Sensor Efficiency Evaluation Method

The sensor data efficiency method described in Section 2.2 was applied to the raw data to select a single optimal sensor for performance improvement. Because the raw data comprised all sensor data, the sensor data efficiency method evaluates the sensors individually, as shown in Table S1. To compute *SE*, the data capacity (*α*), processing time (*β*), and misclassification rate (*γ*) must be obtained from individual sensors. Sensors 1 and 2 from Figure 3 were evaluated because both were installed on the doctor blade in the same directions as the X, Y, and Z axes. Tables S1–S3 show the results of the sensor data efficiency evaluation, comparing the raw data to the data of Sensors 1 and 2. The results of the doctor blade tilting fault diagnosis for optimal sensor selection in Tables S1–S3 show that Sensor 1 yields the highest *SE*, as listed in Table 5.

**Table 5.** Result of sensor data efficiency evaluation for optimal sensor selection of doctor blade tilting fault.


The result of the optimal sensor selection can be verified in Table S4 as compared with Table S5, based on the performance of the diagnosis results. It can also be seen that the diagnosis result of Sensor 1 was improved in accuracy, processing time, and data capacity compared with the result of raw data diagnosis shown in Table 4.

#### 4.1.3. Optimal Axis Selection Based on the DNF Method

Sensor 1 from the raw data of the doctor blade tilting cases was selected as the optimal sensor, and the DNF method was used to evaluate axes X, Y, and Z from Sensor 1 to extract the most sensitive axis. As mentioned in Section 2.3, the DNF number was calculated from the kurtosis and standard deviation of the normal and abnormal conditions. The axis with the highest DNF number resulted in the highest diagnostic performance. Table 6 shows the DNF number evaluation of the X, Y, and Z axes from Sensor 1; the axis with the highest DNF number differed depending on the operating tension. For a tension of 2 kgf, the Y-axis resulted in the highest DNF number. Tensions of 4 and 6 kgf showed the highest DNF numbers on the X-axis. The premise that the axis with the highest DNF number achieves the highest diagnosis performance is verified in Tables S6–S8. Table S6 shows the highest accuracy of diagnosis for tensions of 4 and 6 kgf along the X-axis, and Table S7 illustrates the best result for a tension of 2 kgf. The proposed method evaluates the sensitivity of each axis using the DNF number, which resulted in high diagnosis accuracy and decreased processing time and data capacity requirements.


**Table 6.** DNF Number of axis X, Y, and Z from Sensor 1 of doctor blade tilting fault.

#### 4.1.4. Feature Variable Optimization Based on FCM Method

As shown in Figure 2, 12 feature variables were extracted from the data acquired during Phases 1 and 2. From the 12 feature variables, four were selected to be coordinated into feature combinations: skewness, kurtosis, standard deviation, and peak-to-peak. For a normal distribution, the data are generally symmetrical around the mean. Hence, skewness and kurtosis were selected as indicators of how far the distribution shape of the data deviates from normal. Skewness measures the asymmetry of the distribution: the more symmetric the data, the closer the skewness is to zero. Furthermore, because kurtosis is a measure of the outliers present in the distribution and takes a value of three for a Gaussian probability distribution, it provides a clear criterion for discriminating between normal and abnormal conditions. In the case of peak-to-peak, peak vibration can be observed on the distribution chart when an abnormality occurs. Hence, the FCM method was applied to skewness, kurtosis, standard deviation, and peak-to-peak. The coordination of three of the four selected feature variables forms a volume, as shown in Figure 6. The red volume represents the three-dimensional feature variables of the abnormal condition data. The blue volume represents normal condition data. A significant volume difference between normal and abnormal conditions is visible. After evaluating the feature combinations formed from the selected feature variables using the FDC number from Equation (3), the combination with the highest FDC number was used as input data to train the machine-learning fault diagnosis model.

As shown in Table 7, the fault diagnosis results of the doctor blade tilting condition improved, owing to the data characterization process of Phases 1, 2, and 3. Compared with the raw data-based diagnosis in Table 4, the smart data-based fault diagnosis improved the accuracy from 58.2% to 90.1% at a tension of 2 kgf, from 48.1% to 86.2% at 4 kgf, and from 67.2% to 97.0% at 6 kgf. The processing time was reduced from 1508.9 s to 33.9 s at 2 kgf, from 3640.4 s to 37.5 s at 4 kgf, and from 368.4 s to 16.6 s at 6 kgf. The data capacity was also reduced from approximately 113 Mb to 4 Mb.
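The four statistical feature variables used in this section, and the three-at-a-time combinations formed from them, can be sketched as follows. The windowing of the vibration signal, the window size, and the function names are assumptions made for illustration; the paper does not describe how the signal is segmented before feature extraction.

```python
from itertools import combinations

import numpy as np

FEATURES = ("skewness", "kurtosis", "std", "peak_to_peak")


def window_features(window):
    """Compute the four statistical features of one signal window."""
    x = np.asarray(window, dtype=float)
    centered = x - x.mean()
    std = centered.std()
    return {
        "skewness": np.mean(centered**3) / std**3,   # zero for symmetric data
        "kurtosis": np.mean(centered**4) / std**4,   # three for a Gaussian
        "std": std,
        "peak_to_peak": x.max() - x.min(),
    }


def feature_matrix(signal, window_size):
    """Split a 1-D signal into non-overlapping windows (ASSUMPTION) and
    return an (n_windows, 4) array with columns ordered as FEATURES."""
    n_windows = len(signal) // window_size
    rows = []
    for i in range(n_windows):
        feats = window_features(signal[i * window_size:(i + 1) * window_size])
        rows.append([feats[name] for name in FEATURES])
    return np.array(rows)


def three_feature_combinations(features=FEATURES):
    """All ways of choosing three of the selected feature variables."""
    return list(combinations(features, 3))


for combo in three_feature_combinations():
    print(combo)
```

Choosing three of four features gives four candidate combinations, each of which spans one three-dimensional volume for the normal data and one for the abnormal data.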

#### *4.2. Printing Roll Eccentricity Fault Diagnosis*

#### 4.2.1. Printing Roll Eccentricity Fault Diagnosis Based on Raw Data

The fault diagnosis of printing roll eccentricity was conducted using the raw data of processes at tensions of 2, 4, and 6 kgf, as listed in Table 3. As shown in Table 8, the results based on the raw data showed a diagnosis accuracy of 69.7–76.9%. The processing time of the raw data diagnosis ranged from 208.0 s to 237.9 s.

#### 4.2.2. Printing Roll Eccentricity Fault Diagnosis Based on Smart Data

The diagnosis of the printing roll eccentricity fault data was performed in the same order as the doctor blade tilting diagnosis procedure described in Section 4.1. In Phase 1, the sensor data efficiency evaluation was applied to the raw data to select a single optimal sensor. As shown in Tables S9–S11, the data capacity, processing time, and misclassification rate of each case were computed to obtain *SE*, as shown in Table 9. Sensor 2 yielded the highest *SE* for all tensions. The fault diagnosis results based on Sensors 1 and 2 are shown in Tables S12 and S13, which verify the sensor data efficiency evaluation.

**Figure 6.** Volume comparison of normal and abnormal condition data: (**a**) Operating tension of 2 kgf; (**b**) Operating tension of 4 kgf; and (**c**) Operating tension of 6 kgf.


**Table 7.** Doctor blade tilting fault diagnosis based on smart data.

**Table 8.** Printing roll eccentricity diagnosis based on raw data (i.e., Sensors 1, 2, and 3).


**Table 9.** Result of sensor data efficiency evaluation for optimal sensor selection of printing roll eccentricity fault.


Based on the selected optimal Sensor 2 data, the DNF method was applied to extract the most sensitive axis information based on the DNF number. The computed DNF numbers are listed in Table 10. The X-axis for tension 2 (4 kgf) resulted in the highest DNF number followed by the Z-axis for the remaining cases. The verification results for the axis selected in each case based on the DNF number are shown in Tables S14–S16. Consistent with Table 10, the axis with the highest DNF number provided the most efficient diagnostic performance.


**Table 10.** DNF Number of axis X, Y, and Z from Sensor 2 of printing roll eccentricity fault.

As shown in Figure 7, the feature variables were extracted and combined into three feature combinations for evaluation. The selected and extracted feature variables were identical to those described in Section 4.1.4. The conditions of normal and abnormal data formed a volume measure for each feature variable, as shown in Figure 7. The two conditions were then evaluated using Equation (3) to select the training input data. The results of the FCM method were then used as input data for the printing roll eccentricity fault diagnosis. The results are listed in Table 11. Compared with Table 8, smart data increased the diagnosis accuracy up to 99.1% with a processing time of 3.7 s and a data capacity of 4 Mb. In summary, diagnosing the printing roll eccentricity fault with smart data improved the diagnostic performance while requiring less time and less data.

**Figure 7.** Volume comparison of normal and abnormal condition data: (**a**) Operating tensions of 2 and 4 kgf; (**b**) Operating tensions of 2 and 6 kgf; and (**c**) Operating tensions of 4 and 6 kgf.


**Table 11.** Printing roll eccentricity fault diagnosis based on smart data.

#### *4.3. Nip Roll Eccentricity Fault Diagnosis*

#### 4.3.1. Nip Roll Eccentricity Fault Diagnosis Based on Raw Data

The fault diagnosis of the nip roll eccentricity based on raw data is shown in Table 12. For tensions of 2, 4, and 6 kgf, the diagnosis accuracy ranged from 42.1% to 56.0%, with processing times of 425.4 s to 597.0 s. The data capacity of the raw data ranged from 111 Mb to 114 Mb, similar to the raw data capacity for the doctor blade tilting and printing roll eccentricity faults.

#### 4.3.2. Nip Roll Eccentricity Fault Diagnosis Based on Smart Data

The smart data transition from the raw data is presented in this section. The evaluation of the sensor data efficiency in Phase 1, used to select the optimal sensor, is shown in Table 13. Sensor 1 was selected as the optimal sensor for the next phase of the DNF method. It can be seen that the *SE* of each case at Sensor 1 was higher than that of Sensor 2. As shown in Tables S17–S19, the data capacities of Sensors 1 and 2 maintained an average value of 43. As the capacity difference between the two sensors barely influenced factor *α*, the major factors influencing the outcome of *SE* were *β* and *γ*. Tables S20 and S21 verify that the sensor having the highest *SE* gave the diagnosis result with higher accuracy.


**Table 12.** Nip roll eccentricity fault diagnosis based on raw data (i.e., Sensors 1, 2, and 3).

**Table 13.** Result of sensor data efficiency evaluation for optimal sensor selection of nip roll eccentricity fault.


The X, Y, and Z axes of Sensor 1 were evaluated using the DNF method and the DNF number. The results for the most sensitive axis in each case are listed in Table 14. For a tension of 2 kgf, the Z-axis had the highest *DN*, whereas for tensions of 4 and 6 kgf, the X-axis had the highest. The diagnosis results for each case, based on each axis of Sensor 1, are shown in Tables S22–S24.

**Table 14.** DNF Number of axis X, Y, and Z from Sensor 1 of nip roll eccentricity fault.


The FCM method was carried out based on the results of Phase 2 in this section. The feature variables used for coordination of the combination were identical to those of Sections 4.1 and 4.2, namely skewness, kurtosis, standard deviation, and peak-to-peak. Kurtosis reflects the effect of data in the tails of the distribution on the probability curve. Based on the standard distribution, the kurtosis value increases with the weight of the outer values. Hence, kurtosis refers to the sharpness of the distribution: if the degree of dispersion is large, the data are heterogeneous, and the height of the distribution is lowered. On the other hand, if the degree of dispersion is small, the data are homogeneous, and the height of the distribution increases.
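The four feature variables named above can be computed from a window of sensor samples as in the following minimal Python sketch. It assumes population (biased) moment estimators and non-excess kurtosis, since the estimator is not specified in this section; the function name `feature_vector` is hypothetical.

```python
import math


def feature_vector(samples):
    """Return skewness, kurtosis, standard deviation, and peak-to-peak of a window.

    Assumption: population (biased) moments and non-excess kurtosis
    (a normal distribution gives a value near 3).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        raise ValueError("constant signal: skewness and kurtosis are undefined")
    return {
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "std": math.sqrt(m2),
        "peak_to_peak": max(samples) - min(samples),
    }
```

Calling `feature_vector` on each window of a selected sensor axis would give one point per window in the feature space; which features are combined into a volume is decided by the FCM method, not by this sketch.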

The volume of normal and abnormal conditions based on the coordinated feature variables can be seen in Figure 8. Normal volume is shown in blue, and abnormal volumes are shown in red and yellow. The abnormal volumes differ depending on the nip force of the data. Table 15 shows the results of the nip roll eccentricity fault diagnosis based on the smart data. For a tension of 2 kgf, the diagnostic accuracy was 100% with a data capacity of 4 Mb and a processing time of 4.63 s. Compared with the results of the raw data in Table 12, the fault diagnosis model performance improved in terms of accuracy, positive predictive value, processing time, and data capacity.

**Figure 8.** Volume comparison of normal and abnormal condition data: (**a**) Operating tension of 2 kgf; (**b**) Operating tension of 4 kgf; and (**c**) Operating tension of 6 kgf.


**Table 15.** Nip roll eccentricity fault diagnosis based on smart data.

#### *4.4. Nip Force Non-Uniformity Fault Diagnosis*

#### 4.4.1. Nip Force Non-Uniformity Fault Diagnosis Based on Raw Data

Fault diagnosis based on raw data was performed to detect nip force non-uniformity. Figure 5 shows the nip force data for Cases 1–18. As Cases 11 and 17 in Figure 5 showed non-uniform nip forces, the data of both cases were used as abnormal condition data for fault diagnosis. Table 16 shows the performance of the fault diagnosis at tensions of 4 and 6 kgf.

**Table 16.** Nip force non-uniformity fault diagnosis based on raw data (i.e., Sensors 1, 2, and 3).


#### 4.4.2. Nip Force Non-Uniformity Fault Diagnosis Based on Smart Data

The sensor data efficiency evaluation results are shown in Table 17 based on the computation of Tables S25 and S26. It can be seen that Sensor 2 had the highest *SE* among the raw data. Tables S27 and S28 can be used to verify the optimal sensor selection results of the sensor data efficiency evaluation.

The DNF method was used to evaluate the X, Y, and Z axes of Sensor 2 for the tension cases of 4 and 6 kgf. The DNF numbers for both cases are shown in Table 18: the Y-axis was the most valid for a tension of 4 kgf, and the X-axis for a tension of 6 kgf. The results of the fault diagnosis based on each of the three axes of Sensor 2 are shown in Tables S29–S31.

With identical feature variables coordinated through the FCM method, the volumes of normal and abnormal conditions are shown in Figure 9. It can be seen from Figure 9a that the volume of the normal condition overlaps with the volume of the abnormal condition. Thus, the peak values and the distribution of data points for abnormal conditions were broad compared with the normal volume condition. Based on the results of the FCM, the nip force non-uniformity fault diagnosis results with smart data are shown in Table 19.

**Table 17.** Result of sensor data efficiency evaluation for optimal sensor selection of nip force non-uniformity fault.


**Table 18.** DNF Number of X, Y, and Z axes from Sensor 2 of nip force non-uniformity fault.


**Figure 9.** Volume comparison of normal and abnormal condition data: (**a**) Operating tension of 4 kgf; (**b**) Operating tension of 6 kgf.



#### *4.5. Simultaneous Fault Diagnosis*

In Sections 4.1–4.4, defects arising during the printing process of the gravure printing system were diagnosed independently. In real applications, however, the gravure printing system may occasionally malfunction with more than one fault at a time. In this section, the characterized smart data were applied under the assumption of multiple faults appearing simultaneously to demonstrate the effectiveness of the diagnosis model.

Cases 6, 12, and 18 from Table 2 were selected as the multiple-fault data because their experimental conditions included both nipping and doctoring at tensions of 2, 4, and 6 kgf. The diagnosis results for simultaneous multiple faults are shown in Table 20. The effectiveness of the smart data characterization is shown by comparison with the diagnosis results based on raw data. As the raw data of simultaneous faults contain various disturbances with noticeable peaks, it is less complex for the raw data-based diagnosis model to distinguish the conditions for classification. Hence, the average accuracy of the raw data diagnosis is 72.3%, which is higher than that of the single-fault diagnosis results. On the same basis, the results based on smart data reach an average accuracy of 99%. In short, detecting simultaneous multiple faults based on smart data shows positive results, as shown in Table 20.


**Table 20.** Simultaneous fault diagnosis result based on big data and smart data.

#### *4.6. Raw Data and Smart Data Comparison for Fault Diagnosis*

The fault diagnosis of the four possible major faults during the printing process of the gravure printing system based on raw and smart data is shown in Table 21. Table 21 summarizes the impact of the data characterization methods on the diagnosis of the four major faults and the simultaneous faults of the gravure printing process, comparing the diagnosis performance based on raw and smart data. All diagnosis results based on raw data and smart data were obtained with a support vector machine algorithm. Tables S32–S35 show the diagnosis results of the four major faults for different machine learning algorithms. A total of eight different algorithms were applied to each fault; the support vector machine algorithm gave the most efficient performance in terms of accuracy, positive predictive value, and processing time for all faults of the printing process.


**Table 21.** Raw data and smart data diagnosis comparison.

Based on the results of Table 21, a technique to increase the classification accuracy was applied to the faults of doctor blade tilting, printing roll eccentricity, and nip force non-uniformity. As these faults maintain an accuracy of 97% to 99%, it is possible to improve the final diagnosis results by adjusting the window size parameter. As shown in Equation (4), the window size can be adjusted using the sampling rate and the revolutions per minute. With *χ* as the revolutions per minute and *α* as the sampling rate (Hz), the value *Ws* can be obtained. Once *Ws* is obtained for the three faults, it is applied as a fixed parameter for diagnosis based on the smart data. Table 22 shows that the accuracy, PPV, and processing time improved in comparison with the results of Table 21.

$$W_s = \alpha \left(\frac{\chi}{60}\right)^{-1} \tag{4}$$
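Equation (4) divides the sampling rate by the number of revolutions per second, so the window size equals the number of samples recorded during one roll revolution. A minimal Python sketch follows; the function name and the numeric values in the usage note are hypothetical and are not taken from the experiments.

```python
def window_size(sampling_rate_hz, rpm):
    """Samples per revolution: W_s = alpha * (chi / 60) ** -1, as in Equation (4)."""
    if sampling_rate_hz <= 0 or rpm <= 0:
        raise ValueError("sampling rate and rpm must be positive")
    revolutions_per_second = rpm / 60.0
    return sampling_rate_hz / revolutions_per_second
```

For example, under an assumed sampling rate of 1000 Hz and a roll speed of 120 rpm, `window_size(1000, 120)` gives 500 samples per revolution.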


**Table 22.** Smart data diagnosis improvement with window size adjustment.

### **5. Conclusions**

Printing defects generated by the misalignment of the doctor blade, eccentricity of the nip and printing rolls, and non-uniform nip roll pressures can negatively affect the performance of printed electronic devices. To prevent printing defects and to obtain high-quality printed functional layers, it is necessary to recognize and diagnose the factors that cause printing defects. In this study, a method for data characterization using sensor data efficiency evaluation (*SE*), DNF, and FCM methods was proposed to diagnose the four possible major faults in the roll-to-roll gravure printing process, followed by experimental verification. The misalignment of the doctor blade, printing roll eccentricity, nip roll eccentricity, nip force non-uniformity, and simultaneous faults rated an average value of 56% accuracy with raw data. However, with smart data, the accuracy rated 100.0% on average. The positive predictive value increased, while the learning time was reduced from 1247 s to 12 s on average. The data capacity was reduced from 112 Mb to 5 Mb, depending on the selection of the sensor and its axis with optimized feature variable coordination. The results show that, with the use of smart data obtained through sensor data efficiency evaluation, the feature combination matrix, and DNF methods, the performance of the machine learning fault diagnosis model improves for classifying normal and abnormal conditions of datasets. The proposed smart data process is the most novel and contributory aspect of this paper because it leads to the near-perfect performance of the machine learning fault detection model. It is also faster and less computer-memory intensive than diagnosis based on raw sensor data. This is a contribution to the field, and many industries can benefit from improved and more cost-efficient production of printed electronics. Further research on the methodologies proposed in this paper will extend their application to fault diagnosis in systems with large numbers of sensors.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/1424-8220/21/24/8454/s1, Table S1: Sensor data efficiency evaluation of doctor blade tilting fault under operating tension 2 kgf, Table S2: Sensor data efficiency evaluation of doctor blade tilting fault under operating tension 4 kgf, Table S3: Sensor data efficiency evaluation of doctor blade tilting fault under operating tension 6 kgf, Table S4: Doctor blade tilting fault diagnosis result based on sensor 1, Table S5: Doctor blade tilting fault diagnosis result based on sensor 2, Table S6: Doctor blade tilting fault diagnosis result based on X axis of sensor 1, Table S7: Doctor blade tilting fault diagnosis result based on Y axis of sensor 1, Table S8: Doctor blade tilting fault diagnosis result based on Z axis of sensor 1, Table S9: Sensor data efficiency evaluation of printing roll eccentricity fault under case 2 kgf and 4 kgf, Table S10: Sensor data efficiency evaluation of printing roll eccentricity fault under case 2 kgf and 6 kgf, Table S11: Sensor data efficiency evaluation of printing roll eccentricity fault under case 4 kgf and 6 kgf, Table S12: Printing roll eccentricity fault diagnosis result based on sensor 1, Table S13: Printing roll eccentricity fault diagnosis result based on sensor 2, Table S14: Printing roll eccentricity fault diagnosis result based on X axis of sensor 2, Table S15: Printing roll eccentricity fault diagnosis result based on Y axis of sensor 2, Table S16: Printing roll eccentricity fault diagnosis result based on Z axis of sensor 2, Table S17: Sensor data efficiency evaluation of nip roll eccentricity fault under case 2 kgf, Table S18: Sensor data efficiency evaluation of nip roll eccentricity fault under case 4 kgf, Table S19: Sensor data efficiency evaluation of nip roll eccentricity fault under case 6 kgf, Table S20: Nip roll eccentricity fault diagnosis result based on sensor 1, Table S21: Nip roll eccentricity fault diagnosis result based on sensor 2, Table S22: Nip roll eccentricity fault diagnosis result based on X axis of sensor 1, Table S23: Nip roll eccentricity fault diagnosis result based on Y axis of sensor 1, Table S24: Nip roll eccentricity fault diagnosis result based on Z axis of sensor 1, Table S25: Sensor data efficiency evaluation of nip force non-uniformity fault under case 4 kgf, Table S26: Sensor data efficiency evaluation of nip force non-uniformity fault under case 6 kgf, Table S27: Nip force non-uniformity fault diagnosis result based on sensor 1, Table S28: Nip force non-uniformity fault diagnosis result based on sensor 2, Table S29: Nip force non-uniformity fault diagnosis result based on X axis of sensor 2, Table S30: Nip force non-uniformity fault diagnosis result based on Y axis of sensor 2, Table S31: Nip force non-uniformity fault diagnosis result based on Z axis of sensor 2, Table S32: Doctor blade tilting fault diagnosis with various machine learning algorithms, Table S33: Printing roll eccentricity fault diagnosis with various machine learning algorithms, Table S34: Nip roll eccentricity fault diagnosis with various machine learning algorithms, Table S35: Nip force non-uniformity fault diagnosis with various machine learning algorithms.

**Author Contributions:** Conceptualization, Y.L. and C.L.; methodology, Y.L. and M.J.; software, Y.L. and G.C.; formal analysis, Y.L., M.J. and G.C.; data curation, M.J. and G.C.; writing—original draft preparation, Y.L., M.J. and C.L.; writing—review and editing, M.J. and C.J.; visualization, G.C. and C.J.; supervision, C.L.; project administration, C.L.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A5A1019649) and the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012770).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.



### *Article* **Aircraft Landing Gear Retraction/Extension System Fault Diagnosis with 1-D Dilated Convolutional Neural Network**

**Jie Chen 1,\*, Qingshan Xu 1, Yingchao Guo 1 and Runfeng Chen 2**


**\***Correspondence: shuimujie@mail.nwpu.edu.cn

**Abstract:** Faults of the landing gear retraction/extension (R/E) system can degrade an aircraft's maneuvering conditions; identifying these faults has therefore become a key issue for ensuring aircraft take-off and landing safety. In this paper, we aim to solve this problem by proposing the 1-D dilated convolutional neural network (1-DDCNN). To address the limited feature information extraction and inaccurate diagnosis of the traditional 1-DCNN with a single feature, the 1-DDCNN selects multiple feature parameters to realize feature integration. The performance of the 1-DDCNN in feature extraction is explored. Importantly, by using padded dilated convolution to enlarge the receptive field of the convolution kernel, the 1-DDCNN can completely retain the feature information in the original signal. Experimental results demonstrate that the proposed method has high accuracy and robustness, providing a novel approach to feature extraction and fault diagnosis of the landing gear R/E system.

**Keywords:** landing gear retraction/extension (R/E) system; 1-D dilated convolutional neural network (1-DDCNN); fault diagnosis; feature integration

**Citation:** Chen, J.; Xu, Q.; Guo, Y.; Chen, R. Aircraft Landing Gear Retraction/Extension System Fault Diagnosis with 1-D Dilated Convolutional Neural Network. *Sensors* **2022**, *22*, 1367. https://doi.org/10.3390/s22041367

Academic Editor: Andrea Cataldo

Received: 4 January 2022 Accepted: 7 February 2022 Published: 10 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

### **1. Introduction**

The landing gear R/E system is a significant subsystem of an aircraft. After long-term operation under complex and variable conditions, with heavy loads and strong impacts, the key parts of the landing gear R/E system will inevitably develop various faults, which may affect take-off, landing, and flight safety.

Hinton proposed a deep learning method in 2006, which set off a new wave of research on artificial intelligence and its applications [1]. In particular, deep learning models have shown significant success in image processing, speech recognition, target detection, information retrieval, natural language processing, and so on [2]. Moreover, as an important network structure, CNNs are widely applied in computer vision and natural language processing [3]. Machine learning methods have made great progress in the field of fault diagnosis. For example, Gligorijevic et al. proposed a method for rolling bearing fault diagnosis. Through the five-level wavelet decomposition of the vibration signals, the standard deviations of the wavelet coefficients from six sub-bands were extracted as representative features; feature dimensionality reduction was then performed, and the diagnosis accuracy reached 98.9% [4]. Subsequently, some scholars gradually introduced CNNs into the field of fault diagnosis. By converting 1-D time series vibration signals into 2-D input matrices, some experts and scholars constructed 2-D convolutional neural network models for fault diagnosis of rotating machinery. Janssens et al. performed a short-time Fourier transform on the vibration information of rotating machinery, then input the transformed coefficient map into a constructed CNN model for feature extraction to achieve CNN-based multi-fault identification [5]. Jing et al. proposed an adaptive multi-sensor data fusion method based on deep convolutional neural networks for fault diagnosis. The proposed method can learn features from raw data and adaptively optimize a combination of different fusion levels to effectively diagnose different faults of planetary gearboxes [6]. Wen et al. proposed a signal-to-image conversion method, using CNN techniques to extract features of the converted images, and achieved excellent test results for three famous datasets, comprising a motor bearing dataset, a self-priming centrifugal pump dataset, and an axial piston hydraulic pump dataset [7].

The main difference between the 1-DCNN and 2-DCNN lies in the dimension of input data and the sliding mode of the convolution kernel. Wu et al. proposed a 1-DCNN model for fault diagnosis of original vibration signals, for which test diagnosis accuracy reached 99.3% [8]. Huang et al. proposed a multi-scale cascade convolutional neural network (MC-CNN) model for fault diagnosis of bearings, achieving satisfactory results under non-stationary operating conditions [9]. He et al. combined a 1-DCNN and a LSTM (long short-term memory) network to construct a novel network model for fault diagnosis of bearings, with an average accuracy of over 99% [10].

For the traditional CNN, the convolution and pooling operations are carried out alternately, which reduces the feature maps' size while increasing the receptive field. However, for the pixel-level prediction problem of image segmentation, the final feature output size is required to be consistent with the original image size, which involves the down-sampling and up-sampling processes, image resolution reduction, and information loss. To solve these problems, dilated convolution came into being [11]. Zhuang et al. proposed a stacked residual dilated convolutional neural network (SRDCNN) for real-time fault diagnosis of bearings by combining dilated convolution, LSTM, and residual networks, and their experimental results show that the proposed model has improved denoising performance and adaptability [12]. Feng et al. used dilated convolution to replace conventional convolution and pooling structures, and introduced instance normalization (IN) to solve the issue of data style transfer. The proposed 1-D stacked dilated convolutional neural network (1D-SDCNN) model has an average accuracy of 96.8% for fault diagnosis of rolling bearings with variable loads [13]. Liang et al. combined LSTM, dilated convolution, and capsule networks to construct a new capsule network with gate-structure dilated convolutions (GDCCN). Their experimental results demonstrated that the proposed model has strong noise resistance and generalization in fault diagnosis of motors under variable load conditions [14]. As equipment has become more intelligent, complex, and integrated, it is difficult to accurately determine the characteristics of the failure status with a single feature. Moreover, using the 1-DCNN for fault diagnosis is highly dependent on fault datasets for roller bearings. The CNN fault diagnosis model can directly extract features from the original data to achieve end-to-end fault diagnosis without complicated data pre-processing. Dimensionality reduction methods, such as principal component analysis (PCA), cannot effectively preserve the time-dependence of time series data; moreover, information loss occurs in the process of dimensionality reduction.

In order to solve the above problems, this paper proposes a fault diagnosis method for the aircraft landing gear R/E system based on a 1-D dilated convolutional neural network (1-DDCNN). The main work of this paper is summarized as follows:


### **2. The Typical Aircraft Landing Gear R/E System Analysis**

#### *2.1. System Working Principle and Composition*

The landing gear system of a typical aircraft mainly includes the following: front landing gear and cabin door, main undercarriage and cabin door, landing gear R/E system, wheels and brakes, turning control system, landing gear position indication system, etc. Among them, the R/E system mainly completes normal R/E and emergency R/E functions, and provides the landing gear position indication signal.

The front landing gear R/E process is similar to that of the main landing gear; therefore, only the front landing gear retraction process is described here. The working status of the front landing gear is shown in Figure 1. When the plane takes off, and the landing gear wheels are off the ground, the pilot sets the landing gear R/E control switch to the "UP" position, and the current flows to the landing gear R/E electromagnetic switch and the accumulator charging electromagnetic switch. The hydraulic fluid from the three-position four-way directional valve enters the front landing gear lower lock and actuating cylinder. As the accumulator charging solenoid switch is turned on, oil from the pump is supplied to the actuating cylinder, and the oil in the accumulator is also released to aid the landing gear retraction. When the aircraft is about to land, the pilot moves the control switch to the "DOWN" position, the three-position four-way directional valve is switched to the down circuit, and the fluid enters the front landing gear upper lock. Once the lock is opened, the oil enters the lowering chamber of the R/E actuator to lower the landing gear.

**Figure 1.** Front landing gear retracting process.

The landing gear R/E system mainly includes the following components: constant pressure variable pump, tank, hydraulic motor, filter, accumulator, actuator, press control, throttle valve, and three-position four-way directional control valve.

By providing a certain pressure and oil mass, the pump converts mechanical energy into hydraulic energy. The tank is used to store hydraulic oil. The filter is used to filter the hydraulic oil and remove its impurities. The accumulator not only supplies oil at both weak and heavy flows, it also compensates for leakage and maintains constant pressure. The actuator is a device that converts hydraulic energy into mechanical energy for linear reciprocating motion, which overcomes the load (including friction) and maintains the speed of motion using pressure-driven liquid flow. The relief valve is one of the common pressure valves used to regulate or limit the pressure in a hydraulic system. The throttle valve is a hydraulic component that regulates and controls the flow of oil in a hydraulic system. The function of the one-way throttle valve is to ensure that the oil flows in one direction, with no backflow, using the throttling effect. The solenoid directional valve is one of the frequently used hydraulic components in hydraulic systems, and is used to switch the direction of the hydraulic circuit. This article uses a three-position four-way solenoid directional valve. "Position" refers to the number of working positions of the spool, and "way" refers to the number of oil ports on the valve body [15].

#### *2.2. Failure Mode and Effect Analysis*

The failure mode and effect analysis (FMEA), derived from typical civil aircraft design data in this subsection, can be used for parameter selection and fault injection into the simulation model, whereby fault datasets are obtained. The FMEA was carried out on the system's main components for subsequent fault diagnosis, the results of which are shown in Table 1 below. The failure analysis in this paper focuses on the component level of the landing gear R/E system and does not explore the specific internal failure of each component. For excessive noise from the pump, the failure threshold can be obtained by changing the air content of the oil. Since clogging of the throttle valves at the two ends of the actuator cylinder affects the system differently, it is necessary to change the throttle valve's diameter at each end of the actuator cylinder to obtain the failure threshold. Regarding the system failure caused by leakage of the constant pressure variable pump, a throttle valve should be connected to the pump in parallel, and the throttle valve's diameter should be changed to simulate different degrees of pump leakage. Actuator cylinder leakage also affects the normal operation of the system, and the failure threshold can be obtained by changing the actuator leakage coefficient.


**Table 1.** FMEA for the landing gear R/E system.

### **3. 1-DDCNN Fault Diagnosis Model**

#### *3.1. 1-DCNN*

The receptive field is defined as the size of the region of the original input onto which each element of the feature map output by a CNN layer is mapped. The size of a neuron's receptive field determines the range of the original input it can cover; a larger receptive field means the neuron may capture more global features. Figure 2a shows the range of neuron receptive fields in the third layer of the 1-DCNN, with a convolutional kernel size of 3 × 1 and a step size of 1 × 1. The marked blue neurons in the third layer are mapped from the blue regions in the first layer, that is, the receptive field size of the input sequence data corresponding to a neuron in the output feature map 2 is 5 × 1.

**Figure 2.** The receptive field of 1-DCNN. (**a**) 1-DCNN; (**b**) 1-DDCNN.

Before inputting 1-D time series signals into a 2-DCNN, the common method is to rearrange and combine the signal sampling points, using a simple procedure, and convert them into 2-D matrix form. The 1-DCNN has the advantage that 1-D time series signals can be input directly without the need for cumbersome conversions.

The output receptive field of the n-th layer is:

$$r_n = r_{n-1} + \left((k_n - 1) \times \prod_{i=1}^{n-1} s_i\right), \quad n \ge 2, \tag{1}$$

where *rn* is the receptive field size of the n-th layer, *kn* is the filter size of the n-th layer, and *si* is the movement step size of the *i*-th layer filter.
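The recursion in Equation (1) can be evaluated layer by layer, as in the following minimal Python sketch. It assumes that a single input sample has a receptive field of 1, so that the first layer yields a receptive field equal to its kernel size; the function name is hypothetical.

```python
def receptive_fields(kernel_sizes, strides):
    """Receptive field after each layer, following the recursion of Equation (1)."""
    if len(kernel_sizes) != len(strides):
        raise ValueError("one stride is required per layer")
    field = 1        # assumption: a single input sample covers itself
    stride_prod = 1  # product of the strides of all previous layers
    fields = []
    for k, s in zip(kernel_sizes, strides):
        field += (k - 1) * stride_prod
        fields.append(field)
        stride_prod *= s
    return fields
```

For two convolutional layers with kernel size 3 and step size 1, `receptive_fields([3, 3], [1, 1])` returns `[3, 5]`, which agrees with the 5 × 1 receptive field described for Figure 2a.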

According to the receptive field's design principle, the size of the neuron receptive field in the last layer should be close to the input signal's length, that is, it satisfies the condition *rn* = *L*, where *L* is the length of the input signal. The convolution kernel size is *k*, and the sliding step size of the convolution kernel is *s*. Each convolutional layer is followed by a maximum pooling layer, where the step size of the maximum pooling layer is *s*<sub>pool</sub> = 2 and the sliding window of the maximum pooling layer is *k*<sub>pool</sub> = 2. When *n* > 2, the receptive field of the convolutional layer is:

$$r_n = r_{n-1} + (k - 1) \times \prod_{i=1}^{n-1} s_i, \tag{2}$$

and the receptive field of the pooling layer is:

$$r_{n+1} = r_n + (k_{\text{pool}} - 1) \times \prod_{i=1}^{n-1} s_i \times s_i \tag{3}$$

when *n* > 2, the n-th network layer is a convolutional layer and n is odd, then the difference between the front and back receptive fields is <sup>2</sup>*<sup>n</sup>*−1; thus, the expression for the receptive field *rn* of the neurons in the last pooling layer at the input signal, when *n* is even number, is:

$$r\_n = k + 1 + 2k + 4k + \dots + 2^{\frac{n}{2} - 1}k = 1 + 2^{\frac{n}{2}}k - k \tag{4}$$

According to the condition *rn* = *L*, the value of k is obtained from:

$$k \approx \frac{L-1}{2^{\frac{n}{2}}-1},\tag{5}$$
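As a sanity check on the derivation, the recursion in Equation (1) can be evaluated layer by layer and compared with the closed form in Equation (4). The following is a minimal sketch, assuming a stack that alternates convolution (kernel *k*, step size 1) and max pooling (window 2, step size 2); the function names are illustrative and do not come from the original model code.

```python
def receptive_field(kernels, strides):
    """Receptive field of the last layer, accumulated with Equation (1).

    kernels[i] and strides[i] describe layer i + 1 (convolution or pooling).
    """
    r = kernels[0]        # r_1 = k_1
    jump = strides[0]     # running product of the strides of earlier layers
    for k, s in zip(kernels[1:], strides[1:]):
        r += (k - 1) * jump
        jump *= s
    return r


def conv_pool_stack(k, n, k_pool=2, s_pool=2):
    """n layers alternating convolution (kernel k, stride 1) and max pooling."""
    kernels = [k if i % 2 == 0 else k_pool for i in range(n)]
    strides = [1 if i % 2 == 0 else s_pool for i in range(n)]
    return kernels, strides


def closed_form(k, n):
    """Equation (4), valid for an even number of layers n."""
    return 1 + 2 ** (n // 2) * k - k


# The recursion and the closed form agree for several illustrative settings.
for n in (2, 4, 6, 12):
    for k in (3, 50):
        assert receptive_field(*conv_pool_stack(k, n)) == closed_form(k, n)
```

Under these assumptions, setting the closed form equal to *L* and solving for *k* gives Equation (5).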

#### *3.2. 1-DDCNN*

Dilated convolution is also called expanded convolution; it replaces the traditional CNN pooling operation by introducing the expansion ratio, which can completely retain the feature information in the original signal so that the convolution kernel of the same size can obtain a larger receptive field. For a dilated convolution, the output receptive field of the n-th layer is:

$$\begin{cases} r\_1 = k\\ r\_n = r\_{n-1} + (k\_n - 1)l\_n, & n \ge 2\end{cases} \tag{6}$$

where *rn* is the receptive field size of the n-th layer network structure, *kn* is the convolutional kernel size of the n-th layer network structure, and *ln* is the expansion rate of the n-th layer network structure.

Figure 2b shows the receptive fields' range in the third layer of the 1-DDCNN (output feature map 2) for the first layer (input sequence data) and the second layer (output feature map 1). The convolutional kernel size is 3 × 1 (*k* = 3). The step size is 1 × 1. The expansion rate is 2 (*l*1 = *l*2 = 2). The receptive field size in the second layer corresponding to output feature 2 is 5 × 1, and the receptive field size in the first layer is 9 × 1.

#### *3.3. 1-DDCNN Fault Diagnosis Model Framework*

The structural framework of the proposed fault diagnosis method based on the 1-DDCNN is shown in Figure 3. The model fault diagnosis process was as follows:

**Figure 3.** 1-DDCNN fault diagnosis process.

Step 1: Datasets were divided into training set, validation set, and test set.

Step 2: According to the structure and parameters of the traditional 1-DCNN model, the 1-DDCNN was preliminarily designed.

Step 3: The diagnostic accuracy of the multi-feature 1-DDCNN model under different convolutional kernel sizes was investigated to determine the final model hyper-parameters.

Step 4: The proposed model was trained and tested with a test set to obtain the fault diagnosis accuracy.

### **4. Experimental Implementations**

#### *4.1. Data Description and Operating Environment*

Because fault data under real operating conditions are insufficient, AMESim® was used to model the landing gear R/E system and obtain fault datasets. According to the mutual logical relationship between components, the landing gear R/E system model was established, and is presented in Figure 4. The blue section represents the hydraulic subsystem, the green section signifies the mechanical subsystem, and the red section denotes the external load of the system. Component parameter settings in the model are shown in Table 2.

**Figure 4.** Simulation of landing gear R/E system. 1. Hydraulic motor; 2. Constant pressure variable pump; 3. Accumulator; 4. Filter; 5. Spring check valve; 6. Press control; 7. Oil tank; 8. Hydraulic check valve; 9. Three-position four-way directional control valve; 10. Hydraulic fluid; 11. Two-position three-way directional control valve; 12. Unlock actuator; 13. Flow control; 14. Actuator cylinder.

**Table 2.** Component parameter settings in the landing gear R/E system model.


| Sub-Model | Parameter | Value |
|---|---|---|
| Press control | Relief valve cracking pressure | 206.843 bar |
| | Relief valve flow rate pressure gradient | 20 L/min/bar |
| Throttle valve | Diameter | 2 mm |
| Three-position four-way directional control valve | Valve natural frequency | 80 Hz |
| | Valve damping ratio | 0.8 |
| | Flow rate | 39 L/min |
| | Pressure drop | 2.5 bar |
| | Valve rated current | 40 mA |

The specific parameters of the FMEA in Section 2.2 are shown in Table 3.

**Table 3.** Labels and failure threshold.


The "failure status: 1" curves in subgraphs a, b, c, and d of Figure 5 show the variation trends of the main parameters under normal conditions. The entire landing gear R/E process takes 32 s, of which the landing gear retraction time is 7.5 s and the extension time is 10.8 s. These times are similar to those specified in the manual, which requires that the R/E time shall not exceed the specified time by more than 1 s; otherwise, it is regarded as a fault [16].

**Figure 5.** Comparison of the six fault types under different feature parameters. (**a**) The displacement of actuating cylinder; (**b**) system pressure; (**c**) the pressure at the right end of actuating cylinder; (**d**) the pressure at the left end of actuating cylinder.

According to the fault thresholds in Table 3, 300 simulations were conducted for each of the six fault states, and the four parameters (actuator cylinder displacement, system pressure, and the pressure at the right and left ends of the actuating cylinder) were sampled. The sampling interval was set to 0.01 s, and a total of 1800 samples were obtained. The samples were divided into training, validation, and test sets at a ratio of 8:1:1. The details of the single-feature and multi-feature datasets are shown in Tables 4 and 5, respectively, and the operation environment for the simulation is described in Table 6.
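The dataset sizes implied by this description can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (six fault states, 300 simulations each, an 8:1:1 split, and a 32 s R/E process); reading the sampling setting as a 0.01 s interval with both endpoints included is an assumption, and the helper name is illustrative.

```python
N_STATES = 6            # fault states listed in Table 3
RUNS_PER_STATE = 300    # simulations per fault state
SPLIT = (8, 1, 1)       # training : validation : test


def split_counts(total, ratios):
    """Split `total` samples according to integer ratios."""
    unit = total // sum(ratios)
    return tuple(unit * r for r in ratios)


total = N_STATES * RUNS_PER_STATE
n_train, n_val, n_test = split_counts(total, SPLIT)
print(total, n_train, n_val, n_test)

# Assumption: a 0.01 s sampling interval over the 32 s process, endpoints included.
points_per_sample = round(32 / 0.01) + 1
print(points_per_sample)
```

Under these assumptions, the test set holds 180 samples and each sample contains 3201 points, which agrees with the sequence length used later for the model design.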

**Table 4.** Single-feature dataset A.




**Table 6.** Experimental operation environment.


#### *4.2. Experimental Model*

Zhou [17] analyzed three important factors that affect the performance of a CNN: the network organization structure, the network depth, and the number of feature maps. Increasing either the network depth or the number of feature maps can improve the recognition accuracy. Therefore, it is necessary to conduct separate comparative studies to determine the final model parameters. From Section 4.1, it is known that the sequence length of a single sample is 3201. The total number of convolutional and pooling layers is *n* = 12 (excluding the dropout layer). On the basis of Equation (5), the convolution kernel size of the first convolutional layer is 50. From the comparative test, the model with convolution kernel size 50, convolution number 4, and moving step size 1 at the first convolution layer has the best diagnostic effect. The specific parameters of the traditional 1-DCNN model are shown in Table 7.
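As a check of this kernel size, substituting *L* = 3201 and *n* = 12 into Equation (5) gives:

$$k \approx \frac{3201 - 1}{2^{12/2} - 1} = \frac{3200}{63} \approx 50.8,$$

which is consistent with the value of 50 adopted for the first convolutional layer; the rounding to 50 is assumed here, since the rounding rule is not stated explicitly.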


**Table 7.** 1-DCNN model details.

Because the traditional 1-DCNN model with a single-feature parameter (actuator cylinder displacement) extracts limited feature information and classifies inaccurately, three features, namely the system pressure and the pressures at the right and left ends of the actuating cylinder, are selected to jointly characterize the six failure statuses.

Referring to the traditional 1-DCNN, in the 1-DDCNN we initially set the convolution kernel size to 50, the step size to 1, and the expansion factor to *ln* = 2<sup>*n*−1</sup>. The calculation formula for the receptive field is then:

$$r\_n = 1 + (k - 1)(2^n - 1)\tag{7}$$
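Equation (7) follows from summing the recursion in Equation (6) with the expansion factor 2<sup>*n*−1</sup>. A minimal sketch of that check is given below; the function names are illustrative, and the use of six dilated layers in the example is an assumption taken from the description of the comparison model rather than from Table 8.

```python
def dilated_receptive_field(k, n_layers):
    """Receptive field from the recursion in Equation (6), with l_n = 2**(n - 1)."""
    r = k  # r_1 = k
    for n in range(2, n_layers + 1):
        r += (k - 1) * 2 ** (n - 1)
    return r


def closed_form(k, n_layers):
    """Equation (7): r_n = 1 + (k - 1)(2**n - 1)."""
    return 1 + (k - 1) * (2 ** n_layers - 1)


# The recursion and the closed form agree for the kernel sizes studied in Section 4.3.1.
for k in (30, 40, 50, 60, 70):
    for n_layers in range(1, 8):
        assert dilated_receptive_field(k, n_layers) == closed_form(k, n_layers)

# Assumed depth of six dilated layers with the initial kernel size of 50.
print(closed_form(50, 6))
```

With these assumed settings the receptive field is 3088 points, which is close to the sample length of 3201.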

The network structure and initial settings are shown in Figure 6 and Table 8, respectively. The design principle of the model is that the output feature map size of the last convolution layer is similar to, or exactly the same as, the size of the input data. The proposed 1-DDCNN has the following advantages: firstly, it constructs the convolutional kernel to obtain a larger receptive field and completely retain the feature information in the original signal; secondly, it can act as a dropout layer to prevent over-fitting.

**Figure 6.** The structure of 1-DDCNN.


**Table 8.** 1-DDCNN model details.

#### *4.3. Experimental Results and Analysis*

4.3.1. Research on the Size of Convolution Kernel of 1-DDCNN Model

According to Equation (7), once the expansion factor is determined, the parameter that has a decisive influence on the receptive field size is the convolution kernel size. Since the output size of each dilated convolution layer in the 1-DDCNN is 3201 × 1, the convolution kernel size does not affect the output features' size of the dilated convolution layer, but has a great influence on the feature extraction degree of the original data. Therefore, it is necessary to investigate the convolutional kernel size's effect on the classification accuracy. The convolution kernel size was set to 30, 40, 50, 60, and 70 in turn to investigate the diagnostic accuracy of the 1-DDCNN under different conditions, and to determine the final model hyper-parameters.

Table 9 and Figure 7 show the detailed diagnosis results for the effect of convolution kernel size on the test samples in each trial. As the convolution kernel size increases, the total number of training parameters rises, and the model running time increases accordingly. The average accuracy of the 1-DDCNN exceeded 90% for every convolution kernel size. In particular, when the convolution kernel size was 40, the highest average accuracy of 99.80% was achieved over five training sessions. When the convolution kernel size was 50, the standard deviation was the lowest, at 0.0000, which indicates that the model had the highest stability under this condition.


**Table 9.** Fault diagnosis results of 1-DDCNN under different sizes of convolution kernels.

Figure 8 shows that the accuracies were close to 100% at the 10th iteration for convolution kernel sizes from 30 × 1 to 50 × 1, and fluctuated around 94.5% from the 50th iteration onwards for convolution kernel sizes from 60 × 1 to 70 × 1. In fact, when the convolution kernel size was 60 × 1, the accuracy dropped. It can be seen from Figure 9 that the loss values for convolution kernel sizes from 30 × 1 to 50 × 1 approached 0 at the 10th iteration. The loss value at convolution kernel size 60 × 1 remained around 0.92 after the 10th iteration, which indicates over-fitting. The loss value of the model with convolution kernel size 70 × 1 remained around 0.14 after the 40th iteration. In particular, at the 65th iteration, the training and validation loss values at convolution kernel size 40 × 1 were both less than 1.0 × 10<sup>−5</sup>.

**Figure 9.** Training and validation loss values at different convolution kernel sizes.

Considering the accuracy, stability, and training cost comprehensively, results are optimal when convolution kernel size is set as 40 × 1.


4.3.2. Comparative Experiment of Three Models under Different Datasets

To show the dilated convolution's advantages in feature extraction and information loss prevention, comparative experiments were conducted on three models, namely the traditional 1-DCNN, the 1-DDCNN, and the 1-DDCNN II, with dataset A and dataset B. Depending on whether the same size is maintained between the feature map and the input data, two types of convolution operations exist: VALID (without padding) and SAME (with padding) convolution operations. In contrast to the 1-DDCNN, the 1-DDCNN II's dilated convolution layers were VALID, and the convolution kernel size was uniformly set to 51 × 1. When the dataset was input into the 1-DDCNN II, the output size of the flattening layer was 3264 × 1 after six dilated convolution layers, which is slightly larger than the sequence length of the input samples. The model structure is similar to the 1-DDCNN, and specific parameter settings are shown in Table 10.
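The difference between the two padding modes can be illustrated by tracking the sequence length through a stack of stride-1 dilated convolutions. This is a minimal sketch, assuming that the 1-DDCNN II uses the same expansion factors 2<sup>*n*−1</sup> as the 1-DDCNN over its six layers; the function names are illustrative, and the channel counts of Table 10 are not used.

```python
def output_length(length, k, dilation, padding):
    """Output length of a stride-1 dilated 1-D convolution."""
    if padding == "SAME":
        return length                        # padded so that the length is kept
    if padding == "VALID":
        return length - dilation * (k - 1)   # no padding: the kernel span is lost
    raise ValueError(f"unknown padding mode: {padding}")


def stack_output_length(length, k, n_layers, padding):
    """Length after n_layers dilated layers with assumed dilation 2**(n - 1)."""
    for n in range(1, n_layers + 1):
        length = output_length(length, k, 2 ** (n - 1), padding)
    return length


print(stack_output_length(3201, 51, 6, "SAME"))
print(stack_output_length(3201, 51, 6, "VALID"))
```

Under these assumptions, the SAME stack keeps all 3201 points, whereas the VALID stack leaves 51 points per channel. The reported flattening size of 3264 equals 51 × 64, so it would be consistent with 64 output channels in the last layer, although that channel count is not stated in this section.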


**Table 10.** 1-DDCNN II details.

It can be seen from Table 11 and Figure 10 that, compared with dataset A, the diagnostic accuracies of the 1-DCNN, 1-DDCNN, and 1-DDCNN II with dataset B were higher, and the total training parameters and training time increased slightly. The average accuracies of the 1-DDCNN and 1-DDCNN II reached more than 99%, and the standard deviation of both was 0.0045, which indicates that both models are stable.



**Figure 10.** Test accuracy of different models from dataset A and dataset B.

The training processes and confusion matrices for the three models are shown in Figures 11–13.

**Figure 11.** Training and validation accuracy of different models from dataset A and B.

**Figure 12.** Training and validation loss value of different models from dataset A and B.

**Figure 13.** Confusion matrix of diagnosis results of different models based on dataset A and B. The following are the results of: (**a**) 1-DCNN from dataset A; (**b**) 1-DCNN from dataset B; (**c**) 1-DDCNN from dataset A; (**d**) 1-DDCNN from dataset B; (**e**) 1-DDCNN II from dataset A; (**f**) 1-DDCNN II from dataset B.

In Figure 13, the rows represent the predicted class (output class) and the columns represent the true class (target class). The diagonal cells represent the observations that were correctly classified. The off-diagonal cells represent incorrectly classified observations. Both the number of observations and the percentage of the total number of observations are shown in each cell. The far-right column in the plot shows the percentages of all the examples predicted to belong to each class that were correctly and incorrectly classified. These metrics are often called the precision and false discovery rate, respectively. The bottom row in the plot shows the percentages of all the examples belonging to each class that were correctly and incorrectly classified. These metrics are often called the recall (or true positive rate) and false negative rate, respectively. The cell in the bottom right of the plot shows the overall accuracy.
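The quantities in the margins of such a plot can be computed directly from the matrix of counts. The following is a minimal sketch using the same layout (rows are predicted classes, columns are true classes); the three-class matrix is a hypothetical example and not data from Figure 13.

```python
def confusion_metrics(cm):
    """Per-class precision and recall plus overall accuracy.

    cm[i][j] is the number of samples predicted as class i whose true class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    row_sums = [sum(cm[i]) for i in range(n)]                         # per predicted class
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]    # per true class
    precision = [cm[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    recall = [cm[j][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
    return precision, recall, correct / total


# Hypothetical counts for three classes with 30 true samples each.
example = [
    [28, 4, 0],
    [2, 26, 0],
    [0, 0, 30],
]
precision, recall, accuracy = confusion_metrics(example)
print(precision, recall, accuracy)
```

The false discovery rate and false negative rate of each class are one minus the corresponding precision and recall.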

At the 100th iteration, the 1-DCNN model with dataset A had a training accuracy of 98.63% and a training loss value of 0.0441; the validation accuracy was 98.61% and the validation loss value was 0.0609. The confusion matrix corresponding to the test set is shown in Figure 13a; four fault samples caused by "excessive noise from the pump" were incorrectly classified as "normal", which corresponds to 2.2% of all 180 test samples in dataset A. Similarly, six fault samples caused by "throttle valve blocking at left end of actuating cylinder" were incorrectly classified as "excessive noise from the pump", which corresponds to 3.3% of the test samples. Overall, 93.9% of the classifications were correct and 6.1% were wrong. The likely reason for the identification errors is the high similarity between the sample sequences of different failure statuses, as shown in Figure 5a.

It can be seen in Figure 11 that the accuracies of both 1-DDCNNs with dataset B were close to 100% at the 10th iteration, while the accuracy of the traditional 1-DCNN model with dataset A and B did not reach 100%, even after 100 epochs, which indicates that the traditional 1-DCNN model loses a large amount of information during the pooling process.

Figure 12 shows that the loss values of the 1-DDCNN and 1-DDCNN II with dataset B were both close to 0 at the 10th iteration, whereas the loss value of the 1-DCNN with dataset B converged more slowly than that of the other models. The training and validation loss curves of the 1-DCNN showed a significant gap after the first 20 iterations. In particular, at the 63rd iteration, the training and validation loss values of the 1-DDCNN with dataset B were both less than 1.0 × 10<sup>−5</sup>. In terms of training cost, the iteration number should be set as 63 in subsequent model training.

It can be seen in Figure 13 that the test accuracies of subgraphs b, d, and f are higher than those of the subgraphs a, c, and e. In particular, the test accuracy of the 1-DDCNN (subgraph d) with dataset B reached 100%.

Figure 14 shows the output features visualization of Conv1 and Flatten layers in the 1-DDCNN with six failure statuses. The feature similarity of the four-channel output of Conv1 was relatively high for the six failure statuses. After convolution and flattening operations, the six failure statuses exhibited unique characteristics, which are conducive to distinguishing the failure status of the model.

**Figure 14.** Visualization of output features of Conv1 and Flatten layers with the six failure statuses of 1-DDCNN. (**1**–**6**) correspond to the six failure statuses in Table 3.

### **5. Conclusions**

Since the 2-DCNN cannot directly process one-dimensional time series data, which often requires complex pre-processing, a novel 1-DDCNN is proposed for landing gear R/E system fault diagnosis in this paper. Dilated convolution can exponentially increase the receptive field of the convolution kernel by adding convolution layers, which could acquire more redundant information to alleviate the influence of randomness. The displacement of the actuator cylinder was selected as the feature parameter, and the diagnosis classification was carried out on the traditional 1-DCNN model, for which the average diagnosis accuracy reached 91.80%. Due to the limited feature information extraction and inaccurate diagnosis for a single feature in the traditional 1-DCNN, multiple feature parameters are selected to jointly represent the fault and are input into the proposed model for feature integration. The influence of the convolution kernel size on classification accuracy is explored. When the convolution kernel size is 50, the model has the highest stability. The results show that the average diagnostic accuracy of the proposed model is 99.80%, which compares favorably with the other models.

Future work will be carried out on the following two aspects. Firstly, the system has noise in the actual working environment, and it is necessary to verify the robustness of the proposed model on noisy data. Secondly, this paper only considers the influence of a single parameter, such as oil mixing into the air or actuator leakage, on the system operating state. In the future, the complex situation of the simultaneous failure of multiple internal components, and the consequent effects on the system operating state, should be studied.

**Author Contributions:** J.C.: Resources, Funding acquisition, Project administration, Writing—Review & editing. Q.X.: Methodology, Software, Writing—Original. Y.G.: Data acquisition, Investigation. R.C.: Validation, Project administration. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (Grant No. 61873203), the China Academy of Space Technology Innovation Foundation (CAST-2020-02-11), and the Aeronautical Science Foundation of China (Grant No. 20200033053001).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


*Article*

#### **Study on a Fault Mitigation Scheme for Rub-Impact of an Aero-Engine Based on NiTi Wires**

**Qiang Pan 1, Tian He 1,\*, Wendong Liu 2, Xiaofeng Liu 1 and Haibing Chen 1**

1 School of Transportation Science and Engineering, Beihang University, Beijing 100191, China; panqiang@buaa.edu.cn (Q.P.); liuxf@buaa.edu.cn (X.L.); aw\_chenhb@buaa.edu.cn (H.C.)

2 AECC Shenyang Engine Research Institute, Shenyang 110000, China; lwdbuaa@163.com

**\***Correspondence: hetian@buaa.edu.cn; Tel.: +86-131-4128-1907

**Abstract:** The aim of this study was to solve the frequently occurring rotor-stator rub-impact fault in aero-engines without causing a significant reduction in efficiency. We proposed a fault mitigation scheme, using shape memory alloy (SMA) wire, whereby the tip clearance between the rotor and the stator is adjusted. In this scheme, an acoustic emission (AE) sensor is utilized to monitor the rub-impact fault. An active control actuator is designed with pre-strained two-way SMA wires, driven by an electric current via an Arduino control board, to mitigate the rub-impact fault once it occurs. In order to investigate the feasibility of the proposed scheme, a series of tests on the material properties of NiTi wires, including heating response rate, ultimate strain, free recovery rate, and restoring force, were carried out. A prototype of the actuator was designed, manufactured, and tested under various conditions. The experimental result verifies that the proposed scheme has the potential to mitigate or eliminate the rotor-stator rub-impact fault in aero-engines.

**Keywords:** rub-impact; tip clearance; shape memory alloy; aero-engine; fault mitigation

### **1. Introduction**

The efficiency of a rotating machine such as an aero-engine is strongly dependent on the tip clearance between the stationary and rotating parts [1]. In order to improve the efficiency of an aero-engine, the clearance should be designed to be as small as possible. It has been reported that a 0.0254 mm reduction in the tip clearance of a high-pressure turbine may lead to a decrease of 0.1 percent in specific fuel consumption [2,3]. However, minimizing the clearance is usually associated with undesired rub-impact phenomena occurring between the rotor and the casing due to mechanical or aerodynamic excitation, or thermal gradients during engine operation [4,5]. This leads to material or structural damage, e.g., plastic deformations, changes in the microstructure on the blade tips, crack initiation, and the break out of liner material at the rubbing zone [6], and, sometimes, catastrophic accidents [7].

Researchers have strived to understand the rub-impact fault, for example, exploring the fault mechanism [8–10], the fault feature extraction method [11–13], source localization [14,15], and intelligent fault diagnosis [16,17], etc. These studies contributed to the development of rub-impact fault diagnosis and to practical applications. However, most studies focused on how to monitor the occurrence of a fault and analyze its type or location; few were concerned with the mitigation or elimination of the rub-impact phenomenon. Unfortunately, rub-impact is a fault which may induce disastrous accidents, if it cannot be mitigated in time.

In order to protect the rotor from rub-impact from the casing of an aero-engine, a feasible solution is to adjust the tip clearance when it occurs. The most popular scheme to achieve this that is used in the design of aero-engines is based on the active clearance control (ACC) method [18]. For example, Bucaro et al. designed a new gas turbine engine thermal control device to improve the control efficiency in terms of the active thermal control system [19]. Tillman et al. proposed a system and method of active thermal control that includes processing aircraft data when the aircraft is flying at altitude cruise conditions [20]. DeCastro et al. used two actuators consisting of electrohydraulic servo valves and piezoelectric stacks to adjust the shroud in a high-pressure turbine section [21]. However, ACC is generally used to schedule the clearance to improve the performance of an operating aero-engine. The actuators based on ACC management do not execute any action spontaneously when a rub-impact occurs between the rotor and casing, though such events frequently occur [22].

**Citation:** Pan, Q.; He, T.; Liu, W.; Liu, X.; Chen, H. Study on a Fault Mitigation Scheme for Rub-Impact of an Aero-Engine Based on NiTi Wires. *Sensors* **2022**, *22*, 1796. https://doi.org/10.3390/s22051796

Academic Editors: Bing Li, Jinchen Ji, Hamed Kalhori and Yongbo Li

Received: 21 January 2022 Accepted: 21 February 2022 Published: 24 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Recently, researchers have explored ways of limiting the rub-impact in an aero-engine. They have proposed adding damping in the rotating machinery to control the abnormal vibration and reduce the rubbing-induced stress. Ma et al. [23] proposed a multi-objective optimization method to analyze the vibration attenuation effects of the squeeze film damper parameters on the dynamic response of the system. It was found that in the misalignment-rubbing coupling fault, the amplitude of the fundamental frequency reduced by 7.4%, the amplitude of 2× the fundamental frequency dropped by 51.5%, and the amplitude of 3× the fundamental frequency reduced by 16.8%. Their study provided a theoretical reference for vibration control and the optimal design of rotating machinery. Xu et al. [24] suggested an impulsive control method to eliminate the rotor-stator rubbing, based on the phase characteristic. It utilized the vibration energy and the phase difference to trigger the control of the rotor-stator rubbing by impulse. The impulse was applied in directions *x* and *y*, several times, to avoid the rotor-stator rubbing. However, it is a theoretical study, applicable to simple rotors only, and it did not account for the implementation of impulse in practical rotating machinery. Shang et al. [25] investigated the influence of cross-coupling effects on the rubbing-related dynamics of rotor/stator systems. They proposed a control method by generating cross-coupling damping on the stator through the active auxiliary bearing, thereby suppressing the contact severity to avoid rubbing instability. This method was validated by numerical analysis on the Jeffcott rotor model, yet it lacks theoretical analysis and experimental data for complicated rotating systems. However, adding damping to the vibration system is beneficial with minor faults but it is not a valid approach once the rub-impact is serious. The research into adding damping in the rotating machinery is not yet well developed.

It has been recognized that rub-impact originates from a very small tip clearance between the stationary and rotating parts in an operating aero-engine [26]. If the clearance can be systematically controlled, the rub-impact might be effectively mitigated or eliminated. Garg [27] suggested shape memory alloy (SMA) as an attractive actuation material in an aero-engine due to its high order-of-magnitude energy density and low energy consumption. DeCastro et al. [28] advanced the concept of a prototype actuator consisting of high-temperature shape memory alloys for ACC actuation in the high-pressure turbine section of a modern turbofan engine. Based on the intrinsic properties of SMA and on the development of tip clearance control technology management, this paper presents a method to monitor rub-impact faults and mitigate them via SMA. The proposed method is schematically shown in Figure 1.

This paper presents a promising solution for rub-impact fault detection and mitigation for an aero-engine. An active control scheme, based on two-way SMA wires actuation is proposed. The rub-impact fault is monitored by an acoustic emission (AE) sensor. An SMA wire-based actuator prototype operated by an Arduino control board is established. A series of tests to establish the material properties of NiTi wires, including heating response rate, ultimate strain, free recovery rate, and restoring force, were carried out. The mechanism and design of the actuator are described in detail in this paper. The feasibility of the proposed model for rub-impact fault is verified by our experimental research and our results show that the proposed active control actuator can effectively mitigate rub-impact fault when it occurs. Therefore, the main contribution of this paper is that it provides a potential way to mitigate or even eliminate the accidental rub-impact fault, without a significant reduction in the engine's efficiency.

**Figure 1.** The basic idea of the proposed rub-impact fault mitigation method.

### **2. Mechanical Behaviors of SMA Wires**

Typical SMA, such as NiTi, is capable of recovering its original shape after plastic deformation by heating above its characteristic transition temperature via the shape memory effect (SME) [29]. This unique mechanical behavior results from a phase transformation between high-temperature austenite and low-temperature martensite phases [30] and has led to many actuation applications [31,32]. In this study, NiTi wires with a two-way shape memory effect [30] were used in our design to drive the actuator through the application of an electric current. As the primary functional component of the actuator, the SME mechanical behaviors of NiTi wires were analyzed in a series of tests prior to design assembly. The material properties of the NiTi wires used in this study and their transition temperatures are listed in Tables 1 and 2, respectively.

**Table 1.** Material properties of NiTi wires.


**Table 2.** Phase transition temperatures of NiTi wires.


#### *2.1. Heating Response Rate of NiTi Wires*

The heating response rate of the NiTi wire is used as a measure of how fast the wire's SME function works. This function is represented by the phase transformation spending time, which is the time it takes the wire to reach its austenite finish temperature Af from its starting state. The shorter the phase transformation spending time, the higher the heating response rate. In order to investigate the heating response rate of NiTi wires, a series of tests under various temperatures were conducted. The experimental setup is schematically shown in Figure 2. The test rig consisted of a DC power supply, a tension test platform, NiTi wire, a thermocouple, and a data acquisition system.

The length and diameter of the NiTi wire were 100 mm and 0.8 mm, respectively. During the tests, the wire was clamped on a tension test platform and one thermocouple was attached to the wire to measure its surface temperature. At room temperature T0 = 21 °C, the wires were heated by 2, 3, 4, and 5 A electric currents provided by the DC power supply. The temperature variations of the wires are shown in Figure 3. The temperature of the NiTi wire rose slowly initially, becoming almost invariable, with an increased electric current. This means that the rising temperature rate of the NiTi wire gradually decreases, and tends to become stable when subjected to a continuous electric current. Figure 3 also indicates that the rising temperature rate of the wire is dependent on the amplitude of the electric current. A large current amplitude is associated with a high rising temperature rate. In order to evaluate the response of the NiTi wires to different electric currents, the phase transformation spending time, which is the time that it takes for the wire to reach Af from its starting state, was measured, using various currents. The comparison results are shown in Table 3. The heating response rate rose with an increased electric current, but an increase in the response rate was not obvious when the electric current increased beyond 6 A.

**Figure 2.** Experimental setup of the heating response rate tests.

**Figure 3.** Temperature variation of NiTi SMA wire subjected to different currents.

**Table 3.** The phase transformation spending time to reach Af using different currents.


#### *2.2. Ultimate Strain of NiTi Wires*

A tension test was conducted using an Instron 5565 Universal Testing Machine. The length of the wire specimen used was 750 mm. Because the wire is very fine, it is difficult to accurately measure the ultimate strain directly during its deformation. As such, a load-displacement curve was plotted from the tension test results, as shown in Figure 4. The wire underwent elastic deformation from 0 to point A, plastic deformation from point A to B, and a hardening stage from point B to C, and it fractured at point C. The elongations at points A, B, and C were 0.68%, 4.89%, and 13.33% of the original length of the wire, respectively. We found that the NiTi wire behaved well in terms of plasticity, and that its pre-strain must not exceed its ultimate strain of 13.33%.

**Figure 4.** A load-displacement curve under tension test.

#### *2.3. Free Recovery Rate of NiTi Wires*

The free recovery rate of NiTi wires under different pre-strain was investigated using the same experimental setup as was used in the heating response rate tests. The initial length of the wire specimens was 100 mm and they were trained to have the characteristic of two-way SME. The wires were stretched with pre-strains of 2%, 3%, 4%, 5%, 6%, and 7%, and were heated to recover their deformations, then cooled to room temperature. As shown in Figure 5, *L*0, *L*1, *L*2, and *L*3 represent the original length of the wire, the length of the wire after being stretched and unloaded, the length of the wire after being heated, and the length of the wire after being cooled to room temperature, respectively. The elongations of the wires were measured to calculate the two-way free recovery rate *η*, which is defined by:

$$\eta \, \, = \, \frac{L\_3 - L\_2}{L\_1 - L\_0} \tag{1}$$

Based on Equation (1), the variation in the free recovery rate with respect to the cycle number at different pre-strain levels is plotted in Figure 6. The two-way free recovery rate initially increased with the number of training cycles and then tended to stabilize after 60 cycles. Large pre-strains resulted in a low free recovery rate, and the largest free recovery rate appeared at a pre-strain of 4%. Therefore, a moderate pre-strain of 4% is preferred in order to obtain the maximum recovery rate of the NiTi wire used for the two-way SME.
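As a worked illustration of Equation (1), the following minimal sketch evaluates the two-way free recovery rate from four measured lengths. The function name and the numerical lengths are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def two_way_free_recovery_rate(l0, l1, l2, l3):
    """Equation (1): eta = (L3 - L2) / (L1 - L0).

    l0: original length, l1: length after stretching and unloading,
    l2: length after heating, l3: length after cooling to room temperature.
    """
    residual_stretch = l1 - l0
    if residual_stretch <= 0:
        raise ValueError("L1 must exceed L0 (the wire must retain a pre-strain)")
    return (l3 - l2) / residual_stretch


# Hypothetical lengths in mm for a 100 mm specimen (illustrative values only).
L0, L1, L2, L3 = 100.0, 104.0, 100.5, 102.0
eta = two_way_free_recovery_rate(L0, L1, L2, L3)
print(f"eta = {eta:.3f}")  # prints "eta = 0.375"
```

Repeating this calculation after each training cycle would give one point of a recovery rate curve of the kind plotted against the cycle number.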

**Figure 5.** Two-way shape memory effects.

#### *2.4. Restoring Force of NiTi Wires*

The restoring forces of NiTi wires with various pre-strains, subjected to electric currents of 3, 4, 5, and 6 A, were measured. Figure 7 shows that the restoring force of each wire, at a given electric current, initially rises sharply and then tends to stabilize. In addition, the maximum restoring force of the NiTi wire with a given pre-strain depends on the amplitude of the electric current applied: the restoring force rose with increased current. To compare the heating efficiency of the wires subjected to different electric currents, the response time required to reach the maximum restoring force of each wire was measured and is listed in Table 4. The results indicate that the heating response time to reach the maximum force under a large current does not change significantly. Moreover, the wire with a 4% pre-strain presents the largest restoring force at 6 A. Thus, the designed actuator, installed with a NiTi wire with a 4% pre-strain, may obtain the largest restoring force in an acceptable response time under such conditions.

**Figure 6.** Free recovery rate curve of NiTi wires.



**Figure 7.** The restoring force of NiTi wires subjected to different electric currents: (**a**) ε = 2%; (**b**) ε = 3%; (**c**) ε = 4%; (**d**) ε = 5%; (**e**) ε = 6%; and (**f**) ε = 7%.

#### **3. Design of the Active Clearance Control Actuator**

#### *3.1. Prototype of the Actuator*

The proposed actuator was designed with the aim of mitigating or eliminating the rub-impact fault when it occurs between the rotor and casing during the operation of an aero-engine. A flowchart of the proposed fault mitigation scheme is shown in Figure 8. The design of the proposed actuator consists of a package, an electrode plate, an insulation layer, pre-stretched NiTi wires, a driving lever, a limit roller, a baffle, a spring, and external and internal casings, as is shown schematically in Figure 9.

**Figure 8.** Flowchart of the proposed fault mitigation scheme.

**Figure 9.** Design of the proposed actuator.

The actuator works by using a driving lever that passes through the external casing. The two ends of the lever are fixed on the internal casing and the bottom electrode plate, respectively, which forces the internal casing to deform when the lever moves. Two insulation layers are placed between the electrode plates and the other parts of the actuator for the insulation and protection of the entire structure. The two ends of the NiTi wires are bolted to the two electrode plates, and the wires are heated by an electric current during the operation of the actuator. A baffle is welded onto the driving lever and a spring is installed between the baffle and the external casing to generate a restoring force. In addition, one limiter with three rollers, circumferentially threaded and installed on the driving lever, acts to improve the accuracy of the motion. The prototype of the actuator is shown in Figure 10. To allow us to observe the behavior of the actuator, the package was not completely sealed, as shown in Figure 10d.

**Figure 10.** Prototype of the SMA-based actuator: (**a**) Limiter; (**b**) driving lever and internal casing; (**c**) electrode plates; and (**d**) package.

Once a rub-impact fault is detected, the external direct current (DC) power supply system begins to heat the SMA wires. When the temperature exceeds the austenite start temperature of the wires, As, the wires undergo the martensite-to-austenite phase transformation and contract due to the SME. This contraction forces the internal casing to move upwards via the driving lever. Due to the motion of the internal casing, the tip clearance between the rotor and the casing is enlarged and, as a result, the rub-impact phenomenon is mitigated or possibly eliminated. The aim of using two-way NiTi wires is to mitigate the rubbing fault whilst guaranteeing engine efficiency, which is achieved when the clearance is reduced once the NiTi wires cool to the martensite phase. To verify the feasibility of the proposed model, a prototype of an SMA-based actuator was designed and its geometric parameters are shown in Table 5.


**Table 5.** Design of the proposed actuator prototype.

#### *3.2. Control Scheme of the Actuator*

In order to realize the self-healing of rub-impact faults, a control system based on an Arduino control board was investigated for the proposed actuator. This system consists of an Arduino Uno R3 control board, an RB-02S082A piezoelectric sensor, and an SRD-05VDC-SL-C 5V electromagnetic relay, as shown in Figure 11. When the system is running, the AE signals are acquired by the piezoelectric sensor and are then transferred to the Arduino Uno R3 control board, which identifies whether the acquired signals are fault signals. The circuit switching function is implemented in compiled code running on the board. In normal conditions, the current circuit is open and the actuator does not work. Once a rub-impact fault is identified by the board, a command is sent to the electromagnetic relay to switch on the current circuit to heat the NiTi wires. A flowchart of the control algorithm is shown in Figure 12.

**Figure 11.** Configuration of the control system.

**Figure 12.** Flowchart of the control scheme.
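The switching logic described above can be sketched as a simple amplitude comparison. This is a minimal illustration in Python, not the code that runs on the Arduino board: the function name and the sample amplitudes are hypothetical, the threshold of 20 is the value reported in Section 4.1, and it is assumed here that the relay opens again as soon as the amplitude falls back below the threshold.

```python
def relay_commands(amplitudes, threshold):
    """Return one relay command per sample.

    True closes the heating circuit (fault detected); False leaves it open.
    Assumption: the relay opens again once the amplitude drops below the threshold.
    """
    return [abs(a) > threshold for a in amplitudes]


THRESHOLD = 20  # amplitude threshold reported in Section 4.1
samples = [5, 12, 25, 31, 18, 9]  # hypothetical sensor amplitudes

commands = relay_commands(samples, THRESHOLD)
for amplitude, heat in zip(samples, commands):
    print(amplitude, "heat wires" if heat else "circuit open")
```

A practical controller would likely add hysteresis or a minimum hold time so that the relay does not chatter around the threshold; that refinement is not described in the text and is omitted here.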

#### **4. Experimental Verification of the Clearance Control Mechanism**

In order to evaluate the effectiveness of the clearance control actuator, the mechanism was verified by an experimental study. Figure 13 shows the experimental setup, which consisted of a rotor test rig, a rotor power supply, an accelerator, an electromagnetic relay, an Arduino control board, a piezoelectric sensor, an AE sensor, a preamplifier, a signal acquisition instrument, a DC power supply, and a PC and data analysis system.

**Figure 13.** Equipment of the clearance control mechanism.

The NiTi wires with 4% pre-strain were installed on the clearance control actuator, and the actuator was fixed on the external casing of the aero-engine. The power of the rotor test rig was provided by a rotor power supply and the rotational speed of the rotor was controlled by an accelerator. The piezoelectric sensor was attached to the upper surface of the internal casing and was connected to the Arduino control board to identify the rubbing and control the electromagnetic relay operation. An AE sensor was attached to the surface of the external casing to acquire the AE signals. These AE signals were processed by a signal preamplifier and then transmitted to the PC.

#### *4.1. Setting of the Threshold and Sampling Frequency*

The accuracy of the proposed control scheme depends on the selection of the signal threshold and the sampling frequency. The operation of an aero-engine is always accompanied by vibrations, whether or not rub-impact occurs. Therefore, the primary goal is to ascertain the appropriate threshold at which rub-impact faults occur, by reference to the amplitude of the vibration signal. Figure 14 shows the signals acquired by the piezoelectric sensor at rotational speeds of 1500, 2000, 2500, and 3000 rpm. The amplitudes of all vibration signals without the rub-impact fault are lower than 20. Therefore, 20 may be used as the threshold for the occurrence of the rub-impact fault. Regarding the sampling, in our experience, it is generally better to set the sampling frequency to more than 5 times the highest frequency of the vibration signals, to ensure the reliability of the acquired data. Because the rotational frequency of the rotor at 3000 rpm was measured at 50 Hz in our tests, the sampling frequency in this study was set to 500 Hz, which was 10 times the highest frequency.

**Figure 14.** Vibration signal at various rotational speeds: (**a**) *n* = 1500 rpm; (**b**) *n* = 2000 rpm; (**c**) *n* = 2500 rpm; and (**d**) *n* = 3000 rpm.
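The sampling frequency choice can be checked with a few lines of arithmetic. The sketch below converts each tested rotational speed to a rotational frequency and compares the 500 Hz sampling frequency against the "more than 5 times" rule of thumb stated above; the function and variable names are illustrative only.

```python
def rotational_frequency_hz(rpm):
    """Convert a rotational speed in revolutions per minute to hertz."""
    return rpm / 60.0


TESTED_SPEEDS_RPM = [1500, 2000, 2500, 3000]
SAMPLING_FREQUENCY_HZ = 500.0
MIN_RATIO = 5.0  # rule of thumb quoted in the text

highest_hz = max(rotational_frequency_hz(n) for n in TESTED_SPEEDS_RPM)
ratio = SAMPLING_FREQUENCY_HZ / highest_hz
meets_rule = ratio > MIN_RATIO

print(f"highest rotational frequency: {highest_hz:.1f} Hz")
print(f"sampling ratio: {ratio:.1f}x, rule satisfied: {meets_rule}")
```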


The effectiveness of the proposed clearance control actuator was verified at rotational speeds of 1500, 2000, 2500, and 3000 rpm, with a 6 A electric current. Let *t*0, *t*1, *t*2, *t*3, *t*4, and *t*5 denote, in order, the times at which the first rub-impact fault occurs, the amplitude of the AE signal reaches its maximum, the fault is eliminated, the second rub-impact fault begins, the amplitude of the AE signal reaches its maximum again, and the second fault is eliminated. The specific durations are defined by:

$$T\_1 = t\_1 - t\_0 \tag{2}$$

$$T\_2 = t\_2 - t\_1 \tag{3}$$

$$T\_3 = t\_3 - t\_2 \tag{4}$$

$$T\_4 = t\_4 - t\_3 \tag{5}$$

$$T\_5 = t\_5 - t\_4 \tag{6}$$

where *T*1 is the duration from the start of rub-impact to its most serious state, *T*2 denotes the time taken by the proposed actuator to eliminate the rub-impact fault, *T*3 represents the time interval between the elimination of the first rub-impact fault and the start of the following rub-impact fault event, *T*4 is the duration from the start of the next fault to its most serious state, and *T*5 denotes the time taken to eliminate that fault.

Table 6 gives the control times measured at various rotational speeds in the rotation test, at a 6 A electric current. The results show that the control times *T*1, *T*2, *T*3, *T*4, and *T*5 are almost the same at different rotational speeds. This means the control time of the clearance control mechanism based on two-way NiTi wires is independent of the rotational speed of the rotor.

**Table 6.** Control time of the clearance control actuator at different rotational speeds.


A series of rotational rubbing tests were conducted at a 6 A current and a 1500 rpm rotational speed, and the AE signal during the operational process was measured. As shown in Figure 15, the amplitude of the AE signal continuously rises with increased rotor speed during the start-up process until it reaches the maximum at *t*1 = 2.143 s, i.e., the rubbing becomes severe and constitutes a fault at this moment. At *t*2 = 3.177 s, this rubbing fault is eliminated. The next rubbing fault occurs at *t*3 = 17.49 s, becoming severe at *t*4 = 18.77 s. At *t*5 = 19.78 s, the fault is eliminated.
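Applying Equations (3) to (6) to the time stamps quoted above gives the corresponding durations. The short sketch below does this arithmetic; *T*1 is omitted because *t*0 is not stated in this paragraph, and the variable names are illustrative.

```python
# Time stamps (in seconds) quoted in the text for the 6 A, 1500 rpm test.
t1, t2, t3, t4, t5 = 2.143, 3.177, 17.49, 18.77, 19.78

# Equations (3) to (6); rounding removes floating-point residue only.
durations = {
    "T2": round(t2 - t1, 3),  # elimination time of the first fault
    "T3": round(t3 - t2, 3),  # interval until the next fault begins
    "T4": round(t4 - t3, 3),  # growth of the second fault to its maximum
    "T5": round(t5 - t4, 3),  # elimination time of the second fault
}

for name, value in durations.items():
    print(f"{name} = {value} s")
```

With these values, both faults are eliminated roughly 1 s after the AE amplitude peaks (*T*2 = 1.034 s and *T*5 = 1.01 s), while the interval before the second fault is about 14.3 s.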

**Figure 15.** AE signals acquired during the clearance control process based on two-way NiTi SMA wires.

(2) Effect of the electric current

A series of rotational rubbing tests at 4, 5, and 6 A electric currents were conducted at a rotational speed of 1500 rpm. The rubbing fault elimination times recorded during the control process are listed in Table 7. It was found that *T*2, *T*3, and *T*5 changed significantly with different electric currents. The durations *T*2 and *T*5 gradually decreased with increased current. Therefore, a large electric current contributes to a rapid control response. In addition, the interval between one rub-impact fault and the next, *T*3, becomes larger at a higher current than at lower currents, because the NiTi wires need more time to cool. The performance of the proposed actuator is therefore affected significantly by the electric current.


**Table 7.** Control time of the NiTi wire based actuator at various electric currents.

### **5. Conclusions**

A fault mitigation scheme based on a two-way SMA wire system to mitigate and possibly eliminate the rotor-stator rub-impact of an aero-engine, by controlling the tip clearance between the rotor and stator, was proposed in this study. Based on the inherent characteristics of SMA, a prototype of an SMA-based actuator was designed, manufactured, and tested. Through our experimental study, the proposed scheme was verified and the following conclusions were drawn:


**Author Contributions:** Conceptualization, Q.P. and T.H.; methodology, Q.P. and X.L. and T.H.; software, X.L. and H.C.; validation, W.L. and H.C.; formal analysis, X.L.; investigation, Q.P. and W.L.; resources, T.H.; writing—original draft preparation, W.L. and Q.P.; writing—review and editing, Q.P. and T.H.; visualization, W.L.; supervision, T.H.; project administration, X.L. and H.C.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Science and Technology Major Project (2017-I-0007-0008).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **Abbreviations**

The following abbreviations are used in this manuscript:


## **References**


### *Article* **Self-Powered Self-Contained Wireless Vibration Synchronous Sensor for Fault Detection**

**Ghufran Aldawood 1 and Hamzeh Bardaweel 1,2,3,\***


**Abstract:** Failure in dynamic structures poses a pressing need for fault detection systems. Interconnected sensor nodes of wireless sensor networks (WSN) offer a solution by communicating information about their surroundings. Nonetheless, these battery-powered sensors have an immense labor cost and require periodical battery maintenance and replacement. Batteries pose a significant environmental threat that is expected to cause irreversible damage to the ecosystem. We introduce a fully integrated vibration-powered energy harvester sensor system that is interfaced with a custom-developed fault detection app. Vibrations are used to power a radio frequency (RF) transmitter that is integrated with the vibration sensor subunit. The harvester-sensor unit is comprised of dual moving magnets that are bordered by coil windings for power and signal generation. The power generated from the harvester is used to operate the transmitter while the signal generated from the sensor is transmitted as a vibration signal. Transmitted values are streamed into a high precision fault detection app capable of detecting the frequency of vibrations with an error of 1%. The app employs an FFT algorithm on the transmitted data and notifies the user when a threshold vibration level is reached. The total energy consumed by the transmitter is 0.894 μJ at a 3 V operation. The operable acceleration of the system is 0.7 g [m/s2] at 5–10.6 Hz.

**Keywords:** vibration energy harvesting; vibration sensor; self-powered sensor; clean technology; wireless vibration sensor; IoT support technology

### **1. Introduction**

Internet of Things (IoT) technologies are blooming and are expected to reach a 1567 billion USD market value by 2025 [1]. Currently, there are a little over 7 billion sensor nodes worldwide [1]. These sensors represent the backbone of IoT systems since they are responsible for detecting essential information about the surrounding environment and monitoring the health conditions of structures such as compressors, pumps, bridges, tunnels, railroads, and other dynamic structures [2]. Monitoring the health conditions of structures using these sensors helps in preventing catastrophic failures and loss of lives [2].

Presently, the majority of these sensors are powered using traditional batteries [3]. The use of these traditional batteries limits the scope and value of these IoT sensors [4,5]. This is due to the fact that the use of conventional batteries as a power source for IoT sensors results in several challenges. Not only do these batteries require continuous and frequent replacement and maintenance [6], but also they have a limited lifespan and pose an environmental threat [7]. Given the projected worldwide spread of these IoT sensors, this environmental concern has become a pressing issue to deal with [8]. Moreover, in harsh environments and remote locations, including wildlife, gas and oil fields, and drilling in mining fields, a self-powered solution becomes the only viable option for deploying long-lasting, stand-alone, eco-friendly, and sustainable sensors [9].

**Citation:** Aldawood, G.; Bardaweel, H. Self-Powered Self-Contained Wireless Vibration Synchronous Sensor for Fault Detection. *Sensors* **2022**, *22*, 2352. https://doi.org/10.3390/s22062352

Academic Editors: Hamed Kalhori, Yongbo Li, Bing Li and Jinchen Ji

Received: 17 February 2022 Accepted: 15 March 2022 Published: 18 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Consequently, there has been a growing interest in developing autonomous, self-powered, environment-friendly sensor technologies as a necessary and integral part of the flourishing IoT systems and technologies [10]. Numerous recent studies have investigated the development of autonomous sensors powered by free sources of energy in their surroundings, including solar and vibration energy. Vibration-powered sensors are an attractive option since they are not constrained by the availability of sunlight and can operate indoors and underground [3]. Moreover, vibration energy consists of a broadband vibration spectrum rich in low frequencies with a power density as high as 500 μW/cm<sup>3</sup> [11]. Vibration energy also represents a rich form of freely available energy in the transportation [12] and industry sectors [13].

To this end, Xin Li et al. developed a vibration-powered sensor node [3]. The system used a piezoelectric transducer to harvest ambient vibrations. The fully integrated self-powered sensing and transmitting system consisted of a few units including energy generation, energy transduction, energy-boosting, energy management and circuitry, and demonstration unit (mobile interface). The system was successfully demonstrated under various vibration conditions. In another effort, two wireless sensor nodes were powered using an electromagnetic vibration energy harvesting system [1]. A custom-built power conditioning system was integrated into the energy harvesting system and then used to power a sensor node with a duty cycle of 30 s. The self-powered system was shown to produce enough power to receive and transmit information at intervals of less than 60 s. Moreover, in Ref [14] a piezoelectric energy harvester was used to power a wireless platform which consisted of a vibration sensor, a microcontroller, power management circuitry, and a custom-built low power radio transmitter. The fully integrated system was operated at acceleration level and frequency of 0.25 g [m/s2] and 100 Hz, respectively. The system was able to transmit the sensor data every 10 s for a duty cycle of 0.2%. Moreover, in their study, Lu Wang et al. [15] built a wireless temperature sensor node powered by a piezoelectric bimorph cantilever vibration energy harvesting system. A big proof mass was attached to the harvester to lower its resonant frequency to approximately 22 Hz. Power management circuitry was built for rectifying the output from the harvester, which was then used to power the temperature sensor. In Ref [16], the authors utilized a commercial piezoelectric cantilever as an energy harvester for powering a wireless temperature sensor node. The study focused on investigating the design methodology for the power management circuitry used in their work. A demonstration of the self-powered temperature sensor was performed. Additionally, Lu Wang et al. [17] built a hybrid piezoelectric-triboelectric unit and used it to construct an autonomous wireless sensor node where the piezoelectric generator served as the energy source and the triboelectric worked as an accelerometer (i.e., sensing unit). The piezoelectric harvester produced approximately 6.5 [mW] at 1 g [m/s2] and 25 [Hz] and the triboelectric accelerometer showed a sensitivity of 15 V/g for acceleration range 0–1.5 g [m/s2]. In another effort, a piezoelectric energy harvesting-sensor unit was developed and implemented in monitoring airflow from an HVAC outlet [18]. To avoid signal distortion from the sensor, the proposed system used two separate piezoelectric devices (i.e., one for energy harvesting and one for sensing purposes).

One of the major constraints in wireless sensor technologies is associated with the limitations in the supplied energy to the wireless sensor nodes [19–21]. Overcoming this challenge can only be achieved through minimizing energy consumption by means of ultralow-power techniques. Several techniques to reduce energy consumption have emerged in recent years, such as duty cycling [22]. Other techniques include topology control [23], which deals with the distribution of the wireless sensor nodes in order to reduce energy consumption while eliminating interference at the lowest cost possible. Additionally, data transmission network protocol selection is based on application, where different protocols have variable bandwidths with varying energy consumptions [24]. Cyber security of those networks is also amongst the necessary features that when added would require more processing power from the sensor node [25].

The work presented in this article is focused on developing a novel, self-powered, self-contained, environment-friendly, and wireless vibration sensor. One of the major issues that designers of vibration-powered wireless sensor nodes face is the unstable power source, which introduces considerable noise into the transmitted data [14]. This creates a need for stringent supply regulation using linear or switching regulators, which in turn causes power losses in the form of heat. In this work, power is conserved by not using any regulators, and instead, the noise-corrupted data is recovered in post-processing through a custom-developed dynamic fast Fourier transform (FFT) app. A given dynamic structure has a vibration signature in which a frequency shift can be detected by applying an FFT algorithm to the time series data [26]. Furthermore, there is an ample amount of noise introduced into the signal due to interference from the use of radiofrequency modules [14], and the use of FFT in post-processing of the data lessens the impact of noise on the signal [27]. The custom-developed app in this work can fetch the sensor bit-stream, buffer the data, and use it to plot the vibration signal amplitude and its frequency in real-time.

Moreover, unlike piezoelectric transducers, whose relatively high output voltages create a need for voltage regulation [3], the electromagnetic transduction used in this work produces output voltages within the acceptable supply voltage range of the electronics. Also, unlike the aforementioned studies and state-of-the-art developments, in this work, the self-powered sensor uses vibrations to synchronously perform two functions. First, these free vibrations are converted into useful electric power through the presented energy harvester system. Second, these vibrations are detected as electric signals (voltage) by the presented vibration sensor, and are then transmitted wirelessly to the workstation (laptop). Thus, the harvester-sensor hardware is self-contained and self-powered. That is, the mechanical power required to operate the sensor is obtained from the energy harvester. A charge pump circuit, also known as a voltage multiplier, is used to rectify the AC output of the energy harvester. The output DC is then stored in a supercapacitor that provides the energy to the microcontroller and sensor transmitter circuit. The voltage multiplier circuit allows immediate circuit startup due to the transmitter circuit having sufficient voltage and electric current to operate. The signal from the sensor is sampled using a 10-bit analog to digital converter (ADC) and is transmitted over an RF amplitude modulated (AM) carrier. When monitoring the health conditions of a structure, a shift in its vibration signature may indicate a malfunction in the structure which could lead to impending failure [2,28]. In this article, collected sensor data are analyzed through custom-developed dynamic displacement monitoring software, which helps in mitigating damage to vibrating structures.

The structure and organization of the article are outlined next. The design and structure of the self-powered self-contained wireless sensor are presented in Section 2. The manufacturing and fabrication of the system components are detailed in Section 3. Experimental methods and testing techniques are detailed in Section 4. Results and findings from this work and system operation and demonstration are discussed in Section 5. Finally, Section 6 presents the major conclusions and summarizes the results from this work.

#### **2. Design Concept and System Configuration**

The concept and overall structure of the proposed self-powered self-contained fully integrated system are shown in Figure 1. The overall structure of the self-powered sensor consists of a few main sub-systems, namely a vibration energy harvester-sensor unit, a transmitter circuit, a receiver circuit, and a custom-developed fault detection app. The main elements and components of these sub-systems are shown in Figure 2.

**Figure 1.** Design and concept of the wireless self-powered, self-contained, and eco-friendly vibration sensor system and its main components.

**Figure 2.** Block diagram showing the layout of the proposed wireless self-powered, self-contained, eco-friendly vibration sensor system.

#### *2.1. Vibration Energy Harvester-Sensor Subsystem*

The vibration harvester-sensor unit consists of two major components: a vibration energy harvester and a vibration sensor (as shown in Figure 3). The harvester consists of two (top and bottom) magnets with a third magnet that is levitated between them. The magnets are arranged in a repulsive configuration with like poles facing each other and, therefore, the levitated magnet is floating between the top and bottom magnets. The bottom magnet is fixed while the top magnet is glued to, and guided by, a mechanical FR4 sensor diaphragm. A 40 AWG stationary copper coil is wound around the levitated magnet for electric power generation. The vibration sensor consists of the mechanical diaphragm and its guided top magnet, and copper coil windings are positioned around the top magnet as shown in Figure 3. The 3D printed guiding rail of the levitated magnet is designed to provide a restricted travel pathway for the levitated magnet between the top guided magnet and the bottom fixed magnet. The energy harvester coil windings are wound around a fixed cylindrical support that is positioned in alignment with the center of the levitated magnet. The fixed magnet support is used to hold the bottom magnet to the guiding rail frame. The 3D printed holder casing is designed to hold the guiding rail of the levitated magnet to the rest of the components as shown in Figure 3.

When subject to external vibrations, first, the levitated magnet moves inside the harvester's coil windings, thus converting the kinetic energy from these oscillations into electric power that is used to operate the system shown in Figure 1. Consequently, dynamic displacement is induced in the sensor diaphragm and top magnet as a result of these excitations. In turn, induced vibrations in the sensor's diaphragm and top magnet result in induced voltage in the top coil surrounding the top guided magnet. The voltage signal from the top coil is then sampled by the microcontroller in the transmitter circuit. This is discussed in further detail next.

**Figure 3.** Design and 3D view of the structure of the vibration energy harvester-sensor unit presented in this work: (**a**) exploded view and (**b**) collapsed view.

### *2.2. Transmitter Subsystem*

The main components of the transmitter sub-system are shown in Figure 2. The circuitry includes a Microchip Technology PIC microcontroller that is enabled when the energy harvester has sufficient energy to power the circuit load. The input AC voltage from the energy harvester is rectified by a two-stage voltage multiplier, known as a 'voltage doubler', as shown in Figure 4. In the voltage doubler circuit, the diode in the first stage of the doubler is forward biased during the negative half cycle of the input sinusoidal waveform. This allows charging up of both capacitors. Meanwhile, during the positive half cycle of the input, the diode in the first stage of the multiplier is reverse biased and blocks the discharging of the capacitor in the first stage. This allows the capacitor in the second stage to charge up to approximately twice the input source voltage. The output DC voltage from the voltage doubler is stored in the supercapacitor to power the load.

**Figure 4.** Circuit diagram of the voltage doubler used to rectify the AC output voltage from the energy harvester.
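The "approximately twice" relationship can be made concrete with the standard no-load estimate for a two-diode voltage doubler, in which each diode subtracts one forward drop from the peak input. The sketch below is a textbook approximation, not a model of the circuit built in this work: the function name, the 1.8 V peak input, and the 0.3 V diode drop are hypothetical values chosen for illustration.

```python
def doubler_output_voltage(v_peak, diode_drop=0.0):
    """Steady-state, no-load output of an ideal two-diode voltage doubler.

    Textbook approximation: V_out = 2 * (V_peak - V_diode).
    Returns 0 if the input cannot overcome the diode drop.
    """
    return max(0.0, 2.0 * (v_peak - diode_drop))


# Hypothetical harvester peak voltage and Schottky-like diode drop.
V_PEAK = 1.8
DIODE_DROP = 0.3

ideal = doubler_output_voltage(V_PEAK)
with_drop = doubler_output_voltage(V_PEAK, DIODE_DROP)
print(f"ideal output: {ideal:.2f} V, with diode drops: {with_drop:.2f} V")
```

Under load, the output would sag further because the second-stage capacitor is discharged between charging half cycles, which is one reason a large storage capacitor is placed after the doubler.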

The energy buffer for the system load is chosen to be a supercapacitor, because it is maintenance-free and has a long lifetime and superior power density compared with chemical batteries [29]. The microcontroller draws the stored energy in the supercapacitor and uses it to power the system while allowing the supercapacitor to recharge in between the transmission cycles. An RF solutions radio frequency transmitter/receiver module with a high operating voltage range is used to transmit the packetized vibration sensor data ahead of post-processing by the custom-developed app. To minimize energy consumption, when vibration energy falls below a threshold value, the microcontroller is set to go into idle mode. Similar to sleep mode, in idle mode, the CPU clock is turned off. However, in idle mode, the microcontroller peripheral clock stays on.
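To give a feel for the size of such an energy buffer, the sketch below evaluates the usable energy of a capacitor discharged between two voltages, E = ½C(V₁² − V₂²). The capacitance and the cutoff voltage are hypothetical, since neither is specified in this section; only the 3 V starting point echoes the operating voltage quoted in the abstract.

```python
def usable_energy_joules(capacitance_f, v_start, v_min):
    """Energy released when a capacitor discharges from v_start to v_min."""
    if v_min > v_start:
        raise ValueError("v_min must not exceed v_start")
    return 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)


CAPACITANCE_F = 0.1  # hypothetical supercapacitor value, in farads
V_START = 3.0        # operating voltage quoted in the abstract
V_MIN = 1.8          # hypothetical cutoff below which the load stops running

energy = usable_energy_joules(CAPACITANCE_F, V_START, V_MIN)
print(f"usable energy: {energy:.3f} J")
```

Because the stored energy scales with the square of the voltage, most of it sits in the upper part of the voltage range, which is why the usable window depends strongly on the minimum operating voltage of the load.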

### *2.3. Receiver Subsystem*

As shown in Figure 2, the receiver subsystem is composed of the RF receiver, an 8-bit microcontroller, a USB to serial UART interface, and the custom-developed dynamic FFT app. The AM radio frequency receiver can receive the transmitted data at a range of 50 m. The microcontroller receives the bit-stream through a universal serial bus (USB) to transistor-transistor logic (TTL) interface at a rate of 300 baud. The received data packet is composed of 1 start bit, 8 data frame bits, and 1 stop bit. The data sampled from vibrations is then run through the in-house custom-made dynamic spectrum analysis FFT app. In the app, the user is prompted to turn data streaming on or off. Once the streaming is turned on, the app stores both voltage and time data into an equal-size window buffer. An FFT algorithm is carried out on the buffered vibration data, and the resulting frequency and amplitude of the vibration signal are plotted in real-time on the graphical user interface (GUI) axes. A frequency tracking numeric field feature is also integrated into the app to offer more distinguishable frequency monitoring.
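The frequency tracking step can be illustrated with a small, self-contained sketch. It is not the authors' MATLAB app: it uses a direct discrete Fourier transform in pure Python instead of an FFT (slower, but it yields the same spectrum), the 8 Hz test tone and noise level are synthetic, and it assumes that each 10-bit serial frame (1 start, 8 data, 1 stop bit) carries one sample, so that 300 baud corresponds to 30 samples per second.

```python
import cmath
import math
import random


def frames_per_second(baud, bits_per_frame):
    """Number of serial frames that fit into one second at the given baud rate."""
    return baud / bits_per_frame


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest non-DC DFT bin of the buffer."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Assumption: one sample per 10-bit frame at 300 baud.
sample_rate = frames_per_second(300, 10)

# Synthetic noisy vibration signal: 8 Hz tone, 2 s window (60 samples).
TRUE_HZ, N = 8.0, 60
rng = random.Random(0)  # fixed seed for a repeatable example
signal = [math.sin(2 * math.pi * TRUE_HZ * i / sample_rate) + rng.gauss(0.0, 0.3)
          for i in range(N)]

peak_hz = dominant_frequency(signal, sample_rate)
print(f"sample rate: {sample_rate:.0f} Hz, detected peak: {peak_hz:.1f} Hz")
```

The frequency resolution of such a buffer is the sample rate divided by the window length (0.5 Hz here), so a longer window gives finer frequency tracking at the cost of a slower update.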

#### **3. Manufacturing and Prototyping**

This section describes the details of manufacturing, integration, and assembly of the hardware of the self-powered sensor presented in this work. Additionally, details of signal acquisition and data post-processing via the custom-developed fault detection MATLAB app are discussed in this section.

#### *3.1. Fabrication and Integration of System Hardware*

The energy harvester-sensor unit was designed using 3D CAD design software (SolidWorks). The casing of the energy harvester was printed using a 3D printer and polylactic acid (PLA) filament. The customized patterns of the FR4 sensor diaphragm were fabricated using a Kern laser cutter (KER4824-Ti100 micro). The 130 W laser cutter was set at 80% of full power and a cutting rate of 20 mm/s. Three permanent solid magnets were used in the device assembly. The bottom magnet was fixed to the bottom support, the top magnet was fixed to the FR4 sensor diaphragm, and the levitated magnet was left to float. The levitated magnet was guided by the walls of the guiding rail tube as shown in Figure 3b. A manual winding machine (MXBAOHENG NZ-1) was used to wind the enameled copper coils around the harvester and sensor. Details and dimensions of the designed and fabricated structures are shown in Figures 5 and 6. A view of the final assembled energy harvester-sensor unit is shown in Figure 7. A list of the design specifications and materials used to fabricate the energy harvester-sensor unit is shown in Table 1.

The transmitter circuit electronic components, shown in Figure 8, were affixed on an insulating board using soldering. The PIC microcontroller that was chosen (PIC16LF15325/45) had ultra-low-power features where it typically consumed only 8 μA at 32 kHz oscillator frequency and 50 nA at 1.8 V in sleep mode. The PIC also included a windowed watchdog timer feature that was able to issue a reset to the microcontroller in the event of software failure. Furthermore, the microcontroller peripheral module disable (PMD) feature was used to disable all of the unused peripherals to minimize power consumption.

**Figure 5.** Cross-section view of the vibration energy harvester-sensor unit along with a blow-up detailed section view.

**Figure 6.** Design and fabrication of the FR4 sensor diaphragm: (**a**) fabricated sensor diaphragm next to a scale, (**b**) CAD model top view and dimensions of the sensor diaphragm, (**c**) CAD model side view of the sensor diaphragm, and (**d**) simulated CAD view of the deflected sensor diaphragm.

**Figure 7.** Fully assembled energy harvester-sensor unit next to a scale: (**a**) top view of the unit and (**b**) side view of the unit.

**Figure 8.** Circuit and components of the transmitter subsystem as part of the manufacturing process of the overall wireless sensor system presented in this work.


**Table 1.** Specification and details of the fabricated energy harvester-sensor unit.

Other components on the circuit board included a 433 MHz RF transmitter module and voltage doubler components incorporating two 16 V, 2200 μF electrolytic capacitors and two Schottky diodes. The circuit also holds a 47 mF supercapacitor energy buffer with a 5.5 V voltage rating and a low equivalent series resistance (ESR) of 25 Ω. A circuit diagram and interconnections of the transmitter circuit board are shown in Figure 9.

**Figure 9.** Circuit diagram and interconnections of the microcontroller transmitter subsystem components used in the wireless vibration sensor system presented in this work.

### *3.2. Signal Acquisition*

Both microcontrollers used in the transmitter and receiver subsystems were programmed through the PIC embedded applications development freeware (MPLAB X IDE) and a PICkit 3 in-circuit debugger. The vibration signal acquisition process described in Figure 10 starts when the microcontroller exits the idle (sleep) mode. The microcontroller is programmed to stay in sleep mode until it is prompted to 'wake up' by the voltage level held by the supercapacitor. The wake-up process is set to initiate when the sampled signal from the supercapacitor reaches 2 V. The supercapacitor voltage is sampled at one of the 10-bit low-power successive approximation ADC channels of the microcontroller (ADC-CH1). The ADC converts the analog input signal into a 10-bit binary value. The ADC channel input voltage can vary from 0 V up to a maximum voltage that must be set as the reference voltage of the channel. The ADC needs a reference voltage to define the range of voltages that are mapped onto the fixed-length binary values. The sampled supercapacitor voltage is then compared to those binary values from the reference. Typically, the reference voltage used for ADC channels is the supply voltage to the microcontroller. This poses a challenge since the supply voltage and the sampled voltage would both be the supercapacitor voltage. This was overcome by using the microcontroller internal fixed voltage reference (FVR) feature, which is independent of the microcontroller supply voltage. A programmable independent buffer gain amplifier is used at the output of the FVR and is set to amplify the voltage reference to a desired selectable voltage level. During sampling, the supercapacitor voltage is compared to the FVR, and when the value returned matches the selected voltage level, the microcontroller is woken up and power is delivered to the rest of the system. During the microcontroller system initialization process, voltage rails are stabilized, and the CPU starts fetching code instructions and data to operate the necessary control registers.
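The mapping from an input voltage to a 10-bit code against a fixed reference can be illustrated with an idealized converter model. This is only a sketch: the 2048 mV reference level below is a hypothetical value chosen for illustration (the text does not state which FVR level was selected), and the helper names are not from the authors' firmware.

```python
# Idealized 10-bit ADC model. Voltages are integers in millivolts so that
# the arithmetic is exact.
ADC_BITS = 10
ADC_LEVELS = 1 << ADC_BITS        # 1024 quantization levels
ADC_MAX_CODE = ADC_LEVELS - 1     # 1023

V_REF_MV = 2048        # HYPOTHETICAL reference level, not stated in the text
WAKE_LEVEL_MV = 2000   # 2 V wake-up level quoted in the text


def adc_code(v_in_mv: int, v_ref_mv: int = V_REF_MV) -> int:
    """Ideal conversion: floor(v_in / v_ref * 2^N), clamped to the code range."""
    if v_ref_mv <= 0:
        raise ValueError("reference voltage must be positive")
    code = (v_in_mv * ADC_LEVELS) // v_ref_mv
    return max(0, min(ADC_MAX_CODE, code))


def should_wake(v_in_mv: int, wake_mv: int = WAKE_LEVEL_MV) -> bool:
    """Compare the sampled code against the code of the wake-up level."""
    return adc_code(v_in_mv) >= adc_code(wake_mv)


if __name__ == "__main__":
    for mv in (1500, 1999, 2000, 2040):
        print(mv, "mV -> code", adc_code(mv), "wake:", should_wake(mv))
```

Because the reference is independent of the supply, the code for a given supercapacitor voltage stays the same as the supply sags, which is the property the FVR is used for in the text.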

**Figure 10.** A signal acquisition process flowchart of the wireless vibration sensor presented in this work showing detailed description of the program starting at the initialization process of the microcontroller and ending at the transmission of data.

Embedded systems interrupts are hardware features that preempt normal program code operation in order to execute a command that requires CPU attention. As shown in Figure 10, when an interrupt is set, the interrupt service routine (ISR) firmware determines the source of the interrupt by polling. The ISR polling protocol is an active process of monitoring interrupt flag bits from the interrupt flag register. The peripheral interrupt raised when the supercapacitor voltage level reaches the set threshold allows the CPU to service that interrupt and wake the microcontroller from sleep. As indicated in Figure 10, unless the supercapacitor voltage level is above the preset threshold, the ISR continues polling and the microcontroller remains in sleep mode. After an interrupt is serviced, a command must be executed to clear its interrupt flag. If the interrupt flag is not cleared, and if the supercapacitor voltage level is above the preset threshold, interrupts will occur repeatedly, overriding necessary CPU functions.

In idle mode, the microcontroller CPU core and memory operations are halted while the internal peripheral clocks, such as the ADC channel clock, continue to operate. Once the microcontroller is woken up, the sensor data are sampled through a second ADC channel (ADC-CH2). In Figure 10, the ADC logs the sensor data and writes it to the enhanced universal synchronous and asynchronous receiver transmitter (EUSART) register upon waking up from idle mode. The data are then wirelessly transmitted via the 433 MHz AM RF transmitter module. The signal is then received by a compatible receiver module (QAM-RX10) that is connected to a second microcontroller in order to receive the data through the serial port of a PC by utilizing a USB to serial converter. The antenna-equipped receiver provides a two-way communication that transforms the electromagnetic waves into electrical signals. Modulation of the baseband data onto the carrier is accomplished by amplitude shift keying (ASK) of the signal.
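Amplitude shift keying can be illustrated with a small on-off keying simulation. This is a sketch of the general technique under stated assumptions, not the behavior of the actual RF modules: the carrier frequency, sample rate, and power threshold are scaled-down, hypothetical values so that the example runs quickly.

```python
import math


def ask_modulate(bits, carrier_hz, bit_rate, sample_rate):
    """On-off keying: send the carrier for a 1 bit and silence for a 0 bit."""
    samples_per_bit = int(sample_rate / bit_rate)
    signal = []
    for i, bit in enumerate(bits):
        amplitude = 1.0 if bit else 0.0
        for k in range(samples_per_bit):
            t = (i * samples_per_bit + k) / sample_rate
            signal.append(amplitude * math.sin(2.0 * math.pi * carrier_hz * t))
    return signal


def ask_demodulate(signal, bit_rate, sample_rate, power_threshold=0.25):
    """Recover bits by comparing the mean power of each bit slot to a threshold."""
    samples_per_bit = int(sample_rate / bit_rate)
    bits = []
    for start in range(0, len(signal) - samples_per_bit + 1, samples_per_bit):
        slot = signal[start:start + samples_per_bit]
        mean_power = sum(s * s for s in slot) / len(slot)
        bits.append(1 if mean_power > power_threshold else 0)
    return bits


if __name__ == "__main__":
    # Hypothetical, scaled-down parameters (the real carrier is 433 MHz).
    tx_bits = [1, 0, 1, 1, 0, 0, 1, 0]
    rf = ask_modulate(tx_bits, carrier_hz=3000.0, bit_rate=300.0, sample_rate=48000.0)
    rx_bits = ask_demodulate(rf, bit_rate=300.0, sample_rate=48000.0)
    print("sent    :", tx_bits)
    print("received:", rx_bits)
```

A unit-amplitude sine has a mean power of 0.5, so a threshold of 0.25 separates the "carrier on" and "carrier off" slots in this noise-free sketch. A real link would also have to cope with noise and timing recovery.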

#### *3.3. Custom Developed Fault Detection MATLAB App*

In this work, the monitoring app is developed using MATLAB-GUI to extract useful information from the collected vibration signal including maximum displacement amplitude and frequency. The extracted information is then compared against preset threshold values to assess the risk level associated with the operation performed.

Inside the MATLAB app development environment, the GUI components including the data streaming switch, frequency tracking numeric field, status indicator, and real time FFT plot are identified as dynamic objects. The objects are chosen from a MATLAB supported components library as seen in Figure 11. The dynamic objects are configured as public access properties that allow data exposure to the user through the GUI. The corresponding values to the communication port and baud rate of the receiver board are then specified. This allows for initiation of data streaming through the serial port when the 'on' dynamic object under the data streaming label is selected by the user. Evenly sized sectioned data buffers are set up for both voltage and time elements to allocate for real time data plotting. Parameters of the FFT measurements including the length of the signal, sampling frequency, and Nyquist frequency are identified to convert the windowed time domain data into frequency domain. A peak finder function is then used to detect the dominant frequency from the converted data for display in the GUI numeric field region. The amplitude of the resultant peak is then compared against fixed threshold values to vary the color of the status lamp indicator on the GUI. The variable colors of the lamp give the user a risk severity measure of the performed operation. Therefore, the indicator lamp switches color from green when the energy harvester sensor unit is subject to low amplitude vibrations to red once it experiences higher vibration amplitudes. Further details about the custom developed fault detection MATLAB app are discussed later in this article (i.e., Section 5).
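The processing chain described above (windowed buffer, transform to the frequency domain, peak detection, threshold comparison) can be sketched in a few lines. The authors' app is written in MATLAB; the Python below is only an illustration of the same steps, using a plain DFT instead of an optimized FFT, a single hypothetical threshold, and a synthetic test signal in place of received sensor data.

```python
import cmath
import math


def single_sided_spectrum(samples):
    """Single-sided amplitude spectrum of a real-valued window (plain DFT)."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        # DC (and the Nyquist bin when n is even) occur once; other bins twice.
        once = k == 0 or (n % 2 == 0 and k == n // 2)
        amplitudes.append(abs(acc) / n * (1.0 if once else 2.0))
    return amplitudes


def dominant_peak(samples, sample_rate_hz):
    """Return (frequency in Hz, amplitude) of the largest non-DC spectral bin."""
    amplitudes = single_sided_spectrum(samples)
    k_peak = max(range(1, len(amplitudes)), key=lambda k: amplitudes[k])
    return k_peak * sample_rate_hz / len(samples), amplitudes[k_peak]


def status_lamp(amplitude, red_threshold):
    """Map the peak amplitude to a lamp color using one fixed threshold."""
    return "red" if amplitude >= red_threshold else "green"


if __name__ == "__main__":
    # Synthetic window: 10 Hz tone, amplitude 0.5, sampled at 64 Hz (hypothetical).
    fs = 64.0
    window = [0.5 * math.sin(2.0 * math.pi * 10.0 * i / fs) for i in range(128)]
    freq_hz, amp = dominant_peak(window, fs)
    print(f"dominant frequency: {freq_hz:.2f} Hz, amplitude: {amp:.3f}")
    print("lamp:", status_lamp(amp, red_threshold=1.0))
```

The frequency resolution of such a tracker is the sampling frequency divided by the window length, so longer windows give finer frequency readings at the cost of slower updates.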

**Figure 11.** The design view browser inside MATLAB app development environment. The dynamic objects are chosen from a component library and are populated into the GUI canvas.

#### **4. Experimental Methods and Characterization Techniques**

The experimental testing setup shown in Figure 12a was used to measure the dynamic frequency response of the wireless sensor system presented in this work. An illustration of the signal and power flow of the experimental testing equipment setup is shown in Figure 12b. In the closed loop vibration testing system, as shown in Figure 12a, a vibration controller (S81B-P02, SENTEK DYNAMICS) is directed by PC software. Initially, the test is configured by setting the frequency, acceleration, and time elements through the software's control settings. The settings are managed through Crystal Instruments' engineering and data management (EDM) vibration control system (VCS) software. The preset commands of the test are then transferred to the vibration controller. The controller sends a drive signal to a power amplifier in order to drive a rectilinear shaker table. The shaker table (VT-500 by SENTEK DYNAMICS) transforms the drive signal into mechanical vibrations that are transferred to the energy harvester-sensor unit attached to the shaker armature.

**Figure 12.** Experimental methods used in this work: (**a**) experimental apparatus used for characterization of the wireless self-powered vibration sensor system presented in this work; (**b**) cartoon schematic of the characterization setup showing signal and power flow in the equipment.

When performing the experiments, the lower end of the energy harvester-sensor unit is secured on top of the shaker table as shown in Figure 12. The vibration response from the energy harvester is measured by an accelerometer (PCB333B30 by PCB Piezotronics). The RF transmitter board situated on the static outer rim of the shaker table is connected to the energy harvester-sensor unit as shown in Figure 12a. The RF receiver circuit board is connected to the laptop via the USB to serial TTL level FTDI cable for live frequency response analysis as shown in Figure 12a,b.

#### **5. Results and Discussion**

Using the experimental apparatus shown in Figure 12, dynamic characterization of the fabricated energy harvester-sensor unit was performed. The resulting voltage frequency responses of both the energy harvester and sensor are shown in Figures 13 and 14. The energy harvester-sensor unit was subject to fixed input acceleration values ranging from 0.1 g up to 0.7 g [m/s²] while the frequency was swept at a rate of 0.0833 Hz/s.

**Figure 13.** Energy harvester-sensor unit open circuit frequency response at a range of input accelerations. (**a**) Open circuit voltage of the energy harvester. (**b**) Open circuit voltage from the FR4 sensor diaphragm.

**Figure 14.** Energy harvester-sensor unit closed circuit frequency response at a range of input accelerations. (**a**) Energy harvester closed circuit voltage values measured across the microcontroller subsystem. (**b**) FR4 sensor diaphragm closed circuit voltage values measured across the input to the microcontroller.

The nonlinear magnetic spring stiffness nature of the energy harvester is evident in the voltage frequency response as shown in both Figures 13 and 14. That is, the repulsive magnetic forces experienced by the levitated magnet can be described as a nonlinear spring force [7,8]. This results in a hardening effect that is evident when comparing the trend in output voltage peaks shown in Figures 13 and 14. One can notice that these peaks are shifting to higher frequencies as the input acceleration is increased. The nonlinear behavior of magnetic levitation-based energy harvesting systems was studied extensively in our prior work [5,7,8,29]. Furthermore, during the frequency sweep, an abrupt sharp decline of the energy harvester's frequency response is evident. This drop, known as the frequency jump phenomenon [7,8], is attributed to the coexistence of multiple energy states at the frequency branch. The discontinuity in the response is a characteristic of magnetic levitation-based energy harvesting systems [5,29].

The supercapacitor charging and discharging cycles were measured using a data acquisition device (NI myDAQ) and the LabVIEW graphical programming environment. The supercapacitor charging history is shown in Figure 15, where the maximum charge is held at 3.7 V after 122 s. During the charging cycle and after approximately 39 s, a slight shift in voltage level that lasts for 4 s occurs during the microcontroller wake-up and voltage rail stabilization stage. The rate of change of the voltage held by the supercapacitor is seen to decrease from 44 mV/s to 22 mV/s after the microcontroller startup period. This is likely because the system is draining the supercapacitor in order to operate during this stage. After approximately 122 s of charging, as the energy harvester-sensor unit crosses the resonant frequency point during the frequency sweep, the supercapacitor discharging cycle starts to take effect at a voltage decline rate of 15 mV/s. The supercapacitor charging period after the microcontroller startup takes 79 s, while it takes 108 s to discharge before the microcontroller enters the sleep mode, as shown in Figure 15. At 230 s and 2 V, the microcontroller enters the sleep mode for a period that lasts approximately 50 s before the supercapacitor voltage level drops below 1.8 V, which no longer provides sufficient power for the microcontroller to operate. Consequently, the supercapacitor starts a self-discharge process as shown in Figure 15.
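The voltage slopes reported above can be related to stored energy and net current through the ideal capacitor relations E = ½CV² and I = C·dV/dt. The sketch below applies them to the 47 mF supercapacitor and the voltages and slopes quoted in the text; it assumes an ideal capacitor and ignores ESR and leakage, so the results are rough estimates rather than measured values.

```python
# Ideal-capacitor estimates for the supercapacitor energy buffer.
CAPACITANCE_F = 47e-3   # 47 mF (from the text)


def stored_energy_j(voltage_v: float, capacitance_f: float = CAPACITANCE_F) -> float:
    """Energy stored in an ideal capacitor: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def usable_energy_j(v_high: float, v_low: float,
                    capacitance_f: float = CAPACITANCE_F) -> float:
    """Energy released when the voltage falls from v_high to v_low."""
    return stored_energy_j(v_high, capacitance_f) - stored_energy_j(v_low, capacitance_f)


def net_current_a(dv_dt_v_per_s: float, capacitance_f: float = CAPACITANCE_F) -> float:
    """Net current into the capacitor implied by a voltage slope: I = C * dV/dt."""
    return capacitance_f * dv_dt_v_per_s


if __name__ == "__main__":
    print(f"energy at 3.7 V          : {stored_energy_j(3.7) * 1e3:.1f} mJ")
    print(f"usable from 3.7 to 1.8 V : {usable_energy_j(3.7, 1.8) * 1e3:.1f} mJ")
    for label, slope in (("charging before startup", 44e-3),
                         ("charging after startup", 22e-3),
                         ("discharging", -15e-3)):
        print(f"{label:24s}: {net_current_a(slope) * 1e3:+.3f} mA")
```

Under these assumptions, the 44 mV/s slope corresponds to a net charging current of roughly 2.1 mA, and the halved slope after startup corresponds to roughly 1.0 mA.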

**Figure 15.** The supercapacitor voltage charge and discharge cycle during frequency sweep 6.5–15 Hz at 1 g [m/s²] acceleration level.

Next, the fully integrated, self-powered, self-contained sensor system was subject to harmonic oscillations and was tested under lab-controlled settings using the experimental apparatus shown in Figure 12. The resulting energy consumptions at different system states are shown in Table 2. The microcontroller internal oscillator was set to the lowest clock frequency of 32 kHz to minimize energy consumption. The ADC peripheral was configured to use the internal system clock oscillator. The microcontroller takes 10 measurements from the sensor signal during analog to digital bit conversion at a sampling rate of 51 kHz. After the measurements conversion is completed in 19.55 μs, the data are stored in the 16-bit ADC results register. The data are then retrieved by the EUSART serial communication peripheral, and the 10-bit data frame is transmitted asynchronously with 1 start bit and 1 stop bit added to each 8-bit sensor data packet and no parity bit.
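The 10-bit asynchronous frame described above (1 start bit, 8 data bits, 1 stop bit, no parity) can be sketched as follows. Data bits are sent least-significant bit first, which is the usual UART convention. How the 10-bit ADC result is reduced to an 8-bit packet is not stated in the text; dropping the two least-significant bits, as in `adc10_to_byte`, is purely an assumption for illustration.

```python
START_BIT = 0   # line is pulled low to mark the start of a frame
STOP_BIT = 1    # line returns high (idle level) at the end of a frame


def encode_frame(byte: int) -> list:
    """Build a 10-bit frame: start bit, 8 data bits (LSB first), stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("payload must fit in 8 bits")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [START_BIT] + data_bits + [STOP_BIT]


def decode_frame(bits: list) -> int:
    """Check the framing bits and rebuild the payload byte."""
    if len(bits) != 10 or bits[0] != START_BIT or bits[9] != STOP_BIT:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


def adc10_to_byte(code: int) -> int:
    """ASSUMPTION: keep the 8 most-significant bits of a 10-bit ADC code."""
    if not 0 <= code <= 0x3FF:
        raise ValueError("expected a 10-bit ADC code")
    return code >> 2


if __name__ == "__main__":
    sample = adc10_to_byte(0x2A7)     # hypothetical ADC reading
    frame = encode_frame(sample)
    print("payload:", sample, "frame:", frame, "decoded:", decode_frame(frame))
```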


**Table 2.** Transmitter circuit subsystem energy consumption at different program states.

Figure 16 shows the power consumed by the transmitter subsystem as well as the power generated at an unloaded voltage doubler circuit. The power generated from the unloaded voltage doubler at resonant frequency is 12.6 mW and it exceeds the power consumed by the transmitter subsystem load of 5.1 mW. In the power consumed curve, the decline in power is more gradual compared to the abrupt decline seen in the power generated curve due to having the supercapacitor in the transmitter subsystem circuitry. In both power curves, the profiles show maximum power generation and consumption at resonant frequency of 9.9 Hz.

**Figure 16.** Power generated from the energy harvester at an unloaded voltage doubler output compared to power consumed by the microcontroller-transmitter subsystem load.

The operation and realization of the fully integrated self-powered fault detection system are examined next. A video recording of the self-powered, self-contained, environment-friendly, and wireless vibration sensor during the data acquisition and transmission process is included in the Supplementary Materials. The video recording includes segments from the custom developed fault detection app along with the EDM shaker table vibration control software and the energy harvester and sensor voltage waveforms as well as the experimental setup showing the fabricated energy harvester-sensor unit mounted on the shaker table.

Figures 17–20 show selected timeframes from the operation of the fully integrated self-powered wireless fault detection system at different stages. Figures 17–20 demonstrate the ability to self-power the sensor and the ability of the frequency tracker in the custom developed app to detect the vibration frequency from the self-powered sensor. For example, Figure 17a shows the vibration frequency from the sensor, detected by the frequency tracker in the custom developed app, at 9.479 Hz. The sensor vibration frequency value can be confirmed by the EDM software shaker table preset frequency, as shown in Figure 17b. Here, the preset frequency was approximately 9.007 Hz. Thus, there is a slight discrepancy between the frequency detected by the frequency tracker in the custom developed app and the preset frequency from the shaker table. The error is estimated at approximately 5%. This discrepancy may be attributed to a few factors, including the high noise in the AM transmitter/receiver module that is not filtered out during the FFT filtering process. Other factors include the use of the internal microcontroller oscillator in this work as opposed to an external oscillator, which would have provided better frequency stability. An estimated bit-rate error of approximately 1.24% in the transmitted data is also a contributing factor to the overall error in the received and filtered data. Moreover, Figure 17c demonstrates the output signal from the fabricated self-powered, self-contained sensor as the top magnet moves inside the top coil in response to the detected vibrations. That is, the dynamic displacement induced in the sensor induces a voltage in the top coil surrounding the top guided magnet as shown in Figure 17c. Here, the power generated by the energy harvester is used to operate the RF transmitter circuit.
The output voltage from the sensor is then sampled by the microcontroller on the transmitter circuit. The app, shown in Figure 17a, also demonstrates successful status indicator monitoring capabilities where it shows a green light, indicating a low level of acceleration experienced by the sensor (i.e., low risk operation). Similarly, Figures 18–20 show subsequent timeframes from the demonstration experiment. In those timeframes, the sensor is self-powered, and the app detects a transition from a lower risk operation to a higher risk operation, as signaled by the status indicator turning red in Figure 18. In Figure 18, the preset frequency from the vibration source (i.e., the shaker table) was approximately 9.652 Hz whereas the frequency detected by the app tracker was 9.748 Hz. This results in a small error of approximately 1%. Likewise, the next timeframe in Figure 19 shows a preset frequency of 10.98 Hz with the app detecting 10.86 Hz, also corresponding to a 1% error in the detected frequency value. The last presented timeframe from the demonstration experiment is shown in Figure 20, with the app detecting a frequency of 11.05 Hz compared to the preset frequency of 11.15 Hz, resulting in only a 0.9% error. The mean absolute error value for all frequencies under investigation is found to be 0.8 while the mean percent error value is 2.6%.
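The per-figure errors quoted above follow from the relative difference between the preset and detected frequencies. The short sketch below reproduces that arithmetic for the four timeframes reported in the text; the variable and function names are illustrative only.

```python
# (figure label, preset frequency in Hz, frequency detected by the app in Hz),
# all taken from the text.
READINGS = [
    ("Figure 17", 9.007, 9.479),
    ("Figure 18", 9.652, 9.748),
    ("Figure 19", 10.98, 10.86),
    ("Figure 20", 11.15, 11.05),
]


def absolute_error_hz(preset_hz: float, detected_hz: float) -> float:
    """Magnitude of the difference between detected and preset frequency."""
    return abs(detected_hz - preset_hz)


def percent_error(preset_hz: float, detected_hz: float) -> float:
    """Absolute error expressed as a percentage of the preset frequency."""
    return absolute_error_hz(preset_hz, detected_hz) / preset_hz * 100.0


if __name__ == "__main__":
    for label, preset, detected in READINGS:
        print(f"{label}: |error| = {absolute_error_hz(preset, detected):.3f} Hz, "
              f"{percent_error(preset, detected):.1f}%")
```

The reported mean values cover all frequencies under investigation, which may include more operating points than the four timeframes shown here, so averages over this subset need not match them.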

**Figure 17.** Screen captures of the self-powered, self-contained vibration sensor system taken from the demonstration experiment: (**a**) custom developed fault detection app with a frequency tracker feature and status indicator for risk monitoring, (**b**) EDM shaker table vibration control software with preset frequency monitor, and (**c**) sensor voltage waveform monitor at preset frequency 9.007 Hz.

**Figure 18.** Screen captures of the self-powered, self-contained vibration sensor system taken from the demonstration experiment: (**a**) custom developed fault detection app with a frequency tracker feature and status indicator for risk monitoring, (**b**) EDM shaker table vibration control software with preset frequency monitor, and (**c**) sensor voltage waveform monitor at preset frequency 9.652 Hz.

**Figure 19.** Screen captures of the self-powered, self-contained vibration sensor system taken from the demonstration experiment: (**a**) custom developed fault detection app with a frequency tracker feature and status indicator for risk monitoring, (**b**) EDM shaker table vibration control software with preset frequency monitor, and (**c**) sensor voltage waveform monitor at preset frequency 10.98 Hz.

**Figure 20.** Screen captures of the self-powered, self-contained vibration sensor system taken from the demonstration experiment: (**a**) custom developed fault detection app with a frequency tracker feature and status indicator for risk monitoring, (**b**) EDM shaker table vibration control software with preset frequency monitor, and (**c**) sensor voltage waveform monitor at preset frequency 11.15 Hz.

### **6. Conclusions**

In this work, we have introduced a novel self-powered, self-contained wireless vibration sensor for fault detection in dynamic structures. The energy harvester-sensor unit is based on dual mass moving magnets. The voltages are extracted from the moving magnets by the coil surrounding the casing around the magnets. The power produced by the energy harvester subunit is used to operate an RF-based transmitter subsystem. The transmitter subsystem wirelessly sends the mechanical vibration levels measured by the sensor subunit to a custom developed fault detection app. The app notifies the user of the degree of risk associated with the operation by applying an FFT algorithm to the transmitted vibration data. The app can identify the frequency of the vibration with a low error of approximately 1% in most of the transmitted values. Unlike commonly studied self-powered vibration-based WSN transmitter subsystems, this work utilizes the active power of the sensor subunit during the transmission process rather than having to power a passive vibration sensor. The transmitter subsystem operates at ultra-low power, where the total energy consumed to transmit a sensor value is approximately 0.894 μJ at 3 V. The transmitter subsystem can transmit the data from the sensor at a minimum operable acceleration of 0.7 g [m/s²] and an excitation range of 5–10.6 Hz. The significance of this work also lies in the low frequency required to operate the energy harvester-sensor unit, which is widely available in many surrounding environments. Future work will consider employing a more energy-aware circuit with maximum power point tracking (MPPT) capability. This enhancement will allow for self-sufficient stand-alone field operation of the energy harvester-sensor unit.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s22062352/s1. Video S1: Demonstration experiment.

**Author Contributions:** Conceptualization, G.A. and H.B.; methodology, G.A. and H.B.; software, G.A.; validation, G.A. and H.B.; formal analysis, G.A. and H.B.; investigation, G.A. and H.B.; resources, H.B.; data curation, G.A.; writing—original draft preparation, G.A. and H.B.; writing—review and editing, H.B; visualization, G.A.; supervision, H.B.; project administration, H.B.; funding acquisition, H.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by the Louisiana Department of Transportation-Transportation Research Center [Contract DOTLT1000369], Louisiana Board of Regents-National Science Foundation (BoR/NSF) Grant and Cooperative Agreement Number [80NSSC20M0110], and Louisiana Board of Regents Support Fund contract number [LEQSF(2020-24)-LaSPACE]. Dr. Bardaweel is W.W. Chew II Endowed Professor. W.W. Chew II Professorship is made available through the State of Louisiana Board of Regents Support Fund. The APC of this article was funded by W.W. Chew II Professorship. The views expressed in this article are those of the authors and do not reflect the official policy or position of the funding agencies.

**Data Availability Statement:** Data are contained within the article or Supplementary Material. The data presented in this study are available in [this manuscript].

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


### *Article* **Partial Transfer Ensemble Learning Framework: A Method for Intelligent Diagnosis of Rotating Machinery Based on an Incomplete Source Domain**

**Gang Mao, Zhongzheng Zhang, Sixiang Jia, Khandaker Noman and Yongbo Li \***

MIIT Key Laboratory of Dynamics and Control of Complex System, School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, China; mg0207@yeah.net (G.M.); zhangzhongzheng@mail.nwpu.edu.cn (Z.Z.); sixiang\_j@163.com (S.J.); khandakernoman93@nwpu.edu.cn (K.N.)

**\*** Correspondence: yongbo@nwpu.edu.cn

**Abstract:** Most cross-domain intelligent diagnosis approaches presume that the health states in training datasets are consistent with those in testing. However, it is usually difficult and expensive to collect samples under all failure states during the training stage in actual engineering; this causes the training dataset to be incomplete. These existing methods may not be favorably implemented with an incomplete training dataset. To address this problem, a novel deep-learning-based model called partial transfer ensemble learning framework (PT-ELF) is proposed in this paper. The major procedures of this study consist of three steps. First, the missing health states in the training dataset are supplemented by another dataset. Second, since the training dataset is drawn from two different distributions, a partial transfer mechanism is explored to train a weak global classifier and two partial domain adaptation classifiers. Third, a particular ensemble strategy combines these classifiers with different classification ranges and capabilities to obtain the final diagnosis result. Two case studies are used to validate our method. Results indicate that our method can provide robust diagnosis results based on an incomplete source domain under variable working conditions.

**Keywords:** partial transfer learning; ensemble strategy; fault diagnosis; deep adversarial convolutional neural network

### **1. Introduction**

Rotating components play a significant role in system performance and are widely applied in engineering machinery such as aircraft, engine, and gearbox systems [1,2]. The failure of rotating components may cause unexpected downtime and economic losses. Therefore, it is crucial to precisely identify and detect the fault states of rotating machinery [3]. Recently, intelligent fault diagnosis has become a hotspot because it can analyze vast amounts of measured data and provide intuitive diagnosis results [4].

Intelligent fault diagnosis has received a lot of attention in recent years from both industrial engineers and academic researchers and has accomplished remarkable achievements [5]. For example, shallow machine learning techniques such as support vector machine (SVM) [6] and random forest (RF) [7] have been studied. Deep learning methods that can adaptively extract the fault features hidden in a collected signal have also been researched, such as recurrent neural network (RNN) [8], convolutional neural network (CNN) [9], and stacked autoencoder (SAE) [10]. In addition, some variant models are being studied, such as dilated CNN [11], CNN with capsule network [12], and multiscale CNN [13]. However, the existing methods are developed based on statistics, which assume that adequate labeled samples are obtainable to train the models. In addition, these methods require the data distribution of training and testing to be identical [14]. In actual industry settings, obtaining a large amount of labeled data is unrealistic. Even if the labeled data can be acquired, the aforementioned methods may fail to recognize the unlabeled data collected from another machine or under different working conditions due to the inconsistent data distribution [15].

**Citation:** Mao, G.; Zhang, Z.; Jia, S.; Noman, K.; Li, Y. Partial Transfer Ensemble Learning Framework: A Method for Intelligent Diagnosis of Rotating Machinery Based on an Incomplete Source Domain. *Sensors* **2022**, *22*, 2579. https://doi.org/10.3390/s22072579

Academic Editor: Mohammad Noori

Received: 8 March 2022 Accepted: 25 March 2022 Published: 28 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Transfer learning aims to solve this problem by adapting models trained on labeled data from a relevant domain to the target domain [16]. The implementation of transfer learning for machine fault diagnosis mainly includes two scenarios: (1) A few target-domain-labeled data are available but are insufficient to support the model training. Qian et al. [17] implemented bearing fault diagnosis under diverse working conditions by transferring the parameters of SAE. Chen et al. [18] studied the use of transferable CNN to recognize the fault states of rotary machinery by pre-training a 1D-CNN using the source data and fine-tuning it with the limited labeled samples in the target domain. (2) There are no available labeled target data to participate in the model training process. One solution is to add a domain adaptation term to the loss function, such as the Maximum Mean Discrepancy (MMD) [4,19,20] or the Wasserstein distance [21]. Another solution is to implement the transfer learning by use of an adversarial network, in which case a feature extractor aims to extract domain-insensitive features from the target and source domains by adversarial training [22–24].
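As an illustration of a domain adaptation term, the squared Maximum Mean Discrepancy between two sets of feature vectors can be estimated with a kernel. The sketch below uses the biased estimator with a Gaussian kernel on small hand-made feature lists; the kernel choice, bandwidth, and data are assumptions for illustration and do not reproduce the specific formulations used in the cited works.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of MMD^2: E[k(s,s')] + E[k(t,t')] - 2 E[k(s,t)]."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


if __name__ == "__main__":
    # Hypothetical one-dimensional features from a source and a target domain.
    source_features = [[0.0], [1.0], [2.0]]
    near_target = [[0.1], [1.1], [2.1]]
    far_target = [[5.0], [6.0], [7.0]]
    print("MMD^2 (near):", mmd_squared(source_features, near_target))
    print("MMD^2 (far) :", mmd_squared(source_features, far_target))
```

The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is why adding it to a training loss encourages the feature extractor to produce domain-insensitive features.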

The existing cross-domain fault diagnosis methods can obtain superior results in the target domain, but they presuppose that the health states in the target domain are identical to those in the source domain. However, given the variation of operations and the unpredictability of fault states, it is difficult to guarantee that all current or future fault states have been learned in the training phase. Therefore, the source training dataset is usually incomplete, and there are some additional failure states in the target domain. This causes negative transfer and misclassification in the testing stage. Data for these private failure states can be collected from another component, but the working conditions, such as speed, load, and frequency, are completely different from those of the source domain and the target test data. Figure 1 shows an example of such a situation. Dataset *A* is collected from bearing 1 and contains five health states. However, during the test, more fault states appeared due to the change in working conditions, resulting in seven health states. The data for the two missing health states can be supplemented from dataset *B*, which is collected from bearing 2 and includes four health states in total. The domain discrepancy between *A* and *B* therefore also needs to be taken into consideration, which creates some difficulties for the implementation of transfer learning diagnostic methods.

**Figure 1.** Example of the situation of fault diagnosis with new health states.

This research studies a partial transfer ensemble learning framework (PT-ELF) to solve the above problem. First, two incomplete source domain datasets collected from different components or under different working conditions are defined. Note that neither of them contains all the health states present in the target domain data. They are used to form a complete dataset in which all the health states are included. Then, a weak global classifier based on the complete dataset and two strong partial classifiers based on the deep adversarial network are established. Finally, since the classification ability and classification range of the classifiers differ, a particular ensemble strategy is designed to combine the two strong partial classifiers and the weak global classifier, yielding the final diagnostic results. The main contributions of this research are summarized as follows:


The rest of this article is arranged as follows: Section 2 presents the basic theories. The details of the proposed PT-ELF are given in Section 3. Section 4 validates the proposed method and analyzes the results. Finally, the conclusion in Section 5 brings the study to a close.

### **2. Basic Theory**

#### *2.1. Convolutional Neural Network*

A standard CNN usually includes convolution, pooling, fully connected, and output layers. In addition, batch normalization is usually used in a CNN [25]. A convolution layer is combined with a pooling layer to form a convolution block, and a deep architecture is built from several such blocks. A Softmax Regression layer usually serves as the last layer and performs regression or classification [26]. A convolutional layer adopts a local receptive field, in which only part of the input sample points connect to each node. This sharply decreases the number of parameters and the model complexity. To identify local features throughout the input sample, weights and biases are shared between the hidden neurons in one convolutional layer [27]. The process in the convolutional layer can be expressed as:

$$\mathbf{z}\_n^l = \sum\_k \mathbf{x}\_k^{l-1} \ast \mathbf{w}\_n^l + \mathbf{b}\_n^l \tag{1}$$

where **x**<sub>*k*</sub><sup>*l*−1</sup> is the *k*-th node in layer *l* − 1, ∗ represents the convolution operation, and **w**<sub>*n*</sub><sup>*l*</sup> and **b**<sub>*n*</sub><sup>*l*</sup> represent the weight and the corresponding bias. Additionally, an activation function *φ*(·) transforms the output of the convolution layer nonlinearly, which can be denoted as:

$$\mathbf{c}\_{n}^{l} = \boldsymbol{\varphi}\left(\mathbf{z}\_{n}^{l}\right) \tag{2}$$

where **c**<sub>*n*</sub><sup>*l*</sup> represents the *n*-th nonlinear feature value in layer *l*. Sigmoid and ReLU activation functions are commonly used in CNNs. Sigmoid maps its input to values between 0 and 1. ReLU can enhance the efficiency of model training and decrease the risk of vanishing gradients [28].

In a pooling layer, the down-sampling operation can decrease the dimension of the features and enhance their robustness. Mathematically, a maximum pooling operation is defined as:

$$po\_j = \max\_{i \in m\_j} \{ \mathbf{c}\_j(i) \}\tag{3}$$

where **c**<sub>*j*</sub> holds the feature values in the *j*-th pooling region *m<sub>j</sub>*, and *po<sub>j</sub>* is the corresponding pooling output. For classification tasks, after several convolution blocks and fully connected layers, the Softmax function is usually utilized to predict categories. The cross-entropy loss function can be expressed as:

$$H(r, p) = -\sum\_{i} r\_i \log(p\_i) \tag{4}$$

where *p* represents the output probability, and *r* corresponds to the actual labels.
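The chain of Equations (1)–(4) can be traced on a toy input. The following is a minimal sketch in plain Python, assuming a single input channel, stride 1, and non-overlapping pooling windows; the input values, kernel, and the reuse of pooled features as logits are hypothetical choices made only to keep the example short.

```python
import math

def conv1d(x, w, b):
    """Eq. (1) for one channel: slide kernel w over x (no padding, stride 1).
    Like most CNN libraries, this is the cross-correlation form."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]

def relu(z):
    """Eq. (2) with ReLU chosen as the activation function."""
    return [max(0.0, v) for v in z]

def max_pool(c, size):
    """Eq. (3): maximum over each non-overlapping window of `size` values."""
    return [max(c[i:i + size]) for i in range(0, len(c) - size + 1, size)]

def softmax(logits):
    """Numerically stable Softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(r, p):
    """Eq. (4): r is the one-hot label vector, p the predicted probabilities."""
    return -sum(ri * math.log(pi) for ri, pi in zip(r, p) if ri > 0)

# Hypothetical toy signal and kernel.
x = [1.0, 2.0, -1.0, 0.0, 3.0, 1.0]
z = conv1d(x, w=[1.0, -1.0], b=0.0)
c = relu(z)
po = max_pool(c, size=2)
p = softmax(po)  # pooled features reused as logits, for illustration only
loss = cross_entropy([1.0, 0.0], p)
```

In a real network, fully connected layers would sit between the pooled features and the Softmax, and the kernel weights would be learned rather than fixed.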

#### *2.2. Deep Adversarial Convolutional Neural Network*

Generally, a deep adversarial convolutional neural network (DACNN) consists of a feature extractor *G<sub>f</sub>*, a domain discriminator *G<sub>d</sub>*, and a classifier *G<sub>y</sub>* [29–31]. The feature extractor, namely several convolution blocks, serves as a contestant in the DACNN. It can be expressed as *G<sub>f</sub>* = *G<sub>f</sub>*(*x*, *θ<sub>f</sub>*), which indicates that the features are extracted from the input sample *x* with parameters *θ<sub>f</sub>*. In addition, a discriminator (binary classifier) is treated as the opponent, which is expressed as *G<sub>d</sub>* = *G<sub>d</sub>*(*G<sub>f</sub>*(*x*), *θ<sub>d</sub>*). The source and target samples are fed into the feature extractor, and the output features are then distinguished by the discriminator *G<sub>d</sub>*. The binary cross-entropy loss is taken as the objective function, which is described as:

$$L(\mathcal{G}\_d(\mathcal{G}\_f(\mathbf{x}\_i)), d\_i) = d\_i \log \frac{1}{\mathcal{G}\_d(\mathcal{G}\_f(\mathbf{x}\_i))} + (1 - d\_i) \times \log \frac{1}{1 - \mathcal{G}\_d(\mathcal{G}\_f(\mathbf{x}\_i))} \tag{5}$$

where *d<sub>i</sub>* denotes the binary domain label of *x<sub>i</sub>*. Through adversarial training between the two parts, the feature extractor *G<sub>f</sub>* tends to extract the features common to the two types of data, which makes it hard for the discriminator to decide between 0 and 1. Hence, the model can perform well on both the source and target datasets. The loss function is expressed as:

$$E(\theta\_f, \theta\_d) = -\left(\frac{1}{n} \sum\_{i=1}^n L\_d^i(\theta\_f, \theta\_d) + \frac{1}{N - n} \sum\_{i=n+1}^N L\_d^i(\theta\_f, \theta\_d)\right) \tag{6}$$

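where *L<sub>d</sub><sup>i</sup>*(*θ<sub>f</sub>*, *θ<sub>d</sub>*) = *L*(*G<sub>d</sub>*(*G<sub>f</sub>*(*x<sub>i</sub>*)), *d<sub>i</sub>*) is the loss of Equation (5) for sample *i*, and *n* and *N* − *n* are the numbers of samples in the source and target domains, respectively.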

Additionally, all of the labeled samples should be supervised during training to ensure the accuracy of the diagnosis in the adversarial procedure. Thus, a classifier is established and is expressed as *G<sub>y</sub>* = *G<sub>y</sub>*(*G<sub>f</sub>*(*x*), *θ<sub>y</sub>*) : *R<sup>D</sup>* → *R<sup>L</sup>* with parameters *θ<sub>y</sub>*, in which *L* is the number of classes. The cross-entropy loss is applied to the Softmax output and is described as:

$$L(G\_y(G\_f(\mathbf{x}\_i)), y\_i) = \log \frac{1}{G\_y(G\_f(\mathbf{x}\_i))\_{y\_i}} \tag{7}$$

Adding Equation (7) to the objective function (6), the optimization objective can be expressed as:

$$E(\theta\_f, \theta\_y, \theta\_d) = \frac{1}{n} \sum\_{i=1}^n L\_y^i(\theta\_f, \theta\_y) - \lambda \left(\frac{1}{n} \sum\_{i=1}^n L\_d^i(\theta\_f, \theta\_d) + \frac{1}{N - n} \sum\_{i=n+1}^N L\_d^i(\theta\_f, \theta\_d)\right) \tag{8}$$

where *L<sub>y</sub><sup>i</sup>*(*θ<sub>f</sub>*, *θ<sub>y</sub>*) = *L*(*G<sub>y</sub>*(*G<sub>f</sub>*(*x<sub>i</sub>*)), *y<sub>i</sub>*) and *λ* is a non-negative hyperparameter that weights the losses of the discriminator. Over the whole training procedure of the DACNN, the optimal parameters *θ̂<sub>f</sub>*, *θ̂<sub>y</sub>*, and *θ̂<sub>d</sub>* can be obtained by:

$$(\hat{\theta}\_f, \hat{\theta}\_y) = \underset{\theta\_f, \theta\_y}{\text{argmin}}\, E(\theta\_f, \theta\_y, \hat{\theta}\_d) \tag{9}$$

$$\hat{\theta}\_d = \underset{\theta\_d}{\text{argmax}}\, E(\hat{\theta}\_f, \hat{\theta}\_y, \theta\_d) \tag{10}$$

The flowchart of the DACNN is displayed in Figure 2. By optimizing Equations (9) and (10), the DACNN trains a feature extractor *G<sub>f</sub>* whose representations can be classified accurately by the classifier *G<sub>y</sub>* while weakening the ability of the discriminator *G<sub>d</sub>* to tell which domain a representation comes from. In the testing phase, the domain-insensitive features are extracted by the feature extractor *G<sub>f</sub>* and fed into the health state classifier *G<sub>y</sub>* to identify the states directly.
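To make the roles of the terms in Equation (8) concrete, the sketch below evaluates the objective from given network outputs, without any actual network. It is only an illustration under stated assumptions: source samples are given domain label 1 and target samples 0 (the text does not fix this convention), and all probabilities are hypothetical numbers.

```python
import math

def domain_loss(d_prob, d):
    """Eq. (5): binary cross-entropy; d_prob = G_d(G_f(x)), d is the domain label."""
    return d * math.log(1.0 / d_prob) + (1 - d) * math.log(1.0 / (1.0 - d_prob))

def class_loss(y_probs, y):
    """Eq. (7): cross-entropy; y_probs = G_y(G_f(x)), y is the true class index."""
    return math.log(1.0 / y_probs[y])

def objective(src_cls, src_labels, src_dom, tgt_dom, lam):
    """Eq. (8). Assumed convention: source domain label 1, target domain label 0."""
    n, m = len(src_dom), len(tgt_dom)
    l_y = sum(class_loss(p, y) for p, y in zip(src_cls, src_labels)) / n
    l_d = (sum(domain_loss(p, 1) for p in src_dom) / n
           + sum(domain_loss(p, 0) for p in tgt_dom) / m)
    return l_y - lam * l_d

# Hypothetical outputs: one source sample (class 0) and one target sample.
src_cls, src_labels = [[0.5, 0.5]], [0]
e_confused = objective(src_cls, src_labels, src_dom=[0.5], tgt_dom=[0.5], lam=1.0)
e_separated = objective(src_cls, src_labels, src_dom=[0.9], tgt_dom=[0.1], lam=1.0)
```

With the same classification loss, the value of *E* is lower when the discriminator outputs 0.5 for both domains than when it separates them, because the discriminator loss enters Equation (8) with a negative sign.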

**Figure 2.** The schematic of the DACNN.

### **3. The Proposed Method**

This section describes the proposed method in detail. It mainly includes problem formulation, the training of the three classifiers, and the classifiers' ensemble.

#### *3.1. Problem Formulation*

Before implementing the proposed method, two incomplete source domain datasets *A* and *B* are defined, as shown in Figure 3. The source dataset *A* = {(*x<sub>i</sub><sup>A</sup>*, *y<sub>i</sub><sup>A</sup>*)}<sub>*i*=1</sub><sup>*n<sub>A</sub>*</sup> consists of *n<sub>A</sub>* labeled instances associated with |*D<sub>A</sub>*| classes and is drawn from distribution *P<sub>SA</sub>*. The source dataset *B* = {(*x<sub>i</sub><sup>B</sup>*, *y<sub>i</sub><sup>B</sup>*)}<sub>*i*=1</sub><sup>*n<sub>B</sub>*</sup> consists of *n<sub>B</sub>* labeled instances associated with |*D<sub>B</sub>*| classes, is collected from another component of the same type, and is drawn from distribution *P<sub>SB</sub>*. The class label spaces of *A* and *B* are denoted as *D<sub>A</sub>* and *D<sub>B</sub>*, respectively. Collecting data from different components results in variations in the operating conditions (such as load and speed) in a real industrial environment; this means that *P<sub>SA</sub>* ≠ *P<sub>SB</sub>*. In addition, some health states must be contained in both source dataset *A* and source dataset *B*; this shared label space is denoted as *D* = *D<sub>A</sub>* ∩ *D<sub>B</sub>* and shown in Figure 3. *D̂<sub>A</sub>* = *D<sub>A</sub>* ∖ *D<sub>B</sub>* denotes the private label space of *A*, and *D̂<sub>B</sub>* = *D<sub>B</sub>* ∖ *D<sub>A</sub>* denotes the private label space of *B*.

**Figure 3.** Two different source domain datasets.

However, in the testing stage of an actual machine fault diagnosis scenario, all possible health states may appear. Therefore, the target domain dataset includes all health states; it can be expressed as *T* = {(*x<sub>i</sub><sup>T</sup>*)}<sub>*i*=1</sub><sup>*n<sub>T</sub>*</sup>, which consists of *n<sub>T</sub>* unlabeled instances associated with |*D<sub>T</sub>*| classes drawn from distribution *P<sub>T</sub>*. *D<sub>T</sub>* represents the label space of the target domain, and *D<sub>T</sub>* = *D<sub>A</sub>* ∪ *D<sub>B</sub>*. In addition, the target domain distribution *P<sub>T</sub>* differs from the source domain distributions *P<sub>SA</sub>* and *P<sub>SB</sub>*.

This paper aims to establish a fault diagnosis model to realize fault diagnosis based on incomplete source training data under different operating conditions.
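The label spaces defined above map directly onto set operations. As a small illustration, the snippet below uses the state numbering of Case 1 in Section 4.1.2 (dataset *A* holds states 1–5 and dataset *B* holds states 4–7); the variable names are hypothetical.

```python
# Label spaces of the two incomplete source datasets (Case 1 numbering).
D_A = set(range(1, 6))   # states 1-5
D_B = set(range(4, 8))   # states 4-7

D_shared = D_A & D_B     # D = D_A ∩ D_B, states present in both sources
D_A_private = D_A - D_B  # private label space of A
D_B_private = D_B - D_A  # private label space of B
D_T = D_A | D_B          # target label space, D_T = D_A ∪ D_B

print(sorted(D_shared), sorted(D_A_private), sorted(D_B_private), sorted(D_T))
```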

#### *3.2. Classifier Training*

This section describes the training procedure for the three classifiers (weak classifier C<sub>*W*</sub>, classifier C<sub>*A*</sub>, and classifier C<sub>*B*</sub>) in detail.

First, a complete dataset *C* that contains all of the classes is formed from the incomplete source datasets *A* and *B*, as shown in Figure 4. In the complete dataset *C*, the samples in label space *D̂<sub>A</sub>* come from source dataset *A*, and the samples in label space *D̂<sub>B</sub>* come from source dataset *B*. For the samples in the shared label space *D*, a portion come from *A*, and the rest come from *B*. Thus, the label space of dataset *C* is the same as that of *T*, and it includes |*D<sub>T</sub>*| health states. Second, a standard CNN classifier C<sub>*W*</sub> is trained using the complete dataset *C*. However, since the source domain datasets *A* and *B* are collected under different working conditions, the samples in dataset *C* are drawn from two distributions. In addition, the data distribution of the testing set, *P<sub>T</sub>*, differs from both *P<sub>SA</sub>* and *P<sub>SB</sub>*. Therefore, the classifier C<sub>*W*</sub> has poor classification ability for the target domain data, although it is able to classify all health states.

**Figure 4.** The process of forming a complete dataset C.
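The construction of *C* can be sketched as follows. The text only says that a portion of the shared-class samples comes from *A* and the rest from *B*; the 50/50 split, the function name `build_complete_dataset`, and the `(sample, label)` tuple layout are assumptions made for this illustration.

```python
import random

def build_complete_dataset(A, B, share_from_A=0.5, seed=0):
    """A and B are lists of (sample, label) pairs; returns the complete dataset C."""
    rng = random.Random(seed)
    labels_A = {y for _, y in A}
    labels_B = {y for _, y in B}
    shared = labels_A & labels_B

    # Private classes are copied unchanged from their only source.
    C = [(x, y) for x, y in A if y not in shared]
    C += [(x, y) for x, y in B if y not in shared]

    # Shared classes: take a portion from A and the remainder from B.
    for label in sorted(shared):
        from_A = [(x, y) for x, y in A if y == label]
        from_B = [(x, y) for x, y in B if y == label]
        k_A = round(share_from_A * len(from_A))
        k_B = len(from_B) - round(share_from_A * len(from_B))
        C += rng.sample(from_A, k_A) + rng.sample(from_B, k_B)
    return C

# Hypothetical toy data: 4 samples per class, classes 1-5 in A and 4-7 in B.
A = [(f"a{y}_{i}", y) for y in range(1, 6) for i in range(4)]
B = [(f"b{y}_{i}", y) for y in range(4, 8) for i in range(4)]
C = build_complete_dataset(A, B)
```

Because *C* mixes samples drawn from two distributions, a classifier trained on it covers every class but is not adapted to the target distribution, which is why it is treated as the weak global classifier.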

After the weak classifier C<sub>*W*</sub> is obtained, the test samples from the target domain *T* = {(*x<sub>i</sub><sup>T</sup>*)}<sub>*i*=1</sub><sup>*n<sub>T</sub>*</sup>, which consists of *n<sub>T</sub>* unlabeled instances associated with |*D<sub>T</sub>*| classes, are classified, and the results serve as pseudo-labels in the subsequent training. Target domain samples whose pseudo-label is in *D<sub>A</sub>* are collected to construct the target domain training set *A*<sub>T</sub>. The samples whose pseudo-label is in *D<sub>B</sub>* are collected to construct the target domain training set *B*<sub>T</sub>. Thus, the datasets *A* and *A*<sub>T</sub> have the same label space *D<sub>A</sub>*, and the datasets *B* and *B*<sub>T</sub> have the same label space *D<sub>B</sub>*.
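The pseudo-label routing described above can be sketched in a few lines. The function name and the callable `weak_classifier` are hypothetical stand-ins for the trained CNN; the toy classifier in the example simply returns its input as the label.

```python
def split_by_pseudo_label(target_samples, weak_classifier, D_A, D_B):
    """Route each unlabeled target sample into A_T and/or B_T by its pseudo-label."""
    A_T, B_T = [], []
    for x in target_samples:
        y_hat = weak_classifier(x)   # pseudo-label predicted by C_W
        if y_hat in D_A:
            A_T.append((x, y_hat))
        if y_hat in D_B:             # shared states land in both sets
            B_T.append((x, y_hat))
    return A_T, B_T

# Toy example with the Case 1 label spaces and an identity "classifier".
D_A, D_B = {1, 2, 3, 4, 5}, {4, 5, 6, 7}
A_T, B_T = split_by_pseudo_label([1, 2, 3, 4, 5, 6, 7], lambda x: x, D_A, D_B)
```

Samples whose pseudo-label falls in the shared space *D* appear in both target training sets, which follows from *D* being a subset of both *D<sub>A</sub>* and *D<sub>B</sub>*. Since the pseudo-labels come from the weak classifier, some samples will be routed incorrectly.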

Datasets *A* and *A*<sub>T</sub> have the same health states but are drawn from different distributions, so a DACNN model can be trained using the datasets *A* and *A*<sub>T</sub>. The feature extractor and the classifier of this DACNN are combined to form a block, which is taken as classifier C<sub>*A*</sub>. Because C<sub>*A*</sub> is constructed by a DACNN using domain adaptation techniques, it has a strong classification ability for the unlabeled target domain dataset. However, the classification range of the strong classifier C<sub>*A*</sub> is limited to |*D<sub>A</sub>*| classes. After the training of classifier C<sub>*A*</sub> is completed, classifier C<sub>*B*</sub> is trained in the same way using *B* and *B*<sub>T</sub>. Similarly, the classification range of C<sub>*B*</sub> is limited to |*D<sub>B</sub>*| classes.

In the implementation of the DACNN, the SELU activation function is used in the convolutional layers; it is given by Equation (11):

$$\text{SELU}(x) = \lambda \begin{cases} \alpha e^{x} - \alpha & (x \le 0) \\ x & (x > 0) \end{cases} \tag{11}$$

where the value of *α* is 1.6732, and the value of *λ* is 1.0507. The SELU activation function can automatically normalize the sample distribution to zero mean and unit variance, which helps avoid exploding or vanishing gradients. The activation function used in the fully connected layers of the state classifier and the domain discriminator is ReLU, which is given by Equation (12):

$$\text{ReLU}(x) = \begin{cases} 0 & (x \le 0) \\ x & (x > 0) \end{cases} \tag{12}$$
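Both activation functions are simple enough to write out directly. The sketch below is a scalar, plain-Python rendering of Equations (11) and (12) using the constants quoted in the text; it is meant for checking values by hand, not as a replacement for a framework's built-in layers.

```python
import math

ALPHA = 1.6732   # alpha from the text
LAMBDA = 1.0507  # lambda from the text

def selu(x):
    """Eq. (11): scaled exponential linear unit."""
    if x <= 0:
        return LAMBDA * (ALPHA * math.exp(x) - ALPHA)
    return LAMBDA * x

def relu(x):
    """Eq. (12): rectified linear unit."""
    return x if x > 0 else 0.0

for value in (-2.0, 0.0, 2.0):
    print(value, selu(value), relu(value))
```

For large negative inputs SELU saturates at −*λα* instead of being clipped to zero, so negative activations still carry a nonzero gradient.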

In this way, three well-trained classifiers are obtained: one weak global classifier C<sub>*W*</sub>, one strong partial classifier C<sub>*A*</sub>, and one strong partial classifier C<sub>*B*</sub>. The details of the three classifiers are listed in Table 1.

**Table 1.** Classification range and ability of the three classifiers.


#### *3.3. Classifiers' Ensemble*

After the three classifiers are obtained, this section designs a particular ensemble strategy to combine their results. The procedure for the ensemble strategy is presented in Figure 5.

After a testing sample *x* is input into the three classifiers, they output the classification results **y**<sub>*W*</sub>, **y**<sub>*A*</sub>, and **y**<sub>*B*</sub>, which can be expressed as:

$$\begin{cases} \mathbf{y}\_W = \mathbf{C}\_W(\mathbf{x}) \\ \mathbf{y}\_A = \mathbf{C}\_A(\mathbf{x}) \\ \mathbf{y}\_B = \mathbf{C}\_B(\mathbf{x}) \end{cases} \tag{13}$$

If **y**<sub>*W*</sub> = **y**<sub>*A*</sub>, **y**<sub>*W*</sub> = **y**<sub>*B*</sub>, or **y**<sub>*A*</sub> = **y**<sub>*B*</sub> is satisfied, the final result **y** is obtained immediately by a majority voting strategy. Otherwise, the results of the three classifiers all differ from each other. In such cases, because the classifier C<sub>*W*</sub> is a global classifier, **y**<sub>*W*</sub> serves as the reference standard. If **y**<sub>*W*</sub> ∈ *D<sub>A</sub>* is satisfied, the actual label of *x* may be in *D<sub>A</sub>*. In this range, the classifier C<sub>*A*</sub> has strong classification ability, and thus **y**<sub>*A*</sub> is taken as the final result. Similarly, if **y**<sub>*W*</sub> ∈ *D<sub>B</sub>* is satisfied, **y**<sub>*B*</sub> is taken as the final result. However, if **y**<sub>*W*</sub> ∈ *D* is satisfied, both classifiers C<sub>*A*</sub> and C<sub>*B*</sub> have good classification ability in this shared range. In this case, **y** is determined according to the output probability *p* in the Softmax layer of the classifiers, and it can be expressed as:

$$\begin{cases} \mathbf{y} = \mathbf{y}\_A & \text{if} \quad p\_A = \max(p\_A, p\_B, p\_W) \\ \mathbf{y} = \mathbf{y}\_B & \text{if} \quad p\_B = \max(p\_A, p\_B, p\_W) \\ \mathbf{y} = \mathbf{y}\_W & \text{if} \quad p\_W = \max(p\_A, p\_B, p\_W) \end{cases} \tag{14}$$

where *p<sub>A</sub>*, *p<sub>B</sub>*, and *p<sub>W</sub>* represent the Softmax output probabilities of classifiers C<sub>*A*</sub>, C<sub>*B*</sub>, and C<sub>*W*</sub>, and max(·) is the maximum function.

**Figure 5.** The flowchart of the classifiers' ensemble.
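The decision rules of this section can be sketched as a single function. Two points are assumptions of this sketch rather than statements from the text: the shared space *D* is tested before *D<sub>A</sub>* and *D<sub>B</sub>* (since *D* is contained in both, the order matters), and ties in Equation (14) are resolved in the order C<sub>*A*</sub>, C<sub>*B*</sub>, C<sub>*W*</sub>. The function name and argument layout are hypothetical.

```python
def ensemble(y_W, y_A, y_B, p_W, p_A, p_B, D_A, D_B):
    """Fuse the three predictions; p_* are the Softmax confidences of y_*."""
    # At least two classifiers agree: majority vote.
    if y_W == y_A or y_W == y_B:
        return y_W
    if y_A == y_B:
        return y_A

    # All three differ: use the global prediction y_W as the reference.
    shared = D_A & D_B
    if y_W in shared:
        # Eq. (14): trust the most confident classifier.
        candidates = [(p_A, y_A), (p_B, y_B), (p_W, y_W)]
        return max(candidates, key=lambda item: item[0])[1]
    if y_W in D_A:
        return y_A   # private range of A: defer to strong classifier C_A
    if y_W in D_B:
        return y_B   # private range of B: defer to strong classifier C_B
    return y_W       # fallback, not expected when D_T = D_A ∪ D_B

# Case 1 label spaces with hypothetical predictions and confidences.
D_A, D_B = {1, 2, 3, 4, 5}, {4, 5, 6, 7}
print(ensemble(2, 1, 6, 0.5, 0.9, 0.9, D_A, D_B))
```

In the printed example the three predictions disagree and the weak classifier points into the private range of *A*, so the output of C<sub>*A*</sub> is returned.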

#### *3.4. Architecture of the Proposed Method*

The architecture of our method for fault diagnosis is presented in Figure 6, and the process is summarized below.


**Figure 6.** The overall procedures of the proposed method.

### **4. Experimental Verification**

To validate the effectiveness of the proposed PT-ELF method, rotor and rolling bearing experiments are designed. Note that the code for the proposed method is written in PyTorch 1.2 and runs on a machine with 16 GB of RAM and a Core i5-10400F CPU.

#### *4.1. Case 1*

#### 4.1.1. Rotor Experiment

Case 1 adopts the rotor dataset from Northwestern Polytechnical University. As shown in Figure 7a, the experimental system is composed of a three-phase variable frequency motor, single-span rotor shafting, torque speed sensor, rolling bearing seat, shafting load plate, rubbing mounting bracket, platform bottom plate, radial loading device, coupling, system control cabinet, and fault suite. A displacement sensor is mounted on the rotor test bench to collect vertical vibration signals under a healthy state and the six fault states shown in Figure 8, and the sampling frequency is 10,240 Hz. Figure 7b depicts the sensor and single-span rotor shaft layout. The structural components are listed in Table 2.

**Figure 7.** The rolling bearing experiment system: (**a**) the experimental test rig; (**b**) the layout of the test rig.

**Figure 8.** Six different fault states: (**a**) full annular rub; (**b**) blade crack; (**c**) bearing fault; (**d**) blisk crack; (**e**) shaft coupling fault; (**f**) shaft crack.

**Table 2.** The structural components of the single-span rotor shafting.


The rotor vibration data are collected under three working load conditions of 0%, 20%, and 40%. As detailed in Table 3, for each load, data from seven health states (one healthy state and six fault states) are used. The data in each state are divided into 300 samples of 800 data points each, with 80 samples randomly selected for testing and the remaining 220 used for training. Figure 9 shows the waveform of the original displacement signal and the spectral distribution of each health state under 0% load. The left shows the time-domain signal, and the right shows the corresponding spectrum. The signals all have a large amplitude at around 10–30 Hz and show relatively similar characteristics, which makes it hard to recognize the health states.
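The sample preparation described above (300 samples of 800 points per state, 80 for testing and 220 for training) can be sketched as follows. Whether the original segments overlap is not stated, so non-overlapping windows are an assumption here, as are the function names and the synthetic signal used in place of measured data.

```python
import random

def segment(signal, length=800, count=300):
    """Cut a 1-D signal into `count` non-overlapping samples of `length` points."""
    if len(signal) < length * count:
        raise ValueError("signal is too short for the requested segmentation")
    return [signal[i * length:(i + 1) * length] for i in range(count)]

def train_test_split(samples, n_test=80, seed=0):
    """Randomly hold out `n_test` samples for testing; the rest are for training."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test

# Synthetic stand-in for one health state's displacement record.
signal = [float(i) for i in range(800 * 300)]
samples = segment(signal)
train, test = train_test_split(samples)
print(len(samples), len(train), len(test))
```

At the stated sampling frequency of 10,240 Hz, one 800-point sample covers about 78 ms of signal.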

**Table 3.** Seven health states of the rotor.


**Figure 9.** Original displacement signals and spectral distributions: (**a**) health; (**b**) full annular rub; (**c**) blade crack and bearing fault; (**d**) blade crack; (**e**) blisk crack; (**f**) shaft coupling fault; (**g**) shaft crack.

#### 4.1.2. Results and Discussion

In this case study, two incomplete source datasets are constructed, as shown in Table 4. The source dataset *A* contains five kinds of health states (states 1–5), and the source dataset *B* contains four kinds of health states (states 4–7).


**Table 4.** Distribution of health states in two source domains and one target domain.

First, the source domain datasets *A* and *B* are mixed to form a training set that contains all health states, which is used to train a weak classifier C<sub>*W*</sub>. The classifier C<sub>*W*</sub> can classify all seven health states. Second, according to the classification results (the pseudo-labels) of the weak classifier C<sub>*W*</sub> on the target domain samples, two transfer models based on a DACNN are trained. They are transferred from source domain dataset *A* and source domain dataset *B* to the target domain. Thus, two strong classifiers C<sub>*A*</sub> and C<sub>*B*</sub> are trained. Finally, after a test sample is classified by the classifiers C<sub>*A*</sub>, C<sub>*B*</sub>, and C<sub>*W*</sub>, the three results are fused by the proposed ensemble strategy described in Section 3.3.

To demonstrate that our method is applicable to various operating conditions, five test scenarios (test scenarios A1–E1) are designed to test the proposed method. As listed in Table 5, the source domain *A*, source domain *B*, and target domain use data collected under different loads. In source dataset *A*, only five kinds of labeled samples in states 1–5 are available. Similarly, in source dataset *B*, only four kinds of labeled samples in states 4–7 are available. The test data in the target domain contain all seven kinds of unlabeled samples in states 1–7.



The accuracies of the three classifiers (two strong partial classifiers and a weak global classifier) and the proposed PT-ELF method in the five test scenarios are listed in Table 6, and a bar diagram is shown in Figure 10a. Note that the accuracy of C<sub>*A*</sub> is tested using states 1–5, and the accuracy of C<sub>*B*</sub> is tested using states 4–7. The weak classifier C<sub>*W*</sub> and the ensemble are tested using target domain test data that contain all of the health states (states 1–7).

**Table 6.** Results of different classifiers.


**Figure 10.** The result diagram for different classifiers.

It can be seen from Table 6 that the two strong classifiers C<sub>*A*</sub> and C<sub>*B*</sub> have high accuracy in their corresponding classification ranges, with averages of 93.29% and 96.83%, respectively. On the one hand, this is because the two strong classifiers are trained by a domain adversarial network (DACNN), which can extract domain-insensitive features for classification. On the other hand, they are tested on only part of the health states. The result of the weak classifier C<sub>*W*</sub> is relatively poor, with an average accuracy of 86.52%. This is because the data of the target domain and the two source domains follow different distributions, which degrades classification performance.

Across the five test scenarios, the proposed PT-ELF method achieves its highest accuracy in scenario B1 (95.41%) and its lowest in scenario C1 (83.75%), with an average of 90.73%. This is higher than the 86.52% average of the weak classifier C<sub>*W*</sub>. The reason is that the proposed ensemble strategy routes each test sample to the corresponding strong classifier as far as possible. It indicates that our method can still achieve good results even with incomplete training data.

In addition, to demonstrate the superiority of our method, a CNN and a DACNN, each trained on source dataset *A* or source dataset *B*, are used as comparison methods (Methods 1–4). The results are listed in Table 7, and a bar diagram of the various methods is shown in Figure 10b. It can be observed that the average accuracies of the CNN trained on source domains *A* and *B* are 58.87% and 55.27%, respectively. The average accuracies of the DACNN trained on source domains *A* and *B* are 64.02% and 56.79%, respectively, which are higher than those of the corresponding CNNs. This is because the DACNN can extract domain-insensitive features using adversarial training; this limits the performance loss caused by the distribution discrepancy and improves the accuracy of the model in the target domain. However, since source domain *A* is incomplete, a model (CNN or DACNN) trained on source dataset *A* is unable to classify the testing samples whose actual label is in *D̂<sub>B</sub>* (states 6–7). Similarly, a model (CNN or DACNN) trained on source dataset *B* is unable to classify the testing samples whose actual label is in *D̂<sub>A</sub>* (states 1–3); therefore, the results of Methods 1–4 are poor compared to our method. The average accuracy of our method is as high as 90.73%, which indicates that the proposed method has good classification ability for all health states present in the testing dataset in the target domain.
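The effect of an incomplete source domain on Methods 1–4 can be quantified with a simple bound. The sketch below assumes a class-balanced target test set (each state contributes the same number of test samples, as the sample split in Section 4.1.1 suggests) and a model that can only output labels seen during training; the function name is hypothetical.

```python
def accuracy_ceiling(source_states, target_states):
    """Best possible accuracy on a class-balanced target test set when the model
    can only predict labels that appear in its source training data."""
    source, target = set(source_states), set(target_states)
    return len(source & target) / len(target)

target = range(1, 8)                              # states 1-7
ceiling_A = accuracy_ceiling(range(1, 6), target)  # trained on states 1-5
ceiling_B = accuracy_ceiling(range(4, 8), target)  # trained on states 4-7
print(f"{ceiling_A:.1%}", f"{ceiling_B:.1%}")
```

Under these assumptions, a model trained only on *A* cannot exceed 5/7 (about 71.4%) and a model trained only on *B* cannot exceed 4/7 (about 57.1%), which is in line with the reported DACNN averages of 64.02% and 56.79% lying below these limits.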


**Table 7.** Results of different methods.

#### *4.2. Case 2*

#### 4.2.1. Rolling Bearing Experiment

The rolling bearing vibration data utilized in case 2 are from Case Western Reserve University [32]. As shown in Figure 11, the setup mainly consists of a loading motor, an induction motor, and testing bearings. The vibration signals used in this case are collected by an accelerometer installed near the drive end. As listed in Table 8, the vibration signals were collected under four different loads (Load 1–Load 4). Each fault was artificially implanted into the bearings with different severity levels from 0.007 to 0.028 inches in diameter (1 inch = 25.4 mm). The details of the test bearing are listed in Table 9.

**Figure 11.** The experiment setup of rolling bearing.

**Table 8.** Four different loads.


**Table 9.** Details of the test bearing.


The vibration data collected under four different loads are used to test the proposed method. Each of them includes 12 health states, which cover different failure locations (shown in Figure 12), different failure orientations, and different failure severities. As detailed in Table 10, each health state contains 300 samples, each consisting of 400 continuous data points. At random, 200 samples are selected for training, and the remaining 100 are used for testing. The raw vibration signals at 1797 rpm (0 hp) (left column) and the corresponding spectral distributions (right column) are shown in Figure 13. In terms of raw vibration signals, the vibration amplitude of the healthy state is relatively small (Figure 13a), whereas the fault signals (Figure 13b–i) show obvious impacts. The spectral distribution contains the fault frequency and the bearing natural frequency. Unlike the healthy signal, the fault vibration signals have a higher amplitude at around 3–4 kHz. It nevertheless remains very difficult to accurately distinguish the fault location, dimension, and orientation across different working conditions with new fault states.

**Figure 12.** The faults of bearing in three locations: (**a**) ball fault; (**b**) inner fault; (**c**) outer fault.


**Table 10.** The details of the 12 operating states.

The proposed method mainly studies the case in which only partial health state labeled data are available in the source domain. To verify our method, we assume that source domain dataset *A* only contains eight kinds of fault state labeled data, while source domain dataset *B* contains seven kinds of labeled data. Among them, three categories overlap, as shown in Table 11. In addition, all target domain data are unlabeled; these data contain 12 kinds of health states.

**Table 11.** Distribution of health states in source and target data.


#### 4.2.2. Results and Discussion

Similar to Case 1, the source datasets *A* and *B* are first mixed to form a training set containing all health states, which is used to train the weak classifier C<sub>*W*</sub>. Thus, C<sub>*W*</sub> can classify all of the health states, but its classification ability is weak.

In the following step, two DACNN models are trained on source domain datasets *A* and *B* to adapt to the target domain data, which yields two strong classifiers C<sub>*A*</sub> and C<sub>*B*</sub>. In each DACNN, the feature extractor *G<sub>f</sub>* contains two convolution blocks, and the classifier *G<sub>y</sub>* contains a fully connected layer followed by a Softmax output. The composition *G<sub>y</sub>*(*G<sub>f</sub>*(*x*)) in the DACNN is taken as the classifier. Finally, the three well-trained classifiers C<sub>*A*</sub>, C<sub>*B*</sub>, and C<sub>*W*</sub>, which have different classification capabilities and classification ranges, are integrated using the ensemble strategy introduced in Section 3.3 to obtain the final diagnosis result.

**Figure 13.** Waveform of raw signals and spectral distributions of the rolling bearing: (**a**) health; (**b**) rolling element failure (0.007); (**c**) rolling element failure (0.014); (**d**) rolling element failure (0.021); (**e**) inner race failure (0.007); (**f**) inner race failure (0.021); (**g**) inner race failure (0.028); (**h**) outer race failure (0.007 Center); (**i**) outer race failure (0.007 Vertical); (**j**) outer race failure (0.014 Center); (**k**) outer race failure (0.021 Center); (**l**) outer race failure (0.021 Vertical).

To demonstrate that our method is applicable to different working conditions, five test scenarios (test scenarios A2–E2) with incomplete data are used to test the proposed method, as shown in Table 12. In source dataset *A*, eight kinds of labeled samples in states 1–8 are available, and in source dataset *B*, seven kinds of labeled samples in states 6–12 are available. The target data, which contain 12 kinds of unlabeled samples in states 1–12, are used for testing. In the five test scenarios, source domain datasets *A* and *B* and the target domain dataset use data collected under different loads. To show the superiority of our method, two conventional deep learning methods based on a CNN (method 1 and method 2) and two transfer learning methods based on a DACNN (method 3 and method 4) are used for comparison in the five test scenarios; the results are listed in Table 13. Method 1 and method 3 are trained using source dataset *A*, and method 2 and method 4 are trained using source dataset *B*. To show the comparison visually, a bar diagram of the results for the different methods is given in Figure 14.

**Table 12.** Five different test scenarios.


**Table 13.** Results of different methods.


**Figure 14.** The results diagram for different methods.

As shown in Table 13 and Figure 14, the average accuracies of methods 1 and 2 are 64.27% and 57.53%, respectively. The average accuracies of method 3 and method 4, based on transfer learning, are 66.22% and 58.05%, respectively, which are slightly higher. This is because the DACNN handles cross-domain fault diagnosis well and enhances the recognition accuracy in the target domain. However, since the source datasets *A* and *B* are incomplete and neither of them contains all the health states present in the testing data, the fault classification accuracy is still relatively low even if the transfer strategy is used. The proposed method achieves accuracies of 98.08%, 95.41%, 99.66%, 99.25%, and 95.83% in the five test scenarios, respectively. The accuracy is lowest in test scenario B2, where it still reaches 95.41%, and highest in test scenario C2 at 99.66%. The comparison results demonstrate that the proposed PT-ELF method exhibits satisfactory cross-domain diagnostic ability with new health states.

### **5. Conclusions**

This paper proposes a rotating machinery fault diagnosis method based on partial transfer learning and ensemble learning. Unlike existing cross-domain diagnostic methods, which assume the same health states in the source and target domains, the proposed method can provide a reliable diagnosis result in the target domain even when the source domain is incomplete and only contains partial health states. As the core of the proposed method, partial transfer learning avoids the problem induced by incomplete training data and trains two classifiers with strong classification capabilities for partial categories. Then, a particular ensemble strategy is designed to combine the outputs of the three classifiers (a weak global classifier and two strong partial classifiers). The effectiveness of the proposed method is validated on a rotor experiment and a bearing experiment. Comparisons with four related methods indicate that the proposed method achieves superior performance and provides a reliable diagnosis result based on an incomplete source domain under various working conditions.

In this preliminary study, the proposed method relies on the assumption that the missing health states in the source domain training set can be obtained from another dataset or another component. The unseen health states will be considered in our future research.

**Author Contributions:** Data curation, Z.Z.; Formal analysis, G.M.; Funding acquisition, Y.L.; Investigation, S.J.; Methodology, G.M.; Resources, Z.Z.; Software, S.J.; Validation, G.M. and K.N.; Writing—original draft, G.M.; Writing—review & editing, K.N. and Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research is supported by the National Natural Science Foundation of China under Grant 51805434 and 12172290 and Key Laboratory of Equipment Research Foundation under Grant 6142003190208.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare that they have no known competing financial interests or personal relationships that relate to the work reported in this paper.

## **Abbreviations**


## **References**


### *Article* **An Oversampling Method of Unbalanced Data for Mechanical Fault Diagnosis Based on MeanRadius-SMOTE**

**Feng Duan, Shuai Zhang \*, Yinze Yan and Zhiqiang Cai**

> School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an 710072, China; fengduan@mail.nwpu.edu.cn (F.D.); yanyinze@mail.nwpu.edu.cn (Y.Y.); caizhiqiang@nwpu.edu.cn (Z.C.) **\*** Correspondence: zhangshuai5000@nwpu.edu.cn

**Abstract:** With the development of machine learning, data-driven mechanical fault diagnosis methods have been widely used in the field of PHM. Because the amount of fault data is limited, unbalanced data sets are a difficult problem for fault diagnosis. Under unbalanced data sets, faults with little historical data are always difficult to diagnose and lead to economic losses. In order to improve the prediction accuracy under unbalanced data sets, this paper proposes MeanRadius-SMOTE based on the traditional SMOTE oversampling algorithm, which effectively avoids the generation of useless samples and noise samples. This paper validates the effectiveness of the algorithm on three linear unbalanced data sets and four step unbalanced data sets. Experimental results show that MeanRadius-SMOTE outperforms SMOTE and LR-SMOTE in various evaluation indicators and is more robust to different imbalance rates. In addition, MeanRadius-SMOTE can take into account the prediction accuracy of both the overall data set and the minority class, which is of great significance for engineering applications.

**Keywords:** mechanical fault diagnosis; unbalanced data set; MeanRadius-SMOTE; minority class

### **1. Introduction**

With the continuous innovation of technology, industrial equipment such as aircraft engines, steam turbines, wind turbines, and centrifuges has rapidly become larger in scale and more automated, integrated, and intelligent. In order to meet the requirements of mechanical equipment reliability and precision in the industrial field, PHM (Prognostics and Health Management) was initiated to ensure the stable operation of mechanical equipment and reduce maintenance costs [1–3].

With the development of big data in the industrial field, data-driven mechanical fault diagnosis research has received more and more attention [4–6]. Mechanical fault diagnosis generally starts by extracting vibration signals from the operation of the equipment, because vibration signals can provide sufficient fault features to reflect the fault status and serve as the input of the prediction model [7,8]. However, because some faults occur rarely, the vibration signals of such faults are too few for the classifier to predict them accurately, which is the problem of unbalanced data sets in fault diagnosis. In the multi-classification mechanical fault diagnosis problem, the machine learning classifier emphasizes the accuracy of the overall prediction, which leads to sacrificing the prediction accuracy of the minority class to ensure the prediction of the majority class samples [9]. However, some mechanical equipment suffers infrequent failures that lead to huge economic losses once they occur. Therefore, it is necessary to research the problem of unbalanced data sets in mechanical fault diagnosis.

At present, the research on the problem of unbalanced data sets is relatively mature, but this research in the mechanical fault diagnosis field has just begun [10]. Many fault diagnosis techniques rely on reliable and complete data sets, such as multi-sensing fusion techniques [11]. However, since machinery usually operates under normal conditions, it is difficult to collect enough failure data, so that the actual data set lacks completeness [12,13]. The lack of samples with specific labels can lead to data imbalance problems. In recent years, many scholars have begun to pay attention to this problem and have given their own methods [14,15]. Generally, the solution to the problem of unbalanced data sets is mainly divided into data and algorithm aspects, and sometimes they are combined [16].

**Citation:** Duan, F.; Zhang, S.; Yan, Y.; Cai, Z. An Oversampling Method of Unbalanced Data for Mechanical Fault Diagnosis Based on MeanRadius-SMOTE. *Sensors* **2022**, *22*, 5166. https://doi.org/10.3390/s22145166

Academic Editor: Jongmyon Kim

Received: 13 June 2022 Accepted: 8 July 2022 Published: 10 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

For the data aspect, scholars mainly use resampling technology to copy, synthesize, delete, and perform other operations on original samples, to adjust the number of samples to reduce the impact of unbalanced data sets. Resampling techniques are divided into oversampling for minority class samples and undersampling for majority class samples. The main idea of oversampling is to increase the number of minority class samples to achieve class balance. The main methods are divided into replicating samples and generating new samples. ROS (Random Oversampling) is to randomly replicate original samples to expand the number of minority class samples, but it may cause the replication of noise samples to affect the quality of the data set [17]. The method of generating new samples derives new samples from one or more original samples, and the new samples can indirectly reflect the features of the minority class. The most classic oversampling is the SMOTE algorithm [18]. The SMOTE algorithm selects the line connecting the two original samples as the range of the new sample and determines a point on the line as the new sample. However, SMOTE still does not avoid the generation of noise samples, and the new samples are easily affected by the distribution of the original samples, which may cause the new samples to deviate from the actual distribution. Later scholars improved SMOTE in terms of noise reduction and generation algorithms, such as Borderline-SMOTE [19], Adasyn [20], LR-SMOTE [21], etc. Undersampling achieves class balance by reducing the number of majority class samples, such as undersampling based on the clustering algorithm and ENN (Edited Nearest Neighbor) [22]. In fact, most of the unbalanced data sets are caused by too few samples in the minority class, so oversampling is the key research in this field [23].

For the algorithm aspect, with the rapid development of machine learning, many classifiers have responded to the problem of unbalanced data sets. On the premise that each sample is equal, the number of samples determines which class the classifier prefers, so setting the weight of the sample, the threshold of the decision boundary, or the objective function of the classifier can strengthen the ability of the classifier to combat unbalanced data sets [24,25]. Adjusting these can make the classifier's decision boundary less sensitive to the sample size [26]. Moreover, adding a proper regularization term to the objective function can reduce the impact of the imbalance rate on the classifier [27].

There is no universal solution to the problem of unbalanced data sets in mechanical fault diagnosis, although scholars have tried in various directions. From the perspective of features, extracting more abundant features from vibration signals is beneficial to solving the problem, because the failure can be reflected in the energy of the vibration of the equipment [28]. In addition to features in the time and frequency domains, there are features based on wavelet packet energy and entropy values [29,30], and the fault features are also extracted using a bag-of-visual-word approach from the infrared thermography images [31]. However, an increase in the number of features will undoubtedly increase the workload of feature screening. From the perspective of resampling, scholars use various existing resampling methods to conduct experiments on mechanical equipment [32]. Once there are more failure types or concurrent failures, existing oversampling algorithms may fail. Therefore, analyzing the commonality of mechanical faults and proposing a new oversampling algorithm is the key to solving this problem in the mechanical field [33,34]. From the perspective of the classifier, scholars mainly set the cost matrix and change the loss function or network structure to make the classifier aware of this imbalance [35]. These classifiers are often only suitable for identifying faults in stationary parts, such as gears or bearings [36].

Although new oversampling algorithms are emerging, there are still the following problems: (1) The solutions are generally only aimed at the prediction of bearing failures or gear failures, so the methods cannot comprehensively diagnose the running state of complete mechanical equipment. (2) Most of the solutions are aimed at the two-category problem, which is obviously not practical. For a simple secondary planetary gear, there are already as many as eight failure types. (3) The new samples generated by the existing oversampling methods are not effective enough. Although the number of samples has reached a balance, the amount of fault-type information contained in the samples is far from enough.

In view of the existing problems, this paper improves SMOTE and proposes an oversampling algorithm called MeanRadius-SMOTE, which is specially used to solve the multi-classification problems in mechanical fault diagnosis. MeanRadius-SMOTE can reduce the production of noise samples and add more samples with the ability to affect the decision boundary, and it is easier to inherit the feature information from the original samples. The complexity of the MeanRadius-SMOTE algorithm is not high compared to SMOTE.

The main contributions of this paper are as follows: To solve the problem of multi-classification unbalanced data sets in mechanical fault diagnosis, a new oversampling algorithm, MeanRadius-SMOTE, is proposed. The algorithm takes into account the prediction performance on both the overall data set and the minority class, and the prediction accuracy for the minority class in particular is greatly improved. In this paper, a large number of comparative experiments are carried out on data sets with various specifications and imbalance rates, and the effectiveness, stability, and robustness of the algorithm are verified.

The rest of this paper is divided into five parts. In Section 2, the SMOTE algorithm and the improved LR-SMOTE algorithm based on SMOTE are introduced. In Section 3, the specific process of the MeanRadius-SMOTE algorithm is introduced in detail. In Section 4, we mainly introduce the source and processing of the data set, as well as the selection of classifiers and evaluation indicators in the experiment. In Section 5, we introduce the experimental process and experimental results. In the following sections, we discuss and summarize the MeanRadius-SMOTE algorithm based on experiments, and we propose future research directions.

### **2. Related Works**

Since the machine learning algorithm is greedy in the face of multi-classification problems, the classifier will give priority to ensuring the highest overall accuracy, resulting in an inaccurate prediction of some minority class samples. In the real industrial field, in the face of some faults with low probability but high maintenance cost, operators hope that the model can accurately predict these faults. Therefore, this section introduces the commonly used methods to deal with unbalanced data sets, namely, the traditional SMOTE method and the improved LR-SMOTE method.

### *2.1. SMOTE*

The SMOTE algorithm was proposed by Chawla et al. in 2002 [18], and the algorithm is an improved method based on ROS. In the SMOTE algorithm, new samples are generated based on the original samples, which has a greater probability of obtaining effective features than random oversampling of new samples. The steps of the SMOTE algorithm are as follows:


$$\mathbf{x}\_{new} = \mathbf{x}\_{i} + rand(0, 1) \ast (\mathbf{x}\_{h} - \mathbf{x}\_{i}) \tag{1}$$
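The interpolation in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name `smote`, its parameters, and the choice of picking x_h at random among the k nearest minority neighbours of x_i (the usual SMOTE convention) are assumptions made for the example.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating as in Eq. (1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x_i = minority[i]
        # k nearest minority neighbours of x_i, excluding x_i itself
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(x, x_i))[:k]
        x_h = rng.choice(neighbours)
        gap = rng.random()  # rand(0, 1) in Eq. (1)
        # x_new lies on the segment between x_i and x_h
        synthetic.append([a + gap * (b - a) for a, b in zip(x_i, x_h)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2)
```

Because every new point lies on a segment between two existing minority samples, the synthetic samples stay inside the convex hull of the minority class, which is why their distribution follows the density of the original samples.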

Although the SMOTE algorithm overcomes the overfitting problem of the ROS algorithm, SMOTE still has some problems with noise samples and useless samples. Many scholars have improved SMOTE. For example, Han proposed the Borderline-SMOTE algorithm [19]. The algorithm first classifies the original samples into safe, dangerous, and noise, then uses the dangerous samples to generate new samples. It not only reduces the interference of noise points but also enables new samples to better reflect the features of the data set. However, how to accurately divide the three labels is a more difficult problem for different data sets.

### *2.2. LR-SMOTE*

Based on the SMOTE algorithm, Wang proposed the LR-SMOTE algorithm [21]. The algorithm first uses SVM (Support Vector Machine) and K-means to remove the noise samples in the original data set, then changes the generation rules of new samples and considers the center of the samples to generate new samples. The specific steps of the LR-SMOTE algorithm are as follows:


$$\mathbf{x}\_{new} = \mathbf{x}\_{i} + rand(0, M\_{i}) \ast (\mathbf{x}\_{c} - \mathbf{x}\_{i}) \tag{2}$$

(5) Repeat steps 3 and 4 until the number of samples of the majority class and minority class is balanced.
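As a rough illustration of the generation rule in Equation (2) only, the sketch below moves a minority sample towards the sample center by a random factor. The helper name `lr_smote_sample` is hypothetical, and the upper bound `m_i` is simply passed in as an argument, because its definition belongs to the earlier steps of the algorithm and is not reproduced here; the noise-removal stage with SVM and K-means is also omitted.

```python
import random

def lr_smote_sample(x_i, x_c, m_i, rng):
    """Eq. (2): move x_i towards the centre x_c by a factor drawn from [0, m_i]."""
    step = rng.uniform(0.0, m_i)  # rand(0, M_i)
    return [a + step * (c - a) for a, c in zip(x_i, x_c)]

rng = random.Random(0)
# With m_i = 1 the new sample lies between x_i and the centre.
x_new = lr_smote_sample([2.0, 0.0], [0.0, 0.0], m_i=1.0, rng=rng)
```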

In the LR-SMOTE algorithm, the new samples are generated based on the functional relationship between the sample center and each sample, rather than any two minority samples. Therefore, the new samples will not deviate from the range of the minority samples, and the features are closer to the original sample. LR-SMOTE provides a good direction for generating rules so that the algorithm determines the distribution of samples according to the sample center. This paper also proposes a new algorithm along this way to solve the unbalanced data sets in the mechanical field. We use the MeanRadius-SMOTE algorithm to experiment on a variety of mechanical failure data sets, and the experimental results show that the MeanRadius-SMOTE algorithm is suitable for solving the problem of unbalanced data sets in the mechanical field.

### **3. Proposed Method**

In an oversampling algorithm, new samples at different geometric locations have different improvements in classifier training. In general, the more new samples near the decision boundary, the greater the impact on the classifier. This paper proposes the MeanRadius-SMOTE (MR-SMOTE) algorithm considering the sample center and radius. When using machine learning to predict mechanical failures, we deal with noise samples in advance, so noise reduction is performed in feature preprocessing. Noise reduction is not involved in the MeanRadius-SMOTE, and the noise reduction algorithm will be introduced in the next section.

The MeanRadius-SMOTE algorithm mainly changes the generation rules of the SMOTE algorithm, so that the new samples are more likely to be distributed near the average radius of the minority class samples, and the new samples have a stronger ability to affect the decision boundary of the classifier. In the MeanRadius-SMOTE algorithm, the new sample is determined by k vectors of the sample center to the samples, and the distance between the new sample and the sample center follows a normal distribution. The steps of the MeanRadius-SMOTE algorithm are as follows:


$$\mathbf{x}\_{new} = \mathbf{x}\_{c} + r \ast \sum\_{i=0}^{k} \mathbf{v}\_{i}, \quad r \sim \mathcal{N}\left(d\_{m}, \frac{d\_{m}}{\theta}\right) \tag{3}$$

(5) Repeat steps 3 and 4 until the number of samples of the majority class and minority class is balanced.
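One possible reading of Equation (3) is sketched below in plain Python. Several points are assumptions rather than statements from the paper: the sample center is taken as the coordinate-wise mean, d_m as the mean distance of the minority samples to that center, d_m/θ as the standard deviation of the normal distribution, and the summed direction is normalized to unit length so that r is the distance between the new sample and the center. The function name `mean_radius_smote` is hypothetical.

```python
import math
import random

def mean_radius_smote(minority, n_new, k=3, theta=6.0, seed=0):
    """Sketch of Eq. (3): place new samples near the mean radius of the class."""
    rng = random.Random(seed)
    dim = len(minority[0])
    # Sample centre x_c: coordinate-wise mean of the minority class (assumption).
    x_c = [sum(x[j] for x in minority) / len(minority) for j in range(dim)]
    # Mean radius d_m: average distance from the samples to the centre (assumption).
    d_m = sum(math.dist(x, x_c) for x in minority) / len(minority)
    if d_m == 0.0:
        raise ValueError("all minority samples coincide with the centre")
    synthetic = []
    while len(synthetic) < n_new:
        chosen = rng.sample(minority, k)
        # Sum of the k centre-to-sample vectors v_i.
        direction = [sum(x[j] - x_c[j] for x in chosen) for j in range(dim)]
        norm = math.hypot(*direction)
        if norm == 0.0:
            continue  # vectors cancelled out; draw another set of k samples
        # r ~ N(d_m, d_m / theta), with d_m / theta used as standard deviation.
        r = rng.gauss(d_m, d_m / theta)
        synthetic.append([c + r * d / norm for c, d in zip(x_c, direction)])
    return synthetic, x_c, d_m

minority = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
synthetic, x_c, d_m = mean_radius_smote(minority, n_new=20, k=2, theta=8.0)
```

Under these assumptions a larger θ narrows the spread of r, so the new samples concentrate more tightly around the mean radius.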

In order to show the flow of the algorithm more conveniently, we draw the flow chart of the MeanRadius-SMOTE algorithm, as shown in Figure 1.

**Figure 1.** The flow chart of the MeanRadius-SMOTE algorithm.

In the MeanRadius-SMOTE algorithm, k and θ are hyperparameters of the algorithm, which are determined according to the number of minority class samples and the imbalance rate. If k is too large, the direction of the new sample relative to the sample center will become meaningless, and θ directly affects the distribution of the new sample. As shown in Figure 2, new samples under different θ are likely to be distributed in colored areas. When θ is too small, the new sample may be far from the sample center. When θ is too large, the new sample is too conservative and cannot balance the number of positive and negative samples near the decision boundary. Therefore, in general, the selection range of parameter k is 2 to 5 and the selection range of parameter θ is 4 to 10.

**Figure 2.** New samples distribution under different θ.

For mechanical equipment, some concurrent faults and the original fault have similar vibration states, and the two types of samples often overlap in distribution. Whether the classifier can find an excellent decision boundary is the key to determining the accuracy of the model. In the MeanRadius-SMOTE algorithm, most of the new samples are concentrated around the sample radius to ensure the validity of the new samples. The new sample is determined by k samples and is related to the sample center, so that the new sample can better inherit the features of the minority class. The geometric positions of new samples generated by different oversampling algorithms have their own characteristics, so we plot the examples of SMOTE, LR-SMOTE, and MeanRadius-SMOTE on two-dimension feature samples, as shown in Figure 3. The information of the two-dimension feature samples is shown in Table 1.

**Figure 3.** New samples on oversampling algorithms: (**a**) SMOTE, (**b**) LR-SMOTE, (**c**) MeanRadius-SMOTE.


**Table 1.** The information of the two-dimension feature samples.

The new samples of SMOTE are more inclined to be generated in locations with a high density of the original samples. Since LR-SMOTE randomly chooses a sample to determine the orientation of the new sample, the new sample is more clustered and radial. In MeanRadius-SMOTE, the orientation of new samples is relatively random, and the new samples are generated around the sample radius.

### **4. Experimental Preparation**

### *4.1. Data Set*

Our experimental data set is the 2009 PHM data challenge of gearbox [37]. The data set is a typical industrial gearbox data set, which contains 3 shafts, 4 gears, and 6 bearings, and its experimental bench is shown in Figure 4. The data set tests two sets of gears: spur gear and helical gear. The spur gear data set contains 8 health states, and the helical gear data set contains 6 health states. The data set consists of two channels of accelerometer signals and one channel of tachometer signals. The sampling frequency is 66.67 kHz, and the tachometer signals are collected at 10 pulses per revolution. There are five types of shaft speeds: 30 Hz, 35 Hz, 40 Hz, 45 Hz, and 50 Hz, with high and low loads. In the experiment, we chose the low load spur gear operating data at 30 Hz, and we used the vibration data of the two acceleration channels for feature extraction. The 8 health states of the spur gears are listed in Table 2.

**Figure 4.** Gearbox used in PHM 2009 challenge data.

**Table 2.** A brief description of the faults.


Mechanical equipment frequently fails in the harsh environment of high temperature and high pressure due to concurrent failures composed of multiple single failures [38]. In the PHM dataset, there are many types of concurrent failures, such as labels 4 to 8. They are all combinations of different types of failures in gears and bearings.

For the vibration signal, we sampled the data set using a sliding window with a stride of 100 and a width of 1000. Then we extracted time–frequency domain features for each vibration signal sample and added labels [39]. The formulas of the 23 features are shown in Table 3.
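The windowing step can be illustrated with a short Python sketch. The function name `sliding_windows` is hypothetical, and the toy signal only stands in for one accelerometer channel; the feature extraction applied to each window is not shown.

```python
def sliding_windows(signal, width=1000, stride=100):
    """Cut a 1-D signal into overlapping windows of `width` points every `stride` points."""
    last_start = len(signal) - width
    return [signal[s:s + width] for s in range(0, last_start + 1, stride)]

signal = list(range(1500))          # toy stand-in for one vibration channel
windows = sliding_windows(signal)   # window starts: 0, 100, ..., 500
```

With a stride of 100 and a width of 1000, consecutive windows overlap by 900 points, which is what allows a single recording to yield many samples per label.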


**Table 3.** The time–frequency domain features.

In the experiment, we used the K-nearest neighbor algorithm to denoise the data set. If none of the five nearest samples around a sample belong to its class, we consider it to be a noise sample and delete it. After the above preprocessing, we obtained 2656 samples per label, of which 1000 samples per label were taken as the test set. The remaining samples were used to construct unbalanced data sets.
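The denoising rule described above can be sketched as follows. This is a brute-force illustration under the assumption that a sample counts as noise when none of its k nearest neighbours (Euclidean distance) shares its label; the function name `knn_denoise` is hypothetical.

```python
import math

def knn_denoise(samples, labels, k=5):
    """Return the indices of samples that have at least one same-class
    sample among their k nearest neighbours."""
    keep = []
    for i, (x, y) in enumerate(zip(samples, labels)):
        others = [j for j in range(len(samples)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, samples[j]))[:k]
        if any(labels[j] == y for j in nearest):
            keep.append(i)
    return keep

# Two clusters plus one class-1 point placed inside the class-0 cluster.
samples = [[float(v), 0.0] for v in range(6)]
samples += [[float(v), 0.0] for v in range(100, 106)]
samples += [[2.5, 0.0]]
labels = [0] * 6 + [1] * 6 + [1]
kept = knn_denoise(samples, labels)
```

The pairwise search is quadratic in the number of samples, so a spatial index would be preferable for data sets of the size used in the experiments.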

### *4.2. Classifiers*

In order to comprehensively evaluate the oversampling algorithm, we chose different classifiers to build the experimental model, which excludes the influence of the classifier and verifies the generality of the oversampling algorithm. Through experiments in a large number of mechanical fault diagnoses, the SVM classifier generally has a good training effect, so we chose SVM to establish a classification model. With the continuous development of the decision tree algorithm, the ensemble learning model is also favored by scholars because of its excellent generalization ability. Therefore, we chose RF (Random Forest) representing bagging ensemble mode, and GBDT (Gradient Boosting Decision Tree) representing boosting ensemble mode for experiments.

### *4.3. Evaluation Indicators*

Traditional evaluation indicators can well evaluate the performance of the model in the two-category problem. However, in the multi-classification problem, due to the partiality of the classifier, these indicators cannot comprehensively evaluate the model on unbalanced data sets. The expectation of the oversampling algorithm in this paper is to improve the prediction performance of the minority class without losing the overall prediction accuracy of the classifier. Therefore, we will use the traditional evaluation indicators and the prediction indicator of the minority class to evaluate the prediction model. For class *i* samples, we define the prediction results as shown in Table 4.

**Table 4.** Predicting results for class *i* samples.


We choose the following four evaluation indicators:

(1) Accuracy (Acc): The Acc value is the ratio of the number of correctly predicted samples to the total number of samples. The calculation method is as shown in Equation (4):

$$\text{Acc} = \frac{\sum\_{i} (\text{TP}\_{i} + \text{TN}\_{i})}{\sum\_{i} (\text{FP}\_{i} + \text{TN}\_{i} + \text{TP}\_{i} + \text{FN}\_{i})} \tag{4}$$

The Acc value evaluates the overall prediction, but in the case of unbalanced data sets, it is not a good indicator to measure the results.

(2) Macro-Precision (Mac-P): The calculation method of the Precision value for class *i* samples is as shown in Equation (5):

$$\text{Precision}\_{i} = \frac{\text{TP}\_{i}}{\text{TP}\_{i} + \text{FP}\_{i}} \tag{5}$$

In the multi-classification problem, the Precision value is divided into Macro and Micro methods. Micro-Precision focuses more on types of samples with a large number of samples, so it is more susceptible to the majority class. However, Mac-P will treat each type of sample equally, so it can better describe the model's ability to deal with unbalanced data sets. The calculation method is as shown in Equation (6):

$$\text{Mac-P} = \frac{\sum\_{i} \text{Precision}\_{i}}{n} \tag{6}$$

(3) Macro-F1 (Mac-F1): It is contradictory to improve the Precision value and Recall value at the same time. The F1 value is a balance point with high Precision value and high Recall value, and its calculation method is as shown in Equation (7):

$$\text{F1}\_{i} = \frac{2 \ast \text{Precision}\_{i} \ast \text{Recall}\_{i}}{\text{Precision}\_{i} + \text{Recall}\_{i}} \tag{7}$$

In the multi-classification problem, the F1 value also has Macro and Micro variants, like the Precision value. This paper selects Mac-F1, which can better take into account the minority class. The calculation method is as shown in Equation (8):

$$\text{Mac-F1} = \frac{\sum\_{i} \text{F1}\_{i}}{n} \tag{8}$$

(4) Precision-Minority (Pre*small*): In order to pay more attention to the prediction effect of the model on the minority class samples after oversampling algorithms, we will calculate the Precision value of the minority class as an indicator, and its calculation method is as shown in Equation (9):

$$\text{Pre}\_{small} = \frac{\text{TP}\_{small}}{\text{FP}\_{small} + \text{TP}\_{small}} \tag{9}$$
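The four indicators can be computed from true and predicted labels with the sketch below. It is an illustration only: Acc is computed directly from its verbal definition (correct predictions over all predictions), Recall uses the standard definition TP/(TP + FN), n is taken as the number of classes, a class with no predicted samples is given a Precision of 0, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred, minority_label):
    """Acc, Mac-P, Mac-F1 and Pre_small from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precision, f1 = {}, {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision[c] = tp / (tp + fp) if tp + fp else 0.0   # Eq. (5)
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision[c] + recall
        f1[c] = 2 * precision[c] * recall / denom if denom else 0.0  # Eq. (7)
    n = len(classes)
    return {
        "Acc": sum(1 for t, p in pairs if t == p) / len(pairs),
        "Mac-P": sum(precision.values()) / n,    # Eq. (6)
        "Mac-F1": sum(f1.values()) / n,          # Eq. (8)
        "Pre_small": precision[minority_label],  # Eq. (9)
    }

scores = evaluate([0, 0, 0, 0, 1, 1, 2], [0, 0, 0, 1, 1, 1, 0], minority_label=1)
```

In this toy example class 2 is never predicted correctly, which pulls Mac-P and Mac-F1 well below Acc and shows why the macro-averaged indicators are more sensitive to minority classes.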

### **5. Experimental Design and Results**

In this paper, we will design unbalanced data sets of various sizes for experiments. According to the distribution of sample data volume within each class, unbalanced data sets can be divided into two forms, linear imbalance and step imbalance. The distribution of sample data volume for the two forms is as shown in Figure 5.

**Figure 5.** Two imbalance forms: (**a**) linear imbalance, (**b**) step imbalance.

In this paper, we design three linear unbalanced data sets and four step unbalanced data sets. In order to reduce the interference of the class on the Pre*small* in different experiments, we set the number of samples of label 4 to 50, making it the smallest minority class. We set the normal label as the large sample class, and the imbalance rate is designed to be 30, 20, and 15, through which the number of samples of the other labels can be determined. The details of the seven unbalanced data sets are shown in Table 5. For line-1 to line-3, the imbalance rates differ, and the linear order of the labels is shuffled. For stage-1 to stage-4, there are differences in the imbalance rate and the ratio of minority class and majority class labels.
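To make the two imbalance forms concrete, the following sketch derives per-class sample counts from a minority size and an imbalance rate. It does not reproduce Table 5: the imbalance rate is assumed to be the ratio of the largest to the smallest class size, the linear form is assumed to use evenly spaced counts, and the helper names `linear_counts` and `step_counts` are hypothetical.

```python
def linear_counts(n_classes, minority, rate):
    """Class sizes decreasing evenly from minority * rate down to minority."""
    majority = minority * rate
    step = (majority - minority) / (n_classes - 1)
    return [round(majority - i * step) for i in range(n_classes)]

def step_counts(n_classes, minority, rate, n_minority):
    """Class sizes with only two levels: majority classes and minority classes."""
    majority = minority * rate
    return [majority] * (n_classes - n_minority) + [minority] * n_minority

line_example = linear_counts(n_classes=8, minority=50, rate=30)
step_example = step_counts(n_classes=8, minority=50, rate=20, n_minority=4)
```

With a minority size of 50 and a rate of 30, the largest class needs 1500 samples, which fits within the 1656 training samples per label left after the test set is held out.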


**Table 5.** Unbalanced data sets description.

In the experiment, we will use the SMOTE, LR-SMOTE, and MeanRadius-SMOTE to oversample the seven unbalanced data sets, so that each class label becomes balanced. Then, we conduct experiments on the original data set and the three processed data sets on SVM, RF, and GBDT classifiers. In order to eliminate the training bias caused by random data, all experiments were performed with 5-fold cross-validation and repeated 10 times to obtain the average number of indicators.
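The repeated cross-validation protocol can be sketched with index bookkeeping alone. The generator below is a plain-Python illustration with a hypothetical name (`repeated_kfold_indices`); it yields 5 × 10 = 50 train/test index splits over which an indicator would be averaged, and it does not stratify by class.

```python
import random

def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for every repetition
        folds = [idx[f::n_splits] for f in range(n_splits)]
        for f in range(n_splits):
            test = folds[f]
            train = [i for g in range(n_splits) if g != f for i in folds[g]]
            yield train, test

splits = list(repeated_kfold_indices(20))
```

One common precaution, which the source does not state explicitly, is to apply the oversampling only to the training indices of each split so that synthetic samples never leak into the evaluation fold.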

The experimental results of Acc, Mac-P, and Mac-F1 on the linear unbalanced data sets and step unbalanced data sets are shown in Tables 6 and 7, where bold values indicate the largest value among the four compared models.

From Table 6, it can be found that the oversampling algorithm can effectively improve Acc, Mac-P, and Mac-F1, and MeanRadius-SMOTE is the best in most cases. In some experiments, SMOTE performs better than MeanRadius-SMOTE, but the gap between them is very small. However, in the SVM experiment, MeanRadius-SMOTE improves the three indicators much better than SMOTE and LR-SMOTE.

From Table 7, since there are more minority classes in the step unbalanced data sets, the three indicators are all lower in the experiments without oversampling, and are more affected by the imbalance rate. The SVM classifier combined with any oversampling algorithm is better than the ensemble learning classifier, and there are obvious gaps in the three indicators. On the step unbalanced data sets, MeanRadius-SMOTE outperforms SMOTE and LR-SMOTE in all cases, and the gap is especially significant on the SVM classifier.


**Table 6.** Experimental results of the linear unbalanced data set.

**Table 7.** Experimental results of the step unbalanced data set.


By analyzing Acc, Mac-P, and Mac-F1, all oversampling algorithms can effectively improve the overall prediction performance of the classifier on both forms of unbalanced data sets, and the MeanRadius-SMOTE algorithm proposed in this paper has the most obvious effect. We still need to focus on the prediction performance of the algorithm on the minority class; the experimental results of Pre*small* are shown in Table 8, where bold values indicate the largest value among the four compared models.

From Table 8, Pre*small* does not even exceed five in the None experiments. SMOTE and LR-SMOTE only improved Pre*small* by around five in most experiments. However, MeanRadius-SMOTE can help the classifier to more accurately predict the minority class, improving Pre*small* by around six or seven. In addition, MeanRadius-SMOTE is more stable in experiments with different imbalance rates, and does not fluctuate greatly like SMOTE and LR-SMOTE.


**Table 8.** Pre*small* on the data sets.

To better compare the effects of SMOTE, LR-SMOTE, and MeanRadius-SMOTE, we draw the line charts of Mac-P, Mac-F1, and Pre*small*, as shown in Figure 6. Since the data of Acc and Mac-F1 are close and their trend is basically the same, we only choose Mac-F1 to draw the line chart.

**Figure 6.** The line charts of Mac-P, Mac-F1, and Pre*small*.

According to Figure 6, the following conclusions can be drawn:


In summary, MeanRadius-SMOTE shows excellent performance in all experiments, which can take into account the prediction performance of the overall and minority class. In individual experiments, SMOTE is slightly higher than MeanRadius-SMOTE in Acc, Mac-P, and Mac-F1, but lower than MeanRadius-SMOTE in Pre*small*. We can think that this is the result of sacrificing the prediction performance of the minority class. Therefore, it can still be considered that MeanRadius-SMOTE is better than SMOTE and LR-SMOTE. Furthermore, the model composed of MeanRadius-SMOTE and SVM can improve prediction accuracy and stability.

### **6. Conclusions and Outlook**

Mechanical fault diagnosis has always been a key issue in the PHM. Since the development of machine learning, although mechanical fault diagnosis has been solved by many effective methods, fault diagnosis under unbalanced data sets has always been a stubborn problem. The oversampling algorithm is currently recognized as an effective means to solve the problem of unbalanced data sets. The traditional oversampling algorithm is not only affected by the sample distribution, but also easily generates noise samples, which makes the decision boundary blurred. These drawbacks are not conducive to the classifier making predictions.

Based on the SMOTE, this paper proposes the new algorithm, MeanRadius-SMOTE, combining the sample center and radius. MeanRadius-SMOTE effectively avoids useless samples and noise samples in the process of generating new samples. In this paper, we conduct comparative experiments for SMOTE, LR-SMOTE, and MeanRadius-SMOTE algorithms and use SVM, RF, and GBDT classifiers on three linear unbalanced data sets and four step unbalanced data sets. Experimental results show that the MeanRadius-SMOTE algorithm can effectively balance data classes and improve the prediction performance of machine learning classifiers. From the perspective of various indicators, the MeanRadius-SMOTE algorithm is better than SMOTE and LR-SMOTE, and has better robustness. In the problem of unbalanced data sets, MeanRadius-SMOTE can more accurately predict the minority class without sacrificing the prediction performance of other classes, which is of great significance for mechanical fault diagnosis, and the combined model of MeanRadius-SMOTE and SVM is proved to be much better than other models.

Although this paper proves on PHM09 challenge data that MeanRadius-SMOTE has a good ability to deal with unbalanced data sets, considering the actual situation, future research can be carried out from the following aspects:


**Author Contributions:** Data curation, F.D. and S.Z.; validation, F.D.; investigation, S.Z. and Y.Y.; writing—original draft preparation, F.D. and Y.Y.; writing—review and editing, F.D., S.Z., and Z.C.; supervision, S.Z. and Z.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This article was supported by the National Natural Science Foundation of China (71871181); the Basic Research Project of Natural Science of Shaanxi Province (2022JM-433); the Key R&D Program of Shaanxi Province (2022GY-207, 2021ZDLGY10-06).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data in this article are the 2009 PHM data challenge of the gearbox.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


### *Article* **The Robust Multi-Scale Deep-SVDD Model for Anomaly Online Detection of Rolling Bearings**

**Linlin Kou 1,2, Jiaxian Chen 3, Yong Qin 1 and Wentao Mao 3,\***


**Abstract:** In the online detection of rolling bearing anomalies, the limited amount of target bearing data leads to insufficient model training and weak feature representation, which makes it difficult for the online detection model to construct an accurate decision boundary. To solve this problem, a multi-scale robust anomaly detection method based on data enhancement technology is proposed in this paper. Firstly, the training data are transformed into multiple subspaces through the data enhancement technology. Then, a prototype clustering method is introduced to enhance the robustness of feature representation under the framework of the robust deep auto-encoding algorithm. Finally, the robust multi-scale Deep-SVDD hyper sphere model is constructed to achieve online detection of abnormal state data. Experiments are conducted on the IEEE PHM Challenge 2012 bearing data set and the XJTU-SY data set. The proposed method shows much greater sensitivity to incipient faults and produces fewer false alarms. The robust multi-scale Deep-SVDD hyper sphere model significantly improves the performance of incipient fault detection for rolling bearings.

**Keywords:** incipient fault detection; robustness; reinforcement learning; anomaly detection

### **1. Introduction**

As a special kind of mechanical part, the rolling bearing has a decisive impact on the operation and reliability of mechanical equipment [1]. Once damaged, it can cause major losses to industrial production and personal property. Detecting abnormalities in the early stages of a bearing fault, and performing accurate and reliable detection and diagnosis, helps operators take timely maintenance measures and avoid major accidents. Incipient fault detection is a key link in Prognostics and Health Management (PHM) for rolling bearings [2].

For signal analysis-based incipient fault detection methods, noise elimination and noise utilization are first applied to the vibration signal. Then, time domain, frequency domain or time-frequency domain analysis is performed to extract and compare fault characteristics [3–5]. Vibration spectra were analyzed in [6] using a set of different recurrence indicators to describe the response of the bearing to the optimal clearance. Acoustic emission and lubricating oil characteristics are also very helpful for condition monitoring of bearings. Liu et al. [7] proposed a modified time-dependent excitation (TDE) model based on acoustic emission signals to detect defects of angular contact ball bearings. Chen et al. [8] addressed low-speed rolling bearing fault detection with subspace embedded feature distribution alignment and a Structural Risk Minimization framework based on acoustic emission signals. Maroua et al. [9] analyzed the performance of different kinds of rolling bearings under five fully formulated axle gear oils with different viscosities and formulations.

**Citation:** Kou, L.; Chen, J.; Qin, Y.; Mao, W. The Robust Multi-Scale Deep-SVDD Model for Anomaly Online Detection of Rolling Bearings. *Sensors* **2022**, *22*, 5681. https://doi.org/10.3390/s22155681

Academic Editors: Yongbo Li, Bing Li, Jinchen Ji and Hamed Kalhori

Received: 4 July 2022 Accepted: 27 July 2022 Published: 29 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Those methods can extract incipient fault features from the original signal, which serve as the input feature vector of a classifier or as an indicator of rolling bearing incipient faults. However, de-noising has the disadvantage of weakening the fault information. In addition, these time-frequency domain methods cannot adaptively extract features, which weakens their ability to detect bearing faults early.

In recent years, machine learning-based methods have been widely applied in many industries. In reference [10], an impact time-frequency dictionary was first built to extract signal features with a short-time matching method, and then a support vector machine (SVM) served as the classifier for incipient fault states. The supervised local Fisher discriminant and K-nearest neighbor methods were introduced for weak fault diagnosis in [11], where they perform feature reduction and incipient fault state classification, respectively. Ocak et al. [12] proposed a Hidden Markov Model (HMM) for rolling bearing fault detection and diagnosis, which can identify and detect early failures by tracking the probability changes of the pre-trained HMM under normal conditions. These methods usually use part of the normal state data in the initial stage to establish a single classification model, or employ existing normal state samples to construct abnormal discrimination criteria. However, bearing signals contain a certain amount of noise even in the normal state, and methods of this kind cannot automatically adapt to the irregular data fluctuations caused by such noise, which may cause false alarms.

In the past decade, deep learning has become an efficient way to detect and diagnose faults in many fields [13–17]. According to the authors' literature research, however, deep learning is still in its infancy for incipient fault detection. Lu et al. [18] used deep neural networks (DNN) and long short-term memory (LSTM) to construct an online data distribution estimator, and used the prediction bias value generated by the estimator to identify the incipient fault location. A two-way Gated Recurrent Unit (GRU) network with local features was proposed for different types of faults to realize effective identification of incipient faults [19]. A new framework for rotor-bearing system fault diagnosis under varying working conditions was proposed in [20]; it introduced stochastic pooling and the leaky rectified linear unit to overcome the training problems of the classical CNN. Chen et al. studied mixed fault diagnosis across multiple components by combining two 1-D convolutional neural networks (CNNs) [21]. Mao et al. [22] proposed an online detection method based on self-adaptive deep feature matching for incipient faults of rolling bearings. However, because incipient faults are weak, the feature representation ability is poor. At the same time, due to the constraints of online application scenarios, the amount of available target bearing data is limited. These methods therefore have insufficient normal state data, which hinders the construction of an accurate decision boundary for online detection models.

Machine learning is used to extract regular information from training data and learn pattern recognition knowledge. Current deep neural networks usually have a very large number of parameters and require sufficient training data to obtain good results. When the amount of data is limited, data enhancement technology can be used to increase the diversity of training data. Meanwhile, transformation operations can improve the feature representation ability of training data, so that the feature information of the training data is richer and the problem of model overfitting can be avoided. Existing data enhancement methods include geometric transformation [23], color space enhancement [24], kernel filters [25], generative adversarial networks [26], which are based on adversarial training, and neural style transfer [27], among others. A geometric transformation is applied in [28] to solve the data imbalance problem for bearing fault detection.

To solve these problems, an incipient fault detection method based on a multi-scale Deep-SVDD model with data enhancement is proposed in this research. First, the training data are transformed into multiple subspaces through data enhancement technology. Second, the prototype clustering method is introduced to improve the robustness of features under the framework of the robust deep auto-encoding (RDA) algorithm, and then a robust multi-scale Deep-SVDD hyper sphere model is constructed. Finally, the product of the probabilities that each transformed sample is located in its respective subspace is calculated as the anomaly score to achieve early online fault detection. The effectiveness of the proposed method is verified by experiments on the IEEE PHM Challenge 2012 bearing dataset and the XJTU-SY dataset. The contributions of this paper can be summarized as follows. (1) A robust multi-scale Deep-SVDD hyper sphere model is proposed for online anomaly detection. Data enhancement technology enriches the data information. By extracting robust low-rank deep features, this method enhances the capacity of multi-scale feature representation and has good robustness. (2) An anomaly alarm indicator is built for online scenarios. This indicator is based on robust low-rank feature extraction and can measure abnormality, which makes it well suited to online applications. The details of this work are as follows.

#### **2. Deep Support Vector Data Description**

Deep support vector data description (Deep-SVDD) [29] is a representative method of using deep learning for anomaly detection in recent years. The nonlinear high-dimensional mapping is replaced by a neural network in this method, which improved the ability in dealing with high-dimensional and very large data sets. Deep-SVDD can take advantage of deep learning to deal with high-dimensional representation and processing of massive data.

Deep-SVDD constructs a neural network mapping. The method minimizes the volume of the hyper sphere containing the data features in the network when solving, and obtains the high-dimensional space common feature representation of normal data. The objective function is:

$$\min\_{\mathcal{W}} \frac{1}{n} \sum\_{i=1}^{n} \left\| \phi(\mathbf{x}\_{i}; \mathcal{W}) - \mathbf{c} \right\|\_{2}^{2} + \frac{\lambda}{2} \sum\_{l=1}^{L} \left\| \mathbf{W}^{l} \right\|\_{F}^{2} \tag{1}$$

The objective function consists of two items. The first item is a quadratic loss that penalizes the distance between each sample feature and the center of the hyper sphere, and the second is a regularization item that constrains the network weights to prevent overfitting. Here, *φ* is the neural network mapping function, *xi*, *i* = 1, 2, ..., *n*, are the sample data, *W* = {*W*<sup>1</sup>, ..., *W<sup>L</sup>*} is the set of weight parameters of the network, *c* is the center of the hyper sphere, *L* is the number of layers of the neural network, *l* ∈ {1, 2, ..., *L*}, *λ* is the hyperparameter that controls the weight decay, and *W<sup>l</sup>* is the weight of the *l*th hidden layer.

Optimizing the first item lets the network learn parameters *W* such that data points are closely mapped to the center *c*, which in effect minimizes the volume of the hyper sphere enclosing the normal data. The second item acts only as a weight decay regularizer.

Center *c* is fixed in the neighborhood of the initial network outputs, which makes stochastic gradient descent (SGD) convergence faster and more robust [30].

The abnormal score of a sample under the Deep-SVDD algorithm is calculated as follows:

$$s(\mathbf{x}) = \left\| \phi(\mathbf{x}; \mathcal{W}^\*) - \mathbf{c} \right\|^2 \tag{2}$$

where *W*∗ is the network parameter of a trained model.

The larger *s*(*x*) is, the farther the sample lies from the center of the hyper sphere and the higher its degree of abnormality.
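As an illustration of Equations (1) and (2), the following minimal NumPy sketch evaluates the objective and the abnormal score. The function `phi` is a hypothetical stand-in for the network mapping (two bias-free tanh layers with random weights), and fixing *c* as the mean of the initial outputs is an assumed concrete choice for a center "in the neighborhood of the initial network outputs". It is not the authors' implementation.

```python
import numpy as np

def phi(x, weights):
    """Hypothetical stand-in for phi(x; W): bias-free tanh layers."""
    h = x
    for w in weights:
        h = np.tanh(h @ w)
    return h

def anomaly_score(x, weights, c):
    """Equation (2): squared distance of each mapped sample to the center c."""
    return np.sum((phi(x, weights) - c) ** 2, axis=1)

def deep_svdd_objective(x, weights, c, lam):
    """Equation (1): mean squared distance to c plus weight decay."""
    decay = sum(np.sum(w ** 2) for w in weights)  # squared Frobenius norms
    return anomaly_score(x, weights, c).mean() + 0.5 * lam * decay

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(8, 4)),
           rng.normal(scale=0.5, size=(4, 2))]
x_train = rng.normal(size=(100, 8))          # synthetic "normal" samples

# Assumption: fix the center as the mean of the initial network outputs.
c = phi(x_train, weights).mean(axis=0)

scores = anomaly_score(x_train, weights, c)
objective = deep_svdd_objective(x_train, weights, c, lam=1e-3)
```

In a real model the weights would be updated by SGD to reduce `objective` while `c` stays fixed; the sketch only evaluates the two formulas.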

#### **3. The Robust Multi-Scale Deep-SVDD Model of Incipient Fault Detection**

In this section, the proposed incipient fault detection method for bearings is divided into an offline stage and an online stage. In the offline stage, data enhancement technology is employed to transform a small number of normal samples into multiple feature spaces. On this basis, the prototype clustering loss and the multi-hyper sphere Deep-SVDD center loss are introduced to train the robust multi-scale Deep-SVDD model and to obtain a model for each transformation in the feature space. The distance-based cross entropy is then used to determine the distance score threshold of the normal period data. In the online stage, the test samples to be detected are subjected to the same transformation enhancement and are then fed into the trained deep model to extract deep features. The extracted deep features are used with each prototype center to calculate the distance-based anomaly score, which is finally compared with the threshold value. When the score is less than the threshold, the sample is regarded as normal; otherwise, it is judged to be abnormal. Each step of the proposed robust multi-scale Deep-SVDD model is elaborated in the following. The detailed flow chart is shown in Figure 1.

**Figure 1.** Flowchart of the Robust Multi-scale Deep-SVDD Model.

### *3.1. Signal Enhancement*

Vibration signal is a special one-dimensional datum and there is no specific neighborhood or order. Thus, traditional geometric transformations such as translation and rotation cannot be performed. In order to enable the transformation-based method to process vibration signal data, we propose a data transformation method for vibration signals. Specifically, we propose two transformations of vibration signals from the perspective of graphics.

### 3.1.1. Horizontal Scaling

First, we crop a length of *p*% (0 ≤ *p* ≤ 50) from either end of the original signal. To keep the dimension of the feature space unchanged after the transformation, we resample the cropped signal to the length of the original signal, which is equivalent to stretching the original signal in the horizontal direction from a graphical point of view. In addition, to reduce information loss, we crop each of the two ends of a vibration signal sample in turn to obtain two sets of data of equal length, which are used as the two channels of the transformed data, so that signal samples are displayed in the same feature space at different scales. As shown in Figure 2, the original signal length is 1280, the cropping parameter *p* is set to 30, and the two channels of the transformed sample are obtained by cropping from the left and right ends, respectively.

**Figure 2.** Schematic diagram of the horizontal scaling of the vibration signal with (**a**) the original signal and (**b**) two transformed channel signals.
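The horizontal scaling described above can be sketched in Python as follows. The text does not specify the resampling method, so linear interpolation with `numpy.interp` is an assumption here, as are the rounding of the crop length and the order of the two channels. The function names are illustrative.

```python
import numpy as np

def resample_to_length(segment, length):
    """Linearly interpolate `segment` onto `length` evenly spaced points."""
    old_pos = np.linspace(0.0, 1.0, num=len(segment))
    new_pos = np.linspace(0.0, 1.0, num=length)
    return np.interp(new_pos, old_pos, segment)

def horizontal_scaling(signal, p):
    """Crop p% from the left end and from the right end, then stretch each
    crop back to the original length. Returns shape (2, len(signal))."""
    if not 0 <= p <= 50:
        raise ValueError("p must lie in [0, 50]")
    n = len(signal)
    k = int(round(n * p / 100.0))      # number of points removed (assumed rounding)
    left_cropped = signal[k:]          # left end removed
    right_cropped = signal[:n - k]     # right end removed
    return np.stack([resample_to_length(left_cropped, n),
                     resample_to_length(right_cropped, n)])

# Synthetic signal with the length used in Figure 2 (1280 points, p = 30).
signal = np.sin(np.linspace(0, 20 * np.pi, 1280))
channels = horizontal_scaling(signal, p=30)
```

With *p* = 0 nothing is cropped and both channels equal the original signal.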

### 3.1.2. Vertical Scaling

We set a scaling parameter 0 < *α* < 1 and transform the original signal as follows:

$$f(\mathbf{x}\_i) = \begin{cases} (1+\alpha)\mathbf{x}\_i, \mathbf{x}\_i < 0\\ (1-\alpha)\mathbf{x}\_i, \mathbf{x}\_i \ge 0 \end{cases} \tag{3}$$

Vertical scaling does not change the length of the signal and the signal can extend or shorten in the vertical direction from the graph. This transformation is also the display of different scales of the signal. Similarly, to ensure the consistency of the feature space after transformation, the samples after vertical transformation are set to be two channels.
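Equation (3) can be written as a one-line NumPy operation. The sketch below returns a single transformed channel; how the two channels mentioned above are arranged is not specified in the text, so that step is left out.

```python
import numpy as np

def vertical_scaling(signal, alpha):
    """Equation (3): scale negative points by (1 + alpha) and
    non-negative points by (1 - alpha)."""
    signal = np.asarray(signal, dtype=float)
    return np.where(signal < 0, (1 + alpha) * signal, (1 - alpha) * signal)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = vertical_scaling(x, alpha=0.3)   # -> [-2.6, -0.65, 0.0, 0.35, 1.4]
```

The negative half of the waveform is stretched while the non-negative half is compressed, and the signal length is unchanged.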

By setting different parameters and different combinations of horizontal and vertical transformation, we can obtain a variety of transformation methods to process the original signal. In this paper, the original vibration signal sample space *X* is transformed to obtain *M* subspaces *X*1, ..., *XM*. The transformed samples are represented as *T*(*x*, 1), ..., *T*(*x*, *M*).

### *3.2. Prototype Clustering*

Prototype clustering is a clustering algorithm that uses the prototype to represent the center of the cluster. The prototype clustering algorithm usually needs to initialize the prototype cluster center and then employs the idea of iterative solution to find the cluster prototype.

Learning Vector Quantization (LVQ) [31] is a typical prototype clustering method. The LVQ algorithm uses prototype vectors to represent clusters. The samples are assumed to be labeled, and the label information is used as an aid in the iterative optimization process to find the optimal prototype vectors, which represent the cluster structure. The high-dimensional clustering space is divided into *n* clusters, and each prototype vector represents a cluster. The solution steps of the LVQ algorithm are as follows (Algorithm 1):


The algorithm finally learns a set of prototype vectors, each of which represents the center of a certain region, equivalent to the center point of a Voronoi cell in spatial geometry. This center point is the center of the transformed samples in the neural network feature space.
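As an illustration of the general technique, a basic LVQ1-style update can be sketched in Python as below. The choices here are assumptions rather than the authors' exact procedure: one prototype per label, random initialization from a sample of the same class, a fixed learning rate `lr`, and a fixed number of iterations.

```python
import numpy as np

def lvq_fit(x, y, lr=0.1, n_iter=2000, seed=0):
    """Basic LVQ1 with one prototype per class label (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    # Initialise each prototype with a random sample of its own class.
    protos = np.stack(
        [x[rng.choice(np.flatnonzero(y == lab))] for lab in labels]
    ).astype(float)
    for _ in range(n_iter):
        j = rng.integers(len(x))                        # pick a random sample
        dists = np.sum((protos - x[j]) ** 2, axis=1)
        k = np.argmin(dists)                            # nearest prototype
        step = lr * (x[j] - protos[k])
        # Attract the prototype if the labels agree, otherwise repel it.
        protos[k] += step if labels[k] == y[j] else -step
    return labels, protos

# Two synthetic, well-separated clusters standing in for two subspaces.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(5.0, 0.3, size=(100, 2))])
y = np.repeat([0, 1], 100)
labels, protos = lvq_fit(x, y)
```

Each learned prototype ends up close to the mean of its own cluster, which is the role the prototype centers play in the feature space.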

#### *3.3. Distance-Based Cross Entropy Loss*

The original sample space *X* undergoes *M* transformations to obtain the transformed samples *T*(*x*, 1), ..., *T*(*x*, *M*). For each transformed sample *T*(*x*, *i*), the following conditional probability is calculated:

$$p\left(T(\mathbf{x},i)\in X\_i\right) = \frac{e^{-\left\|E\_\theta\left(T(\mathbf{x},i)\right) - c\_i\right\|\_2^2}}{\sum\_{j=1}^M e^{-\left\|E\_\theta\left(T(\mathbf{x},i)\right) - c\_j\right\|\_2^2}}\tag{4}$$

where *Eθ* is the network for feature extraction, *ci* is the center of *Xi*.

The distance-based cross entropy (dce) is expressed as:

$$\text{loss}\_{dce} = -\log p\left(T(\mathbf{x}, i) \in X\_i\right) \tag{5}$$

Minimizing distance-based cross-entropy loss can map data samples to the class feature space near the prototype center, and improve the separability between classes. Compared with the softmax loss in traditional neural networks, it is more robust.
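Equations (4) and (5) amount to a softmax over negative squared distances to the prototype centers. The NumPy sketch below assumes the encoded feature is already available as a vector; the centers and the feature are made-up values, and indices are zero-based in the code.

```python
import numpy as np

def subspace_log_prob(feature, centers):
    """Log of Equation (4) for every subspace: a log-softmax over the
    negative squared distances from the feature to each center."""
    neg_d2 = -np.sum((centers - feature) ** 2, axis=1)
    shifted = neg_d2 - neg_d2.max()            # stabilise the exponentials
    return shifted - np.log(np.sum(np.exp(shifted)))

def dce_loss(feature, centers, i):
    """Equation (5): negative log-probability of the feature's own subspace i."""
    return -subspace_log_prob(feature, centers)[i]

centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])   # hypothetical c_1..c_3
feature = np.array([0.1, -0.2])                             # hypothetical E(T(x, 1))

probs = np.exp(subspace_log_prob(feature, centers))
loss_near = dce_loss(feature, centers, i=0)   # feature lies next to center 0
loss_far = dce_loss(feature, centers, i=1)    # same feature, wrong subspace
```

The loss is small when the feature sits next to its own prototype center and grows as it drifts toward another center.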

#### *3.4. Robust Multi-Scale Deep-SVDD*

The main idea of the robust multi-scale Deep-SVDD method proposed in this section is to perform data enhancement on the normal samples in single-class anomaly detection, generate multiple transformations to construct multiple Deep-SVDD hyper spheres, and use the comprehensive score of each transformation across these hyper spheres to measure the degree of sample abnormality.

First, to improve the robustness of feature extraction, we select robust deep auto-encoding as the main framework, in which the encoder of the robust deep auto-encoder performs feature extraction so as to map the original samples to a low-rank feature space. Second, the learning vector quantization method is used to find the prototype centers *c*1, ..., *cM* of the *M* transformed sample subspaces in the low-rank space of the robust deep auto-encoder. On this basis, the Deep-SVDD center loss is added, so that all normal samples are as close as possible to their prototype centers and the intra-class aggregation degree of each subspace is constrained. The final optimization function is as follows:

$$\begin{aligned} \min\_{\theta, S} \ & \left\|L\_D - D\_\theta(E\_\theta(L\_D))\right\|\_2^2 + \text{loss}\_{dce} + \mu \left\|E\_\theta(L\_D) - \mathbf{c}\_i\right\|\_2^2 + \lambda \left\|S\right\|\_{2,1} \\ \text{s.t. } \ & X - L\_D - S = 0 \end{aligned} \tag{6}$$

where *μ* > 0 and *λ* > 0 are regularization coefficients. Increasing the value of *μ* pulls the normal sample features closer to their prototype centers, whereas decreasing it weakens this gathering effect.

#### *3.5. Calculation of Anomaly Score*

After the above steps, the trained model can extract features from the input sample data after specific transformations and provides the corresponding set of prototype centers *c*1, ..., *cM*, so the degree of abnormality of a test sample can be measured. In the test stage, the test sample *x* undergoes *M* transformations to obtain the transformed samples *T*(*x*, 1), ..., *T*(*x*, *M*). The transformed samples are fed into the encoder of the robust deep auto-encoder to extract features. According to Formula (4), the distances between each transformed sample and all prototype centers are calculated to obtain the probability that the transformed sample is located in its own subspace. The probability that the test sample *x* is normal is then the product of the probabilities that all transformed samples are located in their respective subspaces, and the final anomaly score is the negative logarithm of this product:

$$\text{Score}(\mathbf{x}) = -\sum\_{i=1}^{M} \log p\left(T(\mathbf{x}, i) \in X\_i\right) \tag{7}$$

where the score represents the abnormal score of test sample *x*. The higher the score, the more abnormal.

Finally, for the bearing incipient fault detection, we need to determine a threshold for the abnormal score of a normal sample to determine whether the calculated abnormal score of the test sample meets the abnormal standard, that is, whether the bearing operating state is abnormal. In this paper, the maximum value of the training data anomaly score is directly used as the threshold standard.
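The scoring and thresholding steps can be sketched as follows. The encoder outputs are simulated with synthetic two-dimensional features around three hypothetical prototype centers, so only the arithmetic of Equation (7) and the maximum-of-training-scores threshold are illustrated.

```python
import numpy as np

def log_prob_own_subspace(feature, centers, i):
    """Log of Equation (4) for subspace i (zero-based index)."""
    neg_d2 = -np.sum((centers - feature) ** 2, axis=1)
    shifted = neg_d2 - neg_d2.max()
    return shifted[i] - np.log(np.sum(np.exp(shifted)))

def anomaly_score(features, centers):
    """Equation (7): features[i] is the encoded i-th transformed copy of one sample."""
    return -sum(log_prob_own_subspace(f, centers, i)
                for i, f in enumerate(features))

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])   # M = 3, hypothetical

# Synthetic normal training samples: each transformed copy lies near its own center.
train = centers + rng.normal(scale=0.2, size=(50, 3, 2))
train_scores = np.array([anomaly_score(s, centers) for s in train])
threshold = train_scores.max()       # maximum training score used as threshold

# A test sample whose copies are all equidistant from the three centers.
abnormal_test = np.full((3, 2), 2.0)
is_abnormal = anomaly_score(abnormal_test, centers) >= threshold
```

For the equidistant test sample every probability equals 1/3, so its score is 3 log 3, which lies far above the threshold obtained from the tightly clustered training features.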

### **4. Experiment**

Experiments on the IEEE PHM Challenge 2012 and XJTU-SY datasets are performed to verify the effectiveness of the proposed method. The programming environment is Python 3.6.0. The computer used in the experiment is configured with an i5-8400 processor and 16 GB of memory.

### *4.1. Dataset Introduction*

The IEEE PHM Challenge 2012 dataset was collected from the PRONOSTIA platform (shown in Figure 3a) [32], which was specially designed and implemented by the AS2M department of the French FEMTO-ST Institute. It provides the entire life cycle data of rolling bearings through accelerated life degradation experiments. The bearings operate under three different working conditions in these experiments: (1) the engine speed is 1800 rpm and the load is 4000 N; (2) the engine speed is 1650 rpm and the load is 4200 N; (3) the engine speed is 1500 rpm and the load is 5000 N.

The XJTU-SY dataset is provided by the Institute of Design Science and Basic Component at Xi'an Jiaotong University (XJTU) [33], China and the Changxing Sumyoung Technology Co., Ltd. (SY), Zhejiang province, China. The platform is shown in Figure 3b. Three kinds of experimental working conditions were designed in this experiment, and five bearings were tested in each working condition. (1) The engine speed is 2100 rpm and the load is 12 kN. (2) The engine speed is 2250 rpm and the load is 11 kN. (3) The speed is 2400 rpm and the load is 10 kN.

#### *4.2. Model Parameter Settings*

The same data transformations (horizontal and vertical scaling) are applied in all experiments. The horizontal scaling parameter *p* takes 16 values, {0, 2, 4, 6, 8, 11, 14, 17, 20, 23, 27, 31, 35, 39, 43, 47}. The vertical scaling parameter *α* takes 3 values, {0, 0.3, 0.7}. Together, these give 48 combinations of the two transformations. The neural network structure used in the experiment is a deep residual network [34]. In the multi-scale robust Deep-SVDD, *μ* = 0.0001, *λ* = 0.5, *γ* = 0.002, the number of training iterations is 100, and the size of each training batch is 8.
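The 48 transformations are the Cartesian product of the two value sets, which can be enumerated as in the short sketch below (the variable names are illustrative).

```python
from itertools import product

P_VALUES = [0, 2, 4, 6, 8, 11, 14, 17, 20, 23, 27, 31, 35, 39, 43, 47]
ALPHA_VALUES = [0, 0.3, 0.7]

# Each (p, alpha) pair defines one transformation, i.e. one subspace index.
transform_grid = list(product(P_VALUES, ALPHA_VALUES))
print(len(transform_grid))  # 48
```

The pair (0, 0) crops nothing and scales by 1, so it corresponds to the untransformed signal.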

#### *4.3. Incipient Fault Detection Results*

Bearing 1\_2 and bearing 1\_3 in the IEEE PHM Challenge 2012 dataset, as well as bearing 1\_1 and bearing 2\_2 in the XJTU-SY dataset, are the target bearings, as shown in Table 1. In this experiment, the first 100 samples are selected for data transformation to obtain signals at 48 different scales, and the resulting data are then used to train the multi-scale robust Deep-SVDD model. In the test stage, the test samples are first subjected to data transformation and then input into the model to calculate the abnormal score of each sample. The abnormal scores and the RMS values are shown in Figures 4 and 5.

**Table 1.** Experiment dataset.


**Figure 4.** Result comparison of abnormal score and the RMS value with (**a**) PHM1\_2 and (**b**) PHM1\_3.

**Figure 5.** The comparison results of abnormal score and the RMS value with (**a**) XJTU1\_1 and (**b**) XJTU 2\_2.

### *4.4. Comparative Results*

To verify the superiority of the proposed algorithm, it is compared with five other widely used methods for incipient fault diagnosis and detection. Among them, bandwidth empirical mode decomposition and adaptive multiscale morphological analysis (BEMD-AMMA) [35] is a typical method based on weak signal analysis, while local outlier factor (LOF) and isolation forest (iFOREST) are two classic anomaly detection algorithms. The Self-Adaptive Deep Feature Matching (SDFM) [22] and Sparse Dictionary Representation (SDR) [36] methods are also used for comparison.

The spectra of the bearing fault at different sample points are shown in Figure 6, where Figure 6a is for PHM1\_3 and Figure 6b is for XJTU1\_1. As can be seen from both Figure 6a,b, the fault frequency gradually emerges over time.

We define a deviation rate of incipient fault detection (*DA*) to evaluate the methods' performance mentioned above.

$$DA = \frac{|p\_d - p\_r|}{p\_e - p\_r} \times 100\% \tag{8}$$

where *pd* is the sample point at which the method detects the incipient fault, *pr* is the real sample point of the incipient fault, and *pe* is the end sample point of the bearing's whole life.
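A worked example of Equation (8) in Python is given below. The sample points are hypothetical values chosen only to show the arithmetic; they are not results from the paper.

```python
def deviation_rate(p_d, p_r, p_e):
    """Equation (8): deviation of the detected fault point from the real one,
    as a percentage of the remaining life after the real fault point."""
    if p_e <= p_r:
        raise ValueError("the end point must come after the real fault point")
    return abs(p_d - p_r) / (p_e - p_r) * 100.0

# Hypothetical bearing: fault really starts at sample 800, is detected at
# sample 830, and the run ends at sample 2000.
da = deviation_rate(p_d=830, p_r=800, p_e=2000)   # 30 / 1200 * 100 = 2.5
```

A smaller *DA* means the detected point lies closer to the real onset of the fault.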

The anomaly detection results are shown in Table 2.

**Table 2.** Comparison of anomaly detection results.


**Figure 6.** Spectrum of bearing fault at different sample points (**a**) PHM1\_3 (**b**) XJTU1\_1.

As shown in Table 2, the proposed method achieves the best detection result in the comparison. This indicates that employing multi-scale signal samples can enhance feature representation and make the model more sensitive to incipient faults. The robust low-rank deep features extracted by the multi-scale robust Deep-SVDD hyper sphere model have strong anti-noise ability against signal fluctuations. Thus, the stability and accuracy of the detection results are relatively high.

### **5. Conclusions**

This paper proposes a multi-scale robust incipient fault detection method for rolling bearings with data enhancement. The data enhancement technology is incorporated into the framework of the robust deep auto-encoding network to improve the anti-noise ability, which makes the extracted features more robust. Moreover, the constructed robust multi-scale Deep-SVDD model achieves good stability by adopting multi-scale vibration signal features. The experimental results show that the proposed method is more sensitive to incipient faults and produces fewer false alarms. The proposed method significantly improves the performance of incipient fault detection of rolling bearings.

**Author Contributions:** Conceptualization, Methodology, Software, Investigation, Formal Analysis, Writing—Original Draft, Writing—review and editing, L.K.; Visualization, Software, Writing— Original Draft, J.C.; Validation, Investigation and Supervision, Y.Q.; Conceptualization, Data Curation, Supervision, Writing—Review and Editing, Project Administration, W.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


### *Article* **Bearing Fault Diagnosis Using Piecewise Aggregate Approximation and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise**

**Lei Hu 1,2,\*, Ligui Wang 1, Yanlu Chen 1, Niaoqing Hu 3 and Yu Jiang 3**


**Abstract:** Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) effectively separates the fault vibration signals of rolling bearings and improves the diagnosis of rolling bearing faults. However, CEEMDAN has high memory requirements and low computational efficiency. In each iteration of CEEMDAN, noise is added to the fault vibration signals, and both the noise-added vibration signals and the added noise are decomposed with classical empirical mode decomposition (EMD). This paper proposes a rolling bearing fault diagnosis method that combines piecewise aggregate approximation (PAA) with CEEMDAN. PAA enables CEEMDAN to decompose long signals and to achieve enhanced diagnosis. In particular, the method first yields the vibration envelope using bandpass filtering and demodulation, then compresses the envelope using PAA, and finally decomposes the compressed signal with CEEMDAN. Test data verification results show that the proposed method is more effective and more efficient than CEEMDAN.

**Keywords:** rolling bearings; fault diagnosis; piecewise aggregate approximation; CEEMDAN

### **1. Introduction**

Rolling bearings are one of the most widely used components in rotating machinery. Failures of rolling bearings are one of the most frequent reasons for machine breakdown. Thus, fault diagnosis of rolling bearings is crucial to ensure the operational efficiency and reliability of engineering systems [1,2]. When a faulty bearing rotates, a localised defect on the outer or inner race is struck by the rollers, or a localised defect on a roller strikes the inner and outer races. High-frequency resonances are excited and presented as impact transients. The periodicity of the successive impact transients is expressed as characteristic fault frequencies [2]. The vibration of a faulty bearing is recognised as the modulation between the components of low fault frequency *f*F and high natural frequency *f*n, as shown in Figure 1. The most classic bearing fault diagnosis method is to obtain the envelope spectrum or squared envelope spectrum using bandpass filtering and demodulation [3]. Finding the optimal frequency band for filtering is critical for the envelope analysis [4]. Some successful tools, such as fast Kurtogram [5], the improved Kurtogram based on wavelet packet transform [6], protrugram [7], and Autogram [8], have been developed for finding the optimal band.

**Citation:** Hu, L.; Wang, L.; Chen, Y.; Hu, N.; Jiang, Y. Bearing Fault Diagnosis Using Piecewise Aggregate Approximation and Complete Ensemble Empirical Mode Decomposition with Adaptive Noise. *Sensors* **2022**, *22*, 6599. https://doi.org/10.3390/s22176599

Academic Editors: Yongbo Li, Bing Li, Jinchen Ji and Hamed Kalhori

Received: 30 June 2022 Accepted: 24 August 2022 Published: 1 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1.** Transient response of bearing defects.

Empirical mode decomposition (EMD) is another widely used method for bearing fault diagnosis. EMD decomposes a signal into a set of intrinsic mode functions (IMFs) and a residue signal [9]. The IMFs are narrow-band components and indicate the natural oscillatory modes embedded in the original signal [10]. As EMD is effective for nonlinear, non-stationary signals with both Gaussian and non-Gaussian noise, it has been applied with success in different fields, including bearing fault diagnosis [11], planetary gearbox fault diagnosis [12], railway structural wavelength identification [13], automatic sleep scoring [14], etc. However, EMD suffers from endpoint effects and mode mixing. Mode mixing, the major drawback of EMD, means that a single IMF consists of signals of widely disparate scales, or that a signal of a similar scale resides in different IMF components [15]. Ensemble EMD (EEMD) was developed to suppress mode mixing by adding assisted noises to improve the extrema distribution of the signal [16]. However, the IMFs generated by EEMD contain residual noise, and different numbers of IMFs can be generated as different assisted noises are added to the signal to be decomposed. To solve the problem that the IMFs are contaminated by residual noise, complementary EEMD (CEEMD) adds noises in pairs with opposite signs to the targeted signal [17,18]. However, the completeness property is not proven, and different noisy copies of the signal can produce a different number of modes. How to choose proper parameters is also a problem for CEEMD. A further improved algorithm named complete EEMD with adaptive noise (CEEMDAN) was proposed to solve the problem of incomplete decomposition by adding particular noise to the signal, which in turn reduces the residual noise in the IMFs [19]. CEEMDAN has been applied in the fields of biomedical engineering [20], energy economics [21], and fault diagnosis [22,23]. In each iterative layer of CEEMDAN, *N* signals with added noise, as well as the *N* assisted noises (*N* is the number of ensemble averages), are decomposed. Thus, CEEMDAN takes up a lot of memory and has low computational efficiency, especially when analysing long signals.

A longer signal brings more robust information. For a signal *x*(*t*) recorded from *t*1 to *t*2, its Fourier transform is:

$$F(\omega) = \mathcal{F}[x(t)] = \int_{t_1}^{t_2} x(t)e^{-j\omega t}dt. \tag{1}$$

It can be seen from Equation (1) that a frequency component reflects the average energy of the periodic component over the entire test period of *t*2 − *t*1. Theoretically, the longer the signal is, the more times a component is averaged, and the clearer the spectrum will be. Figure 2a shows the simulation signal of a bearing with background noise. The signal length is *L* = 100 s and the signal-to-noise ratio (SNR) is −18.091 dB. The fault characteristic frequency is 15 Hz. Signals with a length of 2 s, 10 s, 30 s, and 100 s are selected for envelope analysis. The corresponding envelope spectra are shown in Figure 2b–e, respectively. It can be seen that the harmonics of the fault frequency, which cannot be seen in the spectrum of 2 s, can be seen in the spectra of 10 s, 30 s, and 100 s. Although the harmonics of 10 s, 30 s, and 100 s have nearly equal amplitudes, they become increasingly clear from Figure 2c to Figure 2e, because the longer the signal is, the better the background noise is reduced.

**Figure 2.** Fault bearing simulation signal and its envelope spectra: (**a**) Time domain signal with noise; (**b**) envelope spectrum of 2 s signal; (**c**) envelope spectrum of 10 s signal; (**d**) envelope spectrum of 30 s signal; (**e**) envelope spectrum of 100 s signal.
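The averaging argument can be illustrated numerically. The following is a minimal sketch and not the simulation behind Figure 2: it assumes a plain 15 Hz sinusoid buried in white Gaussian noise instead of a bearing envelope, and the function name `peak_to_floor`, the 1 kHz sampling rate, and the noise level are illustrative choices.

```python
import numpy as np

def peak_to_floor(duration_s, fs=1000.0, f0=15.0, noise_std=5.0, seed=0):
    """Ratio of the spectral line at f0 to the median noise floor (assumed metric)."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * fs)
    t = np.arange(n) / fs
    # Unit-amplitude tone plus strong white noise (hypothetical test signal).
    x = np.sin(2 * np.pi * f0 * t) + noise_std * rng.standard_normal(n)
    amp = np.abs(np.fft.rfft(x)) / n            # amplitude spectrum scaled by length
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - f0)))      # bin closest to f0
    return amp[k] / np.median(amp)

for duration in (2, 10, 30, 100):
    print(duration, round(float(peak_to_floor(duration)), 1))
```

With this scaling the line at 15 Hz keeps roughly the same amplitude for every record length, while the noise floor drops as the record gets longer, so the ratio grows. This mirrors the observation that the harmonics have nearly equal amplitudes but stand out more clearly for longer signals.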

However, longer signals with a high sampling frequency also increase the demands on computing hardware, which can be a challenge, especially for edge computing applications. In particular, as the natural frequency *f*n is as high as thousands (or even tens of thousands) of Hz, the sampling frequency of bearing vibration, *f*S, is set to tens of thousands of Hz according to the Nyquist sampling theorem. Thus, it is natural to compress the signal before processing it using algorithms of high complexity. The technique of compressed sensing achieves data acquisition and compression at the same time. The measurements that compressed sensing obtains are nonadaptive linear projections of the original signals, and the original signals can be reconstructed from the measurements using recovery algorithms [24]. Compressed sensing was originally used for image processing in the fields of medical imaging [25–27], radar imaging [28,29], astronomy [30,31], face recognition [32,33], etc. It has also been introduced for machinery fault diagnosis to obtain sparse representations of original signals and to extract fault features from the compressed signals [34–36]. The major drawback of compressed sensing for fault diagnosis is that the compression is not supervised with prior knowledge. Some classical diagnosis methods, such as envelope analysis and EMD, are no longer applicable to the compressed signals. In addition, loss of fault information is inevitable when reconstructing the original signals from the compressed signals.

Piecewise aggregate approximation (PAA) is a far simpler method that can be used for signal compression [37,38]. An improved PAA has also been proposed to take fluctuating trends into account [39]. PAA first divides the time series into *N* equal segments and uses the average of each segment as an approximate representation of that segment. In this process, the original time series with *L* samples is compressed into a signal of *N* samples, which can be regarded as a process of dimensionality reduction. The equivalent sampling frequency of the compressed signal is *f*ES = *f*S × *N*/*L*, where *f*S is the sampling frequency of the original signal. Thus, there is information loss for components whose frequencies are larger than *f*ES/2.56.

In order to obtain reliable diagnostic results using long signals, a method combining PAA and CEEMDAN is proposed. In order to overcome the problem that CEEMDAN has large memory requirements and low computational efficiency, PAA is introduced to compress the signals before decomposing them. Moreover, in order to avoid information loss caused by signal compression, the traditional envelope analysis method is applied and PAA is performed on the envelopes instead of the original signals. Validations are carried out with signals collected from real rolling bearings.

### **2. Methodology**

#### *2.1. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)*

CEEMDAN is an improved algorithm of EMD and EEMD, which overcomes the shortcomings of EEMD, as mentioned in Section 1. The flow chart of CEEMDAN is shown in Figure 3.

**Figure 3.** Flow chart of the CEEMDAN algorithm.

Assuming *y* is the signal to be decomposed, CEEMDAN is performed to decompose the signal *y*, and the IMF obtained by layer *i* decomposition is expressed as *Ci*, *i* = 1, 2, ··· , *I*, where *I* is the number of layers of decomposition, and the decomposition steps are as follows:

(1) First layer decomposition, i.e., *i* = 1.

1 Adding white noise $v_j$ to the signal $y$ yields a new signal $y + \varepsilon_{i=1} v_j$, where $j = 1, 2, \ldots, N$, $N$ is the number of white noise realisations added, and $\varepsilon_{i=1}$ is the amplitude of the white noise.

2 Decomposing the new signal $y + \varepsilon_{i=1} v_j$ with EMD yields a series of IMFs, the first of which is denoted as $E^{j}_{1\text{st},i=1}$.

3 Ensemble averaging of the $N$ IMFs $E^{j}_{1\text{st},i=1}$ yields the $i$th ($i = 1$) IMF of CEEMDAN:

$$\overline{C_{i=1}} = \frac{1}{N} \sum_{j=1}^{N} E_{1\text{st},i=1}^{j} \tag{2}$$

4 Removing the first IMF $\overline{C_{i=1}}$ from $y$ yields the residual $r_{i=1}$:

$$r_{i=1} = y - \overline{C_{i=1}} \tag{3}$$

(2) Second layer decomposition, *i* = 2.

1 Decomposing $v_j$ with EMD yields a series of IMFs, the first of which is denoted as $E_1(v_j)$. Adding $E_1(v_j)$ as noise to the residual $r_{i-1}$ yields a new signal $r_{i-1} + \varepsilon_i E_1(v_j)$.

2 Decomposing the new signal $r_{i-1} + \varepsilon_i E_1(v_j)$ with EMD yields a series of IMFs, the first of which is denoted as $E^{j}_{1\text{st},i}$.

3 Ensemble averaging of the $N$ IMFs $E^{j}_{1\text{st},i}$ yields the $i$th IMF of CEEMDAN:

$$\overline{C_i} = \frac{1}{N} \sum_{j=1}^{N} E_{1\text{st},i}^{j} \tag{4}$$

4 Removing the $i$th IMF $\overline{C_i}$ from $r_{i-1}$ yields the residual $r_i$:

$$r_i = r_{i-1} - \overline{C_i} \tag{5}$$

(3) The above steps are repeated until the residual signal is a monotone function and cannot be further decomposed, at which point the algorithm ends. Finally, the signal to be decomposed is expressed as:

$$y = \sum_{i=1}^{I} \overline{C_i} + r_I \tag{6}$$

in which *I* is the number of IMFs and *rI* is the last residual signal.
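The layered procedure above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the helper names `_extrema`, `first_imf`, and `ceemdan` are hypothetical, the envelopes use linear interpolation instead of the cubic splines common in EMD, sifting runs for a fixed number of passes, the same noise amplitude (`eps` times the standard deviation of the signal) is used in every layer, and the residual is treated as monotone once it has fewer than three interior extrema.

```python
import numpy as np

def _extrema(x):
    """Indices of interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def first_imf(x, n_sift=10):
    """First IMF of x by a fixed number of sifting passes (linear envelopes)."""
    h = x.copy()
    t = np.arange(len(x))
    for _ in range(n_sift):
        maxima, minima = _extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = np.interp(t, maxima, h[maxima])   # crude end handling: flat ends
        lower = np.interp(t, minima, h[minima])
        h = h - 0.5 * (upper + lower)             # subtract the local mean
    return h

def ceemdan(y, n_trials=20, eps=0.2, max_imfs=8, seed=0):
    """Layered decomposition following steps (1) to (3); returns (imfs, residual)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    noises = [rng.standard_normal(len(y)) for _ in range(n_trials)]   # v_j
    noise_imfs = [first_imf(v) for v in noises]                       # E_1(v_j)
    scale = eps * np.std(y)                                           # assumed amplitude
    imfs, residual = [], y.copy()
    for layer in range(max_imfs):
        maxima, minima = _extrema(residual)
        if len(maxima) + len(minima) < 3:         # residual is (nearly) monotone
            break
        added = noises if layer == 0 else noise_imfs
        trials = [first_imf(residual + scale * v) for v in added]
        c = np.mean(trials, axis=0)               # ensemble average, Eqs. (2) and (4)
        imfs.append(c)
        residual = residual - c                   # Eqs. (3) and (5)
    return np.array(imfs), residual

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
y = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residual = ceemdan(y)
print(imfs.shape, np.allclose(imfs.sum(axis=0) + residual, y))
```

Each layer calls `first_imf` once per noise realisation on a full-length signal, which is where the memory and run-time cost discussed in Section 1 comes from. The completeness relation of Equation (6) holds by construction, because every extracted IMF is subtracted from the running residual.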

#### *2.2. Piecewise Aggregate Approximation*

It can be seen that CEEMDAN has large memory requirements and low computational efficiency, as in each iteration of CEEMDAN, tens of fault vibration signals added with assisted noises, as well as the assisted noises, are decomposed with classical EMD. To solve the problem of low computational efficiency, PAA is introduced to compress the signals before performing CEEMDAN.

PAA compresses a large amount of time series data while keeping as many original features of the data as possible. Assume that the test signal is *x* = {*xi*}, the sampling frequency is *f*S, and the signal length is *L*. PAA defines a constant window *w*, then divides the sample sequence *x* into *N* equal segments, *N* = *L*/*w*, and finally calculates the mean of each segment:

$$p_n = \frac{1}{w} \sum_{i=w(n-1)+1}^{wn} x_i, \quad n = 1, 2, \cdots, N \tag{7}$$

The new sequence *p* = (*p*1, *p*2, ..., *pN*) is the compressed signal. It can be seen that the equivalent sampling frequency of the compressed signal is *f*ES = *f*S/*w*. The larger *w* is, the fewer samples the compressed signal contains.
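Equation (7) amounts to a block average. A minimal sketch follows; the function name `paa` is illustrative, and dropping trailing samples that do not fill a complete window is an assumption made here, since the text only treats the case where *L* is a multiple of *w*.

```python
import numpy as np

def paa(x, w):
    """Piecewise aggregate approximation: mean of each length-w segment."""
    x = np.asarray(x, dtype=float)
    n_segments = len(x) // w                  # N = L / w (incomplete tail dropped)
    return x[: n_segments * w].reshape(n_segments, w).mean(axis=1)

print(paa([1, 2, 3, 4, 5, 6], 2))             # segment means: 1.5, 3.5, 5.5

fs, w = 25_600.0, 20
print(fs / w)                                  # equivalent sampling frequency f_ES
```

Reshaping into an `N × w` array and averaging along the second axis avoids an explicit loop, so the compression step itself is cheap compared with the decomposition that follows it.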

#### *2.3. Diagnosis Flowchart*

Figure 4 shows the flow chart of the proposed method, which consists of five main steps: optimal band selection for filtering, bandpass filtering and demodulation, PAA, CEEMDAN, and spectra analysis. The steps are depicted as follows:

**Figure 4.** Flow chart of the proposed method.

(1) Optimal filtering band selection.

In order to enhance the modulation signal of low fault frequency and high natural frequency, finding an optimal resonance band for bandpass filtering is critical. The fast Kurtogram, which finds the optimal band according to the kurtosis of the filtered time signal in different filter banks, has been proven to be a practical tool in bearing fault diagnosis. Thus, the fast Kurtogram is introduced for optimal filtering band selection.

(2) Bandpass filtering and demodulation.

Bandpass filtering enhances the modulation signal of low fault frequency and high natural frequency, while demodulation obtains the envelope signal *y* of low fault frequency, *y* = |*x* + *iH*(*x*)|, where *x* is the filtered signal and *H*(*x*) is the Hilbert transform of *x*. The envelope consists of components of low frequencies, including the harmonics of fault frequencies. As the fault frequencies are far smaller than the natural frequency, the envelope can be compressed to obtain a signal whose equivalent sampling frequency is far smaller than the original sampling frequency.

(3) Signal compression.

PAA is introduced to compress the envelope yielded in the second step. PAA first divides the envelope into *N* segments of equal length *w*, *N* = *L*/*w*, where *L* is the length of the envelope. Then, PAA uses the mean *pi* of each segment as an approximate representation of the segment. The obtained compressed signal is *p* = {*pi*}.

The window size, or the segment length, *w*, is the only unknown parameter of PAA, and it can be set according to the requirement for the equivalent sampling frequency *f*ES of the compressed signal. For the envelope of the bearing fault signal, the components of interest are the harmonics of the bearing fault characteristic frequencies, which include the ball pass frequency of the outer race *f*BPFO, the ball pass frequency of the inner race *f*BPFI, the ball spin frequency *f*BS, and the cage frequency *f*C. The maximum of the bearing fault characteristic frequencies, *f*max = max(*f*BPFO, *f*BPFI, *f*BS, *f*C), is generally *f*BPFI. According to the Nyquist sampling theorem, the equivalent sampling frequency of the compressed signal should satisfy the condition *f*ES ≥ 2.56 *Z* *f*max, in which *Z* is the maximum order of the fault frequency harmonics to be retained. Since *f*ES = *f*S/*w*, the window size must satisfy the inequality:

$$w \le f_{\text{S}} / (2.56 Z f_{\text{max}}). \tag{8}$$

(4) CEEMDAN.

Following the steps of CEEMDAN described in Section 2.1, the compressed signal is decomposed, and a series of IMFs is obtained.

(5) Spectrum analysis.

Spectrum analysis is performed on the IMFs obtained to find the interesting IMFs whose frequency bands cover the fault characteristic frequency. Fault diagnosis of rolling element bearing is finally achieved according to the spectra of the interesting IMFs.
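Steps (2), (3), and (5) can be sketched end to end on a synthetic signal. This is an illustration under stated assumptions and not the authors' code: the filtering band is taken as given instead of being selected with the fast Kurtogram, the CEEMDAN step is omitted, the bandpass filter is an ideal FFT mask, and the test signal is a hypothetical 10 kHz carrier amplitude-modulated at 43 Hz. All function names are illustrative.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Ideal bandpass filter: zero all FFT bins outside [f_lo, f_hi]."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x):
    """Envelope |x + iH(x)| computed from the FFT-based analytic signal."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def paa(x, w):
    """Mean of each length-w segment (incomplete tail dropped)."""
    n_segments = len(x) // w
    return x[: n_segments * w].reshape(n_segments, w).mean(axis=1)

fs, w = 25_600.0, 20                           # sampling frequency and PAA window
t = np.arange(int(2 * fs)) / fs                # 2 s synthetic record
x = (1 + 0.8 * np.cos(2 * np.pi * 43.0 * t)) * np.sin(2 * np.pi * 10_000.0 * t)

env = envelope(bandpass_fft(x, fs, 8_533.5, 12_800.0))   # step (2)
p = paa(env, w)                                           # step (3)
f_es = fs / w                                             # equivalent sampling frequency
amp = np.abs(np.fft.rfft(p - p.mean())) / len(p)          # step (5), DC removed
freqs = np.fft.rfftfreq(len(p), d=1.0 / f_es)
peak_hz = float(freqs[np.argmax(amp)])
print(f_es, peak_hz)
```

The modulation frequency survives the compression because it lies far below the equivalent sampling frequency, whereas the 10 kHz carrier would be lost if PAA were applied to the raw signal instead of the envelope.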

#### *2.4. Remarks*

PAA is simple, but the envelope waveform of impact transients is well retained in the compressed signal. The reason is that signal compression is supervised with prior knowledge: PAA compresses the envelope instead of the original signal. The series of impact transients produced successively by a localised defect is recognised as the modulation between the low-frequency fault components and the high-frequency resonances. Bearing vibration is therefore collected with a high sampling frequency, and compressing the original vibration signal would lose the information carried by the high-frequency resonance. In contrast, the diagnostic information in the demodulated envelope lies in the low-frequency fault components, and it is kept in the compressed signal as long as the equivalent sampling frequency is at least 2.56 times the highest frequency of interest.

### **3. Experiment Validation**

Bearing fault simulation tests were carried out on the test bench shown in Figure 5. The test bench consists of a driving motor, a bearing-supported rotating shaft, an inertia wheel for providing radial load, a belt drive mechanism, a gearbox, a crank connecting rod mechanism, and a reciprocating mechanism. The bearing seeded with a defect is mounted in the bearing housing closer to the motor. The seeded defect is a localised crack with a width and depth of 0.2 mm. The bearing is a deep groove ball contact bearing, model MB-ER-10K. The fault characteristic frequencies are *f*BPFO = 3.052 *f*r, *f*BPFI = 4.948 *f*r, *f*BS = 1.992 *f*r, and *f*C = 0.382 *f*r, where *f*r is the shaft frequency. Vibration signals were collected using PCB Model 608A11 accelerometers, whose bandwidth is 0.5 Hz~9 kHz. The sampling frequency was set to *f*S = 25.6 kHz.

**Figure 5.** Machinery fault simulation bench.

The maximum of the bearing fault characteristic frequencies is *f*max = *f*BPFI = 4.948 *f*r. Assuming that *Z* = 5 orders of fault frequency harmonics are to be retained in the compressed signals, Equation (8) yields the condition for the window length, *w* ≤ 404.20/*f*r.
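The bound can be checked with a few lines of arithmetic. The sketch below uses the values stated in this section and the shaft speeds reported for the two test cases; the function name `max_window` is illustrative.

```python
import math

def max_window(fs, z, f_max):
    """Largest integer window w that satisfies Equation (8)."""
    return math.floor(fs / (2.56 * z * f_max))

fs = 25_600.0                                  # sampling frequency in Hz
z = 5                                          # harmonic orders to retain
print(round(fs / (2.56 * z * 4.948), 2))       # constant in w <= const / f_r

for fr in (14.1184, 19.7):                     # shaft frequencies of the two cases, Hz
    f_bpfi = 4.948 * fr                        # f_max = f_BPFI
    print(fr, max_window(fs, z, f_bpfi), fs / 20)
```

For both shaft speeds the chosen window of *w* = 20 lies within the bound, and it corresponds to an equivalent sampling frequency of *f*S/20 = 1280 Hz.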

#### *3.1. Validation for Outer Race Defect Case*

The vibration signal of a bearing with an outer race defect is shown in Figure 6a. The signal length is *L* = 19 s, the shaft speed is *f*r = 14.1184 Hz, and the corresponding fault frequency is *f*BPFO = 43.0894 Hz. The proposed method combining CEEMDAN and PAA was used to analyse the signal. Firstly, analysing the signal with fast Kurtogram yields the diagram, as shown in Figure 7. It can be seen that the center frequency of the optimal band is 10,667 Hz, the bandwidth is 4267 Hz, and the corresponding optimal filtering band is 8533.5~12,800.5 Hz. The filtered signal for the optimal filtering band is shown in Figure 6b. The envelope of the filtered signal is shown in Figure 6c.

**Figure 6.** Case 1 for outer race defect: (**a**) Original signal; (**b**) filtered signal; (**c**) envelope; (**d**) compressed signal; (**e**) partially enlarged envelope; (**f**) partially enlarged compressed signal.

**Figure 7.** Case 1 for outer race defect: Kurtogram results of the vibration signal.

Performing PAA to compress the envelope yields the result shown in Figure 6d. The window length is set to *w* = 20, which satisfies *w* ≤ 404.20/*f*r ≈ 28.6 for *f*r = 14.1184 Hz. The equivalent sampling frequency of the compressed signal is *f*ES = *f*S/*w* = 1.28 kHz. Partially enlarging the envelope of Figure 6c yields Figure 6e, and partially enlarging the compressed signal of Figure 6d yields Figure 6f. Comparing Figure 6d with Figure 6c, and Figure 6f with Figure 6e, it can be seen that although the compressed signal has smaller amplitudes than the envelope does, they share the same waveform of impulses.

Decomposing the compressed signal with CEEMDAN yields 16 IMFs. The spectra of IMF 2~IMF 7 are shown in Figure 8, from which the component of *f*BPFO and its high-order harmonics can be seen clearly. In particular, the spectrum band of IMF 6 is concentrated around *f*BPFO, IMF 5 around *f*BPFO and 2 *f*BPFO, IMF 4 around 2 *f*BPFO and 3 *f*BPFO, and IMF 3 around 3 *f*BPFO and 4 *f*BPFO. These peaks at the fault frequency harmonics indicate that the tested bearing has an outer race defect.

**Figure 8.** Case 1 for outer race defect: Amplitude spectra of IMFs obtained from compressed signal.

The time length of the signal is *L* = 19 s, so the original signal has *f*S × *L* = 486,400 samples. CEEMDAN was used to decompose the original signal directly, and the algorithm was still running after 24 h of operation (on a computer with a 2.5 GHz dual-core i5 processor and 8 GB of memory). The compressed signal has *f*ES × *L* = 24,320 samples, which equals the number of samples in 0.95 s of the original signal. Performing CEEMDAN to decompose the compressed signal 10 times, the mean operation time is 359.2 s.

A segment of the original signal with a length of 0.95 s consists of 24,320 samples, the same number as the compressed signal. Performing CEEMDAN to decompose this signal segment yields 14 IMFs. The spectra of IMF 7~IMF 12 are shown in Figure 9. It can be seen that the spectrum band of IMF 10 is concentrated around *f*BPFO, IMF 9 is around 2 *f*BPFO, and IMF 8 is around 3 *f*BPFO and 4 *f*BPFO. However, none of these harmonics can be seen in these spectra. The reason is that the signal segment to be decomposed is too short, and the harmonics are not averaged enough times during the FFT to reduce the background noise.

**Figure 9.** Case 1 for outer race defect: Amplitude spectra of IMFs obtained from an original signal segment that has the same number of samples as the compressed signal.

Comparing Figures 8 and 9, it can be seen that PAA enables CEEMDAN to decompose long signals and to yield enhanced diagnostic results.

#### *3.2. Validation for Inner Race Defect Case*

The vibration signal of a bearing with an inner race defect is shown in Figure 10a. The signal length is *L* = 19 s, the shaft speed is *f*r = 19.7 Hz, and the characteristic frequency of the inner race fault is *f*BPFI = 97.48 Hz. Analysing the signal with fast Kurtogram yields the result shown in Figure 11. Although the diagram is different from the one in Figure 7, the same band is selected, with a center frequency of 10,667 Hz and a bandwidth of 4267 Hz.

**Figure 10.** Case 2 for inner race defect: (**a**) Original signal; (**b**) filtered signal; (**c**) envelope; (**d**) compressed signal; (**e**) partially enlarged envelope; (**f**) partially enlarged compressed signal.

**Figure 11.** Case 2 for inner race defect: Kurtogram results of the vibration signal.

Figure 10b shows the filtered signal for the filtering band, Figure 10c shows the envelope of the filtered signal, and Figure 10d shows the compressed signal of the envelope obtained with PAA. The window length of PAA is also set to *w* = 20, which satisfies *w* ≤ 404.20/*f*r ≈ 20.5 for *f*r = 19.7 Hz. The equivalent sampling frequency of the compressed signal is also *f*ES = 1.28 kHz. Figure 10e,f shows the partially enlarged envelope and the partially enlarged compressed signal, respectively. It can be seen that the compressed signal keeps the waveform of the low-frequency components in the envelope.

Decomposing the compressed signal yields 16 IMFs. The spectra of IMF 3~IMF 6 are shown in Figure 12. It can be seen that the spectrum band of IMF 4 is concentrated around the inner race fault frequency *f*BPFI. The harmonic of *f*BPFI and its sidebands of *f*BPFI ± *f*r, and *f*BPFI + 2 *f*r are clearly presented in the spectrum of IMF 4. The reason for the modulation frequency of *f*r is that the inner race defect passes the bearing load zone once every rotation of the shaft, and the transient amplitudes change periodically.

**Figure 12.** Case 2 for inner race defect: Amplitude spectra of IMFs obtained from the compressed signal.

The spectrum of IMF 5 is concentrated around the frequency of *f*BPFI − *f*r, and clearly shows the harmonics of *f*BPFI, *f*BPFI ± *f*r, and *f*BPFI − 2 *f*r. The spectrum of IMF 3 is concentrated around the band of [ *f*BPFI, 2 *f*BPFI]. The harmonic of the fault frequency *f*BPFI and its sidebands of *f*BPFI + *f*r and *f*BPFI + 2 *f*r, as well as the second order fault frequency of 2 *f*BPFI and its sidebands of 2 *f*BPFI − 3 *f*r and 2 *f*BPFI − 2 *f*r, can be clearly seen from the spectrum. The sideband of 2 *f*BPFI − 3 *f*r can also be seen in the spectrum of IMF 4.

It is worth noting that the characteristic frequency of *f*BPFI = 4.948 *f*r is very close to the 5th order harmonic of shaft frequency 5 *f*r. Thus, the fault frequency harmonics and their sidebands are very close to the high order shaft frequencies. In any case, the components of *f*BPFI, 2 *f*BPFI, and their sidebands illustrate that the tested bearing has inner race defects.
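How close the two frequencies are can be compared with the frequency resolution of the record. The short calculation below is an illustrative check that uses only the shaft speed and signal length stated for this case; the resolution of 1/*L* assumes that the spectrum is computed over the full 19 s compressed signal without zero padding or windowing.

```python
fr = 19.7                         # shaft frequency in Hz (inner race case)
duration = 19.0                   # signal length L in s

f_bpfi = 4.948 * fr               # inner race fault frequency
gap_hz = 5 * fr - f_bpfi          # distance between 5*f_r and f_BPFI
resolution_hz = 1.0 / duration    # spacing of FFT bins for a 19 s record

print(round(f_bpfi, 2), round(gap_hz, 4), round(resolution_hz, 4))
print(round(gap_hz / resolution_hz, 1))   # gap expressed in FFT bins
```

Under these assumptions the gap of about 1 Hz spans many frequency bins, so the fault frequency and the fifth shaft harmonic fall in clearly separate bins of a 19 s record, whereas a record shorter than about 1 s could not separate them.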

### **4. Conclusions**

In this paper, a rolling bearing fault diagnosis method that combines PAA and CEEMDAN is proposed. The method first extracts the envelope signal from an original signal using bandpass filtering and demodulation, then compresses the envelope with PAA, decomposes the compressed signal with CEEMDAN, and finally investigates the spectra of IMFs. Validation results with real bearings show that the proposed method is effective and efficient.

The interesting components in the original signal for fault diagnosis are the modulation between the fault frequencies and the resonance natural frequencies. The natural frequencies are as high as thousands of Hz, or even tens of thousands of Hz, while the interesting components in the envelope are the fault frequency harmonics demodulated from the original signal. As the fault frequencies are far lower than the natural frequencies, compressing the envelope instead of the original signal avoids information loss.

The spectra of IMFs reflect the average energy over the entire test period. The longer the signal is, the more times the spectra are averaged during FFT, and the better the background noise is reduced. However, in each iteration of CEEMDAN, an IMF is yielded by decomposing tens of signals added with assisted noises, as well as the assisted noises themselves. Thus, CEEMDAN has large memory requirements and low computational efficiency for long signals. Compressing the envelope with PAA enables the use of CEEMDAN for long signals to achieve enhanced diagnosis.

**Author Contributions:** Conceptualization and supervision, L.H.; data curation, L.W. and N.H.; investigation, software and validation, L.W. and Y.C.; methodology and writing, L.H. and L.W.; review and editing, L.H. and Y.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (Grant No. 51575518).

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


#### *Article* **Simultaneous Sensor and Actuator Fault Reconstruction by Using a Sliding Mode Observer, Fuzzy Stability Analysis, and a Nonlinear Optimization Tool**

**Samira Asadi, Mehrdad Moallem \* and G. Gary Wang**

> School of Mechatronic Systems Engineering, Simon Fraser University, Surrey, BC V3T 0A3, Canada **\*** Correspondence: mehrdad\_moallem@sfu.ca

**Abstract:** This paper proposes a Takagi–Sugeno (TS) fuzzy sliding mode observer (SMO) for simultaneous actuator and sensor fault reconstruction in a class of nonlinear systems subjected to unknown disturbances. First, the nonlinear system is represented by a TS fuzzy model with immeasurable premise variables. By filtering the output of the TS fuzzy model, an augmented system whose actuator fault is a combination of the original actuator and sensor faults is constructed. An *H*∞ performance criterion is considered to minimize the effect of the disturbance on the state estimations. Then, by using two further transformation matrices, a non-quadratic Lyapunov function (NQLF), and fmincon in MATLAB as a nonlinear optimization tool, the gains of the SMO are designed through the stability analysis of the observer. The main advantages of the proposed approach in comparison to the existing methods are using nonlinear optimization tools instead of linear matrix inequalities (LMIs), utilizing an NQLF instead of simple quadratic Lyapunov functions (QLFs), choosing an SMO as the observer, which is robust to uncertainties, and assuming that the premise variables are immeasurable. Finally, a practical continuous stirred tank reactor (CSTR) is considered as a nonlinear dynamic system, and the numerical simulation results illustrate the superiority of the proposed approach compared to the existing methods.

**Keywords:** actuator and sensor faults; TS fuzzy system; sliding mode observer (SMO); *H*∞ performance; non-quadratic Lyapunov function (NQLF); fmincon; fault reconstruction

### **1. Introduction**

Over the past few decades, the reliability and safety of industrial systems have attracted considerable attention. As a consequence, fault-tolerant control (FTC) has been widely studied in different fields [1,2]. In general, FTC schemes are classified as passive or active. Active fault-tolerant controllers compensate for the effects of faults by using early information obtained from fault detection and isolation (FDI) schemes, which leads to a more flexible dynamic [3]. Consequently, FDI is becoming an attractive topic in different research fields. Observer-based methods are among the most popular model-based FDI approaches. The main idea of observer-based FDI is to construct a residual based on the measured output of the system or to reconstruct the fault directly. The sliding mode observer (SMO) follows the second approach, detecting the faults while determining the dynamic behavior [4,5]. SMOs are less sensitive to unknown uncertainties in the system than other observers such as unknown input observers (UIOs) [6].

SMOs were first developed for linear dynamic systems; however, most actual physical systems are nonlinear. Many SMO-based fault reconstruction methods have since been developed for uncertain nonlinear systems. In ref. [7], by considering a filter of the measured output vector, the original system with sensor and actuator faults is transformed into an augmented system with just the actuator fault and unknown inputs.

**Citation:** Asadi, S.; Moallem, M.; Wang, G.G. Simultaneous Sensor and Actuator Fault Reconstruction by Using a Sliding Mode Observer, Fuzzy Stability Analysis, and a Nonlinear Optimization Tool. *Sensors* **2022**, *22*, 6866. https://doi.org/10.3390/s22186866

Academic Editors: Yongbo Li, Bing Li, Jinchen Ji and Hamed Kalhori

Received: 12 July 2022 Accepted: 7 September 2022 Published: 10 September 2022

Nevertheless, the classes of nonlinear systems considered in most of the papers are limited and cannot represent a general model for real systems [8,9].

Takagi–Sugeno (TS) fuzzy models can represent the behavior of nonlinear systems while keeping the simplicity of linear models. A TS fuzzy representation is a convex nonlinear aggregation of several linear systems. Because the parameters of a TS fuzzy representation satisfy the convex sum property, it is interesting to investigate the properties of the TS system based on its local linear vertices. With the advent of TS fuzzy systems, TS-based FDI techniques emerged to tackle a broader range of nonlinear systems [10]. By converting a nonlinear system to a TS system, some local linear systems are created, each representing the behavior of the nonlinear system in a specific operating area. These local linear systems can be aggregated by using an interpolation mechanism. Thus, an efficient FDI scheme can be obtained by combining the SMO, which is robust to uncertainties, and the TS fuzzy model, which simplifies the design process. Recently, several researchers have utilized TS-based SMOs for fault detection and isolation in continuous-time and discrete-time systems [11,12]. However, the methods developed in these articles assume that the premise variables are measurable, which reduces the applicability of these approaches. To deal with this problem, an FDI approach for stability analysis of TS fuzzy systems with immeasurable premise variables was proposed in [13,14].
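The idea of aggregating local linear models through membership functions can be illustrated on a toy example. The sketch below is hypothetical and unrelated to the system studied in this paper: it assumes the scalar dynamics ẋ = (−1 + cos *x*)·*x*, bounds the nonlinear term *z* = cos *x* in the sector [−1, 1], and builds two local linear models at the sector bounds. The names `memberships`, `ts_model`, and `nonlinear` are illustrative.

```python
import math

# Hypothetical scalar system: x_dot = (-1 + cos(x)) * x, with z = cos(x) in [-1, 1].
Z_MIN, Z_MAX = -1.0, 1.0
A = (-1.0 + Z_MAX, -1.0 + Z_MIN)     # local linear models at the two sector bounds

def memberships(x):
    """Membership functions of the two rules; they are non-negative and sum to 1."""
    z = math.cos(x)                   # premise variable, depends on the state
    mu1 = (z - Z_MIN) / (Z_MAX - Z_MIN)
    return mu1, 1.0 - mu1

def ts_model(x):
    """TS fuzzy model: convex combination of the local linear models."""
    return sum(mu * a * x for mu, a in zip(memberships(x), A))

def nonlinear(x):
    """Original nonlinear dynamics, for comparison."""
    return (-1.0 + math.cos(x)) * x

for x in (-2.0, 0.5, 3.0):
    print(x, ts_model(x), nonlinear(x))
```

In this toy case the premise variable depends on the state itself, so it would not be available to an observer unless the state is measured, which is the situation referred to as immeasurable premise variables.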

In [15], simultaneous actuator and sensor faults in a nonlinear system represented by a TS fuzzy model are reconstructed by using an SMO and considering *H*∞ performance criteria to reduce the effect of disturbance, whereas [16] follows the same procedure for fault reconstruction and reconstructs both the exogenous disturbance and the system faults. However, in refs. [15,16], quadratic Lyapunov functions (QLFs) are used to design the observers. Using a QLF for TS fuzzy systems with a large number of fuzzy rules can cause undesired performance or infeasible solutions. Consequently, refs. [17,18] proposed using a non-quadratic Lyapunov function (NQLF) to design the TS-based SMO for FDI purposes. In all these papers, a linear optimization approach based on linear matrix inequalities (LMIs) is utilized, which makes the stability analysis more complex and requires some approximations and lemmas to prove the stability conditions.

In this paper, a TS fuzzy-based SMO with immeasurable premise variables is designed to reconstruct simultaneous actuator and sensor faults in a nonlinear system exposed to an unknown disturbance. Then, the states and faults are estimated. The stability of the proposed observer is guaranteed by using the NQLF and fmincon as a nonlinear optimization tool in MATLAB. In addition, *H*∞ performance criteria are considered to minimize the effect of disturbances and uncertainties on the estimation error and the fault estimations. By using the NQLF, a generalized eigenvalue problem is proposed, which maximizes the admissible Lipschitz constant and minimizes the disturbance effects on the estimation error through a nonlinear optimization problem.

The main advantages of the proposed approach over the existing methods can be summarized as follows:


This paper is organized as follows. Section 2 presents a TS fuzzy model with simultaneous actuator and sensor faults and disturbance, and shows how to construct a fictitious system with just an actuator fault. Section 3 presents the main results of this paper, including the sliding mode observer design and sufficient conditions that guarantee the stability of the estimation errors and the *H*∞ performance simultaneously. Section 4 discusses the procedure of the actuator and sensor fault reconstructions. In Section 5, simulation results are given, and comparisons are discussed. Finally, in Section 6, the concluding remarks are given.

### **2. Preliminaries**

Assume that a continuous-time nonlinear system affected by actuator and sensor faults and disturbance is given as

$$\begin{cases} \dot{\mathbf{x}}(t) = f(\mathbf{x}(t), u(t), f\_a(t), d(\mathbf{x}(t), u(t), t)) \\ y(t) = \mathbf{C}\mathbf{x}(t) + N f\_s(t) \end{cases} \tag{1}$$

where *x*(*t*) ∈ *R*<sup>*n*</sup>, *u*(*t*) ∈ *R*<sup>*m*</sup>, *y*(*t*) ∈ *R*<sup>*p*</sup>, *f*<sub>*a*</sub>(*t*) ∈ *R*<sup>*q*</sup>, *f*<sub>*s*</sub>(*t*) ∈ *R*<sup>*h*</sup> and *d*(*x*(*t*), *u*(*t*), *t*) ∈ *R*<sup>*l*</sup> are the state, input, output, unknown actuator fault, unknown sensor fault, and system uncertainty vectors, respectively, and *f* is a smooth nonlinear function. By using the sector nonlinearity transformation, the nonlinear model (1) can be replaced by the following TS fuzzy model

$$\begin{cases} \dot{\mathbf{x}}(t) = \sum\_{i=1}^{r} \mu\_i(\xi(t)) \left\{ A\_i \mathbf{x}(t) + B\_i u(t) + M\_i f\_a(t) + D\_i d(\mathbf{x}(t), u(t), t) \right\} \\ y(t) = \mathbf{C} \mathbf{x}(t) + N f\_s(t) \end{cases} \tag{2}$$

where *C* and *N* are known full rank matrices with appropriate dimensions. *Ai*, *Bi*, *Mi*, and *Di* are real known matrices, *r* represents the number of fuzzy rules and *μi*(*ξ*(*t*)) are the fuzzy membership functions depending on the unmeasurable variable vector *ξ*(*t*) and satisfy the following so-called convex sum property

$$\begin{cases} \ 0 \le \mu\_i(\xi(t)) \le 1\\ \sum\_{i=1}^r \mu\_i(\xi(t)) = 1 \end{cases} \tag{3}$$

In the rest of the paper, (*t*) is dropped from the equations; *d*, *μ*<sub>*i*</sub> and *μ̂*<sub>*i*</sub> denote *d*(*x*, *u*, *t*), *μ*<sub>*i*</sub>(*ξ*(*t*)) and *μ*<sub>*i*</sub>(*ξ̂*(*t*)), respectively, and the mark (∗) denotes the transposed element in a symmetric matrix.

To build a system with just an actuator fault and then use the actuator fault reconstruction concepts, the output is passed through an orthogonal matrix *T*<sub>*r*</sub> ∈ *R*<sup>*p*×*p*</sup> and an augmented TS system of order *n* + *h* can be obtained as

$$\begin{cases}
\dot{X} = \sum\_{i=1}^{r} \mu\_i \{\mathcal{A}\_i X + \mathcal{B}\_i u + \mathcal{D}\_i d + \mathcal{M}\_i f\_a + \mathcal{N} f\_s\} \\
Y = \mathcal{C}X
\end{cases},\tag{4}$$

where *X* = [*x*<sup>*T*</sup> *z*<sup>*T*</sup>]<sup>*T*</sup> ∈ *R*<sup>*n*+*h*</sup>, *Y* = [*y*<sub>1</sub><sup>*T*</sup> *z*<sup>*T*</sup>]<sup>*T*</sup> ∈ *R*<sup>*p*</sup>, and

$$\mathcal{A}\_{i} = \begin{bmatrix} A\_{i} & 0 \\ A\_{f}\mathbf{C}\_{2} & -A\_{f} \end{bmatrix}, \mathcal{B}\_{i} = \begin{bmatrix} B\_{i} \\ 0 \end{bmatrix}, \mathcal{D}\_{i} = \begin{bmatrix} D\_{i} \\ 0 \end{bmatrix}, \mathcal{M}\_{i} = \begin{bmatrix} M\_{i} \\ 0 \end{bmatrix}, \mathcal{N} = \begin{bmatrix} 0 \\ A\_{f}N\_{2} \end{bmatrix}, \mathcal{C} = \begin{bmatrix} \mathbf{C}\_{1} & 0 \\ 0 & I\_{h} \end{bmatrix}. \tag{5}$$

Here, −*A*<sub>*f*</sub> ∈ *R*<sup>*h*×*h*</sup> is an arbitrary stable matrix, *z* ∈ *R*<sup>*h*</sup> and *N*<sub>2</sub> ∈ *R*<sup>*h*×*h*</sup>. *T*<sub>*r*</sub> can be obtained by QR reduction of the matrix *N*.

By defining

$$\phi := \sum\_{i=1}^{r} (\mu\_i - \hat{\mu}\_i) \{ \mathcal{A}\_i X + \mathcal{B}\_i u + \mathcal{D}\_i d + \mathcal{M}\_i f\_a + \mathcal{N} f\_s \}, \tag{6}$$

the TS system (4) can be rewritten as

$$\begin{cases} \dot{X} = \sum\_{i=1}^{r} \hat{\mu}\_{i} \{ \mathcal{A}\_{i} X + \mathcal{B}\_{i} u + \mathcal{D}\_{i} d + \mathcal{M}\_{i} f\_{a} + \mathcal{N} f\_{s} + \phi \} \\ Y = \mathcal{C}X \end{cases} \tag{7}$$

Moreover, with *x̂* denoting the estimate of *x*, the nonlinear term *φ* is assumed to satisfy the Lipschitz condition

$$\lVert \phi \rVert \le \gamma \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert, \quad \forall \mathbf{x}, \hat{\mathbf{x}} \in \mathbb{R}^n. \tag{8}$$

To design a sliding mode observer, some assumptions and lemmas are needed as follows.

**Assumption 1.**

$$rank(\mathcal{C}[\mathcal{M}\_i \mathcal{N}]) = q + h \tag{9}$$

**Assumption 2.**

$$n > p \ge q + h \tag{10}$$

**Assumption 3.**

$$
rank \begin{bmatrix} sI\_{n+h} - \mathcal{A}\_i & \mathcal{M}\_i & \mathcal{N} \\ \mathcal{C} & 0 & 0 \end{bmatrix} = n + 2h + q \tag{11}
$$

*holds for all s satisfying* Re(*s*) ≥ 0.

### **Lemma 1.**

(a) *If Assumptions 1 and 2 are satisfied, then there exist changes of coordinates T*<sub>*i*</sub> *such that*

$$\mathcal{A}\_{i} = \begin{bmatrix} \mathcal{A}\_{11,i} & \mathcal{A}\_{12,i} \\ \begin{bmatrix} \mathcal{A}\_{211,i} \\ \mathcal{A}\_{212,i} \end{bmatrix} & \mathcal{A}\_{22,i} \end{bmatrix}, \mathcal{M}\_{i} = \begin{bmatrix} 0 \\ \mathcal{M}\_{2,i} \end{bmatrix}, \mathcal{N} = \begin{bmatrix} 0 \\ \mathcal{N}\_2 \end{bmatrix}, \mathcal{D}\_{i} = \begin{bmatrix} \mathcal{D}\_{1,i} \\ \mathcal{D}\_{2,i} \end{bmatrix}, \mathcal{C} = \begin{bmatrix} 0 & T\_0 \end{bmatrix}, \tag{12}$$

*where* 𝒜<sub>11,*i*</sub> ∈ *R*<sup>(*n*+*h*−*p*)×(*n*+*h*−*p*)</sup>, 𝒜<sub>211,*i*</sub> ∈ *R*<sup>(*p*−*q*−*h*)×(*n*+*h*−*p*)</sup>, 𝒟<sub>2,*i*</sub> ∈ *R*<sup>*p*×*l*</sup>, *and T*<sub>0</sub> ∈ *R*<sup>*p*×*p*</sup> *is an orthogonal matrix. Matrices* ℳ<sub>2,*i*</sub> ∈ *R*<sup>*p*×*q*</sup> *and* 𝒩<sub>2</sub> ∈ *R*<sup>*p*×*h*</sup> *can have the following structure:*

$$\mathcal{M}\_{2,i} = \begin{bmatrix} 0\\ \mathcal{M}\_{0,i} \end{bmatrix}, \mathcal{N}\_2 = \begin{bmatrix} 0\\ \mathcal{N}\_0 \end{bmatrix}, \tag{13}$$

*where* ℳ<sub>0,*i*</sub> ∈ *R*<sup>(*q*+*h*)×*q*</sup> *and* 𝒩<sub>0</sub> ∈ *R*<sup>(*q*+*h*)×*h*</sup> *are nonsingular.*

(b) *The pairs* (𝒜<sub>11,*i*</sub>, 𝒜<sub>21,*i*</sub>) *are detectable if and only if the invariant zeros of* {𝒜<sub>*i*</sub>, [ℳ<sub>*i*</sub> 𝒩], 𝒞} *lie in* C<sup>−</sup>*, which happens if and only if Assumption 3 is satisfied.*

**Assumption 4.** *The unknown vectors f*<sub>*a*</sub> *and f*<sub>*s*</sub> *and the derivatives of μ*<sub>*i*</sub> *for i* ∈ {1, …, *r*} *are assumed to be norm bounded by some known constants. Therefore,*

$$\lVert f\_a \rVert \leq \rho\_a; \quad \lVert f\_s \rVert \leq \rho\_s; \quad \lVert \dot{\mu}\_i \rVert \leq \rho\_{mi}. \tag{14}$$

**Lemma 2** [19]**.** *The parameterized linear matrix inequality (PLMI)* ∑<sub>*i*=1</sub><sup>*r*</sup> ∑<sub>*j*=1</sub><sup>*r*</sup> *μ*<sub>*i*</sub>*μ*<sub>*j*</sub>*R*<sub>*ij*</sub> < 0 *is fulfilled if the following conditions hold:*

$$\begin{cases} \ R\_{ii} < 0 & \text{for } i = 1, \dots, r\\ \frac{2}{r-1} R\_{ii} + R\_{ij} + R\_{ji} < 0 & \text{for } i \neq j = 1, \dots, r \end{cases} \tag{15}$$
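The relaxation in Lemma 2 is easy to check numerically once candidate matrices are available. The following Python sketch is only an illustration under stated assumptions: the matrices are symmetric and stored as nested lists, negative definiteness is tested through a Cholesky factorization of the negated matrix, and the function names are hypothetical rather than taken from the original work.

```python
import math


def is_negative_definite(m, tol=1e-12):
    """True if the symmetric matrix m is negative definite.

    The test attempts a Cholesky factorization of -m, which succeeds
    exactly when -m is positive definite. Only the lower triangle is read.
    """
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = -m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def scale(c, m):
    """Multiply every entry of the matrix m by the scalar c."""
    return [[c * x for x in row] for row in m]


def add(*mats):
    """Entrywise sum of matrices of equal size."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def lemma2_conditions_hold(R):
    """Check conditions (15) for R[i][j], i, j = 0..r-1, with r >= 2."""
    r = len(R)
    for i in range(r):
        if not is_negative_definite(R[i][i]):
            return False
    for i in range(r):
        for j in range(r):
            if i != j:
                combo = add(scale(2.0 / (r - 1), R[i][i]), R[i][j], R[j][i])
                if not is_negative_definite(combo):
                    return False
    return True
```

For instance, with two rules, *R*<sub>11</sub> = *R*<sub>22</sub> = −2*I* and *R*<sub>12</sub> = *R*<sub>21</sub> = 0.5*I* satisfy both conditions, whereas replacing the off-diagonal blocks by 5*I* violates the second one.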

#### **3. TS Fuzzy-Based Sliding Mode Observer Design**

The proposed TS sliding mode observer for the nonlinear system (2) in the new coordinate (12) is as follows:

$$\begin{cases}
\dot{\hat{X}} = \sum\_{i=1}^{r} \hat{\mu}\_{i} \left\{ \mathcal{A}\_{i} \hat{X} + \mathcal{B}\_{i} u + \mathcal{G}\_{l,i} e\_{Y} + \mathcal{G}\_{n,i} \nu\_{a,i} + \mathcal{G}\_{n,i} \nu\_{s} \right\} \\
\hat{Y} = \mathcal{C} \hat{X}
\end{cases} \tag{16}$$

where 𝒢<sub>*n*,*i*</sub> and 𝒢<sub>*l*,*i*</sub> are design matrices of the observer that will be derived through Theorem 1, *e*<sub>*Y*</sub> := *Y* − *Ŷ* represents the output estimation error, and *ν*<sub>*a*,*i*</sub> and *ν*<sub>*s*</sub> are the equivalent output error injections that are used to compensate for the errors due to the actuator fault and sensor fault, respectively. They have the following structure:

$$\begin{aligned} \nu\_{a,i} &= \begin{cases} \eta\_{a,i} \frac{e\_Y}{\lVert e\_Y \rVert} & e\_Y \neq 0 \\ 0 & \text{otherwise} \end{cases} \\ \nu\_s &= \begin{cases} \eta\_s \frac{e\_Y}{\lVert e\_Y \rVert} & e\_Y \neq 0 \\ 0 & \text{otherwise} \end{cases} \end{aligned} \tag{17}$$

where *η*<sub>*a*,*i*</sub> and *η*<sub>*s*</sub> are two positive scalars such that

$$\begin{aligned} \eta\_{a,i} &\geq \rho\_a \lVert T\_0 \mathcal{M}\_{2,i} \rVert \max\_j \left(\frac{\lVert P\_{2,j} \rVert}{\lambda\_{\min}(P\_{2,j})}\right) + w\_{a,i} \\ \eta\_s &\geq \rho\_s \lVert T\_0 \mathcal{N}\_2 \rVert \max\_j \left(\frac{\lVert P\_{2,j} \rVert}{\lambda\_{\min}(P\_{2,j})}\right) + w\_s \end{aligned} \quad \forall i, j \in \{1, \dots, r\}. \tag{18}$$

*wa*.*i* and *ws* are two arbitrary positive constants.
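To make the role of the switching gains concrete, the following Python sketch evaluates a unit-vector injection of the form *η* *e*<sub>*Y*</sub>/‖*e*<sub>*Y*</sub>‖ and the lower bound on the gain given in (18). It is a minimal illustration, not the authors' implementation: the norms and minimum eigenvalues of the *P*<sub>2,*j*</sub> matrices are assumed to be supplied by the caller, and all function and argument names are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(x * x for x in v))


def injection(e_y, eta):
    """Unit-vector injection eta * e_y / ||e_y||, and zero when e_y = 0."""
    n = norm(e_y)
    if n == 0.0:
        return [0.0] * len(e_y)
    return [eta * x / n for x in e_y]


def min_gain(rho, norm_t0_m2, p2_norms, p2_min_eigs, w):
    """Smallest gain allowed by (18).

    rho          known bound on the fault norm
    norm_t0_m2   norm of T0 times the fault distribution block
    p2_norms     list of ||P_{2,j}|| for j = 1..r
    p2_min_eigs  list of lambda_min(P_{2,j}) for j = 1..r
    w            arbitrary positive margin
    """
    ratio = max(pn / le for pn, le in zip(p2_norms, p2_min_eigs))
    return rho * norm_t0_m2 * ratio + w
```

Whatever the size of the output error, the injection always has norm *η*, which is why the gain must dominate the worst-case fault magnitude scaled by the conditioning of *P*<sub>2,*j*</sub>.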

The observer (16) guarantees that the state estimation error converges to a pre-designed sliding surface in finite time and then asymptotically to zero. Define the state estimation error as *e* := *X* − *X̂*. By subtracting the observer dynamics from the system dynamics (7) in the new coordinate (12), the state estimation error dynamics can be given as

$$\dot{e} = \sum\_{i=1}^{r} \hat{\mu}\_{i} \left\{ (\mathcal{A}\_{i} - \mathcal{G}\_{l,i}\mathcal{C})e + \mathcal{M}\_{i}f\_{a} - \mathcal{G}\_{n,i}\nu\_{a,i} + \mathcal{N}f\_{s} - \mathcal{G}\_{n,i}\nu\_{s} + \mathcal{D}\_{i}d + \phi \right\}. \tag{19}$$

By partitioning *φ* as *φ* = [*φ*<sub>1</sub><sup>*T*</sup> *φ*<sub>2</sub><sup>*T*</sup>]<sup>*T*</sup> and applying a further change of coordinates

$$T\_{L,i} = \begin{bmatrix} I\_{n+h-p} & L\_i \\ 0 & T\_0 \end{bmatrix}, \quad L\_i = \begin{bmatrix} \overline{L}\_i & 0 \end{bmatrix} \in \mathbb{R}^{(n+h-p)\times p}, \tag{20}$$

where *L̄*<sub>*i*</sub> ∈ *R*<sup>(*n*+*h*−*p*)×(*p*−*q*−*h*)</sup> is a stabilizing gain matrix, it is straightforward to see that

$$\begin{aligned} \widetilde{\mathcal{A}}\_{i} &= \begin{bmatrix} \mathcal{A}\_{11,i} + L\_{i}\mathcal{A}\_{21,i} & \mathcal{A}\_{12,i} \\ T\_{0}\mathcal{A}\_{21,i} & \mathcal{A}\_{22,i} \end{bmatrix} \\ \widetilde{\mathcal{M}}\_{i} &= \begin{bmatrix} 0 \\ T\_{0}\mathcal{M}\_{2,i} \end{bmatrix} \\ \widetilde{\mathcal{N}} &= \begin{bmatrix} 0 \\ T\_{0}\mathcal{N}\_{2} \end{bmatrix} \\ \widetilde{\mathcal{D}}\_{i} &= \begin{bmatrix} \mathcal{D}\_{1,i} + L\_{i}\mathcal{D}\_{2,i} \\ T\_{0}\mathcal{D}\_{2,i} \end{bmatrix} \\ \widetilde{\mathcal{C}} &= \begin{bmatrix} 0 & I\_{p} \end{bmatrix} \\ \widetilde{\mathcal{G}}\_{n,i} &= \begin{bmatrix} 0 \\ I\_{p} \end{bmatrix} \\ \widetilde{\mathcal{G}}\_{l,i} &= \begin{bmatrix} \mathcal{A}\_{12,i} \\ \mathcal{A}\_{22,i} - \mathcal{A}\_{s,i} \end{bmatrix} \\ \widetilde{\phi} &= T\_{L,i}\phi = T\_{L,i}\begin{bmatrix} \phi\_{1} \\ \phi\_{2} \end{bmatrix} \end{aligned} \tag{21}$$

where 𝒜<sub>*s*,*i*</sub> are arbitrary stable design matrices. In the new coordinates, the error dynamics (19) can be rewritten as

$$\dot{\tilde{e}} = \begin{bmatrix} \dot{e}\_{1} \\ \dot{e}\_{Y} \end{bmatrix} = \sum\_{i=1}^{r} \hat{\mu}\_{i} \left\{ \widetilde{\mathcal{A}}\_{t,i} \tilde{e} + T\_{L,i} \phi + \widetilde{\mathcal{M}}\_{i} f\_{a} - \widetilde{\mathcal{G}}\_{n,i} \nu\_{a,i} + \widetilde{\mathcal{N}} f\_{s} - \widetilde{\mathcal{G}}\_{n,i} \nu\_{s} + \widetilde{\mathcal{D}}\_{i} d \right\}, \tag{22}$$

where

$$
\tilde{\mathcal{A}}\_{\mathbf{t},\mathbf{i}} = \begin{bmatrix}
\mathcal{A}\_{11.\mathbf{i}} + L\_{i}\mathcal{A}\_{21.\mathbf{i}} & 0 \\
T\_{0}\mathcal{A}\_{21.\mathbf{i}} & \mathcal{A}\_{\mathbf{s},\mathbf{i}}
\end{bmatrix} \,. \tag{23}
$$

The goal is to design the matrices *Li* such that the asymptotic stability of (22) is assured while the following specified *H*∞ performance is guaranteed:

$$\lVert \tilde{e} \rVert^2 \le \vartheta^2 \lVert d \rVert^2. \tag{24}$$

The following theorem provides sufficient conditions to ensure asymptotic stability of the state estimation error (22) with maximized admissible Lipschitz constant *γ* in (8) and minimized *H*∞ performance gain *ϑ* in (24).

**Theorem 1.** *If there exist feasible solutions for the following optimization problem with a fixed scalar* 0 ≤ *λ* ≤ 1

$$\begin{array}{ll} \min[\lambda(\sigma + \varepsilon) + (1 - \lambda)\vartheta] \\ \text{subject to} \\ \operatorname{eig}(R\_{ii}) < 0 & \text{for } i = 1, \dots, r \\ \operatorname{eig}(\frac{2}{r - 1} R\_{ii} + R\_{ij} + R\_{ji}) < 0 & \text{for } i \neq j = 1, \dots, r \\ -\operatorname{eig}(P\_{1,i}) < 0 & \text{for } i = 1, \dots, r \\ -\operatorname{eig}(P\_{2,i}) < 0 & \text{for } i = 1, \dots, r \\ -\varepsilon < 0 \\ -\sigma < 0 \\ -\vartheta < 0 \end{array} \tag{25}$$

*where*

$$\begin{aligned} R\_{ij} &= \begin{bmatrix} \Phi\_{1,ij} & (P\_{2,j}T\_0\mathcal{A}\_{21,i})^T & \Phi\_{3,ij} \\ P\_{2,j}T\_0\mathcal{A}\_{21,i} & \Phi\_{2,ij} & P\_{2,j}T\_0\mathcal{D}\_{2,i} \\ \Phi\_{3,ij}^T & (P\_{2,j}T\_0\mathcal{D}\_{2,i})^T & -\beta I\_l \end{bmatrix} \\ \Phi\_{1,ij} &= \left(\mathcal{A}\_{11,i} + L\_i\mathcal{A}\_{21,i}\right)^T P\_{1,j} + P\_{1,j}(\mathcal{A}\_{11,i} + L\_i\mathcal{A}\_{21,i}) + \varepsilon^{-1}P\_{1,j}P\_{1,j} \\ &\quad + \left(\sigma^{-1} + 1\right)I\_{n+h-p} + \sum\_{k=1}^r \rho\_{mk}P\_{1,k} \\ \Phi\_{2,ij} &= \mathcal{A}\_{s,i}^T P\_{2,j} + P\_{2,j}\mathcal{A}\_{s,i} + \varepsilon^{-1}P\_{2,j}P\_{2,j} + \left(\sigma^{-1} + 1\right)I\_{p} + \sum\_{k=1}^r \rho\_{mk}P\_{2,k} \\ \Phi\_{3,ij} &= P\_{1,j}\mathcal{D}\_{1,i} + P\_{1,j}L\_i\mathcal{D}\_{2,i} \end{aligned} \tag{26}$$

*and eig represents the eigenvalues of a matrix, then the estimation error (22) is asymptotically stable with the maximized admissible Lipschitz constant γ*<sup>∗</sup> = max(*γ*) = 1/(‖*T*<sub>*L*</sub>‖ ‖*T*<sub>*L*</sub><sup>−1</sup>‖ √(*εσ*)) *and the derived L*<sub>*i*</sub> *matrices can be used for the purpose of simultaneous fault reconstruction.*

**Proof.** The proof of this theorem is done by using a positive NQLF as follows

$$V = \tilde{e}^{T} \left( \sum\_{j=1}^{r} \hat{\mu}\_{j} P\_{j} \right) \tilde{e}, \tag{27}$$

where *P*<sub>*j*</sub> = diag(*P*<sub>1,*j*</sub>, *P*<sub>2,*j*</sub>), and *P*<sub>1,*j*</sub> ∈ *R*<sup>(*n*+*h*−*p*)×(*n*+*h*−*p*)</sup> and *P*<sub>2,*j*</sub> ∈ *R*<sup>*p*×*p*</sup> are symmetric positive definite matrices. The time derivative of the candidate Lyapunov function along the trajectory (22) is given by

$$\begin{split} \dot{V} &= \sum\_{i=1}^{r} \sum\_{j=1}^{r} \hat{\mu}\_{i} \hat{\mu}\_{j} \Big\{ \tilde{e}^{T} \Big( \widetilde{\mathcal{A}}\_{t,i}^{T} P\_{j} + P\_{j} \widetilde{\mathcal{A}}\_{t,i} + \sum\_{k=1}^{r} \dot{\mu}\_{k} P\_{k} \Big) \tilde{e} \\ & \quad + 2 \tilde{e}^{T} P\_{j} \Big( T\_{L,i} \phi + \widetilde{\mathcal{M}}\_{i} f\_{a} - \widetilde{\mathcal{G}}\_{n,i} \nu\_{a,i} + \widetilde{\mathcal{N}} f\_{s} - \widetilde{\mathcal{G}}\_{n,i} \nu\_{s} + \widetilde{\mathcal{D}}\_{i} d \Big) \Big\}. \end{split} \tag{28}$$

From (14), (17), (18) and (21), one has:

$$\begin{aligned} \tilde{e}^{T}P\_{j}\left(\widetilde{\mathcal{M}}\_{i} f\_a - \widetilde{\mathcal{G}}\_{n,i}\nu\_{a,i}\right) &= e\_Y^{T}P\_{2,j}T\_0\mathcal{M}\_{2,i} f\_a - \eta\_{a,i}\frac{e\_Y^{T}P\_{2,j}e\_Y}{\lVert e\_Y\rVert} \\ &\le \lVert e\_Y^{T}P\_{2,j}T\_0\mathcal{M}\_{2,i} f\_a\rVert - \eta\_{a,i}\frac{e\_Y^{T}P\_{2,j}e\_Y}{\lVert e\_Y\rVert} \\ &\le \lVert e\_Y^{T}P\_{2,j}T\_0\mathcal{M}\_{2,i} f\_a\rVert - \eta\_{a,i}\lambda\_{\min}(P\_{2,j})\lVert e\_Y\rVert \\ &\le \lVert e\_Y\rVert\left(\rho\_a\lVert P\_{2,j}\rVert \lVert T\_0\mathcal{M}\_{2,i}\rVert - \eta\_{a,i}\lambda\_{\min}(P\_{2,j})\right) \\ &\le -w\_{a,i}\lambda\_{\min}(P\_{2,j})\lVert e\_Y\rVert \le 0, \\ \tilde{e}^{T}P\_{j}\left(\widetilde{\mathcal{N}} f\_s - \widetilde{\mathcal{G}}\_{n,i}\nu\_{s}\right) &= e\_Y^{T}P\_{2,j}T\_0\mathcal{N}\_{2} f\_s - \eta\_{s}\frac{e\_Y^{T}P\_{2,j}e\_Y}{\lVert e\_Y\rVert} \\ &\le \lVert e\_Y^{T}P\_{2,j}T\_0\mathcal{N}\_{2} f\_s\rVert - \eta\_{s}\frac{e\_Y^{T}P\_{2,j}e\_Y}{\lVert e\_Y\rVert} \\ &\le \lVert e\_Y^{T}P\_{2,j}T\_0\mathcal{N}\_{2} f\_s\rVert - \eta\_{s}\lambda\_{\min}(P\_{2,j})\lVert e\_Y\rVert \\ &\le \lVert e\_Y\rVert\left(\rho\_s\lVert P\_{2,j}\rVert \lVert T\_0\mathcal{N}\_{2}\rVert - \eta\_{s}\lambda\_{\min}(P\_{2,j})\right) \\ &\le -w\_{s}\lambda\_{\min}(P\_{2,j})\lVert e\_Y\rVert \le 0. \end{aligned} \tag{29}$$

From (14), one has

$$
\sum\_{k=1}^{r} \dot{\mu}\_k P\_k \le \sum\_{k=1}^{r} \rho\_{mk} P\_k. \tag{30}
$$

By considering the fact that 2𝒫<sup>*T*</sup>𝒬 ≤ (1/*ε*)𝒫<sup>*T*</sup>𝒫 + *ε*𝒬<sup>*T*</sup>𝒬 with *ε* > 0 and using (8), one obtains

$$2\tilde{e}^{T}P\_{j}T\_{L}\phi \leq \frac{1}{\varepsilon}\tilde{e}^{T}P\_{j}P\_{j}\tilde{e} + \varepsilon\phi^{T}T\_{L}^{T}T\_{L}\phi \leq \frac{1}{\varepsilon}\tilde{e}^{T}P\_{j}P\_{j}\tilde{e} + \varepsilon\alpha^{2}\lVert\tilde{e}\rVert^{2}, \tag{31}$$

where *α* := ‖*T*<sub>*L*</sub>‖ ‖*T*<sub>*L*</sub><sup>−1</sup>‖ *γ*. By substituting (29)–(31) into (28), one has

$$\dot{V} \le \sum\_{i=1}^{r} \sum\_{j=1}^{r} \hat{\mu}\_{i} \hat{\mu}\_{j} \left\{ \tilde{e}^{T} \left( \widetilde{\mathcal{A}}\_{t,i}^{T} P\_{j} + P\_{j} \widetilde{\mathcal{A}}\_{t,i} + \frac{1}{\varepsilon} P\_{j} P\_{j} + \varepsilon \alpha^{2} I\_{n+h} + \sum\_{k=1}^{r} \rho\_{mk} P\_{k} \right) \tilde{e} + 2 \tilde{e}^{T} P\_{j} \widetilde{\mathcal{D}}\_{i} d \right\}. \tag{32}$$

By defining the parameter *σ* := (*εα*<sup>2</sup>)<sup>−1</sup> and the cost function as *J* := *V̇*(*ẽ*) + *ẽ*<sup>*T*</sup>*ẽ* − *ϑ*<sup>2</sup>*d*<sup>*T*</sup>*d*, one has

$$\begin{split} J \le \sum\_{i=1}^{r} \sum\_{j=1}^{r} \hat{\mu}\_{i} \hat{\mu}\_{j} \Big\{ & \tilde{e}^{T} \Big( \widetilde{\mathcal{A}}\_{t,i}^{T} P\_{j} + P\_{j} \widetilde{\mathcal{A}}\_{t,i} + \varepsilon^{-1} P\_{j} P\_{j} + \sigma^{-1} I\_{n+h} + I\_{n+h} + \sum\_{k=1}^{r} \rho\_{mk} P\_{k} \Big) \tilde{e} \\ & + 2 \tilde{e}^{T} P\_{j} \widetilde{\mathcal{D}}\_{i} d - \beta d^{T} d \Big\}, \end{split} \tag{33}$$

where *β* := *ϑ*<sup>2</sup>. By placing (23) in (33) and considering the diagonal structure of *P*<sub>*j*</sub>, the inequality (33) is continued as

$$J \le \sum\_{i=1}^{r} \sum\_{j=1}^{r} \hat{\mu}\_i \hat{\mu}\_j \begin{bmatrix} e\_1\\ e\_Y\\ d \end{bmatrix}^T \Lambda \begin{bmatrix} e\_1\\ e\_Y\\ d \end{bmatrix} < 0,\tag{34}$$

where

$$\begin{aligned} \Lambda &= \begin{bmatrix} \Phi\_{1,ij} & \left(P\_{2,j}T\_{0}\mathcal{A}\_{21,i}\right)^{T} & \Phi\_{3,ij} \\ P\_{2,j}T\_{0}\mathcal{A}\_{21,i} & \Phi\_{2,ij} & P\_{2,j}T\_{0}\mathcal{D}\_{2,i} \\ \Phi\_{3,ij}^{T} & \left(P\_{2,j}T\_{0}\mathcal{D}\_{2,i}\right)^{T} & -\beta I\_{l} \end{bmatrix} \\ \Phi\_{1,ij} &= \left(\mathcal{A}\_{11,i} + L\_{i}\mathcal{A}\_{21,i}\right)^{T}P\_{1,j} + P\_{1,j}\left(\mathcal{A}\_{11,i} + L\_{i}\mathcal{A}\_{21,i}\right) + \varepsilon^{-1}P\_{1,j}P\_{1,j} \\ &\quad + \left(\sigma^{-1} + 1\right)I\_{n+h-p} + \sum\_{k=1}^{r}\rho\_{mk}P\_{1,k} \\ \Phi\_{2,ij} &= \mathcal{A}\_{s,i}^{T}P\_{2,j} + P\_{2,j}\mathcal{A}\_{s,i} + \varepsilon^{-1}P\_{2,j}P\_{2,j} + \left(\sigma^{-1} + 1\right)I\_{p} + \sum\_{k=1}^{r}\rho\_{mk}P\_{2,k} \\ \Phi\_{3,ij} &= P\_{1,j}\mathcal{D}\_{1,i} + P\_{1,j}L\_{i}\mathcal{D}\_{2,i}. \end{aligned} \tag{35}$$

Based on the congruence transformation [20], the inequality (34) is satisfied if

$$\sum\_{i=1}^{r} \sum\_{j=1}^{r} \hat{\mu}\_{i} \hat{\mu}\_{j} \begin{bmatrix} \Phi\_{1,ij} & \left(P\_{2,j}T\_{0}\mathcal{A}\_{21,i}\right)^{T} & \Phi\_{3,ij} \\ P\_{2,j}T\_{0}\mathcal{A}\_{21,i} & \Phi\_{2,ij} & P\_{2,j}T\_{0}\mathcal{D}\_{2,i} \\ \Phi\_{3,ij}^{T} & \left(P\_{2,j}T\_{0}\mathcal{D}\_{2,i}\right)^{T} & -\beta I\_{l} \end{bmatrix} < 0. \tag{36}$$

By utilizing Lemma 2, the summations and the fuzzy membership functions are removed from inequality (36). The resulting conditions are then passed to fmincon, a nonlinear optimization tool in MATLAB that finds the minimum of a problem specified by

$$\begin{array}{c} \min\_{\mathbf{x}} f(\mathbf{x}) \\ \text{subject to} \begin{cases} c(\mathbf{x}) \le 0 \\ ceq(\mathbf{x}) = 0 \\ A \cdot \mathbf{x} \le b \\ Aeq \cdot \mathbf{x} = beq \\ lb \le \mathbf{x} \le ub \end{cases} \end{array} \tag{37}$$

Because fmincon handles nonlinear constraints as vectors of scalar inequalities, the matrix inequalities (36) are converted into one-dimensional inequalities on their eigenvalues, and the optimization problem can be defined as (25) and (26). In addition, from the *ε* and *σ* found by the optimization problem, the maximum admissible Lipschitz constant can be calculated as

$$\gamma^\* = \frac{1}{\lVert T\_L\rVert \lVert T\_L^{-1}\rVert \sqrt{\sigma \varepsilon}}.\tag{38}$$
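The relation between the optimization variables and the Lipschitz constant can be verified with a few lines of arithmetic. The Python sketch below evaluates (38) and the inverse relation implied by *α* := ‖*T*<sub>*L*</sub>‖ ‖*T*<sub>*L*</sub><sup>−1</sup>‖ *γ* and *σ* := (*εα*<sup>2</sup>)<sup>−1</sup>. The matrix norms are assumed to be computed elsewhere and passed in as numbers, and the function names are hypothetical.

```python
import math


def gamma_star(eps, sigma, norm_tl, norm_tl_inv):
    """Maximum admissible Lipschitz constant from (38)."""
    return 1.0 / (norm_tl * norm_tl_inv * math.sqrt(sigma * eps))


def sigma_from_gamma(eps, gamma, norm_tl, norm_tl_inv):
    """Inverse relation: sigma = 1 / (eps * alpha^2), alpha = ||T_L|| ||T_L^-1|| gamma."""
    alpha = norm_tl * norm_tl_inv * gamma
    return 1.0 / (eps * alpha ** 2)
```

Because *γ*<sup>∗</sup> is inversely proportional to √(*σε*), minimizing *σ* + *ε* in (25) pushes the admissible Lipschitz constant upward, which is the sense in which the theorem maximizes it.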


#### **4. Simultaneous Fault Reconstruction**

In Section 3, an *H*∞ sliding mode observer is designed in which two discontinuous terms (17) are considered to reconstruct simultaneous faults in the presence of an unknown disturbance based on the measured signals *u* and *y*. Along the sliding surface, *e*<sub>*Y*</sub> = *ė*<sub>*Y*</sub> = 0. Consequently, (22) on the sliding surface changes to

$$\sum\_{i=1}^{r} \hat{\mu}\_i \left\{ T\_0 \mathcal{A}\_{21,i} e\_1 + T\_0 \phi\_2 + T\_0 \mathcal{M}\_{2,i} f\_a - \nu\_{eqa,i} + T\_0 \mathcal{N}\_2 f\_s - \nu\_{eqs} + T\_0 \mathcal{D}\_{2,i} d \right\} = 0,\tag{39}$$

where *ν*<sub>*eqa*,*i*</sub> and *ν*<sub>*eqs*</sub> are approximations of the equivalent output error injection terms (17) required to maintain the sliding motion and can be defined as

$$\nu\_{eqa,i} = \eta\_{a,i} \frac{e\_Y}{\lVert e\_Y \rVert + \delta\_a}; \quad \nu\_{eqs} = \eta\_s \frac{e\_Y}{\lVert e\_Y \rVert + \delta\_s},\tag{40}$$

where *δ*<sub>*a*</sub> and *δ*<sub>*s*</sub> are small positive constants. Consequently, (39) leads to

$$0 = \sum\_{i=1}^{r} \hat{\mu}\_i \left\{ \mathcal{A}\_{21,i} e\_1 + \phi\_2 + \mathcal{M}\_{2,i} f\_a - T\_0^{-1} \nu\_{eqa,i} + \mathcal{N}\_2 f\_s - T\_0^{-1} \nu\_{eqs} + \mathcal{D}\_{2,i} d \right\}. \tag{41}$$

On the other hand, using (8) and (24), it can be shown that the term 𝒜<sub>21,*i*</sub>*e*<sub>1</sub> + *φ*<sub>2</sub> + 𝒟<sub>2,*i*</sub>*d* is bounded as

$$\begin{aligned} \lVert \mathcal{A}\_{21,i}e\_1 + \phi\_2 + \mathcal{D}\_{2,i}d \rVert &\leq \left( \lVert\mathcal{A}\_{21,i}\rVert + \gamma \lVert T\_L^{-1}\rVert \right) \lVert e\_1\rVert + \lVert\mathcal{D}\_{2,i}\rVert \lVert d\rVert \\ &\leq \left( \lVert\mathcal{A}\_{21,i}\rVert + \gamma \lVert T\_L^{-1}\rVert \right) \lVert\tilde{e}\rVert + \lVert\mathcal{D}\_{2,i}\rVert \lVert d\rVert. \end{aligned} \tag{42}$$

By (24), the right-hand side is at most (*ϑ*(‖𝒜<sub>21,*i*</sub>‖ + *γ*‖*T*<sub>*L*</sub><sup>−1</sup>‖) + ‖𝒟<sub>2,*i*</sub>‖)‖*d*‖. Therefore, for small values of ‖*d*‖, the actuator and sensor faults can be estimated as

$$\hat{f}\_a = \left(\sum\_{i=1}^r \hat{\mu}\_i \mathcal{M}\_{2,i} \right)^\dagger T\_0^{-1} \sum\_{i=1}^r \hat{\mu}\_i \left\{ \eta\_{a,i} \frac{e\_Y}{\lVert e\_Y \rVert + \delta\_a} \right\} \tag{43}$$

$$\hat{f}\_s = \mathcal{N}\_2^\dagger T\_0^{-1} \eta\_s \frac{e\_Y}{\lVert e\_Y \rVert + \delta\_s},\tag{44}$$

where † denotes the pseudo-inverse of a matrix.
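The reconstruction formulas (43) and (44) can be sketched in a few lines for the special case of one actuator fault and one sensor fault, so that the fault distribution blocks are column vectors. This Python illustration rests on several assumptions that are not stated in the original text: *q* = *h* = 1, the inverse of the orthogonal matrix *T*<sub>0</sub> is taken as its transpose, the pseudo-inverse of a nonzero column vector *m* is *m*<sup>*T*</sup>/(*m*<sup>*T*</sup>*m*), and all function names are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(x * x for x in v))


def smoothed_injection(e_y, eta, delta):
    """Continuous approximation (40): eta * e_y / (||e_y|| + delta)."""
    n = norm(e_y)
    return [eta * x / (n + delta) for x in e_y]


def transpose_times(t, v):
    """Compute T^T v, which equals T^{-1} v when T is orthogonal."""
    return [sum(t[k][i] * v[k] for k in range(len(v))) for i in range(len(t[0]))]


def pinv_column_times(m, v):
    """Pseudo-inverse of a nonzero column vector m applied to v."""
    return sum(a * b for a, b in zip(m, v)) / sum(a * a for a in m)


def estimate_actuator_fault(mu_hat, m2_list, t0, e_y, eta_a_list, delta_a):
    """Scalar actuator fault estimate following (43)."""
    p = len(e_y)
    m_blend = [sum(mu * m2[k] for mu, m2 in zip(mu_hat, m2_list)) for k in range(p)]
    nu = [0.0] * p
    for mu, eta in zip(mu_hat, eta_a_list):
        inj = smoothed_injection(e_y, eta, delta_a)
        nu = [a + mu * b for a, b in zip(nu, inj)]
    return pinv_column_times(m_blend, transpose_times(t0, nu))


def estimate_sensor_fault(n2, t0, e_y, eta_s, delta_s):
    """Scalar sensor fault estimate following (44)."""
    nu = smoothed_injection(e_y, eta_s, delta_s)
    return pinv_column_times(n2, transpose_times(t0, nu))
```

The smoothing constants trade chattering against accuracy: a smaller *δ* brings the injection closer to the discontinuous term (17), while a larger *δ* gives a smoother but more biased estimate.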

**Remark 1.** *The numerical solution of Theorem 1 can be summarized as follows:*


### **5. Numerical Example**

In this section, a three-state variable continuous stirred tank reactor (CSTR) system is utilized to show the effectiveness of the proposed sliding mode observer in both actuator and sensor faults reconstruction in the presence of an unknown disturbance. To show the performance improvement of the proposed approach, the obtained results are compared to the LMI approach presented in ref. [17].

Consider a well-mixed variable CSTR in which a multi-component chemical reaction A B → C is being carried out. The nonlinear dynamics of the CSTR is given by the following model [21],

$$
\dot{\mathbf{x}} = \begin{bmatrix} -4 & 0.8796 & 0 \\ 3 & -3.6388 & 0 \\ 0 & 1.7592 & -1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \mathbf{u} + \begin{bmatrix} 0.5 \mathbf{x}\_2^2 \\ -1.5 \mathbf{x}\_2^2 \\ \mathbf{x}\_2^2 \end{bmatrix} \tag{45}
$$

where *x* = [*x*<sub>1</sub> *x*<sub>2</sub> *x*<sub>3</sub>]<sup>*T*</sup>, and the states represent the concentrations of the species *A*, *B*, and *C*, respectively. To check the advantage of the proposed method, two faults and a disturbance are added to the dynamics (45) as

$$\begin{cases} \dot{\mathbf{x}} = \begin{bmatrix} -4 & 0.8796 + 0.5 \mathbf{x}\_2 & 0\\ 3 & -3.6388 - 1.5 \mathbf{x}\_2 & 0\\ 0 & 1.7592 + \mathbf{x}\_2 & -1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} u + \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} f\_a + \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} d \\ \mathbf{y} = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix} \mathbf{x} + \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} f\_s \end{cases} \tag{46}$$

It is supposed that the concentration of *B* is dimensionless, which means that *x*<sub>2</sub> ∈ [−1, 1]. Consequently, by using TS rules, two membership functions can be defined as

$$h\_1 = \frac{1 - x\_2}{2}; \quad h\_2 = \frac{1 + x\_2}{2}. \tag{47}$$

Therefore, the local linear TS matrices can be determined as

$$\begin{aligned} A\_1 &= \begin{bmatrix} -4 & 0.8796 - 0.5 & 0 \\ 3 & -3.6388 + 1.5 & 0 \\ 0 & 1.7592 - 1 & -1 \end{bmatrix}; B\_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; M\_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}; D\_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \\\ A\_2 &= \begin{bmatrix} -4 & 0.8796 + 0.5 & 0 \\ 3 & -3.6388 - 1.5 & 0 \\ 0 & 1.7592 + 1 & -1 \end{bmatrix}; B\_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; M\_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}; D\_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \end{aligned} \tag{48}$$

The TS fuzzy system matrices satisfy all the assumptions; therefore, the TS fuzzy sliding observer (16) can be designed.
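As a quick numerical check of this construction, the following Python sketch blends the two local matrices from (48) with the membership functions (47) and compares the result with the state-dependent matrix in (46). The function names are illustrative and are not part of the original work.

```python
def memberships(x2):
    """Membership functions (47), valid for x2 in [-1, 1]."""
    return (1.0 - x2) / 2.0, (1.0 + x2) / 2.0


# Local linear matrices from (48): A1 corresponds to x2 = -1, A2 to x2 = +1.
A1 = [[-4.0, 0.8796 - 0.5, 0.0],
      [3.0, -3.6388 + 1.5, 0.0],
      [0.0, 1.7592 - 1.0, -1.0]]
A2 = [[-4.0, 0.8796 + 0.5, 0.0],
      [3.0, -3.6388 - 1.5, 0.0],
      [0.0, 1.7592 + 1.0, -1.0]]


def blended_matrix(x2):
    """Convex combination h1 * A1 + h2 * A2 of the local models."""
    h1, h2 = memberships(x2)
    return [[h1 * a + h2 * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(A1, A2)]


def nonlinear_matrix(x2):
    """State-dependent system matrix of (46)."""
    return [[-4.0, 0.8796 + 0.5 * x2, 0.0],
            [3.0, -3.6388 - 1.5 * x2, 0.0],
            [0.0, 1.7592 + x2, -1.0]]


def max_abs_difference(x2):
    """Largest entrywise gap between the TS blend and the nonlinear matrix."""
    blend = blended_matrix(x2)
    exact = nonlinear_matrix(x2)
    return max(abs(a - b) for r1, r2 in zip(blend, exact) for a, b in zip(r1, r2))
```

For any *x*<sub>2</sub> in [−1, 1] the difference is zero up to rounding, which confirms that the two-rule TS model is an exact representation of the nonlinearity on this interval rather than an approximation.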

For simulation, the parameters and input signal are chosen as *u* = sin(*t*), *A*<sub>*f*</sub> = 1, 𝒜<sub>*s*</sub> = −5*I*, *η*<sub>*a*,*i*</sub> = *η*<sub>*a*</sub> = 25, *η*<sub>*s*</sub> = 25, *δ*<sub>*a*</sub> = 0.01 and *δ*<sub>*s*</sub> = 0.01, and the initial conditions are chosen as *X*<sub>0</sub> = [1 1.2 1 0]<sup>*T*</sup> and *X̂*<sub>0</sub> = [1.5 2.8 0.5 0]<sup>*T*</sup>. Moreover, the disturbance is chosen as *d* = 0.1 sin(0.2*t*)*x*<sub>3</sub> and its shape is shown in Figure 1.

**Figure 1.** Disturbance d(t).

The maximum Lipschitz constant and the minimum *H*∞ performance gain obtained by applying the fmincon function in MATLAB to Theorem 1 are *γ*<sup>∗</sup> = 0.8358 and *ϑ*<sup>∗</sup> = 0.2982. The observer matrices are derived as

$$\begin{aligned} \mathcal{G}\_{l,1} &= \begin{bmatrix} 0.4499 & 1 & 0 \\ 3.3912 & 3 & 0 \\ 4.8998 & 0 & 0 \\ 1.1852 & 0 & 4 \end{bmatrix}, \mathcal{G}\_{l,2} = \begin{bmatrix} 2.4723 & 1 & 0 \\ -0.2487 & 3 & 0 \\ 8.9447 & 0 & 0 \\ 1.7921 & 0 & 4 \end{bmatrix}, \\ \mathcal{G}\_{n,1} &= \begin{bmatrix} 0 & 1 & 0 \\ 1.1852 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \mathcal{G}\_{n,2} = \begin{bmatrix} 0 & 1 & 0 \\ 1.7921 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \end{aligned}$$

It should be noted that the initial point for fmincon is chosen based on the results of the related published papers. Figure 2 shows the state estimation error, which converges to a small neighborhood of zero rather than exactly to zero because of the unknown disturbance.

**Figure 2.** State estimation error in the presence of faults and disturbance.

Figures 3 and 4 show that the proposed TS-based SMO is able to reconstruct the simultaneous faults with a small error in the presence of an unknown disturbance.

**Figure 3.** Actuator fault *fa*(*t*) (by blue solid line) and its estimation ˆ*fa*(*t*) (by red dashed line).

**Figure 4.** Sensor fault *fs*(*t*) (by blue solid line) and its estimation ˆ*fs*(*t*) (by red dashed line).

The proposed approach is compared with another non-quadratic Lyapunov-based approach using linear optimization analysis based on LMIs [17]. Figure 5 describes the fault estimation errors using both approaches.

**Figure 5.** Fault estimation errors (**a**). Actuator fault, (**b**). Sensor fault (the proposed approach by red solid line and ref. [17] by green dashed line).

As can be seen, the proposed nonlinear approach is less conservative and can estimate both actuator and sensor faults with smaller errors. In addition, the proposed approach has less computational burden. In Table 1, a quantitative comparison between the proposed approach and the LMI approach presented in ref. [17] is considered. In this table, the Euclidean and infinity norms of the fault error estimations are compared and the improvements are calculated as

$$Improvement\left(\%\right) = \left(\frac{F\_l - F\_n}{F\_l}\right) \times 100,\tag{49}$$

where *F*<sub>*l*</sub> and *F*<sub>*n*</sub> represent the norm of the fault estimation error using the LMI approach [17] and the proposed nonlinear approach, respectively.
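The improvement measure is a plain relative reduction of the error norm with the LMI result as the baseline. A minimal Python sketch follows; the function and argument names are hypothetical, and the numbers in the usage note are made up for illustration and are not taken from Table 1.

```python
def improvement_percent(err_lmi, err_proposed):
    """Relative reduction of the error norm, in percent, as in (49).

    err_lmi       error norm obtained with the LMI approach (baseline)
    err_proposed  error norm obtained with the proposed nonlinear approach
    """
    if err_lmi <= 0.0:
        raise ValueError("the baseline error norm must be positive")
    return (err_lmi - err_proposed) / err_lmi * 100.0
```

For example, a hypothetical baseline error norm of 0.5 reduced to 0.3 corresponds to an improvement of 40%, and a negative value would indicate that the proposed approach performed worse than the baseline.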

**Table 1.** The norm specifications of the fault reconstruction errors for two different approaches.


As can be seen in Table 1, the proposed approach improves the fault estimation accuracies by more than 30%.

### **6. Discussion**

In this paper, a nonlinear optimization approach for simultaneous actuator and sensor fault reconstruction in nonlinear systems subjected to unknown disturbances was proposed. First, an augmented system with just an actuator fault was created. Then, by using the fuzzy Lyapunov stability analysis and two changes of coordinates, the parameters of a sliding mode observer were designed through a nonlinear optimization problem while maximizing the Lipschitz constant and minimizing the *H*∞ performance index. The optimization problem was solved by using fmincon in MATLAB as a nonlinear optimization tool. By utilizing the optimum points, both actuator and sensor faults were reconstructed properly. Finally, the simulation results showed a considerable increase in the fault reconstruction accuracy while using constraints of smaller dimension.

**Author Contributions:** Conceptualization, S.A.; methodology, S.A., M.M., and G.G.W.; software, S.A.; validation, S.A., M.M.; formal analysis, S.A.; investigation, S.A.; resources, S.A.; writing—original draft preparation, S.A. and G.G.W.; writing—review and editing, S.A. and M.M.; visualization, S.A. and M.M.; supervision, M.M. and G.G.W.; project administration, M.M. and G.G.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


### *Article* **Vibro-Acoustic Distributed Sensing for Large-Scale Data-Driven Leak Detection on Urban Distribution Mains**

**Lili Bykerk \* and Jaime Valls Miro**

 Correspondence: lili.bykerk@uts.edu.au

**Abstract:** Non-surfacing leaks constitute the dominant source of water losses for utilities worldwide. This paper presents advanced data-driven analysis methods for leak monitoring using commercial field-deployable semi-permanent vibro-acoustic sensors, evaluated on live data collected from extensive multi-sensor deployments across a sprawling metropolitan city. This necessarily includes a wide variety of pipeline sizes, materials and surrounding soils, as well as leak sources and rates brought about by external factors. The novel proposition for structural pipe health monitoring shows that excellent leak/no-leak classification results (>94% accuracy) can be observed using Convolutional Neural Networks (CNNs) trained with Short-Time Fourier Transforms (STFTs) of the raw audio files. Most notably, it is shown how this can be achieved irrespective of the sensor used, with four models from different manufacturers being part of the investigation, and over time across extended densely populated areas.

**Keywords:** water distribution network; vibro-acoustic sensors; leak detection; structural health monitoring; feature extraction; signal processing; machine learning; binary classification; data-driven; neural network

### **1. Introduction**

Potable water mains are critical components of water infrastructure. Many water utilities worldwide are managing underground pipes that have been in use for centuries. Given their age and environmental surroundings, pipes are susceptible to failures often caused by tree roots, corrosion, and/or ground movement. In addition to pipe failures, leaks can also emerge from appurtenances in the pipe network such as hydrants, valves, pipe joints, main tapping points, or service lines. Depending on the environment, water from some leaks may never surface, and will remain hidden, resulting in large water losses. When a leak becomes visible, reactive repairs are undertaken, causing disruption to customers and costly maintenance, which can be challenging for utilities to manage.

Distributed IoT sensors such as digital meters are being increasingly used by utilities to remotely monitor the performance of their network in (near) real-time. This allows the monitoring of water usage habits, and establishing the potential for leaks in the main tap and service line connection to a home. In the distribution network, IoT flow meters have been explored to identify leakage. A small experimental laboratory study contrasting various machine learning algorithms (random forest, decision trees, neural networks, and Support Vector Machine) revealed the former as the best at detecting leaks with a 75% accuracy [1]. These sensors require access to the water column to operate, a nontrivial exercise in distribution networks, thus severely limiting their leak identification and localisation capabilities. They have not been widely adopted by the industry, whose preference is for non-intrusive and portable sensing methods, such as contact acoustics-based signalling. As water discharges from a leak in the pipe network, vibrations are induced and propagated along the pipe wall. To detect hidden leaks, utilities commonly schedule Active Leak Detection (ALD) teams to periodically sweep areas of pipelines using acoustic leak detection equipment such as listening sticks and real-time correlators [2]. The success of these ALD sweeps can be hindered by the prevalence of environmental and water usage noises during the day, when the sweeps are conducted, and the experience of the user [3]. Depending on the length of the utility's pipeline network, the time that elapses between ALD sweeps may result in hidden leaks remaining undetected for long periods of time, or missed entirely. For the continuous monitoring of the network, alternative methods of leak detection are also employed, such as Minimum Night Flow (MNF) and pressure transient analysis using existing network hardware (flow meters and pressure gauges). These methods, however, are only capable of detecting possible leakage in a given area, and will not provide any means of locating or pinpointing a leak location.

**Citation:** Bykerk, L.; Valls Miro, J. Vibro-Acoustic Distributed Sensing for Large-Scale Data-Driven Leak Detection on Urban Distribution Mains. *Sensors* **2022**, *22*, 6897. https://doi.org/10.3390/s22186897

Academic Editors: Yongbo Li, Bing Li, Jinchen Ji and Hamed Kalhori

Received: 9 June 2022 Accepted: 7 September 2022 Published: 13 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Robotics Institute (UTS:RI), University of Technology Sydney, Ultimo, NSW 2007, Australia **\***

Vibro-acoustic sensing has been widely adopted by water utilities [4,5], mainly due to the relatively low cost, ease of implementation, flexibility, and passive nature of the system, whereby no permanent changes to the water pipeline network are required for the technology to function. These semi-permanent devices can be used to effectively and remotely monitor the water mains for leakages, generally between 2 and 4 a.m., when there is low network activity (the time period when MNF is calculated) and low levels of environmental noise. However, there are several challenges and uncertainties in analysing the acoustic sensor data for leak detection: (1) a leak noise can be attenuated due to fittings, joints, junctions, and service connections which are often undocumented; (2) the presence of environmental noises, and water usage in the network; (3) the signal recorded by the acoustic sensor is directly related to the pipe material and diameter, proximity to the leak noise and the quality of the sensor's mounting point on the asset [6,7].

Semi-permanent vibro-acoustic noise loggers have in-built algorithms which raise leak alarms based on the intensity and consistency of the recorded noise [8]. Using this method, a large number of false positive leak alarms are raised by the system, and quieter leaks are missed (false negatives). By understanding the limitations of these in-built leak detection algorithms, and the uncertainties affecting the data recorded by an acoustic logger, there is a motivation and need for a more advanced analysis of the acoustic data to achieve accurate and reliable leak detection. Signal processing and data-driven machine learning methods are common techniques to increase the reliability of leak detection using vibro-acoustic noise loggers. Most leak detection approaches in the literature extract features from an audio recording, which are either directly used to interpret signals for leakage [7,9–11], or used to train machine learning classifiers. Models trained with simple features such as the absolute noise level recorded by loggers [12], or cross-correlation and coherence signals from neighbouring correlating noise loggers [13] have also demonstrated high accuracies in leak localisation and classification, respectively. Other methods rely on having collected baseline signals or signals before and after a leak has been repaired [14–17], to establish leak detection thresholds. Due to the persistent nature of a leak signal in an audio recording, Recurrence Plots (RPs) offer an alternative input for a binary classifier, with RPs of leak noises showing strong deterministic properties [18].

Data-driven machine learning studies have leveraged frequency-domain features of acoustic signals for training such as the Power Spectrum Density (PSD) [14,19] or Intrinsic Mode Functions (IMFs) [20]. Whilst these features may prove effective for classification in controlled laboratory tests, they are easily influenced by a temporary ambient noise which can mask a persistent leak noise in the PSD, leading to decreased classification performance [21]. This limitation is critical for sensor deployments on functioning pipeline networks, where both persistent and transient non-leak noises are prevalent, leak noises are not controlled, and the pipe network can be complex. Many of these studies are conducted in controlled laboratory environments [22–25], with few examples of data sets obtained from real pipeline networks. Data collected from in-field deployments of vibro-acoustic sensors have predominantly contained unbalanced data sets, with small amounts of leak samples [18,26,27] or data collected with minimal interference noises, where Gaussian White Noise (GWN) with different Signal-to-Noise Ratios (SNRs) is added to augment the data sets [21]. Unbalanced data sets remain a limitation in evaluating the success of any leak detection classifier, particularly for real-world sensor deployments where pipe materials, diameters, soil properties, service lines, and offtakes, amongst other geospatial features, can vary significantly and heavily influence the signals recorded by the vibro-acoustic sensors.

Time–frequency features generated using discrete Short-time Fourier Transforms (STFTs), such as spectrograms, reveal the temporal nature of a signal that is not captured by analysing frequency-domain features alone. STFTs can provide rich features for machine learning; however, STFTs as standalone input features are rarely used for acoustic signal analysis, due to a limitation in the time–frequency resolution [28]. In an effort to balance the relationship between the time and frequency resolutions, a Time–Frequency Convolutional Neural Network (TFCNN), with three different spectrogram resolutions as inputs is proposed to study the efficacy of classification under varying SNR conditions in real pipeline networks [21]. The TFCNN model is compared against a range of other common classifiers, including a CNN trained with Fast Fourier Transform (FFT) data (Frequency Convolutional Neural Network (FCNN)). It is reported that the spectrogram contains sufficient defining characteristics of a leak signal (as opposed to time, or frequency-based features alone), and is therefore more favourable and reliable as an input to a leak detection system. Mel-frequency spectrograms, which closely align with the human perception of sound, are also commonly used as features in machine learning applications, including leak classification problems [29,30].

This paper evaluates state-of-the-art data-driven methods for leak classification using data collected from semi-permanent vibro-acoustic logger deployments in small reticulation mains across metropolitan Sydney over the course of up to 24 months. Data from a range of commercially available types of vibro-acoustic sensors deployed in different metropolitan areas of a utility-managed water network are used to evaluate the efficacy of existing data-driven methods (FCNN and TFCNN models [21]) for reliable leak detection in urban distribution mains.

The paper is organised as follows. Section 2 details the vibro-acoustic sensors and data loggers, data collection, signal processing, data curation, feature extraction and binary classification methods. Section 3 presents the results and discussion. Finally, the conclusions and future work are presented in Section 4.

### **2. Materials and Methods**

#### *2.1. Vibro-Acoustic Sensors and Data Loggers*

Vibro-acoustic logging hardware consists of a vibro-acoustic sensor, data logger, and other peripherals such as GSM transmitters and antennas to send the collected data to the cloud. Vibro-acoustic sensors function on the premise that when water leaks through a pipe it creates vibrations due to the pressure differential between the inside and the outside of a pipe. The waves can travel through both pipe material and water, allowing the sensors to measure the vibration inflicted on the material, or directly in the water column. Standard manufacturer specifications indicate that vibro-acoustic sensors are effective in recording leakage noises on reticulation mains typically smaller than 375 mm in diameter, and can correlate over distances of up to 150 m between adjacent loggers.

In December 2019, a range of vibro-acoustic sensor deployments commenced across six Central Business District (CBD) areas in metropolitan Sydney (summarised in Table 1). In these CBD areas, five different types of commercially available semi-permanent vibro-acoustic loggers (see Figure 1) were deployed. These could not be collocated in the same spots to compare performance given the chamber's physical limitations and the extent of exposed asset to mount them on (see some examples in Figure 2), and were thus distributed to cover separate areas and zones (when within the same area). It should also be noted that, given the attachment coupling of the sensor to the appurtenance, they cannot physically measure the exact same point regardless, so arranging them over an extended geographical coverage of the city is more representative of a realistic deployment for comparison, and more effective for finding as many leaks as possible over a given time period for a more robust validation of the proposed scheme.


**Figure 1.** Range of vibro-acoustic loggers installed across metropolitan Sydney: (**a**) HWM PermaNET SU, (**b**) HWM PermaNET+, (**c**) SebaKMT Sebalog N-3, (**d**) Von Roll ORTOMAT-MTC, (**e**) Primayer Enigma3m.


**Table 1.** Vibro-Acoustic Logger Deployment Details.

The five different vibro-acoustic sensors and data loggers are functionally equivalent, whereby vibrations in the pipeline network are detected by the sensors and recorded with the data logging hardware. The key differences between the loggers are the quality of the hardware used, the level of processing of the data, both on the logger itself and the cloud-based portals, and the user programmable settings (e.g., audio recording duration and time).

The sensors have mostly been installed on appurtenances (valves and hydrants) attached to Cast Iron Cement Lined (CICL) or Steel Cement Lined (SCL) pipelines, ranging in diameter from 100 mm to 450 mm and up to more than 100 years old. Depending on the available space in a hydrant or valve chamber and the condition of the assets, the sensors are often mounted with differing orientations and mounting points, as shown in Figure 2.

**Figure 2.** HWM PermaNET SU loggers deployed on hydrant control valves in different locations.

#### *2.2. Data Collection*

Noises in the pipe network are measured every day at a time of low water usage and theoretically low environmental noise (between 2 and 4 a.m.). With the exception of the Sebalog N-3 vibro-acoustic sensors, all of the deployed sensors were programmed to record a 10 s duration audio file daily. The Sebalog N-3 units have limited configuration settings; thus, despite recording a 2.5 s duration audio clip every day, the audio file is only sent to the cloud if the logger itself determines that a leak is present (through a noise level threshold algorithm). In addition to audio recordings, other noise-level data are also available for analysis from most of the loggers; however, these were not used in this study. All loggers are equipped with integrated modems and transmit data to the cloud, with the raw acoustic data (audio files) available through the sensor manufacturers' FTP servers, or accessible through API calls.

The collected data consist of 'leak' and 'no-leak' audio recordings originating from a range of leak sources across the six deployment areas. Approximately 70% of the detected leaks were hidden, many of which were in built-up areas and estimated to have been present for up to 10 years. The detected leaks were found to have emerged from a range of sources, including hydrants ∼30%, valves ∼20%, main taps ∼22%, private ∼11%, service lines ∼12%, mains (leaks/breaks) ∼2.5%, and meter taps ∼2.5%. Some examples of hidden leaks detected by the vibro-acoustic sensors are shown in Figure 3.

**Figure 3.** Examples of detected hidden leaks from a range of vibro-acoustic sensors and deployment areas (pictures supplied by utility field crews, taken during repairs): (**a**) Copper service leak, (**b**) Main tap leak, (**c**) Main tap leak (clamped service line), (**d**) Leak on main tap coupling piece, (**e**) leaking main tap (excavation site), and (**f**) leaking main tap repaired with full circumference pipe clamp.

The four logger data sets (HWM, Von Roll, SebaKMT, Primayer) mostly include loggers that recorded leak noises from the first day they were deployed. These existing leaks were monitored for several days to confirm the likelihood of the presence of a leak, prior to raising these locations for in-field investigation by the water utility. The leaks were confirmed on-site by skilled network technicians through use of listening sticks and pinpointed using real-time correlators. Significant delays were experienced with some repair jobs, due to the complex locations of some leaks. Consequently, many of the recorded leak signals contain the same underlying persistent leak noise, occasionally overlaid with transient environmental noises. As existing leaks were gradually repaired and baseline noise levels could be achieved, the emergence and evolution of new leaks were able to be identified and the data sets grew further in size over the course of the deployments. Since only a small subset of all of the deployed loggers detected leaks, only these loggers were included in the data sets (both before and after leak repairs), to ensure a relative balance of the data sets. To improve on the robustness of the classification in the presence of other environmental noises, those loggers which only recorded 'no-leak' signals for the duration of their deployment could also be used.

#### *2.3. Data Analysis-Signal Processing*

Across the six deployment areas, a wide variety of leak noises were recorded. Some sensors were located very close to the leak source, and others at a distance, with variations in pipe diameters and materials, and several offtakes between. Using STFT signal processing techniques, acoustic signals can be best visualised by generating spectrograms, which reveal temporal changes to the frequency and power of a signal. If the audio recording contains persistent noise, without the presence of any intermittent external noises, PSD line plots can also provide a simple means of signal comparison. As leaks are continuous noise sources, their higher-power frequencies are persistent in the spectrum, for the duration of an audio recording. On the other hand, non-leak noises—such as those from environmental sources, or water usage—are mostly transient in nature, with intermittent frequency components. Some environmental noises, however, can be persistent, such as mechanical or electrical equipment which commonly emit high-power, low-frequency noises usually with narrow frequency bands. Due to these characteristic features, persistent and intermittent 'no-leak' signals are easily distinguishable from 'leak' signals in a spectrogram (see Figure 4 for an example). Due to the close coupling of the sensors to the water main, leaks generally have a distinguishing pattern in the audio spectrum, even in the presence of other intermittent noises.
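The distinction between persistent and transient spectral content can be illustrated with a minimal NumPy sketch. It is not the processing chain used in this study: the sampling rate, window length, tone frequencies, and the threshold of ten times the median power are all assumptions chosen for illustration, and the signal is synthetic. The sketch computes a Hann-windowed STFT power spectrogram and, for each frequency bin, the fraction of time frames in which the power is elevated.

```python
import numpy as np

def power_spectrogram(x, fs, nperseg=256, hop=128):
    """Hann-windowed STFT power; returns (freqs, times, power[freq, frame])."""
    window = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack([x[i * hop:i * hop + nperseg] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nperseg / 2) / fs
    return freqs, times, power.T

# Synthetic 10 s recording (every value here is an illustrative assumption).
fs = 4096                                   # hypothetical sampling rate in Hz
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
leak_like = 0.5 * np.sin(2 * np.pi * 640 * t)            # persistent tone
transient = np.where((t > 4.0) & (t < 4.5),
                     np.sin(2 * np.pi * 1536 * t), 0.0)  # 0.5 s burst
x = leak_like + transient + 0.05 * rng.standard_normal(t.size)

freqs, times, S = power_spectrogram(x, fs)

# Time-averaged spectrum: a simple, unnormalised PSD-style line plot.
mean_spectrum = S.mean(axis=1)

# Persistence: fraction of frames in which each bin exceeds an assumed
# threshold of ten times the median spectrogram power.
threshold = 10 * np.median(S)
persistence = (S > threshold).mean(axis=1)

leak_bin = np.argmin(np.abs(freqs - 640))
transient_bin = np.argmin(np.abs(freqs - 1536))
print("persistent tone:", persistence[leak_bin])
print("transient burst:", persistence[transient_bin])
```

In this synthetic case the persistent tone is elevated in essentially every frame, whereas the burst is elevated only in the frames that overlap it, which is the visual cue a spectrogram provides.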

**Figure 4.** Spectrograms for a sensor detecting a variety of noises: (**a**) before and (**b**) after a leak was repaired ∼21 m away from the sensor. The sensor was situated <1 m from a pedestrian crossing.

By clustering the loggers in the pipeline networks to ensure neighbouring loggers are able to correlate, often more than one logger was able to record noise from a single leak source; one such example is shown in Figure 5, where six vibro-acoustic sensors were able to detect the leak noise caused by a broken back on the pipe (main break). The shift in the dominant leak frequency can be observed with increased distance between the leak and the sensor. Other contributing factors to the frequency shift could also include pipe material change and junctions and offtakes between the leak and the sensor. In general, the further away the sensor is from the leak location, the more the higher-frequency components of the spectrum are attenuated, and the lower frequency noises are more prevalent. With increased distance between the logger and the noise source, the intensity (power) of the noise also decays.

**Figure 5.** An illustration of a single leak source originating from a broken back pipe, and the noise spectrograms picked up by 6 HWM PermaNET+ loggers in the vicinity. The 7th sensor on the right is located too far away to pick it up.

A leak located close to the hydrant where the logger is installed will typically have elevated noise across the spectrum, often with higher power in high frequency band/s. Figure 6 shows PSD line plots from HWM vibro-acoustic sensors detecting leaks at the hydrant they were installed on. All leaks were on screw-down-type hydrants, and suspected to be of varying leak rates. The vibro-acoustic sensors were installed in different orientations and with different contact points on the hydrants, similar to those mounting configurations shown in Figure 2. There is a significant difference in the PSDs of each hydrant leak. The difference in signals could be attributed to many factors including the quality of the attachment point of the sensor on the asset or the magnitude of the leak. Comparing these signals to a 'quiet', baseline signal with no leak present, it is noted that all four leak signals show elevated power across almost all recorded frequencies, and clear peaks in the spectrum at certain frequencies. This indicates that despite leaking hydrant signals being inconsistent across multiple loggers/hydrants, there is still a significant deviation from a baseline 'no-leak' noise that is sufficient to detect a leak.

**Figure 6.** PSD plot: four different hydrant leak signals (loggers on leaking hydrants).

#### *2.4. Data Analysis-Data Curation*

In order to curate the collected data to train machine learning classifiers, the raw acoustic data were analysed in the time, frequency, and time–frequency domains using the signal processing and visualisation techniques (PSD, STFT, FFT) described in Section 2.3. Analysis of the vibro-acoustic data, in conjunction with feedback from the utility field crews, allowed for a database to be compiled with key information pertaining to the leaks. The collated and curated data consist of the audio file name, date of audio recording and binary class label ('leak' or 'no-leak'). Other collected information not used for the binary classification includes the leak source, and the distance, pipe material/s, and diameter/s between the leak and logger.

Most of the detected leaks were present prior to the loggers being installed; however, there were some instances where new leaks emerged during the logger deployment time. For the leaks that were already present, the collected acoustic signals were generally stable and unchanged in their frequency. In some instances, noticeable frequency/power shifts in the spectrum were observed (see Figure 7), possibly from a leak worsening, or the sensor being slightly shifted/dislodged on the asset due to environmental factors or human intervention. These cases were carefully analysed to ensure that the data were representative of a true 'leak' or 'no-leak' signal, and the logger had not been dislodged from the asset.

The curated data from individual loggers were compiled into complete data sets for each logger *manufacturer* (for a total of four discrete data sets). Due to the slightly differing frequency ranges and audio recording duration (as listed in Table 1), individual classifiers were trained for each sensor manufacturer and were evaluated individually. With nearly 300 loggers deployed across the six deployment areas over the course of two years, the complete data sets from each logger manufacturer are vast. To ensure a relative balance of data for each data set, only data from loggers which recorded both 'leak' and 'no-leak' signals throughout their deployment are included in the data sets.
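The inclusion rule described above (keep only loggers that recorded both classes) can be sketched in a few lines of Python. The record layout is hypothetical: the curated database is described as holding the audio file name, recording date, and class label, and the `logger_id` field and example values below are assumptions added for illustration.

```python
from collections import defaultdict

def loggers_with_both_classes(records):
    """Keep only records from loggers that have 'leak' and 'no-leak' samples."""
    labels_by_logger = defaultdict(set)
    for rec in records:
        labels_by_logger[rec["logger_id"]].add(rec["label"])
    keep = {logger for logger, labels in labels_by_logger.items()
            if {"leak", "no-leak"} <= labels}
    return [rec for rec in records if rec["logger_id"] in keep]

# Hypothetical curated records (file names and identifiers are made up).
records = [
    {"logger_id": "A", "file": "a_001.wav", "label": "leak"},
    {"logger_id": "A", "file": "a_002.wav", "label": "no-leak"},
    {"logger_id": "B", "file": "b_001.wav", "label": "no-leak"},
    {"logger_id": "B", "file": "b_002.wav", "label": "no-leak"},
]
selected = loggers_with_both_classes(records)
print([rec["file"] for rec in selected])
```

Logger B is dropped here because it only ever recorded 'no-leak' signals.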

#### *2.5. Feature Extraction and Binary Classification*

To evaluate the performance of a binary classifier for each of the data sets, an extensive literature review on the topic of data-driven leak detection methods with acoustic data was first conducted. A critical criterion in determining the suitability of a classifier was the reported performance with data collected from real pipeline networks. With limited studies and evaluations utilising data from deployments of loggers outside of controlled laboratory environments, it was found that CNN-based classifiers leveraging features obtained from FFTs and STFTs (spectrograms) had the best reported performance, compared with other common binary classification models.

Both the FCNN and TFCNN models from [21] are trained and evaluated in this paper, using the four discrete data sets collected from the six deployment areas. The data sets were first prepared by augmenting [31] (splitting) each audio file into several 1 s audio chunks. For the SebaKMT loggers, only the first two seconds of the 2.5 s duration audio recordings were used. Audio recordings from all other loggers (10 s duration) were split into 10 individual audio chunks. Due to the vast array of samples, including various 'no-leak' noise sources, it was not deemed necessary to further augment the data sets by adding GWN with different SNRs into the raw signals. To extract the frequency bands of interest where leaks are most common, the 1 s duration audio samples were also bandpass filtered (100–2000 Hz). With the data sets collected and curated, finally, a random 80% of each complete data set (for each logger type) was used for training and 20% for testing. The models (whose structures are shown in Figure 8) were implemented in Python 3.9 using Keras [32] and TensorFlow [33] version 2.6.0.
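The preparation steps described above (1 s chunking, 100–2000 Hz band-pass filtering, and a random 80/20 split) can be sketched as follows. This is an illustrative NumPy outline under stated assumptions, not the authors' code: the sampling rate is hypothetical, the filter design is not specified in the text so a simple FFT-mask band-pass stands in for it, and the split is applied per chunk because the text does not state whether it was applied per chunk or per recording.

```python
import numpy as np

def split_into_chunks(audio, fs, chunk_s=1.0, max_chunks=None):
    """Cut a recording into consecutive fixed-length chunks, dropping the tail."""
    n = int(round(chunk_s * fs))
    n_chunks = len(audio) // n
    if max_chunks is not None:
        n_chunks = min(n_chunks, max_chunks)
    return audio[:n_chunks * n].reshape(n_chunks, n)

def bandpass_fft(chunks, fs, low_hz=100.0, high_hz=2000.0):
    """Zero-phase band-pass by masking FFT bins (an assumed filter design)."""
    spectrum = np.fft.rfft(chunks, axis=-1)
    freqs = np.fft.rfftfreq(chunks.shape[-1], d=1.0 / fs)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=chunks.shape[-1], axis=-1)

def train_test_indices(n_samples, train_fraction=0.8, seed=0):
    """Random index split into training and testing subsets."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

fs = 4096                                   # hypothetical sampling rate in Hz
t = np.arange(10 * fs) / fs
# Synthetic 10 s recording: 50 Hz hum (out of band) plus 800 Hz tone (in band).
audio = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 800 * t)

chunks = split_into_chunks(audio, fs)       # ten 1 s chunks
filtered = bandpass_fft(chunks, fs)         # hum removed, tone retained
train_idx, test_idx = train_test_indices(len(filtered))

# A 2.5 s recording limited to its first two seconds.
short = split_into_chunks(np.zeros(int(2.5 * fs)), fs, max_chunks=2)
print(chunks.shape, short.shape, len(train_idx), len(test_idx))
```

Splitting chunks of the same recording across the training and testing subsets can inflate test scores, so a split per recording or per logger may be preferable where the data allow it.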

**Figure 7.** Leaking main tap: the changing frequency distribution is visible in the PSD (**top**) and spectrograms (**bottom**) from consecutive days. The logger was situated approximately 59 m away from the leak location, with noise being propagated along a straight section of 150 mm diameter CICL pipe.

**Figure 8.** TFCNN and FCNN model structures.

The input to the FCNN model is purely frequency-domain based: an FFT of the 1 s audio signal. The inputs to the TFCNN model are three spectrograms generated from the same 1 s audio signal. Each spectrogram is generated with a different time–frequency resolution (high time, transitional, high frequency) and is intended to improve the leak detection performance, since 'no-leak' and 'leak' noises have different time–frequency components. A high-time-resolution spectrogram reflects the change of the signal in the time domain, where a leak signal is most stable. In these spectrograms, the presence of any transient noise is most obvious. The high-frequency-resolution spectrogram reflects the spectral structure and energy distribution of the signal in the frequency domain. Whilst transient noises can still be observed in these spectrograms, the leak frequency or frequencies are best represented. Finally, the transitional time–frequency resolution is intended to balance the relationship between the time and frequency resolutions. Due to different sampling rates of the four sets of loggers, the dimensions of the three spectrograms which are the inputs for the TFCNN model differ slightly, as listed in Table 2.
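The trade-off between the three resolutions can be made concrete with a short NumPy sketch. The window lengths, the 50% overlap, and the sampling rate below are assumptions for illustration only; the actual spectrogram dimensions used for each logger type are those listed in Table 2.

```python
import numpy as np

def stft_magnitude(x, nperseg, hop):
    """Hann-windowed STFT magnitude with shape (n_freqs, n_frames)."""
    window = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // hop
    frames = np.stack([x[i * hop:i * hop + nperseg] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Assumed window lengths in samples: a short window gives fine time
# resolution, a long window gives fine frequency resolution.
RESOLUTIONS = {"high_time": 64, "transitional": 256, "high_frequency": 1024}

def multi_resolution_spectrograms(chunk):
    """Three spectrograms of the same 1 s chunk, one per resolution."""
    return {name: stft_magnitude(chunk, nperseg, nperseg // 2)
            for name, nperseg in RESOLUTIONS.items()}

fs = 4096                                          # hypothetical sampling rate
chunk = np.random.default_rng(0).standard_normal(fs)   # stand-in 1 s chunk
specs = multi_resolution_spectrograms(chunk)
for name, spec in specs.items():
    print(name, spec.shape)
```

With these assumed settings the short window yields many frames and few frequency bins, and the long window the reverse, so each spectrogram would need its own input branch in a multi-input network.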



### **3. Results and Discussion**

Tables 3 and 4 summarise the results of the FCNN and TFCNN classification models for the four logger data sets. The metrics used to evaluate the model performance were accuracy, sensitivity, and specificity. The following abbreviations are used to simplify the presentation of the equations: True Positive (*TP*); True Negative (*TN*); False Positive (*FP*); False Negative (*FN*). Accuracy is the measure of the classifier's overall correct classification performance: (*TP* + *TN*)/(*TP* + *TN* + *FP* + *FN*). Sensitivity is the classifier's ability to label a 'leak' signal as 'leak' (recall of the positive class): *TP*/(*TP* + *FN*). Specificity is the classifier's ability to label a 'no-leak' signal as 'no-leak' (recall of the negative class): *TN*/(*TN* + *FP*).
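As a worked illustration of these three definitions, the following Python sketch computes them from confusion-matrix counts. The counts are hypothetical and are not taken from Tables 3 and 4.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # recall of the 'leak' class
        "specificity": tn / (tn + fp),   # recall of the 'no-leak' class
    }

# Hypothetical counts: 100 'leak' and 100 'no-leak' test samples.
m = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(m)
```

With these counts the accuracy is 185/200, the sensitivity 90/100 and the specificity 95/100. The function assumes each class has at least one test sample; otherwise the corresponding ratio is undefined.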

Despite the excellent performance of the FCNN model, as was reported in [21], it was found that the TFCNN model consistently outperformed the FCNN model across each of the performance metrics studied (with the exception of the specificity of the HWM loggers). This indicates that the spectrogram-based inputs are more effective than purely frequency-domain-based inputs in representing the characteristics of both 'leak' and 'no-leak' signals for binary classification.

**Table 3.** FCNN Results.


**Table 4.** TFCNN Results.


Figures 9 and 10 show the confusion matrices for each of the four different TFCNN and FCNN trained models, respectively. For a practical leak detection system that water utilities can rely on, high accuracy but also high specificity (true negative) and sensitivity (true positive) rates are key performance metrics. A reliable leak detection system will minimise the false positive leak alarms, to ensure that any follow-up field investigations are for real leak events, maximising the efficiency for utilities.

**Figure 9.** Confusion matrices for TFCNN models. HWM (**top left**), Von Roll (**top right**), SebaKMT (**bottom left**) and Primayer (**bottom right**).

**Figure 10.** Confusion matrices for FCNN models. HWM (**top left**), Von Roll (**top right**), SebaKMT (**bottom left**) and Primayer (**bottom right**).

Despite the limited data available from SebaKMT Sebalog N-3 loggers and a data imbalance with 'leak' and 'no-leak' signals across 3/4 of the data sets, the results indicate that the type of sensor used (different vibro-acoustic sensor with different sampling rate, sensitivity, etc.) does not affect the performance of the classifier. Furthermore, the results demonstrate that a leak detection system using either the FCNN or TFCNN model can be effectively trained with data from a single location both before and after a leak repair.

The excellent classification results show that, irrespective of the type of vibro-acoustic sensor used, the classifiers have been able to learn sufficiently with data from a range of deployment areas, where leak sources, pipe sizes and materials as well as soil conditions have varied widely. The results indicate that this is particularly relevant for identifying leaks in built-up CBD areas, where a variety of 'no-leak' persistent and transient environmental noises are prevalent, even in the early hours of the morning. Considering all of the factors that affect the recorded vibro-acoustic signals, the results presented show great promise for water utilities looking to integrate the use of semi-permanent vibro-acoustic sensors into their business-as-usual practice for structural pipe health monitoring. Through the use of vibro-acoustic sensors and early detection of hidden leaks, proactive maintenance can be scheduled and conducted, with minimal impact to the customer.

The classification performance may be improved by including a large number of 'no-leak' signals from elsewhere in the pipeline network during a deployment, i.e., by including those other loggers that did not record both 'leak' and 'no-leak' signals in the data set. This will help further train the classifier to better discriminate between 'leak' and 'no-leak' noises, further increasing the reliability and robustness of the classification.

### **4. Conclusions**

This paper studied and analysed the performance of a range of different semi-permanent vibro-acoustic sensors deployed in six CBD areas across wider Sydney for extended periods of time. Following careful collation, analysis and curation of the collected acoustic data, two state-of-the-art CNN-based classification models (FCNN and TFCNN) were trained and tested for each of the four logger types.

The results presented point towards the potency of FFT and STFT signal processing for CNN-based classification of vibro-acoustic measurements. Moreover, they represent the first known documented comparison of a variety of different semi-permanent sensing hardware, with particular emphasis on the study having been undertaken on live deployments. The results demonstrate that these state-of-the-art methods are applicable beyond a single make and model of semi-permanent acoustic sensor, which was the scope of the only relevant case study found in the literature. Classification accuracies in the range of 94.63–98.51% were achieved with the best performer, the TFCNN model, for all the sensors studied.

Future work to enhance the results of this study would involve obtaining further validated data collected from a wider variety of deployment locations and CBD areas. As indicated in Section 3, the robustness and reliability of these classifiers may also be improved by adding further existing 'no-leak' audio recordings into the data sets. Finally, despite their sensing hardware similarities, a comparison of the classification performance of semi-permanent and Lift and Shift (L&S) vibro-acoustic sensors (intended for short-term deployments, rather than continuous monitoring) would also provide further insights into the potential success and value of implementing smart leak detection methods for utilities.

**Author Contributions:** Conceptualisation, L.B. and J.V.M.; methodology, L.B. and J.V.M.; software, L.B.; validation, L.B. and J.V.M.; formal analysis, L.B.; investigation, L.B.; resources, J.V.M.; data curation, L.B.; writing—original draft preparation, L.B.; writing—review and editing, J.V.M.; visualisation, L.B.; supervision, J.V.M.; project administration, L.B. and J.V.M.; funding acquisition, J.V.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The data presented in this study cannot be made publicly available due to confidentiality; readers should contact the corresponding author for details.

**Acknowledgments:** The authors would like to thank the authors of [21] for their feedback and discussions about their TFCNN work during the development of the work presented in this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **Abbreviations**

The following abbreviations are used in this manuscript:


## **References**


*Article*

### **Composite Multiscale Transition Permutation Entropy-Based Fault Diagnosis of Bearings**

**Jing Guo 1,2, Biao Ma 1, Tiangang Zou 2, Lin Gui 2 and Yongbo Li 3,\***


**Abstract:** When considering the transition probability matrix of ordinal patterns, transition permutation entropy (TPE) can effectively extract fault features by quantifying the irregularity and complexity of signals. However, TPE can only characterize the complexity of the vibration signals at a single scale. Therefore, a multiscale transition permutation entropy (MTPE) technique has been proposed. However, the original multiscale method still has some inherent defects in the coarse-grained process, such as considerably shortening the length of time series at large scale, which leads to a low entropy evaluation accuracy. In order to solve these problems, a composite multiscale transition permutation entropy (CMTPE) method was proposed in order to improve the incomplete coarse-grained analysis of MTPE by avoiding the loss of some key information in the original fault signals, and to improve the performance of feature extraction, robustness to noise, and accuracy of entropy estimation. A fault diagnosis strategy based on CMTPE and an extreme learning machine (ELM) was proposed. Both simulation and experimental signals verified the advantages of the proposed CMTPE method. The results show that, compared with other comparison strategies, this strategy has better robustness, and can carry out feature recognition and bearing fault diagnosis more accurately and with improved stability.

**Keywords:** composite multiscale transition permutation entropy; bearing; fault diagnosis; feature extraction

### **1. Introduction**

Rotating machinery is essential mechanical equipment which has been widely used in large-scale industries, such as aerospace, vehicle engineering, electrical engineering, machinery manufacturing, and so on. Bearings are an important part of electric and power transmissions, and a bearing fault is one of the main causes of rotating machinery faults [1,2]. Bearing fault diagnosis is vital for the healthy maintenance and reliable operation of rotating machinery. Bearing health detection can reduce the occurrence of rotating machinery failures, thus ensuring system safety and reducing maintenance costs [3].

In bearing health monitoring, vibration signal analysis is a commonly used fault feature extraction method. This is because the vibration signal contains a wealth of useful fault information [4]. In recent decades, bearing vibration signal processing and pattern recognition have become a research hotspot in the field of fault diagnosis. Kankar et al. [5] used an artificial neural network (ANN) and a support vector machine (SVM) to diagnose bearing faults, and verified that machine learning can be used for the automatic diagnosis of a bearing fault. With the development of deep learning algorithms, neural networks such as convolutional neural networks (CNN) have been effectively used in bearing fault diagnosis [6,7]. The authors of [8] proposed a deep learning model to preprocess the original signal for noise removal to overcome the shortcoming of the traditional intelligent method being greatly affected by noise. However, in practice, the vibration signal of the bearing has obvious nonlinear and non-stationary characteristics. Therefore, the analysis of nonlinear dynamic behavior and the extraction of useful and reliable fault features have become the key steps in fault diagnosis.

**Citation:** Guo, J.; Ma, B.; Zou, T.; Gui, L.; Li, Y. Composite Multiscale Transition Permutation Entropy-Based Fault Diagnosis of Bearings. *Sensors* **2022**, *22*, 7809. https://doi.org/10.3390/s22207809

Academic Editor: Giovanni Betta

Received: 17 August 2022 Accepted: 21 September 2022 Published: 14 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

<sup>1</sup> School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China

Entropy methods can quantify the dynamic trend and randomness of a nonlinear time series. In recent years, the use of entropy-based methods has become an important tool for analyzing signal complexity and feature extraction, and has been effectively used in fault diagnosis [9]. At present, approximate entropy (AE), sample entropy (SE), permutation entropy (PE), fuzzy entropy (FE), and diversity entropy (DE) methods are widely used in fault diagnosis of rotating machinery. AE was proposed by Pincus [10], and can be used to measure the regularity of a time series. Richman et al. [11] proposed SE, which uses the association dimension to induce SE to show relative consistency, and its complexity analysis performance is better than that of AE. However, SE has the disadvantage of relying heavily on data length [12]. The FE method proposed by Chen et al. [13] is an improvement on the SE method. Other researchers [14] have used FE to measure the complexity of vibration signals, and they verified the excellent dynamic tracking performance of FE and its ability to obtain a more accurate complexity estimation. In consideration of noise resistance and computational efficiency, Wang et al. [15] proposed DE, which uses cosine similarity to measure the divergence of orbits. Bandt et al. [16] proposed PE to calculate the state probability of track arrangement order, showing it has high computational efficiency and good feature extraction effects in signal processing.

In order to overcome the problem of insufficient information analysis when using SE to evaluate the dynamic characteristics and randomness of complex data, Costa et al. [17] proposed using multiscale sample entropy (MSE) to evaluate the complexity of time series over a range of scales. MSE has been successfully applied to analyze vibration signals generated by various dynamic behaviors [18–20]. Based on the same coarsening process as MSE, FE, PE, and DE can be extended to multiscale fuzzy entropy (MFE) [21–23], multiscale permutation entropy (MPE) [24–26], and multiscale diversity entropy (MDE) [15]. Through coarse-grained processing, the original time series can be divided into several short time series. The coarse-grained time series can represent the dynamic distribution characteristics of the original signal at a certain scale. Therefore, multiscale processing enhances the performance of entropy methods in evaluating signal complexity. On a multiscale basis, the combination of the symbol dynamic filtering process and the entropy method can not only remove noise, but also significantly improve the computational efficiency and feature extraction ability [27–29].

Recently, Zhang et al. [30] proposed a novel complexity estimation method, transition permutation entropy (TPE). TPE is different from the other methods in that it extracts the features of a time series from the transition probability matrix of ordinal patterns. Because the eigenvalue is very important when analyzing the dynamic behavior, TPE uses the positive eigenvalue of the transition probability matrix to calculate the entropy. This improves the feature identification performance of a time series. However, TPE only analyzes a time series using a single scale, which reduces the accuracy and comprehensiveness of the information analysis. Therefore, in this work we extended TPE to multiscale analysis. In the traditional multiscale calculation method, the coarse-grained time series is obtained by calculating the arithmetic mean of adjacent data points on the original time series without overlapping. The length of the coarse-grained time series obtained in this way is too short at large scale, and the accuracy and stability will be affected. Therefore, in this work, composite multiscale transition permutation entropy (CMTPE) was proposed as a way to solve these obstacles. When the composite multiscale method is used to coarse-grain the original time series, a coarse-grained time series with different starting points can be obtained at each scale, and the number is equal to the scale factor. Each coarse-grained time series can characterize the dynamic characteristics and randomness of the original signal, which can effectively enhance the accuracy and stability of TPE. CMTPE not only had excellent feature extraction performance, but also had better robustness to noise. The superiority of the proposed CMTPE method was verified by the simulation and experimental signals of bearing faults. The main contributions of this study are given as follows:


The rest of this paper is organized as follows: The concept of CMTPE is introduced in Section 2. In Section 3, simulation signals are used to validate the superiority of CMTPE. In Section 4, the effectiveness of CMTPE is verified using experimental signals. Finally, the conclusions of this article are provided in Section 5.

### **2. Methodology**

In this section, the theories of TPE and MTPE are introduced in detail. In addition, the concept of the CMTPE algorithm is proposed.

#### *2.1. Transition Permutation Entropy (TPE)*

A time series of length $N$ can be written as $X = \{x\_1, x\_2, \cdots, x\_N\}$. The TPE algorithm is introduced as follows:

Step 1. According to the phase space embedding theory, reconstruct *X* into a series of vectors with embedding dimension *m*. The reconstructed phase space is as follows:

$$X = \begin{bmatrix} \mathbf{x}\_1 & \mathbf{x}\_2 & \cdots & \mathbf{x}\_{N-m+1} \\ \mathbf{x}\_2 & \mathbf{x}\_3 & \cdots & \mathbf{x}\_{N-m+2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}\_m & \mathbf{x}\_{m+1} & \cdots & \mathbf{x}\_N \end{bmatrix} \tag{1}$$

The reconstructed vectors can be expressed as $X\_i = \{x\_i, x\_{i+1}, \cdots, x\_{i+m-1}\}$, $1 \le i \le N - m + 1$.

Step 2. Compare the size relationship of the elements in each vector, so as to identify the ordinal pattern of each vector. When the embedding dimension is $m$, there are $m!$ possible ordinal patterns for any vector. For example, if the embedding dimension $m = 3$, there are 6 ordinal patterns. The size relationship of all vectors can be expressed by the size relationship of 0, 1, 2. For vector $(x\_{k-1}, x\_k, x\_{k+1}) = (18, 3, 15)$, the element size relationship is $x\_k < x\_{k+1} < x\_{k-1}$, and its corresponding ordinal pattern is $\pi = (2, 0, 1)$.

Step 3. Calculate the transition probability between the ordinal patterns corresponding to all vectors to obtain the following transition probability matrix *P*:

$$P = \begin{bmatrix} p\_{11} & p\_{12} & \cdots & p\_{1n} \\ p\_{21} & p\_{22} & \cdots & p\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p\_{n1} & p\_{n2} & \cdots & p\_{nn} \end{bmatrix} \tag{2}$$

where $n = m!$, and $p\_{ij}$ represents the probability of a transition from pattern $\pi\_i$ to pattern $\pi\_j$.

Step 4. Calculate the TPE using the positive eigenvalues of matrix $P$. If an eigenvalue is a complex number, its real part is taken. If $P$ has $n$ positive eigenvalues $\lambda\_i$, TPE is calculated as follows:

$$\text{TPE}(X, m) = -\sum\_{i=1}^{n} \frac{\lambda\_i}{m!} \log \frac{\lambda\_i}{m!} \tag{3}$$
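Steps 1 to 4 can be sketched in a few lines of NumPy. This is only an illustrative reading of the algorithm, not the authors' implementation: the function names are hypothetical, the transition counts are assumed to be row-normalised, ties are assumed to be broken by order of appearance, and the natural logarithm is assumed, none of which is specified in the text.

```python
import math
from itertools import permutations

import numpy as np


def ordinal_pattern(window):
    """Rank of each element (0 = smallest), e.g. (18, 3, 15) -> (2, 0, 1)."""
    ranks = np.argsort(np.argsort(window, kind="stable"), kind="stable")
    return tuple(int(r) for r in ranks)


def transition_matrix(x, m):
    """Step 3: transition probabilities between consecutive ordinal patterns."""
    index = {p: i for i, p in enumerate(permutations(range(m)))}
    n = math.factorial(m)
    symbols = [index[ordinal_pattern(x[i:i + m])] for i in range(len(x) - m + 1)]
    counts = np.zeros((n, n))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: rows are normalised; patterns that never occur keep an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def tpe(x, m=3):
    """Step 4 / Eq. (3): entropy over the positive real parts of the eigenvalues of P."""
    x = np.asarray(x, dtype=float)
    lam = np.linalg.eigvals(transition_matrix(x, m)).real
    lam = lam[lam > 1e-12] / math.factorial(m)
    return float(-np.sum(lam * np.log(lam)))
```

Under these assumptions, a strictly increasing series produces only the pattern (0, 1, 2), so $P$ has the single positive eigenvalue 1 and the sketch returns $\frac{1}{6}\log 6$ for $m = 3$.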

#### *2.2. Multiscale Transition Permutation Entropy (MTPE)*

The entropy calculated at a single scale provides only limited fault information. Multiscale analysis can extract more useful information from time series at different scales. The MTPE algorithm consists of two steps: (1) conducting a coarse-graining process to obtain versions of the original time series at different scales; and (2) calculating the TPE of each coarse-grained time series. First, divide the time series $X = \{x\_1, x\_2, \cdots, x\_N\}$ into multiscale time series $Y = \{Y\_1, Y\_2, \cdots, Y\_\tau\}$. The scale factor $\tau$ is a positive integer. The time series at any scale is $Y\_\tau = \{y\_{1,\tau}, y\_{2,\tau}, \cdots, y\_{j,\tau}\}$, $j = N/\tau$, where each element $y\_{s,\tau}$, $1 \le s \le j$, is calculated as follows:

$$y\_{s,\tau} = \frac{1}{\tau} \sum\_{i=\tau(s-1)+1}^{\tau s} x\_i \tag{4}$$

Then, the time series of all scales obtained from the above process can be substituted into the TPE algorithm to calculate the MTPE as follows:

$$\text{MTPE}(X, m, \tau) = \text{TPE}(Y\_{\tau}, m) \tag{5}$$
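The coarse-graining of Eq. (4) amounts to averaging consecutive, non-overlapping blocks. The sketch below uses hypothetical function names and assumes that trailing points are dropped when $N$ is not a multiple of $\tau$. `entropy_fn` stands in for any single-scale entropy, such as a TPE implementation.

```python
import numpy as np


def coarse_grain(x, tau):
    """Eq. (4): mean of consecutive, non-overlapping blocks of length tau."""
    x = np.asarray(x, dtype=float)
    j = len(x) // tau  # complete blocks only; leftover points are dropped (assumption)
    return x[:j * tau].reshape(j, tau).mean(axis=1)


def multiscale(x, max_tau, entropy_fn):
    """Eq. (5): apply entropy_fn to the coarse-grained series at scales 1..max_tau."""
    return [entropy_fn(coarse_grain(x, tau)) for tau in range(1, max_tau + 1)]
```

Passing `len` as `entropy_fn` shows how quickly the series shrinks. Under the truncation assumption above, a 2048-point sample is reduced to 102 points at $\tau = 20$, which illustrates the shortening problem that motivates the composite approach in the next subsection.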

#### *2.3. Composite Multiscale Transition Permutation Entropy (CMTPE)*

In order to further improve the accuracy and stability of MTPE, CMTPE was proposed. When the scale factor is $\tau$, $\tau$ different time series can be obtained. MTPE only considers the first coarse-grained time series at each scale, while CMTPE considers all $\tau$ coarse-grained time series. As shown in Figure 1, when the scale factor $\tau = 3$, MTPE only calculates one coarse-grained time series $y\_1^{(3)}$, while CMTPE calculates three coarse-grained time series $y\_1^{(3)}$, $y\_2^{(3)}$, and $y\_3^{(3)}$. Divide the time series $X = \{x\_1, x\_2, \cdots, x\_N\}$ into multiscale time series $Y = \{Y\_1, Y\_2, \cdots, Y\_\tau\}$. The time series at any scale is $Y\_\tau = \{y\_\tau^1, y\_\tau^2, \cdots, y\_\tau^k, \cdots, y\_\tau^\tau\}$, where $y\_\tau^k = \{y\_{1,\tau}^k, y\_{2,\tau}^k, \cdots, y\_{j,\tau}^k\}$. The calculation is as follows:

$$y\_{j,\tau}^k = \frac{1}{\tau} \sum\_{i=\tau(j-1)+k}^{\tau j + k-1} x\_i \tag{6}$$

CMTPE is the mean of the TPE values for all coarse-grained time series, that is,

$$\text{CMTPE}(X, m, \tau) = \frac{1}{\tau} \sum\_{k=1}^{\tau} \text{TPE}\left(y\_{\tau}^{k}, m\right) \tag{7}$$

CMTPE considers all *τ* different coarse-grained time series, at each scale factor *τ*. Therefore, CMTPE can extract more fault information from the original time series. The entropy calculated by this method is more accurate and stable than that calculated by MTPE.

**Figure 1.** Schematic diagram of multiscale coarsening process when scale factor *τ* = 3.
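A minimal sketch of Eqs. (6) and (7) follows. The function names are hypothetical, `entropy_fn` is a placeholder for a TPE implementation, and each shifted series is assumed to keep only its complete blocks.

```python
import numpy as np


def composite_coarse_grain(x, tau):
    """Eq. (6): the tau coarse-grained series with starting points k = 1..tau."""
    x = np.asarray(x, dtype=float)
    series = []
    for offset in range(tau):  # zero-based offset, i.e. k - 1 in Eq. (6)
        j = (len(x) - offset) // tau  # complete blocks available from this start (assumption)
        series.append(x[offset:offset + j * tau].reshape(j, tau).mean(axis=1))
    return series


def composite_multiscale(x, tau, entropy_fn):
    """Eq. (7): mean of entropy_fn over all tau coarse-grained series."""
    return float(np.mean([entropy_fn(y) for y in composite_coarse_grain(x, tau)]))
```

For the series 1, 2, ..., 9 and $\tau = 3$, the three coarse-grained series are (2, 5, 8), (3, 6), and (4, 7). The MTPE approach would use only the first of these.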

#### *2.4. CMTPE-Based Fault Diagnosis Strategy*

In this work, a fault diagnosis strategy based on CMTPE was proposed. In this strategy, an ELM classifier was used to identify different fault types. The overall fault diagnosis framework is shown in Figure 2 [31]. The main steps were as follows:

Step 1. The vibration signals of bearings under different health conditions are measured by sensors.

Step 2. CMTPE is used for feature extraction of vibration signals. Each health condition will provide the corresponding entropy characteristics, representing the complexity of different vibration signals.

Step 3. A part of the fault features is randomly selected as a training set to train the ELM classifier.

Step 4. The remaining features are used as the test set to test the trained ELM, and the fault recognition rate is obtained. Steps 3 and 4 are run 20 times to obtain the average test accuracy.

**Figure 2.** The overall fault diagnosis framework.
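Steps 3 and 4 can be illustrated with a basic ELM: a random, untrained hidden layer followed by output weights obtained by least squares. The paper does not report the ELM configuration, so the class and function names, the sigmoid activation, and `n_hidden=50` are all assumptions of this sketch. `X` is expected to be an array of feature vectors and `y` a NumPy array of class labels.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine (sigmoid activation assumed)."""

    def __init__(self, n_hidden=50, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        targets = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot labels
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets  # least-squares output weights
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


def repeated_accuracy(X, y, n_train_per_class, n_runs=20, seed=0):
    """Random per-class split, train, test; repeat and return (mean, variance)."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_runs):
        train = np.concatenate([
            rng.choice(np.flatnonzero(y == c), n_train_per_class, replace=False)
            for c in np.unique(y)
        ])
        test = np.setdiff1d(np.arange(len(y)), train)
        model = ELM(seed=int(rng.integers(2**31 - 1))).fit(X[train], y[train])
        scores.append(np.mean(model.predict(X[test]) == y[test]))
    return float(np.mean(scores)), float(np.var(scores))
```

The default `n_runs=20` mirrors the 20 repetitions described in Step 4. The returned mean and variance correspond to the kind of summary reported in the accuracy tables.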

### **3. Simulation Evaluation**

#### *3.1. Simulated Bearing Signal*

In this section, in order to verify the effectiveness and advantages of the proposed CMTPE, three types of simulated bearing faults were designed: outer race fault, inner race fault, and ball fault models. The schematic diagram of the three simulated faults is shown in Figure 3.

In the load area, as shown in Figure 3, the sensor was installed at the maximum load density. Figure 3a shows the fault model of an outer race fault. Since the location of a localized defect will not change with time, the impulse force can be regarded as an ideal force. Figure 3b shows the fault model of an inner race fault, which has the same basic assumptions as the outer race fault model. At the peak of the load area, the ball contacts the localized defect, resulting in the first impulse. After that, the localized defect rotates with the inner race, so the contact position between the ball and the inner race changes with time. This type of contact generates an impulse force only when it occurs in the load area. Figure 3c shows the ball fault model, which also has the same basic assumptions as the outer race fault model. In contrast to the inner race fault, the localized defect rotates with the ball, and the defect repeatedly contacts the inner and outer races, continuously generating impulse forces [32].

**Figure 3.** The schematic of simulated bearing faults.

The simulated bearing type was an N205 cylindrical roller bearing. The rotating speed was 3000 rpm. The sampling frequency was 10,240 Hz. The detailed bearing dimensions are shown in Table 1.

**Table 1.** Bearing parameters.


The fault frequency can be calculated according to the parameters in Table 1. Main parameters of the bearing: roller diameter $d = 6.5$ mm; pitch circle diameter $D = 35.5$ mm; number of rollers $Z = 12$; contact angle $\alpha = 0$; rotating speed $\upsilon = 3000$ rpm. The fault frequencies were calculated as follows:

(1) Outer race fault characteristic frequency $f\_0$

$$f\_0 = \frac{1}{2}Z\left(1 - \frac{d}{D}\cos\alpha\right)\frac{\upsilon}{60} = 245.0704(\text{Hz})\tag{8}$$

(2) Inner race fault characteristic frequency $f\_i$

$$f\_i = \frac{1}{2}Z\left(1 + \frac{d}{D}\cos\alpha\right)\frac{\upsilon}{60} = 354.9296(\text{Hz})\tag{9}$$

(3) Ball fault characteristic frequency $f\_e$

$$f\_e = \frac{D}{d} \left( 1 - \left(\frac{d}{D}\right)^2 \cos^2 \alpha \right) \frac{\upsilon}{60} = 263.9220(\text{Hz})\tag{10}$$
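As a check on Eqs. (8)–(10), the three characteristic frequencies can be recomputed from the stated bearing parameters. The function name and argument names are hypothetical. The formulas are transcribed directly from the equations above.

```python
import math


def bearing_fault_frequencies(d, D, Z, alpha_deg, rpm):
    """Return (outer race, inner race, ball) fault frequencies in Hz, per Eqs. (8)-(10)."""
    shaft_hz = rpm / 60.0
    ratio = d / D
    cos_a = math.cos(math.radians(alpha_deg))
    f_outer = 0.5 * Z * (1 - ratio * cos_a) * shaft_hz
    f_inner = 0.5 * Z * (1 + ratio * cos_a) * shaft_hz
    f_ball = (D / d) * (1 - (ratio * cos_a) ** 2) * shaft_hz
    return f_outer, f_inner, f_ball


f_outer, f_inner, f_ball = bearing_fault_frequencies(6.5, 35.5, 12, 0.0, 3000)
print(f"{f_outer:.4f} {f_inner:.4f} {f_ball:.4f}")  # 245.0704 354.9296 263.9220
```

The printed values agree with the frequencies given in Eqs. (8)–(10).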

Figure 4 shows the time-domain waveform and envelope spectrum of each of the three simulated fault types. Figure 4a,c,e depicts the time-domain waveforms of the three faults, and Figure 4b,d,f shows the corresponding envelope spectra. The fault frequency is marked with a blue arrow in each envelope spectrum.

**Figure 4.** Time domains and spectra of simulated bearing faults: (**a**) time domain of ball fault; (**b**) envelope spectrum of ball fault; (**c**) time domain of outer race fault; (**d**) envelope spectrum of outer race fault; (**e**) time domain of inner race fault; (**f**) envelope spectrum of inner race fault.
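An envelope spectrum of the kind shown in Figure 4 is commonly obtained from the magnitude of the analytic signal. The sketch below is one such construction using only NumPy, not necessarily the procedure used for the figure, and the function name is hypothetical.

```python
import numpy as np


def envelope_spectrum(x, fs):
    """Return (frequencies in Hz, amplitude spectrum) of the Hilbert envelope of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Analytic signal via the FFT: keep DC, double positive frequencies, zero negative ones.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(np.fft.fft(x) * weights))
    envelope -= envelope.mean()  # remove the DC component before the second FFT
    amplitude = np.abs(np.fft.rfft(envelope)) / n
    return np.fft.rfftfreq(n, d=1.0 / fs), amplitude
```

For a simple test tone, such as a 3000 Hz carrier amplitude-modulated at 245 Hz and sampled at 10,240 Hz (a stand-in, not the simulated bearing signal of this section), the largest peak of the returned spectrum falls at the modulation frequency.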

#### *3.2. Analysis of Simulation Results*

In a practical working environment, the operation of equipment is influenced by noise. Therefore, Gaussian white noise with different signal-to-noise ratios (SNR) was added to the simulated bearing fault signal to simulate actual working conditions. The SNR ranged from 10 dB to 40 dB, in 1 dB steps.
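Adding white noise at a prescribed SNR can be done by scaling the noise power relative to the measured signal power, using $\text{SNR} = 10\log\_{10}(P\_{signal}/P\_{noise})$. The function name below is hypothetical, and the sketch assumes zero-mean Gaussian noise.

```python
import numpy as np


def add_white_noise(x, snr_db, rng=None):
    """Return x plus zero-mean Gaussian white noise at the requested SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


# Sweep matching the range described above: 10 dB to 40 dB in 1 dB steps.
snr_levels = np.arange(10, 41)
```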

In this simulation, MTPE, TPE, MPE, and the proposed CMTPE were used to extract fault features from the simulation signals. The main parameters were selected with the following considerations: if the embedding dimension *m* is too small, the reconstructed vectors will contain too little detailed dynamic information, while if *m* is too large, the number of vectors will decrease, which will lead to a loss of information. In addition, a large value of the scale factor *τ* will lead to information redundancy, and a small value of *τ* will lead to the loss of fault information. Therefore, the recommended value for parameter *τ* is 10–20 [29]. The values for the parameters of the entropy methods used in this study were set as *m* = 3 and *τ* = 20.

The fault diagnosis strategies of each of the four methods combined with ELM were used to identify three simulated bearing faults. For each fault type, the original signal was sliced into 100 samples without overlap, and the data length of each sample was 2048. Therefore, the data set had a total of 300 samples. Among them, 50 samples of each fault type were randomly selected as the ELM training set, and the rest of the samples were used to test the trained ELM. The ELM was run 20 times and the average test accuracy was taken as the final test accuracy. Higher test accuracy means better performance of fault diagnosis strategy, and smaller test variance means better stability. The robustness of the method against noise can be obtained by comparing the test accuracies for each different SNR value.

The test results are shown in Figure 5. Regardless of the SNR value, the test accuracy of CMTPE was always higher than that of the other methods, and its error bar was also smaller than that of the other methods. This shows that CMTPE had the best bearing fault diagnosis performance and the highest test stability. When the SNR was high, all strategies except TPE had high accuracy. For example, when the SNR was between 30 dB and 40 dB, the test accuracy of CMTPE, MTPE, and MPE was higher than 95%. However, as the SNR gradually decreased, the test accuracy of MTPE and MPE decreased more quickly and by a larger margin, while CMTPE still had 100% test accuracy, even when the SNR decreased to 20 dB. Moreover, when the SNR was reduced to 10 dB, the test accuracy of MTPE and MPE was only 61.90% and 71.73%, respectively, while the test accuracy of CMTPE was still 89.60%. This shows that the proposed CMTPE method has better robustness to noise.

**Figure 5.** Test results with different SNR values.

When the SNR was less than 5 dB, the classification accuracy of all methods was less than 60%, and near 0 dB the accuracy was less than 50%. When the SNR was negative, the classifier accuracy was no longer meaningful, because the simulation signal was submerged in noise and none of the four methods could correctly identify faults. However, when the noise was relatively weak, CMTPE still had the highest classification accuracy, the smallest error bar, and better stability than the other methods. In the case of negative SNR, filtering and other noise reduction methods are needed to preprocess the signal to achieve better anti-noise effects.

### **4. Experimental Evaluation**

In this section, we test the effectiveness of CMTPE using experimental bearing fault data. CMTPE was compared with TPE, MTPE, and MPE to verify the superiority of the CMTPE-based fault diagnosis strategy.

#### *4.1. Bearing Test Rig and Experimental Data Illustration*

The experimental data were collected on an HD-FD-H-03X rotor rolling bearing fault test rig. The appearance and structure of the platform are shown in Figure 6. In the experiment, the speed of the motor was 1000 rpm and no load was applied. In order to verify the effectiveness of the proposed method, five different health conditions were designed, as shown in Figure 7. One of them was designated as normal, and the other four fault types were inner race crack 4 mm (IRC), outer race crack 4 mm (ORC), inner race pitting 3 mm (IRP), and outer race pitting 3 mm (ORP).

**Figure 6.** The bearing fault test rig used in the experiment.

**Figure 7.** Four types of bearing fault designed in the experiment: (**a**) IRC; (**b**) ORC; (**c**) IRP; (**d**) ORP.

The vibration signals of the different health conditions were collected through the acceleration sensor, with a sampling frequency of 10,240 Hz. Figure 8 shows the time domain and envelope spectrum of each of the four fault states. Figure 8a,c,e,g shows the time domain diagrams of the four faults; Figure 8b,d,f,h displays the corresponding envelope spectrum diagrams. The vibration signal of each state was divided into 75 samples for feature extraction, and the length of each sample was 2048. Then, 25 samples of each state were randomly selected to train the ELM classifier, and the remaining samples were used for testing [33]. Therefore, the total number of training and test samples was 125 and 250, respectively.

**Figure 8.** Time domains and spectra of the four fault states: (**a**) time domain of IRC; (**b**) envelope spectrum of IRC; (**c**) time domain of ORC; (**d**) envelope spectrum of ORC; (**e**) time domain of IRP; (**f**) envelope spectrum of IRP; (**g**) time domain of ORP; (**h**) envelope spectrum of ORP.

#### *4.2. Comparison Analysis*

According to the proposed fault diagnosis strategy, CMTPE was used to extract the fault features of bearing vibration signals. In addition, in order to show that CMTPE has better feature extraction and fault diagnosis ability, MPE, TPE, and MTPE were used for comparison. The main parameters of these methods were as follows: the scale factor of the multiscale methods was *τ* = 20 and the embedding dimension of all methods was *m* = 3. The features extracted by the above four methods were used to train and test the ELM. In order to reduce the error caused by randomness, each method was run 20 times, and the average test accuracy was taken as the final classification result. The test accuracy of the four strategies over 20 runs is shown in Figure 9. Table 2 shows the classification accuracy and variance. The criteria were as follows: higher accuracy represents better feature extraction ability, and lower variance represents better stability.

**Figure 9.** Test accuracy of the four strategies.


**Table 2.** Average test accuracy and variance of the four methods.

From Table 2 and Figure 9, it can be observed that the fault identification accuracy of TPE was lower than that of the multiscale methods, and that the identification accuracy of CMTPE reached 98.60%, the highest among the multiscale methods. This shows that multiscale analysis could extract more abundant fault information when processing vibration signals. Moreover, the coarse-graining process also affects the quality of the fault features. The higher identification accuracy indicates that the coarse-graining process of the proposed CMTPE method better captures the key information related to bearing faults and gives the best feature extraction effect. Furthermore, the variance of CMTPE was only 0.65%, which was lower than that of the other methods. This verifies that the proposed fault diagnosis strategy based on CMTPE not only had excellent fault feature extraction performance, but also had the best stability.

As shown in Figure 10, the confusion matrices of the four methods visualize the classification performance of each method combined with ELM [34,35]. As shown in Figure 10b,c, TPE and MTPE exhibited many misclassifications; in particular, they had difficulty distinguishing between IRC and IRP. In contrast, as shown in Figure 10a,d, CMTPE and MPE had better classification performance. CMTPE had only four misclassifications, and the classification accuracy reached 98.4%, which was the highest among all methods.

**Figure 10.** Confusion matrix of four methods.
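As a quick arithmetic check, assuming the confusion matrix covers the full test set of 250 samples (5 states with 50 test samples each), four misclassifications correspond to the accuracy quoted for CMTPE:

$$\text{Accuracy} = \frac{250 - 4}{250} = \frac{246}{250} = 98.4\%$$

This value differs slightly from the 98.60% average quoted earlier. That is consistent with the confusion matrix presumably showing a single run rather than the mean over 20 runs.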

In this work, 25 training samples and 50 test samples were selected for each state. To rule out the possibility that the results depend on this specific number of training samples, a performance test of CMTPE with different numbers of training samples was carried out. The classification accuracy was tested with 15, 25, 35, 45, 55 and 65 training samples, and each case was run 20 times to reduce randomness. The results are shown in Figure 11. As the number of training samples increased, the accuracy of the various methods increased, but CMTPE always had the highest test accuracy compared to the other methods.

**Figure 11.** Effect of the number of training samples on the performance of CMTPE, MTPE, MPE and TPE.

In order to compare the feature extraction capabilities of the three multiscale methods more intuitively, we visualized the extracted fault features. Because the scale factor of the multiscale methods was *τ* = 20, each feature vector has 20 dimensions; the t-SNE visualization method was therefore used to reduce the fault features to two dimensions [36]. The results of this feature visualization are shown in Figure 12. The criterion for the feature extraction effect is as follows: the smaller the distances within clusters of the same type of features and the larger the distances between clusters of different types of features, the better the feature extraction effect of the method.

**Figure 12.** Visualization of features extracted by three multiscale entropy methods: (**a**) MPE method; (**b**) MTPE method; (**c**) CMTPE method.

As can be seen from Figure 12b, the features of the states, other than ORC, were obviously mixed, which indicates that MTPE has poor feature extraction performance. Figure 12a demonstrates that the features extracted by MPE could better distinguish between most fault characteristics. However, the distance between clusters of different state features was small, some feature points were mixed, and the distance within clusters was large, so the effect of feature extraction was poor. In contrast, it can be seen from Figure 12c that CMTPE had the largest inter-cluster distance and the smallest intra-cluster distance, and that its feature extraction performance was the best.

In order to test the minor fault identification ability of the proposed method, the following two cases were designed with different extents of faults. Case 1 included a normal control and eight different degrees of inner and outer race crack faults: normal, inner race crack 0.2 mm, outer race crack 0.2 mm, inner race crack 1 mm, outer race crack 1 mm, inner race crack 2.7 mm, outer race crack 2.7 mm, inner race crack 4 mm, and outer race crack 4 mm. Case 2 included a normal control and six different degrees of pitting faults: normal, inner race pitting 1 mm, outer race pitting 1 mm, inner race pitting 2 mm, outer race pitting 2 mm, inner race pitting 3 mm, and outer race pitting 3 mm.

The vibration signals of each state were again divided into 75 samples, with 25 used for training the ELM, and the rest used for testing. The classification accuracy and variance under the two cases for the four methods are shown in Tables 3 and 4. It was found that CMTPE still had an excellent classification effect for the more minor faults, especially for pitting faults of different degrees, where the test accuracy reached 99.67%. CMTPE also still had the highest stability. However, the other three methods had decreased test accuracy for minor faults; in particular, the classification effect of TPE was very poor. This also shows that the composite multiscale method can avoid the information loss caused by the single-scale method, and also overcome the problem of low accuracy of entropy estimation caused by the traditional multiscale method. Therefore, CMTPE can effectively identify minor faults.

**Table 3.** Average test accuracy and variance of the four methods for different degrees of crack faults.

| Methods | CMTPE | TPE | MTPE | MPE |
|---|---|---|---|---|
| Average test accuracy (%) | 96.46 | 44.71 | 70.46 | 94.31 |
| Variance (%) | 0.74 | 1.95 | 1.85 | 1.01 |

**Table 4.** Average test accuracy and variance of the four methods for different degrees of pitting faults.


### **5. Conclusions**

In this study, a method using CMTPE for quantifying the complexity of time series was proposed. CMTPE takes into consideration the transition probability matrix of ordinal patterns and performs composite multiscale processing on the original time series. This avoids the loss of information caused by single-scale analysis, and overcomes the problem that the traditional multiscale method greatly shortens the time series at large scales, resulting in low accuracy of entropy evaluation. Composite multiscale analysis improved the performance of CMTPE feature extraction, the accuracy of entropy estimation, and the robustness against noise. Compared with MTPE, TPE, and MPE, the superiority of CMTPE was verified by both simulation and experimental data. The results show that CMTPE has better robustness, can effectively identify bearing faults, and has the highest diagnostic accuracy and stability.

Moreover, in the case of negative SNR, it was necessary to use filtering and other noise reduction methods to preprocess the signal to achieve better anti-noise effect. Thus, in future work, we will test the effectiveness of combining CMTPE with other noise reduction methods.

**Author Contributions:** Data curation, B.M.; Formal analysis, L.G.; Funding acquisition, Y.L.; Investigation, T.Z.; Methodology, J.G. and B.M.; Resources, T.Z.; Supervision, Y.L.; Validation, J.G.; Visualization, L.G.; Writing—original draft, J.G.; Writing—review & editing, B.M., T.Z., L.G. and Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research was supported by the National Natural Science Foundation of China under Grants 12172290 and 52250410345.

**Conflicts of Interest:** The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

## **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Sensors* Editorial Office E-mail: sensors@mdpi.com www.mdpi.com/journal/sensors

www.mdpi.com ISBN 978-3-0365-6463-0