*Article* **A Reliable Fault Diagnosis Method for a Gearbox System with Varying Rotational Speeds**

#### **Cong Dai Nguyen, Alexander Prosvirin and Jong-Myon Kim \***

School of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Korea; daimtavn@gmail.com (C.D.N.); a.prosvirin@hotmail.com (A.P.)

**\*** Correspondence: jmkim07@ulsan.ac.kr; Tel.: +82-52-259-2217

Received: 9 April 2020; Accepted: 29 May 2020; Published: 31 May 2020

**Abstract:** The vibration signals of gearbox gear fault signatures are informative components that can be used for gearbox fault diagnosis and early fault detection. However, the vibration signals are normally non-linear and non-stationary, and they contain background noise caused by data acquisition systems and the interference of other machine elements. Especially in conditions with varying rotational speeds, the informative components are blended with complex, unwanted components inside the vibration signal. Thus, to use the informative components from a vibration signal for gearbox fault diagnosis, the noise needs to be properly distilled from the informational signal as much as possible before analysis. This paper proposes a novel gearbox fault diagnosis method based on an adaptive noise reducer–based Gaussian reference signal (ANR-GRS) technique that can significantly reduce noise and improve classification from a one-against-one, multiclass support vector machine (OAOMCSVM) for the fault types of a gearbox. The ANR-GRS processes the shaft rotation speed to access and remove noise components in the narrowbands between two consecutive sideband frequencies along the frequency spectrum of a vibration signal, enabling the removal of enormous noise components with minimal distortion to the informative signal. The optimal output signal from the ANR-GRS is then extracted into many signal feature vectors to generate a qualified classification dataset. Finally, the OAOMCSVM classifies the health states of an experimental gearbox using the dataset of extracted features. The signal processing and classification paths are generated using the experimental testbed. The results indicate that the proposed method is reliable for fault diagnosis in a varying rotational speed gearbox system.

**Keywords:** adaptive noise reducer; gaussian reference signal; gearbox fault diagnosis; one against on multiclass support vector machine; varying rotational speed

#### **1. Introduction**

Gearboxes are widely used in industrial applications, usually in harsh and continuous conditions, making them susceptible to a variety of failures. Defects can cause the gearbox system to break down and potentially damage complex mechatronic equipment or even cause a serious threat to safety, property, or customer satisfaction. Therefore, it is essential to diagnose gearbox faults regularly to ensure their early detection. The vibrations of gearbox systems have been studied since the 1980s, and previous researchers have found that gearbox vibrations have a keynote meshing frequency [1,2] with complex sidebands around it and its harmonics [3,4]. Therefore, the sideband frequencies and the meshing frequency and its harmonics are the informative components for identifying gear faults. Signal analysis is a backbone procedure for rotational-machine fault diagnosis research and applications. It works by decomposing the related fault features that are the groundwork for identifying fault patterns. The vibration characteristics of gearbox systems produce two major signals that can be analyzed for fault detection: acoustic signals and vibration signals [5]. Vibration signals are the

most popularly used ones for gearbox fault monitoring because acquiring vibration data is easy [6]. However, vibration signals contain many types of noise from sources such as measurement systems (data acquisition systems), the environment, shafts, gears, and other related components and their impingement [7,8]. All that noise, which exceeds the signal, fills the frequency spectrum of the vibration signal, and eclipses it.

Many signal processing methods using advanced techniques have been presented by many researchers: frequency analysis focusing on Fourier transform [9], Wigner distribution [10], rank-order morphological filter [11], cyclostationary signals for mechanical applications [12], and the envelope analysis [13], which is the most well-known for rotational-machine fault diagnosis applications such as bearing-fault diagnosis. It detects the repeating shock amplitudes that appear as faulty teeth traverse each rotation cycle. Using this method, the vibration signal is first processed by a bandpass filter to achieve a high signal-to-noise ratio, and second, the Hilbert transform is used to achieve the envelope. If the sideband frequencies in a gearbox vibration signal appear in the envelope, the presence of faulty teeth in the gearbox can be deduced [14,15]. However, when the vibration signal is submerged in noise, it is difficult to recognize the informative components for fault diagnosis in the envelope.

Time-frequency analyses were developed to process non-stationary signals using a frequency transformation process divided based on windows across the time axis to capture informative events. The basic time-frequency analysis method is a short-time Fourier transform (STHT) or a spectrogram, such as a limited time window–width Fourier spectral analysis [16,17]. The challenges of the STHT method, such as a failure of the assumption that the pieces of a non-stationary signal are stationary, difficulty adapting the observation window size to the size of a real stationary piece of signal, and the conflict between frequency resolution and time resolution (which is related to the Heisenberg uncertainty principle) limit its usability. To resolve the disadvantages of the STHT method, the wavelet approach was developed as an adjustable window frequency spectral analysis method. The basic wavelet function can be modified to meet special needs, so the wavelet transform produces outputs with good resolution in the low-frequency range and good time resolution in the high-frequency range [18–20]. In the region relevant for rotational-machine fault diagnosis, wavelet-based decomposition has been widely used to apprehend the useful components of a vibration signal in a non-stationary condition (in this context, *non-stationary* is the notion that the sideband frequency information of a vibration signal is time-variant). Wavelet transform decomposes a vibration signal into many sub-bands that express the time-frequency distribution through the dilation and transition of the mother wavelet. The sub-bands that contain fault-related intrinsic features can then be used in the fault diagnosis process [21,22]. Nevertheless, the efficiency of the wavelet-based method correlates with the basic wavelet function, so informative components that do not correlate as well with the applied wavelet could be missed or lost in the transformed outcome. In addition, the white noise that is frequently parasitic in a vibration signal and appears across the whole range of the frequency spectrum gives correlated oscillations with a high potential to appear as excitation. In this paper, the effect of noise on the wavelet applied to the wavelet transform method is compared with the proposed method for processing the signal path in the experimental results.

The Hilbert-Huang transform (HHT) was introduced as a better methodology for analyzing non-linear and non-stationary signals [23]. This technique is now often used for rotational-machine fault diagnosis [24–27]. The HHT method uses a time adaptive operation known as empirical mode decomposition (EMD) to decompose the signal into a group of complete and orthogonal components, denoted as intrinsic mode functions (IMFs), that represent the intrinsic oscillation modes of the fault-related components of a vibration signal. The HHT method was shown to outperform wavelet transform in rotational-machine fault diagnosis in [28–30]. To capture the advantages of the HHT, several fault detection tools combine EMD with other methods, such as envelope analysis and the wavelet-based technique. EMD and the envelope analysis combine in series: the vibration signal is first decomposed by EMD to determine the number of IMFs; the envelope analysis then processes the IMFs to monitor for fault-related components. Compared with previous methodologies, this combined

technique had better results [28,31,32]. Combining wavelet transform and EMD for time-frequency analysis is another currently used combination method. It takes advantage of the strong points of the two techniques and minimizes their limitations, particularly aliasing in the high-frequency band (wavelet transform) and difficulties in isolating the signals within the second harmonic (EMD) [33,34].

However, the EMD technique is sensitive to noise, so noise-related IMFs, which are not useful, can be generated by the EMD. As illustrated by Van M. et al. [35], EMD performs well in processing low-noise vibration signals and poorly in processing high-noise signals. In other words, even EMD combination techniques are unreliable in noisy environments. Therefore, to effectively apply enhanced signal analysis techniques to non-stationary vibration signals, a proper pre-processing method to reduce noise is required, such as narrowband demodulation [36] or discrete wavelet transform (DWT) [37,38]. Applying those de-noising methods effectively reduces the measurement noise, but the original informative signal is also distorted by the attenuation of a narrow bandpass filter in narrowband demodulation or the threshold in DWT-based de-noising. In other words, using one of those noise reduction methods can degrade the performance of a fault diagnosis system. Therefore, we have developed a new de-noising technique to reduce the noise from an original vibration signal by optimizing a process for filtering the weights and parameters of the reference input signals (adaptive) and considering the noise characteristics and rotational speed. We call our new technique the Adaptive Noise Reducer–based Gaussian Reference Signal (ANR-GRS).

The adaptive noise-controlling technique reduces noise by means of destructive interference. It consists of an adaptive filter and reference signals. The adaptive noise filter is a digital filter with an adaptive algorithm that adjusts the filtering coefficients (or tap weights) so the filter can be flexibly and optimally operated in unknown conditions with non-stationary signals to effectively remove low-level noise [39]. The typical performance criterion for adjusting the filtering weights (convergence condition) is based on the error signal, which is the difference between the output of the filter and the input reference signal as determined using the recursive least-squares or least mean square (LMS) algorithm. Between them, the LMS is more widely used because of its robustness and simplicity [40]. The ANR-GRS technique has three main function blocks: Gaussian reference signal (GRS) generation, adaptive noise filtering using the LMS algorithm, and optimal output sub-band selection. The generated GRS is a special signal consisting of a white-noise reference signal and a Gaussian reference signal with adjustable parameters (mean value and standard deviation) to identify noise components that are independent of the informative components in the frequency domains of the vibration signal from a varying speed gearbox. The adaptive noise filter consists of an M-tap digital Finite impulse response (FIR) filter and the LMS adaptive algorithm; it has two inputs: a reference input for the GRS signal with specific parameters and the desired input for a vibration signal. The noise-reduced sub-band is achieved as the output of the adaptive noise filter. The optimal output sub-band selection adjusts the parameter of the GRS signals to receive the set of noise-filtered sub-bands output by the adaptive filters and then selects the sub-band with the minimum mean square as the optimal sub-band, which is the final output of the ANR-GRS module. That output becomes the input for the feature pool configuration process used to extract the statistical features in the time and frequency domains of the vibration signal as feature vectors [41] to be classified.

The heterogeneous feature pool improves the efficiency of gearbox fault expression for fault diagnosis process; however, the high dimensionality of the feature vectors can be a challenge for various machine learning techniques that can be used for decision making. In comparison with the other artificial intelligence algorithms, the classification performance of support vector machines (SVM) classifier is not much sensitive to the dimensionality of the feature vectors, in other words, this algorithm is not affected by the problem called 'curse of dimensionality'. Furthermore, SVM demonstrates excellent generalization performance, so this technique is capable of achieving high accuracy while classifying mechanical faults in rotation machinery [42]. Also, with an appropriate kernel function, SVM can accurately separate the non-linear datasets by hyperplanes in high-dimensional feature space using the non-linear mapping [43]. SVMs are widely used for fault diagnosis in many real-world

applications [44]. They were originally designed for binary classification and then improved for multiclass classification using the one against one, one against all, or hierarchical strategy. Among them, the one against one strategy is the most reliable for our purposes [45–47]. Therefore, a one-against-one multiclass SVM (OAOMCSVM) is used in this proposed methodology.

The new hybrid technique employs the ANR-GRS, which produces an optimal sub-band, and then uses a machine-learning classification of fault types based on the OAOMCSVM on features extracted from that optimal subband to identify faults in a gearbox system. The experimental results show that the proposed method outperforms the aforementioned denoising methods, which verifies that the "clean" input can be used to produce correct output from the signal processing and classification paths.

The rest of this paper is organized as follows: Section 2 provides the characteristics of a gearbox vibration signal and the experimental test setup used in this study. The proposed methodology is explained in detail, from theory to the construction of the ANR-GRS, feature pool configuration, and OAOMCSVM classification, in Section 3. Section 4 demonstrates our experimental results in signal processing and classification. Finally, Section 5 concludes the paper.

#### **2. The Characteristics of a Gearbox Vibration Signal and Experimental Testbed Setup**

#### *2.1. The Characteristics of a Gearbox Vibration Signal*

The defects of a gearbox can be classified into three major categories: manufacturing defects (tooth profile error, eccentricity of the wheel, etc.), installation defects (parallelism), and operational defects (tooth wear, case wear, tooth spalling, tooth cracks). This research considers operational faults. A one-stage transmission gearbox, which consists of two rigid blocks with a pinion (on the drive side) and a gear (on the non-drive side), is illustrated in Figure 1. A healthy gear in normal condition working smoothly and periodically generates a linear and periodic vibration signal [3]. The vibration signal, *sh*[mV/(m/s2)], of a fault-free normal pair of gears meshing under a constant load speed can be formulated as [48]:

$$s\_{\mathbf{l}}(t) = \sum\_{i=0}^{N} S\_{\mathbf{i}} \cos(2\pi \mathbf{i} f\_{\mathbf{M}} \mathbf{t} + q\_{\mathbf{i}}),\tag{1}$$

where *Si* and ϕ*<sup>i</sup>* are the amplitude and phase of the i-*th* meshing frequency harmonics; *fM* is the meshing frequency (for the pinion: *P* is the number of teeth in the pinion wheel, *fP* is the pinion rotation frequency, *fM* = *P*. *fP*; or *fM* = *G*. *fG*, *G* is the number of teeth in the gear wheel, *fG* is the gear rotation frequency), and N is the total number of *fM* harmonics in the frequency range of a vibration signal. Figure 2a shows the spectrum of the output vibration signal of a fault-free gearbox; it is filled with the frequency tones of the meshing frequency and its harmonics.

In a fault case, when the motion transferred from the drive shaft to the non-drive shaft by the rotation between the pinion wheel and the gear wheel traverses a defective tooth (chipped, worn, or missing), an abnormal movement occurs that changes the impulses in the vibration signals. The vibration signal contains amplitude and phase modulations of the carrier frequency as the meshing frequency; its frequency spectrum includes sidebands, frequency components on two sides of the meshing frequency and its harmonics, as given in Equation (1). Thus, when the gear wheel has a faulty tooth, the velocity of the gear angle changes impulsively within the rotating functionality and generates a non-linear vibration signal in which issues such as speed variation, amplitude, and phase modulation prevail [4]. The vibration signal is formulated [3] as given by Equation (2), and an example of its spectrum is shown in Figure 2b:

$$s\_f(t) = \sum\_{k=0}^{N} S\_k(1 + a\_k(t)) \cos(2\pi k f\_M t + q\_k + p\_k(t))\tag{2}$$

$$\text{Here, } a\_k(t) = \sum\_{j=0}^{M} A\_{kj} \cos(2\pi j f\_\odot t + \mu\_{kj}) \text{ and } p\_k(t) = \sum\_{j=0}^{M} P\_{kj} \cos(2\pi j f\_\odot t + \xi\_{kj}).$$

*Akj*, *Pkj* are amplitudes and μ*kj*, ξ*kj* are phases of the j-*th* sideband in the amplitude and phase modulation signals, respectively, around k meshing harmonics.

**Figure 2.** The frequency spectrum of the gearbox vibration signal: (**a**) a healthy gearbox and (**b**) a faulty gearbox.

#### *2.2. The Experimental Testbed Setup*

The experimental testbed is illustrated in Figure 3. The pinion wheel is fixed to a three-phase AC induction motor by a drive shaft (DS). The motion (torque) is transmitted from the AC motor to the load as adjustable blades, which are mounted on the end of the non-drive shaft by the engaged teeth of a pinion wheel and a gear wheel (a gearbox with a gear reduction ratio of 1:1.52).

(**a**)

**Figure 3.** *Cont*.

(**b**)

**Figure 3.** Experimental testbed setup: (**a**) function block diagram; (**b**) actual experimental assembly.

The number of teeth on the pinion wheel is 25 (P = 25), the gear wheel has 38 (G = 38), and the length of each tooth is 9 mm. Figure 4 depicts the seeded tooth failures on the gear wheel: a perfect or healthy gear (H), tooth cut 10% (F1), tooth cut 30% (F2), and tooth cut 50% (F3). To measure the speed of the shaft rotation, a displacement transducer is placed to track the hole in the DS once per rotation. The vibration signals from the gear wheel in the normal condition and three levels of tooth cut defects (shown in Figure 4) were continuously acquired from the vibration sensor (accelerometer 622B01 made by the IMI Sensor Company) mounted on the end of the DS, 72.5 mm from the pinion gear. The analog vibration was digitized using PCI-2 data acquisition (the specifications of the data acquisition system are provided in Table 1). The sample datasets for each health condition (H, F1, F2, F3) of the gearbox under four shaft speeds are provided in Table 2.


**Table 1.** Specification of the sensors and data acquisition system.

Each health state is sampled by sampling frequency 65536 Hz in continuous 1 s (1-s sample) repeating by 300 times to receive 300 1-s samples for each shaft speed. Hence, the number of samples for each health state is 1200 vibration samples in four different shaft speeds, the total number of samples in this experimental testbed is 4800 of 1-s samples.

**Figure 4.** The health states of the gear wheel and examples of vibration signals at the rotation speed of 300 RPM: (**a**) no seeded fault, healthy gear, (**b**) tooth cut 10% (0.9 mm), (**c**) tooth cut 30% (2.7 mm), (**d**) tooth cut 50% (4.5 mm).


**Table 2.** A detailed description of the fault types and dataset.

#### **3. The Gearbox Fault Diagnosis Methodology**

A function block diagram of the method for gearbox fault diagnosis proposed in this study is provided in Figure 5. We use four main processing blocks: the sensor and DAQ block, ANR-GRS, feature pool configuration, and multiclass SVM-based classification. To acquire discrete samples of each captured signal event containing information about the defective gearbox in the acquisition dataset, the raw vibration signal was sampled at a high frequency of 65536 Hz to acquire rich digitized vibration sample data under different shaft rotation speeds (300, 600, 900 and 1200 RPM), and the adjustable load was non-stationary. Though the operation frequency range of a vibration sensor in this study is from 0.42 Hz to 10 kHz (this is presented in Table 1), thus fault-related components in the frequency domain of vibration signals mostly exist in the lowest segment 0–10 kHz of their frequency spectrums. Therefore, the sampling frequency of the raw vibration signal (all vibration signals acquired in this paper) was reduced by a factor of three using a down-sampling technique. However, implementing decimation involves aliasing, so a low-pass Chebyshev Type I Finite Impulse Response filter (the filter with the order of 35 and a cut-off frequency of 10 kHz) was used for antialiasing [49]. The output sub-band signal from the lowpass filter (lpf(n), n is denoted as discrete-time) which have frequency spectrums in the range from 0–10 kHz, were then optimized by the ANR-GRS to achieve the optimal sub-band, opt(n), from which twenty-one features were extracted through a feature pool configuration, F(k), for classification by the OAOMCSVM.

**Figure 5.** Function block diagram of the proposed methodology.

#### *3.1. Adaptive Noise Reducer–based Gaussian Reference Signal*

#### 3.1.1. Adaptive Noise Filtering Technique

The Digital Filter

An adaptive filter combines the operation of a digital filter and an adaptive algorithm. The adaptive algorithm optimizes the coefficient (or weight) of a digital filter by using the feedback signal from the output (error signal) according to the signal condition or performance criteria [38]. Figure 6 illustrates the function of an adaptive filter constructed using a FIR filter and an adaptive algorithm. The output of the FIR filter is calculated as given in Equation (3):

$$\mathbf{g(n)} = \sum\_{\mathbf{m}=1}^{M} \mathbf{c\_m(n)} \mathbf{r(n-m)} = \mathbf{c^T(n)} \mathbf{r(n)}\tag{3}$$

where, cm, m = 0,1, ... , M−1 (M is the digital filter length) are the adjustable weights (coefficients) of the filter, which do not depend on the sample time. The weight vector (M × 1) is formed as:

$$\mathbf{c}(\mathfrak{n}) \equiv \begin{bmatrix} \mathbf{c}\_0, \mathbf{c}\_1, \dots, \mathbf{c}\_{\mathbf{M}-1} \end{bmatrix}^T,\tag{4}$$

and r (n−m), m = 0, 1, ... , M−1 are samples of an input signal composed of the vector M × 1:

$$\mathbf{r}(\mathbf{n}) \equiv [\mathbf{r}(\mathbf{n}), \mathbf{r}(\mathbf{n}-1), \dots, \mathbf{r}(\mathbf{n}-\mathbf{M}+1)]^{\mathrm{T}}.\tag{5}$$

T denotes the transpose operation of the matrix. Then, the error signal e(n) is the difference between the FIR filter response, y(n), and desired signal, d(n), which can be calculated as:

$$\mathbf{e}(\mathbf{n}) = \mathbf{d}(\mathbf{n}) - \mathbf{g}(\mathbf{n}) = \mathbf{d}(\mathbf{n}) - \mathbf{c}^T(\mathbf{n})\mathbf{r}(\mathbf{n}).\tag{6}$$

A common criterion for tuning the convergence of the weight vector, c(n), is the minimization of the mean-square error (MSE):

$$\begin{array}{l} \mathbf{J} \equiv \mathbf{E} \{ \mathbf{e}^2(\mathbf{n}) \} = \mathbf{E} \{ [\mathbf{d}(\mathbf{n}) - \mathbf{c}^T(\mathbf{n}) \mathbf{r}(\mathbf{n})]^2 \} \\ \mathbf{J} = \mathbf{c}^T(\mathbf{n}) \mathbf{R} \mathbf{c}(\mathbf{n}) - 2 \mathbf{P} \mathbf{c}^T(\mathbf{n}) + \mathbf{E} \{ \mathbf{d}^2(\mathbf{n}) \} \end{array} \tag{7}$$

where, R <sup>≡</sup> E{**r**(n)**r**T(n)} is the input autocorrelation matrix, and P <sup>≡</sup> E{**r**(n)**d**(n)} is the cross-correlation vector between the input signal and the desired signal vector.

Equation (7) indicates that the MSE is a quadratic function of the filter weights (c), and its performance surface guarantees that it has a single global minimum MSE corresponding to the optimal vector **c**o. The optimal vector **c**o can be found by taking the first derivative of Equation (7) and setting it to zero, the result achieved by Wiener-Hopf equation (assuming that R has an inverse matrix):

$$\mathbf{c}\_{\text{O}} = \mathbf{R}^{-1} \mathbf{P},\tag{8}$$

so that the minimum MSE is:

$$\mathbf{J}\_{\rm min} = \mathbf{E} \{ \mathbf{d}^2(\mathbf{n}) \} \ - \mathbf{P}^T \mathbf{c}\_0. \tag{9}$$

#### Adaptive Algorithm

The adaptive algorithm is a recursive function to automatically adjust the coefficient vector, c(n), to minimize MSE (Jmin) so that the weight vector converges to the optimum solution, **c**o, after iteration loops. Both the LMS and recursive least-squares algorithms can be used to fetch the optimal solution [39], but the LMS is the most broadly used. To calculate the updated weight vector in the recursive loop, the LMS algorithm is based on the steepest-descent procedure using a negative gradient of the instant square error, which was devised by Widrow and Stearns [50] as follows:

$$\mathbf{c}(\mathbf{n}+1) = \mathbf{c}(\mathbf{n}) + \mu \mathbf{r}(\mathbf{n})\mathbf{e}(\mathbf{n}),\tag{10}$$

where μ is the step size (or convergence factor) that determines the stability and convergence rate of the LMS algorithm. The algorithm adapts the weight vector to the optimal Wiener-Hopf solution (**c**o) given in Eqaution (9) by an iterative process with the convergence factor. The step size is selected in the range [40]:

$$0 < \mu < \frac{2}{\text{MS}\_{\text{u}}},\tag{11}$$

where Su is the average power of the input signal **r**(n).

**Figure 6.** Function block diagram of an adaptive filter.

Adaptive Noise Filtering Technique Applied to a Vibration Signal

To construct the adaptive noise filter, the noise reference signal and observed signal are applied as the input signal of an adaptive filter (the input response in Figure 6) and the desired signal (d(n) in Figure 6), respectively. The observed signal is the vibration signal acquired from the accelerometer sensor and digitized by the DAQ block reflecting gearbox behavior as expressed by the informative signals (s(n)) and the noise (w(n)), as shown in Figure 7. As explained in Sections 1 and 2, the informative signals and noise are formed by different sources: the informative signal comes from the vibration of the gear and pinion teeth, whereas the noise comes from the measurement system, unrelated gearbox components, and mechanical resonances. Therefore, the informative signal and noise represent independent processes (E{**s**(n)**w**(n)} = 0). To implement an effective adaptive noise filtering system, the generated noise reference signal, r(n), should meet two conditions (A and B):


When those conditions are met, the MSE of the adaptive noise filter can be calculated as follows:

$$\mathbb{E}\left[\mathbf{e}^2(\mathbf{n})\right] = \mathbb{E}\left[\left[\mathbf{s}^2(\mathbf{n}) + (\mathbf{w}(\mathbf{n}) - \mathbf{r}\_0(\mathbf{n}))\right]^2\right],\tag{12}$$

where **r**o(n) = **c**T(n)**r**(n), and the adaptive filter uses an FIR filter,

$$\mathbb{E}\{\mathbf{e}^2(\mathbf{n})\} = \mathbb{E}\{\mathbf{s}^2(\mathbf{n})\} + \mathbb{E}\{ [\mathbf{w}(\mathbf{n}) - \mathbf{c}^T(\mathbf{n})\mathbf{r}(\mathbf{n})]^2 \}. \tag{13}$$

The informative signal is independent of both the noise (E{**s**(n)**w**(n)} = 0) and the generated noise reference (E{**r**(n)**s**(n)} = 0). By implementing the LMS adaptive algorithm to adapt the filter coefficient vector, **c**(n), to the optimal vector, **c**o, the mean square of the output signal (error signal) approaches the single minimum of the performance surface. From Equation (13), the minimum MSE is taken to be:

$$\min\_{\mathbf{c}(\mathbf{n})} \text{E}\{\mathbf{e}^2(\mathbf{n})\} = \text{E}\{\mathbf{s}^2(\mathbf{n})\} + \min\_{\mathbf{c}(\mathbf{n})} \text{E}\{\left[\mathbf{w}(\mathbf{n}) - \mathbf{c}^T(\mathbf{n})\mathbf{r}(\mathbf{n})\right]^2\} \tag{14}$$

Therefore, the output signal of the adaptive noise filtering system carries the complete informative part of the gearbox vibration signal throughout the whole process of algorithm implementation. In addition, the noise integrated into the vibration signal is reduced; in the ideal case, the noise is removed (min **c**(n) <sup>E</sup>{[w(n) <sup>−</sup> cT(n)r(n)] 2 } = 0). Therefore, for adaptive noise control, we implement the ANR-GRS.

**Figure 7.** Function block diagram of an adaptive noise filtering technique.

#### 3.1.2. ANR-GRS

In this paper, the noise (w(n)) in the gearbox vibration signal is divided into two types: white noise (u(n)) and band noise (b(n)). The white noise arises from the measurement system: the amplifier, detector, DC power supply, thermal vibration of the semiconductor atoms, etc. In the frequency domain, the power of the white noise is spread across the whole frequency spectrum of the vibration signal (theoretically, the power of white noise is spread from -∞ to ∞ in the frequency axis) [8]. Band noise, on the other hand, represents noise caused by unrelated components [7]. The frequency harmonics of the band noise are distributed around the informative components of the gear sideband frequency, meshing frequency, and their harmonics. Therefore, the informative signal inside the vibration signal is separately independent of both types of noise. The ANR-GRS module is built using the adaptive noise filtering technique and reference noise-related generation signals, as illustrated in Figure 8. To reduce the white noise, we apply a generated white noise signal with a uniform, random distribution function (v(n)). The oscillation form of the generated white noise is thus analogous to the white noise integrated into the vibration signal. Because its frequency spectrum is within the observed frequency range, the maximum level of the power spectrum average (PSA) of the reference white noise is reduced to less than 10% (10% in this study) of the PSA of the vibration signal to ensure that the informative signal can be eligible for conditions A or B. The GRS (g(n)) is created to adapt to the band noise inside the vibration signal. To make the proposed methodology as an invariant model, the GRS generation module uses the shaft rotation speed (RPM) information from the displacement transducer and the vibration signal as the input parameters. Then, the mean frequency (FCenter) and the standard deviation of the GRS are calculated based on the frequency of the defective wheel, which is a function of RPM (the gear frequency in this paper). The GRS window is confined entirely within the frequency space between two consecutive sideband frequencies (a sideband segment), pictorially described in Figure 9, and computed as follows:

$$\mathbf{W\_{GRS}}(\mathbf{k}) = \sum\_{\mathbf{k}=1}^{\mathbf{N\_b}} \mathbf{e}^{-\frac{(\mathbf{k} - \mathbf{F\_{Cunder}})^2}{2\Lambda}},\tag{15}$$

where Δ = σ<sup>2</sup> is the variance, σ is the standard deviation of the GRS window, and FCenter is the mean frequency of the GRS window. They function as the frequency of a faulty wheel (the gear frequency, fG = P.RPM/G in this research).

$$F\_{Cuter} = \alpha.f\_{FW}.\tag{16}$$

By linearization of the Gaussian function, the standard deviation (the characteristic of a Gaussian distribution) can be approximately calculated as:

$$
\sigma = 0.318. \text{F}\_{\text{Center}} = 0.318. \text{a.f}\_{\text{FW}}. \tag{17}
$$

Nb is the number of frequency bins in a sideband segment and defined as follows:

$$\text{N}\_{\text{b}} = \frac{2\text{N}\_{\text{s}}}{\text{F}\_{\text{s}}}.\text{f}\_{\text{FW}}.\tag{18}$$

where Ns is the number of samples of the vibration signal, Fs is the sampling frequency of the vibration signal, and fFW is the frequency of the faulty wheel (the gear frequency, fG, in this paper).

To qualify condition B, the frequency components of a Gaussian window are separated from the informative frequencies (sideband frequencies). A Gaussian window is placed completely inside the space between two continual sideband frequencies in the visualization. Thus, the adaptation process for a band-noise reduction is to preserve the original informative frequency component (significantly reducing the noise components and causing negligible attenuation of the informative components). From Equations (15)–(17) and Figure 9, the coefficient α is selected in the range from 0.25 to 0.75, and the qualified Gaussian window signals are generated using the parameters in the following ranges:

The range of the mean value:

$$0.25.\,\mathrm{f\_{FW}} \le \,\mathrm{F\_{Center}} \le 0.75.\mathrm{f\_{FW}}.\tag{19}$$

The range of the standard deviations of the Gaussian reference signal:

$$
\sigma = \begin{cases}
\quad 0.318. \alpha. f\_{FW} & \text{when } 0.25 \le \alpha \le 0.5 \\
\quad 0.318. (1 - \alpha). f\_{FW} & \text{when } 0.5 < \alpha \le 0.75
\end{cases}
\tag{20}
$$

Therefore, the implementation of a stepping adjustment in the coefficient α drives a change in the mean value and standard deviation (the position and shape) of the Gaussian window, which defines the condition for fetching the optimal Gaussian window, as illustrated in Figure 9.

#### 3.1.3. The Process for Calculating the Optimized Subband

First, the ANR-GRS algorithm germinates the initial parameters for the Gaussian signal generation module: starting value of α = 0.25 in this paper, adaptive filter (M-tap, M = 40 in this study), coefficient vector **c**(n) = [0, 0, ... , 0], and step size μ (μ = 0.01). The parameter α is scanned in the range [0.25 0.75] in steps of 0.01 in company with the input rotation speed (RPM) to compute the FCenter (mean value) and standard deviation (σ) using Equations (16) and (17). To generate the specific GRS needed for the reference input of the adaptive filter **r**(n), the output of the adaptive filter is connected to the minus port of the summation module, **r**o(n). The vibration signal, which contains both the informative component and noise, is entered as the desired input and delayed for M sampling time steps to be compatible with the delayed processing of the FIR digital filter. The LMS algorithm adjusts the coefficient vector to receive the LMS of the error, which is the output of the summation module. The output error signal, which has LMS (and to which the optimal coefficient vector is set), is pushed into the set of proposed optimized sub-bands.

**Figure 8.** A function block diagram of the ANR-GRS module and parameter adjustments.

**Figure 9.** The overall flow chart of GRS signal generation for the ANR-GRS module.

Finally, the algorithm calculates the mean square value of each sub-band in that set and then selects the sub-band with the minimum value as the optimized sub-band and output of the AND-GRS module (Figure 10).

**Figure 10.** The algorithm flow chart of the ANR-GRS module.

#### *3.2. Feature Pool Configuration*

We found the ANR-GRS methodology to be highly effective in reducing most of the noise components from a 1-s raw vibration signal while leaving the information about gearbox faults intact. The optimized sub-band output from the ANR-GRS, i.e., the "clean" signal presenting the characteristics of the gearbox component vibration with trivial noise effects, carries the intrinsic fault symptoms of the cut tooth defects. We then use those optimal sub-bands, rather than the raw 1-s vibration signals, to extract features. According to Caesarendra et al. [51], the statistical parameters from the time and frequency domains of the signal are congruent and subservient for fault classification using machine learning. Table 3 displays twenty-one features, eighteen time-domain features (e.g., root means square, square mean root, kurtosis, skewness, margin, impulse, and peak-to-peak value) and three frequency-domain features (root mean square frequency, frequency center, and root variance frequency) for each optimal sub-band. The feature pool dimensionality is NHS × N1-SEC × NF, where NHS is the number of gearbox health states (number of classes) that need to be classified (4 classes in this study: healthy, pinion tooth cut 10%, pinion tooth cut 30%, and pinion tooth cut 50%), N1-SEC is the number of 1-s samples of each class (300 in this study), and NF defines the number of features (21 in this study). Therefore, groups of 21-feature vectors were considered as the validating input dataset for our proposed intelligent fault-detection method based on a multiclass SVM.

#### *3.3. Gearbox Fault Classification Using a Multiclass SVM Classifier*

The principle operation of an SVM is based on the statistical learning theory of Vapnik [45] and quadratic programming [46]. It was actually designed to classify binary datasets by finding the optimal plane, generally called the *hyperplane*, with the largest margin-gap separating it from both binary classes. Let {(xm, ym), m=1, 2, ... , M} be the given training dataset with M samples, where each

sample data xm <sup>∈</sup> <sup>R</sup>*D*, <sup>R</sup>*<sup>D</sup>* is a D-dimensional feature vector, and ym (ym <sup>∈</sup> {−1, <sup>+</sup>1}) are the class labels. The SVM is used to find a set of linearly separable hyperplanes between two classes and maintain the maximum distance (called the *margin*) from both of them.

**Table 3.** Definition of statistical features in the time and frequency domains.


Here is an input signal (i.e., optimized subband), N is the total number of samples, S(f) is the magnitude response of the fast Fourier transform of the input signal s, *Nf* is the total number of frequency bins, σ = " <sup>1</sup> *N N <sup>n</sup>*=<sup>1</sup> (*sn* − *s*) 2 , and *pn* <sup>=</sup> *<sup>s</sup>*<sup>2</sup> *n N <sup>n</sup>*=<sup>1</sup> *<sup>s</sup>*<sup>2</sup> *n* .

The hyperplane, denoted as w, is determined as the maximized width of the margin and the minimized structural risk, given by:

$$\mathbf{f}(\mathbf{w}, \mathbf{b}) = \underset{\mathbf{w}, \mathbf{b}}{\operatorname{argmin}} \frac{1}{2} \mathbf{w}^{\mathrm{T}} \mathbf{w} + \mathbf{C} \sum\_{\mathbf{m}=1}^{\mathbf{M}} \boldsymbol{\xi}\_{\mathbf{m}} \tag{21}$$

subject to: ym(wTψ(xm) + <sup>b</sup>) <sup>≥</sup> <sup>1</sup> <sup>−</sup> <sup>ξ</sup>m, <sup>∀</sup><sup>m</sup> <sup>=</sup> 1, 2, ... , M; <sup>−</sup> <sup>ξ</sup><sup>m</sup> <sup>≤</sup> 0, <sup>∀</sup><sup>m</sup> <sup>=</sup> 1, 2, ... , M

Here, b is bias, C is the trade-off parameter, ξ = {ξ1, ξ2, ... , ξN} is the set of slack variables, and ψ(.) is a feature vector in the expanded feature space. Equation (21) can be solved by applying the Lagrange duality solution [44] as shown below:

$$\operatorname\*{argmax}\_{\alpha} \mathbf{w}(\alpha) = \sum\_{\mathbf{m}=1}^{\mathbf{M}} \alpha\_{\mathbf{m}} - \frac{1}{2} \sum\_{\mathbf{m}=1}^{\mathbf{M}} \sum\_{\mathbf{k}=1}^{\mathbf{M}} \alpha\_{\mathbf{m}} \alpha\_{\mathbf{k}} \mathbf{y}\_{\mathbf{m}} \mathbf{y}\_{\mathbf{k}} \boldsymbol{\psi}^{\mathrm{T}}(\mathbf{x}\_{\mathbf{m}}) \boldsymbol{\psi}(\mathbf{x}\_{\mathbf{k}}),\tag{22}$$

subject to: <sup>M</sup> <sup>m</sup>=<sup>1</sup> αmym = 0, 0 ≤ bm ≤ C, ∀m = 1, 2, ... , M where, α<sup>m</sup> and α<sup>k</sup> are Lagrange multipliers, xm and xk are two input training vectors, and *K*(xm, xk) = ψT(xm)ψ(xk) is a kernel function used to map the input data space into a higher-dimensional feature space. Several kernel functions, such as linear, polynomial, Gaussian, radial basis, and sigmoid functions, can be used in SVM classification methods. Countless classification applications have more than two classes in their datasets and thus require a solution beyond the binary SVM just described. Multiclass SVMs have been developed to classify datasets of N different classes (N > 2), and they use one of three structures: one-against-one, one-against-all, and hierarchical. Among those structures, OAOMCSVM requires more classifiers than the others, but it also has the most reliable classification accuracy [45]. Therefore, we use OAOMCSVM, illustrated in Figure 11, in the methodology proposed in this paper.

**Figure 11.** The classification methodology of OAOMCSVM.

#### **4. Experimental Results**

To verify the advantage of the ANR-GRS module in the proposed methodology, we implemented experiments in two technological zones, signal processing and features dataset classification, and compared our results with those from conventional methodologies.

#### *4.1. Signal Processing Experimental Results*

The 1-s vibration signals acquired using the experimental testbed described above contained the phase and amplitude modulation signals bearing information about the health states of a gearbox. To investigate the effectiveness of the proposed noise reduction technique, the experimental dataset was collected under various shaft rotation speeds that are equal to 300, 600, 900, and 1200 RPM, respectively. The vibration signals output from the accelerometer are analog, so they were digitized with a high holding sample frequency of 65536 Hz to gather as much information (and noise) as possible on the wideband PCI-based data acquisition board (Table 1). Each 1-s digital vibration signal was down-sampled three times, incorporating a lowpass filter for antialiasing to output a digital vibration signal realistically compatible with the working range frequency of the accelerometer (0–10 kHz, Table 1). Then, the vibration signal was input into the ANR-GRS module (Figure 5). The shaft rotation speed (RPM), measured by the displacement transducer, was observed by the ANR-GRS module according to appropriate vibration signal data to generate the Gaussian reference signal. The optimal subbands were the output of the ANR-GRS module.

To demonstrate the superiority of the ANR-GRS technique, we compared its optimized subbands with the outputs of other signal processing approaches for noise reduction: the Hilbert transform (HT), window bandpass filter (WBF), and wavelet transform with optimal subband-based maximum kurtosis (WTK). We tested those approaches by replacing the ANR-GRS module with them. Figure 12 illustrates the frequency spectra compared with the input vibration signal. Figure 12a shows the output of a lowpass filter that received a 1-s vibration sample with 900 RPM (15 Hz) of fault type 2 (meshing frequency, fM = P.RPM = 25.15 = 375 Hz and sideband gear frequency, fG = P.RPM/G = 9.87 Hz, shown as lpf(n) in Figure 5 and labeled as the OutLPF signal in Figure 12a). The output signals from the noise-reduction modules are shown in Figure 12b (OutHT signal), Figure 12c (OutWBF signal), Figure 12d (OutWTK signal), and Figure 13, the proposed ANR-GRS (OptANR signal). The three conventional methods (HT, WBF, WTK) changed the outLPF signal into different shapes and types (the outLPF signal is an amplitude and phase modulation signal) regardless of the fault information (meshing frequency and its harmonics and sideband gear frequencies). HT exalted the area of the low-frequency components, whereas WTK fortified the high-frequency components in the frequency spectrum (Figure 12b,d). WBF was better than the HT and WTK methods because it filtered the noise in some of the meshing frequency harmonics and sideband gear frequencies, but it also reduced or removed significantly informative frequency components (Figure 12c). The outANR signal (Figure 13), the output signal from the ANR-GRS module proposed here, fulfilled the needs of signal processing: reducing the noise components and preserving the original informative components. It made the vibration signal from the gearbox "cleaner" (lowered the noise) and approached the characteristics of the gearbox vibrations signal presented in Section 2.1. This comparison verifies that our accurate is a suitable technique for reducing the noise in gearbox vibration signals and returning an honest reflection of the health states of a gearbox along an electronic signal path.

(**c**)

**Figure 12.** *Cont*.

(**d**)

**Figure 12.** The frequency spectrum analysis for the state-of-the-art methodologies: (**a**) the input signal, (**b**) the output signal from the Hilbert transform module, (**c**) the output signal from the window bandpass filter module, and (**d**) the output signal from the wavelet transform WTK module.

**Figure 13.** Frequency spectrum analysis of the input and output signal of the ANR-GRS module.

#### *4.2. Classification Results*

The classification performance of the proposed methodology is evaluated in two experiments. At the beginning, the feature sets of the dimensions 4 × 300 × 21 (300 samples from each of four health states, as shown in Table 2) for each speed (in this experiment, 300 RPM, 600 RPM, 900 RPM, and 1200 RPM) were selected. Then, in the first experiment, the dataset is created by merging the data of four available health states under different rotating speeds resulting in the new feature set of the dimensionality 4800 × 21. This dataset was then randomly divided into a train and a test sets at ratio 8:2 for training and testing OAOMCSVM classifier and general evaluation of the proposed fault diagnosis methodology.

Furthermore, to prove the robustness of the proposed ANR-GRS technique, the second experiment is performed where the classifier is trained on data (i.e., feature sets) corresponding to single rotating speed and tested by data instances collected under other speeds. Specifically, feature set for a single rotating speed was used as a training set (for instance, feature set corresponding to 300 RPM with the dimensionality of 4 × 300 × 21), and the feature sets corresponding to two other speeds were used for testing (for instance, feature set corresponding to two other speeds 600 RPM and 900 RPM with the dimensionality of 4 × 600 × 21).

Those processes were run four times. To construct the training model for classification, k-fold cross-validation (k-cv) was used to estimate the accuracy of the generalized classification [52]. In k-cv, the set of samples in the feature vector is split randomly into k mutual folds (k=10 in this study), denoted as C1, C2, ... , Ck. The classification OAOMCSVM operates on k-times of the accuracy estimation.

Some folds {Cj} (a random subset from k folds) are used as a training set, and the rest are used as a validation set and alternative iteration k times. More specifically, for each speed, 300 feature vectors for each health state in the training set were partitioned into ten folds (each fold containing 30 randomly chosen feature vectors (30 × 21) for each health state); 9 of those folds were used for training, and the 1 remaining fold was used for validation. That process was repeated 10 times until all folds had been used as the validation set. The final measure of performance in the training model is the average value of the accuracies attained in each fold. These data are then used as the testing dataset (which was not used at all in the training process) to verify the OAOMCSVM method and provide the final classification result.

We also used the OAOMCSVM classification method to classify the feature pool configuration datasets extracted from the comparison signal processing methodologies: the raw vibration signal (lowpass filter output signal) extraction (methodology I), HT, (methodology II), WBF (methodology III), WTK (methodology IV), and the IMFs and residuals from the EMD (methodology V). The implementation of those methodologies for achieving the classification result for the four health states complied strictly with the conditions used with input from the ANR-GRS module just described. To estimate the classification result between methodologies (the proposed method and others), all twenty-one features of the vibration signal were used as input feature vectors for the OAOMCSVM module to ensure that the most informative features for and from each methodology were used fairly for the classification. The classification results of the state-of-the-art methodologies and the proposed ANR-GRS methodology obtained during two experiments are shown in Tables 4 and 5 and Figure 14 to visualize the results tabulated in Table 5. Those classification accuracies were computed as follows:

$$\text{C}\_{\text{accuracy}} = \frac{\sum\_{\text{L}} \text{N}\_{\text{TP}}}{\text{N}\_{\text{samples}}}.100\% \tag{23}$$

where *L* is the number of categories (*L* = 4 as four health states), NTP is the number of true positives (the number of fault samples in category *i* that are correctly classified as class *i*), and Nsamples is the total number of samples used to estimate the performance of the proposed methodology.


**Table 4.** Classification results for state-of-the-art methodologies and the proposed ANR-GRS methodology by a combination dataset of different speeds.


**Table 5.** Classification results for state-of-the-art methodologies and the proposed ANR-GRS methodology by observation of separated speed dataset.


**Table 5.** *Cont*.

Table 4 illustrates that the proposed technique significantly outperforms its counterparts when it is trained on the data instances corresponding to all available speeds and achieving the highest accuracy of 99.7%.

Table 5 demonstrates that the proposed approach using ANR-GRS also yielded the highest average classification accuracies (98.31%) in comparison with the other five state-of-the-art signal processing methodologies when it is trained and validated on datasets corresponding to separate rotating speeds.

The methodology I extracted the feature vectors of all four speeds for classification by the OAMCSVM directly from the raw vibration signal (OutLPF signal), in which non-linear and non-stationary signals drown out the informative signal. Accordingly, those results are distributed chaotically among the four classes, producing the lowest accuracy among the 6 methodologies (62.63%). For methodologies II, III, and IV, the vibration signals change with the different characteristics of the gearbox vibration signal (its amplitude and phase modulation signal), so their classification accuracy is also low, around 70%. Methodology V (the EMD technique) is outstanding in comparison with the first four approaches (82.5%) because it extracts IMFs, which contain fault-related information to better discriminate between classes. However, IMFs can be mistakenly extracted from noise components, which damaged the accuracy compared with the ANR-GRS technique by around 15%.

In addition, as a quantitative evaluation, we present the space distribution in a 3-dimensional visualization (Figure 15) of samples belonging to four classes based on some features extracted from the outLPF signal and the outANRsignal (signals before and after using the ANR-GRS technique, respectively). The features of the outANR signal show better separation and clustering for different health states of the gearbox fault diagnosis experimental scheme. Samples from the same class are more closely clustered, whereas samples from different classes are discriminated and easy to classify. On the contrary, before using the ANR-GRS, the features of different classes overlap, making it difficult to distinguish the fault classes.

**Figure 14.** The accuracy of each class and the average accuracy of the state-of-the-art methodologies and the proposed ANR-GRS methodology.

**Figure 15.** Three-dimensional visualization of features extracted from (**a**) the input signal of the ANR-GRS module and (**b**) the output signal of the ANR-GRS module.

Moreover, confusion matrixes are shown in Figure 16 to demonstrate the reliability of the varying-speed gearbox fault diagnosis methodology using the ANR-GRS module for effective noise reduction. Using real-time tracking of the rotation speed (RPM) of a gearbox system, the ANR-GRS generated speed-related function signals for real-time tracking of speed-dependent noise components, and the optimized output signal was unaffected by speed during classification.

**Figure 16.** Confusion matrixes for four classification cases according to the speed dataset input for training: (**a**) 300 RPM, (**b**) 600 RPM, (**c**) 900 RPM, (**d**) 1200 RPM.

#### **5. Conclusions**

In this study we propose a reliable fault diagnosis methodology for gearbox systems under varying speed conditions. It integrates adaptive noise control to significantly reduce noise with machine learning classification to classify the fault states of the gearboxes. First, we created a set of Gaussian reference signals that are a function of the rotation speed and consist of many noise components such as white noise and band noise that are correlated to the parasitic noise in the vibration signals and independent of the intrinsic informative components. Then, we applied those GRSs to an adaptive noise control technique that produced an optimal sub-band as output for each 1-s vibration sample. The most optimal sub-bands were then used in the feature pool configuration to extract feature vectors, and an OAOMCSVM was used for classification. The experimental results indicated that the proposed gearbox fault diagnosis methodology achieved the highest classification accuracies in both experiments that are equal to 99.7% and 98.31% while significantly outperforming the counterpart state-of-the-art methodologies used for the comparison. In future research, we will continue improving the robustness of the proposed methodology and investigate it's applicability to the real-time fault diagnosis scenarios. **Author Contributions:** All the authors contributed equally to the conception of the idea, the design of the experiments, the analysis and interpretation of results, and the writing and improvement of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea and Korea Institute for Advancement of Technology (KIAT) through the Encouragement Program for The Industries of Economic Cooperation Region. (P0006123)

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Letter* **Sensor Fault-Tolerant Control Design for Magnetic Brake System**

#### **Krzysztof Patan \*, Maciej Patan and Kamil Klimkowicz**

Institute of Control and Computation Engineering, University of Zielona Góra, 65-516 Zielona Góra, Poland; M.Patan@issi.uz.zgora.pl (M.P.); k.klimkowicz@issi.uz.zgora.pl (K.K.)

**\*** Correspondence: k.patan@issi.uz.zgora.pl

Received: 14 July 2020; Accepted: 14 August 2020; Published: 16 August 2020

**Abstract:** The purpose of the paper is to develop an efficient approach to fault-tolerant control for nonlinear systems of magnetic brakes. The challenging problems of accurate modeling, reliable fault detection and a control design able to compensate for potential sensor faults are addressed. The main idea here is to make use of the repetitive character of the control task and apply iterative learning control based on the observational data to accurately tune the system models for different states of the system. The proposed control scheme uses a learning controller built on a mixture of neural networks that estimate system responses for various operating points; it is then able to adapt to changing working conditions of the device. Then, using the tracking error norm as a sufficient statistic for detection of sensor fault, a simple thresholding technique is provided for verification of the hypothesis on abnormal sensor states. This also makes it possible to start the reconstruction of faulty sensor signals to properly compensate for the control of the system. The paper highlights the components of the complete iterative learning procedure including the system identification, fault detection and fault-tolerant control. Additionally, a series of experiments was conducted for the developed control strategy applied to a magnetic brake system to track the desired reference with the acceptable accuracy level, taking into account various fault scenarios.

**Keywords:** braking control; nonlinear systems; fault tolerant control; fault detection; iterative learning control; neural networks

#### **1. Introduction**

Modern industrial systems are usually complex and nonlinear. This leads directly to challenging problems related to control design as requirements imposed on the control quality and robustness continuously increase. Since it is not easy to deal with the nonlinear models, there is still a need for new systematic methods taking advantage of the specificity of a particular class of nonlinear systems.

A magnetic brake also called an eddy-current brake is a typical example of such nonlinear systems with tangible applications in various areas of the automotive industry, including high speed railways, big trucks and industrial elevators. Basically, in the regime of the high speed reached by current vehicles, the classical friction-based retarders become ineffective and insufficient. Therefore, magnetic or electro-magnetic brake control systems constitute an attractive alternative to ensure travelers and vehicle safety and reliability. These are characterized by the fast response time, a non-contact construction, and a relatively low level of failure rate in comparison to conventional brakes.

As for control design for magnetic brakes, the problem is considered to be difficult since the complex dynamics include not only temporal but also spatial effects. This excludes the direct extension of results

from control theory for linear systems. Additionally, the system is characterized by a strict antisymmetry of control, as one can control only the rate of deceleration. Nevertheless, a few more or less successful attempts at adapting this theory have been reported by various authors in the control engineering literature.

In [1], the authors used an input–output feedback linearization scheme for speed control. However, they used a simplified control-affine model to derive the control law. In [2], the author used the classical proportional-integral (PI) controller with the current reference provided by means of the Kriging method. However, the optimization problem was solved using a genetic algorithm, which is a time-consuming method. In [3], the authors applied an adaptive control of vehicle speed. However, they assumed a low order lumped-parameter model of the plant. In [4], the authors applied a sliding mode control to design a speed control that is robust for changing road conditions, but again, they used a simplified approximated model as proposed in [1]. In turn, a torque tracking control by means of a sliding model was proposed in [5]. The author applied a simple static polynomial-based torque model, which leads directly to excessive sensitivity on the changes of the operating point of the control system during the implementation of this technique.

All reported papers used the simplified models of the magnetic brake and described closed-loop control schemes. The main problem that arises in the case of nonlinear control system design is derivation of the exact model of the plant considered. From a physical point of view, the magnetic brake is a distributed-parameter system, requiring sophisticated mathematical modeling that takes into account its spatiotemporal dynamics. This makes the accurate modeling numerically difficult and time-consuming, which constitutes one of crucial design difficulties.

In addition to this, there is a significant lack of contributions discussed in the context of fault-tolerant control. In fact, there are just a few papers covering this issue in the context of breaking control for electromagnetic systems. An adaptive fault-tolerant control based on a Takagi–Sugeno fuzzy model and observer design is reported in [6], but it consists of sequential recalibration of Kalman filters, reverting this approach to the linear approximation of the system. Another approach to achieve fault-tolerant brake control utilized the brake-by-wire strategy [7]. However, it was strongly dependent on preset recovery strategies. Additionally, it focused on the nonlinear effects of contact force control and did not take into account the specificity of the electromagnetic brakes, limiting its applicability in this area. In [8], the braking controller was constructed for vehicles with regenerative in-wheel electromagnetic brakes using sliding mode control and a fault-tolerant control architecture included as an adaptive scheme. However, in this case, the main stress was put on the vehicle dynamics and again the linear model of braking torque was applied. On the one hand, all of the approaches mentioned above allow to avoid significant perturbations in the control and provide satisfactory results in some particular situations; on the other hand, they do not take into account the effects of the complex dynamics of brakes, neglecting the nonlinearity that may introduce an additional level of uncertainty and significantly affect the reliability of fault detection. To fill this gap, this work focuses on an alternative control strategy driven by measurement data and developed on the basis of an iterative learning controller enhanced with statistical hypothesis testing for fault detection.

If an accurate mathematical model of the system is not available, the reasonable alternative is to design the model using input–output data recorded in the real plant during replications of the control task. In this context, iterative learning control (ILC) is a suitable technique to achieve ideal tracking of the given reference profile [9]. The learning controller calculates the current control signal based on the archive control data available from previous experiments. Most often, a learning controller has a fixed structure along the trial domain. However, in practice, a controller should be robust to disturbances and the faults within the infrastructure of the plant [10]. Therefore, a learning controller design should include the above circumstances into the analysis. In this paper, a neural network-based iterative learning control scheme is proposed, which is capable of adapting its own behavior to the changing working conditions of the plant (related to potential sensor faults) through a data-driven training process. In the background of

ILC, the additional advantage is that the squared tracking error norm stands for the convenient statistic for the fault detection task. It makes it possible to use classical parametric hypothesis testing on the abnormal state of the system, leading to a simple yet efficient thresholding rule for sensor fault detection.

Due to technological developments, modern industrial plants are more and more complex and are composed of an ever-growing number of interacting devices, including a huge number of measuring devices. That is the reason that industrial plants are becoming very vulnerable to any deviation from the so-called normal operating conditions caused by unpermitted deviations of plant characteristics or unexpected changes of system variables. Such a deviation is called a fault. Early detection of faults can provide a way to avoid a shut-down or even a failure of the system. More importantly, a proper fault accommodation is strongly related to avoiding the large financial losses and serious injuries or even death of personnel. On this basis, fault-tolerant control (FTC) has come into prominence and has received increased attention in the last decade [11–15]. The FTC approaches can be be split into passive and active ones. Passive FTC uses a priori knowledge about anticipated faults that can affect the system. In turn, active FTC methods use the information acquired from a fault diagnosis subsystem in order to accommodate a fault. An important class of fault tolerant systems is sensor FTC. The simple and intuitive way is to introduce hardware redundancy, which, however, complicates the construction of the system and increases maintenance costs. A more flexible solution is to use the idea of a virtual sensor [16–18]. The idea is to exclude the measurements given by the faulty sensor and replace them with data provided by the virtual sensor. The virtual sensor requires a fairly accurate plant model. In this paper, we propose to apply a mixture of neural network models, which renders it possible to effectively accommodate a sensor fault. This strategy is called a fault hiding approach [19], which renders it possible to apply the same controller to both faulty and fault-free cases.

The contributions of the paper can be stated as follows.


The paper is organized as follows. In Section 2 the magnetic brake system is described. In Section 3 the control scheme using iterative learning control is provided and the control performance for normal operating conditions is also discussed. Section 4 is devoted to the modeling issues. State-space neural networks are employed to model the plant at different operating points and the overall model of the plant is achieved by means of gain scheduling in the form of a mixture of models. Section 5 introduces the idea of sensor active FTC, including fault detection and accommodation and FTC in case of different sensor faults. The last section summarizes the paper.

#### **2. Magnetic Brake**

A magnet brake, in simple terms, consists of a disk of conductive material and a magnet generating a magnetic field in which the disk is rotating. The simplest form of the device is depicted in Figure 1 (left panel).

**Figure 1.** (**a**) Schematic view of a magnetic brake and (**b**) upper view with eddy currents field.

When a conductor moves in a magnetic field, it induces eddy currents, cf. Figure 1 (right panel), which interact with the magnetic flux to produce Lorentz forces resulting in a braking torque slowing down the disk. Let D ∈ <sup>R</sup><sup>3</sup> denote a spatial domain representing the disk. Then, the evolution of the rotational velocity *ω* of the disk over the time interval (0, *tf*) can be described by the following initial value problem:

$$J\frac{d\omega(t)}{dt} = M(B\_0(t), \omega(t)), \quad \omega(0) = \omega\_{0\prime} \tag{1}$$

where *J* is a disk moment of inertia and *B*0(*t*) is the magnetic flux of the external magnet (or electromagnet). *M* is a total torque affecting the disk:

$$M = \int\_{\mathcal{D}} T\_q(B(x, t), B\_0(t), \omega(t)) \,\mathrm{d}V,\tag{2}$$

where *B*(*x*, *t*) is the spatiotemporal field of internal magnetic flux inside the disk, d*V* is a volume element of the spatial domain and *Tq* denotes a Lorentz force on the volume element related to the spatial location *x*.

It becomes clear that the eddy current brakes are nonlinear systems with responses depending on the spatiotemporal dynamics of changes in magnetic flux ([20], Chapter 9). It is strongly dependent on the shape of the domain D as well as the form of the external magnetic flux generated by the magnet. Hence, the modeling with typical methods for distributed-parameter systems, such as finite or boundary element methods, are not only challenging but also difficult to apply in practical control design, as any change to the spatial distribution of external magnetic fields (e.g., reallocating the magnets or changing the boundary conditions) requires a total rebuilding of the model.

In fact, the accurate estimation of the braking torque is one of the key challenging problems for both the design of control systems and fault detection and its compensation. Furthermore, the modeling uncertainty has to be taken into account, as it is an important factor in applications to real-world engineering systems. Fortunately, Equation (2) defining the torque integrated over the spatial domain of the disk can be approximated up to a satisfactory accuracy based on the measurement data. Here, we make use of the well-known ensemble averaging properties of neural networks. They not only provide an important alternative for accurate modeling of the nonlinear torque, but also deliver necessary robustness with respect to disturbances as they can generalize the system response. Additionally, in order to introduce these uncertainties into control synthesis, the iterative learning control is applied, being considered a well-known robust control design technique. This is especially valid in the efficient control design, which will be discussed in the following section.

For the purpose of the simulation study considered in this work, we used an aluminum disk with a radius of 10 cm and a thickness of *d* = 1 mm. The disk is moving in the external magnetic field generated

by an electromagnet with a maximum flux of 0.1 T. This represents a typical configuration encountered in analog energy meters.

The control objective is to produce the input signal of magnetic flux *B*0(*t*) leading to the desired profile of output angular velocity *ω*(*t*) with the initial value of 200 rpm. As the control algorithm, the iterative learning control portrayed in detail in Section 3 was used. The exemplary spatial distribution of the magnetic flux inside the disk and the normalized reference profile are presented in Figure 2.

**Figure 2.** (**a**) Magnetic flux density *B* for *ω* = 200 rpm and *B*<sup>0</sup> = 0.1 T, and (**b**) required reference profile.

#### **3. Iterative Learning Control**

Because of the sequential regime of the process of gathering the measurement data as a basic control scheme for the magnetic brake, a nonlinear iterative learning was employed. The choice of the ILC is justified not only by the repetitive character of the process, but is also dictated by its robustness to model inaccuracies and repetitive disturbances [9,21] via proper compensation with the use of the measurement data. This makes it especially useful, especially in combination with neural networks providing proven convergence for the considered class of systems [15].

Taking into account the fundamental property of the magnetic brake, namely the strictly energy dissipative character of the system (one cannot force the overshoot upwards), and in order to keep the complexity of the controller at a relatively low level, open-loop ILC was chosen so as to achieve the assumed control performance. Even though it is clear that the feedback control may improve the average performance, this may come at the cost of higher sensitivity to online disturbances.

Because of the specificity of the controller learning process and neural modeling, it can be useful to further discuss the discrete-time domain. Thus, the general idea of the applied nonlinear ILC is represented by the scheme depicted in Figure 3. The learning controller has the following form:

$$B\_{\mathcal{P}}(k) = f(B\_{p-1}(k), \mathfrak{e}\_{p-1}(k)). \tag{3}$$

where *f*(·, ·) is a nonlinear function representing the dynamics of the controller, *p* stands for the iteration (trial) number, *k* is a time instant, *ep*(*k*) = *ωd*(*k*) − *ωp*(*k*) is the tracking error and *ωd*(*k*) is the reference signal. The system (3) constitutes the so-called P-type first-order learning controller [9]. Various choices exists for effective realization of *f*(·, ·). One of the possible solutions is to apply a feedforward neural network. The essential element of the learning algorithm is the necessity of estimating the output sensitivities of the magnetic brake with respect to the control signal. This is achieved using a model of the system as reported in previous works by the current authors [15,22,23]. The important feature

of such a solution is the possibility to adapt its structure to the changing operation conditions of the control system. Due to the availability of representative measurement data, such an approach seems to be especially attractive here. As the details of the training procedure for the controller are beyond the scope of this paper, the interested reader can be referred to [15,22,23]. Although the physics of magnetic brakes can be discussed in terms of distributed parameter systems, such a class of systems can be accurately approximated with non-parametric models, such as a nonlinear state-space innovation form model (NSSIF):

$$\begin{aligned} \mathbf{x}\_{\mathcal{P}}(k+1) &= \mathbf{g}(\mathbf{x}\_{\mathcal{P}}(k), \mathcal{B}\_{\mathcal{P}}(k), \mathbf{c}\_{\mathcal{P}}(k)),\\ \boldsymbol{\omega}\_{\mathcal{P}}(k) &= \mathbf{C} \mathbf{x}\_{\mathcal{P}}(k), \end{aligned} \tag{4}$$

where *xp*(*k*), *Bp*(*k*) and *ω*ˆ *<sup>p</sup>*(*k*) are the state-space, input, and predicted output vectors, respectively, *p*(*k*) = *<sup>ω</sup>p*(*k*) <sup>−</sup> *<sup>ω</sup>*<sup>ˆ</sup> *<sup>p</sup>* is the prediction error and *<sup>C</sup>* is the output (observation) matrix.

**Figure 3.** Iterative learning control scheme.

Figure 4 shows the control results relating to the tracking of the reference presented in Figure 2. Figure 4a illustrates the convergence of the tracking error norm for two cases: the untrained learning controller (red dashed line) and the preliminary trained controller (blue solid line). If we start with the randomly selected initial controller weights, the learning algorithm needs some time to adapt the parameters in order to follow the reference closely. Such an experiment is performed at the very beginning. After that we can use the preliminary trained neural controller, which is able to adapt to the current working conditions of the plant very quickly (blue-solid line). Figure 4b presents the quality of tracking of the reference, where the output of the the plant is marked with the red solid line while the reference is marked with the blue dashed line. Evidently, the applied ILC functions well and the plant output follows the reference closely.

The model (4) is in fact the state observer and works satisfactorily. However, in this paper we consider a fault-tolerant control design that relies on fault diagnosis. For the purposes of model-based fault diagnosis, a model of the system should be independent of the measured process variables. Unfortunately, the model (4) is dependent on the output of the system and thus cannot be used for fault detection and there a need for developing another model. This matter is taken up in the next section.

**Figure 4.** Control of the magnetic brake: (**a**) convergence of the tracking error norm and (**b**) tracking of the reference.

#### **4. Model Design**

As the NSSIF model cannot be used for fault diagnosis, our first attempt was aimed at the application of a state-space neural network represented by (4) assuming that the prediction error *p*(*k*) = 0. However, due to the complex characteristics of the magnetic brake portrayed in Section 2, just one state-space model is insufficient to accurately follow the dynamics of the system at different operating points.

Therefore, in order to overcome this impediment, a reasonable idea is to express the dynamics of the system as a mixture of multiple models, where each of them describes the system around an operating point. The idea is reminiscent of the so-called ensemble averaging from machine learning theory and is closely related to the gain scheduling approach to nonlinear control. Although the NSSIF is known to possess the universal approximation property [15], the proper generalization of the system response is often achieved at the cost of the bias. The mixture of models, if properly adopted, can reduce the distribution of the model output still keeping a bias on the low level. As a result, we are able to determine a good trade-off between uncertainty modeling and robustness with respect to disturbances. However, to achieve this balance we have to properly combine the components of the approach, which will be discussed in the following sections.

First, if not leading to ambiguity, the trial index *p* is omitted for the clarity of the presentation. Then the overall model is represented as follows:

$$
\hat{\omega}(k) = \sum\_{i=1}^{n} \hat{\omega}\_i(k)\mu\_i(\rho) \tag{5}
$$

with *<sup>n</sup>*

$$\sum\_{i=1}^{n} \mu\_i(\rho) = 1,\tag{6}$$

where *μ<sup>i</sup>* are scaling functions, *ω*ˆ*i*(*k*) represents estimates of the system output at the *i*-th operating point, *n* is the number of operating points considered and *ρ* stands for the parameter embodying dependence of the combination of local models on the operating point that can be represented by input, state, etc. The formulation (5) is known as a nonlinear gain scheduling [24]. It can also be considered as an ensemble of models used for regression [25]. The functions *μ<sup>i</sup>* can be perceived as a membership functions and then

(5) looks similar to the Takagi–Sugeno representation. The crucial parts of the modeling are: (i) a proper designing of local models and (ii) proper choice of the membership functions. The first problem can be readily solved using the state-space neural network of the form:

$$\begin{aligned} \mathbf{x}\_{l}(k+1) &= h\_{l}(\mathbf{x}\_{l}(k), \mathcal{B}(k)),\\ \boldsymbol{\omega}\_{l}(k) &= \mathbf{C}\_{l} \mathbf{x}\_{l}(k), \end{aligned} \tag{7}$$

where the function *hi*(·, ·) describing the *i*-th neural model is represented by:

$$\mathbf{x}\_{i}(k+1) = \mathcal{W}\_{i\_{2}} \sigma\_{h}(\mathcal{W}\_{i\_{1}}^{x} \mathbf{x}\_{i}(k) + \mathcal{W}\_{i\_{1}}^{b} B(k)) \tag{8}$$

where *<sup>σ</sup>h*(·) is the nonlinear activation function of the hidden layer, *Wi*<sup>1</sup> , and *<sup>W</sup><sup>x</sup> <sup>i</sup>*<sup>2</sup> and *<sup>W</sup><sup>b</sup> <sup>i</sup>*<sup>2</sup> are the *i*-th neural model weight matrices subject to training. Each neural network (7) is trained using data recorded during control of the magnetic brake at the specific operating point.

The latter problem can be effectively solved by means of Gaussian membership functions properly distributed on the domain *ρ*. In this paper, *ρ* is selected to be the operating point depending on the initial control value *B*0. This choice is motivated by the fact that the behavior of the magnetic brake strictly depends on the initial conditions. One of them is an initial control value. Then:

$$\mu\_i(B\_0) = e^{-\frac{(|B\_0| - B\_i)^2}{2\sigma^2}}\tag{9}$$

where *Bi* is the *i*-th operating point and *σ* is the spread of the Gaussian function. In (9) the absolute value of *u*<sup>0</sup> is used because for the magnetic brake the total torque defined by (2) is invariant to the exchange of magnetic poles; thus we can restrict the considerations to nonnegative values of the magnetic flux only. Consequently, it was possible to reduce the number of operating points and then the number of local models required to develop the overall model (5). In the present work, it is assumed that each membership function has the same spread and they are uniformly distributed, cf. Figure 5. In this way, the condition (6) is always satisfied.

Throughout the empirical analysis of the magnetic brake system the effective range of the magnetic field varied from 0.003 to 0.095. After performing a series of preliminary experiments we decided to set the number of operating points to seven. The distribution of Gaussian membership functions is depicted in Figure 5.

**Figure 5.** Distribution of the membership functions.

In order to satisfy condition (6), the spread of the functions was set to *σ* = 0.006512. The set of operating points was {0.003, 0.0183, 0.0337, 0.049, 0.0634, 0.0797, 0.095}. Data for model training were recorded during the normal work of the magnetic brake control system. For each operating point, one model represented by (7) was designed. The number of hidden neurons *v* as well as the order of the network *nx* were selected experimentally through trial and error. The final configuration of the models is presented in Table 1. A very important observation is that each local model has a relatively simple structure including a lower number of hidden neurons (only five) and model order (the second or third).

**Table 1.** Models configuration.


For illustration, let us analyze the modeling results presented in Figure 6.

In the case considered, the initial control was *ρ* = −0.0423. Then, four membership functions were activated with the following membership values: *μ*<sup>2</sup> = 0.0011, *μ*<sup>3</sup> = 0.4179, *μ*<sup>4</sup> = 0.5892 and *μ*<sup>5</sup> = 0.0053. This means that the estimated system output was calculated using four models. The third and fourth models had the most significant impact. The first, sixth and seventh models were not used here. Figure 6a shows the control signal. In turn, Figure 6b presents the output of the system (solid blue line) and the overall estimated output (solid black line) along with the outputs of all local models used to develop the overall model. Clearly, the proposed modeling design works pretty well. The sum of squared errors between the output of the system and its estimation is 0.0984. The model was tested on 100 sequences with different initial control conditions and the mean value of the sum of squared errors was 0.1681.

**Figure 6.** Modeling example: (**a**) Control signal and (**b**) output of models.

#### **5. Sensor Fault-Tolerant Control**

#### *5.1. Fault Detection and Accommodation*

Sensor fault detection is carried out on the ground of a residual signal. The residual is derived as a difference between the measurably available output of the system *ω*(*k*) and the output of the overall system model *ω*ˆ(*k*):

$$
\sigma(k) = \omega(k) - \hat{\omega}(k). \tag{10}
$$

Using (10) the diagnostic signal in the form of the squared norm is proposed:

$$d(r) = \|\vec{r}(k)\|\_{n'}^2 \tag{11}$$

where *n* is the length of samples in *r* and

$$
\vec{r}(k) = (r(k) - \mu\_r) / \sigma\_r \tag{12}
$$

with *μ<sup>r</sup>* and *σ<sup>r</sup>* denoting the expected value and the standard deviation of the residual signal, respectively.

From a statistical point of view, this constitutes a sufficient statistic preserving information about the state of the system. In fact, assuming that the measurement noise is a zero-mean Gaussian white process and according to the form of membership functions given by (9), the diagnostic signal (11) is a random variable distributed according to the *χ*<sup>2</sup> distribution with *n* degrees of freedom. In consequence, it is possible to use the diagnostic signal to construct a statistical parametric test to verify the null hypothesis about the excessive discrepancy of the norm from its nominal value. Thus, a decision about faults can be made. A very simple decision rule can be proposed, i.e., to compare the diagnostic signal *d*(*r*) with the suitably selected threshold *T*:

$$s(r) = \begin{cases} 0 & \text{if } d(r) \le T\_\prime \\ 1 & \text{otherwise.} \end{cases} \tag{13}$$

In the ideal conditions (absence of measurement and modeling uncertainty) the threshold is zero in the fault-free case and different from zero in the case of a fault. However, due to the modeling uncertainty, disturbances and the measurement noise, it is required to assign a threshold larger than zero to avoid excessive numbers of false alarms. In order to determine the threshold, let us introduce a significance level *α* that corresponds to the fixed a priori probability of a false alarm. Then the threshold *T* can be determined as some critical value being approximately equal to the (1 − *α*) · 100% percentile from the cumulative *χ*<sup>2</sup> distribution. When *n* is large, the random variable

$$
\vec{d}(r) = (d(r) - n) / \sqrt{2n} \tag{14}
$$

weakly converges to a standard normal distribution, so in practice, the determination of the threshold is even easier, as it is a solution to the equation

$$a = prob(\vec{d}(r) > T),\tag{15}$$

which can be efficiently found from the tabulated values of the cumulative distribution function of N (0, 1).

To make the threshold *T* applicable, it is required to determine the statistics *μ<sup>r</sup>* and *σ<sup>r</sup>* of the residual signal. These can be derived by analysis of the residual signal derived in the normal (fault-free or healthy) operating conditions of the magnetic brake, as portrayed in Section 3. The important index assessing the fault diagnosis quality is the time of fault detection. In this paper we deal with a dynamic system that is

used in a repetitive manner. Then, we propose to introduce a similar quantity to the time of fault detection called the trial of fault detection *pd*. Clearly, the trial of fault detection is the trial number at which the diagnostic signal (11) permanently exceeds the predefined threshold (15).

After detecting a sensor fault, it is time for reconstructing the value measured by the faulty sensor. To this end, the overall model of a magnetic brake is used. Since the fault is detected, the output of this model replaces the value measured by the faulty sensor. However, the perfectly fitted model of the system does not exist. Then, it is necessary to estimate the modeling uncertainty somehow. Here, we decided to apply a very simple strategy that is easy to implement, in which the modeling uncertainty becomes constant from trail to trail during the fault occurrence and is estimated as:

$$
\Delta\omega(k) = \omega\_{p\_d}(k) - \hat{\omega}\_{p\_d}(k) \tag{16}
$$

Finally, the reconstructed value of the system output can be derived as:

$$
\omega\_r(k) = \hat{\omega}(k) + \Delta\omega(k)\tag{17}
$$

#### *5.2. Fault-Tolerant Control*

In this paper, both the abrupt and incipient sensor faults are investigated. The abrupt fault is simply a sudden change of variables describing a sensor. When a fault occurs the value of a parameter jumps to a new constant value. In turn, the incipient fault gradually develops to a larger and larger value. The strength *fs* of the incipient fault can be modeled as follows:

$$f\_{\mathbf{s}}(t) = \frac{t - t\_{from}}{t\_{end} - t\_{from}} val \tag{18}$$

where *tf rom* is the fault start up time, *fend* is the fault end time and *val* is the fault strength at *tend*. For the system working repetitively, this means that an incipient fault can develop through a number of subsequent trials. The faults can also be split into multiplicative and additive ones. The multiplicative fault is simulated as:

$$
\omega\_m(t) = \omega(t)(1 + f\_s(t)),
\tag{19}
$$

where *ω*(*t*) is the real value of the process variable, *ωm*(*t*) is the measured value and *fs* is the fault intensity. The additive fault is simulated adding some bias to the angular velocity sensor as follows:

$$
\omega\_{\rm m}(t) = \omega(t) + f\_{\rm s}(t). \tag{20}
$$

In the case study the following faulty scenarios were investigated:


Each abrupt fault was simulated at the 11th trial and lasted for the following 20 trials. Figure 7 shows the behavior of the control system without fault tolerant abilities. In Figure 7a one can see the control performance of the healthy system. ILC drives the system to follow the reference closely with the tracking error norm below 0.2. Figure 7b displays the results in the case of scenario *f*1. Clearly, the abrupt multiplicative fault with the intensity *fs*(*t*) = 0.05 significantly deteriorated the control performance. During the fault occurrence the tracking error norm increased to the level of more than 0.3. Even worse

results were observed for the additive faults either with the positive or negative bias (Figure 7c,d). Clearly, any bias in the angular velocity sensor significantly brought down the control quality. The incipient faults were simulated at the 30th time instant of the 11th trial and lasted for the next 2000 time instants. This means that the incipient fault develops during the next seven trials. Figure 7e and Figure 7f show the control performance in case of the incipient scenarios *f*<sup>4</sup> and *f*5, respectively. In general, additive faults had the stronger impact on the control performance relative to multiplicative ones. All presented scenarios clearly show that there is a need for applying a control system that renders it possible to accommodate faults and maintain the acceptable control performance in case of faults.

**Figure 7.** Control of the magnetic brake: (**a**) normal functioning, (**b**) scenario *f*1, (**c**) scenario *f*2, (**d**) scenario *f*3, (**e**) scenario *f*<sup>4</sup> and (**f**) scenario *f*5.

**Abrupt faults.** As mentioned earlier, each abrupt scenario was simulated at the 11th trial and lasted for 20 trials. Fault detection was carried out by means of the constant threshold (15). Then, the performance of the control system should first be evaluated during normal operation conditions. The control was carried out for the 100 trials, for each trial calculating the value of the diagnostic signal. After that the mean value as well as standard deviation were calculated to be:

$$
\mu\_r = -0.0226, \quad \sigma\_r = 0.0344... 
$$

For the significance level *α* = 0.05 the value of the threshold was *T* = 1.644.

In Figure 8a one can see the diagnostic signal (solid blue line) with the threshold marked with the dashed red line for the fault scenario *f*1. Clearly, the fault is reliably detected with the trial detection index *pd* equal to 1. Due to the fact that ILC works offline, i.e., the control signal is derived based on data recorded during the previous trial, the fault accommodation can be immediately launched. The fault-tolerant control results are presented in Figure 8b. The fault was completely accommodated and the tracking error norm remained on the acceptable level. Additionally, the neural model had the property of data filtering. During training, the model weights are derived in such a way as to make it possible for the model to approximate any nonlinear mapping. That is the reason for the flat region observed in the tracking error norm course between the 11th and 30th trial.

**Figure 8.** Fault-tolerant control—scenario *f*1: (**a**) diagnostic signal (solid blue line) and the threshold (dashed red line), (**b**) control results.

Figure 9 shows the results for the faulty scenario *f*2. The changes observed in the diagnostic signal (Figure 9a) were not as large as in the case of the multiplicative fault; however, this fault was also quickly detected with *pd* = 1, which renders possible the fast reconstruction of the signal measured by the faulty sensor while keeping high quality control of the magnetic brake.

The last abrupt scenario was the additive fault with negative fault intensity (Figure 10). Contrary to the previously analyzed scenarios, this time the fault effect was distinctly visible in the diagnostic signal course (see Figure 10a). As with the previous abrupt scenarios the fault accommodation was immediately launched. The fault was detected within one trial. The fault-tolerant control was satisfactory, as presented in Figure 10b.

**Figure 9.** Fault-tolerant control—scenario *f*2: (**a**) diagnostic signal (solid blue line) and the threshold (dashed red line), (**b**) control results.

**Figure 10.** Fault-tolerant control—scenario *f*3: (**a**) diagnostic signal (solid blue line) and the threshold (dashed red line), (**b**) control results.

**Incipient faults.** The incipient faults were introduced at the 11th trial. The fault startup time was set to the 30th time instant and the fault end time was the 2030th time instant. The full strength of incipient faults was achieved at the 17th trial. The threshold level used was the same as in case of abrupt faults.

Figure 11 presents the results achieved for the additive scenario *f*4. As the fault developed slowly its detection took some time. In Figure 11a one can see that the diagnostic signal started to increase when the fault affected the velocity sensor. The diagnostic signal crossed the threshold at the 12th trial. This time fault accommodation launched one trial later than in the case of the abrupt faults. Nonetheless, the fault-tolerant control worked with a quality similar to the normal operating conditions. As a matter of fact, a slight increase of the tracking error norm was observable, but this change was not significant, contrary to the control scheme without FTC (compare with Figure 7e).

**Figure 11.** Fault-tolerant control—scenario *f*4: (**a**) diagnostic signal (solid blue line), (**b**) control results.

The last investigated scenario was the incipient multiplicative sensor fault. The obtained results are shown in Figure 12. It is clear that the evolution of the diagnostic signal was quite similar to that observed for the incipient additive fault *f*4. The fault was accommodated at the 12th trial (Figure 12a). In means that the trail of fault detection was equal to 2. Despite the one trail delay of the reconstruction of the signal measured by the velocity sensor, the fault did not affect the quality of the control at all (see Figure 12b). This phenomenon can be easily explained by analyzing the evolution of the incipient multiplicative fault. From (18) we know that after one trial the fault intensity growed up to the value of 0.0073. For the multiplicative fault this means that the value measured by the velocity sensor increased by ≈ 0.007%. Such a fault intensity is not meaningful to the work of the control system.

**Figure 12.** Fault-tolerant control—scenario *f*5: (**a**) residual, (**b**) control results.

The main objective of the FTC system is to maintain the current performance of the system as close as possible to the desirable one in the presence of faults. Then, the main performance index pointing out the quality of the proposed FTC system is the value of the tracking error. All reported experimental results clearly show that sensor faults were detected fast enough to assure high quality reference tracking. As shown in Figures 8b, 9b, 10b, 11b and 12b, the tracking error norm in case of a fault was kept on the level observed for the nominal operation conditions (see Figure 7a).

**Fault size influence.** To verify the performance as well as the sensitivity of the proposed FTC scheme the influence of the fault size on the control system performance was investigated. Key results are shown in Table 2. The abrupt/additive fault with a size smaller than 0.01 stayed undetected. For the abrupt/multiplicative fault slight problems were observed for the fault size *val* = 0.01. In this case it was observed that for several trials the value of the diagnostic signal was below the threshold. However, the fault was marked as detected because during a predominant number of trials the diagnostic signal crossed the threshold. However, for a fault size equal to 0.008 serious problems were already observed as the diagnostic signal oscillated around the threshold and the fault was not detected anymore. For a fault size smaller than 0.006, the diagnostic signal remained below the threshold, indicating that the fault was undetected. In the case of an incipient fault, the smaller was the fault size, the longer was the fault detection time, which is expected behavior. For a fault size smaller than 0.007, the fault was not detectable. Generally speaking, the portrayed results clearly show that faults with an intensity of less than 0.01 were not detected by the proposed fault diagnosis procedure. Thus, the crucial question arises of what is the quality of the control when a fault occurred in a sensor and it was not detected. The simple verification is to compare the tracking error norm in the case of control with the presence of undetected fault with control at the nominal operating conditions. For example, for an incipient fault with a size equal to 0.007, the tracking error norm took the values from the interval [0.1666, 0.1783]. Similar values were observed for the control at nominal operating conditions (see Figure 7a). Clearly, for the fault-free case, the tracking error norm converged to the level of the norm of disturbances acting on the magnetic brake. Then, a fault with an intensity of less than 0.01 influenced the system similarly to the external disturbance and had no adverse impact on the control system performance.

**Table 2.** Influence of the fault size on the fault diagnosis performance.


#### **6. Concluding Remarks**

The proposed method of sensor fault-tolerant control was found to be very efficient in the case of the class of nonlinear systems represented by the example of a magnetic brake. The advantage of the iterative control scheme is that the approach offers great flexibility to reduce the modeling and process uncertainty of nonlinear control design using the observational data available from previous trials. Despite its somewhat complex mathematical foundations, the resulting fault detection scheme is very easy to implement in practical conditions.

There is still space for further improvements, for example, to incorporate a feedback controller so as to increase the control performance with respect to online disturbances. More precisely, the offline ILC is not suitable for accurate compensation of random disturbances. In addition, the identification within a feedback loop is more difficult. As for a potential solution, the approach reported in [23] developed in the context of a similar class of nonlinear systems and feedback control can be extended to the online procedure.

Another potential future research direction that is very important in the context of neural modeling is the proper choice of representative observational data. A successful application of experimental design

theory to solve this problem was developed in [22,26,27] but not in the context of FTC. The efficient adaptation of this approach establishes an open problem, which requires further investigation.

Yet another problem is an extension towards robust solutions with additional constraints imposed on the control design, e.g., related to the actuator faults or uncertainties in the material parameters. This can be solved within the framework of the proposed ILC scheme via exploration of dedicated structures of neural controllers.

**Author Contributions:** Concept: K.P. and M.P.; methodology: K.P. and M.P.; software: K.P. and M.P.; validation: K.P., M.P.; formal analysis: K.P., M.P. and K.K.; writing—original draft preparation: K.P., M.P. and K.K.; visualization: K.P. and K.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The work was supported by the National Science Center in Poland, grant No. 2017/27/B/ST7/01874.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**



© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
