**3. Method**

#### *3.1. Global Architecture*

The AOC-OPTICS method is developed to monitor the state of health of a bearing throughout its entire life. It is based on the physical manifestations involved in the deterioration of a bearing. Three automated phases are therefore proposed (Figure 2). Phase 1 assumes that a newly fitted bearing is healthy during an interval *Th*; this phase initializes the method. Phase 2 is the failure detection phase. It remains active as long as no failure has been detected, and it uses data agglomeration for early and reliable fault detection. Phase 3 follows the evolution of the fault: because the fault evolves over time, this phase is a loop that tracks the state of the fault through geometrical values of the second class. It runs until the bearing fails. Each phase is described in the following sections, and Table 1 shows the associated pseudocode.
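The succession of the three phases can be sketched as a small transition function. This is only an illustrative reading of the description above, not the pseudocode of Table 1; the names `Phase`, `next_phase`, `elapsed`, `t_h` and `n_classes` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    INITIALIZATION = 1  # bearing assumed healthy during the interval Th
    DETECTION = 2       # waiting for a second class to appear
    FOLLOW_UP = 3       # tracking the fault until the bearing fails


def next_phase(phase, elapsed, t_h, n_classes):
    """Return the phase to run at the next iteration.

    elapsed   -- time since the bearing was fitted (same unit as t_h)
    t_h       -- assumed healthy interval Th
    n_classes -- number of classes reported by the clustering step
    """
    if phase is Phase.INITIALIZATION:
        # Phase 1 lasts until the healthy interval Th has elapsed.
        return Phase.DETECTION if elapsed >= t_h else Phase.INITIALIZATION
    if phase is Phase.DETECTION:
        # Phase 2 ends as soon as a second class is detected.
        return Phase.FOLLOW_UP if n_classes >= 2 else Phase.DETECTION
    # Phase 3 loops until the bearing fails (stopping is outside this sketch).
    return Phase.FOLLOW_UP
```

The function never returns to an earlier phase, which matches the one-way progression from initialization to follow-up described in the text.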

**Figure 2.** Flowchart of AOC-OPTICS.

**Table 1.** Pseudocode for automatic online classification monitoring based on ordering points to identify clustering structure (AOC-OPTICS).




#### *3.2. Phase 1, Initialization*

The first phase is executed for a duration *Th*, during which the bearing is assumed to be healthy. At every iteration *k*, *n* signals are collected and *p* = 17 features are extracted in the time, frequency and time-frequency domains. Using multidomain features for bearing defect detection can provide an effective diagnosis of different rolling bearing defects under varying speed and load. The time domain provides nine features in the form of descriptive statistics. Such statistical indicators are widely used because of their relation to significant bearing damage [23]. The frequency domain makes it possible to localize the bearing defect and to identify its nature [24]; six indicators are computed in this domain. The time-scale domain uses the wavelet method to extract two features [25] (Table 2). These indicators are stored in a matrix [*HI*] in which each column corresponds to a signal and each row to an indicator.
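The layout of the indicator matrix can be illustrated with a minimal sketch. The three descriptors used here (RMS, kurtosis and crest factor) are common time-domain statistics chosen for illustration only; they are an assumption and do not reproduce the 17 features of Table 2. The names `INDICATORS` and `build_hi` are hypothetical.

```python
import math


def rms(x):
    """Root mean square of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """Fourth standardized moment (population form)."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return sum((v - m) ** 4 for v in x) / (len(x) * var ** 2)


def crest_factor(x):
    """Peak absolute amplitude divided by the RMS value."""
    return max(abs(v) for v in x) / rms(x)


# Illustrative subset only; the paper's actual feature list is in Table 2.
INDICATORS = [rms, kurtosis, crest_factor]


def build_hi(signals):
    """Build HI with one row per indicator and one column per signal."""
    return [[f(s) for s in signals] for f in INDICATORS]
```

With *n* signals per iteration, `build_hi` returns a list of rows, each of length *n*. A constant signal has zero variance and would make `kurtosis` divide by zero, so real data would need a guard.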

**Table 2.** Computed features. *x* is the sequence of samples obtained after digitizing the time-domain signals; *xi* is the signal series for *i* = 1, 2, ..., *N*. *WS*(*fk*) corresponds to the spectral density of the maximum coefficients of the continuous wavelet transform. *s*(*k*) is the spectrum for *k* = 1, 2, ..., *K*; *K* is the number of spectrum lines; *fk* is the frequency value of the *k*th spectrum line.


At time *Th*, the indicator matrix [*HI*]<sub>*i*</sub> is normalized to give [*HI*]<sub>*i*</sub><sup>norm</sup> (Equation (3)). Normalization brings the computed indicators onto a similar scale.

$$[HI]_{i,(k+1)n}^{\text{norm}} = \frac{HI_i - \overline{HI_i}}{\text{std}(HI_i)} \tag{3}$$
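Equation (3) is a z-score applied to each indicator, that is, to each row of [*HI*]. A minimal sketch follows, assuming the sample standard deviation is meant (the text does not say which estimator is used); `normalize_rows` is a hypothetical name.

```python
import statistics


def normalize_rows(hi):
    """Z-score each indicator (row) of HI, following Equation (3)."""
    out = []
    for row in hi:
        mean = statistics.fmean(row)
        std = statistics.stdev(row)  # sample standard deviation (assumption)
        out.append([(v - mean) / std for v in row])
    return out
```

After this step every indicator has zero mean and unit spread, so indicators with large raw magnitudes no longer dominate the distance computations that follow. A row with fewer than two values or with zero spread cannot be normalized this way.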

*Processes* **2020**, *8*, 606

A ranking step is then applied. Feature ranking is an effective way of eliminating unimportant features before dimension reduction. Computing features from a massive amount of data takes a long time. To shorten this process, feature ranking is used to minimize the number of features, which speeds up the calculation without affecting the accuracy of defect detection. For AOC-OPTICS, two methods for eliminating unnecessary features, the relief method and the chi-square method [26], are compared in Section 4.3 for different numbers of retained features.

Although the curse of dimensionality poses serious problems, processing high-dimensional data has the advantage that the data carry more information. t-distributed stochastic neighbor embedding (t-SNE) is a powerful dimension reduction tool that can reduce the dimension of the feature space and increase the recognition rate. The data are reduced to three components, which gives better accuracy than two. The difference in accuracy between the two choices is visible in the representation of amplitude. Because three components are used in this paper, figures are shown in three dimensions.

Finally, ε is calculated after the dimension reduction. ε corresponds to the maximum distance between the center of the class, *ch*, and its *MinPts*th neighbor (Equation (4)).

$$\varepsilon = \text{distance}\left\{ c_h ;\ \text{MinPts}^{\text{th}} \text{ neighbor} \right\} \tag{4}$$

The resulting class is the so-called healthy class, denoted *Ch*, with center *ch*. This class serves as the reference state.
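One possible reading of Equation (4) is that ε is the distance from the center *ch* to its *MinPts*th nearest point in the reduced space. The sketch below follows that reading, which is an assumption, and also assumes that *ch* is the arithmetic mean of the healthy points; `centroid` and `epsilon` are hypothetical names.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))


def epsilon(points, min_pts):
    """Distance from the class center to its min_pts-th nearest point."""
    c_h = centroid(points)
    dists = sorted(math.dist(c_h, p) for p in points)
    return dists[min_pts - 1]  # min_pts is 1-based
```

For example, with healthy points at distances 1, 1, 3 and 3 from the center, `min_pts = 3` gives ε = 3.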

#### *3.3. Phase 2, Detection*

The second phase detects the mechanical failure. Its objective is to detect a new state called the defective class, denoted *Cf*. At each new iteration *k*, the indicators are extracted, normalized, ranked and reduced as in the previous phase. The resulting features, [*FI*]<sub>3,(*k*+1)*n*</sub>, are tested with the OPTICS method to determine whether a second class has appeared. If only one class is obtained, which corresponds to the reference state, the algorithm remains in the detection phase and the new data feed the reference state. If two classes are detected, the new class *Cf* is obtained in a plane *B*, which is kept for the follow-up phase.

#### *3.4. Phase 3, Follow-up*

The third phase is carried out in plane *B*, which was determined in the previous phase. It is important to keep the same plane in order to visualize the evolution of the features, and it is the plane best suited to following the evolution of the bearing failure. With each new series of data, the indicators are extracted, normalized and projected into plane *B*. From these features, [*FI*]<sub>3,(*k*+1)*n*</sub>, five geometrical parameters *GVi* are calculated and monitored over time.

The Calinski–Harabasz index, *GV*1, is based on the density and separation of the clusters (Equation (5)). *p* is the number of features. *cf* and *ch* are the centers of the classes *Cf* and *Ch*, respectively. *nc* is the number of clusters. *d* is the Euclidean distance between a point *x* and a center *ci*.

$$GV_1 = \frac{\sum_{i} p\, d^{2}(c_f, c_h) / (n_c - 1)}{\frac{\sum_{x \in C_h} d^{2}(x, c_h)}{p - n_c} + \frac{\sum_{x \in C_f} d^{2}(x, c_f)}{p - n_c}} \tag{5}$$
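For orientation, the textbook form of the Calinski–Harabasz index can be sketched as follows. It is not a literal transcription of Equation (5): the textbook form weights the between-cluster term by cluster size, measures it from the overall center, and uses the total number of points *N* where Equation (5) writes *p*. These differences are assumptions of the sketch, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))


def calinski_harabasz(clusters):
    """Textbook Calinski-Harabasz index for a list of clusters of points."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    g = centroid(all_points)  # overall center of the data
    # Between-cluster dispersion: size-weighted squared center distances.
    between = sum(len(c) * math.dist(centroid(c), g) ** 2 for c in clusters)
    # Within-cluster dispersion: squared distances of points to own center.
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))
```

A larger value indicates compact clusters that lie far apart, so the index is expected to grow as the fault class moves away from the healthy class.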

The Davies–Bouldin index, *GV*2, measures the average similarity between each cluster and the cluster most similar to it (Equation (6)). A lower index means a better cluster configuration. *Rij* is the similarity measure between two clusters *i* and *j*, and *nc* is the number of clusters.

$$GV_2 = \frac{1}{n_c} \sum_{i=1}^{n_c} R_i \quad \text{with} \quad R_i = \max_{j = 1, \dots, n_c,\; j \neq i} \left( R_{ij} \right), \quad i = 1, \dots, n_c \tag{6}$$
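Equation (6) leaves *Rij* unspecified. The sketch below assumes the usual Davies–Bouldin definition, *Rij* = (*si* + *sj*) / *dij*, where *si* is the mean distance of the points of cluster *i* to its center and *dij* is the distance between the two centers. That definition and the function names are assumptions.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dim))


def davies_bouldin(clusters):
    """Davies-Bouldin index with R_ij = (s_i + s_j) / d_ij (assumed form)."""
    centers = [centroid(c) for c in clusters]
    # s_i: mean Euclidean distance of the points of cluster i to its center.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # R_i: similarity to the most similar other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

With only two classes, *Ch* and *Cf*, the index reduces to (*sh* + *sf*) / *d*(*ch*, *cf*), so it decreases as the fault class separates from the healthy class.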

The third parameter, *GV*3, is the distance between the center *ch* of the healthy cluster *Ch* obtained in the initial phase and the center *cf* of the fault cluster *Cf* (Equation (7)), where *dM* is the Manhattan distance.

$$GV_3 = d_M(c_f, c_h) \tag{7}$$
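Equation (7) amounts to summing the absolute coordinate differences between the two centers. A minimal sketch, with the hypothetical name `manhattan`:

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length coordinate tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

For centers *cf* = (4, 1, 2) and *ch* = (0, 1, 0) in the three-component space, the value is 4 + 0 + 2 = 6.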

Finally, the contour of the cluster, *GV*4, is calculated from its convex hull, which is the smallest convex set that contains the points. The density, *GV*5, is the number of points in the cluster *Cf* relative to its volume *Vf*.
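The hull and density computations can be illustrated in two dimensions, where the hull is a polygon and its area plays the role of the volume *Vf*. The paper works with three components, so this is a simplified sketch under that assumption, using Andrew's monotone chain algorithm; all function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_area(hull):
    """Polygon area by the shoelace formula."""
    pairs = zip(hull, hull[1:] + hull[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


def density(points):
    """Number of points divided by the area enclosed by their hull."""
    return len(points) / hull_area(convex_hull(points))
```

For the four corners of a 2 × 2 square plus its center, the hull keeps the four corners, the area is 4 and the density is 5 / 4 = 1.25. Collinear or very small clusters have zero area and would need separate handling.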
