3.1. Discrete Wavelet Transform
Signals from faulty components exhibit non-stationary behavior. However, if the frequency content of a non-stationary signal is computed using the Fourier transform, the result reflects the frequency composition averaged across the whole signal duration [17,18]. This distinction makes time–frequency analysis techniques well suited to non-stationary signals. Numerous time–frequency analysis methodologies [19], such as wavelet transforms, have been used for fault detection and diagnosis. This technique is evaluated here to establish its primary advantages and the reasons for its use.
The wavelet transform (WT) was developed and utilized in numerous applications to alleviate the resolution limitation of the Fourier transform [20]. In the Fourier series, trigonometric functions are employed to transform the signal into a collection of coefficients; in the wavelet series, a primary mother wavelet is fitted to the signal, and the transform is obtained as the inner product of the inspected signal with a succession of daughter wavelets. The daughter wavelets are formed by shifting and scaling the mother wavelet using the scaling (a) and shifting (b) parameters. In the scaling operation, the mother wavelet is subjected to expansion or dilation; if the wavelet is stretched horizontally, it is compressed along the vertical axis so that the power density of the scaled wavelet and of the original mother wavelet remain identical [21]. In the shifting stage, the wavelet is moved along the x-axis until it entirely covers the studied signal, which may be expressed mathematically as [22]:

W(a, b) = (1/√a) ∫ x(t) ψ*((t − b)/a) dt,    (1)
where W(a, b) is the wavelet transform of the signal x(t), and ψ is the transforming function (the mother wavelet), with ψ* denoting its complex conjugate. Unlike the endless trigonometric functions, the mother wavelet has a beginning and an end: it fits the signal locally rather than globally. This makes it well suited to accurately analyzing the quad-copter vibration signals considered here.
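As a minimal illustration of this inner-product view (not part of the original study), the following Python sketch forms daughter wavelets by shifting and scaling a Mexican-hat mother wavelet and correlates each with the signal; the signal, scale values, and shift values are arbitrary toy choices:

```python
import numpy as np

def ricker(t):
    # Real-valued Mexican-hat mother wavelet (unnormalised form)
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(x, t, scales, shifts):
    # W(a, b) = (1/sqrt(a)) * sum of x(t) * psi((t - b)/a) * dt
    dt = t[1] - t[0]
    W = np.zeros((len(scales), len(shifts)))
    for i, a in enumerate(scales):
        for j, b in enumerate(shifts):
            daughter = ricker((t - b) / a) / np.sqrt(a)  # scaled, shifted copy
            W[i, j] = np.sum(x * daughter) * dt          # inner product
    return W

t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 8 * t)   # toy stand-in for a vibration signal
W = cwt(x, t, scales=[0.01, 0.02, 0.05], shifts=np.linspace(0.1, 0.9, 9))
print(W.shape)                  # (3, 9): one coefficient per (scale, shift) pair
```

Each row of `W` corresponds to one scale (one daughter-wavelet width), and each column to one position along the x-axis.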
The continuous wavelet transform (CWT) improves signal-processing accuracy; nevertheless, it is potentially infinitely redundant, which can render it unmanageable [23]. This redundancy increases the power, computation time, and memory required, making the CWT impractical in many cases, particularly when wavelet analysis must run in real time on embedded systems or other real-time monitoring systems, as is the case here. To conserve time and energy, the discrete wavelet transform (DWT) was developed, in which the mother wavelet is scaled and shifted only at discrete steps along the signal rather than continuously. The DWT is often used to decompose the original signal into several signals, each with a specific frequency bandwidth, that can be treated as separate signals on which further analyses may be conducted. The DWT's strength is that it evaluates data at different scales using filters with different cut-off frequencies: a high-pass (HP) filter is used to examine high frequencies, and a low-pass (LP) filter to analyze low frequencies. Another type of wavelet analysis, the complex wavelet transform, is represented by the dual-tree complex wavelet transform: a complex-valued modification and augmentation of the basic DWT with essential qualities such as multi-resolution, a limiting representation, and the ability to eliminate the aliasing caused by the overlap of opposing-frequency pass-bands of the wavelet filters [24,25]. For decomposition and reconstruction, the dual-tree technique employs two concurrent DWTs with different low-pass and high-pass filters at each scale; the two DWTs use two pairs of filters, each of which satisfies the perfect-reconstruction condition.
In general, employing the DWT to subdivide time-domain signals allows for multi-resolution analysis in many frequency bands with varying resolutions [26]. The DWT utilizes the wavelet and scaling functions associated with the HP and LP filters. At the first level, the original signal x[n] is separated by passing it through both of these filters, producing two output signals of the same sampling length as the primary signal, referred to as coefficients. To keep the number of coefficients in the filtered signals equal to the number of samples in the primary signal, the outputs are down-sampled by a factor of two, with only one of every two subsequent samples retained. Thus, the first-level detail coefficients cD1 are the coefficients returned by the HP filter after down-sampling; they contain the high-frequency information of the primary signal. The coefficients recovered from the LP filter, again after the down-sampling procedure, are known as the first-level approximation coefficients cA1; they hold the signal's low-frequency information. This is mathematically stated as [27]:

cA1[k] = Σn x[n] g[2k − n],    (2)
cD1[k] = Σn x[n] h[2k − n],    (3)

where h[n] and g[n] denote the impulse responses of the high-pass and low-pass filters, respectively. Once the first-level decomposition is obtained, the same procedure may be reapplied to subdivide cA1 into further approximation and detail coefficients, as expressed in Equations (4) and (5) [27]:

cAj[k] = Σn cAj−1[n] g[2k − n],    (4)
cDj[k] = Σn cAj−1[n] h[2k − n],    (5)

with cA0[n] = x[n]. This procedure is repeated until the desired decomposition level is reached.
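As an illustrative sketch (not part of the original study), one decomposition level can be carried out in Python with the Haar filter pair, the simplest member of the Daubechies family; the input samples are arbitrary toy values:

```python
import numpy as np

def haar_dwt_level(x):
    # One decomposition level with the Haar filter pair:
    # low-pass g = [1/sqrt(2), 1/sqrt(2)], high-pass h = [1/sqrt(2), -1/sqrt(2)],
    # followed by down-sampling by two (one of every two outputs kept).
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    cA = (even + odd) / np.sqrt(2.0)   # approximation: low-frequency content
    cD = (even - odd) / np.sqrt(2.0)   # detail: high-frequency content
    return cA, cD

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])  # toy signal
cA1, cD1 = haar_dwt_level(x)
print(len(cA1), len(cD1))   # 4 4 -> half the samples in each branch
```

Because the Haar filters are orthonormal, the signal energy is preserved: the energies of `cA1` and `cD1` sum to the energy of `x`.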
Here, cDj and cAj denote the DWT detail and approximation coefficients at level j, while cAj−1 is the approximation coefficient at level (j − 1). At each subdivision level, the corresponding approximation and detail coefficients occupy particular frequency bands: [Fs/2^(j+1), Fs/2^j] for the detail coefficients cDj and [0, Fs/2^(j+1)] for the approximation coefficients cAj, where Fs stands for the sampling frequency [28].
28]. Nevertheless, filtering and subsampling at each level will deliver half the sample number (half the temporal resolution) and half the frequency spectrum (twice the frequency resolution). Due to the repeated down-sampling by two, the total number of samples in the processed signal must also be a power of two. Concatenating all coefficients starting with the last level of decomposition yields the DWT of the original signal, which has the same number of samples as the original signal.
Figure 2 is a graphic representation of how multi-level subdivision is accomplished. The number of necessary decomposition levels is determined by the lowest frequency bandwidth to be monitored. In addition, the highest decomposition level is reached when the resulting detail coefficients consist of a single sample [28].
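The repeated halving described above can be sketched as follows (a toy Python example using the Haar filters, assuming a signal length that is a power of two):

```python
import numpy as np

def haar_dwt_level(x):
    # One Haar decomposition level (low-pass / high-pass + down-sample by 2)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_wavedec(x, levels):
    # Repeatedly split the approximation branch; each level halves the
    # sample count, so len(x) must be divisible by 2**levels.
    x = np.asarray(x, dtype=float)
    details = []
    cA = x
    for _ in range(levels):
        cA, cD = haar_dwt_level(cA)
        details.append(cD)
    # DWT = [cA_L, cD_L, ..., cD_1]: same total sample count as the input
    return np.concatenate([cA] + details[::-1])

x = np.arange(16, dtype=float)   # 16 = 2**4 samples
dwt = haar_wavedec(x, levels=3)
print(len(dwt))                  # 16: same length as the original signal
```

At three levels the branch lengths are 8, 4, and 2 samples, and concatenating the last approximation with all the details restores the original sample count, as stated above.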
After calculating the detail and approximation coefficients, the detail and approximation waves can be reconstructed at each level to view the data and accurately depict healthy and faulty conditions. Each reconstructed signal has the same number of samples as the primary input signal but covers a separate frequency range. This is achieved by up-sampling the detail (or approximation) coefficients by two, since they were originally produced by down-sampling by two, and then synthesizing them using the low-pass and high-pass synthesis filters. To reconstruct the first-level approximation wave A1, for instance, only the approximation coefficients at this level are required, while a vector of zeros is supplied in place of the detail coefficients. The same method generates the first-level detail signal D1.
Figure 3 illustrates the concept of signal synthesis.
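The synthesis step can likewise be sketched in Python for the Haar case (an illustrative example, not the db4 filters used later): up-sampling by two, applying the synthesis filters, and zeroing one coefficient branch yields the A1 and D1 waves, which sum back to the original signal.

```python
import numpy as np

def haar_dwt_level(x):
    # One Haar decomposition level (analysis filters + down-sampling by 2)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt_level(cA, cD):
    # Up-sample by two and apply the Haar synthesis filters.
    x = np.empty(2 * len(cA))
    x[0::2] = (cA + cD) / np.sqrt(2.0)
    x[1::2] = (cA - cD) / np.sqrt(2.0)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])  # toy signal
cA1, cD1 = haar_dwt_level(x)
A1 = haar_idwt_level(cA1, np.zeros_like(cD1))  # zeros in place of details
D1 = haar_idwt_level(np.zeros_like(cA1), cD1)  # zeros in place of approximations
print(np.allclose(A1 + D1, x))                 # True: the two waves sum to x
```

Feeding both coefficient sets into the synthesis stage reproduces the original signal exactly, which is the perfect-reconstruction property mentioned above.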
3.2. Selection of the Optimum Mother Wavelet
The DWT supports several wavelet families. To determine the best wavelet function for this study, a survey was conducted of the main mother wavelets that authors have used for fault diagnosis. Commonly used wavelet families include Daubechies (dbN), Symlets (symN), and Coiflets (coifN), where N is the order within the wavelet family [29]. For example, the symN and dbN wavelets have 2N filter coefficients. No generalized theoretical approach has been reported for choosing the best wavelet family, and researchers have used multiple families to analyze the same signal [30,31]. In many cases, the selection is performed by trial and error [32]. In general, if the mother wavelet and the signal under consideration show substantial similarity, the wavelet function is deemed suitable for analyzing that signal [33].
Certain wavelet functions, such as sym7 or db10, have many filter coefficients, which increases the computational load on the PC and software and thus the processing time required for real-time wavelet analysis. With the remaining options reduced to the lower-order families, no further quantitative procedure is necessary to pick the mother wavelet. The Symlet and Daubechies families are well known for their performance in vibration-signal analysis and encompass a wide range of wavelet orders [34]; hence, the fourth-order Daubechies wavelet (db4) was used in this study.
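As a quick check of the 2N-coefficient property noted above (illustrative only; assumes the PyWavelets package is available):

```python
import pywt  # PyWavelets, assumed installed

# The db4 wavelet used in this study has 2N = 8 filter coefficients
# in each of its decomposition filters.
w = pywt.Wavelet('db4')
print(len(w.dec_lo), len(w.dec_hi))   # 8 8
```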
Figure 4 illustrates the analysis high-pass and low-pass filters of the discrete wavelet transform, together with the corresponding synthesis high-pass and low-pass filters, for the selected db4 wavelet family.
3.3. Deep Neural Network
A deep neural network (DNN), or deep net for short, is a neural network with a certain amount of complexity, generally at least two hidden layers. Deep nets use advanced mathematical modeling to analyze data in complicated ways. A neural network is represented by its parameters θ, which comprise the weight matrices Wi and bias vectors bi, where i = 1, …, L and L denotes the depth of the network structure, i.e., the number of hidden layers, as depicted in Figure 5a. By minimizing a loss function, DNNs provide the best approximation of the original function. The neural networks under investigation in this research are multilayer feed-forward neural networks, composed of alternating affine linear maps Wi h + bi applied to the training data x and nonlinear functions σi, known as activation functions. At each hidden layer, the weight matrix and bias vector transform the incoming data, and the result is passed to the next hidden layer through the activation function. The neural network learning approach is thus based on composing numerous linear and nonlinear functions to approximate the target function, as in Equation (6) below [35]:

y(x; θ) = Wout σL( WL ⋯ σ1(W1 x + b1) ⋯ + bL ) + bout,    (6)
where Wi and bi are the ith hidden layer's weight matrix and bias vector, respectively, Wout and bout are the output layer's parameters, and σi is the ith layer's activation function, an element-wise nonlinear function. The most frequently utilized activation functions are the sigmoid, tanh, and ReLU. Because the tanh function (Equation (7)) has the range (−1, 1), it handles negative values more readily; as a result, it has been used in this investigation:

tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)).    (7)
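A minimal Python sketch of the feed-forward computation in Equation (6), with tanh activations and arbitrary toy layer sizes and random weights (not the network trained in this study):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights, biases, W_out, b_out):
    # Alternating affine maps and element-wise tanh activations, as in
    # Equation (6): h_i = tanh(W_i h_{i-1} + b_i), y = W_out h_L + b_out.
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)   # tanh keeps each activation in (-1, 1)
    return W_out @ h + b_out

d, N1, N2 = 4, 8, 6              # input size and hidden widths (toy values)
weights = [rng.normal(size=(N1, d)), rng.normal(size=(N2, N1))]
biases = [np.zeros(N1), np.zeros(N2)]
W_out, b_out = rng.normal(size=(1, N2)), np.zeros(1)

y = forward(rng.normal(size=d), weights, biases, W_out, b_out)
print(y.shape)                   # (1,): a scalar network output
```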
Typically, network training consists of adjusting the parameters θ by gradient-based optimization during neural network backpropagation. The objective is to identify the parameters that minimize the loss function. This technique requires differentiating the loss with respect to the unknown parameters W and b, that is, evaluating the proposed algorithm's differential operators. The loss gradient is central to this procedure: it indicates the direction in which the parameters should change.
Increasing the number of hidden layers of a single-hidden-layer network yields a deep neural network. To exemplify this, consider a network with two hidden layers, as illustrated in Figure 5b. The network output y(x; θ) for the training data x may be represented as follows:

y(x; θ) = Wout σ2( W2 σ1(W1 x + b1) + b2 ) + bout.

Here it is assumed that the hidden layers have N1 and N2 neurons, respectively. W1, W2, and Wout are thus weight matrices of the following type:

W1 = [w(1)ij] ∈ R^(N1×d),  W2 = [w(2)ij] ∈ R^(N2×N1),  Wout = [w(out)j] ∈ R^(1×N2),

where w(k)ij represents the weight from the ith neuron of the kth hidden layer to the jth neuron of the (k+1)th hidden layer, and d is the input dimension. Consequently, b1, b2, and bout are the bias vectors:

b1 ∈ R^(N1),  b2 ∈ R^(N2),  bout ∈ R.
The previous equations yield the values of each quantity in Figure 6 for the previously indicated activation function. Employing these equations, when the gradient of the objective function with respect to the parameters is calculated, the loss at each hidden layer is computed beginning with the network output layer and progressing layer by layer back to the input layer. The parameters are initialized randomly and subsequently optimized to obtain the best outcome.
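The layer-by-layer gradient computation can be sketched as follows for a two-hidden-layer tanh network (toy sizes and random values, not the trained network of this study); a finite-difference check confirms the analytic gradient:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-hidden-layer tanh network with a scalar output (N1 = 4, N2 = 2).
x, target = rng.normal(size=3), 0.5
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
w_out, b_out = rng.normal(size=2), 0.0

# Forward pass, keeping each layer's activation for the backward pass.
h1 = np.tanh(W1 @ x + b1)
h2 = np.tanh(W2 @ h1 + b2)
y = w_out @ h2 + b_out
loss = 0.5 * (y - target) ** 2

# Backward pass: the loss gradient is computed at the output layer and
# propagated layer by layer to the input (d tanh(z)/dz = 1 - tanh(z)**2).
dy = y - target                          # dL/dy
dw_out, db_out = dy * h2, dy
dz2 = (w_out * dy) * (1.0 - h2 ** 2)
dW2, db2 = np.outer(dz2, h1), dz2
dz1 = (W2.T @ dz2) * (1.0 - h1 ** 2)
dW1, db1 = np.outer(dz1, x), dz1

# Central finite difference on one weight confirms the analytic gradient.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
yp = w_out @ np.tanh(W2 @ np.tanh(W1p @ x + b1) + b2) + b_out
ym = w_out @ np.tanh(W2 @ np.tanh(W1m @ x + b1) + b2) + b_out
numeric = (0.5 * (yp - target) ** 2 - 0.5 * (ym - target) ** 2) / (2 * eps)
print(abs(numeric - dW1[0, 0]) < 1e-5)   # True
```

Each `dW`/`db` pair above is exactly the per-layer loss gradient described in the text; a gradient-descent step would subtract a small multiple of each from the corresponding parameter.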