1. Introduction
In the present application, tapered roller bearings are installed in a two-stage planetary gearbox in port vehicles. Since the bearings are exposed to a highly stressful environment, they can develop wear. Particularly in port applications, unpredictable machine failures must be avoided, as they result in high costs. Therefore, condition monitoring of the bearings is necessary to detect damage at an early stage. To implement this, acoustic signals are evaluated in this contribution.
There is already a large number of publications regarding the condition monitoring of bearings. The most important results are introduced briefly. In general, two approaches can be distinguished, namely signal-processing methods and intelligent methods [1,2]. For signal-processing methods, the signals can be used in the time domain, frequency domain or time–frequency domain. Typical methods to analyze the signals are statistical indicators, the detection of characteristic fault frequencies for each component or of occurring sidebands, and spectrum observations [3,4]. In addition to common methods like the Fourier transformation, multiple decomposition methods, like empirical mode decomposition or variational mode decomposition, have been developed [5]. The advantage of these methods compared to the Fourier transformation is that they are applicable to non-stationary signals, which frequently occur in gearboxes. For this purpose, the vibration signals are decomposed into mono-component modes. A new method for detecting bearing damage in this field is feature mode decomposition (FMD) [6]. The signals are first preprocessed using a finite-impulse response filter. Afterwards, a mode selection based on a combined criterion of correlated kurtosis and the correlation coefficient is performed. FMD can successfully detect individual damages and, in contrast to previous decomposition methods, is also capable of extracting the modes in the case of combined damages. The results of all presented signal-processing methods can serve as input for a variety of intelligent methods, like support vector machines or neural networks [7].
Based on the literature, the presented methods are not applicable to every use case. Most of the publications concern laboratory investigations on bearings. Investigations of bearings integrated in a gearbox are rare. Planetary gearboxes are used in rough and variable conditions. Therefore, the evaluation of fault frequencies based on the fast Fourier transformation is not suitable, since it is designed for stationary signals [8]. Furthermore, planetary gearboxes, and therefore bearings in planetary gearboxes, pose additional challenges like the superposition of vibration events, nonlinear transmission paths, low-speed components and noise. Also, most investigations are based on localized faults, like local pitting. Investigations on distributed wear phenomena are rare.
This paper presents an investigation of localized and distributed defects. The tapered roller bearings are investigated within the assembled two-stage planetary gearbox. In addition, this paper investigates the influence of unknown damage and is not limited to the detection of damage classes which the network has already seen in the training process. For this, a convolutional neural network is developed, which takes spectral data as input. The first investigation addresses the class granularity. It is analyzed how many classes the classifier can reliably differentiate. To this end, a binary classifier differentiating between two classes, healthy and damaged, is first developed. Afterwards, a classifier differentiating between three classes, healthy, localized and distributed damage, is used. Finally, a classifier differentiating between seven classes, which show different severities of the presented subclasses, is designed and evaluated. The second investigation addresses the impact of different rotational speeds. The last investigation deals with unknown damage phenomena. For this, the classifier differentiating between only three classes is used. Single damage phenomena are held back during training and are only used in the test set to reveal whether a correct classification is possible.
The article is structured as follows. Section 2 presents the experimental setup. First, the two-stage planetary gearbox design is introduced, followed by the investigated damages and, lastly, the data acquisition and test rig. Section 3 deals with the dataset and classifier preparation. Afterwards, the different investigations are presented in Section 4. In the first investigation, the achievable class granularity is examined. Secondly, the influence of different rotational speeds is observed. Finally, the robustness of the developed classifier is tested by classifying unknown damage events. In Section 5, the results are discussed and a conclusion is drawn, including the optimization potential.
3. Data Processing
In the present investigation, it is very difficult to derive precise predictions of the bearing condition based on signal-processing methods due to the noisy environment. Therefore, intelligent methods are used. A convolutional neural network (CNN) is tested for this purpose. Since the CNN processes image data, spectrograms are generated from the raw signals in the first step. This is explained in more detail in Section 3.1. The general functionality and the structure of the CNN used are presented in Section 3.2. Furthermore, since many classifiers suffer from overfitting and from the drawbacks of imbalanced datasets, suitable evaluation methods are identified in Section 3.3 to enable a reliable evaluation of the results.
3.1. Data Preparation
Each dataset is divided into data segments with a length of half a second (50,000 samples each), which are transformed into the time–frequency domain and depicted as spectrograms. The spectrograms are scaled logarithmically, and the color scale is normalized with the minimal and maximal values over the whole dataset. Because of this common normalization, the spectrograms are saved as images without axes. The images are then used as three-dimensional input for a CNN. In total, the dataset consists of 9800 spectrograms, which are evenly distributed over all seven classes. An example of the generated spectrograms for a rotational speed of 2700 rpm is shown in Figure 2. For the same operating conditions, the resulting spectrograms of three conditions, namely healthy, dist 2 and local 0, are shown. Slight differences can be detected. In relation to the healthy condition, the signal amplitude in the lower frequency range increases in the case of distributed damage, while the middle frequency range shows higher amplitudes only within a clearly definable frequency band. For local damage, on the other hand, the lower frequency band shows higher amplitudes than in the healthy state and is wider. In addition, this damage exhibits frequency bands of higher amplitude in the middle frequency range. In the high-frequency range, all three conditions behave similarly.
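The preparation steps described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the processing chain used in the paper: the window length, the hop size and all function names are hypothetical, the input is synthetic noise, and the color mapping to RGB images is omitted. The segment length of 50,000 samples per half second is taken from the text and implies a sampling rate of 100 kHz.

```python
import numpy as np

SEGMENT_LENGTH = 50_000  # samples per half-second segment, as stated in the text


def split_into_segments(signal, segment_length=SEGMENT_LENGTH):
    """Cut a 1-D signal into non-overlapping segments, dropping the remainder."""
    n_segments = len(signal) // segment_length
    return signal[: n_segments * segment_length].reshape(n_segments, segment_length)


def log_spectrogram(segment, window_length=1024, hop=512):
    """Log-magnitude spectrogram in dB via a short-time Fourier transform.

    window_length and hop are assumed values, not taken from the paper.
    """
    window = np.hanning(window_length)
    n_frames = 1 + (len(segment) - window_length) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + window_length] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power.T + 1e-12)  # shape: (frequency bins, time frames)


def normalize_globally(spectrograms):
    """Scale all spectrograms to [0, 1] with one dataset-wide minimum and maximum."""
    lo = min(s.min() for s in spectrograms)
    hi = max(s.max() for s in spectrograms)
    return [(s - lo) / (hi - lo) for s in spectrograms]


# Synthetic stand-in for a measured signal: 1.5 s at the implied 100 kHz rate.
rng = np.random.default_rng(0)
signal = rng.standard_normal(150_000)
segments = split_into_segments(signal)
images = normalize_globally([log_spectrogram(s) for s in segments])
```

Using one minimum and one maximum for the whole dataset, rather than per image, keeps amplitude differences between conditions visible in the pixel values, which is presumably why the axes can be dropped.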
3.2. Classifier
CNNs are used in many applications, which can be summarized mainly into three categories [15]. The first category is classification. Examples of classification are mainly medical image analysis and real-time target tracking. CNNs are also used for object detection, which is the second category. Object detection includes object localization and object classification. It is often used in a medical context, for example, in radiology with X-ray images to detect fractured bones or in the case of tumor detection [16]. Lastly, the third main application is segmentation. In addition, CNNs can be used as a preprocessing step for feature extraction [17].
CNNs provide many advantages. They can extract features from the raw data without the need for preprocessing steps [17]. Furthermore, they are precise and achieve high accuracy [18]. They cope well with variable or complex data, outperform other intelligent methods, are computationally efficient and need only few parameters [19]. Therefore, the application to noisy acoustic signals in the port environment is promising for the realization of a reliable classifier.
Generally, a CNN is a multi-layer neural network and is used for image data [17]. CNNs use discrete convolution to extract information from the image data. The general structure of CNNs is presented briefly. CNNs basically consist of an input layer, hidden layers and an output layer. The hidden layers are convolutional, pooling or fully connected layers. First, the images are fed to the network using the input layer. Afterwards, a combination of multiple convolutional and pooling layers is used to extract features. One convolutional layer is built by a set of multiple kernels [15]. Each kernel has weights, which are convolved window-wise with the input array. This creates features and also reduces the data size [20]. The key parameters are the size and the number of kernels [16]. Pooling layers, in contrast, are only used to reduce the feature size while retaining the relevant information. They reduce the dimensionality and downscale the matrix. In addition, pooling layers improve the transferability of the network to unknown data [18]. After multiple convolutional and pooling layers, the two-dimensional feature space is flattened to one dimension. The flattened features are passed to fully connected layers, which are required for the classification. At least one such layer is necessary, but multiple are possible. The last layer has as many outputs as there are defined classes. In each fully connected layer, each input is connected to each output by a weight. Lastly, an activation function is needed, which normalizes the output to class probabilities.
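The building blocks named above (convolution, pooling, flattening, a fully connected layer and a normalizing activation) can be illustrated with a tiny NumPy forward pass. This is a didactic sketch with random weights and a toy 8 × 8 input, not the network of the paper; all names and sizes are assumptions chosen for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel window-wise over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped
    (cross-correlation).
    """
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i : i + kh, j : j + kw] * kernel)
    return out


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    blocks = feature_map[: h * size, : w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))


def softmax(logits):
    """Normalize raw scores to class probabilities."""
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()


rng = np.random.default_rng(0)
image = rng.random((8, 8))                      # toy single-channel input
kernel = rng.standard_normal((3, 3))            # one kernel; weights are learned in practice
features = np.maximum(conv2d_valid(image, kernel), 0.0)  # convolution + ReLU
pooled = max_pool(features)                     # 6 x 6 feature map -> 3 x 3
flat = pooled.reshape(-1)                       # flatten to one dimension
weights = rng.standard_normal((3, flat.size))   # fully connected layer, 3 classes
probabilities = softmax(weights @ flat)
```

The 3 × 3 kernel shrinks the 8 × 8 input to 6 × 6, and 2 × 2 max pooling halves it again to 3 × 3, which shows how both layer types reduce the data size before classification.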
CNNs have successfully been used in bearing condition monitoring [2]. Guo et al., for example, used an adaptive deep CNN to distinguish between four bearing health states [19]. Qian et al. used adaptive overlapping convolutional neural networks to increase the performance by using the one-dimensional raw vibration signal [21]. This method reached high accuracy for a public bearing dataset with ten health conditions, while using only 5% of the data for training. In addition, the results were verified with a second dataset of a different bearing investigation.
For the presented application, a CNN is developed with TensorFlow and tested for differentiating between the presented bearing damage cases. The architecture is shown in Figure 3. In addition to the layer configuration, the two-dimensional kernel size of each layer is provided. Also, the number of features is shown below each convolutional layer. The first convolutional layer restructures the three-dimensional input, which is composed of the RGB values of the spectrograms. The layer is combined with a ReLU activation layer and a max pooling layer. Afterwards, three convolutional layers, each combined with a ReLU activation layer, a normalization layer and a max pooling layer, are implemented. Finally, the data are flattened by global pooling, which is connected to a dense layer. Lastly, a second dense layer combined with a softmax activation layer is connected, which has the same size as the considered number of classes. Since the number of classes varies between the investigations, the last layer is variable.
3.3. Evaluation
To train the CNN, k-fold cross validation is used. Typically, in machine and deep learning, $k = 10$ is chosen, as in the present investigation. First, the dataset is randomly shuffled. Afterwards, the dataset is divided into ten equal-sized parts, where each part is used once for testing [22,23]. The remaining data are used to develop the CNN model, where eight parts are used for training and one part is used for validation. As a consequence, the model is trained ten times. To evaluate the model accuracy, the mean result as well as the best and the worst iterations are considered. Cross validation ensures that all data points can be used for training and testing without resulting in overfitting [24]. Without a sound evaluation, many models tend to classify the training data perfectly but fail for new data. This is called the generalization problem, which is addressed by using cross validation.
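The splitting scheme described above can be sketched as follows. The sketch assumes that the validation part is simply the fold following the test fold; the text does not state how the validation part is chosen, and the function name is hypothetical. The dataset size of 9800 is taken from Section 3.1.

```python
import numpy as np


def tenfold_splits(n_samples, k=10, seed=0):
    """Yield (train, validation, test) index arrays for each of the k iterations."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)  # shuffle, then split
    for i in range(k):
        test = folds[i]
        validation = folds[(i + 1) % k]  # assumed choice of the validation part
        train = np.concatenate(
            [folds[j] for j in range(k) if j not in (i, (i + 1) % k)]
        )
        yield train, validation, test


splits = list(tenfold_splits(9800))
```

With 9800 spectrograms, each iteration uses 7840 samples for training, 980 for validation and 980 for testing, and every sample appears in exactly one test fold.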
The present investigation is a multi-class problem. The most common classifier metric is the accuracy (ACC). It represents the ratio of correctly classified samples to the whole dataset [25]:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
Here, the correctly classified samples of the positive class (TP) and the correctly classified samples of the negative class (TN) are set in relation to all sample points. This includes the correct classification as well as the false classification of the positive (FP) and negative (FN) class.
This metric is useful if there is either a balanced class distribution or only the probability of a correct classification is important, independent of the classes. In the present case, a balanced class distribution is only given when differentiating between seven classes, and even then it might not hold within each cross-validation fold. In addition, further information about the ability to detect every single class is needed. Therefore, another metric that can handle imbalanced datasets should be used.
A common solution for this is the balanced accuracy (BAC) [26,27]. The BAC is the average of the class-dependent recalls over all $N$ classes:
$$\mathrm{BAC} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{recall}_i$$
The recall is the ratio of correctly classified samples of a defined class. Defining the considered class $i$ as positive and all others as negative, the class-specific recall is the number of TP samples of the class in relation to all samples that actually belong to this class, which comprise the TP as well as the FN:
$$\mathrm{recall}_i = \frac{TP_i}{TP_i + FN_i}$$
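The difference between ACC and BAC on an imbalanced dataset can be made concrete with a small example. The following sketch computes both metrics from a confusion matrix; the label vectors are made up for illustration, and the function names are not from the paper. It assumes that every class occurs at least once among the true labels, since the recall is otherwise undefined.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy(cm):
    """Correctly classified samples (main diagonal) over all samples."""
    return np.trace(cm) / cm.sum()


def class_recalls(cm):
    """TP / (TP + FN) per class: diagonal divided by the row sums."""
    return np.diag(cm) / cm.sum(axis=1)


def balanced_accuracy(cm):
    """Average of the class-specific recalls."""
    return class_recalls(cm).mean()


# Made-up imbalanced example: 8 samples of class 0, 2 samples of class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # classifier always predicts class 0
cm = confusion_matrix(y_true, y_pred, n_classes=2)
```

In this example the ACC is 0.8, although class 1 is never detected. The BAC is 0.5, because the recalls of 1.0 and 0.0 are weighted equally.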
4. Results
The main aim of this contribution is to realize a reliable classifier to detect bearing damages. In this context, some specific investigations are presented for this application. First of all, the achievable class granularity is tested. This is presented in Section 4.1. The investigations on the test rig are conducted at defined speeds. In real operations, constant and defined speeds cannot be assumed at all times. Therefore, the effects of data collected at unknown speeds on the classifier results are investigated. This is shown in Section 4.2. Lastly, based on the presented class granularities, it is investigated whether unknown damages can be sorted into subclasses by the trained classifier. This is shown in Section 4.3. These last two tests are intended to provide an indication of the robustness of the designed classifier from Section 4.1.
4.1. Class Granularity
The first study examines whether the detection of bearing damage in the present application is feasible with the use of a CNN. At first, a binary classifier is developed, which distinguishes between healthy and damaged. The second class is composed of the distributed and localized classes. Then, the classification problem is extended to three classes. Thereby, the subclasses healthy, distributed and localized are differentiated. As shown in Table 3, the classes dist 0, dist 1 and dist 2 are combined for the distributed subclass. The subclass localized consists of local 0, local 1 and local 2. In the last step, the class granularity is increased to seven classes, so that all classes shown in Table 3 should be detected. The investigation of the class granularity should indicate which granularity offers a suitable trade-off between classification quality and information content. The results are shown in Table 4. Each model is trained ten times using tenfold cross validation, and the averaged BAC over all iterations as well as the results of the best and worst iterations are listed.
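The three granularities differ only in how the seven original labels are grouped. A minimal sketch of this grouping is given below; the label strings follow the class names used in the text, while the function names are hypothetical.

```python
SEVEN_CLASSES = ["healthy", "dist 0", "dist 1", "dist 2", "local 0", "local 1", "local 2"]


def to_three_classes(label):
    """Map a seven-class label to healthy, distributed or localized."""
    if label.startswith("dist"):
        return "distributed"
    if label.startswith("local"):
        return "localized"
    return "healthy"


def to_two_classes(label):
    """Map a seven-class label to healthy or damaged."""
    return "healthy" if label == "healthy" else "damaged"


three = [to_three_classes(c) for c in SEVEN_CLASSES]
two = [to_two_classes(c) for c in SEVEN_CLASSES]
```

Since the 9800 spectrograms are evenly distributed over the seven classes (1400 each), the grouping makes the coarser problems imbalanced: 1400 healthy against 4200 distributed and 4200 localized samples for three classes, and 1400 healthy against 8400 damaged samples for the binary case. This is why the BAC rather than the ACC is reported.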
The result shows that the binary classifier achieves the overall best result with an average BAC of 0.99. Its worst iteration achieves a BAC of about 0.98. The classifiers with higher granularity show very good results, too. With increasing class granularity, the BAC decreases, but the classifier with seven classes still reaches an average BAC of 0.96. Its worst iteration produces a BAC of 0.70, which is far worse than for the other granularities. However, this iteration can be seen as an outlier, since the second-worst iteration reaches a BAC of about 0.98. In Figure 4, the result of the worst iteration of the classifier with seven classes is shown. A confusion matrix (CM) is used for visualization. On the horizontal axis, the predicted labels are shown, whereas on the vertical axis, the true labels are depicted. A good classifier should predict the true labels, and therefore most of the samples should be placed on the main diagonal. The CM shows that the distributed classes are classified nearly perfectly, while the healthy class and all of the local classes show multiple misclassifications. It is noteworthy that all four of these classes are often predicted as class dist 0.
In summary, the classification of all presented granularities is possible with sufficiently good accuracy. Depending on the requirements and the use case, it should be decided individually which misclassifications are acceptable, and the largest granularity that meets these criteria should be selected accordingly. For the presented use case, the result of the classifier with seven classes is sufficient.
4.2. Speed Dependency
The second study examines the influence of speed dependencies. As presented, the data are collected at seven different constant rotational speeds. In the real application, a constant speed at defined values cannot be guaranteed. Therefore, the influence of the speed on the classifier accuracy should be investigated.
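Two investigations are performed. In the first investigation, the classifier is trained with rotational speeds of 700, 1700, 2700 and 4200 rpm and tested with the other three speeds. In the second investigation, the classifier is trained with rotational speeds of 1200, 2100 and 3400 rpm. The following equations provide an overview of the investigations, where $D$ is the whole dataset and $D_n$ denotes the subset collected at a rotational speed of $n$ rpm:
$$D_{\mathrm{train},1} = D_{700} \cup D_{1700} \cup D_{2700} \cup D_{4200}, \qquad D_{\mathrm{test},1} = D \setminus D_{\mathrm{train},1}$$
$$D_{\mathrm{train},2} = D_{1200} \cup D_{2100} \cup D_{3400}, \qquad D_{\mathrm{test},2} = D \setminus D_{\mathrm{train},2}$$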
Again, all investigations are performed using tenfold cross validation. The results are shown in Table 5. The highlighted data show the results for rotational speeds that were not part of the training, i.e., speeds unknown to the trained classifier. First, the BAC decreases for samples whose rotational speed was not used in the training process. In the worst case, data samples collected at 700 rpm only reach an averaged BAC of 0.20, and the worst iteration only reaches 0.06. Therefore, the classifier becomes worse than random prediction. Secondly, among the rotational speeds not used in the training process, the intermediate rotational speeds achieve the best classification accuracy. The data collected at 2100 rpm still reach an averaged BAC of 0.83 in the first investigation, although this speed was not part of the training. The second and third best results are reached at 3400 and 1700 rpm. Also, it can be noted that the results from the first investigation achieve a higher BAC. The main difference from the second investigation is that only intermediate speeds are tested, each of which is enclosed by trained rotational speeds. Together with the worst results at the lowest and highest rotational speeds and the best results at the middle rotational speeds, this shows that a suitable coverage of rotational speeds during training is essential for a good classification of unknown rotational speeds. Therefore, the rotational speeds in the training set should cover the whole range and be evenly distributed, since data at rotational speeds that are enclosed by trained rotational speeds are classified with good accuracy.
It can, therefore, be assumed that the ranges in between the known rotational speeds from the present measurements are also classifiable.
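The two speed splits can be sketched as a simple partition of the samples by their recording speed. The speed values follow the text; the assumption that each speed contributes the same number of spectrograms, as well as all names, are hypothetical and only serve the illustration.

```python
import numpy as np

ALL_SPEEDS = [700, 1200, 1700, 2100, 2700, 3400, 4200]  # rpm, as named in the text
TRAIN_SPEEDS = {
    1: [700, 1700, 2700, 4200],  # first investigation
    2: [1200, 2100, 3400],       # second investigation
}


def split_by_speed(speeds, train_speeds):
    """Return index arrays of samples at trained speeds and at unseen speeds."""
    speeds = np.asarray(speeds)
    seen = np.isin(speeds, train_speeds)
    return np.flatnonzero(seen), np.flatnonzero(~seen)


# Hypothetical metadata: one speed entry per spectrogram, 1400 per speed.
sample_speeds = np.repeat(ALL_SPEEDS, 1400)
seen_idx, unseen_idx = split_by_speed(sample_speeds, TRAIN_SPEEDS[1])
```

In the first investigation, every unseen speed lies between two trained speeds, whereas in the second investigation the unseen speeds of 700 and 4200 rpm lie outside the trained range. This matches the observation that the lowest and highest speeds are classified worst.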
4.3. Unknown Damages
In the third study, the robustness of the classifier is evaluated with regard to unknown damage conditions. To verify this, the classifier with three classes is used first. Unknown damages should therefore be classified into the categories healthy, local and distributed damage. A distinction of the severity levels of the local and distributed damages cannot be tested due to the limited data. The three distributed damages are all defined as severe. In addition, quantifying the damage is difficult. A distinct separation between light and heavy damage is non-trivial and subjective for each user. For these reasons, a classification of the severity is not provided.
The focus of the investigation is on the classification into the three presented subclasses. For the investigation, each damage class except healthy is withheld once from the training. The remaining damage phenomena are combined according to the subclasses and constitute the training data for the classifier. Subsequently, it is tested whether the withheld class can be classified into the correct subclass. The result is shown in Table 6. Here, the result for each withheld class is listed separately. Again, this test is performed with tenfold cross validation. The average accuracy and the accuracies of the best and worst iterations are listed.
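The withholding procedure can be sketched as a loop over the six damage classes. The sketch below only builds the index sets and the three-class targets; the training itself is omitted. The label list with two samples per class and all function names are hypothetical.

```python
DAMAGE_CLASSES = ["dist 0", "dist 1", "dist 2", "local 0", "local 1", "local 2"]


def subclass(label):
    """Map a seven-class label to its three-class subclass."""
    if label.startswith("dist"):
        return "distributed"
    if label.startswith("local"):
        return "localized"
    return "healthy"


def withhold(labels, withheld):
    """Split sample indices into training candidates and the withheld class."""
    train = [i for i, lab in enumerate(labels) if lab != withheld]
    test = [i for i, lab in enumerate(labels) if lab == withheld]
    return train, test


# Hypothetical label list with two samples per class.
labels = ["healthy"] * 2 + [c for c in DAMAGE_CLASSES for _ in range(2)]
experiments = {}
for withheld in DAMAGE_CLASSES:
    train_idx, test_idx = withhold(labels, withheld)
    experiments[withheld] = {
        "train_targets": [subclass(labels[i]) for i in train_idx],
        "test_targets": [subclass(labels[i]) for i in test_idx],
    }
```

For each withheld class, the subclass it belongs to is still represented in training by the two remaining damage classes of that type, so the test asks whether the classifier has learned the subclass rather than the individual damage.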
The result shows that classes dist 1, dist 2 and local 2 are classified almost perfectly. Class local 0 shows moderate accuracy, whereas classes dist 0 and local 1 cannot be classified with sufficient accuracy. Class dist 0 in particular performs very poorly, which is rather unexpected especially in comparison to the very good results of the other distributed damage classes.
A closer look at the misclassifications of withheld class dist 0 reveals that most of the predictions belong to subclass local. Unlike the classification of a damaged gearbox as healthy, this does not represent a safety risk. In Figure 5, the CM of the best iteration of local 1 is shown. Although most of the misclassifications fall into the distributed damage subclass and are therefore not critical, there are significantly more misclassifications as healthy than in the previous case. Therefore, although it yields better accuracy than the tests with dist 0, this could turn into a safety risk, and the usage should be considered depending on the application.
Because not all unknown damages can be classified into the defined subclasses, a second investigation differentiating only between healthy and damaged using the binary classifier is performed. Once again, every damage class is withheld one time and tested afterwards. The results are shown in Table 7. Like all other investigations, this is performed with tenfold cross validation, and therefore the average result as well as the best and the worst iterations are presented. Only class local 0 shows slight difficulties, whereas all other classes reach average accuracies of over 0.95. For this reason, it can be derived that a reliable classification of unknown damage conditions into the classes healthy and damaged is achievable.
5. Conclusions
In this contribution, an investigation of the detection of various bearing damages is presented. Although the bearings are measured within the assembled two-stage planetary gearbox, which results in the superposition of vibration events, nonlinear transmission paths, low-speed components and noise, the results show that a reliable detection of the bearing damage is achievable. A distinction can be made between six presented damage phenomena. For this purpose, vibration signals are collected by using an accelerometer. The vibration signals are evaluated by creating spectrograms, which are used by CNNs as input data.
In addition to the investigation of damage classification, complementary investigations regarding the robustness of the developed system are performed. These are intended to test the robustness of the system for field application as well as against the expected variance in the damage cases. The results show that the classifier is robust to variations in the speeds within the trained speed ranges. Also, in most cases, the classifier achieves high accuracies in classifying unknown damages into the presented three subclasses. However, due to difficulties with individual damage phenomena, the use of a binary classifier, which differentiates only between healthy and damaged, is recommended, especially for high-risk applications in which a variety of damage phenomena is expected.
Since the classifier shows high robustness regarding unknown parameters, transferability to planetary gearboxes with a different number of stages can be assumed. For this purpose, the network would probably have to be retrained. However, since equivalent data of a planetary gearbox with a different number of stages are not available, a verification is not possible within this contribution. Therefore, the presented results are limited to the investigated two-stage planetary gearbox.
For future work, the classifier should be tested and optimized for bearing damages in the two-stage planetary gearbox installed in vehicles. Since the presented classifier can detect unknown damage phenomena, good transferability is assumed. Since unknown parameters, like additional noise or the influence of the load on the characteristics of the damage phenomena, are introduced in the field application, an adaptation of the classifier and additional filters are probably needed.