1. Introduction
Machinery facilities play a crucial role in industrial production, and faults in such facilities can lead to production disruptions and increased costs. Therefore, research aimed at improving fault detection and prediction holds significant importance. The causes of electric motor faults are primarily classified into bearing, winding, environmental, and various other issues. Bearing faults mainly occur due to factors such as corrosion, insufficient lubrication, and wear. Winding faults are divided into electrical and mechanical causes, with electrical causes including overload, interturn short circuits, interphase short circuits, and momentary overvoltages, while mechanical causes involve issues like shaft constraints and direct contact between the rotor and stator. Environmental faults primarily arise from moisture corrosion and chemical substances in the surrounding environment.
Unpredictable faults can occur in various parts of the device due to aging or operating conditions, and if regular inspections and maintenance are delayed, serious problems can arise. When faults occur, they not only affect the operation of the electric motor but also have a negative impact on the entire system, including industrial processes, transportation, water supply and drainage, firefighting, and power systems. Therefore, technology that can predict and prevent faults in advance is necessary. Accurate lifespan prediction is crucial as inaccurate predictions can result in unforeseen costs. Industries such as railways, machinery, and electric motors face challenges in lifespan prediction due to various installation times, manufacturers, and specifications. When faults or performance degradation occur in electric motors, anomalies are typically observed in vibrations, currents, temperatures, etc., deviating from normal ranges. Analyzing faults and malfunctions in electric motors requires practical experience, data collection, a comprehensive understanding of fault identification, and an understanding of causes and related symptoms.
The field of mechanical fault diagnosis has recently undergone substantial advancements, primarily driven by the application of deep learning techniques. Central to these developments is the use of convolutional neural networks (CNNs) for fault diagnosis, which has become increasingly prevalent [1]. This includes a range of approaches, from employing CNNs for fault classification in mechanical equipment [2] and small current grounded distribution systems [3] to using them for bearing fault classification, where methods integrate spectral kurtosis filtering and Mel-frequency cepstral coefficients [4], and even investigating the learning mechanisms of CNNs through time–frequency spectral images [5].
Additionally, novel network architectures and signal-processing techniques have been introduced to enhance fault diagnosis. This includes the development of a weight-sharing capsule network using one-dimensional CNNs [6], the exploration of one-dimensional local binary patterns (1D-LBPs) in bearing vibration signal analysis [7], and the use of one-dimensional ternary patterns (1D-TPs) for accurate fault diagnosis in bearings [8]. Advanced machine learning models like improved random forest algorithms have also been applied for industrial process fault classification [9] and rotating machinery fault diagnosis using multiscale dimensionless indicators [10].
Moreover, the field is witnessing a growing trend in employing complex machine learning strategies for fault detection. These include the application of kernelized support tensor machines (KSTMs) and multilinear principal component analysis (MPCA) for rotating machine fault detection [11], the use of ensemble machine learning-based fault classification schemes [12], and the integration of adaptive features extracted by modified neural networks for intelligent fault diagnosis [13]. Furthermore, cross-domain approaches and advanced deep learning strategies are gaining traction. This is evidenced by the use of tensor-aligned invariant subspace learning combined with 2DCNN for intelligent fault diagnosis [14], the implementation of deep transfer learning strategies for automated fault diagnosis [15], and the development of specialized models like the PrismPatNet for engine fault detection [16].
In addition, there is an increasing focus on using deep learning for more efficient and robust fault diagnosis under various conditions. Techniques like the deep focus parallel convolutional neural network (DFPCN) have been introduced to address imbalanced machine fault diagnosis [17], and simplified CNN structures have been proposed for more effective rolling bearing fault diagnosis [18]. Multisensor approaches using 2D deep learning frameworks are also being explored for distributed bearing fault detection [19]. Lastly, researchers in the field are exploring the potential of synthetic data generation using variational autoencoders for enhanced fault classification and localization in transmission networks [20].
To address these needs, this study proposes a method to improve fault detection and prediction, especially in electric motor installations in the industrial field. We utilized real mechanical equipment fault prediction sensor data [21], and in particular, we extracted 2.2 kW current data from subway station air conditioning equipment. In real industrial facilities, various noises may exist, and it may be difficult to use high-specification processing systems. Therefore, this study aims to improve the accuracy of fault classification by utilizing various multi-input CNN structures and image conversion techniques as a robust yet lightweight model. The contributions of this research are as follows:
Automatically extracting and transforming key features of time series signals so that features in both the time and frequency domains are used, and standardizing differing formats (sampling rate, duration, etc.) through image transformation so that the same model can be applied to different datasets;
Converting signals into images, which reduces the number of dimensions of the data and makes them easier to process efficiently in deep learning models that use CNNs with a two-dimensional image representation;
Comparing different image conversion methods (RP, GASF, and GADF) through experiments and proposing a multi-input CNN structure that combines these conversion methods and shows more robust performance.
3. Models
In this study, a convolutional neural network (CNN)-based model was employed for the classification of faults in electric motor machinery. This model was designed with a lightweight architecture, utilizing two convolutional layers and one max-pooling layer for feature extraction. For classification, three fully connected layers were integrated, and the output layer consisted of a single node. Binary classification was achieved through the sigmoid activation function, and the model was trained using the binary cross-entropy loss function. Multi-input CNNs were configured using the encoded RP, GASF, and GADF images. Detailed explanations of the model architectures used in the experiments are provided in the following sections.
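As a reference point for the three encodings named above, the following is a minimal NumPy sketch of the commonly used definitions of the Gramian angular summation field (GASF), Gramian angular difference field (GADF), and recurrence plot (RP). It is not the authors' implementation: the function names, the min-max rescaling to [-1, 1], the recurrence threshold `eps`, and the use of the raw scalar series without time-delay embedding are all assumptions made for illustration.

```python
import numpy as np

def rescale(x, lo=-1.0, hi=1.0):
    """Min-max scale a 1-D series into [lo, hi] so that arccos is defined."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)  # constant series: map everything to 0
    return lo + (hi - lo) * (x - x.min()) / span

def gramian_angular_fields(x):
    """Return (GASF, GADF) images for a 1-D series.

    GASF[i, j] = cos(phi_i + phi_j), GADF[i, j] = sin(phi_i - phi_j),
    where phi = arccos(rescaled series).
    """
    s = np.clip(rescale(x), -1.0, 1.0)
    phi = np.arccos(s)
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf

def recurrence_plot(x, eps=0.1):
    """Binary recurrence plot: 1 where |x_i - x_j| <= eps.

    eps is a hypothetical threshold; no time-delay embedding is applied here.
    """
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(np.uint8)

# Illustrative signal only (not data from the study).
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 32))
gasf_img, gadf_img = gramian_angular_fields(signal)
rp_img = recurrence_plot(signal)
```

Each encoding maps a series of length N to an N × N image, which is why series with different sampling rates or durations can be brought to a common input size by resampling before encoding.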
3.1. Single-Input CNN (Convolutional Neural Network) Model
The single-input CNN model described in Figure 6 and Table 2 was based on a lightweight architecture. This model took one of the RP, GASF, or GADF images as input and extracted features through two convolutional layers and one max-pooling layer. For classification, three fully connected layers were utilized, and binary classification was performed in the output layer with a single node. The model hyperparameters were chosen as follows: the first convolutional layer used 32 filters of size 3 × 3 to capture various features of the images, and the second convolutional layer used 64 filters to extract more complex patterns. Dropout was applied at a 50% rate during training to prevent overfitting, thereby enhancing the model’s generalization ability. The architecture reduced spatial dimensions by applying the max-pooling layer after the two convolutional layers, which allowed the model to capture both local and abstract features while maintaining computational efficiency. The fully connected layer with 256 neurons contributed to learning high-level abstract features and understanding intricate patterns.
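To make the layer stack above concrete, the following sketch traces output shapes and parameter counts through it in plain Python. Only the 32 and 64 filter counts, the 3 × 3 size of the first convolution, and the 256-unit dense layer come from the text; the 64 × 64 single-channel input, the 3 × 3 kernel in the second convolution, unpadded convolutions, the 2 × 2 pooling window, and the 64-unit middle dense layer are hypothetical placeholders, so the totals are illustrative rather than those of Table 2.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_single_input_cnn(img=64, channels=1, dense_units=(256, 64, 1)):
    """Return a list of (layer name, output shape, parameter count).

    Assumed values (not from the paper): img, channels, the second conv
    kernel size, the 2x2 pool, and the middle dense width of 64.
    """
    layers = []
    h = conv2d_out(img, 3)
    layers.append(("conv1 3x3, 32 filters", (h, h, 32), 3 * 3 * channels * 32 + 32))
    h = conv2d_out(h, 3)
    layers.append(("conv2 3x3, 64 filters", (h, h, 64), 3 * 3 * 32 * 64 + 64))
    h = h // 2  # 2x2 max pooling halves each spatial dimension
    layers.append(("maxpool 2x2", (h, h, 64), 0))
    features = h * h * 64  # flattened feature vector (dropout adds no parameters)
    for units in dense_units:
        layers.append((f"dense {units}", (units,), features * units + units))
        features = units
    return layers

layers = trace_single_input_cnn()
total_params = sum(params for _, _, params in layers)
for name, shape, params in layers:
    print(f"{name:24s} {str(shape):16s} {params:>12,d}")
print(f"total parameters: {total_params:,d}")
```

Under these assumptions, nearly all parameters sit in the first fully connected layer, so the input image size and the pooling configuration are the main levers for keeping such a model lightweight.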
3.2. Multi-Input CNN Model
The dual-input CNN model depicted in Figure 7 utilized input pairs such as {RP, GASF}, {RP, GADF}, and {GASF, GADF}. The features extracted from each image pair were combined in a merge layer using one of three operations: addition, concatenation, or averaging.
The addition layer takes multiple inputs of the same shape and computes their element-wise sum, generating a single output of that shape. The concatenation layer takes multiple inputs and joins them along one axis, generating a single output whose size along that axis is the sum of the input sizes. The average layer takes multiple inputs of the same shape and computes the element-wise average of the values at each position, generating a single output. All three layers are commonly used to combine multiple inputs or the outputs of specific layers.
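The three merge operations can be illustrated on plain arrays. The sketch below uses NumPy rather than a deep learning framework, and the function name `merge_features` and the example feature shapes are hypothetical; it shows only the arithmetic of each merge, not the authors' model code.

```python
import numpy as np

def merge_features(branches, mode="concat"):
    """Combine per-branch feature arrays of shape (batch, features).

    mode: "concat" joins along the last axis; "add" and "average" are
    element-wise and require all branches to have the same shape.
    """
    arrays = [np.asarray(b, dtype=float) for b in branches]
    if mode == "concat":
        return np.concatenate(arrays, axis=-1)
    if any(a.shape != arrays[0].shape for a in arrays):
        raise ValueError("add/average require branches of identical shape")
    if mode == "add":
        return np.sum(arrays, axis=0)
    if mode == "average":
        return np.mean(arrays, axis=0)
    raise ValueError(f"unknown merge mode: {mode}")

# Hypothetical features from two branches (e.g., an RP branch and a GADF branch).
rp_feat = np.ones((2, 4))
gadf_feat = 3.0 * np.ones((2, 4))

merged_concat = merge_features([rp_feat, gadf_feat], "concat")  # shape (2, 8)
merged_add = merge_features([rp_feat, gadf_feat], "add")        # shape (2, 4)
merged_avg = merge_features([rp_feat, gadf_feat], "average")    # shape (2, 4)
```

Concatenation preserves each branch's features separately at the cost of a wider input to the following dense layer, whereas addition and averaging keep the width fixed regardless of the number of branches.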
Both images were processed using convolution layers, max-pooling layers, and three fully connected layers for feature extraction and classification. The model performed binary classification, utilizing the sigmoid activation function and the binary cross-entropy loss function during training.
The triple-input CNN model described in Figure 8 utilized the {RP, GASF, GADF} image set as input. The structure of this model was similar to the dual-input CNN model but involved more inputs for feature extraction and classification.
4. Experimental Procedures
4.1. Experimental Configuration
The belt dataset used in the experiment comprised 130,000 normal and 372,000 abnormal samples, and the bearing dataset comprised 154,000 normal and 400,000 abnormal samples. The shaft dataset encompassed 194,000 normal and 728,000 abnormal samples, and the rotor dataset consisted of 1,334,000 normal and 458,000 abnormal samples. Each dataset was split into training, test, and validation sets at a ratio of 70:24:6. Experiments were conducted for each of the bearing, belt, shaft, and rotor datasets with 15 different configurations. These configurations included variations in input combinations (RP, GASF, and GADF) and model architectures (single-input CNN, concatenated multi-input CNN, addition multi-input CNN, and average multi-input CNN). We also explored image fusion using the single-input CNN.
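The dataset sizes and the 70:24:6 split translate into the following subset sizes. This is a small sketch in plain Python; it assumes the ratio is applied to the combined normal and abnormal total of each dataset, which the text does not state explicitly, and the helper name `split_counts` is hypothetical.

```python
# (normal, abnormal) sample counts as reported in the text.
DATASETS = {
    "belt": (130_000, 372_000),
    "bearing": (154_000, 400_000),
    "shaft": (194_000, 728_000),
    "rotor": (1_334_000, 458_000),
}

def split_counts(total, ratios=(70, 24, 6)):
    """Return (train, test, validation) sizes for a percentage split.

    Integer arithmetic is used; any rounding remainder goes to validation.
    """
    train = total * ratios[0] // 100
    test = total * ratios[1] // 100
    val = total - train - test
    return train, test, val

for name, (normal, abnormal) in DATASETS.items():
    total = normal + abnormal
    train, test, val = split_counts(total)
    abnormal_share = abnormal / total
    print(f"{name:8s} total={total:>9,d} train={train:>9,d} "
          f"test={test:>8,d} val={val:>7,d} abnormal={abnormal_share:.1%}")
```

The counts also show that abnormal samples form the majority class in the belt, bearing, and shaft datasets and the minority class in the rotor dataset, which is worth keeping in mind when reading accuracy alongside precision and recall.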
For optimization, the Adam optimizer was chosen, and the initial learning rate was set to 0.001. The learning rate was fine-tuned through experimentation. Model performance was evaluated based on accuracy and binary cross-entropy loss. To evaluate the generalization performance of the model, we used early stopping and cross-validation.
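The loss and the stopping rule mentioned above can be sketched in plain Python. The binary cross-entropy follows its standard definition; the early-stopping rule monitors validation loss with a `patience` of 3 epochs, which is a hypothetical value since the text does not report the stopping criterion, and the validation losses in the example are made up for illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs; if that never happens,
    the last epoch index is returned.
    """
    best = math.inf
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Illustrative validation losses (not results from the study).
history = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop_at = early_stopping_epoch(history, patience=3)
```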
4.2. Experimental Results
For performance evaluation, standard metrics such as accuracy, precision, recall, and F1 score were utilized. Confusion matrices were generated to examine the fault classification performance of each model in detail.
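For reference, the four metrics can be computed directly from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the reported results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives, and
    true negatives. Undefined ratios (zero denominator) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
```

Because the datasets are imbalanced, precision, recall, and F1 are more informative than accuracy alone when comparing configurations.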
4.2.1. Bearing
A comparison of the results in Table 3 shows that the concatenated multi-input (GASF-GADF) and average multi-input (RP-GADF) models had superior performance compared to other configurations. These models achieved 100% accuracy by correctly classifying all faults, and they also exhibited the highest precision, recall, and F1 scores. This indicates that multi-input models that merge features by concatenation or averaging achieved the most effective fault classification in the case of bearings.
Figure 9 shows the loss and accuracy of the RP-GADF model over the training epochs, and Figure 10 shows the confusion matrix results.
4.2.2. Belt
As shown in the comparison results in Table 4, the concatenated multi-input (GASF-GADF) model exhibited superior performance compared to other configurations. This model had the best performance in terms of precision, recall, and F1 score.
Figure 11 shows the loss and accuracy results of the model using the GASF-GADF method, and Figure 12 shows the confusion matrix results.
4.2.3. Shaft
As shown in the comparison results in Table 5, the concatenated multi-input (GASF-GADF) model demonstrated superior performance compared to other configurations. This model exhibited the best performance in terms of accuracy, precision, and F1 score; that is, it outperformed the other configurations in predicting shaft faults on every evaluation metric except recall.
Figure 13 shows the loss and accuracy results over the training epochs for the best model, and Figure 14 shows the confusion matrix.
4.2.4. Rotor
As shown in the comparison results in Table 6, the average multi-input (RP-GADF) model demonstrated superior performance compared to other configurations. This model achieved the highest accuracy and maintained top performance in precision, recall, and F1 score.
Figure 15 shows the loss and accuracy results of the average RP-GADF method, and Figure 16 shows the confusion matrix results.
5. Discussion
In this study, we introduced a novel fault diagnosis model leveraging deep learning techniques, emphasizing feature extraction and image transformation to enhance performance. This model demonstrates significant potential in identifying and diagnosing faults in mechanical systems, offering a promising tool for predictive maintenance and operational efficiency.
However, the effectiveness of our proposed model is contingent upon the availability of substantial training data. This dependency poses a notable challenge, as acquiring a comprehensive dataset, particularly encompassing rare fault conditions or outliers, is inherently difficult in real-world scenarios. Such scarcity of fault data often leads to a class imbalance problem, which can skew the model’s learning process and potentially compromise its diagnostic accuracy.
To address these challenges, we suggest a two-pronged approach. Firstly, the integration of simulation-based methods [24,25] into the fault diagnosis process presents a viable solution. By utilizing simulated data, we can artificially augment the dataset with a wider range of fault conditions, including those not commonly encountered in real-world operations. This approach not only helps in balancing the class distribution but also enriches the model’s learning experience, potentially enhancing its diagnostic capabilities.
Secondly, the concept of continuous learning [26,27] in mechanical facilities and equipment is crucial. In dynamic industrial environments, where operating conditions and machine behaviors can evolve, the ability of a diagnostic model to adapt and learn continuously is paramount. This could be achieved through techniques like transfer learning, where a pretrained model is fine-tuned with new data, allowing it to adapt to new or changing fault patterns without the need for retraining from scratch.
Another critical aspect that warrants further research is the optimization of AI models for industrial applications. The current size and computational requirements of sophisticated deep learning models pose a challenge for their deployment in embedded systems commonly used in industrial settings. Therefore, research focused on reducing the computational footprint of these models, without compromising their performance, is essential [28]. This could involve techniques like model pruning, quantization, or the development of more efficient neural network architectures.
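As a rough illustration of two of the techniques named above, the following NumPy sketch applies magnitude-based pruning and symmetric per-tensor int8 quantization to a weight array. These are generic textbook forms chosen for illustration, not methods evaluated in this study; the function names, the 50% sparsity level, and the example weights are hypothetical.

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out (approximately) the smallest-magnitude fraction of weights.

    Ties at the threshold are all pruned, so slightly more than the
    requested fraction may be removed.
    """
    w = np.asarray(w, dtype=np.float32)
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w).astype(np.float32)

def quantize_int8(w):
    """Symmetric per-tensor quantization to int8; returns (q, scale)."""
    w = np.asarray(w, dtype=np.float32)
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale

# Illustrative weights only.
weights = np.array([-1.0, -0.1, 0.05, 0.5], dtype=np.float32)
pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Storing int8 codes plus one scale per tensor reduces weight storage to roughly a quarter of float32, at the cost of a rounding error of at most half a quantization step per weight.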
In conclusion, while our proposed model shows promising results in fault diagnosis, its practical application is subject to overcoming challenges related to data availability, continuous learning, and model optimization for industrial deployment. Future research in these areas is not only necessary but will also significantly contribute to the advancement and practical utility of AI-driven fault diagnosis in the industrial sector.