Article

A New Cross-Domain Motor Fault Diagnosis Method Based on Bimodal Inputs

1 Key Laboratory of High Performance Ship Technology, Ministry of Education, Wuhan University of Technology, Wuhan 430063, China
2 School of Naval Architecture, Ocean and Energy Power Engineering, Wuhan University of Technology, Wuhan 430063, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(8), 1304; https://doi.org/10.3390/jmse12081304
Submission received: 6 July 2024 / Revised: 23 July 2024 / Accepted: 31 July 2024 / Published: 1 August 2024

Abstract:
Electric motors are indispensable electrical equipment in ships, with a wide range of applications. They can serve as auxiliary devices for propulsion, such as air compressors, anchor winches, and pumps, and are also used in propulsion systems; ensuring the safe and reliable operation of motors is crucial for ships. Existing deep learning methods typically target motors under a specific operating state and are susceptible to noise during feature extraction. To address these issues, this paper proposes a Resformer model based on bimodal input. First, vibration signals are transformed into time–frequency diagrams using continuous wavelet transform (CWT), and three-phase current signals are converted into Park vector modulus (PVM) signals through Park transformation. The time–frequency diagrams and PVM signals are then aligned in the time sequence to be used as bimodal input samples. The analysis of time–frequency images and PVM signals indicates that the same fault condition under different loads but at the same speed exhibits certain similarities. Therefore, data from the same fault condition under different loads but at the same speed are combined for cross-domain motor fault diagnosis. The proposed Resformer model combines the powerful spatial feature extraction capabilities of the Swin-t model with the excellent fine feature extraction and efficient training performance of the ResNet model. Experimental results show that the Resformer model can effectively diagnose cross-domain motor faults and maintains performance even under different noise conditions. Compared with single-modal models (VGG-11, ResNet, ResNeXt, and Swin-t), dual-modal models (MLP-Transformer and LSTM-Transformer), and other large models (Swin-s, Swin-b, and VGG-19), the Resformer model exhibits superior overall performance. This validates the method’s effectiveness and accuracy in the intelligent recognition of common cross-domain motor faults.

1. Introduction

In recent years, the problems of energy shortages and environmental pollution have become increasingly serious, which has accelerated the development of renewable energies [1,2,3,4]. International shipping is expected to be one of the fastest-growing sectors regarding greenhouse gas emissions due to the continued growth of global maritime trade [5]. In response to the challenges posed by energy scarcity and greenhouse gases in recent years, the international shipping industry has advocated for a transition to a low-carbon sector. Although the International Maritime Organization (IMO)’s proposed measures to enhance energy efficiency and reduce emissions—including research into alternative marine fuels, slow steaming, predictive maintenance, and additional emission-reduction technologies [6]—are laudable, they alone are not sufficient to address the sector’s sustainability challenges. One approach to improving the efficiency of ship propulsion systems and reducing pollutant emissions is to use electricity as the primary energy source [7,8,9]. To ameliorate the aforementioned situation, the transition from traditional internal combustion engines to electric motors is gradually being implemented in the propulsion systems of ships. Asynchronous motors, crucial components of electrical equipment aboard ships, have a diverse range of applications. They can drive auxiliary equipment such as air compressors, anchor winches, and water pumps, as well as propulsion systems. Malfunctions in these motors can not only cause damage to the motors themselves but also result in significant equipment and economic losses. Additionally, these potential losses can adversely affect the ship’s electrical system [10]. In practical applications, the working environment of asynchronous motors on ships is complex. These motors often operate under the harsh conditions of elevated temperature and humidity and frequently encounter issues such as overloading. Thus, the probability of motor faults is quite high. In the early stages of a motor fault, the motor’s operation is generally not affected, but its lifespan is significantly reduced, potentially leading to irreversible consequences. Motor fault diagnosis technology can detect faults in their early stages, enabling timely and targeted maintenance. This approach saves substantial time and money on repairs, prevents equipment downtime, and enhances economic efficiency [11]. Therefore, it is crucially important to examine the common faults of asynchronous motors and to investigate the corresponding fault diagnostic techniques to improve the reliability of equipment operations, minimize faulty shutdowns, and decrease maintenance costs.
During the operation of shipboard motors, the working environment is complex and often subject to noise interference. To address this issue, many scholars have proposed various methods to mitigate noise interference, which can be broadly classified into two categories. The first category reduces noise interference by processing the signals themselves. In reference [12], Chegini et al. leverage the advantages of wavelet transform in noise reduction and propose a new diagnostic approach. In reference [13], Abdelkader et al. propose a method that combines empirical mode decomposition and threshold optimization. In reference [14], Wang and Chu create a new indicator for selecting the optimal filtering band, making fault signals more prominent. The second category fuses data from multiple sensors to exploit their complementary information and reduce noise interference. In reference [15], Choudhary et al. fuse the features of vibration signals and sound signals, proposing an acoustical fusion technique that enhances the accuracy and noise resistance of motor fault diagnosis. In reference [16], Ren et al. present a fault diagnosis method based on motor speed and kurtosis spectrum analysis. This method facilitates the extraction of impact components and improves the signal-to-noise ratio. Several studies [17,18,19] have demonstrated that the use of disparate signals can augment the information on various fault characteristics to a certain extent; consequently, models with multi-modal inputs often perform better even without data denoising. Vibration signals are the most widely used signals in motor monitoring, while current signals offer advantages such as ease of acquisition, high accuracy, low noise, and convenience in detection [20]. Therefore, this paper chooses vibration signals and current signals as multi-modal inputs to build the model.
In the last few years, deep learning has been used to provide end-to-end solutions for diagnosing complex mechanical equipment [21]. Hinton et al. [22] were the first to introduce the concept of deep learning, making it a mainstream research area. As a branch of machine learning, deep learning enables adaptive feature extraction, overcoming the uncertainties associated with human factors in traditional machine learning. It effectively uncovers hidden features in high-dimensional data, thereby improving the accuracy of fault diagnosis [23]. Intelligent diagnostic technology based on visual images has been widely used in fault detection, as image samples can reveal fault information across different dimensions. There are many methods to convert one-dimensional signals into two-dimensional images, such as the Short-Time Fourier Transform (STFT), wavelet transform (WVT), Gramian Angular Fields (GAFs), and Symmetric Dot Patterns (SDPs). Xin et al. [24] used the logarithmic STFT method to convert bearing data into time–frequency images with clear physical significance. Jia et al. [25] obtained wavelet time–frequency images of signals based on WVT and combined them with stacked denoising autoencoders to achieve sensitive fault feature extraction from bearing vibration signals under strong noise. Some scholars have used Gramian Angular Fields [26] and Symmetric Dot Patterns [27] to generate two-dimensional images with multi-dimensional information. They classified these images based on the distinguishability of different fault images, achieving excellent fault diagnosis results.
As deep learning is developed and applied in fault diagnosis, more and more models are being developed for different problems. Among these, convolutional neural networks (CNNs) are a prominent representation of deep learning in fault diagnosis. Jia-Ling Xie et al. [28] have developed a deep learning model for fault diagnosis, utilizing the feature extraction capabilities of ResNet and the temporal processing abilities of bidirectional long short-term memory (BiLSTM) networks. The purpose of the model is to diagnose high-resistance connection faults in the electrical propulsion systems of ships. Yong Zhu et al. [29] employed a hybrid approach, integrating VGG and long short-term memory (LSTM) models, to intelligently detect typical faults in hydraulic axial piston pumps. In recent years, the development of visual Transformer models has challenged the dominance of CNN models. The Transformer model [30] is a neural network based on the self-attention mechanism. Unlike traditional CNN models that can only capture local features, the self-attention mechanism used by the Transformer model can grasp the global features of an image and the relationships between different input features. This has led to superior performance compared to CNNs in many fields [31,32,33].
It is important to acknowledge that these methods face challenges related to generalizability and scalability when used for motor fault diagnosis. They are often only applicable to certain types of motors and are difficult to implement on a large scale. For example, the aforementioned methods, which include signal denoising in deep learning, the introduction of attention mechanisms, deepening network structures, and building large models, are effective but often come with high time and computational costs. They require substantial computing resources and processing time, which restricts their application in real-time and online scenarios. Additionally, due to the similar vibration effects caused by motor voltage imbalance and certain mechanical faults, it is challenging to distinguish these faults using a single vibration signal. Therefore, designing an efficient, reliable, and widely applicable motor fault diagnosis method remains a topic that requires further research.
To address these limitations, a method for motor fault diagnosis under variable operating conditions is proposed. The proposed hybrid model employs several innovative techniques to tackle these challenges. First, vibration signals and current signals are collected from the motor under different loads and fault conditions. The one-dimensional vibration signals are then converted into two-dimensional image signals using continuous wavelet transform (CWT) [34]. The Park transformation of three-phase currents [35] has been successfully applied to rotor faults, stator inter-turn insulation faults, power unbalance, inter-turn short circuits, and load imbalances [36]. These signal-processing techniques can reduce the impact of noise to some extent. Subsequently, these data are fed into the model. The current signal is a one-dimensional time-series signal, while the image obtained from the vibration signal after continuous wavelet transform (CWT) is a two-dimensional time–frequency representation. The dimensions and modalities of the current signal and vibration signal are different. To align data from different dimensions and modalities, a dual-model architecture was employed. This architecture combines Swin-t with ResNet and then utilizes Multi-Layer Perceptron (MLP) layers to fuse features from the fault data. The hybrid model was validated using data contaminated with various levels of noise, demonstrating higher robustness, generalization, and accuracy compared to standalone convolutional neural networks or Transformer architectures.
The primary innovations of this study encompass three main aspects:
(1) By inputting time–frequency images obtained from continuous wavelet transform (CWT) of vibration signals and PVM signals from the Park transformation of three-phase current signals into the model, this method fully extracts fault features from different modalities. It leverages the raw sensor data, eliminating the need for complex and time-consuming signal denoising and modulation steps. Additionally, the bimodal input allows the model to access a richer set of fault information, thereby enhancing its noise resistance and classification capabilities.
(2) For motor faults under different loads, this paper proposes a hybrid model based on ResNet and Swin-t, referred to as Resformer. This integration improves feature extraction capabilities, mitigates overfitting, and facilitates fault diagnosis across diverse operating conditions. Experimental evidence shows that, even without data noise processing, the Resformer model exhibits greater robustness and generalization capabilities compared to models with single-modality input and other bimodal input models.
(3) By adding noise data of varying degrees and comparing the performance of different network models under noise interference, it was found that the Resformer model possesses better fault separation capabilities, stronger anti-interference abilities, and fewer parameters. By reducing the number and complexity of model parameters, the Resformer model achieves lower training costs, faster inference speeds, and reduced storage requirements. The construction of this model provides theoretical support for the real-time diagnostics of motors under different load conditions.
The rest of the paper has the following structure: In Section 2, the principles of relevant works are presented. Section 3 elaborates on the implementation process of intelligent fault diagnosis methods. Section 4 outlines the process of collecting experimental data and constructing fault samples. Section 5 conducts comparative experiments on the models and analyzes the results. Section 6 concludes with remarks and provides a perspective on future research directions.

2. Basic Theory

2.1. Continuous Wavelet Transform

The continuous wavelet transform (CWT) has significant advantages in handling the nonlinear and non-stationary vibration signals of marine machinery. It ensures appropriate frequency resolution while maintaining time resolution, offering high adaptability and resolution. Motor vibration signals typically involve changes in both time and frequency, making CWT capable of providing a more comprehensive description of motor vibration characteristics. The wavelet transform method represents the signal through wavelet functions. Assuming $\psi(t) \in L^2(\mathbb{R})$, whose Fourier transform $\hat{\psi}(\omega)$ satisfies $\hat{\psi}(0) = 0$, the wavelet family $\psi_{a,b}(t)$ is obtained from the basic wavelet by scaling and translation:
$$\psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right) \qquad (1)$$
In the equation, $\psi_{a,b}(t)$ is a continuous wavelet; $\psi$ is the basic wavelet; $a$ and $b$ are the scaling factor and translation factor, with $a, b \in \mathbb{R}$ and $a \neq 0$.
The continuous wavelet transform of any function $f(t) \in L^2(\mathbb{R})$ with respect to the wavelet family $\psi_{a,b}(t)$ is
$$W_f(a,b) = \langle f, \psi_{a,b} \rangle = |a|^{-1/2} \int_{-\infty}^{+\infty} f(t)\,\overline{\psi}\!\left(\frac{t-b}{a}\right) dt \qquad (2)$$
In the formula, $\overline{\psi}$ denotes the complex conjugate of $\psi$ and $\langle f, \psi_{a,b} \rangle$ is the inner product.
The Morse wavelet is selected as the basis function due to its superior time–frequency resolution compared to the Morlet wavelet [37].
The generalized Morse wavelet is defined in the time and frequency domains as
$$\psi(\beta,\gamma,t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \Psi(\beta,\gamma,\omega)\, e^{j\omega t}\, d\omega \qquad (3)$$
$$\Psi(\beta,\gamma,\omega) = a(\beta,\gamma)\,\omega^{\beta} e^{-\omega^{\gamma}} \cdot \begin{cases} 1 & \omega > 0 \\ \tfrac{1}{2} & \omega = 0 \\ 0 & \omega < 0 \end{cases} \qquad (4)$$
In the formula, $a(\beta,\gamma)$ is a constant; $\omega$ is the frequency parameter that controls the frequency of the wavelet function; and $\beta$ and $\gamma$ are parameters that control the attenuation of the wavelet in the time and frequency domains, respectively.
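For readers who wish to reproduce this step, the sketch below shows how a 768-point vibration segment can be converted into a time–frequency image in Python. It is a minimal illustration only: PyWavelets is used with a complex Morlet wavelet as a stand-in, since the generalized Morse wavelet used in this paper is not available there (it is, for example, the default of MATLAB's cwt); the scale range and the placeholder signal are hypothetical choices.

```python
# Minimal sketch of turning a 1-D vibration segment into a time-frequency image.
# Assumptions: PyWavelets with 'cmor1.5-1.0' (complex Morlet) standing in for the
# generalized Morse wavelet, an arbitrary scale range, and a placeholder signal.
import numpy as np
import pywt
import matplotlib.pyplot as plt

fs = 25_600                      # sampling frequency used in the experiments (Hz)
t = np.arange(768) / fs          # one moving-window segment (768 points)
x = np.sin(2 * np.pi * 50 * t) + 0.2 * np.random.randn(t.size)  # placeholder signal

scales = np.arange(1, 128)                       # hypothetical scale range
coeffs, freqs = pywt.cwt(x, scales, 'cmor1.5-1.0', sampling_period=1 / fs)

plt.pcolormesh(t, freqs, np.abs(coeffs), shading='auto')  # scalogram |W_f(a, b)|
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.savefig('cwt_sample.png', dpi=150)           # saved as a 2-D image for the model
```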

2.2. Park Transformation

For electrical faults in asynchronous motors, the Park transformation is a commonly used method. The performance of a motor during operation is typically described by its current and voltage equations over time. Influenced by external environmental factors, the magnitude of current, magnetic flux, and voltage in the motor constantly changes, making the motor’s operating model very complex. The Park vector transformation converts the currents in a three-phase coordinate system into two orthogonal components in a rectangular coordinate system. This conversion simplifies the differential equations used to establish the electromagnetic relationships in the rotor circuit by making the coefficient matrix a constant rather than one that varies with time and space. This greatly simplifies the analysis of the motor’s electromagnetic relationships. The transformation has been successfully applied to rotor faults, stator inter-turn insulation faults, power imbalance, inter-turn short circuits, and load imbalances [38,39].
The transformation formulas for the Park transform are as follows:
$$\begin{bmatrix} i_d \\ i_q \end{bmatrix} = \sqrt{\frac{2}{3}} \cdot \begin{bmatrix} 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & \frac{\sqrt{3}}{2} & -\frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} i_A \\ i_B \\ i_C \end{bmatrix} \qquad (5)$$
Under ideal conditions, a three-phase current will generate a vector containing the following components:
$$i_d = \frac{\sqrt{6}}{2} I_M \sin(\omega t), \qquad i_q = \frac{\sqrt{6}}{2} I_M \sin\!\left(\omega t - \frac{\pi}{2}\right) \qquad (6)$$
In these equations, $i_d$ and $i_q$ represent the currents equivalent to the direct and quadrature axes; $i_A$, $i_B$, and $i_C$ are the three-phase currents; $I_M$ is the maximum current of the supply phase; and $\omega$ represents the supply frequency.
In addition, to simplify the input data volume, the two current components are reduced to a single component of the Park vector modulus (PVM), as given by Equation (7).
$$\mathrm{PVM} = \sqrt{i_d^2 + i_q^2} \qquad (7)$$
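A minimal NumPy sketch of Equations (5)–(7) is given below; the phase-current arrays and supply parameters are placeholders, and the helper name park_pvm is introduced here only for illustration.

```python
# Minimal sketch of the Park transformation and PVM computation (Equations (5)-(7)),
# assuming i_a, i_b, i_c are equal-length NumPy arrays of the three phase currents.
import numpy as np

def park_pvm(i_a, i_b, i_c):
    """Return i_d, i_q, and the Park vector modulus for three-phase currents."""
    i_d = np.sqrt(2 / 3) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_q = np.sqrt(2 / 3) * (np.sqrt(3) / 2) * (i_b - i_c)
    pvm = np.sqrt(i_d ** 2 + i_q ** 2)
    return i_d, i_q, pvm

# Balanced three-phase example: the PVM is (ideally) constant, so fault-induced
# harmonics show up as ripple on top of this constant level.
t = np.arange(0, 0.1, 1 / 25_600)
I_M, w = 1.0, 2 * np.pi * 50
i_a = I_M * np.sin(w * t)
i_b = I_M * np.sin(w * t - 2 * np.pi / 3)
i_c = I_M * np.sin(w * t + 2 * np.pi / 3)
_, _, pvm = park_pvm(i_a, i_b, i_c)            # ~ sqrt(3/2) * I_M everywhere
```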

2.3. ResNet Network Model Structure

In typical deep learning network architectures, layers are directly connected, meaning that signals from upper layers are passed directly to lower layers. However, this connection method often leads to the loss of feature information. As the network depth increases, the amount of lost feature information also increases, further exacerbating the vanishing gradient problem. Consequently, training deep networks becomes more challenging, ultimately affecting classification accuracy. Residual neural networks (ResNets) were first proposed by Kaiming He et al. [40]. The main contribution of ResNets is the identification of the “degradation problem” and the invention of “shortcut connections” to address this issue. The residual modules in ResNets can solve the performance degradation problem caused by increased network depth, thereby improving the network’s performance.
The ResNet-18 network is primarily composed of stacked residual blocks, consisting of one convolutional layer, two pooling layers, one fully connected layer, and eight residual blocks (each block containing two convolutional layers). Figure 1a depicts the structure of the ResNet-18 network, while Figure 1b illustrates the structure of the residual block. In Figure 1, dashed lines represent changes in the number of channels, solid lines indicate no change in the number of channels, k represents the receptive field size of neurons in the convolutional layer, s represents the stride, and p represents the number of zero-padding at the borders.
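The following PyTorch sketch illustrates the basic residual block of Figure 1b. It assumes the standard ResNet-18 design with a 1 × 1 projection shortcut when the channel count or stride changes (the dashed connections in Figure 1); it is a generic illustration rather than the exact block used in this paper's one-dimensional current branch.

```python
# Minimal sketch of a ResNet basic residual block: two 3x3 convolutions plus a
# shortcut connection; a 1x1 convolution projects the shortcut when the feature
# map shape changes.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut only when the feature map shape changes
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))   # residual addition

# Example: (batch, 64, 56, 56) -> (batch, 128, 28, 28)
y = BasicBlock(64, 128, stride=2)(torch.randn(2, 64, 56, 56))
```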

2.4. Swin Transformer Model Structure

The Vision Transformer (ViT) [41] network, after being thoroughly pre-trained on large datasets such as ImageNet-21K and JFT-300M, has been transferred to smaller datasets. In numerous classification tasks, it has matched or even surpassed the performance of current state-of-the-art CNNs. The Swin-t [42] network further extends the applicability of the Vision Transformer (ViT) network. The Swin-t network demonstrates strong performance in recognition tasks such as image classification, object detection, and semantic segmentation. It achieves a good balance between computational efficiency and performance. The overall architecture of the Swin-t network is shown in Figure 2. An input RGB image of size H × W × 3 is divided into non-overlapping, equally sized patches using the Patch Partition module. Each patch is treated as a token, with its features being the concatenated RGB values of the image pixels. The Linear Embedding Block projects the tensor into an arbitrary dimension C. Subsequently, these tensors are fed into the Swin Transformer Blocks with enhanced self-attention mechanisms, as shown in Stage 1 (the first dashed box in Figure 2). As the network depth increases, the number of tokens is reduced through the Patch Merging Block. The Patch Merging Block concatenates each group of 2 × 2 adjacent patches, forming a 4C-dimensional concatenated feature. A linear layer is then applied to this concatenated feature, reducing the output to 2C. The features are then transformed using the Swin Transformer Block. The first Patch Merging Block and its corresponding feature transformation Swin Transformer Block are designated as Stage 2 (as shown in the second dashed box in Figure 2. Repeating the same process as in Stage 2 two more times results in Stage 3 and Stage 4 (as shown in the third and fourth dashed boxes in Figure 2, respectively). Finally, the Classification layer produces the final prediction vector.
The two successive Swin Transformer Blocks, as shown in Figure 3, include a shifted window-based self-attention module (W-MSA/SW-MSA), followed by a two-layer MLP with a nonlinear GELU [43] activation function. A LayerNorm layer is used between each multi-head self-attention (MSA) module and the MLP. Additionally, residual connections are applied after each MSA and MLP.
The Swin-t module incorporates a relative position bias $B \in \mathbb{R}^{M^2 \times M^2}$ when computing attention. The calculation formula is
$$\mathrm{Attention}(Q,K,V) = \mathrm{SoftMax}\!\left(QK^{T}/\sqrt{d} + B\right)V \qquad (8)$$
In the equation, Attention(·) represents the self-attention value; Q, K, and V are the Query matrix, Key matrix, and Value matrix, respectively; Softmax is the exponential normalization function; d is the input channel number; B is the bias matrix.
The MLP part can be represented as
$$Y = \sigma\big(\mathrm{Norm}(X)\,W_1\big)\,W_2 + X \qquad (9)$$
In the formula, $W_1 \in \mathbb{R}^{C \times rC}$ and $W_2 \in \mathbb{R}^{rC \times C}$ are the learnable parameters for the MLP with expansion ratio $r$; $\sigma(\cdot)$ is the nonlinear activation function GELU, with $X$ as the input parameter and $Y$ as the output parameter.
The product of self-attention weights and V represents the final output containing self-attention values. The overall output is obtained by merging and concatenating the outputs of multiple heads of self-attention. If z is the output of each layer, its calculation process can be represented by Equation (10).
$$\begin{aligned} \hat{z}^{l} &= F_{\text{W-MSA}}\big(F_{LN}(z^{l-1})\big) + z^{l-1} \\ z^{l} &= F_{MLP}\big(F_{LN}(\hat{z}^{l})\big) + \hat{z}^{l} \\ \hat{z}^{l+1} &= F_{\text{SW-MSA}}\big(F_{LN}(z^{l})\big) + z^{l} \\ z^{l+1} &= F_{MLP}\big(F_{LN}(\hat{z}^{l+1})\big) + \hat{z}^{l+1} \end{aligned} \qquad (10)$$
In this equation, $F_{\text{W-MSA}}$ is the modular processing function for the W-MSA layer; $F_{MLP}$ represents the modular processing function for the MLP module; $F_{LN}$ represents the modular processing function for the layer normalization layer; and $F_{\text{SW-MSA}}$ represents the modular processing function for the SW-MSA layer.
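The sketch below illustrates Equation (8), window-based self-attention with an additive relative position bias; the window size, head dimension, number of windows, and the zero-valued bias are placeholder values for illustration only.

```python
# Minimal sketch of Equation (8): scaled dot-product attention with an additive
# relative position bias B, as used inside each (S)W-MSA window.
import torch

def window_attention(Q, K, V, B):
    # Q, K, V: (num_windows, M*M, d); B: (M*M, M*M) relative position bias
    d = Q.shape[-1]
    attn = torch.softmax(Q @ K.transpose(-2, -1) / d ** 0.5 + B, dim=-1)
    return attn @ V

M, d = 7, 32                               # 7x7 window, 32-dim head (typical Swin-t values)
Q = torch.randn(4, M * M, d)               # 4 windows in this toy example
out = window_attention(Q, torch.randn_like(Q), torch.randn_like(Q),
                       torch.zeros(M * M, M * M))
```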

3. Intelligent Diagnostic Model for Multi-Modal Cross-Domain Motor

A cross-domain motor fault diagnosis model was developed based on the fusion of fault features from motor vibration signals and current signals, leveraging the advantages of Swin-t and ResNet in deep learning. The Swin-t demonstrates powerful capabilities in extracting spatial features, while ResNet excels in capturing subtle features and efficient training. Combining these strengths, the model is capable of handling complex data that contain both spatial and temporal information, enabling cross-domain motor fault diagnosis. To achieve this, a dual-channel network model was established to extract fault features from different modalities. Specifically, the Swin-t is used to extract fault features from two-dimensional time–frequency images, while ResNet is used to extract fault features from one-dimensional time-domain PVM signals. To effectively fuse and retain the visual and signal feature vectors, the model’s fusion layer utilizes a Multi-Layer Perceptron (MLP) model to merge the extracted feature vectors, ultimately outputting the fault type. The specific network structure is shown in Table 1.
The model architecture is illustrated in Figure 4. This model effectively retains fault features from both current and vibration signals through a dual-channel setup. Specifically, the current signal is dimensionally reduced using Park transformation to extract a one-dimensional PVM signal from the three-phase current, thereby reducing redundant data. On the other hand, the vibration signal undergoes continuous wavelet transform to be elevated to a two-dimensional time–frequency image, capturing more detailed fault features. The processed signals are then fed into the training model for feature extraction, aiming to enhance the accuracy of fault diagnosis.
For current signals, fault features mainly appear in the harmonic distribution, particularly in the local variations of the PVM signal. The ResNet architecture enhances network depth by incorporating residual blocks, allowing it to capture more complex features. Initially, the network performs preliminary feature extraction using one-dimensional convolutional pooling. Then, through four layers of residual structures, the learned features are directly added to the input features, capturing finer details. Finally, adaptive average pooling reduces the dimensionality of the feature vector, lowering computational complexity and parameter count. The resulting feature vector, with a length of 512, contains the extracted current fault features.
For vibration signals, fault features are prominent in both the time and frequency domains. We transform vibration signals into two-dimensional images to capture this rich information. The Swin-t architecture excels in this aspect. Initially, Patch Partition and Embedding layers reduce the image size and computational complexity while preserving spatial structural information. Next, features are extracted through four Transformer blocks, with the Patch Merging Layer performing feature aggregation and dimensionality reduction. The Swin Transformer Block Layer facilitates information exchange across windows, effectively integrating global information. Finally, global average pooling reduces the feature map dimensions, minimizing computational complexity and parameter count. The output is a vector of length 768, containing the extracted vibration fault features.
In the final fusion layer, we concatenate various fault features to retain the extracted information. A three-layer Multi-Layer Perceptron (MLP) with a Softmax activation function maps these features to classification results. The Softmax function normalizes each output value to a range between 0 and 1, ensuring the sum equals 1, representing the probability of each category. The category with the highest probability is selected as the predicted result for fault diagnosis. The model’s performance is assessed by computing the cross-entropy loss between the predicted and actual results and updating the model parameters through backpropagation to align the predicted probabilities with the actual labels.
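To make the dual-channel structure concrete, the following PyTorch sketch mirrors the description above at a high level. It is an assumption-laden illustration, not the released implementation: torchvision's swin_t stands in for the image branch, a small one-dimensional convolutional encoder stands in for the paper's one-dimensional ResNet on the PVM signal, and the hidden sizes of the MLP fusion head are hypothetical; only the 768- and 512-dimensional feature lengths follow the text.

```python
# Minimal sketch of the dual-channel fusion: a Swin-t branch for CWT images, a
# 1-D convolutional stand-in branch for PVM signals, and a three-layer MLP head.
import torch
import torch.nn as nn
from torchvision.models import swin_t

class Resformer(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.img_branch = swin_t(weights=None)
        self.img_branch.head = nn.Identity()        # -> 768-dim image features
        self.sig_branch = nn.Sequential(            # -> 512-dim PVM features
            nn.Conv1d(1, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(64, 512, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.fusion = nn.Sequential(                # hypothetical hidden sizes
            nn.Linear(768 + 512, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, img, pvm):
        feats = torch.cat([self.img_branch(img), self.sig_branch(pvm)], dim=1)
        return self.fusion(feats)                   # class logits (Softmax in the loss)

logits = Resformer()(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 768))
```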

4. Collection of Experimental Data and Construction of Fault Samples

4.1. Setup of Experimental Platform

In order to evaluate the effectiveness of the proposed method, a three-phase asynchronous motor experimental platform was constructed. The test bench used in this experiment is the integrated drivetrain fault diagnosis test bench designed by SpectraQuest (SpectraQuest, Shanghai, China). This test bench consists of a variable speed drive motor, a planetary gearbox, a parallel axis gearbox supported by rolling bearings, a variable load magnetic brake, and a signal acquisition system, as shown in Figure 5. The current sensor model used is SY-DF4D-AA5 (Suozheng, Nanjing, China), the voltage sensor model is SY-DF4D-AV450 (Suozheng, Nanjing, China), the current transformer model is LMZJ1-0.5 (Ouyi, Shanghai, China), and the vibration sensor model is HS-100 (Hansford Sensors, Shanghai, China). The variable speed drive motor is a marathon_d396a (Rayber Electric Group, Wuxi, China) type three-phase asynchronous motor. In addition to a normal motor, the test bench is equipped with motors for fault simulation, including one with voltage imbalance, one with phase loss, one with an inter-turn short circuit, one with rotor imbalance, and one with broken rotor bars. The fault categories set up for the experiment include eight types: normal (NM), rotor unbalance (RU), rotor misalignment (RM), rotor bow (RB), bearing defects (BD), broken bar rotor (BR), turn-to-turn short circuit (SC), and stator single-phase open (SP).
In the setup shown in Figure 5, the marathon_d396a type three-phase asynchronous motor is set to a constant speed of 1000 r/min. Steady-state data are collected under different loads, with the load torque set to 0 N · m , 50 N · m , and 100 N · m . The data used in the experiment include the vibration acceleration data from the motor drive end and the stator current data, both collected by sensors. The sampling frequency for both the current and vibration sensors is 25,600 Hz, and the data collection duration is 10 s. For each operating condition, 256,000 samples of motor current and vibration signals are collected to establish the experimental dataset for subsequent data processing.

4.2. Construction of Experimental Dataset

4.2.1. Fault Sample Feature Extraction

Fault data for different motor conditions were collected using the experimental platform. To increase the number of samples in the training set, data augmentation was performed using the moving window method. The window width was set to 768 sample points, and the window step size was set to 512 sample points. Specifically, each dataset’s samples from [i × 512, i × 512 + 768] generated a fault sample feature.
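A minimal sketch of this moving-window augmentation is shown below; the recording is a placeholder and the helper name moving_window is introduced only for illustration.

```python
# Minimal sketch of the moving-window augmentation described above: window width
# 768 points, step 512 points, applied to one 256,000-point recording.
import numpy as np

def moving_window(signal, width=768, step=512):
    """Split a 1-D recording into overlapping segments [i*step, i*step + width)."""
    n = (len(signal) - width) // step + 1
    return np.stack([signal[i * step: i * step + width] for i in range(n)])

recording = np.random.randn(256_000)        # placeholder for one 10 s recording
segments = moving_window(recording)         # shape: (n_segments, 768)
```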
Figure 6 and Figure 7 illustrate the processing of current and vibration signals for a normal motor and a motor with a stator single-phase open fault, respectively. The vibration signals are transformed into time–frequency spectrograms, while the current signals undergo Park transformation to generate PVM signals. The time–frequency spectrograms and PVM signals differ significantly under different fault conditions as shown in these figures. In the time–frequency spectrograms, these differences are represented by distinct frequency distributions, where the color distribution indicates frequency and the intensity of the color represents the amplitude. In the PVM signals, the differences manifest as variations in waveform harmonics and amplitude.

4.2.2. Fault Sample Data Analysis

To study the vibration and current signals under different loads, we explore whether these signals exhibit similar characteristics at the same speed but under different loads. In Figure 8, three different steady states are depicted, including the original vibration signals, CWT (continuous wavelet transform) images, and the amplitude–frequency plot of the vibration signals. The CWT images shown in Figure 8 are obtained using the moving window method to capture partial images, while the amplitude–frequency plot displays the frequency distribution of the entire time sequence of the original vibration signals. It can be clearly observed that different faulty motors are difficult to distinguish directly from the original vibration signals of the three operating conditions, and the original vibration signals also vary across different operating conditions. From the CWT images, differences can be seen between the time–frequency plots of the same load but with different fault states, with each fault type exhibiting distinct fault characteristics in the time–frequency plots. The time–frequency plots of the same motor state under different loads exhibit some degree of similarity, as further verified by the analysis of the amplitude–frequency plots. The amplitude–frequency plots illustrate the frequency distribution of motor vibration signals under different operating conditions. Based on the amplitude–frequency plots, it can be observed that the frequency distribution of motor vibration is approximately similar under different loads but in the same motor state. The variation lies in the magnitude of amplitude, which is manifested as changes in brightness and darkness in the CWT images.
Figure 9 shows the PVM signals of three-phase currents under three different steady states after Park transformation, as well as a portion of the PVM signals collected using the moving window method. To better analyze the PVM signals, we normalized the PVM signals under different fault conditions using the normal signal as the standard. As shown in Figure 9, the harmonic components of the current vary under different fault conditions, leading to different PVM signals for the motor’s three-phase currents under different faults. Notably, the harmonic distribution of the PVM signals shows significant differences. However, it is important to note that the PVM signals under different loads but the same fault condition exhibit similar characteristics.

4.2.3. Constructing Fault Sample Library

An analysis of the fault data reveals that the CWT images and PVM signals exhibit similar characteristics for different loads but the same motor state, while there are significant differences for the same load but different motor states. Therefore, in constructing fault samples, those with different loads but the same motor state are classified as one type of fault.
Using the moving window method, the 256,000 data points collected by the sensors over 10 s are segmented into multiple samples for data augmentation. After segmenting, the data samples undergo CWT and Park transformation to construct the fault sample library. Each motor state is sampled 400 times. Fault data of the same motor state under three different loads are classified as one category, resulting in 1200 fault samples per fault mode. With eight fault modes, the fault sample library contains a total of 9600 samples, which include 9600 CWT images and 9600 PVM signals. Finally, the samples are split 7:2:1 into training, validation, and testing sets. Details are provided in Table 2.
To further verify the model’s generalization capability, Gaussian white noise was added to the vibration signals to simulate noise interference. In real environments, noise is often a composite of many different sources. Assuming real noise is the sum of many random variables with different probability distributions, where each variable is independent, the Central Limit Theorem states that the normalized sum of these variables approaches a Gaussian distribution as the number of noise sources increases. Therefore, in noise simulation, Gaussian noise can be used to approximately simulate real noise. Since current signals are rarely affected by interference and can be modulated using filter circuits or differential circuits, or filtered out using wavelet decomposition to remove current harmonics [44], this paper only adds noise to the vibration signals of the fault samples, as vibration signals are more susceptible to noise. Gaussian white noise at different signal-to-noise ratios (30 dB, 20 dB, 13 dB, and 10 dB) was added to the vibration signals to construct different test samples [45,46,47,48].
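The following sketch shows one way to add Gaussian white noise at a prescribed SNR (in dB) to a vibration segment, as used to build the 30 dB, 20 dB, 13 dB, and 10 dB test sets; the segment itself is a placeholder.

```python
# Minimal sketch of adding Gaussian white noise at a prescribed signal-to-noise
# ratio (in dB) to a vibration segment.
import numpy as np

def add_awgn(x, snr_db):
    """Return x corrupted by white Gaussian noise at the given SNR (dB)."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return x + np.random.randn(x.size) * np.sqrt(p_noise)

x = np.random.randn(768)                     # placeholder vibration segment
noisy_sets = {snr: add_awgn(x, snr) for snr in (30, 20, 13, 10)}
```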

5. Results and Discussion

In order to validate the effectiveness of the proposed model, the constructed fault sample library was used as input to compare different algorithms. To analyze the impact of bimodal inputs on model performance, both single-model and dual-model comparison experiments were conducted. First, in the single-model experiments, vibration signals were used as input to compare and evaluate the performance of classical CNN models (VGG-11, ResNet-18, ResNet-50, and ResNeXt) as well as the newly proposed attention mechanism-based Swin Transformer model. Then, in the dual-model experiments, where bimodal data were used as input, the performance of the MLP-Transformer model, LSTM-Transformer model, and Resformer model was validated. In the end, Resformer’s performance was analyzed in detail in the comparative experiments between the large-scale model and the lightweight model. The analysis then focused on the performance of the Resformer model under different noise interferences.
The model parameters were set as follows: the batch size was 32, the number of epochs was 30 for single models and 100 for dual models, the SGD algorithm was chosen as the optimizer, cross-entropy served as the loss function, and the learning rate was set to 0.001. The deep learning framework used was PyTorch (version 2.2.1+cu118) and the programming language was Python. The hardware configuration included a 13th Gen Intel(R) Core(TM) i5-13600KF 3.50 GHz processor, Nvidia GeForce RTX 3070 GPU, and 32 GB RAM.
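A minimal training-loop sketch consistent with these settings is given below; it reuses the Resformer sketch from Section 3 and a synthetic DataLoader as placeholders, so only the optimizer, loss function, learning rate, batch size, and epoch count follow the settings stated above.

```python
# Minimal sketch of the stated training configuration: SGD, cross-entropy loss,
# learning rate 0.001, batch size 32, 100 epochs for the dual model. The data
# below are synthetic placeholders; `Resformer` is the sketch from Section 3.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Resformer().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

train_loader = DataLoader(                     # placeholder dataset of 64 samples
    TensorDataset(torch.randn(64, 3, 224, 224),   # CWT images
                  torch.randn(64, 1, 768),        # PVM segments
                  torch.randint(0, 8, (64,))),    # fault labels (8 classes)
    batch_size=32, shuffle=True)

for epoch in range(100):
    model.train()
    for img, pvm, label in train_loader:
        img, pvm, label = img.to(device), pvm.to(device), label.to(device)
        optimizer.zero_grad()
        loss = criterion(model(img, pvm), label)   # cross-entropy on the logits
        loss.backward()
        optimizer.step()
```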

5.1. Comparison Experiment Analysis of Single-Modal Models

To evaluate the performance of single models, classical CNN models and the Swin Transformer model were used for motor fault diagnosis on the same dataset. The input for all models consisted of two-dimensional time–frequency images under different motor fault conditions. The models were comprehensively evaluated based on the training process, accuracy, and generalization capabilities. It is noteworthy that to enhance the model’s generalization, regularization techniques (such as Dropout and image sample expansion) were employed during training. These regularization techniques were turned off during testing and validation. Consequently, in the early stages of iteration, the validation set accuracy may be higher than the training set accuracy. As training progresses, model parameters become more fixed, which improves performance on training data but may reduce generalization capability on test data.
The diagnostic results are shown in Table 3, which records the performance of each model on different datasets. The data in the table correspond to the best performance of each model on the test set when the training loss curve converges. In Figure 10, the training process of different single-modality models is illustrated. From the training accuracy and loss curves, it can be observed that all models achieve good performance on the training set, showing accurate recognition. Looking at the results on the validation set, Swin-t and VGG-11 demonstrate good performance, while ResNet-18, ResNet-50, and ResNeXt perform poorly. As the number of iterations increases, Swin-t and VGG-11 show a gradual improvement in accuracy and decreasing loss on the test set, performing well. However, for ResNet-18, ResNet-50, and ResNeXt, although they perform well on the training set, the performance on the validation set is the opposite. When the models converge on the training set, there is an oscillation in accuracy and loss on the validation set, failing to converge like the Swin-t and VGG-11 models.
To validate the generalization capabilities of different single-modality models, we tested the trained models on a test set with varying levels of noise added. The test results are shown in Figure 11. The results indicate that the Swin-t model outperforms other models in terms of performance and has fewer training parameters. However, under strong noise interference, the Swin-t model’s performance deteriorates. To address this issue, a dual-modality model is proposed to enhance the model’s robustness and generalization. The dual-modality model is based on the best-performing Swin-t model from the single-modality experiments.

5.2. Comparison Experiment Analysis of Double-Modal Models

Figure 12 shows the accuracy and error loss curves of the dual-model training results. It is evident that with the increase in iterations, the training accuracy of all three models gradually increases, with minor fluctuations. The validation accuracy also gradually increases but shows instability during the process. Figure 12 demonstrates that Resformer not only converges faster on the training set but also exhibits smaller fluctuations in both the validation accuracy and loss curves. The MLP-Transformer model achieves a high training accuracy smoothly over the training curve, yet its validation accuracy curve does not converge. In contrast, the validation accuracy curve for the LSTM-Transformer model, despite experiencing oscillations in the early iterations, eventually converges within a certain range.
Table 4 shows the diagnostic results, which record the performance of each model on different datasets. The data in the table correspond to the best performance of each model on the test set when the training loss curve converges. All three models demonstrate similar optimal values, achieving over 97% accuracy on both the training and test sets. Compared to the original Swin-t model, the dual models further reduce the training loss.
The results of fault diagnosis on the test set using confusion matrices for different models are visualized in Figure 13. It is evident that all three models achieve high recognition rates for various types of faults. Although the MLP-Transformer model also attains 97% accuracy on the test set, it misclassifies the normal operating state of the motor compared to the original Swin-t model. In contrast, both the LSTM-Transformer and Resformer models correctly identify the normal operating state without misclassification and improve the diagnostic accuracy for other fault types. Figure 13 indicates that Resformer outperforms LSTM-Transformer, demonstrating superior recognition capabilities for motor faults across different domains. To evaluate the noise resistance capabilities of the dual-model approach, we conducted tests on the trained model using a test set with varying levels of added noise. The experimental findings are illustrated in Figure 14. The findings reveal that while the LSTM-Transformer model and MLP-Transformer model exhibited improved performance on the training set compared to the original Swin-t model, they compromised noise resilience. However, integrating ResNet with the Swin-t model in the Resformer model not only yielded good performance on the training and validation sets but also showcased significantly stronger performance on the test set under different levels of noise pollution compared to the Swin-t model.

5.3. Performance Analysis of the Resformer Model

In this section, we compare the proposed Resformer model with large models (Swin-s and Swin-b) [42] under the Transformer architecture and classic large models (VGG) [49] under the CNN architecture. We also provide a detailed analysis of the Resformer model’s performance under various interference signals.
Table 5 shows the performance results. The Resformer model maintains high accuracy across all signal-to-noise ratio (SNR) conditions, achieving 88.54% even at 10 dB, significantly outperforming other models. This demonstrates its strong noise resistance and robustness. With 32.2 million parameters, the Resformer is more lightweight than most models, except for Swin-t, which has 19.6 million parameters. This means Resformer can maintain high performance while reducing storage and computational resource requirements. The proposed model requires longer training time due to its excellent generalization capabilities, preventing it from easily falling into local optima. In contrast, large models with many parameters may fit training data well but are more likely to memorize noise and outliers, leading to poorer test performance. In terms of runtime, “sample” refers to 1 s of vibration and current signal data collected by the sensor. The Resformer model has a runtime of 0.636 s per sample, making it faster, especially compared to Swin-s and Swin-b. This indicates that Resformer can process sensor data more quickly in practical applications, thereby enhancing real-time performance and response speed.
The confusion matrices for the Resformer model under various noise levels are presented in Figure 15. The fault state detection of the model is analyzed using the indicators presented in the literature [50], with the calculation results displayed in Table 6 and Table 7. The Resformer model demonstrates excellent performance at both low and high noise levels. It is worth noting that even under different noise conditions, the model does not produce false alarms for normal motor operation.
Precision, recall, and the F1 score, defined in Equations (11)–(13), are used to evaluate the diagnostic performance:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (11)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (12)$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (13)$$
where TP, FP, and FN denote the numbers of true-positive, false-positive, and false-negative samples, respectively. Evidently, a higher F1 score represents better diagnostic performance of the model.
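In practice these per-class indicators can be computed directly from the predicted and true labels, for example with scikit-learn, as sketched below with placeholder labels.

```python
# Minimal sketch of computing per-class precision, recall, and F1 score
# (Equations (11)-(13)) from predicted and true labels using scikit-learn.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0]                      # placeholder labels (8 classes in the paper)
y_pred = [0, 1, 2, 1, 1, 0]
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0)  # one value per fault class
```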
After the above analysis, the conclusions are as follows: The Resformer model exhibits high accuracy, robustness, and generalization capability across most fault types, demonstrating strong classification performance. Specifically, under low noise interference, the fault detection accuracy is 100% for faults such as normal, rotor unbalance, rotor misalignment, rotor bow, bearing defects, broken bar rotor, and stator single-phase open. This indicates that the model classifies these fault types with high accuracy, without omissions. However, as noise increases, the accuracy for detecting turn-to-turn short circuits and broken bar rotors gradually decreases. Under strong interference, the recognition rate for turn-to-turn short circuits is around 63%, and for broken bar rotors, it is approximately 85%. In the presence of significant noise, the model misclassifies some instances of broken bar rotors as rotor unbalance and some instances of turn-to-turn short circuits as rotor unbalance, rotor misalignment, or broken bar rotors. Further research and improvements can be made for these specific fault types. Overall, the Resformer model demonstrates strong classification capabilities for most fault types. Even under significant interference, it operates reliably without false alarms, effectively enhancing the accuracy of motor fault identification.

6. Conclusions

To achieve intelligent fault diagnosis for motors across different domains, a combined model based on ResNet and Swin-t, called the Resformer model, is proposed. This model leverages the excellent performance of Swin-t in image processing and the robust local feature extraction capabilities of ResNet, thereby enhancing the accuracy of diagnosing common motor faults. The main conclusions of this study are as follows:
(1) Using the CWT method, one-dimensional vibration signals are converted into two-dimensional time–frequency images, enriching the model’s input with more fault features. The Park transformation is utilized to convert three-phase current signals into PVM signals, simplifying the input parameters of the model. The constructed model performs feature extraction and fault classification directly on the raw signals, requiring minimal specialized knowledge and experience in signal processing. By analyzing time–frequency images and PVM signals under different loads, it was found that fault signals of the same state exhibit certain similarities across various loads. Consequently, fault signals of the same state under different loads are classified into a single fault category. After aligning the time–frequency images and PVM signals in the time domain, fault diagnosis samples are established. These samples enable cross-domain and bimodal input for fault diagnosis.
(2) In the analysis of single-modal models, vibration signals with different noise levels were used as the sample set to test the performance of various models. The results indicate that the Transformer model outperforms the classical CNNs. Although the VGG model and the Swin-t model exhibit similar performance across various tests, the Swin-t model requires one order of magnitude fewer training parameters than VGG-11. For the analysis of bimodal models, a comparison was made between the MLP-Transformer, LSTM-Transformer, and Resformer models. It was found that bimodal input can better extract features compared to unimodal input. However, for data with interference, both MLP-Transformer and LSTM-Transformer did not perform as well as Swin-t. In contrast, the Resformer model demonstrated superior performance over Swin-t. Specifically, the Resformer model’s accuracy was 1.3% higher under 30 dB noise interference, 1.78% higher under 20 dB noise interference, 9.48% higher under 13 dB noise interference, and 18.54% higher under 10 dB noise interference compared to the Swin-t model. By combining the strengths of the ResNet and Swin-t models, the Resformer model improved fault identification accuracy under noisy conditions, resulting in better robustness and generalization capability.
(3) By conducting comparative experiments on large models, we found that these models have more parameters than lightweight models. While this allows large models to better fit the training data, they are also more likely to memorize noise and outliers, resulting in poorer test performance. In cases where fault samples are imbalanced and insufficient, lightweight models with multi-modal inputs perform better. Compared to large models, the Resformer model has fewer training parameters and faster inference times. This means that in practical applications, Resformer can maintain high performance while reducing storage and computational resource demands. Additionally, it can process sensor data more quickly, enhancing real-time performance and response speed.
This approach can be explored for other rotating machinery like gearboxes, bearings, etc. Future research will focus on the following aspects:
(1) The impact of the Resformer model on motor fault identification under variable speed conditions.
(2) Exploring improved feature fusion methods to enable the Resformer model to converge more quickly and stably, while also enhancing its robustness and generalization capabilities.
(3) The effect of different sliding window sizes on the performance of the fusion diagnostic method.

Author Contributions

Methodology, Q.S.; Validation, T.J.; Investigation, T.J.; Writing—Original Draft Preparation, Q.S. and T.J.; Writing—Review and Editing, M.C.; Funding Acquisition, Q.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 52171275 and 51909200, and the National Key Research and Development Program of China, grant number 2019YFE0104600.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Huang, G.; Tang, Y.; Chen, X.; Chen, M.; Jiang, Y. A Comprehensive Review of Floating Solar Plants and Potentials for Offshore Applications. J. Mar. Sci. Eng. 2023, 11, 2064. [Google Scholar] [CrossRef]
  2. Chen, M.; Jiang, J.; Zhang, W.; Li, C.B.; Zhou, H.; Jiang, Y.; Sun, X. Study on Mooring Design of 15 MW Floating Wind Turbines in South China Sea. J. Mar. Sci. Eng. 2023, 12, 33. [Google Scholar] [CrossRef]
  3. Zhang, Z.; Kuang, L.; Zhao, Y.; Han, Z.; Zhou, D.; Tu, J.; Chen, M.; Ji, X. Numerical investigation of the aerodynamic and wake characteristics of a floating twin-rotor wind turbine under surge motion. Energy Convers. Manag. 2023, 283, 116957. [Google Scholar] [CrossRef]
  4. Chen, M.; Deng, J.; Yang, Y.; Zhou, H.; Tao, T.; Liu, S.; Sun, L.; Hua, L. Performance Analysis of a Floating Wind–Wave Power Generation Platform Based on the Frequency Domain Model. J. Mar. Sci. Eng. 2024, 12, 206. [Google Scholar] [CrossRef]
  5. Inal, O.B.; Charpentier, J.-F.; Deniz, C. Hybrid power and propulsion systems for ships: Current status and future challenges. Renew. Sustain. Energy Rev. 2022, 156, 111965. [Google Scholar] [CrossRef]
  6. Chen, M.; Chen, Y.; Li, T.; Tang, Y.; Ye, J.; Zhou, H.; Ouyang, M.; Zhang, X.; Shi, W.; Sun, X. Analysis of the wet-towing operation of a semi-submersible floating wind turbine using a single tugboat. Ocean Eng. 2024, 299. [Google Scholar] [CrossRef]
  7. Nuchturee, C.; Li, T.; Xia, H. Energy efficiency of integrated electric propulsion for ships—A review. Renew. Sustain. Energy Rev. 2020, 134, 110145. [Google Scholar] [CrossRef]
  8. Serra, P.; Fancello, G. Towards the IMO’s GHG Goals: A Critical Overview of the Perspectives and Challenges of the Main Options for Decarbonizing International Shipping. Sustainability 2020, 12, 3220. [Google Scholar] [CrossRef]
  9. Geertsma, R.D.; Negenborn, R.R.; Visser, K.; Hopman, J.J. Design and control of hybrid power and propulsion systems for smart ships: A review of developments. Appl. Energy 2017, 194, 30–54. [Google Scholar] [CrossRef]
  10. Choudhary, A.; Goyal, D.; Letha, S.S. Infrared Thermography-Based Fault Diagnosis of Induction Motor Bearings Using Machine Learning. IEEE Sens. J. 2021, 21, 1727–1734. [Google Scholar] [CrossRef]
  11. Xu, X.; Yan, X.; Yang, K.; Zhao, J.; Sheng, C.; Yuan, C. Review of condition monitoring and fault diagnosis for marine power systems. Transp. Saf. Environ. 2021, 3, 85–102. [Google Scholar] [CrossRef]
  12. Chegini, S.N.; Bagheri, A.; Najafi, F. Application of a new EWT-based denoising technique in bearing fault diagnosis. Measurement 2019, 144, 275–297. [Google Scholar] [CrossRef]
  13. Abdelkader, R.; Kaddour, A.; Bendiabdellah, A.; Derouiche, Z. Rolling Bearing Fault Diagnosis Based on an Improved Denoising Method Using the Complete Ensemble Empirical Mode Decomposition and the Optimized Thresholding Operation. IEEE Sens. J. 2018, 18, 7166–7172. [Google Scholar] [CrossRef]
  14. Wang, T.; Chu, F. Bearing fault diagnosis under time-varying rotational speed via the fault characteristic order (FCO) index based demodulation and the stepwise resampling in the fault phase angle (FPA) domain. ISA Trans. 2019, 94, 391–400. [Google Scholar] [CrossRef] [PubMed]
  15. Choudhary, A.; Mishra, R.K.; Fatima, S.; Panigrahi, B.K. Multi-input CNN based vibro-acoustic fusion for accurate fault diagnosis of induction motor. Eng. Appl. Artif. Intell. 2023, 120, 105872. [Google Scholar] [CrossRef]
  16. Ren, B.; Yang, M.; Chai, N.; Li, Y.; Xu, D. Fault Diagnosis of Motor Bearing Based on Speed Signal Kurtosis Spectrum Analysis. In Proceedings of the 2019 22nd International Conference on Electrical Machines and Systems (ICEMS), Harbin, China, 11–14 August 2019; pp. 1–6. [Google Scholar]
  17. AlShorman, O.; Alkahatni, F.; Masadeh, M.; Irfan, M.; Glowacz, A.; Althobiani, F.; Kozik, J.; Glowacz, W. Sounds and acoustic emission-based early fault diagnosis of induction motor: A review study. Adv. Mech. Eng. 2021, 13, 1687814021996915. [Google Scholar] [CrossRef]
  18. Zhou, Y.; Shang, Q.; Guan, C. Three-Phase Asynchronous Motor Fault Diagnosis Using Attention Mechanism and Hybrid CNN-MLP by Multi-Sensor Information. IEEE Access 2023, 11, 98402–98414. [Google Scholar] [CrossRef]
  19. Xie, F.; Li, G.; Hu, W.; Fan, Q.; Zhou, S. Intelligent Fault Diagnosis of Variable-Condition Motors Using a Dual-Mode Fusion Attention Residual. J. Mar. Sci. Eng. 2023, 11, 1385. [Google Scholar] [CrossRef]
  20. Bessam, B.; Menacer, A.; Boumehraz, M.; Cherif, H. Detection of broken rotor bar faults in induction motor at low load using neural network. ISA Trans. 2016, 64, 241–246. [Google Scholar] [CrossRef]
  21. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237. [Google Scholar] [CrossRef]
  22. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef] [PubMed]
  23. Wang, F.; Liu, R.; Hu, Q.; Chen, X. Cascade Convolutional Neural Network with Progressive Optimization for Motor Fault Diagnosis under Nonstationary Conditions. IEEE Trans. Ind. Inform. 2021, 17, 2511–2521. [Google Scholar] [CrossRef]
  24. Xin, G.; Li, Z.; Jia, L.; Zhong, Q.; Dong, H.; Hamzaoui, N.; Antoni, J. Fault Diagnosis of Wheelset Bearings in High-Speed Trains Using Logarithmic Short-Time Fourier Transform and Modified Self-Calibrated Residual Network. IEEE Trans. Ind. Inform. 2022, 18, 7285–7295. [Google Scholar] [CrossRef]
  25. Jia, N.; Cheng, Y.; Liu, Y.; Tian, Y. Intelligent Fault Diagnosis of Rotating Machines Based on Wavelet Time-Frequency Diagram and Optimized Stacked Denoising Auto-Encoder. IEEE Sens. J. 2022, 22, 17139–17150. [Google Scholar] [CrossRef]
  26. Tang, H.; Liao, Z.; Chen, P.; Zuo, D.; Yi, S. A Novel Convolutional Neural Network for Low-Speed Structural Fault Diagnosis Under Different Operating Condition and Its Understanding via Visualization. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  27. Sun, Y.; Li, S.; Wang, X.J.M. Bearing fault diagnosis based on EMD and improved Chebyshev distance in SDP image. Measurement 2021, 176, 109100. [Google Scholar] [CrossRef]
  28. Xie, J.-L.; Shi, W.-F.; Xue, T.; Liu, Y.-H. High-Resistance Connection Fault Diagnosis in Ship Electric Propulsion System Using Res-CBDNN. J. Mar. Sci. Eng. 2024, 12, 583. [Google Scholar] [CrossRef]
  29. Zhu, Y.; Su, H.; Tang, S.; Zhang, S.; Zhou, T.; Wang, J. A Novel Fault Diagnosis Method Based on SWT and VGG-LSTM Model for Hydraulic Axial Piston Pump. J. Mar. Sci. Eng. 2023, 11, 594. [Google Scholar] [CrossRef]
  30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  31. Jin, Y.; Hou, L.; Chen, Y. A new rotating machinery fault diagnosis method based on the Time Series Transformer. arXiv 2021, arXiv:2108.12562. [Google Scholar]
  32. Wu, H.; Triebe, M.J.; Sutherland, J.W. A transformer-based approach for novel fault detection and fault classification/diagnosis in manufacturing: A rotary system application. J. Manuf. Syst. 2023, 67, 439–452. [Google Scholar] [CrossRef]
  33. Ding, Y.; Jia, M.; Miao, Q.; Cao, Y. A novel time–frequency Transformer based on self–attention mechanism and its application in fault diagnosis of rolling bearings. Mech. Syst. Signal Process. 2022, 168, 108616. [Google Scholar] [CrossRef]
  34. Yan, R.; Gao, R.X.; Chen, X. Wavelets for fault diagnosis of rotary machines: A review with applications. Signal Process. 2014, 96, 1–15. [Google Scholar] [CrossRef]
  35. Das, S.; Purkait, P.; Dey, D.; Chakravorti, S. Monitoring of inter-turn insulation failure in induction motor using advanced signal and data processing tools. IEEE Trans. Dielectr. Electr. Insul. 2011, 18, 1599–1608. [Google Scholar] [CrossRef]
  36. Sonje, D.M.; Kundu, P.; Chowdhury, A. A Novel Approach for Sensitive Inter-turn Fault Detection in Induction Motor under Various Operating Conditions. Arab. J. Sci. Eng. 2019, 44, 6887–6900. [Google Scholar] [CrossRef]
  37. Lilly, J.M.; Olhede, S.C. Higher-Order Properties of Analytic Wavelets. IEEE Trans. Signal Process. 2009, 57, 146–160. [Google Scholar] [CrossRef]
  38. Douglas, H.; Pillay, P.; Barendse, P. The detection of interturn stator faults in doubly-fed induction generators. In Proceedings of the Fourtieth IAS Annual Meeting. Conference Record of the 2005 Industry Applications Conference, Hong Kong, China, 2–6 October 2005; pp. 1097–1102. [Google Scholar]
  39. Sonje, D.M.; Chowdhury, A.; Kundu, P. Fault diagnosis of induction motor using parks vector approach. In Proceedings of the 2014 International Conference on Advances in Electrical Engineering (ICAEE), Vellore, India, 9–11 January 2014; pp. 1–4. [Google Scholar]
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  41. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  42. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
  43. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar]
  44. Cano, A.; Arévalo, P.; Benavides, D.; Jurado, F. Integrating discrete wavelet transform with neural networks and machine learning for fault detection in microgrids. Int. J. Electr. Power Energy Syst. 2024, 155, 109616. [Google Scholar] [CrossRef]
  45. Han, H.; Wang, H.; Liu, Z.; Wang, J. Intelligent vibration signal denoising method based on non-local fully convolutional neural network for rolling bearings. ISA Trans. 2022, 122, 13–23. [Google Scholar] [CrossRef]
  46. Li, Y.; Cheng, G.; Liu, C. Research on bearing fault diagnosis based on spectrum characteristics under strong noise interference. Measurement 2021, 169, 108509. [Google Scholar] [CrossRef]
  47. Zhao, D.; Cui, L.; Liu, D. Bearing Weak Fault Feature Extraction Under Time-Varying Speed Conditions Based on Frequency Matching Demodulation Transform. IEEE/ASME Trans. Mechatron. 2023, 28, 1627–1637. [Google Scholar] [CrossRef]
  48. Zhao, D.; Cai, W.; Cui, L. Adaptive thresholding and coordinate attention-based tree-inspired network for aero-engine bearing health monitoring under strong noise. Adv. Eng. Inform. 2024, 61, 102559. [Google Scholar] [CrossRef]
  49. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  50. Jia, L.; Chow, T.W.S.; Yuan, Y. GTFE-Net: A Gramian Time Frequency Enhancement CNN for bearing fault diagnosis. Eng. Appl. Artif. Intell. 2023, 119, 105794. [Google Scholar] [CrossRef]
Figure 1. (a) ResNet-18 network structure; (b) residual block structure.
Figure 2. Swin-t model structure.
Figure 3. Two successive Swin Transformer Blocks.
Figure 4. Resformer model structure.
Figure 5. The experimental platform. (a) Motor test bench; (b) digital data acquisition system.
Figure 6. Normal motor data processing procedure.
Figure 7. Stator single-phase open fault motor data processing procedure.
Figure 8. Typical vibration signal diagram, CWT images, and FFT images of the motor under different loads in a steady state.
Figure 9. Typical PVM images and sample plots of the motor under different loads in a steady state.
Figure 10. The training processes of the various single-modal models.
Figure 11. Accuracy of different single-modal models under noise interference.
Figure 12. The training processes of the various double-modal models.
Figure 13. Confusion matrix of different models. (a) Swin-t model. (b) MLP-Transformer model. (c) LSTM-Transformer model. (d) Resformer model.
Figure 14. Accuracy of Swin-t and different double-modal models under noise interference.
Figure 15. The confusion matrix for the Resformer model under different noise impacts. (a) 30 dB; (b) 20 dB; (c) 13 dB; (d) 10 dB.
Table 1. Structure parameters of Resformer network.

| Network Layer | Swin-t Branch: Output Dimension | Swin-t Branch: Layer Configuration | ResNet-18 Branch: Output Dimension | ResNet-18 Branch: Layer Configuration |
|---|---|---|---|---|
| Input | 224 × 224 × 3 | - | 768 × 1 | - |
| Stage 1 | 56 × 56 × 96 | Patch Partition [4 × 4], stride 4 | 384 × 64 | Conv1d 7 × 7, stride 2 |
|  | 56 × 56 × 96 | Linear Embedding | 192 × 64 | Max-Pool 3 × 3, stride 2 |
| Stage 2 | 56 × 56 × 96 | Transformer block × 2 | 192 × 64 | Residual block [3 × 3, 64] × 2 |
| Stage 3 | 28 × 28 × 192 | Patch Merging and Transformer block × 2 | 96 × 128 | Residual block [3 × 3, 128] × 2, stride 2 |
| Stage 4 | 14 × 14 × 384 | Patch Merging and Transformer block × 2 | 48 × 256 | Residual block [3 × 3, 256] × 2, stride 2 |
| Stage 5 | 7 × 7 × 768 | Patch Merging and Transformer block × 2 | 24 × 512 | Residual block [3 × 3, 512] × 2, stride 2 |
| Average Pooling | 768 | Global Average Pooling | 512 | Adaptive AvgPool1d |
| Concatenation | 1280 | Concatenation of the outputs of Swin-t and ResNet-18 | | |
| MLP Layer 1 | 512 | Linear(1280, 512), ReLU, Dropout(0.5) | | |
| MLP Layer 2 | 256 | Linear(512, 256), ReLU, Dropout(0.5) | | |
| MLP Layer 3 | 8 | Linear(256, 8) | | |
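To make the branch dimensions in Table 1 concrete, the following PyTorch sketch reproduces the fusion layout: a Swin-t branch for the 224 × 224 × 3 CWT images, a 1-D ResNet-18-style branch for the 768-point PVM sequences, concatenation into a 1280-dimensional vector, and the three-layer MLP head. It is a minimal illustration under assumed hyperparameters, not the authors' released code; the torchvision swin_t backbone and the simplified 1-D residual stages stand in for the branches described above.

```python
# Minimal sketch of the bimodal fusion in Table 1 (illustrative, not the authors' code).
import torch
import torch.nn as nn
from torchvision.models import swin_t


class ResBlock1d(nn.Module):
    """Basic 1-D residual block (two 3-wide convolutions), analogous to ResNet-18 blocks."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.conv2 = nn.Conv1d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm1d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.down = None
        if stride != 1 or in_ch != out_ch:
            self.down = nn.Sequential(
                nn.Conv1d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm1d(out_ch))

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)


class ResformerSketch(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        # Image branch: Swin-t backbone with its classification head removed -> 768-d features.
        self.swin = swin_t(weights=None)
        self.swin.head = nn.Identity()
        # Signal branch: 1-D ResNet-18-style stem and stages -> 512-d features.
        self.stem = nn.Sequential(
            nn.Conv1d(1, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm1d(64), nn.ReLU(inplace=True),
            nn.MaxPool1d(3, stride=2, padding=1))
        self.stages = nn.Sequential(
            ResBlock1d(64, 64), ResBlock1d(64, 64),
            ResBlock1d(64, 128, 2), ResBlock1d(128, 128),
            ResBlock1d(128, 256, 2), ResBlock1d(256, 256),
            ResBlock1d(256, 512, 2), ResBlock1d(512, 512))
        self.pool = nn.AdaptiveAvgPool1d(1)
        # Fusion MLP: 1280 -> 512 -> 256 -> num_classes, matching Table 1.
        self.mlp = nn.Sequential(
            nn.Linear(768 + 512, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes))

    def forward(self, image, signal):
        f_img = self.swin(image)                                       # (B, 768)
        f_sig = self.pool(self.stages(self.stem(signal))).squeeze(-1)  # (B, 512)
        return self.mlp(torch.cat([f_img, f_sig], dim=1))


if __name__ == "__main__":
    model = ResformerSketch()
    logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 768))
    print(logits.shape)  # torch.Size([2, 8])
```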
Table 2. Division of sample library.

| Fault of Motor | Training Samples | Validation Samples | Test Samples | Label |
|---|---|---|---|---|
| Normal | 6720 | 1920 | 960 | NM |
| Rotor unbalance | 6720 | 1920 | 960 | RU |
| Rotor misalignment | 6720 | 1920 | 960 | RM |
| Rotor bow | 6720 | 1920 | 960 | RB |
| Bearing defects | 6720 | 1920 | 960 | BD |
| Broken bar rotor | 6720 | 1920 | 960 | BR |
| Turn-to-turn short circuit | 6720 | 1920 | 960 | SC |
| Stator single-phase open | 6720 | 1920 | 960 | SP |
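The 6720/1920/960 per-class counts in Table 2 correspond to a 70/20/10 division of 9600 samples per fault class. A minimal sketch of such a split is shown below; the function name and shuffling scheme are illustrative assumptions, not the authors' data pipeline.

```python
# Sketch of a per-class 6720/1920/960 train/validation/test split (assumed procedure).
import numpy as np

def split_class(samples: np.ndarray, rng=None):
    """Shuffle one fault class's samples and return (train, val, test) subsets."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = rng.permutation(len(samples))
    n_train, n_val = 6720, 1920            # the remaining 960 samples form the test set
    return (samples[idx[:n_train]],
            samples[idx[n_train:n_train + n_val]],
            samples[idx[n_train + n_val:]])
```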
Table 3. Comparison of accuracy across various single-modal models.

| Model | Training Accuracy (%) | Training Loss | Validation Accuracy (%) | Validation Loss | Test Accuracy (%) | Parameters |
|---|---|---|---|---|---|---|
| Swin-t | 99.27 | 0.0013 | 98.13 | 0.0028 | 98.65 | 19.6 M |
| VGG-11 | 98.94 | 0.0025 | 98.18 | 0.0028 | 98.54 | 132.9 M |
| ResNet-18 | 99.08 | 0.0023 | 75.31 | 0.0255 | 75.73 | 11.7 M |
| ResNet-50 | 99.14 | 0.0016 | 75.68 | 0.0315 | 75.71 | 25.6 M |
| ResNeXt | 99.33 | 0.0013 | 83.70 | 0.0129 | 82.40 | 25 M |
Table 4. Comparison of accuracy across various double-modal models.

| Model | Training Accuracy (%) | Training Loss | Validation Accuracy (%) | Validation Loss | Test Accuracy (%) | Parameters |
|---|---|---|---|---|---|---|
| MLP-Transformer | 99.48 | 0.0008 | 98.13 | 0.0013 | 97.29 | 28.9 M |
| LSTM-Transformer | 99.31 | 0.0010 | 97.97 | 0.0022 | 98.91 | 29.0 M |
| Resformer | 99.82 | 0.0004 | 99.69 | 0.0003 | 99.69 | 32.2 M |
Table 5. Performance Comparison.

| Model | Accuracy (30 dB) | Accuracy (20 dB) | Accuracy (13 dB) | Accuracy (10 dB) | Parameters (M) | Training Time (h) | Running Time (s/Sample) |
|---|---|---|---|---|---|---|---|
| Resformer | 99.90% | 99.58% | 93.44% | 88.54% | 32.2 | 2.2 | 0.636 |
| Swin-t | 98.60% | 97.80% | 83.96% | 70.00% | 19.6 | 1.1 | 0.569 |
| Swin-s | 99.48% | 98.02% | 82.77% | 66.52% | 49.6 | 2.4 | 0.938 |
| Swin-b | 99.61% | 99.16% | 76.19% | 60.49% | 87.8 | 2.1 | 0.944 |
| VGG-11 | 98.30% | 97.22% | 80.94% | 67.61% | 132.9 | 0.9 | 0.617 |
| VGG-19 | 99.58% | 99.17% | 84.69% | 73.96% | 143.7 | 1.4 | 0.808 |
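The noise conditions in Table 5 are specified only by their SNR. One common way to construct such test sets, shown below as an assumption rather than the authors' documented procedure, is to add white Gaussian noise scaled to the target SNR before the CWT/PVM preprocessing.

```python
# Sketch of injecting additive white Gaussian noise at a target SNR in dB (assumed procedure).
import numpy as np

def add_awgn(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Return `signal` plus white Gaussian noise scaled to the requested SNR."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: degrade a synthetic vibration segment to 10 dB SNR before preprocessing.
clean = np.sin(2 * np.pi * 50 * np.linspace(0, 1, 12800))
noisy = add_awgn(clean, snr_db=10)
```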
Table 6. Evaluation of fault detection effectiveness under 30 dB and 20 dB noise conditions.

| Fault Label | Precision (30 dB) | Recall (30 dB) | F1 Score (30 dB) | Precision (20 dB) | Recall (20 dB) | F1 Score (20 dB) |
|---|---|---|---|---|---|---|
| NM | 1 | 1 | 1 | 1 | 1 | 1 |
| RU | 1 | 1 | 1 | 1 | 1 | 1 |
| RM | 0.992 | 1 | 0.996 | 0.968 | 1 | 0.984 |
| RB | 1 | 0.992 | 0.996 | 1 | 1 | 1 |
| BD | 1 | 1 | 1 | 1 | 1 | 1 |
| BR | 1 | 1 | 1 | 1 | 1 | 1 |
| SC | 0.992 | 0.992 | 0.992 | 1 | 0.967 | 0.983 |
| SP | 1 | 1 | 1 | 1 | 1 | 1 |
Table 7. Evaluation of fault detection effectiveness under 13 dB and 10 dB noise conditions.

| Fault Label | Precision (13 dB) | Recall (13 dB) | F1 Score (13 dB) | Precision (10 dB) | Recall (10 dB) | F1 Score (10 dB) |
|---|---|---|---|---|---|---|
| NM | 1 | 1 | 1 | 1 | 1 | 1 |
| RU | 0.816 | 1 | 0.899 | 0.667 | 1 | 0.8 |
| RM | 0.83 | 1 | 0.909 | 0.758 | 0.992 | 0.860 |
| RB | 1 | 1 | 1 | 1 | 1 | 1 |
| BD | 1 | 1 | 1 | 1 | 1 | 1 |
| BR | 0.895 | 0.850 | 0.872 | 0.882 | 0.625 | 0.732 |
| SC | 1 | 0.625 | 0.769 | 0.983 | 0.475 | 0.640 |
| SP | 1 | 1 | 1 | 1 | 1 | 1 |
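The per-class precision, recall, and F1 values in Tables 6 and 7 can be computed from test-set predictions with standard scikit-learn utilities; the sketch below is illustrative, with the label order taken from Table 2 and the prediction arrays left as placeholders rather than the reported results.

```python
# Sketch of per-class precision/recall/F1 computation (placeholder predictions, not reported data).
from sklearn.metrics import precision_recall_fscore_support

LABELS = ["NM", "RU", "RM", "RB", "BD", "BR", "SC", "SP"]

def per_class_report(y_true, y_pred):
    """Print per-class precision, recall, and F1 in the order of LABELS."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=list(range(len(LABELS))), zero_division=0)
    for name, pi, ri, fi in zip(LABELS, p, r, f1):
        print(f"{name}: precision={pi:.3f} recall={ri:.3f} f1={fi:.3f}")

# Usage with hypothetical integer-encoded predictions on the 960-sample-per-class test set:
# per_class_report(y_true, y_pred)
```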
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
