1. Introduction
The marine vertical centrifugal pump is a key component of a ship’s pipeline system; it conveys the fluid working medium that keeps the various parts of the mechanical equipment operating normally. However, the ship’s swing, the vibration of operating equipment and long-term operation under load can easily cause the rotor components of the vertical centrifugal pump unit to fail and reduce the service life of the equipment [1,2,3]. Therefore, establishing a multi-fault classification and identification method for rotor faults and mechanical looseness faults of the marine vertical centrifugal pump has become a research hotspot. It is also a technical means of preventing sudden equipment failure and effectively improving the operating efficiency and service life of equipment.
In recent years, the fault diagnosis of rotating machinery has developed from relying on expert experience to intelligent fault identification based on machine learning [4]. The fault diagnosis process of a rotating machinery system generally includes three key stages: (1) the original signal carrying equipment fault information is obtained from field sensors; (2) signal preprocessing and feature extraction are used to obtain fault signal features; (3) the trained diagnostic model is used to diagnose and identify the test samples [5]. In industrial production, the data collected by sensors usually have weak features and are contaminated by noise. Therefore, mining signal characteristics and reducing noise are very important in rotating machinery fault diagnosis [6]. Zhang et al. proposed an image representation fault diagnosis method based on convolutional neural network (CNN) signal features, which overcomes the problem that the information cannot fully reflect the fault mode, and has been verified in rotor tests [7]. Shao et al. proposed a multi-domain convolutional neural network method based on deep learning. Its features are composed of time-domain and frequency-domain characteristic parameters, so that the characteristics of both domains are fully used for fault diagnosis. The effectiveness of this method on a bearing data set was verified by a large number of experiments [8]. Pang et al. proposed a deep learning fault diagnosis method that extracts features from time-domain and frequency-domain signals and fuses the two sets of deep features into intrinsic low-dimensional features. Extensive experiments on a gearbox, a rotor and an engine bearing show that the method has better diagnostic performance and stronger adaptability [9]. Sun et al. proposed a data-driven multi-wavelet denoising technique, which has been successfully applied to weak feature extraction of minor faults in bearing inner rings [10]. Using product function selection and wavelet packet decomposition, Lee et al. removed the high-frequency noise, extracted the fault information hidden under the noise, and established a high-precision feature set for locating bearing roller faults [11]. Signal noise reduction is therefore essential for feature extraction from the fault signals of industrial production equipment, and helps to reflect the actual state of equipment earlier and more accurately.
William proposed a zero-crossing feature parameter extraction method for the early fault detection of rotating machinery [12]. In this method, zero-crossing characteristic parameters are extracted from time-domain signals by using the time intervals between successive zero crossings. When the data dimension is high, directly using machine learning for data classification leads to a long computing time, and the desired classification performance cannot be achieved. Kernel principal component analysis has advantages in dimensionality reduction for solving these problems: kernel function weights are formed according to the eigenvalues of the kernel matrix, and the data dimension is reduced by combining multiple kernel functions [13,14]. Du et al. used wavelet packet decomposition (WPD) and high-order cumulants to extract features from bearing fault vibration signals, and combined them with principal component analysis (PCA) to reduce the redundancy of the feature data [15]. Shen et al. proposed a feature selection method based on the polling mode and weighted kernel principal component analysis (WKPCA). This method can adaptively identify highly sensitive features that carry more fault information about rotating components, improving the separability of the fault sample subset [16]. In order to identify the fault state of a centrifugal pump effectively, it is important to construct a feature space of optimal dimension to improve the recognition ability.
Machine learning is an effective method of equipment fault diagnosis, and classification accuracy can be used as a typical basis for judging a diagnosis model. The support vector machine (SVM) is a machine learning method widely studied by scholars. When combined with other intelligent algorithms, SVM can realize high-precision fault identification with limited fault samples [17]. Huang et al. proposed an SVM model based on an optimized genetic algorithm, which successfully identified faults of the operating mechanism such as loosening of the foundation screw and failure of the buffer spring. Compared with the traditional SVM, it has higher recognition accuracy [18]. An SVM classifier optimized by grid search can effectively identify the running state of centrifugal pumps [19]. Maamar used the wavelet packet transform for feature extraction at multiple decomposition levels and studied two mother wavelets to verify the effectiveness of feature extraction. A genetic algorithm was used to optimize the number of hidden layers and neurons of the multilayer perceptron, and a hybrid training method combining the genetic algorithm and the BP algorithm was applied to the fault classification of the centrifugal pump [20]. Rotating machinery fault diagnosis techniques are usually proposed for a single fault, but compound faults of rotating machinery occur more frequently [21,22,23]. Liu proposed a hybrid intelligent model based on the redundant second-generation wavelet packet transform, kernel principal component analysis and dual support vector machines to realize multi-fault detection of rotating machinery. Experimental results proved the effectiveness of this method [24]. Tang et al. proposed a particle swarm optimization support vector machine (PSO-SVM) multi-fault diagnosis method based on information fusion; its accuracy for the normal state, the single-fault mode and the multi-fault mode reaches 98.3%, 97.6% and 94%, respectively [25]. Therefore, using an intelligent algorithm to optimize the kernel function parameter and the penalty parameter of the traditional SVM is an effective way to improve its fault recognition accuracy.
In the study of rotating machinery fault diagnosis, the above literature mainly considered the signal noise reduction ability of the diagnosis algorithm, the screening of a feature set of optimal dimension and the fault recognition accuracy, but ignored the sensitivity of signals at different measuring points of the equipment to fault characteristics and the ability of multi-domain signal features from a single measuring point to characterize faults. Because the marine vertical centrifugal pump has a complex structure and variable operating conditions, and is affected by rotor dynamics, fluid dynamics and other factors, different measuring points have different sensitivity to fault characteristics. The limitation of single-feature representation is also an important factor affecting diagnostic accuracy. This paper takes the marine vertical centrifugal pump as the research object and addresses the uncertainty of the fault signal, the limitation of single-feature representation and the poor classification and identification of multiple faults. Firstly, a fault simulation test platform and data acquisition system for the marine vertical centrifugal pump are built, and the test point with the highest sensitivity to fault characteristics is selected from six typical test points of the pump. Secondly, a multi-domain and multi-type feature parameter set is established, and the dimension of the feature set is reduced by the WKPCA method, which effectively resolves the aliasing of the various fault features. Finally, the optimal parameters are selected by the PSO algorithm to improve the performance of the support vector machine model in the classification and identification of multiple faults of marine vertical centrifugal pumps.
3. Feature Extraction
3.1. Signal Preprocessing
The Kalman filter predicts the current value from the optimal result of the previous time step and then corrects this prediction with the measured value to obtain the optimal result. Its calculation formula relates the optimal result, the predicted current value, the measured value, the Kalman gain and the covariance of the predicted value [27,28].
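The predict-and-correct cycle can be sketched for a scalar signal as follows. This is a minimal illustration rather than the filter configuration used in the tests: the random-walk state model and the names `kalman_filter_1d`, `process_var` and `meas_var` (process and measurement noise variances) are assumptions introduced here.

```python
def kalman_filter_1d(measurements, process_var=1e-4, meas_var=1e-2,
                     x0=0.0, p0=1.0):
    """Denoise a scalar sequence with a random-walk Kalman filter (sketch)."""
    x_est, p_est = x0, p0              # optimal result and its covariance
    filtered = []
    for z in measurements:             # z is the measured value
        # Predict: the previous optimal result is the predicted current value
        # (assumed random-walk model), and its covariance grows by process_var.
        x_pred = x_est
        p_pred = p_est + process_var   # covariance of the predicted value
        # Correct: blend prediction and measurement through the Kalman gain.
        gain = p_pred / (p_pred + meas_var)
        x_est = x_pred + gain * (z - x_pred)
        p_est = (1.0 - gain) * p_pred
        filtered.append(x_est)
    return filtered


if __name__ == "__main__":
    noisy = [1.2, 0.8, 1.1, 0.9, 1.05, 0.95]   # toy values, not test data
    print(kalman_filter_1d(noisy, x0=noisy[0]))
```

A larger `meas_var` relative to `process_var` makes the filter trust its prediction more and smooth more strongly; suitable values depend on the sensor and are not given here.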
3.2. Weighted Kernel Principal Component Analysis (WKPCA)
In order to screen nonlinear characteristic parameters rapidly and eliminate the influence of irrelevant factors on pattern recognition, the kernel principal component analysis (KPCA) method is used to standardize the characteristic matrix and reduce its dimension.
First, the original fault data, with s characteristic parameters and k samples, are standardized to give the matrix X.
Next, a kernel function K is selected to map the matrix X to the high-dimensional matrix C, and the sample correlation coefficient matrix R is calculated.
The Jacobi method is then used to solve for the eigenvalues and corresponding eigenvectors of the correlation coefficient matrix R, from which the contribution rate of each characteristic is calculated.
Finally, the characteristic parameters are sorted by contribution rate from high to low, and the first m eigenvectors whose cumulative contribution rate exceeds 0.85 are taken to form the eigenvector matrix, so that the s characteristic parameters are reduced to m principal components [29].
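The selection rule in the final step can be sketched as follows. The helper name `select_principal_components` is hypothetical, the eigenvalues in the usage lines are made-up illustrative numbers rather than values from the tests, and treating 0.85 as a bound on the cumulative contribution rate is an assumption about how the index is defined.

```python
def select_principal_components(eigenvalues, threshold=0.85):
    """Return (m, rates): number of components kept and sorted contribution rates."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    # Contribution rate of each component: its eigenvalue over the total.
    rates = [value / total for value in ordered]
    cumulative = 0.0
    for m, rate in enumerate(rates, start=1):
        cumulative += rate
        if cumulative > threshold:     # keep the first m components
            return m, rates
    return len(rates), rates


if __name__ == "__main__":
    m, rates = select_principal_components([4.0, 3.0, 2.0, 0.6, 0.4])
    print(m, [round(r, 2) for r in rates])
```

With these toy eigenvalues the first three components carry about 90% of the total, so three components are kept.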
3.3. Establishing Feature Set
Filtering and noise reduction eliminate and suppress, to a certain extent, the influence of random noise components on signal feature extraction. The vibration signal of the vertical centrifugal pump is nonlinear and non-stationary. In order to improve the adaptability of multi-resolution analysis of the vibration signal, the original signal is adaptively decomposed into intrinsic mode function (IMF) components at different characteristic time scales, so that each decomposed component has a single eigenfrequency and the original signal is linearized and smoothed [30].
Taking the vibration displacement data of measuring point M2 in the rotor misalignment fault test as an example, the empirical mode decomposition (EMD) results are shown in Figure 4. It can be seen from the figure that the waveforms of IMF4 and IMF5 are both approximately sinusoidal with period 2π, and the corresponding spectra show obvious amplitude peaks at 1fAPF and 2fAPF. The time-domain waveform data of IMF4 and IMF5 are extracted for signal reconstruction analysis, and the results are shown in Figure 5.
By observing the time-domain waveform of the reconstructed signal, it can be found that the waveform resembles a sine wave, with depressions and double peaks at the crests. This is the main time-domain signature of the double-frequency rotational component seen in the corresponding frequency-domain diagram. The Fourier transform further shows that the main frequency of the signal is 1fAPF, accompanied by a secondary frequency of 2fAPF, which accords with the spectrum characteristics of the rotor misalignment fault.
Different faults produce different vibration characteristics, so the number and order of the relevant IMF components also differ. For example, the IMF5 component is selected to reconstruct the rotor unbalance fault signal after decomposition; its time-domain signal is sinusoidal with period 2π. After decomposition of the mechanical looseness fault signal, the IMF2, IMF5 and IMF6 components are selected for reconstruction. In order to cover the characteristic information of the different faults, the first six IMF components obtained by EMD are selected as the characteristic parameters in the time-frequency domain.
The characteristic energy of each IMF component is calculated first.
An eigenvector is then constructed by taking the energy characteristics of the first six IMF components as its elements.
Finally, this vector is normalized to obtain the normalized energy characteristic matrix.
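The energy features can be sketched as follows, assuming the IMF components are already available as equal-length sample sequences (for example from an EMD routine, which is not shown). Taking the energy of a component as the sum of its squared samples and normalizing by the Euclidean norm of the energy vector are common choices assumed here, not formulas confirmed by this text; `imf_energy_features` is a hypothetical helper name.

```python
import math


def imf_energy_features(imfs, n_components=6):
    """Return normalized energies of the first n_components IMFs."""
    selected = imfs[:n_components]
    # Energy of each IMF: sum of squared samples (assumed definition).
    energies = [sum(sample * sample for sample in imf) for imf in selected]
    # Normalize by the Euclidean norm of the energy vector (assumed choice).
    norm = math.sqrt(sum(e * e for e in energies))
    if norm == 0.0:
        return [0.0 for _ in energies]
    return [e / norm for e in energies]


if __name__ == "__main__":
    toy_imfs = [[3.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # toy data, not test data
    print(imf_energy_features(toy_imfs))
```

Normalizing removes the dependence on overall vibration amplitude, so the features describe how energy is distributed across the components rather than how large the signal is.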
The four operating states (normal, rotor unbalance, rotor misalignment and mechanical looseness) have different characteristics. In order to avoid the limitation that a single characteristic parameter, or a single type of characteristic parameter, places on fault trend analysis, a multi-domain and multi-class fault feature extraction method is used to form the fault feature set. The characteristic indexes comprise 10 time-domain indexes, 4 frequency-domain indexes and 6 time-frequency-domain energy indexes, as shown in Table 3.
According to the above table, a multi-domain and multi-type feature sample library is constructed for the signals collected under the four typical conditions of the marine vertical centrifugal pump, and the sample library of each fault contains 100 feature matrices. For model training and verification, each single-fault sample library is divided into a training set and a test set at a ratio of 5:5: 50 samples form the training set for training the diagnostic model, and the other 50 form the test set for verifying its fault classification and recognition performance. Labels are also configured for the four typical states of the marine vertical centrifugal pump for use in training and verification. Table 4 describes the fault sample database classification and label configuration.
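The sample split can be sketched as follows. The integer labels 1 to 4, the random shuffle before splitting and the helper name `split_by_state` are assumptions for illustration; the actual label values are those of Table 4, and the feature vectors below are placeholders.

```python
import random

STATES = ["normal", "rotor unbalance", "rotor misalignment", "mechanical looseness"]


def split_by_state(samples_by_state, n_train=50, seed=0):
    """Split each state's samples into training and test lists of (features, label)."""
    rng = random.Random(seed)
    train, test = [], []
    for label, state in enumerate(STATES, start=1):   # labels 1..4 are assumed
        samples = list(samples_by_state[state])
        rng.shuffle(samples)                          # assumed random assignment
        train += [(features, label) for features in samples[:n_train]]
        test += [(features, label) for features in samples[n_train:]]
    return train, test


if __name__ == "__main__":
    # Placeholder 20-element feature vectors, 100 per state.
    toy = {state: [[float(i)] * 20 for i in range(100)] for state in STATES}
    train, test = split_by_state(toy)
    print(len(train), len(test))   # 200 200
```

Splitting within each state keeps the four classes balanced in both sets, which matches the 50/50 division per state described above.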
3.4. Data Dimensionality Reduction
Because the original signal feature parameters are numerous and varied, correlation between parameters results in information redundancy. In order to reduce or eliminate this redundancy and avoid aliasing between the kernel principal components of different faults, the dimension of the original high-dimensional fault features is reduced, and the weight index of each feature parameter in the original fault feature set is calculated, as shown in Figure 6.
It can be seen from the figure that, after dimensionality reduction of the original 20 characteristic parameters, the 9 characteristic parameters that meet the conditions are Mean, Kurtosis, Peak, Harmonic Mean, Mean Frequency, E1, E2, E5 and E6, with weight values of 0.21515, 0.2938, 0.317025, 0.35055, 0.109925, 0.319, 0.1441, 0.114 and 0.1457, respectively. The feature matrix composed of these 9 feature parameters is used as the target feature set for fault classification.
3.5. Feature Extraction Result
The kernel principal component feature points of the four operating states obtained by the KPCA method are relatively concentrated in distribution, as shown in Figure 7a. Among them, the feature points of the mechanical looseness fault are relatively concentrated, that is, the intra-class spacing is small, and they are clearly separated from the feature points of the normal state, that is, the inter-class spacing is obvious and the distinguishing effect is good. However, the feature points of the rotor unbalance and rotor misalignment faults overlap with each other and cannot be completely distinguished from those of the normal operating condition and the mechanical looseness fault. The fault principal components extracted by KPCA dimension reduction therefore show obvious aliasing, and KPCA cannot fully classify the multiple faults of the centrifugal pump. WKPCA is an extension of the general KPCA method. Its basic idea is to strengthen the role of certain fault features in fault classification by weighting each data feature in the data set. The resulting two-dimensional distribution is shown in Figure 7b. The kernel principal component feature points of the four working conditions can be clearly distinguished, the distance between classes is obvious, and the multiple fault categories are well discriminated.
It can be seen from the figure that, compared with the KPCA method, the kernel principal component feature points of the four operating states extracted by the WKPCA method have smaller intra-class spacing and larger inter-class spacing, so the feature points of the four operating states are accurately classified.
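The weighting idea can be illustrated with a feature-weighted RBF kernel. This is only a sketch of one common way to weight features inside a kernel; whether the weights enter the kernel in exactly this way here is an assumption, and `weighted_rbf_kernel`, `kernel_matrix`, `weights` and `g` are hypothetical names.

```python
import math


def weighted_rbf_kernel(x, y, weights, g=1.0):
    """RBF kernel on feature vectors scaled element-wise by `weights`."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    # Squared distance after scaling each feature difference by its weight.
    sq_dist = sum((w * (a - b)) ** 2 for a, b, w in zip(x, y, weights))
    return math.exp(-g * sq_dist)


def kernel_matrix(samples, weights, g=1.0):
    """Symmetric kernel matrix for a list of feature vectors."""
    return [[weighted_rbf_kernel(a, b, weights, g) for b in samples]
            for a in samples]


if __name__ == "__main__":
    samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # toy points
    # A larger weight on the first feature makes differences in it count more.
    for row in kernel_matrix(samples, weights=[2.0, 0.5]):
        print([round(value, 4) for value in row])
```

A feature with a large weight dominates the distance between samples, so classes that differ mainly in that feature are pushed further apart in the kernel space, while a feature with zero weight is ignored entirely.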
4. Diagnostic Algorithm
4.1. Support Vector Machines
The support vector machine, which is based on statistical learning theory, has obvious advantages in small-sample, nonlinear and high-dimensional classification and recognition, and can achieve high fault classification performance. Its general form is expressed in terms of the penalty factor C of the SVM, the relaxation variables, the fault characteristic samples and the sample labels. The radial basis function (RBF) is adopted as the kernel function, which avoids the complexity of inner product operations in the high-dimensional feature space.
The selection of the RBF kernel parameter g and the SVM penalty parameter C ultimately affects the sample training and test results, and can easily cause the SVM classification results to fall into a local optimum, affecting the accuracy of fault identification. Therefore, PSO is used to optimize the kernel parameter g and the penalty parameter C [31,32].
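For reference, the standard soft-margin SVM with an RBF kernel can be written as below. The formulas themselves are not reproduced in this text, so the notation is an assumption: w and b are the hyperplane parameters, φ is the feature map, ξ_i are the relaxation variables, x_i are the fault characteristic samples, y_i ∈ {−1, +1} are the sample labels, n is the number of training samples, and the kernel is parameterized by g in the LIBSVM style. A binary formulation is shown; the scheme used to extend it to the four operating states is not specified here.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i} \\
\text{s.t.}\quad & y_{i}\bigl(w^{\top}\varphi(x_{i}) + b\bigr) \ge 1 - \xi_{i},
  \qquad \xi_{i}\ge 0,\quad i = 1,\dots,n, \\
K(x_{i}, x_{j}) &= \exp\bigl(-g\,\lVert x_{i}-x_{j}\rVert^{2}\bigr)
\end{aligned}
```

A larger C penalizes misclassified training samples more heavily, and a larger g makes the kernel more local; both push the model towards fitting the training set more closely, which is why the two parameters are tuned together.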
4.2. Particle Swarm Optimization
In a D-dimensional target search space, N particles form a community, in which the ith particle is represented by a D-dimensional position vector. The optimization velocity of the ith particle is also a D-dimensional vector. The particle velocity and position updates are expressed in terms of the following quantities: the velocity and position of the ith particle in the jth dimension after iteration t + 1; the linearly decreasing inertia weight; the individual and global learning coefficients; two independent random numbers in [0,1]; the current optimal position of the ith particle; and the current optimal position of the whole group [33,34].
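The update rule can be sketched as follows. Because the update equations and their symbols are not reproduced in this text, the standard PSO form is assumed: the new velocity is the inertia term plus attraction towards the particle's own best position and the group's best position, and the new position is the old position plus the new velocity. All names (`update_particle`, `pso_minimize`, `w_start`, `w_end`, `c1`, `c2`) and default values are illustrative assumptions.

```python
import random


def update_particle(x, v, pbest, gbest, w, c1, c2, r1, r2):
    """One velocity and position update for a single particle."""
    new_v = [w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)
             for xj, vj, pj, gj in zip(x, v, pbest, gbest)]
    new_x = [xj + vj for xj, vj in zip(x, new_v)]
    return new_x, new_v


def pso_minimize(objective, bounds, n_particles=20, n_iter=50,
                 w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds`; returns (best_x, best_value, history)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_val = [objective(x) for x in xs]
    best = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[best]), pbest_val[best]
    history = [gbest_val]
    for t in range(n_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            xs[i], vs[i] = update_particle(xs[i], vs[i], pbest[i], gbest,
                                           w, c1, c2, rng.random(), rng.random())
            # Keep the particle inside the search box.
            xs[i] = [min(max(xj, lo), hi) for xj, (lo, hi) in zip(xs[i], bounds)]
            value = objective(xs[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = list(xs[i]), value
                if value < gbest_val:
                    gbest, gbest_val = list(xs[i]), value
        history.append(gbest_val)
    return gbest, gbest_val, history


if __name__ == "__main__":
    sphere = lambda p: sum(c * c for c in p)   # toy objective
    print(pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])[:2])
```

To tune an SVM in this way, the two search dimensions would be C and g and the objective would be a quantity to minimize, such as one minus the cross-validated classification accuracy; that objective is not implemented in this sketch.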
In summary, the steps of the feature extraction and multi-fault identification method for the marine vertical centrifugal pump based on WKPCA and PSO-SVM are as follows:
(1) Collect original data. A fault simulation test system is built, and the original data with high sensitivity are selected based on vibration intensity. The signal is denoised by the Kalman filter.
(2) Extract signal features. Multi-domain and multi-type feature parameters are extracted, the feature matrix is normalized and reduced in dimension, and the feature parameters are weighted based on the ReliefF algorithm. WKPCA is used to achieve accurate classification of the four operating states.
(3) Establish fault samples. For each operating state, 100 groups of samples are selected and divided into a training set of 50 groups and a test set of 50 groups. The samples of different states are labeled.
(4) Select the kernel function. The RBF kernel function, which has strong learning and generalization ability, is selected as the mapping function of the model.
(5) Optimize parameters. For the selected mapping function, and based on the existing fault feature set data samples, the particle swarm optimization algorithm is used to globally optimize the penalty factor C and the kernel function parameter g of the SVM.
(6) Classify faults with the PSO-SVM recognition model. The training set and test set are input into the recognition model in turn to analyze the fault classification performance.
The flow of the feature extraction and fault classification model based on the WKPCA and PSO-SVM algorithms is shown in Figure 8.
5. Results and Discussion
In order to further explore the contribution of multi-domain and multi-type characteristic parameters to the classification and recognition of multiple faults of the marine vertical centrifugal pump, the characteristic parameters established in Section 3 are grouped into single-domain, two-domain and three-domain cases. The single-domain group has three cases: 4 features in the time domain (T), 1 feature in the frequency domain (F) and 4 features in the time-frequency domain (TF). The two-domain group has three cases: 5 features of the time and frequency domains (T + F), 8 features of the time and time-frequency domains (T + TF), and 5 features of the frequency and time-frequency domains (F + TF). The three-domain case contains all 9 features of the time, frequency and time-frequency domains (T + F + TF). The seven cases were used as inputs of the PSO-SVM model for multi-fault classification, and the results are shown in Table 5.
As can be seen from Table 5, when a single domain is the input feature of the model, the average classification accuracy over the three single-domain cases is 0.8. The average classification accuracy over the two-domain cases is 0.92, and the classification accuracy reaches 1 when all three domains are used. The F-1 score is a further indicator of classification performance; it is the harmonic mean of precision and recall. The average F-1 score of the single-domain cases is 0.86, that of the two-domain cases is 0.96, and the F-1 score with all three domains is 1. Measured by both classification accuracy and F-1 score, the classification performance of the PSO-SVM model improves as the number of domains increases. It can be seen that establishing multi-domain and multi-type characteristic parameters contributes positively to the multi-fault classification of the marine vertical centrifugal pump.
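For reference, the F-1 score for a multi-class problem such as this one can be computed per class and then averaged. The sketch below uses macro averaging, which is an assumption since the averaging scheme is not stated here; the function names are hypothetical and the labels in the usage lines are toy values, not results from Table 5.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F-1 score}, with F-1 = 2PR / (P + R) for each class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F-1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = [1, 1, 2, 2, 3, 3, 4, 4]   # toy labels only
    y_pred = [1, 1, 2, 3, 3, 3, 4, 4]
    print(f1_per_class(y_true, y_pred), macro_f1(y_true, y_pred))
```

In the toy example one sample of class 2 is predicted as class 3, which lowers the recall of class 2 and the precision of class 3 while leaving the other two classes at an F-1 score of 1.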
The parameters of the SVM model are optimized by the PSO algorithm, and the diagnostic classification results are shown in Figure 9. It can be seen from the figure that the fitness value of the PSO algorithm converges to its maximum of 98.8% in the 11th generation. The optimal penalty parameter C is 56.22, and the optimal kernel parameter g is 0.81. The fault classification accuracy of the PSO-SVM model reaches 1; that is, the four operating states are all accurately classified. This verifies that the proposed fault pattern recognition model has strong learning and generalization ability.
In order to further verify the advantages of the proposed parameter optimization and pattern recognition algorithm in the multi-fault classification of the marine vertical centrifugal pump, 200 groups of test samples from the four fault data sets are also classified with the traditional SVM model and the GA-SVM model for comparison. The classification performance is shown in Figure 10 and Figure 11, and the data comparison results are shown in Table 6.
It can be found from Table 6 that the classification accuracy on the test samples is only 0.83 for the traditional SVM model and 0.96 for the GA-SVM model. For the PSO-SVM recognition model, the classification accuracy on the 200 groups of test samples is 1, the fitness is 98.8%, and the convergence speed is the fastest. The classification performance of the PSO-SVM model on the test samples is thus significantly better than that of the traditional SVM and GA-SVM models. The F-1 score of the PSO-SVM model reaches 1, which is higher than that of both other models. These results verify the superiority of the proposed PSO-SVM algorithm in the multi-fault classification of vertical centrifugal pumps.