The rolling bearing is an important component of rotating machinery, whose main role is to transfer kinetic energy from the drive shaft to the shaft seat and reduce the energy loss caused by friction. A large share of rotating machinery failures is caused by rolling bearing failure. Such failures not only delay projects but can also cause large economic losses and, in serious cases, casualties. Therefore, the study of rolling bearing fault diagnosis is necessary [1, 2, 3, 4]. Early on, bearings were diagnosed mainly on the basis of manual experience, which was inefficient and could not detect faults at an early stage. It was later found that analyzing rolling bearing vibration signals allows the bearing condition to be monitored in real time, so many scholars have studied methods for processing these signals. Dragomiretskiy and Zosso [
5] proposed variational mode decomposition (VMD), an adaptive signal decomposition method. Unlike empirical mode decomposition (EMD), which extracts modes recursively, VMD solves for all modes concurrently in a non-recursive variational framework, which reduces the end effect and makes the decomposed mode components more accurate. However, its drawback is that the number of mode components K and the penalty factor α strongly affect the decomposition results [6]. To determine an appropriate number of mode components K, Zhou et al. [7] combined EMD and center frequency, choosing K according to the trend of the center frequency of each intrinsic mode function (IMF). Zhang et al. [
8] used the Gini index and the autocorrelation function to construct the weighted autocorrelative function maximum (AFM) indicator as the objective function, and then tuned VMD with an improved particle swarm optimization (IPSO) algorithm to obtain the parameters K and α that yield the sensitive IMFs. Wang et al. [9] used the Archimedes optimization algorithm (AOA) to optimize the mode number K and penalty factor α of VMD, taking the minimum average correlation waveform index (Cwi) over all IMFs as the objective function. Jiao et al. [
10] determined the mode number K required for VMD from the abnormal decline of center frequency (ADCF). Duan et al. [11] combined an improved VMD with sample entropy (SE) and determined the value of K by the maximum correntropy criterion (MCC), which effectively improved the statistical properties of highly nonlinear process errors. Li et al. [12] used a genetic algorithm (GA) to optimize the VMD parameters K and α, which yields the optimal IMFs and improves the accuracy of the decomposition. Extracting appropriate feature information is key to the accuracy and reliability of fault diagnosis results. He et al. [
13] used an improved sparrow search algorithm to optimize the VMD parameters with dispersion entropy as the fitness value, decomposed the original signal into a series of mode components with the optimized VMD, and calculated the energy entropy of each mode component to diagnose flywheel bearing faults. Xue et al. [14] calculated the dispersion entropy of IMF components in different frequency bands, used the joint approximate diagonalization of eigenmatrices (JADE) to extract fusion features, and finally obtained the hierarchical discrete entropy (HDE) for bearing fault diagnosis. Wang et al. [15] proposed a feature extraction method that combines variational mode extraction (VME) with a multi-objective information fusion band-pass filter (MIFBF). Yang et al. [16] used the fractional Fourier transform (FRFT) to extract fault features from the original signals and then used stochastic resonance (SR) to enhance the weak fault features, diagnosing bearing faults from the fault characteristic frequency. Yan et al. [17] decomposed bearing signals with VMD and used the multi-scale envelope dispersion entropy (MEDE) of the IMF components as features for bearing fault pattern recognition. Zheng et al. [18] calculated the permutation entropy (PE) of each IMF obtained by VMD to capture the characteristic information of the bearing vibration signal. Zhang et al. [19] combined VMD and sample entropy and used multi-domain indexes to construct a feature vector that characterizes the fault information.
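For reference, the dependence on K and α can be read off the standard VMD formulation. The notation below is an assumption of this sketch rather than part of the text above: f(t) is the input signal, u_k(t) the k-th mode, ω_k its center frequency, δ(t) the Dirac distribution, * denotes convolution, and λ(t) is a Lagrange multiplier. VMD seeks K modes of minimal total bandwidth that sum to the signal, and the constraint is handled through an augmented Lagrangian in which α weights the bandwidth term:

```latex
\min_{\{u_k\},\{\omega_k\}} \;
\sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)

\mathcal{L}\left(\{u_k\},\{\omega_k\},\lambda\right) =
\alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2
+ \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
+ \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
```

K fixes how many modes share the signal, and α trades bandwidth against reconstruction fidelity, which is consistent with the sensitivity to both parameters noted above.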
Once fault features have been extracted, an intelligent pattern recognition method is needed to classify them quickly so that bearing faults are identified before they lead to equipment failure. Vapnik [20] proposed the support vector machine (SVM), a machine learning algorithm well suited to nonlinear problems and small sample sizes. Zhang et al. [21] used multi-scale information entropy to construct a sample set and an IPSO-optimized SVM to diagnose bearing faults. Wang et al. [22] used quantum-behaved particle swarm optimization (QPSO) and multi-scale permutation entropy (MPE) to extract features from denoised bearing signals and then used SVM to identify faults; their experimental results showed that the method identifies bearing fault types well. Ye et al. [23] used VMD-MPE to construct feature vectors and then used PSO to optimize SVM to improve the recognition accuracy. However, training an SVM requires solving a quadratic programming problem with inequality constraints, which is computationally demanding. To reduce this difficulty, Suykens [
24] proposed the least squares support vector machine (LSSVM), which replaces the inequality constraints in SVM with equality constraints so that training reduces to solving a set of linear equations. LSSVM has been widely applied in industrial intelligence in recent years [25, 26, 27, 28]. He et al. [
29] used the wavelet packet transform to extract fault features and combined them with LSSVM to identify faults in circuit output voltage signals. Gao et al. [30] fused singular entropy, energy entropy, and permutation entropy to obtain complementary features and used a PSO-optimized LSSVM to diagnose bearing faults. Zhao et al. [31] extracted narrowband kurtosis vectors from the cyclic correntropy spectrum (CCES) as feature vectors for LSSVM for the early detection and classification of locomotive axle bearing faults. Zhu et al. [32] decomposed the bearing vibration signal with VMD, used the fuzzy entropy of each IMF as the feature vector, and optimized the LSSVM model with the gray wolf optimizer (GWO) to identify rolling bearing faults.
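To illustrate why equality constraints simplify training, the following is a minimal sketch of a binary LSSVM classifier, not the implementation used in any of the cited works. With labels in {-1, +1}, fitting amounts to solving a single linear system. The RBF kernel, the regularization parameter `gamma`, the kernel width `sigma`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-d2 / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM classifier system for the bias b and multipliers alpha.

    System (labels y in {-1, +1}):
        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i * y_j * K(x_i, x_j).
    """
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Classify new samples with sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    k = rbf_kernel(X_new, X_train, sigma)
    return np.sign(k @ (alpha * y_train) + b)


if __name__ == "__main__":
    # Toy data (assumed): two well-separated clusters.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
    y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
    b, alpha = lssvm_fit(X, y)
    print(lssvm_predict(X, y, b, alpha, np.array([[0.5, 0.5], [5.5, 5.5]])))
```

In practice `gamma` and `sigma` are the hyperparameters that optimizers such as PSO or GWO are used to tune; multi-class fault identification would additionally need a one-vs-one or one-vs-rest scheme, which this sketch omits.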
The methods reviewed above optimize either feature extraction or the model parameters in isolation, which limits the accuracy of rolling bearing fault diagnosis. A promising direction is to optimize both stages, each with a suitable algorithm, to avoid the loss of accuracy caused by optimizing only one. In this paper, the whale optimization algorithm (WOA) is used to optimize VMD and obtain the optimal parameter combination (K, α) for the decomposition. The optimal IMF component is selected according to the Pearson correlation coefficient (PCC) criterion, and its optimal multi-scale permutation entropy is calculated to form the feature set. Finally, k-fold cross-validation is used to train the MPSO-LSSVM model, and the test set is fed to the trained model for identification. The experimental results show that the MPSO-LSSVM fault diagnosis model achieves higher recognition accuracy than PSO-SVM, LSSVM, and PSO-LSSVM, and that WOA-VMD-MPE extracts more accurate features than VMD-SE, VMD-MPE, and PSO-VMD-MPE.
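The feature extraction steps named in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the PCC criterion is assumed to mean choosing the IMF with the largest absolute Pearson correlation with the original signal, and MPE is computed as the normalized permutation entropy of coarse-grained copies of the series. The embedding dimension `m`, delay `tau`, `max_scale`, and all function names are hypothetical. The VMD step itself is not shown; `imfs` stands in for its output.

```python
import math

import numpy as np


def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy in [0, 1] for embedding dimension m."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Row i is the embedded vector (x[i], x[i + tau], ..., x[i + (m-1)*tau]).
    embedded = np.stack([x[j * tau : j * tau + n] for j in range(m)], axis=1)
    # The ordinal pattern of each vector is the permutation that sorts it.
    patterns = np.argsort(embedded, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))


def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[: n * scale].reshape(n, scale).mean(axis=1)


def multiscale_permutation_entropy(x, m=3, tau=1, max_scale=5):
    """Permutation entropy of the coarse-grained series at scales 1..max_scale."""
    return np.array(
        [permutation_entropy(coarse_grain(x, s), m, tau)
         for s in range(1, max_scale + 1)]
    )


def select_imf_by_pcc(imfs, signal):
    """Index of the IMF with the largest |Pearson r| against the signal."""
    r = [abs(np.corrcoef(imf, signal)[0, 1]) for imf in imfs]
    return int(np.argmax(r))


if __name__ == "__main__":
    # Synthetic stand-ins (assumed) for a signal and two of its modes.
    t = np.linspace(0.0, 1.0, 1000)
    slow = np.sin(2 * np.pi * 5 * t)
    fast = 0.1 * np.sin(2 * np.pi * 120 * t)
    signal = slow + fast
    best = select_imf_by_pcc([fast, slow], signal)
    features = multiscale_permutation_entropy([fast, slow][best])
    print(best, features)
```

The resulting MPE vector is the kind of feature set that would then be passed to a classifier; how the MPE parameters are made "optimal" is not specified in this section and is left out of the sketch.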