1. Introduction
As a pivotal component of hydraulic systems, hydraulic plunger pumps are extensively utilized in fields such as aerospace, mechanical manufacturing, and construction machinery [1,2,3]. However, due to the harsh working conditions they endure, such as high pressure, high speed, and heavy loads, the likelihood of faults significantly increases [4,5]. When faults occur, they can lead to equipment damage and increased maintenance costs at best and jeopardize the safety of personnel at worst [6,7,8]. Timely and accurate fault diagnosis is therefore essential for keeping hydraulic plunger pumps in normal operation and for improving equipment reliability.
Fault diagnosis for hydraulic plunger pumps has received extensive attention and in-depth research from scholars worldwide, with most methods focusing on vibration signal analysis [9,10,11,12,13]. However, the acquisition of vibration signals requires contact measurement, and sensor placement is often restricted, making it inconvenient in certain situations. In contrast, sound signals can be acquired more conveniently and without contact. When a hydraulic plunger pump malfunctions, its sound pressure level inevitably changes [14]. By analyzing and processing sound signals, sensitive features related to specific faults can be effectively extracted, enabling fault diagnosis of the hydraulic plunger pump. Zhu et al. proposed a fault detection method for hydraulic plunger pump sound signals based on particle swarm optimization-enhanced convolutional neural networks (PSO-CNN) [15]. Ugli et al. introduced a diagnostic method based on the automatic optimization of a one-dimensional convolutional neural network (1D-CNN) structure using a genetic algorithm, achieving high recognition accuracy for axial hydraulic plunger pump sound signals [16]. Zhang et al. developed a fault diagnosis method for hydraulic pumps based on the transfer of a ResNet-50 model using average spectrogram histograms of voiceprints [17]. Tang et al. proposed an adaptive CNN-based fault diagnosis method for hydraulic plunger pumps using acoustic images, which demonstrated high accuracy and robustness [18].
As a common feature in audio signal analysis, cepstral coefficients [19,20] effectively capture spectral information, making them suitable for classification and recognition tasks. Among them, Mel Frequency Cepstral Coefficients (MFCCs) [21,22], one of the most widely utilized acoustic features, excel at extracting low-frequency information and are thus highly effective in capturing spectral details of sound signals. Sun et al. proposed a fault diagnosis method based on MFCCs and a transfer learning CNN, which accurately identified the operational states of a water supply pump [23]. However, the limited resolution of MFCCs in high-frequency regions often results in the loss of critical details of high-frequency fault patterns, thereby impacting diagnostic accuracy. The Inverse Mel Frequency Cepstral Coefficients (IMFCCs) improve high-frequency feature extraction, enhancing the recognition of high-frequency fault modes. Zhang et al. introduced an equipment sound recognition method that integrates MFCC and IMFCC features, achieving high average recognition rates and accuracy, though its susceptibility to environmental noise weakens its robustness [24]. Gammatone Frequency Cepstral Coefficients (GFCCs) [25], which align with human auditory characteristics, demonstrate strong noise resistance under noisy conditions. Hu et al. developed a hybrid feature extraction method combining MFCCs and GFCCs with wavelet decomposition, classified via a CNN, which enabled accurate recognition of helicopter audio signals [26]. However, this approach has limited sensitivity to low-frequency signals, and CNNs offer constrained temporal modeling capabilities [27,28,29,30]. Linear Prediction Cepstral Coefficients (LPCCs) emphasize the resonant properties of audio signals. Ding et al. conducted multi-dimensional feature extraction on audio data by combining MFCCs and LPCCs, then reduced and normalized the features with PCA and applied a Support Vector Machine (SVM) for fault classification, which successfully diagnosed CNC machine tool faults [31]. Despite its effectiveness in identifying certain mechanical faults, this method is somewhat limited under multi-fault scenarios and complex noise conditions, and the computational complexity of SVM poses challenges for large-scale data processing [32,33].
To more comprehensively characterize the features of hydraulic plunger pump sound signals and fully leverage the advantages of various cepstral coefficients, this study employs a fusion of MFCCs, IMFCCs, GFCCs, and LPCCs to create a hybrid cepstral feature known as MIGLCCs. An MFCC primarily focuses on the spectral characteristics of sound signals with high resolution in the low-frequency range [34,35,36,37], an IMFCC emphasizes high-frequency details [38,39,40], a GFCC exhibits robust noise resistance [41,42,43,44], and an LPCC reflects the resonant peak characteristics of sound signals [45,46,47]. This fusion aims to enhance the representational capability of hydraulic plunger pump sound signals by complementing the strengths of each cepstral coefficient. To date, no one has applied a method combining this hybrid cepstral feature and a double-layer long short-term memory (DLSTM) network to the diagnosis of hydraulic plunger pump sound signals. Therefore, this paper proposes an intelligent diagnosis method for hydraulic plunger pumps based on MIGLCC-DLSTM using sound signals. The primary contributions are as follows:
(1) A novel fused feature, an MIGLCC, based on four classical cepstral features (MFCC, IMFCC, GFCC, and LPCC), is proposed for the first time. An MIGLCC significantly enhances the representation of high- and low-frequency features while improving noise resistance and formant capture capability. By fully leveraging the complementary strengths of multiple cepstral features, it provides a comprehensive and precise feature description framework for the intelligent diagnosis of sound signals.
(2) Deep learning techniques are introduced through the design of a double layer long short-term memory (DLSTM) network, optimizing the classification model’s training process. Incorporating a Dropout layer optimization strategy effectively reduces overfitting risk and enhances model generalization. By integrating MIGLCC features with the DLSTM network, the MIGLCC-DLSTM intelligent diagnosis method achieves efficient modeling of time-series information in sound signals while maintaining low model complexity. This approach demonstrates exceptional diagnostic accuracy, making it particularly suitable for real-time fault diagnosis in industrial scenarios.
(3) The method consistently achieves high diagnostic accuracy in evaluating the operating states of hydraulic plunger pumps under various working conditions, underscoring its practicality in complex industrial environments. Additionally, validation using the open-source CWRU bearing dataset and actual steam turbine high-pressure servo motor state monitoring data confirms the outstanding generalization capability of the MIGLCC-DLSTM method, showcasing its broad application potential across diverse industries.
The remainder of this paper is organized as follows:
Section 2 introduces various cepstral coefficient feature extraction methods and reviews the fundamental principles of LSTM networks.
Section 3 describes the experiments on hydraulic plunger pumps and analyzes the effectiveness of the proposed method based on the experimental results.
Section 4 presents extended application experiments of the method.
Section 5 concludes with the final findings.
3. Hydraulic Plunger Pump Simulation Experiment
This paper presents an intelligent diagnosis method for hydraulic plunger pumps based on sound signals. The hybrid cepstral feature MIGLCC extracted from the pump’s sound signals is combined with the DLSTM network to construct the MIGLCC-DLSTM diagnostic model. The DLSTM network structure is illustrated in Figure 6, and the process flow of the method is depicted in Figure 7. The specific implementation steps are as follows:
Step 1: Collect monitoring signals from the hydraulic plunger pump in various states and save the collected data into the computer.
Step 2: Divide the collected sound data into training and testing sets, and label the sound data corresponding to different states of the hydraulic plunger pump.
Step 3: Pre-process the training and testing sets by applying pre-emphasis (with a coefficient of 0.97) and framing with a Hamming window.
Step 4: Extract different cepstral features—MFCC, IMFCC, GFCC, and LPCC—where l represents the order of each cepstral feature, which was set to 12 in this study.
Step 5: Fuse the features through vector concatenation, normalize them, and generate the hybrid cepstral feature MIGLCC (MFCC and IMFCC and GFCC and LPCC).
Step 6: Establish the DLSTM network model, initializing the DLSTM network weight parameters using a Gaussian distribution.
Step 7: Define the model learning rate, number of iterations, and batch size. Input the feature data into the designed DLSTM network model.
Step 8: Train the MIGLCC-DLSTM diagnostic model using the hybrid cepstral feature MIGLCC data extracted from the training set.
Step 9: Test and evaluate the trained model using the MIGLCC feature data extracted from the testing set.
Step 10: Calculate evaluation metrics such as the accuracy of the model on the testing set and comprehensively assess the model’s performance.
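The pre-processing in Step 3 can be sketched as follows. This is a minimal illustration rather than the original implementation: the function names are hypothetical, the pre-emphasis filter is assumed to be the common first-order form y[n] = x[n] − 0.97·x[n−1], and the window follows the standard symmetric Hamming definition.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is kept unchanged."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming_window(length):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def window_frame(frame):
    """Taper one frame with a Hamming window of the same length."""
    window = hamming_window(len(frame))
    return [x * w for x, w in zip(frame, window)]


# Pre-emphasis suppresses slowly varying content: a constant input
# is reduced to (1 - alpha) after the first sample.
emphasized = pre_emphasis([1.0, 1.0, 1.0, 1.0])

# The window attenuates the frame edges to limit spectral leakage.
tapered = window_frame([1.0] * 256)
```

Pre-emphasis boosts the high-frequency part of the spectrum before cepstral analysis, and the Hamming window reduces the discontinuities at frame boundaries.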
3.1. Construction of Experimental Setup and Signal Acquisition
The experiment was conducted on a hydraulic plunger pump fault simulation test bench, as shown in Figure 8, with the system’s operating principles illustrated in Figure 9. A domestically produced AWA5661 precision pulse sound level meter was chosen to capture acoustic signals. This device, equipped with a condenser microphone with a sensitivity of 40 mV/Pa and a frequency response range of 10 Hz to 16,000 Hz, is well suited for capturing the acoustic characteristics of the hydraulic plunger pump in operation. The sound level meter is designed to convert sound pressure levels into AC voltage signals, which are then input into a computer via a data acquisition card for accurate monitoring and analysis.
During the experiment, the sound signal collection process may be influenced by background noise from sources such as dynamic and transmission noise. This is especially true when the sensor is positioned farther from the sound source, where equipment friction noise may be masked by reverberation, resulting in decreased signal stability. To address this, the experiment employs a near-field measurement approach, positioning the sensor within 0.5 m of the plunger pump and approximately 0.75 m above the ground to capture the primary acoustic signals of the hydraulic plunger pump’s operation. Based on these measurements, the point with the highest sound pressure level around the plunger pump was selected as the main measurement location, where the sound level meter was suspended. A windscreen was also installed to effectively reduce ambient wind noise interference, ensuring a high signal-to-noise ratio in the collected acoustic data.
In addition to using a sound level meter to capture acoustic signals during the hydraulic plunger pump’s operation, pressure and acceleration sensors were also employed to monitor the pressure at the pump’s outlet and the vibrations of the pump casing. These signals were synchronously collected and recorded on a computer, though this study will not delve into these aspects in detail. Once collected via a data acquisition card, the sensor signals were transferred to the computer for monitoring, recording, and further processing. The primary components used in the hydraulic plunger pump fault simulation test, along with their specific models and parameters, are detailed in Table 1.
To ensure the simulated faults accurately reflect actual fault conditions encountered in hydraulic plunger pumps, common faults were replicated through controlled fault injection, as outlined in Table 2. This simulation included typical fault states, such as swash plate wear, slipper wear, and loose slipper. During this process, operators introduced faults by physically intervening with key components of the hydraulic plunger pump under preset fault conditions, recording the real-time state of the faulted components. Photographs of the corresponding faulted components are shown in Figure 10.
During the experiments, the system pressure was set to 5 MPa, the sampling frequency was set to 10 kHz, and each sampling duration was 10 s. The collected data for normal and three fault conditions were stored in the computer for subsequent validation of the intelligent fault diagnosis algorithm.
3.2. Experimental Data Partitioning
To validate the effectiveness of the proposed intelligent diagnosis method, MIGLCC-DLSTM, the collected sound data from the hydraulic plunger pump experiment were analyzed. The time-domain waveforms and power spectra of the sound signals in different states are depicted in Figure 11. Although there are observable differences in the fluctuations and power spectrum distributions of sound signals across various states, these differences are not readily distinguishable merely by observing the time-domain waveforms and power spectra, making it challenging to accurately identify different fault types of the hydraulic plunger pump.
To accurately analyze the local features and dynamic changes within the signals while enhancing the efficiency of the analysis process and the model’s applicability, it is necessary to appropriately segment the collected sound data. When dividing continuous time-series data into individual segments, each segment should fully cover one rotational period of the hydraulic plunger pump. Hence, each data segment should contain at least $N$ sampling points, where $N = 60 f_s / n_p$, with $f_s$ being the sampling frequency (in Hz) and $n_p$ being the pump speed (in r/min). Therefore, in this experiment, 1024 sampling points constituted 1 data segment, resulting in 97 data segments for each of the normal and three fault states, totaling 388 data segments across four states. The dataset was divided into training and testing sets based on different partitioning ratios to comprehensively evaluate the model’s performance. The specific partitioning method is shown in Table 3.
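The segmentation arithmetic can be checked with a short sketch. The sampling frequency, recording duration, and segment length are taken from the text; the pump speed of 1500 r/min is a purely hypothetical placeholder, since the actual speed is not stated in this section, and the segments are assumed to be non-overlapping.

```python
def min_points_per_revolution(fs_hz, speed_rpm):
    """N = 60 * fs / np: samples recorded during one shaft revolution."""
    return 60.0 * fs_hz / speed_rpm


def count_segments(fs_hz, duration_s, segment_len):
    """Number of complete, non-overlapping segments in one recording."""
    total_samples = int(round(fs_hz * duration_s))
    return total_samples // segment_len


FS_HZ = 10_000            # sampling frequency (Section 3.1)
DURATION_S = 10           # duration of each recording (Section 3.1)
SEGMENT_LEN = 1024        # segment length used in the experiment
ASSUMED_SPEED_RPM = 1500  # hypothetical value, not taken from the paper

n_min = min_points_per_revolution(FS_HZ, ASSUMED_SPEED_RPM)
segments_per_state = count_segments(FS_HZ, DURATION_S, SEGMENT_LEN)
total_segments = 4 * segments_per_state  # one normal and three fault states

print(n_min, segments_per_state, total_segments)
```

With 100,000 samples per recording, 97 complete segments of 1024 points fit into each state, which gives the 388 segments reported above. Under the assumed speed, one revolution spans 400 samples, so a 1024-point segment would cover more than one rotational period.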
3.3. MIGLCC Feature Extraction
In this experiment, the frame length was set to 256 sampling points (25.6 ms, typically 10–30 ms) [54], with an overlap region of 50% (12.8 ms, typically around 10 ms) [54] and a pre-emphasis coefficient of 0.97, and the Hamming window was applied. During the extraction of individual cepstral coefficient features, each data segment was divided into 7 frames, and the order L for different cepstral coefficients was set to 12, meaning that each frame’s feature vector had 12 dimensions. Consequently, a set of 7 feature samples, each with 12 dimensions, was extracted from each data segment. The different cepstral coefficient features under normal and three fault conditions are shown in Figure 12.
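The framing figures quoted above follow directly from the segment length, frame length, and overlap. The sketch below, with a hypothetical helper name, splits a 1024-point segment into overlapping frames and reproduces the 7 frames, the 25.6 ms frame duration, and the 12.8 ms frame shift at a 10 kHz sampling frequency.

```python
def split_into_frames(segment, frame_len=256, overlap=0.5):
    """Split a segment into overlapping frames; trailing samples that do not
    fill a complete frame are dropped."""
    hop = int(frame_len * (1.0 - overlap))
    if hop <= 0:
        raise ValueError("overlap must be smaller than 1")
    n_frames = 1 + (len(segment) - frame_len) // hop
    return [segment[i * hop : i * hop + frame_len] for i in range(n_frames)]


FS_HZ = 10_000
segment = list(range(1024))           # sample indices stand in for audio data
frames = split_into_frames(segment)

frame_ms = 1000.0 * 256 / FS_HZ       # duration of one frame
hop_ms = 1000.0 * 128 / FS_HZ         # shift between consecutive frames

print(len(frames), frame_ms, hop_ms)
```

The frame count is 1 + (1024 − 256) / 128 = 7, so each segment yields a 7 × 12 feature matrix per cepstral feature type.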
As observed in Figure 12, for the same state of the hydraulic plunger pump, the four single cepstral coefficient features take different values in the same frame and the same feature dimension. This indicates that each cepstral coefficient feature captures a different aspect of the same sound signal, highlighting the diversity and complementarity of these features and suggesting that fusing them could improve the analysis of hydraulic plunger pump sound signals. Moreover, the fluctuations of each individual cepstral coefficient differ only subtly across the four operational states, and some of these variations may be attributed to noise or temporal instability, although the features still exhibit a notable sensitivity to changes in the sound signals. Therefore, to enhance the stability and robustness of the analysis results, the individual cepstral coefficient features were fused for comprehensive analysis.
The hybrid cepstral feature MIGLCC used in this experiment comprised MFCC, IMFCC, GFCC, and LPCC, unified in dimension [48] and concatenated into vectors, followed by normalization. Each frame’s feature vector dimension was thus 48, resulting in a set of 7 feature samples, each with 48 dimensions, extracted from each data segment.
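The fusion step can be sketched as a per-frame concatenation followed by normalization. The text does not specify the normalization scheme, so z-score normalization per feature dimension is assumed here purely for illustration; the function names and the placeholder coefficient values are hypothetical and do not represent real cepstral features.

```python
import math


def fuse_frame(mfcc, imfcc, gfcc, lpcc):
    """Concatenate four 12-dimensional vectors into one 48-dimensional vector."""
    return list(mfcc) + list(imfcc) + list(gfcc) + list(lpcc)


def zscore_columns(frames):
    """Normalize each feature dimension to zero mean and unit variance
    (assumed scheme; constant columns are only centred)."""
    n = len(frames)
    dims = len(frames[0])
    normalized = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        column = [frame[d] for frame in frames]
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        scale = std if std > 0 else 1.0
        for i in range(n):
            normalized[i][d] = (frames[i][d] - mean) / scale
    return normalized


def toy_coeffs(frame_idx, scale):
    """Placeholder 12-value vector standing in for real cepstral coefficients."""
    return [scale * (frame_idx + 1) + k for k in range(12)]


# Build the 7 x 48 matrix for one data segment from placeholder values.
fused = [
    fuse_frame(toy_coeffs(f, 1.0), toy_coeffs(f, 2.0),
               toy_coeffs(f, 3.0), toy_coeffs(f, 4.0))
    for f in range(7)
]
miglcc = zscore_columns(fused)
print(len(miglcc), len(miglcc[0]))
```

In practice, the normalization statistics would typically be estimated on the training set and reused for the testing set rather than computed per segment as in this toy example.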
3.4. Network Parameter Configuration
In this experiment, we used the PyTorch deep learning framework (version 1.12.0+cu113) with Python 3.9 as the programming language. The runtime environment was configured using Anaconda, and PyCharm was employed as the integrated development environment. The operating system used was Windows 11, with 16.0 GB of memory, a 12th Gen Intel(R) Core(TM) i5-12500H CPU, and an NVIDIA GeForce RTX 3050 Laptop GPU. The experiment implemented a DLSTM network (subsequent sections include a sensitivity analysis of the parameters, where the impact of different network layers on model performance was examined, ultimately confirming the use of a two-layer LSTM network) [22]. In addition to the number of network layers, other critical parameters included the learning rate, the number of hidden units, dropout rate, batch size, and number of epochs. The specific configurations are detailed in Table 4.
3.5. Analysis of Experimental Results
3.5.1. Performance Comparison of Diagnostic Models with Different Data Partition Ratios
To compare the three different partitioning methods of the aforementioned dataset, the proposed MIGLCC-DLSTM method was employed for diagnosis. During the experiment, the input to the DLSTM network consisted of the extracted MIGLCC feature data from the hydraulic plunger pump sound signals. The dimensions of the input and output data for each layer of the network are detailed in Table 5. To ensure the accuracy of the test results, each test was repeated 10 times under the same parameter conditions. The average overall accuracy of the 10 tests, along with the precision, recall, and F1 score under the Macro criterion, were used as evaluation metrics. The results are presented in Table 6.
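The macro-averaged metrics can be computed from a confusion matrix as sketched below. The function name and the example matrix are hypothetical and are not the paper's results; the macro F1 score is assumed to be the mean of the per-class F1 scores, which is one of two common definitions.

```python
def macro_metrics(confusion):
    """Overall accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical 4-class example (0 normal, 1 swash plate wear,
# 2 slipper wear, 3 loose slipper); values are illustrative only.
example = [
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 2, 8],
]
metrics = macro_metrics(example)
print(metrics)
```

Because every class contributes equally to a macro average, a drop in recall for a single class, such as loose slipper in this toy matrix, is reflected in the macro scores even when the overall accuracy remains high.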
As observed, when the proportion of the training set in the dataset decreased from 70% to 20%, the overall accuracy of the MIGLCC-DLSTM model in diagnosing the normal state and the three fault states of the sound signal decreased from 99.41% to 98.88%, and the F1 score dropped from 0.9940 to 0.9887. This indicates that the MIGLCC-DLSTM model’s ability to capture overall data characteristics diminishes when the training data are reduced. To enhance the model’s capacity to capture long-term dependencies within the data and improve its overall performance and generalization ability, subsequent experiments were conducted using Dataset 1 (with a 7:3 split ratio). Nevertheless, despite the reduction in training data, the model’s overall accuracy and F1 score remained at a high level, demonstrating strong generalization capabilities, which is of significant importance for practical industrial applications.
3.5.2. Performance Comparison of Diagnostic Models with Different Cepstral Features
To validate the diagnostic superiority of the proposed hybrid cepstral feature MIGLCC within the DLSTM network, we selected individual cepstral features MFCC, IMFCC, GFCC, and LPCC, along with dual-feature combinations MICC (MFCC and IMFCC), MGCC (MFCC and GFCC), MLCC (MFCC and LPCC), IGCC (IMFCC and GFCC), ILCC (IMFCC and LPCC), and GLCC (GFCC and LPCC), as well as triple-feature combinations MIGCC (MFCC and IMFCC and GFCC), MILCC (MFCC and IMFCC and LPCC), MGLCC (MFCC and GFCC and LPCC), and IGLCC (IMFCC and GFCC and LPCC) for the comparative analysis. Each of these features was extracted from Dataset 1 and input into the DLSTM network for diagnosis. Evaluation was based on the average overall accuracy, precision, recall, and F1 score under the Macro criteria across 10 trials, with the experimental results presented in Table 7.
The results indicate that the proposed hybrid cepstral feature MIGLCC achieved an overall accuracy of 99.41%, a precision of 99.39%, a recall of 99.43%, and an F1 score of 0.9940. This demonstrates that the MIGLCC feature effectively combines the advantages of an MFCC, which provides good resolution for the low-frequency parts of sound data; an IMFCC, which offers higher resolution in the mid-to-high-frequency range; a GFCC, which exhibits robust noise interference resistance; and an LPCC, which reflects the resonant peak characteristics of the signal. By complementing each other’s strengths, an MIGLCC provides a more comprehensive description of the hydraulic plunger pump sound data. Compared to the average diagnostic results of a single cepstral feature, the overall accuracy improved by 10.09%, precision by 10.65%, recall by 9.72%, and F1 score by 0.1120, indicating that the single feature cannot fully capture the complexity and diversity of hydraulic plunger pump sound data. Compared to the dual-feature and triple-feature fused cepstral features, the hybrid cepstral feature MIGLCC demonstrated superior performance in overall accuracy, precision, recall, and F1 score. This indicates that an MIGLCC possesses richer information representation capabilities, capturing critical information in the hydraulic plunger pump sound data more comprehensively and accurately. This enhanced capability aids in distinguishing between different state categories, providing stronger robustness when dealing with hydraulic plunger pump sound data under various conditions.
The average classification accuracy of the MIGLCC-DLSTM model across multiple experiments is illustrated in Figure 13. For the four distinct operating conditions of the hydraulic plunger pump, the model demonstrates remarkable consistency, with a minimum recognition accuracy of 98.30% and a maximum of 100%. Over repeated experiments, the overall average accuracy reaches 99.41%, with a standard deviation of 0.664. These results highlight the MIGLCC-DLSTM method’s minimal susceptibility to random factors across trials, showcasing exceptional stability and robustness, thereby establishing a solid foundation for reliable application in complex industrial environments.
To further investigate the optimal intelligent diagnosis performance of the hybrid cepstral feature MIGLCC in the DLSTM network, confusion matrices for single trials under MFCC, IMFCC, MICC, MIGCC, MILCC, and MIGLCC features were presented, along with a detailed analysis of the DLSTM network performance with different feature inputs from the hydraulic plunger pump sound data. The confusion matrices are shown in Figure 14. In these matrices, 0 represents normal, 1 represents swash plate wear, 2 represents slipper wear, and 3 represents loose slipper.
Figure 14a,b illustrate the diagnostic results in the DLSTM network using single cepstral features MFCC and IMFCC, respectively. Both features achieve a 100% recall for the normal and slipper wear conditions, indicating their strong discriminative capability for these states in the acoustic data of the hydraulic plunger pump.
Figure 14c shows the confusion matrix for the diagnostic results of the dual-feature fused cepstral feature MICC in the DLSTM network. Compared with single cepstral features MFCC and IMFCC, while an MICC exhibits a slight reduction in recognition capability for swash plate wear, it improves the recall for loose slipper from 70.59% and 61.76% to 79.41%, highlighting the enhanced recognition rate for loose slipper achieved through feature fusion.
Figure 14d,e display the confusion matrices for the diagnostic results of the triple-feature fused cepstral features MIGCC and MILCC in the DLSTM network. Compared to the single cepstral feature MFCC, these fused features demonstrate a marked improvement in identifying loose slipper, with recall rising from 70.59% to 91.18% and 82.35%, respectively. Compared to the dual-feature fused cepstral feature MICC, although there is a slight reduction in recognition accuracy for the normal condition, an MIGCC and MILCC achieve a 95.83% recall for swash plate wear and recall of 91.18% and 82.35% for loose slipper, respectively, both higher than those of an MICC.
Figure 14f depicts the confusion matrix for the diagnosis results using the proposed four-feature fused cepstral feature MIGLCC in the DLSTM network. The recall for normal state, swash plate wear, and slipper wear reached 100%, and the recall for loose slipper is 97.06%. This demonstrates that MIGLCC features can effectively distinguish the sound data of the hydraulic plunger pump under different state types. Compared to single cepstral features and the dual-feature and triple-feature fusion methods, the MIGLCC exhibits clear advantages. This further illustrates that MIGLCC features fully integrate the strengths of various cepstral coefficients, showcasing the model’s robustness and generalization capability under multi-type fault experimental conditions.
3.5.3. Performance Comparison of Different Diagnostic Methods
To further validate the superiority of the proposed intelligent diagnosis method MIGLCC-DLSTM, it was compared with popular current diagnosis methods, including SVM, 1D-CNN, and RNN. The experiment used the four-feature fused MIGLCC data from the sound signals of the hydraulic plunger pump as input, and the average overall accuracy of 10 trials as the evaluation metric. The results are presented in Figure 15. The parameter settings for each method are as follows:
(1) SVM: The penalty factor C was set to 0.1, the kernel function type was the radial basis function, and the width parameter σ was 12. The principle of SVM is illustrated in Figure 16a.
(2) 1D-CNN: The network included an input layer, three convolutional layers (with 64, 128, and 256 filters; kernel size of 3; and stride of 1), three max-pooling layers (pooling window size of 2), two fully connected layers, and an output layer. During training, the ReLU activation function was used, with an Adam optimizer, a learning rate of 0.001, a dropout rate of 0.5, a batch size of 16, and 20 epochs. The network structure is shown in Figure 16b.
(3) RNN: The network comprised an input layer, hidden layers, and an output layer. During training, the Adam optimizer was used, with 32 hidden units, a learning rate of 0.001, a dropout rate of 0.2, a batch size of 16, and 20 epochs. The network structure is depicted in Figure 16c.
With the SVM method, a classical machine learning approach, the overall diagnostic accuracy reached 92.31%. The 1D-CNN increased the accuracy to 95.56%, with 913,668 parameters and a runtime of 47.18 s. The traditional RNN achieved an overall diagnostic accuracy of 95.04%, with 23,300 parameters and a runtime of 25.66 s. In contrast, the MIGLCC-DLSTM method proposed in this paper achieved a diagnostic accuracy of 99.41%, with 223,748 parameters (roughly a quarter of those of the 1D-CNN) and the shortest runtime of the three networks, 22.53 s. Although the SVM and RNN methods feature lower model complexity, their accuracy and ability to capture complex patterns in the fault diagnosis of hydraulic plunger pump sound signals lag behind the proposed method. While the 1D-CNN method improved diagnostic accuracy over the SVM and RNN, its large parameter count substantially increased computational cost. In comparison, the MIGLCC-DLSTM intelligent diagnostic method captures key information from the data more effectively, achieving the highest diagnostic accuracy with a moderate parameter count and the shortest runtime among the networks compared. This provides a more feasible solution for diagnosing hydraulic plunger pump sound signals.
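The parameter count reported for the DLSTM can be related to the network dimensions with a short calculation. The sketch below assumes the parameter layout of a standard stacked LSTM as implemented by PyTorch's `nn.LSTM` (input weights, recurrent weights, and two bias vectors for each of the four gates), a 48-dimensional MIGLCC input, four output classes, and a hidden size of 128. The hidden size is an assumption, since the configured value is listed in Table 4, which is not reproduced here.

```python
def lstm_layer_params(input_size, hidden_size):
    """Trainable parameters of one LSTM layer: four gates, each with input
    weights, recurrent weights, and two bias vectors."""
    return 4 * (
        hidden_size * input_size
        + hidden_size * hidden_size
        + 2 * hidden_size
    )


def stacked_lstm_classifier_params(input_size, hidden_size, num_layers, num_classes):
    """Parameters of a stacked LSTM followed by one fully connected layer."""
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += lstm_layer_params(layer_input, hidden_size)
        layer_input = hidden_size  # deeper layers read the previous hidden state
    total += hidden_size * num_classes + num_classes  # FC weights and biases
    return total


# 48-dimensional input, assumed hidden size 128, two layers, four classes.
dlstm_params = stacked_lstm_classifier_params(48, 128, 2, 4)
print(dlstm_params)
```

Under these assumptions the count is 223,748, which matches the figure quoted above; dropout layers add no trainable parameters.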
3.5.4. Performance Analysis of the Diagnostic Model Under Multiple Operating Conditions
To further validate the applicability of the proposed MIGLCC-DLSTM intelligent diagnostic method under various operating conditions of the hydraulic plunger pump, different working conditions were simulated by altering the pressure of the test pump. Pressure is one of the most critical parameters in hydraulic systems, effectively reflecting the system’s state under varying loads. Sound data were collected under pressures of 2 MPa, 8 MPa, 10 MPa, and 15 MPa for the hydraulic plunger pump in its normal state as well as under conditions of swash plate wear, slipper wear, and loose slipper. All other experimental conditions remained unchanged to ensure the model’s capability of accurately diagnosing faults across a wide range of load conditions. The sound data were processed using the proposed MIGLCC feature extraction method and then input into the DLSTM network for diagnosis. The model’s performance was evaluated using the average overall accuracy, precision, recall, and F1 score under the Macro criterion across 10 trials.
Table 8 presents the test results of the MIGLCC-DLSTM intelligent diagnostic model under different operating conditions.
The results show that under the operating conditions of 2 MPa, 8 MPa, 10 MPa, and 15 MPa, the overall diagnostic accuracy of the MIGLCC-DLSTM model reached 98.89%, 99.40%, 98.63%, and 98.97%, respectively, with corresponding F1 scores of 0.9885, 0.9939, 0.9841, and 0.9895. These findings demonstrate that the MIGLCC-DLSTM model maintains excellent diagnostic performance across different operating conditions of the hydraulic plunger pump. This highlights not only the model’s strong fault recognition capability under single operating conditions but also its ability to sustain high diagnostic accuracy and generalization across varying conditions. Future research could extend to more practical scenarios, exploring the model’s diagnostic performance under varying environmental temperatures, rotational speeds, and other factors to further enhance its robustness and practical application value.
3.6. Parameter Sensitivity
Compared to single-layer neural networks, multi-layer neural networks exhibit enhanced capabilities in feature extraction, particularly in handling complex signals. By increasing the number of layers in an LSTM network, the representational capacity of the network can be improved, enabling it to effectively learn abstract features from high-dimensional time-series data and thereby enhance the model’s recognition accuracy. However, as the number of layers increases, the complexity of the model also rises, leading to longer training times and a higher risk of overfitting. Therefore, this study evaluated the performance of LSTM networks with different numbers of layers using the extracted MIGLCC feature data. The experimental results are shown in Figure 17.
It can be observed that in diagnosing the sound data of the hydraulic plunger pump, the model achieved an overall diagnostic accuracy of 99.41% with a training time of 22.53 s when the LSTM network had two layers. This configuration demonstrated a significant advantage in overall accuracy compared to network structures with one or four layers. When compared to a three-layer network structure, the two-layer network achieved the same overall diagnostic accuracy but required less training time, indicating higher efficiency. Considering a balance between overall diagnostic accuracy and training time, this study ultimately employed a two-layer LSTM network structure for the intelligent diagnosis task to achieve optimal model performance.
3.7. Visualization of Feature Representations
Through repeated experimental validation, the proposed intelligent diagnosis method, MIGLCC-DLSTM, demonstrated outstanding recognition capabilities across four different state types of hydraulic plunger pump sound signals. To intuitively illustrate the feature learning process of the MIGLCC-DLSTM diagnostic model, we employed t-distributed Stochastic Neighbor Embedding (t-SNE) to analyze the hydraulic plunger pump sound data. The feature clustering results are presented in Figure 18. Here, 0 represents normal, 1 represents swash plate wear, 2 represents slipper wear, and 3 represents loose slipper, while component1 and component2 are the two dimensions of the t-SNE embedding obtained after dimensionality reduction. As shown in Figure 18a, the original sound data appear as scattered, disorganized points. After extracting MIGLCC features, the data for different state types start to cluster, although not very tightly. The normal state features are somewhat separated from those of the other three state categories, while the features for swash plate wear, slipper wear, and loose slipper lie closer together, with some overlap between the slipper wear and loose slipper features. Following processing through the LSTM1 and LSTM2 layers of the DLSTM network, data of the same state type become more densely clustered, and the distances between clusters of different state types increase further. Finally, after the data pass through the FC layer of the DLSTM network, four distinct clusters can be observed, indicating a markedly improved separation between classes. The results indicate that the intelligent diagnosis method, MIGLCC-DLSTM, possesses robust recognition and classification capabilities for hydraulic plunger pump sound signals.
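The progressive tightening and separation of clusters described above is a visual judgment. One simple way to express it numerically, offered here only as an illustrative sketch and not as part of the method used in this study, is the ratio of the mean distance between class centroids to the mean distance of points from their own centroid. The function name and the toy points are hypothetical; in practice the inputs would be the 2-D embeddings (or the layer activations) and their state labels.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def separation_ratio(features, labels):
    """Mean between-centroid distance divided by mean within-class distance.
    Larger values mean tighter, better-separated clusters."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    centroids = {y: centroid(pts) for y, pts in groups.items()}

    within = [math.dist(x, centroids[y]) for x, y in zip(features, labels)]
    mean_within = sum(within) / len(within)

    keys = sorted(centroids)
    between = [
        math.dist(centroids[a], centroids[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    ]
    mean_between = sum(between) / len(between)

    return math.inf if mean_within == 0 else mean_between / mean_within


# Toy 2-D points (hypothetical): loosely mixed classes vs. tight, distant classes.
loose = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
tight = [(0.0, 0.0), (0.2, 0.0), (5.0, 0.0), (5.2, 0.0)]
toy_labels = [0, 0, 1, 1]

if __name__ == "__main__":
    print(separation_ratio(loose, toy_labels))
    print(separation_ratio(tight, toy_labels))
```

Because t-SNE does not preserve global distances faithfully, such a ratio is more meaningful when computed on the layer outputs themselves than on the 2-D embedding, and it should be read as a rough complement to the plots rather than a replacement for them.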
5. Conclusions
This paper proposes an intelligent diagnostic method for hydraulic plunger pumps based on sound signals, using the MIGLCC-DLSTM model to analyze the collected sound data. Through cepstral analysis of the sound signals, four distinct features (MFCC, IMFCC, GFCC, and LPCC) were extracted and fused into a novel mixed cepstral feature, MIGLCC, which was then fed into the DLSTM network for diagnosis. The results demonstrate the following:
The MIGLCC feature effectively integrates the strengths of the individual cepstral features, capturing both high- and low-frequency information and resonance peak characteristics while remaining resilient to noise. The method achieved an overall diagnostic accuracy of 99.41%, significantly surpassing single-feature methods as well as dual- and triple-feature fusion methods, thereby demonstrating its superior ability to represent the complexities and nuances of hydraulic plunger pump sound signals. In comparative analyses with other diagnostic approaches, MIGLCC-DLSTM exhibited superior performance with a total of 223,748 parameters and a training time of 22.53 s, showing good control over model complexity and computational cost.
Furthermore, under various operational conditions, ranging from 2 MPa to 15 MPa, the method maintained a high diagnostic accuracy of 98.63% to 99.41%, underscoring its robust fault detection capabilities and remarkable generalization across different scenarios. By virtue of its outstanding feature extraction capabilities, high accuracy, and exceptional operational efficiency, the MIGLCC-DLSTM intelligent diagnostic method presents a highly effective and reliable solution for hydraulic plunger pump fault diagnosis, offering vast potential for industrial applications. Additionally, when applied to other monitored systems, such as bearings and servo motors, this method continued to deliver excellent diagnostic performance, further affirming its versatility and broad applicability.
To further enhance the accuracy and practicality of the diagnostic model, future research will focus on integrating the physical characteristics of hydraulic systems with data-driven deep learning approaches, aiming to develop hybrid models that reflect the system’s internal mechanisms. This would improve both the interpretability and robustness of fault diagnosis and help the model remain reliable when novel fault types appear. Additionally, more sophisticated feature fusion techniques, such as attention mechanisms and multi-level fusion, will be explored to better combine acoustic features and further boost diagnostic performance. Finally, a deeper investigation into how the acoustic features of hydraulic pumps vary under different operational conditions, and into the underlying physical mechanisms, will refine the feature extraction process, improving both the accuracy and timeliness of fault detection. These efforts will contribute to more precise, generalizable, and efficient fault diagnosis systems in industrial applications.