1. Introduction
The intelligent operation and maintenance of modern road infrastructure are highly dependent on long-term, high-fidelity in situ monitoring data. Sensors embedded within asphalt pavements serve as a critical means of acquiring internal structural responses and evaluating service status [1,2]. However, subjected to harsh service conditions—including construction disturbances, high-frequency cyclic loading, steep temperature gradients, and environmental corrosion—these sensors are prone to performance degradation and various types of faults. This leads to distorted monitoring data, which can subsequently trigger erroneous maintenance decisions. Therefore, developing a precise, efficient, and robust automated fault diagnosis method for these sensors is of great significance for ensuring data quality and solidifying the foundation for concepts such as “smart roads”.
As a core component for ensuring the reliability of Structural Health Monitoring (SHM) systems, sensor fault diagnosis has evolved along several technological paths. Model-based diagnostic methods identify faults by comparing the residuals between predictions from a mathematical or physical model and actual measurements [3]. Yan et al. [4] applied the generalized quasi-natural ratio test for self-diagnosis of sensors in building SHM systems, while Lydakis et al. [5] identified sensor faults by establishing an overdetermined system between measurement signals and the actual structural motion. Li et al. [6] utilized the generalized likelihood ratio and correlation coefficients to detect sensor faults in bridge SHM systems. Although such methods are physically interpretable, their diagnostic accuracy is highly dependent on model fidelity. Establishing a high-fidelity model for a system as complex as an asphalt pavement—characterized by material nonlinearity [7,8,9], intricate interlayer interactions, and variable loading and environmental conditions—is exceptionally challenging.
To overcome this limitation, data-driven fault diagnosis methods have emerged. Their primary advantage lies in their ability to learn fault patterns directly from monitoring data, thereby circumventing the need for precise models and demonstrating greater flexibility and applicability in complex systems. Traditional data-driven approaches rely on signal processing techniques for feature extraction, such as Principal Component Analysis (PCA) [10,11] and Wavelet Transform (WT) [12]. However, PCA’s inherent linear assumptions hinder its ability to capture nonlinear relationships [13,14], while the performance of WT is sensitive to the choice of wavelet basis and decomposition levels [15,16], and its robustness and sensitivity may be limited in the context of strong background noise and weak fault signals in road engineering.
In recent years, Artificial Intelligence (AI) techniques, particularly machine learning and deep learning, have gained prominence in fault diagnosis due to their powerful nonlinear mapping and automatic feature learning capabilities [17,18,19]. While traditional machine learning methods like Support Vector Machine (SVM) [20] and K-Nearest Neighbors (KNN) [21] have been applied, their effectiveness is constrained by the quality of hand-crafted features. Deep learning methods, on the other hand, show immense potential by automatically learning deep, abstract features from raw or minimally processed data [22]. Jana et al. [23] combined a CNN and a CAE for real-time fault handling; Tang et al. [24] utilized a CNN to fuse multi-source information for sensor anomaly detection; and Li et al. [25] employed a Transformer-enhanced DenseNet to achieve precise fault localization. Furthermore, promising results have been achieved using LSTM-based fault classification and isolation [26,27], specific deep learning architectures for fault detection and signal reconstruction [28,29,30], multi-instance learning for sensor failure problems [31], and hybrid methods combining wavelet analysis with neural networks [32,33]. Notably, the Attention Mechanism, a state-of-the-art technique, has already been shown to significantly enhance diagnostic accuracy in complex engineering systems by enabling models to dynamically focus on critical segments of time-series data [34,35,36].
However, a significant disconnect persists between these advanced techniques and their application within pavement engineering. The application of deep learning is a relatively recent development in pavement engineering, a field historically dominated by research into materials and structures [37,38,39]. Current deep learning research in the field is predominantly focused on “downstream” tasks, such as pavement performance prediction [40,41,42] and image-based distress detection [43,44,45], which implicitly assume the integrity and accuracy of their underlying data sources. This prevailing oversight highlights a critical research gap: the absence of robust diagnostic methods for the “upstream,” foundational challenge of ensuring embedded sensor reliability. This gap is not accidental; it stems from two fundamental and intertwined challenges inherent to the domain: (1) Scarcity of Domain-Specific Fault Knowledge and Data: The operational environment for sensors embedded in asphalt—subjected to extreme temperatures and compaction pressures, and non-replaceable once installed—is unique. Consequently, their failure modes, evolutionary paths, and signal signatures remain poorly characterized. Moreover, the difficulty of acquiring well-labeled, in-service fault data presents a formidable barrier to developing and validating specialized data-driven models. (2) Inherent Complexity of Compound Faults: Under real-world conditions, sensor failures rarely occur in isolation. Instead, they often manifest as “compound faults,” in which multiple fault mechanisms coexist and produce complex, latent signal patterns. These composite failures pose a severe challenge to the robustness and resolution of existing methods, which are ill-equipped for such diagnostic complexity.
To address these challenges, this paper introduces a problem-driven, intelligent diagnostic framework that makes three primary contributions: (1) Systematic Problem Formulation. This study systematically defines the unique failure modes of embedded pavement sensors, including compound faults, and proposes a novel sample construction method to resolve the representational challenges posed by heterogeneous time-scale features in the sensor data. (2) A Novel “Decomposition-Focus-Fusion” Architecture. A specialized architecture is designed to tackle feature heterogeneity, employing parallel, independently pre-trained sub-models to “focus” on two disparate feature sets: short-term statistical features and long-term wavelet coefficients. (3) Attention-Based Intelligent Fusion. An attention mechanism is leveraged to intelligently weigh and fuse the outputs of the expert sub-models. This targeted fusion is critical for accurately decoupling and identifying complex compound-mode failures that are intractable for conventional methods. Through this “Decomposition-Focus-Fusion” strategy, the model efficiently and accurately identifies eight sensor operational states, including complex compound faults, thereby providing high-quality foundational data to support the perception, interpretation, diagnosis, and evaluation of pavement performance.
2. Fault Mode Definition and Sample Construction for Embedded Pavement Sensors
Effective fault diagnosis for sensors relies on a comprehensive and realistic fault dataset. In the context of asphalt pavement engineering, however, obtaining labeled real-world fault data is exceptionally difficult due to two primary challenges: data scarcity and labeling complexity. To address this, this study employs a fine-grained fault injection and labeling strategy to generate a synthetic dataset. This strategy systematically defines typical sensor fault modes and resolves the challenge of integrating fault features across different time scales during sample construction.
2.1. Fault Mode Definition
Based on their temporal characteristics, sensor faults are categorized into three types: Short-Term Faults, characterized by sudden, second-scale changes and anomalous fluctuations; Long-Term Faults, characterized by slow, continuous baseline deviations over minutes, hours, or longer; and Compound Faults, where both types occur concurrently.
2.1.1. Short-Term Faults
Short-term faults encompass a range of modes, including complete failure (e.g., no signal), high-frequency noise, freezing (i.e., a stuck constant value), and calibration errors. Calibration errors typically manifest in the signal as bias or gain faults, which are core fault types in this study. While faults such as complete failure or freezing are important, they produce highly distinct and readily identifiable signatures (e.g., zero output or constant variance) that can be effectively detected by simpler statistical checks. This study therefore deliberately focuses on three classic short-term faults whose signal characteristics are subtle and easily confused with normal operation or with each other, and which thus pose a greater diagnostic challenge: Bias, Gain, and Detachment.
- (1) Bias
A bias fault manifests as a nearly constant offset superimposed on the sensor’s output signal, often accompanied by some level of noise fluctuation (Figure 1). For embedded pavement sensors, this can be caused by aging of a sensing element, deformation of a packaging material, or micro-changes in the interface state between the sensor and pavement structure. Its mathematical model can be expressed as:

xf(t) = x(t) + b + σb(t)

where x(t) is the original clean signal, xf(t) is the faulty signal (as in subsequent equations), b is the bias magnitude, and σb(t) is additive Gaussian white noise.
- (2) Gain
A gain fault refers to an unintended change in sensor sensitivity, leading to a systematic amplification of the output signal’s amplitude (Figure 2). This typically originates from alterations in the sensing element’s response characteristics or a failure of calibration parameters. Its model is represented as:

xf(t) = G·x(t) + σg(t)

where G is the gain factor that simulates the abnormal signal amplification, and σg(t) is additive Gaussian white noise.
- (3) Detachment
A detachment or decoupling fault occurs when the mechanical transfer path between the sensor and the host pavement structure deteriorates or is interrupted. This prevents the sensor from accurately perceiving the true changes in the structural response, manifesting as a significant signal amplitude decay accompanied by abnormal drift and increased noise (Figure 3). Common causes include adhesive aging, interface delamination, or improper installation. Its model combines signal decay, a drift term, and a noise term:

xf(t) = e^(−α·f(t′))·x(t) + d(t′) + σd(t′)

where t′ is the time elapsed since the detachment fault began, e^(−α·f(t′)) is an exponential decay factor (with rate α) representing the signal attenuation, and the function of time f(t′) is implemented as a power-law function of normalized time. d(t′) represents a potential free oscillation or irregular drift component after decoupling, and σd(t′) is additive random noise.
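To make the three short-term fault models concrete, the sketch below injects each of them into a clean signal with NumPy. It is an illustrative reading of the models above, not the paper's implementation: the argument names (`b`, `g`, `alpha`, `p`) and the low-frequency sinusoid chosen for the post-decoupling drift component d(t′) are our own assumptions.

```python
import numpy as np

def inject_bias(x, b, noise_std, rng):
    """Bias fault: xf(t) = x(t) + b + noise."""
    return x + b + rng.normal(0.0, noise_std, x.shape)

def inject_gain(x, g, noise_std, rng):
    """Gain fault: xf(t) = g * x(t) + noise."""
    return g * x + rng.normal(0.0, noise_std, x.shape)

def inject_detachment(x, start, alpha, p, noise_std, rng):
    """Detachment fault from index `start` onward: exponential decay whose
    argument is a power-law of normalized elapsed time, plus an (assumed)
    sinusoidal drift term d(t') and additive noise."""
    y = x.astype(float).copy()
    n_f = len(x) - start
    u = np.arange(n_f) / max(n_f, 1)       # normalized elapsed time t'
    decay = np.exp(-alpha * u ** p)        # e^(-alpha * f(t')), f power-law
    drift = 0.05 * np.sin(2 * np.pi * u)   # illustrative stand-in for d(t')
    y[start:] = decay * x[start:] + drift + rng.normal(0.0, noise_std, n_f)
    return y
```

With `noise_std = 0` the bias and gain injections reduce exactly to the deterministic parts of the models, which makes them easy to sanity-check.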
2.1.2. Long-Term Fault
Drift is the archetypal long-term fault, characterized by a slow, continuous, and systematic deviation of the sensor’s baseline over minute-to-hour time scales (Figure 4). It is typically caused by factors such as sensor element aging, packaging material creep, and the cumulative effects of uncompensated long-term temperature variations. This study employs a combined linear and quadratic model to simulate this process:

xf(t) = x(t) + d(t)

where the drift term d(t) begins to increase at t > tstart and is defined as:

d(t) = K1(t − tstart) + K2(t − tstart)², for t > tstart (and 0 otherwise)

The term K1(t − tstart) represents a quasi-linear drift trend, often caused by factors like periodic temperature changes, while K2(t − tstart)² represents a nonlinear trend, often associated with accelerated material aging. The drift rate coefficients K1 and K2 are determined based on the signal’s dynamic range and the expected severity of the drift.
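The combined linear-plus-quadratic drift model can be sketched in the same style; the function below is our illustrative reading, with `k1`, `k2`, and `t_start` standing in for the parameters that the study draws from Table 1.

```python
import numpy as np

def inject_drift(x, t_start, k1, k2, fs=100.0):
    """Drift fault: xf(t) = x(t) + d(t), with
    d(t) = k1*(t - t_start) + k2*(t - t_start)**2 for t > t_start, else 0.
    `fs` is the sampling frequency in Hz (100 Hz in this study)."""
    t = np.arange(len(x)) / fs
    dt = np.maximum(t - t_start, 0.0)   # zero before the drift onset
    return x + k1 * dt + k2 * dt ** 2
```

Before `t_start` the output equals the clean signal; afterwards the quadratic term gradually dominates, mimicking accelerated aging.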
2.1.3. Compound Faults
Under complex in-service conditions, an embedded pavement sensor may experience the concurrent or sequential effects of multiple fault modes, forming a Compound Fault. In this study, a Compound Fault specifically refers to the co-occurrence of a long-term fault (drift) and a short-term fault (bias, gain, or detachment). As shown in Figure 5, the features of such faults are complex and subtle, differing only slightly from their single-fault counterparts and thus posing a greater challenge to diagnostic models.
To ensure the diversity and realism of the generated samples, the key parameters in the fault models described in this section were randomly drawn from predefined ranges that are both physically meaningful and produce distinct fault features. The detailed parameters and their value ranges are provided in Table 1.
2.2. Differentiated Sample Construction Method
To address the significant feature disparities between short-term (e.g., bias) and long-term (e.g., drift) faults and their conflicting requirements for analysis window length, this study proposes a differentiated sample construction strategy. The core of this strategy is to establish a “baseline analysis window length” sufficient to capture the complete evolution of a long-term fault and then apply customized injection and labeling rules for different fault types within this unified dimension. This approach aims to maximize the distinctiveness of each fault’s features while maintaining a consistent input dimension for the model, thereby providing optimized data for subsequent staged training. The construction logic is as follows:
- (1) Drift Fault Sample Construction: As shown in Figure 6, a drift fault is injected globally into a long base signal, which is then segmented using a sliding window of the “baseline analysis window length.” To effectively distinguish true drift from normal baseline fluctuations, a significance threshold is introduced. A window is assigned a fault label only if the drift component within it exceeds this threshold.
- (2) Short-Term Fault Sample Construction: To prevent the dilution or truncation of transient features during sliding window segmentation, a complete short-term fault is directly injected into a single signal segment of the “baseline analysis window length” (Figure 7). This entire segment constitutes an independent sample, a method that preserves the integrity of the fault features while maintaining a uniform sample dimension.
- (3) Compound Fault Sample Construction: To simulate complex fault superimposition, a long-term fault is first injected globally into the base signal, followed by the addition of a short-term fault at a random time point. Samples are generated using the same sliding window approach (Figure 8). To ensure labeling accuracy, a window is assigned a compound fault label only if two conditions are met simultaneously: first, the long-term fault component must exceed the aforementioned significance threshold; second, to address the issue of feature submergence when a short-term fault occupies only a small portion of a window, an overlap ratio threshold is introduced. A window is identified as a valid compound fault sample only if the duration of the short-term fault exceeds this overlap ratio, thereby preventing mislabeling due to minor overlaps.
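The labeling rules above can be condensed into a small decision function. This is a hedged sketch: the quantities `drift_frac` (relative magnitude of the drift component within the window) and `overlap_frac` (fraction of the window occupied by the short-term fault) are our own parameterization of the two thresholds; the 5% and 20% defaults follow the settings stated later in the text.

```python
def label_window(drift_frac, overlap_frac,
                 drift_thresh=0.05, overlap_thresh=0.20):
    """Assign a window label from the two threshold conditions:
    - drift is significant only if drift_frac exceeds drift_thresh;
    - a short-term fault counts only if its window overlap exceeds
      overlap_thresh (guarding against feature submergence)."""
    has_drift = drift_frac > drift_thresh
    has_short = overlap_frac > overlap_thresh
    if has_drift and has_short:
        return "compound"
    if has_drift:
        return "drift"
    if has_short:
        return "short-term"
    return "normal"
```

For example, a window whose drift component is significant but whose short-term fault covers only 10% of the window is labeled plain "drift", not "compound".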
All fault injection parameters were randomly drawn from predefined ranges to enhance sample diversity. Finally, Random Oversampling with Replacement was applied to balance the dataset by increasing the number of samples in minority fault classes, thereby improving the model’s generalization capabilities.
2.3. Specific Dataset Construction
Based on the differentiated sample construction method described above, this study utilized healthy sensor signals (sampling frequency 100 Hz) obtained from the full-scale Accelerated Pavement Testing (APT) facility at Changsha University of Science & Technology as the data foundation. To adequately capture the evolutionary characteristics of long-term faults, a 100-s (10,000-point) signal was selected as the injection base. The drift fault was initiated at 15 s. Through experimentation, the baseline window length was determined to be 500 points (5 s), with a step size of 25 points. The diagnostic threshold for drift and the overlap ratio threshold for short-term faults were set to 5% and 20%, respectively.
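Under these settings, the number of sliding-window samples per base signal follows directly from the window arithmetic; a minimal sketch with a hypothetical helper name:

```python
def sliding_windows(x, win_len, step):
    """Segment a signal into overlapping fixed-length windows
    (illustrative helper, not the paper's code)."""
    return [x[i:i + win_len] for i in range(0, len(x) - win_len + 1, step)]

# 10,000-point base signal, 500-point window, 25-point step:
# (10000 - 500) // 25 + 1 = 381 windows of 500 points each
windows = sliding_windows(list(range(10000)), 500, 25)
```

Each 100-s base signal thus yields 381 candidate windows, which are then labeled by the threshold rules of Section 2.2.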
To implement the “Decomposition-Focus-Fusion” training strategy (detailed in Section 3.4), three distinct datasets were generated: a Short-Term Fault Dataset and a Long-Term Fault Dataset for the independent pre-training of the specialist sub-models, and a comprehensive Mixed-Fault Dataset for the subsequent training and evaluation of the final fusion model. The composition of these purpose-driven datasets is detailed in Table 2.
The class distribution in Table 2 reveals a notable imbalance, with the ‘drift’ class being the largest. This imbalance is a direct consequence of the differentiated sample construction method (Section 2.2). ‘Drift’ samples are generated by applying a sliding window to a long signal with a globally injected continuous fault—a process that naturally yields a large number of samples—whereas short-term faults are injected into discrete signal segments. To mitigate the risk of overfitting potentially introduced by this class imbalance and the use of oversampling, a comprehensive suite of regularization techniques was implemented during model construction and training. These include high-rate Dropout, an Early Stopping mechanism, a staged training and fine-tuning approach, Weight Decay (L2 regularization), and the use of a Focal Loss function.
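Of the regularizers listed, Focal Loss is the one most directly aimed at class imbalance: it down-weights well-classified (typically majority-class) samples by a factor (1 − p_t)^γ. The paper does not specify its exact variant or γ, so the unweighted multi-class NumPy sketch below is an assumption for illustration.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Mean focal loss -(1 - p_t)**gamma * log(p_t) over samples.
    `probs` is an (n, k) array of class probabilities, `labels` the
    integer true classes. gamma=0 recovers plain cross-entropy."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    pt = probs[np.arange(len(labels)), labels]  # probability of true class
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + eps)))
```

An already well-classified sample (p_t = 0.9) contributes only (0.1)² = 1% of its cross-entropy loss at γ = 2, which shifts the gradient budget toward hard minority-class samples.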
4. Experimental Setup, Results, and Discussion
4.1. Validation Case
To ensure the practical engineering relevance of this study, the baseline healthy data were sourced from the full-scale Accelerated Pavement Testing (APT) facility at Changsha University of Science & Technology. This facility, a large-scale indoor linear trafficking device, is a core technological means in pavement engineering for simulating the long-term service behavior of road structures, accelerating damage accumulation, and acquiring critical performance evolution data.
The APT facility features five representative asphalt pavement structures, including semi-rigid base, flexible base, full-depth, inverted, and a novel durable asphalt pavement structure (Figure 17b). The PaveMLS69 (Manufacturer: PaveTesting Limited, Letchworth, UK) loading system (Figure 17a) efficiently simulates heavy traffic loads, with key operational parameters—such as axle load (40–75 kN), tire pressure (500–1000 kPa), and loading speed—being precisely controllable. Data collected under this high-fidelity simulation of real-world service conditions formed the basis for the fault simulation and comprehensive dataset construction described in Section 2. This platform not only validates the effectiveness of the proposed method but also provides evidence for its potential application in real-world structural health monitoring [46].
4.2. Experimental Setup and Evaluation Metrics
- (1) Comparative Methods
To comprehensively evaluate the performance of the proposed framework, three representative methods were selected for benchmarking: two classic machine learning models, namely Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel and K-Nearest Neighbors (KNN) with K = 5; and one end-to-end Unified LSTM-Model with a complexity comparable to that of our short-term sub-model. To ensure a fair comparison, all models were evaluated on the same comprehensive fault dataset constructed in Section 2.3. The input data were standardized: SVM and KNN used the same multi-scale wavelet features as our sub-models, while the Unified LSTM-Model received a unified input sequence formed by directly concatenating the short-term and long-term features.
- (2) Evaluation Metrics
Performance was evaluated using the following core metrics: Overall Accuracy, Weighted F1-Score, Recall per Class, and the Confusion Matrix for error analysis. The detailed definitions of these metrics are provided in Table 8.
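Since Table 8 is not reproduced here, the sketch below shows one standard way to compute these four quantities from predictions, in our own notation (rows of the confusion matrix are true classes, columns are predictions).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """C[t, p] counts samples of true class t predicted as class p."""
    c = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        c[t, p] += 1
    return c

def summarize(c):
    """Overall accuracy, per-class recall, and weighted F1 from a
    confusion matrix."""
    diag = np.diag(c).astype(float)
    support = c.sum(axis=1)                       # samples per true class
    recall = diag / np.maximum(support, 1)
    precision = diag / np.maximum(c.sum(axis=0), 1)
    f1 = np.where(precision + recall > 0,
                  2 * precision * recall / np.maximum(precision + recall, 1e-12),
                  0.0)
    accuracy = diag.sum() / c.sum()
    weighted_f1 = float((support / support.sum() * f1).sum())
    return accuracy, recall, weighted_f1
```

The weighted F1 averages per-class F1 scores with class supports as weights, so it remains meaningful under the class imbalance discussed in Section 2.3.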
- (3) Validation Strategy
Two validation strategies were employed to comprehensively assess model performance. The primary analysis in this study was conducted using a single, stratified 80/20 train-test split, where 80% of the data were used for training and 20% for testing. To further validate the robustness of these findings and ensure they were not dependent on this specific data partition, a subsequent 5-fold cross-validation was also performed. The results from the initial 80/20 split are presented for the main comparative analysis, while the cross-validation results, reported as mean ± standard deviation, are provided as a supplementary validation of the model’s stability and reliability.
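An index-level sketch of the stratified 80/20 split is shown below. In practice a library routine such as scikit-learn's `train_test_split(..., stratify=labels)` would typically be used; this stand-alone helper (with our own name and seed handling) just makes the stratification explicit.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with the test fraction drawn
    separately from each class, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)
```

The same per-class grouping extends naturally to building the five stratified folds used in the cross-validation.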
4.3. Sub-Model Performance and Generalization Analysis
4.3.1. Performance on the Standard Test Set
Before evaluating the final fusion model, the performance of the independently pre-trained short-term and long-term fault diagnosis sub-models was examined to confirm their effectiveness as foundational modules. The experimental results (Figure 18) show that the short-term fault sub-model performed perfectly on its specialized test set, achieving an accuracy and a weighted F1-score of 1.0000. Its confusion matrix was completely diagonal, demonstrating the high efficiency and accuracy of the extracted multi-scale wavelet statistical features and the Bi-LSTM architecture in capturing and distinguishing transient fault characteristics. Similarly, the long-term fault sub-model also excelled on its dedicated test set containing only “normal” and “drift” states, achieving a test accuracy of 0.9935. This confirms the success of the strategy that combines lowest-frequency wavelet approximation coefficients with a stacked LSTM network for identifying slow drift trends. The outstanding performance of these two sub-models validates that the “expert” modules in our modular design possess a powerful capability for single-fault-type identification, laying a solid foundation for effective subsequent feature fusion.
4.3.2. Generalization and Robustness Analysis
Furthermore, the robustness of the sub-models against a reduced signal-to-noise ratio was evaluated. To simulate this condition, an additional layer of Gaussian white noise, with a standard deviation equivalent to 2.5% of each signal’s dynamic range, was added to the standard test set. The comparative performance of the sub-models under these conditions is summarized in Table 9.
The results in Table 9 indicate that, as expected, performance moderately decreased with the introduction of additional noise. The short-term model’s accuracy saw a degradation of approximately 3.5%, while the long-term model’s accuracy decreased by about 4.2%. Crucially, both models maintained a high accuracy of over 95%. This demonstrates a graceful degradation rather than a catastrophic failure, providing strong evidence that the learned features are robust and resilient to a realistic level of signal interference, thereby confirming the models’ potential for reliable field deployment.
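The noise condition used in this robustness test can be reproduced in a few lines; the helper name and seed handling below are our own, with only the 2.5%-of-dynamic-range rule taken from the text.

```python
import numpy as np

def add_relative_noise(x, frac=0.025, seed=0):
    """Add Gaussian white noise whose standard deviation equals `frac`
    of the signal's dynamic range (max - min)."""
    rng = np.random.default_rng(seed)
    std = frac * (float(x.max()) - float(x.min()))
    return x + rng.normal(0.0, std, x.shape)
```

Scaling the noise to each signal's dynamic range (rather than using a fixed absolute level) keeps the perturbation comparably severe across sensors with different output amplitudes.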
4.4. Fusion Model Performance and Comparative Analysis
To validate the training process and confirm the absence of overfitting, the training and validation history of the Fusion Model is presented in Figure 19.
The plots demonstrate a healthy training dynamic across both phases. The training and validation curves for accuracy and loss track each other closely, without any significant divergence. This confirms the effectiveness of the anti-overfitting strategies, thereby validating the generalization capability of the final model evaluated below.
Figure 20 and Table 10 summarize the confusion matrices and overall performance metrics of all models on the entire test set, providing an initial impression for the subsequent in-depth analysis.
As can be seen from Table 10, the proposed Fusion Model significantly outperforms all comparative methods in both test accuracy and F1-score. To further reveal the deeper differences in the diagnostic capabilities of the models, the following sections will conduct a detailed comparative analysis of their performance from two key dimensions: single-fault and compound-fault diagnosis.
4.4.1. Comparative Performance in Single Fault Diagnosis
Accurate diagnosis of single faults, especially distinguishing between those with similar or subtle features, is critical for assessing a model’s foundational capabilities.
Figure 21 illustrates the performance of each model on the five single-state classes.
As indicated by Figure 20 and Figure 21, distinguishing between the “normal” and “drift” states constitutes the primary challenge in single-fault diagnosis. The Fusion Model achieved a near-perfect distinction between “normal” (accuracy: 0.9843) and “drift” (accuracy: 0.9908) states, with virtually no mutual misclassification. This success is attributed to its long-term sub-model’s focused learning on the characteristic slow-varying trend of drift. In contrast, SVM and KNN, which rely on static feature snapshots and lack temporal modeling capabilities, exhibited severe confusion between these two states: SVM misclassified 169 “drift” samples as “normal,” while KNN misclassified 64 “normal” samples as “drift.” Although the Unified LSTM-Model possesses temporal capabilities, its unified architecture for processing heterogeneous features struggled to adequately focus on the key slow-varying characteristics that differentiate “normal” from “drift,” resulting in an accuracy of only 0.7411 for the “normal” state and significant mutual misclassification.
For the other three single-fault types with relatively distinct features, the Fusion Model also performed exceptionally well: “bias” (0.9524), “gain” (0.9921), and “detachment” (0.9870). This is credited to its short-term sub-model’s precise capture of sudden changes in statistical properties during transient faults. SVM performed well in identifying faults that cause distinct and stable changes in signal statistics, such as “bias” (0.9920) and “gain” (0.9760), but its accuracy slightly decreased for the more complex “detachment” fault (0.9421). KNN’s performance was notably poor on “bias” (0.4854) and “gain” (0.7600), as its distance-based metric is prone to sharp performance degradation when feature distributions are not compact or class boundaries are ambiguous; its accuracy on “detachment” (0.8144) was also suboptimal. The Unified LSTM-Model’s accuracy on these three short-term faults (0.9320, 0.9920, and 0.9375, respectively) was superior to that of SVM and KNN, demonstrating LSTM’s ability to learn temporal patterns. However, due to potential interference from the long-term feature information flow on the learning of short-term fault features within the unified model, its balance and overall precision still fell short of the Fusion Model.
4.4.2. Comparison of Compound Fault Diagnosis Performance
The diagnosis of compound faults poses a higher demand on an algorithm’s feature decoupling capability. As revealed by the confusion matrices in Figure 20 and Figure 22, the core challenge lies in distinguishing between a pure “drift” fault and a “drift + X” type compound fault, where a transient disturbance is superimposed.
The baseline models exhibited clear deficiencies in this regard. Both SVM and KNN generated substantial bi-directional confusion between pure “drift” and compound faults (SVM misclassified 55 “drift” samples as compound, while KNN misclassified 120 compound samples as “drift”). The root cause is their inability to separate superimposed patterns from static feature snapshots. The Unified LSTM-Model, although an improvement, still showed significant feature confusion (42 “drift” samples were misclassified), indicating that the direct input of heterogeneous features in a single model induces learning interference, hindering effective decoupling of the background and the disturbance.
In stark contrast, the Fusion Model demonstrated exceptional diagnostic capability, achieving near-perfect identification of all three compound fault types (accuracy > 0.97 for all) with almost no confusion with single faults (only 6 misclassifications). This success is attributed to its staged learning, which provides clean, single-fault representations, and its attention mechanism, which can then intelligently identify the specific transient disturbances superimposed on the “drift” background.
Furthermore, an interesting “accuracy paradox” was observed in the experimental results: for both KNN and the Unified LSTM, the diagnostic accuracy for certain compound faults (e.g., “drift + bias”) was higher than that for their corresponding single faults (e.g., “bias”). This is likely not due to an improvement in model capability but rather a statistical artifact—the combination of a weak feature and a strong feature becomes more separable in the feature space. Interestingly, the Fusion Model also exhibited a similar trend (“drift + bias” at 0.9949 > “bias” at 0.9524), but this reflects a true synergistic enhancement effect. In this mechanism, the long-term sub-model’s precise capture of the “drift” background provides a strong context, enabling the attention mechanism to more robustly identify the superimposed weak transient disturbance, which further highlights the inherent advantages of its architecture.
4.4.3. Robustness Validation with 5-Fold Cross-Validation
To provide a more robust evaluation of model performance and mitigate potential bias from a single data split, a 5-fold cross-validation was conducted on the entire mixed-fault dataset. For a fair comparison, all models were subjected to this validation procedure, with the final performance reported as the mean and standard deviation across the five folds. The overall performance comparison is presented in Table 11. To further dissect the diagnostic capabilities of each model, Table 12 provides a detailed comparison of the per-class recall (diagnostic accuracy) for all models across the eight distinct operational states.
The cross-validation results in Table 11 confirm the superior overall performance and stability of the proposed Fusion Model, as evidenced by its high mean accuracy and low standard deviation. Table 12 provides deeper insight, revealing that baseline models struggle significantly with specific, challenging classes. In contrast, the Fusion Model demonstrates consistently high recall and low variance across all fault types, including the complex compound faults, underscoring its reliability and advanced diagnostic capability.
4.4.4. Out-of-Distribution Generalization Test
To rigorously assess the generalization capability of the proposed framework, a stringent Out-of-Distribution (OOD) test was conducted, designed to simulate deployment in a completely novel engineering context. For this purpose, a new OOD test set was curated. The baseline healthy signals for this set originated from different road structures, sensor types, and axle loads, exhibiting distinct waveform characteristics compared to the training data (Figure 23). On this new baseline, faults of significantly greater severity than those seen during training were injected, with a parameter comparison detailed in Table 13. The model’s performance on this OOD test set is summarized in Table 14 and Figure 24.
As shown in Table 14, the model’s accuracy exhibited a graceful degradation from 98.82% on the in-distribution test set to 90.71% when faced with the dual OOD challenges of unseen signal morphologies and severe faults. This result strongly indicates that the fundamental, generalizable physical signatures of faults were learned, rather than the model merely overfitting to the training data distribution. Consequently, robust diagnostic performance is maintained even in a more challenging and previously unseen environment.
The confusion matrix (
Figure 24) provides deeper insights into the model’s behavior under extreme conditions. The primary misclassification was the confusion of drift + detach with drift + gain. This suggests that, on the new signal baseline, the feature representation of a severe detachment fault superimposed on a drift trend became highly similar to that of a severe gain fault under the same drift condition. This finding highlights the inherent complexity of real-world diagnostics while underscoring the framework’s sensitivity in capturing subtle feature variations.
4.5. Ablation Study
To systematically deconstruct the framework and quantify the contribution of its key architectural and methodological components, a series of ablation experiments were conducted. The performance of the full model was compared against several degraded versions, with the results summarized in
Table 15.
The ablation results in
Table 15 reveal a clear synergistic effect between the framework’s components. The transition from raw data (85.17%) to multi-scale DWT features (93.60%) establishes the critical role of feature extraction as the foundation for high performance. Building on this, the learning strategy proves paramount; the poor performance of the end-to-end trained fusion model (91.76%) confirms that the staged ‘Decomposition-Focus-Fusion’ paradigm is essential to resolve optimization conflicts between the heterogeneous feature streams. Finally, the attention mechanism provides the decisive refinement, with its inclusion (98.82%) yielding a significant boost over simple feature concatenation (93.99%) by intelligently resolving feature conflicts during fusion. Therefore, the framework’s success relies on this synergy: DWT provides discriminative features, staged training enables their effective learning, and attention optimizes their final fusion.
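The gain from raw data to multi-scale DWT features can be illustrated with a minimal sketch. The wavelet (Haar, for brevity), decomposition depth, and per-band statistics below are hypothetical stand-ins for the paper’s actual configuration, which is not restated here:

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the Haar DWT: returns (approximation, detail)."""
    x = x[: len(x) // 2 * 2]               # truncate to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (detail)
    return a, d

def multiscale_features(signal, levels=3):
    """Per-sub-band energy and spread as a simple multi-scale feature vector."""
    feats, a = [], np.asarray(signal, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt_step(a)
        feats += [np.sum(d ** 2), np.std(d)]   # detail-band energy + std
    feats += [np.sum(a ** 2), np.std(a)]       # final approximation band
    return np.array(feats)

window = np.sin(2 * np.pi * 5 * np.arange(500) / 100) + 0.1 * np.random.randn(500)
print(multiscale_features(window).shape)  # (8,) for 3 decomposition levels
```

Slow drifts concentrate in the approximation band while transient bias/detachment steps energize the detail bands, which is what makes such features more separable than raw samples.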
4.6. Analysis of the Attention Mechanism
To further investigate the role of the attention mechanism in feature fusion, the attention weight distributions corresponding to randomly selected samples from the test set are visualized in
Figure 25a–d.
As shown in
Figure 25a,b, the attention weight distributions for the short-term and long-term features are bimodal, with peaks near 0 and 1 (means of 0.456 and 0.544, respectively), indicating that the model does not statically or uniformly allocate weights. Instead, when a feature of one time scale (e.g., “drift”) dominates, its weight approaches 1 while the weight of the other feature correspondingly approaches 0. This dynamic trade-off is further confirmed by the scatter plot in
Figure 25c, where the weight pairs are tightly clustered along the line on which the two weights sum to one (with a Pearson correlation coefficient close to −1.0). The box plot in
Figure 25d compares the statistical distributions of the two weights, showing that the long-term feature weights have greater dispersion and more outliers in the high-weight region. This may reflect the model’s heightened focus on diverse and persistent “drift” patterns. These visualizations collectively confirm that the attention mechanism intelligently assigns weights to features of different time scales according to the real-time characteristics of the input signal, achieving effective fusion of heterogeneous information; this dynamic weighting is a key enabler of the Fusion Model’s exceptional diagnostic performance. Additionally, the introduction of Focal Loss, by compelling the model to focus on hard-to-classify samples, further enhanced its ability to discriminate subtle fault differences and effectively mitigated the class imbalance problem.
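The sum-to-one gating behavior described above can be sketched minimally. The scoring vectors and the 16-dimensional feature size below are hypothetical placeholders, not the paper’s learned parameters:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_fuse(f_short, f_long, w_score_s, w_score_l):
    """Score each feature stream, softmax to sum-to-one weights, then fuse.

    w_score_s / w_score_l stand in for a learned scoring layer."""
    scores = np.array([f_short @ w_score_s, f_long @ w_score_l])
    w = softmax(scores)                        # w[0] + w[1] == 1
    fused = w[0] * f_short + w[1] * f_long     # assumes equal feature dims
    return fused, w

rng = np.random.default_rng(0)
f_s, f_l = rng.normal(size=16), rng.normal(size=16)
fused, w = attention_fuse(f_s, f_l, rng.normal(size=16), rng.normal(size=16))
print(w.sum())  # ≈ 1.0, the sum-to-one constraint seen in the scatter plot
```

Because the two weights come from a softmax over two scores, they are constrained to sum to one, which directly produces the near −1.0 correlation and the line-hugging scatter described above.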
4.7. Real-Time Diagnostic Potential Assessment of the Fusion Model
To evaluate the online diagnostic efficacy of the Fusion Model in practical engineering applications, a simulated continuous data stream fault diagnosis experiment was designed and conducted. This experiment used a 100-s healthy sensor signal as a base, onto which various single and compound faults were dynamically introduced at different time points according to the pre-defined fault injection strategy, thereby simulating a real-world fault evolution process. Subsequently, the continuous fault signal stream was segmented using a sliding window (length: 500, step: 25) and fed into the trained Fusion Model for real-time diagnosis.
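The windowing step can be sketched as follows. The 100 Hz sampling rate is an inference from the reported figures (a 100-s stream yielding 381 windows of length 500 at step 25), not stated explicitly in the text:

```python
import numpy as np

def sliding_windows(signal, length=500, step=25):
    """Segment a continuous stream into overlapping diagnosis windows,
    using the window length and step from the experiment above."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step : i * step + length] for i in range(n)])

# A 100-s stream at an assumed 100 Hz gives 10,000 samples,
# which reproduces the reported count of 381 windows.
stream = np.zeros(10_000)
print(sliding_windows(stream).shape)  # (381, 500)
```

With a 25-sample step at that rate, a new window becomes available every 0.25 s, which sets the real-time budget discussed below.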
As shown in
Figure 26, the Fusion Model accurately tracked and identified the various fault types as they dynamically appeared in the continuous signal stream. The model successfully distinguished transient faults such as “bias,” “gain,” and “detachment,” accurately identified the long-term “drift” fault, and demonstrated high-precision discrimination of complex compound faults (e.g., “drift + bias”). The diagnostic results closely matched the injection times and types of the true faults, demonstrating the model’s fast response to dynamic fault evolution. A noteworthy detail is that in the 15–20 s interval, the model did not immediately detect the injected “drift” fault, because its signal variation was extremely subtle in the initial stage. However, once the fault signature became more pronounced, the model quickly and accurately identified it, which aligns with the logic of progressive identification of weak faults in practical applications.
To quantitatively assess the model’s computational efficiency, the processing time at various diagnostic stages was further analyzed, as shown in
Figure 27a–d.
The analysis of average time consumption per stage (
Figure 27a) shows that although model prediction was the main computational overhead (0.0433 s), efficient feature extraction (0.0012 s) kept the total average processing time per window to approximately 0.0445 s. The processing time trend (
Figure 27b) and distribution histogram (
Figure 27c) further confirmed the high stability and consistency of the model’s processing speed, with no significant abnormal fluctuations. Most critically, the comparison of cumulative processing time versus actual time (
Figure 27d) clearly shows that the total time to process the entire 100-s signal stream (381 windows) was only 16.9 s. The model’s average processing time per window (0.044 s) was thus far shorter than the window generation interval (0.25 s), yielding a real-time processing ratio well above 1.
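A back-of-envelope check of these throughput figures, using only the averages reported above (not re-measured values):

```python
# Real-time margin check from the reported per-stage averages.
windows = 381
proc_per_window = 0.0445   # s: feature extraction (0.0012) + prediction (0.0433)
gen_per_window = 0.25      # s: one 25-sample step of the incoming stream

total_proc = windows * proc_per_window
rt_ratio = gen_per_window / proc_per_window

print(round(total_proc, 1))  # ≈ 17.0 s, consistent with the measured 16.9 s
print(round(rt_ratio, 1))    # ≈ 5.6× faster than the data arrives
```

The margin of roughly 5–6× means the pipeline could fall several windows behind and still catch up before the next diagnostic deadline.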
In summary, the continuous data stream diagnosis experiment and processing time statistical analysis collectively validate that the proposed fusion diagnostic framework not only exhibits high accuracy and rapid response to dynamic fault evolutions but also possesses excellent computational efficiency, meeting the requirements for real-time online monitoring. This demonstrates its application potential in condition monitoring and fault warning for embedded sensors in road engineering.
4.8. Discussion
The proposed supervised framework is subject to two limitations inherent to its data-driven nature: its reliance on a synthetic dataset, which may not fully capture the variability of real-world failures, and its dependence on a predefined set of fault modes, which can be challenged by data scarcity or novel faults in the field. However, the primary contribution and generalization potential of this study lie in its “Decomposition-Focus-Fusion” architecture, which is designed as a modular and extensible blueprint to address these long-term challenges. Its adaptability allows practitioners to integrate new sub-models for novel fault types and retrain the attention layer to learn new feature relationships. Furthermore, for highly complex diagnostic scenarios involving numerous sub-models, the current fusion layer could be enhanced by adopting more advanced attention mechanisms, such as multi-head or self-attention, to capture intricate inter-dependencies between the various fault feature streams. This makes the core methodology a scalable and generalizable paradigm for complex sensor fault diagnosis tasks.