1. Introduction
Asset management across industries has undergone a substantial transformation in recent decades. With the introduction and exponential growth of technologies such as Machine Learning (ML), the Internet of Things (IoT), cyber-physical systems, and advanced sensorization, companies across various sectors can now record data from their assets, transmit it in real time, and access it remotely from anywhere in the world. Moreover, Artificial Intelligence (AI) enables the prediction of operating conditions based on the complex interactions among system components. This marks a significant shift in how maintenance is conducted: from traditional preventive strategies based on fixed schedules, and corrective actions triggered by failures, to a predictive, data-driven methodology [1,2]. Nevertheless, this new maintenance paradigm introduces a series of obstacles that must be addressed to ensure effective implementation. One of the primary barriers is the limited technical expertise within the current workforce, which can hinder the adoption of advanced digital tools. In addition, the required initial investment constitutes a considerable constraint, particularly due to the infrastructure necessary to fully exploit asset data. This includes both the deployment of platforms capable of processing and visualizing data in real time and the potential hardware adaptations needed to enable equipment to collect, transmit, and respond to information efficiently [1,2,3,4].
Recent research has increasingly explored how ML supports the digital transformation of maintenance, particularly by enhancing diagnostics, anticipating failures, and optimizing intervention timing in the automotive sector. These studies adopt diverse strategies and tools tailored to specific operational scenarios. For example, one approach proposes a remote diagnosis and maintenance system based on Least Squares Support Vector Machines (LS-SVM), capable of estimating the Remaining Useful Life (RUL) of vehicle components using sensor data and Diagnostic Trouble Codes (DTCs) [5]. The method applies multiple classifiers per subsystem to improve accuracy, and, in a gearbox case study, LS-SVM outperformed K-Nearest Neighbors (KNN). However, low correlation among sensor inputs emerged as a key limitation, which was addressed through modular classification. Another contribution presents an ensemble anomaly detection system that combines one-class and two-class classifiers to identify both known and unknown faults in automotive systems, using multivariate time-series data recorded from OBD-II during road trials [6]. The system achieved strong performance (F2-score = 83.3%) in detecting unseen faults under overland driving conditions. Although it was developed for offline analysis, its structure supports potential deployment in online predictive maintenance contexts. A different approach focuses on predicting air compressor failures in commercial trucks using Random Forest models trained on historical workshop data [7]. Feature selection techniques, such as beam search, improved performance relative to expert-defined variables. Limitations included inconsistent labeling, sparse sensor readings, and data quality issues. The model was implemented in an offline setting, processing data downloaded during scheduled maintenance intervals, which offers a scalable and realistic strategy for operational fleets. In data-scarce predictive maintenance scenarios, classical ML models combined with structured feature engineering pipelines can outperform deep learning approaches in both performance and efficiency. A recent study demonstrated that, even with a limited and imbalanced dataset, traditional classifiers such as the multilayer perceptron (MLP) and Random Forest, supported by a modular preprocessing pipeline, detected critical fault states better than Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architectures. This highlights the viability of interpretable and resource-efficient approaches in industrial contexts where large-scale labeled data are not readily available [8]. Additionally, a range of other contributions have applied ML to vehicle maintenance, further confirming its versatility and practical value across different operational settings [9,10,11,12,13,14,15,16,17,18,19,20,21,22].
As seen, the automotive sector has produced several studies aimed at improving maintenance through failure prediction. In the context of urban bus transportation, where large fleets must be continuously monitored to ensure service availability, Machine Learning is playing an increasingly important role. By enabling the prediction of failures, maintenance interventions can shift from being reactive or fixed-schedule to a more flexible, condition-based approach. This not only helps maximize component usage but also reduces unexpected service interruptions [23,24].
Several recent works have focused specifically on the application of ML in urban bus fleets. One study introduces an unsupervised method for fault detection in vehicles, including city buses, by analyzing onboard signal relationships without predefined models or labels [25]. Each vehicle identifies anomalous behavior locally and transmits compact models to an offline server, where deviations are detected through cross-vehicle comparison. The approach is designed for real-world constraints (limited bandwidth, no extra sensors, and high system variability) and showed strong results on both simulated and real bus data, comparable to those of supervised methods. Still, it is not suitable for all systems and faces challenges such as model complexity and data sparsity. Building upon that work, a semi-supervised approach called ICOSMO periodically updates the set of sensors used for anomaly detection based on fault repair records [26]. While it introduces a dynamic sensor selection mechanism within an IoT architecture for public bus fleets, it does not address supervised prediction tasks such as estimating fault duration, nor does it include model retraining or performance validation over time. Moreover, the ICOSMO model is still under development and has only been tested on a single vehicle. Complementing these anomaly detection strategies, another study explored Remaining Useful Life (RUL) prediction of turbo actuators in diesel engines as a way to improve preventive maintenance in bus fleets [27]. Among several models tested, the Accelerated Weibull Failure Time (AWFT) model outperformed deep learning approaches such as TabNet and recurrent neural networks (RNN), offering better accuracy, lower RMSE, and clearer feature interpretability. Using offline historical and snapshot data, the model enabled proactive maintenance planning despite limited post-warranty information, highlighting the value of wear-based predictions tailored to each fleet. More recently, unsupervised learning has gained traction in predictive maintenance for bus fleets, especially when labeled failure data are limited. One study tested a context-aware method on 19 urban buses, combining streaming anomaly detection with clustering and dimensionality reduction to capture inter-vehicle variability [28]. It achieved strong early fault detection (15–30 day horizon) and lower maintenance costs compared with baselines. An adapted TranAD model and a hybrid detector proved effective in cost-sensitive scenarios, highlighting the importance of contextual modeling under data scarcity. In parallel, supervised models trained on historical service records have also been explored. A recent study applied dense neural networks, fed with both real and synthetically generated data, to capture real-world variability in failure patterns [29]. While the model achieved high specificity (0.896) and an AUC of 0.71, its low sensitivity (0.16), attributed to class imbalance, revealed the need for better rebalancing strategies, reinforcing the importance of techniques such as SMOGN in practical applications. Finally, some contributions have proposed IoT-integrated ML frameworks aimed at real-time predictive maintenance in hybrid and electric bus fleets [30]. By leveraging onboard sensor data, such as engine temperature, brake wear, and battery voltage, these systems dynamically adjust maintenance schedules to reduce failures and improve fleet reliability. While the results were promising, key challenges were also noted, including data security, system interoperability, and the need for algorithmic refinement to ensure scalable deployment. Other recent contributions [31,32] further reinforce the potential of ML for predictive maintenance in urban bus fleets, addressing diverse components and use cases that complement the approaches discussed above.
Despite the growing interest in data-driven maintenance strategies within public transport systems, there is still a gap in studies providing clear, reproducible, and open-source pipelines that maintenance teams can adopt under real operational conditions. Most existing works assume full digitalization and access to large, well-curated datasets, an assumption that rarely matches the early stages of digital transformation, where data is limited, heterogeneous, and constrained by technological or organizational barriers. In these contexts, teams often lack not only robust analytical tools but also practical methodological guidance to extract value from the operational data already being recorded by modern vehicles.
To bridge the identified methodological gap, this study proposes and validates a reproducible Machine Learning pipeline for predictive maintenance in urban bus fleets operating under data scarcity conditions. The pipeline builds upon a previous study [33] but moves one step further by targeting online operational applicability in constrained environments. The prior study was conducted using a rich historical dataset and did not aim to develop or deploy a predictive model for real-time support. Instead, it analyzed past regeneration behavior to extract insights into the control logic of the Diesel Particulate Filter (DPF) system, using explainable artificial intelligence to help maintenance teams understand and improve existing practices. Moreover, that earlier work focused exclusively on the active regeneration phase of the DPF. In contrast, the present study targets the preceding clogging phase, characterized by progressive soot accumulation. The main contributions of this work can be summarized as follows:
Development of a reproducible Machine Learning pipeline tailored for data-scarce environments in urban bus fleets, enabling the deployment of predictive maintenance strategies without requiring extensive historical datasets or full digitalization.
Systematic evaluation of lightweight and interpretable algorithms (Linear Regression, Ridge Regression, Decision Trees, KNN), in conjunction with techniques suitable for small datasets (SMOGN augmentation and Leave-One-Out Cross-Validation).
Implementation of a scheduled batch retraining strategy to enable long-term model adaptability in the face of operational changes and data drift, validated through a real-world case study on Diesel Particulate Filter (DPF) clogging.
Proposal of a scalable and transferable framework that can be adapted to other fleets and operational scenarios, supporting digital transformation efforts in public transport systems.
By combining these innovations, the proposed approach enables fleet operators to make proactive decisions, reduce service interruptions, and progressively scale their data-driven capabilities.
To demonstrate the applicability of the proposed pipeline under real operational conditions, a real-world use case was examined: the progressive clogging of the Diesel Particulate Filter (DPF) in hybrid diesel buses operating in an urban transport fleet. The DPF is a critical component of modern exhaust after-treatment systems (Figure 1), designed to comply with stringent emission regulations, such as those mandated by the Euro VI standard, by capturing particulate matter produced during diesel combustion. Its proper functioning is essential for maintaining both environmental compliance and operational reliability in public transport systems. In urban settings, however, the regeneration process that keeps the filter clean is often compromised by frequent stops and low engine loads, which can lead to service disruptions and unplanned maintenance. Within the fleet under study, DPF clogging is one of the main challenges currently faced by the maintenance management team. These operational constraints make the DPF an ideal candidate for testing predictive maintenance strategies under real and challenging conditions.
The remainder of this paper is structured as follows. Section 2 describes the materials and methods, including the dataset, feature engineering steps, modeling strategy, and retraining logic. Section 3 presents the results in terms of model performance and robustness, followed by a discussion of the implications, limitations, and practical applicability of the proposed pipeline. Finally, Section 4 summarizes the main conclusions and outlines potential directions for future work.
2. Materials and Methods
This study proposes a structured methodology for building predictive maintenance models under real-world data constraints. The approach focuses on low-data scenarios, robust validation, and scheduled batch retraining to ensure adaptability over time. It is designed to be scalable and transferable across different operational contexts where data availability and infrastructure maturity may vary.
2.1. Problem Context and Objectives
The Diesel Particulate Filter (DPF) is a critical component in the emissions control systems of hybrid diesel buses, especially in urban environments characterized by frequent stops, low speeds, and prolonged idling. These operational conditions create suboptimal scenarios for DPF regeneration, a process that requires sustained high exhaust temperatures and engine loads to effectively burn off accumulated soot. Consequently, incomplete regenerations are common, leading to accelerated clogging and an increased need for unplanned maintenance interventions.
In the fleet under study, the Electronic Control Unit (ECU) continuously monitors the DPF saturation level and classifies it using an ordinal soot load scale ranging from Zone 0 to Zone 5. Zones 0 to 3 correspond to normal or near-normal operating conditions, while Zone 4 is classified as critical because it triggers engine power limitations and requires a manual regeneration. If saturation continues unchecked and the vehicle reaches Zone 5, it must be immediately withdrawn from service to undergo a forced regeneration at the depot, a process that can only be triggered using an OBD-II diagnostic scanner. This procedure operates under more aggressive cleaning conditions, which, while effective, may also accelerate the degradation of the DPF. Both manual and forced regenerations typically require 30 to 40 min and involve keeping the bus parked with the engine at high revolutions while fuel is injected into the exhaust system to reach the temperatures needed for soot combustion.
Table 1 summarizes the soot zones and their corresponding operational implications. From a maintenance perspective, Zones 3 and 4 are particularly critical.
Beyond the technical problem itself, a major limitation in this context is the slow, incremental growth of the dataset, particularly regarding the events of interest (episodes in Zones 3 and 4). This gradual accumulation of data calls for methodological strategies specifically tailored to data scarcity and delayed availability.
Given this operational context, the main objective of this study is to develop a supervised learning model capable of predicting, in real time, the expected duration a vehicle will remain within Zones 3 or 4 after entering them. The target variable, duration within a soot zone, is calculated based on historical ECU data by measuring the time difference between the first and last timestamp of each uninterrupted zone occurrence. This predictive insight enables maintenance teams to assess whether a vehicle can safely complete its assigned route or requires immediate intervention.
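As an illustration of how this target can be derived, the sketch below groups consecutive ECU samples of the same vehicle that share a soot zone into episodes and measures each episode as the difference between its first and last timestamp. It is a minimal sketch, assuming hypothetical column names (`vehicle_id`, `timestamp`, `soot_zone`) rather than the supplier's actual export labels.

```python
import pandas as pd


def zone_durations(df: pd.DataFrame) -> pd.DataFrame:
    """One row per uninterrupted soot-zone episode, with its duration in hours.

    Assumes hypothetical columns 'vehicle_id', 'timestamp' (datetime64) and
    'soot_zone' (ordinal 0-5), one row per ECU sample.
    """
    df = df.sort_values(["vehicle_id", "timestamp"])
    # A new episode starts when the zone differs from the previous sample of the
    # same vehicle (the first sample of each vehicle compares against NaN -> True).
    prev_zone = df.groupby("vehicle_id")["soot_zone"].shift()
    episode = df["soot_zone"].ne(prev_zone).cumsum().rename("episode")
    episodes = (
        df.groupby(["vehicle_id", episode])
        .agg(
            soot_zone=("soot_zone", "first"),
            start=("timestamp", "first"),
            end=("timestamp", "last"),
        )
        .reset_index()
    )
    # Target: time between the first and last timestamp of the episode.
    episodes["duration_h"] = (episodes["end"] - episodes["start"]).dt.total_seconds() / 3600
    return episodes
```

Filtering the result with `episodes[episodes["soot_zone"].isin([3, 4])]` yields the candidate training targets. With a sampling interval of 10 min or more, a zone observed in a single sample receives a duration of zero, and the first and last episode of each vehicle may be truncated by the observation window, which is one reason for the cleaning steps described in Section 2.3.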
2.2. Data Collection and Technological Infrastructure
This study was conducted using data collected from a fleet of 164 hybrid diesel buses (Figure 2), all compliant with the EURO VI-D emissions standard. These vehicles primarily operate in urban environments and are part of a major public transport network.
Data acquisition is carried out through the vehicles’ OBD-II systems and integrated telematics diagnostic tools installed across the entire fleet. These electronic systems continuously monitor and record a wide range of variables from multiple ECUs (engine, aftertreatment, HVAC, etc.). For the aftertreatment system alone, up to 99 variables are available, including both direct sensor measurements, such as temperatures, pressures, and NOx concentrations, and internally calculated values, such as the time or distance since the last regeneration event. Key components monitored include the DPF, the Diesel Oxidation Catalyst (DOC), the Selective Catalytic Reduction (SCR) system, and the ammonia blocker. The sampling interval ranges from 10 to 30 min, depending on the signal and the operational context.
Based on prior domain-specific analyses and previous studies focused on DPF regeneration behavior, a predefined subset of variables was selected for this work. Rather than conducting a broad exploratory selection, the study directly leverages the variables known to be most informative for predicting clogging dynamics. These include, among others, the estimated soot content in the filter and operational indicators such as the distance and time since the last regeneration. The complete list of selected variables is summarized in Table 2.
The telematics and storage infrastructure, provided by a third-party technology supplier, enables data to be downloaded in XLSX format for subsequent processing and analysis. Although data extraction in this study was performed manually, the platform also offers API-based access, which opens the possibility of integrating automated, real-time data ingestion in the future. This would allow the proposed model to operate fully online while also supporting continuous validation and updates through scheduled batch retraining. The dataset was processed using Python (version 3.12.7) with standard libraries such as pandas, matplotlib, and scikit-learn. One key limitation of the dataset is its coarse sampling, with at most one sample every 10 min, which, while adequate for general operational monitoring, may limit the granularity and precision of certain visualizations and downstream analyses. This constraint originates from the vehicle’s own onboard electronics (OBD-II system), whose design limits the number of requests that can be made per unit of time. In addition, the system does not allow simultaneous data retrieval from different ECUs, further restricting the resolution and synchronicity of the collected signals.
2.3. Exploratory Data Analysis (EDA) and Preprocessing
The exploratory data analysis (EDA) and preprocessing stages are essential in any Machine Learning workflow, as they ensure the consistency, quality, and usability of the data. Data preparation tasks, such as cleaning, transformation, and structuring, typically account for over 60% of the total effort in ML projects [34], highlighting the foundational importance of this step.
Once the target variable and input features were defined, an initial data cleaning phase was conducted to remove entries that, based on the nature of the phenomenon under study, were not relevant for the modeling task. The following cleaning steps were applied:
Removal of regeneration periods, which do not reflect natural soot accumulation behavior.
Elimination of null, inconsistent, or physically implausible values.
Isolation of complete soot zone transitions, ensuring that only uninterrupted operational cycles were retained.
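The three cleaning steps above can be sketched in pandas as follows. This is a minimal sketch under stated assumptions: the column names (including the boolean `regen_active` flag), the plausibility bounds, and the reading of a "complete" transition as an episode bounded by zone changes on both sides are hypothetical choices, not the study's exact rules.

```python
import pandas as pd

# Hypothetical column names; the real export uses the supplier's own labels.
REQUIRED = ["vehicle_id", "timestamp", "soot_zone", "soot_load", "dist_since_regen_km"]


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the sample-level cleaning steps to raw ECU records."""
    # 1) Drop samples recorded during a regeneration (not natural accumulation).
    df = df[~df["regen_active"].astype(bool)]
    # 2) Drop null values and physically implausible readings (assumed bounds).
    df = df.dropna(subset=REQUIRED)
    plausible = (
        df["soot_zone"].between(0, 5)
        & (df["soot_load"] >= 0)
        & (df["dist_since_regen_km"] >= 0)
    )
    return df[plausible].sort_values(["vehicle_id", "timestamp"])


def complete_episodes(episodes: pd.DataFrame) -> pd.DataFrame:
    """3) Keep only episodes bounded by a zone change on both sides.

    Expects one row per episode, sorted chronologically within each vehicle. The
    first and last episode of a vehicle may be cut off by the observation window,
    so they are discarded (one possible reading of a 'complete' transition).
    """
    g = episodes.groupby("vehicle_id")
    position = g.cumcount()
    position_from_end = g.cumcount(ascending=False)
    return episodes[(position > 0) & (position_from_end > 0)]
```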
Given the limited dataset size, the analysis followed a visualization-first approach, in which patterns, anomalies, and structural inconsistencies were primarily explored through graphical methods. The open-source Python library ydata-profiling played a central role in this process by generating an automated and comprehensive EDA report. This tool produces multiple statistical diagnostics, such as univariate summaries, histograms, correlation matrices, and missing-value analyses, which are crucial for assessing data quality without extensive manual scripting. For this specific study, heatmaps, scatter plots, and boxplots, alongside univariate statistics, were used to uncover underlying patterns and identify outliers. In addition, expert domain knowledge was key to validating data filtering decisions and ensuring the operational plausibility of the remaining records.
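Boxplots flag outliers with Tukey's rule: values more than 1.5 interquartile ranges below the first quartile or above the third quartile fall outside the whiskers. The helper below reproduces that rule so that flagged records can be listed and reviewed with domain experts rather than dropped automatically; the function name and the default factor `k=1.5` are illustrative.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside the boxplot whiskers (Tukey's rule)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


if __name__ == "__main__":
    soot = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0], name="soot_load")
    print(soot[iqr_outlier_mask(soot)])  # records to review with domain experts
```

The automated report itself can be generated with `ProfileReport(df, title="...").to_file("eda.html")` from the `ydata_profiling` package.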
The study was conducted separately for Zones 3 and 4 to better capture the distinct accumulation patterns and operating conditions of each zone. Although the full dataset included soot zone transitions from Zones 0 to 5, the scope of this study is operationally centered on Zones 3 and 4, as they represent critical thresholds for intervention and power limitation.
2.4. Feature Engineering
A feature engineering phase was carried out to enrich the representation of the soot accumulation process using variables with strong operational relevance and interdependence, specifically, the distance traveled, elapsed time, and fuel consumed since the last regeneration. These were used to derive new features aimed at enhancing the model’s ability to capture subtle behavioral patterns.
Among these, two versions of a parameter named “Fuel Intensity Index” were developed to quantify the relationship between fuel usage and driving effort over time. The first one, referred to as the “Fuel Intensity Index (Abs)”, was calculated using the absolute values of fuel consumption, distance, and time at the moment of entering a new soot load zone:
This version aimed to characterize the operational effort at the onset of each soot level.
Additionally, to capture the temporal evolution of soot accumulation, a second variable named “Fuel Intensity Index (Prev Delta)” was introduced. This feature quantifies the fuel usage relative to distance and time during the immediately preceding soot zone, offering insight into the operational dynamics that led up to the current state. It was computed as:
By integrating this lagged feature, the model is better equipped to recognize transitional behavior patterns and contextualize current observations within recent operational history.
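The sketch below does not reproduce the index formulas; it only shows the mechanics of building the lagged variant. The increments of fuel, distance and time accumulated over the preceding zone episode are obtained with a per-vehicle `diff()` of the cumulative since-regeneration counters recorded at each zone entry. The `fuel_intensity` ratio (fuel per kilometre per hour) is a hypothetical placeholder that must be replaced by the authors' definition, and the column names are likewise assumed.

```python
import pandas as pd


def fuel_intensity(fuel_l, dist_km, time_h):
    # HYPOTHETICAL placeholder ratio (fuel per km per hour); substitute the
    # paper's actual Fuel Intensity Index definition here.
    return fuel_l / (dist_km * time_h)


def add_fuel_intensity_features(episodes: pd.DataFrame) -> pd.DataFrame:
    """Add 'Abs' and 'Prev Delta' variants of the (placeholder) index.

    Expects one row per zone episode, sorted chronologically within each vehicle,
    with hypothetical cumulative-since-regeneration columns measured at zone entry:
    'fuel_since_regen_l', 'dist_since_regen_km', 'time_since_regen_h'.
    """
    out = episodes.copy()
    cols = ["fuel_since_regen_l", "dist_since_regen_km", "time_since_regen_h"]
    # Abs: counters as read at the moment of entering the zone.
    out["fii_abs"] = fuel_intensity(*(out[c] for c in cols))
    # Prev Delta: what accumulated between entering the previous zone and this one,
    # i.e. during the immediately preceding episode (NaN for a vehicle's first row).
    deltas = out.groupby("vehicle_id")[cols].diff()
    out["fii_prev_delta"] = fuel_intensity(*(deltas[c] for c in cols))
    return out
```

If the preceding episode straddles a regeneration, the counters reset and the increments become negative; such rows would need to be excluded during cleaning.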
2.5. Model Development
A supervised learning approach was adopted to predict the duration a vehicle remains in a given soot load zone, specifically Zones 3 and 4. The problem was formulated as a regression task, where the target variable is the time spent in each zone.
2.5.1. Modeling Strategy and Algorithm Selection
To address the challenge of data scarcity, the modeling strategy focused on interpretable and structurally simple regression algorithms capable of generalizing from small datasets. More complex models, such as ensemble methods or neural networks, were deliberately excluded for two main reasons. First, given the limited size of the dataset, their use would increase the risk of overfitting and reduce the model’s ability to generalize. Second, since the proposed pipeline is intended for early-stage adoption by maintenance teams with limited expertise in Machine Learning, it is essential to prioritize models that are inherently interpretable, avoiding the opacity and complexity of black-box approaches [35,36,37,38]. The training pipeline was implemented with scikit-learn, an open-source Python library widely used for machine learning tasks. Its transparency and extensive documentation make it particularly suitable for building robust and reproducible predictive workflows in real-world applications. For each model, a grid search was performed to select the best-performing hyperparameters based on the validation criteria established for this study.
The following four algorithms were selected based on their alignment with the project’s constraints and goals:
Linear Regression: A baseline model that assumes a linear relationship between input variables and the target. It is simple to interpret and computationally efficient but lacks flexibility for capturing non-linear dependencies.
Ridge Regression: An extension of Linear Regression that includes L2 regularization to mitigate overfitting. It typically performs better than standard Linear Regression when multicollinearity is present, because the penalty stabilizes the coefficient estimates, but it still cannot capture non-linear patterns.
Decision Tree: A non-parametric model that splits data based on feature thresholds. It captures non-linear relationships well and is easy to interpret but is prone to overfitting.
K-Nearest Neighbors (KNN): A simple, instance-based learner that makes predictions based on the average of the closest training samples. While it lacks traditional interpretability (e.g., coefficients or rules), its conceptual simplicity and local behavior make it useful for benchmarking in low-data contexts. However, it is sensitive to feature scaling and may become computationally inefficient when applied to large datasets.
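To ensure that all input features were on a comparable scale, the RobustScaler method was applied during preprocessing. Scaling matters most for KNN, whose distance computations depend on feature magnitude, and for Ridge Regression, whose penalty acts on coefficients that depend on feature units; for Linear Regression it does not change the predictions but makes the coefficients comparable across features. Because RobustScaler centers each feature on its median and scales it by the interquartile range, its scaling parameters are less affected by outliers, and applying it to all algorithms keeps preprocessing consistent.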
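A minimal sketch of how the four candidate models could be wrapped with RobustScaler and tuned by grid search under LOOCV in scikit-learn. The hyperparameter grids are illustrative assumptions, since the study's search spaces are not listed here. Because R² cannot be computed on a single held-out sample, the search ranks configurations by mean absolute error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeRegressor

# Illustrative search spaces (assumed); parameters are addressed as "model__<name>".
CANDIDATES = {
    "linear": (LinearRegression(), {"model__fit_intercept": [True]}),
    "ridge": (Ridge(), {"model__alpha": [0.1, 1.0, 10.0]}),
    "tree": (
        DecisionTreeRegressor(random_state=0),
        {"model__max_depth": [2, 3, 4], "model__min_samples_leaf": [2, 4]},
    ),
    "knn": (
        KNeighborsRegressor(),
        {"model__n_neighbors": [3, 5, 7], "model__weights": ["uniform", "distance"]},
    ),
}


def fit_candidates(X: np.ndarray, y: np.ndarray) -> dict:
    """Tune each candidate inside a RobustScaler pipeline with LOOCV."""
    searches = {}
    for name, (estimator, grid) in CANDIDATES.items():
        pipe = Pipeline([("scale", RobustScaler()), ("model", estimator)])
        # R^2 is undefined on one held-out sample, so rank configurations by MAE.
        search = GridSearchCV(pipe, grid, cv=LeaveOneOut(), scoring="neg_mean_absolute_error")
        searches[name] = search.fit(X, y)
    return searches
```

With the default `refit=True`, `GridSearchCV` retrains the best configuration on all data passed to `fit`, which corresponds to the final training step on the full initial dataset described in Section 2.6. The scaler sits inside the pipeline so that, in every fold, its median and IQR are estimated from training rows only.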
2.5.2. Data Augmentation for Small Samples
In scenarios where training data is limited, data augmentation can help improve model learning. In this study, the critical soot accumulation zones (Zones 3 and 4) had relatively few available records, making it harder for the model to learn the full range of behaviors. One method designed for these situations is SMOGN (Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise). This technique generates new, artificial data points either by interpolating between nearby real samples or by injecting Gaussian noise into more distant ones. The goal is to increase the number of examples in areas of the dataset that are underrepresented, allowing the model to better understand and learn from those less frequent cases [39]. To avoid introducing bias in the performance evaluation, the synthetic data were used only in the training set, while the test set remained composed entirely of real observations. This ensures that the model is evaluated only on actual vehicle behavior, reflecting its true ability to generalize.
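The key point of this paragraph is ordering: augmentation must happen inside each training fold, after the held-out observation has been set aside. The sketch below illustrates that ordering. To stay self-contained it uses a deliberately simplified stand-in (interpolation toward the nearest neighbor plus small Gaussian noise on the features); it is not SMOGN, which additionally uses a relevance function to decide which target ranges are rare. In practice, `simple_augment` would be replaced by a proper SMOGN implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut


def simple_augment(X, y, n_new, rng, noise=0.02):
    """Simplified stand-in for SMOGN (NOT the real algorithm).

    Creates n_new synthetic rows by interpolating between a random training
    sample and its nearest neighbour, then adds small Gaussian noise to the features.
    """
    X_new, y_new = [], []
    for i in rng.integers(0, len(X), size=n_new):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the sample itself
        j = int(np.argmin(dist))
        lam = rng.random()
        X_new.append(X[i] + lam * (X[j] - X[i]) + rng.normal(scale=noise, size=X.shape[1]))
        y_new.append(y[i] + lam * (y[j] - y[i]))
    X_new = np.asarray(X_new, dtype=float).reshape(-1, X.shape[1])
    return np.vstack([X, X_new]), np.concatenate([y, np.asarray(y_new, dtype=float)])


def loo_predict_with_augmentation(model, X, y, n_new=20, seed=0):
    """LOOCV in which synthetic rows are generated from the training fold only."""
    rng = np.random.default_rng(seed)
    preds = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        X_tr, y_tr = simple_augment(X[train_idx], y[train_idx], n_new, rng)
        fitted = clone(model).fit(X_tr, y_tr)
        preds[test_idx] = fitted.predict(X[test_idx])  # the held-out row is always real
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(15, 2))
    y = 3.0 * X[:, 0] + 1.0
    print(loo_predict_with_augmentation(Ridge(alpha=1e-3), X, y))
```

Augmenting the full dataset before splitting would let synthetic neighbors of the held-out point leak into training and inflate the validation scores.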
2.6. Validation and Performance Assessment
To address the limitations imposed by the small sample size, Leave-One-Out Cross-Validation (LOOCV) was adopted as the primary validation strategy. With few samples, k-fold cross-validation trains each model on a noticeably smaller subset, and its results depend on how the data happen to be partitioned; LOOCV instead maximizes the use of available samples by iteratively training on all data points except one (Figure 3), and its splits are deterministic. This method offers a nearly unbiased estimate of generalization performance, ensuring that every observation contributes to both the training and validation phases. Although LOOCV is more computationally intensive, the small dataset size made it feasible in this case.
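One practical consequence of LOOCV for regression is that each fold contains a single test point, so R² cannot be computed fold by fold. A common approach, sketched below with scikit-learn's `cross_val_predict`, is to collect the n out-of-fold predictions and compute R², MAE and RMSE once on the pooled set. The Ridge model in the usage example is only a placeholder.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def loocv_metrics(model, X, y) -> dict:
    """Score the pooled out-of-fold predictions of LOOCV once."""
    # Each of the n fits predicts only the single sample it did not see.
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return {
        "R2": float(r2_score(y, y_pred)),
        "MAE": float(mean_absolute_error(y, y_pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y, y_pred))),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(25, 2))
    y = 1.5 * X[:, 0] - X[:, 1] + rng.normal(scale=0.05, size=25)
    print(loocv_metrics(Ridge(alpha=0.01), X, y))
```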
Following model training, the focus moved to validation, where three standard metrics were used to assess performance:
R² (coefficient of determination) to measure the proportion of variance explained by the model.
MAE (Mean Absolute Error) to evaluate average prediction error in absolute terms.
RMSE (Root Mean Squared Error) to penalize large deviations more strongly, thus capturing the overall prediction robustness.
The mathematical formulations of these evaluation metrics are defined as follows:
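For reference, the standard definitions are given below, where y_i is the observed duration of episode i, ŷ_i the corresponding prediction, ȳ the mean of the observed durations, and n the number of evaluated episodes.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```

Because RMSE squares the residuals before averaging, RMSE ≥ MAE always holds, and a large gap between the two indicates a few large errors.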
In addition to numerical metrics, visual diagnostic tools were employed to analyze model behavior. Scatter plots of predicted versus actual values provided an intuitive measure of alignment with the ground truth, while residual plots were used to detect potential patterns or systematic biases in the prediction errors.
Once the model demonstrated acceptable performance during the validation phase, a final training step was carried out using the entire initial dataset (i.e., all data from the first month and a half). This process aimed to maximize the model’s learning potential by incorporating the full range of observed operating conditions. The resulting model serves as the production version, to be used in subsequent evaluations and operational testing.
Feature Importance and Model Interpretability
Model interpretability plays a crucial role in predictive maintenance applications, as it enables understanding and trust in the model’s decisions. Given the variety of algorithms proposed in this study, each model requires a specific interpretability approach aligned with its intrinsic nature.
For Linear Regression and Ridge Regression, interpretability is intrinsic and based on the analysis of learned coefficients. When input features are scaled, using RobustScaler in this case, the sign and magnitude of each coefficient indicate the direction and strength of its effect on the target. A positive value implies that increasing the feature raises the predicted duration, while a negative one implies the opposite. Feature relevance can be assessed by comparing the absolute values of these coefficients.
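A short sketch of this coefficient ranking for a scaled Ridge pipeline follows. Because RobustScaler divides each feature by its interquartile range, each coefficient expresses the change in predicted duration per interquartile range of the feature, which is what makes the magnitudes comparable. The helper name and the feature names in the usage example are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def ranked_coefficients(pipe, feature_names) -> pd.Series:
    """Coefficients of the pipeline's final linear model, largest |value| first."""
    coefs = pd.Series(pipe[-1].coef_, index=feature_names)
    # Sign = direction of the effect; magnitude = effect per IQR of the feature.
    return coefs.reindex(coefs.abs().sort_values(ascending=False).index)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = 4.0 * X[:, 1] - 1.0 * X[:, 0] + rng.normal(scale=0.01, size=50)
    pipe = make_pipeline(RobustScaler(), Ridge(alpha=0.01)).fit(X, y)
    print(ranked_coefficients(pipe, ["fuel_intensity_abs", "km_since_regen", "h_since_regen"]))
```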
In contrast, Decision Tree models offer interpretability through their hierarchical structure. In this work, tree depth was deliberately limited to reduce overfitting, which also enhances clarity. The model’s logic can be illustrated through tree diagrams that highlight the sequence of decisions based on feature thresholds or alternatively expressed as simplified, human-readable rules (e.g., “If the fuel intensity index is below 0.3, the expected time in Zone 3 is under 1 h”). Additionally, feature importance scores extracted from the model summarize each variable’s impact on predictions.
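The rule-style reading of a shallow tree can be obtained directly from scikit-learn with `export_text`, alongside the impurity-based `feature_importances_`. The sketch below is illustrative; the feature names, the synthetic data and the depth limit are assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text


def tree_rules(X, y, feature_names, max_depth=2):
    """Fit a depth-limited tree and return it with its text rules and importances."""
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, y)
    rules = export_text(tree, feature_names=list(feature_names))
    # Impurity-based importances: each feature's share of the total MSE reduction.
    importances = dict(zip(feature_names, tree.feature_importances_))
    return tree, rules, importances


if __name__ == "__main__":
    X = np.array([[0.1, 5], [0.2, 3], [0.5, 4], [0.6, 1], [0.8, 2], [0.9, 6]])
    y = np.array([0.5, 0.6, 2.0, 2.1, 2.2, 2.3])  # hours in zone (synthetic)
    _, rules, imp = tree_rules(X, y, ["fuel_intensity_idx", "km_since_regen"], max_depth=1)
    print(rules)
    print(imp)
```

Each printed line of `rules` is a threshold test, so a path from root to leaf reads directly as a rule of the form given in the example above.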
Finally, for K-Nearest Neighbors, which lacks internal model coefficients or rules, a model-agnostic approach was required. In this case, permutation importance was used to assess each feature’s relevance by measuring the decrease in R² when the values of a feature are randomly permuted in the test set. This method enables the estimation of each variable’s contribution to the model’s predictive accuracy, even in non-parametric settings such as KNN.
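Permutation importance is available in scikit-learn as `sklearn.inspection.permutation_importance`. The sketch below applies it to a scaled KNN regressor with R² as the score, so each value is the mean drop in R² over several random shuffles of one feature. It is evaluated on a held-out set with more than one sample, since R² is undefined for a single observation; the split and the `n_repeats=20` setting are illustrative assumptions.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def knn_permutation_importance(X_train, y_train, X_eval, y_eval, names, n_neighbors=5, seed=0):
    """Mean drop in R^2 on (X_eval, y_eval) when each feature is shuffled."""
    model = make_pipeline(RobustScaler(), KNeighborsRegressor(n_neighbors=n_neighbors))
    model.fit(X_train, y_train)
    result = permutation_importance(
        model, X_eval, y_eval, scoring="r2", n_repeats=20, random_state=seed
    )
    return dict(zip(names, result.importances_mean))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(80, 2))
    y = 5.0 * X[:, 0]  # only the first feature matters
    print(knn_permutation_importance(X[:60], y[:60], X[60:], y[60:], ["informative", "noise"]))
```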
2.7. Adaptive Retraining Strategy
To ensure that the predictive model remains accurate and relevant over time, a scheduled batch retraining strategy was incorporated into the pipeline. This approach is particularly suited for operational environments where data accumulates gradually. Instead of relying on continuous updates, the model is periodically re-evaluated with newly acquired data and is only retrained when a significant performance drop is detected.
Model performance is monitored using the same set of metrics established during the initial evaluation, namely, R², MAE, and RMSE. To determine when retraining is warranted, a trigger threshold is defined based on the model’s initial performance gain over a baseline predictor (e.g., a mean-based model). After each one-month period of new operational data, the model’s performance is reassessed by averaging results over this window. If performance deteriorates to the point where the original improvement is lost, such as RMSE approaching baseline levels, retraining is triggered. This periodic assessment ensures that model updates are evidence-based, timely, and operationally justified, avoiding unnecessary changes while preserving reliability.
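The trigger logic can be expressed in a few lines. The sketch below, a hypothetical helper rather than the study's code, measures the relative RMSE improvement of the model over a mean-based baseline on a one-month batch and compares it with the improvement observed at validation. The parameter `min_fraction_retained` is an assumed tuning knob: with the default of 0, retraining is triggered only once the model is no better than the baseline, matching the "improvement is lost" criterion; a value such as 0.5 would retrain earlier, once half of the initial gain has disappeared.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def needs_retraining(y_month, pred_month, train_mean, initial_gain, min_fraction_retained=0.0):
    """Compare the model's gain over a mean baseline on a monthly batch.

    train_mean: mean target of the training data, i.e. the baseline prediction.
    initial_gain: relative RMSE improvement over that baseline at validation
        (e.g. 0.30 means 30% lower RMSE than the baseline).
    min_fraction_retained: hypothetical knob; retrain once the current gain
        falls to or below initial_gain * min_fraction_retained.
    Assumes the batch is not predicted perfectly by the baseline (nonzero RMSE).
    Returns (retrain?, current_gain).
    """
    baseline_rmse = rmse(y_month, np.full(len(y_month), train_mean))
    current_gain = 1.0 - rmse(y_month, pred_month) / baseline_rmse
    return current_gain <= initial_gain * min_fraction_retained, current_gain
```

Pooling the whole month into a single RMSE, as here, is one way to implement "averaging results over this window"; the same comparison can be repeated for MAE if both criteria are required.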
To simulate real-world deployment and evaluate long-term robustness, a one-month batch of data, completely withheld during the model development phase, was used as a final test set. These records, obtained from the historical files described earlier in the data collection phase, serve as a proxy for future operational scenarios. The evaluation of this set determines whether the model requires retraining or whether its prediction error remains acceptable.
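Assuming the withheld batch corresponds to the most recent month of records, the hold-out can be produced with a chronological split such as the one below. The `timestamp` and `zone3_hours` field names and the 30-day window used to approximate one month are assumptions for illustration.

```python
from datetime import datetime, timedelta


def split_last_month(records, time_key="timestamp", days=30):
    """Withhold every record newer than (latest timestamp - days) as a final test batch."""
    latest = max(r[time_key] for r in records)
    cutoff = latest - timedelta(days=days)
    development = [r for r in records if r[time_key] <= cutoff]
    holdout = [r for r in records if r[time_key] > cutoff]
    return development, holdout


# Illustrative records: one per day for 90 days (hypothetical fields).
start = datetime(2024, 1, 1)
records = [{"timestamp": start + timedelta(days=d), "zone3_hours": 1.0} for d in range(90)]
dev, final_test = split_last_month(records)
print(len(dev), len(final_test))  # 60 30
```

Splitting by time rather than at random keeps every held-out record later than the data used for development, which is what allows the batch to act as a proxy for future operation.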
As more operational data becomes available over time, the retraining strategy can evolve not only in frequency but also in modeling complexity. A larger dataset would make it safer to adopt more sophisticated models, such as ensemble methods or neural networks, while mitigating overfitting risks. Furthermore, the increased data volume would support the inclusion of additional features, potentially reducing prediction variance and enhancing generalization across diverse operational scenarios. This evolution could also extend to the validation strategy: while Leave-One-Out Cross-Validation (LOOCV) was appropriate under data-scarce conditions, it has known limitations, notably its computational cost, since it requires one model fit per observation, and its tendency to produce high-variance estimates. In later stages, alternative validation methods such as k-fold cross-validation could offer a more efficient and robust solution for larger datasets.
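The cost difference between the two validation schemes shows up directly in the number of train/test splits, and therefore model fits, each one requires: n for LOOCV versus k for k-fold. The plain-Python generators below are an illustrative sketch (the sample size of 52 is arbitrary); scikit-learn’s `LeaveOneOut` and `KFold` classes in `sklearn.model_selection` produce splits of the same structure.

```python
import random


def loocv_splits(n):
    """Leave-one-out: n splits, each holding out a single observation."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def kfold_splits(n, k=5, shuffle=True, seed=0):
    """k-fold: k splits; the first n % k folds receive one extra observation."""
    indices = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(indices)
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


n_obs = 52  # arbitrary sample size for illustration
print(sum(1 for _ in loocv_splits(n_obs)))       # 52 model fits
print(sum(1 for _ in kfold_splits(n_obs, k=5)))  # 5 model fits
```

Each k-fold test fold averages over several observations, which is one reason its error estimate tends to vary less than the single-observation folds of LOOCV.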
Although the data used in this case study were manually downloaded, the telematics platform provides an API that enables automated access. The methodology described in this section is designed under the assumption that the API is active, allowing full automation of both the real-time prediction process and the scheduled model evaluations.
2.8. Summary of the Proposed Methodology
Figure 4 summarizes the complete methodological pipeline, outlining the key stages of model development, deployment, and retraining. Each block represents a critical phase in the lifecycle of the predictive maintenance system.
4. Conclusions
This study aimed to establish a complete and reproducible methodology for leveraging operational data in scenarios characterized by data scarcity, as is common in urban bus fleets in the early stages of maintenance digitalization. Building upon a previously proposed framework, the methodology was extended with specific steps, tools, and practical considerations to guide maintenance teams in initiating data-driven analysis even with limited resources. A key enhancement was the incorporation of a scheduled batch retraining strategy, which helps maintain predictive performance as operating conditions evolve. To demonstrate its applicability, a real-world case study was conducted on predicting how long the soot load of the Diesel Particulate Filter (DPF) remains in Zones 3 and 4, the levels that precede conditions that can significantly impact bus operability. Once the training and deployment simulations were completed, two distinct performance scenarios emerged. In Zone 4, despite data limitations, the deployed model continued to outperform the updated baseline (MAE: 0.6758 vs. 0.9612, RMSE: 0.9330 vs. 1.0722, R²: 0.24), demonstrating a degree of robustness. This suggests that, while model updates could enhance performance, the current model remains sufficiently accurate for operational use until the next validation cycle. In Zone 3, however, the evaluation metrics dropped substantially on the new test batch (R² ≈ −0.10, MAE ≈ 1.01 h, RMSE ≈ 1.16 h); a negative R² means the model performed worse than simply predicting the mean of that batch, indicating that the previously deployed model was no longer capturing the underlying dynamics. This highlights the need for a model revision process, including an updated exploratory analysis and retraining with the most recent data. The overall results confirm the potential and practical utility of the proposed methodology.
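For reference, the Zone 4 comparison can be read as relative error reductions with respect to the baseline (with E standing for either MAE or RMSE), and the Zone 3 R² can be interpreted through its definition, assuming the standard form in which ȳ is the mean of the evaluation batch:

```latex
\begin{aligned}
\Delta_{\mathrm{rel}} &= \frac{E_{\mathrm{baseline}} - E_{\mathrm{model}}}{E_{\mathrm{baseline}}} \\
\text{Zone 4, MAE:}\quad \Delta_{\mathrm{rel}} &= \frac{0.9612 - 0.6758}{0.9612} = \frac{0.2854}{0.9612} \approx 29.7\% \\
\text{Zone 4, RMSE:}\quad \Delta_{\mathrm{rel}} &= \frac{1.0722 - 0.9330}{1.0722} = \frac{0.1392}{1.0722} \approx 13.0\% \\
R^2 &= 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \\
\text{Zone 3:}\quad R^2 \approx -0.10 &\;\Longrightarrow\; \sum_i (y_i - \hat{y}_i)^2 \approx 1.10 \sum_i (y_i - \bar{y})^2
\end{aligned}
```

In other words, on the new batch the Zone 3 model accumulated roughly 10% more squared error than a constant prediction equal to that batch’s mean.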
While predictive accuracy varied across zones, with a notable drop in Zone 3, the framework proved effective in identifying performance degradation, guiding necessary revisions, and supporting decision-making under data-scarce and evolving operational conditions. This adaptability highlights the methodology’s practical value for fleet managers in the early stages of digital transformation.
Several limitations emerged during the application of the proposed approach. One key challenge is the potential for operational changes in the bus fleet, such as new routes, different traffic conditions, or changes in driving behavior, which can introduce data drift, i.e., shifts in the statistical distribution of the input data that degrade model performance over time. Additionally, the emergence of new outliers highlights the need for ongoing monitoring and data validation. Another limitation is the lack of data in specific ranges; for example, in Zone 4, the model tended to overestimate when the real duration was below 2.5 h, indicating reduced accuracy for short-duration events. These findings emphasize the importance of continuous evaluation and model adaptation.
Looking ahead, the next steps for the case study focus on operational deployment. First, the predictive models will be implemented using the existing API connection to access the necessary data streams in real time. Second, since the data analysis tools have already been defined and validated, the next logical automation step is to link the database to a dedicated processing module that automatically generates the performance indicators and plots outlined in this study, which would significantly accelerate the review process for maintenance managers. Integrating a notification system or dashboard could further enhance usability by ensuring that key insights reach the right stakeholders promptly.
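As an example of an indicator that an automated processing module could compute, the sketch below reports the mean signed error (prediction minus real value) within ranges of the real duration; a positive value in the lowest range would flag the kind of over-estimation noted for Zone 4 below 2.5 h. The function name, the bin edges, and the sample values are illustrative assumptions, not part of the study’s tooling or data.

```python
from statistics import mean


def signed_error_by_range(y_true, y_pred, edges):
    """Mean signed error (prediction - real value) per range of the real value.

    Ranges are [-inf, e1), [e1, e2), ..., [en, inf); empty ranges are skipped.
    A positive value means the model overestimates within that range.
    """
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    report = {}
    for low, high in zip(bounds[:-1], bounds[1:]):
        errors = [p - t for t, p in zip(y_true, y_pred) if low <= t < high]
        if errors:
            report[(low, high)] = mean(errors)
    return report


# Illustrative values only (hours), split at the 2.5 h mark.
real = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.6, 3.1, 3.9]
for (low, high), bias in signed_error_by_range(real, predicted, edges=[2.5]).items():
    print(f"real duration in [{low}, {high}): mean signed error {bias:+.2f} h")
```

Unlike MAE or RMSE, the signed error keeps the direction of the mistakes, so systematic over- or under-estimation in a given range stays visible instead of being averaged away.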
Moreover, as data availability grows, the pipeline could also be scaled to incorporate more complex models and additional variables, further improving predictive performance. Finally, the proposed methodology could be adapted to other transit fleets or urban contexts with similar data constraints, serving as a foundation for broader applications of predictive maintenance in the public transport sector.