A Reproducible Pipeline for Leveraging Operational Data Through Machine Learning in Digitally Emerging Urban Bus Fleets

Tormos, Bernardo; Bermudez, Vicente; Sánchez-Márquez, Ramón; Alvis, Jorge

doi:10.3390/app15158395

Open Access Article

CMT-Clean Mobility & Thermofluids, Universitat Politècnica de València, 46022 Valencia, Spain

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(15), 8395; https://doi.org/10.3390/app15158395

Submission received: 20 June 2025 / Revised: 22 July 2025 / Accepted: 24 July 2025 / Published: 29 July 2025

(This article belongs to the Special Issue Big-Data-Driven Advances in Smart Maintenance and Industry 4.0)

Abstract

The adoption of predictive maintenance in public transportation has gained increasing attention in the context of Industry 4.0. However, many urban bus fleets remain in early digital transformation stages, with limited historical data and fragmented infrastructures that hinder the implementation of data-driven strategies. This study proposes a reproducible Machine Learning pipeline tailored to such data-scarce conditions, integrating domain-informed feature engineering, lightweight and interpretable models (Linear Regression, Ridge Regression, Decision Trees, KNN), SMOGN for imbalance handling, and Leave-One-Out Cross-Validation for robust evaluation. A scheduled batch retraining strategy is incorporated to adapt the model as new data becomes available. The pipeline is validated using real-world data from hybrid diesel buses, focusing on the prediction of time spent in critical soot accumulation zones of the Diesel Particulate Filter (DPF). In Zone 4, the model continued to outperform the baseline during the production test, indicating its validity for an additional operational period. In contrast, model performance in Zone 3 deteriorated over time, triggering retraining. These results confirm the pipeline’s ability to detect performance drift and support predictive maintenance decisions under evolving operational constraints. The proposed framework offers a scalable solution for digitally emerging fleets.

Keywords:

urban bus fleets; machine learning; data scarcity; data-driven maintenance; fleet digitalization; scheduled batch retraining

1. Introduction

Asset management across industries has undergone a substantial transformation in recent decades. With the introduction and exponential growth of technologies such as Machine Learning (ML), the Internet of Things (IoT), cyber-physical systems, and advanced sensorization, companies across various sectors are now capable of recording data from their assets, transmitting it in real time, and accessing it remotely from anywhere in the world. Moreover, the use of Artificial Intelligence (AI) enables the prediction of operating conditions based on the complex interactions among system components. This approach marks a significant shift in the way maintenance has been conducted in recent decades, evolving from traditional preventive strategies, based on fixed schedules, and corrective actions triggered by failure conditions, to a predictive, data-driven methodology [1,2]. Nevertheless, this new maintenance paradigm introduces a series of obstacles that must be addressed to ensure effective implementation. One of the primary barriers is the limited technical expertise within the current workforce, which can hinder the adoption of advanced digital tools. In addition, the required initial investment constitutes a considerable constraint, particularly due to the infrastructure necessary to fully exploit asset data. This includes both the deployment of platforms capable of processing and visualizing data in real time and the potential hardware adaptations needed to enable equipment to collect, transmit, and respond to information efficiently [1,2,3,4].

Recent research has increasingly explored how Machine Learning (ML) supports the digital transformation of maintenance, particularly by enhancing diagnostics, anticipating failures, and optimizing intervention timing in the automotive sector. These studies adopt diverse strategies and tools tailored to specific operational scenarios. For example, one approach proposes a remote diagnosis and maintenance system based on Least Squares Support Vector Machines (LS-SVM), capable of estimating the Remaining Useful Life (RUL) of vehicle components using sensor data and Diagnostic Trouble Codes (DTCs) [5]. The method applies multiple classifiers per subsystem to improve accuracy, and, in a gearbox case study, LS-SVM outperformed KNN. However, low correlation among sensor inputs emerged as a key limitation, addressed through modular classification. Another contribution presents an ensemble anomaly detection system that combines one-class and two-class classifiers to identify both known and unknown faults in automotive systems using multivariate time-series data from OBD-II during road trials [6]. The system achieved strong performance (F2-score = 83.3%) in detecting unseen faults under overland driving conditions. While developed for offline analysis, its structure supports potential deployment in online predictive maintenance contexts. A different approach focuses on predicting air compressor failures in commercial trucks using Random Forest models trained on historical workshop data [7]. Feature selection techniques, such as beam search, enhanced performance over expert-defined variables. Limitations included inconsistent labeling, sparse sensor readings, and data quality issues. The model was implemented in an offline setting, processing data downloaded during scheduled maintenance intervals, offering a scalable and realistic strategy for operational fleets. 
In data-scarce predictive maintenance scenarios, classical Machine Learning models, when combined with structured feature engineering pipelines, can outperform deep learning approaches both in performance and efficiency. A recent study demonstrated that, even with a limited and imbalanced dataset, traditional classifiers such as MLP and Random Forest, supported by a modular preprocessing pipeline, achieved superior detection of critical fault states compared to CNN and LSTM architectures. This highlights the viability of interpretable and resource-efficient approaches in industrial contexts where large-scale labeled data is not readily available [8]. Additionally, a range of other contributions have applied Machine Learning to vehicle maintenance, further confirming its versatility and practical value across different operational settings [9,10,11,12,13,14,15,16,17,18,19,20,21,22].

As seen, the automotive sector has produced several studies aimed at improving maintenance through failure prediction. In the context of urban bus transportation, where large fleets must be continuously monitored to ensure service availability, Machine Learning is playing an increasingly important role. By enabling the prediction of failures, maintenance interventions can shift from being reactive or fixed-schedule to a more flexible, condition-based approach. This not only helps maximize component usage but also reduces unexpected service interruptions [23,24].

Several recent works have focused specifically on the application of Machine Learning (ML) in urban bus fleets. One study introduces an unsupervised method for fault detection in vehicles, including city buses, by analyzing onboard signal relationships without predefined models or labels [25]. Each vehicle identifies anomalous behavior locally and transmits compact models to an offline server, where deviations are detected through cross-vehicle comparison. The approach is designed for real-world constraints (limited bandwidth, no extra sensors, and high system variability) and showed strong results on simulated and real bus data, comparable to supervised methods. Still, it is not suitable for all systems and faces challenges such as model complexity and data sparsity. Building upon that work, a semi-supervised approach called ICOSMO periodically updates the set of sensors used for anomaly detection based on fault repair records [26]. While it introduces a dynamic sensor selection mechanism within an IoT architecture for public bus fleets, it does not address supervised prediction tasks such as estimating fault duration, nor does it include model retraining or performance validation over time. Moreover, the ICOSMO model is still under development and has only been tested on a single vehicle. Complementing these anomaly detection strategies, another study explored Remaining Useful Life (RUL) prediction of turbo actuators in diesel engines as a way to improve preventive maintenance in bus fleets [27]. Among several models tested, the Accelerated Weibull Failure Time (AWFT) model outperformed deep learning approaches like TabNet and RNN, offering better accuracy, lower RMSE, and clearer feature interpretability. Using offline historical and snapshot data, the model enabled proactive maintenance planning despite limited post-warranty information, highlighting the value of wear-based predictions tailored to each fleet.
More recently, unsupervised learning has gained traction in predictive maintenance for bus fleets, especially when labeled failure data is limited. One study tested a context-aware method on 19 urban buses, combining streaming anomaly detection with clustering and dimensionality reduction to capture inter-vehicle variability [28]. It achieved strong early fault detection (15–30 day horizon) and lower maintenance costs compared to baselines. Adapted TranAD and a hybrid detector proved effective in cost-sensitive scenarios, highlighting the importance of contextual modeling under data scarcity. In parallel, supervised models trained on historical service records have also been explored. A recent study applied dense neural networks, fed with both real and synthetically generated data, to capture real-world variability in failure patterns [29]. While the model achieved high specificity (0.896) and an AUC of 0.71, its low sensitivity (0.16) due to class imbalance revealed the need for better rebalancing strategies, reinforcing the importance of techniques such as SMOGN in practical applications. Finally, some contributions have proposed IoT-integrated ML frameworks aimed at real-time predictive maintenance in hybrid and electric bus fleets [30]. By leveraging onboard sensor data, such as engine temperature, brake wear, and battery voltage, these systems dynamically adjust maintenance schedules to reduce failures and improve fleet reliability. While the results were promising, key challenges were also noted, including data security, system interoperability, and the need for algorithmic refinement to ensure scalable deployment. Additionally, other recent contributions [31,32] further reinforce the potential of Machine Learning for predictive maintenance in urban bus fleets, addressing diverse components and use cases that complement the approaches discussed above.

Despite the growing interest in data-driven maintenance strategies within public transport systems, there is still a gap in studies providing clear, reproducible, and open-source pipelines that maintenance teams can adopt under real operational conditions. Most existing works assume full digitalization and access to large, well-curated datasets, an assumption that rarely matches the early stages of digital transformation, where data is limited, heterogeneous, and constrained by technological or organizational barriers. In these contexts, teams often lack not only robust analytical tools but also practical methodological guidance to extract value from the operational data already being recorded by modern vehicles.

To bridge the identified methodological gap, this study proposes and validates a reproducible Machine Learning pipeline for predictive maintenance in urban bus fleets operating under data scarcity conditions. The pipeline builds upon a previous study [33] but moves one step further by targeting online-operational applicability in constrained environments. The prior study was conducted using a rich historical dataset and did not aim to develop or deploy a predictive model for real-time support. Instead, it focused on analyzing past regeneration behavior to extract insights into the control logic of the DPF system, using explainable artificial intelligence to assist maintenance teams in understanding and improving existing practices. Moreover, that earlier work focused exclusively on the active regeneration phase of the DPF. In contrast, the present study targets the preceding clogging phase, characterized by progressive soot accumulation. The main contributions of this work can be summarized as follows:

Development of a reproducible Machine Learning pipeline tailored for data-scarce environments in urban bus fleets, enabling the deployment of predictive maintenance strategies without requiring extensive historical datasets or full digitalization.
Systematic evaluation of lightweight and interpretable algorithms (Linear Regression, Ridge Regression, Decision Trees, KNN), in conjunction with techniques suitable for small datasets (SMOGN augmentation and Leave-One-Out Cross-Validation).
Implementation of a scheduled batch retraining strategy to enable long-term model adaptability in the face of operational changes and data drift, validated through a real-world case study on Diesel Particulate Filter (DPF) clogging.
Proposal of a scalable and transferable framework that can be adapted to other fleets and operational scenarios, supporting digital transformation efforts in public transport systems.

By combining these innovations, the proposed approach enables fleet operators to make proactive decisions, reduce service interruptions, and progressively scale their data-driven capabilities.

To demonstrate the applicability of the proposed pipeline under real operational conditions, a real-world use case was examined: the progressive clogging of the Diesel Particulate Filter (DPF) in hybrid diesel buses operating in an urban transport fleet. The DPF is a critical component of modern exhaust after-treatment systems (Figure 1) designed to comply with stringent emission regulations, such as those mandated by the Euro VI standard, by capturing particulate matter produced during diesel combustion. Its proper functioning is essential for maintaining both environmental compliance and operational reliability in public transport systems. In urban settings, however, the regeneration process that keeps the filter clean is often compromised due to frequent stops and low engine loads, which can lead to service disruptions and unplanned maintenance. Specifically, within the fleet under study, DPF clogging represents one of the main challenges currently faced by the maintenance management team. These operational constraints make the DPF an ideal candidate for testing predictive maintenance strategies under real and challenging conditions.

The remainder of this paper is structured as follows. Section 2 describes the materials and methods, including the dataset, feature engineering steps, modeling strategy, and retraining logic. Section 3 presents the results in terms of model performance and robustness, followed by a discussion of the implications, limitations, and practical applicability of the proposed pipeline. Finally, Section 4 summarizes the main conclusions and outlines potential directions for future work.

2. Materials and Methods

This study proposes a structured methodology for building predictive maintenance models under real-world data constraints. The approach focuses on low-data scenarios, robust validation, and scheduled batch retraining to ensure adaptability over time. It is designed to be scalable and transferable across different operational contexts where data availability and infrastructure maturity may vary.

2.1. Problem Context and Objectives

The Diesel Particulate Filter (DPF) is a critical component in the emissions control systems of hybrid diesel buses, especially in urban environments characterized by frequent stops, low speeds, and prolonged idling. These operational conditions create suboptimal scenarios for DPF regeneration, a process that requires sustained high exhaust temperatures and engine loads to effectively burn off accumulated soot. Consequently, incomplete regenerations are common, leading to accelerated clogging and an increased need for unplanned maintenance interventions.

In the fleet under study, the Electronic Control Unit (ECU) continuously monitors the Diesel Particulate Filter (DPF) saturation level and classifies it using an ordinal soot load scale ranging from Zone 0 to Zone 5. Zones 0 to 3 correspond to normal or near-normal operating conditions, while Zone 4 is classified as critical due to its activation of engine power limitations and the need for manual regeneration. If saturation continues unchecked and the vehicle reaches Zone 5, it must be immediately withdrawn from service to undergo a forced regeneration at the depot, a process that can only be triggered using an OBDII diagnostic scanner. This procedure operates under more aggressive cleaning conditions, which, while effective, may also accelerate the degradation of the DPF. It is worth noting that both manual and forced regenerations typically require 30 to 40 min and involve keeping the bus parked and the engine at high revolutions while injecting fuel into the exhaust system to generate the necessary temperatures for soot combustion. Table 1 summarizes the soot zones and their corresponding operational implications. From a maintenance perspective, Zones 3 and 4 are particularly critical.

In addition to the problem itself, a major limitation in this context is the slow and incremental development of the dataset, particularly with respect to the moments of interest. This gradual accumulation of data requires methodological strategies specifically tailored to scenarios of data scarcity and delayed availability.

Given this operational context, the main objective of this study is to develop a supervised learning model capable of predicting, in real time, the expected duration a vehicle will remain within Zones 3 or 4 after entering them. The target variable, duration within a soot zone, is calculated based on historical ECU data by measuring the time difference between the first and last timestamp of each uninterrupted zone occurrence. This predictive insight enables maintenance teams to assess whether a vehicle can safely complete its assigned route or requires immediate intervention.
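The target definition above can be illustrated with a minimal pandas sketch. The column names (`bus_id`, `timestamp`, `zone`) and the sample log are hypothetical and are not taken from the fleet's actual export format; the sketch only shows one way to segment uninterrupted zone occurrences and measure the time between their first and last timestamps.

```python
import pandas as pd

def zone_durations(df: pd.DataFrame) -> pd.DataFrame:
    """One row per uninterrupted soot-zone occurrence, with its duration in hours."""
    df = df.sort_values(["bus_id", "timestamp"]).copy()
    # An occurrence starts whenever the zone differs from the previous sample
    # of the same bus (the first sample of each bus always starts one).
    starts = df["zone"] != df.groupby("bus_id")["zone"].shift()
    df["occurrence"] = starts.cumsum()
    out = df.groupby(["bus_id", "occurrence", "zone"], as_index=False).agg(
        start=("timestamp", "min"), end=("timestamp", "max")
    )
    # Target: time between the first and last timestamp of the occurrence.
    out["duration_h"] = (out["end"] - out["start"]).dt.total_seconds() / 3600.0
    return out.drop(columns="occurrence")

# Hypothetical ECU log for a single bus (illustrative values only).
log = pd.DataFrame({
    "bus_id": ["A"] * 6,
    "timestamp": pd.to_datetime([
        "2025-01-01 08:00", "2025-01-01 08:10", "2025-01-01 08:20",
        "2025-01-01 08:40", "2025-01-01 08:50", "2025-01-01 09:10",
    ]),
    "zone": [2, 3, 3, 3, 4, 4],
})
durations = zone_durations(log)
print(durations[["zone", "duration_h"]])
```

With coarse sampling, a duration measured this way is a lower bound on the true time in the zone, since entry and exit can occur between two samples.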

2.2. Data Collection and Technological Infrastructure

This study was conducted using data collected from a fleet of 164 hybrid diesel buses (Figure 2), all compliant with the EURO VI-D emissions standard. These vehicles primarily operate in urban environments and are part of a major public transport network.

Data acquisition is carried out through the vehicles’ OBDII systems and integrated telematics diagnostic tools installed across the entire fleet. These electronic systems continuously monitor and record a wide range of variables from multiple ECUs (engine, aftertreatment, HVAC, etc.). Specifically, for the aftertreatment system, up to 99 variables are available, including both direct sensor measurements, such as temperatures, pressures, and NOx concentrations, and internally calculated values, such as the time or distance since the last regeneration event. Key components monitored include the Diesel Particulate Filter (DPF), Diesel Oxidation Catalyst (DOC), Selective Catalytic Reduction (SCR) system, and the ammonia blocker. The sampling interval of the data ranges from 10 to 30 min, depending on the signal and the operational context.

Based on prior domain-specific analyses and previous studies focused on DPF regeneration behavior, a predefined subset of variables was selected for this work. Rather than conducting a broad exploratory selection, the study directly leverages the variables known to be most informative for predicting clogging dynamics. These include, among others, the estimated soot content in the filter and operational indicators such as the distance and time since the last regeneration. The complete list of selected variables is summarized in Table 2.

The telematics and storage infrastructure, provided by a third-party technology supplier, enables the download of data in XLSX format for subsequent processing and analysis. Although data extraction in this study was performed manually, the platform also offers API-based access, which opens the possibility for future integration of real-time and automated data ingestion. This would allow the proposed model to operate in a fully online manner while also supporting continuous validation and updates through scheduled batch retraining. The dataset was processed using Python (version 3.12.7), employing standard libraries such as pandas, matplotlib, and scikit-learn. One key limitation of the dataset is its relatively low sampling frequency (one sample every 10 min at best), which, while adequate for general operational monitoring, may hinder the granularity and precision of certain visualizations and downstream analyses. This constraint originates from the vehicle’s own onboard electronics (OBDII system), as its operational design imposes a limit on the number of requests that can be made per unit of time. Additionally, it does not allow simultaneous data retrieval from different ECUs, further restricting the resolution and synchronicity of the collected signals.

2.3. Exploratory Data Analysis (EDA) and Preprocessing

The exploratory data analysis (EDA) and preprocessing stages are essential in any Machine Learning workflow, as they ensure the consistency, quality, and usability of the data. Data preparation tasks, such as cleaning, transformation, and structuring, typically account for over 60% of the total effort in ML projects [34], highlighting the foundational importance of this step.

Once the target variable and input features were defined, an initial data cleaning phase was conducted to remove entries that, based on the nature of the phenomenon under study, were not relevant for the modeling task. The following cleaning steps were applied:

Removal of regeneration periods, which do not reflect natural soot accumulation behavior.
Elimination of null, inconsistent, or physically implausible values.
Isolation of complete soot zone transitions, ensuring that only uninterrupted operational cycles were retained.
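The three cleaning steps can be sketched as follows. This is an illustrative reading and not the authors' code: the column names (`regen_active`, `soot_pct`, `zone`, `bus_id`, `timestamp`), the plausible soot range, and the interpretation of a "complete" occurrence as one that is not cut off at the start or end of a bus's record are all assumptions.

```python
import numpy as np
import pandas as pd

def clean_records(df: pd.DataFrame, soot_range=(0.0, 100.0)) -> pd.DataFrame:
    # Step 1: drop samples recorded during a regeneration.
    out = df.loc[~df["regen_active"]]
    # Step 2: drop nulls and values outside an assumed plausible range.
    out = out.dropna(subset=["zone", "soot_pct"])
    out = out.loc[out["soot_pct"].between(*soot_range)]
    return out.reset_index(drop=True)

def keep_complete_occurrences(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["bus_id", "timestamp"]).copy()
    starts = df["zone"] != df.groupby("bus_id")["zone"].shift()
    df["occurrence"] = starts.astype(int).groupby(df["bus_id"]).cumsum()
    last = df.groupby("bus_id")["occurrence"].transform("max")
    # Step 3 (assumed reading): the first and last occurrence of each bus are
    # truncated by the extraction window, so only the ones in between are kept.
    complete = (df["occurrence"] > 1) & (df["occurrence"] < last)
    return df.loc[complete].drop(columns="occurrence")

# Hypothetical raw records (illustrative values only).
raw = pd.DataFrame({
    "bus_id": ["A"] * 7,
    "timestamp": range(7),
    "zone": [2, 3, 3, 3, 3, 1, 4],
    "soot_pct": [40.0, 55.0, np.nan, 60.0, 250.0, 20.0, 80.0],
    "regen_active": [False, False, False, False, False, True, False],
})
cleaned = clean_records(raw)
complete = keep_complete_occurrences(cleaned)
print(complete)
```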

Given the limited dataset size, the analysis followed a visualization-first approach, where patterns, anomalies, and structural inconsistencies were primarily explored through graphical methods. The open-source Python library ydata-profiling played a central role in this process by providing an automated and comprehensive EDA report. This tool generated multiple statistical diagnostics, such as univariate summaries and histograms, which are crucial for assessing data quality without requiring extensive manual scripting. For this specific study, heatmaps, scatter plots, and boxplots, alongside univariate statistics, were used to uncover underlying patterns and identify outliers. Additionally, expert domain knowledge was a key factor in validating any data filtering decisions and ensuring the operational plausibility of the remaining records.

The study was conducted separately for Zones 3 and 4 to better capture the distinct accumulation patterns and operating conditions of each zone. Although the full dataset included soot zone transitions from Zones 0 to 5, the scope of this study is operationally centered on Zones 3 and 4, as they represent critical thresholds for intervention and power limitation.

2.4. Feature Engineering

A feature engineering phase was carried out to enrich the representation of the soot accumulation process using variables with strong operational relevance and interdependence, specifically, the distance traveled, elapsed time, and fuel consumed since the last regeneration. These were used to derive new features aimed at enhancing the model’s ability to capture subtle behavioral patterns.

Among these, two versions of a parameter named “Fuel Intensity Index” were developed to quantify the relationship between fuel usage and driving effort over time. The first one, referred to as the “Fuel Intensity Index (Abs)”, was calculated using the absolute values of fuel consumption, distance, and time at the moment of entering a new soot load zone:

\text{Fuel Intensity Index (Abs)} \left[\frac{\mathrm{L}}{\mathrm{km} \cdot \mathrm{h}}\right] = \frac{\text{Fuel consumption } [\mathrm{L}]}{\text{Distance } [\mathrm{km}] \cdot \text{Time } [\mathrm{h}]}

This version aimed to characterize the operational effort at the onset of each soot level.

Additionally, to capture the temporal evolution of soot accumulation, a second variable named “Fuel Intensity Index (Prev Delta)” was introduced. This feature quantifies the fuel usage relative to distance and time during the immediately preceding soot zone, offering insight into the operational dynamics that led up to the current state. It was computed as:

\text{Fuel Intensity Index (Prev Delta)} \left[\frac{\mathrm{L}}{\mathrm{km} \cdot \mathrm{h}}\right] = \frac{\text{Delta fuel consumption prev zone } [\mathrm{L}]}{\text{Delta distance prev zone } [\mathrm{km}] \cdot \text{Delta time prev zone } [\mathrm{h}]}

By integrating this lagged feature, the model is better equipped to recognize transitional behavior patterns and contextualize current observations within recent operational history.
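A minimal sketch of both indices is given below, assuming one row per zone entry with the counters since the last regeneration read at that moment. The column names (`fuel_l`, `dist_km`, `time_h`, `entry_time`) are hypothetical, and the "Prev Delta" increments are approximated here as the difference between consecutive zone entries of the same bus. That approximation only holds if no regeneration reset the counters in between, so non-positive increments are masked.

```python
import pandas as pd

def add_fuel_intensity(entries: pd.DataFrame) -> pd.DataFrame:
    out = entries.sort_values(["bus_id", "entry_time"]).copy()

    # Abs: counters since the last regeneration, read at zone entry.
    valid_abs = (out["dist_km"] > 0) & (out["time_h"] > 0)
    out["fii_abs"] = (out["fuel_l"] / (out["dist_km"] * out["time_h"])).where(valid_abs)

    # Prev Delta: increments accumulated during the preceding zone, taken as
    # counter at this entry minus counter at the previous entry of the same bus.
    prev = out.groupby("bus_id")[["fuel_l", "dist_km", "time_h"]].shift()
    d_fuel = out["fuel_l"] - prev["fuel_l"]
    d_dist = out["dist_km"] - prev["dist_km"]
    d_time = out["time_h"] - prev["time_h"]
    # Mask the first entry of each bus and any counter reset (non-positive delta).
    valid_delta = (d_dist > 0) & (d_time > 0)
    out["fii_prev_delta"] = (d_fuel / (d_dist * d_time)).where(valid_delta)
    return out

# Hypothetical zone entries for one bus (illustrative values only).
entries = pd.DataFrame({
    "bus_id": ["A", "A"],
    "entry_time": [1, 2],
    "zone": [2, 3],
    "fuel_l": [10.0, 25.0],
    "dist_km": [50.0, 100.0],
    "time_h": [2.0, 5.0],
})
features = add_fuel_intensity(entries)
# Zone 3 entry: Abs = 25 / (100 * 5) = 0.05; Prev Delta = 15 / (50 * 3) = 0.1
print(features[["zone", "fii_abs", "fii_prev_delta"]])
```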

2.5. Model Development

A supervised learning approach was adopted to predict the duration a vehicle remains in a given soot load zone, specifically Zones 3 and 4. The problem was formulated as a regression task, where the target variable is the time spent in each zone.

2.5.1. Modeling Strategy and Algorithm Selection

To address the challenge of data scarcity, the modeling strategy focused on interpretable and structurally simple regression algorithms capable of generalizing from small datasets. More complex models, such as ensemble methods or neural networks, were deliberately excluded for two main reasons. First, given the limited size of the dataset, their use would increase the risk of overfitting and reduce the model’s ability to generalize. Second, since the proposed pipeline is intended for early-stage adoption by maintenance teams with limited expertise in Machine Learning, it is essential to prioritize models that are inherently interpretable, avoiding the opacity and complexity of black-box approaches [35,36,37,38]. To implement the selected models, the training pipeline was developed using Scikit-learn, an open-source Python library widely used for machine learning tasks. Its transparency and extensive documentation make it particularly suitable for building robust and reproducible predictive workflows in real-world applications. For each model, a grid search was performed to select the best-performing hyperparameters based on the validation criteria established for this study.

The following four algorithms were selected based on their alignment with the project’s constraints and goals:

Linear Regression: A baseline model that assumes a linear relationship between input variables and the target. It is simple to interpret and computationally efficient but lacks flexibility for capturing non-linear dependencies.
Ridge Regression: An extension of Linear Regression that includes L2 regularization to mitigate overfitting. It performs better than standard Linear Regression when multicollinearity is present but still struggles with non-linear patterns.
Decision Tree: A non-parametric model that splits data based on feature thresholds. It captures non-linear relationships well and is easy to interpret but is prone to overfitting.
K-Nearest Neighbors (KNN): A simple, instance-based learner that makes predictions based on the average of the closest training samples. While it lacks traditional interpretability (e.g., coefficients or rules), its conceptual simplicity and local behavior make it useful for benchmarking in low-data contexts. However, it is sensitive to feature scaling and may become computationally inefficient when applied to large datasets.

To ensure that all input features were on a comparable scale, particularly for models sensitive to feature magnitude, such as KNN, Linear Regression, and Ridge Regression, the RobustScaler method was applied during preprocessing. This approach also mitigates the influence of outliers while preserving consistency across all algorithms.
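One way to combine the scaler, the four candidate models, and a grid search in scikit-learn is sketched below. The hyperparameter grids, the MAE-based selection criterion, and the synthetic data are assumptions made for illustration; the study's actual grids and validation criteria are not reproduced here. Placing `RobustScaler` inside a `Pipeline` ensures it is refit on each training fold, so no information from the held-out sample leaks into the scaling.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeRegressor

# Candidate models with illustrative (assumed) hyperparameter grids.
specs = {
    "linear": (LinearRegression(), {}),
    "ridge": (Ridge(), {"model__alpha": [0.1, 1.0, 10.0]}),
    "tree": (DecisionTreeRegressor(random_state=0), {"model__max_depth": [2, 3, 4]}),
    "knn": (KNeighborsRegressor(), {"model__n_neighbors": [3, 5, 7]}),
}

# Synthetic stand-in data (not the fleet dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.1, size=20)

results = {}
for name, (estimator, grid) in specs.items():
    pipe = Pipeline([("scale", RobustScaler()), ("model", estimator)])
    # MAE is used for selection because R² is undefined on a single held-out sample.
    search = GridSearchCV(pipe, grid, cv=LeaveOneOut(),
                          scoring="neg_mean_absolute_error")
    search.fit(X, y)
    results[name] = -search.best_score_  # mean absolute error across folds

for name, mae in results.items():
    print(f"{name}: LOOCV MAE = {mae:.3f}")
```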

2.5.2. Data Augmentation for Small Samples

In scenarios where training data is limited, data augmentation can help improve model learning. In this study, the critical soot accumulation zones (Zones 3 and 4) had relatively few available records, making it harder for the model to learn the full range of behaviors. One method designed for these situations is SMOGN (Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise). This technique generates new, artificial data points either by interpolating between nearby real samples or by injecting Gaussian noise into more distant ones. The goal is to increase the number of examples in areas of the dataset that are underrepresented, allowing the model to better understand and learn from those less frequent cases [39]. To avoid introducing bias in the performance evaluation, the synthetic data were used only in the training set, while the test set remained composed entirely of real observations. This ensures that the model is evaluated only on actual vehicle behavior, reflecting its true ability to generalize.
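The sketch below illustrates the idea of augmenting only the training portion. The function `smogn_like` is a deliberately simplified stand-in written for this example; it is not the published SMOGN algorithm, nor the implementation used in the study, and its threshold rule, noise scale, and definition of "rare" (targets above the 75th percentile) are assumptions. It interpolates towards a nearby rare neighbour when that neighbour is close, and otherwise perturbs the seed sample with Gaussian noise.

```python
import numpy as np

def smogn_like(X, y, rare_mask, n_new, k=3, noise_scale=0.02, seed=0):
    """Simplified SMOGN-style oversampling for regression (illustrative only)."""
    rng = np.random.default_rng(seed)
    Xr, yr = X[rare_mask], y[rare_mask]
    if len(Xr) < 2 or n_new <= 0:
        return X, y
    # Pairwise distances among rare samples; self-distance is excluded.
    dist = np.linalg.norm(Xr[:, None, :] - Xr[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    k = min(k, len(Xr) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    new_X, new_y = [], []
    for _ in range(n_new):
        i = rng.integers(len(Xr))
        j = rng.choice(neighbours[i])
        # Assumed "safe" distance: half the median distance to the other rare samples.
        safe = np.median(np.delete(dist[i], i)) / 2.0
        if dist[i, j] < safe:
            # Close neighbour: interpolate between the two real samples.
            t = rng.random()
            new_X.append(Xr[i] + t * (Xr[j] - Xr[i]))
            new_y.append(yr[i] + t * (yr[j] - yr[i]))
        else:
            # Distant neighbour: add small Gaussian noise to the seed sample.
            new_X.append(Xr[i] + rng.normal(0.0, noise_scale * X.std(axis=0)))
            new_y.append(yr[i] + rng.normal(0.0, noise_scale * y.std()))
    return np.vstack([X, np.array(new_X)]), np.concatenate([y, np.array(new_y)])

# Synthetic stand-in data with a right-skewed target (not the fleet dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.exp(X[:, 0]) + 0.1 * rng.normal(size=30)

# Split first, then augment the training part only; the test part stays real.
X_train, y_train, X_test, y_test = X[:24], y[:24], X[24:], y[24:]
rare = y_train > np.quantile(y_train, 0.75)
X_aug, y_aug = smogn_like(X_train, y_train, rare, n_new=12)
print(X_train.shape, "->", X_aug.shape, "| test unchanged:", X_test.shape)
```

When this is combined with cross-validation, the same ordering applies inside every fold: the held-out observation is set aside before any synthetic samples are generated.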

2.6. Validation and Performance Assessment

To address the limitations imposed by the small sample size, Leave-One-Out Cross-Validation (LOOCV) was adopted as the primary validation strategy. Unlike traditional k-fold cross-validation, which can suffer from high variance and unstable validation splits in low-data contexts, LOOCV maximizes the use of available samples by iteratively training on all data points except one (Figure 3). This method offers a nearly unbiased estimate of model generalization ability, ensuring that every observation contributes to both training and validation phases. Although LOOCV is more computationally intensive than k-fold cross-validation, the small dataset size made it feasible in this case.
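A minimal scikit-learn sketch of LOOCV is shown below, using synthetic data and a Ridge pipeline chosen purely for illustration. Because each fold holds out a single observation, variance-based scores such as R² cannot be computed per fold; a common workaround, assumed here, is to pool the n out-of-sample predictions and compute the metrics once over the pooled vector.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in data (not the fleet dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))
y = X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.1, size=25)

model = make_pipeline(RobustScaler(), Ridge(alpha=1.0))
loo = LeaveOneOut()

# Each prediction comes from a model trained on the other n - 1 samples.
y_pred = cross_val_predict(model, X, y, cv=loo)

print("folds:", loo.get_n_splits(X))
print("pooled MAE:", mean_absolute_error(y, y_pred))
print("pooled R2:", r2_score(y, y_pred))
```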

Following model training, the focus moved to validation, where three standard metrics were used to assess performance:

R² (coefficient of determination) to measure the proportion of variance explained by the model.
MAE (Mean Absolute Error) to evaluate average prediction error in absolute terms.
RMSE (Root Mean Squared Error) to penalize large deviations more strongly, thus capturing the overall prediction robustness.

The mathematical formulations of these evaluation metrics are defined as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}
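As a worked check of the three formulas, the following NumPy sketch evaluates them on four made-up observations (the values are illustrative and unrelated to the study's data).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # sum of squared errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variance around the mean
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": np.mean(np.abs(residuals)),
        "rmse": np.sqrt(np.mean(residuals ** 2)),
    }

# Residuals are -0.5, 0, 0.5, -0.5, so SS_res = 0.75 and SS_tot = 5.0.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
print(metrics)  # r2 = 0.85, mae = 0.375, rmse ≈ 0.433
```

Since RMSE squares the residuals before averaging, it is never smaller than MAE on the same predictions, and the gap widens when a few errors are much larger than the rest.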

In addition to numerical metrics, visual diagnostic tools were employed to analyze model behavior. Scatter plots of predicted versus actual values provided an intuitive measure of alignment with the ground truth, while residual plots were used to detect potential patterns or systematic biases in the prediction errors.

Once the model demonstrated acceptable performance during the validation phase, a final training step was carried out using the entire initial dataset (i.e., all data from the first month and a half). This process aimed to maximize the model’s learning potential by incorporating the full range of observed operating conditions. The resulting model serves as the production version, to be used in subsequent evaluations and operational testing.

Feature Importance and Model Interpretability

Model interpretability plays a crucial role in predictive maintenance applications, as it enables understanding and trust in the model’s decisions. Given the variety of algorithms proposed in this study, each model requires a specific interpretability approach aligned with its intrinsic nature.

For Linear Regression and Ridge Regression, interpretability is intrinsic and based on the analysis of learned coefficients. When input features are scaled, using RobustScaler in this case, the sign and magnitude of each coefficient indicate the direction and strength of its effect on the target. A positive value implies that increasing the feature raises the predicted duration, while a negative one implies the opposite. Feature relevance can be assessed by comparing the absolute values of these coefficients.
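The coefficient-based reading described above can be sketched as follows. This is an illustrative example on synthetic data, not the authors' code: the feature names, the generated values, and the regularization strength `alpha=1.0` are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

feature_names = ["fuel_intensity_index", "soot_content"]  # hypothetical names

# Synthetic features on very different raw scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2)) * np.array([1.0, 50.0])
y = -1.5 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0.0, 0.05, size=40)

# RobustScaler centers on the median and divides by the IQR,
# which makes coefficient magnitudes comparable across features.
pipe = make_pipeline(RobustScaler(), Ridge(alpha=1.0))
pipe.fit(X, y)

coefs = pipe.named_steps["ridge"].coef_

# Rank features by absolute coefficient; the sign gives the direction of the effect.
ranking = sorted(zip(feature_names, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

Without scaling, the second feature would receive a much smaller raw coefficient simply because of its larger units, so the comparison of magnitudes is meaningful only on the scaled inputs.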

In contrast, Decision Tree models offer interpretability through their hierarchical structure. In this work, tree depth was deliberately limited to reduce overfitting, which also enhances clarity. The model’s logic can be illustrated through tree diagrams that highlight the sequence of decisions based on feature thresholds or alternatively expressed as simplified, human-readable rules (e.g., “If the fuel intensity index is below 0.3, the expected time in Zone 3 is under 1 h”). Additionally, feature importance scores extracted from the model summarize each variable’s impact on predictions.
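A shallow tree and its rule-based reading can be illustrated with scikit-learn's `export_text`. The synthetic data, the feature names, and the hyperparameters (`max_depth=2`, `min_samples_leaf=5`) are assumptions chosen for the sketch; the 0.3 threshold simply mirrors the illustrative rule quoted above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

feature_names = ["fuel_intensity_index", "soot_content"]  # hypothetical names

# Synthetic step-like target: short durations below an index of 0.3.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = np.where(X[:, 0] < 0.3, 0.8, 2.5) + rng.normal(0.0, 0.05, size=40)

# Limiting depth and leaf size keeps the tree small and readable.
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=5, random_state=0)
tree.fit(X, y)

# Human-readable if/else rules built from the learned thresholds.
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Impurity-based importance scores, one per feature (they sum to 1).
importances = dict(zip(feature_names, tree.feature_importances_))
print(importances)
```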

Finally, for K-Nearest Neighbors, which lacks internal model coefficients or rules, a model-agnostic approach was required. In this case, permutation importance was used to assess each feature’s relevance by measuring the decrease in R² when the values of a feature are randomly permuted in the test set. This method enables the estimation of each variable’s contribution to the model’s predictive accuracy, even in non-parametric settings such as KNN.
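The permutation procedure can be sketched with `sklearn.inspection.permutation_importance`. The data, the hold-out split, and `n_repeats=30` are assumptions for illustration; the study itself reports importances for its own fitted models.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data in which the first feature drives most of the target.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(60, 2))
y = 3.0 - 2.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.05, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
knn = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

# Each feature is shuffled n_repeats times on the test set; the importance
# is the mean drop in R² relative to the unshuffled score.
result = permutation_importance(
    knn, X_test, y_test, scoring="r2", n_repeats=30, random_state=0
)
for idx, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {idx}: mean drop in R2 = {mean:.2f} (std {std:.2f})")
```

Since shuffling a dominant feature can push R² below zero, the reported drop is not bounded by 1.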

2.7. Adaptive Retraining Strategy

To ensure that the predictive model remains accurate and relevant over time, a scheduled batch retraining strategy was incorporated into the pipeline. This approach is particularly suited for operational environments where data accumulates gradually. Instead of relying on continuous updates, the model is periodically re-evaluated with newly acquired data and is only retrained when a significant performance drop is detected.

Model performance is monitored using the same set of metrics established during the initial evaluation, namely, R², MAE, and RMSE. To determine when retraining is warranted, a trigger threshold is defined based on the model’s initial performance gain over a baseline predictor (e.g., a mean-based model). After each one-month period of new operational data, the model’s performance is reassessed by averaging results over this window. If performance deteriorates to the point where the original improvement is lost, such as RMSE approaching baseline levels, retraining is triggered. This periodic assessment ensures that model updates are evidence-based, timely, and operationally justified, avoiding unnecessary changes while preserving reliability.
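One possible formalization of this trigger is sketched below. The rule (retrain when the model's RMSE is no longer below the baseline RMSE), the `margin` parameter, and the names `BatchMetrics` and `needs_retraining` are assumptions, since the text describes the criterion only qualitatively. The numeric inputs are the production-batch values reported in Section 3.3.

```python
from dataclasses import dataclass


@dataclass
class BatchMetrics:
    """Metrics averaged over one monthly evaluation window (hours)."""
    mae: float
    rmse: float


def needs_retraining(model: BatchMetrics, baseline: BatchMetrics, margin: float = 0.0) -> bool:
    """Return True when the model's RMSE gain over the baseline is lost.

    `margin` is a hypothetical tolerance: the model must beat the
    baseline RMSE by more than `margin` to stay in production.
    """
    return model.rmse >= baseline.rmse - margin


# Production-batch metrics as reported in Section 3.3.
zone3_retrain = needs_retraining(BatchMetrics(mae=1.01, rmse=1.16),
                                 BatchMetrics(mae=0.97, rmse=1.11))
zone4_retrain = needs_retraining(BatchMetrics(mae=0.6758, rmse=0.9330),
                                 BatchMetrics(mae=0.9612, rmse=1.0722))
print(zone3_retrain, zone4_retrain)
```

A non-zero `margin` would make the check more conservative, flagging models whose advantage over the baseline has become marginal before it disappears entirely.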

To simulate real-world deployment and evaluate long-term robustness, a one-month batch of data, completely withheld during the model development phase, was used as a final test set. These records, obtained from the historical files described earlier in the data collection phase, serve as a proxy for future operational scenarios. The evaluation on this set determines whether the model requires retraining or whether its prediction error remains acceptable.

As more operational data becomes available over time, the retraining strategy can evolve not only in frequency but also in modeling complexity. A larger dataset would enable the safe adoption of more sophisticated models, such as ensemble methods or neural networks, while mitigating overfitting risks. Furthermore, the increased data volume would support the inclusion of additional features, potentially reducing prediction variance and enhancing generalization across diverse operational scenarios. This evolution could also extend to the validation strategy: while Leave-One-Out Cross-Validation (LOOCV) was appropriate under data-scarce conditions, it is important to acknowledge its limitations, particularly its high computational cost and tendency to produce high-variance estimates. In later stages, alternative validation methods such as k-fold cross-validation could offer a more efficient and robust solution for larger datasets.

Although the data used in this case study were manually downloaded, the telematics platform provides an API that enables automated access. The methodology described in this section is designed under the assumption that the API is active, allowing full automation of both the real-time prediction process and the scheduled model evaluations.

2.8. Summary of the Proposed Methodology

Figure 4 summarizes the complete methodological pipeline, outlining the key stages of model development, deployment, and retraining. Each block reflects a critical phase in the lifecycle of the predictive maintenance pipeline.

3. Results and Discussion

This section presents the results of the study, beginning with a summary of the exploratory data analysis (EDA) and the rationale behind key preprocessing and modeling decisions. It then delves into the performance of the predictive models developed for each soot load zone, concluding with an operational test using new data to evaluate the models’ robustness and assess their suitability for continued deployment.

3.1. Exploratory Data Analysis

The exploratory analysis was carried out separately for each soot load zone (Zone 3 and Zone 4), with the primary objective of identifying relevant variable interactions for modeling the duration a vehicle remains in each zone. The process was iterative and guided by both domain expertise and data-driven criteria, which led to the progressive refinement of the dataset.

Among the filtering steps, data points with extreme or implausible values were removed to improve the consistency and representativeness of the sample. For this purpose, the ydata-profiling library, previously introduced in the methodology, was used to generate a comprehensive statistical summary of the dataset. For instance, Figure 5 and Figure 6 present the distribution of the variable “Distance since last regeneration” for Zones 3 and 4, respectively. In Zone 3, the distribution exhibited a strong positive skew (skewness = 4.03), indicating a long right tail with extreme values. While the interquartile range (IQR) was 125.5 km and the 95th percentile stood at approximately 563.6 km, the maximum recorded value reached 1644 km, roughly five times the third quartile (Q3 = 325 km). The upper Tukey limit for this variable (Q3 + 1.5 × IQR) was calculated at 513.25 km, confirming the presence of outliers well beyond the expected range. Based on this analysis, a conservative upper threshold of 500 km was applied to remove these anomalous records and preserve only representative operational patterns. In Zone 4, a similar distributional shape was observed, with a skewness of 2.65. Although the values were less extreme than in Zone 3, the 95th percentile still reached approximately 772 km, and the maximum value observed was 1275 km. The interquartile range was slightly higher (140 km), and the corresponding Tukey threshold was estimated at 533 km. Given this, a consistent cutoff of 500 km was also applied in this case to eliminate atypical observations that could distort model learning. This univariate filtering approach was systematically applied to other core features to identify and remove outliers, helping to define the normal operational range of each variable.
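The Tukey limit quoted for Zone 3 can be reproduced from the reported quartile statistics. In the sketch below, the helper names and the sample distances are hypothetical; only Q3 = 325 km, IQR = 125.5 km, and the 500 km cutoff come from the text.

```python
def tukey_upper_fence(q3: float, iqr: float, k: float = 1.5) -> float:
    """Upper Tukey fence: values above q3 + k * iqr are flagged as outliers."""
    return q3 + k * iqr


def keep_at_or_below(values, cutoff):
    """Keep only the records at or below the chosen cutoff."""
    return [v for v in values if v <= cutoff]


# Zone 3 quartile statistics as reported in the text (km).
zone3_fence = tukey_upper_fence(q3=325.0, iqr=125.5)
print(zone3_fence)  # 513.25

# Hypothetical distances since last regeneration (km), for illustration only.
distances_km = [120.0, 250.0, 310.0, 480.0, 1644.0]
kept = keep_at_or_below(distances_km, cutoff=500.0)
print(kept)
```

The applied 500 km cutoff lies slightly below the computed fence, which is why the text describes it as conservative.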

With the cleaning steps completed, Figure 7 and Figure 8 show the resulting correlation heatmaps for Zones 3 and 4, respectively, while Table 3 displays the codification used for presenting the variables. In the case of Zone 3, the absolute variables, namely, distance traveled (Variable 1), time elapsed (Variable 4), and fuel consumed (Variable 3), exhibit the highest positive correlations with the target variable Delta_time [h] (Variable 10), with correlation coefficients of 0.61, 0.57, and 0.58, respectively. These variables, which reflect accumulated behavior since the last regeneration event, appear to offer slightly more predictive value than their delta-based counterparts (Variables 6–8), whose correlations with the target are somewhat lower. In line with these findings, the Fuel Intensity Index (Variable 5), which represents the ratio of fuel consumption over both distance and time, displays a negative correlation of −0.64 with the target, further supporting its relevance in capturing operational load patterns and its selection as a primary feature for modeling. The Fuel Intensity Index from the previous zone (Variable 13) shows a similarly strong correlation (−0.62), making it a promising secondary candidate or complementary input. The current index is prioritized for initial model development, while the previous-zone index may be tested in future iterations to assess its potential benefits. Turning to Zone 4, the correlation heatmap reveals a pattern distinct from that of Zone 3. In this case, the delta-based variables from the previous zone, namely, Delta_time_prev_zone (Variable 6), Delta_dist_prev_zone (Variable 7), and Delta_cons_prev_zone (Variable 8), exhibit the strongest positive correlations with the target variable Delta_time [h] (Variable 10), with coefficients of 0.69, 0.67, and 0.66, respectively. 
These results suggest that, under critical soot accumulation conditions, the time and effort spent in the preceding zone may significantly influence the duration in the current zone, likely due to the residual effects of high-load operation. In addition, the Fuel Intensity Index from the previous zone (Variable 13) shows a relatively strong negative correlation with the target (−0.70), indicating that higher fuel intensity in the preceding zone (i.e., higher fuel usage per unit of distance and time) is associated with shorter durations in Zone 4. Given this behavior, the Fuel Intensity Index from the previous zone, in combination with the simulated soot content (Variable 2), was selected as the primary feature set for model development in Zone 4. Finally, other engineered variables, such as fuel_rate and fuel_per_km, whether calculated for the current or previous zone, showed consistently weak correlations in both heatmaps and were therefore excluded from the final feature selection process. Figure 9 and Figure 10 display scatter plots visualizing the correlation between the selected engineered features for each zone and the target variable.
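The correlation-based screening described above can be sketched as a ranking of candidate features by their absolute correlation with the target. The example assumes Pearson correlation (the coefficient type is not stated in this section) and uses synthetic data; the column names and generated values are hypothetical.

```python
import numpy as np

# Synthetic candidate features (hypothetical values).
rng = np.random.default_rng(4)
n = 50
features = {
    "fuel_intensity_index": rng.uniform(0.1, 1.0, size=n),
    "soot_content": rng.uniform(0.0, 1.0, size=n),
    "fuel_per_km": rng.normal(size=n),
}

# Target constructed to decrease with the fuel intensity index.
delta_time_h = 4.0 - 3.0 * features["fuel_intensity_index"] + rng.normal(0.0, 0.3, size=n)

# Pearson correlation of each feature with the target.
correlations = {
    name: float(np.corrcoef(values, delta_time_h)[0, 1])
    for name, values in features.items()
}

# Strongest associations first, regardless of sign.
ranking = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A negative coefficient here means that larger feature values coincide with shorter durations, so the sign should be read together with the magnitude when selecting features.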

Regarding the differences observed between zones, these may stem from variability in operational conditions associated with soot accumulation levels. Urban buses operate in dynamic environments, where factors such as route assignments, stop frequency, passenger loads, and traffic congestion can vary considerably. These variations may influence both soot buildup and the time spent in each zone. Consequently, such contextual differences could partially explain potential divergences in feature relevance or model performance across zones.

3.2. Model Results and Validation

After data exploration and cleaning, the final dataset used for both the initial training process and the subsequent production evaluation is summarized in Table 4.

3.2.1. Model Results—Zone 3

For Zone 3, the regression algorithms were evaluated using LOOCV. Table 5 summarizes their performance alongside a simple baseline model (which always predicts the mean duration):

Among all methods, KNN delivers the greatest improvement over the baseline, reducing MAE by 0.18 h and RMSE by 0.34 h while raising R² to ≈0.60, suggesting that local, non-linear relationships, captured through a simple structure such as KNN with a small neighborhood size, are the best fit for the current characteristics of the dataset. Figure 11 and Figure 12 present the actual vs. predicted and residual plots, respectively. Because RMSE penalizes larger errors more heavily than MAE, the fact that RMSE decreased almost twice as much implies that KNN is particularly effective at mitigating extreme deviations. This means the model not only improves average accuracy but also substantially curbs the most significant prediction errors, yielding a more reliable distribution of residuals. Regarding the rest of the models, the Decision Tree model outperformed the purely linear approaches, and its optimal configuration was very shallow. When deeper trees or smaller min_samples_split values were tested during grid search, overfitting became apparent: training accuracy improved, but test performance deteriorated sharply. Linear Regression and Ridge Regression performed nearly identically, confirming that a simple linear relationship is insufficient to capture the underlying dynamics. As a result, KNN was selected as the primary model for Zone 3.

Regarding model explainability, the contribution of each input variable in the final KNN model was evaluated using the permutation importance technique (Figure 13). This model-agnostic method estimates the relevance of each feature by measuring the loss in R² when its values are randomly permuted, thereby disrupting the original input-output relationship. Results indicated that the Fuel Intensity Index was the most influential feature, with a mean importance of 1.24 (a drop larger than 1 is possible because permuting a dominant feature can drive R² below zero), whereas the simulated soot content in the particulate filter exhibited a lower and less stable impact (mean: 0.27; std: 0.23). Despite its reduced contribution to predictive performance, the soot content variable was retained due to its physical interpretability and relevance to the clogging process. Its inclusion supports domain-informed modeling and enhances transparency when communicating results to operational teams.

3.2.2. Model Results—Zone 4

Table 6 summarizes the performance of each model on Zone 4, using the same LOOCV as for Zone 3:

Once again, KNN delivers the best results, achieving the highest R² (≈0.72) and the lowest MAE (≈0.53 h) and RMSE (≈0.69 h). Figure 14 and Figure 15 show the actual vs. predicted and residual plots, respectively, confirming that KNN predictions align closely with observed values across the range of durations. Compared to the baseline (MAE = 1.17 h, RMSE = 1.31 h), the KNN model achieves an absolute reduction of 0.64 h in MAE and 0.61 h in RMSE. Notably, the reduction in RMSE is almost equal in magnitude to the MAE drop, despite RMSE giving more weight to larger errors. This suggests that KNN not only improves average predictive accuracy but also effectively reduces the impact of large deviations, resulting in a more reliable and stable model output. As for the rest of the models, the linear models perform similarly (R² ≈ 0.65), indicating that while a global linear trend captures part of the behavior, it fails to account for important non-linear patterns in the data. Meanwhile, the Decision Tree model exhibits poor generalization: its low R² and high RMSE point to overfitting, even with minimal depth. Taken together, these findings confirm KNN as the most appropriate model for Zone 4.

Regarding explainability, the final KNN model for Zone 4 was also examined using permutation importance. In this case, the Fuel Intensity Index from the previous zone emerged as the most relevant input, with a mean contribution of 0.77 to the R² score (Figure 16). In contrast, the simulated soot content showed a lower importance (0.20) and greater variability across permutations, suggesting a less consistent influence. Nevertheless, following the same rationale as in Zone 3, this feature was retained in the model due to its physical significance and its role in facilitating transparent communication with maintenance stakeholders.

3.3. Production Results and Validation

3.3.1. Production Results—Zone 3

When evaluated on the most recent batch of unseen operational data, the KNN model for Zone 3 exhibited a significant performance drop relative to the newly computed baseline (Figure 17 and Figure 18). While the baseline predictor (which simply outputs the mean of the training targets) achieved an MAE of 0.97 h and an RMSE of 1.11 h, the KNN model underperformed on both metrics, reaching an MAE of 1.01 h and an RMSE of 1.16 h. Moreover, the R² value dropped below zero (−0.10), indicating that the model failed to explain any variance in the new data and performed worse than the naive baseline. This degradation highlights a clear loss of generalization capacity when applied to data from a subsequent operational period.

Several factors may explain this abrupt drop in performance. First, changes in operating patterns, such as the inclusion of routes that were not present in the initial dataset, shifts in traffic conditions, or variations in driving behavior, may have altered the statistical properties of the input features, leading to a phenomenon known as data drift. In this context, data drift occurs when the real-world data distribution diverges from that used to train the model, causing it to make inaccurate predictions despite being valid at the time of deployment. Second, the model’s dependence on a very small number of neighbors (k = 3) may have made it overly sensitive to local patterns present in the training data. This situation reveals a high level of sensitivity in the developed model, where slight changes in the input distribution can significantly degrade predictive performance. Such sensitivity reflects a lack of robustness to perturbations in the feature space and to variations in local data structure. This fragility is especially common in low-bias, high-variance algorithms like KNN when trained on small datasets. Taking into account that our methodology relies on scheduled batch retraining rather than sensitivity analyses (which are limited by small datasets), we expect model sensitivity to decrease with each retraining cycle as new deviations are incorporated and the input space becomes more thoroughly covered.

Given these results, a model revision and retraining phase is necessary. This revision should include:

An updated exploratory analysis of the new batch to detect emerging trends or shifts in the distribution of key features.
A model retraining phase using an expanded dataset that incorporates the newly labeled data.
A re-evaluation of model complexity, considering whether the original KNN architecture remains appropriate or whether the use of more expressive models (e.g., deeper Decision Trees or ensemble methods like Random Forests) becomes justified as more data becomes available.

This adaptive retraining step ensures that the model remains aligned with evolving real-world dynamics and maintains its predictive value in operational settings.

3.3.2. Production Results—Zone 4

When tested on the most recent production batch, the KNN model for Zone 4 maintained superior performance over the baseline (Figure 19 and Figure 20), achieving an MAE of 0.6758 h and an RMSE of 0.9330 h, compared to the baseline’s 0.9612 h and 1.0722 h, respectively. The model also preserved a positive R² value (≈0.24), confirming that it still explains a portion of the variance in the new data. However, these values represent a considerable drop from the performance observed during validation (R² ≈ 0.72, MAE ≈ 0.53 h), suggesting that generalization has weakened.

This decline may stem from subtle forms of data drift, particularly the introduction of new operating scenarios or routes not previously represented. Unlike Zone 3, the degradation here is not severe, and the model continues to outperform the naïve baseline. Therefore, retraining is not strictly necessary at this point, and the decision to trigger a model update should be left to the fleet manager or system administrator, depending on operational priorities and risk tolerance.

One notable pattern observed in the residual analysis is a consistent overestimation of durations in cases where the actual values are below 2.5 h. This localized prediction bias suggests that the model may not fully capture the behavior of shorter-duration events, potentially due to their underrepresentation in the training set or shifts in their underlying dynamics in the new data.
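A simple way to quantify such a localized bias is to compare the mean signed error below and above a duration threshold. The sketch below uses hypothetical values; the function name and the sample predictions are illustrative, and only the 2.5 h threshold comes from the text.

```python
def mean_signed_error_by_range(y_true, y_pred, threshold):
    """Mean of (predicted - actual) for actual values below / at or above threshold."""
    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    below = [p - t for t, p in zip(y_true, y_pred) if t < threshold]
    above = [p - t for t, p in zip(y_true, y_pred) if t >= threshold]
    return mean(below), mean(above)


# Hypothetical actual and predicted durations (hours).
y_true = [1.0, 1.5, 2.0, 3.0, 4.0]
y_pred = [1.8, 2.1, 2.4, 3.1, 3.9]

bias_short, bias_long = mean_signed_error_by_range(y_true, y_pred, threshold=2.5)
print(f"bias below 2.5 h: {bias_short:+.2f} h, bias at or above 2.5 h: {bias_long:+.2f} h")
```

A clearly positive value in the lower range and a value near zero elsewhere would correspond to the overestimation pattern described above.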

Following the pipeline defined in the methodology, two possible courses of action are available:

Maintain the current model in production until the next validation cycle, considering that the current performance still surpasses the baseline and no critical degradation is observed.
Initiate a retraining process, which may involve simply updating the model with the new data or conducting a more detailed revision to assess whether new outliers have emerged or significant behavioral changes in the fleet’s operation justify further model adaptation.

This decision should be made in alignment with operational priorities and the acceptable margin of prediction error in current maintenance planning or route management processes.

4. Conclusions

This study aimed to establish a complete and reproducible methodology for leveraging operational data in scenarios characterized by data scarcity, as commonly encountered in urban bus fleets undergoing the early stages of maintenance digitalization. Building upon a previously proposed framework, the methodology was adapted to include specific steps, tools, and practical considerations to guide maintenance teams in initiating data-driven analysis even with limited resources. A key enhancement introduced was the incorporation of a scheduled batch retraining strategy, helping maintain predictive performance in the face of evolving conditions. To demonstrate its applicability, a real-world case study was conducted focusing on the prediction of the duration of soot load levels in the Diesel Particulate Filter (DPF), specifically Zones 3 and 4, which precede conditions that can significantly impact bus operability. Once the training and deployment simulations were completed, two distinct performance scenarios emerged. In Zone 4, despite data limitations, the deployed model continued to outperform the updated baseline (MAE: 0.6758 vs. 0.9612, RMSE: 0.9330 vs. 1.0722, R²: 0.24), demonstrating a certain level of robustness. This suggests that while model updates could enhance performance, the current model remains sufficiently accurate for operational use until the next validation cycle. However, in Zone 3, a substantial drop in evaluation metrics was observed during the new test batch (R² ≈ −0.10, MAE ≈ 1.01 h, RMSE ≈ 1.16 h), indicating that the previously deployed model was no longer capable of capturing the underlying dynamics. This highlights the need for a model revision process, including an updated exploratory analysis and retraining with the most recent data. The overall results confirm the potential and practical utility of the proposed methodology. 
While predictive accuracy varied across zones, particularly with a notable drop observed in Zone 3, the framework proved effective in identifying performance degradation, guiding necessary revisions, and supporting decision-making under data-scarce and evolving operational conditions. This adaptability highlights the methodology’s practical value for fleet managers operating in the early stages of digital transformation. Several limitations emerged during the application of the proposed approach. One key challenge is the potential for operational changes in the bus fleet, such as new routes, different traffic conditions, or driving behaviors, which can introduce data drift—i.e., shifts in the statistical distribution of the input data that degrade model performance over time. Additionally, the emergence of new outliers highlights the need for ongoing monitoring and data validation. Another limitation is the lack of data in specific ranges; for example, in Zone 4, the model tended to overestimate when the real duration was below 2.5 h, indicating reduced accuracy for short-duration events. These findings emphasize the importance of continuous evaluation and model adaptation. Looking ahead, the next steps for the case study focus on operational deployment. First, the predictive models will be implemented using the existing API connection to access the necessary data streams in real time. Additionally, since the data analysis tools have already been defined and validated, the next logical automation step is to link the database to a dedicated processing module that can automatically generate the performance indicators and plots outlined in this study. This would significantly accelerate the review process for maintenance managers. Integrating a notification system or dashboard could further enhance usability, ensuring that key insights are delivered promptly to the right stakeholders. 
Moreover, as data availability grows, the pipeline could also be scaled to incorporate more complex models and additional variables, further improving predictive performance. Finally, the proposed methodology could be adapted to other transit fleets or urban contexts with similar data constraints, serving as a foundation for broader applications of predictive maintenance in the public transport sector.

Author Contributions

Conceptualization, B.T. and J.A.; methodology, R.S.-M. and V.B.; software, R.S.-M.; validation, R.S.-M., B.T. and J.A.; formal analysis, R.S.-M. and V.B.; investigation, R.S.-M.; resources, B.T.; data curation, R.S.-M.; writing—original draft preparation, R.S.-M.; writing—review and editing, B.T. and V.B.; visualization, R.S.-M.; supervision, B.T.; project administration, B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Mendeley Data at DOI: 10.17632/3sk43brs4p.1. The full implementation code, including data preprocessing, modeling pipeline, and validation experiments, is available in Zenodo under DOI: 10.5281/zenodo.15874832.

Acknowledgments

We would like to thank the EMT, ALSA, and Jaltest staff for their support during the elaboration of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Tormos, B.; Pla, B.; Sánchez-Márquez, R.; Carballo, J.L. Explainable AI using on-board diagnostics data for urban buses maintenance management: A study case. Information 2025, 16, 74. [Google Scholar] [CrossRef]
Sarih, H.; Tchangani, A.; Medjaher, K.; Pere, E. Data preparation and preprocessing for broadcast systems monitoring in PHM framework. In Proceedings of the 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, 23–26 April 2019; pp. 1444–1449. [Google Scholar] [CrossRef]
Ndao, M.L.; Youness, G.; Niang, N.; Saporta, G. Improving predictive maintenance: Evaluating the impact of preprocessing and model complexity on the effectiveness of eXplainable Artificial Intelligence methods. Eng. Appl. Artif. Intell. 2025, 144, 110144. [Google Scholar] [CrossRef]
Aminzadeh, A.; Sattarpanah Karganroudi, S.; Majidi, S.; Dabompre, C.; Azaiez, K.; Mitride, C.; Sénéchal, E. A machine learning implementation to predictive maintenance and monitoring of industrial compressors. Sensors 2025, 25, 1006. [Google Scholar] [CrossRef]
Ucar, A.; Karakose, M.; Kırımça, N. Artificial intelligence for predictive maintenance applications: Key components, trustworthiness, and future trends. Appl. Sci. 2024, 14, 898. [Google Scholar] [CrossRef]
Zheng, Z.; Yang, Y.; Zhou, J.; Gu, F. Research on time series data prediction based on machine learning algorithms. In Proceedings of the 2024 IEEE 2nd International Conference on Control, Electronics and Computer Technology (IC-CECT), Jilin, China, 26–28 April 2024; pp. 680–686. [Google Scholar] [CrossRef]
Branco, P.; Torgo, L.; Ribeiro, R.P. SMOGN: A pre-processing approach for imbalanced regression. In Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications (LIDTA 2017), ECML-PKDD, Skopje, Macedonia, 22 September 2017; pp. 36–50. [Google Scholar]

Figure 1. After-treatment system of hybrid vehicles.

Figure 2. A hybrid diesel bus from the fleet under study.

Figure 3. Leave-One-Out Cross-Validation strategy.

Figure 4. Proposed methodological pipeline.

Figure 5. Distance traveled since last regeneration summary—Zone 3.

Figure 6. Distance traveled since last regeneration summary—Zone 4.

Figure 7. Correlation heatmap for Zone 3.

Figure 8. Correlation heatmap for Zone 4.

Figure 9. Fuel Intensity Index (Abs) and target variable for Zone 3.

Figure 10. Fuel Intensity Index (Prev Delta) and target variable for Zone 4.

Figure 11. Actual vs. predicted for the KNN model—Zone 3.

Figure 12. Residuals for KNN model—Zone 3.

Figure 13. Impact of feature permutation on model performance—Zone 3.

Figure 14. Actual vs. predicted for KNN model—Zone 4.

Figure 15. Residuals for KNN model—Zone 4.

Figure 16. Impact of feature permutation on model performance—Zone 4.

Figure 17. Actual vs. predicted Δtime in the production scenario for Zone 3.

Figure 18. Residuals vs. predicted Δtime in the production scenario for Zone 3.

Figure 19. Actual vs. predicted Δtime in the production scenario for Zone 4.

Figure 20. Residuals vs. predicted Δtime in the production scenario for Zone 4.

Table 1. Soot level zones and their impact on bus operation.

Soot Level Zone in the DPF	Observations
0 to 3	Bus operability is not affected.
4	Engine power is reduced. Manual regeneration is required.
5	The bus must be returned to the depot. Forced regeneration is required.
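
The mapping in Table 1 can be written as a small lookup, sketched here in Python. The names `ZONE_IMPACT`, `NO_IMPACT`, and `soot_zone_impact` are illustrative and not part of the original pipeline; only the zone numbers and impact descriptions come from the table.

```python
# Illustrative lookup derived from Table 1 (names are hypothetical).
NO_IMPACT = "Bus operability is not affected."

ZONE_IMPACT = {
    4: "Engine power is reduced. Manual regeneration is required.",
    5: "The bus must be returned to the depot. Forced regeneration is required.",
}


def soot_zone_impact(zone: int) -> str:
    """Return the operational impact listed in Table 1 for a soot level zone (0 to 5)."""
    if zone not in range(0, 6):
        raise ValueError(f"soot level zone must be an integer in 0..5, got {zone!r}")
    # Zones 0 to 3 share the same entry in Table 1, so they fall through to the default.
    return ZONE_IMPACT.get(zone, NO_IMPACT)


impact_zone_3 = soot_zone_impact(3)
impact_zone_4 = soot_zone_impact(4)
```

Because Zones 0 to 3 have no operational impact, the zones that carry a maintenance consequence are Zones 4 and 5, which is consistent with the study's focus on time spent in the zones leading up to them.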

Table 2. Variables from the selected after-treatment ECU.

Original Name	Unit	Variable Type
Distance traveled since last DPF regeneration	km	Continuous
Simulated soot content in the particulate filter	kg	Continuous
Fuel consumed since DPF regeneration	L	Continuous
Soot level zone in the DPF	-	Categorical—Ordinal
Time elapsed since last DPF regeneration	h	Continuous

Table 3. Heatmap variable codification.

Original Name	Variable Code
Distance traveled since last DPF regeneration [km]	1
Simulated soot content in the particulate filter [kg]	2
Fuel consumed since DPF regeneration [L]	3
Time elapsed since last DPF regeneration [h]	4
fuel_intensity_index	5
Delta_time_prev_zone	6
Delta_dist_prev_zone	7
Delta_cons_prev_zone	8
Delta_soot_prev_zone	9
Delta_time [h]	10
fuel_rate	11
fuel_per_km	12
fuel_intensity_index_prev_zone	13
fuel_rate_prev_zone	14
fuel_per_km_prev_zone	15

Table 4. Dataset Overview.

Condition	Soot Level	Number of Data Points
First Training	3	45
First Training	4	32
Production Test	3	27
Production Test	4	16

Table 5. Performance metrics of regression models for predicting duration in Zone 3.

Model	R²	MAE	RMSE	Parameters
Baseline	0.0000	0.6998	0.9188	-
Linear Regression	0.3395	0.6508	0.7467	{}
Ridge Regression	0.3397	0.6505	0.7466	{'alpha': 0.01}
Decision Tree	0.4705	0.5853	0.6685	{'max_depth': 2, 'min_samples_split': 2}
KNN	0.5970	0.5233	0.5833	{'n_neighbors': 3, 'weights': 'uniform'}

Table 6. Performance metrics of regression models for predicting duration in Zone 4.

Model	R²	MAE	RMSE	Parameters
Baseline	0.0000	1.1714	1.3074	-
Linear Regression	0.6481	0.7117	0.7756	{}
Ridge Regression	0.6479	0.7119	0.7758	{'alpha': 0.01}
Decision Tree	0.3241	0.7665	1.0748	{'max_depth': 2, 'min_samples_split': 2}
KNN	0.7198	0.5340	0.6921	{'n_neighbors': 3, 'weights': 'uniform'}
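
As a reading aid for Tables 5 and 6, the following dependency-free Python sketch shows one way R², MAE, and RMSE can be obtained for a KNN regressor under Leave-One-Out Cross-Validation and compared with a baseline. It is not the authors' implementation. The toy data, the function names, and the choice of a constant global-mean baseline are assumptions (the baseline choice is consistent with the reported R² of 0.0000, but it is not confirmed in this section). The defaults `n_neighbors=3` with uniform weighting mirror the parameters listed in the tables.

```python
import math


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Uniform-weight KNN regression: mean target of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loocv_predictions(X, y, n_neighbors=3):
    """Hold out each sample once and predict it from all remaining samples."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], n_neighbors))
    return preds


def regression_metrics(y_true, y_pred):
    """Compute R2, MAE, and RMSE over the pooled out-of-sample predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Toy data (hypothetical): one feature per sample, target loosely tracking the feature.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

# Assumed baseline: always predict the global mean of the target.
baseline_preds = [sum(y) / len(y)] * len(y)

knn_preds = loocv_predictions(X, y, n_neighbors=3)
baseline_metrics = regression_metrics(y, baseline_preds)
knn_metrics = regression_metrics(y, knn_preds)
```

With so few samples, each held-out prediction depends heavily on its three nearest neighbours, which is why the metrics are pooled over all folds rather than computed fold by fold. The parameter names `n_neighbors` and `weights` in the tables match those of scikit-learn's `KNeighborsRegressor`, so the same evaluation could presumably be reproduced with that library.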

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
