A Comprehensive Study on Predicting the Need for Vehicle Maintenance Using Machine Learning

Mahiyudin, Ghulam; Hussain, Manzoor; Dewi, Dhita Diana

doi:10.3390/engproc2025107089

Open Access Proceeding Paper

A Comprehensive Study on Predicting the Need for Vehicle Maintenance Using Machine Learning^†

by Ghulam Mahiyudin ^1,*, Manzoor Hussain ^2 and Dhita Diana Dewi ^3

^1 Department of Software Engineering, University of Sialkot, Sialkot 51040, Pakistan

^2 Department of Computing, Indus University, Karachi 75500, Pakistan

^3 Department of Engineering, Nusat Putra University, Sukabumi 43152, West Java, Indonesia

^* Author to whom correspondence should be addressed.

^† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.

Eng. Proc. 2025, 107(1), 89; https://doi.org/10.3390/engproc2025107089

Published: 15 September 2025

(This article belongs to the Proceedings of The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society)

Abstract

Predicting vehicle maintenance needs is important for reducing downtime and cost. Traditional methods based on mileage and manufacturer guidance can trigger maintenance too early or too late. Machine learning can predict maintenance needs more accurately, saving both time and cost. In this paper, we use machine learning models to predict maintenance needs from vehicle features. Our dataset, which contains information about 50,000 vehicles, is imbalanced; we first balanced it using SMOTE, which generated new sample points and increased its size to about 80,000. After addressing the imbalance, we applied several algorithms, including Random Forest, Decision Tree, gradient booster, naïve Bayes and KNN. The Decision Tree performs well on both the imbalanced and balanced data and achieves the highest accuracy, 99.97%, on the imbalanced data. These findings highlight the importance of machine learning in predicting vehicle maintenance to save cost and downtime.

Keywords:

predicting vehicle maintenance; machine learning; imbalanced dataset; SMOTE; decision tree

1. Introduction

Vehicle maintenance is an important part of owning a vehicle, ensuring its safety, reliability and long life. However, traditional methods for identifying maintenance needs rely on mileage or a fixed schedule; they do not always align with the actual needs of the vehicle, so service happens either earlier or later than required. This can waste both money and time. In 2023, the average spending on vehicle maintenance was USD 1475 per year for 15,000 miles [1]. Beyond this cost, unneeded maintenance can become a financial burden, as over half of U.S. adults are unable to afford a maintenance bill of USD 1000 or more in an emergency [2]. Over the years, the cost of vehicle parts has also increased, as shown by the Consumer Price Index (CPI). From 2014 to 2024, the CPI rose from 120.5 to 181.387, which puts the 2024 level 81.4% above the 1982–1984 base period [3].

Currently, vehicle maintenance is often managed through routine checks based on mileage or the vehicle manufacturer’s guidance. For example, routine maintenance is generally conducted every 5000 to 7500 miles and includes services such as oil changes, tire rotations and multiple other inspections [4]. These traditional methods ignore factors such as driver behaviour, environmental conditions and the actual failure of a vehicle part. Figure 1 shows the consumer price index for motor vehicle parts and equipment from 2014 to 2024. As shown in Figure 2, the paper is organized into several sections, starting with the introduction and ending with the conclusion.

As a result, many vehicle owners carry out needless maintenance or overlook the real issue, which can lead to major failures. Some studies have applied machine learning models to predict vehicle maintenance, but they are limited by specific data points, model accuracy or complexity of implementation. Several use data from IoT devices, which can be noisy and can degrade model performance [5]. Electric vehicles are becoming popular, yet most existing solutions do not address them. Some previous solutions are also limited to specific vehicle models, which makes them less universal in application. Our proposed solution aims to improve the prediction of vehicle maintenance based on vehicle characteristics. We use a dataset from Kaggle with features such as vehicle age, fuel type (Diesel, Petrol or Electric), engine size, mileage, previous maintenance history, meter reading, accident history, last service time, etc. We try different machine learning models individually, also perform ensemble learning on the dataset, and check the performance of each model. During model development, the model learns from the dataset and identifies patterns, and these patterns help to predict the maintenance needs of new data. This approach not only helps to predict maintenance needs but also minimises the risk of unexpected breakdowns, helping vehicle owners to optimise performance and cost. It thereby improves operational efficiency, reduces downtime and ultimately contributes to cost-effective vehicle management.

2. Literature Review

Predictive maintenance is attracting attention in industry because of its role in saving time and cost, and recent advances in machine learning are helping to deliver more accurate results. For instance, a 2018 study obtained data from 15 machines, with 530,731 data samples in total, applied a Random Forest classifier and achieved 95% accuracy [6]. Another study from the same year collected data from five garages and, after data cleaning, settled on features such as vehicle age, odometer reading, repair history, and cost of labor and spare parts. Based on these, the authors proposed a Hierarchical Modified Fuzzy SVM that classifies maintenance need into three classes: immediate, short-term and longer-term [7]. A mileage-aware model for vehicle maintenance was proposed that uses mileage and previous maintenance data from 10,252 vehicle records, sourced from the National Vehicle Maintenance Electronic Health Record System and covering 2015 to 2023. The study reports a correlation of 0.08 between mileage and maintenance frequency and achieves an F1 score of 40.99 using an LSTM [8]. Work on predictive maintenance of vehicle engine components emphasises detecting faults from simulation data. The dataset in that study has 146,606 samples from four driving cycles (NEDC, EUDC, FTP-75 and WLTP) with 14 features, of which only 5 were shortlisted after a brute-force selection. Random Forest and Gaussian Processes attained an accuracy of 0.99262, a Support Vector Machine attained 0.99041, and an Artificial Neural Network was also used [9]. Another study uses a deep learning approach that combines GIS and maintenance data to predict vehicle maintenance more accurately. The MLSTM model it develops improves when GIS features such as weather, traffic and terrain data are added.
The experiment shows that MCC increased by 0.13 and the standard deviation increased by 63.2%. The GIS data were gathered from the UK weather station, traffic department and elevation map [10]. A 2021 survey covering 62 papers highlights the dominance of supervised learning, which relies on labelled data. It stresses combining publicly available datasets to attain better results in automotive research and identifies critical gaps [11]. Other work obtains data from IoT devices and passes it to a machine learning model that predicts whether maintenance is required. The main challenge is that data from IoT devices can be noisy, and feeding it directly to the model can produce inaccurate results. Case studies on electric vehicle battery health monitoring, fleet management optimisation and real-time fault detection in automotive systems highlight the role of data fusion and large datasets [12]. Predictive maintenance in the automotive sector that leverages IoT and ML helps to enhance safety and optimise vehicle management, and can reduce cost by 8–12% compared with traditional methods. Common machine learning algorithms used for it include Decision Tree, Support Vector Machine, KNN and deep learning models such as LSTMs and CNNs [13]. Telematics systems are in-vehicle technologies that collect, process and transmit data such as driver behaviour and environmental conditions. As the technology advances, these systems are improving and can now monitor vehicle performance, including engine metrics, tire pressure and fuel consumption. This data can support real-time maintenance prediction and make it efficient for applications such as advanced driver assistance systems, which helps to minimise downtime and save costs.
Data from multiple sources, such as engine parameters and geospatial data, further improves prediction accuracy [5]. One study gathered data from a university campus and reported that an LSTM model is better at predicting battery remaining useful life, and that clustering strategies reduce component failures and the spare parts needed [14]. A 2023 study collected data from telematics systems, including raw sensor data such as engine performance, fuel efficiency and tire pressure, and then preprocessed it. The authors also found that integrating data from vehicle-to-everything (V2X) communication systems further improves maintenance prediction [15]. Another study used commercial vehicle data from Michigan State collected from 1999 to 2017. Linear SVM, naïve Bayes and Decision Tree algorithms were applied, achieving 89% accuracy. The key attributes influencing predictions were vehicle type, fuel type and trip distance. The dataset also shows a 75% increase in light-duty vehicles used commercially [16]. A dataset covering 26,831 vehicles from 73 maintenance companies (2011–2023) was used to train the MsDFN model, a deep fusion network with attention over mileage, maintenance records and base information, which achieved a weighted F1 score of 34.6 [17]. Another study collected sensor data, system logs and maintenance records from autonomous vehicles, including speed, temperature and vibration metrics, over six months. It applied three types of learning: in supervised learning, which included Random Forest and Gradient Boosting, Random Forest achieved a high prediction accuracy of 92.3%; in unsupervised learning, a Gaussian Mixture Model achieved a detection rate of 90.4%; and in reinforcement learning, Q-learning in particular optimised energy consumption, reducing it by 20.3% [18]. Electric vehicles are becoming popular, and maintaining them is crucial.
Decision Trees can support maintenance decisions for electric vehicles efficiently and reduce cost. One study constructs a Decision Tree model to analyse new energy vehicles [19]. Daily vehicle usage also influences maintenance opportunities, and predicting maintenance need from this factor using Neural Networks, Decision Tree and Random Forest gives better results, enhancing safety, reducing operational cost and limiting downtime, especially in shared autonomous vehicles. The challenge lies in collecting the daily usage data [20]. Another study emphasises a three-tier architecture to enhance effectiveness. It treats anomaly detection as an important factor in identifying component failures and uses an LSTM model to estimate the remaining useful life of components, which not only lowers cost but also improves safety and efficiency [21]. Machine learning can also be used to predict the next maintenance time from previous history, using data collected over four years via CAN bus at 100 Hz. Models were developed for both old and new vehicles. For old vehicles, Random Forest achieved the lowest relative error of 6.9%, followed by XGB. For new vehicles, the approach shifts toward global data and uses first-cycle data from older vehicles for prediction [22]. Overall, ML for predictive maintenance in vehicles focuses on improving reliability and safety and on reducing costs. Techniques such as Neural Networks, Decision Trees and Random Forest predict component health, minimising downtime and operational risks.

3. Methodology

Machine learning is evolving rapidly across domains and, in this study, we use it to predict vehicle maintenance needs. It helps to identify issues in vehicles, thereby reducing cost and time. Our workflow is shown in Figure 3.

3.1. Methodology Flow

3.1.1. Data Collection

The dataset used in this study is sourced from Kaggle. It is named Vehicle Maintenance Data and contains historical vehicle usage data. It holds records of 50,000 vehicles with 19 independent variables and 1 dependent variable. The target (dependent) feature is Need_Maintenance, which indicates whether the vehicle requires maintenance (1 = Yes, 0 = No). The independent features include vehicle model, maintenance history, mileage, reported issues, vehicle age, transmission type (Manual or Auto), fuel type (Petrol, Diesel, Electric), engine size, odometer reading, last service date, accident history, etc. The dataset thus captures a wide range of vehicle features that can influence maintenance prediction.

3.1.2. Data Processing

The dataset was processed to ensure that it was clean, consistent and suitable for applying machine learning models. The original dataset is clean but highly imbalanced: class 0 has 9501 samples and class 1 has 40,499. This is roughly a 19% to 81% distribution, i.e., the minority class is about 23.5% the size of the majority class. To balance the dataset, we applied the Synthetic Minority Oversampling Technique (SMOTE), which generates new data points for the class that has fewer samples. This helps to ensure fair training for the models. SMOTE operates on numerical data, but the dataset has categorical variables such as fuel type and transmission type, so these must first be converted into a numerical format through encoding. One option is one-hot encoding, which suits nominal categorical variables; another is label encoding, which suits ordinal categorical variables. In our dataset, we applied label encoding. After encoding, SMOTE is run and, as a result, we obtain new samples in the minority class; the dataset ratio before and after applying SMOTE is shown in Figure 4. The new samples are then integrated with the existing ones to form the new balanced dataset. During this data integration step, we take care that the number of columns remains consistent and that no data is lost or duplicated. The total size is 80,996, with 40,498 samples in each class. The code for this part is available on GitHub [23].
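The encoding and oversampling steps described above can be illustrated with a minimal sketch. It is not the code from the repository [23]: the function names `label_encode` and `smote`, the toy feature values and the choice of k are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random

def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy minority-class rows: [vehicle_age, encoded fuel type]; values are made up.
fuel_codes, mapping = label_encode(["Petrol", "Diesel", "Electric", "Diesel"])
ages = [2.0, 3.0, 4.0, 5.0]
minority = [[age, float(code)] for age, code in zip(ages, fuel_codes)]
new_points = smote(minority, n_new=6, k=2)
```

Because SMOTE interpolates between neighbours, a label-encoded column can receive fractional values that do not correspond to any category; this is worth keeping in mind when label encoding is combined with SMOTE.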

3.1.3. Data Splitting

Data splitting may seem a simple task of dividing data into two or three parts, but the split can affect the model’s performance, so it must be done carefully. It typically involves three sets: training, validation and testing. The training set is given to the model so that it can learn and identify patterns. The validation set is optional and is used to validate the model during development. The testing set is used to check what the model learned from the training data and to measure its performance in terms of accuracy, precision, recall, F1, etc. The splitting ratio is a matter of choice, but common ratios are 70/30 and 80/20. We use 70%/30%: the original dataset has 50,000 samples, so training uses 35,000 and testing uses 15,000. After applying SMOTE, the dataset has 80,996 samples, of which 56,698 are used for training and 24,298 for testing.
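A 70/30 split of this kind can be sketched as follows. The helper name `train_test_split`, the fixed seed and the use of plain shuffling are assumptions for illustration; the study itself performed the split inside its modelling tool.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Row indices stand in for the 50,000 vehicle records of the original dataset.
train, test = train_test_split(range(50_000))
print(len(train), len(test))
```

This sketch shuffles without regard to class labels; a stratified split, which preserves the class ratio in both parts, is a common alternative on imbalanced data.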

3.2. Model Development and Evaluation

With the data refined, we apply the models using the tool Rapid Miner, which has since been renamed AI Studio. First, we load the dataset and select our label (target variable), Need_Maintenance. Then, we divide the data into training and testing sets (70/30). On the training set, we apply different algorithms: Decision Tree, Random Forest, gradient booster, naïve Bayes and KNN.

3.2.1. Decision Tree

A Decision Tree is a supervised learning algorithm used here for classification. It repeatedly splits the data into branches, forming a tree structure. In our study, we applied a Decision Tree to both the original dataset and the SMOTE-based dataset. The algorithm performs well on both.
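How a tree chooses a single branch point can be sketched on one numeric feature. The split criterion used in the study is not stated; Gini impurity is assumed here purely for illustration, and the function names and toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the largest value would leave one side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up example: vehicle age in years against Need_Maintenance.
ages = [1, 2, 3, 6, 7, 9]
needs = [0, 0, 0, 1, 1, 1]
print(best_threshold(ages, needs))  # -> (3, 0.0)
```

A full tree applies the same search recursively to each resulting branch and over all features.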

3.2.2. Random Forest

Random Forest is built on top of Decision Trees. During training, it constructs multiple Decision Trees and, at prediction time, it applies voting and picks the class with the majority of votes. We use this classifier on both the original and the newly created dataset.

3.2.3. Gradient Booster

Gradient booster builds multiple weak Decision Trees sequentially, each correcting the errors made by the previous ones. The model is trained by minimising a loss function through gradient descent. We also train this classifier on our dataset and analyse its results.
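The sequential error-correction idea can be shown with a small sketch. It assumes a squared-error loss, for which the negative gradient is simply the residual, and one-split trees (stumps) on a single feature; a classifier in practice would normally use a classification loss, and all names and data here are illustrative.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one split, one mean value per side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals y - F(x), which are
    the negative gradient of the loss 0.5 * (y - F(x))**2."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

# Made-up data: vehicle age against Need_Maintenance.
xs = [1, 2, 3, 6, 7, 9]
ys = [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
label = 1 if model(7) >= 0.5 else 0
```

The learning rate shrinks each stump's contribution, so the prediction approaches the targets gradually over the rounds rather than in one step.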

3.2.4. Naïve Bayes

Naïve Bayes is a probability-based algorithm. It first identifies the target classes and counts their total occurrences. Then, for each attribute, it counts how often each value occurs within each class and computes the corresponding probability. When a new sample arrives, its class is assigned using these precomputed probabilities. We also use it in our study.
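The counting procedure just described can be sketched for categorical attributes. The add-one (Laplace) smoothing, the function names and the toy attribute values are assumptions introduced for illustration, not details reported in the study.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict_naive_bayes(model, row):
    """Pick the class maximising prior * product of smoothed likelihoods."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product
            score *= (value_counts[(i, label)][value] + 1) / (count + len(vocab[i]))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up rows: (fuel type, maintenance history) with Need_Maintenance labels.
rows = [("Petrol", "Poor"), ("Diesel", "Poor"), ("Petrol", "Good"),
        ("Electric", "Good"), ("Diesel", "Poor")]
labels = [1, 1, 0, 0, 1]
nb_model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(nb_model, ("Diesel", "Poor")))    # -> 1
print(predict_naive_bayes(nb_model, ("Electric", "Good")))  # -> 0
```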

3.2.5. KNN

KNN stands for K-Nearest Neighbour. We calculate the distance from the new sample to each existing sample, sort the samples by distance and keep the k nearest. The choice of k is debatable, but an odd value is recommended because it avoids tied votes in binary classification. In our case, k = 5: we select the five nearest samples, check which class holds the majority and predict that class.
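A minimal sketch of this procedure with k = 5 is given below. Euclidean distance, the function name `knn_predict` and the toy points are assumptions for illustration; the distance measure used in the study is not stated.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Return the majority class among the k training points nearest to `query`."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    top_k = [label for _, label in by_distance[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Made-up two-feature samples forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(knn_predict(points, labels, (2, 3)))  # -> 0
print(knn_predict(points, labels, (8, 7)))  # -> 1
```

Since the method depends directly on distances, features with large numeric ranges (for example, mileage) dominate the result unless the features are scaled.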

3.2.6. Ensemble Learning

Combining two or more models on the same dataset and training them together is known as an ensemble method. We also tried an ensemble on our dataset for maintenance prediction. We use Decision Tree, Random Forest and gradient booster together: each classifier trains on the data and makes its own prediction, and we then apply a voting system to the outputs and pick the majority result.
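The voting step can be sketched as follows, assuming each model has already produced one 0/1 prediction per test sample. The prediction lists are made up and the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Made-up predictions from three classifiers on four samples.
dt = [1, 0, 1, 1]
rf = [1, 1, 0, 1]
gb = [0, 0, 1, 1]
print(majority_vote([dt, rf, gb]))  # -> [1, 0, 1, 1]
```

With three voters and two classes, a tie cannot occur, so every sample receives a well-defined label.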

3.3. Model Evaluation

When splitting the data, we divided it into two parts, training and testing. The testing part is now used to evaluate each model’s performance. Accuracy is the basic measure, but it can be misleading on imbalanced data, so we also use the confusion matrix. It reports four values, true positive (TP), false positive (FP), true negative (TN) and false negative (FN), which give a more detailed view of the model’s performance. From the confusion matrix, we derive precision, recall and the F1 measure, which give more insight into the model.
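The relationship between the confusion matrix and the derived measures can be made concrete with a short sketch. The function names and the example labels are illustrative; the formulas are the standard definitions of accuracy, precision, recall and F1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 1 = needs maintenance, 0 = does not.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
scores = metrics(*counts)
```

In this example the counts are TP = 4, FP = 1, FN = 1 and TN = 2, giving an accuracy of 0.75 and a precision and recall of 0.8 each.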

4. Results

In this section, we present the results obtained from the different models. We obtained a dataset for vehicle maintenance prediction from Kaggle, balanced it and then trained different models on it. We used Decision Tree, Random Forest, gradient booster, naïve Bayes, KNN and ensemble learning to predict the maintenance need. We evaluated all of these models with a confusion matrix, focusing especially on accuracy, precision and recall.

4.1. Imbalanced Dataset Result

Table 1 shows the comparison of performance of machine learning algorithms.

On the imbalanced dataset, the highest accuracy is 99.97%, obtained by the Decision Tree with a recall of 100%. Ensemble learning (a combination of the tree models) follows with an accuracy of 99.93%, then gradient booster and Random Forest with 99.33% and 99.13%, respectively. Naïve Bayes has an accuracy of 94.55% with a recall of 98.27%. KNN, with k = 5, has the lowest accuracy at 77.56%.

4.2. Balanced Dataset Result

Table 2 presents the results of evaluation metrics across all models.

After balancing the dataset using SMOTE, we applied the same models as on the imbalanced dataset. The maximum accuracy is 99.58%, obtained by gradient booster with high precision and recall, followed by the Decision Tree at 98.92% and Random Forest at 97.28%, both of which still have high precision, at 99.92% and 99.99%, respectively. The results are presented in Figure 5, which compares the accuracies of all machine learning algorithms.

4.3. Accuracy Comparison (Imbalanced and Balanced Datasets)

Overall, the accuracy is slightly lower on the balanced dataset. Although the accuracy on the imbalanced (original) dataset is comparatively high, a model trained on it is more likely to favour the majority class and ignore the minority class, and it may also overfit. To avoid this, we balanced the dataset using SMOTE. Each class now has an equal share, which also helps to counter model bias if it occurs.
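A short worked example shows why accuracy alone can flatter a model on the original data. It uses the class counts reported in Section 3.1.2 and a hypothetical classifier that always predicts class 1; this baseline is an illustration, not one of the models evaluated in the study.

```python
majority, minority = 40_499, 9_501   # class 1 and class 0 counts from Section 3.1.2

total = majority + minority
baseline_accuracy = majority / total  # every vehicle predicted as "needs maintenance"
minority_recall = 0 / minority        # no class-0 vehicle is ever identified

print(f"{baseline_accuracy:.2%}")     # -> 81.00%
print(f"{minority_recall:.0%}")       # -> 0%
```

Such a classifier reaches about 81% accuracy without learning anything about class 0, which is why precision and recall are reported alongside accuracy.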

5. Conclusions

We have explored the role of ML in predicting vehicle maintenance needs. Traditional methods rely on a fixed schedule or mileage, which can be insufficient. Using machine learning models, we proposed an approach that predicts maintenance from vehicle characteristics such as vehicle age, fuel type (Diesel, Petrol or Electric), engine size, mileage, previous maintenance history, meter reading, accident history, last service time, etc. We used a dataset from Kaggle that is highly imbalanced. On the imbalanced data, we reached a maximum accuracy of 99.97% with a Decision Tree, followed by ensemble learning at 99.93%. Other models, such as Random Forest and gradient booster, also perform well. We applied SMOTE to generate new data samples and balance the dataset. For that, we first converted categorical data into numerical data using encoding, then applied SMOTE and produced a new balanced dataset while ensuring that the existing samples were not affected. After balancing, the accuracy differs slightly; the highest is 99.58%, achieved by gradient booster. The results on both the imbalanced and balanced data demonstrate the strength of the models, especially the tree-based models, which consistently show better results. The results can help to enhance maintenance strategies for both vehicle owners and fleet managers, minimising downtime and making maintenance cost-effective.

Author Contributions

G.M. conceptualized the study, designed the research framework, and supervised the project. M.H. carried out data collection, preprocessing, model implementation, evaluation of results, and prepared figures and tables. D.D.D. provided supervision and guidance, critically reviewed the methodology and results, validated findings, and contributed to manuscript editing and final approval. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no external funding for this research.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data will be made available upon reasonable request to the first author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Betterton, R. Average Cost of Car Maintenance: 2024 Estimates. Bankrate. Available online: https://www.bankrate.com/loans/auto-loans/average-car-maintenance-costs/ (accessed on 14 January 2025).
2. Baluch, A.C.A. How Much Car Maintenance Costs in 2025. Available online: https://insurify.com/car-insurance/knowledge/car-maintenance-costs/ (accessed on 14 January 2025).
3. Consumer Price Index for All Urban Consumers: Motor Vehicle Parts and Equipment in U.S. City Average (CUUR0000SETC) | FRED | St. Louis Fed. Available online: https://fred.stlouisfed.org/series/CUUR0000SETC (accessed on 14 January 2025).
4. Your Driving Costs 2024: How Much Does It Really Cost to Own a New Car? Available online: https://newsroom.aaa.com/wp-content/uploads/2024/09/YDC_Fact-Sheet-FINAL-9.2024.pdf (accessed on 14 January 2025).
5. Selvaraj, A.; Venkatachalam, D.; Sunder Singh, J.T. Advanced Telematics and Real-Time Data Analytics in the Automotive Industry: Leveraging Edge Computing for Predictive Vehicle Maintenance and Performance Optimization. J. Artif. Intell. Res. Appl. 2023, 3, 581–622.
6. Faisal, A.; Jhanjhi, N.Z.; Ashraf, H.; Ray, S.K.; Ashfaq, F. A Comprehensive Review of Machine Learning Models: Principles, Applications, and Optimal Model Selection. TechRxiv 2025.
7. Chaudhuri, A. Predictive Maintenance for Industrial IoT of Vehicle Fleets Using Hierarchical Modified Fuzzy Support Vector Machine. arXiv 2018, arXiv:1806.09612.
8. Chen, F.; Shang, D.; Zhou, G.; Ye, K.; Ren, F. Mileage-Aware for Vehicle Maintenance Demand Prediction. Appl. Sci. 2024, 14, 7341.
9. Tessaro, I.; Mariani, V.C.; Coelho, L.d.S. Machine Learning Models Applied to Predictive Maintenance in Automotive Engine Components. Proceedings 2020, 64, 26.
10. Chen, C.; Liu, Y.; Sun, X.; Di Cairano-Gilfedder, C.; Titmus, S. An Integrated Deep Learning-Based Approach for Automobile Maintenance Prediction with GIS Data. Reliab. Eng. Syst. Saf. 2021, 216, 107919.
11. Theissler, A.; Pérez-Velázquez, J.; Kettelgerdes, M.; Elger, G. Predictive Maintenance Enabled by Machine Learning: Use Cases and Challenges in the Automotive Industry. Reliab. Eng. Syst. Saf. 2021, 215, 107864.
12. Aravind, R.; Shah, C.V.; Surabhi, M.D. Machine Learning Applications in Predictive Maintenance for Vehicles: Case Studies. Int. J. Eng. Comput. Sci. 2022, 11, 25628–25640.
13. Arena, F.; Collotta, M.; Luca, L.; Ruggieri, M.; Termine, F.G. Predictive Maintenance in the Automotive Sector: A Literature Review. Math. Comput. Appl. 2021, 27, 2.
14. Jafari, R. Machine Learning for Predictive Maintenance in Autonomous Vehicle Fleets. J. AI-Assist. Sci. Discov. 2023, 3, 1–12.
15. Parida, P.R.; Murthy, C.J. Predictive Maintenance in Automotive Telematics Using Machine Learning Algorithms for Enhanced Reliability and Cost Reduction. J. Comput. Intell. Robot. 2023, 3, 44–82.
16. Al-Tarawneh, M.; Alhomaidat, F.; Twaissi, M. Unlocking Insights from Commercial Vehicle Data: A Machine Learning Approach for Predicting Commercial Vehicle Classes Using Michigan State Data (1999–2017). Results Eng. 2024, 21, 101691.
17. Chen, F.; Shang, D.; Zhou, G.; Ye, K.; Wu, G. Multi-Source Data Fusion for Vehicle Maintenance Project Prediction. Future Internet 2024, 16, 371.
18. Goriparthi, R.G. AI-Driven Predictive Analytics for Autonomous Systems: A Machine Learning Approach. Rev. Intel. Artif. Med. 2024, 15, 844–879.
19. Jiang, X.; Li, M.; Cheng, L. Application of Decision Tree and Machine Learning in New Energy Vehicle Maintenance Decision Making. Appl. Math. Nonlinear Sci. 2024, 9, 1–22.
20. Shah, C.V. Machine Learning Algorithms for Predictive Maintenance in Autonomous Vehicles. Int. J. Eng. Comput. Sci. 2024, 13, 26015–26032.
21. Raj, V.; Sharma, D. Predictive Maintenance in Autonomous Vehicles Using Machine Learning Techniques. In Proceedings of the 2024 2nd International Conference on Advances in Computation, Communication and Information Technology (ICAICCIT), Faridabad, India, 28–29 November 2024; IEEE: New York, NY, USA; Volume 1, pp. 912–917.
22. Mishra, S.; Vassio, L.; Cagliero, L.; Mellia, M.; Baralis, E.; Loti, R.; Salvatori, L. Machine Learning Supported Next-Maintenance Prediction for Industrial Vehicles. In Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 30 March 2020; p. 24.
23. Ghulammahiyudin. Vehicle Maintenance Prediction. Available online: https://github.com/ghulammahiyudin/vehicle_maintenance_prediction (accessed on 19 January 2025).

Figure 1. Consumer Price Index.

Figure 2. Paper flow.

Figure 3. Proposed framework.

Figure 4. Dataset ratio before SMOTE (a) and after SMOTE (b) (red: 0; blue: 1).

Figure 5. Accuracy comparison.

Table 1. Performance comparison of machine learning algorithms (imbalanced dataset).

Algorithm	Accuracy	Precision	Recall
Decision Tree	99.97%	99.96%	100%
Random Forest	99.13%	99.95%	98.98%
Gradient Booster	99.33%	100%	99.18%
Naïve Bayes	94.55%	95.16%	98.27%
KNN (k = 5)	77.56%	80.94%	94.57%
Ensemble Learning (DT, RF, GB)	99.93%	99.96%	99.95%

Table 2. Performance comparison of machine learning algorithms (balanced dataset).

Algorithm	Accuracy	Precision	Recall
Decision Tree	98.46%	99.92%	97.01%
Random Forest	97.28%	99.99%	94.56%
Gradient Booster	99.58%	99.98%	99.18%
Naïve Bayes	95.28%	93.16%	97.72%
KNN (k = 5)	54.35%	54.41%	53.62%
Ensemble Learning (DT, RF, GB)	98.30%	100%	96.61%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
