Article

Driver Identification System Based on a Machine Learning Operations Platform Using Controller Area Network Data

1 Department of Smart Vehicle Engineering, Konkuk University, Seoul 05029, Republic of Korea
2 Graduate School of Future Defense Technology Convergence, Konkuk University, Seoul 05029, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(6), 1138; https://doi.org/10.3390/electronics14061138
Submission received: 10 February 2025 / Revised: 12 March 2025 / Accepted: 13 March 2025 / Published: 14 March 2025

Abstract

Ensuring vehicle security and preventing unauthorized driving are critical in modern transportation. Traditional driver identification methods, such as biometric authentication, require additional hardware and may not adapt well to changing driving behaviors. This study proposes a real-time driver identification system leveraging a Machine Learning Operations (MLOps)-based platform that continuously re-trains a deep learning model using vehicle Controller Area Network (CAN) data. The system collects CAN data, converts them into Markov Transition Field (MTF) images, and classifies drivers using a ResNet-18 model deployed on the Google Cloud Platform (GCP). An automated pipeline utilizing Pub/Sub, GCP Composer, and Vertex AI ensures continuous model updates based on newly uploaded driving data. Our experimental results demonstrate that models trained only on recent data significantly outperform those incorporating historical data, highlighting the necessity of frequent retraining. The intruder detection system effectively identifies unregistered drivers, further enhancing vehicle security. By automating model retraining and deployment, this system provides an adaptive solution that accommodates evolving driving behaviors, reducing reliance on static models. These findings emphasize the importance of real-time data adaptation in driver authentication systems, contributing to enhanced vehicle security and safety.

1. Introduction

1.1. Research Background

Vehicle security has become a critical issue in modern society [1]. In particular, detecting and alerting unauthorized drivers in real time is essential not only for preventing vehicle theft and unauthorized use but also for clarifying legal responsibility. A prevalent issue known as “driver substitution” occurs when an intoxicated driver attempts to evade accident liability by switching seats with another person. Identifying the actual driver in such cases is highly challenging, especially if the vehicle’s black box SD card is intentionally removed or if no nearby CCTV footage is available. For instance, to investigate a driver substitution case, the Seoul Metropolitan Police analyzed footage from over 40 CCTV cameras, a process that took approximately two weeks. If drivers could be identified using data acquired from within the vehicle itself, reliance on external CCTV analysis could be significantly reduced, saving time and minimizing costs. This is particularly important for commercial vehicles, such as buses and taxis, where multiple drivers share the same vehicle. Unlike private vehicles, commercial vehicles require precise driver identification at specific times for security and operational accountability. In such scenarios, a real-time driver detection and identification system based on registered driver information and driving patterns can play a vital role in enhancing vehicle security and clarifying driver responsibility. This technology not only helps prevent vehicle intrusions and unauthorized use but also contributes to traffic safety by identifying risky driving behaviors before they escalate into accidents.
Driver identification methods can be categorized into pre-driving identification, which determines the driver before the vehicle is operated, and post-driving identification, which identifies the current driver while the vehicle is in motion. One pre-driving identification method is facial recognition using a camera, which has achieved an accuracy of 88.7% [2]. However, the identity verification process in this approach raises privacy concerns and may cause discomfort for drivers who are reluctant to expose their faces to a camera. In contrast, post-driving identification uses data acquired within the vehicle during operation to determine the driver, reducing potential resistance from users. This method includes network monitoring to analyze driver behavior and CAN (Controller Area Network) data-based identification, which utilizes in-vehicle communication data such as speed, steering angle, and pedal positions. This approach does not require additional equipment but instead relies on existing in-vehicle sensor data [3,4,5]. In particular, research on CAN data and sensor-based driver identification is continuously evolving. Modern vehicles collect real-time information on vehicle states and driver behaviors through CAN data, which include driver operation patterns (e.g., steering angle, accelerator pedal pressure, brake pedal pressure) and vehicle states (e.g., speed, acceleration), making them highly useful for distinguishing drivers.
However, since a driver’s driving habits can change over time, a driver classification model initially trained on a dataset before these changes occur may not accurately reflect the updated driving patterns [6]. To address this issue, an automated retraining process for machine learning models is necessary to update the model based on the latest data [7,8,9]. Therefore, this study aims to build a system that retrains driver classification models using the latest driving datasets by incorporating updated driving habit data through an MLOps-based cloud platform, where MLOps (Machine Learning Operations) integrates machine learning model development, deployment, and monitoring into a continuous and automated pipeline. This approach overcomes the limitations of static model training in existing driver identification systems and enables the real-time reflection of continuously changing driving patterns. The key contributions of this study are as follows:
First, an MLOps-based automated model retraining and deployment system is developed to continuously reflect changes in a driver’s driving habits. While conventional machine learning models may experience performance degradation over time, this study focuses on maintaining and improving model accuracy through continuous data collection and training. Additionally, this study explores applications for enhancing vehicle security and safety. The developed system can be utilized for detecting unauthorized driver changes, managing commercial vehicle drivers, and identifying risky driving behaviors. These applications contribute to preventing vehicle theft, reducing accidents, and clarifying driver accountability. Beyond simply identifying drivers, this study establishes a framework for continuously improving driver identification models, thereby contributing to the implementation of a more reliable vehicle security system.

1.1.1. Research Overview

An overview of the system used in this study is illustrated in Figure 1. The system is broadly divided into two components: an edge layer, corresponding to the vehicle, and a cloud layer, where the MLOps platform is deployed. In the edge layer, the vehicle collects real-time CAN data and processes them through an algorithm that analyzes key driving parameters such as steering angle and driving speed. Based on this analysis, the system classifies the driving condition into one of six categories: left turn, right turn, U-turn, start, stop, or straight driving. The classified CAN data are then converted into images using a visualization technique and uploaded to the cloud. For instance, if a left turn is detected from 10 s of CAN data, the corresponding time-series data are visualized as an image and transmitted to the cloud. Once uploaded, the system re-trains the driver classification model using the newly generated data. The updated model (.pt file) is then made available for use in the vehicle. Through this iterative process, the system ensures that the driver classification model continuously adapts to evolving driving behaviors, maintaining real-time accuracy and relevance.

1.1.2. Literature Overview

David Hallac et al. (2016) proposed a method to identify drivers using vehicle sensor data during a single turn [10]. This study focused on learning unique patterns of driving behavior based on sensor data generated when a driver executes a specific turn, allowing for driver classification. The dataset was collected in real-world driving environments and included data from 10 vehicles and 64 drivers, with each turn recorded for an average of 8–10 s. By analyzing various sensor data, including steering angle, speed, accelerator pedal usage, and brake usage, the study achieved classification accuracies ranging from 76.9% (for two drivers) to 50.1% (for five drivers).
Jingbo Yang et al. (2020) proposed a deep learning model that analyzes unique driving habits and behaviors using vehicular sensor data for driver identification [11]. This model, based on data collected through simulation, identifies drivers using just 10 s of sensor data and achieves an average accuracy of 83.1%. This model’s performance was significantly higher than that of models investigated in previous studies and demonstrated consistent results across various driving conditions and scenarios.
Lee et al. (2022) proposed a driver identification method using vehicle sensor data [12]. In this study, driving data were collected from seven drivers using a driving simulator and processed with a CNN (Convolutional Neural Network) model for classification. To efficiently handle long time-series data, a sliding window technique was applied, and predictions for each frame were aggregated using a voting method to identify the final driver. This approach demonstrated a 19.94% improvement in accuracy compared to conventional CNN models and maintained accuracy even with a larger number of drivers. Although the method effectively distinguished driver patterns using long-term data, it did not use real vehicle data.
Kim Ju Yeop (2023) proposed a method to classify drivers by integrating CAN data with LiDAR data [13]. The study visualized data using the Recurrence Plot (RP) technique and utilized a CNN model for driver classification based on data collected through real-world experiments. By combining LiDAR data, which provided external environmental information, with CAN data, the study aimed to achieve higher classification accuracy. The strength of their study lay in its development of a realistic driver classification model using real-world data. However, the method exhibited low classification accuracy in certain driving scenarios and faced limitations in real-time processing and applicability due to the complexity of integrating CAN and LiDAR data.
Previous studies have achieved high levels of accuracy in driver identification using vehicle sensor data, but they have not sufficiently considered changes in driving habits over time. Additionally, many studies rely on simulation data, which may introduce discrepancies when compared to real-world driving data [14,15]. Table 1 provides a comparative summary of previous studies and the proposed approach, highlighting key differences in data sources, adaptation to changing driving behaviors, and methodological advancements. This study does not simply classify drivers based on specific road segments or simulation data. Instead, it develops a system that leverages real-world vehicle data to handle various driving scenarios while also enabling the real-time detection of unregistered drivers. Furthermore, an MLOps-based environment on the Google Cloud Platform (GCP) is employed to ensure that the model undergoes continuous retraining, allowing it to adapt to evolving driving behaviors. Unlike previous research, which primarily relied on the Recurrence Plot (RP) technique, this study utilizes Markov Transition Field (MTF) visualization to enhance driver classification accuracy. By incorporating this alternative approach, the system effectively captures driver-specific behavioral patterns, contributing to improved model performance and adaptability.

2. Vehicle (Local Edge)

This study applies an edge computing approach in which initial preprocessing of CAN data and image generation are performed on the vehicle’s local device to minimize data transmission and enhance real-time performance [16]. Compared to traditional methods that transmit raw data to the cloud for processing, this approach significantly reduces the network load and improves cloud resource efficiency [17]. In commercial environments where multiple vehicles process data simultaneously, local preprocessing helps reduce network costs and cloud computing resource consumption. Section 2.1, Section 2.2 and Section 2.3 provide a detailed explanation of the data processing and image visualization tasks performed within the vehicle.

2.1. CAN Data Collection and Preprocessing

To classify the six driving conditions, it was first necessary to collect CAN data from the vehicle. The CAN data contained approximately 300 columns that recorded various driving conditions, including overall battery status, steering angle, vehicle speed, and yaw rotation angle. The data were obtained from a Hyundai Ioniq 5 and included a column for the distance to the preceding vehicle. Over a 90- to 120-day period, CAN data were collected, including instances when the vehicle was not started, resulting in approximately 1.4 million seconds of data per vehicle. Table 2 presents only the CAN data column names used in this study from the Hyundai Ioniq 5 dataset.
Missing values of approximately 2 to 3 s may occur in CAN data. This is because CAN communication involves multiple in-vehicle devices simultaneously exchanging data over a network. The primary causes of missing values include network congestion delays and the characteristics of the CAN bus protocol, where lower-priority messages are queued to prevent transmission collisions when multiple devices send messages simultaneously [18]. To accurately classify the six driving conditions, it was essential to address missing values in the CAN data. Missing values can generally be divided into categories such as random missing data, where values are missing at random without any specific pattern, and causal missing data, where the missing values are related to a prior event or condition. Since CAN data are time-series-based, missing values often result from sensor errors or communication failures, which are typically classified as non-random missing data [19]. Missing values of 3 s or less were treated as non-random and imputed using the value from 1 s prior to the missing data. In contrast, missing values of 4 s or longer were handled by removing the affected rows entirely to maintain data reliability.
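As a concrete illustration, this gap-handling rule can be sketched with pandas. This is a minimal sketch, assuming the 1 Hz CAN log is held in a DataFrame indexed by timestamp; the function name and column layout are illustrative and not taken from the actual dataset.

```python
import pandas as pd

def impute_can_gaps(df: pd.DataFrame, max_fill_seconds: int = 3) -> pd.DataFrame:
    """Fill short CAN gaps by carrying the last value forward; drop longer gaps.

    Assumes `df` is indexed by a 1 Hz DatetimeIndex. Gaps of up to
    `max_fill_seconds` consecutive seconds are imputed with the value observed
    1 s earlier (forward fill); rows inside longer gaps are removed entirely.
    """
    # Reindex to a continuous 1 s grid so that missing seconds become NaN rows.
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="1s")
    df = df.reindex(full_index)

    # Forward-fill only up to `max_fill_seconds` consecutive missing rows.
    filled = df.ffill(limit=max_fill_seconds)

    # Rows still containing NaN belonged to gaps of 4 s or more: drop them.
    return filled.dropna()
```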
After addressing missing values, the vehicle’s driving data visualization revealed speed variations, as shown in Figure 2. The graph clearly illustrates a repetitive pattern of stopping and driving, where speed values approach zero during stationary periods and exhibit varied fluctuations during driving periods. Notably, sharp increases or decreases in speed indicate acceleration or deceleration events, providing valuable insights for analyzing driving patterns. Furthermore, the continuity of the data was preserved after processing missing values, which enhances the reliability of the analysis [20].

2.2. Classification of Driving Conditions and Training Data Generation

2.2.1. Driving Condition Classification Criteria

Driving conditions were classified based on predefined ranges of steering angle and vehicle speed extracted from the CAN data. The specific criteria for each condition are presented in Table 3. To ensure robust maneuver detection, predefined thresholds were established based on an empirical analysis of CAN data. A left turn is identified when the vehicle speed is above 0 km/h and the steering angle falls between +70° and +300°, while a right turn is detected when the steering angle is between −70° and −300°, both of which must be sustained for multiple consecutive instances to ensure stability. A U-turn is classified when the steering angle exceeds ±300°, as this maneuver involves extreme vehicle rotation.
Start and stop maneuvers were determined based on changes in vehicle speed. A start event is recognized when the vehicle accelerates from a near-zero speed with a continuous increase in velocity, while a stop event is detected when the vehicle decelerates to zero within a short period. To prevent false positives, these events must occur while the steering angle remains within ±20°, ensuring that sudden lane changes or parking maneuvers are not misclassified. Straight driving is assigned when the vehicle maintains a stable speed with minimal steering input, defined as a steering angle within ±15° and a speed above 5 km/h for a sustained duration. To prevent classification ambiguity, scenarios such as parking maneuvers with rapid steering fluctuations beyond ±300° or traffic congestion, where speed alternates between 0 and higher values, were labeled as “unknown”. The collected CAN data were recorded at a frequency of 1 Hz. For driving condition analysis, the data were initially segmented into blocks of 10 to 15 s. However, when using 15-s blocks, instances were observed where multiple driving conditions appeared within a single block. To improve classification accuracy, the block length was adjusted to 10 s to ensure that each block contained only one driving condition, enhancing the reliability of maneuver-based driver identification.
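To make the decision rules concrete, the following is a simplified sketch of such a rule-based labeler for a single 10 s block, assuming 1 Hz arrays of steering angle and speed. It reduces the persistence checks described above to block-level statistics, so it is an approximation of the full algorithm rather than a faithful reimplementation.

```python
import numpy as np

def classify_block(steering_deg: np.ndarray, speed_kmh: np.ndarray) -> str:
    """Label a 10 s block of 1 Hz CAN data with one driving condition.

    A simplified rule set following the thresholds described in Section 2.2.1;
    the multi-sample persistence checks are collapsed into block statistics.
    """
    moving = speed_kmh.max() > 0
    mean_steer = steering_deg.mean()

    if np.abs(steering_deg).max() > 300:                      # extreme rotation
        return "u_turn"
    if moving and 70 <= mean_steer <= 300:                    # sustained left steer
        return "left_turn"
    if moving and -300 <= mean_steer <= -70:                  # sustained right steer
        return "right_turn"
    if np.abs(steering_deg).max() <= 20:                      # near-straight wheel
        if speed_kmh[0] < 1 and np.all(np.diff(speed_kmh) >= 0) and speed_kmh[-1] > 5:
            return "start"                                    # accelerating from rest
        if speed_kmh[0] > 5 and speed_kmh[-1] < 1:
            return "stop"                                     # decelerating to rest
    if np.abs(steering_deg).max() <= 15 and speed_kmh.min() > 5:
        return "straight"
    return "unknown"                                          # e.g., parking, congestion
```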

2.2.2. Generation of Training Data

To more effectively capture changes in steering angle and driving speed while generating enough training data, each block’s starting point was shifted by 1 s instead of aligning with the end of the previous block, resulting in a 9-s overlap, as illustrated in Figure 3. This approach provides a data augmentation effect, allowing, for instance, the creation of 10 training image datasets from 20 s of data [21]. These overlapping blocks incorporate diverse temporal information, enhancing the generalization performance of the model. This method increases data diversity and enriches the training dataset, even within the same data sequence, ensuring more robust learning and improved classification accuracy [22].
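A minimal sketch of this overlapping-block segmentation is shown below, assuming a 1 Hz NumPy array; the function name and exact block count are illustrative.

```python
import numpy as np

def sliding_blocks(signal: np.ndarray, block_len: int = 10, stride: int = 1) -> np.ndarray:
    """Split a 1 Hz time series into overlapping blocks.

    With block_len=10 and stride=1, consecutive blocks overlap by 9 s, so
    20 s of data yields roughly ten overlapping blocks, as described above.
    """
    if len(signal) < block_len:
        return np.empty((0, block_len))
    n_blocks = (len(signal) - block_len) // stride + 1
    return np.stack([signal[i * stride : i * stride + block_len] for i in range(n_blocks)])
```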
Based on approximately 700,000 s of raw CAN data from the Ioniq 5, missing values were processed, and the driving condition classification algorithm was applied to generate driving condition blocks. Each block was classified into one of the six predefined driving conditions—left turn, right turn, U-turn, start, stop, and straight driving—if it met the corresponding criteria. Cases that did not clearly fit into these categories were classified as “unknown”. Situations classified as unknown included parking scenarios, where the steering angle fluctuates sharply beyond −300 or 300 degrees, and congestion conditions, where the speed alternates repeatedly between 0 and a higher value. Approximately 2500 to 3000 images were collected for each of the five driving conditions, excluding straight driving, while around 27,000 images were gathered for straight driving. The higher number of images for straight driving is attributed to its dominance in road environments, as road design typically favors straight paths over turns [23,24].
The vehicle trajectory graph in Figure 4 visually confirms that the vehicle follows a left-turn trajectory, displaying changes in steering angle and speed over time. It also demonstrates that when the steering angle remains above a certain threshold, speed variations remain relatively stable. This verifies the consistency of steering and speed patterns during left-turn segments. Furthermore, utilizing GPS data, the vehicle’s actual trajectory can be tracked, visually indicating whether the system’s predicted driving condition aligns with the real-world trajectory.

2.3. Driving Condition MTF Visualization

In this study, the six classified driving conditions identified by the algorithm in Section 2.2 were converted into MTF (Markov Transition Field) images at the vehicle level. Since CAN data are time-series-based, possessing distinct patterns and temporal continuity, a visual representation is required to effectively highlight these characteristics [25]. Compared to directly classifying CAN time-series data using an RNN, converting the data into images provides a more intuitive interpretation of each driver’s unique driving patterns, making it easier to analyze steering angle and speed variations [26]. By transforming time-series data into images using MTF, the system can visually express data patterns and relationships, enabling the deep learning model to learn more effectively and enhancing its classification performance.
Unlike static datasets, our system dynamically generates images from time-series CAN data to ensure that the model does not learn from repetitive or highly similar driving conditions. Instead of capturing images at fixed timestamps, we apply random sampling across multiple driving sessions, incorporating data from various road conditions and time periods. This prevents the model from developing a bias toward specific temporal or environmental factors and ensures that it learns generalized driving behavior patterns. Additionally, to prevent class imbalance issues, which could cause the model to favor certain maneuvers over others, we ensure that the dataset maintains a balanced representation of different driving conditions. If an imbalance is detected, we downsample overrepresented classes or apply additional augmentations to underrepresented ones. This approach ensures that the model learns each maneuver type with equal importance, preventing overfitting to dominant driving patterns and enhancing its ability to generalize across various driving conditions.
To further improve the generalization of MTF images and prevent overfitting, post-processing techniques were applied. First, all MTF images were normalized to maintain intensity values within the range [0, 1], ensuring consistency across different driving sessions. Additionally, contrast enhancement techniques such as Adaptive Histogram Equalization (AHE) and Contrast Stretching were employed. Contrast stretching was applied by scaling pixel intensities between the 2nd and 98th percentiles, ensuring that critical transition patterns in driving behavior were preserved while preventing over-reliance on specific pixel values. These techniques collectively contribute to improved feature visibility and generalization, allowing the model to focus on meaningful behavioral transitions rather than fixed pixel structures. The image-based approach offers advantages in terms of pattern recognition and scalability, leveraging the visual characteristics of the data for improved analysis and model adaptability [27].
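As an illustrative sketch of this post-processing step, the percentile-based contrast stretching and [0, 1] normalization can be written as follows (AHE, which typically requires an additional image-processing library, is omitted here):

```python
import numpy as np

def postprocess_mtf(mtf_img: np.ndarray) -> np.ndarray:
    """Normalize an MTF image to [0, 1] with percentile contrast stretching.

    Pixel intensities are rescaled between the 2nd and 98th percentiles, as
    described in Section 2.3, so that dominant transition patterns are
    emphasized without over-relying on extreme pixel values.
    """
    lo, hi = np.percentile(mtf_img, (2, 98))
    stretched = (mtf_img - lo) / (hi - lo + 1e-8)
    return np.clip(stretched, 0.0, 1.0)
```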
Time-series data can be transformed into images using various visualization techniques, such as Recurrence Plot (RP) and Markov Transition Field (MTF). RP visualizes patterns by identifying points where the system returns to a previous state, making it well suited for capturing periodic characteristics over time. However, RP does not clearly represent transitions between specific moments within the time series. In contrast, MTF represents state transitions as probabilities along the time axis, allowing it to capture both local variations and structural patterns in the data [28]. This makes it particularly effective for analyzing acceleration and deceleration patterns during left and right turns. Since drivers may exhibit individual variations in steering angle adjustments and acceleration behaviors when executing turns (left turn, right turn, U-turn), visualizing these differences as transition probabilities using MTF enables more precise extraction of unique driving characteristics. Additionally, MTF allows dynamic state transitions to be set based on data distribution, making it highly advantageous for analyzing time-series data containing diverse driving conditions, such as CAN data [29].
The MTF transformation process is as follows: First, the steering angle and speed data are normalized and discretized into Q states. Based on the discretized data, a Markov Transition Matrix $M$ is generated, representing the probability of transitioning from a given state $s_i$ to the next state $s_j$ according to Equation (1):

$$M_{ij} = P(s_{t+1} = j \mid s_t = i) = \frac{\sum_{t=1}^{n-1} \mathbf{1}(s_t = i,\ s_{t+1} = j)}{\sum_{t=1}^{n-1} \mathbf{1}(s_t = i)} \quad (1)$$

where $\mathbf{1}(s_t = i,\ s_{t+1} = j)$ is an indicator function that returns 1 if $s_t = i$ and $s_{t+1} = j$ (and 0 otherwise), and $\mathbf{1}(s_t = i)$ returns 1 if $s_t = i$ (and 0 otherwise); the sums over $t = 1, \dots, n-1$ therefore count the observed transitions and state occurrences. Finally, according to Equation (2), an MTF image is generated by mapping the state transition probabilities at each time step to a 2D matrix, effectively capturing the temporal evolution of transitions:

$$MTF_{i,j} = M_{s_i, s_j} \quad (2)$$
Figure 5 visualizes the right-turn scenarios of drivers A, B, and C over a 10-s period using the Markov Transition Field (MTF) technique, where we utilized the pyts (Python Time Series) library (version 0.12.0) to convert the time-series data into 10 × 10 images. This visualization incorporates data on the steering angle, vehicle speed, and the distance to the preceding vehicle. Notably, the distance to the preceding vehicle is capped at a maximum of 204 m, with a fixed value of 204 applied when no preceding vehicle is present. Consequently, only values below 204 were included in the visualization to accurately represent the characteristics of the distance maintained by the driver. Even for the same right-turn scenario, unique patterns for each driver emerged, highlighting that MTF images distinctly reveal individual driving habits, such as acceleration, changes in steering angle, and driving intervals during right turns.
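For reference, a minimal sketch of generating such a 10 × 10 MTF image with the pyts library is shown below; the steering-angle values are illustrative, and the number of bins Q is an assumption rather than the setting used in the paper.

```python
import numpy as np
from pyts.image import MarkovTransitionField

# 10 s of 1 Hz steering-angle data for one right-turn block (illustrative values).
steering_block = np.array([[5, 12, 30, 55, 80, 95, 90, 60, 25, 8]], dtype=float)

# image_size=10 keeps one MTF cell per second; n_bins sets the number of
# discretized states Q used when building the Markov transition matrix.
mtf = MarkovTransitionField(image_size=10, n_bins=4, strategy="quantile")
mtf_image = mtf.fit_transform(steering_block)[0]   # shape: (10, 10)
```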

2.4. Performance Evaluation of the Retrained Classification Model

This section demonstrates the necessity of retraining the driver identification system. To achieve this, retraining was conducted on the existing classification model, and its performance was evaluated to provide an objective analysis of the need for model updates.
The ResNet (Residual Network) model was selected for driver classification for the following reasons [30]: ResNet utilizes residual learning, which enhances model stability when learning complex features, allowing it to effectively capture the fine-grained patterns necessary for classifying different driving conditions [31]. Although ResNet-18 has a relatively lightweight architecture, overfitting occurred despite reducing the number of epochs, primarily due to the limited size of the driver classification dataset. To mitigate this issue, the final layer (Layer 4) of the network was removed to further reduce model complexity. By removing this layer, the model retains the core concept of residual learning while being optimized according to the characteristics of the dataset, ensuring a more efficient and adaptable structure [32]. Figure 6 illustrates the architecture of the modified ResNet-18 model, where Layer 4 has been removed to achieve a more compact design tailored to driver classification.
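A minimal sketch of one way to realize this modification with torchvision is shown below; replacing Layer 4 with an identity mapping and resizing the classifier head is an assumption about the exact implementation, which the paper does not spell out in code.

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_modified_resnet18(num_classes: int = 4) -> nn.Module:
    """Build a reduced ResNet-18 for driver classification.

    The final residual stage (Layer 4) is replaced with an identity mapping to
    shrink model capacity, and the classifier head is resized to the number of
    driver classes. With Layer 4 removed, the 256-channel output of layer3
    feeds the global average pooling and the final fully connected layer.
    """
    model = resnet18(weights=None)
    model.layer4 = nn.Identity()              # drop the final residual stage
    model.fc = nn.Linear(256, num_classes)    # layer3 output channels -> classes
    return model
```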
This study evaluated the modified ResNet-18 model’s effectiveness by setting up two primary test scenarios. The first scenario examined whether the system could correctly classify Driver D, who was not registered in the vehicle system, as an unregistered person while distinguishing between registered drivers A, B, and C. The second scenario assessed the impact of using updated driving data for training. Specifically, the model was trained using driving data from 2023 for drivers A, B, and C, excluding Driver A’s latest data. The model’s performance was then evaluated using Driver A’s most recent driving data, providing an indication of its accuracy, which serves as the fundamental metric for classification quality [33]. Accuracy was chosen as the primary evaluation metric because the dataset containing the data of each driver was balanced, ensuring equal class distribution. Since the key objective was to correctly identify Driver D as an unregistered person, other metrics such as precision and recall, which are influenced by class distribution imbalances, were not considered. Subsequently, the model was retrained using the most recent driving data for Drivers A, B, and C, and the same test was repeated using Driver A’s latest data to determine the model’s accuracy. This experiment assessed the impact of incorporating the latest driving data on model performance. Through these two evaluation scenarios, this study validated the effectiveness and practical performance of the proposed system.
To enhance the reliability of driver classification, our system performs multiple identification checks throughout a driving session rather than relying on a single attempt. A single misclassification does not trigger an alert; however, if a driver is consistently classified as unregistered across multiple maneuvers, such as left turns, right turns, and U-turns, the system reinforces this decision to prevent security risks. Conversely, occasional misclassifications of a registered driver do not generate alerts, minimizing false alarms while maintaining security standards. Since turning maneuvers involve distinct vehicle dynamics, consistent misclassification across all major maneuvers confirms unauthorized status. This multi-maneuver validation approach significantly reduces the risk of false acceptance, ensuring a secure and reliable driver authentication system.
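The multi-maneuver validation logic can be illustrated with the following sketch; the aggregation rule (majority vote per maneuver) and the threshold of three maneuver types are assumptions chosen for illustration, not the paper's exact policy.

```python
from collections import Counter

def confirm_driver(maneuver_predictions: dict[str, list[str]],
                   min_unregistered_maneuvers: int = 3) -> str:
    """Aggregate per-maneuver predictions into a session-level decision.

    `maneuver_predictions` maps a maneuver type (e.g., "left_turn") to the
    driver labels predicted for each block of that maneuver. An alert is
    raised only if the majority label is "unregistered" for at least
    `min_unregistered_maneuvers` distinct maneuver types, so isolated
    misclassifications do not trigger false alarms.
    """
    unregistered_maneuvers = 0
    for maneuver, labels in maneuver_predictions.items():
        majority, _ = Counter(labels).most_common(1)[0]
        if majority == "unregistered":
            unregistered_maneuvers += 1
    return "alert" if unregistered_maneuvers >= min_unregistered_maneuvers else "ok"
```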
In the first scenario, the system was tested to determine whether it could accurately classify Driver D as an unregistered person while distinguishing between registered drivers A, B, and C. To achieve this, a multi-class classification approach (four classes) was employed, evaluating whether the model—trained exclusively on Drivers A, B, and C—could correctly identify Driver D as an unregistered person whose data did not match those of any of the registered drivers [34]. During model training, Driver D’s data were excluded from the training set, ensuring that the model was trained only on Drivers A, B, and C. Subsequently, Driver D’s data were used as the evaluation dataset (Eval Set) to measure how accurately the model could predict Driver D as an unregistered person.
Table 4 summarizes the number of training, testing, and validation images for each driver. To prevent overfitting, the number of images used for training was limited to 700 per driver, as the visualization method is relatively simple and pixel-based, which could lead to an excessive increase in data volume. The dataset was split into training (80%), testing (10%), and validation (10%) sets, with images randomly selected in each epoch from the dataset using the method described in Section 2.2. This approach ensured that the model would not rely too heavily on specific data, promoting diverse data distribution for sufficient learning and reducing the risk of overfitting. Additionally, the model learned from new data combinations in each epoch, enhancing its generalization performance. Model training was conducted on an NVIDIA GeForce RTX 3060 GPU, requiring approximately 16 min for 10 epochs. The maximum training loss recorded was 0.3254, while the minimum training loss was 0.1322, and the number of epochs was capped at 10, based on the point at which no significant loss reduction was observed. The batch size was set to 64, considering the dataset size.
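A sketch of this per-epoch sampling and 80/10/10 split is given below, assuming the MTF images are stored in one folder per driver; the folder layout and helper names are illustrative.

```python
import random
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def make_epoch_loaders(root: str, per_driver: int = 700, batch_size: int = 64):
    """Randomly re-sample a capped subset of MTF images for one training epoch.

    `root` is assumed to contain one subfolder of images per driver. Each class
    is capped at `per_driver` randomly chosen images, then split 80/10/10 into
    training, testing, and validation loaders, as described in Section 2.4.
    """
    full = datasets.ImageFolder(root, transform=transforms.ToTensor())

    # Cap each driver class at `per_driver` randomly chosen images.
    chosen = []
    for cls_idx in range(len(full.classes)):
        cls_indices = [i for i, (_, y) in enumerate(full.samples) if y == cls_idx]
        chosen += random.sample(cls_indices, min(per_driver, len(cls_indices)))

    random.shuffle(chosen)
    n = len(chosen)
    train_idx = chosen[: int(0.8 * n)]
    test_idx = chosen[int(0.8 * n) : int(0.9 * n)]
    val_idx = chosen[int(0.9 * n) :]

    def loader(idx, shuffle):
        return DataLoader(Subset(full, idx), batch_size=batch_size, shuffle=shuffle)

    return loader(train_idx, True), loader(test_idx, False), loader(val_idx, False)
```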
Table 5 presents the classification accuracy for each driving condition, evaluating the performance of the registered/unregistered driver classification model across six driving conditions: left turn, right turn, U-turn, start, stop, and straight. Accuracy represents the proportion of correct predictions out of the total predictions, calculated using Formula (3):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \quad (3)$$
In this study, to ensure model reliability and high generalization performance, only driving conditions with classification accuracy exceeding 70% were used. As a result, left turn, right turn, and U-turn met this criterion. Generally, an accuracy of 70% or higher is considered a reliable threshold for evaluating the predictive performance of machine learning models, and it is widely used as a benchmark for interpreting model effectiveness across various research fields [35].
In addition to Accuracy, Precision, Recall, and F1-score are reported to provide a more comprehensive evaluation of the system's performance. These metrics offer a more detailed assessment of the classification model's predictive capabilities across different driving conditions. Table 6 presents the classification results for left turn, right turn, and U-turn, which met the 70% accuracy threshold used in this study.
The U-turn scenario exhibited the highest performance across all metrics, with an accuracy of 79.32% and an F1-score of 0.7758, indicating strong predictive capability. Both left and right turns showed comparable performance, with accuracies of 70.13% and 72.30%, respectively. These additional evaluation metrics enhance the robustness of the model assessment and align with standard performance reporting practices in machine learning research.
In contrast, the start, stop, and straight conditions exhibited relatively lower classification accuracies of 62.67%, 52.21%, and 54.22%, respectively. This discrepancy can be attributed to the characteristics of the data for each driving condition. Specifically, these three conditions tended to show minimal or near-zero changes in steering angle, resulting in static patterns when visualized using the Markov Transition Field (MTF). Due to the lack of significant variations, the model likely struggled to differentiate these driving conditions. On the other hand, left turn, right turn, and U-turn exhibited distinct steering angle variations, forming clear visual patterns, which contributed to higher classification accuracies. Based on this analysis, the study utilized only the left turn, right turn, and U-turn conditions, which achieved classification accuracies above 70%, for distinguishing between registered and unregistered drivers.
In the second scenario, an experiment was conducted to evaluate the classification model’s performance and validate the necessity of retraining. To achieve this, we compared the classification accuracy of two models: one trained using historical and recent data, and another trained exclusively on recent data, both tested on the same latest dataset. If the model trained only on recent data achieved higher accuracy, it would indicate that the drivers’ driving habits changed over time, highlighting the need for continuous retraining to maintain model performance.
Table 7 presents the training datasets and the sizes of the training and evaluation datasets for each model. The historical data (Past A) were collected between January and April 2023, while the latest data (Recent A, Recent B, Recent C) were gathered from September to December 2024. The Past Included model was trained using Past A along with Recent B and Recent C, whereas the Recent Only model was trained exclusively on the latest dataset (Recent A, Recent B, Recent C). Following the data collection method described in Section 2.2, 1200 images were randomly selected per epoch, maintaining an 8:1:1 split for the training, testing, and validation sets. Additionally, the 400 images in the evaluation set (Eval Set) from Recent A were randomly selected from the entire dataset, ensuring that they did not overlap with the Recent A data used for training.
Table 8 compares the classification accuracy of the Past Included and Recent Only models for each driving condition. The table presents the models’ ability to accurately classify left turn, right turn, and U-turn using the most recent dataset (Recent A). The results indicate that the Past Included model achieved relatively low accuracies of 28.50%, 48.75%, and 39.50% for left turn, right turn, and U-turn, respectively. In contrast, the Recent Only model exhibited significantly higher accuracies of 66.50%, 72.50%, and 78.00%, demonstrating a substantial improvement over the Past Included model. These findings suggest that the Past Included model, trained with historical data, fails to capture the latest driving patterns, likely due to changes in driving habits. The superior performance of the Recent Only model further highlights that models trained exclusively on recent data more accurately identify drivers and evolving driving behaviors. This result strongly supports the necessity of continuous model retraining to maintain classification performance.

3. MLOps Platform (Cloud)

In this study, the Google Cloud Platform (GCP) was selected as the MLOps system infrastructure. The GCP provides scalable computing resources and data management tools, enabling flexible processing of large-scale data, model deployment, and monitoring [7,8]. Additionally, the GCP’s AI services and container-based environment allow for automated model deployment and continuous performance monitoring. This ensures that newly collected data can be rapidly integrated, maintaining model accuracy and enabling adaptive updates in response to changes in driving behavior. Through this approach, a stable and efficient MLOps environment was established, supporting the seamless retraining and deployment of the driver classification model.
Figure 7 illustrates the overall cloud-based architecture workflow, starting with CAN data collection from the vehicle, where edge computing is used to convert time-series data into MTF (Markov Transition Field) images. These images are then processed in the Google Cloud environment, enabling automated model training and deployment.
Step ➀ in Figure 7 is the image upload process, which triggers the training pipeline. The image files, generated using the method described in Section 2, are uploaded to a GCP Cloud Storage bucket with the filenames structured as <driver_driving condition>, supporting large-scale data storage and processing. Images are categorized into left, right, or U-turn folders based on the driving condition specified in the filename, and upon completion, a Pub/Sub message is generated to notify the system. Figure 8 displays a complete list of the generated Pub/Sub topics, while Figure 9 details Topic (1), showing an example of a message generated when an image is uploaded to the left-turn folder. The message payload contains metadata such as bucketId (storage location), eventTime (timestamp), and eventType (event occurrence type). Additionally, an empty text file is uploaded at the end of the process to trigger the “OBJECT_FINALIZE” eventType, signaling that the image upload has been successfully completed.
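A minimal sketch of this upload step using the google-cloud-storage client is shown below; the bucket name and object path are illustrative, and the Pub/Sub notification is assumed to be configured on the bucket rather than published explicitly by the uploader.

```python
from google.cloud import storage

def upload_mtf_image(local_path: str, driver: str, condition: str,
                     bucket_name: str = "driver-mtf-images") -> None:
    """Upload an MTF image to the condition-specific folder in Cloud Storage.

    Filenames follow the <driver>_<driving condition> convention from Section 3;
    the bucket is assumed to have a Pub/Sub notification configured so that the
    resulting OBJECT_FINALIZE event triggers the training DAG.
    """
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(f"{condition}/{driver}_{condition}.png")
    blob.upload_from_filename(local_path)
```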
Subsequently, the Pub/Sub message triggers the first task in the Directed Acyclic Graph (DAG) of GCP Composer, a Google Cloud solution that utilizes Apache Airflow for workflow orchestration and deployment. The DAG defines the execution order and dependencies among individual tasks within the pipeline, ensuring a structured workflow. It includes a pipeline file that specifies the task flow, allowing the entire data processing pipeline to be executed automatically.
Figure 10 illustrates the complete workflow of the model training DAG (Directed Acyclic Graph) pipeline on this platform, corresponding to Step ➁ in Figure 7. The Pub/Sub Operator is configured to subscribe to messages published when an image is uploaded to the GCS bucket, which is handled by the [wait_for_upload] task in Figure 10. This task is set up to detect OBJECT_FINALIZE events, indicating successful file uploads. In addition to verifying image uploads, the [log_upload] task now incorporates model performance evaluation using Vertex AI Model Monitoring. If classification accuracy on the new dataset remains within an acceptable range, retraining is skipped to optimize efficiency. However, if a significant accuracy drop is detected, the [Retrain_resnet18] task is triggered to update the model. This ensures that retraining occurs only when necessary, preventing redundant updates while maintaining model performance.
For training in Vertex AI, a module designed for machine learning model training and deployment, a custom dataset is required, which consists of image paths (URIs) and corresponding labels in a CSV file. The [create_dataset] task generates this CSV file, which contains the uploaded image paths and label information.
The [Retrain_resnet18] task triggers model retraining only when necessary, following Step ➂ in Figure 7. Instead of retraining upon every new data upload, the [log_upload] task first evaluates the model’s classification accuracy on the new dataset using Vertex AI Model Monitoring. If performance degradation is detected, the pipeline proceeds with the [Retrain_resnet18] task to update the model. Since Vertex AI requires a custom dataset, the CSV file created by the [create_dataset] task is loaded to structure the dataset. The training process is guided by a training script containing model parameters and hyperparameter settings (e.g., learning rate, batch size, and epochs). To ensure independent training pipelines for different driving conditions, separate retraining scripts were used for left, right, and U-turn scenarios. This structure allows each driving condition to be trained separately within the DAG pipeline.
Finally, the [check_pt] task verifies whether the .pt model file has been successfully generated. If all tasks are completed successfully, the system status changes to “success”, whereas any errors result in a “failed” status. By monitoring task statuses, the system can ensure proper operation and identify potential issues.
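For orientation, a skeleton of such a DAG is sketched below using Airflow's Google provider (as available in Cloud Composer 2); the project ID, subscription name, and placeholder task bodies are illustrative, and the actual pipeline additionally submits a Vertex AI custom training job inside the retraining task.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.sensors.pubsub import PubSubPullSensor

def _log_upload(**context):
    # Placeholder: record the upload event and evaluate current model accuracy.
    print("new MTF images detected; checking model performance")

def _create_dataset(**context):
    # Placeholder: write a CSV of image URIs and labels for Vertex AI.
    print("dataset CSV created")

def _retrain_resnet18(**context):
    # Placeholder: submit a Vertex AI custom training job for ResNet-18.
    print("retraining submitted")

def _check_pt(**context):
    # Placeholder: verify that the retrained .pt file exists in the bucket.
    print(".pt file verified")

with DAG(
    dag_id="left_turn_retraining",
    start_date=datetime(2024, 1, 1),
    schedule=None,           # triggered by Pub/Sub messages, not by a schedule
    catchup=False,
) as dag:
    wait_for_upload = PubSubPullSensor(
        task_id="wait_for_upload",
        project_id="my-gcp-project",          # illustrative project ID
        subscription="left-turn-upload-sub",  # illustrative subscription name
        ack_messages=True,
    )
    log_upload = PythonOperator(task_id="log_upload", python_callable=_log_upload)
    create_dataset = PythonOperator(task_id="create_dataset", python_callable=_create_dataset)
    retrain = PythonOperator(task_id="Retrain_resnet18", python_callable=_retrain_resnet18)
    check_pt = PythonOperator(task_id="check_pt", python_callable=_check_pt)

    wait_for_upload >> log_upload >> create_dataset >> retrain >> check_pt
```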
In Step ➃ of Figure 7, when a .pt file is uploaded to the GCS bucket, the GCP's event-driven Pub/Sub system detects the upload and generates a file upload completion message via Topic ➃ in Figure 8. This message triggers Cloud Functions, a serverless compute service that runs event-driven code directly within Google's infrastructure. This function is responsible for downloading the file from the GCS bucket and transferring it to a VM instance within the GCP.
For security reasons, direct communication between the GCP and the vehicle network is avoided; instead, the VM instance serves as an intermediary for file transfers [36,37]. This design mitigates potential security threats, such as data leaks, malware injection, and unauthorized access, by restricting direct cloud-to-vehicle data exchange [37]. Consequently, to enhance security, data transmission between the cloud and the vehicle is strictly controlled, utilizing the VM instance as a secure intermediary. Vehicles can then periodically download the model file from the VM instance, ensuring a secure and controlled deployment process.
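The download side of this function can be sketched as follows. For brevity it is written as a GCS finalize-triggered background function; the Pub/Sub-triggered variant described above would instead decode the notification payload from the message body, and the onward copy to the VM instance is environment-specific and therefore omitted.

```python
import os
from google.cloud import storage

def transfer_model(event, context):
    """Background Cloud Function triggered when a file is finalized in GCS.

    A minimal sketch: it only downloads the newly uploaded .pt file to the
    function's temporary storage. Forwarding the file to the intermediary VM
    instance (e.g., over SSH or a shared mount) is deployment-specific.
    """
    bucket_name = event["bucket"]
    blob_name = event["name"]

    if not blob_name.endswith(".pt"):
        return  # ignore uploads that are not model files

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    local_path = os.path.join("/tmp", os.path.basename(blob_name))
    blob.download_to_filename(local_path)
    print(f"downloaded {blob_name} to {local_path}; ready to forward to the VM instance")
```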
Figure 11 displays the log records and transfer time for the .pt file transmission from Cloud Functions to the VM instance. The first log entry confirms that the .pt file has been successfully generated and uploaded to the GCP bucket. The second and third log entries indicate the process in which Cloud Functions reads the .pt file and downloads it in preparation for transferring it to the VM instance. Finally, the last log entry confirms that the .pt file has been successfully transferred and stored in the ptfile folder of the VM instance. Through the end-to-end process outlined in Section 3, this MLOps platform effectively facilitates the retraining and deployment of the driver classification model, ensuring a seamless and automated model update pipeline.

4. Conclusions

This study developed and implemented a driver identification system based on an MLOps platform using CAN data. The system enables real-time processing of large-scale vehicle data, continuously updating the model to reflect changing driving patterns by analyzing the driver’s habits. The core of this research lies in utilizing the Google Cloud Platform (GCP) to collect data in real time, and using the ResNet-18 model to identify the driver based on these data. This technology precisely distinguishes specific driving situations, contributing to security and traffic safety.
The main objective of this study was to establish an MLOps-based environment for identifying drivers’ identities using only the CAN data collected from within the vehicle and to enable the model to be retrained according to evolving driving habits. To achieve this, real-time data processing, cloud-based model updates, and an automated driver identification system were developed. The implemented system could play a crucial role in vehicle security, accident prevention, and driver identification. By incorporating changes in driving habits into the model, the system is expected to prevent crimes such as vehicle theft and the illegal driving of commercial vehicles and improve traffic safety. Additionally, thanks to the ResNet-18 model, the system’s implementation resulted in high accuracy in analyzing driving habits, and the process of retraining and deploying models in real time was automated through the MLOps pipeline. The model was able to distinguish drivers based on six driving conditions, and only models with a validation accuracy above 70% were deployed, thereby enhancing performance. By setting performance and accuracy thresholds, the system’s ability to handle a wide range of real-world driving scenarios was improved.
In the GCP environment, data transmission and processing are critical considerations, as delays and network congestion can occur in large-scale vehicle networks. To address this issue, edge computing technology was introduced, allowing some processing to occur within the vehicle before data are transmitted to the cloud, thereby reducing costs and improving real-time processing performance. To enhance security, the system avoids direct connections between the vehicle and the cloud by using a VM instance as an intermediary. This approach enhances security, particularly in public and commercial vehicles, and can be expanded to accommodate various vehicle types and usage environments in the future. For example, it could serve as a critical technology for driver authentication in commercial vehicles such as taxis and buses.
Unlike previous studies, which primarily relied on static models trained on fixed datasets, this study introduces a dynamic system that continuously retrains itself through an MLOps-based pipeline. Existing research on driver identification has either focused on limited driving conditions (e.g., specific turns, controlled environments) or relied heavily on simulation data, which may not fully capture real-world variations in driving behavior. In contrast, this study not only employs real-world CAN data but also adapts to changes in driving habits over time, ensuring that the model remains accurate and reliable under various driving conditions.
Additionally, previous studies often faced limitations in scalability and real-time applicability due to the need for extensive offline processing. This study overcomes such challenges by implementing a cloud-based MLOps framework that enables automated deployment and updating of models. The introduction of edge computing further enhances real-time processing capabilities, addressing concerns related to network latency and large-scale data transmission. Moreover, this study introduces the use of Markov Transition Field (MTF) visualization, which provides improved feature extraction compared to conventional approaches such as Recurrence Plot (RP). This method enhances the system’s ability to capture fine-grained driving behavior patterns, ultimately leading to better classification accuracy. However, since this study primarily utilizes CAN data, its accuracy may be affected in specific driving environments, such as high-speed driving or sharp turns, where data reliability might decrease.
To address this limitation, future research will explore selecting a broader range of CAN data features to improve driver identification. For instance, engine RPM can reflect driving aggressiveness, while the frequency and intensity of brake and accelerator pedal usage can provide insights into a driver’s tendencies and efficiency. Furthermore, fusing external sensor data (e.g., cameras, LiDAR, GPS) with CAN data is expected to enhance both the accuracy and robustness of the system. Conducting experiments under diverse driving conditions will also contribute to improving model performance. Additionally, optimizing learning speed and reducing model complexity will be key areas of focus to ensure the system’s feasibility for real-world deployment across various vehicle platforms.

Author Contributions

Conceptualization, H.S. and C.M.; methodology, H.S. and S.K.; software, H.S.; validation, H.S.; formal analysis, H.S., W.P., S.K., and J.K.; investigation, H.S.; resources, C.M.; data curation, H.S.; writing—original draft preparation, H.S.; writing—review and editing, H.S., W.P., J.K., and C.M.; visualization, H.S. and W.P.; supervision, J.K. and C.M.; project administration, H.S.; funding acquisition, C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE) (P0020536, HRD Program for Industrial Innovation).

Data Availability Statement

The data are available upon request to the authors.

Acknowledgments

This paper utilized datasets provided by HansNet Inc. (S. Korea) and was supported by Konkuk University Researcher Fund in 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liu, Y.; Wang, J. A survey on vehicle security: Challenges and solutions. J. Transp. Secur. 2023, 16, 25–45.
2. Song, Y.J. Real-time Driver Behavior Recognition System Using a CNN-LSTM Model. Master’s Thesis, Hanyang University, Seoul, Republic of Korea, February 2021.
3. Ezzini, S.; Berrada, I.; Ghogho, M. Who is behind the wheel? Driver identification and fingerprinting. J. Big Data 2018, 5, 9.
4. Abdennour, N.; Ouni, T.; Ben Amor, N. Driver identification using only the CAN-Bus vehicle data through an RCN deep learning approach. Robot. Auton. Syst. 2021, 136, 103707.
5. Dolos, K.; Meyer, C.; Attenberger, A.; Steinberger, J. Driver identification using in-vehicle digital data in the forensic context of a hit and run accident. Forensic Sci. Int. Digit. Investig. 2020, 35, 301090.
6. Peppes, N.; Alexakis, T.; Adamopoulou, E.; Demestichas, K. Driving Behaviour Analysis Using Machine and Deep Learning Methods for Continuous Streams of Vehicular Data. Sensors 2021, 21, 4704.
7. John, M.M.; Olsson, H.H.; Bosch, J. Towards MLOps: A Framework and Maturity Model. In Proceedings of the 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), Palermo, Italy, 1–3 September 2021.
8. Kreuzberger, D.; Kühl, N.; Hirschl, S. Machine Learning Operations (MLOps): Overview, Definition, and Architecture. IEEE Access 2023, 11, 31866–31875.
9. Ma, L.; Zhang, W.; Jiao, J.; Wang, W.; Butrovich, M.; Lim, W.S.; Menon, P.; Pavlo, A. MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD ’21), Virtual Event, 20–25 June 2021.
10. Hallac, D.; Sharang, A.; Stahlmann, R.; Lamprecht, A.; Huber, M.; Roehder, M.; Sosič, R.; Leskovec, J. Driver Identification Using Automobile Sensor Data from a Single Turn. In Proceedings of the 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016.
11. Yang, J.; Zhao, R.; Zhu, M.; Hallac, D.; Sodnik, J.; Leskovec, J. Driver2vec: Driver Identification from Automotive Data. In Proceedings of the 2020 MileTS Workshop, ACM, San Diego, CA, USA, 24 August 2020.
12. Lee, S.H.; Lim, J.B.; Kim, T.G.; Cho, Y.H.; Lee, J.S.; Jeon, D.K.; Kim, D.H.; Choi, J.W.; Baek, Y.J. CNN-based Driver Identification Method for Long-Term Time Series Driving Data. In Proceedings of the 2022 Korean Institute of Communications and Information Sciences (KICS) Winter Conference, Pyeongchang, Republic of Korea, 9–11 February 2022.
13. Kim, J. Driver Identification Model Using CAN Data and LiDAR Data. Master’s Thesis, Konkuk University, Department of Smart Vehicle Engineering, Seoul, Republic of Korea, February 2023.
14. Zhang, S.; Zhao, C.; Zhang, Z.; Lv, Y. Driving simulator validation studies: A systematic review. Simul. Model. Pract. Theory 2025, 138, 103020.
15. Reimer, B.; D’Ambrosio, L.A.; Coughlin, J.F.; Kafrissen, M.E.; Biederman, J. Using self-reported data to assess the validity of driving simulation data. Behav. Res. Methods 2006, 38, 314–324.
16. Wang, M.; Yi, H.; Jiang, F.; Lin, L.; Gao, M. Review on Offloading of Vehicle Edge Computing. J. Artif. Intell. Technol. 2022, 2, 132–143.
17. Lin, L.; Liao, X.; Jin, H.; Li, P. Computation Offloading Toward Edge Computing. Proc. IEEE 2019, 107, 1584–1592.
18. Avatefipour, O.; Malik, H. State-of-the-Art Survey on In-Vehicle Network Communication: “CAN-Bus” Security and Vulnerabilities. J. Automot. Cybersecur. 2022, 7, 45–67.
19. Kim, E.D.; Ko, S.K.; Son, S.C.; Lee, B.T. Technical Trends of Time-Series Data Imputation. Electron. Telecommun. Trends 2021, 36, 146–153.
20. Thakur, K.; Kumar, H.; Snehmani. Advancing Missing Data Imputation in Time-Series: A Review and Proposed Prototype. In Proceedings of the 2023 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC), Windhoek, Namibia, 16–18 August 2023.
21. Maharana, K.; Mondal, S.; Nemade, B. A Review: Data Pre-Processing and Data Augmentation Techniques. Glob. Transit. Proc. 2022, 3, 91–99.
22. Mumuni, A.; Mumuni, F. Data Augmentation: A Comprehensive Survey of Modern Approaches. Array 2022, 16, 100258.
23. Park, J.; Abdel-Aty, M. Safety Performance of Combinations of Traffic and Roadway Cross-Sectional Design Elements at Straight and Curved Segments. J. Transp. Eng. Part A Syst. 2017, 143, 04017015.
24. Múčka, P. Longitudinal Road Profile Spectrum Approximation by Split Straight Lines. J. Transp. Eng. 2012, 138, 243–251.
25. Wang, Z.; Yan, W.; Oates, T. Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017.
26. Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Phys. D 2020, 404, 132306.
27. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
28. Wang, Z.; Oates, T. Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks. In Proceedings of the 2015 AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
29. Lin, J.; Keogh, E.; Wei, L.; Lonardi, S. Experiencing SAX: A Novel Symbolic Representation of Time Series. Data Min. Knowl. Discov. 2007, 15, 107–144.
30. Wu, Z.; Shen, C.; van den Hengel, A. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. Pattern Recognit. 2019, 90, 119–133.
31. Huang, G.; Sun, Y.; Liu, Z.; Sedra, D.; Weinberger, K.Q. Deep Networks with Stochastic Depth. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016.
32. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
33. Novaković, J.D.; Veljović, A.; Ilić, S.S.; Papić, Z.; Tomović, M. Evaluation of Classification Models in Machine Learning. Theory Appl. Math. Comput. Sci. 2017, 7, 39–46.
34. Grandini, M.; Bagli, E.; Visani, G. Metrics for Multi-Class Classification: An Overview. arXiv 2020, arXiv:2008.05756.
35. Hand, D.J. Classifier Technology and the Illusion of Progress. Stat. Sci. 2006, 21, 1–15.
36. Subashini, S.; Kavitha, V. A Survey on Security Issues in Service Delivery Models of Cloud Computing. J. Netw. Comput. Appl. 2011, 34, 1–11.
37. Yang, T.; Sun, R.; Rathore, R.S.; Baig, I. Enhancing Cybersecurity and Privacy Protection for Cloud Computing-Assisted Vehicular Network of Autonomous Electric Vehicles: Applications of Machine Learning. World Electr. Veh. J. 2025, 16, 14.
Figure 1. Driver classification system overview and data processing flow.
Figure 2. Visualization of vehicle speed variations after missing value processing.
Figure 3. Overlapping data blocks for training data augmentation.
Figure 4. Visualization of vehicle trajectory, steering angle, and speed during a left turn.
Figure 5. MTF visualization of right-turn scenarios for drivers A, B, and C.
Figure 6. The architecture of the modified ResNet-18 model with convolutional and pooling layers.
Figure 7. Cloud-based architecture workflow for model training and deployment.
Figure 8. List of all generated topics for data upload and processing.
Figure 9. Example of a message generated when an image is uploaded to the left-turn folder.
Figure 10. Directed Acyclic Graph (DAG) workflow for automated model training.
Figure 11. Log records and transfer time for the .pt file transmission from Cloud Functions to the VM instance.
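For readers who wish to reproduce the image encoding illustrated in Figure 5, the sketch below shows how a window of a CAN signal could be converted into a Markov Transition Field (MTF) image using the pyts library. The signal, window length, image size, and bin count here are illustrative assumptions, not the exact settings used in this study.

```python
# Minimal sketch (not the authors' exact pipeline): encoding one CAN signal
# window as a Markov Transition Field (MTF) image with the pyts library.
import numpy as np
from pyts.image import MarkovTransitionField

# Hypothetical steering-angle window from a right-turn segment (placeholder data).
rng = np.random.default_rng(0)
steering_window = np.cumsum(rng.normal(0.0, 1.0, size=128))

# pyts expects a 2D array of shape (n_samples, n_timestamps).
X = steering_window.reshape(1, -1)

# Image size and number of quantile bins are illustrative choices.
mtf = MarkovTransitionField(image_size=64, n_bins=8, strategy="quantile")
mtf_image = mtf.fit_transform(X)[0]  # shape: (64, 64)

print(mtf_image.shape)  # the 2D array can then be saved as an image for the classifier
```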
Table 1. Comparison of previous studies and the proposed method in terms of approach and performance.

| No. | Study | Category | Data Source | Methodology | Accuracy | Strengths | Limitations |
|---|---|---|---|---|---|---|---|
| 1 | David Hallac et al. (2016) [10] | Sensor-based methods | Real-world data (10 vehicles, 64 drivers) | Analyzing sensor data from a single turn | 76.9% (2 drivers), 50.1% (5 drivers) | Uses real-world data, focuses on unique driving behaviors | Limited to single-turn analysis |
| 2 | Jingbo Yang et al. (2020) [11] | Machine learning methods | Simulation data | Deep learning model using sensor data | 83.1% | High accuracy, effective across different conditions | Simulation data may not fully reflect real-world scenarios |
| 3 | Lee et al. (2022) [12] | Machine learning methods | Driving simulator (7 drivers) | CNN model with sliding window technique | 19.94% improvement over conventional CNN | Sliding window technique improves time-series processing | No real-world vehicle data; limited driver diversity |
| 4 | Kim Ju Yeop (2023) [13] | Machine learning methods | Real-world data (CAN + LiDAR) | CNN model with Recurrence Plot (RP) visualization | Improved classification using LiDAR + CAN | Combines CAN and LiDAR data for enhanced accuracy | Low accuracy in certain scenarios; real-time processing limitations |
| 5 | Proposed | Machine learning methods | Real-world data (CAN) | Markov Transition Field (MTF) visualization with real-time adaptation | Continuously improving accuracy with real-time retraining | Driver change detection, adaptation to evolving behaviors, and enhanced classification accuracy with MTF | Low accuracy in linear driving scenarios |
Table 2. CAN data columns and vehicle specifications used in this study.

| Description (Unit) | CAN Signal Name |
|---|---|
| Index time (s) | Idx_time |
| Steering angle (°) | EPS_SteeringAngle |
| Vehicle speed (km/h) | VCU_VehicleSpeed_STD |
| GPS longitude (°) | GPS_Longitude |
| GPS latitude (°) | GPS_Latitude |
| Following distance (m) | ADAS_DistanceToTarget |
Table 3. Classification criteria for six driving conditions based on CAN data.

| Driving Condition | Criteria |
|---|---|
| Left Turn | Speed > 0; 70 < steering angle < 300 |
| Right Turn | Speed > 0; −300 < steering angle < −70 |
| U-turn | Speed > 0; steering angle > 300 |
| Start | Speed = 0 for 3 s; then speed > 0 |
| Stop | Speed changes to 0; speed remains at 0 for 3 s |
| Straight | (Speed > 5) ratio ≥ 0.5; (steering angle < 15); (gear = drive) ratio ≥ 0.9 |
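As a minimal illustration of how the turn-related rules in Table 3 could be applied to CAN data, the sketch below labels one window of samples using the column names from Table 2. The window length, the use of the mean steering angle, and the helper name `label_turn_window` are assumptions made for this example, not the authors' implementation.

```python
# Illustrative sketch: applying the turn rules from Table 3 to a window of CAN samples.
from typing import Optional

import pandas as pd

def label_turn_window(window: pd.DataFrame) -> Optional[str]:
    """Return a turn label for one window of CAN data, or None if no rule matches."""
    speed = window["VCU_VehicleSpeed_STD"]   # vehicle speed (km/h), see Table 2
    steer = window["EPS_SteeringAngle"]      # steering angle (°), see Table 2

    moving = (speed > 0).all()               # Table 3: speed > 0 for all turn conditions
    mean_steer = steer.mean()                # window-level summary (assumption)

    if moving and 70 < mean_steer < 300:
        return "left_turn"
    if moving and -300 < mean_steer < -70:
        return "right_turn"
    if moving and mean_steer > 300:
        return "u_turn"
    return None
```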
Table 4. Distribution of training, testing, and validation images for detecting unregistered people.

| Driver | Training | Testing | Validation | Eval (Intruder) |
|---|---|---|---|---|
| Driver A (Registered) | 560 | 70 | 70 | 700 |
| Driver B (Registered) | 560 | 70 | 70 | – |
| Driver C (Registered) | 560 | 70 | 70 | – |
Table 5. Classification accuracy of the registered/unregistered driver model across driving conditions.

| Driving Condition | Accuracy (%) |
|---|---|
| Left Turn | 70.13 |
| Right Turn | 72.30 |
| U-turn | 79.32 |
| Start | 62.67 |
| Stop | 52.21 |
| Straight | 54.22 |
Table 6. Classification performance of the registered/unregistered driver model across driving conditions.

| Driving Condition | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Left Turn | 0.7013 | 0.7191 | 0.6813 | 0.6536 |
| Right Turn | 0.7230 | 0.7235 | 0.7030 | 0.6958 |
| U-turn | 0.7932 | 0.8147 | 0.7732 | 0.7758 |
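The sketch below shows one way the metrics reported in Table 6 could be computed with scikit-learn. The label encoding (0 = registered, 1 = unregistered), the toy prediction vectors, and the macro-averaging choice are assumptions for this example only.

```python
# Illustrative sketch: computing accuracy, precision, recall, and F1-score
# for one driving condition with scikit-learn (not the authors' exact script).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 0, 1, 0, 1]   # ground-truth labels (0 = registered, 1 = unregistered)
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]   # model predictions for the same samples

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Acc={accuracy:.4f}  P={precision:.4f}  R={recall:.4f}  F1={f1:.4f}")
```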
Table 7. Comparison of model performance using historical and recent data.

| Model | Training Set | Dataset Size | Eval Set (Images) |
|---|---|---|---|
| Past Included | Past A + Recent B + Recent C | 1200 | Recent A (400) |
| Recent Only | Recent A + Recent B + Recent C | 1200 | Recent A (400) |
Table 8. Comparison of classification accuracy between the past-included and recent-only models.

| Model | Left Turn (%) | Right Turn (%) | U-Turn (%) |
|---|---|---|---|
| Past Included | 33.50 | 48.75 | 39.50 |
| Recent Only | 66.50 | 72.50 | 78.00 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.