Multi-Source Data Fusion for Vehicle Maintenance Project Prediction

Chen, Fanghua; Shang, Deguang; Zhou, Gang; Ye, Ke; Wu, Guofang

doi:10.3390/fi16100371


by Fanghua Chen ^1,2,3,*, Deguang Shang ^1, Gang Zhou ^2,3, Ke Ye ^1 and Guofang Wu ^2,3

^1 College of Mechanical and Energy Engineering, Beijing University of Technology, Beijing 100124, China
^2 Automobile Transportation Research Center, Research Institute of Highway Ministry of Transport, Beijing 100088, China
^3 Key Laboratory of Operation Safety Technology on Transport Vehicles, Research Institute of Highway Ministry of Transport, Beijing 100088, China
^* Author to whom correspondence should be addressed.

Future Internet 2024, 16(10), 371; https://doi.org/10.3390/fi16100371

Submission received: 12 August 2024 / Revised: 9 October 2024 / Accepted: 12 October 2024 / Published: 14 October 2024

(This article belongs to the Topic Smart Product Design and Manufacturing on Industrial Internet)


Abstract:

Ensuring road safety is heavily reliant on the effective maintenance of vehicles. Accurate predictions of maintenance requirements can substantially reduce ownership costs for vehicle owners. Consequently, this field has attracted increasing attention from researchers in recent years. However, existing studies primarily focus on predicting a limited number of maintenance needs, predominantly based solely on vehicle mileage and driving time. This approach often falls short, as it does not comprehensively monitor the overall health condition of vehicles, thus posing potential safety risks. To address this issue, we propose a deep fusion network model that utilizes multi-source data, including vehicle maintenance record data and vehicle base information data, to provide comprehensive predictions for vehicle maintenance projects. To capture the relationships among various maintenance projects, we create a correlation representation using the maintenance project co-occurrence matrix. Furthermore, building on the correlation representation, we propose a deep fusion network that employs the attention mechanism to efficiently merge vehicle mileage and vehicle base information. Experiments conducted on real data demonstrate the superior performance of our proposed model relative to competitive baseline models in predicting vehicle maintenance projects.

Keywords:

deep learning; multi-source data; vehicle maintenance; projects prediction

1. Introduction

With the rapid evolution of the vehicle industry, vehicles have become an indispensable and fundamental aspect of modern life. Consequently, the vehicle maintenance industry has emerged as a crucial service sector, directly impacting individuals’ well-being and overall quality of life. To ensure a secure and reliable driving experience while minimizing maintenance expenses, accurate vehicle maintenance prediction has garnered significant attention from both researchers and industry professionals in this domain. Unfortunately, current maintenance practices primarily rely on fixed time intervals or mileage, which represent a conventional yet restricted approach to planned maintenance. This approach overlooks the varying usage patterns and driving environments of different vehicles, consequently hampering maintenance flexibility and resulting in excessive costs and unnecessary maintenance procedures.

Throughout the vehicle maintenance procedure, multiple maintenance tasks exhibit interconnectedness, indicating strong correlations between them. For instance, to maintain the engine’s lubrication and cooling functionalities, it becomes imperative to simultaneously replace both the oil and oil filter. To sustain optimal combustion efficiency and emissions performance, the simultaneous replacement of both the air filter and fuel filter ensures a continuous supply of clean air and fuel to the engine. In order to uphold the ignition system’s functionality and brake operation, replacing the spark plugs and ignition coils after reaching a predetermined mileage is crucial.

Mileage is a pivotal factor in determining vehicle maintenance requirements. As depicted in Figure 1, which is based on the experimental data, the number of common maintenance projects varies for vehicles with different annual mileage (here, common maintenance projects are those that need to be performed more than five times a year). Higher mileage intensifies the wear and aging of vehicle components. Frequently used parts such as the engine, suspension, and braking systems endure greater strain, fatigue, or even damage as mileage increases. Moreover, long journeys result in increased fuel consumption, degraded lubricant quality, and associated complications, increasing the maintenance risk. Therefore, vehicles require more frequent and more varied maintenance as mileage accumulates.

The vehicle base information, including location, manufacturer, type, engine type, engine displacement, model year, production year, and assembly plant, is closely related to the development of maintenance projects. Varied regions present dissimilar climates, road conditions, and environments, which in turn impact the demand for specific vehicle maintenance endeavors. Manufacturers vary in terms of vehicle design, performance benchmarks, and the developmental methodologies employed for their respective maintenance products. Additionally, disparate vehicle types involve unique high-stress components during operation, requiring specialized maintenance approaches. Diverse engine classifications and capacities require distinct maintenance procedures due to their inherent design and functional differences. Notably, advancing vehicle technology leads to discrete engineering upgrades and modifications across varying models and production years, consequently influencing corresponding maintenance protocols. Finally, different assembly processes and component selections at various assembly facilities result in discrepancies in maintenance requirements.

Drawing upon the aforementioned context, to address the prevalent challenges encountered in predicting vehicle maintenance projects, we propose a methodology founded upon vehicle maintenance record data and vehicle base information data. The former encapsulates maintenance projects and maintenance mileage, while the latter encompasses vehicle’s location, manufacturer, type, engine type, engine displacement, model year, production year, and assembly plant.

To the best of our knowledge, this paper represents the inaugural endeavor to predict vehicle maintenance projects and makes the following contributions:

To fully exploit the interdependencies among various maintenance projects, we introduce a novel correlation representation scheme for maintenance projects based on the co-occurrence matrix.
We develop a deep fusion network, endowed with an attention mechanism, to seamlessly integrate vehicle mileage and foundational vehicle information into the vehicle maintenance project prediction framework.
Extensive experiments conducted with real-world data demonstrate the superior performance of our model relative to contemporary baselines.

2. Related Work

Predicting vehicle maintenance projects is a complex endeavor, involving the forecasting of future maintenance needs using historical maintenance records and fundamental vehicle information, which constitutes a quintessential time series prediction task. Accordingly, we will offer a comprehensive description of the principal time series prediction techniques. Time series prediction techniques can be classified into traditional approaches, machine learning-based methods, and deep learning-based techniques.

2.1. Traditional Time Series Prediction Methods

Traditional time series analysis centers on establishing parametric models, determining model parameters, and utilizing the resolved models for future predictions. Exponential smoothing, introduced by Robert G. Brown [1], forecasts future values by assigning weights to historical values. The moving average method, devised by George W. Brown [2], computes the mean within a specific time window to forecast future values. Building on these methods, Zhang G. P. developed the extensively utilized ARIMA model [3]. ARIMA amalgamates autoregressive and moving average models to capture data trends, seasonal variations, and noise characteristics. Prior to prediction, the observed value series is subjected to tests for stationarity, white noise evaluation, and assessments of the autocorrelation coefficient and partial autocorrelation coefficient. Although traditional methods excel in addressing straightforward time series prediction problems, they encounter challenges in managing high-dimensional time series data with intricate dependencies and nonlinearities.

2.2. Machine Learning-Based Time Series Prediction Methods

Machine learning methodologies, particularly linear regression techniques [4], are significantly linked to time series prediction tasks. Linear regression employs a linear model to predict the future trajectory of a series, presuming a linear relationship between the input and output variables. SVM [5], grounded in statistical learning theory, employs kernel functions to map data to high-dimensional spaces, thus overcoming issues related to dimensionality. SVR [6], a variant of SVM functioning as regression analysis, transforms data nonlinearly within this feature-rich space. SVR determines a function that precisely represents the relationship between input and output data, thus facilitating time series fitting and prediction. MLP [7], characterized by its hidden layers and output layer, generates predictions through the iterative adjustment of weights and biases via backpropagation. The utilization of nonlinear activation functions and multiple layers in MLP enables it to learn and represent complex nonlinear relationships. HMMs [8] provide a probabilistic framework for the modeling of multivariate time series predictions. HMMs utilize hidden Markov chains, which represent underlying stochastic processes and can be estimated through a sequence of observations.

2.3. Deep Learning-Based Time Series Prediction Methods

Deep learning has undergone rapid development and achieved significant progress in predicting time series data. CNN [9], RNN [10], LSTM [11], Transformer [12], and GNN [13] are extensively employed across a range of time series prediction tasks. Numerous enhanced methodologies have been proposed based on these models. TCN [14] treats the time series as a one-dimensional object, capturing long-term relationships through iterative multilayer convolution. SCINet [15] employs a hierarchical convolutional network structure to extract and aggregate features at different temporal resolutions. To predict future diseases, RETAIN [16], Dipole [17], and Timeline [18] integrate RNN with attention mechanisms [12] to model and analyze patients’ historical disease diagnostic data. Deep State Space [19] models the relationship between consecutive hidden states via an RNN, enabling predictions from the current hidden state to the desired outcome. Log Sparse Transformer [20] employs causal convolution to generate Queries and Keys in the self-attention layer, thus introducing log sparse sparsity into the model. Chet [21] integrates GNN with the attention mechanism to accurately predict diseases for patients. In order to capture latent spatial dependencies in the data, Graph Wave Net [22] introduces an adaptive graph modeling technique. CoDMO [23] utilizes correlation-enhanced hierarchical propagation models and prior interactions in historical records to learn dual medical ontology representations for predicting a patient’s future conditions and procedures during the next admission. HGV4Risk [24] proposes the Global Graph Embedding module and β-attention mechanism, thereby enabling risk prediction based on temporal sequential data.

Although the aforementioned methods exhibit robust performance in time series prediction, their applicability to directly predict vehicle maintenance projects is limited. This limitation arises from their inability to accommodate the unique characteristics inherent to these tasks. Unlike these methods, our study comprehensively integrates the correlations between maintenance projects and the effects of vehicle mileage and foundational data to ensure precise predictions of vehicle maintenance projects. With the continuous development of the vehicle industry, methods for forecasting vehicle maintenance demand are evolving and can be categorized based on the data source into single data-based and combined data-based vehicle maintenance demand predictions.

3. Method

3.1. Notations

To facilitate a comprehensive description of the maintenance project prediction task, we provide a set of notations in Table 1. The base information of a vehicle is represented by $B = \{B_{1}, B_{2}, \dots, B_{n}\}$, where $n$ is the number of types of base information. During the vehicle maintenance process, numerous maintenance projects are necessary. All maintenance projects are systematically encoded, forming the set of project codes denoted as $P = \{p_{1}, p_{2}, \dots, p_{|P|}\}$, comprising $|P|$ different project types. Each vehicle possesses multiple maintenance records, which encapsulate two essential pieces of information: the maintenance projects and the corresponding mileage. The maintenance projects in the t-th maintenance are defined as a multi-hot column vector $V_{t} \in \{0, 1\}^{|P|}$; $V_{t}^{i} = 1$ means that the maintenance project $p_{i}$ was carried out at the t-th maintenance, with $t = 1, 2, \dots, T$ and $i = 1, 2, \dots, |P|$. A multi-hot column vector is a binary vector in which each element corresponds to a specific maintenance project. If a certain maintenance project is performed during a maintenance event, the corresponding element in the vector is set to 1; otherwise, it is set to 0. For example, if a maintenance record includes projects $p_{1}$ and $p_{3}$, the multi-hot vector is [1, 0, 1, …, 0]. The mileage is denoted by $\{M_{t}\}_{t = 1, 2, \dots, T}$, with $M_{t}$ representing the mileage recorded at the t-th maintenance, where T is the number of maintenance records. In this paper, maintenance project prediction is based on the historical maintenance projects $V_{1}, V_{2}, \dots, V_{T}$; mileage data $M_{1}, M_{2}, \dots, M_{T}$; and base information B. The objective is to predict the maintenance projects $V_{T + 1}$ for the $(T + 1)$-th maintenance.
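As an illustration of the multi-hot encoding described in this subsection, the following minimal sketch builds one record vector. The function name `multi_hot` and the toy value |P| = 5 are assumptions made for this example and do not come from the paper.

```python
def multi_hot(performed, num_projects):
    """Return V_t as a list of 0/1 flags.

    `performed` holds the 1-based indices i of the projects p_i carried out
    in this maintenance record; `num_projects` plays the role of |P|.
    """
    vec = [0] * num_projects
    for i in performed:
        vec[i - 1] = 1
    return vec


# Hypothetical record in which projects p_1 and p_3 were performed, with |P| = 5.
v_t = multi_hot([1, 3], 5)
print(v_t)  # [1, 0, 1, 0, 0]
```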

3.2. Framework

The framework of the proposed Multi-source Data Deep Fusion Network (MsDFN) for vehicle maintenance project prediction is presented in Figure 2. The framework comprises two key modules:

(1) Maintenance project correlation representation: Recognizing the existence of correlations among different maintenance projects, we propose a correlation representation function to derive correlation representation results for each project based on the co-occurrence matrix.

(2) Multi-source data fusion network: Acknowledging the significant impact of vehicle mileage and vehicle base information on the development of maintenance projects, a multi-source data fusion network based on attention mechanism is proposed to effectively incorporate the project correlation representation results with the vehicle mileage and vehicle base information.

3.3. Maintenance Project Correlation Representation

During vehicle maintenance, a single maintenance session frequently involves several distinct maintenance projects. For example, within a vehicle’s maintenance record, activities such as an oil change and an air filter replacement often occur together, indicating an interdependence among various maintenance projects. To fully utilize these correlations, we propose a correlation representation module based on the co-occurrence matrix.

We create a global co-occurrence graph G with weighted edges for all maintenance projects, where each node represents a maintenance project $p_{i}$ ($i = 1, 2, \dots, |P|$) from the set P. If a code pair $(p_{i}, p_{j})$ co-occurs in a vehicle’s maintenance record, two edges $\overrightarrow{(i, j)}$ and $\overrightarrow{(j, i)}$ are added to G. Then, we count the total co-occurrence frequency $t_{ij}$ of $(p_{i}, p_{j})$ in all vehicles’ maintenance records for the subsequent calculation of edge weights. In addition, we want to detect important and common project pairs. Therefore, we define a threshold $\delta$ to filter out combinations with low frequency and obtain a qualified set $\Delta_{i} = \{p_{j} \mid \frac{t_{ij}}{\sum_{k = 1}^{|P|} t_{ik}} \geq \delta\}$ for $p_{i}$. Let $q_{i} = \sum_{p_{j} \in \Delta_{i}} t_{ij}$ be the total frequency of qualified projects co-occurring with $p_{i}$. We use an adjacency matrix $A \in \mathbb{R}^{|P| \times |P|}$ to represent G with the definition in Equation (1).

A_{ij} = \begin{cases} 0 & \text{if } i = j \text{ or } \frac{t_{ij}}{\sum_{k = 1}^{|P|} t_{ik}} < \delta \\ \frac{t_{ij}}{q_{i}} & \text{otherwise} \end{cases}

(1)
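The construction of the adjacency matrix in Equation (1) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function name `cooccurrence_adjacency`, the 0-based project indices, the toy records, and the choice to count each pair once per maintenance record are all assumptions.

```python
from itertools import permutations


def cooccurrence_adjacency(records, num_projects, delta):
    """Build A from Equation (1); each record is a list of 0-based project indices."""
    # t[i][j]: number of records in which projects i and j appear together.
    t = [[0] * num_projects for _ in range(num_projects)]
    for record in records:
        for i, j in permutations(sorted(set(record)), 2):
            t[i][j] += 1

    A = [[0.0] * num_projects for _ in range(num_projects)]
    for i in range(num_projects):
        row_total = sum(t[i])
        if row_total == 0:
            continue  # project i never co-occurs with anything
        # Qualified set Delta_i: partners whose share of row i reaches delta.
        qualified = [j for j in range(num_projects)
                     if j != i and t[i][j] / row_total >= delta]
        q_i = sum(t[i][j] for j in qualified)
        for j in qualified:
            A[i][j] = t[i][j] / q_i
    return A


# Toy data: four records over |P| = 3 projects, with a hypothetical threshold of 0.3.
toy_records = [[0, 1], [0, 1], [0, 1], [0, 2]]
A = cooccurrence_adjacency(toy_records, 3, 0.3)
for row in A:
    print(row)
```

Each non-empty row of the result sums to 1, because the retained counts are divided by their own total $q_{i}$.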

Note that, because row i is normalized by its own total $q_{i}$, A is in general asymmetric ($A_{ij}$ need not equal $A_{ji}$), which allows it to represent the different influence that each of two maintenance projects has on the other. As a static matrix, A quantifies the global co-occurrence frequency of maintenance projects. However, different maintenance projects appear and disappear at different stages. Even if a specific maintenance project is absent from the current maintenance records, a related maintenance project may arise in the future due to the correlations among projects. Consequently, to account for these correlations, the correlation representation results $\{R_{t}\}_{t = 1, 2, \dots, T}$ are derived from each vehicle’s historical maintenance projects $\{V_{t}\}_{t = 1, 2, \dots, T}$ by applying $F_{1}(V_{t}, A)$ to each record individually. The definition of $F_{1}(V_{t}, A)$ is presented in Equation (2).

\begin{aligned} F_{1}(V_{t}, A) &= f(A^{T} \times D_{t}) \\ D_{t} &= \mathrm{diag}(V_{t}[1], V_{t}[2], \dots, V_{t}[n]) \end{aligned}

(2)

where $n = |P|$ and the function f filters out the row vectors of the matrix that are all zeros.
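A literal reading of Equation (2) can be sketched in a few lines. The function name `correlation_representation` and the toy matrix are assumptions; in particular, the paper does not spell out the shape of the output beyond the description of f, so the sketch simply drops all-zero rows.

```python
def correlation_representation(v_t, A):
    """Sketch of F_1(V_t, A) = f(A^T x D_t) with D_t = diag(V_t)."""
    n = len(v_t)
    # (A^T D_t)[r][c] = A[c][r] * V_t[c]: column c is kept only if project c
    # was performed in this record, and is zeroed otherwise.
    product = [[A[c][r] * v_t[c] for c in range(n)] for r in range(n)]
    # f: discard rows that are entirely zero.
    return [row for row in product if any(x != 0 for x in row)]


# Toy adjacency matrix over |P| = 3 projects (illustrative values only).
A_toy = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0]]
R_t = correlation_representation([1, 0, 0], A_toy)
print(R_t)  # [[1.0, 0.0, 0.0]]
```

Here only project 0 was performed, so the single surviving row corresponds to project 1, the only project linked to it through row 0 of the toy matrix.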

3.4. Multi-Source Data Deep Fusion Network

The mileage and base information of a vehicle play a pivotal role in the formulation of maintenance projects. Based on maintenance project correlation representation, in order to thoroughly take advantage of their impact on this endeavor, we propose a multi-source data deep fusion network that combines maintenance project correlation representation result, mileage, and base information.

3.4.1. Mileage Fusion Representation

Mileage is an important facet of vehicle condition. In general, higher mileage corresponds to an increase in the type and frequency of maintenance projects required.

In the evaluation of vehicle maintenance tasks, maintenance personnel must first grasp the present state of the vehicle, including its historical maintenance projects and mileage. Drawing on this information, maintenance staff can make initial inferences about the maintenance projects the vehicle currently needs. Yet, because different vehicles exhibit distinct driving patterns, the impact of mileage on maintenance projects varies across vehicles. To tackle this challenge, we propose a mileage-aware key-query attention mechanism that discerns pivotal mileage thresholds in the development of vehicle maintenance projects. Herein, the mapped mileage of each maintenance episode serves as the query vector, while the correlation representation result of that episode forms the key and value vectors, consistent with Equation (4). Notably, the raw mileage $\{M_{t}\}_{t = 1, 2, \dots, T}$ and the correlation representation results $\{R_{t}\}_{t = 1, 2, \dots, T}$ are not in the same latent space, so $\{M_{t}\}_{t = 1, 2, \dots, T}$ must be mapped to the same latent space as $\{R_{t}\}_{t = 1, 2, \dots, T}$ to obtain the mapping result $\{N_{t}\}_{t = 1, 2, \dots, T}$. This is achieved by $F_{2}(M_{t})$ [25], as illustrated in Equation (3).

F_{2}(M_{t}) = W_{2} \left[ 1 - \tanh\left( \left( W_{1} \frac{M_{t}}{180} + b_{1} \right)^{2} \right) + b_{2} \right]

(3)

where $W_{1} \in \mathbb{R}^{|P|}$, $b_{1} \in \mathbb{R}^{|P|}$, $W_{2} \in \mathbb{R}^{|P| \times |P|}$, and $b_{2} \in \mathbb{R}^{|P|}$ are parameters. Once we have obtained the latent-space result $\{N_{t}\}_{t = 1, 2, \dots, T}$ that aligns with $\{R_{t}\}_{t = 1, 2, \dots, T}$, we feed $N_{t}$ as the query vector and $R_{t}$ as the key and value vectors into the attention mechanism, one maintenance record at a time. The specific implementation is illustrated in Equation (4).

O_{t} = Atten (N_{t}, R_{t}, R_{t})

(4)
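The mileage mapping $F_{2}$ of Equation (3), which produces the query $N_{t}$ used in Equation (4), can be sketched as follows. The function name `mileage_embedding` and all parameter values are assumptions for illustration; the bias $b_{2}$ is kept inside the bracket, following the equation as printed.

```python
import math


def mileage_embedding(m_t, W1, b1, W2, b2):
    """Sketch of F_2(M_t): map a scalar mileage to a vector of length |P|."""
    # Element-wise term inside the bracket of Equation (3):
    # 1 - tanh((W1_i * M_t / 180 + b1_i)^2) + b2_i
    inner = [1.0 - math.tanh((w * m_t / 180.0 + b) ** 2) + c
             for w, b, c in zip(W1, b1, b2)]
    # Multiply by the |P| x |P| matrix W2.
    return [sum(w * x for w, x in zip(row, inner)) for row in W2]


# Hypothetical parameters for |P| = 2; a trained model would learn these.
identity = [[1.0, 0.0], [0.0, 1.0]]
n_zero = mileage_embedding(9000.0, [0.0, 0.0], [0.0, 0.0], identity, [0.0, 0.0])
print(n_zero)  # [1.0, 1.0]

n_t = mileage_embedding(9000.0, [0.01, 0.02], [0.0, 0.0], identity, [0.0, 0.0])
print(n_t)
```

With zero biases and an identity $W_{2}$, every component lies in (0, 1] and shrinks as the scaled mileage grows, which is the saturating behaviour the tanh term provides.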

The attention mechanism [12] used in Equation (4) is defined as shown in Equation (5):

Atten (Q, K, V) = softmax (\frac{Q W_{q} {(K W_{k})}^{T}}{\sqrt{d}}) V W_{v}

(5)

Attention weights are defined as shown in Equation (6):

Atten-weight (Q, K, V) = softmax (\frac{Q W_{q} {(K W_{k})}^{T}}{\sqrt{d}})

(6)

where d is the dimension of attention and $W_{q}, W_{k} \in \mathbb{R}^{|P| \times d}$ and $W_{v} \in \mathbb{R}^{|P| \times |P|}$ are the attention weight matrices. The attention mechanism is applied to each maintenance record and its mileage in turn, and the outputs are merged to obtain the historical mileage-fused attention representation of the vehicle, $E = [O_{1}; O_{2}; \dots; O_{T}]$.
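Equations (5) and (6) describe standard scaled dot-product attention with learned projections. The sketch below implements them with plain Python lists; the helper names (`matmul`, `softmax`, `attention`) and the identity projection matrices are assumptions chosen so that the example is easy to follow by hand.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, Wq, Wk, Wv):
    """Return (output, weights) following Equations (5) and (6)."""
    d = len(Wq[0])
    q, k = matmul(Q, Wq), matmul(K, Wk)
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, matmul(V, Wv)), weights


# Toy case with |P| = d = 2 and identity projections (hypothetical values).
I2 = [[1.0, 0.0], [0.0, 1.0]]
N_t = [[1.0, 0.0]]              # query: mapped mileage
R_t = [[1.0, 0.0], [0.0, 1.0]]  # keys and values: correlation representation
O_t, w = attention(N_t, R_t, R_t, I2, I2, I2)
print(O_t, w)
```

In this toy case the query aligns with the first row of $R_{t}$, so that row receives the larger attention weight and dominates $O_{t}$.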

3.4.2. Representation of Data Fusion from Multiple Sources

The base information of the vehicle includes location, manufacturer, type, engine type, engine displacement, model year, production year, and assembly plant. These factors significantly influence the development of the vehicle’s maintenance project. Consequently, it is imperative to incorporate the aforementioned vehicle base information into the vehicle maintenance project prediction model. This integration effectively enhances the accuracy of the prediction process. The specific implementation is outlined as follows.

To begin with, the base information $B = \{B_{1}, B_{2}, \dots, B_{n}\}$ ($n = 8$) is transformed using the representation functions $Em_{1}(B_{i})$ and $Em_{2}(B_{j})$ to acquire $S = \{S_{1}, S_{2}, \dots, S_{n}\}$ ($n = 8$). The representation function and corresponding output for each type of base information are listed in Table 2.

Let us consider a variable X comprising n categories, denoted as $X_{i}$ for the i-th category. Through $Em_{1}$, each category is converted into a binary vector of length n, where only one element is assigned the value of 1 while the remainder are set to 0. Precisely, the representation result for the i-th category is outlined in Equation (7):

{Em}_{1} (X_{i}) = [0, 0, \dots, 1, \dots, 0]

(7)

where the length of the vector is n, the i-th element is 1, and the rest of the elements are 0.

$Em_{2}$ is defined as in Equation (8):

{Em}_{2} (B_{i}) = W_{e} {(E m_{1} (B_{i}))}^{T}

(8)

$Em_{2}$ maps discrete features onto a more meaningful lower-dimensional space based on $Em_{1}$, where $W_{e} \in \mathbb{R}^{d \times n}$ is a parameter matrix and d denotes the dimension of the low-dimensional space. Using $Em_{1}$ and $Em_{2}$, the representation results of all base information are concatenated in order to obtain $U = [S_{1}, S_{2}, \dots, S_{8}]$.
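The two representation functions of Equations (7) and (8) can be sketched as below. The names `em1` and `em2`, the 0-based category index, and the values in `W_e` are assumptions for illustration only.

```python
def em1(index, n):
    """One-hot encoding (Equation (7)); `index` is 0-based in this sketch."""
    vec = [0] * n
    vec[index] = 1
    return vec


def em2(index, W_e):
    """Dense embedding (Equation (8)): W_e times the one-hot column vector."""
    one_hot = em1(index, len(W_e[0]))
    return [sum(w * x for w, x in zip(row, one_hot)) for row in W_e]


# Hypothetical parameter matrix W_e with d = 2 and n = 3 categories.
W_e = [[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6]]
print(em1(1, 3))    # [0, 1, 0]
print(em2(1, W_e))  # [0.2, 0.5]
```

Multiplying by a one-hot vector simply selects one column of $W_{e}$, which is why such embeddings are usually implemented as a table lookup.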

For the categorical data in the model, we employ one-hot-based encoding and multi-hot vector encoding. For numerical data, we apply an embedding function, $F_{2}(M_{t})$, to map it to the same vector space as the categorical data. Utilizing different encoding methods enables the model to consider a wide range of factors, thus enhancing its ability to accurately predict upcoming maintenance projects by utilizing various vehicle characteristics and history.

To incorporate the vehicle base information into the maintenance project prediction task, we set U as the query vector and E as both the key vector and value vector. These vectors are then fed into the attention mechanism of Equation (5), yielding the representation result L, which fuses the vehicle’s base information U with the historical mileage-fused attention representation E. Subsequently, L and U are concatenated to generate the representation result $H = [L, U]$ for the vehicle. The final maintenance project prediction result $\hat{y}$ is obtained from the vehicle representation result H, as shown in Equation (9):

\hat{y} = \mathrm{Sigmoid}(W_{m} \, \mathrm{dropout}(H) + b)

(9)

where $W_{m} \in \mathbb{R}^{|P| \times d_{H}}$ and $b \in \mathbb{R}^{|P|}$ are parameters, and $d_{H}$ is the dimension of H. To reduce the risk of overfitting, dropout is applied to H before the prediction layer.
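The prediction layer of Equation (9) can be sketched at inference time as follows. The function name `predict` and the parameter values are assumptions; the sketch also assumes the common convention that dropout acts as the identity at evaluation time, so it is omitted here.

```python
import math


def predict(H, W_m, b):
    """Sketch of y_hat = Sigmoid(W_m H + b), with dropout omitted (evaluation mode)."""
    logits = [sum(w * h for w, h in zip(row, H)) + b_i
              for row, b_i in zip(W_m, b)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Hypothetical vehicle representation with d_H = 2 and |P| = 2 output projects.
H = [1.0, -1.0]
W_m = [[0.0, 0.0],
       [1.0, -1.0]]
b = [0.0, 0.0]
y_hat = predict(H, W_m, b)
print(y_hat)
```

Each output is an independent probability for one maintenance project, which suits the multi-label setting: several projects can be predicted for the same visit.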

3.5. Model Optimization

We train the MsDFN model to predict the projects of the last maintenance for each vehicle, using binary cross-entropy as the global objective function, as shown in Equation (10):

\mathrm{loss} = - \sum_{i = 1}^{|P|} \left( y_{i}^{T} \log(\hat{y}_{i}) + (1 - y_{i})^{T} \log(1 - \hat{y}_{i}) \right)

(10)

where $\hat{y}_{i}$ represents the predicted result for the maintenance project $p_{i}$ and $y_{i}$ represents the true label of the maintenance project $p_{i}$.
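For a single vehicle, the binary cross-entropy of Equation (10) can be sketched as follows. The function name `bce_loss` is an assumption, and the clamping constant `eps` is a numerical safeguard added for this example rather than part of the paper's formulation.

```python
import math


def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy summed over the |P| projects of one sample."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0); added safeguard
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


# Toy labels and predictions for |P| = 2 (illustrative values).
uninformed = bce_loss([1, 0], [0.5, 0.5])
confident = bce_loss([1, 0], [0.9, 0.1])
print(uninformed, confident)
```

The uninformed prediction of 0.5 for both projects costs $2 \ln 2$, while the confident and correct prediction yields a smaller loss.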

4. Experiment Result and Analysis

4.1. Dataset Description

To assess the performance of our proposed method, we used real vehicle maintenance and base information data sourced from 73 vehicle maintenance companies. During the preprocessing phase, we standardized the data formats by aligning field names, maintenance project names, and data types across different companies to ensure consistency. We then cleaned the data through deduplication and outlier detection to enhance data accuracy. Subsequently, we screened the data to retain records of vehicles that had undergone two or more repairs and for which complete information for each repair and the corresponding base information was available. For the data merging process, we employed a unified vehicle identifier to integrate datasets from different companies, thus maintaining comprehensive records for each vehicle. As a result of these preprocessing and merging steps, the dataset comprises records of 26,831 vehicles that underwent maintenance between April 2011 and April 2023. The detailed dataset statistics are presented in Table 3, and the distributions of the vehicle maintenance records are depicted in Figure 3.

For the experiments, we randomly partition the dataset into training, validation, and testing sets of 18,000, 3831, and 5000 vehicles, respectively. We designate the projects of the last maintenance as the label, while utilizing the remaining maintenance projects, mileage, and vehicle base information as input features. The global project co-occurrence graph G is constructed from the maintenance projects within the training set.
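The partitioning and label construction described above can be sketched as follows. The helper names `split_vehicles` and `make_sample`, the fixed seed, and the toy records are assumptions; the sketch also assumes that the mileage of the final visit is not used as an input, which the text does not state explicitly.

```python
import random


def split_vehicles(vehicle_ids, n_train, n_val, seed=0):
    """Randomly partition vehicle identifiers into train, validation, and test lists."""
    ids = list(vehicle_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def make_sample(records):
    """Split one vehicle's chronological records into (history, label).

    Each record is a (multi_hot_vector, mileage) pair; the projects of the
    last record become the label (assumption: its mileage is discarded).
    """
    history, (label, _last_mileage) = records[:-1], records[-1]
    return history, label


train, val, test = split_vehicles(range(26831), 18000, 3831)
print(len(train), len(val), len(test))  # 18000 3831 5000

toy_records = [([1, 0, 1], 12000), ([0, 1, 0], 20500), ([1, 1, 0], 31000)]
history, label = make_sample(toy_records)
print(len(history), label)  # 2 [1, 1, 0]
```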

4.2. Baseline Models and Evaluation Metrics

The main task of the experiment is to predict the projects of the $(T + 1)$-th maintenance based on the vehicle’s first T maintenance projects, mileage, and vehicle base information, which is a multi-label classification problem. For this task, the evaluation metrics are the weighted F₁ score (w-F₁) [21] and R@k [21]: w-F₁ calculates the F₁ score for each project code and reports their weighted mean; R@k is the average ratio of desired project codes in the top k predictions to the total number of desired project codes in each maintenance, which measures prediction accuracy. To compare our proposed method with state-of-the-art models, we choose the following methods as baselines.

Traditional machine learning method: MLP [7].
Traditional deep learning methods: CNN [9], RNN [10], and LSTM [11].
Models based on RNN and attention: RETAIN [16] and Dipole [17].
Model based on dynamic graph and context-aware: Chet [21].
Typical methods for vehicle maintenance predicting: SLFN [26], DBN [27], and EFMSAE-LSTM [28].
Typical deep learning methods for data fusion: MIFDELN [29], MFDL [30], and IKN-ConvLSTM [31].

In the experiments, MLP denotes the variant in which only historical maintenance projects are input into the model for prediction, whereas MLP₊ denotes the variant in which historical maintenance projects, mileage, and vehicle base information are fused for prediction; the same convention applies to the other baselines.
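The R@k metric defined in this subsection can be sketched as follows. The function name `recall_at_k` and the toy scores are assumptions; maintenance records without any true project are skipped, which is a choice made for this example.

```python
def recall_at_k(y_true, y_score, k):
    """Average, over maintenance records, of |top-k predictions ∩ truth| / |truth|."""
    ratios = []
    for truth, scores in zip(y_true, y_score):
        desired = {i for i, y in enumerate(truth) if y == 1}
        if not desired:
            continue  # no true project in this record; skipped in this sketch
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        top_k = ranked[:k]
        ratios.append(len(desired.intersection(top_k)) / len(desired))
    return sum(ratios) / len(ratios) if ratios else 0.0


# Two toy records over |P| = 4 projects (illustrative values).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_score = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.7, 0.2, 0.3]]
print(recall_at_k(y_true, y_score, 2))  # 0.75
```

In the first record only one of the two true projects appears in the top two predictions (ratio 0.5), while in the second the single true project is recovered (ratio 1.0), giving an average of 0.75.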

4.3. Realization Details

To ensure unbiased results, we randomly initialize the model parameters in our experimental setup. The hyperparameters are tuned on the validation set. Specifically, we set the threshold $\delta$ to 0.07, the dropout rate to 0.45, and the batch size to 32 across all experiments. Our model is trained for 100 epochs using the Adam optimizer with an initial learning rate of $1 \times 10^{-3}$. We incorporate learning rate decay using a multi-step scheduler; the learning rate is adjusted to $1 \times 10^{-3}$ and $1 \times 10^{-5}$ at epochs 5 and 15, respectively. We implemented the entire experiment using Python 3.7.0 and PyTorch 1.10.0, with CUDA 11.4 utilized on a device equipped with 64 GB of memory and an NVIDIA-SMI 472.39 GPU. To ensure the reliability of our findings, we repeated the experiment five times using distinct random seeds.

4.4. Prediction Performance

Table 4 presents the experimental results. Since the average number of project codes per maintenance is 5.5, we set k = 3, 5, and 7 for R@k. Our proposed model surpasses all baseline models. Compared with the top-performing baseline, RETAIN₊, MsDFN improves R@3, R@5, and R@7 by 1.11%, 1.14%, and 1.21%, respectively. This supports the effectiveness of the correlation representation of maintenance projects and of multi-source data deep fusion. Despite the interpretability associated with the original traditional machine learning and deep learning models, their efficacy is limited because they model only maintenance history data and do not learn from mileage and base information. Moreover, the two sets of results for each baseline model confirm that considering only the development process of maintenance projects is inadequate, thus underscoring the significance of integrating vehicle mileage and base information.

Compared to typical deep fusion models, MsDFN exhibits superior performance. Specifically, for w-F₁, MsDFN achieves a score of 34.62%, significantly surpassing IKN-ConvLSTM’s 33.30%, MFDL’s 33.35%, and MIFDELN’s 33.51%. Similarly, for R@3, MsDFN attains a higher score of 40.39%, outperforming scores of 39.15%, 39.20%, and 39.24% from IKN-ConvLSTM, MFDL, and MIFDELN, respectively. This trend continues in R@5 and R@7, with MsDFN achieving 47.29% and 52.18%, respectively, consistently outperforming the results of the other three models. This demonstrates MsDFN’s capability in processing comprehensive vehicle maintenance and basic information, thus affirming its superior performance in predicting maintenance projects.

4.5. Performance Assessment of Data Sufficiency

To evaluate the impact of data sufficiency on prediction accuracy, we fixed the size of the validation set at 3831 entries; varied the size of the training set among 12,000, 14,000, 16,000, and 18,000; and used the remaining data as the test set. As Figure 4 shows, MsDFN outperforms the baseline models even with limited training data.

4.6. Ablation Study

To analyze the effectiveness of each module in our proposed approach, we compared eleven ablation variants of the model, each with distinct settings. These variants are as follows:

MsDFN-B₁: This model aims to underscore the importance of location in predicting vehicle maintenance projects by removing the input of location.
MsDFN-B₂: The elimination of the manufacturer input in this model enables us to assess the significance of the manufacturer in predicting vehicle maintenance projects.
MsDFN-B₃: By excluding the input of vehicle type, this model allows us to evaluate the contribution of vehicle type to the prediction of maintenance projects.
MsDFN-B₄: The elimination of engine type as an input in this model enables us to ascertain the impact of engine type on predicting maintenance projects.
MsDFN-B₅: This model investigates the significance of engine displacement in predicting vehicle maintenance projects by removing the input of engine displacement.
MsDFN-B₆: The exclusion of the model year input in this model allows us to evaluate the importance of the vehicle model year in predicting maintenance projects.
MsDFN-B₇: By eliminating the input of the year of vehicle production, this model enables us to analyze the impact of production year on predicting maintenance projects.
MsDFN-B₈: This model investigates the contribution of vehicle assembly plants to the prediction of maintenance projects by removing the input of vehicle assembly plants.
MsDFN-B: The exclusion of all base information in this model allows for the assessment of its significance in predicting vehicle maintenance projects.
MsDFN-M: The removal of the mileage fusion process in this model enables the evaluation of the importance of mileage in predicting vehicle maintenance projects.
MsDFN-Co: This model assesses the significance of project correlation representation in predicting vehicle maintenance projects by eliminating the process of project correlation representation.
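The variants listed above can be summarized as a set of configuration switches. The sketch below is only an illustration of that bookkeeping: the feature names, the dictionary keys (`base_features`, `use_mileage`, `use_correlation`), and the function `build_variants` are hypothetical and do not come from the authors' code. The order of the base features follows Table 2.

```python
# Base information inputs B1..B8, in the order given in Table 2.
BASE_FEATURES = (
    "location",
    "manufacturer",
    "vehicle_type",
    "engine_type",
    "engine_displacement",
    "model_year",
    "production_year",
    "assembly_plant",
)


def build_variants() -> dict:
    """Map each variant name to the inputs and modules it keeps."""
    full = {
        "base_features": frozenset(BASE_FEATURES),
        "use_mileage": True,
        "use_correlation": True,
    }
    variants = {"MsDFN": full}
    # Feature-level ablations: drop exactly one base information input.
    for index, feature in enumerate(BASE_FEATURES, start=1):
        kept = frozenset(BASE_FEATURES) - {feature}
        variants[f"MsDFN-B{index}"] = {**full, "base_features": kept}
    # Module-level ablations: drop all base information, mileage fusion,
    # or the project correlation representation.
    variants["MsDFN-B"] = {**full, "base_features": frozenset()}
    variants["MsDFN-M"] = {**full, "use_mileage": False}
    variants["MsDFN-Co"] = {**full, "use_correlation": False}
    return variants


variants = build_variants()
print(len(variants) - 1)  # ablation variants, excluding the full model
```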

The results of the experiments for each variant model are presented in Table 5, indicating that the performance of each variant model of MsDFN is inferior to that of the original MsDFN, thereby validating the efficacy of each component within the MsDFN model.

The variant models MsDFN-B₁, MsDFN-B₂, …, MsDFN-B₈ show that including vehicle base information enhances the accuracy of vehicle maintenance project predictions, underscoring the necessity of integrating multiple data sources into the model.

MsDFN-B demonstrates the impact of the lack of base information on the experimental results at a holistic level, again showing that the development of vehicle maintenance projects is influenced by vehicle base information.

Remarkably, the MsDFN-M variant reveals the substantive impact of vehicle maintenance mileage on the formulation of maintenance projects, affirming a robust association between the advancement of vehicle maintenance projects and vehicle mileage.

Additionally, the MsDFN-Co variant demonstrates that the adequate representation of maintenance project correlations significantly enhances the precision of vehicle maintenance project predictions, highlighting the interplay between different projects in the development of maintenance projects and the imperative of incorporating project correlation representation within the model.

4.7. Prediction Analysis of New Maintenance Projects

In this study, a new maintenance project is defined as one that has not previously been executed on the vehicle. Within the realm of vehicle maintenance, there exists a heightened interest in predicting such previously unencountered projects. By leveraging the underlying assumption that distinct maintenance projects correlate, our proposed model is expected to exhibit enhanced accuracy in predicting new maintenance projects.
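The definition of a new maintenance project can be made concrete with a small set operation. This is a minimal sketch, assuming each maintenance visit is represented as a set of project codes; the function name `new_projects` and the example history are hypothetical, while the project codes are those listed in Table 7.

```python
def new_projects(history: list, next_visit: set) -> set:
    """Return the project codes in `next_visit` that never appear in `history`."""
    seen = set().union(*history)  # all codes previously executed on the vehicle
    return set(next_visit) - seen


# Hypothetical vehicle: two past visits, then a visit to be predicted.
history = [{2000, 2003}, {2000, 2010}]
upcoming = {2000, 2056}
print(new_projects(history, upcoming))
```

Here the oil change (2000) has been seen before, so only the fuel filter replacement (2056) counts as new for this vehicle.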

Table 6 displays the experimental results detailing each model’s predictive capabilities regarding new maintenance projects. Notably, our proposed model surpasses all baseline models, attesting to its exceptional proficiency in predicting these previously unencountered maintenance projects. These results validate our model’s effectiveness in predicting new maintenance projects.

4.8. Parametric Sensitivity Analysis

To explore the influence of variable dimensionality on the performance of MsDFN, we scrutinized the impact of E and U dimensions on the experimental results. Figure 5 shows the performance of the proposed MsDFN model across various hyperparameter combinations. Notably, we ascertained that an optimal balance between efficiency and performance is achieved by setting the E and U dimensions at approximately 75. Furthermore, we sought to evaluate the significance of the co-occurrence matrix threshold δ in Equation (1) on the experimental results, examining thresholds including 0.04, 0.05, 0.06, 0.07, 0.08, and 0.09. Remarkably, as illustrated in Figure 6, the overall performance remains relatively stable, with the most favorable results attained at a threshold of 0.07.
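Equation (1) is defined in Section 3.3 and is not reproduced here, so the following is only a sketch of how a threshold δ might prune a co-occurrence matrix. It assumes that each entry is the share of visits containing project i that also contain project j, and that entries below δ are dropped. The function names, this normalization, and the toy visits are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from itertools import permutations


def cooccurrence(visits):
    """Count single codes and ordered pairs of distinct codes sharing a visit."""
    pair_counts, code_counts = Counter(), Counter()
    for visit in visits:
        codes = set(visit)
        code_counts.update(codes)
        pair_counts.update(permutations(sorted(codes), 2))
    return pair_counts, code_counts


def thresholded_matrix(visits, delta):
    """Keep only (i, j) entries whose assumed normalized frequency is >= delta."""
    pair_counts, code_counts = cooccurrence(visits)
    matrix = {}
    for (i, j), n in pair_counts.items():
        freq = n / code_counts[i]  # assumed: share of i's visits that include j
        if freq >= delta:
            matrix[(i, j)] = freq
    return matrix


# Toy visits using project codes from Table 7.
visits = [[2000, 2003, 2010], [2000, 2003], [2000, 2056], [2000]]
pruned = thresholded_matrix(visits, delta=0.3)  # toy threshold, not the paper's
print(sorted(pruned))
```

The toy threshold of 0.3 is chosen only because the example has four visits. With thousands of project codes, as in Table 3, normalized frequencies are much smaller, which is consistent with the paper exploring thresholds between 0.04 and 0.09.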

5. Case Study

To visually unveil the interrelationship between maintenance projects and the influence of mileage on these projects, we selectively sampled four sets of three maintenance projects from a single maintenance record. The maintenance projects for each group are illustrated in Table 7. We computed the frequency of each group’s maintenance projects within distinct mileage ranges in the training dataset and calculated the average attention weights in Equation (6) for each group’s maintenance projects across various mileage ranges in the validation dataset.

As depicted in Figure 7, a robust correlation emerges between the frequency and weight of the same project, signifying the efficacy of integrating mileage into the predictive process of maintenance projects. Moreover, mileage serves as an influencing factor on project weight, as deduced from the weight–frequency relationship. Furthermore, the weights of different projects exhibit a strong correlation, underscoring the effectiveness of the maintenance project correlation representation in capturing inter-project correlations. In summary, our case study showcases the comprehensive consideration of maintenance project correlations and the impact of mileage, affirming the capability of our model to effectively address these factors.
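The frequency and weight comparison behind Figure 7 can be sketched as a per-bin count followed by a correlation coefficient. The example below assumes half-open mileage bins and uses Pearson correlation as one possible measure; the bin edges, the records, and the attention weights are hypothetical placeholders, and the actual weights would come from Equation (6), which is not reproduced here.

```python
from math import sqrt


def frequency_by_mileage(records, code, bin_edges):
    """Count visits containing `code` in each half-open mileage bin [lo, hi)."""
    counts = [0] * (len(bin_edges) - 1)
    for mileage, codes in records:
        if code not in codes:
            continue
        for b in range(len(counts)):
            if bin_edges[b] <= mileage < bin_edges[b + 1]:
                counts[b] += 1
                break
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical (mileage, project codes) records and mileage bins.
records = [
    (4_000, {2000, 2003}),
    (9_000, {2000}),
    (12_000, {2000, 2010}),
    (18_000, {2003}),
    (26_000, {2000}),
]
bin_edges = [0, 10_000, 20_000, 30_000]
freq = frequency_by_mileage(records, 2000, bin_edges)

# Hypothetical average attention weights for the same bins.
weights = [0.6, 0.3, 0.3]
print(freq, pearson(freq, weights))
```

A coefficient close to 1 would correspond to the strong frequency and weight agreement described for Figure 7.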

6. Conclusions

In this paper, we introduce a new deep fusion network that offers a comprehensive approach to predicting global maintenance projects. To enhance our understanding of the relationships between various maintenance projects, we introduce a correlation representation framework utilizing the maintenance project co-occurrence matrix. Building upon this correlation learning, we propose a deep fusion network that integrates the attention mechanism to synthesize vehicle mileage and vehicle base information. By conducting extensive experiments with actual vehicle data, we demonstrate the effectiveness and robustness of our model.

In summary, our model accounts for the interplay between vehicle maintenance projects, mileage, and vehicle base information. Furthermore, the strong interrelation between vehicle maintenance projects, vehicle breakdowns, and vehicle parts enables the seamless application of our model to forecast breakdowns and parts requirements. However, the correlation calculation is based solely on the co-occurrence frequency between maintenance projects, overlooking the correlation embedded in the textual information of maintenance projects. In addition, the co-occurrence can suggest correlation but does not necessarily imply causality or a meaningful relationship for prediction. In future studies, we intend to delve deeper into the correlations of maintenance projects and consider integrating explicit vehicle maintenance technology information into the prediction task. This approach is expected to enhance the interpretability of our model and promote further advancements in the field.

Author Contributions

F.C. contributed to the data analysis, algorithm construction, and writing and editing of the manuscript. D.S., G.Z. and G.W. reviewed and edited the manuscript. K.Y. proposed the idea, contributed to the data acquisition, performed supervision and project administration, and reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Conflicts of Interest

On behalf of all authors, the corresponding author states that there are no conflicts of interest.

Abbreviations

Abbreviation	Full Term
MsDFN	Multi-source Data Deep Fusion Network
CNN	Convolutional Neural Network
RNN	Recurrent Neural Network
LSTM	Long Short-Term Memory
GNN	Graph Neural Network
TCN	Temporal Convolutional Network
ARIMA	Autoregressive Integrated Moving Average Model
SVM	Support Vector Machine
SVR	Support Vector Regression
MLP	Multi-Layer Perceptron
HMMs	Hidden Markov Models

References

1. Brown, R.G. Exponential Smoothing for Prediction and Control. J. R. Stat. Soc. Ser. (Methodol.) 1956, 18, 296–301.
2. Brown, G.W. Exponential Smoothing for Predicting Demand. J. Oper. Res. Soc. Am. 1963, 11, 67–91.
3. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175.
4. Edgeworth, F.Y. Method of Least Squares. Philos. Trans. R. Soc. London. Ser. A 1885, 183, 1–21.
5. Vapnik, V.N.; Chervonenkis, A.Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Its Appl. 1963, 8, 264–280.
6. Hong, W.C.; Dong, Y.; Chen, L.Y.; Wei, S.Y. SVR with hybrid chaotic genetic algorithms for tourism demand forecasting. Appl. Soft Comput. 2011, 11, 1881–1890.
7. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
8. Zahari, A.; Jaafar, J. A novel approach of hidden Markov model for time series forecasting. In Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication, Bali, Indonesia, 8–10 January 2015; pp. 1–5.
9. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
10. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211.
11. Hossain, M.; Smahmood, H. Short term load forecasting using an LSTM neural network. In Proceedings of the 2020 IEEE Power and Energy Conference at Illinois (PECI), Champaign, IL, USA, 27–28 February 2020; pp. 1–6.
12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
13. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the Dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2020; pp. 753–763.
14. Chen, Y.; Kang, Y.; Chen, Y.; Wang, Z. Probabilistic forecasting with temporal convolutional neural network. Neurocomputing 2020, 396, 321–330.
15. Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. SCINet: Timeseries modeling and Forecasting with sample convolution and interaction. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 5816–5828.
16. Choi, E.; Bahadori, M.T.; Kulas, J.A.; Schuetz, A.; Stewart, W.F.; Sun, J. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3504–3512.
17. Ma, F.; Chitta, R.; Zhou, J.; You, Q.; Sun, T.; Gao, J. Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1903–1911.
18. Bai, T.; Zhang, S.; Egleston, B.L.; Vucetic, S. Interpretable representation learning for healthcare via capturing disease progression through time. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 19–23 August 2018; pp. 43–51.
19. Rangapuram, S.; Seeger, S.; Gasthaus, J. Deep state space models for time series forecasting. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 7796–7805.
20. Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 5244–5254.
21. Lu, C.; Han, T.; Ning, Y. Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 2982–2989.
22. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. Graph WaveNet for deep spatial-temporal graph modeling. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 1907–1913.
23. Xu, M.; Zhu, Z.; Li, Y.; Zheng, S.; Li, L.; Wu, H.; Zhao, Y. Cooperative dual medical ontology representation learning for clinical assisted decision-making. Comput. Biol. Med. 2023, 163, 107138.
24. Li, Y.; Zhu, Z.; Guo, X.; Li, S.; Yang, Y.; Zhao, Y. HGV4Risk: Hierarchical Global View-guided Sequence Representation Learning for Risk Prediction. ACM Trans. Knowl. Discov. Data (Early Access) 2023, 18, 1–21.
25. Luo, J.; Ye, M.; Xiao, C.; Ma, F. HiTANet: Hierarchical time-aware attention networks for risk prediction on electronic health records. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2020; pp. 647–656.
26. Tessaro, I.; Mariani, V.C.; Coelho, L.S. Machine learning models applied to predictive maintenance in automotive engine components. Proceedings 2020, 64, 26.
27. Xu, P.; Wei, G.; Song, K.; Chen, Y. High-accuracy health prediction of sensor systems using improved relevant vector-machine ensemble regression. In Knowledge-Based Systems; Elsevier: Amsterdam, The Netherlands, 2021; Volume 212, p. 106555.
28. Guo, J.; Lao, Z.; Hou, M.; Li, C.; Zhang, S. Mechanical fault time series prediction by using EFMSAE-LSTM neural network. Measurement 2021, 173, 108566.
29. Ye, M.; Yan, X.; Jiang, D.; Xiang, L.; Chen, N. MIFDELN: A multi-sensor information fusion deep ensemble learning network for diagnosing bearing faults in noisy scenarios. Knowl.-Based Syst. 2024, 284, 111294.
30. Jiang, L.; Wang, X.; Li, W.; Wang, L.; Yin, X.; Jia, L. Hybrid multitask multi-information fusion deep learning for household short-term load forecasting. IEEE Trans. Smart Grid 2021, 12, 5362–5372.
31. Nti, I.K.; Adekoya, A.F.; Weyori, B.A. A novel multi-source information-fusion predictive framework based on deep neural networks for accuracy enhancement in stock market prediction. J. Big Data 2021, 8, 17.

Figure 1. The number of common maintenance projects with different annual mileage.

Figure 2. An overview of the MsDFN model. It is mainly composed of two modules: (1) Maintenance Project Correlation Representation; (2) Multi-source Data Deep Fusion Network.

Figure 3. Distributions of the vehicle maintenance records. (a) Distribution of vehicles by number of maintenance visits. (b) Distribution of data by number of maintenance projects.

Figure 4. Prediction accuracy of MsDFN and the baselines with varying amounts of training data.

Figure 5. The performance under different dimensions of E and U combinations.

Figure 6. Parameter sensitivity analysis of δ.

Figure 7. Correlation of frequency and attention weights for each group of projects. (a) Correlation of frequency and attention weights for Group I; (b) Correlation of frequency and attention weights for Group II; (c) Correlation of frequency and attention weights for Group III; (d) Correlation of frequency and attention weights for Group IV.

Table 1. Notations and descriptions.

Notation	Description
$B = \{B_{1}, B_{2}, \dots, B_{n}\}$	Base information about each vehicle
$P = \{p_{1}, p_{2}, \dots, p_{|P|}\}$	Set of all maintenance project codes
$\{V_{t}\}_{t = 1, 2, \dots, T}$	Set of all maintenance projects for a vehicle
$V_{t} \in \{0, 1\}^{|P|}$	The t-th maintenance projects for a vehicle
$F_{1}(V_{t}, A)$	Correlation representation function of $V_{t}$
$\{M_{t}\}_{t = 1, 2, \dots, T}$	Set of all mileage for a vehicle
$F_{2}(M_{t})$	Representation function of $M_{t}$
$N_{t}$	Representation results of $M_{t}$
$A \in \mathbb{R}^{|P| \times |P|}$	Co-occurrence matrix for all maintenance projects
$R_{t}$	Representation results of $V_{t}$
$O_{t}$	Attention output results for $R_{t}$ and $N_{t}$
$E$	Fusion representation results for $\{O_{t}\}_{t = 1, 2, \dots, T}$
$Em_{1}(B_{i}), Em_{2}(B_{j})$	Representation functions for base information $B_{i}$ and $B_{j}$
$S_{i}$	Representation results of $B_{i}$
$U$	Fusion representation results for $\{S_{i}\}_{i = 1, 2, \dots, n}$
$L$	Attention results for $E$ and $U$
$H$	Results of the fusion representation of vehicle
$V_{T + 1}$	Maintenance projects for the $(T + 1)$-th time

Table 2. Base information and representation.

Base Information	Notation	Representation Function	Representation Result
Location	$B_{1}$	$Em_{2}$	$S_{1}$
Manufacturer	$B_{2}$	$Em_{1}$	$S_{2}$
Type	$B_{3}$	$Em_{1}$	$S_{3}$
Engine Type	$B_{4}$	$Em_{1}$	$S_{4}$
Engine Displacement	$B_{5}$	$Em_{1}$	$S_{5}$
Model Year	$B_{6}$	$Em_{2}$	$S_{6}$
Production Year	$B_{7}$	$Em_{2}$	$S_{7}$
Assembly Plant	$B_{8}$	$Em_{1}$	$S_{8}$

Table 3. Experimental dataset statistics.

Statistic	Value
Number of Vehicles	26,831
Maximum Number of Maintenance	69
Average Number of Maintenance	3.85
Number of Maintenance Project Codes	3702
Maximum Number of Project Codes at One Maintenance	56
Average Number of Project Codes per Maintenance	5.5
Number of Locations	398
Number of Manufacturers	2
Number of Types	3
Number of Engine Types	12
Number of Engine Displacements	6
Number of Model Years	34
Number of Production Years	22
Number of Assembly Plants	10

Table 4. Maintenance project prediction results using w-F₁(%) and R@k (%).

Model	w-F₁	R@3	R@5	R@7
SLFN [26]	23.56 ± 0.12	30.37 ± 0.11	39.29 ± 0.09	44.89 ± 0.11
SLFN₊ ¹ [26]	23.89 ± 0.11	31.50 ± 0.25	40.49 ± 0.14	45.08 ± 0.23
MLP [7]	23.60 ± 0.07	30.40 ± 0.16	39.32 ± 0.17	44.96 ± 0.17
MLP₊ [7]	23.98 ± 0.21	31.60 ± 0.15	40.54 ± 0.21	45.14 ± 0.33
DBN [27]	23.57 ± 0.21	30.38 ± 0.18	39.30 ± 0.37	44.92 ± 0.18
DBN₊ [27]	23.95 ± 0.21	30.55 ± 0.28	40.50 ± 0.27	44.99 ± 0.21
CNN [9]	29.70 ± 0.20	36.40 ± 0.68	42.22 ± 0.87	46.26 ± 0.78
CNN₊ [9]	32.22 ± 0.28	35.50 ± 0.17	44.75 ± 0.23	48.88 ± 0.22
RNN [10]	31.70 ± 0.16	38.68 ± 0.05	45.30 ± 0.38	50.04 ± 0.55
RNN₊ [10]	32.84 ± 0.11	39.51 ± 0.20	45.92 ± 0.25	50.43 ± 0.30
LSTM [11]	23.62 ± 0.10	31.40 ± 0.16	40.32 ± 0.18	44.60 ± 0.16
LSTM₊ [11]	24.90 ± 0.36	32.07 ± 0.44	41.82 ± 0.11	45.96 ± 0.21
EFMSAE-LSTM [28]	23.68 ± 0.12	31.43 ± 0.26	40.36 ± 0.28	44.63 ± 0.15
EFMSAE-LSTM₊ [28]	24.95 ± 0.23	32.12 ± 0.34	41.90 ± 0.21	45.99 ± 0.24
Dipole [16]	23.89 ± 0.18	31.19 ± 0.15	40.20 ± 0.24	44.77 ± 0.11
Dipole₊ [16]	25.31 ± 0.18	34.70 ± 0.58	41.68 ± 0.59	45.65 ± 0.47
RETAIN [17]	32.57 ± 0.08	38.11 ± 0.73	44.34 ± 0.71	48.87 ± 0.78
RETAIN₊ [17]	33.61 ± 0.21	39.28 ± 0.35	46.15 ± 0.27	50.97 ± 0.22
Chet [21]	30.65 ± 0.11	39.05 ± 0.13	45.10 ± 0.13	49.42 ± 0.18
Chet₊ [21]	33.25 ± 0.18	39.10 ± 0.16	45.99 ± 0.24	50.80 ± 0.27
IKN-ConvLSTM [31]	33.30 ± 0.25	39.15 ± 0.14	46.02 ± 0.25	50.83 ± 0.12
MFDL [30]	33.35 ± 0.28	39.20 ± 0.36	46.09 ± 0.26	50.89 ± 0.22
MIFDELN [29]	33.51 ± 0.28	39.24 ± 0.26	46.12 ± 0.23	50.92 ± 0.26
MsDFN	34.62 ± 0.15	40.39 ± 0.12	47.29 ± 0.16	52.18 ± 0.17

¹ Here, ₊ indicates that historical maintenance projects, mileage, and vehicle base information are fused for prediction.

Table 5. Ablation study of vehicle maintenance project prediction.

Types	Model	w-F₁	R@3	R@5	R@7
Feature Level	MsDFN-B₁	34.34 ± 0.15	40.25 ± 0.12	47.01 ± 0.15	52.06 ± 0.11
Feature Level	MsDFN-B₂	34.15 ± 0.25	40.01 ± 0.18	46.90 ± 0.80	51.75 ± 0.72
Feature Level	MsDFN-B₃	33.89 ± 0.33	40.13 ± 0.31	46.88 ± 0.14	51.72 ± 0.45
Feature Level	MsDFN-B₄	33.97 ± 0.11	40.30 ± 0.24	47.21 ± 0.52	52.15 ± 0.46
Feature Level	MsDFN-B₅	34.03 ± 0.40	40.33 ± 0.34	46.97 ± 0.74	51.78 ± 0.29
Feature Level	MsDFN-B₆	34.09 ± 0.36	40.17 ± 0.33	46.86 ± 0.71	51.77 ± 0.55
Feature Level	MsDFN-B₇	34.10 ± 0.07	40.32 ± 0.37	47.22 ± 0.77	52.14 ± 0.35
Feature Level	MsDFN-B₈	34.12 ± 0.21	40.19 ± 0.36	47.05 ± 0.70	51.86 ± 0.50
Module Level	MsDFN-B	32.24 ± 0.31	40.11 ± 0.27	46.74 ± 0.32	51.20 ± 0.34
Module Level	MsDFN-M	33.82 ± 0.18	40.31 ± 0.25	47.19 ± 0.42	52.06 ± 0.54
Module Level	MsDFN-Co	33.77 ± 0.29	40.35 ± 0.30	47.11 ± 0.78	52.03 ± 0.67
	MsDFN	34.62 ± 0.15	40.39 ± 0.12	47.29 ± 0.16	52.18 ± 0.17

Table 6. New maintenance project prediction results using w-F₁(%) and R@k (%).

Model	w-F₁	R@3	R@5	R@7
SLFN [26]	6.89 ± 0.128	8.38 ± 0.26	13.78 ± 0.66	18.70 ± 0.34
SLFN₊ ¹ [26]	11.32 ± 0.13	11.33 ± 0.34	17.74 ± 0.57	22.68 ± 0.45
MLP [7]	6.99 ± 0.18	8.42 ± 0.16	13.82 ± 0.96	18.79 ± 0.44
MLP₊ [7]	11.36 ± 0.10	11.38 ± 0.44	17.80 ± 0.67	22.71 ± 0.75
DBN [27]	6.94 ± 0.31	8.40 ± 0.26	13.80 ± 0.36	18.75 ± 0.34
DBN₊ [27]	11.33 ± 0.21	11.36 ± 0.23	17.78 ± 0.56	22.69 ± 0.45
CNN [9]	11.00 ± 0.23	11.19 ± 0.70	17.58 ± 0.49	22.25 ± 0.30
CNN₊ [9]	11.80 ± 0.42	11.76 ± 0.56	18.04 ± 0.64	23.26 ± 0.63
RNN [10]	7.24 ± 0.11	8.82 ± 0.55	14.32 ± 0.39	18.35 ± 0.53
RNN₊ [10]	11.23 ± 0.21	11.41 ± 0.21	17.83 ± 0.21	22.76 ± 0.21
LSTM [11]	7.31 ± 0.21	8.83 ± 0.56	14.08 ± 0.85	18.64 ± 0.70
LSTM₊ [11]	10.99 ± 0.23	11.21 ± 0.23	17.48 ± 0.23	22.26 ± 0.23
EFMSAE-LSTM [28]	7.42 ± 0.32	8.89 ± 0.36	14.04 ± 0.65	18.62 ± 0.57
EFMSAE-LSTM₊ [28]	10.96 ± 0.43	11.18 ± 0.21	17.44 ± 0.21	22.23 ± 0.13
Dipole [16]	7.75 ± 0.13	10.76 ± 0.17	15.42 ± 0.40	19.16 ± 0.58
Dipole₊ [16]	13.07 ± 0.29	13.97 ± 0.29	19.55 ± 0.29	23.90 ± 0.29
RETAIN [17]	12.50 ± 0.47	12.25 ± 0.15	17.22 ± 0.11	21.11 ± 0.17
RETAIN₊ [17]	14.72 ± 0.33	15.64 ± 0.33	22.08 ± 0.33	27.04 ± 0.33
Chet [21]	11.19 ± 0.11	14.02 ± 0.13	19.27 ± 0.13	23.93 ± 0.18
Chet₊ [21]	12.73 ± 0.18	14.57 ± 0.12	20.86 ± 0.20	24.88 ± 0.37
IKN-ConvLSTM [31]	12.83 ± 0.21	14.60 ± 0.16	20.92 ± 0.24	24.98 ± 0.39
MFDL [30]	12.85 ± 0.31	14.61 ± 0.13	20.95 ± 0.23	25.02 ± 0.31
MIFDELN [29]	12.88 ± 0.11	14.66 ± 0.15	20.99 ± 0.22	25.08 ± 0.32
MsDFN	15.65 ± 0.17	16.79 ± 0.35	23.27 ± 0.33	28.18 ± 0.27

¹ Here, ₊ indicates that historical maintenance projects, mileage, and vehicle base information are fused for prediction.

Table 7. Maintenance projects for each group.

Group	Maintenance Projects
Group I	Oil Change (Project code: 2000), Oil Filter Replacement (Project code: 2003), and Air Filter Replacement (Project code: 2010)
Group II	Fuel Filter Replacement (Project code: 2056), Oil Change (Project code: 2000), and Air Filter Replacement (Project code: 2010)
Group III	Fuel Filter Replacement (Project code: 2056), Oil Change (Project code: 2000), and Oil Filter Replacement (Project code: 2003)
Group IV	Engine Lubrication System Cleaner Addition (Project code: 2310), Oil Change (Project code: 2000), and Oil Filter Replacement (Project code: 2003)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
