Traffic Prediction with Data Fusion and Machine Learning

Qiu, Juntao; Zhao, Yaping

doi:10.3390/analytics4020012

by Juntao Qiu † and Yaping Zhao *,†

Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Analytics 2025, 4(2), 12; https://doi.org/10.3390/analytics4020012

Submission received: 14 January 2025 / Revised: 27 February 2025 / Accepted: 1 April 2025 / Published: 9 April 2025

Abstract

Traffic prediction is a core task for alleviating urban congestion and optimizing transport systems, yet existing methods are limited in how they integrate multimodal data, which makes it difficult to comprehensively capture the complex spatio-temporal characteristics of the transport system. Although some studies have attempted to introduce multimodal data, they mostly rely on resource-intensive deep neural network architectures, which have difficulty meeting the demands of practical applications. To this end, we propose a traffic prediction framework based on simple machine learning techniques that effectively integrates property features, amenity features, and emotion features (PAE features). Validated with large-scale real datasets, the method demonstrates excellent prediction performance while significantly reducing computational complexity and deployment costs. This study demonstrates the great potential of simple machine learning techniques in multimodal data fusion, provides an efficient and practical solution for traffic prediction, and offers an effective alternative to resource-intensive deep learning methods, opening up new paths for building scalable traffic prediction systems.

Keywords:

traffic prediction; data fusion; machine learning; big data; data mining

1. Introduction

In recent years, with the rapid development of the automotive industry and accelerating urbanization, traffic flow in urban areas has shown a significant growth trend, and traffic congestion has become increasingly prominent. According to an authoritative statistical report from China’s Ministry of Public Security, China registered 4.40 million new energy vehicles in the first half of 2024, up 39.41 percent year-on-year, a record high. By the end of June 2024, the number of motor vehicles in China had climbed to 440 million, and the number of motor vehicle drivers had surpassed 532 million [1]. The continuous increase in private car ownership has put forward higher and more urgent requirements for building an efficient and smooth traffic network. Therefore, traffic prediction has become increasingly important as one of the key technical means to solve urban traffic problems.

By accurately predicting future traffic speeds and flows, traffic management authorities can scientifically optimize signal timings, provide drivers with real-time dynamic route suggestions, and formulate flexible and effective signal coordination and route guidance strategies. These measures not only help to shorten travel time and ease traffic bottlenecks but also reduce environmental pollution to a certain extent and improve the overall operational efficiency of urban transport. In recent years, the vigorous development of intelligent transportation systems (ITSs) and smart city projects has been highly dependent on real-time and accurate traffic data support. This trend has further stimulated research enthusiasm in the field of traffic prediction and driven the continuous innovation and evolution of related theories and technologies [2,3,4,5,6,7,8,9].

In the field of traffic prediction, traditional methods rely on historical data sources, such as data collected by loop detectors or GPS trajectory information, and on classical statistical models such as ARIMA [10,11,12,13,14]. Although these methods provide fundamental insights for traffic prediction, they exhibit significant limitations in terms of data coverage, temporal resolution, and real-time performance. With the increasing popularity of technologies such as smartphones and connected cars, machine learning methods such as Hidden Markov Models (HMMs) [15,16], Fuzzy Neural Networks (FNNs) [17,18], and Support Vector Machines (SVMs) [19,20] have emerged and demonstrated significant advantages in capturing the nonlinear and high-dimensional characteristics of traffic flows. In addition, deep learning methods such as Long Short-Term Memory (LSTM) networks have been applied to traffic prediction tasks, showing superior performance in modeling temporal dependencies and nonlinear relationships in traffic data [21,22]. In terms of spatio-temporal modeling, most existing studies use graph convolutional networks (GCNs) on static graphs or a combination of recurrent neural networks (RNNs), convolutional neural networks (CNNs), and attention-based methods [23,24,25,26,27,28]. However, many existing traffic prediction methods still neglect the effective integration of multimodal data, which makes it difficult for them to comprehensively capture the spatio-temporal heterogeneity and complex dynamic correlations in the traffic system, and they therefore exhibit limited robustness and adaptability. These methods often focus on a single data source or on static features and fail to fully exploit the rich data available from multiple dimensions and sources to construct a more comprehensive prediction model.

In recent years, several studies have begun to experiment with combining deep learning with multimodal data to explore that approach’s potential in the field of traffic prediction [29,30,31]. Although these studies have shown some promise, the neural network structures they rely on are often too broad and deep, leading to a sharp increase in the demand for computational and storage resources and a significant extension of the training time, thus limiting the feasibility of their practical applications to a certain extent. Therefore, how to balance computational efficiency and model scalability while effectively integrating multimodal data remains a serious challenge in the field of traffic prediction today.

In order to address the shortcomings of the existing traffic prediction methods in terms of basic machine learning methods and multimodal data fusion, this paper proposes a novel traffic prediction framework that combines multimodal data fusion with relatively simple and easy-to-implement machine learning techniques, as Figure 1 shows. By integrating multimodal data from different sources such as geographic features, tourist attraction features, amenity features, and socio-emotional features, we capture the complex spatio-temporal dynamic patterns in the traffic system. The experimental results show that, even with a simple machine learning approach, good performance can be achieved on large-scale real datasets, which validates the effectiveness of the approach. Meanwhile, we further analyze the impact of various types of features on traffic flow prediction and demonstrate that efficient traffic prediction can be achieved by the PAE framework, providing a more practical and prospective solution for dynamic traffic flow prediction.

The main contributions of our study are summarized in the following:

• We have systematically analyzed the correlation between various types of features (including geographic features, tourist attraction features, amenity features, and socio-emotional features) and traffic flow and ranked the importance of these features, which provides valuable insights into the key elements influencing traffic flow and contributes to improving the precision of traffic condition forecasts.
• We have compared and analyzed the prediction accuracy of various machine learning methods, such as Support Vector Machines, Linear Regression, XGBoost Regression, and Random Forest Regression, in a multi-source data fusion task and provided insights into the performance of different variants of Multi-Layer Perceptron (MLP) in a traffic prediction task.
• Through an ablation study, we have provided an in-depth analysis of the contributions of different features (e.g., amenities, traffic characteristics, and social mood) to traffic flow prediction, revealing the potential roles of these multimodal features in traffic flow modeling.

By adopting the perspective of multi-source data fusion, our approach provides novel insights into traffic flow prediction. The results of this study are expected to improve the accuracy and reliability of traffic prediction models, thus providing strong support to traffic management authorities, urban planners, and related stakeholders, and providing a scientific basis for alleviating traffic congestion and optimizing transport networks.

The remainder of this paper is organized as follows: Section 2 will review the related work and outline the existing studies and their limitations. Section 3 describes in detail the adopted methodology, covering the whole process of data collection, feature extraction, and model selection. Section 4 presents the experimental results and provides an in-depth discussion of the performance of each model. Section 5 further analyses the effectiveness and importance of each component through an ablation study. Finally, Section 6 concludes the full paper.

2. Related Work

Traffic prediction, as a core research direction in the field of transport and urban planning, has always received extensive attention, and many scholars are committed to exploring traffic patterns and developing efficient prediction models. The existing research methods are diverse and fall into three main types: statistical methods, traditional machine learning methods, and deep learning methods [32,33]. In early research, traffic prediction mainly relied on statistical time-series techniques, such as autoregressive (AR) models, moving average (MA) models, and their combined form, autoregressive integrated moving average (ARIMA) models [34]. These methods show good interpretability at the theoretical level; however, they are weak when facing high-dimensional data and nonlinear behavior [35]. With the boom in machine learning techniques, many studies have turned to traditional machine learning methods to overcome the challenges of traffic prediction. Among them, models such as Hidden Markov Models (HMMs) and Fuzzy Neural Networks (FNNs) have been favored for their ability to capture nonlinear relationships in traffic data [16,21]. Although traditional machine learning methods have improved prediction performance to a certain extent, they still have limitations in capturing the complex spatio-temporal dynamic relationships in traffic systems and in handling high-dimensional and large-scale datasets.

Over the past few years, deep learning techniques have emerged in the field of traffic prediction by virtue of their superior ability to model complex spatio-temporal dependencies and have become a mainstream research method [36,37]. Convolutional neural networks (CNNs), as one of the representative deep learning models, have been widely used to extract spatial features from traffic data, which greatly improves prediction accuracy [38,39,40,41,42,43,44]. Specifically, Li et al. [38] proposed the diffusion convolutional recurrent neural network (DCRNN), which models the traffic flow as a diffusion process on a directed graph and captures the spatial dependence through bidirectional random walks. Wang et al. [39], building on the graph neural network approach, developed the regression graph recurrent neural network (GRNN), which uses trajectory data to accurately predict the average traffic speed across road networks. In addition, Zhang et al. [40] combined gated recurrent units (GRUs) with optimization techniques to improve the traffic speed prediction model and further improve the prediction accuracy. However, it is worth noting that most of these studies focus on single-modal data (e.g., traffic speed or geographic information), and the fusion of multimodal data is still under-utilized. The predictive power of these models is limited to some extent because they fail to fully exploit the potential correlations between different data sources.

Although some work has begun to attempt multimodal data fusion in recent years, the related research is still in its infancy, and most of the approaches rely on deep and wide neural network structures, leading to higher computational and storage requirements. For example, Yan et al. achieved a high-dimensional representation by fusing multi-source urban data (e.g., social media, real estate, and point-of-interest data) and combined it with gated recurrent units (GRUs) to capture complex spatio-temporal correlations, which significantly improves the prediction performance [30]. However, the method requires a long training time, a large amount of computational resources, and substantial storage. In addition, Zhou et al. [31] demonstrated the potential of accurate traffic prediction on large-scale maps based on a large-scale spatio-temporal multimodal fusion framework, but this approach relies on high-dimensional feature spaces and complex network structures, leading to poor scalability in practical applications, especially in resource-constrained environments. Zhang et al. [45] proposed the BjTT dataset, a large multimodal dataset integrating time-series traffic data with textual event descriptions, aiming to address the challenges of insensitivity to anomalous event prediction and limited long-term prediction performance. Nevertheless, the existing methods based on this dataset struggle to achieve robust long-term prediction accuracy when dealing with complex scenarios. In addition, these methods face significant scalability issues due to the high computational and storage requirements of generative models. The graph sparse attention mechanism with bidirectional spatio-temporal convolutional networks (GSABTs) provides a novel framework for multimodal spatio-temporal prediction [46]. The method efficiently extracts global spatial features through the sparse attention mechanism and enhances cross-modal and within-modal spatio-temporal feature extraction using a bidirectional spatio-temporal convolutional network. Although the GSABT approach has achieved state-of-the-art results on multiple datasets such as BJ Taxi, NYC Taxi, and NYC Bike, its limited scalability and its reliance on computationally intensive components remain significant challenges in resource-constrained environments. These challenges highlight the need for more efficient methods that integrate multimodal data effectively while reducing computational and storage demands.

Despite the significant progress achieved by deep learning methods in the field of traffic prediction, the integration of multimodal data remains an under-researched area. The existing methods usually rely on computationally intensive models with high demands on computational and storage resources, making it difficult to apply them practically in resource-constrained environments. In addition, many methods fail to take full advantage of the diverse and complementary information in multimodal data, instead focusing more on high-dimensional feature spaces and employing limited data integration strategies. To address these issues, this study introduces multidimensional features, including property attributes, community facilities, and sentiment indicators (PAE), and combines them with a relatively simple machine learning approach for traffic prediction. The experimental results show that, even with a lightweight algorithm, the complex spatio-temporal dynamic relationships can be effectively captured and good prediction performance can be achieved. This approach not only significantly reduces the computational and storage requirements associated with deep learning models but also highlights the unique value of multimodal features in improving prediction accuracy, providing a practical and efficient alternative.

3. Methodology

This study examines the potential of traffic flow prediction through multi-source data fusion, with a particular focus on the combined impact of PAE multimodal data on traffic flow variability. The foundation of the study is the data collection and preprocessing stage (elaborated on in Section 3.1), during which a total of 27 feature indicators covering property characteristics, neighborhood amenity status, traffic dynamics information, and socio-emotional tendencies are extracted. Subsequently, in Section 3.3, we analyze the correlation between these features and traffic flows. Section 3.4 then assesses how much different features contribute to the accuracy of traffic flow prediction. In terms of model construction and selection, a series of well-established machine learning models were used as validation tools in this study, including Support Vector Machines (SVMs) [47], Linear Regression Models [48], the XGBoost algorithm [49], Random Forest methods [50], and Multi-Layer Perceptron (MLP) [51]. These methods are mature and widely adopted in practice [52], and they were applied to comprehensively validate the effectiveness of the multi-source data fusion strategy (see Section 3.5 for details). It should be emphasized that the core aim of this study is to assess the value of multimodal data in traffic flow prediction rather than to develop completely new prediction algorithms. We integrate multidimensional information such as property attributes, amenities, and socio-emotional information, focusing on how these factors, when fused, enhance traffic flow modeling. The experimental results (presented in detail in Section 4) show that the multimodal data fusion approach demonstrates significant advantages in terms of prediction accuracy and provides valuable insights into the factors that influence traffic flow.

3.1. Data Collection and Preprocessing

In order to construct a high-precision traffic prediction model, this study implements a set of collection and preprocessing processes based on multi-source data [52,53]. The data are derived from the existing research literature and cover multiple dimensions, such as property characteristics, facility characteristics, emotional characteristics, and price information, to ensure the comprehensiveness, accuracy, and diversity of the data. In the following, the key stages in the data processing process will be elaborated on in detail and the role of each feature will be analyzed in depth.

3.1.1. Property Data Collection

Property characteristics are key elements that are used to reveal the potential impact that architectural and geographic attributes may have on traffic flow. The property characteristics adopted in this study cover a wide range of aspects, such as year of construction, lift availability, number of bedrooms, number of living rooms, number of kitchens, and number of bathrooms. These characteristics not only indirectly map the population density of the area but also reflect the functional layout of the residence, which in turn has a profound impact on the local traffic demand. In addition, the latitude and longitude information of the properties, as important geospatial reference data, provides strong support for capturing the spatial distribution pattern of the area and analyzing the intrinsic connection between the area and the traffic nodes.

3.1.2. Amenity Data Extraction

Amenity characteristics are used as quantitative indicators of the level of accessibility around a property and its potential impact on traffic volume. The amenity data included in this study cover transport facilities, tourist attractions, education and healthcare facilities, and food and beverage and retail facilities. For transport facilities, the amount of transport infrastructure in the vicinity and its average distance to the property characterize these data, reflecting the layout of transport resources in the area and their attraction effect on traffic flows. For tourist attractions, the number of surrounding attractions and their average distance to the property serve as characteristic indicators; these data correlate directly with the intensity of tourist traffic in the area, which in turn has a significant and direct impact on traffic pressure. For educational and medical facilities, which are indispensable activity nodes in daily life, the number and average distance of educational institutions, as well as the number and average distance of medical facilities, serve as characteristic data, which are important for understanding the spatial distribution pattern of traffic flow. In addition, for catering and retail facilities, the number and average distance of catering establishments and retail facilities serve as characteristics; these data reflect both the commercial activity level of the area and its ability to attract traffic flow. With these multidimensional facility characteristics, this study is able to analyze the direct impact of facility distribution on traffic flow, providing a reliable data basis for studying the quantitative relationship between regional functionality and traffic behavior.
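The count and average-distance features described above can be sketched as follows. This is a minimal illustration and not the dataset's actual extraction code: the haversine distance, the 1 km search radius, and the names `haversine_km` and `amenity_features` are assumptions introduced here, since the text does not specify how "in the vicinity" was defined.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def amenity_features(prop, amenities, radius_km=1.0):
    """Return (count, mean distance in km) of amenities within radius_km of prop.

    prop and each amenity are (lat, lon) tuples. radius_km is a hypothetical
    cut-off. The mean distance is None when no amenity falls inside the radius.
    """
    dists = [haversine_km(prop[0], prop[1], a[0], a[1]) for a in amenities]
    nearby = [d for d in dists if d <= radius_km]
    mean_dist = sum(nearby) / len(nearby) if nearby else None
    return len(nearby), mean_dist


# Toy example: one restaurant at the property, one ~0.56 km north, one ~111 km away.
prop = (39.9, 116.4)
restaurants = [(39.9, 116.4), (39.905, 116.4), (40.9, 116.4)]
rst_num, rst_dst = amenity_features(prop, restaurants)
```

The same function would be called once per amenity type (transport, attractions, education, healthcare, catering, retail) to produce one count column and one distance column each.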

3.1.3. Emotional Feature Extraction

In order to integrate socio-emotional factors into the traffic prediction and analysis system, the dataset used in this study draws on Fan’s research idea [54] and extracts sentiment data closely related to the area where the property is located from the social media platform Weibo. Through sentiment analysis of the post texts, these data are classified into five main emotional states, namely anger (AgrPct), dissatisfaction (DstPct), happiness (HppPct), sadness (SadPct), and fear (FeaPct). Each emotional state is characterized by its percentage within that particular region, thus quantifying the social–emotional climate of that region. Specifically, areas with a high percentage of anger or dissatisfaction may be associated with traffic congestion or regional stress, while areas with a high percentage of happiness may correspond to smoother traffic flow. By integrating these emotional characteristics into the analytical framework, this study is able to capture the potential impact of social emotions on traffic flow more comprehensively, which provides a basis for interpreting changes in traffic flow at the psychosocial level.
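As a rough illustration of how per-region emotion percentages could be derived once each post has been assigned an emotion label, consider the sketch below. The sentiment classifier itself is out of scope here; the label strings, the 0 to 100 scale, and the function name `emotion_percentages` are assumptions made for this example, and only the five column names come from the text.

```python
from collections import Counter

# Mapping from (hypothetical) classifier labels to the dataset's column names.
COLUMN = {
    "anger": "AgrPct",
    "dissatisfaction": "DstPct",
    "happiness": "HppPct",
    "sadness": "SadPct",
    "fear": "FeaPct",
}


def emotion_percentages(labels):
    """Share of each of the five emotions among a region's labelled posts.

    labels is an iterable of label strings; unknown labels are ignored.
    Values are percentages on a 0-100 scale (an assumption of this sketch).
    """
    counts = Counter(label for label in labels if label in COLUMN)
    total = sum(counts.values())
    if total == 0:
        return {col: 0.0 for col in COLUMN.values()}
    return {col: 100.0 * counts[label] / total for label, col in COLUMN.items()}


# Toy region with four labelled posts.
region_features = emotion_percentages(["anger", "happiness", "happiness", "fear"])
```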

3.2. Feature Selection

In this study, we utilize an existing dataset that integrates multimodal information, which contains 27 features extracted from a variety of sources, and its feature extraction process references the methods of previous studies [52,53]. Through careful feature screening and data preprocessing, the comprehensiveness and accuracy of traffic flow modeling were ensured. To facilitate systematic analysis, these features are grouped into three main categories: property features, amenity features, and emotion features, which together form the core of the traffic flow prediction framework in this paper: the PAE feature system (refer to Table 1 for specific feature classifications and details).

Property features include the number of bedrooms, number of bathrooms, elevator availability, and other key building-related metrics. These features indirectly reflect the area’s population density potential and are closely related to transportation needs.

Amenity features cover the density and distribution of a wide range of amenities (e.g., transportation nodes, educational institutions, medical centers, dining establishments, tourist attractions, etc.) and their average distance from the target area. These characteristics directly reflect the attractiveness and functionality of the area and have a significant impact on traffic flow dynamics at different times of the day. Specifically, high density and diversity of amenities can significantly increase the number of visits to a given area at different points in time, in turn affecting the traffic flow patterns in that area.

Emotion features, calculated based on the results of sentiment analysis of microblogging data in the region, quantify the proportional distribution of five emotional states (anger, dissatisfaction, happiness, sadness, and fear). These features reflect the social atmosphere of the region and provide a new perspective for analyzing the social factors that implicitly influence traffic patterns.

By utilizing this dataset and categorizing the features within the framework of the PAE system, we aim to construct a traffic flow prediction model that is both robust and comprehensive. This model is designed to fully account for obvious as well as potential factors that influence traffic dynamics, resulting in more accurate and detailed predictions.

3.3. Feature Correlation

Pearson’s correlation coefficient r_{pq} is a statistical indicator used to quantify the strength of the linear relationship between two features, p and q, and takes values between −1 and 1 [55]. When the coefficient is near 1, there is a strong positive relationship between the two features; when it is close to −1, the two features show a strong negative correlation; and when it is close to 0, there is almost no linear relationship between them. The coefficient is calculated as the ratio of the covariance of features p and q to the product of their respective standard deviations, which provides a statistical basis for analyzing the interdependence between features:

\begin{matrix} r_{p q} = \frac{\sum_{j = 1}^{n} (p_{j} - \bar{p}) (q_{j} - \bar{q})}{\sqrt{\sum_{j = 1}^{n} {(p_{j} - \bar{p})}^{2}} \sqrt{\sum_{j = 1}^{n} {(q_{j} - \bar{q})}^{2}}}, \end{matrix}

(1)
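Equation (1) translates directly into a few lines of code. The following is a minimal sketch using only the standard library; the function name `pearson_r` and the toy inputs are illustrative and not taken from the paper.

```python
import math


def pearson_r(p, q):
    """Pearson correlation coefficient of two equal-length sequences, per Equation (1)."""
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q must have the same length (at least 2)")
    n = len(p)
    p_bar = sum(p) / n
    q_bar = sum(q) / n
    # Numerator: sum of products of deviations from the means.
    num = sum((pj - p_bar) * (qj - q_bar) for pj, qj in zip(p, q))
    # Denominator: product of the square roots of the summed squared deviations.
    den = math.sqrt(sum((pj - p_bar) ** 2 for pj in p)) * math.sqrt(
        sum((qj - q_bar) ** 2 for qj in q)
    )
    if den == 0:
        raise ValueError("correlation is undefined when a feature is constant")
    return num / den


r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])  # deviations give 4 / 5 = 0.8
```

Note that the coefficient is undefined when either feature has zero variance, which is why the sketch raises an error in that case rather than returning a value.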

3.4. Feature Importance

Feature importance analysis is a key method for assessing the contribution of features to the predictive power of a model. In this study, we used a Random Forest model to evaluate 27 features of the PAE system and generated a detailed feature importance ranking. Through this process, we identified the key features that have the greatest impact on traffic flow prediction, providing a scientific basis for model optimization and feature screening. This analysis not only helps to simplify the model structure and reduce the computational complexity but also enhances the understanding of the relationship between data features, thus providing strong support to further improve the model performance.
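A feature importance ranking of this kind might be produced as sketched below. The example assumes scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` attribute; the text does not state which importance measure or library was used, and the synthetic data, the helper name `rank_features`, and the choice of three feature names are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_features(X, y, names, n_estimators=100, seed=0):
    """Fit a Random Forest and return (name, importance) pairs, most important first."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_  # impurity-based, sums to 1
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic stand-in for the real dataset: the first column dominates the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

ranking = rank_features(X, y, ["RstNum", "AtrNum", "HppPct"])
```

Impurity-based importances can be biased toward high-cardinality or correlated features, so a ranking like this is best read alongside the correlation analysis rather than on its own.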

3.5. Prediction Model

In this study, we selected a variety of machine learning models to address the complexity of the traffic flow prediction task. The selected models include Support Vector Machine (SVM) [47], Linear Regression [48], XGBoost [49], Random Forest [50], and Multi-Layer Perceptron (MLP) [51]. In selecting a model, it is important to consider the complexity and diversity of the dataset holistically as well as to ensure a balance between interpretability, robustness, and predictive power. Linear Regression was chosen as the benchmark model due to its simple structure and ease of interpretation, which can provide a reliable reference point for evaluating more complex models. Meanwhile, three machine learning models were chosen for this study based on their complementary strengths: Support Vector Machines and XGBoost are particularly good at capturing nonlinear relationships and complex interaction effects, while Random Forests enhance the stability and robustness of the model through ensemble learning, which is particularly suitable for handling high-dimensional data. This combination of models not only enables a systematic comparison of linear and nonlinear models but also provides insights into how their performance differs in traffic flow prediction.

The proposed method can be represented as follows:

F = f (P, A, E)

where

P = \{P_{1}, P_{2}, \dots, P_{k}\}, \quad A = \{A_{1}, A_{2}, \dots, A_{m}\}, \quad E = \{E_{1}, E_{2}, \dots, E_{n}\}

The specific features for P, A, and E are detailed in Table 1.

Descriptions of the features are as follows:

P represents property features, such as Year, Elvt, Lat, etc.
A represents amenity features, such as RstNum, TspDst, AtrNum, etc.
E represents emotion features, such as AgrPct, HppPct, etc.

The function f represents the predictive model, which can be one of the following:

f \in \{\text{Linear Regression}, \text{SVM}, \text{Random Forest}, \text{XGBoost}, \text{Multi-Layer Perceptron}\}.
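The formulation F = f(P, A, E) amounts to concatenating the three feature groups into a single input vector and passing it to whichever model plays the role of f. The sketch below illustrates that idea only; the helper names `fuse` and `predict`, the toy feature values, and the stand-in model are hypothetical and not part of the paper.

```python
def fuse(P, A, E):
    """Concatenate property, amenity, and emotion features into one vector.

    Each argument is a dict mapping feature name to numeric value.
    Returns (names, values) in a fixed order: P first, then A, then E.
    """
    names, values = [], []
    for group in (P, A, E):
        for name, value in group.items():
            names.append(name)
            values.append(float(value))
    return names, values


def predict(f, P, A, E):
    """Apply a predictive model f (any callable on a feature vector) to the fused input."""
    _, x = fuse(P, A, E)
    return f(x)


# Toy feature groups; the values are made up for illustration.
P = {"Year": 2005, "Elvt": 1, "Lat": 39.9}
A = {"RstNum": 12, "TspDst": 0.4}
E = {"AgrPct": 10.0, "HppPct": 55.0}

names, x = fuse(P, A, E)


def toy_model(vec):
    """Hypothetical stand-in for f: half the restaurant count."""
    return 0.5 * vec[names.index("RstNum")]


F = predict(toy_model, P, A, E)
```

Keeping the groups separate until the final concatenation also makes it straightforward to drop one group at a time, which is the kind of comparison an ablation study requires.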

The model selection criteria cover model diversity (including statistical versus machine learning approaches), ability to handle nonlinear patterns, robustness to noisy data, and relevance to similar tasks in the literature [56,57]. The selected models were included in the study not only for their theoretical advantages but also for their ability to be practically applied to datasets with different attributes, such as geographic locations, transportation characteristics, and socio-economic indicators, which often exhibit nonlinear relationships or complex interactions.

Prediction quality may be affected due to data and model-related challenges. The dataset of this study contains complex interactions between variables, such as high-dimensional features and potential multicollinearity, all of which may affect the performance of the machine learning model in different ways. In addition, the sensitivity of the model to hyperparameter settings may also lead to fluctuations in prediction accuracy. To address these issues, we adopted a grid search combined with cross-validation to optimize the hyperparameters of Random Forest and XGBoost to ensure that the optimal parameter combinations are selected, thus striking a balance between prediction accuracy and computational efficiency. The dataset is divided into training and testing subsets according to a 70–30 ratio in order to facilitate the evaluation of the model’s generalization ability on unseen data.
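The tuning and splitting procedure described above could look roughly like the following with scikit-learn. This is a sketch under stated assumptions: the parameter grid, the 3-fold setting, the scoring metric, and the synthetic data are illustrative choices that the text does not specify, and only Random Forest is shown (XGBoost would be tuned analogously with its own estimator).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 27-feature dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# 70-30 split into training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hypothetical grid; the values actually searched are not given in the text.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# Grid search with cross-validation, run on the training subset only.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# Generalization is then checked once on the held-out 30%.
test_r2 = search.best_estimator_.score(X_test, y_test)
```

Running the search on the training subset alone keeps the test subset unseen during tuning, so the final score is not inflated by hyperparameter selection.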

3.6. Evaluation

In order to comprehensively and accurately assess the forecasting performance of the model, this study adopts five key evaluation indexes: the coefficient of determination (R²), the adjusted coefficient of determination (adjusted R²), the mean absolute error (MAE), the mean squared error (MSE), and the root mean square error (RMSE). These metrics provide an in-depth measure of the model’s predictive efficacy from different dimensions and perspectives, and their specific formulas are shown below:

R^{2} = 1 - \frac{\sum_{j=1}^{n} (y_{j} - \hat{y}_{j})^{2}}{\sum_{j=1}^{n} (y_{j} - \bar{y})^{2}}, \qquad (2)

\text{Adjusted } R^{2} = 1 - \frac{(1 - R^{2})(n - 1)}{n - k - 1}, \qquad (3)

\mathrm{MAE} = \frac{\sum_{j=1}^{n} |y_{j} - \hat{y}_{j}|}{n}, \qquad (4)

\mathrm{MSE} = \frac{\sum_{j=1}^{n} (y_{j} - \hat{y}_{j})^{2}}{n}, \qquad (5)

\mathrm{RMSE} = \sqrt{\frac{\sum_{j=1}^{n} (y_{j} - \hat{y}_{j})^{2}}{n}}, \qquad (6)

In these formulas, y_{j} is the observed value, \hat{y}_{j} is the predicted value, \bar{y} is the mean of the observed values, n is the number of samples, and k is the number of predictors. Together, these metrics allow the models to be examined from several angles and provide a reliable basis for the comparative analyses of forecasting capability that follow.
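As a concrete reading of Equations (2)–(6), the sketch below computes all five metrics with the Python standard library only. The function name `regression_metrics` and the four-point toy data are assumptions made for illustration and do not come from the study’s dataset.

```python
import math


def regression_metrics(y_true, y_pred, k):
    """Compute R^2, adjusted R^2, MAE, MSE and RMSE as in Equations (2)-(6).

    k is the number of predictors used by the model.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - k - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > k + 1")
    y_bar = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    mse = ss_res / n
    rmse = math.sqrt(mse)
    return {"r2": r2, "adjusted_r2": adjusted_r2, "mae": mae, "mse": mse, "rmse": rmse}


# Toy data (assumption): four observations and a single predictor.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0], k=1)
```

Because adjusted R² penalizes the number of predictors, it is never larger than R² for the same fit, which makes it the more conservative of the two when feature sets of different sizes are compared.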

4. Experiments

4.1. Feature Correlation

In this study, traffic speed (TrfV) serves as the core dependent variable, which is influenced by multidimensional factors including property attributes, neighborhood amenities, geographic coordinates, and emotional characteristics. To quantify the relationship between these factors and traffic speed, we used Pearson correlation analysis and assessed the reliability of each correlation with a significance test, taking p < 0.05 as the threshold for statistical significance. The heat map (Figure 2) visualizes the strength of the correlation between each variable and traffic speed as well as the interactions between features, which provides a basis for further exploring indirect effects and their underlying mechanisms. The results are explained below for each feature category.

1. Property characteristics show varying degrees of correlation with traffic flow. The correlation coefficient of 0.3 between house prices and traffic flow indicates that areas with higher house prices tend to have higher traffic flows, possibly because these areas typically have higher population densities or more vibrant economic activity, which in turn increases the traffic load. The correlation coefficient between year of construction and traffic flow is −0.1, indicating that newer areas may have slightly higher traffic flow, possibly because newly developed areas tend to attract more residents, accompanied by improved commercial and transport infrastructure, whereas older areas tend to be more saturated with development and have relatively stable traffic flow. The correlation coefficient between the number of lifts and traffic flow is 0.1, a slight positive association: buildings with more lifts tend to be high-rise residential or commercial buildings, which usually imply higher population density or more intensive work–life activities.

Nonetheless, most of the property characteristics show weak correlations, suggesting that they have a limited direct impact on traffic flows. Some of the variables may indirectly influence traffic behavior through interaction with other factors; for example, housing prices may be an indicator of the economic vitality of the region, while the number of years since construction may indirectly reflect the stage of development and the state of infrastructure in the region. Therefore, although the direct correlations are low, these complex indirect relationships deserve to be further explored in order to gain a more comprehensive understanding of how property characteristics influence urban traffic dynamics through multiple pathways.

2. Emotion features are extracted from social media texts (e.g., posts, comments, etc.) by natural language processing tools, and user-generated content is classified into different emotion categories based on a sentiment analysis model, which calculates the overall percentage of each emotion category. Sentiment expressions of regional residents may indirectly reflect certain characteristics of regional traffic flows as these sentiments are often closely related to the living environment, social order, and traveling experience. For example, the correlation coefficient of 0.2 for the emotion of disgust (DstPct) suggests that areas with higher disgust may be accompanied by higher traffic flows, which may be due to a high population density or level of economic activity, as well as residents’ negative feelings about traffic conditions due to congestion or inadequate infrastructure. Similarly, the correlation coefficients for anger (AgrPct) and fear (FeaPct) were both 0.1, showing a weak positive correlation, suggesting that these emotions may increase in areas with higher traffic volumes, which may be related to travel stress, insecurity, or other potential problems.

In contrast, the correlation coefficients for the emotions of happiness (HppPct) and sadness (SadPct) were 0.0 and −0.0, respectively, suggesting that there is little to no significant linear relationship between these two emotions and traffic flow. Happiness and sadness reflect more of residents’ perceptions of overall quality of life rather than being limited to traffic dynamics. Therefore, although the direct association between affective traits and traffic flow is weak, they may act on traffic flow through indirect mechanisms such as influencing area satisfaction or social behavior, a finding that highlights the importance of further exploring the complex interactions between affective traits and traffic phenomena for a comprehensive understanding and optimization of the urban transport system and its associated social environment.

3. The result of the correlation analysis between geographic coordinates (latitude and longitude) and traffic flow is 0.0, indicating that there is no significant linear relationship between the two. Despite the seeming absence of a direct correlation, geographic coordinates may still indirectly influence traffic dynamics through a range of other factors. As direct identifiers of regional spatial location, latitude and longitude not only indicate geographic location but also implicitly reflect socio-economic characteristics such as the level of urbanization, population density, distribution of infrastructure, and the layout of the transport network in the region.

For example, a higher level of urbanization is usually accompanied by higher population density and denser distribution of transport nodes, which together result in higher traffic flows; on the contrary, in suburban or rural areas, traffic flows are relatively low due to lower population density and less transport infrastructure. The current findings fail to capture these complex indirect mechanisms in the linear correlation coefficients, implying that the effect of geographic coordinates on traffic flow is realized through multi-factor interactions. In order to more fully understand how geographic features indirectly influence traffic flow through interaction with other key factors, such as urbanization level, density of transport nodes, and population density, future studies need to explore this in depth using a more integrated approach.

4. Infrastructure characteristics, such as transport facilities, tourist attractions, restaurants, and educational facilities, show varying degrees of correlation with traffic flow, suggesting that these factors may influence traffic dynamics under specific conditions. The results in Figure 2 show that the number of transport facilities has no linear correlation with traffic flow (correlation coefficient of 0.0) and that the average distance to these facilities is likewise uncorrelated with it (correlation coefficient of 0.0). In contrast, the number of tourist attractions has a significant positive correlation with traffic flow (correlation coefficient of 0.4), suggesting that tourist activity increases vehicular and pedestrian traffic in the area. Areas closer to attractions show higher traffic flows (the correlation coefficient between the average distance to attractions and traffic flow is −0.2), while traffic flow gradually decreases farther away, reflecting the traffic pressure caused by the concentration of tourists in the vicinity of the attractions.

In addition, the number of educational facilities shows a weak positive correlation with traffic flow (correlation coefficient of 0.1), and areas close to educational facilities (mean distance correlation coefficient of −0.1) show a slight increase in localized traffic flow, which may be due to commuting demand around schools. The correlation coefficient between the number of medical facilities and traffic flow is 0.2, indicating that their distribution contributes to traffic flow; however, proximity to medical facilities (average distance correlation coefficient of −0.0) shows almost no direct relationship with traffic flow. Neither the number of restaurants and retail facilities nor their mean distances have significant correlations with traffic flow (all 0.0 or close to 0.0), indicating that their direct effect on traffic flow is small.

Overall, there were significant differences in the impact of the density and distribution of different infrastructures on traffic flows. The number and proximity of tourist attractions have a significant driving effect on traffic flow, while the effect of other facilities is more limited and only visible under certain conditions. Therefore, in urban planning and traffic management, we should focus on the traffic pressure in areas with high density of tourist attractions and optimize the traffic organization in high-traffic areas in combination with the distribution of educational facilities in order to alleviate local congestion and improve the overall travel efficiency.

Through systematic correlation analysis, this study examines the relationship between the various feature types in the multimodal data and traffic flow (TrfV), reveals the key role of amenity features, and indicates the potential value of property features and emotional features in specific scenarios. These findings enrich the existing theoretical framework for traffic prediction and provide a methodological basis for assessing the importance of features. By clarifying the interactions between these features, the analysis also supports the optimization of prediction models and of traffic management strategies. The study therefore both improves the theoretical understanding of multimodal data and offers practical guidance for urban planning and traffic management.
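The correlation analysis in this subsection rests on the Pearson coefficient and a significance threshold of p < 0.05. The sketch below shows one way to compute both with the standard library. The helper names are assumptions, and the p-value uses a normal approximation to the t-distribution, which is reasonable only for large samples; an exact test would use the t-distribution with n − 2 degrees of freedom.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


def approx_p_value(r, n):
    """Two-sided p-value for r, using a large-sample normal approximation (assumption)."""
    if abs(r) >= 1.0:
        return 0.0
    t_stat = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Toy data (assumption): a perfectly linear relationship.
r_toy = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# With a hypothetical sample of 1000 areas, r = 0.3 passes the 0.05 threshold,
# whereas r = 0.01 does not.
p_moderate = approx_p_value(0.3, 1000)
p_negligible = approx_p_value(0.01, 1000)
```

With large samples, even weak coefficients can be statistically significant, so significance alone says little about effect size; this is why the discussion above reports the coefficients themselves.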

4.2. Feature Importance

The feature importance analysis reveals which core elements in the dataset play a pivotal role in constructing the model, and it provides us with clear insights by accurately quantifying the specific impact of each feature on the traffic prediction target variable (TrfV). As vividly demonstrated in Figure 3, the importance of features presents a distinctive cascading distribution pattern, where some of the features significantly outperform the rest of the features in terms of their contribution, which is decisive for the enhancement of the prediction effect. This finding not only points out the direction for further optimization of the model but also provides a highly valuable reference basis for feature selection.

1. Core dominance of tourism-related features. AtrNum (number of tourist attractions in the vicinity) stands out as the most important feature for the prediction of TrfV. This finding provides strong evidence of the influence of the density of tourist attractions on traffic speeds. In areas with a high concentration of tourist attractions, frequent tourist activity directly leads to a surge in traffic flow and a slowdown in speed. In addition, tourism-related distance characteristics, such as AtrDst (distance to attractions), also show high importance, further confirming that the spatial relationship between a location and nearby attractions has a direct and significant impact on traffic prediction. This suggests that tourism activities, as an important driver of regional traffic, should be given primary consideration in traffic prediction models.

2. House price (Price), as the second most important variable in terms of feature importance, plays a pivotal role in traffic forecasting. Higher price areas tend to reflect higher economic levels and advanced stages of urbanization. These areas are usually better equipped with commercial facilities, densely populated, and have relatively high vehicle ownership rates, which together result in a significant increase in traffic flow and dynamic changes in traffic speeds. In addition, high house prices imply that residents have greater economic power, making it easier for them to afford the cost of car ownership and use, a situation that further exacerbates the pressure on traffic flows in these areas. Therefore, house prices are not only a reflection of the level of economy and urbanization but also an important reference indicator for predicting the dynamics of traffic volume.

3. In the feature importance ranking, geographical location features (Lat and Lng), although ranked highly, are less important than the number of tourist attractions (AtrNum) and house price (Price), which occupy the top two positions. This finding reveals that, while there is some correlation between geographic location and traffic speed, it is not the most influential factor. However, as a key determinant of regional infrastructure layout, population density, and traffic flow characteristics, the impact of geographic location on traffic conditions cannot be ignored. The geographic characteristics of a region largely determine the structural design of the road network, the formation of commuting patterns, and the distribution of traffic flow, and these intertwined elements shape the intrinsic pattern of traffic speed changes.

4. Auxiliary role of facility distribution features and sentiment features. Facility distribution features, such as HthDst (average distance from neighboring medical facilities) and RtlDst (average distance from retail facilities), occupy a place in feature importance, suggesting that they also make a non-negligible contribution to traffic prediction. Healthcare and commercial facilities have a direct impact on traffic speeds as important gathering places for both pedestrian and vehicular traffic. Although emotional features, such as HppPct (Happy Emotional Proportion) and SadPct (Sad Emotional Proportion), are relatively low in the rankings, they still show some predictive power, which may reflect the fact that psychosocial states have an impact on traffic conditions in some indirect way.

In summary, the dominance of tourism-related features in traffic forecasts reveals the important role of the nature of regional activities (e.g., tourism activities) in driving traffic conditions beyond the influence of infrastructure distribution alone. The importance of location and house prices, on the other hand, highlights the profound shaping power of urbanization and economic development on traffic patterns. Meanwhile, the moderate importance of amenity distribution characteristics demonstrates the direct contribution of places for everyday activities (e.g., healthcare and retail facilities) to traffic flows. Despite the relatively low importance of emotional characteristics, their potential indirect effects provide us with a new dimension worth exploring in depth: how to incorporate psychosocial states to further optimize traffic prediction models. Based on these findings, we can further explore the interactions between different features with a view to constructing more comprehensive and accurate traffic prediction models.

By systematically analyzing the importance of features, this study not only quantifies the specific contribution of key features to traffic prediction models but also provides insights into the far-reaching impacts of the nature of regional activities and urbanization processes on traffic patterns. Our study clarifies which features are most critical, provides valuable optimization guidelines for feature engineering, and lays the foundation for improving the accuracy and reliability of traffic prediction models, opening up new perspectives on multimodal data fusion methods.
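This section does not state how the importance scores in Figure 3 were computed. One common model-agnostic option is permutation importance, which measures how much the prediction error grows when a single feature column is shuffled. The sketch below illustrates that technique under stated assumptions; it is not a reproduction of the authors’ procedure, and the toy model and data are hypothetical.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled in turn.

    predict takes one row (a list of feature values) and returns a prediction.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            permuted_error = mean_squared_error(y, [predict(r) for r in X_perm])
            increases.append(permuted_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup (assumption): the target depends on column 0 only.
X_toy = [[float(i), float((i * 7) % 5)] for i in range(20)]
y_toy = [3.0 * row[0] for row in X_toy]


def toy_model(row):
    """Hypothetical fitted model that uses only the first feature."""
    return 3.0 * row[0]


scores = permutation_importance(toy_model, X_toy, y_toy)
```

A feature the model ignores receives a score of zero, while a feature the model relies on receives a positive score. Tree ensembles also expose impurity-based importances, which can rank features differently, so the two should not be compared directly.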

4.3. Prediction Model

4.3.1. Machine Learning Models

For the task of traffic speed prediction, this paper compares the performance of the Linear Regression, Random Forest Regression, XGBoost Regression, and Support Vector Machine (SVM) models and analyzes three dimensions: the correlation between the true values and the predicted values, the characteristics of the residual distribution, and the overall performance of the model.

1. Figure 4a–c show the predictive effectiveness of the Linear Regression model. In the scatterplot of the true and predicted values (Figure 4a), the data points fit the ideal diagonal poorly, especially in the low- and high-speed intervals, and the model fails to capture the complex nonlinear relationships embedded in the data. This shortcoming is more visible in the residual distribution plot (Figure 4b), where the residuals show a clear systematic bias, and the magnitude of the error is particularly large in the region of low predicted values. Further, the histogram of residuals (Figure 4c) shows that the errors of the Linear Regression model span a much wider range and deviate significantly from a normal distribution. These observations reveal the inherent limitations of the Linear Regression model in dealing with traffic speed data with complex multimodal characteristics, which seriously constrain its predictive performance.

2. Figure 4d–f demonstrate the significant predictive performance advantage of the Random Forest Regression model over the Linear Regression model. In the scatterplot of true versus predicted values (Figure 4d), the data points are more tightly clustered around the ideal diagonal, which indicates that the Random Forest Regression model is able to capture the feature information in the data more effectively. The plot of residual distribution (Figure 4e) shows that the residual distribution of the Random Forest Regression model is much more homogeneous, and the systematic bias is reduced substantially, although some bias still exists in the region of low predictive values. In addition, the histogram of residuals (Figure 4f) presents a shape closer to a normal distribution, and the range of the error distribution is significantly reduced. These results fully demonstrate that the Random Forest Regression model exhibits higher accuracy and stability in fitting complex data patterns, and its overall performance is clearly superior to that of the Linear Regression model.

3. Figure 4g–i demonstrate the prediction performance of the XGBoost Regression model. As can be seen from the scatterplot of the true values versus the predicted values (Figure 4g), the predicted values of the XGBoost model fit the ideal diagonal very well, which fully demonstrates its outstanding ability to capture complex nonlinear patterns in the data. This conclusion is further confirmed by the residual distribution plot (Figure 4h), where the residuals are closely distributed around the zero value, indicating that the magnitude of the error is very small and there is almost no obvious systematic bias. In addition, the histogram of the residuals (Figure 4i) shows a spiked normal distribution pattern, indicating that the XGBoost model has a high degree of uniformity in the error distribution. In summary, the XGBoost model demonstrates excellent accuracy and robustness in predicting traffic speed data.

4. Figure 4j–l demonstrate the prediction performance of the Support Vector Machine (SVM) model. The scatterplot of true versus predicted values (Figure 4j) shows that the distribution of data points does not fit well with the ideal diagonal, and the deviation of predicted values from the true values is more pronounced, especially in the low- and high-speed intervals. This indicates that the SVM model has some limitations in capturing the nonlinear relationships in the data. The plot of the residual distribution (Figure 4k) shows that there is large systematic bias in the residuals, and the error is particularly prominent in the region of low predicted values. In addition, the histogram of the residuals (Figure 4l) shows that the distribution of the errors is wide and does not exhibit the characteristics of a close-to-normal distribution. These results indicate that the SVM model has limited performance in dealing with complex nonlinear and multimodal data, and the stability and accuracy of its prediction results are insufficient compared with those of the Random Forest and XGBoost models.

The quantitative results in Table 2 support the preceding analysis. Among the four models, the Random Forest Regression model performs best on several assessment metrics. Its coefficient of determination (R²) reaches 0.9244, far above the Linear Regression model’s 0.2022 and slightly above the XGBoost model’s 0.8896, which indicates a strong ability to capture the intrinsic patterns of the data. Its errors are also the lowest among the models, with a mean absolute error (MAE) of 1.4605, a mean squared error (MSE) of 17.0376, and a root mean square error (RMSE) of 4.1276, which demonstrates its advantage in prediction accuracy.

Although the error metrics of the XGBoost model (MAE of 2.6739, MSE of 24.8939, and RMSE of 4.9894) are somewhat higher than those of Random Forest, its residuals are close to normally distributed, which indicates that the XGBoost model is also robust and stable. In contrast, the Linear Regression model performs poorly on all evaluation metrics: its MAE is 10.1823, and its MSE and RMSE reach 179.8612 and 13.4112, respectively. This shows that the Linear Regression model is ill-suited to complex nonlinear data and cannot meet the demand for high-precision prediction.

The SVM model performs worst overall. Its R² is only 0.0291, even lower than that of the Linear Regression model, indicating that the SVM model captures almost none of the complex patterns in the data. Its MSE and RMSE, 218.8753 and 14.7944, respectively, are the highest among the models, while its MAE of 6.7222 is lower than that of the Linear Regression model (10.1823) but well above those of the tree-based models. These results indicate that the SVM model handles this complex multimodal data poorly in terms of both predictive ability and error control.
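The figures quoted above can be cross-checked with a few lines of Python, since Equation (6) implies that each RMSE should equal the square root of the corresponding MSE. The dictionary below simply restates the values reported in the text for Table 2; the variable names are introduced here for illustration.

```python
import math

# Values as quoted in the text for Table 2.
REPORTED = {
    "Linear Regression": {"r2": 0.2022, "mae": 10.1823, "mse": 179.8612, "rmse": 13.4112},
    "Random Forest": {"r2": 0.9244, "mae": 1.4605, "mse": 17.0376, "rmse": 4.1276},
    "XGBoost": {"r2": 0.8896, "mae": 2.6739, "mse": 24.8939, "rmse": 4.9894},
    "SVM": {"r2": 0.0291, "mae": 6.7222, "mse": 218.8753, "rmse": 14.7944},
}

# Largest gap between sqrt(MSE) and the reported RMSE across all models.
max_rmse_gap = max(abs(math.sqrt(row["mse"]) - row["rmse"]) for row in REPORTED.values())

# Models ordered from best to worst R^2.
ranking_by_r2 = sorted(REPORTED, key=lambda name: REPORTED[name]["r2"], reverse=True)

# Model with the largest value of each error metric.
worst_by_metric = {
    metric: max(REPORTED, key=lambda name: REPORTED[name][metric])
    for metric in ("mae", "mse", "rmse")
}
```

The reported RMSE values agree with the square roots of the reported MSE values to within rounding. The check also shows that the largest MAE belongs to Linear Regression, whereas the largest MSE and RMSE belong to SVM, which suggests that the SVM errors are dominated by a smaller number of large deviations.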

The combination of qualitative and quantitative assessment results confirms that tree-based nonlinear regression methods such as Random Forest and XGBoost are more suitable than linear models for handling complex multimodal data. The analysis highlights the importance of simultaneously assessing model accuracy and robustness, providing a comprehensive understanding of model performance in traffic speed prediction tasks.

It is worth noting that, despite the theoretical advantages of Support Vector Machine (SVM) models in dealing with high-dimensional datasets and specific types of nonlinear problems, their performance in the traffic speed prediction task in this study falls far short of expectations. This may be attributed to the limitations of SVM models when faced with highly noisy and multimodal data. The findings reveal significant challenges in applying SVM models to complex real-world scenarios such as traffic speed prediction, where the data contain not only a large number of complex nonlinear interactions but also a significant noise component, which places higher demands on the predictive performance of the model.

By systematically evaluating the performance of multiple machine learning models, this study shows that tree-based nonlinear regression methods, such as Random Forest and XGBoost, have significant advantages in handling complex multimodal data. These models excel at capturing complex nonlinear relationships and reducing prediction errors, significantly outperforming the traditional Linear Regression and SVM models. This comparison provides guidance for selecting appropriate machine learning algorithms based on specific data characteristics and emphasizes the importance of robust and scalable machine learning models in the multimodal traffic speed prediction task. The study not only validates the effectiveness of the tree-based approach but also provides a practical reference for future work on intelligent transport systems, pointing to directions for improving the accuracy and reliability of traffic prediction.

4.3.2. Multi-Layer Perceptron Models

To evaluate the performance of deep learning methods in traffic prediction, this study conducts experiments with 3-, 5-, 7-, 9-, 11-, 13-, and 15-layer Multi-Layer Perceptron (MLP) models [58]. The results show that these relatively shallow MLP models perform significantly worse than traditional machine learning models such as Random Forest and XGBoost. MLP models are widely used in various prediction tasks due to their ability to approximate complex functions [59], but their performance depends heavily on network depth and data representation [60]. As shown in Table 3, the R² value of the 15-layer MLP model is only 0.2590, whereas Random Forest and XGBoost reach 0.9244 and 0.8896, respectively; the error metrics (e.g., MAE and MSE) of the MLP model are also significantly higher. This disparity may arise because, with relatively shallow networks and a similar number of parameters, the MLP model struggles to capture the complex features of multimodal data, whereas Random Forest and XGBoost make more efficient use of the data through ensemble learning mechanisms. Achieving comparable results with deep learning methods would require deeper and more complex network structures, which would significantly increase the demand for computational and storage resources and prolong the training time, limiting the feasibility of their practical application. In contrast, traditional machine learning methods can achieve excellent prediction results at lower computational costs, demonstrating better scalability and practicality.

5. Ablation Study

To evaluate the impact of different features on traffic flow prediction in multi-source data fusion, we designed ablation experiments for four mainstream machine learning models, namely SVM, Linear Regression, XGBoost, and Random Forest, and evaluated their prediction performance on the training and test sets. The results are shown in Table 4.

(1) When property features, amenity features, and sentiment features (with PAE) were combined, the best prediction performance was achieved by each model. This finding provides strong evidence that the fusion of multidimensional features has an indispensable and crucial role in accurately predicting traffic flow.

(2) Excluding the amenity features (without A) leads to the most significant degradation in model performance. This result indicates that amenity features play a more critical role in traffic flow prediction than the other features.

(3) It is worth noting that, although the impact of excluding sentiment features (without E) on model performance is relatively more limited, the contribution of sentiment features to the prediction performance as a single-dimensional attribute still cannot be ignored. Even if its impact is relatively small, the inclusion of sentiment features can still improve the prediction accuracy and robustness of the model to a certain extent.

Our study clearly reveals the importance of combining multiple data types (especially amenity features) to improve traffic flow prediction accuracy. Whilst the impact of affective features alone is relatively limited, when combined with property and amenity features, it still helps to improve the predictive accuracy and robustness of the model.

This analysis systematically quantifies the relative importance of different feature categories in the context of multimodal data fusion and constitutes an important contribution to our study. The findings highlight the key role of amenity features and the complementary value of emotion and property features, providing practical insights for optimizing traffic prediction models. By bridging the gap between feature-level analysis and model-level performance, this study deepens the understanding of the interactions between different data sources and paves the way for advancing traffic flow modeling and multi-source data fusion approaches.
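The ablation design can be expressed as a small helper that builds one feature list per configuration by dropping a single feature group at a time. The column names below are the identifiers mentioned in Section 4, but their assignment to the P, A, and E groups is an assumption made for illustration, and the full feature list of the dataset is longer than shown.

```python
# Assumed grouping of a subset of the features named in Section 4.
FEATURE_GROUPS = {
    "P": ["Price"],
    "A": ["AtrNum", "AtrDst", "HthDst", "RtlDst"],
    "E": ["HppPct", "SadPct", "DstPct", "AgrPct", "FeaPct"],
}


def ablation_configs(groups):
    """Map each configuration name to the feature columns it keeps.

    The first entry uses every group; each later entry drops exactly one group.
    """
    all_columns = [col for cols in groups.values() for col in cols]
    configs = {"with " + "".join(groups): all_columns}
    for dropped in groups:
        configs["without " + dropped] = [
            col for name, cols in groups.items() if name != dropped for col in cols
        ]
    return configs


configs = ablation_configs(FEATURE_GROUPS)
```

Each configuration would then be used to train and evaluate every model on the same train and test split, so that differences in the metrics can be attributed to the removed feature group rather than to a different data partition.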

6. Conclusions

In this paper, we address the limitations of the existing traffic prediction methods in the field of multimodal data fusion and propose a novel multimodal data fusion method based on a simple machine learning framework, aiming to fully exploit the potential of multimodal data and effectively deal with the problem of excessive resource requirements of complex deep learning models in practical applications. Currently, most traffic prediction methods are still limited to the analysis of single-modal data, ignoring the rich information and advantages that multimodal data fusion can provide. Although a few studies have attempted to introduce multimodal data into traffic prediction in recent years, these methods often rely on large and complex neural network structures, which, despite the gain in prediction performance, lead to inefficient training due to the huge scale of the model parameters and the high demand for computational and storage resources, making it difficult to satisfy the urgent demand for efficiency and scalability in real-world scenarios.

To this end, we adopt a simple and basic machine learning technique to achieve effective fusion of multimodal data, filling a research gap in this area. By integrating property, amenity, and emotion features (referred to as PAE features), the experimental results show that this simple machine learning approach provides strong prediction performance while significantly reducing computational and resource requirements, making it more suitable for practical applications. This approach highlights the importance of multi-source data fusion in improving the accuracy and usefulness of traffic flow prediction.

The core contribution of this study is to fully demonstrate the great potential and unique advantages of simple and easy-to-implement machine learning techniques in the field of multimodal traffic forecasts. Through the fusion of multimodal data (PAE features), we provide an efficient and practical solution for traffic prediction, which not only provides strong technical support for traffic management and urban planning but also opens up new paths for future research on multimodal data fusion and provides valuable references and insights. The methodological and practical results of this study are expected to promote the innovation and development of traffic prediction technology, and to contribute new strength to the construction of smart cities and the optimization of transport systems.

Despite the potential demonstrated by these findings, several limitations need to be further explored and addressed. Random Forest and XGBoost perform well in traffic speed prediction but face challenges in extreme situations. For example, at very low speeds or when vehicles are stationary, Random Forest struggles to identify stationary states accurately, while XGBoost may produce unreasonable negative predictions. Both models also tend to underestimate speed values in the intermediate range, leading to significant residual errors in some cases. These challenges highlight the need to improve the models' robustness and accuracy in such cases.
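One simple guard against the negative speed predictions noted above is to post-process model outputs so that they respect the physical lower bound of zero. The following is a minimal sketch for illustration only, not part of the evaluated pipeline; the function name `clip_speed_predictions` and the optional `max_speed` bound are assumptions introduced here.

```python
from typing import Iterable, List, Optional


def clip_speed_predictions(
    preds: Iterable[float],
    min_speed: float = 0.0,
    max_speed: Optional[float] = None,
) -> List[float]:
    """Clamp raw regression outputs to a physically plausible speed range."""
    clipped = []
    for p in preds:
        p = max(p, min_speed)  # speeds cannot fall below the lower bound
        if max_speed is not None:
            p = min(p, max_speed)  # optional upper bound (hypothetical)
        clipped.append(p)
    return clipped


raw = [-1.8, 0.4, 35.2, 140.0]  # hypothetical raw model outputs
print(clip_speed_predictions(raw, max_speed=120.0))
```

Clipping only removes infeasible values; it does not by itself help a model recognize stationary states or correct the underestimation in the intermediate speed range.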

In addition, while multimodal datasets incorporating property, amenity, and emotion features provide a comprehensive view of the data, they may fail to adequately capture other key factors affecting traffic flow, such as weather conditions, changes in road infrastructure, or real-time crash information. Integrating these additional dimensions into the dataset could improve prediction accuracy and real-world application value.

Future research can address these issues in several ways. Developing hybrid models that combine the strengths of machine learning techniques with traffic engineering domain expertise can help to overcome the challenges in extreme scenarios. Utilizing advanced data sources, such as real-time IoT traffic sensors or satellite imaging data, can increase the richness and timeliness of datasets. Exploring more sophisticated hyperparameter optimization techniques or ensemble learning strategies can also help to improve the stability and reliability of predictions, especially in complex urban environments.

Author Contributions

Conceptualization, Y.Z.; Methodology, Y.Z.; Software, Y.Z.; Validation, J.Q.; Formal analysis, J.Q.; Investigation, J.Q.; Resources, Y.Z.; Data curation, Y.Z.; Writing—original draft preparation, J.Q.; Writing—review and editing, Y.Z.; Visualization, J.Q.; Supervision, Y.Z.; Project administration, Y.Z.; Funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. The framework of multi-source data fusion for traffic prediction.

Figure 2. The heat map of Pearson’s correlation coefficients, illustrating the strength and direction of linear relationships between variables in the dataset.

Figure 3. The figure shows the ordering of features in the Random Forest Regression model, assessing the importance of these features in predicting house prices.

Figure 4. The results of four different regression models: Linear Regression (a–c), Random Forest Regression (d–f), XGBoost Regression (g–i), and SVM (j–l). Each row is dedicated to one model and presents, in turn, a plot of the true values against the predicted values (first column), a plot of the residuals against the predicted values (second column), and a histogram of the residuals (third column). The panels allow a visual comparison of the models in terms of performance and residual distribution.

Table 1. The features collected and derived from various sources for traffic flow prediction, along with their descriptions.

Category	Feature	Description
Property	Year	The year the building was constructed.
	Elvt	Indicates the presence of an elevator.
	RmNum	The number of bedrooms.
	HllNum	The number of living and dining rooms.
	KchNum	The number of kitchens.
	BthNum	The number of bathrooms.
	Lat	The latitude coordinate of the property.
	Lng	The longitude coordinate of the property.
Amenity	TspNum	The number of nearby transportation facilities.
	TspDst	The average distance to transportation facilities.
	AtrNum	The number of tourist attractions nearby.
	AtrDst	The average distance to tourist attractions.
	EdcNum	The number of nearby educational institutions.
	EdcDst	The average distance to educational institutions.
	HthNum	The number of nearby healthcare facilities.
	HthDst	The average distance to healthcare facilities.
	RstNum	The number of nearby restaurants.
	RstDst	The average distance to restaurants.
	RtlNum	The number of nearby retail facilities.
	RtlDst	The average distance to retail facilities.
Emotions	AgrPct	The percentage of anger.
	DstPct	The percentage of detestation.
	HppPct	The percentage of happiness.
	SadPct	The percentage of sadness.
	FeaPct	The percentage of fear.
Price	Price	The price per square meter in RMB ^a.

^a The Renminbi (RMB) is the official currency of the People’s Republic of China.

Table 2. Performance evaluation of five predictive models (SVM, Linear Regression, Random Forest, XGBoost, and MLP) based on $R^{2}$, adjusted $R^{2}$, MAE, MSE, and RMSE.

Model	$R^{2}$	Adjusted $R^{2}$	MAE	MSE	RMSE
SVM	0.0291	0.0262	6.7222	218.8753	14.7944
Linear Regression	0.2022	0.1997	10.1823	179.8612	13.4112
Random Forest	0.9244	0.9242	1.4605	17.0376	4.1276
XGBoost	0.8896	0.8892	2.6739	24.8939	4.9894
15-layer MLP	0.2590	0.2567	8.5714	167.0596	12.9252
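
The five metrics reported in the table above follow standard definitions. The sketch below shows how they can be computed from true and predicted values using only the Python standard library. The function name `regression_metrics` and the toy inputs are assumptions for illustration, and the adjusted $R^{2}$ uses the common form $1 - (1 - R^{2})(n - 1)/(n - p - 1)$, where $n$ is the number of samples and $p$ the number of features; this section does not state which variant was used.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], n_features: int
) -> Dict[str, float]:
    """Compute R^2, adjusted R^2, MAE, MSE, and RMSE for a regression model."""
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")

    mean_y = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {"r2": r2, "adj_r2": adj_r2, "mae": mae, "mse": mse, "rmse": rmse}


# Toy data (hypothetical), with a single feature assumed.
metrics = regression_metrics([1, 2, 3, 4, 5], [1.5, 2, 2.5, 4, 5.5], n_features=1)
print(metrics)
```

Under these definitions, $R^{2}$ becomes negative whenever a model's squared error exceeds that of always predicting the mean of the targets, and RMSE is always the square root of MSE.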

Table 3. Performance evaluation of seven Multi-Layer Perceptron models (3-, 5-, 7-, 9-, 11-, 13-, and 15-layer) in the traffic prediction task, compared on $R^{2}$, adjusted $R^{2}$, MAE, MSE, and RMSE.

Model	$R^{2}$	Adjusted $R^{2}$	MAE	MSE	RMSE
3-layer MLP	0.1906	0.1881	10.2777	182.4695	13.5081
5-layer MLP	0.1175	0.1149	11.9208	198.9362	14.1045
7-layer MLP	0.2158	0.2134	9.6018	176.7830	13.2960
9-layer MLP	0.2298	0.2274	9.5818	173.7830	13.2260
11-layer MLP	0.2340	0.2318	9.3071	172.6663	13.1403
13-layer MLP	0.2171	0.2147	10.7309	176.4935	13.2850
15-layer MLP	0.2590	0.2567	8.5714	167.0596	12.9252

Table 4. Experiments with diverse setups: 1. Without P: use amenity and emotion features; remove property features. 2. Without A: use property and emotion features; remove amenity features. 3. Without E: use property and amenity features; remove emotion features. 4. With PAE: use all property, amenity, and emotion features. ↑ indicates that higher is better; ↓ indicates that lower is better.

Data	Method	$R^{2}$ ↑	Adjusted $R^{2}$ ↑	MAE ↓	MSE ↓	RMSE ↓
Training set	SVM without P	−0.280	−0.280	8.037	280.643	16.752
	SVM without A	−0.150	−0.150	7.854	252.032	15.876
	SVM without E	0.022	0.021	6.569	214.454	14.644
	SVM with PAE	0.0349	0.0336	6.518	211.598	14.546
	Linear Regression without P	0.081	0.080	11.169	201.485	14.195
	Linear Regression without A	0.120	0.119	11.032	193.017	13.893
	Linear Regression without E	0.185	0.184	10.053	178.609	13.364
	Linear Regression with PAE	0.197	0.196	10.053	176.068	13.269
	XGBoost Regression without P	0.891	0.891	2.935	23.882	4.887
	XGBoost Regression without A	0.914	0.914	2.578	18.917	4.349
	XGBoost Regression without E	0.958	0.958	1.761	9.165	3.027
	XGBoost Regression with PAE	0.968	0.968	1.741	9.062	3.000
	Random Forest without P	0.985	0.985	0.768	3.396	1.843
	Random Forest without A	0.985	0.985	0.715	3.187	1.785
	Random Forest without E	0.990	0.990	0.513	2.183	1.477
	Random Forest with PAE	0.998	0.998	0.500	2.081	1.460
Testing set	SVM without P	−0.293	−0.294	8.317	291.387	17.070
	SVM without A	−0.161	−0.162	8.130	261.638	16.175
	SVM without E	0.017	0.015	6.769	221.606	14.886
	SVM with PAE	0.0291	0.0262	6.7222	218.875	14.794
	Linear Regression without P	0.078	0.077	11.334	207.764	14.414
	Linear Regression without A	0.119	0.118	11.203	198.603	14.093
	Linear Regression without E	0.191	0.189	10.152	182.421	13.506
	Linear Regression with PAE	0.202	0.198	10.082	179.861	13.411
	XGBoost Regression without P	0.807	0.807	3.870	43.533	6.598
	XGBoost Regression without A	0.835	0.835	3.494	37.153	6.096
	XGBoost Regression without E	0.891	0.891	2.627	24.609	4.961
	XGBoost Regression with PAE	0.900	0.900	2.574	24.594	4.989
	Random Forest without P	0.888	0.888	2.166	25.257	5.026
	Random Forest without A	0.897	0.897	1.987	23.259	4.823
	Random Forest without E	0.927	0.926	1.469	16.549	4.068
	Random Forest with PAE	0.934	0.934	1.460	16.038	4.010
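
The four ablation setups in the table above amount to dropping one feature group at a time. The sketch below shows one way to generate the corresponding column lists from the feature names in Table 1. The names `PAE_GROUPS` and `ablation_setups` are assumptions introduced here, and the Price feature is left out because this section does not describe its role in the ablation.

```python
from typing import Dict, List

# Feature groups taken from Table 1 (Price omitted; see the note above).
PAE_GROUPS: Dict[str, List[str]] = {
    "P": ["Year", "Elvt", "RmNum", "HllNum", "KchNum", "BthNum", "Lat", "Lng"],
    "A": ["TspNum", "TspDst", "AtrNum", "AtrDst", "EdcNum", "EdcDst",
          "HthNum", "HthDst", "RstNum", "RstDst", "RtlNum", "RtlDst"],
    "E": ["AgrPct", "DstPct", "HppPct", "SadPct", "FeaPct"],
}


def ablation_setups(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return one column list per setup: each group removed once, plus all."""
    setups: Dict[str, List[str]] = {}
    for dropped in groups:
        # Keep every column whose group is not the one being ablated.
        setups[f"without {dropped}"] = [
            col for name, cols in groups.items() if name != dropped for col in cols
        ]
    setups["with PAE"] = [col for cols in groups.values() for col in cols]
    return setups


for name, cols in ablation_setups(PAE_GROUPS).items():
    print(f"{name}: {len(cols)} features")
```

Each column list can then be used to select the inputs for one training run, so that every model is trained and evaluated once per setup.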


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

