Next Article in Journal
A Comprehensive Survey of Privacy-Enhancing and Trust-Centric Cloud-Native Security Techniques Against Cyber Threats
Previous Article in Journal
A Multitask CNN for Near-Infrared Probe: Enhanced Real-Time Breast Cancer Imaging
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Leveraging Bluetooth and GPS Sensors for Route-Level Passenger Origin–Destination Flow Estimation

1
Smart Urban Mobility Institute, University of Shanghai for Science and Technology, Shanghai 200093, China
2
Shanghai Seari Intelligent System Co., Ltd., Shanghai 200063, China
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(8), 2351; https://doi.org/10.3390/s25082351
Submission received: 4 March 2025 / Revised: 2 April 2025 / Accepted: 6 April 2025 / Published: 8 April 2025
(This article belongs to the Section Navigation and Positioning)

Abstract

:
Accurate estimation of passenger origin–destination (OD) matrices is critical for optimizing public transportation systems, yet conventional methods face challenges, such as incomplete alighting data, high infrastructure costs, and privacy concerns. With existing GPS sensors and the additional deployment of a single low-cost Bluetooth sensor (10–20 US dollars) per bus, the proposed method can derive passenger OD flow without requiring passengers to tap in or tap out. The GPS sensor updates the bus locations, and the Bluetooth sensor receives signals from surrounding devices, including those onboard devices and nearby external devices. A Fuzzy C-Means clustering algorithm was employed to differentiate passenger and non-passenger devices based on detected indicators, such as detection frequency, signal strength, vehicular mobility, etc. Validation on Shanghai’s Fengpu BRT line demonstrated 91.22–96.02% accuracy in boarding proportion estimation and 95.18–95.52% for alighting during peak hours. Compared to the historical data-based method, the proposed method achieved higher similarity to ground truth and reduced the mean squared error by 12.89–69.95%.

1. Introduction

Accurate estimation of passenger origin–destination (OD) matrices is fundamental for optimizing public transportation systems, enabling evidence-based route planning, resource allocation, and service quality improvements [1]. Traditional methods face inherent limitations: manual surveys incur substantial labor costs and sampling biases [2], while automated fare collection (AFC) systems in open-loop payment environments frequently lack complete alighting information [3]. Although emerging technologies like Wi-Fi and video analytics have demonstrated potential to address these gaps, their implementation often encounters barriers, including high infrastructure costs [4] and privacy controversies [5]. These single-source approaches also exhibit inherent limitations—video analytics demands significant computational resources [3], while Wi-Fi-based solutions struggle with signal overdetection [6]—highlighting the need for multimodal integration.
This study proposes a novel, cost-effective methodology for OD estimation by integrating in-vehicle Bluetooth sensors and GPS data. Unlike existing approaches that rely on high-density sensor deployment or complex algorithms, our method requires only one low-cost Bluetooth sensor ($10–20 USD) per bus combined with widely available GPS and smart card datasets. Bluetooth technology offers advantages in device penetration (≈12–80% of passengers carry detectable devices) and spatial resolution [7,8], while GPS provides temporal–spatial vehicle context to filter non-passenger signals.
The proposed system simultaneously addresses two persistent challenges in transit OD analytics: (1) Significant reduction in hardware costs: This paper only requires one Bluetooth sensor (costing 10–20 yuan) plus existing GPS data. Compared with the scheme in [6] that requires the deployment of over 50 in-vehicle Wi-Fi probes (with a cost of 50 yuan per device), the cost per vehicle is reduced by 92–98%. When compared with the intelligent camera in the video surveillance scheme, which costs $5800 per vehicle as described in [9], the cost is reduced by 99.6%. (2) Estimation of OD flows between stations: By detecting the initial and final time of passengers’ addresses, the OD between stations for all passengers can be obtained. Previous studies only monitored the passenger flow of boarding and alighting at each station [10].
The remainder of this paper is structured as follows. Section 2 reviews the existing literature, discusses studies employing various methods for OD estimation, highlights the limitations of current approaches, and underscores the advantages of the present research. Section 3 proposes the feature calculation model and presents a clustering model designed to distinguish passengers from non-passengers. Subsequently, Section 4 develops the FCM algorithm for implementing the passenger/non-passenger differentiation clustering model. Section 5 conducts an empirical examination and provides a comparative analysis of the experimental results. Finally, Section 6 summarizes the key contributions and conclusions of this study.

2. Literature Review

2.1. Automated Fare Collection (AFC)-Based Approaches

Smart card data have become a primary OD estimation source due to their boarding time and location records. However, most AFC systems in bus networks only capture origins, requiring alighting stops to be inferred probabilistically. Lee et al. (2022) developed a trip-chaining algorithm using temporal patterns and historical records [11], achieving 60–74.9% destination matching accuracy. Similarly, Zhao et al. (2025) integrated AFC data with gravity models and entropy-weighted ensemble costs to estimate multimodal OD matrices, reducing mean absolute percentage errors (MAPE) to 12% [12]. While these methods improve upon manual surveys, their dependence on recurrent travel patterns limits applicability for irregular passengers [13].

2.2. Wireless Signal Detection Methods

Bluetooth and Wi-Fi sensing technologies enable passive OD estimation by tracking mobile devices. Early implementations by Kostakos et al. (2013) achieved 80% accuracy in reconstructing bus passenger flows using Bluetooth scanners, though device penetration rates (≈12%) constrained scalability [7]. Subsequent studies improved detection ranges and filtering algorithms: Pu et al. (2019) combined Fuzzy C-Means clustering with random forest regression to distinguish passengers from ambient signals, achieving 91% reidentification accuracy [10]. However, Wi-Fi/Bluetooth methods face challenges in dense urban environments due to signal interference and overdetection of non-passengers [6]. Recent case studies, further highlight the scalability challenges of dense signal-based approaches [14].

2.3. Video Analytics and IoT Solutions

Computer vision techniques extract OD data from surveillance footage by reidentifying passengers across stops. Zhang et al. (2025) [3] employed appearance feature matching and historical alighting probabilities, reducing root mean square errors (RMSE) to 0.5 persons/stop. While accurate, video-based systems require extensive computational resources and raise privacy concerns [15]. Complementary IoT approaches, such as NFC/BLE ticketing systems [16], provide precise OD data but necessitate passenger cooperation for check-in/check-out actions.

2.4. Hybrid and Emerging Methodologies

Recent studies integrate multiple data sources to overcome single-mode limitations. Owais (2024) combined deep learning with traffic sensor data to predict OD flows, achieving a 23% reduction in RMSE [9]. Similarly, Minea et al. (2020) fused Bluetooth triangulation with AI filtering for indoor passenger tracking, though hardware costs remained prohibitive for large-scale deployment [17]. Machine learning advancements, particularly self-supervised frameworks [18], show promise in handling incomplete datasets but require extensive training data [19].

2.5. Limitations of Existing Approaches

Current OD estimation methods exhibit three critical shortcomings:
  • Data Sparsity: AFC-based models struggle with infrequent travelers, while wireless sensing suffers from device penetration variability [20].
  • Infrastructure Costs: Video analytics and dense sensor networks (e.g., 50+ vehicle fleets in Demetrio et al., 2024 [6]) incur high deployment expenses [21,22].
  • Real-Time Limitations: Most methods focus on historical analysis rather than dynamic OD monitoring [23].
The proposed Bluetooth–GPS fusion approach directly addresses these gaps by leveraging low-cost hardware and outputs the OD results between stations. By exploiting Bluetooth’s device ubiquity without requiring active passenger participation, this methodology enables scalable OD estimation—advancing beyond the constraints of prior single-modality systems. The Bluetooth sensor used in this article is shown in Figure 1.

3. Methodology

In this methodology section, the implementation of three sequential steps is described. Step 1 computes 9 features (4 MAC address features and 5 vehicular mobility features) to distinguish passengers from non-passengers. Step 2 employs the computed feature values as the input to a classification model, which is trained to identify passenger MAC addresses. Step 3 extracts boarding and alighting stations from the model-predicted passenger MAC addresses, thereby deriving the complete origin–destination (OD) matrices for all passengers. The methodology proposed in this study comprises three sequential phases:
Step 1: Feature Computation. Relevant information pertaining to Bluetooth devices detected during travel time intervals is acquired via in-vehicle Bluetooth detectors, while real-time geospatial coordinates (longitude and latitude) and subsequent station identifiers are obtained through onboard GPS detectors. The collected Bluetooth MAC address data and vehicle GPS data from the acquisition stage undergo preprocessing, after which a feature computation model is employed to derive eigenvalues for each Bluetooth MAC address. These eigenvalues serve as feature inputs for the clustering model designed to discriminate between passengers and non-passengers.
Step 2: Passenger–Non-passenger Differentiation. The computed eigenvalues of the Bluetooth MAC addresses are subjected to Fuzzy C-Means (FCM) clustering, yielding distinct clusters corresponding to passenger and non-passenger groups.
Step 3: Result Validation. For each Bluetooth MAC address within the passenger cluster, the initial and final detection timestamps are cross-referenced with the temporal markers in the vehicle GPS data. This enables the identification of the imminent station arrival time, which is subsequently designated as the boarding/alighting station for the corresponding address. By iteratively applying this procedure to all passenger MAC addresses, an estimated origin–destination (OD) matrix is generated. The accuracy of this estimation is rigorously evaluated against ground-truth OD information.
The parameters employed in this section are categorized in Table 1, organized according to the respective methodological components.
The schematic workflow of the proposed methodology is illustrated in the accompanying Figure 2.

3.1. Feature Computation

During the data acquisition phase, in-vehicle Bluetooth detectors are employed to identify Bluetooth-enabled devices (e.g., smartphones, laptops, Bluetooth earphones) within signal range during travel intervals. Concurrently, onboard GPS detectors capture real-time geospatial coordinates (longitude and latitude) and imminent station identifiers. The collected Bluetooth MAC address data and vehicle GPS data are processed through a feature computation model to derive characteristic metrics for each MAC address [24], as formalized below.

3.1.1. MAC Address Features

1.
Detection Times ( N m ): Total number of detections for MAC address m, quantified as the count of data entries containing m in the Bluetooth dataset.
2.
Detection Duration ( t m ): Temporal span between the initial and final detection instances of MAC address m, measured in seconds (s). This metric is computed as follows:
t m = t l a s t m t f i r s t m
3.
Mean RSSI ( R m ¯ ): The average received signal strength indication (RSSI) for MAC address m, measured in dBm. This metric characterizes the aggregate signal intensity level and is computed as follows:
R m ¯ = t = t f i r s t m t l a s t m R t m N m
4.
Maximum RSSI ( R max m ): The peak RSSI value observed for MAC address m, measured in dBm. This metric reflects either the closest physical proximity between the device and detector or the optimal signal interaction state, calculated as follows:
R max m = max R t f i r s t m m , R t 2 m m , , R t l a s t m m

3.1.2. Vehicular Mobility Feature

To address temporal misalignment between Bluetooth and vehicular data sampling frequencies, temporal synchronization of Bluetooth MAC address timestamps with vehicular GPS timestamps is performed to geolocate MAC address observations. This ensures spatiotemporal consistency for subsequent feature computation. A temporal synchronization protocol is devised as follows:
t τ B T m , τ m a t c h m t = t     i f   t τ G P S arg min t τ G P S t t o t h e r w i s e
Formula (4) specifies that, if a timestamp t in the detected timestamp set τ B T m of Bluetooth MAC address m exists within the vehicle GPS data’s timestamp set τ G P S , the corresponding real-time vehicle GPS data associated with timestamp t will be directly utilized. Otherwise, the algorithm selects timestamp t′ from vehicle timestamp set τ G P S that exhibits the minimal absolute difference from t, thereby implementing a mapping operation using the temporally closest vehicle GPS data entry. This temporal alignment mechanism addresses the inconsistency in sampling frequencies between Bluetooth and vehicular GPS data acquisition systems, guaranteeing that each timestamp of every MAC address maintains a corresponding GPS coordinate mapping.
T p r e v = t τ G P S | t < t f i r s t m , s n e x t t f i r s t m 1
T n e x t = t τ G P S | t > t f i r s t m , s n e x t t f i r s t m + 1
d f r o n t m = t = max T p r e v t f i r s t m l a t t + 1 l a t t 2 + l o n t + 1 l o n t 2   i f   T p r e v φ o t h e r w i s e
d b a c k m = t = t f i r s t m min T n e x t l a t t + 1 l a t t 2 + l o n t + 1 l o n t 2   i f   T n e x t φ o t h e r w i s e
D s t a r t m = min d f r o n t m , d b a c k m
1.
Initial Detection Distance ( D s t a r t m ): The distance between the geographic location where Bluetooth MAC address m is initially detected and the nearest transportation hub, measured in meters (m). This parameter is calculated using the following formula:
The set Tprev comprises all timestamps prior to t f i r s t m that correspond to the vehicle’s position immediately preceding the target transportation node s n e x t t f i r s t m − 1, while the set Tnext contains all timestamps subsequent to t f i r s t m that align with the vehicle’s position immediately following the target node s n e x t t f i r s t m + 1. For the computation of d f r o n t m , if Tprev is non-empty, d f r o n t m is calculated as the cumulative summation of geodesic distances between consecutive longitude and latitude coordinate points recorded from the last occurrence of s n e x t t f i r s t m − 1 to t f i r s t m . Conversely, if Tprev is empty, d f r o n t m is assigned a value of infinity (∞). Similarly, for d b a c k m , if Tnext is non-empty, d b a c k m is derived from the cumulative distance spanning from t f i r s t m to the first occurrence of s n e x t t f i r s t m + 1 in the trajectory data. If Tnext is empty, d b a c k m is likewise set to infinity (∞).
2.
Final Detection Distance ( D e n d m ): The linear distance, measured in meters (m), between the geographic position of the last detected instance of MAC address m and its nearest transportation node. The computational methodology for this metric aligns with that of the initial detection distance ( D s t a r t m ).
3.
Travel Distance ( S m ): The total travel distance, in meters (m), traversed by the vehicle between the first and last detection timestamps of Bluetooth MAC address m. This parameter is derived through the following formula:
S m = t t f i r s t m , t e n d m l a t t + 1 l a t t 2 + l o n t + 1 l o n t 2
The cumulative geodesic distance is computed by iteratively summing the distances between consecutive longitude and latitude coordinate pairs recorded in the vehicle’s trajectory data across all timestamps spanning from t f i r s t m to t l a s t m .
4.
Average Velocity ( v m ¯ ): The mean speed, expressed in meters per second (m·s−1), of the vehicle during the interval between the first t f i r s t m and last t l a s t m detections of MAC address m. This parameter is derived through the following formula:
v m ¯ = S m t l a s t m t f i r s t m
5.
Maximum Velocity ( v max m ): The peak instantaneous speed, measured in meters per second (m·s−1), attained by the vehicle during the detection period of Bluetooth MAC address m. This parameter is derived through the following formula:
v max m = max l a t t + 1 l a t t 2 + l o n t + 1 l o n t 2 Δ t t | t t f i r s t m , t l a s t m
For each time interval Δ t t , the instantaneous velocity is calculated, and by traversing all the timestamps, the maximum value of the instantaneous velocity v max m is obtained.

3.2. Passenger–Non-Passenger Differentiation

3.2.1. Objective Function

Let X = x 1 , x 2 , , x n denote the dataset of n MAC addresses, where each x n R 9 contains the 9 feature values. The FCM objective function J α minimizes as follows:
J α U , V = i = 1 c k = 1 n u i k α x k v i A 2
where
c: Number of clusters (fixed at 2: passenger/non-passenger)
U = u ik : Fuzzy partition matrix ( u ik ∈ [0, 1])
V = v 1 , v 2 : Cluster centroids
α : Fuzzification exponent ( α > 1, typically 2)
  A : Mahalanobis distance with covariance matrix A

3.2.2. Membership Update

Under probabilistic constraints i = 1 c u ik = 1 , membership grades are computed as follows:
u ik = j = 1 c x k v i A x k v j A 2 / α 1 1
Formula (14) indicates that, the smaller the Euclidean distance between x k and the center v i is, the larger the membership degree u ik will be. The exponential term enhances the competition between classes, and α controls the fuzziness degree of the distribution of the membership degree.

3.2.3. Centroid Update

Cluster centroids are recalculated as follows:
v i = k = 1 n u i k α x k k = 1 n u i k α
Centroids represent the weighted mean of all data points, where weights are the α -th power of memberships. This formulation emphasizes points with higher cluster affiliations.

4. Solution

4.1. Rationale for Clustering Algorithm Selection

The identification of passenger/non-passenger groups from MAC address signatures constitutes an unsupervised pattern recognition problem. Fuzzy C-Means (FCM) clustering is particularly appropriate for this task due to three inherent characteristics:
(a)
Label ambiguity: Ground truth labels for passenger status are typically unavailable in transit monitoring systems. FCM autonomously discovers latent cluster structures without requiring pre-classified training data [25].
(b)
Feature overlap: The nine-dimensional feature space exhibits nonlinear correlations between variables (e.g., detection duration vs. average speed). FCM’s probabilistic membership assignment handles overlapping cluster boundaries better than hard clustering methods [26].
(c)
Dimensional complexity: With nine heterogeneous features spanning temporal, spatial, and signal strength domains, the Mahalanobis distance metric in FCM effectively weights feature contributions during cluster formation.

4.2. Pseudocode for Solving the Model

In this study, the algorithm is designed based on the assumption that the proportion of passengers carrying Bluetooth devices who get on and off at each station is evenly distributed overall. The parameters were determined based on previous research findings [10]. The fuzzification parameter α was set to 2, with the number of clusters c established as 2 to represent passenger and non-passenger groupings. The convergence threshold ε in the algorithm was configured at 10−5, while the maximum iteration count T was fixed at 1000 to ensure computational efficiency (Algorithm 1).
Algorithm 1: Passenger Classification via FCM Clustering
Input: Normalized feature matrix X, Cluster count c = 2, Fuzziness exponent α = 2.0, Convergence threshold ε = 10−5, Max iterations T = 1000
Output: Membership matrix U, Cluster centroid matrix V
1. Initialize U 0 with random memberships satisfying i = 1 c u ik = 1
2. Compute initial centroids V 0 via Equation (15)
3. Repeat:
a. Update U t + 1 using Equation (14)
b. Update V t + 1 using Equation (15)
c. Calculate J α t + 1 via Equation (13)
d. Until J α t + 1 J m t < ε or t > T
4. Assign labels: class( x k ) = arg max i u ik

5. Model Validation

To validate the effectiveness of the proposed methodology, a comprehensive case study was conducted through the deployment of Bluetooth detectors along authentic bus routes coupled with onboard GPS devices. This section begins with a description of the experimental background, followed by a systematic analysis of the empirical results.

5.1. Experimental Context

Bluetooth MAC address data and vehicle-mounted GPS datasets were collected along the operational Fengpu Bus Rapid Transit (BRT) corridor in Shanghai, China.
As shown in Figure 3, the Fengpu BRT route comprises 13 stations. In January 2025, the proposed methodology underwent empirical validation through systematic evaluation in both directions of the BRT corridor. Two distinct operational scenarios were investigated: a morning peak-hour service (9:00–11:00) departing from Shendu Highway to Nanqiao Station, and an evening peak-hour service (17:00–19:00) operating along the same origin–destination corridor.
Implementation of the Proposed Method: The proposed method is automatically implemented through the following data inputs of the Fengpu Express Bus Line.
(1)
Install Bluetooth detectors (located in the middle section of the vehicle) and collect Bluetooth MAC address data during the journey, which serve as the input for the proposed method. The collected Bluetooth MAC address data include information such as detection time, MAC address, Category (BT/BLE), Rssi, etc.
(2)
Vehicle GPS data, which serve as the input for the proposed method. The collected vehicle GPS data include information such as detection time, longitude, latitude, direction of travel (indicating the driving direction of the vehicle), the station that the vehicle is about to arrive at, etc.
(3)
Collection of ground truth: Firstly, determine the arrival times of the target vehicle for two trips and the vehicle preceding the target vehicle (i.e., the vehicle that arrives at each station earlier than the target vehicle) at each station, as shown in Figure 4a,b. Passengers whose arrival times at the stations are between those of the target vehicle and the vehicle preceding the target vehicle are regarded as passengers taking the target vehicle. For example, in Figure 4a, for Station 2, the arrival time of the vehicle preceding the target vehicle is 10:02:05, and the arrival time of the target vehicle is 10:09:27. The APC data of passengers who swipe their cards to enter the station at Station 2 within this time interval (including the time of entering and exiting the station and the station number of getting on and off the vehicle) are regarded as the data of passengers getting on the target vehicle at Station 2, and so on.
The ground truth origin–destination (OD) data for the two trips are obtained according to the above rules. To visualize the ground truth OD data, a Sankey diagram is adopted. As shown in the figure, on the left side are the alighting stations and the number of passengers alighting at each station, while on the right side are the boarding stations and the number of passengers boarding at each station. The lengths of the bar charts on both sides represent the passenger flow of boarding and alighting at the bus stops, and the numbers in parentheses indicate the number of boarding and alighting times. The width of the segments connecting the boarding and alighting stations represents the OD flow between the stations. Figure 5a,b respectively show two examples of bus trips during the morning and evening peak on 15 January 2024. One bus travels from Shendu Highway to Nanqiao Station during the morning peak (9:56–10:49), and the other bus travels in the same direction during the evening peak (17:56–18:48).

5.2. Experimental Results

To evaluate the accuracy of the proposed method in estimating the OD matrix, this section first compares the boarding and alighting ratios estimated by the proposed method with ground truth values, followed by performance comparisons with alternative OD estimation methods to demonstrate the effectiveness of our approach.
(1)
To quantify the performance of the proposed method in predicting boarding and alighting ratios, a multi-stage computational approach is adopted. Specifically, the mean absolute error (MAE) between the BLE estimates and ground truth values is calculated, based on which the accuracy rate is derived. The detailed formulas and procedures are outlined as follows.
For comparative analysis of the BLE estimates and ground truth ratios, the true boarding ratio P T a i and alighting ratio P T b i at the i-th station are first computed separately, as expressed in Formula (16):
P T a i = T i a j = 1 13 T j a , P T b i = T i b j = 1 13 T j b
In Formula (16), T i a and T i b are the true values of the number of passengers boarding and alighting at the i-th station, respectively, and the denominator is the sum of the true values of the number of passengers boarding and alighting at the 13 stations.
Subsequently, calculate the proportion P B a i of the number of passengers boarding detected by Bluetooth and the proportion P B b i of the number of passengers alighting at the i-th station. The proportions of passengers boarding and alighting detected by Bluetooth are normalized as follows:
P B a i = B i a j = 1 13 B j a , P B b i = B i b j = 1 13 B j b
In Formula (17), B i a and B i b represent the number of passengers boarding and alighting detected by Bluetooth at the i-th station, respectively. The denominator is the total number of passengers boarding and alighting detected by Bluetooth at the 13 stations. The normalization process avoids the evaluation bias caused by the differences in traffic flow among stations, ensuring the comparability of the results.
To evaluate the consistency between the Bluetooth proportion and the true proportion, the mean absolute error (MAE) is adopted as the core indicator. The reason for choosing MAE instead of the mean squared error (MSE) is that MAE is less sensitive to extreme values and can more robustly reflect the average deviation of the proportion data [27].
M A E a = 1 13 i = 1 13 P T a i P B a i
M A E b = 1 13 i = 1 13 P T b i P B b i
In Formulas (18) and (19), MAEa is the mean absolute error of the proportion of the number of passengers boarding detected by Bluetooth, and MAEb is the mean absolute error of the proportion of the number of passengers alighting detected by Bluetooth.
Based on the MAE values of the Bluetooth detection proportions of passengers boarding and alighting, the corresponding accuracy rates are further calculated as follows:
A C C a = 1 M A E a × 100 %
A C C b = 1 M A E b × 100 %
In Formulas (20) and (21), ACCa represents the accuracy rate of the proportion of the number of passengers boarding detected by Bluetooth compared with the true value of the proportion of the number of passengers boarding, and ACCb represents the accuracy rate of the proportion of the number of passengers alighting detected by Bluetooth compared with the true value of the proportion of the number of passengers alighting.
The accuracy rates calculated by the above method are shown in Table 2.
During the morning peak, the accuracy rate of the proportion of the number of passengers boarding detected by this method is 91.22%, and the accuracy rate of the proportion of the number of passengers alighting is 96.02%. During the evening peak, the accuracy rate of the proportion of the number of passengers boarding detected by this method is 95.18%, and the accuracy rate of the proportion of the number of passengers alighting is 95.52%. The results indicate that this method has a high degree of consistency with the true values in predicting the proportions of the number of passengers boarding and alighting, especially in the detection of the proportion of the number of passengers alighting.
(2)
Comparison with the method [28] of estimating the origin–destination (OD) based on historical boarding and alighting data (HD-based): In this part, we will select three indicators that are widely used to evaluate the prediction accuracy, namely the mean squared error (MSE), the mean absolute error (MAE), and the similarity, to analyze the OD values estimated by different methods. Moreover, during the calculation process, the data of Station 1 and Station 13 are excluded to avoid the interference of possible special situations on the results.
(a)
Mean squared error (MSE). The mean squared error is a commonly used indicator for measuring the degree of deviation between the predicted values and the true values. By averaging the squares of the prediction errors, it can comprehensively reflect the overall deviation of the prediction results. Its calculation formula is shown as follows:
M S E = 1 n i = 1 n y i y ^ i 2
In the Formula (22), n represents the sample size, y i represents the true value of the i-th sample, and y ^ i is the estimated value of the i-th sample.
(b)
Mean absolute error (MAE). The mean absolute error is another important indicator for evaluating the prediction accuracy. It directly calculates the average of the absolute errors between the predicted values and the true values and intuitively reflects the average degree of deviation between the prediction results and the true values. Its calculation formula is shown as follows:
M A E = 1 n i = 1 n y i y ^ i
(c)
Similarity. The cosine similarity is employed to measure the degree of similarity between the estimated origin–destination (OD) matrices. The reason for using the cosine similarity is that it can effectively quantify the angular similarity between two vectors without being affected by the magnitudes of the vectors. This is particularly useful when comparing the OD flow patterns estimated by different methods. In the context of OD matrix estimation, vectors represent the passenger flow distribution between different origin–destination pairs. The cosine similarity can highlight the similarity of flow patterns without being overly influenced by the absolute values of the flow volume, thus more meaningfully measuring the similarity of the estimated OD structures. The calculation formula for the cosine similarity is as follows:
cos θ = A B A B = i = 1 n A i B i i = 1 n A i 2 i = 1 n B i 2
In Formula (24), A and B are vectors representing the OD matrices estimated by different methods (e.g., Bluetooth-based (BT-based) estimation and historical boarding-alighting data-based (HD-based) estimation). A i and B i are the elements at the i-th position in vectors A and B , respectively, and n is the number of elements in the vectors, which is the number of origin–destination pairs.
The numerator i = 1 n A i B i calculates the dot product of the two vectors, measuring the extent to which the values at each corresponding position of the two vectors point in the same direction. The larger the dot product, the more similar the values at the corresponding positions are. The denominator i = 1 n A i 2 i = 1 n B i 2 normalizes the vectors to ensure that the value of the cosine similarity lies within the range of [−1, 1]. When the two vectors are identical, the cosine similarity is 1, indicating a perfect match; when they are orthogonal (completely different in direction), the cosine similarity is 0; when they are in exactly opposite directions, the cosine similarity is −1.
Figure 6a,b demonstrate the performance of different OD estimation methods across varying operational scenarios.
As shown in Table 3, in the context of the morning peak period, the MSE index of the BT-based method proposed in this paper is 0.377, which is 0.129 lower than that of the HD-based method (0.506), thus demonstrating superior error control capability. Regarding the MAE index, although the difference between the two methods has narrowed (0.189 vs. 0.197), the BT-based method still maintains a relative advantage. It is noteworthy that the difference in the similarity index is the most pronounced. The BT-based method achieves a similarity of 0.763, significantly outperforming the HD-based method’s 0.727. This indicates that, when dealing with the complex traffic flow patterns during the morning peak, the BT-based method exhibits stronger anti-interference ability and adaptability.
Upon further analysis of the performance differences across different time periods, it is found that the BT-based method demonstrates more outstanding estimation performance during the evening peak period. Its MSE index is merely 0.114, which is 69.95% lower than that of the HD-based method (0.378). This significant difference validates that the BT algorithm has better parameter convergence during the evening peak period when the commuting regularity is higher. Meanwhile, in terms of the MAE and similarity indices, the BT-based method achieves excellent results of 0.083 and 0.782, respectively, which are 31.9% lower in MAE and 5.96% higher in similarity than those of the HD-based method (0.121 MAE and 0.738 similarity), further confirming its robustness in traffic state estimation.
It is particularly important to point out that the HD-based method shows significant fluctuations in the similarity index, reaching 0.727 and 0.738 during the morning and evening peaks, respectively, which reflects that this method is highly sensitive to scenarios with sudden changes in traffic flow. In contrast, the similarity performance of the BT-based method during the two peak periods ranges between 0.763 and 0.782, showing strong stability.

6. Conclusions

This paper proposes a method for estimating the origin–destination (OD) matrix of bus passengers based on the fusion of multi-source data from Bluetooth and GPS. Through the deployment of low-cost hardware and the design of intelligent algorithms, it effectively addresses the limitations of traditional OD estimation methods in terms of data integrity and real-time performance. The experimental results show that this method demonstrates significant performance advantages in real bus operation scenarios, providing a new technological path for the intelligent management of urban public transportation. Theoretically, this paper constructs a multi-dimensional feature calculation model that incorporates MAC address features and vehicle driving features. Through the Fuzzy C-Means (FCM) clustering algorithm, it achieves accurate differentiation between passenger and non-passenger devices. Compared with traditional inference methods based on historical travel patterns, this method achieves an accuracy rate of 0.9122–0.9602 for estimating the proportion of boarding passengers and 0.9518–0.9552 for estimating the proportion of alighting passengers during the morning and evening rush hours, verifying the adaptability of multi-source data fusion to complex traffic scenarios.
In terms of practical applications, this method adopts a lightweight deployment scheme of installing one low-cost Bluetooth sensor per vehicle (cost 10–20 US dollars), which significantly reduces the high investment required by traditional video analysis or dense sensor networks. Empirical analysis shows that, compared with the historical data (HD) method based on historical data, this method achieves a similarityof 0.763–0.782 during peak hours, outperforming the HD-based method’s 0.727–0.738, providing a feasible technical option for bus companies to implement real-time passenger flow monitoring and dynamic dispatching optimization. In addition, the system’s ability to update the OD data at a minute-level can effectively support intelligent transportation application scenarios, such as early warning for sudden large passenger flows and signal control for bus priority.
Despite the aforementioned achievements, this study still has certain limitations. Firstly, the dependence of feature calculation on vehicle driving parameters may affect performance in areas with low GPS positioning accuracy. Secondly, spatiotemporal fluctuations in Bluetooth device penetration rates may introduce sampling bias for specific demographic groups. Future research will focus on four strategic optimizations: (1) Developing adaptive clustering algorithms with sliding time windows and online learning mechanisms to address dynamic device penetration rates; (2) Constructing multimodal data fusion frameworks that integrate behavioral features from onboard cameras with spatial trajectory characteristics from Bluetooth signals for comprehensive passenger profiling; (3) Enhancing methodological generalizability through multi-route validation (categorized by spatial attributes, including downtown areas, suburbs, transportation hubs, and large facilities like hospitals/schools) and multi-temporal experiments (covering peak hours, off-peak periods, and holidays) [29]; (4) Conducting quantitative analysis of external factors, including device variability (sensor drift, firmware heterogeneity) and signal interference (multipath effects, electromagnetic noise), with error propagation modeling to improve data reliability. Through continuous algorithmic refinement, this method is expected to become a core technical component supporting the digital transformation of urban bus systems.

Author Contributions

Conceptualization, C.Z. and X.Y.; Data curation, Z.P.; Formal analysis, C.Z.; Investigation, J.X.; Methodology, J.X., C.Z. and X.Y.; Resources, Z.P.; Supervision, C.Z. and X.Y.; Validation, J.X. and Z.P.; Writing—original draft, J.X.; Writing—review and editing, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 52202401 and the Shanghai Pujiang Program under Grant No. 22PJC081.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Author Zhenxing Pan was employed by the company Shanghai Seari Intelligent System Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BTBluetooth
FCMFuzzy C-Means
MAEMean Absolute Error
MSEMean Squared Error
HDHistorical boarding and alighting data

References

  1. Sun, W.; Schmocker, J.D.; Fukuda, K. Estimating the route-level passenger demand profile from bus dwell times. Transp. Res. Part C Emerg. Technol. 2025, 130, 103273. [Google Scholar]
  2. Demissie, M.G.; Kattan, L. Estimation of truck origin-destination flows using GPS data. Transp. Res. Part E Logist. Transp. Rev. 2025, 159, 102621. [Google Scholar] [CrossRef]
  3. Zhang, C.; Chen, X.; Zhao, J.; Jiang, Z.; Chung, E. Estimating bus passenger origin-destination flow via passenger reidentification using video images. IEEE Trans. Intell. Transp. Syst. 2025, 1–15. [Google Scholar] [CrossRef]
  4. Shafaeipour, N.; Stanciu, V.D.; van Steen, M.; Wang, M. Understanding the protection of privacy when counting subway travelers through anonymization. Comput. Environ. Urban Syst. 2025, 110, 102091. [Google Scholar]
  5. Chung, M. Real-time passenger counting in streetcars using high sound frequency signals. IEEE Sens. J. 2025, 15, 965–970. [Google Scholar]
  6. Demetrio, A.; Elgner, F.; Hameister, H.; Quinting, M.; Warzok, D.; Wendel, J. Large-scale Wi-Fi and Bluetooth data collection for reconstructing passenger flows. J. Locat. Based Serv. 2024, 18, 185–204. [Google Scholar]
  7. Kostakos, V.; Camacho, T.; Mantero, C. Towards proximity-based passenger sensing on public transport buses. Pers. Ubiquitous Comput. 2013, 17, 1807–1816. [Google Scholar]
  8. Friesen, M.R.; McLeod, R.D. Bluetooth in intelligent transportation systems: A survey. Int. J. Intell. Transp. Syst. Res. 2014, 13, 143–153. [Google Scholar]
  9. Owais, M. Deep learning for integrated origin-destination estimation and traffic sensor location problems. IEEE Trans. Intell. Transp. Syst. 2024, 25, 6501–6513. [Google Scholar]
  10. Pu, Z.; Cui, Z.; Zhu, M.; Wang, Y. Mining public transit ridership flow and origin-destination information from Wi-Fi and Bluetooth sensing data. Transportation Research Part C. arXiv 2019, arXiv:1911.01282, 2019. [Google Scholar]
  11. Lee, I.; Cho, S.H.; Kim, K.; Kho, S.Y.; Kim, D.K. Travel pattern-based bus trip origin-destination estimation using smart card data. PLoS ONE 2022, 17, e0270346. [Google Scholar] [CrossRef] [PubMed]
  12. Zhao, D.; Mihaita, A.S.; Ou, Y.; Grzybowska, H.; Li, M. Origin-destination matrix estimation for public transport: A multi-modal weighted graph approach. Transp. Res. Part C Emerg. Technol. 2025, 165, 104694. [Google Scholar]
  13. Jin, M.; Wang, M.; Gong, Y.; Liu, Y. Spatio-temporally constrained origin-destination inferring using public transit fare card data. Phys. A Stat. Mech. Its Appl. 2022, 603, 127642. [Google Scholar]
  14. Ryu, S.; Park, B.B.; El-Tawab, S. WiFi sensing system for monitoring public transportation ridership: A case study. IEEE Trans. Intell. Transp. Syst. 2020, 24, 3092–3104. [Google Scholar]
  15. Servizi, V.; Persson, D.R.; Pereira, F.C.; Villadsen, H.; Bækgaard, P.; Peled, I.; Nielsen, O.A. “Is Not the Truth the Truth?”: Analyzing the impact of user validations for bus in/out detection in smartphone-based surveys. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11905–11920. [Google Scholar]
  16. Ferreira, M.C.; Dias, T.G.; Cunha, J.F. Anda: An innovative micro-location mobile ticketing solution based on NFC and BLE technologies. IEEE Trans. Intell. Transp. Syst. 2022, 23, 6316–6325. [Google Scholar]
  17. Minea, M.; Dumitrescu, C.; Costea, I.M.; Chiva, I.C.; Semenescu, A. Developing a solution for mobility and distribution analysis based on Bluetooth and artificial intelligence. Sensors 2020, 20, 7327. [Google Scholar] [CrossRef]
  18. Tan, Z.; Li, X.; Liu, Y. Bus passenger origin-destination flow estimation using entry-only smartcard data: A self-supervised learning method without alighting data. IEEE Trans. Intell. Transp. Syst. 2025, 26, 4808–4822. [Google Scholar]
  19. Li, H.; Wang, Y.; Xu, X.; Qin, L.; Zhang, H. Short-term passenger flow prediction under passenger flow control using a dynamic radial basis function network. Appl. Soft Comput. J. 2019, 83, 105620. [Google Scholar]
  20. Nguyen, K.A.; Wang, Y.; Li, G.; Luo, Z.; Watkins, C. Realtime tracking of passengers on the London Underground transport by matching smartphone accelerometer footprints. Sensors 2019, 19, 4184. [Google Scholar] [CrossRef]
  21. Fabre, L.; Bayart, C.; Bonnel, P.; Mony, N. The potential of Wi-Fi data to estimate bus passenger mobility. Technol. Forecast. Soc. Change 2025, 192, 122509. [Google Scholar]
  22. González, A.B.R.; Diaz, J.J.V.; Wilby, M.R. Detailed origin-destination matrices of bus passengers using radio frequency identification. IEEE Trans. Intell. Transp. Syst. 2025, 14, 141–152. [Google Scholar]
  23. Pu, Z.; Zhu, M.; Li, W.; Cui, Z.; Guo, X.; Wang, Y. Monitoring public transit ridership flow by passively sensing Wi-Fi and Bluetooth mobile devices. IEEE Internet Things J. 2020, 8, 474–486. [Google Scholar]
  24. Huang, Z.; Mireles de Villafranca, A.E.; Sipetas, C.; Quach, T. Crowd-sensing commuting patterns using multi-source wireless data: A case of Helsinki commuter trains. arXiv 2023, arXiv:2302.02661. [Google Scholar]
  25. Bezdek, J.C.; Ehrlich, R.; Full, W. FCM: The fuzzy c-means clustering algorithm. Comput. Geosci. 1984, 10, 191–203. [Google Scholar]
  26. Sun, H.; Wang, S.; Jiang, Q. FCM-based model selection algorithms for determining the number of clusters. Pattern recognition 2004, 37, 2027–2037. [Google Scholar]
  27. Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. Peerj Comput. Sci. 2021, 7, e623. [Google Scholar]
  28. Ji, Y.X.; Tuo, S.J.; Misulanilabi; Micle, D. Estimation method of bus passenger trip origin and destination based on boarding and alighting passenger counts. J. Tongji Univ. 2013, 41, 1020–1024+1118. [Google Scholar]
  29. Soltani Naveh, K.; Kim, J. Urban trajectory analytics: Day-of-week movement pattern mining using tensor factorization. IEEE Trans. Intell. Transp. Syst. 2019, 20, 2540–2549. [Google Scholar]
Figure 1. Illustration of the Bluetooth sensor.
Figure 1. Illustration of the Bluetooth sensor.
Sensors 25 02351 g001
Figure 2. Methodological framework.
Figure 2. Methodological framework.
Sensors 25 02351 g002
Figure 3. Fengpu BRT line.
Figure 3. Fengpu BRT line.
Sensors 25 02351 g003
Figure 4. The arrival times of the target vehicle and the vehicle preceding it as well as the time difference between their arrivals. (a) shows the arrival times of the target vehicle and the preceding vehicle at each station during morning peak hours, while (b) displays the arrival times of the target vehicle and the preceding vehicle at each station during evening peak hours.
Figure 4. The arrival times of the target vehicle and the vehicle preceding it as well as the time difference between their arrivals. (a) shows the arrival times of the target vehicle and the preceding vehicle at each station during morning peak hours, while (b) displays the arrival times of the target vehicle and the preceding vehicle at each station during evening peak hours.
Sensors 25 02351 g004
Figure 5. The ground truth of passenger flow OD. (a) shows the OD ground truth of the target vehicle during morning peak hours, while (b) displays the OD ground truth of the target vehicle during evening peak hours.
Figure 5. The ground truth of passenger flow OD. (a) shows the OD ground truth of the target vehicle during morning peak hours, while (b) displays the OD ground truth of the target vehicle during evening peak hours.
Sensors 25 02351 g005
Figure 6. The performance of passenger flow origin–destination (OD) estimation methods in different scenarios. (a) shows the performance comparison of different OD estimation methods during morning peak hours, while (b) displays the performance comparison of different OD estimation methods during evening peak hours.
Figure 6. The performance of passenger flow origin–destination (OD) estimation methods in different scenarios. (a) shows the performance comparison of different OD estimation methods during morning peak hours, while (b) displays the performance comparison of different OD estimation methods during evening peak hours.
Sensors 25 02351 g006
Table 1. Notations.
Table 1. Notations.
Feature Computation
mBluetooth MAC address
tTimestamp
L o n t m Longitude of Bluetooth MAC address m detected at timestamp t
L a t t m Latitude of Bluetooth MAC address m detected at timestamp t
N m Total detection count of Bluetooth MAC address m
t m Detection duration of Bluetooth MAC address m
t l a s t m Last detection timestamp t for Bluetooth MAC address m
t f i r s t m First detection timestamp t for Bluetooth MAC address m
R m ¯ Average RSSI value of Bluetooth MAC address m
R max m Maximum RSSI value of Bluetooth MAC address m
R t m RSSI value recorded when detecting Bluetooth MAC address m at timestamp t
τ G P S Timestamp set of vehicle GPS data
τ B T m Bluetooth timestamp set of MAC address m
τ m a t c h m Matched timestamp set after aligning Bluetooth timestamps with vehicle GPS timestamps
s n e x t t Upcoming station number of the vehicle at timestamp t
T p r e v Timestamp set corresponding to (upcoming station number−1) before first detection time
T n e x t Timestamp set corresponding to (upcoming station number + 1) before first detection time
d f r o n t m Distance from preceding station for Bluetooth MAC address m at timestamp t
d b a c k m Distance to subsequent station for Bluetooth MAC address m at timestamp t
D f i r s t m Minimum distance to the nearest station during first detection of Bluetooth MAC address m
D e n d m Minimum distance to the nearest station during last detection of Bluetooth MAC address m
S m Total travel distance between first and last detection instances of Bluetooth MAC address m
v m ¯ Average travel speed during detection period of Bluetooth MAC address m
v max m Maximum instantaneous speed during detection period of Bluetooth MAC address m
Δ t t Time interval between consecutive timestamps t + 1 and t
Passenger–Non-passenger Differentiation
J α Objective function
UMembership matrix
VCluster centroid matrix
x k k-th data sample
u i k Membership degree of sample k to cluster i
v i i-th cluster centroid
nNumber of MAC addresses
α Fuzzification exponent
cNumber of clusters (fixed at 2: passenger/non-passenger)
ACovariance matrix
XInput feature matrix
Result Validation
iThe i-th station
T i a Ground-truth boarding count at station i
T i b Ground-truth alighting count at station i
B i a Bluetooth-detected boarding count at station i
B i b Bluetooth-detected alighting count at station i
P T a i Ground-truth boarding proportion at station i
P T b i Ground-truth alighting proportion at station i
P B a i Bluetooth-detected boarding proportion at station i
P B b i Bluetooth-detected alighting proportion at station i
Table 2. The accuracy rate of the proportion of the number of alighting passengers and boarding passengers detected by Bluetooth.
Table 2. The accuracy rate of the proportion of the number of alighting passengers and boarding passengers detected by Bluetooth.
ScenarioACCaACCb
Morning Peak91.22%96.02%
Evening Peak95.18%95.52%
Table 3. The performance of passenger flow origin–destination (OD) estimation methods in different scenarios.
Table 3. The performance of passenger flow origin–destination (OD) estimation methods in different scenarios.
ScenarioOD Estimation MethodMSEMAESimilarity
Morning PeakBT-based0.3770.1890.763
HD-based0.5060.1970.727
Evening PeakBT-based0.1140.0830.782
HD-based0.3780.1210.738
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xu, J.; Pan, Z.; Zhang, C.; Yang, X. Leveraging Bluetooth and GPS Sensors for Route-Level Passenger Origin–Destination Flow Estimation. Sensors 2025, 25, 2351. https://doi.org/10.3390/s25082351

AMA Style

Xu J, Pan Z, Zhang C, Yang X. Leveraging Bluetooth and GPS Sensors for Route-Level Passenger Origin–Destination Flow Estimation. Sensors. 2025; 25(8):2351. https://doi.org/10.3390/s25082351

Chicago/Turabian Style

Xu, Junming, Zhenxing Pan, Cheng Zhang, and Xiaoguang Yang. 2025. "Leveraging Bluetooth and GPS Sensors for Route-Level Passenger Origin–Destination Flow Estimation" Sensors 25, no. 8: 2351. https://doi.org/10.3390/s25082351

APA Style

Xu, J., Pan, Z., Zhang, C., & Yang, X. (2025). Leveraging Bluetooth and GPS Sensors for Route-Level Passenger Origin–Destination Flow Estimation. Sensors, 25(8), 2351. https://doi.org/10.3390/s25082351

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop