1. Introduction
Urban planning aims to create orderly, functional, and sustainable development within urban areas, including efficient transportation that improves mobility and ensures safety for all road users [1,2]. In cities, roads serve as crucial infrastructure, acting as the physical network for the movement of people and cargo [3,4]. The movement of vehicles across the road network results in damage (for example, potholes and distresses) to the road surface [5]. Additionally, various municipal facilities such as manholes and speed breakers are distributed alongside roads [6,7]. All of these elements contribute to different types of bump features of the road surface (BFRSs). Information regarding these road features is vital for urban road maintenance and planning.
Therefore, sensing the surface condition to gather information such as BFRSs is essential for urban transportation [8,9]. Traditional methods of road surface sensing rely on professional equipment and personnel, such as LiDAR, photogrammetry, and remote sensing [8,10,11,12,13]. These methods are time-consuming and labor-intensive, making it challenging to keep up with rapidly changing urban road conditions. However, the individuals who use these roads also record their actual conditions. These user-generated, crowd-sourced trajectories have become a vital data source for gathering road information [14,15]. Consequently, in recent years, there has been a shift from professional to crowd-sourced equipment for road condition sensing [16]. The use of smartphones as a source of crowd-sourced data for road information is gaining attention [17,18,19,20], as smartphones can provide abundant road surface information [21]. However, there are challenges in extracting road features, especially in expressing road surface information and modeling movement information [22,23]. These challenges include effectively fusing the data of a smartphone's multiple sensors (GPS, orientation sensors, and accelerometers), detecting features in these data, and integrating the detections [6,24,25,26]. Addressing these three key issues (multi-sensor data fusion, feature detection, and data integration) is crucial within the framework of crowd-sourced data for collecting BFRSs and sensing the condition of urban road surfaces.
To address these issues, this article proposes the Detecting and Clustering Framework (DCF) for sensing urban road surface conditions. The method collects BFRSs from crowd-sourced trajectories acquired from smartphones. Movement features are first extracted from the recorded trajectories and represented using the wavelet scattering transformation [27]. The BFRSs are then detected using Long Short-Term Memory (LSTM) networks. Finally, the BFRSs from crowd-sourced trajectories are represented and integrated based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The contributions of this study are as follows: (1) the study explores movement features from recorded trajectories and applies a spatial transformation based on the physical structure of multiple sensors; (2) the bump feature is encoded using the wavelet scattering transformation, and a neural network is designed to detect BFRSs from crowd-sourced trajectories, with comparisons made to other methods; and (3) the BFRSs from crowd-sourced datasets are represented and integrated with movement information using the proposed two-stage clustering method.
The rest of this article is organized as follows. Section 2 presents an overview of work related to this article. Section 3 provides a detailed description of the BFRS collection method, which is based on crowd-sourced multi-sensor stream data from smartphones. Experiments and results are discussed in Section 4, followed by conclusions and future work in Section 5.
2. Related Work
From the perspective of road information sensing and the related datasets, existing methods can be broadly categorized into two dimensions: those oriented toward linear road connectivity (the xy dimension) and those targeting the road surface's vertical dimension (the z dimension).
(1) Methods focused on linear road connectivity. These methods encompass geometric structure detection [11,28], map construction [18], map matching of trajectories [25,29], connection relations and dead-reckoning [19,20,22], and road traffic flow prediction [23,30,31], focusing on aspects such as the geometric shape of the road network, the topological relations of road junctions, and traffic congestion. Regarding the geometric shape of the road network, computations are usually performed for the centerlines of road segments, the geometric shape of road connections, and the xy dimensions of curbs. Traditional data collection techniques in this field include field survey measurements, vehicle-borne LiDAR, photogrammetry, and satellite image analysis. Hu et al. [32] extracted road centerlines from LiDAR point clouds by combining mean shift, tensor voting, and Hough transform methods, while Wei et al. [11] used a CNN-based segmentation framework to segment roads from remote sensing images into intensity maps and extracted road segments through an iterative search strategy. On the other hand, crowd-sourced trajectory data mining methods, which use non-professional devices such as smartphones, offer a rapid and convenient alternative [1]. Additional methods and road information can be found in Siegemund et al. [28], Lyu et al. [18], and Wang and Bao [4]. With the help of trajectories, these approaches efficiently extract time-sensitive information such as road network structure, real-time traffic flow, road node connectivity, and vehicle movement routes. Therefore, trajectory-based road information sensing methods are widely used for analyzing the topological relations of road junctions. Huang et al. [22] computed turn information from trajectories, concentrated trajectory points around intersections, and detected intersections and road segments based on this information; their method deals with the 3D structure of roads and reconstructs complex road networks from sparsely sampled trajectories. Moreover, crowd-sourced trajectories can be applied to detect current traffic rules at road intersections [19] and to enhance the road network with updated road information by identifying existing errors [20]. In such situations, map matching between trajectories and road networks is usually conducted, involving data preprocessing, semantic rule enhancement, and similarity metric computation [29]. However, robust map matching results are difficult to acquire due to signal loss and instability in urban areas, requiring additional sensors such as inertial measurement units (IMUs) [25]. In terms of traffic congestion, linear road connectivity information is crucial. Byun et al. [23] used unmanned aerial vehicle (UAV) images to estimate vehicle speeds with deep learning neural networks, detecting and tracking vehicle movements by computing the image scale from lane distances; this method is employed for road traffic monitoring. Furthermore, road traffic information can also be computed from trajectories. Sun et al. [30] matched collected trajectory data to the road network using hidden Markov models (HMMs), computed the average speed of each related road segment, and predicted road traffic congestion using different neural network models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRUs). Similar work has been conducted by Bogaerts et al. [31], who designed and applied a CNN-LSTM neural network for short- and long-term traffic forecasting. This category emphasizes information on the road passage dimensions.
(2) Methods targeting the road surface's vertical dimension. This category includes methodologies focused on road surface materials, damage conditions, width, and municipal infrastructure information. The professional equipment used includes field measuring devices, LiDAR, and stereovision cameras, along with customized IoT sensor devices [5,6,7,8,10,33,34]. Kuduev et al. [10] and Bhatt et al. [6] utilized LiDAR to collect point cloud data of the road surface and detected road surface damage based on a threshold value of the distance difference between a locally fitted plane and the point cloud surface. Tan et al. [8] collected road surface information from UAV oblique images, generated a point cloud of the road surface based on photogrammetry theory, and computed the road surface condition using the constructed 3D road surface model. While these methods yield highly accurate road information with precise geometric fidelity, they are limited by the need for specialized equipment and personnel, and are time-consuming and labor-intensive, thus constraining their application in frequent road information updates [2,3,6]. Another kind of road surface information collection method belongs to the image-based methods [7]. Ren et al. [5] computed road surface damage information from street-view images based on an improved YOLOv5 network model that uses the generalized feature pyramid network (Generalized-FPN) structure. Such data acquisition methods improve the efficiency of traditional work. In addition, other kinds of IoT sensors can also be applied to achieve this goal. Mednis et al. [33] designed RoadMic based on a sound sensor and collected vehicle vibrations during movement, which can be used to detect potholes and road surface damage. With the continuous development of smartphone hardware capabilities, acquiring road surface information using multi-sensor smartphones has become increasingly feasible [22]. Zang et al. [17] mounted a smartphone on a bicycle, collected road surface information based on the recorded acceleration changes, and derived road surface changes using a threshold value method. In addition to location information from smartphone sensors, these methods often require data from supplementary sensors like cameras, accelerometers, gyroscopes, and audio sensors [9,14,26]. Li et al. [16] proposed a road surface detection method based on the continuous wavelet transform (CWT): the collected accelerations were preprocessed with reorientation and noise filtering to acquire actual changes in the road surface, then the CWT was applied to the accelerations and a threshold value was used to detect abnormal changes related to the road surface. However, the use of smartphones, a type of non-professional measuring device, introduces considerable noise and lower information density, making the extraction of road surface features and the integration of multi-source data challenging [12,20,25,34]. Since neural network models have the capability of feature encoding and detection, they are widely used to detect complex features. Varona et al. [21] designed an experiment to detect potholes in the road surface and analyzed and applied different deep learning models, including CNN, LSTM, and reservoir computing models, to detect road surface information. Chen et al. [26] detected road surface information based on the combination of the short-time fast Fourier transform and the wavelet transform with a CNN, and compared the results between a professional sensor and smartphone sensors, showing the feasibility of smartphone sensors. Based on the feature encoding and detection capability of neural networks, road information can be collected and represented with the help of crowd-sourced trajectories.
To obtain information about the condition of the road surface using crowd-sourced trajectories, the road surface information is calculated by detecting, representing, and integrating BFRSs. The trajectory data from multiple smartphone sensors are aligned with the real world and processed with movement information. Then, a feature map is encoded using wavelet scattering transformation and sent to the neural network. Finally, the road surface condition is visualized by representing and integrating the BFRSs from crowd-sourced trajectories. However, it is important to note that crowd-sourced smartphone data are limited in terms of accuracy and quality, and cannot replace professional surveying tools. Therefore, this article will not address the specific shape and measurement accuracy of BFRSs.
3. Methodology
3.1. Motivation and Background
Suppose the road surface in a local region can be considered a flat plane, on which various features exist, such as surface damage, potholes, and municipal facilities like speed breakers and gutters. These features play a crucial role in urban road maintenance and planning. In this article, BFRSs refer to surface features that vary in elevation by more than 3 cm and extend horizontally along the road for more than 4 cm. Furthermore, when vehicles pass over these features, the resulting vibrations are recorded by smartphone sensors and form crowd-sourced multi-sensor trajectories, as shown in Figure 1.
In these crowd-sourced trajectories, positional and vibrational data are captured using GPS sensors and an IMU that includes accelerometers, magnetometers, and gyroscopes. This information is then used to extract BFRSs and determine their locations. However, extracting BFRSs from the data of multiple sensors on a single smartphone may result in missing information. Nevertheless, by utilizing crowd-sourced trajectories from numerous smartphones, it becomes possible to capture comprehensive BFRSs. Based on the above description, a road surface condition sensing method is proposed that relies on crowd-sourced trajectories from smartphones. The process begins with a spatial transformation of the accelerations based on orientation data in order to obtain the acceleration perpendicular to the road surface. Then, by analyzing the motion trajectory, the vehicle's heading direction and motion characteristics are determined. Next, the wavelet scattering transform is used to encode bump features from the acceleration data, which are then input into an LSTM network to link BFRSs with their locations. Finally, a two-stage clustering method is applied to represent and integrate this information from the crowd-sourced data, ultimately providing the BFRSs and their location information. The workflow of urban road surface condition sensing based on the DCF is illustrated in Figure 2.
3.2. Movement Feature Computation for Multiple Sensor Recordings
To calculate BFRSs from the acquired multi-sensor trajectories, several preprocessing steps are required to prepare the data for analysis. These steps include spatial transformation and heading angle computation. The transformation ensures that the z-axis of the accelerometer is aligned perpendicular to the road surface, which is crucial for accurately measuring the vertical accelerations that indicate BFRSs. Furthermore, the recorded data from the location sensor are used to determine the vehicle's heading angle. The specific procedures for these preprocessing steps are as follows:
- (1) Spatial transformation based on the physical multi-sensor structure

During motion, a vehicle experiences acceleration changes both in the horizontal direction and perpendicular to the road surface, and these changes are recorded by the smartphone. However, the initial direction of the 3-axis orientation sensor differs from that of the accelerometer, and their z-axes usually point in opposite directions. Additionally, smartphones are placed in vehicles at arbitrary orientations. All of these factors make it difficult to accurately capture the acceleration changes caused by BFRSs. Therefore, coordinate alignment and spatial transformation are required, as shown in Figure 3.
In Figure 3a, the 3D axes of the orientation sensor are depicted using the right-hand coordinate system, with the upward direction as the default z-axis. However, in the acceleration recordings, the default z-direction points downward, in the same direction as gravity; therefore, coordinate alignment between orientation and acceleration is needed to acquire the actual acceleration changes. Suppose the 3D acceleration is denoted as Acc = [Accx, Accy, Accz]; the coordinate alignment process can then be denoted as Equation (1), where T is the 3 × 3 identity matrix with the third diagonal element set to −1:

Acc′ = Acc · T,  T = diag(1, 1, −1)    (1)
To eliminate the impact of the arbitrary positioning of smartphones and capture the impact of BFRSs on vehicle dynamics, the acceleration needs to be converted from the sensor coordinate system to the world coordinate system, as shown in Figure 3b. Assume the smartphone is placed horizontally in the vehicle with the accelerometer aligned with the world coordinate system; when it is placed in a different orientation, this orientation is recorded by the sensor as Ori = [α, β, γ], the rotation angles about the z-, y-, and x-axes. During the spatial transformation process, the world coordinate system can be determined from these 3D-axis orientation parameters. They enable the construction of a spatial extrinsic rotation matrix S, which transforms the acceleration of any spatial orientation to the current coordinate system, in the order of ZYX, as denoted in Equation (2):

S = Rz(α) · Ry(β) · Rx(γ)    (2)

In other words, the recorded acceleration data are spatially transformed data, and they need to be transformed back to the world coordinate system with the inverse matrix Sinv. However, the three axes of the orientation sensor are orthogonal to each other, i.e., the rotation matrix S is an orthogonal matrix. Hence, the inverse matrix Sinv can be calculated simply as the transpose of the rotation matrix, i.e., Sinv = ST, and the spatial transformation of the acceleration from an arbitrary coordinate system to the world coordinate system can be performed with the following equation:

Accworld = Sinv · Acc′ = ST · Acc′    (3)
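To make the transformation concrete, the following minimal Python sketch illustrates Equations (1)-(3) with NumPy and SciPy; the function name, angle names, and example values are illustrative assumptions rather than the implementation used in this study.

import numpy as np
from scipy.spatial.transform import Rotation

def to_world_frame(acc, azimuth, pitch, roll):
    # Align one 3-axis accelerometer sample with the world frame.
    # acc: [ax, ay, az] in m/s^2; angles in degrees (hypothetical names).
    # Equation (1): flip the z-axis so that the orientation and
    # acceleration recordings share the same coordinate convention.
    T = np.diag([1.0, 1.0, -1.0])
    acc_aligned = T @ np.asarray(acc, dtype=float)
    # Equation (2): S composes Rz(azimuth) . Ry(pitch) . Rx(roll);
    # SciPy's "ZYX" Euler sequence builds exactly this product.
    S = Rotation.from_euler("ZYX", [azimuth, pitch, roll], degrees=True).as_matrix()
    # Equation (3): S is orthogonal, so S_inv = S^T; rotate the
    # sensor-frame acceleration back into the world frame.
    return S.T @ acc_aligned

# A phone lying flat but rotated 30 degrees about the vertical axis:
# the gravity component stays on the world z-axis after transformation.
print(to_world_frame([0.1, 0.2, 9.8], azimuth=30.0, pitch=0.0, roll=0.0))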
- (2) Heading angle computation based on spherical trigonometry
In the context of bi-directional travel, BFRSs can be found on roads with different travel directions. To account for this, the heading angle is calculated from the vehicle's travel direction relative to north on the horizontal plane and is measured as the clockwise angle from north. Since GPS locations are recorded in longitude and latitude, the heading angle θ between points p1(lat1, lon1) and p2(lat2, lon2) can be computed based on spherical trigonometry [36].
(1) Convert the locations from degrees to radians based on radians = degrees × π/180, and represent the coordinates as p1(φ1, λ1) and p2(φ2, λ2), where φ denotes latitude and λ denotes longitude in radians.
(2) Apply the spherical trigonometry formula to calculate the heading angle θ based on Equation (4):

θ = atan2(sin Δλ · cos φ2, cos φ1 · sin φ2 − sin φ1 · cos φ2 · cos Δλ)    (4)

where Δλ = λ2 − λ1, and atan2 is an arctangent function that can handle angles in all four quadrants, with a return value between −π and π.
(3) Convert calculated radians back to degrees based on degrees = radians × 180/π.
(4) Adjust the angle to the range of 0 to 360 degrees; this can be performed by checking the value of θ and adding 360 degrees accordingly, as denoted by Equation (5):

θ = (θ + 360) mod 360    (5)

where the mod function returns the remainder after the division operation and ensures that 0 ≤ θ < 360.
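As a concrete illustration of steps (1)-(4), the short Python sketch below computes the heading angle between two GPS fixes; the function name and the sample coordinates are illustrative assumptions.

import math

def heading_angle(lat1, lon1, lat2, lon2):
    # Clockwise heading from north between two GPS fixes, in [0, 360).
    # Step (1): convert degrees to radians.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    # Step (2), Equation (4): four-quadrant arctangent, result in (-pi, pi].
    theta = math.atan2(
        math.sin(dlam) * math.cos(phi2),
        math.cos(phi1) * math.sin(phi2)
        - math.sin(phi1) * math.cos(phi2) * math.cos(dlam),
    )
    # Steps (3) and (4), Equation (5): radians to degrees, normalized to [0, 360).
    return (math.degrees(theta) + 360.0) % 360.0

# A point due east of the reference should give a heading of about 90 degrees.
print(heading_angle(30.0, 114.0, 30.0, 114.001))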
With the heading angle information, BFRSs distributed along opposite travel directions of the roadway can be distinguished, as described in the following sections.
3.3. BFRS Feature Encoding and Detection Based on Neural Network
After spatially transforming the multi-sensor trajectories, the acceleration signal for detecting BFRSs must be selected. It is crucial to preserve the instantaneous changes in acceleration when extracting BFRSs, because the acceleration data may contain noise and their quality can be inconsistent during sampling; applying direct filtering to remove the noise might unintentionally eliminate some useful BFRS signatures. Therefore, it is essential to preprocess the acceleration data before inputting them into the neural network. The specific steps are as follows:
- (1) Feature representation based on the wavelet scattering transformation

To encode the acceleration features, this article selects the wavelet scattering transformation and extracts the wavelet scattering spectrum matrix as the feature encoding result for the bump feature. The method combines the multi-scale analysis properties of wavelet transforms with the hierarchical structure of deep learning. The basic principles of the wavelet scattering transform include the following: (1) the wavelet transform, which provides localized information in both the time and frequency domains and uses a series of wavelet functions at different scales instead of the learnable convolution kernels of a CNN, and (2) a hierarchical structure, in which each layer of the network applies a set of wavelet filters and then computes the modulus of each filter's output, extracting features through a deep network structure without training. The transformation process is as follows:
(1) Pass the acceleration through a set of wavelet filters that localize the data x at different scales, and represent the bump feature as a series of wavelet coefficients that capture local features of the input data, as denoted by Equation (6), where ψj is the wavelet function and j represents the scale. Then, apply a modulus operation to each wavelet coefficient (denoted as U1x) to maintain the transformation's stability to local changes in the input data:

U1x = |x ⋆ ψj1|    (6)

(2) Process these modulus values through a low-pass filter φ to extract smooth features, and obtain the first-order result S1x, as denoted by Equation (7):

S1x = |x ⋆ ψj1| ⋆ φ    (7)

(3) Apply a similar process in the second and higher scattering layers to the output of the first layer after the modulus operation, and obtain the result U2x, which allows the capture of more complex and advanced features of the data, as denoted by Equation (8). Then, repeat the low-pass filtering and obtain the second-order result S2x, as denoted by Equation (9):

U2x = ||x ⋆ ψj1| ⋆ ψj2|    (8)

S2x = ||x ⋆ ψj1| ⋆ ψj2| ⋆ φ    (9)
The wavelet scattering transform extracts complex patterns of acceleration in a hierarchical manner, ensuring stability to input variations without the training process commonly used in deep learning models. Therefore, this article employs the wavelet scattering transform to encode features of changes in acceleration, as demonstrated in Figure 4.
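To illustrate Equations (6)-(9), the toy Python sketch below computes first- and second-order scattering coefficients for one acceleration window; the crude Morlet filters, the scale choices, and the global-average low-pass are simplifying assumptions (a practical implementation would use a dedicated scattering library such as kymatio), not the configuration used in this study.

import numpy as np

def morlet(n, scale):
    # Crude complex Morlet wavelet of length n at the given scale (samples).
    t = np.arange(n) - n // 2
    return np.exp(2j * np.pi * t / scale) * np.exp(-0.5 * (t / scale) ** 2)

def scattering_features(x, scales=(4, 8, 16, 32)):
    # Mirrors Equations (6)-(9): wavelet filter + modulus (U1), low-pass
    # average (S1), second wavelet layer on the modulus (U2), low-pass
    # average (S2).
    n, feats = len(x), []
    for i, s1 in enumerate(scales):
        # Equation (6): U1 = |x * psi_j1|, stable to small input shifts.
        u1 = np.abs(np.convolve(x, morlet(n, s1), mode="same"))
        # Equation (7): S1 = low-pass of U1 (a global average stands in for phi).
        feats.append(u1.mean())
        for s2 in scales[i + 1:]:  # only coarser second-layer scales carry energy
            # Equation (8): U2 = ||x * psi_j1| * psi_j2|.
            u2 = np.abs(np.convolve(u1, morlet(n, s2), mode="same"))
            # Equation (9): S2 = low-pass of U2.
            feats.append(u2.mean())
    return np.array(feats)

# Example on a 170-sample window (the sliding-window width used below):
window = np.random.randn(170)
print(scattering_features(window).shape)  # 4 first-order + 6 second-order = (10,)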
- (2) BFRS detection from trajectories based on the LSTM neural network

The encoded features are then input into the LSTM neural network to extract features from trajectories with temporal characteristics, allowing for the identification of BFRSs. LSTM is a specialized type of RNN that is well suited for handling and predicting long-term dependencies in sequence data. Within the network, LSTM operates on the principle of a “memory cell” that can retain its state over extended periods of time. Each memory cell comprises multiple components, including one or more “gate” structures (such as the input gate, forget gate, and output gate) as well as a cell state. The BFRS detection process based on LSTM involves the following steps:
(1) First, decide which information to discard from the cell state via the forget gate. This decision is based on the current input xt and the previous output ht−1, as denoted by Equation (10), where σ is the sigmoid function and Wf and bf are the weights and biases of the forget gate:

ft = σ(Wf · [ht−1, xt] + bf)    (10)

(2) Next, decide which new information to store in the cell state via the input gate, updating the value with a sigmoid layer. Then, create a new candidate value vector C̃t with a tanh layer, as denoted by Equation (11):

it = σ(Wi · [ht−1, xt] + bi),  C̃t = tanh(WC · [ht−1, xt] + bC)    (11)

(3) Update the old cell state Ct−1 to the new cell state Ct. The old state is multiplied by ft to “forget” certain information, and the new candidate values are added by multiplying it and C̃t, as denoted by Equation (12):

Ct = ft * Ct−1 + it * C̃t    (12)

(4) Finally, determine which part of the cell state will be included in the output. The output is determined by the cell state, but it first passes through a sigmoid layer to select the relevant part, as indicated by Equation (13):

ot = σ(Wo · [ht−1, xt] + bo),  ht = ot * tanh(Ct)    (13)
In these steps, W and b represent weights and biases, and “*” denotes element-wise multiplication. Due to its excellent ability to handle long sequence data, LSTM is widely used in areas such as time series prediction. Therefore, this article utilizes an LSTM neural network to detect BFRSs, as illustrated in Figure 4.
In Figure 4, there are two main steps in the network architecture:

(1) Encoding the features. The acceleration data are input into the wavelet scattering transform model, which automatically encodes the features using multiple wavelet functions at different scales. To calculate the local feature encoding for each consecutive recording, a sliding window with a width of 170 is used, an invariance scale of 0.5 is set, and an oversampling factor of 2 is applied to the acceleration recorded at 100 Hz.

(2) Detecting the features. The encoded features are then passed to the LSTM. The hidden LSTM layer size is set to 512, followed by a fully connected layer, a softmax layer, and a classification layer. Finally, each recorded position is assigned a probability of being a BFRS. The results are stored in a buffer queue with a length of 10, and the current result is considered a true BFRS if the number of detections exceeds the threshold (e.g., 5).
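A minimal PyTorch sketch of this detection stage is given below; the input feature dimension, the two-class output, and the 0.5 decision cutoff are assumptions for illustration, while the hidden size (512), buffer length (10), and vote threshold (5) follow the description above.

import torch
import torch.nn as nn
from collections import deque

class BFRSDetector(nn.Module):
    # LSTM(512) followed by a fully connected layer and softmax.
    def __init__(self, n_features=10, hidden=512, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return torch.softmax(self.fc(out[:, -1, :]), dim=-1)

# Buffer-queue voting: keep the last 10 window decisions and report a
# BFRS only when more than the threshold (e.g., 5) of them are positive.
model, votes, THRESHOLD = BFRSDetector(), deque(maxlen=10), 5
for _ in range(20):  # stand-in for a stream of encoded windows
    window = torch.randn(1, 1, 10)  # hypothetical scattering feature vector
    p_bfrs = model(window)[0, 1].item()  # probability of the BFRS class
    votes.append(p_bfrs > 0.5)
    is_bfrs = sum(votes) > THRESHOLD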
3.4. BFRS Representation and Integration Using Two-Stage Clustering
With the BFRSs initially detected in Section 3.3, interpreting the road surface condition is still difficult due to the quality-unstable recordings and the need to further represent and integrate the specific locations of BFRSs. Additionally, these BFRSs are derived from a single data source, which has limited coverage of the road surface. Therefore, to obtain comprehensive BFRS information, the crowd-sourced data are processed using the proposed neural network model, resulting in the detection of multiple, inconsistent BFRSs. To address these issues, a two-stage clustering method based on DBSCAN is proposed. DBSCAN is a density-based spatial clustering algorithm that identifies regions of high density as clusters and treats low-density areas as outliers or noise. There are two main concepts in the algorithm: (1) a core point, where a point p is considered a core point of dataset D if it has a sufficiently high number MinPts of points q within its neighborhood Nε(p) of distance ε, as denoted by Equation (14), and (2) density-reachability, where a point q can be reached from a core point p through a chain of points {p1, p2, …, pn} satisfying the density condition C, as denoted by Equation (15):

Nε(p) = {q ∈ D | dist(p, q) ≤ ε},  |Nε(p)| ≥ MinPts    (14)

p1 = p,  pn = q,  pi+1 ∈ Nε(pi)  with  C: |Nε(pi)| ≥ MinPts,  i = 1, …, n − 1    (15)

The specific process of DBSCAN is as follows:
(1) First, determine the two parameters: the neighborhood radius ε and the minimum number of points MinPts.

(2) For each point q in the dataset D, count the points within its neighborhood distance ε. If this count is greater than or equal to MinPts, mark it as a core point p.

(3) For each core point p, identify all density-reachable points q starting from it to form a cluster. Repeat this process until all core points have been visited.
(4) Among the non-core points q, those within the ε neighborhood of any core point p are marked as border points; the others are considered noise points.
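As a small illustration of the core-point test in Equation (14), the sketch below uses a brute-force neighborhood count; it is a didactic example, not an efficient or complete DBSCAN implementation.

import numpy as np

def core_points(D, eps, min_pts):
    # Equation (14): a point is a core point if at least min_pts points
    # (including itself) lie within distance eps of it.
    dists = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=-1)
    return (dists <= eps).sum(axis=1) >= min_pts

# Three tight points and one outlier with eps = 1.0 and min_pts = 3:
D = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [5.0, 5.0]])
print(core_points(D, eps=1.0, min_pts=3))  # [ True  True  True False]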
Based on the clustering method, the initial detection results are grouped according to distance in the first stage. In the second stage, each clustering group is re-clustered using road traffic direction information and heading angle information. The proposed two-stage clustering method is repeated to handle the crowd-sourced datasets, based on the BFRSs detected from different trajectories, as shown in Figure 5.
As shown in Figure 5, the BFRS representation and integration processes involve two stages of clustering operations. The first clustering fusion operation (Figure 5b) merges the detection results (Figure 5a) into BFRSs, and the second clustering fusion operation (Figure 5c) merges detection results from crowd-sourced data into BFRSs (Figure 5d).
- (1) In the single dataset

(1) By applying the clustering method based on spatial proximity distance, the sampling points detected by the LSTM network model are clustered. The clustering distance is usually set to 10 m.

(2) For each cluster result, the clustering method based on the heading angle is used to re-cluster the result. The clustering angle is typically set to 90°, separating BFRSs on lanes with different travel directions.

After the operation on a single dataset, the detection points of each group are fused using the detection probability as the weight. If the number of clustered points in each group is less than the threshold (e.g., 10), the LSTM detection results are converted to BFRSs.
- (2) In the crowd-sourced datasets

(1) The clustering method based on spatial proximity distance is used to cluster the BFRSs from crowd-sourced data, with the clustering distance typically set to 10 m.

(2) For each cluster result, the clustering method based on the heading angle is used to re-cluster the results, with the clustering angle typically set to 90°.
After performing the representation and integration operation, the points of each cluster are fused according to the number of points in each DBSCAN cluster. The fused results are then considered the final detection outcome, as shown in Figure 5.
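The two-stage procedure can be sketched with scikit-learn's DBSCAN as below; the metric (projected) coordinates, the min_samples=1 setting, and the circular angle metric are illustrative assumptions rather than the exact configuration of this study.

import numpy as np
from sklearn.cluster import DBSCAN

def angle_gap(a, b):
    # Circular distance between two headings (degrees); scikit-learn's
    # callable-metric interface passes each heading as a length-1 row.
    d = abs(a[0] - b[0]) % 360.0
    return min(d, 360.0 - d)

def two_stage_cluster(xy_m, headings, dist_eps=10.0, angle_eps=90.0):
    # First stage: spatial DBSCAN on coordinates in meters (eps = 10 m).
    spatial = DBSCAN(eps=dist_eps, min_samples=1).fit(xy_m)
    labels, next_id = np.full(len(xy_m), -1), 0
    for c in set(spatial.labels_):
        idx = np.where(spatial.labels_ == c)[0]
        # Second stage: split each spatial group by travel direction (eps = 90 deg).
        sub = DBSCAN(eps=angle_eps, min_samples=1, metric=angle_gap).fit(
            headings[idx].reshape(-1, 1))
        for s in set(sub.labels_):
            labels[idx[sub.labels_ == s]] = next_id
            next_id += 1
    return labels

# Two nearby detections on opposite lanes fall into different clusters,
# while a distant detection forms its own cluster:
pts = np.array([[0.0, 0.0], [3.0, 0.0], [200.0, 0.0]])
hdg = np.array([90.0, 270.0, 90.0])
print(two_stage_cluster(pts, hdg))  # e.g., [0 1 2]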
3.5. BFRS Collection Using Non-Homogeneous Spectrum Feature
The main steps of the urban road surface sensing method described in this article can be summarized into three phases: (1) The computation of movement features for multiple sensor recordings. This initial phase involves spatially transforming the raw trajectory data to obtain acceleration aligned with the world coordinate system and the vehicle’s heading angle. (2) The encoding and detection of BFRSs based on a neural network. In this phase, the acceleration undergoes a wavelet scattering transformation to effectively encode the acceleration features. The transformed data are then fed into an LSTM network to detect BFRSs. (3) The representation and integration of BFRSs using a two-stage clustering approach. The final phase focuses on representing and integrating detection results from single data sources and further fusing BFRS results from multiple sources. The specific algorithm is detailed below, as outlined in Algorithm 1:
Algorithm 1 The urban road surface condition sensing based on the DCF
Input: crowd-sourced trajectories from smartphones T
Output: road surface condition represented by BFRSs {BFRS}
FOR trajectory(i) IN T
    // phase 1: movement feature computation for multiple sensor recordings
    Acc′ = Acc · T; // coordinate alignment based on Equation (1)
    Accworld = ST · Acc′; // spatial transformation based on Equations (2) and (3)
    θ = HeadingAngle(trajectory(i)); // heading angle computation based on Equations (4) and (5)
    // phase 2: BFRS feature encoding and detection based on neural network
    FOR recording IN trajectory(i)
        feature = WaveletScatteringTransform(recording); // feature encoding based on Equations (6)–(9)
        BFRS(i)detected = LSTM(feature); // feature detection based on Equations (10)–(13)
    END
    // phase 3: BFRS representation and integration using two-stage clustering
    BFRS(i)group = Cluster(BFRS(i)detected, d); // first stage: clustering based on Equations (14) and (15)
    BFRS(i)cluster = Cluster(BFRS(i)group, θ); // second stage: clustering based on Equations (14) and (15)
END
// BFRS representation and integration for the crowd-sourced dataset
BFRSgroup = Cluster(BFRS(i)cluster, d); // first stage: clustering based on Equations (14) and (15)
{BFRS} = Cluster(BFRSgroup, θ); // second stage: clustering based on Equations (14) and (15)
RETURN {BFRS} // road surface condition represented by BFRSs