1. Introduction
Side roads are an important component of complex urban transportation networks. They typically connect other roads or serve surrounding areas such as residential and industrial zones, thereby alleviating traffic pressure on main roads. With the development of intelligent transportation systems and autonomous driving technologies, the demand for higher accuracy in electronic maps has grown significantly. Detailed side road information provides a foundational data layer for traffic navigation and flow management. Missing or inaccurate side road information can lead to increased traffic congestion and reduced navigation efficiency, particularly during peak hours. The real-time and accuracy requirements for electronic maps have driven rapid advances in map construction and inference technologies, especially in the field of road information extraction [1,2,3]. Comprehensive surveys such as Feng and Zhu [4] provide an overview of trajectory data mining techniques and their wide-ranging applications, highlighting the central role of trajectory-based approaches in intelligent transportation systems. Digital road information can be extracted from various sources, including laser point clouds, remote sensing imagery, and crowdsourced trajectory data [5]. Among these, laser point clouds are valued for their accuracy in road boundary detection, as demonstrated by Hervieu and Soheilian [6], who proposed a precise roadside detection and reconstruction method using LIDAR sensors. However, point clouds often suffer from sparsity and precision issues, making detailed boundary extraction difficult, especially under high-noise conditions [7]. Remote sensing imagery is susceptible to weather and sunlight conditions, and can suffer from occlusion or confusion due to physical objects [8]. In contrast, crowdsourced trajectory data contains rich implicit features and offers advantages such as low cost, high timeliness, and broad coverage, making it an important source for current digital road information extraction [9]. While trajectory-based road information extraction techniques are relatively mature, they predominantly target main roads and major intersections and face substantial challenges when applied directly to side roads. To overcome this limitation, we adapt and extend these methods by incorporating novel components, including semantic filtering and mutation interval analysis, to address the distinctive features of side roads. This focused approach not only complements existing methodologies but also fulfills practical needs for improved map accuracy and traffic management.
Currently, research on digital road information extraction based on trajectory data primarily includes road abstraction, incremental branching, and intersection linking [10]. These approaches treat trajectory data as input to either construct the map in a single step or incrementally expand the map over time. However, side roads, as roads adjacent to main roads, are harder to capture in the data because of irregular traffic flow and complex environmental occlusions. Existing studies tend to focus resources and attention on main roads and key nodes and often neglect side roads because of their lower traffic volume, so research in this area remains relatively scarce. The research most relevant to side road detection in digital road information extraction can be further categorized into two types: road-line-based and road-point-based. In road-line-based research, typical tasks include lane structure recognition and the extraction of road centerlines, road boundaries, and lane markings [11], with further refinement in lane recognition [12]. Road-point-based research mainly focuses on key road navigation points such as intersections, gas stations, and parking lots, along with auxiliary information [13].
In road-line-based research, various methods have been proposed to extract road features from trajectory data, especially for identifying road networks and lane-level information. For example, Li et al. [14] utilized a spatial linear clustering technique that infers road segments from GPS trajectories, detects missing roads, and validates existing road networks. Yang et al. [15] introduced a method using Delaunay triangulation to extract high-quality road boundary information from crowdsourced GPS vehicle trajectories. Li, Kulik [14] further combined the Delaunay triangulation method with a constrained Gaussian mixture model, achieving lane information extraction from low-precision floating vehicle data [16]. Zhou, Wang [17] proposed a method for urban road extraction based on floating car trajectory clustering, which considers the impact of trajectory point locations and directional angles. This approach uses path clustering to identify road contours and employs Delaunay triangulation to extract the road skeleton line. Many studies have developed clustering algorithms to analyze large-scale trajectory data, aiming to discover intrinsic movement patterns and extract meaningful structures. Methods such as partition-and-group trajectory clustering [18] and segmentation based on representativeness [19] share the common feature of performing clustering on segmented sub-trajectories. By decomposing complex trajectories into manageable segments, these approaches enhance the accuracy and interpretability of lane detection in large-scale trajectory datasets. For instance, Wagstaff, Cardie [20] introduced a new clustering algorithm to identify roads and compute lane structures. Uduwaragoda, Perera [21] applied kernel density clustering methods to detect lane numbers and positions from vehicle GPS trajectory data. Alsahfi, Almotairi [22] proposed a method using GPS trajectories to generate road maps by recognizing intersections and connecting them to construct road segments through a grid-based line simplification algorithm. Moreover, Wu, Zhang [23] used DeepDualMapper to combine aerial images and GPS trajectory data, generating high-precision maps. Li, Wang [24] introduced a time-varying road network model (TRNM), which efficiently represents dynamic topological relationships arising from time-dependent traffic control measures, supporting lane-level navigation in path planning. Yuan, Yue [25] further advanced this research by analyzing lane-change behavior to extract lane centerlines from high-precision trajectory data and inferring lane-level topological structures, leading to improved lane-level road network generation. These studies highlight the development of advanced methods for road network extraction and lane-level information retrieval, facilitating more accurate digital map construction and more efficient traffic management systems.
In point-based road recognition research, several methods have been developed to extract key locations such as intersections and points of interest from trajectory data. For instance, Xingzhe, Philips [26] proposed a method that extracts common sub-trajectories from GPS trajectories and uses local maxima in the density map of sub-trajectory end points to identify intersections. Zhang, Liu [27], addressing the density differences in trajectories on different road types, introduced an adaptive density equalization method and a turning distance ratio for identifying intersections. Their method effectively distinguishes intersections from points with similar movement characteristics, such as gas stations and parking lots. Deng, Huang [28] proposed a hierarchical trajectory rasterization strategy to address the spatial distribution heterogeneity of trajectory density. They also developed a full-process “conversion–segmentation–optimization” method for hierarchical road intersection extraction from a visual perspective. Yang, Tang [29] utilized a multi-level feature extraction strategy to achieve intersection recognition and layout detection, automatically generating lane-level intersection maps from crowdsourced trajectory data. Karagiorgou and Pfoser [30] employed hotspot analysis and point clustering based on triangulation to detect the spatial coverage of road intersections [31]. They used an improved hierarchical trajectory clustering algorithm and K-segment fitting to generate urban road intersection models. Li, Su [32] studied a personalized route guidance system called PaRE, which uses user trajectory data to extract points of interest or intersections from the road network. These studies present various methods for accurately identifying road points such as intersections from trajectory data, contributing to the improvement of road network mapping, navigation systems, and urban traffic management. An intuitive comparison of key point- and line-based road network extraction methods is provided in Table 1.
In summary, significant progress has been made in main road information extraction, yet systematic research on side road information remains relatively scarce. Studies focusing on road lines emphasize the extraction of general road features, using methods such as clustering algorithms, geometric features, and probabilistic classification models to identify road line objects [35,36]. Essentially, these methods identify segments with higher trajectory density, and the reliability of the results improves as trajectory density increases. However, they struggle to capture the geometric shape and topological structure of adjacent road segments. Because side roads are typically located adjacent to main roads and trajectory data is limited in precision, these methods tend to focus on the overall distribution of trajectory lines and overlook the spatial heterogeneity in the data, which makes it difficult to identify side road objects within adjacent areas. In research focused on road points, some progress has been made in intersection identification and road network simplification through feature engineering and clustering algorithms. However, these methods are still limited to key point extraction and have not examined the connectivity relationships of road networks in depth [37,38]. When spatial distribution information is extracted, different points of interest often exhibit distinct behavioral characteristics. Even though side roads have associated access points, existing methods still struggle to identify these access points. Therefore, the challenges in existing methods for extracting digital road information can be summarized as follows:
(1) They are not suited to extracting side road information, because side roads run parallel to the main road and are compactly distributed on both sides of it.
(2) They rely too heavily on the geometric information of the road network and road structure, neglecting the spatial distribution patterns of the trajectory data.
(3) They lack comprehensive consideration of the overall trajectory path morphology and of changes in spatiotemporal features, with most methods treating individual trajectory points as the main analysis objects.
Based on the principles of side road design and traffic flow characteristics analysis, trajectories on side roads are typically connected to the main road via access points. Therefore, accurately identifying side road access points is key to detecting side road information. Additionally, in the lane-change behaviors between the main and side roads, the trajectory paths exhibit significant differences in terms of morphology, turning, speed, and other driving behaviors, providing important evidence for the extraction of side road information. In light of these challenges, this study proposes a road segment-level side road detection method based on trajectory data, SRDet, which not only fills the research gap in the field of side road information detection but also effectively enhances the data-updating capability of navigation systems and the efficiency of road management. The contributions are as follows:
(1) A mutation interval identification method is proposed that extracts significant trajectory change features and combines them with side road lane-change driving behaviors, achieving precise classification of trajectory behaviors.
(2) A road segment classification model based on multimodal features is proposed, combining the linear features of main and side roads with the spatiotemporal distribution features of trajectories to filter potential side road segments.
(3) A method for access point identification based on mutation point density distribution and lane-change type is proposed, enabling access point localization and classification and the fitting of side roads.
2. Methodology
The trajectory points generated by vehicles on the road record the current driving status and location information. After matching with the corresponding road segments, these trajectory data can be used to extract feature information that reflects the road network structure [39]. Main and side roads are typically connected by access points, and the lane-change behaviors of vehicles at these access points exhibit specific spatiotemporal characteristics. These characteristics can reveal the connection relationship between main and side roads, serving as a reference for road segment classification. On this basis, this study proposes a side road detection method that uses trajectory data to determine whether a side road exists within a road segment and then identifies the specific locations of the main and side road access points, thereby generating road segment-level side road information and updating road segment navigation data. The overall process of the side road detection method is shown in Figure 1.
Firstly, during the data preprocessing stage, trajectory and OSM road network data are preprocessed to achieve the connection between trajectory points and road segments. Secondly, in the trajectory dynamic analysis, lane-change behavior features are taken as core feature variables to identify lane-change behavior pattern trajectory sequences. Furthermore, a stepwise optimization strategy is employed, where the trajectory mutation intervals are defined to quantify the interaction features between main and side roads. Then, in road segment classification, the point pattern analysis method is used to detect the spatial clustering characteristics of mutation points, and a multimodal feature space is constructed. A random forest model is used for road segment classification. Finally, for potential side road segments, kernel density estimation and density peak clustering methods are applied to accurately identify and classify the main and side road access points, thereby fitting and generating the complete topology of the side road.
Definition 1 Trajectory Sequence ($Tr$). $Tr = \{p_1, p_2, \ldots, p_n\}$, where $p_i \in Tr$ and $1 \le i \le n$. A trajectory is composed of a series of points arranged in chronological order [40], $p_i = (x_i, y_i, t_i)$, where $x_i$ and $y_i$ represent the spatial coordinates of trajectory point $p_i$, typically in longitude and latitude, and $t_i$ is the timestamp of $p_i$. The sampling interval between two consecutive points is $\Delta t$, and the sampling distance is $\Delta d$.

Definition 2 Mutation Interval ($MI$). $MI = \{p_s, p_{s+1}, \ldots, p_e\}$, where $1 \le s < e \le n$. $MI$ represents a subset of continuous trajectory points that meet specific spatiotemporal constraints, with each trajectory point $p_i \in MI$ satisfying both time and spatial change conditions. $MI$ is used to identify potential side road segments and serves as an input feature for the subsequent side road extraction process, providing guidance for accurately locating side road entrances and paths.

Definition 3 Road Segment ($R$). $R = \{r_1, r_2, \ldots, r_m\}$, where each road segment $r_j = (l^{start}_j, l^{end}_j, id_j, topo_j, type_j)$; $l^{start}_j$ and $l^{end}_j$ represent the spatial coordinates of the start and end points of $r_j$, $id_j$ is the unique identifier of $r_j$, $topo_j$ describes the topological attributes of $r_j$, indicating its adjacency relationships within the road network, and $type_j$ denotes the classification attribute of $r_j$, representing the road type.
2.1. Dynamic Analysis of Trajectory Lane-Change Patterns
To fully explore the geometric and dynamic features of trajectories, this study proposes a lane-change pattern recognition method based on core variables, aimed at dynamically selecting lane-change behavior trajectories. Additionally, a stepwise optimization strategy is employed, introducing the concept of mutation intervals to further reveal the intrinsic characteristics of trajectory sequences. By combining lane-change pattern recognition with mutation interval identification, this approach integrates both local and global features, providing data-driven guidance for Section 2.3.
2.1.1. Data Cleaning and Fusion
GPS trajectory data is susceptible to noise and anomalous trajectories due to interference from weather conditions or tall surrounding objects, and vehicles may appear stationary in complex environments such as traffic congestion. To improve the accuracy of trajectory pattern recognition, the GPS trajectory data is preprocessed through quality filtering, anomalous trajectory handling, trajectory segmentation, and stop detection. To ensure the accuracy of subsequent side road detection results, the crowdsourced OSM road network data is also preprocessed by removing low-level roads, handling road network interruptions, and merging road segments. Data cleaning ensures the quality of the input data and reduces noise interference.
Based on the processed trajectory data and road network topology, this study uses the Hidden Markov Model (HMM) for map matching to obtain the association information between road segments and trajectories [41]. The observation and transition probabilities are calculated from multiple features, such as the distance, direction, and sampling accuracy of trajectory points, along with the road segment direction. The Viterbi algorithm is then used to infer the optimal path, correcting instances of apparent reverse driving caused by trajectory point drift and connecting trajectory points to road segments [42]. This method combines the spatial location and dynamic features of trajectory points, ensuring the matching accuracy between trajectories and road segments. Because each trajectory point is mapped to a specific road segment in the road network, the matching result lays the foundation for later road segment-based classification and provides high-quality data input for lane-change pattern analysis.
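The Viterbi decoding step can be illustrated with a minimal sketch. It is not the matching model used in this study: the function name `viterbi_match`, the Gaussian emission with `sigma`, and the `switch_penalty` transition are hypothetical simplifications that use only the point-to-segment distance and segment connectivity, whereas the text above also uses direction and sampling accuracy.

```python
import math

def viterbi_match(observations, adjacency, sigma=10.0, switch_penalty=0.1):
    """Return the most likely road segment for each GPS point.

    observations: one dict per GPS point, {segment_id: distance_in_metres}.
    adjacency:    dict segment_id -> set of directly connected segment ids.
    sigma, switch_penalty: illustrative values, not taken from the paper.
    """
    def emission(dist):
        # Log of a zero-mean Gaussian on the distance (constant term dropped).
        return -0.5 * (dist / sigma) ** 2

    def transition(a, b):
        if a == b:
            return 0.0
        if b in adjacency.get(a, ()):
            return math.log(switch_penalty)
        return float("-inf")  # unconnected segments cannot follow each other

    # best[s] = (log-score of the best path ending on segment s, that path)
    best = {s: (emission(d), [s]) for s, d in observations[0].items()}
    for obs in observations[1:]:
        nxt = {}
        for s, d in obs.items():
            score, path = max(
                ((p + transition(prev, s), path) for prev, (p, path) in best.items()),
                key=lambda item: item[0],
            )
            nxt[s] = (score + emission(d), path + [s])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]
```

In this sketch, a drifting point that lies slightly closer to an unconnected segment (for example the opposite carriageway) is still assigned to the segment that keeps the path connected, which is the kind of correction described above.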
2.1.2. Lane-Change Pattern Recognition
Considering that vehicles often transition between the main road and side roads via access points, analyzing the temporal and spatial distribution characteristics of trajectory data at these access points becomes a crucial method for detecting side road information. In this study, lane-change behavior is defined as the continuous change in position and direction of a vehicle during its driving process. This behavior differs from actions such as entering a highway ramp, turning at an intersection, or overtaking on the main road, as it is influenced by the unique physical structure of access points and the dynamic interaction of vehicles. As a result, it exhibits distinct dynamic spatiotemporal distribution characteristics.
To effectively characterize this lane-change pattern, this study constructs two key variables: projection distance and angle distance, which are used to describe the geometric and dynamic characteristics of different lane-change patterns, respectively.
Definition 4 Projection Distance ($D$). $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i$ represents the projection distance of the trajectory point $p_i$ to the main road. Specifically, it is the perpendicular distance from the trajectory point to the line segment formed by the start and end points of the main road segment. If the trajectory point lies to the left of the road segment, the projection distance is positive; if it lies to the right, the projection distance is negative. The calculation of the trajectory’s projection distance is shown in Figure 2a.

Definition 5 Angular Distance ($\Theta$). $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, where $\theta_i$ represents the angular distance between the direction angle of trajectory point $p_i$ and the main road direction, calculated as shown in Figure 2a and given by Equation (1), where $\alpha_i$ represents the azimuth angle of trajectory point $p_i$, calculated from the latitude and longitude coordinates of trajectory points $p_i$ and $p_{i+1}$, which indicates the travel direction of the trajectory point, and $\beta$ denotes the road direction, which is aligned with the trajectory direction and is calculated from the latitude and longitude coordinates of the road’s start and end points. Figure 2b illustrates a specific trajectory scatter plot as an example, while Figure 2c,d show the line charts of angular distance variation and projection distance variation, respectively.

SRDet uses these two core variables, the projection distance and the angular distance, to quantify the dynamic features of the trajectory, enabling the quantitative filtering of lane-change behavior trajectories. Specifically, lane-change pattern recognition adopts a multidimensional constraint strategy, which, based on the geometric and dynamic changes in the trajectory and their statistical representation, quantifies the spatial distribution instability and directional mutations of the trajectory, further capturing lane-change behavior on main and side roads. Building on the classification concept of TraClass [43], SRDet introduces a statistics-based trajectory feature extraction technique that combines local and global feature statistics to enhance the evaluation of trajectory variability, achieving the quantitative classification of lane-change and non-lane-change trajectories. During the recognition process, a multidimensional feature selection framework is utilized to improve classification accuracy and enhance the repeatability and generalization ability of the results. On this basis, SRDet selects trajectory sequences that conform to the lane-change characteristics of main and side roads while filtering out data that does not meet the criteria.
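As a rough illustration of the two variables, the sketch below computes a signed projection distance and an angular distance. It rests on assumptions not stated in the text: coordinates are already projected to planar metres (x east, y north) rather than longitude and latitude, the distance is measured to the infinite line through the segment end points, and the angular difference is wrapped into [0°, 180°]. The function names are hypothetical.

```python
import math

def projection_distance(p, a, b):
    """Signed perpendicular distance from point p to the road line a -> b.

    Positive when p lies to the left of the travel direction a -> b,
    negative when it lies to the right (the sign convention of Definition 4).
    """
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return cross / math.hypot(bx - ax, by - ay)

def azimuth(p, q):
    """Heading from p to q in degrees, clockwise from north, in [0, 360)."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360.0

def angular_distance(heading, road_heading):
    """Smallest absolute difference between two headings, in [0, 180]."""
    diff = abs(heading - road_heading) % 360.0
    return min(diff, 360.0 - diff)
```

For a road running due east, a point 5 m to the north of the road gets a positive projection distance and a point to the south gets a negative one, so a drift from the main carriageway toward a side road on the right shows up as a falling signed distance.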
2.1.3. Mutation Interval Recognition
To further reveal the intrinsic features of trajectory sequences, SRDet adopts a multilayered optimization strategy for the fine extraction of mutation intervals. First, potential mutation points are detected based on the preprocessed trajectory data and core geometric variables (projection distance and angular distance). Then, an initial mutation interval is progressively constructed through temporal extension, describing the dynamic mutation range of the trajectory. Finally, the mutation point distribution and lane-change behavior types are output. The overall process is shown in Figure 3. During the extension process, the trend of the angular distance is analyzed based on preset threshold values, and interval boundaries are defined to improve the accuracy of mutation interval boundary recognition. In the interval merging process, for trajectories with multi-peak fluctuation characteristics, adjacent mutation intervals are further integrated to enhance the coherence of lane-change behavior.
To address the abnormal fluctuations that may arise from dynamic drift in some trajectory points, SRDet designs a fluctuation filtering mechanism to further optimize the interval quality in complex road network environments. Specifically, by combining the statistical characteristics of the changes in the azimuth angles at the beginning and end of the mutation intervals, SRDet sets dynamic angular constraints to capture the small-angle variation characteristics of real lane-change behavior (see Equation (2)), thereby excluding pseudo-mutation intervals caused by junction turns. The 30° threshold in Equation (2) is established based on highway design standards and geometric characteristics of mainline-side road configurations. According to highway design specifications of the Technical Standard for Highway Engineering in Suburban and Rural Town Areas, the geometric design of interchange ramps, acceleration/deceleration lanes, and weaving sections must satisfy specific angular constraints to ensure safe vehicle maneuvering. The stepwise optimization strategy retains the characteristics of true lane-change behavior, eliminates irrelevant pseudo-intervals, and enhances SRDet’s adaptability in complex traffic scenarios, providing higher-quality data input for subsequent feature construction.
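The detect, extend, merge, and filter steps can be pictured with the simplified sketch below. It is an illustration under stated assumptions rather than SRDet's procedure: the function name, the detection threshold `detect_deg`, the merge gap `max_gap`, and the one-point extension are hypothetical. Only the 30° limit comes from the text, and applying it to the heading difference between the first and last points of an interval is one reading of Equation (2), which is not reproduced here.

```python
def find_mutation_intervals(ang_dist, headings, detect_deg=8.0,
                            max_gap=2, max_turn_deg=30.0):
    """Return (start, end) index pairs of candidate lane-change intervals.

    ang_dist: angular distance to the road direction per trajectory point.
    headings: azimuth per trajectory point, in degrees.
    """
    # 1) Candidate mutation points: heading deviates clearly from the road.
    hits = [i for i, a in enumerate(ang_dist) if a >= detect_deg]

    # 2) Grow intervals over consecutive hits and merge hits separated by
    #    at most max_gap quiet points (multi-peak fluctuations).
    intervals = []
    for i in hits:
        if intervals and i - intervals[-1][1] <= max_gap + 1:
            intervals[-1][1] = i
        else:
            intervals.append([i, i])

    # 3) Extend by one point on each side, then keep only intervals whose
    #    start and end headings differ by a small angle. A junction turn
    #    ends on a very different heading and is dropped.
    last = len(ang_dist) - 1
    kept = []
    for s, e in intervals:
        s, e = max(s - 1, 0), min(e + 1, last)
        diff = abs(headings[e] - headings[s]) % 360.0
        diff = min(diff, 360.0 - diff)
        if diff <= max_turn_deg:
            kept.append((s, e))
    return kept
```

A lane change swings away from the road direction and back, so its interval survives the filter, while a 90° turn at a junction produces an interval whose end heading differs from its start heading by far more than 30° and is discarded.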
A schematic representation of the real-world spatial distribution of main and side roads is shown in Figure 4a. In practical road environments, the main and side roads are typically arranged in parallel, with a separation (green belt or fence) between them. As a result, vehicles can only switch between main and side roads at specific access points. This spatial constraint forms the foundation for reliably determining the direction of lane-change events. The physical meaning of the projection distance, which is used to identify the direction of lane changes, is illustrated in Figure 2a. This figure shows the geometric relationship between the trajectory points and the road segment, using a lane change from the side road to the main road as an example. The variation in the projection distance (refer to Definition 4) along the trajectory directly reflects the vehicle’s transition direction. In more detail, for the optimized mutation interval $MI = \{p_s, \ldots, p_e\}$, let $d_s$ and $d_e$ denote the projection distances at its first and last points. If $d_e - d_s < 0$, indicating a decrease in the projection distance to the main road, the interval is considered a lane-change behavior from the main road to the side road. Conversely, if $d_e - d_s > 0$, indicating an increase in the projection distance to the main road, it is defined as a lane-change behavior from the side road to the main road, as shown in Figure 4b,c. SRDet uses the median index point of the interval trajectory, or the geometric center of the two middle points of the interval, as the mutation point for the mutation interval. The final output includes the optimized mutation interval and the set of mutation points, with each mutation interval labeled with its corresponding lane-change behavior type (main road to side road, or side road to main road).
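The direction rule and the choice of mutation point can be sketched as follows. The function names and label strings are hypothetical, and comparing the projection distance at the last and first points of the interval is an assumed reading of the decrease and increase conditions described above.

```python
def classify_lane_change(proj_dist, interval):
    """Label an interval by the change in signed projection distance."""
    s, e = interval
    delta = proj_dist[e] - proj_dist[s]
    if delta < 0:
        return "main_to_side"   # projection distance decreases
    if delta > 0:
        return "side_to_main"   # projection distance increases
    return "undetermined"

def mutation_point(points, interval):
    """Median-index point of the interval, or the midpoint of the two
    middle points when the interval holds an even number of points."""
    s, e = interval
    seg = points[s:e + 1]
    mid = len(seg) // 2
    if len(seg) % 2 == 1:
        return seg[mid]
    (x1, y1), (x2, y2) = seg[mid - 1], seg[mid]
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
```

Each retained interval then contributes one labeled mutation point, which is the unit used by the later clustering and access point classification steps.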
2.2. Potential Side Road Segment Recognition
Based on road segment data, trajectory data, and mutation intervals, this study proposes a refined road segment classification method for subsequent access point identification and side road generation. First, SRDet combines geometric features with spatial statistical theory, introducing Ripley’s K function to quantitatively model the spatial clustering characteristics of key mutation points [44]. Then, by incorporating trajectory features (such as speed changes and angular direction) and road network features (such as segment length, curvature, and traffic flow distribution), SRDet constructs a multimodal feature extraction framework to systematically describe the complex characteristics of main and side road access points. Finally, SRDet proposes a side road segment recognition model based on random forests to improve the accuracy of access point identification and the robustness of the model.
2.2.1. Multimodal Feature Construction
To further capture the spatiotemporal distribution characteristics of lane-change behaviors on main and side roads, this study proposes a multimodal feature construction method based on the combination of geometric features and spatial statistical theory. This approach integrates spatial distribution statistical features based on Ripley’s K function, dynamic vehicle trajectory features, and road network features, systematically modeling the complex characteristics of main and side road access points and providing feature inputs for the random forest classification model.
First, to detect the spatial clustering characteristics of the mutation point set on the road segment, Ripley’s K function is used for quantitative modeling, as shown in Equation (3). Here, $d$ represents the spatial distance, $n$ is the total number of mutation points, $A$ is the total area of the feature, and $w_{ij}$ is the weight. In the specific analysis, the mutation points of each road segment are extracted and analyzed with the K function to assess whether they cluster significantly at specific distances. Figure 5 presents the visualization results of mutation point pattern analysis for a specific road segment. For road segments with main and side road access points, mutation points tend to cluster at the access points, while road segments without side roads exhibit a random distribution.
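To make the clustering test concrete, the sketch below evaluates one widely used estimator of Ripley's K, $\hat{K}(d) = \frac{A}{n(n-1)} \sum_{i} \sum_{j \ne i} w_{ij}\, I(d_{ij} \le d)$, with unit weights (no edge correction). Whether Equation (3) takes exactly this form is an assumption, and the function name is hypothetical. Under complete spatial randomness the expected value is $\pi d^2$, so values well above that indicate clustering at distance $d$.

```python
import math

def ripley_k(points, d, area):
    """Naive Ripley's K estimate at distance d for planar points.

    points: list of (x, y) in the same length unit as d.
    area:   area of the study region (A).
    """
    n = len(points)
    if n < 2:
        return 0.0
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # Unit weight w_ij = 1 for every ordered pair within distance d.
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                pairs += 1
    return area * pairs / (n * (n - 1))
```

Comparing `ripley_k(points, d, area)` with `math.pi * d ** 2` over a range of distances gives a simple per-segment indicator of whether mutation points concentrate around a few locations, as expected near access points, or are spread at random.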
The spatiotemporal distribution characteristics of trajectory and road network data are closely related to the identification of main and side road access points, and can be mined through road network and trajectory data. Therefore, this study constructs a series of features for the identification of potential side road segments and access points. As shown in Table 2, based on the work in [45], this study introduces statistical features related to dynamic behavior and spatial clustering, combining the local and global correlations between road network topology information and trajectory behavior features, which enables a comprehensive description of the complex characteristics of main and side road access points.
2.2.2. Random Forest-Based Model Construction for Side Road Segment Identification
To identify potential side road segments, this study proposes a classification model based on random forest. The random forest algorithm demonstrates strong adaptability in handling complex data and multidimensional features. By integrating the results of multiple decision trees, it effectively captures the relationship between road segment features and the existence of side roads. Trajectory data often contains noise and randomness, and the ensemble learning mechanism of random forest can smooth the impact of noise while emphasizing key features, ensuring the stability and reliability of classification results. This makes it well suited for tasks involving structured data in the field of geographic information [46].
For the mutation interval $MI$, SRDet constructs the feature array $[X, y]$, where $X$ holds the multimodal features and $y$ is the label vector indicating the road segment category (labeled as Y for potential side road segments and N for non-potential side road segments). Through feature importance analysis, random forest can automatically identify the key variables influencing side road determination, thereby efficiently adapting to complex data structures.
2.3. Access Point Identification and Side Road Generation
For potential side road segments, this study performs mutation point density calculation based on density peak clustering and kernel density estimation, extracting access points and fitting side roads. The advantage of SRDet lies in the fact that, with clear road segment characteristics, access point identification is conducted within the framework of existing road segment features and traffic behavior patterns, ensuring accurate recognition results.
As shown in Figure 6, SRDet first uses the Gaussian kernel density method to estimate the local density of mutation points, characterizing the density distribution of mutation points for the target road segment [47]. Based on the overall density distribution layer of mutation points, SRDet applies the density peak clustering algorithm to extract cluster centers, which are then treated as potential main and side road access points. Next, based on the lane-change behavior types identified through mutation interval recognition, SRDet classifies the main and side road access points, as shown in Equation (4). Finally, based on the characteristics of side road configuration and the categories of access points, SRDet fits and generates the complete structure of the side road.
$$P_{ms} = \frac{N_{ms}}{N_{ms} + N_{sm}} \qquad (4)$$

where $N_{ms}$ represents the number of access points within the search radius where the lane-change behavior label is marked as transitioning from the main road to the side road, and $N_{sm}$ represents the number of access points within the search radius where the lane-change behavior label is marked as transitioning from the side road to the main road. $P_{ms}$ represents the proportion of access points within the search radius where the lane-change behavior label is marked as transitioning from the main road to the side road, relative to all access points. The maximum threshold $T_{max}$ and minimum threshold $T_{min}$ are set to determine the classification of main and side road access points, as shown in Table 3.
According to road design standards and actual traffic flow characteristics, side road access points can be classified into three categories: "Only main-to-side road", "Only side-to-main road", and "Bidirectional turn allowed", as shown in Table 3. Based on the extracted access points and their categories, and considering the typical parallel distribution of side roads close to the main road, the side road segments are fitted and generated. This step identifies the access points of side roads and generates the road segment-level side road paths.
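The access point step can be sketched in a compact form. The sketch below makes several assumptions that are not in the text: mutation points are reduced to one-dimensional positions (metres along the road segment), the counts in the ratio are taken over labeled mutation points within the search radius, the ratio is compared with upper and lower thresholds in the obvious way, and all numeric defaults (`bandwidth`, `min_rho`, `min_delta`, `radius`, `t_max`, `t_min`) are hypothetical because Table 3 is not reproduced here. The density peak rule follows the usual idea of picking points with high local density that are far from any denser point.

```python
import math

def density_peaks(xs, bandwidth=10.0, min_rho=3.0, min_delta=30.0):
    """Return candidate access point positions along the segment."""
    # Gaussian kernel density of every mutation point (local density rho).
    rho = [sum(math.exp(-0.5 * ((a - b) / bandwidth) ** 2) for b in xs)
           for a in xs]
    order = sorted(range(len(xs)), key=lambda i: -rho[i])
    centers = []
    for rank, i in enumerate(order):
        # delta: distance to the nearest point with higher density.
        delta = min((abs(xs[i] - xs[j]) for j in order[:rank]),
                    default=float("inf"))
        if rho[i] >= min_rho and delta >= min_delta:
            centers.append(xs[i])
    return sorted(centers)

def classify_access_point(center, xs, labels, radius=20.0,
                          t_max=0.8, t_min=0.2):
    """Classify one access point from the lane-change labels around it."""
    near = [lab for x, lab in zip(xs, labels) if abs(x - center) <= radius]
    n_ms = near.count("main_to_side")
    n_sm = near.count("side_to_main")
    if n_ms + n_sm == 0:
        return "unknown"
    ratio = n_ms / (n_ms + n_sm)  # share of main-to-side lane changes
    if ratio >= t_max:
        return "only_main_to_side"
    if ratio <= t_min:
        return "only_side_to_main"
    return "bidirectional"
```

With two tight groups of mutation points and one isolated point, the isolated point is rejected for low density, each group yields one center, and the label mix around each center decides its category. A side road could then be fitted as a line parallel to the main road between consecutive access points, which is left out of this sketch.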
5. Conclusions and Future Work
With the acceleration of urban development, side road information has become increasingly important in road network construction and navigation. Due to the closely parallel nature of side roads, current methods of extracting road navigation information from trajectory data struggle to effectively identify side road features. This study proposes a novel side road information extraction method, SRDet, based on road information and trajectory semantic features. SRDet innovatively analyzes the unique trajectory lane-change patterns associated with the main-to-side road transition behaviors, adopting a multi-level optimization strategy to deeply mine trajectory feature information. By combining geometric features with spatial statistical theory, SRDet constructs a multimodal feature extraction framework. The use of a random forest classification model further enhances the ability to judge the main-to-side road connection relationships, accurately identifying the entry/exit points of side roads and thereby achieving effective detection of side roads at the road segment level. This provides more comprehensive data support for traffic planning and management.
Compared to existing methods for extracting digital road information, SRDet demonstrates good applicability in acquiring more detailed side road information. The experiments, based on trajectory data from Beijing, use remote sensing images, street view maps, and other annotated data for side road entry/exit points as ground truth to evaluate the main-to-side road discrimination results. The results show that SRDet performs with high accuracy and reliability in side road information extraction, verifying its potential and value in practical applications.
In conclusion, SRDet fills the research gap in the subfield of side road detection within navigation information extraction. Future improvements can focus on trajectory behavior recognition by deeply analyzing the interaction patterns between user driving behaviors and the main-to-side roads, enhancing the construction of multimodal features, and further exploring the spatiotemporal distribution information embedded in trajectory data.