Based on the questionnaire results, the primary factors identified were cycling road smoothness and visual friendliness, which were evaluated, respectively, through signal transformation analysis and image-based object recognition techniques.
2.3.1. Riding Roughness Assessment
Roughness is a significant factor affecting the cycling experience (see Section 2.2.2). Road roughness can result from elements such as speed bumps, surface fractures from aging, or height discrepancies between road sections. Current studies on pavement roughness mostly rely on motor vehicle vibration data, using indicators such as weighted root mean square acceleration and maximum acceleration for bumps [34,35], and root mean square (RMS) acceleration [36]. Roughness in bike lanes likewise produces vibration: the rougher the lane, the bumpier and less comfortable the ride. This study therefore proposes a vibration-based quantitative method to assess bike lane roughness, referencing the Highway Performance Assessment Standard (JTG 5210-2018) [37]. Following that Standard, Equation (7) defines the Road Roughness Index (RRI), which quantifies the roughness of a bike lane:
where
——the final value for the factor ‘riding roughness’;
CBR ——Cycling bump ratio (%);
——15.00 for asphalt pavement, 10.66 for cement concrete pavement;
——0.412 for asphalt pavement, 0.461 for cement concrete pavement;
——Conversion factor (m);
——Number of bumps;
——Length of the bike lane.
In this formula, bike lane length is directly measurable using a bike computer. A vibration analysis method calculates the number of bumps encountered during cycling. The vibration sensor is mounted on the seat, as shown in the red square frame in
Figure 5.
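As a rough illustration of how the index could be computed from a bump count, the sketch below assumes a functional form, because Equation (7) is not reproduced in this text. It borrows the shape of the pavement condition index in JTG 5210-2018, 100 - a0 * DR^a1, whose asphalt and cement concrete coefficients appear to match the values listed above. The names `a0`, `a1`, `conversion_factor_m`, and both function names are hypothetical, and the 0 to 100 scale is an assumption; the worked example later in this section reports a value of 0.95, so the original formula may include a rescaling that is not shown here.

```python
# Coefficients quoted in the text; the formula they are plugged into is ASSUMED.
COEFFS = {
    "asphalt": (15.00, 0.412),
    "cement_concrete": (10.66, 0.461),
}


def cycling_bump_ratio(n_bumps, lane_length_m, conversion_factor_m):
    """Cycling bump ratio in percent (assumed form: bump length / lane length)."""
    if lane_length_m <= 0:
        raise ValueError("lane length must be positive")
    return 100.0 * conversion_factor_m * n_bumps / lane_length_m


def road_roughness_index(cbr_percent, pavement="asphalt"):
    """Assumed index: 100 - a0 * CBR ** a1, on a 0-100 scale."""
    a0, a1 = COEFFS[pavement]
    return 100.0 - a0 * cbr_percent ** a1


if __name__ == "__main__":
    # Hypothetical inputs: 24 bumps on a 1000 m lane, 1 m per bump.
    cbr = cycling_bump_ratio(24, 1000.0, 1.0)
    print(cbr, road_roughness_index(cbr, "asphalt"))
```

Under this assumed form, a lane with no detected bumps scores 100, and the score falls more steeply on asphalt than on cement concrete for small bump ratios because of the larger leading coefficient.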
This study takes Zhang Wu Road as an example to illustrate the entire vibration signal processing workflow. A WTVB01-BT50 vibration sensor with an acceleration accuracy of 0.0005 g/LSB and a BSC200 bike computer made by IGPSPORT are used for data acquisition. High-frequency signals primarily represent noise and bumps. Assuming that the noise at each sampling point follows an independent normal distribution, bumps can be identified by detecting outliers in the high-frequency signal. Extracting road bump information therefore requires isolating and analyzing the high-frequency component of the vibration signal. Wavelet decomposition and reconstruction are widely used in signal processing [38]. This study selects the Symlet family as the wavelet basis and reconstructs the high-frequency signal, as shown in Figure 6.
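The decomposition and reconstruction step can be illustrated with a minimal, dependency-free sketch. Note the substitution: it uses a one-level Haar wavelet rather than the Symlet basis selected in this study, purely to keep the example short. The function name `haar_high_frequency` is hypothetical. The idea is the same in both cases: decompose the signal, discard the approximation (low-frequency) coefficients, and reconstruct from the detail coefficients alone.

```python
import math


def haar_high_frequency(signal):
    """Reconstruct the high-frequency part of `signal` from one-level Haar details."""
    x = list(signal)
    odd = len(x) % 2 == 1
    if odd:
        x.append(x[-1])  # repeat the last sample so every sample has a partner
    s = math.sqrt(2.0)
    high = []
    for i in range(0, len(x), 2):
        detail = (x[i] - x[i + 1]) / s  # decomposition: detail coefficient
        # Reconstruction with the approximation coefficient set to zero.
        high.extend([detail / s, -detail / s])
    return high[:-1] if odd else high


if __name__ == "__main__":
    # A flat signal has no high-frequency content; a jump between paired samples does.
    print(haar_high_frequency([1.0, 1.0, 1.0, 1.0]))
    print(haar_high_frequency([0.0, 2.0, 0.0, 0.0]))
```

A Symlet basis with several decomposition levels would give a smoother separation of frequency bands than this two-sample filter, which is one reason a longer wavelet is a reasonable choice for real vibration data.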
After filtering, the high-frequency signal is converted into bump counts as follows. According to the limits on vibration tolerable to the human body in the international standard ISO 5349-1:2001 [39], adults experience discomfort at a vibration velocity of 0.5 m/s. Abnormal bumps on cycling routes can therefore be identified wherever the vibration velocity exceeds this threshold. If the vibration velocity is significantly lower than this threshold, abnormal bumps can be detected using the 3σ rule of the normal distribution. If the signal is not normally distributed, abnormal bumps can be detected using the Z-score method, with calculation formulas shown in Equations (9)–(11).
where
——time of signal acquisition;
——initial time;
——vibration velocity at time ;
——mean value of the vibration velocity signal;
——standard deviation of the vibrational velocity signal;
——number of signal samples.
In this example, the high-frequency signal is analyzed, and its distribution is shown in Figure 7. The histogram with KDE (top left) shows the distribution of the vibration velocity data: the vertical bars give the frequency of occurrence of each velocity range, and the overlaid red line is the Kernel Density Estimate (KDE), a smooth approximation of the probability density function of the data. The boxplot (top right) summarizes the distribution through its quartiles. The Kernel Density Estimate (bottom left) is a standalone version of the KDE shown in the top-left panel. The QQ plot (bottom right) compares the quantiles of the sample data against those of the normal distribution. A p-value-based normality test indicates that this high-frequency signal is not normally distributed. Thus, the Z-score method is used to convert the signal into bump counts.
Using a Z-score threshold of 4, 24 outliers are identified in the high-frequency signal, as shown in Figure 8.
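The outlier step described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function name `zscore_outliers` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form is used.

```python
import statistics


def zscore_outliers(values, threshold=4.0):
    """Return indices of samples whose absolute Z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population form: an assumption
    if std == 0:
        return []  # a constant signal has no outliers
    return [i for i, v in enumerate(values) if abs((v - mean) / std) > threshold]


if __name__ == "__main__":
    # Hypothetical high-frequency velocity samples with one spike at index 99.
    samples = [0.0] * 99 + [10.0]
    bumps = zscore_outliers(samples, threshold=4.0)
    print(len(bumps), bumps)
```

The length of the returned list plays the role of the bump count. Each flagged sample is counted separately here; merging consecutive flagged samples into a single bump would be a further design choice that the text does not specify.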
Finally, according to the previously mentioned quantitative method, the score (RRI) for this example is calculated as 0.95.
2.3.2. Visual Friendliness Assessment
The visual friendliness of cycling roads refers to the influence of environmental information perceived during cycling. Assessing it primarily involves evaluating how motor vehicles and non-motorized vehicles are separated, whether the lane is occupied by parked motor vehicles, and whether bicycle-friendly facilities are present. The critical step is identifying the corresponding elements in imagery, a task that the You Only Look Once (YOLO) model performs efficiently.
YOLOv11 [40,41] is employed for the recognition of these elements. It belongs to the YOLO series of efficient object detection algorithms, which has advanced significantly in recent years. Unlike traditional selective search methods, the YOLO series uses a unified neural network architecture that divides an image into grids and predicts bounding boxes together with their associated category probabilities [42]. This approach demonstrates excellent real-time performance. Compared with other models or algorithms such as SegNet, Mask R-CNN, and Faster R-CNN, YOLO has a clear advantage in recognition speed [43]. YOLO–OBB is specifically applied to detect the separation method on the target road. Oriented bounding box (OBB) detection extends conventional object detection by incorporating object rotation, which improves detection accuracy for tilted or oriented targets [44]. This capability makes it more suitable than axis-aligned bounding boxes for such tasks. Relevant studies have adopted a two-stage approach to optimize the model, highlighting its significant potential for real-time, large-scale vehicle recognition [45]. This optimized method has proven effective in identifying vehicles from image frames extracted from video data.
In this study, three YOLO models were trained independently to identify separation methods, vehicles occupying the lane, and bicycle-friendly signs. Because no existing models cover these detection targets, training was conducted on thousands of images that were categorized and labeled for the relevant elements.
Field survey imagery from Shanghai was captured and split into an 80% training subset and a 20% validation subset. The training set contained precisely annotated images. For the separation method recognition model, four primary classification scenarios (Green Belt, Guardrail, Line and No separation) were identified through a comprehensive analysis of questionnaire responses and a systematic review of the existing literature. Approximately 3000 images per separation category were annotated, resulting in a cumulative total of 9156 annotated images across all categories (excluding the ‘No separation’ category, which does not require recognition). The occupation recognition model employed the VOC2007 benchmark dataset for vehicle class recognition training, while the friendly facilities recognition model was trained on a dataset containing approximately 15,000 annotated images. Model training was conducted on a computational platform equipped with an NVIDIA GeForce RTX 4060 Laptop GPU running Ubuntu 22.04. The software environment was configured with Python 3.12.7, PyTorch 2.5.0, and CUDA 11.5. The training process utilized an input resolution of 640 × 640 pixels, a batch size of 16, and was conducted over 400 epochs. The Stochastic Gradient Descent (SGD) optimizer was employed for parameter optimization to facilitate efficient convergence. This configuration was selected to balance computational efficiency and model performance. As shown in
Figure 9, the separation method recognition model improved progressively over the course of training. The results demonstrate competitive performance, with loss values decreasing and evaluation metrics improving.
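The 80%/20% division of the survey imagery described above can be sketched in a few lines. This is only an illustration under assumptions: the function name `split_dataset`, the fixed seed, and the file names are hypothetical, and the text does not state whether the split was random or stratified by class.

```python
import random


def split_dataset(image_paths, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then cut into training and validation subsets."""
    paths = sorted(image_paths)         # fixed starting order for reproducibility
    random.Random(seed).shuffle(paths)  # seeded shuffle; does not touch global state
    cut = int(len(paths) * train_fraction)
    return paths[:cut], paths[cut:]


if __name__ == "__main__":
    # Hypothetical file names standing in for the annotated survey images.
    images = [f"img_{i:04d}.jpg" for i in range(10)]
    train, val = split_dataset(images)
    print(len(train), len(val))
```

Fixing the seed keeps the validation subset stable across training runs, so that metrics from different configurations remain comparable.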
Techniques such as batch normalization and dropout are used to improve classification performance [46,47]. The occupation recognition model is trained on VOC2007, with results displayed in Figure 10.
Figure 10a illustrates the F1 score for the ‘car’ class across varying score thresholds; the F1 score stays above 0.8 at a threshold of 0.5, indicating a robust balance between precision and recall. The precision–recall curve in Figure 10b shows an Average Precision (AP) of 91.35% for the ‘car’ class and illustrates the trade-off between the two: as recall increases, additional false positives lower precision. Figure 10c shows that the model sustains high precision as the threshold increases. The recall–score threshold curve in Figure 10d complements the precision analysis by showing how the share of instances detected changes as the threshold varies. A high recall rate, especially at higher thresholds, indicates that the model misses few instances.
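To make the threshold-dependent metrics above concrete, the following sketch computes precision, recall, and F1 for one class at a given score threshold. It assumes each detection has already been matched against the ground truth (for example by an IoU criterion), which is outside the scope of the sketch; the function name and the sample detections are hypothetical.

```python
def precision_recall_f1(detections, n_ground_truth, score_threshold=0.5):
    """detections: list of (confidence, is_true_positive) pairs for one class."""
    kept = [is_tp for conf, is_tp in detections if conf >= score_threshold]
    tp = sum(1 for is_tp in kept if is_tp)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / n_ground_truth if n_ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical matched detections for the 'car' class, 4 ground-truth objects.
    dets = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
    for threshold in (0.3, 0.5, 0.7):
        print(threshold, precision_recall_f1(dets, 4, threshold))
```

Sweeping the threshold as in the usage loop reproduces the kind of curves shown in the four panels: raising the threshold removes low-confidence detections, which tends to raise precision and lower recall.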
To further enhance recognition accuracy, various techniques, such as Contrast Limited Adaptive Histogram Equalization (CLAHE) [48] and Spatial Transformer Networks (STNs) [49], are applied to enhance image features and improve the model’s robustness. The accuracy of the various methods employed is presented in Table 4. Ultimately, the best-performing method in the comparison, Single CNN with 3 STNs, was chosen.
Using the trained model, specific elements are identified and utilized to assess the road’s visual friendliness. For the assessment of separation methods, occupations, and friendly facilities, distinct mathematical models are constructed to transform identification results into evaluation scores for each factor.
Regarding the separation method assessment, the YOLO–OBB model is employed to identify the different separation methods for evaluation. Separation facilities are physical barriers between motor vehicle lanes and non-motorized vehicle lanes. Their primary function is to prevent motor vehicles from intruding into non-motorized lanes, providing a safer and more comfortable cycling environment. As shown in Section 2.2.2, improved separation facilities significantly enhance bike lane user satisfaction. This study proposes a quantitative methodology to categorize different separation methods.
Table 5 categorizes separation methods into four classes to determine their effectiveness in ensuring a safer cycling experience. The higher the separation facility’s grade, the more effective it is at isolating cyclists from the roadway, with a correspondingly higher score.
For multiple separation methods on the same road, a segmented weighted average technique is used to score the separation facilities across the entire road, as shown in Equation (12), where
represents the average assessed value of separation for this lane,
is the length of the
k-th separation method, and k is the score for the
k-th method, where the score ranges from 0 to 3, as shown in
Table 5. This equation assigns weights to the scores of the separation methods based on the length of the corresponding road sections, considering the scenario where a single road section may employ different separation methods. To assess the length of the
k-th separation method, a video of the entire road section is sampled at a fixed frame interval. The number of sampled images in which the k-th separation method is detected serves as a proxy for its length (assuming a roughly constant riding speed). Finally, the results are transformed into percentages as in Equation (13), where
is 0 and
is 3.
represents the final value for the factor ‘motor vehicle and non-motorized vehicle separation method’.
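The length-weighted scoring and percentage conversion described above can be sketched as follows. Equations (12) and (13) are not reproduced in this text, so the forms used here, a frame-weighted mean followed by min-max scaling to 0-100, are assumptions read from the prose; the function name and the example segment scores are hypothetical, and the mapping from separation class to score is left to Table 5.

```python
def separation_score(segments, s_min=0.0, s_max=3.0):
    """segments: list of (frame_count, score) pairs, one per separation method.

    Frame counts stand in for segment lengths, as described in the text.
    """
    total_frames = sum(frames for frames, _ in segments)
    if total_frames == 0:
        raise ValueError("no frames sampled")
    weighted = sum(frames * score for frames, score in segments) / total_frames
    return 100.0 * (weighted - s_min) / (s_max - s_min)


if __name__ == "__main__":
    # Hypothetical road: 60 frames of a method scored 3, 40 frames of one scored 1.
    print(separation_score([(60, 3), (40, 1)]))
```

Because only the ratio of frame counts matters, the sampling frequency cancels out as long as it is constant along the road.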
Regarding the occupation assessment, the YOLO model is employed to detect and quantify the number of vehicles occupying bike lanes for evaluation. Vehicle parking occupation on bike lanes refers to motor vehicles that violate traffic regulations by illegally occupying or entering the bike lane space. Such illegal occupation disrupts the continuity of bike lanes, and survey results indicate that a higher incidence of motor vehicles occupying bike lanes reduces the road’s cycle-friendliness (see
Section 2.2.2). Consequently, the number of vehicles is utilized as a metric to establish the formula (see Equation (14)), where n is the number of motor vehicles illegally parked in bike lanes, L is the road length, and
is the occupation level. The evaluation score is calculated by the number of motor vehicles illegally parked in bike lanes divided by the road length, with units of vehicles/km. The more vehicles occupying the cycling lane, the lower the road section’s cycle-friendliness. Since this encroachment is a negative indicator, it is adjusted using Equation (15). In this formula,
represents the final score for the factor ‘bike roads occupied by unauthorized vehicles’.
is the maximum value in this evaluation. Considering that an average vehicle is approximately 4.5 m long,
is set to 1000/4.5 ≈ 220 vehicles/km (rounded down from about 222), indicating that the road is fully occupied by unauthorized vehicles. The minimum evaluation value,
, is assumed to be 0, indicating the road is entirely unoccupied.
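A minimal sketch of the occupation score follows. Since Equations (14) and (15) are not reproduced in this text, the inverted min-max form and the 0-100 scale are assumptions based on the description of a negative indicator bounded by 0 and 220 vehicles/km; the function and parameter names are hypothetical, and clamping the density to the stated bounds is an added safeguard rather than something the text specifies.

```python
def occupation_score(n_vehicles, road_length_km, x_max=220.0, x_min=0.0):
    """Higher score means fewer illegally parked vehicles per kilometre."""
    if road_length_km <= 0:
        raise ValueError("road length must be positive")
    density = n_vehicles / road_length_km       # vehicles/km
    density = min(max(density, x_min), x_max)   # clamp (an added assumption)
    # Negative indicator: invert so that an unoccupied road scores highest.
    return 100.0 * (x_max - density) / (x_max - x_min)


if __name__ == "__main__":
    # Hypothetical survey: 11 parked vehicles detected along 0.5 km.
    print(occupation_score(11, 0.5))
```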
Regarding the friendly facilities assessment, the YOLO model is employed to detect and quantify the number of cycle-friendly signs for evaluation. Cycle-friendly facilities refer to supportive signs and markings that offer guidance and support for cyclists. These facilities are categorized into two types: static, including bicycle markings, signage, and guidelines, and dynamic, such as bicycle signals and left-turn phases.
Figure 11 illustrates an example of cycle-friendly facilities.
Cycling friendliness is expected to increase with the density of cycle-friendly facilities. More friendly signs not only guide cyclists along their routes but also strengthen their confidence in road safety, encouraging more people to choose cycling. Clear and adequate marking of bike lanes likewise improves cyclists’ sense of safety and overall experience. In this study, this metric is evaluated based on the density of friendly facilities. Friendly facility coverage density is expressed by Equation (16), where n represents the number of signs and L is the length of the roadway.
represents the final score for the factor ‘bicycle-friendly facilities’. This indicator scores bicycle-friendly facilities by the number of signs per unit road length.
The trained models perform well in recognizing elements related to visual friendliness. As shown in Figures 12–14, the output images display bounding boxes labeled with class names and confidence values that accurately identify the locations and classes of the objects. This accurate detection makes it possible to evaluate the relevant factors of bike lanes.
In addition to the above factors, this study also considers the impact of green space along roadways on cycling visual friendliness. As residents’ expectations for travel experiences grow, green space along roadways is increasingly vital in enhancing cyclist comfort and well-being, particularly for riverfront leisure-oriented bike lanes [54,55]. This study adopts the road green coverage ratio as the primary metric for evaluating the cycling landscape, assuming that a higher percentage of green area within the field of view improves ride friendliness. The green area coverage rate is defined in Equation (17), where
represents the area of green belt pixels and
is the total image area.
represents the final score for the factor ‘roadside scenery’. This study uses the cv2 module of the OpenCV library for image processing. The ‘inRange’ function creates a binary mask of the pixels that fall within a defined RGB color range, isolating green regions within the image [56]. This approach segments vegetated areas, such as green belts or plant-covered surfaces, using color thresholds, enabling a quantitative assessment of green coverage.
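The color-threshold masking described above can be illustrated without any image library. The sketch below applies the same inclusive per-channel range test that an in-range mask performs and returns the ratio of green pixels to total pixels. The function name and the RGB bounds are hypothetical placeholders, not the thresholds used in this study.

```python
def green_coverage_ratio(image, lower=(0, 100, 0), upper=(120, 255, 120)):
    """image: rows of (R, G, B) tuples; returns green pixels / total pixels."""
    total = 0
    green = 0
    for row in image:
        for pixel in row:
            total += 1
            # Inclusive bounds on every channel, as in a binary in-range mask.
            if all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper)):
                green += 1
    if total == 0:
        raise ValueError("empty image")
    return green / total


if __name__ == "__main__":
    # Hypothetical 2x2 image: two greenish pixels, one gray, one red.
    img = [
        [(10, 200, 10), (200, 200, 200)],
        [(0, 150, 50), (255, 0, 0)],
    ]
    print(green_coverage_ratio(img))
```

Thresholding in RGB is sensitive to lighting, so the bounds would need tuning on the survey imagery; converting to a hue-based color space before thresholding is a common alternative.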
The evaluation of road width adheres to China’s Urban Road Engineering Design Code, which specifies dimensional standards based on road classification and functional type. In this study, road width data (denoted as
) were directly extracted from the Baidu Maps Open Platform [
57].
represents the final score for the factor ‘road width’.