Article

KRID: A Large-Scale Nationwide Korean Road Infrastructure Dataset for Comprehensive Road Facility Recognition

by Hyeongbok Kim 1,2,*, Eunbi Kim 1, Sanghoon Ahn 1, Beomjin Kim 1, Sung Jin Kim 3, Tae Kyung Sung 4, Lingling Zhao 2, Xiaohong Su 2 and Gilmu Dong 1,*

1 Testworks, Inc., Seoul 01000, Republic of Korea
2 Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
3 Korea Automotive Technology Institute (KATECH), Cheonan-si 31000, Republic of Korea
4 WiFive Ltd., Daejeon 34000, Republic of Korea
* Authors to whom correspondence should be addressed.
Data 2025, 10(3), 36; https://doi.org/10.3390/data10030036
Submission received: 13 January 2025 / Revised: 26 February 2025 / Accepted: 6 March 2025 / Published: 14 March 2025

Abstract
Comprehensive datasets are crucial for developing advanced AI solutions in road infrastructure, yet most existing resources focus narrowly on vehicles or a limited set of object categories. To address this gap, we introduce the Korean Road Infrastructure Dataset (KRID), a large-scale dataset designed for real-world road maintenance and safety applications. Our dataset covers highways, national roads, and local roads in both city and non-city areas, comprising 34 distinct types of road infrastructure—from common elements (e.g., traffic signals, gaze-directed poles) to specialized structures (e.g., tunnels, guardrails). Each instance is annotated with either bounding boxes or polygon segmentation masks under stringent quality control and privacy protocols. To demonstrate the utility of this resource, we conducted object detection and segmentation experiments using YOLO-based models, focusing on guardrail damage detection and traffic sign recognition. Preliminary results confirm its suitability for complex, safety-critical scenarios in intelligent transportation systems. Our main contributions include: (1) a broader range of infrastructure classes than conventional “driving perception” datasets, (2) high-resolution, privacy-compliant annotations across diverse road conditions, and (3) open-access availability through AI Hub and GitHub. By highlighting critical yet often overlooked infrastructure elements, this dataset paves the way for AI-driven maintenance workflows, hazard detection, and further innovations in road safety.

1. Introduction

Diverse, large-scale annotated visual datasets, such as ImageNet [1] and COCO [2], have been pivotal in driving recent advances in computer vision through supervised learning. Typical deep learning models can require millions of training examples to achieve state-of-the-art performance [3,4,5,6]. However, in domains like road infrastructure and autonomous driving, leveraging the power of deep learning is not as straightforward, largely due to the lack of sufficiently comprehensive datasets [7,8,9].
Although existing large-scale road datasets—such as Cityscapes [8], BDD100K [10], and Vistas [9]—have significantly propelled research on object detection and segmentation in traffic scenes, many remain constrained by limited scene diversity, narrow geographic coverage, or relatively basic annotations that focus primarily on driving perception categories (e.g., cars, pedestrians, bicycles). As a result, this scarcity of specialized infrastructure labels can lead to model overfitting to domain-specific characteristics [11] and ultimately impede higher-level situational understanding or complex decision-making—particularly when robust insights into road safety, traffic management, or facility maintenance are required.
Furthermore, it is crucial to move beyond conventional “intelligent driving perception” tasks and address the full life cycle of road facilities—encompassing type, location, and condition assessment for ongoing maintenance and proactive inspections. Such efforts not only support practical use cases like preventive maintenance and early hazard detection but also align with broader policy initiatives (e.g., digitized national road registries) geared toward enhancing existing infrastructure management. By producing specialized AI training data tailored to these more demanding scenarios, our dataset facilitates (1) more efficient road maintenance through AI-driven analytics, (2) the development of intelligent infrastructure services, and (3) a robust safety management framework to mitigate risks posed by aging facilities. Building upon these objectives, we now introduce a novel dataset that addresses the aforementioned shortcomings.
In addition, real-world applications often require an integrated approach to perception tasks of varying complexity [11,12,13], from basic bounding-box annotations to advanced instance segmentation or multi-object tracking [2,14,15,16,17]. Nonetheless, many existing efforts remain limited in scope, focusing on small-scale or narrowly targeted datasets [7,8]. Taken together, these limitations underscore an urgent need for a more comprehensive, high-resolution dataset that captures specialized road facilities in realistic and varied conditions. Such a resource would enable the development of robust AI models for advanced infrastructure management and intelligent driving applications.
As summarized in Table 1, our proposed Korean Road Infrastructure Dataset (KRID) incorporates substantially more road infrastructure categories, provides detailed annotations in both bounding-box and segmentation formats, and achieves nationwide coverage across diverse environments. Hence, this approach offers a more comprehensive foundation for next-generation intelligent transportation and road management systems.
Against this backdrop, road infrastructure is dynamic throughout its life cycle, encompassing construction, operation, and especially post-construction maintenance, where the timely detection of damage to guardrails, signs, tunnels, or bridges is critical for preventing safety hazards and excessive repair costs. Historically reliant on labor-intensive manual inspections, infrastructure management is increasingly turning to large-scale image data and advanced deep learning solutions for more efficient facility assessments [21,22]. Parallel efforts in road registry digitization further highlight the potential of combining large-scale image data with digital road records, thereby supporting proactive maintenance workflows and fueling further innovations in intelligent transportation systems.
From a broader perspective, rapid advances in machine learning and deep learning have demonstrated the crucial role of AI in automating complex tasks, such as object detection and classification. Various academic institutions have undertaken large-scale image and video dataset projects, showcasing the potential for combining deep learning with well-curated visual data [23]. Meanwhile, a range of national initiatives highlight the need to digitize extensive portions of the road network, streamline management across different agencies, and ultimately make these resources widely accessible. These ambitious digitization efforts underscore the growing demand for robust AI training datasets dedicated to road facilities, especially as mobility needs, safety standards, and intelligent driving technologies continue to evolve [24,25].
Accordingly, the principal objective of this study is to develop and validate a comprehensive AI training dataset for detecting and classifying road facilities, while also exploring its potential applicability in intelligent driving domains. Specifically, we first collected frames at 20–30 m intervals using a GPS-based coordinate system (see Appendix A.2), resulting in more than 2,000,000 high-resolution RGB images from diverse road environments.
Next, human reviewers removed duplicates and frames lacking discernible objects, retaining 2,000,000 images. They then manually screened these images for sufficient and diverse infrastructure features, finalizing a selection of 200,000 images. Subsequently, we annotated a total of 2,676,583 objects, spanning a wide array of road facilities, as shown in Figure 1, which provides an overview of the newly curated dataset and illustrates specialized infrastructure elements captured under various conditions. We then conducted a systematic evaluation of the dataset’s utility through AI-based analytical techniques—such as object detection and segmentation models—and by examining various intelligent driving application scenarios.
In summary, this paper makes the following key contributions:
  • We introduce a new large-scale road infrastructure dataset encompassing 34 facility categories, with annotations provided in both bounding-box and polygon segmentation formats;
  • We design and implement a robust annotation workflow—including strict privacy compliance and rigorous quality-control measures—that ensures the high quality of the labels;
  • We validate the dataset’s utility through extensive object detection and segmentation experiments, and release it as an open resource to promote further research and collaboration.
In the following sections, we outline our dataset collection and annotation methods, as well as the integration of image data with statistical records. We then present experimental results focused on detection and classification under diverse conditions, followed by a discussion of potential applications in infrastructure management and opportunities for future research.

2. Materials and Methods

In this section, we detail our data collection sites and acquisition tools, outline annotation procedures and workforce training protocols, and explain the steps taken to ensure privacy compliance. By providing these specifics, we enable the straightforward replication and potential extension of our methodology.

2.1. Data Collection Sites

Figure 2 provides an overview of the major road types (city and non-city) surveyed for this dataset. Specifically, the sub-figures illustrate how highways, national roads, and local roads have been categorized into City and Non-City sections, ensuring a balanced coverage across diverse regions.
Table 2 presents the distribution of road network lengths and corresponding ratios across 15 regions, categorized into Non-City and City roadways. Within the Non-City category, highways, national roads, and local roads are shown alongside their kilometer values and respective percentages. Similarly, the City category includes national and local roads, also detailed by region. Overall, the Non-City portion constitutes the majority (89.80%) of the total network, while the City portion makes up 10.20%. The table concludes with aggregate values in kilometers for each road type and highlights the percentage of each road category relative to the overall sum of 46,000 km.
We began by reviewing the entire national road network using Ministry of Land, Infrastructure, and Transport (MOLIT) data to examine road conditions and facility information for each segment. Next, in consultation with the National Information Society Agency (NIA), we narrowed our selection to those segments that best aligned with the project’s objectives, taking into account regional characteristics, road features, and budget considerations.
Non-city areas, which include highways, expressways, and a combination of national and local roads, comprised nearly 90% of the collected data. These regions typically feature wider lanes and more consistent traffic flow, making it feasible to capture clear, high-resolution imagery of road facilities. In contrast, the city sections (approximately 10%) reflect diverse geographical characteristics (e.g., mountainous, basin, plains) primarily in regions such as Chungcheong (South and North), ensuring that the dataset encapsulates both high-density and varied city layouts.
All surveyed segments largely overlap with the high-precision mapping zones administered by the National Geographic Information Institute (NGII), thereby facilitating alignment with various geospatial or AI-based analytical frameworks.

2.2. Data Acquisition Tools

2.2.1. Imaging System

  • Viewer and Control Program: A custom-developed application supported both single and multi-camera feeds, providing a real-time display to the operator and allowing for direct control over imaging parameters;
  • GPS-Based Time Server: To synchronize camera timestamps, a GPS-based time server was incorporated. This infrastructure ensured that the camera’s capture time was accurately logged for each frame, at a rate exceeding 10 frames per second;
  • Precise Timestamping: Each captured frame received a timestamp (in milliseconds) from the GPS server, enabling the accurate tagging of spatial and temporal data;
  • Automatic Positional Extraction: By aligning each frame’s timestamp with the associated GPS log, the system automatically extracted precise location coordinates, facilitating robust georeferencing (a minimal alignment sketch follows this list).
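The alignment step can be illustrated with a minimal sketch. It assumes each GPS record carries a millisecond timestamp plus latitude and longitude (the field layout here is illustrative, not the exact log format); a frame's position is then linearly interpolated between the two bracketing fixes.

```python
from bisect import bisect_left

def interpolate_position(frame_ts_ms, gps_log):
    """Estimate the camera position for a frame by linearly interpolating
    between the two GPS fixes that bracket its timestamp.

    gps_log: list of (timestamp_ms, lat, lon) tuples sorted by timestamp.
    """
    timestamps = [row[0] for row in gps_log]
    i = bisect_left(timestamps, frame_ts_ms)
    if i == 0:
        return gps_log[0][1:]           # before the first fix: clamp
    if i >= len(gps_log):
        return gps_log[-1][1:]          # after the last fix: clamp
    (t0, lat0, lon0), (t1, lat1, lon1) = gps_log[i - 1], gps_log[i]
    w = (frame_ts_ms - t0) / (t1 - t0)  # linear interpolation weight
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)

# Example: a frame captured 450 ms after a fix, with the next fix 1000 ms later
log = [(0, 36.80000, 127.15000), (1000, 36.80010, 127.15012)]
print(interpolate_position(450, log))   # position ~45% of the way between the fixes
```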
To facilitate nationwide road imaging and GPS logging, we employed a specialized acquisition system, as shown in Figure 3. The XNB-8003 camera supports a 6-megapixel (3328 × 1872) resolution at 30 fps (H.265) with real-time video transmission. Additionally, each vehicle was outfitted with integrated cameras and onboard systems to ensure high-quality data collection.

2.2.2. Vehicle System

  • Vehicle Platforms: The data acquisition fleet included the UOK-LAND I (1 unit), UOK-LAND II (5 units), and 3 newly designed survey vehicles from Sodasystem—9 vehicles in total;
  • Mounting Structure: When necessary, a custom frame or elevated mounting apparatus was installed on top of the vehicles to secure the cameras in positions at or above 1900 mm;
  • Fixed Camera Angle: The camera pitch and yaw angles were held constant throughout data collection, minimizing perspective variation and streamlining subsequent processing steps.
As shown in Figure 4, this integrated approach—uniform camera placement, consistent imaging angles, and GPS-based synchronization—enables high-quality data collection across different road segments, facilitating downstream tasks such as the annotation, training, and validation of deep learning models for road-facility analysis.

2.2.3. Data Gathering Protocol

Data collection was organized into five distinct phases, each designed to ensure a systematic approach to acquiring high-quality data under diverse road conditions.
  • Survey Planning: First, we identified the weekly survey regions and obtained local weather forecasts. We also verified whether any construction projects or restricted zones might interfere with data acquisition in the targeted areas.
  • Preparation: All vehicles underwent mechanical inspections, and the camera equipment was tested to confirm functionality. These steps ensured that any technical issues were addressed before field deployment.
  • Field Operation: A skilled team with extensive data acquisition experience was dispatched to conduct the surveys. To maintain steady imaging and consistent viewpoints, the team observed specific driving protocols, such as maintaining a constant speed to reduce motion blur and ensure uniform image quality, positioning the vehicle near the lane center where safety permitted, keeping an appropriate following distance to avoid abrupt stops or accelerations, and minimizing unnecessary lane changes. These measures helped to standardize the conditions under which each road segment was surveyed (see Figure A3 for a detailed workflow illustration).
  • Reporting and Data Storage: Upon the completion of each run, the team prepared a concise survey report. Video data and GPS logs were uploaded to a centralized server, and backups were created to safeguard against potential data loss.
  • Verification and Quality Check: Finally, the collected data underwent a verification process. Image quality (resolution, focus, exposure) was inspected, and any missing elements (e.g., GPS information or video segments) were identified. Duplicate data instances were also detected and removed, ensuring the dataset’s overall consistency and reliability.

2.3. Data Annotation and Management

2.3.1. Data Annotation Tools

To process and annotate the collected RGB images, we employed a proprietary labeling tool called blackolive [26], developed by Testworks (see Figure 5). This tool features a Korean-language interface for local annotators, large-scale data and worker management, and a no-code MLOps-based auto-labeling model. In particular, it supports both bounding-box and polygon-based annotations, enabling two key AI tasks—object detection and segmentation. Notable features include the following:
  • Task Logging: The system records all worker actions, enabling the traceability of labeling tasks;
  • Online Operation: Runs on a web-based interface, allowing distributed annotators to collaborate remotely;
  • Auto-Labeling: Provides an optional auto-labeling function, where baseline annotations are automatically generated and then refined by human annotators;
  • Operating Institution: Developed and maintained in-house by Testworks, ensuring alignment with project-specific requirements.
In Figure 6, the human-in-the-loop annotation workflow begins with raw data, which undergoes an initial data annotation process. A trained model then performs auto-labeling, generating preliminary labels that are subsequently reviewed by human annotators. After any necessary corrections are made, the labeled data are used to retrain the model, thereby continuously improving the labeling accuracy. For more detailed labeling guidelines and training protocols, please refer to Appendix A.1, which includes examples of incorrect bounding-box placements and recommended solutions for quality improvement.
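The loop in Figure 6 can also be summarized in code form. The sketch below captures only the general human-in-the-loop pattern; it is not the blackolive implementation, and auto_label, human_review, and retrain are hypothetical placeholders for the tool's internal steps.

```python
# Schematic sketch of the Figure 6 loop; the review/retrain steps are placeholders.

def auto_label(model, image):
    """Placeholder: run the current model to produce candidate labels."""
    return model(image)

def human_review(image, candidate):
    """Placeholder: an annotator accepts or corrects the candidate labels."""
    return candidate  # in practice, corrected labels come back from the tool

def retrain(model, labeled_data):
    """Placeholder: fine-tune the model on all accepted labels so far."""
    return model

def human_in_the_loop(raw_batches, model):
    labeled = []
    for batch in raw_batches:
        for image in batch:
            candidate = auto_label(model, image)        # 1) auto-labeling pass
            corrected = human_review(image, candidate)  # 2) human correction
            labeled.append((image, corrected))
        model = retrain(model, labeled)                 # 3) feed corrections back
    return model, labeled

# Toy usage: a "model" that labels everything as background
model, data = human_in_the_loop([[object(), object()]], model=lambda img: "background")
```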

2.3.2. Data Processing and Management

To ensure efficient scheduling, progress tracking, and quality control, a dedicated web-based management platform was utilized. As illustrated in Figure 6, each step of the data annotation workflow—from initial task creation to final acceptance—was orchestrated within this procedure. The tool provides the following core functionalities:
  • Task Allocation: Distributes labeling work among annotators and supervisors;
  • Project and Worker Management: Monitors overall status, individual performance, and assignments for each project;
  • Quality Assurance Workflow: Tracks the review and acceptance process (e.g., rate of rejections, final approvals), allowing for the real-time monitoring of quality metrics;
  • Collaborative Issue Tracking: Offers a bulletin board system, enabling annotators, reviewers, and administrators to share and resolve issues;
  • Dashboard and Statistics: Presents real-time progress indicators such as completion rate, rejection percentage, and quality metrics, with the ability to download task data or generate charts for both worker-level and project-level reporting.
By integrating our annotation tool with this centralized management platform, the project maintained high-quality labeling standards, streamlined workflows, and ensured that each phase of the data pipeline was auditable and optimizable.

2.4. De-Identification

In accordance with the Personal Information Protection Act of Korea and the regulations set forth by the National Information Society Agency (NIA), we conducted a thorough legal review that addressed copyright issues, personal data usage consent, portrait usage consent, and potential defamation risks. Furthermore, we obtained informed consent from all personnel involved in handling these data, thereby ensuring compliance with the applicable legal requirements. All raw images intended for public release (via AI Hub and GitHub) were processed to remove any personally identifiable information. This de-identification process is summarized in Figure 7. It involved the following steps:
1.
Initial Automated Detection
An AI-based de-identification solution was employed to automatically detect potential privacy-sensitive regions in the images (e.g., faces, license plates). This first-pass detection generated candidate bounding boxes likely to contain personal data. In particular, we utilized an in-house de-identification tool, developed from well-known benchmark deep learning models and further refined with our own data. It served to set initial bounding-box candidates before human annotators performed final corrections;
2.
Secondary Review and Verification
The automatically detected regions were then provided to trained annotators for manual inspection and refinement. These annotators were rigorously selected from among individuals with proven experience in other vision-data processing projects, ensuring a high level of labeling proficiency. During this step, missing detections were added, inaccuracies were corrected, and all personal data regions were effectively blurred or obscured. This double-layered approach ensured that sensitive information was comprehensively removed while preserving the overall utility of the dataset for AI analysis.
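As an illustration of the blurring step only (the in-house detector and review tooling are not public), the sketch below assumes candidate regions arrive as (x, y, w, h) boxes over a NumPy BGR image and applies a Gaussian blur with OpenCV; the detector and review functions named in the comments are hypothetical.

```python
import numpy as np
import cv2

def blur_regions(image, boxes, ksize=(51, 51)):
    """Blur each candidate privacy region, given as (x, y, w, h) boxes,
    in a copy of the BGR image array."""
    out = image.copy()
    for (x, y, w, h) in boxes:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, ksize, 0)
    return out

# Two-pass workflow sketch (function names below are hypothetical):
# proposals   = face_and_plate_detector(image)      # first-pass AI detection
# final_boxes = annotator_review(image, proposals)  # manual inspection/refinement
# cleaned     = blur_regions(image, final_boxes)    # blur confirmed regions

# Minimal self-test on a synthetic image
demo = blur_regions(np.zeros((200, 200, 3), dtype=np.uint8), [(20, 20, 60, 60)])
```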

2.5. Workforce Training

The annotation workforce was primarily secured through the lead institution’s in-house annotators, with additional personnel sourced via crowdsourcing from August to December 2023, leading to a total of 145 annotators recruited, including 125 crowd workers, 6 commercial employees, and 14 full-time staff. To ensure high-quality labeling and annotation, we developed detailed guidelines (Figure A1a) outlining the procedures and standards for consistent dataset creation. All hired crowd workers underwent mandatory training based on these guidelines, which included both offline sessions at our training center in Seoul (Figure A1b) and an online platform (Figure A1c). This comprehensive approach helped to maintain a uniform level of annotation quality throughout the project. On average, each annotator worked about 5 hours per day, culminating in approximately 700 man-months of labeling efforts.

2.6. Data Description

Table 3 outlines the structure and required fields of the dataset, specifying the information stored under each array (e.g., info[], images[], categories[], annotations[]). This ensures a consistent format for image metadata, annotation details, and category definitions, promoting accurate and reliable data management.
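To make the layout concrete, the following is an illustrative, COCO-style example of the four arrays named in Table 3. The field names and values shown here are assumptions for exposition; the released JSON files should be consulted for the exact schema.

```python
# Illustrative record layout for the four arrays in Table 3 (field names are
# abbreviated/assumed here; consult the released JSON for the exact schema).
sample = {
    "info": [{"description": "KRID", "version": "1.0", "year": 2024}],
    "images": [{
        "id": 1, "file_name": "000001.jpg", "width": 3328, "height": 1872,
        "region_type": "non-city", "road_type": "national",
    }],
    "categories": [{"id": 7, "name": "guardrail", "annotation_type": "polygon"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 7,
         "segmentation": [[120.0, 900.5, 480.0, 905.0, 482.0, 960.0, 118.0, 955.0]]},
        {"id": 11, "image_id": 1, "category_id": 3,     # e.g., a road sign
         "bbox": [1510.0, 420.0, 64.0, 64.0]},          # [x, y, width, height]
    ],
}
```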
Figure 8 illustrates a range of facilities annotated with bounding boxes, encompassing road signage (e.g., delineator signs, seagull marks, road signs), safety equipment (e.g., speed bumps, obstacle markers), and infrastructure components (e.g., cross beams, pillars). Each sub-figure illustrates how these items appear and are labeled in their original traffic environments, thus highlighting the diverse range of objects encountered. This bounding-box approach is particularly effective for relatively discrete or well-defined objects, such as signs and small-scale safety features, allowing models to localize these elements within complex scenes efficiently.
In contrast, Figure 9 (Polygon Annotation) showcases facility types annotated with segmentation polygons, which include guardrails, shock absorptions, tunnels, and eco-engineered slopes, among others. These objects often exhibit a more elongated or irregular shape, making polygon-based annotation crucial for accurately capturing their contours. By providing precise boundaries, the dataset supports more advanced segmentation models that must identify detailed structural or spatial nuances—an essential requirement in scenarios like detecting damages on guardrails, monitoring rock-fall protection walls, or assessing the integrity of tunnels and bridges.
In summary, these figures underscore the dataset’s flexibility for both bounding-box and polygon annotations. Smaller objects benefit from bounding boxes for simpler localization, whereas larger or irregularly shaped facilities necessitate the granularity provided by polygon segmentation. This dual-format approach not only broadens the range of applications (e.g., object detection, instance segmentation) but also reflects practical deployment conditions where road infrastructure elements vary considerably in shape, size, and complexity.
In total, 34 facility types are included, encompassing 16 road safety facilities, 8 road management facilities, 8 traffic management facilities, and 2 other facilities. Among these, 19 facility types (56%) are annotated with bounding boxes (e.g., delineator signs, seagull marks, raised pavement markers, obstacle markers), while 15 facility types (44%) are annotated with segmentation masks (e.g., guardrails, tunnels, underpasses). A detailed breakdown of facility frequency and annotation types (bounding box or polygon) is available in Table 4.
In addition to tracking facility type and spatial extent, the dataset captures geographic and environmental variation. Figure 10 and Figure 11 illustrate the diverse distribution of annotated facilities and representative examples of bounding-box and polygon annotations. As shown in Table A1 and Table A2, the annotated objects are distributed across multiple administrative regions, road types (highway, national road, local road), and geographic contexts (city vs. non-city). This comprehensive coverage facilitates models that generalize effectively to a variety of traffic environments. Furthermore, each annotation includes fields specifying the region type (city or non-city) and road type (highway, national, or local), which enables fine-grained analyses of different road and landscape conditions.
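As a small example of the fine-grained analyses these context fields enable, the sketch below tallies annotated objects per (region type, road type) pair; it assumes the illustrative schema from the previous snippet rather than the exact released field names.

```python
from collections import Counter

def count_by_context(images, annotations):
    """Tally annotated objects per (region_type, road_type) pair, assuming the
    image-level context fields sketched in the schema example above."""
    ctx = {img["id"]: (img["region_type"], img["road_type"]) for img in images}
    return Counter(ctx[ann["image_id"]] for ann in annotations)

images = [{"id": 1, "region_type": "non-city", "road_type": "national"},
          {"id": 2, "region_type": "city", "road_type": "local"}]
annotations = [{"image_id": 1}, {"image_id": 1}, {"image_id": 2}]
print(count_by_context(images, annotations))
# Counter({('non-city', 'national'): 2, ('city', 'local'): 1})
```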
Overall, the data outlined here offer a thorough foundation for the development and evaluation of computer vision algorithms aimed at detecting, segmenting, and classifying road infrastructure facilities. By balancing the breadth of coverage (i.e., multiple facility categories and geographic regions) with the depth of detail (i.e., precise bounding boxes or segmentation masks), this dataset provides a rich resource for building models and conducting advanced research in areas such as autonomous driving, infrastructure asset management, and intelligent transportation systems.
This road infrastructure facility training dataset comprises 34 facility types, broadly divided into 4 categories: 16 road safety facilities, 8 road management facilities, 8 traffic management facilities, and 2 facilities classified as “other”. Of these 34 facility types, 19 (56%) were annotated using bounding boxes, and the remaining 15 (44%) were annotated using segmentation. Specifically, the bounding-box annotation covers such facilities as delineator signs, seagull marks, raised pavement markers, obstacle markers, diagonal lines and structural paint, gaze-directed poles, lighting, road mirrors (or reflectors), speed bumps, traffic signals, road signs, safety signs, road nameplates, emergency contact facilities, CCTV, variable message signs, road guideposts, pillars, and utility poles. Meanwhile, the 15 facilities annotated via segmentation include median barriers, guardrails, shock absorptions, rock-fall nets, rock-fall fences, rock-fall protection walls, eco-engineered slopes, bridges, tunnels, underpasses, elevated roads, interchanges, underground passages, pedestrian overpasses, and bus or transit stops. Together, these annotated data form the foundation of a comprehensive dataset intended for robust training and evaluation in detecting and classifying a wide range of road infrastructure features. For additional details, please refer to Table 4.
Figure 12 illustrates sample images and their corresponding annotations for various road settings, including city national roads, city local roads, non-city highways, non-city national roads, and non-city local roads. In each row, the left-hand image depicts the original scene, while the right-hand image shows the same view with labeled objects. Within a single image, both bounding boxes and segmentation-based annotations can be applied as needed, demonstrating the consistency and versatility of the annotation process across diverse geographic and infrastructural contexts.

3. Results

3.1. Object Detection Model (Bounding Box)

We trained a bounding-box detection model to identify 19 types of road-related objects, including delineator signs, traffic signals, seagull signs, road signs, reflectors, safety signs, obstacle markers, road nameplates, structural paint and diagonal lines, emergency contact facilities, gaze-directed poles, CCTV, lighting fixtures, variable message signs, road mirrors, guideposts, speed bumps, and more.
A total of 200,000 images were used for bounding-box detection, split into training, validation, and test sets, as shown in Table 5. The dataset was curated to maintain a similar distribution of object classes.
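The split itself ships with the dataset rather than being produced by a prescribed script. If one wishes to verify that class frequencies remain comparable across the subsets, a minimal check (using hypothetical annotation dictionaries keyed by category_id) could look like the following.

```python
from collections import Counter

def class_distribution(annotations):
    """Return per-class fractions for one subset's annotation list."""
    counts = Counter(ann["category_id"] for ann in annotations)
    total = sum(counts.values())
    return {cid: n / total for cid, n in counts.items()}

def max_distribution_gap(train_anns, val_anns):
    """Largest per-class difference in relative frequency between two subsets."""
    train_dist, val_dist = class_distribution(train_anns), class_distribution(val_anns)
    classes = set(train_dist) | set(val_dist)
    return max(abs(train_dist.get(c, 0.0) - val_dist.get(c, 0.0)) for c in classes)

# Toy usage with two tiny annotation lists
train = [{"category_id": 1}, {"category_id": 1}, {"category_id": 2}]
val = [{"category_id": 1}, {"category_id": 2}]
print(max_distribution_gap(train, val))  # ~0.17 for this toy example
```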
In total, 1,604,318 objects were annotated in the training set, 200,540 in the validation set, and 226,775 in the test set, for a total of 2,031,633 annotated instances (see Table 6). Additionally, Table 7 provides the hardware setup and hyperparameter configurations for the YOLOv4-based bounding-box detection model, including GPU specifications and training conditions.

Object Detection Evaluation Results

We evaluated the trained bounding-box detection model on the test and validation sets. Table 8 presents the precision, recall, and mean average precision at a 50% IoU threshold (mAP50).
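For reference, the reported metrics follow the standard definitions, where a prediction counts as a true positive when its IoU with a same-class ground-truth box is at least 0.5:

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{mAP}_{50} = \frac{1}{C}\sum_{c=1}^{C} \text{AP}_c\big|_{\text{IoU} \ge 0.5},
\]

where \(\text{AP}_c\) is the area under the precision–recall curve for class \(c\) and \(C\) is the number of evaluated classes.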
Overall, the model achieves promising results, with an mAP50 of approximately 70.57% on the test set. These outcomes confirm the feasibility of accurately detecting a wide range of road facilities in diverse environments. Figure 13 further illustrates sample inference results, demonstrating how the model localizes multiple object types under varying road conditions.

3.2. Segmentation Model (Polygon)

We trained a segmentation model to detect 15 categories of road-related infrastructure, including medians, guardrails, shock absorptions, rock-fall nets, rock-fall fences, rock-fall retaining walls, eco-engineered slopes, bridges, underpasses, viaducts, interchanges, pedestrian underpasses, footbridges, bus/transit stops, and tunnels. Each object instance was annotated using polygon-based masks appropriate for semantic or instance segmentation tasks. The dataset was divided into training, validation, and test sets, as shown in Table 5. Similar to the bounding-box dataset, a balanced distribution of object classes was ensured across each subset to maintain representativeness.
Additionally, Table 9 summarizes the total number of annotated instances—514,316 in the training set, 64,290 in the validation set, and 66,344 in the test set—amounting to 644,950 object instances. Furthermore, Table 10 details the environment setup used for the YOLOv5-based segmentation model, including hardware resources and training conditions.

Segmentation Evaluation Results

After training, the model was evaluated on both the test and validation sets. Table 11 summarizes the precision, recall, and mean average precision at 50% IoU threshold (mAP50) for each set.
Overall, the segmentation model attained an mAP50 of roughly 0.60 on the test set, indicating moderate to high accuracy in delineating diverse road facilities. These findings demonstrate the feasibility of polygon-based segmentation for complex infrastructure data in real-world conditions. Figure 14 further shows sample inference results, illustrating how the model accurately segments multiple facility types across various road scenes.

4. Discussion

This study introduces a large-scale, nationwide road infrastructure dataset, demonstrating the feasibility and robustness of AI-based object detection and segmentation in diverse real-world conditions.

4.1. Interpretation of Results

4.1.1. Balanced Coverage

The similar performance observed on both the test and validation sets (Table 8 and Table 11) suggests that our strategy of collecting data nationwide successfully captured a broad distribution of road facilities. The resulting dataset exhibits strong generalization capability with minimal risk of overfitting, which is notable given the heterogeneity and complexity of countrywide road networks.

4.1.2. Practical Utility

The dataset’s capacity to detect and segment a wide range of facilities—including guardrails, tunnels, and rock-fall-prevention structures—points to its applicability in infrastructure asset management, real-time hazard identification, and advanced driver-assistance systems. For instance, detecting damaged guardrails or fences can inform proactive maintenance, potentially averting large-scale accidents or failures.

4.1.3. Comparison with Prior Work

Unlike many region-specific or single-city studies, our dataset spans 34 facility classes (19 bounding-box categories, 15 segmentation categories), facilitating multi-task deep learning research that includes both straightforward object detection and more advanced instance or semantic segmentation. This nationwide scope addresses a major gap in current offerings, wherein smaller datasets may focus on sign detection alone or omit complex facility types.

4.2. Data Collection and Quality Assurance

The multi-phase data acquisition pipeline relied on nine specialized vehicles equipped with GPS-based synchronization and high-resolution cameras. Systematic measures—discarding low-quality or duplicated frames and performing a double-layered personal information de-identification—ensured both high data reliability and compliance with privacy regulations. Such rigorous protocols likely contributed to consistent model performance and minimized annotation errors in the ground-truth labels.

4.3. Limitations

4.3.1. Class Imbalance

Certain facility types (e.g., emergency contact devices, rare rock-fall nets) appear at very low frequencies (<1%), mirroring practical usage patterns but hindering consistent model generalization. Data augmentation or specialized sampling strategies could mitigate such imbalance.

4.3.2. Environmental Constraints

While comprehensive, the dataset includes fewer samples in nighttime or severe weather conditions (e.g., heavy rain or snow). Future data collection under these adverse conditions is crucial to improving robustness in harsher or atypical environments.

4.3.3. Lack of Temporal Annotations

Current annotations focus on static frames without temporal linkage. This omission restricts the model’s capacity to handle gradual occlusions or multi-frame continuity. Incorporating video sequences and time-linked annotations might further enhance performance, particularly when objects are partially obscured over time.

4.4. Practical and Theoretical Implications

The high detection and segmentation accuracies could significantly benefit both research and industry. Specifically, from a theoretical standpoint, our dataset provides a rich testbed for validating multi-task object detection and segmentation algorithms under diverse real-world conditions. This nationwide coverage opens new avenues for domain adaptation and transfer learning in road-infrastructure analytics, addressing the challenges of heterogeneous traffic environments [27].
On the practical side, our dataset can serve as a cornerstone for global traffic management improvements, offering detailed annotations that facilitate real-time hazard detection, infrastructure asset management, and advanced driver-assistance systems [24,25]. For instance, detecting damaged guardrails or fences can inform proactive maintenance [28], potentially averting large-scale accidents or failures. Integrating these capabilities into intelligent transportation systems may help reduce congestion, optimize route planning, and ultimately yield economic benefits by minimizing accident-related costs and traffic delays. Moreover, ongoing enhancements in 3D analysis models or sensor-fusion techniques (e.g., LiDAR, radar) can leverage our dataset to further improve perception accuracy, thereby contributing to the development of safer and more efficient autonomous vehicles.

4.5. Future Outlook

We are further improving the dataset by focusing on three main directions:
  • Sensor Fusion: Integrating LiDAR, radar, or thermal imagery could expand detection accuracy and ensure greater reliability for autonomous vehicles;
  • Rare Class Augmentation: Adopting GAN-based synthetic data generation or targeted data gathering may alleviate class imbalance issues;
  • Real-Time Monitoring Systems: Building on the current segmentation and detection capabilities, the continuous monitoring of critical road infrastructure is an attainable next step, enabling prompt maintenance responses and safer traffic management.
Overall, the results underscore the importance of a nationwide, high-resolution dataset for building robust, versatile AI models in road infrastructure applications. Despite the aforementioned challenges, the strong performance in both bounding-box detection and segmentation demonstrates practical viability, laying the groundwork for advanced, data-driven solutions in intelligent transportation, road safety, and beyond.

5. Conclusions

In this paper, we presented a large-scale, nationwide road infrastructure dataset for comprehensive facility recognition, encompassing highways, national roads, and local roads in both city and non-city regions throughout South Korea. By systematically collecting 2 million raw frames and refining them into 200,000 high-quality frames annotated with both bounding boxes and segmentation masks, we have established a robust foundation for training and evaluating computer vision models related to road safety, maintenance, and autonomous driving. From a theoretical perspective, this dataset expands the scope of multi-task learning—encompassing detection and instance segmentation—while also providing a rich resource for exploring transfer learning, domain adaptation, and class imbalance mitigation strategies in real-world road scenarios.
Through rigorous annotation protocols and multi-stage quality checks, our dataset demonstrates high precision and recall in detecting a diverse range of road facilities, including guardrails, traffic signals, tunnels, and bridges. The bounding-box detection experiment achieved an mAP50 of approximately 70.57%, while the segmentation experiment attained around 60.2%, underscoring the dataset’s reliability for practical AI tasks. In practical applications, this comprehensive coverage of facilities supports various global traffic management and intelligent transportation systems, offering opportunities for preventative maintenance, reduced congestion, and potentially significant economic benefits through fewer accidents and more efficient routing. Additionally, by publicly releasing our dataset, we aim to encourage collaboration within the AI community, fostering further enhancements such as sensor fusion and better annotation of rare or emerging road features.
Despite the dataset’s wide coverage, future expansions remain viable. We plan to incorporate additional imagery under challenging conditions (e.g., nighttime, severe weather) to enhance robustness. Moreover, if field conditions or technologies evolve (e.g., new sensors, advanced annotation platforms), we intend to update the dataset accordingly, ensuring its ongoing relevance for state-of-the-art traffic analysis and autonomous driving research. Through these continuous improvements, our dataset can serve as a flexible and evolving foundation for both theoretical advances in computer vision and real-world deployments—ultimately contributing to safer, more efficient road infrastructures worldwide.

Author Contributions

Conceptualization, G.D., S.J.K. and T.K.S.; methodology, S.J.K. and T.K.S.; software, E.K. and B.K.; validation, S.J.K. and E.K.; formal analysis, G.D. and H.K.; investigation, H.K. and S.A.; resources, G.D.; data curation, E.K.; writing—original draft preparation, H.K. and S.A.; writing—review and editing, H.K., L.Z. and X.S.; visualization, H.K. and B.K.; supervision, G.D.; project administration, G.D.; funding acquisition, G.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the AI Data Construction Project, funded by the National Information Society Agency (NIA) of the Republic of Korea.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. Furthermore, any personally identifiable information, including vehicle license plates, has been fully anonymized to ensure confidentiality and privacy compliance.

Data Availability Statement

Our dataset is publicly available for download via AI Hub (https://www.aihub.or.kr/, accessed on 1 January 2025) for Korean researchers and GitHub (https://github.com/kimhyeongbok/korean-road-infrastructure-dataset, accessed on 1 January 2025) for the international research community, distributed under the CC-BY-NC-SA license. For any inquiries or special requests, please contact the corresponding author or the first author.

Acknowledgments

We sincerely thank SODASYSTEM, UOK, KATECH, and WiFive for their active contributions. We also wish to express our gratitude to Su Xiaohong and Zhao Lingling from Harbin Institute of Technology for their valuable feedback and advice.

Conflicts of Interest

Authors Hyeongbok Kim, Eunbi Kim, Sanghoon Ahn, Beomjin Kim and Gilmu Dong were employed by the company Testworks, Inc. Author Tae Kyung Sung was employed by the company WiFive Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Dataset

Appendix A.1. Data Annotation Guidelines

Figure A1 provides an overview of the guidelines and training infrastructure used to achieve consistent, high-quality labeling across our dataset. As shown in Figure A1a, the annotation guidelines offer step-by-step instructions and quality benchmarks for labeling each object. These guidelines were introduced to all annotators through both in-person sessions at our offline training center in Seoul (Figure A1b) and remote courses via our online education platform (Figure A1c). This hybrid training approach helped to maintain uniform labeling standards and allowed contributors to collaborate effectively regardless of their location.
Figure A1. (a) Examples of the data annotation guidelines used for labeling and quality assurance; (b) our offline training center in Seoul; (c) our online training platform for remote data labeling education.
Figure A2 illustrates various bounding-box tagging scenarios:
  • Proper tagging (a): All objects are correctly identified and precisely outlined.
  • Wrong tagging (b): Bounding boxes are misplaced, either incorrectly aligned or mislabeled.
  • Under tagging (c): Some objects remain unannotated, resulting in incomplete labeling.
  • Over tagging (d): Bounding boxes are applied excessively or redundantly, potentially introducing confusion and inflating object counts.
By presenting common tagging errors alongside proper examples, these guidelines and training materials equip annotators with the knowledge to identify and avoid pitfalls. Consequently, the resulting dataset maintains a high level of annotation accuracy, which is crucial for reliable model training and evaluation.
Figure A2. Examples of labeling error corrections: (a) Proper tagging, where all objects are correctly annotated; (b) Wrong tagging, where bounding boxes are incorrectly placed or mislabeled; (c) Under tagging, where some objects are missed or not labeled; (d) Over tagging, where bounding boxes are excessively or redundantly applied.

Appendix A.2. Data Acquisition Detail

Ego-vehicle positional data are stored in .shp or .txt format using the WGS84-LLA coordinate system (developed by the National Geospatial-Intelligence Agency, Springfield, VA, USA). For speed and distance calculations, these data are transformed into the TM-KATECH (TM128) system from the Korea Automotive Technology Institute (KATECH), located in Cheonan, Republic of Korea, with values computed according to Equation (A1). Ego-vehicle and object positions are then matched to images captured at 20–30 m intervals (only those containing objects) and grouped into sequences by driving environment for further refinement.
During the pre-recording phase (Figure A3a,b), we first check and configure the filming equipment (e.g., frames per second and resolution), then inspect the vehicle and camera lens for any issues. A brief test drive follows to confirm normal operation and ensure stable image capture under typical driving conditions.
In the on-recording phase (Figure A3c,d), the driver maintains a constant speed, stays centered in the lane whenever it is safe to do so, and keeps an appropriate following distance from other vehicles. These measures help to produce uniform, high-quality footage by minimizing sudden movements or shifts in camera perspective.
Finally, in the post-recording phase (Figure A3e,f), we review the video for clarity and potential anomalies, re-verify all camera settings, and back up the newly recorded video along with its corresponding GPS data. This final step ensures the integrity and safety of the collected information before proceeding to the next stage of analysis.
\[
d = r\,\operatorname{archav}(h) = 2r\arcsin\!\left(\sqrt{h}\right), \qquad
d = 2r\arcsin\!\left(\sqrt{\operatorname{hav}(\varphi_2-\varphi_1) + \cos\varphi_1\cos\varphi_2\,\operatorname{hav}(\lambda_2-\lambda_1)}\right)
  = 2r\arcsin\!\left(\sqrt{\sin^2\!\frac{\varphi_2-\varphi_1}{2} + \cos\varphi_1\cos\varphi_2\,\sin^2\!\frac{\lambda_2-\lambda_1}{2}}\right). \tag{A1}
\]
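For readers who prefer code, Equation (A1) is the standard haversine great-circle distance; a direct Python transcription (radius in meters, inputs in degrees) is shown below. The subsequent TM-KATECH (TM128) projection step is not reproduced here.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2, r=6_371_000.0):
    """Great-circle distance in meters between two WGS84 points (Equation (A1))."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlam = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * r * asin(sqrt(h))

# Example: two points roughly 25 m apart along a road
print(round(haversine_m(36.80000, 127.15000, 36.80022, 127.15003), 1))
```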
Figure A3. An overview of the video recording workflow, from pre-recording to post-recording tasks.

Appendix A.3. Additional Data Detail Information

Table A1 provides a summary of how the annotated objects are distributed across various administrative regions. Notably, Gyeonggi-do accounts for the highest number of detected objects (506,115), followed by Gangwon-do (428,170) and Gyeongsangbuk-do (352,906). The overall distribution reflects a breadth of road infrastructure conditions, capturing both metropolitan areas (e.g., Seoul, Busan) and more non-city regions (e.g., Gangwon-do, Jeollabuk-do). By encompassing 2,676,583 objects across 16 distinct provinces and major cities, the dataset ensures extensive geographic representation.
Table A1. Regional distribution of detected objects.
Region | Object Count | Ratio
Gangwon-do | 428,170 | 16.00%
Gyeonggi-do | 506,115 | 18.91%
Gyeongsangnam-do (South) | 183,905 | 6.87%
Gyeongsangbuk-do (North) | 352,906 | 13.18%
Gwangju | 27,178 | 1.02%
Daegu | 74,706 | 2.79%
Daejeon | 47,347 | 1.77%
Busan | 50,348 | 1.88%
Seoul | 96,767 | 3.62%
Sejong | 10,157 | 0.38%
Ulsan | 18,811 | 0.70%
Incheon | 20,416 | 0.76%
Jeollanam-do (South) | 182,585 | 6.82%
Jeollabuk-do (North) | 188,040 | 7.03%
Chungcheongnam-do (South) | 273,799 | 10.23%
Chungcheongbuk-do (North) | 215,333 | 8.05%
Total | 2,676,583 | 100.00%
Table A2 further breaks down the annotations according to road type (highway, national road, local road) and region type (city vs. non-city). Observably, the majority of objects lie in non-city sections, with a significant share along national roads and highways. However, city areas, while smaller in proportion, still feature a diverse range of annotated facilities on both national and local roads. These insights confirm that the dataset encompasses a variety of real-world traffic scenarios, making it suitable for training and evaluating AI models that must generalize to diverse environments—from city corridors to more sparsely populated roadways.
Table A2. Distribution by road type and region type.
Region Type | Road Type | Count | Ratio
City | National Road | 193,387 | 7.23%
City | Local Road | 9,649 | 0.36%
Non-City | Highway | 781,924 | 29.21%
Non-City | National Road | 1,478,748 | 55.25%
Non-City | Local Road | 212,875 | 7.95%
Total | | 2,676,583 | 100.00%

Appendix A.4. Additional Experimental Results

  • High Proportion of Small Objects:
    Figure A4a–c illustrates the distribution of object sizes in our dataset, revealing that many objects occupy only a small area of the image (i.e., low width and height). Object detection models often struggle with such small objects due to their limited pixel representation, which can negatively impact overall performance. When a significant portion of the dataset consists of these small objects, extracting meaningful features becomes even more challenging.
  • Extreme Aspect Ratios of Road Facilities:
    Figure A4d shows the distribution of aspect ratios in our dataset. Object detection models often struggle with very tall and narrow or very wide and short objects, as these extremes occupy only a small portion of the image, making feature extraction and localization more difficult. Many detection frameworks (e.g., Faster R-CNN, YOLO) rely on predefined anchor boxes, which may not adequately cover such extreme shapes.
Figure A4. Distribution of object width, height, and area in the dataset.
  • Occlusion-Induced Detection Failures:
    Figure A5 shows two scenarios in which partial occlusion compromises object detection. In Figure A5a,b, a building partially obstructs certain objects, leading to incorrect bounding boxes or reduced IoU. In Figure A5c,d, tall trees obscure the target so severely that the model fails to detect it altogether. These examples underscore how occlusion—whether by buildings or vegetation—can result in detection errors or complete misses.
Figure A5. Occlusion-induced detection failures. Pairs (a–d) each present a scenario where partial occlusion hampers accurate detection.

References

  1. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  2. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  3. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  4. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  5. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  6. Kuznetsova, A.; Rom, H.; Alldrin, N.; Uijlings, J.; Krasin, I.; Pont-Tuset, J.; Kamali, S.; Popov, S.; Malloci, M.; Kolesnikov, A.; et al. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. Int. J. Comput. Vis. 2020, 128, 1956–1981. [Google Scholar] [CrossRef]
  7. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The kitti dataset. Int. J. Robot. Res. 2013, 32, 1231–1237. [Google Scholar] [CrossRef]
  8. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
  9. Neuhold, G.; Ollmann, T.; Rota Bulo, S.; Kontschieder, P. The mapillary vistas dataset for semantic understanding of street scenes. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4990–4999. [Google Scholar]
  10. Yu, F.; Xian, W.; Chen, Y.; Liu, F.; Liao, M.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving video database with scalable annotation tooling. arXiv 2018, arXiv:1805.04687. [Google Scholar]
  11. Rebuffi, S.A.; Bilen, H.; Vedaldi, A. Learning multiple visual domains with residual adapters. Adv. Neural Inf. Process. Syst. 2017, 30, 506–516. [Google Scholar]
  12. Zamir, A.R.; Sax, A.; Shen, W.; Guibas, L.J.; Malik, J.; Savarese, S. Taskonomy: Disentangling task transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3712–3722. [Google Scholar]
  13. McCann, B.; Keskar, N.S.; Xiong, C.; Socher, R. The natural language decathlon: Multitask learning as question answering. arXiv 2018, arXiv:1806.08730. [Google Scholar]
  14. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  15. Acuna, D.; Ling, H.; Kar, A.; Fidler, S. Efficient interactive annotation of segmentation datasets with polygon-rnn++. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 859–868. [Google Scholar]
  16. Voigtlaender, P.; Krause, M.; Osep, A.; Luiten, J.; Sekar, B.B.G.; Geiger, A.; Leibe, B. Mots: Multi-object tracking and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7942–7951. [Google Scholar]
  17. Milan, A. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831. [Google Scholar]
  18. Choi, Y.; Kim, N.; Hwang, S.; Park, K.; Yoon, J.S.; An, K.; Kweon, I.S. KAIST multi-spectral day/night data set for autonomous and assisted driving. IEEE Trans. Intell. Transp. Syst. 2018, 19, 934–948. [Google Scholar] [CrossRef]
  19. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
  20. Che, Z.; Li, G.; Li, T.; Jiang, B.; Shi, X.; Zhang, X.; Lu, Y.; Wu, G.; Liu, Y.; Ye, J. D2-city: A large-scale dashcam video dataset of diverse traffic scenarios. arXiv 2019, arXiv:1904.01975. [Google Scholar]
  21. Vinothkumar, S.; Dhanushya, S.; Guhan, S.; Krisvanth, P. Enhancing Road Infrastructure Maintenance Using Deep Learning Approach. In Proceedings of the International Conference on Intelligent Systems Design and Applications, Online, 11–13 December 2023; Springer: Cham, Switzerland, 2023; pp. 205–214. [Google Scholar]
  22. Silva, L.A.; Leithardt, V.R.Q.; Batista, V.F.L.; González, G.V.; Santana, J.F.D.P. Automated road damage detection using UAV images and deep learning techniques. IEEE Access 2023, 11, 62918–62931. [Google Scholar] [CrossRef]
  23. Angulo, A.; Vega-Fernández, J.A.; Aguilar-Lobo, L.M.; Natraj, S.; Ochoa-Ruiz, G. Road damage detection acquisition system based on deep neural networks for physical asset management. In Proceedings of the Advances in Soft Computing: 18th Mexican International Conference on Artificial Intelligence, MICAI 2019, Xalapa, Mexico, 27 October–2 November 2019; Proceedings 18. Springer: Cham, Switzerland, 2019; pp. 3–14. [Google Scholar]
  24. Sami, A.A.; Sakib, S.; Deb, K.; Sarker, I.H. Improved YOLOv5-based real-time road pavement damage detection in road infrastructure management. Algorithms 2023, 16, 452. [Google Scholar] [CrossRef]
  25. Kulambayev, B.; Gleb, B.; Katayev, N.; Menglibay, I.; Momynkulov, Z. Real-Time Road Damage Detection System on Deep Learning Based Image Analysis. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1051–1061. [Google Scholar] [CrossRef]
  26. Kim, H.; Kim, B.; Lu, W.; Li, L. No-code MLOps Platform for Data Annotation. In Proceedings of the 2023 IEEE International Conference on Memristive Computing and Applications (ICMCA), Jinan, China, 8–10 December 2023; pp. 1–6. [Google Scholar]
  27. Botezatu, A.P.; Burlacu, A.; Orhei, C. A review of deep learning advancements in road analysis for autonomous driving. Appl. Sci. 2024, 14, 4705. [Google Scholar] [CrossRef]
  28. Hwang, S.J.; Hong, S.W.; Yoon, J.S.; Park, H.; Kim, H.C. Deep Learning-based Pothole Detection System. J. Semicond. Disp. Technol. 2021, 20, 88–93. [Google Scholar]
Figure 1. Overview of our dataset.
Figure 2. Examples of nationwide roads collected for dataset construction. Highways, national roads, and local roads were categorized into city and non-city sections, allowing for the collection of diverse road-facility data across different regions. Subfigures illustrate (a) Distribution of roads, (b) City national roads, (c) City local roads, (d) Non-city highway roads, (e) Non-city national roads, and (f) Non-city local roads.
Figure 3. (a) The XNB-8003 camera supports 6-megapixel (3328 × 1872) resolution at 30 fps (H.265) with real-time video transmission. It includes fog correction, noise reduction, and gyro-based image stabilization for improved video clarity. (b) Vehicle specifications used for nationwide road imaging and GPS logging, each outfitted with integrated cameras and onboard systems to ensure high-quality data collection.
Figure 4. Data acquisition vehicles employed for nationwide road image and GPS data collection. A total of nine vehicles were deployed, consisting of (a) one UOK-LAND I, (b) five UOK-LAND II, and (c) three survey vehicles from Sodasystem.
Figure 5. Our data annotation and management tool.
Figure 6. Our data processing procedure.
Figure 7. Personal data de-identification process.
Figure 8. Examples of road facilities requiring bounding-box annotations.
Figure 9. Examples of road facilities requiring polygon-based annotations.
Figure 10. Object-based distribution of road facilities.
Figure 11. Regional distribution of road facilities.
Figure 12. Examples of annotated data for different road types.
Figure 13. Inference results from the object detection model trained on our dataset.
Figure 14. Inference results from the segmentation model trained on our dataset.
Table 1. Comparison of our dataset (KRID) with other public road datasets. Road-infrastructure class counts are split into road safety (Rd. Safety), road management (Rd. Mgmt), and traffic management (Traffic Mgmt).

| Dataset | Annotated Images | Resolution | Rd. Safety | Rd. Mgmt | Traffic Mgmt | Total Classes | Location | Bbox Instances | Segment. Instances |
|---|---|---|---|---|---|---|---|---|---|
| BDD100K [10] | 100 k | 1280 × 720 | 1 | 0 | 1 | 10 | 4 Cities | 1,841,435 | 189,053 |
| KAIST [18] | 8.9 k | 1280 × 960 | 0 | 0 | 0 | 6 | 1 Met. City | 308,913 | - |
| KITTI [19] | 15 k | 1384 × 1032 | 0 | 0 | 0 | 8 | 1 City | ∼200 k | - |
| D2-City [20] | 700 k | 1920 × 1080 | 0 | 0 | 0 | 12 | 5 Cities | ∼40 M | - |
| Cityscapes [8] | 25 k | 2048 × 1536 | 2 | 2 | 2 | 30 | 50 Cities | - | 65.4 k |
| Vistas [9] | 25 k | 3264 × 2448 | 4 | 14 | 8 | 66 | Global | - | ∼2 M |
| KRID (Ours) | 200 k | 3328 × 1872 | 16 | 8 | 8 | 34 | Nationwide | 2,031,633 | 644,950 |
Table 2. Road network length (km) and ratio by region (Non-City and City).

| Region | Non-City Highway (km) | % | Non-City Nat. Road (km) | % | Non-City Local Rd. (km) | % | City Nat. Road (km) | % | City Local Rd. (km) | % |
|---|---|---|---|---|---|---|---|---|---|---|
| Gyeonggi-do | 2589.90 | 20% | 1793.57 | 8% | - | - | 1488.40 | 25% | - | - |
| Gyeongsangbuk-do | 1910.31 | 15% | 4112.01 | 18% | - | - | 258.53 | 7% | - | - |
| Gyeongsangnam-do | 1452.26 | 11% | 2669.52 | 12% | - | - | 335.26 | 9% | - | - |
| Gangwon-do | 1156.60 | 9% | 3403.61 | 15% | - | - | 171.70 | 5% | - | - |
| Chungcheongnam-do | 1139.77 | 9% | 2304.33 | 10% | 3064.07 | 53% | 420.05 | 9% | 132.52 | 46% |
| Chungcheongbuk-do | 969.28 | 8% | 1608.34 | 7% | 2691.57 | 47% | 220.51 | 3% | 158.01 | 54% |
| Jeollanam-do | 1041.50 | 8% | 3706.61 | 16% | - | - | 135.70 | 4% | - | - |
| Jeollabuk-do | 1004.58 | 8% | 2554.93 | 11% | - | - | 282.68 | 8% | - | - |
| Daegu | 343.41 | 3% | 44.90 | 0% | - | - | 170.65 | 5% | - | - |
| Incheon | 314.90 | 2% | 45.17 | 0% | - | - | 116.45 | 3% | - | - |
| Ulsan | 257.06 | 2% | 285.94 | 1% | - | - | 84.72 | 2% | - | - |
| Daejeon | 208.47 | 2% | 59.68 | 0% | - | - | 115.51 | 3% | - | - |
| Busan | 201.85 | 2% | 138.39 | 1% | - | - | 136.18 | 4% | - | - |
| Gwangju | 91.91 | 1% | 62.19 | 0% | - | - | 147.02 | 4% | - | - |
| Seoul | 93.38 | 1% | 2.98 | 0% | - | - | 335.14 | 9% | - | - |
| Total (km) | 12,775.19 | 100% | 22,792.17 | 100% | 5755.64 | 100% | 4418.49 | 100% | 290.54 | 100% |
| Percentage of total | 27.80% | | 49.50% | | 12.50% | | 9.60% | | 0.60% | |

Ratio of Non-City to City = 89.80% : 10.20%.
Table 3. Data structure definition.

| No. | Field Name | Type | Required | Description |
|---|---|---|---|---|
| 1 | license | array | N | Copyright |
| 2 | info[] | array | Y | General Information |
| 2-1 | info[].contributor | string | Y | Contributor |
| 2-2 | info[].date_created | string | Y | Creation Date |
| 2-3 | info[].description | string | Y | Data Description |
| 2-4 | info[].url | string | N | Image URL |
| 2-5 | info[].version | number | N | Processing Version |
| 2-6 | info[].year | number | N | Processing Year |
| 3 | images[] | array | Y | Image Information |
| 3-1 | images[].id | number | Y | Image ID |
| 3-2 | images[].file_name | string | Y | Image File Name |
| 3-3 | images[].width | number | Y | Image Width |
| 3-4 | images[].height | number | Y | Image Height |
| 3-5 | images[].latitude | number | N | Latitude |
| 3-6 | images[].longitude | number | N | Longitude |
| 3-7 | images[].day | string | Y | Shooting Date |
| 3-8 | images[].time | string | Y | Shooting Time |
| 3-9 | images[].camera | string | Y | Camera Model |
| 3-10 | images[].velocity | number | Y | Driving Speed |
| 3-11 | images[].direction | number | Y | Shooting Direction |
| 3-12 | images[].climate | string | Y | Weather at Shooting Time |
| 3-13 | images[].frame | number | Y | Frame Number |
| 4 | categories[] | array | Y | Object Information |
| 4-1 | categories[].id | number | Y | Object ID |
| 4-2 | categories[].name | enum | Y | Object Name |
| 4-3 | categories[].supercategory | string | N | Parent Category |
| 5 | annotations[] | array | Y | Annotation Information |
| 5-1 | annotations[].id | number | Y | Object ID |
| 5-2 | annotations[].image_id | number | Y | Image ID |
| 5-3 | annotations[].category_id | number | Y | Category ID |
| 5-4 | annotations[].bbox | array | Y | Bounding-Box Coordinates |
| 5-5 | annotations[].segmentation | array | Y | Segmentation Coordinates |
| 5-6 | annotations[].region1 | enum | Y | Province/City |
| 5-7 | annotations[].region2 | enum | N | Local District |
| 5-8 | annotations[].roadtype | enum | Y | Road Type |
| 5-9 | annotations[].regiontype | enum | Y | City/Non-City Type |
| 5-10 | annotations[].state | enum | Y | Facility Condition |
| 5-11 | annotations[].name | enum | Y | Facility Name |
| 5-12 | annotations[].subtype | string | N | Facility Subtype |
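To make the schema in Table 3 concrete, the following minimal Python sketch builds and round-trips a single KRID-style record. All concrete values (file names, coordinates, and the enum strings used for road type, region type, and facility state) are hypothetical placeholders chosen for illustration, not entries taken from the dataset.

```python
import json

# Illustrative record following the field layout in Table 3.
# Every value below is a hypothetical example, not real KRID data.
sample = {
    "license": [],
    "info": [{
        "contributor": "example-contributor",
        "date_created": "2024-01-01",
        "description": "Example KRID-style annotation file",
        "version": 1.0,
        "year": 2024,
    }],
    "images": [{
        "id": 1,
        "file_name": "example_000001.jpg",
        "width": 3328,
        "height": 1872,
        "latitude": 37.5665,      # optional field
        "longitude": 126.9780,    # optional field
        "day": "2024-01-01",
        "time": "10:30:00",
        "camera": "XNB-8003",
        "velocity": 60,
        "direction": 90,
        "climate": "clear",
        "frame": 1,
    }],
    "categories": [{"id": 25, "name": "Guardrail", "supercategory": "Road Safety"}],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "category_id": 25,
        "bbox": [100, 200, 350, 80],  # [x, y, width, height]
        "segmentation": [[100, 200, 450, 200, 450, 280, 100, 280]],
        "region1": "Gyeonggi-do",
        "region2": "Suwon-si",
        "roadtype": "national",
        "regiontype": "non-city",
        "state": "normal",
        "name": "Guardrail",
        "subtype": "",
    }],
}

# Write and re-read the record with the standard json module.
with open("example_annotation.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=2)

with open("example_annotation.json", encoding="utf-8") as f:
    loaded = json.load(f)
print(len(loaded["annotations"]), "annotation(s) loaded")
```

Because the layout follows the familiar COCO convention of separate images, categories, and annotations arrays, existing COCO-oriented tooling should be adaptable to KRID with relatively minor changes.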
Table 4. Distribution of road facilities in the dataset.

| Facility Name | Type | Object Count | Ratio | Facility Category |
|---|---|---|---|---|
| CCTV | Bbox | 22,476 | 0.84% | Traffic Management |
| Utility Pole | Bbox | 293,999 | 10.98% | Other |
| Seagull Mark | Bbox | 32,726 | 1.22% | Road Safety |
| Speed Bump | Bbox | 4773 | 0.18% | Road Safety |
| Traffic Signal | Bbox | 65,591 | 2.45% | Traffic Management |
| Diagonal Line | Bbox | 74,291 | 2.78% | Road Safety |
| Pillar | Bbox | 611,618 | 22.85% | Other |
| Emergency Contact Facility | Bbox | 160 | 0.01% | Traffic Management |
| Road Sign | Bbox | 202,289 | 7.56% | Traffic Management |
| Road Nameplate | Bbox | 31,281 | 1.17% | Traffic Management |
| Road Reflector | Bbox | 7503 | 0.28% | Road Safety |
| Road Guidepost | Bbox | 240,859 | 9.00% | Traffic Management |
| Variable Message Sign | Bbox | 13,401 | 0.50% | Traffic Management |
| Gaze-Directed Pole | Bbox | 104,075 | 3.89% | Road Safety |
| Delineator Sign | Bbox | 159,330 | 5.95% | Road Safety |
| Safety Sign | Bbox | 23,775 | 0.89% | Traffic Management |
| Obstacle Marker | Bbox | 13,067 | 0.49% | Road Safety |
| Lighting | Bbox | 128,500 | 4.80% | Road Safety |
| Raised Pavement Marker | Bbox | 1919 | 0.07% | Road Safety |
| Elevated Road | Polygon | 28,727 | 1.07% | Road Management |
| Bridge | Polygon | 3139 | 0.12% | Road Management |
| Rock-Fall Protection Wall | Polygon | 40,683 | 1.52% | Road Safety |
| Rock-Fall Net | Polygon | 7814 | 0.29% | Road Safety |
| Rock-Fall Fence | Polygon | 31,774 | 1.19% | Road Safety |
| Guardrail | Polygon | 307,060 | 11.47% | Road Safety |
| Eco-Engineering | Polygon | 289 | 0.01% | Road Safety |
| Overpass Walkway | Polygon | 2585 | 0.10% | Road Management |
| Interchange | Polygon | 1111 | 0.04% | Road Management |
| Bus/Transit Stop | Polygon | 11,834 | 0.44% | Road Management |
| Median Barrier | Polygon | 169,411 | 6.33% | Road Safety |
| Underground Passage | Polygon | 1326 | 0.05% | Road Management |
| Underpass | Polygon | 1602 | 0.06% | Road Management |
| Shock Absorption | Polygon | 27,655 | 1.03% | Road Safety |
| Tunnel | Polygon | 9940 | 0.37% | Road Management |
| Total | - | 2,676,583 | 100.00% | - |
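The per-facility counts and ratios reported in Table 4 can be recomputed from any annotation file that follows the structure in Table 3. The short sketch below, in which the file name is a placeholder, tallies annotations per category using only the Python standard library.

```python
import json
from collections import Counter

# Tally per-facility annotation counts from a KRID-style file (placeholder name).
with open("example_annotation.json", encoding="utf-8") as f:
    data = json.load(f)

id_to_name = {c["id"]: c["name"] for c in data["categories"]}
counts = Counter(id_to_name[a["category_id"]] for a in data["annotations"])
total = sum(counts.values()) or 1  # avoid division by zero on an empty file

for name, count in counts.most_common():
    print(f"{name:30s} {count:>10,d} {count / total:6.2%}")
```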
Table 5. Data split for training.

| Images | Train | Validation | Test | Total |
|---|---|---|---|---|
| Number of Images | 159,426 | 20,287 | 20,287 | 200,000 |
| Percentage | 79.70% | 10.10% | 10.10% | 100.00% |
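Table 5 reports an approximately 80/10/10 image-level split. The snippet below illustrates one simple way such a split could be generated; it is only a sketch, and the exact splitting procedure (for example, any stratification by region or road type) is not reproduced here.

```python
import random

# Approximate 80/10/10 image-level split over placeholder image IDs.
image_ids = list(range(200_000))
random.seed(0)          # fixed seed for a reproducible example
random.shuffle(image_ids)

n_train = int(0.8 * len(image_ids))
n_val = int(0.1 * len(image_ids))

train_ids = image_ids[:n_train]
val_ids = image_ids[n_train:n_train + n_val]
test_ids = image_ids[n_train + n_val:]
print(len(train_ids), len(val_ids), len(test_ids))
```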
Table 6. Object counts for bounding-box detection.

| Objects | Train | Validation | Test | Total |
|---|---|---|---|---|
| Count | 1,604,318 | 200,540 | 226,775 | 2,031,633 |
| Ratio (%) | 79.00% | 9.90% | 11.20% | 100.00% |
Table 7. Bounding-box object detection model: environment and training setup.

| Item | Specification |
|---|---|
| CPU | Ryzen 9 7900 (12 cores, 24 threads) (AMD, CA, USA) |
| Memory | 32 GB |
| GPU | GeForce RTX 4080 (NVIDIA, CA, USA) |
| Storage | 2 TB |
| OS | Ubuntu 22.04.3 (64-bit) |
| Development Language | Python 3.10.12 |
| Training Algorithm | YOLOv4 |
| Training Conditions | epochs: 15; learning rate: 0.0013; optimizer: Adam |
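As a rough illustration of the bounding-box setup in Table 7, the sketch below invokes the widely used AlexeyAB/darknet implementation of YOLOv4 from Python. The data descriptor, configuration file, and pretrained-weight names are assumptions for illustration only, and this is not necessarily the exact toolchain used in our experiments; in darknet, the learning rate (0.0013) and the Adam optimizer are set inside the .cfg file (learning_rate=0.0013, adam=1) rather than on the command line.

```python
import subprocess

# Illustrative YOLOv4 training call via the AlexeyAB/darknet CLI.
# File names below are hypothetical; hyperparameters live in the .cfg file.
subprocess.run(
    [
        "./darknet", "detector", "train",
        "data/krid_bbox.data",   # hypothetical dataset descriptor
        "cfg/yolov4-krid.cfg",   # hypothetical model config (learning_rate, adam, etc.)
        "yolov4.conv.137",       # ImageNet-pretrained convolutional weights
        "-map",                  # periodically report mAP on the validation set
    ],
    check=True,
)
```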
Table 8. Bounding-box detection performance.

| Dataset | Precision | Recall | mAP50 |
|---|---|---|---|
| Test Set | 0.78 | 0.73 | 0.7057 |
| Validation Set | 0.78 | 0.73 | 0.7086 |
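The precision, recall, and mAP50 values in Tables 8 and 11 are based on matching predictions to ground truth at an IoU threshold of 0.5. The simplified sketch below shows that matching step for axis-aligned boxes: each prediction is greedily paired with its best unmatched ground-truth box and counted as a true positive when the IoU reaches 0.5. Full mAP computation additionally ranks predictions by confidence and integrates precision over recall, which is omitted here for brevity.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), as in the bbox field

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_recall(preds: List[Box], gts: List[Box], thr: float = 0.5):
    """Greedy one-to-one matching at a fixed IoU threshold (a simplification
    of the matching used inside mAP50)."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall

# Toy example with hypothetical boxes: one correct and one spurious prediction.
print(precision_recall([(10, 10, 50, 50), (200, 200, 40, 40)], [(12, 12, 50, 48)]))
```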
Table 9. Object counts for segmentation.

| Objects | Train | Validation | Test | Total |
|---|---|---|---|---|
| Count | 514,316 | 64,290 | 66,344 | 644,950 |
| Ratio (%) | 79.70% | 10.00% | 10.30% | 100.00% |
Table 10. Segmentation model: environment and training setup.

| Item | Specification |
|---|---|
| CPU | Xeon(R) Silver 4210 CPU @ 2.20 GHz (Intel, CA, USA) |
| Memory | 256 GB |
| GPU | Tesla V100 (32 GB) × 2 (NVIDIA, CA, USA) |
| Storage | 2 TB |
| OS | Ubuntu 20.04.6 LTS |
| Development Language | Python 3.10.12 |
| Training Algorithm | YOLOv5 |
| Training Conditions | epochs: 60; initial learning rate: 0.01; optimizer: SGD |
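For the segmentation setup in Table 10, one compatible workflow is the segmentation trainer shipped with the ultralytics/yolov5 repository (v7.0 and later). The sketch below is illustrative only: the dataset YAML, pretrained checkpoint, and image size are assumptions, while the 60 epochs and SGD optimizer follow Table 10, and the 0.01 initial learning rate matches the repository's default hyperparameter file (lr0: 0.01).

```python
import subprocess

# Illustrative instance-segmentation training call using yolov5's segment/train.py.
# Dataset YAML, checkpoint, and image size are assumed for the sketch.
subprocess.run(
    [
        "python", "segment/train.py",
        "--data", "krid_seg.yaml",      # hypothetical dataset config
        "--weights", "yolov5m-seg.pt",  # assumed pretrained checkpoint
        "--epochs", "60",
        "--img", "640",                 # assumed input resolution
        "--optimizer", "SGD",
    ],
    check=True,
)
```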
Table 11. Segmentation performance on test and validation sets.

| Dataset | Precision | Recall | mAP50 |
|---|---|---|---|
| Test Set | 0.713 | 0.571 | 0.604 |
| Validation Set | 0.711 | 0.573 | 0.594 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
