1. Introduction
Among the different expenditure items that are part of the life cycle of transport infrastructure, maintenance is one of the most important. The development of transport infrastructure is a major investment for public administrations, and its life cycle can span decades. For that reason, proper maintenance is essential to get the best return on these investments [1]. However, according to data gathered by the European Road Federation (ERF), the volume of investment in inland transport infrastructure reached its maximum at the time of the 2008 crisis, was cut significantly afterwards, and has stalled since [2]. This is especially worrying considering that the transport of goods and passengers has been growing steadily for the last decade, while the infrastructure is ageing in a context of maintenance budget cuts. According to a study by Calvo-Poyo et al. [3], spending on road maintenance not only prevents deterioration and prolongs the life of the infrastructure, but also increases road safety, reducing the death rate.
The use of new technologies for capturing, managing, and communicating information is essential to optimize the cost of infrastructure maintenance and to increase its security and resilience. Transport infrastructure digitalization is a key concept that is expected to drive the transition towards the goals of the Sustainable and Smart Mobility Strategy of the European Union [4].
Intelligent Transportation Systems (ITS) apply information and communication technologies to the infrastructure, vehicles and users, interfacing between different modes of transportation and potentially offering data management capabilities for road maintenance. They are an intrinsic part of the future of transport [5]. Together with ITS, remote sensing technologies are being extensively reported in the literature for transportation infrastructure maintenance and assessment. While technologies such as satellites or aerial drones are used for specific applications such as road network mapping [6] or road traffic monitoring [7], they have several limitations when compared with Mobile Mapping Systems (MMS) in terms of versatility for carrying out different road network maintenance activities [8,9,10]. The main advantage of MMS in this context is that they can be mounted on conventional vehicles, which drive along the infrastructure collecting data automatically. This simplifies the field tasks of the operators and reduces both their number and their exposure to traffic, thus increasing their safety. MMS are based on mapping technologies, mainly LiDAR and imagery (with laser scanners and cameras, respectively), which provide 3D and 2D geometric and radiometric information of the environment, and on global positioning technologies (Global Navigation Satellite System (GNSS)) that allow data geolocation. The literature on applications of MMS for road infrastructure assessment, maintenance or inventory has grown considerably during the last decade [11,12,13].
Among the different applications that can be carried out using MMS data, this work focuses on the inventory of vertical traffic signs. Their retro-reflective properties make them one of the most important visual elements when driving, especially at night. This is critically relevant in the context of traffic safety, as a large proportion of traffic fatalities occur at night [14], and the driver receives more than 90% of information visually [15]. Studies such as the one in [16] show that the presence of traffic signalling and its correct layout improve driver behaviour, increasing traffic safety.
From the data collected by an MMS, traffic sign inventory can follow three different approaches, based on: (1) use of images, (2) use of point clouds, and (3) fusion of images and point clouds.
The image-based inventory of vertical traffic signs consists of the manual or automatic annotation of the signs present in images taken with cameras. In practice, maintenance companies still carry out the inventory manually or semi-automatically in many cases, although the state of the art allows automatic processes with very good accuracy. This process can be divided into two parts: detection and recognition. Traffic sign detection involves annotating the position of each sign in the image coordinate system, while traffic sign recognition involves assigning semantics to the detected signs. In both cases, there are methods that solve these problems accurately using architectures based on Deep Learning (DL). For traffic sign detection, early approaches include the AdaBoost-based learning method with a cascade structure [17], and the combination of handcrafted features such as the Histogram of Oriented Gradients (HOG) with Machine Learning (ML) algorithms such as Support Vector Machines (SVM) [18], which obtained almost perfect results in traffic sign detection benchmarks such as the German Traffic Sign Detection Benchmark (GTSDB) [19]. In recent years, many DL strategies based on Convolutional Neural Networks (CNN) have been developed, showing state-of-the-art results [20,21]. CNN-based architectures such as YOLOv3 [22] have obtained remarkable results for real-time applications [23,24]. For traffic sign recognition, DL approaches also have a large presence in the literature. Arcos-García et al. [25] employed a combination of convolutional and spatial transformer [26] networks, and reported 99.71% accuracy on the German Traffic Sign Recognition Benchmark (GTSRB). Similar results have been achieved by large CNN ensembles together with data augmentation, as in [27].
Point cloud approaches have also been explored in the literature. The ability of MMS to capture the geometry of the environment and the radiometric properties of materials opened a promising new research line for applications such as road asset inventory. Pu et al. [28] presented a methodology for structure recognition which, in the case of vertical signs, distinguished rectangular, circular and triangular shapes. Riveiro et al. [29] further developed this idea, making use of the intensity parameter of the point cloud to easily detect vertical traffic sign panels. It was concluded that, although the geometric shapes of the signs could be detected automatically, the resolution of the 3D information was not sufficient to provide the detected signs with semantic information.
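As a rough illustration of intensity-based panel detection, the sketch below keeps only highly reflective points and groups them by spatial proximity. It is not the method of [28] or [29]: the function name, the point layout `(x, y, z, intensity)`, and the parameters `min_intensity`, `max_gap` and `min_points` are assumptions made for this example, and the quadratic neighbour search is only suitable for small inputs.

```python
from math import dist


def detect_reflective_clusters(points, min_intensity, max_gap, min_points):
    """Group high-intensity points into candidate sign panels.

    points: list of (x, y, z, intensity) tuples (hypothetical layout).
    Returns a list of clusters, each a list of the original tuples.
    """
    # Retro-reflective panels return much more energy than their surroundings.
    candidates = [p for p in points if p[3] >= min_intensity]
    unvisited = set(range(len(candidates)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        # Region growing: add every unvisited point closer than max_gap.
        while frontier:
            current = frontier.pop()
            near = [i for i in unvisited
                    if dist(candidates[current][:3], candidates[i][:3]) <= max_gap]
            for i in near:
                unvisited.remove(i)
            cluster.extend(near)
            frontier.extend(near)
        # Discard isolated reflections that are too small to be a panel.
        if len(cluster) >= min_points:
            clusters.append([candidates[i] for i in sorted(cluster)])
    return clusters
```

In practice, the thresholds would depend on the scanner and on how its intensity values are scaled, and each cluster would still need a geometric check (planarity, size) before being accepted as a sign.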
Image-based and point cloud approaches have complementary advantages and disadvantages. While semantic recognition in images can be performed accurately with DL-based techniques, the geolocation of signs is not as straightforward as in the case of point cloud-based methods. Therefore, it seems logical to merge both sources of information to carry out inventory tasks. One possible approach is to detect signs in the point cloud from their geometric and radiometric properties, and to project the 3D information onto images where recognition is performed using ML or DL approaches, provided that the point cloud and the images are synchronized with each other [30,31]. Other works follow the reverse workflow, performing both detection and recognition on the images, and projecting the result onto the point cloud to geolocate the road sign [8].
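The image-to-point-cloud workflow can be illustrated with a minimal sketch, assuming an ideal pinhole camera without lens distortion and a known rigid transformation between the laser scanner and the camera. The function names, the bounding-box format `(u_min, v_min, u_max, v_max)` and the use of the median as the sign position are assumptions made for this example, not details taken from the cited works.

```python
from statistics import median


def project_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3D point (laser frame) to pixel coordinates.

    rotation is a 3x3 nested list and translation a 3-vector mapping the
    laser frame to the camera frame. Returns None for points behind the camera.
    """
    # Camera-frame coordinates: X_c = R * X + t
    xc, yc, zc = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0:
        return None
    # Pinhole model: perspective division followed by the intrinsics.
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def geolocate_box(points, box, rotation, translation, fx, fy, cx, cy):
    """Return the median 3D position of the points that project inside box."""
    u_min, v_min, u_max, v_max = box
    hits = []
    for p in points:
        uv = project_point(p, rotation, translation, fx, fy, cx, cy)
        if uv is not None and u_min <= uv[0] <= u_max and v_min <= uv[1] <= v_max:
            hits.append(p)
    if not hits:
        return None  # no point overlaps the detection
    return tuple(median(p[k] for p in hits) for k in range(3))
```

A real pipeline would also have to handle lens distortion, time synchronization between sensors, and background points that fall inside the bounding box but lie behind the sign.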
This paper presents a methodology for traffic sign inventory using MMS that follows the latter approach. First, the detection and recognition of traffic signs is carried out on the images, and then the geolocation is performed on the point cloud by projecting the results obtained on the images. This paper aims to close two gaps in previous works:
- (1) While previous works use high-end MMS (RIEGL VMX-450 in [30], Optech LYNX in [8,31]), which provide the calibration of their cameras and laser scanner system, this work uses a low-cost MMS with a commercial camera that was manually mounted together with the laser scanner. Thus, this methodology offers a complete workflow that includes the calibration between the camera and the laser scanner system to carry out the geolocation of traffic signs.
- (2) The 3D visualization of inventory results remains an open issue in the literature. In this work, a 3D WebGIS for large point cloud datasets, based on the Potree architecture [32], is proposed.
This work is organized as follows. Section 2 will describe the case study data and the proposed methodology for traffic sign inventory. Section 3 will show and discuss the results obtained, focusing on the specific contributions that are being made. Finally, Section 4 will outline the conclusions and future directions of this work.
4. Discussion
The results obtained raise several questions for discussion in this section. First, the geolocation of road signs depends on their correct detection in the images. Consequently, signs of classes on which the Deep Learning model has not been trained will not be detected and therefore will not be geolocated, which has a negative impact on the inventory process. However, it should be noted that the methodology is proposed as a combination of image-based and 3D point cloud information, so this disadvantage could be overcome by combining the results of an image-based detection, as in this methodology, with those of a 3D point cloud detection based on the intensity parameter and the geometry of the point cloud. Combining both types of detection would allow the inventory of uncommon traffic signs, for which no dataset large enough to train a classification model is available.
Exploiting the 3D point cloud data to improve the output of this methodology could bring further benefits to the overall performance of the inventory process. While this work focuses on the calibration process of the camera-LiDAR system and on the projection from the image to the point cloud, extracting geometric information from the 3D point cloud to enrich the inventory would be straightforward. Geometric measurements of the sign, or its orientation (the direction it faces), are parameters that can be extracted from the point cloud and that would improve the level of information output by the method.
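One way such parameters could be derived is sketched below, assuming that the points of a single, roughly vertical panel have already been isolated. The function name and the returned keys are hypothetical. The horizontal principal direction of the panel is obtained from the 2x2 covariance of the x and y coordinates, and the facing direction is taken as perpendicular to it. Because a plane fit cannot tell the front of the panel from its back, the angle is reported modulo 180 degrees.

```python
from math import atan2, cos, degrees, pi, sin


def panel_geometry(points):
    """Estimate width, height and facing angle of a vertical sign panel.

    points: list of (x, y, z) tuples belonging to one panel.
    The facing angle is in degrees, counterclockwise from the +x axis,
    in the range [0, 180).
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Horizontal covariance terms of the panel footprint.
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Angle of the principal axis of a 2x2 covariance matrix.
    theta = 0.5 * atan2(2.0 * sxy, sxx - syy)
    # Width: extent of the points along the principal axis.
    along = [p[0] * cos(theta) + p[1] * sin(theta) for p in points]
    heights = [p[2] for p in points]
    return {
        "width": max(along) - min(along),
        "height": max(heights) - min(heights),
        "facing_deg": degrees(theta + pi / 2.0) % 180.0,
    }
```

The front/back ambiguity could be resolved with the vehicle trajectory, for example by choosing the direction that points towards the scanner position at acquisition time.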
Finally, it is relevant to discuss the possible causes of error in the geolocation of signs that were correctly detected with the DL architecture. As can be seen in Figure 7, the checkerboard used for calibration had to be placed on the right-hand side of the image to enable overlap with the LiDAR beams. This means that, although the reprojection errors were acceptable, they were smaller in the part of the image where the checkerboard was placed. This implies that for signs appearing on the left side of the image (which will also be further from the MMS than a traffic sign on the right side of the image), the reprojection may not be performed correctly, and it may not be possible to geolocate the sign. This disadvantage could be solved in several ways. On the one hand, the placement of the sensors could be redesigned so that the signs are captured in a more favourable position, with the camera pointing to the right of the forward direction of the vehicle and positioned in such a way that the LiDAR beams overlap with the central part of the image. On the other hand, pictures could be taken more frequently, to avoid cases where a sign does not appear in a suitable position in any image. In any case, the results obtained with this system show that the methodology of calibration and fusion of 2D and 3D data is valid for the inventory of traffic signs.
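A simple diagnostic for this kind of spatial variation is to compute the reprojection error separately for each half of the image. The sketch below is a hypothetical check, not part of the presented methodology. It assumes that pairs of observed and reprojected pixel coordinates are available across the whole image (for instance, from independent check points), and the function name and input layout are illustrative.

```python
from math import hypot, sqrt


def rms_error_by_half(pairs, image_width):
    """RMS reprojection error (pixels) for the left and right image halves.

    pairs: list of ((u_obs, v_obs), (u_proj, v_proj)) tuples.
    Returns {"left": value_or_None, "right": value_or_None}.
    """
    sums = {"left": [0.0, 0], "right": [0.0, 0]}
    for (u_obs, v_obs), (u_proj, v_proj) in pairs:
        # Assign each point to a half according to its observed column.
        side = "left" if u_obs < image_width / 2 else "right"
        sums[side][0] += hypot(u_obs - u_proj, v_obs - v_proj) ** 2
        sums[side][1] += 1
    return {
        side: (sqrt(total / count) if count else None)
        for side, (total, count) in sums.items()
    }
```

A markedly larger value on one side would be consistent with the calibration target having been observed only on the other side.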