#### **1. Introduction**

Communication and the mobility of people and goods are key elements of modern societies and developing countries. Economic growth depends heavily on transport networks. Infrastructures such as ports (maritime and river), airports, railways, highways, and roads are among the most relevant transport systems for guaranteeing people's quality of life. The EU is well aware of this relevance, as shown by the substantial national and EU funds spent on transport infrastructures every year [1]. These policies are developed on the basis of annual budgets dedicated to the construction of new projects and the maintenance of existing infrastructures. In recent times, EU infrastructure policies have been changing, focusing more on keeping existing infrastructure in good working condition and less on new construction [2]. This goal has been pursued, among other ways, through different national and EU research work programs (e.g., smart, green, and integrated transport in the case of H2020) [3]. These programs have promoted numerous activities to improve the service given to society in fields such as monitoring, resilience, reduction of fatal accidents, traffic disruption, maintenance costs, and improvement of network capacity.

Highways and roads are the most used infrastructures for mobility over short distances. As a consequence, their conservation and maintenance are highly relevant for safe and secure mobility and for reducing associated costs [4,5]. New concepts, called digital infrastructure and Intelligent Transport System (ITS), are being developed in parallel with new concepts for mobility: resilient and fully automated infrastructures; electric, connected, and autonomous cars [6].

Concepts of digital infrastructure and ITS are connected. Digital road and permanent monitoring are the bases of any ITS applied to highways and roads to ensure safe mobility and good service conditions. There are different techniques and technologies to achieve digital road monitoring. Depending on the effectiveness and applicability, the most used are based on satellite images, aerial images, and Mobile Mapping System (MMS) solutions.

The low resolution of satellite images makes it impossible to extract certain information from linear infrastructures [7]. Roads, highways, or railways are detectable in the satellite images, but it is not feasible to know the state of the pavement, rails, or their signalling. As a consequence, the scale of work is too small to get effective results with the ITS.

Aerial solutions have grown hugely in recent times based on civil drones and remotely piloted civil systems [8]. This is an emerging market of huge interest. However, drones still have many legal limitations related to the safety and protection of people. These drawbacks limit their use in many fields, among them transport infrastructures.

The MMS solutions, based on Light Detection and Ranging (LiDAR) technology, images (video and panoramic), and GNSS (Global Navigation Satellite System) technologies (for data geolocation) [9], are mature solutions that have seen limited growth for two main reasons: the high price of the technology and the high cost of processing the captured data (in terms of labour cost). Nevertheless, the market is showing novel and very active emerging low-cost and multiplatform solutions for autonomous vehicles. On the other hand, big data and artificial intelligence techniques allow efficient data processing [10].

This work is focused on developing a technical solution to generate infrastructure digital models and road infrastructure inventory based on the MMS. This work is applied to a specific component of roads, i.e., traffic signs (TSs), which are very relevant in transport infrastructures for the safety and security of people. The objective is fast TS detection, recognition, and classification with accurate localization. However, the application field of the proposed method is not limited to TSs. The proposed method is highly relevant for autonomous mobility solutions and urban planning. For these applications, the solution provides key information about existing traffic signs, including accurate geolocation parameters.

This paper is organized as follows: Section 2 collects related work about traffic sign detection, recognition, and mapping in images and point clouds. Section 3 explains the designed method. Section 4 presents and discusses the results obtained from applying the method to case studies, and Section 5 concludes the work.

#### **2. Related Work**

Interest in the off-line automation of traffic sign inventory has increased in recent years. Previously proposed approaches tackled the particular properties of traffic signs (i.e., retro-reflectivity for night-time visibility, colour, shape, size, height, orientation, planarity, and verticality), usually following safety standards. These properties require traffic signs to be treated as different objects from traffic lights [11–13], poles [14], lanes [15–17], trees [18], and other objects present on roads. A review of approaches depending on the object can be found in [19].

Current traffic sign (TS) acquisition technology provides two main sources of data: 3-D georeferenced point clouds acquired through Mobile Laser Scanning (MLS) techniques, and digital images, either from a still camera or as frames extracted from a video. 3-D point-cloud data contain precise information on the 3-D location and geometrical properties of the TS, as well as intensity. However, the resolution of most MMS techniques under normal use is not high enough to recognise all TS classes. Images are used to overcome that weakness, as they contain visual properties, despite their lack of spatial information. Since the objective in automated traffic sign inventory is to accurately determine the placement in global coordinates and the specific type of each traffic sign on the road, point clouds and images become complementary [20–22].

Automating TS inventory requires four main steps: traffic sign detection (TSD), segmentation (TSS), recognition or classification (TSR), and TS 3-D location. TSD aims to identify regions of interest (ROI) and boundaries of traffic signs. In TSS, a segment corresponding to the object is separated from the set of input data. TSR consists of determining the meaning of the traffic sign. Meanwhile, TS 3-D location deals with estimating the 3-D position and orientation, or pose, of the TS. A variety of approaches for these steps have been proposed in the literature, directly or indirectly related to TS inventory.

One group of these approaches defines techniques focused on detecting and segmenting the set of points with spatial information of the TS from 3-D laser scanner point clouds. These techniques are based on a priori knowledge of 3-D location, geometrical and/or retro-reflective properties. All approaches are conditioned by the huge amount of information contained in point clouds (see, for instance, [23–30]). With the aim of accurate TSR, the aforementioned approaches combine point clouds with images to extract features. As a step prior to TSR, segmented points can be projected onto the corresponding 2-D image in the traffic-sign-mapping (TSM) step. A review of methods for TSR in point-cloud and image approaches can be found in [31].
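
As an illustration of the TSM step described above, the following minimal sketch projects segmented 3-D points onto an image with an ideal pinhole camera and returns the resulting 2-D bounding box. It is not taken from any of the cited methods. It assumes that the points are already expressed in the camera frame (x right, y down, z forward) and that lens distortion has been removed; the function names and the intrinsic parameters `fx`, `fy`, `cx`, and `cy` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]
Box = Tuple[float, float, float, float]


def project_point(p_cam: Point3D, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Pixel]:
    """Project one camera-frame point to pixel coordinates (ideal pinhole)."""
    x, y, z = p_cam
    if z <= 0:
        # Points behind (or on) the image plane cannot be projected.
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def projected_bounding_box(points: Iterable[Point3D], fx: float, fy: float,
                           cx: float, cy: float) -> Optional[Box]:
    """Return (u_min, v_min, u_max, v_max) of the projected points, or None."""
    pixels = [project_point(p, fx, fy, cx, cy) for p in points]
    visible = [px for px in pixels if px is not None]
    if not visible:
        return None
    us = [u for u, _ in visible]
    vs = [v for _, v in visible]
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical intrinsics for a 1920 x 1080 image.
FX, FY, CX, CY = 1000.0, 1000.0, 960.0, 540.0
# Two opposite corners of a 1 m x 1 m sign located 10 m in front of the camera.
sign_corners = [(-0.5, -0.5, 10.0), (0.5, 0.5, 10.0)]
roi = projected_bounding_box(sign_corners, FX, FY, CX, CY)
```

The resulting box can then be used to crop the image region passed to the recognition stage. A real system would additionally apply the camera extrinsics and a distortion model before this projection.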

These types of techniques based on TSD in 3-D point cloud and TSR in image are accurate and reliable for TS inventory. However, they entail high time and computational costs, mainly for the TSD and TSS steps. As an alternative, images can be used not only for TSR but also for TSD without making use of the 3-D point cloud. Some authors have used TSD in image for coarse segmentation of the 3-D point cloud [32,33].

TSD, TSS, and TSR in image, which together become TSDR, have been extensively studied for TS inventory as well as for other applications such as advanced driver assistance systems (ADAS). The wide variety of techniques proposed by the computer vision community has been reviewed and compared, detailing advantages and drawbacks, in [34–38]. Recently, Wali et al. [39] provided a comprehensive survey on vision-based TSDR systems.

According to them, in image-based TSDR techniques, detection consists of finding the TS bounding box, while recognition involves classification by assigning a label to an image. Common TSD methods are: colour-based, on different colour spaces, i.e., RGB, CIELab, and HSI [40]; shape-based, such as Hough Transform (HT) and Distance Transform (DT); texture-based, such as Local Binary Patterns (LBP) [41]; and hybrid. With these methods, a feature vector is extracted from the image at a lower computational cost than from a 3-D point cloud. Then, the class label of the feature vector is obtained using a classifier such as a Support Vector Machine (SVM) or with Deep Learning-based (DL) methods [42–44]. Among the latter, Convolutional Neural Networks (CNN) have been widely adopted, given their high performance in both TSD and TSR in images [45–48] and in point clouds [49].
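
To make the colour-based family of TSD methods more concrete, the sketch below thresholds pixels that look "sign red" and returns the bounding box enclosing them. It is only an illustrative example, not the method of any cited work: it uses the HSV colour space from Python's standard library as a stand-in for the colour spaces listed above, the threshold values are arbitrary assumptions, and it treats all matching pixels as a single candidate (a practical detector would group pixels into connected components first).

```python
import colorsys
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Box = Tuple[int, int, int, int]


def is_sign_red(rgb: RGB, hue_tol: float = 0.05,
                min_sat: float = 0.5, min_val: float = 0.3) -> bool:
    """Return True if an 8-bit RGB pixel is a saturated, bright red.

    The thresholds are assumed values chosen for illustration only.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Red sits at both ends of the hue circle (near 0 and near 1).
    near_red = h <= hue_tol or h >= 1.0 - hue_tol
    return near_red and s >= min_sat and v >= min_val


def red_bounding_box(image: List[List[RGB]]) -> Optional[Box]:
    """Return (col_min, row_min, col_max, row_max) of red pixels, or None."""
    hits = [(row_idx, col_idx)
            for row_idx, row in enumerate(image)
            for col_idx, pixel in enumerate(row)
            if is_sign_red(pixel)]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(cols), min(rows), max(cols), max(rows))


# Tiny synthetic image: grey background with a 2 x 2 red patch.
GREY, RED = (128, 128, 128), (200, 20, 20)
image = [[GREY] * 5 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        image[r][c] = RED
candidate = red_bounding_box(image)
```

In a pipeline of the kind described above, a candidate box such as this one would be cropped and passed to a classifier (for example, an SVM or a CNN) for recognition.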

Regarding TS inventory, TSDR in image must be completed with the TS 3-D location. TS 3-D location, after TSDR, has been considered by several authors in image-based 3-D reconstruction approaches without making use of a 3-D point cloud. These techniques require prior accurate camera calibration and removal of perspective distortion. In [50], 3-D localization is based on the epipolar geometry of multiple images, while Hazelhoff et al. [51] calculated the position of the object from panoramic images referenced to the northern direction and the horizon. Balali et al. [52] built a point cloud using photogrammetry techniques with a three-parameter pinhole model. Wang et al. [53] used stereo vision and triangulation techniques. In [54], 3-D reconstruction is conducted using epipolar geometry, taking into account the geometric shape of the TS.
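
The idea behind image-based 3-D location can be illustrated with a generic two-view triangulation. The sketch below is a textbook midpoint method, not the specific algorithm of any of the cited works: each camera contributes a viewing ray (a camera centre plus a direction towards the detected sign), and the 3-D position is estimated as the midpoint of the shortest segment joining the two rays. The function names are hypothetical, and the rays are assumed to be already expressed in a common world frame.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3,
                         eps: float = 1e-12) -> Optional[Vec3]:
    """Estimate the 3-D point closest to two viewing rays.

    Ray i is o_i + s * d_i. Returns None if the rays are (nearly) parallel,
    in which case the depth cannot be recovered.
    """
    w = (o1[0] - o2[0], o1[1] - o2[1], o1[2] - o2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    # Ray parameters minimising the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = (o1[0] + s * d1[0], o1[1] + s * d1[1], o1[2] + s * d1[2])
    p2 = (o2[0] + t * d2[0], o2[1] + t * d2[1], o2[2] + t * d2[2])
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Two camera centres 1 m apart, both observing a sign at (0.5, 0, 5).
sign = triangulate_midpoint((0.0, 0.0, 0.0), (0.5, 0.0, 5.0),
                            (1.0, 0.0, 0.0), (-0.5, 0.0, 5.0))
```

With noisy detections the two rays do not intersect exactly, and the estimated depth becomes increasingly sensitive to small angular errors as the baseline between the cameras shrinks relative to the distance to the sign.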

While TSDR in image has proved to be high-performance, reconstruction models for TS 3-D location from images are surpassed in precision by 3-D point-cloud-based location. However, little research has paid attention to techniques for TS inventory that jointly take advantage of TSDR in image and TS 3-D location from the 3-D MLS point cloud. In [32], a method to combine DL with retro-reflective properties is developed for TS extraction and location from a point cloud. In [55], TS candidates are detected on images based on colour and shape features and filtered with point-cloud information. Most authors use point clouds for TSD and images only for TSR [23–26,28,29].

In contrast to other approaches, in this work a data flow is implemented to minimize processing times by taking advantage of each type of data. First, images are used for TSD and TSR. Image processing is faster than point-cloud processing and allows the application of DL techniques, which are currently the state of the art. In addition, the design of a modular workflow allows each network to be replaced in the future as success rates increase. To maximize correct TS identification, different networks are used for TSD and TSR, unlike other works that use the same network to detect and classify (see also [44,48]). After image processing, point clouds are used to filter out multiple TS detections and false positives. Point clouds allow more precise geolocation than the use of epipolar geometry of multiple images. Point clouds are not used for detection and classification since:

