#### *3.3. Traffic Sign 3-D Location*

The projection of TSs detected in images onto the georeferenced point cloud is done using the pinhole model [60]. While in other works the four vertices of the detection polygon have been projected, in this work only the central TS point *S<sup>c</sup>* is projected. This saves processing time and minimizes calibration error. Another alternative would be detecting the pole, either directly or after TS detection. Pole detection would mean more precise positioning, but it has the following limitations: (1) poles may not have enough points to be easily detected, (2) some TSs share a pole, and (3) some TSs are located on traffic lights, light posts, or buildings, so specific detection methods would be needed for each case. In view of the above, the authors have chosen to consider the error of positioning the TS at the pole as negligible, and to obtain a simpler and faster method based on TS location and positioning. The TS location in the point cloud, *P<sup>s</sup>*, is obtained by projecting the line → *CS<sup>c</sup>* defined by the camera focal point *C* and the central sign-point *S<sup>c</sup>* (pinhole model in Equation (1)).


$$s \cdot S_c = \mathbf{K} \begin{bmatrix} \mathbf{R} \mid \mathbf{t} \end{bmatrix} P_s \tag{1}$$

where *s* is the scalability factor; *S<sup>c</sup>* is the centre of the traffic sign S detected in an image I, *S<sup>c</sup>* = [*u*, *v*, 1]<sup>T</sup> with *u* = *I<sup>x</sup>* + *w*/2 and *v* = *I<sup>y</sup>* − *h*/2; **K** is the intrinsic camera parameters matrix provided by the manufacturer; [**R**|**t**] is the extrinsic camera parameters matrix; and *P<sup>s</sup>* is the centre of the point-cloud traffic sign, *P<sup>s</sup>* = [*X<sup>w</sup>*, *Y<sup>w</sup>*, *Z<sup>w</sup>*, 1]<sup>T</sup>.
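
For illustration, the following Python sketch evaluates Equation (1) for an assumed world point; **K**, [**R**|**t**] and *P<sup>s</sup>* are placeholder values, not the calibration of the equipment used in this work.

```python
import numpy as np

# Placeholder intrinsics and pose (NOT the Ladybug5 calibration).
K = np.array([[1000.0,    0.0, 1224.0],
              [   0.0, 1000.0, 1012.0],
              [   0.0,    0.0,    1.0]])
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])  # extrinsic matrix [R|t]

P_s = np.array([4.0, 2.0, 15.0, 1.0])  # TS centre [Xw, Yw, Zw, 1] (assumed)

p = K @ Rt @ P_s   # Equation (1): s * S_c
s = p[2]           # scalability factor
S_c = p / s        # homogeneous pixel [u, v, 1]
print(S_c)         # -> [1490.67, 1145.33, 1.0], inside a 2448 x 2024 image
```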

The rotation and translation matrix [**R**|**t**] positions the camera in the same coordinate system as the point cloud *P<sup>s</sup>*, which is already georeferenced. [**R**|**t**] is formed by two rotation–translation matrices, [**R**|**t**] = [*R*1|*t*1][*R*2|*t*2]. [*R*1|*t*1] relates the positioning of the pixels with the image and is obtained by calibration prior to the implementation of the method. Once the matrix [*R*1|*t*1] for one image is obtained, it is valid for all images acquired with the same equipment. The calibration is done by manually selecting four pairs of corresponding pixels in the image and points in the point cloud per image. [*R*2|*t*2] positions the camera in the optical centre *C* of each image *I*.
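
Since [**R**|**t**] = [*R*1|*t*1][*R*2|*t*2], the two calibration matrices can be composed with homogeneous 4 × 4 transforms. The sketch below is an illustrative reading of this composition, not the authors' C++ implementation.

```python
import numpy as np

def homogeneous(R, t):
    """Pack a 3x3 rotation and a translation into a 4x4 transform."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.ravel(t)
    return M

def compose_extrinsics(R1, t1, R2, t2):
    """[R|t] = [R1|t1][R2|t2]: equipment calibration composed with the
    per-image camera pose."""
    M = homogeneous(R1, t1) @ homogeneous(R2, t2)
    return M[:3, :3], M[:3, 3]  # R, t
```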

The TS points in the point cloud *P<sup>s</sup>* form a plane *T<sup>s</sup>*. The TS is located at the intersection between the projection of the line → *CS<sup>c</sup>* following the pinhole model and the plane *T<sup>s</sup>* (Figure 2), *P<sup>s</sup>* = → *CS<sup>c</sup>* ∩ *T<sup>s</sup>*.
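
A minimal sketch of this intersection, assuming the plane *T<sup>s</sup>* (a point on it and its normal) has already been estimated from the point cloud, and that [**R**|**t**] maps world to camera coordinates:

```python
import numpy as np

def locate_sign(C, S_c, K, R, plane_point, plane_normal):
    """Return P_s, the intersection of the back-projected ray C->S_c
    with the TS plane T_s (all inputs in world coordinates except S_c)."""
    # Back-project the pixel [u, v, 1] to a ray direction in the world frame.
    d = R.T @ np.linalg.inv(K) @ S_c
    d = d / np.linalg.norm(d)
    # Ray-plane intersection: find lam so that C + lam*d lies on T_s.
    lam = np.dot(plane_point - C, plane_normal) / np.dot(d, plane_normal)
    return C + lam * d
```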

**Figure 2.** Pinhole model used to project traffic signs (TSs) detected in the image onto the point cloud.

In order to reduce processing time, a region of interest (ROI) is delimited in the point cloud to calculate the possible *T<sup>s</sup>* planes (Figure 3). First, points located at a distance of more than *d* from the MMS location at the time of taking the image are discarded. TSs distant from the MMS are considered to have too low a point density for correct location, and they are also detected in successive images captured nearer the MMS. Second, points located at a distance larger than *r* from the line → *CS<sup>c</sup>* are discarded. Third, points not located in the image orientation are discarded, since TSs detected in an image cannot be in a point cloud with a different orientation. For the remaining points, planes are detected in order to discard points not lying on planes. Since TSs are planar elements, planar estimation avoids false locations due to noise points crossing the projection line → *CS<sup>c</sup>*.
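
These three filters can be sketched as vectorized tests on an (N, 3) point array; the forward half-space test is our reading of the image-orientation filter:

```python
import numpy as np

def roi_filter(points, mms_pos, C, ray_dir, d=15.0, r=2.0):
    """Apply the three ROI filters to an (N, 3) point array (sketch).

    mms_pos: MMS position when the image was taken; C: camera optical
    centre; ray_dir: direction of the line CS_c in world coordinates.
    """
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    # 1) Discard points farther than d from the MMS position.
    near = np.linalg.norm(points - mms_pos, axis=1) <= d
    # 2) Discard points farther than r from the projection line CS_c.
    v = points - C
    close = np.linalg.norm(np.cross(v, ray_dir), axis=1) <= r
    # 3) Discard points behind the camera (wrong image orientation).
    in_front = v @ ray_dir > 0
    return points[near & close & in_front]
```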



**Figure 3.** Top view of ROI delimitation in a point cloud road environment: (**a**) delimitation by distance *d* from the camera location; (**b**) delimitation by distance from the projection TS line; (**c**) delimitation by image orientation; (**d**) location of *P<sup>s</sup>* in the first set *S* of points forming a plane and crossed by → *CS<sup>c</sup>*.

#### *3.4. Redundant Traffic Sign Filtering*

Since the same TS can be detected in multiple images, multiple detections of the same TS must be simplified. The filtering is done with the information of the classified TS, because one post can contain TSs of different classes. TSs of the same class grouped within a radius smaller than *f* are eliminated, leaving only the first detected (Figure 4).
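
A sketch of this filter, assuming detections arrive in detection order as (class, position) pairs and using the reported default *f* = 1 m:

```python
import numpy as np

def filter_duplicates(detections, f=1.0):
    """Keep only the first detection of each TS class within radius f.

    `detections` is a list of (class_id, xyz) tuples in detection order;
    names and structure are illustrative, not the authors' C++ code.
    """
    kept = []
    for cls, xyz in detections:
        xyz = np.asarray(xyz, dtype=float)
        is_duplicate = any(cls == kc and np.linalg.norm(xyz - kp) < f
                           for kc, kp in kept)
        if not is_duplicate:
            kept.append((cls, xyz))
    return kept
```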

**Figure 4.** Filtering of the same TS class in a radius *f*.

#### **4. Experiments**

#### *4.1. Equipment and Parameters*


The MMS equipment used for this work consisted of a Lynx Mobile Mapper, with a Ladybug5 360° camera and a GPS-IMU Applanix POS LV 520. The cube-images had a resolution of 2448 × 2024 pixels and were captured every 5 m along the MMS trajectory. The point cloud was a continuous acquisition over time. The values of the parameters *d* and *r* to delimit the ROI were set at 15 m and 2 m, respectively. The value of the parameter *f* was set to 1 m in order to simplify duplicate signals.

For the RetinaNet training, 9500 images with 12,036 labelled TSs, obtained by the 360° camera, were used. The training of the InceptionV3 networks was carried out with data sets from Belgium [50] and Germany [61], and with images of Spanish traffic signs. The whole process (training and testing in real case studies) was executed on a computer with an Intel i7 6700 CPU, 32 GB RAM, and an Nvidia 1080ti GPU. The code combined TensorFlow–Python for TSD and TSR and C++ for 3-D location and filtering.

The RetinaNet training consumed 70 h, with the hyper-parameters: optimization method *adam*, learning rate 1e-5, L2 regularization 0.001, max epochs 50, and batch size 1. The hyper-parameters for the three InceptionV3 trainings were: optimization method *sgdm*, learning rate 1e-4, momentum 0.9, max epochs 126, and batch size 64. The training of the triangular signs required 50 min, with 12,995 samples for training and 407 for validation. The training of the circular signs required 80 min, with 25,000 samples for training and 743 for validation. The training of the squared signs required 7 min, with 1094 samples for training and 243 for validation. The training process in terms of loss per epoch is shown in Figure 5.


**Figure 5.** Evolution of the loss during the training processes: (**a**) RetinaNet detector training loss, (**b**) InceptionV3 triangular-signs classifier training loss, (**c**) InceptionV3 circular-signs classifier training loss, (**d**) InceptionV3 squared-signs classifier training loss.
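
For illustration, a minimal TensorFlow/Keras sketch of one InceptionV3 classifier configured with the hyper-parameters reported above; the class count and the random stand-in data are placeholders, not the actual Belgian, German, and Spanish sign sets:

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 25  # placeholder for the number of classes in one shape group

# Random stand-in data; replace with the labelled sign crops.
x = np.random.rand(8, 299, 299, 3).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(NUM_CLASSES, size=8),
                                  NUM_CLASSES)

base = tf.keras.applications.InceptionV3(include_top=False, weights=None,
                                         pooling="avg",
                                         input_shape=(299, 299, 3))
model = tf.keras.Sequential(
    [base, tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")])

# Hyper-parameters as reported: sgdm, lr 1e-4, momentum 0.9, batch size 64,
# 126 epochs.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4,
                                                momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x, y, batch_size=64, epochs=126)
```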

#### *4.2. Case Studies*

The methodology was tested in two real case studies: two secondary roads located in Galicia (Spain), denominated EP9701 and EP9703. The EP9701 case study was 9.2 km long; its point cloud contained 350 million points and was acquired along with 7392 images. The EP9703 case study was 5.5 km long; its point cloud contained 180 million points and was acquired along with 4520 images. Both roads were located in rural areas where houses, fields, and wooded areas were interspersed, and both had frequent crossings and curves. The sign-posting of both roads was abundant and in good condition, with few samples that were damaged or partially occluded. The case studies were processed in 30 and 20 min, respectively.

The acquisition was performed during the central hours of the day (to minimize shadows) and on a sunny day without fog, so as not to affect visibility. The MMS maintained a constant driving speed of approximately 50 km/h, although this speed was reduced to comply with traffic rules at intersections or traffic lights. Point density increased as the driving speed decreased: it was estimated that the points in the acquisition direction were 1 cm closer for every 10 km/h of speed reduction.

#### *4.3. Results*

TS accounting was done manually by reviewing the acquired images, the detected and classified signs, and their locations in the point cloud. Table 1 shows the counts for each case study.

TSs were correctly detected at 89.7%, while 10.3% were not detected. The use of the 360° camera and the cube-images made it possible to locate TSs in the opposite and lateral directions to the MMS movement. Some of the undetected TSs were partially occluded or were eliminated in the redundant TS filtering process (Section 3.4), since they were traffic signs of the same class separated by less than the distance *f*. Figure 6 shows examples of detected TSs.


**Table 1.** Results.

| | EP9701 | | EP9703 | | TOTAL | |
|---|---:|---:|---:|---:|---:|---:|
| **TS total** | 98 | | 116 | | 214 | |
| **TS detected** | 86 | 87.8% | 106 | 91.4% | 192 | 89.7% |
| **TS undetected** | 12 | 12.2% | 10 | 8.6% | 22 | 10.3% |
| **False detections** | 22 | 19.5% | 27 | 19.7% | 49 | 19.6% |
| **TS duplicated** | 5 | 5.1% | 4 | 3.4% | 9 | 4.2% |
| **TS correctly classified** | 84 | 92.3% | 102 | 92.7% | 186 | 92.5% |
| **TS incorrectly classified** | 7 | 7.7% | 8 | 7.3% | 15 | 7.5% |

**Figure 6.** TSs detected: case study 1 with frontal cube-image (**a**) and with lateral cube-image (**b**); case study 2 with frontal cube-image (**c**) and with lateral cube-image (**d**). Detected TSs are marked with red boxes and filtered TSs with green boxes.

A high percentage of false detections was counted (19.6%). Of these, traffic mirrors represented 81.8% and 37% of the false detections in case studies 1 and 2, respectively. Mirrors have a circular shape surrounded by a red ring, so they were detected as false circular signs. The use of the point cloud was considered to eliminate these false positives, since mirrors should not contain points due to their high reflectivity; however, in the case studies the mirrors did contain points because of dirt or deterioration. No characteristics were found that differentiate mirror points from TS points. The remaining false detections corresponded to different objects on roadsides. Figure 7 shows some examples of false detections.


**Figure 7.** False detections (red boxes) caused by a road mirror in case study 1 (**a**) and by an awning in case study 2 (**b**).

Duplicate TSs were not filtered due to incorrect positioning by the TS 3-D location process (Section 3.3). In the input cube-side images, 382 TSs were detected in case study 1 and 441 signs in case study 2. After the 3-D localization and redundant filtering processes, the set of detections was reduced to 113 and 137 TSs, respectively. Duplicated TSs were 4.2% of the total.

The positioning of a TS was based on the georeferenced point cloud, where the authors assumed that the location of the point cloud corresponded precisely to reality, as in [23]. The authors also considered that the TSs positioned in the correct TS point cloud were correctly located (0 m error). A total of 97.5% of the detected TSs corresponded to points belonging to TSs (Figure 8). Only five TSs were positioned with an error of between 0.5 m and 8 m with respect to the real location of the sign. These TSs, not correctly positioned in the corresponding TS point clouds, were manually measured from their incorrectly detected location to the real TS location in the point cloud.

**Figure 8.** Traffic sign location in the point cloud (blue point) and labelled (**a**–**d**).

With regard to sign recognition, 92.5% of the detected TSs were correctly classified, both those in good condition and those partially erased. Since the methodology was tested in real case studies, it was not possible to test all the classes present in the training data. The main classes in the case studies were TSs of dangerous curves, speed bumps, no overtaking, and speed limits. To a lesser extent, there were also traffic signs of yield, stop, roundabouts, no entry, roadworks, and pedestrian crossings. No significant confusion was detected among classes; classification errors were isolated and consistent with the training results.

#### *4.4. Discussion*

In general, most TSs were detected and positioned correctly, although the algorithm showed a tendency to over-detection. This behaviour was chosen to facilitate monitoring by a human operator: in a correction process, it was considered easier to eliminate false detections than to check all the input data for undetected signs. In terms of false positives per image, the false detection rate was low, 0.004 FP/image (49 false detections over the 11,912 images acquired), compared to [54], where 0.32 and 0.07 were reached in the cases with images of better resolution. Regarding undetected TSs (false negatives), the neural network did not detect 10.3% of all TSs, which is similar to other artificial intelligence works: 10% in [51] and 11% in [28], but far from the best of the state of the art: 6% in [62], based on laser scanner intensity; 5% in [32], based on combining two neural networks; and 4% in [29], based on bag-of-visual-phrases.

The authors are aware that the detection success rate was not as high as in other applications using RetinaNet [63]. This was due to the relatively small size of the data set for TSD and the great variability of elements in the road environment. Generating a data set for detection is costly and was not the final objective of this work, which focused on presenting a methodology composed of a series of processes to inventory signs, not on optimizing the success rates of Deep Learning networks such as RetinaNet and InceptionV3.

The methodology did not reach detection rates as high as reference works in TSD and TSR, such as [50,52], although it is worth mentioning that the latter classifies TSs grouped by type. By contrast, the proposed methodology is adaptable to mapping different objects, as it does not rely on exclusive TS features. In particular, by not using reflectivity, it was possible to detect TSs whose reflectivity had diminished due to the passage of time and incorrect maintenance. Although Deep Learning techniques do not explain exactly why false detections occur, it is possible to intuit the underlying problem, and they allow continuous improvement: the training database can be updated with new samples that, in this case, may be the wrong detections once corrected. In this way, the algorithm will be able to avoid them.

The combination of images for TSD and TSR with a point cloud for TS 3-D location allowed a precise positioning of 97.5% of the detected TSs on points belonging to TS point clouds, which was not reached by other works based exclusively on the epipolar geometry of multiple images, such as [50], which only achieved a positioning with 26 cm of average error, [53], with 1–3 m of average error using dual cameras, and [64], with 3.6 m of average error using Google Street View images.

While point clouds provide valuable information for locating objects, they also require much more processing time than images. The methodology designed in [23] for TSD and TSR in point clouds was implemented in the two case studies; its processing times reached 45 and 30 min, respectively, a 50% time increase over performing TSD and TSR on images and 3-D location in point clouds, as proposed in this work. No relation was found between inventory quality and driving speed changes during acquisition. The work maintained a driving acquisition speed similar to other point-cloud mapping works.

#### **5. Conclusions**

In this work, a methodology for the automatic inventory of road traffic signs was presented. The methodology consists of four main processes: traffic sign detection (TSD), recognition (TSR), 3-D location, and filtering. For the TSD and TSR phases, cube-images acquired with a 360° camera were used and processed by Deep Learning techniques. Five shapes of traffic signs were detected in the cube-side images (stop, yield, triangular, circular, and square) applying RetinaNet. Since the stop and yield shapes each correspond to only one TS class, an InceptionV3 network was trained for each of the other shapes to recognize their respective classes. For the 3-D location and filtering phases, the georeferenced point cloud of the environment was used. TSs detected in the images were projected onto the cloud using the pinhole model for correct 3-D geolocation. Finally, the duplicate signals detected in different images were filtered based on class coincidence and the distance between them. The methodology was tested in two real case studies with a total of 214 TSs; 89.7% of the TSs were correctly detected, of which 92.5% were correctly classified. The false positive rate per image was only 0.004, and the main false detections were due to road mirrors. 97.3% of the detected signals were correctly 3-D geolocated with less than 0.5 m of error.

The effectiveness of combining image data and point clouds was demonstrated in this work. Images allow the use of artificial intelligence techniques for detection and classification, which improve their success rates day by day with new networks and designs. In addition, image processing is much faster and more efficient than point cloud processing. The use of a 360° camera does not require the passage of the MMS in the two road directions. Furthermore, point clouds allow a more precise geolocation of signals than images alone.

The entire process of TS inventorying, processing the images first and the point cloud afterwards, ensures speed and effectiveness: it is 50% faster than other proposals that first treat point clouds and then images at much higher computational cost and which, although they provide satisfactory results in terms of success rates, are unfeasible to include in production processes due to the cost of time and computer equipment. Due to these advantages, the presented methodology is suitable for inclusion in the production process of any company. Moreover, it runs automatically, without human intervention.

Future work will focus on extending the methodology to other objects important for road safety, since it does not depend on any exclusive feature of TSs. In addition, it is proposed to feed the networks back with corrected images presenting the main types of error, in order to improve the detection success rate. It is also planned to test the methodology in other case studies, such as highways and urban roads, to analyse the influence of the driving speed during acquisition on the 3-D point cloud location.

**Author Contributions:** Conceptualization, P.A. and D.C.; method and software, D.C.; validation, D.C. and J.B.; investigation, E.G.; resources, P.A.; writing, J.B., E.G. and P.A.; visualization, J.B.; supervision, E.G. and P.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Xunta de Galicia through human resources grants (ED481B-2019-061, ED481D 2019/020) and competitive reference groups (ED431C 2016-038), and by the Ministerio de Ciencia, Innovación y Universidades, Gobierno de España (RTI2018-095893-B-C21). This project has also received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 769255. This document reflects only the views of the authors. Neither the Innovation and Networks Executive Agency (INEA) nor the European Commission is in any way responsible for any use that may be made of the information it contains. The statements made herein are solely the responsibility of the authors.

**Acknowledgments:** The authors would like to thank those responsible for financing this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
