**3. Method**

The method consists of four main processes. First, TSs are detected in images. Second, detected TSs are recognized. Third, TSs are 3-D geolocated by projecting the detected signs onto the point cloud. Fourth, multiple TS detections of the same sign in different images are filtered. The input data of the method are MMS data: images from a 360° camera, point clouds, GPS-IMU positioning data, and camera calibration data. Figure 1 shows the workflow of the method.
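The four processes above form a linear data flow. As a schematic illustration only (the stage functions are placeholders, not the authors' code), the workflow can be sketched as:

```python
def traffic_sign_inventory(images, point_cloud, trajectory, calibration,
                           detect, recognize, geolocate, filter_duplicates):
    """Run the four-stage workflow; each stage is passed in as a function."""
    detections = detect(images)                      # 1. TSD on the images
    recognized = recognize(images, detections)       # 2. TSR on the detected signs
    located = geolocate(recognized, point_cloud,     # 3. 3-D geolocation by projection
                        trajectory, calibration)     #    onto the point cloud
    return filter_duplicates(located)                # 4. merge multi-view detections
```

Keeping each stage behind a function boundary is what later allows an individual network to be swapped out without touching the rest of the pipeline.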

#### *3.1. Traffic Sign Detection*

TSD is based on object detection in images. No point-cloud data are used at this stage, in order to speed up the detection process. The input images are acquired with a 360° RGB camera mounted on the MMS during acquisition. Each panoramic image is converted and rectified into six images oriented according to the sides of a cube. Images in the trajectory direction I<sup>T</sup> provide TS information in front of the MMS. Images in the opposite direction I<sup>To</sup> provide TS information behind the MMS, either in the same lane or in different lanes. Lateral images I<sup>⊥T</sup> are perpendicular to the trajectory direction and provide information about signs located to the sides of the MMS; they are particularly relevant for detecting no-parking or no-entry signs. The images forming the top and bottom of the cube are not relevant for TSD: bottom images are occupied by the camera support, and TSs that could be detected in top images are already detected in the front images I<sup>T</sup>.
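The cube-side rectification follows standard equirectangular-to-cube-map geometry. The sketch below illustrates that geometry only (the face names, axis convention, and function are ours, not the authors' implementation): it maps a point on a cube face to the corresponding pixel of the equirectangular panorama.

```python
import math

# Assumed axis convention: x forward (trajectory direction), y right, z up.
# u, w are continuous face coordinates in [-1, 1]: u to the right, w downward.
FACE_DIRS = {
    "front": lambda u, w: ( 1.0,    u, -w),  # I_T: trajectory direction
    "back":  lambda u, w: (-1.0,   -u, -w),  # I_To: opposite direction
    "right": lambda u, w: (  -u,  1.0, -w),  # lateral faces (I_perpT)
    "left":  lambda u, w: (   u, -1.0, -w),
}

def cube_to_pano(face, u, w, pano_w, pano_h):
    """Map face coordinates (u, w) to (column, row) in the equirectangular panorama."""
    x, y, z = FACE_DIRS[face](u, w)
    lon = math.atan2(y, x)                              # longitude in (-pi, pi]
    lat = math.asin(z / math.sqrt(x*x + y*y + z*z))     # latitude in [-pi/2, pi/2]
    col = (lon / (2.0 * math.pi) + 0.5) * pano_w
    row = (0.5 - lat / math.pi) * pano_h
    return col, row
```

A full rectification samples every (u, w) of each face this way, with interpolation in the panorama; the top and bottom faces are simply skipped, as discussed above.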

The object detector implemented in this method is RetinaNet [57]. This detector was chosen because it is state of the art in standard accuracy metrics, memory consumption, and running times. RetinaNet is a one-stage detector that behaves well with unbalanced classes and in images with a high density of objects at several scales, key factors for traffic sign detection. RetinaNet uses ResNet [58] as its basic feature extractor; in this work, ResNet-50 is used.
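RetinaNet's robustness to unbalanced classes comes from its focal loss, which down-weights the many easy background examples. A minimal sketch of the published scalar formula, FL(p<sub>t</sub>) = −α<sub>t</sub>(1 − p<sub>t</sub>)<sup>γ</sup> log(p<sub>t</sub>), for a single binary prediction (the function and its scalar form are our illustration; RetinaNet applies the loss per anchor and per class):

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class; target: 0 or 1.
    With gamma=0 and alpha=1 this reduces to plain cross-entropy.
    """
    p_t = p if target == 1 else 1.0 - p           # probability of the true class
    a_t = alpha if target == 1 else 1.0 - alpha   # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct background anchor contributes far less than under
# cross-entropy, so the dense easy negatives no longer dominate training.
easy = focal_loss(0.01, 0)   # confident, correct background
hard = focal_loss(0.60, 1)   # uncertain positive
```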

The RetinaNet detector is applied to each cube-side image I of the set acquired with the MMS during the acquisition. As a result, an array *S(l, I<sup>x</sup>, I<sup>y</sup>, w, h)* is obtained for each detected TS, where *l* indicates the label, *I<sup>x</sup>* and *I<sup>y</sup>* indicate the top-left corner position of the bounding box, *w* indicates the TS width, and *h* indicates the TS height. In order to obtain maximum classification accuracy, the number of classes has been reduced to coincide with the shapes of traffic signs. The classes for detection with RetinaNet are five: yield, stop, triangular, circular, and square (Figure 1). In the recognition phase (Section 3.2) these classes will be further classified.
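The detection output can be represented as a small record type. The sketch below follows the *S(l, I<sup>x</sup>, I<sup>y</sup>, w, h)* notation above; the `corners` helper and the type itself are our illustration, not the authors' data structure:

```python
from typing import NamedTuple

# The five shape classes used for detection (recognition refines them later).
DETECTION_CLASSES = ("yield", "stop", "triangular", "circular", "square")

class Detection(NamedTuple):
    """One detected TS: S(l, Ix, Iy, w, h) in a single cube-side image."""
    l: str    # shape label, one of DETECTION_CLASSES
    Ix: int   # top-left corner column of the bounding box (pixels)
    Iy: int   # top-left corner row of the bounding box (pixels)
    w: int    # bounding-box width (pixels)
    h: int    # bounding-box height (pixels)

    def corners(self):
        """(x_min, y_min, x_max, y_max) pixel coordinates of the box."""
        return (self.Ix, self.Iy, self.Ix + self.w, self.Iy + self.h)

s = Detection("circular", 120, 64, 48, 48)
```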

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 4 of 15

Most authors use point clouds for TSD and images only for TSR [23–26,28,29]. While TSDR in images has proven to be high-performance, reconstruction models for TS 3-D location from images are surpassed in precision by 3-D point-cloud-based location. However, little research has paid attention to techniques for TS inventory that jointly take advantage of TSDR in images and TS 3-D location from the 3-D MLS point cloud. In [32], a method combining DL with retro-reflective properties is developed for TS extraction and location from a point cloud. In [55], TS candidates are detected in images based on colour and shape features and filtered with point-cloud information.

In contrast to other approaches, in this work a data flow is implemented to minimize processing times by taking advantage of each type of data. First, images are used for TSD and TSR. Image processing is faster than point-cloud processing and allows the application of DL techniques, which are currently state of the art. In addition, the design of a modular workflow allows each network to be replaced in the future as its success rates increase. To maximize correct TS identification, different networks are used for TSD and TSR, unlike other works that use the same network to detect and classify, see also [44,48]. After image processing, point clouds are used to filter out multiple TS detections and false positives. Point clouds allow more precise geolocation than the use of epipolar geometry of multiple images. Point clouds are not used for detection and classification since:

• The addition of point-cloud data to images increases processing times.

• The low point density does not provide useful information for TSR.

• DL point-cloud processing techniques are computationally more expensive than their equivalents in image processing.

**Figure 1.** Workflow of the method.
