**4. Experimental Results and Discussion**

In this section, we conduct a series of experiments to verify the effectiveness of our STV process in unstructured scenes, and we discuss each experiment. The performance of our algorithm is compared with other state-of-the-art global descriptors. All experiments are performed on a computer equipped with an Intel Core i5-10210U CPU. To compare with Scan context [7] and to test online capability, our algorithm is implemented in both MATLAB and C++.

## *4.1. Experimental Setup*

We select four sequences (00, 05, 06, and 08) from the KITTI dataset [15], all of which contain a large number of typical scenes dominated by unstructured objects (mainly vegetation). As shown in Figure 6, these outdoor scenes provide sufficient experimental resources for our algorithm.

**Figure 6.** Typical scenes from KITTI sequences. (**a**) sequence 00; (**b**) sequence 05; (**c**) sequence 06; and (**d**) sequence 08. These scenes are dominated by unstructured objects, which can easily cause mismatches.

In order to demonstrate accuracy and the practical value of the algorithm, our parameter settings follow Scan context-50 [7]: in the first stage, we select the 50 nearest neighbors while ensuring real-time performance. If the ground-truth Euclidean distance of a matched pair is less than 4 m, we consider it an inlier. Since *l* and *t* have the same physical meaning, we set them equal in the experiments. The other parameter values used are listed in Table 1.
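The inlier criterion above can be sketched in a few lines. This is an illustrative helper, not the paper's code; the function name and data layout are placeholders, and only the 4 m ground-truth distance threshold comes from the text.

```python
import math

# Threshold from the experimental setup: a matched pair is an inlier if the
# ground-truth Euclidean distance between the two poses is below 4 m.
INLIER_DIST_M = 4.0

def is_inlier(query_xy, match_xy, threshold=INLIER_DIST_M):
    """Return True if the two ground-truth positions are within `threshold` metres."""
    dx = query_xy[0] - match_xy[0]
    dy = query_xy[1] - match_xy[1]
    return math.hypot(dx, dy) < threshold

print(is_inlier((0.0, 0.0), (3.0, 0.0)))  # True: 3 m apart
print(is_inlier((0.0, 0.0), (6.0, 0.0)))  # False: 6 m apart
```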

**Table 1.** Parameter List.


## *4.2. Statistical Analysis*

To demonstrate that our STV process increases distinguishability in scenes dominated by large-scale unstructured objects and effectively avoids the occasional mismatches caused by such scenes, we perform a statistical analysis.

The 4000∼4400th frames of KITTI sequence 00 contain many places dominated by vegetation. Many of these frames are highly susceptible to mismatches, which are detected by our temporal verification module.

We first analyze the structured and unstructured objects in the selected 400 frames to demonstrate that the segmentation module described in Section 3.2 can indeed separate unstructured objects from structured ones. Figure 7 presents the statistical results. The number of structured clusters after segmentation is much smaller than the number of unstructured ones; the mean values in Figure 7a,b differ by a factor of about 30. We represent the size of a cluster by the number of points it contains. Figure 7c,d show that structured clusters tend to be large, while unstructured clusters are small because of gaps in vegetation or noise. In general, structured clusters are more than 10 times larger than unstructured ones. This naturally motivates using cluster size to remove vegetation and similar objects. In subsequent experiments, we retain clusters with more than 30 points, or clusters occupying more than 5 laser beams, as structured objects.
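The size-based filter described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cluster representation as `(beam, azimuth)` cells and all names are assumptions; only the two thresholds (more than 30 points, or more than 5 laser beams) come from the text.

```python
# A cluster is kept as "structured" if it contains more than 30 points or
# spans more than 5 laser beams (scan rings); everything else is treated
# as unstructured (vegetation, noise) and removed.
def is_structured(cluster_cells, min_points=30, min_beams=5):
    """cluster_cells: list of (beam_index, azimuth_index) cells of one cluster."""
    if len(cluster_cells) > min_points:
        return True
    beams = {beam for beam, _ in cluster_cells}
    return len(beams) > min_beams

# A small vegetation-like cluster (few points, one beam) is discarded,
# while a wall-like cluster spanning many beams is retained.
vegetation = [(3, i) for i in range(8)]   # 8 points on a single beam
wall = [(b, 10) for b in range(7)]        # 7 points across 7 beams
print(is_structured(vegetation))  # False
print(is_structured(wall))        # True
```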

**Figure 7.** Comparison of segmentation results between structured and unstructured objects. (**a**,**b**): Distribution of the number of unstructured and structured clusters in each frame, respectively. (**c**,**d**): Distribution of the average size of unstructured and structured clusters in each frame, respectively.

Second, we compare the similarity scores of these 400 pairs of false positives before and after segmentation. As shown in Figure 8, the scores between different places increase significantly after the unstructured objects are removed, since vegetation in different places always exhibits a high degree of similarity. This improves the distinguishability of false loop closures and allows our algorithm to directly discard mismatches when encountering such places.
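For reference, a Scan context-style similarity score is the mean column-wise cosine similarity between two ring-by-sector descriptor matrices. The sketch below is illustrative only (plain Python lists instead of the actual descriptors); the effect measured in Figure 8 corresponds to computing this score once on the raw descriptors and once after the unstructured cells have been zeroed out.

```python
import math

def column_cosine_similarity(desc_a, desc_b):
    """Mean column-wise cosine similarity of two ring x sector descriptors
    (Scan context-style); empty columns are skipped, as in Scan context."""
    n_cols = len(desc_a[0])
    total, used = 0.0, 0
    for j in range(n_cols):
        col_a = [row[j] for row in desc_a]
        col_b = [row[j] for row in desc_b]
        na = math.sqrt(sum(x * x for x in col_a))
        nb = math.sqrt(sum(x * x for x in col_b))
        if na == 0 or nb == 0:
            continue  # no information in this sector
        dot = sum(x * y for x, y in zip(col_a, col_b))
        total += dot / (na * nb)
        used += 1
    return total / used if used else 0.0

a = [[1.0, 0.0],
     [0.0, 1.0]]
b = [[0.0, 1.0],
     [1.0, 0.0]]
print(column_cosine_similarity(a, a))  # 1.0 (identical places)
print(column_cosine_similarity(a, b))  # 0.0 (completely different)
```

The matching distance is then `1 - similarity`, so removing highly similar vegetation cells from both descriptors pushes different places further apart.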

**Figure 8.** Comparison of false loop pairs' similarity scores before and after segmentation.

## *4.3. Dynamic Threshold Evaluation*

In our segmentation algorithm, the geometric threshold, as the first-stage criterion, plays a critical role in accurate segmentation. According to the characteristics of the laser beams, we design a dynamic adjustment strategy for *g*, shown in Equation (3), which prevents under-segmentation of near objects and over-segmentation of far objects compared with a fixed geometric threshold.

Here, we vary one parameter at a time to test the influence of the dynamic threshold on the experimental results and to provide a parameter reference for the subsequent experiments. Specifically, we compare the precision and recall rates of fixed and dynamic thresholds with different initial values of *g*. The experiments are performed on KITTI sequences 00 and 08, whose large number of complex and typical unstructured scenes makes the reference more convincing. The results in Table 2 show that, for the same initial value, the dynamic threshold tends to achieve higher recall and precision rates than the fixed one. Moreover, the initial value of *g* is best set between 50 and 60.


**Table 2.** Precision and recall rates of different *g*, *p* and *q*.

Therefore, in the following experiments, we set the dynamic threshold parameters to an initial value *g* = 60, with *p* = 10 and *q* = 1.

## *4.4. Precision Recall Evaluation*

We leverage precision-recall curves to comprehensively evaluate the performance of our STV-SC method in environments containing large-scale unstructured objects. We compare our place recognition algorithm with Scan context [7] and M2DP [5], since both are state-of-the-art global descriptors and neither specifically considers unstructured scenes. In particular, our algorithm builds on Scan context, so the comparison with Scan context in unstructured environments is especially important.

As shown in Figure 9, the experiments are conducted on sequences 00, 05, 06, and 08. Since sequence 08 contains only reverse loops, it can verify that our algorithm preserves the rotation invariance of Scan context.

Our proposed algorithm outperforms the other approaches on all sequences. In suburban areas where the roads are surrounded by trees, the geometric information available for place recognition is limited; for example, on the frames discussed in Section 4.2, Scan context produces mismatches because of the vegetation. Our method mitigates the impact of vegetation and avoids many mismatches caused by unstructured objects. That is, at the same recall rate, STV-SC obtains a higher precision rate. On sequence 08, M2DP performs poorly because it cannot achieve rotation invariance, whereas our algorithm improves performance while maintaining rotation invariance. The remaining outliers come from densely vegetated areas with few or no structured objects, or from scenes whose structured parts are still so similar that the geometric information can no longer meet the requirements of place recognition.

In practical applications, we pay more attention to the recall rate under high precision. Table 3 shows the recall of sequences 00, 05, and 06 at 100% precision; since sequence 08 is more challenging, we report its recall at 90% precision. Our method clearly outperforms the approaches that do not consider unstructured objects. Compared with the original Scan context, the recall rate of our STV-SC algorithm on the different sequences increases by 1.4% to 16%. In particular, on sequence 08, an environment with heavy vegetation where other algorithms often perform poorly, our algorithm improves the recall rate by more than 15%.
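The "recall at 100% precision" metric above can be computed by sweeping an acceptance threshold over the match scores and keeping the largest recall reached before the first false positive is admitted. The sketch below is a generic illustration of this protocol (names and the toy data are hypothetical), not the paper's evaluation code.

```python
def recall_at_full_precision(matches, num_true_loops):
    """matches: list of (distance, is_true_loop); a lower distance is a
    better match. Returns the highest recall attainable at 100% precision."""
    best_recall = 0.0
    for threshold, _ in sorted(matches):
        accepted = [is_true for d, is_true in matches if d <= threshold]
        tp = sum(accepted)
        if tp == len(accepted):  # every accepted match is correct: precision = 100%
            best_recall = max(best_recall, tp / num_true_loops)
    return best_recall

# Toy example: the third-best match is a false positive, so only the two
# best matches can be accepted while keeping precision at 100%.
scores = [(0.10, True), (0.20, True), (0.35, False), (0.40, True)]
print(recall_at_full_precision(scores, num_true_loops=3))  # 2/3 of the loops recovered
```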

**Figure 9.** Precision-recall curves on KITTI dataset. (**a**) sequence 00; (**b**) sequence 05; (**c**) sequence 06; and (**d**) sequence 08. The performance of the algorithms is measured by the area enclosed by the curves and the coordinate axes.

**Table 3.** Recall at 100% precision on KITTI 00, 05, and 06; Recall at 90% precision on KITTI 08.


## *4.5. Time-Consumption Analysis*

Compared to Scan context, our method adds a segmentation and temporal verification (STV) process. Since the re-identification module requires no search and shift actions, the main time consumption of STV is concentrated in the segmentation module. Moreover, being gated by temporal verification, the re-identification process is not always triggered; it runs only when an ambiguous environment is encountered. As the main time-consuming module, segmentation uses a range image-based breadth-first search, whose time consumption is fairly small.
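The range image-based breadth-first search can be sketched as follows. This is an illustrative toy, not the paper's implementation: it groups neighbouring range-image pixels into one cluster when their range values are close, and a fixed range-difference threshold stands in for the geometric threshold of Equation (3), whose form is not reproduced here.

```python
from collections import deque

def segment_range_image(rng, max_diff=0.3):
    """Label connected regions of a range image via breadth-first search.
    rng: 2D list of range values (0.0 = no laser return); azimuth wraps."""
    rows, cols = len(rng), len(rng[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] or rng[r0][c0] <= 0:
                continue  # already labelled, or no return at this pixel
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, (c + dc) % cols  # azimuth wraps around
                    if (0 <= rr < rows and not labels[rr][cc]
                            and rng[rr][cc] > 0
                            and abs(rng[rr][cc] - rng[r][c]) < max_diff):
                        labels[rr][cc] = next_label
                        queue.append((rr, cc))
    return labels, next_label

# A 5 m surface and a 2 m surface separated by empty pixels -> two clusters.
image = [[5.0, 5.1, 0.0, 2.0],
         [5.0, 5.2, 0.0, 2.1]]
labels, n = segment_range_image(image)
print(n)  # 2
```

Each pixel is visited a constant number of times, so the search is linear in the image size, which is why this module stays cheap.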

Under the same conditions as Scan context-50 [7], we record the place recognition time (cost time of STV-SC) for more than 100 triggered frames in Figure 10. Even at the peak, the time consumption is less than 0.4 s. The average over these 120 frames is 0.316 s (the original Scan context takes 0.201 s with 0.2 m³ point cloud downsampling), which is within a reasonable range (2–5 Hz in MATLAB).

**Figure 10.** Time-consumption result of 120 triggered frames on KITTI 00. In the case of triggering re-identification, the average time consumption of the whole system is 0.316 s.

## *4.6. Online Loop-Closure Performance*

We now show the online performance of our STV-SC algorithm. The algorithm is integrated into the well-known LiDAR odometry framework LOAM [31] as its loop closure detection module; each detected loop is added to the pose graph as an edge, and GTSAM [32] is applied for back-end graph optimization, finally yielding a drift-free trajectory. The experiments run on the Robot Operating System (ROS Melodic) and are performed on KITTI 00.

The white dots in Figure 11 represent examples of detected loop closures. As shown in the estimated trajectory, our method can effectively detect loop closures and eliminate drift errors in real time, even in unstructured-dominated environments.

**Figure 11.** Online loop-closure performance of STV-SC on KITTI 00. Left figure shows the trajectory without loop closure detection and pose graph optimization. The trajectory in the white circle exhibits noticeable drifts. Right figure shows the trajectory after pose graph optimization.

**5. Conclusions**

In this paper, we have proposed STV-SC, a new Scan context-based place recognition method that integrates a segmentation and temporal verification process, giving the original algorithm the ability to handle unstructured environments and enhancing the stability of mobile agents in special and complex environments. By summarizing the characteristics of unstructured objects, we design a novel segmentation method that distinguishes unstructured from structured objects according to cluster size. For more accurate segmentation, we adopt a geometric threshold that varies with the range value. In the matching part, we design a three-stage algorithm: based on the temporal continuity of the SLAM system, the re-identification module is triggered whenever temporal verification is not satisfied, thereby effectively avoiding mismatches caused by unstructured objects. Comprehensive experiments on the KITTI dataset demonstrate that our segmentation method can effectively distinguish different types of objects, and that STV-SC achieves higher recall and precision rates than Scan context and other state-of-the-art global descriptors in vegetation-dominated environments. Specifically, at the same precision, our algorithm improves the recall rate by 1.4∼16% across the different sequences. Meanwhile, the average time consumption of STV-SC is 0.316 s, which is within a reasonable bound and shows that our algorithm can run online in a SLAM system.

**Author Contributions:** Conceptualization, X.T. and P.Y.; methodology, X.T.; software, X.T.; validation, X.T., P.Y., F.Z. and Y.H.; formal analysis, X.T.; investigation, X.T.; resources, X.T. and J.L.; data curation, X.T.; writing—original draft preparation, X.T.; writing—review and editing, X.T., P.Y., F.Z., J.L. and Y.H.; visualization, X.T.; supervision, P.Y., J.L. and Y.H.; project administration, P.Y. and J.L.; funding acquisition, P.Y. and J.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was sponsored by Shanghai Sailing Program under Grant Nos. 20YF1453000 and 20YF1452800, the National Natural Science Foundation of China under Grant Nos. 62003239 and 62003240, and the Fundamental Research Funds for the Central Universities under Grant Nos. 22120200047 and 22120200048. This work has been supported in part by Shanghai Municipal Science and Technology, China Major Project under grant 2021SHZDZX0100 and by Shanghai Municipal Commission of Science and Technology, China Project under grant 19511132101.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.
