#### 3.1.3. Virtual Sensor Data Generation

Based on the modeling of the unmanned platform in Section 3.1.2, two sensor components were added: an inertial measurement unit (IMU) module and a GPS module. IMU and GPS data were obtained by adding noise to the ground-truth values of the sensor models in the Gazebo coordinate system. The IMU-related parameters are shown in Table 2.

**Table 2.** IMU data generation module parameter configuration.


The parameters related to GPS are shown in Table 3.


**Table 3.** GPS data generation module parameter configuration.
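The noise injection described above can be sketched as follows. This is a minimal illustration of corrupting ground-truth readings with additive Gaussian noise and a constant bias, in the style of Gazebo's sensor plugins; the noise standard deviations and bias values below are illustrative placeholders, not the parameters from Tables 2 and 3.

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_imu(true_accel, true_gyro,
              accel_noise_std=0.02, gyro_noise_std=0.001,
              accel_bias=0.005, gyro_bias=0.0001):
    """Corrupt ground-truth IMU readings with a constant bias plus additive
    white Gaussian noise (hypothetical parameter values)."""
    accel = np.asarray(true_accel, float) + accel_bias + rng.normal(0.0, accel_noise_std, 3)
    gyro = np.asarray(true_gyro, float) + gyro_bias + rng.normal(0.0, gyro_noise_std, 3)
    return accel, gyro

def noisy_gps(true_enu, horizontal_std=1.5, vertical_std=3.0):
    """Corrupt a ground-truth ENU position with independent Gaussian noise;
    the vertical axis gets a larger variance, as is typical for GPS."""
    e, n, u = true_enu
    return np.array([e + rng.normal(0.0, horizontal_std),
                     n + rng.normal(0.0, horizontal_std),
                     u + rng.normal(0.0, vertical_std)])
```

In a real setup these functions would be driven by the simulator clock and published as ROS messages with the corresponding timestamps.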

#### *3.2. Experimental Results*

This section presents the simulation and experimental results of autonomous positioning in an unknown environment using the proposed heterogeneous UAV and UGV system. In the following experiments, we generated specified waypoints for the three UAVs and the UGV in the virtual simulation and saved the resulting sensor data and timestamps in ROS bags for repeatable subsequent tests. We examined the robustness of the proposed method in a realistic engineering scenario by varying the ambient light, the weather in the scene, and the downward view angle of the UAV camera. We then analyzed and compared the influence of the collaborative positioning framework on the positioning accuracy of each platform against single-platform positioning. The detailed settings for each dataset are shown in Table 4.

**Table 4.** Parameter settings for each dataset.


#### 3.2.1. The Influence of External Factors on the Algorithm Framework

Our algorithm depends on visual feature points to perform place recognition, which is the foundation of the subsequent map fusion and loop closure. Considering that viewing-angle difference and observation conditions are the main factors affecting feature point matching, we set up three meteorological environments in the virtual simulation system (no occlusion, mist, and fog) and two lighting environments (normal light and the high-contrast lighting of sunset). The no-occlusion, mist, and fog conditions correspond to visibilities of infinity, 150 m, and 50 m, respectively. A schematic of the scenario is shown in Figure 6.

**Figure 6.** Schematic diagram of the virtual simulation scene: (**a**) bright scene with high visibility; (**b**) dimly lit but fog-free scene; (**c**) foggy and dim scene. Different observation conditions affect the extraction and matching of feature points.

The first aspect worth attention is the network bandwidth occupied by each agent and the server when exchanging data over the virtual communication network. In Table 5, we list the average bandwidth and instantaneous traffic peaks of the uplink and downlink data between each agent and the server. The peak in uploaded data occurs during map initialization, and bandwidth usage is relatively smooth thereafter. Bandwidth usage is positively correlated with the keyframe generation rate.

**Table 5.** Network bandwidth usage after all simulation datasets were run. The results are obtained from seven experiments.


Here, we compared the number of closed loops generated and the number of feature points matched on the closed-loop frames while the collaborative positioning algorithm ran on the different datasets. From the results in Table 6, we found that feature point matching performed similarly under similar viewing angles when the UAV viewing angle was held constant. Matching as many feature point pairs as possible is beneficial not only for robust place recognition but also for improving the positioning accuracy of SLAM. However, the number of feature point matches between UAV and UGV decreases as the viewing-angle difference increases, an effect that is more pronounced when a UAV flies at low altitude. We believe that setting the UAV camera to a 45° downward view best balances the need for visual repositioning among UAVs and between UAVs and UGVs.

**Table 6.** The number of matched feature points and successful place re-identifications under different viewing angles.


As can be seen from the data in Table 7, changes in lighting affected feature point detection, but the number of feature points in each image was still sufficient to produce correct closed loops. In the mist mode, 59% of feature point pairs were lost, and the number of correct closed loops dropped sharply. Dense fog disrupted feature point detection and place recognition entirely: apart from the map initialization completed at the start of the run, when all UAVs and the UGV departed from the same position, every subsequent place re-identification failed at the positions where closed loops should have been generated, owing to insufficient matching feature points.

**Table 7.** The number of matched feature points and successful place re-identifications under different illumination and visibility conditions.
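The logic underlying these results, where a place-recognition candidate is kept only when enough feature pairs survive matching, and the camera pitch is chosen to maximize the expected match count, can be sketched as follows. The threshold and the candidate angles are hypothetical illustrations, not values from the paper.

```python
def accept_loop_closure(matched_pairs: int, min_pairs: int = 30) -> bool:
    """Gate a place-recognition candidate: reject it when too few feature
    pairs survive matching (hypothetical threshold of 30 pairs)."""
    return matched_pairs >= min_pairs

def best_camera_pitch(avg_matches_by_pitch: dict) -> int:
    """Pick the downward camera pitch (in degrees) whose datasets produced
    the highest average number of matched feature pairs -- the criterion the
    text uses to settle on a 45-degree view."""
    return max(avg_matches_by_pitch, key=avg_matches_by_pitch.get)
```

Under this criterion, dense fog corresponds to every candidate falling below the threshold, so no closed loop is ever accepted after initialization.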


#### 3.2.2. Collaborative SLAM Estimation Accuracy

To estimate the positioning accuracy of the system, in this section the influence of external observation conditions on the experimental results was ignored, and the camera view that generated the largest average number of matching point pairs (45°) was adopted. The experiments in this section were run on the Factory 01, 03, 05, and 06 datasets described in Table 4.


**Table 8.** The accuracy of each agent running in different modes.

We estimated the absolute trajectory error (ATE) of each agent, testing positioning performance in GPS-denied and GPS-challenged modes separately. In the GPS-challenged 1 mode, all agents obtained a valid GPS fix every 20 s. In the GPS-challenged 2 mode, the UAV could only obtain four valid GPS fixes at random times during the run, and the UGV masked all GPS information. In the GPS-denied mode, no agent used the GPS data in the dataset. The resulting trajectories of the three UAVs and the UGV under collaborative positioning are shown in Figure 7a.


**Figure 7.** (**a**) The rendering of the co-location of three UAVs and a UGV; (**b**–**e**) trajectories and ground truth for a single-platform operation of each agent.
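ATE as used here can be computed by rigidly aligning the estimated trajectory to the ground truth and taking the RMSE of the residuals. A minimal sketch using the standard Kabsch/Umeyama alignment (rotation and translation, no scale), assumed here for illustration rather than taken from the paper's evaluation script:

```python
import numpy as np

def ate_rmse(estimated, ground_truth):
    """Absolute trajectory error: rigidly align the estimated trajectory
    (N x 3 positions) to the ground truth via SVD (Kabsch), then return the
    RMSE of the remaining position residuals."""
    est = np.asarray(estimated, float)
    gt = np.asarray(ground_truth, float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    # cross-covariance of the centered point sets
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    # correct an improper rotation (reflection) if one appears
    S = np.eye(3)
    if np.linalg.det(Vt.T @ U.T) < 0:
        S[2, 2] = -1.0
    R = Vt.T @ S @ U.T
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))
```

Because the alignment removes any global rigid offset, the metric reflects only the drift of the estimate relative to the ground truth.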

#### **4. Discussion**

From the experiments described in Section 3 we draw the following three conclusions.


the images generated under more favorable observation conditions. The visual front end used in this manuscript fulfills the basic requirements for cooperative air-ground localization but is not yet sufficiently robust under weak observation conditions.

3. The method described in this manuscript allows the location information of each agent in the unmanned cluster to be propagated to the other agents. This process improves positioning accuracy while also allowing the server and the individual unmanned platforms to acquire their positions relative to one another. We compared four cases: GPS-challenged co-location, GPS-denied co-location, single-platform operation of ORBSLAM3, and single-platform operation of ORBSLAM3 fused with GPS. The GPS-denied mode corresponds to operating on a celestial body other than Earth, where no GNSS fix is available, or on Earth with the GNSS signal subject to interference; in this case, the positioning of each agent depended entirely on the camera and IMU. It can be seen from Table 8 that the accuracy of co-location was significantly better than that of ORBSLAM3 running on a single platform. When some agents in the system obtained satellite positioning information, the positioning accuracy of all agents improved. It is worth noting that even though we assumed in the experiment that the UGV could not obtain satellite positioning signals at all, its positioning accuracy still benefited from the co-location algorithm. This is because the new constraints introduced by the satellite positioning information are propagated to the local maps of all agents through the global map. Comparing the two GPS-challenged modes, we found that only a few GPS points were enough to greatly improve positioning accuracy: the GPS-challenged 1 mode used several times more GPS fixes than the GPS-challenged 2 mode, yet its positioning accuracy was not significantly better.

In the experiments designed in this manuscript, we set the UAVs to use high-precision GNSS positioning information, while the UGV could not obtain GNSS signals. There were two main reasons for this assumption. First, in practice, UAVs in the sky can often obtain GNSS positions with good accuracy through RTK or relative positioning measurements, whereas vehicles on the ground may temporarily lose their GNSS fix due to obstruction by reinforced concrete buildings or by entering tunnels and interiors. Second, ORBSLAM3 is itself an excellent single-platform SLAM algorithm, achieving a trajectory drift error of less than 1% when loop closure is available. This makes it difficult for the meter-level errors inherent in GNSS to contribute to positioning accuracy when differential-free GNSS positioning is used in a virtual scene of limited size. If the scale of the motion trajectory reaches several kilometers, however, even differential-free GNSS positioning information can greatly improve positioning accuracy.

In our paper, we focused on collaborative localization. In engineering applications, however, the movement of each agent in the system is constrained by communication bandwidth and communication distance. Owing to the lack of prior information about the global map, the pose information of each agent and the local map information received by the server are subject to time delays, and the global pose updated through the visual closed loop carries no gradient information. In this case, if a motion strategy for the unmanned cluster is to be trained by convex online learning, delayed mirror descent (DMD) [24] would be a good choice, and we will pursue this approach in subsequent work. For multi-agent motion control, our method used only the basic D\* Lite algorithm, and only the trajectory of a single agent was considered when fitting the pre-generated route. In practical applications, however, the pre-planned waypoints may contain obstacles, the route itself may be unreachable, and cooperation among multiple agents can be difficult to achieve. Collaborative algorithms such as [25] realize decentralized, real-time cooperative pursuit of a single evader in the planar domain. By incorporating such approaches, the method proposed in this paper could make better use of the prior map obtained by the UAV.
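For intuition about why delayed feedback still permits convergence: with a Euclidean mirror map, delayed mirror descent reduces to gradient descent on stale gradients. The following toy sketch uses an illustrative step size and delay, not parameters from [24].

```python
import numpy as np

def delayed_gradient_descent(grad_fn, x0, steps=300, delay=5, lr=0.05):
    """Gradient descent with delayed feedback: at step t, only the gradient
    evaluated at the iterate from `delay` steps earlier is available,
    mimicking the communication latency between agents and the server.
    With a Euclidean mirror map, delayed mirror descent takes this form."""
    x = np.asarray(x0, float)
    history = [x.copy()]
    for _ in range(steps):
        # use the stale iterate once enough history has accumulated
        stale = history[max(0, len(history) - 1 - delay)]
        x = x - lr * grad_fn(stale)
        history.append(x.copy())
    return x
```

For a sufficiently small step size (roughly inversely proportional to the delay), the iterates still converge on a smooth convex objective, just more slowly than without delay.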

Finally, it is worth noting that in the network bandwidth statistics, peak bandwidth usage occurred in the map initialization phase and was about three times the average value. Over long distances or under interference, this places higher requirements on the performance of the wireless communication devices; a prior map would greatly relieve the bandwidth pressure during system initialization. During initialization, at least four high-quality GNSS fixes are needed to align the VIO coordinate system with the local east-north-up (ENU) coordinate system. Meanwhile, ORBSLAM3's IMU initialization requires initial zero-bias estimation and gravity direction estimation to recover scale information, which makes the initialization of the global map take too long. Completing global map initialization more efficiently, or making better use of prior map information, is the next urgent problem to be solved.
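The VIO-to-ENU alignment mentioned above first requires expressing the GNSS fixes in a local ENU frame. A minimal sketch of the standard WGS-84 geodetic-to-ECEF-to-ENU conversion follows; the reference point is arbitrary, and this is a textbook formulation, not the authors' implementation.

```python
import math

# WGS-84 ellipsoid constants
A = 6378137.0            # semi-major axis [m]
E2 = 6.69437999014e-3    # first eccentricity squared

def geodetic_to_ecef(lat, lon, alt):
    """Convert geodetic coordinates (degrees, meters) to ECEF (meters)."""
    lat, lon = math.radians(lat), math.radians(lon)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + alt) * math.cos(lat) * math.cos(lon)
    y = (n + alt) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + alt) * math.sin(lat)
    return x, y, z

def geodetic_to_enu(lat, lon, alt, lat0, lon0, alt0):
    """Express a geodetic point in the local ENU frame anchored at
    (lat0, lon0, alt0), e.g. the first GNSS fix of the run."""
    x, y, z = geodetic_to_ecef(lat, lon, alt)
    x0, y0, z0 = geodetic_to_ecef(lat0, lon0, alt0)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(lat0), math.radians(lon0)
    e = -math.sin(lon0) * dx + math.cos(lon0) * dy
    n = (-math.sin(lat0) * math.cos(lon0) * dx
         - math.sin(lat0) * math.sin(lon0) * dy
         + math.cos(lat0) * dz)
    u = (math.cos(lat0) * math.cos(lon0) * dx
         + math.cos(lat0) * math.sin(lon0) * dy
         + math.sin(lat0) * dz)
    return e, n, u
```

Once four or more fixes are expressed in ENU, a rigid (or similarity) alignment between the VIO trajectory and these points yields the transform into the ENU frame.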
