*2.4. Global Map Initialization and Intra-Agent Measurements*

In order to enable the system to complete the positioning of each agent without GNSS at the initial stage of operation, we let the UAVs and UGVs start from the same position with the same angle of view. In this paper, UAVs are set to take off from the top platform of the UGV. After the IMU initialization is completed (which commonly needs 15–35 keyframes and takes 5–10 s), the agents perform a visual-inertial bundle adjustment. The aim of this action is to improve the precision of the initial map. Then, each agent sends its initial optimized local map to the server. For a certain terminal *Agentc*, the server will start the alignment from the keyframe *KF<sup>c</sup>* at the most recent moment sent in the *Mapc*. First, a subset of maps *Mapm* from the server map stack and brute-force descriptors are chosen that match all keyframes contained within that map with *KF<sup>c</sup> <sup>t</sup>* through DBoW2, and obtained the most likely matched candidate ) *KF<sup>m</sup> <sup>t</sup>*<sup>1</sup> , *KF<sup>m</sup> <sup>t</sup>*<sup>2</sup> ... *KF<sup>m</sup> ti* \* (We set *i* to four times that of the number of agents). For each keyframe, among all the feature points contained in it and its five best common-view keyframes, the 2D-2D feature point matching provided by DBoW2 is used to obtain the 3D-3D matching between their corresponding map points. The map points of candidacy *KF<sup>m</sup> tn* will be converted to *KF<sup>c</sup> <sup>t</sup>* through *T<sup>n</sup> cm*. If the reprojection error is less than the threshold, a vote is cast for the corresponding *KF<sup>m</sup> tn* , and the frame with the most votes is selected as the matching frame. For further improvement of the closed-loop's robustness, the matching frame and its common-view frame together need to achieve three successful matches to be considered as the complete place recognition. If, at this moment, *Agentc* is a UAV, *Agentm* is a UGV, then the UAV's local map will be fused into the UGV's through *TUAVc*\_*UGVm* = *Tcm*. If both *Agentc* and *Agentm* are UAVs, the relative pose transformation will be saved until one of them is fused with the UGV's map, and another will complete fusion through *TUAVc*\_*UAVm* × *TUAVm*\_*UGV*. After the above pairwise pairing of the unmanned platforms, the local map of each UAV is finally converted into the coordinate system of the local map of the unmanned vehicle. New incremental keyframes and map points received by the server during subsequent operations will be transferred to the global map with the corresponding poses until a new loop closure or GNSS location provides a new location constraint.
