**2. Materials and Methods**

#### *2.1. System Overview*

The architecture of the framework is depicted in Figure 1. At system startup, all UAVs take off from a platform on top of a UGV. After each unmanned platform completes IMU initialization and performs its first global bundle adjustment (BA), map fusion and relative pose estimation between the platforms are carried out using the local maps. A flowchart of the co-location program executed on the server is shown below:

**Figure 1.** Overall algorithm framework. "MPs/KFs" is short for map points and keyframes. In the figure, oval blocks represent data and square blocks represent processing steps. The input for each platform includes images, IMU measurements, and GNSS location information. Each data handler thread is responsible for one corresponding agent. Pose estimation and global optimization, both on a single platform and on the server, are realized by factor graph optimization. The final state after global optimization is output to subsequent programs for path planning, task allocation, or map construction.

For convenience of explanation, the coordinate system used in the first stage of position estimation on a single platform is called the VIO frame. The coordinate system in which each agent's local map resides after the platform has completed the second stage of position estimation is called the local frame. The coordinate system in which the global map resides after map fusion is completed on the server is called the global frame. The co-location program on the server is shown in Figure 2.

When no GNSS positioning or collaborative positioning information is available, the VIO frame coincides with the local frame. Before initialization, the global frame coincides with the local frame of the UGV. Global map initialization obtains the relative pose between each UAV and the UGV through PnP solutions and converts the local map of each agent into the global map, using the UGV's local map as the reference. After initialization, the local frame of each agent coincides with the global frame.

As the unmanned platforms continue to move, each agent generates new keyframes during local SLAM and sends these keyframes and their map points to the server through wireless data communication. A stack on the server caches the map information from each agent. These keyframes and map points are added to the global map using the relative transformations between platforms obtained during initialization. When the program detects place recognition between platforms by matching feature points, loop closure and map fusion of the global map are executed, followed by optimization. The optimized position and pose, together with the GNSS positioning from each platform, are used in the second stage of estimation. Finally, the updated poses of the new keyframes after collaborative positioning at the loop-closure location are sent back to the corresponding agents, and each agent that receives collaborative positioning information adjusts its pose during the second stage of local optimization.
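The conversion of an agent's local map into the global frame can be illustrated with a minimal sketch. It assumes the relative pose between a UAV's local frame and the UGV's local frame (taken here as the global frame) has already been obtained, for example from a PnP solution, and is available as a rotation matrix and translation vector. The function names `make_transform`, `points_to_global`, and `pose_to_global` are hypothetical and are not taken from the framework itself.

```python
import numpy as np


def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def points_to_global(T_global_local, points_local):
    """Map an (N, 3) array of local map points into the global frame."""
    pts = np.asarray(points_local, dtype=float)
    R = T_global_local[:3, :3]
    t = T_global_local[:3, 3]
    # Row-vector form of p_global = R @ p_local + t
    return pts @ R.T + t


def pose_to_global(T_global_local, T_local_kf):
    """Express a keyframe pose (4x4, given in the local frame) in the global frame."""
    return T_global_local @ T_local_kf


# Assumed example: the UAV local frame is rotated 90 degrees about z and
# shifted 1 m along x with respect to the UGV (global) frame.
R_z90 = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
T_global_uav = make_transform(R_z90, [1.0, 0.0, 0.0])
```

Applying `points_to_global(T_global_uav, ...)` to every map point and `pose_to_global` to every keyframe of an agent places that agent's local map in the UGV-based global frame. The sketch covers only the rigid transformation step; estimating the relative pose and refining it by optimization are outside its scope.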

**Figure 2.** Detailed running process of the co-location module in Figure 1. The function of this module is to restore the MPs/KFs data received from each agent into a local map. The local map is then stitched into the global map through visual place recognition, or a new loop closure is added to the global map that has already been initialized. Finally, the GNSS positioning information is integrated to perform global optimization and output the co-location result.
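The decision flow of the co-location module can be sketched as follows. This is a simplified illustration under several assumptions, not the implementation used on the server. The class and field names are hypothetical, agent 0 is assumed to be the UGV whose local frame defines the global frame, the cache is modelled as a first-in-first-out queue, and place recognition is supplied by the caller as a function `find_match`. Global optimization with GNSS is deliberately left out.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class KeyframeMsg:
    """A keyframe and the ids of its map points, as sent by one agent."""
    agent_id: int
    kf_id: int
    map_point_ids: list


@dataclass
class CoLocationServer:
    # Per-agent cache of received messages (assumption: FIFO order).
    buffers: dict = field(default_factory=lambda: defaultdict(deque))
    # Local map restored on the server for each agent.
    local_maps: dict = field(default_factory=lambda: defaultdict(list))
    # Agents already merged into the global map (assumption: agent 0 is the UGV).
    fused_agents: set = field(default_factory=lambda: {0})
    # Loop-closure constraints added to the global map.
    loop_edges: list = field(default_factory=list)

    def receive(self, msg):
        """Cache an incoming keyframe message."""
        self.buffers[msg.agent_id].append(msg)

    def step(self, agent_id, find_match):
        """Process one cached message of the given agent.

        find_match(msg) returns an (agent_id, kf_id) pair of a matching
        keyframe in the global map, or None when no place is recognized.
        """
        if not self.buffers[agent_id]:
            return "idle"
        msg = self.buffers[agent_id].popleft()
        self.local_maps[agent_id].append(msg)  # restore the local map
        match = find_match(msg)
        if match is None:
            return "buffered"
        if agent_id not in self.fused_agents:
            self.fused_agents.add(agent_id)  # stitch local map into global map
            return "fused"
        self.loop_edges.append(((agent_id, msg.kf_id), match))  # new loop closure
        return "loop_closed"
        # Global optimization with GNSS would follow here; it is omitted.
```

In this sketch, the first recognized place for an agent triggers map fusion, and later matches for the same agent are recorded as loop-closure constraints. In a full system, each of those events would be followed by the global optimization step described in the caption.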
