**1. Introduction**

Multi-robot systems (MRSs) have garnered tremendous research interest in recent years [1]. Compared with a single robot, an MRS usually has greater efficiency and operational capability in accomplishing complex tasks, such as transportation [2], search and rescue [3], and mapping [4]. Among these MRS applications are the fundamental tasks of obtaining reliable localization information for the local robot and the uncooperative target using various measurements; these two processes are often referred to as collaborative self-localization (CL) [5–9] and distributed target tracking (DT) [10–13], respectively. In the CL process, each robot measures the relative quantities with regard to neighboring cooperative robots. By cooperating with other robots, each robot is able to refine its own positioning information. In the DT process, each robot performs a measurement function on the uncooperative targets to be tracked. Then, the states of the target can be estimated cooperatively through interactions with other robots. Although the problems of CL and DT are usually solved by two separate techniques, such as in [5–11,13–17],

they are correlated in most practical scenarios. In the DT process, the target tracking accuracy is dependent on the localization information of the corresponding robots, as well as the relative measurements between the robot and target. The target tracking results obtained by each local robot, in turn, can be implemented to improve the localization performance of the robots. To realize MRS self-localization and target tracking simultaneously, a combined collaborative self-localization and distributed target tracking (CLAT) framework is studied in this paper.

The problem of multi-robot collaborative localization has drawn significant attention in recent years. In [18], the state of the art in collaborative localization is surveyed, and the theoretical limits, algorithms, and practical challenges are discussed. As one of the fundamental challenges in CL, the application of a proper data fusion strategy to deal with the correlations between robots was studied in [5–9,14,15]. A direct approach involves the local robot treating the states of neighboring robots as fully confident variables that will lead to zero correlations between robots [5]. However, this impractical assumption of neighboring positions can lead to overconfident estimates. A more practical method to fuse the relative measurements when the correlation is unknown is the implementation of conservative correlation approximation methods, such as covariance intersection (CI) [6] or split covariance intersection (SCI) [7]. A CL approach using CI was proposed in [14]. This method is provably consistent and can handle asynchronous communication and measurement. The SCI-based approach, as studied in [7,8], further separates the covariance into correlated and uncorrelated parts, and the latter is fused using the CI method. Despite the ability of CI-based methods to preserve the consistency of the estimates, they often have overly conservative results. Making a trade-off between estimation accuracy and the corresponding cost during the CI-based collaborative localization process was investigated using the optimal scheduling problem in [19]. Another popular method to deal with the CL problem is based on factor graphs, which are formulated on the basis of entire trajectories, such as in [15]. The correlation can be explicitly tracked in the factor-graph-based method. However, storing all of the measurements resulting from this method requires significantly more storage space than the recursive method. To address the above drawbacks, a recursive extended Kalman filter (EKF)-based CL method was proposed in [20], in which the correlation was accurately tracked in a decentralized manner. In [21], the processing and storage costs were further reduced by introducing a server that broadcasts an update message when an inter-robot relative measurement is taken. However, in this method, when a relative measurement is taken between two robots, communication involves all robots rather than just the two in the relative measurement, and this significantly increases the communication burden. Another recursive EKF-based CL method was proposed in [9]. This method efficiently approximates the correlation and only stores the current measurement. When the relative measurement is taken, only the communication between the two robots is required.

The distributed tracking problem has also been extensively studied [22]. Early-stage algorithms that have been proposed to solve this problem can be roughly split into two categories: consensus-based algorithms [16] and diffusion-based algorithms [10]. The former category, in general, requires multiple communication iterations during each sampling time interval and hence could lead to a heavy communication burden. To reduce the communication bandwidth, a distributed Kalman filter with event-triggered communication was proposed in [23], and the stability is guaranteed. The latter category does not have such drawbacks, but it may require local joint detectabilities at every single agent, and such a requirement might not be satisfied in a general multi-robot target tracking scenario. A more practical DT approach called distributed hybrid information fusion (DHIF) [11] is able to guarantee stability and is asymptotically unbiased with very mild sufficient conditions. To further solve the distributed tracking problem with a nonlinear process and sensing models, an EKF-based paradigm was proposed in [24], and the stability was analyzed in [25]. Also, the unscented transformation-based approach, which has been regarded as a superior alternative to the EKF when the systems are highly nonlinear, was integrated with the DT process in [26]. However, both the EKF and unscented Kalman filter (UKF) mentioned

above are consensus-based and hence may generally result in a heavy communication burden. Recently, the aforementioned DHIF was extended to a nonlinear scenario using the DT approach in [12], and the stochastic stability was analytically studied. Besides the methodology research and theoretical analyses above, the performance of different fusion strategies in terms of their communication rate, information type, and memory size were compared in [27].

The CLAT framework has gained attention in recent years. There are two main kinds of methods: batch and recursive methods. The batch method estimates the entire state trajectory on the basis of all measurements and motion information up to the present, and the recursive method uses only the current measurement and control information. The batch-based method is supposed to outperform the recursive method but at the cost of significantly larger computation and storage requirements and, if in a distributed fashion, communication requirements. One batch-based method was proposed in [28]. By introducing a factor graph that contained robot and target nodes and relative measurements, the problem was formulated as a least-square minimization problem and was solved with sparse optimization methods. Another batch-based method was presented in [29], where the CLAT problem was formulated as a maximum a posteriori estimation problem, and the unscented transformation (UT) technique was implemented to better characterize the nonlinear process. Furthermore, the observability condition was extensively studied. For an MRS with limited computation and storage capacity, the recursive method is often preferred. A recursive-filter-based CLAT was studied in [30], and the error bounds are theoretically provided. Nevertheless, the results in [30] are based on a specially designed measurement graph so that the correlation can be tracked properly. A recursive Bayesian method was proposed in [31] to perform the CLAT in a distributed sequential fashion; however, this method needs synchronous communication at each time instance and will therefore add a significant communication burden. Further, an error propagation analysis was carried out, and the convergence conditions are given in [32], which showed that the localization and tracking accuracy only depends on the expectation of the measurement precision.

In this paper, the multi-robot localization and target tracking problem with a general nonlinear process and various measurement models is studied, and a UT-based CLAT scheme is proposed with consideration of the communication and memory limitations. The main contributions of this paper are summarized as follows: First, the proposed UT-based CLAT is recursive, and it does not store measurements; each robot only keeps the latest estimates of its own and the target, so the storage requirement is significantly reduced. Furthermore, communication is limited to the two robots to obtain a cooperative relative measurement, and no communication with other robots is needed. Meanwhile, to guarantee estimation consistency, inter-robot correlations are approximated in a distributed fashion on the basis of the covariance split method, and the robot–target correlation is discarded using the conservative CI method. Finally, the overall system is modeled on the basis of general nonlinear models and is characterized on the basis of the UT approach rather than the EKF method. Thus, the computation of a Jacobian is avoided. Simulations were carried out, and they indicate that the proposed UT-CLAT method is able to realize stable state estimates of both local robots and targets. More importantly, a hardware platform containing three quadrotors was implemented to verify the effectiveness of the proposed UT-CLAT method. Specifically, three types of measurements (absolute measurement, relative cooperative, and uncooperative measurement) from, respectively, the navigation system, ultra-wide bandwidth (UWB) transmitters, and onboard cameras were utilized to effectively estimate the states of the local robots and targets.

The rest of this paper is organized as follows: Section 2 formulates the CLAT problem, and Section 3 describes the proposed CLAT method. Sections 4 and 5, respectively, provide the simulation results, which are based on synthetic data, and experimental results, which are based on hardware platforms. Section 6 concludes the paper.
