#### *5.1. Robot and Target Dynamics Model*

For target monitoring, each quadrotor is controlled to follow a planar circular trajectory in 3D space. The state $\mathbf{x}\_i$ comprises the 3D position and the heading angle, namely, $\mathbf{x}\_i = [x\_i, y\_i, z\_i, \theta\_i]^T$. $\mathbf{u}\_i = [v\_i, \omega\_i]^T$ denotes the control input command, namely, the planar linear velocity and the angular velocity. The actual linear and angular velocities are contaminated by zero-mean Gaussian noises, $w\_{i,k}^{v} \sim \mathcal{N}(0, Q\_i^{v})$ and $w\_{i,k}^{\omega} \sim \mathcal{N}(0, Q\_i^{\omega})$. An extra altitude noise $w\_{i,k}^{z} \sim \mathcal{N}(0, Q\_i^{z})$ is added to the process noise. The overall process noise covariance is denoted as $\mathbf{Q}\_i = \mathrm{blkdiag}(Q\_i^{v}, Q\_i^{\omega}, Q\_i^{z})$. On the basis of the above definitions, the process model $f\_i$ for each robot is defined as follows:

$$\mathbf{x}\_{i,k+1} \triangleq \begin{bmatrix} x\_{i,k+1} \\ y\_{i,k+1} \\ z\_{i,k+1} \\ \theta\_{i,k+1} \end{bmatrix} = \begin{bmatrix} x\_{i,k} + \Delta\_T(v\_i + w\_{i,k}^{v})\cos(\theta\_{i,k}) \\ y\_{i,k} + \Delta\_T(v\_i + w\_{i,k}^{v})\sin(\theta\_{i,k}) \\ z\_c + w\_{i,k}^{z} \\ \theta\_{i,k} + \Delta\_T(\omega\_i + w\_{i,k}^{\omega}) \end{bmatrix} \tag{33}$$

The target ground robot is modeled with a unicycle model $f\_t$ similar to Equation (33). Correspondingly, its state, control input, and process noise covariance are denoted as $\mathbf{x}\_t$, $\mathbf{u}\_t$, and $\mathbf{Q}\_t = \mathrm{blkdiag}(Q\_t^{v}, Q\_t^{\omega}, Q\_t^{z})$, respectively.
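To make the dynamics concrete, the following is a minimal Python sketch of the process model in Equation (33); the function name, the state layout, and the explicit noise sampling are illustrative assumptions rather than the implementation used in the experiments.

```python
import numpy as np

def process_model(x, u, Q, dt, z_c, rng):
    """One step of the unicycle-type model in Equation (33).

    x   : state [x, y, z, theta]
    u   : control input [v, omega] (planar speed, yaw rate)
    Q   : (Q_v, Q_omega, Q_z) process-noise variances
    z_c : commanded constant altitude
    """
    v, omega = u
    Q_v, Q_omega, Q_z = Q
    # Zero-mean Gaussian noises on speed, yaw rate, and altitude.
    w_v = rng.normal(0.0, np.sqrt(Q_v))
    w_omega = rng.normal(0.0, np.sqrt(Q_omega))
    w_z = rng.normal(0.0, np.sqrt(Q_z))
    return np.array([
        x[0] + dt * (v + w_v) * np.cos(x[3]),
        x[1] + dt * (v + w_v) * np.sin(x[3]),
        z_c + w_z,                      # altitude held at z_c plus noise
        x[3] + dt * (omega + w_omega),
    ])
```

The target model $f\_t$ follows the same template with its own noise variances.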

#### *5.2. Measurement Model*

Three types of measurements are utilized to realize the CLAT objective in this system setup: private absolute measurements from the onboard navigation system, cooperative relative range measurements from the UWB sensors, and angle measurements of the target from the downward-facing cameras.

Although the position information from the onboard navigation system is itself a fusion result from multiple sensors, it is treated here as a private absolute measurement and modeled as follows:

$$\mathbf{z}\_{i,k}^a = H\_i^a \mathbf{x}\_{i,k} + \mathbf{v}\_{i,k}^a \tag{34}$$

The measurement noise is assumed to follow a zero-mean Gaussian distribution, $\mathbf{v}\_{i,k}^{a} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}^{a})$. Since the absolute measurement model is linear, the inferred Jacobian matrix (defined in Section 3.2.1) can simply be replaced with $H\_i^{a}$. In this paper, we assume that only a subset of the robots has access to the onboard navigation signal.
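Because Equation (34) is linear, $H\_i^{a}$ reduces to a constant observation matrix. Below is a minimal sketch under the assumption that the navigation system reports the 3D position out of the state $[x, y, z, \theta]$; the actual rows of $H\_i^{a}$ depend on the navigation system's output.

```python
import numpy as np

# Assumed observation matrix: picks the 3D position from [x, y, z, theta].
H_a = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])

def absolute_measurement(x, R_a, rng):
    """Private absolute measurement z^a = H^a x + v^a with v^a ~ N(0, R_a)."""
    return H_a @ x + rng.multivariate_normal(np.zeros(3), R_a)
```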

The cooperative relative measurement between robots *i* and *j* from UWB is a scalar distance with the following nonlinear model:

$$z\_{ij,k}^{c} = \|\mathbf{x}\_{i,k} - \mathbf{x}\_{j,k}\|\_{d} + v\_{ij,k}^{c} \tag{35}$$

The measurement noise $v\_{ij,k}^{c}$ is assumed to follow a zero-mean Gaussian distribution, $v\_{ij,k}^{c} \sim \mathcal{N}(0, \mathbf{R}\_{i}^{c})$. The target detection measurement is the position of the target on the captured image plane, and the measurement function is defined as

$$\mathbf{z}\_{it,k}^{t} = d\_f \frac{\bar{\mathbf{x}}\_{it,k}^{(1:2)}}{\bar{\mathbf{x}}\_{it,k}^{(3)}} + \mathbf{v}\_{i,k}^{t} \tag{36}$$

where $d\_f$ denotes the focal length in pixels and $\bar{\mathbf{x}}\_{it,k}$ denotes the position of the target in the camera coordinate frame, i.e., $\bar{\mathbf{x}}\_{it,k} = \mathcal{R}(\theta\_{i,k})\mathcal{R}\_{c,k}(\mathbf{x}\_{t,k} - \mathbf{x}\_{i,k})$. $\mathcal{R}(\theta\_{i,k})$ denotes the yaw rotation matrix, and

$$
\mathcal{R}(\theta\_{i,k}) = \begin{bmatrix}
\cos(\theta\_{i,k}) & \sin(\theta\_{i,k}) & 0 \\
-\sin(\theta\_{i,k}) & \cos(\theta\_{i,k}) & 0 \\
0 & 0 & 1
\end{bmatrix}.
$$

$\mathcal{R}\_{c,k}$ denotes the roll and pitch rotation matrix; it is assumed to be retrieved from the quadrotor navigation system and is therefore treated as a known variable. The measurement noise is $\mathbf{v}\_{i,k}^{t} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}\_{i}^{t})$. In this paper, target detection is carried out using kernelized correlation filters (KCFs) [36] on the image plane.
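For reference, here is a minimal sketch of the two nonlinear measurement functions, Equations (35) and (36), written without the noise terms; the position slicing `x[:3]` and the argument conventions are assumptions for illustration.

```python
import numpy as np

def yaw_rotation(theta):
    """Yaw rotation matrix R(theta) from Equation (36)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,   s,   0.0],
                     [-s,  c,   0.0],
                     [0.0, 0.0, 1.0]])

def range_measurement(x_i, x_j):
    """UWB scalar distance between robots i and j, Equation (35)."""
    return np.linalg.norm(x_i[:3] - x_j[:3])

def camera_measurement(x_i, x_t, R_c, d_f):
    """Pinhole projection of the target onto robot i's image plane, Equation (36).

    x_i : quadrotor state [x, y, z, theta]
    x_t : target state (first three entries are its position)
    R_c : roll/pitch rotation matrix from the navigation system (known)
    d_f : focal length in pixels
    """
    # Target position in the camera frame: x_bar = R(theta) R_c (p_t - p_i)
    x_bar = yaw_rotation(x_i[3]) @ R_c @ (x_t[:3] - x_i[:3])
    return d_f * x_bar[:2] / x_bar[2]
```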

#### *5.3. Experiment Results and Analysis*

Based on the above models, an experiment in which three quadrotors track one ground robot using the UT-CLAT algorithm was carried out at 10 Hz. The experimental setup is listed in Table 2.

An experimental snapshot of the three quadrotors with the corresponding captured images is shown in Figure 10, taken while the ground robot is within the FOV of all three quadrotor cameras.

The trajectories of the three quadrotors' self-localization results and target tracking results produced by the CLAT algorithm are plotted in Figure 11. As observed there, the three quadrotors are able to localize themselves while stably tracking the target. The self-localization result of quad1 is noticeably better than that of the other two quadrotors, as it can obtain the signal from the onboard navigation system. The self-localization and target tracking errors are plotted in Figures 12 and 13, respectively, which show the local estimation errors (colored solid lines) and the corresponding approximated ±3*σ* envelopes (dashed lines in the same colors) for the three quadrotors. As observed in Figures 12 and 13, the estimation errors of each quadrotor are bounded by the ±3*σ* envelopes in the steady state.
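As a side note, the ±3*σ* envelopes in Figures 12 and 13 follow directly from each quadrotor's estimated covariance sequence; a minimal sketch (variable names are assumptions):

```python
import numpy as np

def three_sigma_envelope(P_history, idx):
    """Per-component +/-3 sigma bound from a sequence of covariance matrices."""
    return np.array([3.0 * np.sqrt(P[idx, idx]) for P in P_history])

# Steady-state consistency check for, e.g., the x-axis error:
# np.all(np.abs(err_x) <= three_sigma_envelope(Ps, 0))
```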


**Table 2.** Experiment setup.

**Figure 10.** Snapshot of the CLAT experiment setup (three quadrotors and the corresponding captured images).

**Figure 11.** Trajectories of the three quadrotors' self-localization results and target tracking results, shown in different colors. Ground truth is indicated by black lines.

**Figure 12.** The self-localization errors and covariance of three quadrotors.

**Figure 13.** Object tracking errors and covariance of three quadrotors.
