**Proof.** Appendix B.

#### **5. Cooperative Path Following**

In this section, the problem of CPF is addressed. The end goal is to have an algorithm that allows one quadrotor and multiple ASVs to perform a path following mission cooperatively, using a distributed architecture. The vehicles are required to execute their mission according to a fixed geometric configuration. To cope with limitations imposed by real environments where inter-vehicle communications are discrete, an Event-Triggered Communications (ETC) mechanism is adopted, based on previous work developed by A. Aguiar and A. Pascoal [23] and N. Hung and F. Rego [13].

#### *Synchronisation Problem with Event-Triggered Communications*

Consider a group of $N \in \mathbb{N} \setminus \{1\}$ autonomous vehicles/agents in a network that can be described mathematically by a digraph $\mathcal{G}(\mathcal{V}, \mathcal{E}, \mathcal{A})$, consisting of $N$ vertices, a set of directed edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$, where the edge $\varepsilon_{ij}$ represents the flow of information from agent $i$ to agent $j$, and a weighted adjacency matrix $\mathcal{A} = [a_{ij}] \in \mathbb{R}^{N \times N}$. Furthermore, each vehicle $i$ is able to receive information from its neighbours in $\mathcal{N}_i^{in}$ and send information to its neighbours in $\mathcal{N}_i^{out}$, i.e., $\mathcal{G}$ is undirected. Moreover, consider that the communication topology of the vehicles is fixed; hence, the Laplacian $L$ associated with $\mathcal{G}$ is constant. Let the state vector of the system be composed of the path parameter of each individual vehicle, $\gamma = [\gamma_1, \dots, \gamma_N]^T$. In addition, each vehicle is equipped with the PF controllers proposed in Section 4 and has an assigned path to follow, appropriately parameterised to ensure that a given desired formation between the vehicles is met. The CPF problem consists in designing a distributed control scheme that adjusts the speed of the vehicles such that all path parameters $\gamma$ reach a consensus. Consider the problem formulation below.

**Problem 2.** *For each agent $i$, with $i = 1, \dots, N$, derive a consensus protocol for the speed correction term $\mathbf{v}^{\text{coord}} = [v_1^{\text{coord}}, \dots, v_N^{\text{coord}}]^T$, such that $\lim_{t\to\infty} |\gamma_i - \gamma_j| = 0$, $\forall j \in \mathcal{N}_i^{in}$, and the formation of vehicles achieves the desired speed assignment $\mathbf{v_L}(\gamma) = [v_{L_1}, \dots, v_{L_N}]^T$ as $t \to \infty$.*

Note that, according to the previously developed PF controllers, for each vehicle $i$, $|\dot{\gamma}_i - v_d(\gamma, t)| = 0$ is only guaranteed as $t \to \infty$, as the controlled variable is $\ddot{\gamma}_i$ and not $\dot{\gamma}_i$. Having this fact in mind, and assuming that the vehicles have already converged to their desired paths, i.e., $e_p \approx 0$ (and $e_v \approx 0$ in the case of the quadrotor), the following simplifying assumption can be made:

**Assumption 1.** *The speed progression of all the virtual targets along the desired path is always assumed to be modelled by a single integrator system, which can be expressed in vectorial form as:*

$$
\dot{\gamma} = \mathbf{v_d}(\gamma, t) = \mathbf{v_L}(\gamma) + \mathbf{v}^{\text{coord}}.\tag{39}
$$

Let the synchronisation error vector be defined as $\varepsilon = [\varepsilon_1, \dots, \varepsilon_N]^T$ where, for each $i$:

$$\varepsilon_i := \sum_{j \in \mathcal{N}_i^{in}} a_{ij} (\gamma_i - \gamma_j), \tag{40}$$

with $a_{ij}$ the elements of the weighted adjacency matrix that describes the vehicle network. This error can also be expressed in vectorial form as:

$$
\varepsilon := L\gamma,\tag{41}
$$

where $\varepsilon_i$ denotes the coordination error between vehicle $i$ and its neighbours. With the above notation, the coordination error dynamics of the multi-vehicle system are given by:

$$
\dot{\varepsilon} = L\dot{\gamma}.\tag{42}
$$
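As a concrete numerical check of Equations (40)–(42), the sketch below builds the Laplacian of a hypothetical three-vehicle ring topology with unit weights (an assumption for illustration only) and evaluates the synchronisation error:

```python
import numpy as np

# Hypothetical 3-vehicle ring topology: 1 -> 2 -> 3 -> 1 (unit weights a_ij).
A = np.array([[0.0, 0.0, 1.0],   # vehicle 1 receives from vehicle 3
              [1.0, 0.0, 0.0],   # vehicle 2 receives from vehicle 1
              [0.0, 1.0, 0.0]])  # vehicle 3 receives from vehicle 2

# In-degree matrix D and Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# Path parameters of the three virtual targets.
gamma = np.array([0.30, 0.25, 0.20])

# Synchronisation error, Equations (40)-(41): eps_i = sum_j a_ij (gamma_i - gamma_j).
eps = L @ gamma
print(eps)  # -> [0.1, -0.05, -0.05]

# At consensus (all path parameters equal) the error vanishes, since L 1 = 0.
assert np.allclose(L @ (0.5 * np.ones(3)), 0.0)
```

Note that each row of $L$ sums to zero, which is what makes any consensus state $\gamma = c\mathbf{1}$ an equilibrium of the error dynamics (42).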

In the work of N. Hung and F. Rego [13], the authors propose a scheme where each agent $i$ has a set of estimators $\hat{\gamma}_j$, $j \in \mathcal{N}_i^{in}$, for the true state of each in-neighbour virtual target $\gamma_j$. In addition, each agent $i$ has an estimator for its own state, $\hat{\gamma}_i$, which is reset whenever vehicle $i$ broadcasts its true state $\gamma_i$. The other estimators are reset whenever agent $i$ receives the true state from its in-neighbours $j \in \mathcal{N}_i^{in}$. In this work, a time-dependent broadcast condition is adopted.

**Proposition 3.** *Consider the distributed control law given by:*

$$v_i^{\text{coord}} := -k_\varepsilon \sum_{j \in \mathcal{N}_i^{in}} a_{ij} (\gamma_i - \hat{\gamma}_j), \tag{43}$$

*where $k_\varepsilon > 0$ and $\hat{\gamma}_j$ is vehicle $i$'s estimate of vehicle $j$'s real virtual target value. Consider also that the bank of estimators that each vehicle $i$ is running is described by the dynamics equation:*

$$
\dot{\hat{\gamma}}_i = v_L(\hat{\gamma}_i). \tag{44}
$$

*At any time instant $t$, under negligible transmission delays, vehicle $j$'s self-state estimate $\hat{\gamma}_j$ is equal to vehicle $i$'s estimate of $\hat{\gamma}_j$, which allows us to express the estimator dynamics using vectorial notation as:*

$$\dot{\hat{\gamma}} = \mathbf{v_L}(\hat{\gamma}),\tag{45}$$

*where $\hat{\gamma} = [\hat{\gamma}_1, \dots, \hat{\gamma}_N]^T$ is the self-estimate of the virtual target of each vehicle. Let $\tilde{\gamma} = [\tilde{\gamma}_1, \dots, \tilde{\gamma}_N]^T$ denote the local estimation errors of each vehicle, such that $\tilde{\gamma} = \gamma - \hat{\gamma}$. Then, $\mathbf{v}^{\text{coord}}$ can also be given in vectorial notation, according to:*

$$\mathbf{v}^{\text{coord}} := -k_{\varepsilon}[D\gamma - \mathcal{A}\hat{\gamma}] = -k_{\varepsilon}(\varepsilon + \mathcal{A}\tilde{\gamma}),\tag{46}$$

*where $D$ is the diagonal in-degree matrix satisfying $L = D - \mathcal{A}$, and $\mathcal{A}$ the graph adjacency matrix. Consider also a triggering function used to define when to broadcast the along-path position of the virtual target of each vehicle, defined as:*

$$\begin{cases} \delta_i(t) := |\tilde{\gamma}_i(t)| - g_i(t) \\ \tilde{\gamma}_i(t) = \gamma_i(t) - \hat{\gamma}_i(t), \end{cases} \tag{47}$$

*where $\tilde{\gamma}_i(t)$ is the local estimation error of agent $i$ and $g_i(t)$ is a time-dependent threshold function, such that if the estimation error exceeds this threshold, i.e., $\delta_i(t) \geq 0$, vehicle $i$ broadcasts its state to the out-neighbours $\mathcal{N}_i^{out}$ and resets its local estimator. Furthermore, consider $g_i(t)$ to belong to a class of non-negative functions, given by:*

$$g_i(t) = c_i + b_i e^{-\alpha_i t},\tag{48}$$

*with $c_i$, $b_i$ and $\alpha_i$ being positive constants and $\mathbf{g}(t) = [g_1, \dots, g_N]^T$ being the collection of functions $g_i$ for each individual vehicle $i$. Consider also that $\mathbf{v_L}(\gamma) = v_L\mathbf{1} + \tilde{\mathbf{v}}_L$, where $\tilde{\mathbf{v}}_L$ is a bounded and arbitrarily small term that accounts for a transient period in which the vehicles are on different sections of the path, with slightly different desired speed profiles. Then, under Assumption 1, the system is ISS with respect to the error vector $\varepsilon$ and the inputs $\tilde{\gamma}$ and $\tilde{\mathbf{v}}_L$.*

**Proof.** Appendix C.

The proposed control scheme used for achieving CPF using ETC is summarised in Algorithm 1.



**Algorithm 1** Cooperative path following with ETC (**procedure** COORDINATION AND COMMUNICATION).


Given this general distributed control scheme, the sections that follow elaborate on a specific formation in the context of this work.
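The coordination loop of Proposition 3 can be sketched as a small discrete-time simulation. All gains, thresholds, initial conditions and the ring topology below are illustrative assumptions, and a single shared estimate array stands in for the negligible-delay broadcast of Equation (45):

```python
import math

# Minimal ETC-CPF sketch (assumed parameters; ring topology 1 -> 2 -> 3 -> 1).
N, dt, T = 3, 0.01, 20.0
k_eps, vL = 1.0, 0.1                 # consensus gain and constant desired speed
c, b, alpha = 0.01, 0.1, 0.5         # threshold g_i(t) = c + b*exp(-alpha*t), Eq. (48)
in_nbr = {0: [2], 1: [0], 2: [1]}    # j in N_i^in (unit weights a_ij = 1)

gamma = [0.0, 0.4, 0.8]              # true path parameters
gamma_hat = list(gamma)              # network-wide estimates (equal under no delay)
broadcasts = 0

t = 0.0
while t < T:
    # Control law (43), using estimates of the in-neighbours' virtual targets.
    v_coord = [-k_eps * sum(gamma[i] - gamma_hat[j] for j in in_nbr[i])
               for i in range(N)]
    # True target dynamics (39) and estimator dynamics (44), Euler-integrated.
    for i in range(N):
        gamma[i] += dt * (vL + v_coord[i])
        gamma_hat[i] += dt * vL
    # Triggering condition (47): broadcast and reset when the local estimation
    # error exceeds the time-dependent threshold g_i(t).
    g = c + b * math.exp(-alpha * t)
    for i in range(N):
        if abs(gamma_hat[i] - gamma[i]) >= g:
            gamma_hat[i] = gamma[i]  # reset self-estimate and neighbours' copies
            broadcasts += 1
    t += dt

spread = max(gamma) - min(gamma)
print(f"final spread: {spread:.4f}, broadcasts: {broadcasts}")
```

Consistently with the ISS result, the spread of the path parameters shrinks to a small residual of the order of the threshold constant $c$, and broadcasts become rare once the formation has synchronised.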

#### **6. Path Planning**

This section addresses the problem of generating a set of smooth and planar reference paths for each individual vehicle to follow, with the end goal of encircling the boundary of a chemical spill. In order to make the vehicles follow the dynamic boundary according to a pre-defined formation (such as a triangle), multiple paths should be generated from one reference path that encodes the boundary. Borrowing from the work of Saldaña et al. [7], we start by presenting a rigorous mathematical definition of a dynamic boundary below.

**Definition 1.** *A dynamic boundary is a set of planar points $\Omega_t$ such that, $\forall z \in \Omega_t$ and for any $\xi > 0$, the open disk centered at point $z$ with radius $\xi$ contains points of $\Omega_t$ and of its complement set $\Omega_t^C$. Moreover, the dynamic boundary can be approximated by a parametric closed curve (Jordan curve) $\mathbf{C}(\gamma, t) : [0, \infty) \times [0, \infty) \to \mathbb{R}^2$, mapped by a parameter $\gamma \in \mathbb{R}_0^+$ and time $t \in \mathbb{R}_0^+$. The curve is continuous with no self-intersecting points, and changes smoothly with respect to both time $t$ and parameter $\gamma$, as depicted in Figure 4a.*
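A minimal sketch of a curve satisfying Definition 1 is given below; the elliptical spill whose size and eccentricity drift slowly in time is purely a hypothetical example, not a model used in the paper:

```python
import math

def C(gamma: float, t: float) -> tuple[float, float]:
    """Hypothetical dynamic boundary: a slowly deforming closed curve.

    gamma acts modulo 2*pi, so the curve is closed and non-self-intersecting
    (a Jordan curve), and it varies smoothly with t.
    """
    r = 10.0 + 1.5 * math.sin(0.1 * t)   # breathing radius (spill grows/shrinks)
    a = 1.0 + 0.2 * math.cos(0.05 * t)   # slow eccentricity drift
    return (a * r * math.cos(gamma), r * math.sin(gamma))

# Closedness: the same parameter modulo 2*pi maps to the same point.
p0, p1 = C(0.0, 3.0), C(2 * math.pi, 3.0)
assert abs(p0[0] - p1[0]) < 1e-9 and abs(p0[1] - p1[1]) < 1e-9
```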

Since the chemical spill boundary is assumed to be dynamic, a path planning problem can be formulated in which the quadrotor actively re-plans the path that the ASVs should follow at the water surface, as the group of vehicles moves along it and more up-to-date data is acquired by the quadrotor's vision system. Consider, therefore, Problem 3.

**Problem 3.** *Consider a quadrotor flying over a body of water at a pre-defined fixed altitude, equipped with a camera sensor pointing downwards with a fixed pitch angle relative to the vehicle's body reference frame $\{B\}$. Consider also that the vehicle is capable of detecting the boundary of a chemical spill in the 2-D image provided by the camera sensor. Furthermore, one or more ASVs at the surface of the water are required to follow a path dictated by the quadrotor, according to a pre-defined vehicle formation. As the quadrotor detects the dynamic boundary in the image:*


In order to solve Problem 3, a few simplifying assumptions are made:

**Assumption 2.** *The dynamic boundary is located at the ocean's surface, assumed to be a 2-D plane at $Z_U = 0$ in the inertial frame of reference $\{U\}$.*

**Assumption 3.** *The quadrotor has a navigation system that can track the vehicle's pose with good accuracy.*

**Assumption 4.** *The quadrotor has a limited field of view of the environment, i.e., the camera sensor might not be able to capture the entire chemical spill boundary, but rather sections of it, according to Figure 4b.*

**Assumption 5.** *The detection of the pixels that encode the boundary in the image frame is a sub-system that is assumed to be already available, such as the one proposed in [25].*

**Figure 4.** Dynamic Boundary schematic: (**a**) Boundary formal definition. (**b**) Drone's field of view.

#### *6.1. Planar Point Cloud Generation*

The camera model adopted is characterised by: (i) a set of extrinsic parameters, which model the conversion between coordinates expressed in the world/inertial reference frame {*U*} and the camera reference frame {*C*}; (ii) intrinsic parameters which describe how a set of points in {*C*} are represented in the image frame, according to Figure 5.

**Figure 5.** Camera model and reference frames.

The intrinsic parameters consist of the focal distance $f_d$, the scale factors $(s_x, s_y)$ in the $X$- and $Y$-axis, respectively, and $(c_x, c_y)$, which corresponds to the offset of the focal point in the image plane. These parameters can be obtained a priori by resorting to a camera calibration process, described in detail in [26]. Combining the matrix of intrinsic parameters $K$, also known as the full-rank calibration matrix, with the matrix of extrinsic parameters ${}^{C}_{U}[R|T]$, and expressing the inertial frame coordinates as homogeneous coordinates, the transformation between the inertial frame and the camera plane is described by:

$$
\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} f_d s_x & 0 & c_x \\ 0 & f_d s_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{K} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \underbrace{\begin{bmatrix} {}^{C}_{U}R & {}^{C}_{U}T \\ \mathbf{0}_{1\times 3} & 1 \end{bmatrix}}_{{}^{C}_{U}[R|T]} \begin{bmatrix} X_U \\ Y_U \\ Z_U \\ 1 \end{bmatrix} \tag{49}
$$

where $x$ and $y$ denote the coordinates in the image frame and $\lambda$ is a scale factor. It is important to mention that ${}^{C}_{U}[R|T]$ results from a series of successive rigid-body transformations (rotations and translations) given by:

$${}^{C}_{U}[R|T] = {}^{C}_{B}[R|T]\, {}^{B}_{U}[R|T],\tag{50}$$

where ${}^{B}_{U}[R|T]$ denotes the conversion of coordinates expressed in the inertial frame $\{U\}$ to the quadrotor's body frame $\{B\}$, provided by its navigation system, and ${}^{C}_{B}[R|T]$ is a matrix known a priori, as the camera attached to the vehicle is assumed to be fixed. The intrinsic and extrinsic parameters can be aggregated in a matrix $\Omega$ according to:

$$
\Omega = K \cdot {}^{C}_{U}[R|T].\tag{51}
$$

In order to convert a given set of pixels $(x, y)$ that encode the chemical spill boundary in the image frame to a point cloud expressed in the inertial frame, depth information about the scene is required. Taking into consideration Assumption 2, all the points in the inertial frame will lie on the plane described by $Z_U = 0$, which solves the depth requirement. Moreover, from Assumption 3, it can be concluded that the linear system of Equation (49) is well defined and can be inverted such that, for each pixel representing the boundary of the chemical spill, $X_U$ and $Y_U$ are extracted from:

$$
\frac{1}{\lambda} \begin{bmatrix} X_U \\ Y_U \\ 1 \end{bmatrix} = \begin{bmatrix} \Omega_1 & \Omega_2 & \Omega_4 \\ \Omega_5 & \Omega_6 & \Omega_8 \\ \Omega_9 & \Omega_{10} & \Omega_{12} \end{bmatrix}^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}. \tag{52}
$$

**Remark 3.** *This methodology relies heavily on the assumption that the quadrotor has a good navigation system, since small estimation errors in the altitude of the vehicle can lead to errors of several metres in the generated point cloud.*
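A sketch of the projection and its inverse, Equations (49)–(52), is given below. All numerical values are assumptions for illustration, and the extrinsics describe the special case of a camera at altitude $h$ pointing straight down with its optical axis along $Z_U$ (NED convention), rather than the paper's general pitched configuration:

```python
import numpy as np

# --- Assumed camera parameters (illustration only) ---
fd, sx, sy = 0.004, 250_000.0, 250_000.0   # focal distance [m], scale factors [px/m]
cx, cy = 320.0, 240.0                      # principal point offset [px]
K = np.array([[fd * sx, 0.0, cx],
              [0.0, fd * sy, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics for a camera at altitude h pointing straight down, optical axis
# aligned with Z_U (NED): X_C = R X_U + T.
h = 20.0
R = np.eye(3)
T = np.array([0.0, 0.0, h])
RT = np.hstack([R, T[:, None]])            # 3x4 matrix [R | T]

# Aggregated projection matrix, Equation (51): Omega = K * [R | T]  (3x4).
Omega = K @ RT

# Forward projection (49) of a ground point with Z_U = 0.
Pu = np.array([3.0, -1.5, 0.0, 1.0])
uvw = Omega @ Pu
x, y, lam = uvw[0] / uvw[2], uvw[1] / uvw[2], uvw[2]   # here lam equals h

# Back-projection (52): drop Omega's third column (since Z_U = 0) and invert.
M = Omega[:, [0, 1, 3]]
XY1 = np.linalg.inv(M) @ np.array([x, y, 1.0])
XY1 /= XY1[2]                              # renormalise so the last entry is 1
print(XY1[:2])   # recovers [3.0, -1.5]
```

Dropping the third column of $\Omega$ is exactly how Equation (52) exploits Assumption 2: with $Z_U = 0$, the remaining $3 \times 3$ system is invertible and the scale $\lambda$ is absorbed by the final renormalisation.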

#### *6.2. Pre-Processing the Planar Point Cloud*

Before using the 2-D point cloud to generate a path, it is important to pre-process the information it provides. Consider, for instance, the example in Figure 6, where the quadrotor produces a 2-D point cloud, representing the boundary, at an arbitrary timestep $t_k$. In the point cloud, some points represent the chemical spill boundary in a region that is close to the vehicle—the region of interest, i.e., where the main cluster of points is expected to be located (region B). The separation between regions A and B is defined by drawing a normal to the path at the point where the re-planning starts (defined formally in Section 6.3.1). Some points are outliers, resulting either from noisy measurements or from regions of the boundary that are not entirely captured by the field of view of the camera. The latter can be seen as disconnected from the main cluster and should be disregarded in the path planning process. According to Figure 6, the original path (in purple), obtained at time $t_{k-1}$, should be re-planned in order to obtain a new one (in red) that better fits the main cluster of points.

**Figure 6.** Pre-processing the point cloud and re-planning schematic.

Unlike conventional motion planning problems, the main cluster of points does not come with an explicit ordering that would yield a sequence of waypoints for the vehicle to visit sequentially in time—this ordering must be inferred. On the other hand, it is possible to define explicitly where the path re-planning process starts—at a point $\mathbf{p}_s := C(\gamma_s)$ arbitrarily further ahead of the drone's position on the current curve $C(\gamma)$, such that $\gamma_{drone} \leq \gamma_s$. Motivated by this example, and inspired by the work of Liu Y. et al. [27], the following pre-processing steps are introduced:


#### 6.2.1. Removing Unused Points

Consider $\mathbf{p}_s \in \mathbb{R}^2$ to be the point at which the path re-planning starts. In order to remove the points that are in region A, consider that $\psi_s$ is the tangent angle to the current path at $\mathbf{p}_s$. A coordinate transformation can be applied to the 2-D point cloud $X := \{X_l\}_{l=1}^{L} \subset \mathbb{R}^2$, such that in a new reference frame, points that are behind $\mathbf{p}_s$ (in region A) have a negative X-coordinate. This coordinate transformation is given by:

$$X_l^{\circ} = \mathcal{R}(\psi_s) \cdot (X_l - \mathbf{p}_s), \quad \forall l = 1, \dots, L,\tag{53}$$

where $X_l^{\circ} = [X_l^{\circ x}, X_l^{\circ y}]^T$. Each point $X_l$ is discarded if $X_l^{\circ x} < 0$. The points that belong to set $X$ and are not discarded should be saved in a new set $X := \{X_j\}_{j=1}^{J} \subset \mathbb{R}^2$, with $J \leq L$. The pseudo-code is shown in Algorithm 2.
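A minimal sketch of this filtering step is given below; the rotation $\mathcal{R}(\psi_s)$ is applied here as a frame change (the first rotated coordinate is the projection onto the path tangent), which is an assumed convention since the paper does not spell out the matrix:

```python
import math

def remove_unused_points(points, p_s, psi_s):
    """Keep only points 'ahead' of the re-planning point p_s, cf. Eq. (53).

    Each point is expressed in a frame aligned with the path tangent at p_s;
    points with a negative X-coordinate in that frame lie in region A and
    are discarded.
    """
    c, s = math.cos(psi_s), math.sin(psi_s)
    kept = []
    for (px, py) in points:
        dx, dy = px - p_s[0], py - p_s[1]
        x_rot = c * dx + s * dy   # projection onto the tangent direction
        if x_rot >= 0.0:
            kept.append((px, py))
    return kept

# Path heading north (psi_s = pi/2): the point behind p_s is removed.
pts = [(0.0, 2.0), (0.0, -2.0), (1.0, 1.0)]
print(remove_unused_points(pts, (0.0, 0.0), math.pi / 2))
# -> [(0.0, 2.0), (1.0, 1.0)]
```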

#### **Algorithm 2** Remove points "behind" the re-planning point.


**procedure** REMOVE UNUSED POINTS($X$, $\mathbf{p}_s$, $\psi_s$)


#### 6.2.2. Ordering a Set of Points and Removing Outliers

In order to avoid clustering outliers, reduce the point cloud to a curve-like shape, and extract some implicit ordering from the data, Lee I. [28] proposes an algorithm that seeks to extract a structure "as simple as possible" from the data by resorting to a Euclidean Minimum Spanning Tree (EMST). Consider the unordered set of points $X$ obtained previously and a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, such that $\mathcal{V} = \{X_j = (x_j, y_j) \mid j = 1, \dots, J\}$ and $\mathcal{E} = \{(X_i, X_j) \mid i, j = 1, \dots, J,\ i \neq j\}$. The EMST is a tree that connects all points in $\mathcal{G}$, with the weight of each edge corresponding to the Euclidean distance between the pair of points it connects, and can be computed with the well-known Kruskal's algorithm. In order to reduce the time complexity of this process, a threshold distance $N_J$ can be used to define whether each pair of points is connected, and a KDTree [29] can be used to compute a sparse graph $\mathcal{G}$ where each point has a limited set of neighbours, as shown in Figure 7.

**Figure 7.** From sparse graph to an ordered list of points (example).

To remove outliers and define a coarse path to follow, Breadth First Search (BFS) can be applied to the EMST, starting from $\mathbf{p}_s$. This removal of outliers from the point cloud is crucial to avoid smaller clusters of points being considered later in the curve fitting problem. The resulting ordered list of points that forms the path with the highest number of points should be saved in a new ordered set $X^{\dagger} := \{X_k\}_{k=1}^{K} \subset \mathbb{R}^2$. The proposed steps are summarised in Algorithm 3.
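The EMST-plus-BFS idea can be sketched as follows. For self-containment this sketch replaces the KDTree neighbour search with a brute-force distance test under the same threshold, and uses Kruskal's algorithm with a union-find structure; the point cloud and threshold are illustrative:

```python
import math
from collections import deque

def order_points(points, start_idx, radius):
    """Order a 2-D point cloud along an EMST via BFS (sketch of Algorithm 3).

    Edges longer than `radius` are skipped to keep the graph sparse; Kruskal's
    algorithm builds the minimum spanning forest, and BFS from the start point
    extracts an ordered list. Points disconnected from the start's tree
    (outliers) are dropped.
    """
    n = len(points)
    # Candidate edges within the threshold distance (brute force, O(n^2)).
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n)
                   if math.dist(points[i], points[j]) <= radius)

    # Kruskal with union-find -> EMST (a forest if the graph is disconnected).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    adj = {i: [] for i in range(n)}
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            adj[i].append(j)
            adj[j].append(i)

    # BFS over the tree from the start point; unreached points are outliers.
    order, seen, queue = [], {start_idx}, deque([start_idx])
    while queue:
        u = queue.popleft()
        order.append(points[u])
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

# Example: a rough line of points plus one distant outlier.
cloud = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0), (50.0, 50.0)]
print(order_points(cloud, 0, 2.0))  # the outlier (50, 50) is dropped
```

On a chain-like cluster the BFS visit order coincides with the along-curve order, which is exactly the implicit ordering Section 6.2.2 sets out to recover.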

**Algorithm 3** Order a set of 2-D points.

