#### *3.2. Graphic-Centric Description of M\**

This section uses the graphic-centric description introduced by Wagner [6] to illustrate M\*. M\* is a complete and optimal multi-agent path planning (MAPP) algorithm. The main idea of M\* is to iteratively construct/update a so-called *search graph Gsch* (i.e., to iteratively remove the colliding configuration vertexes and expand the necessary neighbors) and to apply the A\* algorithm on the new *Gsch* until the optimal collision-free path to *vd* exists in *Gsch* and is found by the A\* search. Specifically, *Gsch* is a sub-graph of *G* and consists of three sub-graphs: the *expanded graph Gexp*, the *neighbor graph Gnbh*, and the *policy graph G<sup>φ</sup>*. The expanded graph *Gexp* is the sub-graph of *G* that has been explored by M\*. *Gnbh* contains the *limited neighbors* of all the joint vertexes in *Gexp*; the definition of limited neighbors is given below. *G<sup>φ</sup>* consists of the paths induced by the *individually optimal policy φ* that connect each joint vertex in *Gnbh* ∪ *Gexp* to *vd* without the collision-free constraint. Specifically, *φ<sup>j</sup>* is the individually optimal policy for agent *j* that leads any *v<sup>j</sup>* in *Gnbh* ∪ *Gexp* to *v<sup>j</sup><sub>d</sub>* without considering collisions. Examples of the policy *φ* include the standard Dijkstra's algorithm [24] and A\* [5]. Using the above graphic concepts, we can define the *collision set Cp* as

$$C\_p = \begin{cases} \psi\left(v\_p\right) \cup \bigcup\_{v\_q \in V\_p} \psi\left(v\_q\right), & \text{for } v\_p \in G^{\text{exp}} \\ \varnothing, & \text{for } v\_p \notin G^{\text{exp}} \end{cases}\tag{3}$$

where *Vp* = {*vq* | ∃ *π*(*vp*, *vq*) ⊆ *Gexp*} is the set of joint vertexes that can be reached from *vp* via a path in *Gexp*. Let *φ<sup>j</sup>*(*v<sup>j</sup>*) be the immediate *successor vertex* of *v<sup>j</sup>* along the policy path; then, the set of *limited neighbors V<sup>nbh</sup><sub>p</sub>* for the joint vertex *vp* in *Gnbh* is defined as

$$V\_p^{\text{nbh}} = \left\{ v\_q \,\middle|\, \begin{array}{l} e\_{pq}^j \in \mathcal{E}^j, \text{ for } j \in C\_p \\ v\_q^j = \phi^j\left(v\_p^j\right), \text{ for } j \notin C\_p \end{array} \right\},\tag{4}$$

where *e<sup>j</sup><sub>pq</sub>* = edge(*v<sup>j</sup><sub>p</sub>*, *v<sup>j</sup><sub>q</sub>*) is the edge connecting *v<sup>j</sup><sub>p</sub>* and *v<sup>j</sup><sub>q</sub>* in agent *j*'s graph. The definition of the limited neighbors implies the sub-dimensional expansion strategy: we only expand the search space along the dimensions where a collision occurs (*j* ∈ *Cp*); for the collision-free dimensions (*j* ∉ *Cp*), M\* does not expand, limiting the unexpanded search space to the graph that consists only of the individually optimal paths induced by the policy *φ*.
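Equation (4) can be sketched in code as follows. This is a minimal illustration, assuming a joint vertex is represented as a tuple of per-agent vertexes, `adj[j]` maps agent *j*'s vertex to its graph neighbors, and `policy[j]` maps it to its policy successor *φ<sup>j</sup>* (all names are illustrative, not from the paper):

```python
from itertools import product

def limited_neighbors(v_p, collision_set, adj, policy):
    """Sketch of Equation (4): build per-agent option sets and combine
    them into joint neighbors. Agents in the collision set are expanded
    over all their graph neighbors; the rest follow only their
    individually optimal policy successor phi^j(v_p^j)."""
    options = []
    for j, v in enumerate(v_p):
        if j in collision_set:
            options.append(adj[j][v])       # full per-agent expansion
        else:
            options.append([policy[j][v]])  # individually optimal step only
    return [tuple(q) for q in product(*options)]
```

With an empty collision set, every agent simply follows *φ*, so the joint vertex has exactly one limited neighbor; each agent added to the collision set multiplies the neighbor count by that agent's branching factor, which is precisely the sub-dimensional expansion behavior.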

#### *3.3. Algorithm Description of M\**

The high-level description of M\* is as follows [6]: Initially, M\* computes the individually optimal policy *φ* for each agent from the source *vs* to the destination *vd*. The initial search graph *Gsch* consists only of the individually optimal path: the initial *Gexp* contains *vs* only; the initial *Gnbh* contains *φ*(*vs*) only, which is the successor of *vs* along the individually optimal policy; and the initial *G<sup>φ</sup>* contains the optimal policy path from the vertexes in *Gnbh* and *Gexp* all the way to *vd*. *Cp* = ∅ for all *vp* in the initial *Gsch*. Given the initial *Gsch*, the A\* algorithm is applied using the following *admissible heuristic*:
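One common way to obtain the individually optimal policy *φ<sup>j</sup>* and its cost-to-go in one pass is a backward Dijkstra search from the agent's destination. The following is a minimal sketch, assuming an undirected per-agent graph given as adjacency lists with symmetric edge costs (the names `adj`, `cost`, `goal` are illustrative, not from the paper):

```python
import heapq

def individual_policy(adj, cost, goal):
    """Backward Dijkstra from the goal of one agent. Returns:
    dist -- cost-to-go of the individually optimal path to the goal;
    phi  -- the policy successor: the next vertex on the individually
            optimal path toward the goal (the goal waits in place)."""
    dist = {goal: 0.0}
    phi = {goal: goal}
    pq = [(0.0, goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v in adj[u]:
            nd = d + cost[(u, v)]
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                phi[v] = u  # stepping to u brings v closer to the goal
                heapq.heappush(pq, (nd, v))
    return dist, phi
```

Because `dist` is the cost of the individually optimal (collision-ignoring) path, summing the per-agent `dist` values of a joint vertex also yields the admissible heuristic of Equation (5) for sum-of-costs planning.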

$$h\left(v\_p\right) = g(\pi\_\phi(v\_p, v\_d)) \le g(\pi\_\*(v\_p, v\_d)),\tag{5}$$

where *g*(·) denotes the cost of a path, *πφ* is the individually optimal path induced by the policy *φ*, and *π*<sub>∗</sub> is the ground-truth optimal multi-agent path we want to find. The heuristic is admissible because the individually optimal path ignores collisions and therefore never costs more than the true collision-free optimal path. The initial *open list* (i.e., *priority queue*) contains *vs* only, with zero cost. The open list is sorted according to *vp*.*cost* + *h*(*vp*), where *vp*.*cost* is the current cost of *vp* from the source.
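The open-list ordering can be sketched with a binary heap keyed by *f* = cost + *h* (a minimal illustration; the names `open_list`, `cost`, `h` are ours, not the paper's):

```python
import heapq

# The open list as a priority queue ordered by f = cost + h.
open_list = []
cost = {'vs': 0.0}   # cost-from-source; the source starts at zero
h = {'vs': 3.0}      # admissible cost-to-go estimate for vs
heapq.heappush(open_list, (cost['vs'] + h['vs'], 'vs'))  # initially, vs only

# A* always expands the vertex with the lowest f-value first.
f, v_p = heapq.heappop(open_list)
```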

In each iteration, M\* expands the first-ranked vertex *vp* from the open list into *Gexp* and, if no collision occurs at *vp*, investigates each joint vertex *vq* in the limited neighbors of *vp* (i.e., *vq* ∈ *V<sup>nbh</sup><sub>p</sub>*); otherwise, it jumps to the next iteration. If there exists a collision at *vq* (i.e., *ψ*(*vq*) ≠ ∅), M\* updates the collision set *Cq* with *Cq* ∪ *ψ*(*vq*), and this update back-propagates from *vq* to (1) its immediate predecessor *vp* and (2) all ancestors that have at least one path inside *Gexp* leading to *vq* (see Equation (3) for details). After this pre-processing, the algorithm updates the cost of each collision-free neighbor *vq* and inserts it into the open list whenever its cost from the source improves. This process is repeated until *vd* is expanded or the open list is empty.
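The collision-set update with back-propagation can be sketched as follows. This is a minimal illustration under our own assumptions: `back_set[v]` records the expanded predecessors of `v`, and `cost`/`h` supply the *f*-value for re-insertion into the open list (all names are illustrative, not from the paper):

```python
import heapq

def backprop(v, new_collisions, collision_set, back_set, open_list, cost, h):
    """Merge newly observed colliding agents into C_v. If C_v grows,
    re-insert v into the open list, so it is later re-expanded with an
    enlarged limited-neighbor set, and recurse to every recorded
    predecessor, propagating the collision information upstream."""
    if not new_collisions <= collision_set[v]:
        collision_set[v] |= new_collisions
        heapq.heappush(open_list, (cost[v] + h[v], v))
        for u in back_set.get(v, ()):
            backprop(u, new_collisions, collision_set, back_set,
                     open_list, cost, h)
```

The subset test makes the recursion terminate: once a vertex's collision set already contains the propagated agents, nothing upstream of it needs revisiting.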

The critical point is that the search graph *Gsch* changes only when a collision set *Cp* changes. It is the operation of updating the collision set in a back-propagating way that makes the difference: by including *ψ*(*vp*) in *Cp*, M\* can tell which agents are *immediately* in collision at the current *vp*; by including all *ψ*(*vq*) for *vq* ∈ *Vp* in *Cp* (i.e., the collision information of all the expanded downstream successors of *vp*), M\* can preview which agents will collide in the future, making it possible to plan ahead and avoid that. The limited neighbor set in Equation (4) therefore makes sense: it advises M\* to only expand the dimensions where there is an immediate collision at *vp* or where a collision will occur in the future, starting from *vp*, in the current expanded graph *Gexp*. Figure 1 shows an example of how M\* solves the optimal collision-free path planning problem for two agents.

**Figure 1.** Illustration of traditional M\* for two agents, where we show the evolution of the expanded graph *Gexp* (circle), neighbor graph *Gnbh* (diamond), and policy graph *G<sup>φ</sup>* (square) for Agent 2 as the M\* algorithm proceeds. (**a**) Individually optimal paths; (**b**) the first expanded vertex; (**c**) the third expanded vertex; (**d**) collision occurs at vertex *s*<sub>10</sub>; (**e**) sub-dimensional expansion; (**f**) search in the expanded space; (**g**) the destination of Agent 2 found; (**h**) collision-free optimal paths for both agents found by M\*.

In Figure 1, we can visualize the evolution of the search graph *Gsch* of Agent 2. *Gsch* consists of an expanded graph *Gexp* (circle), a neighbor graph *Gnbh* (diamond), and a policy graph *G<sup>φ</sup>* (square). Edge costs and direction-changing costs are considered during planning; yellow zones are preferred areas with lower edge costs. In M\*, the individually optimal path is induced by *φ* for each individual agent (Figure 1a). We can observe that there will be a collision at vertex *s*<sub>10</sub>, which is ignored by *φ*. For Agent 2, M\* searches in the subspace, and the most promising vertex is expanded at each iteration (Figure 1b,c). Then, a collision occurs at vertex *s*<sub>10</sub> and triggers the removal of the rest of *Gsch* (Figure 1d), which is equivalent to jumping to the next iteration. Following the sub-dimensional expansion strategy, M\* extends the search space to include the limited neighbors, and a new *Gsch* is obtained (Figure 1e). By searching in the new *Gsch*, M\* finds the optimal collision-free path for Agent 2 (Figure 1f,g). Meanwhile, the planning for Agent 1 is conducted simultaneously, and, finally, the collision-free optimal paths for both agents are found by M\* (Figure 1h).
