*5.1. Simulation Environment Settings*

We use six sensors, grouped into three arrays, to observe three acoustic targets, as shown in Figure 2. The sensors are located at (100 m, 95 m), (95 m, 100 m), (5 m, 100 m), (0 m, 95 m), (0 m, 5 m) and (5 m, 0 m), within the [0, 100] × [0, 100] m<sup>2</sup> region.

Three pairs of sensors track the targets. Each sensor has an observation distance of 150 m, the simulated sound velocity is 344 m/s, the survival probability is *P<sub>S</sub>* = 0.99 and the clutter intensity of the Poisson distribution is *λ<sub>c</sub>* = 2. The scenario lasts 100 s and the maximum number of targets is 3. The motion model is a linear state-space equation (the constant-velocity, CV, motion model), and the target state evolves as:

$$X\_k = AX\_{k-1} + B\omega\_k\tag{48}$$

$$A = \begin{bmatrix} 1 & \text{T} & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \text{T} \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} \text{T}^2/2 & 0 \\ \text{T} & 0 \\ 0 & \text{T}^2/2 \\ 0 & \text{T} \end{bmatrix}, \tag{49}$$

where *A* is the target state transition matrix, *B* is the noise matrix and *ω<sub>k</sub>* is the process noise, which follows a standard Gaussian distribution. The sampling period is T = 1 s.
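
As an illustration of Equations (48) and (49), the following minimal sketch propagates a state vector by one sampling period. It assumes the state ordering [position x, velocity x, position y, velocity y] and a two-dimensional standard Gaussian noise vector; the helper names `matvec` and `cv_step` are hypothetical and not part of the original simulation.

```python
import random

T = 1.0  # sampling period in seconds, as stated in the text

# State transition matrix A and noise matrix B from Equation (49)
A = [
    [1.0, T, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, T],
    [0.0, 0.0, 0.0, 1.0],
]
B = [
    [T**2 / 2, 0.0],
    [T, 0.0],
    [0.0, T**2 / 2],
    [0.0, T],
]


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def cv_step(x_prev, rng=None):
    """One step of X_k = A X_{k-1} + B w_k with w_k ~ N(0, I_2).

    With rng=None the noise is set to zero, giving the prediction A X_{k-1}.
    """
    if rng is not None:
        w = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    else:
        w = [0.0, 0.0]
    return [a + b for a, b in zip(matvec(A, x_prev), matvec(B, w))]


# Noise-free prediction for a target at (0 m, 90 m) moving at (1, -1) m/s
x0 = [0.0, 1.0, 90.0, -1.0]
x1 = cv_step(x0)  # [1.0, 1.0, 89.0, -1.0]
```

Passing a seeded `random.Random` instance as `rng` adds the process noise while keeping runs reproducible.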

**Figure 2.** Detection model diagram.

In Figure 2, the blue circles mark the sensors, the black circles mark the starting points of the targets and the triangles mark their end positions. The target locations are unknown, and two scenarios are compared in this section. The position and velocity vectors of the target and of sensor *i* are represented as **x**<sub>*k*</sub> = [*p*<sub>*k*,*x*</sub>, *ṗ*<sub>*k*,*x*</sub>, *p*<sub>*k*,*y*</sub>, *ṗ*<sub>*k*,*y*</sub>]<sup>T</sup> and **s**<sup>*i*</sup><sub>*k*</sub> = [*q*<sup>*i*</sup><sub>*k*,*x*</sub>, *q̇*<sup>*i*</sup><sub>*k*,*x*</sub>, *q*<sup>*i*</sup><sub>*k*,*y*</sub>, *q̇*<sup>*i*</sup><sub>*k*,*y*</sub>]<sup>T</sup>, respectively. The range-dependent detection probability is defined as:

$$P\_{D,i}\left(\mathbf{x}\_k\right) = P\_{D,\max} \exp\left(-\frac{\left[\mathbf{x}\_k - \mathbf{s}\_k^i\right]^T \mathbf{C}^T \Sigma\_D^{-1} \mathbf{C} \left[\mathbf{x}\_k - \mathbf{s}\_k^i\right]}{2}\right),\tag{50}$$

where *P*<sub>*D*,max</sub> = 0.98, Σ<sub>*D*</sub> = diag(500, 500)<sup>2</sup> and

$$\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

In scenario 1, we build a parallel model in which the survival periods of the three targets are 1 s–100 s, 10 s–90 s and 20 s–80 s, respectively. The initial states of the three targets follow an LMB RFS with parameters {*r*<sub>*B*,*k*</sub>(*l<sub>i</sub>*), *p*<sub>*B*,*k*</sub>(*l<sub>i</sub>*)}<sup>3</sup><sub>*i*=1</sub>, where *l<sub>i</sub>* = (*k*, *i*), *r*<sub>*B*,*k*</sub>(*l<sub>i</sub>*) = 0.02 and *p<sub>B</sub>*(**x**<sub>0,*i*</sub>, *l<sub>i</sub>*) = N(**x**<sub>0,*i*</sub>; *μ*<sup>(*i*)</sup><sub>*B*</sub>, *P<sub>B</sub>*) with:

$$\begin{aligned} \mu\_{B}^{(1)} &= \begin{bmatrix} 0 \text{ m}, 1 \text{ m/s}, 90 \text{ m}, \text{ } -1 \text{ m/s} \end{bmatrix}^{\text{T}} \\ \mu\_{B}^{(2)} &= \begin{bmatrix} 0 \text{ m}, 1 \text{ m/s}, 80 \text{ m}, \text{ } -1 \text{ m/s} \end{bmatrix}^{\text{T}} \\ \mu\_{B}^{(3)} &= \begin{bmatrix} 0 \text{ m}, 1 \text{ m/s}, 70 \text{ m}, \text{ } -1 \text{ m/s} \end{bmatrix}^{\text{T}} \\ P\_{B} &= \text{diag}\left( \left[ 0.2, 0.08, 0.2, 0.1 \right]^{\text{T}} \right)^{2} . \end{aligned} \tag{51}$$
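
To make the birth parameters of Equation (51) concrete, the following sketch draws one realization of the birth set for scenario 1. It assumes each component exists independently with probability 0.02 and that, since *P<sub>B</sub>* is a squared diagonal matrix, the listed values act as per-component standard deviations; `sample_births` is a hypothetical helper, not the authors' code.

```python
import random

# Birth means from Equation (51), ordered as [px (m), vx (m/s), py (m), vy (m/s)]
BIRTH_MEANS = [
    [0.0, 1.0, 90.0, -1.0],
    [0.0, 1.0, 80.0, -1.0],
    [0.0, 1.0, 70.0, -1.0],
]
# P_B = diag([0.2, 0.08, 0.2, 0.1])^2, so these are standard deviations
BIRTH_STD = [0.2, 0.08, 0.2, 0.1]
R_BIRTH = 0.02  # existence probability r_B of each birth component


def sample_births(rng):
    """Draw one realization of the birth set as a list of (index, state) pairs."""
    births = []
    for i, mean in enumerate(BIRTH_MEANS, start=1):
        if rng.random() < R_BIRTH:  # component i exists with probability r_B
            state = [rng.gauss(m, s) for m, s in zip(mean, BIRTH_STD)]
            births.append((i, state))
    return births


births = sample_births(random.Random(0))
```

Most calls return an empty list, because each component is born with a probability of only 0.02 per time step.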

In scenario 2, the survival periods of the three targets are 1 s–90 s, 1 s–80 s and 30 s–100 s, respectively. Target 1 and target 2 are born in the same position at the same time. The initial parameters are {*r*<sub>*B*,*k*</sub>(*l<sub>i</sub>*), *p*<sub>*B*,*k*</sub>(*l<sub>i</sub>*)}<sup>2</sup><sub>*i*=1</sub>, where *r*<sub>*B*,*k*</sub>(*l<sub>i</sub>*) = 0.02 and *p<sub>B</sub>*(**x**<sub>0,*i*</sub>, *l<sub>i</sub>*) = N(**x**<sub>0,*i*</sub>; *μ*<sup>(*i*)</sup><sub>*B*</sub>, *P<sub>B</sub>*) with:

$$\begin{aligned} \mu\_{B}^{(1)} &= \begin{bmatrix} 0 \text{ m}, 1 \text{ m/s}, 50 \text{ m}, 0 \text{ m/s} \end{bmatrix}^{\text{T}} \\ \mu\_{B}^{(2)} &= \begin{bmatrix} 0 \text{ m}, 0.8 \text{ m/s}, 95 \text{ m}, -0.5 \text{ m/s} \end{bmatrix}^{\text{T}} \\ P\_{B} &= \operatorname{diag}\left( \begin{bmatrix} 0.2, 0.08, 0.2, 0.1 \end{bmatrix}^{\text{T}} \right)^{2} . \end{aligned} \tag{52}$$

The experiment uses three Matlab audio files, sample1.wav, sample2.wav and sample3.wav, as the acoustic signals of the three targets, as shown in Figure 3.

**Figure 3.** Acoustic signals of the three experimental targets.

Taking an acoustic time difference of *τ* = 0.02 s as an example, the results of applying the cross-correlation algorithm to the three signals are shown in Figure 4.

**Figure 4.** Cross-correlation waveform with a time difference of 0.02 s.
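
The idea of recovering a 0.02 s time difference from the correlation peak can be sketched as follows. This is a simplified stand-in, not the experiment itself: it uses plain time-domain cross-correlation without any frequency weighting, a synthetic Gaussian noise signal instead of the audio files, and an assumed sampling rate of 1 kHz. The function name `estimate_delay` is hypothetical.

```python
import random


def estimate_delay(z1, z2, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means z2 is a delayed copy of z1.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # r(lag) = sum_n z1[n] * z2[n + lag] over the overlapping samples
        val = sum(
            z1[n] * z2[n + lag]
            for n in range(len(z1))
            if 0 <= n + lag < len(z2)
        )
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


# Synthetic stand-in for an audio signal (assumption: 1 kHz sampling rate)
fs = 1000
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_lag = round(0.02 * fs)  # 0.02 s corresponds to 20 samples
z1 = source
z2 = [0.0] * true_lag + source[:-true_lag]  # z1 delayed by 20 samples
tau_hat = estimate_delay(z1, z2, max_lag=50) / fs
```

The resolution of such an estimate is limited to one sample period, so a higher sampling rate gives a finer time-difference estimate.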

The time differences of the signals received by the sensor arrays are calculated with the GCC function, and the angle difference of each pair of sensors is calculated from the direction of signal arrival. The observation equation of the target is defined as:

$$\mathbf{z}\_{k}^{[q]} = \begin{bmatrix} \hat{\tau}\_{q} \\ \delta\_{q} \end{bmatrix} + \begin{bmatrix} \sigma\_{\tau} \\ \sigma\_{\delta} \end{bmatrix}, \quad q = 1, \dots, Q \tag{53}$$

$$\hat{\tau}\_q = \arg\max\_{\tau\_q} \int\_{-\infty}^{+\infty} \psi\_{12} \left( \omega \right) Z\_1 \left( \omega \right) Z\_2^\* \left( \omega \right) e^{-j\omega \tau\_q} d\omega \tag{54}$$

$$\delta\_{q} = \left| \arctan \left( \frac{p\_{k,y} - q\_{k,y}^{1}}{p\_{k,x} - q\_{k,x}^{1}} \right) - \arctan \left( \frac{p\_{k,y} - q\_{k,y}^{2}}{p\_{k,x} - q\_{k,x}^{2}} \right) \right|, \tag{55}$$

where **z**<sup>[*q*]</sup><sub>*k*</sub> is nonlinear in the target state. At time *k*, *τ̂<sub>q</sub>* is the time difference observed by a pair of sensors, *δ<sub>q</sub>* is the angle difference between a pair of sensors receiving the signal, and *σ<sub>τ</sub>* = 0.001 s and *σ<sub>δ</sub>* = (*π*/720) rad are the standard deviations of the Gaussian-distributed measurement noise. In scenario 1, the measurement data of target 1 detected by the three pairs of sensor arrays are shown in Figure 5.

**Figure 5.** Measurement data.
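
For reference, a measurement of the form in Equation (53) could be simulated for a single sensor pair as sketched below. The angle difference follows Equation (55); the time difference is computed here from the geometric range difference divided by the sound velocity, which is an assumption of this sketch rather than the GCC estimate of Equation (54). The function names and the choice of the first two listed sensors as one pair are also assumptions.

```python
import math
import random

SOUND_SPEED = 344.0          # m/s, simulated sound velocity from the text
SIGMA_TAU = 0.001            # s, std. dev. of the time-difference noise
SIGMA_DELTA = math.pi / 720  # rad, std. dev. of the angle-difference noise


def _arctan_ratio(dy, dx):
    """arctan(dy / dx) as written in Equation (55); +-pi/2 when dx == 0."""
    if dx == 0.0:
        return math.copysign(math.pi / 2, dy)
    return math.atan(dy / dx)


def pair_measurement(p, q1, q2, rng=None):
    """Return [tau, delta] for target position p and sensor positions q1, q2.

    tau is the geometric range difference divided by the sound speed
    (an assumption of this sketch); delta follows Equation (55).
    """
    tau = (math.dist(p, q1) - math.dist(p, q2)) / SOUND_SPEED
    delta = abs(
        _arctan_ratio(p[1] - q1[1], p[0] - q1[0])
        - _arctan_ratio(p[1] - q2[1], p[0] - q2[0])
    )
    if rng is not None:  # additive Gaussian noise as in Equation (53)
        tau += rng.gauss(0.0, SIGMA_TAU)
        delta += rng.gauss(0.0, SIGMA_DELTA)
    return [tau, delta]


# Sensors at (100 m, 95 m) and (95 m, 100 m), target at (50 m, 50 m)
z = pair_measurement((50.0, 50.0), (100.0, 95.0), (95.0, 100.0))
```

Because this target is equidistant from both sensors, the noise-free time difference is zero and only the angle difference carries information in this particular configuration.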
