#### 4.4.2. Implementation

The tracker component receives as input a description data structure for all recognised objects. From this structure, it extracts the position coordinates and the corresponding timestamp and combines them into a position list for the given frame. The core of the tracker is an IMM filter consisting of three motion models: constant velocity, constant acceleration and constant turn rate. In each step, the tracker estimates the current position of every registered track and then pairs the tracks with the input positions using the Munkres global nearest neighbour assignment algorithm.

The tracks are then managed as follows. If no existing track can be paired with a position, a new track is created. If a track has been paired with a position 5 times within the last 7 frames, it is flagged as confirmed; this filters out false positive detections. A track is deleted when it has not been assigned a position at least 22 times within the last 25 frames. These settings fit the Yolo and point cloud based approach; due to behavioural differences, the Yolo and homography based approach requires different settings for best results, so a tracker with optimised settings has been implemented for each detector solution.

After track management, the component compiles a list of the positions of the confirmed tracks. The output is a data structure in the same format as the input, containing the ID and position of each track together with the position and orientation information of the sensor system. The pseudo code of the tracker is listed below:

The latency histogram of tracking is shown in Figure 9. The latency is influenced primarily by the number of current tracks and detections. The average response time of the component is 690.7 μs for the sample sequence.

**Figure 9.** Latency histogram of tracking.

```
READ input
FOR each element in detection data:
    READ position
FOR each track:
    LOAD last position
    ESTIMATE new position with IMM filter
ASSIGN estimations to current detection positions
FOR each track:
    IF track was paired with a detection n times in the last m frames:
        REGISTER track as "Confirmed"
    IF track was paired with a detection fewer than j times in the last k frames:
        DELETE track
IF there are any detections which were not assigned to a track:
    FOR each unassigned detection:
        CREATE new track based on current detection
ASSEMBLE a list from the positions of "Confirmed" tracks
CREATE output data structure
ADD system origin position and yaw information from input data
WRITE output
```
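The confirmation and deletion heuristics of the tracker (5 pairings within the last 7 frames to confirm, at least 22 misses within the last 25 frames to delete) can be sketched in Python. The class, the method names and the history encoding below are illustrative assumptions, not the authors' implementation:

```python
from collections import deque

class TrackManager:
    """Illustrative sketch of the M-of-N confirmation/deletion heuristics
    described above; names and structure are assumptions, not the paper's code."""

    def __init__(self, confirm_hits=5, confirm_window=7,
                 delete_misses=22, delete_window=25):
        self.confirm_hits = confirm_hits
        self.confirm_window = confirm_window
        self.delete_misses = delete_misses
        self.delete_window = delete_window
        self.history = {}        # track_id -> deque of 1 (paired) / 0 (missed)
        self.confirmed = set()

    def update(self, track_id, paired):
        """Record one frame's pairing outcome and apply both heuristics."""
        h = self.history.setdefault(track_id, deque(maxlen=self.delete_window))
        h.append(1 if paired else 0)
        # Confirm: paired at least `confirm_hits` times in the last `confirm_window` frames.
        if sum(list(h)[-self.confirm_window:]) >= self.confirm_hits:
            self.confirmed.add(track_id)
        # Delete: missed at least `delete_misses` times in the last `delete_window` frames.
        if len(h) - sum(h) >= self.delete_misses:
            del self.history[track_id]
            self.confirmed.discard(track_id)
            return "deleted"
        return "confirmed" if track_id in self.confirmed else "tentative"
```

A track paired in 5 consecutive frames is confirmed immediately; a confirmed track then survives up to 21 misses in its 25-frame window before deletion, which matches the tolerance to intermittent detection dropouts described above.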
#### **5. Local Area Fusion Server**

#### *5.1. Stream Setup*

The local area fusion server we set up for our current demonstration automatically processes and converts the incoming detection streams across five sequential processing steps until we get the fused result in the final stream. The five so-called fusion processors can be observed in Figure 10.

**Figure 10.** Current stream setup in the *Central Perception* functional sample server (cylinders represent topics/streams, while the numbered arrows represent stream processors).

The sequentially numbered stream processors from Figure 10 have the following responsibilities:

	- (a) **Position:** The transformation matrix is straightforward to derive from the relative frame (e.g., vehicle IMU): we just calculate the rotation matrix from the platform's current orientation and append its current position as a translation vector.
	- (b) **Orientation:** The global heading is obtained by adding the relative (IMU-based) object yaw to the system yaw. Calculating global pitch and roll is more involved and was skipped, since this data is not represented in our current environment model; the pitch and roll values are set to zero.
	- (c) **Position covariance:** The object position covariance matrix has to be rotated into the global frame and added to the system position covariance, assuming no cross-covariance between the system and object positions since they are independent.

$$
\boldsymbol{\Sigma}_{\mathrm{OP}} = \boldsymbol{\Sigma}_{\mathrm{PP}} + \mathbf{R}\,\boldsymbol{\Sigma}_{\mathrm{RP}}\,\mathbf{R}^{\top} \tag{7}
$$

**R** denotes the IMU-to-UTM rotation matrix, while **Σ**<sub>OP</sub>, **Σ**<sub>PP</sub> and **Σ**<sub>RP</sub> denote the resulting object position covariance, the platform position covariance and the IMU-based relative object position covariance, respectively.
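The position, orientation and covariance steps above can be sketched with NumPy. The yaw-only 2-D rotation and all function and variable names are illustrative assumptions; a minimal sketch of Equation (7), not the server's implementation:

```python
import numpy as np

def to_global(sys_pos, sys_yaw, obj_pos_rel, sigma_pp, sigma_rp):
    """Rotate an IMU-relative object position into the global (UTM) frame,
    translate it by the platform position, and propagate the covariance
    per Eq. (7): Sigma_OP = Sigma_PP + R @ Sigma_RP @ R.T."""
    c, s = np.cos(sys_yaw), np.sin(sys_yaw)
    R = np.array([[c, -s],
                  [s,  c]])                    # yaw-only rotation (pitch/roll = 0)
    obj_pos = sys_pos + R @ obj_pos_rel        # rotation, then translation
    sigma_op = sigma_pp + R @ sigma_rp @ R.T   # Eq. (7), no cross-covariance
    return obj_pos, sigma_op

def global_yaw(sys_yaw, obj_yaw_rel):
    """Global heading = system yaw + relative object yaw, wrapped to [-pi, pi)."""
    return (sys_yaw + obj_yaw_rel + np.pi) % (2 * np.pi) - np.pi
```

Since the covariance is a quadratic form, rotating it requires the sandwich product R Σ R^T rather than a single matrix multiplication; the two source covariances are simply summed because no cross-covariance is assumed.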


#### *5.2. Fusion Algorithm*

Assuming no cross-correlation between sources, we employed a Kalman filter and Global Nearest Neighbor (GNN) association based central tracking source-to-track fusion method called trackerGNN (https://www.mathworks.com/help/fusion/ref/trackergnnsystem-object.html (accessed on 14 September 2021)), which is an integral part of the Sensor Fusion and Tracking Toolbox of Matlab. TrackerGNN maintains a single hypothesis (set of central tracks) about the environment and it follows the central tracking algorithm template detailed in the following subsection. The theory behind the implementation is based on [22]; notably it solves GNN association using the Kuhn-Munkres [23] algorithm, also known as the Hungarian method [24].
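Outside Matlab, the Kuhn-Munkres (Hungarian) assignment at the heart of the GNN step can be reproduced with SciPy's `linear_sum_assignment`; the toy cost matrix of track-to-detection distances below is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy example: predicted positions of 3 tracks (rows) and 3 incoming
# detections (columns); the cost is the Euclidean distance between them.
tracks = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
dets = np.array([[5.2, 4.9], [0.1, -0.2], [9.8, 0.3]])
cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=-1)

# Kuhn-Munkres (Hungarian) solution: the assignment minimising total cost.
row, col = linear_sum_assignment(cost)
pairs = [(int(r), int(c)) for r, c in zip(row, col)]
# pairs -> [(0, 1), (1, 0), (2, 2)]: each track matched to its nearest detection
```

In practice the cost would be a statistical distance (e.g., Mahalanobis) computed from the track and detection covariances, with gating to forbid implausible pairings.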

#### 5.2.1. Central Tracking

Central tracking, sensor-to-track or source-to-track (S2T) fusion has detections from multiple sources (usually sensors) as inputs and is expected to produce a single set of central tracks as output. Therefore, the detections have to be integrated across time and across sources. If we first perform the time-integration (tracking) and subsequently perform the source-integration (fusion), we get the equivalent of a track-to-track (T2T) fusion approach. In contrast, if we perform source-integration (fusion) before time-integration (tracking), we are talking about S2T fusion.

The general S2T fusion framework assumes the maintenance of a single set of central tracks throughout the filtering steps. An S2T fusion step usually follows the template given below:

	- (a) Track lifecycle management is done during this step (trackerGNN uses parametrizable heuristics as detailed in Section 4.4.2).
	- (b) The assignment algorithm may handle passage of time. A simple solution like trackerGNN would disregard time and only use spatial data for assignment. A sophisticated solution might have to assign and integrate each measurement individually, ordered by time, iterating between steps 2 and 3, increasing computation costs.
