Discussion

The above example shows that event collapse is enabled (or disabled) by the type of warp. If the warp does not enable event collapse (contraction or accumulation of flow vectors cannot happen due to the geometric properties of the warp), as in the case of feature flow (2 DOF) [7,30] (Figure 3b) or rotational motion flow (3 DOF) [5,29] (Figure 3c), then the optimization problem is well posed and multiple objective functions can be designed to achieve event alignment [10,13]. However, the disadvantage is that the type of warps that satisfy this condition may not be rich enough to describe complex scene motions.

On the other hand, if the warp allows for event collapse, more complex scenarios can be described by such a broader class of motion hypotheses, but the optimization framework designed for non-event-collapsing scenarios (where the local maximum is assumed to be the global maximum) may not hold anymore. Optimizing the objective function may lead to an undesired solution with a larger value than the desired one. This depends on multiple elements: the landscape of the objective function (which depends on the data, the warp parametrization, and the shape of the objective function), and the initialization and search strategy of the optimization algorithm used to explore such a landscape. The challenge in this situation is to overcome the issue of multiple local maxima and make the problem better posed. Our approach consists of characterizing event collapse via novel metrics and including them in the objective function as weak constraints (penalties) to yield a better landscape.

#### *3.4. Proposed Regularizers*

#### 3.4.1. Divergence of the Event Transformation Flow

Inspired by physics, we may think of the flow vectors given by the event transformation $\mathcal{E} \to \mathcal{E}'$ as an electrostatic field, whose sources and sinks correspond to the location of electric charges (Figure 4). Sources and sinks are mathematically described by the divergence operator $\nabla \cdot$. Therefore, the divergence of the flow field is a natural choice to characterize event collapse.

**Figure 4.** *Divergence of different vector fields*, $\nabla \cdot \mathbf{v} = \partial_x v_x + \partial_y v_y$. From left to right: contraction ("sink", leading to event collapse), expansion ("source"), and incompressible fields. Image adapted from khanacademy.org (accessed on 6 July 2022).

The warp **W** is defined over the space-time coordinates of the events, hence its time derivative defines a flow field over space-time:

$$\mathbf{f} \doteq \frac{\partial \mathbf{W}(\mathbf{x}, t; \theta)}{\partial t}. \tag{7}$$

For the warp in (6), we obtain $\mathbf{f} = -h_z \mathbf{x}$, which gives $\nabla \cdot \mathbf{f} = -h_z \nabla \cdot \mathbf{x} = -2h_z$. Hence, (6) defines a constant-divergence flow, and imposing a penalty on the degree of concentration of the flow field amounts to directly penalizing the value of the parameter $h_z$.

Computing the divergence at each event gives the set

$$\mathcal{D}(\mathcal{E}, \boldsymbol{\theta}) \doteq \{ \nabla \cdot \mathbf{f}_k \}_{k=1}^{N_e}, \tag{8}$$

from which we can compute statistical scores (mean, median, min, etc.):

$$R_D(\mathcal{E}, \boldsymbol{\theta}) \doteq \frac{1}{N_e} \sum_{k=1}^{N_e} \nabla \cdot \mathbf{f}_k. \tag{9}$$
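As a minimal numerical sketch of (7)–(9), the snippet below evaluates the per-event divergence by finite differences and averages it. It assumes that the zoom warp (6) has the form $\mathbf{W}(\mathbf{x}, t; h_z) = (1 - t h_z)\mathbf{x}$, which is consistent with the flow $\mathbf{f} = -h_z \mathbf{x}$ stated above. The function names, step sizes, and synthetic events are illustrative choices, not part of the original method.

```python
import numpy as np

def zoom_warp(x, t, h_z):
    """Assumed 1-DOF zoom warp: W(x, t; h_z) = (1 - t * h_z) * x."""
    return (1.0 - t[:, None] * h_z) * x

def flow(x, t, h_z, dt=1e-6):
    """Flow f = dW/dt of Eq. (7), by central differences in time."""
    return (zoom_warp(x, t + dt, h_z) - zoom_warp(x, t - dt, h_z)) / (2.0 * dt)

def divergence_per_event(x, t, h_z, eps=1e-4):
    """Set D of Eq. (8): div f at each event, by central differences in space."""
    ex, ey = np.array([eps, 0.0]), np.array([0.0, eps])
    dfx_dx = (flow(x + ex, t, h_z)[:, 0] - flow(x - ex, t, h_z)[:, 0]) / (2.0 * eps)
    dfy_dy = (flow(x + ey, t, h_z)[:, 1] - flow(x - ey, t, h_z)[:, 1]) / (2.0 * eps)
    return dfx_dx + dfy_dy

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1000, 2))  # synthetic event coordinates (assumption)
t = rng.uniform(0.0, 1.0, size=1000)        # synthetic timestamps (assumption)
h_z = 0.3

D = divergence_per_event(x, t, h_z)         # Eq. (8)
R_D = D.mean()                              # Eq. (9)
```

Under this assumed warp, every entry of `D` should be close to $-2h_z$, so the mean score `R_D` carries the same information as the parameter itself.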

To have a 2D visual representation ("feature map") of collapse, we build an image (like the IWE) by taking some statistic of the values $\nabla \cdot \mathbf{f}_k$ that warp to each pixel, such as the "average divergence per pixel":

$$\text{DIWE}(\mathbf{x}; \mathcal{E}, \boldsymbol{\theta}) \doteq \frac{1}{N_e(\mathbf{x})} \sum_{k} (\nabla \cdot \mathbf{f}_{k}) \,\delta(\mathbf{x} - \mathbf{x}_{k}^{\prime}), \tag{10}$$

where $N_e(\mathbf{x}) \doteq \sum_k \delta(\mathbf{x} - \mathbf{x}'_k)$ is the number of warped events at pixel $\mathbf{x}$ (the IWE). Then we aggregate further into a score, such as the mean:

$$R_{\rm DIWE}(\mathcal{E}, \boldsymbol{\theta}) \doteq \frac{1}{|\Omega|} \int_{\Omega} \text{DIWE}(\mathbf{x}; \mathcal{E}, \boldsymbol{\theta})\, d\mathbf{x}. \tag{11}$$

In practice we focus on the collapsing part by computing a trimmed mean: the mean of the DIWE pixels smaller than a margin *α* (−0.2 in the experiments). Such a margin does not penalize small, admissible deformations.
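The following sketch illustrates one possible way to build the map in (10) and the trimmed mean described above. It replaces the delta in (10) with nearest-pixel voting and assigns zero divergence to pixels that receive no events; both are assumptions made for brevity, and the helper names `diwe` and `trimmed_mean_below` are hypothetical.

```python
import numpy as np

def diwe(xw, div, shape):
    """Average divergence per pixel, Eq. (10), using nearest-pixel voting.

    xw: (N, 2) warped event coordinates (x, y) in pixels; div: (N,) divergences.
    Returns the map and the per-pixel event count N_e(x).
    """
    H, W = shape
    cols = np.rint(xw[:, 0]).astype(int)
    rows = np.rint(xw[:, 1]).astype(int)
    ok = (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)  # drop out-of-image events
    count = np.zeros(shape)
    total = np.zeros(shape)
    np.add.at(count, (rows[ok], cols[ok]), 1.0)       # N_e(x)
    np.add.at(total, (rows[ok], cols[ok]), div[ok])   # sum of divergences per pixel
    out = np.zeros(shape)                             # empty pixels stay at 0 (assumption)
    np.divide(total, count, out=out, where=count > 0)
    return out, count

def trimmed_mean_below(img, alpha):
    """Mean of the pixels smaller than the margin alpha (0.0 if there are none)."""
    vals = img[img < alpha]
    return float(vals.mean()) if vals.size else 0.0

# Tiny example: two events fall on pixel (0, 0), one on pixel (row 0, col 1).
xw = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 0.0]])
div = np.array([-1.0, -0.5, 0.1])
img, count = diwe(xw, div, (2, 2))
score = trimmed_mean_below(img, alpha=-0.2)
```

In this example only the pixel with average divergence $-0.75$ lies below the margin, so the mildly expanding pixel and the empty pixels do not contribute to the score.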

#### 3.4.2. Area-Based Deformation of the Event Transformation

In addition to vector calculus, we may also use tools from differential geometry to characterize event collapse. Building on [12], the point trajectories define the streamlines of the transformation flow, and we may measure how they concentrate or disperse based on how the area element deforms along them. That is, we consider a small area element $dA = dx\,dy$ attached to each point along the trajectory and measure how much it deforms when transported to the reference time: $dA' = |\det(\mathbf{J})|\, dA$, with the Jacobian

$$\mathbf{J}(\mathbf{x},t;\theta) \doteq \frac{\partial \mathbf{W}(\mathbf{x},t;\theta)}{\partial \mathbf{x}} \tag{12}$$

(see Figure 5). The determinant of the Jacobian is the amplification factor: | det(J)| > 1 if the area expands, and | det(J)| < 1 if the area shrinks.

**Figure 5.** *Area deformation of various warps*. An area of $dA$ pix<sup>2</sup> at $(\mathbf{x}_k, t_k)$ is warped to $t_{\text{ref}}$, giving an area $dA' = |\det(\mathbf{J}_k)|\, dA$ pix<sup>2</sup> at $(\mathbf{x}'_k, t_{\text{ref}})$, where $\mathbf{J}_k \equiv \mathbf{J}(e_k) \equiv \mathbf{J}(\mathbf{x}_k, t_k; \boldsymbol{\theta})$ (see (12)). From left to right, increasing area amplification factor $|\det(\mathbf{J})| \in [0, \infty)$.

For the warp in (6), we have the Jacobian $\mathbf{J} = (1 - \tilde{t} h_z)\,\mathrm{Id}$, and so $\det(\mathbf{J}) = (1 - \tilde{t} h_z)^2$. Interestingly, the area deformation around event $e_k$, $\mathbf{J}(e_k) \equiv \mathbf{J}(\mathbf{x}_k, t_k; \boldsymbol{\theta})$, is directly related to the scaling factor $s_k$: $\det(\mathbf{J}(e_k)) = s_k^2$.

Computing the amplification factors at each event gives the set

$$\mathcal{A}(\mathcal{E}, \boldsymbol{\theta}) \doteq \left\{ \left| \det(\mathbf{J}(e_k)) \right| \right\}_{k=1}^{N_e}, \tag{13}$$

from which we can compute statistical scores. For example,

$$R_A(\mathcal{E}, \boldsymbol{\theta}) \doteq \frac{1}{N_e} \sum_{k=1}^{N_e} |\det(\mathbf{J}(e_k))| \tag{14}$$

gives an average score: $R_A > 1$ for expansion, and $R_A < 1$ for contraction.
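The area-based score (12)–(14) can be sketched numerically as follows. The example again assumes the zoom warp form $\mathbf{W}(\mathbf{x}, t; h_z) = (1 - t h_z)\mathbf{x}$, which matches the Jacobian given above, and approximates $\mathbf{J}$ by central differences. The helper `area_amplification` and the synthetic events are hypothetical.

```python
import numpy as np

def zoom_warp(x, t, h_z):
    """Assumed 1-DOF zoom warp: W(x, t; h_z) = (1 - t * h_z) * x."""
    return (1.0 - t[:, None] * h_z) * x

def area_amplification(warp, x, t, theta, eps=1e-5):
    """Set A of Eq. (13): |det J| at each event, with J of Eq. (12) by central differences."""
    ex, ey = np.array([eps, 0.0]), np.array([0.0, eps])
    dW_dx = (warp(x + ex, t, theta) - warp(x - ex, t, theta)) / (2.0 * eps)  # first column of J
    dW_dy = (warp(x + ey, t, theta) - warp(x - ey, t, theta)) / (2.0 * eps)  # second column of J
    return np.abs(dW_dx[:, 0] * dW_dy[:, 1] - dW_dy[:, 0] * dW_dx[:, 1])

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1000, 2))  # synthetic event coordinates (assumption)
t = rng.uniform(0.0, 1.0, size=1000)        # synthetic timestamps (assumption)

A_contract = area_amplification(zoom_warp, x, t, 0.3)   # h_z > 0: areas shrink
A_expand = area_amplification(zoom_warp, x, t, -0.3)    # h_z < 0: areas grow
R_A_contract = A_contract.mean()                        # Eq. (14)
R_A_expand = A_expand.mean()
```

Because the Jacobian is evaluated numerically, the same helper can be reused with any other warp that follows the `warp(x, t, theta)` calling convention adopted here.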

We build a deformation map (or image of warped areas (IWA)) by taking some statistic of the values $|\det(\mathbf{J}(e_k))|$ that warp to each pixel, such as the "average amplification per pixel":

$$\text{IWA}(\mathbf{x}) \doteq 1 + \frac{1}{N_e(\mathbf{x})} \sum_{k=1}^{N_e} \left( |\det(\mathbf{J}(e_{k}))| - 1 \right) \delta(\mathbf{x} - \mathbf{x}_{k}'). \tag{15}$$

This assumes that if no events warp to a pixel $\mathbf{x}_p$, then $N_e(\mathbf{x}_p) = 0$, and there is no deformation ($\text{IWA}(\mathbf{x}_p) = 1$). Then, we summarize the deformation map into a score, such as the mean:

$$R_{\rm IWA}(\mathcal{E}, \boldsymbol{\theta}) \doteq \frac{1}{|\Omega|} \int_{\Omega} \text{IWA}(\mathbf{x}; \mathcal{E}, \boldsymbol{\theta})\, d\mathbf{x}. \tag{16}$$

To concentrate on the collapsing part, we compute a trimmed mean: the mean of the IWA pixels smaller than a margin *α* (0.8 in the experiments). The margin tolerates small, admissible deformations.
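A compact sketch of the IWA in (15) is given below. It uses nearest-pixel voting instead of the delta (an assumption) and implements the convention that pixels with no events keep the value 1. The function name `iwa` and the toy inputs are hypothetical.

```python
import numpy as np

def iwa(xw, amp, shape):
    """Image of warped areas, Eq. (15), using nearest-pixel voting.

    xw: (N, 2) warped event coordinates (x, y) in pixels; amp: (N,) values |det J(e_k)|.
    """
    H, W = shape
    cols = np.rint(xw[:, 0]).astype(int)
    rows = np.rint(xw[:, 1]).astype(int)
    ok = (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)  # drop out-of-image events
    idx = rows[ok] * W + cols[ok]                             # flat pixel index
    count = np.bincount(idx, minlength=H * W)                 # N_e(x)
    excess = np.bincount(idx, weights=amp[ok] - 1.0, minlength=H * W)
    out = np.ones(H * W)                                      # no events: no deformation
    hit = count > 0
    out[hit] += excess[hit] / count[hit]
    return out.reshape(H, W)

# Two contracting events on pixel (0, 0) and one expanding event on pixel (1, 1).
xw = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
amp = np.array([0.5, 0.7, 1.2])
img = iwa(xw, amp, (2, 2))

alpha = 0.8
collapsing = img[img < alpha]
score = float(collapsing.mean()) if collapsing.size else 1.0  # trimmed mean below alpha
```

Returning 1 when no pixel lies below the margin is a design choice of this sketch, so that a warp without collapse yields the reference value of the area amplification.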

#### *3.5. Higher DOF Warp Models*

#### 3.5.1. Feature Flow

Event-based feature tracking is often described by the warp $\mathbf{W}(\mathbf{x}, t; \boldsymbol{\theta}) = \mathbf{x} + (t - t_{\text{ref}})\boldsymbol{\theta}$, which assumes constant image velocity $\boldsymbol{\theta}$ (2 DOFs) over short time intervals. As expected, the flow for this warp coincides with the image velocity, $\mathbf{f} = \boldsymbol{\theta}$, which is independent of the space-time coordinates $(\mathbf{x}, t)$. Hence, the flow is incompressible ($\nabla \cdot \mathbf{f} = 0$): the streamlines given by the feature flow do not concentrate or disperse; they are parallel. Regarding the area deformation, the Jacobian $\mathbf{J} = \partial(\mathbf{x} + (t - t_{\text{ref}})\boldsymbol{\theta})/\partial\mathbf{x} = \mathrm{Id}$ is the identity matrix. Hence $|\det(\mathbf{J})| = 1$, that is, translations on the image plane do not change the area of the pixels around a point.

In-plane translation warps, such as the above 2-DOF warp, are well-posed and serve as a reference to design the regularizers that measure event collapse. It is sensible for well-designed regularizers to penalize warps whose characteristics deviate from those of the reference warp: zero divergence and unit area amplification factor.

#### 3.5.2. Rotational Motion

As the previous sections show, the proposed metrics designed for the zoom in/out warp produce the expected characterization of the 2-DOF feature flow (zero divergence and unit area amplification), which is a well-posed warp. Hence, if they were added as penalties into the objective function they would not modify the energy landscape. We now consider their influence on rotational motions, which are also well-posed warps. In particular, we consider the problem of estimating the angular velocity of a predominantly rotating event camera by means of CMax, which is a popular research topic [5,14,27–29]. By using calibrated and homogeneous coordinates, the warp is given by

$$\mathbf{x}^{h\prime} \sim \mathbf{R}(t\boldsymbol{\omega})\, \mathbf{x}^{h}, \tag{17}$$

where $\boldsymbol{\theta} \equiv \boldsymbol{\omega} = (\omega_1, \omega_2, \omega_3)$ is the angular velocity, $t \in [0, \Delta t]$, and $\mathbf{R}$ is parametrized by using exponential coordinates (Rodrigues rotation formula [35,36]).

Divergence: It is well known that the flow is $\mathbf{f} = B(\mathbf{x})\,\boldsymbol{\omega}$, where $B(\mathbf{x})$ is the rotational part of the feature sensitivity matrix [37]. Hence

$$\nabla \cdot \mathbf{f} = 3 (x\omega_2 - y\omega_1). \tag{18}$$

Area element: Letting $\mathbf{r}_3^\top$ be the third row of $\mathbf{R}$, and using (32)–(34) in [38],

$$\det(\mathbf{J}) = (\mathbf{r}_3^\top \mathbf{x}^h)^{-3}. \tag{19}$$

Rotations around the *Z* axis clearly present no deformation, regardless of the amount of rotation, and this is captured by the proposed metrics because: (i) the divergence is zero, thus the flow is incompressible, and (ii) $\det(\mathbf{J}) = 1$ since $\mathbf{r}_3 = (0, 0, 1)$ and $\mathbf{x}^h = (x, y, 1)$. For other, arbitrary rotations, there are deformations, but these are mild if the rotation angle $\Delta t\,\boldsymbol{\omega}$ is small.
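Formulas (18) and (19) can be checked numerically with a short sketch. It implements the warp (17) with the Rodrigues formula and differentiates it by central differences. Two assumptions are made here: the divergence is evaluated at $t = 0$, where the flow of (17) reduces to $B(\mathbf{x})\,\boldsymbol{\omega}$, and the test point and angular velocity are arbitrary illustrative values.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix exp([w]_x) for a rotation vector w (Rodrigues formula)."""
    angle = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])
    if angle < 1e-12:
        return np.eye(3) + K
    return np.eye(3) + np.sin(angle) / angle * K + (1.0 - np.cos(angle)) / angle**2 * (K @ K)

def rot_warp(x, t, omega):
    """Warp (17): rotate the homogeneous point (x, y, 1) and dehomogenize."""
    p = rodrigues(t * omega) @ np.array([x[0], x[1], 1.0])
    return p[:2] / p[2]

def flow_at_t0(x, omega, dt=1e-5):
    """Flow f = dW/dt at t = 0, by central differences in time."""
    return (rot_warp(x, dt, omega) - rot_warp(x, -dt, omega)) / (2.0 * dt)

def divergence_at_t0(x, omega, eps=1e-4):
    """div f at t = 0, by central differences in space."""
    ex, ey = np.array([eps, 0.0]), np.array([0.0, eps])
    return ((flow_at_t0(x + ex, omega)[0] - flow_at_t0(x - ex, omega)[0])
            + (flow_at_t0(x + ey, omega)[1] - flow_at_t0(x - ey, omega)[1])) / (2.0 * eps)

def jacobian(x, t, omega, eps=1e-6):
    """Jacobian of Eq. (12) for the rotational warp, by central differences."""
    cols = [(rot_warp(x + eps * d, t, omega) - rot_warp(x - eps * d, t, omega)) / (2.0 * eps)
            for d in np.eye(2)]
    return np.stack(cols, axis=1)

x = np.array([0.3, -0.2])                # calibrated image point (illustrative)
omega = np.array([0.4, -0.7, 0.2])       # angular velocity (illustrative)
t = 0.5

div_num = divergence_at_t0(x, omega)
div_formula = 3.0 * (x[0] * omega[1] - x[1] * omega[0])        # Eq. (18)

det_num = np.linalg.det(jacobian(x, t, omega))
r3 = rodrigues(t * omega)[2]                                   # third row of R
det_formula = (r3 @ np.array([x[0], x[1], 1.0])) ** -3         # Eq. (19)

# Pure rotation around the Z axis: no divergence and no area change.
omega_z = np.array([0.0, 0.0, 1.0])
div_z = divergence_at_t0(x, omega_z)
det_z = np.linalg.det(jacobian(x, t, omega_z))
```

For the values above, (18) evaluates to $-0.39$, and the finite-difference estimates should agree with both formulas up to discretization error.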

#### 3.5.3. Planar Motion

Planar motion is the term used to describe the motion of a ground robot that can translate and rotate freely on flat ground. If such a robot is equipped with a camera pointing upwards or downwards, the resulting motion induced on the image plane, parallel to the ground plane, is an isometry (Euclidean transformation). This motion model is a subset of the parametric ones in [12], and it has been used for CMax in [14,27]. For short time intervals, planar motion may be parametrized by 3 DOFs: linear velocity (2 DOFs) and angular velocity (1 DOF). As the divergence and area metrics in Appendix A show, planar motion is a well-posed warp. The resulting motion curves on the image plane do not lead to event collapse.

#### 3.5.4. Similarity Transformation

The 1-DOF zoom in/out warp in Section 3.3 is a particular case of the 4-DOF warp in [20], which is an in-plane approximation to the motion induced by a freely moving camera. The same idea of combining translation, rotation, and scaling for CMax is expressed by the similarity transformation in [27]. Both 4-DOF warps enable event collapse because they allow for zoom-out motion curves. Formulas justifying this are given in Appendix A.

#### *3.6. Augmented Objective Function*

We propose to augment previous objective functions (e.g., (5)) with penalties obtained from the metrics developed above for event collapse:

$$\theta^* = \arg\min_{\theta} I(\theta) = \arg\min_{\theta} (-G(\theta) + \lambda R(\theta)). \tag{20}$$

We may interpret *G*(*θ*) (e.g., contrast or focus score [13]) as the data fidelity term and *R*(*θ*) as the regularizer, or, in Bayesian terms, the likelihood and the prior, respectively.
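To illustrate the effect of the penalty in (20), the toy sketch below uses a 1D parameter `h` and made-up functions: a contrast-like score `G` with the desired local maximum at `h = 0.5` and a larger, spurious maximum at `h = 2`, and a hypothetical regularizer `R` that is zero for `h <= 1` and grows quadratically beyond. Neither function comes from the paper or from event data; they only mimic a landscape in which the undesired solution scores higher than the desired one.

```python
import numpy as np

def G(h):
    """Toy contrast: desired local maximum at h = 0.5, larger spurious maximum at h = 2."""
    return np.exp(-(h - 0.5) ** 2 / 0.02) + 1.5 * np.exp(-(h - 2.0) ** 2 / 0.02)

def R(h):
    """Hypothetical collapse penalty: zero up to h = 1, quadratic beyond."""
    return np.maximum(0.0, h - 1.0) ** 2

def objective(h, lam):
    """Augmented objective of Eq. (20): -G + lam * R."""
    return -G(h) + lam * R(h)

h = np.linspace(0.0, 3.0, 301)                   # exhaustive 1D search grid
h_plain = h[np.argmin(objective(h, lam=0.0))]    # unregularized minimizer
h_reg = h[np.argmin(objective(h, lam=1.0))]      # regularized minimizer
```

With `lam = 0` the grid search returns the spurious solution near `h = 2`, whereas with `lam = 1` the penalty raises the objective there and the minimizer moves to the desired solution near `h = 0.5`. The weight `lam` is a free choice in this toy setting.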
