**1. Introduction**

Event cameras [1–3] offer potential advantages over standard cameras to tackle difficult scenarios (high speed, high dynamic range, low power). However, new algorithms are needed to deal with the unconventional type of data they produce (per-pixel asynchronous brightness changes, called events) and unlock their advantages [4]. Contrast maximization (CMax) is an event processing framework that provides state-of-the-art results on several tasks, such as rotational motion estimation [5,6], feature flow estimation and tracking [7–11], ego-motion estimation [12–14], 3D reconstruction [12,15], optical flow estimation [16–19], motion segmentation [20–24], guided filtering [25], and image reconstruction [26].

The main idea of CMax and similar event alignment frameworks [27,28] is to find the motion and/or scene parameters that align corresponding events (i.e., events that are triggered by the same scene edge), thus achieving motion compensation. The framework simultaneously estimates the motion parameters and the correspondences between events (data association). However, in some cases CMax optimization converges to an undesired solution where events accumulate into too few pixels, a phenomenon called event collapse (Figure 1). Because CMax is at the heart of many state-of-the-art event-based motion estimation methods, it is important to understand the above limitation and propose ways to overcome it. Prior works have largely ignored the issue or proposed workarounds without analyzing the phenomenon in detail. A more thorough discussion of the phenomenon is overdue, which is the goal of this work.

**Citation:** Shiba, S.; Aoki, Y.; Gallego, G. Event Collapse in Contrast Maximization Frameworks. *Sensors* **2022**, *22*, 5190. https://doi.org/10.3390/s22145190

Academic Editor: Jing Tian

Received: 9 June 2022; Accepted: 7 July 2022; Published: 11 July 2022

Contrary to the expectation that event collapse occurs only when the event transformation becomes sufficiently complex [16,27], we show that it may occur even in the simplest case of one degree-of-freedom (DOF) motion. Drawing inspiration from differential geometry and electrostatics, we propose principled metrics to quantify event collapse and discourage it by incorporating penalty terms into the event-alignment objective function. Although event collapse depends on many factors, our strategy modifies the objective's landscape to improve the well-posedness of the problem, so that well-known, standard optimization algorithms can be used.
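As a schematic illustration of the penalty-term idea (the function names and the simple finite-difference penalty below are expository stand-ins, not the exact metrics derived later in the paper): a warp whose flow field has strongly negative divergence squeezes events together, so an augmented objective can subtract a penalty for such contraction.

```python
import numpy as np

def contraction_penalty(flow):
    """Penalize negative divergence of a dense flow field (H, W, 2),
    i.e., regions where the candidate warp squeezes events together."""
    div = np.gradient(flow[..., 0], axis=1) + np.gradient(flow[..., 1], axis=0)
    return np.maximum(-div, 0.0).mean()

def augmented_objective(contrast_fn, flow_fn, lam=1.0):
    """Reshaped objective landscape: sharpness reward minus collapse penalty."""
    return lambda theta: contrast_fn(theta) - lam * contraction_penalty(flow_fn(theta))

# A flow contracting toward the image center has divergence -2 everywhere,
# so it is penalized; a constant (translational) flow is not.
ys, xs = np.mgrid[0:8, 0:8].astype(float)
contracting = np.stack([-(xs - 3.5), -(ys - 3.5)], axis=-1)
translating = np.ones((8, 8, 2))
```

Only contracting regions are penalized, so well-posed (e.g., translational) warps are left untouched by the extra term.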

**Figure 1.** *Event Collapse.* **Left**: Landscape of the image variance loss as a function of the warp parameter *hz*. **Right**: IWEs at the values of *hz* marked on the landscape. (**A**) Original events (identity warp), accumulated over a small Δ*t* (polarity is not used). (**B**) Image of warped events (IWE) showing event collapse caused by maximizing the objective function. (**C**) Desired IWE obtained with our proposed regularizer: sharper than (**A**) while avoiding event collapse.

In summary, to the best of our knowledge, this is the first work that focuses on the paramount phenomenon of event collapse, which may arise in state-of-the-art event-alignment methods. Our experiments show that the proposed metrics mitigate event collapse without harming well-posed warps.

#### **2. Related Work**

#### *2.1. Contrast Maximization*

Our study is based on the CMax framework for event alignment (Figure 2, bottom branch). The CMax framework is an iterative method with two main steps per iteration: transforming events and computing an objective function from such events. Assuming constant illumination, events are triggered by moving edges, and the goal is to find the transformation/warping parameters *θ* (e.g., motion and scene) that achieve motion compensation (i.e., alignment of events triggered at different times and pixels), hence revealing the edge structure that caused the events. Standard optimization algorithms (gradient ascent, sampling, etc.) can be used to maximize the event-alignment objective. Upon convergence, the method provides the best transformation parameters and the transformed events, i.e., motion-compensated image of warped events (IWE).
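As a rough, self-contained sketch of these two steps (illustrative function names; a 2-DOF translational warp and the image-variance objective are chosen only for concreteness):

```python
import numpy as np

def warp_events(xy, t, v, t_ref=0.0):
    """Step 1: transport each event to the reference time along a
    constant-velocity motion model (candidate parameters theta = v)."""
    return xy - (t - t_ref)[:, None] * v

def iwe(xy_warped, shape=(32, 32)):
    """Accumulate warped events into an image of warped events (IWE)."""
    img = np.zeros(shape)
    xi = np.clip(np.round(xy_warped[:, 0]).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(xy_warped[:, 1]).astype(int), 0, shape[0] - 1)
    np.add.at(img, (yi, xi), 1.0)
    return img

def variance_objective(v, xy, t):
    """Step 2: event-alignment objective (IWE variance), to be maximized."""
    return iwe(warp_events(xy, t, np.asarray(v))).var()

# Synthetic events triggered by a vertical edge translating at 5 px/s.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 500)
xy = np.stack([10.0 + 5.0 * t, rng.uniform(0.0, 31.0, 500)], axis=1)

# Score candidate parameters: the true velocity yields the sharpest IWE.
scores = {vx: variance_objective([vx, 0.0], xy, t) for vx in (0.0, 5.0, 10.0)}
best = max(scores, key=scores.get)
```

At the true velocity all events land on the same column of the IWE, so the variance (contrast) peaks there.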

The first step of the CMax framework transforms events according to a motion or deformation model defined by the task at hand. For instance, camera rotational motion estimation [5,29] often assumes constant angular velocity (*θ* ≡ *ω*) during short time spans, hence events are transformed following 3-DOF motion curves defined on the image plane by candidate values of *ω*. Feature tracking may assume constant image velocity *θ* ≡ **v** (2-DOF) [7,30], hence events are transformed following straight lines.
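The 3-DOF rotational warp can be sketched as follows (a simplified illustration assuming calibrated, homogeneous pixel coordinates; function names are ours, not those of the cited implementations):

```python
import numpy as np

def rotation_matrix(w):
    """Angle-axis vector -> rotation matrix (Rodrigues' formula)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = w / th
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def warp_rotation(xy, t, omega, t_ref=0.0):
    """3-DOF warp: move each event back to t_ref along the motion curve
    induced by a constant angular velocity omega."""
    pts = np.concatenate([xy, np.ones((len(xy), 1))], axis=1)  # homogeneous coords
    out = np.empty_like(xy)
    for i, (p, ti) in enumerate(zip(pts, t)):
        q = rotation_matrix(-(ti - t_ref) * omega) @ p         # rotate bearing back
        out[i] = q[:2] / q[2]                                  # back to image plane
    return out

# Identity check: zero angular velocity leaves the events untouched.
xy = np.array([[0.1, -0.2], [0.0, 0.3]])
t = np.array([0.5, 1.0])
xy_id = warp_rotation(xy, t, np.zeros(3))
```

In contrast to the straight-line (2-DOF) warp, each candidate *ω* bends the event trajectories along curves on the image plane.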

In the second step of CMax, several event-alignment objectives have been proposed to measure the goodness of fit between the events and the candidate model [10,13], establishing connections between visual contrast, sharpness, and depth-from-focus. Finally, the choice of iterative optimization algorithm also plays a significant role in finding the desired motion-compensation parameters. First-order methods, such as non-linear conjugate gradient (CG), are a popular choice, trading off accuracy and speed [12,21,22]. Exhaustive search, sampling, or branch-and-bound strategies may be affordable for low-dimensional (DOF) search spaces [14,29]. As will be presented in Section 3, our proposal modifies the second step by means of a regularizer (Figure 2, top branch).
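For intuition, a first-order ascent on such an objective can be sketched with finite-difference gradients (an expository stand-in for the analytic gradients or CG solvers used in practice):

```python
import numpy as np

def gradient_ascent(objective, theta0, step=0.05, iters=100, eps=1e-3):
    """Maximize an objective by first-order ascent, estimating the gradient
    with central finite differences at each iteration."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for j in range(theta.size):
            e = np.zeros_like(theta)
            e[j] = eps
            grad[j] = (objective(theta + e) - objective(theta - e)) / (2.0 * eps)
        theta += step * grad
    return theta

# A smooth stand-in objective with its maximum at theta = (2, -1).
peak = gradient_ascent(lambda th: -(th[0] - 2.0) ** 2 - (th[1] + 1.0) ** 2,
                       [0.0, 0.0])
```

Like any local ascent, this converges to whichever optimum owns the basin of attraction of the initialization, which is precisely why the shape of the objective landscape matters.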


**Figure 2.** Proposed modification of the contrast maximization (CMax) framework in [12,13] to also account for the degree of regularity (collapsing behavior) of the warp. Events are colored in red/blue according to their polarity. Reprinted/adapted with permission from Ref. [13], 2019, Gallego et al.

#### *2.2. Event Collapse*

*In which estimation problems does event collapse appear?* At first glance, it may seem that event collapse occurs when the number of DOFs of the warp becomes large enough, i.e., for complex motions. Event collapse has been reported for homographic motions (8 DOFs) [27,31] and for dense optical flow estimation [16], where an artificial neural network (ANN) predicts a flow field with 2*Np* DOFs (*Np* pixels), whereas it does not occur in feature flow (2 DOFs) or rotational motion flow (3 DOFs). However, a more careful analysis reveals that this is not the entire story, because event collapse may occur even in the 1-DOF case, as we show.
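The 1-DOF failure mode is easy to reproduce on synthetic data (illustrative names; a scaling warp stands in for motion along the camera's optical axis): the plain contrast objective increases when motion-free, unstructured events are squeezed toward a point.

```python
import numpy as np

def zoom_warp(xy, t, hz, center=(16.0, 16.0)):
    """1-DOF warp: scale events toward a point, mimicking z-axis motion."""
    s = (1.0 - hz * t)[:, None]
    return np.asarray(center) + s * (xy - np.asarray(center))

def iwe_variance(xy_w, shape=(32, 32)):
    """Contrast (variance) of the image of warped events."""
    img = np.zeros(shape)
    xi = np.clip(np.round(xy_w[:, 0]).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(xy_w[:, 1]).astype(int), 0, shape[0] - 1)
    np.add.at(img, (yi, xi), 1.0)
    return img.var()

# Unstructured events: no motion explains them, so the identity warp is "correct".
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, 2000)
xy = rng.uniform(0.0, 31.0, (2000, 2))

v_identity = iwe_variance(zoom_warp(xy, t, 0.0))
v_collapse = iwe_variance(zoom_warp(xy, t, 1.0))  # piles events near the center
```

Here `v_collapse` exceeds `v_identity` even though the collapsing warp has only a single parameter, showing that low dimensionality alone does not guarantee well-posedness.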

*How did previous works tackle event collapse?* Previous works have tackled the issue in several ways, such as: (i) initializing the parameters sufficiently close to the desired solution (i.e., within the basin of attraction of the local optimum) [12]; (ii) reformulating the problem, changing the parameter space to reduce the number of DOFs and increase the well-posedness of the problem [14,31]; (iii) providing additional data, such as depth [27], thus changing the problem from motion estimation given only events to motion estimation given events and additional sensor data; (iv) whitening the warped events before computing the objective [27]; and (v) redesigning the objective function, possibly adding a strong classical regularizer (e.g., Charbonnier loss) [10,16]. Many of the above mitigation strategies are task-specific, because it may not always be possible to acquire additional data or reparametrize the estimation problem. Our goal is to approach the issue without the need for additional data or changes to the parameter space, and to show how previous objective functions and newly regularized ones handle event collapse.
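For instance, strategy (iv) can be sketched as follows (a simplified version of the idea; names are ours): whitening the warped event locations removes the global scale that a collapsing warp would otherwise exploit to inflate the objective.

```python
import numpy as np

def whiten_points(xy_warped, eps=1e-8):
    """Whiten warped event locations to zero mean and (approximately) identity
    covariance, so merely shrinking the point cloud no longer changes the score."""
    centered = xy_warped - xy_warped.mean(axis=0)
    cov = np.cov(centered.T) + eps * np.eye(2)
    evals, evecs = np.linalg.eigh(cov)
    return centered @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)

# Example: an anisotropic point cloud becomes zero-mean with unit covariance.
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
white = whiten_points(pts)
```

Because the whitened cloud always has the same first- and second-order statistics, any objective computed from it is invariant to the global contraction that drives event collapse.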
