1. Introduction
Recent research breakthroughs in the neuromorphic engineering domain have enabled the development of a new type of sensor, called the event camera, which is bioinspired by the human brain: each pixel operates independently and mimics the behaviour of a separate nerve cell. In contrast to the conventional camera, in which all pixels capture the intensity of the incoming light at the same time, the event camera sensor reports only changes of the incoming light intensity above a threshold, at any timestamp and at any pixel position, by triggering a sequence of asynchronous events (sometimes called spikes); otherwise it remains silent. Because each pixel independently detects and reports only the change in brightness, the event camera represents a paradigm shift in capturing visual data.
The event camera provides a series of important technological advantages, such as a high temporal resolution: asynchronous events can be triggered at a minimum timestamp distance of only 1 μs, i.e., the event sensor can achieve an equivalent frame rate of up to 1 million frames per second (fps). This is possible because the event camera captures all dynamic information while discarding unnecessary static information (e.g., the background), an extremely useful feature for capturing high-speed motion scenes, for which the conventional camera usually fails to provide a good performance. Two types of sensors are currently available on the market: (i) the dynamic vision sensor (DVS) [1], which captures only the event modality; and (ii) the dynamic and active-pixel vision sensor (DAVIS) [2], which comprises a DVS sensor and an active pixel sensor (APS), i.e., it captures a sequence of conventional camera frames together with the corresponding event data. Event camera sensors are now widely used in computer vision, where combined RGB and event-based solutions already provide an improved performance compared with state-of-the-art RGB-only solutions for applications such as deblurring [3], feature detection and tracking [4,5], optic flow estimation [6], 3D estimation [7], super-resolution [8], interpolation [9], visual odometry [10], and many others. For more details regarding event-based applications in computer vision, see the comprehensive literature review presented in [11]. Because of the high temporal resolution, captured asynchronous event sequences reach high bit rates when stored using the raw event representation of 8 bytes (B) per event provided by the event camera. Therefore, for efficient preprocessing of event data on low-power event-processing chips, novel low-complexity and efficient event coding solutions are required that can store the acquired raw event data without any information loss. In this paper, a novel low-complexity lossless compression method is proposed for memory-efficient representation of asynchronous event sequences; the proposed codec employs a novel low-complexity coding scheme and is therefore suitable for hardware implementation in low-cost event signal processing (ESP) chips.
The event data compression domain remains understudied, whereas the sensor's popularity continues to grow thanks to the improved technical specifications of the latest class of event sensors. The problem has been tackled in only a few articles, which propose either to encode the raw asynchronous event sequences generated by the sensor with or without information loss [12,13,14], or to first preprocess the event data into a sequence of synchronous event frames (EFs) that are then encoded by employing a video coding standard [15,16]. The EF sequences are formed using an event-accumulation process that splits the asynchronous event sequence into spatiotemporal neighbourhoods over time intervals, processes the events triggered in a single time interval, and generates a single event for each pixel position in the EF. These performance-oriented coding solutions are too complex for hardware implementation in ESP chips designed with limited memory, and may be integrated only in a system on a chip (SoC) where enough computational power and memory are available.
In our prior work [17,18], we proposed employing an event-accumulation process which first splits each asynchronous event sequence into spatiotemporal neighbourhoods by using different time-window values, and then generates the EF sequence by using a sum-accumulation process, whereby the events triggered in a time window are represented by a single event that is set as the sign of the event polarity sum and stored at the corresponding pixel position. In [17], we proposed a performance-oriented, context-based lossless image codec for encoding the sequence of event camera frames, in which the event spatial information and the event polarity are encoded separately by using the event map image (EMI) and the concatenated polarity vector (CPV); this codec is suitable for hardware implementation in SoC chips. In [18], we proposed a low-complexity lossless coding framework for encoding event camera frames by adapting the run-length encoding scheme and Elias coding [19] for EF coding; this codec is suitable for hardware implementation in ESP chips. The goal of this work is to propose a novel low-complexity lossless compression codec for encoding asynchronous event sequences that is likewise suitable for hardware implementation in ESP chips.
The novel contributions of this work are summarized as follows.
- (1) A novel low-complexity lossless compression method for encoding raw event data represented as asynchronous event sequences, suitable for hardware implementation in ESP chips.
- (2) A novel low-complexity coding scheme for encoding residual errors by dividing the input range into several coding ranges arranged at concentric distances from an initial prediction.
- (3) A novel event sequence representation that removes the event timestamp information by dividing the input sequence into ordered same-timestamp event subsequences that can be encoded in separate bitstreams.
- (4) A lossless event data codec that provides random access (RA) to any time window by using additional header information.
The remainder of this paper is organized as follows.
Section 2 presents an overview of state-of-the-art methods.
Section 3 describes the proposed low-complexity lossless coding framework.
Section 4 presents the experimental evaluation of the proposed codecs.
Section 5 draws the conclusions of this work.
2. State-of-the-Art Methods
To achieve an efficient representation of the large amount of event data, a first approach losslessly (without any information loss) encodes the asynchronous event representation. In [12], a lossless compression method removes the redundancy of the spatial and temporal information by using three strategies: an adaptive macrocube partitioning structure, the address-prior mode, and the time-prior mode. The method was extended in [13] by introducing an event sequence octree-based cube partition and a flexible inter-cube prediction method based on motion estimation and motion compensation. However, the coding performance of these methods (based on the spike coding strategy) remains limited.
In another approach, the asynchronous event representation is compressed by employing traditional lossless data compression methods. In [14], the authors present a comparison study of the coding performance of different traditional lossless data compression strategies when employed to encode raw event data. The study shows that traditional dictionary-based data compression methods provide the best performance. The dictionary-based approach searches for matches between the data to be compressed and a set of strings stored in a dictionary, with the goal of finding the best match between the information maintained in the dictionary and the data to be compressed. One of the best-known algorithms for lossless data compression is the Lempel–Ziv 77 (LZ77) algorithm [20], created by Lempel and Ziv in 1977. LZ77 iterates sequentially through the input string and stores any new match in a search buffer. The zlib library (ZLIB) [21] implements an LZ77 variant called deflate, in which the input data is divided into a sequence of blocks. The Lempel–Ziv–Markov chain algorithm (LZMA) [22] is an advanced dictionary-based codec for lossless data compression developed by Igor Pavlov, first used in the open-source 7-Zip archiver. The Bzip2 algorithm is based on the well-known Burrows–Wheeler transform [23] for block sorting, which applies a reversible transformation to a block of input data.
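As a concrete illustration, the three codec families discussed above are all available in the Python standard library. The sketch below compresses a synthetic raw event stream (8 B per event; the field layout is an assumption for illustration, not the sensor's documented format) with zlib (deflate), Bzip2, and LZMA, and verifies lossless recovery.

```python
import bz2
import lzma
import struct
import zlib

def pack_events(events):
    """Pack (x, y, polarity, timestamp) tuples into 8 B per event
    (assumed layout: uint16 x, uint16 y, uint8 polarity, 24-bit timestamp)."""
    data = bytearray()
    for x, y, p, t in events:
        data += struct.pack("<HHB", x, y, p) + t.to_bytes(3, "little")
    return bytes(data)

# Synthetic, fairly regular event stream (regularity favours dictionary coders).
events = [(i % 640, (i * 7) % 480, i % 2, i // 4) for i in range(10_000)]
raw = pack_events(events)

for name, codec in [("zlib", zlib), ("bz2", bz2), ("lzma", lzma)]:
    compressed = codec.compress(raw)
    assert codec.decompress(compressed) == raw  # lossless round trip
    print(f"{name}: {len(raw)} B -> {len(compressed)} B")
```

On such regular data all three codecs shrink the stream considerably; the achieved ratios on real event data are what the study in [14] compares.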
In a more recent approach [24], the authors propose to treat the asynchronous event sequence as a point cloud and to employ a lossless compression method based on a point cloud compression strategy. One can note that the coding performance of such a method depends on the performance of the geometry-based point cloud compression (G-PCC) algorithm used in the design.
Many upper-level applications prefer to consume the event data as an "intensity-like" image rather than as an asynchronous event sequence, and several event-accumulation processes have been proposed [25,26,27,28,29,30] to form the EF sequence. Hence, in another approach, several methods losslessly encode the generated EF sequence. The study in [14] was extended in [15] by proposing a time-aggregation-based lossless video encoding method that accumulates events over a time interval, creating two event frames that count the numbers of positive- and negative-polarity events; these frames are concatenated and encoded with the high-efficiency video coding (HEVC) standard [31]. Similarly, the coding performance depends on the performance of the video coding standard employed to encode the concatenated frames.
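The accumulation step of [15] can be sketched as follows: events falling inside one time interval are counted into two per-pixel frames, one per polarity. This is a simplified illustration; the frame size, interval, and field names are hypothetical.

```python
def accumulate_count_frames(events, width, height, t_start, t_end):
    """Count positive- and negative-polarity events per pixel over the
    half-open time interval [t_start, t_end)."""
    pos = [[0] * width for _ in range(height)]
    neg = [[0] * width for _ in range(height)]
    for x, y, polarity, t in events:
        if t_start <= t < t_end:
            frame = pos if polarity == 1 else neg
            frame[y][x] += 1
    return pos, neg

# Toy sequence: (x, y, polarity, timestamp)
events = [(0, 0, 1, 10), (0, 0, 1, 12), (1, 0, 0, 11), (0, 1, 1, 99)]
pos, neg = accumulate_count_frames(events, width=2, height=2, t_start=0, t_end=50)
assert pos[0][0] == 2   # two positive events at pixel (0, 0) inside the window
assert neg[0][1] == 1   # one negative event at pixel (1, 0)
assert pos[1][0] == 0   # the event at t = 99 falls outside the window
```

In [15], the two resulting count frames are then concatenated and passed to the HEVC encoder as ordinary video frames.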
To further improve the event data representation, another approach encodes the asynchronous event sequences by relaxing the lossless compression constraint and accepting information loss. In [32], the authors propose a macrocuboid partition of the raw event data and employ a novel spike coding framework, inspired by video coding, to encode spike segments. In [16], the authors propose a lossy coding method based on a quad-tree segmentation map derived from the adjacent intensity images. One can note that the information loss introduced by such methods might affect the performance of the upper-level applications.
3. Proposed Low-Complexity Lossless Coding Framework
Let us consider an event camera with a given pixel resolution. Any change of the incoming light intensity triggers an asynchronous event, which stores (in the sensor's raw representation) the following information in 8 B of memory:
- spatial information, i.e., the pixel position where the event was triggered;
- polarity information, signalling whether the light intensity decreased or increased; and
- timestamp, i.e., the time when the event was triggered.
Hence, an asynchronous event sequence collects the events triggered over a given time period. The goal of this paper is to encode such a sequence by employing a novel, low-complexity lossless compression algorithm.
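To make the storage cost concrete, a short arithmetic sketch of the raw bit rate follows; the event rate used is hypothetical, since it depends on scene activity.

```python
# Raw storage cost of the sensor's 8-byte-per-event representation.
BYTES_PER_EVENT = 8
event_rate = 5_000_000                       # hypothetical: 5 Mev/s, a busy scene
raw_rate_bytes = event_rate * BYTES_PER_EVENT
assert raw_rate_bytes == 40_000_000          # 40 MB/s, i.e., 320 Mbit/s
print(f"raw stream: {raw_rate_bytes / 1e6:.0f} MB/s")
```

Rates of this magnitude motivate the low-complexity lossless codec developed in the remainder of this section.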
Figure 1 depicts the proposed low-complexity lossless coding framework scheme for encoding asynchronous event sequences. A novel sequence representation groups the same-timestamp events in subsequences and reorders them. Each same-timestamp subsequence is encoded in turn by the proposed method, called low-complexity lossless compression of asynchronous event sequences (LLC-ARES). LLC-ARES is built based on a novel coding scheme, called the triple threshold-based range partition (TTP).
3.1. Proposed Sequence Representation
An input asynchronous event sequence is arranged as a set of same-timestamp subsequences, where each same-timestamp subsequence collects all events triggered at the same timestamp. One can note that, at the decoder side, the timestamp information is recovered from the subsequence length information, i.e., the same timestamp is assigned to all events in the subsequence. Each subsequence is ordered in ascending order of the largest spatial dimension; events that share the same value in that dimension are further ordered in ascending order of the remaining dimension.
Figure 2 depicts the proposed sequence representation and highlights the difference between the sensor's event-by-event (EE) order, depicted on the left side, and the same-timestamp (ST) order, depicted on the right side. Note that the EE order writes each event to file in turn, whereas the proposed ST order writes to file the number of events of each same-timestamp subsequence and, if the subsequence is nonempty, the spatial and polarity information of all its events. Section 4 shows that state-of-the-art dictionary-based data compression methods provide an improved performance when the proposed ST order, rather than the EE order, is employed to represent the input data.
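The ST reordering can be sketched as follows; the field names and the choice of the larger spatial dimension as the primary sort key are illustrative.

```python
from itertools import groupby

def to_st_order(events, width, height):
    """Group events by timestamp; within each group, sort by the larger
    spatial dimension first, then by the remaining one. Each group is
    emitted as (count, [(x, y, polarity), ...]) with no per-event timestamp."""
    primary, secondary = (0, 1) if width >= height else (1, 0)  # index into (x, y)
    stream = []
    for t, group in groupby(events, key=lambda e: e[3]):  # events arrive time-ordered
        ordered = sorted(group, key=lambda e: (e[primary], e[secondary]))
        stream.append((len(ordered), [(x, y, p) for x, y, p, _ in ordered]))
    return stream

events = [(5, 2, 1, 0), (1, 3, 0, 0), (4, 0, 1, 1)]  # (x, y, polarity, t)
st = to_st_order(events, width=640, height=480)
assert st[0] == (2, [(1, 3, 0), (5, 2, 1)])  # t = 0 subsequence, sorted by x
assert st[1] == (1, [(4, 0, 1)])             # t = 1 subsequence
```

Because the per-event timestamps are dropped, the decoder reconstructs them purely from the stored subsequence lengths, exactly as described above.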
3.2. Proposed Triple Threshold-Based Range Partition (TTP)
For hardware implementation of the proposed event data codec in low-power event-processing chips, a novel low-complexity coding scheme is proposed. The binary representation range of the residual error is partitioned into smaller intervals selected by using a short-depth decision tree designed around a triple threshold. Hence, the input range is partitioned into several smaller coding ranges arranged at concentric distances from the initial prediction.
Let us consider the case of encoding a value x from a finite range by using a prediction, i.e., by writing the binary representation of the residual error (the difference between x and its prediction) on a fixed number of bits. Because the residual error is unknown at the decoder side, the triple threshold is used to create a decision tree that partitions the input range into five types of coding ranges (see Figure 3a), where either the binary representation of the residual error or the binary representation of x itself is written by using a different number of bits.
The 1st range, R1, is defined by the first threshold and represents the smallest residual-error magnitudes on the fewest bits plus one additional bit. The 2nd range, R2, is defined by the second threshold and represents intermediate residual errors on more bits plus a sign bit distinguishing positive from negative errors. Similarly, the 3rd range, R3, is defined by the third threshold and represents larger residual errors on more bits plus a sign bit. The 4th (R4) and 5th (R5) ranges cover the remaining values, below and above the interval spanned by the thresholds, and are used to represent the value x itself.
Figure 3b depicts the decision tree defined by checking the following four constraints:
- (c1) The first decision bit is set by checking the first threshold condition, which separates small residual errors from large ones.
- (c2) If the residual error is small, the next decision bit is set by checking the second threshold condition; if true, R1 is employed to represent the residual error; otherwise, the tree descends to (c3).
- (c3) The next decision bit is set by checking the third threshold condition; if true, R2 is employed to represent the residual error; otherwise, R3 is used.
- (c4) If the residual error is large, the last decision bit selects between R4 and R5, which are employed to represent the value directly.
Note that a coding range may contain a number of possible values that is not a power of two. To fully utilize the entire set of code words, a truncated binary representation is used, whereby some values are written on one fewer bit.
Algorithm 1 presents the pseudocode of the basic implementation of the TTP encoding algorithm. It is employed to represent a general value x by using the prediction, the support range, and the triple threshold parameter, producing an output bitstream that contains the decision tree bits followed by the binary representation of the additional information required for the corresponding coding range. Algorithm 2 presents the pseudocode of the basic implementation of the corresponding TTP decoding algorithm.
Algorithm 1: Encode a general x by using TTP.
Algorithm 2: Decode a general x by using TTP.
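The exact TTP bit layout is given in Figure 3 and Algorithms 1 and 2, which are not reproduced here. The sketch below therefore illustrates only the general flavour of the scheme (a short decision tree over three thresholds selects a magnitude band for the residual error, whose offset is then written on a fixed number of bits); the thresholds (4, 16, 64) and the 16-bit escape band are illustrative assumptions, not the authors' exact codec.

```python
def encode_residual(e, thresholds=(4, 16, 64)):
    """Encode residual e as: a decision-tree prefix selecting a magnitude
    band, a sign bit, and a fixed-width offset within the band."""
    m = abs(e)
    bands = [(0, thresholds[0]),
             (thresholds[0], thresholds[1]),
             (thresholds[1], thresholds[2])]
    bits = []
    for i, (lo, hi) in enumerate(bands):
        if m < hi:
            bits += [1] * i + [0]            # decision-tree path selecting band i
            bits.append(1 if e < 0 else 0)   # sign bit
            width = (hi - lo - 1).bit_length()
            offset = m - lo
            bits += [(offset >> b) & 1 for b in reversed(range(width))]
            return bits
    # Escape band for large errors: magnitude written on a fixed 16 bits.
    bits += [1, 1, 1]
    bits.append(1 if e < 0 else 0)
    bits += [(m >> b) & 1 for b in reversed(range(16))]
    return bits

def decode_residual(bits, thresholds=(4, 16, 64)):
    """Decode one residual; returns (value, number_of_bits_consumed)."""
    lows = [0, thresholds[0], thresholds[1], 0]
    highs = [thresholds[0], thresholds[1], thresholds[2], None]
    band, pos = 0, 0
    while band < 3 and bits[pos] == 1:       # walk the decision-tree prefix
        band += 1
        pos += 1
    if band < 3:
        pos += 1                             # consume the terminating 0
    sign = -1 if bits[pos] else 1
    pos += 1
    width = (highs[band] - lows[band] - 1).bit_length() if band < 3 else 16
    m = 0
    for _ in range(width):
        m = (m << 1) | bits[pos]
        pos += 1
    return sign * (lows[band] + m), pos

bits = encode_residual(-5)
assert decode_residual(bits)[0] == -5
```

Small residuals near the prediction cost only a few bits, while the rare large residuals pay for the longer prefix, which is the same trade-off the concentric TTP ranges exploit.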
Section 3.2.1 presents the deterministic cases that may occur. Section 3.2.2 analyses the algorithmic variations proposed to encode the data structures of the proposed event representation, which have different properties.
3.2.1. Deterministic Cases
In some special cases, part of the information can be directly determined from the current coding context. For example, if R4 or R5 falls outside the finite support range (see Figure 4a), then the corresponding range does not exist and the decision tree is built without checking condition (c4), i.e., one bit is saved in such cases. More exactly, steps 11–14 in Algorithms 1 and 2 are replaced with either step 12 (encode/decode using R4) or step 14 (encode/decode using R5).
Moreover, when the relevant range bounds are not powers of two, the most significant bit of the represented value is known to be 0, thanks to the corresponding range constraint. Figure 4b shows that, if this bit were set to 1, the represented value would exceed the range bound and the constraint would be violated. Hence, the most significant bit is always set to 0 in such cases and does not need to be encoded.
3.2.2. Algorithm Variations
The basic implementation of the TTP algorithm was modified for encoding different types of data. The sequence of first spatial coordinates is encoded by a TTP version that detects another deterministic case: when the residual error can take only one sign, the sign bit is saved (see Figure 2, ST order). The sequence of second spatial coordinates, which is nondecreasing thanks to the ST order, is encoded by a TTP version designed to encode a general value x found in a one-sided range; Figure 3c,d show this version's range partitioning and decision tree, respectively.
Some data types have a very large or infinite support range. The sequence of the numbers of events of each timestamp is encoded by using a dedicated TTP version. Note that this number is bounded by the sensor's pixel resolution; however, there is a very low probability that a large majority of pixels is triggered at the same timestamp. Therefore, because the number of events per timestamp is usually very small, this version is designed to use a doublet threshold, as experiments show that a triplet threshold does not improve the coding performance. Figure 3e shows the corresponding range partitioning, where the larger values are encoded by R2, and the last code word of R2 signals the use of an additional range, R6, which encodes the value by using a simple coding technique, the Elias gamma coding (EGC) [19]. Figure 3f shows the decision tree, where the most frequent case is encoded by the first bit of the decision tree.
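Elias gamma coding [19], used for the escape range R6, encodes a positive integer n as ⌊log2 n⌋ zero bits followed by the binary representation of n; a minimal sketch:

```python
def elias_gamma_encode(n):
    """Elias gamma code: floor(log2 n) zeros, then n in binary (n >= 1)."""
    assert n >= 1
    binary = bin(n)[2:]                       # MSB-first binary of n, starts with '1'
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits):
    """Decode one gamma-coded integer from a bit string; returns (n, bits_used)."""
    zeros = 0
    while bits[zeros] == "0":                 # count the zero prefix
        zeros += 1
    n = int(bits[zeros:2 * zeros + 1], 2)     # next zeros + 1 bits hold n
    return n, 2 * zeros + 1

assert elias_gamma_encode(1) == "1"
assert elias_gamma_encode(9) == "0001001"
assert elias_gamma_decode("0001001") == (9, 7)
```

The code length grows only logarithmically with n, which is why EGC is a natural choice for the rare, unbounded values routed to R6.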
Finally, another TTP version is designed to encode the length of each package bitstream (see Section 3.3.3). This version defines seven partition intervals by using two triple thresholds: one for encoding small errors by using the ranges R1S, R2S, and R3S, and one for encoding large errors by using the ranges R1L, R2L, and R3L. Similarly, R6 is signalled by the last code word in R3L, and the value is then encoded by employing EGC [19].
3.3. Proposed Method
The proposed method, LLC-ARES, employs the proposed representation to generate the set of same-timestamp subsequences (see Section 3.1). Each subsequence is encoded as a bitstream by using Algorithm 3, which employs the proposed coding scheme, TTP (see Section 3.2). The compressed file collects these bitstreams in sequence order.
Algorithm 3: Encode the subsequence of ordered events.
Algorithm 3 encodes the following data structures:
- (i) Encode the number of events in the subsequence by employing the dedicated TTP version, using the prediction computed by (1);
- (ii) Encode the first event as follows:
- (ii.1) its first spatial coordinate, by employing the corresponding TTP version, using the prediction computed by (2);
- (ii.2) its second spatial coordinate, by employing the corresponding TTP version, using the prediction computed by (2); and
- (ii.3) its polarity, using binarization;
- (iii) Encode the remaining events as follows:
- (iii.1) the first spatial coordinate, by employing TTP over the corresponding support range;
- (iii.2) the second spatial coordinate, by employing TTP, using the prediction computed by (3); and
- (iii.3) the polarity, using binarization;
- (iv) Update the triple thresholds.
The decoding algorithm can be simply deduced by replacing the TTP encoding algorithm in Algorithm 3 with the corresponding decoding algorithm.
Section 3.3.1 describes the prediction of each type of data used in the proposed event representation. Section 3.3.2 provides information about the setting of the triple thresholds used in the proposed method. Section 3.3.3 describes the variation of the LLC-ARES algorithm that provides RA to any time window. Finally, Section 3.3.4 presents a coding example.
3.3.1. Prediction
To employ each of the four TTP algorithm versions, four types of predictions are computed by using equations (1)–(4). In (2), the prediction for the spatial information of the first event in the current same-timestamp subsequence is set as the sensor's centre, whereas the predictions for the following subsequences depend on the first event of the previously nonempty same-timestamp subsequence. In (3), if only a few events are available, the prediction is set as the median of a small prediction window; otherwise, it is set as the median of a larger prediction window. In our work, the window sizes and the remaining parameters are set to fixed values.
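The window-median predictor in (3) can be sketched as follows; the window sizes (3 and 7) and the switching rule are hypothetical, since the paper's parameter values are given in the original equations.

```python
from statistics import median_low

def predict_coordinate(previous, small=3, large=7):
    """Predict the next coordinate as the median of a window of the most
    recently encoded coordinates: a small window when few events are
    available, a larger one otherwise (window sizes are illustrative)."""
    if not previous:
        return 0
    window = previous[-small:] if len(previous) < large else previous[-large:]
    return median_low(window)

coords = [10, 12, 11, 13, 40, 12, 11, 12]
assert predict_coordinate(coords[:2]) == 10  # median_low of [10, 12]
assert predict_coordinate(coords) == 12      # median of the last 7 values
```

A median is preferred over a mean here because it is robust to an occasional outlier coordinate (such as the value 40 above), keeping the residual errors small for TTP.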
3.3.2. Threshold Setting
In this paper, the triple threshold parameters are selected as powers of two and are initialized to fixed values.
3.3.3. Random Access Functionality
LLC-ARES-RA is a version of LLC-ARES that provides RA to any time window of a given size. The input sequence is divided into packages of equal time length, and the proposed LLC-ARES is employed to encode each package as a bitstream set, collected as the package bitstream. A dedicated TTP version is employed to encode the bit length of each package bitstream, using the prediction computed by (4) and two triple thresholds, and to generate the header bitstream, as depicted in Figure 1. Hence, the encoded package lengths are collected in the header bitstream, whereas all package bitstreams are collected in the sequence bitstream. Finally, the compressed file with RA stores the header bitstream followed by the sequence bitstream.
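The RA file layout can be sketched as a header of package lengths followed by the concatenated package payloads; decoding package ℓ then only requires summing the first ℓ lengths. The byte-level framing below (32-bit little-endian length fields) is an assumption for illustration, not the paper's exact bitstream syntax.

```python
import struct

def write_with_ra(packages):
    """File = header (package count + one 32-bit length per package) + payloads."""
    header = struct.pack("<I", len(packages))
    header += b"".join(struct.pack("<I", len(p)) for p in packages)
    return header + b"".join(packages)

def read_package(blob, index):
    """Random access: jump straight to package `index` without decoding others."""
    count = struct.unpack_from("<I", blob, 0)[0]
    lengths = struct.unpack_from(f"<{count}I", blob, 4)
    offset = 4 + 4 * count + sum(lengths[:index])
    return blob[offset:offset + lengths[index]]

packages = [b"pkg-zero", b"pkg-one!!", b"pkg-two"]
blob = write_with_ra(packages)
assert read_package(blob, 1) == b"pkg-one!!"
assert read_package(blob, 2) == b"pkg-two"
```

In LLC-ARES-RA the lengths themselves are further compressed with the dedicated TTP version, whereas the sketch above stores them uncompressed to keep the framing idea visible.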
3.3.4. A Coding Example
Figure 5 presents in detail the workflow of encoding an asynchronous event sequence containing 23 triggered events by using the proposed LLC-ARES method. The input sequence received from the event sensor is initially represented by using the EE order. The proposed sequence representation first groups and then rearranges the asynchronous event sequence by using the ST order. Because the input sequence contains two timestamps, the ST order consists of one same-timestamp subsequence of 10 events and another of 13 events. LLC-ARES encodes each data structure by using the different TTP versions, as described in Algorithm 3.
5. Conclusions
In this paper, we proposed a novel lossless compression method for encoding the event data acquired by the new event sensor and represented as an asynchronous event sequence. The proposed LLC-ARES method is built based on a novel low-complexity coding technique so that it is suitable for hardware implementation into low-power ESP chips. The proposed low-complexity coding scheme, TTP, creates short-depth decision trees to reduce either the binary representation of the residual error computed based on a simple prediction, or the binary representation of the true value. The proposed event representation employs the novel ST order, whereby same-timestamp events are first grouped into same-timestamp subsequences, and then reordered to improve the coding performance. The proposed LLC-ARES-RA method provides RA to any time window by employing a header structure to store the length of the bitstream packages.
The experimental results demonstrate that the proposed LLC-ARES codec provides an improved coding performance and a closer-to-real-time runtime performance compared with state-of-the-art lossless data compression codecs. More exactly, compared with Bzip2 [36], LZMA [22], and ZLIB [35], the proposed method provides:
- (1) an average CR improvement over each codec;
- (2) an average BR improvement over each codec;
- (3) an average bit savings, measured in bits per event (bpev), over each codec;
- (4) an average event density improvement over each codec; and
- (5) an average TR improvement over each codec.
To our knowledge, this paper proposes the first low-complexity lossless compression method for encoding asynchronous event sequences that is suitable for hardware implementation in low-power chips.