Article

A Moving Target Detection Model Inspired by Spatio-Temporal Information Accumulation of Avian Tectal Neurons

1 Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
2 Department of Automation, Tsinghua University, Beijing 100084, China
* Authors to whom correspondence should be addressed.
Mathematics 2023, 11(5), 1169; https://doi.org/10.3390/math11051169
Submission received: 11 January 2023 / Revised: 16 February 2023 / Accepted: 24 February 2023 / Published: 27 February 2023

Abstract

Moving target detection in cluttered backgrounds is a long-standing challenge for artificial visual systems, yet it is an innate ability of many animal species, especially birds. It has been reported that spatio-temporal information accumulation may underlie the high efficiency and sensitivity of avian tectal neurons in detecting moving targets; however, its functional role in moving target detection remains unclear. Here we establish a novel computational model for detecting moving targets. The proposed model consists of three layers: the retina layer, the superficial layers of the optic tectum, and the intermediate-deep layers of the optic tectum; in the last of these, motion information is enhanced by the accumulation process. The validity and reliability of the model were tested on synthetic videos and natural scenes. Compared with the EMD, which lacks the information accumulation process, the proposed model satisfactorily reproduces the characteristics of tectal responses. Furthermore, experimental results showed that the proposed model significantly outperforms existing models (EMD, DSTMD, and STMD plus) on the STNS and RIST datasets. These findings not only contribute to the understanding of the complex processing of visual motion in birds, but also provide a potential solution for detecting moving targets against cluttered environments.

1. Introduction

Moving target detection plays a critical role in many computer vision scenarios, such as video surveillance, visual tracking, and defense. However, artificial visual systems still face great challenges, especially within complex natural environments and in an unsupervised manner. For many animal species, detecting moving targets in their surroundings is one of the most fundamental abilities [1], and this ability is particularly prominent in birds [2]. Thus, the information processing mechanisms of the avian visual system could provide effective solutions for detecting moving targets surrounded by complex and dynamic backgrounds.
Indeed, visual neuroscience research has contributed considerably to the design of artificial visual systems for moving target detection, and many moving target detection models have been proposed based on biologically relevant mechanisms [3,4,5]. Among these studies, there are three representative models: the Barlow–Levick model (BLM) [6], the motion energy model [7], and the elementary motion detector (EMD) [8,9]. The BLM originates from studies of the vertebrate retina [10,11,12], and motion energy models are often used to describe the computations of the vertebrate visual cortex [13,14]. The EMD is widely used in studies of insect visual systems [15,16,17] and was originally proposed by Hassenstein and Reichardt [9] to explain insects' perception of motion information. The motion response of insect visual neurons can be induced by delayed effects of visual cues [18]. Specifically, the delayed and non-delayed luminance signals caused by motion in a particular direction arrive simultaneously at the subsequent processing stage in the brain, where they interact nonlinearly to produce the neuronal response to motion [19,20]. Consequently, a series of generalized models has been proposed based on the EMD [16,17,21,22,23,24,25], and EMD-based models are now widely recognized in the fields of neuroscience and biological visual computing [3,16,26]. However, some moving-target-like features (such as flashed targets with luminance changes, here called meaningless features) cannot be filtered out by existing EMD-based models, which rely only on the delayed effects of luminance information. To solve this problem, other visual mechanisms should be taken into account to distinguish moving targets from meaningless features.
Neurophysiological studies in birds have shown that visual motion information passes sequentially through the retina, the superficial layers of the optic tectum, and the intermediate-deep layers of the optic tectum, and is finally processed in a specific subregion of the nucleus [2,27,28,29,30]. This layered neural circuit structure often serves as the foundation for constructing physiologically plausible computational models. Meanwhile, recent studies showed that the faster and stronger response to a moving target, compared with a flashed one, may result from a spatio-temporal integration process corresponding to the spatially sequential activation of tectal neurons and the accumulation of temporal information [2,31]. This mechanism of spatio-temporal information accumulation is expected to further improve the ability of existing models to discriminate moving targets from flashed meaningless features. Although the neural circuits of birds and insects are not exactly the same, both are flying creatures, and their visual neurons share many response characteristics. Considering that different organisms may rely on similar computational principles [32,33], it is feasible to add spatio-temporal information accumulation to the EMD, even though the EMD was originally used to describe insect visual systems.
This paper proposes a moving target detection model inspired by the above biological findings. The newly proposed model consists of three components: the retina layer, the superficial layers of the optic tectum, and the intermediate-deep layers of the optic tectum. We assume that the EMD extracts motion information at every pixel location in the retina layer and the superficial layers of the optic tectum. In the intermediate-deep layers of the optic tectum, spatio-temporal information accumulation enhances the strength of the response to targets moving along any direction. The performance of this model is evaluated on synthetic datasets and real natural videos. The verification results show that the proposed model not only has better biological plausibility, but also achieves higher performance in detecting moving targets than the EMD, which does not consider spatio-temporal information accumulation. The main contributions of this paper are as follows.
(1) This paper presents a novel computational model (elementary motion detector with spatio-temporal information accumulation, referred to as EMD_TSA) for motion detection, built by simulating the information processing mechanism of avian tectal neurons responding to moving targets. The EMD_TSA is demonstrated to detect moving objects better than the traditional EMD model in both synthetic motion videos and most natural scenes.
(2) This study provides a robust method for filtering out meaningless moving features (such as flashed targets with luminance changes) when detecting moving targets in natural scenes, by introducing a spatio-temporal information accumulation module into the existing EMD framework.
(3) The study demonstrates that the newly proposed model is biologically plausible, making it possible in future work to incorporate further neural mechanisms of the avian visual system into the EMD, which originated from insect visual systems, to further improve its performance in detecting moving targets.
The remainder of the paper is organized as follows. The EMD, EMD_TSA, and testing methods are described in Section 2. In Section 3, experiments and results are given. Section 4 discusses the findings and the limitations of the proposed model. The paper concludes in Section 5.

2. Materials and Methods

2.1. Elementary Motion Detector

The EMD is composed of two spatially separated input channels (photoreceptors), a time delay, and a nonlinear interaction (multiplication). The input channels correspond to two spatial sites, A and B. The time delay establishes a temporal correlation between the signals of the two spatially separated sites, which do not peak at the same time when a target is moving. Finally, the two signals are multiplied to boost tightly correlated activity. That is,
$R = x_A(t)\,x_B(t-\tau) - x_A(t-\tau)\,x_B(t)$ (1)
where $x_A$ and $x_B$ denote the inputs of the two adjacent sites A and B, and $\tau$ stands for the time delay.
Details of the general form of the EMD model are provided below. We make the simplifying assumption that the target moves along the detector axis, with speed $v = \mathrm{d}s(t)/\mathrm{d}t$, where $s(t)$ represents the displacement. The input of the EMD can be expressed as:
$F(x, t) = F(x + s(t))$ (2)
Let the two input channels be located at $x_A$ and $x_B$, respectively, with $x_B - x_A = \Delta\Phi$. The two input channels can then be expressed as:
$F(x_A, t) = F(x_A + s(t))$ (3)
$F(x_B, t) = F(x_B + s(t)) = F(x_A + \Delta\Phi + s(t))$ (4)
The delay is set to $\tau$, and the output $R$ is expressed as:
$R(x, t) = F(x + s(t))\,F(x + \Delta\Phi + s(t-\tau)) - F(x + s(t-\tau))\,F(x + \Delta\Phi + s(t))$ (5)
Expanding Equation (5) as a Taylor series and neglecting the higher-order small terms yields:
$\mathrm{d}R(x, t) = \tau\,\frac{\mathrm{d}s(t)}{\mathrm{d}t}\,s(x, t)\,\mathrm{d}x$ (6)
where
$s(x, t) = \left[\frac{\partial F(x + s(t))}{\partial x}\right]^2 - F(x + s(t))\,\frac{\partial^2 F(x + s(t))}{\partial x^2}$ (7)
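For completeness, the step from Equation (5) to Equation (6) can be sketched as follows (our reconstruction of the standard Hassenstein–Reichardt derivation, not spelled out in this form in the original). Writing $u = x + s(t)$, $\delta = \Delta\Phi$, and $\varepsilon = -\tau\,\mathrm{d}s/\mathrm{d}t$ (so that $s(t-\tau) \approx s(t) + \varepsilon$), a second-order Taylor expansion of each factor cancels the zeroth- and first-order terms, leaving only the mixed term:

$R = F(u)\,F(u+\delta+\varepsilon) - F(u+\varepsilon)\,F(u+\delta) \approx -\,\delta\,\varepsilon\,\left(F_x^2 - F F_{xx}\right) = \tau\,\frac{\mathrm{d}s}{\mathrm{d}t}\left(F_x^2 - F F_{xx}\right)\delta$

which is Equation (6) with $\mathrm{d}x = \delta$.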
Equation (6) can be further simplified to
$\frac{\mathrm{d}R}{\mathrm{d}x} = \tau\,\frac{\mathrm{d}s}{\mathrm{d}t}\left(F_x^2 - F F_{xx}\right)$ (8)
$F_x = \frac{\partial F(x + s(t))}{\partial x}$ (9)
$F_{xx} = \frac{\partial^2 F(x + s(t))}{\partial x^2}$ (10)
where $F_x$ and $F_{xx}$ are the first and second derivatives of $F(x + s(t))$ with respect to $x$.
The following can be seen from Equation (8):
  • When $\mathrm{d}s/\mathrm{d}t = 0$ or $F_x^2 = F F_{xx}$, the response $R$ of the model is zero. Motion of the target is therefore a necessary condition for the model to generate a response.
  • The output of the EMD depends on the velocity $v$ and the spatial pattern $F(x, t)$: positive values indicate movement in the preferred direction, and negative values indicate movement in the null direction.
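To make the correlator concrete, the following is a minimal discrete-time sketch of Equation (1) for an image sequence (Python/NumPy; the frame-based implementation of the delay $\tau$, the array shapes, and the test stimulus are our illustrative assumptions, not part of the original model specification):

```python
import numpy as np

def emd_response(frames, phi=1, tau=1):
    """Correlator of Equation (1) applied at every adjacent pixel pair.

    frames: array of shape (T, N) -- luminance at N sites over T frames.
    phi:    spatial separation (pixels) between sites A and B.
    tau:    time delay (frames).
    Returns an array of shape (T - tau, N - phi).
    """
    x_a = frames[:, :-phi]   # site A (left member of each pair)
    x_b = frames[:, phi:]    # site B (right member)
    # R = x_A(t) * x_B(t - tau) - x_A(t - tau) * x_B(t)
    return x_a[tau:] * x_b[:-tau] - x_a[:-tau] * x_b[tau:]

# With the sign convention of Equation (1), motion from site B toward
# site A (leftward here) is the preferred direction: positive output.
frames = np.zeros((6, 10))
for t in range(6):
    frames[t, 7 - t] = 1.0   # a bright dot drifting leftward
print(emd_response(frames).max() > 0)   # True
```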

2.2. Elementary Motion Detector with Spatio-Temporal Information Accumulation

Based on the neural circuit structure of the avian retina-optic tectum pathway, we devised the EMD_TSA. The EMD_TSA builds on the EMD framework and improves it by introducing a new module. Figure 1 shows the schematic of the EMD_TSA. The proposed model is composed of three sequentially arranged neural layers (Figure 1a): the retina layer, the superficial layers of the optic tectum, and the intermediate-deep layers of the optic tectum. These three neural layers have specific functions and cooperate in moving target detection. Figure 1b shows the detailed calculation process of spatio-temporal information accumulation, which is the main mathematical contribution of our work.

2.2.1. Retina Layer

In the avian visual system, the retina contains numerous photoreceptors [27]. Each photoreceptor views a small region of the whole visual field and supplies the luminance information of one "pixel". The modeling method follows the literature on the retina layer of flying insects [16,17,26]. Each frame of the image sequence is topologically mapped to the photoreceptors, which are arranged in a matrix: the matrix receives the entire image frame as input, while each photoreceptor perceives only one pixel of the frame. This modeling procedure is biologically reasonable because the spatial resolution of the bird retina is high enough to perceive the pixel-level values of each image [34,35], which is the highest spatial resolution at which an image can be represented.
Specifically, let $I(x, y, t)$ represent the varying luminance values captured by the photoreceptors, where $x$, $y$ are the spatial positions and $t$ is time. The output $P(x, y, t)$ of the retina layer is defined as the convolution of the input $I(x, y, t)$ with a Gaussian function $G_{\sigma_1}(x, y)$:
$P(x, y, t) = \iint I(u, v, t)\,G_{\sigma_1}(x - u, y - v)\,\mathrm{d}u\,\mathrm{d}v$ (11)
$G_{\sigma_1}(x, y) = \frac{1}{2\pi\sigma_1^2}\exp\left(-\frac{x^2 + y^2}{2\sigma_1^2}\right)$ (12)
where $\sigma_1$ is the standard deviation of the Gaussian function.
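A discrete sketch of Equations (11) and (12) follows (Python; using scipy.ndimage.gaussian_filter for the two-dimensional Gaussian convolution is our implementation choice, with $\sigma_1 = 3$ taken from Table 1):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retina_layer(frame, sigma1=3.0):
    """Equations (11)-(12): smooth one frame with a Gaussian of std sigma1."""
    return gaussian_filter(frame.astype(float), sigma=sigma1)

# Example: smooth one 250 x 500 grayscale frame, as in the synthetic videos.
P = retina_layer(np.random.rand(250, 500))
```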

2.2.2. Superficial Layers of the Optic Tectum

The superficial optic tectum neurons have stable, asymmetric ON and OFF response properties [36]; therefore, each neuron here is modeled as a band-pass filter that captures the temporal change in luminance at each pixel [16,17]. Considering the advantages of the gamma kernel in processing temporal information, such as trivial stability conditions, easy adaptation, a controllable impulse-response support region, and decoupling of order [37,38], we define the impulse response of the temporal band-pass filter $H(t)$ as the difference of two gamma kernels:
$H(t) = \Gamma_{n_1, \tau_1}(t) - \Gamma_{n_2, \tau_2}(t)$ (13)
$\Gamma_{n, \tau}(t) = \frac{(n t)^n \exp(-n t / \tau)}{(n - 1)!\,\tau^{n+1}}$ (14)
where $\Gamma_{n, \tau}(t)$ represents an $n$-th order gamma kernel with time constant $\tau$. The output is then defined as:
$L(x, y, t) = \int P(x, y, s)\,H(t - s)\,\mathrm{d}s$ (15)
The ON and OFF responses are then separated to represent the increase and decrease in luminance:
$S_{\mathrm{ON}}(x, y, t) = [L(x, y, t)]^{+}$ (16)
$S_{\mathrm{OFF}}(x, y, t) = [-L(x, y, t)]^{+}$ (17)
where $S_{\mathrm{ON}}$ and $S_{\mathrm{OFF}}$ represent the rectified positive and negative parts of $L(x, y, t)$, called the ON response and OFF response, reflecting the increase and decrease in luminance, respectively.
The ON and OFF responses are convolved with gamma kernels for delay processing:
$S_{\mathrm{D\text{-}ON}}(x, y, t) = \int S_{\mathrm{ON}}(x, y, s)\,\Gamma_{n_3, \tau_3}(t - s)\,\mathrm{d}s$ (18)
$S_{\mathrm{D\text{-}OFF}}(x, y, t) = \int S_{\mathrm{OFF}}(x, y, s)\,\Gamma_{n_4, \tau_4}(t - s)\,\mathrm{d}s$ (19)
The outputs of the neurons in the superficial layers of the optic tectum are defined as:
$D(x, y, t) = S_{\mathrm{ON}}(x, y, t)\,S_{\mathrm{D\text{-}OFF}}(x, y, t) - S_{\mathrm{OFF}}(x, y, t)\,S_{\mathrm{D\text{-}ON}}(x, y, t)$ (20)
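The superficial-layer computations (Equations (13)-(20)) can be sketched per pixel as causal temporal filtering along the frame axis. In the sketch below (Python/SciPy), the finite kernel length of 12 frames and the use of scipy.signal.lfilter for the causal convolutions are our implementation assumptions; the kernel orders and time constants are taken from Table 1:

```python
import numpy as np
from math import factorial
from scipy.signal import lfilter

def gamma_kernel(n, tau, length=12):
    """Equation (14): n-th order gamma kernel with time constant tau,
    sampled at t = 0, 1, ..., length - 1 (in frames)."""
    t = np.arange(length, dtype=float)
    return (n * t) ** n * np.exp(-n * t / tau) / (factorial(n - 1) * tau ** (n + 1))

def superficial_layer(P, n1=2, tau1=3, n2=6, tau2=9, n3=3, tau3=9, n4=4, tau4=8):
    """P: array of shape (T, H, W), retina-layer outputs over T frames.
    Returns D(x, y, t) of Equation (20) for every frame."""
    causal = lambda x, k: lfilter(k, [1.0], x, axis=0)  # y[t] = sum_i k[i] x[t-i]
    L = causal(P, gamma_kernel(n1, tau1) - gamma_kernel(n2, tau2))  # Eqs. (13), (15)
    s_on, s_off = np.maximum(L, 0.0), np.maximum(-L, 0.0)           # Eqs. (16)-(17)
    s_d_on = causal(s_on, gamma_kernel(n3, tau3))                   # Eq. (18)
    s_d_off = causal(s_off, gamma_kernel(n4, tau4))                 # Eq. (19)
    return s_on * s_d_off - s_off * s_d_on                          # Eq. (20)
```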

2.2.3. Intermediate and Deep Layers of the Optic Tectum

So far, the above processing can detect the motion information of a target, but it also responds to flashed targets. We therefore further simulate the accumulation mechanism of optic tectum neurons [2]. The accumulative calculation is performed separately along four main directions. Note that using more directions may enhance the performance of the model but would increase the amount and complexity of the computation. Taking rightward accumulation as an example, the schematic diagram is shown in Figure 1b.
For the pixel point $(x_m, y_q, t_k)$ of the input image, when the target moves to the right, the responses of the intermediate-deep layers of the optic tectum are given by:
$E(x_m, y_q, t_k, 1) = \begin{cases} D(x_m, y_q, t_k) + \sum_{i=1}^{q-1} W_i\,D(x_m, y_{q-i}, t_{k-i}), & q < k,\\ D(x_m, y_q, t_k) + \sum_{i=1}^{k-1} W_i\,D(x_m, y_{q-i}, t_{k-i}), & q \ge k. \end{cases}$ (21)
In the same way, for leftward motion:
$E(x_m, y_q, t_k, 2) = \begin{cases} D(x_m, y_q, t_k) + \sum_{i=1}^{Q-q} W_i\,D(x_m, y_{q+i}, t_{k-i}), & (Q - q) < k,\\ D(x_m, y_q, t_k) + \sum_{i=1}^{k-1} W_i\,D(x_m, y_{q+i}, t_{k-i}), & (Q - q) \ge k. \end{cases}$ (22)
For upward motion:
$E(x_m, y_q, t_k, 3) = \begin{cases} D(x_m, y_q, t_k) + \sum_{i=1}^{m-1} W_i\,D(x_{m-i}, y_q, t_{k-i}), & m < k,\\ D(x_m, y_q, t_k) + \sum_{i=1}^{k-1} W_i\,D(x_{m-i}, y_q, t_{k-i}), & m \ge k. \end{cases}$ (23)
For downward motion:
$E(x_m, y_q, t_k, 4) = \begin{cases} D(x_m, y_q, t_k) + \sum_{i=1}^{M-m} W_i\,D(x_{m+i}, y_q, t_{k-i}), & (M - m) < k,\\ D(x_m, y_q, t_k) + \sum_{i=1}^{k-1} W_i\,D(x_{m+i}, y_q, t_{k-i}), & (M - m) \ge k. \end{cases}$ (24)
where $M$ and $Q$ denote the numbers of pixel rows and columns of the frame, respectively.
All pixels in the first frame ($k = 1$) are defined as $E(x_m, y_q, t_k, 0) = D(x_m, y_q, t_k)$. Consequently, the neural response $E(x, y, t, dir)$ can be obtained at each pixel, where $x$ and $y$ are the spatial coordinates, $t$ is time, and $dir$ is the motion direction. The accumulative weight $W_i$ is an exponential function with the mathematical constant $e$ as its base and values in the range (0, 1). For each pixel $(x_m, y_q, t_k)$, the neural response thus contains four values, one per motion direction, and the largest is selected as the neural response at this position; the corresponding motion direction is recorded at the same time. If the response values in all four directions are lower than a preset response value, it is considered that there is no moving target at this position. The calculation runs as follows:
$F(x_m, y_q, t_k, d) = \begin{cases} E(x_m, y_q, t_k, 0), & \max\limits_{dir} E(x_m, y_q, t_k, dir) < E_e,\\ \max\limits_{dir} E(x_m, y_q, t_k, dir), & \max\limits_{dir} E(x_m, y_q, t_k, dir) \ge E_e, \end{cases}$ (25)
where $dir$ denotes the four motion directions (1: right, 2: left, 3: up, 4: down), $d$ stands for the motion direction at this position ($d = 0$ indicates no motion at this position), and $E_e$ is the preset response threshold.
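A vectorized sketch of Equations (21)-(25) for one frame is given below (Python/NumPy). The exact form of the exponential weight is not specified beyond its base $e$ and range (0, 1), so $W_i = \exp(-i/\lambda)$ with a decay constant $\lambda$ is our assumption, as are the zero-padded shifts, which implement the positional bounds of Equations (21)-(24):

```python
import numpy as np

def shift2d(a, di, dj):
    """Copy of a with result[m, q] = a[m - di, q - dj], zero-padded at borders."""
    out = np.roll(a, shift=(di, dj), axis=(0, 1))
    if di > 0: out[:di, :] = 0
    elif di < 0: out[di:, :] = 0
    if dj > 0: out[:, :dj] = 0
    elif dj < 0: out[:, dj:] = 0
    return out

def accumulate_frame(D, length=20, lam=10.0, E_e=0.0):
    """Equations (21)-(25) for the latest frame of the stack D, shape (T, M, Q).
    Assumed weight: W_i = exp(-i / lam). Returns the response map F and the
    direction map d (1: right, 2: left, 3: up, 4: down; 0: no motion)."""
    t = D.shape[0] - 1                                   # current frame index t_k
    W = np.exp(-np.arange(1, length + 1) / lam)
    offsets = {1: (0, 1), 2: (0, -1), 3: (1, 0), 4: (-1, 0)}
    E = np.stack([
        D[t] + sum(W[i - 1] * shift2d(D[t - i], i * di, i * dj)
                   for i in range(1, min(length, t) + 1))
        for di, dj in offsets.values()
    ])
    best = E.max(axis=0)
    moving = best >= E_e                                 # threshold of Eq. (25)
    F = np.where(moving, best, D[t])
    d = np.where(moving, E.argmax(axis=0) + 1, 0)
    return F, d
```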
After this calculation, each frame yields a two-dimensional response matrix $F_{t_k}(x_n, y_m)$. It has been shown that the optic tectum and the isthmic nuclei together constitute the midbrain saliency network, which computes the highest-priority stimulus and outputs it through the tectum [39,40,41]. Here, we consider moving objects the highest-priority stimuli in the field of view, so only one object location is output per frame. Therefore, a detection threshold is set: if the maximum value of $F_{t_k}(x_n, y_m)$ is greater than the threshold, the corresponding position is taken as the target position detected in the current frame, and the region where the response is not lower than 80% of the maximum response is taken as the detected target area; otherwise, the frame is considered to contain no moving target.
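The per-frame target localization described above can be sketched as follows (Python; the detection threshold is left as a free parameter, since its value is not given in the text):

```python
import numpy as np

def locate_target(F, threshold):
    """Select the single highest-priority target from a response map F:
    the peak position if it exceeds the detection threshold, together with
    the region whose response is at least 80% of the peak (Section 2.2.3)."""
    peak = F.max()
    if peak <= threshold:
        return None, None                      # no moving target in this frame
    position = np.unravel_index(F.argmax(), F.shape)
    region = F >= 0.8 * peak                   # boolean mask of the target area
    return position, region
```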

2.3. Testing Environment and Visual Dataset

The proposed model was tested in MATLAB R2019a (The MathWorks Inc., Natick, MA, USA). All the computation and analysis were executed on a computer with an Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz and 16 GB of memory. We evaluated the performance of the proposed model on two synthetic datasets (generated with MATLAB) and two real datasets (the STNS and RIST datasets, downloaded online). The STNS dataset is available at https://figshare.com/articles/STNSDataset/4496768, accessed on 16 June 2022. The RIST dataset is available at https://sites.google.com/view/hongxinwang-personalsite/download, accessed on 22 August 2022. The parameter settings are shown in Table 1.
Each of the synthetic datasets contains 300 frame images, and the size of each image is 250 × 500 pixels. The synthetic videos and real datasets are described below.
(1)
Synthetic videos with one moving target only.
Four groups of videos were generated to simulate the straight and uniform motion of a single object (with a size of 5 × 5 pixels) moving along four directions. As shown in Figure 2a, a black rectangular target moves straight to the right at a constant speed. The image sequences shown in Figure 2b–d simulate the straight, uniform motion to the left, up, and down, respectively.
(2)
Synthetic videos with a moving target as well as flashed false targets.
Taking the target moving to the right as an example, two groups of videos were generated, both containing a single moving target and false targets. The false targets are flashed in random locations at a specific time in one group (Figure 3a). In the other group, they are flashed at random times in specific locations (Figure 3b).
(3)
STNS dataset and RIST dataset
The STNS dataset contains 25 real videos and the RIST dataset contains 19 real videos, both featuring various moving targets and environments. The scenarios include multiple challenges, such as the rippling of leaves and water, camera motion, and rotational motion of targets [26,42,43].
To evaluate the performance of the proposed visual system model, the detection rate (DR) and false alarm rate (FA) are defined as follows:
$DR = \frac{TN}{GN}$ (26)
$FA = \frac{FN}{AN}$ (27)
where TN is the number of moving-target positions correctly detected by the model, GN is the total number of moving targets in the real video, FN is the number of erroneously detected moving-target positions, and AN is the total number of image frames in the video. If the error between the detection result and the labeled value is less than 5 pixels, the model is considered to have correctly detected the moving target position [16,17].
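A sketch of this evaluation protocol (Python; the per-frame list representation of detections and labels is our assumption):

```python
import numpy as np

def evaluate(detections, labels):
    """Detection rate and false alarm rate, Equations (26)-(27).

    detections: per-frame detected position ((row, col) or None).
    labels:     per-frame ground-truth position ((row, col) or None).
    A detection within 5 pixels of the label counts as correct [16,17]."""
    tn = fn = gn = 0
    an = len(detections)
    for det, gt in zip(detections, labels):
        if gt is not None:
            gn += 1
        if det is None:
            continue
        if gt is not None and np.hypot(det[0] - gt[0], det[1] - gt[1]) < 5:
            tn += 1                      # correctly detected target position
        else:
            fn += 1                      # false alarm
    return tn / max(gn, 1), fn / max(an, 1)   # DR, FA
```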

3. Testing Results and Analysis

This section presents the results of the tests with the three groups of videos described in Section 2.3, and further analyzes the accuracy and reliability of the model in comparison with the traditional EMD model, which lacks the spatio-temporal accumulation module.

3.1. Test Results with Synthetic Video That Simulates One Single Moving Target

Each pixel along the motion path exhibits regular changes in luminance when a black target moves against a bright background. For better visualization and comparison of neural processing, we take the pixel at $x_0 = 127$, $y_0 = 150$ and plot the input luminance signal $L(x_0, y_0, t)$ against $t$ (time in frames) in Figure 4a. The ON response $S_{ON}$ and OFF response $S_{OFF}$ are presented in Figure 4b,c, respectively. To validate the role of spatio-temporal information accumulation in moving target detection, we compare the outputs of the two models with and without the accumulative calculation (EMD_TSA and EMD), which are shown in Figure 4d as a green line (EMD) and a purple line (EMD_TSA). From Figure 4d, we can see that both the EMD and the EMD_TSA output their strongest response at $t_0 = 150$, when the target passes this position; at other times, both models exhibit much weaker or no response. Compared with the EMD, the EMD_TSA shows a larger response at $t_0 = 150$. These results show that both EMD and EMD_TSA can detect moving objects, but spatio-temporal information accumulation enhances the response to a moving target.

3.2. Test Results with Synthetic Video That Simulates a Moving Target Disturbed by Flashed False Targets

Next, the model is challenged with typical motion patterns containing false targets that flash randomly in time or location. We first show the model response to a moving target with false targets flashed randomly in space (Figure 5). To illustrate the signal processing of the model intuitively, we present the test results in a similar way to Section 3.1, observing the output of the model with respect to $y$ by fixing $x_0 = 127$ and $t_0 = 150$. The input luminance signal $I(x_0, y, t_0)$ is shown in Figure 5a. The red asterisk marks the position of the moving target, and the gray asterisks mark the positions of the flashed false targets; the luminance of the moving target and the false targets is set to the same value. The ON response $S_{ON}$ at time $t_0 = 154$ and the OFF response $S_{OFF}$ at time $t_0 = 150$ are presented in Figure 5b,c. The ON response corresponds to a luminance increase while the OFF response corresponds to a luminance decrease, so the ON response lags the OFF response. The outputs of the EMD (green line) and EMD_TSA (yellow line) are illustrated in Figure 5d. The EMD outputs similar responses at $y_0 = 20$, 100, 150, 200, 300, and 450, the locations of the moving target and the false targets. The EMD_TSA shows a much higher response at the position of the moving target ($y_0 = 150$) than at the positions of the flashed false targets ($y_0 = 20$, 100, 200, 300, and 450). These results indicate that EMD_TSA responds more strongly to a moving target than to targets flashed in space.
We then compare the model response to a moving target with false targets flashed in time (Figure 6). Fixing $x_0 = 127$ and $y_0 = 150$, Figure 6a–c shows the input luminance signal $I(x_0, y_0, t)$, the ON response $S_{ON}$, and the OFF response $S_{OFF}$, respectively; as before, the red asterisk marks the moving target and the gray asterisks mark the flashed false targets. Figure 6d illustrates the outputs of the EMD (green line) and EMD_TSA (purple line). Both EMD and EMD_TSA respond to the moving target and the flashed false targets, and there is no significant difference between the EMD outputs for the two types of targets. In contrast, the output of the EMD_TSA at the moment the moving target appears is stronger than its outputs to the false targets. These results indicate that EMD_TSA also responds more strongly to a moving target than to targets flashed in the time domain.
Taken together, these results are consistent with neurophysiological findings that neurons in the avian optic tectum respond more strongly to moving stimuli than to flashed ones [2,44]. They show that adding the spatio-temporal information accumulation module enhances the response of the model to a moving target relative to a flashed one, which helps distinguish flashed false targets from moving targets and effectively suppresses the disturbance caused by flashed targets.
In the newly added module of the proposed model, one key parameter is the length of spatio-temporal information accumulation. This parameter affects the output of the model to a moving target and may further affect the difference between the responses to moving and false targets. Thus, we analyzed the output of the model under different accumulation lengths (2, 10, 30), denoted EMD_TSA-2, EMD_TSA-10, and EMD_TSA-30 in Figure 7a. EMD_TSA-2 shows similar responses at the positions of the moving target and the false targets, whereas EMD_TSA-10 and EMD_TSA-30 show a stronger response at the position of the moving target. The response ratio (RR) is then defined as $RR = F_{peak}(\mathrm{moving}) / F_{peak}(\mathrm{false})$ to quantify the difference between the outputs at the positions of the moving target and the false targets. The variation of the RR with accumulation length is shown in Figure 7b: as the length of spatio-temporal information accumulation increases, the RR increases gradually. When the RR is greater than 5, the two responses are considered clearly distinguishable. Therefore, balancing distinguishability against computational cost, the length of spatio-temporal information accumulation was set to 20 in the subsequent experiments.

3.3. Test Results with STNS and RIST Dataset

To validate the performance of the proposed model in detecting moving targets against cluttered backgrounds, we first compared the responses of EMD and EMD_TSA on the natural scenes of the STNS dataset. The comparison results are summarized in Table 2 (detection rate) and Table 3 (false alarm rate), and scatterplots of the detection rate and false alarm rate are shown in Figure 8. From Table 2 and Table 3, the detection rate of EMD_TSA is greater than or equal to that of EMD on most of the videos (20/25), and the false alarm rate of EMD_TSA is lower than that of EMD on most of the videos (20/25). Furthermore, Figure 8a,b indicates that the EMD_TSA has significantly higher detection accuracy than EMD (t-test, p < 0.01). These results confirm the role of spatio-temporal information accumulation in moving object detection against cluttered backgrounds. The factors behind the poorer performance on some videos are analyzed further in the Discussion.
Finally, we validated the performance of EMD_TSA by comparing it with the DSTMD [16] (directionally selective small target motion detector; available at https://github.com/wanghongxin/DSTMD, accessed on 7 October 2022) and the STMD plus model [26] (available at https://github.com/wanghongxin/STMD-Plus, accessed on 8 October 2022) on the STNS and RIST datasets. The comparison results are shown in Figure 9. As shown in Figure 9a, most dots (t-test, p < 0.05) lie below the diagonal line, indicating that the detection rates of the EMD_TSA were slightly higher than those of the DSTMD and STMD plus models. In Figure 9b, most dots (t-test, p < 0.05) lie above the diagonal, indicating that the false alarm rates of EMD_TSA are lower than those of the other two models on both datasets. In summary, these results show that EMD_TSA achieves better performance on real videos, meaning it works more stably across different cluttered backgrounds and target types.

4. Discussion

In this study, we proposed a moving target detection model (EMD_TSA) inspired by the spatio-temporal information accumulation of avian tectal neurons. The spatio-temporal information accumulation is modeled by correlating signals from adjacent positions along the motion direction. Compared with the traditional EMD model, the EMD_TSA shows an enhanced response to moving targets and is better at discriminating moving targets from flashed false targets. Systematic experiments showed that the output of EMD_TSA is not only consistent with the neural response properties of avian tectal neurons, but also demonstrates the advantages of the model in detecting a moving target and filtering out meaningless features, even in natural scenes.
EMD-based models have been shown to accurately reproduce the characteristics of the relevant neural responses in insect visual systems, including contrast sensitivity, height tuning, and velocity tuning [1,16,18]. The EMD_TSA proposed in our study builds on the EMD framework and adds an accumulative computation process inspired by neuronal mechanisms of the avian optic tectum. The testing results of EMD_TSA are also consistent with neurophysiological findings in the avian optic tectum, indicating that the proposed model is applicable to describing the avian visual system. In particular, the EMD_TSA showed a larger response to moving targets and a weaker response to flashed false targets than the traditional EMD-based model without spatio-temporal information accumulation. This agrees with previous neurophysiological experiments in which tectal neurons responded more strongly to moving stimuli than to flashed ones [2,44]. The accumulation process in a biological system reflects how neural activity evolves over time, and it also occurs in value-based decision-making [45]. The spatio-temporal information accumulation of avian tectal neurons may depend on the sequential activation of new dendritic endings in space, as a moving stimulus causes an accumulation of activation over time [2,44]. Indeed, over the last few years, accumulative computation has already been applied to motion detection [46,47,48,49]. For example, accumulative computation was proposed based on a synaptic structure with a local process of accumulation of persistent activity [50] and then used for motion detection [46,51,52,53,54,55] based on the allocation of charge levels, which are related to the history of motion presence detected at each pixel. Our accumulative computation method is new in the sense that it enhances the strength of the response to moving targets along the motion direction after the EMD extracts motion information at every pixel location. Additionally, we found that the length of spatio-temporal information accumulation affects the output of the model. The accumulative calculation means that the current model output depends on the historical outputs; hence, as the length of spatio-temporal information accumulation increases, the EMD_TSA responds far more strongly to a moving target. In natural scenes, however, the motion of targets is not constant, and large values of this parameter tended to reduce model performance, so it is necessary to choose an appropriate parameter value. In this paper, we use an exponential function as the weight of the accumulative calculation, describing the influence of past image information on the current state of the model. The weight function affects the results of the model; compared with other weight functions, the exponential function assigns higher weights to the image information nearest the current frame. Collectively, these results suggest that accumulative computation is a general strategy of biological computation for moving target detection.
Detecting target motion is relatively easy, but distinguishing moving targets from false targets is more challenging [17,26,56,57]. Real natural environments always contain false targets, such as shaking leaves, water ripples, and targets under drastic illumination changes. Thus, the performance of EMD_TSA was further verified by applying it to detect motion in natural scenes. The common practice is a comprehensive evaluation on real datasets [16,17,43,57], among which the STNS and RIST datasets are widely used [26,42,43]. The testing results in our study show that the EMD_TSA achieves better detection accuracy and a lower false alarm rate than the EMD in moving target detection. This further verifies the benefit of the spatio-temporal information accumulation module and helps in understanding the underlying mechanism of motion information processing. Comprehensive evaluation on the real datasets and comparisons with the DSTMD and STMD plus models also demonstrate the superior detection performance of EMD_TSA. It has to be admitted that our model still has room for improvement on several videos of the STNS and RIST datasets. There may be two reasons: (1) these videos contain many rotationally moving or looming/receding targets rather than translationally moving ones, or (2) camera shake causes global motion, which may be mistaken for a moving target. In our model, the mechanism of spatio-temporal information accumulation was based on translational motion [2,31], whose characteristics are quite different from those of rotational and looming/receding motion: the direction of translational motion is usually constant, whereas that of rotational and looming/receding motion is not. Global motion means that the entire scene is moving, which may lead to misjudgments by the EMD_TSA. We believe that this model provides an effective basis for future studies on detecting moving targets timely and robustly, because its processes are consistent with the neural mechanisms of avian visual systems. In the future, we can add modules to the model to suppress global motion and apply it to more general situations.
Together, the insights gained in this study benefit the understanding of the motion detection mechanism of tectal neurons and provide effective solutions for moving target detection in computer vision. Deeper knowledge of the biological mechanisms of the visual system will help to build better models. We have to admit that biological signals are commonly complex and changeable, and good signal analysis methods can play a significant role; examples include multiscale principal component analysis (MSPCA) [58], the empirical wavelet transform [58,59], and non-linear features (including mean energy, mean Teager–Kaiser energy, Shannon wavelet entropy, and log energy entropy). The combination of signal decomposition and dimension reduction techniques with neural networks [60] is also an effective tool for biological signal analysis. In the future, we will investigate various visual neurons that extract different cues simultaneously, and further integrate these studies to improve the detection performance of the existing model.

5. Conclusions

This paper proposed a moving target detection model inspired by the neural mechanism of spatio-temporal information accumulation. The mechanism comes from research on tectal neurons, which suggests that the faster and stronger responses to moving targets result from a spatio-temporal integration process. Extensive testing experiments verified the proposed model's effectiveness and demonstrated its response properties to moving targets. In addition, comparison with the traditional EMD-based model shows that the proposed mechanism performs better in detecting moving targets in most natural scenes. Further research should investigate the biological system in more depth, which is expected to improve detection performance in other scenes.

Author Contributions

Conceptualization, S.H. and X.N.; methodology, S.H.; software, Z.W.; validation, G.L., Z.W. and L.S.; formal analysis, S.H.; investigation, S.H. and X.N.; resources, L.S. and X.N; data curation, S.H. and Z.W.; writing—original draft preparation, S.H.; writing—review and editing, X.N. and G.L.; visualization, S.H. and Z.W.; supervision, L.S. and X.N.; project administration, L.S. and X.N.; funding acquisition, L.S. and X.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant numbers 62206253 and 62173309) and by the Henan Provincial Key R&D and Promotion Special Project (Scientific Problem Tackling), grant number 222102310223.

Data Availability Statement

Data are contained within the article.

Acknowledgments

Many thanks to all our lab members for helpful discussions. Thanks for the support of experimental equipment in the Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology. We thank Xingtong Wang from Tsinghua University for the English revision.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fu, Q.; Yue, S. Modelling Drosophila motion vision pathways for decoding the direction of translating objects against cluttered moving backgrounds. Biol. Cybern. 2020, 114, 443–460.
  2. Huang, S.; Niu, X.; Wang, J.; Wang, Z.; Xu, H.; Shi, L. Visual Responses to Moving and Flashed Stimuli of Neurons in Domestic Pigeon (Columba livia domestica) Optic Tectum. Animals 2022, 12, 1798.
  3. Fu, Q.; Wang, H.; Hu, C.; Yue, S. Towards Computational Models and Applications of Insect Visual Systems for Motion Perception: A Review. Artif. Life 2019, 25, 263–311.
  4. Carrillo, M.A.I.; Fonseca-Chávez, E. Bio-inspired for Detection of Moving Objects Using Three Sensors. Int. J. Electron. Electr. Eng. 2017, 5, 245–249.
  5. Martínez-Cañada, P.; Morillas, C.; Pino, B.; Ros, E.; Pelayo, F. A Computational Framework for Realistic Retina Modeling. Int. J. Neural Syst. 2016, 26, 1650030.
  6. Barlow, H.B.; Levick, W.R. The mechanism of directionally selective units in rabbit's retina. J. Physiol. 1965, 178, 477–504.
  7. Adelson, E.H.; Bergen, J.R. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 1985, 2, 284–299.
  8. Frye, M. Elementary motion detectors. Curr. Biol. 2015, 25, R215–R217.
  9. Hassenstein, B.; Reichardt, W. Systemtheoretische Analyse der Zeit-, Reihenfolgen- und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus. Z. Naturforsch. B 1956, 11, 513–524.
  10. Barnes, T.; Mingolla, E. Representation of motion onset and offset in an augmented Barlow-Levick model of motion detection. J. Comput. Neurosci. 2012, 33, 421–434.
  11. Uchiyama, H.; Kanaya, T.; Sonohata, S. Computation of motion direction by quail retinal ganglion cells that have a nonconcentric receptive field. Vis. Neurosci. 2000, 17, 263–271.
  12. Manookin, M.B. Neuroscience: Reliable and refined motion computations in the retina. Curr. Biol. 2022, 32, R474–R476.
  13. Emerson, R.C.; Bergen, J.R. Nonlinear Analysis of Motion Energy Calculations in Cat Visual Cortex. In Proceedings of the Fifteenth Annual Northeast Bioengineering Conference, Boston, MA, USA, 27–28 March 1989; pp. 143–144.
  14. Lochmann, T.; Blanche, T.J.; Butts, D.A. Construction of direction selectivity through local energy computations in primary visual cortex. PLoS ONE 2013, 8, e58666.
  15. Nordstrom, K.; Barnett, P.D.; O'Carroll, D.C. Insect detection of small targets moving in visual clutter. PLoS Biol. 2006, 4, e54.
  16. Wang, H.; Peng, J.; Yue, S. A Directionally Selective Small Target Motion Detecting Visual Neural Network in Cluttered Backgrounds. IEEE Trans. Cybern. 2018, 50, 1541–1555.
  17. Wang, H.; Wang, H.; Zhao, J.; Hu, C.; Peng, J.; Yue, S. A Time-Delay Feedback Neural Network for Discriminating Small, Fast-Moving Targets in Complex Dynamic Environments. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 316–330.
  18. Wiederman, S.D.; Shoemaker, P.A.; O'Carroll, D.C. A model for the detection of moving targets in visual clutter inspired by insect physiology. PLoS ONE 2008, 3, e2784.
  19. Joesch, M.; Schnell, B.; Raghu, S.V.; Reiff, D.F.; Borst, A. ON and OFF pathways in Drosophila motion vision. Nature 2010, 468, 300–304.
  20. Meier, M.; Serbe, E.; Maisak, M.S.; Haag, J.; Dickson, B.; Borst, A. Neural Circuit Components of the Drosophila OFF Motion Vision Pathway. Curr. Biol. 2014, 24, 385–392.
  21. Ling, J.; Wang, H.; Xu, M.; Chen, H.; Li, H.; Peng, J. Mathematical study of neural feedback roles in small target motion detection. Front. Neurorobot. 2022, 16, 984430.
  22. James, J.V.; Cazzolato, B.S.; Grainger, S.; Wiederman, S.D. Nonlinear, neuronal adaptation in insect vision models improves target discrimination within repetitively moving backgrounds. Bioinspir. Biomim. 2021, 16, 066015.
  23. Shoemaker, P.A. Neural Network Model for Detection of Edges Defined by Image Dynamics. Front. Comput. Neurosci. 2019, 13, 76.
  24. Chen, J.; Mandel, H.B.; Fitzgerald, J.E.; Clark, D.A. Asymmetric ON-OFF processing of visual motion cancels variability induced by the structure of natural scenes. eLife 2019, 8, e47579.
  25. Evans, B.J.E.; O'Carroll, D.C.; Fabian, J.M.; Wiederman, S.D. Differential Tuning to Visual Motion Allows Robust Encoding of Optic Flow in the Dragonfly. J. Neurosci. 2019, 39, 8051–8063.
  26. Wang, H.; Peng, J.; Zheng, X.; Yue, S. A Robust Visual System for Small Target Motion Detection Against Cluttered Moving Backgrounds. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 839–853.
  27. Donovan, W.J. Structure and function of the pigeon visual system. Physiol. Psychol. 1978, 6, 403–437.
  28. Cowan, W.M.; Adamson, L.; Powell, T.P. An experimental study of the avian visual system. J. Anat. 1961, 95, 545–563.
  29. Wang, S.; Ma, Q.; Qian, L.; Zhao, M.; Wang, Z.; Shi, L. Encoding Model for Continuous Motion-sensitive Neurons in the Intermediate and Deep Layers of the Pigeon Optic Tectum. Neuroscience 2022, 484, 1–15.
  30. Knudsen, E.I. Evolution of neural processing for visual perception in vertebrates. J. Comp. Neurol. 2020, 528, 2888–2901.
  31. Shuman, H.; Niu, X.; Shi, L. An Accumulated Energy Encoding Model of the Pigeon Optic Tectum: Accounting for the Difference of Response to Moving and Flashed Stimulus. Int. J. Psychophysiol. 2021, 168, S185.
  32. Borst, A.; Helmstaedter, M. Common circuit design in fly and mammalian motion vision. Nat. Neurosci. 2015, 18, 1067–1076.
  33. Borst, A.; Haag, J.; Reiff, D.F. Fly Motion Vision. Annu. Rev. Neurosci. 2010, 33, 49–70.
  34. McNeil, R.; McSween, A.; Lachapelle, P. Comparison of the Retinal Structure and Function in Four Bird Species as a Function of the Time They Start Singing in the Morning. Brain Behav. Evol. 2005, 65, 202–214.
  35. Tyrrell, L.P.; Teixeira, L.B.C.; Dubielzig, R.R.; Pita, D.; Baumhardt, P.; Moore, B.A.; Fernández-Juricic, E. A novel cellular structure in the retina of insectivorous birds. Sci. Rep. 2019, 9, 15230.
  36. Wang, S.; Wang, M.; Wang, Z.; Shi, L. First spike latency of ON/OFF neurons in the optic tectum of pigeons. Integr. Zool. 2019, 14, 479–493.
  37. Principe, J.C.; de Vries, B.; de Oliveira, P.G. The gamma-filter-a new class of adaptive IIR filters with restricted feedback. IEEE Trans. Signal Process. 1993, 41, 649–656.
  38. de Vries, B.; Principe, J. A Theory for Neural Networks with Time Delays. In Proceedings of the 1990 Conference on Neural Information Processing Systems (NIPS), Denver, CO, USA, 26–29 November 1990; Volume 3, pp. 162–168.
  39. Mysore, S.; Asadollahi, A.; Knudsen, E. Global Inhibition and Stimulus Competition in the Owl Optic Tectum. J. Neurosci. 2010, 30, 1727–1738.
  40. Marin, G.J.; Duran, E.; Morales, C.; Gonzalez-Cabrera, C.; Sentis, E.; Mpodozis, J.; Letelier, J.C. Attentional Capture? Synchronized Feedback Signals from the Isthmi Boost Retinal Signals to Higher Visual Areas. J. Neurosci. 2012, 32, 1110–1122.
  41. Uchiyama, H.; Ohno, H.; Kawasaki, T.; Owatari, Y.; Narimatsu, T.; Miyanagi, Y.; Maeda, T. Attentional signals projecting centrifugally to the avian retina: A dual contribution to visual search. Vis. Res. 2022, 195, 108016.
  42. Bagheri, Z.M.; Wiederman, S.D.; Cazzolato, B.S.; Grainger, S.; O'Carroll, D.C. Performance of an insect-inspired target tracker in natural conditions. Bioinspir. Biomim. 2017, 12, 025006.
  43. Fu, Q.; Peng, J.; Yue, S. Bioinspired Contrast Vision Computation for Robust Motion Estimation Against Natural Signals. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8.
  44. Luksch, H.; Khanbabaie, R.; Wessel, R. Synaptic dynamics mediate sensitivity to motion independent of stimulus details. Nat. Neurosci. 2004, 7, 380–388.
  45. Keung, W.; Hagen, T.A.; Wilson, R.C. A divisive model of evidence accumulation explains uneven weighting of evidence over time. Nat. Commun. 2020, 11, 2160.
  46. Sanchez, J.L.; Lopez, M.T.; Manuel Pastor, J.; Delgado, A.E.; Fernandez-Caballero, A. Accelerating bioinspired lateral interaction in accumulative computation for real-time moving object detection with graphics processing units. Nat. Comput. 2019, 18, 217–227.
  47. Bermudez, A.; Montero, F.; Lopez, M.T.; Fernandez-Caballero, A.; Sanchez, J.L. Optimization of lateral interaction in accumulative computation on GPU-based platform. J. Supercomput. 2019, 75, 1670–1685.
  48. López, M.T.; Bermúdez, A.; Montero, F.; Sánchez, J.L.; Fernández-Caballero, A. A Finite State Machine Approach to Algorithmic Lateral Inhibition for Real-Time Motion Detection. Sensors 2018, 18, 1420.
  49. Sanchez, J.L.; Viana, R.; Lopez, M.T.; Fernandez-Caballero, A. Acceleration of Moving Object Detection in Bio-Inspired Computer Vision. In Proceedings of the 6th International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC), Corunna, Spain, 19–23 June 2017; pp. 364–373.
  50. Fernández, M.A.; Mira, J.; López, M.T.; Álvarez, J.R.; Manjarrés, A.; Barro, S. Local Accumulation of Persistent Activity at Synaptic Level: Application to Motion Analysis; Springer: Berlin/Heidelberg, Germany, 1995; pp. 137–143.
  51. Delgado, A.E.; López, M.T.; Fernández-Caballero, A. Real-time motion detection by lateral inhibition in accumulative computation. Eng. Appl. Artif. Intell. 2010, 23, 129–139.
  52. Fernández-Caballero, A.; López, M.T.; Castillo, J.C.; Maldonado-Bascón, S. Real-Time Accumulative Computation Motion Detectors. Sensors 2009, 9, 10044–10065.
  53. Yuan, F. A fast accumulative motion orientation model based on integral image for video smoke detection. Pattern Recognit. Lett. 2008, 29, 925–932.
  54. Fernández-Caballero, A.; Pérez-Jiménez, R.; Fernández, M.A.; López, M.T. Comparison of Accumulative Computation with Traditional Optical Flow. In Knowledge-Based Intelligent Information and Engineering Systems: 11th International Conference, KES 2007, XVII Italian Workshop on Neural Networks, Vietri sul Mare, Italy, 12–14 September 2007; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2007; pp. 447–454.
  55. Fernández-Caballero, A.; López, M.T.; Fernández, M.A.; Mira, J.; Delgado, A.E.; López-Valles, J.M. Accumulative Computation Method for Motion Features Extraction in Active Selective Visual Attention. In Attention and Performance in Computational Vision: Second International Workshop, WAPCV 2004, Prague, Czech Republic, 15 May 2004; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; pp. 206–215.
  56. Lei, F.; Peng, Z.; Liu, M.; Peng, J.; Cutsuridis, V.; Yue, S. A Robust Visual System for Looming Cue Detection Against Translating Motion. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–15, to be published.
  57. Wang, H.; Zhao, J.; Wang, H.; Hu, C.; Peng, J.; Yue, S. Attention and Prediction-Guided Motion Detection for Low-Contrast Small Moving Targets. IEEE Trans. Cybern. 2022, 1–13, to be published.
  58. Sadiq, M.T.; Yu, X.; Yuan, Z.; Aziz, M.Z. Motor imagery BCI classification based on novel two-dimensional modelling in empirical wavelet transform. Electron. Lett. 2020, 56, 1367–1369.
  59. Sadiq, M.T.; Yu, X.; Yuan, Z.; Fan, Z.; Rehman, A.U.; Li, G.; Xiao, G. Motor Imagery EEG Signals Classification Based on Mode Amplitude and Frequency Components Using Empirical Wavelet Transform. IEEE Access 2019, 7, 127678–127692.
  60. Sadiq, M.T.; Yu, X.; Yuan, Z. Exploiting dimensionality reduction and neural network techniques for the development of expert brain–computer interfaces. Expert Syst. Appl. 2021, 164, 114031.
Figure 1. Schematic of the elementary motion detector with spatio-temporal information accumulation (EMD_TSA). (a) Schematic diagram showing the overall design of EMD_TSA. OTs: the superficial layers of the optic tectum; OTid: the intermediate-deep layers of the optic tectum; D: the time delay unit. (b) Schematic illustration of spatio-temporal information accumulation along the motion direction. This figure is a visual example of Equation (21).
Figure 2. Schematic illustration of moving target. (a) The direction of motion was right. (b) The direction of motion was left. (c) The direction of motion was up. (d) The direction of motion was down.
Figure 3. Schematic illustration of moving target with flashed false targets. (a) A single moving target with flashed false targets in time. (b) A single moving target with flashed false targets in space.
Figure 4. Outputs of various layers in the proposed model for a single moving target. (a) The input luminance signal $L(x_0, y_0, t)$. (b) ON response $S_{ON}$ of the proposed model. (c) OFF response $S_{OFF}$ of the proposed model. (d) The outputs of the EMD (green line) and EMD_TSA (purple line). The red asterisk marks the position of the moving target.
Figure 5. Outputs of various layers in the proposed model for a moving target with flashed false targets in space. (a) The input luminance signal $I(x_0, y, t_0)$. (b) ON response $S_{ON}$ of the proposed model. (c) OFF response $S_{OFF}$ of the proposed model. (d) The outputs of EMD (green line) and EMD_TSA (yellow line). The red asterisk marks the position of the moving target, and the gray asterisks mark the positions of the flashed false targets. The green dashed lines indicate the peaks of the EMD outputs, while the yellow dashed lines correspond to the peaks of the EMD_TSA outputs.
Figure 6. Outputs of various layers in the proposed model for a moving target with flashed false targets in time. (a) The input luminance signal $I(x_0, y_0, t)$. (b) ON response $S_{ON}$ of the proposed model. (c) OFF response $S_{OFF}$ of the proposed model. (d) The outputs of EMD (green line) and EMD_TSA (purple line). The red asterisk marks the position of the moving target, and the gray asterisks mark the positions of the flashed false targets. The green dashed lines indicate the peaks of the EMD outputs, while the purple dashed lines correspond to the peaks of the EMD_TSA outputs.
Figure 7. The output of the model under different lengths of spatio-temporal information accumulation. (a) The outputs of EMD_TSA-2, EMD_TSA-10, and EMD_TSA-30, for which the length of spatio-temporal information accumulation is 2, 10, and 30, respectively. The red asterisk marks the position of the moving target. Dashed lines correspond to the peaks of the outputs of EMD_TSA-2 and EMD_TSA-10. (b) The RR (response ratio) under different lengths of spatio-temporal information accumulation. Purple dots represent the values of the RR under different lengths of spatio-temporal information accumulation.
Figure 8. Scatterplot for the performance of EMD_TSA versus EMD. Each dot represents results from a video of STNS datasets. The diagonal line displays the locus of equal value. (a) The detection rate for EMD_TSA versus EMD. (b) The false alarm rate for EMD_TSA versus EMD.
Figure 9. Scatterplot for the performance of EMD_TSA versus DSTMD and STMD plus model on STNS and RIST dataset. The diagonal line displays the locus of equal value. (a) The detection rate. (b) The false-alarm rate.
Table 1. The parameters of the proposed model.

Parameter | Value | Parameter | Value
$\sigma_1$ | 3 | $n_3$ | 3
$n_1$ | 2 | $\tau_3$ | 9
$\tau_1$ | 3 | $n_4$ | 4
$n_2$ | 6 | $\tau_4$ | 8
$\tau_2$ | 9 | |
Table 2. The detection rate of the EMD and EMD_TSA on the STNS dataset.

Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
EMD | 0.04 | 0.25 | 0.24 | 0.2 | 0.25 | 0.34 | 0.1 | 0.08 | 0.34 | 0.48 | 0.1 | 0.18 | 0.21
EMD_TSA | 0.075 | 0.63 | 0.3 | 0.28 | 0.3 | 0.48 | 0.06↓ | 0.19 | 0.36 | 0.65 | 0.05↓ | 0.28 | 0.11↓
Data | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
EMD | 0.13 | 0.3 | 0.08 | 0.32 | 0.26 | 0.24 | 0.18 | 0.45 | 0.23 | 0.68 | 0.31 | 0.16
EMD_TSA | 0.21 | 0.34 | 0.11 | 0.33 | 0.08↓ | 0.34 | 0.28 | 0.5 | 0.38 | 0.59↓ | 0.45 | 0.16

Entries marked with ↓ are videos on which the detection rate of EMD_TSA is lower than that of the EMD model; on all other videos it is higher than or equal to that of the EMD.
Table 3. The false alarm rate of the EMD and EMD_TSA on the STNS dataset.

Data | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
EMD | 0.68 | 0.54 | 0.53 | 0.68 | 0.61 | 0.48 | 0.84 | 0.79 | 0.68 | 0.41 | 0.77 | 0.52 | 0.4
EMD_TSA | 0.75↑ | 0.31 | 0.48 | 0.6 | 0.56 | 0.41 | 0.83 | 0.68 | 0.55 | 0.37 | 0.83↑ | 0.49 | 0.56↑
Data | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
EMD | 0.6 | 0.55 | 0.8 | 0.50 | 0.44 | 0.61 | 0.7 | 0.38 | 0.66 | 0.2 | 0.47 | 0.44
EMD_TSA | 0.43 | 0.51 | 0.73 | 0.47 | 0.81↑ | 0.54 | 0.58 | 0.4↑ | 0.5 | 0.11 | 0.36 | 0.37

Entries marked with ↑ are videos on which the false alarm rate of EMD_TSA is higher than that of the EMD model; on all other videos it is lower than that of the EMD.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
