Article

MVT: Multi-Vision Transformer for Event-Based Small Target Detection

1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(9), 1641; https://doi.org/10.3390/rs16091641
Submission received: 11 March 2024 / Revised: 17 April 2024 / Accepted: 24 April 2024 / Published: 4 May 2024
(This article belongs to the Special Issue Remote Sensing of Target Object Detection and Identification II)

Abstract

Object detection in remote sensing plays a crucial role in various ground identification tasks. However, small targets contain limited feature information and are easily buried by complex backgrounds, especially in extreme environments (e.g., low-light or motion-blur scenes). Event cameras, by contrast, offer a unique paradigm for object detection, with high temporal resolution and wide dynamic range. Because they are not limited by light intensity, event cameras perform better than traditional cameras in such challenging conditions. In this work, we introduce the Multi-Vision Transformer (MVT), which comprises three efficiently designed components: the downsampling module, the Channel Spatial Attention (CSA) module, and the Global Spatial Attention (GSA) module. This architecture considers both short-term and long-term dependencies in semantic information, improving performance for small object detection. Additionally, we propose Cross Deformable Attention (CDA), which progressively fuses high-level and low-level features instead of attending to all scales at each layer, thereby reducing the computational complexity of multi-scale feature fusion. Furthermore, given the scarcity of event-camera remote sensing datasets, we provide the Event Object Detection (EOD) dataset, the first remote sensing dataset captured with event cameras that covers various extreme scenarios. Finally, we conducted experiments on the EOD dataset and two typical unmanned aerial vehicle remote sensing datasets (VisDrone2019 and UAVDT). The comprehensive results demonstrate that the proposed MVT achieves promising and competitive performance.
Keywords: event cameras; multi-scale fusion; remote sensing; small target detection
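The CSA module described above couples channel-wise and spatial attention over a feature map. The paper's exact formulation is not reproduced on this page, so the following is only a minimal, CBAM-style NumPy sketch of the general channel-then-spatial gating idea; the function name, pooling choices, and sigmoid gating are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(x):
    """Gate a (C, H, W) feature map by channel, then by spatial location.

    Channel step: global average pooling over (H, W) yields one scalar
    gate per channel. Spatial step: averaging over channels yields one
    scalar gate per pixel. Both gates are squashed to (0, 1) by a sigmoid
    and applied multiplicatively, so the output keeps the input's shape.
    """
    c, h, w = x.shape
    # Channel attention: (C,) gate from spatial average pooling.
    channel_gate = sigmoid(x.mean(axis=(1, 2)))
    x = x * channel_gate[:, None, None]
    # Spatial attention: (H, W) gate from channel-wise averaging.
    spatial_gate = sigmoid(x.mean(axis=0))
    return x * spatial_gate[None, :, :]
```

In this toy form, both gates lie in (0, 1), so attention only re-weights (never amplifies) features; a learned variant would replace the parameter-free pooling with small MLPs or convolutions.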

Share and Cite

MDPI and ACS Style

Jing, S.; Lv, H.; Zhao, Y.; Liu, H.; Sun, M. MVT: Multi-Vision Transformer for Event-Based Small Target Detection. Remote Sens. 2024, 16, 1641. https://doi.org/10.3390/rs16091641

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
