
Video Analysis and Tracking Using State-of-the-Art Sensors

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Physical Sensors".

Deadline for manuscript submissions: closed (30 October 2017) | Viewed by 177726

Special Issue Editor

Prof. Dr. Joonki Paik
Department of Image, Graduate School of Advanced Imaging Science, Chung-Ang University, Seoul 06974, Korea
Interests: image enhancement and restoration; computational imaging; intelligent surveillance systems

Special Issue Information

Dear Colleagues,

Object detection, identification, recognition, and tracking from video is a fundamental problem in the computer vision and image processing fields. This task requires object modelling and motion analysis, and various types of object models have been developed for improved performance. However, a practical object detection and tracking algorithm cannot avoid a number of limitations, including object occlusion, unstable illumination, object deformation, insufficient resolution of the input video, and limited computational resources to meet the required video processing speed, to name a few. Recent developments in state-of-the-art sensors widen the application area of video object tracking by overcoming these practical limitations.

The objective of this Special Issue is to highlight innovative developments in video analysis and tracking technologies related to various state-of-the-art sensors. Topics include, but are not limited to:

  • Detection, identification, recognition, and tracking of objects using various sensors
  • Multiple-camera networks or camera association for very wide-range surveillance
  • Development of non-visual sensors, such as time-of-flight sensors, RGB-D cameras, IR sensors, RADAR, LIDAR, motion sensors, and acoustic wave sensors, and their applications to video analysis and tracking
  • Image and video enhancement algorithms to improve the quality of visual sensors for video tracking
  • Computational photography and imaging for advanced object detection and tracking
  • Depth estimation and three-dimensional reconstruction for augmented reality (AR) and/or advanced driver assistance systems (ADAS)

Prof. Dr. Joonki Paik
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Video tracking
  • motion estimation
  • optical flow
  • RGB-D camera
  • infra-red (IR) sensor
  • RADAR
  • LIDAR
  • computational photography
  • augmented reality (AR)
  • surveillance

Published Papers (28 papers)


Research

Article
High-Speed Video System for Micro-Expression Detection and Recognition
by Diana Borza, Radu Danescu, Razvan Itu and Adrian Darabant
Sensors 2017, 17(12), 2913; https://doi.org/10.3390/s17122913 - 14 Dec 2017
Cited by 22 | Viewed by 6475
Abstract
Micro-expressions play an essential part in understanding non-verbal communication and in deceit detection. They are involuntary, brief facial movements that appear when a person is trying to conceal something. Automatic analysis of micro-expressions is challenging due to their low amplitude and their short duration (they occur as fast as 1/15 to 1/25 of a second). We propose a full micro-expression analysis system consisting of a high-speed image acquisition setup and a software framework which can detect the frames in which micro-expressions occurred as well as determine the type of the emerged expression. The detection and classification methods use fast and simple motion descriptors based on absolute image differences. The recognition module only involves the computation of several 2D Gaussian probabilities. The software framework was tested on two publicly available high-speed micro-expression databases, and the whole system was used to acquire new data. The experiments we performed show that our solution outperforms state-of-the-art works which use more complex and computationally intensive descriptors. Full article
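
As an illustration of the kind of absolute-difference motion descriptor the abstract mentions, the Python sketch below computes a per-region mean absolute frame difference; the grid layout, function name and parameters are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def motion_descriptor(prev_frame, curr_frame, grid=(4, 4)):
    """Mean absolute frame difference per cell of a coarse facial grid.

    A simple stand-in for an absolute-difference motion descriptor;
    the region layout and grid size are illustrative assumptions.
    """
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    h, w = diff.shape
    gh, gw = grid
    cells = []
    for i in range(gh):
        for j in range(gw):
            cell = diff[i * h // gh:(i + 1) * h // gh,
                        j * w // gw:(j + 1) * w // gw]
            cells.append(cell.mean())     # one motion value per region
    return np.asarray(cells)
```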

Article
Iterative Refinement of Transmission Map for Stereo Image Defogging Using a Dual Camera Sensor
by Heegwang Kim, Jinho Park, Hasil Park and Joonki Paik
Sensors 2017, 17(12), 2861; https://doi.org/10.3390/s17122861 - 09 Dec 2017
Cited by 5 | Viewed by 4606
Abstract
Recently, the stereo imaging-based image enhancement approach has attracted increasing attention in the field of video analysis. This paper presents a dual camera-based stereo image defogging algorithm. Optical flow is first estimated from the stereo foggy image pair, and the initial disparity map is generated from the estimated optical flow. Next, an initial transmission map is generated using the initial disparity map. Atmospheric light is then estimated using the color line theory. The defogged result is finally reconstructed using the estimated transmission map and atmospheric light. The proposed method can refine the transmission map iteratively. Experimental results show that the proposed method can successfully remove fog without color distortion. The proposed method can be used as a pre-processing step for an outdoor video analysis system and a high-end smartphone with a dual camera system. Full article
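
A minimal Python sketch of the final recovery step, assuming the standard atmospheric scattering model I = J·t + A·(1 − t) and an illustrative transmission derived from disparity; the function name, constants and the disparity-to-transmission mapping are assumptions, not the paper's iterative refinement.

```python
import numpy as np

def defog(image, disparity, A, beta=1.0, baseline_focal=1.0, t_min=0.1):
    """Recover a fog-free image from a transmission map derived from disparity.

    Illustrative assumptions: depth = baseline*focal / disparity and
    t = exp(-beta * depth); the recovery inverts I = J*t + A*(1 - t).
    """
    depth = baseline_focal / np.maximum(disparity, 1e-6)
    t = np.exp(-beta * depth)
    t = np.clip(t, t_min, 1.0)[..., None]        # avoid division blow-up
    J = (image.astype(np.float32) - A) / t + A   # invert the scattering model
    return np.clip(J, 0, 255).astype(np.uint8)
```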

Article
Vision System for Coarsely Estimating Motion Parameters for Unknown Fast Moving Objects in Space
by Min Chen and Koichi Hashimoto
Sensors 2017, 17(12), 2820; https://doi.org/10.3390/s17122820 - 05 Dec 2017
Cited by 3 | Viewed by 3707
Abstract
Motivated by biological interest in analyzing the navigation behaviors of flying animals, we attempt to build a system that measures their motion states. To do this, in this paper, we build a vision system that detects unknown fast-moving objects within a given space and calculates their motion parameters, represented by positions and poses. We propose a novel method to detect reliable interest points from images of moving objects, which can hardly be detected by general-purpose interest point detectors. 3D points reconstructed from these interest points are then grouped and maintained for the detected objects according to a careful schedule that considers appearance and perspective changes. In the estimation step, a method is introduced to adapt the robust estimation procedure used for dense point sets to the case of sparse sets, reducing the potential risk of greatly biased estimation. Experiments are conducted on real scenes, showing the capability of the system to detect multiple unknown moving objects and estimate their positions and poses. Full article

Article
Scene-Aware Adaptive Updating for Visual Tracking via Correlation Filters
by Fan Li, Sirou Zhang and Xiaoya Qiao
Sensors 2017, 17(11), 2626; https://doi.org/10.3390/s17112626 - 15 Nov 2017
Cited by 13 | Viewed by 5833
Abstract
In recent years, visual object tracking has been widely used in military guidance, human-computer interaction, road traffic, scene monitoring and many other fields. Tracking algorithms based on correlation filters have shown good performance in terms of accuracy and tracking speed. However, their performance is not satisfactory in scenes with scale variation, deformation, and occlusion. In this paper, we propose a scene-aware adaptive updating mechanism for visual tracking via a kernel correlation filter (KCF). First, a low-complexity scale estimation method is presented, in which the corresponding weights of five scales are employed to determine the final target scale. Then, an adaptive updating mechanism is presented based on scene classification. We classify the video scenes into four categories by video content analysis. According to the target scene, we exploit the adaptive updating mechanism to update the kernel correlation filter to improve the robustness of the tracker, especially in scenes with scale variation, deformation, and occlusion. We evaluate our tracker on the CVPR2013 benchmark. The experimental results obtained with the proposed algorithm are improved by 33.3%, 15%, 6%, 21.9% and 19.8% compared to those of the KCF tracker in scenes with scale variation, partial or long-time large-area occlusion, deformation, fast motion and out-of-view, respectively. Full article
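
A rough Python sketch of the five-scale search idea, assuming a `response_fn` that returns a KCF response map for a given patch size; the scale factors and weights are placeholders rather than the paper's tuned values.

```python
import numpy as np

def estimate_scale(response_fn, frame, center, base_size,
                   scales=(0.95, 0.975, 1.0, 1.025, 1.05),
                   weights=(0.96, 0.98, 1.0, 0.98, 0.96)):
    """Pick the target scale whose weighted peak correlation response is largest.

    `response_fn(frame, center, size)` is assumed to return a KCF response map
    for a patch of the given size; the scale set and weights are illustrative.
    """
    best_scale, best_score = 1.0, -np.inf
    for s, w in zip(scales, weights):
        size = (int(base_size[0] * s), int(base_size[1] * s))
        score = w * response_fn(frame, center, size).max()   # weighted peak response
        if score > best_score:
            best_scale, best_score = s, score
    return best_scale
```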

Article
Motion-Blur-Free High-Speed Video Shooting Using a Resonant Mirror
by Michiaki Inoue, Qingyi Gu, Mingjun Jiang, Takeshi Takaki, Idaku Ishii and Kenji Tajima
Sensors 2017, 17(11), 2483; https://doi.org/10.3390/s17112483 - 29 Oct 2017
Cited by 8 | Viewed by 6520
Abstract
This study proposes a novel concept of actuator-driven frame-by-frame intermittent tracking for motion-blur-free video shooting of fast-moving objects. The camera frame and shutter timings are controlled for motion blur reduction in synchronization with a free-vibration-type actuator vibrating with a large amplitude at hundreds of hertz so that motion blur can be significantly reduced in free-viewpoint high-frame-rate video shooting for fast-moving objects by deriving the maximum performance of the actuator. We develop a prototype of a motion-blur-free video shooting system by implementing our frame-by-frame intermittent tracking algorithm on a high-speed video camera system with a resonant mirror vibrating at 750 Hz. It can capture 1024 × 1024 images of fast-moving objects at 750 fps with an exposure time of 0.33 ms without motion blur. Several experimental results for fast-moving objects verify that our proposed method can reduce image degradation from motion blur without decreasing the camera exposure time. Full article

Article
DEEP-SEE: Joint Object Detection, Tracking and Recognition with Application to Visually Impaired Navigational Assistance
by Ruxandra Tapu, Bogdan Mocanu and Titus Zaharia
Sensors 2017, 17(11), 2473; https://doi.org/10.3390/s17112473 - 28 Oct 2017
Cited by 57 | Viewed by 8039
Abstract
In this paper, we introduce the so-called DEEP-SEE framework that jointly exploits computer vision algorithms and deep convolutional neural networks (CNNs) to detect, track and recognize, in real time, objects encountered during navigation in the outdoor environment. A first feature concerns an object detection technique designed to localize both static and dynamic objects without any a priori knowledge about their position, type or shape. The methodological core of the proposed approach relies on a novel object tracking method based on two convolutional neural networks trained offline. The key principle consists of alternating between tracking using motion information and predicting the object location in time based on visual similarity. The validation of the tracking technique is performed on standard VOT benchmark datasets, and shows that the proposed approach returns state-of-the-art results while minimizing the computational complexity. Then, the DEEP-SEE framework is integrated into a novel assistive device, designed to improve the cognition of visually impaired (VI) people and to increase their safety when navigating in crowded urban scenes. The validation of our assistive device is performed on a video dataset with 30 elements acquired with the help of VI users. The proposed system shows high accuracy (>90%) and robustness (>90%) scores regardless of the scene dynamics. Full article

Article
Robust Small Target Co-Detection from Airborne Infrared Image Sequences
by Jingli Gao, Chenglin Wen and Meiqin Liu
Sensors 2017, 17(10), 2242; https://doi.org/10.3390/s17102242 - 29 Sep 2017
Cited by 20 | Viewed by 4557
Abstract
In this paper, a novel infrared target co-detection model combining the self-correlation features of backgrounds and the commonality features of targets in the spatio-temporal domain is proposed to detect small targets in a sequence of infrared images with complex backgrounds. Firstly, a dense target extraction model based on nonlinear weights is proposed, which can suppress the image background and enhance small targets better than singular-value weights. Secondly, a sparse target extraction model based on entry-wise weighted robust principal component analysis is proposed. The entry-wise weight adaptively incorporates a structural prior in terms of local weighted entropy; thus, it can extract real targets accurately and suppress background clutter efficiently. Finally, the commonality of targets in the spatio-temporal domain is used to construct a target refinement model for false-alarm suppression and target confirmation. Since real targets appear in both the dense and sparse reconstruction maps of a single frame, and form trajectories after tracklet association across consecutive frames, the location correlation of the dense and sparse reconstruction maps for a single frame and the tracklet association of the location correlation maps for successive frames have a strong ability to discriminate between small targets and background clutter. Experimental results demonstrate that the proposed small target co-detection method can not only suppress background clutter effectively, but also detect targets accurately even in the presence of target-like interference. Full article

Article
American Sign Language Alphabet Recognition Using a Neuromorphic Sensor and an Artificial Neural Network
by Miguel Rivera-Acosta, Susana Ortega-Cisneros, Jorge Rivera and Federico Sandoval-Ibarra
Sensors 2017, 17(10), 2176; https://doi.org/10.3390/s17102176 - 22 Sep 2017
Cited by 26 | Viewed by 9814
Abstract
This paper reports the design and analysis of an American Sign Language (ASL) alphabet translation system implemented in hardware using a Field-Programmable Gate Array. The system process consists of three stages, the first being communication with the neuromorphic camera (also called a Dynamic Vision Sensor, DVS) using the Universal Serial Bus protocol. The feature extraction of the events generated by the DVS is the second part of the process, consisting of the digital image processing algorithms developed in software, which aim to reduce redundant information and prepare the data for the third stage. The last stage of the system process is the classification of the ASL alphabet, achieved with a single artificial neural network implemented in digital hardware for higher speed. The overall result is the development of a classification system using the contour of ASL signs, fully implemented in a reconfigurable device. The experimental results consist of a comparative analysis of the recognition rate among the alphabet signs using the neuromorphic camera in order to prove the proper operation of the digital image processing algorithms. In the experiments performed with 720 samples of 24 signs, a recognition accuracy of 79.58% was obtained. Full article

Article
Comparative Evaluation of Background Subtraction Algorithms in Remote Scene Videos Captured by MWIR Sensors
by Guangle Yao, Tao Lei, Jiandan Zhong, Ping Jiang and Wenwu Jia
Sensors 2017, 17(9), 1945; https://doi.org/10.3390/s17091945 - 24 Aug 2017
Cited by 24 | Viewed by 6090
Abstract
Background subtraction (BS) is one of the most commonly encountered tasks in video analysis and tracking systems. It distinguishes the foreground (moving objects) from the video sequences captured by static imaging sensors. Background subtraction in remote scene infrared (IR) video is important and common to many fields. This paper provides a Remote Scene IR Dataset captured by our designed medium-wave infrared (MWIR) sensor. Each video sequence in this dataset is identified with specific BS challenges, and the pixel-wise ground truth of the foreground (FG) for each frame is also provided. A series of experiments were conducted to evaluate BS algorithms on this proposed dataset. The overall performance of the BS algorithms and their processor/memory requirements were compared. Proper evaluation metrics or criteria were employed to evaluate the capability of each BS algorithm to handle the different kinds of BS challenges represented in this dataset. The results and conclusions in this paper provide valid references for developing new BS algorithms for remote scene IR video sequences, and some of them are not limited to remote scenes or IR video sequences but are generic for background subtraction. The Remote Scene IR dataset and the foreground masks detected by each evaluated BS algorithm are available online: https://github.com/JerryYaoGl/BSEvaluationRemoteSceneIR. Full article
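
For orientation, a hedged Python sketch of how one BS algorithm could be scored against pixel-wise ground truth; OpenCV's MOG2 stands in for an arbitrary evaluated algorithm, and the single F-measure shown is only one of the metrics such an evaluation would normally use.

```python
import cv2
import numpy as np

def evaluate_mog2(frames, gt_masks):
    """Run a stock background subtractor and score it with an F-measure.

    Illustrative only: MOG2 stands in for "a BS algorithm"; shadows returned
    by MOG2 are counted as foreground here for simplicity.
    """
    bs = cv2.createBackgroundSubtractorMOG2()
    tp = fp = fn = 0
    for frame, gt in zip(frames, gt_masks):
        fg = bs.apply(frame) > 0          # predicted foreground mask
        gt = gt > 0                       # ground-truth foreground mask
        tp += np.sum(fg & gt)
        fp += np.sum(fg & ~gt)
        fn += np.sum(~fg & gt)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```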

Article
Headgear Accessories Classification Using an Overhead Depth Sensor
by Carlos A. Luna, Javier Macias-Guarasa, Cristina Losada-Gutierrez, Marta Marron-Romera, Manuel Mazo, Sara Luengo-Sanchez and Roberto Macho-Pedroso
Sensors 2017, 17(8), 1845; https://doi.org/10.3390/s17081845 - 10 Aug 2017
Cited by 1 | Viewed by 5724
Abstract
In this paper, we address the generation of semantic labels describing the headgear accessories worn by people in a scene under surveillance, using only depth information obtained from a Time-of-Flight (ToF) camera placed in an overhead position. We propose a new method for headgear accessories classification based on the design of a robust processing strategy that includes the estimation of a meaningful feature vector providing the relevant information about the people’s head and shoulder areas. This paper includes a detailed description of the proposed algorithmic approach, and the results obtained in tests with persons with and without headgear accessories, and with different types of hats and caps. In order to evaluate the proposal, a wide experimental validation has been carried out on a fully labeled database (which has been made available to the scientific community), including a broad variety of people and headgear accessories. For the validation, three different levels of detail have been defined, considering different numbers of classes: the first level only includes two classes (hat/cap, and no hat/cap), the second one considers three classes (hat, cap and no hat/cap), and the last one includes the full class set with five classes (no hat/cap, cap, small size hat, medium size hat, and large size hat). The achieved performance is satisfactory in every case: the average classification rate for the first level reaches 95.25%, for the second one it is 92.34%, and for the full class set it equals 84.60%. In addition, the online stage processing time is 5.75 ms per frame on a standard PC, thus allowing for real-time operation. Full article

Article
A Study of Deep CNN-Based Classification of Open and Closed Eyes Using a Visible Light Camera Sensor
by Ki Wan Kim, Hyung Gil Hong, Gi Pyo Nam and Kang Ryoung Park
Sensors 2017, 17(7), 1534; https://doi.org/10.3390/s17071534 - 30 Jun 2017
Cited by 72 | Viewed by 10468
Abstract
The necessity for the classification of open and closed eyes is increasing in various fields, including analysis of eye fatigue in 3D TVs, analysis of the psychological states of test subjects, and eye status tracking-based driver drowsiness detection. Previous studies have used various methods to distinguish between open and closed eyes, such as classifiers based on the features obtained from image binarization, edge operators, or texture analysis. However, when it comes to eye images with different lighting conditions and resolutions, it can be difficult to find an optimal threshold for image binarization or optimal filters for edge and texture extraction. In order to address this issue, we propose a method to classify open and closed eye images with different conditions, acquired by a visible light camera, using a deep residual convolutional neural network. After conducting performance analysis on both self-collected and open databases, we have determined that the classification accuracy of the proposed method is superior to that of existing methods. Full article
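
As a generic illustration of the building block behind a deep residual CNN (not the authors' exact architecture or hyper-parameters), a minimal PyTorch residual block might look like this; an open/closed-eye classifier would map pooled features from a stack of such blocks to two classes.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One basic residual block with an identity shortcut.

    Generic sketch only; channel counts, depth and the final classifier
    head are assumptions, not the paper's configuration.
    """
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)        # identity shortcut
```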

Article
Improving Video Segmentation by Fusing Depth Cues and the Visual Background Extractor (ViBe) Algorithm
by Xiaoqin Zhou, Xiaofeng Liu, Aimin Jiang, Bin Yan and Chenguang Yang
Sensors 2017, 17(5), 1177; https://doi.org/10.3390/s17051177 - 21 May 2017
Cited by 26 | Viewed by 6661
Abstract
Depth-sensing technology has led to broad applications of inexpensive depth cameras that can capture human motion and scenes in three-dimensional space. Background subtraction algorithms can be improved by fusing color and depth cues, thereby allowing many issues encountered in classical color segmentation to be solved. In this paper, we propose a new fusion method that combines depth and color information for foreground segmentation based on an advanced color-based algorithm. First, a background model and a depth model are developed. Then, based on these models, we propose a new updating strategy that can eliminate ghosting and black shadows almost completely. Extensive experiments have been performed to compare the proposed algorithm with other, conventional RGB-D (Red-Green-Blue and Depth) algorithms. The experimental results suggest that our method extracts foregrounds with higher effectiveness and efficiency. Full article

Article
A Human Activity Recognition System Based on Dynamic Clustering of Skeleton Data
by Alessandro Manzi, Paolo Dario and Filippo Cavallo
Sensors 2017, 17(5), 1100; https://doi.org/10.3390/s17051100 - 11 May 2017
Cited by 44 | Viewed by 6671
Abstract
Human activity recognition is an important area in computer vision, with its wide range of applications including ambient assisted living. In this paper, an activity recognition system based on skeleton data extracted from a depth camera is presented. The system makes use of machine learning techniques to classify the actions that are described with a set of a few basic postures. The training phase creates several models related to the number of clustered postures by means of a multiclass Support Vector Machine (SVM), trained with Sequential Minimal Optimization (SMO). The classification phase adopts the X-means algorithm to find the optimal number of clusters dynamically. The contribution of the paper is twofold. The first aim is to perform activity recognition employing features based on a small number of informative postures, extracted independently from each activity instance; secondly, it aims to assess the minimum number of frames needed for an adequate classification. The system is evaluated on two publicly available datasets, the Cornell Activity Dataset (CAD-60) and the Telecommunication Systems Team (TST) Fall detection dataset. The number of clusters needed to model each instance ranges from two to four elements. The proposed approach reaches excellent performance using only about 4 s of input data (~100 frames) and outperforms the state of the art when it uses approximately 500 frames on the CAD-60 dataset. The results are promising for tests in real contexts. Full article

Article
Fuzzy System-Based Target Selection for a NIR Camera-Based Gaze Tracker
by Rizwan Ali Naqvi, Muhammad Arsalan and Kang Ryoung Park
Sensors 2017, 17(4), 862; https://doi.org/10.3390/s17040862 - 14 Apr 2017
Cited by 10 | Viewed by 5373
Abstract
Gaze-based interaction (GBI) techniques have been a popular subject of research in the last few decades. Among other applications, GBI can be used by persons with disabilities to perform everyday tasks, as a game interface, and can play a pivotal role in the human computer interface (HCI) field. While gaze tracking systems have shown high accuracy in GBI, detecting a user’s gaze for target selection is a challenging problem that needs to be considered while using a gaze detection system. Past research has used the blinking of the eyes for this purpose as well as dwell time-based methods, but these techniques are either inconvenient for the user or require a long time for target selection. Therefore, in this paper, we propose a method for fuzzy system-based target selection for near-infrared (NIR) camera-based gaze trackers. The results of experiments performed in addition to tests of the usability and on-screen keyboard use of the proposed method show that it is better than previous methods. Full article

Article
Tracking a Non-Cooperative Target Using Real-Time Stereovision-Based Control: An Experimental Study
by Tomer Shtark and Pini Gurfil
Sensors 2017, 17(4), 735; https://doi.org/10.3390/s17040735 - 31 Mar 2017
Cited by 16 | Viewed by 5167
Abstract
Tracking a non-cooperative target is a challenge, because in unfamiliar environments most targets are unknown and unspecified. Stereovision is suited to deal with this issue, because it allows large areas to be scanned passively and the relative position, velocity and shape of objects to be estimated. This research is an experimental effort aimed at developing, implementing and evaluating a real-time non-cooperative target tracking method using stereovision measurements only. A computer-vision feature detection and matching algorithm was developed in order to identify and locate the target in the captured images. Three different filters were designed for estimating the relative position and velocity, and their performance was compared. A line-of-sight control algorithm was used for the purpose of keeping the target within the field-of-view. Extensive analytical and numerical investigations were conducted on the multi-view stereo projection equations and their solutions, which were used to initialize the different filters. This research shows, using an experimental and numerical evaluation, the benefits of using the unscented Kalman filter and the total least squares technique in the stereovision-based tracking problem. These findings offer a general and more accurate method for solving the static and dynamic stereovision triangulation problems and the concomitant line-of-sight control. Full article
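
A small Python sketch of the basic stereo triangulation such filters can be initialized from, using OpenCV's triangulatePoints; the projection matrices and matched points are assumed inputs, and the filtering and line-of-sight control are outside this snippet.

```python
import cv2
import numpy as np

def triangulate_target(P_left, P_right, pt_left, pt_right):
    """Estimate a target's 3D position from one matched point pair.

    P_left / P_right are 3x4 camera projection matrices; pt_left / pt_right
    are pixel coordinates of the tracked feature in each view.
    """
    pts4d = cv2.triangulatePoints(P_left, P_right,
                                  np.float32(pt_left).reshape(2, 1),
                                  np.float32(pt_right).reshape(2, 1))
    return (pts4d[:3] / pts4d[3]).ravel()   # convert from homogeneous coordinates
```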

Article
Gender Recognition from Human-Body Images Using Visible-Light and Thermal Camera Videos Based on a Convolutional Neural Network for Image Feature Extraction
by Dat Tien Nguyen, Ki Wan Kim, Hyung Gil Hong, Ja Hyung Koo, Min Cheol Kim and Kang Ryoung Park
Sensors 2017, 17(3), 637; https://doi.org/10.3390/s17030637 - 20 Mar 2017
Cited by 40 | Viewed by 7972
Abstract
Extracting powerful image features plays an important role in computer vision systems. Many methods have previously been proposed to extract image features for various computer vision applications, such as the scale-invariant feature transform (SIFT), speed-up robust feature (SURF), local binary patterns (LBP), histogram of oriented gradients (HOG), and weighted HOG. Recently, the convolutional neural network (CNN) method for image feature extraction and classification in computer vision has been used in various applications. In this research, we propose a new gender recognition method for recognizing males and females in observation scenes of surveillance systems based on feature extraction from visible-light and thermal camera videos through CNN. Experimental results confirm the superiority of our proposed method over state-of-the-art recognition methods for the gender recognition problem using human body images. Full article

Article
Conditional Random Field (CRF)-Boosting: Constructing a Robust Online Hybrid Boosting Multiple Object Tracker Facilitated by CRF Learning
by Ehwa Yang, Jeonghwan Gwak and Moongu Jeon
Sensors 2017, 17(3), 617; https://doi.org/10.3390/s17030617 - 17 Mar 2017
Cited by 6 | Viewed by 5577
Abstract
Due to the reasonably acceptable performance of state-of-the-art object detectors, tracking-by-detection is a standard strategy for visual multi-object tracking (MOT). In particular, online MOT is more demanding due to its diverse applications in time-critical situations. A main issue in realizing online MOT is how to associate noisy object detection results on a new frame with previously tracked objects. In this work, we propose a multi-object tracking method called CRF-boosting, which utilizes a hybrid data association method based on online hybrid boosting facilitated by a conditional random field (CRF) for establishing online MOT. For data association, the learned CRF is used to generate reliable low-level tracklets, which are then used as the input of the hybrid boosting. Whereas existing data association methods based on boosting algorithms require training data with ground truth information to improve robustness, CRF-boosting ensures sufficient robustness without such information thanks to the synergetic cascaded learning procedure. Further, a hierarchical feature association framework is adopted to further improve MOT accuracy. From experimental results on public datasets, we conclude that the benefit of the proposed hybrid approach compared to other competitive MOT systems is noticeable. Full article

Article
Effective Visual Tracking Using Multi-Block and Scale Space Based on Kernelized Correlation Filters
by Soowoong Jeong, Guisik Kim and Sangkeun Lee
Sensors 2017, 17(3), 433; https://doi.org/10.3390/s17030433 - 23 Feb 2017
Cited by 13 | Viewed by 5379
Abstract
Accurate scale estimation and occlusion handling is a challenging problem in visual tracking. Recently, correlation filter-based trackers have shown impressive results in terms of accuracy, robustness, and speed. However, the model is not robust to scale variation and occlusion. In this paper, we address the problems associated with scale variation and occlusion by employing a scale space filter and multi-block scheme based on a kernelized correlation filter (KCF) tracker. Furthermore, we develop a more robust algorithm using an appearance update model that approximates the change of state of occlusion and deformation. In particular, an adaptive update scheme is presented to make each process robust. The experimental results demonstrate that the proposed method outperformed 29 state-of-the-art trackers on 100 challenging sequences. Specifically, the results obtained with the proposed scheme were improved by 8% and 18% compared to those of the KCF tracker for 49 occlusion and 64 scale variation sequences, respectively. Therefore, the proposed tracker can be a robust and useful tool for object tracking when occlusion and scale variation are involved. Full article

Article
A Real-Time High Performance Computation Architecture for Multiple Moving Target Tracking Based on Wide-Area Motion Imagery via Cloud and Graphic Processing Units
by Kui Liu, Sixiao Wei, Zhijiang Chen, Bin Jia, Genshe Chen, Haibin Ling, Carolyn Sheaff and Erik Blasch
Sensors 2017, 17(2), 356; https://doi.org/10.3390/s17020356 - 12 Feb 2017
Cited by 14 | Viewed by 6039
Abstract
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud- and GPU-based computing provides an efficient real-time target recognition and tracking approach as compared to methods in which the workflow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT-based approach provides drastically improved tracking at low frame rates under realistic conditions. Full article

Article
Robust Video Stabilization Using Particle Keypoint Update and l1-Optimized Camera Path
by Semi Jeon, Inhye Yoon, Jinbeum Jang, Seungji Yang, Jisung Kim and Joonki Paik
Sensors 2017, 17(2), 337; https://doi.org/10.3390/s17020337 - 10 Feb 2017
Cited by 19 | Viewed by 6768
Abstract
Acquisition of stabilized video is an important issue for various types of digital cameras. This paper presents an adaptive camera path estimation method using robust feature detection to remove shaky artifacts from a video. The proposed algorithm consists of three steps: (i) robust feature detection using particle keypoints between adjacent frames; (ii) camera path estimation and smoothing; and (iii) rendering to reconstruct a stabilized video. As a result, the proposed algorithm can estimate the optimal homography by redefining important feature points in the flat region using particle keypoints. In addition, stabilized frames with fewer holes can be generated from the optimal, adaptive camera path that minimizes a temporal total variation (TV). The proposed video stabilization method is suitable for enhancing the visual quality of various portable cameras and can be applied to robot vision, driving assistant systems, and visual surveillance systems. Full article
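
A hedged Python sketch of the path-smoothing idea: a simple moving-average smoother stands in for the paper's l1-optimized camera path, and the frame-to-frame transforms are assumed to have been estimated beforehand from matched keypoints (e.g., with cv2.estimateAffinePartial2D).

```python
import numpy as np

def smooth_camera_path(transforms, radius=15):
    """Smooth a per-frame camera trajectory (dx, dy, d_angle) and return the
    corrective motions to apply to each frame.

    `transforms` is assumed to be an (N, 3) array of frame-to-frame motions;
    the moving average is an illustrative substitute for an l1-optimal path.
    """
    path = np.cumsum(transforms, axis=0)             # accumulated camera path
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    pad = np.pad(path, ((radius, radius), (0, 0)), mode='edge')
    smoothed = np.stack([np.convolve(pad[:, i], kernel, mode='valid')
                         for i in range(path.shape[1])], axis=1)
    return transforms + (smoothed - path)            # motions that render stable frames
```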

Article
Real-Time Straight-Line Detection for XGA-Size Videos by Hough Transform with Parallelized Voting Procedures
by Jungang Guan, Fengwei An, Xiangyu Zhang, Lei Chen and Hans Jürgen Mattausch
Sensors 2017, 17(2), 270; https://doi.org/10.3390/s17020270 - 30 Jan 2017
Cited by 20 | Viewed by 7228
Abstract
The Hough Transform (HT) is a method for extracting straight lines from an edge image. The main limitations of the HT for use in actual applications are its computation time and storage requirements. This paper reports a hardware architecture for HT implementation on a Field Programmable Gate Array (FPGA) with a parallelized voting procedure. The 2-dimensional accumulator array, namely the Hough space in parametric form (ρ, θ), for computing the strength of each line by a voting mechanism, is mapped onto a 1-dimensional array with regular increments of θ. Then, this Hough space is divided into a number of parallel parts. The computation of (ρ, θ) for the edge pixels and the voting procedure for straight-line determination are therefore executable in parallel. In addition, a synchronized initialization of the Hough space further increases the speed of straight-line detection, so that XGA video processing becomes possible. The designed prototype system has been synthesized on a DE4 platform with a Stratix-IV FPGA device. In the application of road-lane detection, the average processing speed of this HT implementation is 5.4 ms per XGA frame at a 200 MHz working frequency. Full article
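
A plain software sketch (Python) of the (ρ, θ) voting that the paper parallelizes in hardware; each θ column is computed independently, which is what makes splitting the Hough space across parallel units straightforward. Array sizes here are illustrative.

```python
import numpy as np

def hough_vote(edge_pixels, img_diag, n_theta=180, n_rho=512):
    """Accumulate (rho, theta) votes for an (M, 2) array of edge (y, x) pixels.

    A straightforward software version of the voting procedure; on the FPGA
    the theta columns are distributed across parallel voting units.
    """
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    ys, xs = edge_pixels[:, 0], edge_pixels[:, 1]
    for j, th in enumerate(thetas):                  # one independent column per theta
        rho = xs * np.cos(th) + ys * np.sin(th)
        idx = np.round((rho + img_diag) * (n_rho - 1) / (2 * img_diag)).astype(int)
        np.add.at(acc[:, j], idx, 1)                 # cast the votes
    return acc
```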

Article
Visual Object Tracking Based on Cross-Modality Gaussian-Bernoulli Deep Boltzmann Machines with RGB-D Sensors
by Mingxin Jiang, Zhigeng Pan and Zhenzhou Tang
Sensors 2017, 17(1), 121; https://doi.org/10.3390/s17010121 - 10 Jan 2017
Cited by 12 | Viewed by 5706
Abstract
Visual object tracking technology is one of the key issues in computer vision. In this paper, we propose a visual object tracking algorithm based on cross-modality deep feature learning using Gaussian-Bernoulli deep Boltzmann machines (DBM) with RGB-D sensors. First, a cross-modality feature learning network based on a Gaussian-Bernoulli DBM is constructed, which can extract cross-modality features of the samples in RGB-D video data. Second, the cross-modality features of the samples are input into a logistic regression classifier, and the observation likelihood model is established according to the confidence score of the classifier. Finally, the object tracking results over the RGB-D data are obtained using a Bayesian maximum a posteriori (MAP) probability estimation algorithm. The experimental results show that the proposed method has strong robustness to abnormal changes (e.g., occlusion, rotation, illumination change, etc.). The algorithm can steadily track multiple targets and has higher accuracy. Full article

Article
3D Visual Tracking of an Articulated Robot in Precision Automated Tasks
by Hamza Alzarok, Simon Fletcher and Andrew P. Longstaff
Sensors 2017, 17(1), 104; https://doi.org/10.3390/s17010104 - 07 Jan 2017
Cited by 9 | Viewed by 7126
Abstract
The most compelling requirements for visual tracking systems are a high detection accuracy and an adequate processing speed. However, combining the two requirements in real world applications is very challenging, because more accurate tracking tasks often require longer processing times, while quicker responses of the tracking system are more prone to errors; therefore, a trade-off between accuracy and speed is required. This paper aims to achieve the two requirements together by implementing an accurate and time-efficient tracking system. In this paper, an eye-to-hand visual system that has the ability to automatically track a moving target is introduced. An enhanced Circular Hough Transform (CHT) is employed for estimating the trajectory of a spherical target in three dimensions. The colour feature of the target was carefully selected by using a new colour selection process, which relies on the use of a colour segmentation method (Delta E) with the CHT algorithm to find the proper colour of the tracked target. The target was attached to the end-effector of a six degree of freedom (DOF) robot that performs a pick-and-place task. A cooperation of two eye-to-hand cameras with image averaging filters is used for obtaining clear and steady images. This paper also examines a new technique for generating and controlling the observation search window in order to increase the computational speed of the tracking system; the technique is named Controllable Region of interest based on Circular Hough Transform (CRCHT). Moreover, a new mathematical formula is introduced for updating the depth information of the vision system during the object tracking process. For more reliable and accurate tracking, a simplex optimization technique was employed for the calculation of the parameters of the camera-to-robot transformation matrix. The results obtained show the applicability of the proposed approach to tracking the moving robot with an overall tracking error of 0.25 mm, as well as the effectiveness of the CRCHT technique in saving up to 60% of the overall time required for image processing. Full article
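
A minimal Python sketch of sphere detection with a Circular Hough Transform restricted to a search window, mirroring the region-of-interest idea; all parameter values and the function name are placeholders, not the paper's tuned CRCHT settings.

```python
import cv2

def detect_sphere(gray, roi=None, dp=1.2, min_dist=50):
    """Detect a spherical target with a Circular Hough Transform.

    If `roi` (x, y, w, h) is given, the search is restricted to that window
    to save computation; parameters here are illustrative only.
    """
    x0, y0 = 0, 0
    if roi is not None:
        x0, y0, w, h = roi
        gray = gray[y0:y0 + h, x0:x0 + w]
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp, min_dist,
                               param1=100, param2=30, minRadius=5, maxRadius=80)
    if circles is None:
        return None
    cx, cy, r = circles[0][0]
    return cx + x0, cy + y0, r            # map back to full-image coordinates
```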

Article
Multi-User Identification-Based Eye-Tracking Algorithm Using Position Estimation
by Suk-Ju Kang
Sensors 2017, 17(1), 41; https://doi.org/10.3390/s17010041 - 27 Dec 2016
Cited by 8 | Viewed by 6289
Abstract
This paper proposes a new multi-user eye-tracking algorithm using position estimation. Conventional eye-tracking algorithms are typically suitable only for a single user, and thereby cannot be used for a multi-user system. Even though they can be used to track the eyes of multiple users, their detection accuracy is low and they cannot identify multiple users individually. The proposed algorithm solves these problems and enhances the detection accuracy. Specifically, the proposed algorithm adopts a classifier to detect faces for the red, green, and blue (RGB) and depth images. Then, it calculates features based on the histogram of the oriented gradient for the detected facial region to identify multiple users, and selects the template that best matches the users from a pre-determined face database. Finally, the proposed algorithm extracts the final eye positions based on anatomical proportions. Simulation results show that the proposed algorithm improved the average F1 score by up to 0.490, compared with benchmark algorithms. Full article

Article
A Novel Probabilistic Data Association for Target Tracking in a Cluttered Environment
by Xiao Chen, Yaan Li, Yuxing Li, Jing Yu and Xiaohua Li
Sensors 2016, 16(12), 2180; https://doi.org/10.3390/s16122180 - 18 Dec 2016
Cited by 27 | Viewed by 5919
Abstract
The problem of data association for target tracking in a cluttered environment is discussed. In order to improve the real-time processing and accuracy of target tracking, a novel data association algorithm using distance weighting is proposed based on a probabilistic data association algorithm; it can enhance the association probability of measurements originating from the target, and a Kalman filter is then used to estimate the target state more accurately. Thus, the tracking performance of the proposed algorithm is improved when tracking non-maneuvering targets in a densely cluttered environment, and it also performs better when two targets are parallel to each other or cross at a small angle in a densely cluttered environment. As for maneuvering targets, usually handled with an interactive multiple model framework, we combine this framework with the improved probabilistic data association method and propose a combined interactive multiple model probabilistic data association algorithm to track a maneuvering target in a densely cluttered environment. Through Monte Carlo simulation, the results show that the proposed algorithm can be more effective and reliable for different scenarios of target tracking in a densely cluttered environment. Full article
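
A hedged Python sketch of probabilistic data association weights with an extra inverse-distance factor, included only to illustrate the kind of distance weighting the abstract describes; the constants and the exact weighting form are assumptions, not the paper's algorithm.

```python
import numpy as np

def pda_weights(z_pred, S, measurements, p_detect=0.9, clutter_density=1e-4):
    """Association probabilities for measurements inside a validation gate.

    Standard PDA Gaussian likelihoods of the innovation, scaled by an
    illustrative inverse-distance factor; returns (beta_0, betas) where
    beta_0 is the probability that none of the measurements is target-born.
    """
    S_inv = np.linalg.inv(S)
    norm = 1.0 / np.sqrt(np.linalg.det(2 * np.pi * S))
    likelihoods = []
    for z in measurements:
        nu = z - z_pred                              # innovation
        d2 = float(nu @ S_inv @ nu)                  # squared Mahalanobis distance
        likelihoods.append(norm * np.exp(-0.5 * d2) / (1.0 + np.sqrt(d2)))
    likelihoods = np.asarray(likelihoods) * p_detect
    b0 = clutter_density * (1 - p_detect)            # "no target-born measurement" term
    total = b0 + likelihoods.sum()
    return b0 / total, likelihoods / total
```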

Article
Robust and Accurate Vision-Based Pose Estimation Algorithm Based on Four Coplanar Feature Points
by Zimiao Zhang, Shihai Zhang and Qiu Li
Sensors 2016, 16(12), 2173; https://doi.org/10.3390/s16122173 - 17 Dec 2016
Cited by 8 | Viewed by 4696
Abstract
Vision-based pose estimation is an important application of machine vision. Currently, analytical and iterative methods are used to solve the object pose. The analytical solutions generally take less computation time. However, the analytical solutions are extremely susceptible to noise. The iterative solutions minimize the distance error between feature points based on 2D image pixel coordinates. However, the non-linear optimization needs a good initial estimate of the true solution; otherwise, it is more time consuming than the analytical solutions. Moreover, the image processing error grows rapidly as the measurement range increases, which leads to pose estimation errors. All the reasons mentioned above cause the accuracy to decrease. To solve this problem, a novel pose estimation method based on four coplanar points is proposed. Firstly, the coordinates of the feature points are determined according to the linear constraints formed by the four points. The initial coordinates of the feature points acquired through the linear method are then optimized through an iterative method. Finally, the coordinate system of the object motion is established and a method is introduced to solve the object pose. Because the growing image processing error causes pose estimation errors as the measurement range increases, the pose estimation errors can be decreased through this coordinate system. The proposed method is compared with two other existing methods through experiments. Experimental results demonstrate that the proposed method works efficiently and stably. Full article
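
For context, a short Python sketch of pose estimation from four coplanar points using OpenCV's solvePnP; its iterative solver stands in for the paper's linear initialization followed by iterative refinement.

```python
import cv2
import numpy as np

def pose_from_four_points(object_pts, image_pts, K, dist=None):
    """Estimate object pose (rotation matrix, translation) from four coplanar points.

    `object_pts` are the 4x3 model coordinates (same Z for a coplanar pattern),
    `image_pts` their 4x2 pixel positions, K the camera intrinsic matrix.
    """
    dist = np.zeros(5) if dist is None else dist
    ok, rvec, tvec = cv2.solvePnP(np.float32(object_pts), np.float32(image_pts),
                                  K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                       # rotation vector -> 3x3 matrix
    return R, tvec
```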

Article
Adaptive Local Spatiotemporal Features from RGB-D Data for One-Shot Learning Gesture Recognition
by Jia Lin, Xiaogang Ruan, Naigong Yu and Yee-Hong Yang
Sensors 2016, 16(12), 2171; https://doi.org/10.3390/s16122171 - 17 Dec 2016
Cited by 6 | Viewed by 5131
Abstract
Noise and constant empirical motion constraints affect the extraction of distinctive spatiotemporal features from one or a few samples per gesture class. To tackle these problems, an adaptive local spatiotemporal feature (ALSTF) using fused RGB-D data is proposed. First, motion regions of interest (MRoIs) are adaptively extracted using grayscale and depth velocity variance information to greatly reduce the impact of noise. Then, corners are used as keypoints if their depth and their grayscale and depth velocities meet several adaptive local constraints in each MRoI. With further filtering of noise, an accurate and sufficient number of keypoints is obtained within the desired moving body parts (MBPs). Finally, four kinds of multiple descriptors are calculated and combined in extended gradient and motion spaces to represent the appearance and motion features of the gestures. The experimental results on the ChaLearn gesture, CAD-60 and MSRDailyActivity3D datasets demonstrate that the proposed feature achieves higher performance compared with published state-of-the-art approaches under the one-shot learning setting and comparable accuracy under leave-one-out cross validation. Full article

Article
Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera
by Jiatong Bao, Yunyi Jia, Yu Cheng, Hongru Tang and Ning Xi
Sensors 2016, 16(12), 2117; https://doi.org/10.3390/s16122117 - 13 Dec 2016
Cited by 3 | Viewed by 6342
Abstract
Controlling robots by natural language (NL) is increasingly attracting attention for its versatility, convenience and lack of need for extensive user training. Grounding is a crucial challenge of this problem in enabling robots to understand NL instructions from humans. This paper mainly explores the object grounding problem and concretely studies how to detect target objects from NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. With the metric information of all segmented objects, the object attributes and the relations between objects are further extracted. The NL instructions that incorporate multiple cues for object specifications are parsed into domain-specific annotations. The annotations from the NL and the extracted information from the RGB-D camera are matched in a computational state estimation framework to search all possible object grounding states. The final grounding is accomplished by selecting the states which have the maximum probabilities. An RGB-D scene dataset associated with different groups of NL instructions based on different cognition levels of the robot is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming using a mobile manipulator show its effectiveness and practicability in robotic applications. Full article
