Article

Vehicle Detection with Occlusion Handling, Tracking, and OC-SVM Classification: A High Performance Vision-Based System

by
Roxana Velazquez-Pupo
1,
Alberto Sierra-Romero
1,
Deni Torres-Roman
1,
Yuriy V. Shkvarko
1,†,
Jayro Santiago-Paz
1,*,
David Gómez-Gutiérrez
2,
Daniel Robles-Valdez
1,
Fernando Hermosillo-Reynoso
1 and
Misael Romero-Delgado
1
1
Center for Advanced Research and Education of the National Polytechnic Institute of Mexico, CINVESTAV Guadalajara, Zapopan C.P. 45019, Mexico
2
Intel Labs, Intel Tecnología de Mexico, Zapopan C.P. 45019, Mexico
*
Author to whom correspondence should be addressed.
† Deceased in August 2016.
Sensors 2018, 18(2), 374; https://doi.org/10.3390/s18020374
Submission received: 30 November 2017 / Revised: 7 January 2018 / Accepted: 18 January 2018 / Published: 27 January 2018
(This article belongs to the Special Issue Sensors for Transportation)

Abstract

This paper presents a high-performance vision-based system with a single static camera for traffic surveillance that performs moving vehicle detection with occlusion handling, tracking, counting, and One-Class Support Vector Machine (OC-SVM) classification. In this approach, moving objects are first segmented from the background using an adaptive Gaussian Mixture Model (GMM). Then, several geometric features are extracted, such as vehicle area, height, width, centroid, and bounding box. Because occlusion is present, an algorithm was implemented to reduce it. Tracking is performed with an adaptive Kalman filter. Finally, the selected geometric features (estimated area, height, and width) are used by different classifiers to sort vehicles into three classes: small, midsize, and large. Extensive experiments on eight real traffic videos with more than 4000 ground truth vehicles show that the improved system runs in real time under an occlusion index of up to 0.312 and classifies vehicles with a global detection rate (recall), precision, and F-measure of up to 98.190%, and an F-measure of up to 99.051% for midsize vehicles.

1. Introduction

The main goal of Intelligent Transportation Systems (ITS) for an Internet of Things (IoT) Smart City is to improve safety, efficiency, and coordination in transport infrastructure and vehicles by applying information and communication technologies. To this end, it is necessary to have systems capable of collecting road information and monitoring traffic.
Video cameras are a good choice for these tasks, because they are non-intrusive, easy to install, and of moderate cost. In addition, advances in analytical techniques for processing video data, together with increased computing power, may now provide added value to cameras by automatically extracting relevant traffic information, such as volume, density, and vehicle velocity.
According to the type of sensor (active or passive) and its location, different approaches for detecting and classifying vehicles have been developed, such as on-road cameras [1,2,3,4], rear- and forward-looking cameras on board the vehicle [5], low-altitude airborne platforms with vision [6,7], and non-camera sensors on the road [8,9,10].
Vehicle detection can use several sensors and has different meanings in this area, e.g., detection from a moving camera for driver assistance, or from a static camera for traffic surveillance, as in our case. Thus, vehicle detection is the first step of a vision-based traffic monitoring process with one static camera. Several vehicle detection techniques have been successfully used on highways, such as frame differencing [11,12], background subtraction [13,14], optical flow [15], GMM [16,17], and others.
Usually, the next step in video processing is to track detected moving objects from one frame to another in an image sequence. Tracking over time typically involves matching objects in consecutive frames using features such as points, lines, or blobs [18], and from these track sequences, different object behaviors can be inferred. In [19], the authors present a real-time vision-based traffic flow monitoring system, where a flow model is used to count vehicles traveling on each lane and to produce traffic statistics. In the literature, the most widely used tracking algorithms are Kalman filter [20,21,22], adaptive Kalman filter [23,24], and particle filter [25].
After vehicle tracking and feature extraction, the final step is vehicle classification. Numerous techniques are available for automatic classification of vehicles, the most commonly used being deterministic methods [26,27], stochastic methods [20,28], artificial neural networks [29,30,31], and Support Vector Machine (SVM) [7,9,10,32].
The major contributions of this paper for vision-based traffic surveillance with a static camera are: a very high-performance vision-based system that improves the detection rate of moving vehicles through occlusion handling; the introduction of a metric, the Vehicle Occlusion Index (VOI), to measure and characterize vehicle occlusion; and the novel inclusion of OC-SVM with a Radial Basis Function (RBF) kernel for the classification stage, where the input space of the classifier is 3D and based on geometric features.
This paper is organized as follows: in Section 2, an overview of the related work in the area of occlusion handling and vehicle classification is presented. In Section 3, the procedures for the proposed system are described: vehicle detection with occlusion handling, vehicle tracking, and vehicle classification based on K-means, SVM, and OC-SVM. Experimental results are provided in Section 4. Discussion of the paper is presented in Section 5. Finally, the conclusions are presented in Section 6.

2. Related Works

Although our work concerns vision-based systems with a static camera for traffic surveillance, some works from the closely related area of on-road vision-based systems are also reviewed. In [1], Sivaraman and Trivedi present a detailed survey of advances in on-road vision-based vehicle detection, tracking, and behavior analysis, particularly regarding sensors for vehicle detection and representative works in vision-based vehicle detection and tracking. In addition to classification, aspects such as features and occlusion should be studied. Another paper of interest, for a vehicle-mounted camera, is that of Arróspide, Salgado and Nieto [33], where "a new descriptor based on the analysis of gradient orientations in concentric rectangles is defined", involving "a much smaller feature space compared to traditional descriptors, which are too costly for real-time applications. A new vehicle image database is generated to train the SVM". On the other hand, Huang [34] presents a detailed study of the background and uses entropy in a motion detection algorithm; although it is a very good paper, the accuracy achieved was relatively low (53.43%).
Due to perspective effects, shadows, camera vibration, lighting changes, and other factors, multiple vehicles may be detected as a single vehicle, greatly affecting system performance. Therefore, occlusion handling is an important step after vehicle detection. There are several methods for reducing occlusions. For example, in [20], a line-based algorithm using a set of horizontal and vertical lines is proposed to eliminate all unwanted shadows; these lines are derived from the information of the lane-dividing lines. In addition, fusion of the image frames acquired from multiple cameras is used in [35] to deal with the occlusion problem. Furthermore, an algorithm based on car windshield appearance is proposed in [21] to handle occlusions in dense traffic. In [36], occlusion is detected through convex regions; if occlusion is detected, it is removed with a cutting region. In [37], the vehicle corner was used as a feature to resolve partial occlusion. In [38], feature-based tracking is used at intersections to handle the problem caused by the disruption of the features. In [39], vehicle counting with a perspective view is performed using two appearance-based classifiers. Table 1 shows related works in the detection/counting stage.
From this table and the literature, we conclude that:
  • Only reference [20] uses a greater number of Ground Truth (GT) points in the detection stage than we do, but only 3326 of them were used for classification. Therefore, our work uses the largest number of GT points for classification.
  • The detection rate (DR or recall) of 100% reported in [37] was achieved in a restricted scenario with only nine GT vehicles in 1000 frames, so it is not representative.
  • Most papers do not provide their videos so that they can be downloaded and tested; or the videos are too short, or the experiments are not easy to replicate.
  • Background models are addressed in the following highly cited articles [34,40,41,42], but all are based on the assumption that background pixel values show higher frequencies and less variance than any foreground pixel. However, occlusion is not handled in these papers.
  • Background-foreground algorithms transform input videos or photos, with occlusion handling or not, into an output space that is used for the classification stage.
  • The output space delivered by the detection stage is the set of points or vectors modelling the moving vehicles.
  • It is important to keep a low dimensional output space of the detection algorithms and/or the use of low-computational complexity features to improve the performance of these real-time systems.
  • In [36], occlusion is classified visually into partial and full, and convex regions are employed, reporting an improvement in detection. However, no metric for the occlusion is presented.
  • In [39], the occlusion handling algorithm is based on SVM, using 11 videos for training and another three for the detection of occlusion. Although this technique is novel, it uses images as elements of the input space of the SVM classifier; therefore, it has a greater computational complexity than techniques whose input elements are less complex than images.
  • All occlusion handling algorithms should be tested with long-duration, high-frame-rate videos; 135-s videos and frame rates of 8 fps are relatively low.
  • Vehicle ROI extraction based on GMM to reduce computational complexity is achieved in some works like [43].
  • In our work, assumptions such as (1) processing in the pixel domain, (2) tracking and decision at frame level, (3) the use of low-computational-complexity features, and (4) processing of pixels only in certain regions with high variability are kept to reduce the computational complexity, because these assumptions are crucial for the necessary future parallelization of these algorithms.
  • Our work has the largest number of different scenarios for detection and the largest number of frames. In addition, traffic load and other metrics are given.
In the literature, many features have been selected and extracted [7,9,10,32,44,45,46], such as wavelength, mean, variance, peak, valley, acreage, acoustic signals, Histogram of Oriented Gradients (HOG) features, vehicle length, grey-level co-occurrence matrix features, low-level features, area, width, height, centroid, and bounding box. In the classification stage, these features are employed to classify the vehicles into several classes, the most used being small, medium, and large. Since 2006, SVM has been used for vehicle classification with other input spaces and in different scenarios, such as static images [47], vehicles on road ramps [10], visual surveillance from low-altitude airborne platforms [7], an on-road camera [32], a static side-road camera [48], and laser intensity images without vehicle occlusion [46]. In this work, we focus on traffic surveillance with only a vision camera as sensor; the scenarios are multilane roads with a relatively high traffic load, under different weather conditions and with a variable occlusion index (see [49]). Table 2 shows important aspects of the related works in vehicle classification, including our results, where TPR is the True Positive Rate or Recall, TNR is the True Negative Rate, and FNR is the False Negative Rate.
From Table 2 and the literature mentioned here, it can be seen that:
  • Several systems use other sensors in addition to the video camera, so different input spaces are created. Consequently, the use of a single static camera helps to keep the hardware cost low, and we have demonstrated that it is still possible to obtain a high-performance system.
  • The test scenarios used in this work are richer than those presented in related papers.
  • For traffic monitoring in Smart City IoT with a static camera located on the road-side, our system showed the highest performance and we calculated more performance metrics.
Motivation: For an IoT Smart City, and particularly for video-based traffic surveillance, to have a very high-performance vision-based system that improves the detection rate of moving vehicles through geometric features and occlusion handling algorithms; the measurement of the occlusion by a metric called here the Vehicle Occlusion Index (VOI); and the use of novel classifiers.

3. The Proposed System

In this paper, we present a system to detect, track, and classify vehicles from video sequences with a higher performance than related methods in the literature. Figure 1 shows the block diagram of the system. During training, a model for each vehicle class is generated from a training video. With these models, classification is performed using OC-SVM.

3.1. System Initialization

The tasks related with the system initialization (see Figure 2) are the following:
  • Manual selection of the Region of Interest (ROI), which is the set of all pixels where moving objects or vehicles can be detected, tracked and classified. This concept helps to reduce the whole processing time.
  • Manual setting of the lane-dividing lines, detection line, and classification line.

3.2. Vehicle Detection

It is known that different techniques can be employed for vehicle detection, e.g., in the pixel domain or the photo domain. Vehicle models are built from different sets of features, which can be geometric, based on secondary sensors, or derived by certain mathematical transformations. We work in the pixel domain because we observed that several algorithms achieve high performance there, and because it is useful for the necessary future parallelization of the algorithms.
Although background modelling is not a target of this work, having a reliable background model is a very important issue for the detection of moving objects such as vehicles. This problem has been addressed and modelled by different authors. Stauffer and Grimson [40] developed the adaptive GMM model, while Power and Schoonees [50] revealed important practical details of this model. Mandellos, Keramitsoglou and Kiranoudis [41], and Huang [34] developed background models. Nevertheless, all of them rely on the assumption that background pixel values show higher frequencies and less variance than any foreground pixels. The algorithm in [41] behaves as a GMM model for the background and improves the foreground only when working in the Luv color space, which means that its computational complexity is three times that obtained in gray scale. As the Huang algorithm does not show a high performance, we selected the Stauffer-Grimson algorithm.
To select a background-foreground algorithm, we assume: (1) processing in the pixel domain, (2) tracking and decision at frame level, and (3) the use of techniques that reduce the computational complexity, e.g., low-complexity features and processing of pixels only in certain regions with high variability. These issues are crucial for the necessary future parallelization of detection algorithms.
Let $V(\tau)$ be a video of duration $\tau$ containing $M$ ground truth vehicles. It can be considered as a sequence of $K$ images or frames indexed by $k = 1, 2, \ldots, K$. Each frame at time $k$ can be seen as a matrix $I_k$ of size $(m \times n)$, where each element is a pixel value represented as $x_k(i,j)$, with $x_k(i,j) \in G^1$, $G^1 \subset \mathbb{R}^1$ for the gray space, and $x_k(i,j) \in C^3$, $C^3 \subset \mathbb{R}^3$ for a 3D color space, for $1 \le i \le m$, $1 \le j \le n$. In this work, we use only the grayscale, and then the image at frame $k$ is expressed as:
$$I_k = \{\, x_k(i,j) \mid x_k(i,j) \in G^1 \,\}$$
and the background as:
$$BG_k = \{\, x_k(i,j) \mid x_k(i,j) \in G^1 \,\}$$
which satisfies some mathematical background criteria.
Based on the aforementioned assumptions, the adaptive GMM [40] was selected to segment the vehicles from the background mask. Each pixel in the image is modeled through a mixture of $Z$ Gaussian distributions. The probability that a certain pixel has a value $x$ at time $k$ can be written as:
$$P(x_k) = \sum_{z=1}^{Z} \omega_{z,k} \cdot \eta(x_k, \mu_{z,k}, \Sigma_{z,k}),$$
where $\omega_{z,k}$ is an estimate of the weight of the $z$-th Gaussian in the mixture at time $k$, and $\eta$ is an $n$-dimensional Gaussian probability density function with mean $\mu$ and covariance matrix $\Sigma$:
$$\eta(x_t, \mu_{k,t}, \Sigma_{k,t}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_{k,t}|^{1/2}}\; e^{-\frac{1}{2}(x_t - \mu_{k,t})^{\top} \Sigma_{k,t}^{-1} (x_t - \mu_{k,t})}.$$
Each pixel value $x_k(i,j)$ at position $(i,j)$ and frame $k$ that does not match the background $BG_k$ is used to construct the foreground $B_k$:
$$B_k = \{\, x_k(i,j) \mid \text{the difference } |I_k - BG_k| \text{ is significant} \,\}$$
After that, a connected-components analysis is performed to group the pixels that model possible vehicles embedded in the input video; these groups are called blobs in the literature. If a frame $k$ contains $L$ groups of possible vehicles or blobs, $\mathrm{blob}_l^k$, then:
$$\mathrm{blob}_l^k = \{\, x_k(i,j) \mid \text{pixel } (i,j) \text{ is connected to pixel } (r,s),\ \text{and } \mathrm{blob}_l^k \subset B_k \,\} \quad \text{for } l = 1, \ldots, L$$
Note that the variable $l$ indexes a possible vehicle and the index $k$ its temporal behavior or frame. Then, for the video $V(\tau)$, $l = \{1, 2, \ldots, N\}$, where $N = M$ in the ideal case. Any blob is denoted by $\mathrm{blob}$, a specific blob indexed by $l$ is denoted by $\mathrm{blob}_l$, and its temporal instances by $\mathrm{blob}_l^k$.
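To make the detection stage concrete, the following Python sketch shows how a foreground mask $B_k$ and its blobs can be obtained with an adaptive GMM. It is illustrative only: the paper's implementation is in MATLAB and follows Stauffer-Grimson [40], whereas OpenCV's MOG2 is a later variant of the adaptive GMM, and the parameter values and the minimum-area filter are assumptions.

```python
import cv2

# Adaptive-GMM background model (OpenCV's MOG2 variant of [40]); parameters are illustrative.
mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)

def detect_blobs(frame_bgr, roi_mask=None, min_area=150):
    """Return a list of (area, (x, y, w, h), (cx, cy)) blobs for one frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)    # the paper works in grayscale
    fg = mog.apply(gray)                                   # foreground mask B_k
    if roi_mask is not None:
        fg = cv2.bitwise_and(fg, roi_mask)                 # restrict processing to the ROI
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(fg)
    blobs = []
    for l in range(1, n):                                  # label 0 is the background
        area = int(stats[l, cv2.CC_STAT_AREA])
        if area < min_area:                                # drop small noise components
            continue
        bbox = (int(stats[l, cv2.CC_STAT_LEFT]), int(stats[l, cv2.CC_STAT_TOP]),
                int(stats[l, cv2.CC_STAT_WIDTH]), int(stats[l, cv2.CC_STAT_HEIGHT]))
        blobs.append((area, bbox, (float(centroids[l][0]), float(centroids[l][1]))))
    return blobs
```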

3.3. Feature Extraction

In our case, the blobs are extracted from the foreground mask, and binary morphological operations (erosion and dilation) are performed to reduce noise and enhance the geometry and shape of the objects. Next, blob analysis is used to extract geometric features such as area (the sum of the connected pixels, or spatial occupancy), height, width, and centroid of the bounding box; see Figure 3. Finally, if we select $d$ features as explained in Section 3.6, each blob is mapped to a new point or vector $x \in F^d$, where $F^d \subset \mathbb{R}^d$ is a new space, before occlusion handling, in which the vehicle models live.
It is important to observe the following notation. Any moving vehicle is referred to as $x \in F^d$, its temporal instance at time or frame $k$ as $x_k \in F^d$, a specific vehicle indexed by $l$ as $x^l \in F^d$, and its temporal instances as $x_k^l \in F^d$.
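A minimal sketch of this step, assuming OpenCV and NumPy, is given below; the kernel size and the choice of an opening followed by a closing are assumptions, not the paper's exact morphological settings, and `blob_to_feature` is a hypothetical helper name.

```python
import cv2
import numpy as np

KERNEL = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

def clean_mask(fg_mask):
    """Erosion followed by dilation (opening) to remove noise, then closing to fill small holes."""
    opened = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, KERNEL)
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, KERNEL)

def blob_to_feature(area, bbox):
    """Map a blob to the 3D point x = (area, width, width/height) used later for classification."""
    _, _, w, h = bbox
    return np.array([float(area), float(w), w / max(h, 1)], dtype=float)
```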

3.4. Occlusion Handling

Due to camera position and height, occlusion occurs, and several errors are generated during the detection stage. The major task of any occlusion-handling algorithm in these scenarios is to minimize effects of the occlusion caused by large vehicles due to the high variance of their feature values. Therefore, we propose a simple algorithm to reduce these occlusion effects. This algorithm is based on the following assumptions:
  • The width of a vehicle cannot be greater than the width of one lane, except when it is a large vehicle that is completely inside the ROI (due to perspective effects), i.e.,:
    if $(w_b / w_{lane} > Th_1)$ and $(\bar{a} < Th_m)$ $\Rightarrow$ Occlusion
  • The width of a vehicle that is before the detection line cannot be greater than the width of two lanes, even if it is a large vehicle, i.e.,:
    if $(w_b / w_{lane} > Th_2)$ and (blob is before $D$) $\Rightarrow$ Occlusion
    where $w_b$ is the vehicle width (bounding box width), $w_{lane}$ is the lane width, $\bar{a}$ is the normalized area, $D$ is the detection line, and $Th_1$, $Th_2$, and $Th_m$ are thresholds with values 1.22, 2.27, and 0.12, respectively. The threshold values were selected using a training video with occluded vehicles; the values that increased the detection rate were chosen. If at least one case is fulfilled (Figure 4a,c), then we use the lane-dividing lines to separate vehicles traveling side by side that are detected as a single object.

3.4.1. Algorithm for Occlusion Handling Based on Lane Division

Inputs: $D$, $B_k$, $\{c_{m,k};\ m = 1, \ldots, M\}$, $\{a_{m,k};\ m = 1, \ldots, M\}$, $\{b_{m,k};\ m = 1, \ldots, M\}$, and $\{L_j;\ j = 1, \ldots, J\}$
Outputs: $B'_k$, $\{c_{m,k};\ m = 1, \ldots, M\}$, $\{a_{m,k};\ m = 1, \ldots, M\}$, and $\{b_{m,k};\ m = 1, \ldots, M\}$,
where $D$ is the detection line, $B_k$ is the foreground mask in frame $k$, $L_j$ is the $j$-th lane-dividing line, $c_{m,k}$, $a_{m,k}$, and $b_{m,k}$ are the central point, area, and bounding box of vehicle $m$ in frame $k$, and $B'_k$ is the updated foreground mask.
For each blob $\mathrm{blob}_l^k$ of $B_k$:
Step 1: Find $L_j$ and $L_{j+1}$ for $c_{m,k}$ (Figure 5).
Step 2: Estimate the lane width at point $c_{m,k}$ as follows:
$$w_{lane_{j,k}}(c_{m,k}) = |\, x_{L_j}(y_c) - x_{L_{j+1}}(y_c) \,|,$$
where $x_{L_j}(y_c)$ is the abscissa of the point on the $j$-th lane-dividing line with $y_c$ as the ordinate (Figure 5).
Step 3: Compute the normalized area as follows:
$$\bar{a}_{m,k} = \frac{a_{m,k}}{w_{lane_{j,k}}^2(c_{m,k})}$$
Step 4: Check whether there is occlusion using Equations (3) and (4). If at least one case is fulfilled, then draw:
$$\begin{cases} L_{j+1} & \text{if } d(c_{m,k}, L_j) > d(c_{m,k}, L_{j+1}) \\ L_j & \text{otherwise} \end{cases}$$
where $d(c_{m,k}, L_j)$ and $d(c_{m,k}, L_{j+1})$ are defined as follows:
$$d(c_{m,k}, L_j) = |\, x_{L_j}(y_c) - x_c \,|,$$
$$d(c_{m,k}, L_{j+1}) = |\, x_{L_{j+1}}(y_c) - x_c \,|.$$
Step 5: If all blobs have been analyzed and at least one lane-dividing line has been drawn, then extract the features, update the space $B'_k$, and end the algorithm. Otherwise, go to Step 1.
The occlusion handling algorithm assumes a static camera and prior initialization of the system, i.e., the lane-dividing lines must be defined. If the camera changes its position, it is considered another scenario, and the system must be initialized again. Vehicles are detected in an area of approximately 5380 ft²; once an object is detected, the occlusion handling algorithm begins to work.
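A minimal Python sketch of this lane-division step follows, under the assumptions that each lane-dividing line is stored as two image-coordinate endpoints, that the blob centroid lies between the first and last lines, and that "before the detection line" means the blob lies above the line's y coordinate (this depends on the camera orientation); boundary cases are not handled.

```python
import cv2

TH1, TH2, THM = 1.22, 2.27, 0.12          # thresholds from Section 3.4

def x_on_line(line, y):
    """Abscissa of a lane-dividing line (given as two endpoints) at ordinate y."""
    (x0, y0), (x1, y1) = line
    t = (y - y0) / float(y1 - y0)
    return x0 + t * (x1 - x0)

def handle_occlusion(fg_mask, blob, lane_lines, detection_y):
    """Apply criteria (3)/(4); if the blob is too wide, erase the closer lane line through the mask."""
    area, (bx, by, w, h), (cx, cy) = blob
    xs = [x_on_line(line, cy) for line in lane_lines]      # abscissas at the centroid ordinate
    j = max(i for i, x in enumerate(xs) if x <= cx)        # left bounding line L_j
    w_lane = abs(xs[j] - xs[j + 1])                        # lane width at the centroid
    a_norm = area / (w_lane ** 2)                          # normalized area
    before_d = (by + h) < detection_y                      # assumed meaning of "before D"
    occluded = (w / w_lane > TH1 and a_norm < THM) or (w / w_lane > TH2 and before_d)
    if occluded:
        # erase the lane-dividing line closer to the centroid to split the merged blob
        k = j + 1 if abs(xs[j] - cx) > abs(xs[j + 1] - cx) else j
        p0, p1 = lane_lines[k]
        cv2.line(fg_mask, tuple(map(int, p0)), tuple(map(int, p1)), color=0, thickness=2)
    return occluded
```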
Challenge: The challenge of any occlusion-handling algorithm in these scenarios is to minimize the effects of occlusion caused by large vehicles due to the high variance of their feature values, delivering a uniform space, which will be the input space for the classification stage.
At this point, we have the new vehicle space $B'_k$, expressed as:
$$B'_k = \{\, x^{s} \mid x^{s} \in F^d,\ \text{and } x^{s} = x_k^l \text{ when there is no occlusion} \,\}$$
where the index $s = \{1, 2, \ldots, S\}$ and $S$ is the number of vehicles after occlusion handling, i.e., $S \ge N$.

3.4.2. Vehicle Occlusion Index

Occlusion is an open issue in this area. Some authors classify it into total and partial, and some measurements based on area are given. For vehicle traffic surveillance, under the assumption that the detection algorithm performs well, it is important to know how frequent occlusion is and how well the occlusion algorithm performs its function. As occlusion occurs in short time intervals, the measurements should be made over such intervals. For these purposes, we introduce here a Vehicle Occlusion Index (VOI).
The VOI is defined as the ratio of the number of new vehicles detected using the occlusion algorithm to the total number of new vehicles detected during a time interval:
$$VOI_{\tau} = \frac{\text{number of new vehicles detected by the occlusion algorithm}}{\text{total number of new vehicles detected}},$$
where $\tau$ is the time interval. A $VOI_{\tau} = 0$ indicates that no new vehicles were detected by the occlusion algorithm, or that occlusion was not present in the time interval, while a $VOI_{\tau} = 1$ indicates that all new vehicles detected in the interval were detected by the occlusion algorithm and were tracked and counted as well. The VOI versus time is a measure of how frequently occlusion is present. Table 3 gives the average VOI for the studied videos, while Section 5 discusses the results of the occlusion handling algorithm and the VOI.
Occlusion handling algorithms and occlusion metrics should be studied taking into account the techniques or methods used (e.g., convex regions, SVM classifiers, and geometric feature spaces), their computational complexity, and the classic performance metrics. In addition, they should be tested with long-duration videos at high frame rates, and should be compared with each other.

3.5. Vehicle Tracking

The Kalman filter (KF) is an efficient and well-known recursive filter that estimates the internal state of a linear dynamic system from a series of Gaussian noisy measurements. In mathematical terms, a linear discrete-time dynamical system embodies the following pair of equations [51]:
(1)
Process equation
$$x_k = F\, x_{k-1} + \omega_{k-1},$$
where x is the state vector, F is the transition matrix, and ω is the process noise; the subscript k denotes discrete time instant. The process noise is assumed to be additive, white, and Gaussian, with zero mean and the covariance matrix defined by:
$$E[\omega_n \omega_k^{\top}] = \begin{cases} Q_k & \text{for } n = k \\ 0 & \text{for } n \ne k \end{cases},$$
where the superscript $\top$ denotes matrix transposition.
(2)
Measurement equation
$$z_k = H\, x_k + v_k,$$
where z is the measurement vector, H is the measurement matrix, and v is the measurement noise, which is assumed to be additive, white, and Gaussian, with zero mean and the covariance matrix defined by:
$$E[v_n v_k^{\top}] = \begin{cases} R_k & \text{for } n = k \\ 0 & \text{for } n \ne k \end{cases}$$
Since the frame interval is very short, it is assumed that the moving object moves at constant velocity within a frame interval. The state in frame $k$ can be represented by the vector:
$$x_k = [\, x_{c,k},\ v_{x,k},\ y_{c,k},\ v_{y,k} \,],$$
where $x_{c,k}$, $y_{c,k}$ are the centroid coordinates and $v_{x,k}$, $v_{y,k}$ are the velocity components. The measurement vector of the system can be represented as:
$$z_k = [\, x_{c,k},\ y_{c,k} \,].$$
For the whole video, frame by frame, the blobs $\mathrm{blob}_l^k$, represented as vectors $x_k^l \in B'_k$, are tracked by the corresponding Kalman filters, resulting in vehicle tracking sequences $Ts(x) = \{x_1, x_2, \ldots, x_k\}$ as the output space, where $x$ represents any moving vehicle and the $x_i$ are its instances.
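A minimal constant-velocity Kalman filter for one centroid track is sketched below in Python; note that the paper uses an adaptive Kalman filter [23,24], whereas this sketch keeps Q and R fixed, and the initial covariances are assumptions.

```python
import numpy as np

class ConstantVelocityKF:
    """Track one centroid with state x = [xc, vx, yc, vy] (Section 3.5), fixed Q and R."""
    def __init__(self, x0, y0, q=1e-2, r=1.0):
        dt = 1.0                                            # one frame per step
        self.F = np.array([[1, dt, 0, 0],
                           [0, 1,  0, 0],
                           [0, 0,  1, dt],
                           [0, 0,  0, 1]], dtype=float)     # transition matrix
        self.H = np.array([[1, 0, 0, 0],
                           [0, 0, 1, 0]], dtype=float)      # measurement matrix
        self.Q = q * np.eye(4)                              # process noise covariance
        self.R = r * np.eye(2)                              # measurement noise covariance
        self.P = np.eye(4)                                  # state covariance (assumed init)
        self.x = np.array([x0, 0.0, y0, 0.0])

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[[0, 2]]                               # predicted centroid

    def update(self, z):
        z = np.asarray(z, dtype=float)                      # measured centroid [xc, yc]
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[[0, 2]]                               # filtered centroid
```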

3.6. Feature Selection and Environment for Classification

The detection stage delivers the whole space of tracked objects, i.e., detected vehicles or moving objects, to the classification stage. All object tracking sequences, $Ts(x)$, belong to the input space of the classification stage. As each sequence $Ts(x)$ includes geometric and kinematic features and their temporal behaviors, it is necessary to decide where and/or when the instances are taken for classification. To each moving vehicle $x$ corresponds a temporal sequence $Ts(x) = \{x_1, x_2, \ldots, x_c, \ldots, x_k\}$, where $x_c$ should be a well-defined instance of its class.
As these moving objects or vehicles are detected in different points of the ROI, the behaviors of the features are highly variable, and the most significant geometric feature—the area—is not sufficient for a good classification (see later Section 4.3). Studying other geometric features, such as the width and height of the bounding box, we observed that these showed a lower variance than the area (spatial occupancy). Particularly, these three features presented a very high variance for large vehicles, but a relatively low variance for midsize and small vehicles, see Figure 6.
As a class is a subspace of the input space, and inside each class there are several points, each with several instances, it is necessary to reduce these intra-class differences. Therefore, we propose the following for classification:
  • Instead of a 1D geometric feature space, the use of a 3D geometric feature space $\subset \mathbb{R}^3$. The detected vehicles or blobs are then represented by input points $x \in \mathbb{R}^3$, $x = (\text{Area}, \text{Width}, \text{Width/Height})$.
  • Classification is performed on a specific line of the ROI, called here the classification line, to reduce intra-class differences of the space of tracking sequences $Ts(x)$ (see Figure 7).
  • Reduction of the variation of the feature values of any input point by averaging the feature values of the last three instances (detected at the $k$-th frame after the classification line) and projecting them onto the classification line, i.e., $Proj(x)$; a minimal sketch of this step is given after this list.
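The sketch below illustrates the averaging step, assuming each tracked instance is stored as a dictionary with a centroid ordinate and bounding-box fields and that the vehicle moves toward increasing y (camera dependent); the explicit projection onto the classification line is omitted here.

```python
import numpy as np

def classification_point(track, classification_y):
    """Average the last three instances observed after the track crosses the classification line.
    track: list of {'cy', 'area', 'width', 'height'} dicts in temporal order (assumed layout)."""
    after = [t for t in track if t['cy'] >= classification_y]
    last3 = after[-3:]
    if not last3:
        return None
    area = np.mean([t['area'] for t in last3])
    width = np.mean([t['width'] for t in last3])
    rel = np.mean([t['width'] / t['height'] for t in last3])
    return np.array([area, width, rel])        # x = (Area, Width, Width/Height)
```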
Challenge: The challenge is to find and select significant and/or invariant features for a very high detection rate and precision under different weather conditions and for several scenarios.

3.7. Vehicle Classification

Classification is carried out here with the following combinations of input space and classifier:
  • 1D feature input space and thresholds.
  • 3D feature input space and K-means.
  • 3D feature input space and SVM.
  • 3D feature input space and OC-SVM.
For case 1, once the estimated area has been computed, the vehicles are classified. The decision rule for classification is defined as:
$$\begin{cases} \text{small vehicle} & \text{if } \hat{a}_n < Th_s, \\ \text{midsize vehicle} & \text{if } Th_s \le \hat{a}_n \le Th_m, \\ \text{large vehicle} & \text{otherwise}, \end{cases}$$
where $Th_s$ and $Th_m$ are the thresholds for each class, with values of 0.12 and 1.2, respectively.
For cases 2, 3, and 4, the vehicles are represented by vectors $x \in \mathbb{R}^3$, which are classified through K-means, SVM, and OC-SVM, respectively. In the classification employing the OC-SVM algorithm, a model for each class was defined. OC-SVM makes it possible to account for the different behaviors of the detected blobs belonging to the same class.
OC-SVM [52,53,54,55] maps input data $x_1, \ldots, x_N \in A$ into a high-dimensional space $F$ (via a kernel $k(x,y)$) and finds the maximal-margin hyperplane that best separates the training data from the origin. To do this, the following quadratic program must be solved [52]:
$$\min_{w \in F,\ b,\ \xi \in \mathbb{R}^N} \ \frac{1}{2}\|w\|^2 + \frac{1}{\upsilon N}\sum_{i}^{N} \xi_i - b,$$
subject to $(w \cdot \varphi(x_i)) \ge b - \xi_i$, $\xi_i \ge 0$, $\upsilon \in (0,1]$, where $w$ is the normal vector, $\varphi$ is a map function $A \rightarrow F$, $b$ is the bias, the $\xi_i$ are nonnegative slack variables, $\upsilon$ is the outlier control parameter, and $k(x,y) = \langle \varphi(x), \varphi(y) \rangle$. The problem is solved through a kernel function and Lagrange multipliers $\alpha_i$, and the solution returns the decision function:
$$f(x) = \operatorname{sgn}\!\left( \sum_{i}^{N} \alpha_i\, k(x_i, x) - b \right)$$
where $w = \sum_i \alpha_i \varphi(x_i)$ and $\sum_i \alpha_i = 1$. The kernel function used in this paper is the RBF, $k(x,y) = e^{-\eta \|x-y\|^2}$.
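As an illustration of the classification stage, the sketch below trains one OC-SVM with an RBF kernel per vehicle class and assigns a new point to the class whose model yields the largest decision value. The paper uses LIBSVM from MATLAB; scikit-learn's OneClassSVM wraps the same library, but the maximum-decision-value rule for combining the three one-class models and the particular (gamma, nu) values are assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_class_models(train_features, gamma=10.5, nu=0.01):
    """Fit one OC-SVM (RBF kernel) per class on that class's training vectors x = (area, width, w/h).
    train_features: {'small': ndarray (n, 3), 'midsize': ..., 'large': ...}."""
    return {cls: OneClassSVM(kernel='rbf', gamma=gamma, nu=nu).fit(X)
            for cls, X in train_features.items()}

def classify(models, x):
    """Assign x to the class whose model gives the largest signed distance from its boundary."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    scores = {cls: m.decision_function(x)[0] for cls, m in models.items()}
    return max(scores, key=scores.get)
```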
Challenge: The challenge in the classification is to find mathematical classifiers of the hypothesis set that allow mapping every point of the input space to the corresponding classes of the output space with minimal error.

4. Experimental Results

4.1. Video Processing: Test Environment

In this work, the performance of the proposed system was tested on real traffic videos: three videos, V1, V2, and V3, recorded in Guadalajara, Mexico; two videos (V4, V5) obtained from the GRAM Road-Traffic Monitoring (GRAM-RTM) dataset [56,57] (V4 corresponds to the video named M-30, and V5 to the video named M-30-HD); and three videos (V6, V7, and V8) recorded on Britain's M6 motorway (see [58]).
The resolution of all videos was reduced to 420 × 240 pixels at 25 frames per second, and downsampling was performed to decrease the computation time. The camera's field of view was directly ahead of the vehicles. Videos V1, V2, and V3 were recorded with a cell phone at a height of 19.5 ft above the road. These videos contain double-trailer traffic, which is not present in the other videos; in addition, there is quite a bit of vibration. All image frames were visually inspected to provide the ground truth (GT) dataset for evaluation purposes.
Table 3 shows the number of frames in each video, the traffic load, and the place and weather conditions. The dataset comprises more than 61 min of video, 4111 ground truth vehicles, three places in different countries with different weather conditions, a traffic load of up to 1.32 vehicles/s with traffic load peaks from 2 to 4 vehicles/s (see Figure 8), and a vehicle occlusion index (VOI) from 0.00 to 0.312.
The system was implemented in MATLAB and tested on an Intel Core i7 PC with a 3.40 GHz CPU and 16 GB of RAM. The metrics used to characterize the system performance are the same in the different stages, i.e.:
$$\mathrm{Detection\ rate\ (Recall)} = \frac{TP}{TP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$F\text{-}\mathrm{measure} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$
where TP, FP and FN have different interpretations depending on the stage where they are used. In the detection stage:
  • GT in the video is the ground truth or input space,
  • TP is the number of vehicles successfully detected,
  • FP is the number of false vehicles detected as vehicles,
  • FN is the number of vehicles not detected,
  • GT′ is the output space, or the set of all points detected as moving vehicles; thus GT′ can be larger than GT.
In the classification stage, for the classes S (small), M (midsize), and L (large):
  • $GT$ is now the new input space for classification,
  • $TP(\text{class } i)$ is the number of vehicles classified into the correct class $i$,
  • $FP(\text{class } i)$ is the number of vehicles classified into class $i$ that belong to another class $j$, $j \ne i$,
  • $FN(\text{class } i)$ is the number of vehicles of class $i$ classified into another class $j$, $j \ne i$.
For $M$ classes:
$$GT(\text{class } i) = TP(\text{class } i) + FN(\text{class } i)$$
Any point $x \in FN(\text{class } i)$ will be classified into another class $j$, $j \ne i$; this point will then be counted as $FP(\text{class } j)$, and consequently:
$$FN(\text{class } i) = \sum_{\substack{j=1 \\ j \ne i}}^{M} FP_i(\text{class } j),$$
where $FP_i(\text{class } j)$ are the elements of class $i$ classified as belonging to class $j$, $j \ne i$:
$$\sum_{i=1}^{M} FN(\text{class } i) = \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \ne i}}^{M} FP_i(\text{class } j)$$
Consequently, for each class $i$ we have the associated metrics, e.g., $DR(\text{class } i)$, $Precision(\text{class } i)$, and $F\text{-}measure(\text{class } i)$, which generally take different numerical values (see Table 4, class S, M, or L of any video):
$$DR(\text{class } i) \ne Precision(\text{class } i) \ne F\text{-}measure(\text{class } i).$$
However, for the classifier over all classes we have:
$$TP(\text{all classes}) = \sum_{i=1}^{M} TP(\text{class } i)$$
$$FN(\text{all classes}) = \sum_{i=1}^{M} FN(\text{class } i)$$
$$FP(\text{all classes}) = \sum_{i=1}^{M} FP(\text{class } i)$$
and from Equations (23) and (24):
$$FN(\text{all classes}) = FP(\text{all classes}).$$
Then the following metrics, although with different physical meanings, are numerically equal to each other (Equations (19), (20) and (21); see Table 5, for all classes of any video):
$$DR(\text{all classes}) = Precision(\text{all classes}) = F\text{-}measure(\text{all classes})$$
The most significant metrics are the detection rate (recall) for the detection stage and the F-measure for the classification stage, because the latter works on the complete input space for these scenarios, i.e., the space including TP, FP, and FN (see Equations (19), (20) and (21)).
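The per-class and global metrics can be computed directly from a confusion matrix, as in the short sketch below (rows are taken as the true class and columns as the predicted class, which is an assumption about the matrix orientation; the F-measure is unaffected by it). Applied to the OC-SVM confusion matrix of Table A2(d), it yields a global value of (7 + 2298 + 137)/2487 ≈ 98.19% and a midsize F-measure of about 99.05%, matching the values reported in the abstract.

```python
import numpy as np

def classification_metrics(cm, labels=('S', 'M', 'L')):
    """cm[i, j]: number of class-i vehicles assigned to class j. Returns per-class and global metrics."""
    cm = np.asarray(cm, dtype=float)
    out = {}
    for i, name in enumerate(labels):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp                 # class-i vehicles sent to other classes
        fp = cm[:, i].sum() - tp                 # other-class vehicles sent to class i
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
        out[name] = {'recall': recall, 'precision': precision, 'f_measure': f}
    # over all classes FN = FP, so recall, precision and F-measure coincide
    out['all classes'] = cm.trace() / cm.sum()
    return out

# Example: OC-SVM confusion matrix of Table A2(d)
print(classification_metrics([[7, 3, 0], [13, 2298, 25], [1, 3, 137]]))
```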

4.2. Vehicle Detection Results

Table 4 shows the experimental results of the detection stage using the occlusion algorithm. The detection stage without the occlusion-handling algorithm has a detection rate of 83.793% (see Table A1), while using the occlusion-handling algorithm improves the detection rate by 11.423 percentage points, to 95.216%. During the detection stage of these videos, a very strong correlation was found between the F-measure and the measured VOI.
False positives are produced by various conditions: camera locations with high vibration, the camera angle, certain morphological operations embedded in the detection algorithm, and the fact that the occlusion algorithm divides large blobs into two or more smaller ones, some of which are not vehicles, i.e., FP. In particular, videos V1, V2, and V3 were recorded in Mexico, where very large vehicles may transit, and the camera locations showed high vibration. Videos V4 and V5 were recorded in Madrid, Spain, and show a VOI equal to 0 and the lowest FP numbers, while V6, V7, and V8, with a VOI close to 0.2, showed results considered normal. These results show that it is necessary to improve the implemented occlusion handling algorithm, using other methods such as the convexity of the blobs and techniques such as K-means and SVM.

4.3. Vehicle Classification Results

The LIBSVM library [59] was used to implement the OC-SVM and SVM classification with an RBF kernel. Additionally, the K-means algorithm was implemented for comparison purposes. Figure 9 shows one example of each vehicle class.
Table 5 shows the experimental results of the classification stage (with occlusion handling in the detection stage) using OC-SVM and the three selected features (area, width, relHW), where S, M, and L denote small, midsize, and large vehicles, respectively.
Table 6 shows the experimental results of videos V6, V7, and V8 in the classification stage (with occlusion handling in the detection stage) using the thresholds, K-means, SVM and OC-SVM and the three selected features (area, width, relHW), where S, M, and L are small, midsize and large vehicles, respectively.
Experimental results show that the performance of the classifiers increases when using three geometric features. In addition, SVM and OC-SVM classifiers have better performance than K-means. By using a single geometric feature, e.g., area, the recall and particularly the F-measure were 77.322%. However, using 3D feature input space and OC-SVM, the F-measure achieved a value of 98.190%.

5. Discussion

5.1. Test Environment

The test environment comprises eight videos with 4111 manually labelled ground truth vehicles and a total duration of more than 61 min, three places in different countries under different weather conditions, a mean traffic load of up to 1.32 vehicles/s with traffic load peaks from 2 to 4 vehicles/s (see Figure 8), and a vehicle occlusion index of up to 0.312. The system performs well and in real time under all these scenarios.

5.2. Occlusion Handling Algorithm and VOI-Index

As multiple vehicles may be detected as one due to perspective effects or shadows, an algorithm to reduce this occlusion was implemented. This algorithm improves the detection rate from 83.793% to 95.216% (see details in Table A1). False positives are produced by various conditions: camera locations with high vibration, the camera angle, certain morphological operations embedded in the detection algorithm, and the fact that the occlusion algorithm divides large blobs into two or more smaller ones, some of which are not vehicles; see Section 4.2 for details about videos V1–V8. From Table 3 and Table 4 we can conclude that a VOI of 0 does not mean that the number of FN is 0, but rather indicates that the algorithm for detecting moving vehicles should be improved.

5.3. Clustering Analysis

Clustering analysis, e.g., K-means, SVM, OC-SVM, was employed to classify the vehicles into three classes: small, midsize, and large. The use of these algorithms in the classification stage allows considering all variations in the geometric vehicle features observed in the training data.

5.4. SVM and OC-SVM

SVM and OC-SVM were the classifiers with the best performance; OC-SVM achieved a global recall and F-measure of up to 98.525%, and an F-measure of 99.211% for midsize vehicles in video V6. The authors consider that the performance differences between SVM and OC-SVM are due to the parameters selected. In this work, the values of the parameters C and η used to evaluate the SVM classifier were {1, 5, 36} and {0.5, 0.65, 0.95}, respectively. The parameter values used to evaluate OC-SVM, i.e., η and υ, were {1, 10.5, 15} and {0.001, 0.01, 0.1}, respectively. The misclassification cases were due to unsolved occlusions in the detection stage, particularly in cases where the vehicles move bumper-to-bumper. In future work, we will consider improving detection with a more efficient occlusion algorithm and other methods for background formation.
Behavior under variations in the perspective view can be observed in videos V2 and V3: although the camera position changed by 20 ft, only the models generated from video V2 were used in the classification stage of both videos, indicating that the algorithm is robust to a certain lateral displacement of the camera. In the K-means algorithm, the value of K was 3. Due to the small amount of training data for small vehicles, the K-means centroids may be biased; thus, the mean of each geometric feature was computed beforehand, and this information was passed as input to the K-means algorithm.

5.5. 3-D Geometric Feature Space

With the use of the area, width, and width/height ratio of the bounding box, the classification performance was improved with respect to using only one feature, the area (see Table 6). The geometric features are extracted directly from the detected blobs; therefore, the computational cost is lower than that of other features proposed in the state of the art, such as grey-level co-occurrence matrix features, texture coarseness, or Histogram of Oriented Gradients.

5.6. Real Time Processing

The average time to process one image frame in our system is less than 30 ms, which shows that our approach can run in real time for videos at 25 fps with an average traffic load of 1.32 vehicles per second and peaks of 4 vehicles per second. In general, the higher the traffic load, particularly with large vehicles, the higher the measured congestion as captured by the vehicle occlusion index.
In this paper, a high-performance computer vision system is proposed for vehicle detection, tracking, and OC-SVM classification, which has the following advantages:
  • For the GMM-based detection stage, the system does not require sample training or camera calibration.
  • Except for the ROI, the lane-dividing lines, the detection line, and the classification line, it requires no other initialization.
  • A proposed simple algorithm reduces occlusions, particularly in those cases where vehicles move side by side.
  • The use of OC-SVM and a 3D geometric feature space for the classification stage.

6. Conclusions

A very high-performance vision system with a single static camera, suitable for an IoT Smart City, for front- and rear-view moving vehicle detection, tracking, counting, and classification was achieved, implemented, and tested. The number and quality of employed metrics outperforms those used in most comparable papers.
The vehicle occlusion index defined here is a measure of how frequent the occlusion is, and how well the occlusion-handling algorithm performs its function. Our results support that the lower the VOI-Index, the better the performance of the algorithms for detection and classification.
Experimental results showed that our system performs well in real time with an average traffic flow of 1.32 vehicles per second and traffic load peaks from 2 to 4 vehicles/s on a three-lane road. A mean processing time of about 75% of the interval between two consecutive frames was achieved. The best classifiers were SVM-based; OC-SVM with an RBF kernel successfully classified the vehicles with high performance, e.g., recall, precision, and F-measure of up to 98.190%, and up to 99.051% for the midsize class.
The high performance of this system is due to the use of a 3D geometric feature space with side-occlusion handling as the output space of the detection stage (and input feature space for classification), the use of OC-SVM with an RBF kernel in the classification stage, and the fact that classification is performed on a specific line of the ROI to reduce intra-class differences of the input space.
Finally, an extensive test environment is available for researchers. It has eight videos with 4111 manually labelled ground truth vehicles and a duration of more than 61 min, three places in different countries and under different weather conditions, a mean traffic load of up to 1.32 vehicles/s with traffic load peaks from 2 to 4 vehicles/s (see Figure 8), and a vehicle occlusion index of up to 0.312.
Open Issues remaining after this study include:
  • Developing algorithms for background formation in different color spaces, and for background updating, is crucial for the different stages of traffic surveillance.
  • Develop algorithms for automatic detection of the ROI and the lane-dividing lines.
  • Improve algorithms for occlusion caused by high traffic loads, particularly for large vehicles, to increase the detection rate and, consequently, decrease variance of the values of points belonging to the input space for tracking and classification, and to characterize the occlusion by metrics.
  • Due to the number of features associated with this problem and the variance of intra-class and interclass feature values, the determination of the optimal number of classes for classification remains an open issue.

Acknowledgments

This work is supported partially by an Intel Grant and by CONACYT project ID 253955. The authors would like to acknowledge the financial support of Intel Corporation for the development of this project. The authors acknowledge YouTube user DriveCamUK for the video that was analyzed in this work, https://www.youtube.com/watch?v=PNCJQkvALVc. The authors also acknowledge the GRAM Road-Traffic Monitoring (GRAM-RTM) dataset, http://agamenon.tsc.uah.es/Personales/rlopez/data/rtm/.

Author Contributions

Roxana Velazquez-Pupo and Alberto Sierra-Romero conceived and designed the experiments for the detection, occlusion handling, and tracking stages; Jayro Santiago-Paz designed the classification stage and performed the experiments; all the authors analyzed the data and wrote the paper under the guidance of Deni Torres-Roman and Yuriy V. Shkvarko; David Gómez-Gutiérrez supported some technical aspects of the project; and Daniel Robles-Valdez, Fernando Hermosillo-Reynoso, and Misael Romero-Delgado assisted with aspects related to testing and certain improvements of the algorithms.

Conflicts of Interest

The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Appendix A

Links to the video processing files uploaded to YouTube.

Appendix B

Table A1 compares the detection-stage results with and without the occlusion algorithm for videos V6, V7, and V8.
Table A1. Experimental results of the detection stage of videos V6, V7, and V8.
Test | Video | GT | TP | FP | FN | Detection Rate (%) | Precision (%) | F-Measure (%)
Without occlusion handling | V6 | 797 | 653 | 6 | 144 | 81.932 | 99.089 | 89.697
 | V7 | 725 | 624 | 12 | 101 | 86.069 | 98.113 | 91.697
 | V8 | 903 | 755 | 16 | 148 | 83.610 | 97.924 | 90.203
 | Total | 2425 | 2032 | 34 | 393 | 83.793 | 98.354 | 90.492
With occlusion handling | V6 | 797 | 761 | 53 | 36 | 95.483 | 93.488 | 94.475
 | V7 | 725 | 686 | 43 | 39 | 94.620 | 94.101 | 94.360
 | V8 | 903 | 862 | 82 | 41 | 95.459 | 91.313 | 93.340
 | Total | 2425 | 2309 | 178 | 116 | 95.216 | 92.842 | 94.014
Table A2 shows the confusion matrices obtained in the classification stage for videos V6, V7, and V8; (a–d) are the confusion matrices of the threshold, K-means, SVM, and OC-SVM methods, respectively.
Table A2. Confusion matrices of the classification stage of videos V6, V7, and V8.
(a) Threshold
 | S | M | L | T
S | 9 | 1 | 0 | 10
M | 434 | 1875 | 27 | 2336
L | 40 | 62 | 39 | 141
T | | | | 2487

(b) K-Means
 | S | M | L | T
S | 10 | 0 | 0 | 10
M | 246 | 2079 | 11 | 2336
L | 1 | 23 | 117 | 141
T | | | | 2487

(c) SVM
 | S | M | L | T
S | 16 | 0 | 0 | 16
M | 99 | 2214 | 20 | 2333
L | 1 | 4 | 133 | 138
T | | | | 2487

(d) OC-SVM
 | S | M | L | T
S | 7 | 3 | 0 | 10
M | 13 | 2298 | 25 | 2336
L | 1 | 3 | 137 | 141
T | | | | 2487

References

  1. Sivaraman, S.; Trivedi, M.M. Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1773–1795. [Google Scholar] [CrossRef]
  2. Liu, X.; Dai, B.; He, H. Real-time on-road vehicle detection combining specific shadow segmentation and SVM classification. In Proceedings of the 2011 Second International Conference on Digital Manufacturing and Automation (ICDMA), Zhangjiajie, China, 5–7 August 2011; pp. 885–888. [Google Scholar] [CrossRef]
  3. Fang, S.; Liao, H.; Fei, Y.; Chen, K.; Huang, J.; Lu, Y.; Tsao, Y. Transportation modes classification using sensors on smartphones. Sensors 2016, 16, 1324. [Google Scholar] [CrossRef] [PubMed]
  4. Oh, S.; Kang, H. Object detection and classification by decision-level fusion for intelligent vehicle systems. Sensors 2017, 17, 207. [Google Scholar] [CrossRef] [PubMed]
  5. Llorca, D.; Sánchez, S.; Ocaña, M.; Sotelo, M. Vision-based traffic data collection sensor for automotive applications. Sensors 2010, 10, 860–875. [Google Scholar] [CrossRef] [PubMed]
  6. Xu, Y.; Yu, G.; Wang, Y.; Wu, X.; Ma, Y. A hybrid vehicle detection method based on viola-jones and HOG + SVM from UAV images. Sensors 2016, 16, 1325. [Google Scholar] [CrossRef] [PubMed]
  7. Cao, X.; Wu, C.; Yan, P.; Li, X. Linear SVM classification using boosting HOG features for vehicle detection in low-altitude airborne videos. In Proceedings of the 2011 18th IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, 11–14 September 2011; pp. 2421–2424. [Google Scholar] [CrossRef]
  8. Lamas-Seco, J.; Castro, P.; Dapena, A.; Vazquez-Araujo, F. Vehicle classification using the discrete fourier transform with traffic inductive sensors. Sensors 2015, 15, 27201–27214. [Google Scholar] [CrossRef] [PubMed]
  9. Zhou, F.; Wang, M. A new SVM algorithm and AMR sensor based vehicle classification. In Proceedings of the Second International Conference On Intelligent Computation Technology and Automation, Changsha, China, 10–11 October 2009; pp. 421–425. [Google Scholar] [CrossRef]
  10. Zhang, C.; Chen, Y. The research of vehicle classification using SVM and KNN in a ramp. In Proceedings of the International Forum on Computer Science-Technology and Applications, Chongqing, China, 25–27 December 2009; pp. 391–394. [Google Scholar] [CrossRef]
  11. Lipton, A.J.; Fujiyoshi, H.; Patil, R.S. Moving target classification and tracking from real-time video. In Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, Princeton, NJ, USA, 19–21 October 1998; pp. 8–14. [Google Scholar] [CrossRef]
  12. Cucchiara, R.; Piccardi, M.; Mello, P. Image analysis and rule-based reasoning for a traffic monitoring system. Intell. Transp. Syst. 2000, 1, 119–130. [Google Scholar] [CrossRef]
  13. Zhang, G.; Avery, R.; Wang, Y. Video-based vehicle detection and classification system for real-time traffic data collection using uncalibrated video cameras. Transp. Res. Board 2007, 1993, 138–147. [Google Scholar] [CrossRef]
  14. Gupte, S.; Masoud, O.; Martin, R.F.; Papanikolopoulos, N.P. Detection and classification of vehicles. Intell. Transp. Syst. IEEE Trans. 2002, 3, 37–47. [Google Scholar] [CrossRef]
  15. Nagai, A.; Kuno, Y.; Shirai, Y. Surveillance system based on spatio-temporal information. In Proceedings of the IEEE International Conference on Image Processing, Lausanne, Switzerland, 19 September 1996; pp. 593–596. [Google Scholar] [CrossRef]
  16. Xu, T.; Liu, H.; Qian, Y.; Zhang, H. A novel method for people and vehicle classification based on Hough line feature. In Proceedings of the International Conference on Information Science and Technology (ICIST), Nanjing, China, 26–28 March 2011; pp. 240–245. [Google Scholar] [CrossRef]
  17. Kafai, M.; Bhanu, B. Dynamic bayesian networks for vehicle classification in video. IEEE Trans. Ind. Inform. 2012, 8, 100–109. [Google Scholar] [CrossRef]
  18. Hu, W.; Tan, T.; Wang, L.; Maybank, S. A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. Part C 2004, 34, 334–352. [Google Scholar] [CrossRef]
  19. Bottino, A.; Garbo, A.; Loiacono, C.; Quer, S. Street viewer: An autonomous vision based traffic tracking system. Sensors 2016, 16, 813. [Google Scholar] [CrossRef] [PubMed]
  20. Hsieh, J.-W.; Yu, S.-H.; Chen, Y.-S.; Hu, W.-F. Automatic traffic surveillance system for vehicle tracking and classification. IEEE Trans. Intell. Transp. Syst. 2006, 7, 175–187. [Google Scholar] [CrossRef]
  21. Pham, H.V.; Lee, B.-R. Front-view car detection and counting with occlusion in dense traffic flow. Int. J. Control Autom. Syst. 2015, 13, 1150–1160. [Google Scholar] [CrossRef]
  22. Li, X.; Wang, K.; Wang, W.; Li, Y. A multiple object tracking method using Kalman filter. In Proceedings of the 2010 IEEE International Conference on Information and Automation, Harbin, China, 20–23 June 2010; pp. 1862–1866. [Google Scholar] [CrossRef]
  23. Weng, S.-K.; Kuo, C.-M.; Tu, S.-K. Video object tracking using adaptive Kalman filter. J. Vis. Commun. Image Represent. 2006, 17, 1190–1208. [Google Scholar] [CrossRef]
  24. Li, N.; Liu, L.; Xu, D. Corner feature based object tracking using adaptive Kalman filter. In Proceedings of the 9th International Conference on Signal Processing ICSP, Beijing, China, 26–29 October 2008; pp. 1432–1435. [Google Scholar] [CrossRef]
  25. De Oliveira, A.B.; Scharcanski, J. Vehicle counting and trajectory detection based on particle filtering. In Proceedings of the 23rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Gramado, Brazil, 30 August–3 September 2010; pp. 376–383. [Google Scholar]
  26. Ranga, H.T.P.; Kiran, M.R.; Shekar, S.R.; Kumar, S.K.N. Vehicle detection and classification based on morphological technique. In Proceedings of the International Conference on Signal and Image Processing (ICSIP), Chennai, India, 15–17 December 2010; pp. 45–48. [Google Scholar] [CrossRef]
  27. Gupte, S.; Masoud, O.; Papanikolopoulos, P. Vision-based vehicle classification. In Proceedings of the IEEE Intelligent Transportation Systems, Dearborn, MI, USA, 1–3 October 2000; pp. 46–51. [Google Scholar] [CrossRef]
  28. Liu, Y.; Wang, K. Vehicle classification system based on dynamic Bayesian network. In Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI), Qingdao, China, 8–10 October 2014; pp. 22–26. [Google Scholar] [CrossRef]
  29. Xiong, N.; He, J.; Park, J.H.; Cooley, D.H.; Li, Y. A Neural network based vehicle classification system for pervasive smart road security. J. UCS 2009, 15, 1119–1142. [Google Scholar]
  30. Goyal, A.; Verma, B. A neural network based approach for the vehicle classification. In Proceedings of the IEEE Symposium on Computational Intelligence in Image and Signal Processing, Honolulu, HI, USA, 1–5 April 2007; pp. 226–231. [Google Scholar] [CrossRef]
  31. Ozkurt, C.; Camci, F. Automatic traffic density estimation and vehicle classification for traffic surveillance systems using neural networks. Math. Comput. Appl. 2009, 14, 187–196. [Google Scholar] [CrossRef]
  32. Lee, S.H.; Bang, M.; Jung, K.H.; Yi, K. An efficient selection of HOG features for SVM classification of vehicle. In Proceedings of the 2015 IEEE International Symposium on Consumer Electronics (ISCE), Madrid, Spain, 24–26 June 2015; pp. 1–2. [Google Scholar] [CrossRef]
  33. Arróspide, J.; Salgado, L.; Nieto, M. Video analysis-based vehicle detection and tracking using an MCMC sampling framework. EURASIP J. Adv. Signal Process. 2012, 2012, 2. [Google Scholar] [CrossRef]
  34. Huang, S.C. An advanced motion detection algorithm with video quality analysis for video surveillance systems. IEEE Trans. Circuits Syst. Video Technol. 2011, 21, 1–14. [Google Scholar] [CrossRef]
  35. Hu, Z.; Wang, C.; Uchimura, K. 3D vehicle extraction and tracking from multiple viewpoints for traffic monitoring by using probability fusion map. In Proceedings of the 2007 IEEE Intelligent Transportation Systems Conference, Seattle, WA, USA, 30 September–3 October 2007; pp. 30–35. [Google Scholar] [CrossRef]
  36. Zhang, W.; Wu, Q.M.J.; Yang, X.; Fang, X. Multilevel framework to detect and handle vehicle occlusion. IEEE Trans. Intell. Transp. Syst. 2008, 9. [Google Scholar] [CrossRef]
  37. Fang, W.; Zhao, Y.; Yuan, Y.; Liu, K. Real-time multiple vehicles tracking with occlusion handling. In Proceedings of the 2011 Sixth International Conference on Image and Graphics (ICIG), Hefei, Anhui, 12–15 August 2011; pp. 667–672. [Google Scholar]
  38. Saunier, N.; Sayed, T. A feature-based tracking algorithm for vehicles in intersections. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision, Quebec City, QC, Canada, 7–9 June 2006. [Google Scholar]
  39. Shirazi, M.S.; Morris, B. Vision-Based Vehicle Counting with High Accuracy for Highways with Perspective View; Springer: Cham, Switzerland, 2015; pp. 809–818. [Google Scholar]
  40. Stauffer, C.; Grimson, W.E.L. Adaptive background mixture models for real-time tracking. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, 23–25 June 1999; Volume 2. [Google Scholar] [CrossRef]
  41. Mandellos, N.A.; Keramitsoglou, I.; Kiranoudis, C.T. A background subtraction algorithm for detecting and tracking vehicles. Expert Syst. Appl. 2011, 38, 1619–1631. [Google Scholar] [CrossRef]
  42. Cheng, F.C.; Chen, B.H.; Huang, S.C. A hybrid background subtraction method with background and foreground candidates detection. ACM Trans. Intell. Syst. Technol. 2015, 7, 7. [Google Scholar] [CrossRef]
  43. Huang, Z.; Qin, H.; Liu, Q. Vehicle ROI extraction based on area estimation gaussian mixture model. In Proceedings of the 2017 3rd IEEE International Conference on Cybernetics (CYBCONF), Exeter, UK, 21–23 June 2017; pp. 1–7. [Google Scholar]
  44. Kamkar, S.; Safabakhsh, R. Vehicle detection, counting and classification in various conditions. IET Intell. Transp. Syst. 2016, 10, 406–413. [Google Scholar] [CrossRef]
  45. Liang, M.; Huang, X.; Chen, C.H.; Chen, X.; Tokuta, A. Counting and classification of highway vehicles by regression analysis. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2878–2888. [Google Scholar] [CrossRef]
  46. Moussa, G.S. Vehicle type classification with geometric and appearance attributes. Int. J. Civ. Arch. Sci. Eng. 2014, 8, 273–278. [Google Scholar]
  47. Sun, Z.; Bebis, G.; Miller, R. Monocular Precrash vehicle detection: Features and classifiers. IEEE Trans. Image Process. 2006, 15, 2019–2034. [Google Scholar] [CrossRef] [PubMed]
48. Chen, Z.; Pears, N.; Freeman, M.; Austin, J. A Gaussian mixture model and support vector machine approach to vehicle type and colour classification. IET Intell. Transp. Syst. 2014, 8, 135–144. [Google Scholar] [CrossRef]
  49. TelecomCinvesGdl—Youtube. Available online: https://www.youtube.com/channel/UCGcLe9kzQvJGkeR_AO1cBwg (accessed on 3 June 2017).
50. Power, P.W.; Schoonees, J.A. Understanding background mixture models for foreground segmentation. In Proceedings of Image and Vision Computing New Zealand, Auckland, New Zealand, 26–28 November 2002; pp. 10–11. [Google Scholar]
  51. Grewal, M.S.; Andrews, A.P. Kalman Filtering: Theory and Practice with MATLAB; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
  52. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.C.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471. [Google Scholar] [CrossRef] [PubMed]
  53. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152. [Google Scholar]
  54. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995. [Google Scholar]
  55. Schölkopf, B.; Burges, C.J.C.; Smola, A.J. Advances in Kernel Methods: Support Vector Learning; MIT Press: Cambridge, MA, USA, 1999. [Google Scholar]
  56. Guerrero-Gomez-Olmedo, R.; Lopez-Sastre, R.J.; Maldonado-Bascon, S.; Fernandez-Caballero, A. Vehicle Tracking by Simultaneous Detection and Viewpoint Estimation; IWINAC 2013, Part II, LNCS 7931; Springer: Berlin, Germany, 2013; pp. 306–316. [Google Scholar]
  57. GRAM Road-Traffic Monitoring. Available online: http://agamenon.tsc.uah.es/Personales/rlopez/data/rtm/ (accessed on 3 June 2017).
  58. M6 Motorway Traffic—Youtube. Available online: https://www.youtube.com/watch?v=PNCJQkvALVc (accessed on 3 June 2017).
  59. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2. [Google Scholar] [CrossRef]
Figure 1. Block diagram of the proposed system.
Figure 2. System initialization.
Figure 3. Vehicle detection: (a) actual image, where green lines indicate the ROI and the blue line the detection line; (b) background; and (c) foreground mask.
Figure 4. Occlusion handling when cases 1 and 2 are fulfilled; green lines indicate the ROI and the blue line the detection line. Actual image and foreground mask: (a,c) before applying the algorithm and (b,d) after applying the algorithm.
Figure 5. Estimation of the lane width.
Figure 6. Behavior of the selected geometric features of the detected vehicles: (a) area; (b) width; and (c) height of the detected objects.
Figure 7. Projection of the vehicles onto the classification line (yellow); green lines indicate the ROI.
Figure 8. Traffic load (vehicles per second).
Figure 9. Vehicle examples for every class: (a) small; (b) midsize and (c) large.
Table 1. Related works in the detection of vehicles.
Reference | GT | Frames | Scenarios | Traffic Load | DR or Recall | Precision | F-Measure
Saunier, N.; Sayed, T. [38] (2006) | 302 | 8360 | 3 | - | 88.4 | - | -
Hsieh, J.-W.; Yu, S.-H.; Chen, Y.-S.; Hu, W.-F. [20] (2006) | 20,443 | 16,400 | 3 | - | 82.16 | - | -
Hu, Z.; Wang, C.; Uchimura, K. [35] (2007) | 1074 | Not indicated | - | - | 99.3 | - | -
Zhang, W.; Wu, Q.M.J.; Yang, X.; Fang, X. [36] (2008) | 427 | Not indicated | - | - | 93.87–84.43, 100–83.8 | - | -
Fang, W.; Zhao, Y.; Yuan, Y.; Liu, K. [37] (2011) | 226 | 3500 | 2 | - | 86.8, 100 | - | -
Arróspide, J.; Salgado, L.; Nieto, M. [33] (2012) | 4000 | NA | - | - | 96.14, 89.92, 94.14 | - | -
Pham, H.V.; Lee, B.-R. [21] (2015) | 672 | 18,000 | 1 | - | 97.17 | - | -
Shirazi, M.S.; Morris, B. [39] (2015) | Not indicated | 1080 at 8 fps | 3 | - | 94 | - | -
Our System (2017) | 4111 | 92,160 at 25 fps | 5 | 1.34 | 82.42–99.24 | 68.7–99.5 | 74.6–98.3
Table 2. Related works in the classification of vehicles.
Reference | Sensors | Scenarios | Input Space | Result | Reported Metrics
Hsieh, J.-W.; Yu, S.-H.; Chen, Y.-S.; Hu, W.-F. [20] (2006) | Camera only | Static side-road camera | Size and the "linearity" of a vehicle | Global TPR of up to 94.8% for cars, minivans, trucks, and van-trucks | TPR
Feng, Z.; Mingzhe, W. [9] (2009) | Anisotropic magnetoresistive (AMR) sensor | Vehicle passes through the sensor | Features of wave length, mean, variance, peak, valley, and acreage | 86%, 80%, 81%, and 89% TPR for big truck, bus, van, and car | TPR
Changjun, Z.; Yuzong, C. [10] (2009) | Acoustic signals | Vehicles on the road ramp | Set of frequency feature vectors | 95.12% accuracy for car, bus, truck, and container truck | Accuracy
Chen, Z.; Pears, N.; Freeman, M.; Austin, J. [48] (2014) | Stationary roadside (CCTV) camera | Static side-road camera | Size and width of the blob | 88.35%, 69.07%, and 73.47% TPR for car, van, and heavy goods vehicles | TPR, TNR, FPR
Moussa, G.S. [46] (2014) | Laser sensor | Top-down laser over road (different scenarios from those presented here) | Geometric-based features | 99.5%, 93.0%, and 97.5% TPR for small, midsize, and large | TPR
Liang, M.; Huang, X.; Chen, C.H.; Chen, X.; Tokuta, A. [45] (2015) | Camera only | Static side-road camera | Low-level features | 79.9%, 63.4%, and 92.7% TPR for small, midsize, and large | TPR
Lamas-Seco, J.; Castro, P.; Dapena, A.; Vazquez-Araujo, F. [8] (2015) | Inductive loop detectors | Vehicle passes through the sensor | Fourier transform of inductive signatures | Global TPR of up to 95.82% for small, midsize, and large | TPR
Kamkar, S.; Safabakhsh, R. [44] (2016) | Camera only | Static side-road camera | Vehicle length and grey-level co-occurrence matrix features | 71.9% global TPR for small, midsize, and large | TPR
Our System (2017) | Camera only | Static side-road camera | 3-D geometric-based features | Global TPR of up to 98.190% for small, midsize, and large | Recall or TPR, F-measure, Precision, and VOI-Index
Table 3. Videos analyzed in this work.
Video | Frames | Vehicles per Second | Occlusion Index | Recording Place | Vehicle Direction | Weather
V1 | 16,925 | 1.24 | 0.312 | Ringroad, Guadalajara, Mexico | Front | Sunny
V2 | 5400 | 1.05 | 0.189 | Ringroad, Guadalajara, Mexico | Front | Sunny
V3 | 3875 | 0.75 | 0.124 | Ringroad, Guadalajara, Mexico | Front | 0 to 20 s Sunny, 21 to 140 s Cloudy
V4 | 7520 | 0.88 | 0.000 | M-30, Madrid, Spain | Rear | Sunny
V5 | 9390 | 0.63 | 0.000 | M-30, Madrid, Spain | Rear | Cloudy
V6 | 15,050 | 1.32 | 0.249 | M6 motorway, England | Front | Cloudy
V7 | 14,875 | 1.21 | 0.203 | M6 motorway, England | Front | Cloudy
V8 | 19,125 | 1.18 | 0.202 | M6 motorway, England | Front | Cloudy
Table 4. Experimental results of the detection stage with occlusion handling.
Video | GT | TP | FP | FN | Detection Rate | Precision | F-Measure
V1 | 842 | 694 | 324 | 148 | 82.422 | 68.172 | 74.623
V2 | 228 | 202 | 104 | 26 | 88.596 | 66.013 | 75.655
V3 | 116 | 103 | 30 | 13 | 88.793 | 77.44 | 82.730
V4 | 264 | 262 | 7 | 2 | 99.242 | 97.397 | 98.311
V5 | 236 | 228 | 1 | 8 | 96.610 | 99.563 | 98.064
V6 | 797 | 761 | 53 | 36 | 95.483 | 93.488 | 94.475
V7 | 725 | 686 | 43 | 39 | 94.620 | 94.101 | 94.360
V8 | 903 | 862 | 82 | 41 | 95.459 | 91.313 | 93.340
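The scores reported in Tables 4–6 follow the usual definitions of recall (detection rate), precision, and F-measure computed from the TP, FP, and FN counts. The following minimal Python sketch reproduces these calculations; the function name and the example values are only illustrative of how a row of Table 4 can be checked, and are not part of the original implementation.

```python
def detection_metrics(tp, fp, fn):
    """Recall (detection rate), precision, and F-measure from raw counts."""
    recall = tp / (tp + fn)            # TP over ground-truth vehicles (GT = TP + FN)
    precision = tp / (tp + fp)         # fraction of reported detections that are correct
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure

# Example: row V1 of Table 4 (TP = 694, FP = 324, FN = 148)
print(detection_metrics(694, 324, 148))  # approximately (0.824, 0.682, 0.746)
```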
Table 5. Experimental results of the classification stage (S = small, M = midsize, L = large, T = total).
Video | Class | Input Space | TP | FP | FN | Recall | Precision | F-Measure
V1 | S | 179 | 179 | 132 | 0 | 100.000 | 57.556 | 73.061
V1 | M | 789 | 669 | 20 | 120 | 84.790 | 97.097 | 90.527
V1 | L | 50 | 16 | 2 | 34 | 32.000 | 88.888 | 47.058
V1 | T | 1018 | 864 | 154 | 154 | 84.872 | 84.872 | 84.872
V2 | S | 35 | 34 | 26 | 1 | 97.142 | 56.666 | 71.578
V2 | M | 210 | 177 | 5 | 33 | 84.285 | 97.252 | 90.306
V2 | L | 61 | 55 | 9 | 6 | 90.163 | 85.937 | 88.000
V2 | T | 306 | 266 | 40 | 40 | 86.928 | 86.928 | 86.928
V3 | S | 11 | 10 | 1 | 1 | 90.909 | 90.909 | 90.909
V3 | M | 97 | 95 | 8 | 2 | 97.938 | 92.233 | 95.000
V3 | L | 25 | 18 | 1 | 7 | 72.000 | 94.736 | 81.818
V3 | T | 133 | 123 | 10 | 10 | 92.481 | 92.481 | 92.481
V4 | S | 16 | 15 | 12 | 1 | 93.750 | 55.555 | 69.767
V4 | M | 233 | 222 | 4 | 11 | 95.279 | 98.230 | 96.732
V4 | L | 20 | 14 | 2 | 6 | 70.000 | 87.500 | 77.777
V4 | T | 269 | 251 | 18 | 18 | 93.308 | 93.308 | 93.308
V5 | S | 3 | 3 | 6 | 0 | 100.00 | 33.333 | 50.000
V5 | M | 220 | 211 | 0 | 9 | 95.909 | 100.000 | 97.911
V5 | L | 6 | 4 | 5 | 2 | 66.666 | 44.444 | 53.333
V5 | T | 229 | 218 | 11 | 11 | 95.196 | 95.196 | 95.196
V6 | S | 3 | 2 | 2 | 1 | 66.667 | 50.000 | 57.142
V6 | M | 766 | 755 | 1 | 11 | 98.564 | 99.867 | 99.211
V6 | L | 45 | 45 | 9 | 0 | 100.000 | 83.333 | 90.909
V6 | T | 814 | 802 | 12 | 12 | 98.525 | 98.525 | 98.525
V7 | S | 2 | 1 | 3 | 1 | 50.000 | 25.000 | 33.333
V7 | M | 688 | 676 | 2 | 12 | 98.255 | 99.705 | 98.975
V7 | L | 39 | 37 | 10 | 2 | 94.871 | 78.723 | 86.046
V7 | T | 729 | 714 | 15 | 15 | 97.942 | 97.942 | 97.942
V8 | S | 5 | 4 | 9 | 1 | 80.000 | 30.769 | 44.444
V8 | M | 882 | 867 | 3 | 15 | 98.299 | 99.655 | 98.972
V8 | L | 57 | 55 | 6 | 2 | 96.491 | 90.163 | 93.220
V8 | T | 944 | 926 | 18 | 18 | 98.093 | 98.093 | 98.093
Table 6. Experimental results of the classification stage of videos V6, V7, and V8 using different input spaces and classifiers.
Classification with the Thresholds and 1D Feature Input Space
Test | Class | Input Space | TP | FP | FN | Recall | Precision | F-Measure
With occlusion handling | S | 10 | 9 | 474 | 1 | 90.000 | 1.863 | 3.651
With occlusion handling | M | 2336 | 1875 | 63 | 461 | 80.265 | 96.749 | 87.739
With occlusion handling | L | 141 | 39 | 27 | 102 | 27.659 | 59.090 | 37.681
With occlusion handling | Total | 2487 | 1923 | 564 | 564 | 77.322 | 77.322 | 77.322
Classification with K-Means and 3D Feature Input Space
Test | Class | Input Space | TP | FP | FN | Recall | Precision | F-Measure
With occlusion handling | S | 10 | 10 | 247 | 0 | 100.00 | 3.891 | 7.490
With occlusion handling | M | 2336 | 2079 | 23 | 257 | 88.998 | 98.905 | 93.690
With occlusion handling | L | 141 | 117 | 11 | 24 | 82.978 | 91.406 | 86.988
With occlusion handling | Total | 2487 | 2206 | 281 | 281 | 88.701 | 88.701 | 88.701
Classification with SVM and 3D Feature Input Space
Test | Class | Input Space | TP | FP | FN | Recall | Precision | F-Measure
With occlusion handling | S | 16 | 16 | 100 | 0 | 100.000 | 13.793 | 24.242
With occlusion handling | M | 2333 | 2214 | 4 | 119 | 94.899 | 99.819 | 97.736
With occlusion handling | L | 138 | 133 | 20 | 5 | 96.376 | 86.928 | 91.408
With occlusion handling | Total | 2487 | 2363 | 124 | 124 | 95.014 | 95.014 | 95.014
Classification with OC-SVM and 3D Feature Input Space
Test | Class | Input Space | TP | FP | FN | Recall | Precision | F-Measure
With occlusion handling | S | 10 | 7 | 14 | 3 | 70.000 | 33.333 | 45.161
With occlusion handling | M | 2336 | 2298 | 6 | 38 | 98.373 | 99.739 | 99.051
With occlusion handling | L | 141 | 137 | 25 | 4 | 97.163 | 84.567 | 90.429
With occlusion handling | Total | 2487 | 2442 | 45 | 45 | 98.190 | 98.190 | 98.190
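As a rough sketch of how the OC-SVM configuration compared in Table 6 can be assembled, the example below trains one one-class SVM per vehicle class on the 3-D geometric features (estimated area, width, and height) and assigns each detected vehicle to the class whose model returns the highest decision score. It uses scikit-learn's OneClassSVM rather than the LIBSVM interface cited in [59], and the kernel, parameter values, training samples, and per-class decision rule are assumptions made for illustration, not the authors' exact configuration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical training data: one row per detected vehicle with the 3-D geometric
# features [estimated area, width, height]; all values are placeholders.
train = {
    "S": np.array([[1100.0, 28.0, 39.0], [1200.0, 30.0, 40.0], [1150.0, 29.0, 41.0]]),
    "M": np.array([[2500.0, 45.0, 55.0], [2600.0, 46.0, 57.0], [2550.0, 44.0, 56.0]]),
    "L": np.array([[6000.0, 70.0, 85.0], [6400.0, 72.0, 88.0], [6200.0, 71.0, 86.0]]),
}

# One one-class SVM per class, each trained only on samples of its own class.
models = {
    label: OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(samples)
    for label, samples in train.items()
}

def classify(feature_vector):
    """Assign the class whose one-class model returns the largest decision score."""
    scores = {label: float(model.decision_function([feature_vector])[0])
              for label, model in models.items()}
    return max(scores, key=scores.get)

print(classify([2520.0, 45.0, 56.0]))  # expected 'M' with these placeholder values
```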
