Article

Element-Weighted Neutrosophic Correlation Coefficient and Its Application in Improving CAMShift Tracker in RGBD Video

Keli Hu, En Fan, Jun Ye, Jiatian Pi, Liping Zhao and Shigen Shen
1 Department of Computer Science and Engineering, Shaoxing University, Shaoxing 312000, China
2 Department of Electrical and Information Engineering, Shaoxing University, Shaoxing 312000, China
3 College of Computer and Information Science, Chongqing Normal University, Chongqing 400700, China
* Author to whom correspondence should be addressed.
Information 2018, 9(5), 126; https://doi.org/10.3390/info9050126
Submission received: 11 April 2018 / Revised: 10 May 2018 / Accepted: 15 May 2018 / Published: 18 May 2018

Abstract
The neutrosophic set (NS) derives from neutrosophy, a branch of philosophy dealing with the origin, nature, and scope of neutralities. Many kinds of correlation coefficients and similarity measures have been proposed in the neutrosophic domain. In this work, considering that the neutrosophic elements T (truth), I (indeterminacy), and F (falsity) may contribute differently, an element-weighted neutrosophic correlation coefficient is proposed and applied to improving the CAMShift tracker in RGBD (RGB-Depth) video. The concept of object seeds is proposed, and it is employed for extracting the object region and calculating the depth back-projection. Each candidate seed is represented in the single-valued neutrosophic set (SVNS) domain via three membership functions, T, I, and F. The element-weighted neutrosophic correlation coefficient is then applied for selecting robust object seeds by fusing three kinds of criteria. Moreover, the proposed correlation coefficient is applied for estimating a robust back-projection by fusing information from both the color and depth domains. Finally, for the scale adaption problem, two alternatives in the neutrosophic domain are proposed, and the correlation coefficient between each proposed alternative and the ideal one is employed for scale identification. Experimental results reveal that the improved CAMShift tracker performs well under challenging factors such as fast motion, blur, illumination variation, deformation, and camera jitter.

1. Introduction

A neutrosophic set (NS) [1] is suitable for dealing with problems involving indeterminate information. A neutrosophic set is characterized independently by three membership functions, T, I, and F. Over a universal set X, these functions take values that are real standard or nonstandard subsets of ]0−, 1+[. The non-standard unit interval makes it difficult to introduce NS theory into science and engineering areas. The single-valued neutrosophic set (SVNS) [2] was proposed to handle this problem; its membership functions are restricted to the standard real unit interval [0, 1].
To date, NS theory has been successfully applied in many fields [3], such as medical diagnosis [4], image segmentation [5,6,7,8,9], skeleton extraction [10], and object tracking [11,12]. Several similarity measures and multicriteria decision-making methods [13,14,15,16,17,18,19,20] have been employed to handle neutrosophic problems. Decision-making can be regarded as a problem-solving activity terminating in a solution deemed satisfactory. It has been applied to residential house garage location selection [18], element and material selection [19], and sustainable market valuation of buildings [20]. For image segmentation, several criteria in the NS domain are usually proposed for calculating a specific neutrosophic image [5,6,7,8,9]. The correlation coefficient between SVNSs [17] was applied for calculating a neutrosophic score-based image, and a robust threshold was estimated by employing Otsu's method [9]. In [11], two criteria were proposed in both the color and depth domains. The information fusion problem was converted into a multicriteria decision-making issue, and the single-valued neutrosophic cross-entropy was employed to tackle it [11]. For the neutrosophic theory-based MeanShift tracker [12], by taking into consideration the background information and appearance changes between frames, two kinds of criteria were considered: the object feature similarity and the background feature similarity. The SVNS correlation coefficient [17] was applied for calculating a weighted histogram, which was then used to enhance the traditional MeanShift tracker. Besides the fields mentioned above, NS theory has also been introduced into clustering algorithms such as c-means [21]. Although NS-based correlation coefficients have been widely used for solving engineering problems, the three membership functions are always weighted equally. However, the elements T, I, and F may contribute differently to the result of decision-making. In this work, a method of element-weighted neutrosophic correlation coefficient is proposed, and we utilize it for tracking a visual object in RGBD (RGB-Depth) video.
Visual object tracking is still an open issue [22,23,24], and current trackers still lack the robustness required by vision applications like surveillance, traffic monitoring, video indexing, and autonomous driving. In a tracking task, challenges like background clutter, fast motion, illumination variation, scale variation, motion blur, and camera jitter may arise during the tracking procedure. Two strategies are commonly considered for handling these challenges. One is utilizing robust features. The color feature is employed by the MeanShift [25] and CAMShift [26] trackers, owing to its robustness under challenges like deformation, blur, and rotation. Both trackers are highly efficient [27]. However, when the surroundings have a similar color, they may easily drift from the target. To deal with this problem, the cross-bin metric [28], scale-invariant feature transform (SIFT) [29], and texture features [30] were introduced into mean shift-based trackers, and better performance was achieved. The other strategy is to train robust models, as in multiple instance learning [31,32] and compressive sensing [33]-based tracking. Besides the trackers mentioned above, the local-global tracker (LGT) [34], incremental visual tracker (IVT) [35], and tracking-learning-detection (TLD) [36] also perform well.
Recently, because depth information provides another dimension for tackling the object tracking problem, several RGBD-based trackers [11,37,38,39] have been proposed. Most of these algorithms are oriented to specific targets [37,38,39]; only a few category-free RGBD trackers exist. An improved CAMShift tracker was proposed using a neutrosophic decision-making method in [11], but indeterminate information was only considered in the information fusion phase. As mentioned above, the CAMShift tracker performs well, with high efficiency, if a discriminative feature can be employed. In this work, based on the CAMShift framework, we focus on tackling the visual tracking problem under challenges like fast motion, blur, illumination variation, deformation, and camera jitter, but without serious occlusion. Even without occlusion, robust tracking remains very challenging for both RGB and RGBD trackers. For the CAMShift tracker, calculating a robust back-projection is one of the most important issues in tracking a target. Indeterminate information always exists in the CAMShift process; for instance, it arises when estimating the likelihood probability map, as well as when localizing the target. In this work, we utilize the element-weighted neutrosophic correlation coefficient to handle decision-making problems in the presence of such indeterminate information.
This work makes four main contributions. First, a method of element-weighted neutrosophic correlation coefficient is proposed. Secondly, three criteria are proposed for object seed selection, and the corresponding membership functions, T, I, and F, are given. Thirdly, the proposed correlation coefficient is applied for estimating a robust back-projection by fusing information from both the color and depth domains. Finally, for the scale adaption problem, two alternatives in the neutrosophic domain are proposed, and the correlation coefficient between each proposed alternative and the ideal one is employed for scale identification.
The remainder of this paper is organized as follows: in Section 2, the element-weighted neutrosophic correlation coefficient is presented. In Section 3, the main steps and basic flow of the improved CAMShift visual tracker are given first, and the details are illustrated in the corresponding subsections. Experimental evaluations and discussions are presented in Section 4, and Section 5 concludes the paper.

2. Element-Weighted Neutrosophic Correlation Coefficient

2.1. Neutrosophic Correlation Coefficient

Let $A = \{A_1, A_2, \ldots, A_m\}$ be a set of alternatives and $C = \{C_1, C_2, \ldots, C_n\}$ be a set of criteria. We get
$$A_i = \{\langle C_j, T_{C_j}(A_i), I_{C_j}(A_i), F_{C_j}(A_i)\rangle \mid C_j \in C\}, \quad i = 1, \ldots, m,\; j = 1, \ldots, n, \tag{1}$$
where $A_i$ is an alternative in $A$; $T_{C_j}(A_i)$ indicates the degree to which the alternative $A_i$ satisfies the criterion $C_j$; $I_{C_j}(A_i)$ denotes the indeterminacy degree to which $A_i$ satisfies or does not satisfy $C_j$; and $F_{C_j}(A_i)$ denotes the degree to which $A_i$ does not satisfy $C_j$. Here $T_{C_j}(A_i), I_{C_j}(A_i), F_{C_j}(A_i) \in [0,1]$, $0 \le T_{C_j}(A_i) + I_{C_j}(A_i) + F_{C_j}(A_i) \le 3$, and $A$ is a single-valued neutrosophic set.
The correlation coefficient under a single-valued neutrosophic environment between two alternatives $A_i$ and $A_j$ is defined as [17]:
$$S_{C_k}(A_i, A_j) = \frac{T_{C_k}(A_i) T_{C_k}(A_j) + I_{C_k}(A_i) I_{C_k}(A_j) + F_{C_k}(A_i) F_{C_k}(A_j)}{\sqrt{T_{C_k}^2(A_i) + I_{C_k}^2(A_i) + F_{C_k}^2(A_i)}\, \sqrt{T_{C_k}^2(A_j) + I_{C_k}^2(A_j) + F_{C_k}^2(A_j)}}. \tag{2}$$
Considering the contribution of each criterion, the weighted correlation coefficient between $A_i$ and $A_j$ is defined by
$$W(A_i, A_j) = \sum_{k=1}^{n} w_k \frac{T_{C_k}(A_i) T_{C_k}(A_j) + I_{C_k}(A_i) I_{C_k}(A_j) + F_{C_k}(A_i) F_{C_k}(A_j)}{\sqrt{T_{C_k}^2(A_i) + I_{C_k}^2(A_i) + F_{C_k}^2(A_i)}\, \sqrt{T_{C_k}^2(A_j) + I_{C_k}^2(A_j) + F_{C_k}^2(A_j)}}, \tag{3}$$
where $w_k \in [0,1]$ is the weight of each criterion $C_k$ and $\sum_{k=1}^{n} w_k = 1$.
Assume $A^*$ is the ideal alternative, $A^* = \{\langle C_j, T_{C_j}(A^*), I_{C_j}(A^*), F_{C_j}(A^*)\rangle \mid C_j \in C\}$. Then the weighted correlation coefficient between any alternative $A_i$ and the ideal alternative $A^*$ can be calculated by
$$W(A_i, A^*) = \sum_{k=1}^{n} w_k \frac{T_{C_k}(A_i) T_{C_k}(A^*) + I_{C_k}(A_i) I_{C_k}(A^*) + F_{C_k}(A_i) F_{C_k}(A^*)}{\sqrt{T_{C_k}^2(A_i) + I_{C_k}^2(A_i) + F_{C_k}^2(A_i)}\, \sqrt{T_{C_k}^2(A^*) + I_{C_k}^2(A^*) + F_{C_k}^2(A^*)}}. \tag{4}$$
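To make the computation concrete, the following is a minimal Python sketch (not the authors' code) of Equations (2)-(4); the alternatives, criterion weights, and scores are hypothetical.

```python
import math

def svns_corr(tif_a, tif_b):
    """Correlation coefficient between two SVNS triples, Equation (2)."""
    ta, ia, fa = tif_a
    tb, ib, fb = tif_b
    num = ta * tb + ia * ib + fa * fb
    den = math.sqrt(ta**2 + ia**2 + fa**2) * math.sqrt(tb**2 + ib**2 + fb**2)
    return num / den if den > 0 else 0.0

def weighted_corr(alt_i, alt_j, weights):
    """Criterion-weighted correlation coefficient, Equations (3)/(4)."""
    return sum(w * svns_corr(a, b) for w, a, b in zip(weights, alt_i, alt_j))

# Example: one alternative scored under two criteria against the ideal A* = <1,0,0>.
A1 = [(0.8, 0.1, 0.1), (0.6, 0.2, 0.2)]
ideal = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(weighted_corr(A1, ideal, [0.5, 0.5]))  # value in [0, 1]
```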

2.2. Element-Weighted Neutrosophic Correlation Coefficient

As seen in Equation (3), only the contribution of each criterion is considered. However, in some engineering applications, it is better to assign specific weights to the T, I, and F elements of an alternative $A_i$ under a criterion $C_k$. The element-weighted correlation coefficient between $A_i$ and $A_j$ is defined by
$$S_e(A_i, A_j) = \frac{\alpha T_{C_k}(A_i) T_{C_k}(A_j) + \beta I_{C_k}(A_i) I_{C_k}(A_j) + \gamma F_{C_k}(A_i) F_{C_k}(A_j)}{\sqrt{\alpha T_{C_k}^2(A_i) + \beta I_{C_k}^2(A_i) + \gamma F_{C_k}^2(A_i)}\, \sqrt{\alpha T_{C_k}^2(A_j) + \beta I_{C_k}^2(A_j) + \gamma F_{C_k}^2(A_j)}}, \tag{5}$$
where $\alpha, \beta, \gamma \in [0,1]$ and $\alpha + \beta + \gamma = 1$. $S_e(A_i, A_j)$ is then the cosine of the angle between the vectors $(\sqrt{\alpha}\, T_{C_k}(A_i), \sqrt{\beta}\, I_{C_k}(A_i), \sqrt{\gamma}\, F_{C_k}(A_i))$ and $(\sqrt{\alpha}\, T_{C_k}(A_j), \sqrt{\beta}\, I_{C_k}(A_j), \sqrt{\gamma}\, F_{C_k}(A_j))$. Hence, it is easy to verify that $S_e$ satisfies the following properties:
(SP1) $0 \le S_e(A_i, A_j) \le 1$;
(SP2) $S_e(A_i, A_j) = S_e(A_j, A_i)$;
(SP3) $S_e(A_i, A_j) = 1$ if $A_i = A_j$.
Then the element- and criterion-weighted correlation coefficient between $A_i$ and the ideal alternative $A^*$ is defined as
$$W_{S_e}(A_i, A^*) = \sum_{k=1}^{n} w_k \frac{\alpha T_{C_k}(A_i) T_{C_k}(A^*) + \beta I_{C_k}(A_i) I_{C_k}(A^*) + \gamma F_{C_k}(A_i) F_{C_k}(A^*)}{\sqrt{\alpha T_{C_k}^2(A_i) + \beta I_{C_k}^2(A_i) + \gamma F_{C_k}^2(A_i)}\, \sqrt{\alpha T_{C_k}^2(A^*) + \beta I_{C_k}^2(A^*) + \gamma F_{C_k}^2(A^*)}}. \tag{6}$$
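A companion sketch of Equations (5) and (6) under the same hypothetical representation; note how α, β, and γ reweight the T, I, and F elements.

```python
import math

def elem_weighted_corr(tif_a, tif_b, alpha, beta, gamma):
    """Element-weighted correlation between two SVNS triples, Equation (5)."""
    ta, ia, fa = tif_a
    tb, ib, fb = tif_b
    num = alpha * ta * tb + beta * ia * ib + gamma * fa * fb
    den = (math.sqrt(alpha * ta**2 + beta * ia**2 + gamma * fa**2)
           * math.sqrt(alpha * tb**2 + beta * ib**2 + gamma * fb**2))
    return num / den if den > 0 else 0.0

def elem_criterion_weighted_corr(alt_i, ideal, weights, alpha, beta, gamma):
    """W_Se(A_i, A*), Equation (6): criterion-weighted sum of Equation (5)."""
    return sum(w * elem_weighted_corr(a, s, alpha, beta, gamma)
               for w, a, s in zip(weights, alt_i, ideal))

# With the ideal triple <1, 0, 0>, each term reduces to
# sqrt(alpha) * T / sqrt(alpha*T^2 + beta*I^2 + gamma*F^2), the form used later.
print(elem_weighted_corr((0.8, 0.1, 0.1), (1.0, 0.0, 0.0), 0.5, 0.25, 0.25))
```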

3. Improved CAMShift Visual Tracker Based on the Neutrosophic Theory

The algorithmic details of the proposed tracker are presented in this section.
A rectangle bounding box [22,23] is always employed for the representation of the target location when dealing with the visual tracking problem. For a visual tracker, the critical task is to estimate the corresponding bounding box in each frame.
Algorithm 1 illustrates the basic flow of the proposed algorithm, as seen in Table 1. Details of each main step are given in the following subsections.

3.1. Selecting Object Seeds

Assuming the tracker calculates an adequate bounding box and R is the extracted object region, it is reasonable to assume that pixels located in both R and the bounding box with smaller depth values are more likely to be part of the target.
To select robust object seeds, the pixels located in both R and the bounding box are first sorted in ascending order of depth value (nearest first). Several pixels are then selected as the candidate object seed set S by sampling the sorted pixels with a fixed sampling interval. Considering that background regions may exist in the corresponding area, only the top 50 percent of pixels in the sorted pixel set are sampled.
Since the depth value is a critical criterion for judging an object seed, let TD, ID, and FD represent the degrees to which this proposition is true, indeterminate, and false, respectively. Then we define:
$$T_D = \frac{r_{\max} - r_{s_i}}{r_{\max}} \tag{7}$$
$$I_D = \mathrm{var}(D(ROI_{s_i})) \tag{8}$$
$$F_D = 1 - T_D, \tag{9}$$
where $r_{\max}$ is the lowest rank in S, and $r_{s_i}$ is the rank of the i-th candidate seed $S_i$; $ROI_{s_i}$ is the set of pixels located in the square area centered at $S_i$, and the side length of the square should be set to an odd value; $D(ROI_{s_i})$ is the depth value set corresponding to $ROI_{s_i}$; and var(·) is the variance function.
As there may be other objects at a depth similar to that of the tracked target, a color similarity criterion is also considered in this work to overcome this problem. The corresponding three membership functions $T_C$, $I_C$, and $F_C$ are defined as follows:
$$T_C = P_c(S_i) \tag{10}$$
$$I_C = \mathrm{var}(P_c(ROI_{s_i})) \tag{11}$$
$$F_C = 1 - T_C, \tag{12}$$
where $P_c$ is the back-projection calculated in the color domain; $P_c(S_i)$ is the value of the back-projection located at $S_i$; and $P_c(ROI_{s_i})$ is the value set corresponding to $ROI_{s_i}$ in $P_c$.
Besides considering the depth and color features separately, the fused color-depth information is also taken into account. The corresponding three membership functions $T_{DC}$, $I_{DC}$, and $F_{DC}$ for the fused color-depth criterion are defined as follows:
$$T_{DC} = P(S_i) \tag{13}$$
$$I_{DC} = \mathrm{var}(P(ROI_{s_i})) \tag{14}$$
$$F_{DC} = 1 - T_{DC}, \tag{15}$$
where $P$ is the fused back-projection; $P(S_i)$ is the value of the back-projection located at $S_i$; and $P(ROI_{s_i})$ is the value set corresponding to $ROI_{s_i}$ in $P$.
By substituting the corresponding T, I, and F values under the proposed three criteria into Equation (6), the reliability probability of the seed $S_i$ can be calculated as
$$pr_{s_i} = w_D\, pr_{D_{s_i}} + w_C\, pr_{C_{s_i}} + w_{DC}\, pr_{DC_{s_i}} = w_D \frac{\alpha T_D T_D(A^*) + \beta I_D I_D(A^*) + \gamma F_D F_D(A^*)}{\sqrt{\alpha T_D^2 + \beta I_D^2 + \gamma F_D^2}\, \sqrt{\alpha T_D^2(A^*) + \beta I_D^2(A^*) + \gamma F_D^2(A^*)}} + w_C \frac{\alpha T_C T_C(A^*) + \beta I_C I_C(A^*) + \gamma F_C F_C(A^*)}{\sqrt{\alpha T_C^2 + \beta I_C^2 + \gamma F_C^2}\, \sqrt{\alpha T_C^2(A^*) + \beta I_C^2(A^*) + \gamma F_C^2(A^*)}} + w_{DC} \frac{\alpha T_{DC} T_{DC}(A^*) + \beta I_{DC} I_{DC}(A^*) + \gamma F_{DC} F_{DC}(A^*)}{\sqrt{\alpha T_{DC}^2 + \beta I_{DC}^2 + \gamma F_{DC}^2}\, \sqrt{\alpha T_{DC}^2(A^*) + \beta I_{DC}^2(A^*) + \gamma F_{DC}^2(A^*)}}, \tag{16}$$
where $w_D, w_C, w_{DC} \in [0,1]$ are the corresponding criterion weights and $w_D + w_C + w_{DC} = 1$. Assuming the ideal alternative under all three criteria is the same, $A^* = \langle 1, 0, 0 \rangle$, Equation (16) simplifies to
$$pr_{s_i} = \sqrt{\alpha}\left(\frac{w_D T_D}{\sqrt{\alpha T_D^2 + \beta I_D^2 + \gamma F_D^2}} + \frac{w_C T_C}{\sqrt{\alpha T_C^2 + \beta I_C^2 + \gamma F_C^2}} + \frac{w_{DC} T_{DC}}{\sqrt{\alpha T_{DC}^2 + \beta I_{DC}^2 + \gamma F_{DC}^2}}\right). \tag{17}$$
After calculating the reliability probability of every seed in S, the first N seeds sorted by $pr_{s_i}$ are selected as the object seed set OS.
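As an illustration, a sketch of the simplified score of Equation (17) is given below; the candidate-seed triples are invented, while the weights follow the settings reported in Section 4.1.

```python
import math

def term(t, i, f, alpha, beta, gamma):
    # sqrt(alpha) * T / sqrt(alpha*T^2 + beta*I^2 + gamma*F^2)
    den = math.sqrt(alpha * t**2 + beta * i**2 + gamma * f**2)
    return math.sqrt(alpha) * t / den if den > 0 else 0.0

def seed_reliability(tif_d, tif_c, tif_dc,
                     w=(0.3, 0.4, 0.3), alpha=0.5, beta=0.25, gamma=0.25):
    """pr_si, Equation (17); one (T, I, F) triple per criterion (D, C, DC)."""
    triples = (tif_d, tif_c, tif_dc)
    return sum(wk * term(*t, alpha, beta, gamma) for wk, t in zip(w, triples))

# Rank candidate seeds by pr_si and keep the top N = 6 as object seeds.
scores = {sid: seed_reliability(d, c, dc)
          for sid, (d, c, dc) in {
              "s1": ((0.9, 0.05, 0.1), (0.7, 0.1, 0.3), (0.8, 0.1, 0.2)),
              "s2": ((0.4, 0.30, 0.6), (0.5, 0.2, 0.5), (0.4, 0.2, 0.6)),
          }.items()}
object_seeds = sorted(scores, key=scores.get, reverse=True)[:6]
```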

3.2. Extracting Object

The critical task in extracting the object is to determine the extracted object region R.
For each frame, the image is segmented into regions in the depth domain. In this work, the fast depth segmentation method of [40] is employed. Suppose there exist M regions in the t-th frame, $C_i$ represents the i-th region, and $D_{C_i}$ is the depth value at the centroid of $C_i$.
Suppose $OS_i$ is the i-th object seed in the previous frame and B is the pixel set located in the area of the bounding box calculated by the tracker in the current frame. The extracted region R is defined as:
$$R = \{(C_k \cap B) \mid \min_i(|D_{C_k} - D_{OS_i}|) < T,\; i = 1, 2, \ldots, N\}, \tag{18}$$
where $D_{OS_i}$ is the depth value of $OS_i$; T is a threshold value; and $\cap$ denotes the set intersection operation. All the regions $C_k$ belonging to R construct the candidate object region $R_C$.
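The following sketch illustrates the test of Equation (18) under the stated assumptions; the segment representation (a pixel set plus a centroid depth, already clipped to B) is a simplification.

```python
# A depth segment C_k joins R when its centroid depth is within T = 50 mm
# (the paper's setting) of at least one object seed from the previous frame.
def extract_region(segments, seed_depths, thresh_mm=50.0):
    """segments: list of (pixel_set, centroid_depth_mm), clipped to box B."""
    region, candidates = set(), []
    for pixels, d_ck in segments:
        if min(abs(d_ck - d_os) for d_os in seed_depths) < thresh_mm:
            region |= pixels            # R, Equation (18)
            candidates.append(pixels)   # C_k kept for the candidate region R_C
    return region, candidates
```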

3.3. Calculating the Fused Back-Projection

The back-projection is a probability distribution map with the same size as the input image. Each pixel value of the back-projection indicates the likelihood that the corresponding pixel belongs to the tracked object. Considering that a discriminative feature is one of the most important factors for a visual tracker, both color and depth information are employed for calculating the fused back-projection.
In [11], two criteria were proposed in each feature domain. For the near region similarity criterion, the corresponding neutrosophic elements in the color domain are defined as follows:
$$T_{Cn} = \sum_{u=1}^{m_c} \sqrt{\hat{q}_u^c\, \hat{p}_u^c} \tag{19}$$
$$I_{Cn} = \sum_{u=1}^{m_c} \sqrt{\hat{q}_u^c\, \hat{p}_u'^c} \tag{20}$$
$$F_{Cn} = 1 - T_{Cn}, \tag{21}$$
where $\hat{q}_u^c$ is the u-th bin of the feature vector for the target in the color domain. It is first calculated in the first frame by
$$\hat{q}_u^c = C \sum_{i=1}^{n} \delta[b_c(x_i) - u], \tag{22}$$
where $\{x_i\}_{i=1}^{n}$ are the pixels located in the extracted object region R; $b_c$ is the transformation function in the color domain, $b_c: R^2 \to \{1, \ldots, m_c\}$, which associates with the pixel at location $x_i$ the index $b_c(x_i)$ of its bin in the quantized feature space; $\delta$ is the Kronecker delta function; C is the normalization constant derived by imposing the condition $\sum_{u=1}^{m_c} \hat{q}_u^c = 1$; $\hat{p}_u^c$ is the u-th bin of the feature vector corresponding to the extracted object region R in the current frame; and $\hat{p}_u'^c$ corresponds to the feature vector in the annular region near the target, as defined in [11].
Similarly, for the near region similarity criterion, the corresponding neutrosophic elements in the depth domain can be computed by
$$T_{Dn} = \sum_{u=1}^{m_d} \sqrt{\hat{q}_u^d\, \hat{p}_u^d}, \quad I_{Dn} = \sum_{u=1}^{m_d} \sqrt{\hat{q}_u^d\, \hat{p}_u'^d}, \quad F_{Dn} = 1 - T_{Dn}, \tag{23}$$
where $\hat{q}_u^d$, $\hat{p}_u^d$, and $\hat{p}_u'^d$ are the corresponding feature vectors in the depth domain, computed analogously to $\hat{q}_u^c$, $\hat{p}_u^c$, and $\hat{p}_u'^c$.
By applying the far region similarity criterion [11], the related functions are given as follows:
$$T_{Cf} = \sum_{u=1}^{m_c} \sqrt{\hat{q}_u^c\, \hat{p}_u^c}, \quad I_{Cf} = \sum_{u=1}^{m_c} \sqrt{\hat{q}_u^c\, \hat{p}_u''^c}, \quad F_{Cf} = 1 - T_{Cf},$$
$$T_{Df} = \sum_{u=1}^{m_d} \sqrt{\hat{q}_u^d\, \hat{p}_u^d}, \quad I_{Df} = \sum_{u=1}^{m_d} \sqrt{\hat{q}_u^d\, \hat{p}_u''^d}, \quad F_{Df} = 1 - T_{Df}, \tag{24}$$
where $T_{Cf}$, $I_{Cf}$, and $F_{Cf}$ correspond to the color domain; $T_{Df}$, $I_{Df}$, and $F_{Df}$ are the functions in the depth domain. As defined in [11], $\hat{p}_u''^c$ and $\hat{p}_u''^d$ are the feature vectors in the annular region far from the target in the color and depth domains, respectively.
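A sketch of how the Bhattacharyya-style sums in Equations (19)-(24) could be computed from normalized histograms; the hue sampling and the bin count (16) here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hist(values, bins=16):
    """Normalized m_c-bin histogram over the OpenCV hue range [0, 180)."""
    h, _ = np.histogram(values, bins=bins, range=(0, 180))
    return h / max(h.sum(), 1)  # bins sum to 1, matching the constraint in Eq. (22)

def bhattacharyya(q, p):
    return float(np.sum(np.sqrt(q * p)))

# Synthetic hue samples for the target model, current region, and near annulus.
hue_target, hue_region, hue_near = (np.random.randint(0, 180, 500)
                                    for _ in range(3))
q_hat, p_hat, p_near = hist(hue_target), hist(hue_region), hist(hue_near)
T_cn = bhattacharyya(q_hat, p_hat)    # Equation (19)
I_cn = bhattacharyya(q_hat, p_near)   # Equation (20), near annular region
F_cn = 1.0 - T_cn                     # Equation (21)
```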
As discussed in [11], the back-projection in the color domain is defined as
$$P_C(x) = \hat{q}_{b_c(x)}^c, \tag{25}$$
where $b_c$ is the transformation function defined in Equation (22), and x is the pixel coordinate on the image plane.
Under the assumption that the tracked target moves at a relatively low speed, the back-projection corresponding to the depth domain is defined as [11]:
$$P_D(x) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{4\, d(x, S)}{MAXD} - 2\right), \quad x \in 2B_{pre}, \tag{26}$$
where $d(x, S)$ is the minimal depth distance between the pixel x and the previous seed set S; MAXD is the maximum depth displacement of the target between adjacent frames; and $2B_{pre}$ is the pixel set covered by a bounding box of twice the size of the previous object location, but with the same center.
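A sketch of Equation (26) using SciPy's complementary error function; MAXD = 70 mm follows Section 4.1, and the depth distances are invented.

```python
import numpy as np
from scipy.special import erfc

MAXD = 70.0  # maximum depth displacement between adjacent frames, in mm

def depth_backprojection(d_min):
    """P_D(x) = 0.5 * erfc(4*d(x,S)/MAXD - 2); ~1 near seeds, ~0 beyond MAXD."""
    return 0.5 * erfc(4.0 * d_min / MAXD - 2.0)

print(depth_backprojection(np.array([0.0, 35.0, 70.0, 140.0])))
# -> roughly [0.998, 0.5, 0.002, ~0]
```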
By considering the robustness of $P_C$ and $P_D$, the fused back-projection P can be calculated as
$$P(x) = r_c \times P_C'(x) + (1 - r_c) \times P_D'(x), \tag{27}$$
where $r_c$ is calculated by
$$r_c = \frac{ns_C}{ns_C + ns_D}, \tag{28}$$
and $ns_C$ is the element-weighted correlation coefficient in the color domain. The ideal alternative under both the near and far region similarity criteria is the same, $A^* = \langle 1, 0, 0 \rangle$. Then $ns_C$ is defined as
$$ns_C = \sqrt{\alpha}\left(\frac{w_{Cn} T_{Cn}}{\sqrt{\alpha T_{Cn}^2 + \beta I_{Cn}^2 + \gamma F_{Cn}^2}} + \frac{w_{Cf} T_{Cf}}{\sqrt{\alpha T_{Cf}^2 + \beta I_{Cf}^2 + \gamma F_{Cf}^2}}\right), \tag{29}$$
where $w_{Cn}, w_{Cf} \in [0,1]$ are the corresponding criterion weights and $w_{Cn} + w_{Cf} = 1$. Similarly, the element-weighted correlation coefficient in the depth domain is defined as
$$ns_D = \sqrt{\alpha}\left(\frac{w_{Dn} T_{Dn}}{\sqrt{\alpha T_{Dn}^2 + \beta I_{Dn}^2 + \gamma F_{Dn}^2}} + \frac{w_{Df} T_{Df}}{\sqrt{\alpha T_{Df}^2 + \beta I_{Df}^2 + \gamma F_{Df}^2}}\right), \tag{30}$$
where $w_{Dn}, w_{Df} \in [0,1]$ are the corresponding criterion weights and $w_{Dn} + w_{Df} = 1$. The filtered maps $P_C'$ and $P_D'$ used in Equation (27) are computed by
$$P_C'(x) = \begin{cases} P_C(x) & \text{if } P_D(x) > T_D \\ 0 & \text{else} \end{cases} \tag{31}$$
$$P_D'(x) = \begin{cases} P_D(x) & \text{if } P_C(x) > T_C \\ 0 & \text{else}, \end{cases} \tag{32}$$
where $T_D, T_C \in [0,1]$ are thresholds for filtering noise in the color and depth back-projections. As seen in Equation (31), the information in $P_D$ is employed for enhancing the color back-projection $P_C$, and vice versa in Equation (32).
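Putting Equations (27)-(32) together, the fusion step could be sketched as follows; the cross-domain thresholds follow Section 4.1, and ns_C and ns_D are assumed to be precomputed via Equations (29) and (30).

```python
import numpy as np

def fuse_backprojections(P_C, P_D, ns_C, ns_D, t_c=0.1, t_d=0.1):
    """Fused back-projection P of Equation (27) from color/depth maps."""
    r_c = ns_C / (ns_C + ns_D)                  # Equation (28)
    P_C_f = np.where(P_D > t_d, P_C, 0.0)       # Equation (31): depth gates color
    P_D_f = np.where(P_C > t_c, P_D, 0.0)       # Equation (32): color gates depth
    return r_c * P_C_f + (1.0 - r_c) * P_D_f    # Equation (27)
```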

3.4. Scale Adaption

In this work, the bounding box of the tracked object is first decided by the traditional CAMShift algorithm [26]. For each frame, the bounding box $B_{pre}$ from the previous frame is employed as the start location of the mean shift process. The current tracking location can be calculated by
$$x = \frac{M_{10}}{M_{00}}, \quad y = \frac{M_{01}}{M_{00}}, \tag{33}$$
where $M_{10} = \sum_{x \in B} x P(x)$, $M_{01} = \sum_{x \in B} y P(x)$, and $M_{00} = \sum_{x \in B} P(x)$. Then the size of the bounding box is $s = 2\sqrt{M_{00}/256}$. For convenience, this bounding box is called the initial bounding box below.
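For clarity, here is a sketch of the localization step of Equation (33) on the fused back-projection restricted to the search window; OpenCV's cv2.CamShift performs the equivalent moment computation internally.

```python
import numpy as np

def camshift_step(P_window):
    """Centroid (x, y) and window size s from the back-projection in window B."""
    ys, xs = np.mgrid[0:P_window.shape[0], 0:P_window.shape[1]]
    m00 = P_window.sum()                 # M_00
    x = (xs * P_window).sum() / m00      # M_10 / M_00
    y = (ys * P_window).sum() / m00      # M_01 / M_00
    s = 2.0 * np.sqrt(m00 / 256.0)       # initial window size
    return x, y, s
```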
Getting the adequate scale of the bounding box is very important. If the scale cannot fit the object for a few frames, the tracker may finally drift from the target. Considering the size of the initial bounding box may be disturbed by the imprecision of the back-projection, a method in the neutrosophic domain is introduced into the scale identification process.
In this work, the color likelihood probability between the candidate object area and the object area is employed as the truth membership, and the color likelihood probability between the candidate object area and the background area is employed as the indeterminacy membership. For the scale-reducing alternative, the truth, indeterminacy, and falsity membership functions are defined as
$$T_{rs} = \sum_{u=1}^{m_c} \sqrt{\hat{q}_u^c\, \hat{p}_{rs,u}^c} \tag{34}$$
$$I_{rs} = \sum_{u=1}^{m_c} \sqrt{\hat{p}_{rs,u}^c\, \hat{b}_{rs,u}^c} \tag{35}$$
$$F_{rs} = 1 - T_{rs}, \tag{36}$$
where $\hat{p}_{rs,u}^c$ is the u-th bin of the feature vector in the initial bounding box with a reduced scale, and $\hat{b}_{rs,u}^c$ corresponds to the feature vector in the annular region near the scale-reduced bounding box. It must be emphasized that all the pixels taken into consideration must be located in the candidate object region $R_C$ in the current frame. By substituting Equations (34)-(36) into Equation (6) under the assumption that the ideal alternative is $A^* = \langle 1, 0, 0 \rangle$, the probability for reducing the scale is defined as
$$w_{rs} = \frac{\sqrt{\alpha}\, T_{rs}}{\sqrt{\alpha T_{rs}^2 + \beta I_{rs}^2 + \gamma F_{rs}^2}}. \tag{37}$$
Similarly, for the scale-expanding alternative, the truth, indeterminacy, and falsity membership functions are defined as
$$T_{exs} = \sum_{u=1}^{m_c} \sqrt{\hat{q}_u^c\, \hat{p}_{exs,u}^c} \tag{38}$$
$$I_{exs} = \sum_{u=1}^{m_c} \sqrt{\hat{p}_{exs,u}^c\, \hat{b}_{exs,u}^c} \tag{39}$$
$$F_{exs} = 1 - T_{exs}, \tag{40}$$
where $\hat{p}_{exs,u}^c$ and $\hat{b}_{exs,u}^c$ have meanings similar to $\hat{p}_{rs,u}^c$ and $\hat{b}_{rs,u}^c$, but with an expanded scale for the initial bounding box. Then the probability for expanding the scale is defined as
$$w_{exs} = \frac{\sqrt{\alpha}\, T_{exs}}{\sqrt{\alpha T_{exs}^2 + \beta I_{exs}^2 + \gamma F_{exs}^2}}. \tag{41}$$
Finally, the scale of the initial bounding box is updated by
$$\lambda_{new} = \begin{cases} 1 - \lambda_0 & \text{if } w_{rs} > s \cdot w_{exs} \\ 1 + \lambda_0 & \text{if } w_{exs} > s \cdot w_{rs} \\ 1 & \text{otherwise,} \end{cases} \tag{42}$$
where $\lambda_0 \in (0, 1)$ is the step value for scale adaption; it should be set to a relatively small value and is set to 0.04 in this work. The factor $s > 1$ is a scaling factor employed for suppressing noise in the color domain, and it is set to 1.1 here.
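A sketch of the scale decision of Equations (37), (41), and (42); λ0 = 0.04 and s = 1.1 follow the paper's settings, while the membership triples would come from Equations (34)-(40).

```python
import math

def scale_weight(t, i, f, alpha=0.5, beta=0.25, gamma=0.25):
    """w = sqrt(alpha)*T / sqrt(alpha*T^2 + beta*I^2 + gamma*F^2), Eqs. (37)/(41)."""
    den = math.sqrt(alpha * t**2 + beta * i**2 + gamma * f**2)
    return math.sqrt(alpha) * t / den if den > 0 else 0.0

def scale_update(tif_rs, tif_exs, lambda_0=0.04, s=1.1):
    """Return the scale factor lambda_new of Equation (42)."""
    w_rs, w_exs = scale_weight(*tif_rs), scale_weight(*tif_exs)
    if w_rs > s * w_exs:
        return 1.0 - lambda_0   # shrink the bounding box
    if w_exs > s * w_rs:
        return 1.0 + lambda_0   # expand the bounding box
    return 1.0                  # keep the current scale
```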
After the scale identification process, a bounding box with the same center as the initial bounding box, but with the tuned width and height is employed as the location of the target in the current frame. The method proposed in [11] for updating the feature vector of the target is employed here.

4. Experiment Results and Analysis

Several challenging video sequences captured with a Kinect sensor are employed. Both color and depth information are provided in each frame. As mentioned at the beginning, challenging factors like fast motion, blur, deformation, illumination variation, and camera jitter are considered, and all of the selected testing sequences are free of serious occlusion.
For comparison, the NeutRGBD [11] algorithm is employed. Like the tracker proposed here (NeutRGBDs), NeutRGBD also uses the CAMShift framework, but it employs different strategies for object region extraction, information fusion, and scale identification. In addition, we implemented a neutrosophic CAMShift tracker based on the tangent correlation coefficient [4] in the SVNS domain, which we call NeutRGBDst. The only difference between NeutRGBDs and NeutRGBDst is the correlation model, which ensures that any performance difference between them can be attributed to the choice of correlation model.
To gauge absolute performance, four other trackers are selected: compressive tracking (CT) [33], LGT [34], IVT [35], and TLD [36]. The tracking-by-detection scheme is employed by the three trackers other than LGT. For the LGT tracker, features like color, shape, and apparent local motion are introduced into the local layer updating process, and local patches are applied for representing the target's geometric deformation in the local layer.

4.1. Setting Parameters

For the proposed NeutRGBDs, the fixed sampling interval for selecting candidate object seeds is set to 0.02. Since the top 50% of pixels in the sorted pixel set are sampled, there are 26 candidate seeds for each frame. After applying the neutrosophic criteria, N = 6 seeds are finally selected as object seeds. In order to emphasize the color information to some extent, wD, wC, and wDC in Equation (17) are set to 0.3, 0.4, and 0.3, respectively. It is not advisable to set one or two of these three parameters to a very small value, because the information in the corresponding feature domain may be wrongly discarded. The threshold T defined in Equation (18), which decides the accuracy of R, is set to 50 mm. If T is set to a large enough value, the whole image region will be added to R, while a too-small value will lead to an incomplete object region; the accuracy of the region extraction result may influence the tracking result to some extent. According to the attributes of the testing sequences and the displacement of the target between adjacent frames, the parameter MAXD in Equation (26) is set to 70 mm. The weights wCn, wCf, wDn, and wDf in Equations (29) and (30) are all set equally to 0.5. A relatively low value should be given to the parameters TD and TC defined in Equations (31) and (32); otherwise, most of the useful information will be wrongly filtered out. In this work, both parameters are set to 0.1. The element weights α, β, and γ are set to 0.5, 0.25, and 0.25, respectively, in Equations (17), (29), (30), (37) and (41) to emphasize the truth element. All of these parameter values were chosen by hand-tuning, and all of them are kept constant across all experiments.

4.2. Evaluation Criteria

Both the center position error and the success ratio are considered. The center position error in each frame is the Euclidean distance between the center location of the tracked target and that of the manually labeled ground truth, and it is used to plot the center position error curve.
By setting an overlap score threshold r, defined as the minimum acceptable overlap ratio, one can decide whether an output is correct or not. The success ratio R is calculated by the following formula:
$$R = \sum_{i=1}^{N} u_i / N, \quad u_i = \begin{cases} 1 & \text{if } s_i > r \\ 0 & \text{otherwise,} \end{cases} \tag{43}$$
where N is the total number of frames, $s_i$ is the overlap score, and r is the corresponding threshold. A robust tracker will earn a higher value of R. The overlap score can be calculated as
$$s_i = \frac{area(ROI_{T_i} \cap ROI_{G_i})}{area(ROI_{T_i} \cup ROI_{G_i})}, \tag{44}$$
where $ROI_{T_i}$ is the region covered by the target bounding box in the i-th frame, and $ROI_{G_i}$ is the region covered by the corresponding ground truth bounding box.
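For reference, a sketch of the two metrics of Equations (43) and (44) for axis-aligned boxes given as (x, y, w, h):

```python
def overlap(box_t, box_g):
    """Overlap score s_i, Equation (44): intersection over union of two boxes."""
    ax, ay, aw, ah = box_t
    bx, by, bw, bh = box_g
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_ratio(tracked, ground_truth, r=0.5):
    """Success ratio R, Equation (43), over per-frame box pairs."""
    scores = [overlap(t, g) for t, g in zip(tracked, ground_truth)]
    return sum(s > r for s in scores) / len(scores)
```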

4.3. Tracking Results

Several screen captures for the testing sequences are given in Figure 1, Figure 2, Figure 3 and Figure 4. Success and center position error plots of each testing sequence are shown in Figure 5, Figure 6, Figure 7 and Figure 8. A more detailed discussion is described in the following.
Wr_no1 sequence: This sequence highlights the challenges of rolling, blur, fast motion, and appearance change. As shown in Figure 1, all the trackers perform well until frame #24. However, the bounding box calculated by the NeutRGBD tracker has a relatively small scale. As seen in frames #50 and #56, only the NeutRGBDs tracker produces an adequate scale for the bounding box. As shown in frames #56, #75, and #150, a relatively small scale is estimated by the NeutRGBDst tracker. The CT, LGT, and IVT trackers failed to track the rabbit in frame #75 because of the wrongly estimated scale, and the TLD tracker failed due to the challenge of serious appearance change when the person tried to turn the rabbit back. As shown in Figure 5a, the NeutRGBDs tracker has a good success ratio when different overlap thresholds are selected. Due to the good performance when dealing with the scale adaption problem, the center location of the tracked object is closer to the ground truth in most frames, as seen in Figure 5b. In summary, during the whole tracking process, the NeutRGBDs tracker has the best performance.
Toy_no sequence: Challenges like blur, fast motion, and rotation are included in this sequence. As seen in frame #6, the CT, IVT, and TLD trackers have already failed due to the fast motion of the toy. For the LGT tracker, an improper scale is estimated because the update of the local patches cannot keep up with such rapid change. For the NeutRGBD tracker, because the toy covers a relatively wide range of depth values while the object seeds only cover a small range, the extracted object region sometimes covers only parts of the toy. As seen in Figure 6, on account of this factor, the center location produced by the NeutRGBD tracker is less stable than that of the NeutRGBDs tracker. Thanks to the seed selection scheme, the NeutRGBDst tracker performs well on this sequence. Unlike the NeutRGBDs tracker, NeutRGBDst weights T, I, and F equally when calculating the neutrosophic correlation coefficient. As seen in Figure 2, the scale produced by the NeutRGBDs tracker is the most accurate. As shown in Figure 6a, both the NeutRGBDs and NeutRGBDst trackers perform well, and the NeutRGBDs tracker performs better when the overlap threshold is set to nearly 0.72.
Zball_no1 sequence: Challenges of illumination variation, rapid motion, and camera jitter exist in this sequence. Due to the challenges of appearance change and rapid motion, the TLD tracker fails early, as seen in Figure 3. The IVT tracker also fails due to the similar surroundings, especially the wood floor. Though a relatively large scale of the bounding box is estimated by the CT and LGT trackers, both of them localize the ball properly before frame #65. Under the challenges of similar surroundings, rapid motion, and camera jitter, the CT tracker has already failed before frame #91. As shown in Figure 7b, in terms of the center position error criterion, the NeutRGBDs, NeutRGBDst, and NeutRGBD trackers all perform well, owing to the information fusion in the neutrosophic domain. Thanks to the scale adaption and seed selection strategies, the NeutRGBDs tracker maintains a more appropriate scale than the other two, as seen in Figure 3 and Figure 7a.
Hand_no_occ sequence: This sequence presents the challenges of illumination variation, deformation, out-of-plane rotation, and similar surroundings. As shown in frame #2 in Figure 4, a large background region is chosen as the object area during tracker initialization for the CT, LGT, IVT, and TLD trackers. The three trackers other than LGT soon fail due to the weak initialization and the out-of-plane rotation challenge. The LGT tracker performs well throughout this sequence, mainly due to its use of apparent local motion. However, it is frequently disturbed by the similar surroundings, especially regions with similar color and displacement. As seen in Figure 8b, a more accurate center is produced by the NeutRGBDs tracker from frame #150 onward. Although all three neutrosophic information fusion trackers perform well in this sequence, the NeutRGBDs tracker produces a more accurate bounding box, as seen in Figure 4 and Figure 8.

4.4. Discussion

From the above illustrations of the tracking results, we see that a more accurate bounding box can be estimated when the element-weighted neutrosophic correlation coefficient is introduced into the CAMShift framework. First, the NeutRGBD tracker considers only depth information for seed selection, whereas a more robust seed selection scheme is applied by the NeutRGBDs tracker: each seed is judged using the T, I, and F elements in the neutrosophic domain, and the depth, color, and fused information are all taken into consideration. In addition, the truth element is emphasized for each criterion in the NeutRGBDs tracker. The object seeds play an essential role in the procedures of the NeutRGBD, NeutRGBDs, and NeutRGBDst trackers: robust object seeds help the tracker obtain a more accurate object region, as well as a more robust back-projection in the depth domain. Second, compared to the NeutRGBD tracker, the NeutRGBDs and NeutRGBDst trackers keep more useful information from both the color and depth domains in the final back-projection. Such a back-projection provides a more discriminative feature when the surroundings have a color or depth similar to the target. Finally, for the NeutRGBDs and NeutRGBDst trackers, a scale identification process is proposed for the first time in the neutrosophic domain, and this method contributes greatly when the CAMShift scheme fails to estimate an adequate scale. The only difference between the NeutRGBDs and NeutRGBDst trackers is the calculation of the correlation coefficient; the tangent correlation coefficient employed by the NeutRGBDst tracker treats the T, I, and F elements equally, and, as the above analysis indicates, this is the main reason the NeutRGBDs tracker consistently produces a more robust target bounding box.

5. Conclusions

A method of element-weighted neutrosophic correlation coefficient is proposed, and it is successfully applied to improving the CAMShift tracker in RGBD video. The experimental results reveal its robustness. For the selection of robust object seeds, three kinds of criteria are proposed, and each candidate seed is represented in the SVNS domain via three membership functions, T, I, and F. These seeds are then employed for extracting the object region and calculating the depth back-projection. Furthermore, the proposed neutrosophic correlation coefficient is applied for fusing the likelihood probabilities in both the color and depth domains. Finally, in order to adapt the scale of the bounding box, two alternatives in the neutrosophic domain are proposed, and the correlation coefficient between each proposed alternative and the ideal one is employed for scale identification. As discussed in this work, only challenges without serious occlusion are considered here; tackling the occlusion problem using RGBD information will be our primary task in future work.

Author Contributions

The algorithm was conceived and designed by K.H.; the experiments were performed and analyzed by all the authors; J.Y. and S.S. fully supervised the work and approved the paper for submission.

Funding

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61603258, 61703280, 61601200, and the Plan Project of Science and Technology of Shaoxing City under Grant No. 2017B70056.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Smarandache, F. Neutrosophy: Neutrosophic Probability, Set and Logic; American Research Press: Rehoboth, Delaware, 1998; p. 105. [Google Scholar]
  2. Wang, H.; Smarandache, F.; Zhang, Y.; Sunderraman, R. Single valued neutrosophic sets. Multisp. Multistruct. 2010, 4, 410–413. [Google Scholar]
  3. El-Hefenawy, N.; Metwally, M.A.; Ahmed, Z.M.; El-Henawy, I.M. A review on the applications of neutrosophic sets. J. Comput. Theor. Nanosci. 2016, 13, 936–944. [Google Scholar] [CrossRef]
  4. Ye, J.; Fu, J. Multi-period medical diagnosis method using a single valued neutrosophic similarity measure based on tangent function. Comput. Methods Programs Biomed. 2016, 123, 142–149. [Google Scholar] [CrossRef] [PubMed]
  5. Guo, Y.; Şengür, A. A novel image segmentation algorithm based on neutrosophic similarity clustering. Appl. Soft Comput. J. 2014, 25, 391–398. [Google Scholar] [CrossRef]
  6. Anter, A.M.; Hassanien, A.E.; ElSoud, M.A.A.; Tolba, M.F. Neutrosophic sets and fuzzy c-means clustering for improving CT liver image segmentation. Adv. Intell. Syst. Comput. 2014, 303, 193–203. [Google Scholar]
  7. Karabatak, E.; Guo, Y.; Sengur, A. Modified neutrosophic approach to color image segmentation. J. Electron. Imaging 2013, 22, 4049–4068. [Google Scholar] [CrossRef]
  8. Zhang, M.; Zhang, L.; Cheng, H.D. A neutrosophic approach to image segmentation based on watershed method. Signal Process. 2010, 90, 1510–1517. [Google Scholar] [CrossRef]
  9. Guo, Y.; Şengür, A.; Ye, J. A novel image thresholding algorithm based on neutrosophic similarity score. Meas. J. Int. Meas. Confed. 2014, 58, 175–186. [Google Scholar] [CrossRef]
  10. Guo, Y.; Sengur, A. A novel 3D skeleton algorithm based on neutrosophic cost function. Appl. Soft Comput. J. 2015, 36, 210–217. [Google Scholar] [CrossRef]
  11. Hu, K.; Ye, J.; Fan, E.; Shen, S.; Huang, L.; Pi, J. A novel object tracking algorithm by fusing color and depth information based on single valued neutrosophic cross-entropy. J. Intell. Fuzzy Syst. 2017, 32, 1775–1786. [Google Scholar] [CrossRef]
  12. Hu, K.; Fan, E.; Ye, J.; Fan, C.; Shen, S.; Gu, Y. Neutrosophic similarity score based weighted histogram for robust mean-shift tracking. Information 2017, 8, 122. [Google Scholar] [CrossRef]
  13. Biswas, P.; Pramanik, S.; Giri, B.C. TOPSIS method for multi-attribute group decision-making under single-valued neutrosophic environment. Neural Comput. Appl. 2015, 27, 727–737. [Google Scholar] [CrossRef]
  14. Kharal, A. A neutrosophic multi-criteria decision making method. New Math. Nat. Comput. 2014, 10, 143–162. [Google Scholar] [CrossRef]
  15. Ye, J. Single valued neutrosophic cross-entropy for multicriteria decision making problems. Appl. Math. Model. 2014, 38, 1170–1175. [Google Scholar] [CrossRef]
  16. Majumdar, P. Neutrosophic sets and its applications to decision making. In Adaptation, Learning, and Optimization; Springer: Cham, Switzerland, 2015; Volume 19, pp. 97–115. [Google Scholar]
  17. Ye, J. Multicriteria decision-making method using the correlation coefficient under single-valued neutrosophic environment. Int. J. Gen. Syst. 2013, 42, 386–394. [Google Scholar] [CrossRef]
  18. Baušys, R.; Juodagalvienė, B. Garage location selection for residential house by WASPAS-SVNS method. J. Civ. Eng. Manag. 2017, 23, 421–429. [Google Scholar] [CrossRef]
  19. Zavadskas, E.K.; Bausys, R.; Juodagalviene, B.; Garnyte-Sapranaviciene, I. Model for residential house element and material selection by neutrosophic MULTIMOORA method. Eng. Appl. Artif. Int. 2017, 64, 315–324. [Google Scholar] [CrossRef]
  20. Zavadskas, E.K.; Bausys, R.; Kaklauskas, A.; Ubarte, I.; Kuzminske, A.; Gudiene, N. Sustainable market valuation of buildings by the single-valued neutrosophic MAMVA method. Appl. Soft Comput. J. 2017, 57, 74–87. [Google Scholar] [CrossRef]
  21. Guo, Y.; Sengur, A. NCM: Neutrosophic c-means clustering algorithm. Pattern Recognit. 2015, 48, 2710–2724. [Google Scholar] [CrossRef]
  22. Smeulders, A.W.M.; Chu, D.M.; Cucchiara, R.; Calderara, S.; Dehghan, A.; Shah, M. Visual tracking: An experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1442–1468. [Google Scholar] [PubMed]
  23. Wu, Y.; Lim, J.; Yang, M.H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 2411–2418. [Google Scholar]
  24. Wang, X. Tracking Interacting Objects in Image Sequences; École Polytechnique Fédérale de Lausanne: Lausanne, Switzerland, 2015. [Google Scholar]
  25. Comaniciu, D.; Ramesh, V.; Meer, P. Kernel-based object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 564–577. [Google Scholar] [CrossRef]
  26. Bradski, G.R. Real time face and object tracking as a component of a perceptual user interface. In Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision, Princeton, NJ, USA, 19–21 October 1998; pp. 214–219. [Google Scholar]
  27. Wong, S. An FPGA Implementation of the Mean-Shift Algorithm for Object Tracking; AUT University: Auckland, New Zealand, 2014. [Google Scholar]
  28. Leichter, I. Mean shift trackers with cross-bin metrics. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 695–706. [Google Scholar] [CrossRef] [PubMed]
  29. Zhu, C. Video Object Tracking Using SIFT and Mean Shift. Master’s Thesis, Chalmers University of Technology, Gothenburg, Sweden, 2011. [Google Scholar]
  30. Bousetouane, F.; Dib, L.; Snoussi, H. Improved mean shift integrating texture and color features for robust real time object tracking. Vis. Comput. 2013, 29, 155–170. [Google Scholar] [CrossRef]
  31. Babenko, B.; Ming-Hsuan, Y.; Belongie, S. Visual tracking with online multiple instance learning. In Proceedings of the IEEE Conference on Computer Vision Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 983–990. [Google Scholar]
  32. Hu, K.; Zhang, X.; Gu, Y.; Wang, Y. Fusing target information from multiple views for robust visual tracking. IET Comput. Vis. 2014, 8, 86–97. [Google Scholar] [CrossRef]
  33. Kaihua, Z.; Lei, Z.; Ming-Hsuan, Y. Fast compressive tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2002–2015. [Google Scholar]
  34. Cehovin, L.; Kristan, M.; Leonardis, A. Robust visual tracking using an adaptive coupled-layer visual model. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 941–953. [Google Scholar] [CrossRef] [PubMed]
  35. Ross, D.; Lim, J.; Lin, R.-S.; Yang, M.-H. Incremental learning for robust visual tracking. Int. J. Comput. Vis. 2008, 77, 125–141. [Google Scholar] [CrossRef]
  36. Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1409–1422. [Google Scholar] [CrossRef] [PubMed]
  37. Almazan, E.J.; Jones, G.A. Tracking people across multiple non-overlapping RGB-D sensors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Portland, OR, USA, 23–28 June 2013; pp. 831–837. [Google Scholar]
  38. Han, J.; Pauwels, E.J.; De Zeeuw, P.M.; De With, P.H.N. Employing a RGB-D sensor for real-time tracking of humans across multiple re-entries in a smart environment. IEEE Trans. Consum. Electr. 2012, 58, 255–263. [Google Scholar]
  39. Sridhar, S.; Oulasvirta, A.; Theobalt, C. Interactive markerless articulated hand motion tracking using RGB and depth data. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2456–2463. [Google Scholar]
  40. Camplani, M.; Hannuna, S.; Mirmehdi, M.; Damen, D.; Paiement, A.; Tao, L.; Burghardt, T. Real-time RGB-D tracking with depth scaling kernelised correlation filters and occlusion handling. In Proceedings of the British Machine Vision Conference (BMVC), Swansea, UK, 7–10 September 2015; pp. 145:1–145:11. [Google Scholar]
Figure 1. Performance on “wr_no1” sequence by seven trackers.
Figure 2. Performance on “toy_no” sequence by seven trackers.
Figure 3. Performance on “zball_no1” sequence by seven trackers.
Figure 4. Performance on “hand_no_occ” sequence by seven trackers.
Figure 5. The quantitative plots for each tracker on the wr_no1 sequence: (a) success plots; (b) center position error plots.
Figure 6. The quantitative plots for each tracker on the toy_no sequence: (a) success plots; (b) center position error plots.
Figure 7. The quantitative plots for each tracker on the zball_no1 sequence: (a) success plots; (b) center position error plots.
Figure 8. The quantitative plots for each tracker on the hand_no_occ sequence: (a) success plots; (b) center position error plots.
Table 1. Basic flow of the NeutRGBDs tracker.
Algorithm 1 NeutRGBDs
Initialization
Input: 1st video frame in the RGBD domain
 (1) Select an object on the image plane
 (2) Extract object seeds using both color and depth information
 (3) Extract object region using object seeds and the information of the depth segmentation
 (4) Calculate the corresponding color and depth histograms as target model
Tracking
Input: (t+1)-th video frame in the RGBD domain
 (1) Calculate back-projections in both color (Pc) and depth domain (PD)
 (2) Calculate fused back-projection (P) using neutrosophic theory
 (3) Calculate the bounding box of the target in the CAMShift framework
 (4) Extract object region and update object model and seeds
 (5) Modify the scale of the bounding box in neutrosophic domain
Output: Tracking location
