1. Introduction
Usually, edge extraction serves as an early stage for computer vision tasks, such as object detection [1], image classification [2], and damage localization [3]. As stated by Marr and Hildreth [4], the concept of an edge has a partly visual and partly physical meaning. In the visual sense, an edge is defined as a location of abrupt variation in intensity [5]; such discontinuities can be detected as points of high gradient magnitude. In the physical sense, an edge corresponds not only to the real contour of the target of interest but also to differences in texture, changes in pigment, shadows, shadings, and even the contours of nearby objects that are not of interest. The scale of the edge also has both visual and physical meanings and plays a vital role in edge extraction methods [6,7].
Originating from the works of Witkin et al. [8] and Koenderink et al. [9], scale is expressed as a parameter of image resolution in the scale space. Many works have striven to determine the proper scale for edge extraction. For example, Lindeberg et al. [10,11] presented an automatic scale selection approach based on locally maximal normalized derivatives over scales, but the "scale" in this work is merely a measurement of the diffusion degree of the step edge. Liu et al. [12] refined this problem with the 3D Harris detector under an anisotropic diffusion assumption and obtained results that are more consistent with the subjective experience of human beings. Elder et al. [13] presented the concept of the minimum reliable scale, estimated from the directed second derivatives, and then used it as the assigned scale at every location to compute the edge gradients. The scale of the edge discussed in the above works is a resolution-related parameter connected with an isotropic/anisotropic diffusion process for edge extraction; it is more closely related to a bottom-up attribute of the edge in a local range. In contrast, we suggest estimating both the scale and orientation of the edge in a top-down manner for a better edge response.
In real applications, people may have certain prior knowledge about the scene and the perspective effect of the imaging process [14,15,16]. As shown in Figure 1, it is easy to obtain the imaging parameters (intrinsic and extrinsic parameters of the camera), some properties of the lane markings (e.g., the width of individual lane markings and the distance between adjacent lane markings), and the relative positions of the road and the camera. If we take the lane marking as the target of concern, this kind of top-down information is very useful for estimating the scale (referred to as the localized target size) and the orientation (referred to as the localized target orientation). Here, the localized target size for any candidate pixel of a local feature is defined as the minimal distance between the real local features from the same target along the normal direction of the localized target orientation nearest to that pixel. The localized target orientation for any candidate pixel of a local feature is defined as the tangent direction of the real boundary of the target of interest nearest to that pixel. With the help of the perspective effect, we know that a large scale should be chosen for the parts of the image nearer to the camera, while a small scale should be employed for the remote parts of the image to account for the pop-out effect of the expected edges. This perspective effect is universal and exists in both the human vision system and mechanical imaging processes [17,18].
To extract the edge of the lane marking under the perspective effect, we associate the scale of the edge in the image with the target size in the real world and assign different scales and orientations to the local filtering kernels. Naïvely applying filters with kernels of all these scales and orientations is very time consuming. There have been several studies on computing the edge response at an arbitrary orientation and scale, such as the steerable filter [19] and the deformable filter [20], but the original purpose of these studies was to obtain the responses at multiple orientations and multiple scales with only a few basic filters. For a highly accurate edge response, they still need to spend a large amount of time on a large number of basic filters.
In this paper, we propose a novel efficient filtering approach called the orientation and scale tuned difference of boxes (osDoB) for edge extraction. By implementing the difference of boxes (DoB) filter [7] on an integration map, we can reduce the computation needed for local filtering to only five add/subtract operations. This allows us to tune the box size, which is related to the scale of the edge, to any value easily. By rotating the original image to multiple orientations before generating the integration maps, we can tune the relative pose of the image and the filter to a specific orientation. By adjusting the aspect ratio of the DoB filter, we can obtain a response with a sharp directional profile. The osDoB filters can thus be easily tuned by scale and orientation and efficiently realized.
Recently, deep learning (DL) has achieved great success in computer vision tasks, including edge detection [21,22]. Most DL-based methods directly regress the edge map after training on a dataset. Compared to these methods, ours has three main advantages. First, our method is hand-crafted whereas DL methods are data-driven; thus, it does not require the large-scale and expensive pixel-level annotations that are essential for DL methods. Second, our method works by linear filtering, which is very efficient and hardware-friendly. In contrast, DL methods have a large number of parameters and occupy huge computational resources, which poses a big challenge for deployment on low-cost on-chip devices; considering that edge detection usually serves only as a prerequisite for more advanced tasks, this challenge is even more severe. Third, due to their high degree of non-linearity, DL methods are difficult to interpret, while the results of our method are easy to analyze and interpret.
The performance of our method is verified on both synthetic and real data. The real data come from traffic scenes, and we choose the lane marking as the target for edge extraction. Edge extraction for lane markings is a typical application in driver assistance systems (DAS). Furthermore, it is possible to compute the localized target size and localized target orientation ahead of edge extraction. Experimental results show that performance can be improved by assigning appropriate orientation and scale parameters for edge extraction. We also demonstrate that the early stage orientation tuning strategy used in our method cannot be replaced by a refined Canny detector combined with an orientation map at a later stage.
The main contributions of this work are threefold:
We deduce the relationship between the best scale parameter of certain filters and the localized target size.
A novel approach, called osDoB, is proposed for edge extraction. The proposed method can extract edges efficiently with preset scale and orientation.
We demonstrate that osDoB can improve the performance of edge extraction for lane marking by tuning the local filter ahead of time.
The paper is organized as shown in Figure 2. In Section 2, we discuss the problem of optimal scale and orientation assignment for the local edge detector: given the basic form of a local feature detector, if we happen to know the localized target size and the localized target orientation of the expected target, we can assign the best scale and orientation parameters to the specialized filter. In Section 3, we discuss the relationship between the prior information about the scene and the assignment of the optimal scale and orientation. We present the efficient edge extraction approach, osDoB, in Section 4. In Section 5, we analyze the experimental results. Concluding remarks and possible extensions of this work are given in Section 6.
2. Optimal Scale and Orientation Assignment
In the preceding section, we defined the localized target size and the localized target orientation of the expected target. Given the expected target to be detected, any location in the image may have an individual localized target size and localized target orientation. In this paper, we are only concerned with edge extraction. We assign a specialized edge extractor for every location in the image and assign optimal scale and optimal orientation to those filters for the best edge extraction performance. In this section, we discuss the optimal scale and orientation for a local edge detector.
2.1. Defining the Problem
Detecting a local feature s out of an image patch c has been the subject of many studies. This is especially true for local edges. The edges may be the real contour of the target or other edges inside the target, with which we are not concerned. For a patch of a natural image, if the noise is assumed to be additive, the noised patch can be expressed as

c(x) = s(x) + n(x).

Here, s(x) is the real content of the patch, and n(x) is the noise.
In our work, the signal is defined as a step edge at the origin with limited extension, which serves as the localized target size. For the 1D case, s(x) is a pulse of width d. For the 2D case, s(x, y) is a strip of width d along the y axis.
In these two cases, the parameter d corresponds to the localized target size. In many works, such as that of Canny, the signal is defined as an unbounded step edge, which allows the scale of the filter to be enlarged to suppress the noise. Usually, the problem of local feature detection is to design a certain filter, or a certain set of filters, to detect the local features using specific criteria. Here, h is the impulse response of the filter (or of one filter in the set). We restrict ourselves to linear filtering, so the output for the local patch is the convolution

r(x) = (h * c)(x).

Here, the filter h is supposed to be designed to detect the signal s from the noised patch c. This problem has been addressed many times. Now, we re-examine it for the case where s is the real contour of the target that we want to detect and, from a certain method, we happen to know the localized target size and the localized target orientation of the contour.
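As a toy numerical illustration of this setup (the signal position, kernel, and noise level here are illustrative assumptions, not the paper's exact parameters), an odd difference-of-boxes kernel responds most strongly at the two edges of a noisy pulse:

```python
import numpy as np

# Additive model c = s + n: a pulse s of width d plus white noise n.
d = 8
x = np.arange(64)
s = ((x >= 28) & (x < 28 + d)).astype(float)   # step edge at 28, extension d
rng = np.random.default_rng(0)
c = s + 0.05 * rng.standard_normal(x.size)     # noised patch

# Odd FIR edge filter h: a 1D difference of boxes with scale a.
a = 3
h = np.concatenate([np.ones(a), -np.ones(a)])
r = np.convolve(c, h, mode="same")             # linear filtering output

# The response peaks (positive/negative) near the two edges of the pulse.
edge_up, edge_down = int(np.argmax(r)), int(np.argmin(r))
```

Because the filter is odd, the rising and falling edges of the pulse produce responses of opposite sign, which is the behavior the detection criteria below are built on.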
2.2. Optimal Scale with Pre-Known Localized Target Size
As pointed out by Demigny [23], when the scale of the filter is small compared to the target, the detection of the local edge is unaffected by nearby edges, but the noise is not well suppressed. At first, enlarging the scale of the filter does not change the signal while it suppresses the noise, but if the scale becomes too large, the signal is also reduced due to the overlapping of the responses of nearby edges. As in Canny's work, the noise is assumed to be white and Gaussian. When the signal is a step edge at the origin with limited extension, Demigny derived the corresponding best detection criterion. Usually, h is an FIR filter, so there exist certain parameters that control the scale and shape of the filter: for example, the scale parameter of the Shen filter [24], the scale parameter of the Deriche filter [25], the scale parameter of the FDOG filter, and the parameter a of the DoB filter. Using dimensionless methods similar to those of a previous study [23], we can deduce the optimal scale parameters for these filters when the input signal has a pre-known extension size. Here, we list the results for several typical cases.
Case 1: Using FDOG as the local filter for the 1D case.
The FDOG kernel for the 1D case has the form of Equation (8). The optimal scale parameter must satisfy an implicit equation obtained from the best detection criterion. By solving this equation numerically, we find that, in order to detect a step edge at the origin with a limited extension of d, the optimal scale parameter for the 1D FDOG is a fixed multiple of d.
Case 2: Using FDOG as the local filter for the 2D case.
Here, we choose the 2D FDOG as the edge filter, whose kernel includes an aspect ratio parameter. The optimal scale parameter for this case must satisfy a corresponding equation, which shows that the aspect ratio has no effect on the optimal scale parameter for a given pre-known extension size. Solving this equation numerically for the 2D FDOG with a pre-known extension size d, the optimal scale parameter is again a fixed multiple of d. This result differs from the result for the 1D case, as in Equation (8), and is independent of the aspect ratio.
Case 3: Using DoB as the local filter for the 1D case.
The DoB kernel in the 1D case has the form of Equation (9). Using a deduction process similar to that for FDOG, we can obtain the optimal scale parameter.
Case 4: Using DoB as the local filter for the 2D case.
In the 2D case, the DoB filter [26] includes, as with the FDOG filter of Case 2, an aspect ratio parameter. The same optimal scale parameter as in Case 3 is obtained in this case.
In all of the above cases, we deduce the optimal scale parameter for the pre-known signal extension size using only the first of Canny's criteria: best detection. We do not consider the other two criteria, minimal localization error and minimal multiple responses, because in the discrete domain they do not vary significantly as the filter scale is enlarged, especially when the local filter is DoB [27].
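The flavor of this deduction can be illustrated numerically (a discrete sketch under assumed normalizations, not the paper's exact derivation): for a 1D DoB of half-width a and a pulse of width d under white noise, the best detection criterion, the edge response divided by the noise gain ||h||, peaks when a matches d:

```python
import numpy as np

def detection_snr(a, d):
    # Best-detection criterion for a 1D DoB of half-width a applied to
    # a pulse of width d whose rising edge sits at t = 0, under white
    # noise: signal response at the edge divided by the noise gain.
    t = np.arange(-a, a)                       # support of the DoB kernel
    h = np.where(t >= 0, 1.0, -1.0)            # odd difference-of-boxes kernel
    s = ((t >= 0) & (t < d)).astype(float)     # pulse of width d
    return np.dot(h, s) / np.linalg.norm(h)

d = 10
snrs = {a: detection_snr(a, d) for a in range(1, 4 * d)}
a_opt = max(snrs, key=snrs.get)                # the criterion peaks at a = d
```

For a < d the criterion grows with a (more signal is integrated), while for a > d it decays (the kernel extends beyond the pulse, gathering only noise), so the maximum sits at the pre-known extension size.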
2.3. Angular Accuracy with Pre-Known Localized Target Orientation
To detect a local feature with a special orientation, the local feature detector must be sensitive to that special orientation and have invariance to orientation mismatches. To discuss the orientational performance of an oriented local feature detector, we can expand the concept of
selectivity and
invariance [
28]. Selectivity refers to the fact that the local feature detector should only have a high response to the local feature with that special orientation and a minimal response to all the other orientations. The invariance means that the local feature detector should have certain tolerance to orientation errors, which means it still has a rather high response to the orientations near the specified orientation. In our work, the local pattern is part of a strip with the width of
d aligned along the
y axis. The local feature detector is an odd linear filter with certain shape parameters. Here, we discuss the orientational performance of the FDOG and the DoB for 2D as in Equations (
11) and (
14), and the
in both of the extractors is the shape parameter for the aspect ratio.
As discussed in the previous section, the optimal scale parameter of the 2D FDOG for a local patch of width d can be obtained by solving Equation (12). Here, the dominant orientation of the filter is the positive direction of the y axis, and the dominant orientation of the local pattern is the edge direction of the strip at the origin. The orientation mismatching angle is defined as the angle from the dominant orientation of the local pattern to the dominant orientation of the filter, as shown in Figure 3a. We also plot the response of the local edge extractor as the orientation mismatching angle varies, as shown in Figure 3b.
As shown in Figure 4a, better selectivity means that the response curve has a narrower distribution near the vertical axis, while better invariance means that the response curve has a broad shoulder and remains almost constant near the specified orientation. Intuitively, we can expect that increasing the aspect ratio gives the local filter better selectivity. Figure 4a shows the response curves for several aspect ratios. For the smallest aspect ratio, the response curve has good invariance but very poor selectivity; with larger aspect ratios, the response curve has better selectivity and worse invariance. To tune an oriented filter to a specific orientation, we need to control the invariance and selectivity of the oriented filter to a certain extent. The shape parameters can be controlled to trade off selectivity against invariance for a particular problem.
Poggio defined a filter's resolution by the mean width of the filter's response to the ideal signal. Similarly, we define the angular selectivity by the mean width of the response of filter h to the strip as a function of the orientation mismatching angle, and this can serve as a quantitative indicator of angular resolution. The response has certain symmetries around the origin, so we only need to consider one side of it. The angular selectivity depends on the specific shape of the filter; it is a function of the aspect ratio of the filter, as can be seen in Figure 4b.
Another option is to obtain the response of a filter at many orientations from a few basis filters, as described in the concept of the steerable filter by Freeman and Adelson [19]. Steering a steerable filter means synthesizing the filter at an arbitrary angle as a linear combination of rotated versions of the same filter's impulse response. Although very efficient, steerable filters always suffer from a trade-off between the number of basis filters and the angular resolution: for a filter to be steerable, certain constraints on its shape and angular resolution must hold. We argue that angular resolution is an important performance factor for the extraction of oriented features, and to design an oriented filter with a higher angular resolution, its shape must be freely adjustable. For this reason, we prefer to rotate the same filter to many orientations: all possible orientations are evenly divided, and a filter bank is designed with one filter per orientation. The angle between adjacent orientations can then be seen as the angular resolution of the filter bank, which is tightly connected with the selectivity of the filter's angular response and can serve as a practical definition of selectivity.
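The effect of the aspect ratio on angular selectivity can be probed numerically (a rough sketch under assumed discretization, not the paper's exact measurement): correlating a fixed odd box kernel with a rotated strip edge shows that a taller kernel (larger aspect ratio) has a narrower angular response:

```python
import numpy as np

def angular_response(theta, a=8, lam=3.0, d=8):
    # Response of an odd DoB kernel (half-width a, aspect ratio lam)
    # to a strip of width d whose edge passes through the origin and
    # is rotated by theta relative to the kernel's dominant direction.
    n = 4 * a
    y, x = np.mgrid[-n:n + 1, -n:n + 1]
    h = np.where((np.abs(x) <= a) & (np.abs(y) <= lam * a / 2),
                 np.sign(x), 0.0)
    u = x * np.cos(theta) + y * np.sin(theta)   # signed distance to the edge
    s = ((u > 0) & (u <= d)).astype(float)      # rotated strip pattern
    return float(np.sum(h * s))

thetas = np.deg2rad(np.arange(0, 91, 5))
r1 = np.array([angular_response(t, lam=1.0) for t in thetas])
r3 = np.array([angular_response(t, lam=3.0) for t in thetas])
# Number of sampled angles whose response stays above half the peak:
w1 = int(np.sum(r1 >= 0.5 * r1[0]))
w3 = int(np.sum(r3 >= 0.5 * r3[0]))
```

The half-peak width for lam = 3 is smaller than for lam = 1: the elongated kernel's far ends leave the tilted strip sooner, which is exactly the selectivity/invariance trade-off discussed above.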
For the DoB in Equation (14), the situation is similar. We plot the local filter response for several aspect ratios, as shown in Figure 5a, with the orientation mismatching angle varying over the same range as before. Similarly, we plot the angular selectivity as a function of the aspect ratio of the filter.
2.4. Local Edge Extractor Selection and Performance Analysis
We compared the FDOG and the DoB as the local feature detector, and we chose the DoB as the local edge extractor for the following reasons. Inspired by the integration map widely used in the computer vision community, we want to use the DoB to produce a fast algorithm: with the integration map, we can compute the response of the DoB at any scale using only a few addition and subtraction operations. By rotating the original image to multiple orientations before computing the integration maps, we can tune the filter to densely sampled orientations. The DoB also has the aspect ratio as a shape parameter, which can be used to balance angular selectivity against angular invariance. These two factors are important for tuning local features to any scale and to densely sampled orientations for fast realization in real-world applications. A similar reasoning underlies the gPb edge detector proposed by Arbeláez et al. [7].
These points highlight the advantages of the proposed method with respect to the other existing approaches:
- (1)
For the system to distinguish the real contour of an object from interference, such as false edges caused by noise or irrelevant events, the maximum acceptable scale is more important than the localization criteria. Therefore, when we have some pre-knowledge of the localized target size, we prefer to deduce the maximum acceptable scale using only the best detection criterion, ignoring the minimal localization error and minimal multiple response criteria.
- (2)
Considering the discrete expression of Canny's localization criterion under 2D conditions, the localization error always remains acceptable for different scale parameters of the DoB filter [29].
- (3)
As an FIR filter, the DoB is limited in filter quality, but its 2D shape gives it much more freedom than steerable filters. For example, by adjusting the aspect ratio properly, we can obtain a sharp angular response profile that balances angular selectivity and angular invariance.
For a local feature detector to be tuned by scale and orientation, the filter should show certain invariance to photometric and geometric deformations while keeping certain selectivity to the specified scale and orientation. If the response of the local feature detector is too robust with respect to scale and orientation, specifying the local scale and orientation becomes useless. Here, we investigate the performance of the optimal local filter over variations in scale and orientation. We use the orientation and scale-tuned DoB as the local feature detector and focus on its performance evaluation. The signal is still a strip, but both its width and its orientation are varied around the pre-known values, while the filter keeps the optimal scale parameter and the optimal aspect ratio. The contour plot of the response of the DoB with optimal parameters, as the localized target orientation and the localized target size vary around the pre-known values, is shown in Figure 6.
As shown in Figure 6, the optimal DoB has rather good angular performance when the localized target orientation varies around the specified orientation. However, the scale performance is not as good. When the real localized target size is less than the pre-known localized target size, the selectivity of the DoB is rather good; however, when the real localized target size is greater than the pre-known localized target size, the response of the DoB remains constant, which means it has no selectivity at all. This is an important issue to be considered in future work. The DoB is an acceptable scale- and orientation-tuned local edge extractor only when the real localized target size is no greater than the pre-known localized target size, which is usually the case in real applications.
Having an appropriate local edge extractor that can be tuned by scale and orientation, the next problem is how to assign the proper scale and orientation for a specific application. Depending on the relative positions of the target and the camera, the local features of the target can appear with different sizes and orientations in the image; this is the perspective effect. Pre-known information about the real size of the target, the relative positions of the target and the camera, and the perspective parameters of the imaging system tells us about the size, orientation, and location of the target as it appears in the image. Although this does not tell us where the target will appear in the image, we do know what its size and orientation will be wherever it does appear. Such information is also called the contextual information of the scene to be imaged.
The problem of lane marking detection and lane boundary detection by on-board camera systems is typical in that it involves some known information about the target and the imaging system. In the next section, we use lane marking detection to test the proposed orientation and scale-tuned local feature detector, and we illustrate how to assign the scale and orientation at certain image locations for a specific target.
4. Implementation
In traditional multi-scale strategies, only a few scales are used (e.g., three), and the whole image is convolved with each of the corresponding kernels. In this paper, the response at each position is obtained with an individually parameterized kernel; if we naively design and apply a specific kernel for each position, the computational cost is prohibitively large. Efforts have been made to alleviate this problem. Steerable filters [19] allow the response of a kernel with arbitrary orientation to be obtained as a linear combination of several kernels with fixed orientations, but only symmetric Gaussian kernels are suitable for this method. Another previous study presented the LDA method [20], in which all the possible kernels for different scales and orientations are collected and a few representative kernels are trained from them using LDA. We note that, if the possible kernels vary in both orientation and scale, the number of representative kernels needed to limit the reconstruction error is very large [20], which prevents the idea from being realized in real time; if the aspect ratio of the possible kernels is not 1, the number is even higher [33]. In this paper, we propose a novel, fast algorithm for this task that can be realized in real time.
In this project, we used the DoB to detect local features; the DoB can be regarded as a box approximation of a Gaussian-derivative kernel. Given the basic shape of the DoB kernel (e.g., the order, as shown in Figure 9), the DoB filters with different scale s, aspect ratio, and orientation are defined as a spanned filter space.
Integration maps [34,35] have been proposed as a method of computing the responses of Haar-like kernels on a natural image. For any rectangle R in an image, the sum S of the pixels within R can be efficiently computed on the integration map ii with only three addition/subtraction operations:

S = ii(C) − ii(B) − ii(D) + ii(A),

where A, B, C, and D are the coordinates of the four vertexes of R in clockwise order, and vertex A is located in the upper left corner. For more details about the integration map, please refer to [35].
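The rectangle-sum lookup can be sketched as follows (a minimal numpy illustration; the zero-padding convention is an implementation choice, not prescribed by the paper):

```python
import numpy as np

def integration_map(img):
    # ii[r, c] holds the sum of img[:r, :c]; the extra zero row and
    # column avoid special cases at the image border.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    # Sum over img[top:bottom, left:right] with four lookups and three
    # addition/subtraction operations: S = C - B - D + A.
    A = ii[top, left]       # upper-left vertex
    B = ii[top, right]      # upper-right vertex
    C = ii[bottom, right]   # lower-right vertex
    D = ii[bottom, left]    # lower-left vertex
    return C - B - D + A
```

The cost of `rect_sum` is independent of the rectangle size, which is what makes arbitrary per-pixel scale assignment affordable.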
However, this technique is only suitable for upright, axis-aligned kernels. To solve this problem, the polygon integration map was introduced, and it can be computed in real time [36]; it has been asserted that the sum over any polygonal area can then be computed in real time. However, we note that the discretization error of this approach cannot be eliminated, rendering the response inconsistent. In this paper, we combine the integration map with image rotation and propose a new real-time algorithm for computing the responses of DoB kernels on natural images, namely osDoB.
We then take the detection of lane markings as an example. As shown in Figure 8, the object scales of different parts of the lane markings are diverse because of the perspective effect: the optimal kernels for different locations have different scales, and the expected orientations of the local edges also differ. Parameter maps can be obtained using the perspective effect, as shown in the figure (only the scale map and the orientation map are shown; the entire image shares a single aspect ratio). In this paper, only a first-order DoB filter is adopted, as shown on the left side of Figure 9. Instead of generating filters with different orientations, we keep the filter fixed and rotate the image to multiple orientations. Meanwhile, to solve the problem of arbitrary scale assignment for the kernel, we use the integration map technique for real-time calculation. In other words, by combining the advantages of image rotation and the integration map, the proposed osDoB filter not only achieves nearly the same real-time performance as the box filter but also maintains the ability to obtain the filter response at any orientation and scale.
Given the parameter maps, we can compute the response map with the proposed algorithm. First, we rotate the image with different orientation steps (in this project, we use 18 orientations); then, for each rotated image, the integration map is computed, giving us 18 integration maps. When we compute the response for a location (x, y), we look up both the scale parameter s and the orientation parameter from the parameter maps. With these two parameters and the aspect ratio known beforehand, the kernel shape and orientation are determined. To avoid rotating the kernel, we compute the response in the standard DoB manner, but on the integration map of the rotated image: according to the parameters of the filter at location (x, y), we find the integration map corresponding to the given orientation and calculate the response with the DoB filter at the given scale. The response M at (x, y) is thus obtained in the standard DoB manner, with the given kernel shape, on the chosen integration map. A DoB contains two rectangles, where R(A, B, C, D) is the left rectangle (the negative part) and R(B, E, F, C) is the right rectangle (the positive part); the areas of these two rectangles can be computed efficiently.
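As a sketch (the vertex names follow the text; the zero-padded integration-map layout is an implementation assumption), the odd DoB response can be read from six integration-map entries, since the two rectangles share the vertices B and C:

```python
import numpy as np

def integration_map(img):
    # Zero-padded integral image: ii[r, c] = sum of img[:r, :c].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def dob_response(ii, top, left, height, width):
    # Two side-by-side boxes of size height x width: the left one is
    # the negative part R(A, B, C, D) and the right one the positive
    # part R(B, E, F, C). They share vertices B and C, so only six
    # distinct lookups are needed for the whole DoB.
    A = ii[top, left]
    B = ii[top, left + width]
    E = ii[top, left + 2 * width]
    D = ii[top + height, left]
    C = ii[top + height, left + width]
    F = ii[top + height, left + 2 * width]
    negative = C - B - D + A          # sum over the left box
    positive = F - E - C + B          # sum over the right box
    return positive - negative
```

For an ideal vertical step edge aligned with the shared boundary of the two boxes, the response equals the box area times the step height, regardless of scale.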
The process is summarized in Algorithm 1.
Algorithm 1 Workflow of osDoB.
Input: Image I with height H and width W; the scale and orientation for the kernel at each (x, y); the aspect ratio for all the kernels.
Output: The response map M.
1: Pad the image boundaries.
2: for each orientation do
3:    Generate the rotated image.
4:    Generate its integration map.
5: end for
6: for each valid position (x, y) do
7:    Select the integration map by orientation.
8:    Position the vertexes by scale and aspect ratio.
9:    Calculate the difference of boxes (DoB).
10: end for
11: return M
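A compact sketch of this workflow (the nearest-neighbor rotation, the clamped border handling, and the direct use of pixel coordinates on each rotated map are simplifying assumptions; a full implementation would also map (x, y) into the rotated frame):

```python
import numpy as np

def rotate_nn(img, deg):
    # Nearest-neighbor rotation about the image center, with clamped
    # borders; a simple stand-in for the rotation routine.
    H, W = img.shape
    t = np.deg2rad(deg)
    y, x = np.mgrid[0:H, 0:W]
    yc, xc = (H - 1) / 2.0, (W - 1) / 2.0
    xs = np.cos(t) * (x - xc) + np.sin(t) * (y - yc) + xc
    ys = -np.sin(t) * (x - xc) + np.cos(t) * (y - yc) + yc
    xi = np.clip(np.round(xs).astype(int), 0, W - 1)
    yi = np.clip(np.round(ys).astype(int), 0, H - 1)
    return img[yi, xi]

def osdob_response(img, scale_map, orient_map, aspect=3.0, n_orient=18):
    # One integration map per orientation, then a per-pixel DoB lookup.
    H, W = img.shape
    step = 180.0 / n_orient
    maps = []
    for k in range(n_orient):
        rot = rotate_nn(img, k * step)
        ii = np.zeros((H + 1, W + 1))
        ii[1:, 1:] = rot.cumsum(axis=0).cumsum(axis=1)
        maps.append(ii)
    M = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            s = int(scale_map[y, x])                   # box width (half DoB)
            h = int(round(aspect * s))                 # box height
            k = int(round(orient_map[y, x] / step)) % n_orient
            ii = maps[k]
            r0, r1 = y - h // 2, y - h // 2 + h
            if r0 < 0 or r1 > H or x - s < 0 or x + s > W:
                continue                               # skip near borders
            right = ii[r1, x + s] - ii[r0, x + s] - ii[r1, x] + ii[r0, x]
            left = ii[r1, x] - ii[r0, x] - ii[r1, x - s] + ii[r0, x - s]
            M[y, x] = right - left
    return M
```

Note that all the expensive work (rotation and cumulative sums) is done once per orientation, so the per-pixel cost is a handful of lookups regardless of the assigned scale.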
5. Experiments
We first used a synthesized image to show the performance of the proposed method in detecting objects with different scales and orientations. Then, we performed experiments on real-world images under the perspective effect. Many edge detectors have been developed in the last few decades, such as those of Sobel, Prewitt, Roberts, etc. Though the Canny detector [
37] was proposed in the early days of computer vision, it is still a state-of-the-art edge detector with high efficiency. Edge detectors with better performance than Canny usually require longer computation time [
38,
39]. Thus, in this experiment, we qualitatively compared our method with the Canny detector, and compared our method with the Canny, Sobel, Prewitt and Roberts detectors quantitatively.
Figure 10 shows the synthesized image employed for comparison. The objects used were rectangles with different orientations and scales: nine objects arranged in a 3 × 3 array, where the objects in each row have the same scale and the objects in each column have the same orientation. Here, we chose the direction of the longer side of each rectangle as the localized target orientation for osDoB. Therefore, osDoB is supposed to extract only the edges along the orientation of the object and to have little response at other orientations, especially along the direction of the shorter side of the rectangle. In contrast, Canny is supposed to extract all the edges of the objects.
The images near the top of Figure 10 show the noisy initial image, the gradient map produced by the proposed method (osDoB, tuning both the scale and orientation), and the edge map produced by osDoB. Near the bottom of the figure are three edge maps produced by the Canny edge detector with different scales of the Gaussian blurring kernel. We can see that osDoB extracts only the edges corresponding to the preset orientation and detects all of the objects well, with quite acceptable localization accuracy. For the Canny detector, however, it is difficult to detect all the objects simultaneously, even using different scales: if the Gaussian kernel scale is small, false edges caused by noise pop out; if the scale is large, the small objects disappear and the shapes of large objects are deformed. Another reason why the shapes of the large objects are destroyed is that the Canny detector uses a symmetric Gaussian kernel with an aspect ratio of 1; as discussed above, such an aspect ratio gives very low angular accuracy, which in turn affects the localization accuracy. In the proposed method, by contrast, the aspect ratio can be selected arbitrarily; in this experiment, we used a 3:1 aspect ratio and thus obtained quite good angular accuracy. It is worth noting that osDoB may fail completely if we have no knowledge of the localized target orientation. Thankfully, in a real scene, as discussed below, we can obtain both the localized target size and the localized target orientation from prior knowledge about the scene and the perspective effect of the imaging process.
The perspective effect is always present in the imaging process, in both human and machine vision. In general, it is very hard to assign a specific orientation and scale to each location in an image. However, the expected orientation and scale can be determined from two factors: first, the extrinsic and intrinsic parameters of the imaging equipment; second, the prior knowledge of the targets/objects to be detected. As mentioned in the preceding section, there is one case, road detection, in which we can determine the orientation map and the scale map ahead of time under certain constraints. The intrinsic and extrinsic parameters of the imaging equipment are known beforehand, and the other parameters regarding the road shape and the relative pose between the road and the vehicle can either be obtained by tracking or set to constants. Hence, the optimal scale and the expected orientation can be computed for the optimal response at each location. In this paper, we use road images to validate the performance of the proposed algorithm.
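Under a flat-road assumption, per-pixel expectation maps of this kind can be sketched from a known vanishing point: the expected edge orientation at each pixel points toward the vanishing point, and the expected scale shrinks linearly toward the horizon. The helper below is hypothetical; the names `expected_maps`, `vp`, and `width_at_bottom` are ours, standing in for the calibrated imaging parameters, not values from the paper.

```python
import numpy as np

def expected_maps(h, w, vp, width_at_bottom=8.0):
    """Per-pixel expected orientation and scale for a road image.

    A sketch under a flat-road assumption: lane edges head toward the
    vanishing point `vp` = (vx, vy), and the apparent marking width
    shrinks linearly to zero at the horizon row vy.
    """
    vx, vy = vp
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # a lane edge passing through (x, y) heads toward the vanishing point
    orientation = np.arctan2(vy - ys, vx - xs)   # radians
    # apparent size shrinks linearly to zero at the horizon row
    scale = width_at_bottom * np.clip((ys - vy) / (h - 1 - vy), 0.0, 1.0)
    return orientation, scale
```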
In this paper, we use a dataset provided by a previous study [40] to compare the proposed edge detector with other algorithms. This dataset consists of 116 high-quality color road images captured in different scenarios. For each image, the lane marking areas are manually labeled as the ground truth. Some examples from the dataset and the corresponding human-labeled ground truth are shown in
Figure 11. We employ the whole dataset for evaluation. In the evaluation, if an edge point labeled by an algorithm is near the edge of the ground truth (the block distance between them is less than two pixels), we consider it to be marked correctly.
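The two-pixel matching rule can be sketched as follows. This is a minimal illustration; the helper name `is_hit` is ours, and we read "block distance" as the city-block (L1) distance.

```python
import numpy as np

def is_hit(point, gt_mask, tol=2):
    """True if a detected edge point lies within block (city-block / L1)
    distance `tol` of some ground-truth edge pixel."""
    y, x = point
    ys, xs = np.nonzero(gt_mask)          # coordinates of ground-truth edges
    if ys.size == 0:
        return False
    return bool((np.abs(ys - y) + np.abs(xs - x)).min() < tol)
```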
In this experiment, both the ROC (receiver operating characteristic) curve [41] and the DSC (Dice similarity coefficient) curve [42] were used to evaluate these algorithms. The ROC curve plots the true-positive rate (hit rate) against the false-positive rate (false-alarm rate) while shifting the threshold from zero to one. The area under the ROC curve (AUC) is called the ROC score, and it reflects the performance of the algorithm. In this experiment, the false-positive rate and the true-positive rate were computed by matching the edge map given by an algorithm to that of the ground truth at each threshold value. The DSC curve is another evaluation method, which records the DSC values against the thresholds. The maximum DSC value measures the performance of an algorithm at its optimal threshold setting. The DSC value is defined as
\( \mathrm{DSC} = \frac{2\,\mathrm{TP}}{T + \mathrm{TP} + \mathrm{FP}} \),
where TP is the number of true positives, FP is the number of false positives, and T is the number of true points in the ground truth.
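The threshold sweep described above can be sketched as follows. This is a simplified version: matching here is pixel-exact rather than tolerant to a two-pixel block distance, and the function name `roc_and_dsc` is ours.

```python
import numpy as np

def roc_and_dsc(grad, gt, thresholds=None):
    """Threshold sweep over a normalized gradient map, recording the ROC
    points, the DSC values, and the ROC score (AUC)."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    gt = gt.astype(bool)
    T = gt.sum()                                  # true points in ground truth
    N = (~gt).sum()
    fpr, tpr, dsc = [], [], []
    for t in thresholds:
        pred = grad >= t
        tp = (pred & gt).sum()
        fp = (pred & ~gt).sum()
        tpr.append(tp / T)                        # hit rate
        fpr.append(fp / N)                        # false-alarm rate
        dsc.append(2.0 * tp / (T + tp + fp))      # DSC = 2TP / (T + TP + FP)
    fpr, tpr, dsc = map(np.asarray, (fpr, tpr, dsc))
    o = np.lexsort((tpr, fpr))                    # order ROC points by (fpr, tpr)
    auc = float(np.sum(np.diff(fpr[o]) * (tpr[o][1:] + tpr[o][:-1]) / 2.0))
    return fpr, tpr, dsc, auc
```

The AUC is computed with the trapezoid rule over the sorted ROC points, and the maximum of the returned `dsc` array gives the DSC score at the optimal threshold.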
We compared our approach with the Canny edge detector. There are two versions of our method: one tunes only the orientation (oDoB); the other tunes both the orientation and the scale (osDoB). We slid the threshold for each model from 0 to 1 and recorded the thresholded edge maps. We then computed the ROC and DSC curves for each model.
Some examples of the comparison are shown in
Figure 12. We show here six road images with different scenarios. In each grid, the top row shows the input image, the expected orientation map, and the expected scale map (from left to right). These maps can be computed using the estimated imaging parameters and the estimated vanishing points. In the lower part of the figure, there are three columns showing the results of Canny, oDoB (only tuning the orientation), and osDoB (tuning both the orientation and scale). Each column contains two maps: the gradient magnitude map (top) and the edge map (bottom).
As shown in
Figure 12, the proposed model can handle both curving lane markings (as shown in
Figure 12e) and straight ones. From these examples, we observe that orientation tuning is of great importance for edge detection. As shown in
Figure 12a,c, most of the heavy shadows on the road surface were removed. This is very helpful: without orientation tuning, spurious edge points, such as those caused by shadows on the road surface (see the corresponding Canny results), would yield spurious peaks in the Hough space. It has been believed that, for the Canny edge detector, this type of problem can be handled by combining it with an orientation map, removing the false edge points by checking the gradient orientation against the expected orientation. In the following, we will show that this is not true.
As shown in
Figure 13, there are two cases in which orientation tuning will play an important role. In the first case, orientation tuning removes noisy edges with unexpected orientations. As shown in
Figure 13a, besides several lane markings, there are many unexpected white lines with unexpected orientations. The osDoB can suppress these lines. In this case, a similar result could also be obtained by the Canny detector combined with an orientation map: these edges can be removed after the edge detection stage by checking whether they possess the expected orientation. In the second case, however, the osDoB plays a truly essential role. A single edge point may possess more than one orientation [19]. In traditional edge detection, such as the Canny edge detector, the orientation with the strongest response is assigned to the edge point. However, a point may respond more weakly at the expected orientation than at some other orientation, even though it lies on a local line with a consistent expected orientation. As shown in
Figure 13b, the edges on the boundary of the lane marking have the same orientation, but some of them show higher responses at other orientations because of heavy shadows. The Canny detector labels these points with the wrong orientation, and such edges may then be removed in the non-maximum suppression stage because they lack consistent orientations, as shown in the lower part of
Figure 13b. The proposed method employs an orientation tuning strategy, so even points that respond weakly at the expected orientation can be extracted.
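The difference between the two strategies can be illustrated on a per-pixel stack of orientation responses. This is a toy sketch with hypothetical names, not the paper's implementation.

```python
import numpy as np

def tuned_vs_argmax(responses, thetas, expected):
    """Contrast two per-pixel strategies on an orientation response stack:
    argmax keeps only the strongest orientation (so later orientation
    gating can merely keep or discard that label), while tuning reads the
    response at the expected orientation directly, even when it is not
    the maximum.  `responses` has shape (n_orientations, n_pixels)."""
    i = int(np.argmin(np.abs(thetas - expected)))        # nearest sampled angle
    argmax_theta = thetas[np.argmax(responses, axis=0)]  # per-pixel winner
    tuned = responses[i]                                 # expected-angle response
    return argmax_theta, tuned
```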
Assigning a proper scale to each point is also important for achieving the maximal SNR under such a strategy. As shown in
Figure 12d, the oDoB can suppress the edges with unexpected orientations, but some of them are still present, because the response of these points is very strong at smaller scales. When we additionally assign a proper scale to each point (osDoB), we observe that most of the unexpected edges are removed. The results shown in
Figure 12f also support this conclusion. In the input image, there is a passenger car that possesses many edges and lines with unexpected orientations. Most of these edges can be suppressed by assigning the proper scale.
The quantitative results are shown in
Figure 14a,b. We compare oDoB and osDoB with classic edge detectors, namely Canny, Sobel, Prewitt, and Roberts. The ROC curves of these models are shown in
Figure 14a. We also adopt the AUC (area under the curve) of the ROC curve as the evaluation metric and annotate each curve with its corresponding AUC. Comparing quantitatively, we can see that oDoB achieves performance comparable to Canny, while osDoB attains the highest AUC, higher than that of Canny. The osDoB always obtains the highest hit rate among these edge detectors at any given false-positive rate. From the DSC curves, which are shown in
Figure 14b, we can conclude that orientation tuning produces better performance, and that tuning both the orientation and the scale produces the best performance. The optimal threshold for osDoB is
.
6. Conclusions
We address the problem of edge extraction according to top-down information about certain targets under the perspective effect. The top-down information contains two parts: one is the intrinsic and extrinsic parameters of the imaging process; the other is the prior of the targets in the scene (in this paper, we consider edge extraction for the purpose of target detection). Once we have this information, we have a better chance of extracting the local edges arising from the real contours of the target, while ignoring the edges caused by noise or other irrelevant events, such as texture, shadows, shading, and even edges from nearby objects. In contrast to traditional methods, we introduce the top-down information into the low-level feature extraction stage as early as possible.
In this paper, we argue that the most important aspects for enhancing a local edge are the expected scale (localized target size) and the expected orientation (localized target orientation). We assign a specific orientation and scale to each location in the image before edge extraction. To compute the response of an edge detector at many orientations and arbitrary scales in real time, we proposed the osDoB. To demonstrate that osDoB can be tuned to arbitrary orientations and scales in real time, we solved the lane detection problem in challenging scenes. Experimental results show that performance can be greatly improved by introducing global appearance information during local feature detection.
We argue that the scale and orientation should carry more physical meaning, connected with the real target information. The same problem arises with other local features in the computer vision literature: "similar to other concepts used in computer vision, such as 'texture' and 'face', the notion of contour is the result of common human experience rather than a formal mathematical definition" [6]. Although, in this work, we assign scale and orientation for local edge detection in a rather hard manner, it is also possible to assign scale and orientation for local feature detectors in a soft manner, for example by applying the Gestalt principle [43]; this can be addressed in future work. The osDoB can serve as an efficient local filter for a specific orientation and an arbitrary scale, and it has many uses in applications such as local edge extraction for local feature descriptors.