Article

Efficient Filtering for Edge Extraction under Perspective Effect

1 College of Intelligence Science, National University of Defense Technology, Changsha 410073, China
2 Xingshen Tech, Changsha 410073, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2021, 11(18), 8558; https://doi.org/10.3390/app11188558
Submission received: 30 July 2021 / Revised: 7 September 2021 / Accepted: 10 September 2021 / Published: 15 September 2021

Abstract

Though it is generally believed that edges should be extracted at different scales when using a linear filter, it is still difficult to determine the optimal scale for each filter. In this paper, we propose a novel approach called orientation and scale tuned difference of boxes (osDoB) to solve this problem. For certain computer vision applications, such as lane marking detection, the prior information about the concerned target can facilitate edge extraction in a top-down manner. Based on the perspective effect, we associate the scale of the edge in an image with the target size in the real world and assign orientation and scale parameters for filtering each pixel. Considering the fact that it is very time-consuming to naïvely perform filters with different orientations and scales, we further design an extended integration map technology to speed up filtering. Our method is validated on synthetic and real data. The experimental results show that assigning appropriate orientation and scale parameters for filters is effective and can be realized efficiently.

1. Introduction

Usually, edge extraction serves as an early stage for computer vision tasks, such as object detection [1], image classification [2], and damage localization [3]. As stated by Marr and Hildreth [4], the concept of an edge has a partly visual and partly physical meaning. For the visual aspect, an edge is defined as a location of a mathematically precipitous variation in intensity [5]. The discontinuities of the image can be detected as points of high gradient magnitude. For the physical aspect, an edge corresponds not only to the real contours of the target of interest but also to differences in texture, changes in pigment, shadows, shadings, and even the contours of nearby objects that we do not care about. The scale of the edge also has both visual and physical meaning, and plays a vital role in edge extraction methods [6,7].
Originating from the works of Witkin et al. [8] and Koenderink et al. [9], scale is expressed as a parameter of image resolution in the scale space. Many works strive to determine the scale for edge extraction. For example, Lindeberg et al. [10,11] presented an automatic scale selection approach based on locally maximal normalized derivatives over scales, but the "scale" in this work is just a measurement of the diffusion degree of the step edge. Liu et al. [12] refined this problem with the 3D Harris detector under an anisotropic diffusion assumption, and obtained better results that are consistent with the subjective experience of human observers. Elder et al. [13] presented the concept of minimum reliable scale, estimated from the directed second derivatives, and then used it as the assigned scale for every location to compute the edge gradients. The scale of the edge discussed in the above works is a resolution-related parameter, connected with an isotropic/anisotropic diffusion process to extract the edge. It is more closely related to a bottom-up attribute of the edge in a local range. In contrast, we suggest estimating both the scale and orientation of the edge in a top-down manner for a better edge response.
In real applications, people may have certain prior knowledge about the scene and the perspective effect of the imaging process [14,15,16]. As shown in Figure 1, it is easy to obtain the imaging parameters (intrinsic and extrinsic parameters of the camera), some properties of the lane markings (e.g., the width of individual lane markings and the distance between adjacent lane markings), and the relative positions of the road and the camera. If we take the lane marking as the target of interest, this kind of top-down information is very useful for estimating the scale (referred to as the localized target size) and the orientation (referred to as the localized target orientation). Here, the localized target size for any candidate pixel of a local feature is defined as the minimal distance, along the normal direction of the localized target orientation, between the real local features from the same target nearest to that pixel. The localized target orientation for any candidate pixel of a local feature is defined as the tangent direction of the real boundary of the target of interest nearest to that pixel. With the help of the perspective effect, we know that, for the parts of the image nearer to the camera, a large scale should be chosen; for remote parts of the image, we should employ a small scale to account for the pop-out effect of the expected edges. This perspective effect is universal and exists in both the human vision system and mechanical imaging processes [17,18].
To extract the edge of the lane marking under the perspective effect, we associate the scale of the edge in the image with the target size in the real world and assign different scales and orientations to the local filtering kernels. Naïvely performing filters with these kernels at different scales and orientations is very time-consuming. There have been several studies on computing the edge response at an arbitrary orientation and scale, such as the steerable filter [19] and the deformable filter [20], but the original purpose of these studies is to obtain the responses at multiple orientations and multiple scales with only a few basic filters. For a highly accurate edge response, they still need a large number of basic filters, which takes a large amount of time.
In this paper, we propose a novel efficient filtering approach called orientation and scale tuned difference of boxes (osDoB) for edge extraction. By implementing the difference of boxes (DoB) filter [7] on an integration map, we can reduce the computation needed for local filtering to only five add/subtract operations. This allows us to tune the box size, which is related to the scale of the edge, to any value easily. By rotating the original image to multiple orientations to generate the integration map, we can tune the relative pose of the image and filter to a specific orientation. By adjusting the aspect ratio of the DoB filter, we can gain a response with sharp directional profile. The osDoB filters can thus be easily tuned by scale and orientation, and also efficiently realized.
Recently, deep learning (DL) has achieved great success in computer vision tasks, including edge detection [21,22]. Most DL-based methods directly regress the edge map after training on a dataset. Compared to these DL methods, our method has three main advantages. First, our method is hand-crafted while DL methods are data-driven. Thus, our method does not require large-scale and expensive pixel-level annotations, which are essential for DL methods. Second, our method works by linear filtering and is very efficient and hardware-friendly. In contrast, DL methods have a large number of parameters and occupy huge computation resources, which is a big challenge for deployment on low-cost on-chip devices. Considering that edge detection is usually only used as a prerequisite for other advanced tasks, this challenge is even more severe. Third, due to their high degree of non-linearity, DL methods are difficult to interpret. In contrast, with our method it is easy to analyze and interpret the results.
The performance of our method is verified on both synthetic and real data. The real data come from traffic scenes, and we choose lane markings as the target for edge extraction. Edge extraction for lane markings is a typical application in driver assistance systems (DAS). Furthermore, it is possible for us to compute the localized target size and localized target orientation ahead of edge extraction. Experimental results show that performance can be improved by assigning appropriate orientation and scale parameters for edge extraction. We also demonstrate that the early-stage orientation tuning strategy used in our method cannot be replaced by a refined Canny detector combined with an orientation map at a later stage.
The main contributions of this work are threefold:
  • We deduce the relationship between the best scale parameter of certain filters and the localized target size.
  • A novel approach, called osDoB, is proposed for edge extraction. The proposed method can extract edges efficiently with preset scale and orientation.
  • We demonstrate that osDoB can improve the performance of edge extraction for lane marking by tuning the local filter ahead of time.
The paper is organized as shown in Figure 2: in Section 2, we discuss the problem of optimal scale and orientation assignment for the local edge detector. Given the basic form of a local feature detector, if we happen to know the localized target size and the localized target orientation of the expected target, we can assign the best scale parameter and orientation parameters for the specialized filter. In Section 3, we discuss the relationship between the prior information of the scene and the assignment of optimal scale and orientation. We present the efficient edge extraction approach, osDoB, in Section 4. In Section 5, we analyze the experimental results. Concluding remarks and possible extensions of this work are shown in Section 6.

2. Optimal Scale and Orientation Assignment

In the preceding section, we defined the localized target size and the localized target orientation of the expected target. Given the expected target to be detected, any location in the image may have an individual localized target size and localized target orientation. In this paper, we are only concerned with edge extraction. We assign a specialized edge extractor for every location in the image and assign optimal scale and optimal orientation to those filters for the best edge extraction performance. In this section, we discuss the optimal scale and orientation for a local edge detector.

2.1. Defining the Problem

Detecting the local feature s ( x ,   y ) out of image patch p ( x ,   y ) has been the subject of many studies. This is especially true for local edges. The edges may be the real contour of the target or the other edges inside the target, with which we are not concerned. For a patch of natural images, if the noise is assumed to be additive, the noised patch can be expressed as
p ( x ,   y ) = s ( x ,   y ) + n ( x ,   y ) .
Here, s ( x ,   y ) is the real context of the patch, and n ( x ,   y ) is the noise.
In our work, the signal is defined as a step edge at the origin with limited extension, which works as the localized target size. For the 1D case, s(x) is a pulse of width d extended from x = 0 to x = d:
s(x) = 1 for x \in [0, d], and s(x) = 0 otherwise.
For the 2D case, s ( x ,   y ) is a strip of width d along the y axis:
s(x, y) = 1 for (x, y) \in [0, d] \times (-\infty, +\infty), and s(x, y) = 0 otherwise.
In these two cases, the parameter d corresponds to the localized target size. In many works, such as that of Canny, the signal is defined as an unbounded step edge, which allows the scale of the filter to be enlarged to suppress the noise. Usually, the problem of local feature detection is to design a certain filter or a certain set of filters to detect the local features using specific criteria. Here, h(x, y) is the impulse response of the filter (or one of the filters in the set). We restrict ourselves to linear filtering, so the output of the filtered version of the local patch is as follows:
\hat{p}(x, y) = s(x, y) * h(x, y) + n(x, y) * h(x, y).
Here, h(x, y) is supposed to be designed to detect s(x, y) from p(x, y). This problem has been addressed many times. Now, we are going to re-examine this problem when s(x, y) is the real contour of the target that we want to detect and when, from some other method, we happen to know the localized target size and the localized target orientation of the contour.
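The linear model above can be reproduced in a few lines. The following is a minimal sketch (not from the paper) of a noisy 1D pulse of width d filtered with an odd linear kernel; the 1D DoB kernel used here is only an illustrative choice.

import numpy as np

rng = np.random.default_rng(0)
d = 20                                    # localized target size (pulse width), in samples
x = np.arange(-100, 100)
s = ((x >= 0) & (x <= d)).astype(float)   # step pulse s(x): 1 on [0, d], 0 otherwise
n = 0.2 * rng.standard_normal(x.size)     # additive white Gaussian noise n(x)
p = s + n                                 # observed patch p(x) = s(x) + n(x)

a = d                                     # 1D DoB half-width, set to the pulse width
h = np.concatenate([-np.ones(a), np.ones(a)]) / (2 * a)

p_hat = np.convolve(p, h, mode="same")    # by linearity this equals s*h + n*h
print(np.abs(p_hat).argmax() - 100)       # strongest response lies near a pulse edge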

2.2. Optimal Scale with Pre-Known Localized Target Size

As has been pointed out by Demigny [23], when the scale of the filter is small compared to the target, detection of the local edge is unaffected by the nearby edges, but the noise is not well suppressed. At first, enlarging the scale of the filter does not change the signal response but does reduce the noise; however, if the scale of the filter is too large, the signal is also reduced due to the overlapping of the responses of the nearby edges. As in Canny's work, the noise is supposed to be white and Gaussian. When the signal is a step edge at the origin with limited extension, Demigny gave the best detection criterion as follows:
C_d = \frac{\int_{0}^{d} h(x)\, dx}{\left[ \int_{-\infty}^{+\infty} h^2(x)\, dx \right]^{1/2}}.
Usually, h ( x ,   y ) is an FIR filter, so there exist certain parameters that control the scale and shape of the filter. For example, the parameter β for the Shen filter [24] is as follows:
h(x) = \mathrm{sgn}(x)\, e^{-\beta |x|},
The parameter α for the Deriche filter [25] is as follows:
h(x) = x\, e^{-\alpha |x|},
The parameter σ for the FDOG (first derivative of Gaussian) filter is as follows:
h(x) = x\, e^{-\frac{x^2}{2\sigma^2}},
The parameter a for DoB is as follows:
h(x) = \mathrm{sgn}(x) for x \in [-a, a], and h(x) = 0 otherwise.
Using dimensionless methods, similar to those of a previous study, we can deduce the optimal scale parameters for those filters when the input signal has a pre-known extension size [23]. Here, we list the results for several typical cases.
Case 1: Using FDOG as the local filter for the 1D case.
The FDOG kernel for the 1D case has the form of Equation (8). The optimal scale parameter for this case must follow the following equation:
(2 d^2 + \sigma^2)\, e^{-\frac{d^2}{2\sigma^2}} = \sigma^2.
By solving this equation numerically, we can infer that, in order to detect a step edge at the origin with the limited extension of d, the optimal scale parameter for FDOG in the 1D case should be 0.462578 d.
Case 2: Using FDOG as the local filter for the 2D case.
Here, we choose the 2D FDOG as the edge filter, so the kernel is as follows:
h(x, y) = \frac{x}{2 \pi \lambda \sigma^4}\, e^{-\left( \frac{x^2}{2\sigma^2} + \frac{y^2}{2\lambda^2\sigma^2} \right)}.
Here, λ is the aspect ratio for the filter. The optimal scale parameter for this case must follow the equation:
-\sigma^2 + \sigma^2 e^{-\frac{d^2}{2\sigma^2}} + d^2 e^{-\frac{d^2}{2\sigma^2}} = 0.
This means that aspect ratio λ has no effect on the optimal scale parameters for a given pre-known extension size. Solving this equation numerically, for FDOG in the 2D case with a pre-known extension size d, the optimal scale parameter σ can be obtained as 0.630835 d . This result is different from the result for the 1D case, as in Equation (8), and has nothing to do with the aspect ratio λ .
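As a quick numerical check of the two transcendental conditions above, the following sketch (not from the paper, assuming SciPy is available) solves Equations (10) and (12) with d normalized to 1; the roots match the values quoted in the text.

import numpy as np
from scipy.optimize import brentq

d = 1.0

def f_1d(sigma):
    # (2 d^2 + sigma^2) exp(-d^2 / (2 sigma^2)) - sigma^2 = 0, Equation (10)
    return (2 * d**2 + sigma**2) * np.exp(-d**2 / (2 * sigma**2)) - sigma**2

def f_2d(sigma):
    # -sigma^2 + (sigma^2 + d^2) exp(-d^2 / (2 sigma^2)) = 0, Equation (12)
    return (sigma**2 + d**2) * np.exp(-d**2 / (2 * sigma**2)) - sigma**2

print(brentq(f_1d, 0.1, 1.0))   # ≈ 0.462578 d (1D case)
print(brentq(f_2d, 0.1, 2.0))   # ≈ 0.630835 d (2D case)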
Case 3: Using DoB as the local filter for the 1D case.
The DoB kernel in the 1D case has the form Equation (9). Using a similar deducing process to that with FDOG, we can attain the optimal scale parameter as:
a = d .
Case 4: Using DoB as the local filter for the 2D case.
As in the 2D case, the DoB filter [26] can be defined as follows:
h(x, y) = \frac{1}{2 a \lambda} for x \in (0, a), y \in (-\lambda a, \lambda a); \quad h(x, y) = -\frac{1}{2 a \lambda} for x \in (-a, 0), y \in (-\lambda a, \lambda a); \quad h(x, y) = 0 otherwise.
Similar to the FDOG filter of Case 2, we have λ as the aspect ratio for the filter. The same result as Case 3 for the optimal scale parameter can be attained in this case, which is:
a = d .
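A minimal sketch (not from the paper's code) of the discrete 2D DoB kernel in Equation (14) follows: a negative box on x \in (-a, 0) and a positive box on x \in (0, a), both spanning y \in (-\lambda a, \lambda a), with the scale set to the pre-known localized target size (a = d).

import numpy as np

def dob_kernel(a, lam):
    """Discrete 2D DoB kernel with half-width a and aspect ratio lam."""
    half_h = max(int(round(lam * a)), 1)
    xs = np.arange(-a, a)                 # x samples covering (-a, a)
    ys = np.arange(-half_h, half_h + 1)   # y samples covering (-lambda*a, lambda*a)
    k = np.zeros((ys.size, xs.size))
    k[:, xs < 0] = -1.0                   # negative box
    k[:, xs >= 0] = 1.0                   # positive box
    return k / (2.0 * a * lam)            # the 1/(2 a lambda) normalization of Equation (14)

d = 8                                     # pre-known localized target size in pixels
kernel = dob_kernel(a=d, lam=3)           # optimal scale a = d; the aspect ratio is a free choice
print(kernel.shape, kernel.sum())         # zero-mean kernel, antisymmetric in x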
In all of the above cases, we deduce the optimal scale parameter for the pre-known signal extension size using only the first of Canny's criteria, best detection. We do not consider the other two criteria, minimal localization error and minimal multiple responses, because in the discrete domain they do not vary significantly as the filter scale is enlarged, especially when the local filter is DoB [27].

2.3. Angular Accuracy with Pre-Known Localized Target Orientation

To detect a local feature with a specific orientation, the local feature detector must be sensitive to that orientation and have invariance to orientation mismatches. To discuss the orientational performance of an oriented local feature detector, we can expand the concepts of selectivity and invariance [28]. Selectivity refers to the fact that the local feature detector should only have a high response to the local feature with that specific orientation and a minimal response to all other orientations. Invariance means that the local feature detector should have a certain tolerance to orientation errors, i.e., it should still have a rather high response to orientations near the specified orientation. In our work, the local pattern is part of a strip of width d aligned along the y axis. The local feature detector is an odd linear filter with certain shape parameters. Here, we discuss the orientational performance of the 2D FDOG and DoB in Equations (11) and (14); in both extractors, λ is the shape parameter for the aspect ratio.
As discussed in the previous section, the optimal scale parameter σ of the 2D FDOG for a local patch with width d can be obtained by solving Equation (12). Here, the dominant orientation of the filter is the positive direction of the y axis, and the dominant orientation of the local pattern is the edge direction of the strip at the origin. The orientation mismatching angle is defined as the angle from the dominant orientation of the local pattern to the dominant orientation of the filter, as shown in Figure 3a. We also plot the response of the local edge extractor with λ = 1, while the orientation mismatching angle γ varies from −π to π, as shown in Figure 3b.
As shown in Figure 4a, better selectivity means the response curve should have a narrower distribution as it nears the Response axis. Better invariance means the response curve should have a broad shoulder and remain almost constant around γ = 0. Intuitively, we can expect that increasing λ will give the local filter better selectivity. Figure 4a shows the response curves for λ = 10^{-1}, λ = 10^{-1/2}, λ = 10^{0}, λ = 10^{1/2}, and λ = 10^{1}. When λ < 1, the response curve has good invariance but very poor selectivity. With larger values of λ, especially λ > 3, the response curve has better selectivity and worse invariance. To tune an oriented filter to a specific orientation, we need to control the invariance and selectivity of the oriented filter to a certain extent. There are certain shape parameters that can be controlled to produce a tradeoff between selectivity and invariance for a specific problem.
Poggio has defined the filter's resolution by the mean width of the filter response g(x) to the ideal signal. Similarly, we define the angular selectivity ϑ(h) by the mean width of the response g(γ) of filter h to the strip with orientation γ, and this can serve as a quantitative indicator for angular resolution. The response has certain symmetries around the origin. We consider the response from −π/2 to π/2:
\vartheta(h) = \left[ \frac{\int_{-\pi/2}^{\pi/2} \gamma^2\, g(\gamma)\, d\gamma}{\int_{-\pi/2}^{\pi/2} g(\gamma)\, d\gamma} \right]^{1/2}.
The angular selectivity depends on the special shape of the filter. It is a function of the aspect ratio of the filter, which can be seen in Figure 4b.
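The following is a small numerical illustration (not from the paper) of Equation (16): given a sampled angular response g(γ) on [−π/2, π/2], the angular selectivity is its RMS width. The response curve g below is a synthetic placeholder, used only to demonstrate the computation.

import numpy as np

gamma = np.linspace(-np.pi / 2, np.pi / 2, 1001)
g = np.exp(-(gamma / 0.3) ** 2)            # placeholder angular response curve

num = np.trapz(gamma**2 * g, gamma)
den = np.trapz(g, gamma)
theta_h = np.sqrt(num / den)               # smaller value = sharper angular response
print(theta_h)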
To find the response of a filter at many orientations, one can apply many rotated versions of the same filter, each differing by a small rotation, or use the steerable filter concept of Freeman and Adelson [19]. Steering a steerable filter involves synthesizing the filter at an arbitrary angle using a linear combination of rotated versions of the same filter's impulse response. Although this is very efficient, steerable filters always suffer from a trade-off between the number of basic filters and the angular resolution. For the filter to be steerable, it must satisfy certain constraints regarding its shape and angular resolution. We argue that angular resolution is an important performance factor for the extraction of oriented features. To design an oriented filter with a higher angular resolution, we need its shape to be freely adjustable. For this reason, we prefer to rotate the same filter to many orientations. All possible orientations are evenly divided and a filter bank is designed with one filter for each orientation. Here, the angle between adjacent orientations can be seen as the angular resolution of the filter bank. The angular resolution is connected tightly with the selectivity of the filter's angular response, and we can use this as the definition of selectivity.
For DoB in Equation (14), this is still the case. We plot the local filter response with λ = 10^{-1}, λ = 10^{-1/2}, λ = 10^{0}, λ = 10^{1/2}, and λ = 10^{1}, as shown in Figure 5a, with the orientation mismatching angle varying from −π/2 to π/2. Similarly, we plot the angular selectivity against the aspect ratio of the filter.

2.4. Local Edge Extractor Selection and Performance Analysis

We compared the FDOG and DoB as the local feature detector, and we choose DoB as the local edge extractor for the following reasons. Inspired by the integration map widely used in the computer vision community, we want to use DoB to produce a fast algorithm. With the integration map, we can compute the response of DoB at any scale using only five addition and subtraction operations. By rotating the original image to multiple orientations before computing the integration map, we can tune the orientation over densely sampled orientations. The DoB also has the shape parameter of the aspect ratio, which can be used to balance angular selectivity and angular invariance. These two factors are important for tuning local features to any scale and a densely sampled orientation with a fast realization in real-world applications. A similar motivation underlies the gPb edge detector proposed by Arbeláez et al. [7].
These points highlight the advantages of the proposed method with respect to the other existing approaches:
(1) For the system to distinguish the real contour of an object from interference, such as false edges caused by noise or irrelevant events, the maximum acceptable scale is more important than the localization criteria. So, when we have some pre-knowledge about the localized target size, we prefer to deduce the maximum acceptable scale using only the best detection criterion while ignoring the minimal localization error and minimal multiple response criteria.
(2) Considering the discrete expression of Canny's localization criterion under 2D conditions, the localization error will always be acceptable with different scale parameters for the DoB filter [29].
(3) As an FIR filter, the DoB filter is limited with respect to filter quality, but its 2D shape allows it much more freedom than steerable filters. For example, by adjusting the aspect ratio properly, we can attain a sharp angular response profile that balances angular selectivity and angular invariance.
For a local feature detector to be tuned by scale and orientation, the filter should show certain invariance to photometrical and geometrical deformation while keeping certain selectivity to the specified scale and orientation. If the responses of the local feature detector are too robust with respect to scale and orientation, specifying the local scale and orientation becomes useless. Here, we investigate the performance of the optimal local filter over variations in scale and orientation. We use the orientation and scale-tuned DoB as the local feature detector, and will focus on the performance evaluation of DoB. The signal is still a strip, but its width varies from 0.1 d to 2 d, and its orientation varies from −π/2 to π/2. The optimal filter has the optimal scale parameter a = d and the optimal aspect ratio parameter λ = 2.98. The contour plot of the response of the DoB with these optimal parameters, as the localized target orientation varies from −π/2 to π/2 and the localized target size varies from 0.1 d to 2 d, is shown in Figure 6.
As shown in Figure 6, the optimal DoB has a rather good angular performance when the localized target orientation varies around the specified orientation. However, the scale performance is not as good. When the real localized target size is less than the pre-known localized target size, the selectivity of DoB is rather good. However, when the real localized target size is greater than the pre-known localized target size, the response of DoB remains constant, which means it has no selectivity at all. This is an important issue that should be considered in future works. The DoB is an acceptable local edge extractor that can be tuned by scale and orientation only when the real localized target size is less than the pre-known localized target size, which is usually acceptable in real applications.
Having an appropriate local edge extractor that can be tuned by scale and orientation, the next problem is how to assign the proper scale and orientation for a specific application. Depending on the relative positions of the target and the camera, the local features of the target can appear at different sizes and orientations in the image; this is the perspective effect. Pre-known information about the real size of the target, the relative positions of the target and the camera, and the perspective parameters of the imaging system provides some clarity about the size, orientation, and location of the target as it appears in the image. Although this does not tell us where the target will appear next in the image, we do know what its size and orientation will be when it does. This information is also called the contextual information regarding the scene to be imaged.
The problem of lane marking detection and lane boundary detection by on-board camera systems is typical in that it involves some known information about the target and the imaging system. In the next section, we use lane marking detection to test our proposed orientation and scale-tuned local feature detector, and we illustrate how to assign the scale and orientation of certain image locations for special target recognition.

3. Lane Marking Detection

To detect the lane markings in a road scene, we do have some pre-knowledge about the task. First, we know the real size of the lane markings. Here, we care only about the width of the lane markings. According to national standards on road construction, the width of lane markings painted on the road surface may vary from 15 to 20 cm. Second, the purpose of a lane detection system is to find the lane markings that indicate the intended local trajectory of the vehicle. It is reasonable to suppose that the lane markings must lie on the ground surface in the local area. This means that we have some information about the relative position between the lane markings and the on-board camera. Because on-board cameras mounted on vehicles are carefully designed and calibrated, we can readily obtain the perspective parameters of the imaging system. Those parameters are usually related to the intrinsic and the extrinsic parameters of the on-board camera, which can be determined using certain calibration procedures. With all this information, although we cannot predict where in the image the lane markings will appear, we do know the scale and orientation of the lane markings, at least within a local range around the vehicle, as shown in Figure 1.
To obtain an explicit description of the scale and orientation of the lane markings in the frame buffer, we require a parametrical expression of all the pixels on the lane markings in the frame buffer.

3.1. Parametric Model of Projective Lane Markings

Once we parameterize the relative position between the lane markings and the vehicle, we can determine all of the possible appearances of the lane markings with varying relative poses and positions. This parameterized expression of the lane markings is the lane model. Although a variety of different lane-modeling techniques have been used, most of them can be divided into two categories. The first category involves modeling the lane markings themselves in the vehicle coordinates. Bertozzi and Broggi [30] assumed that the road markings would form parallel lines in an inverse-perspective-warped image. Others have used approximations of flat roads with piecewise constant curvatures. The second type of technique involves modeling the appearance of the lane marking directly in the frame buffer. For example, deformable contours, such as B-snakes [31], have been used to parameterize the appearance of curved lanes. Because we need to parameterize the relative positions of the lane markings and the vehicle, we must model the lane markings in the vehicle coordinates, using the values of the relative positions of the lane markings and the vehicle as parameters.
Before we deduce the parametrical expression of the lane markings in the frame buffer, we first define the coordinate systems. To illustrate the orientation of the camera in the vehicle coordinates and the projection from 3D space to 2D space in a simple way, we define the following four coordinate systems: the frame buffer (C, R) in pixels, the image coordinates (X_I, Y_I) in m, the camera coordinates (X_C, Y_C, Z_C) in m, and the vehicle coordinates (X_V, Y_V, Z_V) in m. Here, we use the ENU system (east–north–up) as the positive direction of the vehicle coordinates. The origin is the intersection of the X_V axis with the plane that passes through the origin of the camera coordinates and is perpendicular to the X_V axis. The definitions of the other coordinate systems are straightforward and are not given here; they are shown in Figure 7.
The appearance of the lane markings in the frame buffer can be determined as a transformation from the 3D Euclidean space of the vehicle coordinates to the 2D Euclidean space of the frame buffer. In practice, this type of transform can be divided into two stages: the first stage is the transformation from the vehicle coordinate system to the camera coordinate system, and the second stage is the projection from the camera coordinate system to the image frame buffer.
Usually, the onboard camera of the lane-detection system is carefully mounted on the vehicle. It is reasonable to assume that the offset of the optical center in the vehicle coordinates is (0, 0, h) and that the rotation angles from the vehicle coordinates to the camera coordinates are (φ, π/2 + θ, 0), as illustrated in Figure 7. Furthermore, if we combine all of the steps of the transformation into one expression, the perspective projection from the 3D vehicle coordinates to the 2D frame buffer can be expressed as follows:
c = B_{CI} + \frac{x_v \cos(\phi) + y_v \sin(\phi)}{c_f \left[ (h - z_v)\sin(\theta) + \cos(\theta)\left( y_v \cos(\phi) - x_v \sin(\phi) \right) \right]}, \quad r = B_{RI} + \frac{(h - z_v)\cos(\theta) - \sin(\theta)\left( y_v \cos(\phi) - x_v \sin(\phi) \right)}{r_f \left[ (h - z_v)\sin(\theta) + \cos(\theta)\left( y_v \cos(\phi) - x_v \sin(\phi) \right) \right]}.
Here, c_f = \frac{1}{N_c f} represents the real length of one pixel divided by the focal length, and N_c is the distance between two adjacent pixels in the horizontal direction. r_f = \frac{1}{N_r f} represents the real width of one pixel divided by the focal length, and N_r is the distance between two adjacent pixels in the vertical direction. B_{CI} and B_{RI} are the coordinates of the optical center in the frame buffer.
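The following is an illustrative sketch of the ground-to-frame-buffer projection in Equation (17). Because the equation is reconstructed here, the sign conventions below follow one plausible reading of Figure 7 and are assumptions rather than the authors' exact code; the camera parameters in the example are arbitrary.

import numpy as np

def project_to_frame(xv, yv, zv, h, theta, phi, cf, rf, Bci, Bri):
    """Project a point (xv, yv, zv) in vehicle coordinates to frame-buffer (c, r)."""
    lateral = yv * np.cos(phi) - xv * np.sin(phi)
    depth = (h - zv) * np.sin(theta) + np.cos(theta) * lateral   # common denominator term
    c = Bci + (xv * np.cos(phi) + yv * np.sin(phi)) / (cf * depth)
    r = Bri + ((h - zv) * np.cos(theta) - np.sin(theta) * lateral) / (rf * depth)
    return c, r

# Example: a ground point 10 m ahead and 1.5 m to the right of a camera 1.2 m high.
print(project_to_frame(1.5, 10.0, 0.0, h=1.2, theta=0.05, phi=0.0,
                       cf=1 / 1000, rf=1 / 1000, Bci=640, Bri=360))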
To obtain an explicit description of the appearance of the lane markings in the frame buffer, we first require a lane model of the lane markings in the vehicle coordinates. Here, we represent the lane markings in the vehicle coordinates as a parametric function of a variable t, so the lane markings can be expressed as (x_v(t), y_v(t), z_v(t)). To analyze lane markings of certain shapes, we expand this parametric function of the lane markings with a Taylor series at the origin,
x_v(t) = c_0 + c_1 t + c_2 t^2 + c_3 t^3 + O[t^4], \quad y_v(t) = l_0 + l_1 t + O[t^2], \quad z_v(t) = h_0 + h_1 t + O[t^2].
If we interpret the parameter t as the arc length along the running direction of the vehicle in a local area, it makes sense to express x_v(t) with more terms because we care more about the shape of the lane markings along the lateral direction. c_0 is the lateral offset of the vehicle relative to the lane marking. c_1 = \tan(\phi_v) is the tangent of the heading angle of the vehicle relative to the local direction of the lane marking. c_2 and c_3 are the curvature parameters of the lane marking. If both c_2 and c_3 are zero, the lane markings modeled by Equation (20) are straight lines. If c_3 is zero and c_2 is non-zero, the curvature of the lane markings is constant. If both c_2 and c_3 are non-zero, the curvature of the lane markings varies linearly with the arc length t. This is typical of lane markings [32]. l_0 is the offset of the arc length of the lane marking in the vehicle coordinates. It makes sense to set l_0 to 0 and l_1 to 1 without a loss of generality. h_0 is the variation in the vehicle's height due to the soft suspension system of modern vehicles. h_1 = \tan(\theta_v) is the tangent of the vehicle's pitch angle, which is also caused by the suspension system. In this way, we can rewrite the parametric function of the lane marking and omit the higher-order terms,
x_v(t) = c_0 + \tan(\phi_v)\, t + c_2 t^2 + c_3 t^3, \quad y_v(t) = t, \quad z_v(t) = h_0 + \tan(\theta_v)\, t.
To develop a simpler form of the lane marking, the yaw angle and pitch angle of the vehicle can be combined with the corresponding angles of the camera, so the influence of those two terms can be accounted for in the perspective projection transformation. The height variation of the vehicle can be combined with the height of the camera in the vehicle coordinates, and the sum can be accounted for in the perspective projection transformation. In this way, we can determine the parametric function of the lane marking as follows:
x_v(t) = c_0 + c_2 t^2 + c_3 t^3, \quad y_v(t) = t, \quad z_v(t) = 0.
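A minimal sketch (not from the paper's code) of sampling this simplified lane model, Equation (20), is given below; the parameter values are arbitrary examples.

import numpy as np

def lane_points(c0, c2, c3, t_max=60.0, n=200):
    """Sample lane-marking points in vehicle coordinates along arc length t."""
    t = np.linspace(1.0, t_max, n)        # start slightly ahead of the vehicle
    xv = c0 + c2 * t**2 + c3 * t**3       # lateral offset with curvature terms
    yv = t                                # longitudinal (running) direction
    zv = np.zeros_like(t)                 # flat-road assumption (z_v = 0)
    return xv, yv, zv

xv, yv, zv = lane_points(c0=-1.75, c2=1e-3, c3=1e-5)   # a gently curving marking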
The refined perspective projection transformation from the 3D vehicle coordinate to the 2D image frame buffer is as follows:
c = B_{CI} + \frac{x_v \cos(\phi + \phi_v) + y_v \sin(\phi + \phi_v)}{c_f \left[ (h + h_0 - z_v)\sin(\theta + \theta_v) + \cos(\theta + \theta_v)\left( y_v \cos(\phi + \phi_v) - x_v \sin(\phi + \phi_v) \right) \right]}, \quad r = B_{RI} + \frac{(h + h_0 - z_v)\cos(\theta + \theta_v) - \sin(\theta + \theta_v)\left( y_v \cos(\phi + \phi_v) - x_v \sin(\phi + \phi_v) \right)}{r_f \left[ (h + h_0 - z_v)\sin(\theta + \theta_v) + \cos(\theta + \theta_v)\left( y_v \cos(\phi + \phi_v) - x_v \sin(\phi + \phi_v) \right) \right]}.
We can rewrite the formula with h_p = h + h_0, \phi_p = \phi + \phi_v, and \theta_p = \theta + \theta_v. Under real conditions, the combined pitch angle \theta_p and the combined yaw angle \phi_p are very small; it is common for them to be kept below 0.1 rad. Under these conditions, we can expand all the sinusoidal functions with Taylor series and ignore all terms of second order and higher. The simplified perspective projection transformation can be rewritten as follows:
c = B_{CI} + \frac{x_v y_v}{c_f y_v^2} + \frac{x_v z_v - h_p x_v}{c_f y_v^2}\, \theta_p + \frac{x_v^2 + y_v^2}{c_f y_v^2}\, \phi_p, \quad r = B_{RI} + \frac{(h_p - z_v) y_v}{r_f y_v^2} + \frac{y_v^2 - (h_p - z_v)^2}{r_f y_v^2}\, \theta_p + \frac{(h_p - z_v) x_v}{r_f y_v^2}\, \phi_p.
Substituting the simplified parameter functions Equation (20) into the above perspective projection transformation Equation (22), the final appearance of the lane markings can be attained as a parameter function form, as follows:
c = B_{CI} + \frac{c_0}{c_f}\, \frac{1}{t} + \frac{-c_2 h_p \theta_p + \phi_p + 2 c_0 c_2 \phi_p}{c_f} + \frac{c_2 - c_3 h_p \theta_p + 2 c_0 c_3 \phi_p}{c_f}\, t + \frac{c_3 + c_2^2 \phi_p}{c_f}\, t^2 + \frac{2 c_2 c_3 \phi_p}{c_f}\, t^3, \quad r = B_{RI} + \frac{h_p}{r_f}\, \frac{1}{t} + \frac{\theta_p + 2 c_2 h_p \phi_p}{r_f} + \frac{c_3 h_p \phi_p}{r_f}\, t.

3.2. Scale and Orientation Assignment for Lane Markings

Once we have the parametric function of the lane markings in the frame buffer, the local orientation at the boundary of the lane markings is the tangent direction of the parametric function at that location. This can be formulated as the derivative of c with respect to r at any pixel location,
O(c, r) = \frac{\partial c}{\partial r}\bigg|_{(c, r)}.
Substituting this into the parametric function in Equation (23), we can determine the parametric form of the local orientation at the boundary of the lane markings, according to the arc length parameter t,
O(c, r) = \frac{r_f \left[ t^2 \left( c_3 h_p \phi_p + 2 c_0 c_3 \phi_p + c_2 \right) - c_0 \right]}{c_f h_p \left( c_3 t^2 \phi_p - 1 \right)} + \frac{r_f \left[ 6 c_2 c_3 t^4 \phi_p + t^3 \left( 2 c_3 + 2 c_2^2 \phi_p \right) \right]}{c_f h_p \left( c_3 t^2 \phi_p - 1 \right)}.
According to this parametric function, we can calculate all the possible local orientations of the lane markings, varying the parameters in these equations. In practice, the lateral position parameter c 0 is the most important indicator of the local orientation of the lane markings. By varying c 0 and t and leaving the others constant, we can create a map showing the local orientation at every location that may be a pixel from the boundary of the lane markings. We call this the orientation map, as illustrated in Figure 8. Some of the other parameters, such as the intrinsic and extrinsic parameters, can also reasonably be set as constant. This is because all of the cameras in DAS have been mounted carefully and can be carefully calibrated before application. The other parameters are either almost constant and have little influence on the appearance of the lane markings in the frame buffer or can be determined using the tracking strategy.
According to our definition, the localized target size is directly related to the width of the lane markings and is proportional to the derivative of c with respect to c_0 along the normal direction of the local orientation at that location, which can be determined by the following:
S(c, r) = \frac{\partial c}{\partial c_0}\bigg|_{(c, r)}\, \sin\left( O(c, r) \right).
Similarly, using the parametric function in Equation (23), we can determine the parametric form of the localized target size at the boundary of the lane markings, according to the arc length parameter t,
S(c, r) = \frac{2 c_3 t^2 \phi_p + 2 c_2 t \phi_p + 1}{t\, c_f}\, \sin\left( O(c, r) \right).
Combining this information with the parametric function in Equation (25), we can create a map showing the size of the localized lane markings at every location that may be a pixel from the boundary of the lane markings. We call this the scale map, as illustrated in Figure 8.
With the orientation map and the scale map, we can steer the local edge extractor for the lane markings to the expected orientation and scale at every location in the frame buffer.
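The following is a hedged sketch of building the orientation and scale maps described in this section. Instead of the closed-form Equations (25) and (27), it projects sampled lane-boundary points with a simplified flat-ground pinhole model and differentiates numerically; the camera parameters, lane geometry, and function names are illustrative assumptions, not the authors' implementation.

import numpy as np

H, W = 480, 640
f_pix, h_cam, pitch = 700.0, 1.2, 0.03       # focal length [px], camera height [m], pitch [rad]
Bci, Bri = W / 2, H / 2
lane_width = 0.15                            # physical marking width [m]

def project(xv, yv):
    """Flat-ground pinhole projection (z_v = 0); a simplification of Equation (17)."""
    depth = h_cam * np.sin(pitch) + np.cos(pitch) * yv
    c = Bci + f_pix * xv / depth
    r = Bri + f_pix * (h_cam * np.cos(pitch) - np.sin(pitch) * yv) / depth
    return c, r

orient_map = np.full((H, W), np.nan)
scale_map = np.full((H, W), np.nan)

t = np.linspace(3.0, 80.0, 400)              # arc length ahead of the vehicle [m]
for c0 in np.linspace(-8.0, 8.0, 161):       # sweep lateral offsets of candidate markings
    xv = c0 + 1e-3 * t**2                    # mild curvature, as in Equation (20)
    c, r = project(xv, t)
    cb, rb = project(xv + lane_width, t)     # opposite boundary of the same marking
    dc, dr = np.gradient(c), np.gradient(r)
    orientation = np.arctan2(dc, dr)         # local tangent direction in the frame buffer
    width_px = np.hypot(cb - c, rb - r)      # approximate localized target size in pixels
    rows, cols = np.round(r).astype(int), np.round(c).astype(int)
    ok = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    orient_map[rows[ok], cols[ok]] = orientation[ok]
    scale_map[rows[ok], cols[ok]] = width_px[ok]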

4. Implementation

In traditional multi-scale strategies, only a few scales are taken (e.g., three scales). In this case, the whole image is convolved with these three kernels. In this paper, the response at each position is obtained by applying an arbitrary kernel. If we design a specific kernel for each position, the computational cost will be prohibitively large. Efforts have been made to alleviate this problem. In one previous study, steerable filters [19] were proposed, so that the response of a kernel with arbitrary orientation can be obtained using a linear combination of several kernels with fixed orientation. However, in this method, only symmetric Gaussian kernels are suitable. Another previous study presented the LDA method [20], in which all the possible kernels for different scales and different orientations are collected and then several representative kernels are trained using LDA from all these possible kernels. We note that, if the possible kernels vary in both orientation and scale, the number of representative kernels needed to limit the reconstruction error is very large [20]. This prevents the idea from being realized in real time. If the aspect ratio of the possible kernels is not 1, the number would be even higher [33]. In this paper, we propose a novel fast algorithm for this task, which can be realized in real time.
In this project, we used DoB to detect local features; the DoB can be regarded as a box approximation of a Gaussian-derivative kernel. Given the basic shape of the DoB kernel k_p (e.g., its order, as shown in Figure 9), the DoB kernels with different scale s, aspect ratio λ, and orientation γ are defined as a spanned space, as follows:
\{ k_p \} = k\, \mathrm{span}\{ s,\ \lambda,\ \gamma \}.
Integration maps [34,35] have been proposed as a method of computing the responses of Haar-like kernels to a natural image. For any rectangle R in an image, the sum of the pixels S within R can be efficiently computed on the integration map II with only three addition/subtraction operations, and S is
S = II(V_A) - II(V_B) + II(V_C) - II(V_D),
where V_A, V_B, V_C, and V_D are the coordinates of the four vertices of R in clockwise order, with vertex A located in the upper left corner. For more details about the integration map, please refer to [35].
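A minimal sketch (not from the paper) of this lookup follows: the integration map is a double cumulative sum, and the sum over an upright rectangle needs only four reads and three add/subtract operations, as in Equation (29).

import numpy as np

def integral_image(img):
    """Integration map: II[r, c] = sum of img over rows <= r and cols <= c."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] from the integration map ii."""
    total = ii[r1, c1]                        # lower-right corner
    if r0 > 0:
        total -= ii[r0 - 1, c1]               # strip above the rectangle
    if c0 > 0:
        total -= ii[r1, c0 - 1]               # strip to the left of the rectangle
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]           # add back the doubly subtracted block
    return total

img = np.arange(24.0).reshape(4, 6)
ii = integral_image(img)
print(rect_sum(ii, 1, 2, 3, 4), img[1:4, 2:5].sum())   # the two values agree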
However, this technique is only suitable for upright and horizontal kernels. In order to solve this problem, the polygon integration map was introduced and could be computed in real time [36]. It has been asserted that the sum over any polygonal area can be computed in real time. However, we note that the approximation error cannot be eliminated, rendering the response inconsistent. In this paper, we combine the concepts of the integration map and image rotation, and propose a new real-time implementation algorithm for computing the responses of DoB kernels on natural images, namely osDoB.
We take the detection of lane markings as an example. As shown in Figure 8, the object scales of different parts of the lane markings are diverse because of the perspective effect. The optimal kernels for different locations have different scales, and the expected orientations of the local edges are also different. Parameter maps can be obtained using the perspective effect, as shown in the figure (only the scale map and the orientation map are shown, and the entire image has a single aspect ratio). In this paper, only a one-order DoB filter is adopted, as shown on the left side of Figure 9. Instead of generating filters with different orientations, we keep the filter fixed and instead rotate the image to multiple orientations. Meanwhile, in order to solve the problem of arbitrary scale assignment for the kernel, we use the integration map technique to realize the real-time calculation. In other words, by combining the advantages of image rotation and the integration map, the proposed osDoB filter can not only achieve nearly the same real-time performance as the box filter, but can also obtain the filter response at any orientation and scale.
Given the parameter maps, we can compute the response map based on the proposed algorithm. First, we rotate the image with different orientation steps (e.g., in this project, we use 18 orientations); then, for each rotated image, the integration map is computed, giving us 18 integration maps. When we compute the response for location (x, y), we look up both the scale parameter s and the orientation parameter γ from the parameter maps. With these two parameters and the aspect ratio known beforehand, the kernel shape and orientation are determined. To avoid rotating the kernel, we compute the response in a standard DoB manner, but on the integration map of the rotated image. According to the parameters of the filter at location (x, y), we can select the integration map corresponding to the given orientation and calculate the response using the DoB filter with the given scale parameter. The response M at (x, y) is obtained in a standard DoB manner with the given kernel shape on the chosen integration map,
M(x, y) = II(V_F) - II(V_E) + II(V_D) - 2\, II(V_C) + 2\, II(V_B) - II(V_A).
A DoB contains two rectangles, where R(A, B, C, D) is the left rectangle (negative part) and R(B, E, F, C) is the right rectangle (positive part). The sums over these two rectangles can be computed efficiently.
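A short sketch (not from the paper) of this lookup is given below: the DoB response is the positive (right) box sum minus the negative (left) box sum, and because the two boxes share the edge B–C, the two four-corner lookups collapse to the six terms of Equation (30).

import numpy as np

def dob_response(ii, r0, r1, c_left, c_mid, c_right):
    """DoB response from an integration map ii: rows r0..r1, negative box
    cols c_left..c_mid-1, positive box cols c_mid..c_right."""
    def rect(ra, ca, rb, cb):
        s = ii[rb, cb]
        if ra > 0: s -= ii[ra - 1, cb]
        if ca > 0: s -= ii[rb, ca - 1]
        if ra > 0 and ca > 0: s += ii[ra - 1, ca - 1]
        return s
    return rect(r0, c_mid, r1, c_right) - rect(r0, c_left, r1, c_mid - 1)

img = np.zeros((10, 10)); img[:, 5:] = 1.0             # a vertical step edge at column 5
ii = img.cumsum(0).cumsum(1)
print(dob_response(ii, 2, 7, 1, 5, 8))                 # strong positive response across the edge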
The process is summarized in Algorithm 1.
Algorithm 1 Workflow of osDoB.
  • Input:
  •  Image I with height H and width W
  •  Scale s ( x , y ) , orientation γ ( x , y ) for kernels at (x, y) and aspect ratio λ for all the kernels
  • Output:
  •  The response map M
1:  I_P ← Pad(I, [max(H, W), max(H, W)])                ▹ Padding boundaries
2:  for each orientation θ do
3:      I_θ^R ← Rotate(I_P, θ)                          ▹ Generating rotated images
4:      I_θ^I ← Integrate(I_θ^R)                        ▹ Generating integration maps
5:  end for
6:  for each valid position (x, y) do
7:      I_θ*^I ← Select(I^I, γ(x, y))                   ▹ Selecting integration map by orientation
8:      {V_i} ← Position((x, y), s(x, y), λ)            ▹ Positioning vertexes by scale and aspect ratio
9:      M(x, y) ← DoB(I_θ*^I, {V_i})                    ▹ Calculating difference of boxes (DoB)
10: end for
11: return M
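A hedged, simplified sketch of Algorithm 1 in Python is given below. It pads the image, builds one integration map per quantized orientation, and answers each pixel with a DoB lookup at its assigned scale and orientation. The helper names (scale_map, orient_map), the use of scipy.ndimage.rotate, and the rotation sign convention are illustrative assumptions rather than the authors' exact implementation.

import numpy as np
from scipy import ndimage

def rect_sum(ii, r0, c0, r1, c1):
    s = ii[r1, c1]
    if r0 > 0: s -= ii[r0 - 1, c1]
    if c0 > 0: s -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0: s += ii[r0 - 1, c0 - 1]
    return s

def osdob(image, scale_map, orient_map, lam=3.0, n_orient=18):
    H, W = image.shape
    pad = max(H, W)
    padded = np.pad(image.astype(float), pad)                    # line 1: pad boundaries
    angles = np.linspace(0.0, 180.0, n_orient, endpoint=False)   # quantized orientations [deg]
    imaps = [ndimage.rotate(padded, a, reshape=False, order=1).cumsum(0).cumsum(1)
             for a in angles]                                    # lines 2-5: rotate + integrate
    cy, cx = (np.array(padded.shape) - 1) / 2.0                  # rotation center of the padded image
    response = np.zeros((H, W))
    for y in range(H):
        for x in range(W):                                       # lines 6-10: per-pixel lookup
            a = int(round(scale_map[y, x]))
            if a < 1:
                continue
            k = int(np.argmin(np.abs(angles - np.degrees(orient_map[y, x]) % 180)))
            ii = imaps[k]
            # Map the pixel into the k-th rotated image; verify the sign of t against
            # the rotation convention of ndimage.rotate before relying on this.
            t = np.radians(angles[k])
            ry, rx = (y + pad) - cy, (x + pad) - cx
            rr = int(round(cy + ry * np.cos(t) - rx * np.sin(t)))
            cc = int(round(cx + ry * np.sin(t) + rx * np.cos(t)))
            half_h = max(int(round(lam * a)), 1)
            r0, r1 = max(rr - half_h, 0), min(rr + half_h, ii.shape[0] - 1)
            c0, c1 = max(cc - a, 0), min(cc + a, ii.shape[1] - 1)
            if cc <= c0 or cc >= c1:
                continue
            response[y, x] = (rect_sum(ii, r0, cc, r1, c1)         # positive (right) box
                              - rect_sum(ii, r0, c0, r1, cc - 1))  # negative (left) box
    return response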

5. Experiments

We first used a synthesized image to show the performance of the proposed method in detecting objects with different scales and orientations. Then, we performed experiments on real-world images under the perspective effect. Many edge detectors have been developed in the last few decades, such as those of Sobel, Prewitt, Roberts, etc. Though the Canny detector [37] was proposed in the early days of computer vision, it is still a state-of-the-art edge detector with high efficiency. Edge detectors with better performance than Canny usually require longer computation time [38,39]. Thus, in this experiment, we qualitatively compared our method with the Canny detector, and compared our method with the Canny, Sobel, Prewitt and Roberts detectors quantitatively.
Figure 10 shows the synthesized image employed for comparison. The objects we used were rectangles with different orientations and scales. In this image, there are nine objects in a 3 × 3 array. In each row, objects have the same scale; in each column, objects have the same orientation. Here, we chose the direction of the longer side of the rectangle as the localized target orientation for osDoB. Therefore, osDoB is supposed to only extract the edges along the orientation of the object and have little response at other orientations, especially in the direction of the shorter side of the rectangle. In contrast, Canny is supposed to extract all the edges of the objects.
Images near the top of Figure 10 show the noisy initial image, the gradient map produced by the proposed method (osDoB, tuning both the scale and orientation), and the edge map produced by osDoB. Near the bottom of the figure, there are three edge maps produced by the Canny edge detector with different scales for the Gaussian blurring kernels. We can see that osDoB only extracts edges corresponding to the preset orientation, but has the ability to detect all of the objects well with quite acceptable localization accuracy. However, for the Canny detector, it is difficult to detect all the objects simultaneously, even using different scales. We observe that, if the Gaussian kernel scale is small, false edges caused by noise pop out; if the scale is large, the small objects disappear and the shapes of large objects are deformed. Another reason why the shape of the large objects was destroyed is as follows. In the Canny detector, a symmetric (Gaussian) kernel is used, whose aspect ratio is 1. As discussed above, such an aspect ratio corresponds to a very low angular accuracy, so it must affect the localization accuracy. However, in the proposed method, the aspect ratio can be selected arbitrarily. In this experiment, we use a 3:1 aspect ratio, which yields quite good angular accuracy. It is worth noting that osDoB may fail completely if we have no knowledge of the localized target orientation. Thankfully, in a real scene, which we discuss below, we can obtain both the localized target size and the localized target orientation from prior knowledge about the scene and the perspective effect of the imaging process.
The perspective effect is always present in the imaging process in both human and mechanical vision. In computation, it is very hard to assign a specific orientation and scale for each location in the image without further information. The expected orientation and scale can be determined using the following two factors: first, the extrinsic and intrinsic parameters of the imaging equipment; second, prior knowledge of the targets/objects to be detected. As mentioned in the preceding section, road detection is one case in which we can determine the orientation map and scale map ahead of time under certain constraints. The intrinsic and extrinsic parameters of the imaging equipment are known beforehand, and other parameters regarding the road shape and the relative relationship between the road and the vehicle can either be obtained by tracking or set to constants. Hence, the optimal scale and the expected orientation can be computed for the optimal response at each location. In this paper, we use road images to validate the performance of the proposed algorithm.
In this paper, we use a dataset provided by a previous study to compare the proposed edge detector with other algorithms [40]. This dataset consists of 116 high-quality color road images, which are captured in different scenarios. For each image, the lane marking areas are manually labeled as the ground truth. Some examples of the dataset and the corresponding human labeled ground truth are shown in Figure 11. We employ the whole dataset for evaluation. In the evaluation, if an edge point labeled by an algorithm is near the edge of the ground truth (the block distance between them is less than two pixels), we consider it to be marked correctly.
In this experiment, both the ROC (receiver operating characteristic) curve [41] and the DSC (dice similarity coefficient) curve [42] were used to evaluate these algorithms. ROC reflects the true-positive rate (hit rate) against the false-positive rate (false alarm rate) while shifting the threshold from zero to one. The area under the ROC curve (AUC) is called the ROC score and reflects the performance of the algorithm. In this experiment, the false-positive rate and the true-positive rate were computed by matching the edge map given by an algorithm to the ground truth at each threshold value. The DSC curve is another evaluation method, which records the DSC values against the thresholds. The maximum DSC value is a measure of the performance of an algorithm at the optimal threshold setting. The DSC value is defined as DSC = \frac{2\, TP}{TP + FP + T}, where TP is the number of true positives, FP is the number of false positives, and T is the number of true points in the ground truth.
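The following is a minimal sketch (not from the paper's evaluation code) of the DSC-versus-threshold curve: threshold the gradient map, compare it with the ground-truth edge mask, and record DSC = 2·TP / (TP + FP + T) at each threshold.

import numpy as np

def dsc_curve(gradient_map, gt_mask, n_thresh=100):
    """gradient_map: float array in [0, 1]; gt_mask: boolean ground-truth edge mask."""
    thresholds = np.linspace(0.0, 1.0, n_thresh)
    t_total = gt_mask.sum()                        # T: true points in the ground truth
    scores = []
    for th in thresholds:
        pred = gradient_map >= th
        tp = np.logical_and(pred, gt_mask).sum()
        fp = np.logical_and(pred, ~gt_mask).sum()
        scores.append(2.0 * tp / (tp + fp + t_total + 1e-9))
        # NOTE: a tolerant match (block distance < 2 px, as in the text) would be used in practice.
    return thresholds, np.array(scores)

rng = np.random.default_rng(0)
grad = rng.random((100, 100)); gt = grad > 0.8     # toy data just to exercise the function
th, dsc = dsc_curve(grad, gt)
print(th[dsc.argmax()], dsc.max())                 # optimal threshold and peak DSC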
We compared our approach, which tunes both the orientation and scale, with the Canny edge detector. There are two versions of our method: one only tunes the orientation (oDoB); the other tunes both the orientation and scale (osDoB). We slid the threshold for each model from 0 to 1 and recorded the thresholded edge maps. We then computed the ROC and DSC curves for each model.
Some examples of the comparison are shown in Figure 12. We show six road images with different scenarios. In each grid, the top row shows the input image, the expected orientation map, and the expected scale map (from left to right). These maps can be computed using the estimated imaging parameters and the estimated vanishing points. In the lower part of the figure, there are three columns showing the results of Canny, oDoB (only tuning the orientation), and osDoB (tuning both the orientation and scale). Each column contains two maps: the gradient magnitude map (top) and the edge map (bottom).
As shown in Figure 12, the proposed model can handle both curving lane markings (as shown in Figure 12e) and straight ones. From these examples, we observe that the orientation tuning is of great importance for edge detection. As shown in Figure 12a,c, most of the heavy shadows on the road surface were removed. This is very helpful because, without it, wrong edge points, such as those caused by the shadows on the road surface (see the corresponding edge detection results by Canny), yield spurious peaks in the Hough space. It has been believed that, for the Canny edge detector, this type of problem can be handled by combining it with an orientation map: the false edge points can be removed by examining the orientation gradient at the expected orientation. In the following, we show that this is not true.
As shown in Figure 13, there are two cases in which orientation tuning plays an important role. In the first case, orientation tuning removes noisy edges with unexpected orientations. As shown in Figure 13a, besides several lane markings, there are many unexpected white lines with unexpected orientations. The osDoB can suppress these lines. In this case, such a result can also be obtained by the Canny detector combined with an orientation map: we can remove these edges by determining whether they possess the expected orientation, and we can also remove them at a later stage (after the edge detection stage). In the second case, the osDoB plays a very important role. A single edge point may possess more than one orientation [19]. For traditional edge detection, such as the Canny edge detector, the orientation with the most intense response is labeled as the orientation of that edge point. However, sometimes one orientation of a point lowers the intensity of the response at the expected orientation, even though that point has a consistent orientation along a local line. As shown in Figure 13b, the edges on the boundary of the lane marking have the same orientation, but some of them show higher responses at other orientations. This is caused by heavy shadows. For the Canny detector, these edges can be labeled with the wrong orientation, and such edges may be removed in the non-maxima suppression stage because they lack consistent orientations, as shown in the lower part of Figure 13b. For the proposed method, an orientation tuning strategy is employed, and points with weak responses at the expected orientation can still be extracted.
Assigning a proper scale to each point is very important for achieving the maximal SNR with such a strategy. From these comparisons, we observe that assigning a proper scale to each location is also important. As shown in Figure 12d, the oDoB can suppress the edges with unexpected orientations, but some of them are still present. This is because the responses of these points are very strong at smaller scales. When we assign a proper scale to each point (osDoB), we observe that most of the unexpected edges are removed. The results shown in Figure 12f also support this conclusion. In the input image, there is a passenger car that possesses many edges and lines with unexpected orientations. Most of these edges can be suppressed by assigning the proper scale.
The quantitative results are shown in Figure 14a,b. We compare oDoB and osDoB with edge detectors such as Canny, Sobel, Prewitt, and Roberts. The ROC curves of these models are shown in Figure 14a. We also adopt the AUC (area under the curve) of the ROC curve as the evaluation metric and depict the corresponding AUC value of each curve. By comparing quantitatively, we can see that oDoB achieves comparable performance with Canny. The osDoB has the highest AUC, 10% higher than that of Canny. The osDoB always obtains the highest hit rate among these edge detectors when a specific false-positive rate is given. From the DSC curves, which are shown in Figure 14b, we can draw the conclusion that orientation tuning produces better performance, and that tuning both the orientation and the scale is needed and produces the best performance. The optimal threshold for osDoB is 0.26.

6. Conclusions

We address the problem of edge extraction according to top-down information about certain targets under the perspective effect. The top-down information contains two parts: one is the intrinsic and extrinsic parameters of the imaging process; the other is prior knowledge of the targets in the scene (in this paper, we consider edge extraction for the purpose of target detection). Once we have this information, we have a better chance of extracting the local edges from the real contours of the target, while ignoring the edges caused by noise or other irrelevant events, such as texture, shadows, shading, and even edges from nearby objects. In contrast to traditional methods, we introduce the top-down information into the low-level feature extraction stage as early as possible.
In this paper, we argue that the most important attributes for enhancing a local edge are its scale (localized target size) and its expected orientation (localized target orientation). We therefore assign a specific orientation and scale to each location in the image before edge extraction. To compute the response of an edge detector at many orientations and arbitrary scales in real time, we proposed the osDoB. To illustrate that the osDoB can be tuned to an arbitrary orientation and scale in real time, we applied it to the lane marking detection problem in challenging scenes. Experimental results show that performance can be greatly improved by introducing global appearance information during local feature detection.
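As a minimal sketch of the efficiency argument, the code below evaluates a first-order, axis-aligned difference-of-boxes response in constant time per pixel using an ordinary integral image. The per-pixel rotation of the boxes that the osDoB supports is deliberately omitted, and all function names and parameter values are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a zero row/column for easy box sums."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) from the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def dob_response(ii, r, c, half_w, half_h):
    """First-order, axis-aligned difference-of-boxes response at pixel (r, c):
    mean of the right box minus mean of the left box, with a per-pixel box
    size.  Rotating the boxes to the expected edge orientation, as the osDoB
    does, would need additional machinery and is not shown here.
    """
    area = 2 * half_h * half_w
    left = box_sum(ii, r - half_h, c - half_w, r + half_h, c)
    right = box_sum(ii, r - half_h, c, r + half_h, c + half_w)
    return (right - left) / max(area, 1)

# Toy usage: one response at an arbitrary pixel with a scale of our choosing.
img = np.random.rand(240, 320)
ii = integral_image(img)
resp = dob_response(ii, r=120, c=160, half_w=4, half_h=8)
```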
We argue that scale and orientation should carry more physical meaning connected with information about the real target. The same issue arises for other local features in the computer vision literature: "similar to other concepts used in computer vision, such as 'texture' and 'face', the notion of contour is the result of common human experience rather than a formal mathematical definition" [6]. Although in this work we assign scale and orientation for local edge detection in a rather hard manner, it is also possible to assign them in a soft manner, for example by applying Gestalt principles [43], which can be addressed in future work. The osDoB can serve as an efficient local filter for a specific orientation and arbitrary scale, and it has many uses, such as local edge extraction for local feature descriptors.

Author Contributions

Conceptualization, J.L. and X.A.; methodology, J.L. and X.A.; software, J.L.; validation, J.L.; writing—original draft preparation, J.L. and X.A.; writing—review and editing, J.L. and X.A. Both authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 61973311.

Data Availability Statement

The results of our method are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zitnick, C.L.; Dollár, P. Edge boxes: Locating object proposals from edges. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 391–405.
  2. Kang, X.; Li, S.; Benediktsson, J.A. Spectral–spatial hyperspectral image classification with edge-preserving filtering. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2666–2677.
  3. Civera, M.; Zanotti Fragonara, L.; Surace, C. An experimental study of the feasibility of phase-based video magnification for damage detection and localisation in operational deflection shapes. Strain 2020, 56, e12336.
  4. Marr, D.; Hildreth, E. Theory of edge detection. Proc. R. Soc. Lond. Ser. B Biol. Sci. 1980, 207, 187–217.
  5. Szeliski, R. Computer Vision: Algorithms and Applications; Springer: New York, NY, USA, 2010.
  6. Papari, G.; Petkov, N. Edge and line oriented contour detection: State of the art. Image Vis. Comput. 2011, 29, 79–103.
  7. Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916.
  8. Witkin, A. Scale-space filtering. In Readings in Computer Vision: Issues, Problems, Principles, and Paradigms; Morgan Kaufmann: San Francisco, CA, USA, 1987; pp. 329–332.
  9. Koenderink, J.J. The structure of images. Biol. Cybern. 1984, 50, 363–370.
  10. Lindeberg, T. Scale-space theory: A basic tool for analyzing structures at different scales. J. Appl. Stat. 1994, 21, 225–270.
  11. Lindeberg, T. Edge detection and ridge detection with automatic scale selection. In Proceedings of the CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 18–20 June 1996; pp. 465–470.
  12. Liu, X.M.; Wang, C.; Yao, H.; Zhang, L. The scale of edges. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 462–469.
  13. Elder, J.H.; Zucker, S.W. Local scale control for edge detection and blur estimation. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 699–716.
  14. Hoiem, D.; Efros, A.; Hebert, M. Putting objects in perspective. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2137–2144.
  15. Kong, H.; Audibert, J.Y.; Ponce, J. Vanishing point detection for road detection. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 96–103.
  16. Ren, J.; Chen, Y.; Xin, L.; Shi, J. Lane detection in video-based intelligent transportation monitoring via fast extracting and clustering of vehicle motion trajectories. Math. Probl. Eng. 2014, 2014, 156296.
  17. Alvarez, J.M.; Gevers, T.; Lopez, A.M. 3D scene priors for road detection. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 57–64.
  18. Kong, H.; Audibert, J.; Ponce, J. General Road Detection From a Single Image. IEEE Trans. Image Process. 2010, 19, 2211–2220.
  19. Freeman, W.; Adelson, E. The design and use of steerable filters. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 891–906.
  20. Perona, P. Deformable kernels for early vision. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 488–499.
  21. Liu, Y.; Cheng, M.M.; Hu, X.; Wang, K.; Bai, X. Richer convolutional features for edge detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3000–3009.
  22. Yu, Z.; Feng, C.; Liu, M.Y.; Ramalingam, S. Casenet: Deep category-aware semantic edge detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5964–5973.
  23. Demigny, D.; Karabernou, M. An effective resolution definition or how to choose an edge detector, its scale parameter and the threshold? In Proceedings of the International Conference on Image Processing, Lausanne, Switzerland, 16–19 September 1996; Volume 1, pp. 829–832.
  24. Shen, J.; Castan, S. An optimal linear operator for step edge detection. CVGIP Graph. Model. Image Process. 1992, 54, 112–133.
  25. Deriche, R. Using Canny’s criteria to derive a recursively implemented optimal edge detector. Int. J. Comput. Vis. 1987, 1, 167–187.
  26. Herskovitz, A.; Binford, T. On boundary detection. In MIT Technical Report; MIT Press: Cambridge, MA, USA, 1980.
  27. Demigny, D.; Kamlé, T. A Discrete Expression of Canny’s Criteria for Step Edge Detector Performances Evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 1199–1211.
  28. Yokono, J.; Poggio, T. Oriented filters for object recognition: An empirical study. In Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Korea, 17–19 May 2004; pp. 755–760.
  29. Demigny, D. On Optimal Linear Filtering for Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 11, 728–737.
  30. Bertozzi, M.; Broggi, A. GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection. IEEE Trans. Image Process. 1998, 7, 62–81.
  31. Wang, Y.; Teoh, E.; Shen, D. Lane detection and tracking using B-Snake. Image Vis. Comput. 2004, 22, 269–280.
  32. Dickmanns, E.; Mysliwetz, B. Recursive 3-D road and relative ego-state recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 199–213.
  33. An, X.; Li, J.; Shang, E.; He, H. Multi-scale and Multi-orientation Local Feature Extraction for Lane Detection Using High-Level Information. In Proceedings of the 2011 Sixth International Conference on Image and Graphics (ICIG), Hefei, China, 12–15 August 2011; pp. 576–581.
  34. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1.
  35. Viola, P.; Jones, M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154.
  36. Pham, M.T.; Gao, Y.; Hoang, V.D.D.; Cham, T.J. Fast polygonal integration and its application in extending haar-like features to improve object detection. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 942–949.
  37. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 8, 679–698.
  38. Maini, R.; Aggarwal, H. Study and comparison of various image edge detection techniques. Int. J. Image Process. 2009, 3, 1–11.
  39. Civera, M.; Fragonara, L.Z.; Surace, C. Video processing techniques for the contactless investigation of large oscillations. J. Phys. Conf. Ser. 2019, 1249, 012004.
  40. Veit, T.; Tarel, J.; Nicolle, P.; Charbonnier, P. Evaluation of road marking feature extraction. In Proceedings of the ITSC 2008 11th International IEEE Conference on Intelligent Transportation Systems, Beijing, China, 12–15 October 2008; pp. 174–181.
  41. Green, D.M.; Swets, J.A. Signal Detection Theory and Psychophysics; Wiley: New York, NY, USA, 1966; Volume 1.
  42. Dice, L.R. Measures of the amount of ecologic association between species. Ecology 1945, 26, 297–302.
  43. Bileschi, S.; Wolf, L. Image representations beyond histograms of gradients: The role of gestalt descriptors. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 18–23 June 2007; pp. 1–8.
Figure 1. Illustration of the orientation and scale tuning at each location of the input image. (a) Kernels of different scales and orientations for the best detection of the lane markings. (b) Scale and orientation assignment according to the prior knowledge.
Figure 2. Introductory flowchart.
Figure 3. Angular response of the FDOG.
Figure 4. Angular response of the FDOG for different aspect ratios λ.
Figure 5. Angular response of the DoB for different aspect ratios λ.
Figure 6. Response of the DoB with optimal parameters for varying pre-known localized target size and localized target orientation.
Figure 7. Definition of coordinates and illustration of the parametric model of lane markings.
Figure 8. Workflow of the proposed speeded-up orientation and scale tuned difference of boxes (osDoB).
Figure 9. Illustration of the shape of DoB kernels controlled by three parameters (left: first-order DoB filter; right: second-order DoB filter).
Figure 10. Performance of the proposed method and the Canny detector on a noisy synthesized image.
Figure 11. Some examples from the dataset and the corresponding labeled ground truths [40].
Figure 12. Some comparison results on road images. See text for more details.
Figure 13. Illustration of the importance of orientation tuning. (a) Salient edges with orientations other than the expected one appear. (b) Multiple orientations caused by both the lane markings and the shadows.
Figure 14. The quantitative comparison results.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
