**1. Introduction**

Today, traffic video analysis plays an important role in intelligent transportation systems. It has become a common means of tracking vehicles and of locating and assessing accidents. Because the images captured by outdoor cameras are often affected by weather conditions, they suffer from poor visibility and low contrast. The literature offers many enhancement and dehazing algorithms for different kinds of images, such as traffic videos, underwater images, and satellite imagery [1–3]. Hazy weather, which occurs frequently all over the world, severely hinders video analysis: the haze captured in a video degrades the contrast and color information and reduces visibility. Therefore, the problem of how to remove haze from traffic videos efficiently and effectively has attracted broad attention from both academia and industry. When dealing with haze removal in traffic videos, the existing dehazing algorithms often exhibit poor real-time performance, overstretch the contrast, and may even fail to remove dense haze. The key issue behind these problems is how to handle scenes with different degrees of haze; an adaptive algorithm that removes haze based on the image characteristics is therefore needed. Moreover, the existing video-dehazing methods are mostly generic for all videos and do not exploit the characteristics of videos in particular scenarios. For traffic videos, the temporal continuity, lane space structure, and camera positions can be exploited to reduce computational cost.

To restore traffic videos with different degrees of haze in a real-time and adaptive manner, this paper presents an efficient traffic video dehazing method based on an adaptive dark channel prior and spatial-temporal correlations. The method avoids overstretched contrast after haze removal and obtains satisfactory restorations of densely hazy videos through a novel adaptive transmission estimation. It also takes full advantage of the temporal and spatial correlations in traffic videos to meet real-time requirements: temporal continuity is used to set the time slice, the transmission is refined using the block structure, the restored area is reduced according to the lane space, and the parameter calculation is simplified by exploiting the multi-camera distribution.

## **2. Related Works**

Essentially, videos are composed of frames; thus, haze removal methods for images can also be used for videos. Image dehazing based on a physical model is the most common way to restore hazy images: it considers the inverse process of image degradation and describes the degradation process through an established physical model. The most critical step is to obtain the parameters of the degradation model. Oakley et al. [4] improved image quality by using such a physical model and estimated the degradation parameters with a statistical model. Their method is not widely used because it works only for gray-scale images, and acquiring the parameters requires calibrated radar to get depth information. Narasimhan et al. [5] proposed estimating the depth information by comparing two images of the same scene under different weather conditions. Chen et al. [6] used a sunny image and a foggy image as reference images to calculate the parameters. Both of these methods need suitable images in advance, which increases the difficulty of image acquisition.

To obtain the parameters of the degradation model effectively, dehazing methods based on prior knowledge or assumptions were proposed; they need neither reference images in advance nor additional hardware. Therefore, these methods adapt better than the previous ones. Based on the assumption that a haze-free image has higher contrast than a hazy one, Tan [7] proposed a haze removal approach that maximizes the contrast of the recovered scene radiance. This approach can produce satisfactory results for single images, but it tends to overcompensate for the reduced contrast and causes halo effects. Fattal [8] decomposed the scene radiance of an image into albedo and shading and then estimated the scene radiance via independent component analysis, assuming that transmission shading and surface shading are locally uncorrelated. However, this method cannot generate impressive results when the captured image is heavily obscured by fog. He et al. [9] presented a single-image haze removal method using the dark channel prior, which estimates the transmission map directly. However, when a large white area without shading exists in an image, or the image is unevenly illuminated, this method takes a long time to restore the hazy image; in addition, the soft matting step makes the computation complex. Lai et al. [10] presented a haze removal method based on a difference-structure-preservation prior, in which a difference-structure-preservation dictionary is learned so that the local consistency features of the transmission map are well preserved after coefficient shrinkage. Zhu et al. [11] presented a simple but effective Color Attenuation Prior (CAP) algorithm, similar to the Dark Channel Prior (DCP), that uses the difference between brightness and saturation to estimate the haze concentration and build a depth model for dehazing. Since then, other researchers have improved dehazing algorithms based on the dark channel prior. Yeh et al. [12] introduced a haze removal algorithm based on region decomposition and feature fusion, which is especially suitable for hazy images with large sky regions. Li et al. [13] proposed a haze removal method based on sky segmentation and the dark channel prior, in which the average intensity of the sky region is chosen as the atmospheric light value. Wang et al. [14] designed a new way of selecting the atmospheric light value to mitigate areas where the dark channel prior does not work effectively. A visibility restoration method was introduced by Huang et al. [15], consisting of three modules: (i) a depth estimation module based on the dark channel prior, (ii) a color analysis module that repairs depth estimation distortion, and (iii) a visibility restoration module that generates the repaired result. Riaz et al. [16] proposed an efficient transmission estimation method with bright-object handling capability, which uses a local average haziness value to compute the transmission of such surfaces based on the observation that the transmission of a surface is loosely connected to that of its neighbors.

Usually, traffic video dehazing algorithms are built on single-image dehazing algorithms. However, the computational complexity makes it difficult to apply single-image algorithms directly to videos, so most existing research on video dehazing focuses on speeding up the process. Sun et al. [17] proposed a real-time haze removal method based on bilateral filtering that processes 320 × 240 frames at 20 frames per second; however, it cannot satisfy the requirements of high-definition videos. Wang et al. [18] proposed a method based on Retinex theory that enhances image contrast in YUV color space and processes a 704 × 576 image in 0.055 s. Kumari et al. [19] proposed a filtering-based approach for dehazing images and videos; a gray-scale morphological operation makes the approach faster, taking only 80% of the execution time of a fast bilateral filter. Berman et al. [20,21] proposed a method that estimates the air-light based on a non-local prior, relying on the assumption that the colors of a haze-free image are well approximated by a few hundred distinct colors that form tight clusters in RGB space; it performs well on a wide variety of images. However, all of these methods treat every frame of a video as a single image and are entirely based on image dehazing techniques.

The characteristics of videos can also be exploited in dedicated video dehazing algorithms. Tarel et al. [22] proposed a video dehazing method for onboard video systems that separates moving objects from driveway regions and updates only the depth information of the moving objects. Zhang et al. [23] proposed a method that uses the spatial and temporal similarity between frames to optimize the estimation of the scene depth map. Shin et al. [24] proposed an effective video dehazing technique that reduces flicker artifacts via an adaptive temporal average. However, these methods cannot remove haze from videos in real time. Kim et al. [25] therefore proposed an image dehazing method based on the image degradation model that balances contrast enhancement against information loss; to improve the speed of video dehazing, they exploited temporal correlation and reached 30 frames per second on videos with a resolution of 640 × 480. However, their method adopts a fixed initial transmission value that cannot adapt to images with different degrees of haze, and it cannot efficiently remove dense haze. Our method uses an adaptive initial transmission value based on image characteristics to handle different degrees of haze; meanwhile, it reduces the processing time through lane space separation.

## **3. Single-Image Dehazing Using Adaptive Dark Channel Prior**

### *3.1. Framework of Single-Image Dehazing Method*

The most common dehazing model is based on atmospheric optics [26], which can describe the degradation process of a hazy image. In [27], the modeling function is simplified, and it is represented by Equation (1).

$$I(p) = J(p)t(p) + A(1 - t(p))\tag{1}$$

where *p* is a pixel in the image, *I*(*p*) and *J*(*p*) are the observed image and the haze-free image, respectively, *A* is the global atmospheric light, and *t*(*p*) ∈ [0, 1] is the transmission map for each pixel, describing the proportion of the light that arrives at the camera without being scattered.
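
To make the roles of these symbols concrete, the following minimal sketch (Python with NumPy; the function names and the [0, 1] value range are our assumptions, not part of the paper) applies Equation (1) forward to synthesize a hazy image and inverts it to recover the scene radiance when *t* and *A* are known:

```python
import numpy as np

def apply_haze_model(J, t, A):
    """Equation (1): I(p) = J(p) * t(p) + A * (1 - t(p)).

    J: haze-free image in [0, 1], shape (H, W, 3)
    t: transmission map in [0, 1], shape (H, W)
    A: global atmospheric light, scalar or length-3 vector in [0, 1]
    """
    t = t[..., np.newaxis]                     # broadcast t over color channels
    return J * t + np.asarray(A) * (1.0 - t)

def invert_haze_model(I, t, A, t_min=0.1):
    """Invert Equation (1) for J when t and A are known."""
    t = np.maximum(t, t_min)[..., np.newaxis]  # clamp t to avoid amplifying noise
    return (I - np.asarray(A)) / t + np.asarray(A)
```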

The process of haze removal for every frame of a traffic video can be divided into three steps: calculating atmospheric light, estimating the transmission map, and restoring the image. In this paper, we present a novel adaptive method for transmission map estimation, thus the dehazing algorithm can be applied to images with different degrees of haze. The framework of the single-image dehazing algorithm is shown in Figure 1.

**Figure 1.** Framework of single-image dehazing method.

We use a hierarchical searching method based on quad-tree subdivision [25] to find the area least affected by haze and to get the brightest pixel in that area. The detailed steps, following [25], are as follows:

1. Divide the image into four rectangular regions.
2. Score each region by the average pixel value minus the standard deviation of the pixel values within it.
3. Select the region with the highest score and divide it again into four regions.
4. Repeat steps 2 and 3 until the selected region is smaller than a pre-specified size threshold.

Finally, we choose as the atmospheric light the color vector that minimizes the distance ‖(*Ir*(*p*), *Ig*(*p*), *Ib*(*p*)) − (255, 255, 255)‖ within the selected region, where *Ir*(*p*), *Ig*(*p*), and *Ib*(*p*) are the color channel values of pixel *p*.
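
The sketch below shows one possible implementation of this hierarchical search (Python/NumPy; the helper name and the assumption of images scaled to [0, 1] are ours, and the mean-minus-standard-deviation score follows our reading of [25]):

```python
import numpy as np

def estimate_atmospheric_light(img, min_size=32):
    """Quad-tree search for the region least affected by haze (after [25]).

    img: float RGB image scaled to [0, 1], shape (H, W, 3).
    """
    region = img
    while min(region.shape[0], region.shape[1]) > min_size:
        h, w = region.shape[0] // 2, region.shape[1] // 2
        quads = [region[:h, :w], region[:h, w:], region[h:, :w], region[h:, w:]]
        # Score each quadrant by mean brightness minus standard deviation,
        # which favors bright, flat (haze-opaque) areas.
        scores = [q.mean() - q.std() for q in quads]
        region = quads[int(np.argmax(scores))]
    # Pick the pixel whose color is closest to pure white, i.e., the one
    # minimizing ||(Ir, Ig, Ib) - (1, 1, 1)|| for images scaled to [0, 1].
    flat = region.reshape(-1, 3)
    idx = np.argmin(np.linalg.norm(flat - 1.0, axis=1))
    return flat[idx]
```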

### *3.2. Transmission Estimation for Enhancing the Contrast of Blocks*

In general, a hazy block yields low contrast, and the contrast of a restored block increases as the value of the estimated transmission decreases. We adopt the contrast-enhancement method of [25] to maximize the contrast of the restored blocks and obtain the best estimated transmission value.

The mean squared error contrast (*CMSE*) [28] can be used to define the contrast of a restored block, as given by Equation (2):

$$C_{MSE} = \sum_{p=1}^{N} \frac{\left(J_c(p) - \overline{J_c}\right)^2}{N} \tag{2}$$

where *Jc*(*p*) represents color channel *c* ∈ {*r*, *g*, *b*} of pixel *p* in a restored block, *Jc* is the average value of *Jc*(*p*) over the block, and *N* is the number of pixels in the block.

According to the assumption that scene depths are locally similar [8,12,16], the dehazing algorithm in this paper determines a single transmission value for each block of size 32 × 32 and obtains a fixed optimal transmission value *t* for each block. For a pixel *p* in a block, *t*(*p*) in Equation (1) can be replaced with the fixed estimated transmission *t* of its block. Hence, *Jc*(*p*) is given by Equation (3).

$$J_c(p) = \frac{I_c(p) - A}{t} + A \tag{3}$$

Substituting Equation (3) into Equation (2), *CMSE* can be expressed as Equation (4):

$$C_{MSE} = \sum_{p=1}^{N} \frac{\left(I_c(p) - \overline{I_c}\right)^2}{t^2 N} \tag{4}$$

where *Ic* is the average value of *Ic*(*p*) in the input block. Equation (4) shows that the mean squared error contrast is a decreasing function of *t*, so we can select a small value of *t* to increase the contrast of a restored block. However, the value of *t* also determines the restored pixel values through Equation (3): a too-small *t* can push restored values outside the displayable range [0, 255], where they are truncated. For example, with *A* = 200 and *t* = 0.3, an input value *Ic*(*p*) = 80 is mapped to (80 − 200)/0.3 + 200 = −200, which is truncated to 0.

When a block contains dense haze, its input pixels have a relatively narrow value range; thus, even if it is assigned a small *t* value, most of the restored values are not truncated, and the block can be correctly restored. In contrast, a block without haze usually has a broad range of input pixel values and should be assigned a larger *t* value to reduce the information loss due to truncation. Therefore, we should not only enhance the contrast but also limit the information loss.

Therefore, we need quantitative measures of contrast and information integrity. The contrast cost *Econtrast* and the information loss cost *Eloss* were proposed by Kim [25] to evaluate the contrast and the information integrity, respectively.

$$E_{contrast} = -\sum_{c \in \{r,g,b\}} \sum_{p \in B} \frac{\left(J_c(p) - \overline{J_c}\right)^2}{N_B} = -\sum_{c \in \{r,g,b\}} \sum_{p \in B} \frac{\left(I_c(p) - \overline{I_c}\right)^2}{t^2 N_B} \tag{5}$$

where *Jc* and *Ic* are the average values of *Jc*(*p*) and *Ic*(*p*) in block *B*, respectively, and *NB* is the number of pixels in *B*. Thus, we can maximize the mean squared error contrast by minimizing the value of *Econtrast*.

$$E_{loss} = \sum_{c \in \{r,g,b\}} \sum_{p \in B} \left\{ \left( \min\{0, J_c(p)\} \right)^2 + \left( \max\{0, J_c(p) - 255\} \right)^2 \right\} \tag{6}$$

where *min*{0, *Jc*(*p*)} and *max*{0, *Jc*(*p*) − 255} denote the truncated values for output pixels due to the underflow and overflow, respectively.

To obtain a better restored image, the contrast should be enhanced while the color information is preserved as much as possible. Thus, the two costs are considered jointly, and the overall cost function is given by Equation (7).

$$E = E_{contrast} + \lambda_L E_{loss} \tag{7}$$

where *λL* is a weight coefficient that controls the relative importance of the contrast cost and the information loss cost [25]. Minimizing *E* yields the most suitable contrast for the restored image while keeping the color loss as small as possible. Finally, for each block in a hazy image, we obtain an optimal transmission *t*∗ by minimizing the value of *E*; this *t*∗ is the transmission we use for dehazing.
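
As a concrete illustration, the sketch below performs this per-block minimization (Python/NumPy). The candidate grid, its step size, and the default *λL* value are illustrative assumptions, not settings given in the paper:

```python
import numpy as np

def optimal_block_transmission(block, A, lambda_L=5.0, t_init=0.3):
    """Search for t* minimizing E = E_contrast + lambda_L * E_loss
    (Equations (5)-(7)) over candidate transmissions for one 32x32 block.

    block: float RGB block scaled to [0, 255]; A: atmospheric light vector.
    """
    n = block.shape[0] * block.shape[1]
    best_t, best_E = 1.0, np.inf
    for t in np.arange(t_init, 1.0 + 1e-9, 0.01):  # start from the initial value
        J = (block - A) / t + A                    # Equation (3), per channel
        E_contrast = -sum(((J[..., c] - J[..., c].mean()) ** 2).sum() / n
                          for c in range(3))       # Equation (5)
        E_loss = sum((np.minimum(0.0, J[..., c]) ** 2).sum() +
                     (np.maximum(0.0, J[..., c] - 255.0) ** 2).sum()
                     for c in range(3))            # Equation (6)
        E = E_contrast + lambda_L * E_loss         # Equation (7)
        if E < best_E:
            best_E, best_t = E, t
    return best_t
```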

### *3.3. Adaptive Estimation of Initial Transmission*

#### 3.3.1. Calculating Image Haziness Flag

We present a haziness flag *T* to measure the degree of haze in an image. The dark channel prior [9] can estimate the transmission of a block, which reflects the luminosity of objects, and the transmission is closely related to the degree of haze. Therefore, we adopt the average transmission over the whole image as its haziness flag *T*, which captures the effect of the degree of haze on the image.

The dark channel prior is based on the observation that most local blocks in haze-free outdoor images contain some pixels that have very low intensities in at least one color channel. In other words, the dark channel value of a haze-free image is close to zero [9]. For any input image *J*, dark channel *Jdark* can be expressed as Equation (8).

$$J_{dark}(p) = \min_{y \in \Omega(p)} \left( \min_{c \in \{r,g,b\}} J_c(y) \right) \tag{8}$$

where *c* ∈ {*r*, *g*, *b*}, *Ω*(*p*) represents a local block centered at *p*, and *y* is a pixel in the local block *Ω*(*p*). The dark channel is the outcome of two minimum operators: min*c* *Jc*(*y*) is performed on each pixel, and min*y*∈*Ω*(*p*) is a minimum filter [9].

Assuming that the atmospheric light *Ac* is given, we can normalize the haze imaging Equation (1) by *Ac* [9]:

$$\frac{I_c(p)}{A_c} = t(p)\frac{J_c(p)}{A_c} + 1 - t(p) \tag{9}$$

Since the transmission is assumed to be a constant *t̃*(*p*) within the local block *Ω*(*p*), and the value of *Ac* is given, applying the dark channel operation to Equation (9) yields the following equation [9].

$$\min_{y \in \Omega(p)} \left( \min_{c} \frac{I_c(y)}{A_c} \right) = \tilde{t}(p) \min_{y \in \Omega(p)} \left( \min_{c} \frac{J_c(y)}{A_c} \right) + 1 - \tilde{t}(p) \tag{10}$$

According to the dark channel prior [9], if *J* is an outdoor haze-free image, the intensity of its dark channel is low and tends to zero except in the sky region, which leads to:

$$\min_{y \in \Omega(p)} \left( \min_{c} \frac{J_c(y)}{A_c} \right) = 0 \tag{11}$$

Putting Equation (11) into Equation (10), we can eliminate the multiplicative term and estimate the transmission *t̃*(*p*) simply by

$$\tilde{t}(p) = 1 - \min_{y \in \Omega(p)} \left( \min_{c} \frac{I_c(y)}{A_c} \right) \tag{12}$$

where *t̃*(*p*) is the estimated transmission of the block centered at *p* [9]. We calculate the average transmission over all blocks to obtain the average transmission *T* of the whole image, which is the value of the image haziness flag.
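
For illustration, a compact sketch of this computation follows (Python with NumPy and SciPy; the 15-pixel patch size and the use of a generic minimum filter are our implementation choices, not values given in the paper):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def haziness_flag(I, A, patch=15):
    """Estimate the image haziness flag T: the mean of the block
    transmissions from Equation (12).

    I: float RGB image in [0, 1]; A: atmospheric light, length-3 vector.
    """
    # Per-pixel minimum over color channels of I_c / A_c ...
    norm_min = (I / np.asarray(A)).min(axis=2)
    # ... followed by the spatial minimum over Omega(p) (Equation (8)).
    dark = minimum_filter(norm_min, size=patch)
    t_tilde = 1.0 - dark                       # Equation (12)
    return t_tilde.mean()                      # average transmission T
```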

#### 3.3.2. Correction of Initial Transmission

According to our experimental results, the value of *T* for a hazy image generally lies between 0.4 and 0.6. Although the image haziness flag *T* characterizes the nature of the image, taking *T* directly as the initial transmission value leads to an excessive optimal transmission *t*∗. Thus, we introduce a correction value *X* and take *T* ∗ *X* as a decreased initial transmission value.

The structural similarity (SSIM) index is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. To guarantee that the restored images are close to the ground truth, we adopt the SSIM index [29] to measure the similarity between the ground truth and the restored image. Because a traffic video is captured by a fixed camera, we can obtain a haze-free image of the same scene in advance as a reference and compare the restored image with it. The value of *T* is obtained directly because it depends only on the image itself, whereas the unknown value *X* is determined via the SSIM. In our experiments, we let *X* take a series of values between 0.3 and 1.2 with an interval of 0.02. For each *X* in this range, we take *T* ∗ *X* as the initial transmission value and obtain the corresponding restored image. Finally, we find the restored image closest to the haze-free reference according to the maximum SSIM index; the corresponding transmission is the optimal initial value, and the corresponding *X* is the optimal correction value of the initial transmission.
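
A sketch of this grid search follows (Python; `dehaze` is a hypothetical stand-in for the block-wise restoration of Section 3.2, and `structural_similarity` is from scikit-image):

```python
import numpy as np
from skimage.metrics import structural_similarity

def optimal_correction_value(hazy, reference, T, dehaze):
    """Grid-search the correction value X in [0.3, 1.2] with step 0.02.

    hazy, reference: float RGB images in [0, 1] of the same fixed scene.
    T: image haziness flag of the hazy frame.
    dehaze: callable dehaze(img, t_init) standing in for the block-wise
            restoration described above (a hypothetical placeholder).
    """
    best_X, best_score = None, -1.0
    for X in np.arange(0.3, 1.2 + 1e-9, 0.02):
        restored = dehaze(hazy, t_init=T * X)  # initial transmission T * X
        score = structural_similarity(restored, reference,
                                      channel_axis=2, data_range=1.0)
        if score > best_score:
            best_score, best_X = score, X
    return best_X                              # optimal correction value
```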

However, this method needs a haze-free image to obtain the optimal correction value *X*, which limits its practical applicability; it is therefore necessary to derive the correction value from the image characteristics alone. After analyzing the image contrast and the haze in images, we found a relationship between the correction value of the initial transmission and the image characteristics. Therefore, a relatively reasonable initial transmission correction value can be obtained directly from a hazy image.

We denote the relatively reasonable correction value of the initial transmission by *X*′ and take *T* ∗ *X*′ as the initial transmission value. Because the dehazing algorithm is based on enhancing the image contrast to the greatest degree, the contrast is an important indicator. The image haziness flag *T* represents the degree of haze that degrades the image contrast, so the image contrast and the haziness flag should be considered simultaneously. We let *C* denote the image contrast and use *T* ∗ *C* as a quantitative value representing the image characteristics. The value of *X*′ depends on the range in which *T* ∗ *C* falls.

Table 1 shows the values of *X*′ for different ranges of *T* ∗ *C*. In Table 1, *X* is the optimal correction value obtained by the method with reference images, and *X*′ is the relatively reasonable correction value obtained from the range of *T* ∗ *C*. In the dehazing algorithm, the initial transmission value is the key factor affecting the dehazing result. Table 1 also lists *T* ∗ *X* and *T* ∗ *X*′, the initial transmission values derived from the optimal correction value *X* and the relatively reasonable correction value *X*′, respectively. Figure 2 shows the histogram of *T* ∗ *X* and *T* ∗ *X*′; the two values in each group are similar, and differences of this magnitude do not affect the dehazing results significantly. Therefore, our method can determine a near-optimal initial transmission value using only the nature of the image and thus obtain a more adaptive transmission value.


**Table 1.** The values of *X*′ for different ranges of *T* ∗ *C*.

**Figure 2.** The histogram of *T* ∗ *X* and *T* ∗ *X*′.

## **4. Adaptive Traffic Video Dehazing Method Using Spatial–Temporal Correlations**

Compared with static traffic images, traffic videos have some unique characteristics. First, a traffic video is a collection of images with temporal continuity. Second, the cameras are fixed along the road and capture the same scene over a long time, so the videos are spatially consistent. Therefore, we can use these spatial-temporal correlations to speed up traffic video dehazing.

### *4.1. Time Continuity of Traffic Videos*

Because the cameras are fixed, the scenes in traffic videos barely change over a long period of time, and the influence of haze is stable. In our experiments, we use traffic videos from the ZhongHe elevated freeway in Hangzhou, set a cycle of five minutes, and regard the frames within one cycle as a collection of images with the same characteristics. Figure 3 shows images at 1 min intervals within a 5 min cycle; the difference in *T* is very small, usually less than 0.04. Figure 4 presents restored images obtained with different *T* values; when the difference in *T* is less than 0.04, there is no obvious influence on visibility. Therefore, for videos captured at the same scene, the values of *T* within a 5 min cycle are at the same level, and a 5 min cycle is reasonable in practical applications.

After setting the 5 min cycle, we take the first frame of a video segment as a reference frame. From the reference frame, we determine the image haziness flag *T* and the relatively reasonable initial transmission correction value *X*′ and then calculate the optimal transmission *t*∗. In this way, we speed up the dehazing of the traffic video. This approach also avoids the incorrect transmission estimates caused by changes in atmospheric light and eliminates discontinuities in the video after dehazing.
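
This per-cycle parameter reuse can be sketched as follows (Python; `estimate_parameters` and `restore_frame` are hypothetical placeholders for the estimation and restoration steps described above):

```python
def dehaze_video(frames, estimate_parameters, restore_frame,
                 fps=25, cycle_minutes=5):
    """Reuse dehazing parameters across a 5 min cycle of a fixed-camera video.

    frames: iterable of video frames in temporal order.
    estimate_parameters: callable computing the parameters (e.g., A, T, X')
                         from a reference frame.
    restore_frame: callable restoring one frame with the cached parameters.
    """
    frames_per_cycle = fps * 60 * cycle_minutes
    params = None
    for i, frame in enumerate(frames):
        if i % frames_per_cycle == 0:             # first frame of a new cycle
            params = estimate_parameters(frame)   # recompute T, X', t* here
        yield restore_frame(frame, params)        # other frames reuse the cache
```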

**Figure 3.** The difference of *T* for the images in a 5 min cycle. The images come from different scenes (**<sup>a</sup>**,**b**).
