3.1. Image Preprocessing
Bilateral filtering is an edge-preserving filtering method that uses a weighted-average strategy based on the Gaussian distribution [22,23]. Bilateral filtering consists of two parts: a spatial matrix and a range matrix. The spatial matrix is analogous to Gaussian filtering and is used for blur denoising; the range matrix is obtained from gray-scale similarity and is used to protect edges. The specific formulas of the spatial matrix and the range matrix are as follows:

$$d(i,j,k,l) = \exp\!\left(-\frac{(i-k)^2+(j-l)^2}{2\sigma_d^2}\right) \tag{1}$$

$$r(i,j,k,l) = \exp\!\left(-\frac{\left\|f(i,j)-f(k,l)\right\|^2}{2\sigma_r^2}\right) \tag{2}$$

where (i, j) is the coordinate of the center point of the filter window, (k, l) is any point in the neighborhood of the center point, f denotes the pixel gray value, and σ_d and σ_r are the standard deviations of the spatial and range Gaussian kernels, respectively.
Gaussian filtering convolves the image with a weight kernel that depends only on spatial distance to determine the gray value of the center point: the closer a point is to the center point, the larger its weight coefficient. Bilateral filtering adds a weight based on gray-level information: within the neighborhood, the closer a point's gray value is to that of the center point, the greater its weight [22]. This weight is determined by Equation (2). The final bilateral filter weight matrix is obtained by multiplying the two weight matrices of Equations (1) and (2):

$$w(i,j,k,l) = d(i,j,k,l)\, r(i,j,k,l) \tag{3}$$
Finally, the weighted average is computed as the pixel value of the center point after filtering:

$$g(i,j) = \frac{\sum_{(k,l)\in S_{i,j}} f(k,l)\, w(i,j,k,l)}{\sum_{(k,l)\in S_{i,j}} w(i,j,k,l)} \tag{4}$$

where S_{i,j} denotes the filter window centered on (i, j).
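To make Equations (1)–(4) concrete, the following brute-force NumPy sketch applies them directly; the window radius and σ defaults are illustrative, not values from the paper. In practice, OpenCV's optimized cv2.bilateralFilter computes the same weighting.

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_d=3.0, sigma_r=25.0):
    """Brute-force bilateral filter following Equations (1)-(4).
    img is a 2-D grayscale float array; radius, sigma_d, and sigma_r
    are illustrative defaults."""
    h, w = img.shape
    pad = np.pad(img, radius, mode="reflect")
    # Spatial weights d(i, j, k, l) of Equation (1): fixed for every window.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_d ** 2))
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            window = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range weights r(i, j, k, l) of Equation (2).
            rng = np.exp(-(window - img[i, j]) ** 2 / (2 * sigma_r ** 2))
            weights = spatial * rng                                # Equation (3)
            out[i, j] = (window * weights).sum() / weights.sum()   # Equation (4)
    return out
```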
In the dehazing algorithm [18], the dark channel prior holds that, in most non-sky local regions, some pixels always have at least one color channel with a very low value.
The dark channel is given a mathematical definition: for any input image J, the dark channel expression is as follows:

$$J^{\mathrm{dark}}(x) = \min_{y \in \Omega(x)}\left(\min_{c \in \{r,g,b\}} J^c(y)\right) \tag{5}$$

where J^c represents each channel of the color image, x and y represent pixels, and Ω(x) represents a window centered on pixel x. J^dark(x) is the dark primary color of the image in the Ω(x) neighborhood; for clear, fog-free images, its value tends to 0.
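Equation (5) can be computed as a per-pixel minimum over the color channels followed by a grayscale erosion (local minimum) over the Ω(x) window. In the sketch below, the 15 × 15 patch size is a common choice rather than a value stated in the paper.

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of Equation (5) for a float color image.
    `patch` is an assumed Omega(x) window size."""
    min_channel = img.min(axis=2)                              # min over c
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_channel, kernel)                      # min over Omega(x)
```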
In computer vision and computer graphics, the fog map formation model described by the following equation is widely used:

$$I(x) = J(x)\, t(x) + A\,\bigl(1 - t(x)\bigr) \tag{6}$$

where I(x) is the original image, J(x) is the image after defogging, A is the atmospheric light value, and t(x) is the transmittance.
Only I(x) is known, and the target J(x) is required; the equation therefore has infinitely many solutions, so priors are needed.
Dividing the fog map formation model by the atmospheric light, Equation (6) can be rewritten as Equation (7):

$$\frac{I^c(x)}{A^c} = t(x)\,\frac{J^c(x)}{A^c} + 1 - t(x) \tag{7}$$

where c represents one of the R, G, and B channels.
He et al. [24] assumed that the transmittance is constant within each window and defined it as t̃(x). From the dark channel map, the brightest 0.1% of pixels are selected; among these positions, the value of the point with the highest brightness in the original foggy image I is taken as the value of A. Applying the minimum operator over the window to both sides, Equation (7) can be transformed into:

$$\min_{y \in \Omega(x)}\left(\min_{c} \frac{I^c(y)}{A^c}\right) = \tilde{t}(x)\,\min_{y \in \Omega(x)}\left(\min_{c} \frac{J^c(y)}{A^c}\right) + 1 - \tilde{t}(x) \tag{8}$$
According to the dark channel prior, the dark channel of the haze-free image J tends to 0, so the term containing J vanishes and the estimated transmittance can be deduced as follows:

$$\tilde{t}(x) = 1 - \omega \min_{y \in \Omega(x)}\left(\min_{c} \frac{I^c(y)}{A^c}\right) \tag{9}$$

where ω is an artificially introduced correction constant (generally 0.95), used to retain some haze for distant parts of the scene and preserve the variation of the depth of field. We set a lower limit t0 on t(x) to prevent the contrast from becoming too large: when t is less than t0, t is set to t0. Therefore, the final recovery Equation (10) is as follows:

$$J(x) = \frac{I(x) - A}{\max\bigl(t(x),\, t_0\bigr)} + A \tag{10}$$

where t0 is set to 0.1.
In order to improve the speed of the dehazing algorithm and achieve real-time performance, the original image is down-sampled before optimization: the image is first reduced to one-quarter of its original size, the transmittance of the reduced image is calculated, and the approximate transmittance of the original image is then obtained by interpolation. This greatly improves the execution speed while leaving the result essentially unchanged.
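Combining Equations (5)–(10) with this down-sampling trick, a minimal sketch of the whole pipeline (reusing dark_channel from above) could look as follows; ω = 0.95 and t0 = 0.1 follow the text, while the patch size and resizing details are assumptions.

```python
import cv2
import numpy as np

def dehaze(img, omega=0.95, t0=0.1, patch=15, scale=0.25):
    """Dark-channel dehazing per Equations (5)-(10).
    img is a float32 color image in [0, 255]."""
    # Atmospheric light A: the brightest 0.1% of dark-channel pixels,
    # then the brightest of those positions in the hazy image I.
    dark = dark_channel(img, patch)
    n = max(1, int(dark.size * 0.001))
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    candidates = img[idx]                          # shape (n, 3)
    A = candidates[candidates.sum(axis=1).argmax()]

    # Transmittance (Equation (9)) on the quarter-size image,
    # then interpolate back to full resolution.
    small = cv2.resize(img, None, fx=scale, fy=scale)
    t_small = 1.0 - omega * dark_channel(small / A, patch)
    t = cv2.resize(t_small, (img.shape[1], img.shape[0]))

    # Recovery (Equation (10)) with the lower bound t0.
    t = np.maximum(t, t0)[..., np.newaxis]
    return (img - A) / t + A
```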
3.2. Fusion Module
The multi-exposure image fusion model mainly comprises three parts: the fusion network, feature retention, and the loss function. The specific structure is shown in Figure 2.
The input multi-exposure images are denoted I_1 and I_2, and the fused image I_f is generated through DenseNet training in the fusion network. In the feature-retention module, the outputs of the feature-extraction parts are the feature maps φ_1^1, …, φ_1^m and φ_2^1, …, φ_2^m. In the information measurement, the amounts of information extracted from the feature maps are expressed as g_1 and g_2. Through subsequent processing, the degrees to which information from the source images is retained in the final result are represented by w_1 and w_2. I_1, I_2, I_f, w_1, and w_2 are fed into the loss function without the need for ground truth. During training, the DenseNet was continuously optimized to minimize the loss function [25]. It is not necessary to measure w_1 and w_2 again during testing, so the fusion speed is faster in practical applications.
The DenseNet architecture in the fusion network consists of 10 layers, each of which comprises a convolutional layer and an activation function [26]. Dense connections are applied inside each Dense Block, and a convolutional layer plus a pooling layer are used between adjacent Dense Blocks. The advantages of DenseNet are that the network is narrower, has fewer parameters, and suffers less from vanishing gradients. The activation function of the first nine layers is LeakyReLU with a slope of 0.2, and that of the last layer is tanh. The kernel size of all convolutional layers is set to 3 × 3, and the stride is set to 1 [6].
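To make the layer layout concrete, the following PyTorch sketch stacks ten 3 × 3, stride-1 convolutions with dense concatenation connections, LeakyReLU(0.2) for the first nine layers and tanh for the last. For brevity it collapses the multi-block structure into a single densely connected stack; the channel widths and the two-channel input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenseFusionNet(nn.Module):
    """Sketch of the 10-layer densely connected fusion network described
    above. Growth rate and input/output channel counts are assumptions."""
    def __init__(self, in_ch=2, growth=16):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(9):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, stride=1, padding=1),
                nn.LeakyReLU(0.2)))
            ch += growth                 # dense connection: inputs accumulate
        self.out = nn.Sequential(nn.Conv2d(ch, 1, 3, 1, 1), nn.Tanh())

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return self.out(torch.cat(feats, dim=1))
```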
We extracted features with the pre-trained VGGNet-16 shown in Figure 3. The convolutional outputs before each max-pooling layer serve as the feature maps used for the subsequent information measurement [27]. Among the source images, the under-exposed image has lower brightness; the over-exposed image therefore contains more texture details and larger gradients than the under-exposed image. Shallow features such as texture and details were extracted from the feature maps of the earlier layers, and deep features such as structural content were extracted from the feature maps of the later layers.
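A sketch of this feature extraction with torchvision's pre-trained VGG-16 follows; the layer indices are those of torchvision.models.vgg16().features, and the input is assumed to be a 3-channel tensor (e.g., a grayscale exposure replicated across channels).

```python
import torchvision

# Collect the conv outputs just before each of the five max-pooling
# layers of a pre-trained VGG-16.
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
PRE_POOL = [3, 8, 15, 22, 29]   # ReLU layers preceding each MaxPool2d

def vgg_features(x):
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in PRE_POOL:
            feats.append(x)
    return feats
```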
After the image gradients were estimated, the feature-map information could be measured. The information measurement is defined as follows:

$$g = \frac{1}{H W D_j} \sum_{k=1}^{D_j} \left\| \nabla \varphi^k \right\|_F^2 \tag{11}$$

where φ^k is the feature map in the k-th of the D_j channels, H and W are its height and width, ‖·‖_F represents the Frobenius norm, and ∇ represents a Laplace operator.
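Equation (11) can be implemented by convolving each feature-map channel with a discrete Laplacian kernel and averaging the squared Frobenius norms; a sketch for a feature tensor of shape (N, D, H, W):

```python
import torch
import torch.nn.functional as F

# 3 x 3 discrete Laplacian used as the gradient operator of Equation (11).
LAPLACE = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)

def information(feat):
    """Equation (11): mean squared Frobenius norm of the Laplacian
    over the D_j channels of `feat` with shape (N, D, H, W)."""
    n, d, h, w = feat.shape
    grad = F.conv2d(feat.reshape(n * d, 1, h, w), LAPLACE, padding=1)
    per_channel = grad.pow(2).sum(dim=(1, 2, 3)).reshape(n, d)  # ||grad phi^k||_F^2
    return per_channel.mean(dim=1) / (h * w)    # average over k, H, W
```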
In this method, g_1 and g_2 were obtained from the feature maps of I_1 and I_2. As the difference between g_1 and g_2 is small in absolute value relative to the measurements themselves, we scaled them by a predefined positive number c to better distribute the weights. Through the softmax function, the expressions of w_1 and w_2 are as follows:

$$[w_1, w_2] = \operatorname{softmax}\!\left(\left[\frac{g_1}{c}, \frac{g_2}{c}\right]\right) \tag{12}$$
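Equation (12) then reduces to a single softmax over the scaled measurements; c is a hyper-parameter whose value is not stated in this section:

```python
import torch

def retention_weights(g1, g2, c=1.0):
    """Equation (12): softmax over the scaled information measurements.
    The scaling constant c is an assumed default."""
    return torch.softmax(torch.stack([g1, g2]) / c, dim=0)
```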
The loss function consists of two parts, defined as follows:

$$\mathcal{L}(\theta, D) = \mathcal{L}_{\mathrm{ssim}}(\theta, D) + \alpha\, \mathcal{L}_{\mathrm{mse}}(\theta, D) \tag{13}$$

where θ represents the parameters of the DenseNet and D represents the training dataset. L_ssim is the similarity loss between the fused image and the source multi-exposure images, L_mse is the mean-square-error loss between the images, and α controls the trade-off between the two terms.
The structural similarity index measure (SSIM) [28,29] is widely used to model distortion based on the similarity of luminance, contrast, and structural information. In this paper, SSIM was used to constrain the structural similarity between I_f, I_1, and I_2. The loss function under the SSIM framework is as follows:

$$\mathcal{L}_{\mathrm{ssim}} = w_1\left(1 - S_{I_f, I_1}\right) + w_2\left(1 - S_{I_f, I_2}\right) \tag{14}$$

where S_{x,y} denotes the value of SSIM between x and y.
Considering that SSIM focuses only on changes in contrast and structure and places a weaker constraint on differences in the intensity distribution, we supplemented it with the mean square error (MSE) between two images; the loss function of this part is as follows:

$$\mathcal{L}_{\mathrm{mse}} = w_1\, \mathrm{MSE}_{I_f, I_1} + w_2\, \mathrm{MSE}_{I_f, I_2} \tag{15}$$
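Combining Equations (13)–(15), a sketch of the training loss follows, using a third-party differentiable SSIM implementation (pytorch_msssim here) and an illustrative α:

```python
import torch.nn.functional as F
from pytorch_msssim import ssim  # any differentiable SSIM implementation works

def fusion_loss(fused, i1, i2, w1, w2, alpha=20.0):
    """Equations (13)-(15): information-weighted SSIM and MSE terms.
    Inputs are tensors in [0, 1]; `alpha` is an assumed trade-off value."""
    l_ssim = w1 * (1 - ssim(fused, i1, data_range=1.0)) \
           + w2 * (1 - ssim(fused, i2, data_range=1.0))              # Eq. (14)
    l_mse = w1 * F.mse_loss(fused, i1) + w2 * F.mse_loss(fused, i2)  # Eq. (15)
    return (l_ssim + alpha * l_mse).mean()                           # Eq. (13)
```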
3.3. Image Optimization
Under ideal conditions, the fusion result should be a normally exposed image that clearly shows the road-sign information. However, owing to the limited training set and the small number of individual road signs, the exposure of the fusion result may still not reach a normal level. Therefore, we used histogram equalization and the Laplacian operator to optimize images with a poor fusion effect.
In practical applications, the fused image sometimes fell short of expectations because of the limitations of our fusion algorithm: although it retained the general characteristics of the under-exposed and over-exposed images, the overall brightness could still appear inappropriate. We therefore introduced optimization algorithms to adjust the brightness of the image without degrading image quality.
When the brightness of the fused image is lower than a pre-set value, the Laplace algorithm [30] is used to sharpen the image; when the brightness is higher than the pre-set value, histogram equalization [31] is adopted to adjust the over-bright parts of the image. The procedure is summarized in Algorithm 2.
Algorithm 2: The description of the optimization
Process of Optimization
Parameter: B denotes the brightness of the fused image I_f; M_gray denotes the gray average of an image.
Input: Fused image I_f from DenseNet, the high threshold T_h, and the low threshold T_l.
Output: Final image I_o after optimization.
1: Compute the brightness B of I_f as M_gray/255.0;
2: if 1 > B > T_h then
3:   I_o ← brightness reduction on I_f;
4:   return I_o;
5: else if 0 < B < T_l then
6:   I_o ← brightness enhancement on I_f;
7:   return I_o;
8: else if T_l ≤ B ≤ T_h then
9:   return I_f;
10: else
11:   return false;
12: end if
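A direct transcription of Algorithm 2 might read as follows; the threshold values are illustrative, and sharpen and equalize refer to the sketches given later in this section:

```python
import cv2

def optimize(fused, t_low=0.3, t_high=0.7):
    """Algorithm 2: route the fused BGR uint8 image by mean gray
    brightness. T_l and T_h values here are assumed defaults."""
    b = cv2.cvtColor(fused, cv2.COLOR_BGR2GRAY).mean() / 255.0  # line 1
    if t_high < b < 1:                  # over-bright: brightness reduction
        return equalize(fused)
    if 0 < b < t_low:                   # under-bright: brightness enhancement
        return sharpen(fused)
    if t_low <= b <= t_high:            # acceptable exposure: keep as-is
        return fused
    return None                         # degenerate image (b = 0 or b = 1)
```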
The function of image sharpening is to enhance the gray-scale contrast so that the sharpness of the image is improved. The essence of image blurring is that the image has been subjected to an averaging or integration operation. The Laplacian is a differential operator; by applying the inverse operation to the image, it enhances regions of abrupt gray-level change, highlights the details of the image, and yields a clearer image. An image describing the gray-level transitions is generated by processing the original image with the Laplacian operator; this Laplacian image is then superimposed on the original image to produce the sharpened image. In practice, this principle amounts to a convolution operation.
The Laplacian operator is the simplest isotropic differential operator and is rotation invariant. The Laplace transform of a two-dimensional image function f(x, y) is the isotropic second derivative, defined as:

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{16}$$

For the two-dimensional function f(x, y), the second-order differences in the x and y directions are as follows:

$$\frac{\partial^2 f}{\partial x^2} = f(x+1,y) + f(x-1,y) - 2f(x,y), \qquad \frac{\partial^2 f}{\partial y^2} = f(x,y+1) + f(x,y-1) - 2f(x,y) \tag{17}$$

To be more suitable for digital image processing, the equation is converted into the following discrete form:

$$\nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y) \tag{18}$$
The basic method of Laplacian image enhancement can be expressed as follows:

$$g(x,y) = f(x,y) + c\,\nabla^2 f(x,y) \tag{19}$$

where f(x, y) and g(x, y) are the input image and the sharpened image, respectively, and c is a coefficient indicating how much detail is added.
This simple sharpening method produces the Laplacian sharpening effect while retaining the background information. Superimposing the original image on the result of the Laplace transform preserves the gray values of the image while enhancing the contrast at gray-level discontinuities. The end result highlights small details in the image while preserving its background.
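As a concrete illustration of Equations (18) and (19), a minimal OpenCV sketch is shown below; note that cv2.Laplacian uses the negative-center kernel of Equation (18), so adding detail means subtracting its response, and c = 1 is an illustrative amount:

```python
import cv2
import numpy as np

def sharpen(img, c=1.0):
    """Laplacian sharpening, Equations (18) and (19)."""
    lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F)  # Equation (18)
    out = img.astype(np.float32) - c * lap                   # Equation (19)
    return np.clip(out, 0, 255).astype(np.uint8)
```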
The fusion result can also exhibit partially over-exposed regions; in such cases, histogram equalization was used to optimize the fused image.
Histogram equalization is a simple and effective image-enhancement technique that changes the grayscale of each pixel by reshaping the image histogram. The gray levels of over-exposed pictures are concentrated in the high-brightness range. Through histogram equalization, the gray levels occupied by many pixels are spread out and those occupied by few pixels are merged, transforming the histogram of the original image into an approximately uniform (balanced) distribution. This increases the dynamic range of the gray-value differences between pixels, thereby enhancing the overall contrast of the image [31,32].
The gray histogram of an image is a one-dimensional discrete function, which can be written as:

$$h(r_k) = n_k, \quad k = 0, 1, \ldots, L-1 \tag{20}$$

where n_k is the number of pixels with gray level r_k in the source image and L is the number of gray levels.
Based on the histogram, the relative frequency p_r(r_k) with which gray level r_k appears in the normalized histogram is further defined as:

$$p_r(r_k) = \frac{n_k}{N} \tag{21}$$

where N represents the total number of pixels in the source image I.
Histogram equalization transforms the gray values of pixels according to the histogram. Let r and s denote the normalized original gray level and the equalized gray level, respectively; both take values between 0 and 1. For any r in the interval [0, 1], a corresponding s is generated by the transformation function T(r):

$$s = T(r) \tag{22}$$

T(r) is a monotonically increasing function, which ensures that the gray-level ordering from black to white is preserved after equalization. At the same time, the range of T(r) is also between 0 and 1, ensuring that the pixel gray levels of the equalized image remain within the allowable range.
The inverse transformation is as follows:

$$r = T^{-1}(s), \quad 0 \le s \le 1 \tag{23}$$
It is known that the probability density function of r is p_r(r) and that s is a function of r; therefore, the probability density p_s(s) of s can be obtained from p_r(r). Because a probability density function is the derivative of the corresponding distribution function, p_s(s) is obtained through the distribution function F_s(s). The specific derivation process is as follows:

$$F_s(s) = \int_{-\infty}^{s} p_s(w)\,dw = \int_{-\infty}^{r} p_r(w)\,dw \tag{24}$$

$$p_s(s) = \frac{dF_s(s)}{ds} \tag{25}$$

$$p_s(s) = p_r(r)\,\frac{dr}{ds} = p_r(r)\left[\frac{dT(r)}{dr}\right]^{-1} \tag{26}$$
Equation (26) shows that the probability density function p_s(s) of the image gray levels can be controlled through the transformation function T(r), thereby improving the gray-level distribution of the image. For histogram equalization, p_s(s) should therefore be a uniformly distributed probability density function. Since r has been normalized, p_s(s) = 1 and hence ds = p_r(r) dr; integrating both sides yields:

$$s = T(r) = \int_0^r p_r(w)\,dw \tag{27}$$
Equation (27) is the expression of the transformation function T(r): the histogram is equalized when T(r) is the cumulative distribution probability of the original image histogram. For digital images with discrete gray levels, using frequencies instead of probabilities, the discrete form of the transformation function T(r_k) can be expressed as:

$$s_k = T(r_k) = \sum_{j=0}^{k} \frac{n_j}{N}, \quad k = 0, 1, \ldots, L-1 \tag{28}$$
Here, r_k lies between 0 and 1 and represents the normalized gray value, calculated as the quotient of k and L − 1, where k is the gray value before normalization. Equation (28) shows that the equalized gray value s_k of each pixel can be calculated directly from the histogram of the source image.
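The discrete mapping of Equation (28) is straightforward to implement; the per-channel sketch below builds the cumulative histogram as a lookup table (for a single uint8 channel, cv2.equalizeHist implements essentially the same mapping):

```python
import numpy as np

def equalize(img):
    """Histogram equalization via the discrete mapping of Equation (28),
    applied independently to each channel of a uint8 color image."""
    out = np.empty_like(img)
    for ch in range(img.shape[2]):
        hist = np.bincount(img[..., ch].ravel(), minlength=256)  # Eq. (20)
        cdf = hist.cumsum() / hist.sum()           # s_k = T(r_k), Eq. (28)
        lut = (cdf * 255).astype(np.uint8)         # map back to [0, 255]
        out[..., ch] = lut[img[..., ch]]
    return out
```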