1. Introduction
One of the basic functions of a digital camera is to capture real-world information naturally and vividly, comparably to what the human eye perceives. However, current digital camera technology still has many limitations in capturing rich texture details, vivid colors, and brightness. This is because the dynamic range of a real scene is large [1], while the dynamic range captured by the sensor of an ordinary camera [2] is comparatively small: a single photograph can record only a fraction of the dynamic range of the real world. Therefore, in HDR (high dynamic range) scenes, the picture is often underexposed or overexposed. The most direct way to deal with this problem is to use high dynamic range devices [3] to acquire and display real scenes. However, these devices are often expensive and cannot be used universally [4].
In recent years, software-based solutions have gradually entered people's field of vision. Compared with hardware-based methods, software-based methods are easy to implement, inexpensive, and suitable for ordinary cameras [5]. Existing software-based solutions fall into two main categories: HDR imaging techniques and multi-exposure image fusion (MEF). HDR imaging technology uses multiple exposures of images captured with ordinary cameras to estimate the camera response function (CRF) [6,7], resulting in a high dynamic range image. Then, the image is compressed, and the tone mapping method [8] is used to convert the high dynamic range image into a low dynamic range (LDR) image so that the image can be visualized on common display devices [9,10,11].
However, the complexity of HDR imaging technology is high and the processing time is long, which makes it unsuitable for ordinary cameras [12]. Unlike HDR imaging technology, the multi-exposure fusion method does not need to construct an HDR image. It extracts pixels with more information, better exposure, and higher image quality from the input multiple-exposure low dynamic range images, and then fuses them. The resulting fused image needs no further processing and can be displayed on an ordinary display device [13]. Compared with HDR imaging technology, the multi-exposure image fusion method has lower computational complexity and runs faster, so this type of method is the first choice for ordinary cameras [12].
In the existing multi-exposure image fusion techniques, the source image sequence is obtained under different exposures, and the difference in exposure causes the loss of image details. When there are moving objects in the captured image, if the images are fused directly, ghosting artifacts can appear in the resulting image. The elimination of ghosting artifacts is a major difficulty in the current multi-exposure fusion technology.
As early as the 1980s, Burt et al. [14] proposed Laplacian pyramid decomposition, which can effectively fuse two images; this method has since been used in many image fusion techniques. Mertens et al. [12] proposed a multi-resolution fusion method. This method uses contrast, saturation, and exposure as three quality measures to construct a weight map, and then fuses the images with a Laplacian pyramid decomposition method. The images fused by this method have good saturation but lack detailed information, and the method is not suitable for processing images of dynamic scenes.
Shen [15] proposed a method based on a generalized random walk framework, which addresses the color distortion problem where the fused image lacks correct color information.
The method proposed by Gu et al. [16] fuses multi-exposure images in the gradient field: it generates the gradient value of each pixel by maximizing the structure tensor, and derives the pixel values from the gradient field that best represents the geometric features of the scene. This method can extract some detailed information, but the resulting image is darker, and details in the dark areas are lost.
Li and Kang [17] used a multi-exposure fusion method based on median filtering and recursive filtering. This method uses color dissimilarity to eliminate the influence of moving objects and can, to a certain extent, eliminate ghosting caused by motion, but it does not process the color information effectively, resulting in color distortion of the fused image.
Shen [18] proposed a new hybrid exposure weight metric, characterized by the use of mixed exposure weights to guide Laplacian pyramid enhancement. This method can maintain the color appearance and texture structure of the image but cannot preserve edges well, and the algorithm has high complexity and low efficiency.
Bruce [19] calculated the information entropy within a neighborhood radius of each pixel in the logarithmic domain and then set the weight according to the information entropy value of each pixel. Although the resulting images have high information entropy, their overall color is dark and distorted.
The main feature of the method of Li and Wang [20] is the use of structure tensors to extract image details. The resulting images have good color saturation and preserve image texture details well, but when this method handles image sequences of dynamic scenes, ghosting artifacts appear in the resulting images.
Hayat et al. [21] proposed a multi-exposure image fusion technique based on multi-resolution fusion, which estimates the weight map using contrast, saturation, exposure, and color dissimilarity, and finally uses a multi-resolution pyramid decomposition method to obtain the fused image. This technique avoids the seam problem well, but when moving objects appear across multiple images, ghosting artifacts can still be generated.
The method of Huang et al. [22] produces resulting images with rich colors and appropriate exposure, but some texture details are lost in the overexposed and underexposed areas of the image, and fusing image sequences of dynamic scenes results in significant ghosting.
Our method solved the problems of brightness imbalance, insufficient retention of detail information in bright and dark areas, and ghosting artifacts that occurred when fusing dynamic images. To solve these problems, this paper constructed a new ghost-free multi-exposure image fusion model based on the multi-scale block LBP operator. In this method, for multi-exposure image sequences containing moving objects, the multi-scale block LBP operator [8,23,24,25,26,27] was used for local texture extraction in bright and dark areas and for removing ghosts caused by moving objects. On this basis, a new brightness adaptive method was also proposed to ensure the fused image had better visibility. After constructing the weight map, the discontinuous and noisy initial weight map was refined using filters, and finally, the weight maps were fused by a multi-resolution method to obtain the final resulting image. The main contributions of this paper can be summarized as the following three points:
This method took advantage of the multi-scale block LBP operator (MBLBP) and applied it to the multi-exposure image fusion method for the first time. We designed two quality metrics based on the multi-scale block LBP for local texture extraction and ghost removal in images, respectively.
According to the brightness characteristics of the image, a new brightness adaptive method was proposed, which could adaptively adjust the brightness weight of the pixels in the source image sequence according to the brightness and darkness of the pixels.
A new method of multi-exposure fusion based on the multi-scale block LBP was proposed. This method could fuse multi-exposure image sequences captured in static scenes and dynamic scenes.
The rest of this article is organized as follows.
Section 2 gives a detailed explanation of the proposed technique.
Section 3 contains the comparison of experimental results, as well as parameter evaluation.
Section 4 discusses the conclusions, limitations, and future work of this methodology.
2. Proposed Method
In this paper, a ghost-free multi-exposure image fusion technique based on the multi-scale block LBP operator was proposed. Our method constructed an initial weight map by computing three image features: texture features, luminance features, and spatial consistency features. We used the multi-scale LBP operator for local texture measurement, which could compute spatial details from LDR source image sequences. A new motion detection method based on the LBP operator was also proposed for fusing image sequences of dynamic scenes containing moving objects. Finally, the method used fast-guided filters to initially refine the weights, and a pyramid decomposition method to fuse images.
The above quality metric calculation methods and experimental results are discussed in detail in the following subsections.
Figure 1 shows an overall flow chart of our method.
2.1. Texture Change Weight
When fusing image sequences captured in static scenes, two image features were considered: contrast and brightness. We used contrast as one of the weights so that the fused image had more texture and edge information. When using the multi-resolution method [12] to fuse multi-exposure images, enough detailed information was retained in the normally exposed regions. However, since the texture details of the bright and dark areas were affected by the brightness, some of the details in these areas were lost. Aiming at this problem, we proposed a subregion texture detail extraction method based on the multi-scale block LBP.
We let {I_i}, i = 1, 2, …, K, denote the source image sequence, where I_i denoted the ith image in the sequence. In a set of multi-exposure images, the average brightness of each pixel p represented the brightness of that pixel in the HDR scene, which was calculated as follows:

L̄(p) = (1/K) Σ_{i=1}^{K} L_i(p), (1)

where L̄(p) was the normalized average brightness of the pixels at position p in the image sequence, and L_i(p) represented the brightness value of the pixel at position p in the ith image.
Next, we used the average brightness L̄(p) at each pixel p to divide each image into a bright region, a dark region, and a normally exposed region. The calculation was as follows:

region(p) = dark, if L̄(p) < T; bright, if L̄(p) > 1 − T; normal, otherwise, (2)

where the brightness was computed on the grayscale image and T was the luminance threshold, which usually takes a value of 0.04–0.12 [28]. In this paper, we set T to 0.1 (as proposed by previous methods [17,29]). When the average brightness of the pixel at p was less than 0.1, the pixel belonged to the dark region; when it was greater than 0.9, the pixel belonged to the bright region; otherwise, it belonged to the normally exposed region. For convenience of calculation, the bright and dark regions of the image were together denoted as the abnormally exposed region.
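As a concrete illustration, the average-luminance computation of Formula (1) and the threshold-based region partition of Formula (2) can be sketched in Python as follows (a minimal sketch with hypothetical function names; the paper does not give implementation details):

```python
import numpy as np

def average_luminance(seq):
    """Per-pixel mean brightness of K normalized grayscale images in [0, 1]."""
    return np.mean(np.stack(seq, axis=0), axis=0)

def partition_regions(avg_lum, t=0.1):
    """Split pixels into dark, bright, and normally exposed masks.

    A pixel is dark if its average luminance is below t, bright if it is
    above 1 - t, and normally exposed otherwise (t = 0.1, as in the paper).
    """
    dark = avg_lum < t
    bright = avg_lum > 1.0 - t
    normal = ~(dark | bright)
    return dark, bright, normal
```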
For the normally exposed region of the image, the Scharr operator was used to extract texture detail information. The horizontal and vertical Scharr kernels were convolved with the image (the symbol * denoting the convolution operator) to obtain the texture changes along the x-axis and y-axis directions, respectively. These two convolution results were then combined to compute the texture change weight in the normally exposed region.
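The Scharr-based contrast measure can be sketched as a gradient-magnitude computation (a sketch under the common assumption that the two directional responses are combined by their magnitude; the paper's exact weight formula is not reproduced here, and all names are illustrative):

```python
import numpy as np

# Scharr kernels for the x and y directions.
SCHARR_X = np.array([[-3, 0, 3],
                     [-10, 0, 10],
                     [-3, 0, 3]], dtype=float)
SCHARR_Y = SCHARR_X.T

def _conv3(img, kernel):
    """3x3 correlation with edge padding (the sign difference from true
    convolution does not matter for the magnitude below)."""
    h, w = img.shape
    p = np.pad(img, 1, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def scharr_contrast(gray):
    """Per-pixel texture change: magnitude of the x and y Scharr responses."""
    gx = _conv3(gray, SCHARR_X)
    gy = _conv3(gray, SCHARR_Y)
    return np.hypot(gx, gy)
```

A flat region yields zero contrast, while edges and texture produce large values, which is exactly the behavior wanted for the normally exposed region.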
To preserve the details of the bright and dark regions in the input image sequence, we used the multi-scale block LBP operator to extract the texture details of these regions. Because this operator had rotation invariance and grayscale invariance, and was robust to illumination, it could better extract the texture detail information of these areas. The multi-scale block LBP was an operator for extracting local texture features of images: it compared the average intensity of a central block with those of its neighboring blocks and encoded the comparisons as a binary code. The encoded value at pixel p, that is, the LBP eigenvalue, could reflect the texture information of the central pixel p and its neighborhood. For more details on LBPs, please refer to [25,26,27].
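A simplified version of the multi-scale block LBP encoding can be sketched as follows (the block size s and the clockwise bit order are assumptions made for illustration; see [25,26,27] for the original definitions):

```python
import numpy as np

def mb_lbp(gray, s=3):
    """Multi-scale block LBP: compare the mean of each of the 8 neighboring
    s-by-s blocks with the mean of the central s-by-s block, and pack the
    8 comparison bits into one code per position."""
    h, w = gray.shape

    def block_mean(y, x):
        return gray[y:y + s, x:x + s].mean()

    codes = np.zeros((h - 3 * s + 1, w - 3 * s + 1), dtype=np.uint8)
    # Offsets of the 8 neighboring blocks, clockwise from the top-left.
    offs = [(0, 0), (0, s), (0, 2 * s), (s, 2 * s),
            (2 * s, 2 * s), (2 * s, s), (2 * s, 0), (s, 0)]
    for y in range(codes.shape[0]):
        for x in range(codes.shape[1]):
            c = block_mean(y + s, x + s)  # central block mean
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if block_mean(y + dy, x + dx) >= c:
                    code |= 1 << bit
            codes[y, x] = code
    return codes
```

Averaging over blocks rather than single pixels is what makes the operator robust to noise and illumination at larger scales s.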
Then, the fast local Laplacian filter [30] was used to enhance the detail information in the LBP feature map while retaining the edge information, giving the texture change weights in the bright and dark areas. In this filtering operation, σ and α were the parameters of the filter. σ represented the magnitude of the edges in the image: intensity variations smaller than σ were considered as detail, variations greater than σ were considered as edges and were preserved, and a value of α less than 1 meant enhancement of the detail.
Finally, by combining the above two texture change weights (those of the normally exposed region and of the abnormally exposed region), the final contrast weight map was obtained.
2.2. Brightness
If an image was underexposed or overexposed, it appeared very dark or very bright, with brightness values close to 0 or 1, which caused the image to lose a lot of information. Therefore, the basic idea of constructing the brightness weight in this paper was to assign a small brightness weight to the overexposed and underexposed areas of the image and a higher brightness weight to pixels with a brightness close to 0.5 [21]. The brightness weight was built from Gaussian curves of the red, green, and blue channels [21]. To simplify the formula, a variable c was used to represent the red, green, and blue channels (for example, when the value of c was R, the formula represented the Gaussian curve of the red channel).
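The per-channel Gaussian curve can be sketched as the standard well-exposedness measure (a sketch assuming the common form centered at 0.5; sigma = 0.2 matches the value used later in the paper, and the function name is hypothetical):

```python
import numpy as np

def well_exposedness(channel, sigma=0.2, mu=0.5):
    """Gaussian weight peaking at mid-tones: pixels near 0.5 get a weight
    close to 1; pixels near 0 or 1 get a weight close to 0."""
    return np.exp(-((channel - mu) ** 2) / (2.0 * sigma ** 2))
```

One common choice, as in Mertens-style fusion [12], is to evaluate this curve on each of the R, G, and B channels and multiply the three responses.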
However, some pixels in the image were possibly inherently bright or dark, rather than being too bright or too dark due to over- or underexposure. In this case, we needed to choose appropriate weights for the pixels in the bright and dark areas to better preserve the characteristics of the source image. This paper developed a novel and flexible bell curve function and proposed a luminance adaptive function to adjust the weight of pixels in bright and dark areas. In the improved formula, the channel variable c was used as in Formula (10) (for example, when c took the value R, the formula represented the brightness weight curve of the red channel), the standard deviation of the bell curve was 0.2 [12], and the other parameter was 1. Each channel had its own adaptive function, and the average luminance values of the pixels in the red, green, and blue channels at position p in the source image sequence were calculated using Formula (1). According to the average brightness of the pixel in each channel, the bright, dark, and normally exposed areas of each channel were divided as before, using Formula (2).
Taking the red channel as an example, the channel was divided into a bright area, a dark area, and a normally exposed area, and the adaptive function adjusted the weight according to which area the pixel belonged to, based on the luminance value of the pixel of the ith image in the red channel. The adaptive functions of the green and blue channels were calculated in the same way as that of the red channel.
2.3. Spatial Consistency
When the image sequence contained moving objects (such as moving people), the weight map also needed to consider the effect of moving objects, otherwise, ghosting could appear in the resulting image. To solve this problem, this paper proposed a new method for constructing spatially consistent weight terms based on the multi-scale block LBP (MDLBP).
Figure 2 shows a flow diagram of the motion detection based on the multi-scale LBP.
First, we computed the LBP feature of each pixel in the grayscale version of each source image. For any two different images I_i and I_j in the image sequence, the local similarity D_{i,j}(p) between them was measured by calculating the Euclidean distance between their LBP features at the pixel p, where ‖·‖₂ was the matrix 2-norm operation.
Then, the spatial consistency weight term W_i^{S1} of the image in a dynamic scene was constructed by passing the local similarity through a Gaussian function, given as Formula (15) (the superscript S1 was just a symbol used to distinguish the spatial consistency weight from the texture weight and brightness weight mentioned above and had no other practical significance). The standard deviation σ controlled the influence of the local similarity D_{i,j}(p) on the weight W_i^{S1}, and σ was set to 0.05 in this method (this value has been used in another multi-exposure method [29], and its rationality has been proved by previous work). The design idea here was that if pixel p belonged to a motion region in image I_i, the local similarity D_{i,j}(p) between image I_i and every other image I_j (j ≠ i) at p increased, so the spatial consistency weight at that pixel decreased, resulting in a decrease in the weight value of image I_i at pixel p.
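The motion-detection idea above can be sketched as follows (a minimal sketch: the LBP feature maps are assumed to be normalized to [0, 1] so that sigma = 0.05 is meaningful, the per-image weight is assumed to combine the pairwise Gaussian terms by a product, and the names are hypothetical):

```python
import numpy as np

def local_similarity(feat_i, feat_j):
    """Per-pixel distance between two scalar LBP feature maps; for scalar
    features the Euclidean 2-norm reduces to an absolute difference."""
    return np.abs(feat_i.astype(float) - feat_j.astype(float))

def spatial_consistency_weight(feats, i, sigma=0.05):
    """Weight of image i: close to 1 where it agrees with the other images,
    pushed toward 0 at pixels covered by moving objects."""
    w = np.ones_like(feats[i], dtype=float)
    for j in range(len(feats)):
        if j != i:
            d = local_similarity(feats[i], feats[j])
            w *= np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    return w
```

A pixel whose LBP feature in image i disagrees with every other image (a moving object) receives a weight near zero, so that image contributes almost nothing there.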
When the moving frequency of moving objects in the image sequence was high, the "split-channel" method was adopted to remove the influence of ghosting artifacts more effectively. We calculated the LBP eigenvalues of the red, green, and blue channels at the pixel p of each source image separately, and then calculated the corresponding local similarities of the pixel p in each channel. The local similarity between the ith and jth images was then obtained by combining the distances of the three channels.
The spatial consistency weight term of the image constructed by the proposed method was then calculated from the per-channel similarities, given as Formula (17). Formula (17) was improved from Formula (15) and had only one more step, namely the per-channel calculation of the LBP eigenvalues and similarities. When the motion frequency of moving objects in the source image sequence was relatively high, the effect of removing ghosts using Formula (17) was better than using Formula (15); in other cases the two formulas gave the same effect, but the complexity of Formula (17) was higher than that of Formula (15). For simplicity, the following discussion of spatial consistency weights was based on Formula (17) unless otherwise stated (including the flow charts of the spatial consistency weights in Figure 1 and Figure 2).
Finally, the weight map was refined by morphological operators to remove the influence of noise, and the final spatial consistency weight was obtained (the superscripts S and S2 were, like the above-mentioned S1, symbols used to distinguish these weights from the other weights and had no other practical significance).
Figure 3 shows the weight map for each quality measure of the method, and
Figure 3a shows the source image sequence “Arches”.
Figure 3b shows a local contrast weight map, a texture detector that assigned higher weights to pixels with more detail.
Figure 3c shows the luminance weight map, which assigned higher weights to pixels with an average luminance value between 0.1 and 0.9; pixels in bright and dark areas used the brightness adaptation function to adjust their weights. The spatial consistency weight map in
Figure 3d detected moving objects by computing the texture differences between different input images in the source image sequence, removing the ghosting artifacts that would otherwise affect the fused images.
2.4. Weight Map Estimation
According to the previous calculations, three image features (texture change weight, brightness weight, and spatial consistency weight) were obtained. In this step, these weight terms needed to be combined to obtain the initial weight map. To allow the proposed method to extract the highest-quality regions, this paper used pixel-wise multiplication to combine the different weight maps, calculated as follows:

W_i(p) = W_i^C(p) × W_i^B(p) × W_i^S(p),

where W_i^C, W_i^B, and W_i^S represented the texture change weight, brightness weight, and spatial consistency weight of image I_i, respectively, and W_i was the initial weight map of this method.
In a static scene, the initial weight map did not need to consider the spatial consistency weight, which was only used for dynamic scenes with moving objects, to eliminate the ghosting artifacts they cause in the fused images. After generating the initial weight map, to ensure that the weights of each pixel p summed to 1 over the image sequence, we also needed to normalize the weight map by dividing each weight by the sum of the weights of all K images at that pixel, with a small positive number ε added to avoid division by zero (for more details about ε, please refer to [21,29]).
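The combination and normalization steps can be sketched as follows (hypothetical names; placing ε before the normalization follows a common convention and is an assumption here):

```python
import numpy as np

def combine_and_normalize(texture_ws, brightness_ws, spatial_ws=None,
                          eps=1e-12):
    """Multiply the per-image quality measures pixel-wise, then normalize
    so that the K weights at each pixel sum to 1."""
    ws = []
    for k in range(len(texture_ws)):
        w = texture_ws[k] * brightness_ws[k]
        if spatial_ws is not None:  # only needed for dynamic scenes
            w = w * spatial_ws[k]
        ws.append(w + eps)          # eps guards against division by zero
    total = np.sum(ws, axis=0)
    return [w / total for w in ws]
```

Leaving out `spatial_ws` reproduces the static-scene case described above.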
2.5. Weight Map Refinement
Since the initial weight map was usually affected by noise, we needed to denoise it with a filter before fusing the weight map. When we used the smoothing filter to refine the weight map, we needed to keep the sharpness of the image edge texture while removing the influence of noise, so the edge preservation smoothing filter was selected. Many existing filters could be used to optimize the weight map, such as recursive filters [
31], bilateral filters [
32], etc. Due to the low complexity of the fast-guided filter [
33], it was used in this method to refine the initial weight map, which was calculated as follows:
represented the refined weight map and represented the fast-guided filtering operation. , , and were the parameters of the filter, where was the window radius of the filter and was the regularization parameter of the filter which controlled the smoothness of the filter. was the subsampling rate and and represented the guide image and the image to be filtered, respectively. In this method, the weight map was used as both the guide image and the input image. Finally, the weight map was normalized to obtain the final weight map .
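For reference, a basic (non-subsampled) guided filter can be sketched as below; the fast variant used in the paper additionally subsamples by the rate s before computing the local statistics [33]. All names are illustrative:

```python
import numpy as np

def _box(img, r):
    """Mean filter over a (2r+1) x (2r+1) window with edge padding."""
    k = 2 * r + 1
    h, w = img.shape
    p = np.pad(img, r, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def guided_filter(guide, src, r=2, eps=1e-3):
    """Guided filter: fit a local linear model src ~ a * guide + b in each
    window, then average the coefficients."""
    mean_g = _box(guide, r)
    mean_s = _box(src, r)
    var_g = _box(guide * guide, r) - mean_g * mean_g
    cov_gs = _box(guide * src, r) - mean_g * mean_s
    a = cov_gs / (var_g + eps)  # eps is the regularization parameter
    b = mean_s - a * mean_g
    return _box(a, r) * guide + _box(b, r)
```

Using the weight map as its own guide, as the paper does, smooths the weights while keeping them aligned with their own strong edges.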
2.6. Image Fusion
If the image was fused at a single resolution (that is, the weight map was directly fused with the image), due to the different exposure of each input image, they had different absolute intensities, which led to seam problems in the fused image. To solve this problem, a pyramid-based multi-resolution method [
12] was used to fuse images in this paper. In this method, the source image was decomposed into a Laplacian pyramid, the above weight map was decomposed into a Gaussian pyramid, and then the Laplacian pyramid and the Gaussian pyramid were fused at each level, and the calculation was as follows:
meant decomposing the weight map into a Gaussian pyramid, meant decomposing the input image into a Laplacian pyramid, was the new Laplacian pyramid after fusion, and l meant the number of layers of the pyramid. Finally, we reconstructed to obtain the final image. Since the multi-resolution method was used to fuse images, it was the image features that were fused, not the image intensities. Therefore, when using this method to fuse images, there was no seam problem in the image fusion process.
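The multi-resolution fusion can be sketched with a toy two-level pyramid (block-average downsampling and nearest-neighbor upsampling stand in for the usual Gaussian kernel, so this illustrates the structure rather than the paper's exact filters; all names are illustrative):

```python
import numpy as np

def down(img):
    """Halve the resolution by 2x2 block averaging (even dimensions)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    """Double the resolution by nearest-neighbor replication."""
    return np.kron(img, np.ones((2, 2)))

def fuse_pyramids(images, weights, levels=2):
    """Weighted sum of Laplacian levels of the images with Gaussian levels
    of the weight maps, followed by a pyramid collapse."""
    fused = []
    for l in range(levels):
        if l < levels - 1:
            coarser = [down(i) for i in images]
            laps = [i - up(c) for i, c in zip(images, coarser)]
        else:
            laps = images  # the coarsest level keeps the low-pass residual
        fused.append(sum(w * lap for w, lap in zip(weights, laps)))
        if l < levels - 1:
            images = coarser
            weights = [down(w) for w in weights]
    # Collapse: start from the coarsest level, upsample and add finer ones.
    out = fused[-1]
    for lap in reversed(fused[:-1]):
        out = up(out) + lap
    return out
```

With a single input image and an all-ones weight map the pyramid collapses back to the original image exactly, which is a handy sanity check on the decomposition.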
4. Conclusions and Future Work
In this paper, a novel ghost-free multi-exposure image fusion method was proposed. In this model, we used a novel multi-scale LBP operator-based method to extract local texture features of bright and dark regions from the source images. If the captured images contained moving objects, we adopted the method based on the MDLBP operator to remove ghosting artifacts in the fused images. Through quantitative and qualitative analysis of this method, the experimental results showed that it was superior to many representative image fusion methods in both visual quality and objective analysis. This method could obtain images with good contrast, color, and brightness, and could remove ghosting artifacts in fused images. In addition, this paper also studied two spatially consistent weight distribution methods and compared and discussed the effects of these two methods on the results of dynamic image fusion.
The methods proposed in this paper followed the weighted sum-based framework for multi-exposure image fusion without ghosting, but these methods still suffer from some common limitations. When processing an image sequence containing moving objects, if moving objects appear in most positions of the image sequence, and the same moving object appears in different positions in different images, the effect of this method in eliminating ghosting artifacts may not be good enough. For example, in the image sequence “Garden”, there were many moving people in the scene. Since a certain pixel position in most of the source images was covered by different moving objects or the same moving objects, it was difficult to eliminate all the ghosting in this scenario. To solve this limitation, it is one of our future tasks to improve the multi-exposure image fusion method and develop a method that can eliminate ghosting more effectively.
In the future, we intend to further utilize the advantages of the multi-scale LBP operator to improve the ghost removal method in this paper and address the limitations mentioned above.