The flowchart of the proposed method is shown in
Figure 1, and it mainly consists of the following steps: First, the Retinex model is applied to the input image to separate it into an illumination component and a reflection component. Next, the illumination component is processed based on the sum of the mean and standard deviation of the current image pixels, the highlight positions of the current image are separated, and the corresponding mask is generated. Then, based on this mask, the specular reflection in this area is separated. The advantage of this approach is that only the highlight region is processed separately, avoiding the color distortion caused by global processing. Given that noticeable processing artifacts remain between the highlight and non-highlight regions after highlight removal, this paper introduces a compensation function to reduce the post-processing discontinuity. Finally, because significant detail is lost at the original highlight positions after highlight removal, this paper enhances the image with an adaptive Laplacian operator combined with gradient fusion, producing the final highlight-removed output.
3.1. Highlight Removal
The accurate detection of highlight positions is crucial for highlight removal and directly affects the image restoration results and the efficiency of highlight removal algorithms. This paper is inspired by the Retinex model [
28], which suggests that a digital image can be represented as the pixel-wise product of illumination and reflection components. In this context, the illumination component reflects ambient light information, while the reflection component reflects the detail information of the image. The relationship between them can be expressed as:
$$I_c(x, y) = R_c(x, y) \cdot L_c(x, y) \quad (1)$$
where $I_c(x, y)$ represents the pixel at position $(x, y)$ in the image, $c$ represents the color channel, and $R_c(x, y)$ and $L_c(x, y)$ represent the reflection and illumination component information at the corresponding pixel position. Based on this theory, this paper separates the highlight image into reflection and illumination components and performs highlight localization based on the illumination component, as shown below.
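As an illustration of this decomposition, the illumination component can be estimated with a single-scale Gaussian surround, a common Retinex approximation; the function below is a minimal sketch, and the blur scale `sigma` is an illustrative choice rather than a value specified in this paper.

```python
import cv2
import numpy as np

def retinex_decompose(img, sigma=30.0):
    """Split an image into reflection and illumination components (I = R * L).

    A Gaussian-blurred copy of the image serves as the illumination estimate L;
    dividing it out leaves the reflection (detail) component R.
    """
    img = img.astype(np.float32) + 1e-6                  # avoid division by zero
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)  # smooth ambient-light estimate
    reflection = img / illumination                      # pixel-wise, per channel
    return reflection, illumination
```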
$$T = \mu + \sigma, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} L_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(L_i - \mu\right)^2} \quad (2)$$
$$M(x, y) = \begin{cases} 1, & L(x, y) \geq T \\ 0, & \text{otherwise} \end{cases} \quad (3)$$
In the above equations, $M(x, y)$ represents the extracted highlight mask, and $T$ serves as the threshold for highlights. Here, $N = W \times H$ represents the total number of pixels, where $W$ is the width and $H$ is the height of the image, and $L_i$ represents the $i$-th pixel value in the illumination component. Finally, the mean and standard deviation are added together to obtain the threshold that yields the mask for the highlight region.
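A minimal sketch of Formulas (2) and (3), assuming the illumination component has been reduced to a single-channel array:

```python
import cv2
import numpy as np

def highlight_mask(illumination):
    """Threshold the illumination component at mean + standard deviation
    (Formula (2)) and return the binary highlight mask of Formula (3)."""
    gray = illumination if illumination.ndim == 2 else cv2.cvtColor(
        illumination.astype(np.float32), cv2.COLOR_BGR2GRAY)
    threshold = float(gray.mean() + gray.std())   # T = mu + sigma over all N pixels
    mask = (gray >= threshold).astype(np.uint8)   # M(x, y) in {0, 1}
    return mask, threshold
```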
Subsequently, in the mask region extracted by the approach in this paper, specular reflection is separated using the dichromatic reflection model. However, compared with traditional methods, this paper processes only the region identified by highlight localization rather than the entire image, which mitigates the color bias issues encountered in previous methods. The specific improvements are as follows:
$$I(x) = D(x) + S(x) = w_d(x)\,\Lambda(x) + w_s(x)\,\Gamma \quad (4)$$
The above formula represents the dichromatic reflection model, where $x = (x, y)$ denotes the image’s horizontal and vertical coordinates and $I(x)$ represents the intensity of the three-channel pixels in a color image. In the formula, $D(x)$ and $S(x)$ indicate the diffuse reflection and specular reflection components, respectively, $\Lambda(x)$ represents the diffuse reflection chromaticity, $\Gamma$ represents the specular reflection chromaticity, and $w_d(x)$ and $w_s(x)$ are the coefficients for these two components.
Based on the inference from reference [12], according to Formula (4), the following formula can be derived:
$$I^{sf}(x) = I(x) - I_{\min}(x) + \bar{I}_{\min} \quad (5)$$
where $I^{sf}(x)$ is the approximate illumination-free image separated in the RGB space, $I_{\min}(x) = \min_{c \in \{R, G, B\}} I_c(x)$ is the minimum pixel value in the RGB space, and $\bar{I}_{\min}$ is the mean of the minimum pixel values in the RGB space.
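Formula (5) reduces to a few array operations; a sketch assuming a floating-point RGB image of shape (H, W, 3):

```python
import numpy as np

def specular_free(img):
    """Approximate illumination-free image of Formula (5):
    I_sf = I - I_min + mean(I_min), with I_min taken per pixel over RGB."""
    i_min = img.min(axis=2, keepdims=True)   # minimum channel value per pixel
    return img - i_min + i_min.mean()        # shift by the global mean of minima
```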
However, metal surfaces are mostly smooth and lack a proper diffuse reflectance chromaticity [15], so the highlight removal process in this paper is as follows. First, based on the highlight mask separated in this paper, the following operation is performed:
$$V(x)\big|_{M} = \Phi\big(I(x)\big)\big|_{M} \quad (6)$$
where $\Phi(\cdot)$ represents the conversion of the image to the HSV space, $I$ is the original RGB image, $I_{\min}$ is the minimum value among the RGB pixels, $M$ is the highlight mask separated based on Formula (3), and $\big|_{M}$ indicates that the operation is only applied to the masked area. $V$ represents the V channel in the HSV color space of the original image, and finally, the minimum pixel is separated:
$$V'(x)\big|_{M} = I_{\min}(x)\big|_{M} \quad (7)$$
Finally, the method for highlight removal in this paper is as follows:
$$I_{out}(x) = \begin{cases} \Phi^{-1}\big(H(x),\, S(x),\, V'(x)\big), & M(x) = 1 \ \text{and} \ V(x) > T \\ I(x), & \text{otherwise} \end{cases} \quad (8)$$
where $\Phi^{-1}(\cdot)$ represents the conversion from the HSV space back to the RGB space, and adding the threshold information $T$ calculated in this paper makes a more accurate judgment on the highlight removal.
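The pipeline of Formulas (6)–(8) can be sketched as follows, assuming an 8-bit BGR input; within the thresholded mask, the V channel is replaced by the per-pixel RGB minimum before converting back:

```python
import cv2
import numpy as np

def remove_highlight(img, mask, threshold):
    """Inside the highlight mask, replace the V channel with the per-pixel RGB
    minimum (Formulas (6)-(7)), then convert back to BGR (Formula (8))."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    v_min = img.min(axis=2)                          # I_min over the color channels
    region = (mask > 0) & (hsv[..., 2] > threshold)  # masked pixels above T
    hsv[..., 2][region] = v_min[region]              # V' = I_min inside the region
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```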
However, after processing with our method, a visible discontinuity was observed at the edge of the mask. Therefore, we introduced a compensation factor applied to the mask, reducing the processing traces at the restored locations. The formula is as follows:
$$M'(x, y) = \sum_{i=-2}^{2}\sum_{j=-2}^{2} w(i, j)\, M(x + i,\, y + j) \quad (9)$$
In the above formula, $x$ and $y$ represent the coordinates of the corresponding pixel, and $i$ and $j$ index the positions within the convolution kernel $w$. In this paper, a 5 × 5 convolution kernel is used to achieve the best restoration effect. The repair effect of highlight removal with the introduction of this compensation coefficient is shown in Figure 2.
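One way to realize the compensation of Formula (9) is to soften the binary mask with a normalized 5 × 5 kernel and use the result as a per-pixel blending weight; the blending step below is a sketch of how the compensated mask might be applied, not a verbatim part of the paper’s formulation:

```python
import cv2
import numpy as np

def feather_and_blend(original, restored, mask, ksize=5):
    """Convolve the binary mask with a normalized 5x5 box kernel (Formula (9))
    and use the softened mask to blend the restored highlight region into the
    original image, hiding the seam at the mask boundary."""
    soft = cv2.blur(mask.astype(np.float32), (ksize, ksize))  # normalized box convolution
    soft = soft[..., None]                                    # broadcast over channels
    blended = soft * restored.astype(np.float32) \
            + (1.0 - soft) * original.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)
```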
Because the compensation mainly acts at the edges of the highlight regions, certain images may not exhibit significant improvements. For comparison, columns (c) and (d) in Figure 2 were selected to compare the performance metrics, and the specific results are shown in Figure 3. The horizontal axis represents the input images A–E in Figure 2a, and the vertical axis represents the PSNR (peak signal-to-noise ratio) values of the images. After comparison, it was found that the PSNR values of the images improved significantly after the addition of the compensation function.
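The PSNR values reported in Figure 3 follow the standard definition; for reference, a minimal implementation:

```python
import numpy as np

def psnr(reference, result, peak=255.0):
    """Peak signal-to-noise ratio between two same-sized images, in dB."""
    mse = np.mean((reference.astype(np.float64) - result.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```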
3.2. Detail Restoration
After the highlight regions are repaired by the method in this paper, there is often a severe loss of detail in these areas, which has a significant impact on weld seam detection. Therefore, the paper introduces global detail and minor detail restoration. First, in the HSV color space, the V channel of the repaired image is decomposed into a structural layer $S_l$ and a texture layer $T_l$ using guided filtering [29] ($V = S_l + T_l$). Subsequently, based on the texture layer $T_l$, weight parameters are applied to the structural layer $S_l$ for global detail enhancement to correct global detail issues. The paper improves the Laplacian global image enhancement operator, refining image sharpness to enhance the overall detail information and achieve global detail enhancement. The formula is as follows:
$$S'_l(x, y) = S_l(x, y) - \alpha\,\nabla^2 S_l(x, y) + \beta\,\big|\nabla S_l(x, y)\big| \quad (10)$$
In the above formula, $S_l(x, y)$ represents the image’s structural layer at the corresponding pixel. $\alpha$ and $\beta$ are two adjustable parameters with default values of 0.1 and 0.75, where the parameter $\alpha$ can better control sharpness, and $\beta$ can better restore the brightness of edges. In the final Laplacian operator, this paper introduces a weight $\omega$ to control visual sharpness (this parameter requires $\omega \geq 1$), and the higher the value of $\omega$, the clearer the result. The final formula is as follows:
$$S''_l(x, y) = S_l(x, y) - \omega\,\nabla^2 S'_l(x, y) \quad (11)$$
To adapt to images in different scenes, this paper proposes an adaptive weight $\omega$ to meet the sharpening requirements of various scene images. The formula is as follows, where $T_l(x, y)$ represents the image’s texture layer at the corresponding pixel:
$$\omega = 1 + \frac{1}{N}\sum_{x, y}\big|T_l(x, y)\big| \quad (12)$$
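A sketch of the decomposition and the enhancement of Formulas (10)–(12). It assumes the guided filter from opencv-contrib (cv2.ximgproc); the filter radius and eps, and the L1 approximation of the gradient magnitude, are illustrative choices rather than values fixed by the paper:

```python
import cv2
import numpy as np

def enhance_structure(v_channel, alpha=0.1, beta=0.75):
    """Decompose the V channel into structure + texture with a guided filter,
    sharpen the structural layer (Formulas (10)-(11)) using the adaptive
    weight of Formula (12), then recombine with the texture layer."""
    v = v_channel.astype(np.float32) / 255.0
    structure = cv2.ximgproc.guidedFilter(v, v, 8, 1e-3)   # structural layer S_l
    texture = v - structure                                # texture layer T_l

    lap = cv2.Laplacian(structure, cv2.CV_32F)
    grad = np.abs(cv2.Sobel(structure, cv2.CV_32F, 1, 0)) + \
           np.abs(cv2.Sobel(structure, cv2.CV_32F, 0, 1))  # |grad S_l| (L1 approx.)
    sharpened = structure - alpha * lap + beta * grad      # Formula (10)

    omega = 1.0 + float(np.mean(np.abs(texture)))          # adaptive weight, Formula (12)
    enhanced = structure - omega * cv2.Laplacian(sharpened, cv2.CV_32F)  # Formula (11)
    return np.clip(enhanced + texture, 0.0, 1.0)
```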
After applying global optimization, there were still missing details in the highlight removal area. Therefore, this paper uses the method of maximum gradient fusion [30] to restore the tiny details of the image. In this paper, the horizontal gradient $G_h$ is obtained through convolution, and its formula is $G_h = F * S_h$, where $*$ represents convolution, $F$ represents the target image, and $S_h$ represents the Sobel operator. Similarly, using the transpose of $S_h$, the paper calculates the vertical gradient $G_v$ with the specific formula $G_v = F * S_v$, where $S_v = S_h^{\mathsf{T}}$. After obtaining the magnitudes of the horizontal and vertical gradients, the image gradient can be defined as:
$$G(x, y) = \sqrt{G_h(x, y)^2 + G_v(x, y)^2} \quad (13)$$
where $G_h(x, y)$ and $G_v(x, y)$ are the pixels at the corresponding positions in the horizontal and vertical gradient maps, respectively. Through the above equations, the gradient information of the original image and the image detected and repaired in the previous section can be obtained separately. The visibility of the image details is closely related to the magnitude of the gradients, so this paper extracts the maximum gradient value $G_{\max}$ from each image. Since structural similarity is used to evaluate the perceptual similarity between two images, this paper calculates the similarity between the maximum gradient map $G_{\max}$ and the gradient map $G_e$ of the globally enhanced image, which can be expressed by the following formula:
$$GS(x, y) = \frac{\big(2\,\mu_{G_{\max}}\,\mu_{G_e} + C_1\big)\big(2\,\sigma_{G_{\max} G_e} + C_2\big)}{\big(\mu_{G_{\max}}^2 + \mu_{G_e}^2 + C_1\big)\big(\sigma_{G_{\max}}^2 + \sigma_{G_e}^2 + C_2\big)} \quad (14)$$
In the above equation, $\mu_{G_{\max}}$, $\mu_{G_e}$, $\sigma_{G_{\max}}^2$, $\sigma_{G_e}^2$, and $\sigma_{G_{\max} G_e}$ represent the local means, local variances, and covariance of $G_{\max}$ and $G_e$, respectively. $C_1$ and $C_2$ are stabilizing constants. In this way, this paper obtains the gradient quality $Q$ by averaging the quality map, where $H$ and $W$ are the height and width of the image:
$$Q = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W} GS(x, y) \quad (15)$$
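A sketch of Formulas (13)–(15): Sobel gradients, the element-wise maximum across the two gradient maps, and the SSIM-style similarity averaged into the scalar quality $Q$. The Gaussian window and the constants $C_1$ and $C_2$ are conventional SSIM choices assumed here, not values given by the paper:

```python
import cv2
import numpy as np

def gradient_map(img):
    """Gradient magnitude of Formula (13) from horizontal/vertical Sobel kernels."""
    gh = cv2.Sobel(img.astype(np.float32), cv2.CV_32F, 1, 0, ksize=3)
    gv = cv2.Sobel(img.astype(np.float32), cv2.CV_32F, 0, 1, ksize=3)  # transposed kernel
    return np.sqrt(gh ** 2 + gv ** 2)

def gradient_quality(g_max, g_enh, c1=0.01, c2=0.03):
    """SSIM-style similarity GS between the maximum gradient map and the
    enhanced image's gradient map (Formula (14)), averaged into Q (Formula (15))."""
    blur = lambda a: cv2.GaussianBlur(a, (11, 11), 1.5)   # local (windowed) statistics
    mu1, mu2 = blur(g_max), blur(g_enh)
    var1 = blur(g_max ** 2) - mu1 ** 2
    var2 = blur(g_enh ** 2) - mu2 ** 2
    cov = blur(g_max * g_enh) - mu1 * mu2
    gs = ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / \
         ((mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))
    return float(gs.mean())                               # Q averages the quality map

# usage: g_max = np.maximum(gradient_map(original), gradient_map(repaired))
#        q = gradient_quality(g_max, gradient_map(enhanced))
```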
After obtaining the gradient quality, the texture layer is blended with the globally enhanced image using pyramid blending to obtain the final experimental results, as shown in
Figure 4.
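Pyramid blending here is read in the usual Laplacian-pyramid sense; the sketch below blends two images with a scalar weight, and using the gradient quality $Q$ as that weight is an assumption about how the pieces fit together:

```python
import cv2
import numpy as np

def pyramid_blend(img_a, img_b, weight, levels=4):
    """Blend two same-sized images with a scalar weight via Laplacian pyramids,
    which avoids the seams that direct per-pixel mixing can leave."""
    def laplacian_pyramid(img):
        gp = [img.astype(np.float32)]
        for _ in range(levels):
            gp.append(cv2.pyrDown(gp[-1]))                 # Gaussian pyramid
        lp = [gp[-1]]                                      # coarsest level first
        for i in range(levels, 0, -1):
            up = cv2.pyrUp(gp[i], dstsize=(gp[i - 1].shape[1], gp[i - 1].shape[0]))
            lp.append(gp[i - 1] - up)                      # band-pass residuals
        return lp                                          # ordered coarse -> fine

    blended = [weight * a + (1.0 - weight) * b
               for a, b in zip(laplacian_pyramid(img_a), laplacian_pyramid(img_b))]
    out = blended[0]
    for layer in blended[1:]:
        out = cv2.pyrUp(out, dstsize=(layer.shape[1], layer.shape[0])) + layer
    return np.clip(out, 0, 255).astype(np.uint8)
```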