3.2.2. Color Correction
Due to the limitations of the digital camera’s photosensitive device, there is a difference between the recorded color and the real color. This difference can be expressed as a mapping relation. Hirschmuller [33] estimated the downlink attenuation coefficients of different colors underwater based on the multispectral or hyperspectral data of underwater images of a target at a specific location. By inversely reconstructing an unenhanced image, the over-enhancement caused by the camera’s built-in white balance and color enhancement functions is eliminated, so the color information of the underwater image is effectively restored. The proposed system uses a three-basic-color (RGB) calibration method to calibrate and restore the color information of images. Among a variety of curve-fitting tests, cubic polynomial fitting gives the best results and the fitting formula of the photosensitive curve is as follows:
where r, g and b are the values recorded by the digital camera for red, green and blue, respectively, and R, G and B are the corresponding standard values. We carried out a color correction experiment on a ‘ColorChecker 24’ chart using the three-basic-color calibration method and obtained the fitting coefficients of the photosensitive curve. The photosensitive curve fitting parameters of the cameras are given in Table 4.
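As a concrete illustration, the Python sketch below fits one polynomial per channel that maps the recorded values (r, g, b) of the color checker patches to the standard values (R, G, B) and applies the resulting photosensitive curve to an image. It assumes an independent cubic polynomial per channel, as suggested above; the function names and the use of numpy.polyfit are illustrative, and the coefficients actually used in this work are those reported in Table 4.

```python
import numpy as np

def fit_photosensitive_curve(recorded, standard, degree=3):
    """Fit one polynomial per RGB channel mapping recorded values (r, g, b)
    to standard values (R, G, B) of a color checker.

    recorded, standard : arrays of shape (N, 3) holding the patch values
    measured by the camera and the reference values of the chart.
    Returns a list of three coefficient vectors (highest order first).
    """
    coeffs = []
    for ch in range(3):  # 0: red, 1: green, 2: blue
        coeffs.append(np.polyfit(recorded[:, ch], standard[:, ch], degree))
    return coeffs

def correct_image(image, coeffs):
    """Apply the fitted per-channel polynomials to a float RGB image."""
    corrected = np.empty_like(image, dtype=np.float64)
    for ch in range(3):
        corrected[..., ch] = np.polyval(coeffs[ch], image[..., ch])
    return np.clip(corrected, 0.0, 255.0)
```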
To demonstrate the effectiveness of our proposal,
Figure 5 gives a comparison of the original underwater images, the results using the dark channel prior method in Reference [
38] and the results using BM3D filtering and color correction in our research. It can be seen from Figure 5 that the ‘atomization’ phenomenon is effectively eliminated and the true color information of the image is well restored after BM3D filtering and color correction. With BM3D filtering and color correction, the targets in the underwater images become much clearer, which contributes to the subsequent accurate stereo matching of the image pairs. Although the method in Reference [38] can also remove the ‘atomization’ phenomenon well, it may fail to recover the true colors, as can be seen from the first three images in the second row. In the first image of the second row of Figure 5, the third and fifth blocks, which should have different colors, are rendered in the same color; the same happens to the square target and the vase in the second image, which should show different shades of red; and in the third image, the red color is not well recovered. To demonstrate this clearly, the standard color checkerboard, the color checkerboard restored by the method in Reference [38] and the one restored by BM3D filtering and color correction are further shown in detail in Figure 6, from which it can be seen that the color checkerboard restored by BM3D filtering and color correction is much closer to the real one.
3.2.3. Image Segmentation
After image denoising and color restoration, the true colors of the images are obtained. However, due to the particularity of the marine environment, the background of underwater images usually contains little texture information, which may cause many mismatched regions. Therefore, the proposed system segments and extracts the target from the background before stereo image matching. Taking advantage of the ability of super-pixel segmentation to reduce the complexity of subsequent image processing, a segmentation algorithm based on super-pixel clustering is adopted in this paper. First, the brightness and texture features are extracted from the denoised underwater image. Next, the similarities of the two features are calculated and fused with weights. Then, the pixels are clustered to generate the super-pixels, using the fused similarity as the distance measure. The calculation formula of the distance metric
D is as follows:
where dt, dc and ds are the similarity distances for the texture, color and spatial features, respectively; Ns is the maximum spatial distance within a cluster; Nc is the maximum color similarity; and a weight parameter balances the contributions of the feature distances. Obviously, the smaller the distance metric D is, the greater the similarity between the pixels.
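For illustration, the sketch below computes a fused distance of the kind described above. Only the structure of the metric is specified in the text, so the particular SLIC-style normalized combination, the weight w and the function name are assumptions, not the exact formula of the paper.

```python
import numpy as np

def fused_distance(dt, dc, ds, Nc, Ns, w=0.5):
    """Illustrative fused distance D between a pixel and a cluster center.

    dt, dc, ds : texture, color and spatial similarity distances
    Nc, Ns     : maximum color similarity and maximum spatial distance
    w          : weight parameter balancing texture against color
    One plausible normalized combination; the exact form used in the
    paper is given by its distance-metric equation.
    """
    return np.sqrt(w * dt**2 + ((1.0 - w) * dc / Nc)**2 + (ds / Ns)**2)
```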
Figure 7 shows the results of the target segmentation. Based on the extracted features, saliency is detected over the generated super-pixels and the salient super-pixels are marked by red lines. All the super-pixels are then clustered by the Max-Flow/Min-Cut algorithm. After that, the proportion of salient super-pixels in each cluster is calculated and compared with a preset threshold. Thus, the segmentation result for the foreground object is obtained.
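A minimal sketch of the final foreground decision, assuming that cluster labels from the Max-Flow/Min-Cut step and per-super-pixel saliency flags are already available (all names and the default threshold below are illustrative):

```python
import numpy as np

def select_foreground(cluster_labels, is_salient, threshold=0.5):
    """Keep the clusters whose proportion of salient super-pixels exceeds
    the preset threshold; their super-pixels form the foreground object.

    cluster_labels : (num_superpixels,) cluster index of each super-pixel
    is_salient     : (num_superpixels,) boolean saliency flag
    Returns a boolean mask over super-pixels marking the foreground.
    """
    foreground = np.zeros_like(is_salient, dtype=bool)
    for c in np.unique(cluster_labels):
        members = cluster_labels == c
        if is_salient[members].mean() > threshold:
            foreground |= members
    return foreground
```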
3.2.4. Stereo Matching
Stereo matching is an important part of the system implementation. The system uses binocular cameras to obtain the left and right views, calculates the stereo matching cost and obtains the matching disparity map. Considering both matching accuracy and time efficiency, the Semi-Global Matching (SGM) algorithm, which combines fast matching speed with high matching accuracy, is preferred. Accordingly, we propose an improved algorithm based on the semi-global matching algorithm in our system. Taking the right view as the reference, Figure 8a shows the stereo matching principle of the SGM algorithm. For a point to be matched in the right view image with horizontal ordinate x, the best match point is searched for in the left view image starting from the position minDis within the range Windows. However, the disadvantage of this algorithm is that the color characteristics of underwater images are seriously disturbed and degraded by light and water scattering, so there are many mismatches in the background areas.
The improved stereo matching algorithm based on the SGM algorithm proposed in this paper accurately extracts the target area from the background of underwater images and strictly constrains the stereo matching process within the valid target area. As shown in Figure 8b,c, the black pixels belong to the background and are invalid for stereo matching, the white pixels belong to the target area and are valid for stereo matching, and the gray pixel is the current pixel to be matched. If the pixel in the left image corresponding to the current pixel in the right image lies in the valid target area, the matching search starts directly from the position x in the left image and continues until it reaches the boundary of the search window; if the boundary of the search window lies in the invalid background area, the search ends early. If the corresponding pixel in the left image lies in the invalid background area, the matching search starts from the first valid pixel in the search window and continues until it reaches the window boundary or the invalid background.
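As a rough illustration, the following Python sketch determines the candidate positions to be examined in the left image for one right-image pixel under this validity constraint; the mask convention and the window size are assumptions that merely mirror the description above.

```python
def constrained_search_range(valid_mask_left, x, window):
    """Candidate left-image x-coordinates for a right-image pixel at
    horizontal ordinate x, restricted to the valid target area.

    valid_mask_left : boolean slice of the left validity mask on the
                      current epipolar line (True = target, False = background)
    window          : width of the search window
    """
    stop = min(x + window, len(valid_mask_left))
    candidates = []
    started = valid_mask_left[x]          # corresponding pixel already valid?
    for xl in range(x, stop):
        if valid_mask_left[xl]:
            started = True                # first valid pixel: search begins here
            candidates.append(xl)
        elif started:
            break                         # reached invalid background: stop early
    return candidates
```

The implementation of the improved algorithm can be divided into the following four steps: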
(1) Gradient information extraction. To further eliminate the effect of image noise on the calculation of the disparity map, the horizontal Sobel operator is used to extract the gradient information of the image. The Sobel operator is given as follows:
where I represents the pixel value of the image. After being processed by the Sobel operator and smoothed by the Gauss filter, the original image is mapped to generate a new image. The mapping function is given by:
where I represents the pixel value of the original image; Inew indicates the pixel value of the remapped image; and Th is the threshold of the filter.
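A small Python sketch of this preprocessing step, using OpenCV for the horizontal Sobel gradient and the Gauss filter. The exact truncation form of the mapping is not reproduced here, so the clipping to the range controlled by Th (similar to the prefilter cap used in common SGM implementations) should be read as an assumption.

```python
import cv2
import numpy as np

def remap_gradient(image_gray, th=31):
    """Horizontal Sobel gradient, Gaussian smoothing and threshold remapping.

    image_gray : 8-bit single-channel image
    th         : filter threshold Th; gradients are truncated to [-Th, Th]
                 and shifted to the non-negative range (illustrative mapping).
    """
    # 3x3 horizontal Sobel kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    grad = cv2.Sobel(image_gray, cv2.CV_32F, 1, 0, ksize=3)
    grad = cv2.GaussianBlur(grad, (3, 3), 0)                  # Gauss filter
    new = np.clip(grad, -th, th) + th                         # map into [0, 2*Th]
    return new.astype(np.uint8)
```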
(2) Matching cost calculation. In practical applications, the different visual angles of the binocular vision system always lead to an inhomogeneity between the left and right view images, which increases the mismatch rate. Mutual information has the advantage of being insensitive to illumination, so the semi-global matching algorithm is based on it. The computational efficiency and accuracy of stereo matching are improved by calculating the cost from hierarchical mutual information instead of from the traditional gray values. The definition of mutual information is as follows:
where H(I1) and H(I2) are the entropies of the left and right images, respectively, and H(I1,I2) is the joint entropy of the two images. According to the Taylor expansion formula, the entropy H(I1) and the joint entropy H(I1,I2) can be respectively expressed as:
where P(I1,I2) represents the joint probability distribution of the two images and g(i, k) is the Gauss kernel function. Therefore, the mutual information MI(I1,I2) can finally be given by:
Then, the corresponding matching cost is defined as:
where Ip is the value of point p and q is its corresponding point on the epipolar line in the left view image. If the horizontal ordinate of p is x, then the horizontal ordinate of q is x + d, where d is the disparity value.
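To make the mutual-information cost more concrete, the sketch below estimates the joint probability distribution from the current correspondences, smooths it with a Gauss kernel (Parzen estimation) and expands the entropy terms per intensity pair, following the general hierarchical MI scheme of SGM. The bin count, kernel width, normalization and function names are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mi_table(left, right, disparity, bins=256, sigma=1.0, eps=1e-7):
    """Mutual-information table mi(i, k) over intensity pairs (i, k).

    The matching cost of a right-image pixel p with candidate q in the
    left image is then taken as -mi(I_left(q), I_right(p)), as in SGM.
    left, right : 8-bit rectified gray images
    disparity   : current disparity estimate (e.g., from a coarser pyramid level)
    """
    h, w = right.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xl = xs + disparity.astype(int)
    valid = (xl >= 0) & (xl < w)
    pl = left[ys[valid], xl[valid]]   # corresponding left intensities
    pr = right[valid]                 # right intensities

    # Joint probability distribution, smoothed with a Gauss kernel g(i, k)
    # (Parzen estimation).
    P, _, _ = np.histogram2d(pl, pr, bins=bins, range=[[0, 256], [0, 256]])
    P = gaussian_filter(P / max(P.sum(), 1), sigma)

    # Entropy terms expanded per intensity pair (Taylor-expansion form).
    h12 = -gaussian_filter(np.log(P + eps), sigma)              # joint entropy term
    h1 = -gaussian_filter(np.log(P.sum(axis=1) + eps), sigma)   # left-image entropy term
    h2 = -gaussian_filter(np.log(P.sum(axis=0) + eps), sigma)   # right-image entropy term
    return h1[:, None] + h2[None, :] - h12                      # mi(i, k)
```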
(3) Cost aggregation. The matching cost based on mutual information has been obtained by the above calculation. However, such a pixel-wise matching cost is easily affected by mismatched points, noise and other factors. Therefore, a penalty function based on the neighborhood disparities is introduced to enforce a smoothness constraint. Accordingly, the energy function can be defined as:
where the first term is the data term representing the matching costs of all pixels in the image and the next two terms are penalty terms. If the disparity difference between point p and point q equals 1, the penalty term P1 is applied; if the disparity difference is greater than 1, the larger penalty term P2 (P2 > P1) is applied. Here, q is a point within the neighborhood of point p. To minimize the energy, dynamic programming is adopted and the idea of scan-line optimization is introduced. The matching cost along the direction
r could be defined as:
where the first term C(p, d) indicates the matching cost of point p at disparity value d; the second term represents the minimum matching cost of the adjacent point p − r on the path, subject to the disparity smoothness constraint; and the third term represents the minimum matching cost of the adjacent point p − r along the direction r. Therefore, the aggregated matching cost of point p is obtained by summing the path costs over all scan-line directions, which is given by Equation (18):
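The following sketch shows the standard SGM path-cost recursion and the summation over directions that Equation (18) describes, aggregating a precomputed cost volume along one scan-line direction at a time. The array layout, the example direction and the default penalty values are illustrative assumptions.

```python
import numpy as np

def aggregate_left_to_right(cost, P1=10, P2=120):
    """Aggregate a cost volume cost[y, x, d] along the horizontal
    left-to-right scan-line direction r, using the usual SGM recursion:

        L_r(p, d) = C(p, d)
                    + min(L_r(p-r, d),
                          L_r(p-r, d-1) + P1,
                          L_r(p-r, d+1) + P1,
                          min_k L_r(p-r, k) + P2)
                    - min_k L_r(p-r, k)
    """
    h, w, dmax = cost.shape
    L = np.zeros_like(cost, dtype=np.float64)
    L[:, 0, :] = cost[:, 0, :]
    for x in range(1, w):
        prev = L[:, x - 1, :]                        # L_r(p - r, .)
        prev_min = prev.min(axis=1, keepdims=True)   # min_k L_r(p - r, k)
        minus = np.full_like(prev, np.inf)
        plus = np.full_like(prev, np.inf)
        minus[:, 1:] = prev[:, :-1] + P1             # d - 1 with penalty P1
        plus[:, :-1] = prev[:, 1:] + P1              # d + 1 with penalty P1
        jump = prev_min + P2                         # larger jumps with penalty P2
        L[:, x, :] = cost[:, x, :] + np.minimum(
            np.minimum(prev, jump), np.minimum(minus, plus)) - prev_min
    return L

# The aggregated cost S(p, d) of Equation (18) is the sum of L_r over all
# scan-line directions r (typically 8 or 16 in SGM).
```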
(4) Disparity map optimization. According to the above matching cost calculation, the right view image is set as the reference and the left view image is the one to be matched. The valid area of the whole image is traversed by progressive scanning and, once every valid pixel in the right view has found the best matching point with the lowest matching cost in the left view, the basic disparity map is formed. To address mismatched or invalid disparities in weakly textured areas, the proposed method uses the super-pixel segmentation data to optimize the basic disparity map within every super-pixel area by least squares plane fitting interpolation. The plane template used in this paper is given by Equation (20):
The weighted least squares method is used to calculate a, b and c, which form the parameter set of the disparity plane template. The calculation formula of the weighted least squares method is as follows:
where N is the total number of pixels in the plane area; (xi, yi) and di are the coordinates and the disparity value of the pixel indexed by i, respectively.
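A minimal sketch of the per-super-pixel plane fitting, assuming the plane template d = a·x + b·y + c of Equation (20) and per-pixel weights (uniform by default here, since the specific weighting scheme is defined by the weighted least squares equation above); function names are illustrative.

```python
import numpy as np

def fit_disparity_plane(xs, ys, ds, weights=None):
    """Weighted least squares fit of the plane d = a*x + b*y + c over one
    super-pixel area; returns (a, b, c).

    xs, ys : pixel coordinates inside the super-pixel
    ds     : corresponding basic disparity values
    weights: optional per-pixel weights (uniform if omitted)
    """
    if weights is None:
        weights = np.ones_like(ds, dtype=np.float64)
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(np.float64)
    w = np.sqrt(weights)
    a, b, c = np.linalg.lstsq(A * w[:, None], ds * w, rcond=None)[0]
    return a, b, c

def interpolate_superpixel(disparity, mask, valid):
    """Replace the disparities inside one super-pixel (mask) with values of
    the fitted plane, using only the valid matched pixels for the fit."""
    ys, xs = np.nonzero(mask)
    good = valid[ys, xs]
    a, b, c = fit_disparity_plane(xs[good], ys[good], disparity[ys, xs][good])
    disparity[ys, xs] = a * xs + b * ys + c
    return disparity
```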
As shown in
Figure 9, compared with the basic disparity maps obtained by stereo matching, the results optimized by the least squares plane fitting interpolation method are smoother and more complete, with fewer holes. The invalid matching areas are largely eliminated. In addition, the disparity plane within the same area is effectively smoothed and the transition of disparity values is more gradual. The fitting parameters of the disparity plane templates used to optimize the three basic disparity maps are given in
Table 5.
In
Figure 10, we provide a comparison of the disparity maps produced by our method and by four state-of-the-art stereo matching methods: the AD-Census method by Mei et al., the Fast Cost-Volume Filtering (FCVF) method by Hosni et al., the Adaptive Random Walk with Restart (ARWR) method by Lee et al. and the Semi-Global Matching (SGM) method by Hirschmuller. Among the five methods, the proposed method usually provides the best disparity maps, which are smooth and continuous, with fewer black holes. The two global matching methods, that is, the AD-Census method and the ARWR method, perform better than the FCVF method and the SGM method. By constraining the matching within the valid target area and further optimizing the basic disparity map using the least squares plane fitting interpolation, the proposed method achieves a remarkable improvement over the SGM method.