4.3. Qualitative Evaluation
We apply the FE-GAN model to real and artificially synthesized underwater image datasets, process paired and unpaired distorted images, and compare them with the corresponding ground truth images.
Figure 6 shows some examples produced by the FE-GAN model on the EUVP dataset. It can be seen that our model effectively improves the contrast of the distorted images and makes the overall color more vivid. Moreover, it also improves the sharpness, so the images appear clearer. Here, we selected several state-of-the-art models for comparison. Among them, pix2pix [23], ResGAN [39], UGAN [40], and FUnIE-GAN [26] are learning-based models, while Mband-En [41] and Uw-HL [42] are physics-based models. As can be seen from Figure 7, the images generated by the physics-based models (Mband-En and Uw-HL) suffer from over-saturated brightness and over-exposure, and the whole image presents red and yellow hues. The background of the images generated by ResGAN is too dark and lacks contrast; UGAN and FUnIE-GAN do not fully remove the blue-green hues of the distorted images, so the background is still light green.
For the ImageNet dataset, we randomly selected 628 pairs of real underwater images for testing.
Figure 8 shows a part of the test results. After processing by our FE-GAN model, the generated images effectively correct the overall blue-green hue, and the brightness of the entire image is improved. The enhanced images effectively restore the red light, which is absorbed fastest underwater, better separate the background from the foreground, and appear clearer overall. We also selected several state-of-the-art learning-based models, UWCNN [43], CycleGAN [24], UGAN [40], and IPMGAN [17], and compared them with our experimental results. As can be seen from Figure 9, the images generated by UWCNN are dark overall, yet some areas are overly bright, failing to highlight the pivotal regions of the image. CycleGAN and UGAN correct green hues poorly, and obvious traces of the original image can still be seen in the background. The sharpening effect of the images generated by IPMGAN is unsatisfactory, and the overall image is pale yellow and not clear enough.
To further verify the generalization ability of FE-GAN, we selected 990 images from the artificially synthesized dataset for testing and compared them with the corresponding ground truth images.
Figure 10 shows some of the generated images. The overall hue of the images generated by FE-GAN is close to the ground truth. At the same time, the restoration of some overexposed areas in the distorted images is also good, and more texture information is visible in the restored images. Considering the complex layout of the images in the dataset, the overall quality of the enhanced images can meet the needs of practical applications. However, it must be mentioned that, on both real and artificial datasets, our model does not perform well on images with insufficient light: it cannot restore the brightness and details very well, and the enhanced images remain relatively dark.
4.4. Quantitative Evaluation
To quantitatively analyze the enhancement effect of the FE-GAN model on paired underwater images, we choose PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) as reference indicators. PSNR is an index used in the image field to measure the quality of reconstructed images, defined by taking the logarithm of the MSE (mean squared error). Given a generated image $I$ of size $m \times n$ and its corresponding ground truth image $K$, their MSE is expressed as

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^{2}$$

Since we resized the images before the experiment, the values of $m$ and $n$ are both 256. The definition of PSNR is as follows:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}_{I}^{2}}{\mathrm{MSE}}\right)$$

where $\mathrm{MAX}_{I}$ represents the maximum possible pixel value of the image; since each pixel is represented by 8 bits, $\mathrm{MAX}_{I} = 2^{8} - 1 = 255$.
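For concreteness, a minimal NumPy sketch of the MSE and PSNR computations above is given below; the 256 × 256 size and the 8-bit pixel range follow the setup described in this section, and the random arrays merely stand in for a generated/ground-truth image pair.

```python
import numpy as np

def psnr(generated: np.ndarray, ground_truth: np.ndarray) -> float:
    """PSNR between a generated image I and its ground truth K (8-bit pixels)."""
    # MSE averaged over all pixels (and channels, for RGB images)
    mse = np.mean((generated.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    max_i = 255.0  # MAX_I = 2^8 - 1 for 8-bit pixels
    return 10.0 * np.log10(max_i ** 2 / mse)

# Illustrative 256x256 RGB pair (random data in place of real images)
I = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
K = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(f"PSNR: {psnr(I, K):.2f} dB")
```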
SSIM is a metric used to measure the similarity of images, and it can also be used to judge the quality of images after compression. It is mainly composed of three comparison functions: luminance, contrast, and structure. These are expressed as follows:

$$l(x,y) = \frac{2\mu_{x}\mu_{y} + C_{1}}{\mu_{x}^{2} + \mu_{y}^{2} + C_{1}}, \quad c(x,y) = \frac{2\sigma_{x}\sigma_{y} + C_{2}}{\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2}}, \quad s(x,y) = \frac{\sigma_{xy} + C_{3}}{\sigma_{x}\sigma_{y} + C_{3}}$$

where $\mu_{x}$, $\mu_{y}$ represent the mean values of $x$ and $y$; $\sigma_{x}^{2}$, $\sigma_{y}^{2}$ represent the variances of $x$ and $y$; $\sigma_{xy}$ represents the covariance of $x$ and $y$; $C_{1} = (K_{1}L)^{2}$ and $C_{2} = (K_{2}L)^{2}$ are two constants, and $L$ represents the range of pixel values, [0, 255]. The definition of SSIM is as follows:

$$\mathrm{SSIM}(x,y) = \frac{\left(2\mu_{x}\mu_{y} + C_{1}\right)\left(2\sigma_{xy} + C_{2}\right)}{\left(\mu_{x}^{2} + \mu_{y}^{2} + C_{1}\right)\left(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2}\right)}$$
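As a sketch, the SSIM score defined above can be computed with scikit-image's structural_similarity, which implements the same luminance/contrast/structure comparison; note that the windowing defaults are scikit-image's own and are not necessarily those used in our experiments.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Two 8-bit RGB images to compare (random data stands in for an enhanced/ground-truth pair)
x = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
y = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# data_range=255 corresponds to the pixel range L above;
# channel_axis=-1 (scikit-image >= 0.19) averages SSIM over the color channels
score = structural_similarity(x, y, data_range=255, channel_axis=-1)
print(f"SSIM: {score:.4f}")
```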
UIQM is a no-reference underwater image quality evaluation metric inspired by the human visual system, designed for the degradation mechanisms and imaging characteristics of underwater images. It uses UICM (colorfulness measure), UISM (sharpness measure), and UIConM (contrast measure) as the evaluation basis and is expressed as a linear combination of these three indexes; the larger the value, the better the color balance, clarity, and contrast of the image. UIQM is expressed as follows:

$$\mathrm{UIQM} = c_{1} \times \mathrm{UICM} + c_{2} \times \mathrm{UISM} + c_{3} \times \mathrm{UIConM}$$

where $c_{1} = 0.0282$, $c_{2} = 0.2953$, and $c_{3} = 3.5753$ [44].
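The linear combination itself is straightforward once the three component measures are available; in the sketch below, the UICM, UISM, and UIConM values are assumed to be precomputed following [44], and the example inputs are made-up numbers for illustration only.

```python
def uiqm(uicm: float, uism: float, uiconm: float) -> float:
    """Linear combination of the three component measures, with weights from [44]."""
    c1, c2, c3 = 0.0282, 0.2953, 3.5753
    return c1 * uicm + c2 * uism + c3 * uiconm

# Made-up component values, for illustration only
print(f"UIQM: {uiqm(uicm=5.1, uism=6.8, uiconm=0.79):.4f}")
```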
Table 1 shows the quantitative comparison between our model and state-of-the-art models on the EUVP dataset. We chose 1k paired images and tested PSNR, SSIM, and UIQM, respectively. As can be seen from the table, our model achieved better results on the PSNR and UIQM indicators, and its SSIM ranks second among the learning-based methods, after FUnIE-GAN. In general, when compared against the ground truth images, the images generated by our model are better.
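The averaging over a paired test set can be sketched as follows, reusing the psnr helper above and scikit-image's SSIM; the directory names are placeholders, and pairs are assumed to be matched by filename.

```python
import pathlib

import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

# Placeholder directories holding generated images and their ground-truth counterparts
gen_dir, gt_dir = pathlib.Path("results/euvp"), pathlib.Path("data/euvp_gt")

psnr_scores, ssim_scores = [], []
for gen_path in sorted(gen_dir.glob("*.jpg")):
    gen = np.asarray(Image.open(gen_path).convert("RGB").resize((256, 256)))
    gt = np.asarray(Image.open(gt_dir / gen_path.name).convert("RGB").resize((256, 256)))
    psnr_scores.append(psnr(gen, gt))  # psnr() as sketched earlier
    ssim_scores.append(structural_similarity(gen, gt, data_range=255, channel_axis=-1))

print(f"mean PSNR: {np.mean(psnr_scores):.2f} dB, mean SSIM: {np.mean(ssim_scores):.4f}")
```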
For the ImageNet dataset, we randomly selected 5500 pairs of images for training and used the remaining 628 pairs for testing. Table 2 shows the test results of our model together with UWCNN [43], CycleGAN [24], UGAN [40], FUnIE-GAN [26], and IPMGAN [17] on this test set. Again, the PSNR of the images generated by our model is the highest.
For the Mixed dataset, we selected Test-R90 (90 paired images) and Test-C60 (60 unpaired images) as the test sets of paired and unpaired images, respectively, and compared them with the same methods as in the qualitative evaluation. Both test sets come from the UIEBD dataset, which is more challenging: its images were taken in poor light, and the dataset is small, which makes training more difficult. Here, we selected the UCycleGAN [46], WaterNet [38], UWCNN [43], Unet-U [47], Ucolor [48], and FUnIE-GAN [26] models for comparison.
Table 3 shows the PSNR test results of Test-R90. Our model achieved the best value.
Figure 11a shows some of the test results. The images generated by our model improve clarity and brightness while retaining the overall layout, effectively correcting the yellow-green hue problem.
Table 4 shows the test results of Test-C60. Since there is no reference image in this dataset, we only chose UIQM as a comparison indicator. Similarly, our model achieved better results.
Figure 11b shows some of the images generated for Test-C60. The overall brightness of the images is significantly improved, which will be helpful for underwater image recognition and resource exploration.
The application of underwater image enhancement technology to underwater detection equipment is an important research direction. Several aspects should be taken into consideration when deploying on resource-limited devices, such as FLOPs, the number of parameters, and inference time. We chose fps (frames per second) as the metric for inference time, which is expressed as

$$\mathrm{fps} = \frac{N}{T}$$

where $N$ is the number of processed frames and $T$ is the total processing time.
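One simple way to estimate fps on a target device is to time a batch of forward passes and divide the frame count by the elapsed time; the sketch below assumes a PyTorch model, uses a warm-up phase to exclude one-time setup costs, and feeds 256 × 256 inputs as in our experiments.

```python
import time

import torch

def measure_fps(model: torch.nn.Module, num_frames: int = 200) -> float:
    """Estimate inference fps as number of processed frames / total elapsed time."""
    model.eval()
    x = torch.randn(1, 3, 256, 256)  # one 256x256 RGB input per forward pass
    with torch.no_grad():
        for _ in range(10):  # warm-up passes, excluded from timing
            model(x)
        start = time.perf_counter()
        for _ in range(num_frames):
            model(x)
        elapsed = time.perf_counter() - start
    return num_frames / elapsed
```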
Table 5 shows the comparison among FE-GAN, UGAN, and the more lightweight FUnIE-GAN. The number of parameters of the FE-GAN model is much smaller than that of UGAN; however, since the FE-GAN encoder is deeper than the FUnIE-GAN network, its parameter count is slightly larger than FUnIE-GAN's. Furthermore, because the structural re-parameterization module speeds up network inference and reduces memory consumption, our model has the lowest FLOPs, which means better real-time performance. The fps of both FE-GAN and FUnIE-GAN exceeds 200. In summary, the FE-GAN model can meet the current real-time requirements of underwater detection.
For AUVs and ROVs engaged in underwater exploration, the purpose of improving image quality is to improve the accuracy of tasks such as object detection and classification. We chose the pre-trained YOLOv5 as the object detection model and tested images before and after enhancement on the EUVP dataset. Part of the test results is shown in Figure 12. The first row contains the unprocessed distorted images, and the second row the FE-GAN-processed images. Compared with the original distorted images, the processed images have a more natural tone and increased brightness, so the targets are clearer and easier to identify. The results in the second, fifth, and last columns show that fuzzy targets can be detected in the processed images, and the other examples show that recognition errors are alleviated after processing.
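As an illustration of this evaluation step, a pre-trained YOLOv5 detector can be loaded from the official ultralytics hub and run on images before and after enhancement; the file paths below are placeholders, and the enhanced image is assumed to have been produced by FE-GAN beforehand.

```python
import torch

# Load a pre-trained YOLOv5s model from the official hub (downloads weights on first use)
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Placeholder paths: a raw distorted frame and its FE-GAN-enhanced counterpart
for path in ["distorted.jpg", "enhanced.jpg"]:
    results = model(path)  # run detection on the image file
    results.print()        # summary of detected classes and confidences
```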
In addition, we downloaded the Aquarium Combined dataset and trained and tested on it in the same hardware environment as the FE-GAN enhancement experiments. The object detection test was performed before and after FE-GAN processing, again with YOLOv5 as the detector. Figure 13 shows part of the test results. The overall blue tone of the original images is obvious; the processed images effectively correct this problem, making them closer to the true scene.
Figure 14, Figure 15, and Figure 16 show the precision curves and confusion matrices obtained by training YOLOv5 on this dataset.
In the Aquarium Combined dataset, there are seven types of targets to be detected: fish, jellyfish, penguin, puffin, shark, starfish, and stingray. Here we used mAP (mean average precision) as a reference metric.
Table 6 reports the mAP of FUnIE-GAN, UGAN, Pix2Pix, and FE-GAN for each of the above classes, together with the average mAP over all classes. FE-GAN obtained the highest mAP in five of the seven categories, and its average mAP over all classes reached 0.672, the highest value among the compared models.
In the case of insufficient natural light, images captured under an artificial light source are themselves severely distorted. Although FE-GAN partially restores the brightness and details of such images, there is still a large gap from the appearance of images under natural light, which will be the focus of future research.