Adversarial Image Colorization Method Based on Semantic Optimization and Edge Preservation
Abstract
1. Introduction
- (1) The generative network is designed on the basis of the U-Net [19] architecture. A local feature-extraction subnet and a global feature-extraction subnet perform downsampling, and a coloring subnet performs upsampling. The local feature-extraction subnet is built from residual structures, dilated convolutions and an attention mechanism, and it is combined with the global feature-extraction subnet to fuse local and global features, which enhances the understanding of global semantic information. PixelShuffle [20] is introduced into the coloring subnet, and the local features of each scale obtained during downsampling are fused into the upsampling path through skip connections, so that the network attends to contextual information and less texture and color information is lost during colorization (a sketch of such an upsampling block is given after this list).
- (2) The batch normalization (BN) layers are removed from the discriminative network to reduce image artifacts and maintain image contrast.
- (3) An edge-assisted network is designed, and an edge-assisted loss is introduced and combined with the adversarial loss of the relativistic average discriminator (RaGAN) [21] to optimize the objective function, which strengthens the constraints on image color and edges and increases the robustness of the network (the RaGAN loss is also sketched after this list).
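To make contribution (1) concrete, the following is a minimal PyTorch sketch of a PixelShuffle-based [20] upsampling block with a skip connection, of the kind the coloring subnet could use; the module name `UpColorBlock` and all channel sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class UpColorBlock(nn.Module):
    """Hypothetical coloring-subnet upsampling step: concatenate the
    same-scale encoder features (skip connection), expand channels by
    r^2 with a 3x3 conv, then let PixelShuffle rearrange channels into
    an r-times-larger feature map."""
    def __init__(self, in_ch, skip_ch, out_ch, r=2):
        super().__init__()
        self.expand = nn.Conv2d(in_ch + skip_ch, out_ch * r * r, 3, padding=1)
        self.shuffle = nn.PixelShuffle(r)  # (B, C*r^2, H, W) -> (B, C, H*r, W*r)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x, skip):
        x = torch.cat([x, skip], dim=1)  # fuse encoder context into the decoder
        return self.act(self.shuffle(self.expand(x)))

# Doubling the resolution of a 28 x 28 feature map:
block = UpColorBlock(in_ch=256, skip_ch=256, out_ch=128)
y = block(torch.randn(1, 256, 28, 28), torch.randn(1, 256, 28, 28))
print(y.shape)  # torch.Size([1, 128, 56, 56])
```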
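For contribution (3), the relativistic average discriminator [21] judges whether a real image looks more realistic than the average fake, and vice versa. Below is a minimal sketch of the adversarial part of such an objective, assuming raw discriminator logits as inputs; the paper's full loss also contains edge-assisted and content terms not shown here.

```python
import torch
import torch.nn.functional as F

def ragan_d_loss(real_logits, fake_logits):
    # Real samples should score above the average fake; fakes below the average real.
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel))
            + F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel)))

def ragan_g_loss(real_logits, fake_logits):
    # Generator side: the labels are swapped relative to the discriminator loss.
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(fake_rel, torch.ones_like(fake_rel))
            + F.binary_cross_entropy_with_logits(real_rel, torch.zeros_like(real_rel)))
```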
Methods | Classification | Advantages and Limitations |
---|---|---|
Levin et al. [7], Welsh et al. [8], etc. | Local-color-expansion-based methods | No need for a huge image dataset. However, colorization is not automatic: manual assisted coloring is required and takes a lot of time, and the coloring of complex-background images is poor, with inconsistent colors, edge distortion and color crosstalk. |
Xu et al. [9], Instance [10], Broad-GAN [11], etc. | Instance-based methods | Colorization is automatic, with reduced time complexity and a clearly improved coloring effect. However, the network structure is complex and often suits only a certain type of image, and the result depends on the quality of the provided instance image, which is a major limitation. |
Deoldify [15], ChromaGAN [16], Pix2PixGAN [18], CycleGAN [19], etc. | Deep-learning-based methods | Improved network structures enable fast, automatic, end-to-end colorization of various types of grayscale images with high authenticity. However, a huge image dataset is required, and problems of color anomaly, border blooming and lack of contrast remain. |
Proposed method | Deep-learning-based method | End-to-end automatic colorization of various types of grayscale images, including old photos. The network structure and the loss function are improved to strengthen semantic optimization and edge preservation, which effectively colorizes complex-background images and alleviates abnormal colors, blurred edges and lack of saturation. For some images with especially complex backgrounds, the coloring effect still needs improvement. |
2. Proposed Method
2.1. The Generative Network
2.1.1. The Local Feature-Extraction Subnet
2.1.2. The Global Feature-Extraction Subnet
2.1.3. The Coloring Subnet
2.2. The Edge-Assisted Network
2.3. The Discriminative Network
2.4. The Design of Objective Loss Function
3. Experiments and Analysis
3.1. Experimental Environment and Settings
- Dataset. The network model is trained and tested on the Places365 [26] and ImageNet [27] datasets. Places365 [26] contains more than 1.8 million images of 365 different scenes. After filtering out a small number of grayscale and dim images, 15,000 images of different scenes are selected as the test set, and the remaining 1.68 million images form the training set. The ImageNet [27] dataset is the database of the ILSVRC challenge; the subset adopted in this paper contains more than 1.2 million images across more than 1000 categories, of which 10,000 images of different categories are selected as the test set, and the rest form the training set. All images are pre-cropped to 224 × 224.
- Experimental Environment. The experiments run on Windows 10, and the network model is built with the CUDA-accelerated deep learning framework PyTorch 1.9.0. The hardware is an Intel Core i9-9900K CPU with 64 GB of memory and an NVIDIA GeForce RTX 3090 GPU with 24 GB of video memory.
- Training Settings. In each training iteration, the discriminator is updated once and the generator is updated twice. Adam is used as the optimizer, with a weight decay of 0.005 and a learning rate of 0.0003; the batch size is 32, and training runs for 200 epochs (see the sketch after this list).
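The schedule and optimizer settings above can be summarized in a short training-loop sketch. It assumes a generator `G`, a discriminator `D`, a `loader` yielding (grayscale, color) batches, and the RaGAN losses sketched in Section 1 are already defined; it is illustrative only and omits the edge-assisted network and the additional loss terms.

```python
import torch

# Optimizers with the stated settings: Adam, lr = 0.0003, weight decay = 0.005.
opt_g = torch.optim.Adam(G.parameters(), lr=3e-4, weight_decay=0.005)
opt_d = torch.optim.Adam(D.parameters(), lr=3e-4, weight_decay=0.005)

for gray, color in loader:  # batch size 32; the loop runs for 200 epochs
    # One discriminator update per iteration.
    opt_d.zero_grad()
    d_loss = ragan_d_loss(D(color), D(G(gray).detach()))
    d_loss.backward()
    opt_d.step()

    # Two generator updates per iteration.
    for _ in range(2):
        opt_g.zero_grad()
        g_loss = ragan_g_loss(D(color), D(G(gray)))
        g_loss.backward()
        opt_g.step()
```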
3.2. Evaluation Indicators
3.3. Comparison of Results of Different Methods
3.3.1. Comparison of Coloring Effects under the Original Image
3.3.2. Comparison of the Coloring Effects of Old Black and White Photos
3.3.3. Comparison of Time Complexity
3.4. Ablation Experiments
3.4.1. Ablation Experiment 1
3.4.2. Ablation Experiment 2
3.4.3. Ablation Experiment 3
3.5. Failure Cases
3.6. Discussion
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Zhao, Y.Z.; Po, L.M.; Cheung, K.W.; Yu, W.Y.; Rehman, Y.A.U. SCGAN: Saliency Map-Guided Colorization with Generative Adversarial Network. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3062–3077.
- Wu, Y.Z.; Wang, X.T.; Li, Y.; Zhang, H.L.; Zhao, X.; Shan, Y. Towards vivid and diverse image colorization with generative color prior. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14357–14366.
- Zhuo, L.; Tan, S.; Li, B.; Huang, J.W. ISP-GAN: Inception Sub-Pixel Deconvolution-Based Lightweight GANs for Colorization. Multimed. Tools Appl. 2022, 81, 24977–24994.
- Morra, L.; Piano, L.; Lamberti, F.; Tommasi, T. Bridging the gap between natural and medical images through deep colorization. In Proceedings of the 2021 IEEE International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021; pp. 835–842.
- Liu, S.; Gao, M.L.; John, V.J.; Liu, Z.; Blasch, E. Deep Learning Thermal Image Translation for Night Vision Perception. ACM Trans. Intell. Syst. Technol. 2021, 9, 1–18.
- Wan, Z.Y.; Zhang, B.; Chen, D.D.; Zhang, P.; Chen, D.; Liao, J. Bringing old photos back to life. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2744–2754.
- Levin, A.; Lischinski, D.; Weiss, Y. Colorization Using Optimization. ACM Trans. Graph. 2004, 23, 689–694.
- Welsh, T.; Ashikhmin, M.; Mueller, K. Transferring Color to Greyscale Images. ACM Trans. Graph. 2002, 21, 277–280.
- Xu, Z.Y.; Wang, T.T.; Fang, F.M.; Sheng, Y.; Zhang, G.X. Stylization-based architecture for fast deep exemplar colorization. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9360–9369.
- Su, J.W.; Chu, H.K.; Huang, J.B. Instance-aware image colorization. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7965–7974.
- Li, H.X.; Sheng, B.; Li, P.; Ali, R.; Chen, C.L.P. Globally and Locally Semantic Colorization via Exemplar-Based Broad-GAN. IEEE Trans. Image Process. 2021, 30, 8526–8539.
- Zhang, R.; Isola, P.; Efros, A.A. Colorful image colorization. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 649–666.
- Zhang, R.; Zhu, J.Y.; Isola, P. Real-Time User-Guided Image Colorization with Learned Deep Priors. ACM Trans. Graph. 2017, 36, 1–11.
- Antic, J. DeOldify: Colorizing and Restoring Photos and Video. Available online: https://github.com/jantic/DeOldify (accessed on 16 January 2022).
- Vitoria, P.; Raad, L.; Ballester, C. ChromaGAN: Adversarial picture colorization with semantic class distribution. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 1–5 March 2020; pp. 2434–2443.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434.
- Isola, P.; Zhu, J.Y.; Zhou, T.H.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Shi, W.Z.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
- Jolicoeur-Martineau, A. The relativistic discriminator: A key element missing from standard GAN. arXiv 2018, arXiv:1807.00734v3.
- Hou, Q.B.; Zhou, D.Q.; Feng, J.S. Coordinate attention for efficient mobile network design. In Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717.
- Jiang, Y.C.; Liu, Y.Q.; Zhan, W.D.; Zhu, D.P. Lightweight Dual-Stream Residual Network for Single Image Super-Resolution. IEEE Access 2021, 9, 129890–129901.
- Dong, Z.; Kamata, S.I.; Breckon, T.P. Infrared image colorization using a S-shape network. In Proceedings of the IEEE International Conference on Image Processing, Athens, Greece, 7–10 October 2018; pp. 2242–2246.
- Wang, X.T.; Yu, K.; Wu, S.X.; Gu, J.J.; Liu, Y.H.; Dong, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision Workshops, Munich, Germany, 8–14 September 2018; pp. 63–79.
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1452–1464.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Zhu, D.P.; Zhan, W.D.; Jiang, Y.C.; Xu, X.Y.; Guo, R.Z. MIFFuse: A Multi-Level Feature Fusion Network for Infrared and Visible Images. IEEE Access 2021, 9, 130778–130792.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
Layer | Output Size | Layer | Output Size |
---|---|---|---|
s1-3 × 3-LeakyReLU | 224 × 224 | s1-3 × 3-LeakyReLU | 56 × 56 |
s2-3 × 3-LeakyReLU | 112 × 112 | s2-3 × 3-LeakyReLU | 28 × 28 |
s1-3 × 3-LeakyReLU | 112 × 112 | s1-3 × 3-LeakyReLU | 1 × 1 |
s2-3 × 3-LeakyReLU | 56 × 56 | s1-3 × 3-LeakyReLU | 1 × 1 |
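Assuming the layer table above describes the BN-free discriminative network of Section 2.3, with s1/s2 denoting stride-1/stride-2 3 × 3 convolutions, a rough PyTorch sketch follows. The channel widths and the head producing the final 1 × 1 output are assumptions, since the table does not specify them; global average pooling plus a 1 × 1 convolution is used here.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    # 3x3 convolution + LeakyReLU with no batch normalization,
    # mirroring the s1/s2 entries in the table above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Discriminator(nn.Module):
    def __init__(self, in_ch=3, base=64):  # channel widths are assumed
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_ch, base, 1),         # 224 x 224
            conv_block(base, base, 2),          # 112 x 112
            conv_block(base, base * 2, 1),      # 112 x 112
            conv_block(base * 2, base * 2, 2),  # 56 x 56
            conv_block(base * 2, base * 4, 1),  # 56 x 56
            conv_block(base * 4, base * 4, 2),  # 28 x 28
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),    # collapse to 1 x 1
            nn.Conv2d(base * 4, 1, 1),  # single real/fake logit
        )

    def forward(self, x):
        return self.head(self.features(x)).flatten(1)

logits = Discriminator()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1])
```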
Method | Places365 PSNR/dB | Places365 SSIM | Places365 LPIPS | ImageNet PSNR/dB | ImageNet SSIM | ImageNet LPIPS |
---|---|---|---|---|---|---|
Ref. [13] | 24.247 | 0.927 | 0.312 | 24.126 | 0.931 | 0.324 |
Deoldify | 27.932 | 0.943 | 0.186 | 27.856 | 0.940 | 0.180 |
ChromaGAN | 28.897 | 0.945 | 0.172 | 28.702 | 0.942 | 0.174 |
Proposed method | 30.903 | 0.956 | 0.147 | 30.545 | 0.946 | 0.150 |
Model | Ref. [13] | Deoldify | ChromaGAN | Proposed Method |
---|---|---|---|---|
Average training time/s | 0.22 | 0.34 | 0.43 | 0.25 |
Average coloring time/s | 0.14 | 0.26 | 0.33 | 0.18 |
Model | Places365 PSNR/dB | Places365 SSIM | Places365 LPIPS | ImageNet PSNR/dB | ImageNet SSIM | ImageNet LPIPS |
---|---|---|---|---|---|---|
U-Net | 23.462 | 0.919 | 0.323 | 23.032 | 0.921 | 0.344 |
U-Net + G1 | 26.484 | 0.934 | 0.204 | 26.368 | 0.930 | 0.193 |
U-Net + G3 | 25.672 | 0.923 | 0.272 | 25.804 | 0.925 | 0.266 |
G1 + G3 | 29.133 | 0.945 | 0.165 | 28.821 | 0.938 | 0.172 |
Proposed method | 30.903 | 0.956 | 0.147 | 30.545 | 0.946 | 0.150 |