Internal Learning for Image Super-Resolution by Adaptive Feature Transform
Abstract
1. Introduction
- We propose a novel framework to exploit the strengths of the external prior and internal prior in the image super-resolution task. In contrast to the full training and fine-tuning methods, the proposed method modulates the intermediate output according to the testing low-resolution image via its internal examples to produce more accurate SR images.
- We perform adaptive feature transformation to simulate various image feature distributions extracted from the testing low-resolution image. We carefully investigate the properties of adaptive feature transformation layers, providing detailed guidance on the usage of the proposed method. Furthermore, the framework of our network is flexible and able to be integrated into CNN-based models.
- The extensive experimental results demonstrate that the proposed method is effective for improving the performance of lightweight deep network SR. This is promising for providing new ideas for the community to introduce internal priors to the deep network for SR methods.
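As a rough illustration of the feature-wise modulation the contributions describe, the sketch below shows a FiLM/AdaIN-style per-channel affine transform layer in PyTorch. The class name `AFTLayer` and its exact design are our own assumptions; the paper's actual layer may differ.

```python
import torch
import torch.nn as nn

class AFTLayer(nn.Module):
    """Hypothetical sketch of an adaptive feature-wise transform (AFT) layer.

    It applies a learned per-channel affine transform (scale gamma, shift
    beta) to an intermediate feature map. During internal learning, only
    gamma and beta would be updated while the backbone stays frozen.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        # Initialized to the identity transform, so the externally trained
        # network's behavior is unchanged before internal learning starts.
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * x + self.beta
```

In this setting, internal learning would freeze the backbone and pass only `layer.gamma` and `layer.beta` to the optimizer, which keeps the number of image-specific parameters very small.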
2. Related Work
2.1. Internal Learning for Image Super-Resolution
2.2. Feature-Wise Transformation
3. Proposed Method
3.1. External Learning
Algorithm 1 External training.
3.2. Internal Learning via AFT Layers
3.2.1. Adaptive Feature-Wise Transform Layer
3.2.2. Internal Learning
Algorithm 2 Internal learning.
3.3. Image-Adaptive Super-Resolution
4. Experiments and Results
4.1. Experimental Set-Up
- External training. For external training, we use images from DIV2K [37]. LR image patches serve as the input, and the corresponding HR patches, larger by the upscaling factor r, serve as the ground truth. Training data augmentation is performed with random up-down and left-right flips and clockwise rotations.
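The flip/rotation augmentation above can be sketched as follows; `augment_patch` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def augment_patch(patch: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random flip/rotation augmentation as described for external training.

    `patch` is an H x W (x C) array; combining the random rotation with the
    optional flips makes each of the 8 dihedral transforms reachable.
    """
    k = int(rng.integers(4))      # rotation by 90 * k degrees
    patch = np.rot90(patch, k)
    if rng.integers(2):           # random up-down flip
        patch = np.flipud(patch)
    if rng.integers(2):           # random left-right flip
        patch = np.fliplr(patch)
    return patch
```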
- Internal learning. For internal learning, we generate internal LR-HR pairs from the test images following the steps of [11]. The test images themselves serve as the ground-truth images; after downsampling with the blur kernel, their corresponding LR sons serve as the LR images. The training dataset is built by extracting patches from the "ground-truth" images and their LR sons. In our experiments, IASR and ZSSR extract internal examples with the same strategy: 3000 examples, a sampling stride of 4, and no scale augmentation. Finally, the internal dataset consists of HR patches and their corresponding LR patches, which are further enriched by augmentation such as rotations and flips.
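The internal pair generation described above can be sketched as below. `make_internal_pairs` is a hypothetical helper for a single-channel image, and a simple average-pool stands in for the blur kernel, which is an assumption on our part.

```python
import numpy as np

def make_internal_pairs(test_img, r=2, patch=32, stride=4, n_max=3000):
    """Sketch of internal example generation in the spirit of ZSSR:
    the test image is the HR 'ground truth' and its downscaled 'LR son'
    provides the inputs. An r x r average pool is used here as a stand-in
    for the (generally unknown) blur kernel.
    """
    h, w = test_img.shape[:2]
    h, w = h - h % r, w - w % r            # crop to a multiple of r
    hr = test_img[:h, :w].astype(np.float64)
    # Average-pool downsampling: each LR pixel is the mean of an r x r block.
    lr_son = hr.reshape(h // r, r, w // r, r).mean(axis=(1, 3))
    pairs = []
    # Slide over the LR son with the sampling stride; each LR patch pairs
    # with the r-times-larger HR patch at the corresponding location.
    for i in range(0, h // r - patch + 1, stride):
        for j in range(0, w // r - patch + 1, stride):
            lr_patch = lr_son[i:i + patch, j:j + patch]
            hr_patch = hr[r * i:r * (i + patch), r * j:r * (j + patch)]
            pairs.append((lr_patch, hr_patch))
            if len(pairs) == n_max:        # cap at 3000 examples
                return pairs
    return pairs
```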
- Training settings. For both training phases, we minimize the reconstruction loss with the ADAM optimizer [38]. All models are built using the PyTorch framework [39]. The feature maps are zero-padded before each convolution. To minimize the overhead and make maximum use of the GPU memory, the batch size is set to 64 and training stops after 60 epochs. The learning rate decreases by 10 percent after every 20 epochs. To synthesize the LR examples, the images are first downsampled by a given upscaling factor and then upscaled by the same factor via bicubic interpolation to form the LR inputs. The upscaling block in Figure 3 is also implemented via bicubic interpolation. We conduct the experiments on a machine with an NVIDIA TitanX GPU with 16 GB of memory.
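The step decay in the training settings amounts to the following one-liner; note that `lr0 = 1e-4` is a placeholder of our own, not the paper's reported initial value.

```python
def learning_rate(epoch: int, lr0: float = 1e-4,
                  decay: float = 0.9, every: int = 20) -> float:
    """Step schedule from the training settings: the rate is cut by
    10 percent after every 20 epochs (training runs for 60 epochs).
    lr0 is an assumed placeholder value."""
    return lr0 * decay ** (epoch // every)
```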
4.2. Improvement for the Lightweight CNN
- Image-adaptive (A) SR is a more effective way to improve performance than back-projections (B) and enhancement (E). The gains of the image-adaptive technique for SRCNN and ResNet are both about +0.18 dB. The gain of back projection is only about +0.01 dB on average (note that back projection presupposes a degradation operator, which makes a precise estimate hard to obtain). This confirms that our image-adaptive approach is a generic way to improve lightweight networks for SR.
- Among the three benchmark datasets, the Urban100 images present strong self-similarities and redundant repetitive patterns; therefore, they provide a large number of internal examples for internal learning. By applying the image-adaptive internal learning technique, both SRCNN and ResNet are largely improved on Urban100 (+0.31 and +0.24 dB). The smallest gains are achieved on BSD100 (+0.06 dB and +0.13 dB on average). This is mainly because BSD100 consists of natural outdoor images, which are similar to the external training images.
- The combination of the image-adaptive internal learning technique and enhanced prediction brings larger gains: with both techniques, ResNet gains +0.28 dB on average over the baseline, compared with +0.18 dB for image-adaptive learning alone. This indicates some complementarity between the different methods.
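Enhanced prediction (E) averages the model's outputs over the eight flip/rotation variants of the input. A minimal sketch, with `sr_fn` standing for any single-image SR function:

```python
import numpy as np

def enhanced_prediction(lr_img: np.ndarray, sr_fn) -> np.ndarray:
    """Geometric self-ensemble: super-resolve the 8 flipped/rotated
    versions of the input, undo each transform on the output, and
    average the results."""
    outputs = []
    for k in range(4):                       # four 90-degree rotations
        for flip in (False, True):           # with and without a flip
            t = np.rot90(lr_img, k)
            t = np.fliplr(t) if flip else t
            out = sr_fn(t)
            # Undo in reverse order: flip first, then rotation.
            out = np.fliplr(out) if flip else out
            outputs.append(np.rot90(out, -k))
    return np.mean(outputs, axis=0)
```

Because flips and rotations commute with upscaling, the eight outputs align after the inverse transforms, and averaging them suppresses transform-dependent artifacts.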
4.3. Comparison with State-of-the-Arts
4.3.1. Evaluations on “Ideal” Case
4.3.2. Evaluations on “Non-Ideal” Case
4.4. Real Image Super-Resolution
5. Discussion
5.1. The Kernel Size and Depth of the AFT Layers
5.2. Adapting to the Different Scale Factor
5.3. Complexity Analysis
5.4. Comparison with Other State-of-the-Art Methods
5.5. Limitations and Failed Examples
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Hou, H.; Andrews, H. Cubic splines for image interpolation and digital filtering. IEEE Trans. Acoust. Speech Signal Process. 1978, 26, 508–517. [Google Scholar]
- Li, X.; Orchard, M.T. New edge-directed interpolation. IEEE Trans. Image Process. 2001, 10, 1521–1527. [Google Scholar] [PubMed] [Green Version]
- Irani, M.; Peleg, S. Improving resolution by image registration. CVGIP Graph. Models Image Process. 1991, 53, 231–239. [Google Scholar] [CrossRef]
- Bevilacqua, M.; Roumy, A.; Guillemot, C.; Morel, M.L.A. Single-image super-resolution via linear mapping of interpolated self-examples. IEEE Trans. Image Process. 2014, 23, 5334–5347. [Google Scholar] [CrossRef] [Green Version]
- Sun, J.; Xu, Z.; Shum, H.Y. Image super-resolution using gradient profile prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 24–26 June 2008; pp. 1–8. [Google Scholar]
- Yang, C.Y.; Huang, J.B.; Yang, M.H. Exploiting self-similarities for single frame super-resolution. In Proceedings of the Asian Conference on Computer Vision (ACCV), Queenstown, New Zealand, 8–12 November 2010; pp. 497–510. [Google Scholar]
- Timofte, R.; De Smet, V.; Van Gool, L. Anchored neighborhood regression for fast example-based super-resolution. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 1920–1927. [Google Scholar]
- Shi, Y.; Wang, K.; Xu, L.; Lin, L. Local-and holistic-structure preserving image super resolution via deep joint component learning. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Seattle, WA, USA, 11–15 July 2016; pp. 1–6. [Google Scholar]
- Liang, Y.; Wang, J.; Zhou, S.; Gong, Y.; Zheng, N. Incorporating image priors with deep convolutional neural networks for image super-resolution. Neurocomputing 2016, 194, 340–347. [Google Scholar] [CrossRef] [Green Version]
- Huang, J.J.; Liu, T.; Luigi Dragotti, P.; Stathaki, T. SRHRF+: Self-example enhanced single image super-resolution using hierarchical random forests. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 71–79. [Google Scholar]
- Shocher, A.; Cohen, N.; Irani, M. “Zero-shot” super-resolution using deep internal learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 3118–3126. [Google Scholar]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 184–199. [Google Scholar]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef] [Green Version]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 22–25 July 2017; pp. 624–632. [Google Scholar]
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2599–2613. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Chen, J.; Hoi, S.C. Deep learning for image super-resolution: A survey. arXiv 2019, arXiv:1902.06068. [Google Scholar] [CrossRef] [Green Version]
- Anwar, S.; Khan, S.; Barnes, N. A deep journey into super-resolution: A survey. arXiv 2019, arXiv:1904.07523. [Google Scholar]
- Wang, Z.; Yang, Y.; Wang, Z.; Chang, S.; Han, W.; Yang, J.; Huang, T. Self-tuned deep super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, 7–12 June 2015; pp. 1–8. [Google Scholar]
- Freedman, G.; Fattal, R. Image and video upscaling from local self-examples. ACM Trans. Graph. TOG 2011, 30, 1–11. [Google Scholar] [CrossRef] [Green Version]
- Glasner, D.; Bagon, S.; Irani, M. Super-resolution from a single image. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 29 September–2 October 2009; pp. 349–356. [Google Scholar]
- Zhang, J.; Zhao, D.; Gao, W. Group-based sparse representation for image restoration. IEEE Trans. Image Process. 2014, 23, 3336–3351. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep image prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 9446–9454. [Google Scholar]
- Yokota, T.; Hontani, H.; Zhao, Q.; Cichocki, A. Manifold Modeling in Embedded Space: A Perspective for Interpreting “Deep Image Prior”. arXiv 2019, arXiv:1908.02995. [Google Scholar]
- Timofte, R.; Rothe, R.; Van Gool, L. Seven Ways to Improve Example-Based Single Image Super Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1865–1873. [Google Scholar]
- Liang, Y.; Timofte, R.; Wang, J.; Gong, Y.; Zheng, N. Single image super resolution-when model adaptation matters. arXiv 2017, arXiv:1703.10889. [Google Scholar]
- Wang, Z.; Yang, Y.; Wang, Z.; Chang, S.; Yang, J.; Huang, T.S. Learning super-resolution jointly from external and internal examples. IEEE Trans. Image Process. 2015, 24, 4359–4371. [Google Scholar] [CrossRef] [PubMed]
- Cheong, J.Y.; Park, I.K. Deep CNN-based super-resolution using external and internal examples. IEEE Signal Process. Lett. 2017, 24, 1252–1256. [Google Scholar] [CrossRef]
- Li, Z.; Hoiem, D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947. [Google Scholar] [CrossRef] [Green Version]
- Perez, E.; Strub, F.; De Vries, H.; Dumoulin, V.; Courville, A. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New Orleans, LA, USA, 2–7 February 2018; pp. 3942–3951. [Google Scholar]
- Tseng, H.Y.; Lee, H.Y.; Huang, J.B.; Yang, M.H. Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation. arXiv 2020, arXiv:2001.08735. [Google Scholar]
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022. [Google Scholar]
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1501–1510. [Google Scholar]
- He, J.; Dong, C.; Qiao, Y. Modulating image restoration with continual levels via adaptive feature modification layers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–21 June 2019; pp. 11056–11064. [Google Scholar]
- Timofte, R.; Gu, S.; Wu, J.; Van Gool, L. NTIRE 2018 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 852–863. [Google Scholar]
- Yang, J.; Wright, J.; Huang, T.; Ma, Y. Image super-resolution as sparse representation of raw image patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 24–26 June 2008; pp. 1–8. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in PyTorch. 2017. Available online: https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf (accessed on 12 December 2017).
- Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [CrossRef]
- Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Vancouver, BC, Canada, 7–14 July 2001; pp. 416–423. [Google Scholar]
- Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. arXiv 2015, arXiv:1511.04587. [Google Scholar]
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
- Soh, J.W.; Cho, S.; Cho, N.I. Meta-Transfer Learning for Zero-Shot Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, Seattle, WA, USA, 16–18 June 2020; pp. 3516–3525. [Google Scholar]
- Gu, J.; Lu, H.; Zuo, W.; Dong, C. Blind super-resolution with iterative kernel correction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–21 June 2019; pp. 1604–1613. [Google Scholar]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
| Dataset | SRCNN | SRCNN | SRCNN | SRCNN | SRCNN | SRCNN |
|---|---|---|---|---|---|---|
| Set5 | 36.63 | 36.81 | 36.68 | 36.67 | 36.56 | 36.77 |
| BSD100 | 31.32 | 31.38 | 31.33 | 31.34 | 31.41 | 31.41 |
| Urban100 | 29.39 | 29.70 | 29.43 | 29.42 | 29.57 | 29.57 |
| Improv. | — | +0.18 | +0.03 | +0.03 | +0.07 | +0.14 |

| Dataset | ResNet | ResNet | ResNet | ResNet | ResNet | ResNet |
|---|---|---|---|---|---|---|
| Set5 | 37.18 | 37.34 | 37.21 | 37.36 | 37.25 | 37.53 |
| BSD100 | 31.62 | 31.75 | 31.60 | 31.65 | 31.71 | 31.80 |
| Urban100 | 30.27 | 30.51 | 30.28 | 30.41 | 30.55 | 30.58 |
| Improv. | — | +0.18 | +0.01 | +0.12 | +0.15 | +0.28 |
Values are PSNR (dB)/SSIM.

| Dataset | Scale | Bicubic (No Learning) | RCAN (External) | VDSR (External) | ZSSR (Internal) | MZSR(1) (Ext. + Int.) | MZSR(10) (Ext. + Int.) | IASR (Ext. + Int.) |
|---|---|---|---|---|---|---|---|---|
| Set5 | 2 | 33.66/0.9290 | 38.27/0.9614 | 37.53/0.9590 | 36.93/0.9554 | 36.77/0.9549 | 37.25/0.9567 | 37.34/0.9583 |
| Set5 | 3 | 30.39/0.8682 | 34.74/0.9299 | 33.67/0.9210 | 31.83/0.8960 | — | — | 33.42/0.9181 |
| Set5 | 4 | 28.42/0.8104 | 32.63/0.9002 | 31.35/0.8830 | 28.72/0.8237 | — | — | 30.96/0.8760 |
| Set14 | 2 | 30.23/0.8678 | 34.12/0.9216 | 33.05/0.9130 | 32.51/0.9078 | — | — | 33.03/0.9114 |
| Set14 | 3 | 27.54/0.7736 | 30.65/0.8482 | 29.78/0.8320 | 28.85/0.8182 | — | — | 29.73/0.8278 |
| Set14 | 4 | 26.00/0.7019 | 28.87/0.7889 | 28.02/0.7680 | 26.92/0.7433 | — | — | 27.86/0.7596 |
| BSD100 | 2 | 29.57/0.8434 | 32.41/0.9027 | 31.90/0.8960 | 31.39/0.8891 | 31.33/0.8910 | 31.64/0.8928 | 31.75/0.8941 |
| BSD100 | 3 | 27.22/0.7394 | 29.32/0.8111 | 28.82/0.7990 | 28.27/0.7845 | — | — | 28.62/0.7919 |
| BSD100 | 4 | 25.99/0.6692 | 27.77/0.7436 | 27.29/0.7226 | 26.62/0.7063 | — | — | 27.02/0.7154 |
| Urban100 | 2 | 26.87/0.8404 | 33.34/0.9384 | 30.77/0.9140 | 29.43/0.8942 | 30.01/0.9054 | 30.41/0.9092 | 30.51/0.9100 |
| Urban100 | 3 | 24.46/0.7355 | 29.09/0.8702 | 27.14/0.8290 | 25.90/0.7896 | — | — | 26.80/0.8167 |
| Urban100 | 4 | 23.14/0.6589 | 26.82/0.8087 | 25.18/0.7540 | 24.12/0.7070 | — | — | 24.86/0.7381 |
Values are PSNR (dB)/SSIM.

| Kernel | Dataset | Bicubic (No Learning) | RCAN (External) | IKC (External) | ZSSR (Internal) | MZSR(1) (Ext. + Int.) | MZSR(10) (Ext. + Int.) | IASR (Ext. + Int.) |
|---|---|---|---|---|---|---|---|---|
| Kernel 1 | Set5 | 30.54/0.8773 | 31.54/0.8992 | 33.88/0.9357 | 35.24/0.9434 | 35.18/0.9430 | 36.64/0.9498 | 35.41/0.9535 |
| Kernel 1 | BSD100 | 27.49/0.7546 | 28.27/0.7904 | 30.95/0.8860 | 30.74/0.8743 | 29.02/0.8544 | 31.25/0.8818 | 28.92/0.7563 |
| Kernel 1 | Urban100 | 24.74/0.7527 | 25.65/0.7946 | 29.47/0.8956 | 28.30/0.8693 | 28.27/0.8771 | 29.83/0.8965 | 29.80/0.8714 |
| Kernel 2 | Set5 | 28.73/0.8449 | 29.15/0.8601 | 29.05/0.8896 | 34.90/0.9397 | 35.20/0.9398 | 36.05/0.9439 | 35.48/0.9403 |
| Kernel 2 | BSD100 | 26.51/0.7157 | 26.89/0.7394 | 27.46/0.8156 | 30.57/0.8712 | 30.58/0.8627 | 31.09/0.8739 | 30.54/0.8625 |
| Kernel 2 | Urban100 | 23.70/0.7109 | 24.14/0.7384 | 25.17/0.8169 | 27.86/0.8582 | 28.23/0.8657 | 29.19/0.8838 | 28.41/0.8662 |
Image | Bicubic | IASR | ZSSR | MZSR(1) | MZSR(10) |
---|---|---|---|---|---|
Old photo | 5.91/42.30 | 5.88/40.13 | 6.97/46.79 | 9.79/85.17 | 11.39/93.23 |
Img_005_SRF | 6.91/42.71 | 6.04/43.15 | 6.29/46.18 | 11.18/91.63 | 12.67/99.66 |
Eyechart | 15.82/48.99 | 14.02/48.19 | 11.68/32.23 | 13.30/41.28 | 14.20/61.84 |
|  | PSNR/SSIM | PSNR/SSIM | PSNR/SSIM |
|---|---|---|---|
| Baseline | 37.18/0.9571 | 30.17/0.9042 | 26.45/0.8277 |
| IASR | 37.34/0.9581 | 35.42/0.9465 | 35.36/0.9451 |
| Improv. | +0.16/0.0010 | +5.25/0.0423 | +8.01/0.1174 |
Methods | Parameters | Time (s) |
---|---|---|
SRCNN | 57 K | 0.20 |
VDSR | 665 K | 0.36 |
RCAN | 15,445 K | 1.72 |
ResNet | 225 K | 0.33 |
ZSSR | 225 K | 148.40 |
MZSR(1) | 225 K | 0.13 |
MZSR(10) | 225 K | 0.36 |
IASR | 229 K | 34.03 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
He, Y.; Cao, W.; Du, X.; Chen, C. Internal Learning for Image Super-Resolution by Adaptive Feature Transform. Symmetry 2020, 12, 1686. https://doi.org/10.3390/sym12101686