Unlocking Efficiency in Fine-Grained Compositional Image Synthesis: A Single-Generator Approach
Abstract
1. Introduction
- We propose a novel method that generates all semantic parts with a single local generator, addressing the efficiency bottleneck of large-scale compositional image-synthesis tasks (a minimal sketch follows this list).
- We introduce SSSGAN, a model that is semantic-aware, compositional, and efficient. It closes a gap in existing works, none of which meets all three criteria simultaneously.
- We conduct extensive experiments evaluating SSSGAN's compositional properties and the characteristics of its latent space.
- We investigate the entanglement mechanism between semantic parts and devise a method to control the intensity of this entanglement.
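To make the single-local-generator idea concrete, the sketch below shows one way a weight-shared network could render every semantic part in turn, selected by a per-part embedding, with an LSTM cell carrying state across parts in the spirit of the SPC and LSTM modules listed under Abbreviations. This is a minimal illustration under assumed names and dimensions (`SharedLocalGenerator`, the MLP head, the feature-map shape), not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SharedLocalGenerator(nn.Module):
    """One local generator reused for every semantic part (hypothetical sketch).

    A per-part embedding tells the shared synthesis head which part to render,
    so parameters no longer grow with the number of parts; the LSTM cell passes
    state from part to part, a stand-in for sequential part composition.
    """

    def __init__(self, n_parts=9, latent_dim=128, embed_dim=64, feat_ch=32, size=16):
        super().__init__()
        self.part_embed = nn.Embedding(n_parts, embed_dim)            # which part to draw
        self.lstm = nn.LSTMCell(latent_dim + embed_dim, latent_dim)   # cross-part state
        self.head = nn.Sequential(                                    # single shared head
            nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, feat_ch * size * size),
        )
        self.feat_ch, self.size = feat_ch, size

    def forward(self, z):
        """Emit one feature map per part, sequentially, from a single latent z."""
        b = z.size(0)
        h, c = z, torch.zeros_like(z)
        feats = []
        for p in range(self.part_embed.num_embeddings):
            e = self.part_embed(torch.full((b,), p, dtype=torch.long, device=z.device))
            h, c = self.lstm(torch.cat([z, e], dim=1), (h, c))
            feats.append(self.head(h).view(b, self.feat_ch, self.size, self.size))
        return torch.stack(feats, dim=1)  # (B, n_parts, C, H, W); a render net would fuse these

g = SharedLocalGenerator()
print(g(torch.randn(4, 128)).shape)  # torch.Size([4, 9, 32, 16, 16])
```

Because the head is shared, adding a part adds only one embedding row rather than a full generator, which is where the efficiency column in the table below comes from.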
| Model | Semantic-Aware | Compositional | Efficient |
|---|---|---|---|
| StyleGAN [10] | ✗ | ✗ | ✓ |
| SPADE [11] | ✓ | ✗ | ✓ |
| SemanticStyleGAN [12] | ✓ | ✓ | ✗ |
| Ours | ✓ | ✓ | ✓ |
2. Related Work
| References | Achievement | Disadvantage | Comparison with Our Model |
|---|---|---|---|
| [14,15] | Increased disentanglement by encouraging latent codes to be factorized | Non-obvious, unpredictable disentanglement | Disentanglement factors have semantic meaning |
| [16,17] | Mutual-information loss to encourage factorized latent codes for improved disentanglement | Non-obvious, unpredictable disentanglement | Disentanglement factors have semantic meaning |
| [18,19] | Style–content disentanglement | Not fine-grained | Part-level disentanglement |
| [19] | Identity–pose disentanglement | Not fine-grained | Part-level disentanglement |
| [20] | Identity–domain disentanglement | Not fine-grained | Part-level disentanglement |
| [21] | Shape and physical-attribute disentanglement | Not fine-grained | Part-level disentanglement |
| [22,23,24,25,26,27] | Appearance–geometry disentanglement | Not fine-grained | Part-level disentanglement |
| [28,29,30,31] | Structure disentanglement via customized model architectures, unsupervised | Disentanglement factors do not align well with human conception | Disentanglement factors align with human conception |
| [32] | Structure disentanglement via structured noise codes, unsupervised | Disentanglement factors do not align well with human conception | Disentanglement factors align with human conception |
| [34,35,36] | Compositional image synthesis in pixel space | Objects float over the background, or only feasible on simple datasets | Composition in hidden space; high quality on real-image datasets |
| [37,38,39,40,41,42,43,44] | 3D-aware compositional image synthesis | Decomposes images only into object and background; not fine-grained | Part-level compositional image synthesis |
| [45,46] | Decompose images into foreground shape, foreground appearance, and background | Not fine-grained | Part-level compositional image synthesis |
| [47] | Part-level compositionality, unsupervised | Low decomposition quality | With supervision, high-quality decomposition |
| [12] | Part-level compositionality; high quality for both image and decomposition | Needs a substantial number of heavy local generators; inefficient | Uses a single local generator; efficient |
| [48,49,50,51,52,53,54] | Sketch-to-image mapping | Semantic-agnostic; needs sketches as input | Semantic-aware; no input needed |
| [11,55,56,57,58] | Semantic layout-to-image mapping, semantic-aware | Object-level; needs a semantic layout as input | Part-level; no input needed; generation from scratch |
| [59,60] | Semantic layout-to-image mapping, part-level | Needs a layout as input | No input needed; generation from scratch |
| [60,61,62] | Semantic layout-to-image mapping; supports both part- and object-level layouts | Needs a layout as input | No input needed; generation from scratch |
| [55,62,63] | Semantic layout-to-image mapping; uses multi-scale methods to better encode semantic components | Needs a layout as input | No input needed; generation from scratch |
3. Methods
4. Experiments and Results
4.1. Dataset and Training Method
4.2. Results
4.2.1. Image Quality Evaluation
| Method | Data | Compositional | FID ↓ | IS ↑ |
|---|---|---|---|---|
| SemanticGAN 1 | img & seg | ✗ | 7.50 | 3.51 |
| SemanticStyleGAN 1 | img & seg | ✓ | 6.42 | 3.21 |
| SSSGAN | img & seg | ✓ | 6.19 | 3.15 |
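For context, both metrics in the table above have standard open-source implementations; the sketch below computes FID (the distance between Inception feature statistics of real and generated image sets; lower is better) and IS (higher is better) with torchmetrics. The library choice, the 2048-dimensional feature layer, and the random placeholder batches are assumptions; the paper does not state which evaluation code it used.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

fid = FrechetInceptionDistance(feature=2048)  # lower is better
inception = InceptionScore()                  # higher is better

# Placeholder batches; replace with real dataset images and SSSGAN samples
# (uint8 tensors in [0, 255], shape (N, 3, H, W)).
real = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)   # accumulate Inception features of real images
fid.update(fake, real=False)  # accumulate features of generated images
inception.update(fake)

print(f"FID: {fid.compute():.2f}")
is_mean, is_std = inception.compute()
print(f"IS:  {is_mean:.2f} ± {is_std:.2f}")
```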
4.2.2. Compositional and Disentanglement Properties
4.2.3. Noise and Latent Space Property
4.2.4. Necessity of the LSTM Module
4.2.5. Performance on Other Domains
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| GAN | Generative Adversarial Network |
| MLP | Multi-Layer Perceptron |
| SSSGAN | Single-Generator SemanticStyleGAN |
| SPC | Sequential Part Composition |
| SMSE | Sequential Mean Square Error |
| LSTM | Long Short-Term Memory |
| FID | Fréchet Inception Distance |
| IS | Inception Score |
References
1. Guido, R.C.; Pedroso, F.; Contreras, R.C.; Rodrigues, L.C.; Guariglia, E.; Neto, J.S. Introducing the Discrete Path Transform (DPT) and Its Applications in Signal Analysis, Artefact Removal and Spoken Word Recognition. Digit. Signal Process. 2021, 117, 103158.
2. Guariglia, E.; Silvestrov, S. Fractional-Wavelet Analysis of Positive Definite Distributions and Wavelets on 𝒟′(ℂ). In Engineering Mathematics II; Springer Proceedings in Mathematics & Statistics; Springer: Berlin/Heidelberg, Germany, 2016; pp. 337–353.
3. Yang, L.; Sun, H.; Zhong, C.; Meng, Z.; Luo, H.; Li, X.; Tang, Y.Y.; Lu, Y. Hyperspectral Image Classification Using Wavelet Transform-Based Smooth Ordering. Int. J. Wavelets Multiresolution Inf. Process. 2019, 17, 1950050:1–1950050:18.
4. Guariglia, E. Harmonic Sierpinski Gasket and Applications. Entropy 2018, 20, 714.
5. Guariglia, E. Primality, Fractality and Image Analysis. Entropy 2019, 21, 304.
6. Zheng, X.; Tang, Y.Y.; Zhou, J. A Framework of Adaptive Multiscale Wavelet Decomposition for Signals on Undirected Graphs. IEEE Trans. Signal Process. 2019, 67, 1696–1711.
7. Berry, M.V.; Lewis, Z.V.; Nye, J.F. On the Weierstrass–Mandelbrot Fractal Function. Proc. R. Soc. Lond. A Math. Phys. Sci. 1980, 370, 459–484.
8. Osherson, D.N.; Smith, E.E. On the Adequacy of Prototype Theory as a Theory of Concepts. Cognition 1981, 9, 35–58.
9. Lake, B.M.; Ullman, T.D.; Tenenbaum, J.B.; Gershman, S.J. Building Machines That Learn and Think Like People. Behav. Brain Sci. 2017, 40, e253.
10. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8110–8119.
11. Park, T.; Liu, M.Y.; Wang, T.C.; Zhu, J.Y. Semantic Image Synthesis with Spatially-Adaptive Normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2337–2346.
12. Shi, Y.; Yang, X.; Wan, Y.; Shen, X. SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11254–11264.
13. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
14. Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot, X.; Botvinick, M.; Mohamed, S.; Lerchner, A. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
15. Kim, H.; Mnih, A. Disentangling by Factorising. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 2649–2658.
16. Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–10 December 2016; Volume 29, pp. 2172–2180.
17. Lin, Z.; Thekumparampil, K.; Fanti, G.; Oh, S. InfoGAN-CR and ModelCentrality: Self-Supervised Model Training and Selection for Disentangling GANs. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 13–18 July 2020; Volume 119, pp. 6127–6139.
18. Kazemi, H.; Iranmanesh, S.M.; Nasrabadi, N. Style and Content Disentanglement in Generative Adversarial Networks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 7–11 January 2019; pp. 848–856.
19. Tran, L.; Yin, X.; Liu, X. Disentangled Representation Learning GAN for Pose-Invariant Face Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1415–1424.
20. Liu, A.H.; Liu, Y.C.; Yeh, Y.Y.; Wang, Y.C.F. A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2018; Volume 31, pp. 2595–2604.
21. Medin, S.C.; Egger, B.; Cherian, A.; Wang, Y.; Tenenbaum, J.B.; Liu, X.; Marks, T.K. MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation. Proc. AAAI Conf. Artif. Intell. 2022, 36, 1962–1971.
22. Skafte, N.; Hauberg, S. Explicit Disentanglement of Appearance and Perspective in Generative Models. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2019; Volume 32, pp. 1016–1026.
23. Lorenz, D.; Bereska, L.; Milbich, T.; Ommer, B. Unsupervised Part-Based Disentangling of Object Shape and Appearance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10955–10964.
24. Xing, X.; Gao, R.; Han, T.; Zhu, S.C.; Wu, Y.N. Deformable Generator Networks: Unsupervised Disentanglement of Appearance and Geometry. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1162–1179.
25. Liu, L.; Jiang, X.; Saerbeck, M.; Dauwels, J. EAD-GAN: A Generative Adversarial Network for Disentangling Affine Transforms in Images. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–11.
26. Tewari, A.; B R, M.; Pan, X.; Fried, O.; Agrawala, M.; Theobalt, C. Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1516–1525.
27. Nguyen-Phuoc, T.; Li, C.; Theis, L.; Richardt, C.; Yang, Y.L. HoloGAN: Unsupervised Learning of 3D Representations from Natural Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7588–7597.
28. Sønderby, C.K.; Raiko, T.; Maaløe, L.; Sønderby, S.R.K.; Winther, O. Ladder Variational Autoencoders. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2016; Volume 29, pp. 3738–3746.
29. Zhao, S.; Song, J.; Ermon, S. Learning Hierarchical Features from Deep Generative Models. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; Volume 70, pp. 4091–4099.
30. Li, Z.; Murkute, J.V.; Gyawali, P.K.; Wang, L. Progressive Learning and Disentanglement of Hierarchical Representations. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
31. Kaneko, T.; Hiramatsu, K.; Kashino, K. Generative Adversarial Image Synthesis with Decision Tree Latent Controller. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6606–6615.
32. Alharbi, Y.; Wonka, P. Disentangled Image Generation through Structured Noise Injection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5134–5142.
33. Goyal, A.; Bengio, Y. Inductive Biases for Deep Learning of Higher-Level Cognition. Proc. R. Soc. A Math. Phys. Eng. Sci. 2022, 478, 20210068.
34. Arandjelović, R.; Zisserman, A. Object Discovery with a Copy-Pasting GAN. arXiv 2019, arXiv:1905.11369.
35. Azadi, S.; Pathak, D.; Ebrahimi, S.; Darrell, T. Compositional GAN: Learning Image-Conditional Binary Composition. Int. J. Comput. Vis. 2020, 128, 2570–2585.
36. Sbai, O.; Couprie, C.; Aubry, M. Surprising Image Compositions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 3926–3930.
37. Burgess, C.P.; Matthey, L.; Watters, N.; Kabra, R.; Higgins, I.; Botvinick, M.; Lerchner, A. MONet: Unsupervised Scene Decomposition and Representation. arXiv 2019, arXiv:1901.11390.
38. Greff, K.; Kaufman, R.L.; Kabra, R.; Watters, N.; Burgess, C.; Zoran, D.; Matthey, L.; Botvinick, M.; Lerchner, A. Multi-Object Representation Learning with Iterative Variational Inference. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 10–15 June 2019; Volume 97, pp. 2424–2433.
39. Liao, Y.; Schwarz, K.; Mescheder, L.; Geiger, A. Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5871–5880.
40. Nguyen-Phuoc, T.H.; Richardt, C.; Mai, L.; Yang, Y.; Mitra, N. BlockGAN: Learning 3D Object-Aware Scene Representations from Unlabelled Images. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2020; Volume 33, pp. 6767–6778.
41. Niemeyer, M.; Geiger, A. GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 11453–11464.
42. Li, N.; Eastwood, C.; Fisher, R. Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2020; Volume 33, pp. 5656–5666.
43. Anciukevicius, T.; Lampert, C.H.; Henderson, P. Object-Centric Image Generation with Factored Depths, Locations and Appearances. arXiv 2020, arXiv:2004.00642.
44. Henderson, P.; Lampert, C.H. Unsupervised Object-Centric Video Generation and Decomposition in 3D. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2020; Volume 33, pp. 3106–3117.
45. Singh, K.K.; Ojha, U.; Lee, Y.J. FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 6490–6499.
46. Schwarz, K.; Liao, Y.; Niemeyer, M.; Geiger, A. GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2020; Volume 33, pp. 20154–20166.
47. Kwak, H.; Zhang, B.T. Generating Images Part by Part with Composite Generative Adversarial Networks. arXiv 2016, arXiv:1607.05387.
48. Chen, W.; Hays, J. SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 9416–9425.
49. Lu, Y.; Wu, S.; Tai, Y.W.; Tang, C.K. Image Generation from Sketch Constraint Using Contextual GAN. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 205–220.
50. Zhao, J.; Xie, X.; Wang, L.; Cao, M.; Zhang, M. Generating Photographic Faces from the Sketch Guided by Attribute Using GAN. IEEE Access 2019, 7, 23844–23851.
51. Ghosh, A.; Zhang, R.; Dokania, P.K.; Wang, O.; Efros, A.A.; Torr, P.H.S.; Shechtman, E. Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1171–1180.
52. Chen, S.Y.; Su, W.; Gao, L.; Xia, S.; Fu, H. DeepFaceDrawing: Deep Generation of Face Images from Sketches. ACM Trans. Graph. 2020, 39, 72:1–72:16.
53. Richardson, E.; Alaluf, Y.; Patashnik, O.; Nitzan, Y.; Azar, Y.; Shapiro, S.; Cohen-Or, D. Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2287–2296.
54. Wang, S.Y.; Bau, D.; Zhu, J.Y. Sketch Your Own GAN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 14050–14060.
55. Chen, Q.; Koltun, V. Photographic Image Synthesis with Cascaded Refinement Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1511–1520.
56. Liu, X.; Yin, G.; Shao, J.; Wang, X.; Li, H. Learning to Predict Layout-to-Image Conditional Convolutions for Semantic Image Synthesis. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2019; Volume 32, pp. 568–578.
57. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
58. Tang, H.; Xu, D.; Sebe, N.; Wang, Y.; Corso, J.J.; Yan, Y. Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2417–2426.
59. Wang, Y.; Qi, L.; Chen, Y.C.; Zhang, X.; Jia, J. Image Synthesis via Semantic Composition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 13749–13758.
60. Zhu, Z.; Xu, Z.; You, A.; Bai, X. Semantically Multi-Modal Image Synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5467–5476.
61. Zhu, P.; Abdal, R.; Qin, Y.; Wonka, P. SEAN: Image Synthesis with Semantic Region-Adaptive Normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5104–5113.
62. Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8798–8807.
63. Chen, A.; Liu, R.; Xie, L.; Chen, Z.; Su, H.; Yu, J. SofGAN: A Portrait Image Generator with Dynamic Styling. ACM Trans. Graph. 2022, 41, 1–26.
64. Thanh-Tung, H.; Tran, T. On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks. arXiv 2020, arXiv:1807.04015.
65. van den Oord, A.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel Recurrent Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; Volume 48, pp. 1747–1756.
66. Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
67. Mescheder, L.M.; Geiger, A.; Nowozin, S. Which Training Methods for GANs Do Actually Converge? In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 3478–3487.
68. Lee, C.; Liu, Z.; Wu, L.; Luo, P. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5548–5557.
69. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2017; Volume 30, pp. 6626–6637.
70. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X.; Chen, X. Improved Techniques for Training GANs. In Proceedings of the Advances in Neural Information Processing Systems (NIPS); NIPS: La Jolla, CA, USA, 2016; Volume 29, pp. 2226–2234.
71. Van der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
72. Liu, Z.; Luo, P.; Qiu, S.; Wang, X.; Tang, X. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1096–1104.
| Semantic Part | SemanticStyleGAN (Temperature 1.0) | SSSGAN (Temperature 1.0) | SSSGAN (Temperature 0.8) | SSSGAN (Temperature 0.1) | SSSGAN (Temperature 0.01) |
|---|---|---|---|---|---|
| skin | 0.078 ± 0.101 | 0.244 ± 0.232 | 0.214 ± 0.210 | 0.028 ± 0.044 | 0.022 ± 0.036 |
| eyes | 0.022 ± 0.009 | 0.031 ± 0.011 | 0.032 ± 0.012 | 0.041 ± 0.017 | 0.040 ± 0.016 |
| eyebrows | 0.004 ± 0.002 | 0.006 ± 0.003 | 0.006 ± 0.002 | 0.006 ± 0.002 | 0.006 ± 0.002 |
| mouth | 0.009 ± 0.006 | 0.009 ± 0.005 | 0.009 ± 0.005 | 0.008 ± 0.004 | 0.008 ± 0.004 |
| nose | 0.004 ± 0.003 | 0.013 ± 0.005 | 0.014 ± 0.005 | 0.014 ± 0.006 | 0.014 ± 0.007 |
| ears | 0.008 ± 0.012 | 0.012 ± 0.016 | 0.011 ± 0.016 | 0.005 ± 0.010 | 0.004 ± 0.010 |
| hair | 0.053 ± 0.051 | 0.149 ± 0.104 | 0.115 ± 0.090 | 0.052 ± 0.044 | 0.046 ± 0.041 |
| neck | 0.011 ± 0.012 | 0.013 ± 0.017 | 0.012 ± 0.016 | 0.008 ± 0.007 | 0.008 ± 0.007 |
| cloth | 0.004 ± 0.006 | 0.005 ± 0.009 | 0.005 ± 0.009 | 0.003 ± 0.005 | 0.002 ± 0.004 |
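The figures above read as entanglement scores: resample the latent code of one semantic part and measure how much the pixels belonging to the other parts change, with the temperature scaling the resampled code so that lower temperatures perturb less and entangle less. The sketch below shows one plausible form of such a score; the exact metric (the paper's SMSE may be defined differently), the latent slicing, the mask convention, and the dummy generator are all assumptions.

```python
import torch

def entanglement_score(generator, z, part_idx, part_mask, latent_slices,
                       temperature=1.0, n_trials=8):
    """Mean squared pixel change *outside* part_mask when only that part's
    latent slice is resampled (hypothetical metric, not the paper's protocol)."""
    base = generator(z)
    sl = latent_slices[part_idx]               # latent dims assumed to drive this part
    scores = []
    for _ in range(n_trials):
        z2 = z.clone()
        z2[:, sl] = temperature * torch.randn_like(z2[:, sl])  # temperature-scaled resample
        delta = (generator(z2) - base) ** 2
        scores.append(delta[:, :, ~part_mask].mean().item())   # change on *other* parts
    return sum(scores) / len(scores)

# Toy usage with a dummy generator mapping the first latent dims to pixels.
g = lambda z: z[:, :3, None, None].expand(-1, 3, 8, 8)
mask = torch.zeros(8, 8, dtype=torch.bool); mask[:4] = True   # this part's own pixels
print(entanglement_score(g, torch.randn(2, 128), 0, mask, {0: slice(0, 64)}))
```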