Deep Fashion Designer: Generative Adversarial Networks for Fashion Item Generation Based on Many-to-One Image Translation
Abstract
1. Introduction
- End-to-end manner: Our framework is trained in a fully end-to-end fashion, with no auxiliary networks. This simplifies training compared with OutfitGAN, which relies on a more complex multi-stage procedure.
- Fashion compatibility batch algorithm: The proposed batch-construction algorithm (Section 3.3) lets the framework learn fashion compatibility by contrasting outfits with numerous unsuitable ones, enabling it to generate plausible fashion images even on unseen data.
- Larger image resolution: Generated images measure 256 × 256 pixels, quadrupling the pixel count of OutfitGAN's 128 × 128 outputs [18].
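To make the many-to-one translation setting concrete, the sketch below outlines one plausible generator interface in PyTorch: a shared encoder embeds each conditioning item image, the embeddings are concatenated and fused, and a decoder emits a single 256 × 256 target item. The class name, layer sizes, and concatenation-based fusion are illustrative assumptions, not the DFDGAN architecture itself.

```python
import torch
import torch.nn as nn

class ManyToOneGenerator(nn.Module):
    """Hypothetical sketch: encode N conditioning item images with one
    shared encoder, fuse the features, and decode a single 256x256 item.
    Layer sizes are illustrative, not the paper's architecture."""

    def __init__(self, num_items: int = 4, feat_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(  # shared across all conditioning items
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        # fuse the channel-concatenated item features back to feat_dim
        self.fuse = nn.Conv2d(num_items * feat_dim, feat_dim, 1)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, items: torch.Tensor) -> torch.Tensor:
        # items: (B, N, 3, 256, 256) with N == num_items
        b, n, c, h, w = items.shape
        feats = self.encoder(items.view(b * n, c, h, w))   # (B*N, F, 32, 32)
        feats = feats.view(b, -1, *feats.shape[-2:])       # stack along channels
        return self.decoder(self.fuse(feats))              # (B, 3, 256, 256)

# g = ManyToOneGenerator(); y = g(torch.randn(2, 4, 3, 256, 256))  # -> (2, 3, 256, 256)
```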
2. Related Work
2.1. Generative Adversarial Networks
2.2. Fashion Item Generation
3. Methodology
3.1. Problem Definition
3.2. Model Architecture
3.3. Fashion Compatibility Batch Algorithm
Algorithm 1. Pseudocode for the fashion compatibility batch algorithm.
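Since the pseudocode listing is not included here, the Python sketch below gives a plausible reconstruction consistent with the contribution statement: batches mix ground-truth (compatible) outfits with synthetic incompatible ones formed by swapping in an item from a different outfit. The function and parameter names (`compatibility_batch`, `neg_ratio`) are hypothetical, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

def compatibility_batch(
    outfits: Sequence[Sequence[str]],  # each outfit is a list of item-image paths
    batch_size: int,
    neg_ratio: float = 0.5,            # fraction of incompatible samples
) -> List[Tuple[List[str], int]]:
    """Sample a batch mixing compatible outfits (label 1) with synthetic
    incompatible ones (label 0) made by swapping in an item from another
    outfit. Requires at least two outfits to form negatives."""
    batch = []
    for _ in range(batch_size):
        idx = random.randrange(len(outfits))
        outfit = list(outfits[idx])
        if random.random() < neg_ratio and len(outfits) > 1:
            donor_idx = random.randrange(len(outfits))
            while donor_idx == idx:              # pick a *different* outfit
                donor_idx = random.randrange(len(outfits))
            i = random.randrange(len(outfit))
            outfit[i] = random.choice(list(outfits[donor_idx]))
            batch.append((outfit, 0))            # incompatible
        else:
            batch.append((outfit, 1))            # compatible
    return batch
```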
3.4. Objectives
3.5. Training Algorithm
Algorithm 2. DFDGAN training algorithm.
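The training listing is likewise not reproduced here. As a minimal sketch of the kind of step such an algorithm would iterate, the code below performs one adversarial update assuming a least-squares (LSGAN-style) adversarial loss plus an L1 reconstruction term; the structure and the reconstruction weight are illustrative assumptions, not the authors' exact objective from Section 3.4.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, items, target, g_opt, d_opt, rec_weight=10.0):
    """One adversarial update (sketch). G maps conditioning items to a
    fashion item image; D scores images. Losses are LSGAN-style."""
    # --- discriminator: push real scores toward 1, fake toward 0 ---
    fake = G(items).detach()                 # stop gradients into G
    d_real = D(target)
    d_fake = D(fake)
    d_loss = 0.5 * (F.mse_loss(d_real, torch.ones_like(d_real)) +
                    F.mse_loss(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # --- generator: fool D and stay close to the ground-truth item ---
    fake = G(items)
    pred = D(fake)
    g_adv = F.mse_loss(pred, torch.ones_like(pred))
    g_rec = F.l1_loss(fake, target)          # pixel-level reconstruction
    g_loss = g_adv + rec_weight * g_rec      # weight is an assumption
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```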
4. Experimental Evaluation
4.1. Dataset and Pre-Processing
4.2. Implementation Details
5. Evaluation
5.1. Baseline and Metrics
5.2. Inception Score
5.3. Fréchet Inception Distance
5.4. Learned Perceptual Image Patch Similarity
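Sections 5.2–5.4 name the three measures used in the result tables below: Inception Score (IS, higher is better), Fréchet Inception Distance (FID, lower is better), and LPIPS. As an illustration only (the paper does not specify its tooling), the sketch below computes all three with the `torchmetrics` package (`pip install torchmetrics[image]` pulls in the required feature extractors); the batches are toy random tensors, and a real evaluation would use many more test-set samples.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# Toy batches of uint8 images (N, 3, 256, 256), values in [0, 255].
real = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)
fake = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)

# IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ): higher means sharper,
# more diverse Inception class predictions for the generated images.
inception = InceptionScore()
inception.update(fake)
is_mean, is_std = inception.compute()

# FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)) over
# Inception features of real (r) and generated (g) images; lower is better.
fid = FrechetInceptionDistance()
fid.update(real, real=True)
fid.update(fake, real=False)
fid_value = fid.compute()

# LPIPS compares deep features of paired images; lower means the
# generated item is perceptually closer to its reference.
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
lpips_value = lpips(fake.float() / 255.0, real.float() / 255.0)

print(f"IS {is_mean:.2f} ± {is_std:.2f}, FID {fid_value:.1f}, LPIPS {lpips_value:.3f}")
```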
5.5. Comparison with the Baselines
5.6. Architecture Configuration Study
5.7. Quantitative Results
6. Method Analysis
6.1. Architectural Differences from the Baselines
6.2. Outfit Space Exploration
7. Discussion
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; Conference Track Proceedings. Bengio, Y., LeCun, Y., Eds.; Computational and Biological Learning Society: San Diego, CA, USA, 2015. Available online: https://ora.ox.ac.uk/objects/uuid:60713f18-a6d1-4d97-8f45-b60ad8aebbce (accessed on 12 October 2024).
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; pp. 2672–2680.
- Han, X.; Wu, Z.; Wu, Z.; Yu, R.; Davis, L.S. VITON: An Image-Based Virtual Try-on Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7543–7552.
- Choi, S.; Park, S.; Lee, M.; Choo, J. VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 14131–14140.
- Neuberger, A.; Borenstein, E.; Hilleli, B.; Oks, E.; Alpert, S. Image Based Virtual Try-On Network From Unpaired Data. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020; pp. 5183–5192.
- Dong, H.; Liang, X.; Zhang, Y.; Zhang, X.; Shen, X.; Xie, Z.; Wu, B.; Yin, J. Fashion Editing With Adversarial Parsing Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 14–19 June 2020; pp. 8117–8125.
- Hsiao, W.L.; Katsman, I.; Wu, C.Y.; Parikh, D.; Grauman, K. Fashion++: Minimal Edits for Outfit Improvement. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5046–5055.
- Han, X.; Wu, Z.; Huang, W.; Scott, M.; Davis, L. FiNet: Compatible and Diverse Fashion Image Inpainting. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4480–4490.
- Kwon, Y.R.; Kim, S.; Yoo, D.; Eui Yoon, S. Coarse-to-Fine Clothing Image Generation with Progressively Constructed Conditional GAN. In Proceedings of the VISIGRAPP, Prague, Czech Republic, 25–27 February 2019.
- Zhang, H.; Sun, Y.; Liu, L.; Xu, X. CascadeGAN: A category-supervised cascading generative adversarial network for clothes translation from the human body to tiled images. Neurocomputing 2020, 382, 148–161.
- Zhang, H.; Sun, Y.; Liu, L.; Wang, X.; Li, L.; Liu, W. ClothingOut: A category-supervised GAN model for clothing segmentation and retrieval. Neural Comput. Appl. 2020, 32, 4519–4530.
- Jiang, S.; Fu, Y. Fashion Style Generator. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 3721–3727.
- Chen, L.; Tian, J.; Li, G.; Wu, C.H.; King, E.K.; Chen, K.T.; Hsieh, S.H.; Xu, C. TailorGAN: Making User-Defined Fashion Designs. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 3230–3239.
- Shih, Y.S.; Chang, K.Y.; Lin, H.T.; Sun, M. Compatibility family learning for item recommendation and generation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Lin, Y.; Ren, P.; Chen, Z.; Ren, Z.; Ma, J.; de Rijke, M. Improving Outfit Recommendation with Co-supervision of Fashion Generation. In Proceedings of The World Wide Web Conference (WWW ’19), San Francisco, CA, USA, 13–17 May 2019; pp. 1095–1105.
- Liu, J.; Song, X.; Chen, Z.; Ma, J. MGCM: Multi-modal generative compatibility modeling for clothing matching. Neurocomputing 2020, 414, 215–224.
- Liu, L.; Zhang, H.; Zhou, D. Clothing generation by multi-modal embedding: A compatibility matrix-regularized GAN model. Image Vis. Comput. 2021, 107, 104097.
- Moosaei, M.; Lin, Y.; Akhazhanov, A.; Chen, H.; Wang, F.; Yang, H. OutfitGAN: Learning Compatible Items for Generative Fashion Outfits. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, New Orleans, LA, USA, 18–24 June 2022; pp. 2273–2277.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In Proceedings of the ICLR, San Juan, Puerto Rico, 2–4 May 2016.
- Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4396–4405.
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8107–8116.
- Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-Free Generative Adversarial Networks. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 852–863.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–27 October 2017; pp. 2242–2251.
- Kim, T.; Cha, M.; Kim, H.; Lee, J.K.; Kim, J. Learning to Discover Cross-domain Relations with Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1857–1865.
- Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8789–8797.
- Choi, Y.; Uh, Y.; Yoo, J.; Ha, J.W. StarGAN v2: Diverse Image Synthesis for Multiple Domains. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8185–8194.
- Hinz, T.; Fisher, M.; Wang, O.; Shechtman, E.; Wermter, S. CharacterGAN: Few-Shot Keypoint Character Animation and Reposing. In Proceedings of the WACV, Waikoloa, HI, USA, 4–8 January 2022; pp. 1988–1997.
- Ge, C.; Song, Y.; Ge, Y.; Yang, H.; Liu, W.; Luo, P. Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On. In Proceedings of the CVPR, Virtual, 19–25 June 2021; pp. 16928–16937.
- Gafni, O.; Ashual, O.; Wolf, L. Single-Shot Freestyle Dance Reenactment. In Proceedings of the CVPR, Virtual, 19–25 June 2021; pp. 882–891.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the MICCAI, Munich, Germany, 5–9 October 2015; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241.
- Han, X.; Wu, Z.; Jiang, Y.G.; Davis, L.S. Learning Fashion Compatibility with Bidirectional LSTMs. In Proceedings of the 25th ACM International Conference on Multimedia (MM ’17), Mountain View, CA, USA, 23–27 October 2017; pp. 1078–1086.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Vasileva, M.I.; Plummer, B.A.; Dusad, K.; Rajpal, S.; Kumar, R.; Forsyth, D. Learning Type-Aware Embeddings for Fashion Compatibility. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer Science+Business Media: Cham, Switzerland, 2018; pp. 405–421.
- Veit, A.; Belongie, S.; Karaletsos, T. Conditional Similarity Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1781–1789.
- Santoro, A.; Raposo, D.; Barrett, D.G.; Malinowski, M.; Pascanu, R.; Battaglia, P.; Lillicrap, T. A simple neural network module for relational reasoning. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; Proceedings, Part VII; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image Style Transfer Using Convolutional Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2414–2423.
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved Training of Wasserstein GANs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Smolley, S.P. On the Effectiveness of Least Squares Generative Adversarial Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2947–2960.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X.; Chen, X. Improved Techniques for Training GANs. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 2234–2242.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 448–456.
- Van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Kolesnikov, A.; Beyer, L.; Zhai, X.; Puigcerver, J.; Yung, J.; Gelly, S.; Houlsby, N. Big Transfer (BiT): General Visual Representation Learning. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V; Springer: Berlin/Heidelberg, Germany, 2020; pp. 491–507.
- Sarfraz, M.S.; Sharma, V.; Stiefelhagen, R. Efficient Parameter-free Clustering Using First Neighbor Relations. In Proceedings of the CVPR, Long Beach, CA, USA, 15–20 June 2019; pp. 8934–8943.
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V.S. Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv 2016, arXiv:1607.08022.
Dataset statistics: number of outfits and per-category item images in each split.

| Split | Outfits | Tops | Bottoms | Dresses | Shoes | Bags | Eyeglasses | Earrings |
|---|---|---|---|---|---|---|---|---|
| Train | 2512 | 4146 | 2214 | 346 | 2515 | 2349 | 396 | 594 |
| Validation | 314 | 515 | 281 | 37 | 315 | 285 | 53 | 37 |
| Test | 315 | 519 | 286 | 38 | 317 | 294 | 47 | 74 |
Quantitative comparison with the baselines (IS: higher is better; FID and LPIPS: lower is better).

| Model | IS | FID | LPIPS |
|---|---|---|---|
| DFDGAN | 3.87 ± 0.18 | 80.9 | 0.642 |
| pix2pix | 1.61 ± 0.04 | 226.9 | 0.74 |
| CycleGAN | 1.56 ± 0.14 | 361.8 | 0.83 |
Architecture configuration study results.

| Variant | IS | FID | LPIPS | Variant | IS | FID | LPIPS |
|---|---|---|---|---|---|---|---|
| DFDGAN | 3.87 ± 0.18 | 80.9 | 0.642 | DFDGAN_128 | 2.03 ± 0.07 | 189.8 | 0.71 |
| DFDGAN_multi | 3.57 ± 0.20 | 93.9 | 0.687 | DFDGAN_256 | 3.29 ± 0.09 | 116.3 | 0.700 |
| DFDGAN_concat | 3.55 ± 0.06 | 100.0 | 0.700 | DFDGAN_1024 | 3.39 ± 0.07 | 98.3 | 0.689 |
| DFDGAN_GT | 3.37 ± 0.12 | 132.2 | 0.699 | DFDGAN_2048 | 3.78 ± 0.11 | 82.8 | 0.676 |
| DFDGAN_w/o_Mapping | 3.38 ± 0.11 | 119.3 | 0.660 | DFDGAN_wgan-gp | 3.37 ± 0.15 | 96.8 | 0.682 |
| DFDGAN_vanila_env | 3.35 ± 0.12 | 107.6 | 0.680 | DFDGAN_multi_D | 3.57 ± 0.09 | 96.7 | 0.6557 |
| DFDGAN_layer_9 | 3.20 ± 0.06 | 124.7 | 0.711 | DFDGAN_layer_13 | 2.63 ± 0.11 | 154.4 | 0.770 |
| DFDGAN_layer_5,9 | 3.41 ± 0.12 | 96.3 | 0.696 | DFDGAN_layer_5,9,13 | 3.39 ± 0.08 | 96.5 | 0.698 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).