Efficient Comic Content Extraction and Coloring Composite Networks
Abstract
1. Introduction
- We designed a comic feature extraction network based on ResNet [14] that incorporates a RoIAlign layer and a subdivision layer to improve the detection and extraction of fine-grained features in comic content.
- We created a GAN-based comic coloring module that uses U-Net as the generator and PatchGAN as the discriminator. This combination strengthens the perception of image details and improves the overall color rendering of the image, and the generated color comic images are comparable to those colored manually by comic artists.
- We created KComics5000, a dataset of 5000 pages drawn from Korean comics.
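
The first contribution relies on a RoIAlign layer, the region-pooling operation introduced with Mask R-CNN [27]. As a hedged illustration of what such a layer computes, and not the authors' implementation, the pure-Python sketch below pools a box from a 2D feature map into a fixed grid by averaging bilinearly interpolated samples, so box coordinates are never rounded to the pixel grid. The function names `bilinear` and `roi_align`, the default `sampling_ratio` of 2, the convention that feature values sit at integer coordinates, and the clamping at the map border are all our assumptions.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (y, x).

    Assumption: values sit at integer coordinates and queries are clamped to the map.
    """
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, box, out_h, out_w, sampling_ratio=2, spatial_scale=1.0):
    """Pool box (x1, y1, x2, y2) into an out_h x out_w grid without quantizing coordinates."""
    x1, y1, x2, y2 = (c * spatial_scale for c in box)
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Average sampling_ratio x sampling_ratio evenly spaced points inside bin (i, j).
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    y = y1 + (i + (sy + 0.5) / sampling_ratio) * bin_h
                    x = x1 + (j + (sx + 0.5) / sampling_ratio) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / sampling_ratio ** 2)
        out.append(row)
    return out
```

For example, on an 8 × 8 map whose value equals the column index, `roi_align(feat, (1, 1, 5, 5), 2, 2)` returns `[[2.0, 4.0], [2.0, 4.0]]`, the x-coordinates of the bin centers, because bilinear interpolation reproduces linear data exactly.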
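
The coloring module pairs a U-Net generator with a PatchGAN discriminator, which scores each overlapping image patch as real or fake instead of producing one score per image. This excerpt does not give the discriminator's layer configuration, so the sketch below assumes the widely used 70 × 70 PatchGAN layout from Pix2Pix (five 4 × 4 convolutions with strides 2, 2, 2, 1, 1 and padding 1). The helper names `receptive_field` and `output_size` and the constant `PATCHGAN_70` are hypothetical; the code only computes how large an input region each discriminator output sees and how many outputs a 256 × 256 input yields.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a conv stack.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1
    # Walk back from a single output unit: r_in = (r_out - 1) * stride + kernel.
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_size(size, layers, padding=1):
    """Spatial size of the output map for a square input of side `size`."""
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


# Assumed Pix2Pix-style 70x70 PatchGAN: three stride-2 and two stride-1 4x4 convolutions.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(PATCHGAN_70), output_size(256, PATCHGAN_70))
```

Under these assumptions, each of the 30 × 30 discriminator outputs judges a 70 × 70 region of the input, so the adversarial loss penalizes local texture and color errors rather than averaging them into a single global score.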
2. Related Work
3. Methods
3.1. Intelligent Comic Detection and Recognition Framework
3.2. Comic Coloring Module
4. Results
4.1. Dataset and Implementation Details
4.2. Overall Performance
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
2. Kondo, K.; Hasegawa, T. CNN-based criteria for classifying artists by illustration style. In Proceedings of the 2020 2nd International Conference on Image, Video and Signal Processing, Singapore, 20–22 March 2020; pp. 93–98.
3. Qin, X.; Zhou, Y.; He, Z.; Wang, Y.; Tang, Z. A Faster R-CNN based method for comic characters face detection. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; IEEE: Piscataway, NJ, USA, 2017; Volume 1, pp. 1074–1080.
4. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
5. Xin, H.; Ma, C. Comic text detection and recognition based on deep learning. In Proceedings of the 2021 3rd International Conference on Applied Machine Learning (ICAML), Changsha, China, 23–25 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 20–23.
6. Xianbao, C.; Guihua, Q.; Yu, J.; Zhu, Z. An improved small object detection method based on YOLOv3. Pattern Anal. Appl. 2021, 24, 1347–1355.
7. Soykan, G.; Yuret, D.; Sezgin, T.M. Identity-aware semi-supervised learning for comic character re-identification. arXiv 2023, arXiv:2308.09096.
8. Hinami, R.; Ishiwatari, S.; Yasuda, K.; Matsui, Y. Towards fully automated manga translation. Proc. AAAI Conf. Artif. Intell. 2021, 35, 12998–13008.
9. Sharma, V.; Kukreja, V. Visual Narratives Unveiled: Comic Genre Classification through CNN-SVM Fusion. In Proceedings of the 2024 4th International Conference on Innovative Practices in Technology and Management (ICIPTM), Noida, India, 21–23 February 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
10. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
11. Shimizu, Y.; Furuta, R.; Ouyang, D.; Taniguchi, Y.; Hinami, R.; Ishiwatari, S. Painting style-aware manga colorization based on generative adversarial networks. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1739–1743.
12. Liu, Y.; Guo, Z.; Guo, H.; Xiao, H. Zoom-GAN: Learn to colorize multi-scale targets. Vis. Comput. 2023, 39, 3299–3310.
13. Parmar, G.; Kumar Singh, K.; Zhang, R.; Li, Y.; Lu, J.; Zhu, J.Y. Zero-shot image-to-image translation. In ACM SIGGRAPH 2023 Conference Proceedings, Los Angeles, CA, USA, 6–10 August 2023; pp. 1–11.
14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
15. Arai, K.; Tolle, H. Method for real time text extraction of digital manga comic. Int. J. Image Process. (IJIP) 2011, 4, 669–676.
16. Rigaud, C.; Burie, J.C.; Ogier, J.M. Text-independent speech balloon segmentation for comics and manga. In Graphic Recognition, Current Trends and Challenges: 11th International Workshop, GREC 2015, Nancy, France, 22–23 August 2015; Revised Selected Papers 11; Springer: Cham, Switzerland, 2017; pp. 133–147.
17. Nguyen, N.; Rigaud, C.; Burie, J. Comic characters detection using deep learning. In Proceedings of the 2nd International Workshop on coMics Analysis, Processing, and Understanding, 14th IAPR International Conference on Document Analysis and Recognition, ICDAR 2017, Kyoto, Japan, 9–15 November 2017; pp. 41–46.
18. Han, X.; Chang, J.; Wang, K. You only look once: Unified, real-time object detection. Procedia Comput. Sci. 2021, 183, 61–72.
19. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
20. Dutta, A.; Biswas, S.; Das, A.K. CNN-based segmentation of speech balloons and narrative text boxes from comic book page images. Int. J. Doc. Anal. Recognit. (IJDAR) 2021, 24, 49–62.
21. Dubray, D.; Laubrock, J. Deep CNN-based speech balloon detection and segmentation for comic books. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1237–1243.
22. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
23. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18; Springer: Cham, Switzerland, 2015; pp. 234–241.
24. Chen, S.Y.; Zhang, J.Q.; Gao, L.; He, Y.; Xia, S.; Shi, M.; Zhang, F.L. Active colorization for cartoon line drawings. IEEE Trans. Vis. Comput. Graph. 2020, 28, 1198–1208.
25. Dou, Z.; Wang, N.; Li, B.; Wang, Z.; Li, H.; Liu, B. Dual color space guided sketch colorization. IEEE Trans. Image Process. 2021, 30, 7292–7304.
26. Liu, X.; Wu, W.; Li, C.; Li, Y.; Wu, H. Reference-guided structure-aware deep sketch colorization for cartoons. Comput. Vis. Media 2022, 8, 135–148.
27. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
28. Kirillov, A.; Wu, Y.; He, K.; Girshick, R. PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9799–9808.
29. Guérin, C.; Rigaud, C.; Mercier, A.; Ammar-Boudjelal, F.; Bertet, K.; Bouju, A.; Burie, J.C.; Louis, G.; Ogier, J.M.; Revel, A. eBDtheque: A representative database of comics. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 1145–1149.
30. Li, Y.; Aizawa, K.; Matsui, Y. Manga109Dialog: A Large-Scale Dialogue Dataset for Comics Speaker Detection. In Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada, 15–19 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
31. Nguyen, N.V.; Rigaud, C.; Burie, J.C. Digital comics image indexing based on deep learning. J. Imaging 2018, 4, 89.

| Method | Recall (%) | Precision (%) | F1 (%) |
|---|---|---|---|
| Arai and Tolle [15] | 69.8 | 62.3 | 63.6 |
| Qin et al. [3] | 78.7 | 75.1 | 75.4 |
| Nguyen et al. [17] | 83.2 | 84.0 | 82.7 |
| Dutta et al. [20] | 84.6 | 85.1 | 85.3 |
| Xin and Ma [5] | 88.5 | 87.9 | 88.4 |
| Dubray and Laubrock [21] | 90.7 | 91.6 | 91.1 |
| Ours | 95.3 | 98.8 | 98.5 |
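
F1 is conventionally the harmonic mean of precision and recall, F1 = 2PR / (P + R). This excerpt does not state how the F1 column above was computed, so the short check below assumes this standard definition and applies it to two rows of the table; the function name `f1_score` and the `rows` dictionary are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs copied from the table above.
rows = {
    "Dubray and Laubrock [21]": (90.7, 91.6),
    "Ours": (95.3, 98.8),
}
for name, (recall, precision) in rows.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.1f}")
```

Under this definition the Dubray and Laubrock row is reproduced (91.1), while the Ours row evaluates to about 97.0 rather than the listed 98.5, so the F1 column was presumably computed with a different averaging scheme or contains a transcription error.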

| Model | FID (lower is better) | SSIM (higher is better) |
|---|---|---|
| Pix2Pix | 103.25 | 0.53 |
| S2PV3 | 79.01 | 0.61 |
| DCSGAN | 54.36 | 0.69 |
| Zoom-GAN | 34.75 | 0.87 |
| VAE-GAN | 27.09 | 0.91 |
| Liu et al. [12] | 25.16 | 0.92 |
| Ours | 24.58 | 0.94 |
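
SSIM compares two images through their means, variances, and covariance. The standard metric evaluates this expression in local windows (commonly an 11 × 11 Gaussian window) and averages the results; the pure-Python sketch below is a simplified single-window version computed over the whole image, with the common constants K1 = 0.01 and K2 = 0.03 and an 8-bit data range of 255 assumed. The function name `global_ssim` is hypothetical, and the sketch only illustrates what the SSIM column measures; it is not the paper's evaluation code.

```python
def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two equal-length flattened grayscale images."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance over all pixels.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants that keep the ratio defined for flat images.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    luminance_contrast = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    normalizer = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return luminance_contrast / normalizer


print(global_ssim([10, 20, 30, 40], [10, 20, 30, 40]))
print(global_ssim([10, 20, 30, 40], [40, 30, 20, 10]))
```

Identical inputs score 1 up to rounding, and the score drops toward 0, or below it when the structure is inverted, as the images diverge; higher values in the table therefore indicate colorizations structurally closer to the reference images.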
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Man, Q.; Cho, Y.-I. Efficient Comic Content Extraction and Coloring Composite Networks. Appl. Sci. 2025, 15, 2641. https://doi.org/10.3390/app15052641