Advancing Ancient Artifact Character Image Augmentation through Styleformer-ART for Sustainable Knowledge Preservation
Abstract
1. Introduction
- We introduce Styleformer-ART, a transformer-based generative model: we modified the encoder architecture of the Styleformer model, which builds on StyleGAN2, to suit the intricate requirements of generating high-quality synthetic artifact character images;
- We demonstrate the use of augmented images to train a recognition model for ancient artifact characters, showing how augmentation improves the training and performance of the recognizer, in our case a CNN model for artifact character recognition;
- Through a comparative evaluation, we show that Styleformer-ART outperforms other state-of-the-art GAN models at generating artifact character images for recognition tasks. A minimal sketch of the resulting augmentation-plus-recognition pipeline follows this list.
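To make the contribution concrete, the sketch below shows the shape of the augmentation-plus-recognition pipeline: a generator produces synthetic character images that are mixed with the real labelled set before training a CNN recognizer. The `StyleformerART` and `ArtifactCNN` classes here are illustrative stand-ins under assumed settings (32 × 32 grayscale images, a 512-dimensional latent), not the authors' released implementation.

```python
# Illustrative pipeline sketch: synthetic-image generation feeding a CNN
# recognizer. Both classes are hypothetical stand-ins, not the paper's code.
import torch
import torch.nn as nn

class StyleformerART(nn.Module):
    """Stand-in generator mapping a latent vector to a 32x32 grayscale image."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 32 * 32), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).view(-1, 1, 32, 32)

class ArtifactCNN(nn.Module):
    """Stand-in recognizer for the 10 character classes listed in Table 1."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Sample synthetic images from a (notionally pre-trained) generator; in the
# actual pipeline these would be merged with the real labelled images.
generator = StyleformerART().eval()
with torch.no_grad():
    synthetic = generator(torch.randn(64, generator.latent_dim))
logits = ArtifactCNN()(synthetic)
print(synthetic.shape, logits.shape)  # (64, 1, 32, 32) and (64, 10)
```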
2. Related Work
2.1. Character Detection on Artifacts
2.2. Data Augmentation Techniques
2.3. Challenges and Opportunities
3. Dataset
- The Scarcity of Data: The primary challenge lies in the limited availability of labeled images of ancient artifacts; such data are not available in the large quantities needed to train machine learning models effectively [2,11]. Ancient artifacts with detailed, verified annotations are rare, which makes the collection process difficult and time-consuming;
- Historical and Geographical Diversity: The dataset encompasses artifacts from various historical periods and geographical locations, introducing significant variability in artifact types and character styles [1,11]. This diversity, while valuable for training robust models, complicates collection, since it requires sourcing across the full range of periods and regions;
- Character Frequency: Restricting the dataset to the 10 most frequently encountered English characters reflects the difficulty of assembling a balanced dataset covering the full English alphabet. As Table 1 shows, the character distribution is uneven, with some characters far more prevalent than others; this imbalance can limit the model’s ability to generalize across all characters. One common mitigation, inverse-frequency class weighting, is sketched after this list.
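A common remedy for the imbalance described above (shown here for illustration only; the paper itself addresses the problem through augmentation) is inverse-frequency class weighting in the recognizer's loss. A minimal sketch, using the sample counts from Table 1:

```python
# Inverse-frequency class weights derived from the Table 1 sample counts.
# This is a generic imbalance remedy shown for illustration, not a step
# claimed by the paper.
counts = {"A": 85, "D": 24, "E": 75, "I": 46, "L": 37,
          "N": 45, "T": 58, "S": 31, "R": 57, "O": 43}

total = sum(counts.values())   # 501 samples in total
num_classes = len(counts)      # 10 character classes

# weight_c = total / (num_classes * count_c): rarer classes weigh more.
weights = {c: total / (num_classes * n) for c, n in counts.items()}

for char, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{char}: {w:.2f}")
# 'D' (24 samples) receives the largest weight; 'A' (85 samples) the smallest.
```

These weights can be passed, for example, to a weighted cross-entropy loss so that errors on under-represented characters such as 'D' and 'S' are penalized more heavily.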
4. Methodology
4.1. Dataset Preparation
4.2. Model Design
4.2.1. Styleformer-ART for Data Augmentation
4.2.2. CNN Model for Data Recognition
4.3. Data Augmentation
4.4. Experimental Setup
4.5. Evaluation Metrics
5. Results and Evaluation
6. Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Assael, Y.; Sommerschield, T.; Shillingford, B.; Bordbar, M.; Pavlopoulos, J.; Chatzipanagiotou, M.; Androutsopoulos, I.; Prag, J.; de Freitas, N. Restoring and attributing ancient texts using deep neural networks. Nature 2022, 603, 280–283. [Google Scholar] [CrossRef] [PubMed]
- Narang, S.R.; Kumar, M.; Jindal, M.K. DeepNetDevanagari: A deep learning model for Devanagari ancient character recognition. Multimed. Tools Appl. 2021, 80, 20671–20686. [Google Scholar] [CrossRef]
- Huang, H.; Yang, D.; Dai, G.; Han, Z.; Wang, Y.; Lam, K.M.; Yang, F.; Huang, S.; Liu, Y.; He, M. AGTGAN: Unpaired Image Translation for Photographic Ancient Character Generation. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022. [Google Scholar]
- Casini, L.; Marchetti, N.; Montanucci, A.; Orrù, V.; Roccetti, M. A human–AI collaboration workflow for archaeological sites detection. Sci. Rep. 2023, 13, 8699. [Google Scholar] [CrossRef] [PubMed]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.C.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Alqahtani, H.; Kavakli-Thorne, M.; Kumar, G. Applications of Generative Adversarial Networks (GANs): An Updated Review. Arch. Comput. Methods Eng. 2019, 28, 525–552. [Google Scholar] [CrossRef]
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved Training of Wasserstein GANs. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8107–8116. [Google Scholar]
- Warde-Farley, D.; Bengio, Y. Improving Generative Adversarial Networks with Denoising Feature Matching. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Fontanella, F.; Colace, F.; Molinara, M.; di Freca, A.S.; Stanco, F. Pattern recognition and artificial intelligence techniques for cultural heritage. Pattern Recognit. Lett. 2020, 138, 23–29. [Google Scholar] [CrossRef]
- Yalin, M.; Li, L.; Ji, Y.; Li, G. Research on denoising method of Chinese ancient character image based on Chinese character writing standard model. Sci. Rep. 2022, 12, 19795. [Google Scholar] [CrossRef] [PubMed]
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
- Ding, X.; Wang, Y.; Xu, Z.; Welch, W.J.; Wang, Z.J. CcGAN: Continuous Conditional Generative Adversarial Networks for Image Generation. arXiv 2020, arXiv:2011.07466. [Google Scholar]
- Midoh, Y.; Nakamae, K. Image quality enhancement of a CD-SEM image using conditional generative adversarial networks. In Proceedings of the Advanced Lithography, San Jose, CA, USA, 24–28 February 2019. [Google Scholar]
- Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239. [Google Scholar]
- Park, J.; Kim, Y. Styleformer: Transformer-based Generative Adversarial Networks with Style Vector. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8973–8982. [Google Scholar]
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110. [Google Scholar] [CrossRef]
- Abdulraheem, A.; Suleiman, J.T.; Jung, I.Y. Generative Adversarial Network Models for Augmenting Digit and Character Datasets Embedded in Standard Markings on Ship Bodies. Electronics 2023, 12, 3668. [Google Scholar] [CrossRef]
- Hidayat, A.A.; Purwandari, K.; Cenggoro, T.W.; Pardamean, B. A Convolutional Neural Network-based Ancient Sundanese Character Classifier with Data Augmentation. Procedia Comput. Sci. 2021, 179, 195–201. [Google Scholar] [CrossRef]
- Jindal, A.; Ghosh, R. An optimized CNN system to recognize handwritten characters in ancient documents in Grantha script. Int. J. Inf. Technol. 2023, 15, 1975–1983. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Cazenavette, G.; de Guevara, M.L. MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation. arXiv 2021, arXiv:2105.14110. [Google Scholar]
- Emami, H.; Aliabadi, M.M.; Dong, M.; Chinnam, R.B. SPA-GAN: Spatial Attention GAN for Image-to-Image Translation. IEEE Trans. Multimed. 2019, 23, 391–401. [Google Scholar] [CrossRef]
- Guha, R.; Das, N.; Kundu, M.; Nasipuri, M.; Santosh, K.C. DevNet: An Efficient CNN Architecture for Handwritten Devanagari Character Recognition. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2052009. [Google Scholar] [CrossRef]
- Driss, S.B.; Soua, M.; Kachouri, R.; Akil, M. A comparison study between MLP and convolutional neural network models for character recognition. In Proceedings of the Commercial + Scientific Sensing and Imaging, Anaheim, CA, USA, 9–13 April 2017. [Google Scholar]
- Bhardwaj, A. An Accurate Deep-Learning Model for Handwritten Devanagari Character Recognition. Int. J. Mech. Eng. 2022, 7, 1317–1328. [Google Scholar]
- Abdulraheem, A.; Jung, I.Y. Effective Digital Technology Enabling Automatic Recognition of Special-Type Marking of Expiry Dates. Sustainability 2023, 15, 12915. [Google Scholar] [CrossRef]
- Corazza, M.; Tamburini, F.; Valério, M.; Ferrara, S. Unsupervised deep learning supports reclassification of Bronze age cypriot writing system. PLoS ONE 2022, 17, e0269544. [Google Scholar] [CrossRef]
- Wu, J.; Huang, Z.; Thoma, J.; Acharya, D.; Gool, L.V. Wasserstein Divergence for GANs. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Odena, A.; Olah, C.; Shlens, J. Conditional Image Synthesis with Auxiliary Classifier GANs. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
- Dimitrakopoulos, P.; Sfikas, G.; Nikou, C. Wind: Wasserstein Inception Distance For Evaluating Generative Adversarial Network Performance. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 3182–3186. [Google Scholar]
- Yu, Y.; Zhang, W.; Deng, Y. Fréchet Inception Distance (FID) for Evaluating GANs; China University of Mining Technology Beijing Graduate School: Xuzhou, China, 2021. [Google Scholar]
- Benny, Y.; Galanti, T.; Benaim, S.; Wolf, L. Evaluation Metrics for Conditional Image Generation. Int. J. Comput. Vis. 2020, 129, 1712–1731. [Google Scholar] [CrossRef]
- Betzalel, E.; Penso, C.; Navon, A.; Fetaya, E. A Study on the Evaluation of Generative Models. arXiv 2022, arXiv:2206.10935. [Google Scholar]
- Kynkaanniemi, T.; Karras, T.; Aittala, M.; Aila, T.; Lehtinen, J. The Role of ImageNet Classes in Fréchet Inception Distance. arXiv 2022, arXiv:2203.06026. [Google Scholar]
Table 1. Number of samples per character class in the dataset.

Characters | Number of Samples
--- | ---
A | 85
D | 24
E | 75
I | 46
L | 37
N | 45
T | 58
S | 31
R | 57
O | 43
Table 2. FID scores for different Styleformer-ART encoder configurations (with and without Linformer, at two depths and minimum head counts); lower is better.

Linformer | Depth | Minimum Heads | FID
--- | --- | --- | ---
✓ | 32 | 1 | 191
✗ | 32 | 1 | 141
✗ | 64 | 1 | 140
✓ | 64 | 1 | 198
✓ | 32 | 2 | 194
✗ | 32 | 2 | 165
✓ | 64 | 2 | 201
✗ | 64 | 2 | 175
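Both this table and the per-character comparison that follows report the Fréchet Inception Distance, FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_rΣ_g)^{1/2}), computed between Inception features of real and generated images [33]. Below is a minimal NumPy/SciPy sketch of this standard computation; it assumes the Inception-v3 feature extraction has already been performed and is not the paper's exact evaluation code.

```python
# Standard FID between two feature sets — a generic sketch, assuming
# pre-extracted Inception-v3 activations (the extraction step is omitted).
import numpy as np
from scipy import linalg

def fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)
    # Matrix square root of the covariance product; numerical error can
    # introduce a tiny imaginary component, which is discarded.
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

# Toy usage with random features; real evaluations use Inception activations.
rng = np.random.default_rng(0)
print(fid(rng.normal(size=(200, 64)), rng.normal(0.5, 1.0, size=(200, 64))))
```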
Table 3. Per-character FID scores of Styleformer-ART versus baseline GAN models (lower is better).

Characters | WGANGP [7] | WGANDIV [30] | ACGAN [31] | CCGAN [13] | Styleformer-ART
--- | --- | --- | --- | --- | ---
A | 315.88 | 314.92 | 456.87 | 399.71 | 146.88
D | 224.14 | 225.74 | 421.86 | 283.15 | 106.95
E | 291.64 | 293.71 | 415.85 | 352.91 | 155.80
I | 287.84 | 291.25 | 488.31 | 379.81 | 248.59
L | 276.03 | 282.71 | 491.18 | 409.40 | 215.04
N | 301.49 | 307.07 | 488.31 | 408.13 | 291.00
O | 249.46 | 249.24 | 396.67 | 341.99 | 216.54
R | 286.43 | 288.35 | 476.93 | 347.79 | 135.83
S | 224.90 | 229.34 | 469.72 | 341.50 | 131.50
T | 285.08 | 291.28 | 484.30 | 372.63 | 267.30
Average | 274.29 | 277.36 | 459.00 | 363.70 | 210.72
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).