Analytical Evaluation of Midjourney Architectural Virtual Lab: Defining Major Current Limits in AI-Generated Representations of Islamic Architectural Heritage
Abstract
1. Introduction
- To what extent can Midjourney produce images of buildings and sites close to their original forms?
- What are this tool’s limits in producing these images, and what are the various intertwined factors that affect these limits?
- Evaluating the ability of Midjourney to produce images of buildings and sites close to their original forms.
- Defining the limits of this tool in producing these images and the various intertwined factors that affect these limits.
2. Literature Review
2.1. Technical Dimensions
2.2. Recent Approaches
3. Methods and Materials
Heatmap: This visual representation illustrates the likelihood that each segment of an image attracts attention within the initial 3–5 seconds of observation. Features perceived during this brief timeframe carry a heightened potential for capturing the audience’s attention.
Hotspots: This numerical simplification of the heatmap reveals the content most likely to be observed. Each region is assigned a numeric score that predicts the probability of viewers directing their attention to that region during the first 3–5 seconds.
Gaze Sequence: This component delineates the four most probable gaze locations, ordered by their anticipated viewing sequence within the first 3–5 seconds.
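The relationship between the three outputs can be illustrated with a minimal Python sketch. It is not the attention software's actual model, which is not described here. It assumes a heatmap stored as a grid of per-pixel attention probabilities, hypothetical rectangular regions, a hotspot score taken as the mean probability inside each region, and a gaze sequence approximated by ranking regions by that score. The function names, region names, and toy values are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

Heatmap = List[List[float]]         # rows of per-pixel attention probabilities in [0, 1]
Region = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def hotspot_scores(heatmap: Heatmap, regions: Dict[str, Region]) -> Dict[str, float]:
    """Reduce a heatmap to one score per named region (mean probability).

    Assumption: the mean is a stand-in for whatever scoring the real tool uses.
    """
    scores: Dict[str, float] = {}
    for name, (top, left, bottom, right) in regions.items():
        cells = [heatmap[r][c] for r in range(top, bottom) for c in range(left, right)]
        scores[name] = sum(cells) / len(cells)
    return scores


def gaze_sequence(scores: Dict[str, float], k: int = 4) -> List[str]:
    """Rank regions from highest to lowest score and keep the first k.

    Assumption: viewing order is approximated by score rank.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 4x4 heatmap with hypothetical values (not data from the study).
toy_heatmap: Heatmap = [
    [0.9, 0.8, 0.1, 0.1],
    [0.7, 0.6, 0.1, 0.3],
    [0.2, 0.2, 0.5, 0.5],
    [0.0, 0.0, 0.4, 0.6],
]

# Hypothetical regions, one per quadrant of the toy image.
toy_regions: Dict[str, Region] = {
    "dome": (0, 0, 2, 2),
    "sky": (0, 2, 2, 4),
    "courtyard": (2, 0, 4, 2),
    "minaret": (2, 2, 4, 4),
}

if __name__ == "__main__":
    toy_scores = hotspot_scores(toy_heatmap, toy_regions)
    for region_name in gaze_sequence(toy_scores):
        print(f"{region_name}: {toy_scores[region_name]:.2f}")
```

Comparing the ranked regions of an AI-generated image with those of the corresponding photograph is one simple way such outputs could be set side by side.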
- Rate the similarity between (A), which shows an AI-generated image of the Ka‘ba, and (B), which shows an actual photo.
- Rate the similarity between (A), which shows an AI-generated image of the Dome of the Rock (Qubbat al-Sakhra), and (B), which shows an actual photo.
- Rate the similarity between (A), which shows an AI-generated image of the spiral minaret of Ibn Tulun Mosque in Cairo, Egypt, and (B), which shows an actual photo.
- Rate the similarity between (A), which shows an AI-generated image of the Ibn Tulun Mosque in Cairo, and (B), which shows an actual photo.
- The following images show the minarets as crucial architectural elements in different sub-regions around the Islamic world. Please attempt to recognize the regions for each image (from A to D).
4. Results
- Limits of the prompt
  - Length
  - Language
  - Numeracy
  - Controllability
- Limits of fame
- Limits of regionality and historical styles
- Limits of architectural and urban elements and details
5. Discussions
5.1. The Limits
5.1.1. Limits of the Prompt
- Length
- Language
- Numeracy
- Controllability
5.1.2. Limits of Fame
5.1.3. Limits of Regionality and Historical Styles
5.1.4. Limits of Architectural and Urban Elements and Details
5.2. Validation
5.2.1. Technical Validation using Visual Attention Analysis
5.2.2. Human Validation through Survey
6. Conclusions
6.1. Research Limitation
6.2. Future Research
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Limitations | Factors
---|---
Limits of the prompt |
Length | Long/medium/short
Language | English, Arabic, etc.
Numeracy | One, two, three, etc.
Controllability | Controllable/less controllable/uncontrollable
Limits of fame | Famous/less famous
Limits of regionality and historical styles | Arabia, Levant, North Africa, Far East, etc.; Early Islamic, Ottoman, Mamluk, etc.
Limits of architectural and urban elements and details | Calligraphy, arabesque, ornaments, etc.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sukkar, A.W.; Fareed, M.W.; Yahia, M.W.; Abdalla, S.B.; Ibrahim, I.; Senjab, K.A.K. Analytical Evaluation of Midjourney Architectural Virtual Lab: Defining Major Current Limits in AI-Generated Representations of Islamic Architectural Heritage. Buildings 2024, 14, 786. https://doi.org/10.3390/buildings14030786