Feature Papers in Artificial Intelligence

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (30 September 2025) | Viewed by 10066

Special Issue Editors


Guest Editor
School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
Interests: machine learning; pattern recognition; computer vision

Guest Editor
BISITE Research Group, University of Salamanca, Edificio Multiusos I + D + I, 37007 Salamanca, Spain
Interests: artificial intelligence; multi-agent systems; cloud computing and distributed systems; technology-enhanced learning

Guest Editor
Department of Automatics and Applied Software, Faculty of Engineering, Aurel Vlaicu University of Arad, 310130 Arad, Romania
Interests: intelligent systems; soft computing; fuzzy control; modeling and simulation; biometrics

Special Issue Information

Dear Colleagues,

Artificial intelligence has revolutionized numerous aspects of our society. Foundation models and embodied intelligence are expanding the frontiers of AI research, and the convergence of AI with fundamental sciences has accelerated scientific discovery across disciplines. This feature paper Special Issue aims to showcase cutting-edge research that advances both the theoretical understanding and practical applications of AI technologies.

We welcome high-quality contributions spanning the full spectrum of AI research, from novel methodologies and theoretical frameworks to transformative applications. The scope also encompasses research on AI system development challenges, such as scaling approaches, low-cost deployment, system reliability, trustworthiness, and human–AI interaction paradigms. We also encourage submissions that bridge AI with other scientific fields, such as drug discovery and materials science. The topics of interest for this Special Issue include, but are not limited to, the following:

  • Machine learning;
  • Optimization and statistical learning;
  • Deep learning and neural networks;
  • Knowledge representation and reasoning;
  • Foundation models;
  • Autonomous systems and robotics;
  • AI for science;
  • AI for societal impact.

Prof. Dr. Xin Geng
Prof. Dr. George Papakostas
Prof. Dr. Fernando De la Prieta Pintado
Prof. Dr. Valentina E. Balas
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • machine learning
  • deep learning
  • neural networks
  • autonomous systems
  • robotics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)


Research


25 pages, 2538 KB  
Article
Fic2Bot: A Scalable Framework for Persona-Driven Chatbot Generation from Fiction
by Sua Kang, Chaelim Lee, Subin Jung and Minsu Lee
Electronics 2025, 14(19), 3859; https://doi.org/10.3390/electronics14193859 - 29 Sep 2025
Abstract
This paper presents Fic2Bot, an end-to-end framework that automatically transforms raw novel text into in-character chatbots by combining scene-level retrieval with persona profiling. Unlike conventional RAG-based systems that emphasize factual accuracy but neglect stylistic coherence, Fic2Bot ensures both factual grounding and consistent persona expression without any manual intervention. The framework integrates (1) Major Entity Identification (MEI) for robust coreference resolution, (2) scene-structured retrieval for precise contextual grounding, and (3) stylistic and sentiment profiling to capture linguistic and emotional traits of each character. Experiments conducted on novels from diverse genres show that Fic2Bot achieves robust entity resolution, more relevant retrieval, highly accurate speaker attribution, and stronger persona consistency in multi-turn dialogues. These results highlight Fic2Bot as a scalable and domain-agnostic framework for persona-driven chatbot generation, with potential applications in interactive roleplaying, language and literary studies, and entertainment.
(This article belongs to the Special Issue Feature Papers in Artificial Intelligence)

11 pages, 935 KB  
Article
Frequency-Aware Residual Networks with Masked Dual-Path Convolution for Image Classification
by Jisoo Baik, Youngbeom Jung and Kyujoong Lee
Electronics 2025, 14(18), 3690; https://doi.org/10.3390/electronics14183690 - 18 Sep 2025
Viewed by 200
Abstract
Recent deep learning-based models have demonstrated outstanding performance in various computer vision tasks, including image classification. Among them, ResNet is a representative model that achieves high accuracy by utilizing a deep network architecture. However, as the depth of ResNet increases, the accumulated computational cost leads to higher overall FLOPs, which in turn results in increased computational burden and processing time. To address this issue, this paper proposes a novel model architecture that improves computational efficiency while maintaining accuracy. The proposed model reduces the total computational cost by separating the input image into high-frequency and low-frequency regions and applying convolution operations specialized to each region. Experimental results show that the proposed method reduces FLOPs by up to 29% compared to the original ResNet while maintaining competitive classification performance. In addition, by incorporating depthwise convolution instead of standard convolution, the model achieves an average reduction of 14.3% in operations compared to the standard multi-path model while reducing top-1 accuracy by only 0.18%.
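The core idea of the frequency split can be sketched in a few lines of NumPy. This is a toy stand-in, assuming an FFT low-pass mask and naive per-band "valid" convolutions with random kernels; it is not the authors' architecture, which uses learned convolution paths inside a ResNet:

```python
import numpy as np

def split_frequency_bands(img, cutoff=0.25):
    """Split an image into low/high-frequency components via an FFT mask."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    # circular low-pass mask centred on the DC component
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= cutoff * min(h, w)
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
    high = img - low          # the two bands sum back to the original image
    return low, high

def conv2d(x, k):
    """Naive 'valid' 2D convolution (stand-in for the per-band conv paths)."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
low, high = split_frequency_bands(img)
assert np.allclose(low + high, img)
# apply a different kernel to each band, then merge the two paths
out = conv2d(low, rng.standard_normal((3, 3))) + conv2d(high, rng.standard_normal((3, 3)))
print(out.shape)  # (30, 30)
```

The paper's savings come from giving the (smoother) low-frequency band a cheaper path, e.g. depthwise convolution; the sketch only shows the split-process-merge shape of the computation.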

23 pages, 8508 KB  
Article
A Short-Term User-Side Load Forecasting Method Based on the MCPO-VMD-FDFE Decomposition-Enhanced Framework
by Yu Du, Jiaju Shi, Xun Dou and Yu He
Electronics 2025, 14(18), 3611; https://doi.org/10.3390/electronics14183611 - 11 Sep 2025
Viewed by 240
Abstract
With the transition of the energy structure and the continuous development of smart grids, short-term user-side load forecasting plays a key role in fine power dispatch and efficient system operation. However, existing parameter optimization methods lack multi-dimensional and physically interpretable fitness evaluation. They also fail to fully exploit frequency-domain features of decomposed modal components. These limitations reduce model accuracy and robustness in complex scenarios. To address this issue, this paper proposes a short-term user-side load forecasting method based on the MCPO-VMD-FDFE decomposition-enhanced framework. Firstly, a multi-dimensional fitness function is designed using indicators such as modal energy entropy and energy concentration. The Crested Porcupine Optimizer with Multidimensional Fitness Function (MCPO) algorithm is applied in VMD (Variational Mode Decomposition) to optimize the number of decomposition modes (K) and the penalty factor (α), thereby improving decomposition quality. Secondly, each IMF component obtained from VMD is analyzed by FFT. Key frequency components are selectively enhanced based on adaptive thresholds and weight coefficients to improve feature expression. Finally, a multi-scale convolution module is added to the PatchTST model to enhance its ability to capture local and multi-scale temporal features. The enhanced IMF components are fed into the improved model for prediction, and the final output is obtained by aggregating the results of all components. Experimental results show that the proposed method achieves the best performance on user-side load datasets for weekdays, Saturdays, and Sundays. The RMSE is reduced by 45.65% overall, confirming the effectiveness of the proposed approach in short-term user-side load forecasting tasks.
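The frequency-domain feature enhancement step, selectively boosting dominant spectral bins of an IMF component, can be sketched as follows. The fixed mean-plus-one-std threshold and the constant gain used here are illustrative assumptions standing in for the paper's adaptive thresholds and weight coefficients:

```python
import numpy as np

def enhance_key_frequencies(signal, weight=1.5):
    """Boost dominant frequency bins of a real-valued component signal.

    Bins whose magnitude exceeds (mean + 1 std) of the magnitude spectrum
    are scaled by `weight`; all other bins pass through unchanged.
    """
    spectrum = np.fft.rfft(signal)
    mag = np.abs(spectrum)
    threshold = mag.mean() + mag.std()          # illustrative adaptive threshold
    gains = np.where(mag > threshold, weight, 1.0)
    return np.fft.irfft(spectrum * gains, n=len(signal))

t = np.linspace(0, 1, 256, endpoint=False)
# a strong 5 Hz component plus a weak 40 Hz one
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 40 * t)
y = enhance_key_frequencies(x)
# the dominant 5 Hz bin is amplified by the weight factor
print(round(np.abs(np.fft.rfft(y))[5] / np.abs(np.fft.rfft(x))[5], 2))  # 1.5
```

In the paper, each enhanced IMF is then fed to the modified PatchTST predictor; the sketch only covers the spectral-enhancement stage.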

16 pages, 4411 KB  
Article
Interpretable Deep Prototype-Based Neural Networks: Can a 1 Look like a 0?
by Esteban García-Cuesta, Daniel Manrique and Radu Constantin Ionescu
Electronics 2025, 14(18), 3584; https://doi.org/10.3390/electronics14183584 - 10 Sep 2025
Viewed by 605
Abstract
Prototype-Based Networks (PBNs) are inherently interpretable architectures that facilitate understanding of model outputs by analyzing the activation of specific neurons—referred to as prototypes—during the forward pass. The learned prototypes serve as transformations of the input space into a latent representation that more effectively encapsulates the main characteristics shared across data samples, thereby enhancing classification performance. Crucially, these prototypes can be decoded and projected back into the original input space, providing direct interpretability of the features learned by the network. While this characteristic marks a meaningful advancement toward the realization of fully interpretable artificial intelligence systems, our findings reveal that prototype representations can be deliberately or inadvertently manipulated without compromising the superficial appearance of explainability. In this study, we conduct a series of empirical investigations that demonstrate this phenomenon, framing it as a structural paradox potentially intrinsic to the architecture or its design, which may represent a significant robustness challenge for explainable AI methodologies.

31 pages, 1503 KB  
Article
From Games to Understanding: Semantrix as a Testbed for Advancing Semantics in Human–Computer Interaction with Transformers
by Javier Sevilla-Salcedo, José Carlos Castillo Montoya, Álvaro Castro-González and Miguel A. Salichs
Electronics 2025, 14(17), 3480; https://doi.org/10.3390/electronics14173480 - 31 Aug 2025
Viewed by 561
Abstract
Despite rapid progress in natural language processing, current interactive AI systems continue to struggle with interpreting ambiguous, idiomatic, and contextually rich human language, a barrier to natural human–computer interaction. Many deployed applications, such as language games or educational tools, showcase surface-level adaptation but do not systematically probe or advance the deeper semantic understanding of user intent in open-ended, creative settings. In this paper, we present Semantrix, a web-based semantic word-guessing platform, not merely as a game but as a living testbed for evaluating and extending the semantic capabilities of state-of-the-art Transformer models in human-facing contexts. Semantrix challenges models to both assess the nuanced meaning of user guesses and generate dynamic, context-sensitive hints in real time, exposing the system to the diversity, ambiguity, and unpredictability of genuine human interaction. To empirically investigate how advanced semantic representations and adaptive language feedback affect user experience, we conducted a preregistered 2 × 2 factorial study (N = 42), independently manipulating embedding depth (Transformers vs. Word2Vec) and feedback adaptivity (dynamic hints vs. minimal feedback). Our findings revealed that only the combination of Transformer-based semantic modelling and adaptive hint generation sustained user engagement, motivation, and enjoyment; conditions lacking either component led to pronounced attrition, highlighting the limitations of shallow or static approaches. Beyond benchmarking game performance, we argue that the methodologies applied in platforms like Semantrix are helpful for improving machine understanding of natural language, paving the way for more robust, intuitive, and human-aligned AI approaches.

22 pages, 6033 KB  
Article
High-Density Neuromorphic Inference Platform (HDNIP) with 10 Million Neurons
by Yue Zuo, Ning Ning, Ke Cao, Rui Zhang, Cheng Fu, Shengxin Wang, Liwei Meng, Ruichen Ma, Guanchao Qiao, Yang Liu and Shaogang Hu
Electronics 2025, 14(17), 3412; https://doi.org/10.3390/electronics14173412 - 27 Aug 2025
Viewed by 513
Abstract
Modern neuromorphic processors exhibit neuron densities that are orders of magnitude lower than those of the biological cortex, hindering the deployment of large-scale spiking neural networks (SNNs) on single chips. To bridge this gap, we propose HDNIP, a 40 nm high-density neuromorphic inference platform with a density-first architecture. By eliminating area-intensive on-chip SRAM and using 1280 compact cores with a time-division multiplexing factor of up to 8192, HDNIP integrates 10 million neurons and 80 billion synapses within a 44.39 mm2 synthesized area. This achieves an unprecedented neuron density of 225 k neurons/mm2, over 100 times greater than prior art. The resulting bandwidth challenges are mitigated by a ReRAM-based near-memory computation strategy combined with input reuse, reducing off-chip data transfer by approximately 95%. Furthermore, adaptive TDM and dynamic core fusion ensure high hardware utilization across diverse network topologies. Emulator-based validation using large SNNs demonstrates a throughput of 13 GSOP/s at a low power consumption of 146 mW. HDNIP establishes a scalable pathway towards single-chip, low-SWaP neuromorphic systems for complex edge intelligence applications.
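The time-division-multiplexing idea, one physical core sequentially servicing many virtual neurons, can be illustrated with a toy leaky integrate-and-fire sweep. The LIF dynamics, the tiny TDM factor, and all parameter values below are assumptions for illustration only, not the HDNIP microarchitecture:

```python
import numpy as np

def tdm_core_step(v, inputs, leak=0.9, threshold=1.0, slots=4):
    """One timestep of a single physical core multiplexed over virtual neurons.

    `slots` stands in for the TDM factor (up to 8192 in HDNIP): the same
    physical update logic is reused `slots` times per step, each time
    serving a different block of len(v) // slots virtual neurons.
    """
    spikes = np.zeros(len(v), dtype=bool)
    per_slot = len(v) // slots
    for s in range(slots):                          # sequential reuse of one core
        sub = v[s * per_slot:(s + 1) * per_slot]    # view into shared state memory
        sub *= leak                                 # leaky integration
        sub += inputs[s * per_slot:(s + 1) * per_slot]
        fired = sub >= threshold
        spikes[s * per_slot:(s + 1) * per_slot] = fired
        sub[fired] = 0.0                            # reset on spike
    return v, spikes

v = np.zeros(8)
x = np.full(8, 0.5)
for _ in range(3):                                  # constant drive for 3 steps
    v, s = tdm_core_step(v, x, slots=4)
print(int(s.sum()))  # 8  (every neuron crosses threshold on the third step)
```

A real chip trades a higher TDM factor for lower neuron-update rate per virtual neuron; the sketch only shows how one set of update hardware covers many neuron states.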

27 pages, 4153 KB  
Article
Mitigating Context Bias in Vision–Language Models via Multimodal Emotion Recognition
by Constantin-Bogdan Popescu, Laura Florea and Corneliu Florea
Electronics 2025, 14(16), 3311; https://doi.org/10.3390/electronics14163311 - 20 Aug 2025
Viewed by 945
Abstract
Vision–Language Models (VLMs) have become key contributors to the state of the art in contextual emotion recognition, demonstrating a superior ability to understand the relationship between context, facial expressions, and interactions in images compared to traditional approaches. However, their reliance on contextual cues can introduce unintended biases, especially when the background does not align with the individual’s true emotional state. This raises concerns for the reliability of such models in real-world applications, where robustness and fairness are critical. In this work, we explore the limitations of current VLMs in emotionally ambiguous scenarios and propose a method to overcome contextual bias. Existing VLM-based captioning solutions tend to overweight background and contextual information when determining emotion, often at the expense of the individual’s actual expression. To study this phenomenon, we created synthetic datasets by automatically extracting people from the original images using YOLOv8 and placing them on randomly selected backgrounds from the Landscape Pictures dataset. This allowed us to reduce the correlation between emotional expression and background context while preserving body pose. Through discriminative analysis of VLM behavior on images with both correct and mismatched backgrounds, we find that in 93% of the cases, the predicted emotions vary based on the background—even when models are explicitly instructed to focus on the person. To address this, we propose a multimodal approach (named BECKI) that incorporates body pose, full image context, and a novel description stream focused exclusively on identifying the emotional discrepancy between the individual and the background. Our primary contribution is not just in identifying the weaknesses of existing VLMs, but in proposing a more robust and context-resilient solution. Our method achieves up to 96% accuracy, highlighting its effectiveness in mitigating contextual bias.

33 pages, 8930 KB  
Article
Network-Aware Gaussian Mixture Models for Multi-Objective SD-WAN Controller Placement
by Abdulrahman M. Abdulghani, Azizol Abdullah, Amir Rizaan Rahiman, Nor Asilah Wati Abdul Hamid and Bilal Omar Akram
Electronics 2025, 14(15), 3044; https://doi.org/10.3390/electronics14153044 - 30 Jul 2025
Viewed by 447
Abstract
Software-Defined Wide Area Networks (SD-WANs) require optimal controller placement to minimize latency, balance loads, and ensure reliability across geographically distributed infrastructures. This paper introduces NA-GMM (Network-Aware Gaussian Mixture Model), a novel multi-objective optimization framework addressing key limitations in current controller placement approaches. Three principal contributions distinguish NA-GMM: (1) a hybrid distance metric that integrates geographic distance, network latency, topological cost, and link reliability through adaptive weighting, effectively capturing multi-dimensional network characteristics; (2) a modified expectation–maximization algorithm incorporating node importance-weighting to optimize controller placements for critical network elements; and (3) a robust clustering mechanism that transitions from probabilistic (soft) assignments to definitive (hard) cluster selections, ensuring optimal placement convergence. Empirical evaluations on real-world topologies demonstrate NA-GMM’s superiority, achieving up to 22.7% lower average control latency compared to benchmark approaches, maintaining near-optimal load distribution with node distribution ratios, and delivering a 12.9% throughput improvement. Furthermore, NA-GMM achieved outstanding computational efficiency, executing 68.9% faster and consuming 41.5% less memory than state-of-the-art methods, while achieving exceptional load balancing. These findings confirm NA-GMM’s practical viability for large-scale SD-WAN deployments where real-time multi-objective optimization is essential.
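The first two contributions, the hybrid distance metric and the node importance-weighting, can be sketched roughly as below. The equal fixed weights, the pre-normalized toy inputs, and the helper names are illustrative assumptions; the paper derives the weights adaptively and embeds the importance weighting in a full expectation–maximization loop:

```python
import numpy as np

def hybrid_distance(geo, latency, topo_cost, reliability,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Illustrative hybrid node-to-controller distance.

    Combines normalized geographic distance, latency, topological cost,
    and link *un*reliability; all four inputs are assumed scaled to [0, 1].
    """
    w1, w2, w3, w4 = weights
    return w1 * geo + w2 * latency + w3 * topo_cost + w4 * (1.0 - reliability)

def weighted_centroid(positions, importance):
    """Importance-weighted mean used when re-estimating a controller location
    (a toy stand-in for the M-step with node importance-weighting)."""
    w = importance / importance.sum()
    return (positions * w[:, None]).sum(axis=0)

# a link that is close but only 90% reliable still incurs some distance
print(round(hybrid_distance(0.2, 0.1, 0.3, 0.9), 3))  # 0.175

# two nodes; the more critical node (importance 3) pulls the controller toward it
pos = np.array([[0.0, 0.0], [10.0, 0.0]])
importance = np.array([1.0, 3.0])
print(weighted_centroid(pos, importance))  # centroid sits at x = 7.5, nearer node 2
```

Hard cluster selection then simply assigns each node to the controller with the smallest hybrid distance (`argmin` over candidates), which is the "soft to hard" transition the abstract describes.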

35 pages, 10153 KB  
Article
EnvMat: A Network for Simultaneous Generation of PBR Maps and Environment Maps from a Single Image
by SeongYeon Oh, Moonryul Jung and Taehoon Kim
Electronics 2025, 14(13), 2554; https://doi.org/10.3390/electronics14132554 - 24 Jun 2025
Viewed by 524
Abstract
Generative neural networks have expanded from text and image generation to creating realistic 3D graphics, which are critical for immersive virtual environments. Physically Based Rendering (PBR)—crucial for realistic 3D graphics—depends on PBR maps, environment (env) maps for lighting, and camera viewpoints. Current research mainly generates PBR maps separately, often using fixed env maps and camera poses. This limitation reduces visual consistency and immersion in 3D spaces. Addressing this, we propose EnvMat, a diffusion-based model that simultaneously generates PBR and env maps. EnvMat uses two Variational Autoencoders (VAEs) for map reconstruction and a Latent Diffusion UNet. Experimental results show that EnvMat surpasses the existing methods in preserving visual accuracy, as validated through metrics such as LPIPS, MS-SSIM, and CIEDE2000.

35 pages, 6431 KB  
Article
Delving into YOLO Object Detection Models: Insights into Adversarial Robustness
by Kyriakos D. Apostolidis and George A. Papakostas
Electronics 2025, 14(8), 1624; https://doi.org/10.3390/electronics14081624 - 17 Apr 2025
Cited by 3 | Viewed by 3192
Abstract
This paper provides a comprehensive study of the security of YOLO (You Only Look Once) model series for object detection, emphasizing their evolution, technical innovations, and performance across the COCO dataset. The robustness of YOLO models under adversarial attacks and image corruption is analyzed in depth, offering insights into their resilience and adaptability. As real-time object detection plays an increasingly vital role in applications such as autonomous driving, security, and surveillance, this review aims to clarify the strengths and limitations of each YOLO iteration, serving as a valuable resource for researchers and practitioners aiming to optimize model selection and deployment in dynamic, real-world environments. The results reveal that YOLOX models, particularly their large variants, exhibit superior robustness compared to other YOLO versions, maintaining higher accuracy under challenging conditions. Our findings serve as a valuable resource for researchers and practitioners aiming to optimize YOLO models for dynamic and adversarial real-world environments while guiding future research toward developing more resilient object detection systems.

Review


53 pages, 3279 KB  
Review
Cognitive Bias Mitigation in Executive Decision-Making: A Data-Driven Approach Integrating Big Data Analytics, AI, and Explainable Systems
by Leonidas Theodorakopoulos, Alexandra Theodoropoulou and Constantinos Halkiopoulos
Electronics 2025, 14(19), 3930; https://doi.org/10.3390/electronics14193930 - 3 Oct 2025
Abstract
Cognitive biases continue to pose significant challenges in executive decision-making, often leading to strategic inefficiencies, misallocation of resources, and flawed risk assessments. While traditional decision-making relies on intuition and experience, these methods are increasingly proving inadequate in addressing the complexity of modern business environments. Despite the growing integration of big data analytics into executive workflows, existing research lacks a comprehensive examination of how AI-driven methodologies can systematically mitigate biases while maintaining transparency and trust. This paper addresses these gaps by analyzing how big data analytics, artificial intelligence (AI), machine learning (ML), and explainable AI (XAI) contribute to reducing heuristic-driven errors in executive reasoning. Specifically, it explores the role of predictive modeling, real-time analytics, and decision intelligence systems in enhancing objectivity and decision accuracy. Furthermore, this study identifies key organizational and technical barriers—such as biases embedded in training data, model opacity, and resistance to AI adoption—that hinder the effectiveness of data-driven decision-making. By reviewing empirical findings from A/B testing, simulation experiments, and behavioral assessments, this research examines the applicability of AI-powered decision support systems in strategic management. The contributions of this paper include a detailed analysis of bias mitigation mechanisms, an evaluation of current limitations in AI-driven decision intelligence, and practical recommendations for fostering a more data-driven decision culture. By addressing these research gaps, this study advances the discourse on responsible AI adoption and provides actionable insights for organizations seeking to enhance executive decision-making through big data analytics.
41 pages, 2098 KB  
Review
Learning-Based Viewport Prediction for 360-Degree Videos: A Review
by Mahmoud Z. A. Wahba, Sara Baldoni and Federica Battisti
Electronics 2025, 14(18), 3743; https://doi.org/10.3390/electronics14183743 - 22 Sep 2025
Viewed by 277
Abstract
Nowadays, virtual reality is experiencing widespread adoption, and its popularity is expected to grow in the next few decades. A relevant portion of virtual reality content is represented by 360-degree videos, which allow users to be surrounded by the video content and to explore it without limitations. However, 360-degree videos are extremely demanding in terms of storage and streaming requirements. At the same time, users are not able to enjoy the 360-degree content all at once due to the inherent limitations of the human visual system. For this reason, viewport prediction techniques have been proposed: they aim at forecasting where the user will look, thus allowing the transmission of the sole viewport content or the assignment of a different quality level for viewport and non-viewport regions. In this context, artificial intelligence plays a pivotal role in the development of high-performance viewport prediction solutions. In this work, we analyze the evolution of viewport prediction based on machine and deep learning techniques in the last decade, focusing on their classification based on the employed processing technique, as well as the input and output formats. Our review shows common gaps in the existing approaches, thus paving the way for future research. An increase in viewport prediction accuracy and reliability will foster the diffusion of virtual reality content in real-life scenarios.
