Journal Description
AI is an international, peer-reviewed, open access journal on artificial intelligence (AI), including broad aspects of cognition and reasoning, perception and planning, machine learning, intelligent robotics, and applications of AI, published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, EBSCO, and other databases.
- Journal Rank: JCR - Q2 (Computer Science, Artificial Intelligence) / CiteScore - Q2 (Artificial Intelligence)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18.9 days after submission; acceptance to publication takes 4.9 days (median values for papers published in this journal in the second half of 2024).
- Recognition of Reviewers: APC discount vouchers, optional signed peer review, and reviewer names published annually in the journal.
Impact Factor: 3.1 (2023); 5-Year Impact Factor: 3.3 (2023)
Latest Articles
Discriminative Deformable Part Model for Pedestrian Detection with Occlusion Handling
AI 2025, 6(4), 70; https://doi.org/10.3390/ai6040070 - 3 Apr 2025
Abstract
Efficient pedestrian detection plays an important role in many practical daily life applications, such as autonomous cars, video surveillance, and intelligent driving assistance systems. The main goal of pedestrian detection systems, especially in vehicles, is to prevent accidents. By recognizing pedestrians in real time, these systems can alert drivers or even autonomously apply brakes, minimizing the possibility of collisions. However, occlusion is a major obstacle to pedestrian detection. Pedestrians are typically occluded by trees, street poles, cars, and other pedestrians. State-of-the-art detection methods are based on fully visible or lightly occluded pedestrians; hence, their performance declines with increasing occlusion level. To meet this challenge, a pedestrian detector capable of handling occlusion is preferred. To increase the detection accuracy for occluded pedestrians, we propose a new method called the Discriminative Deformable Part Model (DDPM), which uses machine learning to break a human image into deformable parts. In existing works, this decomposition has been performed by human intuition. In our novel approach, machine learning is used for deformable objects such as humans, combining the benefits and removing the drawbacks of the previous works. We also propose a new pedestrian dataset based on Eastern clothes to accommodate the detector’s evaluation under different intra-class variations of pedestrians. The proposed method achieves a higher detection accuracy on Pascal VOC and VisDrone Detection datasets when compared with other popular detection methods.
Open Access Article
LineMVGNN: Anti-Money Laundering with Line-Graph-Assisted Multi-View Graph Neural Networks
by
Chung-Hoo Poon, James Kwok, Calvin Chow and Jang-Hyeon Choi
AI 2025, 6(4), 69; https://doi.org/10.3390/ai6040069 - 3 Apr 2025
Abstract
Anti-money laundering (AML) systems are important for protecting the global economy. However, conventional rule-based methods rely on domain knowledge, leading to suboptimal accuracy and a lack of scalability. Graph neural networks (GNNs) for digraphs (directed graphs) can be applied to transaction graphs and capture suspicious transactions or accounts. However, most spectral GNNs do not naturally support multi-dimensional edge features, lack interpretability due to edge modifications, and have limited scalability owing to their spectral nature. Conversely, most spatial methods may not capture the money flow well. Therefore, in this work, we propose LineMVGNN (Line-Graph-Assisted Multi-View Graph Neural Network), a novel spatial method that considers payment and receipt transactions. Specifically, the LineMVGNN model extends a lightweight MVGNN module, which performs two-way message passing between nodes in a transaction graph. Additionally, LineMVGNN incorporates a line graph view of the original transaction graph to enhance the propagation of transaction information. We conduct experiments on two real-world account-based transaction datasets: the Ethereum phishing transaction network dataset and a financial payment transaction dataset from one of our industry partners. The results show that our proposed method outperforms state-of-the-art methods, reflecting the effectiveness of money laundering detection with line-graph-assisted multi-view graph learning. We also discuss scalability, adversarial robustness, and regulatory considerations of our proposed method.
(This article belongs to the Special Issue AI in Finance: Leveraging AI to Transform Financial Services)
Open Access Article
Voice-AttentionNet: Voice-Based Multi-Disease Detection with Lightweight Attention-Based Temporal Convolutional Neural Network
by
Jintao Wang, Jianhang Zhou and Bob Zhang
AI 2025, 6(4), 68; https://doi.org/10.3390/ai6040068 - 28 Mar 2025
Abstract
Voice data contain a wealth of temporal and spectral information and can be a valuable resource for disease classification. However, traditional methods are often not effective in capturing the key features required for the classification of multiple disease classes. To address this challenge, we propose a voice-based multi-disease detection approach with a lightweight attention-based temporal convolution neural network (Voice-AttentionNet) designed to analyze speech data for multi-class disease classification. Our model utilizes the temporal convolution neural network (CNN) architecture to extract high-resolution temporal features, while incorporating attention mechanisms to highlight disease-related patterns. Extensive experiments have been conducted on our dataset, including speech samples from patients with multiple illnesses. The results show that our method achieves state-of-the-art performance, with an average classification accuracy of 91.61% on six datasets, and outperforms existing classical models. These findings highlight the potential of combining attention mechanisms with temporal CNNs in the use of speech data for disease classification. Moreover, this study provides a promising direction for deploying AI-driven diagnostic tools in clinical scenarios.
(This article belongs to the Section Medical & Healthcare AI)
Open Access Review
Survey of Architectural Floor Plan Retrieval Technology Based on 3ST Features
by
Hongxing Ling, Guangsheng Luo, Nanrun Zhou and Xiaoyan Jiang
AI 2025, 6(4), 67; https://doi.org/10.3390/ai6040067 - 26 Mar 2025
Abstract
Feature retrieval technology for building floor plans has garnered significant attention in recent years due to its critical role in the efficient management and execution of construction projects. This paper presents a comprehensive exploration of four primary features essential for the retrieval of building floor plans: semantic features, spatial features, shape features, and texture features (collectively referred to as 3ST features). The extraction algorithms and underlying principles associated with these features are thoroughly analyzed, with a focus on advanced methods such as wavelet transforms and Fourier shape descriptors. Furthermore, the performance of various retrieval algorithms is evaluated through rigorous experimental analysis, offering valuable insights into optimizing the retrieval of building floor plans. Finally, this study outlines prospective directions for the advancement of feature retrieval technology in the context of floor plans.
(This article belongs to the Topic Theoretical Foundations and Applications of Deep Learning Techniques)
Open Access Article
One-Shot Autoregressive Generation of Combinatorial Optimization Solutions Based on the Large Language Model Architecture and Learning Algorithms
by
Bishad Ghimire, Ausif Mahmood and Khaled Elleithy
AI 2025, 6(4), 66; https://doi.org/10.3390/ai6040066 - 26 Mar 2025
Abstract
Large Language Models (LLMs) have immensely advanced the field of Artificial Intelligence (AI), with recent models being able to perform chain-of-thought reasoning and solve complex mathematical problems, ranging from theorem proving to ones involving advanced calculus. The success of LLMs derives from a combination of the Transformer architecture with its attention mechanism, the autoregressive training methodology with masked attention, and the alignment fine-tuning via reinforcement learning algorithms. In this research, we attempt to explore a possible solution to the fundamental NP-hard problem of combinatorial optimization, in particular, the Traveling Salesman Problem (TSP), by following the LLM approach in terms of the architecture and training algorithms. Similar to the LLM design, which is trained in an autoregressive manner to predict the next token, our model is trained to predict the next node in a TSP graph. After the model is trained on random TSP graphs with known near-optimal solutions, we fine-tune the model using Direct Preference Optimization (DPO). The tour generation in a trained model is autoregressive one-step generation with no need for iterative refinement. Our results are very promising and indicate that, for TSP graphs up to 100 nodes, a relatively small amount of training data yields solutions within a few percent of the optimal. This optimization improves if more data are used to train the model.
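The next-node generation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the trained Transformer is replaced by a hypothetical stand-in scorer, `nearest_neighbor_score`, and both the greedy choice and the masking of already-visited nodes are assumptions made for illustration. The function names are invented here.

```python
import math

def autoregressive_tour(points, score_next, start=0):
    """Build a tour one node at a time, never revisiting a node."""
    tour = [start]
    visited = {start}
    while len(tour) < len(points):
        # Mask out visited nodes, then pick the best-scoring candidate.
        candidates = [j for j in range(len(points)) if j not in visited]
        nxt = max(candidates, key=lambda j: score_next(points, tour, j))
        tour.append(nxt)
        visited.add(nxt)
    return tour

def nearest_neighbor_score(points, tour, j):
    """Hypothetical stand-in for a learned model: closer nodes score higher."""
    return -math.dist(points[tour[-1]], points[j])

def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

points = [(0, 0), (2, 0), (2, 1), (0, 3)]
tour = autoregressive_tour(points, nearest_neighbor_score)
length = tour_length(points, tour)
```

Each node is emitted exactly once in a single pass, which is the sense in which no iterative refinement is needed. A learned scorer would replace `nearest_neighbor_score` without changing the decoding loop.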
(This article belongs to the Section AI Systems: Theory and Applications)
Open Access Article
Adapting a Large-Scale Transformer Model to Decode Chicken Vocalizations: A Non-Invasive AI Approach to Poultry Welfare
by
Suresh Neethirajan
AI 2025, 6(4), 65; https://doi.org/10.3390/ai6040065 - 25 Mar 2025
Abstract
Natural Language Processing (NLP) and advanced acoustic analysis have opened new avenues in animal welfare research by decoding the vocal signals of farm animals. This study explored the feasibility of adapting a large-scale Transformer-based model, OpenAI’s Whisper, originally developed for human speech recognition, to decode chicken vocalizations. Our primary objective was to determine whether Whisper could effectively identify acoustic patterns associated with emotional and physiological states in poultry, thereby enabling real-time, non-invasive welfare assessments. To achieve this, chicken vocal data were recorded under diverse experimental conditions, including healthy versus unhealthy birds, pre-stress versus post-stress scenarios, and quiet versus noisy environments. The audio recordings were processed through Whisper, producing text-like outputs. Although these outputs did not represent literal translations of chicken vocalizations into human language, they exhibited consistent patterns in token sequences and sentiment indicators strongly correlated with recognized poultry stressors and welfare conditions. Sentiment analysis using standard NLP tools (e.g., polarity scoring) identified notable shifts in “negative” and “positive” scores that corresponded closely with documented changes in vocal intensity associated with stress events and altered physiological states. Despite the inherent domain mismatch—given Whisper’s original training on human speech—the findings clearly demonstrate the model’s capability to reliably capture acoustic features significant to poultry welfare. Recognizing the limitations associated with applying English-oriented sentiment tools, this study proposes future multimodal validation frameworks incorporating physiological sensors and behavioral observations to further strengthen biological interpretability. 
To our knowledge, this work provides the first demonstration that Transformer-based architectures, even without species-specific fine-tuning, can effectively encode meaningful acoustic patterns from animal vocalizations, highlighting their transformative potential for advancing productivity, sustainability, and welfare practices in precision poultry farming.
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
Open Access Article
SMART Restaurant ReCommender: A Context-Aware Restaurant Recommendation Engine
by
Ayesha Ubaid, Adrian Lie and Xiaojie Lin
AI 2025, 6(4), 64; https://doi.org/10.3390/ai6040064 - 25 Mar 2025
Abstract
With the rise of e-commerce and web application usage, recommendation systems have become important to our daily tasks. They provide personalized suggestions to assist with any task under consideration. While various machine learning algorithms have been developed for recommendation tasks, existing systems still face limitations. This research focuses on advancing context-aware recommendation systems by leveraging the capabilities of Large Language Models (LLMs) in conjunction with real-time data. The research exploits the integration of existing real-time data APIs with LLMs to enhance the capabilities of the recommendation systems already integrated into smart societies. The experimental results demonstrate that the hybrid approach significantly improves the user experience and recommendation quality, ensuring more relevant and dynamic suggestions.
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)
Open Access Article
FedBirdAg: A Low-Energy Federated Learning Platform for Bird Detection with Wireless Smart Cameras in Agriculture 4.0
by
Samy Benhoussa, Gil De Sousa and Jean-Pierre Chanet
AI 2025, 6(4), 63; https://doi.org/10.3390/ai6040063 - 21 Mar 2025
Abstract
Birds can cause substantial damage to crops, directly affecting farmers’ productivity and profitability. As a result, detecting bird presence in crop fields is crucial for effective crop management. Traditional agricultural practices have used various tools and techniques to deter pest birds, while digital agriculture has advanced these efforts through Internet of Things (IoT) and artificial intelligence (AI) technologies. With recent advancements in hardware and processing chips, connected devices can now utilize deep convolutional neural networks (CNNs) for on-field image classification. However, training these models can be energy-intensive, especially when large amounts of data, such as images, need to be transmitted for centralized model training. Federated learning (FL) offers a solution by enabling local training on edge devices, reducing data transmission costs and energy demands while also preserving data privacy and achieving shared model knowledge across connected devices. This paper proposes a low-energy federated learning framework for a compact smart camera network designed to perform simple image classification for bird detection in crop fields. The results demonstrate that this decentralized approach achieves performance comparable to a centrally trained model while consuming at least 8 times less energy. Further efficiency improvements, with a minimal tradeoff in performance reduction, are explored through early stopping.
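The abstract does not say which aggregation rule the platform uses. As an illustration only, the Python sketch below assumes weighted federated averaging (FedAvg), in which a server combines locally trained weights in proportion to each device's sample count. The function name, the two "cameras", and their values are hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-layer weights, weighting each client by its local sample count."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Only model weights are combined here; raw images stay on the devices.
        layer_sum = sum(
            weights[layer] * (size / total)
            for weights, size in zip(client_weights, client_sizes)
        )
        averaged.append(layer_sum)
    return averaged

# Two hypothetical cameras, each holding a one-layer "model".
camera_a = [np.array([1.0, 1.0])]
camera_b = [np.array([3.0, 3.0])]
global_model = federated_average([camera_a, camera_b], client_sizes=[1, 3])
```

Because only weight arrays are exchanged, the transmitted payload depends on model size rather than on the number of images captured, which is the source of the transmission savings the abstract refers to.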
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
Open Access Perspective
AI-Driven Telerehabilitation: Benefits and Challenges of a Transformative Healthcare Approach
by
Rocco Salvatore Calabrò and Sepehr Mojdehdehbaher
AI 2025, 6(3), 62; https://doi.org/10.3390/ai6030062 - 17 Mar 2025
Abstract
Artificial intelligence (AI) has revolutionized telerehabilitation by integrating machine learning (ML), big data analytics, and real-time feedback to create adaptive, patient-centered care. AI-driven systems enhance telerehabilitation by analyzing patient data to personalize therapy, monitor progress, and suggest adjustments, eliminating the need for constant clinician oversight. The benefits of AI-powered telerehabilitation include increased accessibility, especially for remote or mobility-limited patients, and greater convenience, allowing patients to perform therapies at home. However, challenges persist, such as data privacy risks, the digital divide, and algorithmic bias. Robust encryption protocols, equitable access to technology, and diverse training datasets are critical to addressing these issues. Ethical considerations also arise, emphasizing the need for human oversight and maintaining the therapeutic relationship. AI also aids clinicians by automating administrative tasks and facilitating interdisciplinary collaboration. Innovations like 5G networks, the Internet of Medical Things (IoMT), and robotics further enhance telerehabilitation’s potential. By transforming rehabilitation into a dynamic, engaging, and personalized process, AI and telerehabilitation together represent a paradigm shift in healthcare, promising improved outcomes and broader access for patients worldwide.
Open Access Article
Detection of Leaf Diseases in Banana Crops Using Deep Learning Techniques
by
Nixon Jiménez, Stefany Orellana, Bertha Mazon-Olivo, Wilmer Rivas-Asanza and Iván Ramírez-Morales
AI 2025, 6(3), 61; https://doi.org/10.3390/ai6030061 - 17 Mar 2025
Abstract
Leaf diseases, such as Black Sigatoka and Cordana, represent a growing threat to banana crops in Ecuador. These diseases spread rapidly, impacting both leaf and fruit quality. Early detection is crucial for effective control measures. Recently, deep learning has proven to be a powerful tool in agriculture, enabling more accurate analysis and identification of crop diseases. This study applied the CRISP-DM methodology, consisting of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. A dataset of 900 banana leaf images was collected—300 of Black Sigatoka, 300 of Cordana, and 300 of healthy leaves. Three pre-trained models (EfficientNetB0, ResNet50, and VGG19) were trained on this dataset. To improve performance, data augmentation techniques were applied using TensorFlow Keras’s ImageDataGenerator class, expanding the dataset to 9000 images. Due to the high computational demands of ResNet50 and VGG19, training was performed with EfficientNetB0. The models—EfficientNetB0, ResNet50, and VGG19—demonstrated the ability to identify leaf diseases in bananas, with accuracies of 88.33%, 88.90%, and 87.22%, respectively. The data augmentation increased the performance of EfficientNetB0 to 87.83%, but did not significantly improve its accuracy. These findings highlight the value of deep learning techniques for early disease detection in banana crops, enhancing diagnostic accuracy and efficiency.
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
Open Access Article
Antiparasitic Pharmacology Goes to the Movies: Leveraging Generative AI to Create Educational Short Films
by
Benjamin Worthley, Meize Guo, Lucas Sheneman and Tyler Bland
AI 2025, 6(3), 60; https://doi.org/10.3390/ai6030060 - 17 Mar 2025
Abstract
Medical education faces the dual challenge of addressing cognitive overload and sustaining student engagement, particularly in complex subjects such as pharmacology. This study introduces Cinematic Clinical Narratives (CCNs) as an innovative approach to teaching antiparasitic pharmacology, combining generative artificial intelligence (genAI), edutainment, and mnemonic-based learning. The intervention involved two short films, Alien: Parasites Within and Wormquest, designed to teach antiparasitic pharmacology to first-year medical students. A control group of students only received traditional text-based clinical cases, while the experimental group engaged with the CCNs in an active learning environment. Students who received the CCN material scored an average of 8% higher on exam questions related to the material covered by the CCN compared to students in the control group. Results also showed that the CCNs improved engagement and interest among students, as evidenced by significantly higher scores on the Situational Interest Survey for Multimedia (SIS-M) compared to traditional methods. Notably, students preferred CCNs for their storytelling, visuals, and interactive elements. This study underscores the potential of CCNs as a supplementary educational tool, and suggests the potential for broader applications across other medical disciplines outside of antiparasitic pharmacology. By leveraging genAI and edutainment, CCNs represent a scalable and innovative approach to enhancing the medical learning experience.
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)
Open Access Article
Clinical Applicability of Machine Learning Models for Binary and Multi-Class Electrocardiogram Classification
by
Daniel Nasef, Demarcus Nasef, Kennette James Basco, Alana Singh, Christina Hartnett, Michael Ruane, Jason Tagliarino, Michael Nizich and Milan Toma
AI 2025, 6(3), 59; https://doi.org/10.3390/ai6030059 - 14 Mar 2025
Abstract
Background: This study investigates the application of machine learning models to classify electrocardiogram signals, addressing challenges such as class imbalances and inter-class overlap. In this study, “normal” and “abnormal” refer to electrocardiogram findings that either align with or deviate from a standard electrocardiogram, warranting further evaluation. “Borderline” indicates an electrocardiogram that requires additional assessment to distinguish benign variations from pathology. Methods: A hierarchical framework reformulated the multi-class problem into two binary classification tasks—distinguishing “Abnormal” from “Non-Abnormal” and “Normal” from “Non-Normal”—to enhance performance and interpretability. Convolutional neural networks, deep neural networks, and tree-based models, including Gradient Boosting Classifier and Random Forest, were trained and evaluated using standard metrics (accuracy, precision, recall, and F1 score) and learning curve convergence analysis. Results: Results showed that convolutional neural networks achieved the best balance between generalization and performance, effectively adapting to unseen data and variations without overfitting. They exhibit strong convergence and robust feature importance rankings, with ventricular rate, QRS duration, and P-R interval identified as key predictors. Tree-based models, despite their high performance metrics, demonstrated poor convergence, raising concerns about their reliability on unseen data. Deep neural networks achieved high sensitivity but suffered from overfitting, limiting their generalizability. Conclusions: The hierarchical binary classification approach demonstrated clinical relevance, enabling nuanced diagnostic insights. Furthermore, the study emphasizes the critical role of learning curve analysis in evaluating model reliability, beyond performance metrics alone. 
Future work should focus on optimizing model convergence and exploring hybrid approaches to improve clinical applicability in electrocardiogram signal classification.
(This article belongs to the Section Medical & Healthcare AI)
Open Access Article
Leveraging Spectral Neighborhood Information for Corn Yield Prediction with Spatial-Lagged Machine Learning Modeling: Can Neighborhood Information Outperform Vegetation Indices?
by
Efrain Noa-Yarasca, Javier M. Osorio Leyton, Chad B. Hajda, Kabindra Adhikari and Douglas R. Smith
AI 2025, 6(3), 58; https://doi.org/10.3390/ai6030058 - 13 Mar 2025
Abstract
Accurate and reliable crop yield prediction is essential for optimizing agricultural management, resource allocation, and decision-making, while also supporting farmers and stakeholders in adapting to climate change and increasing global demand. This study introduces an innovative approach to crop yield prediction by incorporating spatially lagged spectral data (SLSD) through the spatial-lagged machine learning (SLML) model, an enhanced version of the spatial lag X (SLX) model. The research aims to show that SLSD improves prediction compared to traditional vegetation index (VI)-based methods. Conducted on a 19-hectare cornfield at the ARS Grassland, Soil, and Water Research Laboratory during the 2023 growing season, this study used five-band multispectral image data and 8581 yield measurements ranging from 1.69 to 15.86 Mg/Ha. Four predictor sets were evaluated: Set 1 (spectral bands), Set 2 (spectral bands + neighborhood data), Set 3 (spectral bands + VIs), and Set 4 (spectral bands + top VIs + neighborhood data). These were evaluated using the SLX model and four decision-tree-based SLML models (RF, XGB, ET, GBR), with performance assessed using R2 and RMSE. Results showed that incorporating spatial neighborhood data (Set 2) outperformed VI-based approaches (Set 3), emphasizing the importance of spatial context. SLML models, particularly XGB, RF, and ET, performed best with 4–8 neighbors, while excessive neighbors slightly reduced accuracy. In Set 3, VIs improved predictions, but a smaller subset (10–15 indices) was sufficient for optimal yield prediction. Set 4 showed slight gains over Sets 2 and 3, with XGB and RF achieving the highest R2 values. Key predictors included spatially lagged spectral bands (e.g., Green_lag, NIR_lag, RedEdge_lag) and VIs (e.g., CREI, GCI, NCPI, ARI, CCCI), highlighting the value of integrating neighborhood data for improved corn yield prediction. 
This study underscores the importance of spatial context in corn yield prediction and lays the foundation for future research across diverse agricultural settings, focusing on optimizing neighborhood size, integrating spatial and spectral data, and refining spatial dependencies through localized search algorithms.
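To make the idea of spatially lagged predictors concrete, the Python sketch below computes, for each sample point, the mean of every spectral band over its k nearest neighbors and appends those means to the original bands, analogous to Set 2 in this abstract. The unweighted k-nearest-neighbor mean, the function name, and the toy coordinates are assumptions for illustration; the abstract does not specify the neighborhood weighting used in the study.

```python
import numpy as np

def spatial_lag_features(coords, bands, k=4):
    """Mean of each band over the k nearest neighbors of every point."""
    coords = np.asarray(coords, dtype=float)
    bands = np.asarray(bands, dtype=float)
    # Pairwise Euclidean distances (dense; fine for a small demo only).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # bands[neighbors] has shape (n_points, k, n_bands); average over k.
    return bands[neighbors].mean(axis=1)

# Toy data: four points on a line with a single "band".
coords = [[0, 0], [1, 0], [3, 0], [7, 0]]
bands = [[1.0], [2.0], [3.0], [10.0]]
lag = spatial_lag_features(coords, bands, k=1)

# Predictor matrix in the spirit of Set 2: original bands plus lagged bands.
set2 = np.hstack([np.asarray(bands), lag])
```

The dense distance matrix grows quadratically with the number of points, so for thousands of yield measurements a spatial index such as a k-d tree would be a more practical way to find neighbors.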
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
Open Access Article
Integration of YOLOv8 Small and MobileNet V3 Large for Efficient Bird Detection and Classification on Mobile Devices
by
Axel Frederick Félix-Jiménez, Vania Stephany Sánchez-Lee, Héctor Alejandro Acuña-Cid, Isaul Ibarra-Belmonte, Efraín Arredondo-Morales and Eduardo Ahumada-Tello
AI 2025, 6(3), 57; https://doi.org/10.3390/ai6030057 - 13 Mar 2025
Abstract
Background: Bird species identification and classification are crucial for biodiversity research, conservation initiatives, and ecological monitoring. However, conventional identification techniques used by biologists are time-consuming and susceptible to human error. The integration of deep learning models offers a promising alternative to automate and enhance species recognition processes. Methods: This study explores the use of deep learning for bird species identification in the city of Zacatecas. Specifically, we implement YOLOv8 Small for real-time detection and MobileNet V3 for classification. The models were trained and tested on a dataset comprising five bird species: Vermilion Flycatcher, Pine Flycatcher, Mexican Chickadee, Arizona Woodpecker, and Striped Sparrow. The evaluation metrics included precision, recall, and computational efficiency. Results: The findings demonstrate that both models achieve high accuracy in species identification. YOLOv8 Small excels in real-time detection, making it suitable for dynamic monitoring scenarios, while MobileNet V3 provides a lightweight yet efficient classification solution. These results highlight the potential of artificial intelligence to enhance ornithological research by improving monitoring accuracy and reducing manual identification efforts.
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Open Access Article
Emotion-Aware Embedding Fusion in Large Language Models (Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4) for Intelligent Response Generation
by
Abdur Rasool, Muhammad Irfan Shahzad, Hafsa Aslam, Vincent Chan and Muhammad Ali Arshad
AI 2025, 6(3), 56; https://doi.org/10.3390/ai6030056 - 13 Mar 2025
Abstract
Empathetic and coherent responses are critical in automated chatbot-facilitated psychotherapy. This study addresses the challenge of enhancing the emotional and contextual understanding of large language models (LLMs) in psychiatric applications. We introduce Emotion-Aware Embedding Fusion, a novel framework integrating hierarchical fusion and attention mechanisms to prioritize semantic and emotional features in therapy transcripts. Our approach combines multiple emotion lexicons, including NRC Emotion Lexicon, VADER, WordNet, and SentiWordNet, with state-of-the-art LLMs such as Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4. Therapy session transcripts, comprising over 2000 samples, are segmented into hierarchical levels (word, sentence, and session) using neural networks, while hierarchical fusion combines these features with pooling techniques to refine emotional representations. Attention mechanisms, including multi-head self-attention and cross-attention, further prioritize emotional and contextual features, enabling the temporal modeling of emotional shifts across sessions. The processed embeddings, computed using BERT, GPT-3, and RoBERTa, are stored in the Facebook AI Similarity Search (FAISS) vector database, which enables efficient similarity search and clustering across dense vector spaces. Upon user queries, relevant segments are retrieved and provided as context to LLMs, enhancing their ability to generate empathetic and contextually relevant responses. The proposed framework is evaluated across multiple practical use cases to demonstrate real-world applicability, including AI-driven therapy chatbots. The system can be integrated into existing mental health platforms to generate personalized responses based on retrieved therapy session data. The experimental results show that our framework enhances empathy, coherence, informativeness, and fluency, surpassing baseline models while improving LLMs’ emotional intelligence and contextual adaptability for psychotherapy.
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)
Open Access Article
Integrating Pose Features and Cross-Relationship Learning for Human–Object Interaction Detection
by
Lang Wu, Jie Li, Shuqin Li, Yu Ding, Meng Zhou and Yuntao Shi
AI 2025, 6(3), 55; https://doi.org/10.3390/ai6030055 - 12 Mar 2025
Abstract
Background: The main challenge in human–object interaction (HOI) detection is how to reason accurately about ambiguous, complex, and difficult-to-recognize interactions. Existing methods rely on relatively uniform model structures, and occluded image inputs may not be recognized accurately. Methods: In this paper, we design a Pose-Aware Interaction Network (PAIN) based on the transformer architecture and human posture to address these issues through two innovations. First, a new feature fusion method fuses human pose features and image features early, before the encoder, to improve feature expressiveness, and individual motion-related features are additionally strengthened by adding them to the human branch. Second, the Cross-Attention Relationship fusion Module (CARM) better fuses the three-branch output and captures the detailed relationship information of HOI. Results: The proposed method achieves 64.51% and 66.42% on the public V-COCO dataset and 30.83% AP on HICO-DET, indicating that it can recognize HOI instances more accurately.
Open Access Review
A Bibliometric Analysis on Artificial Intelligence in the Production Process of Small and Medium Enterprises
by
Federico Briatore, Marco Tullio Mosca, Roberto Nicola Mosca and Mattia Braggio
AI 2025, 6(3), 54; https://doi.org/10.3390/ai6030054 - 12 Mar 2025
Abstract
Industry 4.0 represents the main paradigm currently bringing great innovation in the field of automation and data exchange among production technologies, according to the principles of interoperability, virtualization, decentralization and production flexibility. The Fourth Industrial Revolution is driven by structural changes in the manufacturing sector, such as the demand for customized products, market volatility and sustainability goals, and the integration of artificial intelligence and Big Data. This work aims to analyze, from a bibliometric point of view and with no time limitation, the existing literature in Scopus-indexed journal papers on the application of AI in small and medium enterprises (SMEs), which are crucial elements in the industrial and economic fabric of many countries. However, adopting modern technologies, particularly AI, can be challenging for SMEs because of the intrinsic structure of this type of enterprise, despite the positive effects obtained in large organizations.
(This article belongs to the Special Issue Artificial Intelligence Challenges to the Industrial Internet of Things and Industrial Control Systems Applications)
Open Access Article
Trade-Offs in Navigation Problems Using Value-Based Methods
by
Petra Csereoka and Mihai V. Micea
AI 2025, 6(3), 53; https://doi.org/10.3390/ai6030053 - 10 Mar 2025
Abstract
Deep Q-Networks (DQNs) have shown remarkable results over the last decade in scenarios ranging from simple 2D fully observable short episodes to partially observable, graphically intensive, and complex tasks. However, the base architecture of a vanilla DQN presents several shortcomings, some of which were mitigated by new variants focusing on increased stability, faster convergence, and time dependencies. These additions, on the other hand, bring increased costs in terms of the required memory and lengthier training times. In this paper, we analyze the performance of state-of-the-art DQN families in a simple partially observable mission created in Minecraft and try to determine the optimal architecture for such problem classes in terms of cost and accuracy. To the best of our knowledge, the analyzed methods have not been tested on the same scenario before, and hence a more in-depth comparison is required to better understand the real performance improvement they provide. This manuscript also offers a detailed overview of state-of-the-art DQN methods, together with the training heuristics and performance metrics registered during the proposed mission, allowing researchers to select better-suited models for solving future problems. Our experiments show that Double DQNs are capable of handling partially observable scenarios gracefully while maintaining a low hardware footprint, Recurrent Double DQNs can be a good candidate even when the resources must be restricted, and double-dueling DQNs are a well-performing middle ground in terms of their cost and performance.
Open Access Article
Influence of Model Size and Image Augmentations on Object Detection in Low-Contrast Complex Background Scenes
by
Harman Singh Sangha and Matthew J. Darr
AI 2025, 6(3), 52; https://doi.org/10.3390/ai6030052 - 5 Mar 2025
Abstract
Background: Bigger and more complex models are often developed for challenging object detection tasks, and image augmentations are used to train a robust deep learning model for small image datasets. Previous studies have suggested that smaller models provide better performance compared to bigger models for agricultural applications, and not all image augmentation methods contribute equally to model performance. An important part of these studies was also to define the scene of the image. Methods: A standard definition was developed to describe scenes in real-world agricultural datasets by reviewing various image-based machine-learning applications in the agriculture literature. This study primarily evaluates the effects of model size in both one-stage and two-stage detectors on model performance for low-contrast complex background applications. It further explores the influence of different photometric image augmentation methods on model performance for standard one-stage and two-stage detectors. Results: For one-stage detectors, a smaller model performed better than a bigger model, whereas for two-stage detectors, model performance increased with model size. Among the image augmentations, some methods considerably improved model performance, while others either provided no improvement or reduced model performance in both one-stage and two-stage detectors compared to the baseline.
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
Open Access Article
Sentence Interaction and Bag Feature Enhancement for Distant Supervised Relation Extraction
by
Wei Song and Qingchun Liu
AI 2025, 6(3), 51; https://doi.org/10.3390/ai6030051 - 4 Mar 2025
Abstract
Background: Distant supervision employs external knowledge bases to automatically match with text, allowing for the automatic annotation of sentences. Although this method effectively tackles the challenge of manual labeling, it inevitably introduces noisy labels. Traditional approaches typically employ sentence-level attention mechanisms, assigning lower weights to noisy sentences to mitigate their impact. However, this approach overlooks the critical importance of information flow between sentences. Additionally, previous approaches treated an entire bag as a single classification unit, giving equal importance to all features within the bag, and failed to recognize that different feature dimensions have varying levels of significance. Methods: To overcome these challenges, this study introduces a novel network that incorporates sentence interaction and a bag-level feature enhancement (ESI-EBF) mechanism. We concatenate sentences within a bag into a continuous context, allowing information to flow freely between them during encoding. At the bag level, we partition the features into multiple groups based on dimensions, assigning an importance coefficient to each sub-feature within a group. This enhances critical features while diminishing the influence of less important ones. Finally, the enhanced features are used to construct high-quality bag representations, facilitating more accurate classification by the classification module. Results: The experimental findings from the New York Times (NYT) and Wiki-20m datasets confirm the efficacy of our proposed encoding approach and feature enhancement module. Our method also outperforms state-of-the-art techniques on these datasets, achieving superior relation extraction accuracy.
(This article belongs to the Section AI Systems: Theory and Applications)
Topics
Topic in
AI, Buildings, Computers, Drones, Entropy, Symmetry
Applications of Machine Learning in Large-Scale Optimization and High-Dimensional Learning
Topic Editors: Jeng-Shyang Pan, Junzo Watada, Vaclav Snasel, Pei Hu
Deadline: 30 April 2025
Topic in
AI, Applied Sciences, Education Sciences, Electronics, Information
Explainable AI in Education
Topic Editors: Guanfeng Liu, Karina Luzia, Luke Bozzetto, Tommy Yuan, Pengpeng Zhao
Deadline: 30 June 2025
Topic in
Applied Sciences, Energies, Buildings, Smart Cities, AI
Smart Electric Energy in Buildings
Topic Editors: Daniel Villanueva Torres, Ali Hainoun, Sergio Gómez Melgar
Deadline: 15 July 2025
Topic in
AI, BDCC, Fire, GeoHazards, Remote Sensing
AI for Natural Disasters Detection, Prediction and Modeling
Topic Editors: Moulay A. Akhloufi, Mozhdeh Shahbazi
Deadline: 25 July 2025

Special Issues
Special Issue in
AI
Artificial Intelligence for Network Management
Guest Editors: Stephen Ojo, Agbotiname Lucky Imoize, Lateef Adesola Akinyemi
Deadline: 15 April 2025
Special Issue in
AI
Artificial Intelligence in Agriculture
Guest Editor: Arslan Munir
Deadline: 30 April 2025
Special Issue in
AI
Advances in Tiny Machine Learning (TinyML): Applications, Models, and Implementation
Guest Editors: Giovanni Delnevo, Pietro Manzoni
Deadline: 30 April 2025
Special Issue in
AI
Artificial Intelligence for Future Healthcare: Advancement, Impact, and Prospect in the Field of Cancer
Guest Editor: Arka Bhowmik
Deadline: 30 April 2025