MDPI - Publisher of Open Access Journals

38 pages, 2375 KB

Open Access Article

A Novel Dual-Loop Causality-Traceable Retrieval Framework for Long-Horizon Conversational Agents

by Din-Yuen Chan, Chih-Yu Cheng, Jhing-Fa Wang and Shih-Pang Tseng

Electronics 2026, 15(11), 2373; https://doi.org/10.3390/electronics15112373 - 1 Jun 2026

Viewed by 63

Abstract

In long-horizon multi-party conversations, human-centric AI agents face a persistent structural problem: similarity-based retrieval may fail to reconnect semantically dispersed fragments of the same evolving event, which severely weakens causal continuity and multi-hop context recovery. To improve attribution trust and reduce structural erasure, we propose MemLoom, a dual-loop causality-traceable retrieval framework that organizes conversational history as an event memory graph. MemLoom decouples latency-sensitive online interaction from off-peak structural curation through online event formation, sentence-level buffering, asynchronous neuro-symbolic graph synthesis, and bounded dual-stream retrieval. Evaluations on QMSum, LoCoMo, and the synthetic causal diagnostic suite (SCDS) support the structural utility of MemLoom. On LoCoMo, under our unified local evaluation setup, MemLoom achieves favorable temporal and multi-hop reasoning scores (J = 65.77 and 58.14, respectively) relative to contemporary agentic baselines such as Mem0, Zep, and A-Mem. On SCDS, in a controlled diagnostic setting, it recovers the required causal chains more reliably than GraphRAG (SCR = 0.72 vs. 0.35) and maintains stronger answer-level auditability (AA = 0.80 vs. 0.50), while keeping the online P95 latency bounded at 1.67 s. These results indicate that asynchronous dual-loop stewardship has practical value for causality-traceable, event-centric conversational memory in multi-party settings.
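To make the dual-loop idea concrete, the sketch below separates a cheap online path (buffering sentences and closing events) from an off-peak curation pass that links events, and then combines a similarity stream with a bounded graph-expansion stream at retrieval time. It is a hypothetical illustration, not MemLoom’s implementation: the class and method names, the Jaccard word-overlap similarity, the keyword-overlap linking rule (a stand-in for neuro-symbolic graph synthesis), and the `max_hops`/`budget` bounds are all assumptions.

```python
from collections import deque
from collections.abc import Callable
from dataclasses import dataclass, field

STOP_WORDS = {"the", "a", "on", "is", "was"}  # assumed, only for the toy linker


@dataclass
class EventGraph:
    """Hypothetical event memory graph: nodes are events, edges are curated links."""
    events: dict[int, str] = field(default_factory=dict)
    links: dict[int, set[int]] = field(default_factory=dict)
    buffer: list[str] = field(default_factory=list)   # sentence-level buffer
    pending: list[int] = field(default_factory=list)  # events awaiting curation

    # Online loop: cheap, latency-sensitive work only.
    def add_sentence(self, sentence: str) -> None:
        self.buffer.append(sentence)

    def close_event(self) -> int:
        """Turn buffered sentences into one event node; graph work is deferred."""
        event_id = len(self.events)
        self.events[event_id] = " ".join(self.buffer)
        self.links[event_id] = set()
        self.buffer.clear()
        self.pending.append(event_id)
        return event_id

    # Offline loop: structural curation, meant to run off-peak.
    def curate(self, related: Callable[[str, str], bool]) -> None:
        """Link each pending event to every earlier event that `related` accepts."""
        for new_id in self.pending:
            for old_id in range(new_id):
                if related(self.events[old_id], self.events[new_id]):
                    self.links[old_id].add(new_id)
                    self.links[new_id].add(old_id)
        self.pending.clear()

    # Bounded dual-stream retrieval.
    def retrieve(self, query: str, k: int = 1, max_hops: int = 2, budget: int = 4) -> list[int]:
        q = set(query.lower().split())

        def similarity(event_id: int) -> float:  # Jaccard word overlap
            words = set(self.events[event_id].lower().split())
            return len(q & words) / max(1, len(q | words))

        # Stream 1: the top-k events by similarity serve as seeds.
        seeds = sorted(self.events, key=similarity, reverse=True)[:k]
        # Stream 2: breadth-first expansion from the seeds, bounded by hops and budget.
        result, seen = list(seeds), set(seeds)
        frontier = deque((s, 0) for s in seeds)
        while frontier and len(result) < budget:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for nxt in sorted(self.links[node]):
                if nxt not in seen and len(result) < budget:
                    seen.add(nxt)
                    result.append(nxt)
                    frontier.append((nxt, depth + 1))
        return result


def share_keyword(a: str, b: str) -> bool:
    """Toy linking rule: two events are related if they share a non-stop word."""
    return bool((set(a.lower().split()) & set(b.lower().split())) - STOP_WORDS)


g = EventGraph()
for sentence in ["Alice booked the venue", "The venue flooded on Friday", "Friday party moved online"]:
    g.add_sentence(sentence)
    g.close_event()
g.curate(share_keyword)  # off-peak pass
print(g.retrieve("who booked the venue"))  # [0, 1, 2]
```

In this toy run the query shares no words with event 2, so similarity alone would miss it; the bounded graph stream still reaches it through the curated links, which is the kind of dispersed-fragment recovery the abstract describes.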

(This article belongs to the Special Issue AI-Driven Frameworks for Human–Computer Interaction)


20 pages, 4055 KB

Open Access Article

An Efficient Gaze Control System for Kiosk-Based Embodied Conversational Agents in Multi-Party Conversations

by Sunghun Jung, Junyeong Kum and Myungho Lee

Electronics 2025, 14(8), 1592; https://doi.org/10.3390/electronics14081592 - 15 Apr 2025

Viewed by 1843

Abstract

The adoption of kiosks in public spaces is steadily increasing, with a trend toward providing more natural user experiences through embodied conversational agents (ECAs). To achieve human-like interactions, ECAs should be able to gaze appropriately at the speaker. However, kiosks in public spaces often face challenges such as ambient noise and overlapping speech from multiple people, which make it difficult to identify the speaker accurately and direct the ECA’s gaze accordingly. In this paper, we propose a lightweight gaze control system designed to operate effectively within the resource constraints of kiosks and the noisy conditions common in public spaces. We first developed a speaker detection model that identifies the active speaker in challenging noise conditions using only a single camera and microphone. On the noise-augmented AVA-Speaker Detection dataset, the proposed model achieved 91.6% mean Average Precision (mAP) in active speaker detection, a 0.6% improvement over the state-of-the-art lightweight model (Light ASD), while maintaining real-time performance. Building on this, we developed a gaze control system for ECAs that detects the dominant speaker in a group and directs the ECA’s gaze toward them using an algorithm inspired by real human turn-taking behavior. To evaluate the system’s performance, we conducted a user study with 30 participants, comparing the system to a baseline condition (i.e., a fixed forward gaze) and a human-controlled gaze. The results showed statistically significant improvements in social/co-presence and gaze naturalness compared to the baseline, with no significant difference between the system and human-controlled gaze. This suggests that our system achieves a level of social presence and gaze naturalness comparable to a human-controlled gaze. The participants’ feedback, which indicated no clear distinction between the human- and model-controlled conditions, further supports the effectiveness of our approach.
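The abstract does not spell out the turn-taking-inspired algorithm, so the following is only a hypothetical illustration of one common way to keep gaze stable in noisy multi-party settings: smooth per-person speaking probabilities (such as an active speaker detector might output) and shift gaze only after another person has clearly dominated for several frames. The class name, the exponential moving average, and the `alpha`, `margin`, and `hold` values are assumptions, not the authors’ design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GazeController:
    """Hypothetical dominant-speaker gaze selector with smoothing and hysteresis."""
    alpha: float = 0.5   # weight of the newest frame in the moving average (assumed)
    margin: float = 0.1  # lead a challenger needs over the current target (assumed)
    hold: int = 3        # consecutive frames the lead must last before a shift (assumed)
    target: Optional[str] = None
    smoothed: dict[str, float] = field(default_factory=dict)
    _challenger: Optional[str] = None
    _streak: int = 0

    def update(self, scores: dict[str, float]) -> Optional[str]:
        """Feed one frame of per-person speaking probabilities; return the gaze target."""
        # Exponential moving average; people absent from this frame decay toward 0.
        for person in sorted(set(self.smoothed) | set(scores)):
            prev = self.smoothed.get(person, 0.0)
            self.smoothed[person] = (1 - self.alpha) * prev + self.alpha * scores.get(person, 0.0)
        if not self.smoothed:
            return self.target
        best = max(self.smoothed, key=self.smoothed.get)
        if self.target is None:
            self.target = best
            return self.target
        lead = self.smoothed[best] - self.smoothed.get(self.target, 0.0)
        if best != self.target and lead > self.margin:
            # Shift gaze only once the same challenger has dominated for `hold` frames.
            self._streak = self._streak + 1 if best == self._challenger else 1
            self._challenger = best
            if self._streak >= self.hold:
                self.target, self._challenger, self._streak = best, None, 0
        else:
            self._challenger, self._streak = None, 0
        return self.target


ctrl = GazeController()
a_speaks, b_speaks = {"A": 0.9, "B": 0.1}, {"A": 0.1, "B": 0.9}
print([ctrl.update(f) for f in [a_speaks, b_speaks, b_speaks, b_speaks]])  # ['A', 'A', 'A', 'B']
```

A single-frame burst from B would not move the gaze; the hysteresis trades a short reaction delay for fewer distracting gaze jumps.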

(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)


16 pages, 3403 KB

Open Access Article

Beyond Binary Dialogues: Research and Development of a Linguistically Nuanced Conversation Design for Social Robots in Group–Robot Interactions

by Christoph Bensch, Ana Müller, Oliver Chojnowski and Anja Richert

Appl. Sci. 2024, 14(22), 10316; https://doi.org/10.3390/app142210316 - 9 Nov 2024

Cited by 5 | Viewed by 2315

Abstract

In this paper, we detail the technical development of an adaptable conversation design that is sensitive to group dynamics and accounts for the subtle linguistic differences between dyadic (i.e., one human and one agent) and group interactions in human–robot interaction (HRI), using the German language as a case study. It covers the implementation of robust person and group detection with YOLOv5m and the expansion of knowledge databases using large language models (LLMs) to create adaptive multi-party interactions (MPIs) (i.e., group–robot interactions (GRIs)). We describe the use of LLMs to generate training data for socially interactive agents, including social robots, as well as a self-developed synthesis tool, knowledge expander, to accurately map the diverse needs of different users in public spaces. We also outline the integration of an LLM as a fallback for open-ended questions not covered by our knowledge database, ensuring that the robot can respond effectively to both individuals and groups within the MPI framework.
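As a hypothetical sketch of the dyadic-versus-group distinction and the LLM fallback, the snippet below stores two phrasings per knowledge-base entry and picks one from the detected number of people; in German, addressing one person informally uses "du/dir", while a group is addressed with "ihr/euch". The dictionary layout, the function names, and the `fake_llm` placeholder are assumptions, not the authors’ knowledge database or API; a real system would match intents rather than exact strings.

```python
from collections.abc import Callable

# Hypothetical knowledge base: each entry has a dyadic and a group variant.
# German separates informal singular address (du/dir) from informal plural (ihr/euch).
KNOWLEDGE: dict[str, dict[str, str]] = {
    "greeting": {
        "single": "Hallo! Wie kann ich dir helfen?",
        "group": "Hallo zusammen! Wie kann ich euch helfen?",
    },
    "exit": {
        "single": "Der Ausgang ist links von dir.",
        "group": "Der Ausgang ist links von euch.",
    },
}


def respond(query: str, num_people: int, fallback: Callable[[str, bool], str]) -> str:
    """Return the variant matching the audience size; defer unknown queries to `fallback`."""
    is_group = num_people > 1  # person count, e.g. from an upstream person/group detector
    entry = KNOWLEDGE.get(query)
    if entry is None:
        # Stand-in for the LLM fallback used for open-ended questions.
        return fallback(query, is_group)
    return entry["group" if is_group else "single"]


def fake_llm(query: str, is_group: bool) -> str:
    """Placeholder for a real LLM call; it only echoes its inputs."""
    return f"[LLM group={is_group}] {query}"


print(respond("greeting", 1, fake_llm))       # Hallo! Wie kann ich dir helfen?
print(respond("greeting", 3, fake_llm))       # Hallo zusammen! Wie kann ich euch helfen?
print(respond("opening hours", 2, fake_llm))  # [LLM group=True] opening hours
```

Passing the group flag to the fallback as well lets an open-ended answer use the same form of address as the curated entries.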

(This article belongs to the Special Issue Advances in Cognitive Robotics and Control)


20 pages, 1190 KB

Open Access Article

UAV Confrontation and Evolutionary Upgrade Based on Multi-Agent Reinforcement Learning

by Xin Deng, Zhaoqi Dong and Jishiyu Ding

Drones 2024, 8(8), 368; https://doi.org/10.3390/drones8080368 - 1 Aug 2024

Cited by 3 | Viewed by 2832

Abstract

Unmanned aerial vehicle (UAV) confrontation scenarios play a crucial role in the study of agent behavior selection and decision planning. Multi-agent reinforcement learning (MARL) algorithms serve as a broadly effective method for guiding agents toward appropriate action strategies: they determine subsequent actions based on the agents’ states and the environmental information the agents receive. However, traditional MARL settings often result either in one party’s agents consistently outperforming the other party’s because of superior strategies, or in both parties reaching a strategic stalemate with no further improvement. To solve this issue, we propose a semi-static deep deterministic policy gradient algorithm based on MARL. This algorithm employs a centralized training and decentralized execution approach, dynamically adjusting the training intensity based on the comparative strengths and weaknesses of both parties’ strategies. Experimental results show that during training, the strategy of the winning team drives the losing team’s strategy to upgrade continuously, and the relationship between the winning team and the losing team keeps changing, thus achieving mutual improvement of both teams’ strategies. Compared with the traditional reinforcement learning algorithm, the semi-static reinforcement learning algorithm improves the win-loss relationship conversion by 8% and reduces training time by 40%.
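The abstract states that training intensity is adjusted from the two teams’ relative strength but does not give the rule. The sketch below is a hypothetical stand-in, not the semi-static DDPG algorithm: it tracks the recent win rate and grants the weaker team extra gradient updates per round. The class name, window size, update counts, and linear scaling are assumptions.

```python
from collections import deque


class TrainingScheduler:
    """Hypothetical sketch: give the currently weaker team more gradient updates."""

    def __init__(self, window: int = 10, base_updates: int = 4, max_extra: int = 4):
        self.results = deque(maxlen=window)  # 1 if team A won an episode, else 0
        self.base_updates = base_updates
        self.max_extra = max_extra

    def record(self, team_a_won: bool) -> None:
        self.results.append(1 if team_a_won else 0)

    def updates_per_team(self) -> tuple[int, int]:
        """Return (updates for team A, updates for team B) for the next round."""
        if not self.results:
            return self.base_updates, self.base_updates
        win_rate_a = sum(self.results) / len(self.results)
        imbalance = 2 * win_rate_a - 1  # in [-1, 1]; positive means team A is ahead
        extra = round(abs(imbalance) * self.max_extra)
        if imbalance > 0:  # A is winning, so train B harder
            return self.base_updates, self.base_updates + extra
        if imbalance < 0:  # B is winning, so train A harder
            return self.base_updates + extra, self.base_updates
        return self.base_updates, self.base_updates


s = TrainingScheduler()
for won in [True] * 8 + [False] * 2:
    s.record(won)
print(s.updates_per_team())  # (4, 6)
```

A fixed schedule would keep training both teams equally; tilting updates toward the trailing team is one simple way to counter the runaway-winner and stalemate cases the abstract mentions.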

(This article belongs to the Special Issue Distributed Control, Optimization, and Game of UAV Swarm Systems)


17 pages, 9159 KB

Open Access Article

The Effect of Eye Contact in Multi-Party Conversations with Virtual Humans and Mitigating the Mona Lisa Effect

by Junyeong Kum, Sunghun Jung and Myungho Lee

Electronics 2024, 13(2), 430; https://doi.org/10.3390/electronics13020430 - 19 Jan 2024

Cited by 2 | Viewed by 3407

Abstract

The demand for kiosk systems with embodied conversational agents has increased with the development of artificial intelligence. There have been attempts to use non-verbal cues, particularly virtual human (VH) eye contact, to enable human-like interaction. Eye contact with VHs can affect satisfaction with the system and the perception of VHs. However, when a VH is rendered on a 2D kiosk display, its gaze direction can be misperceived because of a lack of stereo cues. A user study was conducted to examine the effects of VH gaze behavior in multi-party conversations in a 2D display setting. The results showed that looking at the actual speakers affects the perceived interpersonal skills, social presence, attention, co-presence, and competence in conversations with VHs. In a second study, gaze perception was further examined in light of the Mona Lisa effect, which, within a narrow range of angles, can lead users to believe that a VH rendered on a 2D display is gazing at them regardless of its actual gaze direction. We also proposed the camera rotation angle fine tuning (CRAFT) method to enhance users’ perceptual accuracy regarding the direction of the VH’s gaze. The results showed that perceptual accuracy for the VH’s gaze decreased within this narrow range and that CRAFT could increase it.
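CRAFT itself is not described here in enough detail to reproduce, but the underlying geometry helps explain why a narrow range of angles matters: for a VH on a flat display, the horizontal angle separating two participants shrinks quickly with viewing distance. The sketch below computes that angle under an assumed coordinate frame (display at the origin facing +z, lateral offset x and distance z in metres); the function names are hypothetical and the code does not implement CRAFT.

```python
import math


def gaze_yaw_deg(x: float, z: float) -> float:
    """Horizontal angle, in degrees, from the display centre to a person.

    Assumed frame: the display is at the origin facing +z, x is the person's
    lateral offset in metres (positive to the right as seen from the display),
    and z is their distance from the display plane in metres.
    """
    return math.degrees(math.atan2(x, z))


def angular_separation_deg(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Difference in yaw between two people, i.e. how far the VH must turn between them."""
    return abs(gaze_yaw_deg(*p) - gaze_yaw_deg(*q))


# Two people standing 0.5 m apart, centred in front of the kiosk.
for distance in (1.0, 3.0):
    left, right = (-0.25, distance), (0.25, distance)
    print(f"{distance} m: {angular_separation_deg(left, right):.1f} deg")
```

The two people are about 28° apart at 1 m but only about 9.5° apart at 3 m, so at typical kiosk distances the gaze targets a VH must distinguish are plausibly close enough for the lack of stereo cues to blur them.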

(This article belongs to the Section Computer Science & Engineering)


Search Results (5)
