Search Results (23)

Search Parameters:
Keywords = multimodal imitation

18 pages, 23505 KB  
Article
ArtUnmasked: A Multimodal Classifier for Real, AI, and Imitated Artworks
by Akshad Chidrawar and Garima Bajwa
J. Imaging 2026, 12(3), 133; https://doi.org/10.3390/jimaging12030133 - 16 Mar 2026
Viewed by 349
Abstract
Differentiating among AI-generated, real, and imitated artworks is becoming a tedious and computationally challenging problem in digital art analysis. AI-generated art has become nearly indistinguishable from human-made works, posing a significant threat to copyrighted content. This content is appearing on online platforms, at exhibitions, and in commercial galleries, thereby escalating the risk of copyright infringement. This sudden increase in generative images raises concerns about authenticity, intellectual property, and the preservation of cultural heritage. Without an automated, comprehensible system to determine whether an artwork is AI-generated, authentic (real), or imitated, artists risk the devaluation of their unique works, and institutions struggle to curate and safeguard authentic pieces. As the variety of generative models continues to grow, a robust, efficient, and transparent framework for determining whether a piece of art or an artist is involved in potential copyright infringement becomes a cultural necessity. To address these challenges, we introduce ArtUnmasked, a practical and interpretable framework comprising (i) a lightweight Spectral Artifact Identification (SPAI) module that efficiently distinguishes AI-generated artworks from real ones, (ii) a TagMatch-based artist filtering module for stylistic attribution, and (iii) a DINOv3–CLIP similarity module with patch-level correspondence that leverages the one-shot generalization ability of modern vision transformers to determine whether an artwork is authentic or imitated. We also created a custom dataset of ∼24K imitated artworks to complement our evaluation and support future research. The complete implementation is available in our GitHub repository. Full article
(This article belongs to the Section AI in Imaging)
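The abstract leaves the similarity computation unspecified; as a rough sketch of what patch-level correspondence scoring between a query artwork and a reference can look like (stubbed embeddings stand in for real DINOv3/CLIP features, and all names and dimensions are assumptions, not taken from the paper):

```python
import numpy as np

def patch_similarity(query_patches: np.ndarray, ref_patches: np.ndarray) -> float:
    """Score how closely a query artwork matches a reference artwork.

    Both inputs are (num_patches, dim) arrays of patch embeddings, e.g. patch
    tokens from a vision transformer backbone (stubbed here). For every query
    patch we find its best-matching reference patch by cosine similarity, then
    average those best matches into one image-level score.
    """
    q = query_patches / np.linalg.norm(query_patches, axis=1, keepdims=True)
    r = ref_patches / np.linalg.norm(ref_patches, axis=1, keepdims=True)
    sims = q @ r.T                         # (nq, nr) cosine similarities
    return float(sims.max(axis=1).mean())  # patch-level correspondence score

# Toy usage with random stand-in embeddings (a real pipeline would extract
# patch tokens from the artwork images with a pretrained encoder).
rng = np.random.default_rng(0)
query, reference = rng.normal(size=(196, 768)), rng.normal(size=(196, 768))
print(f"similarity score: {patch_similarity(query, reference):.3f}")
```

A higher score indicates stronger patch-for-patch agreement with the reference, which is the kind of evidence an imitation detector would weigh.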

15 pages, 5293 KB  
Systematic Review
Embodied Artificial Intelligence in Healthcare: A Systematic Review of Robotic Perception, Decision-Making, and Clinical Impact
by Bilal Ahmad Mir, Dur E. Nishwa and Seung Won Lee
Healthcare 2026, 14(5), 572; https://doi.org/10.3390/healthcare14050572 - 25 Feb 2026
Viewed by 813
Abstract
Background: Embodied artificial intelligence (EAI), integrating advanced AI algorithms with robotic platforms capable of sensing, planning, and acting, has emerged as a transformative approach in healthcare delivery. This systematic review synthesizes evidence on robotic perception, decision-making, and clinical impact of EAI systems in healthcare settings. Methods: Following PRISMA 2020 guidelines, we searched PubMed/MEDLINE, Scopus, Web of Science, IEEE Xplore, and ACM Digital Library for studies published between January 2020 and August 2025. Seventeen studies met eligibility criteria, spanning four domains: surgical assistance, rehabilitation, hospital logistics, and telepresence. The protocol was prospectively registered in PROSPERO under ID: CRD420261285936. Results: Perception architectures predominantly employed multimodal sensor fusion, combining vision with force/torque, depth, and physiological signals. Decision-making approaches included imitation learning, reinforcement learning, and hybrid symbolic-neural control. Key findings indicate that surgical robots demonstrated consistency advantages in specific experimental tasks, rehabilitation robotics produced statistically significant improvements (SMD = 0.29) across 396 randomized controlled trials, and both logistics and telepresence systems achieved very high operational success levels. Nonetheless, important barriers remain, including limited external validation, small sample sizes, and insufficient cost-effectiveness data. Conclusions: Future research should prioritize standardized benchmarks, prospective multicenter trials, and patient-centered outcome measures to facilitate clinical translation of EAI technologies. Full article
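For reference, the standardized mean difference cited above is conventionally defined as follows (the textbook formula, not reproduced from the review itself):

```latex
\mathrm{SMD} = \frac{\bar{x}_1 - \bar{x}_2}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

where subscripts 1 and 2 denote the intervention and control arms.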

14 pages, 3289 KB  
Brief Report
iTBS Stimulation of the Bilateral IFG/IPL Alters the Oscillatory Pattern in ASD
by Mitra Assadi, Reza Koiler, Ryan Ally, Richard Fischer and Rodney Scott
Brain Sci. 2026, 16(2), 192; https://doi.org/10.3390/brainsci16020192 - 6 Feb 2026
Viewed by 574
Abstract
Background: Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by impairments in social communication, reciprocity, and adaptive behavior. Converging neurobiological evidence suggests that these clinical features arise from aberrant connectivity and dysregulated neuronal oscillations across distributed brain networks. In particular, dysfunction within the mirror neuron regions, concentrated in the inferior frontal gyrus (IFG) and inferior parietal lobule (IPL), has been implicated in deficits of imitation, empathy, and social cognition in ASD. Non-invasive neuromodulation using repetitive transcranial magnetic stimulation (rTMS) has shown modest behavioral benefits in ASD. However, most studies apply the conventional protocols targeting the dorsolateral prefrontal cortex. The effects of intermittent theta-burst stimulation (iTBS), a potent excitatory rTMS protocol targeting the mirror neuron regions, on the oscillatory dynamics in ASD remain largely unexplored. Objective: To investigate whether iTBS targeting the bilateral IFG and IPL modulates EEG-derived oscillatory activity in adolescents with ASD and to explore the relationship between oscillatory changes and social reciprocity. Methods: Six adolescents with Level I or II ASD (ages 13–18) underwent bilateral iTBS targeting the IFG and IPL using a figure-of-eight coil and standardized theta-burst parameters. Participants were randomized either to 18 active iTBS sessions or to a waitlist-controlled crossover arm (9 sham followed by 9 active sessions). Standard 21-channel EEG recordings were obtained during the first (EEG-1) and final (EEG-2) active stimulation sessions, including pre- and post-stimulation epochs. Power spectral analyses were conducted across frequency bands (delta through gamma). Behavioral outcomes were assessed using the Childhood Autism Rating Scale, Second Edition (CARS2), administered pre- and post-intervention. Results: All participants tolerated the intervention without adverse effects. Behavioral analysis demonstrated a significant reduction in CARS2 scores following iTBS, consistent with improved social reciprocity (p < 0.001); these results are reported in detail in our prior clinical outcomes manuscript. EEG analysis revealed an immediate post-stimulation increase in gamma-band power during EEG-1 in five of six participants, whereas lower-frequency bands exhibited variable responses. In contrast, EEG-2 showed no consistent post-stimulation gamma enhancement. Net comparisons between EEG-1 and EEG-2 demonstrated attenuation of the initial gamma response in the same five participants. At the group level, gamma percent change did not reach statistical significance at EEG-1 (p = 0.12) or EEG-2 (p = 0.66), and exploratory comparisons between the 9-active and 18-active arms did not reach statistical significance. While ipsi-directional changes in gamma power and CARS2 scores were observed in four participants, no correlation was identified in this pilot sample. Conclusions: Bilateral iTBS targeting the IFG and IPL induces a transient enhancement of gamma oscillations in adolescents with ASD that attenuates with repeated stimulation. This pattern is consistent with adaptive homeostatic plasticity (metaplasticity) within excitatory–inhibitory circuits, potentially mediated by GABAergic interneurons. These findings support the feasibility of EEG as an objective biomarker of neuromodulatory engagement in ASD and highlight the importance of network-level and oscillatory mechanisms in interpreting therapeutic responses. Larger, sham-controlled studies incorporating multimodal biomarkers are warranted to clarify clinical relevance and optimize personalized neuromodulation strategies. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
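The gamma percent-change measure referenced above is typically derived by integrating a power spectral density over the gamma band; a minimal sketch of such a pipeline (band limits, sampling rate, and the Welch estimator are assumptions, not the study's reported methods):

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg: np.ndarray, fs: float, lo: float, hi: float) -> float:
    """Integrate the Welch power spectral density of a 1-D EEG trace over a band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))
    mask = (freqs >= lo) & (freqs <= hi)
    return float(np.sum(psd[mask]) * (freqs[1] - freqs[0]))  # rectangle rule

def gamma_percent_change(pre: np.ndarray, post: np.ndarray, fs: float = 250.0) -> float:
    """Percent change in gamma power (assumed 30-80 Hz) after stimulation."""
    p_pre = band_power(pre, fs, 30.0, 80.0)
    p_post = band_power(post, fs, 30.0, 80.0)
    return 100.0 * (p_post - p_pre) / p_pre

# Toy usage on synthetic signals: white noise, with extra 40 Hz power "post".
rng = np.random.default_rng(1)
t = np.arange(0, 60, 1 / 250.0)
pre = rng.normal(size=t.size)
post = rng.normal(size=t.size) + 0.5 * np.sin(2 * np.pi * 40 * t)
print(f"gamma change: {gamma_percent_change(pre, post):+.1f}%")
```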

21 pages, 3773 KB  
Article
Motion Strategy Generation Based on Multimodal Motion Primitives and Reinforcement Learning Imitation for Quadruped Robots
by Qin Zhang, Guanglei Li, Benhang Liu, Chenxi Li, Chuanle Zhu and Hui Chai
Biomimetics 2026, 11(2), 115; https://doi.org/10.3390/biomimetics11020115 - 4 Feb 2026
Viewed by 685
Abstract
With the advancement of task-oriented reinforcement learning (RL), the capability of quadruped robots for motion generation and complex task completion has significantly improved. However, current control strategies require extensive domain expertise and time-consuming design processes to acquire operational skills and achieve multi-task motion control, often failing to effectively manage complex behaviors composed of multiple coordinated actions. To address these limitations, this paper proposes a motion policy generation method for quadruped robots based on multimodal motion primitives and imitation learning. A multimodal motion library was constructed using 3D engine motion design, motion capture data retargeting, and trajectory planning. A temporal domain-based behavior planner was designed to combine these primitives and generate complex behaviors. We developed an RL-based imitation learning training framework to achieve precise trajectory tracking and rapid policy deployment, ensuring the effective application of actions/behaviors on the quadruped platform. Simulation and physical experiments conducted on the Lite3 quadruped robot validated the efficacy of the proposed approach, offering a new paradigm for the deployment and development of motion strategies for quadruped robots. Full article
(This article belongs to the Section Locomotion and Bioinspired Robotics)
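The abstract does not give the reward used by the RL-based imitation framework; the following is a generic sketch of the exponential trajectory-tracking reward commonly used for motion imitation, with weights and scales that are illustrative only:

```python
import numpy as np

def imitation_reward(q: np.ndarray, q_ref: np.ndarray,
                     v: np.ndarray, v_ref: np.ndarray,
                     w_pose: float = 0.6, w_vel: float = 0.4) -> float:
    """Exponential tracking reward for motion imitation.

    q, v         : current joint positions / velocities of the robot
    q_ref, v_ref : the same quantities read from the reference motion primitive
    Weights and exponential scales are invented for illustration.
    """
    pose_term = np.exp(-2.0 * np.sum((q - q_ref) ** 2))
    vel_term = np.exp(-0.1 * np.sum((v - v_ref) ** 2))
    return w_pose * pose_term + w_vel * vel_term

# At each control step the reference frame would be read from the motion
# library (retargeted mocap or designed clips) at the behavior's current phase.
q = np.zeros(12); q_ref = 0.05 * np.ones(12)   # 12 joints on a quadruped
v = np.zeros(12); v_ref = np.zeros(12)
print(f"reward: {imitation_reward(q, q_ref, v, v_ref):.3f}")
```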

21 pages, 3516 KB  
Article
Diffusion-Guided Model Predictive Control for Signal Temporal Logic Specifications
by Jonghyuck Choi and Kyunghoon Cho
Electronics 2026, 15(3), 551; https://doi.org/10.3390/electronics15030551 - 27 Jan 2026
Viewed by 417
Abstract
We study control synthesis under Signal Temporal Logic (STL) specifications for driving scenarios where strict rule satisfaction is not always feasible and human experts exhibit context-dependent flexibility. We represent such behavior using robustness slackness—learned rule-wise lower bounds on STL robustness—and introduce sub-goals that encode intermediate intent in the state/output space (e.g., lane-level waypoints). Prior learning-based MPC–STL methods typically infer slackness with VAE priors and plug it into MPC, but these priors can underrepresent multimodal and rare yet valid expert behaviors and do not explicitly model intermediate intent. We propose a diffusion-guided MPC–STL framework that jointly learns slackness and sub-goals from demonstrations and integrates both into STL-constrained MPC. A conditional diffusion model generates pairs of (rule-wise slackness, sub-goal) conditioned on features from the ego vehicle, surrounding traffic, and road context. At run time, a few denoising steps produce samples for the current situation; slackness values define soft STL margins, while sub-goals shape the MPC objective via a terminal (optionally stage) cost, enabling context-dependent trade-offs between rule relaxation and task completion. In closed-loop simulations on held-out highD track-driving scenarios, our method improves task success and yields more realistic lane-changing behavior compared to imitation-learning baselines and MPC–STL variants using CVAE slackness or strict rule enforcement, while remaining computationally tractable for receding-horizon MPC in our experimental setting. Full article
(This article belongs to the Special Issue Real-Time Path Planning Design for Autonomous Driving Vehicles)
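To make the robustness-slackness idea concrete: for a simple "always" rule, STL robustness is the worst-case margin over the horizon, and the learned slack sets how far below zero that margin may dip before the MPC is penalized. A minimal sketch under those assumptions (the rule, weight, and values are invented for illustration):

```python
import numpy as np

def robustness_always_below(signal: np.ndarray, limit: float) -> float:
    """STL robustness of G_[0,T](signal <= limit): the worst-case margin."""
    return float(np.min(limit - signal))

def soft_stl_penalty(signal: np.ndarray, limit: float, slack: float,
                     weight: float = 10.0) -> float:
    """Hinge penalty enforcing the soft constraint robustness >= slack.

    `slack` plays the role of the learned rule-wise lower bound on robustness
    (sampled from the diffusion model in the paper); weight is illustrative.
    """
    rho = robustness_always_below(signal, limit)
    return weight * max(0.0, slack - rho)

# Toy usage: a predicted speed profile over the MPC horizon vs. a 30 m/s limit.
speeds = np.array([27.0, 28.5, 29.5, 30.5, 29.0])
print(soft_stl_penalty(speeds, limit=30.0, slack=-1.0))
# A negative slack permits a mild, context-dependent violation of the rule,
# mirroring the flexibility that expert drivers exhibit.
```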

21 pages, 1118 KB  
Review
Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines
by Yutong Liu, Qingquan Sun and Dhruvi Rajeshkumar Kapadia
AI 2025, 6(7), 158; https://doi.org/10.3390/ai6070158 - 15 Jul 2025
Cited by 8 | Viewed by 11058
Abstract
This survey provides a comprehensive review of the integration of large language models (LLMs) into autonomous robotic systems, organized around four key pillars: locomotion, navigation, manipulation, and voice-based interaction. We examine how LLMs enhance robotic autonomy by translating high-level natural language commands into low-level control signals, supporting semantic planning and enabling adaptive execution. Systems like SayTap improve gait stability through LLM-generated contact patterns, while TrustNavGPT achieves a 5.7% word error rate (WER) under noisy voice-guided conditions by modeling user uncertainty. Frameworks such as MapGPT, LLM-Planner, and 3D-LOTUS++ integrate multi-modal data—including vision, speech, and proprioception—for robust planning and real-time recovery. We also highlight the use of physics-informed neural networks (PINNs) to model object deformation and support precision in contact-rich manipulation tasks. To bridge the gap between simulation and real-world deployment, we synthesize best practices from benchmark datasets (e.g., RH20T, Open X-Embodiment) and training pipelines designed for one-shot imitation learning and cross-embodiment generalization. Additionally, we analyze deployment trade-offs across cloud, edge, and hybrid architectures, emphasizing latency, scalability, and privacy. The survey concludes with a multi-dimensional taxonomy and cross-domain synthesis, offering design insights and future directions for building intelligent, human-aligned robotic systems powered by LLMs. Full article
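The word error rate quoted for TrustNavGPT follows the standard speech-recognition definition, included here for reference:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

where S, D, and I are the substitutions, deletions, and insertions in the minimum-edit alignment of the hypothesis against the reference transcript, and N is the number of reference words.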

17 pages, 2758 KB  
Article
History-Aware Multimodal Instruction-Oriented Policies for Navigation Tasks
by Renas Mukhametzianov and Hidetaka Nambo
AI 2025, 6(4), 75; https://doi.org/10.3390/ai6040075 - 11 Apr 2025
Viewed by 1925
Abstract
The rise of large-scale language models and multimodal transformers has enabled instruction-based policies, such as vision-and-language navigation. To leverage their general world knowledge, we propose multimodal annotations for action options and support selection from a dynamic, describable action space. Our framework employs a multimodal transformer that processes front-facing camera images, point clouds from a light detection and ranging (LIDAR) sensor, and tasks given as textual instructions to produce a history-aware decision policy for mobile robot navigation. Our approach leverages a pretrained vision–language encoder and integrates it with a custom causal generative pretrained transformer (GPT) decoder to predict action sequences within a state–action history. We propose a trainable attention score mechanism to efficiently select the most suitable action from a variable set of possible options. Action options are text–image pairs encoded with the same multimodal encoder employed for environment states. This approach of annotating and dynamically selecting actions is applicable to broader multidomain decision-making tasks. We compared two baseline models, ViLT (vision-and-language transformer) and FLAVA (foundational language and vision alignment), and found that FLAVA achieves superior performance within the constraints of 8 GB video memory usage in the training phase. Experiments were conducted in both simulated and real-world environments using our custom datasets for instructed task completion episodes, demonstrating strong prediction accuracy. These results highlight the potential of multimodal, dynamic action spaces for instruction-based robot navigation and beyond. Full article
(This article belongs to the Section AI in Autonomous Systems)
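A trainable attention score over a variable set of action options can be sketched as a scaled dot product between a state query and per-option keys; the toy module below assumes embedding dimensions and layer shapes that the abstract does not specify:

```python
import torch
import torch.nn as nn

class ActionScorer(nn.Module):
    """Trainable attention score over a variable-size set of action options.

    `state` is the decoder's current history embedding; `options` holds one
    multimodal embedding per candidate action (text + image encoded by the
    same encoder as the environment state). Dimensions are illustrative.
    """
    def __init__(self, dim: int = 256):
        super().__init__()
        self.query = nn.Linear(dim, dim)   # project state into a query
        self.key = nn.Linear(dim, dim)     # project options into keys

    def forward(self, state: torch.Tensor, options: torch.Tensor) -> torch.Tensor:
        q = self.query(state)                    # (dim,)
        k = self.key(options)                    # (n_options, dim)
        logits = k @ q / q.shape[-1] ** 0.5      # scaled dot-product scores
        return logits.softmax(dim=-1)            # distribution over options

scorer = ActionScorer()
state = torch.randn(256)
options = torch.randn(5, 256)        # five candidate actions this step
probs = scorer(state, options)
print(int(probs.argmax()))           # index of the selected action option
```

Because the score is computed per option, the same module handles however many annotated actions happen to be available at a given step.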

16 pages, 6556 KB  
Article
Origami-Inspired Vacuum-Actuated Foldable Actuator Enabled Biomimetic Worm-like Soft Crawling Robot
by Qiping Xu, Kehang Zhang, Chenhang Ying, Huiyu Xie, Jinxin Chen and Shiju E
Biomimetics 2024, 9(9), 541; https://doi.org/10.3390/biomimetics9090541 - 6 Sep 2024
Cited by 11 | Viewed by 4045
Abstract
The development of a soft crawling robot (SCR) capable of quick folding and recovery has important application value in the field of biomimetic engineering. This article proposes an origami-inspired vacuum-actuated foldable soft crawling robot (OVFSCR), which is composed of entirely soft, foldable, mirrored origami actuators with a Kresling crease pattern and can realize multimodal locomotion incorporating crawling, climbing, and turning movements. The OVFSCR produces periodically folding and recovering body deformation, and its asymmetric structural design of low front and high rear hexahedral feet creates a friction difference between the two feet and the contact surface to enable unidirectional movement. Combining an actuation control sequence with the asymmetric structural design, the body deformation and the feet in contact with the ground can be coordinated to realize quick, continuous forward crawling locomotion. Furthermore, an efficient dynamic model is developed to characterize the OVFSCR's motion capability. The robot demonstrates multifunctional characteristics, including crawling on a flat surface at an average speed of 11.9 mm/s, climbing a slope of 3°, carrying a certain payload, navigating inside straight and curved round tubes, removing obstacles, and traversing different media. It is revealed that the OVFSCR can imitate the contractile deformation and crawling mode exhibited by soft biological worms. Our study helps pave avenues for practical applications of soft robots in adaptive navigation, exploration, and inspection of uncharted territory. Full article
(This article belongs to the Special Issue Bioinspired Structures for Soft Actuators: 2nd Edition)

17 pages, 1626 KB  
Article
Modeling Autonomous Vehicle Responses to Novel Observations Using Hierarchical Cognitive Representations Inspired Active Inference
by Sheida Nozari, Ali Krayani, Pablo Marin, Lucio Marcenaro, David Martin Gomez and Carlo Regazzoni
Computers 2024, 13(7), 161; https://doi.org/10.3390/computers13070161 - 28 Jun 2024
Cited by 6 | Viewed by 2599
Abstract
Equipping autonomous agents for dynamic interaction and navigation is a significant challenge in intelligent transportation systems. This study aims to address this by implementing a brain-inspired model for decision making in autonomous vehicles. We employ active inference, a Bayesian approach that models decision-making processes in a manner similar to the human brain, focusing on the agent's preferences and the principle of free energy. This approach is combined with imitation learning to enhance the vehicle's ability to adapt to new observations and make human-like decisions. The research involved developing a multi-modal self-awareness architecture for autonomous driving systems and testing this model in driving scenarios, including abnormal observations. The results demonstrated the model's effectiveness in enabling the vehicle to make safe decisions, particularly in unobserved or dynamic environments. The study concludes that the integration of active inference with imitation learning significantly improves the performance of autonomous vehicles, offering a promising direction for future developments in intelligent transportation systems. Full article
(This article belongs to the Special Issue System-Integrated Intelligence and Intelligent Systems 2023)
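Active inference scores candidate actions by their expected free energy, trading off risk (predicted outcomes diverging from preferred ones) against ambiguity (uncertain observation mappings). A toy discrete-state sketch, with every distribution invented for illustration rather than drawn from the paper:

```python
import numpy as np

def expected_free_energy(q_s: np.ndarray, A: np.ndarray,
                         log_prefs: np.ndarray) -> float:
    """Expected free energy of one action in a discrete generative model.

    q_s       : predicted state distribution under the action, shape (S,)
    A         : likelihood p(o|s), shape (O, S)
    log_prefs : log of the preferred outcome distribution, shape (O,)
    G = risk (KL from preferences) + ambiguity (expected outcome entropy).
    """
    q_o = A @ q_s                                   # predicted outcomes
    risk = np.sum(q_o * (np.log(q_o + 1e-12) - log_prefs))
    ambiguity = -np.sum(q_s * np.sum(A * np.log(A + 1e-12), axis=0))
    return float(risk + ambiguity)

# Toy scenario: two states (lane clear / obstacle), two observations.
A = np.array([[0.9, 0.2],     # p(o=safe  | s)
              [0.1, 0.8]])    # p(o=risky | s)
log_prefs = np.log(np.array([0.95, 0.05]))  # the agent prefers "safe"
for name, q_s in [("keep lane", np.array([0.8, 0.2])),
                  ("swerve", np.array([0.3, 0.7]))]:
    print(name, round(expected_free_energy(q_s, A, log_prefs), 3))
# The action whose predicted outcomes best match preferences (and are least
# ambiguous) receives the lower G and is selected.
```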

28 pages, 1535 KB  
Article
Technology-Mediated Hindustani Dhrupad Music Education: An Ethnographic Contribution to the 4E Cognition Perspective
by Stella Paschalidou
Educ. Sci. 2024, 14(2), 203; https://doi.org/10.3390/educsci14020203 - 17 Feb 2024
Cited by 5 | Viewed by 3116
Abstract
Embodiment lies at the core of music cognition, prompting recent pedagogical shifts towards a multi-sensory, whole-body approach. However, the education of oral music genres that rely exclusively on direct teacher–disciple transmission through live demonstration and imitation is now undergoing a transformation by rapidly adapting to technology-mediated platforms. This paper examines challenges in embodied facets of video-mediated synchronous distance Hindustani music pedagogy. For this, it takes an ethnomusicological stance and showcases a thematic analysis of interviews featuring Dhrupad music practitioners. The analysis is driven and organized by the 4E Cognition principles, which stress the intimate relationship between body, mind, and environment. Findings indicate that while this adaptation aims to make music content more widely accessible, it comes at the cost of reducing opportunities for multi-modal engagement and interaction among participants. Results reveal limitations in transmitting non-verbal, embodied, multi-sensory cues, along with visual and acoustic disruptions of a sense of shared spatial and physical context, that hinder effective interaction and a sense of immersion, elements that are deemed vital in music education. They prompt concerns about the suitability of conventional videoconferencing platforms and offer key insights for the development of alternative technologies that can better assist embodied demands of the pedagogical practices involved. Full article
(This article belongs to the Special Issue Cultivating Creativity and Innovation in Music Education)

15 pages, 630 KB  
Article
Imitation of Novel Intransitive Body Actions in a Beluga Whale (Delphinapterus leucas): A “Do as Other Does” Study
by José Zamorano-Abramson and María Victoria Hernández-Lloreda
Animals 2023, 13(24), 3763; https://doi.org/10.3390/ani13243763 - 6 Dec 2023
Viewed by 3575
Abstract
Cetaceans are well known for their unique behavioral habits, such as calls and tactics. The possibility that these are acquired through social learning continues to be explored. This study investigates the ability of a young beluga whale to imitate novel behaviors. Using a do-as-other-does paradigm, the subject observed the performance of a conspecific demonstrator involving familiar and novel behaviors. The subject: (1) learned a specific 'copy' command; (2) copied 100% of the demonstrator's familiar behaviors and accurately reproduced two out of three novel actions; (3) achieved full matches on the first trial for a subset of familiar behaviors; and (4) demonstrated proficiency in copying each familiar behavior as well as the two novel behaviors. This study provides the first experimental evidence of a beluga whale's ability to imitate novel intransitive (non-object-oriented) body movements on command. These results contribute to our understanding of the remarkable ability of cetaceans, including dolphins, orcas, and now beluga whales, to engage in multimodal imitation involving sounds and movements. This ability, rarely documented in non-human animals, has significant implications for the development of survival strategies, such as the acquisition of knowledge about natal philopatry, migration routes, and traditional feeding areas, among these marine mammals. Full article
(This article belongs to the Special Issue Advances in Marine Mammal Cognition and Cognitive Welfare)

17 pages, 8426 KB  
Article
Design, Modeling, and Control of an Aurelia-Inspired Robot Based on SMA Artificial Muscles
by Yihan Yang, Chenzhong Chu, Hu Jin, Qiqiang Hu, Min Xu and Erbao Dong
Biomimetics 2023, 8(2), 261; https://doi.org/10.3390/biomimetics8020261 - 15 Jun 2023
Cited by 16 | Viewed by 4165
Abstract
This paper presents a flexible and easily fabricated untethered underwater robot inspired by Aurelia, which is named "Au-robot". The Au-robot is actuated by six radial fins made of shape memory alloy (SMA) artificial muscle modules, which can realize pulse jet propulsion motion. The thrust model of the Au-robot's underwater motion is developed and analyzed. To achieve a multimodal and smooth swimming transition for the Au-robot, a control method integrating a central pattern generator (CPG) and an adaptive regulation (AR) heating strategy is provided. The experimental results demonstrate that the Au-robot, with good bionic properties in structure and movement mode, can achieve a smooth transition from low-frequency swimming to high-frequency swimming with an average maximum instantaneous velocity of 12.61 cm/s. It shows that a robot designed and fabricated with artificial muscle can imitate biological structures and movement traits more realistically and has better motor performance. Full article
(This article belongs to the Special Issue Bio-Inspired Underwater Robot)
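CPG controllers of this kind are often built from limit-cycle oscillators whose frequency can be ramped smoothly, which is exactly what a gradual low-to-high-frequency swimming transition requires. A minimal Hopf-oscillator sketch (parameters are illustrative, and the paper's adaptive-regulation heating strategy is not reproduced):

```python
import numpy as np

def hopf_step(x: float, y: float, mu: float, omega: float,
              dt: float = 0.01, gamma: float = 10.0) -> tuple[float, float]:
    """One Euler step of a Hopf oscillator.

    The oscillator converges to a stable limit cycle of radius sqrt(mu) and
    angular frequency omega, so smoothly ramping omega yields a smooth
    frequency transition of the rhythmic fin drive signal.
    """
    r2 = x * x + y * y
    dx = gamma * (mu - r2) * x - omega * y
    dy = gamma * (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

# Ramp the drive frequency from 0.5 Hz to 2 Hz over 10 s; x(t) would drive a
# fin's SMA heating command through the (not shown) AR strategy.
x, y, signal = 1.0, 0.0, []
for i in range(1000):
    f = 0.5 + 1.5 * i / 1000.0
    x, y = hopf_step(x, y, mu=1.0, omega=2 * np.pi * f)
    signal.append(x)
print(f"last sample: {signal[-1]:+.3f}")
```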

19 pages, 8583 KB  
Article
Learning-Based Visual Servoing for High-Precision Peg-in-Hole Assembly
by Yue Shen, Qingxuan Jia, Ruiquan Wang, Zeyuan Huang and Gang Chen
Actuators 2023, 12(4), 144; https://doi.org/10.3390/act12040144 - 27 Mar 2023
Cited by 16 | Viewed by 7358
Abstract
Visual servoing is widely used in the peg-in-hole assembly due to the uncertainty of pose. Humans can easily align the peg with the hole according to key visual points/edges. By imitating human behavior, we propose P2HNet, a learning-based neural network that can directly extract desired landmarks for visual servoing. To avoid collecting and annotating a large number of real images for training, we built a virtual assembly scene to generate many synthetic data for transfer learning. A multi-modal peg-in-hole strategy is then introduced to combine image-based search and force-based insertion. P2HNet-based visual servoing and spiral search are used to align the peg with the hole from coarse to fine. Force control is then used to complete the insertion. The strategy exploits the flexibility of neural networks and the stability of traditional methods. The effectiveness of the method was experimentally verified in the D-sub connector assembly with sub-millimeter clearance. The results show that the proposed method can achieve a higher success rate and efficiency than the baseline method in the high-precision peg-in-hole assembly. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications in Robotics)
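The coarse-to-fine strategy finishes alignment with a spiral search around the visually estimated hole center; a minimal Archimedean-spiral waypoint generator along those lines (pitch and step values are assumptions, not the paper's parameters):

```python
import math

def spiral_waypoints(cx: float, cy: float, pitch: float = 0.2,
                     step: float = 0.1, turns: int = 5):
    """Yield (x, y) probe points on an Archimedean spiral around (cx, cy).

    pitch : radial growth per full turn (mm); kept small enough that the
            spiral cannot skip over a hole within the visual-servoing error
    step  : approximate arc length between consecutive probe points (mm)
    """
    theta = 0.0
    while theta < 2 * math.pi * turns:
        r = pitch * theta / (2 * math.pi)
        yield cx + r * math.cos(theta), cy + r * math.sin(theta)
        # advance by roughly constant arc length: ds ~= r * dtheta
        theta += step / max(r, step)

# Usage: probe each waypoint with a gentle downward force; a sudden Z drop
# signals the peg has found the hole, and force-controlled insertion begins.
for x, y in spiral_waypoints(0.0, 0.0, turns=2):
    pass  # move_to(x, y); if z_dropped(): break   (robot calls are hypothetical)
print(f"last probe point: ({x:.2f}, {y:.2f})")
```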

15 pages, 10966 KB  
Article
Multimodal Biometrics Recognition Using a Deep Convolutional Neural Network with Transfer Learning in Surveillance Videos
by Hsu Mon Lei Aung, Charnchai Pluempitiwiriyawej, Kazuhiko Hamamoto and Somkiat Wangsiripitak
Computation 2022, 10(7), 127; https://doi.org/10.3390/computation10070127 - 21 Jul 2022
Cited by 20 | Viewed by 4570
Abstract
Biometric recognition is a critical task in security control systems. Although the face has long been widely accepted as a practical biometric for human recognition, it can be easily stolen and imitated. Moreover, in video surveillance, it is a challenge to obtain reliable facial information from an image taken at a long distance with a low-resolution camera. Gait, on the other hand, has been recently used for human recognition because gait is not easy to replicate, and reliable information can be obtained from a low-resolution camera at a long distance. However, the gait biometric alone still has constraints due to its intrinsic factors. In this paper, we propose a multimodal biometrics system by combining information from both the face and gait. Our proposed system uses a deep convolutional neural network with transfer learning. Our proposed network model learns discriminative spatiotemporal features from gait and facial features from face images. The two extracted features are fused into a common feature space at the feature level. This study conducted experiments on the publicly available CASIA-B gait and Extended Yale-B databases and a dataset of walking videos of 25 users. The proposed model achieves a 97.3 percent classification accuracy with an F1 score of 0.97 and an equal error rate (EER) of 0.004. Full article
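Feature-level fusion in this setting means projecting the face and gait embeddings into one joint space before identification; a toy sketch with assumed dimensions (the paper's actual backbones are transfer-learned CNNs, not reproduced here):

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Feature-level fusion of face and gait embeddings for identification.

    Embedding sizes and the single fusion head are illustrative; in the paper,
    spatiotemporal gait features and CNN face features are extracted first and
    then fused in a common feature space, as sketched here.
    """
    def __init__(self, face_dim: int = 512, gait_dim: int = 512, n_ids: int = 25):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(face_dim + gait_dim, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, n_ids),               # one logit per enrolled user
        )

    def forward(self, face_feat: torch.Tensor, gait_feat: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([face_feat, gait_feat], dim=-1))

model = FusionClassifier()
face = torch.randn(8, 512)      # batch of face embeddings
gait = torch.randn(8, 512)      # batch of gait embeddings
print(model(face, gait).shape)  # torch.Size([8, 25])
```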

12 pages, 3188 KB  
Article
Identification of 3D Lip Shape during Japanese Vowel Pronunciation Using Deep Learning
by Yoshihiro Sato and Yue Bao
Appl. Sci. 2022, 12(9), 4632; https://doi.org/10.3390/app12094632 - 5 May 2022
Cited by 1 | Viewed by 3857
Abstract
People with speech impediments and hearing impairments, whether congenital or acquired, often encounter difficulty in speaking. Therefore, to acquire conversational communication abilities, it is necessary to practice lipreading and imitation so that correct vocalization can be achieved. In conventional lipreading methods using machine learning, model refinement and multimodal processing are the norm to maintain high accuracy. However, since 3D point clouds can now be obtained using smartphones and other devices, it is becoming viable to consider methods that use 3D information. Therefore, given the obvious relation between vowel pronunciation and three-dimensional (3D) lip shape, in this study, we propose a method of extracting and discriminating vowel features via deep learning using 3D point clouds of the lip region. For training, we created two datasets: mixed-gender and male-only datasets. The results of the experiment showed that the average accuracy rate of the k-fold cross-validation exceeded 70% for both the mixed-gender and male-only data. In particular, although the proposed method was ~3.835% less accurate than the machine learning results for 2D images, the training parameters were reduced by 92.834%, and the proposed method succeeded in obtaining vowel features from 3D lip shapes. Full article
(This article belongs to the Special Issue Applications of Deep Learning and Artificial Intelligence Methods)
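The accuracy figures above are averages over k-fold cross-validation; a minimal sketch of that evaluation protocol on stand-in features (a linear classifier replaces the paper's deep point-cloud network purely to keep the example self-contained):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Stand-in data: 200 samples of flattened lip-region features, 5 vowel labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 5, size=200)

accs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    accs.append(clf.score(X[test_idx], y[test_idx]))  # per-fold accuracy
print(f"mean k-fold accuracy: {np.mean(accs):.3f}")
```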
