Search Results (612)

Search Parameters:
Keywords = human skeleton

21 pages, 3363 KB  
Article
A Hybrid CNN-GCN Architecture with Sparsity and Dataflow Optimization for Mobile AR
by Jiazhong Chen and Ziwei Chen
Appl. Sci. 2025, 15(17), 9356; https://doi.org/10.3390/app15179356 - 26 Aug 2025
Viewed by 281
Abstract
Mobile augmented reality (AR) applications require high-performance, energy-efficient deep learning solutions to deliver immersive experiences on resource-constrained devices. We propose SAHA-WS, a Sparsity-Aware Hybrid Architecture with Weight-Stationary Dataflow, combining Convolutional Neural Networks (CNNs) and Graph Convolutional Networks (GCNs) to efficiently process grid-like (e.g., images) and graph-structured (e.g., human skeletons) data. SAHA-WS leverages channel-wise sparsity in CNNs and adjacency matrix sparsity in GCNs, paired with weight-stationary dataflow, to minimize computations and memory access. Evaluations on ImageNet, COCO, and NTU RGB+D datasets demonstrate SAHA-WS achieves 87.5% top-1 accuracy, 75.8% mAP, and 92.5% action recognition accuracy at 0% sparsity, with 40 ms latency and 42 mJ energy consumption at 60% sparsity, outperforming a baseline by 1020% in efficiency. Ablation studies confirm the contributions of sparsity and dataflow optimizations. SAHA-WS enables complex AR applications to run smoothly on mobile devices, enhancing immersive and engaging experiences. Full article

19 pages, 7846 KB  
Article
Effect of Visual Quality of Street Space on Tourists’ Stay Willingness in Traditional Villages—Empirical Evidence from Huangcun Village Based on Street View Images and Machine Learning
by Li Tu, Xiao Jiang, Yixing Guo and Qi Qin
Land 2025, 14(8), 1631; https://doi.org/10.3390/land14081631 - 13 Aug 2025
Viewed by 395
Abstract
As the texture skeleton of the traditional village, the street space is the main area for tourists to visit in traditional villages; it is regarded as the spatial conversion place of human flow and the space frequently visited by tourists. Accumulating evidence shows that the visual quality of street spaces has an effect on pedestrians’ walking behaviors in urban areas, but this effect in traditional villages needs to be further explored. This paper takes Huangcun Village, Yixian County, Huangshan City, as the research area to explore the influence of the objective visual factors of street spaces on tourists’ subjective stay willingness. First, an evaluation system of the visual quality of street spaces was developed. With the assistance of computer vision and deep learning technologies, semantic segmentation of Huangcun Village street view images was performed to obtain a visual quality index and then calculate the descriptive index of Huangcun Village’s street space. Then, combining the data of tourists’ stay willingness with the visual quality of the street space, the overall evaluation results and spatial distribution of tourists’ stay willingness in Huangcun Village were predicted using the TrueSkill algorithm and a machine learning prediction model. Finally, the influence of the objective visual quality of the street space on tourists’ subjective stay willingness was analyzed by correlation analysis. This research could provide some useful information for street space design and tourism planning in traditional villages. Full article
(This article belongs to the Section Land Planning and Landscape Architecture)

21 pages, 2428 KB  
Article
Robust Human Pose Estimation Method for Body-to-Body Occlusion Using RGB-D Fusion Neural Network
by Jae-hyuk Yoon and Soon-kak Kwon
Appl. Sci. 2025, 15(15), 8746; https://doi.org/10.3390/app15158746 - 7 Aug 2025
Viewed by 571
Abstract
In this study, we propose a novel approach for human pose estimation (HPE) in occluded scenes by progressively fusing features extracted from RGB-D images, which contain RGB and depth images. Conventional bottom-up human pose estimation models that rely solely on RGB inputs often produce erroneous skeletons when parts of a person’s body are obscured by another individual, because they struggle to accurately infer body connectivity due to the lack of 3D topological information. To address this limitation, we modify OpenPose, a bottom-up HPE model, to take a depth image as an additional input, thereby providing explicit 3D spatial cues. Each input modality is processed by a dedicated feature extractor. In addition to the two existing modules for each stage (joint connectivity and joint confidence map estimation for the color image), we integrate a new module for estimating joint confidence maps for the depth image into the initial few stages. Subsequently, the confidence maps derived from both depth and RGB modalities are fused at each stage and forwarded to the next, ensuring that 3D topological information from the depth image is effectively utilized for both joint localization and body part association. The experimental results on the NTU 120+ RGB-D Dataset verify that our proposed approach achieves a 13.3% improvement in average recall compared to the original OpenPose model. The proposed method can enhance the performance of bottom-up HPE models in occluded scenes. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition & Computer Vision)

18 pages, 4529 KB  
Article
LGSIK-Poser: Skeleton-Aware Full-Body Motion Reconstruction from Sparse Inputs
by Linhai Li, Jiayi Lin and Wenhui Zhang
AI 2025, 6(8), 180; https://doi.org/10.3390/ai6080180 - 7 Aug 2025
Viewed by 536
Abstract
Accurate full-body motion reconstruction from sparse sensors is crucial for VR/AR applications but remains challenging due to the under-constrained nature of limited observations and the computational constraints of mobile platforms. This paper presents LGSIK-Poser, a unified and lightweight framework that supports real-time motion reconstruction from heterogeneous sensor configurations, including head-mounted displays, handheld controllers, and up to three optional inertial measurement units, without requiring reconfiguration across scenarios. The model integrates temporally grouped LSTM modeling, anatomically structured graph-based reasoning, and region-specific inverse kinematics refinement to enhance end-effector accuracy and structural consistency. Personalized body shape is estimated using user-specific anthropometric priors within the SMPL model, a widely adopted parametric representation of human shape and pose. Experiments on the AMASS benchmark demonstrate that LGSIK-Poser achieves state-of-the-art accuracy with up to 48% improvement in hand localization, while reducing model size by 60% and latency by 22% compared to HMD-Poser. The system runs at 63.65 FPS with only 3.74 M parameters, highlighting its suitability for real-time immersive applications. Full article

16 pages, 4989 KB  
Review
The Use of Paranasal Sinuses in Human Identification: Useful Concepts for Forensic Practitioners
by Joe Adserias-Garriga, Hannah Skropits and Brailey Moeder
Forensic Sci. 2025, 5(3), 35; https://doi.org/10.3390/forensicsci5030035 - 6 Aug 2025
Viewed by 392
Abstract
Background: Positive identification is at the forefront of tasks for forensic practitioners when a set of remains is discovered. Standard means of identification include fingerprints, dental, and DNA analyses; however, additional methods are utilized by forensic practitioners to identify remains when these primary methods of identification are not applicable. Comparative radiography has become a frequently employed approach for positive identification, specifically focused on individualizing characteristics evident in human skeletal variation. Regions that display wide ranges of morphological variation within the human skeleton include the cranium as well as the thorax. With regard to the cranium specifically, paranasal sinuses have been recognized as unique features and are valuable for identification purposes. Objectives: This paper explores the basic information of the anatomy and development, range of variation, and the importance of paranasal sinuses in forensic contexts. Results: This article discusses how practitioners can best use the morphological information contained in the paranasal sinuses and how to compare the antemortem and postmortem datasets involving different imaging modalities for positive identification purposes, in order to provide practical concepts that may assist in cases where paranasal sinuses may be used for forensic human identification. Conclusions: Understanding the development of paranasal sinuses, the imaging techniques applied for their visualization, as well as the principles of identification, is key to conducting proper antemortem vs. postmortem comparisons and effectively utilizing paranasal sinuses in forensic identification contexts. Full article
(This article belongs to the Special Issue Forensic Anthropology and Human Biological Variation)

16 pages, 5104 KB  
Article
Integrating OpenPose for Proactive Human–Robot Interaction Through Upper-Body Pose Recognition
by Shih-Huan Tseng, Jhih-Ciang Chiang, Cheng-En Shiue and Hsiu-Ping Yueh
Electronics 2025, 14(15), 3112; https://doi.org/10.3390/electronics14153112 - 5 Aug 2025
Viewed by 443
Abstract
This paper introduces a novel system that utilizes OpenPose for skeleton estimation to enable a tabletop robot to interact with humans proactively. By accurately recognizing upper-body poses based on the skeleton information, the robot autonomously approaches individuals and initiates conversations. The contributions of this paper can be summarized into three main features. Firstly, we conducted a comprehensive data collection process, capturing five different table-front poses: looking down, looking at the screen, looking at the robot, resting the head on hands, and stretching both hands. These poses were selected to represent common interaction scenarios. Secondly, we designed the robot’s dialog content and movement patterns to correspond with the identified table-front poses. By aligning the robot’s responses with the specific pose, we aimed to create a more engaging and intuitive interaction experience for users. Finally, we performed an extensive evaluation by exploring the performance of three classification models—non-linear Support Vector Machine (SVM), Artificial Neural Network (ANN), and convolutional neural network (CNN)—for accurately recognizing table-front poses. We used an Asus Zenbo Junior robot to acquire images and leveraged OpenPose to extract 12 upper-body skeleton points as input for training the classification models. The experimental results indicate that the ANN model outperformed the other models, demonstrating its effectiveness in pose recognition. Overall, the proposed system not only showcases the potential of utilizing OpenPose for proactive human–robot interaction but also demonstrates its real-world applicability. By combining advanced pose recognition techniques with carefully designed dialog and movement patterns, the tabletop robot successfully engages with humans in a proactive manner. Full article

18 pages, 3407 KB  
Article
Graph Convolutional Network with Multi-View Topology for Lightweight Skeleton-Based Action Recognition
by Liangliang Wang, Xu Zhang and Chuang Zhang
Symmetry 2025, 17(8), 1235; https://doi.org/10.3390/sym17081235 - 4 Aug 2025
Viewed by 493
Abstract
Skeleton-based action recognition is an important subject in deep learning. Graph Convolutional Networks (GCNs) have demonstrated strong performance by modeling the human skeleton as a natural topological graph, representing the connections between joints. However, most existing methods rely on non-adaptive topologies or insufficiently expressive representations. To address these limitations, we propose a Multi-view Topology Refinement Graph Convolutional Network (MTR-GCN), which is efficient, lightweight, and delivers high performance. Specifically: (1) We propose a new spatial topology modeling approach that incorporates two views. A dynamic view fuses joint information from dual streams in a pairwise manner, while a static view encodes the shortest static paths between joints, preserving the original connectivity relationships. (2) We propose a new MultiScale Temporal Convolutional Network (MSTC), which is efficient and lightweight. (3) Furthermore, we introduce a new temporal topology strategy by modeling temporal frames as a graph, which strengthens the extraction of temporal features. By modeling the human skeleton as both a spatial and a temporal graph, we reveal a topological symmetry between space and time within the unified spatio-temporal framework. The proposed model achieves state-of-the-art performance on several benchmark datasets, including NTU RGB + D (XSub: 92.8%, XView: 96.8%), NTU RGB + D 120 (XSub: 89.6%, XSet: 90.8%), and NW-UCLA (95.7%), demonstrating the effectiveness of our GCN module, TCN module, and overall architecture. Full article
(This article belongs to the Section Computer)
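
The static view mentioned in the abstract above encodes shortest paths between joints. A minimal sketch of that one idea, assuming a hypothetical five-joint toy skeleton and plain breadth-first search. The joint layout, `EDGES`, and `hop_distance_matrix` are illustrative names and are not taken from the paper:

```python
from collections import deque

# Hypothetical 5-joint toy skeleton (NOT the NTU RGB+D joint layout):
# 0 = torso, 1 = neck, 2 = head, 3 = left hand, 4 = right hand.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def hop_distance_matrix(num_joints, edges):
    """Return an N x N matrix of shortest-path lengths (in bones) between joints."""
    neighbors = {joint: [] for joint in range(num_joints)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    matrix = []
    for source in range(num_joints):
        dist = [-1] * num_joints  # -1 marks joints not reached yet
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from the source joint
            current = queue.popleft()
            for nxt in neighbors[current]:
                if dist[nxt] == -1:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        matrix.append(dist)
    return matrix


HOPS = hop_distance_matrix(5, EDGES)
```

In this toy graph the head and the torso are two bones apart, so `HOPS[0][2]` is 2. How the paper turns such distances into a topology for graph convolution is not stated in the abstract and is not modeled here.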

22 pages, 2525 KB  
Article
mmHSE: A Two-Stage Framework for Human Skeleton Estimation Using mmWave FMCW Radar Signals
by Jiake Tian, Yi Zou and Jiale Lai
Appl. Sci. 2025, 15(15), 8410; https://doi.org/10.3390/app15158410 - 29 Jul 2025
Viewed by 367
Abstract
We present mmHSE, a two-stage framework for human skeleton estimation using dual millimeter-Wave (mmWave) Frequency-Modulated Continuous-Wave (FMCW) radar signals. To enable data-driven model design and evaluation, we collect and process over 30,000 range–angle maps from 12 users across three representative indoor environments using a dual-node radar acquisition platform. Leveraging the collected data, we develop a two-stage neural architecture for human skeleton estimation. The first stage employs a dual-branch network with depthwise separable convolutions and self-attention to extract multi-scale spatiotemporal features from dual-view radar inputs. A cross-modal attention fusion module is then used to generate initial estimates of 21 skeletal keypoints. The second stage refines these estimates using a skeletal topology module based on graph convolutional networks, which captures spatial dependencies among joints to enhance localization accuracy. Experiments show that mmHSE achieves a Mean Absolute Error (MAE) of 2.78 cm. In cross-domain evaluations, the MAE remains at 3.14 cm, demonstrating the method’s generalization ability and robustness for non-intrusive human pose estimation from mmWave FMCW radar signals. Full article
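
The abstract above reports accuracy as a Mean Absolute Error in centimeters but does not spell out the exact formula. A minimal sketch of one common reading, assuming the error is the absolute difference averaged over every coordinate of every keypoint. The function name and the sample coordinates are hypothetical and are not data from the paper:

```python
def mean_absolute_error(predicted, reference):
    """Average absolute per-coordinate difference between two keypoint lists.

    Both arguments are lists of (x, y, z) tuples in the same unit (e.g. cm).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    count = 0
    for pred_point, ref_point in zip(predicted, reference):
        for pred_coord, ref_coord in zip(pred_point, ref_point):
            total += abs(pred_coord - ref_coord)
            count += 1
    return total / count


# Made-up two-keypoint example, for illustration only.
example_mae = mean_absolute_error(
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 4.0)],
)
```

Other definitions, such as the mean Euclidean distance per joint, would give different numbers for the same predictions, so the reported figures should be compared only against results computed the same way.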

23 pages, 5594 KB  
Article
Dynamic Properties of Steel-Wrapped RC Column–Beam Joints Connected by Embedded Horizontal Steel Plate: Experimental Study
by Jian Wu, Mingwei Ma, Changhao Wei, Jian Zhou, Yuxi Wang, Jianhui Wang and Weigao Ding
Buildings 2025, 15(15), 2657; https://doi.org/10.3390/buildings15152657 - 28 Jul 2025
Viewed by 447
Abstract
The performance of reinforced concrete (RC) frame structures will gradually decrease over time, posing a threat to the safety of buildings. Although the performance of some buildings may still meet the safety requirements, they cannot meet new usage requirements. Therefore, this paper proposes a new-type joint to promote the development of research on the reinforcement and renovation of RC frame structures in response to this situation. The RC beams and columns of the joints are connected by embedded horizontal steel plate (a single plate with dimension of 150 mm × 200 mm × 5 mm), and the beams and columns are individually wrapped in steel. Through conducting low cyclic loading tests, this paper analyzes the influence of carrying out wrapped steel treatment and the thickness of wrapped steel of the beam and connector on mechanical performance indicators such as hysteresis curve, skeleton curve, stiffness, ductility, and energy dissipation. The experimental results indicate that the reinforcement using steel plate can significantly improve the dynamic performance of the joint. The effect of changing the thickness of the connector on the dynamic performance of the specimen is not significant, while increasing the thickness of wrapped steel of beam can effectively improve the overall strength of joint. The research results of this paper will help promote the application of reinforcement and renovation technology for existing buildings, and improve the quality of human living. Full article

18 pages, 352 KB  
Review
Bone Type Selection for Human Molecular Genetic Identification of Skeletal Remains
by Jezerka Inkret and Irena Zupanič Pajnič
Genes 2025, 16(8), 872; https://doi.org/10.3390/genes16080872 - 24 Jul 2025
Viewed by 474
Abstract
This review paper presents a comprehensive overview of DNA preservation in hard tissues (bones and teeth) for applications in forensic and archaeogenetic analyses. It presents bone structure, DNA location in bones and teeth, and extensive information about postmortem DNA location and preservation. Aged bones are a challenging biological material for DNA isolation due to their low DNA content, degraded DNA, and the potential presence of PCR inhibitors. In addition, the binding of DNA to the mineral matrix necessitates the inclusion of a demineralization process in extraction, and its contribution to the resulting increase in both DNA quality and quantity is explained. Guidelines and recommendations on bone sample selection to obtain higher DNA yields are discussed in terms of past, recent, and possible future recommendations. Interskeletal and intraskeletal differences in DNA yield are also explained. Recent studies have shown that current recommendations for the genetic identification of skeletal remains, including femurs, tibias, and teeth, may not be the most effective sampling approach. Moreover, when mass disasters and mass graves with commingled skeletal remains are considered, there is a greater possibility that the recommended set of skeletal elements will not be available for sampling and subsequent genetic testing. This review highlights interskeletal and intraskeletal variability in DNA yield, with a focus on studies conducted on poorly preserved skeletal remains, including both postwar (1945) victims from Slovenia and ancient human skeletons. Special emphasis is placed on anatomical differences and potential mechanisms influencing DNA preservation, as demonstrated in research on both modern and historical skeletons. Finally, the petrous part of the temporal bone and tooth cementum were reviewed in greater detail because they have been recognized as an optimal sampling type in both ancient DNA studies and routine forensic case analyses. 
Our experiences with the Second World War and archaeological petrous bones are discussed and compared to those of other bone types. Full article
(This article belongs to the Section Molecular Genetics and Genomics)
29 pages, 3547 KB  
Article
Morphological and Metric Analysis of Medieval Dog Remains from Wolin, Poland
by Piotr Baranowski
Animals 2025, 15(15), 2171; https://doi.org/10.3390/ani15152171 - 23 Jul 2025
Viewed by 480
Abstract
This study analyzes 209 dog skeletons from two sites in Wolin (9th–mid-13th century AD) using 100 standard metric variables covering cranial, mandibular, and postcranial elements. Estimated withers height, body mass, age at death, and sex were derived using established methods. The results indicate the presence of at least two to three morphotypes: small spitz-like dogs (40–50 cm, 4–6 kg), medium brachycephalic forms (50–60 cm, 10–15 kg), and larger mesocephalic individuals (up to 65 cm, 20–40 kg). Dogs lived 3–10 years, with both sexes represented. Signs of cranial trauma and dental wear suggest utilitarian roles such as guarding. The size range and morphological diversity point to intentional breeding and trade-based importation. Small dogs likely served as companions or city guards, while medium and large types were used for herding, hunting, or transport. These findings highlight Wolin’s role as a dynamic cultural and trade center, where human–dog relationships were shaped by anthropogenic selection and regional exchange. Full article
(This article belongs to the Section Companion Animals)

18 pages, 33092 KB  
Article
Yarn Color Measurement Method Based on Digital Photography
by Jinxing Liang, Guanghao Wu, Ke Yang, Jiangxiaotian Ma, Jihao Wang, Hang Luo, Xinrong Hu and Yong Liu
J. Imaging 2025, 11(8), 248; https://doi.org/10.3390/jimaging11080248 - 22 Jul 2025
Viewed by 370
Abstract
To overcome the complexity of yarn color measurement using spectrophotometry with yarn winding techniques and to enhance consistency with human visual perception, a yarn color measurement method based on digital photography is proposed. This study employs a photographic colorimetry system to capture digital images of single yarns. The yarn and background are segmented using the K-means clustering algorithm, and the centerline of the yarn is extracted using a skeletonization algorithm. Spectral reconstruction and colorimetric principles are then applied to calculate the color values of pixels along the centerline. Considering the nonlinear characteristics of human brightness perception, the final yarn color is obtained through a nonlinear texture-adaptive weighted computation. The method is validated through psychophysical experiments using six yarns of different colors and compared with spectrophotometry and five other photographic measurement methods. Results indicate that among the seven yarn color measurement methods, including spectrophotometry, the proposed method, based on centerline extraction and nonlinear texture-adaptive weighting, yields results that more closely align with actual visual perception. Furthermore, among the six photographic measurement methods, the proposed method produces results most similar to those obtained using spectrophotometry. This study demonstrates the inconsistency between spectrophotometric measurements and human visual perception of yarn color and provides methodological support for developing visually consistent color measurement methods for textured textiles. Full article
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)

22 pages, 3538 KB  
Article
Evaluating the Effectiveness of Coxal Bone Measurements for Sex Estimation via Machine Learning
by Diana Toneva, Silviya Nikolova, Gennady Agre, Nevena Fileva, Georgi Milenov and Dora Zlatareva
Biology 2025, 14(7), 866; https://doi.org/10.3390/biology14070866 - 17 Jul 2025
Viewed by 394
Abstract
The pelvis is the most dimorphic part of the human skeleton, primarily because of its involvement in the birth process. Many sexually dimorphic traits are concentrated in the coxal bones, which form the larger part of the birth canal. The present study aimed to assess the sex differences in coxal bone size and to develop machine learning (ML) models for sex estimation based on coxal bone measurements. The sample included abdominal computed tomography scans of 276 adult Bulgarians. Three-dimensional models of the pelves were generated using InVesalius. The three-dimensional coordinates of 34 landmarks located on the right and left coxal bones were collected in MeshLab. Based on the landmark coordinates, various measurements characterizing the coxal bones were calculated. The coxal bone dimensions were tested for significant differences with respect to sex, age, and laterality. Support Vector Machines and logistic regression were employed to train models for sex estimation. The results demonstrate strong sexual dimorphism in coxal bone dimensions along with some bilateral and age-related differences. The trained ML models classify male and female bones with very high accuracy, ranging between 95% and 100%. Full article
(This article belongs to the Section Medical Biology)

20 pages, 5700 KB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Viewed by 748
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. 
The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation. Full article
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)

19 pages, 709 KB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 556
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs—making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and expedient SLR from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built based on two modalities: skeleton and RGB images. In this paper, we firstly propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Secondly, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the original raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems. Full article
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
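
The abstract above merges the skeleton and RGB results with a gating mechanism but gives no formula for it. A minimal sketch of a generic element-wise gate, assuming a sigmoid gate that blends two per-class score vectors. The function names are hypothetical, and in a trained model the gate logits would be learned rather than passed in by hand:

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(skeleton_scores, rgb_scores, gate_logits):
    """Blend two score vectors: g * skeleton + (1 - g) * rgb, with g = sigmoid(logit)."""
    if not (len(skeleton_scores) == len(rgb_scores) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for skeleton, rgb, logit in zip(skeleton_scores, rgb_scores, gate_logits):
        gate = sigmoid(logit)
        fused.append(gate * skeleton + (1.0 - gate) * rgb)
    return fused


# A gate logit of 0 weights both modalities equally.
balanced = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```

A large positive logit pushes the result toward the skeleton stream and a large negative one toward the RGB stream, which is the usual motivation for a gate over a fixed average. Whether the paper's Multi-Stream Fusion Module has this exact form is not stated in the abstract.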