Search Results (200)

Search Parameters:
Keywords = 3D face recognition

13 pages, 706 KB  
Article
Enhancing 3D Face Recognition: Achieving Significant Gains via 2D-Aided Generative Augmentation
by Cuican Yu, Zihui Zhang, Huibin Li and Chang Liu
Sensors 2025, 25(16), 5049; https://doi.org/10.3390/s25165049 - 14 Aug 2025
Viewed by 285
Abstract
The development of deep learning-based 3D face recognition has been constrained by the limited availability of large-scale 3D facial datasets, which are costly and labor-intensive to acquire. To address this challenge, we propose a novel 2D-aided framework that reconstructs 3D face geometries from abundant 2D images, enabling scalable and cost-effective data augmentation for 3D face recognition. Our pipeline integrates 3D face reconstruction with normal component image encoding and fine-tunes a deep face recognition model to learn discriminative representations from synthetic 3D data. Experimental results on four public benchmarks, i.e., the BU-3DFE, FRGC v2, Bosphorus, and BU-4DFE databases, demonstrate competitive rank-1 accuracies of 99.2%, 98.4%, 99.3%, and 96.5%, respectively, despite the absence of real 3D training data. We further evaluate the impact of alternative reconstruction methods and empirically demonstrate that higher-fidelity 3D inputs improve recognition performance. While synthetic 3D face data may lack certain fine-grained geometric details, our results validate their effectiveness for practical recognition tasks under diverse expressions and demographic conditions. This work provides an efficient and scalable paradigm for 3D face recognition by leveraging widely available face images, offering new insights into data-efficient training strategies for biometric systems. Full article
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Sensing Technology)
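The pipeline described above couples 3D face reconstruction with normal component image (NCI) encoding before fine-tuning a 2D face recognition backbone. As a rough illustration of the encoding idea only (not the authors' implementation), the sketch below converts a depth map rendered from a reconstructed face into a three-channel surface-normal image that an ordinary 2D CNN could consume; the depth-to-normal conversion and the synthetic input are assumptions.

```python
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Estimate unit surface normals from a dense depth map (H x W).

    Finite differences approximate the surface gradient; the three normal
    components are then rescaled to [0, 1] so they can be stored as an
    ordinary 3-channel image and fed to a 2D face-recognition CNN.
    """
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    # Normal of the surface z = f(x, y) is proportional to (-df/dx, -df/dy, 1).
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth, dtype=np.float64)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return (n + 1.0) / 2.0  # map components from [-1, 1] into image range

if __name__ == "__main__":
    # Synthetic stand-in for a depth map rendered from a reconstructed face mesh.
    yy, xx = np.mgrid[0:128, 0:128]
    fake_depth = np.sqrt((xx - 64) ** 2 + (yy - 64) ** 2)
    nci = normals_from_depth(fake_depth)
    print(nci.shape, nci.min(), nci.max())  # (128, 128, 3), values in [0, 1]
```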

23 pages, 85184 KB  
Article
MB-MSTFNet: A Multi-Band Spatio-Temporal Attention Network for EEG Sensor-Based Emotion Recognition
by Cheng Fang, Sitong Liu and Bing Gao
Sensors 2025, 25(15), 4819; https://doi.org/10.3390/s25154819 - 5 Aug 2025
Viewed by 492
Abstract
Emotion analysis based on electroencephalogram (EEG) sensors is pivotal for human–machine interaction yet faces key challenges in spatio-temporal feature fusion and cross-band and brain-region integration from multi-channel sensor-derived signals. This paper proposes MB-MSTFNet, a novel framework for EEG emotion recognition. The model constructs a 3D tensor to encode band–space–time correlations of sensor data, explicitly modeling frequency-domain dynamics and spatial distributions of EEG sensors across brain regions. A multi-scale CNN-Inception module extracts hierarchical spatial features via diverse convolutional kernels and pooling operations, capturing localized sensor activations and global brain network interactions. Bi-directional GRUs (BiGRUs) model temporal dependencies in sensor time-series, adept at capturing long-range dynamic patterns. Multi-head self-attention highlights critical time windows and brain regions by assigning adaptive weights to relevant sensor channels, suppressing noise from non-contributory electrodes. Experiments on the DEAP dataset, containing multi-channel EEG sensor recordings, show that MB-MSTFNet achieves 96.80 ± 0.92% valence accuracy, 98.02 ± 0.76% arousal accuracy for binary classification tasks, and 92.85 ± 1.45% accuracy for four-class classification. Ablation studies validate that feature fusion, bidirectional temporal modeling, and multi-scale mechanisms significantly enhance performance by improving feature complementarity. This sensor-driven framework advances affective computing by integrating spatio-temporal dynamics and multi-band interactions of EEG sensor signals, enabling efficient real-time emotion recognition. Full article
(This article belongs to the Section Intelligent Sensors)
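The abstract outlines a stack of per-window spatial CNN features, bidirectional GRUs over time, and multi-head self-attention. Below is a minimal PyTorch sketch of that kind of CNN → BiGRU → attention pipeline; the tensor layout, layer widths, electrode-grid size, and class count are placeholders rather than the published MB-MSTFNet configuration.

```python
import torch
import torch.nn as nn

class SpatioTemporalEEGNet(nn.Module):
    """Toy band-space-time model: per-window CNN, BiGRU over time, self-attention."""

    def __init__(self, bands: int = 4, n_classes: int = 4, hidden: int = 64):
        super().__init__()
        # Spatial feature extractor applied to each time window's bands x H x W map.
        self.cnn = nn.Sequential(
            nn.Conv2d(bands, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.bigru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(embed_dim=2 * hidden, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, bands, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).flatten(1)  # (b*t, 32)
        feats = feats.view(b, t, -1)
        seq, _ = self.bigru(feats)                    # (b, t, 2*hidden)
        ctx, _ = self.attn(seq, seq, seq)             # weight informative time windows
        return self.head(ctx.mean(dim=1))             # (b, n_classes)

if __name__ == "__main__":
    dummy = torch.randn(2, 20, 4, 9, 9)  # 2 trials, 20 windows, 4 bands, 9x9 electrode grid
    print(SpatioTemporalEEGNet()(dummy).shape)  # torch.Size([2, 4])
```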

26 pages, 829 KB  
Article
Enhanced Face Recognition in Crowded Environments with 2D/3D Features and Parallel Hybrid CNN-RNN Architecture with Stacked Auto-Encoder
by Samir Elloumi, Sahbi Bahroun, Sadok Ben Yahia and Mourad Kaddes
Big Data Cogn. Comput. 2025, 9(8), 191; https://doi.org/10.3390/bdcc9080191 - 22 Jul 2025
Viewed by 592
Abstract
Face recognition (FR) in unconstrained conditions remains an open research topic and an ongoing challenge. Facial images exhibit diverse expressions, occlusions, variations in illumination, and heterogeneous backgrounds. This work aims to produce an accurate and robust system for enhanced security and surveillance. A parallel hybrid deep learning model for feature extraction and classification is proposed. An ensemble of three parallel extraction-layer models learns the best representative features using CNNs and RNNs. 2D LBP and 3D Mesh LBP are computed on face images to extract image features as input to two RNNs. A stacked autoencoder (SAE) merges the feature vectors extracted from the three CNN-RNN parallel layers. We tested the designed 2D/3D CNN-RNN framework on four standard datasets and achieved an accuracy of 98.9%. The hybrid deep learning model significantly improves FR compared with similar state-of-the-art methods. The proposed model was also tested on a human crowd dataset captured under unconstrained conditions, and the results were very promising, with an accuracy of 95%. Furthermore, our model shows an 11.5% improvement over similar hybrid CNN-RNN architectures, proving its robustness in complex environments where the face can undergo different transformations. Full article
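The framework above feeds handcrafted texture descriptors (2D LBP and 3D Mesh LBP) into recurrent branches before autoencoder fusion. As a small, self-contained illustration of just the 2D LBP step (the 3D Mesh LBP and the CNN-RNN/SAE fusion are omitted), here is a basic 8-neighbour LBP in NumPy; treating a histogram of such codes as the RNN-branch input is an assumption.

```python
import numpy as np

def lbp_8_1(gray: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour, radius-1 Local Binary Pattern codes for a grayscale image.

    Each interior pixel is compared with its 8 neighbours; neighbours that are
    at least as bright contribute one bit to the 8-bit code.
    """
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((neigh >= c).astype(np.int32) << bit)
    return codes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(0, 256, size=(64, 64))
    hist, _ = np.histogram(lbp_8_1(face), bins=256, range=(0, 256))
    print(hist.shape)  # (256,) texture descriptor that could feed an RNN branch
```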

24 pages, 824 KB  
Article
MMF-Gait: A Multi-Model Fusion-Enhanced Gait Recognition Framework Integrating Convolutional and Attention Networks
by Kamrul Hasan, Khandokar Alisha Tuhin, Md Rasul Islam Bapary, Md Shafi Ud Doula, Md Ashraful Alam, Md Atiqur Rahman Ahad and Md. Zasim Uddin
Symmetry 2025, 17(7), 1155; https://doi.org/10.3390/sym17071155 - 19 Jul 2025
Viewed by 523
Abstract
Gait recognition is a reliable biometric approach that uniquely identifies individuals based on their natural walking patterns. It is widely used because gait is difficult to camouflage and does not require a person’s cooperation. General face-based person recognition systems often fail to determine an offender’s identity when the face is concealed with helmets or masks to evade identification. In such cases, gait-based recognition is ideal for identifying offenders, and most existing work leverages a deep learning (DL) model. However, a single model often fails to capture a comprehensive selection of refined patterns in the input data when external factors are present, such as variations in viewing angle, clothing, and carrying conditions. In response, this paper introduces a fusion-based multi-model gait recognition framework that leverages the potential of convolutional neural networks (CNNs) and a vision transformer (ViT) in an ensemble manner to enhance gait recognition performance. Here, the CNNs capture spatiotemporal features, while the ViT’s multiple attention layers focus on particular regions of the gait image. The first step in this framework is to obtain the Gait Energy Image (GEI) by averaging a height-normalized gait silhouette sequence over a gait cycle, which can handle the left–right symmetry of the gait. After that, the GEI is fed through multiple pre-trained models that are fine-tuned to extract deep spatiotemporal features. Three separate fusion strategies are then applied. The first is decision-level fusion (DLF), which takes each model’s decision and employs majority voting for the final decision. The second is feature-level fusion (FLF), which combines the features from individual models through pointwise addition before performing gait recognition. Finally, a hybrid fusion combines DLF and FLF for gait recognition. The performance of the multi-model fusion-based framework was evaluated on three publicly available gait databases: CASIA-B, OU-ISIR D, and the OU-ISIR Large Population dataset. The experimental results demonstrate that the fusion-enhanced framework achieves superior performance. Full article
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)
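Two steps of the framework are easy to make concrete: computing the Gait Energy Image by averaging a silhouette sequence over a gait cycle, and decision-level fusion by majority voting over per-model predictions. The NumPy sketch below shows both in toy form; the silhouette sizes and the voting example are illustrative assumptions, not the paper's code.

```python
import numpy as np
from collections import Counter

def gait_energy_image(silhouettes: np.ndarray) -> np.ndarray:
    """GEI: pixel-wise mean of a height-normalised binary silhouette sequence.

    silhouettes: (T, H, W) array with values in {0, 1}, covering one gait cycle.
    """
    return silhouettes.astype(np.float32).mean(axis=0)

def decision_level_fusion(predictions: list[int]) -> int:
    """Majority vote over per-model predicted subject IDs (the DLF strategy)."""
    return Counter(predictions).most_common(1)[0][0]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cycle = (rng.random((30, 64, 44)) > 0.5).astype(np.uint8)  # toy silhouettes
    gei = gait_energy_image(cycle)
    print(gei.shape, float(gei.max()) <= 1.0)
    # Three backbone models (e.g. two CNNs and a ViT) voting on a subject ID.
    print(decision_level_fusion([17, 17, 42]))  # -> 17
```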

20 pages, 1798 KB  
Article
An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment
by Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva and Massimo Villari
Big Data Cogn. Comput. 2025, 9(7), 188; https://doi.org/10.3390/bdcc9070188 - 17 Jul 2025
Viewed by 578
Abstract
Nowadays, the Metaverse is facing many challenges. In this context, Virtual Reality (VR) applications allowing voice-based human–3D object interactions are limited due to current hardware/software constraints. In fact, adopting Automated Speech Recognition (ASR) systems to interact with 3D objects in VR applications through users’ voice commands presents significant challenges due to the hardware and software limitations of headset devices. This paper aims to bridge this gap by proposing a methodology to address these issues. In particular, a Mel-Frequency Cepstral Coefficient (MFCC) extraction algorithm captures the unique characteristics of the user’s voice, and the extracted features are passed as input to a Convolutional Neural Network (CNN) model. After that, in order to integrate the CNN model with a VR application running on a standalone headset, such as Oculus Quest, we converted the model into the Open Neural Network Exchange (ONNX) format, an open standard for Machine Learning (ML) interoperability. The proposed system demonstrates good performance and represents a foundation for the development of user-centric, effective computing systems, enhancing accessibility to VR environments through voice-based commands. Experiments demonstrate that a native CNN model developed in TensorFlow performs comparably to the corresponding CNN model converted into the ONNX format, paving the way towards VR applications running on headsets and controlled through the user’s voice. Full article
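The described pipeline is MFCC extraction, CNN classification, and conversion to ONNX for on-headset inference. A hedged end-to-end sketch using librosa and PyTorch is shown below; the model architecture, number of commands, and audio parameters are placeholders, and note the original work used TensorFlow rather than PyTorch.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

class CommandCNN(nn.Module):
    """Tiny CNN over an MFCC "image" (1 x n_mfcc x frames) for a few voice commands."""

    def __init__(self, n_commands: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.classifier = nn.Linear(16 * 8 * 8, n_commands)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    # 1 s of synthetic audio stands in for a recorded voice command.
    sr = 16_000
    audio = np.random.default_rng(0).standard_normal(sr).astype(np.float32)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)    # (20, frames)
    x = torch.from_numpy(mfcc).float().unsqueeze(0).unsqueeze(0)  # (1, 1, 20, frames)

    model = CommandCNN().eval()
    print(model(x).shape)  # (1, 5) command scores
    # Export to ONNX so the model can run inside a standalone headset app.
    torch.onnx.export(model, x, "command_cnn.onnx",
                      input_names=["mfcc"], output_names=["scores"])
```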

24 pages, 589 KB  
Article
FaceCloseup: Enhancing Mobile Facial Authentication with Perspective Distortion-Based Liveness Detection
by Yingjiu Li, Yan Li and Zilong Wang
Computers 2025, 14(7), 254; https://doi.org/10.3390/computers14070254 - 27 Jun 2025
Viewed by 752
Abstract
Facial authentication has gained widespread adoption as a biometric authentication method, offering a convenient alternative to traditional password-based systems, particularly on mobile devices equipped with front-facing cameras. While this technology enhances usability and security by eliminating password management, it remains highly susceptible to spoofing attacks. Adversaries can exploit facial recognition systems using pre-recorded photos, videos, or even sophisticated 3D models of victims’ faces to bypass authentication mechanisms. The increasing availability of personal images on social media further amplifies this risk, making robust anti-spoofing mechanisms essential for secure facial authentication. To address these challenges, we introduce FaceCloseup, a novel liveness detection technique that strengthens facial authentication by leveraging perspective distortion inherent in close-up shots of real, 3D faces. Instead of relying on additional sensors or user-interactive gestures, FaceCloseup passively analyzes facial distortions in video frames captured by a mobile device’s camera, improving security without compromising user experience. FaceCloseup effectively distinguishes live faces from spoofed attacks by identifying perspective-based distortions across different facial regions. The system achieves a 99.48% accuracy in detecting common spoofing methods—including photo, video, and 3D model-based attacks—and demonstrates 98.44% accuracy in differentiating between individual users. By operating entirely on-device, FaceCloseup eliminates the need for cloud-based processing, reducing privacy concerns and potential latency in authentication. Its reliance on natural device movement ensures a seamless authentication experience while maintaining robust security. Full article
(This article belongs to the Special Issue Cyber Security and Privacy in IoT Era)
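FaceCloseup's core cue is that a genuinely 3D face deforms non-uniformly under close-up perspective, while a flat photo or screen mostly just scales. The snippet below is only a caricature of that idea (the paper's actual features and classifier are not reproduced): it scores how unevenly inter-landmark distance ratios change between a far frame and a close-up frame.

```python
import numpy as np

def distortion_score(landmarks_far: np.ndarray, landmarks_close: np.ndarray) -> float:
    """Caricature of perspective-distortion liveness checking.

    landmarks_*: (N, 2) facial landmark coordinates from a far frame and a
    close-up frame. For a planar spoof (printed photo, screen replay) the
    close-up is roughly a uniform scaling of the far view, so the ratios of
    inter-landmark distances barely change; a real 3D face distorts unevenly.
    """
    def pairwise(points: np.ndarray) -> np.ndarray:
        d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        return d[np.triu_indices(len(points), k=1)]

    ratios = pairwise(landmarks_close) / pairwise(landmarks_far)
    return float(np.std(ratios / ratios.mean()))  # ~0 for a flat surface

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    far = rng.random((68, 2)) * 100
    flat_close = far * 1.8                                   # pure zoom: what a photo would show
    real_close = far * 1.8 + rng.normal(0, 1.5, far.shape)   # uneven, face-like distortion
    print(distortion_score(far, flat_close))  # approximately 0
    print(distortion_score(far, real_close))  # noticeably larger
```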

46 pages, 2741 KB  
Review
Innovative Technologies Reshaping Meat Industrialization: Challenges and Opportunities in the Intelligent Era
by Qing Sun, Yanan Yuan, Baoguo Xu, Shipeng Gao, Xiaodong Zhai, Feiyue Xu and Jiyong Shi
Foods 2025, 14(13), 2230; https://doi.org/10.3390/foods14132230 - 24 Jun 2025
Cited by 1 | Viewed by 1522
Abstract
The Fourth Industrial Revolution and artificial intelligence (AI) technology are driving the transformation of the meat industry from mechanization and automation to intelligence and digitization. This paper provides a systematic review of key technological innovations in this field, including physical technologies (such as smart cutting precision improved to the millimeter level, pulse electric field sterilization efficiency exceeding 90%, ultrasonic-assisted marinating time reduced by 12 h, and ultra-high-pressure processing extending shelf life) and digital technologies (IoT real-time monitoring, blockchain-enhanced traceability transparency, and AI-optimized production decision-making). Additionally, it explores the potential of alternative meat production technologies (cell-cultured meat and 3D bioprinting) to disrupt traditional models. In application scenarios such as central kitchen efficiency improvements (e.g., food companies leveraging the “S2B2C” model to apply AI agents, supply chain management, and intelligent control systems, resulting in a 26.98% increase in overall profits), end-to-end temperature control in cold chain logistics (e.g., using multi-array sensors for real-time monitoring of meat spoilage), intelligent freshness recognition of products (based on deep learning or sensors), and personalized customization (e.g., 3D-printed customized nutritional meat products), these technologies have significantly improved production efficiency, product quality, and safety. However, large-scale application still faces key challenges, including high costs (such as the high investment in cell-cultured meat bioreactors), lack of standardization (such as the absence of unified standards for non-thermal technology parameters), and consumer acceptance (surveys indicate that approximately 41% of consumers are concerned about contracting illnesses from consuming cultured meat, and only 25% are willing to try it). These challenges constrain the economic viability and market promotion of the aforementioned technologies. Future efforts should focus on collaborative innovation to establish a truly intelligent and sustainable meat production system. Full article

42 pages, 3140 KB  
Review
Face Anti-Spoofing Based on Deep Learning: A Comprehensive Survey
by Huifen Xing, Siok Yee Tan, Faizan Qamar and Yuqing Jiao
Appl. Sci. 2025, 15(12), 6891; https://doi.org/10.3390/app15126891 - 18 Jun 2025
Viewed by 3220
Abstract
Face recognition has achieved tremendous success in both its theory and technology. However, with increasingly realistic attacks, such as print photos, replay videos, and 3D masks, as well as new attack methods like AI-generated faces or videos, face recognition systems are confronted with significant challenges and risks. Distinguishing between real and fake faces, i.e., face anti-spoofing (FAS), is crucial to the security of face recognition systems. With the advent of large-scale academic datasets in recent years, FAS based on deep learning has achieved a remarkable level of performance and now dominates the field. This paper systematically reviews the latest advancements in FAS based on deep learning. First, it provides an overview of the background, basic concepts, and types of FAS attacks. Then, it categorizes existing FAS methods from the perspectives of RGB (red, green and blue) modality and other modalities, discussing the main concepts, the types of attacks that can be detected, their advantages and disadvantages, and so on. Next, it introduces popular datasets used in FAS research and highlights their characteristics. Finally, it summarizes the current research challenges and future directions for FAS, such as its limited generalization for unknown attacks, the insufficient multi-modal research, the spatiotemporal efficiency of algorithms, and unified detection for presentation attacks and deepfakes. We aim to provide a comprehensive reference in this field and to inspire progress within the FAS community, guiding researchers toward promising directions for future work. Full article
(This article belongs to the Special Issue Deep Learning in Object Detection)

17 pages, 3741 KB  
Article
DeepSeaNet: An Efficient UIE Deep Network
by Jingsheng Li, Yuanbing Ouyang, Hao Wang, Di Wu and Yushan Pan
Electronics 2025, 14(12), 2411; https://doi.org/10.3390/electronics14122411 - 12 Jun 2025
Viewed by 536
Abstract
Underwater image enhancement and object recognition are crucial in multiple fields, such as marine biology, archeology, and environmental monitoring, but face severe challenges due to low light, color distortion, and reduced contrast in underwater environments. DeepSeaNet re-evaluates the model guidance strategy from multiple dimensions, enhances color recovery using the MCOLE score, and addresses the problem of inconsistent attenuation across different regions of underwater images by integrating a feature extraction method guided by a ViT-based global attention mechanism. Comprehensive tests on diverse underwater datasets show that DeepSeaNet achieves a maximum PSNR of 28.96 dB and an average SSIM of 0.901, representing a 20–40% improvement over baseline methods. These results highlight DeepSeaNet’s superior performance in enhancing image clarity, color richness, and contrast, making it a highly effective tool for underwater image processing and analysis. Full article
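The reported gains are measured in PSNR and SSIM. For readers who want to reproduce that style of evaluation, the sketch below computes both metrics for an enhanced image against a reference using scikit-image; the choice of scikit-image and the synthetic inputs are assumptions, not the authors' evaluation code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_enhancement(reference: np.ndarray, enhanced: np.ndarray) -> tuple[float, float]:
    """PSNR (dB) and SSIM between a reference image and an enhanced output.

    Both inputs are H x W x 3 float arrays with values in [0, 1].
    """
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=1.0)
    return psnr, ssim

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((128, 128, 3))
    out = np.clip(ref + rng.normal(0, 0.02, ref.shape), 0, 1)  # stand-in for a network output
    print(evaluate_enhancement(ref, out))
```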

13 pages, 2855 KB  
Article
Research on Video Behavior Detection and Analysis Model for Sow Estrus Cycle Based on Deep Learning
by Kaidong Lei, Bugao Li, Shan Zhong, Hua Yang, Hao Wang, Xiangfang Tang and Benhai Xiong
Agriculture 2025, 15(9), 975; https://doi.org/10.3390/agriculture15090975 - 30 Apr 2025
Cited by 1 | Viewed by 672
Abstract
Against the backdrop of precision livestock farming, sow behavior analysis holds significant theoretical and practical value. Traditional production methods face challenges such as low production efficiency, high labor intensity, and increased disease prevention risks. With the rapid advancement of optoelectronic technology and deep learning, more technologies are being integrated into smart agriculture. Intelligent large-scale pig farming has become an effective means to improve sow quality and productivity, with behavior recognition technology playing a crucial role in intelligent pig farming. Specifically, monitoring sow behavior enables an effective assessment of health conditions and welfare levels, ensuring efficient and healthy sow production. This study constructs a 3D-CNN model based on video data from the sow estrus cycle, achieving analysis of SOB, SOC, SOS, and SOW behaviors. In typical behavior classification, the model attains accuracy, recall, and F1-score values of (1.00, 0.90, 0.95; 0.96, 0.98, 0.97; 1.00, 0.96, 0.98; 0.86, 1.00, 0.93), respectively. Additionally, under conditions of multi-pig interference and non-specifically labeled data, the accuracy, recall, and F1-scores for the semantic recognition of SOB, SOC, SOS, and SOW behaviors based on the 3D-CNN model are (1.00, 0.90, 0.95; 0.89, 0.89, 0.89; 0.91, 1.00, 0.95; 1.00, 1.00, 1.00), respectively. These findings provide key technical support for establishing the classification and semantic recognition of typical sow behaviors during the estrus cycle, while also offering a practical solution for rapid video-based behavior detection and welfare monitoring in precision livestock farming. Full article
(This article belongs to the Special Issue Computer Vision Analysis Applied to Farm Animals)
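The study classifies estrus-cycle behaviours from video with a 3D-CNN. The PyTorch sketch below is a minimal clip-level 3D-CNN for four classes; the clip size, channel widths, and pooling scheme are placeholders and not the published architecture.

```python
import torch
import torch.nn as nn

class Behaviour3DCNN(nn.Module):
    """Minimal 3D-CNN for clip-level classification into four behaviour classes."""

    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, frames, H, W)
        return self.head(self.backbone(clip).flatten(1))

if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 112, 112)  # two 16-frame RGB clips
    print(Behaviour3DCNN()(clip).shape)      # torch.Size([2, 4])
```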

16 pages, 7057 KB  
Article
VRBiom: A New Periocular Dataset for Biometric Applications of Head-Mounted Display
by Ketan Kotwal, Ibrahim Ulucan, Gökhan Özbulak, Janani Selliah and Sébastien Marcel
Electronics 2025, 14(9), 1835; https://doi.org/10.3390/electronics14091835 - 30 Apr 2025
Cited by 1 | Viewed by 886
Abstract
With advancements in hardware, high-quality head-mounted display (HMD) devices are being developed by numerous companies, driving increased consumer interest in AR, VR, and MR applications. This proliferation of HMD devices opens up possibilities for a wide range of applications beyond entertainment. Most commercially available HMD devices are equipped with internal inward-facing cameras to record the periocular areas. Given the nature of these devices and the captured data, many applications such as biometric authentication and gaze analysis become feasible. To effectively explore the potential of HMDs for these diverse use-cases and to enhance the corresponding techniques, it is essential to have an HMD dataset that captures realistic scenarios. In this work, we present VRBiom, a new dataset of periocular videos acquired using a virtual reality headset. VRBiom, targeted at biometric applications, consists of 900 short videos acquired from 25 individuals recorded in the NIR spectrum. These 10 s long videos have been captured using the internal tracking cameras of the Meta Quest Pro at 72 FPS. To encompass real-world variations, the dataset includes recordings under three gaze conditions: steady, moving, and partially closed eyes. We have also ensured an equal split of recordings without and with glasses to facilitate the analysis of eyewear. These videos, characterized by non-frontal views of the eye and relatively low spatial resolutions (400×400), can be instrumental in advancing state-of-the-art research across various biometric applications. The VRBiom dataset can be utilized to evaluate, train, or adapt models for biometric use-cases such as iris and/or periocular recognition and associated sub-tasks such as detection and semantic segmentation. In addition to data from real individuals, we have included around 1100 presentation attacks constructed from 92 presentation attack instruments (PAIs). These PAIs fall into six categories constructed through combinations of print attacks (real and synthetic identities), fake 3D eyeballs, plastic eyes, and various types of masks and mannequins. These PA videos, combined with genuine (bona fide) data, can be utilized to address concerns related to spoofing, which is a significant threat if these devices are to be used for authentication. The VRBiom dataset is publicly available for research purposes related to biometric applications only. Full article

9 pages, 5740 KB  
Article
Anti-Freezing Conductive Ionic Hydrogel-Enabled Triboelectric Nanogenerators for Wearable Speech Recognition
by Tao Chen, Andeng Liu, Wentao Lei, Guoxu Wu, Jiajun Xiang, Yixin Dong, Yangyang Chen, Bingqi Chen, Meidan Ye, Jizhong Zhao and Wenxi Guo
Materials 2025, 18(9), 2014; https://doi.org/10.3390/ma18092014 - 29 Apr 2025
Viewed by 673
Abstract
Flexible wearable electronics face critical challenges in achieving reliable physiological monitoring, particularly due to the trade-off between sensitivity and durability in flexible electrodes, compounded by mechanical modulus mismatch with biological tissues. To address these limitations, we develop an anti-freezing ionic hydrogel through a chitosan/acrylamide/LiCl system engineered via the solution post-treatment strategy. The optimized hydrogel exhibits exceptional ionic conductivity (24.1 mS/cm at 25 °C) and excellent cryogenic tolerance. Leveraging these attributes, we construct a gel-based triboelectric nanogenerator (G-TENG) that demonstrates ultrahigh sensitivity (1.56 V/kPa) under low pressure. The device enables the precise capture of subtle vibrations at a frequency of 1088 Hz with a signal-to-noise ratio of 16.27 dB and demonstrates operational stability (>16,000 cycles), successfully differentiating complex physiological activities including swallowing, coughing, and phonation. Through machine learning-assisted analysis, the system achieves 96.56% recognition accuracy for five words and demonstrates good signal recognition ability in different ambient sound scenarios. This work provides a paradigm for designing environmentally adaptive wearable sensors through interfacial modulus engineering and ion transport optimization. Full article
(This article belongs to the Special Issue Materials, Design, and Performance of Nanogenerators)

17 pages, 3439 KB  
Article
A Novel Approach for Visual Speech Recognition Using the Partition-Time Masking and Swin Transformer 3D Convolutional Model
by Xiangliang Zhang, Yu Hu, Xiangzhi Liu, Yu Gu, Tong Li, Jibin Yin and Tao Liu
Sensors 2025, 25(8), 2366; https://doi.org/10.3390/s25082366 - 8 Apr 2025
Cited by 1 | Viewed by 941
Abstract
Visual speech recognition is a technology that relies on visual information, offering unique advantages in noisy environments or when communicating with individuals with speech impairments. However, this technology still faces challenges, such as limited generalization ability due to different speech habits, high recognition error rates caused by confusable phonemes, and difficulties adapting to complex lighting conditions and facial occlusions. This paper proposes a lip reading data augmentation method—Partition-Time Masking (PTM)—to address these challenges and improve lip reading models’ performance and generalization ability. Applying nonlinear transformations to the training data enhances the model’s generalization ability when handling diverse speakers and environmental conditions. A lip-reading recognition model architecture, Swin Transformer and 3D Convolution (ST3D), was designed to overcome the limitations of traditional lip-reading models that use ResNet-based front-end feature extraction networks. By adopting a strategy that combines Swin Transformer and 3D convolution, the proposed model enhances performance. To validate the effectiveness of the Partition-Time Masking data augmentation method, experiments were conducted on the LRW video dataset using the DC-TCN model, achieving a peak accuracy of 92.15%. The ST3D model was validated on the LRW and LRW1000 video datasets, achieving a maximum accuracy of 56.1% on the LRW1000 dataset and 91.8% on the LRW dataset, outperforming current mainstream lip reading models and demonstrating superior performance on challenging easily confused samples. Full article
(This article belongs to the Special Issue Sensors for Biomechanical and Rehabilitation Engineering)
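The abstract names the augmentation (Partition-Time Masking) without spelling out its exact scheme, so the sketch below implements one plausible reading: split the frame sequence into equal partitions and zero out a short random span inside each. Treat the partition count, mask width, and masking-by-zeroing all as assumptions rather than the paper's definition.

```python
import numpy as np

def partition_time_mask(frames: np.ndarray, n_partitions: int = 4,
                        max_mask: int = 3, rng=None) -> np.ndarray:
    """One plausible Partition-Time Masking variant for a lip-reading clip.

    frames: (T, H, W) sequence. The sequence is split into n_partitions equal
    chunks and, inside each chunk, a random span of up to max_mask frames is
    zeroed, discouraging the model from relying on any single temporal segment.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = frames.copy()
    t = len(frames)
    bounds = np.linspace(0, t, n_partitions + 1, dtype=int)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        if hi - lo < 2:
            continue
        width = int(rng.integers(1, min(max_mask, hi - lo) + 1))
        start = int(rng.integers(lo, hi - width + 1))
        out[start:start + width] = 0
    return out

if __name__ == "__main__":
    clip = np.ones((29, 88, 88), dtype=np.float32)  # LRW-style 29-frame mouth crops
    masked = partition_time_mask(clip, rng=np.random.default_rng(0))
    print(int((masked.sum(axis=(1, 2)) == 0).sum()), "frames masked")
```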

21 pages, 941 KB  
Review
Technological Advancements in Human Navigation for the Visually Impaired: A Systematic Review
by Edgar Casanova, Diego Guffanti and Luis Hidalgo
Sensors 2025, 25(7), 2213; https://doi.org/10.3390/s25072213 - 1 Apr 2025
Cited by 2 | Viewed by 3224
Abstract
Visually impaired people face significant obstacles when navigating complex environments. However, recent technological advances have greatly improved the functionality of navigation systems tailored to their needs. The objective of this research is to evaluate the effectiveness and functionality of these navigation systems through a comparative analysis of recent technologies. For this purpose, the PRISMA 2020 methodology was used to perform a systematic literature review. After identification and screening, 58 articles published between 2019 and 2024 were selected from three academic databases: Dimensions (26 articles), Web of Science (18 articles), and Scopus (14 articles). Bibliometric analysis demonstrated a growing interest of the research community in the topic, with an average of 4.552 citations per published article. Even with recent technological advances, there is still a significant gap in support systems for people with blindness due to the lack of digital accessibility and the scarcity of adapted support systems. This situation limits the autonomy and inclusion of people with blindness, so the need to continue developing technological and social solutions to ensure equal opportunities and full participation in society is evident. This study highlights major advances in the integration of sensors such as high-precision GPS, ultrasonic sensors, Bluetooth, and various assistance apps for object recognition, obstacle detection, and trajectory generation, as well as haptic systems, which provide tactile information through wearables or actuators and improve spatial awareness. The review also identified current navigation algorithms, with methods including obstacle detection, path planning, and trajectory prediction, applied to technologies such as ultrasonic sensors, RGB-D cameras, and LiDAR for indoor navigation, as well as stereo cameras and GPS for outdoor navigation. It was also found that AI systems employ deep learning and neural networks to optimize both navigation accuracy and energy efficiency. Finally, the analysis revealed that 79% of the 58 reviewed articles included experimental validation, 87% of which were on haptic systems and 40% on smartphones. These results underscore the importance of experimentation in the development of technologies for the mobility of people with visual impairment. Full article
(This article belongs to the Section Environmental Sensing)

21 pages, 12241 KB  
Article
A Social Assistance System for Augmented Reality Technology to Redound Face Blindness with 3D Face Recognition
by Wen-Hau Jain, Bing-Gang Jhong and Mei-Yung Chen
Electronics 2025, 14(7), 1244; https://doi.org/10.3390/electronics14071244 - 21 Mar 2025
Cited by 1 | Viewed by 838
Abstract
The objective of this study is to develop an Augmented Reality (AR) visual aid system to help patients with prosopagnosia recognize faces in social situations and everyday life. The primary contribution of this study is the use of 3D face models as the basis of data augmentation for facial recognition, which has practical applications for various social situations that patients with prosopagnosia find themselves in. The study comprises the following components: First, the affordances of Active Stereoscopy and stereo cameras were combined. Second, deep learning was employed to reconstruct a detailed 3D face model in real time based on data from the 3D point cloud and the 2D image. Data were also retrieved from seven angles of the subject’s face to improve the accuracy of face recognition from the subject’s profile and in a range of dynamic interactions. Third, the resulting data were entered into a convolutional neural network (CNN), which then generated a 128-dimensional characteristic vector. Next, the system deployed Structured Query Language (SQL) to compute and compare Euclidean distances, determine the smallest Euclidean distance, and match it to the name that corresponded to the face; tagged face data were projected by the camera onto the AR lenses. The findings of this study show that our AR system has a robustness of more than 99% in terms of face recognition. This method offers higher practical value than traditional 2D face recognition methods when it comes to large-pose 3D face recognition in day-to-day life. Full article
(This article belongs to the Special Issue Real-Time Computer Vision)
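Identification in this system reduces to nearest-neighbour matching of 128-dimensional embeddings by Euclidean distance (performed via SQL in the paper). The NumPy sketch below shows the same matching logic in plain Python; the rejection threshold and the random gallery are illustrative assumptions.

```python
import numpy as np

def match_face(query, gallery, threshold=0.6):
    """Return the gallery name whose 128-D embedding is closest to the query.

    gallery maps a person's name to a stored 128-D embedding. Matching uses the
    smallest Euclidean distance; if even the best match is farther than
    `threshold` (an assumed value), the face is reported as unknown (None).
    """
    names = list(gallery)
    dists = np.linalg.norm(np.stack([gallery[n] for n in names]) - query, axis=1)
    best = int(np.argmin(dists))
    return names[best] if dists[best] <= threshold else None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = {"alice": rng.normal(size=128), "bob": rng.normal(size=128)}
    query = gallery["alice"] + rng.normal(scale=0.05, size=128)  # same person, small noise
    print(match_face(query, gallery, threshold=3.0))  # -> alice
```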
