Search Results (169)

Search Parameters:
Keywords = keyframes

22 pages, 3921 KB  
Article
Tightly Coupled LiDAR-Inertial Odometry for Autonomous Driving via Self-Adaptive Filtering and Factor Graph Optimization
by Weiwei Lyu, Haoting Li, Shuanggen Jin, Haocai Huang, Xiaojuan Tian, Yunlong Zhang, Zheyuan Du and Jinling Wang
Machines 2025, 13(11), 977; https://doi.org/10.3390/machines13110977 - 23 Oct 2025
Abstract
Simultaneous Localization and Mapping (SLAM) has become a critical tool for fully autonomous driving. However, current methods suffer from inefficient data utilization and degraded navigation performance in complex and unknown environments. In this paper, an accurate and tightly coupled LiDAR-inertial odometry method is proposed. First, a self-adaptive voxel grid filter is developed to dynamically downsample the original point clouds based on environmental feature richness, balancing navigation accuracy and real-time performance. Second, keyframe factors are selected based on thresholds of translation distance, rotation angle, and time interval and then introduced into the factor graph to improve global consistency. Additionally, high-quality Global Navigation Satellite System (GNSS) factors are selected and incorporated into the factor graph through linear interpolation, improving navigation accuracy in complex and unknown environments. The proposed method is evaluated on the KITTI dataset over various scales and environments. Results show that the proposed method outperforms other methods such as ALOAM, LIO-SAM, and SC-LeGO-LOAM. In urban scenes in particular, trajectory accuracy improves by 33.13%, 57.56%, and 58.4% relative to these methods, respectively, demonstrating excellent navigation and positioning capability. Full article
(This article belongs to the Section Vehicle Engineering)
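A minimal sketch of the threshold-based keyframe-factor selection described in this abstract. The threshold values below are hypothetical placeholders, not the paper's parameters:

```python
import numpy as np

# Hypothetical thresholds; the paper's actual values are not given in this abstract.
TRANS_THRESH = 1.0   # meters of translation since the last keyframe
ROT_THRESH = 0.2     # radians of rotation since the last keyframe
TIME_THRESH = 1.0    # seconds since the last keyframe

def rotation_angle(R_rel):
    """Geodesic angle of a relative 3x3 rotation matrix."""
    return np.arccos(np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0))

def is_new_keyframe(last_kf, current):
    """Select a frame as a keyframe factor if any threshold is exceeded.

    last_kf, current: dicts with 't' (3-vector position), 'R' (3x3 rotation),
    and 'stamp' (seconds).
    """
    d_trans = np.linalg.norm(current["t"] - last_kf["t"])
    d_rot = rotation_angle(last_kf["R"].T @ current["R"])
    d_time = current["stamp"] - last_kf["stamp"]
    return (d_trans > TRANS_THRESH) or (d_rot > ROT_THRESH) or (d_time > TIME_THRESH)
```

Frames passing this test would then be added to the factor graph as keyframe factors alongside the interpolated GNSS factors.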

28 pages, 32292 KB  
Article
Contextual Feature Fusion-Based Keyframe Selection Using Semantic Attention and Diversity-Aware Optimization for Video Summarization
by Chitrakala S and Aparyay Kumar
Symmetry 2025, 17(10), 1737; https://doi.org/10.3390/sym17101737 - 15 Oct 2025
Viewed by 279
Abstract
Training-free video summarization tackles the challenge of selecting the most informative keyframes from a video without relying on costly training or complex deep models. This work introduces C2FVS-DPP (Contextual Feature Fusion Video Summarization with Determinantal Point Process), a lightweight framework that generates concise video summaries by jointly modeling semantic importance, visual diversity, temporal structure, and symmetry. The design centers on a symmetry-aware fusion strategy, where appearance, motion, and semantic cues are aligned in a unified embedding space, and on a reward-guided optimization logic that balances representativeness and diversity. Specifically, appearance features from ResNet-50, motion cues from optical flow, and semantic representations from BERT-encoded BLIP captions are fused into a contextual embedding. A Transformer encoder assigns importance scores, followed by shot boundary detection and K-Medoids clustering to identify candidate keyframes. These candidates are refined through a reward-based re-ranking mechanism that integrates semantic relevance, representativeness, and visual uniqueness, while a Determinantal Point Process (DPP) enforces globally diverse selection under a keyframe budget. To enable reliable evaluation, enhanced versions of the SumMe and TVSum50 datasets were curated to reduce redundancy and increase semantic density. On these curated benchmarks, C2FVS-DPP achieves F1-scores of 0.22 and 0.43 and fidelity scores of 0.16 and 0.40 on SumMe and TVSum50, respectively, surpassing existing models on both metrics. In terms of compression ratio, the framework records 0.9959 on SumMe and 0.9940 on TVSum50, remaining highly competitive with the best-reported values of 0.9981 and 0.9983. These results highlight the strength of C2FVS-DPP as an inference-driven, symmetry-aware, and resource-efficient solution for video summarization. Full article
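As a rough illustration of how a DPP can enforce diversity under a keyframe budget, the sketch below greedily maximizes the log-determinant of a quality-weighted similarity kernel; the actual C2FVS-DPP kernel, features, and importance scores are not reproduced here.

```python
import numpy as np

def greedy_dpp_select(embeddings, scores, budget):
    """Greedy MAP-style selection for a DPP kernel L = diag(q) @ S @ diag(q),
    where S is cosine similarity between candidate embeddings and q are
    importance scores. A hypothetical stand-in for the paper's exact kernel.
    """
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = X @ X.T                          # cosine similarity between candidates
    L = np.outer(scores, scores) * S     # quality-weighted PSD kernel
    selected = []
    for _ in range(min(budget, len(scores))):
        best, best_gain = None, -np.inf
        for i in range(len(scores)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            gain = logdet if sign > 0 else -np.inf
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        selected.append(best)
    return selected
```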

18 pages, 8879 KB  
Article
Energy-Conscious Lightweight LiDAR SLAM with 2D Range Projection and Multi-Stage Outlier Filtering for Intelligent Driving
by Chun Wei, Tianjing Li and Xuemin Hu
Computation 2025, 13(10), 239; https://doi.org/10.3390/computation13100239 - 10 Oct 2025
Viewed by 261
Abstract
To meet the increasing demands of energy efficiency and real-time performance in autonomous driving systems, this paper presents a lightweight and robust LiDAR SLAM framework designed with power-aware considerations. The proposed system introduces three core innovations. First, it replaces traditional ordered point cloud indexing with a 2D range image projection, significantly reducing memory usage and enabling efficient feature extraction with curvature-based criteria. Second, a multi-stage outlier rejection mechanism is employed to enhance feature robustness by adaptively filtering occluded and noisy points. Third, we propose a dynamically filtered local mapping strategy that adjusts keyframe density in real time, ensuring geometric constraint sufficiency while minimizing redundant computation. These components collectively contribute to a SLAM system that achieves high localization accuracy with reduced computational load and energy consumption. Experimental results on representative autonomous driving datasets demonstrate that our method outperforms existing approaches in both efficiency and robustness, making it well-suited for deployment in low-power and real-time scenarios within intelligent transportation systems. Full article
(This article belongs to the Special Issue Object Detection Models for Transportation Systems)
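The 2D range-image projection mentioned above can be illustrated as follows; the sensor geometry (beam count, vertical field of view) is assumed for a generic 64-beam LiDAR and is not taken from the paper.

```python
import numpy as np

def project_to_range_image(points, rows=64, cols=1024,
                           fov_up_deg=2.0, fov_down_deg=-24.8):
    """Project an unordered Nx3 point cloud into a rows x cols range image.

    Row index comes from the vertical (elevation) angle, column index from the
    azimuth; the pixel value is the range. FOV values are hypothetical defaults.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(rng, 1e-9))  # elevation angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    col = ((yaw + np.pi) / (2.0 * np.pi) * cols).astype(int) % cols
    row = ((fov_up - pitch) / fov * rows).astype(int)
    valid = (row >= 0) & (row < rows)

    image = np.full((rows, cols), -1.0)           # -1 marks empty pixels
    image[row[valid], col[valid]] = rng[valid]
    return image
```

Curvature-based feature extraction and outlier filtering would then operate on this ordered 2D grid rather than on the raw unordered cloud.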

24 pages, 4488 KB  
Review
Advances in Facial Micro-Expression Detection and Recognition: A Comprehensive Review
by Tian Shuai, Seng Beng, Fatimah Binti Khalid and Rahmita Wirza Bt O. K. Rahmat
Information 2025, 16(10), 876; https://doi.org/10.3390/info16100876 - 9 Oct 2025
Viewed by 782
Abstract
Micro-expressions are facial movements of extremely short duration and small amplitude that can reveal an individual’s underlying true emotions, and they have important applications in public safety, medical diagnosis, psychotherapy, and business negotiation. Because micro-expressions change rapidly and are difficult to detect, manual recognition is a significant challenge, and the development of automatic recognition systems has become a research hotspot. This paper reviews the development history and research status of micro-expression recognition and systematically analyzes the two main branches of micro-expression analysis: micro-expression detection and micro-expression recognition. For detection, methods are divided into three categories according to the features they extract: temporal features, feature changes, and deep features. For recognition, traditional methods based on texture and optical-flow features are summarized alongside deep learning methods that have emerged in recent years, including motion-unit, keyframe, and transfer-learning strategies. The paper also surveys commonly used micro-expression datasets and facial image preprocessing techniques and compares mainstream methods across multiple experimental indicators. Although significant progress has been made in recent years, the field still faces challenges such as data scarcity, class imbalance, and unstable recognition accuracy. Future research can further combine multimodal emotional information, enhance data generalization capability, and optimize deep network structures to promote the widespread application of micro-expression recognition in practical scenarios. Full article

24 pages, 16680 KB  
Article
Research on Axle Type Recognition Technology for Under-Vehicle Panorama Images Based on Enhanced ORB and YOLOv11
by Xiaofan Feng, Lu Peng, Yu Tang, Chang Liu and Huazhen An
Sensors 2025, 25(19), 6211; https://doi.org/10.3390/s25196211 - 7 Oct 2025
Viewed by 523
Abstract
With the strict requirements of national policies on truck dimensions, axle loads, and weight limits, along with the implementation of tolls based on vehicle types, rapid and accurate identification of vehicle axle types has become essential for toll station management. To address the limitations of existing methods in distinguishing between drive and driven axles, complex equipment setup, and image evidence retention, this article proposes a panoramic image detection technology for vehicle chassis based on enhanced ORB and YOLOv11. A portable vehicle chassis image acquisition system, based on area array cameras, was developed for rapid on-site deployment within 20 min, eliminating the requirement for embedded installation. The FeatureBooster (FB) module was employed to optimize the ORB algorithm’s feature matching and was combined with keyframe technology to achieve high-quality panoramic image stitching. After fine-tuning the FB model on a domain-specific area scan dataset, the number of feature matches increased to 151 ± 18, substantially outperforming both the pre-trained FB model and the baseline ORB. Experimental results on axle type recognition using the YOLOv11 algorithm combined with ORB and FB features demonstrated that the integrated approach achieved superior performance. On the overall test set, the model attained an mAP@50 of 0.989 and an mAP@50:95 of 0.780, along with a precision (P) of 0.98 and a recall (R) of 0.99. In nighttime scenarios, it maintained an mAP@50 of 0.977 and an mAP@50:95 of 0.743, with precision and recall remaining at 0.98 and 0.99, respectively. Field verification shows that the system's real-time performance and accuracy can support axle type recognition at toll stations. Full article
(This article belongs to the Section Sensing and Imaging)
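For context, a baseline ORB matching and homography-estimation step (without the FeatureBooster re-ranking the paper adds) might look like the following OpenCV sketch:

```python
import cv2
import numpy as np

def match_and_stitch_pair(img_a, img_b, n_features=2000):
    """Baseline ORB matching between two chassis image strips, followed by a
    RANSAC homography. The FeatureBooster refinement described in the paper
    is not included here.
    """
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, int(inliers.sum()) if inliers is not None else 0
```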

28 pages, 10315 KB  
Article
DKB-SLAM: Dynamic RGB-D Visual SLAM with Efficient Keyframe Selection and Local Bundle Adjustment
by Qian Sun, Ziqiang Xu, Yibing Li, Yidan Zhang and Fang Ye
Robotics 2025, 14(10), 134; https://doi.org/10.3390/robotics14100134 - 25 Sep 2025
Viewed by 678
Abstract
Reliable navigation for mobile robots in dynamic, human-populated environments remains a significant challenge, as moving objects often cause localization drift and map corruption. While Simultaneous Localization and Mapping (SLAM) techniques excel in static settings, issues like keyframe redundancy and optimization inefficiencies further hinder their practical deployment on robotic platforms. To address these challenges, we propose DKB-SLAM, a real-time RGB-D visual SLAM system specifically designed to enhance robotic autonomy in complex dynamic scenes. DKB-SLAM integrates optical flow with Gaussian-based depth distribution analysis within YOLO detection frames to efficiently filter dynamic points, crucial for maintaining accurate pose estimates for the robot. An adaptive keyframe selection strategy balances map density and information integrity using a sliding window, considering the robot’s motion dynamics through parallax, visibility, and matching quality. Furthermore, a heterogeneously weighted local bundle adjustment (BA) method leverages map point geometry, assigning higher weights to stable edge points to refine the robot’s trajectory. Evaluations on the TUM RGB-D benchmark and, crucially, on a mobile robot platform in real-world dynamic scenarios, demonstrate that DKB-SLAM outperforms state-of-the-art methods, providing a robust and efficient solution for high-precision robot localization and mapping in dynamic environments. Full article
(This article belongs to the Special Issue SLAM and Adaptive Navigation for Robotics)
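A simplified sketch of a keyframe decision combining parallax, visibility, and matching quality; the thresholds are purely illustrative and do not reproduce DKB-SLAM's sliding-window criteria.

```python
# Illustrative thresholds only; not the values used by DKB-SLAM.
PARALLAX_MIN = 10.0      # mean pixel parallax w.r.t. the last keyframe
VISIBILITY_MIN = 0.6     # fraction of last-keyframe map points still visible
MATCH_QUALITY_MIN = 0.5  # inlier ratio of feature matches

def should_insert_keyframe(parallax_px, visible_ratio, inlier_ratio):
    """Insert a keyframe when motion is large enough (parallax) or tracking is
    degrading (few visible points, poor match quality)."""
    large_motion = parallax_px > PARALLAX_MIN
    tracking_weak = (visible_ratio < VISIBILITY_MIN) or (inlier_ratio < MATCH_QUALITY_MIN)
    return large_motion or tracking_weak
```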

29 pages, 5817 KB  
Article
Unsupervised Segmentation and Alignment of Multi-Demonstration Trajectories via Multi-Feature Saliency and Duration-Explicit HSMMs
by Tianci Gao, Konstantin A. Neusypin, Dmitry D. Dmitriev, Bo Yang and Shengren Rao
Mathematics 2025, 13(19), 3057; https://doi.org/10.3390/math13193057 - 23 Sep 2025
Viewed by 446
Abstract
Learning from demonstration with multiple executions must contend with time warping, sensor noise, and alternating quasi-stationary and transition phases. We propose a label-free pipeline that couples unsupervised segmentation, duration-explicit alignment, and probabilistic encoding. A dimensionless multi-feature saliency (velocity, acceleration, curvature, direction-change rate) yields scale-robust keyframes via persistent peak–valley pairs and non-maximum suppression. A hidden semi-Markov model (HSMM) with explicit duration distributions is jointly trained across demonstrations to align trajectories on a shared semantic time base. Segment-level probabilistic motion models (GMM/GMR or ProMP, optionally combined with DMP) produce mean trajectories with calibrated covariances, directly interfacing with constrained planners. Feature weights are tuned without labels by minimizing cross-demonstration structural dispersion on the simplex via CMA-ES. Across UAV flight, autonomous driving, and robotic manipulation, the method reduces phase-boundary dispersion by 31% on UAV-Sim and by 30–36% under monotone time warps, noise, and missing data (vs. HMM); improves the sparsity–fidelity trade-off (higher time compression at comparable reconstruction error) with lower jerk; and attains nominal 2σ coverage (94–96%), indicating well-calibrated uncertainty. Ablations attribute the gains to persistence plus NMS, weight self-calibration, and duration-explicit alignment. The framework is scale-aware and computationally practical, and its uncertainty outputs feed directly into MPC/OMPL for risk-aware execution. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
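The dimensionless multi-feature saliency and non-maximum suppression step can be sketched as below for a 3D trajectory; the feature weights and NMS window are placeholders, and the persistence-based peak–valley pairing is omitted.

```python
import numpy as np

def saliency_keyframes(traj, dt, weights=(0.25, 0.25, 0.25, 0.25), nms_window=10):
    """Compute a normalized multi-feature saliency over an (N, 3) trajectory and
    pick local maxima with simple non-maximum suppression.

    Features: speed, acceleration magnitude, curvature, direction-change rate.
    Weights and window size are illustrative, not the paper's calibrated values.
    """
    vel = np.gradient(traj, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    speed = np.linalg.norm(vel, axis=1)
    acc_mag = np.linalg.norm(acc, axis=1)

    # Curvature ||v x a|| / ||v||^3; direction-change rate from unit tangents.
    curvature = np.linalg.norm(np.cross(vel, acc), axis=1) / np.maximum(speed ** 3, 1e-9)
    tangent = vel / np.maximum(speed[:, None], 1e-9)
    dir_change = np.linalg.norm(np.gradient(tangent, dt, axis=0), axis=1)

    norm = lambda f: (f - f.min()) / (f.max() - f.min() + 1e-9)  # dimensionless
    feats = [speed, acc_mag, curvature, dir_change]
    sal = sum(w * norm(f) for w, f in zip(weights, feats))

    keyframes = []
    for i in range(len(sal)):
        lo, hi = max(0, i - nms_window), min(len(sal), i + nms_window + 1)
        if sal[i] == sal[lo:hi].max():
            keyframes.append(i)
    return keyframes, sal
```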

23 pages, 3485 KB  
Article
MSGS-SLAM: Monocular Semantic Gaussian Splatting SLAM
by Mingkai Yang, Shuyu Ge and Fei Wang
Symmetry 2025, 17(9), 1576; https://doi.org/10.3390/sym17091576 - 20 Sep 2025
Viewed by 1088
Abstract
With the iterative evolution of SLAM (Simultaneous Localization and Mapping) technology in the robotics domain, the SLAM paradigm based on three-dimensional Gaussian distribution models has emerged as the current state-of-the-art technical approach. This research proposes a novel MSGS-SLAM system (Monocular Semantic Gaussian Splatting SLAM), which innovatively integrates monocular vision with three-dimensional Gaussian distribution models within a semantic SLAM framework. Our approach exploits the inherent spherical symmetries of isotropic Gaussian distributions, enabling symmetric optimization processes that maintain computational efficiency while preserving geometric consistency. Current mainstream three-dimensional Gaussian semantic SLAM systems typically rely on depth sensors for map reconstruction and semantic segmentation, which not only significantly increases hardware costs but also limits the deployment potential of systems in diverse scenarios. To overcome this limitation, this research introduces a depth estimation proxy framework based on Metric3D-V2, which effectively addresses the inherent deficiency of monocular vision systems in depth information acquisition. Additionally, our method leverages architectural symmetries in indoor environments to enhance semantic understanding through symmetric feature matching. Through this approach, the system achieves robust and efficient semantic feature integration and optimization without relying on dedicated depth sensors, thereby substantially reducing the dependency of three-dimensional Gaussian semantic SLAM systems on depth sensors and expanding their application scope. Furthermore, this research proposes a keyframe selection algorithm based on semantic guidance and proxy depth collaborative mechanisms, which effectively suppresses pose drift errors accumulated during long-term system operation, thereby achieving robust global loop closure correction. Through systematic evaluation on multiple standard datasets, MSGS-SLAM achieves comparable technical performance to existing three-dimensional Gaussian model-based semantic SLAM systems across multiple key performance metrics including ATE RMSE, PSNR, and mIoU. Full article
(This article belongs to the Section Engineering and Materials)

17 pages, 3935 KB  
Article
Markerless Force Estimation via SuperPoint-SIFT Fusion and Finite Element Analysis: A Sensorless Solution for Deformable Object Manipulation
by Qingqing Xu, Ruoyang Lai and Junqing Yin
Biomimetics 2025, 10(9), 600; https://doi.org/10.3390/biomimetics10090600 - 8 Sep 2025
Viewed by 504
Abstract
Contact-force perception is a critical component of safe robotic grasping. With the rapid advances in embodied intelligence technology, humanoid robots have enhanced their multimodal perception capabilities. Conventional force sensors face limitations, such as complex spatial arrangements, installation challenges at multiple nodes, and potential interference with robotic flexibility. Consequently, these conventional sensors are unsuitable for biomimetic robot requirements in object perception, natural interaction, and agile movement. Therefore, this study proposes a sensorless external force detection method that integrates SuperPoint-Scale Invariant Feature Transform (SIFT) feature extraction with finite element analysis to address force perception challenges. A visual analysis method based on the SuperPoint-SIFT feature fusion algorithm was implemented to reconstruct a three-dimensional displacement field of the target object. Subsequently, the displacement field was mapped to the contact force distribution using finite element modeling. Experimental results demonstrate a mean force estimation error of 7.60% (isotropic) and 8.15% (anisotropic), with RMSE < 8%, validated by flexible pressure sensors. To enhance the model’s reliability, a dual-channel video comparison framework was developed. By analyzing the consistency of the deformation patterns and mechanical responses between the actual compression and finite element simulation video keyframes, the proposed approach provides a novel solution for real-time force perception in robotic interactions. The proposed solution is suitable for applications such as precision assembly and medical robotics, where sensorless force feedback is crucial. Full article
(This article belongs to the Special Issue Bio-Inspired Intelligent Robot)
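The displacement-field step can be approximated with plain SIFT matching in OpenCV; the SuperPoint fusion and the finite-element mapping to contact forces are beyond this sketch.

```python
import cv2
import numpy as np

def sparse_displacement_field(frame_before, frame_after, ratio=0.75):
    """Match SIFT keypoints between an undeformed and a deformed frame and
    return per-match 2D displacement vectors (a sparse displacement field).
    Only the SIFT half of the paper's SuperPoint-SIFT fusion is shown.
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(frame_before, None)
    kp2, des2 = sift.detectAndCompute(frame_after, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in knn if m.distance < ratio * n.distance]  # Lowe's ratio test

    pts_before = np.float32([kp1[m.queryIdx].pt for m in good])
    pts_after = np.float32([kp2[m.trainIdx].pt for m in good])
    return pts_before, pts_after - pts_before   # anchor points and displacements
```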

23 pages, 4627 KB  
Article
Dynamic SLAM Dense Point Cloud Map by Fusion of Semantic Information and Bayesian Moving Probability
by Qing An, Shao Li, Yanglu Wan, Wei Xuan, Chao Chen, Bufan Zhao and Xijiang Chen
Sensors 2025, 25(17), 5304; https://doi.org/10.3390/s25175304 - 26 Aug 2025
Viewed by 924
Abstract
Most existing Simultaneous Localization and Mapping (SLAM) systems rely on the assumption of static environments to achieve reliable and efficient mapping. However, such methods often suffer from degraded localization accuracy and mapping consistency in dynamic settings, as they lack explicit mechanisms to distinguish between static and dynamic elements. To overcome this limitation, we present BMP-SLAM, a vision-based SLAM approach that integrates semantic segmentation and Bayesian motion estimation to robustly handle dynamic indoor scenes. To enable real-time dynamic object detection, we integrate YOLOv5, a semantic segmentation network that identifies and localizes dynamic regions within the environment, into a dedicated dynamic target detection thread. In parallel, the data-association-based Bayesian moving probability proposed in this paper effectively eliminates dynamic feature points and reduces the impact of dynamic targets in the environment on the SLAM system. To enhance complex indoor robotic navigation, the proposed system integrates semantic keyframe information with dynamic object detection outputs to reconstruct high-fidelity 3D point cloud maps of indoor environments. The evaluation conducted on the TUM RGB-D dataset indicates that the performance of BMP-SLAM is superior to that of ORB-SLAM3, with trajectory tracking accuracy improved by 96.35%. Comparative evaluations demonstrate that the proposed system achieves superior performance in dynamic environments, exhibiting both lower trajectory drift and enhanced positioning precision relative to state-of-the-art dynamic SLAM methods. Full article
(This article belongs to the Special Issue Indoor Localization Technologies and Applications)
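A toy version of a per-feature Bayesian moving-probability update is shown below; the likelihood values are invented for illustration and are not those derived in the paper.

```python
def update_moving_probability(prior, observed_dynamic,
                              p_obs_given_moving=0.9, p_obs_given_static=0.2):
    """Recursive Bayes update of the probability that a feature point is moving.

    observed_dynamic: True if the current observation flags the point as dynamic
    (e.g., it falls inside a segmented dynamic region or has a large residual).
    The likelihoods here are hypothetical placeholders.
    """
    if observed_dynamic:
        like_m, like_s = p_obs_given_moving, p_obs_given_static
    else:
        like_m, like_s = 1.0 - p_obs_given_moving, 1.0 - p_obs_given_static
    evidence = like_m * prior + like_s * (1.0 - prior)
    return (like_m * prior) / evidence

# Example: start at 0.5 and observe two consecutive "dynamic" detections.
p = 0.5
for obs in (True, True):
    p = update_moving_probability(p, obs)
# A point whose probability exceeds a chosen threshold (e.g., 0.8) would be discarded.
```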

24 pages, 14239 KB  
Article
CAESAR: A Unified Framework for Foundation and Generative Models for Efficient Compression of Scientific Data
by Xiao Li, Liangji Zhu, Jaemoon Lee, Rahul Sengupta, Scott Klasky, Sanjay Ranka and Anand Rangarajan
Appl. Sci. 2025, 15(16), 8977; https://doi.org/10.3390/app15168977 - 14 Aug 2025
Viewed by 895
Abstract
We introduce CAESAR, a new framework for scientific data reduction that stands for Conditional AutoEncoder with Super-resolution for Augmented Reduction. The baseline model, CAESAR-V, is built on a standard variational autoencoder with scale hyperpriors and super-resolution modules to achieve high compression. It encodes data into a latent space and uses learned priors for compact, information-rich representations. The enhanced version, CAESAR-D, begins by compressing keyframes using an autoencoder and extends the architecture by incorporating conditional diffusion to interpolate the latent spaces of missing frames between keyframes. This enables high-fidelity reconstruction of intermediate data without requiring their explicit storage. By distinguishing CAESAR-V (variational) from CAESAR-D (diffusion-enhanced), we offer a modular family of solutions that balance compression efficiency, reconstruction accuracy, and computational cost for scientific data workflows. Additionally, we develop a GPU-accelerated postprocessing module which enforces error bounds on the reconstructed data, achieving real-time compression while maintaining rigorous accuracy guarantees. Experimental results across multiple scientific datasets demonstrate that our framework achieves up to 10× higher compression ratios compared to rule-based compressors such as SZ3. This work provides a scalable, domain-adaptive solution for efficient storage and transmission of large-scale scientific simulation data. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
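One common way to enforce a pointwise error bound after learned reconstruction is to store corrections only where the bound is violated; the sketch below illustrates that general idea on the CPU and is not CAESAR's GPU-accelerated module.

```python
import numpy as np

def enforce_error_bound(original, reconstructed, eps):
    """Return a corrected field satisfying |corrected - original| <= eps, plus the
    sparse, quantized corrections that would be stored with the compressed data.
    Generic sketch; not the paper's implementation.
    """
    residual = original - reconstructed
    mask = np.abs(residual) > eps
    # Quantize out-of-bound residuals to multiples of eps so each correction fits
    # in a small integer while bringing the final error within the bound (<= eps/2).
    quantized = np.round(residual[mask] / eps).astype(np.int32)
    corrected = reconstructed.copy()
    corrected[mask] += quantized * eps
    return corrected, mask, quantized
```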

13 pages, 4728 KB  
Article
Stereo Direct Sparse Visual–Inertial Odometry with Efficient Second-Order Minimization
by Chenhui Fu and Jiangang Lu
Sensors 2025, 25(15), 4852; https://doi.org/10.3390/s25154852 - 7 Aug 2025
Viewed by 1262
Abstract
Visual–inertial odometry (VIO) is the primary supporting technology for autonomous systems, but it faces three major challenges: initialization sensitivity, dynamic illumination, and multi-sensor fusion. In order to overcome these challenges, this paper proposes stereo direct sparse visual–inertial odometry with efficient second-order minimization. It is entirely implemented using the direct method, which includes a depth initialization module based on visual–inertial alignment, a stereo image tracking module, and a marginalization module. Inertial measurement unit (IMU) data is first aligned with a stereo image to initialize the system effectively. Then, based on the efficient second-order minimization (ESM) algorithm, the photometric error and the inertial error are minimized to jointly optimize camera poses and sparse scene geometry. IMU information is accumulated between several frames using measurement preintegration and is inserted into the optimization as an additional constraint between keyframes. A marginalization module is added to reduce the computation complexity of the optimization and maintain the information about the previous states. The proposed system is evaluated on the KITTI visual odometry benchmark and the EuRoC dataset. The experimental results demonstrate that the proposed system achieves state-of-the-art performance in terms of accuracy and robustness. Full article
(This article belongs to the Section Vehicular Sensing)

24 pages, 1751 KB  
Article
Robust JND-Guided Video Watermarking via Adaptive Block Selection and Temporal Redundancy
by Antonio Cedillo-Hernandez, Lydia Velazquez-Garcia, Manuel Cedillo-Hernandez, Ismael Dominguez-Jimenez and David Conchouso-Gonzalez
Mathematics 2025, 13(15), 2493; https://doi.org/10.3390/math13152493 - 3 Aug 2025
Viewed by 745
Abstract
This paper introduces a robust and imperceptible video watermarking framework designed for blind extraction in dynamic video environments. The proposed method operates in the spatial domain and combines multiscale perceptual analysis, adaptive Just Noticeable Difference (JND)-based quantization, and temporal redundancy via multiframe embedding. Watermark bits are embedded selectively in blocks with high perceptual masking using a QIM strategy, and the corresponding DCT coefficients are estimated directly from the spatial domain to reduce complexity. To enhance resilience, each bit is redundantly inserted across multiple keyframes selected based on scene transitions. Extensive simulations over 21 benchmark videos (CIF, 4CIF, HD) validate that the method achieves superior performance in robustness and perceptual quality, with an average Bit Error Rate (BER) of 1.03%, PSNR of 50.1 dB, SSIM of 0.996, and VMAF of 97.3 under compression, noise, cropping, and temporal desynchronization. The system outperforms several recent state-of-the-art techniques in both quality and speed, requiring no access to the original video during extraction. These results confirm the method’s viability for practical applications such as copyright protection and secure video streaming. Full article
(This article belongs to the Section E: Applied Mathematics)
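Quantization Index Modulation (QIM) on a single coefficient can be sketched as follows, with the quantization step standing in for the JND-adaptive step size computed by the paper:

```python
import numpy as np

def qim_embed(coeff, bit, step):
    """Embed one bit into a coefficient with binary dithered QIM.

    `step` would be the JND-adaptive quantization step in the paper; here it is
    just a parameter. Bit 0 snaps to the even lattice, bit 1 to the shifted one.
    """
    dither = 0.0 if bit == 0 else step / 2.0
    return np.round((coeff - dither) / step) * step + dither

def qim_extract(coeff, step):
    """Recover the bit by choosing the nearer of the two lattices (blind extraction)."""
    d0 = abs(coeff - qim_embed(coeff, 0, step))
    d1 = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if d0 <= d1 else 1

# Example: embed a 1 into a coefficient of value 13.7 with step 4.
marked = qim_embed(13.7, 1, 4.0)   # snaps to 14.0, a point on the bit-1 lattice
assert qim_extract(marked, 4.0) == 1
```

In the paper, the same bit is additionally embedded redundantly across several keyframes so that a majority vote survives compression and temporal desynchronization.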

23 pages, 13739 KB  
Article
Traffic Accident Rescue Action Recognition Method Based on Real-Time UAV Video
by Bo Yang, Jianan Lu, Tao Liu, Bixing Zhang, Chen Geng, Yan Tian and Siyu Zhang
Drones 2025, 9(8), 519; https://doi.org/10.3390/drones9080519 - 24 Jul 2025
Viewed by 1088
Abstract
Low-altitude drones, which are unimpeded by traffic congestion or urban terrain, have become a critical asset in emergency rescue missions. To address the current lack of emergency rescue data, UAV aerial videos were collected to create an experimental dataset for action classification and localization annotation. A total of 5082 keyframes were labeled with 1–5 targets each, and 14,412 instances of data were prepared (including flight altitude and camera angles) for action classification and position annotation. To mitigate the challenges posed by high-resolution drone footage with excessive redundant information, we propose the SlowFast-Traffic (SF-T) framework, a spatio-temporal sequence-based algorithm for recognizing traffic accident rescue actions. For more efficient extraction of target–background correlation features, we introduce the Actor-Centric Relation Network (ACRN) module, which employs temporal max pooling to enhance the time-dimensional features of static backgrounds, significantly reducing redundancy-induced interference. Additionally, smaller ROI feature map outputs are adopted to boost computational speed. To tackle class imbalance in incident samples, we integrate a Class-Balanced Focal Loss (CB-Focal Loss) function, effectively resolving rare-action recognition in specific rescue scenarios. We replace the original Faster R-CNN with YOLOX-s to improve the target detection rate. On our proposed dataset, the SF-T model achieves a mean average precision (mAP) of 83.9%, which is 8.5% higher than that of the standard SlowFast architecture while maintaining a processing speed of 34.9 tasks/s. Both accuracy-related metrics and computational efficiency are substantially improved. The proposed method demonstrates strong robustness and real-time analysis capabilities for modern traffic rescue action recognition. Full article
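The Class-Balanced Focal Loss mentioned above follows the usual formulation in which each class is weighted by its effective number of samples; below is a hedged numpy sketch, where the β and γ values are typical defaults rather than the paper's settings.

```python
import numpy as np

def class_balanced_focal_loss(probs, labels, samples_per_class, beta=0.999, gamma=2.0):
    """Class-Balanced Focal Loss in the style of Cui et al. (2019).

    probs:  (N, C) predicted class probabilities
    labels: (N,) integer class labels
    samples_per_class: (C,) training-sample counts per class
    beta, gamma: typical defaults; the SF-T paper's settings may differ.
    """
    effective_num = 1.0 - np.power(beta, samples_per_class)
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * len(samples_per_class)   # normalize to mean 1

    p_t = probs[np.arange(len(labels)), labels]                  # prob. of the true class
    focal = -np.power(1.0 - p_t, gamma) * np.log(np.maximum(p_t, 1e-12))
    return np.mean(weights[labels] * focal)
```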

33 pages, 4382 KB  
Article
A Distributed Multi-Robot Collaborative SLAM Method Based on Air–Ground Cross-Domain Cooperation
by Peng Liu, Yuxuan Bi, Caixia Wang and Xiaojiao Jiang
Drones 2025, 9(7), 504; https://doi.org/10.3390/drones9070504 - 18 Jul 2025
Viewed by 1900
Abstract
To overcome the limitations in the perception performance of individual robots and homogeneous robot teams, this paper presents a distributed multi-robot collaborative SLAM method based on air–ground cross-domain cooperation. By integrating environmental perception data from UAV and UGV teams across air and ground domains, this method enables more efficient, robust, and globally consistent autonomous positioning and mapping. First, to address the challenge of significant differences in the field of view between UAVs and UGVs, which complicates achieving a unified environmental understanding, this paper proposes an iterative registration method based on semantic and geometric features assistance. This method calculates the correspondence probability of the air–ground loop closure keyframes using these features and iteratively computes the rotation angle and translation vector to determine the coordinate transformation matrix. The resulting matrix provides strong initialization for back-end optimization, which helps to significantly reduce global pose estimation errors. Next, to overcome the convergence difficulties and high computational complexity of large-scale distributed back-end nonlinear pose graph optimization, this paper introduces a multi-level partitioning majorization–minimization distributed pose graph optimization (DPGO) method incorporating loss kernel optimization. This method constructs a multi-level, balanced pose subgraph based on the coupling degree of robot nodes. Then, it uses a minimizing surrogate function derived from a non-trivial loss kernel to gradually drive the distributed pose graph optimization problem to a first-order critical point, thereby significantly improving global pose estimation accuracy. Finally, experimental results on benchmark SLAM datasets and the GRACO dataset demonstrate that the proposed method effectively integrates environmental feature information from air–ground cross-domain UAV and UGV teams, achieving high-precision global pose estimation and map construction. Full article