Search Results (512)

Search Parameters:
Keywords = 3D real-scene

Article (19 pages, 4347 KiB)
Optimization Method of Human Posture Recognition Based on Kinect V2 Sensor
by Hang Li, Hao Li, Ying Qin and Yiming Liu
Biomimetics 2025, 10(4), 254; https://doi.org/10.3390/biomimetics10040254 - 21 Apr 2025
Abstract: Human action recognition aims to understand human behavior and is crucial in enhancing the intelligence and naturalness of human–computer interaction and bionic robots. This paper proposes a method to reduce the complexity and improve the real-time performance of action recognition by combining the Kinect sensor with the OpenPose algorithm, the Levenberg–Marquardt (LM) algorithm, and the Dynamic Time Warping (DTW) algorithm. First, the Kinect V2 depth sensor is used to capture color images, depth images, and 3D skeletal point information from the human body. Next, the color image is processed using OpenPose to extract 2D skeletal point information, which is then mapped to the depth image to obtain 3D skeletal point information. Subsequently, the LM algorithm is employed to fuse the 3D skeletal point sequences with the sequences obtained from Kinect, generating stable 3D skeletal point sequences. Finally, the DTW algorithm is utilized to recognize complex movements. Experimental results across various scenes and actions demonstrate that the method is stable and accurate, achieving an average recognition rate of 95.94%. The method effectively addresses issues such as jitter and self-occlusion that arise when Kinect collects skeletal points. The robustness and accuracy of the method make it highly suitable for application in robot interaction systems.
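The final matching stage is template-based: an observed skeletal sequence is compared against stored action templates with DTW. The sketch below is a minimal illustration of that step (classic DTW over joint-position sequences of shape (T, J, 3)); it is a sketch under stated assumptions, not the authors' implementation, and the template names are hypothetical.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """DTW distance between two 3D skeletal sequences of shape (T, J, 3)."""
    Ta, Tb = len(seq_a), len(seq_b)
    # Frame-to-frame cost: mean Euclidean distance over the J joints.
    cost = np.array([[np.linalg.norm(a - b, axis=1).mean() for b in seq_b] for a in seq_a])
    acc = np.full((Ta + 1, Tb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[Ta, Tb]

# Recognition: pick the template whose DTW distance to the observation is smallest.
# templates = {"wave": wave_seq, "squat": squat_seq}   # hypothetical templates
# label = min(templates, key=lambda k: dtw_distance(observed_seq, templates[k]))
```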

Article (27 pages, 32676 KiB)
Action Recognition via Multi-View Perception Feature Tracking for Human–Robot Interaction
by Chaitanya Bandi and Ulrike Thomas
Robotics 2025, 14(4), 53; https://doi.org/10.3390/robotics14040053 - 19 Apr 2025
Abstract: Human–Robot Interaction (HRI) depends on robust perception systems that enable intuitive and seamless interaction between humans and robots. This work introduces a multi-view perception framework designed for HRI, incorporating object detection and tracking, human body and hand pose estimation, unified hand–object pose estimation, and action recognition. We use a state-of-the-art object detection architecture to understand the scene through object detection and segmentation, ensuring high accuracy and real-time performance. In interaction environments, 3D whole-body pose estimation is necessary, and we integrate an existing work with high inference speed. We propose a novel architecture for 3D unified hand–object pose estimation and tracking, capturing real-time spatial relationships between hands and objects. Furthermore, we incorporate action recognition by leveraging whole-body pose, unified hand–object pose estimation, and object tracking to determine the handover interaction state. The proposed architecture is evaluated on large-scale, open-source datasets, demonstrating competitive accuracy and faster inference times, making it well-suited for real-time HRI applications.
(This article belongs to the Special Issue Human–AI–Robot Teaming (HART))
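To make the idea of a handover interaction state concrete, here is a deliberately simple, distance-based sketch that combines hand and object 3D tracks. The paper's recognition module is learned from pose and tracking features, so this heuristic, with illustrative thresholds, is only an assumption-level illustration.

```python
import numpy as np

def handover_state(hand_xyz, object_xyz, grasp_thresh=0.08, reach_thresh=0.35):
    """Classify a coarse handover phase from hand/object 3D positions (metres).

    hand_xyz, object_xyz: arrays of shape (3,). Thresholds are illustrative only.
    """
    d = np.linalg.norm(np.asarray(hand_xyz) - np.asarray(object_xyz))
    if d < grasp_thresh:
        return "grasp/transfer"
    if d < reach_thresh:
        return "reach"
    return "idle"

print(handover_state(np.array([0.40, 0.10, 0.90]), np.array([0.45, 0.12, 0.92])))
```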

Article (22 pages, 31401 KiB)
BEV-CAM3D: A Unified Bird’s-Eye View Architecture for Autonomous Driving with Monocular Cameras and 3D Point Clouds
by Daniel Ayo Oladele, Elisha Didam Markus and Adnan M. Abu-Mahfouz
AI 2025, 6(4), 82; https://doi.org/10.3390/ai6040082 - 18 Apr 2025
Abstract: Three-dimensional (3D) visual perception is pivotal for understanding surrounding environments in applications such as autonomous driving and mobile robotics. While LiDAR-based models dominate due to accurate depth sensing, their cost and sparse outputs have driven interest in camera-based systems. However, challenges like cross-domain degradation and depth estimation inaccuracies persist. This paper introduces BEVCAM3D, a unified bird’s-eye view (BEV) architecture that fuses monocular cameras and LiDAR point clouds to overcome single-sensor limitations. BEVCAM3D integrates a deformable cross-modality attention module for feature alignment and a fast ground segmentation algorithm to reduce computational overhead by 40%. Evaluated on the nuScenes dataset, BEVCAM3D achieves state-of-the-art performance, with a 73.9% mAP and a 76.2% NDS, outperforming existing LiDAR-camera fusion methods like SparseFusion (72.0% mAP) and IS-Fusion (73.0% mAP). Notably, it excels in detecting pedestrians (91.0% AP) and traffic cones (89.9% AP), addressing the class imbalance in autonomous driving scenarios. The framework supports real-time inference at 11.2 FPS with an EfficientDet-B3 backbone and demonstrates robustness under low-light conditions (62.3% nighttime mAP).
(This article belongs to the Section AI in Autonomous Systems)
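For readers unfamiliar with the BEV representation that camera and LiDAR features are fused into, the sketch below rasterises a LiDAR point cloud into a bird's-eye-view occupancy grid; the grid extents and resolution are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def points_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.5):
    """points: (N, 3) LiDAR points in the ego frame -> (H, W) BEV occupancy grid."""
    x, y = points[:, 0], points[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y = x[keep], y[keep]
    W = int((x_range[1] - x_range[0]) / res)
    H = int((y_range[1] - y_range[0]) / res)
    col = ((x - x_range[0]) / res).astype(int)   # ego x -> grid column
    row = ((y - y_range[0]) / res).astype(int)   # ego y -> grid row
    bev = np.zeros((H, W), dtype=np.float32)
    bev[row, col] = 1.0
    return bev
```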

Article (16 pages, 8416 KiB)
DIN-SLAM: Neural Radiance Field-Based SLAM with Depth Gradient and Sparse Optical Flow for Dynamic Interference Resistance
by Tianzi Zhang, Zhaoyang Xia, Mingrui Li and Lirong Zheng
Electronics 2025, 14(8), 1632; https://doi.org/10.3390/electronics14081632 - 17 Apr 2025
Abstract: Neural implicit SLAM systems perform excellently in static environments, offering higher-quality rendering and scene reconstruction capabilities compared to traditional dense SLAM. However, in dynamic real-world scenes, these systems often experience tracking drift and mapping errors. To address these problems, we propose DIN-SLAM, a dynamic scene neural implicit SLAM system based on optical flow and depth gradient verification. DIN-SLAM combines depth gradients, optical flow, and motion consistency to achieve robust filtering of dynamic pixels, while optimizing dynamic feature points through optical flow registration to enhance tracking accuracy. The system also introduces a dynamic scene optimization strategy that utilizes photometric consistency loss, depth gradient loss, motion consistency constraints, and edge matching constraints to improve geometric consistency and reconstruction performance in dynamic environments. To reduce the interference of dynamic objects on scene reconstruction and eliminate artifacts in scene updates, we propose a targeted rendering and ray sampling strategy based on local feature counts, effectively mitigating the impact of dynamic object occlusions on reconstruction. Our method supports multiple sensor inputs, including pure RGB and RGB-D. The experimental results demonstrate that our approach consistently outperforms state-of-the-art baseline methods, achieving an 83.4% improvement in Absolute Trajectory Error Root Mean Square Error (ATE RMSE), a 91.7% enhancement in Peak Signal-to-Noise Ratio (PSNR), and the elimination of artifacts caused by dynamic interference. These enhancements significantly boost the performance of tracking and mapping in dynamic scenes.
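For reference, the ATE RMSE figure quoted above is the standard trajectory metric; a minimal sketch of its computation follows, assuming the estimated trajectory has already been aligned (for example by a similarity transform) and time-associated with ground truth.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: (N, 3) aligned, time-associated camera positions (metres)."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # per-frame translation error
    return float(np.sqrt(np.mean(err ** 2)))
```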

Article (29 pages, 6622 KiB)
Semantic Fusion Algorithm of 2D LiDAR and Camera Based on Contour and Inverse Projection
by Xingyu Yuan, Yu Liu, Tifan Xiong, Wei Zeng and Chao Wang
Sensors 2025, 25(8), 2526; https://doi.org/10.3390/s25082526 - 17 Apr 2025
Abstract: Common single-line 2D LiDAR sensors and cameras have become core components in the field of robotic perception due to their low cost, compact size, and practicality. However, during the data fusion process, the randomness and complexity of real industrial scenes pose challenges. Traditional calibration methods for LiDAR and cameras often rely on precise targets and can accumulate errors, leading to significant limitations. Additionally, the semantic fusion of LiDAR and camera data typically requires extensive projection calculations, complex clustering algorithms, or sophisticated data fusion techniques, resulting in low real-time performance when handling large volumes of data points in dynamic environments. To address these issues, this paper proposes a semantic fusion algorithm for LiDAR and camera data based on contour and inverse projection. The method has two notable features: (1) Combined with an arc-support line-segment ellipse extraction algorithm, a LiDAR–camera calibration algorithm based on various regular shapes of environmental targets is proposed, which improves the adaptability of the calibration algorithm to the environment. (2) A semantic segmentation algorithm based on the inverse projection of target contours is proposed. It is designed to be versatile and applicable to both linear and arc features, significantly broadening the range of features that can be utilized in various tasks. This flexibility is a key advantage, as it allows the method to adapt to a wider variety of real-world scenarios where both types of features are commonly encountered. Compared with existing LiDAR point cloud semantic segmentation methods, this algorithm eliminates the need for complex clustering algorithms, data fusion techniques, and extensive laser point reprojection calculations. When handling a large number of laser points, the proposed method requires only one or two inverse projections of the contour to filter the range of laser points that intersect with specific targets. This approach enhances both the accuracy of point cloud searches and the speed of semantic processing. Finally, the semantic fusion algorithm is validated by field experiments.
(This article belongs to the Section Sensors and Robotics)
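The sketch below illustrates the spirit of the inverse-projection idea in a heavily simplified form: the target's image contour is mapped once to a bearing interval, and 2D LiDAR points are then filtered by bearing instead of being reprojected individually. It assumes an ideal pinhole camera whose forward axis coincides with the LiDAR's, which is far simpler than the paper's calibration model.

```python
import numpy as np

def contour_bearing_interval(u_min, u_max, fx, cx):
    """Column range of a target contour in the image -> bearing interval (radians)."""
    return np.arctan2(u_min - cx, fx), np.arctan2(u_max - cx, fx)

def filter_scan_by_bearing(scan_xy, bearing_lo, bearing_hi):
    """scan_xy: (N, 2) LiDAR points (x forward, y left); keep points inside the interval."""
    bearing = np.arctan2(-scan_xy[:, 1], scan_xy[:, 0])   # right of centre -> positive, like image u
    return scan_xy[(bearing >= bearing_lo) & (bearing <= bearing_hi)]
```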

Article (20 pages, 41816 KiB)
The 3D Gaussian Splatting SLAM System for Dynamic Scenes Based on LiDAR Point Clouds and Vision Fusion
by Yuquan Zhang, Guangan Jiang, Mingrui Li and Guosheng Feng
Appl. Sci. 2025, 15(8), 4190; https://doi.org/10.3390/app15084190 - 10 Apr 2025
Abstract: This paper presents a novel 3D Gaussian Splatting (3DGS)-based Simultaneous Localization and Mapping (SLAM) system that integrates Light Detection and Ranging (LiDAR) and vision data to enhance dynamic scene tracking and reconstruction. Existing 3DGS systems face challenges in sensor fusion and handling dynamic objects. To address these, we introduce a hybrid uncertainty-based 3D segmentation method that leverages uncertainty estimation and 3D object detection, effectively removing dynamic points and improving static map reconstruction. Our system also employs a sliding window-based keyframe fusion strategy that reduces computational load while maintaining accuracy. By incorporating a novel dynamic rendering loss function and pruning techniques, we suppress artifacts such as ghosting and ensure real-time operation in complex environments. Extensive experiments show that our system outperforms existing methods in dynamic object removal and overall reconstruction quality. The key innovations of our work lie in its integration of hybrid uncertainty-based segmentation, dynamic rendering loss functions, and an optimized sliding window strategy, which collectively enhance robustness and efficiency in dynamic scene reconstruction. This approach offers a promising solution for real-time robotic applications, including autonomous navigation and augmented reality.
(This article belongs to the Special Issue Trends and Prospects for Wireless Sensor Networks and IoT)
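As a small illustration of the sliding-window keyframe idea mentioned above, the sketch below adds a keyframe only when the pose has changed enough and lets the oldest keyframe drop out of a fixed-size window; the window size and thresholds are illustrative assumptions, not the paper's settings.

```python
from collections import deque
import numpy as np

class KeyframeWindow:
    def __init__(self, max_size=8, t_thresh=0.2, r_thresh_deg=15.0):
        self.window = deque(maxlen=max_size)        # oldest keyframe is evicted automatically
        self.t_thresh = t_thresh
        self.r_thresh = np.deg2rad(r_thresh_deg)

    def maybe_add(self, frame_id, position, yaw):
        """position: (3,) array in metres, yaw in radians. Returns True if a keyframe was added."""
        if not self.window:
            self.window.append((frame_id, position, yaw))
            return True
        _, last_p, last_yaw = self.window[-1]
        moved = np.linalg.norm(position - last_p) > self.t_thresh
        turned = abs(yaw - last_yaw) > self.r_thresh
        if moved or turned:
            self.window.append((frame_id, position, yaw))
            return True
        return False
```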

Article (19 pages, 15020 KiB)
Discrete Diffusion-Based Generative Semantic Scene Completion
by Yiqi Wu, Xuan Huang, Boxiong Yang, Yong Chen, Fadi Aburaid and Dejun Zhang
Electronics 2025, 14(7), 1447; https://doi.org/10.3390/electronics14071447 - 3 Apr 2025
Abstract: Semantic scene completion through AI-driven content generation is a rapidly evolving field with crucial applications in 3D reconstruction and scene understanding. This task presents considerable challenges, arising from the intrinsic data sparsity and incomplete nature of input points generated by LiDAR. This paper proposes a generative semantic scene completion method based on a discrete denoising diffusion probabilistic model to tackle these issues. In the discrete diffusion phase, a weighted K-nearest neighbor uniform transition kernel is introduced based on feature distance in the discretized voxel space to control the category distribution transition processes by capturing the local structure of data, which is more in line with the diffusion process in the real world. Moreover, to mitigate the feature information loss during point cloud voxelization, the aggregated point features are integrated into the corresponding voxel space, thereby enhancing the granularity of the completion. Accordingly, a combined loss function is designed for network training that considers both the KL divergence for global completion and the cross-entropy for local details. Evaluation results on multiple public outdoor datasets demonstrate that the proposed method effectively accomplishes semantic scene completion.
(This article belongs to the Section Artificial Intelligence)
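As background for the transition kernel discussed above, here is a sketch of the plain (unweighted) uniform transition kernel used in discrete diffusion over K semantic categories; the paper's contribution is to weight these transitions by K-nearest-neighbour feature distances, which this sketch does not include.

```python
import numpy as np

def uniform_transition_matrix(K, beta_t):
    """Q_t = (1 - beta_t) * I + (beta_t / K) * 1 1^T; each row sums to 1."""
    return (1.0 - beta_t) * np.eye(K) + (beta_t / K) * np.ones((K, K))

def diffuse_one_step(labels, Q_t, rng):
    """labels: (N,) integer voxel categories. Sample x_t ~ Categorical(Q_t[x_{t-1}])."""
    probs = Q_t[labels]                       # one transition row per voxel, shape (N, K)
    cum = probs.cumsum(axis=1)
    u = rng.random((len(labels), 1))
    return (u < cum).argmax(axis=1)           # first category whose cumulative prob exceeds u

rng = np.random.default_rng(0)
x0 = rng.integers(0, 20, size=10)             # 10 voxels, 20 semantic categories (illustrative)
x1 = diffuse_one_step(x0, uniform_transition_matrix(20, beta_t=0.1), rng)
```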

Article (37 pages, 94699 KiB)
Two-Dimensional Spatial Variation Analysis and Correction Method for High-Resolution Wide-Swath Spaceborne Synthetic Aperture Radar (SAR) Imaging
by Zhenyu Hou, Pin Li, Zehua Zhang, Zhuo Yun, Feng He and Zhen Dong
Remote Sens. 2025, 17(7), 1262; https://doi.org/10.3390/rs17071262 - 2 Apr 2025
Abstract: With the development and application of spaceborne Synthetic Aperture Radar (SAR), higher resolution and a wider swath have become significant demands. However, as the resolution increases and the swath widens, the two-dimensional (2D) spatial variation between different targets in the scene and the radar becomes very pronounced, severely affecting the high-precision focusing and high-quality imaging of spaceborne SAR. In previous studies on the correction of two-dimensional spatial variation in spaceborne SAR, either the models were not accurate enough or the computational efficiency was low, limiting the application of the corresponding algorithms. In this paper, we first establish a slant range model and a signal model based on the zero-Doppler moment according to the spaceborne SAR geometry, thereby significantly reducing the impact of azimuth spatial variation within the two-dimensional spatial variation. Subsequently, we propose a Curve-Sphere Model (CUSM) to describe the ground observation geometry of spaceborne SAR, and based on this, we establish a more accurate theoretical model and quantitative description of two-dimensional spatial variation. Next, through modeling and simulation, we conduct an in-depth analysis of the impact of two-dimensional spatial variation on spaceborne SAR imaging, obtaining corresponding constraints and thresholds and concluding that in most cases, only one type of azimuth spatial variation needs to be considered, thereby greatly reducing the demand for and difficulty of two-dimensional spatial variation correction. Building on these results, we propose a two-dimensional spatial variation correction method that combines range blocking and azimuth nonlinear chirp scaling processing, and we analyze its scalability to more general cases. Finally, the effectiveness and applicability of the proposed method are validated through both simulation experiments and real data experiments.
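To give a feel for why the correction is needed, the sketch below evaluates the idealised straight-line slant-range history R(t) = sqrt(R0^2 + (v t)^2) for a near-range and a far-range target; the resulting range migration differs across the swath, which is the range-dependent part of the 2D spatial variation. The velocity, ranges, and aperture time are illustrative values, and the straight-line model is much simpler than the paper's CUSM geometry.

```python
import numpy as np

v = 7500.0                              # platform velocity in m/s (illustrative)
t = np.linspace(-1.0, 1.0, 2001)        # slow time around the zero-Doppler moment, s

def range_history(R0):
    """Idealised hyperbolic slant-range history for closest-approach range R0."""
    return np.sqrt(R0 ** 2 + (v * t) ** 2)

for R0 in (850e3, 900e3):               # near- and far-swath targets (illustrative)
    R = range_history(R0)
    print(f"R0 = {R0 / 1e3:.0f} km: range migration over the aperture = {R.max() - R.min():.2f} m")
```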

Article (19 pages, 13596 KiB)
SMS3D: 3D Synthetic Mushroom Scenes Dataset for 3D Object Detection and Pose Estimation
by Abdollah Zakeri, Bikram Koirala, Jiming Kang, Venkatesh Balan, Weihang Zhu, Driss Benhaddou and Fatima A. Merchant
Computers 2025, 14(4), 128; https://doi.org/10.3390/computers14040128 - 1 Apr 2025
Abstract: The mushroom farming industry struggles to automate harvesting due to limited large-scale annotated datasets and the complex growth patterns of mushrooms, which complicate detection, segmentation, and pose estimation. To address this, we introduce a synthetic dataset with 40,000 unique scenes of white Agaricus bisporus and brown baby bella mushrooms, capturing realistic variations in quantity, position, orientation, and growth stages. Our two-stage pose estimation pipeline combines 2D object detection and instance segmentation with a 3D point cloud-based pose estimation network using a Point Transformer. By employing a continuous 6D rotation representation and a geodesic loss, our method ensures precise rotation predictions. Experiments show that processing point clouds with 1024 points and the 6D Gram–Schmidt rotation representation yields optimal results, achieving an average rotational error of 1.67° on synthetic data, surpassing current state-of-the-art methods in mushroom pose estimation. The model further generalizes well to real-world data, attaining a mean angle difference of 3.68° on a subset of the M18K dataset with ground-truth annotations. This approach aims to drive automation in harvesting, growth monitoring, and quality assessment in the mushroom industry.
(This article belongs to the Special Issue Advanced Image Processing and Computer Vision (2nd Edition))
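The 6D rotation representation and geodesic error mentioned above are standard constructions; a minimal sketch of both follows (the Gram–Schmidt mapping from a 6D vector to a rotation matrix, and the angular distance used to report errors such as 1.67°). It follows the usual convention rather than the paper's exact code.

```python
import numpy as np

def rotation_from_6d(d6):
    """Map a 6D vector (two 3D vectors) to a 3x3 rotation matrix via Gram–Schmidt."""
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)                 # completes a right-handed orthonormal basis
    return np.stack([b1, b2, b3], axis=1)

def geodesic_angle(R_pred, R_gt):
    """Rotation angle (radians) of the relative rotation R_pred^T R_gt."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```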

Article (29 pages, 16314 KiB)
A Novel Framework for Real ICMOS Image Denoising: LD-NGN Noise Modeling and a MAST-Net Denoising Network
by Yifu Luo, Ting Zhang, Ruizhi Li, Bin Zhang, Nan Jia and Liping Fu
Remote Sens. 2025, 17(7), 1219; https://doi.org/10.3390/rs17071219 - 29 Mar 2025
Abstract: Intensified complementary metal-oxide semiconductor (ICMOS) imaging involves multiple steps, including photoelectric conversion and photoelectric multiplication, each of which introduces noise that significantly impacts image quality. To address the issues of insufficient denoising performance and poor model generalization in ICMOS image denoising, this paper proposes a systematic solution. First, we established an experimental platform to collect real ICMOS images and introduced a novel noise generation network (LD-NGN) that accurately simulates the strong sparsity and spatial clustering of ICMOS noise, generating a multi-scene paired dataset. Additionally, we proposed a new noise evaluation metric, KL-Noise, which allows a more precise quantification of noise distribution. Based on this, we designed a denoising network specifically for ICMOS images, MAST-Net, and trained it using the multi-scene paired dataset generated by LD-NGN. By capturing multi-scale features of image pixels, MAST-Net effectively removes complex noise. The experimental results show that, compared to traditional methods and denoisers trained with other noise generators, our method outperforms them both qualitatively and quantitatively. The denoised images achieve a peak signal-to-noise ratio (PSNR) of 35.38 dB and a structural similarity index (SSIM) of 0.93. This improvement provides support for downstream tasks such as image preprocessing, target recognition, and feature extraction.
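For reference, the PSNR figure quoted above is computed as below; the second function is only a generic KL divergence between noise-value histograms, included as an illustration of distribution-level noise comparison and not necessarily the paper's KL-Noise definition.

```python
import numpy as np

def psnr(clean, denoised, max_val=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

def histogram_kl(p_counts, q_counts, eps=1e-12):
    """Generic KL(P || Q) between two noise-value histograms (illustration only)."""
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```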

Article (20 pages, 8973 KiB)
UE-SLAM: Monocular Neural Radiance Field SLAM with Semantic Mapping Capabilities
by Yuquan Zhang, Guangan Jiang, Mingrui Li and Guosheng Feng
Symmetry 2025, 17(4), 508; https://doi.org/10.3390/sym17040508 - 27 Mar 2025
Abstract: Neural Radiance Fields (NeRF) have transformed 3D reconstruction by enabling high-fidelity scene generation from sparse views. However, existing neural SLAM systems face challenges such as limited scene understanding and heavy reliance on depth sensors. We propose UE-SLAM, a real-time monocular SLAM system integrating semantic segmentation, depth fusion, and robust tracking modules. By leveraging the inherent symmetry between semantic segmentation and depth estimation, UE-SLAM utilizes DINOv2 for instance segmentation and combines monocular depth estimation, radiance field-rendered depth, and an uncertainty framework to produce refined proxy depth. This approach enables high-quality semantic mapping and eliminates the need for depth sensors. Experiments on benchmark datasets demonstrate that UE-SLAM achieves robust semantic segmentation, detailed scene reconstruction, and accurate tracking, significantly outperforming existing monocular SLAM methods. The modular and symmetrical architecture of UE-SLAM ensures a balance between computational efficiency and reconstruction quality, aligning with the thematic focus of symmetry in engineering and computational systems.
(This article belongs to the Section Engineering and Materials)
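The refined proxy depth described above combines two depth sources under uncertainty; the sketch below shows the textbook inverse-variance fusion rule for that kind of combination. This is an assumption for illustration, as the paper's uncertainty framework is more elaborate.

```python
import numpy as np

def fuse_depths(d_mono, var_mono, d_render, var_render):
    """Inverse-variance fusion of monocular and rendered depth maps (all (H, W) arrays)."""
    w1, w2 = 1.0 / var_mono, 1.0 / var_render
    fused = (w1 * d_mono + w2 * d_render) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)            # fused estimate is never less certain than either input
    return fused, fused_var
```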

Article (15 pages, 2753 KiB)
Monocular Object-Level SLAM Enhanced by Joint Semantic Segmentation and Depth Estimation
by Ruicheng Gao and Yue Qi
Sensors 2025, 25(7), 2110; https://doi.org/10.3390/s25072110 - 27 Mar 2025
Abstract: SLAM is regarded as a fundamental task in mobile robotics and AR, performing localization and mapping in a given environment. However, with only RGB images as input, monocular SLAM systems suffer from scale ambiguity and tracking difficulty in dynamic scenes. Moreover, high-level semantic information can always contribute to the SLAM process due to its similarity to human vision. Addressing these problems, we propose a monocular object-level SLAM system enhanced by real-time joint depth estimation and semantic segmentation. The multi-task network, called JSDNet, is designed to predict depth and semantic segmentation simultaneously, with four contributions: depth discretization, feature fusion, a weight-learned loss function, and semantic consistency optimization. Specifically, feature fusion facilitates the sharing of features between the two tasks, while semantic consistency aims to guarantee semantic segmentation and depth consistency among various views. Based on the results of JSDNet, we design an object-level system that combines both pixel-level and object-level semantics with traditional tracking, mapping, and optimization processes. In addition, a scale recovery process is integrated into the system to recover the true scale. Experimental results on NYU Depth v2 demonstrate state-of-the-art depth estimation and considerable segmentation precision under real-time performance, while the trajectory accuracy on TUM RGB-D shows smaller errors than other SLAM systems.
(This article belongs to the Section Navigation and Positioning)
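A common way to implement the scale-recovery step mentioned above is a robust median ratio between predicted metric depths and the up-to-scale SLAM map depths at the same points; the sketch below shows that baseline approach, which is an assumption here, since the paper's procedure may differ in detail.

```python
import numpy as np

def recover_scale(slam_depths, predicted_depths):
    """Scale factor that brings up-to-scale SLAM depths into metric units.

    slam_depths, predicted_depths: (N,) depths of the same map points.
    """
    valid = (slam_depths > 0) & (predicted_depths > 0)
    return float(np.median(predicted_depths[valid] / slam_depths[valid]))
```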

Article (29 pages, 5776 KiB)
Intention Reasoning for User Action Sequences via Fusion of Object Task and Object Action Affordances Based on Dempster–Shafer Theory
by Yaxin Liu, Can Wang, Yan Liu, Wenlong Tong and Ming Zhong
Sensors 2025, 25(7), 1992; https://doi.org/10.3390/s25071992 - 22 Mar 2025
Abstract: To reduce the burden on individuals with disabilities when operating a Wheelchair Mounted Robotic Arm (WMRA), many researchers have focused on inferring users’ subsequent task intentions based on their “gazing” or “selecting” of scene objects. In this paper, we propose an innovative intention reasoning method for users’ action sequences by fusing object task and object action affordances based on Dempster–Shafer Theory (D-S theory). This method combines the advantages of probabilistic reasoning and visual affordance detection to establish an affordance model for objects and potential tasks or actions based on users’ habits and object attributes. This facilitates encoding object task (OT) affordance and object action (OA) affordance using D-S theory to perform action sequence reasoning. Specifically, the method includes three main aspects: (1) inferring task intentions from the object of user focus based on object task affordances encoded with Causal Probabilistic Logic (CP-Logic); (2) inferring action intentions based on object action affordances; and (3) integrating OT and OA affordances through D-S theory. Experimental results demonstrate that the proposed method reduces the number of interactions by an average of 14.085% compared to independent task intention inference and by an average of 52.713% compared to independent action intention inference. These results indicate that the proposed method captures the user’s real intention more accurately and significantly reduces unnecessary human–computer interaction.
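The D-S fusion step at the core of the method uses Dempster's rule of combination; a minimal sketch follows, with mass functions represented as dicts from frozensets of hypotheses to belief mass. The intention names in the example are hypothetical.

```python
def combine_ds(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    combined, conflict = {}, 0.0
    for A, mA in m1.items():
        for B, mB in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mA * mB
            else:
                conflict += mA * mB
    return {A: m / (1.0 - conflict) for A, m in combined.items()}   # renormalise by non-conflicting mass

# Hypothetical OT and OA affordance evidence over candidate intentions:
m_task = {frozenset({"drink"}): 0.6, frozenset({"drink", "pour"}): 0.4}
m_action = {frozenset({"drink"}): 0.5, frozenset({"pour"}): 0.3, frozenset({"drink", "pour"}): 0.2}
print(combine_ds(m_task, m_action))
```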

Article (28 pages, 4102 KiB)
Three-Dimensional Instance Segmentation of Rooms in Indoor Building Point Clouds Using Mask3D
by Michael Brunklaus, Maximilian Kellner and Alexander Reiterer
Remote Sens. 2025, 17(7), 1124; https://doi.org/10.3390/rs17071124 - 21 Mar 2025
Abstract: While most recent work in room instance segmentation relies on orthographic top-down projections of 3D point clouds to 2D density maps, leading to the loss of one dimension of information, 3D instance segmentation methods based on deep learning have rarely been considered. We explore the potential of the general 3D instance segmentation deep learning model Mask3D for room instance segmentation in indoor building point clouds. We show that Mask3D generates meaningful predictions for multi-floor scenes. After hyperparameter optimization, Mask3D outperforms the current state-of-the-art method RoomFormer evaluated in 3D on the synthetic Structured3D dataset. We provide generalization results of Mask3D trained on Structured3D to the real-world S3DIS and Matterport3D datasets, showing a domain gap; fine-tuning improves the results. In contrast to related work in room instance segmentation, we employ the more expressive mean average precision (mAP) metric, and we propose the more intuitive successfully detected rooms (SDR) metric, which is an absolute recall measure. Our results indicate potential for the digitization of the construction industry.
(This article belongs to the Section AI Remote Sensing)
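Since the SDR metric is described only as an absolute recall measure, the sketch below shows one plausible IoU-based reading of it: a ground-truth room counts as successfully detected if some predicted instance overlaps it above an IoU threshold. The threshold and matching rule are assumptions, not the paper's exact definition.

```python
import numpy as np

def point_iou(mask_a, mask_b):
    """IoU of two boolean instance masks defined over the same point cloud."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def successfully_detected_rooms(gt_masks, pred_masks, iou_thresh=0.5):
    """Count ground-truth rooms matched by at least one prediction above the threshold."""
    return sum(
        any(point_iou(gt, pred) >= iou_thresh for pred in pred_masks)
        for gt in gt_masks
    )
```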

Article (27 pages, 38446 KiB)
YOLOv8n-Al-Dehazing: A Robust Multi-Functional Operation Terminals Detection for Large Crane in Metallurgical Complex Dust Environment
by Yifeng Pan, Yonghong Long, Xin Li and Yejing Cai
Information 2025, 16(3), 229; https://doi.org/10.3390/info16030229 - 15 Mar 2025
Abstract: In the aluminum electrolysis production workshop, heavy-load overhead cranes equipped with multi-functional operation terminals are responsible for critical tasks such as anode replacement, shell breaking, slag removal, and material feeding. The real-time monitoring of these four types of operation terminals is of the utmost importance for ensuring production safety. High-resolution cameras are used to capture dynamic scenes of operation. However, the terminals undergo morphological changes and rotations in three-dimensional space according to task requirements during operations, lacking rotational invariance. This complicates the detection and recognition of multi-form targets in 3D environments. Additionally, operations like striking and material feeding generate significant dust, often visually obscuring the terminal targets. Real-time multi-form object detection in high-resolution images affected by smoke and dust therefore demands both detection and dehazing algorithms. To address these issues, we propose the YOLOv8n-Al-Dehazing method, which achieves the precise detection of multi-functional material handling terminals in aluminum electrolysis workshops. To reduce the heavy computational cost of processing high-resolution images with YOLOv8n, our method refines YOLOv8n through component substitution and integrates real-time dehazing preprocessing for high-resolution images, thereby reducing the image processing time. We collected on-site data to construct a dataset for experimental validation. Compared with the YOLOv8n method, our approach increases inference speed by 15.54%, achieving 120.4 frames per second, which meets the requirements for real-time detection on site. Furthermore, compared with state-of-the-art detection methods and variants of YOLO, YOLOv8n-Al-Dehazing demonstrates superior performance, attaining an accuracy rate of 91.0%.
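The abstract does not specify which dehazing algorithm is used for the preprocessing step, so the sketch below shows a generic dark-channel-prior dehazer (in the style of He et al., using SciPy's minimum filter) purely as an illustration of what such a preprocessing stage does; it is not the authors' method, and all parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dehaze_dark_channel(img, patch=15, omega=0.95, t_min=0.1):
    """img: float RGB image in [0, 1], shape (H, W, 3). Returns a dehazed image."""
    dark = minimum_filter(img.min(axis=2), size=patch)        # dark channel
    # Atmospheric light: mean colour of the brightest 0.1% of dark-channel pixels.
    n = max(1, int(dark.size * 0.001))
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimate and scene radiance recovery.
    t = 1.0 - omega * minimum_filter((img / A).min(axis=2), size=patch)
    t = np.clip(t, t_min, 1.0)
    return np.clip((img - A) / t[..., None] + A, 0.0, 1.0)
```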
