Search Results (74)

Search Parameters:
Keywords = deep stereo matching

22 pages, 5746 KB  
Article
AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching
by Qianglong Feng, Xiaofeng Wang, Zhenglin Lu, Haiyu Wang, Tingfeng Qi and Tianyi Zhang
Sensors 2025, 25(18), 5905; https://doi.org/10.3390/s25185905 - 21 Sep 2025
Viewed by 269
Abstract
The performance of unsupervised stereo matching in complex regions such as weak textures and occlusions is constrained by the inherently local receptive fields of convolutional neural networks (CNNs), the absence of geometric priors, and the limited expressiveness of MLPs in conventional ViTs. To address these problems, we propose an Adaptive Geometry-aware Stereo-KANformer Network (AGSK-Net) for unsupervised stereo matching. Firstly, to resolve the conflict between the isotropic nature of traditional ViT and the epipolar geometry priors in stereo matching, we propose Adaptive Geometry-aware Multi-head Self-Attention (AG-MSA), which embeds epipolar priors via an adaptive hybrid structure of geometric modulation and penalty, enabling geometry-aware global context modeling. Secondly, we design Spatial Group-Rational KAN (SGR-KAN), which integrates the nonlinear capability of rational functions with the spatial awareness of deep convolutions, replacing the MLP with flexible, learnable rational functions to enhance the nonlinear expression ability of complex regions. Finally, we propose a Dynamic Candidate Gated Fusion (DCGF) module that employs dynamic dual-candidate states and spatially aware pre-enhancement to adaptively fuse global and local features across scales. Experiments demonstrate that AGSK-Net achieves state-of-the-art accuracy and generalizability on Scene Flow, KITTI 2012/2015, and Middlebury 2021.
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
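As a toy illustration of the epipolar prior behind AG-MSA: in rectified stereo, correspondences lie on the same scan line, so attention between tokens can be biased against large vertical offsets. This sketch is a simplified stand-in for the paper's adaptive modulation-and-penalty structure; the function name, the linear penalty, and `alpha` are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def epipolar_biased_attention(q, k, rows, alpha=0.5):
    """Scaled dot-product attention whose logits are penalized by the
    vertical (cross-epipolar) row offset between tokens. Toy stand-in
    for a geometry-aware attention bias; `alpha` is illustrative."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                     # standard attention logits
    # Epipolar prior: tokens on distant scan lines are down-weighted.
    logits = logits - alpha * np.abs(rows[:, None] - rows[None, :])
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)          # row-normalized weights
```

With identical token features, a query attends more to tokens on its own scan line than to tokens several rows away, which is exactly the bias a rectified stereo pair justifies.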

23 pages, 25528 KB  
Article
UGC-Net: Uncertainty-Guided Cost Volume Optimization with Contextual Features for Satellite Stereo Matching
by Wonje Jeong and Soon-Yong Park
Remote Sens. 2025, 17(10), 1772; https://doi.org/10.3390/rs17101772 - 19 May 2025
Viewed by 716
Abstract
Disparity estimation in satellite stereo images is a highly challenging task due to complex terrain, occlusions caused by tall buildings and structures, and texture-less regions such as roads, rivers, and building roofs. Recent deep learning-based satellite stereo disparity estimation methods have adopted cascade multi-scale feature extraction techniques to address these challenges. However, recent learning-based methods still struggle to estimate disparity effectively in high-ambiguity regions. This paper proposes a disparity estimation and refinement method that leverages variance uncertainty in the cost volume to overcome these limitations. The proposed method calculates variance uncertainty from the cost volume and generates uncertainty weights to adjust the cost volume based on this information. These weights are designed to emphasize geometric features in regions with low uncertainty while enhancing contextual features in regions with high uncertainty, such as occluded or texture-less areas. Furthermore, the proposed method introduces a pseudo volume, referred to as the 4D context volume, which extends the reference image’s features during the stereo-matching aggregation step. By integrating the 4D context volume into the aggregation layer of the geometric cost volume, our method effectively addresses challenges in disparity estimation, particularly in occluded and texture-less areas. For evaluation, we use the Urban Semantic 3D dataset and the WHU-Stereo dataset. The results show that the proposed method achieves state-of-the-art performance, improving disparity accuracy in challenging regions.
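The variance-uncertainty idea can be sketched in a few lines: treat the negated matching cost as an unnormalized log-probability over disparities and read off the per-pixel variance as ambiguity. This is a minimal sketch of the concept only; the softmax conversion and names are assumptions, not UGC-Net's implementation.

```python
import numpy as np

def disparity_uncertainty(cost_volume):
    """Mean disparity and variance-style uncertainty from a (D, H, W)
    matching-cost volume. High variance flags ambiguous pixels
    (occlusions, texture-less areas) where contextual features should
    dominate; low variance flags reliable geometric matches."""
    D = cost_volume.shape[0]
    p = np.exp(-cost_volume)                 # lower cost -> higher probability
    p /= p.sum(axis=0, keepdims=True)        # per-pixel distribution over disparities
    disp = np.arange(D).reshape(D, 1, 1)
    mean = (p * disp).sum(axis=0)
    var = (p * (disp - mean) ** 2).sum(axis=0)
    return mean, var
```

A sharply peaked cost curve yields near-zero variance, while a flat (ambiguous) curve yields large variance, which is the signal the abstract turns into uncertainty weights.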

19 pages, 43835 KB  
Article
A Stereo Disparity Map Refinement Method Without Training Based on Monocular Segmentation and Surface Normal
by Haoxuan Sun and Taoyang Wang
Remote Sens. 2025, 17(9), 1587; https://doi.org/10.3390/rs17091587 - 30 Apr 2025
Viewed by 1012
Abstract
Stereo disparity estimation is an essential component in computer vision and photogrammetry with many applications. However, the domain lacks large real-world datasets and large-scale models. Inspired by recent advances in foundation models for image segmentation, we explore RANSAC disparity refinement based on zero-shot monocular surface normal prediction and SAM segmentation masks, combining stereo matching models with advanced monocular large-scale vision models. The disparity refinement problem is formulated as follows: extracting geometric structures based on SAM masks and surface normal predictions, building disparity map hypotheses for the geometric structures, and selecting among the hypotheses with a weighted RANSAC method. We posit that once a geometric structure is extracted, even a partially correct disparity within it suffices to reconstruct the entire structure from the geometric prior. Our method can refine the results of traditional models such as SGM as well as deep learning models such as MC-CNN. Without training, the model obtains a 15.48% D1-error on the US3D dataset, and 6.09% bad 2.0 error and 3.65% bad 4.0 error on the Middlebury dataset. This research helps promote scene and geometric structure understanding in stereo disparity estimation and the combination of advanced large-scale monocular vision models with stereo matching methods.
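The per-segment hypothesis step can be approximated with a plain RANSAC plane fit over a segment's disparities. This is a minimal sketch assuming a planar model d = a·x + b·y + c inside one SAM-style segment; the paper's weighted selection and normal-guided hypotheses are omitted.

```python
import numpy as np

def ransac_plane_disparity(xs, ys, ds, iters=200, thresh=1.0, rng=None):
    """Fit d = a*x + b*y + c to a segment's disparities by RANSAC and
    return the best plane plus its inlier mask. Even if only part of the
    segment carries correct disparities, the inlier plane can rebuild
    the whole structure -- the intuition the abstract describes."""
    rng = np.random.default_rng(rng)
    n = len(ds)
    best_inliers, best_plane = None, None
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)          # minimal sample: 3 points
        A = np.c_[xs[idx], ys[idx], np.ones(3)]
        try:
            plane = np.linalg.solve(A, ds[idx])
        except np.linalg.LinAlgError:
            continue                                    # degenerate (collinear) sample
        resid = np.abs(np.c_[xs, ys, np.ones(n)] @ plane - ds)
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, plane
    return best_plane, best_inliers
```

Replacing the segment's disparities with the plane evaluated at every pixel then yields the refined, hole-free structure.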

26 pages, 9869 KB  
Article
CAGFNet: A Cross-Attention Image-Guided Fusion Network for Disparity Estimation of High-Resolution Satellite Stereo Images
by Qian Zhang, Jia Ge, Shufang Tian and Laidian Xi
Remote Sens. 2025, 17(9), 1572; https://doi.org/10.3390/rs17091572 - 28 Apr 2025
Viewed by 834
Abstract
Disparity estimation in high-resolution satellite stereo images is a critical task in remote sensing and photogrammetry. However, significant challenges arise due to the complexity of satellite stereo image scenes and the dynamic variations in disparities. Stereo matching becomes particularly difficult in areas with textureless regions, repetitive patterns, disparity discontinuities, and occlusions. Recent advancements in deep learning have opened new research avenues for disparity estimation. This paper presents a novel end-to-end disparity estimation network designed to address these challenges through three key innovations: (1) a cross-attention mechanism for robust feature extraction, (2) an image-guided module that preserves geometric details, and (3) a 3D feature fusion module for context-aware disparity refinement. Experiments on the US3D dataset demonstrate state-of-the-art performance, achieving an endpoint error (EPE) of 1.466 pixels (14.71% D1-error) on the Jacksonville subset and 0.996 pixels (10.53% D1-error) on the Omaha subset. The experimental results confirm that the proposed network excels in disparity estimation, exhibiting strong learning capability and robust generalization performance.

25 pages, 6410 KB  
Article
Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume
by Zongcheng Zuo, Yuanxiang Li, Yu Zhou and Fan Mo
Sensors 2025, 25(7), 2233; https://doi.org/10.3390/s25072233 - 2 Apr 2025
Viewed by 1564
Abstract
Feature matching is pivotal when using multi-view stereo (MVS) to reconstruct dense 3D models from calibrated images. This paper proposes PAC-MVSNet, which integrates perspective-aware convolution (PAC) and metadata-enhanced cost volumes to address the challenges in reflective and texture-less regions. PAC dynamically aligns convolutional kernels with scene perspective lines, while the use of metadata (e.g., camera pose distance) enables geometric reasoning during cost aggregation. In PAC-MVSNet, we introduce feature matching with long-range tracking that utilizes both internal and external focuses to integrate extensive contextual data within individual images as well as across multiple images. To enhance the performance of the feature matching with long-range tracking, we also propose a perspective-aware convolution module that directs the convolutional kernel to capture features along the perspective lines. This enables the module to extract perspective-aware features from images, improving the feature matching. Finally, we crafted a specific 2D CNN that fuses image priors, thereby integrating keyframes and geometric metadata within the cost volume to evaluate depth planes. Our method represents the first attempt to embed the existing physical model knowledge into a network for completing MVS tasks, which achieved optimal performance using multiple benchmark datasets.

21 pages, 12241 KB  
Article
A Social Assistance System for Augmented Reality Technology to Redound Face Blindness with 3D Face Recognition
by Wen-Hau Jain, Bing-Gang Jhong and Mei-Yung Chen
Electronics 2025, 14(7), 1244; https://doi.org/10.3390/electronics14071244 - 21 Mar 2025
Cited by 1 | Viewed by 1042
Abstract
The objective of this study is to develop an Augmented Reality (AR) visual aid system to help patients with prosopagnosia recognize faces in social situations and everyday life. The primary contribution of this study is the use of 3D face models as the basis of data augmentation for facial recognition, which has practical applications for various social situations that patients with prosopagnosia find themselves in. The study comprises the following components: First, the affordances of Active Stereoscopy and stereo cameras were combined. Second, deep learning was employed to reconstruct a detailed 3D face model in real-time based on data from the 3D point cloud and the 2D image. Data were also retrieved from seven angles of the subject’s face to improve the accuracy of face recognition from the subject’s profile and in a range of dynamic interactions. Third, the data derived from the previous steps were entered into a convolutional neural network (CNN), which then generated a 128-dimensional characteristic vector. Next, the system deployed Structured Query Language (SQL) to compute and compare Euclidean distances to determine the smallest Euclidean distance and match it to the name that corresponded to the face; tagged face data were projected by the camera onto the AR lenses. The findings of this study show that our AR system has a robustness of more than 99% in terms of face recognition. This method offers a higher practical value than traditional 2D face recognition methods when it comes to large-pose 3D face recognition in day-to-day life.
(This article belongs to the Special Issue Real-Time Computer Vision)
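The matching step described above, finding the smallest Euclidean distance over 128-dimensional embeddings, reduces to a nearest-neighbor lookup. The SQL storage layer is omitted here; the gallery is a plain dictionary and the names are illustrative.

```python
import numpy as np

def match_face(query, gallery):
    """Return the enrolled name whose 128-D embedding lies closest
    (in Euclidean distance) to the query embedding, plus that distance.
    `gallery` maps name -> embedding vector."""
    names = list(gallery)
    vecs = np.stack([gallery[n] for n in names])
    dists = np.linalg.norm(vecs - query, axis=1)   # distance to every enrolled face
    i = int(np.argmin(dists))
    return names[i], float(dists[i])
```

A production system would additionally reject matches whose distance exceeds a threshold, so unknown faces are not mislabeled as the nearest enrolled person.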

24 pages, 6629 KB  
Article
UnDER: Unsupervised Dense Point Cloud Extraction Routine for UAV Imagery Using Deep Learning
by John Ray Bergado and Francesco Nex
Remote Sens. 2025, 17(1), 24; https://doi.org/10.3390/rs17010024 - 25 Dec 2024
Viewed by 1085
Abstract
Extraction of dense 3D geographic information from ultra-high-resolution unmanned aerial vehicle (UAV) imagery unlocks a great number of mapping and monitoring applications. This is facilitated by a step called dense image matching, which tries to find pixels corresponding to the same object within overlapping images captured by the UAV from different locations. Recent developments in deep learning utilize deep convolutional networks to perform this dense pixel correspondence task. A common theme in these developments is to train the network in a supervised setting using available dense 3D reference datasets. However, in this work we propose a novel unsupervised dense point cloud extraction routine for UAV imagery, called UnDER. We propose a novel disparity-shifting procedure to enable the use of a stereo matching network pretrained on an entirely different typology of image data in the disparity-estimation step of UnDER. Unlike previously proposed disparity-shifting techniques for forming cost volumes, the goal of our procedure was to address the domain shift between the images that the network was pretrained on and the UAV images, by using prior information from the UAV image acquisition. We also developed a procedure for occlusion masking based on disparity consistency checking that uses the disparity image space rather than the object space proposed in a standard 3D reconstruction routine for UAV data. Our benchmarking results demonstrated significant improvements in quantitative performance, reducing the mean cloud-to-cloud distance by approximately 1.8 times the ground sampling distance (GSD) compared to other methods.
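The occlusion-masking idea, consistency checking in disparity image space, can be sketched as a left-right check: keep a left-image pixel only if the right-image disparity at its matched column agrees. This is a simplified sketch with an assumed tolerance, not UnDER's exact procedure.

```python
import numpy as np

def occlusion_mask(disp_left, disp_right, tol=1.0):
    """Left-right disparity consistency check. A left pixel at column x
    with disparity d maps to right column x - d; the pixel is kept only
    if the right image's disparity there agrees within `tol` pixels.
    Disagreement typically indicates occlusion or a bad match."""
    H, W = disp_left.shape
    xs = np.arange(W)[None, :].repeat(H, axis=0)
    xr = np.clip(np.round(xs - disp_left).astype(int), 0, W - 1)  # matched right columns
    d_r = np.take_along_axis(disp_right, xr, axis=1)
    return np.abs(disp_left - d_r) <= tol
```

Masked-out pixels are simply dropped before triangulating the point cloud, which is cheaper than the object-space visibility reasoning of a standard reconstruction pipeline.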

23 pages, 31563 KB  
Article
Comparative Analysis of Deep Learning-Based Stereo Matching and Multi-View Stereo for Urban DSM Generation
by Mario Fuentes Reyes, Pablo d’Angelo and Friedrich Fraundorfer
Remote Sens. 2025, 17(1), 1; https://doi.org/10.3390/rs17010001 - 24 Dec 2024
Cited by 3 | Viewed by 2269
Abstract
The creation of digital surface models (DSMs) from aerial and satellite imagery is often the starting point for different remote sensing applications. The two main approaches used for this task are stereo matching and multi-view stereo (MVS). The former needs stereo-rectified pairs as inputs and produces results in the disparity domain. The latter works with images from various perspectives and produces a result in the depth domain. So far, both approaches have proven successful in producing accurate DSMs, especially with deep learning. Nonetheless, an assessment between the two is difficult due to the differences in the input data, the domain in which the directly generated results are provided, and the evaluation metrics. In this manuscript, we processed synthetic and real optical data to be compatible with both the stereo and MVS algorithms. The data were then applied to learning-based algorithms for both analyzed solutions. We focus on an experimental setting that establishes as fair a comparison between the algorithms as possible. In particular, we looked at urban areas with high object densities and sharp boundaries, which pose challenges such as occlusions and depth discontinuities. Results show in general a good performance for all experiments, with specific differences in the reconstructed objects. We describe qualitatively and quantitatively the performance of the compared cases. Moreover, we consider an additional case that fuses the results into a DSM utilizing confidence estimation, showing a further improvement and opening up a possibility for further research.
(This article belongs to the Section Urban Remote Sensing)
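The confidence-based fusion mentioned at the end can be sketched as a per-pixel convex combination of two height maps weighted by their confidences. This is an illustrative reduction of the idea under the assumption of co-registered grids, not the paper's fusion scheme.

```python
import numpy as np

def fuse_dsm(h1, c1, h2, c2):
    """Fuse two co-registered height maps (h1, h2) into one DSM using
    per-pixel confidence maps (c1, c2). Where one source is far more
    confident, its height dominates; equal confidence averages them."""
    w = c1 / np.maximum(c1 + c2, 1e-8)   # guard against zero total confidence
    return w * h1 + (1.0 - w) * h2
```

In practice the confidences would come from a learned confidence estimator per method (stereo vs. MVS), which is what makes the fused DSM outperform either input alone.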

20 pages, 4856 KB  
Article
Enhancing the Ground Truth Disparity by MAP Estimation for Developing a Neural-Net Based Stereoscopic Camera
by Hanbit Gil, Sehyun Ryu and Sungmin Woo
Sensors 2024, 24(23), 7761; https://doi.org/10.3390/s24237761 - 4 Dec 2024
Viewed by 1871
Abstract
This paper presents a novel method to enhance ground truth disparity maps generated by Semi-Global Matching (SGM) using Maximum a Posteriori (MAP) estimation. SGM, while not producing visually appealing outputs like neural networks, offers high disparity accuracy in valid regions and avoids the generalization issues often encountered with neural network-based disparity estimation. However, SGM struggles with occlusions and textureless areas, leading to invalid disparity values. Our approach, though relatively simple, mitigates these issues by interpolating invalid pixels using surrounding disparity information and Bayesian inference, improving both the visual quality of disparity maps and their usability for training neural network-based commercial depth-sensing devices. Experimental results validate that our enhanced disparity maps preserve SGM’s accuracy in valid regions while improving the overall performance of neural networks on both synthetic and real-world datasets. This method provides a robust framework for advanced stereoscopic camera systems, particularly in autonomous applications.
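A crude stand-in for the interpolation idea: fill each invalid pixel from the valid disparities in its neighborhood and iterate until holes close. The paper's Bayesian/MAP weighting is replaced here by a simple neighborhood median, purely for illustration.

```python
import numpy as np

def fill_invalid(disp, valid, iters=50):
    """Iteratively fill invalid disparities from valid 8-neighbourhood
    values (median), propagating inward until every hole is closed or
    `iters` sweeps elapse. Valid pixels are never modified, mirroring
    the goal of preserving SGM accuracy in valid regions."""
    d = disp.astype(float).copy()
    v = valid.copy()
    H, W = d.shape
    for _ in range(iters):
        if v.all():
            break
        nd, nv = d.copy(), v.copy()          # Jacobi-style sweep: read old, write new
        for y in range(H):
            for x in range(W):
                if v[y, x]:
                    continue
                vals = [d[yy, xx]
                        for yy in range(max(0, y - 1), min(H, y + 2))
                        for xx in range(max(0, x - 1), min(W, x + 2))
                        if v[yy, xx]]
                if vals:
                    nd[y, x] = float(np.median(vals))
                    nv[y, x] = True
        d, v = nd, nv
    return d
```

A MAP version would weight neighbors by a prior (e.g., favoring the smoother, background-side disparity near occlusions) rather than taking a flat median.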

16 pages, 4359 KB  
Article
Adaptive Kernel Convolutional Stereo Matching Recurrent Network
by Jiamian Wang, Haijiang Sun and Ping Jia
Sensors 2024, 24(22), 7386; https://doi.org/10.3390/s24227386 - 20 Nov 2024
Cited by 1 | Viewed by 1229
Abstract
For binocular stereo matching, the most advanced methods currently use an iterative structure based on gated recurrent units (GRUs). Methods in this class have shown high performance on both high-resolution images and standard benchmarks. However, simply replacing cost aggregation with a GRU iterative method leaves the original cost volume used for disparity calculation lacking non-local geometric and contextual information. To address this, this paper proposes a new GRU-iteration-based adaptive kernel convolution deep recurrent network architecture for stereo matching. We introduce a kernel convolution-based adaptive multi-scale pyramid pooling (KAP) module that fully considers the spatial correlation between pixels, and add a new matching attention (MAR) module to refine the matching cost volume before inputting it into the iterative network for updates, enhancing the pixel-level representation ability of the image and improving the overall generalization ability of the network. The proposed AKC-Stereo network improves upon its base network: on the Scene Flow dataset, the EPE of AKC-Stereo reaches 0.45, an improvement of 0.02 over the base network, and on the KITTI 2015 dataset, AKC-Stereo outperforms the base network by 5.6% on the D1-all metric.
(This article belongs to the Section Sensor Networks)

15 pages, 8542 KB  
Article
The Adversarial Robust and Generalizable Stereo Matching for Infrared Binocular Based on Deep Learning
by Bowen Liu, Jiawei Ji, Cancan Tao, Jujiu Li and Yingxun Wang
J. Imaging 2024, 10(11), 264; https://doi.org/10.3390/jimaging10110264 - 22 Oct 2024
Viewed by 1442
Abstract
Despite the considerable success of deep learning methods in stereo matching for binocular images, the generalizability and robustness of these algorithms, particularly under challenging conditions such as occlusions or degraded infrared textures, remain uncertain. This paper presents a novel deep-learning-based depth optimization method that obviates the need for large infrared image datasets and adapts seamlessly to any specific infrared camera. Moreover, this adaptability extends to standard binocular images, allowing the method to work effectively on both infrared and visible light stereo images. We further investigate the role of infrared textures in a deep learning framework, demonstrating their continued utility for stereo matching even in complex lighting environments. To compute the matching cost volume, we apply the multi-scale census transform to the input stereo images. A stacked sand leak subnetwork is subsequently employed to address the matching task. Our approach substantially improves adversarial robustness while maintaining accuracy in comparison with state-of-the-art methods, reducing EPE by nearly half in quantitative results on widely used autonomous driving datasets. Furthermore, the proposed method exhibits superior generalization capabilities, transitioning from simulated datasets to real-world datasets without the need for fine-tuning.
(This article belongs to the Special Issue Deep Learning in Computer Vision)
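The census transform used above to build the cost volume is a standard, easily sketched descriptor (single scale here; the paper applies it at multiple scales). Each pixel becomes a bit string encoding which neighbors are darker than the center, and the matching cost between two pixels is the Hamming distance between their bit strings.

```python
import numpy as np

def census_transform(img, win=3):
    """Census transform of a grayscale image: for each pixel, pack one
    bit per neighbour in a `win` x `win` window, set when the neighbour
    is darker than the centre. Robust to monotonic intensity changes,
    which is why it suits low-texture infrared imagery."""
    r = win // 2
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.uint64)
    pad = np.pad(img, r, mode='edge')
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue                                  # skip the centre pixel
            neigh = pad[r + dy:r + dy + H, r + dx:r + dx + W]
            out = (out << np.uint64(1)) | (neigh < img).astype(np.uint64)
    return out
```

The per-disparity matching cost is then `popcount(census_left[y, x] ^ census_right[y, x - d])`, computed for every candidate disparity d to fill the cost volume.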

16 pages, 6078 KB  
Article
Matchability and Uncertainty-Aware Iterative Disparity Refinement for Stereo Matching
by Junwei Wang, Wei Zhou, Yujun Tang and Hanming Guo
Appl. Sci. 2024, 14(18), 8457; https://doi.org/10.3390/app14188457 - 19 Sep 2024
Viewed by 1505
Abstract
After significant progress in stereo matching, the pursuit of robust and efficient disparity refinement methods for ill-posed regions remains challenging. To further improve the performance of disparity refinement, in this paper we propose a matchability and uncertainty-aware iterative disparity refinement neural network. Firstly, a new matchability and uncertainty decoder (MUD) is proposed to decode the matchability mask and disparity uncertainties, which are used to evaluate the reliability of feature matching and estimated disparity, thereby reducing the susceptibility to mismatched pixels. Then, based on the proposed MUD, we present two modules: the uncertainty-preferred disparity field initialization (UFI) and the masked hidden state global aggregation (MGA) modules. In the UFI, a multi-disparity window scan-and-select method is employed to provide a further initialized disparity field and more accurate initial disparity. In the MGA, the adaptive masked disparity field hidden state is globally aggregated to extend the propagation range per iteration, improving the refinement efficiency. Finally, experimental results on public datasets show that the proposed model achieves a reduction of up to 17.9% in disparity average error and 16.9% in occluded outlier proportion, demonstrating its more practical handling of ill-posed regions.
(This article belongs to the Section Computing and Artificial Intelligence)

13 pages, 3182 KB  
Article
Simultaneous Stereo Matching and Confidence Estimation Network
by Tobias Schmähling, Tobias Müller, Jörg Eberhardt and Stefan Elser
J. Imaging 2024, 10(8), 198; https://doi.org/10.3390/jimaging10080198 - 14 Aug 2024
Cited by 1 | Viewed by 1963
Abstract
In this paper, we present a multi-task model that predicts disparities and confidence levels in deep stereo matching simultaneously. We do this by combining a successful model for each separate task into a multi-task model that can be trained with a proposed loss function. We show the advantages of this model compared to training and predicting disparity and confidence sequentially. This method enables an improvement of 15% to 30% in the area under the curve (AUC) metric when trained in parallel rather than sequentially. In addition, the effect of weighting the components in the loss function on the stereo and confidence performance is investigated. By improving the confidence estimate, the practicality of stereo estimators for creating distance images is increased.
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

18 pages, 21505 KB  
Article
Correction Compensation and Adaptive Cost Aggregation for Deep Laparoscopic Stereo Matching
by Jian Zhang, Bo Yang, Xuanchi Zhao and Yi Shi
Appl. Sci. 2024, 14(14), 6176; https://doi.org/10.3390/app14146176 - 16 Jul 2024
Viewed by 1125
Abstract
Perception of digitized depth is a prerequisite for enabling the intelligence of three-dimensional (3D) laparoscopic systems. In this context, stereo matching of laparoscopic stereoscopic images presents a promising solution. However, the current research in this field still faces challenges. First, the acquisition of accurate depth labels in a laparoscopic environment proves to be a difficult task. Second, errors in the correction of laparoscopic images are prevalent. Finally, laparoscopic image registration suffers from ill-posed regions such as specular highlights and textureless areas. In this paper, we make significant contributions by developing (1) a correction compensation module to overcome correction errors; (2) an adaptive cost aggregation module to improve prediction performance in ill-posed regions; (3) a novel self-supervised stereo matching framework based on these two modules. Specifically, our framework rectifies features and images based on learned pixel offsets, and performs differentiated aggregation on cost volumes based on their value. The experimental results demonstrate the effectiveness of the proposed modules. On the SCARED dataset, our model reduces the mean depth error by 12.6% compared to the baseline model and outperforms the state-of-the-art unsupervised methods and well-generalized models.
(This article belongs to the Special Issue Application of Machine Vision and Deep Learning Technology)

15 pages, 8167 KB  
Article
Underwater Unsupervised Stereo Matching Method Based on Semantic Attention
by Qing Li, Hongjian Wang, Yao Xiao, Hualong Yang, Zhikang Chi and Dongchen Dai
J. Mar. Sci. Eng. 2024, 12(7), 1123; https://doi.org/10.3390/jmse12071123 - 4 Jul 2024
Cited by 2 | Viewed by 2229
Abstract
A stereo vision system provides important support for underwater robots to achieve autonomous navigation, obstacle avoidance, and precise operation in complex underwater environments. This article proposes an unsupervised underwater stereo matching method based on semantic attention, addressing the lack of a supervised training dataset for underwater stereo matching. By combining deep learning and semantic information, it tackles the challenge of insufficient training data, enhances the intelligence level of underwater robots, and promotes the progress of underwater scientific research and marine resource development. An adaptive double quadtree semantic attention model for the initial estimation of semantic disparity is designed, and an unsupervised AWLED semantic loss function is proposed, which is more robust to noise and textureless regions. In quantitative and qualitative evaluations on the underwater stereo matching dataset, D1-all decreased by 0.222, EPE decreased by 2.57, the 3 px error decreased by 1.53, and the runtime decreased by 7 ms, achieving state-of-the-art results.
(This article belongs to the Section Ocean Engineering)
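Unsupervised stereo losses of this kind are typically anchored by a photometric term: warp the right image to the left view with the predicted disparity and penalize the difference. The sketch below is a minimal stand-in for the (more elaborate) AWLED loss, with nearest-neighbor warping assumed for brevity.

```python
import numpy as np

def photometric_loss(left, right, disp):
    """Mean absolute photometric error after warping the right image to
    the left view using the predicted disparity (nearest-neighbour
    sampling; bilinear sampling would be used in a trainable version).
    A correct disparity field drives this loss toward zero."""
    H, W = left.shape
    xs = np.arange(W)[None, :].repeat(H, axis=0)
    xr = np.clip(np.round(xs - disp).astype(int), 0, W - 1)  # matched right columns
    warped = np.take_along_axis(right, xr, axis=1)
    return float(np.abs(left - warped).mean())
```

Losses like AWLED additionally down-weight noisy or textureless pixels, since raw photometric error is uninformative there, which is exactly the robustness the abstract claims.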
