AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching
Abstract
1. Introduction
- We propose a novel Adaptive Geometry-Aware Multi-Head Self-Attention (AG-MSA) mechanism for unsupervised stereo matching. AG-MSA introduces an AGRP attention-bias modulation framework that encodes epipolar geometry priors through a hybrid design of geometric functions for adaptive modulation and penalty. Replacing conventional isotropic or purely additive biases in this way strengthens geometric consistency and global inference.
- We design a Spatial Group-Rational KAN (SGR-KAN) for unsupervised stereo matching. By coupling the strong nonlinear expressive power of rational functions with the spatial perception of depthwise convolution, flexible, learnable Group-Rational KAN layers are applied directly to 2D feature maps in place of MLPs. This enables channel-wise and spatially grouped modeling, explicitly preserving spatial structure and enhancing nonlinear expressiveness in complex regions.
- We propose a Dynamic Candidate Gated Fusion (DCGF) module for global and local unsupervised stereo matching. The module constructs a novel dynamic dual-candidate state mechanism together with a coordinate-attention mechanism enhanced with spatial information, and adaptively arbitrates among fusion strategies according to feature content, yielding a more effective and complementary fusion of information from the CNN and Stereo-KANformer backbones.
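To make the AG-MSA idea concrete, the following is a minimal numpy sketch of a geometry-aware attention bias for rectified stereo: a Gaussian factor modulates the attention logits toward token pairs on nearby epipolar lines (image rows), and a linear term penalizes large vertical offsets, mirroring the hybrid "modulation + penalty" design. The function names, the specific Gaussian/linear forms, and the hyper-parameters `sigma` and `lam` are illustrative assumptions, not the paper's actual AGRP formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agrp_attention(q, k, v, rows, sigma=1.0, lam=0.1):
    """Toy single-head geometry-aware attention over a set of tokens.

    q, k, v : (N, d) token features; rows : (N,) pixel row of each token.
    A Gaussian factor multiplies (modulates) the attention logits for
    token pairs on nearby rows, and a linear penalty is subtracted for
    pairs far apart vertically. sigma and lam are illustrative
    hyper-parameters standing in for learnable per-head scales.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)               # (N, N) scaled dot-product
    dv = np.abs(rows[:, None] - rows[None, :])  # vertical offset |delta row|
    modulation = np.exp(-(dv ** 2) / (2 * sigma ** 2))  # soft epipolar prior
    logits = logits * modulation - lam * dv     # modulate, then penalize
    return softmax(logits, axis=-1) @ v
```

In a rectified stereo pair, true matches lie on the same row, so this bias steers attention mass toward geometrically plausible candidates while still allowing soft cross-row attention.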
2. Related Work
2.1. CNN-Based Unsupervised Stereo Matching
2.2. Transformer-Based Stereo Matching
2.3. Epipolar Geometry in Stereo and Multi-View Methods
2.4. Kolmogorov–Arnold Networks
3. Materials and Methods
3.1. Overview
3.2. Stereo-KANformer
3.2.1. Stereo-KANformer Block (SKB)
3.2.2. Adaptive Geometry-Aware Multi-Head Self-Attention
3.2.3. Spatial Group-Rational KAN
- (1) SGR-KAN Core Activation Unit
- (2) Full Architecture of SGR-KAN
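As a rough sketch of the grouped rational activation at the heart of SGR-KAN: a safe Padé-style rational P(x)/(1 + |Q(x)|), in the spirit of Padé activation units [32], is applied per channel group directly on a 2D feature map. The group count, the coefficient shapes, and the omission of the depthwise-convolution branch are simplifying assumptions; this is not the paper's exact formulation.

```python
import numpy as np

def safe_rational(x, p, q):
    """Safe Padé-style rational activation P(x) / (1 + |Q(x)|)."""
    num = np.polyval(p, x)                # numerator polynomial P
    den = 1.0 + np.abs(np.polyval(q, x))  # denominator kept >= 1 (no poles)
    return num / den

def group_rational_act(feat, p_coeffs, q_coeffs):
    """Apply a per-group rational activation to a (C, H, W) feature map.

    Channels are split into len(p_coeffs) equal groups; each group g
    shares one learnable rational (p_coeffs[g], q_coeffs[g]).  Acting
    on the full 2D map (rather than flattened tokens) is what lets the
    activation preserve spatial structure.
    """
    C = feat.shape[0]
    G = len(p_coeffs)
    assert C % G == 0, "channel count must divide evenly into groups"
    out = np.empty_like(feat)
    step = C // G
    for g in range(G):
        sl = slice(g * step, (g + 1) * step)
        out[sl] = safe_rational(feat[sl], p_coeffs[g], q_coeffs[g])
    return out
```

Grouping trades a little flexibility (one rational per group instead of per channel) for far fewer parameters, which is the usual motivation for group-rational KAN layers.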
3.3. Dynamic Candidate Gated Fusion
- (1) Attention Enhancement and Gating Signal Generation
- (2) Dynamic Parallel Dual Candidate State Construction
- (3) Adaptive Fusion and Update
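The three DCGF steps can be caricatured in a few lines of numpy: build two candidate fused states, derive a content-dependent gate, and blend. The additive/multiplicative candidates, the plain channel mean standing in for coordinate attention, and the scalar `w_gate` are all illustrative assumptions rather than the module's learned operators.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dcgf_fuse(f_cnn, f_kan, w_gate):
    """Toy dynamic candidate gated fusion of two (C, H, W) feature maps.

    Two candidate fused states are built (a complementary sum and a
    correlation-like product, stand-ins for the paper's learned
    candidates), and a per-position gate derived from the feature
    content arbitrates between them.
    """
    cand_a = f_cnn + f_kan              # candidate 1: complementary fusion
    cand_b = f_cnn * f_kan              # candidate 2: agreement-weighted fusion
    # Content summary over both branches; a (H, W) map in place of
    # the coordinate-attention-enhanced gating signal.
    ctx = np.concatenate([f_cnn, f_kan], axis=0).mean(axis=0)
    g = sigmoid(w_gate * ctx)           # per-position gate in (0, 1)
    return g * cand_a + (1.0 - g) * cand_b
```

The point of the dual-candidate design is that neither fixed fusion rule wins everywhere; the gate lets the network pick the sum where the branches are complementary and the product where their agreement matters.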
3.4. Adaptive Geometry-Aware Stereo-KANformer Network
4. Experiment and Analysis
4.1. Experimental Data and Environment
4.2. Ablation Study of AGSK-Net
4.2.1. Effect of the Swin Transformer
4.2.2. Effect of the Stereo-KANformer
4.2.3. Effect of DCGF
4.2.4. Detailed Analysis of AGRP and Network Architecture
4.3. Benchmark Evaluation
4.3.1. Quantitative Evaluation
Models | KITTI 2015 D1-bg (All/%) | D1-fg (All/%) | D1-All (All/%) | D1-bg (Noc./%) | D1-fg (Noc./%) | D1-All (Noc./%) | Scene Flow EPE/Pixel
---|---|---|---|---|---|---|---
3DG-DVO [38] | 14.12 | 18.68 | 14.88 | 13.54 | 17.27 | 14.16 | -
Zhou [10] | - | - | 10.23 | - | - | 9.91 | -
OASM-Net [39] | 6.89 | 19.42 | 8.98 | 5.44 | 17.30 | 7.39 | 3.86
SegStereo [40] | - | - | 8.79 | - | - | 7.70 | -
Self-SuperFlow [41] | 5.78 | 19.76 | 8.11 | 4.69 | 18.29 | 6.93 | -
AAFS [42] | 6.27 | 13.95 | 7.54 | 5.96 | 13.01 | 7.12 | 2.88
PASMNet [12] | 5.41 | 16.36 | 7.23 | 5.02 | 15.16 | 6.69 | 3.54
Permutation Stereo [43] | 5.53 | 15.47 | 7.18 | 5.18 | 14.51 | 6.72 | -
SPSMnet [44] | 5.42 | 12.84 | 6.65 | 4.94 | 12.01 | 6.10 | -
UHP [45] | 5.00 | 13.70 | 6.45 | 4.65 | 12.37 | 5.93 | -
CRD-Fusion [46] | 4.59 | 13.68 | 6.11 | 4.30 | 12.73 | 5.69 | -
Ours | 4.44 | 12.40 | 5.69 | 4.36 | 10.36 | 5.68 | 2.64
Models | >2-Pixel Noc./% | >2-Pixel All/% | >3-Pixel Noc./% | >3-Pixel All/% | >5-Pixel Noc./% | >5-Pixel All/%
---|---|---|---|---|---|---
Zhou [10] | - | 14.32 | - | 9.86 | - | 7.88
SegStereo [40] | - | - | 7.89 | 9.64 | - | -
OASM-Net [39] | 9.01 | 11.17 | 6.39 | 8.60 | 4.32 | 6.50
Permutation Stereo [43] | 11.89 | 13.16 | 7.39 | 8.48 | 4.32 | 5.11
PASMNet [12] | 8.77 | 10.58 | 5.91 | 6.98 | 3.86 | 4.67
UHP [45] | 9.08 | 10.37 | 6.05 | 7.09 | 3.69 | 4.43
AAFS [42] | 10.64 | 11.69 | 6.10 | 6.94 | 3.28 | 3.81
Ours | 7.24 | 8.85 | 4.84 | 5.74 | 3.08 | 3.71
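For reference, the two metric families reported above can be computed as follows. This is a generic sketch, not the paper's evaluation code; note that the official KITTI D1 metric additionally counts a pixel as erroneous only if its error also exceeds 5% of the true disparity.

```python
import numpy as np

def stereo_metrics(pred, gt, valid=None, thresh=3.0):
    """End-point error (EPE) and >k-pixel error rate for disparity maps.

    EPE is the mean absolute disparity error over valid pixels; the
    >k-pixel rate is the percentage of valid pixels whose absolute
    error exceeds `thresh` pixels.
    """
    if valid is None:
        valid = np.isfinite(gt)        # default: all finite GT pixels
    err = np.abs(pred - gt)[valid]
    epe = err.mean()
    bad = (err > thresh).mean() * 100.0  # percentage of bad pixels
    return epe, bad
```

"Noc." restricts `valid` to non-occluded pixels, while "All" evaluates every pixel with ground truth; this mask choice accounts for the gap between the two columns.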
4.3.2. Qualitative Evaluation
4.4. Generalization Performance
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Yang, M.; Wu, F.; Li, W. RLStereo: Real-time stereo matching based on reinforcement learning. IEEE Trans. Image Process. 2021, 30, 9442–9455.
- Liu, J.; Ji, P.; Bansal, N.; Cai, C.; Yan, Q.; Huang, X.; Xu, Y. PlaneMVS: 3D plane reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8665–8675.
- Liu, H.; Jia, J.; Gong, N.Z. PointGuard: Provably robust 3D point cloud classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6186–6195.
- Wang, J.; Sun, H.; Jia, P. Adaptive Kernel Convolutional Stereo Matching Recurrent Network. Sensors 2024, 24, 7386.
- Chang, J.-R.; Chen, Y.-S. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5410–5418.
- Jiang, Y.; Dong, Z.; Mai, S. Robust Cost Volume Generation Method for Dense Stereo Matching in Endoscopic Scenarios. Sensors 2023, 23, 3427.
- Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1851–1858.
- Garg, R.; Bg, V.K.; Carneiro, G.; Reid, I. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; pp. 740–756.
- Chen, X.; Xiong, Z.; Cheng, Z.; Peng, J.; Zhang, Y.; Zha, Z.J. Degradation-agnostic correspondence from resolution-asymmetric stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12962–12971.
- Zhou, C.; Zhang, H.; Shen, X.; Jia, J. Unsupervised learning of stereo matching. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1567–1575.
- Wang, C.; Bai, X.; Wang, X.; Liu, X.; Zhou, J.; Wu, X.; Li, H.; Tao, D. Self-supervised multiscale adversarial regression network for stereo disparity estimation. IEEE Trans. Cybern. 2020, 51, 4770–4783.
- Wang, L.; Guo, Y.; Wang, Y.; Liang, Z.; Lin, Z.; Yang, J.; An, W. Parallax attention for unsupervised stereo correspondence learning. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2108–2125.
- Tosi, F.; Bartolomei, L.; Poggi, M. A survey on deep stereo matching in the twenties. Int. J. Comput. Vis. 2025, 133, 4245–4276.
- Hamid, M.S.; Manap, N.A.; Hamzah, R.A.; Kadmin, A.F. Stereo matching algorithm based on deep learning: A survey. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1663–1673.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Li, Z.; Liu, X.; Drenkow, N.; Ding, A.; Creighton, F.X.; Taylor, R.H.; Unberath, M. Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6197–6206.
- Jia, D.; Cai, P.; Wang, Q.; Yang, N. A transformer-based architecture for high-resolution stereo matching. IEEE Trans. Comput. Imaging 2024, 10, 83–92.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Su, X.; Liu, S.; Li, R.; Bing, Z.; Knoll, A. Efficient Stereo Matching Using Swin Transformer and Multilevel Feature Consistency in Autonomous Mobile Systems. IEEE Trans. Ind. Inform. 2024, 20, 7957–7965.
- Huang, B.; Zheng, J.Q.; Giannarou, S.; Elson, D.S. H-Net: Unsupervised attention-based stereo depth estimation leveraging epipolar geometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4460–4467.
- Chang, J.; He, J.; Zhang, T.; Yu, J.; Wu, F. EI-MVSNet: Epipolar-guided multi-view stereo network with interval-aware label. IEEE Trans. Image Process. 2024, 33, 753–766.
- Liu, Y.; Cai, Q.; Wang, C.; Yang, J.; Fan, H.; Dong, J.; Chen, S. Geometry-enhanced attentive multi-view stereo for challenging matching scenarios. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7401–7416.
- Wang, X.; Zhu, Z.; Huang, G.; Qin, F.; Ye, Y.; He, Y.; Chi, X.; Wang, X. MVSTER: Epipolar transformer for efficient multi-view stereo. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 573–591.
- Taud, H.; Mas, J.F. Multilayer perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer International Publishing: Cham, Switzerland, 2017; pp. 451–455.
- Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov–Arnold networks. arXiv 2024, arXiv:2404.19756.
- Hu, Y.; Liang, Z.; Yang, F.; Hou, Q.; Liu, X.; Cheng, M.M. KAC: Kolmogorov–Arnold classifier for continual learning. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 15297–15307.
- Li, C.; Liu, X.; Li, W.; Wang, C.; Liu, H.; Liu, Y.; Chen, Z.; Yuan, Y. U-KAN makes strong backbone for medical image segmentation and generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 4652–4660.
- Yang, X.; Wang, X. Kolmogorov–Arnold Transformer. arXiv 2024, arXiv:2409.10594.
- Zhang, Y. KM-UNet: KAN Mamba UNet for medical image segmentation. arXiv 2025, arXiv:2501.02559.
- Zhang, B.; Huang, H.; Shen, Y.; Sun, M. MM-UKAN++: A Novel Kolmogorov–Arnold Network Based U-shaped Network for Ultrasound Image Segmentation. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2025.
- Kolmogorov, A.N. On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk USSR 1957, 114, 953–956.
- Molina, A.; Schramowski, P.; Kersting, K. Padé activation units: End-to-end learning of flexible activation functions in deep networks. arXiv 2019, arXiv:1907.06732.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Zhou, S.; Yuan, G.; Hua, Z.; Li, J. DGFEG: Dynamic gate fusion and edge graph perception network for remote sensing change detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 3581–3598.
- Wang, X.; Yu, J.; Sun, Z.; Sun, J.; Su, Y. Multi-scale graph neural network for global stereo matching. Signal Process. Image Commun. 2023, 118, 117026.
- Li, M.; Ye, P.; Cui, S.; Zhu, P.; Liu, J. HKAN: A Hybrid Kolmogorov–Arnold Network for Robust Fabric Defect Segmentation. Sensors 2024, 24, 8181.
- Brateanu, A.; Balmez, R.; Orhei, C.; Ancuti, C.; Ancuti, C. Enhancing low-light images with Kolmogorov–Arnold networks in transformer attention. Sensors 2025, 25, 327.
- Zach, J.; Stelldinger, P. Self-Supervised Deep Visual Stereo Odometry with 3D-Geometric Constraints. In Proceedings of the 18th ACM International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA, 25–27 June 2025; pp. 336–342.
- Li, A.; Yuan, Z. Occlusion aware stereo matching via cooperative unsupervised learning. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; pp. 197–213.
- Yang, G.; Zhao, H.; Shi, J.; Deng, Z.; Jia, J. SegStereo: Exploiting semantic information for disparity estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 636–651.
- Bendig, K.; Schuster, R.; Stricker, D. Self-SuperFlow: Self-supervised scene flow prediction in stereo sequences. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 481–485.
- Chang, J.R.; Chang, P.C.; Chen, Y.S. Attention-aware feature aggregation for real-time stereo matching on edge devices. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
- Brousseau, P.A.; Roy, S. A permutation model for the self-supervised stereo matching problem. In Proceedings of the 2022 19th Conference on Robots and Vision (CRV), Toronto, ON, Canada, 31 May–2 June 2022; pp. 122–131.
- Yang, X.; Lai, H.; Zou, B.; Fu, H.; Long, Q. Self-supervised Learning of PSMNet via Generative Adversarial Networks. In Proceedings of the International Conference on Intelligent Computing, Singapore, 25–27 September 2024; pp. 469–479.
- Yang, R.; Li, X.; Cong, R.; Du, J. Unsupervised hierarchical iterative tile refinement network with 3D planar segmentation loss. IEEE Robot. Autom. Lett. 2024, 9, 2678–2685.
- Fan, X.; Jeon, S.; Fidan, B. Occlusion-aware self-supervised stereo matching with confidence guided raw disparity fusion. In Proceedings of the 2022 19th Conference on Robots and Vision (CRV), Toronto, ON, Canada, 31 May–2 June 2022; pp. 132–139.
ID | Swin Transformer | AG-MSA | SGR-KAN | DCGF | Scene Flow EPE/Pixel | KITTI 2015 >3-Pixel/%
---|---|---|---|---|---|---
ID1 | | | | | 5.01 | 7.438
ID2 | ✓ | | | | 4.62 | 7.030
ID3 | | ✓ | | | 4.28 | 6.715
ID4 | | ✓ | ✓ | | 3.46 | 6.104
ID5 | | ✓ | ✓ | ✓ | 2.64 | 5.695
ID | Ablation Component | Setting | Scene Flow EPE/Pixel | KITTI 2015 >3-Pixel/%
---|---|---|---|---
ID2 | Baseline | Swin Transformer | 4.62 | 7.030
ID3a | AGRP Components | M-only | 4.45 | 6.881
ID3b | AGRP Components | P-only | 4.36 | 6.802
ID3 | AGRP Components | Modulation + Penalty | 4.28 | 6.715
ID4a | SKB Architecture | Single-Scale (1/16) | 2.98 | 6.313
ID4b | SKB Architecture | Multi-Scale (1/16, 1/8) | 2.90 | 6.265
ID4c | SKB Architecture | Multi-Scale (1/16, 1/8, 1/4) | 2.82 | 6.104
ID4d | SKB Depth | Shallow (1,1,1) | 2.82 | 6.104
ID4e | SKB Depth | Medium (1,1,2) | 2.64 | 5.695
ID4f | SKB Depth | Deep (1,2,2) | 2.63 | 5.693
Models | Middlebury 2021 >3-Pixel/% | Middlebury 2021 EPE/Pixel
---|---|---
PASMNet [12] | 25.568 | 9.854
Ours | 18.089 | 6.720
Share and Cite
Feng, Q.; Wang, X.; Lu, Z.; Wang, H.; Qi, T.; Zhang, T. AGSK-Net: Adaptive Geometry-Aware Stereo-KANformer Network for Global and Local Unsupervised Stereo Matching. Sensors 2025, 25, 5905. https://doi.org/10.3390/s25185905