Article

Monocular Depth Estimation via Self-Supervised Self-Distillation

1 College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2 College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
3 Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
* Authors to whom correspondence should be addressed.
Sensors 2024, 24(13), 4090; https://doi.org/10.3390/s24134090
Submission received: 16 May 2024 / Revised: 12 June 2024 / Accepted: 21 June 2024 / Published: 24 June 2024

Abstract

Self-supervised monocular depth estimation performs well in static environments because training relies on the multi-view consistency assumption. However, depth consistency is hard to maintain in dynamic scenes, where moving objects cause occlusion. We therefore propose a self-supervised self-distillation method for monocular depth estimation (SS-MDE) in dynamic scenes, in which a deep network with a multi-scale decoder and a lightweight pose network predict depth in a self-supervised manner from the disparity, the motion information, and the association between two adjacent frames of the image sequence. To improve depth estimation accuracy in static areas, pseudo-depth images generated by the LeReS network provide pseudo-supervision, enhancing depth refinement in those areas. A forgetting factor is used to reduce the dependency on this pseudo-supervision. In addition, a teacher model is introduced to generate depth prior information, and a multi-view mask filter module is designed to perform feature extraction and noise filtering. This enables the student model to better learn the depth structure of dynamic scenes, improving the generalization and robustness of the entire model in a self-distillation manner. Finally, on four public datasets, the proposed SS-MDE method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy (δ1) of 89% with an error (AbsRel) of 0.102 on NYU-Depth V2 and an accuracy (δ1) of 87% with an error (AbsRel) of 0.111 on KITTI.
Keywords: monocular depth estimation; self-distillation; self-supervised learning; normal estimate
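
The abstract reports results as δ1 and AbsRel without defining them. As a minimal sketch, assuming the definitions commonly used in depth estimation benchmarks (AbsRel as the mean of |d − d*| / d* over valid pixels, and δ1 as the fraction of pixels with max(d/d*, d*/d) < 1.25), the two metrics could be computed as follows. The function names, the flat-list input format, and the rule that ground-truth values of zero mark invalid pixels are illustrative assumptions, not details taken from the article.

```python
def _valid_pairs(pred, gt):
    """Keep only pixels with a positive ground-truth depth (assumed validity rule)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    return pairs


def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose ratio max(pred/gt, gt/pred) is below threshold."""
    pairs = _valid_pairs(pred, gt)
    # A non-positive prediction cannot satisfy the ratio test, so it counts as a miss.
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy depths in metres; the last pixel has no ground truth and is ignored.
    predicted = [1.0, 2.2, 4.0, 3.0]
    ground_truth = [1.0, 2.0, 2.0, 0.0]
    print("AbsRel:", abs_rel(predicted, ground_truth))
    print("delta1:", delta1(predicted, ground_truth))
```

Under these definitions a lower AbsRel and a higher δ1 both indicate better predictions, which is how the reported figures for NYU-Depth V2 and KITTI should be read.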

Share and Cite

MDPI and ACS Style

Hu, H.; Feng, Y.; Li, D.; Zhang, S.; Zhao, H. Monocular Depth Estimation via Self-Supervised Self-Distillation. Sensors 2024, 24, 4090. https://doi.org/10.3390/s24134090

AMA Style

Hu H, Feng Y, Li D, Zhang S, Zhao H. Monocular Depth Estimation via Self-Supervised Self-Distillation. Sensors. 2024; 24(13):4090. https://doi.org/10.3390/s24134090

Chicago/Turabian Style

Hu, Haifeng, Yuyang Feng, Dapeng Li, Suofei Zhang, and Haitao Zhao. 2024. "Monocular Depth Estimation via Self-Supervised Self-Distillation" Sensors 24, no. 13: 4090. https://doi.org/10.3390/s24134090
