Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection
Abstract
1. Introduction
- We propose a quality-driven dual-branch feature integration network, consisting of a quality-driven multi-modal feature fusion (QMFF) module and a dual-branch-based multi-level feature aggregation (DMFA) module, to exploit multi-modal and multi-level features appropriately and sufficiently.
- We design the QMFF module to fully explore the complementarity of the spatial and temporal features: the quality scores serve as weights that re-calibrate the two modal features and generate a guidance map. In particular, the guidance map steers both modal features to pay more attention to salient regions (a code sketch follows this list).
- We deploy the DMFA module to adequately integrate the multi-level spatiotemporal features, where a dual-branch fusion (DF) unit fuses the outputs of two branches: the main-body cues of the progressive decoder branch and the local details of the direct concatenation branch (see the second sketch after this list).
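To make the QMFF idea concrete, here is a minimal PyTorch-style sketch of quality-driven fusion. Everything below (quality heads built from global average pooling plus a 1×1 convolution, a 3×3 guidance convolution, the residual steering) is our own illustrative assumption, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class QualityDrivenFusion(nn.Module):
    """Minimal sketch of quality-driven multi-modal fusion (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()

        def quality_head() -> nn.Sequential:
            # Hypothetical quality head: one scalar score in (0, 1) per modality,
            # predicted from globally pooled features.
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, 1, kernel_size=1),
                nn.Sigmoid(),
            )

        self.quality_s = quality_head()  # quality of the spatial (appearance) features
        self.quality_t = quality_head()  # quality of the temporal (optical-flow) features
        # Guidance head: a single-channel map derived from both re-calibrated modalities.
        self.guide = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, f_spatial: torch.Tensor, f_temporal: torch.Tensor):
        # Re-calibrate each modality by its predicted quality score.
        f_s = self.quality_s(f_spatial) * f_spatial
        f_t = self.quality_t(f_temporal) * f_temporal
        # Generate the guidance map and use it (residually) to steer both
        # modal features toward salient regions.
        g = self.guide(torch.cat([f_s, f_t], dim=1))  # (B, 1, H, W)
        return f_s + f_s * g, f_t + f_t * g, g


# Usage: fuse 64-channel appearance and motion features at one backbone stage.
fs, ft, guide = QualityDrivenFusion(64)(torch.randn(2, 64, 56, 56),
                                        torch.randn(2, 64, 56, 56))
```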
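Similarly, a rough sketch of the DF unit under our own assumptions (a 3×3 conv-BN-ReLU refinement and a residual connection); only the high-level design, fusing decoder-branch body cues with concatenation-branch details, follows the description above.

```python
import torch
import torch.nn as nn


class DualBranchFusion(nn.Module):
    """Illustrative DF unit: fuse the progressive decoder branch (global main
    body) with the direct concatenation branch (local details)."""

    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_decoder: torch.Tensor, f_concat: torch.Tensor) -> torch.Tensor:
        # Jointly refine the two branches, then keep the decoder's main-body
        # cues through a residual connection.
        fused = self.refine(torch.cat([f_decoder, f_concat], dim=1))
        return fused + f_decoder
```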
2. Related Works
2.1. Handcrafted-Feature Based Video Saliency Models
2.2. Deep Learning-Based Video Saliency Models
3. The Proposed Method
3.1. Architecture Overview
3.2. Quality-Driven Multi-Modal Feature Fusion
3.3. Dual-Branch-Based Multi-Level Feature Aggregation
3.4. Loss Functions
4. Experimental Results
4.1. Datasets, Implementation Details, and Evaluation Metrics
4.2. Comparison with the State-of-the-Art Methods
4.2.1. Quantitative Comparison
4.2.2. Qualitative Comparison
4.3. Ablation Studies
4.4. Failure Cases and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Tung, F.; Zelek, J.S.; Clausi, D.A. Goal-based trajectory analysis for unusual behaviour detection in intelligent surveillance. Image Vis. Comput. 2011, 29, 230–240.
2. Verlekar, T.T.; Soares, L.D.; Correia, P.L. Gait recognition in the wild using shadow silhouettes. Image Vis. Comput. 2018, 76, 1–13.
3. Li, Z.; Qin, S.; Itti, L. Visual attention guided bit allocation in video compression. Image Vis. Comput. 2011, 29, 1–14.
4. Zheng, B.; Chen, Y.; Tian, X.; Zhou, F.; Liu, X. Implicit dual-domain convolutional network for robust color image compression artifact reduction. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 3982–3994.
5. Hendry; Chen, R.C. Automatic license plate recognition via sliding-window darknet-YOLO deep learning. Image Vis. Comput. 2019, 87, 47–56.
6. Zhou, X.; Li, G.; Gong, C.; Liu, Z.; Zhang, J. Attention-guided RGBD saliency detection using appearance information. Image Vis. Comput. 2020, 95, 103888.
7. Wu, Z.; Li, S.; Chen, C.; Hao, A.; Qin, H. A deeper look at image salient object detection: Bi-stream network with a small training dataset. IEEE Trans. Multimed. 2020, 24, 73–86.
8. Pang, Y.; Zhao, X.; Zhang, L.; Lu, H. Multi-scale interactive network for salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9413–9422.
9. Wang, W.; Shen, J.; Shao, L. Video salient object detection via fully convolutional networks. IEEE Trans. Image Process. 2017, 27, 38–49.
10. Sun, M.; Zhou, Z.; Hu, Q.; Wang, Z.; Jiang, J. SG-FCN: A motion and memory-based deep learning model for video saliency detection. IEEE Trans. Cybern. 2018, 49, 2900–2911.
11. Song, H.; Wang, W.; Zhao, S.; Shen, J.; Lam, K.M. Pyramid dilated deeper ConvLSTM for video salient object detection. In Computer Vision—ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 715–731.
12. Fan, D.P.; Wang, W.; Cheng, M.M.; Shen, J. Shifting more attention to video salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8554–8564.
13. Le, T.N.; Sugimoto, A. Deeply supervised 3D recurrent FCN for salient object detection in videos. In Proceedings of the 28th British Machine Vision Conference (BMVC), London, UK, 4–9 September 2017; Volume 1, pp. 1–13.
14. Dong, S.; Gao, Z.; Pirbhulal, S.; Bian, G.B.; Zhang, H.; Wu, W.; Li, S. IoT-based 3D convolution for video salient object detection. Neural Comput. Appl. 2020, 32, 735–746.
15. Gu, Y.; Wang, L.; Wang, Z.; Liu, Y.; Cheng, M.M.; Lu, S.P. Pyramid constrained self-attention network for fast video salient object detection. Proc. AAAI Conf. Artif. Intell. 2020, 34, 10869–10876.
16. Bi, H.; Yang, L.; Zhu, H.; Lu, D.; Jiang, J. STEG-Net: Spatio-temporal edge guidance network for video salient object detection. IEEE Trans. Cogn. Dev. Syst. 2021, 14, 902–915.
17. Chen, P.; Lai, J.; Wang, G.; Zhou, H. Confidence-guided adaptive gate and dual differential enhancement for video salient object detection. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; pp. 1–6.
18. Zhang, M.; Liu, J.; Wang, Y.; Piao, Y.; Yao, S.; Ji, W.; Li, J.; Lu, H.; Luo, Z. Dynamic context-sensitive filtering network for video salient object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 1553–1563.
19. Li, G.; Xie, Y.; Wei, T.; Wang, K.; Lin, L. Flow guided recurrent neural encoder for video salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3243–3252.
20. Li, H.; Chen, G.; Li, G.; Yu, Y. Motion guided attention for video salient object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7274–7283.
21. Jiao, Y.; Wang, X.; Chou, Y.C.; Yang, S.; Ji, G.P.; Zhu, R.; Gao, G. Guidance and teaching network for video salient object detection. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 2199–2203.
22. Chen, C.; Song, J.; Peng, C.; Wang, G.; Fang, Y. A novel video salient object detection method via semisupervised motion quality perception. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2732–2745.
23. Chen, C.; Wei, J.; Peng, C.; Qin, H. Depth-quality-aware salient object detection. IEEE Trans. Image Process. 2021, 30, 2350–2363.
24. Zhao, X.; Pang, Y.; Zhang, L.; Lu, H.; Zhang, L. Suppress and balance: A simple gated network for salient object detection. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 35–51.
25. Ma, T.; Yang, M.; Rong, H.; Qian, Y.; Tian, Y.; Al-Nabhan, N. Dual-path CNN with max gated block for text-based person re-identification. Image Vis. Comput. 2021, 111, 104168.
26. Khorramshahi, P.; Kumar, A.; Peri, N.; Rambhatla, S.S.; Chen, J.C.; Chellappa, R. A dual-path model with adaptive attention for vehicle re-identification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6132–6141.
27. Zheng, Z.; Zheng, L.; Garrett, M.; Yang, Y.; Xu, M.; Shen, Y.D. Dual-path convolutional image-text embeddings with instance loss. ACM Trans. Multimed. Comput. Commun. Appl. 2020, 16, 1–23.
28. Zhang, P.; Wang, D.; Lu, H.; Wang, H.; Ruan, X. Amulet: Aggregating multi-level convolutional features for salient object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 202–211.
29. Liu, Z.; Wang, Y.; Tu, Z.; Xiao, Y.; Tang, B. TriTransNet: RGB-D salient object detection with a triplet transformer embedding network. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 4481–4490.
30. Liu, Z.; Tan, Y.; He, Q.; Xiao, Y. SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection. arXiv 2022, arXiv:2204.05585.
31. Liu, H.; Zhang, J.; Yang, K.; Hu, X.; Stiefelhagen, R. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers. arXiv 2022, arXiv:2203.04838.
32. Wang, W.; Shen, J.; Shao, L. Consistent video saliency using local gradient flow optimization and global refinement. IEEE Trans. Image Process. 2015, 24, 4185–4196.
33. Wang, W.; Shen, J.; Porikli, F. Saliency-aware geodesic video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3395–3402.
34. Zhou, X.; Liu, Z.; Li, K.; Sun, G. Video saliency detection via bagging-based prediction and spatiotemporal propagation. J. Vis. Commun. Image Represent. 2018, 51, 131–143.
35. Fang, Y.; Wang, Z.; Lin, W.; Fang, Z. Video saliency incorporating spatiotemporal cues and uncertainty weighting. IEEE Trans. Image Process. 2014, 23, 3910–3921.
36. Xi, T.; Zhao, W.; Wang, H.; Lin, W. Salient object detection with spatiotemporal background priors for video. IEEE Trans. Image Process. 2016, 26, 3425–3436.
37. Chen, Y.; Zou, W.; Tang, Y.; Li, X.; Xu, C.; Komodakis, N. SCOM: Spatiotemporal constrained optimization for salient object detection. IEEE Trans. Image Process. 2018, 27, 3345–3357.
38. Kim, H.; Kim, Y.; Sim, J.Y.; Kim, C.S. Spatiotemporal saliency detection for video sequences based on random walk with restart. IEEE Trans. Image Process. 2015, 24, 2552–2564.
39. Li, Y.; Tan, Y.; Yu, J.G.; Qi, S.; Tian, J. Kernel regression in mixed feature spaces for spatio-temporal saliency detection. Comput. Vis. Image Underst. 2015, 135, 126–140.
40. Chen, C.; Li, S.; Wang, Y.; Qin, H.; Hao, A. Video saliency detection via spatial-temporal fusion and low-rank coherency diffusion. IEEE Trans. Image Process. 2017, 26, 3156–3170.
41. Chen, C.; Li, S.; Qin, H.; Pan, Z.; Yang, G. Bilevel feature learning for video saliency detection. IEEE Trans. Multimed. 2018, 20, 3324–3336.
42. Liu, Z.; Li, J.; Ye, L.; Sun, G.; Shen, L. Saliency detection for unconstrained videos using superpixel-level graph and spatiotemporal propagation. IEEE Trans. Circuits Syst. Video Technol. 2016, 27, 2527–2542.
43. Zhou, X.; Liu, Z.; Gong, C.; Liu, W. Improving video saliency detection via localized estimation and spatiotemporal refinement. IEEE Trans. Multimed. 2018, 20, 2993–3007.
44. Guo, F.; Wang, W.; Shen, J.; Shao, L.; Yang, J.; Tao, D.; Tang, Y.Y. Video saliency detection using object proposals. IEEE Trans. Cybern. 2017, 48, 3159–3170.
45. Guo, F.; Wang, W.; Shen, Z.; Shen, J.; Shao, L.; Tao, D. Motion-aware rapid video saliency detection. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 4887–4898.
46. Li, S.; Seybold, B.; Vorobyov, A.; Lei, X.; Kuo, C.C.J. Unsupervised video object segmentation with motion-based bilateral networks. In Computer Vision—ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 207–223.
47. Wen, H.; Zhou, X.; Sun, Y.; Zhang, J.; Yan, C. Deep fusion based video saliency detection. J. Vis. Commun. Image Represent. 2019, 62, 279–285.
48. Li, Y.; Li, S.; Chen, C.; Hao, A.; Qin, H. A plug-and-play scheme to adapt image saliency deep model for video data. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 2315–2327.
49. Chen, C.; Wang, G.; Peng, C.; Fang, Y.; Zhang, D.; Qin, H. Exploring rich and efficient spatial temporal interactions for real-time video salient object detection. IEEE Trans. Image Process. 2021, 30, 3995–4007.
50. Xu, M.; Fu, P.; Liu, B.; Li, J. Multi-stream attention-aware graph convolution network for video salient object detection. IEEE Trans. Image Process. 2021, 30, 4183–4197.
51. Tang, Y.; Zou, W.; Jin, Z.; Chen, Y.; Hua, Y.; Li, X. Weakly supervised salient object detection with spatiotemporal cascade neural networks. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 1973–1984.
52. Yan, P.; Li, G.; Xie, Y.; Li, Z.; Wang, C.; Chen, T.; Lin, L. Semi-supervised video salient object detection using pseudo-labels. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7284–7293.
53. Le, T.N.; Sugimoto, A. Video salient object detection using spatiotemporal deep features. IEEE Trans. Image Process. 2018, 27, 5002–5015.
54. Fang, Y.; Ding, G.; Li, J.; Fang, Z. Deep3DSaliency: Deep stereoscopic video saliency detection model by 3D convolutional networks. IEEE Trans. Image Process. 2018, 28, 2305–2318.
55. Zhou, X.; Shen, K.; Liu, Z.; Gong, C.; Zhang, J.; Yan, C. Edge-aware multiscale feature integration network for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5605315.
56. Zhou, X.; Fang, H.; Liu, Z.; Zheng, B.; Sun, Y.; Zhang, J.; Yan, C. Dense attention-guided cascaded network for salient object detection of strip steel surface defects. IEEE Trans. Instrum. Meas. 2021, 71, 5004914.
57. Zhou, X.; Wen, H.; Shi, R.; Yin, H.; Zhang, J.; Yan, C. FANet: Feature aggregation network for RGBD saliency detection. Signal Process. Image Commun. 2022, 102, 116591.
58. Teed, Z.; Deng, J. RAFT: Recurrent all-pairs field transforms for optical flow. In Computer Vision—ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 402–419.
59. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
60. Fan, D.P.; Ji, G.P.; Cheng, M.M.; Shao, L. Concealed object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 6024–6042.
61. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
62. Zhang, L.; Dai, J.; Lu, H.; He, Y.; Wang, G. A bi-directional message passing model for salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1741–1750.
63. De Boer, P.T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67.
64. Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. BASNet: Boundary-aware salient object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7479–7489.
65. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the 37th Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402.
66. Máttyus, G.; Luo, W.; Urtasun, R. DeepRoadMapper: Extracting road topology from aerial images. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3438–3446.
67. Perazzi, F.; Pont-Tuset, J.; McWilliams, B.; Van Gool, L.; Gross, M.; Sorkine-Hornung, A. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 724–732.
68. Li, F.; Kim, T.; Humayun, A.; Tsai, D.; Rehg, J.M. Video segmentation by tracking many figure-ground segments. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, NSW, Australia, 1–8 December 2013; pp. 2192–2199.
69. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
70. Wang, L.; Lu, H.; Wang, Y.; Feng, M.; Wang, D.; Yin, B.; Ruan, X. Learning to detect salient objects with image-level supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 136–145.
71. Wang, B.; Liu, W.; Han, G.; He, S. Learning long-term structural dependencies for video salient object detection. IEEE Trans. Image Process. 2020, 29, 9017–9031.
72. Achanta, R.; Hemami, S.; Estrada, F.; Susstrunk, S. Frequency-tuned salient region detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 1597–1604.
73. Fan, D.P.; Cheng, M.M.; Liu, Y.; Li, T.; Borji, A. Structure-measure: A new way to evaluate foreground maps. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4548–4557.
74. Yang, C.; Zhang, L.; Lu, H.; Ruan, X.; Yang, M.H. Saliency detection via graph-based manifold ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 3166–3173.
75. Li, G.; Yu, Y. Visual saliency detection based on multiscale deep CNN features. IEEE Trans. Image Process. 2016, 25, 5012–5024.
76. Cheng, M.M.; Mitra, N.J.; Huang, X.; Torr, P.H.; Hu, S.M. Global contrast based salient region detection. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 569–582.
77. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
Quantitative comparison with state-of-the-art methods on the DAVIS, DAVSOD, ViSal, and SegV2 benchmarks. For each dataset, the three columns report the S-measure (Sα, higher is better), the maximum F-measure (maxF, higher is better), and the mean absolute error (MAE, lower is better); "-" marks unavailable results.

| Method | DAVIS Sα↑ | DAVIS maxF↑ | DAVIS MAE↓ | DAVSOD Sα↑ | DAVSOD maxF↑ | DAVSOD MAE↓ | ViSal Sα↑ | ViSal maxF↑ | ViSal MAE↓ | SegV2 Sα↑ | SegV2 maxF↑ | SegV2 MAE↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SGSP [42] | 0.693 | 0.664 | 0.134 | 0.577 | 0.426 | 0.207 | 0.616 | 0.488 | 0.195 | 0.682 | 0.674 | 0.124 |
| STBP [36] | 0.651 | 0.485 | 0.105 | 0.559 | 0.401 | 0.166 | 0.629 | 0.622 | 0.163 | 0.736 | 0.643 | 0.061 |
| SFLR [40] | 0.771 | 0.698 | 0.060 | 0.624 | 0.478 | 0.143 | 0.814 | 0.779 | 0.062 | 0.804 | 0.746 | 0.036 |
| SCNN [51] | 0.761 | 0.679 | 0.077 | 0.672 | 0.529 | 0.129 | 0.847 | 0.831 | 0.071 | - | - | - |
| SCOM [37] | 0.794 | 0.712 | 0.058 | 0.603 | 0.473 | 0.219 | 0.759 | 0.829 | 0.128 | 0.815 | 0.764 | 0.030 |
| FGRNE [19] | 0.838 | 0.783 | 0.043 | 0.701 | 0.589 | 0.095 | 0.861 | 0.848 | 0.045 | 0.770 | 0.694 | 0.035 |
| MBNM [46] | 0.887 | 0.862 | 0.031 | 0.646 | 0.506 | 0.109 | 0.898 | 0.883 | 0.020 | 0.809 | 0.716 | 0.026 |
| PDB [11] | 0.882 | 0.855 | 0.028 | 0.698 | 0.572 | 0.116 | 0.907 | 0.888 | 0.032 | 0.864 | 0.808 | 0.024 |
| SSAV [12] | 0.893 | 0.861 | 0.028 | 0.755 | 0.659 | 0.084 | 0.942 | 0.938 | 0.021 | 0.849 | 0.797 | 0.023 |
| MGA [20] | 0.910 | 0.889 | 0.023 | 0.748 | 0.650 | 0.082 | 0.941 | 0.940 | 0.016 | 0.881 | 0.829 | 0.026 |
| PCSA [15] | 0.902 | 0.880 | 0.022 | 0.741 | 0.655 | 0.086 | 0.946 | 0.941 | 0.017 | 0.866 | 0.811 | 0.024 |
| MAGCN [50] | 0.878 | 0.836 | 0.034 | - | - | - | 0.916 | 0.920 | 0.025 | - | - | - |
| STFA [49] | 0.892 | 0.865 | 0.023 | 0.744 | 0.650 | 0.086 | 0.952 | 0.952 | 0.013 | 0.891 | 0.860 | 0.017 |
| GTNet [21] | 0.916 | 0.898 | 0.022 | 0.757 | 0.692 | 0.074 | 0.948 | 0.947 | 0.018 | 0.756 | 0.684 | 0.036 |
| DCFNet [18] | 0.914 | 0.900 | 0.016 | 0.741 | 0.660 | 0.074 | 0.952 | 0.953 | 0.010 | 0.883 | 0.839 | 0.015 |
| CAG-DDE [17] | 0.906 | 0.898 | 0.018 | 0.763 | 0.671 | 0.072 | 0.924 | 0.925 | 0.017 | 0.865 | 0.827 | 0.026 |
| Ours | 0.918 | 0.912 | 0.018 | 0.773 | 0.705 | 0.069 | 0.946 | 0.952 | 0.012 | 0.883 | 0.834 | 0.015 |
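For readers checking these numbers, MAE and the maximum F-measure are simple to reproduce; the snippet below is a conventional NumPy implementation (β² = 0.3 following [72], with a 255-threshold sweep), using function names of our own choosing. The S-measure is the structure measure of [73] and is omitted here for brevity.

```python
import numpy as np


def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a saliency map and its ground truth, both in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))


def max_f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """Maximum F-measure over 255 uniform binarization thresholds (beta^2 = 0.3)."""
    gt_bin = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, 255):
        pred_bin = pred >= t
        tp = np.logical_and(pred_bin, gt_bin).sum()
        precision = tp / (pred_bin.sum() + 1e-8)
        recall = tp / (gt_bin.sum() + 1e-8)
        f = (1.0 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
        best = max(best, float(f))
    return best
```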
Ablation results on DAVIS and DAVSOD (Sα↑, maxF↑, MAE↓). Each variant removes (w/o) or swaps in (w) one component of the full model.

| Variant | DAVIS Sα↑ | DAVIS maxF↑ | DAVIS MAE↓ | DAVSOD Sα↑ | DAVSOD maxF↑ | DAVSOD MAE↓ |
|---|---|---|---|---|---|---|
| w/o QMFF-qf | 0.914 | 0.897 | 0.019 | 0.748 | 0.661 | 0.077 |
| w/o QMFF-f | 0.917 | 0.907 | 0.017 | 0.763 | 0.690 | 0.071 |
| w/o QMFF-qp | 0.918 | 0.907 | 0.017 | 0.764 | 0.689 | 0.073 |
| w/o QMFF-q | 0.917 | 0.904 | 0.018 | 0.736 | 0.645 | 0.080 |
| w/o IFI | 0.915 | 0.902 | 0.018 | 0.760 | 0.691 | 0.071 |
| w/o db1 | 0.918 | 0.911 | 0.017 | 0.753 | 0.685 | 0.072 |
| w/o db2 | 0.917 | 0.905 | 0.020 | 0.764 | 0.680 | 0.071 |
| w/o DF | 0.914 | 0.904 | 0.017 | 0.751 | 0.676 | 0.074 |
| w BiFPN | 0.897 | 0.882 | 0.027 | 0.759 | 0.680 | 0.079 |
| w lw | 0.902 | 0.899 | 0.020 | 0.755 | 0.677 | 0.078 |
| Ours (full model) | 0.918 | 0.912 | 0.018 | 0.773 | 0.705 | 0.069 |
Share and Cite
Zhou, X.; Gao, H.; Yu, L.; Yang, D.; Zhang, J. Quality-Driven Dual-Branch Feature Integration Network for Video Salient Object Detection. Electronics 2023, 12, 680. https://doi.org/10.3390/electronics12030680