Bi-Directional Pyramid Network for Edge Detection
Abstract
1. Introduction
- Firstly, a down-sampling pyramid network is proposed to enrich the multi-scale representation of the encoder. To our knowledge, this is the first time a down-sampling pyramid network has been used to enrich multi-scale features for edge detection.
- Secondly, a lightweight up-sampling pyramid network is proposed to enhance the multi-scale representation of the decoder. Combining these two pyramid networks with a trimmed VGG16 yields our bi-directional pyramid network (BDP-Net); a minimal code sketch of this idea follows this list.
- Last but not least, while matching the test accuracy of BDCN [14], the state-of-the-art model for edge detection, the proposed BDP-Net is experimentally shown to roughly double the training speed.
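As a forward reference for the architecture sketched in the contributions above, the snippet below is a minimal PyTorch-style illustration of the two pyramid modules. The channel widths, pooling scales, and depthwise refinement convolutions are our own assumptions for illustration; this is not the authors' exact configuration.

```python
# Minimal sketch of the two pyramid modules, assuming hypothetical channel
# widths and pooling scales (not the paper's exact layer configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DownSamplingPyramid(nn.Module):
    """Enrich an encoder feature map with context pooled at several scales."""

    def __init__(self, channels, scales=(2, 4, 8)):
        super().__init__()
        self.scales = scales
        self.reduce = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in scales
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        out = x
        for scale, conv in zip(self.scales, self.reduce):
            # Pool to a coarser grid, project, and add the context back in.
            pooled = F.avg_pool2d(x, kernel_size=scale, stride=scale, ceil_mode=True)
            out = out + F.interpolate(conv(pooled), size=(h, w),
                                      mode="bilinear", align_corners=False)
        return out


class LightweightUpSamplingPyramid(nn.Module):
    """Refine a decoder feature map with a few cheap (depthwise) upsampling steps."""

    def __init__(self, channels, steps=2):
        super().__init__()
        self.steps = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
            for _ in range(steps)
        )

    def forward(self, x, target_size):
        for conv in self.steps:
            x = F.relu(conv(F.interpolate(x, scale_factor=2, mode="bilinear",
                                          align_corners=False)))
        return F.interpolate(x, size=target_size, mode="bilinear", align_corners=False)


# Usage with dummy data: enrich an encoder feature map, then refine it to full
# resolution; a 1x1 head (omitted) would map the result to a single-channel edge map.
feat = torch.randn(1, 64, 80, 80)
enriched = DownSamplingPyramid(64)(feat)
refined = LightweightUpSamplingPyramid(64)(enriched, target_size=(320, 320))
```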
2. Related Work
2.1. Edge Detection
2.2. Existing Multi-Scale Learning
3. Methodology
3.1. The Proposed Down-Sampling Pyramid Network
3.2. The Proposed Lightweight Up-Sampling Pyramid Network
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. The Effect of Network Architecture
4.3.1. Ablation Study on BSDS500
4.3.2. Performance on NYUDv2 and Multicue
5. Conclusions and Discussion
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Ramadevi, Y.; Sridevi, T.; Poornima, B.; Kalyani, B. Segmentation and Object Recognition Using Edge Detection Techniques. Int. J. Comput. Sci. Inf. Technol. 2010, 2, 153–161. [Google Scholar] [CrossRef]
- Arbeláez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour Detection and Hierarchical Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Muthukrishnan, R.; Radha, M. Edge Detection Techniques for Image Segmentation. Int. J. Comput. Sci. Inf. Technol. 2011, 3, 259. [Google Scholar] [CrossRef]
- Gudmundsson, M.; El-Kwae, E.A.; Kabuka, M.R. Edge Detection in Medical Images Using A Genetic Algorithm. IEEE Trans. Med. Imaging 1998, 17, 469–474. [Google Scholar] [CrossRef] [PubMed]
- Nikolic, M.; Tuba, E.; Tuba, M. Edge Detection in Medical Ultrasound Images Using Adjusted Canny Edge Detection Algorithm. In Proceedings of the IEEE 24th Telecommunications Forum, Belgrade, Serbia, 22–23 November 2016; pp. 1–4. [Google Scholar]
- Lin, T.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
- Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. AugFPN: Improving Multi-Scale Feature Learning for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 12595–12604. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–12 June 2015; pp. 3431–3440. [Google Scholar]
- Zhang, Z.; Zhang, X.; Peng, C.; Cheng, D.; Sun, J. ExFuse: Enhancing Feature Fusion for Semantic Segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 273–288. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Xie, S.; Tu, Z. Holistically-Nested Edge Detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1395–1403. [Google Scholar]
- Lee, C.; Xie, S.; Gallagher, P.W.; Zhang, Z.; Tu, Z. Deeply-Supervised Nets. In Proceedings of the AISTATS, San Diego, CA, USA, 9–12 May 2015; pp. 562–570. [Google Scholar]
- Liu, Y.; Cheng, M.M.; Hu, X.; Wang, K.; Bai, X. Richer Convolutional Features for Edge Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3000–3009. [Google Scholar]
- He, J.; Zhang, S.; Yang, M.; Shan, Y.; Huang, T. Bi-Directional Cascade Network for Perceptual Edge Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–21 June 2019; pp. 3828–3837. [Google Scholar]
- Hou, Q.; Liu, J.; Cheng, M.; Borji, A.; Torr, P.H.S. Three Birds One Stone: A Unified Framework for Salient Object Segmentation, Edge Detection and Skeleton Extraction. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 1–17. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Julesz, B. A Method of Coding Television Signals Based on Edge Detection. Bell Syst. Tech. J. 1959, 38, 1001–1020. [Google Scholar] [CrossRef]
- Robinson, G.S. Color Edge Detection. Proc. SPIE 1975, 87, 126–133. [Google Scholar] [CrossRef]
- Sobel, I. Camera Models and Machine Perception; Technical Report; Department of Electrical Engineering, Stanford University: Stanford, CA, USA, 1970. [Google Scholar]
- Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 6, 679–698. [Google Scholar] [CrossRef]
- Martin, D.; Fowlkes, C.C.; Malik, J. Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 530–549. [Google Scholar] [CrossRef] [PubMed]
- Lim, J.J.; Zitnick, C.L.; Dollar, P. Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3158–3165. [Google Scholar]
- Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient Graph-Based Image Segmentation. Int. J. Comput. Vis. 2004, 59, 167–181. [Google Scholar] [CrossRef]
- Dollár, P.; Zitnick, C.L. Fast Edge Detection Using Structured Forests. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1558–1570. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Hallman, S.; Fowlkes, C.C. Oriented Edge Forests for Boundary Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–12 June 2015; pp. 1732–1740. [Google Scholar]
- Zhang, Z. DeepContour: A Deep Convolutional Feature Learned by Positive-sharing Loss for Contour Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–12 June 2015; pp. 3982–3991. [Google Scholar]
- Torresani, L. DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–12 June 2015; pp. 4380–4389. [Google Scholar]
- Ganin, Y.; Lempitsky, V. N4-Fields: Neural Network Nearest Neighbor Fields for Image Transforms. In Proceedings of the Asian Conference on Computer Vision, Singapore, 1–5 November 2014; pp. 536–551. [Google Scholar]
- Deng, R.; Shen, C.; Liu, S.; Wang, H.; Liu, X. Learning to Predict Crisp Boundaries. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 570–586. [Google Scholar]
- Arbelaez, P.; Ponttuset, J.; Barron, J.; Marques, F.; Malik, J. Multiscale Combinatorial Grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 328–335. [Google Scholar]
- Farabet, C.; Couprie, C.; Najman, L.; Lecun, Y. Learning Hierarchical Features for Scene Labeling. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1915–1929. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–12 June 2015; pp. 1–9. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4278–4284. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor Segmentation and Support Inference from RGBD Images. In Proceedings of the 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 746–760. [Google Scholar]
- Mély, D.A.; Kim, J.; McGill, M.; Guo, Y.; Serre, T. A Systematic Comparison between Visual Cues for Boundary Detection. Vis. Res. 2016, 120, 93–107. [Google Scholar] [CrossRef] [PubMed]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, F.F. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, USA, 20–26 June 2009; pp. 248–255. [Google Scholar]
- Comaniciu, D.; Meer, P. Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 603–619. [Google Scholar] [CrossRef] [Green Version]
- Mottaghi, R.; Chen, X.; Liu, X.; Cho, N.G.; Yuille, A. The Role of Context for Object Detection and Semantic Segmentation in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 891–898. [Google Scholar]
- So, D.R.; Liang, C.; Le, Q.V. The Evolved Transformer. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 5877–5886. [Google Scholar]
- Wu, Z.; Liu, Z.; Lin, J.; Lin, Y.; Han, S. Lite Transformer with Long-Short Range Attention. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Elsken, T.; Metzen, J.H.; Hutter, F. Neural Architecture Search: A Survey. J. Mach. Learn. Res. 2019, 20, 1–21. [Google Scholar]
- Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. arXiv 2020, arXiv:2006.02903. [Google Scholar]
Methods | ODS | OIS | # Params | GFLOPS | Training Time | FPS |
---|---|---|---|---|---|---|
Human | 0.803 | 0.803 | – | – | – | – |
RCF [13] | 0.792 | 0.809 | 14.8 M | 81.77 | 7 h | 5.95 |
BDCN_SEM [14] | 0.801 | 0.817 | +0.1 K | 81.78 | 8 h 28 min | 4.92 |
BDCN_BDC [14] | 0.803 | 0.819 | +1.5 M | 114.3 | 12 h 27 min | 3.35 |
BDCN [14] | 0.804 | 0.820 | +1.5 M | 114.31 | 12 h 35 min | 3.31 |
Ours | 0.803 | 0.822 | +3.9 K | 81.81 | 7 h 42 min | 5.41 |
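For reference, the ODS and OIS columns in these tables follow the standard BSDS benchmark definitions: the F-measure F = 2PR/(P + R) maximized over one dataset-wide threshold (ODS) or over a per-image threshold (OIS). The sketch below shows that aggregation, assuming the per-image matched true-positive, prediction, and ground-truth pixel counts at each threshold are already available; the array layout is our assumption, not the benchmark's actual interface or the authors' evaluation code.

```python
import numpy as np


def f_measure(p, r, eps=1e-12):
    # Harmonic mean of precision and recall.
    return 2.0 * p * r / (p + r + eps)


def ods_ois(tp, n_pred, n_gt):
    """tp, n_pred, n_gt: arrays of shape (num_images, num_thresholds) holding the
    matched true positives, predicted edge pixels, and ground-truth edge pixels."""
    precision = tp / np.maximum(n_pred, 1)
    recall = tp / np.maximum(n_gt, 1)

    # ODS: one threshold shared by the whole dataset, chosen to maximize the
    # F-measure of the aggregated counts.
    p_ds = tp.sum(axis=0) / np.maximum(n_pred.sum(axis=0), 1)
    r_ds = tp.sum(axis=0) / np.maximum(n_gt.sum(axis=0), 1)
    ods = f_measure(p_ds, r_ds).max()

    # OIS: the best threshold is chosen per image, then the counts taken at those
    # thresholds are aggregated into a single F-measure.
    best_t = f_measure(precision, recall).argmax(axis=1)
    rows = np.arange(tp.shape[0])
    p_im = tp[rows, best_t].sum() / max(n_pred[rows, best_t].sum(), 1)
    r_im = tp[rows, best_t].sum() / max(n_gt[rows, best_t].sum(), 1)
    ois = f_measure(p_im, r_im)
    return ods, ois
```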
Methods | ODS | OIS | AP | R50 | # Params | GFLOPS | Training Time | FPS |
---|---|---|---|---|---|---|---|---|
Human | 0.803 | 0.803 | – | – | – | – | – | – |
Baseline | 0.758 | 0.771 | 0.673 | 0.773 | −89,685 | 80.49 | 5 h 25 min | 7.69 |
BDP-Net_DSP | 0.798 | 0.813 | 0.714 | 0.899 | +74 | 81.78 | 4 h 55 min | 5.65 |
BDP-Net_USP | 0.800 | 0.816 | 0.772 | 0.889 | +3840 | 81.8 | 7 h 9 min | 5.83 |
BDP-Net | 0.803 | 0.822 | 0.845 | 0.918 | +3914 | 81.81 | 7 h 42 min | 5.41 |
Baseline | 0.788 | 0.799 | 0.766 | 0.812 | −89,685 | 80.49 | 5 h 25 min | 7.69 |
BDP-Net_DSP | 0.807 | 0.823 | 0.795 | 0.902 | +74 | 81.78 | 4 h 55 min | 5.71 |
BDP-Net_USP | 0.807 | 0.825 | 0.804 | 0.889 | +3840 | 81.8 | 7 h 9 min | 5.83 |
BDP-Net | 0.808 | 0.828 | 0.847 | 0.913 | +3914 | 81.81 | 7 h 42 min | 5.41 |
RCF [13] | 0.792 | 0.809 | 0.780 | 0.898 | 14.8 M | 81.77 | 7 h | 5.95 |
BDCN [14] | 0.804 | 0.820 | 0.724 | 0.894 | +1.5 M | 114.31 | 12 h 35 min | 3.31 |
RCF [13] | 0.801 | 0.820 | 0.803 | 0.895 | 14.8 M | 81.77 | 7 h | 5.95 |
BDCN [14] | 0.811 | 0.831 | 0.796 | 0.890 | +1.5 M | 114.31 | 12 h 35 min | 3.31 |
Methods | Data | ODS | OIS | AP | R50 | GFLOPS | Training Time | FPS |
---|---|---|---|---|---|---|---|---|
RCF [13] | HHA | 0.693 | 0.709 | 0.678 | 0.827 | 94.21 | 13 h 26 min | 3.10 |
RCF [13] | RGB | 0.736 | 0.755 | 0.728 | 0.896 | 94.21 | 13 h 26 min | 3.10 |
RCF [13] | RGB-HHA | 0.751 | 0.770 | 0.765 | 0.899 | – | – | – |
BDCN [14] | HHA | 0.694 | 0.708 | 0.671 | 0.818 | 131.39 | 24 h 26 min | 1.71 |
BDCN [14] | RGB | 0.752 | 0.768 | 0.739 | 0.874 | 131.39 | 24 h 26 min | 1.71 |
BDCN [14] | RGB-HHA | 0.762 | 0.776 | 0.761 | 0.869 | – | – | – |
BDP-Net | HHA | 0.694 | 0.708 | 0.676 | 0.812 | 94.25 | 13 h 43 min | 3.04 |
BDP-Net | RGB | 0.746 | 0.762 | 0.750 | 0.892 | 94.25 | 13 h 43 min | 3.04 |
BDP-Net | RGB-HHA | 0.759 | 0.776 | 0.772 | 0.895 | – | – | – |
Methods | ODS | OIS | AP | R50 | GFLOPS | Training Time | FPS |
---|---|---|---|---|---|---|---|
Human_Boundary [40] | 0.760 | - | - | - | - | - | - |
RCF_Boundary [13] | 0.812 | 0.817 | 0.861 | 0.963 | 98.77 | 1 h 45 min | 3.17 |
BDCN_Boundary [14] | 0.827 | 0.835 | 0.875 | 0.971 | 137.78 | 3 h 15 min | 1.71 |
BDP-Net_Boundary | 0.827 | 0.834 | 0.842 | 0.974 | 98.81 | 1 h 48 min | 3.09 |
Human_Edge [40] | 0.750 | - | - | - | - | - | - |
RCF_Edge [13] | 0.851 | 0.857 | 0.853 | 0.888 | 98.77 | 53 min | 3.14 |
BDCN_Edge [14] | 0.851 | 0.858 | 0.852 | 0.889 | 137.78 | 1 h 36 min | 1.74 |
BDP-Net_Edge | 0.871 | 0.877 | 0.892 | 0.926 | 98.81 | 1 h 6 min | 2.53 |