Deep Cross-Dimensional Attention Hashing for Image Retrieval
Abstract
1. Introduction
- First, this paper designs an end-to-end learning framework, DCDAH, which takes pairs of images as input and directly generates discrete binary codes.
- Second, this paper proposes the CDA module, which is embedded in ResNet18. The module emphasizes cross-dimensional dependence when computing the channel and spatial attention weights, and improves the accuracy of the image feature representation with almost no additional parameters.
- Third, this paper introduces a scheme named Zpool to reduce the dimension of a tensor, which lowers computation while retaining a rich representation; an illustrative sketch of these two components is given after this list, and a detailed introduction follows in later sections. To obtain more discriminative hash codes, a pairwise loss and a balanced loss are adopted to minimize quantization error and preserve pairwise similarity.
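For illustration only, the following PyTorch sketch shows one plausible reading of these two ideas: a Zpool that concatenates channel-wise max- and average-pooling, and a cross-dimensional attention module that applies the same Zpool-based branch to permuted views of the feature tensor so that channel–spatial dimension pairs interact. The class names, kernel size, and three-branch layout are assumptions for the sketch, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class ZPool(nn.Module):
    """Zpool: concatenate max-pooling and average-pooling along the channel
    axis, squeezing C channels down to 2 while keeping both a salient (max)
    and a global (mean) summary of the tensor."""

    def forward(self, x):
        return torch.cat(
            (x.max(dim=1, keepdim=True)[0], x.mean(dim=1, keepdim=True)), dim=1
        )


class AttentionBranch(nn.Module):
    """One attention branch: Zpool the input, score the 2-channel map with a
    single small convolution, and rescale the input by the sigmoid scores."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.zpool = ZPool()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(1)

    def forward(self, x):
        return x * torch.sigmoid(self.bn(self.conv(self.zpool(x))))


class CDAModule(nn.Module):
    """Cross-dimensional attention sketch: three branches attend over the
    (H, W), (C, W) and (C, H) planes by permuting the tensor before Zpool,
    and their rescaled outputs are averaged."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.hw = AttentionBranch(kernel_size)
        self.cw = AttentionBranch(kernel_size)
        self.ch = AttentionBranch(kernel_size)

    def forward(self, x):
        y_hw = self.hw(x)                                          # plain spatial attention
        y_cw = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)  # swap C and H
        y_ch = self.ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # swap C and W
        return (y_hw + y_cw + y_ch) / 3.0
```

Because each branch only adds a single 2-to-1 convolution and a batch-normalization layer, this style of attention keeps the parameter overhead negligible, which is consistent with the second contribution above.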
2. Related Work
2.1. Unsupervised Hashing
2.2. Supervised Hashing
3. Deep Cross-Dimensional Attention Hashing
3.1. Problem Statement
3.2. Network Architecture
3.3. CDA Model
3.4. Measure
3.5. Learning
Algorithm 1. DCDAH.
Input: Training images, taken in pairs, with their pairwise similarity labels.
Output: The updated network parameters.
Initialization: The parameters of the ResNet18 model are initialized with a Gaussian distribution.
Repeat:
  Randomly sample a small batch of image data from the input images;
  Propagate the batch forward through the network to obtain the hash layer outputs;
  Compute the loss terms for the batch;
  Calculate the partial derivatives according to (18), (21) and (22);
  Carry out back propagation and update the parameters iteratively.
Until: A fixed number of iterations is completed.
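Algorithm 1 can be grounded in a short, hedged PyTorch training loop. The concrete loss expressions below (a likelihood-style pairwise term, a quantization term, and a bit-balance term) are placeholders standing in for Equations (18), (21) and (22), which are not reproduced in this section; `net`, `loader`, and the hyperparameter values are assumptions.

```python
import torch
import torch.nn.functional as F


def train_dcdah(net, loader, epochs=150, lr=1e-3, alpha=0.1, device="cuda"):
    """Illustrative training loop for a pairwise deep-hashing network.

    `net` maps an image batch to relaxed codes u in (-1, 1)^L; `loader`
    yields (images, labels) with labels as multi-hot float vectors."""
    opt = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    net.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:                 # random mini-batch
            images, labels = images.to(device), labels.to(device)
            u = net(images)                           # relaxed codes, shape (n, L)
            s = (labels @ labels.t() > 0).float()     # pairwise similarity matrix
            inner = u @ u.t() / 2.0
            # pairwise (negative log-likelihood) loss on inner products
            pair_loss = (F.softplus(inner) - s * inner).mean()
            # quantization loss: push relaxed codes toward {-1, +1}
            quant_loss = (u.abs() - 1.0).pow(2).mean()
            # balance loss: each bit should be roughly half -1 and half +1
            bal_loss = u.mean(dim=0).pow(2).mean()
            loss = pair_loss + alpha * (quant_loss + bal_loss)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```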
4. Experiments
4.1. Data Sets
4.2. Evaluation Index
4.3. Hyperparameter Analysis
4.4. Ablation Experiments
4.5. Analysis of Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wei, Y.; Zhao, Y.; Lu, C.; Wei, S.; Liu, L.; Zhu, Z.; Yan, S. Cross-Modal Retrieval With CNN Visual Features: A New Baseline. IEEE Trans. Cybern. 2017, 47, 449–460. [Google Scholar] [CrossRef] [PubMed]
- Chaudhuri, U.; Banerjee, B.; Bhattacharya, A.; Datcu, M. Attention-Driven Cross-Modal Remote Sensing Image Retrieval. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Brussels, Belgium, 11–16 July 2021; pp. 4783–4786. [Google Scholar]
- Misra, D.; Nalamada, T.; Uppili Arasanipalai, A.; Hou, Q. Rotate to Attend: Convolutional Triplet Attention Module. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2021; pp. 3138–3147. Available online: https://openaccess.thecvf.com/content/WACV2021/html/Misra_Rotate_to_Attend_Convolutional_Triplet_Attention_Module_WACV_2021_paper.html (accessed on 12 October 2022).
- Pachori, S.; Deshpande, A.; Raman, S. Hashing in the zero-shot framework with domain adaptation. Neurocomputing 2018, 275, 2137–2149. [Google Scholar] [CrossRef]
- Venkateswara, H.; Eusebio, J.; Chakraborty, S.; Panchanathan, S. Deep Hashing Network for Unsupervised Domain Adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5385–5394. [Google Scholar]
- Du, A.; Cheng, S.; Wang, L. Low-Rank Semantic Feature Reconstruction Hashing for Remote Sensing Retrieval. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
- Wang, B.; Lu, X.; Zheng, X.; Li, X. Semantic descriptions of high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1274–1278. [Google Scholar] [CrossRef]
- Guo, Y.; Ding, G.; Liu, L.; Han, J.; Shao, L. Learning to hash with optimized anchor embedding for scalable retrieval. IEEE Trans. Image Process. 2017, 26, 1344–1354. [Google Scholar] [CrossRef]
- Bergamo, A.; Torresani, L.; Fitzgibbon, A. Picodes: Learning a Compact Code for Novel-Category Recognition. In NIPS. 2011, pp. 2088–2096. Available online: https://proceedings.neurips.cc/paper/2011/hash/1896a3bf730516dd643ba67b4c447d36-Abstract.html (accessed on 12 October 2022).
- Liu, D.; Shen, J.; Xia, Z.; Sun, X. A content-based image retrieval scheme using an encrypted difference histogram in cloud computing. Information 2017, 8, 96. [Google Scholar] [CrossRef]
- Bronstein, M.M.; Bronstein, A.M.; Michel, F.; Paragios, N. Data fusion through cross-modality metric learning using similarity-sensitive hashing. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 3594–3601. [Google Scholar]
- Webb, B.S.; Dhruv, N.T.; Solomon, S.G. Early and late mechanisms of surround suppression in striate cortex of macaque. J. Neurosci. 2005, 25, 11666–11675. [Google Scholar] [CrossRef]
- Vedaldi, A.; Zisserman, A. Efficient Additive Kernels via Explicit Feature Maps. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 480–492. [Google Scholar] [CrossRef]
- Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 1717–1724. [Google Scholar]
- Li, P.; Han, L. Hashing nets for hashing: A quantized deep learning to hash framework for remote sensing image retrieval. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7331–7345. [Google Scholar] [CrossRef]
- Lin, K.; Lu, J.; Chen, C.; Zhou, J. Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1183–1192. [Google Scholar]
- Deng, C.; Yang, E.; Liu, T.; Li, J.; Liu, W.; Tao, D. Unsupervised Semantic-Preserving Adversarial Hashing for Image Search. IEEE Trans. Image Process. 2019, 28, 4032–4044. [Google Scholar] [CrossRef]
- Zhang, H.; Gu, Y.; Yao, Y.; Zhang, Z.; Liu, L.; Zhang, J.; Shao, L. Deep Unsupervised Self-Evolutionary Hashing for Image Retrieval. IEEE Trans. Multim. 2021, 23, 3400–3413. [Google Scholar] [CrossRef]
- Zhang, J.; Peng, Y. SSDH: Semi-Supervised Deep Hashing for Large Scale Image Retrieval. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 212–225. [Google Scholar] [CrossRef]
- Zheng, S.; Wang, L.; Du, A. Deep Semantic-Preserving Reconstruction Hashing for Unsupervised Cross-Modal Retrieval. Entropy 2020, 22, 1266. [Google Scholar]
- Zhu, H.; Gao, S. Locality Constrained Deep Supervised Hashing for Image Retrieval. In Proceedings of the 2017 International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3567–3573. [Google Scholar]
- Liu, C.; Ma, J.; Tang, X.; Liu, F.; Zhang, X.; Jiao, L. Deep Hash Learning for Remote Sensing Image Retrieval. IEEE Trans. Geosci. Remote. Sens. 2021, 59, 3420–3443. [Google Scholar] [CrossRef]
- Yan, C.; Gong, B.; Wei, Y.; Gao, Y. Deep Multi-View Enhancement Hashing for Image Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1445–1451. [Google Scholar] [CrossRef]
- Gong, Y.; Lazebnik, S.; Gordo, A.; Perronnin, F. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval. IEEE Trans. Pattern Anal. Mach Intell. 2013, 35, 2916–2929. [Google Scholar] [CrossRef] [PubMed]
- Lu, J.; Hu, J.; Zhou, J. Deep Metric Learning for Visual Understanding: An Overview of Recent Advances. IEEE Signal Process. 2017, 34, 76–84. [Google Scholar] [CrossRef]
- Long, J.; Wei, X.; Qi, Q.; Wang, Y. A deep hashing method based on attention module for image retrieval. In Proceedings of the 2020 13th International Conference on Intelligent Computation Technology and Automation (ICICTA), Xi’an, China, 24–25 October 2020; pp. 284–288. [Google Scholar]
- Cheng, S.; Wang, L.; Du, A.; Li, Y. Bidirectional Focused Semantic Alignment Attention Network for Cross-Modal Retrieval. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 4340–4344. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. ECCV. 2018, pp. 3–19. Available online: https://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.html (accessed on 12 October 2022).
- Shen, F.; Xu, Y.; Liu, L.; Yang, Y.; Huang, Z.; Shen, H.T. Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 3034–3044. [Google Scholar] [CrossRef]
- Luo, C.C. A novel web attack detection system for internet of things via ensemble classification. IEEE Trans. Ind. Inform. 2020, 17, 5810–5818. [Google Scholar] [CrossRef]
- Liu, W.; Wang, J.; Ji, R.; Jiang, Y.; Chang, S. Supervised hashing with kernels. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 2074–2081. [Google Scholar]
- Xia, R.; Pan, Y.; Lai, H.; Liu, C.; Yan, S. Supervised Hashing for Image Retrieval via Image Representation Learning. In AAAI. 2014, pp. 2156–2162. Available online: https://web.archive.org/web/*/http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8137 (accessed on 12 October 2022).
- Liu, H.; Wang, R.; Shan, S.; Chen, X. Deep Supervised Hashing for Fast Image Retrieval. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2064–2072. [Google Scholar]
- Cao, Y.; Long, M.; Liu, B.; Wang, J. Deep Cauchy Hashing for Hamming Space Retrieval. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1229–1237. [Google Scholar]
- Zheng, X.; Zhang, Y. Deep balanced discrete hashing for image retrieval. Neurocomputing 2020, 403, 224–236. [Google Scholar] [CrossRef]
- Li, X.; Xu, M.; Xu, J.; Weise, T.; Zou, L.; Sun, F.; Wu, Z. Image Retrieval Using a Deep Attention-Based Hash. IEEE Access 2020, 8, 142229–142242. [Google Scholar] [CrossRef]
- Jin, L.; Shu, X.; Li, K.; Li, Z.; Qi, G.; Tang, J. Deep Ordinal Hashing with Spatial Attention. IEEE Trans. Image Process. 2019, 28, 2173–2186. [Google Scholar] [CrossRef] [PubMed]
- Yang, W.; Wang, L.; Cheng, S. Deep parameter-free attention hashing for image retrieval. Sci. Rep. 2022, 12, 7082. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Pei, W.; Zha, Y.; Gemert, J. Push for Quantization: Deep Fisher Hashing. BMVC 2019, 21. Available online: https://bmvc2019.org/wp-content/uploads/papers/0938-paper.pdf (accessed on 12 October 2022).
- Li, Q.; Sun, Z.; He, R.; Tan, T. Deep Supervised Discrete Hashing. Adv. Neural Inf. Processing Syst. 2017, 30, 2482–2491. [Google Scholar]
- Wang, X.; Shi, Y.; Kitani, K. Deep Supervised Hashing with Triplet Labels. In Asian Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 70–84. [Google Scholar]
- Cao, Z.; Long, M.; Wang, J.; Yu, P. HashNet: Deep Learning to Hash by Continuation. ICCV 2017, 5609–5618. Available online: https://www.computer.org/csdl/proceedings-article/iccv/2017/1032f609/12OmNqGA5a7 (accessed on 12 October 2022).
- Zhang, Z.; Zou, Q.; Lin, Y.; Chen, L.; Wang, S. Improved Deep Hashing with Soft Pairwise Similarity for Multi-label Image Retrieval. IEEE Trans. Multim. 2020, 22, 540–553. [Google Scholar] [CrossRef]
- Chen, Y.; Kalantidis, Y.; Li, J.; Yan, S.; Feng, J. A2-Nets: Double Attention Networks. Adv. Neural Inf. Process. Syst. 2018, 31. Available online: https://proceedings.neurips.cc/paper/2018/hash/e165421110ba03099a1c0393373c5b43-Abstract.html (accessed on 12 October 2022).
- Woo, S.; Park, J.; Lee, J.; Kweon, I.S. BAM: Bottleneck Attention Module. BMVC 2018, 147. Available online: http://bmvc2018.org/contents/papers/0092.pdf (accessed on 12 October 2022).
Layer | Configuration |
---|---|
Convolution layer | {64 × 112 × 112, k = 7 × 7, s = 2 × 2, p = 3 × 3, ReLU} |
Maxpool | {64 × 56 × 56, k = 3 × 3, s = 2 × 2, p = 1 × 1, ReLU} |
Layer1 | {64 × 56 × 56, k = 3 × 3, s = 1 × 1, p = 1 × 1, ReLU} × 4 |
Layer2 | {128 × 28 × 28, k = 3 × 3, s = 2 × 2, p = 1 × 1, ReLU} × 4 |
Layer3 | {256 × 14 × 14, k = 3 × 3, s = 2 × 2, p = 1 × 1, ReLU} × 4 |
Layer4 | {512 × 7 × 7, k = 3 × 3, s = 2 × 2, p = 1 × 1, ReLU} × 4 |
Avgpool | 512 × 1 × 1 |
Hash layer | L (hash code length) |
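As a rough companion to this table, the sketch below builds a ResNet18 trunk with torchvision and replaces its classifier with an L-output hash layer. The tanh relaxation and the omission of the CDA insertion points are simplifying assumptions of the sketch, not a statement of the authors' exact implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class DCDAHNet(nn.Module):
    """Backbone sketch matching the table above: a ResNet18 trunk
    (conv1 .. layer4 + avgpool) whose 1000-way classifier is replaced by
    an L-output fully connected hash layer."""

    def __init__(self, code_length=32):
        super().__init__()
        backbone = resnet18(weights=None)
        # drop the final fc layer, keep everything up to global avgpool
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.hash_layer = nn.Linear(512, code_length)

    def forward(self, x):
        x = self.features(x).flatten(1)        # (n, 512) after avgpool
        return torch.tanh(self.hash_layer(x))  # relaxed codes in (-1, 1)
```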
Item | Configuration |
---|---|
OS | Ubuntu 16.04(×4) |
GPU | Tesla V100 |
Model | Parameters | FLOPs |
---|---|---|
ResNet18 | 22.36 M | 7.29 G |
ResNet18 + A2 Attention | 22.62 M | 7.31 G |
ResNet18 + BAM | 22.44 M | 7.30 G |
ResNet18 + CBAM | 22.40 M | 7.30 G |
Ours | 22.36 M | 7.29 G |
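For reference, parameter counts of this kind can be reproduced with a one-line helper; note that the absolute numbers depend on the counting convention (and, for FLOPs, on the input resolution), so a count obtained this way need not match the table exactly.

```python
import torch
from torchvision.models import resnet18


def millions_of_parameters(model: torch.nn.Module) -> float:
    """Trainable parameters of a model, reported in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


print(f"ResNet18 backbone: {millions_of_parameters(resnet18(weights=None)):.2f} M")
```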
α | CIFAR-10 (mAP@ALL) 16 bit | 32 bit | 48 bit | 64 bit | NUS-WIDE (mAP@5000) 16 bit | 32 bit | 48 bit | 64 bit |
---|---|---|---|---|---|---|---|---|
0.01 | 0.810 | 0.833 | 0.806 | 0.827 | 0.826 | 0.842 | 0.851 | 0.853 |
0.05 | 0.810 | 0.814 | 0.844 | 0.821 | 0.821 | 0.841 | 0.847 | 0.853 |
0.1 | 0.827 | 0.845 | 0.844 | 0.838 | 0.822 | 0.845 | 0.851 | 0.857 |
0.2 | 0.811 | 0.824 | 0.824 | 0.838 | 0.820 | 0.840 | 0.850 | 0.851 |
0.3 | 0.811 | 0.820 | 0.830 | 0.843 | 0.819 | 0.839 | 0.849 | 0.850 |
0.4 | 0.805 | 0.840 | 0.829 | 0.821 | 0.824 | 0.840 | 0.847 | 0.854 |
0.5 | 0.802 | 0.814 | 0.818 | 0.833 | 0.824 | 0.840 | 0.844 | 0.854 |
Framework | DBDH | DCDAH-1 | DCDAH-2 |
---|---|---|---|
AlexNet | √ | | |
ResNet18 | | √ | √ |
CDA module | | | √ |
mAP (32 bit) | 0.773 | 0.814 | 0.845 |
Method | CIFAR-10 (mAP@ALL) 16 bit | 32 bit | 48 bit | 64 bit | NUS-WIDE (mAP@5000) 16 bit | 32 bit | 48 bit | 64 bit |
---|---|---|---|---|---|---|---|---|
DCDAH | 0.827 | 0.845 | 0.844 | 0.838 | 0.822 | 0.845 | 0.851 | 0.857 |
DBDH | 0.798 | 0.814 | 0.815 | 0.820 | 0.811 | 0.834 | 0.839 | 0.849 |
DCH | 0.756 | 0.805 | 0.821 | 0.800 | 0.785 | 0.799 | 0.807 | 0.798 |
DFH | 0.584 | 0.680 | 0.795 | 0.822 | 0.784 | 0.815 | 0.806 | 0.817 |
DSH | 0.555 | 0.479 | 0.562 | 0.499 | 0.676 | 0.759 | 0.785 | 0.783 |
DSDH | 0.765 | 0.806 | 0.812 | 0.802 | 0.768 | 0.782 | 0.773 | 0.765 |
DTSH | 0.695 | 0.784 | 0.804 | 0.805 | 0.815 | 0.834 | 0.837 | 0.832 |
HashNet | 0.543 | 0.646 | 0.746 | 0.764 | 0.718 | 0.802 | 0.809 | 0.742 |
IDHN | 0.777 | 0.777 | 0.728 | 0.736 | 0.786 | 0.766 | 0.727 | 0.566 |
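The mAP@ALL and mAP@5000 figures in these tables follow the standard ranking-based protocol for hashing retrieval. The NumPy sketch below is a minimal, assumed implementation (multi-hot label vectors, codes in {-1, +1}), offered for clarity rather than as the authors' evaluation code.

```python
import numpy as np


def mean_average_precision(query_codes, db_codes, query_labels, db_labels, topk=None):
    """mAP for binary hash codes in {-1, +1}: rank the database by Hamming
    distance to each query and average precision over the relevant positions.
    topk=None gives mAP@ALL; topk=5000 gives mAP@5000."""
    num_bits = query_codes.shape[1]
    aps = []
    for q, ql in zip(query_codes, query_labels):
        # Hamming distance recovered from the inner product of {-1, +1} codes
        dist = 0.5 * (num_bits - db_codes @ q)
        order = np.argsort(dist)
        if topk is not None:
            order = order[:topk]
        relevant = (db_labels[order] @ ql) > 0   # share at least one label
        if relevant.sum() == 0:
            continue
        hits = np.cumsum(relevant)
        precision = hits / np.arange(1, len(order) + 1)
        aps.append((precision * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```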
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).