Swin Transformer Based on Two-Fold Loss and Background Adaptation Re-Ranking for Person Re-Identification
Abstract
1. Introduction
- In the training phase, many background removal methods discard the background entirely, which prevents the Re-ID model from exploiting the small amount of valuable background information that helps distinguish different identities.
- In the retrieval phase, some re-ranking algorithms consider only the original features and do not further compute a mixed metric over the original and background-removed features, so many positive samples still rank low after the distance is recalculated because of background interference.
- (1) Methods that completely remove the background cannot learn any valuable background information. In contrast with these methods, TL-TransNet is proposed to make the Re-ID model preserve a small amount of background-related information while paying more attention to pedestrian body information. The parameters of TL-TransNet are optimized by minimizing a two-fold loss.
- (2) A background adaptation re-ranking method is developed to improve the ranking of positive samples that rank low due to background interference in the retrieval phase. First, DeepLabV3+ [13] is used as the pedestrian background segmentation model, and pedestrian body regions are extracted directly with the resulting mask in both the probe and gallery sets. Then, the proposed re-ranking based on a mixed metric (i.e., Euclidean and Jaccard distances) combines the original feature with the background-removed feature to obtain a more reliable rank list that is robust to interfering background information (see the sketch after this list).
- (3) Comprehensive experiments show that the proposed method improves generalization ability with respect to background variation.
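As referenced in contribution (2), the following is a minimal NumPy sketch of the mixed-metric fusion behind background adaptation re-ranking. The function names, the `jaccard_dist` callable (for instance, the k-reciprocal Jaccard distance of [41]), and the weight `lam` are illustrative assumptions rather than the paper's exact interface.

```python
import numpy as np

def euclidean_dist(x, y):
    # Pairwise Euclidean distance between row-wise feature matrices (Q, D) x (G, D).
    sq = (x ** 2).sum(1, keepdims=True) + (y ** 2).sum(1) - 2.0 * x @ y.T
    return np.sqrt(np.maximum(sq, 0.0))

def background_adaptation_rerank(q_feat, g_feat, q_bg_removed, g_bg_removed,
                                 jaccard_dist, lam=0.5):
    """Sketch of the mixed-metric re-ranking: Euclidean distance on the
    original features fused with a Jaccard distance (e.g., k-reciprocal
    encoding, cf. [41]) on the background-removed features.
    `jaccard_dist` and `lam` are assumed interfaces/parameters."""
    d_euc = euclidean_dist(q_feat, g_feat)
    d_jac = jaccard_dist(q_bg_removed, g_bg_removed)
    d_mixed = lam * d_euc + (1.0 - lam) * d_jac  # assumed weighted fusion
    return np.argsort(d_mixed, axis=1)           # rank list per query
```

The key design point is that the Euclidean term is computed on the original features while the Jaccard term uses the background-removed features, so background clutter can degrade only one of the two metrics.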
2. Related Work
2.1. Feature Representation Learning for Person Re-ID
2.2. Metric Learning for Person Re-ID
2.3. Re-Ranking for Person Re-ID
2.4. Semantic Segmentation
3. Proposed Method
3.1. Pipeline and Overview
3.2. Training with TL-TransNet
- Circle loss is introduced to strengthen the model's ability to identify pedestrians. As an improved triplet loss, it designs weights to control the gradient contributions of positive and negative samples relative to each other. Given $L$ classes in a person Re-ID dataset, a triplet input is composed of three samples $x_a$, $x_p$, and $x_n$. $x_a$ is an anchor sample belonging to class $a$, where $a \in \{1, 2, \ldots, L\}$; $x_p$ is a positive sample that comes from the same person class as $x_a$, while $x_n$ is a negative sample taken from a different person class than $x_a$. The circle loss is computed as follows:
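The equation itself is reproduced here from the unified formulation of [39], on the assumption that the paper adopts it unchanged, with $K$ within-class similarity scores $s_p^i$ and $N$ between-class similarity scores $s_n^j$:

$$
\mathcal{L}_{\text{circle}} = \log \left[ 1 + \sum_{j=1}^{N} \exp\left( \gamma \, \alpha_n^j \left( s_n^j - \Delta_n \right) \right) \sum_{i=1}^{K} \exp\left( -\gamma \, \alpha_p^i \left( s_p^i - \Delta_p \right) \right) \right]
$$

where $\gamma$ is a scale factor; the adaptive weights are $\alpha_p^i = \left[ O_p - s_p^i \right]_+$ and $\alpha_n^j = \left[ s_n^j - O_n \right]_+$; and, for a relaxation margin $m$, the optima and margins are set to $O_p = 1 + m$, $O_n = -m$, $\Delta_p = 1 - m$, and $\Delta_n = m$.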
- Instance loss is added to provide better weight initialization for TL-TransNet and to encourage the Re-ID model to find fine-grained details that distinguish different pedestrians. Because instance loss explicitly considers the inter-class distance, it can reduce the feature correlation between two pedestrian images. The instance loss is formulated as follows:
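Following [40], the instance loss is the softmax cross-entropy that treats each identity as its own class; the variant used in TL-TransNet is assumed to match this standard form:

$$
\mathcal{L}_{\text{instance}} = -\log \frac{\exp\left( W_c^{\top} f \right)}{\sum_{k=1}^{L} \exp\left( W_k^{\top} f \right)}
$$

where $f$ is the feature of the input pedestrian image, $W_k$ is the classifier weight of class $k$, and $c$ is the ground-truth identity of the input.

Putting the two terms together, here is a minimal PyTorch sketch of the two-fold objective. The convex combination with weight `lam` is an assumption suggested by the parameter sweep in Section 4.5; the exact fusion rule is not stated in this section.

```python
import torch.nn as nn

class TwoFoldLoss(nn.Module):
    """Sketch of the two-fold objective as an assumed convex combination
    of circle loss [39] and instance loss [40]."""
    def __init__(self, circle_loss: nn.Module, instance_loss: nn.Module,
                 lam: float = 0.5):
        super().__init__()
        self.circle_loss = circle_loss      # pair-similarity (metric) term
        self.instance_loss = instance_loss  # per-identity classification term
        self.lam = lam                      # assumed trade-off weight

    def forward(self, embeddings, logits, labels):
        l_circle = self.circle_loss(embeddings, labels)
        l_instance = self.instance_loss(logits, labels)
        return self.lam * l_circle + (1.0 - self.lam) * l_instance
```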
3.3. Pedestrian Background Segmentation and Removal
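This step relies on DeepLabV3+ [13] to segment pedestrians and zero out the background. As an illustration only, the sketch below uses torchvision's off-the-shelf DeepLabV3 as a stand-in (the "+" variant and the paper's segmentation training details are not reproduced) to build a person mask and remove the background:

```python
import torch
import torchvision.transforms as T
import torchvision.transforms.functional as TF
from torchvision.models.segmentation import deeplabv3_resnet101

PERSON = 15  # index of the 'person' class in the Pascal VOC label set

# Stand-in for DeepLabV3+ [13]: torchvision ships DeepLabV3, not the "+" variant.
model = deeplabv3_resnet101(pretrained=True).eval()
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

@torch.no_grad()
def remove_background(pil_image):
    """Zero out all non-pedestrian pixels of a PIL image."""
    x = normalize(TF.to_tensor(pil_image)).unsqueeze(0)      # (1, 3, H, W)
    logits = model(x)["out"]                                 # (1, 21, H, W)
    person_mask = (logits.argmax(dim=1) == PERSON).float()   # (1, H, W)
    return TF.to_tensor(pil_image) * person_mask             # (3, H, W)
```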
3.4. Background Adaptation Re-Ranking
4. Experimental Section
4.1. Datasets and Evaluation Metric
- Market-1501 is a dataset of 32,668 pedestrian images collected from 6 cameras on a campus. It is divided into two subsets: the training set consists of 12,936 images of 751 subjects, and the testing set is composed of 19,732 images of 750 subjects.
- DukeMTMC-reID is a large-scale person re-identification dataset captured by eight different cameras in the real world. Its training set contains 16,522 images of 702 IDs, and its test set is composed of 2228 query images and 17,661 gallery images of another 702 IDs.
| Benchmark | Item | Total | Train | Test |
| --- | --- | --- | --- | --- |
| Market-1501 | ID | 1501 | 751 | 750 |
| Market-1501 | Image | 32,668 | 12,936 | 19,732 |
| DukeMTMC-reID | ID | 1404 | 702 | 702 |
| DukeMTMC-reID | Image | 36,411 | 16,522 | 17,661 |
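Both benchmarks are evaluated with Rank-1 accuracy and mean average precision (mAP). For reference, here is a simplified sketch of how these numbers are obtained; it deliberately omits the same-camera and junk-image filtering of the official protocols, which a faithful evaluation must include:

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mAP from a query-by-gallery distance matrix.
    Simplified: no camera-based or junk-image filtering."""
    order = np.argsort(dist, axis=1)                     # gallery indices, best first
    matches = gallery_ids[order] == query_ids[:, None]   # boolean hit matrix (Q, G)
    rank1 = matches[:, 0].mean()
    average_precisions = []
    for row in matches:
        hit_positions = np.where(row)[0]
        if hit_positions.size == 0:
            continue                                     # query with no true match
        precision_at_hits = (np.arange(hit_positions.size) + 1) / (hit_positions + 1)
        average_precisions.append(precision_at_hits.mean())
    return float(rank1), float(np.mean(average_precisions))
```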
4.2. Experimental Settings
4.3. Ablation Studies
4.4. Visualized Attention Maps of the TL-TransNet
4.5. The Impact of the Two-Fold Loss with Different Parameter Values
4.6. Rank-List Visualization Analysis
4.7. Comparison with the State-of-the-Art Method
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Wang, X.; Liu, M.; Raychaudhuri, D.S.; Paul, S.; Wang, Y.; Roy-Chowdhury, A.K. Learning person re-identification models from videos with weak supervision. IEEE Trans. Image Process. 2021, 30, 3017–3028.
2. Yang, Q.; Wu, A.; Zheng, W. Person re-identification by contour sketch under moderate clothing change. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2029–2046.
3. Lin, S.; Li, C.-T.; Kot, A.C. Multi-domain adversarial feature generalization for person re-identification. IEEE Trans. Image Process. 2021, 30, 1596–1607.
4. Zhao, H.; Min, W.; Xu, J.; Han, Q.; Wang, Q.; Yang, Z.; Zhou, L. SPACE: Finding key-speaker in complex multi-person scenes. IEEE Trans. Emerg. Topics Comput. 2021.
5. Wang, Q.; Min, W.; Han, Q.; Liu, Q.; Zha, C.; Zhao, H.; Wei, Z. Inter-domain adaptation label for data augmentation in vehicle re-identification. IEEE Trans. Multimed. 2022, 24, 1031–1041.
6. Wang, S.; Duan, L.; Yang, N.; Dong, J. Person re-identification with deep dense feature representation and Joint Bayesian. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3560–3564.
7. Song, C.; Huang, Y.; Ouyang, W.; Wang, L. Mask-guided contrastive attention model for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1179–1188.
8. Yang, C.; Qi, F.; Jia, H. Part-weighted deep representation learning for person re-identification. In Proceedings of the 2020 International Conference on Computing and Data Science (CDS), Stanford, CA, USA, 1–2 August 2020; pp. 36–39.
9. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 3–7 May 2021.
10. Tian, M.; Yi, S.; Li, H.; Li, S.; Zhang, X.; Shi, J.; Yan, J.; Wang, X. Eliminating background-bias for robust person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 5794–5803.
11. Huang, Y.; Wu, Q.; Xu, J.; Zhong, Y. SBSGAN: Suppression of inter-domain background shift for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 9526–9535.
12. Mansouri, N.; Ammar, S.; Kessentini, Y. Improving person re-identification by combining Siamese convolutional neural network and re-ranking process. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–8.
13. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
14. Li, Y.; Zhou, L.; Hu, X.; Zhang, J. A combined feature representation of deep feature and hand-crafted features for person re-identification. In Proceedings of the 2016 International Conference on Progress in Informatics and Computing (PIC), Shanghai, China, 23–25 December 2016; pp. 224–227.
15. Zheng, S.; Li, X.; Men, A.; Guo, X.; Yang, B. Integration of deep features and hand-crafted features for person re-identification. In Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China, 10–14 July 2017; pp. 674–679.
16. Li, J.; Zhang, S.; Tian, Q.; Wang, M.; Gao, W. Pose-guided representation learning for person re-identification. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 622–635.
17. Zhang, L.; Liu, F.; Zhang, D. Adversarial view confusion feature learning for person re-identification. IEEE Trans. Circ. Syst. Video Technol. 2022, 31, 1490–1502.
18. Han, H.; Tang, J.; Huang, L.; Zhang, Y. Fine and coarse-grained feature learning for unsupervised person re-identification. In Proceedings of the 2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), Shanghai, China, 27–29 August 2021; pp. 314–318.
19. Wei, L.; Wei, Z.; Jin, Z.; Yu, Z.; Huang, J.; Cai, D.; He, X.; Hua, X.-S. SIF: Self-inspirited feature learning for person re-identification. IEEE Trans. Image Process. 2020, 29, 4942–4951.
20. Zhao, C.; Lv, X.; Zhang, Z.; Zuo, W.; Wu, J.; Miao, D. Deep fusion feature representation learning with hard mining center-triplet loss for person re-identification. IEEE Trans. Multimed. 2020, 22, 3180–3195.
21. Chen, C.; Dou, H.; Hu, X.; Peng, S. Deep top-rank counter metric for person re-identification. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2732–2739.
22. Fernando, D.N.; del Carmen, D.J.; Cajote, R. Descriptor extraction and distance metric learning for a robust person re-identification system. In Proceedings of the TENCON 2018—2018 IEEE Region 10 Conference, Jeju, Korea, 28–31 October 2018; pp. 477–482.
23. Nguyen, B.; De Baets, B. Kernel distance metric learning using pairwise constraints for person re-identification. IEEE Trans. Image Process. 2019, 28, 589–600.
24. Yang, X.; Wang, M.; Tao, D. Person re-identification with metric learning using privileged information. IEEE Trans. Image Process. 2018, 27, 791–805.
25. Yu, H.-X.; Wu, A.; Zheng, W.-S. Unsupervised person re-identification by deep asymmetric metric embedding. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 956–973.
26. Xu, R.; Shen, F.; Wu, H.; Zhu, J.; Zeng, H. Dual modal meta metric learning for attribute-image person re-identification. In Proceedings of the 2021 IEEE International Conference on Networking, Sensing and Control (ICNSC), Xiamen, China, 3–5 December 2021; pp. 1–6.
27. Guo, R.-P.; Li, C.-G.; Li, Y.; Lin, J. Density-adaptive kernel based re-ranking for person re-identification. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 982–987.
28. Xu, T.; Zhao, X.; Hou, J.; Zhang, J.; Hao, X.; Yin, J. A general re-ranking method based on metric learning for person re-identification. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6.
29. Wu, G.; Zhu, X.; Gong, S. Person re-identification by ranking ensemble representations. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2259–2263.
30. Bai, S.; Tang, P.; Torr, P.H.; Latecki, L.J. Re-ranking via metric fusion for object retrieval and person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 740–749.
31. Jiang, L.; Liang, C.; Xu, D.; Huang, W. Multi-similarity re-ranking for person re-identification. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1212–1216.
32. Chen, S.; Guo, C.; Lai, J. Deep ranking for person re-identification via joint representation learning. IEEE Trans. Image Process. 2016, 25, 2353–2367.
33. Mortezaie, Z.; Hassanpour, H.; Beghdadi, A. A color-based re-ranking process for people re-identification: Paper ID 21. In Proceedings of the 2021 European Workshop on Visual Information Processing (EUVIP), Paris, France, 23–25 June 2021; pp. 1–5.
34. Zhang, X.; Li, N.; Zhang, R.; Li, G. Pedestrian re-identification method based on bilateral feature extraction network and re-ranking. In Proceedings of the 2021 International Conference on Artificial Intelligence, Big Data and Algorithms (CAIBDA), Xi'an, China, 28–30 May 2021; pp. 191–197.
35. Wang, W.; Zhou, T.; Yu, F.; Dai, J.; Konukoglu, E.; Gool, L.V. Exploring cross-image pixel contrast for semantic segmentation. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 7283–7293.
36. Zhou, T.; Zhang, M.; Zhao, F.; Li, J. Regional semantic contrast and aggregation for weakly supervised semantic segmentation. arXiv 2022, arXiv:2203.09653.
37. Zhou, T.; Wang, W.; Konukoglu, E.; Gool, L.V. Rethinking semantic segmentation: A prototype view. arXiv 2022, arXiv:2203.15102.
38. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
39. Sun, Y.; Cheng, C.; Zhang, Y.; Zhang, C.; Zheng, L.; Wang, Z.; Wei, Y. Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6398–6407.
40. Zheng, Z.; Zheng, L.; Garrett, M.; Yang, Y.; Xu, M.; Shen, Y.D. Dual-path convolutional image-text embeddings with instance loss. ACM Trans. Multimed. Comput. 2020, 16, 1–23.
41. Zhong, Z.; Zheng, L.; Cao, D.; Li, S. Re-ranking person re-identification with k-reciprocal encoding. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3652–3661.
42. Wang, Y.; Wang, L.; You, Y.; Zou, X.; Chen, V.; Li, S.; Huang, G.; Hariharan, B.; Weinberger, K.Q. Resource aware person re-identification across multiple resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8042–8051.
43. Wu, L.; Wang, Y.; Gao, J.; Wang, M.; Zha, Z.-J.; Tao, D. Deep coattention-based comparator for relative representation learning in person re-identification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 722–735.
44. Chang, Y.; Shi, Y.; Wang, Y.; Tian, Y. Bi-directional re-ranking for person re-identification. In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 48–53.
45. Sun, L.; Liu, J.; Zhu, Y.; Jiang, Z. Local to global with multi-scale attention network for person re-identification. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2254–2258.
46. Shi, X. Person re-identification based on improved residual neural networks. In Proceedings of the 2021 5th International Conference on Communication and Information Systems (ICCIS), Chongqing, China, 15–17 October 2021; pp. 170–174.
47. Zhang, X.; Yan, Y.; Xue, J.-H.; Hua, Y.; Wang, H. Semantic-aware occlusion-robust network for occluded person re-identification. IEEE Trans. Circ. Syst. Video Technol. 2020, 31, 2764–2778.
48. Munir, A.; Martinel, N.; Micheloni, C. Multi branch siamese network for person re-identification. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 2351–2355.
49. Jin, H.; Lai, S.; Qian, X. Occlusion-sensitive person re-identification via attribute-based shift attention. IEEE Trans. Circ. Syst. Video Technol. 2022, 32, 2170–2185.
| Methods | Market-1501 Rank-1 (%) | Market-1501 mAP (%) | DukeMTMC-reID Rank-1 (%) | DukeMTMC-reID mAP (%) |
| --- | --- | --- | --- | --- |
| Backbone (ResNet-50) | 88.36 | 72.65 | 66.47 | 46.75 |
| Backbone (TL-TransNet) | 94.15 | 86.00 | 86.13 | 74.03 |
| Backbone (TL-TransNet) + BAR | 95.34 | 92.60 | 88.91 | 84.83 |
| Methods | Market-1501 Rank-1 (%) | Market-1501 mAP (%) | DukeMTMC-reID Rank-1 (%) | DukeMTMC-reID mAP (%) |
| --- | --- | --- | --- | --- |
| Swin Transformer (Contrastive + Circle) | 94.39 | 85.64 | 85.73 | 72.46 |
| Swin Transformer (Contrastive + Instance) | 94.06 | 85.88 | 87.93 | 75.50 |
| Ours | 95.34 | 92.60 | 88.91 | 84.83 |
| Methods | Market-1501 Rank-1 (%) | Market-1501 mAP (%) | DukeMTMC-reID Rank-1 (%) | DukeMTMC-reID mAP (%) |
| --- | --- | --- | --- | --- |
| Circle | 94.29 | 86.28 | 87.79 | 74.98 |
| Instance | 94.45 | 84.55 | 87.88 | 74.80 |
| Ours | 95.34 | 92.60 | 88.91 | 84.83 |
| Params | Market-1501 Rank-1 (%) | Market-1501 mAP (%) | DukeMTMC-reID Rank-1 (%) | DukeMTMC-reID mAP (%) |
| --- | --- | --- | --- | --- |
| 0.1 | 94.21 | 85.38 | 88.02 | 75.68 |
| 0.2 | 93.79 | 85.42 | 87.93 | 75.56 |
| 0.3 | 94.24 | 85.70 | 87.61 | 75.83 |
| 0.4 | 94.03 | 85.78 | 88.06 | 76.72 |
| 0.5 | 94.15 | 86.00 | 86.13 | 74.03 |
| 0.6 | 94.42 | 85.79 | 87.70 | 76.18 |
| 0.7 | 94.63 | 86.23 | 88.29 | 75.97 |
| 0.8 | 93.79 | 85.67 | 87.16 | 75.83 |
| 0.9 | 94.15 | 85.87 | 86.98 | 75.44 |
| Methods | Market-1501 Rank-1 (%) | Market-1501 mAP (%) | DukeMTMC-reID Rank-1 (%) | DukeMTMC-reID mAP (%) |
| --- | --- | --- | --- | --- |
| DaRe(R) [42] | 86.40 | 69.30 | 75.20 | 57.40 |
| DCC [43] | 86.90 | 69.40 | 84.30 | 69.90 |
| Res50+Bi-re [44] | 84.30 | 76.60 | 76.30 | 65.80 |
| LGMANet [45] | 94.00 | 82.70 | 87.20 | 73.90 |
| Improved ResNet [46] | 85.60 | 69.00 | 78.00 | 60.00 |
| SORN [47] | 94.80 | 84.50 | 86.90 | 74.10 |
| Muti-C [48] | 91.10 | 75.90 | 82.20 | 62.90 |
| BiFeNet + Re-ranking + Circle loss [34] | 94.43 | 89.53 | 82.47 | 77.67 |
| AOPS [49] | 93.40 | 84.10 | 86.20 | 74.10 |
| TL-TransNet + BAR (Ours) | 95.34 | 92.60 | 88.91 | 84.83 |