Effective Multi-Object Tracking via Global Object Models and Object Constraint Learning
Abstract
1. Introduction
- Proposal of a global appearance model for MOT based on contrastive learning among tracked objects;
- Proposal of a global relation motion model for MOT based on adversarial learning with object self motions and relative motions;
- Proposal of object constraint learning to reduce the computational complexity of online learning during model updates.
2. Related Works
2.1. Online Multi-Object Tracking
2.1.1. Multi-Object Tracking with Object-Specific Models
2.1.2. Multi-Object Tracking with Global Models
2.2. Global Appearance Model Learning
2.3. Motion Model Learning
3. Online Multi-Object Tracking
- (1) Recent multi-object tracking methods tend to improve accuracy by applying well-designed detection methods. For example, Refs. [2,28,61] exploit the Faster R-CNN head [62], one of the most popular detection methods. Using this head, they refine public detections by discarding false detections or correcting misaligned ones before feeding them to the multi-object tracking network. Moreover, Ref. [63] attaches an appearance embedding head to a detector [64] so that tracked objects can be identified while also being localized more accurately than with the original public detections. These approaches can improve MOT accuracy, but the overall tracking speed degrades in return because of the computational cost of detection. In contrast, the confidence-based object association algorithm uses public detections as they are, without any refinement or additional inputs from detection heads. To improve accuracy, it instead aims to enhance the association quality, which is key to robust multi-object tracking regardless of the quality of the object locations produced by the detector.
- (2) The confidence-based object association algorithm is one of the representative multi-object tracking methods. It improves tracking accuracy by applying adaptive association methods (i.e., local association and global association) according to the confidences of tracked objects. However, the affinity models it uses for association are somewhat outdated. Therefore, in this work, we present more powerful global appearance (Section 4) and global relation motion (Section 5) affinity models, together with an object constraint learning method (Section 6) that updates the affinity models efficiently. As a result, we can considerably improve both tracking accuracy and speed.
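The adaptive routing described in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each tracklet is assumed to already carry a confidence in [0, 1]. The threshold `tau` is hypothetical. A simple greedy matcher stands in for the optimal (e.g. Hungarian) assignment usually used for local association. High-confidence tracklets are matched to current detections (local association). Low-confidence ones are set aside for global association with other tracklets, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class Tracklet:
    track_id: int
    confidence: float  # assumed to be normalized to [0, 1]


def split_by_confidence(tracklets, tau=0.5):
    """Route tracklets by confidence (tau is a hypothetical threshold).

    High-confidence tracklets go to local association with the current
    detections; low-confidence ones are deferred to global association.
    """
    high = [t for t in tracklets if t.confidence >= tau]
    low = [t for t in tracklets if t.confidence < tau]
    return high, low


def greedy_local_association(affinity, min_affinity=0.0):
    """One-to-one greedy matching on an affinity matrix.

    Rows index high-confidence tracklets, columns index detections.
    Pairs are accepted in descending affinity order; pairs whose affinity
    is not above min_affinity are never matched. This greedy rule is a
    stand-in for an optimal (e.g. Hungarian) assignment.
    """
    candidates = sorted(
        ((a, i, j) for i, row in enumerate(affinity) for j, a in enumerate(row)),
        reverse=True,
    )
    used_rows, used_cols, matches = set(), set(), []
    for a, i, j in candidates:
        if a <= min_affinity:
            break  # all remaining candidates are too weak
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return sorted(matches)


tracks = [Tracklet(1, 0.9), Tracklet(2, 0.3), Tracklet(3, 0.7)]
high, low = split_by_confidence(tracks)
# Affinities between the two high-confidence tracklets and two detections.
matches = greedy_local_association([[0.8, 0.1], [0.2, 0.6]])
```

Here tracklets 1 and 3 are associated locally and tracklet 2 waits for global association. Splitting first keeps the cheap frame-by-frame matching for reliable tracks, and reserves the more expensive global step for fragmented ones.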
3.1. Confidence-Based Object Association
3.2. Affinity Model
4. Global Appearance Model
4.1. Deep Feature Extractor
4.2. Triplet Loss
4.3. Online Hard Triplet Mining and Loss
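Sections 4.2 and 4.3 concern the triplet loss [26] and online hard triplet mining [66]. As a rough illustration only, the batch-hard variant of Hermans et al. [66] can be written as below. For every anchor in a batch, the farthest same-identity embedding and the closest different-identity embedding form the triplet. The toy 2-D embeddings, the Euclidean distance, and the margin of 0.3 are assumptions. The paper's deep feature extractor and its exact loss are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss averaged over anchors with a valid triplet.

    For each anchor: hardest positive = farthest embedding with the same
    label, hardest negative = closest embedding with a different label.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        pos = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [euclidean(embeddings[i], embeddings[j])
               for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Two well-separated identities: every triplet already satisfies the margin.
separated = batch_hard_triplet_loss([[0, 0], [0, 1], [3, 0], [3, 1]], [0, 0, 1, 1])
# Identity 1 lies between the two samples of identity 0, so the loss is positive.
entangled = batch_hard_triplet_loss([[0, 0], [2, 0], [1, 0]], [0, 0, 1])
```

Mining the hardest pair inside each batch avoids enumerating all triplets. This is why such losses suit online updates with the objects tracked in the current frames.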
5. Global Relation Motion Model
5.1. Generative Adversarial Networks
5.2. Generator and Discriminator
5.2.1. Generator
5.2.2. Discriminator
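Section 5 builds the global relation motion model on generative adversarial networks [27], trained with object self motions and relative motions. The generator and discriminator architectures cannot be recovered from this text. The sketch below therefore shows only the standard adversarial objectives of [27], computed from discriminator output probabilities. How the paper feeds motion sequences into the two networks is left out. The non-saturating generator loss is one common choice, not a confirmed detail of this work.

```python
import math

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1


def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss, averaged over a batch.

    d_real: discriminator probabilities on real samples, D(x).
    d_fake: discriminator probabilities on generated samples, D(G(z)).
    Minimizing -[log D(x) + log(1 - D(G(z)))] pushes D(x) toward 1
    and D(G(z)) toward 0.
    """
    terms = [-(math.log(r + EPS) + math.log(1.0 - f + EPS))
             for r, f in zip(d_real, d_fake)]
    return sum(terms) / len(terms)


def generator_loss(d_fake):
    """Non-saturating generator loss -log D(G(z)), averaged over a batch."""
    return sum(-math.log(f + EPS) for f in d_fake) / len(d_fake)


# At D = 0.5 everywhere the discriminator cannot tell real from generated.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
```

In the undecided case the discriminator loss equals 2 log 2, the value at the equilibrium analysed in [27]. A discriminator that separates real from generated motions drives its loss below that value.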
6. Object Constraint Learning
- Constraint 1: becomes lower than ;
- Constraint 2: .
7. Experimental Results
7.1. Datasets
7.2. Implementation Details
7.3. Performance Evaluation Metrics
7.4. Comparison on the MOT Benchmark Challenge
7.5. Ablation Studies
7.5.1. Comparison with the Baseline MOT Method
7.5.2. Global Object Model Comparison
- (M1) Baseline multi-object tracking method;
- (M2) Combining the global relation motion model with M1;
- (M3) Combining the global appearance and global relation motion models with M1.
7.5.3. Appearance Model Comparison
7.5.4. Motion Model Comparison
7.5.5. Object Constraint Learning
7.6. Qualitative Results
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, X.; Fan, B.; Chang, S.; Wang, Z.; Liu, X.; Tao, D.; Huang, T.S. Greedy batch-based minimum-cost flows for tracking multiple objects. IEEE TIP 2017, 26, 4765–4776.
- Hornakova, A.; Henschel, R.; Rosenhahn, B.; Swoboda, P. Lifted disjoint paths with application in multiple object tracking. In Proceedings of the ICML, Virtual, 12–18 July 2020; pp. 4364–4375.
- Chen, L.; Ai, H.; Chen, R.; Zhuang, Z. Aggregate tracklet appearance features for multi-object tracking. IEEE Signal Process. Lett. 2019, 26, 1613–1617.
- Yang, B.; Nevatia, R. Multi-target tracking by online learning of non-linear motion patterns and robust appearance models. In Proceedings of the CVPR, Providence, RI, USA, 16–21 June 2012; pp. 1918–1925.
- Kim, C.; Li, F.; Rehg, J.M. Multi-object tracking with neural gating using bilinear lstm. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 200–215.
- Fagot-Bouquet, L.; Audigier, R.; Dhome, Y.; Lerasle, F. Improving multi-frame data association with sparse representations for robust near-online multi-object tracking. In Proceedings of the ECCV, Amsterdam, Netherlands, 8–16 October 2016; pp. 774–790.
- He, Y.; Wei, X.; Hong, X.; Ke, W.; Gong, Y. Identity-Quantity Harmonic Multi-Object Tracking. IEEE Trans. Image Process. 2022, 31, 2201–2215.
- Wang, G.; Wang, Y.; Gu, R.; Hu, W.; Hwang, J.N. Split and connect: A universal tracklet booster for multi-object tracking. IEEE Trans. Multimed. 2022.
- Bae, S.H.; Yoon, K.J. Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 595–610.
- Eiselein, V.; Arp, D.; Pätzold, M.; Sikora, T. Real-time multi-human tracking using a probability hypothesis density filter and multiple detectors. In Proceedings of the AVSS, Beijing, China, 18–21 September 2012; pp. 325–330.
- Chu, P.; Fan, H.; Tan, C.C.; Ling, H. Online multi-object tracking with instance-aware tracker and dynamic model refreshment. In Proceedings of the WACV, Waikoloa Village, HI, USA, 7–11 January 2019; pp. 161–170.
- Tian, W.; Lauer, M.; Chen, L. Online multi-object tracking using joint domain information in traffic scenarios. IEEE Trans. Intell. Transp. Syst. 2019, 21, 374–384.
- Feng, W.; Hu, Z.; Wu, W.; Yan, J.; Ouyang, W. Multi-object tracking with multiple cues and switcher-aware classification. arXiv 2019, arXiv:1901.06129.
- He, Q.; Wu, J.; Yu, G.; Zhang, C. Sot for mot. arXiv 2017, arXiv:1712.01059.
- Zhu, J.; Yang, H.; Liu, N.; Kim, M.; Zhang, W.; Yang, M.H. Online multi-object tracking with dual matching attention networks. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 366–382.
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In Proceedings of the ECCV, Tel Aviv, Israel, 23–27 October 2022.
- Liu, Q.; Chen, D.; Chu, Q.; Yuan, L.; Liu, B.; Zhang, L.; Yu, N. Online multi-object tracking with unsupervised re-identification learning and occlusion estimation. Neurocomputing 2022, 483, 333–347.
- Chu, Q.; Ouyang, W.; Liu, B.; Zhu, F.; Yu, N. Dasot: A unified framework integrating data association and single object tracking for online multi-object tracking. In Proceedings of the AAAI, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10672–10679.
- Baisa, N.L. Robust online multi-target visual tracking using a HISP filter with discriminative deep appearance learning. J. Vis. Commun. Image Represent. 2021, 77, 102952.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the CVPR, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Yang, B.; Nevatia, R. An online learned CRF model for multi-target tracking. In Proceedings of the CVPR, Providence, RI, USA, 16–21 June 2012; pp. 2034–2041.
- Kuo, C.H.; Huang, C.; Nevatia, R. Multi-target tracking by on-line learned discriminative appearance models. In Proceedings of the CVPR, San Francisco, CA, USA, 13–18 June 2010; pp. 685–692.
- Yoon, Y.c.; Boragule, A.; Song, Y.m.; Yoon, K.; Jeon, M. Online multi-object tracking with historical appearance matching and scene adaptive detection filtering. In Proceedings of the AVSS, Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
- Chu, P.; Ling, H. Famnet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. In Proceedings of the ICCV, Seoul, Korea, 27 October–2 November 2019; pp. 6172–6181.
- Zhao, D.; Fu, H.; Xiao, L.; Wu, T.; Dai, B. Multi-object tracking with correlation filter for autonomous vehicle. Sensors 2018, 18, 2004.
- Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the CVPR, Boston, MA, USA, 7–12 June 2015; pp. 815–823.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
- Hornakova, A.; Kaiser, T.; Swoboda, P.; Rolinek, M.; Rosenhahn, B.; Henschel, R. Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths. In Proceedings of the ICCV, Virtual, 11–17 October 2021; pp. 6330–6340.
- Peng, J.; Wang, T.; Lin, W.; Wang, J.; See, J.; Wen, S.; Ding, E. TPM: Multiple object tracking with tracklet-plane matching. Pattern Recognit. 2020, 107, 107480.
- Shi, J.; Tomasi, C. Good features to track. In Proceedings of the CVPR, Seattle, WA, USA, 21–23 June 1994; pp. 593–600.
- Wang, B.; Wang, G.; Luk Chan, K.; Wang, L. Tracklet association with online target-specific metric learning. In Proceedings of the CVPR, Columbus, OH, USA, 23–28 June 2014; pp. 1234–1241.
- Lee, S.H.; Kim, M.Y.; Bae, S.H. Learning discriminative appearance models for online multi-object tracking with appearance discriminability measures. IEEE Access 2018, 6, 67316–67328.
- Wang, B.; Wang, G.; Chan, K.L.; Wang, L. Tracklet association by online target-specific metric learning and coherent dynamics estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 589–602.
- Milan, A.; Rezatofighi, S.H.; Dick, A.; Reid, I.; Schindler, K. Online multi-target tracking using recurrent neural networks. In Proceedings of the AAAI, San Francisco, CA, USA, 4–9 February 2017.
- Chen, L.; Ai, H.; Shang, C.; Zhuang, Z.; Bai, B. Online multi-object tracking with convolutional neural networks. In Proceedings of the ICIP, Beijing, China, 17–20 September 2017; pp. 645–649.
- Dong, X.; Shen, J. Triplet Loss in Siamese Network for Object Tracking. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018.
- Unde, A.S.; Rameshan, R.M. MOTS R-CNN: Cosine-margin-triplet loss for multi-object tracking. arXiv 2021, arXiv:2102.03512.
- Lusardi, C.; Taufique, A.M.N.; Savakis, A. Robust Multi-Object Tracking Using Re-Identification Features and Graph Convolutional Networks. In Proceedings of the ICCVW, Virtual, 11–17 October 2021; pp. 3868–3877.
- Leal-Taixé, L.; Canton-Ferrer, C.; Schindler, K. Learning by Tracking: Siamese CNN for Robust Target Association. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 418–425.
- Chopra, S.; Hadsell, R.; LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 539–546.
- Xie, E.; Ding, J.; Wang, W.; Zhan, X.; Xu, H.; Sun, P.; Li, Z.; Luo, P. Detco: Unsupervised contrastive learning for object detection. In Proceedings of the ICCV, Virtual, 11–17 October 2021; pp. 8392–8401.
- Mo, S.; Kang, H.; Sohn, K.; Li, C.L.; Shin, J. Object-aware contrastive learning for debiased scene representation. arXiv 2021, arXiv:2108.00049.
- Pirk, S.; Khansari, M.; Bai, Y.; Lynch, C.; Sermanet, P. Online object representations with contrastive learning. arXiv 2019, arXiv:1906.04312.
- Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994; Volume 2.
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016.
- Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. Fairmot: On the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087.
- Beaupré, D.A.; Bilodeau, G.A.; Saunier, N. Improving multiple object tracking with optical flow and edge preprocessing. arXiv 2018, arXiv:1801.09646.
- Lucas, B.D.; Kanade, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, 24–28 August 1981.
- Fischer, P.; Dosovitskiy, A.; Ilg, E.; Häusser, P.; Hazirbas, C.; Golkov, V.; van der Smagt, P.; Cremers, D.; Brox, T. FlowNet: Learning Optical Flow with Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
- Ilg, E.; Mayer, N.; Saikia, T.; Keuper, M.; Dosovitskiy, A.; Brox, T. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Sun, D.; Yang, X.; Liu, M.Y.; Kautz, J. Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8934–8943.
- Teed, Z.; Deng, J. Raft: Recurrent all-pairs field transforms for optical flow. In Proceedings of the ECCV, Virtual, 23–28 August 2020; pp. 402–419.
- Scovanner, P.; Tappen, M.F. Learning pedestrian dynamics from the real world. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 381–388.
- Pellegrini, S.; Ess, A.; Schindler, K.; van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the ICCV, Kyoto, Japan, 29 September–2 October 2009; pp. 261–268.
- Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the CVPR, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 961–971.
- Gupta, A.; Johnson, J.; Fei-Fei, L.; Savarese, S.; Alahi, A. Social gan: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the CVPR, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2255–2264.
- Mohamed, A.; Qian, K.; Elhoseiny, M.; Claudel, C. Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction. In Proceedings of the CVPR, Virtual, 14–19 June 2020; pp. 14424–14432.
- Liu, Y.; Yan, Q.; Alahi, A. Social nce: Contrastive learning of socially-aware motion representations. In Proceedings of the ICCV, Virtual, 11–17 October 2021; pp. 15118–15129.
- Leal-Taixé, L.; Milan, A.; Reid, I.; Roth, S.; Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv 2015, arXiv:1504.01942.
- Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by example. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2007; Volume 26, pp. 655–664.
- Stadler, D.; Beyerer, J. Improving Multiple Pedestrian Tracking by Track Management and Occlusion Handling. In Proceedings of the CVPR, Nashville, TN, USA, 19–25 June 2021; pp. 10958–10967.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
- Wang, Z.; Zheng, L.; Liu, Y.; Li, Y.; Wang, S. Towards real-time multi-object tracking. In Proceedings of the ECCV, Virtual, 23–28 August 2020; pp. 107–122.
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Ahuja, R.K.; Magnanti, T.L.; Orlin, J.B. Network Flows; MIT: Cambridge, MA, USA, 1988.
- Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737.
- Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853.
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML, Atlanta, GA, USA, 16–21 June 2013; Volume 30, p. 3.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831.
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable person re-identification: A benchmark. In Proceedings of the ICCV, Santiago, Chile, 7–13 December 2015; pp. 1116–1124.
- Sanderson, C.; Curtin, R. Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 2016, 1, 26.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035.
- Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The clear mot metrics. Eurasip J. Image Video Process. 2008, 2008, 1–10.
- Li, Y.; Huang, C.; Nevatia, R. Learning to associate: Hybridboosted multi-target tracker for crowded scene. In Proceedings of the CVPR, Miami, FL, USA, 20–25 June 2009; pp. 2953–2960.
- Baisa, N.L. Online multi-object visual tracking using a GM-PHD filter with deep appearance learning. In Proceedings of the 2019 22nd International Conference on Information Fusion (FUSION), Ottawa, ON, Canada, 2–5 July 2019; pp. 1–8.
- Boragule, A.; Jeon, M. Joint cost minimization for multi-object tracking. In Proceedings of the AVSS, Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Baisa, N.L.; Wallace, A. Development of a N-type GM-PHD filter for multiple target, multiple type visual tracking. J. Vis. Commun. Image Represent. 2019, 59, 257–271.
- Dehghan, A.; Modiri Assari, S.; Shah, M. Gmmcp tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking. In Proceedings of the CVPR, Boston, MA, USA, 7–12 June 2015; pp. 4091–4099.
- Le, N.; Heili, A.; Odobez, J.M. Long-term time-sensitive costs for crf-based tracking by detection. In Proceedings of the ECCV, Amsterdam, The Netherlands, 8–16 October 2016; pp. 43–51.
- Lee, J.; Kim, S.; Ko, B.C. Online Multiple Object Tracking Using Rule Distillated Siamese Random Forest. IEEE Access 2020, 8, 182828–182841.
- Pang, B.; Li, Y.; Zhang, Y.; Li, M.; Lu, C. Tubetk: Adopting tubes to track multi-object in a one-step training model. In Proceedings of the CVPR, Virtual, 14–19 June 2020; pp. 6308–6318.
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part-based models. IEEE TPAMI 2009, 32, 1627–1645.
- Tian, Z.; Shen, C.; Chen, H.; He, T. Fcos: Fully convolutional one-stage object detection. In Proceedings of the ICCV, Seoul, Korea, 27 October–2 November 2019; pp. 9627–9636.
- Kieritz, H.; Becker, S.; Hübner, W.; Arens, M. Online multi-person tracking using integral channel features. In Proceedings of the AVSS, Colorado Springs, CO, USA, 23–26 August 2016; pp. 122–130.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850.
Method | Setting | Detections | MOTA ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ | Hz ↑ |
---|---|---|---|---|---|---|---|---|---|---|
Tube_TK [82] | Online | FCOS [84] | 64.0% | 59.4% | 33.5% | 19.4% | 10,962 | 53,626 | 1117 | 1.0 |
TMOH [61] | Online | Faster R-CNN [62] | 63.2% | 63.5% | 27.0% | 31.0% | 3122 | 63,376 | 635 | 0.7 |
MOTRF [81] | Online | YOLOv3 [64] | 57.9% | 41.7% | 28.5% | 22.1% | 8196 | 66,538 | 2051 | 11.1 |
LSST16O [13] | Online | DPM [83] | 49.2% | 56.5% | 13.4% | 41.4% | 7187 | 84,875 | 606 | 2.0 |
KCF16 [11] | Online | DPM [83] | 48.8% | 47.2% | 15.8% | 38.1% | 5875 | 86,567 | 906 | 0.1 |
SOT + MOT [14] | Online | DPM [83] | 46.4% | - | 18.6% | 46.5% | 12,491 | 87,855 | 404 | 0.8 |
DMAN [15] | Online | DPM [83] | 46.1% | 46.1% | 17.4% | 42.7% | 7909 | 89,874 | 532 | 2.4 |
oICF [85] | Online | DPM [83] | 43.2% | 49.3% | 11.3% | 48.5% | 6651 | 96,515 | 381 | 0.4 |
AM_ADM [32] | Online | DPM [83] | 40.1% | 43.8% | 7.1% | 46.2% | 8503 | 99,891 | 789 | 5.8 |
HISP_DAL [19] | Online | DPM [83] | 37.4% | 30.5% | 7.6% | 50.9% | 3222 | 108,865 | 2101 | 3.3 |
JCmin_MOT [77] | Online | DPM [83] | 36.7% | 28.6% | 7.5% | 54.4% | 2936 | 111,890 | 667 | 14.8 |
GM_PHD_DAL [76] | Online | DPM [83] | 35.1% | 26.6% | 7.0% | 51.4% | 2350 | 111,886 | 4047 | 3.5 |
GM_PHD_N1T [78] | Online | DPM [83] | 33.3% | 22.6% | 7.2% | 51.4% | 1750 | 116,452 | 3499 | 9.9 |
ApLift [28] | Batch | Faster R-CNN [62] | 61.7% | 66.1% | 34.3% | 31.2% | 9168 | 60,180 | 495 | 0.6 |
Lif_T [2] | Batch | Faster R-CNN [62] | 61.3% | 64.7% | 23.2% | 34.5% | 4844 | 65,401 | 389 | 0.5 |
TPM [29] | Batch | DPM [83] | 51.3% | 47.9% | 18.7% | 40.8% | 2701 | 85,504 | 569 | 0.8 |
MHT_bLSTM [5] | Batch | DPM [83] | 42.1% | 47.8% | 14.9% | 44.4% | 11,637 | 93,172 | 753 | 1.8 |
LINF1_16 [6] | Batch | DPM [83] | 41.0% | 45.7% | 11.6% | 51.3% | 7896 | 99,224 | 430 | 4.2 |
GMMCP [79] | Batch | DPM [83] | 38.1% | 35.5% | 8.6% | 50.9% | 6607 | 105,315 | 937 | 0.5 |
LTTSC-CRF [80] | Batch | DPM [83] | 37.6% | 42.1% | 9.6% | 55.2% | 11,969 | 101,343 | 481 | 0.6 |
MOT_GM (Proposed) | Online | DPM [83] | 43.2% | 51.5% | 9.0% | 54.5% | 3481 | 99,532 | 484 | 10.31 |
MOT_GM (Proposed) | Online | CenterNet [86] | 64.5% | 70.9% | 36.4% | 20.7% | 21,182 | 42,730 | 816 | 6.54 |

Baseline Multi-Object Tracking Method

Sequence | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ | Hz ↑ |
---|---|---|---|---|---|---|---|---|---|
MOT16-02 | 25.76% | 75.01% | 32.00% | 3.70% | 53.70% | 192 | 12,982 | 66 | 16.33 |
MOT16-04 | 43.52% | 76.89% | 40.29% | 9.64% | 36.14% | 1277 | 25,355 | 229 | 12.57 |
MOT16-05 | 29.41% | 74.92% | 38.96% | 2.40% | 51.20% | 313 | 4472 | 28 | 21.08 |
MOT16-09 | 56.91% | 73.98% | 54.32% | 28.00% | 8.00% | 133 | 2098 | 34 | 16.57 |
MOT16-10 | 37.24% | 73.75% | 46.53% | 11.11% | 48.15% | 420 | 7274 | 37 | 16.19 |
MOT16-11 | 51.32% | 78.16% | 56.08% | 17.39% | 50.72% | 270 | 4179 | 17 | 16.24 |
MOT16-13 | 19.15% | 71.92% | 28.84% | 5.61% | 66.36% | 240 | 8993 | 24 | 17.41 |
Total | 37.84% | 75.90% | 40.89% | 8.51% | 49.71% | 2845 | 65,353 | 435 | 16.02 |

Proposed Method

Sequence | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ | Hz ↑ |
---|---|---|---|---|---|---|---|---|---|
MOT16-02 | 26.31% | 75.02% | 32.90% | 7.41% | 55.56% | 90 | 13,001 | 51 | 13.11 |
MOT16-04 | 50.85% | 78.05% | 61.94% | 15.66% | 34.94% | 165 | 23,170 | 38 | 8.39 |
MOT16-05 | 28.84% | 74.90% | 41.33% | 1.60% | 55.20% | 242 | 4582 | 28 | 18.09 |
MOT16-09 | 57.24% | 74.30% | 55.01% | 20.00% | 12.00% | 86 | 2139 | 23 | 14.70 |
MOT16-10 | 38.11% | 74.40% | 44.91% | 12.96% | 50.00% | 239 | 7350 | 34 | 13.25 |
MOT16-11 | 51.58% | 78.02% | 58.52% | 14.49% | 52.17% | 178 | 4246 | 18 | 14.82 |
MOT16-13 | 17.90% | 72.75% | 27.63% | 3.74% | 69.16% | 129 | 9261 | 10 | 14.91 |
Total | 41.05% | 76.69% | 51.00% | 8.70% | 51.84% | 1129 | 63,749 | 202 | 12.87 |
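
The accuracy metric in these tables follows the CLEAR MOT definition [74], MOTA = 1 − (FN + FP + ID Sw.)/GT, where GT is the number of ground-truth boxes. The baseline and the proposed method are evaluated on the same sequences, so inverting the formula from each Total row should give nearly the same GT. The sketch below uses this as a quick consistency check on the reported totals. Only the Total rows are used, and agreement is approximate because MOTA is rounded to two decimals.

```python
def mota(fn, fp, idsw, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSw) / GT."""
    return 1.0 - (fn + fp + idsw) / num_gt


def implied_gt(fn, fp, idsw, mota_value):
    """Solve the MOTA formula for the number of ground-truth boxes."""
    return (fn + fp + idsw) / (1.0 - mota_value)


# "Total" rows of the two tables (MOTA given as a fraction).
baseline = dict(fn=65_353, fp=2845, idsw=435, mota_value=0.3784)
proposed = dict(fn=63_749, fp=1129, idsw=202, mota_value=0.4105)

gt_baseline = implied_gt(**baseline)
gt_proposed = implied_gt(**proposed)
# Total errors drop from 68,633 to 65,080 against roughly the same GT.
print(round(gt_baseline), round(gt_proposed))
```

Both values come out at about 110,400. The MOTA gain therefore comes from fewer total errors rather than from a different evaluation set.
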
Method | Appearance Model | Motion Model | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ |
---|---|---|---|---|---|---|---|---|---|---|
M1 | Color | Self | 37.84% | 75.90% | 40.89% | 8.51% | 49.71% | 2845 | 65,353 | 435 |
M2 | Color | Self, Relation | 40.61% | 76.54% | 48.20% | 8.90% | 50.68% | 1467 | 63,828 | 280 |
M3 | Deep | Self, Relation | 41.05% | 76.69% | 51.00% | 8.70% | 51.84% | 1129 | 63,749 | 202 |
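
The M1 to M3 ablation is easiest to read as step-wise differences, recomputed here from the table values as printed. Adding the relation motion model (M1 to M2) gives +2.77 MOTA and +7.31 IDF1 and removes 155 identity switches. Switching to the deep appearance model (M2 to M3) adds +0.44 MOTA and +2.80 IDF1 and removes 78 more.

```python
# Rows copied from the M1/M2/M3 ablation table.
ROWS = {
    "M1": {"MOTA": 37.84, "IDF1": 40.89, "FP": 2845, "ID Sw.": 435},
    "M2": {"MOTA": 40.61, "IDF1": 48.20, "FP": 1467, "ID Sw.": 280},
    "M3": {"MOTA": 41.05, "IDF1": 51.00, "FP": 1129, "ID Sw.": 202},
}


def step_delta(src, dst):
    """Per-metric change from configuration src to configuration dst."""
    return {m: round(ROWS[dst][m] - ROWS[src][m], 2) for m in ROWS[src]}


for src, dst in [("M1", "M2"), ("M2", "M3")]:
    print(f"{src} -> {dst}:", step_delta(src, dst))
```
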
Dimension | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ | Hz ↑ | MOTA × Hz |
---|---|---|---|---|---|---|---|---|---|---|
64 | 40.50% | 76.67% | 49.96% | 8.32% | 52.42% | 1273 | 64,187 | 234 | 13.36 | 541.08 |
128 | 40.36% | 76.69% | 49.93% | 7.74% | 52.80% | 1281 | 64,328 | 242 | 13.41 | 541.28 |
256 | 39.31% | 76.65% | 49.08% | 6.58% | 53.58% | 1524 | 65,160 | 234 | 13.34 | 524.40 |
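
The last column combines accuracy and speed as the product of MOTA (in percent) and Hz. Recomputing it from the rounded entries reproduces the 64- and 256-dimensional rows. For the 128-dimensional row it gives about 541.23 instead of the printed 541.28, a small gap that presumably comes from unrounded MOTA or Hz values. Either way, 128 dimensions remains the best trade-off.

```python
# (dimension, MOTA in %, Hz, reported MOTA x Hz) from the table above.
ROWS = [
    (64, 40.50, 13.36, 541.08),
    (128, 40.36, 13.41, 541.28),
    (256, 39.31, 13.34, 524.40),
]

for dim, mota, hz, reported in ROWS:
    # Recompute the accuracy-speed product from the rounded entries.
    print(dim, round(mota * hz, 2), reported)

best_dim = max(ROWS, key=lambda r: r[1] * r[2])[0]
```
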
Method | MOTA ↑ | MOTP ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ |
---|---|---|---|---|---|---|---|
Self motion model | 40.81% | 76.69% | 8.12% | 51.64% | 1166 | 63,987 | 196 |
Relation motion model | 40.68% | 76.78% | 8.70% | 52.61% | 1126 | 64,171 | 201 |
Combined motion model | 41.05% | 76.69% | 8.70% | 51.84% | 1129 | 63,749 | 202 |

MOT without Object Constraint Learning

Sequence | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ | Appearance Updates ↓ | Motion Updates ↓ | Hz ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|
MOT16-02 | 26.31% | 75.02% | 32.90% | 7.41% | 55.56% | 90 | 13,001 | 51 | 600 | 600 | 13.11 |
MOT16-04 | 50.85% | 78.05% | 61.94% | 15.66% | 34.94% | 165 | 23,170 | 38 | 1050 | 1050 | 8.39 |
MOT16-05 | 28.84% | 74.90% | 41.33% | 1.60% | 55.20% | 242 | 4582 | 28 | 837 | 837 | 18.09 |
MOT16-09 | 57.24% | 74.30% | 55.01% | 20.00% | 12.00% | 86 | 2139 | 23 | 525 | 525 | 14.70 |
MOT16-10 | 38.11% | 74.40% | 44.91% | 12.96% | 50.00% | 239 | 7350 | 34 | 654 | 654 | 13.25 |
MOT16-11 | 51.58% | 78.02% | 58.52% | 14.49% | 52.17% | 178 | 4246 | 18 | 900 | 900 | 14.82 |
MOT16-13 | 17.90% | 72.75% | 27.63% | 3.74% | 69.16% | 129 | 9261 | 10 | 729 | 729 | 14.91 |
Total | 41.05% | 76.69% | 51.00% | 8.70% | 51.84% | 1129 | 63,749 | 202 | 5295 | 5295 | 12.87 |

MOT with Object Constraint Learning

Sequence | MOTA ↑ | MOTP ↑ | IDF1 ↑ | MT ↑ | ML ↓ | FP ↓ | FN ↓ | ID Sw. ↓ | Appearance Updates ↓ | Motion Updates ↓ | Hz ↑ |
---|---|---|---|---|---|---|---|---|---|---|---|
MOT16-02 | 25.30% | 75.44% | 31.07% | 7.41% | 57.41% | 111 | 13,167 | 44 | 389 | 227 | 13.34 |
MOT16-04 | 50.53% | 78.03% | 60.85% | 15.66% | 34.94% | 202 | 23,280 | 43 | 772 | 318 | 8.92 |
MOT16-05 | 26.37% | 74.86% | 37.41% | 0.80% | 55.20% | 297 | 4689 | 34 | 334 | 316 | 18.82 |
MOT16-09 | 56.13% | 74.14% | 54.48% | 16.00% | 12.00% | 82 | 2193 | 31 | 193 | 191 | 15.12 |
MOT16-10 | 37.42% | 74.32% | 46.41% | 11.11% | 51.85% | 237 | 7425 | 47 | 479 | 241 | 13.78 |
MOT16-11 | 50.51% | 77.96% | 56.68% | 11.59% | 55.07% | 205 | 4309 | 26 | 255 | 330 | 15.46 |
MOT16-13 | 17.65% | 72.45% | 26.88% | 3.73% | 70.09% | 147 | 9265 | 17 | 433 | 267 | 15.44 |
Total | 40.36% | 76.69% | 49.93% | 7.74% | 52.80% | 1281 | 64,328 | 242 | 2855 | 1890 | 13.41 |
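
The two Total rows quantify the trade-off made by object constraint learning. Appearance-model updates fall from 5295 to 2855 and motion-model updates from 5295 to 1890. Meanwhile, speed rises from 12.87 to 13.41 Hz and MOTA drops from 41.05% to 40.36%. The relative changes work out as follows.

```python
def reduction_pct(before, after):
    """Relative decrease from before to after, in percent."""
    return 100.0 * (before - after) / before


# "Total" rows without and with object constraint learning.
appearance_cut = reduction_pct(5295, 2855)   # appearance model updates
motion_cut = reduction_pct(5295, 1890)       # motion model updates
speed_gain = -reduction_pct(12.87, 13.41)    # Hz increase, in percent
mota_change = round(40.36 - 41.05, 2)        # MOTA change, in points

print(f"appearance updates -{appearance_cut:.1f}%, motion updates -{motion_cut:.1f}%")
print(f"speed +{speed_gain:.1f}%, MOTA {mota_change:+.2f} points")
```

Roughly 46% fewer appearance updates and 64% fewer motion updates buy about a 4% speed-up, at a cost of 0.69 MOTA points.
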
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yoo, Y.-S.; Lee, S.-H.; Bae, S.-H. Effective Multi-Object Tracking via Global Object Models and Object Constraint Learning. Sensors 2022, 22, 7943. https://doi.org/10.3390/s22207943