Dynamic-Aware Network for Moving Object Detection
Abstract
1. Introduction
- (1) Making reasonable use of spatio-temporal information. In network design, some methods [20,21,22] focus on extracting spatial features and do not fully exploit temporal continuity, which is a relatively stable cue in video analysis. Other methods combine spatio-temporal information to obtain moving objects [23,24], yet they ignore the variability introduced by time-varying information, which is an important feature in moving object detection.
- (2) Mining deep features for more meaningful cues. Deep features contain abundant abstract semantic information, which helps recover accurate target details. Many methods, however, feed unprocessed deep features directly into the decoder without fully exploiting them. Other approaches obtain multiscale features by pyramid pooling, but this strategy cannot establish correlations among different types of features [21,25].
- (3) Optimizing the transfer of information between encoder and decoder. As the network deepens, object features are partially lost. The conventional approach passes encoder features to the decoder via skip connections, but low-level features contain more coarse information [26]. Ignoring low-level information entirely is unwise, because it supplies rich spatial structure to the network; using it directly, however, introduces interference that degrades detection accuracy.
- (1) We propose a Dynamic-Aware Network (DAN) that fully utilizes spatio-temporal information and salient target features for moving object detection, exploring the intrinsic connections between features to obtain accurate predictions.
- (2) We design a Change-Aware Module (CAM) that combines the change information from different layers with high-level salient features, leveraging deep information and maximizing the perception of object changes.
- (3) We devise a Motion-Attentive Selection Module (MASM) that acquires discriminative features to alleviate the target blur caused by partial loss of detail.
2. Related Work
2.1. Traditional Methods
2.2. Deep Learning-Based Methods
3. Methodology
3.1. Overview
3.2. Change-Aware Module
3.3. Motion-Attentive Selection Module
4. Experiments
4.1. Datasets and Evaluation Metrics
- (1) Datasets: To validate the proposed DAN, we conduct experimental comparisons on three commonly used benchmark datasets: LASIESTA [55], CDnet2014 [56], and INO [57]. The LASIESTA dataset contains 48 videos of indoor and outdoor scenes at a resolution of 352×288 pixels. CDnet2014 is a large-scale moving object detection dataset that includes 11 categories of video scenes. The INO dataset contains numerous videos of outdoor scenes captured by the VIRxCam platform.
- (2) Evaluation metrics: F1, the harmonic mean of precision and recall, is one of the most commonly used comprehensive evaluation metrics in moving object detection (MOD). We also use seven other metrics to analyze the performance of different models: accuracy (Acc), false positive rate (FPR), false negative rate (FNR), specificity (Sp), area under the curve (AUC), mean intersection over union (mIoU), and percentage of wrong classifications (PWC). Detailed definitions of these metrics can be found in [18,56,58].
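As a reading aid for the metrics listed above, the following minimal sketch computes the count-based ones from pixel-level confusion-matrix totals, using the standard definitions (foreground is the positive class). The function name `mod_metrics` and the example counts are hypothetical and not taken from the paper; AUC and mIoU are omitted because they need score thresholds and per-class bookkeeping, respectively.

```python
def mod_metrics(tp, fp, tn, fn):
    """Count-based metrics from pixel-level confusion-matrix totals.

    Assumes every denominator is non-zero (at least one foreground and
    one background pixel, and at least one predicted foreground pixel).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
        "acc": (tp + tn) / total,
        "fpr": fp / (fp + tn),           # false positive rate
        "fnr": fn / (tp + fn),           # false negative rate = 1 - recall
        "sp": tn / (tn + fp),            # specificity = 1 - FPR
        "pwc": 100 * (fp + fn) / total,  # percentage of wrong classifications
    }


# Hypothetical counts for a single frame, for illustration only.
metrics = mod_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Two identities follow directly from these definitions and are handy when reading result tables: FNR = 1 - Recall and Sp = 1 - FPR.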
4.2. Implementation Details
4.3. Ablation Study
4.4. Comparisons to the State-of-the-Arts
- (1) LASIESTA dataset: Table 3 reports the quantitative performance of nine techniques on the LASIESTA dataset, and Figure 6 illustrates the performance trends of the different approaches. Our network is competitive with the others. The last row of Table 3 presents the average F1 obtained by each algorithm, where our method achieves 89%. This is an improvement of 8, 54, 49, 5, 5, 3, 4, and 2 percentage points over Cuevas [59], FgSegNet-M-55 [25], MSFS-55 [21], Fast-D [60], 3DCD-55 [61], Pardas [62], DFC-D [63], and CUAN [64], respectively. Our method also performs strongly on individual video categories.
- (2) CDnet2014 dataset: Table 4 presents the quantitative results of different techniques [23,24,26,65,66,67,68,69] on the CDnet2014 dataset. The proposed DAN achieves an average F1 of 89%. Although DAN does not achieve the best overall score (ISFLN [69] reaches 90%), it remains relatively stable across different types of challenges. For example, on the video turbulence0, the performance of BMN-BSN [23] and BSUV-Net [26] fluctuates significantly, with F1 of only 2% and 44%, respectively. On the low-frame-rate video turnpike_0_5fps, the F1 obtained by Deepbs [24] is only 49%. In short, the designed network is better suited to scenes with variability. Figure 7 shows the performance trends of the different techniques on the CDnet2014 dataset.
- (3) INO dataset: In Table 5, we use four metrics to compare the performance of different approaches [20,58,69,70,71,72,73] on the INO dataset. The results indicate that our method performs well overall and has advantages in several metrics. In particular, the proposed model obtains an AUC of 98%, an improvement of 8, 17, and 2 percentage points over the recent advanced techniques SPAMOD [20], Qiu [58], and ISFLN [69], respectively.
- (4) Visual analysis: Figures 8 and 9 present a qualitative comparison between our approach and other methods [23,24,26,65,67,69]. The examples cover many challenging and complex scenarios, such as shadows, lighting variations, small objects, atmospheric turbulence, and background disturbances. The proposed network correctly localizes objects and recovers moving objects with clear contours. The qualitative results highlight the effectiveness of our method in suppressing background interference and accurately distinguishing the object area. Moreover, DAN is able to detect objects at different scales.
- (5) Complexity analysis: The main constraints on model deployment are the number of FLOPs and the number of parameters. Table 6 compares the model complexity of several advanced techniques [21,24,25,61,69,74,75]. Our model has 4.64 M parameters and 6.87 G FLOPs. Considering accuracy and complexity together, the presented model compares favorably with the other approaches.
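The LASIESTA improvements quoted in item (1) are differences between average F1 scores, so they are best read as percentage points. The short sketch below reproduces that arithmetic from the "Average" row of Table 3; the variable names are illustrative only.

```python
# Average F1 per method, copied from the "Average" row of Table 3.
average_f1 = {
    "Cuevas [59]": 0.81,
    "FgSegNet-M-55 [25]": 0.35,
    "MSFS-55 [21]": 0.40,
    "Fast-D [60]": 0.84,
    "3DCD-55 [61]": 0.84,
    "Pardas [62]": 0.86,
    "DFC-D [63]": 0.85,
    "CUAN [64]": 0.87,
}
dan_f1 = 0.89  # DAN (Ours), same row

# Gain of DAN over each baseline, in percentage points.
gain_points = {
    method: round((dan_f1 - f1) * 100) for method, f1 in average_f1.items()
}

for method, gain in gain_points.items():
    print(f"{method}: +{gain} points")
```

The resulting gains are 8, 54, 49, 5, 5, 3, 4, and 2 points, matching the values stated in the text.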
4.5. Limitations and Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Nomenclature
Symbol | Description | Symbol | Description |
---|---|---|---|
Dr=j | dilated convolution operation | | sigmoid function |
cat | concatenation operation | Sub | pixel-wise subtraction |
GAP | global average pooling | | 1×1 convolution |
AP | average pooling | | 3×3 convolution |
MP | max-pooling | | element-wise multiplication |
fc | fully connected layer | | element-wise addition |
References
- Wang, Y.; Zhang, W.; Lai, C.; Wang, J. Adaptive temporal feature modeling for visual tracking via cross-channel learning. Knowl. Based Syst. 2023, 265, 110380.
- Gong, F.; Gao, Y.; Yuan, X.; Liu, X.; Li, Y.; Ji, X. Crude Oil Leakage Detection Based on DA-SR Framework. Adv. Theory Simul. 2022, 5, 2200273.
- Latif, G.; Alghmgham, D.A.; Maheswar, R.; Alghazo, J.; Sibai, F.; Aly, M.H. Deep learning in Transportation: Optimized driven deep residual networks for Arabic traffic sign recognition. Alex. Eng. J. 2023, 80, 134–143.
- Jegham, I.; Alouani, I.; Ben Khalifa, A.; Mahjoub, M.A. Deep learning-based hard spatial attention for driver in-vehicle action monitoring. Expert Syst. Appl. 2023, 219, 119629.
- Hussain, M.I.; Rafique, M.A.; Kim, J.; Jeon, M.; Pedrycz, W. Artificial Proprioceptive Reflex Warning Using EMG in Advanced Driving Assistance System. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 1635–1644.
- Munir, F.; Azam, S.; Rafique, M.A.; Sheri, A.M.; Jeon, M.; Pedrycz, W. Exploring thermal images for object detection in underexposure regions for autonomous driving. Appl. Soft Comput. 2022, 121, 108793.
- Sofuoglu, S.E.; Aviyente, S. GLOSS: Tensor-based anomaly detection in spatiotemporal urban traffic data. Signal Process. 2022, 192, 108370.
- Zhang, L.; Xie, X.; Xiao, K.; Bai, W.; Liu, K.; Dong, P. MANomaly: Mutual adversarial networks for semi-supervised anomaly detection. Inf. Sci. 2022, 611, 65–80.
- López-Rubio, E.; Molina-Cabello, M.A.; Castro, F.M.; Luque-Baena, R.M.; Marín-Jiménez, M.J.; Guil, N. Anomalous object detection by active search with PTZ cameras. Expert Syst. Appl. 2021, 181, 115150.
- Herrmann, M.; Pfisterer, F.; Scheipl, F. A geometric framework for outlier detection in high-dimensional data. WIREs Data Min. Knowl. Discov. 2023, 13, e1491.
- Shao, M.; Sun, Y.; Liu, Z.; Peng, Z.; Li, S.; Li, C. GPNet: Key Point Generation Auxiliary Network for Object Detection. Adv. Theory Simul. 2023, 6, 2200894.
- Kourbane, I.; Genc, Y. A graph-based approach for absolute 3D hand pose estimation using a single RGB image. Appl. Intell. 2022, 52, 16667–16682.
- Wu, T.; Peng, J.; Zhang, W.; Zhang, H.; Tan, S.; Yi, F.; Ma, C.; Huang, Y. Video sentiment analysis with bimodal information-augmented multi-head attention. Knowl. Based Syst. 2022, 235, 107676.
- Yu, J.-M.; Ham, G.; Lee, C.; Lee, J.-H.; Han, J.-K.; Kim, J.-K.; Jang, D.; Kim, N.; Kim, M.-S.; Im, S.G.; et al. A Multiple-State Ion Synaptic Transistor Applicable to Abnormal Car Detection with Transfer Learning. Adv. Intell. Syst. 2022, 4, 2100231.
- Wang, T.; Hou, B.; Li, J.; Shi, P.; Zhang, B.; Snoussi, H. TASTA: Text-Assisted Spatial and Temporal Attention Network for Video Question Answering. Adv. Intell. Syst. 2023, 5, 2200131.
- Goh, G.L.; Goh, G.D.; Pan, J.W.; Teng, P.S.P.; Kong, P.W. Automated Service Height Fault Detection Using Computer Vision and Machine Learning for Badminton Matches. Sensors 2023, 23, 9759.
- Naik, B.T.; Hashmi, M.F. YOLOv3-SORT: Detection and tracking player/ball in soccer sport. J. Electron. Imaging 2023, 32, 011003.
- Li, S.; Han, P.; Bu, S.; Tong, P.; Li, Q.; Li, K.; Wan, G. Change detection in images using shape-aware siamese convolutional network. Eng. Appl. Artif. Intell. 2020, 94, 103819.
- Zhang, H.; Qu, S.; Li, H. Dual-Branch Enhanced Network for Change Detection. Arab. J. Sci. Eng. 2022, 47, 3459–3471.
- Qu, S.; Zhang, H.; Wu, W.; Xu, W.; Li, Y. Symmetric pyramid attention convolutional neural network for moving object detection. Signal Image Video Process. 2021, 15, 1747–1755.
- Lim, L.A.; Keles, H.Y. Learning multi-scale features for foreground segmentation. Pattern Anal. Appl. 2020, 23, 1369–1380.
- Yang, L.; Li, J.; Luo, Y.; Zhao, Y.; Cheng, H.; Li, J. Deep Background Modeling Using Fully Convolutional Network. IEEE Trans. Intell. Transp. Syst. 2018, 19, 254–262.
- Mondéjar-Guerra, V.; Rouco, J.; Novo, J.; Ortega, M. An end-to-end deep learning approach for simultaneous background modeling and subtraction. In Proceedings of the 30th British Machine Vision Conference, Cardiff, UK, 9–12 September 2019; pp. 1–12.
- Babaee, M.; Dinh, D.T.; Rigoll, G. A deep convolutional neural network for video sequence background subtraction. Pattern Recognit. 2018, 76, 635–649.
- Lim, L.A.; Yalim Keles, H. Foreground segmentation using convolutional neural networks for multiscale feature encoding. Pattern Recognit. Lett. 2018, 112, 256–262.
- Tezcan, M.O.; Ishwar, P.; Konrad, J. BSUV-Net: A Fully-Convolutional Neural Network for Background Subtraction of Unseen Videos. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 2763–2772.
- Zhu, M.; Wang, H. Fast detection of moving object based on improved frame-difference method. In Proceedings of the 6th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 21–22 October 2017; pp. 299–303.
- Kang, Y.; Huang, W.; Zheng, S. An improved frame difference method for moving target detection. In Proceedings of the Chinese Automation Congress (CAC), Jinan, China, 20–22 October 2017; pp. 1537–1541.
- Luo, X.; Jia, K.; Liu, P. Improved Three-Frame-Difference Algorithm for Infrared Moving Target. In Proceedings of the 5th International Conference on Image, Vision and Computing (ICIVC), Beijing, China, 10–12 July 2020; pp. 108–112.
- Sengar, S.S.; Mukhopadhyay, S. A novel method for moving object detection based on block based frame differencing. In Proceedings of the 3rd International Conference on Recent Advances in Information Technology (RAIT), Dhanbad, India, 3–5 March 2016; pp. 467–472.
- Sengar, S.S.; Mukhopadhyay, S. Moving object detection based on frame difference and W4. Signal Image Video Process. 2017, 11, 1357–1364.
- Boufares, O.; Boussif, M.; Aloui, N. Moving Object Detection System Based on the Modified Temporal Difference and OTSU algorithm. In Proceedings of the 18th International Multi-Conference on Systems, Signals & Devices (SSD), Monastir, Tunisia, 22–25 March 2021; pp. 1378–1382.
- Zeng, W.; Xie, C.; Yang, Z.; Lu, X. A universal sample-based background subtraction method for traffic surveillance videos. Multimed. Tools Appl. 2020, 79, 22211–22234.
- Pan, H.; Zhu, G.; Peng, C.; Xiao, Q. Background subtraction for night videos. PeerJ Comput. Sci. 2021, 7, e592.
- Cioppa, A.; Braham, M.; Van Droogenbroeck, M. Asynchronous Semantic Background Subtraction. J. Imaging 2020, 6, 50.
- Kalli, S.; Suresh, T.; Prasanth, A.; Muthumanickam, T.; Mohanram, K. An effective motion object detection using adaptive background modeling mechanism in video surveillance system. J. Intell. Fuzzy Syst. 2021, 41, 1777–1789.
- Braham, M.; Droogenbroeck, M.V. Deep background subtraction with scene-specific convolutional neural networks. In Proceedings of the International Conference on Systems, Signals and Image Processing (IWSSIP), Bratislava, Slovakia, 23–25 May 2016; pp. 1–4.
- Wang, Y.; Luo, Z.; Jodoin, P.-M. Interactive deep learning method for segmenting moving objects. Pattern Recognit. Lett. 2017, 96, 66–75.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Vijayan, M.; Raguraman, P.; Mohan, R. A Fully Residual Convolutional Neural Network for Background Subtraction. Pattern Recognit. Lett. 2021, 146, 63–69.
- Lin, C.; Yan, B.; Tan, W. Foreground Detection in Surveillance Video with Fully Convolutional Semantic Network. In Proceedings of the 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 4118–4122.
- St-Charles, P.-L.; Bilodeau, G.-A.; Bergevin, R. SuBSENSE: A Universal Change Detection Method with Local Adaptive Sensitivity. IEEE Trans. Image Process. 2015, 24, 359–373.
- Qiu, M.; Li, X. A Fully Convolutional Encoder–Decoder Spatial–Temporal Network for Real-Time Background Subtraction. IEEE Access 2019, 7, 85949–85958.
- Li, Y.; Zhang, Y.; Liu, J.Y.; Wang, K.; Zhang, K.; Zhang, G.S.; Liao, X.F.; Yang, G. Global Transformer and Dual Local Attention Network via Deep-Shallow Hierarchical Feature Fusion for Retinal Vessel Segmentation. IEEE Trans. Cybern. 2023, 53, 5826–5839.
- Chen, S.-B.; Ji, Y.-X.; Tang, J.; Luo, B.; Wang, W.-Q.; Lv, K. DBRANet: Road Extraction by Dual-Branch Encoder and Regional Attention Decoder. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Gaudio, A.; Smailagic, A.; Faloutsos, C.; Mohan, S.; Johnson, E.; Liu, Y.; Costa, P.; Campilho, A. DeepFixCX: Explainable privacy-preserving image compression for medical image analysis. WIREs Data Min. Knowl. Discov. 2023, 13, e1495.
- Minematsu, T.; Shimada, A.; Taniguchi, R.-i. Simple background subtraction constraint for weakly supervised background subtraction network. In Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–8.
- Zhang, L.; Hu, X.; Zhang, M.; Shu, Z.; Zhou, H. Object-level change detection with a dual correlation attention-guided detector. ISPRS J. Photogramm. Remote Sens. 2021, 177, 147–160.
- Sakkos, D.; Liu, H.; Han, J.; Shao, L. End-to-end video background subtraction with 3d convolutional neural networks. Multimed. Tools Appl. 2018, 77, 23023–23041.
- Gao, Y.; Cai, H.; Zhang, X.; Lan, L.; Luo, Z. Background Subtraction via 3D Convolutional Neural Networks. In Proceedings of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 1271–1276.
- Yu, R.; Wang, H.; Davis, L.S. ReMotENet: Efficient Relevant Motion Event Detection for Large-Scale Home Surveillance Videos. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1642–1651.
- Zheng, W.; Wang, K.; Wang, F.-Y. A novel background subtraction algorithm based on parallel vision and Bayesian GANs. Neurocomputing 2020, 394, 178–200.
- Bahri, F.; Shakeri, M.; Ray, N. Online Illumination Invariant Moving Object Detection by Generative Neural Network. In Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing, Hyderabad, India, 18–22 December 2018; pp. 1–8.
- Dosovitskiy, A.; Brox, T. Inverting Visual Representations with Convolutional Networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4829–4837.
- Cuevas, C.; Yáñez, E.M.; García, N. Labeled dataset for integral evaluation of moving object detection algorithms: LASIESTA. Comput. Vis. Image Underst. 2016, 152, 103–117.
- Wang, Y.; Jodoin, P.; Porikli, F.; Konrad, J.; Benezeth, Y.; Ishwar, P. CDnet 2014: An Expanded Change Detection Benchmark Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 393–400.
- Video Analytics Dataset [DS]. Available online: http://www.ino.ca/en/video-analytics-dataset/ (accessed on 1 March 2022).
- Qiu, S.; Luo, J.; Yang, S.; Zhang, M.; Zhang, W. A moving target extraction algorithm based on the fusion of infrared and visible images. Infrared Phys. Technol. 2019, 98, 285–291.
- Berjón, D.; Cuevas, C.; Morán, F.; García, N. Real-time nonparametric background subtraction with tracking-based foreground update. Pattern Recognit. 2018, 74, 156–170.
- Hossain, M.A.; Hossain, M.I.; Hossain, M.D.; Thu, N.T.; Huh, E.-N. Fast-D: When Non-Smoothing Color Feature Meets Moving Object Detection in Real-Time. IEEE Access 2020, 8, 186756–186772.
- Mandal, M.; Dhar, V.; Mishra, A.; Vipparthi, S.K.; Abdel-Mottaleb, M. 3DCD: Scene Independent End-to-End Spatiotemporal Feature Learning Framework for Change Detection in Unseen Videos. IEEE Trans. Image Process. 2021, 30, 546–558.
- Pardàs, M.; Canet, G. Refinement Network for unsupervised on the scene Foreground Segmentation. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 18–21 January 2021; pp. 705–709.
- Hossain, M.A.; Hossain, M.I.; Hossain, M.D.; Huh, E.-N. DFC-D: A dynamic weight-based multiple features combination for real-time moving object detection. Multimed. Tools Appl. 2022, 81, 32549–32580.
- Canet Tarrés, G.; Pardàs, M. Context-Unsupervised Adversarial Network for Video Sensors. Sensors 2022, 22, 3171.
- Bianco, S.; Ciocca, G.; Schettini, R. Combination of Video Change Detection Algorithms by Genetic Programming. IEEE Trans. Evol. Comput. 2017, 21, 914–928.
- Braham, M.; Piérard, S.; Droogenbroeck, M.V. Semantic background subtraction. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 4552–4556.
- Cioppa, A.; Van Droogenbroeck, M.; Braham, M. Real-Time Semantic Background Subtraction. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 3214–3218.
- Li, L.; Wang, Z.; Hu, Q.; Dong, Y. Adaptive Nonconvex Sparsity Based Background Subtraction for Intelligent Video Surveillance. IEEE Trans. Ind. Inform. 2021, 17, 4168–4178.
- Zhang, H.; Li, H. Interactive spatio-temporal feature learning network for video foreground detection. Complex Intell. Syst. 2022, 8, 4251–4263.
- Li, Z.; Hou, Q.; Fu, H.; Dai, Z.; Yang, L.; Jin, G.; Li, R. Infrared small moving target detection algorithm based on joint spatio-temporal sparse recovery. Infrared Phys. Technol. 2015, 69, 44–52.
- Akula, A.; Singh, A.; Ghosh, R.; Kumar, S.; Sardana, H.K. Target Recognition in Infrared Imagery Using Convolutional Neural Network. In Proceedings of International Conference on Computer Vision and Image Processing; Springer: Singapore, 2016; pp. 25–34.
- Bhattacharjee, S.D.; Talukder, A.; Alam, M.S. Graph clustering for weapon discharge event detection and tracking in infrared imagery using deep features. In Proceedings of the Conference on Pattern Recognition and Tracking XXVII, Anaheim, CA, USA, 1 May 2017; p. 102030O.
- Sun, B.; Li, Y.; Guosheng, G. Moving target segmentation using Markov random field-based evaluation metric in infrared videos. Opt. Eng. 2018, 1, 013106.
- Tezcan, M.O.; Ishwar, P.; Konrad, J. BSUV-Net 2.0: Spatio-Temporal Data Augmentations for Video-Agnostic Supervised Background Subtraction. IEEE Access 2021, 9, 53849–53860.
- Zhang, H.; Qu, S.; Li, H.; Xu, W.; Du, X. A motion-appearance-aware network for object change detection. Knowl.-Based Syst. 2022, 255, 109612.
Classification | Method | Backbone | Dataset | Running Time | GPU | F1 (CDnet 2014) | F1 (Wallflower) | F1 (USCD) | F1 (SBI 2015) |
---|---|---|---|---|---|---|---|---|---|
Traditional methods | Zhu [27] | — | — | 5 FPS | — | — | — | — | — |
Traditional methods | Huang [28] | — | — | — | — | — | — | — | — |
Traditional methods | Luo [29] | — | — | — | — | — | — | — | — |
Traditional methods | Sandeep [30] | — | CAVIAR | — | — | — | — | — | — |
Traditional methods | Sandeep [31] | — | CAVIAR | — | — | — | — | — | — |
Traditional methods | Oussama [32] | — | CDnet2014 | 3.02 s/frame | — | — | — | — | — |
Traditional methods | Zeng [33] | — | CDnet2014 | 3 FPS | — | 0.69 | — | — | — |
Traditional methods | Pan [34] | — | CDnet2014 | — | — | 0.70 | — | — | — |
Traditional methods | Cioppa [35] | — | CDnet2014 | — | — | 0.75 | — | — | — |
Traditional methods | Kalli [36] | — | Wallflower | — | — | — | 0.78 | — | — |
Deep learning-based methods | Braham [37] | — | CDnet2014 | — | — | 0.90 | — | — | — |
Deep learning-based methods | Wang [38] | — | CDnet2014 | — | GTX 970 | 0.84 | — | — | — |
Deep learning-based methods | Lim [25] | VGG16 | CDnet2014+SBI2015 | — | — | 0.95 | — | — | 0.98 |
Deep learning-based methods | Midhula [40] | — | CDnet2014 | — | GTX 970 | 0.94 | — | — | — |
Deep learning-based methods | Lin [41] | VGG16 | CDnet2014 | — | GTX 1080Ti | 0.69 | — | — | — |
Deep learning-based methods | Qiu [43] | ConvLSTM | CDnet2014 | 112 FPS | Titan X | 0.86 | — | — | — |
Deep learning-based methods | Minematsu [47] | VGG16 | CDnet2014 | 134 FPS | GTX 1080Ti | 0.85 | — | — | — |
Deep learning-based methods | Sakkos [49] | 3DCNN | CDnet2014 | — | Titan X | 0.95 | — | — | — |
Deep learning-based methods | Gao [50] | 3DCNN | CDnet2012 | — | — | 0.95 (CDnet2012) | — | — | — |
Deep learning-based methods | Zheng [52] | GAN | CDnet2014+USCD+SBI2015 | 23 FPS | GTX 970 | 0.95 | — | 0.92 | 0.92 |
Deep learning-based methods | Bahri [53] | — | CDnet2014+Wallflower | 4.9 FPS | GTX 1080Ti | 0.83 | 0.85 | — | — |
Modules | Acc↑ | Precision↑ | Recall↑ | F1↑ | PWC↓ | FPR↓ | FNR↓ | Sp↑ | AUC↑ |
---|---|---|---|---|---|---|---|---|---|
① Ours | 0.9709 | 0.882 | 0.8903 | 0.879 | 0.8906 | 0.0061 | 0.1097 | 0.9939 | 0.9842 |
② w/o MASM | 0.9692 | 0.8702 | 0.8443 | 0.846 | 1.0411 | 0.0061 | 0.1557 | 0.9939 | 0.9695 |
③ w/o CAM | 0.9678 | 0.8532 | 0.8828 | 0.8569 | 1.218 | 0.0095 | 0.1172 | 0.9905 | 0.9813 |
④ w/o MASM + CAM | 0.9696 | 0.8541 | 0.8309 | 0.8263 | 1.0899 | 0.0066 | 0.1691 | 0.9934 | 0.9564 |
⑤ w/o MASM + CAM + ETDD | 0.9649 | 0.7693 | 0.7791 | 0.7481 | 1.6923 | 0.011 | 0.2209 | 0.989 | 0.9479 |
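To help read the ablation table above, the following sketch checks two identities that hold by definition (FNR = 1 - Recall and Sp = 1 - FPR) against the reported rows and then computes the F1 drop of each ablated variant relative to the full model. The values are copied from the table; the helper name `is_consistent` and the rounding tolerance are assumptions made for this illustration.

```python
# Selected columns of the ablation table (values copied as reported).
ablation = {
    "Ours": {"recall": 0.8903, "f1": 0.8790, "fpr": 0.0061, "fnr": 0.1097, "sp": 0.9939},
    "w/o MASM": {"recall": 0.8443, "f1": 0.8460, "fpr": 0.0061, "fnr": 0.1557, "sp": 0.9939},
    "w/o CAM": {"recall": 0.8828, "f1": 0.8569, "fpr": 0.0095, "fnr": 0.1172, "sp": 0.9905},
    "w/o MASM + CAM": {"recall": 0.8309, "f1": 0.8263, "fpr": 0.0066, "fnr": 0.1691, "sp": 0.9934},
    "w/o MASM + CAM + ETDD": {"recall": 0.7791, "f1": 0.7481, "fpr": 0.0110, "fnr": 0.2209, "sp": 0.9890},
}


def is_consistent(row, tol=5e-5):
    """Check FNR = 1 - Recall and Sp = 1 - FPR up to four-decimal rounding."""
    return (abs(row["recall"] + row["fnr"] - 1.0) < tol
            and abs(row["sp"] + row["fpr"] - 1.0) < tol)


all_consistent = all(is_consistent(row) for row in ablation.values())

# F1 lost by each ablated variant relative to the full model.
full_f1 = ablation["Ours"]["f1"]
f1_drop = {
    name: round(full_f1 - row["f1"], 4)
    for name, row in ablation.items()
    if name != "Ours"
}

print("identities hold:", all_consistent)
for name, drop in f1_drop.items():
    print(f"{name}: F1 drop = {drop}")
```

On these values, removing MASM alone costs 0.033 F1, removing CAM alone costs 0.0221, and removing MASM, CAM, and ETDD together costs 0.1309.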
Table 3. F1 of different methods on the LASIESTA dataset.
Videos | Cuevas [59] | FgSegNet-M-55 [25] | MSFS-55 [21] | Fast-D [60] | 3DCD-55 [61] | Pardas [62] | DFC-D [63] | CUAN [64] | DAN (Ours) |
---|---|---|---|---|---|---|---|---|---|
O_SN | 0.78 | 0.19 | 0.31 | 0.88 | 0.69 | 0.86 | 0.90 | 0.89 | 0.85 |
O_SU | 0.72 | 0.25 | 0.37 | 0.87 | 0.85 | 0.87 | 0.82 | 0.87 | 0.87 |
O_RA | 0.87 | 0.18 | 0.35 | 0.94 | 0.90 | 0.90 | 0.93 | 0.90 | 0.94 |
O_CL | 0.93 | 0.22 | 0.41 | 0.94 | 0.87 | 0.80 | 0.94 | 0.84 | 0.92 |
I_SI | 0.88 | 0.43 | 0.39 | 0.93 | 0.87 | 0.88 | 0.93 | 0.84 | 0.85 |
I_OC | 0.78 | 0.31 | 0.37 | 0.92 | 0.91 | 0.90 | 0.92 | 0.83 | 0.93 |
I_MB | 0.94 | 0.71 | 0.64 | 0.94 | 0.89 | 0.78 | 0.94 | 0.93 | 0.95 |
I_IL | 0.65 | 0.32 | 0.35 | 0.50 | 0.92 | 0.82 | 0.51 | 0.82 | 0.83 |
I_CA | 0.84 | 0.69 | 0.40 | 0.89 | 0.82 | 0.89 | 0.94 | 0.89 | 0.93 |
I_BS | 0.66 | 0.21 | 0.36 | 0.62 | 0.72 | 0.85 | 0.63 | 0.86 | 0.86 |
Average | 0.81 | 0.35 | 0.40 | 0.84 | 0.84 | 0.86 | 0.85 | 0.87 | 0.89 |
Table 4. F1 of different methods on the CDnet2014 dataset.
Videos | IUTIS-5 [65] | SemanticBGS [66] | Deepbs [24] | BMN-BSN [23] | RT-SBS-V1 [67] | BSUV-Net [26] | GSTO [68] | ISFLN [69] | Ours |
---|---|---|---|---|---|---|---|---|---|
highway | 0.95 | 0.96 | 0.97 | 0.95 | 0.95 | 0.98 | 0.88 | 0.93 | 0.95 |
office | 0.97 | 0.96 | 0.98 | 0.97 | 0.93 | 0.97 | 0.84 | 0.94 | 0.93 |
PETS2006 | 0.94 | 0.94 | 0.94 | 0.92 | 0.88 | 0.95 | 0.83 | 0.92 | 0.92 |
canoe | 0.95 | 0.95 | 0.98 | 0.82 | 0.94 | 0.91 | 0.84 | 0.91 | 0.91 |
turbulence1 | 0.65 | 0.30 | 0.77 | 0.56 | 0.14 | 0.66 | 0.32 | 0.85 | 0.84 |
sofa | 0.79 | 0.84 | 0.81 | 0.91 | 0.77 | 0.89 | 0.73 | 0.93 | 0.93 |
turnpike_0_5fps | 0.88 | 0.88 | 0.49 | 0.72 | 0.90 | 0.91 | 0.79 | 0.81 | 0.80 |
peopleInShade | 0.91 | 0.92 | 0.92 | 0.89 | 0.92 | 0.90 | 0.97 | 0.89 | 0.87 |
lakeSide | 0.60 | 0.66 | 0.65 | 0.51 | 0.57 | 0.76 | NA | 0.84 | 0.79 |
cubicle | 0.92 | 0.98 | 0.94 | 0.63 | 0.97 | 0.92 | 0.78 | 0.90 | 0.90 |
turbulence0 | 0.89 | 0.89 | 0.80 | 0.02 | 0.63 | 0.44 | 0.46 | 0.84 | 0.84 |
diningRoom | 0.92 | 0.93 | 0.90 | 0.87 | 0.78 | 0.91 | NA | 0.96 | 0.95 |
copyMachine | 0.93 | 0.96 | 0.95 | 0.96 | 0.95 | 0.84 | 0.84 | 0.95 | 0.95 |
Average | 0.87 | 0.86 | 0.85 | 0.75 | 0.79 | 0.85 | 0.75 | 0.90 | 0.89 |
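One simple way to read the stability claim made for the CDnet2014 results is to look at each method's worst per-video F1 in the table above. The sketch below does this with the per-video values copied from the table (NA entries are stored as `None` and skipped). It is an illustrative reading aid under that assumption, not an analysis taken from the paper.

```python
methods = [
    "IUTIS-5 [65]", "SemanticBGS [66]", "Deepbs [24]", "BMN-BSN [23]",
    "RT-SBS-V1 [67]", "BSUV-Net [26]", "GSTO [68]", "ISFLN [69]", "Ours",
]

# Per-video F1, one list per video in the column order of `methods`.
f1_by_video = {
    "highway":         [0.95, 0.96, 0.97, 0.95, 0.95, 0.98, 0.88, 0.93, 0.95],
    "office":          [0.97, 0.96, 0.98, 0.97, 0.93, 0.97, 0.84, 0.94, 0.93],
    "PETS2006":        [0.94, 0.94, 0.94, 0.92, 0.88, 0.95, 0.83, 0.92, 0.92],
    "canoe":           [0.95, 0.95, 0.98, 0.82, 0.94, 0.91, 0.84, 0.91, 0.91],
    "turbulence1":     [0.65, 0.30, 0.77, 0.56, 0.14, 0.66, 0.32, 0.85, 0.84],
    "sofa":            [0.79, 0.84, 0.81, 0.91, 0.77, 0.89, 0.73, 0.93, 0.93],
    "turnpike_0_5fps": [0.88, 0.88, 0.49, 0.72, 0.90, 0.91, 0.79, 0.81, 0.80],
    "peopleInShade":   [0.91, 0.92, 0.92, 0.89, 0.92, 0.90, 0.97, 0.89, 0.87],
    "lakeSide":        [0.60, 0.66, 0.65, 0.51, 0.57, 0.76, None, 0.84, 0.79],
    "cubicle":         [0.92, 0.98, 0.94, 0.63, 0.97, 0.92, 0.78, 0.90, 0.90],
    "turbulence0":     [0.89, 0.89, 0.80, 0.02, 0.63, 0.44, 0.46, 0.84, 0.84],
    "diningRoom":      [0.92, 0.93, 0.90, 0.87, 0.78, 0.91, None, 0.96, 0.95],
    "copyMachine":     [0.93, 0.96, 0.95, 0.96, 0.95, 0.84, 0.84, 0.95, 0.95],
}

# Worst per-video F1 for each method, ignoring unavailable (NA) entries.
worst_f1 = {
    method: min(row[i] for row in f1_by_video.values() if row[i] is not None)
    for i, method in enumerate(methods)
}

for method, score in sorted(worst_f1.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method}: worst F1 = {score:.2f}")
```

By this measure, ISFLN [69] and DAN have the highest worst-case F1 (0.81 and 0.79), while several other methods fall below 0.50 on at least one video.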
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, H.; Yang, L.; Du, X. Dynamic-Aware Network for Moving Object Detection. Symmetry 2024, 16, 1620. https://doi.org/10.3390/sym16121620