A Dilated Convolutional Neural Network for Cross-Layers of Contextual Information for Congested Crowd Counting
Abstract
1. Introduction
- First, we introduce a dilated contextual module (DCM) that combines convolutional and dilated convolutional blocks in two branches to counter the perspective distortion inherent in crowd images.
- Second, we propose a novel dilated convolutional network that fully exploits cross-layer connections among feature maps, fusing contextual information while preserving fine detail in crowd images.
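The appeal of dilated convolutions here is that a k × k kernel with dilation rate d covers the same span as a dense kernel of size d(k − 1) + 1 without adding parameters, so stacked dilated blocks enlarge the receptive field cheaply. A minimal sketch of this standard receptive-field arithmetic (illustrative only, not code from the paper):

```python
def effective_kernel(k, d):
    # A k x k convolution with dilation rate d spans the same area
    # as a dense kernel of size d*(k-1)+1, with no extra parameters.
    return d * (k - 1) + 1

def receptive_field(layers):
    # layers: list of (kernel_size, dilation) pairs, all stride 1.
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# Three stacked 3x3 convs with dilation 2:
print(receptive_field([(3, 2)] * 3))   # -> 13
# The same depth with plain 3x3 convs (dilation 1):
print(receptive_field([(3, 1)] * 3))   # -> 7
```

At equal depth and parameter count, the dilated stack nearly doubles the receptive field, which is what lets the network capture context for large, near-camera heads while keeping resolution for distant ones.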
2. Related Works
2.1. Multicolumn CNN-Based Methods
2.2. Dilated Convolution-Based Methods
3. Proposed Method
3.1. Dilated Contextual Module
3.2. Density Map Generation
3.3. Loss Function
4. Experiments
4.1. Datasets
- ShanghaiTech [2] is one of the most widely used large-scale crowd counting datasets, comprising 1198 images with 330,165 annotated head centers. It is divided into Part A and Part B, which have distinct crowd density distributions: Part A consists of images randomly sourced from the Internet, whereas Part B comprises images captured on a busy street in metropolitan Shanghai.
- Mall [16] is a dataset compiled from surveillance videos of shopping centers. The video sequence encompasses 2000 frames and includes a total of 62,325 pedestrians. This dataset encapsulates a broad spectrum of density variations under various lighting conditions, along with instances of severe occlusion between individuals.
- UCF_CC_50 [17], developed at the University of Central Florida, comprises only 50 images with 63,974 annotated head centers. This dataset encompasses a diverse array of scenes, including concerts, protests, stadiums, and marathons, with counts ranging from 94 to 4543 heads per image. The images in UCF_CC_50 exhibit different viewing angles, resulting in varying degrees of perspective distortion.
- UCF-QNRF [18] is a highly challenging dataset comprising 1535 high-resolution crowd images with a total of 1,251,642 annotated head centers. The images span a wide spectrum of crowd densities and were captured from widely varying viewpoints and angles. The dataset is notably congested, with head counts ranging from 49 to 12,865 per image.
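The per-image averages reported for these datasets follow directly from the totals quoted above; a trivial arithmetic check (`avg_count` is our own helper, not from the paper):

```python
def avg_count(total_heads, num_images):
    # Average number of annotated heads per image.
    return total_heads / num_images

# Mall: 62,325 pedestrians over 2000 frames.
print(round(avg_count(62_325, 2000), 1))
# UCF-QNRF: 1,251,642 annotations over 1535 images.
print(round(avg_count(1_251_642, 1535)))
```

These averages (roughly 31 and 815 heads per image) make the density gap between surveillance-style and congested datasets concrete.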
4.2. Implementation Details
4.3. Comparisons with State-of-the-Arts
4.3.1. Results on the ShanghaiTech Dataset
4.3.2. Results on the Mall Dataset
4.3.3. Results on the UCF_CC_50 and UCF-QNRF Datasets
4.4. Ablation Study
4.4.1. Number of DCMs
4.4.2. Backbone
4.5. Discussions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liu, X.; Liu, W.; Mei, T.; Ma, H. PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance. IEEE Trans. Multimed. 2018, 20, 645–658.
- Zhang, Y.; Zhou, D.; Chen, S.; Gao, S.; Ma, Y. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 589–597.
- Liu, W.; Salzmann, M.; Fua, P. Context-aware crowd counting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5099–5108.
- Ji, Q.; Zhu, T.; Bao, D. A hybrid model of convolutional neural networks and deep regression forests for crowd counting. Appl. Intell. 2020, 50, 2818–2832.
- Elharrouss, O.; Almaadeed, N.; Abualsaud, K.; Al-Maadeed, S.; Al-Ali, A.; Mohamed, A. FSC-set: Counting, localization of football supporters crowd in the stadiums. IEEE Access 2022, 10, 10445–10459.
- Lin, H.; Ma, Z.; Ji, R.; Wang, Y.; Hong, X. Boosting crowd counting via multifaceted attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 19628–19637.
- Liu, Y.; Cao, G.; Shi, H.; Hu, Y. Lw-Count: An effective lightweight encoding-decoding crowd counting network. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6821–6834.
- Song, Q.; Wang, C.; Wang, Y.; Tai, Y.; Wang, C.; Li, J.; Wu, J.; Ma, J. To choose or to fuse? Scale selection for crowd counting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; Volume 35, pp. 2576–2583.
- Li, Y.; Zhang, X.; Chen, D. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1091–1100.
- Ma, J.; Dai, Y.; Tan, Y.P. Atrous convolutions spatial pyramid network for crowd counting and density estimation. Neurocomputing 2019, 350, 91–101.
- Yan, Z.; Zhang, R.; Zhang, H.; Zhang, Q.; Zuo, W. Crowd Counting via Perspective-Guided Fractional-Dilation Convolution. IEEE Trans. Multimed. 2022, 24, 2633–2647.
- Hafeezallah, A.; Al-Dhamari, A.; Abu-Bakar, S.A.R. U-ASD Net: Supervised Crowd Counting Based on Semantic Segmentation and Adaptive Scenario Discovery. IEEE Access 2021, 9, 127444–127459.
- Huang, L.; Zhu, L.; Shen, S.; Zhang, Q.; Zhang, J. SRNet: Scale-aware representation learning network for dense crowd counting. IEEE Access 2021, 9, 136032–136044.
- Zhu, G.; Zeng, X.; Jin, X.; Zhang, J. Metro passengers counting and density estimation via dilated-transposed fully convolutional neural network. Knowl. Inf. Syst. 2021, 63, 1557–1575.
- Zhang, C.; Li, H.; Wang, X.; Yang, X. Cross-scene crowd counting via deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 833–841.
- Wu, Z.; Zhang, X.; Tian, G.; Wang, Y.; Huang, Q. Spatial-Temporal Graph Network for Video Crowd Counting. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 228–241.
- Idrees, H.; Saleemi, I.; Seibert, C.; Shah, M. Multi-source multi-scale counting in extremely dense crowd images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2547–2554.
- Idrees, H.; Tayyab, M.; Athrey, K.; Zhang, D.; Al-Maadeed, S.; Rajpoot, N.; Shah, M. Composition loss for counting, density map estimation and localization in dense crowds. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 532–546.
- Sindagi, V.A.; Patel, V.M. HA-CCN: Hierarchical attention-based crowd counting network. IEEE Trans. Image Process. 2019, 29, 323–335.
- Sindagi, V.A.; Yasarla, R.; Patel, V.M. JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2594–2609.
- Sindagi, V.A.; Patel, V.M. CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Sam, D.B.; Surya, S.; Babu, R.V. Switching Convolutional Neural Network for Crowd Counting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4031–4039.
- Sindagi, V.A.; Patel, V.M. Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1879–1888.
- Xiong, H.; Lu, H.; Liu, C.; Liu, L.; Cao, Z.; Shen, C. From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8361–8370.
- Gao, J.; Wang, Q.; Li, X. PCC Net: Perspective Crowd Counting via Spatial Convolutional Network. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 3486–3498.
- Sam, D.B.; Peri, S.V.; Sundararaman, M.N.; Kamath, A.; Babu, R.V. Locate, size, and count: Accurately resolving people in dense crowds via detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2739–2751.
- Bai, S.; He, Z.; Qiao, Y.; Hu, H.; Wu, W.; Yan, J. Adaptive Dilated Network with Self-Correction Supervision for Counting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4593–4602.
- Elharrouss, O.; Almaadeed, N.; Abualsaud, K.; Al-Ali, A.; Mohamed, A.; Khattab, T.; Al-Maadeed, S. Drone-SCNet: Scaled cascade network for crowd counting on drone images. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 3988–4001.
- Jiang, X.; Zhang, L.; Zhang, T.; Lv, P.; Zhou, B.; Pang, Y.; Xu, M.; Xu, C. Density-Aware Multi-Task Learning for Crowd Counting. IEEE Trans. Multimed. 2021, 23, 443–453.
- Yang, Y.; Li, G.; Du, D.; Huang, Q.; Sebe, N. Embedding Perspective Analysis Into Multi-Column Convolutional Neural Network for Crowd Counting. IEEE Trans. Image Process. 2021, 30, 1395–1407.
- Lin, H.; Hong, X.; Ma, Z.; Wei, X.; Qiu, Y.; Wang, Y.; Gong, Y. Direct Measure Matching for Crowd Counting. In Proceedings of the International Joint Conferences on Artificial Intelligence Organization, Virtual Event, 19–26 August 2021; pp. 837–844.
- Khan, S.D.; Salih, Y.; Zafar, B.; Noorwali, A. A Deep-Fusion Network for Crowd Counting in High-Density Crowded Scenes. Int. J. Comput. Intell. Syst. 2021, 14, 168.
- Shu, W.; Wan, J.; Tan, K.C.; Kwong, S.; Chan, A.B. Crowd counting in the frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 19618–19627.
- Chen, K.; Loy, C.C.; Gong, S.; Xiang, T. Feature mining for localised crowd counting. In Proceedings of the British Machine Vision Conference, Guildford, UK, 3–7 September 2012; pp. 1–11.
- Wang, Y.; Zou, Y. Fast visual object counting via example-based density estimation. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3653–3657.
- Xiong, F.; Shi, X.; Yeung, D.Y. Spatiotemporal Modeling for Crowd Counting in Videos. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5161–5169.
- Sheng, B.; Shen, C.; Lin, G.; Li, J.; Yang, W.; Sun, C. Crowd Counting via Weighted VLAD on a Dense Attribute Feature Map. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 1788–1797.
- Liu, L.; Wang, H.; Li, G.; Ouyang, W.; Lin, L. Crowd Counting Using Deep Recurrent Spatial-Aware Network. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 9–19 July 2018; pp. 849–855.
- Kong, W.; Li, H.; Xing, G.; Zhao, F. An Automatic Scale-Adaptive Approach with Attention Mechanism-Based Crowd Spatial Information for Crowd Counting. IEEE Access 2019, 7, 66215–66225.
- Saqib, M.; Khan, S.D.; Sharma, N.; Blumenstein, M. Crowd counting in low-resolution crowded scenes using region-based deep convolutional neural networks. IEEE Access 2019, 7, 35317–35329.
- Fang, Y.; Zhan, B.; Cai, W.; Gao, S.; Hu, B. Locality-Constrained Spatial Transformer Network for Video Crowd Counting. In Proceedings of the IEEE International Conference on Multimedia and Expo, Shanghai, China, 8–12 July 2019; pp. 814–819.
- Miao, Y.; Han, J.; Gao, Y.; Zhang, B. ST-CNN: Spatial-Temporal Convolutional Neural Network for crowd counting in videos. Pattern Recognit. Lett. 2019, 125, 113–118.
- Fang, Y.; Gao, S.; Li, J.; Luo, W.; He, L.; Hu, B. Multi-level feature fusion based Locality-Constrained Spatial Transformer network for video crowd counting. Neurocomputing 2020, 392, 98–107.
- Wu, X.; Xu, B.; Zheng, Y.; Ye, H.; Yang, J.; He, L. Fast video crowd counting with a Temporal Aware Network. Neurocomputing 2020, 403, 13–20.
- Han, T.; Gao, J.; Yuan, Y.; Wang, Q. Focus on Semantic Consistency for Cross-Domain Crowd Understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 4–8 May 2020; pp. 1848–1852.
- Cai, Y.; Ma, Z.; Lu, C.; Wang, C.; He, G. Global Representation Guided Adaptive Fusion Network for Stable Video Crowd Counting. IEEE Trans. Multimed. 2022, 25, 5222–5233.
- Wang, Q.; Gao, J.; Lin, W.; Yuan, Y. Learning from Synthetic Data for Crowd Counting in the Wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8190–8199.
- Basalamah, S.; Khan, S.D.; Ullah, H. Scale Driven Convolutional Neural Network Model for People Counting and Localization in Crowd Scenes. IEEE Access 2019, 7, 71576–71584.
- Khan, S.D.; Basalamah, S. Sparse to Dense Scale Prediction for Crowd Counting in High Density Crowds. Arab. J. Sci. Eng. 2021, 46, 3051–3065.
- Khan, S.D.; Basalamah, S. Scale and density invariant head detection deep model for crowd counting in pedestrian crowds. Vis. Comput. 2021, 37, 2127–2137.
- Wan, J.; Wang, Q.; Chan, A.B. Kernel-Based Density Map Generation for Dense Object Counting. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1357–1370.
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
- Liang, D.; Chen, X.; Xu, W.; Zhou, Y.; Bai, X. TransCrowd: Weakly-supervised crowd counting with transformers. Sci. China Inf. Sci. 2022, 65, 160104.
Structures | Layer Types | Number of Channels | Kernel Size | Padding Size | Stride | Dilation Rate
---|---|---|---|---|---|---
Backbone | conv × 2 | 64 | 3 × 3 | 1 | 1 | 1
 | max-pooling | - | 2 × 2 | - | 2 | -
 | conv × 2 | 128 | 3 × 3 | 1 | 1 | 1
 | max-pooling | - | 2 × 2 | - | 2 | -
 | conv × 3 | 256 | 3 × 3 | 1 | 1 | 1
 | max-pooling | - | 2 × 2 | - | 2 | -
 | conv × 3 | 512 | 3 × 3 | 1 | 1 | 1
DCM_i (1 ≤ i ≤ m) | Branch_1: conv | 256 | 1 × 1 | 0 | 1 | 1
 | Branch_1: conv | 128 | 1 × 1 | 0 | 1 | 1
 | Branch_1: conv | 64 | 3 × 3 | 2 | 1 | 2
 | Branch_2: conv | 32 | 3 × 3 | 2 | 1 | 2
 | Branch_2: conv | 16 | 3 × 3 | 2 | 1 | 2
 | Branch_2: conv | 256 | 1 × 1 | 0 | 1 | 1
 | fully connected layer | 1 | 1 × 1 | 0 | 1 | 1
Dataset | Number of Images | Training/Testing | Average Resolution | Total Count | Avg Count | Min Count | Max Count
---|---|---|---|---|---|---|---
ShanghaiTech part A | 482 | 300/182 | 589 × 868 | 241,677 | 501 | 33 | 3139 |
ShanghaiTech part B | 716 | 400/316 | 768 × 1024 | 88,488 | 123 | 9 | 578 |
Mall | 2000 | 800/1200 | 320 × 240 | 62,325 | 31 | 13 | 53 |
UCF_CC_50 | 50 | - | 2101 × 2888 | 63,974 | 1280 | 94 | 4543 |
UCF-QNRF | 1535 | 1201/334 | 2013 × 2902 | 1,251,642 | 815 | 49 | 12,865 |
Methods | Part A MAE | Part A RMSE | Part B MAE | Part B RMSE
---|---|---|---|---
Cascaded-MTL [21] (2017) | 101.3 | 152.4 | 20.0 | 31.1 |
Switching-CNN [22] (2017) | 90.4 | 135.0 | 21.6 | 33.4 |
CP-CNN [23] (2017) | 73.6 | 106.4 | 20.1 | 30.1 |
CSRNet [9] (2018) | 68.2 | 106.4 | 10.6 | 16.0 |
HA-CCN [19] (2019) | 62.9 | 94.9 | 8.1 | 13.4 |
S-DCNet [24] (2019) | 58.3 | 95.0 | 6.7 | 10.7 |
ACSPNet [10] (2019) | 85.2 | 137.1 | 15.4 | 23.1 |
PCC Net [25] (2020) | 73.5 | 124 | 11.0 | 19.0 |
CG-DRCN-CC [20] (2020) | 60.2 | 94 | 8.5 | 14.4 |
LSC-CNN [26] (2020) | 66.4 | 117.0 | 8.1 | 12.7 |
ADSCNet [27] (2020) | 55.40 | 97.7 | 6.4 | 11.3 |
SCNet [28] (2021) | 58.5 | 99.1 | 8.5 | 13.4 |
Density CNN [29] (2021) | 63.1 | 106.3 | 9.1 | 16.3 |
EPA [30] (2021) | 60.9 | 91.6 | 7.9 | 11.6 |
S3 [31] (2021) | 57 | 96 | 6.3 | 10.6 |
DFN [32] (2021) | 77.58 | 129.7 | 14.1 | 21.10 |
U-ASD Net [12] (2021) | 64.6 | 106.1 | 7.5 | 12.4 |
ChfL [33] (2022) | 57.5 | 94.3 | 6.9 | 11 |
Lw-Count [7] (2022) | 69.7 | 100.5 | 10.1 | 12.4 |
PFDNet [11] (2022) | 53.8 | 89.2 | 6.5 | 10.7 |
CL-DCNN (ours) | 52.6 | 90.9 | 8.1 | 12.8 |
Methods | MAE | RMSE |
---|---|---|
Ridge Regression [34] (2012) | 3.59 | 19.0 |
MCNN [2] (2016) | 2.24 | 8.5 |
DE-VOC [35] (2016) | 2.7 | 2.1 |
Bidirectional ConvLSTM [36] (2017) | 2.10 | 7.6 |
weighted-VLAD [37] (2018) | 2.4 | 9.1 |
DRSAN [38] (2018) | 1.72 | 2.10 |
SAC-Crowd [39] (2019) | 2.3 | 3.0 |
MGF [40] (2019) | 1.89 | 7.29 |
ACSPNet [10] (2019) | 1.76 | 2.24 |
LSTN [41] (2019) | 2.00 | 2.50 |
ST-CNN [42] (2019) | 4.03 | 5.87 |
CountForest [4] (2020) | 2.25 | 6.21 |
MLSTN [43] (2020) | 1.80 | 2.42 |
TAN [44] (2020) | 2.03 | 2.60 |
FSC [45] (2020) | 3.71 | 4.66 |
U-ASD Net [12] (2021) | 1.8 | 2.2 |
GRGAF-ST [46] (2022) | 1.61 | 2.07 |
CL-DCNN (ours) | 1.55 | 2.01 |
Methods | UCF_CC_50 MAE | UCF_CC_50 RMSE | UCF-QNRF MAE | UCF-QNRF RMSE
---|---|---|---|---
Cascaded-MTL [21] (2017) | 322.8 | 341.4 | 251.85 | 513.92 |
CP-CNN [23] (2017) | 295.8 | 320.9 | – | – |
Switching-CNN [22] (2017) | 318.1 | 439.2 | 227.71 | 444.78 |
CSRNet [9] (2018) | 266.1 | 397.5 | 120.3 | 208.5 |
CA-Net [3] (2019) | – | – | 107 | 183 |
HA-CCN [19] (2019) | 256.2 | 348.4 | 118.1 | 180.4 |
S-DCNet [24] (2019) | 204.2 | 301.3 | 104.4 | 176.1 |
SFCN [47] (2019) | 214.2 | 318.3 | 102.0 | 171.4 |
SD-CNN [48] (2019) | 235.74 | 345.6 | – | – |
LSC-CNN [26] (2020) | 225.6 | 302.7 | 120.5 | 218.2 |
PCC Net [25] (2020) | 240.0 | 315.5 | 246.41 | 247.12 |
EPA [30] (2021) | 250.1 | 342.1 | – | – |
Density CNN [29] (2021) | 244.6 | 341.8 | 101.5 | 186.9 |
SRNet [13] (2021) | 184.1 | 232.7 | 108.2 | 177.5 |
DFN [32] (2021) | 402.3 | 434.1 | 218.2 | 357.4 |
SS-CNN [49] (2021) | 229.4 | 325.6 | 115.2 | 175.7 |
SDS-CNN [50] (2021) | – | – | 112 | 173 |
U-ASD Net [12] (2021) | 232.3 | 217.8 | – | – |
KDMG [51] (2022) | – | – | 99.5 | 173.0 |
PFDNet [11] (2022) | 205.8 | 289.3 | – | – |
Lw-Count [7] (2022) | 239.3 | 307.6 | 149.7 | 238.4 |
CL-DCNN (ours) | 181.8 | 240.6 | 96.4 | 168.7 |
Number of DCMs | Part A MAE | Part A RMSE | Part B MAE | Part B RMSE
---|---|---|---|---
0 | 63.82 | 104.02 | 10.97 | 14.84 |
1 | 52.66 | 90.98 | 10.06 | 13.16 |
2 | 57.64 | 96.32 | 8.92 | 13.35 |
3 | 60.83 | 100.12 | 8.1 | 12.8 |
4 | 62.32 | 102.47 | 9.15 | 12.94 |
Backbone | Part A MAE | Part A RMSE | Part B MAE | Part B RMSE
---|---|---|---|---
AlexNet | 56.40 | 96.72 | 9.45 | 13.48 |
VGG16 | 52.66 | 90.98 | 8.1 | 12.8 |
ResNet50 | 54.52 | 95.21 | 9.94 | 14.07 |
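All of the result tables above report MAE and RMSE computed over per-image head counts. Under their standard definitions (a minimal sketch with hypothetical counts, not the authors' evaluation code):

```python
from math import sqrt

def mae(pred, gt):
    # Mean absolute error between predicted and ground-truth counts.
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def rmse(pred, gt):
    # Root mean squared error; penalizes large miscounts more heavily.
    return sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))

# Hypothetical per-image counts, for illustration only:
predicted = [105, 48, 1210]
actual = [100, 50, 1200]
print(round(mae(predicted, actual), 2))   # -> 5.67
print(round(rmse(predicted, actual), 2))  # -> 6.56
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and is the more sensitive of the two to occasional severe miscounts in highly congested images.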
Share and Cite
Zhao, Z.; Ma, P.; Jia, M.; Wang, X.; Hei, X. A Dilated Convolutional Neural Network for Cross-Layers of Contextual Information for Congested Crowd Counting. Sensors 2024, 24, 1816. https://doi.org/10.3390/s24061816