ICNet: A Dual-Branch Instance Segmentation Network for High-Precision Pig Counting
Abstract
1. Introduction
- We create a semi-auto instance labeling tool, SAI, based on the Segment Anything Model (SAM), to produce segmentation labels faster and more accurately, and use it to build Count1200, a high-precision pig counting dataset.
- We construct a more robust backbone, ICNet, composed of a Pipe Layer and four Parallel Deformable Convolutional Layers (PDCLs), which effectively exploits long-range modeling and expands the receptive field while saving computational resources.
- We design the Parallel Deformable Convolution Block (PDCB), a dual-branch structure with a skip-connection, which is stacked to form the PDCL. Built on DCNv3, the PDCB extracts richer features with fewer parameters and is well suited to our detection work on multi-scale datasets.
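The dual-branch-with-skip-connection wiring described above can be sketched in PyTorch as follows. This is an illustrative stand-in only, not the paper's implementation: ordinary `nn.Conv2d` layers replace the DCNv3 deformable convolutions, and the class name, channel widths, and kernel sizes are all hypothetical.

```python
import torch
import torch.nn as nn


class DualBranchBlock(nn.Module):
    """Toy dual-branch block with a skip-connection (PDCB-style wiring).

    Ordinary convolutions stand in for DCNv3; spatial and channel shapes
    are preserved so the residual addition is valid.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Branch A: 3x3 receptive field.
        self.branch_a = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Branch B: 5x5 receptive field, computed in parallel with branch A.
        self.branch_b = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # 1x1 convolution fuses the concatenated branch outputs back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.fuse(torch.cat([self.branch_a(x), self.branch_b(x)], dim=1))
        return fused + x  # skip-connection


block = DualBranchBlock(16)
y = block(torch.randn(2, 16, 32, 32))  # output shape matches the input
```

Running in parallel lets the two branches cover different receptive fields at the same depth, which is the multi-scale motivation behind the dual-branch design.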
2. Materials and Methods
2.1. Count1200 Dataset
2.2. Methods
2.2.1. Instances Counting Network (ICNet)
2.2.2. Pipe Layer
2.2.3. Downsampling Layer
2.2.4. Parallel Deformable Convolutions Layer (PDCL)
2.2.5. Deformable Convolutions v3 (DCNv3)
2.3. Experimental Environment Setup
2.3.1. Data Preprocessing
2.3.2. Optimization Strategy
2.3.3. Evaluation Metric
3. Results
3.1. Results on the Test Set
3.2. Ablation Study
3.3. Evaluation of Counting
3.4. Visualization Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Full Name
---|---
SAI | Semi-auto Instance Labeling Tool
SAM | Segment Anything Model
ICNet | Instances Counting Network
PDCL | Parallel Deformable Convolutions Layer
PDCB | Parallel Deformable Convolution Block
MHSA | Multi-head Self-attention
GUI | Graphical User Interface
ViT | Vision Transformer
Methods | Param. | FLOPs | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---
SOLOv1 | 36M | 106G | 47.3 | 82.6 | 51.7 | 9.7 | 53.4 | 56.5
SOLOv2 | 65M | 133G | 60.6 | 90.9 | 67.4 | 33.2 | 64.5 | 67.4
Mask R-CNN | 62M | 134G | 47.8 | 77.1 | 54.2 | 13.3 | 52.7 | 54.9
Mask2Former (R) | 63M | 117G | 53.5 | 85.1 | 59.9 | 15.5 | 58.9 | 60.0
Mask2Former (S) | 69M | 125G | 54.1 | 84.6 | 61.0 | 15.0 | 59.7 | 60.6
InternImage | 49M | 117G | 67.6 | 93.5 | 81.4 | 37.7 | 71.6 | 70.7
ICNet | 33M | 106G | 71.4 | 95.7 | 86.4 | 47.5 | 74.5 | 74.3
No. | Simple Block | BN | Dual | Drop Path | AP | AR
---|---|---|---|---|---|---
1 | √ | | | | 66.5 | 69.7
2 | √ | √ | | | 70.7 | 73.8
3 | √ | √ | √ | | 71.0 | 74.3
4 | √ | √ | | √ | 70.8 | 73.7
5 | √ | √ | √ | √ | 71.4 | 74.3
Method | MAE | RMSE
---|---|---
SOLOv1 | 3.71 | 4.69 |
SOLOv2 | 1.20 | 1.83 |
Mask R-CNN | 2.86 | 3.55 |
Mask2Former (R) | 1.73 | 2.37 |
Mask2Former (S) | 2.39 | 3.13 |
InternImage | 0.93 | 1.38 |
ICNet | 0.68 | 1.07 |
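The counting metrics in the table above are the standard mean absolute error (MAE) and root mean square error (RMSE) over per-image counts. A minimal sketch of how they are computed; the counts below are made-up examples, not values from the paper:

```python
import math

# Hypothetical per-image pig counts: ground truth vs. model predictions.
true_counts = [14, 9, 21, 17]
pred_counts = [13, 9, 23, 16]

n = len(true_counts)
# MAE: mean of absolute count errors.
mae = sum(abs(t - p) for t, p in zip(true_counts, pred_counts)) / n
# RMSE: square root of the mean squared count error.
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts)) / n)
print(mae, rmse)  # → 1.0 and sqrt(1.5) ≈ 1.2247
```

RMSE penalizes large per-image miscounts more heavily than MAE, which is why both are reported together.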
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, S.; Zhao, C.; Zhang, H.; Li, Q.; Li, S.; Chen, Y.; Gao, R.; Wang, R.; Li, X. ICNet: A Dual-Branch Instance Segmentation Network for High-Precision Pig Counting. Agriculture 2024, 14, 141. https://doi.org/10.3390/agriculture14010141