Fast Tongue Detection Based on Lightweight Model and Deep Feature Propagation
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. Method
2.2.1. Overview
2.2.2. Tongue Detection Based on Improved NanoDet
2.2.3. Group Attribute Auxiliary Guidance Module Based on Knowledge Distillation
2.2.4. Keyframe Detection Based on Inter Frame Difference
3. Results
3.1. Experimental Environment
3.2. Experiments on Hyperparameters
3.3. Comparative Experiment
- SSD [20]: SSD combines feature maps at multiple scales with default anchors of different sizes and aspect ratios, so that it can detect objects of different sizes and proportions. It uses a convolutional neural network (CNN) to extract image features and applies small convolutional predictors to each feature map to predict the position and category of objects.
- Faster R-CNN [21]: Faster R-CNN is a two-stage detector. A region proposal network (RPN), which shares convolutional features with the detection network, first generates candidate regions; a second stage then classifies each region and refines its bounding box.
- YoloV5 [22]: This model integrates several detection techniques, such as an FPN and Mosaic data augmentation, which help it learn image features and detect objects of different sizes and shapes. Its training process is simple and scales to large datasets, which makes YoloV5 widely used in practical applications.
- FCOS [23]: Unlike traditional two-stage methods, FCOS is a single-stage detector that requires neither candidate boxes nor predefined anchors. For each position on the feature map it densely predicts the object category, the center-ness, and the offsets from that position to the four sides of the bounding box. The center-ness score down-weights low-quality boxes predicted far from an object's center, and FCOS achieves a good balance between accuracy and speed.
- YoloV10 [24]: This model introduces consistent dual assignments for NMS-free training of YOLO detectors, which eliminates the post-processing latency caused by non-maximum suppression (NMS). It also redesigns the detection heads, down-sampling layers, and basic building blocks to achieve a better efficiency–accuracy trade-off.
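The NMS post-processing step that YoloV10 is described as removing can be sketched as follows. This is a minimal greedy NMS in plain Python, assuming boxes in (x1, y1, x2, y2) corner format and a hypothetical IoU threshold of 0.5; it is meant only to illustrate the extra per-image work and is not the implementation used by any of the compared detectors.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thr: float = 0.5) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two near-duplicate boxes and one separate box.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.90, 0.80, 0.70]
kept = greedy_nms(boxes, scores)
print(kept)
```

The pairwise comparison inside the loop is the cost that an NMS-free detector avoids at inference time.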
3.4. Ablation Study
4. Discussion
4.1. Discussion on Tongue Images
4.2. Discussion on Tongue Videos
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, C.; Fang, C. Diagnostics of Traditional Chinese Medicine; China Press of Chinese Medicine: Beijing, China, 2021; pp. 39–55.
- Chen, E.; Li, S.; Hu, M.; He, Q.; Bao, Z.; Yang, H. Research progress on image acquisition and color information analysis of tongue diagnosis in traditional Chinese medicine. China J. Tradit. Chin. Med. Pharm. 2024, 39, 3586–3589.
- Cai, Y.; Hu, S.; Guan, J.; Zhang, X. Progress on Objectification Technology of Tongue Inspection in Traditional Chinese Medicine and Discussion on its Application. Mod. Tradit. Chin. Med. Mater. Med.-World Sci. Technol. 2021, 23, 2447–2453.
- Bo, W.; Ni, S. Research on Multi-Position Target Detection of Electronic Components Combined with HOG and SVM. Mach. Des. Manuf. 2021, 10, 76–80.
- Wang, Q. Highway Parking Event Detection Based on Improved Haar-like+Adaboost. Master’s Thesis, Chongqing University, Chongqing, China, 2021.
- Zheng, F. Research on Tongue Detection and Tongue Segmentation in Open Environment. Master’s Thesis, Xiamen University, Xiamen, China, 2017.
- Fu, Z.; Li, X.; Li, F. Tongue image segmentation based on Snake model and radial edge detection. Image Vis. Comput. 2009, 14, 688–693.
- Tong, K.; Wu, Y.; Zhou, F. Recent Advances in Small Object Detection Based on Deep Learning: A Review. Image Vis. Comput. 2020, 97, 103910.
- Tang, W.; Gao, Y.; Liu, L.; Xia, T.; He, L.; Zhang, S.; Guo, J.; Li, W.; Xu, Q. An Automatic Recognition of Tooth-Marked Tongue Based on Tongue Region Detection and Tongue Landmark Detection via Deep Learning. IEEE Access 2020, 8, 153470–153478.
- Liu, B. Research on Tongue Feature Recognition Based on Image Segmentation and Detection. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2023.
- Zhu, L. Research on tongue image detection and segmentation method based on deep learning. Master’s Thesis, Hunan University of Traditional Chinese Medicine, Changsha, China, 2023.
- Zendehdel, N.; Chen, H.; Leu, M.C. Real-Time Tool Detection in Smart Manufacturing Using You-Only-Look-Once (YOLO)V5. Manuf. Lett. 2023, 35, 1052–1059.
- Kwon, H.; Kim, D.-J. Dual-Targeted Adversarial Example in Evasion Attack on Graph Neural Networks. Sci. Rep. 2025, 15, 3912.
- Wang, L.; Lin, Y.; Li, L. Application progress of intelligent diagnosis and treatment in tongue manifestation research. China J. Tradit. Chin. Med. Pharm. 2021, 36, 342–346.
- Wu, X.; Xu, H.; Lin, Z.; Li, S.; Liu, H.; Feng, Y. Review of Deep Learning in Classification of Tongue Image. J. Front. Comput. Sci. Technol. 2023, 17, 303–323.
- Nguyen, C.H.; Nguyen, T.C.; Tang, T.N.; Phan, N.L.H. Improving Object Detection by Label Assignment Distillation. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; IEEE: Waikoloa, HI, USA, 2022; pp. 1322–1331.
- Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer International Publishing: Munich, Germany, 2018; Volume 11218, pp. 122–138.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Seattle, WA, USA, 2020; pp. 1577–1586.
- Cheng, G.; Wang, J.; Li, K.; Xie, X.; Lang, C.; Yao, Y.; Han, J. Anchor-Free Oriented Proposal Generator for Object Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 November 2021).
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Seoul, Republic of Korea, 2019; pp. 9626–9635.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Volume 37, pp. 107984–108011.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Las Vegas, NV, USA, 2016; pp. 770–778.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, ICML, Long Beach, CA, USA, 9–15 June 2019; Volume 36, pp. 6105–6114.
Dataset | Type | Number | Resolution |
---|---|---|---|
Tongue image dataset | lingual image | 1452 | 312 × 415 |
Tongue image dataset | sublingual image | 1452 | 312 × 415 |
Label | Images | Categories | Annotations |
---|---|---|---|
content | height width id filename | super-category id name | imageid bbox area categoryid |
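The label layout in the table above resembles the COCO annotation format. The sketch below builds one such record and checks its internal consistency. The field spellings (`file_name`, `image_id`, `category_id`, `supercategory`) follow common COCO usage and are an assumption, as are the `[x, y, width, height]` box convention, the reading of 312 × 415 as width × height, and every example value, including the file name and category name.

```python
# Hypothetical example record; none of these values come from the dataset.
dataset = {
    "images": [
        {"id": 1, "file_name": "tongue_0001.jpg", "width": 312, "height": 415},
    ],
    "categories": [
        {"id": 1, "name": "lingual", "supercategory": "tongue"},
    ],
    "annotations": [
        # bbox is assumed to be [x, y, width, height] in pixels.
        {"image_id": 1, "category_id": 1,
         "bbox": [60.0, 120.0, 180.0, 200.0], "area": 36000.0},
    ],
}


def validate(ds: dict) -> list:
    """Return a list of problems; an empty list means the record is consistent."""
    problems = []
    images = {img["id"]: img for img in ds["images"]}
    category_ids = {cat["id"] for cat in ds["categories"]}
    for ann in ds["annotations"]:
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append("bbox lies outside the image")
        # For box-only labels we assume area == width * height of the box.
        if abs(ann["area"] - w * h) > 1e-6:
            problems.append("area does not match bbox size")
    return problems


print(validate(dataset))
```

A check of this kind is cheap to run once before training and catches mismatched ids or boxes that extend past the image border.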
Stage | Output Size | Kernel Size | Stride | Number | Output Channel |
---|---|---|---|---|---|
image | 224 × 224 | - | - | - | 4 |
Stage1 | 112 × 112 | 3 × 3 | 2 | 1 | 32 |
Stage1 | 56 × 56 | 3 × 3 | 2 | 1 | 32 |
Stage2 | 28 × 28 | 3 × 3 | 2 | 1 | 64 |
Stage2 | 28 × 28 | 3 × 3 | 1 | 3 | 64 |
Stage3 | 14 × 14 | 3 × 3 | 2 | 1 | 128 |
Stage3 | 14 × 14 | 3 × 3 | 1 | 7 | 128 |
Stage4 | 7 × 7 | 3 × 3 | 2 | 1 | 256 |
Stage4 | 7 × 7 | 3 × 3 | 1 | 3 | 256 |
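The output sizes in the backbone table follow directly from the Stride and Number columns. The sketch below replays them from the 224 × 224 input, assuming padding is chosen so that each layer yields ceil(size / stride); the grouping of the unlabeled rows under the preceding stage name is also an assumption.

```python
# (stage, stride, repeats) read from the table; stage names for the second
# row of each stage are assumed to continue the preceding stage.
LAYERS = [
    ("Stage1", 2, 1),
    ("Stage1", 2, 1),
    ("Stage2", 2, 1),
    ("Stage2", 1, 3),
    ("Stage3", 2, 1),
    ("Stage3", 1, 7),
    ("Stage4", 2, 1),
    ("Stage4", 1, 3),
]


def trace_sizes(input_size: int, layers) -> list:
    """Return (stage, spatial size) after each table row."""
    sizes = []
    size = input_size
    for name, stride, repeats in layers:
        for _ in range(repeats):
            # Ceiling division: assumes 'same'-style padding.
            size = -(-size // stride)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes(224, LAYERS):
    print(f"{name}: {size} x {size}")
```

Five stride-2 layers give a total down-sampling factor of 32, which is why the 224 × 224 input ends at 7 × 7.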
Hyperparameter | | | mAP |
---|---|---|---|
0.5 | 2 | 0.25 | 0.744 (±0.031) |
1 | 1 | 1 | 0.726 (±0.013) |
0.5 | 1 | 1 | 0.828 (±0.016) |
1.5 | 1 | 1 | 0.767 (±0.011) |
1 | 1 | 0.5 | 0.776 (±0.012) |
1 | 1 | 1.5 | 0.738 (±0.019) |
1 | 0.5 | 1 | 0.766 (±0.013) |
Hyperparameter | Value |
---|---|
Learning rate | 0.01 |
Weight decay | 0.005 |
 | 0.5 |
 | 1 |
 | 1 |
Method | mAP | Parameters | FPS | Memory Usage |
---|---|---|---|---|
SSD | 0.629 | 156.14 M | 24.72 (±0.48) | 407.48 MB |
Faster RCNN | 0.805 | 10.16 M | 17.02 (±1.61) | 38.80 MB |
YoloV5 | 0.694 | 36.90 M | 38.12 (±0.78) | 142.10 MB |
FCOS | 0.562 | 123.12 M | 38.15 (±1.65) | 459.04 MB |
YoloV10 | 0.823 | 20.45 M | 156.25 (±0.21) | 78.33 MB |
TD-DFP | 0.828 | 19.50 M | 61.88 (±1.75) | 76.55 MB |
Color Channel Addition | mAP |
---|---|
without color channel addition | 0.805 (±0.005) |
H of HSV added | 0.761 (±0.120) |
S of HSV added | 0.759 (±0.025) |
V of HSV added | 0.828 (±0.018) |
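The best row in the table above adds the V channel of HSV, which matches the 4-channel input listed in the backbone table. A minimal sketch of building such an input is given here, assuming the standard HSV definition V = max(R, G, B), 8-bit pixel values, and an R, G, B, V channel order; how the authors actually compute, scale, or order the extra channel is not stated in this section.

```python
import colorsys
from typing import List, Tuple

Pixel3 = Tuple[int, int, int]
Pixel4 = Tuple[int, int, int, int]


def add_value_channel(image: List[List[Pixel3]]) -> List[List[Pixel4]]:
    """Append V = max(R, G, B) to every RGB pixel (8-bit values assumed)."""
    return [[(r, g, b, max(r, g, b)) for (r, g, b) in row] for row in image]


# Hypothetical 2 x 2 RGB image.
rgb = [
    [(200, 30, 40), (10, 10, 10)],
    [(0, 255, 128), (90, 120, 60)],
]
rgbv = add_value_channel(rgb)

# Cross-check one pixel against the standard library's HSV conversion,
# which works on floats in [0, 1].
v_ref = colorsys.rgb_to_hsv(200 / 255, 30 / 255, 40 / 255)[2] * 255
print(rgbv[0][0], round(v_ref))
```

Because V is a per-pixel maximum, the extra channel adds no learned parameters before the first convolution; only that layer's input width changes from 3 to 4.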
Feature Extractor | mAP |
---|---|
TD-DFP with ResNet | 0.504 (±0.025) |
TD-DFP with Ghost | 0.773 (±0.028) |
TD-DFP with EfficientNetLite | 0.767 (±0.017) |
TD-DFP with ShuffleNetV2 without FPN | 0.641 (±0.162) |
TD-DFP with ShuffleNetV2 | 0.828 (±0.018) |
Method | Parameters | mAP |
---|---|---|
TD-DFP without AGM | 13.79 M | 0.754 (±0.012) |
TD-DFP with AGM | 19.50 M | 0.828 (±0.018) |
Keyframe Selection | Forward_Time | Decode_Time | Viz_Time | FPS |
---|---|---|---|---|
without keyframe selection | 3.30 s (±0.25) | 0.62 s (±0.01) | 0.45 s (±0.02) | 61.88 (±0.46) |
2 frames interval | 1.65 s (±0.19) | 0.30 s (±0.02) | 0.45 s (±0.03) | 125.00 (±0.50) |
3 frames interval | 1.10 s (±0.17) | 0.20 s (±0.04) | 0.45 s (±0.02) | 171.42 (±0.38) |
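In the table above, forward time falls roughly as 1/k when the detector runs on every k-th frame (3.30 s, 1.65 s, 1.10 s), while visualization time stays constant. The sketch below illustrates the general idea behind keyframe detection based on inter-frame difference: run the detector only when a frame differs enough from the last keyframe, and otherwise reuse the previous result. The flattened grayscale frame representation, the mean-absolute-difference measure, the threshold of 10, the comparison against the last keyframe rather than the immediately preceding frame, and the dummy detector are all assumptions made for illustration, not the authors' procedure.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[int]            # flattened grayscale frame (hypothetical)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_video(
    frames: List[Frame],
    detector: Callable[[Frame], Box],
    diff_thr: float = 10.0,  # hypothetical threshold
) -> Tuple[List[Box], int]:
    """Run the detector on keyframes only; propagate results elsewhere."""
    results: List[Box] = []
    calls = 0
    key: Optional[Frame] = None
    last: Optional[Box] = None
    for frame in frames:
        if key is None or mean_abs_diff(frame, key) > diff_thr:
            last = detector(frame)  # keyframe: run the full forward pass
            key = frame
            calls += 1
        results.append(last)        # non-keyframe: reuse the last detection
    return results, calls


def dummy_detector(frame: Frame) -> Box:
    """Stand-in for a real model; derives a box from the first pixel."""
    v = frame[0]
    return (v, v, v + 10, v + 10)


frames = [[0] * 4, [2] * 4, [3] * 4, [50] * 4, [52] * 4]
results, calls = detect_video(frames, dummy_detector)
print(calls, results)
```

With these five frames the detector runs twice instead of five times, which is the source of the speed-up; the trade-off is that boxes on skipped frames can lag behind fast motion.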
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, K.; Zhang, Y.; Zhong, L.; Liu, Y. Fast Tongue Detection Based on Lightweight Model and Deep Feature Propagation. Electronics 2025, 14, 1457. https://doi.org/10.3390/electronics14071457