Article

Self-Supervised Foundation Model for Template Matching

1 Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria
2 Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., Block 2, 1113 Sofia, Bulgaria
3 Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., Block 8, 1113 Sofia, Bulgaria
* Authors to whom correspondence should be addressed.
Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038
Submission received: 5 November 2024 / Revised: 12 January 2025 / Accepted: 8 February 2025 / Published: 11 February 2025
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)

Abstract

Finding the location of a template in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when little training data is available or when the images exhibit large variations in texture, different modalities, or weak visual features, which limits their applicability to real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning for template matching. The idea behind Self-TM is to learn hierarchical features that incorporate localization properties from images without any annotations. As one goes deeper into the layers of a convolutional neural network (CNN), the filters respond to increasingly complex structures and their receptive fields grow, which, in contrast to the early layers, leads to a loss of localization information. Propagating the matching results of the last layers hierarchically back to the first layer yields precise template localization. Owing to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model.
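The coarse-to-fine idea described above — matching at the deepest, coarsest feature level and then propagating the estimate back toward the earliest layers — can be sketched as follows. This is an illustrative reconstruction, not the authors' Self-TM implementation: the `correlate` and `coarse_to_fine_match` helpers, the two-level feature lists, and the `stride` parameter are assumptions made for the example, with plain NumPy arrays standing in for learned CNN feature maps.

```python
import numpy as np

def correlate(query, templ):
    """Dense normalized cross-correlation of templ over query (2-D arrays)."""
    qh, qw = query.shape
    th, tw = templ.shape
    t = (templ - templ.mean()) / (templ.std() + 1e-8)
    scores = np.full((qh - th + 1, qw - tw + 1), -np.inf)
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = query[y:y + th, x:x + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            scores[y, x] = (p * t).mean()
    return scores

def coarse_to_fine_match(query_feats, templ_feats, stride=2):
    """Locate templ in query by a global search at the coarsest (deepest)
    level, then refine the estimate in a local window at each finer level.
    Feature lists are ordered fine -> coarse; each coarser map is assumed
    to be `stride` times smaller than the next finer one."""
    # Global search at the coarsest level.
    scores = correlate(query_feats[-1], templ_feats[-1])
    y, x = np.unravel_index(scores.argmax(), scores.shape)
    # Propagate the estimate back toward the finest (earliest) level.
    for q, t in zip(reversed(query_feats[:-1]), reversed(templ_feats[:-1])):
        y, x = y * stride, x * stride          # upsample the coarse estimate
        th, tw = t.shape
        r = stride                             # local refinement radius
        y0, x0 = max(y - r, 0), max(x - r, 0)
        y1 = min(y + r + th, q.shape[0])
        x1 = min(x + r + tw, q.shape[1])
        local = correlate(q[y0:y1, x0:x1], t)  # search only a small window
        dy, dx = np.unravel_index(local.argmax(), local.shape)
        y, x = y0 + dy, x0 + dx
    return y, x
```

The global search happens only once, on the smallest map; every finer level costs a constant-size local window, which is what makes the hierarchical propagation cheap compared with dense correlation at full resolution.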
Keywords: self-supervised learning; template matching; foundation model; convolutional neural network; image matching

Share and Cite

MDPI and ACS Style

Hristov, A.; Dimov, D.; Nisheva-Pavlova, M. Self-Supervised Foundation Model for Template Matching. Big Data Cogn. Comput. 2025, 9, 38. https://doi.org/10.3390/bdcc9020038

