Search Results (168)

Search Parameters:
Keywords = triplet loss

13 pages, 1794 KiB  
Article
Exploring Attributions in Convolutional Neural Networks for Cow Identification
by Dimitar Tanchev, Alexander Marazov, Gergana Balieva, Ivanka Lazarova and Ralitsa Rankova
Appl. Sci. 2025, 15(7), 3622; https://doi.org/10.3390/app15073622 - 26 Mar 2025
Viewed by 68
Abstract
Face recognition and identification is a method that is well established in traffic monitoring, security, human biodata analysis, etc. Given the current development and implementation of digitalization in all spheres of public life, new approaches are being sought to apply high-technology advancements in animal husbandry and enhance the sector’s sustainability. Using machine learning, the present study aims to investigate the possibilities for creating a model for visual face recognition of farm animals (cows) that could be used in future applications to manage the health, welfare, and productivity of animals at the herd and individual levels in real time. This study provides preliminary results from an ongoing research project, which employs attribution methods to identify which parts of a facial image contribute most to cow identification using a triplet loss network. A new dataset for identifying cows in farm environments was created by taking digital images of cows at animal holdings with intensive breeding systems. After normalization, the images were segmented into cow and background regions. Several methods were then explored for analyzing attributions and examining whether the cow or background regions have a greater influence on the network’s ability to identify the animal.
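The study identifies cows with a triplet loss network. As a point of reference, a minimal numpy sketch of the standard triplet loss (not the authors' specific network or attribution pipeline) is:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the anchor should be closer to the positive than
    to the negative by at least `margin` (squared Euclidean distances)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings: same cow (anchor/positive) vs. a different cow.
a = np.array([[1.0, 0.0]])
p = np.array([[0.9, 0.1]])
n = np.array([[-1.0, 0.0]])
print(triplet_loss(a, p, n))  # -> [0.] (already separated by the margin)
```

Attribution methods such as those in the paper then ask which image regions (cow vs. background) most influence these embedding distances.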

18 pages, 2362 KiB  
Article
Hyperspectral Target Detection Based on Masked Autoencoder Data Augmentation
by Zhixuan Zhuang, Jinhui Lan and Yiliang Zeng
Remote Sens. 2025, 17(6), 1097; https://doi.org/10.3390/rs17061097 - 20 Mar 2025
Viewed by 184
Abstract
Deep metric learning combines deep learning with metric learning to explore the deep spectral space and distinguish between the target and background. Current target detection methods typically fail to accurately distinguish local differences between the target and background, leading to insufficient suppression of the pixels surrounding the target and poor detection performance. To solve this issue, a hyperspectral target detection method based on masked autoencoder data augmentation (HTD-DA) was proposed. HTD-DA includes a multi-scale spectral metric network based on a triplet network, which enhances the ability to learn local and global spectral variations using multi-scale feature extraction and feature fusion, thereby improving background suppression. To alleviate the lack of training data, a masked spectral data augmentation network was employed: it uses the entire hyperspectral image (HSI) to train the network to learn spectral variability through mask-based reconstruction and to generate target samples from the prior spectrum. Additionally, in search of a more discriminative spectral space, an Inter-class Difference Amplification Triplet (IDAT) loss was introduced, which enhances the separation between target and background by making full use of background and prior information. The experimental results demonstrated that the proposed model provides superior detection results.
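The mask-based reconstruction idea can be illustrated by the masking step alone. A toy numpy sketch with an illustrative `mask_ratio` (the authors' reconstruction network and target-sample generation are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spectrum(spectrum, mask_ratio=0.5, rng=rng):
    """Zero out a random fraction of spectral bands; a masked autoencoder
    is trained to reconstruct the original spectrum from this input."""
    mask = rng.random(spectrum.shape) < mask_ratio
    return np.where(mask, 0.0, spectrum), mask

spectrum = np.linspace(0.1, 1.0, 10)   # toy 10-band pixel spectrum
masked, mask = mask_spectrum(spectrum)
# Reconstruction loss would be measured only on the masked bands:
recon_target = spectrum[mask]
```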
(This article belongs to the Special Issue Image Processing from Aerial and Satellite Imagery)

36 pages, 28546 KiB  
Article
An Improved YOLOv8-Based Lightweight Attention Mechanism for Cross-Scale Feature Fusion
by Shaodong Liu, Faming Shao, Weijun Chu, Juying Dai and Heng Zhang
Remote Sens. 2025, 17(6), 1044; https://doi.org/10.3390/rs17061044 - 16 Mar 2025
Viewed by 468
Abstract
This paper addresses the challenge of small object detection in remote sensing image recognition by proposing an improved YOLOv8-based lightweight attention cross-scale feature fusion model named LACF-YOLO. Prior to the backbone network outputting feature maps, this model introduces a lightweight attention module, Triplet Attention, and replaces the Concatenation with Fusion (C2f) with a more convenient and higher-performing dilated inverted convolution layer to acquire richer contextual information during the feature extraction phase. Additionally, it employs convolutional blocks composed of partial convolution and pointwise convolution as the main body of the cross-scale feature fusion network to integrate feature information from different levels. The model also utilizes the faster-converging Focal EIOU loss function to enhance accuracy and efficiency. Experimental results on the DOTA and VisDrone2019 datasets demonstrate the effectiveness of the improved model. Compared to the original YOLOv8 model, LACF-YOLO achieves a 2.9% increase in mAP and a 4.6% increase in mAPS on the DOTA dataset and a 3.5% increase in mAP and a 3.8% increase in mAPS on the VisDrone2019 dataset, with a 34.9% reduction in the number of parameters and a 26.2% decrease in floating-point operations. The model exhibits superior performance in aerial object detection.
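LACF-YOLO adopts the Focal-EIoU loss. A single-pair numpy sketch following the published Focal-EIoU formulation (the `gamma` value here is illustrative, not the paper's setting):

```python
import numpy as np

def focal_eiou(pred, target, gamma=0.5):
    """Focal-EIoU for one pair of axis-aligned boxes (x1, y1, x2, y2).
    EIoU = 1 - IoU plus center-distance, width, and height penalties
    normalized by the smallest enclosing box; the focal factor
    IoU**gamma down-weights easy, high-IoU pairs."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union
    cw = max(px2, tx2) - min(px1, tx1)          # enclosing box width
    ch = max(py2, ty2) - min(py1, ty1)          # enclosing box height
    rho2 = ((px1 + px2 - tx1 - tx2) / 2) ** 2 + ((py1 + py2 - ty1 - ty2) / 2) ** 2
    eiou = (1.0 - iou + rho2 / (cw**2 + ch**2)
            + ((px2 - px1) - (tx2 - tx1)) ** 2 / cw**2
            + ((py2 - py1) - (ty2 - ty1)) ** 2 / ch**2)
    return iou**gamma * eiou
```

Identical boxes give zero loss; any center offset or shape mismatch adds a penalty even when IoU is unchanged, which is the motivation for EIoU over plain IoU loss.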

14 pages, 668 KiB  
Article
Fine-Grained Local and Global Semantic Fusion for Multimodal Image–Text Retrieval
by Shenao Peng, Zhongmei Wang, Jianhua Liu, Changfan Zhang and Lin Jia
Big Data Cogn. Comput. 2025, 9(3), 53; https://doi.org/10.3390/bdcc9030053 - 25 Feb 2025
Viewed by 330
Abstract
An image–text retrieval method that integrates intramodal fine-grained local semantic information and intermodal global semantic information is proposed to address the weak fine-grained discrimination of semantic features across image and text modalities in cross-modal retrieval tasks. First, the original features of images and texts are extracted, and a graph attention network is employed for region relationship reasoning to obtain relation-enhanced local features. Then, an attention mechanism is applied across semantically interacting samples within the same modality, enabling comprehensive intramodal relationship learning and producing semantically enhanced image and text embeddings. Finally, a triplet loss function, enhanced with an angular constraint, is used to train the entire model. Through extensive comparative experiments conducted on the Flickr30K and MS-COCO benchmark datasets, the effectiveness and superiority of the proposed method were verified. It outperformed the current method by a relative 6.4% for image retrieval and 1.3% for caption retrieval on MS-COCO (Recall@1 using the 1K test set).
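The abstract does not spell out the angular-constrained triplet formulation; a common baseline in image–text retrieval is the hinge-based triplet ranking loss with hardest in-batch negatives, sketched here in numpy:

```python
import numpy as np

def hardest_negative_triplet(img_emb, txt_emb, margin=0.2):
    """Triplet ranking loss over a batch of matched image/text pairs:
    row i of each matrix is a matched pair; the hardest (most similar)
    non-matching text in the batch serves as each image's negative."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                       # cosine similarity matrix
    pos = sim.diagonal().copy()             # matched-pair similarities
    np.fill_diagonal(sim, -np.inf)          # exclude positives from mining
    hard_neg = sim.max(axis=1)              # hardest in-batch negative
    return float(np.maximum(margin + hard_neg - pos, 0.0).mean())
```

Mining only the hardest negative per anchor (rather than summing over all negatives) is the standard max-violation variant; the paper's angular constraint would add a further term on the anchor-positive-negative geometry.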

20 pages, 9559 KiB  
Article
Estimation Model of Corn Leaf Area Index Based on Improved CNN
by Chengkai Yang, Jingkai Lei, Zhihao Liu, Shufeng Xiong, Lei Xi, Jian Wang, Hongbo Qiao and Lei Shi
Agriculture 2025, 15(5), 481; https://doi.org/10.3390/agriculture15050481 - 24 Feb 2025
Viewed by 392
Abstract
In response to the issues of high complexity and low efficiency associated with the current reliance on manual sampling and instrumental measurement for obtaining maize leaf area index (LAI), this study constructed a maize image dataset comprising 624 images from three growth stages of summer maize in the Henan region, namely the jointing stage, small trumpet stage, and large trumpet stage. Furthermore, a maize LAI estimation model named LAINet, based on an improved convolutional neural network (CNN), was proposed. LAI estimation was carried out at these three key growth stages. In this study, the output structure was improved based on the ResNet architecture to adapt to regression tasks. The Triplet module was introduced to achieve feature fusion and self-attention mechanisms, thereby enhancing the accuracy of maize LAI estimation. The model structure was adjusted to enable the integration of growth-stage information, and the loss function was improved to accelerate the convergence speed of the network model. The model was validated on the self-constructed dataset. The results showed that the incorporation of attention mechanisms, integration of growth-stage information, and improvement of the loss function increased the model’s R2 by 0.04, 0.15, and 0.05, respectively. Among these, the integration of growth-stage information led to the greatest improvement, with the R2 increasing directly from 0.54 to 0.69. The improved model, LAINet, achieved an R2 of 0.81, which indicates that it can effectively estimate the LAI of maize. This model can provide information technology support for the phenotypic monitoring of field crops.

25 pages, 31509 KiB  
Article
Expanding Open-Vocabulary Understanding for UAV Aerial Imagery: A Vision–Language Framework to Semantic Segmentation
by Bangju Huang, Junhui Li, Wuyang Luan, Jintao Tan, Chenglong Li and Longyang Huang
Drones 2025, 9(2), 155; https://doi.org/10.3390/drones9020155 - 19 Feb 2025
Viewed by 421
Abstract
The open-vocabulary understanding of UAV aerial images plays a crucial role in enhancing the intelligence level of remote sensing applications, such as disaster assessment, precision agriculture, and urban planning. In this paper, we propose an innovative open-vocabulary model for UAV images, which combines vision–language methods to achieve efficient recognition and segmentation of unseen categories by generating multi-view image descriptions and feature extraction. To enhance the generalization ability and robustness of the model, we adopted Mixup technology to blend multiple UAV images, generating more diverse and representative training data. To address the limitations of existing open-vocabulary models in UAV image analysis, we leverage the GPT model to generate accurate and professional text descriptions of aerial images, ensuring contextual relevance and precision. The image encoder utilizes a U-Net with Mamba architecture to extract key point information through edge detection and partition pooling, further improving the effectiveness of feature representation. The text encoder employs a fine-tuned BERT model to convert text descriptions of UAV images into feature vectors. Three key loss functions were designed: a generalization loss to balance old and new category scores, a semantic segmentation loss to evaluate model performance on UAV image segmentation tasks, and a triplet loss to enhance the model’s ability to distinguish features. A comprehensive loss function integrates these terms to ensure robust performance in complex UAV segmentation tasks. Experimental results demonstrate that the proposed method has significant advantages in handling unseen categories and achieving high accuracy in UAV image segmentation tasks, showcasing its potential for practical applications in diverse aerial imagery scenarios.

21 pages, 7597 KiB  
Article
A Novel Neural Network Model Based on Real Mountain Road Data for Driver Fatigue Detection
by Dabing Peng, Junfeng Cai, Lu Zheng, Minghong Li, Ling Nie and Zuojin Li
Biomimetics 2025, 10(2), 104; https://doi.org/10.3390/biomimetics10020104 - 12 Feb 2025
Viewed by 506
Abstract
Mountainous roads are severely affected by environmental factors such as insufficient lighting and shadows from tree branches, which complicates the detection of drivers’ facial features and the determination of fatigue states. An improved method for recognizing driver fatigue states on mountainous roads using the YOLOv5 neural network is proposed. Initially, modules from Deformable Convolutional Networks (DCNs) are integrated into the feature extraction stage of the YOLOv5 framework to improve the model’s flexibility in recognizing facial characteristics and handling postural changes. Subsequently, a Triplet Attention (TA) mechanism is embedded within the YOLOv5 network to bolster image noise suppression and improve the network’s robustness in recognition. Finally, the Wing loss function is introduced into the YOLOv5 model to heighten the sensitivity to micro-features and enhance the network’s capability to capture details. Experimental results demonstrate that the modified YOLOv5 neural network achieves an average accuracy rate of 85% in recognizing driver fatigue states.
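Wing loss has a closed form: w·ln(1 + |x|/ε) for |x| < w and |x| − C otherwise, with C chosen so the two pieces meet. A numpy sketch with commonly used defaults (the paper's w and ε values are not stated in the abstract):

```python
import numpy as np

def wing_loss(error, w=10.0, eps=2.0):
    """Wing loss: logarithmic near zero (strong gradients for small,
    micro-feature errors), linear for large errors; C joins the pieces
    continuously at |error| = w."""
    x = np.abs(error)
    C = w - w * np.log(1.0 + w / eps)
    return np.where(x < w, w * np.log(1.0 + x / eps), x - C)
```

Compared with L2, the logarithmic region keeps the gradient from vanishing as landmark errors shrink, which is why it suits the fine facial-feature regression described above.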
(This article belongs to the Special Issue Bio-Inspired Robotics and Applications)

24 pages, 834 KiB  
Article
Triple Down on Robustness: Understanding the Impact of Adversarial Triplet Compositions on Adversarial Robustness
by Sander Joos, Tim Van hamme, Willem Verheyen, Davy Preuveneers and Wouter Joosen
Mach. Learn. Knowl. Extr. 2025, 7(1), 14; https://doi.org/10.3390/make7010014 - 8 Feb 2025
Viewed by 455
Abstract
Adversarial training, a widely used technique for fortifying the robustness of machine learning models, has seen its effectiveness further bolstered by modifying loss functions or incorporating additional terms into the training objective. While these adaptations are validated through empirical studies, they lack a solid theoretical basis to explain the models’ secure and robust behavior. In this paper, we investigate the integration of adversarial triplets within the adversarial training framework, a method previously shown to enhance robustness. However, the reasons behind this increased robustness are poorly understood, and the impact of different adversarial triplet configurations remains unclear. To address this gap, we utilize the robust and non-robust features framework to analyze how various adversarial triplet compositions influence robustness, providing deeper insights into the robustness guarantees of this approach. Specifically, we introduce a novel framework that explains how different compositions of adversarial triplets lead to distinct training dynamics, thereby affecting the model’s adversarial robustness. We validate our theoretical findings through empirical analysis, demonstrating that our framework accurately characterizes the effects of adversarial triplets on the training process. Our results offer a comprehensive explanation of how adversarial triplets influence the security and robustness of models, providing a theoretical foundation for methods that employ adversarial triplets to improve robustness. This research not only enhances our theoretical understanding but also has practical implications for developing more robust machine learning models.

19 pages, 944 KiB  
Article
Patch-Font: Enhancing Few-Shot Font Generation with Patch-Based Attention and Multitask Encoding
by Irfanullah Memon, Muhammad Ammar Ul Hassan and Jaeyoung Choi
Appl. Sci. 2025, 15(3), 1654; https://doi.org/10.3390/app15031654 - 6 Feb 2025
Viewed by 583
Abstract
Few-shot font generation seeks to create high-quality fonts using minimal reference style images, addressing traditional font design’s labor-intensive and time-consuming nature, particularly for languages with large character sets like Chinese and Korean. Existing methods often require multi-stage training or predefined components, which can be time-consuming and limit generalizability. This paper introduces Patch-Font, a novel single-stage method that overcomes the limitations of prior approaches, such as multi-stage training or reliance on predefined components, by integrating a patch-based attention mechanism and a multitask encoder. Patch-Font jointly captures global style elements (e.g., overall font family characteristics) and local style details (e.g., serifs, stroke shapes), ensuring high fidelity to the target style while maintaining computational efficiency. Our approach incorporates triplet margin loss with hard positive/negative mining to disentangle style from content and a style fidelity loss to enhance local style consistency. Experiments on Korean (printed and handwritten) and Chinese fonts demonstrate that Patch-Font outperforms state-of-the-art methods in style accuracy, perceptual quality, and generation speed while generalizing robustly to unseen characters and font styles. By simplifying the font creation process and delivering high-quality results, Patch-Font represents a significant step forward in making font design more accessible and scalable for diverse languages.
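Hard positive/negative mining for a triplet margin loss, as Patch-Font uses to disentangle style from content, is commonly implemented batch-wise. A numpy sketch of batch-hard mining (an illustration, not the authors' exact sampler):

```python
import numpy as np

def batch_hard_triplet(embeddings, labels, margin=0.3):
    """Batch-hard mining: each anchor is paired with its farthest positive
    (hardest same-class sample) and nearest negative in the batch."""
    d = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    idx = np.arange(len(labels))
    losses = []
    for i in idx:
        pos = d[i][same[i] & (idx != i)]    # distances to same-class samples
        neg = d[i][~same[i]]                # distances to other classes
        if len(pos) and len(neg):
            losses.append(max(pos.max() - neg.min() + margin, 0.0))
    return float(np.mean(losses))
```

Here labels would identify the font style, so glyphs of the same style are pulled together regardless of character content.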
(This article belongs to the Section Computing and Artificial Intelligence)

12 pages, 340 KiB  
Article
Quantitative Study of Swin Transformer and Loss Function Combinations for Face Anti-Spoofing
by Liang Yu Gong and Xue Jun Li
Electronics 2025, 14(3), 448; https://doi.org/10.3390/electronics14030448 - 23 Jan 2025
Viewed by 714
Abstract
Face spoofing has always been a hidden danger in network security, especially with the widespread application of facial recognition systems. However, some current face anti-spoofing (FAS) methods are not effective at detecting different forgery types and are prone to overfitting, which means they cannot effectively process unseen spoof types. Given the same extracted features, the choice of loss function significantly affects the classification result, so it is necessary to find a loss function, or a combination of loss functions, suited to spoofing detection tasks. This paper mainly aims to compare the effects of different loss functions and loss function combinations. We selected the Swin Transformer as the backbone of our training model to extract facial features to ensure the accuracy of the ablation experiment. For the loss functions, we adopted four classical choices: cross-entropy loss (CE loss), semi-hard triplet loss, L1 loss and focal loss. Finally, this work tested combinations of the Swin Transformer and different loss functions (or pairs) through in-dataset experiments on common FAS datasets (CelebA-Spoofing, CASIA-MFSD, Replay-Attack and OULU-NPU). We conclude that using a single loss function cannot produce the best results for the FAS task; the best accuracy is obtained when applying triplet loss, cross-entropy loss and Smooth L1 loss in combination.
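The winning combination of triplet, cross-entropy and Smooth L1 losses can be sketched as a weighted sum; the weights and margin below are illustrative, not the paper's tuned values:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Smooth L1: quadratic below `beta`, linear above."""
    a = np.abs(x)
    return np.where(a < beta, 0.5 * a**2 / beta, a - 0.5 * beta)

def cross_entropy(logits, label):
    """Cross-entropy for a single sample from raw logits (log-sum-exp form)."""
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def combined_loss(logits, label, d_pos, d_neg, reg_err,
                  w_ce=1.0, w_tri=1.0, w_l1=1.0, margin=0.2):
    """Weighted sum of cross-entropy, triplet, and Smooth L1 terms,
    given precomputed anchor-positive / anchor-negative distances."""
    tri = max(d_pos - d_neg + margin, 0.0)
    return (w_ce * cross_entropy(logits, label) + w_tri * tri
            + w_l1 * float(smooth_l1(reg_err).mean()))
```

In practice the triplet term operates on embedding distances from the Swin backbone while the cross-entropy term operates on the classifier logits, so the two supervise different parts of the model.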
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)

21 pages, 5699 KiB  
Article
Evaluation of Different Few-Shot Learning Methods in the Plant Disease Classification Domain
by Alexander Uzhinskiy
Biology 2025, 14(1), 99; https://doi.org/10.3390/biology14010099 - 19 Jan 2025
Viewed by 705
Abstract
Early detection of plant diseases is crucial for agro-holdings, farmers, and smallholders. Various neural network architectures and training methods have been employed to identify optimal solutions for plant disease classification. However, research applying one-shot or few-shot learning approaches, based on similarity determination, to the plant disease classification domain remains limited. This study evaluates different loss functions used in similarity learning, including Contrastive, Triplet, Quadruplet, SphereFace, CosFace, and ArcFace, alongside various backbone networks, such as MobileNet, EfficientNet, ConvNeXt, and ResNeXt. Custom datasets of real-life images, comprising over 4000 samples across 68 classes of plant diseases, pests, and their effects, were utilized. The experiments evaluate standard transfer learning approaches alongside similarity learning methods based on two classes of loss function. Results demonstrate the superiority of cosine-based methods over Siamese networks in embedding extraction for disease classification. Effective approaches for model organization and training are determined. Additionally, the impact of data normalization is tested, and the generalization ability of the models is assessed using a special dataset consisting of 400 images of difficult-to-identify plant disease cases.
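The cosine-based losses compared above (CosFace, ArcFace) differ only in where the margin enters the target-class logit. A numpy sketch for a single sample's class cosines (`s` and `m` are illustrative defaults, not the study's settings):

```python
import numpy as np

def margin_logits(cos_theta, label, s=30.0, m=0.35, kind="cosface"):
    """Margin-based softmax logits for one sample's class cosines.
    CosFace: s * (cos(theta_y) - m); ArcFace: s * cos(theta_y + m)."""
    out = np.asarray(cos_theta, dtype=float).copy()
    if kind == "cosface":
        out[label] -= m                     # additive cosine margin
    else:                                   # "arcface": additive angular margin
        out[label] = np.cos(np.arccos(np.clip(out[label], -1.0, 1.0)) + m)
    return s * out
```

Both penalize only the true-class cosine, forcing embeddings to clear the decision boundary by a margin before the softmax cross-entropy is applied; contrastive and triplet losses instead act on pairwise/triplet distances without a classifier head.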
(This article belongs to the Section Theoretical Biology and Biomathematics)

28 pages, 618 KiB  
Article
CodeContrast: A Contrastive Learning Approach for Generating Coherent Programming Exercises
by Nicolás Torres
Educ. Sci. 2025, 15(1), 80; https://doi.org/10.3390/educsci15010080 - 13 Jan 2025
Viewed by 889
Abstract
Generating high-quality programming exercises with well-aligned problem descriptions, test cases, and code solutions is crucial for computer science education. However, current methods often lack coherence among these components, reducing their educational value. We present CodeContrast, a novel generative model that uses contrastive learning to map programming problems, test cases, and solutions into a shared feature space. By minimizing the distance between matched components and maximizing it for non-matched ones, CodeContrast learns the intricate relationships necessary to generate coherent programming exercises. Our model architecture includes three encoder networks for problem descriptions, test cases, and solutions. During training, CodeContrast processes positive triplets (matching problem, test case, solution) and negative triplets (non-matching combinations) and uses a contrastive loss to position positive triplets close in the feature space while separating negative ones. Comprehensive evaluations of CodeContrast—through automatic metrics, expert ratings, and student studies—demonstrate its effectiveness. Results show high code correctness (92.3% of test cases passed), strong problem–solution alignment (BLEU score up to 0.826), and robust test case coverage (85.7% statement coverage). Expert feedback and student performance further support the pedagogical value of these generated exercises, with students performing comparably to those using manually curated content. CodeContrast advances the automated generation of high-quality programming exercises, capturing relationships among programming components to enhance educational content and improve the learning experience for students and instructors.
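The abstract describes the objective only as a contrastive loss over matched and non-matched component combinations; an InfoNCE-style contrastive loss between two matched encoder outputs is a standard instance of this idea, sketched here (CodeContrast's exact objective over three encoders is not reproduced):

```python
import numpy as np

def contrastive_loss(a, b, temperature=0.1):
    """InfoNCE-style contrastive loss: row i of `a` matches row i of `b`
    (e.g., a problem embedding and its solution embedding); every other
    row in `b` acts as an in-batch negative for that anchor."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob.diagonal().mean())
```

With three encoders, the same term can be applied to each pair (problem–test, problem–solution, test–solution) and summed.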

17 pages, 3169 KiB  
Article
Knowledge Reasoning- and Progressive Distillation-Integrated Detection of Electrical Construction Violations
by Bin Ma, Gang Liang, Yufei Rao, Wei Guo, Wenjie Zheng and Qianming Wang
Sensors 2024, 24(24), 8216; https://doi.org/10.3390/s24248216 - 23 Dec 2024
Viewed by 556
Abstract
To address the difficulty in detecting workers’ violation behaviors in electric power construction scenarios, this paper proposes an innovative method that integrates knowledge reasoning and progressive multi-level distillation techniques. First, standards, norms, and guidelines in the field of electric power construction are collected to build a comprehensive knowledge graph, aiming to provide accurate knowledge representation and normative analysis. Then, the knowledge graph is combined with the object-detection model in the form of triplets, where detected objects and their interactions are represented as subject–predicate–object relationships. These triplets are embedded into the model using an adaptive connection network, which dynamically weights the relevance of external knowledge to enhance detection accuracy. Furthermore, to enhance the model’s performance, the paper designs a progressive multi-level distillation strategy. On one hand, knowledge transfer is conducted at the object level, region level, and global level, significantly reducing the loss of contextual information during distillation. On the other hand, two teacher models of different scales are introduced, employing a two-stage distillation strategy where the advanced teacher guides the primary teacher in the first stage, and the primary teacher subsequently distills this knowledge to the student model in the second stage, effectively bridging the scale differences between the teacher and student models. Experimental results demonstrate that under the proposed method, the model size is reduced from 14.5 MB to 3.8 MB, and the floating-point operations (FLOPs) are reduced from 15.8 GFLOPs to 5.9 GFLOPs. Despite these optimizations, the AP50 reaches 92.4%, showing a 1.8% improvement compared to the original model. These results highlight the method’s effectiveness in accurately detecting workers’ violation behaviors, providing a quantitative basis for its superiority and offering a novel approach for safety management and monitoring at construction sites.
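The two-stage teacher-to-student transfer above builds on soft-target distillation. A minimal numpy sketch of the generic Hinton-style distillation term (the paper's multi-level, two-stage scheme is not reproduced; `T` is an illustrative temperature):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                         # stabilized softmax
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation: KL divergence between temperature-softened
    teacher and student distributions, scaled by T**2 so gradient magnitudes
    stay comparable across temperatures."""
    p = softmax(teacher_logits / T)         # teacher soft targets
    q = softmax(student_logits / T)         # student soft predictions
    return float(T**2 * np.sum(p * (np.log(p) - np.log(q))))
```

In the paper's setup this kind of term would be applied at the object, region, and global levels, first from the advanced teacher to the primary teacher and then from the primary teacher to the student.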

24 pages, 21931 KiB  
Article
Evaluating and Enhancing Face Anti-Spoofing Algorithms for Light Makeup: A General Detection Approach
by Zhimao Lai, Yang Guo, Yongjian Hu, Wenkang Su and Renhai Feng
Sensors 2024, 24(24), 8075; https://doi.org/10.3390/s24248075 - 18 Dec 2024
Viewed by 579
Abstract
Makeup modifies facial textures and colors, impacting the precision of face anti-spoofing systems. Many individuals opt for light makeup in their daily lives, which generally does not hinder face identity recognition. However, current research in face anti-spoofing often neglects the influence of light makeup on facial feature recognition, notably the absence of publicly accessible datasets featuring light makeup faces. If these instances are incorrectly flagged as fraudulent by face anti-spoofing systems, it could lead to user inconvenience. In response, we develop a face anti-spoofing database that includes light makeup faces and establish a criterion for determining light makeup to select appropriate data. Building on this foundation, we assess multiple established face anti-spoofing algorithms using the newly created database. Our findings reveal that the majority of these algorithms experience a decrease in performance when faced with light makeup faces. Consequently, this paper introduces a general face anti-spoofing algorithm specifically designed for light makeup faces, which includes a makeup augmentation module, a batch channel normalization module, a backbone network updated via the Exponential Moving Average (EMA) method, an asymmetric virtual triplet loss module, and a nearest neighbor supervised contrastive module. The experimental outcomes confirm that the proposed algorithm exhibits superior detection capabilities when handling light makeup faces.
(This article belongs to the Section Intelligent Sensors)

14 pages, 1158 KiB  
Article
Extreme R-CNN: Few-Shot Object Detection via Sample Synthesis and Knowledge Distillation
by Shenyong Zhang, Wenmin Wang, Zhibing Wang, Honglei Li, Ruochen Li and Shixiong Zhang
Sensors 2024, 24(23), 7833; https://doi.org/10.3390/s24237833 - 7 Dec 2024
Cited by 1 | Viewed by 962
Abstract
Traditional object detectors require extensive instance-level annotations for training. Conversely, few-shot object detectors, which are generally fine-tuned using limited data from unknown classes, tend to show biases toward base categories and are susceptible to variations within these unknown samples. To mitigate these challenges, we introduce a Two-Stage Fine-Tuning Approach (TFA) named Extreme R-CNN, designed to operate effectively with extremely limited original samples through the integration of sample synthesis and knowledge distillation. Our approach involves synthesizing new training examples via instance clipping and employing various data-augmentation techniques. We enhance the Faster R-CNN architecture by decoupling the regression and classification components of the Region of Interest (RoI), allowing synthetic samples to train the classification head independently of the object-localization process. Comprehensive evaluations on the Microsoft COCO and PASCAL VOC datasets demonstrate significant improvements over baseline methods. Specifically, on the PASCAL VOC dataset, the average precision for novel categories is enhanced by up to 15 percent, while on the more complex Microsoft COCO benchmark it is enhanced by up to 6.1 percent. Remarkably, in the 1-shot scenario, the AP50 of our model exceeds that of the baseline model in the 10-shot setting within the PASCAL VOC dataset, confirming the efficacy of our proposed method.
(This article belongs to the Collection Artificial Intelligence in Sensors Technology)
