Search Results (543)

Search Parameters:
Keywords = Siamese network

29 pages, 1677 KB  
Article
Pairwise Diverse and Uncertain Gradient-Sampling for Similarity Retrieval
by Christoffer Löffler
Sensors 2025, 25(22), 6899; https://doi.org/10.3390/s25226899 - 12 Nov 2025
Abstract
Sports tracking produces large, unstructured trajectory datasets. The search and retrieval of interesting plays are essential parts of their analysis. Since annotations are sparse, similarity search remains the standard technique. It relies on learned lower-dimensional representations for its computational feasibility. Siamese Networks learn dimensionality reduction from pairwise distances. However, complete training datasets are impractical to compute due to their combinatorial nature and the cost of distance calculations. Sub-sampling sacrifices representation quality for speed, leading to less meaningful search results. We propose the novel sampling technique Pairwise Diverse and Uncertain Gradient (PairDUG), which exploits the model’s gradient signals to select representative and informative pairs for training. A broad experimental study applies the method to large-scale basketball and American football datasets. The results show that PairDUG at least halves the required compute time while maintaining, or even improving, retrieval quality, and outperforms other baseline methods. Furthermore, our evaluation shows that the selected pairs’ gradient signals exhibit greater magnitude, diversity, and stability than those of any other method. This work represents a foundational contribution to pairwise distance learning. Future work will transfer the method not only to other sports, such as soccer, but also to complex trajectory datasets outside the sports domain.
(This article belongs to the Collection Artificial Intelligence in Sensors Technology)
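PairDUG's exact sampling criterion is specific to the paper, but the setup it builds on (a Siamese encoder trained to reproduce precomputed pairwise distances, with candidate pairs ranked by how informative they are to the model) can be sketched as follows. This minimal PyTorch illustration uses per-pair loss magnitude as a stand-in for the paper's gradient-based diversity and uncertainty criterion; the architecture and dimensions are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """Maps trajectories (flattened here for simplicity) to a low-dim embedding."""
    def __init__(self, in_dim=128, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
    def forward(self, x):
        return self.net(x)

def pair_losses(model, xa, xb, target_dist):
    """Per-pair squared error between embedding distance and the true distance."""
    da, db = model(xa), model(xb)
    pred = torch.norm(da - db, dim=1)
    return (pred - target_dist) ** 2

model = SiameseEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xa, xb = torch.randn(256, 128), torch.randn(256, 128)
target = torch.rand(256) * 10  # precomputed pairwise distances

# Illustrative proxy: score candidate pairs by current loss magnitude
# (a stand-in for gradient-based selection), then train on the top-k only.
with torch.no_grad():
    scores = pair_losses(model, xa, xb, target)
topk = scores.topk(64).indices  # keep the most informative pairs

loss = pair_losses(model, xa[topk], xb[topk], target[topk]).mean()
opt.zero_grad(); loss.backward(); opt.step()
```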

16 pages, 3567 KB  
Article
DCSC Mamba: A Novel Network for Building Change Detection with Dense Cross-Fusion and Spatial Compensation
by Rui Xu, Renzhong Mao, Yihui Yang, Weiping Zhang, Yiteng Lin and Yining Zhang
Information 2025, 16(11), 975; https://doi.org/10.3390/info16110975 - 11 Nov 2025
Abstract
Change detection in remote sensing imagery plays a vital role in urban planning, resource monitoring, and disaster assessment. However, current methods, including CNN-based approaches and Transformer-based detectors, still suffer from false change interference, irregular regional variations, and the loss of fine-grained details. To address these issues, this paper proposes a novel building change detection network named Dense Cross-Fusion and Spatial Compensation Mamba (DCSC Mamba). The network adopts a Siamese encoder–decoder architecture, where dense cross-scale fusion is employed to achieve multi-granularity integration of cross-modal features, thereby enhancing the overall representation of multi-scale information. Furthermore, a spatial compensation module is introduced to effectively capture both local details and global contextual dependencies, improving the recognition of complex change patterns. By integrating dense cross-fusion with spatial compensation, the proposed network exhibits a stronger capability in extracting complex change features. Experimental results on the LEVIR-CD and SYSU-CD datasets demonstrate that DCSC Mamba achieves superior performance in detail preservation and robustness against interference. Specifically, it achieves F1 scores of 90.29% and 79.62%, and IoU scores of 82.30% and 66.13% on the two datasets, respectively, validating the effectiveness and robustness of the proposed method in challenging change detection scenarios.
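The Siamese encoder–decoder skeleton the abstract describes is a common change-detection pattern: one weight-shared encoder processes both image dates, and a decoder maps fused bi-temporal features to a per-pixel change map. A minimal sketch, using a plain absolute difference in place of the paper's dense cross-fusion and spatial compensation modules:

```python
import torch
import torch.nn as nn

class SiameseChangeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared (weight-tied) encoder applied to both image dates.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        # Decoder maps fused bi-temporal features to a change logit per pixel.
        self.decoder = nn.Conv2d(32, 1, 1)

    def forward(self, t1, t2):
        f1, f2 = self.encoder(t1), self.encoder(t2)
        fused = torch.abs(f1 - f2)  # simplest fusion; the paper uses dense cross-fusion
        return self.decoder(fused)

net = SiameseChangeNet()
change_logits = net(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
```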

19 pages, 2598 KB  
Article
DOCB: A Dynamic Online Cross-Batch Hard Exemplar Recall for Cross-View Geo-Localization
by Wenchao Fan, Xuetao Tian, Long Huang, Xiuwei Zhang and Fang Wang
ISPRS Int. J. Geo-Inf. 2025, 14(11), 418; https://doi.org/10.3390/ijgi14110418 - 26 Oct 2025
Abstract
Image-based geo-localization is a challenging task that aims to determine the geographic location of a ground-level query image captured by an Unmanned Ground Vehicle (UGV) by matching it to geo-tagged nadir-view (top-down) images from an Unmanned Aerial Vehicle (UAV) stored in a reference database. The challenge comes from the perspective inconsistency between matched objects. In this work, we propose a novel metric learning scheme for hard exemplar mining to improve the performance of cross-view geo-localization. Specifically, we introduce a Dynamic Online Cross-Batch (DOCB) hard exemplar mining scheme that solves the lack of hard exemplars in mini-batches during the middle and late stages of training, which otherwise leads to training stagnation. It mines cross-batch hard negative exemplars according to the current network state and reloads them into the network so that the gradients of negative exemplars participate in back-propagation. Since the feature representation of cross-batch negative examples adapts to the current network state, the triplet loss calculation becomes more accurate. Compared with methods that only consider the gradients of anchors and positives, adding the gradients of negative exemplars helps us obtain the correct gradient direction. Therefore, our DOCB scheme can better guide the network to learn valuable metric information. Moreover, we design a simple Siamese-like network called multi-scale feature aggregation (MSFA), which generates multi-scale aggregated features by learning and fusing multiple local spatial embeddings. The experimental results demonstrate that our DOCB scheme and MSFA network achieve an accuracy of 95.78% on the CVUSA dataset and 86.34% on the CVACT_val dataset, outperforming other existing methods in the field.
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)
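The core idea of cross-batch hard exemplar mining can be illustrated compactly: keep a pool of reference samples from earlier batches, score them against the current anchors, and re-forward the hardest ones through the present network so their gradients participate in back-propagation. The sketch below is a loose, hypothetical rendering of that idea, not the DOCB algorithm itself; the pool handling and negative assignment are simplified.

```python
import torch
import torch.nn.functional as F

def mine_and_train_step(model, opt, anchors, positives, neg_pool, k=8, margin=0.3):
    """One illustrative step of cross-batch hard negative mining.

    neg_pool holds raw reference samples retained from earlier batches; the
    hardest ones are re-forwarded through the *current* network so that, as
    the abstract stresses, negatives contribute gradients that reflect the
    present model state.
    """
    a, p = model(anchors), model(positives)
    with torch.no_grad():  # cheap scoring pass over the pool
        pool_emb = model(neg_pool)
        d = torch.cdist(a, pool_emb)  # (batch, pool) distances
        # hardness: pool items closest to any anchor
        hard_idx = d.min(dim=0).values.topk(k, largest=False).indices
    n = model(neg_pool[hard_idx])  # re-forward WITH gradients
    # crude random assignment of hard negatives to anchors (simplification)
    loss = F.triplet_margin_loss(a, p, n[torch.randint(k, (a.size(0),))], margin=margin)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```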

18 pages, 5589 KB  
Technical Note
Dual-Task Supervised Network for SAR and Road Vector Image Matching
by Hanyu Cai, Yong Xian, Shaopeng Li and Decao Ma
Remote Sens. 2025, 17(20), 3504; https://doi.org/10.3390/rs17203504 - 21 Oct 2025
Abstract
We propose using Synthetic Aperture Radar (SAR) images as real-time images and road vector images as reference images for matching navigation, and we propose a Siamese U-Net dual-task supervised network, called SUDS, to solve this problem. Unlike existing methods for heterogeneous image matching, which extract common features and eliminate saliency differences, we exploit the advantages of the vector images themselves to reduce the matching difficulty at the reference image selection stage. First, we extract the common road features between SAR images and road vector images using a weight-sharing U-Net feature extraction network. Then, we propose weighting the sum of the segmentation loss and the matching loss as the network loss, optimizing feature extraction from both segmentation and matching perspectives. We prepare a specialized SAR-VEC dataset for experiments. Experiments show that the method obtains high matching correctness, with 80.2% correctness within a 5-pixel matching error and 91.0% within a 10-pixel matching error. Compared to existing methods, it better distinguishes similar roads, better suppresses the influence of imaging interference in SAR images on the matching results, and obtains more accurate matches with better robustness. We also explore the effect of the weighting parameter β on matching accuracy; the best matching results are obtained when β=0.8.
(This article belongs to the Special Issue Smart Monitoring of Urban Environment Using Remote Sensing)
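The abstract's network loss is a weighted sum of the segmentation loss and the matching loss, with β=0.8 reported as the best weighting. A sketch under the assumption that both tasks use binary cross-entropy and that β weights the matching term (the abstract does not state which term β multiplies):

```python
import torch.nn.functional as F

def dual_task_loss(seg_logits, seg_labels, match_logits, match_labels, beta=0.8):
    """Weighted sum of matching and segmentation losses.

    The exact loss forms and which term beta scales are assumptions;
    the abstract only reports that beta = 0.8 works best.
    """
    l_seg = F.binary_cross_entropy_with_logits(seg_logits, seg_labels)
    l_match = F.binary_cross_entropy_with_logits(match_logits, match_labels)
    return beta * l_match + (1.0 - beta) * l_seg
```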

19 pages, 4399 KB  
Article
Privacy-Preserving Synthetic Mammograms: A Generative Model Approach to Privacy-Preserving Breast Imaging Datasets
by Damir Shodiev, Egor Ushakov, Arsenii Litvinov and Yury Markin
Informatics 2025, 12(4), 112; https://doi.org/10.3390/informatics12040112 - 18 Oct 2025
Abstract
Background: Significant progress has been made in the field of machine learning, enabling the development of methods for automatic interpretation of medical images that provide high-quality diagnostics. However, most of these methods require access to confidential data, making them difficult to apply under strict privacy requirements. Existing privacy-preserving approaches, such as federated learning and dataset distillation, have limitations related to data access, visual interpretability, etc. Methods: This study explores the use of generative models to create synthetic medical data that preserves the statistical properties of the original data while ensuring privacy. The research is carried out on the VinDr-Mammo dataset of digital mammography images. A conditional generative method using Latent Diffusion Models (LDMs) is proposed, with conditioning on diagnostic labels and lesion information. Diagnostic utility and privacy robustness are assessed via cancer classification tasks and re-identification tasks using Siamese neural networks and membership inference. Results: The generated synthetic data achieved a Fréchet Inception Distance (FID) of 5.8, preserving diagnostic features. A model trained solely on synthetic data achieved performance comparable to one trained on real data (ROC-AUC: 0.77 vs. 0.82). Visual assessments showed that synthetic images are indistinguishable from real ones. Privacy evaluations demonstrated a low re-identification risk (e.g., mAP@R = 0.0051 on the test set), confirming the effectiveness of the privacy-preserving approach. Conclusions: The study demonstrates that privacy-preserving generative models can produce synthetic medical images of sufficient quality for diagnostic tasks while significantly reducing the risk of patient re-identification. This approach enables secure data sharing and model training in privacy-sensitive domains such as medical imaging.
(This article belongs to the Special Issue Health Data Management in the Age of AI)
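The FID score quoted above is a standard metric: the Fréchet distance between Gaussians fitted to Inception activations of real and synthetic images. Given precomputed activation matrices, it can be computed as follows (the feature extraction step is omitted):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(act_real, act_fake):
    """FID between two sets of Inception activations of shape (N, D).

    FID = ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 (S_r S_f)^{1/2})
    """
    mu_r, mu_f = act_real.mean(0), act_fake.mean(0)
    s_r = np.cov(act_real, rowvar=False)
    s_f = np.cov(act_fake, rowvar=False)
    covmean = sqrtm(s_r @ s_f)
    if np.iscomplexobj(covmean):  # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(s_r + s_f - 2.0 * covmean))
```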

13 pages, 2638 KB  
Article
Aircraft Foreign Object Debris Detection Method Using Registration–Siamese Network
by Mo Chen, Xuhui Li, Yan Liu, Sheng Cheng and Hongfu Zuo
Appl. Sci. 2025, 15(19), 10750; https://doi.org/10.3390/app151910750 - 6 Oct 2025
Abstract
Foreign object debris (FOD) in civil aviation environments poses severe risks to flight safety. Conventional detection primarily relies on manual visual inspection, which is inefficient, susceptible to fatigue-related errors, and carries a high risk of missed detections. Therefore, there is an urgent need to develop an efficient and convenient intelligent method for detecting aircraft FOD. This study proposes a detection model based on a Siamese network architecture integrated with a spatial transformation module. The proposed model identifies FOD by comparing the registered features of evidence-retention images with their corresponding normally distributed features. A dedicated aircraft FOD dataset was constructed for evaluation, and extensive experiments were conducted. The results indicate that the proposed model achieves an average improvement of 0.1365 in image-level AUC (Area Under the Curve) and 0.0834 in pixel-level AUC compared to the Patch Distribution Modeling (PaDiM) method. Additionally, the effects of the spatial transformation module and training dataset on detection performance were systematically investigated, confirming the robustness of the model and providing guidance for parameter selection in practical deployment. Overall, this research introduces a novel and effective approach for intelligent aircraft FOD detection, offering both methodological innovation and practical applicability.
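The PaDiM baseline the model is compared against scores anomalies by the Mahalanobis distance between a patch embedding and a per-position Gaussian fitted on normal images, which is also the flavor of "comparing registered features with normally distributed features" the abstract describes. A condensed sketch of that scoring scheme (shapes and regularization are illustrative):

```python
import numpy as np

def fit_normal_model(feats):
    """Fit a per-position Gaussian over features of normal (FOD-free) images.
    feats: (n_images, n_patches, dim) patch embeddings."""
    mu = feats.mean(axis=0)  # (n_patches, dim)
    dim = feats.shape[2]
    cov = np.empty((feats.shape[1], dim, dim))
    for p in range(feats.shape[1]):
        # small ridge term keeps the covariance invertible
        cov[p] = np.cov(feats[:, p, :], rowvar=False) + 0.01 * np.eye(dim)
    return mu, np.linalg.inv(cov)

def anomaly_map(feat, mu, cov_inv):
    """Mahalanobis distance of each patch from its 'normal' distribution."""
    diff = feat - mu  # (n_patches, dim)
    return np.einsum('pd,pde,pe->p', diff, cov_inv, diff) ** 0.5
```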

14 pages, 3118 KB  
Article
Reconstruction Modeling and Validation of Brown Croaker (Miichthys miiuy) Vocalizations Using Wavelet-Based Inversion and Deep Learning
by Sunhyo Kim, Jongwook Choi, Bum-Kyu Kim, Hansoo Kim, Donhyug Kang, Jee Woong Choi, Young Geul Yoon and Sungho Cho
Sensors 2025, 25(19), 6178; https://doi.org/10.3390/s25196178 - 6 Oct 2025
Abstract
Fish species’ biological vocalizations serve as essential acoustic signatures for passive acoustic monitoring (PAM) and ecological assessments. However, the limited availability of high-quality acoustic recordings, particularly for region-specific species like the brown croaker (Miichthys miiuy), hampers data-driven bioacoustic methodology development. In this study, we present a framework for reconstructing brown croaker vocalizations by integrating fk14 wavelet synthesis, PSO-based parameter optimization (with an objective combining correlation and normalized MSE), and deep learning-based validation. Sensitivity analysis using a normalized Bartlett processor identified delay and scale (length) as the most critical parameters, defining valid ranges that maintained waveform similarity above 98%. The reconstructed signals matched measured calls in both time and frequency domains, replicating single-pulse morphology, inter-pulse interval (IPI) distributions, and energy spectral density. Validation with a ResNet-18-based Siamese network produced near-unity cosine similarity (~0.9996) between measured and reconstructed signals. Statistical analyses (95% confidence intervals; residual errors) confirmed faithful preservation of SPL values and minor, biologically plausible IPI variations. Under noisy conditions, similarity decreased as SNR dropped, indicating that environmental noise affects reconstruction fidelity. These results demonstrate that the proposed framework can reliably generate acoustically realistic and morphologically consistent fish vocalizations, even under data-limited scenarios. The methodology holds promise for dataset augmentation, PAM applications, and species-specific call simulation. Future work will extend this framework by using reconstructed signals to train generative models (e.g., GANs, WaveNet), enabling scalable synthesis and supporting real-time adaptive modeling in field monitoring.
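The PSO fitness described in the abstract combines waveform correlation with normalized MSE. A plausible form of that objective, with the relative weighting of the two terms an assumption:

```python
import numpy as np

def pso_objective(measured, synthesized):
    """Fitness for the wavelet parameter search: reward waveform correlation,
    penalize normalized MSE (the equal weighting here is an assumption)."""
    m = (measured - measured.mean()) / measured.std()
    s = (synthesized - synthesized.mean()) / synthesized.std()
    corr = float(np.mean(m * s))  # Pearson correlation of the two waveforms
    nmse = float(np.mean((measured - synthesized) ** 2) / np.mean(measured ** 2))
    return corr - nmse  # higher is better; PSO would maximize this
```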

16 pages, 2489 KB  
Article
Sentence-Level Silent Speech Recognition Using a Wearable EMG/EEG Sensor System with AI-Driven Sensor Fusion and Language Model
by Nicholas Satterlee, Xiaowei Zuo, Kee Moon, Sung Q. Lee, Matthew Peterson and John S. Kang
Sensors 2025, 25(19), 6168; https://doi.org/10.3390/s25196168 - 5 Oct 2025
Abstract
Silent speech recognition (SSR) enables communication without vocalization by interpreting biosignals such as electromyography (EMG) and electroencephalography (EEG). Most existing SSR systems rely on high-density, non-wearable sensors and focus primarily on isolated word recognition, limiting their practical usability. This study presents a wearable SSR system capable of accurate sentence-level recognition using single-channel EMG and EEG sensors with real-time wireless transmission. A moving window-based few-shot learning model, implemented with a Siamese neural network, segments and classifies words from continuous biosignals without requiring pauses or manual segmentation between word signals. A novel sensor fusion model integrates both EMG and EEG modalities, enhancing classification accuracy. To further improve sentence-level recognition, a statistical language model (LM) is applied as post-processing to correct syntactic and lexical errors. The system was evaluated on a dataset of four military command sentences containing ten unique words, achieving 95.25% sentence-level recognition accuracy. These results demonstrate the feasibility of sentence-level SSR using wearable sensors through a window-based few-shot learning model, sensor fusion, and an LM applied to limited simultaneous EMG and EEG signals.
(This article belongs to the Special Issue Advanced Sensing Techniques in Biomedical Signal Processing)
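The moving-window decoding idea can be sketched independently of the paper's exact model: slide a fixed window over the continuous biosignal, embed each window with the Siamese encoder, and match it against per-word prototype embeddings. Window length, hop, and the acceptance threshold below are hypothetical:

```python
import torch
import torch.nn.functional as F

def sliding_word_decode(embed, signal, prototypes, win=500, hop=125, thresh=0.7):
    """Segment-free word decoding (illustrative sketch).

    embed: Siamese encoder mapping a (1, channels, win) window to a (1, D)
    embedding; prototypes: (num_words, D) per-word reference embeddings.
    Consecutive duplicate detections would be merged in practice.
    """
    words = []
    for start in range(0, signal.size(-1) - win + 1, hop):
        z = embed(signal[..., start:start + win])      # (1, D)
        sims = F.cosine_similarity(z, prototypes)      # (num_words,)
        best = sims.argmax().item()
        if sims[best] > thresh:                        # accept confident matches only
            words.append(best)
    return words
```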

17 pages, 4099 KB  
Article
A Transformer-Based Multi-Scale Semantic Extraction Change Detection Network for Building Change Application
by Lujin Hu, Senchuan Di, Zhenkai Wang and Yu Liu
Buildings 2025, 15(19), 3549; https://doi.org/10.3390/buildings15193549 - 2 Oct 2025
Abstract
Building change detection involves identifying areas where buildings have changed by comparing multi-temporal remote sensing imagery of the same geographical region. Recent advances in Transformer-based methods have significantly improved remote sensing change detection. However, current Transformer models still exhibit persistent limitations in effectively extracting multi-scale semantic features within complex scenarios. To more effectively extract multi-scale semantic features in complex scenes, we propose the Transformer-based Multi-Scale Semantic Extraction Change Detection Network (MSSE-CDNet). The model employs a Siamese network architecture to enable precise change recognition. MSSE-CDNet comprises four parts containing five modules: (1) a CNN feature extraction module, (2) a multi-scale semantic extraction module, (3) Transformer encoder and decoder modules, and (4) a prediction module. Comprehensive experiments on the standard LEVIR-CD benchmark for building change detection demonstrate our approach’s superiority over state-of-the-art methods. Compared to existing models such as FC-Siam-Di, FC-Siam-Conc, DTCTSCN, BIT, and SNUNet, MSSE-CDNet achieves significant and consistent gains in performance metrics, with F1 scores improved by 4.22%, 6.84%, 2.86%, 1.22%, and 2.37%, respectively, and Intersection over Union (IoU) improved by 6.78%, 10.74%, 4.65%, 2.02%, and 3.87%, respectively. These results robustly substantiate the effectiveness of our framework on an established benchmark dataset.
(This article belongs to the Special Issue Big Data and Machine/Deep Learning in Construction)
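For reference, the F1 and IoU figures quoted above are the standard pixel-wise metrics over binary change masks:

```python
import numpy as np

def f1_and_iou(pred, gt):
    """Pixel-wise F1 and IoU for binary change masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    iou = tp / (tp + fp + fn + 1e-9)
    return f1, iou
```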

29 pages, 23948 KB  
Article
CAGMC-Defence: A Cross-Attention-Guided Multimodal Collaborative Defence Method for Multimodal Remote Sensing Image Target Recognition
by Jiahao Cui, Hang Cao, Lingquan Meng, Wang Guo, Keyi Zhang, Qi Wang, Cheng Chang and Haifeng Li
Remote Sens. 2025, 17(19), 3300; https://doi.org/10.3390/rs17193300 - 25 Sep 2025
Abstract
With the increasing diversity of remote sensing modalities, multimodal image fusion improves target recognition accuracy but also introduces new security risks. Adversaries can inject small, imperceptible perturbations into a single modality to mislead model predictions, which undermines system reliability. Most existing defences are designed for single-modal inputs and face two key challenges in multimodal settings: (1) vulnerability to perturbation propagation due to static fusion strategies, and (2) a lack of collaborative mechanisms, which leaves overall robustness limited by the weakest modality. To address these issues, we propose CAGMC-Defence, a cross-attention-guided multimodal collaborative defence framework for multimodal remote sensing. It contains two main modules. The Multimodal Feature Enhancement and Fusion (MFEF) module adopts a pseudo-Siamese network and cross-attention to decouple features, capture intermodal dependencies, and suppress perturbation propagation through weighted regulation and consistency alignment. The Multimodal Adversarial Training (MAT) module jointly generates optical and SAR adversarial examples and optimizes network parameters under a consistency loss, enhancing robustness and generalization. Experiments on the WHU-OPT-SAR dataset show that CAGMC-Defence maintains stable performance under various typical adversarial attacks, such as FGSM, PGD, and MIM, retaining 85.74% overall accuracy even under the strongest white-box MIM attack (ϵ=0.05), significantly outperforming existing multimodal defence baselines.
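Of the attacks listed, FGSM is the simplest to reproduce: a single signed-gradient step of size ϵ. A generic sketch (the ϵ=0.05 quoted above refers to the MIM attack, and the clamp to [0, 1] assumes inputs normalized to that range):

```python
import torch

def fgsm(model, x, y, eps=0.05, loss_fn=torch.nn.CrossEntropyLoss()):
    """FGSM adversarial example: one signed-gradient ascent step of size eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # perturb in the direction that increases the loss, stay in valid range
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```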

18 pages, 960 KB  
Article
Fus: Combining Semantic and Structural Graph Information for Binary Code Similarity Detection
by Yanlin Li, Taiyan Wang, Lu Yu and Zulie Pan
Electronics 2025, 14(19), 3781; https://doi.org/10.3390/electronics14193781 - 24 Sep 2025
Abstract
Binary code similarity detection (BCSD) plays an important role in software security. Recent deep learning-based methods have made great progress. Existing methods based on a single feature, such as semantics or graph structure, struggle to handle changes caused by the architecture or compilation environment. Methods fusing semantics and graph structure suffer from insufficient learning of the function, resulting in low accuracy and robustness. To address this issue, we propose Fus, a method that integrates semantic information from the pseudo-C code and structural features from the Abstract Syntax Tree (AST). The pseudo-C code and AST are robust against compilation and architectural changes and represent the function well. Our approach consists of three steps. First, we preprocess the assembly code to obtain the pseudo-C code and AST for each function. Second, we employ a Siamese network with CodeBERT models to extract semantic embeddings from the pseudo-C code and Tree-Structured Long Short-Term Memory (Tree-LSTM) to encode the AST. Finally, function similarity is computed by summing the respective semantic and structural similarities. The evaluation results show that our method outperforms state-of-the-art methods in most scenarios, with especially strong performance at large scale, and achieves the highest recall in the vulnerability search task, demonstrating its accuracy and robustness.
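The final scoring step is stated directly in the abstract: function similarity is the sum of the semantic and structural similarities. Assuming cosine similarity for both terms (the abstract does not specify the measure), this amounts to:

```python
import torch.nn.functional as F

def function_similarity(sem_a, sem_b, ast_a, ast_b):
    """Sum of semantic similarity (CodeBERT embeddings of pseudo-C) and
    structural similarity (Tree-LSTM encodings of the AST). Cosine
    similarity for both terms is an assumption."""
    return (F.cosine_similarity(sem_a, sem_b, dim=-1)
            + F.cosine_similarity(ast_a, ast_b, dim=-1))
```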

25 pages, 12760 KB  
Article
Intelligent Face Recognition: Comprehensive Feature Extraction Methods for Holistic Face Analysis and Modalities
by Thoalfeqar G. Jarullah, Ahmad Saeed Mohammad, Musab T. S. Al-Kaltakchi and Jabir Alshehabi Al-Ani
Signals 2025, 6(3), 49; https://doi.org/10.3390/signals6030049 - 19 Sep 2025
Abstract
Face recognition technology utilizes unique facial features to analyze and compare individuals for identification and verification purposes. This technology is crucial for several reasons, such as improving security and authentication, effectively verifying identities, providing personalized user experiences, and automating various operations, including attendance monitoring, access management, and law enforcement activities. In this paper, comprehensive evaluations are conducted using different face detection and modality segmentation methods, feature extraction methods, and classifiers to improve system performance. For face detection, four methods are employed: OpenCV’s Haar Cascade classifier, Dlib’s HOG + SVM frontal face detector, Dlib’s CNN face detector, and Mediapipe’s face detector. Additionally, two types of feature extraction techniques are proposed: hand-crafted features (traditional global and local methods) and deep learning features. Three global feature types are extracted: Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Global Image Structure (GIST). Likewise, the following local feature methods are utilized: Local Binary Pattern (LBP), Weber Local Descriptor (WLD), and Histogram of Oriented Gradients (HOG). The deep learning-based features fall into two categories: convolutional neural networks (CNNs), including VGG16, VGG19, and VGG-Face, and Siamese neural networks (SNNs), which generate face embeddings. For classification, three methods are employed: Support Vector Machine (SVM), a one-class SVM variant, and Multilayer Perceptron (MLP). The system is evaluated on three datasets: an in-house dataset, Labelled Faces in the Wild (LFW), and the Pins dataset (sourced from Pinterest), providing comprehensive benchmark comparisons for facial recognition research. Across the ten proposed feature extraction methods applied to the in-house database, the best facial recognition accuracy of 99.8% was achieved by the VGG16 model combined with the SVM classifier.
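One of the hand-crafted pipelines from the paper's taxonomy, HOG features fed to an SVM, can be reproduced in a few lines with scikit-image and scikit-learn. Parameters and the stand-in data below are typical defaults, not the paper's settings:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(images):
    """Histogram-of-Oriented-Gradients descriptor per grayscale image."""
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

# Stand-in data: random 64x64 "face crops" with two identities.
rng = np.random.default_rng(0)
train_faces, train_labels = rng.random((20, 64, 64)), np.array([0, 1] * 10)
test_faces = rng.random((4, 64, 64))

clf = SVC(kernel='rbf', C=10).fit(hog_features(train_faces), train_labels)
pred = clf.predict(hog_features(test_faces))
```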

20 pages, 2172 KB  
Article
Securing Smart Grids: A Triplet Loss Function Siamese Network-Based Approach for Detecting Electricity Theft in Power Utilities
by Touqeer Ahmed, Muhammad Salman Saeed, Muhammad I. Masud, Zeeshan Ahmad Arfeen, Mazhar Baloch, Mohammed Aman and Mohsin Shahzad
Energies 2025, 18(18), 4957; https://doi.org/10.3390/en18184957 - 18 Sep 2025
Abstract
Electricity theft in power grids results in significant economic losses for utility companies. While machine learning (ML) methods have shown promising results in detecting such fraud, they often suffer from low detection rates, leading to excessive physical inspections. In this study, we address this problem with a novel approach. The proposed framework utilizes a Siamese network architecture with the Triplet Loss function to detect electricity theft using a labeled dataset obtained from the Multan Electric Power Company (MEPCO), Pakistan. The proposed method analyzes and compares the consumption patterns of honest and fraudulent consumers, enabling the model to distinguish between the two categories with enhanced accuracy and detection rates. We incorporate advanced feature extraction techniques and data mining methods to transform raw consumption data into informative features, such as time-based consumption profiles and anomalous load behaviors, which are crucial for detecting abnormal patterns in electricity consumption. The refined dataset is then used to train the Siamese network, where the Triplet Loss function optimizes the model by maximizing the distance between dissimilar (fraudulent and honest) consumption patterns while minimizing the distance among similar ones. The results demonstrate that our proposed solution outperforms traditional methods, significantly improving accuracy (95.4%) and precision (92%). Ultimately, the integration of feature extraction with Siamese networks and Triplet Loss offers a scalable and robust framework for enhancing the security and operational efficiency of power grids.
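The training objective itself is the standard triplet margin loss over a shared encoder, pulling similar consumption patterns together and pushing honest and fraudulent patterns apart. A minimal sketch; the encoder shape (hourly readings over a month) and margin are assumptions, not MEPCO-specific choices:

```python
import torch
import torch.nn as nn

# Shared encoder over load profiles (24 hourly readings x 30 days, assumed).
encoder = nn.Sequential(nn.Linear(24 * 30, 128), nn.ReLU(), nn.Linear(128, 32))
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = encoder(torch.randn(16, 24 * 30))    # honest consumer profiles
positive = encoder(torch.randn(16, 24 * 30))  # other honest profiles
negative = encoder(torch.randn(16, 24 * 30))  # fraudulent profiles
loss = triplet(anchor, positive, negative)    # pull a-p together, push a-n apart
loss.backward()
```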

20 pages, 55265 KB  
Article
Learning Precise Mask Representation for Siamese Visual Tracking
by Peng Yang, Fen Hu, Qinghui Wang and Lei Dou
Sensors 2025, 25(18), 5743; https://doi.org/10.3390/s25185743 - 15 Sep 2025
Abstract
Siamese network trackers are a prominent paradigm in visual object tracking due to efficient similarity learning. However, most Siamese trackers are restricted to the bounding-box tracking format, which often fails to accurately describe the appearance of non-rigid targets with complex deformations. Additionally, since the bounding box frequently includes excessive background pixels, trackers are sensitive to similar distractors. To address these issues, we propose a novel segmentation-assisted model that learns binary mask representations of targets. This model is generic and can be seamlessly integrated into various Siamese frameworks, enabling pixel-wise segmentation tracking instead of suboptimal bounding-box tracking. Specifically, our model features two core components: (i) a multi-stage precise mask representation module composed of cascaded U-Net decoders, designed to predict segmentation masks of targets, and (ii) a saliency localization head based on the Euclidean model, which extracts spatial position constraints to boost the decoder’s discriminative capability. Extensive experiments on five tracking benchmarks demonstrate that our method effectively improves the performance of both anchor-based and anchor-free Siamese trackers. Notably, on GOT-10k, our method increases the AO scores of the baseline trackers SiamRPN++ (anchor-based) and SiamBAN (anchor-free) by 5.2% and 7.5%, respectively, while maintaining speeds exceeding 60 FPS.
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)
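The multi-stage mask decoding idea (cascaded decoder stages that each upsample, fuse skip features, and emit a progressively refined mask) can be sketched generically; this is not the paper's module, and channel sizes are illustrative:

```python
import torch
import torch.nn as nn

class MaskRefineStage(nn.Module):
    """One refinement stage: upsample, fuse with skip features, predict a mask."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU())
        self.mask_head = nn.Conv2d(out_ch, 1, 1)

    def forward(self, x, skip):
        x = nn.functional.interpolate(x, scale_factor=2, mode='bilinear',
                                      align_corners=False)
        x = self.conv(torch.cat([x, skip], dim=1))
        return x, self.mask_head(x)  # features for the next stage + stage-wise mask

stage = MaskRefineStage(in_ch=64, skip_ch=32, out_ch=32)
feats, mask_logits = stage(torch.randn(1, 64, 16, 16), torch.randn(1, 32, 32, 32))
```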

27 pages, 1902 KB  
Article
Few-Shot Breast Cancer Diagnosis Using a Siamese Neural Network Framework and Triplet-Based Loss
by Tea Marasović and Vladan Papić
Algorithms 2025, 18(9), 567; https://doi.org/10.3390/a18090567 - 8 Sep 2025
Abstract
Breast cancer is one of the leading causes of death among women of all ages and backgrounds globally. In recent years, the growing deficit of expert radiologists—particularly in underdeveloped countries—alongside a surge in the number of images for analysis, has negatively affected the ability to secure timely and precise diagnostic results in breast cancer screening. AI technologies offer powerful tools that allow for effective diagnosis and survival forecasting, reducing the dependency on human cognitive input. Towards this aim, this research introduces a deep meta-learning framework for swift analysis of mammography images—combining a Siamese network model with a triplet-based loss function—to facilitate automatic screening (recognition) of potentially suspicious breast cancer cases. Three pre-trained deep CNN architectures, namely GoogLeNet, ResNet50, and MobileNetV3, are fine-tuned and scrutinized for their effectiveness in transforming input mammograms into a suitable embedding space. The proposed framework undergoes a comprehensive evaluation through a rigorous series of experiments, utilizing two different, publicly accessible, and widely used datasets of digital X-ray mammograms: INbreast and CBIS-DDSM. The experimental results demonstrate the framework’s strong performance in differentiating between tumorous and normal images, even with a very limited number of training samples, on both datasets.
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (3rd Edition))
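The few-shot recognition step such a framework implies can be sketched with one of the named backbones: repurpose a pre-trained ResNet50 as an embedding network and label a query by its nearest support embedding. The embedding size, metric, and torchvision weights API usage are assumptions (requires torchvision >= 0.13):

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained ResNet50 with its classifier replaced by an embedding head.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 128)

def classify_few_shot(embed, query, support, support_labels):
    """Label a query mammogram by its nearest support embedding (cosine)."""
    q = nn.functional.normalize(embed(query), dim=1)    # (1, 128)
    s = nn.functional.normalize(embed(support), dim=1)  # (N, 128)
    return support_labels[(q @ s.T).argmax()]

with torch.no_grad():
    label = classify_few_shot(backbone.eval(),
                              torch.randn(1, 3, 224, 224),   # query image
                              torch.randn(6, 3, 224, 224),   # support set
                              torch.tensor([0, 0, 0, 1, 1, 1]))
```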
