Search Results (1,014)

Search Parameters:
Keywords = image-level labels

14 pages, 871 KB  
Article
SMAD: Semi-Supervised Android Malware Detection via Consistency on Fine-Grained Spatial Representations
by Suchul Lee and Seokmin Han
Electronics 2025, 14(21), 4246; https://doi.org/10.3390/electronics14214246 - 30 Oct 2025
Abstract
Malware analytics suffer from scarce, delayed, and privacy-constrained labels, limiting fully supervised detection and hampering responsiveness to zero-day threats. We propose SMAD, a Semi-supervised Android Malicious App Detector that integrates a segmentation-oriented backbone—to extract pixel-level, multi-scale features from APK imagery—with a dual-branch consistency objective that enforces predictive agreement between two parallel branches on the same image. We evaluate SMAD on CICMalDroid2020 under label budgets of 0.5, 0.25, and 0.125 and show that it achieves higher accuracy, macro-precision, macro-recall, and macro-F1 with smoother learning curves than supervised training, a recursive pseudo-labeling baseline, a FixMatch baseline, and a confidence-thresholded consistency ablation. A backbone ablation (replacing the dense encoder with WideResNet) indicates that pixel-level, multi-scale features under agreement contribute substantially to these gains. We observe a coverage–precision trade-off: hard confidence gating filters noise but lowers early-training performance, whereas enforcing consistency on dense, pixel-level representations yields sustained label-efficiency gains for image-based malware detection. Consequently, SMAD offers a practical path to high-utility detection under tight labeling budgets—a setting common in real-world security applications.
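A minimal sketch of the dual-branch consistency objective the abstract describes: supervised cross-entropy on the small labeled batch plus a soft agreement term between the two branches' predictions on the same unlabeled image. PyTorch is assumed and all names are illustrative, not the authors' code.

```python
import torch.nn.functional as F

def smad_style_loss(logits_a, logits_b, logits_labeled, labels, lam=1.0):
    # Supervised cross-entropy on the labeled batch.
    sup = F.cross_entropy(logits_labeled, labels)
    # Dual-branch agreement on the same unlabeled image: symmetric MSE
    # between the two branches' class distributions (no hard confidence
    # gate, matching the soft-consistency finding reported above).
    cons = F.mse_loss(F.softmax(logits_a, dim=1), F.softmax(logits_b, dim=1))
    return sup + lam * cons
```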

31 pages, 34773 KB  
Article
Learning Domain-Invariant Representations for Event-Based Motion Segmentation: An Unsupervised Domain Adaptation Approach
by Mohammed Jeryo and Ahad Harati
J. Imaging 2025, 11(11), 377; https://doi.org/10.3390/jimaging11110377 - 27 Oct 2025
Abstract
Event cameras provide microsecond temporal resolution, high dynamic range, and low latency by asynchronously capturing per-pixel luminance changes, thereby introducing a novel sensing paradigm. These advantages render them well-suited for high-speed applications such as autonomous vehicles and dynamic environments. Nevertheless, the sparsity of event data and the absence of dense annotations are significant obstacles to supervised learning for motion segmentation from event streams. Domain adaptation is also challenging due to the considerable domain shift from intensity images. To address these challenges, we propose a two-phase cross-modality adaptation framework that translates motion segmentation knowledge from labeled RGB-flow data to unlabeled event streams. A dual-branch encoder extracts modality-specific motion and appearance features from RGB and optical flow in the source domain. Using reconstruction networks, event voxel grids are converted into pseudo-image and pseudo-flow modalities in the target domain. These modalities are subsequently re-encoded using frozen RGB-trained encoders. Multi-level consistency losses are imposed on features, predictions, and outputs to enforce domain alignment. Our design enables the model to acquire domain-invariant, semantically rich features using shallow architectures, thereby reducing training costs and facilitating real-time inference with a lightweight prediction path. The proposed architecture, together with the hybrid loss function, effectively bridges the domain and modality gap. We evaluate our method on two challenging benchmarks: EVIMO2, which incorporates real-world dynamics, high-speed motion, illumination variation, and multiple independently moving objects; and MOD++, which features complex object dynamics, collisions, and dense 1 kHz supervision in synthetic scenes. The proposed UDA framework achieves 83.1% and 79.4% accuracy on EVIMO2 and MOD++, respectively, outperforming existing state-of-the-art approaches such as EV-Transfer and SHOT by up to 3.6%. It is also lighter and faster, and delivers improved mIoU and F1 scores.
(This article belongs to the Section Image and Video Processing)
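A sketch of the multi-level consistency idea, assuming PyTorch: the event branch is pulled toward the frozen RGB/flow branch at the feature and prediction levels. Function and argument names are hypothetical, and the output-level term is omitted for brevity.

```python
import torch.nn.functional as F

def multilevel_consistency(feat_rgb, feat_event, pred_rgb, pred_event,
                           w_feat=1.0, w_pred=1.0):
    # Feature-level alignment: pull event-branch features toward the
    # frozen RGB/flow encoder's features (stop-gradient on the source).
    l_feat = F.mse_loss(feat_event, feat_rgb.detach())
    # Prediction-level alignment: KL between segmentation distributions.
    l_pred = F.kl_div(F.log_softmax(pred_event, dim=1),
                      F.softmax(pred_rgb.detach(), dim=1),
                      reduction="batchmean")
    return w_feat * l_feat + w_pred * l_pred
```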

15 pages, 2225 KB  
Article
An Automatic Pixel-Level Segmentation Method for Coal-Crack CT Images Based on U2-Net
by Yimin Zhang, Chengyi Wu, Jinxia Yu, Guoqiang Wang and Yingying Li
Electronics 2025, 14(21), 4179; https://doi.org/10.3390/electronics14214179 - 26 Oct 2025
Abstract
Automatically segmenting coal cracks in CT images is crucial for 3D reconstruction and for analyzing the physical properties of coal. This paper proposes an automatic pixel-level deep learning method called Attention Double U2-Net to enhance the segmentation accuracy of coal cracks in CT images. Because no public dataset of coal CT images exists, a pixel-level labeled coal-crack dataset is first established through industrial CT scanning experiments and post-processing. The proposed method then utilizes a Double Residual U-Block structure (DRSU) based on U2-Net to improve feature extraction and fusion. In addition, an attention module called the Atrous Asymmetric Fusion Non-Local Block (AAFNB) is proposed; based on the idea of asymmetric non-local attention, it gathers global information to enhance segmentation results. Compared with previous state-of-the-art models, Attention Double U2-Net performs better on the coal-crack CT dataset across evaluation metrics including PA, mPA, mIoU, IoU, precision, recall, and Dice score. The resulting crack segmentations are more accurate and efficient, providing experimental data and theoretical support for coalbed methane (CBM) exploration and the study of coal damage.
(This article belongs to the Section Artificial Intelligence)
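The AAFNB builds on asymmetric non-local attention; below is a simplified PyTorch sketch of that underlying idea (subsampled keys and values make global attention affordable on dense feature maps). It omits the atrous and fusion components and is not the authors' implementation.

```python
import torch
import torch.nn as nn

class AsymmetricNonLocal(nn.Module):
    """Simplified asymmetric non-local attention: keys/values are
    spatially pooled so attention costs O(N*M) with M << N."""
    def __init__(self, ch, sample=8):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 2, 1)
        self.k = nn.Conv2d(ch, ch // 2, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.pool = nn.AdaptiveAvgPool2d(sample)  # shrink key/value maps

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)             # (b, hw, c/2)
        k = self.k(self.pool(x)).flatten(2)                  # (b, c/2, s*s)
        v = self.v(self.pool(x)).flatten(2).transpose(1, 2)  # (b, s*s, c)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out  # residual connection preserves local features
```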

17 pages, 38319 KB  
Article
Class-Level Feature Disentanglement for Multi-Label Image Classification
by Yingduo Tong, Zhenyu Lu, Yize Dong and Yonggang Lu
Future Internet 2025, 17(11), 486; https://doi.org/10.3390/fi17110486 - 23 Oct 2025
Abstract
Generally, the interpretability of deep neural networks is categorized into a priori and a posteriori interpretability. A priori interpretability involves improving model transparency through deliberate design prior to training. Feature disentanglement is a method for achieving a priori interpretability. Existing disentanglement methods mostly focus on semantic features, such as intrinsic and shared features. These methods distinguish between the background and the main subject, but overlook class-level features in images. To address this, we take a further step by advancing feature disentanglement to the class level. For multi-label image classification tasks, we propose a class-level feature disentanglement method. Specifically, we introduce a multi-head classifier within the feature extraction layer of the backbone network to disentangle features. Each head in this classifier corresponds to a specific class and generates independent predictions, thereby guiding the model to better leverage the intrinsic features of each class and improving multi-label classification precision. Experiments demonstrate that our method significantly enhances performance metrics across various benchmarks while simultaneously achieving a priori interpretability.
(This article belongs to the Special Issue Machine Learning Techniques for Computer Vision)
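A minimal sketch of a per-class multi-head classifier of the kind the abstract describes, assuming PyTorch; layer sizes and names are illustrative. Trained with a multi-label loss such as nn.BCEWithLogitsLoss, each head is pushed to rely on features specific to its own class.

```python
import torch
import torch.nn as nn

class ClassLevelHeads(nn.Module):
    """One binary head per class: each head sees the shared backbone
    features and independently predicts the presence of its class."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_classes)])

    def forward(self, feats):                 # feats: (batch, feat_dim)
        logits = [head(feats) for head in self.heads]
        return torch.cat(logits, dim=1)       # (batch, num_classes)
```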

27 pages, 41569 KB  
Article
Deacidification of the Endolysosomal System by the Vesicular Proton Pump V-ATPase Inhibitor Bafilomycin A1 Affects EGF Receptor Endocytosis Differently in Endometrial MSC and HeLa Cells
by Anna V. Salova, Tatiana N. Belyaeva, Ilia K. Litvinov, Marianna V. Kharchenko and Elena S. Kornilova
Int. J. Mol. Sci. 2025, 26(20), 10226; https://doi.org/10.3390/ijms262010226 - 21 Oct 2025
Abstract
It is well-known that EGF binding to EGFR stimulates signal transduction and endocytosis, with the latter leading to lysosomal degradation of EGFR. However, the majority of data on the regulation of endocytosis have been obtained in tumor-derived cells. Here, we perform a comprehensive analysis of the role of endolysosome acidification in the regulation of the endocytic pathway in tumor cells and in endometrial MSCs as a model of proliferating, undifferentiated, non-immortalized cells. Using QD-labeled EGF, the dynamics of co-localization of EGF-receptor complexes with endocytic markers, in the control and upon inhibition of V-ATPase by Bafilomycin A1 (BafA1), were studied by confocal microscopy. Image analysis showed that in HeLa and A549 cells, BafA1 significantly slowed down EGFR entry into and exit from EEA1-positive early endosomes without disrupting passage through the Rab7, CD63, and Lamp1 compartments, rather shifting it to later times. In enMSCs, only a portion of EGF-containing endosomes entered the degradation pathway, and lysosomal delivery was significantly delayed. Unlike in HeLa cells, in enMSC early endosomes BafA1 increased the association of EGF-QDs with EEA1, suggesting a lower pH level, which is suboptimal for EEA1-dependent fusions. It is concluded that, unlike HeLa cells, enMSCs form a population of pH-independent endosomes that contain activated EGFR for a long time.
(This article belongs to the Special Issue Latest Research on Mesenchymal Stem Cells)

27 pages, 3367 KB  
Article
Amodal Segmentation and Trait Extraction of On-Branch Soybean Pods with a Synthetic Dual-Mask Dataset
by Kaiwen Jiang, Wei Guo and Wenli Zhang
Sensors 2025, 25(20), 6486; https://doi.org/10.3390/s25206486 - 21 Oct 2025
Abstract
We address the challenge that occlusions in on-branch soybean images impede accurate pod-level phenotyping. We propose a lab on-branch pipeline that couples a prior-guided synthetic data generator (producing synchronized visible and amodal labels) with an amodal instance segmentation framework based on an improved Swin Transformer backbone with a Simple Attention Module (SimAM) and dual heads, trained via three-stage transfer (synthetic excised → synthetic on-branch → few-shot real). Guided by complete (amodal) masks, a morphology-driven module performs pose normalization, axial geometric modeling, multi-scale fused density mapping, marker-controlled watershed, and topological consistency refinement to extract seeds per pod (SPP) and geometric traits. On real on-branch data, the model attains visible Average Precision (AP50/75) of 91.6/77.6 and amodal AP50/75 of 90.1/74.7, and incorporating synthetic data yields consistent gains across models, indicating effective occlusion reasoning. On excised-pod tests, SPP achieves a mean absolute error (MAE) of 0.07 and a root mean square error (RMSE) of 0.26; pod length/width achieves an MAE of 2.87/3.18 px with high agreement (R² up to 0.94). Overall, the co-designed data–model–task pipeline recovers complete pod geometry under heavy occlusion and enables non-destructive, high-precision, and low-annotation-cost extraction of key traits, providing a practical basis for standardized laboratory phenotyping and downstream breeding applications.
(This article belongs to the Special Issue Feature Papers in Smart Agriculture 2025)
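The marker-controlled watershed step is a standard morphology recipe; a minimal sketch using scikit-image and SciPy follows. The min_dist value and the use of the amodal mask as input are assumptions, not the paper's settings.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def count_seeds(pod_mask, min_dist=10):
    # Distance transform of the (amodal) pod mask; its peaks mark
    # probable seed centers.
    dist = ndi.distance_transform_edt(pod_mask)
    peaks = peak_local_max(dist, min_distance=min_dist,
                           labels=pod_mask.astype(int))
    markers = np.zeros(pod_mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Marker-controlled watershed splits touching seeds into regions;
    # the number of regions estimates seeds per pod (SPP).
    seeds = watershed(-dist, markers, mask=pod_mask.astype(bool))
    return int(seeds.max()), seeds
```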

18 pages, 11753 KB  
Article
SemiSeg-CAW: Semi-Supervised Segmentation of Ultrasound Images by Leveraging Class-Level Information and an Adaptive Multi-Loss Function
by Somayeh Barzegar and Naimul Khan
Mach. Learn. Knowl. Extr. 2025, 7(4), 124; https://doi.org/10.3390/make7040124 - 20 Oct 2025
Abstract
The limited availability of pixel-level annotated medical images complicates training supervised segmentation models, as these models require large datasets. To deal with this issue, SemiSeg-CAW, a semi-supervised segmentation framework that leverages class-level information and an adaptive multi-loss function, is proposed to reduce dependency on extensive annotations. The model combines segmentation and classification tasks in a multitask architecture that includes segmentation, classification, weight generation, and ClassElevateSeg modules. In this framework, the ClassElevateSeg module is initially pre-trained and then fine-tuned jointly with the main model to produce auxiliary feature maps that support the main model, while the adaptive weighting strategy computes a dynamic combination of classification and segmentation losses using trainable weights. The proposed approach enables effective use of both labeled and unlabeled images with class-level information by compensating for the shortage of pixel-level labels. Experimental evaluation on two public ultrasound datasets demonstrates that SemiSeg-CAW consistently outperforms fully supervised segmentation models when trained with equal or fewer labeled samples. The results suggest that incorporating class-level information with adaptive loss weighting provides an effective strategy for semi-supervised medical image segmentation and can improve the segmentation performance in situations with limited annotations.
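The adaptive combination of classification and segmentation losses with trainable weights can be sketched as follows, assuming PyTorch and an uncertainty-style parameterization; this is one common realization of learned loss weighting, and the paper's exact scheme may differ.

```python
import torch
import torch.nn as nn

class AdaptiveMultiLoss(nn.Module):
    """Trainable log-variance weights combine segmentation and
    classification losses; each task's weight adapts during training,
    and the additive log terms keep the weights from collapsing to 0."""
    def __init__(self):
        super().__init__()
        self.log_var_seg = nn.Parameter(torch.zeros(1))
        self.log_var_cls = nn.Parameter(torch.zeros(1))

    def forward(self, loss_seg, loss_cls):
        w_seg = torch.exp(-self.log_var_seg)
        w_cls = torch.exp(-self.log_var_cls)
        return (w_seg * loss_seg + self.log_var_seg
                + w_cls * loss_cls + self.log_var_cls).squeeze()
```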

17 pages, 2479 KB  
Article
A Semi-Automatic Labeling Framework for PCB Defects via Deep Embeddings and Density-Aware Clustering
by Sang-Jeong Lee, Sung-Bal Seo and You-Suk Bae
Sensors 2025, 25(20), 6470; https://doi.org/10.3390/s25206470 - 19 Oct 2025
Abstract
(1) Background. Printed circuit board (PCB) inspection is increasingly constrained by the cost and latency of reliable labels, owing to tiny/low-contrast defects embedded in complex backgrounds and severe class imbalance. (2) Methods. We propose a semi-automatic labeling pipeline that converts anomaly detection proposals into class labels via small-margin cropping from images, interchangeable embeddings (HOG, ResNet-50, ViT-B/16), clustering (k-means/GMM/HDBSCAN), and cluster-level verification using representative montages. (3) Results. On 9354 cropped defects spanning 10 categories (imbalance IR ≈ 1542, Gini ≈ 0.642), ResNet-50 + HDBSCAN achieved NMI ≈ 0.290, AMI ≈ 0.283, and purity ≈ 0.624 with ~47 clusters; ViT + HDBSCAN was comparable (NMI ≈ 0.281, AMI ≈ 0.274, ~44 clusters). With a fixed taxonomy, k-means (K = 10) yielded the strongest ARI (0.169 with ResNet-50; 0.158 with ViT). Macro-purity exceeded micro-purity, indicating many small, homogeneous clusters suitable for one-shot acceptance/rejection, enabling an upper-bound ~200× reduction in operator decisions relative to per-image labeling. (4) Conclusions. The workflow provides an auditable, resource-flexible path from normal-only localization to scalable supervision, prioritizing labeling productivity over detector state-of-the-art performance and directly addressing the industrial bottleneck in the development lifecycle for PCB inspection.
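The embed-then-cluster core of the pipeline can be sketched in a few lines, assuming the hdbscan and scikit-learn packages; min_cluster_size is an illustrative value, not the paper's setting.

```python
import numpy as np
import hdbscan
from sklearn.metrics import normalized_mutual_info_score

def cluster_defect_crops(embeddings, true_labels=None):
    # Density-aware clustering of defect-crop embeddings (e.g.,
    # ResNet-50 features); label -1 marks noise points, which would go
    # to manual review rather than into a cluster montage.
    clusterer = hdbscan.HDBSCAN(min_cluster_size=25)
    cluster_ids = clusterer.fit_predict(np.asarray(embeddings))
    if true_labels is not None:
        keep = cluster_ids >= 0   # score only the clustered points
        print("NMI:", normalized_mutual_info_score(
            np.asarray(true_labels)[keep], cluster_ids[keep]))
    return cluster_ids
```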

22 pages, 2269 KB  
Data Descriptor
MCR-SL: A Multimodal, Context-Rich Skin Lesion Dataset for Skin Cancer Diagnosis
by Maria Castro-Fernandez, Thomas Roger Schopf, Irene Castaño-Gonzalez, Belinda Roque-Quintana, Herbert Kirchesch, Samuel Ortega, Himar Fabelo, Fred Godtliebsen, Conceição Granja and Gustavo M. Callico
Data 2025, 10(10), 166; https://doi.org/10.3390/data10100166 - 18 Oct 2025
Abstract
Well-annotated datasets are fundamental for developing robust artificial intelligence models, particularly in medical fields. Many existing skin lesion datasets have limitations in image diversity (including only clinical or dermoscopic images) or metadata, which hinder their utility for mimicking real-world clinical practice. The purpose of the MCR-SL dataset is to introduce a new, meticulously curated dataset that addresses these limitations. The MCR-SL dataset was collected from 60 subjects at the University Hospital of North Norway and comprises 779 clinical images and 1352 dermoscopic images of 240 unique lesions. The lesion types included are nevus, seborrheic keratosis, basal cell carcinoma, actinic keratosis, atypical nevus, melanoma, squamous cell carcinoma, angioma, and dermatofibroma. Labels were established by combining the consensus of a panel of four dermatologists with histopathology reports for the 29 excised lesions, with the latter serving as the gold standard. The resulting dataset provides a comprehensive resource with clinical and dermoscopic images and rich clinical context, ensuring a high level of clinical relevance and surpassing many existing resources in this regard. The MCR-SL dataset provides a holistic and reliable foundation for validating artificial intelligence models, enabling a more nuanced and clinically relevant approach to automated skin lesion diagnosis that mirrors real-world clinical practice.

20 pages, 11103 KB  
Data Descriptor
VitralColor-12: A Synthetic Twelve-Color Segmentation Dataset from GPT-Generated Stained-Glass Images
by Martín Montes Rivera, Carlos Guerrero-Mendez, Daniela Lopez-Betancur, Tonatiuh Saucedo-Anaya, Manuel Sánchez-Cárdenas and Salvador Gómez-Jiménez
Data 2025, 10(10), 165; https://doi.org/10.3390/data10100165 - 18 Oct 2025
Abstract
The segmentation and classification of color are crucial stages in image processing, computer vision, and pattern recognition, as they significantly impact the results. The diverse, hand-labeled datasets in the literature are applied to monochromatic or color segmentation in specific domains. Synthetic datasets, on the other hand, are generated using statistics, artificial intelligence algorithms, or generative artificial intelligence (AI); the latter includes Large Language Models (LLMs), Generative Adversarial Networks (GANs), and Variational Autoencoders (VAEs), among others. In this work, we propose VitralColor-12, a synthetic dataset for color classification and segmentation comprising twelve colors: black, blue, brown, cyan, gray, green, orange, pink, purple, red, white, and yellow. VitralColor-12 addresses the limitations of color segmentation and classification datasets by leveraging the capabilities of LLMs, including adaptability, variability, copyright-free content, and lower-cost data—properties that are desirable in image datasets. VitralColor-12 includes pixel-level classification labels and segmentation maps, which makes the dataset broadly applicable and highly variable for a range of computer vision applications. VitralColor-12 uses GPT-5 and DALL·E 3 to generate stained-glass images. These images simplify the annotation process, since stained glass has isolated colors with distinct boundaries within the steel structure, providing regions that are easy to label with a single color per region. Once the images are obtained, we use at least one hand-labeled centroid per color to automatically cluster all pixels based on Euclidean distance and morphological operations, including erosion and dilation. This process enables us to automatically label a classification dataset and generate segmentation maps. Our dataset comprises 910 images, organized into 70 generated images and 12 pixel segmentation maps—one for each color—which include 9,509,524 labeled pixels, 1,794,758 of which are unique. These annotated pixels are represented by RGB, HSL, CIELAB, and YCbCr values, enabling detailed color analysis. Moreover, VitralColor-12 addresses gaps in public resources by including violin diagrams of color frequency across images, per-color channel histograms, 3D color maps, descriptive statistics, and standardized metrics such as ΔE76, ΔE94, and CIELAB chromaticity. These analyses demonstrate the dataset's distribution and applicability and show a realistic perceptual structure, with warm, neutral, and cold colors and high contrast between black and white, yielding meaningful perceptual clusters that reinforce its utility for color segmentation and classification.
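The centroid-based pixel labeling step can be sketched as follows, assuming NumPy/SciPy and RGB centroids; binary opening stands in for the erosion/dilation cleanup, and all names are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi

def label_pixels(image, centroids):
    # Assign every pixel to the nearest hand-picked color centroid
    # (Euclidean distance in RGB). Fine for moderate image sizes;
    # the distance matrix is (H*W, K).
    pixels = image.reshape(-1, 3).astype(float)
    cents = np.asarray(centroids, dtype=float)
    dists = np.linalg.norm(pixels[:, None] - cents[None], axis=2)
    labels = dists.argmin(axis=1).reshape(image.shape[:2])
    # Clean each per-color map with an opening (erosion then dilation)
    # to remove speckle before writing out segmentation maps.
    maps = [ndi.binary_opening(labels == k) for k in range(len(cents))]
    return labels, maps
```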

25 pages, 7385 KB  
Article
Reducing Annotation Effort in Semantic Segmentation Through Conformal Risk Controlled Active Learning
by Can Erhan and Nazim Kemal Ure
AI 2025, 6(10), 270; https://doi.org/10.3390/ai6100270 - 18 Oct 2025
Abstract
Modern semantic segmentation models require extensive pixel-level annotations, creating a significant barrier to practical deployment as labeling a single image can take hours of human effort. Active learning offers a promising way to reduce annotation costs through intelligent sample selection. However, existing methods rely on poorly calibrated confidence estimates, making uncertainty quantification unreliable. We introduce Conformal Risk Controlled Active Learning (CRC-AL), a novel framework that provides statistical guarantees on uncertainty quantification for semantic segmentation, in contrast to heuristic approaches. CRC-AL calibrates class-specific thresholds via conformal risk control, transforming softmax outputs into multi-class prediction sets with formal guarantees. From these sets, our approach derives complementary uncertainty representations: risk maps highlighting uncertain regions and class co-occurrence embeddings capturing semantic confusions. A physics-inspired selection algorithm leverages these representations with a barycenter-based distance metric that balances uncertainty and diversity. Experiments on Cityscapes and PascalVOC2012 show CRC-AL consistently outperforms baseline methods, achieving 95% of fully supervised performance with only 30% of labeled data, making semantic segmentation more practical under limited annotation budgets.
(This article belongs to the Section AI Systems: Theory and Applications)
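A simplified split-conformal sketch in NumPy of how calibrated per-class thresholds turn softmax outputs into multi-class prediction sets. This is a stand-in recipe under standard conformal assumptions, not the paper's exact risk-control procedure.

```python
import numpy as np

def calibrate_thresholds(probs, labels, alpha=0.1):
    # probs: (n, C) softmax outputs on a held-out calibration split;
    # labels: (n,) true classes. Pick each class's threshold so that
    # roughly 1 - alpha of true-class scores clear it.
    num_classes = probs.shape[1]
    taus = np.zeros(num_classes)
    for c in range(num_classes):
        scores = probs[labels == c, c]
        taus[c] = np.quantile(scores, alpha)
    return taus

def prediction_set(probs, taus):
    # Every class whose probability clears its calibrated threshold
    # joins the set; large sets flag uncertain pixels/samples.
    return probs >= taus          # boolean (n, C) prediction sets
```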

19 pages, 2488 KB  
Article
Unsupervised Segmentation of Bolus and Residue in Videofluoroscopy Swallowing Studies
by Farnaz Khodami, Mehdy Dousty, James L. Coyle and Ervin Sejdić
J. Imaging 2025, 11(10), 368; https://doi.org/10.3390/jimaging11100368 - 17 Oct 2025
Abstract
Bolus tracking is a critical component of swallowing analysis, as the speed, course, and integrity of bolus movement from the mouth to the stomach, along with the presence of residue, serve as key indicators of potential abnormalities. Existing machine learning approaches for videofluoroscopic swallowing study (VFSS) analysis heavily rely on annotated data and often struggle to detect residue, which is visually subtle and underrepresented. This study proposes an unsupervised architecture to segment both bolus and residue, marking the first successful machine learning-based residue segmentation in swallowing analysis with quantitative evaluation. We introduce an unsupervised convolutional autoencoder that segments bolus and residue without requiring pixel-level annotations. To address the locality bias inherent in convolutional architectures, we incorporate positional encoding into the input representation, enabling the model to capture global spatial context. The proposed model was validated on a diverse set of VFSS images annotated by certified raters. Our method achieves an intersection over union (IoU) of 61% for bolus segmentation—comparable to state-of-the-art supervised methods—and 52% for residue detection. Despite not using pixel-wise labels for training, our model significantly outperforms top-performing supervised baselines in residue detection, as confirmed by statistical testing. These findings suggest that learning from negative space provides a robust and generalizable pathway for detecting clinically significant but sparsely represented features like residue.
(This article belongs to the Section Medical Imaging)
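Countering the locality bias of convolutions with positional encoding can be as simple as appending coordinate channels to the input; a CoordConv-style PyTorch sketch follows (the paper's exact encoding may differ).

```python
import torch

def add_positional_channels(x):
    # Normalized (row, col) coordinate grids in [-1, 1], appended as
    # two extra channels so a convolutional autoencoder can condition
    # on absolute position as well as local texture.
    b, _, h, w = x.shape
    ys = torch.linspace(-1.0, 1.0, h, device=x.device)
    xs = torch.linspace(-1.0, 1.0, w, device=x.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([gy, gx]).unsqueeze(0).expand(b, -1, -1, -1)
    return torch.cat([x, grid], dim=1)    # (b, c + 2, h, w)
```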

18 pages, 3535 KB  
Article
UAV Based Weed Pressure Detection Through Relative Labelling
by Sebastiaan Verbesselt, Rembert Daems, Axel Willekens and Jonathan Van Beek
Remote Sens. 2025, 17(20), 3434; https://doi.org/10.3390/rs17203434 - 15 Oct 2025
Abstract
Agricultural management in Europe faces increasing pressure to reduce its environmental footprint. Implementing precision agriculture for weed management could offer a solution and minimize the use of chemical products. High-spatial-resolution imagery from real-time kinematic (RTK) unmanned aerial vehicles (UAVs), in combination with supervised convolutional neural network (CNN) models, has proven successful in making location-specific treatments. This site-specific advice limits herbicide application to areas that require action, reducing environmental impact and inputs for the farmer. Developing performant CNN models requires sufficient high-quality labelled data. To reduce labelling effort and time, a new labelling method is proposed whereby pairs of image subsections are labelled based on their relative difference in weed pressure, and a CNN ordinal regression model is trained on these pairs. The model is evaluated on detecting weed pressure in potato (Solanum tuberosum L.). Model performance was evaluated at several levels: pairwise accuracy, linearity (Pearson correlation coefficient), rank consistency (Spearman's (rs) and Kendall's (τ) rank correlation coefficients), and binary accuracy. After hyperparameter tuning, a pairwise accuracy of 85.2%, significant linearity (r = 0.81), and significant rank consistency (rs = 0.87 and τ = 0.69) were found, suggesting that the model correctly detects the gradient in weed pressure for this dataset. A maximum binary accuracy of 92% and F1-score of 88% were obtained after thresholding the predicted weed scores into weed versus non-weed images. The architecture allows visualization of the intermediate features of the last convolutional block, letting data analysts evaluate whether the model "sees" the features of interest (in this case, weeds). The results indicate the potential of ordinal regression with relative labels as a fast, lightweight approach to predicting weed-pressure gradients. Experts remain free to decide which threshold value(s) to apply to predicted weed scores, depending on the weed, crop, and treatment, for flexible weed control management.
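Training from relative labels reduces to a pairwise ranking objective on per-image weed scores; a minimal PyTorch sketch follows, with the margin value as an assumption.

```python
import torch.nn.functional as F

def pairwise_weed_loss(score_a, score_b, a_has_more):
    # score_a, score_b: network outputs (one weed-pressure score per
    # image subsection). a_has_more: boolean tensor, True where
    # subsection A was labelled as having higher weed pressure than B.
    target = a_has_more.float() * 2.0 - 1.0   # maps True/False to +1/-1
    # Margin ranking pushes the "more weeds" member of each pair above
    # the other by at least the margin.
    return F.margin_ranking_loss(score_a, score_b, target, margin=0.5)
```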

17 pages, 550 KB  
Article
AnomalyNLP: Noisy-Label Prompt Learning for Few-Shot Industrial Anomaly Detection
by Li Hua and Jin Qian
Electronics 2025, 14(20), 4016; https://doi.org/10.3390/electronics14204016 - 13 Oct 2025
Abstract
Few-Shot Industrial Anomaly Detection (FSIAD) is an essential yet challenging problem in practical scenarios such as industrial quality inspection. Its objective is to identify previously unseen anomalous regions using only a limited number of normal support images from the same category. Recently, large pre-trained vision-language models (VLMs), such as CLIP, have exhibited remarkable few-shot image-text representation abilities across a range of visual tasks, including anomaly detection. Despite their promise, real-world industrial anomaly datasets often contain noisy labels, which can degrade prompt learning and detection performance. In this paper, we propose AnomalyNLP, a new Noisy-Label Prompt Learning approach designed to tackle the challenge of few-shot anomaly detection. This framework offers a simple and efficient approach that leverages the expressive representations and precise alignment capabilities of VLMs for industrial anomaly detection. First, we design a Noisy-Label Prompt Learning (NLPL) strategy. This strategy utilizes feature learning principles to suppress the influence of noisy samples via Mean Absolute Error (MAE) loss, thereby improving the signal-to-noise ratio and enhancing overall model robustness. Furthermore, we introduce a prompt-driven optimal transport feature purification method to accurately partition datasets into clean and noisy subsets. For both image-level and pixel-level anomaly detection, AnomalyNLP achieves state-of-the-art performance across various few-shot settings on the MVTecAD and VisA public datasets. Qualitative and quantitative results on two datasets demonstrate that our method achieves the largest average AUC improvement over baseline methods across 1-, 2-, and 4-shot settings, with gains of up to 10.60%, 10.11%, and 9.55% in practical anomaly detection scenarios.
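The noise-suppression ingredient is the bounded MAE loss; a PyTorch sketch of it follows, with names illustrative. Unlike cross-entropy, whose gradient grows without bound on confidently mislabeled samples, MAE's bounded gradient damps their pull during prompt learning.

```python
import torch.nn.functional as F

def mae_prompt_loss(logits, targets):
    # Mean absolute error between predicted class probabilities and
    # one-hot targets; bounded per-sample loss limits the influence
    # of noisy labels.
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(targets, num_classes=logits.shape[1]).float()
    return (probs - onehot).abs().sum(dim=1).mean()
```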

21 pages, 14964 KB  
Article
An Automated Framework for Abnormal Target Segmentation in Levee Scenarios Using Fusion of UAV-Based Infrared and Visible Imagery
by Jiyuan Zhang, Zhonggen Wang, Jing Chen, Fei Wang and Lyuzhou Gao
Remote Sens. 2025, 17(20), 3398; https://doi.org/10.3390/rs17203398 - 10 Oct 2025
Abstract
Levees are critical for flood defence, but their integrity is threatened by hazards such as piping and seepage, especially during high-water-level periods. Traditional manual inspections for these hazards and associated emergency response elements, such as personnel and assets, are inefficient and often impractical. While UAV-based remote sensing offers a promising alternative, the effective fusion of multi-modal data and the scarcity of labelled data for supervised model training remain significant challenges. To overcome these limitations, this paper reframes levee monitoring as an unsupervised anomaly detection task. We propose a novel, fully automated framework that unifies geophysical hazards and emergency response elements into a single analytical category of "abnormal targets" for comprehensive situational awareness. The framework consists of three key modules: (1) a state-of-the-art registration algorithm to precisely align infrared and visible images; (2) a generative adversarial network to fuse the thermal information from IR images with the textural details from visible images; and (3) an adaptive, unsupervised segmentation module where a mean-shift clustering algorithm, with its hyperparameters automatically tuned by Bayesian optimization, delineates the targets. We validated our framework on a real-world dataset collected from a levee on the Pajiang River, China. The proposed method demonstrates superior performance over all baselines, achieving an Intersection over Union of 0.348 and a macro F1-Score of 0.479. This work provides a practical, training-free solution for comprehensive levee monitoring and demonstrates the synergistic potential of multi-modal fusion and automated machine learning for disaster management.
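The unsupervised segmentation module pairs mean-shift clustering with automatic hyperparameter tuning; the sketch below uses a silhouette-scored bandwidth search with scikit-learn as a lightweight stand-in for the paper's Bayesian optimization, and all values are illustrative.

```python
from sklearn.cluster import MeanShift
from sklearn.metrics import silhouette_score

def segment_fused(pixel_feats, bandwidths=(0.1, 0.2, 0.3, 0.5)):
    # pixel_feats: (n_pixels, n_features) from the fused IR/visible
    # image, assumed scaled to [0, 1] so the bandwidths are sensible.
    best_labels, best_score = None, -1.0
    for bw in bandwidths:
        labels = MeanShift(bandwidth=bw, bin_seeding=True).fit_predict(pixel_feats)
        if len(set(labels)) > 1:   # silhouette needs >= 2 clusters
            s = silhouette_score(pixel_feats, labels,
                                 sample_size=min(2000, len(pixel_feats)))
            if s > best_score:
                best_labels, best_score = labels, s
    return best_labels             # None if no bandwidth split the image
```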
