Search Results (3)

Search Parameters:
Keywords = Cumulative Spectral Gradient (CSG) metric

23 pages, 4501 KB  
Article
Complexity-Driven Adversarial Validation for Corrupted Medical Imaging Data
by Diego Renza, Jorge Brieva and Ernesto Moya-Albor
Information 2026, 17(2), 125; https://doi.org/10.3390/info17020125 - 29 Jan 2026
Viewed by 384
Abstract
Distribution shifts commonly arise in real-world machine learning scenarios in which the fundamental assumption that training and test data are drawn from independent and identically distributed samples is violated. In medical data, such shifts often occur during data acquisition and pose a significant challenge to the robustness and reliability of artificial intelligence systems in clinical practice. Moreover, quantifying these shifts without training a model remains a key open problem. This paper proposes a comprehensive methodological framework for evaluating the impact of such shifts on medical image datasets under artificial transformations that simulate acquisition variations, leveraging the Cumulative Spectral Gradient (CSG) score as a measure of the multiclass classification complexity induced by distributional changes. Building on prior work, the approach is extended to twelve 2D medical imaging benchmarks from the MedMNIST collection, covering both binary and multiclass tasks as well as grayscale and RGB modalities. We evaluate the metric by analyzing its robustness to clinically inspired distribution shifts that are systematically simulated through motion blur, additive noise, brightness and contrast variation, and sharpness variation, each applied at three severity levels. The result is a large-scale benchmark that enables a detailed analysis of how dataset characteristics, transformation types, and distortion severity influence distribution shifts. The findings show that while the metric remains generally stable under noise and focus distortions, it is highly sensitive to variations in brightness and contrast. In addition, the proposed methodology is compared against Cleanlab's widely used Non-IID score on the RetinaMNIST dataset using a pre-trained ResNet-50 model, including both class-wise analysis and a correlation assessment between the metrics. Finally, interpretability is incorporated through class activation map analysis on BloodMNIST and its corrupted variants to support and contextualize the quantitative findings.
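The graded-severity corruption scheme the abstract describes can be sketched with plain NumPy. This is an illustrative assumption, not the paper's actual parameters: the severity-to-strength mappings below are made up, and motion blur and sharpness variation are omitted for brevity.

```python
import numpy as np

def add_gaussian_noise(img, severity):
    # Severity levels 1-3 map to increasing noise standard deviation
    # (illustrative values, not the paper's).
    sigma = {1: 0.04, 2: 0.08, 3: 0.16}[severity]
    noisy = img + np.random.default_rng(0).normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def adjust_brightness(img, severity):
    # Additive brightness shift, growing with severity.
    delta = {1: 0.1, 2: 0.2, 3: 0.4}[severity]
    return np.clip(img + delta, 0.0, 1.0)

def adjust_contrast(img, severity):
    # Rescale pixel values about the image mean; lower factor = lower contrast.
    factor = {1: 0.75, 2: 0.5, 3: 0.25}[severity]
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

# Build a small corrupted benchmark from one clean (normalized) image.
clean = np.random.default_rng(42).random((28, 28))
corrupted = {
    (name, s): fn(clean, s)
    for name, fn in [("noise", add_gaussian_noise),
                     ("brightness", adjust_brightness),
                     ("contrast", adjust_contrast)]
    for s in (1, 2, 3)
}
```

Applying each transformation at each severity to every image in a dataset yields the grid of shifted variants on which a complexity score such as CSG can then be compared against the clean baseline.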
17 pages, 1102 KB  
Article
Identifying and Mitigating Label Noise in Deep Learning for Image Classification
by César González-Santoyo, Diego Renza and Ernesto Moya-Albor
Technologies 2025, 13(4), 132; https://doi.org/10.3390/technologies13040132 - 1 Apr 2025
Cited by 6 | Viewed by 5842
Abstract
Labeling errors in datasets are a persistent challenge in machine learning because they introduce noise and bias and reduce a model's ability to generalize. This study proposes a novel methodology for detecting and correcting mislabeled samples in image datasets by using the Cumulative Spectral Gradient (CSG) metric to assess the intrinsic complexity of the data. The methodology is applied to the noisy CIFAR-10/100 and CIFAR-10n/100n datasets, where mislabeled samples in CIFAR-10n/100n are identified and relabeled using CIFAR-10/100 as a reference. DenseNet and Xception models pre-trained on ImageNet are fine-tuned to evaluate the impact of label correction on model performance. Evaluation metrics based on the confusion matrix are used to compare performance on the original, noisy, and label-corrected datasets. The results show that correcting mislabeled samples significantly improves the accuracy and robustness of the model, highlighting the importance of dataset quality in machine learning.
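As a rough illustration of the detect-and-relabel loop described above (a generic confident-learning-style heuristic, not the CSG-based procedure the paper proposes), one can flag samples whose assigned label receives unusually low predicted probability and propose the model's most confident class as the correction. The function name and thresholding rule are assumptions for illustration.

```python
import numpy as np

def flag_mislabeled(probs, labels):
    """Flag likely mislabeled samples given an (n_samples, n_classes)
    matrix of predicted probabilities and integer labels.
    A sample is suspect when the probability of its given label falls
    below that class's average self-probability."""
    n_classes = probs.shape[1]
    # Per-class threshold: mean predicted probability of class c among
    # the samples currently labeled c.
    thresholds = np.array([probs[labels == c, c].mean()
                           for c in range(n_classes)])
    given = probs[np.arange(len(labels)), labels]
    suspect = given < thresholds[labels]
    # Proposed relabel: the model's most confident class per sample.
    proposed = probs.argmax(axis=1)
    return suspect, proposed

# Toy example: sample 2 is labeled 0 but looks like class 1.
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.2, 0.8],
                  [0.1, 0.9]])
labels = np.array([0, 0, 0, 1])
suspect, proposed = flag_mislabeled(probs, labels)
```

In the toy example only the third sample is flagged, and its proposed relabel is class 1; a real pipeline would re-train on the corrected labels and compare confusion-matrix metrics, as the abstract outlines.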

18 pages, 1508 KB  
Article
Adversarial Validation in Image Classification Datasets by Means of Cumulative Spectral Gradient
by Diego Renza, Ernesto Moya-Albor and Adrian Chavarro
Algorithms 2024, 17(11), 531; https://doi.org/10.3390/a17110531 - 19 Nov 2024
Cited by 2 | Viewed by 1903
Abstract
The main objective of a machine learning (ML) system is to obtain a trained model from input data such that predictions can be made on new i.i.d. (independently and identically distributed) data with the lowest possible error. But how can we assess whether the training and test data have a similar distribution? To answer this question, this paper presents a proposal to determine the degree of distribution shift between two datasets. To this end, a metric for evaluating complexity in datasets is used, applicable to multi-class problems, comparing each pair of classes across the two sets. The proposed methodology has been applied to three well-known datasets, MNIST, CIFAR-10, and CIFAR-100, together with corrupted versions of each. Through this methodology, it is possible to evaluate which types of modification have the greatest impact on model generalization without the need to train multiple models multiple times, and also to determine which classes are most affected by corruption.
(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
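For context, the generic adversarial-validation idea the title builds on can be sketched as a "domain classifier": train a model to distinguish training samples from test samples, and read an AUC near 0.5 as evidence that the two sets are hard to tell apart, i.e. little distribution shift. The NumPy sketch below uses a hand-rolled logistic regression and is an illustration under stated assumptions, not the CSG-based variant the paper proposes.

```python
import numpy as np

def adversarial_validation_auc(train_X, test_X, epochs=200, lr=0.1):
    """Return the AUC of a logistic-regression domain classifier that
    tries to separate train (label 0) from test (label 1) samples.
    AUC near 0.5 suggests similar distributions; near 1.0, a shift."""
    X = np.vstack([train_X, test_X])
    y = np.concatenate([np.zeros(len(train_X)), np.ones(len(test_X))])
    # Standardize features for stable gradient descent.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        grad = p - y                            # dLoss/dlogit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    scores = X @ w + b
    # Rank-based AUC (0-based ranks, no ties expected here).
    ranks = scores.argsort().argsort()
    n1, n0 = int((y == 1).sum()), int((y == 0).sum())
    return (ranks[y == 1].sum() - n1 * (n1 - 1) / 2) / (n1 * n0)

# Same distribution -> AUC near 0.5; mean-shifted -> AUC near 1.0.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, (200, 5))
auc_same = adversarial_validation_auc(train, rng.normal(0.0, 1.0, (200, 5)))
auc_shift = adversarial_validation_auc(train, rng.normal(2.0, 1.0, (200, 5)))
```

The CSG-based methodology in the paper pursues the same goal, quantifying shift per class pair, without having to train a discriminator at all.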
