Article

An Integrated Active Deep Learning Approach for Image Classification from Unlabeled Data with Minimal Supervision

Amira Abdelwahab, Ahmed Afifi and Mohamed Salama
1 Department of Information Systems, College of Computer Science and Information Technology, King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia
2 Department of Information Systems, Faculty of Computers and Information, Menoufia University, Shibin Al Kawm 32511, Menoufia, Egypt
3 Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, P.O. Box 400, Al-Ahsa 31982, Saudi Arabia
4 Department of Information Technology, Faculty of Computers and Information, Menoufia University, Shibin Al Kawm 32511, Menoufia, Egypt
5 Department of Management Information Systems, Higher Institute for Specific Studies, Future Academy, Heliopolis P.O. Box 11757, Cairo, Egypt
* Author to whom correspondence should be addressed.
Electronics 2024, 13(1), 169; https://doi.org/10.3390/electronics13010169
Submission received: 2 December 2023 / Revised: 25 December 2023 / Accepted: 27 December 2023 / Published: 30 December 2023

Abstract
The integration of active learning (AL) and deep learning (DL) presents a promising avenue for enhancing the efficiency and performance of deep learning classifiers. This article introduces an approach that integrates AL principles into the training process of DL models to build robust image classifiers. The proposed approach combines uncertainty sampling, which selects the most informative data points for manual labeling, with the pseudo-labeling of high-confidence unlabeled data, expanding the training set efficiently while reducing the need for human annotation and minimizing annotation costs. A hybrid active deep learning model selects the data points most in need of labeling according to an uncertainty measure and then iteratively retrains a deep neural network classifier on the newly labeled samples. By selecting the most informative samples for labeling and retraining in a loop, the model achieves high accuracy with fewer manually labeled samples than traditional supervised deep learning. Experiments on various image classification datasets demonstrate that the proposed model outperforms conventional approaches in terms of classification accuracy and reduced human annotation requirements, achieving accuracies of 98.9% and 99.3% on the Cross-Age Celebrity and Caltech Image datasets, compared to 92.3% and 74.3% for the conventional approach, respectively. In summary, this work presents a promising unified active deep learning approach that minimizes the human effort of manual labeling while maximizing classification accuracy by strategically labeling only the most valuable samples.

1. Introduction

In the dynamic field of machine learning and artificial intelligence, deep learning has emerged as a powerful tool for extracting complex patterns from data, enabling the development of accurate classification models [1]. However, training deep learning models requires extensive labeled datasets, which can be expensive and time-consuming. This is especially challenging in domains where large amounts of data are required for training robust models. Researchers are exploring innovative approaches that combine deep learning with active learning techniques to address this issue.
Integrating active learning and deep learning offers a promising solution to address these challenges. By leveraging the strengths of both approaches, it is possible to efficiently select the most informative data points for annotation [2,3,4] while also leveraging the power of deep learning to extract meaningful patterns from data. This integration can revolutionize data annotation and classification model development, leading to more efficient, accurate, and accessible machine learning solutions [5,6].
Active learning strategies are essential for deep learning, and adapting them to deep models is a significant advancement [7]. Techniques such as uncertainty sampling and query-by-committee can now be extended to deep neural networks [6]. These strategies are crucial for selecting data points that challenge the model’s understanding and require further annotation. Bayesian deep learning and dropout-based uncertainty estimation are among the approaches that help measure model uncertainty and guide instance selection. Deep-learning-based active learning has many applications across domains; for instance, it reduces manual annotation effort in computer vision tasks such as image segmentation and object recognition. However, this approach also poses several challenges that need to be addressed, including effective uncertainty estimation, the selection of appropriate active learning strategies, and scalability to large datasets.
In this research, we propose a powerful model that combines active learning with deep learning to improve the efficiency and effectiveness of data annotation for image classification. The proposed model employs a unique approach to select high-confidence unlabeled data points for immediate labeling, reducing the need for human annotation and minimizing annotation costs. Experiments on various image classification datasets demonstrate that our model outperforms conventional approaches in terms of classification accuracy and reduced human annotation requirements.
This study investigates active and deep learning techniques, integrates the two methodologies, and empirically evaluates the impact on model accuracy. We explore how combining deep learning and active learning can improve the annotation process and enhance model development, addressing significant data challenges in artificial intelligence. By integrating these techniques, we aim to significantly increase the efficiency of data annotation, which, in turn, accelerates progress in machine learning and improves accuracy. This advancement contributes to the scalability and effectiveness of machine learning through innovative data annotation methods, pushing the boundaries of visual recognition tasks.
We can summarize the main contributions of this study as follows:
  • An innovative active learning approach: The proposed approach utilizes uncertainty sampling to select informative unlabeled data points for labeling and retraining with pseudo-labeling of confident data, efficiently expanding the training set.
  • Building a robust image classifier with fewer labeled samples: By strategically picking and labeling the most valuable samples in a loop, the model achieves high accuracy with fewer manually labeled samples compared to traditional supervised deep learning approaches.

2. Literature Review

This section comprehensively reviews the fundamental concepts and previous research related to active learning, deep learning, data annotation, and their intersections. This review places our study within the broader context of machine learning and data annotation, highlighting the foundational principles and theoretical frameworks that guide our research.
Significant research has been conducted in the field of machine learning, specifically in active learning. The origins of active learning can be traced back to early works that relied primarily on acquisition functions to gauge prediction deviations [8]. These initial approaches were introduced alongside conventional machine learning methods, such as linear regression [9] and support vector machines [10]. The authors of [11] introduced context constraints to assist users in tagging face photos more effectively. Simultaneously, several articles [12] have emphasized sample selection that maximizes mutual information within the Gaussian process framework.
Several studies have explored different techniques for selecting critical instances in image classification. One such technique, introduced in [13], is an adaptive AI approach that combines information density and uncertainty measures. Another approach discussed in [14] involves selecting diverse, uncertain samples in batches rather than individually. For instance, one study extended support vector machine (SVM)-based active learning to batch mode while considering diversity within each selected class. More recently, ref. [15] proposed a model-agnostic uncertainty batch selection method that uses convex programming to identify informative queries. This method is compatible with various classifiers beyond just SVMs. Batch sampling can increase efficiency by requesting parallel labels for sets of useful data points. Diversity among batches aims to cover the data distribution when expanding the labeled training set.
Active learning techniques typically focus on querying labels for uncertain, low-confidence examples. However, they often overlook more confident predictions, constituting a larger portion of the unlabeled pool. Prioritizing stable, high-confidence instances when expanding training data can also enhance model accuracy and consistency. Since these samples are more prevalent, labeling them may require less human effort than in rare uncertain cases [16]. A balanced approach that selects informative low-confidence and representative high-confidence examples could provide efficiency and performance gains. The goal is to actively acquire training data that comprehensively improves the model, not just targets its weaknesses.
In his work, Brinker created a classification system for active learning strategies and assessed their impact on classification performance [17]. The study divided active learning methods into uncertainty sampling, query-by-committee, query-by-example, and query-by-difference, and found that uncertainty sampling was the most effective method for reducing labeling effort while maintaining or improving classification accuracy. Another approach to enhancing binary classification, called “active testing”, is introduced in [18,19]. Active testing involves actively querying instances where the classifier exhibits uncertainty, focusing on regions where the model’s prediction confidence is low. The study demonstrated that active testing can significantly reduce labeling effort compared to passive learning.
Dasgupta and his colleagues conducted a study on using active learning in the context of support vector machines (SVMs) [20]. The study demonstrated that active learning could enhance the generalization performance of SVMs by selectively labeling instances that contribute the most to reducing the model’s uncertainty. This research emphasized the potential of active learning to improve the performance of SVMs, a popular machine learning classifier. In another study [21], researchers explored the application of active learning in named entity recognition (NER) tasks in natural language processing (NLP) [22]. Their study showed that active learning can significantly reduce the labeled data required for training NER models while achieving competitive performance compared to passive learning. This research highlighted the practical utility of active learning in NLP tasks.
Several studies have shown that active learning can significantly reduce labeling effort and improve the performance of classification models. However, challenges still need to be addressed, such as the selection of appropriate active learning strategies, the need for effective uncertainty estimation, and the impact of dataset characteristics on active learning effectiveness [23,24,25]. Therefore, this study aims to effectively integrate active learning with deep learning to improve data annotation and classification performance.

3. Methodology

This study explores active learning techniques to efficiently build image classifiers and annotate data from a large pool of unlabeled data, given a fixed annotation budget. The primary challenge is to select the most valuable samples for labeling to train an accurate classifier while minimizing annotation effort. Active learning aims to acquire labels for the most informative subsets, resulting in high-performing models with significantly fewer labeled examples than those required by passive supervised learning. We focus on active acquisition strategies to optimize classification performance while working within labeling and human effort budget constraints. Active learning is implemented in cycles, where batches of samples are sequentially annotated. At the beginning of each cycle, the model trains on the current labeled data. Then, an acquisition function is used to select the next data batch for human annotation. This process is repeated until the full budget is consumed.
The proposed approach, shown in Figure 1, annotates both uncertain and high-confidence samples in each batch. This balances model exploration and exploitation to enhance classification accuracy and data efficiency compared to conventional uncertainty-based active learning. By selectively sampling unlabeled data, the proposed model aims to maximize model performance with minimal human annotation effort.

3.1. Base Classifier

For this study, we employed the EfficientNet-B0 convolutional neural network model [26]. EfficientNet is a family of convolutional neural network (CNN) architectures designed to achieve state-of-the-art performance on image classification tasks while remaining computationally efficient [27,28]. The main concept behind the architecture is to scale the model’s depth, width, and resolution simultaneously to strike a balance between size and accuracy. The model uses a compound scaling method that scales depth, width, and resolution systematically and uniformly, allowing EfficientNet to use computational resources more efficiently and perform better than traditional models. Given its efficiency and performance, it is well suited to the image classification objectives of this research.
Figure 2 illustrates the architectural design of EfficientNet-B0 [29,30,31,32]. It leverages depth-wise separable convolutions, known for their computational efficiency, and incorporates MBConv blocks, which combine depth-wise separable convolutions, squeeze-and-excitation mechanisms, and shortcut connections. This combination gives EfficientNet-B0 a robust and expressive feature extraction capability.
EfficientNet-B0 owes its success to its compound scaling methodology. This approach scales the depth (number of layers), width (number of channels), and resolution (input image size) through a set of coefficients α, β, and γ, whose optimal values are determined through a grid search. Importantly, the scaling coefficients follow the principle of efficient scaling, ensuring that the increase in computational cost aligns with the growth in parameters. The model architecture starts with a stem convolutional layer that processes the input image and ends with a classifier head for final predictions; the stem and head have scaled numbers of layers, reinforcing the model’s ability to extract hierarchical features. Regularization is achieved by strategically incorporating dropout, which enhances generalization during training, while batch normalization stabilizes and accelerates convergence.
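For intuition, the compound scaling rule can be written in a few lines of Python. The coefficient values below are those reported for the EfficientNet baseline in the original work; the rounded outputs are approximate guides rather than the exact released configurations:

```python
# Compound scaling: depth, width, and resolution grow as alpha**phi,
# beta**phi, and gamma**phi, with alpha * beta**2 * gamma**2 ~= 2,
# so FLOPs grow roughly as 2**phi.
alpha, beta, gamma = 1.2, 1.1, 1.15  # grid-searched base coefficients

def compound_scale(phi: int, base_res: int = 224):
    depth_mult = alpha ** phi                     # multiplier on layer count
    width_mult = beta ** phi                      # multiplier on channel count
    resolution = round(base_res * gamma ** phi)   # input image size
    return depth_mult, width_mult, resolution

print(compound_scale(1))  # ~ (1.2, 1.1, 258): one scaling step up from B0
```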

3.2. Proposed Approach

The proposed approach integrates active deep learning and pseudo-labeling to assign labels to unlabeled data, improving model training by iteratively selecting informative samples for data annotation and deep image classification [32]. In a dataset $D$ with $m$ classes and $n$ samples $x_i$, two key assumptions guide the approach: most samples are initially unlabeled, necessitating the determination of labels $y_i$ through active learning, and the labeled set $D_l$ grows incrementally as selectively annotated samples are added.
The approach is designed for scenarios involving iterative active learning and expanding labeled data, suitable for datasets with mostly unknown labels. It offers a comprehensive framework for deep image classification, accommodating continuously expanding unlabeled data by combining active learning for selective sampling with a core deep learning architecture for representation learning.
The proposed model facilitates the efficient labeling and augmentation of training data from abundant unlabeled sources. It incrementally improves its knowledge by actively acquiring new labeled data from the unlabeled pool $D_u$, allowing comprehensive utilization of growing unlabeled resources. The model’s loss function, defined in Equation (1), minimizes the negative log-likelihood, training the model parameters $W$ to enhance classification accuracy:
$$\min_{W,\;\{y_i,\, i \in D_l\}} \; -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \mathbf{1}\{y_i = j\}\, \log p\!\left(y_i = j \mid x_i; W\right) \tag{1}$$
where $\mathbf{1}\{y_i = j\}$ is an indicator function whose output is 1 if $y_i$ is class $j$ and 0 otherwise, $W$ represents the CNN parameters, and $p(y_i = j \mid x_i; W)$ is the CNN’s predicted probability that sample $x_i$ belongs to class $j$.
Optimization is achieved using an alternating search approach, in which we iteratively update the labels $y_i$ for unlabeled samples from $D_u$ and the network weights $W$. This involves two alternating phases:
1. In the first phase, we fix the network weights $W$ and update the labels or pseudo-labels $y_i$ by assigning labels to unlabeled samples based on the model’s current predictions. This expands the labeled training data.
2. In the second phase, we use the newly labeled data and update the network weights $W$ by training the CNN on the enlarged labeled set to improve classification.
By alternating these two steps of assigning pseudo-labels and retraining the model, we gradually improve the model parameters $W$ to maximize labeling accuracy and classification performance. Iteratively alternating the optimization of model predictions and model training enables efficient learning from limited labeled data.
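To make the two alternating phases concrete, the following is a minimal PyTorch sketch. It is not the authors’ released code: the helper names, the use of entropy as the confidence test, and the single gradient step per call are illustrative simplifications:

```python
import torch
import torch.nn.functional as F

def assign_pseudo_labels(model, unlabeled_x, delta):
    """Phase 1: fix W and pseudo-label unlabeled samples the model is sure about."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    confident = entropy < delta                      # keep only low-entropy points
    return unlabeled_x[confident], probs[confident].argmax(dim=1)

def update_weights(model, optimizer, x, y):
    """Phase 2: fix the (pseudo-)labels and take a gradient step on Eq. (1)."""
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)              # negative log-likelihood
    loss.backward()
    optimizer.step()
    return loss.item()
```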

3.2.1. Initialization Methods

Before starting, the labeled set $D_l$ is empty, and the unlabeled pool $D_u$ contains all $n$ samples. To initialize the CNN parameters $W$, a small set of samples is taken from $D_u$ and manually annotated for each class, forming the initial $D_l$. These initially labeled samples are used to train the CNN and obtain starting values for $W$, providing a warm start for the model.

3.2.2. Choosing Samples with Uncertainty

The unlabeled samples are prioritized for manual annotation by ranking them according to uncertainty measures that quantify the model’s confidence in its predictions. The proposed model selects the $K$ most uncertain samples according to an acquisition function and pseudo-labels highly confident samples to expand the training set. Specifically, the least-confidence criterion is used as the acquisition function, prioritizing samples for which the prediction probability $p(y_i = j \mid x_i; W)$ of the true class $j$ is lowest; this focuses manual annotation effort on the samples about which the model is least certain, efficiently improving performance. Highly confident samples are identified by applying a threshold to the maximum predicted probability: samples whose probability exceeds the confidence threshold are pseudo-labeled and added to the training set. By iteratively selecting uncertain samples for manual annotation and pseudo-labeling highly confident samples, the training data are expanded selectively while minimizing labeling effort. There are three primary selection criteria for uncertain samples:
1. With least confidence ($lc_i$), the unannotated data samples are sorted in ascending order according to their $lc_i$ values, which are calculated as follows:
$$lc_i = \max_{j}\, p\!\left(y_i = j \mid x_i; W\right) \tag{2}$$
for all possible values of $j$. By this definition, the classifier is uncertain about a data sample when its confidence in the most probable class for that instance is low.
2. With margin sampling [33], unlabeled samples are sorted in ascending order based on the margin between the model’s top two predicted class probabilities. For each sample $x_i$, the class probability $p(y_i = j \mid x_i; W)$ is computed for every class $j$, and the highest and second-highest probabilities are identified. The margin is calculated as their difference, as shown in Equation (3):
$$ms_i = p\!\left(y_i = j_1 \mid x_i; W\right) - p\!\left(y_i = j_2 \mid x_i; W\right) \tag{3}$$
Here, $j_1$ and $j_2$ denote the top two predicted class labels for a given unlabeled instance. Lower margin values indicate greater uncertainty between the top classes, as the model struggles to confidently distinguish them. Samples are therefore ranked in ascending margin order, from most ambiguous to least, so that those with the smallest differences between the top predicted classes are prioritized. Prioritizing labeling for these ambiguous, low-margin samples allows the model to gain the most information from each new annotated example, enhancing the efficiency of the learning process.
3. Entropy measures the uncertainty in the model’s predicted class probability distribution for a given unlabeled sample; samples with high entropy are more ambiguous and informative for the model [33]. Entropy uses the predicted probabilities across all classes $j$ to quantify the uncertainty as follows:
$$en_i = -\sum_{j=1}^{m} p\!\left(y_i = j \mid x_i; W\right) \log p\!\left(y_i = j \mid x_i; W\right) \tag{4}$$
Higher entropy indicates that the model is more uncertain about which class the sample belongs to, and annotating these highly uncertain samples helps reduce the model’s overall prediction uncertainty [34]. Here, $p(y_i = j \mid x_i; W)$ denotes the predicted probability that sample $x_i$ belongs to class $j$ given the model parameters $W$, and the entropy $en_i$ aggregates these probabilities across all classes to quantify the overall classification uncertainty. Unlabeled instances are sorted by decreasing entropy, so that samples with the highest $en_i$ values (the most uncertain) are prioritized for labeling. Actively sampling high-entropy points where the model is least confident enables learning from fewer labeled examples than passive learning.
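All three acquisition scores can be computed directly from the classifier’s softmax outputs. Below is a minimal NumPy sketch; the function name and the toy probability values are illustrative:

```python
import numpy as np

def uncertainty_scores(probs):
    """probs: (n_samples, n_classes) array of softmax outputs p(y_i = j | x_i; W)."""
    top2 = np.sort(probs, axis=1)[:, -2:]                    # two largest per row
    lc = top2[:, 1]                                          # Eq. (2): rank ascending
    margin = top2[:, 1] - top2[:, 0]                         # Eq. (3): rank ascending
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # Eq. (4): rank descending
    return lc, margin, entropy

probs = np.array([[0.90, 0.05, 0.05],    # confident sample
                  [0.40, 0.35, 0.25]])   # ambiguous sample
lc, margin, entropy = uncertainty_scores(probs)
K = 1
query = np.argsort(entropy)[::-1][:K]    # indices of the K most uncertain samples
print(query)                             # -> [1]
```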
Additionally, highly confident samples are selected where $en_i < \delta$, a predefined threshold. These unambiguous points are pseudo-labeled without human annotation. By combining uncertainty sampling with the pseudo-labeling of confident data, the proposed approach expands the training set efficiently: active learning focuses on informative ambiguity, while pseudo-labeling leverages stable high-confidence predictions:
$$j^{*} = \operatorname*{argmax}_{j}\, p\!\left(y_i = j \mid x_i; W\right), \qquad y_i = \begin{cases} j^{*}, & en_i < \delta \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
When a sample is assigned a pseudo-label (i.e., $en_i < \delta$), it is treated as a high-confidence prediction, and such points are added to the set $Z$ for pseudo-labeling. Unlike using only the maximum predicted probability $p(y_i = j \mid x_i; W)$ for the top class $j$, the entropy considers the distribution over all classes. The threshold $\delta$ is set strictly so that only points with very low entropy (high confidence) are pseudo-labeled, ensuring reliability. By verifying low entropy in addition to a high maximal probability, the approach pseudo-labels points that are distinct exemplars of their predicted class, avoiding noise from potential mistakes in pseudo-labeling edge cases.
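Continuing the NumPy sketch above, Equation (5) then reduces to a boolean mask over the entropy scores (the value of δ here is arbitrary, not taken from the paper):

```python
delta = 0.5                                          # illustrative threshold
pseudo_mask = entropy < delta                        # Eq. (5): en_i < delta
pseudo_labels = probs.argmax(axis=1)[pseudo_mask]    # j* = argmax_j p(y_i = j | x_i; W)
print(pseudo_mask, pseudo_labels)                    # -> [ True False] [0]
```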

3.2.3. Fine-Tuning of the Deep Convolutional Neural Network Model

Once both the pseudo-labeled samples and manually labeled samples are held constant, Equation (1) can be streamlined as follows:
$$\min_{W} \; -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{m} \mathbf{1}\{y_i = j\}\, \log p\!\left(y_i = j \mid x_i; W\right) \tag{6}$$
Here, $N$ denotes the total number of labeled and pseudo-labeled data samples. We update the parameter matrix $W$ using standard backpropagation. Specifically, if $L$ denotes the loss function defined in Equation (6), then its derivative with respect to $W$ is calculated as indicated in Equation (7):
$$\frac{\partial L}{\partial W} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{m} \mathbf{1}\{y_i = j\}\, \frac{\partial \log p\!\left(y_i = j \mid x_i; W\right)}{\partial W} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{m} \left( \mathbf{1}\{y_i = j\} - p\!\left(y_i = j \mid x_i; W\right) \right) \frac{\partial z_j\!\left(x_i; W\right)}{\partial W} \tag{7}$$
where $\{z_j(x_i; W)\}_{j=1}^{m}$ are the activations of sample $x_i$ at the final layer of the CNN, before the SoftMax.
The SoftMax layer takes these activations as input and converts them into normalized class probability distributions via the following SoftMax function:
$$p\!\left(y_i = j \mid x_i; W\right) = \frac{e^{z_j\left(x_i; W\right)}}{\sum_{t=1}^{m} e^{z_t\left(x_i; W\right)}} \tag{8}$$
where $p(y_i = j \mid x_i; W)$ is the predicted probability that sample $x_i$ belongs to class $j$, out of $m$ total classes. The SoftMax function normalizes the final-layer activations $z_j(x_i; W)$ into probability values between 0 and 1 that sum to 1 across classes, giving the model’s predicted class distribution for sample $x_i$.
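The SoftMax/negative-log-likelihood pairing is what yields the compact form of Equation (7): with respect to the activations, $\partial L / \partial z_j = p(y_i = j \mid x_i; W) - \mathbf{1}\{y_i = j\}$. A quick numerical check with illustrative values confirms the identity:

```python
import numpy as np

z = np.array([2.0, 0.5, -1.0])           # final-layer activations z_j(x_i; W)
p = np.exp(z) / np.exp(z).sum()          # Eq. (8)
y = 0                                    # true class index
grad_z = p - np.eye(3)[y]                # dL/dz_j = p_j - 1{y=j}, core of Eq. (7)

# Finite-difference verification of the analytic gradient.
eps = 1e-6
numeric = np.zeros(3)
for j in range(3):
    zp = z.copy()
    zp[j] += eps
    pp = np.exp(zp) / np.exp(zp).sum()
    numeric[j] = (-np.log(pp[y]) + np.log(p[y])) / eps
assert np.allclose(grad_z, numeric, atol=1e-4)
```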
To sustain the reliable selection of highly certain examples, the confidence threshold $\delta$ is adjusted at each iteration $t$. As classification performance increases, $\delta$ is tightened so that only very low-entropy, unambiguous points are pseudo-labeled. This adaptive threshold accounts for the model’s evolving knowledge: a stricter $\delta$ later in training ensures that pseudo-labels match the classifier’s enhanced competency. By dynamically tuning the confidence criterion, the approach maintains trustworthy pseudo-labeling as the model improves. To this end, the threshold $\delta$ is adjusted at the conclusion of each iteration $t$ as follows:
$$\delta = \begin{cases} \delta_0, & t = 0 \\ \delta - d_r \times t, & t > 0 \end{cases} \tag{9}$$
The rate at which the threshold decreases is determined by the parameter $d_r$, and $\delta_0$ represents the initial threshold value. The complete procedure of the proposed approach is summarized in Algorithm 1.
Algorithm 1: Proposed approach
Input:
  Unannotated samples X;
  initially annotated samples Y;
  number of uncertain samples K;
  high-confidence selection threshold δ;
  threshold decay rate d_r;
  iteration count T;
  fine-tuning interval t.
Output:
  CNN network parameters W.
Begin
  Initialize W using Y.
  iteration ← 0
  While iteration < T do:
    Add the K most uncertain instances to Y based on Equation (2), (3), or (4).
    Obtain pseudo-labeled samples Z according to Equation (5).
    If iteration % t == 0 then:
      Update W through fine-tuning using Equation (6).
      Update δ according to Equation (9).
    Increment iteration by 1.
  Return W.
End
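A compact end-to-end sketch of Algorithm 1 in PyTorch follows. This is an illustrative reading of the algorithm rather than the authors’ implementation: the entropy criterion stands in for Equations (2)–(4), `oracle` is a placeholder for the human annotator, the default values are arbitrary, and one full-batch gradient step per fine-tuning interval is a simplification:

```python
import torch
import torch.nn.functional as F

def active_learning_loop(model, optimizer, X_unlab, Y_x, Y_y, oracle,
                         K=100, delta=0.05, d_r=0.0033, T=10, t_ft=1):
    """Sketch of Algorithm 1; argument names and defaults are illustrative."""
    for iteration in range(T):
        model.eval()
        with torch.no_grad():
            probs = F.softmax(model(X_unlab), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)

        # Add the K most uncertain instances to Y (entropy criterion, Eq. (4)).
        query = torch.argsort(entropy, descending=True)[:K]
        Y_x = torch.cat([Y_x, X_unlab[query]])
        Y_y = torch.cat([Y_y, oracle(X_unlab[query])])   # human annotation

        # Obtain pseudo-labeled samples Z (Eq. (5)).
        conf = entropy < delta
        Z_x, Z_y = X_unlab[conf], probs[conf].argmax(dim=1)

        if iteration % t_ft == 0:
            # Fine-tune W on labeled plus pseudo-labeled data (Eq. (6)).
            model.train()
            x, y = torch.cat([Y_x, Z_x]), torch.cat([Y_y, Z_y])
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()
            optimizer.step()
            delta -= d_r * iteration                     # threshold decay (Eq. (9))

        # Remove the newly annotated samples from the unlabeled pool.
        keep = torch.ones(len(X_unlab), dtype=torch.bool)
        keep[query] = False
        X_unlab = X_unlab[keep]
    return model
```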

4. Experimental Results and Discussion

4.1. Datasets Description

Specifically, two large-scale datasets are utilized to thoroughly benchmark the proposed model’s efficiency and efficacy on critical computer vision tasks. The first one is the Cross-Age Celebrity dataset [35], containing over 163,000 images across 2000 celebrities annotated with identity, age, and gender metadata. This dataset is designed specifically for evaluating facial recognition and retrieval algorithms across diverse age groups and capture conditions. The second one is the Caltech 256 Image dataset [36] with 30,607 real-world variable-sized images spanning 257 distinct object categories, each represented by at least 80 images. This challenging and diverse object recognition dataset supports the development and evaluation of advanced computer vision classification algorithms aiming to achieve human-level visual understanding.
The proposed model is evaluated on a diverse suite of datasets for facial analysis, including age prediction, face recognition, and object classification. This comprehensive testbed enables quantitative benchmarking against state-of-the-art techniques. Performance is measured by accuracy and loss metrics on tasks spanning age estimation, identification, and image categorization. The experiments employ EfficientNet-B0 as the base feature extractor to compare the proposed approach on top of a standardized architecture. By assessing the proposed model across datasets and vision problems, its robustness and generalizability are thoroughly examined. The diversity of tasks and variability of data challenge the model to surpass current alternatives across computer vision applications.

4.2. Experimental Setup

Image Preprocessing: As a first step, all images from both the Cross-Age Celebrity and Caltech Image datasets are resized to a standard resolution of 224 × 224 pixels, matching the input size expected by the EfficientNet-B0 architecture used in this work. Standardizing the image dimensions enables consistent preprocessing and filtering across the two datasets.
Active Learning Parameters: The data are split by randomly assigning 80% of the images per class to comprise the unlabeled training set that drives the active learning process. The remaining 20% of images per class are held out as the test set for final model evaluation. Within the unlabeled training set, 10% of images are randomly selected upfront for manual annotation to initialize the proposed model. The iterative active learning is then performed on the remaining unlabeled training images.
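A per-class split along these lines might be implemented as follows; the function name, RNG seed, and the `max(1, ...)` floor on the initial annotations are illustrative choices, not specified in the paper:

```python
import numpy as np

def split_per_class(labels, test_frac=0.20, seed_frac=0.10, seed=0):
    """Per-class 80/20 train/test split; 10% of the train pool labeled upfront."""
    rng = np.random.default_rng(seed)
    pool, test, init = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        n_test = int(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train = idx[n_test:]
        n_init = max(1, int(len(train) * seed_frac))
        init.extend(train[:n_init])      # manually annotated to initialize the model
        pool.extend(train[n_init:])      # unlabeled pool driving active learning
    return np.array(pool), np.array(test), np.array(init)
```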
Model Architecture: Identical convolutional neural network architectures are used for experiments on both the Cross-Age Celebrity and Caltech Image datasets. Specifically, an EfficientNet-B0 model pre-trained on ImageNet ILSVRC serves as the standardized base feature extractor. When applied to the Caltech Image dataset, only the last classification layer is modified to match the target classes, while earlier layers remain fixed to leverage the pre-trained weights.
Optimization: For model optimization, stochastic gradient descent (SGD) is employed. A learning rate of 0.01 is used for experiments on the Cross-Age Celebrity dataset. A learning rate of 0.001 is used for experiments leveraging the Caltech Image dataset.
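A plausible setup consistent with this description is sketched below. The paper does not name the framework, so torchvision’s EfficientNet-B0 is an assumption here:

```python
import torch
import torchvision

# ImageNet-pre-trained EfficientNet-B0 with a replaced classification head.
model = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                  # keep earlier layers fixed (Caltech setup)
num_classes = 257                            # Caltech-256 object categories
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, num_classes)

optimizer = torch.optim.SGD(model.classifier[1].parameters(), lr=0.001)  # 0.01 for CACD
```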
Implementation: The proposed active deep learning system is implemented in Python 3.10.12 and runs on Google Colaboratory, leveraging free NVIDIA Tesla T4 GPU acceleration.

4.3. Results and Discussion

The model was trained using different percentages of labeled samples as specified in Table 1 and Table 2. A systematic approach was taken to optimize the proposed model across multiple dimensions. Careful tuning of key optimization hyperparameters like learning rate was paired with standardized deep learning architectures such as EfficientNet-B0 to balance innovation and best practices. Rigorous evaluation methodology comparing performance on tasks like facial recognition against state-of-the-art techniques quantified accuracy gains. This multi-faceted optimization strategy targeting implementation, optimization, architecture, and evaluation explains the proposed model’s impressive performance improvements demonstrated empirically on the Cross-Age Celebrity benchmark and Caltech Image dataset. The gains highlight the real-world value of the strategic optimizations integrated into the active learning approach.
Table 1 provides a statistical depiction of accuracy, precision, recall, and F1 score at different fractions of labeled data taken from the entire training set of the Cross-Age Celebrity dataset. Our proposed model outperforms the conventional model in terms of both classification accuracy and the proportion of labeled data samples as shown in Figure 3.
Figure 3 compares the classification accuracy of the proposed and conventional models on the Cross-Age Celebrity dataset as the percentage of labeled training samples increases. The proposed model consistently achieves higher accuracy and a higher F1 score with fewer labeled samples, outperforming the conventional model across all labeled fractions. This superior performance highlights the proposed model’s ability to leverage unlabeled data through semi-supervised learning techniques, reaching 98.9% accuracy and an F1 score of 0.99 with 100% labeled data. In contrast, the conventional model requires significantly more labeled data, attaining only 92.3% accuracy even with 100% of the samples annotated. The proposed model’s higher accuracy with less training data demonstrates its improved efficiency and effectiveness for facial age classification compared to the conventional approach. Overall, the results validate the proposed model’s advantages in terms of both classification accuracy and data efficiency.
Table 2 provides a statistical depiction of accuracy, precision, recall and F1 score at different fractions of labeled data taken from the entire training set of the Caltech Image dataset. Our proposed model outperforms the conventional model in terms of both classification accuracy and the proportion of labeled data samples. Figure 4 illustrates a comparison of the classification accuracy of the proposed model and the conventional model on the Caltech Image dataset as the percentage of labeled training samples is increased.
The proposed model consistently achieves higher accuracy with fewer labeled samples, outperforming the conventional model at every labeled fraction from 9% to 100% of the full training set. This superior performance highlights the proposed model’s ability to leverage unlabeled data through semi-supervised learning techniques, reaching 99.3% accuracy with 100% labeled data. In contrast, the conventional model requires significantly more labeled data, attaining only 74.3% accuracy even with 100% of the samples annotated. The proposed model’s higher accuracy with less training data demonstrates its improved efficiency and effectiveness for object classification compared to the conventional approach. Overall, the results validate the proposed model’s advantages in terms of both classification accuracy and data efficiency.
The accuracy results clearly show that our proposed model outperforms the conventional model, achieving higher accuracy with significantly fewer manually annotated samples. Specifically, on the Cross-Age Celebrity benchmark, the conventional model needed 89% of the training images to be labeled to reach 72.8% test accuracy, whereas the proposed model achieved 94% test accuracy with only 54% manually labeled data, reducing annotation requirements by 35%. Similar gains are shown on the Caltech Image dataset: the proposed model attains 90.9% accuracy with only 20% labeled samples, compared to the conventional model requiring 87% labeled data to reach 74.9% accuracy.
These substantial reductions in manual annotation while maintaining or improving accuracy demonstrate that our proposed approach successfully minimizes labeling overhead. By judiciously selecting useful samples and pseudo-labeling high-confidence predictions, the proposed model consistently improves upon the standard supervised baseline across vision tasks and datasets. Overall, the results highlight that the active learning framework reduces annotation needs while accelerating and boosting model performance, providing a promising direction for semi-supervised learning that leverages the confidence-weighted integration of labels to improve generalization and reliability.
As discussed in [13,14,15], most existing semi-supervised and active learning methods still require some initial labeled seed data, and their performance depends heavily on the quality of this initial dataset. In contrast, our approach minimizes the labeled data required upfront and can learn effective classifiers from largely unlabeled data. Additionally, previous active learning techniques like uncertainty sampling often neglect highly confident predictions, which comprise a larger portion of the unlabeled data [16,17]. Our method balances the exploration of uncertain data points with exploitation by pseudo-labeling confident unlabeled instances, allowing more comprehensive utilization of the available unlabeled data. The proposed model also employs a self-supervised learning technique to obtain useful data representations without manual annotations, enabling the approach to work well even with very limited labeled data. Our experiments demonstrate high accuracy while utilizing fewer labels, whereas most prior studies required significantly more labeled data to attain comparable performance. Finally, our uncertainty-based active query strategy selects the most informative samples for manual labeling to yield maximal accuracy improvements, resulting in greater efficiency by minimizing annotation costs and human effort compared to standard supervised approaches, as shown empirically.
The key advantages of the proposed model over previous methods include (1) requiring only a minimal initial labeled set, (2) balancing exploration and exploitation for better semi-supervised learning, (3) leveraging self-supervision for representation learning, and (4) strategic active sampling to maximize the gain per annotated sample. Together, these advance the state of the art for deep learning with minimal supervision.

5. Conclusions and Future Work

Deep learning has shown promise but requires massive annotated datasets, which are costly and time-consuming to obtain. Active learning aims to reduce annotation effort, yet still relies on human input for ambiguous cases. The proposed model takes a different approach: rather than only querying uncertain samples, it proactively extracts high-confidence instances from the unlabeled data to augment training. Specifically, the model iteratively selects and assigns labels to samples likely to improve pattern learning. Incorporating these high-confidence examples increased classification accuracy and reduced the user annotation burden while maintaining model stability. Experiments on two challenging image datasets, Caltech and Cross-Age Celebrity, demonstrate that the proposed model assigns accurate labels to unlabeled data while controlling the error rate, outperforming existing techniques and providing a way forward for semi-supervised learning. In future work, we will employ attention mechanisms and expand the celebrity dataset to further boost classification accuracy. By selectively propagating labels from confident predictions, the model can provide a reliable semi-supervised framework that obviates exhaustive manual annotation.

Author Contributions

Conceptualization, A.A. (Amira Abdelwahab) and M.S.; methodology, A.A. (Amira Abdelwahab); software, M.S.; validation, A.A. (Amira Abdelwahab), A.A. (Ahmed Afifi) and M.S.; formal analysis, M.S.; investigation, A.A. (Amira Abdelwahab) and A.A. (Ahmed Afifi); resources, A.A. (Amira Abdelwahab); data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, A.A. (Amira Abdelwahab); visualization, M.S.; supervision, A.A. (Amira Abdelwahab); project administration, A.A. (Ahmed Afifi); funding acquisition, A.A. (Amira Abdelwahab). All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number INST195.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The dataset is available on https://bcsiriuschen.github.io/CARC/, accessed on 1 July 2023, https://www.kaggle.com/datasets/jessicali9530/caltech256, accessed on 15 July 2023.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Guo, J.; Wang, Q. Informativeness-guided active learning for deep learning–based façade defects detection. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 123–135. [Google Scholar] [CrossRef]
  2. Yao, X.; Guo, Q.; Li, A. Cloud Detection in Optical Remote Sensing Images with Deep Semi-supervised and Active Learning. IEEE Geosci. Remote Sens. Lett. 2023, 20, 45–57. [Google Scholar] [CrossRef]
  3. Kang, C.J.; Peter WC, H.; Siang, T.P.; Jian, T.T.; Li, Z.; Wang, Y.-H. An active learning framework featured Monte Carlo dropout strategy for deep learning-based semantic segmentation of concrete cracks from images. Struct. Health Monit. 2023, 22, 14759217221150376. [Google Scholar] [CrossRef]
  4. Li, X.; Du, M.; Zuo, S.; Zhou, M.; Peng, Q.; Chen, Z.; Zhou, J.; He, Q. Deep convolutional neural networks using an active learning strategy for cervical cancer screening and diagnosis. Front. Bioinform. 2023, 3, 1101667. [Google Scholar] [CrossRef] [PubMed]
  5. Guan, X.; Li, Z.; Zhou, Y.; Shao, W.; Zhang, D. Active learning for efficient analysis of high throughput nanopore data. Bioinformatics 2023, 39, btac764. [Google Scholar] [CrossRef] [PubMed]
  6. Zhao, C.; Qin, B.; Feng, S.; Zhu, W.; Sun, W.; Li, W.; Jia, X. Hyperspectral image classification with multi-attention transformer and adaptive superpixel segmentation-based active learning. IEEE Trans. Image Process. 2023, 32, 3606–3621. [Google Scholar] [CrossRef] [PubMed]
  7. Gu, X.; Lu, W.; Ao, Y.; Li, Y.; Song, C. Seismic Stratigraphic Interpretation Based on Deep Active Learning. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–11. [Google Scholar] [CrossRef]
  8. Fu, X.; Cao, H.; Hu, H.; Lian, B.; Wang, Y.; Huang, Q.; Wu, Y. Attention-Based Active Learning Framework for Segmentation of Breast Cancer in Mammograms. Appl. Sci. 2023, 13, 852. [Google Scholar] [CrossRef]
  9. Yuan, D.; Chang, X.; Liu, Q.; Yang, Y.; Wang, D.; Shu, M.; He, Z.; Shi, G. Active learning for deep visual tracking. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–13. [Google Scholar] [CrossRef]
  10. Liu, Y.; Lou, J.; Wang, M.; Zhang, Q.; Ye, J.; Wu, T. Active learning for truss structure design. ACM Trans. Graph. (TOG) 2021, 40, 1–14. [Google Scholar]
  11. Wu, Y.; Kirchhoff, K.F.; Bilenko, M. Active learning for ML enhanced scientific simulation codes. Int. Conf. Mach. Learn. 2021, 77, 11279–11289. [Google Scholar]
  12. Joshi, M.; Sahoo, D.; Hoi, S.C.; Li, J. Online active learning: A review. arXiv 2021, arXiv:2103.12857. [Google Scholar]
  13. Tanno, R.; Saeedi, A.; Sankaranarayanan, S.; Alexander, D.C.; Arridge, S. Learning from noisy labels by regularization with virtual adversarial perturbations. IEEE Trans. Med. Imaging 2021, 41, 137–153. [Google Scholar]
  14. Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Gupta, B.B.; Chen, X.; Wang, X. A survey of deep active learning. ACM Comput. Surv. 2021, 54, 1–40. [Google Scholar] [CrossRef]
  15. Shi, J.; Wang, Q.; Zhang, Q. Active deep metric learning. Proc. AAAI Conf. Artif. Intell. 2021, 35, 4634–4641. [Google Scholar]
  16. Chen, X.; Song, Y.; Islam, A.; Ren, X.; Ryu, J.K. Noise-aware unsupervised domain adaptation via stochastic conditional shift embedding. Proc. AAAI Conf. Artif. Intell. 2021, 35, 3980–3988. [Google Scholar]
  17. Jiang, L.; Meng, D.; Zhao, Q.; Shan, S.; Hauptmann, A.G. Self-paced curriculum learning. In Proceedings of the Twenty-ninth AAAI conference on artificial intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
  18. Wang, K.; Zhang, D.; Li, Y.; Zhang, R.; Lin, L. Cost-effective active learning for deep image classification. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2591–2600. [Google Scholar] [CrossRef]
  19. Gal, Y.; Ghahramani, Z. Bayesian convolutional neural networks with Bernoulli approximate variational inference. In Proceedings of the International Conference on Learning Representations (ICLR) Workshop Track, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  20. Yoo, D.; Kweon, I.S. Learning loss for active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 93–102. [Google Scholar]
  21. Jamal, M.A.; Li, S.; Mong, S.; An, G.; Shuai, Q.; Vasconcelos, N. Rethinking class balanced self-training. arXiv 2020, arXiv:2010.08806. [Google Scholar]
  22. Munjal, B.; Chakraborty, S.; Goyal, P.K. Towards efficient active learning for video classification using temporal coherence. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 17362–17371. [Google Scholar]
  23. Zhang, Y.; Ding, Z.; Li, J.; Ogunbona, P.; Xu, D.; Wang, H. Importance-aware semantic segmentation for autonomous driving. Proc. AAAI Conf. Artif. Intell. 2020, 34, 13046–13053. [Google Scholar]
  24. Sofiiuk, K.; Barinova, O.; Konushin, A.; Aliev, T.; Vetrov, D.P. f-ALDA: F-divergences minimization for active learning. arXiv 2021, arXiv:2103.08333. [Google Scholar]
  25. Tanno, R.; Arulkumaran, K.; Alexander, D.C.; Criminisi, A.; Nori, A. Adaptive neural trees. Int. Conf. Mach. Learn. 2020, 9438–9447. [Google Scholar] [CrossRef]
  26. Chen, X.; Pu, X.; Chen, Z.; Li, L.; Zhao, K.N.; Liu, H.; Zhu, H. Application of EfficientNet-B0 and GRU-based deep learning on classifying the colposcopy diagnosis of precancerous cervical lesions. Cancer Med. 2023, 12, 8690–8699. [Google Scholar] [CrossRef] [PubMed]
  27. Zhu, W.; Hu, J.; Sun, G.; Cao, X.; Qian, X. A/B test: Towards rapid traffic splitting for personalized web service. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 78–88. [Google Scholar]
  28. Raza, R.; Zulfiqar, F.; Owais Khan, M.; Arif, M.; Alvi, A.; Iftikhar, M.A.; Alam, T. Lung-EffNet: Lung cancer classification using EfficientNet from CT-scan images. Eng. Appl. Artif. Intell. 2023, 126, 106902. [Google Scholar] [CrossRef]
  29. Jiang, Y.; Huang, D.; Zhang, C. Beyond synthetic noise: Deep learning on controlled noisy labels. Int. Conf. Mach. Learn. 2020, 4804–4815. [Google Scholar] [CrossRef]
  30. Wu, Y.; Winston, E.; Kaushik, D.; Lipton, Z. Domain adaptation with asymmetrically relaxed distribution alignment. Int. Conf. Mach. Learn. 2020, 10283–10293. [Google Scholar] [CrossRef]
  31. Biswas, S.; Pal, S.; Maity, S.P.; Balasubramanian, V.N. Effects of noisy labels on deep neural network architectures. Neural Netw. 2020, 133, 19–29. [Google Scholar]
  32. Wang, Y.; Wang, H.; Shen, Y.; Fei, J.; Li, W.; Jin, G.; Wu, L.; Zhao, R.; Le, X. Semi-supervised semantic segmentation using unreliable pseudo-labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4248–4257. [Google Scholar]
  33. Chen, S.; Wang, R.; Lu, J. A meta-framework for multi-label active learning based on deep reinforcement learning. Neural Netw. 2023, 162, 258–270. [Google Scholar] [CrossRef]
  34. Cacciarelli, D.; Kulahci, M. A survey on online active learning. arXiv 2023, arXiv:2302.08893. [Google Scholar]
  35. Tan, F.; Zheng, G. Active learning for deep object detection by fully exploiting unlabeled data. Connect. Sci. 2023, 35, 2195596. [Google Scholar] [CrossRef]
  36. Shen, X.; Dai, Q.; Ullah, W. An active learning-based incremental deep-broad learning algorithm for unbalanced time series prediction. Inf. Sci. 2023, 642, 119103. [Google Scholar] [CrossRef]
Figure 1. The framework of the proposed approach with human-in-the-loop annotation.
Figure 2. Overall architecture and key components of the EfficientNet-B0 model.
Figure 3. Classification accuracy vs. percentage of labeled data samples for the proposed and conventional models on the Cross-Age Celebrity dataset.
Figure 4. Classification accuracy vs. percentage of labeled data samples for the proposed and conventional models on the Caltech Image dataset.
Table 1. Evaluation results for the Cross-Age Celebrity dataset.

| Labeled Samples | Accuracy (Proposed) | Accuracy (Conventional) | Precision (Proposed) | Precision (Conventional) | Recall (Proposed) | Recall (Conventional) | F1 (Proposed) | F1 (Conventional) |
|---|---|---|---|---|---|---|---|---|
| 9% | 64.7% | 61.7% | 74% | 71% | 75% | 73% | 0.75 | 0.72 |
| 18% | 86.4% | 74.1% | 88% | 80% | 85% | 78% | 0.87 | 0.79 |
| 27% | 90.3% | 80.2% | 91% | 84% | 91% | 84% | 0.91 | 0.84 |
| 36% | 94.6% | 84.5% | 94% | 87% | 94% | 87% | 0.94 | 0.87 |
| 45% | 96.3% | 87.1% | 96% | 90% | 99% | 94% | 0.98 | 0.92 |
| 54% | 96.5% | 88.5% | 93% | 91% | 94% | 92% | 0.93 | 0.92 |
| 63% | 97.5% | 88.9% | 96% | 91% | 95% | 90% | 0.95 | 0.91 |
| 72% | 98.5% | 89.8% | 98% | 92% | 96% | 91% | 0.97 | 0.92 |
| 81% | 98.7% | 91.2% | 99% | 94% | 97% | 93% | 0.98 | 0.94 |
| 90% | 98.9% | 91.9% | 99% | 95% | 98% | 94% | 0.98 | 0.95 |
| 100% | 98.9% | 92.3% | 99% | 96% | 99% | 96% | 0.99 | 0.96 |
Table 2. Evaluation results for the Caltech Image dataset.

| Labeled Samples | Accuracy (Proposed) | Accuracy (Conventional) | Precision (Proposed) | Precision (Conventional) | Recall (Proposed) | Recall (Conventional) | F1 (Proposed) | F1 (Conventional) |
|---|---|---|---|---|---|---|---|---|
| 9% | 71.7% | 57.7% | 78% | 61% | 80% | 63% | 0.79 | 0.62 |
| 18% | 86.4% | 61.7% | 87% | 65% | 82% | 61% | 0.84 | 0.63 |
| 27% | 92.3% | 62.2% | 92% | 66% | 94% | 66% | 0.93 | 0.66 |
| 36% | 94.6% | 65.5% | 94% | 69% | 94% | 69% | 0.94 | 0.69 |
| 45% | 95.3% | 67.1% | 95% | 71% | 96% | 74% | 0.96 | 0.72 |
| 54% | 96.5% | 70.5% | 96% | 73% | 96% | 76% | 0.96 | 0.75 |
| 63% | 97.5% | 71.9% | 97% | 74% | 97% | 73% | 0.97 | 0.74 |
| 72% | 98.5% | 72.8% | 98% | 75% | 97% | 74% | 0.98 | 0.75 |
| 81% | 98.7% | 73.2% | 99% | 76% | 98% | 75% | 0.98 | 0.75 |
| 90% | 98.9% | 73.9% | 99% | 76% | 99% | 75% | 0.99 | 0.76 |
| 100% | 99.3% | 74.3% | 99% | 77% | 99% | 77% | 0.99 | 0.77 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
