1. Introduction
The exploration of self-consciousness and self-awareness in artificial intelligence (AI) [1,2,3] involves a profound understanding of individuality, emotions, motivations and the subtleties of behavior. This heightened cognitive state enables entities to make informed decisions, set meaningful goals and adapt dynamically. A crucial question arises: can AI models achieve a comparable level of cognitive sophistication for making informed decisions?
To answer this question, it is important to explore the basic principles and evolution of AI. The rise of AI, which is often compared with modern electricity due to its transformative potential, is due to machine learning (ML), the fundamental sub-discipline of AI. The emergence of deep learning (DL) within ML has catapulted AI to unprecedented heights, particularly in image classification, where the task is to categorize images within a predefined set of possibilities.
Figure 1 shows representative images of peripheral blood smears with eosinophils, a type of white blood cell (WBC) [4,5]. In particular, Figure 1a shows an eosinophil from the PBC dataset, which is known for its high-quality and rich WBC images [4]. In contrast, Figure 1b shows an eosinophil image from the BCCD dataset, which has lower image quality and fewer WBC images compared with the PBC dataset [5]. Figure 1c highlights the background of the PBC image (Figure 1a) on the left and the eosinophilic cell on the right [4]. Figure 1d further separates the eosinophilic cell into its nucleus and cytoplasm [4]. The structure and size of the nucleus and cytoplasm help hematologists and AI models to distinguish eosinophils from other WBCs.
Traditional methods for classifying digital eosinophil images involve developing a feature extraction algorithm and passing these features to a classifier for eosinophil recognition. This process requires the creation of a feature extractor and classifier using advanced image processing techniques. In machine learning, complicated image processing techniques are used to create a feature extractor whose output serves as the input to ML classifiers such as XGBoost, random forests (RFs) and support vector machines (SVMs). In contrast, deep learning seamlessly automates both feature extraction and classification [6,7]. Deep learning, especially through convolutional neural networks (CNNs), uses convolutional filters as feature extractors and fully connected (FC) dense layers as classifiers [8,9]. These large-scale CNNs, often referred to as ConvNets, require a large dataset, such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset, which contains 1,281,167 training images, 50,000 validation images and 100,000 test images [10]. The ILSVRC goal of cultivating ConvNets with an error rate below 5%, which corresponds to the average human error, was achieved in 2015 by the ResNet-50 model with an error rate of 3.6% [11]. This success has highlighted the suitability of ImageNet models for various image feature extraction and classification tasks and has stimulated research into transfer learning (TL) [12].
There is a critical challenge in transfer learning: the lack of a standardized understanding of the dataset size required for effective TL. This knowledge gap impairs the achievement of exceptional accuracy, with researchers attributing inaccuracies to inadequate TL dataset sizes [13]. This paper attempts to address this challenge by investigating the feasibility of building self-aware ImageNet models.
After this introduction in Section 1, this paper is structured as follows: Section 2 reviews the relevant literature on the feasibility of self-learning AI models; Section 3 outlines the materials and methods used in the study; Section 4 presents the experimental results; Section 5 provides a discussion of these results; and Section 6 concludes the study.
3. Materials and Methods
This section outlines the methodology employed to classify white blood cell (WBC) images into various categories using pre-trained deep neural network models. The datasets used include two peripheral blood smear datasets, namely “PBC” and “BCCD.” The detailed methodology is illustrated in Figure 4.
3.1. PBC and BCCD Datasets
The PBC dataset comprises 17,092 images categorized into eight blood cell groups [4], as summarized in Table 1.
The PBC images have a standard size of 360 × 363 pixels, which is very close to the input requirements of the ImageNet models and thus minimizes the effects of resizing the images.
Figure 5 [4] shows the images of each blood cell type in the eight groups.
The BCCD dataset originally comprises 410 peripheral blood smears of red blood cells (RBCs), white blood cells and platelets [5]. The images are in JPEG format and have a size of 640 × 480 pixels.
Figure 6 shows typical images for each of the four WBC types in the original BCCD dataset.
These blood smears often contain accumulated erythrocytes and oil droplets, which contribute to the lower image quality of the BCCD dataset.
Table 2 summarizes the distribution of eosinophils, lymphocytes, monocytes and neutrophils in the original BCCD dataset [5].
3.2. Reduced PBC and Augmented BCCD Datasets
Because the BCCD dataset includes only four WBC classes while the PBC dataset contains eight, it is necessary to exclude the additional four WBC classes from the PBC dataset. This adjustment ensures that both datasets have four classes each.
Moreover, because the BCCD dataset has a limited number of images per class, it requires initial augmentation to create a balanced BCCD dataset. The augmentation techniques employed are geometric image transformations, such as random rotation and shearing, applied to produce a sufficient amount of data (see the sketch at the end of this subsection).
Table 3 summarizes the distribution of the four WBC types in the new augmented BCCD dataset [28]. Figure 7 shows four sample images for each of the four WBC types in the new expanded BCCD dataset [28].
As a result, we obtain a reduced 4-class PBC dataset and an augmented 4-class BCCD dataset.
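As a concrete illustration of the augmentation step described above, the following minimal Keras sketch generates rotated and sheared variants of the BCCD images. The directory layout, the rotation and shear ranges and the number of variants per image are illustrative assumptions, not the exact settings used in this study.

```python
import os
from tensorflow.keras.preprocessing.image import (ImageDataGenerator, load_img,
                                                  img_to_array, array_to_img)

# Illustrative geometric augmentation: random rotation and shear (assumed ranges).
augmenter = ImageDataGenerator(rotation_range=20, shear_range=0.2, fill_mode="nearest")

src_dir = "bccd/original/eosinophil"    # hypothetical folder holding one WBC class
dst_dir = "bccd/augmented/eosinophil"   # hypothetical output folder
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    image = img_to_array(load_img(os.path.join(src_dir, name)))
    image = image.reshape((1,) + image.shape)
    # Produce a few random variants per source image to help balance the class.
    for i, batch in enumerate(augmenter.flow(image, batch_size=1)):
        array_to_img(batch[0]).save(os.path.join(dst_dir, f"{i}_{name}"))
        if i >= 4:
            break
```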
3.3. Preprocessing Steps
Dataset pre-processing includes three steps: image resizing, data balancing, and splitting.
First, images in both the PBC and BCCD datasets need to be resized to fit the standard “240 × 240” image input of ImageNet models.
Secondly, analyzing model behavior requires keeping the number of evaluation metrics to a minimum during comparison. This is achieved by using balanced datasets and relying on accuracy and loss as assessment tools. Three balanced datasets (Table 4), DS-1, DS-2 and DS-3, represent the PBC dataset. Of the eight blood cell types in the PBC dataset, we include only four: neutrophils, lymphocytes, monocytes and eosinophils. This selection maintains consistency by targeting the same number of classes in both the newly balanced PBC dataset and the BCCD dataset. Three balanced WBC datasets (Table 5), DS-4, DS-5 and DS-6, represent the BCCD dataset.
The final data pre-processing step involves splitting each of the new PBC and BCCD datasets into training and validation subsets at a ratio of 90% to 10%.
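For concreteness, the resizing to 240 × 240 pixels and the 90%/10% split described above can be implemented with a few lines of Keras utility code. This is a minimal sketch: the directory path and seed are hypothetical, and it assumes the images are organized into one sub-folder per WBC class.

```python
import tensorflow as tf

DATA_DIR = "datasets/pbc_4class"   # hypothetical folder with one sub-folder per WBC class
IMG_SIZE = (240, 240)              # input size stated in Section 3.3

# 90% training / 10% validation split; labels are one-hot encoded so that
# categorical cross-entropy can be used as the loss function.
train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, image_size=IMG_SIZE, label_mode="categorical",
    validation_split=0.1, subset="training", seed=42)
val_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, image_size=IMG_SIZE, label_mode="categorical",
    validation_split=0.1, subset="validation", seed=42)
```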
3.4. Balanced PBC and BCCD Datasets
After the pre-processing steps, we obtain three balanced PBC datasets (DS-1, DS-2 and DS-3) and three balanced BCCD datasets (DS-4, DS-5 and DS-6). This diversity in terms of dataset size and image quality enables an investigation of their effects on the learning behavior of the pre-trained ImageNet CNNs through the transfer learning process.
The next step is to standardize the experimental setup for all common models to provide a solid foundation for subsequent analysis.
3.5. Unified Experimental Setup
The experimental setup is standardized with the following elements: data augmentation techniques, data shuffling, the metrics used, the loss function, the learning rate range, automated learning rate reduction, early stopping, model storage and the deep learning technique used.
Geometric transformations are applied as augmentation techniques to improve model generalization. Data shuffling is performed only on the training subsets to ensure better generalization. “Accuracy” is chosen as the metric, and “Categorical Cross-Entropy” is used as the loss function. The optimizers used include AdaGrad [29], ADAM [30], RMSprop [31] and SGD [32].
The learning process involves selecting a suitable range of learning rates (LRs). Learning rate reduction is controlled by the “Plateau” function with a chosen patience level, and the learning process is terminated by the “Early Stopping” function. The trained model is saved for later analysis and evaluation.
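The learning-rate reduction, early stopping and model storage described above map directly onto standard Keras callbacks. The sketch below is illustrative only; the monitored quantities, patience values, reduction factor and file name are assumptions rather than the exact settings used in the experiments.

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau, TerminateOnNaN)

# Patience values and the reduction factor below are illustrative assumptions.
callbacks = [
    TerminateOnNaN(),                                    # abort training on numerical failure
    ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                      patience=5, min_lr=1e-6),          # "Plateau"-based LR reduction
    EarlyStopping(monitor="val_loss", patience=10,
                  restore_best_weights=True),            # stop when validation stops improving
    ModelCheckpoint("best_model.keras", monitor="val_accuracy",
                    save_best_only=True),                # store the trained model for later analysis
]
```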
The deep learning technique used in these experiments is transfer learning (TL), which utilizes the pre-trained experience of the ImageNet CNNs to avoid the need for large datasets, extensive computational resources and time.
3.6. Experimental Subjects: ImageNet Pre-Trained CNNs
The ImageNet competition marked a pivotal moment in the evolution of computer vision by exposing the shortcomings of shallow models with few hidden layers and spurring the development of deep architectures such as ResNet, DenseNet and VGGs. Prior to these advances, shallow machine learning models struggled to effectively update synaptic weights across numerous layers, limiting their ability to learn complex features from data. This limitation made them unsuitable for large-scale image classification tasks.
However, the introduction of deep architectures with innovative techniques such as residual connections, dense connectivity and hierarchical feature extraction overcame these barriers. These advances facilitated the training of deep networks by tackling problems such as vanishing gradients and enabling the effective propagation of information across multiple layers. As a result, the ImageNet competition has triggered the proliferation of deep learning methods that are revolutionizing computer vision and advancing artificial intelligence.
The focus of this study is on analyzing the behavior of the ImageNet models rather than discussing their architectures in detail. The 11 ImageNet models used in this study include Xception [33]; VGG-16 and VGG-19 [34]; ResNet-50, ResNet-101 and ResNet-152 V2 [35]; Inception V3 [36]; InceptionResNet V2 [37]; and DenseNet-121, DenseNet-169 and DenseNet-201 [38].
The VGG models [34] marked a significant milestone in the ImageNet competition by demonstrating the effectiveness of deep architectures in large-scale image classification tasks. The VGG models used a simple but powerful architecture with a stack of convolutional layers followed by max-pooling layers, achieving remarkable success. This unified structure enabled the effective extraction of complicated features at different spatial resolutions and the acquisition of rich representations of the input images.
The straightforward design of the VGG architecture facilitated implementation and interpretation, making it a popular choice for researchers and practitioners. VGG models demonstrated unprecedented accuracy in image classification, underscoring the transformative potential of deep learning methods. VGG’s success in the ImageNet competition has underscored the importance of deep architectures and catalyzed further advances in computer vision research.
Figure 8 illustrates the architecture of the VGG-16 model [34] using the example of classifying a neutrophil from the PBC dataset [4]. The VGG-16 model is characterized by its deep structure, which consists of 16 layers and effectively recognizes complicated features within the neutrophil image. Each block contains convolutional operations followed by a max-pooling layer, enabling hierarchical feature extraction across multiple scales. This architecture enables the VGG-16 model to capture fine-grained details essential for accurate classification tasks and shows exceptional performance in distinguishing different cell types.
ImageNet accuracy is a well-established metric for models trained on the ILSVRC dataset with fast processors over an extended period of time. In contrast, transfer learning (TL) accuracy depends on the specific TL approach, the size of the new dataset, its similarity to ILSVRC and the quality of the images [13]. These factors are usually determined by AI researchers. Novice AI researchers often lack the expertise to assess these factors accurately, which can lead to the overfitting of deep learning models. This emphasizes the importance of incorporating sensors that track the history of the learning process to achieve self-awareness.
3.7. Evaluation and Interpretability Tools
The first TL form is used in this study. Different optimizers (SGD, RMSprop, AdaGrad and ADAM) and loss functions (categorical cross-entropy and MSE) were tested, with the best results being obtained with the ADAM optimizer and the categorical cross-entropy loss function. In order to maintain consistency, the results of the 50th epoch were used, and the “TerminateOnNaN”, “ReduceLROnPlateau” and “EarlyStopping” functions were initially employed.
Table 6 presents the constant parameters used in the experiments. These include the ADAM optimizer with a default learning rate of 0.001, the categorical cross-entropy loss function, the accuracy metric, a total of 50 epochs and a training-to-validation ratio of 90% to 10%.
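The following minimal sketch shows how such a transfer learning experiment can be assembled in Keras, using ResNet-101 V2 as a representative backbone. The interpretation of the “first TL form” as a frozen ImageNet feature extractor with a newly trained dense classifier is an assumption, as are the preprocessing call and the single-layer head; the optimizer, loss, learning rate, epoch count and data split follow Table 6. The `train_ds`, `val_ds` and `callbacks` objects come from the earlier sketches.

```python
import tensorflow as tf

NUM_CLASSES = 4   # eosinophil, lymphocyte, monocyte, neutrophil

# Pre-trained ImageNet backbone used as a frozen feature extractor (assumed TL form).
base = tf.keras.applications.ResNet101V2(
    weights="imagenet", include_top=False, pooling="avg", input_shape=(240, 240, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(240, 240, 3))
x = tf.keras.applications.resnet_v2.preprocess_input(inputs)   # scales pixels to [-1, 1]
x = base(x, training=False)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)  # new FC classifier
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
```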
Google Colaboratory (Google Colab) was used for the experiments. Google Colab is a valuable machine learning and data analytics tool that allows data scientists to write and run Python code via an online-hosted Jupyter notebook [39].
For evaluation and interpretability, confusion matrices with the corresponding classification reports [8] and Score-CAM are used to analyze and evaluate the performance of the models numerically and visually. In addition, two new parameters, accuracy difference (AD) and loss difference (LD), are developed as potential indicators of self-awareness ability. Confusion matrices are analyzed numerically using classification report parameters such as accuracy, precision, recall and F1-score [8].
Accuracy is the ratio of correctly predicted classes to the total number of samples evaluated [8]. The equation for accuracy is:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP and TN stand for true positive and true negative cases, respectively, and FP and FN stand for false positive and false negative cases, respectively.
Precision measures the quality of a positive prediction of the model [8]. The equation for precision is:
Precision = TP / (TP + FP)
Recall measures how many of the true positives were correctly identified by the model [8]. The equation for recall is:
Recall = TP / (TP + FN)
The F1-score is the harmonic mean of precision and recall [8]. The equation for the F1-score is:
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
Score-CAMs are used to visualize the internal focus of the models’ feature maps [40]. They provide a visual assessment of model performance alongside the numerical assessment provided by the classification reports.
3.8. Tool for Behavioral Studies
The core of this work is the search for potential indicators that could provide a concise history of the learning process in pre-trained deep learning models. Typically, researchers compare training and validation accuracy curves and loss curves to assess over- or underfitting during the learning process. In this study, it is proposed to directly measure the differences between these curves as potential indicators of model behavior.
The accuracy difference (AD) is the difference between the training accuracy (TA) and the validation accuracy (VA). The equation for AD is:
AD = TA − VA
The loss difference (LD) is the difference between the training loss (TL) and the validation loss (VL). The equation for LD is:
LD = TL − VL
These differences serve as potential indicators for achieving self-awareness in AI models. They provide insights into the learning process and guide necessary adjustments to improve model performance.
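In practice, AD and LD can be computed per epoch directly from the recorded training history. The following minimal sketch assumes `history` is the Keras History object returned by `model.fit` in the earlier setup sketch.

```python
import numpy as np

# Per-epoch accuracy and loss curves recorded by Keras during training.
ta = np.array(history.history["accuracy"])       # training accuracy (TA)
va = np.array(history.history["val_accuracy"])   # validation accuracy (VA)
tl = np.array(history.history["loss"])           # training loss (TL)
vl = np.array(history.history["val_loss"])       # validation loss (VL)

ad = ta - va   # accuracy difference per epoch
ld = tl - vl   # loss difference per epoch

print(f"average AD: {ad.mean():+.3f}   average LD: {ld.mean():+.3f}")
# AD and LD close to zero indicate mature, well-generalizing behavior;
# a persistently large |AD| or |LD| signals over- or underfitting.
```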
3.9. Theoretical Framework for AD and LD
In traditional machine learning, overfitting and underfitting are evaluated by analyzing the gap between training and validation performance, typically through accuracy and loss curves. AD and LD simplify this process by offering quantitative measures of these gaps at any point during training.
From a mathematical standpoint, AD and LD represent the degree of divergence between the model’s internal representation of the data (captured by training accuracy and loss) and its generalization capability (captured by validation accuracy and loss). The larger the magnitude of AD and LD, the more the model is either overfitting or underfitting, which reflects the model’s inability to balance learned knowledge with generalization to unseen data.
Conceptually, AD and LD can be seen as measures of the model’s “awareness” of its performance, functioning similarly to self-monitoring mechanisms in conscious entities. This self-awareness emerges when the model is able to recognize the divergence between its training and validation phases, enabling adjustments to its learning process. The use of AD and LD allows the model to “detect” this divergence and signal whether it is overconfident in its learning (overfitting) or underconfident (underfitting).
3.10. Comparison with Existing Methods
While AD and LD are useful for detecting overfitting or underfitting, they should be considered alongside traditional performance metrics like accuracy, precision, recall, and confusion matrices. Unlike early stopping or regularization techniques (e.g., dropout, L2 regularization), which prevent overfitting, AD and LD help monitor learning progression without intervening directly. However, these metrics may have limitations, particularly in highly imbalanced datasets or in noisy conditions, where standard methods such as confusion matrices and cross-validation could provide a more detailed view of model performance.
In certain edge cases, AD and LD may not effectively capture model behavior. For example, in imbalanced datasets where accuracy alone is not an accurate indicator of model performance, AD might misleadingly show progress, even when precision or recall are poor. Similarly, in cases with noisy data, the loss difference (LD) might fluctuate significantly, obscuring the true learning behavior. Therefore, in such situations, incorporating advanced regularization methods and additional metrics may be required to ensure a robust evaluation of model performance.
We now highlight that while learning and validation curves reveal trends in model performance, AD and LD offer a concise, singular metric that quantifies the difference between training and validation results, making them easier to interpret. Furthermore, methods like test time augmentation and prediction confidence scores focus on robustness during inference, whereas AD and LD focus on behavior during training, providing different but complementary perspectives.
4. Results
This section presents the results of the transfer learning process with eleven ImageNet models on six balanced datasets created from two peripheral blood smear datasets, PBC and BCCD. The balanced datasets from the PBC dataset are DS-1, DS-2 and DS-3, and those from the BCCD dataset are DS-4, DS-5 and DS-6.
Because all 11 ImageNet models showed similar behavior, the results for the ImageNet ResNet-101 V2 model are presented graphically for simplicity, while summary tables contain the numerical results for the other models.
4.1. Balanced PBC Dataset Results: DS-1, DS-2 and DS-3
Figure 9 shows the accuracy summary results of fitting the ResNet-101 V2 model with the PBC-balanced datasets DS-1, DS-2 and DS-3. Figure 9a,b show that the ResNet-101 V2 model has average AD values of +2% and +11% (see Table 7) when using the DS-1 and DS-2 datasets, respectively. After fitting with the DS-3 dataset, the AD value fluctuates between negative and positive values (Figure 9c) before finally reaching 100% accuracy (see Table 7).
Figure 9d,e show that the ResNet-101 V2 model has average LD values of −0.16 and −0.55 when fitted with the DS-1 and DS-2 datasets, respectively. This pattern indicates significant overfitting of the ResNet-101 V2 model due to the limited dataset size. When using the DS-3 dataset (Figure 9f), the LD value alternates between negative and positive values before finally reaching zero loss.
Overfitting is evident for both the DS-1 and DS-2 datasets, as shown in Figure 9. To address this, the introduction of the DS-3 dataset proves to be a crucial solution. This improvement is due to the considerable increase in the number of images, which emphasizes the importance of an adequate dataset size in mitigating overfitting. It is important to recognize that mitigating overfitting through a sufficiently large dataset is a nuanced aspect that may be overlooked by some researchers.
Table 7 provides a summary of the average AD and LD values for ImageNet models fitted with the balanced PBC datasets.
The learning behavior of the other ImageNet models mirrors that of the ResNet-101 V2 model. The only differences lie in the average AD and LD values, which vary slightly and are either slightly higher or lower.
4.2. Balanced BCCD Dataset Results: DS-4, DS-5 and DS-6
Figure 10 shows the accuracy summary results of fitting the ResNet-101 V2 model with the BCCD-balanced datasets: DS-4, DS-5 and DS-6.
Figure 10a–c show that the ResNet-101 V2 model reaches average AD values of +56%, +55% and +52% (see Table 8) when using the BCCD-balanced datasets. The DS-4 AD curve shows less variation than the DS-5 and DS-6 AD curves. Figure 10d–f show that the ResNet-101 V2 model reaches average LD values of −3.35, +15.05 and +20 (see Table 8) when using the BCCD-balanced datasets. The DS-4, DS-5 and DS-6 training loss (TL) curves are horizontal lines with near-zero average values. The DS-4 VL curve is nearly a horizontal line with a value of 3.35. However, the DS-5 and DS-6 VL curves rise exponentially with high fluctuations. Such behavior demonstrates clear overfitting, which becomes even worse with increasing data size and, consequently, noise.
Table 8 shows the average AD and LD difference values for ImageNet models fitted with the BCCD-balanced datasets using the first TL form.
4.3. DenseNet-169 Model Results
This study used the DenseNet-169 model as a representative example to illustrate the classification reports of the ImageNet models. These summaries show the performance of the model when trained on different datasets, including PBC DS-2 (400 images per class), PBC DS-3 (1000 images per class), BCCD DS-4 (400 images per class) and BCCD DS-5 (1000 images per class).
Figure 11a,b show the confusion matrices for the DenseNet-169 model trained with the PBC DS-3 (1000 images per class) and BCCD DS-5 (1000 images per class) datasets, respectively.
Table 9 summarizes the classification reports for the DenseNet-169 model across the different datasets, including PBC DS-2 (400 images per class), PBC DS-3 (1000 images per class), BCCD DS-4 (400 images per class) and BCCD DS-5 (1000 images per class).
Table 10 showcases the performance metrics of different optimizers on the PBC DS-3 dataset (1000 images per class). Each optimizer is evaluated based on its precision, recall, F1-score and accuracy. ADAM stands out with perfect scores across all metrics, achieving a remarkable precision, recall, F1-score and accuracy of 1.00. Following closely behind is RMSprop, displaying high scores across the board with a precision of 0.98, recall of 0.99, F1-score of 0.98 and accuracy of 0.98. AdaGrad and SGD also demonstrate strong performance, maintaining consistency with scores of 0.97 in precision, recall, F1-score and accuracy. This comparison underscores ADAM’s exceptional performance in handling balanced datasets, setting a high standard for precision-driven optimization.
Score-CAM (score-weighted class activation mapping) is a deep learning method that identifies and highlights the areas of an image that contribute to a neural network’s classification decision. It helps visualize which parts of an image are significant for a specific classification outcome, aiding model interpretation.
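For readers who wish to reproduce such maps, the following is a simplified sketch of the Score-CAM idea rather than the reference implementation: each activation map of a chosen convolutional layer is upsampled, normalized, used to mask the input, and weighted by the class score the masked input receives. The layer name, the number of maps kept and the function interface are assumptions, and the target layer is assumed to be accessible by name in the model.

```python
import numpy as np
import tensorflow as tf

def score_cam(model, image, class_index, conv_layer_name, max_maps=32):
    """Simplified Score-CAM: weight each activation map by the class score
    obtained when the input is masked with that normalized, upsampled map."""
    conv_output = model.get_layer(conv_layer_name).output
    activation_model = tf.keras.Model(model.inputs, conv_output)
    acts = activation_model.predict(image[np.newaxis, ...], verbose=0)[0]  # (h', w', C)

    h, w = image.shape[:2]
    # Rank channels by their peak activation and keep only the strongest ones for speed.
    order = np.argsort(acts.max(axis=(0, 1)))[::-1][:max_maps]

    cam = np.zeros((h, w), dtype=np.float32)
    for c in order:
        fmap = tf.image.resize(acts[..., c:c + 1], (h, w)).numpy()[..., 0]
        if fmap.max() == fmap.min():
            continue
        mask = (fmap - fmap.min()) / (fmap.max() - fmap.min())
        masked = image * mask[..., np.newaxis]
        score = model.predict(masked[np.newaxis, ...], verbose=0)[0, class_index]
        cam += score * mask

    cam = np.maximum(cam, 0)
    return cam / cam.max() if cam.max() > 0 else cam
```

The returned heatmap can then be overlaid on the original image to inspect which regions drive the prediction.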
Figure 12 shows the Score-CAM of DenseNet-169 trained with the PBC DS-2 (400 images per class) and DS-3 (1000 images per class) datasets.
Figure 13 shows the Score-CAM of DenseNet-169 trained with the BCCD DS-4 (400 images per class) and DS-5 (1000 images per class) datasets.
5. Discussion
5.1. Learning Process and Impacting Factors
The results show that ImageNet models exhibit specific learning behavior that is influenced by several factors, including:
Model: architecture and depth;
Dataset: size and class representation;
Image: similarity (feature depth), quality and size;
Transfer learning (TL) shape: first, second or third.
The effect of model depth is consistent in the two balanced datasets of BCCD and PBC. Increasing the number of CNN layers requires more data, which means that learning behavior can be observed and identified earlier for shallower models.
The use of PBC balanced datasets demonstrates the importance of sufficient data and balanced class representation. In contrast, BCCD balanced datasets show that data augmentation techniques (DATs) are ineffective in compensating for lack of data in small, unbalanced datasets.
One thousand images per class in the PBC DS-3 dataset was sufficient to achieve AD and LD values of zero. However, using 2750 images per WBC class in the BCCD DS-6 dataset resulted in significantly poorer performance.
High-quality images allow models to achieve mature behavior characterized by “zero AD and LD values” with a minimum of data, computational resources and time. Finally, as mentioned earlier, the choice of a particular TL shape can significantly affect the learning progression of ImageNet models.
Our findings align with previous studies on TL in medical image analysis, where dataset size and quality are key factors influencing model performance. However, we observed unexpectedly high overfitting in the BCCD dataset despite augmentation, suggesting that beyond data quantity, intrinsic image quality plays a critical role. This observation is consistent with the recent literature indicating that image quality impacts TL success, but the extent of overfitting observed here, even with large datasets, suggests that further investigation is needed to understand the limits of augmentation techniques in compensating for low-quality data.
Moreover, the contrasting performance between the PBC and BCCD datasets indicates that factors such as dataset quality may be more critical than previously emphasized in the literature. This opens the door for future research into strategies for mitigating overfitting when working with lower-quality datasets, as well as refining augmentation methods to better address issues stemming from data quality.
One limitation of AD and LD is their sensitivity to dataset quality and size. While effective for smaller datasets like PBC and BCCD, these metrics may require adjustment when applied to larger, more diverse datasets. Furthermore, their effectiveness in detecting self-awareness in more complex architectures, such as transformer-based models, remains an open question. Future research will explore how these metrics can be integrated into existing AI systems to improve self-awareness and decision making, particularly in real-time applications like autonomous driving.
5.2. ImageNet Models and Consciousness Path
Accuracy, loss and their respective training and validation difference functions have been shown to be reliable indicators of the learning process in ImageNet models. These parameters serve as effective tools for measuring, monitoring and updating information about the actual experience and maturity of these models. The inclusion of these parameters provides ImageNet models with a kind of self-awareness and gives them a sense of their performance and learning process.
5.3. DenseNet-169 Model Results
The results of the DenseNet-169 model, including the confusion matrices, summary classification reports and Score-CAMs, contribute significantly to understanding the effectiveness of using AD and LD as sensing elements to build more confident ImageNet models.
In Figure 12 and Figure 13, the focal areas of the Score-CAMs on the testing images from the original PBC dataset and the augmented BCCD dataset illustrate the influence of data quantity in two contrasting scenarios: high-quality and low-quality images. Specifically, Figure 12b,c demonstrate that an increase in the quantity of high-quality images within the PBC dataset enhances the model’s ability to focus on the object of interest, namely the four white blood cells. In contrast, Figure 13b,c visually indicate that increasing the data quantity for low-quality images, such as those from the BCCD DS-4 and DS-5 datasets, leads to greater model confusion.
6. Conclusions
To summarize, this study represents a pioneering attempt to integrate sensing elements into the development of self-aware artificial intelligence, with a focus on self-aware convolutional neural networks (ConvNets). By examining blood cell classification tasks using the “PBC” and “BCCD” datasets, this research provides invaluable insights into endowing ImageNet models with self-aware capabilities.
The “PBC” dataset, known for its high quality, coherent images and balanced class distribution, contrasts sharply with the challenges of the “BCCD” dataset, which lacks these favorable characteristics. Despite attempts to compensate for the imbalance of the dataset through data augmentation, this study reveals significant challenges, particularly in mitigating accuracy losses, highlighting the complicated interplay between data quality, quantity and model performance.
By analyzing the accuracy difference (AD) and loss difference (LD), this study highlights their central role as sensors for investigating the behavior of ConvNets and shedding light on the dynamics of the learning process. These metrics not only serve as indispensable tools for measurement and monitoring, but also pave the way for the introspection of ImageNet models.
While the practical realization of self-aware AI may still seem a long way off, this study serves as a beacon pointing the way for AI researchers to further explore this burgeoning field. By explaining the behavior of ConvNets and highlighting the importance of sensing elements, this work contributes to the foundational knowledge that is essential for the eventual manifestation of self-aware artificial intelligence.
Furthermore, this work lays the foundation for the realization of self-aware AI models and encourages further research to investigate additional sensing elements and various AI tasks beyond image classification. As the field progresses, continued efforts in this direction will undoubtedly pave the way for transformative advances in artificial intelligence and usher in a new era of sentient and self-aware machines.
Future Research Directions
Future research will focus on integrating additional censoring elements to enhance self-awareness, such as uncertainty quantification techniques. Moreover, the exploration of self-aware models in other fields like autonomous systems and robotics will be critical for expanding the applicability of this approach. We also plan to apply explainability tools like SHAP and LIME to further interpret the decision-making process of self-aware AI models, providing more transparency and insight into their internal workings. Additionally, research into addressing the limitations of data augmentation techniques for low-quality datasets will be crucial, especially in medical image analysis, to develop more robust AI systems.