WGCAMNet: Wasserstein Generative Adversarial Network Augmented and Custom Attention Mechanism Based Deep Neural Network for Enhanced Brain Tumor Detection and Classification

Alam, Fatema Binte; Fahim, Tahasin Ahmed; Asef, Md; Hossain, Md Azad; Dewan, M. Ali Akber

doi:10.3390/info15090560


by Fatema Binte Alam ¹, Tahasin Ahmed Fahim ², Md Asef ³, Md Azad Hossain ⁴ and M. Ali Akber Dewan ⁵,*

¹ Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh

² Department of Electrical and Electronic Engineering, University of Chittagong, Chittagong 4331, Bangladesh

³ Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849, USA

⁴ Department of Electronics and Telecommunication Engineering, Chittagong University of Engineering and Technology, Chittagong 4349, Bangladesh

⁵ School of Computing and Information Systems, Faculty of Science and Technology, Athabasca University, Athabasca, AB T9S 3A3, Canada

* Author to whom correspondence should be addressed.

Information 2024, 15(9), 560; https://doi.org/10.3390/info15090560

Submission received: 7 August 2024 / Revised: 27 August 2024 / Accepted: 8 September 2024 / Published: 11 September 2024

(This article belongs to the Special Issue Applications of Deep Learning in Bioinformatics and Image Processing)


Abstract: Brain tumor detection and categorization of tumor subtypes are essential for early diagnosis and improved patient outcomes. This research presents an approach that combines advanced data augmentation with deep learning for brain tumor classification. A dataset of 6982 MRI images from IEEE DataPort was used: 5712 images in four classes (1321 glioma, 1339 meningioma, 1595 no tumor, and 1457 pituitary) formed the training set, and 1270 images of the same four classes formed the testing set. A Wasserstein Generative Adversarial Network (WGAN) was implemented to generate synthetic images that address class imbalance, resulting in a balanced and consistent dataset. A comparison of data augmentation methodologies demonstrated that WGAN augmentation outperforms both traditional augmentation (such as rotation, shift, zoom, etc.) and no augmentation. Additionally, a Gaussian filter and normalization were applied during preprocessing to reduce noise; a comparison with Median and Bilateral filters showed that the Gaussian filter gives superior accuracy and edge preservation. The classifier combines parallel feature extraction from modified InceptionV3 and VGG19 networks, followed by custom attention mechanisms that capture the characteristics of each tumor type. The model was trained for 64 epochs, using model checkpoints to save the best-performing model based on validation accuracy, together with learning rate adjustments. The model achieved 99.61% accuracy on the testing set, with precision, recall, AUC, and loss of 0.9960, 0.9960, 0.9999, and 0.0153, respectively. The explainability of the proposed architecture is supported by t-SNE plots, which show distinct tumor clusters, and Grad-CAM representations, which highlight crucial areas in MRI scans. This research presents an explainable and robust approach for correctly classifying four brain tumor classes, combining WGAN-augmented data with advanced deep learning models for feature extraction. The framework effectively manages class imbalance and integrates a custom attention mechanism, outperforming other models and thereby improving diagnostic accuracy and reliability in clinical settings.

Keywords: attention mechanism; brain tumor; convolutional neural network; deep neural network; WGAN

1. Introduction

Brain tumors pose a critical challenge in medical science because of their complicated nature and their deadly effects on the human brain, which is a sophisticated organ [1]. These tumors arise from the uncontrolled growth of cells that spread rapidly within the brain or its peripheral structures and can become life-threatening [2]. Quick and accurate detection of this fatal disease is essential to ensure the right treatment; without it, the death rate cannot be reduced [3]. The ability to detect brain tumors is progressing steadily with the advancement of technology. Magnetic Resonance Imaging (MRI) plays a vital role in detecting the presence of brain tumors [4]. Beyond detection, MRI also makes it possible to classify the tumor type [5]. Although MRI is a modern imaging technique for brain tumor detection, the interpretation of MRI images still depends on the expertise of radiologists, which makes it time-consuming and complicated. To classify brain tumors properly, radiologists must take great care to avoid errors, and this process requires extensive experience [6]. There is therefore a strong need to automate this crucial process to ensure the most accurate and consistent assessment, prediction, and classification possible [7].

Recently, the automatic detection and classification of brain tumors have progressed greatly with the adoption of deep learning models [8]. Among the different types of deep learning models, convolutional neural networks (CNNs) show promising results in the field of medical image processing [9]. Deep learning models can extract many types of features and recognize complex patterns, so the detection and classification of various diseases is now possible by training on suitable datasets [10]. In the same way, brain tumor detection and classification are becoming easier through the development of deep learning models [11]. The implementation of such models reduces the possibility of human error and saves diagnosis time for radiologists and physicians [12]. As a result, the demand for more accurate detection and classification of complex diseases such as brain tumors is increasing daily [13,14]. The fundamental contributions and principal findings of the proposed research are outlined below:

A new deep learning framework for brain tumor detection and classification is proposed, in which modified VGG19 and InceptionV3 architectures are combined with a custom attention mechanism for feature extraction, followed by classification layers.
In data preprocessing, the dataset is balanced and augmented with synthetic images generated by a Wasserstein Generative Adversarial Network (WGAN), and a Gaussian filter is used to reduce noise and enhance the quality of the MRI images.
After training, the proposed model shows excellent results on the performance evaluation metrics, which compare favorably with existing models for brain tumor detection and classification.
The model’s explainability is demonstrated through t-SNE plots, which show distinct tumor clusters, and Grad-CAM, which highlights crucial areas in MRI scans.

This paper is organized into sections that present the research work systematically. Section 2 briefly describes a significant number of currently available research works related to brain tumor diagnosis. Section 3 introduces the dataset, the data preprocessing, the proposed framework, and the metrics used for performance evaluation. Section 4 presents the performance evaluation of the proposed work both numerically and graphically, and describes model explainability using Grad-CAM. Section 5 provides an overall performance comparison and discusses the impact of the proposed approach. Finally, concluding remarks are provided in Section 6.

2. Related Works

Recent developments in deep learning models for medical imaging, particularly for brain tumor detection and classification, show gradual improvements. Convolutional Neural Network (CNN)-based methods, hybrid and custom models, and lightweight models have been developed for brain tumor diagnosis; these are reviewed in the following sub-sections.

2.1. Convolutional Neural Network (CNN)-Based Methods

Recent advancements in Convolutional Neural Networks (CNNs) have significantly influenced medical image processing, including the detection and classification of brain tumors. The application of CNNs extends beyond brain tumor research, as demonstrated in [15], which explored CNN architectures. This work highlights the importance of CNNs in identifying complex patterns in medical images and reinforces the potential benefits of transferring these architectural innovations to cancer research. The difficulties of human-oriented brain tumor classification were reduced by a model that used Fuzzy C-Means clustering followed by conventional supervised machine learning models and a CNN [16]. However, a manual feature extraction process was involved, which limited the feasibility of this work. In another study, a 23-layer CNN with VGG16 was used for brain tumor detection with good results, but the limited data led to overfitting issues [17]. Another work applied a CNN method by fine-tuning EfficientNet-B0 with additional layers [18]. This method enhanced the classification performance, but the image enhancement and augmentation were not up to the mark. A 2D CNN and a convolutional auto-encoder network were developed and declared superior models for tumor detection in [19], but their performance was underwhelming relative to traditional models. An improved ResNet50 was also presented as a potential model for brain tumor classification in another study [20], which did not provide an adequate comparative analysis. In [21], a CNN architecture with ensemble learning was proposed and achieved good results with VGG16 on a small dataset.

2.2. Hybrid Models

Hybrid models that combine CNNs with other machine learning techniques have shown significant potential in improving classification accuracy and robustness. A hybrid CNN-LSTM for the same purpose was introduced by other researchers, although the accuracy improvement was not high enough for practical implementation [22]. Segmentation combined with feature extraction using Histogram of Oriented Gradients (HOG) and ResNet-V2, followed by classification using a BiLSTM network, was proposed in [23], but the validation was not sufficient for practical implementation. An application-based system was deployed for automatic brain tumor detection and segmentation using deep learning models [24]. The system was designed using CNN, U-Net, and U-Net++ for brain tumor detection and segmentation from 2D and 3D MRI scans. Its main objective was to improve the model’s performance based on user feedback through an iterative process, which did not produce good results. In [25], a CNN with ResNet50 and U-Net based on improved fine-tuning was implemented, but the results were not promising enough. Another user-focused system was developed using a hybrid model, in which the accuracy was also not good enough for implementation [26]. In another study, a hybrid model using machine learning and deep learning models was presented, but the performance was still satisfactory [27].

2.3. Lightweight Models

The demand for models suitable for real-time applications has led to the development of lightweight CNN architectures. To reduce model complexity, a simple model based on a lightweight CNN was introduced [28]. However, despite its simplicity, its accuracy was not satisfactory compared with other models. A lightweight CNN was also considered in another study, which combined Internet of Medical Things (IoMT) and CNN models [29]. However, the limited training data introduced a risk of biased performance. In addition to these works, a clean-energy cloud-based lightweight deep learning platform was proposed for brain tumor classification [30]. However, its performance and operational efficiency were questionable with regard to implementation in clinical settings.

In summary, numerous models have been proposed and applied to several datasets. However, there is still a need for a model that outperforms them in all respects, and the proposed model can therefore be a suitable choice for clinical implementation. A summary comparing key studies, their methods, and their performance metrics is given in Table 1.

3. Materials and Methods

The methodology of this research is carried out in multiple steps. First, the brain tumor MRI dataset, which includes separate training and testing sets, was collected from IEEE DataPort [31], and all images were preprocessed (filtering and normalization) after loading. During the augmentation process, the WGAN is used to balance the classes, and the custom architecture is then employed to classify the dataset into the four classes. Finally, performance is analyzed through different metrics. Visualization techniques such as confusion matrices, t-SNE, and Grad-CAM were used to illustrate the effectiveness of the model. Figure 1 depicts a summary diagram of the brain tumor classification process.

3.1. Dataset

The dataset used in this research was sourced from IEEE DataPort [31], a publicly accessible repository. It consists of MRI images categorized into four classes: glioma, meningioma, no tumor, and pituitary. The training set contains 1321 glioma, 1339 meningioma, 1595 no tumor, and 1457 pituitary images, while the testing set contains 300 glioma, 306 meningioma, 405 no tumor and 300 pituitary images. The images were acquired and processed according to the standards outlined in the dataset documentation. Table 2 shows the distribution of the image dataset. This results in a total of 6982 MRI images, which are then loaded and preprocessed (as shown in Figure 2). Furthermore, the whole dataset was evaluated by 5-fold cross-validation, in which the data were divided into five subgroups. For each fold, the model was trained on four of these subsets and evaluated on the remaining one. The same procedure was carried out five times, once with each subgroup as the held-out set. This procedure follows established machine learning practice and supports the robustness of the model’s learning and validation.
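The 5-fold procedure described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code: the function name `k_fold_indices`, the shuffling seed, and the choice of 5712 samples (the size of the original training set) are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # fixed seed: illustrative choice
    folds = [indices[i::k] for i in range(k)]     # k nearly equal, disjoint folds
    for i in range(k):
        validation = folds[i]
        # Train on the other k - 1 folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Hypothetical usage: split 5712 image indices into five train/validation pairs.
splits = list(k_fold_indices(n_samples=5712, k=5))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(fold_number, len(train), len(validation))
```

Each image index appears in exactly one validation fold, so every image is used for validation once and for training four times.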

3.2. Data Loading and Preparation

The images in the training and testing directories for each class were loaded and resized to 224 × 224 pixels using a custom function. The images were then converted into arrays with NumPy, a Python library, for further preprocessing. The framework was then tested separately with Gaussian, Median and Bilateral filtering for noise removal to evaluate their impact on brain tumor classification. Figure 3 shows that Gaussian filtering effectively reduces noise while preserving tumor boundaries and regions, which is essential for accurate classification. In contrast, Median filtering blurs the edges while removing “salt-and-pepper” noise, and Bilateral filtering is better at maintaining edge properties but still leads to some blur and a darker image. Based on this comparison, a Gaussian filter (using a Gaussian blur function) is applied to each image to reduce noise and smooth the image, suppressing small, irrelevant intensity variations while highlighting tumor boundaries.

The Gaussian filter is expressed mathematically as follows:

$G(x, y) = \frac{1}{2 \pi \sigma^{2}} e^{- \frac{x^{2} + y^{2}}{2 \sigma^{2}}}$ (1)

where $\sigma$ is the standard deviation of the Gaussian distribution.
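To make Equation (1) concrete, the sketch below samples it on a small grid to build a discrete blur kernel. The kernel size of 5 and $\sigma = 1.0$ are illustrative assumptions; the paper does not state the values it used, and `gaussian_kernel` is a hypothetical helper name.

```python
import math

def gaussian_kernel(size, sigma):
    """Sample G(x, y) from Equation (1) on a size x size grid centred at (0, 0)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = [
        [
            math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]
    # Renormalise so the discrete weights sum to 1, which preserves image brightness.
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]

# Assumed example values: a 5 x 5 kernel with sigma = 1.0.
kernel = gaussian_kernel(size=5, sigma=1.0)
```

Convolving an image with this kernel replaces each pixel with a weighted average of its neighbourhood, with the largest weight at the centre and smoothly decaying weights further out.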

After filtering, the following normalizations are performed:

Standard Normalization:

$\text{Normalized pixel} = \frac{\text{pixel} - \mu}{\sigma}$ (2)

where $\mu$ is the mean and $\sigma$ is the standard deviation of the pixel values. This helps in faster convergence during training by ensuring the data distribution is centered and scaled properly.

Min-Max Normalization:

$\text{Normalized pixel} = \frac{\text{pixel} - \min}{\max - \min}$ (3)

where $\min$ and $\max$ are the minimum and maximum pixel values in the image. This scales the pixel values to the range [0, 1], which helps in reducing the impact of outliers and ensures the data falls within a specific range, making it easier for the model to learn.

GAN Normalization:

$\text{Normalized pixel} = \frac{\text{pixel}}{127.5} - 1$ (4)

This maps the pixel values to the range [−1, 1], which is particularly beneficial for Generative Adversarial Networks (GANs) because they perform better when the data is normalized to this range, resulting in more reliable and efficient training.
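Equations (2) to (4) can be written as three small functions. This is a sketch under stated assumptions: it operates on a flat list of 8-bit pixel values, the function names are hypothetical, and it assumes the image is not constant (otherwise Equations (2) and (3) divide by zero).

```python
import statistics

def standard_normalize(pixels):
    """Equation (2): subtract the mean and divide by the standard deviation."""
    mu = statistics.fmean(pixels)
    sigma = statistics.pstdev(pixels)   # population standard deviation
    return [(p - mu) / sigma for p in pixels]

def min_max_normalize(pixels):
    """Equation (3): rescale to the range [0, 1]."""
    low, high = min(pixels), max(pixels)
    return [(p - low) / (high - low) for p in pixels]

def gan_normalize(pixels):
    """Equation (4): map 8-bit values in [0, 255] to the range [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]

# Illustrative 8-bit pixel values (not taken from the dataset).
pixels = [0, 51, 102, 153, 204, 255]
print(min_max_normalize(pixels))
print(gan_normalize(pixels))
```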

Finally, the class labels are converted into one-hot encoded vectors, where each label is represented as a binary vector with a 1 in the position corresponding to the class and 0s elsewhere, and the processed images and labels are returned as NumPy arrays for model training and evaluation. The normalization steps played a key role in improving model performance by ensuring that the input data were on the same scale. Standard Normalization led to faster convergence during training, Min-Max Normalization helped reduce the effect of outliers, and GAN Normalization stabilized GAN training. Together, these normalization techniques improved generalization capability, stability, and accuracy.
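As a small illustration of the one-hot encoding step, the sketch below maps each of the four class names to a binary vector. The ordering of `CLASSES` and the helper name `one_hot` are assumptions for the example; the paper does not specify the index assigned to each class.

```python
# Assumed class order; the actual index assignment is not given in the paper.
CLASSES = ["glioma", "meningioma", "no tumor", "pituitary"]

def one_hot(label, classes=CLASSES):
    """Return a binary vector with a 1 at the index of `label` and 0s elsewhere."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector

encoded = [one_hot(name) for name in ["glioma", "no tumor", "pituitary"]]
```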

3.3. WGAN for Data Augmentation

Data augmentation is a popular technique for enhancing the diversity of datasets. In this research, a Wasserstein Generative Adversarial Network (WGAN) is applied to the brain tumor image dataset to enrich the training set and address class imbalance. The WGAN consists of a generator and a discriminator (critic) that work together to generate realistic synthetic images. Using transposed convolution layers and LeakyReLU activation functions, the generator creates new training images from random noise, while the critic evaluates these images against real ones using convolutional layers with weight clipping to ensure stability. The Wasserstein loss function (loss = mean(ytrue × ypred)) is implemented to produce smoother gradients and improve GAN training. During training, the critic is updated multiple times per generator iteration, using both real and generated images, to improve its evaluation. In this way, the generator learns to produce images that closely resemble real ones. The trained WGAN is then used to generate the number of additional images needed to balance each class. The new synthetic images were combined with the original training set and labeled using one-hot encoding. This augmented and balanced training dataset, which contained both real and synthetic images, was used to train the proposed classification model to improve its performance. The bar chart in Figure 4 illustrates the distribution of samples per class before and after WGAN augmentation, showing how the WGAN balanced the dataset and contributed to a more robust model. Adding the generated images to the training dataset reduced the class imbalance, resulting in enhanced model effectiveness.
The WGAN generates synthetic MRI-like images that preserve the structural features needed for accurate classification, as shown in Figure 5, and the different augmentation strategies lead to significant differences in model performance. These WGAN-generated images are a sample of the WGAN’s output, which captures the essential features of real training images. The model was also tested with traditional augmentation (rescaling (1/255), width and height shifts (up to 20%), rotation (up to 20 degrees), zooming (up to 20%), and horizontal flipping) and with no augmentation, but WGAN augmentation achieved better performance. Choosing WGAN augmentation for brain tumor classification helps create a robust system capable of accurately diagnosing all classes.
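The Wasserstein loss and the weight clipping mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label convention (−1 for real images, +1 for generated ones) and the clipping threshold of 0.01 are assumptions that follow common WGAN practice, and the critic scores are made-up numbers.

```python
def wasserstein_loss(y_true, y_pred):
    """loss = mean(y_true * y_pred), as stated in the text."""
    return sum(t * p for t, p in zip(y_true, y_pred)) / len(y_true)

def clip_weights(weights, clip_value=0.01):
    """Clamp every critic weight to [-clip_value, clip_value] (assumed threshold)."""
    return [max(-clip_value, min(clip_value, w)) for w in weights]

# Hypothetical critic scores for two real and two generated images.
real_scores, fake_scores = [0.8, 0.6], [-0.5, -0.3]

# Assumed label convention: -1 for real, +1 for generated.
critic_loss = (wasserstein_loss([-1, -1], real_scores)
               + wasserstein_loss([1, 1], fake_scores))
clipped = clip_weights([0.5, -0.2, 0.005])
```

Under this convention, minimising `critic_loss` pushes the critic to score real images higher than generated ones, and clipping keeps the critic weights within a bounded range after each update.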

3.4. Proposed Classifier Model

This research introduces a classifier model tailored for brain tumor classification, specifically designed to distinguish between the glioma, meningioma, no tumor, and pituitary classes. In the proposed model, the input (224 × 224) enters two well-known deep learning architectures simultaneously, namely VGG19 and InceptionV3. VGG19 captures fine details through compact convolutional filters, while InceptionV3 uses inception modules at different scales to detect multi-scale features, which makes the pair a good choice given that brain tumors fall into multiple types. The model draws on the combined strength of the InceptionV3 and VGG19 architectures because of their powerful feature extraction capabilities. The networks are not used in full; instead, each is truncated at an intermediate layer, ‘mixed5’ for InceptionV3 and ‘block3_conv4’ for VGG19, which keeps the focus on essential features. These truncated outputs are then refined in parallel with additional convolutional layers and max pooling operations to capture detailed features of the images. For example, a convolutional layer applies a 3 × 3 kernel to extract the spatial hierarchy, followed by the ReLU activation function $f(x) = \max(0, x)$ to introduce non-linearity; max pooling then downsamples the feature maps while retaining the most critical information.
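The ReLU and max pooling steps can be illustrated on a tiny feature map. In this sketch the 2 × 2 pooling window with stride 2 is an assumption (the paper does not give the pool size), and the 4 × 4 feature map holds made-up values.

```python
def relu_map(feature_map):
    """Apply f(x) = max(0, x) to every element of a 2D feature map."""
    return [[max(0.0, value) for value in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Downsample by taking the maximum of each non-overlapping 2 x 2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]

# Illustrative 4 x 4 feature map (hypothetical values).
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -0.5, 1.5, 2.5],
    [0.0, -2.0, -1.0, -4.0],
    [3.0, 1.0, -0.2, 0.7],
]
pooled = max_pool_2x2(relu_map(feature_map))
print(pooled)  # [[4.0, 2.5], [3.0, 0.7]]
```

The 4 × 4 map shrinks to 2 × 2 while the strongest activation in each window is kept.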

Both outputs are passed to a custom attention mechanism block that sharpens the model’s focus on the most prominent regions of the brain MRI images. This block includes convolution layers that project the input $X$ into lower-dimensional spaces ($f = W_{f} \cdot X$, $g = W_{g} \cdot X$, $h = W_{h} \cdot X$), where $W_{f}$, $W_{g}$, and $W_{h}$ are learnable weights. The attention mechanism computes a similarity matrix $s = \text{softmax}(f \cdot g^{T})$ and then aggregates the information back, weighted by the attention scores ($o = \text{softmax}(s) \cdot h$). This process ensures that the model focuses on the most informative areas, which improves feature discrimination.
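A compact way to see how such an attention block behaves is to compute it on a toy input. The sketch below is not the authors' layer: it treats the input as a list of N feature vectors, uses plain matrix products in place of 1 × 1 convolutions, applies the softmax once (when forming the score matrix), and uses hypothetical weight matrices.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    result = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

def attention(x, w_f, w_g, w_h):
    """Project x to f, g, h, score positions with softmax(f g^T), aggregate h."""
    f, g, h = matmul(x, w_f), matmul(x, w_g), matmul(x, w_h)
    scores = softmax_rows(matmul(f, transpose(g)))   # N x N attention weights
    return matmul(scores, h), scores

# Toy input: 3 spatial positions with 2 channels each (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
output, scores = attention(x, identity, identity, w_h)
```

Each row of `scores` sums to 1, so every output position is a weighted average of the projected features `h`, with larger weights on positions whose features are more similar.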

The outputs of the two parallel modified branches (intermediate layers from InceptionV3 and VGG19, along with the additional layers and custom attention blocks), which act as more detailed and focused feature extractors, are concatenated to form a comprehensive feature representation. The concatenated output is passed through dense layers with ReLU activations, followed by batch normalization layers, which stabilize and accelerate the training process. The dense layers, represented as $f(x) = W \cdot x + b$, where $W$ and $b$ are the weights and biases, respectively, capture intricate patterns. Dropout layers are used to prevent overfitting, alongside early stopping based on validation loss, monitoring of learning curves for signs of overfitting, and the application of data augmentation to enhance model generalization. The final ‘softmax’ activation function provides a probability distribution over the four classes. Figure 6 depicts the proposed architecture in detail. This architecture leverages the deep feature extraction of two established models, combined with a custom attention mechanism, to achieve strong performance in brain tumor classification. The improved feature extraction, attention mechanism, and strong dense layers outperform conventional models, resulting in a state-of-the-art solution in medical imaging.
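The classification head can be illustrated with a dense layer $f(x) = W \cdot x + b$ followed by a softmax over four classes. All weights, biases, and the two-element feature vector below are hypothetical toy values; batch normalization and dropout are omitted to keep the sketch short.

```python
import math

def dense(x, weights, bias):
    """Compute W . x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(values):
    """Turn raw scores into a probability distribution."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical concatenated feature vector and toy parameters.
features = [1.0, -2.0]
hidden = [max(0.0, v) for v in dense(features, [[0.5, 0.1], [-0.3, 0.8]], [0.0, 0.1])]
logits = dense(hidden, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]], [0.0] * 4)
probabilities = softmax(logits)   # one probability per class
```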

3.5. Training and Evaluation

The model was trained with the Nadam optimizer at a learning rate of 0.002 for 64 epochs with a batch size of 16. As illustrated in Table 3, the performance of the classifier is evaluated using several metrics, including accuracy, loss, precision, recall, and Area Under the Curve (AUC) [32,33]. During the training phase, the model is trained on the augmented dataset, which contains both original and generated images. Model checkpoints are used to store the best-performing model according to validation accuracy. In addition, if the validation accuracy does not improve for 10 consecutive epochs, the learning rate is multiplied by a factor of 0.5.
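The learning rate rule described above (halve the rate after 10 epochs without improvement in validation accuracy) can be sketched as plain Python. The function name `reduce_on_plateau` and the validation accuracy history are hypothetical; the initial rate, factor, and patience come from the text.

```python
def reduce_on_plateau(val_accuracies, initial_lr=0.002, factor=0.5, patience=10):
    """Return the learning rate used in each epoch under the plateau rule."""
    lr = initial_lr
    best = float("-inf")
    epochs_without_improvement = 0
    schedule = []
    for accuracy in val_accuracies:
        schedule.append(lr)
        if accuracy > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                lr *= factor                       # reduce after `patience` stale epochs
                epochs_without_improvement = 0
    return schedule

# Hypothetical history: one good epoch, ten stale epochs, then an improvement.
history = [0.90] + [0.89] * 10 + [0.95]
schedule = reduce_on_plateau(history)
```

With this history, the first eleven epochs run at 0.002 and the twelfth runs at 0.001.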

4. Results

4.1. Experimental Setup

The proposed classifier model was trained and executed on a desktop computer equipped with an NVIDIA GeForce RTX 3060 GPU, featuring 12.0 GB of RAM, and powered by an Intel(R) Core (TM) i7-8265U processor operating at 1.60 GHz, with a boost up to 1.80 GHz. These computational resources were adequate for training the model and evaluating its performance efficiently.

4.2. Data Filtering Impact

After the training and testing datasets were loaded, the framework was trained separately with three filters (Gaussian, Median and Bilateral) as well as with no filtering. The Gaussian filter achieved the highest classification accuracy at 99.61%, compared to 96.67% with Median, 97.53% with Bilateral, and 94.32% with no filtering. Gaussian filtering also gave the best visual results for noise reduction and edge preservation, which is why it was selected as the preferred method in this framework.

4.3. Data Augmentation Results

To resolve the problem of class imbalance, a WGAN was used to generate synthetic images. A total of 688 synthetic images were created, which balanced the class distribution and provided a more robust training set of 1600 images per class (total training set: 6400 images). The proposed architecture was trained separately with WGAN augmentation, traditional augmentation, and no augmentation in order to choose a suitable augmentation technique for brain tumor classification. These three strategies are compared on the performance metrics in Table 4. Without any augmentation, the model achieved an accuracy of 93.12%; traditional augmentation improved on this, and the WGAN approach led to even greater improvements. The model with WGAN-augmented data achieved 99.60% accuracy and a 0.9999 AUC, outperforming traditional augmentation and highlighting the WGAN’s superior enhancement of model robustness and generalization.
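The figure of 688 synthetic images follows directly from the per-class training counts reported in Section 3.1 and the target of 1600 images per class. The short check below reproduces that arithmetic; the variable names are illustrative.

```python
# Training counts per class as reported in Section 3.1.
train_counts = {"glioma": 1321, "meningioma": 1339, "no tumor": 1595, "pituitary": 1457}
target_per_class = 1600

# Number of synthetic images needed to bring each class up to the target.
synthetic_needed = {name: target_per_class - count for name, count in train_counts.items()}
total_synthetic = sum(synthetic_needed.values())
balanced_total = target_per_class * len(train_counts)

print(synthetic_needed)  # {'glioma': 279, 'meningioma': 261, 'no tumor': 5, 'pituitary': 143}
print(total_synthetic)   # 688
print(balanced_total)    # 6400
```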

4.4. Classifier Performance

The proposed architecture was evaluated on the testing set representing a total of 1270 images of all types of brain tumor classifications. It achieved the following performance metrics:

Loss: 0.0153
Accuracy: 0.9961
Precision: 0.9960
Recall: 0.9960
AUC: 0.9999

Figure 7 depicts the evolution of the overall performance measures across 64 training epochs, providing insights into testing accuracy (99.6%), AUC (99.99%), loss (0.0153), precision (99.6%), and recall (99.6%). These findings show that the proposed approach is very accurate and robust, with accuracy, AUC, precision, and recall all exceeding 99.5%. To assess the robustness and generalizability of the proposed framework, 5-fold cross-validation was performed on the training dataset. In this process, the dataset is split into five subsets, with the model trained on four subsets and validated on the remaining one, iterating through all subsets. The results for each fold are presented in Table 5. The classification performance obtained from the separate training and testing sets (99.6% accuracy) was found to be consistent with that from the 5-fold cross-validation, showing that the model is robust across different data splits. This comprehensive evaluation offers a reliable estimate of the model’s performance on unseen data and helps guard against overfitting.

4.5. Confusion Matrix and Visualization

Figure 8 depicts the overall confusion matrix, which shows the model’s ability to differentiate between the different classes. The model has an overall accuracy of 99.61 percent. It correctly classified 299 glioma cases, 302 meningioma cases, 405 no tumor cases, and 259 pituitary cases. However, there were occasional misclassifications, such as one case where glioma was mistakenly identified as meningioma.
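The reported accuracy can be checked from the correctly classified counts above and the 1270 test images. The snippet below only reproduces that arithmetic and does not reconstruct the full confusion matrix, whose off-diagonal entries are not all given in the text.

```python
# Correctly classified cases per class, as reported for Figure 8.
correct = {"glioma": 299, "meningioma": 302, "no tumor": 405, "pituitary": 259}
test_total = 1270

total_correct = sum(correct.values())
accuracy = total_correct / test_total
misclassified = test_total - total_correct

print(total_correct, misclassified)   # 1265 5
print(round(accuracy * 100, 2))       # 99.61
```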

Additionally, the t-SNE projection of the softmax layer outputs, shown in Figure 9, highlights how well the model distinguishes between different types of brain tumors. Each dot in the plot represents an MRI image, with colors indicating the class: glioma, meningioma, no tumor, and pituitary tumors. The clear separation into distinct clusters shows that the model has effectively learned the unique features of each tumor type, resulting in well-defined groups. This visualization underscores the robustness of the model’s classification capabilities.

Figure 10 depicts an essential feature of the study: employing Grad-CAM to improve model transparency and explainability. These visualizations highlighted the regions the model focused on for predictions, adding clarity to its decision-making process. The results indicated a robust and precise model for classifying brain tumor types, also providing valuable insights into its analytical approach.

5. Discussion

For a deep learning model to be deployed, it has to satisfy requirements in all important respects, such as balanced classes, absence of bias, high accuracy, and statistical validity. Table 6 compares the proposed model with existing models that were evaluated on the same dataset used in this research. For data preprocessing, these models used several methods, such as traditional augmentation (rotation, zoom, shift, flip, etc.), fuzzy inference systems, and different types of filters. These works were built on different types of deep learning architectures, and their accuracy ranged from 75% to 98.5%. The WGCAMNet model exhibited excellent performance in all respects and can make a significant contribution to brain tumor detection and classification. Although the existing models showed good accuracy, they had significant limitations that must be resolved. The proposed model improves not only the accuracy but also addresses these major limitations. Moreover, the statistical evaluation of the proposed model shows clear superiority over the existing models, which supports its handling of critical cases and its robustness.

The uniqueness of the proposed model lies in several key aspects. The integration of the WGAN significantly enhanced data augmentation and thereby removed the class imbalance and bias problems that are commonly seen in existing models. Through the WGAN, the proposed model produced synthetic images that enrich the training dataset and create a more balanced and comprehensive dataset. This improves the model’s learning capability and its generalizability to unseen data. Furthermore, the custom attention mechanism in the WGCAMNet framework drives the model to focus on the most relevant image features, especially the tumor regions, which are essential for proper classification. This contrasts with the traditional feature extraction methods used in other models. The performance metrics of the proposed model show a clear advance, especially the AUC of 99.99%, which indicates high effectiveness in distinguishing among the different types of brain tumors with high precision. This is crucial for clinical applications, since misclassification can negatively impact appropriate and timely treatment of the patient. The proposed framework is also robust and generalizable across various data splits, as demonstrated by consistent results across 5-fold cross-validation and the dedicated training and testing sets. In addition, explainability techniques such as Grad-CAM, together with correct detection and classification, increase confidence in the model’s performance. Taking all these factors into account, the proposed model can be a significant advancement for brain tumor detection and classification if it is implemented in real applications.

6. Conclusions

There is no doubt that brain tumors have become one of the deadliest diseases globally, making early and accurate detection along with precise classification crucial for providing the right treatment and improving patient outcomes. Rapid advancements in artificial intelligence, particularly deep learning, have become vital in addressing these challenges in medical science. The proposed architecture, which combines WGAN for data augmentation, custom attention mechanisms, and modified VGG19 and InceptionV3 models, overcomes the limitations of existing models and also sets a new benchmark for accuracy and reliability in brain tumor diagnosis. This model holds great potential for contributing to more accurate and timely diagnoses in clinical practice, significantly enhancing patient outcomes. Looking ahead, the proposed model could form the foundation of a comprehensive system for real-world clinical implementation, accelerating research on drug and treatment protocols and potentially reducing the global mortality rate from brain tumors.

Author Contributions

Conceptualization: F.B.A.; methodology: F.B.A. and T.A.F.; software: F.B.A., T.A.F. and M.A.; validation: F.B.A., T.A.F., M.A., M.A.H. and M.A.A.D.; formal analysis: F.B.A., T.A.F. and M.A.; investigation: F.B.A., T.A.F. and M.A.; resources: F.B.A., T.A.F. and M.A.; data curation: F.B.A. and T.A.F.; writing – original draft preparation: F.B.A. and T.A.F.; writing – review and editing: F.B.A., T.A.F., M.A., M.A.H. and M.A.A.D.; visualization: F.B.A.; supervision: M.A.H. and M.A.A.D.; project administration: M.A.H. and M.A.A.D.; funding acquisition: M.A., M.A.H. and M.A.A.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data are publicly available; we downloaded the data from IEEE Data Port.

Acknowledgments

Special thanks to Jyotismita Chaki and Marcin Wozniak for providing the dataset through IEEE Data Port.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ostrom, Q.T.; Gittleman, H.; Truitt, G.; Boscia, A.; Kruchko, C.; Barnholtz-Sloan, J.S. CBTRUS statistical report: Primary brain and other central nervous system tumors diagnosed in the United States in 2011–2015. Neuro-Oncology 2018, 20, iv1–iv86.
2. Louis, D.N.; Perry, A.; Reifenberger, G.; Von Deimling, A.; Figarella-Branger, D.; Cavenee, W.K.; Ohgaki, H.; Wiestler, O.D.; Kleihues, P.; Ellison, D.W. The 2016 World Health Organization classification of tumors of the central nervous system: A summary. Acta Neuropathol. 2016, 131, 803–820.
3. Ostrom, Q.T.; Cioffi, G.; Gittleman, H.; Patil, N.; Waite, K.; Kruchko, C.; Barnholtz-Sloan, J.S. CBTRUS statistical report: Primary brain and other central nervous system tumors diagnosed in the United States in 2012–2016. Neuro-Oncology 2019, 21, v1–v100.
4. Lee, C.H.; Jung, K.W.; Yoo, H.; Park, S.; Lee, S.H. Epidemiology of primary brain and central nervous system tumors in Korea. J. Korean Neurosurg. Soc. 2010, 48, 145.
5. Chandana, S.R.; Movva, S.; Arora, M.; Singh, T. Primary brain tumors in adults. Am. Fam. Physician 2008, 77, 1423–1430.
6. Castro, M.G.; Cowen, R.; Williamson, I.K.; David, A.; Jimenez-Dalmaroni, M.J.; Yuan, X.; Bigliari, A.; Williams, J.C.; Hu, J.; Lowenstein, P.R. Current and future strategies for the treatment of malignant brain tumors. Pharmacol. Ther. 2003, 98, 71–108.
7. Erickson, B.J.; Korfiatis, P.; Akkus, Z.; Kline, T.L. Machine learning for medical imaging. Radiographics 2017, 37, 505–515.
8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
9. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.; Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88.
10. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118.
11. Shen, D.; Wu, G.; Suk, H.I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248.
12. Lakhani, P.; Sundaram, B. Deep learning at chest radiography: Automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 2017, 284, 574–582.
13. Lundervold, A.S.; Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Z. Med. Phys. 2019, 29, 102–127.
14. Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56.
15. Hussain, S.I.; Toscano, E. An extensive investigation into the use of machine learning tools and deep neural networks for the recognition of skin cancer: Challenges, future directions, and a comprehensive review. Symmetry 2024, 16, 366.
16. Hossain, T.; Shishir, F.S.; Ashraf, M.; Al Nasim, M.D.A.; Shah, F.M. Brain tumor detection using convolutional neural network. In Proceedings of the 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 3–5 May 2019; pp. 1–6.
17. Khan, M.S.I.; Rahman, A.; Debnath, T.; Karim, M.R.; Nasir, M.K.; Band, S.S.; Mosavi, A.; Dehzangi, I. Accurate brain tumor detection using deep convolutional neural network. Comput. Struct. Biotechnol. J. 2022, 20, 4733–4745.
18. Shah, H.A.; Saeed, F.; Yun, S.; Park, J.H.; Paul, A.; Kang, J.M. A robust approach for brain tumor detection in magnetic resonance images using finetuned efficientnet. IEEE Access 2022, 10, 65426–65438.
19. Saeedi, S.; Rezayi, S.; Keshavarz, H.; Niakan Kalhori, S.R. MRI-based brain tumor detection using convolutional deep learning methods and chosen machine learning techniques. BMC Med. Inform. Decis. Mak. 2023, 23, 16.
20. Aggarwal, M.; Tiwari, A.K.; Sarathi, M.P.; Bijalwan, A. An early detection and segmentation of brain tumor using deep neural network. BMC Med. Inform. Decis. Mak. 2023, 23, 78.
21. Khaliki, M.Z.; Başarslan, M.S. Brain tumor detection from images and comparison with transfer learning methods and 3-layer CNN. Sci. Rep. 2024, 14, 2664.
22. Alsubai, S.; Khan, H.U.; Alqahtani, A.; Sha, M.; Abbas, S.; Mohammad, U.G. Ensemble deep learning for brain tumor detection. Front. Comput. Neurosci. 2022, 16, 1005617.
23. Mahum, R.; Sharaf, M.; Hassan, H.; Liang, L.; Huang, B. A robust brain tumor detector using BiLSTM and Mayfly optimization and multi-level thresholding. Biomedicines 2023, 11, 1715.
24. Sailunaz, K.; Bestepe, D.; Alhajj, S.; Özyer, T.; Rokne, J.; Alhajj, R. Brain tumor detection and segmentation: Interactive framework with a visual interface and feedback facility for dynamically improved accuracy and trust. PLoS ONE 2023, 18, e0284418.
25. Asiri, A.A.; Shaf, A.; Ali, T.; Aamir, M.; Irfan, M.; Alqahtani, S.; Mehdar, K.M.; Halawani, H.T.; Alghamdi, A.H.; Alshamrani, A.F.A.; et al. Brain tumor detection and classification using fine-tuned CNN with ResNet50 and U-Net model: A study on TCGA-LGG and TCIA dataset for MRI applications. Life 2023, 13, 1449.
26. Saad, G.; Suliman, A.; Bitar, L.; Bshara, S. Developing a hybrid algorithm to detect brain tumors from MRI images. Egypt. J. Radiol. Nucl. Med. 2023, 54, 14.
27. Anantharajan, S.; Gunasekaran, S.; Subramanian, T.; Venkatesh, R. MRI brain tumor detection using deep learning and machine learning approaches. Meas. Sens. 2024, 31, 101026.
28. Mahmud, M.I.; Mamun, M.; Abdelgawad, A. A deep analysis of brain tumor detection from MR images using deep learning networks. Algorithms 2023, 16, 176.
29. Hammad, M.; ElAffendi, M.; Ateya, A.A.; Abd El-Latif, A.A. Efficient brain tumor detection with lightweight end-to-end deep learning model. Cancers 2023, 15, 2837.
30. Ghauri, M.S.; Wang, J.Y.; Reddy, A.J.; Shabbir, T.; Tabaie, E.; Siddiqi, J. Brain tumor recognition using artificial intelligence neural-networks (BRAIN): A cost-effective clean-energy platform. Neuroglia 2024, 5, 105–118.
31. Chaki, J.; Wozniak, M. Brain Tumor MRI Dataset. IEEE Dataport. 2023. Available online: https://ieee-dataport.org/documents/brain-tumor-mri-dataset (accessed on 20 July 2024).
32. Podder, P.; Alam, F.B.; Mondal, M.R.H.; Hasan, M.J.; Rohan, A.; Bharati, S. Rethinking densely connected convolutional networks for diagnosing infectious diseases. Computers 2023, 12, 95.
33. Alam, F.B.; Podder, P.; Mondal, M.R.H. RVCNet: A hybrid deep neural network framework for the diagnosis of lung diseases. PLoS ONE 2023, 18, e0293125.
34. Chaki, J.; Woźniak, M. Brain Tumor Categorization and Retrieval Using Deep Brain Incep Res Architecture Based Reinforcement Learning Network. IEEE Access 2023, 11, 130584–130600.
35. Arumugam, M.; Thiyagarajan, A.; Adhi, L.; Alagar, S. Crossover Smell Agent Optimized Multilayer Perceptron for Precise Brain Tumor Classification on MRI Images. Expert Syst. Appl. 2024, 238, 121453.
36. Amarnath, A.; Al Bataineh, A.; Hansen, J.A. Transfer-Learning Approach for Enhanced Brain Tumor Classification in MRI Imaging. BioMedInformatics 2024, 4, 1745–1756.
37. Vu, H.A. Integrating Preprocessing Methods and Convolutional Neural Networks for Effective Tumor Detection in Medical Imaging. arXiv 2024, arXiv:2402.16221.

Figure 1. Detailed Workflow Diagram of the Brain Tumor MRI Classification Process.

Figure 2. The sample collection includes axial, coronal, and sagittal images of four brain tumor types: (a) gliomas, (b) meningiomas, (c) no tumor, and (d) pituitary tumors.

Figure 3. Effects of Different Filtering Methods on Sample Brain Tumor Image.

Figure 4. Distribution of samples per class before and after WGAN augmentation.

Figure 5. Sample WGAN-generated images.

Figure 6. Block Diagram of the Proposed Model for Brain Tumor Classification.

Figure 7. Performance metrics of the proposed architecture over 64 epochs, including (a) accuracy, (b) loss, (c) precision, (d) recall, and (e) AUC.

Figure 8. Confusion matrix showing the classification accuracy across different brain tumor types.

Figure 9. t-SNE projection illustrating the separation of different brain tumor classes.

Figure 10. Grad-CAM visualization highlighting important regions in the MRI images for model predictions.

Table 1. Comparison of key studies on brain tumor detection and classification.

Reference	Methods	Performance Metrics
Hossain et al. [16]	Fuzzy C-Means clustering, traditional classifiers (SVM, KNN, MLP, etc.), CNN	CNN accuracy: 97.87%
Khan et al. [17]	23-layer CNN, transfer learning with VGG16	Accuracy: 97.8% (binary), 100% (multiclass)
Shah et al. [18]	Fine-tuned EfficientNet-B0, data augmentation, image enhancement	Accuracy: 98.87%
Saeedi et al. [19]	2D CNN, convolutional auto-encoder, traditional ML methods (MLP, KNN, etc.)	2D CNN Accuracy: 96.47%, Auto-encoder Accuracy: 95.63%, AUC: 0.99, Recall: 95%
Aggarwal et al. [20]	Improved ResNet for segmentation	>10% improvement in accuracy, recall, F1-score
Khaliki et al. [21]	CNN, Inception-V3, EfficientNetB4, VGG19, transfer learning	Best accuracy: 98% (VGG16), F-score: 97%, AUC: 99%, Recall: 98%, Precision: 98%
Alsubai et al. [22]	Hybrid CNN-LSTM, data preprocessing, CNN feature extraction	Accuracy: 99.1%, Precision: 98.8%, Recall: 98.9%, F1-score: 99.0%
Mahum et al. [23]	Mayfly optimization, ResNet-V2, BiLSTM	High accuracy, precision, recall, F1 score, AUC
Sailunaz et al. [24]	CNN, U-Net, U-Net++ for 2D and 3D MRI segmentation	Accuracy and Dice scores above 90%
Asiri et al. [25]	Fine-tuned CNN with ResNet50, U-Net for segmentation	IoU: 0.91, DSC: 0.95, SI: 0.95
Saad et al. [26]	Hybrid algorithm for brain tumor detection, CAD	Detection accuracy: 96.6%
Anantharajan et al. [27]	Ensemble Deep Neural SVM, Fuzzy C-means, GLCM	Accuracy: 97.93%, Sensitivity: 92%, Specificity: 98%
Mahmud et al. [28]	CNN architecture compared with ResNet-50, VGG16, Inception V3	Accuracy: 93.3%, AUC: 98.43%, Recall: 91.19%, Loss: 0.25
Hammad et al. [29]	CNN-based model for IoMT applications, lightweight design	Accuracy: 99.48% (binary), 96.86% (multi-class)
Ghauri et al. [30]	Clean-energy cloud-based DL platform, multi-layer CNN	Precision: 96.8%

Table 2. Distribution of data within each class into training and test sets.

Type of Brain Tumor	Training Set	Testing Set
Glioma	1321	300
Meningioma	1339	306
No tumor	1595	405
Pituitary	1457	300
Total	5712	1270
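To make the imbalance in the training column of Table 2 concrete, the short Python sketch below computes how many synthetic training images each class would need in order to match the largest class. The balancing target (the size of the largest class) and the helper name `synthetic_needed` are assumptions made for illustration; the per-class target actually used with the WGAN is not stated in this table.

```python
# Training-set counts copied from Table 2.
train_counts = {
    "Glioma": 1321,
    "Meningioma": 1339,
    "No tumor": 1595,
    "Pituitary": 1457,
}


def synthetic_needed(counts):
    """Return how many extra images each class needs to reach the largest class.

    Assumption: every class is topped up to the size of the largest class.
    """
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


needed = synthetic_needed(train_counts)

if __name__ == "__main__":
    print("Training total:", sum(train_counts.values()))
    for label, extra in needed.items():
        print(f"{label}: {extra} synthetic images to reach the largest class")
```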

Table 3. Evaluation Metrics and Equations.

Metric	Equation	Notes
Accuracy	$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$	TP: True Positives
Precision	$\text{Precision} = \frac{TP}{TP + FP}$	TN: True Negatives
Recall (Sensitivity)	$\text{Recall} = \frac{TP}{TP + FN}$	FP: False Positives
F1 Score	$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$	FN: False Negatives
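The equations in Table 3 can be applied to a multi-class confusion matrix by treating one class at a time as the positive class (one-vs-rest). The Python sketch below shows this under assumptions: the function name `per_class_metrics` and the small 2×2 matrix are hypothetical and are not data from this study.

```python
def per_class_metrics(confusion, k):
    """Compute the Table 3 metrics for class k, one-vs-rest.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # true class k, predicted as another class
    fp = sum(row[k] for row in confusion) - tp   # another class, predicted as k
    tn = total - tp - fn - fp

    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy matrix: rows are true classes, columns are predictions.
    toy = [[8, 2],
           [1, 9]]
    print(per_class_metrics(toy, 0))
```

Averaging the per-class values (macro averaging) is one common way to obtain a single figure per metric for a four-class problem.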

Table 4. Model Performance with Different Data Augmentation Techniques.

Augmentation Method	Loss	Accuracy	Precision	Recall	AUC
No Augmentation	0.1889	93.12%	0.9348	0.9304	0.9923
Traditional Augmentation	0.1666	95.50%	0.9561	0.9525	0.9936
WGAN Augmentation	0.0153	99.60%	0.9960	0.9960	0.9999

Table 5. Model Performance Across 5-Fold Cross-Validation.

Fold	Accuracy	Loss	Precision	Recall	AUC
Fold 1	0.9920	0.0375	0.9910	0.9905	0.9990
Fold 2	0.9875	0.0405	0.9860	0.9880	0.9980
Fold 3	0.9905	0.0380	0.9880	0.9920	0.9996
Fold 4	0.9915	0.0380	0.9900	0.9890	0.9998
Fold 5	0.9890	0.0395	0.9885	0.9875	0.9995
Average	0.9902	0.0389	0.9889	0.9894	0.9996
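The Average row of Table 5 is the arithmetic mean over the five folds. A minimal Python sketch for recomputing it from the per-fold values as printed is given below; the dictionary layout and the name `fold_average` are illustrative assumptions. Because the per-fold figures are already rounded, the recomputed means need not match the printed Average row exactly.

```python
from statistics import fmean

# Per-fold values copied from Table 5 (Fold 1 to Fold 5).
folds = {
    "Accuracy":  [0.9920, 0.9875, 0.9905, 0.9915, 0.9890],
    "Loss":      [0.0375, 0.0405, 0.0380, 0.0380, 0.0395],
    "Precision": [0.9910, 0.9860, 0.9880, 0.9900, 0.9885],
    "Recall":    [0.9905, 0.9880, 0.9920, 0.9890, 0.9875],
    "AUC":       [0.9990, 0.9980, 0.9996, 0.9998, 0.9995],
}


def fold_average(values, digits=4):
    """Arithmetic mean of the per-fold values, rounded to `digits` decimals."""
    return round(fmean(values), digits)


averages = {metric: fold_average(values) for metric, values in folds.items()}

if __name__ == "__main__":
    for metric, value in averages.items():
        print(f"{metric}: {value:.4f}")
```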

Table 6. Comparison between the proposed model and state-of-the-art models.

Reference	Dataset	Preprocessing Method	Model Architecture	Performance Metrics
Chaki et al. [34]	[31]	Fuzzy Inference System	Deep Brain INCEP Res Architecture 2.0 Based Reinforcement Learning Network	Accuracy: 97.5%
Arumugam et al. [35]	[31]	Cropping and denoising by Gaussian filter	CNN with Multilayer Perceptron	Accuracy: 98.5%, Sensitivity: 98.6%, Specificity: 98.4%
Amarnath et al. [36]	[31]	Traditional Augmentation	ResNet50	Accuracy: 87.9%, F1 Score: 79.6%
Amarnath et al. [36]	[31]	Traditional Augmentation	Xception	Accuracy: 98.1%, F1 Score: 98.1%
Amarnath et al. [36]	[31]	Traditional Augmentation	EfficientNetV2-S	Accuracy: 96.1%, F1 Score: 96.2%
Amarnath et al. [36]	[31]	Traditional Augmentation	ResNet152V2	Accuracy: 78.5%, F1 Score: 79.9%
Amarnath et al. [36]	[31]	Traditional Augmentation	VGG16	Accuracy: 76.8%, F1 Score: 77.5%
Vu et al. [37]	[31]	Smoothing with a Kernel, Bilateral Filtering, Gray scale conversion and Traditional Augmentation	Modified ResNet50	Accuracy: 75%
Proposed Model	[31]	WGAN and Gaussian Filter	WGCAMNet	Accuracy: 99.61%, Precision: 99.60%, Recall: 99.60%, AUC: 99.99%, Loss: 0.0153

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
