1. Introduction
With all the remarkable progress of medicine over recent decades, some diseases remain life-threatening, and among them brain cancer is one of the most aggressive [1]. Uncontrolled, irregular growth of cells inside and around the brain tissue is known as a brain tumor. A brain tumor can be malignant or benign, malignant being the more aggressive type. In layman's terms, a malignant brain tumor is called brain cancer. If a tumor breaches its covering and spreads into other parts, it is considered cancer [2]. Pituitary, meningioma, and glioma tumors are the three basic categories of brain tumors. The pituitary is a gland located at the base of the brain, and any abnormal growth of cells around this gland is known as a pituitary brain tumor [3]. Meningioma is a benign tumor that develops slowly and is found on the brain's outer coverings beneath the skull [3]. The last and most aggressive one is glioma, which has the highest mortality rate worldwide among all brain tumors [4]. It is commonly found in the cerebral hemispheres and the supporting tissue cells of the brain. Because of their locations, pituitary and meningioma tumors are easy to detect, whereas gliomas are difficult to detect and analyze [3]. Sample images of glioma, meningioma, and pituitary tumors from the dataset used in this research are presented in Figure 1.
Early symptoms of both benign and cancerous tumors are rare. Increased intracranial pressure is one of the initial signs: the skull restricts the space available for growth, so any new growth raises intracranial pressure. Symptoms depend on the site of the tumor; headache, vomiting, numbness of a hand or leg, and fits are a few examples [5].
Benign tumors, including meningioma and pituitary tumors, are slow-growing and often cause no early symptoms. However, neuropsychiatric symptoms such as anxiety, psychosis, personality changes, memory disturbances, or anorexia nervosa are common in patients with meningioma [6]. When only psychiatric symptoms are present, the diagnosis of meningioma can be delayed. Meningioma, and almost all benign tumors, are more likely to cause psychiatric symptoms and behavioral manifestations [7]. Gyawali et al. [6] emphasize the need for neurological evaluation and neuroimaging in psychiatric patients, particularly those with unusual symptoms. Similarly, fatigue, seizures, edema, endocrinopathy, and psychiatric disorders are commonly found in patients with glioma [8]. Because these symptoms are generic and not disease-specific, medical imaging is frequently used for brain tumor diagnosis.
Computed axial tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) are common medical imaging techniques frequently used in medicine, including the diagnosis of brain tumors. In clinical practice, CT and MRI are the most widely used imaging techniques for initial brain tumor diagnosis. Each has advantages over the other: CT takes less time for imaging and offers higher spatial resolution than MRI [9], which makes it ideal for chest- and bone-related diagnoses. However, the soft tissue contrast of CT is low compared to MRI [9], so MRI is the most popular choice because of its high-resolution soft tissue imaging capability.
In plain MRI scans, benign and malignant tumors look similar, yet it is essential to differentiate between them at the initial stage of diagnosis. Contrast-enhanced MRI is the first choice of medical experts because of its wide availability and better soft tissue resolution.
Although magnetic resonance imaging (MRI) has become the gold standard for diagnosing patients with tumors in any part of the body, classic MRI scans have two main limitations: they neither distinguish neoplastic tissue from nonspecific, treatment-related changes after chemotherapy or surgery, nor show the full extent of the tumor [10]. Several modern MRI techniques, such as perfusion-weighted imaging and magnetic resonance spectroscopic imaging, have recently been tested in clinical practice to address these diagnostic issues. Perfusion-weighted imaging highlights fluids moving through the arteries, and diffusion-weighted imaging weights the MRI signal by the diffusion rate of water molecules [10].
Contrast-enhanced MRI plays a critical role in identifying, characterizing, and planning surgical resection of tumors in patients with glioma. Any sign of contrast enhancement in early postoperative MRI (within 24–72 h) indicates incomplete resection [11]. However, a considerable number of patients can be misjudged with contrast-enhanced MRI, especially those with IDH-wildtype anaplastic glioma [12]. IDH (isocitrate dehydrogenase) is an important enzyme in the tricarboxylic acid cycle, and tumors with normal IDH genes are referred to as "IDH wild-type" or "IDH negative" [13]. These IDH wild-type tumors are considered the most aggressive and lack contrast enhancement on MRI, so contrast-enhanced MRI may not be the best option for resection guidance [11]. Positron emission tomography (PET) scans have recently been adopted in some clinical facilities to overcome this deficiency of contrast-enhanced MRI, particularly for patients with IDH wild-type tumors [11].
PET employs a range of radioactive tracers to target various metabolic and molecular processes. It can provide valuable extra information that enables medical experts to diagnose more precisely, particularly in ambiguous clinical scenarios [10]. For the diagnosis of most peripheral tumors in oncology, the most widely used PET tracer is 2-18F-fluorodeoxyglucose (18F-FDG) [10]. However, in the case of brain tumors, the use of 18F-FDG PET is limited due to the high level of glucose metabolism in normal brain tissue.
In cerebral gliomas, the proliferation marker 18F-3′-deoxy-3′-fluorothymidine (18F-FLT) accumulates in proportion to malignancy grade [14]. Nevertheless, 18F-FLT is unable to detect the full extent of a glioma because it cannot pass through the intact blood–brain barrier (BBB) and accumulates in portions of the tumor where the BBB has been disrupted.
In contrast to the widely used 18F-FDG PET tracer, the uptake of radiolabeled amino acids is poor in normal brain tissue, due to which tumors can be displayed with a strong tumor-to-background contrast. The ability of common amino acid tracers to penetrate the intact BBB is one of their key characteristics, allowing a depiction of the tumor that makes PET superior to contrast-enhanced MRI [11]. PET with radiolabeled amino acids is therefore used as an alternative to contrast-enhanced MRI for more exact tumor delineation [15]. The radiolabeled amino acid O-(2-[18F]fluoroethyl)-L-tyrosine (FET) is currently the most widely used tracer, particularly in Europe [10]. The fundamental advantage of PET employing radiolabeled amino acids is that their uptake is not affected by blood–brain barrier disruption, allowing it to detect tumor portions that are not visible on MRI [10,16].
Despite numerous technological advances in medical imaging and treatment, the survival rates of brain tumor patients remain extremely low [17]. PET, radiolabeled amino acid PET, MRI, CT, and contrast-enhanced MRI undoubtedly help medical experts diagnose and classify brain tumors; however, accuracy is vulnerable to human subjectivity. Reviewing an enormous amount of medical data (MRI/CT images) is time-consuming for humans, and the chance of human error is always present. Detecting a brain tumor at an early stage is crucial and depends on the expertise of neurologists [18]. It is therefore necessary to build computer-aided diagnostic (CAD) systems that can assist radiologists and other medical experts.
Researchers have shown great interest in developing automated AI-based intelligent systems. Traditional machine learning methods for classifying brain tumors involve several steps, including heavy preprocessing, manual feature extraction, manual feature selection, and classification. Feature extraction and selection is a difficult process that requires prior domain knowledge, since classification accuracy depends on identifying good features [19]. A sketch of such a pipeline follows.
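As a rough illustration of such a pipeline, the sketch below chains hand-crafted statistical features, univariate feature selection, and an SVM classifier. The feature set, selection method, and dummy data are our own assumptions for illustration, not the exact method of any cited study.

```python
# Illustrative traditional pipeline: manual features -> selection -> SVM.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def handcrafted_features(image):
    """Extract simple first-order statistics from a grayscale MR slice."""
    v = image.astype(np.float64).ravel()
    return np.array([v.mean(), v.std(), v.min(), v.max(),
                     np.median(v), np.percentile(v, 25), np.percentile(v, 75)])

# Dummy stand-ins for preprocessed MR slices and tumor labels.
rng = np.random.default_rng(0)
images = [rng.random((64, 64)) for _ in range(30)]
labels = np.arange(30) % 3  # 0 = glioma, 1 = meningioma, 2 = pituitary

X = np.stack([handcrafted_features(im) for im in images])
clf = make_pipeline(SelectKBest(f_classif, k=5), SVC(kernel="rbf"))
clf.fit(X, labels)  # accuracy hinges entirely on how good the features are
```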
The problem of manual feature selection is eliminated with the arrival of deep learning. Image processing and deep learning methods have shown outstanding performance in image-based tasks across many fields, including medicine [20,21,22,23].
Synthesized MRI/CT images can be extremely useful for training machine learning models, since real MRI/CT images are prohibitively expensive to obtain from patients given time constraints and patient privacy [24].
Deep learning models feature hundreds of layers and millions of parameters, and the more complex the model, the more data are needed to train it. Overfitting is a prevalent problem when deep networks with a large number of parameters are trained on small datasets. The power of supervised deep learning lies in the quality and quantity of labeled data, which are extremely difficult to acquire in the medical field.
In 2014, ground-breaking work in the field of generative models, called generative adversarial networks (GANs), was proposed by Goodfellow et al. [25]. A GAN is made up of two components: a generator and a discriminator. The generator attempts to fool the discriminator by producing realistic-looking images, while the discriminator attempts to classify the generated images as real or fake. The two are trained alternately until convergence. One significant difference between conventional generative models and GANs is that a GAN learns the input distribution and generates the image as a whole rather than pixel by pixel.
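A minimal sketch of this alternating training scheme in PyTorch is given below; the tiny fully connected generator and discriminator and all hyperparameters are illustrative placeholders, not the architecture of [25].

```python
# Minimal GAN training step: D and G are updated alternately.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 64 * 64), nn.Tanh())
D = nn.Sequential(nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(32, 64 * 64)            # stand-in for a batch of real MR slices
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

# Discriminator step: label real images 1 and generated images 0.
z = torch.randn(32, 100)                  # random Gaussian noise input
loss_d = bce(D(real), ones) + bce(D(G(z).detach()), zeros)
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make D label generated images as real.
loss_g = bce(D(G(z)), ones)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```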
Researchers have therefore used GANs to generate artificial medical images to overcome the data scarcity problem. For brain tumor magnetic resonance (MR) images, most GAN-based works aim to generate super-resolution brain MR images [26], some researchers have used GANs for brain tumor segmentation [27,28], and very few have used them for brain tumor classification [29].
GANs and all their proposed extensions have a few things in common. First, they are tools that generate good samples only for domains in which hundreds of thousands of training images are available, e.g., MNIST. For medical image generation, we generally do not have that much training data.
Secondly, all these generative models sample the input vector from random Gaussian noise. Because random Gaussian noise is a light-tailed distribution, the generator produces blurry and non-diverse images. Such image generation may not be helpful in the medical imaging field, as blurry images do not offer realistic features for the classifier to learn.
In this paper, we address this problem by proposing a framework to generate brain tumor medical images artificially. The framework combines two generative models: variational autoencoders (VAEs) and generative adversarial networks (GANs). We cascade a GAN with an encoder–decoder network trained separately on the training set, producing a noise vector that carries image-manifold information. Our proposed method can generate realistic-looking, sharp brain tumor images that significantly improve classification results.
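The sketch below conveys the core idea under simplifying assumptions: a separately trained encoder supplies latent codes that replace the usual N(0, I) noise at the generator input. The module shapes and names are hypothetical placeholders, far smaller than any practical ED-GAN.

```python
# Conceptual sketch: an encoder (from a pre-trained VAE) produces latent
# codes carrying image-manifold information, which the GAN generator then
# consumes instead of pure random Gaussian noise.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 200))   # outputs (mu, logvar)
gen = nn.Sequential(nn.Linear(100, 64 * 64), nn.Tanh())      # GAN generator

def informative_noise(images: torch.Tensor) -> torch.Tensor:
    """Sample z from the encoder's posterior instead of plain N(0, I)."""
    mu, logvar = enc(images).chunk(2, dim=1)
    eps = torch.randn_like(mu)
    return mu + eps * torch.exp(0.5 * logvar)   # reparameterization trick

batch = torch.rand(8, 1, 64, 64)        # stand-in for training MR slices
z = informative_noise(batch)            # latent codes near the image manifold
fake = gen(z)                           # generator consumes informative noise
```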
The rest of this paper is organized as follows. Section 1.1 reviews previous work on brain tumor classification based on various machine learning methods, including GANs and their applications in medical imaging. Section 2 reports the proposed ED-GAN method in detail, including experimental settings. Results and discussion and the conclusion are presented in Section 3 and Section 4, respectively.
1.1. Related Work
When developing machine learning-based intelligent systems for the classification of brain tumors, researchers usually first segment the brain tumor using various methods and then classify it [30]. This improves accuracy but is time-consuming and adds an extra step before network training. Many researchers have therefore used CNNs to classify brain tumors directly, without segmentation.
Justin et al. [31] used three classifiers (random forest (RF), a fully connected neural network (FCNN), and a CNN) to improve classification accuracy; the CNN attained the highest accuracy, 90.26%. Tahir et al. [30] investigated various preprocessing techniques to improve classification results, using three types: noise reduction, contrast enhancement, and edge detection. Various combinations of these techniques were tested on different test sets, and the authors assert that employing a variety of such schemes is more advantageous than relying on any single preprocessing scheme. On the Figshare dataset, their SVM classifier achieved 86% accuracy.
Ismael et al. [32] combined statistical features with neural networks, extracting statistical features from the MR images for classification and using 2D discrete wavelet transforms (DWT) and Gabor filters for feature selection. They fed segmented MR images to their proposed algorithm and obtained an average accuracy of 91.9%.
Another project that sought to categorize multi-grade brain tumors can be found in [33]. A previously trained CNN model was utilized along with segmented images, and three different datasets were used to validate the model. Data augmentation was performed with various techniques to handle class imbalance and improve accuracy, and both the original and augmented datasets were tested on the proposed technique. In comparison to previous works, the presented results are convincing.
Nayoman et al. [34] investigated the use of CNNs and constructed seven different neural networks. One of the lightweight models performed best: without any prior segmentation, this simple model achieved a test accuracy of 84.19%.
Guo et al. [35] propose an Alzheimer's disease classifier. In Alzheimer's disease, abnormal protein builds up in and around brain cells. The authors use graph convolutional neural networks (GCNNs) to classify Alzheimer's disease into two and three categories on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The proposed graph networks achieved 93% in two-class classification, compared to 95% for a ResNet architecture and 69% for an SVM classifier; in three-class classification, the graph CNN achieved 77%, ResNet 65%, and SVM 57%.
Ayadi et al. [36] used two different datasets, Figshare and Radiopaedia: one to classify the brain tumor class and the other to classify the stage of the brain tumor. For classifying the main tumor class, they used a simple, lightweight CNN architecture.
Zhou et al. [37] used only axial slices from the dataset and a simple CNN classifier to classify brain tumors.
Pashaei et al. [38] proposed a method based on extreme learning machines. They first extracted features using a CNN and then used them in a kernel extreme learning machine (KELM) to build a classifier; KELM is known for increasing the robustness of classification.
GAN-based networks for producing synthetic medical images have gained popularity in recent years due to their exceptional performance. A variation of CycleGAN proposed by Liu et al. [39] generates computed tomography (CT) images using a domain control module (DCM) and a Pseudo Cycle Consistent module (PCCM); the DCM adds additional domain information, while the PCCM maintains the consistency of the created images. Shen et al. created mass images using GANs and then filled them with contextual information by incorporating the synthetic lesions into healthy mammograms, asserting that their network can learn the shape, context, and distribution of real-world images [40].
Chenjie et al. proposed a multi-stream CNN architecture for glioma tumor grading/subcategory grading that captures and integrates data from several sensors [41].
Navid et al. [29] proposed a new model for brain tumor classification using a CNN on the Figshare dataset. They extracted features by using the model as the discriminator of a GAN, then added a SoftMax classifier to the last fully connected layer to classify the three tumor types. They used data augmentation to improve the results and achieved 93.01% accuracy on a random split.
Other researchers have applied GANs to a variety of problems in medicine: Shin et al. [42] utilized a two-step GAN to generate MR images of brain regions with and without tumors [43], Ahmad used TED-GAN [44] to classify skin cancer images, and Nie [45] generated pelvic CT images.
GANs have gained the attention of researchers and are now extensively used across a variety of medical imaging fields. Researchers attempt to improve results by utilizing complex and deep architectures. All these GAN-based studies contribute in various ways, but all of them use random Gaussian noise as the input to the GAN generator. In the generative medical imaging field, manipulating the input noise of GANs remains unexplored.
3. Results and Discussion
In this study, we proposed a combination of two different generative models (VAEs and GANs) to generate artificial brain tumor MR images. Generating medical images with any generative model is time-consuming and more difficult than generating the natural images (e.g., dogs, cats, and handwritten digits) on which GANs are mostly used. Moreover, using synthetic medical images to train a tumor classifier is even more critical and requires strict evaluation before any opinion can be formed about it. The dataset used in this study was relatively small, and we attempted to capitalize on variational autoencoders in conjunction with GANs to handle this limited availability of data. We used the Figshare public dataset of brain tumor MR images [47]; the details of the dataset split are discussed in Section 2.2.1.
Before training the ResNet50 for a reasonable number of epochs, we trained it for 30 epochs under different hyperparameter values, including different optimizers, batch sizes, and dropout rates. Table 3 summarizes the average accuracy for various optimizers under different learning rates. We chose the Adam optimizer to observe the effect of the dropout rate, as it performed comparatively better during the learning rate tests. Table 4 shows the effect of various dropout rates; to check this effect, we fixed the number of epochs, the optimizer, and the learning rate at 30, Adam, and 0.0001, respectively. No generated images or augmentation were used while testing the hyperparameters; only the training set (60% of the dataset) was used. Data augmentation plays a vital role in overcoming class imbalance in the dataset and improving the results. We used plenty of generated images for augmentation; apart from this, we used classic augmentation techniques (rotation and scaling) to observe their effect on the results. We observed an improvement of around 5% in average accuracy when ResNet50 was trained on the training set with classic augmentation, without any generated images in the training set. Table 5 summarizes the results with and without augmentation.
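For concreteness, a minimal sketch of this setup is shown below. ResNet50, the Adam optimizer, and the 0.0001 learning rate follow the text; the rotation and scaling ranges are assumptions, since the exact augmentation parameters are not specified here.

```python
# Sketch of the classic augmentation (rotation and scaling) and the
# classifier setup; rotation/scale ranges are assumed values.
import torch
import torchvision
from torchvision import transforms

classic_augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                 # random rotation
    transforms.RandomAffine(degrees=0, scale=(0.9, 1.1)),  # random scaling
    transforms.ToTensor(),
])

model = torchvision.models.resnet50(num_classes=3)         # three tumor classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 0.0001
```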
In general, most past studies have relied solely on accuracy to compare results with their proposed technique. However, using accuracy alone for comparison can be deceptive because it ignores other performance measures such as sensitivity, specificity, and precision; with imbalanced data, a classifier's accuracy can be better for one class than for the others. The F1-score is a performance measure that combines both sensitivity and precision. This study used various performance metrics, including recall/sensitivity, specificity, precision, F1-score, and average accuracy.
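For reference, with TP, FP, TN, and FN denoting the true positives, false positives, true negatives, and false negatives of a given class, these standard measures are defined as follows:

```latex
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}
           {\mathrm{Precision} + \mathrm{Sensitivity}}
```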
Glioma is the most dangerous type of brain cancer, and neurologists are always interested in its sensitivity, specificity, and precision. ResNet50 trained on the Figshare dataset images only, without any generated images, achieved 82.86% sensitivity (recall) and 84.45% specificity for the glioma class. In contrast, sensitivity and specificity improved to 96.50% and 97.52%, respectively, for the same class when ResNet50 was trained with images generated by the proposed ED-GAN. All hyperparameter values were the same, and the classifier was trained for 500 epochs. The training and validation accuracy of the classifier over 500 epochs is shown in Figure 4. A detailed quantitative comparison of sensitivity, specificity, precision, and F1-score for the various experiments is summarized in Table 5.
Figure 5 shows the confusion matrices of the various experiments. A confusion matrix (CM) is a great way to visualize the behavior of a classifier: in a single glance, one can observe whether the classifier is biased toward a dominant class. Consider Figure 5A, where the vertical and horizontal axes represent the true and predicted labels, respectively. Taking the glioma class (286 test images) and reading the matrix horizontally, the classifier predicted 220 images correctly as glioma and misclassified the remaining 66: 36 as meningioma and 30 as pituitary.
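The per-class measures reported above can be derived directly from such a matrix. The sketch below does this for a 3 × 3 confusion matrix; the glioma row (220, 36, 30) matches the figures quoted above, while the meningioma and pituitary rows are hypothetical placeholders.

```python
# Deriving per-class sensitivity and specificity from a confusion matrix
# (rows = true labels, columns = predicted labels).
import numpy as np

cm = np.array([[220,  36,  30],   # true glioma (286 test images, from Figure 5A)
               [ 10, 150,  15],   # true meningioma (hypothetical row)
               [  5,  10, 160]])  # true pituitary (hypothetical row)

for i, name in enumerate(["glioma", "meningioma", "pituitary"]):
    tp = cm[i, i]
    fn = cm[i].sum() - tp                  # this class missed by the classifier
    fp = cm[:, i].sum() - tp               # other classes predicted as this one
    tn = cm.sum() - tp - fn - fp
    print(f"{name}: sensitivity={tp / (tp + fn):.3f}, "
          f"specificity={tn / (tn + fp):.3f}")
```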
We used other generative models, GAN [25] and DeLiGAN [49], to compare against the performance of the proposed framework. The inception score [48] was used to measure the quality of the generated images. We used the Inception model to calculate the score, though using this particular architecture is not mandatory. The inception score uses the KL divergence, which is a good performance measure for generative models because it measures the difference between two probability distributions instead of considering only image pixels. To compare classification results across these generative models, we used ResNet50 as the classifier, trained on the training set together with images generated by GAN and DeLiGAN. We used generated images as augmentation and did not apply any classic augmentation such as scaling, cropping, or rotation.
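A minimal sketch of the inception score computation is given below, assuming the class probabilities p(y|x) have already been obtained from an Inception-style classifier for the generated images; the Dirichlet-sampled probabilities are dummy inputs.

```python
# Inception score: exp of the mean KL divergence between each image's
# predicted class distribution p(y|x) and the marginal distribution p(y).
import numpy as np

def inception_score(p_yx: np.ndarray, eps: float = 1e-12) -> float:
    """p_yx: (N, C) array of class probabilities for N generated images."""
    p_y = p_yx.mean(axis=0, keepdims=True)                    # marginal p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Example: confident yet diverse predictions yield a high score.
probs = np.random.default_rng(0).dirichlet(alpha=[0.1, 0.1, 0.1], size=500)
print(inception_score(probs))
```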
Table 6 and Table 7 present the comparison of inception scores and classification performance measures for the proposed method against state-of-the-art generative models.
Apart from comparing with other image generation methods, we compared our classification results with several studies published in various journals within the last five years. We selected 11 studies for comparison, all of which used the same public dataset of brain tumor MR images. Of the 11 studies, 9 reported an average accuracy of more than 90%; the average accuracy of the proposed framework is better by around 2–7 percentage points. The comparative classification results and other insightful information are summarized in Table 8.
GAN-based generative models can easily learn outer features, such as the shape of the skull, but generating fine features, such as the tumor inside the skull, is quite challenging. This can be observed in Figure 6B, taken from [29], where a GAN was used to pre-train a brain tumor classifier and achieved an average accuracy of around 95%. Figure 6A shows images generated by the proposed ED-GAN; the quality difference between the generated brain tumor images is clearly visible. Our proposed extension of GAN, ED-GAN, generates better images because it samples the noise from an informative noise vector instead of random Gaussian noise. It is this quality of the generated images that enabled the proposed framework to achieve a better average accuracy of 96.25% on the test set.