3.1.1. Image Translation
Considering more than one imaging modality is recommended to obtain a complete picture of the abnormalities. The main task in image-to-image translation is a reliable mapping from the source image to the synthetic image [
14]. The process involves a careful choice of loss function to encode this mapping into a high-quality translated image [
15]. However, acquiring all modalities is often impossible due to practical limitations, e.g., scanning time, cost, radiation dose, and the patient's age. GANs are extensively used to generate CT and PET images from source brain MRIs.
- A. MRI-to-CT Translation:
In current radiotherapy, both CT and MRI are used to better diagnose brain abnormalities. On the one hand, CT images provide the electron density values required for treatment planning; on the other hand, MRI contributes equally with its superior contrast in soft brain tissue. Compared with the CT-based process, MRI-only treatment planning can reduce the registration misalignment between CT and MRI scans, curtail imaging cost, improve radiotherapy accuracy, and lower the patient's exposure to ionizing radiation; GANs make this possible. A GAN model can be trained on either paired or unpaired images. Unpaired datasets are readily available but hard to work with because the mapping between input and output is unknown. Paired datasets, although harder to acquire, make GAN development easier.
The authors in [
16] have generated synthetic CT images using mutual information (MI) as the loss function to avoid misalignment between MRI and CT images. The study of [
17] modifies [
16]: the conditional GAN (CGAN) model checks the dosimetric accuracy of synthetic CT (SCT) images of patients with brain tumors, with the aim of using them in MRI-only treatment planning for proton therapy. Along with MI, binary cross-entropy is used as the loss function in the discriminator. The study of [
18] compares the similarity between SCT and the original CT, where a CGAN based on the pix2pix architecture produces the SCT. Radiation dose calculation is a significant difficulty in the MR-only workflow, as it is hard to derive electron density information from MRI scans alone. The study [
19] discusses SCT generation and evaluates dosimetry accuracy.
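The mutual-information objective used in these studies can be illustrated with a simple histogram-based estimator. The following NumPy sketch is purely illustrative (the bin count and toy images are our own choices, not taken from the cited papers); it shows why MI rewards a well-aligned MRI-CT pair over an unrelated one:

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information between two images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)    # marginal of image b
    nz = pxy > 0                           # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
mri = rng.random((64, 64))
aligned_ct = 1.0 - mri       # toy "CT": fully dependent on the MRI
noise_ct = rng.random((64, 64))  # unrelated image, near-zero MI
```

An aligned pair shares far more information than a random pair, which is exactly what makes MI robust as a loss when intensity relationships between modalities are nonlinear.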
The MedGAN in [
20] uses a fusion of non-adversarial losses from recent image style transfer techniques to capture the high- and low-frequency details of the desired target modality. MedGAN works at the image level in an end-to-end manner, giving better performance than patch-wise training, which suffers from limited modeling capacity. CasNet, a new generator architecture, uses encoder-decoder pairs to sharpen the resulting images through gradual refinement. The MR-based attenuation correction (MRAC) process is extensively used in PET/MR systems for photon attenuation correction. In the atlas-based MRAC method, a CGAN generates photon attenuation maps from SCT. Skip connections of U-Net and the GAN loss restore edge information in the images [
21]. The dosimetric and image-guided radiation therapy (IGRT) method of SCT generation is discussed in [
22]. A GAN with a ResNet generator and a CNN discriminator creates SCT images from T1-weighted post-gadolinium MRI. In [
23], a spatial attention-guided generative adversarial network (Attention-GAN) minimizes the spatial difference in SCTs and can better handle atypical anatomies and outliers. The framework projects the regions of interest (ROIs), and the transformation network performs the domain change. Here, attention is computed by summing the absolute values of activations in different layers of the discriminator across the channel dimension. CycleGAN is the second most commonly used GAN model in image translation applications. It can work on unpaired data but may introduce inconsistent anatomical features in the generated images. The unsupervised attention-guided GAN (UAGGAN) model can work with both paired and unpaired images and can be used for bidirectional MR-CT image synthesis. First, supervised pre-training fine-tunes the network parameters; then, unsupervised training improves the medical image translation. The combination of the WGAN adversarial loss with a content loss and an L1 loss ensures global consistency in the output image. UAGGAN achieves satisfactory performance by producing attention masks [
24]. In [
25], cycleGAN with dense blocks performs two transformation mappings (MRI to CT and CT to MRI) simultaneously. A multi-scale patch-based GAN performs unpaired domain translation and generates high-resolution 3D medical images. The approach has a low memory requirement: a low-resolution version is generated first and later converted into a high-resolution version using constant-size patches [
26]. A three-dimensional cycleGAN uses inverse transformation and inverse supervision to learn the mapping between MRI and CT image pairs for proton treatment planning of base-of-skull (BoS) tumors. The dense-block-based generator explores image patches for textural and structural features [
27]. Attenuation correction (AC), needed for PET imaging, is accomplished by a 3D cycleGAN in which a 3D U-Net generator produces continuous AC maps from Dixon MR images without MR-CT image registration. The downsampling and upsampling layers in the 3D U-Net reduce memory requirements [
28]. StarGAN performs image translation among more than one pair of classes. In [
29], a counterfactual activation generator (CAG) implements image transformation for seven classes. This setting extracts task-sensitive features from brain activations by comparing real and synthetic images against the ground truth. In [
30], high-dimensional input maps are translated to high-dimensional output maps with the help of Pix2Pix-cGANs to colorize the tumor region in intracranial tumor MRI images.
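The cycle-consistency constraint that the CycleGAN-style models above rely on can be sketched in a few lines. In this toy NumPy version the two "generators" are simple invertible pixel transforms standing in for learned MRI-to-CT and CT-to-MRI networks (everything here is illustrative, not from any cited paper):

```python
import numpy as np

def l1(x, y):
    """Mean absolute difference between two images."""
    return float(np.mean(np.abs(x - y)))

# Stand-ins for the two generators: G maps "MRI" -> "CT", F maps back.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0

def cycle_consistency_loss(mri, ct):
    """L_cyc = |F(G(mri)) - mri|_1 + |G(F(ct)) - ct|_1."""
    return l1(F(G(mri)), mri) + l1(G(F(ct)), ct)

rng = np.random.default_rng(1)
mri, ct = rng.random((8, 8)), rng.random((8, 8))
loss = cycle_consistency_loss(mri, ct)  # near zero: F inverts G exactly
```

Because F exactly inverts G here, the cycle loss is essentially zero; in training, driving this term down is what discourages a generator from inventing anatomy that the reverse mapping cannot undo.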
- B. MRI-to-PET Translation:
MRI scans also find applications in synthetic PET scan generation, similar to SCT generation. A good-quality PET image requires a full-dose tracer, but the potential health hazards posed by radioactive exposure raise concerns about PET imaging. A 3D auto-context-based locality-adaptive multimodality GAN (LAGAN) generates high-quality FDG PET images without applying the same kernel to every input modality. Its locality-adaptive fusion network produces a fused image by learning different convolutional kernels at different image locations. These fused images are then used for generator training, keeping the number of parameters low as the number of modalities grows. In contrast to multimodality approaches where convolution is performed globally, the method in [
31] concentrates on locality-adaptive convolution. In PET imaging, if the tracer dose is lowered out of concern for its negative effect on the patient, noise and artifacts compromise the quality of the resulting image. Two networks, a convolutional auto-encoder and a GAN, generate adaptive PET templates with the help of a C-PIB PET scan and a T1-weighted MRI sequence. These synthetic PET images are used to spatially normalize amyloid PET scans during Alzheimer's disease estimation [
32]. In multiple sclerosis, demyelination occurs in the brain's white matter and the spinal cord. The Sketcher-Refiner GAN predicts the PET-derived myelin content map from multimodal MRI by first sketching the anatomical and physiological details and then refining the myelin content map. The model extends CGAN with a 3D U-Net generator taking four MRI modalities as inputs [
33]. A task-induced pyramid and attention GAN (TPA-GAN) integrates pyramid convolution and an attention module to create the absent PET image from the corresponding MR. Three sub-networks perform the whole task: a pyramid-and-attention generator, a standard discriminator, and a task-induced discriminator [
34].
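The locality-adaptive fusion idea behind LAGAN [31] — learning different fusion weights at different image locations rather than one global kernel — can be sketched as a per-pixel convex combination of modalities. The weight maps below are hypothetical stand-ins for learned kernels:

```python
import numpy as np

def locality_adaptive_fusion(modalities, weight_maps):
    """Fuse modalities with location-dependent weights: each pixel gets
    its own convex combination instead of one global mixing kernel."""
    w = np.stack(weight_maps)
    w = w / w.sum(axis=0, keepdims=True)   # normalize weights per location
    return (np.stack(modalities) * w).sum(axis=0)

rng = np.random.default_rng(2)
t1, dwi = rng.random((4, 4)), rng.random((4, 4))
# Hypothetical learned weights: favor T1 on the left half, DWI on the right.
w1 = np.ones((4, 4)); w1[:, 2:] = 0.1
w2 = np.full((4, 4), 0.1); w2[:, 2:] = 1.0
fused = locality_adaptive_fusion([t1, dwi], [w1, w2])
```

Since each output pixel is a convex combination, the fused value always lies between the two modality values at that location, while the mixing ratio is free to vary spatially.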
Bidirectional mapping GAN (BMGAN), a 3D end-to-end network, in [
35] makes use of image contexts and latent vectors to generate PET from brain MRI. The model employs a generator, a discriminator, and an encoder to fuse the semantic features of PET scans with the high-dimensional latent space. The forward mapping step during model training encodes the PET images into the latent space. The backward mapping step enables the generator to produce PET images from the MRI and sampled latent vectors. Finally, the encoder reconstructs the input latent vector from the synthetic PET scan. A hybrid GAN (HGAN) employs a hybrid loss function to produce absent PET images guided by the corresponding MRI scans. A spatially-constrained Fisher representation (SCFR) network is used to derive statistical details from multimodal neuroimaging data [
36]. In [
37], cycleGAN generates synthetic FDG-PET from T1-weighted MRI in two ways: one from three adjacent transverse slices and the other from 3D mini-patches. Two CNNs, ScaleNet and HighRes3DNet, and one CGAN were trained to map structural MR to nonspecific (NS) PET images [
38].
Table 5 presents the summary of image translation.
3.1.2. Image Registration
Image registration aligns images so that they can be fused to extract more information. In some cases, moving images are transformed to match fixed reference images. Image registration serves purposes such as motion correction, pose estimation, spatial normalization, atlas-based segmentation, and aligning images from multiple subjects [
39].
A deep pose estimation network enables fast slice-to-volume and volume-to-volume registration of brain anatomy. Transformation variables are adjusted by multi-scale registrations that initialize the iterative optimization process. A CGAN learns region-based distortions for multimodal registration from T1- to T2-weighted images. A regression-type CNN predicts the angle-axis representation of 3D motion. CycleGAN can be used when paired images are unavailable [
40]. The cycleGAN-based model performs symmetric image registration of unimodal/multimodal images, where an inverse-consistency constraint enforces bi-directional spatial transformations between images. SymReg-GAN extends cycleGAN and performs semi-supervised learning, exploiting both labeled and unlabeled image pairs. The spatial transformer performs a differentiable operation and warps the moving image using the estimated transformation [
41].
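The warping step performed by such spatial transformers can be sketched with a dense displacement field. The minimal version below uses nearest-neighbour sampling with border clamping for clarity (a real spatial transformer layer uses differentiable bilinear sampling; this is an illustrative stand-in):

```python
import numpy as np

def warp_nearest(moving, disp):
    """Warp a 2-D moving image by a dense displacement field (dy, dx),
    nearest-neighbour sampling with border clamping."""
    h, w = moving.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(yy + disp[0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xx + disp[1]).astype(int), 0, w - 1)
    return moving[src_y, src_x]

img = np.arange(16.0).reshape(4, 4)
# Constant shift: every output pixel samples the pixel one column right.
disp = (np.zeros((4, 4)), np.ones((4, 4)))
warped = warp_nearest(img, disp)
```

With a constant unit displacement the output equals the input shifted by one column, with the border column repeated; in registration, the network estimates `disp` so that the warped moving image matches the fixed image.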
The geometric transformation estimates the correspondence of physically matching points within a pair of images' fields-of-view (FOVs). This transformation can lead to an asymmetric, biased mapping in which the "fixed" image is unaffected while the "moving" image undergoes interpolation that also smooths it. Most current registration methods focus on asymmetric, directional image registration. Multi-atlas-based brain image parcellation (MAP) is a technique in which numerous brain atlases are registered to a new reference map. Manually labeled brain regions are propagated and combined into the final parcellation result. The generator of the multi-atlas-guided fully convolutional network (FCN) with multi-level feature skip connections (MA-FCN-SC) produces the parcellation of the input brain image [
42]. In the multi-atlas-guided deep learning parcellation (DLP) technique, attributes of the most suitable atlas guide the parcellation of the target brain map. A GAN whose FCN generator contains squeeze-and-excitation (SE) blocks (FCN-SE-GAN) performs better than the MAP technique since it avoids nonlinear registration. The improvement stems from three factors: brain atlases, automatic brain atlas selection, and the GAN [
43]. An unsupervised adversarial similarity network performs registration without ground-truth deformation fields or task-specific similarity metrics for network training. The network applies to both mono-modal and multimodal 3D image registration. A spatial transformation layer connects the registration and discrimination networks [
44]. Image registration is crucial for brain atlas building, but it also helps monitor disease progression across multiple patient visits. Deep networks can be trained on a specific dataset for applications where sufficient ground-truth data is unavailable. However, a network trained to register a pair of chest X-ray images cannot produce the same quality output on a pair of brain MRI scans; in such cases, the network needs to be retrained. A GAN-based method achieves joint registration and segmentation of an image pair via transfer learning, so other image pairs can be handled without retraining. Two convolutional auto-encoders are used for encoding and decoding [
45].
Table 6 presents the summary of GAN-synthesized images used for registration.
3.1.3. Image Super-Resolution
The super-resolution (SR) technique converts low-resolution images to high-resolution images without changing scanner settings or imaging sequences. These SR methods achieve a higher SNR and reduced edge blurriness compared with conventional interpolation methods [
46]. In the super-resolution process, several low-resolution images taken from slightly different viewpoints are used to predict the high-resolution version. Sufficient prior information allows the prediction to recover detail beyond what the actual measurements provide [
47].
Single-image super-resolution (SISR) is vital for medical images as it aids disease diagnosis. A lesion-focused SR (LFSR) method produces perceptually more realistic SR images. In the LFSR method, a multi-scan GAN (MSGAN) produces multi-scale SR and higher-dimensional images from the lower-dimensional version [
48]. Training a GAN becomes complicated when the inputs are high-resolution, high-dimensional images; therefore, the learning is divided among several GANs. First, a shape network in an unconditional super-resolution GAN (SR-GAN) captures the three-dimensional variation in adult brain shape. Then, a texture network in a conditional pix2pix GAN refines image slices with realistic local contrast patterns. Finally, the shape network is trained with the WGAN with Gradient Penalty (WGAN-GP) method; it is an unconditional generator that captures the brain's three-dimensional spatial distortions [
49].
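The WGAN-GP objective mentioned above penalizes the critic's gradient norm at points interpolated between real and fake samples. A toy numerical sketch (with a linear stand-in critic and finite differences in place of autograd; both are illustrative simplifications):

```python
import numpy as np

def critic(x):
    """Toy 1-D critic; a real one would be a neural network."""
    return 3.0 * x          # gradient magnitude is 3 everywhere

def gradient_penalty(real, fake, eps=1e-5):
    """WGAN-GP term: mean of (|grad critic| - 1)^2 evaluated at random
    interpolations between real and fake samples."""
    t = np.random.default_rng(3).random(real.shape)
    x_hat = t * real + (1 - t) * fake          # interpolated samples
    grad = (critic(x_hat + eps) - critic(x_hat - eps)) / (2 * eps)
    return float(np.mean((np.abs(grad) - 1.0) ** 2))

real = np.ones(10)
fake = np.zeros(10)
gp = gradient_penalty(real, fake)   # (3 - 1)^2 = 4 for this critic
```

The penalty is minimized when the critic is 1-Lipschitz along the real-fake line, which is what stabilizes WGAN-GP training relative to weight clipping.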
In [
50], the authors have used the progressive upscaling method to generate true colors. The multi-path architecture of the SRGAN model extracts shallow features at multiple scales, with filter sizes of three, five, and seven instead of a single scale. The upscaled features are mapped back to a high-resolution image through a reconstruction convolutional layer. Enhanced SRGAN (ESRGAN) implements super-resolution 2D MRI slice creation, where slices from three different orientations are selected for 2D super-resolution and later reconstructed into three-dimensional form. The first half of the three-dimensional matrices is reconstructed from high-resolution slices with good texture features. Then, the three-dimensional slices are repaired through interpolation to obtain new brain MRI data. The VGG16 [
51] is employed before activation to restore the features, solve over-brightness in SRGAN, and improve performance [
52]. The work of [
53] is also based on ESRGAN, where two neural networks complete the super-resolution task. The first network, receptive field block ESRGAN (RFB-ESRGAN), selects half the slices for super-resolution reconstruction and MRI rebuilding and preserves high-frequency information. The second network, the noise-based network (NESRGAN), completes the second super-resolution reconstruction task with noise and interpolated sampling, repairing the reconstructed MRI's missing values. The linear interpolation technique is involved in feature extraction and up-sampling. Neonatal brain MRI scans have low, anisotropic resolution. To increase the resolution, medical image SR using GAN (MedSRGAN) uses a residual whole-map attention network (RWMAN) first to interpolate and then to segment [
54].
Existing super-resolution methods are scale-specific and do not generalize across magnification factors. The medical image arbitrary-scale super-resolution (MIASSR) method, coupled with a GAN, performs super-resolution for modalities such as cardiac MR scans and chest CTs via transfer learning [
55]. Similarly, in [
56], simultaneous super-resolution and segmentation are performed for 3D neonatal brain MRI on a simulated low-resolution version. The learned model then upgrades and segments real clinical low-resolution images. In 2D MR acquisition, the pulse sequence determines the slice thickness. The exact characteristics of signal excitation are not explicitly known; this lack of information about slice selection profiles (SSPs) makes it hard to create sufficient training data. The problem can be solved by predicting a relative SSP from the difference between in-plane and through-plane image patches. Thicker slices and a larger slice distance are used to decrease scan time and achieve a high signal-to-noise ratio, resulting in lower through-plane resolution than in-plane resolution. The GAN-based method focuses on improving the resolution of the through-plane slices, where the training data are in-plane slices degraded to match the through-plane resolution [
57]. A high signal-to-noise ratio in an MRI scan can assist in correctly detecting Alzheimer’s disease. Utilizing the GAN-based SR technique, image quality equivalent to a 3-T scanner can be achieved without altering scanner parameters, even if the scans are obtained through 1.5-T scanners. The generator creates a transformation mask, and the discriminator differentiates the synthetic 3-T image from the original 3-T image [
58].
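The training-pair construction described above for through-plane SR — degrading high-resolution in-plane slices so they match the through-plane resolution — can be sketched with a simple boxcar slice profile. The boxcar profile and the downsampling factor are illustrative assumptions, not taken from [57]:

```python
import numpy as np

def degrade_through_plane(slice_2d, factor=2):
    """Simulate thick slices: average `factor` adjacent rows (a boxcar
    slice profile) to build LR/HR training pairs from in-plane data."""
    h, w = slice_2d.shape
    h_lr = h // factor
    return slice_2d[: h_lr * factor].reshape(h_lr, factor, w).mean(axis=1)

# 8x8 test image whose row value equals its row index.
hr = np.tile(np.arange(8.0), (8, 1)).T
lr = degrade_through_plane(hr, factor=2)   # 4x8 thick-slice version
```

Each LR/HR pair produced this way gives the network supervised examples of the resolution loss it must invert in the through-plane direction.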
In [
59], fine perceptive generative adversarial networks (FPGANs) adopt a divide-and-conquer scheme to extract the low-frequency and high-frequency features of MR images separately and in parallel. The model first decomposes an MR image into a low-frequency global approximation and high-frequency anatomical texture subbands in the wavelet domain. A subband GAN then super-resolves each subband image simultaneously, resulting in finer recovery of anatomical structure. Study [
60] uses an end-to-end GAN architecture to produce high-resolution 3D images. Training is performed hierarchically, producing a low-resolution scan and a randomly selected part of the high-resolution scan simultaneously. This provides two benefits: first, the memory requirement for training on high-resolution images is divided into small parts; second, high-resolution volumes are converted to a single low-resolution image while keeping anatomical consistency intact. High-spatial-resolution images are produced by direct Fourier encoding from three short-duration scans [
61].
Table 7 presents the summary of GAN-Synthesized images used for super-resolution.
3.1.4. Contrast Enhancement
In MR imaging, different sequences (or modalities) can be acquired that provide valuable and distinct knowledge about brain disease, for example, T1-weighted, T2-weighted, proton density imaging, diffusion-weighted imaging, diffusion tensor imaging, and functional MRI (fMRI) [
62,
63,
64]. A single imaging process can highlight only one of them. Acquiring multiple scans with long scan times to capture all contrasts increases cost and patient discomfort. An enhancement process that generates different contrasts from the same MRI sequence helps overcome data heterogeneity [
65]. The contrast enhancement methods can be divided into three categories, as shown in
Figure 5.
- A. Modality Translation:
In MRI acquisition, differing imaging protocols result in different intensity distributions for the same imaged object. Recent data-driven techniques acquire MR images from multiple centers and devices with varying parameters, creating a need for universal, uniform datasets. All studies discussed in this section generate one or more MRI modalities from one or more available modalities. The redundant information of the multi-echo saturation recovery sequence with different echo times (TE) and inversion times (TI) generates multiple contrasts, generally used as a reference to find a mutual-correction effect. In [
66], the multi-task deep learning model (MTDL) synthesizes six 2D multi-contrast sequences simultaneously: axial T1-weighted, T2-weighted, T1- and T2-FLAIR, short tau inversion recovery (STIR), and proton density (PD). The registration-based synthesis approach relies on a single atlas, which causes loss of structural information in the synthetic multi-contrast images due to a nonlinear intensity transformation. In contrast, the intensity-based method does not depend on fixed geometric relationships among anatomies and gives better synthesis results. PGAN is used for generation when the multi-contrast images are spatially registered, and CGAN when they are unregistered [
67]. MultiModal GAN (MM-GAN), a variant of Pix2Pix architecture, synthesizes the absent modality by merging the details from all available modalities [
68]. MI-GAN, an improvement over MM-GAN, is a multi-input generative model that creates the missing modalities. Commonly acquired modalities are T1-weighted (T1), T1 contrast-enhanced (T1c), T2-weighted (T2), and T2 fluid-attenuated inversion recovery (FLAIR); the absent one is created from the other three available modalities [
69]. A limitation of earlier cross-modality generation methods is that they do not extend to multiple modalities: a total of M(M − 1) different generators would need to be trained to learn all mappings among M modalities, and each translator can only use two of the M modalities at a time. The modality-agnostic encoder of a cycle-constrained CGAN extracts modality-invariant anatomical features and generates the desired modality with a conditioned decoder. A single conditional autoencoder and discriminator can complete all pair-wise translations. Once the feedforward pass conditioned on any modality label is over, the same autoencoder is reused, conditioned on the modality label of the original input, for the cycle reconstruction [
70]. The usual cross-modality image translation methods involving GAN models are based on paired data. Modular cycleGAN (MCGAN) performs unsupervised multimodal MRI translation from a single modality and retains the lesion information. The architecture includes encoders, decoders, and discriminators. MCGAN uses the combination of deconvolution and resize upsampling methods that avoid the checkerboard artifacts in the generated images [
71]. Edges in a medical image contain principal anatomical details, such as tissue, organ, and lesion boundaries. However, images produced by a standard GAN have blurred boundaries. A flexible, gradient-prior-integrated, encoder-decoder-based adversarial learning network (FGEAN) is an end-to-end, multiple-input multiple-output framework that uses a gradient prior to retain high-frequency details such as tissue composition [
72]. Edge-aware GAN (Ea-GAN) is a 3D method that extracts voxel-wise intensity and image structure information to overcome slice discontinuity and blurriness problems. The Sobel operator is used to extract the edge details. The Sobel filter assigns higher weights to its nearer neighbors and lower weights to the farther neighbors, which is impossible with direct image gradient application [
73].
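The Sobel-based edge information used by the edge-aware objective above can be sketched as an explicit edge map plus an L1 edge loss. The kernel values are the standard Sobel weights; the plain "valid" convolution is written out for clarity rather than speed:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def conv2_valid(img, k):
    """Plain 'valid' 2-D correlation with a 3x3 kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * k).sum()
    return out

def sobel_edges(img):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = conv2_valid(img, SOBEL_X)
    gy = conv2_valid(img, SOBEL_X.T)
    return np.hypot(gx, gy)

def edge_loss(synthetic, target):
    """L1 distance between Sobel edge maps, as in edge-aware objectives."""
    return float(np.mean(np.abs(sobel_edges(synthetic) - sobel_edges(target))))

img = np.zeros((8, 8)); img[:, 4:] = 1.0   # vertical step edge
```

Adding such a term to the generator loss directly penalizes blurred or displaced boundaries that a pixel-wise intensity loss alone tends to tolerate.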
CycleGAN-based unified forward generative adversarial network transforms any T2-FLAIR images in different groups into a single reference one [
74]. In [
75], WGAN generates multi-sequence brain MR images with the advantage of stable learning. The Earth Mover (EM) distance (a.k.a. the Wasserstein-1 metric) used by WGAN reduces mode collapse. In [
76], sample-adaptive GAN imitates each sample by learning its correlation with its neighboring training samples and applying the target-modality features as auxiliary information for synthesis. The self-attention GAN (SAGAN) of [
77] attends to various organ anatomical structures via attention maps which showcase spatial semantic details with the help of an attention module. In [
78], the GAN framework learns a shared content encoding and domain-specific style encodings across multiple domains. The CGAN in the image modality translation (IMT) network employs nonlinear atlas-based registration to register a moving image to the fixed image. A PatchGAN classifier with no constraints on patch size acts as the discriminator, producing sharp results with fewer parameters and a low running time [
15]. In [
79], a GAN provides a solution for the assessment of small vessel disease (SVD) by estimating the progression of white matter hyperintensities (WMH) over one year. The Disease Evolution Predictor (DEP) model detects WMH in T2-weighted and T2-FLAIR MRIs. DEP-GAN (Disease Evolution Predictor GAN), an extension of the visual attribution GAN (VA-GAN), uses an irregularity map (IM) or probability map (PM) for both input and output modalities to represent WMH. The generated image, called the Disease Evolution Map (DEM), classifies each brain tissue voxel as progressing, regressing, or stable WMH.
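The DEM-based voxel grouping can be sketched directly: subtract the baseline probability map from the follow-up map and threshold the difference. The tolerance value below is an illustrative assumption, not a parameter from DEP-GAN:

```python
import numpy as np

def classify_dem(baseline_pm, followup_pm, tol=0.05):
    """Label each voxel from a Disease Evolution Map (follow-up minus
    baseline probability map): +1 progressing, -1 regressing, 0 stable."""
    dem = followup_pm - baseline_pm
    labels = np.zeros(dem.shape, dtype=int)
    labels[dem > tol] = 1       # WMH probability increased: progressing
    labels[dem < -tol] = -1     # WMH probability decreased: regressing
    return labels

baseline = np.array([0.1, 0.5, 0.9])
followup = np.array([0.6, 0.5, 0.3])
labels = classify_dem(baseline, followup)   # [1, 0, -1]
```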
- B. Quality Improvement:
High-resolution images are generated from down-sampled data during the MRI analysis to save the scan time. High-resolution images in one contrast improve the quality of down-sampled images in another contrast. The anatomical details of different contrast images refine the reconstruction quality of the image. This increase in image contrast is used for the classification of brain tumors [
80]. The intensity distributions of pixels in brain MR images overlap in regions of interest (ROIs), causing low tissue contrast and hindering accurate tissue segmentation. A cycleGAN-based model increases the contrast within tissue using an attention mechanism. A multistage architecture first focuses on a single tissue and filters out irrelevant context at every stage to increase the resolution of high tissue contrast (HTC) images [
81]. CycleGAN for unpaired data usually encodes the deformations and noises of various domains during synthetic image generation. The deformation invariant cycleGAN (DiCycleGAN) uses image alignment loss based on normalized mutual information (NMI) to strengthen the alignment between source and target domain data [
82]. Generating high-resolution images of the hippocampus region from low-resolution MRI is arduous. The difficulty-aware GAN (da-GAN) is designed with dual discriminators and attention mechanisms in hippocampus regions for creating multimodality images. These HR images are deployed to improve hippocampal subfield classification accuracy compared with LR images [
83]. In [
84], Sequential GAN, a combination of two GANs, generates bi-modality images from common low-dimensional vectors. Sequential multimodal image production first creates images of one modality from low-dimensional vectors. These synthetic images are mapped to their counterparts in the other modality through image-to-image translation. The synthetic FLAIR images are not as realistic in terms of quality as synthetic T1-weighted and T2-weighted images. In [
85], CGAN and the two parallel FCNs improve the quality of fake FLAIR images by retaining the contrast information of original FLAIR images. In [
86], the proposed method learns global contrast from the label images and embeds this information in the generated images. A two-way GAN coupled with global features in a U-Net bypasses the need for paired ground truth. Multimodal images with better perceptual quality improve the learning capability of the model.
- C. Single Network Generation:
Unified GAN, the improved version of starGAN [
87], generates multiple contrasts of MR images from a single modality. StarGAN can perform image translations among multiple domains with one generator and one discriminator. The single-input multiple-output (SIMO) model is trained on four different modalities. The network learns details from the multimodal MR images and the corresponding modality labels. The generator takes an image of one modality and produces a target-modality image, then performs the second task of recreating the original-modality image from the synthesized one [
88]. Available methods of multimodal image generation target only missing-image production between two modalities. CycleGAN and pix2pixGAN can only translate from one modality to another; the former is used for unpaired images and the latter for paired images. Multimodality GAN (MGAN) simultaneously synthesizes three high-quality MR modalities (FLAIR, T1, and T1ce) from one MR modality (T2). Complementary information provided by these modalities boosts tumor segmentation accuracy. The architecture extends starGAN to paired multimodality MR images by adding modality labels to pix2pix. StarGAN brings domain labels to cycleGAN, empowering a single network to translate an input image to any desired target domain with unpaired multidomain training images. Thus, a single network translates the single modality T2 to any desired target modality [
89].
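The single-network, label-conditioned translation that starGAN-style models rely on can be sketched by tiling a one-hot target-modality label into extra input channels. The modality count and target index below are illustrative choices:

```python
import numpy as np

def condition_on_target(image, target_idx, n_domains):
    """StarGAN-style conditioning: tile a one-hot target-modality label
    into extra channels and stack it with the input image."""
    h, w = image.shape
    onehot = np.zeros((n_domains, h, w))
    onehot[target_idx] = 1.0                       # broadcast label spatially
    return np.concatenate([image[None], onehot], axis=0)

t2 = np.random.default_rng(4).random((8, 8))
# Ask the (single) generator for target domain index 2 out of 4 modalities.
x = condition_on_target(t2, target_idx=2, n_domains=4)
```

Because the target label travels with the input, one generator serves every modality pair; changing `target_idx` is all that is needed to request a different output contrast.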
3.1.6. Segmentation
Brain tissue segmentation in an MRI scan provides vital biomarkers, such as quantification of tissue atrophy, structural changes, and localization of abnormalities, that are crucial in disease diagnosis. DL-based segmentation methods are finding success in automatic segmentation. Segmentation methods that use GAN-synthesized MR images for atrophy detection can be grouped into three categories, as shown in
Figure 6 [
95].
- A. Brain Tumor Segmentation:
Two standard techniques in brain tumor segmentation are patch-based and end-to-end methods. A multi-angle GAN-based framework fuses the synthetic images with probability maps. The PatchGAN generator focuses on local image patches, randomly selecting many fixed-size patches from an image and normalizing all responses, which improves the resulting image. The multichannel structure in the discriminator averages the responses to provide the output [
96]. A 3D GAN performs brain tumor segmentation by combining label correction and sample reweighting, where the dual inference network works as a revised label mask generator [
97]. Current glioma growth prediction relies on mathematical models based on complicated formulations of partial differential equations with few parameters, which capture insufficient patterns and other characteristics of gliomas. In contrast, GANs have the upper hand over such models because they need not model the probability density function directly to generate data, and they can resist overfitting through structured training. A 3D GAN stacks two GANs with conditional initialization of segmented feature maps for glioma growth prediction [
98]. Tumor growth prediction needs single- or multimodal medical images of the same patient at multiple time points. Again, a stacked 3D GAN, GP-GAN, is used for glioma growth prediction [
99]. A deep convolutional GAN (DCGAN) first performs data augmentation by generating synthetic images to create a large dataset. Image noise is also removed with an adaptive median filter so that the resulting images have superior features. After this preprocessing step, Faster R-CNN uses the synthetic data for training, identifying, and locating tumors. The classification result places the tumor under one of three primary types: meningioma, glioma, or pituitary [
100]. Manual delineation of lesions such as gliomas, ischemic lesions, and multiple sclerosis lesions from MR sequences is tedious. Discriminative machine learning techniques such as random forests and support vector machines, and DL techniques such as CNNs and autoencoders, detect and segment lesions from MR scans. However, generative methods such as GANs can also employ convolution operators to learn the distribution parameters [
101].
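The PatchGAN-style idea described above, judging many local patches and averaging their responses, can be sketched in a few lines. This is an illustrative simplification, not the cited architecture; `score_fn` is a hypothetical stand-in for the learned patch discriminator:

```python
import numpy as np

def patch_scores(image, patch_size=16, n_patches=32, score_fn=None, rng=None):
    """Randomly sample fixed-size patches and average their scores,
    mimicking a PatchGAN-style discriminator that judges local patches
    instead of the whole image."""
    rng = np.random.default_rng(0) if rng is None else rng
    if score_fn is None:
        score_fn = lambda p: float(p.mean())  # toy stand-in for D(patch)
    h, w = image.shape
    scores = []
    for _ in range(n_patches):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        scores.append(score_fn(image[y:y + patch_size, x:x + patch_size]))
    return float(np.mean(scores))
```

Because each patch is judged independently, the effective receptive field stays small, which is what lets patch-based discriminators focus on local texture.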
The class-conditional densities of lesions overlap because the pixel values of ROIs are distributed over the entire intensity range in MR scans. The existence of four major overlapping ROIs (non-enhancing tumor, enhancing, normal, and edema) of intensity distribution poses a challenge in the segmentation process. Enhancement and segmentation GAN (Enh-Seg-GAN) refines lesion contrast by including the classifier loss in model training, which estimates the central pixel labels of the sliding input patches. The CGAN generator modifies each pixel in the input image patch. It then forwards this to the Markovian discriminator. The synthetic image is concatenated with other fundamental modalities (FLAIR, T1c, and T2) to improve segmentation [
102]. Feature concatenation-based squeeze and excitation-GAN (FCSE-GAN) appends the feature concatenation block to the generator network to reduce noise from the image and the squeeze and excitation block to the discriminator network to segment the brain tumor [
103].
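The squeeze-and-excitation block that FCSE-GAN attaches to its discriminator follows a standard recipe: global average pooling (squeeze), a two-layer bottleneck with ReLU and sigmoid (excitation), then channel-wise rescaling. A minimal numpy sketch with assumed weight matrices `w1` (reduce) and `w2` (expand):

```python
import numpy as np

def squeeze_excite(feature_maps, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) tensor: pool each channel
    to a scalar, pass through a bottleneck, rescale channels."""
    z = feature_maps.mean(axis=(1, 2))            # squeeze: (C,)
    s = np.maximum(w1 @ z, 0.0)                   # reduce + ReLU: (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))           # expand + sigmoid: (C,)
    return feature_maps * s[:, None, None]        # channel-wise rescale
```

With zero weights the gate is sigmoid(0) = 0.5, so every channel is simply halved; trained weights instead learn which channels to emphasize.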
- B.
Annotation:
The second group describes methods that perform segmentation without manual data labeling and explains why annotation tasks are necessary for DL models. Annotating medical images is a tedious task requiring sound medical expertise, yet annotated datasets are an essential requirement for supervised machine learning. The supervised transfer learning (STL) method for domain adaptation trains the GAN model on a source-domain dataset and then fine-tunes it on a target-domain dataset. The inductive transfer learning (ITL) method extracts annotation labels for the target-domain dataset from the trained source-domain model using cycleGAN-based unsupervised domain adaptation (UDA) [
104]. DCNN-based image segmentation methods are hard to generalize. A synthetic segmentation network (SynSeg-Net) trains a DCNN by unpaired source and target modality images without having manual labels on the target imaging modality. In [
105], cycleGAN performs multi-atlas segmentation with a cycle synthesis subnet and segmentation subnet. GANs are designed to generate properly anonymized synthetic images to safeguard the patients’ privacy information. In [
106], three GANs are trained on time-of-flight (TOF) magnetic resonance angiography (MRA) patches to create image labels for arterial brain vessel segmentation. Image labels created with deep convolutional GAN, Wasserstein GAN with gradient penalty (WGAN-GP), and WGAN-GP with spectral normalization (WGAN-GP-SN) are applied to a second dataset using a transfer learning approach; the results of WGAN-GP and WGAN-GP-SN are superior to those of DCGAN. The structure of triple-GAN, which works on the principle of a three-player cooperative game, is modified to incorporate 3D transposed convolution in the generator. It performs tensor-train decomposition on all classifier and discriminator layers and uses a high-order pooling module to exploit the associations within feature maps. This tensor-train decomposition, high-order pooling, and semi-supervised learning-based GAN (THS-GAN) classifies MR images for AD diagnosis [
107]. In normal conditions, the human brain is relatively symmetric. However, the presence of a mass lesion generates asymmetry in the brain structure because it displaces normal brain tissue. The symmetry-driven GAN (SD-GAN) learns a nonlinear mapping between the left and right brain images via unsupervised manifold learning to detect tumors, even from scans that are not strictly symmetric [
108]. Segmentation tasks on medical images suffer from generalization issues, overfitting, and insufficient annotated datasets. Guided GAN (GGAN) decimates the data points of an input image, which reduces the network size so that it operates on only a few parameters [
109].
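The gradient penalty in WGAN-GP, mentioned above, constrains the critic's input-gradient norm to 1 at random interpolates between real and fake samples. A minimal sketch for a linear critic `D(x) = w @ x`, whose input gradient is simply `w` (a neural critic would obtain the gradient via autodiff, but the interpolation step is identical):

```python
import numpy as np

def gradient_penalty_linear(w, x_real, x_fake, lam=10.0, rng=None):
    """WGAN-GP penalty lam * (||grad D(x_hat)|| - 1)^2 for a linear
    critic, where the gradient at any interpolate x_hat is just w."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1 - eps) * x_fake   # random interpolates
    # For D(x) = w @ x the gradient w.r.t. x_hat is w everywhere,
    # so the penalty is the same at every interpolate.
    grad_norm = np.linalg.norm(w)
    return lam * (grad_norm - 1.0) ** 2
```

The penalty vanishes exactly when the critic is 1-Lipschitz in this linear sense (unit gradient norm), which is the constraint the Wasserstein formulation requires.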
- C.
Multimodal Segmentation:
The shape/appearance model called shape-constraint GAN (SC-GAN) uses a fully convolutional residual network (FC-ResNet) fused with a shape representation model (SRM) for segmentation tasks on multimodal images in H&N cancer diagnosis. A pre-trained 3D convolutional auto-encoder serves as the SRM, acting as a regularizer during training [
110]. Multimodal segmentation should have acceptable results in both source and target domains. However, the domain shifts between multiple modalities make the learning task of divergent image features through a single model challenging. Three-dimensional unified GAN executes the auxiliary translation task by extracting the modality-invariant features and upgrading low-level information representations [
111]. Hippocampal subfield segmentation combines SVM with a 3D CNN and a GAN: a three-dimensional GAN-SVM acts as the generator and a 3D CNN-SVM as the discriminator [
112]. One2One CycleGAN is used in survival estimation, extracting features from multimodal MRI images. A single ResNet-based generator creates the T1 image from the T2 samples and the T2 image from the T1 samples, reducing overfitting and providing augmentation by creating virtual samples [
113]. MRIs are used to locate disease lesions or to understand fMRI-based effective connectivity (EC) within a set of brain regions. Locating the lesions caused by multiple sclerosis (MS) in brain images is a real challenge because these lesions vary considerably in intensity, size, shape, and location. In [
114], GAN uses a single generator with multiple modalities and multiple discriminators, one for each modality, to identify the NxN patch as real or fake.
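The cycle-consistency loss underlying CycleGAN-style translators such as One2One CycleGAN (translate T1 to T2 and back, then penalize the reconstruction error) can be sketched with toy invertible generators; `G` and `F` here are hypothetical linear maps standing in for the learned networks:

```python
import numpy as np

def cycle_consistency_loss(G, F, x_t1, x_t2):
    """L1 cycle loss: T1 -> T2 -> T1 and T2 -> T1 -> T2 should both
    recover their inputs."""
    loss_t1 = np.abs(F(G(x_t1)) - x_t1).mean()
    loss_t2 = np.abs(G(F(x_t2)) - x_t2).mean()
    return loss_t1 + loss_t2

# Toy generators that are exact inverses give zero cycle loss.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0
```

In the unpaired setting this loss is what pins down the mapping: without it, any generator producing plausible target-domain images would satisfy the adversarial loss alone.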
3.1.7. Reconstruction
Although MRI is one of the most sought-after imaging methods for physical and physiological examinations, its long scanning time causes concern for patients [
115]. MRIs are reconstructed due to various reasons cited in
Figure 7.
- A.
MRI Acceleration:
The lengthy scanning process, in which samples are collected line-by-line in k-space (the frequency-domain Fourier image space), is uncomfortable for patients and becomes a source of motion artifacts. The concept of accelerated MRI is crucial to tackling this issue. MRI is reconstructed from highly under-sampled k-space data (retaining as little as 20% of samples), especially in fetal, cardiac, and functional MRI, multimodal acquisitions, and dynamic contrast enhancement. The acquisition time can be lowered by selecting fewer slices, at the cost of reduced spatial resolution. The sweep time can also be lessened by selecting a partial k-space and approximating the absent k-space points. A k-space U-Net and an image-space U-Net reconstruct the whole k-space matrix from under-sampled data [
116]. The compressed sensing (CS) MRI scheme reduces the sweep time by using a small set of samples for image construction. RefineGAN, which adapts a fully residual convolutional auto-encoder to the general GAN framework, is the basis for fast and precise CS-MRI reconstruction; a chained network further enhances the reconstruction quality [
117]. Traditional CS-MRI is affected by slow iterations and noise-induced artifacts during the high acceleration factor. The RSCA-GAN uses spatial and channel-wise attention with long skip connections to improve the quality at each stage, accelerating the reconstruction process and removing the artifacts brought by fast-paced under-sampling [
118]. Parallel imaging integrated with the GAN model (PI-GAN) and transfer learning accelerates MRI imaging with under-sampling in the k-space. The transfer learning removes the artifacts and yields smoother brain edges [
119]. Reconstructing multi-contrast brain MR images from down-sampled data points can save scanning time [
120].
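The under-sampled k-space acquisition described above can be simulated retrospectively with a sampling mask and a zero-filled inverse FFT. The mask design here (random phase-encoding lines plus a fully sampled low-frequency centre) is one common convention, not the exact scheme of any cited work; a GAN-based reconstructor would then de-alias the zero-filled estimate:

```python
import numpy as np

def undersample_kspace(image, keep_fraction=0.25, rng=None):
    """Transform to k-space, keep only a fraction of phase-encoding
    lines (always retaining the low-frequency centre), and reconstruct
    by zero-filled inverse FFT."""
    rng = np.random.default_rng(0) if rng is None else rng
    k = np.fft.fftshift(np.fft.fft2(image))
    h = image.shape[0]
    mask = np.zeros(h, dtype=bool)
    centre = h // 2
    mask[centre - h // 16: centre + h // 16] = True          # low frequencies
    extra = rng.choice(h, size=int(keep_fraction * h), replace=False)
    mask[extra] = True                                       # random lines
    k_under = k * mask[:, None]
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k_under)))
```

With `keep_fraction=1.0` the mask keeps every line and the round trip is exact, which is a convenient sanity check; lower fractions introduce the aliasing artifacts the reconstruction networks are trained to remove.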
- B.
MR Slice Reconstruction:
MR slice reconstruction is performed to examine brain anatomy and to plan surgical maneuvers, as the modality provides high resolution. Thin-section images are 1 mm thick with zero spacing gap, while thick-section images are 4 mm to 6 mm thick with a spacing gap of 0.4 mm to 1 mm; higher thickness leads to lower resolution. A GAN and a CNN are combined to reconstruct thin-section brain scans of newborns from thick-section ones. The first stage of the network is a least-squares GAN (LS-GAN) with a 3D-Y-Net generator; this stage fuses the images of the axial and sagittal planes and maps them onto the thin-section image space. A cascade of 3D-DenseU-Net and a stack of enhanced residual structures removes image artifacts and provides recalibration and structural improvements in the sagittal plane [
121]. Unsupervised medical anomaly detection (MAD)-GAN uses multiple adjacent brain MRI slice reconstructions to locate brain anomalies at different stages of multi-sequence structural MRI [
122]. The edge generator of Edge-guided GAN (EG-GAN) joins the missing edges of low-resolution images and masks produced from missing slices in the through-plane as input. A contrast completion network employs these connected edges to predict the voxel intensities in the missing rows [
123]. Conditional deep convolutional generative adversarial networks (CDCGANs) forecast the advancement of AD by producing synthetic MR images in a temporal series. Atrophy is measured using the fractal dimension of the cortical ribbon (CR) via the box-counting method, using only one coronal slice of a patient's baseline T1 image; a decreasing fractal dimension indicates progressing illness [
124]. The brain multiplex image represents the brain connectivity status extracted from MRI scans, quantifying the association between two brain regions of interest based on function, structure, and morphology. A single network, the adversarial brain multiplex translator (ABMT), performs brain multiplex estimation and classification with the aim of discovering gender-related distinctive connections. Brain multiplexes are constructed from a source intra-layer network, a target intra-layer, and a convolutional interlayer. ABMT is an improved version of GT-GAN [
125], which has pioneered graph or network translation. Contrary to conventional GAN, the generator (translator) of GT-GAN picks up the generic translation mapping from the source network to the target network simultaneously [
126]. A 3D CGAN and a local adaptive fusion method are used for quality FLAIR image synthesis. Each slice is synthesized separately along the axial direction, and the slices are concatenated into a 3D image. This synthesis predicts the coronal and sagittal directions by analyzing complete images or large image patches [
127].
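The box-counting estimate of the cortical-ribbon fractal dimension mentioned above works by counting occupied boxes at several scales and fitting the slope of log(count) against log(1/size). A minimal sketch on a binary mask:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Estimate the fractal dimension of a binary 2D mask by counting
    boxes that contain any foreground at several box sizes, then
    fitting the log-log slope."""
    counts = []
    for s in sizes:
        h, w = mask.shape
        trimmed = mask[:h - h % s, :w - w % s]          # make divisible by s
        blocks = trimmed.reshape(h // s, s, w // s, s).any(axis=(1, 3))
        counts.append(blocks.sum())
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```

A completely filled square yields dimension 2, while a convoluted ribbon-like structure falls between 1 and 2; atrophy flattens the ribbon and lowers the estimate.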
- C.
Enhancement of Scan Efficiency:
CGAN enhances the scan efficiency of under-sampled, multi-contrast acquisitions. The shared high-frequency prior present in the source contrast is used to maintain high-spatial-frequency features; the low-frequency prior in the under-sampled target contrast is used to avert feature leakage or quality loss; and the perceptual prior is used to upgrade the retrieval of high-level attributes. The generator of the reconstructing-synthesizing GAN (RS-GAN) estimates the target-contrast image from either a fully sampled or a partially under-sampled source-contrast image [
128]. The tissue susceptibility in various brain diseases is measured via the quantitative susceptibility mapping (QSM) technique. The inherent issue of dipole inversion can affect the reliability of the susceptibility map. QSM-GAN is a 3D U-Net that solves the dipole inversion problem in QSM reconstruction [
129]. A directed graph represents a brain-effective connectivity network where nodes denote brain regions. EC-RGAN is a recurrent GAN that applies effective connectivity generators to acquire the temporal information from the fMRI time series and refine the quality [
130]. Double inversion recovery (DIR) images are acquired with a higher sensitivity for lesion diagnosis than conventional or fluid-attenuated (FLAIR) T2-weighted scans and are beneficial for detecting cortical plaques in MS. DiamondGAN can increase image details via multi-to-one mapping, where various input modalities (here T1, T2, and FLAIR) are utilized to produce one output modality (here DIR) [
131]. Three-dimensional multi-information GAN uses structural MRI to find cortical atrophy to predict disease progression. First, a 3D GAN model generates 3D MRI images at future-time points; then, a 3D-densenet-based multiclass classification identifies the stages of produced MRI [
132]. Visual scenes can be reconstructed from human brain activity measured with fMRI. Dual-Variational Autoencoder/Generative Adversarial Network (DVAE/GAN) learns the mapping from fMRI signals to their corresponding visual stimuli (images). Cognitive Encoder, Visual Encoder, and GAN transform the high-dimensional and noisy brain signals (fMRI) into low-dimensional latent representation [
133].
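The encoders in DVAE/GAN compress high-dimensional, noisy fMRI signals into a low-dimensional latent representation. In VAE-style models this sampling step uses the reparameterization trick, which keeps the draw differentiable with respect to the encoder outputs; a minimal sketch (the encoder producing `mu` and `log_var` is assumed, not shown):

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so gradients can
    flow through mu and log_var while z remains stochastic."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

As the predicted variance shrinks toward zero, the sample collapses onto the mean, which is why a KL term is needed to keep the latent distribution from degenerating.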
- D.
Bias-free MRI Scan:
MRI scanners inherently produce a bias field, resulting in smooth intensity variations across the scans. Two GANs are trained simultaneously to reconstruct the plain bias field and a bias-free MRI scan [
135]. Ultrahigh-field MRI introduces strong signal inhomogeneity in the scanned images, giving rise to non-uniform power concentration in the tissues. The regional specific absorption rate (SAR) varies spatially and temporally, with possible hot spots in several hard-to-predict positions. A CGAN model can assess the subject-specific local SAR, which is otherwise hard to compute and is typically estimated by offline numerical simulations using generic body models. A CNN learns the relationship between subject-specific complex B1+ maps and the corresponding local SAR [
135].
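Bias-field correction is commonly modeled multiplicatively, observed = true * bias, which becomes additive in the log domain. A sketch assuming the smooth bias field has already been estimated (for example by a GAN as above); this illustrates the model, not any cited network:

```python
import numpy as np

def correct_bias(observed, bias_field):
    """Remove a known multiplicative bias field: working in the log
    domain turns observed = true * bias into a subtraction."""
    log_true = np.log(observed) - np.log(bias_field)
    return np.exp(log_true)
```

The hard part in practice is estimating `bias_field` from the observed scan alone, which is exactly what the two-GAN scheme above learns to do.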
3.1.9. Data Augmentation
The success of DL models depends on large data samples, which is precisely the constraint of brain MRI. Data augmentation is used to enlarge the training dataset and improve synthetic image quality without collecting new samples. Operations such as translating, rotating, flipping, stretching, and shearing the existing images can augment the set; however, these methods lack diversity in the newly generated samples, and training can be biased towards suboptimal results [
143]. Generative modeling maintains similar features to the actual data set while developing the dummy version of the existing images. Deep convolutional GAN generates dummy images using strided convolutions to carry out upsampling in place of max-pooling layers [
144]. A timely examination of the brain’s status is crucial in preventing Parkinson’s disease (PD) and hindering its spread. Automatic diagnosis methods use either single-view or multi-view scans to execute the classification or prediction of PD. A WGAN operates on multi-view samples from the MRI dataset containing the cross-sectional view (AXI) and the longitudinal view (SAG). The prodromal class with fewer AXI/SAG MRI data samples causes the problem of over-fitting or under-fitting in an application. Two ResNet networks are trained jointly on the two-view data to create more samples for the prodromal class in AXI and SAG [
145].
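The classic augmentation operations listed above (flips and rotations) are easy to express directly; as noted, they add limited diversity compared with GAN-generated samples:

```python
import numpy as np

def augment(image):
    """Return flipped and rotated copies of a 2D image, the classic
    label-preserving augmentations for roughly symmetric anatomy."""
    return [
        np.fliplr(image),        # horizontal flip
        np.flipud(image),        # vertical flip
        np.rot90(image, 1),      # 90-degree rotation
        np.rot90(image, 2),      # 180-degree rotation
    ]
```

Each transform is invertible and preserves intensity statistics exactly, which is precisely why the augmented samples cluster near the originals rather than exploring new modes.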
Class imbalance is a significant issue in abnormal tissue identification and classification in medical analysis. In imbalanced data, the predominant class is filled with healthy samples, while a subsidiary class holds the ailing samples. When a model is trained on a dataset with such visible disparity, it generates results biased towards healthy data, giving rise to predictable network outputs and low sensitivities. The class distribution can be balanced by re-sampling the data space, i.e., oversampling the subsidiary class and under-sampling the predominant class, by constructing a new compact dataset in an iterative sampling manner that bypasses unessential details, or by ensemble and hybrid sampling. A pair-wise GAN architecture uses a cross-modality input to increase heterogeneity in the augmented images. GAN-augmented images are utilized in the pre-training phase, and real brain MRIs then complete the advanced training, leading to synthetic MR images translated from one modality to another [
146]. Brain tumors are segregated into meningioma, glioma, and pituitary tumors. The direct resemblance in the three classes results in a complex classification procedure in MRI images. A multi-scale gradient GAN (MSG-GAN) synthesizes MRI images with meningioma disease and uses transfer learning to improve classification performance [
147]. Noise-to-image and image-to-image GANs enhance the data augmentation (DA) effect. Progressive growing of GAN (PGGAN) is a multistage noise-to-image GAN used for high-resolution MR image generation. Refinement methods such as multimodal unsupervised image-to-image translation (MUNIT) or SimGAN rectify the texture and shape of the images produced by PGGAN close to the originals [
148]. A moderate-sized glioma dataset can hinder precise brain tumor categorization using several MRI modalities such as T1-weighted, contrast-enhanced T1-weighted, T2-weighted, and FLAIR. A pair-wise GAN, trained on two input channels unlike the normal GAN with only one, augments the compact dataset with brain images [
149]. Two types of perfusion modalities, dynamic susceptibility contrast (DSC) and dynamic contrast-enhanced (DCE), are used to generate realistic relative cerebral blood volume (RCBV). The CGAN is trained on brain tumor perfusion images to learn DSC and DCE parameters with a single gadolinium-based contrast agent administration [
150]. AGGrGAN in [
151] is a collection of three base GAN models—two variants of deep convolutional GAN (DCGAN) and a WGAN that generate synthetic MRI images of brain tumors. The model uses the style transfer technique, selects distributed features across the multiple latent spaces, and captures the local patterns to enhance the image resemblance.
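Random oversampling of the minority class with replacement, one of the re-sampling strategies discussed above, can be sketched as:

```python
import numpy as np

def oversample_minority(X, y, rng=None):
    """Balance a two-class dataset by re-sampling the minority class
    with replacement until both classes have equal counts."""
    rng = np.random.default_rng(0) if rng is None else rng
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=deficit, replace=True)
    return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])
```

Because the duplicated samples carry no new information, GAN-based augmentation is attractive here: it balances the classes with novel, plausible samples instead of copies.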
In stroke, brain cells start dying due to insufficient blood supply to the brain (cerebral ischemia) or internal bleeding (intracranial hemorrhage). A CGAN generator is trained on specially altered lesion masks to create synthetic brain images that enlarge the training dataset. The CNN segmentation network includes depth-wise-convolution-based X-blocks and a feature similarity module (FSM) [
152]. IsoData (iterative self-organizing data analysis technique) is an unsupervised clustering method that initializes class means uniformly dispersed within the data space and iteratively clusters the remaining points by minimum distance; every iteration computes new means and reclassifies pixels. The WGAN-based process works on the image histogram, generalizing to more than two classes and splitting, merging, and deleting classes depending on the input threshold parameters [
153]. Functional connectivity GAN (FC-GAN) generates the functional brain connectivity (FC) patterns obtained from fMRI data amplifying the efficiency of the neural network classifier. VAE and WGAN-based network contains three parts, the encoder, the generator, and the discriminator [
154].
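The functional connectivity patterns that FC-GAN learns to generate are typically Pearson correlation matrices computed over regional fMRI time series; a minimal sketch of that target representation (rows are regions, columns are time points):

```python
import numpy as np

def functional_connectivity(time_series):
    """Pearson correlation between every pair of regional fMRI time
    series: the FC matrix is symmetric with a unit diagonal."""
    fc = np.corrcoef(time_series)
    np.fill_diagonal(fc, 1.0)   # guard against numerical drift
    return fc
```

A generator that produces such matrices must respect symmetry and the [-1, 1] range, which is part of what makes FC synthesis harder than generic image synthesis.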
The connectome-based sample generation is another approach to data augmentation. An adversarial auto-encoder (AAE) framework produces synthetic structural brain connectivity instances of MS patients even from an unbalanced dataset [
155]. The number of samples in regular fMRI datasets is insufficient for training. Multiple GAN architectures generate new multi-subject fMRI points: cycleGAN, starGAN, and RadialGAN, none of which need label details to determine the relation matrix. CycleGAN is not expandable to multiple domains because of the N(N-1) mappings to be learned for N domains; starGAN is expandable to multiple domains using a single generator for multidomain translation; RadialGAN can successfully extend the target dataset by employing multiple source datasets [
156]. Dual-encoder BiGAN architecture duplicates abnormal samples within a normal distribution. Anomaly detection in BiGAN reduces bad cycle consistency loss due to insufficient sample data information [
157]. The approach in [
158] generates annotated diffusion-weighted images (DWIs) of brains showing an ischemic stroke (IS). Realistic DWIs are generated from axial slices of these 3D segmentation maps with the help of three generative models: Pix2Pix, SPADE, and cycleGAN.