**1. Introduction**

In medical imaging, magnetic resonance imaging (MRI) is commonly used because it can capture the anatomical structure of the human body without exposing subjects to radiation. Scanners that generate a stronger magnetic field can acquire images with higher spatial and contrast resolution than commonly used scanners. These high-resolution MR images are preferred in both clinical and research settings since more information can be obtained from a single scanning session, allowing doctors to diagnose diseases earlier and computers to analyze images more precisely.

3T MRI scanners provide images with high spatial resolution and contrast and are widely used in clinical practice and research studies. Moreover, 7T ultra-high-field scanners are now becoming available for research use, providing ultra-high-resolution images that depict fine anatomical structures in unprecedented detail and with higher contrast. Such ultra-high-resolution MRI is attractive because it has the potential to capture mild disease-related anatomical changes that are difficult to identify with 3T MRI. In contrast, obtaining high-definition images with commonly used scanners requires longer scanning times and places a burden on the patient. In this situation, super-resolution techniques, which translate low-resolution (LR) MR images into high-resolution (HR) MR images, have drawn attention [1].

**Citation:** Ikuta, K.; Iyatomi, H.; Oishi, K.; on behalf of the Alzheimer's Disease Neuroimaging Initiative. Super-Resolution for Brain MR Images from a Significantly Small Amount of Training Data. *CSFM* **2022**, *3*, 7. https://doi.org/10.3390/cmsf2022003007

Academic Editors: Kuan-Chuan Peng and Ziyan Wu

Published: 27 April 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Resolution-enhancing methods for MRI can be categorized into two groups: (1) processing the raw signal from the MRI scanner so that a higher-resolution image can be reconstructed, and (2) translating already reconstructed LR images into HR-like images, so-called super-resolution (SR).

From a practical point of view, we chose a post-processing approach instead of processing the raw signal from the scanner, for the following three reasons: (1) MR images are usually stored as rendered image files, while the raw signal data are discarded immediately after each scan; with a post-processing approach, the extensive archives of legacy MR images can therefore be used. (2) Super-resolution can be performed entirely in post-processing, without modifying the acquisition pipeline. (3) This approach is independent of specific scanner hardware and scan protocols, and can be applied to many MRI contrasts, such as T1-MRI, T2-MRI, diffusion MRI, and functional MRI.

Although deep-learning (DL) super-resolution methods, including recent generative adversarial network (GAN)-based ones, have many desirable features over non-DL techniques, they have not yet been able to synthesize images as if they were taken by a high-field scanner. This is because most of these methods are designed to be trained on pairs of an ordinary-resolution MRI and its shrunken version. Therefore, conventional SR methods can only learn to translate low-resolution images into normal-resolution images and cannot perform normal-to-high translation, which is an essential demand from clinicians and researchers.

What makes the normal-to-high translation difficult is the limited number of high-definition training images. Deep neural networks, especially GANs, require a large number of training samples to achieve the desired performance. Without access to a large number of images taken by high-end scanners, it is virtually impossible to apply existing DL-based algorithms.

This paper proposes a simple yet effective GAN-based super-resolution method. Compared to existing DL-based super-resolution methods, the proposed method requires significantly fewer training MR images (a few dozen) and generates high-quality SR images. The proposed method comprises two techniques: stochastic patch sampling (SPS) and an artifact-suppressing discriminator (ASD). The SPS first partitions input LR MR images into several smaller patches (i.e., cubes). After the partitioning, the ESRGAN-based neural network takes each LR patch as an input and outputs the corresponding upscaled HR patch. Here, the ASD eliminates discontinuities at the joints between patches and generates natural-looking high-resolution images. In our experiments evaluating the performance of our SR method using 7T MR images of 37 patients, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were significantly improved from 16.19 to 26.92 and from 0.766 to 0.944, respectively, compared with the baseline ESRGAN. In addition, the diagnostic performance of an Alzheimer's disease classifier trained on super-resolved images improved from 80.31% to 83.85%.
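For reference, the PSNR figures reported above follow the standard definition over voxel-wise mean squared error. The following is a minimal numpy sketch (the `data_range` parameter and the toy volumes are illustrative assumptions, not values from our experiments):

```python
import numpy as np

def psnr(hr: np.ndarray, sr: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between a reference HR volume and an SR output."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: a random volume versus a slightly noisy copy of itself.
rng = np.random.default_rng(0)
vol = rng.random((16, 16, 16))
noisy = np.clip(vol + rng.normal(0.0, 0.05, vol.shape), 0.0, 1.0)
print(psnr(vol, noisy))
```

SSIM is computed analogously but over local luminance, contrast, and structure statistics; library implementations (e.g., in scikit-image) are typically used in practice.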

### **2. Related Works**

In the last decade, the accuracy of single-image super-resolution for general (non-medical) images has increased significantly along with the advancement of DL-based algorithms [2]. Originating from the super-resolution convolutional neural network (SRCNN) [3], the first successful attempt to utilize convolutional neural networks for super-resolution, many studies have proposed DL-based SR techniques. Very-deep super-resolution (VDSR) [4] extended SRCNN with a deeper network to improve accuracy. Enhanced deep residual networks for single-image super-resolution (EDSR) [5] also introduced a deeper network with residual connections from ResNet [6]. In more recent years, significant quality improvements have been achieved by several generative adversarial network (GAN)-based SR methods [7]. The super-resolution generative adversarial network (SRGAN) [8] achieved a significant improvement in pixel-wise accuracy by adding a discriminator to its ResNet-like SR network. The enhanced super-resolution generative adversarial network (ESRGAN) [9] made further improvements with a DenseNet-like generator [10] and a relativistic discriminator [11].

Along with the advancement of super-resolution methods for general images, studies applying SR to medical images have also been conducted. Pham et al. [12] applied SRCNN to MR images to enhance spatial resolution. GAN-based techniques have also been applied to medical imaging [13]. Sánchez and Vilaplana [14] applied a simplified version of SRGAN to MR images. Yamashita and Markov [15] improved the quality of optical coherence tomography (OCT) images with ESRGAN.

### **3. Proposed Method**

In this paper, we propose a new super-resolution technique for brain MR images that requires a significantly smaller number of training images. To train the GAN-based super-resolution network, SPS randomly selects many small cubic regions from the input images and feeds them into the network. While the SPS enables the network to be trained effectively with only a few images, it also introduces intensity discontinuities around the boundaries of the patches. The ASD suppresses such discontinuities by implicitly inferring the location of its input patches from both the HR image and the generated SR image.

### *3.1. The Network Architecture*

Figure 1 illustrates the schematics of the proposed method. For the network architecture, we used a slightly modified version of ESRGAN. The modifications we applied are as follows: (1) all the layers, such as convolutions, poolings, and pixel-shufflers, are changed to their three-dimensional versions to process volumetric MR images, and (2) the number of residual-in-residual dense blocks (RRDBs) is reduced from 23 to 5 because the expected input size is smaller than in the original ESRGAN.

**Figure 1.** The schematics of the proposed method. Note that all 3D images are drawn in 2D for the sake of visibility.
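To illustrate modification (1), the 2D pixel-shuffler of ESRGAN becomes a voxel-shuffler in 3D: a tensor with `C*r^3` channels is rearranged into `C` channels at `r`-times the spatial size along each axis. A minimal numpy sketch of this rearrangement (function name and shapes are our own illustration, not code from the implementation):

```python
import numpy as np

def voxel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """3-D pixel shuffle: (C*r^3, D, H, W) -> (C, D*r, H*r, W*r)."""
    c_r3, d, h, w = x.shape
    assert c_r3 % r**3 == 0, "channel count must be divisible by r^3"
    c = c_r3 // r**3
    # split channels into (c, r, r, r) sub-voxel factors
    x = x.reshape(c, r, r, r, d, h, w)
    # interleave each r-factor with its spatial axis: (c, d, r, h, r, w, r)
    x = x.transpose(0, 4, 1, 5, 2, 6, 3)
    return x.reshape(c, d * r, h * r, w * r)

# 8 channels, upscale factor 2 -> a single channel at twice the resolution.
up = voxel_shuffle(np.arange(8 * 4 * 4 * 4, dtype=float).reshape(8, 4, 4, 4), 2)
print(up.shape)  # (1, 8, 8, 8)
```

In a deep-learning framework, the same rearrangement is applied to the channel dimension of a 5-D batch tensor; the reduction of RRDBs from 23 to 5 is independent of this operation.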

### *3.2. Stochastic Patch Sampling (SPS)*

Each image is split into a set of smaller three-dimensional cubic patches by randomly choosing coordinates in the image space. If a patch is sampled from the background and contains no brain structure, it is automatically rejected and re-sampled until an appropriate coordinate is drawn. While the total amount of information fed to the network is theoretically identical, using a collection of sampled patches has several benefits over using the whole image at once. Since a sampled patch is much smaller than the whole-brain image, more training samples fit into each mini-batch, enabling more stable optimization. The network also becomes more robust to unregistered images, since random sampling introduces larger spatial variance in the input patches. At the inference phase, patches are sampled evenly from the input image in a grid manner; each patch is then upscaled by the network and combined into a single image.
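The two sampling regimes described above can be sketched as follows. This is a minimal numpy illustration only; the foreground threshold `fg_thresh`, the retry limit, and the non-overlapping grid stride are our assumptions, not parameters from the paper:

```python
import numpy as np

def sample_patch(volume: np.ndarray, size: int, rng,
                 fg_thresh: float = 0.01, max_tries: int = 100) -> np.ndarray:
    """Training-time SPS: random cubic crop, rejecting near-empty background patches."""
    for _ in range(max_tries):
        z, y, x = (rng.integers(0, s - size + 1) for s in volume.shape)
        patch = volume[z:z + size, y:y + size, x:x + size]
        if patch.mean() > fg_thresh:  # patch contains some brain structure
            return patch
    raise RuntimeError("no foreground patch found")

def grid_patches(volume: np.ndarray, size: int):
    """Inference-time sampling: deterministic grid tiling with patch coordinates."""
    coords = [range(0, s - size + 1, size) for s in volume.shape]
    return [((z, y, x), volume[z:z + size, y:y + size, x:x + size])
            for z in coords[0] for y in coords[1] for x in coords[2]]

rng = np.random.default_rng(0)
vol = np.zeros((32, 32, 32))
vol[8:24, 8:24, 8:24] = 1.0          # toy "brain" inside empty background
p = sample_patch(vol, 8, rng)        # random foreground patch for training
tiles = grid_patches(vol, 8)         # 4 x 4 x 4 = 64 tiles for inference
print(p.shape, len(tiles))  # (8, 8, 8) 64
```

At inference, each tile would be upscaled by the generator and written back at its (scaled) grid coordinate to reassemble the full SR volume.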

### *3.3. Artifact Suppressing Discriminator (ASD)*

Since the network processes small patches separately and performs super-resolution on each one individually, there is no mechanism to maintain consistency across the final combined image. This lack of consistency causes discontinuities at the joints between patches, resulting in a visually unpleasant final image. To address this issue, we introduced the ASD, an extension of the common GAN discriminator. The ASD takes two images combined as a two-channel input: one channel is always a "real" (or HR, in the context of super-resolution) image, and the other is either a generated image or another HR image. Accordingly, the discriminator takes (HR+HR) or (HR+SR) inputs during training. Whereas common discriminators take HR and SR images independently, the proposed ASD can extract more discriminative feature representations by learning the correlation/difference between the HR and SR images.
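The construction of the two-channel discriminator inputs can be sketched as follows (a minimal numpy illustration of the pairing only; the function name and toy patches are our own, and the actual discriminator operates on 5-D batch tensors):

```python
import numpy as np

def asd_pairs(hr: np.ndarray, sr: np.ndarray):
    """Build the two-channel inputs seen by the artifact-suppressing discriminator.

    The "real" pair stacks the HR patch with itself; the "fake" pair stacks HR
    with the generated SR patch, so the discriminator can exploit their
    voxel-wise correlation rather than judging each image in isolation.
    """
    real = np.stack([hr, hr], axis=0)  # (2, D, H, W), labeled real
    fake = np.stack([hr, sr], axis=0)  # (2, D, H, W), labeled fake
    return real, fake

hr = np.ones((8, 8, 8))
sr = np.full((8, 8, 8), 0.9)  # toy generator output with a slight intensity offset
real, fake = asd_pairs(hr, sr)
print(real.shape, fake.shape)  # (2, 8, 8, 8) (2, 8, 8, 8)
```

Because the HR reference is always present in the first channel, any residual patch-boundary discontinuity in the SR channel shows up as a localized HR/SR mismatch, giving the discriminator a direct signal to penalize.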
