**4. Experiments**

The purpose of super-resolution is to aid clinicians or computers in analyzing images more precisely by providing more information on smaller structures. To confirm the effectiveness of the proposed techniques, we investigate the impact of the proposed super-resolution network on disease classification performance in addition to the standard image quality evaluation.

### *4.1. Dataset and Preprocessing*

As the high-resolution reference images, we used 37 T1-weighted MR scans from the DS000113 ("Forrest Gump") dataset [16] and 11 images from the DS002702 dataset [17], both published on OpenNeuro (https://openneuro.org/, accessed on 20 September 2020). Both datasets are provided as collections of functional MRI (fMRI) images but also contain the T1-weighted structural MR images we used, which were acquired on high-field 7T scanners. After skull removal and intensity normalization, each HR image was downsampled to 50% of its resolution to produce the corresponding LR image, yielding high- and low-resolution training pairs. In the SPS phase of training, we randomly sampled 2500 patches of 24 × 24 × 24 voxels from each high-resolution image and downsampled them to 50% resolution to obtain the low-resolution patches.
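The patch-sampling step can be sketched as follows. Here `sample_patch_pair` is a hypothetical helper (not from the paper), and 2 × 2 × 2 mean pooling stands in for whatever resampling the actual pipeline uses:

```python
import numpy as np

def sample_patch_pair(hr_volume: np.ndarray, patch: int = 24, rng=None):
    """Randomly crop one patch^3 HR patch and make its 50% LR counterpart.

    The LR patch is produced by 2x2x2 mean pooling, used here as a
    stand-in for the (unspecified) downsampling method of the paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Random corner so the patch fits inside the volume.
    z, y, x = (rng.integers(0, s - patch + 1) for s in hr_volume.shape)
    hr = hr_volume[z:z + patch, y:y + patch, x:x + patch]
    # Average each non-overlapping 2x2x2 block to halve the resolution.
    lr = hr.reshape(patch // 2, 2, patch // 2, 2, patch // 2, 2).mean(axis=(1, 3, 5))
    return hr, lr
```

In practice this would be repeated 2500 times per HR volume to build the training set.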

### *4.2. Training of the Network*

Since the network input and output are three-dimensional volumes, we cannot use the perceptual loss of the original SRGAN and ESRGAN, which relies on a VGG network pre-trained on the two-dimensional ImageNet dataset. To train the network to generate images with higher fidelity, we instead added the mean-squared error (MSE) between the intensity gradients of the output and reference images along all three directions, which captures finer transitions of the intensity.
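The added gradient term can be sketched with finite differences along each axis. `gradient_mse` is an illustrative NumPy version (the exact weighting against the other loss terms is not specified here); in training it would be computed on framework tensors:

```python
import numpy as np

def gradient_mse(sr: np.ndarray, hr: np.ndarray) -> float:
    """MSE between intensity gradients of two 3-D volumes.

    np.gradient returns one finite-difference array per axis, so the
    loss penalizes mismatched intensity transitions in all three
    directions rather than only per-voxel differences.
    """
    loss = 0.0
    for g_sr, g_hr in zip(np.gradient(sr), np.gradient(hr)):
        loss += np.mean((g_sr - g_hr) ** 2)
    return loss / sr.ndim  # average over the three directions
```

Note that a constant intensity offset leaves the gradients, and hence this term, unchanged; it complains only about differing intensity transitions.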

To optimize both networks, we used the Adam optimizer with the same learning rate and *β*1, *β*2 parameters as the original ESRGAN.

### *4.3. Assessing the Image Quality*

First, we measure the two most standard metrics for assessing a super-resolution system: (1) the peak signal-to-noise ratio (PSNR) and (2) the structural similarity (SSIM) between each output image and its corresponding original high-resolution image. In GAN-based SR studies, the Inception Score and the Fréchet Inception Distance (FID) are also often used. However, these scores are computed from low-dimensional representations of two-dimensional images produced by models trained on everyday objects (e.g., ImageNet), and are therefore unsuitable for this evaluation. We also investigate the line profile of the optic thalamus, whose fine structure is difficult to see with a conventional MR scanner.
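PSNR follows directly from the mean-squared error; a minimal sketch for volumes normalized to a known data range is shown below (for SSIM, a library routine such as scikit-image's `skimage.metrics.structural_similarity` would typically be used rather than a hand-rolled version):

```python
import numpy as np

def psnr(sr: np.ndarray, hr: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between an SR output and its HR reference.

    data_range is the maximum possible intensity (1.0 for volumes
    normalized to [0, 1]); higher PSNR means a closer reconstruction.
    """
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical volumes
    return 10.0 * np.log10(data_range ** 2 / mse)
```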

### *4.4. Assessing the Impact on Improving Diagnostic Performance*

As noted above, the purpose of super-resolution is to aid clinicians or computers in analyzing images more precisely by providing more information on smaller structures. Therefore, we investigate the effectiveness of the proposed SR method on disease classification performance. This way, we can emulate one of the real-world applications of super-resolution for medical images.

In this experiment, we used 650 images from the ADNI2 dataset, containing 360 cognitively normal (CN) images and 290 Alzheimer's disease (AD) images. (Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org, accessed on 20 September 2020.) Each image was resampled to 1.4 mm voxel spacing to match the training data after skull removal and intensity normalization. We first performed super-resolution on all images to produce pseudo-high-resolution training/validation samples. Then, we trained a three-dimensional version of MobileNetV2 with 90% of the images and evaluated the area-under-the-curve (AUC) score on the remaining 10%.
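The AUC evaluation can be sketched with a rank-based estimate. `auc_score` below is an illustrative implementation equivalent to the Mann-Whitney statistic (in practice a library routine such as scikit-learn's `roc_auc_score` would be used on the classifier's held-out predictions):

```python
import numpy as np

def auc_score(labels: np.ndarray, scores: np.ndarray) -> float:
    """Area under the ROC curve for binary labels (0 = CN, 1 = AD).

    Equals the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case, with ties counted
    as half (the Mann-Whitney U formulation of AUC).
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```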

To confirm that the proposed SR process recovers some of the information lost in low-resolution images, we also trained a classifier on downsampled images. Then, we trained another classifier on super-resolved versions of those downsampled images and compared their AUC scores. We used downsampling scales of 50% and 25%, and a three-dimensional MobileNetV2 as the classifier network.

Here, we define the "recovery ratio", which measures how much information is recovered from a low-resolution image, as follows:

$$\text{recovery ratio} = \frac{\text{AUC(SR)} - \text{AUC(LR)}}{\text{AUC(HR)} - \text{AUC(LR)}},$$

where AUC(HR), AUC(LR), and AUC(SR) denote the AUC scores on the HR images, on the LR images (downsampled from HR by ×0.5), and on the SR images obtained by applying 2× super-resolution to the LR images, respectively. Here, we assume that the AUC with the SR images does not exceed that with the HR images, i.e., the AUC on the HR dataset is the upper bound for the resolution.
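The recovery ratio is a direct transcription of the equation above; `recovery_ratio` is a hypothetical helper name:

```python
def recovery_ratio(auc_hr: float, auc_lr: float, auc_sr: float) -> float:
    """Fraction of the HR-LR AUC gap recovered by super-resolution.

    0 means SR adds nothing over LR; 1 means SR matches the HR
    upper bound (assumed not to be exceeded).
    """
    return (auc_sr - auc_lr) / (auc_hr - auc_lr)
```

For example, with AUC(HR) = 0.90, AUC(LR) = 0.70, and AUC(SR) = 0.85, the ratio is 0.15 / 0.20 = 0.75, i.e., three quarters of the lost discriminative performance is recovered.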
