1. Introduction
Studying the macaque brain provides a crucial avenue for understanding human brain mechanisms in neuroscience research [1]. Currently, the macaque monkey serves as a prominent primate model and has become a vital subject for investigating the human brain using various medical imaging techniques [2,3].
Diffusion magnetic resonance imaging (dMRI) detects the movement direction of water molecules in the brain, exploiting the anisotropic diffusion of water in white matter to reconstruct white matter structures. The b-value represents the strength of the diffusion-sensitizing gradient field and, together with the corresponding three-dimensional b-vectors, reflects the influence of tissue microstructure on water diffusion in living tissue. Researchers commonly refer to the volumes acquired at different b-value intensities in a dMRI series as b-value images. Diffusion tensor imaging (DTI) estimation and probabilistic tractography are established methods for reconstructing major white matter fiber bundles in brain imaging [4]. Typically, a dMRI series comprises multiple b-value images, with low b-value volumes (b < 10, ideally b = 0) serving as the basis for DTI, which is crucial for data analysis in neuroscience research. Nowadays, to mitigate interference such as head motion during acquisition, one low b-value volume typically corresponds to 5–10 high b-value (b > 300, commonly b = 1000) volumes [5]. However, in some publicly accessible macaque brain dMRI datasets, the ratio of low to high b-value volumes may fall below 1:5 or even 1:10, possibly due to early acquisition protocol configurations [6]. The reliability of results, such as DTI estimates, computed from data that do not meet the recommended ratio requires further confirmation. Therefore, it is necessary to generate and optimize low b-value volumes in macaque brain dMRI data.
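For illustration, the volume ratio described above can be checked directly from an acquisition's b-values. This is a hedged sketch, not part of the paper's pipeline: `bval_ratio` is an illustrative helper name, and its thresholds follow the b < 10 / b > 300 cutoffs quoted above.

```python
# Hedged sketch (not from the paper): checking whether a dMRI acquisition
# meets the recommended low/high b-value volume ratio, using the cutoffs
# quoted above (low: b < 10, high: b > 300). `bval_ratio` is an
# illustrative helper name, not an existing tool.
def bval_ratio(bvals, low_thresh=10, high_thresh=300):
    """Return (n_low, n_high, high-volumes-per-low-volume or None)."""
    n_low = sum(1 for b in bvals if b < low_thresh)
    n_high = sum(1 for b in bvals if b > high_thresh)
    return n_low, n_high, (n_high / n_low) if n_low else None

# Example: 2 b0 volumes against 30 b = 1000 volumes gives 15 high volumes
# per low volume, i.e. a low-to-high ratio below 1:10.
bvals = [0, 0] + [1000] * 30
n_low, n_high, ratio = bval_ratio(bvals)
print(n_low, n_high, ratio)  # 2 30 15.0
```

A dataset failing this check would be a candidate for the low b-value generation proposed in this work.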
Medical image-to-image translation refers to methods that translate images from an input modality to an output modality through a learned mapping. This approach can be used to acquire additional data or complete missing data [7], and it can be applied to downstream tasks in medical image processing, such as image registration and segmentation [8,9], as well as image classification [10].
Generative adversarial networks (GANs) are network models based on game theory, consisting of a generator and a discriminator [11]. The generator attempts to generate high-quality images to deceive the discriminator, while the discriminator distinguishes between real and generated images. Both sides become stronger through this adversarial process, so the generator produces increasingly realistic images. With the emergence of GANs, the performance of medical image-to-image translation has improved greatly. Initially, GANs were used only to generate images from random noise. As researchers began employing the Transformer as the generator of GANs [12], Transformers were also applied to medical image-to-image translation. The advent of pix2pix and CycleGAN propelled the performance of GANs in image-to-image translation tasks [13,14]. Newer methods harness the powerful generative capabilities of GANs to produce visually and quantitatively superior images.
Some researchers have explored the application scenarios of CycleGAN in medical image-to-image translation [15,16,17], but more effort has been devoted to improving CycleGAN for unsupervised learning settings [18,19,20,21,22,23]. Methods based on CycleGAN are unsupervised approaches whose advantage is mutual translation between two domains without requiring paired images. However, because CycleGAN serves two image translation tasks, its performance on generation in a single target domain is generally inferior to that of supervised methods.
In contrast to CycleGAN, Transformer-based methods are supervised learning approaches. Some researchers have employed Transformers for medical image segmentation [24], MRI reconstruction [25], and medical image-to-image translation [8,26,27]. However, Transformers require large amounts of data, while publicly accessible macaque dMRI samples are limited, making it challenging to fully leverage their advantages [28].
The pix2pix-based approach is also a widely used supervised learning method for medical image-to-image translation. The Synb0-DisCo method applies the pix2pix technique to correct distorted b0 images [29]. pGAN and Ea-GAN enhance image detail by improving the loss function and by incorporating edge information, respectively [30,31]. MedGAN [32] employs a cascaded U-Net as its generator for various medical image translation tasks. Because pix2pix-based methods are designed for generation in a single target domain with paired image data, they often achieve higher generation accuracy in medical image-to-image translation tasks. However, since such methods typically rely on a single generative adversarial network, they fall short in learning fine details.
Furthermore, all these methods share a common issue. Most current studies on modality translation of brain MRI images operate in the GRAY color space of human brain bitmaps, with the aim of providing visually interpretable images [21,26,31,33]. Medical imaging signal intensity values have absolute significance [8] and are required for probabilistic tractography calculations, unlike the typical GRAY color range of bitmap images. Therefore, the images generated by the aforementioned methods cannot meet the requirements of computational neuroscience research.
In this work, we introduce the concept of peak information maps and propose a novel end-to-end primary-auxiliary dual GAN network (PadGAN), which extracts latent space features from peak information maps to translate high-quality low b-value images. The generated low b-value images can be used to augment dMRI data and improve its quality. The results show that PadGAN outperforms existing methods in qualitative observations and quantitative metrics, and the effectiveness of each module is validated through ablation experiments. Finally, we use the Xtract toolbox [34] in FSL6.0 (FMRIB Software Library) [35] to perform probabilistic tractography and conduct DTI estimation on the augmented dMRI data. The Xtract results of dMRI data augmented using our method are more satisfactory. In summary, the specific contributions of this paper are as follows:
We introduce the concept of peak information maps and design a corresponding method for calculating peak information maps.
We propose a novel end-to-end primary-auxiliary dual GAN network to translate high b-value images to low b-value images. In this network, the auxiliary generator extracts latent space features from peak information maps and transfers these features to the primary generator. The primary network integrates the latent space features and multi-scale features to generate low b-value images.
Through DTI estimation and Xtract probabilistic tractography experiments, we validate the effectiveness of generating low b-value images for augmenting original dMRI data, providing new validation approaches for quality assessment in brain science research and offering optimized dMRI data for brain science studies.
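The data flow of the second contribution can be sketched schematically. This is purely illustrative NumPy: the shapes, the flatten-and-slice "encoders", the 64-dimensional latent size, and the concatenation-based fusion are assumptions standing in for the real convolutional generators.

```python
import numpy as np

# Purely illustrative sketch of the primary-auxiliary data flow described
# above; the "encoders" and fusion step are placeholder assumptions, not
# the paper's actual convolutional architecture.
def aux_encode(peak_map):
    # Auxiliary generator: peak information map -> latent space features.
    return peak_map.reshape(peak_map.shape[0], -1)[:, :64]

def primary_generate(high_b, z_latent):
    # Primary generator: fuse its own bottleneck features with the
    # auxiliary latent features before decoding a low b-value image.
    bottleneck = high_b.reshape(high_b.shape[0], -1)[:, :64]
    fused = np.concatenate([bottleneck, z_latent], axis=1)
    return fused  # a decoder would map this back to image space

rng = np.random.default_rng(0)
peak_map = rng.random((1, 16, 16))   # stand-in peak information map
high_b = rng.random((1, 16, 16))     # stand-in high b-value image
out = primary_generate(high_b, aux_encode(peak_map))
print(out.shape)  # (1, 128)
```

The point of the sketch is only the routing: the auxiliary branch sees the peak information map, and its latent output is injected into the primary branch before decoding.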
3. Results
3.1. Comparison Experiments and Results
The method proposed in this paper is compared with five existing methods that have shown good performance in the field of medical image-to-image translation research. Specifically:
Pix2pix [13] adopts the U-Net architecture as the main framework of its generator.
CycleGAN [14] shares the same generator architecture as pix2pix but involves two generators and two discriminators for cyclic generation tasks.
SwinUnet [24] utilizes the Swin Transformer as the main framework for medical image segmentation tasks and is adapted for application in this paper.
ResViT [26] builds upon the Vision Transformer architecture as its main generator framework.
pGAN [30] adopts ResNet as its main framework.
For the comparative experiments, the original models’ architectures and training parameters are used during the training process. All models are pre-trained for 20 epochs and trained for an additional 80 epochs on an NVIDIA GeForce RTX 3090. Structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and mutual information (MI) are selected as quantitative evaluation metrics in this paper.
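For reference, the three metrics can be sketched as follows. This is a minimal NumPy sketch, not the paper's evaluation code; in particular, practical SSIM is computed over local Gaussian-weighted windows, while the single-window version below only illustrates the formula.

```python
import numpy as np

# Hedged sketches of the three evaluation metrics (illustrative only).
def psnr(ref, gen, data_range):
    # Peak signal-to-noise ratio in dB, relative to the stated data range.
    mse = np.mean((ref.astype(float) - gen.astype(float)) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

def ssim_global(x, y, data_range):
    # Single-window SSIM with the standard stabilizing constants; real
    # SSIM averages this over local windows.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def mutual_information(x, y, bins=64):
    # Histogram-based mutual information estimate in nats.
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz])).sum())
```

Note that for MRI data, `data_range` should be the true intensity range (thousands for these images, as discussed later), not the 8-bit maximum of 255.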
Table 3 lists the comprehensive results for the AMU, Mount Sinai-P, Mount Sinai-S, UCDavis, and UWM sites, each containing non-brain tissue. To compare results on brain tissue only, non-brain tissue is removed from all results, as shown in Table 4, which displays the results after excluding non-brain tissue for the five sites. The overall results are consistent with Table 3, with a slight decrease. Subsequent experiments report results after excluding non-brain tissue. The specific results for the five datasets are shown in Table 5, Figure 5, and Figure 6. The CycleGAN method produces results closer to the source images on most datasets. Although this method employs a dual-generator, dual-discriminator structure, each of its generative adversarial networks serves a separate task (generating target and source images, respectively), making it suited to scenarios where paired images are unavailable. In contrast, both generative adversarial networks in our method are dedicated to generating low b-value images, resulting in better visual quality and evaluation metrics. The pGAN method fails to generate detail-rich images: it uses ResNet as the basic generator architecture with a deeper network structure, but lacks U-Net's ability to retain encoder feature map information. Our method exploits the U-Net architecture to capture features from different layers, thereby preserving detailed image information. The Transformer-based ResViT and SwinUnet methods exhibit relatively mediocre performance owing to differences in global information across sites in the macaque brain image dataset and the limited number of data samples. In contrast, our method, a fully convolutional neural network, maximizes the local generation capabilities of convolutional networks. The pix2pix method, a single generative adversarial network based on the U-Net generator architecture, performs well in generating global structural features but lacks detailed features. Our method addresses this limitation by using the auxiliary generative adversarial network to provide a latent space containing more detailed features, compensating for the shortcoming of a single generative adversarial network in capturing fine details.
3.2. Ablation Experiments and Results
We conducted three ablation experiments to further investigate the role and effectiveness of the auxiliary generator in our proposed method. The details of the experiments are as follows: (1) removing the auxiliary network and retaining only the encoder part of the auxiliary network to encode the peak information map, to verify the role of the auxiliary network; (2) replacing the latent space features extracted by the auxiliary generator with random Gaussian noise to explore the role of latent space features; and (3) directly reusing the weights of the main generator in the auxiliary network to verify whether the auxiliary network needs to be trained separately.
The results are shown in Table 6. (1) After removing the auxiliary network, PSNR decreased by 5.1256, SSIM by 0.1225, and MI by 0.0736, indicating that the auxiliary generator plays an important role in improving network performance. (2) When the auxiliary generator's latent features were replaced with noise, PSNR decreased by 4.2291, SSIM by 0.0649, and MI by 0.0445, suggesting that the auxiliary generator effectively extracts latent space features from the peak information map. (3) When the primary generator's weights were reused in the auxiliary network, PSNR decreased by 1.8627, SSIM by 0.0385, and MI by 0.0371, demonstrating that the latent space learned by the auxiliary generator differs from that of the primary generator and that a separately trained auxiliary generator is necessary.
3.3. Xtract and DTI Estimation Results
Xtract is a robust probabilistic tractography method integrated into the FSL6.0 software package. It utilizes dMRI data to estimate the trajectories and connectivity patterns of white matter tracts. To assess the effectiveness of the augmented macaque dMRI brain images through our proposed method, we employed Xtract to compute the structural connectivity of dMRI brain images. Eight subjects were selected from the UCDavis dataset, and the images generated by pix2pix and PadGAN were respectively added to the corresponding dMRI data. Subsequently, we conducted Xtract tractography experiments on the dMRI images augmented by the pix2pix and PadGAN methods, as well as the original reference dMRI images, resulting in a total of 42 fiber tracts.
As shown in Figure 7, the fiber bundle visualizations demonstrate that, compared with pix2pix, our method captures more fiber bundles, with shapes similar to the reference results. Notably, our results display more and clearer fiber bundles within the white rectangular area.
DTI is a magnetic resonance imaging technique used to study the diffusion properties of water molecules within tissues. DTI offers various diffusion parameters, with the most commonly used being fractional anisotropy (FA) and mean diffusivity (MD). FA represents the degree of directional diffusion of water molecules within the tissue, while MD represents the average strength of water molecule diffusion. To better evaluate the quality of the generated images, this study conducted DTI estimation on dMRI images augmented by the PadGAN and pix2pix methods.
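The two parameters can be computed from the three eigenvalues of the fitted diffusion tensor; the following is a sketch of the standard formulas (illustrative only, not the FSL dtifit pipeline used in this study).

```python
import numpy as np

# Illustrative computation of fractional anisotropy (FA) and mean
# diffusivity (MD) from the three eigenvalues of a diffusion tensor.
def fa_md(eigvals):
    lam = np.asarray(eigvals, dtype=float)
    md = lam.mean()                           # mean diffusivity
    den = np.sqrt((lam ** 2).sum())
    fa = 0.0 if den == 0 else np.sqrt(1.5) * np.sqrt(((lam - md) ** 2).sum()) / den
    return fa, md

# Isotropic diffusion: no preferred direction, so FA = 0.
fa, md = fa_md([1.0, 1.0, 1.0])
print(fa, md)  # 0.0 1.0

# One dominant eigenvalue (fiber-like diffusion): FA approaches 1.
fa, md = fa_md([1.7, 0.2, 0.2])
```

FA is dimensionless and bounded in [0, 1]; MD carries the units of the eigenvalues (typically mm²/s).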
Figure 8 displays the DTI estimation results using FA and MD as examples. In the low b-value replacement experiment, our method demonstrates higher similarity to the original reference dMRI images compared with the pix2pix method. In the experiment of augmenting the original reference dMRI, our method shows smoother results. The last column in the figure demonstrates that the absence of low b-value volumes in dMRI images significantly affects the DTI estimation results. Therefore, low b-value images are crucial for DTI computation.
The experiments above indicate that Xtract and DTI estimation results can reflect the quality of different macaque image generation methods. Therefore, Xtract and DTI estimation are expected to become further validation methods for assessing the quality of generated macaque or medical images.
4. Discussion
In this work, we propose a method for dMRI brain image data augmentation that uses PadGAN to generate low b-value images. The introduction of peak information maps creates end-to-end conditions for extracting latent space features, allowing the auxiliary network to obtain latent space features through adversarial learning. On the basis of the U-Net network, a feature fusion module is added to the primary generator to merge latent space features and multi-scale information, thus generating images with rich details. Additionally, various generative adversarial network models are explored, and the strengths and weaknesses of each are analyzed. PadGAN is compared with these models in qualitative and quantitative evaluations, as well as in Xtract probabilistic tractography and DTI estimation, to demonstrate its overall performance. Finally, ablation experiments are conducted on each module of PadGAN to demonstrate the importance of each part.
Both generators in PadGAN adopt the encoder–decoder architecture based on U-Net, preserving multi-scale information through skip connections, and the introduction of latent space features enables PadGAN to learn fine-grained image features. As shown in Figure 5 and Figure 6, Transformer-based network models yield poor results here, unlike in previous studies on human brain datasets. Human brain datasets are typically large enough to tune model parameters per dataset, whereas our approach uses a unified training strategy for the limited macaque brain image datasets from each site. Because acquisition parameters differ significantly across sites, attention mechanisms struggle to perform effectively on multi-site datasets. While ResNet maintains model learning capability even with deep network layers, it does not preserve multi-scale features like U-Net, resulting in deficiencies in detail generation. The U-Net-based pix2pix method performs well but, as a single generator-discriminator method, still lacks image detail. Although CycleGAN has two generative adversarial networks, they are tasked with mutual conversion between two modalities and do not jointly generate images in one target domain. The auxiliary network in PadGAN provides latent space information to the primary network to enhance detail generation, while U-Net's skip connections preserve multi-scale information, resulting in superior performance on image details.
Unlike typical computer vision image-to-image translation, the signal intensity values of MRI images have absolute significance and can be used for DTI estimation or neuroimaging studies. Everyday images are usually RGB images with a maximum pixel intensity of 255, whereas the signal intensity values of macaque brain images typically range from thousands to tens of thousands. Therefore, when evaluating the quality of generated MRI images, we can go beyond quantitative metrics and qualitative observations. For medical MRI, some researchers conduct Turing tests with expert radiologists to assess the authenticity of generated images [32]. For macaque and human brain images used in research, we can further evaluate generated images by computing neural tracing or DTI estimation results, which constitutes a novel validation approach.
In future work, we will explore generating realistic images from multi-modal data. Although the macaque brain imaging dataset is limited, with few samples at each site, many sites provide at least two modalities. Network models that effectively leverage multi-modal information may therefore generate higher-quality images. Additionally, our method has potential applications in human brain imaging. First, it can be used for data augmentation of human brain dMRI images. Although human brain images typically have higher spatial resolution and signal-to-noise ratio, and more public datasets of better quality are available, low b-value images may still be under-collected due to operator and configuration issues. In such cases, applying our method directly to human brain images is a good choice. Second, our method has potential applications in classification studies of normal and diseased brain images. Using PadGAN to generate more images of a given modality can expand the sample size and thereby improve classification accuracy. However, diseased images typically require higher precision in particular regions, and introducing attention mechanisms to enhance contextual information may be a good choice.