1. Introduction
Medical imaging technology provides high-resolution images of internal structures and helps doctors make diagnoses and treatment plans. Medical imaging techniques such as computed tomography (CT) and magnetic resonance imaging (MRI) have become standard tools for clinical diagnosis [1]. By using contrast agents to enhance the visibility of diseased tissue and comparing pre- and post-contrast medical images, the differences between diseased and healthy tissue can be shown more clearly. Contrast-enhanced CT (CECT) improves the visualization of blood vessels and tissues, and T1-weighted contrast-enhanced magnetic resonance imaging (T1CE) enhances soft tissue contrast and is widely used for tumor detection. However, it has been reported that the adverse reactions and side effects caused by contrast agents can harm the health of patients [2]. Iodine-based contrast agents can cause severe allergic reactions, whereas gadolinium-based contrast agents increase the risk of nephrogenic systemic fibrosis (NSF). The risk of adverse reactions to ionic contrast agents can be as high as 0.12 [3], and the European Medicines Agency recommends restricting some intravenous linear agents to prevent the deposition of contrast agents in human tissues [4], which may cause unknown health problems. Therefore, synthesizing post-contrast medical images without injecting contrast agents into the body is a valuable technique for practical diagnosis and treatment [5].
GAN-based medical image synthesis provides the ability to bypass contrast agent administration and obtain post-contrast medical images. Thus, several generative models have been proposed to synthesize high-quality and perceptually realistic post-contrast images from pre-contrast images. However, existing methods still face problems, including (1) insufficient attention paid to local contrast enhancement regions and (2) missing frequency information. Specifically, contrast-enhanced regions are often concentrated in specific anatomical structures or lesion areas. As shown in Figure 1, the contrast enhancement regions in post-contrast images obtained via previous methods [6,7,8,9,10,11,12] are often incomplete or ignored. These methods also tend to fit easy-to-synthesize frequencies, so important frequency information is easily ignored. Gaps between real and post-contrast medical images in the frequency domain cause important frequency information to be lost and image textures and details in the spatial domain to be blurred or even distorted. In addition, current methods enhance images by reducing pixel-level differences but struggle to capture contrast enhancement regions in biological tissues.
Inspired by the above problems, we propose an Interactive Frequency Generative Adversarial Network (IFGAN) for pre-to-post-contrast medical image synthesis. First, we propose an enhanced interaction module (EIM) to force the model to focus on the contrast enhancement region. Next, we introduce focal frequency loss (FFL) to ensure the consistency of real and post-contrast images in the frequency domain and to prevent the loss of important frequency information. In addition, we design feature interactions to achieve fine-grained control of local lesions and eliminate irrelevant details to promote synthesis. The main contributions of this paper can be summarized as follows:
We propose a novel pre-to-post-contrast medical image synthesis method that preserves frequency information and anatomical structure to avoid the risk of adverse reactions and side effects caused by contrast agents.
We propose an enhanced interaction module that focuses on the contrast enhancement region, in which the features of the target and reconstruction branches interact to control contrast enhancement feature synthesis and maintain the anatomical structure.
We introduce focal frequency loss to narrow the gap between the real and post-contrast images in the frequency domain and to prevent the loss of frequency information, further maintaining clinically relevant features and texture structure.
Experiments show that our method achieves satisfactory post-contrast synthesis and substantial performance improvement compared with recent state-of-the-art (SOTA) methods.
2. Related Work
Existing deep medical image synthesis methods include CNN-, UNet-, GAN-, transformer-, and diffusion-based methods [13]. Considering that GANs have been widely used in image synthesis [14], data augmentation [15], and cross-modal medical image synthesis [16], we focus mainly on GAN-based methods. Essentially, any nonlinear GAN trained on paired source and target images can be used to achieve pre-to-post-contrast medical image synthesis. Existing GAN-based medical image synthesis methods can generally be divided into cross-modal generation, high-quality reconstruction, and contrast enhancement approaches, which are summarized as follows.
The cross-modal generation approach regards the desired image as the target image and builds a GAN framework to output a target image from the source image; a post-contrast medical image can thus be generated with the trained GAN by treating pre- and post-contrast medical images as source and target images, respectively. Along these lines, pGAN [17] employs a cycle-consistency strategy for multi-contrast MRI synthesis. BPGAN [7] proposes an end-to-end bidirectional prediction method, facilitating flexible cross-modal synthesis between CT and MRI images. Similarly, Bi-MGAN [10] integrates deep and handcrafted features to constrain feature generation and achieves multi-modal MRI image generation. DC-cycleGAN [11] regards source samples as negative samples and applies a dual-contrast loss to map learned samples away from source images. The authors of [18] fused radiological features from CT to MRI and identified lesion areas by selecting anchor boxes with the greatest differences in radiomic features across various scales in CT images. FACGAN [19] generates CT images from MRI images by incorporating residual-frequency channel attention with a frequency cycle strategy to extract more comprehensive tissue structure information. MGDGAN [20] employs a mask estimation network to guide the generation of different tissues in CT images, resulting in more accurate brain lesion synthesis. The authors of [21] utilized a cycle-consistent structure with perceptual loss to eliminate the need for paired data and further highlight high-frequency texture details. SC-GAN [22] presents a truncation loss based on a segmentation model to address missing anatomical structures in truncated regions during synthetic CT (sCT) generation. GAN-based cross-modal synthesis methods usually consider a direct mapping from the source domain to the target domain [23], which requires supervised learning of prior knowledge [24] to ensure the consistency of cross-modal translation.
The high-quality reconstruction approach regards the desired image as a high-quality image and builds a GAN framework to output a high-quality image from a low-quality image. In this way, post-contrast medical images can be reconstructed with the trained GAN by treating pre- and post-contrast medical images as low- and high-quality images, respectively. To this end, AR-GAN [25] introduces a two-stage learning model to determine and dynamically adjust correction parameters for each pixel, generating high-quality SPET images from LPET images. Similarly, the authors of [26] applied a two-stage GAN to map low-quality ultrasound images to their high-quality counterparts. Ea-GAN [27] enhances edge information perception using the Sobel detection operator. Vessel-GAN [28] utilizes expert knowledge to design filters based on the structure of blood vessels, allowing the GAN framework to generate more credible coronary CT angiography (CTA) images from myocardial CT perfusion (CTP) data. The authors of [29] introduced a multiscale generator architecture combined with a channel-mask attention module, which significantly improves the quality of synthesized contrast-enhanced CT (CECT) images. RG-GAN [30] designs a specific data augmentation module using low-cost, non-real, labeled data to improve lesion preservation in PET images.
The contrast enhancement approach regards the desired image as a post-contrast image and outputs a post-contrast image from a pre-contrast image. Along these lines, DCE-MRI [12] integrates perceptual and pixel-level features to transform non-contrast breast MRIs into corresponding contrast-enhanced sequences. BICEPS [31] uses feature decoupling to improve the alignment of pre- and post-contrast MRI sequences. Considering image misalignment in contrast enhancement, RegGAN [9] adaptively fits the noise distributions of unpaired images using a registration network. The authors of [32] utilized self-supervised learning and dual-energy CT (DECT) to achieve high-quality contrast enhancement with embedded registered non-contrast CT (NCCT) and contrast-enhanced CT (CECT) image pairs. The authors of [33] integrated a deformation field learning network with a 3D generator to reduce misalignment, realizing the joint synthesis and deformation registration of abdominal CECT images generated from NCCT images. SGCDD-GAN [34] emphasizes key areas by adopting multi-task learning and a dual-decoder generator, which ensures that the generator focuses more on lesion areas during NCCT-to-CECT enhancement.
Although some progress has been made by the above methods, several limitations remain. First, almost all of the mentioned GAN-based synthesis methods lack attention to the features of local contrast enhancement regions, which leads to blurred details and even missing key regions in the generated post-contrast images. Moreover, most current methods do not consider information in the frequency domain. To address these issues, we propose IFGAN for pre-to-post-contrast medical image synthesis. Specifically, we first propose the EIM to focus on the local contrast enhancement region. Then, we introduce focal frequency loss to narrow the gap between the post-contrast and real images in the frequency domain and to prevent the loss of important frequency information so as to maintain texture and edge details. A subsequent analysis demonstrates that the optimization is differentiable and that focal frequency loss is effective in improving synthesis.
3. The Proposed Method
In this section, we first define the notation and the goal of this research. Then, we present the deep architecture of the proposed IFGAN in Figure 2, which involves one generator and one discriminator. The former encodes pre-contrast images into feature representations and fuses different task information to map them to different images, i.e., post-contrast images and reconstructed images. The latter determines whether the post-contrast and real images belong to the corresponding domain. Finally, we provide the loss function and training algorithm.
3.1. Problem Definition and Notations
In this section, we introduce the necessary notation and the problem definition. Given a pre-contrast image, its paired real post-contrast image, and a target label (t), the goal is to train the generator to synthesize a post-contrast image from the pre-contrast image and the target label (t) while keeping the distortion between the pre-contrast image and its reconstruction small.
During training, the encoder in the generator (G) first encodes the spatial information feature (z) of the pre-contrast image and then feeds it into the dual-branch decoder, which decodes the target post-contrast image and the reconstructed image according to the target label (t) and the reconstruction label (s), respectively. The discriminator (D) is then used to determine whether the post-contrast image and the real image belong to the target domain. Finally, the generator (G) and the discriminator (D) are jointly trained in an adversarial min-max optimization over the synthesized post-contrast image; the concrete loss terms are given in Section 3.3. Note that the goal is to obtain a high-quality post-contrast image, so we mainly train the target branch decoder to be optimal. Through the EIM and weight sharing, the target branch decoder interacts with the reconstruction branch decoder to improve attention on the local contrast enhancement region and keep the anatomical structure unchanged.
3.2. Deep Architecture of IFGAN
3.2.1. Generator
The generator, which contains one encoder and a dual-branch decoder, is designed to synthesize post-contrast images from pre-contrast images. The encoder encodes the pre-contrast image into a low-resolution feature map (z) with dimensions of 256 × 64 × 64, and the dual-branch decoder decodes z back to pixel space according to the task labels (t and s) to obtain the post-contrast image and the reconstructed image.
The encoder includes one convolution layer and four residual blocks. The first two residual blocks are embedded with a max pooling layer for downsampling, and the last two residual blocks preserve the shape of the feature tensor. The decoder consists of one deconvolution layer and four residual blocks. The first two residual blocks exchange feature information through the EIM, and the last two residual blocks are upsampled by nearest-neighbor interpolation to avoid the checkerboard effect that can arise from transposed convolution; the result is finally fed into the deconvolution layer to synthesize a post-contrast medical image. Residual blocks promote information flow in the deep network through identity mappings, which have been shown to be effective in alleviating gradient vanishing. Adaptive instance normalization (AdaIN) [35] is also adopted in the decoder.
3.2.2. Discriminator
The discriminator distinguishes whether an input image is a post-contrast image or a real image, and it consists of three convolutional layers and six residual blocks. The residual blocks are downsampled by max pooling layers to obtain an intermediate tensor of 512 × 4 × 4. Then, convolutional layers with kernel sizes of 4 × 4 and 1 × 1 output a vector with dimensions of 2 × 1 × 1, indicating the probability that the input image belongs to the real image class. Spectral normalization is introduced into each residual block to constrain discriminative ability and avoid unstable training and sub-optimal performance.
3.2.3. Enhanced Interaction Module
Considering that label guidance and paired supervised learning cannot ensure detailed local features, i.e., the contrast enhancement region, in post-contrast images, we define the enhanced interaction module (EIM) to focus on local contrast-enhanced features. The interaction between the target and reconstruction branch features controls the local contrast enhancement region, with the label serving only as guidance that tells the model the mapping direction without providing additional information that could affect image synthesis. As shown in Figure 2, the EIM calculates the difference between the post-contrast features and the reconstructed features and maintains the post-contrast features through the target label (t). This difference highlights the regions that require attention, and the target label is fused through AdaIN. A convolution layer with a kernel size of 3 × 3 then maps the result to obtain local features, which guide the focus on the local contrast enhancement region. The resulting features are then concatenated (⊗ denotes the concatenation operation), and the concatenated feature is mapped to the output feature through another convolution layer with a kernel size of 3 × 3. The EIM enhances feature expression through timely information interaction and enables the generator to control local contrast-enhanced feature synthesis, as verified by subsequent experiments.
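To make the described interaction concrete, the following is a minimal PyTorch sketch of an EIM-style block under our reading of the text. All module and argument names (e.g., EnhancedInteractionModule, label_dim) are illustrative placeholders, and the choice of concatenating the local features with the target-branch features is an assumption; the exact layer configuration of the released model may differ.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: a label embedding predicts the affine parameters."""
    def __init__(self, channels, label_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(label_dim, channels * 2)

    def forward(self, x, label):
        gamma, beta = self.affine(label).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(x) + beta

class EnhancedInteractionModule(nn.Module):
    """Sketch of an EIM-style block: the difference between target- and
    reconstruction-branch features highlights the contrast enhancement region,
    the target label is fused via AdaIN, local features are extracted with a
    3x3 convolution, and the concatenated result is mapped back to feature space."""
    def __init__(self, channels, label_dim):
        super().__init__()
        self.adain = AdaIN(channels, label_dim)
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.fuse_conv = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, feat_target, feat_recon, target_label):
        diff = feat_target - feat_recon            # differences that need attention
        diff = self.adain(diff, target_label)      # fuse the target label
        local = self.local_conv(diff)              # 3x3 conv extracts local features
        fused = torch.cat([feat_target, local], dim=1)  # concatenation ("⊗")
        return self.fuse_conv(fused)               # 3x3 conv maps to the output feature
```

In the generator, two such blocks would sit in the first two decoder residual blocks, exchanging features between the target and reconstruction branches.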
3.3. Loss Function
To obtain post-contrast medical images with realistic visual perception, we introduce adversarial loss, pixel-wise mean absolute error, focal frequency loss, and reconstruction loss. The first term drives the generated post-contrast medical images toward the manifold of real images. The second and third terms reduce the gap between the post-contrast and real images in the spatial and frequency domains, respectively. Finally, the reconstruction loss guides the generation of reconstructed images to help maintain the anatomical structure of post-contrast images.
3.3.1. Adversarial Loss
The adversarial loss trains the generator (G) and the discriminator (D) in an adversarial fashion and ultimately enables the generator to synthesize realistic post-contrast images from pre-contrast images. During training, the encoder extracts the high-dimensional latent features (z) from the pre-contrast images. Synthesized post-contrast medical images with target features are then obtained through the post-contrast synthesis branch decoder under the guidance of the target label t. The adversarial loss is defined over the real post-contrast image, the post-contrast synthesis branch decoder, and the target label t that guides post-contrast synthesis. During training, the discriminator distinguishes whether post-contrast and real images belong to the same domain, which indirectly enhances the fitting and generation ability of the generator.
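For reference, a standard conditional GAN objective consistent with the description above can be written as follows; the symbols (x for the pre-contrast image, y for the real post-contrast image, E for the encoder, G_t for the post-contrast branch decoder) are placeholders of ours rather than the paper's notation.

```latex
\min_{G}\max_{D}\;\mathcal{L}_{adv}
  = \mathbb{E}_{y}\big[\log D(y)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - D\big(G_t(E(x),\, t)\big)\big)\big]
```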
3.3.2. Pixel-Wise Mean Absolute Error
The pixel-wise mean absolute error reduces content differences between the post-contrast images and the real images. Calculating and reducing the pixel-level Manhattan distance between them allows IFGAN to learn the corresponding content relationship and to capture and retain clinical information, including the location and shape of lesions as well as normal anatomical structures. The loss averages the absolute differences between the pixel values of the real and synthesized post-contrast images over all coordinates, where W and H denote the width and height of the post-contrast image, respectively.
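A plausible form of this loss, using placeholder symbols y(i, j) and \hat{y}(i, j) for the real and synthesized post-contrast pixel values (not the paper's notation), is:

```latex
\mathcal{L}_{pmae}
  = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H}
    \big|\, y(i,j) - \hat{y}(i,j) \,\big|
```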
3.3.3. Focal Frequency Loss
The focal frequency loss preserves frequency information in the target post-contrast image, which avoids the loss of high-frequency information and frequency-spectrum region shifting. High-frequency components generally correspond to fast-changing regions. By minimizing the gap in the frequency domain, the synthesis of edges and details can be better controlled, which improves visual perception and preserves texture structure. The loss operates on the frequency representations of the real and post-contrast images, which are obtained from the spatial-domain pixel values via the discrete Fourier transform (with the Euler number e and the imaginary unit appearing in the complex exponential of the Euler formula), and each frequency coordinate is assigned a spatial-frequency weight. The detailed formulation and analysis are given in Section 4.
3.3.4. Reconstruction Loss
The reconstruction loss is designed to maintain the anatomical structure of the pre-contrast image. The reconstruction branch decoder provides timely interactive information to the EIM to focus on local contrast-enhanced features and shares anatomical structure feature weights with the post-contrast branch decoder, which embeds the anatomical structure of the pre-contrast image into the post-contrast image. The reconstruction loss penalizes the difference between the pre-contrast image and the reconstructed image decoded under the reconstruction label (s).
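Under the same placeholder notation as above (x for the pre-contrast image, E for the encoder, G_r for the reconstruction branch decoder, s for the reconstruction label), a plausible L1 form of this loss is:

```latex
\mathcal{L}_{rec}
  = \mathbb{E}_{x}\big[\, \lVert\, x - G_r(E(x),\, s) \,\rVert_{1} \,\big]
```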
3.3.5. Objective Function and Algorithm
By merging all of the abovementioned losses, i.e., adversarial loss, pixel-wise mean absolute error, focal frequency loss, and reconstruction loss, the overall adversarial optimization is obtained as a weighted sum of these terms, in which the weights of the non-adversarial terms are weight-balance hyper-parameters. We use alternated learning and a gradient descent strategy to update the parameters of IFGAN until convergence. The training process of IFGAN is summarized in Algorithm 1.
Algorithm 1 The learning algorithm of IFGAN
Require: Input image set X; real image set Y; target label t; reconstruction label s; mini-batch sampler.
Ensure: Generator G and discriminator D with trained parameters.
Initialization: batch size K; iteration count T; learning rate.
1: for iter < T do
2:    Randomly sample a batch of input images and their paired real images;
3:    Generate the target image and the reconstructed image of the input image;
4:    Apply spectral normalization to the discriminator parameters;
5:    Compute the gradients of the discriminator parameters from the discriminator loss;
6:    Update the discriminator parameters;
7:    Compute the gradients of the generator parameters from the generator loss;
8:    Update the generator parameters;
9: end for
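As an illustration of Algorithm 1, the following PyTorch-style sketch shows one possible training iteration. The interfaces, loss weights, and helper functions (generator, discriminator, focal_frequency_loss) are assumptions for demonstration and not the released implementation; a sketch of focal_frequency_loss is given at the end of Section 4.

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, opt_g, opt_d,
               x_pre, y_real, t_label, s_label,
               lambda_pmae=100.0, lambda_ffl=1.0, lambda_rec=1.0):
    """One IFGAN-style iteration: update D, then update G with the weighted sum
    of adversarial, pixel-wise MAE, focal frequency, and reconstruction losses."""
    # Generator forward: target (post-contrast) and reconstruction branches
    y_fake, x_rec = generator(x_pre, t_label, s_label)

    # Discriminator update (spectral normalization assumed to be built into D)
    opt_d.zero_grad()
    d_real = discriminator(y_real)
    d_fake = discriminator(y_fake.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()

    # Generator update
    opt_g.zero_grad()
    d_fake = discriminator(y_fake)
    loss_adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    loss_pmae = F.l1_loss(y_fake, y_real)            # pixel-wise mean absolute error
    loss_ffl = focal_frequency_loss(y_fake, y_real)  # focal frequency loss (see Section 4)
    loss_rec = F.l1_loss(x_rec, x_pre)               # reconstruction loss
    loss_g = (loss_adv + lambda_pmae * loss_pmae
              + lambda_ffl * loss_ffl + lambda_rec * loss_rec)
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```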
4. Frequency-Domain Optimization Analysis
As summarized in Section 2, almost all deep-learning-based medical image synthesis methods mainly optimize in the spatial domain, which can fail to capture differences in the frequency domain and lead to blurred regions and poor structure. As shown in Figure 3a,b, gaps between the real image and the generated image in the frequency domain lead to the loss of important frequency information, and the texture details of the image in the spatial domain become blurred or even distorted. To address this, we define and embed focal frequency loss to improve synthesis quality and optimize the differences in the frequency domain between post-contrast and real images. A detailed analysis of the frequency-domain optimization is provided below.
Due to spectral bias, deep models tend to learn low-frequency components first and neglect harder frequency components, which can make it difficult to synthesize fine frequencies and details [36]. To this end, we designed focal frequency loss to reduce the weight of easy frequencies using a dynamic-spectrum weight matrix, which prompts the model to focus on hard frequencies that are difficult to generate. First, the real image and the post-contrast image are converted to the frequency domain by the discrete Fourier transform (DFT), computed over the image width and height, where each frequency value involves the imaginary unit in a complex exponential. The resulting frequency representations of the real image and the target post-contrast image can therefore be expressed as complex numbers with real and imaginary parts, from which the amplitude and phase of each image at every frequency coordinate are obtained.
FFL treats the frequency value of the real image and that of the target post-contrast image at each coordinate as two vectors in the complex plane. According to the definitions above, the length of each vector corresponds to the amplitude and its angle corresponds to the phase. The frequency distance is therefore the distance between these two vectors, which takes both magnitude and angle into account, and is measured with the squared Euclidean distance; evaluating it at every coordinate yields the frequency distance between the real and target post-contrast images.
A dynamic-spectrum weight matrix (W) is used to focus on frequency components that are difficult to synthesize. Each matrix element is determined by the current frequency distance at that coordinate together with a scale factor that controls the flexibility of the weighting. The frequency distance matrix and the dynamic-spectrum weight matrix are combined by the Hadamard (element-wise) product, and the average value over all frequencies defines the focal frequency loss. The dynamic-spectrum weight matrix is updated according to the non-uniform distribution of the per-frequency losses during the current training process.
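For completeness, a formulation from the focal frequency loss literature that matches this description can be written as follows; the symbols (F_y and F_ŷ for the spectra of the real and synthesized images, α for the scale factor) are placeholders of ours and may differ from the paper's notation.

```latex
F(u,v) = \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} f(h,w)\,
         e^{-i 2\pi \left(\frac{uh}{H} + \frac{vw}{W}\right)}, \qquad
d(u,v) = \big|\, F_{y}(u,v) - F_{\hat{y}}(u,v) \,\big|^{2},
\\[4pt]
w(u,v) = \big|\, F_{y}(u,v) - F_{\hat{y}}(u,v) \,\big|^{\alpha}, \qquad
\mathcal{L}_{ffl} = \frac{1}{HW} \sum_{u=0}^{H-1}\sum_{v=0}^{W-1} w(u,v)\, d(u,v)
```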
From this formulation, we can conclude that the focal frequency loss is differentiable, since the DFT and the squared Euclidean distance are differentiable. Thus, the derivative of the loss with respect to the generator parameters can be computed, which means that any standard stochastic gradient descent algorithm can be used for model optimization. It has been reported that the loss of frequency information leads to blurred texture and edge details [25,32]; when the weight of each frequency is identical, the model still exhibits its inherent bias and produces blurred texture. In contrast, our method multiplies the dynamic-spectrum weight matrix and the frequency distance element-wise, which prevents IFGAN from favoring easy frequencies by assigning high weights to the hard frequencies. Therefore, it effectively prevents the loss of frequency information and further improves the quality of spatial-domain image synthesis, as verified by subsequent experimental results.
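A minimal PyTorch sketch of such a loss, assuming the power-weighting form given above with scale factor alpha, is shown below; the paper's exact implementation may differ.

```python
import torch

def focal_frequency_loss(pred, target, alpha=1.0):
    """Focal-frequency-style loss: weight each frequency by its own distance
    raised to a power so that hard-to-synthesize frequencies get larger weights.

    pred, target: tensors of shape (B, C, H, W).
    """
    # 2D DFT of both images (orthonormal scaling keeps magnitudes comparable)
    pred_freq = torch.fft.fft2(pred, norm="ortho")
    target_freq = torch.fft.fft2(target, norm="ortho")

    # Squared Euclidean distance between the complex spectra at every frequency
    distance = (pred_freq - target_freq).abs() ** 2

    # Dynamic spectrum weights: larger where the current distance is larger;
    # detached so the weights themselves are not optimized
    weight = distance.detach() ** (alpha / 2.0)
    weight = weight / (weight.max() + 1e-8)  # normalize weights to [0, 1]

    return (weight * distance).mean()
```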
5. Experiment
To verify the effectiveness of IFGAN, we conducted extensive experiments on two public datasets. First, we compare the proposed IFGAN with recent state-of-the-art (SOTA) synthesis methods and analyze the qualitative and quantitative results. Then, we analyze the constraints that FFL imposes on frequency information and provide the corresponding results. We also use ablation experiments to verify the effectiveness of the EIM and each loss term. The experiments are implemented in Python 3.9 with the PyTorch deep learning framework and OpenCV.
5.1. Training Datasets
BraTS (http://braintumorsegmentation.org/, accessed on 1 July 2021) [37], collected for brain tumor segmentation, consists of brain MRI scans provided by multiple medical centers. Each scan includes four modalities, namely Flair, T1, T1CE, and T2, where T1CE refers to T1-weighted contrast-enhanced images. The original 3D images and whole tumor (WT) masks were converted into paired axial 2D slices, which were randomly divided in a ratio of 7:2:1, resulting in 2352, 672, and 336 pairs for the training, testing, and validation sets, respectively.
SegRap (https://segrap2023.grand-challenge.org, accessed on 14 April 2023), a dataset for segmentation of organs at risk (OAR) and gross tumor volume (GTV) in patients with nasopharyngeal carcinoma, includes CT and CECT data from a total of 200 patients. Of these, 120 training cases are publicly provided with images and annotations; they were adjusted to an appropriate window width and level and standardized into paired, continuous two-dimensional slices. The slices with lesion areas were screened and randomly divided in a ratio of 7:2:1, resulting in 1344, 384, and 192 pairs for the training, testing, and validation sets, respectively.
5.2. Implementation Details and Metrics
All the experiments were conducted on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.6 GHz (Intel Corporation, Santa Clara, CA, USA) with 20 cores and 7 Tesla V100-SXM2 GPUs (NVIDIA Corporation, Santa Clara, CA, USA) using the same settings to ensure impartiality and objectivity. An Adam optimizer was adopted. The batch size was set to 4, and all image resolutions were set to 256 × 256. The weight-balance hyper-parameters of the three auxiliary loss terms were set to 100, 1, and 1, respectively.
To evaluate the synthesis performance of IFGAN, we compared it with seven state-of-the-art methods, including the medical image synthesis methods RIED-Net [6], BPGAN [7], RegGAN [9], Bi-MGAN [10], DC-cycleGAN [11], and DCE-MRI [12], as well as the adaptive domain synthesis method StarGAN V2 [8]. All of these methods can be used to achieve pre-to-post-contrast medical image synthesis, and their source code or interfaces are available for training and standardized performance comparisons. We re-trained all the models on our datasets and platform.
We used structural similarity (SSIM) [13] and multiscale structural similarity (MSIM) [10] to evaluate structural similarity and used the peak signal-to-noise ratio (PSNR) [13], normalized root-mean-square error (NRMSE) [10], and learned perceptual image patch similarity (LPIPS) [38] to evaluate visual perception. Additionally, we used the Fréchet inception distance (FID) [10] and GAN-seg [10] to measure how well the synthesized images fit the real-image manifold and the feature similarity of contrast enhancement lesion regions. Lower NRMSE, LPIPS, and FID values indicate better performance, and higher SSIM, PSNR, MSIM, and GAN-seg values reflect greater similarity to real images. For the symbols in all tables, ↑ means higher is better and ↓ means lower is better.
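As an illustrative example, several of these metrics can be computed per image pair with standard libraries; the snippet below uses scikit-image and assumes the images are 2D arrays scaled to [0, 1] (FID, LPIPS, and GAN-seg require dedicated models and are omitted here).

```python
import numpy as np
from skimage.metrics import (structural_similarity,
                             peak_signal_noise_ratio,
                             normalized_root_mse)

def evaluate_pair(real, fake):
    """Compute SSIM, PSNR, and NRMSE between a real and a synthesized image.

    real, fake: 2D numpy arrays with values in [0, 1].
    """
    return {
        "SSIM": structural_similarity(real, fake, data_range=1.0),
        "PSNR": peak_signal_noise_ratio(real, fake, data_range=1.0),
        "NRMSE": normalized_root_mse(real, fake),
    }

# Example usage with random placeholder images
real = np.random.rand(256, 256)
fake = np.random.rand(256, 256)
print(evaluate_pair(real, fake))
```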
5.3. Training Time and Throughput
To evaluate learning efficiency and computational consumption, we considered training time and throughput. The former measures the time required for the model to reach convergence, and the latter represents the number of images that the model can process per unit of time. A short training time and high throughput show that the model possesses high efficiency.
As shown in Table 1, IFGAN requires less training time than BPGAN, StarGAN V2, and DC-cycleGAN. Compared with Bi-MGAN and DCE-MRI, our IFGAN has higher throughput. Although DCE-MRI, RIED-Net, and RegGAN have advantages in training time, our IFGAN demonstrates commendable performance on both BraTS and SegRap while using only a moderate level of processing resources.
To examine the training process, we plotted the training loss curves of the IFGAN model on the two datasets. In Figure 4, (a) shows the training loss on BraTS, and (b) shows the training loss on SegRap. In both (a) and (b), the generative loss shows a downward trend, while the discriminative loss shows an increasing trend, which is consistent with the loss optimization goal. The loss curves of the generator and the discriminator exhibit opposite fluctuation patterns and gradually approach an equilibrium state as the number of epochs increases, which indicates that our model converges effectively.
5.4. Qualitative Results
To evaluate the visual perception of synthesized post-contrast images, we provide qualitative results of all methods on the BraTS and SegRap datasets in Figure 5 and Figure 6, respectively. We enlarged local lesion regions to compare the differences and employed an MAE heat map to visualize structural and shape differences between the post-contrast and real images. Regions with large differences appear as bright colors on the heat map, and fewer colored pixels indicate less deformation. The color bar on the right side maps colors to MAE values ranging from 0 to 1.
As shown in Figure 5 and Figure 6, the post-contrast results synthesized by IFGAN are closer to the real images than those synthesized by the other methods. Specifically, it can be seen from Figure 5 that the introduction of diversity loss in StarGAN V2 leads the model to misinterpret the anatomical structure. The lack of a hard constraint on pixel intensity differences in RegGAN can produce obvious noise because gray values are reflected incorrectly. The synthesis results of DCE-MRI and RIED-Net are obviously blurred. Most importantly, except for RIED-Net, none of these methods correctly reflect the contrast enhancement region. In Figure 6, it can also be seen that DCE-MRI fails to delineate the contrast enhancement region and produces blurred edges. The other methods capture more of the tiny contrast enhancement regions, which may be attributed to the smaller target range in the image; however, these methods do not enhance the extraction of specific features, and there is still a risk of the contrast enhancement region being ignored. The anatomical structure features yielded by RIED-Net, RegGAN, and StarGAN V2 were not learned correctly.
Although IFGAN, BPGAN, Bi-MGAN, and DC-cycleGAN all successfully capture tiny contrast enhancement regions while maintaining the correct anatomical structure, the MAE heat maps of IFGAN show better detail preservation, with less deviation from the real images. Combining all the results in Figure 5 and Figure 6, we can generally conclude that IFGAN successfully maintains contrast enhancement regions and anatomical structure and that the visual perception of images produced by IFGAN is close to that of real images.
5.5. Quantitative Results
To measure the quality of the synthesized post-contrast images, we used SSIM, PSNR, MSIM, and NRMSE to evaluate the degree of image distortion and structural changes on the BraTS and SegRap datasets. The quantitative comparison results are shown in Table 2 and Table 3.
As shown in Table 2, compared with the other methods, IFGAN achieved the best scores on all evaluation indicators. Specifically, on the BraTS dataset, our method achieved average increments of 10.4% in SSIM, 39.7% in PSNR, and 14.7% in MSIM, along with an average decrement of 51.7% in NRMSE. As shown in Table 3, compared with the other advanced methods, IFGAN achieved better PSNR, MSIM, and NRMSE results on the SegRap dataset, with average increments of 32.8% in PSNR and 2.4% in MSIM and an average decrement of 54.2% in NRMSE. It is worth noting that our IFGAN shows a minor drop in SSIM compared with Bi-MGAN, which introduces two adversarial systems and requires more training time; otherwise, our method possesses clear advantages in terms of PSNR and NRMSE.
Combining all the results in Table 2 and Table 3, we conclude that our IFGAN achieves superior performance in PSNR and NRMSE, along with competitive results in SSIM and MSIM, owing to the introduction of the enhanced interaction module, which improves the detail retention ability of the model so that anatomical structure and visual quality are better maintained.
Apart from the above comparisons, we also used LPIPS, FID, and GAN-seg to explore feature similarity and manifold fitting in the latent space. Based on a U-net segmentation model, GAN-seg measures the Dice similarity of the segmentation results of the contrast enhancement region in the generated post-contrast images. The evaluations are shown in Table 4.
From Table 4, it can be seen that our method achieved average decrements of 59.9% in LPIPS and 34.4% in FID. Although it exhibits slight drops in LPIPS and FID in some cases, our method still achieves better overall scores than the other methods and a clear advantage in GAN-seg, which indicates that IFGAN successfully captures enhanced regional features.
5.6. Focal Frequency Analysis
To further demonstrate that focal frequency loss preserves important frequency information, we present visualization results and the corresponding average spectral images. In Figure 7 and Figure 8, the first row shows the real post-contrast images and the corresponding average spectral images, the second row shows the generated post-contrast images and the corresponding average spectral images without focal frequency loss, and the third row shows the results with focal frequency loss applied. For the SegRap dataset, images were cropped so that the frequency conversion focuses on the foreground physiological tissue. Applying FFL on the BraTS dataset yielded clearer textures, while the changes in the SegRap results are more subtle. The average spectral images indicate that focal frequency loss significantly narrows the frequency-domain gap between real and generated post-contrast images on both datasets.
In spectral images, the central area reflects low-frequency components (red and yellow pixels), while the surrounding regions correspond to high-frequency components (blue pixels). A lack of high-frequency information results in image blurring and loss of texture, while spectral shifts can cause distortion of details. Focal frequency loss helps reduce the gap in the frequency domain, preserving essential frequency information and maintaining texture details. Compared to the SegRap dataset, the brain tissue MRI images in the BraTS dataset offer richer anatomical structure information. The focal frequency loss demonstrates a more pronounced improvement in detail for the BraTS dataset, highlighting its effectiveness in preserving real textures.
5.7. Bidirectional Synthesis Analysis
IFGAN can achieve bidirectional synthesis of medical images using a single generator, demonstrating its generality. It can realize both pre-to-post-contrast and post-to-pre-contrast medical image synthesis without re-training for each mapping direction. The qualitative results comparing IFGAN with other bidirectional synthesis methods on the BraTS and SegRap datasets are shown in Figure 9.
As shown in Figure 9, IFGAN achieved generally satisfactory results compared with the other bidirectional synthesis methods. In pre-to-post-contrast synthesis, IFGAN maintains attention to the enhanced region and anatomical structure details. In post-to-pre-contrast synthesis, the results of IFGAN are also closer to the real images. To evaluate the proposed method comprehensively, we also provide quantitative results for the other mapping direction (post-to-pre-contrast) in Table 5, where the proposed method still achieves the highest scores.
Combining all the results in Figure 9 and Table 2, Table 3, Table 4 and Table 5 shows that IFGAN can achieve high-quality bidirectional synthesis of pre- and post-contrast medical images, with wide applicability and flexibility.
5.8. Discussion and Limitations
To evaluate the impact of the designed EIM and the various losses on the quality of the generated post-contrast images, we designed the ablation experiments presented in this section. Excluding each component separately helps explain its contribution to preserving anatomical structure and texture details. In Table 6, Figure 10, and Figure 11, 'w/o EIM' represents the IFGAN variant with the EIM removed, 'w/o ffl' represents the variant without focal frequency loss, 'w/o rec' represents the variant without reconstruction loss, and 'w/o pmae' represents the variant without the pixel-wise mean absolute error.
As shown in Table 6, the scores of all evaluation indicators decreased significantly for 'w/o pmae', followed by 'w/o EIM'. After removing the focal frequency loss and the reconstruction loss, the scores decreased slightly.
The qualitative results presented in Figure 10 and Figure 11 are consistent with the quantitative results. In Figure 10, the anatomical structure of the post-contrast image generated by 'w/o pmae' is blurred or even distorted. The capture of the enhanced region by 'w/o EIM' is clearly insufficient and slightly inferior to that achieved with the proposed method. Compared with 'w/o ffl' and 'w/o rec', the proposed method retains texture details more realistically. In Figure 11, the subjective influence of each loss function on the SegRap dataset is not obvious, but its necessity is shown in Table 6.
Figure 10 and Figure 11 demonstrate that the pixel-wise mean absolute error is crucial for preserving the image structure. The EIM component helps improve image quality by focusing on the local contrast enhancement region, while the reconstruction loss and focal frequency loss further ensure the integrity of the anatomical structure and details. This shows that the EIM and the various losses all contribute to improving the quality of the generated post-contrast images on the BraTS and SegRap datasets.
Although IFGAN has made significant progress in pre-to-post-contrast medical image synthesis, several problems remain to be solved. First, the EIM enhances attention to enhanced regional features through its structural design, but visual and quantifiable methods for explaining its extraction of specific features are still lacking. Second, IFGAN shows potential for flexible bidirectional mapping between two domains: instead of designing a separate generator for each domain, it achieves the mapping by sharing latent features and adjusting feature statistics, which makes it possible to extend the model to multi-domain adaptive synthesis in future work.