Article

Hybrid Multimodal Medical Image Fusion Method Based on LatLRR and ED-D2GAN

1
School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
2
Key Laboratory of Image and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
3
School of Science, Ningxia Medical University, Yinchuan 750004, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(24), 12758; https://doi.org/10.3390/app122412758
Submission received: 9 November 2022 / Revised: 9 December 2022 / Accepted: 9 December 2022 / Published: 12 December 2022
(This article belongs to the Special Issue Advanced Technologies in Medical Image Processing and Analysis)

Abstract

In order to better preserve the anatomical structure information of Computed Tomography (CT) source images and highlight the metabolic information of lesion regions in Positron Emission Tomography (PET) source images, a hybrid multimodal medical image fusion method (LatLRR-GAN) based on Latent low-rank representation (LatLRR) and a dual-discriminator Generative Adversarial Network (ED-D2GAN) is proposed. Firstly, considering the denoising capability of LatLRR, the source images were decomposed by LatLRR. Secondly, the ED-D2GAN model was put forward as the low-rank region fusion method, which can fully extract the information contained in the low-rank region images. In this model, encoder and decoder networks are used in the generator, and convolutional neural networks are used in the dual discriminators. Thirdly, a threshold adaptive weighting algorithm based on the region energy ratio is proposed as the salient region fusion rule, which can improve the overall sharpness of the fused image. The experimental results show that, compared with the best of the other six methods, the proposed method is effective on multiple objective evaluation metrics, including the average gradient, edge intensity, information entropy, spatial frequency and standard deviation. Across the two experiments, these metrics are improved by 35.03%, 42.42%, 4.66%, 8.59% and 11.49% on average, respectively.

1. Introduction

Common medical images, such as Computed Tomography (CT) images, have high spatial resolution and can provide accurate anatomical information about lesions for the clinical diagnosis of patients [1]. However, due to their low soft-tissue resolution, CT images have certain limitations in qualitative diagnosis [2]. Positron Emission Tomography (PET) images are highly sensitive for the early diagnosis of tumors, but because of their low spatial resolution, they cannot provide accurate anatomical structure information about the lesion [3]. In the face of complex diseases, medical images of a single modality cannot provide sufficient auxiliary information for clinicians, whereas fused images can simultaneously present the effective information of images of different modalities and thus improve the ability to identify the lesion area. Image fusion therefore has important clinical application value in early diagnosis, clinical staging, localization of lesion areas, formulation of diagnosis and treatment plans, and evaluation of the curative effect of tumors.
So far, a large number of image fusion methods have been proposed, mainly including methods based on multi-scale decomposition, sparse representation, deep learning, and hybrid models. Among them, image fusion methods based on multi-scale decomposition [4] first decompose the source images into different low-frequency and high-frequency sub-bands. Then, specific fusion rules are used to synthesize each sub-band of the fused image. Finally, the fusion result is reconstructed by the corresponding inverse transform [5]. Diwakar et al. [6] proposed a multi-modal medical image fusion method in the non-subsampled shearlet transform (NSST) domain for the Internet of Medical Things. The source images were decomposed into low-frequency and high-frequency components by an NSST-based decomposition method. In the low-frequency component, weighted fusion based on significance features is performed using multi-local extrema (MLE) and a co-occurrence filter, while a fuzzy-logic-based fusion rule is used in the high-frequency component. Such methods can effectively capture the detail information of the source images, but they require manually designed, complex fusion rules, and the number of decomposition layers and the design of the fusion rules directly affect the quality of the final fused images. Compared with methods based on multi-scale decomposition, sparse representation methods [7] divide the source images into multiple overlapping blocks with a sliding window and share the same set of sparse coefficients across the high-frequency and low-frequency images, which can reduce visual artifacts in the fused images and improve robustness to registration errors. Li et al. [8] proposed a method for multi-modal medical image denoising and fusion based on sparse representation, in which group sparse representation provides satisfactory fusion results with fewer artifacts thanks to its strong robustness. However, these methods are very time-consuming, and dictionary learning is complex.
In recent years, deep learning models have been widely used in image segmentation [9], image analysis [10], image detection [11], image fusion [12], and image classification [13] owing to their good feature extraction and representation capabilities. In the field of image fusion, Liu et al. [14] first introduced the Convolutional Neural Network (CNN) into multi-focus image fusion. Learning a CNN model effectively avoids the need of traditional methods to design complex fusion rules, but this method only uses the results of the last layer of the network. FusionGAN [15], proposed by Ma et al., introduced the Generative Adversarial Network (GAN) [16] into the image fusion field, establishing an adversarial game between the generator and discriminator that yields fused images with outstanding target information. Fu et al. [17] realized end-to-end anatomical and functional medical image fusion based on GAN and obtained fused images with clear edges and rich details. However, these methods use only one discriminator, which easily causes the loss of effective information in the fused images, and fusion methods based on deep learning often pay little attention to noise processing of the source images. In addition, compared with methods based on multi-scale decomposition, it remains a challenge to achieve effective fusion by designing network architectures and loss functions.
Considering the advantages and limitations of any single fusion method, researchers have proposed methods based on hybrid models that apply deep learning within the framework of traditional image fusion to further improve the quality of fused images. Latent low-rank representation (LatLRR) [18] is usually used in clustering analysis tasks; it can remove the noise region contained in the source images and extract the global and local structure of the data. Gao et al. [19] combined LatLRR with CNN, used LatLRR and Rolling Guided Image Filtering (RGIF) to decompose the source images at two levels, and used CNN-based fusion rules to fuse the detail layers, thus improving the contrast and sharpness of the fused images. Xia et al. [20] proposed a multi-modal medical image fusion method based on multi-scale transformation and a deep stacked convolutional neural network (DSCNN). In this method, the trained DSCNN model was used to decompose the source images into low-frequency and high-frequency images, which were then fused separately. The proposed method can adaptively decompose and reconstruct the image in the fusion process. However, image fusion methods based on CNN depend strongly on the quality of the source images, and, owing to the particularity of medical images, only a small number of registered medical images are available in current public datasets. Therefore, the application of CNN-based image fusion methods in multimodal medical image fusion has certain limitations. Wang et al. [21] proposed a medical image fusion method based on GAN and the shift-invariant shearlet transform (SIST). In this method, SIST was used to decompose the source images, the trained GAN model served as the fusion rule for the high-pass sub-bands, and the low-pass sub-bands were fused by local energy-weighted summation and a bilateral filter. This can effectively suppress artifacts and distortion in the fused images, but some details of the source images are easily lost because only one discriminator is used.
To address the above problems, in order to better preserve the anatomical structure and contour intensity information of the CT source images, highlight the functional and metabolic information of the lesion area in the PET source images, and enhance the visibility of the source images, this paper proposes a hybrid multimodal medical image fusion method (LatLRR-GAN) that combines LatLRR with a dual-discriminator GAN (ED-D2GAN). Considering the denoising ability of LatLRR, and in order to reduce the impact of noise in the fusion process, this paper first uses LatLRR to decompose the source images. Secondly, GAN has the following advantages: (1) strong feature extraction ability; (2) an end-to-end image fusion process; and (3) the quality of the fused images can be continuously adjusted through the adversarial game between the generator and discriminator. This paper therefore introduces GAN into the framework of multimodal medical image fusion based on decomposition transformation, alleviating the need to manually design complex fusion rules and improving the quality of the fused images. Additionally, considering the shortcomings of using a single discriminator, this paper uses dual discriminators to fully capture the valid information contained in the source images and highlight the lesion information in the fused images. Finally, because a fusion rule based on region energy can capture local region features and improve the sharpness of the fused images, this paper uses a threshold adaptive weighting algorithm based on the region energy ratio as the salient region fusion rule.
The main contributions can be summarized as follows:
  • A hybrid multimodal medical image fusion method based on LatLRR and ED-D2GAN is proposed, which can effectively realize the fusion of CT and PET images.
  • An image fusion strategy based on a dual discriminator GAN is proposed. Encoder and decoder networks are used in the generator; CNNs are used in dual discriminators, which can effectively preserve the anatomical structure information in CT source images and the functional information of the lesion region in PET source images.
  • A threshold adaptive weighting algorithm based on a region energy ratio is used as the fusion rule of salient region images, which improves the quality of fused images.
The remainder of the paper is presented in four sections. The fusion methods of low-rank region images and salient region images are described in detail in Section 2. Section 3 provides and analyses the experimental details and results. The experimental results and future work directions are discussed in Section 4. Finally, this paper is concluded in Section 5.

2. Proposed Method

The overall network architecture of this paper is shown in Figure 1. Among them, the overall framework process of the proposed method is described in Figure 1a, the overall network architecture of ED-D2GAN is shown in Figure 1b, the network structure of generator is shown in Figure 1c, and the network structure of the CT discriminator and PET discriminator of ED-D2GAN is shown in Figure 1d and Figure 1e, respectively.
This section describes the proposed fusion method in detail. Medical images often involve various tissues and organs of the human body and are therefore characterized by a large amount of data, complex structure, and significant noise. Therefore, in order to discard the noise in the source images and improve the quality of the fused images, LatLRR is first used to decompose the source images into a low-rank region, a salient region, and a noise region. The decomposition process of LatLRR is shown in Figure 2. Secondly, because the lesions occupy only a small area of the whole image, the background of the medical images must be handled properly to highlight the lesion information; accordingly, a fusion model based on ED-D2GAN is proposed to extract deeper features of the low-rank region images. The fusion process is shown in Figure 1b. Thirdly, the salient region images mainly reflect the detailed characteristics and edge information of the source images, so the choice of the salient region fusion rule has a great influence on the sharpness and edge distortion of the fused images. Because a pixel-based fusion rule cannot accurately reflect the strong correlation among multiple pixels in a local region, a threshold adaptive weighting algorithm based on the region energy ratio is proposed as the salient region fusion rule. The specific fusion process is given in Equations (8) to (13). Finally, the noise region images are discarded, and the final fused image is reconstructed by linear addition of the low-rank region fused image and the salient region fused image.
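For reference, the LatLRR decomposition used here can be stated as the following optimization problem; this is the standard formulation from [18], written out for completeness with X denoting a source image arranged as a data matrix:
$\min_{Z, L, E} \; \|Z\|_* + \|L\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad X = XZ + LX + E$,
where XZ is the low-rank component, LX is the salient component, E is the sparse noise component, ‖·‖_* denotes the nuclear norm, ‖·‖_1 denotes the L1 norm, and λ > 0 balances the noise term.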

2.1. Low-Rank Region Images Fusion Method

Due to the difference between PET and CT imaging mechanisms, the gray values of the two kinds of images are different. Malignant lesions with high metabolism appear as dark areas in PET images, while CT images can clearly show the distribution of bones and organs. In addition, the low-rank region images obtained by LatLRR decomposition concentrate most of the effective information of the source images. Considering that low-rank region images mainly present background information, an image fusion method based on ED-D2GAN is proposed in this paper to better highlight the lesions in the fused images. The low-rank region image of CT obtained by LatLRR decomposition is first enhanced to improve the detail and contrast information in the low-rank region fused image. In addition, considering that the single discriminator used in the basic GAN model cannot retain all the valid information contained in the two source images at the same time, this paper uses two discriminators to discriminate the fused image against each input source image, respectively, to improve the quality of the fused image. The generator and discriminators play adversarial roles in the whole architecture. The overall network architecture of ED-D2GAN is shown in Figure 1b.

2.1.1. Generator

In order to obtain more detailed information from the source images, a generator network architecture based on an encoder and decoder is designed in this paper, which is used to fuse the enhanced image CT_L1_E of the CT low-rank region decomposed by LatLRR and the PET low-rank region image PET_L2 decomposed by LatLRR. The generator network architecture is shown in Figure 1c. Feature extraction and fusion are performed in the encoder. Firstly, CT_L1_E and PET_L2 are concatenated in the channel dimension as the input of the encoder network, which outputs the fused feature maps. Finally, the fused feature maps are reconstructed in the decoder to obtain a low-rank region fused image with the same resolution as the source images. The encoder contains five convolutional layers, and the stride of each convolutional layer is set to 1. The decoder also contains five convolutional layers, and its network architecture is shown in Figure 1c. In order to better preserve the contrast information of the source images, a Batch Normalization (BN) layer [22] is introduced to overcome the sensitivity to data initialization, avoid gradient explosion or vanishing, and accelerate network training. In addition, a LeakyReLU activation function is used to improve the network performance.
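As a concrete illustration, the following tf.keras sketch builds a generator with the layout described above (two single-channel inputs concatenated along the channel axis, five stride-1 convolutional layers with BN and LeakyReLU in the encoder, five convolutional layers in the decoder, and a single-channel output at the source resolution). The filter counts, kernel size, and output activation are not specified in the paper and are assumptions made only for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_lrelu(x, filters, kernel=3):
    # Conv (stride 1) -> BatchNorm -> LeakyReLU, matching the layer pattern in the text.
    x = layers.Conv2D(filters, kernel, strides=1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def build_generator(height=256, width=256):
    ct_l1_e = layers.Input((height, width, 1), name="CT_L1_E")  # enhanced CT low-rank image
    pet_l2 = layers.Input((height, width, 1), name="PET_L2")    # PET low-rank image
    x = layers.Concatenate(axis=-1)([ct_l1_e, pet_l2])          # channel-wise concatenation

    # Encoder: five stride-1 convolutional layers for feature extraction and fusion.
    for filters in (16, 32, 64, 64, 64):                        # assumed channel widths
        x = conv_bn_lrelu(x, filters)

    # Decoder: five convolutional layers reconstructing a one-channel fused image
    # at the same resolution as the inputs (output activation is an assumption).
    for filters in (64, 64, 32, 16):
        x = conv_bn_lrelu(x, filters)
    fused = layers.Conv2D(1, 3, padding="same", name="fused_low_rank")(x)
    return tf.keras.Model([ct_l1_e, pet_l2], fused, name="ED_D2GAN_generator")

generator = build_generator()
```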

2.1.2. Discriminator

The discriminators are designed to act against the generator: D_CT aims to correctly distinguish the generated image I from CT_L1_E, and D_PET aims to correctly distinguish the generated image I from PET_L2. In the proposed method, D_CT and D_PET are two independent discriminators with the same architecture. Compared with the generator, the discriminator architecture is relatively simple, as shown in Figure 1d and Figure 1e, respectively. The discriminator is made up of three convolutional layers. In order to avoid introducing noise, convolutional layers with a stride of 2 were used instead of pooling layers so that the discriminator has a better classification effect. The LeakyReLU activation function is used in the first three layers, and the tanh activation function is used in the last layer to generate a scalar that estimates the probability that the input image comes from the low-rank region source images rather than from the generated image.
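The sketch below builds one such discriminator with tf.keras following this description (three stride-2 convolutional layers with LeakyReLU and a final tanh layer producing a scalar score). The filter counts and the use of a dense output layer are assumptions; two independent copies are instantiated for D_CT and D_PET.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(name, height=256, width=256):
    img = layers.Input((height, width, 1))
    x = img
    for filters in (32, 64, 128):                                    # assumed channel widths
        x = layers.Conv2D(filters, 3, strides=2, padding="same")(x)  # stride 2 replaces pooling
        x = layers.LeakyReLU()(x)
    x = layers.Flatten()(x)
    score = layers.Dense(1, activation="tanh")(x)                    # scalar real-vs-generated score
    return tf.keras.Model(img, score, name=name)

d_ct = build_discriminator("D_CT")
d_pet = build_discriminator("D_PET")
```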

2.1.3. Loss Function

The model architecture of ED-D2GAN consists of three parts: the generator G, the discriminator D_CT, and the discriminator D_PET. Therefore, the loss function also includes three parts: the generator loss function L_G, the discriminator loss function L_DCT, and the discriminator loss function L_DPET.
  1. Loss function of generator
Since the training process of the basic GAN model is unstable, this paper uses a content loss to impose additional constraints on the generator. The loss function of the generator consists of the content loss between the generated image and the low-rank region images of the source images, and the adversarial loss between the generator and the discriminators, defined as follows:
$L_G = L_{G_{adv}} + \alpha L_c$,  (1)
where L_Gadv is the adversarial loss, L_c is the content loss, and α is a parameter that controls the trade-off.
L_Gadv directs the generator to generate a realistic fused image through the adversarial game between the generator and the discriminators, in order to fool both discriminators. It is defined as follows:
$L_{G_{adv}} = \mathbb{E}\left[\log\left(1 - D_{CT}\left(G(CT\_L1\_E, PET\_L2)\right)\right)\right] + \mathbb{E}\left[\log\left(1 - D_{PET}\left(G(CT\_L1\_E, PET\_L2)\right)\right)\right]$,  (2)
where D_CT and D_PET represent the two discriminators, G(CT_L1_E, PET_L2) represents the fused image, and E denotes the mathematical expectation; that is, the generator expects the generated fused image to deceive the discriminators.
L_c constrains the content similarity between the generated image and the low-rank region images of the source images, so that the fused image retains more of the effective information of the source images. Since the texture details of CT images are mainly characterized by gradient changes, while the functional information of PET images can be characterized by pixel intensity, L_c includes a gradient loss and an intensity loss, defined as follows:
$L_c = L_{grad} + \beta L_{in}$,  (3)
where L_grad represents the gradient loss, L_in represents the intensity loss, and β is a parameter controlling the trade-off.
L_grad measures the degree to which texture details are retained in the fused image. In order to retain finer texture information in the fused image, this paper constructs the gradient loss according to the maximum-selection principle, which not only enhances the preservation of texture detail information but also effectively prevents the diffusion of edge detail in high-contrast areas. L_grad is defined as follows:
$L_{grad} = \left\| \max\left( \left| \nabla^2 CT\_L1\_E \right|, \left| \nabla^2 PET\_L2 \right| \right) - \left| \nabla^2 G(CT\_L1\_E, PET\_L2) \right| \right\|_1$,  (4)
where |·| represents the absolute value function, ‖·‖_1 is the L1 norm, max(·) is the element-wise maximum function, and ∇² is the Laplacian gradient operator.
L_in constrains the fused image to maintain an intensity distribution similar to that of the source images, thus maintaining significant contrast information. L_in is defined as follows:
$L_{in} = \gamma \left\| G(CT\_L1\_E, PET\_L2) - PET\_L2 \right\|_F^2 + (1 - \gamma)\left\| G(CT\_L1\_E, PET\_L2) - CT\_L1\_E \right\|_F^2$,  (5)
where ‖·‖_F is the Frobenius norm, and γ is the parameter that controls the trade-off.
  2. Loss function of discriminator
In this paper, the two independent discriminators D_CT and D_PET are used to constrain the generator to capture more contrast information and texture information, respectively. The corresponding loss functions L_DCT and L_DPET are defined as follows:
$L_{D_{CT}} = \mathbb{E}\left[\log D_{CT}(CT\_L1\_E)\right] + \mathbb{E}\left[\log\left(1 - D_{CT}\left(G(CT\_L1\_E, PET\_L2)\right)\right)\right]$,  (6)
$L_{D_{PET}} = \mathbb{E}\left[\log D_{PET}(PET\_L2)\right] + \mathbb{E}\left[\log\left(1 - D_{PET}\left(G(CT\_L1\_E, PET\_L2)\right)\right)\right]$,  (7)
where L_DCT and L_DPET are both cross-entropy loss functions. The discriminator D_CT is used to accurately distinguish the fused image from the CT low-rank image CT_L1_E, and the discriminator D_PET is used to accurately distinguish the fused image from the PET low-rank image PET_L2. A code sketch of these loss terms is given after this list.
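Below is a minimal TensorFlow sketch of the loss terms in Equations (1) to (7). It assumes batched single-channel image tensors, approximates the expectation by a batch mean, uses a standard 3 × 3 Laplacian kernel for the ∇² operator, and assumes the discriminator scores have been mapped to probability-like values in (0, 1); the default α, β, and γ values are taken from Section 2.1.4.

```python
import tensorflow as tf

# 3x3 Laplacian kernel used to approximate the nabla^2 operator in Eq. (4).
_LAP = tf.reshape(tf.constant([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]), [3, 3, 1, 1])

def laplacian(img):
    # img: [batch, H, W, 1] float tensor.
    return tf.nn.conv2d(img, _LAP, strides=1, padding="SAME")

def content_loss(fused, ct_l1_e, pet_l2, beta=5.0, gamma=10.0):
    # Eq. (4): keep the stronger Laplacian response of the two source images.
    l_grad = tf.reduce_mean(tf.abs(
        tf.maximum(tf.abs(laplacian(ct_l1_e)), tf.abs(laplacian(pet_l2)))
        - tf.abs(laplacian(fused))))
    # Eq. (5): Frobenius-norm (mean-squared) intensity similarity to both sources.
    l_in = gamma * tf.reduce_mean(tf.square(fused - pet_l2)) \
         + (1.0 - gamma) * tf.reduce_mean(tf.square(fused - ct_l1_e))
    return l_grad + beta * l_in                                   # Eq. (3)

def generator_adv_loss(p_ct_fake, p_pet_fake, eps=1e-8):
    # Eq. (2): the generator tries to fool both discriminators.
    return tf.reduce_mean(tf.math.log(1.0 - p_ct_fake + eps)) \
         + tf.reduce_mean(tf.math.log(1.0 - p_pet_fake + eps))

def generator_loss(p_ct_fake, p_pet_fake, fused, ct_l1_e, pet_l2, alpha=0.4):
    # Eq. (1): adversarial loss plus weighted content loss.
    return generator_adv_loss(p_ct_fake, p_pet_fake) \
         + alpha * content_loss(fused, ct_l1_e, pet_l2)

def discriminator_objective(p_real, p_fake, eps=1e-8):
    # Eqs. (6)-(7): score real low-rank source images high and generated images low.
    return tf.reduce_mean(tf.math.log(p_real + eps)) \
         + tf.reduce_mean(tf.math.log(1.0 - p_fake + eps))
```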

2.1.4. Training Details

In this paper, 400 images (200 CT images and 200 PET images) were selected as the ED-D2GAN training images from the training dataset of lung tumor patients provided by a tertiary Grade-A hospital in Ningxia. This dataset is described in detail in Section 3.1.1. In order to meet the input size of the model, the images of the training dataset were resized to 256 × 256 pixels, and the original RGB three-channel images were converted to grayscale images. During the training of ED-D2GAN, the network was trained for 12 epochs with a batch size of 16; the learning rate of the whole network was 2 × 10⁻⁴ and decayed exponentially to 0.9 of its previous value after each epoch. The generator and discriminators used the RMSProp optimizer and the Adam optimizer, respectively, with α = 0.4, β = 5, and γ = 10.
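The following sketch wires the pieces above into the training configuration just described (12 epochs, batch size 16, initial learning rate 2 × 10⁻⁴ decayed by 0.9 per epoch, RMSProp for the generator and Adam for the discriminators). Dataset loading, the mapping of the tanh discriminator score to a probability, and the use of the same learning-rate schedule for all three networks are assumptions.

```python
import tensorflow as tf

EPOCHS, BATCH_SIZE = 12, 16
STEPS_PER_EPOCH = 200 // BATCH_SIZE            # 200 CT/PET training pairs

lr = tf.keras.optimizers.schedules.ExponentialDecay(
    2e-4, decay_steps=STEPS_PER_EPOCH, decay_rate=0.9, staircase=True)
gen_opt = tf.keras.optimizers.RMSprop(lr)
d_ct_opt, d_pet_opt = tf.keras.optimizers.Adam(lr), tf.keras.optimizers.Adam(lr)

to_prob = lambda t: (t + 1.0) / 2.0            # map tanh scores to (0, 1) (assumption)

@tf.function
def train_step(ct_l1_e, pet_l2):
    with tf.GradientTape(persistent=True) as tape:
        fused = generator([ct_l1_e, pet_l2], training=True)
        p_ct_fake, p_pet_fake = to_prob(d_ct(fused)), to_prob(d_pet(fused))
        g_loss = generator_loss(p_ct_fake, p_pet_fake, fused, ct_l1_e, pet_l2)
        # The discriminator objectives (Eqs. 6-7) are maximized, so their negatives are minimized.
        d_ct_loss = -discriminator_objective(to_prob(d_ct(ct_l1_e)), p_ct_fake)
        d_pet_loss = -discriminator_objective(to_prob(d_pet(pet_l2)), p_pet_fake)
    gen_opt.apply_gradients(zip(tape.gradient(g_loss, generator.trainable_variables),
                                generator.trainable_variables))
    d_ct_opt.apply_gradients(zip(tape.gradient(d_ct_loss, d_ct.trainable_variables),
                                 d_ct.trainable_variables))
    d_pet_opt.apply_gradients(zip(tape.gradient(d_pet_loss, d_pet.trainable_variables),
                                  d_pet.trainable_variables))
    del tape
    return g_loss
```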

2.2. Salient Region Images Fusion Method

Salient region images reflect important information, such as the edge contours and texture details of the source images. Therefore, the choice of salient region fusion rule is directly related to the sharpness and edge intensity of the fused image. The local features of an image are expressed not by a single pixel but by multiple, strongly correlated pixels in a local region, so a simple weighted fusion rule based on single pixels cannot reflect the feature information of the region well. A fusion rule based on regional energy can overcome this one-sidedness and obtain more local feature information of the images. Therefore, the fusion method based on regional energy [23] was selected as the basis of the salient region fusion rule in this paper. However, when the energies of two local regions are similar, directly selecting the pixel value from the region with the higher energy easily causes a relative loss of information. Therefore, a threshold adaptive weighted fusion algorithm based on the regional energy ratio is proposed in this paper. As the regional center pixel and its corresponding regional energy change, a weight matrix is used to adjust the weighting coefficients adaptively, so that the details of the fused image are fully preserved. The fusion process is as follows.
Firstly, the salient region images CT_S1 and PET_S2 obtained by LatLRR decomposition are scanned by a sliding window, and the regional energies E_CT(m, n) and E_PET(m, n) of the pixel centered at (m, n) are obtained. The calculation formulas are as follows:
$E_{CT}(m, n) = \sum_{i}^{S} \sum_{j}^{T} W(i, j) \times \left[ CT\_S1(i + m, j + n) \right]^2$,  (8)
$E_{PET}(m, n) = \sum_{i}^{S} \sum_{j}^{T} W(i, j) \times \left[ PET\_S2(i + m, j + n) \right]^2$,  (9)
where (i, j) is the relative offset of a pixel in the region window from the center pixel, S and T are the maximum row and column coordinates of the region window, and W is a weight matrix of size 3 × 3. The normalized W is defined as:
$W = \frac{1}{16} \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$,  (10)
Then, the regional energy ratio E_ratio(m, n) is obtained from the regional energies as follows:
$E_{ratio}(m, n) = \frac{E_{PET}(m, n)}{E_{CT}(m, n)}$,  (11)
Finally, the salient region fused image I_sf is calculated by the weighting method:
$I_{sf}(m, n) = w_1 \times PET\_S2(m, n) + w_2 \times CT\_S1(m, n)$,  (12)
where w_1 and w_2 are the weighting coefficients, calculated as follows:
$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{cases} [0,\ 1]^T, & E_{ratio} < th_1 \\ \left[ \dfrac{E_{PET}}{E_{PET} + E_{CT}},\ \dfrac{E_{CT}}{E_{PET} + E_{CT}} \right]^T, & th_1 < E_{ratio} < th_2 \\ [1,\ 0]^T, & E_{ratio} > th_2 \end{cases}$  (13)
where [·]^T is the matrix transpose operator, and th_1 and th_2 are threshold coefficients determined according to the overall energy distribution of the images. When the energy ratio of the region is too small or too large, the weight of the region with higher energy is set to 1 and the weight of the region with lower energy is set to 0. When the energy ratio is within the threshold range, the adaptive weights are calculated from the energy ratio: the larger the region energy, the larger the corresponding weighting coefficient and the higher its proportion in the fused image; conversely, the smaller the region energy, the lower its proportion in the fused image.
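A NumPy sketch of this salient-region fusion rule (Equations (8) to (13)) is shown below. The threshold values th1 and th2 and the zero-padded border handling are illustrative assumptions; the paper states only that the thresholds are chosen from the overall energy distribution of the images.

```python
import numpy as np
from scipy.ndimage import correlate

# Eq. (10): normalized 3x3 weight matrix.
W = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=float) / 16.0

def region_energy(img):
    # Eqs. (8)-(9): weighted sum of squared intensities over the 3x3 window.
    return correlate(img.astype(float) ** 2, W, mode="constant")

def fuse_salient(ct_s1, pet_s2, th1=0.5, th2=2.0):   # th1/th2 are illustrative values
    e_ct, e_pet = region_energy(ct_s1), region_energy(pet_s2)
    ratio = e_pet / (e_ct + 1e-12)                    # Eq. (11)
    total = e_ct + e_pet + 1e-12

    # Eq. (13): adaptive weights; w1 weights PET_S2 and w2 weights CT_S1.
    w1 = np.where(ratio < th1, 0.0, np.where(ratio > th2, 1.0, e_pet / total))
    w2 = np.where(ratio < th1, 1.0, np.where(ratio > th2, 0.0, e_ct / total))

    return w1 * pet_s2 + w2 * ct_s1                   # Eq. (12)
```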
After obtaining the low-rank region fused image and the salient region fused image, the final fused image is reconstructed by linear addition, as follows:
$I_{lf} + I_{sf} = I_f$,  (14)
where I_lf and I_sf represent the low-rank region fused image and the salient region fused image, respectively, and I_f represents the final fused image. The proposed fusion algorithm is summarized in Algorithm 1.
Algorithm 1. The proposed multimodal medical image fusion algorithm.
Input: CT image and PET image.
Output: Fused image I_f.
Stage 1: Image decomposition
  CT → CT_L1, CT_S1, CT_N1;
  PET → PET_L2, PET_S2, PET_N2;
Stage 2: Image fusion
  1. Fuse the low-rank region images according to Equations (1) to (7) to obtain the low-rank region fused image I_lf;
  2. Fuse the salient region images according to Equations (8) to (13) to obtain the salient region fused image I_sf;
Stage 3: Image reconstruction according to Equation (14);
Output the fused image I_f.
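For completeness, the sketch below strings Algorithm 1 together end to end. It assumes a LatLRR solver latlrr() returning the low-rank, salient, and noise components of a grayscale image, an enhancement step enhance() for the CT low-rank image (the paper does not specify the enhancement operator), the trained ED-D2GAN generator from Section 2.1, and the fuse_salient() routine sketched above; these names are placeholders, not part of the paper.

```python
import numpy as np

def fuse_ct_pet(ct, pet, generator):
    # Stage 1: LatLRR decomposition; the noise components are discarded.
    ct_l1, ct_s1, _ct_n1 = latlrr(ct)
    pet_l2, pet_s2, _pet_n2 = latlrr(pet)

    # Stage 2a: low-rank region fusion with the trained ED-D2GAN generator (Eqs. (1)-(7)).
    ct_l1_e = enhance(ct_l1)                               # enhanced CT low-rank image
    inputs = [ct_l1_e[None, :, :, None].astype("float32"),
              pet_l2[None, :, :, None].astype("float32")]
    i_lf = np.squeeze(generator.predict(inputs))

    # Stage 2b: salient region fusion with the region-energy-ratio rule (Eqs. (8)-(13)).
    i_sf = fuse_salient(ct_s1, pet_s2)

    # Stage 3: reconstruction by linear addition (Eq. (14)).
    return i_lf + i_sf
```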

3. Experimental Section

In this section, the dataset employed in this work and the experimental environments are introduced in Section 3.1. Then, in Section 3.2, the comparison methods and the objective evaluation metrics are presented. To demonstrate the effectiveness of the proposed LatLRR-GAN, qualitative and quantitative comparisons with representative and state-of-the-art methods and the corresponding analysis are given in Section 3.3. In Section 3.4, ablation experiments are described.

3.1. Dataset and Experimental Environments

3.1.1. Dataset

The dataset used in this paper was collected from 95 clinical patients with lung tumors who underwent PET and CT general examination at a tertiary Grade-A hospital in Ningxia from January 2018 to June 2020. There were 46 female patients and 49 male patients, ranging in age from 39 to 76 years, with an average age of 50.635. Among the 95 patients, 40 were smokers, whose smoking history ranged from 2 to 25 years, with an average of 12.112 years. Before imaging, all patients fasted for 6 h, had their blood glucose controlled below 10, urinated, and removed any metal jewelry. The patients were examined after an intravenous injection of fluorodeoxyglucose. One hour after the imaging agent was injected, PET and CT images of the lungs and trunk were taken while the patients lay supine in a quiet, dark room for 45 to 60 min. After scanning, transverse, sagittal and coronal plane images were selected. To ensure correct labeling of the lesion area, the collected data were evaluated by two imaging clinicians. When the two clinicians disagreed, three clinicians with more than 10 years of experience in tumor imaging diagnosis were invited to make a joint diagnosis, and the result was decided by the majority opinion. Patients with special conditions were diagnosed in combination with clinical practice. After rotation, mirroring and other data augmentation processing, image datasets of the PET/CT, PET and CT modalities were constructed. The final number of samples in each of the three image datasets was 2430, with each modality comprising 2025 training images and 405 test images. The image labels were manually drawn by the clinicians.

3.1.2. Experimental Environments

Hardware environment: The computer had 256 GB of RAM, an NVIDIA TITAN V graphics card and an Intel (R) Xeon (R) Gold 6154 CPU @ 3.00 GHz processor. Software environment: Windows Server 2019 Datacenter 64-bit OS, Matlab 2020b, TensorFlow 2.0 Deep Learning Framework, CUDA 11.3.58.

3.2. Comparison Methods and Evaluation Metrics

3.2.1. Comparison Methods

In order to qualitatively prove the effectiveness of LatLRR-GAN, four fusion methods based on decomposition transformation and two deep learning fusion methods based on GAN were used to compare the fused results of CT images and PET images. Method 1: Nonsubsampled Contourlet Transform (NSCT) was used in the decomposition method, and the average gradient adaptive weighted fusion rule was used for the low-rank region images, and the fusion rule based on the region energy maximum was used for the salient region images. Method 2: LatLRR was used in the decomposition method, the fusion rule of average value was used in the low-rank region images, and the direct additive fusion rule was used in the salient region images. Method 3: Wavelet transform (WT) was used in the decomposition method, the fusion rule of average value was used for low-rank region images, and the fusion rule of maximum value was used for salient region images. Method 4: The decomposition method uses nested decomposition of LatLRR and NSCT. The low-frequency images use the fusion rule of average gradient adaptive weighting, and the high-frequency images use the fusion rule based on the regional energy maximum. Method 5: GANMcC [24]. Method 6: FusionGAN [15]. The parameters of both methods were set to the default values specified by their authors.

3.2.2. Evaluation Metrics

Six common evaluation metrics in the field of image fusion were used in the experiments to quantitatively evaluate the performance of LatLRR-GAN and the other comparison methods: the average gradient (AG) [25], edge intensity (EI) [26], information entropy (IE) [27], spatial frequency (SF) [28], QAB/F [29] and standard deviation (SD) [30]. Among them, AG measures the sharpness and texture details of the fused images. EI is a computational measure of image edge intensity, which is essentially the gradient magnitude at edge points. IE reflects the information content of the images and measures the expected value of the appearance of pixels at each position in the images. SF reflects the overall image sharpness and the rate of change of the gray image; it is obtained from the row frequency and column frequency, and no reference source image is used in its calculation. QAB/F evaluates the amount of edge information transferred from the input images to the fused image. SD measures the information richness of the fused images. Each evaluation metric is positively correlated with the quality of the fused images: the higher the metric value, the more detail the fused image retains and the higher its clarity.
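As a reference for how three of these reference-free metrics are typically computed, the NumPy sketch below implements common formulations of AG, SF, and SD; the exact definitions in the cited references may differ slightly in normalization.

```python
import numpy as np

def average_gradient(img):
    # AG: mean magnitude of the local gradients, a proxy for sharpness and texture detail.
    gx, gy = np.gradient(img.astype(float))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def spatial_frequency(img):
    # SF: combines row frequency and column frequency; no reference image is needed.
    img = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

def standard_deviation(img):
    # SD: dispersion of gray levels around the mean, reflecting information richness.
    return float(np.std(img.astype(float)))
```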

3.3. Comparison Experiments

In order to verify the effectiveness of LatLRR-GAN, two experiments were conducted. In Section 3.3.1, five cases of CT lung window images and PET images are compared on seven methods and six evaluation metrics. In Section 3.3.2, five cases of CT mediastinal window images and PET images are compared on seven methods and six evaluation metrics. Among them, the CT lung window images contain clear, detailed information of the trachea in the lung, and CT mediastinal window images contain clear mediastinal information.

3.3.1. CT Lung Window Images and PET Images

The fusion results of the CT lung window images and PET images are shown in Figure 3 and the evaluation metric results of the fused images are shown in Table 1. The best evaluation metric values are represented in red; the second-best evaluation metric values are represented in blue. Bar charts of the evaluation metric values of the fused images are shown in Figure 4.
It can be seen from Figure 3 that both Methods 4 and 7 clearly display the detailed information of the lung bronchus, but the detailed information of the lesion area and edges is more obvious in Method 7. Methods 1, 2, 3, 5 and 6 could not preserve the detailed information of the lung bronchus as well. Although Methods 2, 5 and 6 can highlight the lesion area information, the contrast of their fused images is lower and the edge information is blurred. Therefore, Method 7 proposed in this paper can better fuse the lung bronchus information in the CT images and the lesion area information in the PET images.
As can be seen from the IE and SD metrics in Table 1 and Figure 4, there is little difference between the values of Methods 4 and 7, which indicates that both methods produce fused images with good sharpness. From the AG, EI and SF metrics in Table 1 and Figure 4, it can be seen that Method 7 shows a substantial improvement over the other methods, which reflects the strong ability of the proposed method to preserve detailed information, such as the lung bronchus, and edge intensity. Therefore, fused images with high definition and rich details can be obtained by the LatLRR-GAN method proposed in this paper.

3.3.2. CT Mediastinal Window Images and PET Images

The fusion results of CT mediastinal window images and PET images are shown in Figure 5 and the evaluation metric results of the fused images are shown in Table 2. The best evaluation metric values are represented in red; the second-best evaluation metric values are represented in blue. Bar charts of the evaluation metric values of the fused images are shown in Figure 6.
It can be seen from Figure 5 that Methods 1, 2, 4 and 7 can better retain the contrast information of CT mediastinal window images, such as tissue and bone. However, the performance abilities of Methods 1 and 2 on the lesion area are not as good as that of Method 7, and the edge intensity information of the fused images in Method 4 is not as good as that of Method 7. Methods 3, 5 and 6 are not as clear as Method 7 in their ability to express details such as tissue contour and lesion area. Although Method 3 can retain more lesion information, the contrast of detailed parts, such as organs and bones, is low, and detailed information on the edge is blurred. Therefore, Method 7 proposed in this paper can better fuse soft tissue information, such as the mediastinum in CT images and the lesion area information in PET images.
It can be seen from Table 2 and Figure 6 that, compared with the other methods, Method 7 proposed in this paper provides a great improvement in the AG and EI metrics. For SF, Method 7 shows little difference from Methods 1 and 4 but has obvious advantages over Methods 5 and 6. For IE and SD, Method 7 also has obvious advantages. For QAB/F, the value of Method 4 is higher than that of Method 7, because the background contrast of the CT images is adjusted in Method 4 in order to highlight the lesion area of the PET images. On the whole, the fused images obtained by Method 7 proposed in this paper retain rich detailed information, such as soft tissue, and clearly contrasted information.

3.4. Ablation Experiments

Three cases of CT lung window images and PET images and three cases of CT mediastinal window images and PET images were selected for ablation experiments to verify the effectiveness of the proposed method. Method 1 (LatLRR_AE): LatLRR was used to decompose the source images, the average value fusion rule was used for the low-rank region images, and the threshold adaptive weighted fusion rule based on the region energy ratio was used for the salient region images; this verifies the effectiveness of the ED-D2GAN low-rank region fusion rule proposed in this paper. Method 2 (ED-D2GAN): the proposed ED-D2GAN was used directly to fuse the source images, which verifies the effectiveness of the LatLRR decomposition of the source images. In addition, the advantage of ED-D2GAN using two discriminators was verified by changing the number of discriminators. Method 3 (Single_DCT): only the D_CT discriminator was used, and the remaining parts were consistent with the proposed method. Method 4 (Single_DPET): only the D_PET discriminator was used, and the remaining parts were consistent with the proposed method. Method 5 (LatLRR-GAN): LatLRR was used to decompose the source images, the ED-D2GAN-based fusion rule was used for the decomposed low-rank region images, and the threshold adaptive weighted fusion rule based on the region energy ratio was used for the decomposed salient region images.

3.4.1. CT Lung Window Images and PET Images

The fusion results of ablation experiments about CT lung window images and PET images are shown in Figure 7, and the evaluation metric results of the fused images are shown in Table 3. The best metrics are represented by red; the second-best metrics are represented by blue. In Figure 8, the evaluation metric values of the ablation experiment are visualized.
Figure 7 shows the fusion results about ablation studies of five methods on CT lung window images and PET images. It can be seen from the Figure that the visual effect of fused images of Method 1 is poor, and the contrast information of the lung bronchus is weak, which can reflect the obvious advantages of the low-rank region fusion rule based on ED-D2GAN in the LatLRR-GAN proposed in this paper. Although the fusion results of Method 2 improved compared with Method 1, it can be seen from Table 3 that the AG and EI of Method 2 are relatively low, which reflects the advantages of using LatLRR to decompose the source images. Compared with Methods 1 and 2, Methods 3 and 4 can better highlight the lesion area and better retain detailed information of the bronchial lung, but it can be seen from Table 3 and Figure 8 that using dual discriminators to distinguish the fused image from the two source images can make the fused images retain more detail in the CT source images and PET source images simultaneously. Moreover, Method 5 has advantages over the single discriminator in terms of AG, EI, SF and SD. Therefore, the proposed method can effectively improve the quality of the fused images.

3.4.2. CT Mediastinal Window Images and PET Images

The fusion results of the ablation experiment about CT mediastinal window images and PET images are shown in Figure 9. The evaluation metric results of the fused images are shown in Table 4. The best metrics are represented in red and the second-best metrics are represented in blue. In addition, the evaluation metric values of the ablation experiment are visualized in Figure 10.
Figure 9 shows the fusion results of the five methods of CT mediastinal window images and PET images about the ablation experiment. It can be seen from Figure 9 that the fused images of Method 1 are weak in contrast information and edge detail information, which reflects the effectiveness of the low-rank region images’ fusion rule based on ED-D2GAN proposed in this paper. Method 2 shows better performance in QAB/F, but it can be seen from Table 4 that the performance of this method is weak in the evaluation metrics of AG, EI and SD, which reflects the advantages of using LatLRR to decompose the source images in this paper. Methods 3 and 4 show little difference from Method 5 proposed in this paper in terms of visual effects, but it can be seen from Table 4 and Figure 10 that the dual discriminators can retain more detailed information in the two source images at the same time, and the overall effect of the fused images is better. Therefore, the LatLRR-GAN proposed in this paper has certain advantages in retaining information, such as detailed information and edge intensity.

4. Discussion

The method based on the hybrid model is an important research direction in the field of multimodal medical image fusion. The main work of this paper is an attempt in this research direction. By combining the multi-scale decomposition method with the method based on GAN, a significant lesion area of the fused images can be obtained. It has an important clinical application value in the early diagnosis and evaluation of the curative effect of tumors.
The fusion method based on a hybrid model can make up for the shortcomings of a single fusion method and effectively improve the quality of the fused images. This paper proposed a hybrid multimodal medical image fusion method based on LatLRR and ED-D2GAN, as shown in Figure 1. Because LatLRR can separate the noise component contained in the source images, this paper adopted LatLRR to decompose the source images and discard the noise component during the fusion process. Then, in order to improve the details of the lesion areas in the fused images, the CT low-rank images decomposed by LatLRR were enhanced, and the dual-discriminator GAN model was proposed as the fusion rule for the low-rank region images. The GAN model is appropriate as the low-rank region fusion rule because it has a strong feature extraction ability and can continuously adjust the quality of the fused images through the adversarial game between the generator and discriminator. In addition, the use of two discriminators enables the final fused images to retain more details from both source images at the same time, thus yielding a better fused image. Finally, a threshold adaptive weighted algorithm based on the regional energy ratio was proposed as the fusion rule for salient region images. On the one hand, this rule fully considers the correlation between multiple pixels in a local area; on the other hand, when the energies of two local regions are very similar, it can, to a certain extent, avoid the information loss caused by directly selecting the pixel value from the region with the largest energy.
In summary, it can be seen from Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 that the visual effect of the fused images obtained by the method LatLRR-GAN proposed in this paper is superior to other fusion methods, which proves the superiority of LatLRR-GAN. Similarly, it can be seen from the objective evaluation metric values in Table 1, Table 2, Table 3 and Table 4, which can be quantitatively compared, that the LatLRR-GAN has certain advantages in AG, EI, IE, SF and SD.
Although LatLRR-GAN has advantages in both qualitative and quantitative comparisons, this work still has limitations. As can be seen from Table 1, Table 2, Table 3 and Table 4, because this paper does not investigate edge-information retention further, the performance of LatLRR-GAN on the QAB/F value is relatively weak. Researchers have proposed a variety of methods for edge information extraction, and in future work the authors will pay more attention to this problem.

5. Conclusions

This paper proposed a hybrid multimodal medical image fusion method based on LatLRR and ED-D2GAN. Firstly, the CT and PET source images were decomposed into low-rank region images, salient region images and noise region images by LatLRR. Secondly, the decomposed CT low-rank region image was enhanced, and a dual-discriminator GAN (ED-D2GAN) was used to fuse the low-rank region images. Thirdly, the threshold adaptive weighting algorithm based on the region energy ratio was used as the salient region fusion rule. Finally, the final fused image was obtained by linear addition. Subjective and objective experiments demonstrate the effectiveness of the proposed fusion rules. The proposed method can not only highlight the lesion information in the PET source images, but also obtain fused images with obvious contrast and edge intensity. It is of great significance in effectively alleviating the shortcomings of single fusion methods.

Author Contributions

Writing—review and editing, T.Z.; project administration, T.Z.; funding acquisition, T.Z.; Conceptualization, Q.L.; writing—original draft preparation, Q.L.; methodology, Q.L.; software, Q.L.; validation, Q.L., X.Z. and Q.C.; investigation, H.L.; data curation, H.L.; supervision, H.L.; visualization, X.Z.; formal analysis, Q.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62062003; Natural Science Foundation of Ningxia, grant number 2022AAC03149; and North Minzu University Research Project of Talent Introduction, grant number 2020KYQD08.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, Y.; Zhao, J.; Lv, Z.; Li, J. Medical Image Fusion Method by Deep Learning. Int. J. Cogn. Comput. Eng. 2021, 2, 21–29.
  2. Zhang, Y.D.; Dong, Z.; Wang, S.H.; Yu, X.; Yao, X.; Zhou, Q.; Hu, H.; Li, M.; Jiménez-Mesa, C.; Ramirez, J.; et al. Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation. Inf. Fusion 2020, 64, 149–187.
  3. Polinati, S.; Dhuli, R. Multimodal medical image fusion using empirical wavelet decomposition and local energy maxima. Optik 2020, 205, 163947.
  4. Al-Marzouqi, H.; AlRegib, G. Curvelet transform with learning-based tiling. Signal Process. Image Commun. 2017, 53, 24–39.
  5. Liu, Z.; Song, Y.; Sheng, V.S.; Xu, C.; Maere, C.; Xue, K.; Yang, K. MRI and PET image fusion using the nonparametric density model and the theory of variable-weight. Comput. Methods Programs Biomed. 2019, 175, 73–82.
  6. Diwakar, M.; Shankar, A.; Chakraborty, C.; Singh, P.; Arunkumar, G. Multi-modal medical image fusion in NSST domain for internet of medical things. Multimed. Tools Appl. 2022, 81, 37477–37497.
  7. Zong, J.; Qiu, T. Medical image fusion based on sparse representation of classified image patches. Biomed. Signal Process. Control. 2017, 34, 195–205.
  8. Li, S.; Yin, H.; Fang, L. Group-Sparse Representation With Dictionary Learning for Medical Image Denoising and Fusion. IEEE Trans. Biomed. Eng. 2012, 59, 3450–3459.
  9. Zhang, J.; Li, C.; Kosov, S.; Grzegorzek, M.; Shirahama, K.; Jiang, T.; Sun, C.; Li, Z.; Li, H. LCU-Net: A novel low-cost U-Net for environmental microorganism image segmentation. Pattern Recognit. 2021, 115, 107885.
  10. Zhou, T.; Ye, X.; Lu, H.; Zheng, X.; Qiu, S.; Liu, Y. Dense convolutional network and its application in medical image analysis. Biomed Res. Int. 2022, 2022, 1–22.
  11. Chen, H.; Li, C.; Wang, G.; Li, X.; Rahaman, M.; Sun, H.; Hu, W.; Li, Y.; Liu, W.; Sun, C.; et al. GasHis-Transformer: A multi-scale visual transformer approach for gastric histopathological image detection. Pattern Recognit. 2022, 130, 108827.
  12. Zhou, T.; Li, Q.; Lu, H.; Cheng, Q.; Zhang, X. GAN review: Models and medical image fusion applications. Inf. Fusion 2023, 91, 134–148.
  13. Chen, H.; Li, C.; Li, X.; Rahaman, M.; Hu, W.; Li, Y.; Liu, W.; Sun, C.; Sun, H.; Huang, X.; et al. IL-MCAM: An interactive learning and multi-channel attention mechanism-based weakly supervised colorectal histopathology image classification approach. Comput. Biol. Med. 2022, 143, 105265.
  14. Liu, Y.; Chen, X.; Peng, H.; Wang, Z. Multi-focus image fusion with a deep convolutional neural network. Inf. Fusion 2017, 36, 191–207.
  15. Ma, J.; Wei, Y.; Liang, P.; Chang, L.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11–26.
  16. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS'14), Montreal, QC, Canada, 8–13 December 2014.
  17. Fu, J.; Li, W.; Du, J.; Xu, L. DSAGAN: A generative adversarial network based on dual-stream attention mechanism for anatomical and functional image fusion. Inf. Sci. 2021, 576, 484–506.
  18. Liu, G.; Yan, S. Latent low-rank representation for subspace segmentation and feature extraction. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
  19. Gao, C.; Song, C.; Zhang, Y.; Qi, D.; Yu, Y. Improving the Performance of Infrared and Visible Image Fusion Based on Latent Low-Rank Representation Nested With Rolling Guided Image Filtering. IEEE Access 2021, 9, 91462–91475.
  20. Xia, K.; Yin, H.; Wang, J. A novel improved deep convolutional neural network model for medical image fusion. Clust. Comput. 2019, 22, 1515–1527.
  21. Wang, L.; Chang, C.; Hao, B.; Liu, C. Multi-modal Medical Image Fusion Based on GAN and the Shift-Invariant Shearlet Transform. In Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Online Event, 16–19 December 2020.
  22. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In Proceedings of the International Conference on Learning Representations 2016, Caribe Hilton, San Juan, Puerto Rico, 12–19 November 2015.
  23. Srivastava, R.; Khare, A.; Prakash, O. Local energy-based multimodal medical image fusion in curvelet domain. IET Comput. Vis. 2016, 10, 513–527.
  24. Ma, J.; Zhang, H.; Shao, Z.; Liang, P.; Xu, H. GANMcC: A Generative Adversarial Network With Multiclassification Constraints for Infrared and Visible Image Fusion. IEEE Trans. Instrum. Meas. 2021, 70, 1–14.
  25. Shen, Y.; Wu, Z.; Wang, X.; Dong, Y.; Jiang, N. Tetrolet transform images fusion algorithm based on fuzzy operator. J. Front. Comput. Sci. Technol. 2015, 9, 1132–1138.
  26. Petrovic, V.; Cootes, T. Information representation for image fusion evaluation. In Proceedings of the Fusion 2006, Florence, Italy, 10–13 July 2006.
  27. Roberts, J.W.; Van Aardt, J.; Ahmed, F. Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J. Appl. Remote Sens. 2008, 2, 023522.
  28. Eskicioglu, A.M.; Fisher, P.S. Image quality measures and their performance. IEEE Trans. Commun. 1995, 43, 2959–2965.
  29. Xydeas, C.S.; Petrovic, V. Objective image fusion performance measure. Electron. Lett. 2000, 36, 308–309.
  30. Liu, Y.; Liu, S.; Wang, Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf. Fusion 2015, 24, 147–164.
Figure 1. The overall network architecture. (a) The overall framework process; (b) the overall network architecture of ED-D2GAN; (c) the network structure of the generator; (d) the network structure of the CT discriminator of ED-D2GAN; and (e) the network structure of the PET discriminator of ED-D2GAN.
Figure 2. LatLRR decomposition process. CT_L1 and PET_L2 represent the low-rank region images of the decomposed source images; CT_S1 and PET_S2 represent the salient region images of the decomposed source images; and CT_N1 and PET_N2 represent the noise region images of the decomposed source images.
Figure 3. The fusion results of CT lung window images and PET images. Method 1: NSCT; Method 2: LatLRR; Method 3: WT; Method 4: LatLRR+NSCT; Method 5: GANMcC; Method 6: FusionGAN; Method 7: LatLRR-GAN.
Figure 4. Bar charts of fused images’ evaluation metric values of CT lung window images and PET images. Method 1: NSCT; Method 2: LatLRR; Method 3: WT; Method 4: LatLRR+NSCT; Method 5: GANMcC; Method 6: FusionGAN; Method 7: LatLRR-GAN.
Figure 5. The fusion results of CT mediastinal window images and PET images. Method 1: NSCT; Method 2: LatLRR; Method 3: WT; Method 4: LatLRR+NSCT; Method 5: GANMcC; Method 6: FusionGAN; Method 7: LatLRR-GAN.
Figure 6. Bar charts of fused images’ evaluation metric values of CT mediastinal window images and PET images. Method 1: NSCT; Method 2: LatLRR; Method 3: WT; Method 4: LatLRR+NSCT; Method 5: GANMcC; Method 6: FusionGAN; Method 7: LatLRR-GAN.
Figure 7. The fusion results of CT lung window images and PET images about the ablation experiments. Method 1: LatLRR_AE; Method 2: ED-D2GAN; Method 3: Single_DCT; Method 4: Single_DPET; Method 5: LatLRR-GAN.
Figure 8. Radar maps of the evaluation metric coefficients of the fused CT lung window and PET images for the ablation experiments. Method 1: LatLRR_AE; Method 2: ED-D2GAN; Method 3: Single_DCT; Method 4: Single_DPET; Method 5: LatLRR-GAN.
Figure 9. The fusion results of CT mediastinal window images and PET images for the ablation experiments. Method 1: LatLRR_AE; Method 2: ED-D2GAN; Method 3: Single_DCT; Method 4: Single_DPET; Method 5: LatLRR-GAN.
Figure 10. Radar maps of the evaluation metric coefficients of the fused CT mediastinal window and PET images for the ablation experiments. Method 1: LatLRR_AE; Method 2: ED-D2GAN; Method 3: Single_DCT; Method 4: Single_DPET; Method 5: LatLRR-GAN.
Table 1. Evaluation metric values of the fused CT lung window and PET images. Method 1: NSCT; Method 2: LatLRR; Method 3: WT; Method 4: LatLRR+NSCT; Method 5: GANMcC; Method 6: FusionGAN; Method 7: LatLRR-GAN. The best value of each metric is shown in red and the second-best in blue.
Images  Methods        AG       EI        IE      SF       QAB/F   SD
1       NSCT           6.7902   61.8019   6.6232  27.6588  0.4876  6.6319
        LatLRR         6.1668   58.5574   6.8845  21.4049  0.4725  6.2861
        WT             5.4231   48.7742   6.3391  24.2894  0.3532  5.7617
        LatLRR+NSCT    7.1699   66.8098   6.9679  27.0213  0.5089  6.6074
        GANMcC         5.2624   42.1391   5.5375  20.3215  0.3070  5.6115
        FusionGAN      5.3399   44.9403   6.2913  20.9859  0.3368  5.0670
        LatLRR-GAN     9.9507   98.0598   7.1664  32.6015  0.4936  7.1942
2       NSCT           7.1907   66.5461   6.4508  28.3334  0.5453  6.5942
        LatLRR         6.4104   61.2882   6.6764  22.0096  0.5071  6.1107
        WT             5.3815   50.4165   6.2273  24.1727  0.3158  5.5468
        LatLRR+NSCT    7.4851   70.4718   6.7766  27.5646  0.5452  6.5234
        GANMcC         5.2782   46.3028   5.7938  21.5764  0.3126  5.6716
        FusionGAN      5.7324   48.5718   6.1302  21.6855  0.3339  5.7612
        LatLRR-GAN     10.3892  102.5992  7.0284  33.2705  0.5077  7.3602
3       NSCT           6.3354   58.0445   6.6663  27.0911  0.4883  6.6298
        LatLRR         5.8110   55.4798   6.6929  20.8422  0.4980  6.3504
        WT             4.9259   46.0231   6.3396  24.2832  0.3150  5.8398
        LatLRR+NSCT    6.7313   62.9149   6.7644  26.8761  0.5161  6.6197
        GANMcC         4.4921   40.7514   5.5336  19.5217  0.3080  5.5639
        FusionGAN      4.4775   41.5534   5.9374  20.2999  0.3072  5.4516
        LatLRR-GAN     9.5939   94.6642   7.1047  30.5888  0.4958  7.3836
4       NSCT           5.8972   52.3510   5.9100  27.4761  0.4629  6.4959
        LatLRR         5.2983   49.6661   6.7621  20.9930  0.4707  6.3137
        WT             4.5008   41.3844   5.7323  23.7695  0.3014  5.7827
        LatLRR+NSCT    6.1958   56.5728   6.8205  26.8018  0.5078  6.4695
        GANMcC         4.4075   39.0327   5.5711  20.1564  0.3010  5.3314
        FusionGAN      4.6772   40.8885   5.5567  21.5881  0.3956  5.5948
        LatLRR-GAN     8.3274   80.7329   7.0516  30.0796  0.4869  7.2110
5       NSCT           6.6641   60.0758   6.4024  27.6600  0.5070  6.6517
        LatLRR         5.8707   54.9209   6.6354  21.2508  0.4786  6.2156
        WT             4.9883   46.0404   6.1079  24.1076  0.3069  5.7196
        LatLRR+NSCT    6.8949   63.4267   6.7009  27.0235  0.5206  6.5769
        GANMcC         4.9554   40.1703   5.0607  19.3215  0.3004  5.0556
        FusionGAN      4.7924   42.7647   5.9446  20.7787  0.3385  5.0781
        LatLRR-GAN     9.6424   93.7401   6.9441  32.6549  0.5093  7.3473
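As a reading aid for the metric columns in Tables 1–4, the following NumPy sketch gives commonly used definitions of the average gradient (AG), information entropy (IE), spatial frequency (SF), and standard deviation (SD) of a fused image. These are standard formulations and may differ in detail from the authors' implementation; edge intensity (EI, usually a Sobel-based measure) and the edge-preservation index QAB/F are omitted because they involve additional parameters.

```python
# Common reference definitions of AG, IE, SF and SD for a grayscale fused image;
# these are standard formulations, not necessarily the authors' exact code.
import numpy as np

def average_gradient(img):
    img = img.astype(float)
    gx = np.diff(img, axis=1)[:-1, :]   # horizontal differences, cropped to a common shape
    gy = np.diff(img, axis=0)[:, :-1]   # vertical differences, cropped to a common shape
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def information_entropy(img, levels=256):
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                         # ignore empty gray levels
    return -np.sum(p * np.log2(p))       # entropy in bits

def spatial_frequency(img):
    img = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)

def standard_deviation(img):
    return np.std(img.astype(float))

# Example on a random placeholder image (a real fused image would be loaded instead).
fused = np.random.randint(0, 256, (256, 256))
print(average_gradient(fused), information_entropy(fused),
      spatial_frequency(fused), standard_deviation(fused))
```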
Table 2. Evaluation metric values of the fused CT mediastinal window and PET images. Method 1: NSCT; Method 2: LatLRR; Method 3: WT; Method 4: LatLRR+NSCT; Method 5: GANMcC; Method 6: FusionGAN; Method 7: LatLRR-GAN. The best value of each metric is shown in red and the second-best in blue.
Images  Methods        AG       EI        IE      SF       QAB/F   SD
1       NSCT           6.8779   61.7357   5.0360  32.0976  0.5116  5.7207
        LatLRR         6.0419   56.2118   5.9776  24.9882  0.4956  5.5529
        WT             5.1805   47.4596   4.8254  28.0206  0.3148  5.0303
        LatLRR+NSCT    6.9261   63.7558   6.0681  30.5653  0.5468  5.7738
        GANMcC         4.5108   30.4581   4.7010  22.6290  0.3114  4.6901
        FusionGAN      5.2713   48.2093   4.5326  26.2313  0.3069  4.5911
        LatLRR-GAN     11.3958  104.6724  6.7085  39.3140  0.5220  6.7189
2       NSCT           7.9136   72.0494   5.9264  31.4682  0.5190  5.8663
        LatLRR         7.1379   67.3753   6.7216  24.5394  0.4857  5.9897
        WT             5.7516   53.0297   5.7662  26.6668  0.3893  5.3399
        LatLRR+NSCT    8.2567   77.0077   6.6908  30.3958  0.5355  6.1098
        GANMcC         5.2162   40.8938   5.8925  21.4760  0.3835  5.4296
        FusionGAN      5.6553   46.4851   5.9131  22.4006  0.4027  5.7214
        LatLRR-GAN     9.0267   87.3771   6.3032  34.9138  0.5201  6.5196
3       NSCT           7.3579   66.9876   5.8889  29.2474  0.4906  5.7461
        LatLRR         6.7382   63.8525   6.6608  23.0331  0.4651  5.9029
        WT             5.5173   51.2336   5.7161  25.9284  0.3933  5.2259
        LatLRR+NSCT    7.7724   72.5752   6.6468  28.7566  0.5077  6.0418
        GANMcC         5.2840   40.6193   5.8015  22.4883  0.3154  5.5587
        FusionGAN      5.4292   42.8494   5.9772  20.3388  0.3390  5.5425
        LatLRR-GAN     10.6800  104.2155  6.9095  34.3389  0.4821  6.6913
4       NSCT           9.1151   81.7507   6.2800  31.8276  0.5219  5.9192
        LatLRR         8.3223   77.1386   6.7648  25.3768  0.4811  5.8338
        WT             6.6501   60.7376   6.0900  28.0719  0.3975  5.1960
        LatLRR+NSCT    9.5643   88.0301   6.8263  30.9593  0.5320  6.0858
        GANMcC         6.5427   46.0002   6.0664  21.3119  0.3902  5.2568
        FusionGAN      6.6687   46.0735   6.0146  22.4728  0.3934  5.2173
        LatLRR-GAN     10.5485  103.3578  6.9186  32.7830  0.4926  6.7281
5       NSCT           9.0230   74.8148   5.4163  36.7065  0.5333  5.7432
        LatLRR         7.9501   67.9433   6.5126  29.8936  0.4953  5.6478
        WT             6.5029   55.1489   5.2648  31.4250  0.3038  5.0835
        LatLRR+NSCT    9.1967   78.3693   6.5572  35.6572  0.5477  5.8475
        GANMcC         6.2239   42.7345   5.6118  22.0594  0.3506  5.6745
        FusionGAN      6.3612   44.9311   5.9115  24.4120  0.3017  5.0167
        LatLRR-GAN     12.3573  119.3307  7.0015  36.1969  0.5078  6.8767
Table 3. Evaluation metric values of the fused CT lung window and PET images for the ablation experiments. Method 1: LatLRR_AE; Method 2: ED-D2GAN; Method 3: Single_DCT; Method 4: Single_DPET; Method 5: LatLRR-GAN. The best value of each metric is shown in red and the second-best in blue.
Images  Methods        AG       EI        IE      SF       QAB/F   SD
1       LatLRR_AE      7.1936   70.4249   6.7235  22.6081  0.5044  6.6990
        ED-D2GAN       8.5586   82.8161   6.7909  30.9804  0.5293  7.1277
        Single_DCT     10.5909  105.5618  7.2029  32.2329  0.4568  7.8626
        Single_DPET    10.5980  105.1268  7.1799  32.5337  0.4580  7.7477
        LatLRR-GAN     11.0359  109.2525  7.2611  33.2306  0.4649  7.9309
2       LatLRR_AE      7.4516   72.8877   6.7520  23.2457  0.5010  6.6615
        ED-D2GAN       8.7314   84.2797   6.9045  31.6064  0.5160  7.1360
        Single_DCT     11.1608  110.8545  7.2271  33.5400  0.4567  7.9171
        Single_DPET    10.9083  108.0114  7.2447  33.1174  0.4668  7.7543
        LatLRR-GAN     11.4452  113.0691  7.2678  33.8591  0.4583  7.9639
3       LatLRR_AE      6.7730   66.6580   6.7490  22.2463  0.4994  6.3988
        ED-D2GAN       8.2582   80.0652   6.7024  32.2703  0.4976  7.1346
        Single_DCT     10.3707  103.2422  7.2092  32.6115  0.4502  7.7318
        Single_DPET    10.1196  100.4910  7.1579  31.9520  0.4533  7.5195
        LatLRR-GAN     10.6438  105.3437  7.2439  33.2622  0.4583  7.7910
Table 4. Evaluation metric values of the fused CT mediastinal window and PET images for the ablation experiments. Method 1: LatLRR_AE; Method 2: ED-D2GAN; Method 3: Single_DCT; Method 4: Single_DPET; Method 5: LatLRR-GAN. The best value of each metric is shown in red and the second-best in blue.
Images  Methods        AG       EI        IE      SF       QAB/F   SD
1       LatLRR_AE      7.1179   67.7787   6.1955  26.2079  0.4988  6.0567
        ED-D2GAN       8.3479   77.0280   5.9225  33.8304  0.5423  6.2247
        Single_DCT     8.7781   83.6390   6.4083  32.7873  0.4786  6.8221
        Single_DPET    8.5449   82.4642   6.3324  31.5362  0.4843  6.7714
        LatLRR-GAN     9.3172   88.9128   6.4332  34.1039  0.4983  6.8266
2       LatLRR_AE      6.7746   64.8246   6.2563  25.4204  0.4888  6.0667
        ED-D2GAN       7.8481   72.5484   5.8768  32.5410  0.5359  6.1651
        Single_DCT     8.3265   79.4966   6.4489  32.0401  0.4840  6.7476
        Single_DPET    8.0521   77.6914   6.3404  30.8232  0.4764  6.7126
        LatLRR-GAN     8.6675   83.0892   6.4701  33.4706  0.4956  6.8007
3       LatLRR_AE      6.7428   64.5195   6.1546  25.2950  0.4893  5.7954
        ED-D2GAN       7.9406   73.7453   5.8995  33.7126  0.5313  6.1157
        Single_DCT     8.3980   80.6120   6.4209  31.7215  0.4724  6.5390
        Single_DPET    8.2376   80.0195   6.3216  30.6989  0.4747  6.5238
        LatLRR-GAN     8.7718   84.3128   6.4379  33.4766  0.4904  6.5717
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
