Article

TDFusion: When Tensor Decomposition Meets Medical Image Fusion in the Nonsubsampled Shearlet Transform Domain

1 Jiangsu Province Key Lab on Image Processing and Image Communication, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2 National Engineering Research Center of Communication and Network Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(14), 6616; https://doi.org/10.3390/s23146616
Submission received: 11 June 2023 / Revised: 3 July 2023 / Accepted: 14 July 2023 / Published: 23 July 2023
(This article belongs to the Special Issue Computer-Aided Diagnosis Based on AI and Sensor Technology)

Abstract

In this paper, a unified optimization model for medical image fusion based on tensor decomposition and the non-subsampled shearlet transform (NSST) is proposed. The model is based on the NSST method and the tensor decomposition method to fuse the high-frequency (HF) and low-frequency (LF) parts of two source images to obtain a mixed-frequency fused image. In general, we integrate low-frequency and high-frequency information from the perspective of tensor decomposition (TD) fusion. Due to the structural differences between the high-frequency and low-frequency representations, potential information loss may occur in the fused images. To address this issue, we introduce a joint static and dynamic guidance (JSDG) technique to complement the HF/LF information. To improve the result of the fused images, we combine the alternating direction method of multipliers (ADMM) algorithm with the gradient descent method for parameter optimization. Finally, the fused images are reconstructed by applying the inverse NSST to the fused high-frequency and low-frequency bands. Extensive experiments confirm the superiority of our proposed TDFusion over other comparison methods.

1. Introduction

Medical image fusion plays a pivotal role in contemporary medical research and holds profound significance for medical treatments [1]. With the advancements in medical imaging, a diverse array of imaging technologies has emerged. Prominent modalities encompass cardiac angiography, computed tomography (CT), positron emission tomography (PET), magnetic resonance imaging (MRI), and single photon emission computed tomography (SPECT), among others. Each of these modalities focuses on distinct aspects of the human body or various pathologies. Early medical images predominantly captured the anatomical structure through morphological imaging techniques such as CT and MRI. Although both of these modalities provide anatomical information, their emphasis differs: CT effectively reveals bone tissue and blood vessels, while MRI excels in visualizing soft tissue. The advent of PET and SPECT has placed greater emphasis on functional and metabolic information. Nonetheless, the information conveyed by these individual images remains fragmented, limiting their utility in medical observation and diagnosis. Consequently, medical image fusion technology [2,3], which integrates multiple images of diverse modalities into a single composite image that encompasses a range of complementary information, has attracted increasing attention. This technique mitigates randomness and redundancy, enhancing the clinical applicability of medical images for the diagnosis and assessment of medical conditions.
The prevailing medical image fusion methodologies [4,5,6,7,8,9] primarily adopt a multiscale transformation (MST) framework. In general, MST-based fusion methods entail the following three steps: Firstly, the source images are decomposed into a domain of multiscale transforms. Subsequently, the transformed coefficients are merged utilizing a fusion rule. Finally, the fused image is reconstructed by applying the inverse transform to the merged coefficients. Within this framework, fusion approaches that incorporate diverse transformation methods and fusion strategies have been proposed, including the nonsubsampled contourlet transform (NSCT) [10] and the nonsubsampled shearlet transform (NSST) [11]. While the implementation of the shearlet transform (ST) utilized in the NSST shares similarities with the contourlet transform, the ST employs shear filters instead of the directional filters used in the contourlet transform. This distinction liberates the ST from the limitations associated with the number of directions, enabling it to capture a greater level of detail and functional information across various orientations. The ST exhibits a heightened directional sensitivity and accommodates diverse geometric shapes, thereby proficiently capturing the intrinsic geometric features of multidimensional phenomena. The NSST combines the nonsubsampled ST with nonsubsampled pyramid filters (NSPFs) [12] and shift-invariant shearlet filter banks (SFBs) [11]. This integration endows the NSST with an enhanced robustness to distortion, courtesy of its shift invariance. Furthermore, the NSST inherits the advantages of multiscale and multidirectional characteristics, rendering it a highly effective image decomposition method. Given these compelling attributes, the NSST is selected as the preferred multiscale transform for our model. Consequently, our model adheres to the fundamental MST-based fusion framework.
In the context of MST-based fusion methods, the fusion strategy for high-frequency and low-frequency components represents a pivotal challenge. Although the human visual system is more sensitive to high-frequency components, a significant portion of the information in the source images is embedded within the low-frequency domain. Thus, achieving meaningful fusion outcomes necessitates careful consideration of both the high-frequency and low-frequency fusion processes. Regrettably, existing MST-based fusion methods often address these two problems separately and independently. The integration of neural networks holds considerable promise for data fusion tasks. Notably, recent works [13,14,15] have leveraged deep neural networks to fuse multimodal information for predicting interactions. Specifically, a fusion strategy based on the fuzzy adaptive reduced pulse-coupled neural network (RPCNN) [16] has demonstrated effective handling of the primary low-frequency and high-frequency fusion problems. By employing RPCNNs with fuzzy-adaptive linking strengths, both the low-frequency subband (LFS) and high-frequency subband (HFS) coefficients are fused in a similar manner. This approach significantly enhances the image fusion performance. However, one limitation remains, which is the presence of free parameters in the pulse-coupled neural network (PCNN). To overcome the challenge of setting free parameters in the traditional PCNN model, Yin et al. [7] introduced the parameter adaptive pulse-coupled neural network (PA-PCNN) model to fuse high-frequency coefficients using adaptive PCNN parameters. While this method addresses the parameter selection issue, the fusion strategy for the high-frequency component remains conventional, adhering to the traditional high-frequency fusion approach. Xu et al. [17] exploited a component separation setting, but its complex computation degrades its practicability. In order to equally emphasize the high- and low-frequency components, we propose a cross-fusion technique that integrates the high-frequency and low-frequency components from the source images. In our model, we employ tensor decomposition to effectively fuse the high-frequency and low-frequency components, thereby extracting more pertinent information.
Tensors, as a branch of mathematical research and a generalization of the vector concept, offer great convenience in handling high-dimensional data. With the advancements in computational imaging, such as hyperspectral imaging and magnetic resonance imaging, tensors have gradually found practical applications. Tensor decomposition serves as a higher-order generalization of matrix decomposition, sharing similarities with matrix factorization. It serves three main purposes: dimension reduction, missing data filling, and implicit relation mining. In the context of multi-dimensional images, the Tucker decomposition [18] has been extensively utilized for various purposes, such as image denoising, selecting image features through tensor subspaces, and compressing data. Building upon the Tucker decomposition, Li et al. proposed a multiband image fusion method based on tensor decomposition [19]. This approach approximates the three-dimensional tensor as a core tensor multiplied by a dictionary along each of the three modes, expressing the fusion problem as an estimation of the core tensor and the three mode dictionaries. Through iterative decomposition steps until convergence, the dictionaries and core tensor are updated to achieve accurate estimation. In our proposed method, we construct an image fusion strategy based on tensor decomposition, leveraging the fusion of tensor information from different dimensions (with high frequency being three-dimensional and low frequency being two-dimensional). By cross-fusing the high- and low-frequency components in the source images, we aim to give equal attention to the high- and low-frequency components. In our paper, we combine the tensor-decomposition-based fusion strategy with the nonsubsampled shearlet transform (NSST) to develop a novel optimization model for medical image fusion, named tensor decomposition in the nonsubsampled shearlet transform domain (TDFusion).
In the field of medical image fusion, the concept of tensors has been explored previously and is not in itself a novel idea. For instance, Liu et al. [20] proposed the use of structure tensors for analyzing image properties. However, it is important to note that the structure tensor differs fundamentally from the tensor employed in our model. Our tensor representation serves as a method to effectively represent image data and facilitates improved fusion outcomes. During the Tucker decomposition process, there is a potential risk of damaging certain information in the source material due to the sparse prior. To address this issue, we introduce the joint static and dynamic guidance filtering (JSDG) technique proposed by Ham et al. [21] to supplement the corresponding high-frequency information. Unlike the multi-modal deep guided filtering approach proposed by Stimpel et al. [22], which combines a local linear guided filter with a guided image obtained from multimodal input, JSDG incorporates the concept of dynamic and static guidance and utilizes the output as the dynamic guidance of the image. This approach not only focuses on the structural information of the static guidance but also considers the properties of the input image. The JSDG model we employ combines static and dynamic guidance structures, effectively incorporating high-frequency components while preserving the results of mixed-frequency fusion. Furthermore, we adopt the low-frequency information completion strategy from the PA-PCNN method, which defines two new activity level measures, namely weighted local energy (WLE) and the weighted sum of an eight-neighborhood-based modified Laplacian (WSEML). In this method, the fusion of low-frequency components from two images is accomplished through the utilization of WLE and WSEML as the foundation of the NSST. This successfully completes the fusion of low-frequency images and information, thereby enhancing the overall fusion performance.
In conclusion, this paper proposes a novel unified optimization model for multimodal medical image fusion, leveraging tensor decomposition and the NSST. By integrating the low-frequency and high-frequency components using the tensor decomposition method, we obtain a mixed-frequency fusion image as illustrated in Figure 1. The main contributions of our method can be summarized as follows:
  • Our TDFusion model is a unified optimization model. On the basis of the NSST method and the tensor decomposition method, the mixed-frequency fusion image is obtained by fusing the high-frequency and low-frequency components of two source images.
  • Considering the structural differences between high-frequency and low-frequency components, some information will be lost during fusion. We embed the framework into the guided filter to optimize and complete the knowledge from low frequencies to high frequencies.
  • We combine the ADMM algorithm with the gradient descent method to improve the performance of the fusion image. Through a large number of experiments, the effectiveness of our model is verified on five benchmark medical image fusion problems (T1 and T2, T2 and PD, CT and MRI, MRI and PET, and MR and SPECT). Compared with other representative medical image fusion methods, our model also achieves better results.
The remainder of the paper is organized as follows. Section 1 introduces the background and the methods our model builds on, and Section 2 reviews related work. In Section 3, the notation and preliminaries of tensors are briefly introduced. Section 4 discusses the solution process of our model in detail. Then, the specific experimental results and a comparative analysis are presented in Section 5. Finally, Section 6 gives a brief summary.

2. Related Work

In recent years, many well-known approaches based on machine learning, deep learning, or other methods for brain tumor detection and identification have emerged. Diwakar et al. [23] used the non-subsampled shearlet transform (NSST) to extract low- and high-frequency image components in multimodal medical images and proposed a new method for low-frequency component fusion based on an improved and modified Laplacian (MSML) clustering dictionary learning technique. In the NSST domain, directional contrast can be used to fuse high-frequency coefficients, while the inverse NSST is used to obtain the fused multimodal medical images. Two-dimensional medical image segmentation models based on both traditional and recent machine learning and deep learning techniques are popular among researchers; however, with the substantial work done in recent years on creating 3D volumes, 3D volumetric data have become more widely available. The architecture developed by Nodirov et al. [24] is based on a 3D U-Net model that employs numerous skip connections, cost-effective pre-trained 3D MobileNetV2 blocks, and attention modules, and it operates on 3D brain image data. In order to maximize the use of features, additional skip connections are also introduced between the encoder and decoder blocks to streamline the exchange of extracted features between the two blocks. Irrelevant features passed through the skip connections are filtered out by the attention module, which saves processing power while achieving improved accuracy. Biomedical image processing makes it simpler to find and localize brain cancers using MRI. In order to detect brain tumor locations, Arif et al. [25] suggested a method using MRI sequence images as the input; this task is extremely challenging due to the huge range of tumor tissues present in different patients. To enhance the effectiveness of medical image segmentation and streamline the segmentation process, the authors used the Berkeley wavelet transform (BWT) and a deep learning classifier as the foundation for their work. Using the gray-level co-occurrence matrix (GLCM) approach, the important features of each tissue were also identified, and the features were subsequently improved using genetic algorithms.

3. Notation and Preliminaries of Tensors

We summarize the notation and preliminaries of tensors widely used in this paper in Table 1. Scalars, vectors, matrices, and tensors are, respectively, denoted by lowercase letters, bold lowercase letters, bold uppercase letters, and calligraphic letters. $\mathbf{E}$ denotes the identity matrix and $\mathbf{1}$ denotes a matrix filled with ones. Variables with the superscript $pre$ denote the corresponding variables in the previous iteration. The superscript $T$ denotes the transposition operation.
An $M$-dimensional tensor is denoted as $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$. Its entries are denoted as $a_{n_1 n_2 \cdots n_M}$ or $\mathcal{A}(n_1, n_2, \ldots, n_M)$, where $1 \le n_m \le N_m$. The $m$-mode unfolding matrix $\mathbf{A}_{(m)} \in \mathbb{R}^{N_m \times N_1 N_2 \cdots N_{m-1} N_{m+1} \cdots N_M}$ is defined by arranging all the $m$-mode unfolding vectors as the rows of the matrix.
Based on the definitions above, we also provide the definition for the multiplication of a tensor and a matrix. Given an $M$-dimensional tensor $\mathcal{A} \in \mathbb{R}^{N_1 \times N_2 \times \cdots \times N_M}$ and a matrix $\mathbf{B} \in \mathbb{R}^{E_m \times N_m}$, the $m$-mode product is denoted by $\mathcal{A} \times_m \mathbf{B}$. We record it as $\mathcal{C} \in \mathbb{R}^{N_1 \times \cdots \times N_{m-1} \times E_m \times N_{m+1} \times \cdots \times N_M}$, which is an $M$-dimensional tensor whose entries are computed by:
$$c_{n_1 \cdots n_{m-1}\, e_m\, n_{m+1} \cdots n_M} = \sum_{n_m = 1}^{N_m} a_{n_1 \cdots n_{m-1}\, n_m\, n_{m+1} \cdots n_M}\, b_{e_m n_m} \quad (1)$$
Utilizing the m-mode unfolding matrix, the m-mode product can also be computed in the form of matrix multiplication:
$$\mathbf{C}_{(m)} = \mathbf{B}\,\mathbf{A}_{(m)} = \left(\mathcal{A} \times_m \mathbf{B}\right)_{(m)} \quad (2)$$
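As a concrete illustration of the m-mode product and the unfolding identity above, the following minimal NumPy sketch (helper names are ours, not part of any cited toolbox) computes $\mathcal{A} \times_m \mathbf{B}$ via $\mathbf{C}_{(m)} = \mathbf{B}\mathbf{A}_{(m)}$; the column ordering of the unfolding is an implementation choice and only needs to be consistent between `unfold` and `fold`.

```python
import numpy as np

def unfold(tensor, mode):
    """m-mode unfolding: bring `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a target tensor shape."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

def mode_product(tensor, matrix, mode):
    """Compute the m-mode product A x_m B via C_(m) = B @ A_(m)."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, new_shape)

# Small sanity check on random data.
A = np.random.rand(4, 5, 6)
B = np.random.rand(3, 5)          # acts on mode 1 (size 5 -> 3)
C = mode_product(A, B, mode=1)
print(C.shape)                    # (4, 3, 6)
```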
The m-mode product also has many properties. We list some of them which will be used in the following deduction.
Property 1.
A series of tensor products is exchangeable for distinct modes (matrices $\mathbf{B}_{m_1} \in \mathbb{R}^{E_{m_1} \times N_{m_1}}$, $\mathbf{B}_{m_2} \in \mathbb{R}^{E_{m_2} \times N_{m_2}}$):
$$\mathcal{A} \times_{m_1} \mathbf{B}_{m_1} \times_{m_2} \mathbf{B}_{m_2} = \mathcal{A} \times_{m_2} \mathbf{B}_{m_2} \times_{m_1} \mathbf{B}_{m_1}, \quad m_1 \neq m_2 \quad (3)$$
Property 2.
A series of tensor products is mergeable for the same mode (matrices $\mathbf{B}_1 \in \mathbb{R}^{E_m \times N_m}$, $\mathbf{B}_2 \in \mathbb{R}^{L_m \times E_m}$):
$$\mathcal{A} \times_m \mathbf{B}_1 \times_m \mathbf{B}_2 = \mathcal{A} \times_m \left(\mathbf{B}_2 \mathbf{B}_1\right) \quad (4)$$
Property 3.
Given a collection of matrices $\mathbf{B}_m \in \mathbb{R}^{E_m \times N_m}$ ($m = 1, 2, \ldots, M$), we define an $M$-dimensional tensor $\mathcal{G} = \mathcal{A} \times_1 \mathbf{B}_1 \times_2 \mathbf{B}_2 \times_3 \cdots \times_M \mathbf{B}_M$. Then, for the tensor $\mathcal{G} \in \mathbb{R}^{E_1 \times E_2 \times \cdots \times E_M}$, we have:
$$\mathbf{g} = \left(\mathbf{B}_M \otimes \mathbf{B}_{M-1} \otimes \cdots \otimes \mathbf{B}_1\right)\mathbf{a} \quad (5)$$
where $\otimes$ means the Kronecker product and $\mathbf{g}, \mathbf{a}$ are the vectorizations of the tensors $\mathcal{G}, \mathcal{A}$, which are obtained by stacking all the one-mode vectors of the tensors.
The symbols used in this section, such as $\mathcal{A}, \mathcal{B}, \mathcal{C}, \mathcal{G}$, represent general tensors or matrices.
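Property 3 can be checked numerically. The sketch below reuses the `unfold`/`fold`/`mode_product` helpers from the previous sketch and assumes that the vectorization obtained by stacking one-mode vectors corresponds to Fortran-order flattening, which is the usual convention; it is an illustrative check rather than part of the proposed method.

```python
import numpy as np

# Reuses unfold/fold/mode_product from the previous sketch.
np.random.seed(0)
A = np.random.rand(3, 4, 2)
B1, B2, B3 = np.random.rand(5, 3), np.random.rand(6, 4), np.random.rand(2, 2)

# G = A x_1 B1 x_2 B2 x_3 B3
G = mode_product(mode_product(mode_product(A, B1, 0), B2, 1), B3, 2)

# Stacking all one-mode vectors corresponds to Fortran ("column-major") flattening.
g = G.reshape(-1, order="F")
a = A.reshape(-1, order="F")
kron = np.kron(B3, np.kron(B2, B1))   # B_M kron ... kron B_1
print(np.allclose(g, kron @ a))       # expected: True
```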

4. The Proposed Method

Following the notation and preliminaries in Section 3, this section introduces the NSST and the tensor-decomposition-based fusion strategy, and then presents the optimization process and the solution of each subproblem.

4.1. Nonsubsampled Shearlet Transform (NSST)

A shearlet is able to capture the intrinsic geometrical features of multidimensional phenomena effectively [11]. Owing to the shift invariance given by the nonsubsampled process, the NSST is more robust than other multiscale transforms. As a result, we adopt the NSST in our MST framework, which involves two basic steps:
Multiscale Decomposition: Given the $n$-th source image $\mathbf{S}_n \in \mathbb{R}^{I \times J}$ in our fusion method, nonsubsampled pyramid filters (NSPFs) are used to obtain multiscale representations of the image. In total, $K$ levels of NSPFs are used, so the source image $\mathbf{S}_n$ is decomposed into $K$ high-frequency maps $\mathbf{H}_n^k \in \mathbb{R}^{I \times J}$ ($k = 1, \ldots, K$), whose scale ranges from fine to coarse. The rest of the image after filtering is denoted as a low-frequency map $\mathbf{L}_n \in \mathbb{R}^{I \times J}$.
Multidirectional Representation: We let the high-frequency map $\mathbf{H}_n^k$ pass through the shift-invariant shearlet filter banks (SFBs) to obtain its multidirectional representations. If we decompose the $k$-th level of the map in $D_k$ directions, the result can be denoted as $\mathbf{H}_n^{k,d_k} \in \mathbb{R}^{I \times J}$ ($d_k = 1, \ldots, D_k$).
After the two steps, we obtain the multiscale and multidirectional representations of the source image $\mathbf{S}_n$: $\mathbf{H}_n^{k,d_k}$ and $\mathbf{L}_n$ ($k = 1, \ldots, K$, $d_k = 1, \ldots, D_k$). It is worth noting that every decomposed high-frequency map $\mathbf{H}_n^{k,d_k} \in \mathbb{R}^{I \times J}$ is of the same size as the source image $\mathbf{S}_n \in \mathbb{R}^{I \times J}$. Thus, we can stack all maps along the third dimension and form a three-dimensional tensor $\mathcal{H}_n \in \mathbb{R}^{I \times J \times D}$, where $D = \sum_k D_k$. The low-frequency map $\mathbf{L}_n$ can also be seen as a two-dimensional tensor. Then, the whole NSST can be represented by the following equation:
$$\mathrm{NSST}\left(\mathbf{S}_n\right) = \left\{\mathcal{H}_n, \mathbf{L}_n\right\} \quad (6)$$
For simplicity, we call $\mathcal{H}_n \in \mathbb{R}^{I \times J \times D}$ and $\mathbf{L}_n \in \mathbb{R}^{I \times J}$ the high-frequency and low-frequency maps, respectively, in the following deduction. As can be seen in Figure 2, the high-frequency maps (Figure 2(b1,b2)) contain rich detail and edge information. Human eyes are also more sensitive to the high-frequency information. The low-frequency maps (Figure 2(c1,c2)) mainly preserve the shape and strength information. They keep the majority of information from the source images. Thus, both the high-frequency and low-frequency fusion processes are of great importance to the purpose of medical image fusion, preserving all valid and useful pattern information from the source images.
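The sketch below only illustrates how the NSST outputs are organised into the tensors $\mathcal{H}_n$ and $\mathbf{L}_n$ used later. The functions `nspf_decompose` and `sfb_directions` are hypothetical placeholders standing in for an actual NSPF/SFB implementation (NSST code is typically distributed as a MATLAB toolbox), so this is a structural sketch rather than a working transform.

```python
import numpy as np

def build_nsst_tensors(source, nspf_decompose, sfb_directions, directions_per_level):
    """Organise NSST outputs into the high-frequency tensor H (I x J x D) and the
    low-frequency map L (I x J).

    `nspf_decompose(source, K)` is assumed to return (list of K high-frequency maps,
    low-frequency map) and `sfb_directions(hf_map, D_k)` a list of D_k directional
    maps; both are hypothetical placeholders for a real NSPF/SFB implementation.
    """
    K = len(directions_per_level)
    hf_levels, low = nspf_decompose(source, K)
    directional_maps = []
    for hf_map, d_k in zip(hf_levels, directions_per_level):
        directional_maps.extend(sfb_directions(hf_map, d_k))
    # Stack all I x J directional maps along a third axis: D = sum_k D_k.
    H = np.stack(directional_maps, axis=2)
    return H, np.asarray(low)
```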

4.2. Tensor Decomposition Based Fusion

In contrast to conventional MST-based fusion strategies, which typically handle high-frequency and low-frequency components individually, we employ a mixed-frequency fusion of the outcomes of the NSST. Specifically, given $N$ source images $\mathbf{S}_n$ ($n = 1, \ldots, N$), we perform cross-fusion on their high-frequency maps $\mathcal{H}_n$ and low-frequency maps $\mathbf{L}_n$. The rule of cross-fusion is that a high-frequency map should not be fused with the low-frequency map from the same image. For instance, in the case of two source images, the fusion strategy is $\mathcal{H}_1\,\&\,\mathbf{L}_2$, $\mathcal{H}_2\,\&\,\mathbf{L}_1$. Mixed-frequency fusion breaks down the information barrier between the low-frequency and high-frequency components in traditional MST-based strategies and promotes a better fusion of different multiscale information. It also means our model gives equal attention to the high-frequency and low-frequency representations, which are often unequally treated in most medical image fusion methods. Noting that $\mathcal{H}_n \in \mathbb{R}^{I \times J \times D}$ and $\mathbf{L}_n \in \mathbb{R}^{I \times J}$ are tensors of different dimensions, we adopt the tensor-decomposition-based method in [19] to perform our mixed-frequency fusion. Denoting the mixed-frequency maps as $\mathcal{G}_{a,b} \in \mathbb{R}^{I \times J \times D}$, they can be modeled as a core tensor multiplied by the factor matrix along each mode using the Tucker decomposition [18]:
$$\mathcal{G}_a = \mathrm{TD}\left(\mathcal{H}_1, \mathbf{L}_2\right) = \mathcal{C}_a \times_1 \mathbf{I}_a \times_2 \mathbf{J}_a \times_3 \mathbf{D}_a \quad (7)$$
$$\mathcal{G}_b = \mathrm{TD}\left(\mathcal{H}_2, \mathbf{L}_1\right) = \mathcal{C}_b \times_1 \mathbf{I}_b \times_2 \mathbf{J}_b \times_3 \mathbf{D}_b \quad (8)$$
where $\mathbf{I}_{a,b} \in \mathbb{R}^{I \times N_i}$ ($N_i < I$), $\mathbf{J}_{a,b} \in \mathbb{R}^{J \times N_j}$ ($N_j < J$), and $\mathbf{D}_{a,b} \in \mathbb{R}^{D \times N_d}$ ($N_d < D$) are dictionaries of the I-mode, J-mode, and D-mode, respectively. The tensors $\mathcal{C}_{a,b} \in \mathbb{R}^{N_i \times N_j \times N_d}$ hold the coefficients over the three dictionaries. It is noteworthy that although we exclusively discuss the fusion of two images in this context (which is the most prevalent scenario), it is straightforward to extend the model to encompass the fusion of multiple images. From Equations (7) and (8), we can see that they share the same iteration process. For simplification, we omit the subscripts to show the generality. Thus, we present our proposed model by the following equation:
$$\mathcal{G} = \mathrm{TD}\left(\mathcal{H}, \mathbf{L}\right) = \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D} \quad (9)$$
According to Property 3 in Section 3, the cost function for the mixed-frequency fusion problem $F(\mathbf{I}, \mathbf{J}, \mathbf{D}, \mathcal{C})$ can be written as:
$$\underset{\mathbf{I}, \mathbf{J}, \mathbf{D}, \mathcal{C}}{\operatorname{argmin}}\ \left\|\mathcal{H} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}\right\|_F^2 + \gamma \left\|\mathbf{L} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}^*\right\|_F^2 + \lambda \left\|\mathcal{C}\right\|_1 \quad \text{s.t. } \mathbf{D}^* = \mathbf{P}\mathbf{D} \quad (10)$$
where $\gamma$ is the fusion control parameter and $\lambda$ is the sparsity regularization parameter. The high-frequency map $\mathcal{H} \in \mathbb{R}^{I \times J \times D}$ is the same size as the mixed-frequency map $\mathcal{G} \in \mathbb{R}^{I \times J \times D}$. The low-frequency map $\mathbf{L} \in \mathbb{R}^{I \times J}$ can be viewed as the down-sampled version of the mixed-frequency map $\mathcal{G}$ along the third dimension. $\mathbf{P} \in \mathbb{R}^{1 \times D}$ can be understood as the proportion of low-frequency information each direction contains in the mixed-frequency map. In order to solve problem (10), we use the proximal alternating optimization (PAO) algorithm [26], which is guaranteed to converge to a critical point under specific circumstances.
$$\begin{aligned} \mathbf{I} &= \underset{\mathbf{I}}{\operatorname{argmin}}\ F(\mathbf{I}, \mathbf{J}, \mathbf{D}, \mathcal{C}) + \beta \left\|\mathbf{I} - \mathbf{I}^{pre}\right\|_F^2 \\ \mathbf{J} &= \underset{\mathbf{J}}{\operatorname{argmin}}\ F(\mathbf{I}, \mathbf{J}, \mathbf{D}, \mathcal{C}) + \beta \left\|\mathbf{J} - \mathbf{J}^{pre}\right\|_F^2 \\ \mathbf{D} &= \underset{\mathbf{D}}{\operatorname{argmin}}\ F(\mathbf{I}, \mathbf{J}, \mathbf{D}, \mathcal{C}) + \beta \left\|\mathbf{D} - \mathbf{D}^{pre}\right\|_F^2 \\ \mathcal{C} &= \underset{\mathcal{C}}{\operatorname{argmin}}\ F(\mathbf{I}, \mathbf{J}, \mathbf{D}, \mathcal{C}) + \beta \left\|\mathcal{C} - \mathcal{C}^{pre}\right\|_F^2 \end{aligned} \quad (11)$$
where $F(\mathbf{I}, \mathbf{J}, \mathbf{D}, \mathcal{C})$ is the objective function in problem (10); $\beta$ is a positive model parameter; and variables with the superscript $pre$ denote the corresponding variables in the previous iteration. $\|\cdot\|_F$ denotes the Frobenius norm. We briefly provide the solving process of the four subproblems below.

4.3. The Optimization Solution

4.3.1. Solution of I

Substituting function F into the first subequation in Equation (11) and discarding the terms irrelevant to the optimization objective, we obtain the following Equation (12):
$$\underset{\mathbf{I}}{\operatorname{argmin}}\ \left\|\mathcal{H} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}\right\|_F^2 + \gamma \left\|\mathbf{L} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}^*\right\|_F^2 + \beta \left\|\mathbf{I} - \mathbf{I}^{pre}\right\|_F^2 \quad (12)$$
Utilizing the one-mode unfolding matrix and Property 1 of the tensor product in Section 3, we can write Equation (12) in an equivalent form:
$$\underset{\mathbf{I}}{\operatorname{argmin}}\ \left\|\mathbf{H}_{(1)} - \mathbf{I}\mathbf{A}_i\right\|_F^2 + \gamma \left\|\mathbf{L}_{(1)} - \mathbf{I}\mathbf{B}_i\right\|_F^2 + \beta \left\|\mathbf{I} - \mathbf{I}^{pre}\right\|_F^2 \quad (13)$$
where $\mathbf{A}_i = \left(\mathcal{C} \times_2 \mathbf{J} \times_3 \mathbf{D}\right)_{(1)}$, $\mathbf{B}_i = \left(\mathcal{C} \times_2 \mathbf{J} \times_3 \mathbf{D}^*\right)_{(1)}$, and $(\cdot)_{(1)}$ denotes the one-mode unfolding matrix. Equation (13) is quadratic in $\mathbf{I}$, so setting its derivative with respect to $\mathbf{I}$ to zero yields:
$$\mathbf{I}\left(\mathbf{A}_i\mathbf{A}_i^T + \gamma\,\mathbf{B}_i\mathbf{B}_i^T + \beta\mathbf{E}\right) = \mathbf{H}_{(1)}\mathbf{A}_i^T + \gamma\,\mathbf{L}_{(1)}\mathbf{B}_i^T + \beta\,\mathbf{I}^{pre} \quad (14)$$
According to [19], the unique solution of Equation (14) can be efficiently found by the conjugate gradient (CG) algorithm [26].
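For readers who prefer a direct reference implementation, the sketch below solves Equation (14) with a dense linear solve instead of CG (the paper uses CG for efficiency); the unfolded inputs and the matrices $\mathbf{A}_i$, $\mathbf{B}_i$ are assumed to have been formed from the current $\mathcal{C}$, $\mathbf{J}$, $\mathbf{D}$ as described above, and the helper name is ours.

```python
import numpy as np

def update_I(H1_unfold, L1_unfold, A_i, B_i, I_prev, gamma, beta):
    """Solve Eq. (14): I (A_i A_i^T + gamma B_i B_i^T + beta E) = H_(1) A_i^T + gamma L_(1) B_i^T + beta I_prev.

    A dense solve is used here for clarity; the paper solves the same system with
    the conjugate gradient method.
    """
    n = A_i.shape[0]
    M = A_i @ A_i.T + gamma * (B_i @ B_i.T) + beta * np.eye(n)
    rhs = H1_unfold @ A_i.T + gamma * (L1_unfold @ B_i.T) + beta * I_prev
    # I M = rhs  <=>  M^T I^T = rhs^T; M is symmetric, so a transposed solve suffices.
    return np.linalg.solve(M, rhs.T).T
```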

4.3.2. Solution of J

Substituting function F into the second subequation of (11), we obtain problem (15):
$$\underset{\mathbf{J}}{\operatorname{argmin}}\ \left\|\mathcal{H} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}\right\|_F^2 + \gamma \left\|\mathbf{L} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}^*\right\|_F^2 + \beta \left\|\mathbf{J} - \mathbf{J}^{pre}\right\|_F^2 \quad (15)$$
Utilizing the two-mode unfolding matrix and Property 1 of the tensor product in Section 3, we can write Equation (15) in an equivalent form:
$$\underset{\mathbf{J}}{\operatorname{argmin}}\ \left\|\mathbf{H}_{(2)} - \mathbf{J}\mathbf{A}_j\right\|_F^2 + \gamma \left\|\mathbf{L}_{(2)} - \mathbf{J}\mathbf{B}_j\right\|_F^2 + \beta \left\|\mathbf{J} - \mathbf{J}^{pre}\right\|_F^2 \quad (16)$$
where $\mathbf{A}_j = \left(\mathcal{C} \times_1 \mathbf{I} \times_3 \mathbf{D}\right)_{(2)}$, $\mathbf{B}_j = \left(\mathcal{C} \times_1 \mathbf{I} \times_3 \mathbf{D}^*\right)_{(2)}$, and $(\cdot)_{(2)}$ denotes the two-mode unfolding matrix. Problem (16) is quadratic in $\mathbf{J}$, so setting its derivative with respect to $\mathbf{J}$ to zero yields:
$$\mathbf{J}\left(\mathbf{A}_j\mathbf{A}_j^T + \gamma\,\mathbf{B}_j\mathbf{B}_j^T + \beta\mathbf{E}\right) = \mathbf{H}_{(2)}\mathbf{A}_j^T + \gamma\,\mathbf{L}_{(2)}\mathbf{B}_j^T + \beta\,\mathbf{J}^{pre} \quad (17)$$
Similar to the solving process of $\mathbf{I}$, the unique solution of Equation (17) can be efficiently found by the conjugate gradient (CG) algorithm.

4.3.3. Solution of D

Substituting function F into the third subequation of Equation (11), we obtain Equation (18):
$$\underset{\mathbf{D}}{\operatorname{argmin}}\ \left\|\mathcal{H} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}\right\|_F^2 + \gamma \left\|\mathbf{L} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}^*\right\|_F^2 + \beta \left\|\mathbf{D} - \mathbf{D}^{pre}\right\|_F^2 \quad (18)$$
Utilizing the three-mode unfolding matrix in Section 3, we can write Equation (18) in an equivalent form:
$$\underset{\mathbf{D}}{\operatorname{argmin}}\ \left\|\mathbf{H}_{(3)} - \mathbf{D}\mathbf{A}_d\right\|_F^2 + \gamma \left\|\mathbf{L}_{(3)} - \mathbf{P}\mathbf{D}\mathbf{A}_d\right\|_F^2 + \beta \left\|\mathbf{D} - \mathbf{D}^{pre}\right\|_F^2 \quad (19)$$
where $\mathbf{A}_d = \left(\mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J}\right)_{(3)}$ and $(\cdot)_{(3)}$ denotes the three-mode unfolding matrix. Equation (19) is quadratic in $\mathbf{D}$, so setting its derivative with respect to $\mathbf{D}$ to zero yields:
$$\mathbf{D}\left(\mathbf{A}_d\mathbf{A}_d^T + \beta\mathbf{E}\right) + \gamma\,\mathbf{P}^T\mathbf{P}\mathbf{D}\mathbf{A}_d\mathbf{A}_d^T = \mathbf{H}_{(3)}\mathbf{A}_d^T + \gamma\,\mathbf{P}^T\mathbf{L}_{(3)}\mathbf{A}_d^T + \beta\,\mathbf{D}^{pre} \quad (20)$$
The unique solution of Equation (20) can also be efficiently found by the conjugate gradient (CG) algorithm.

4.3.4. Solution of C

Substituting function F into the fourth subproblem in Equation (11), we obtain Equation (21):
$$\underset{\mathcal{C}}{\operatorname{argmin}}\ \left\|\mathcal{H} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}\right\|_F^2 + \gamma \left\|\mathbf{L} - \mathcal{C} \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}^*\right\|_F^2 + \lambda \left\|\mathcal{C}\right\|_1 + \beta \left\|\mathcal{C} - \mathcal{C}^{pre}\right\|_F^2 \quad (21)$$
By introducing the splitting variables $\mathcal{C}_1 = \mathcal{C}_2 = \mathcal{C} \in \mathbb{R}^{N_i \times N_j \times N_d}$, the problem above can be rewritten as follows:
$$\underset{\mathcal{C}, \mathcal{C}_1, \mathcal{C}_2}{\operatorname{argmin}}\ G(\mathcal{C}) + G_1\left(\mathcal{C}_1\right) + G_2\left(\mathcal{C}_2\right) \quad \text{s.t. } \mathcal{C}_1 = \mathcal{C}_2 = \mathcal{C} \quad (22)$$
where
$$\begin{aligned} G(\mathcal{C}) &= \lambda \left\|\mathcal{C}\right\|_1 + \beta \left\|\mathcal{C} - \mathcal{C}^{pre}\right\|_F^2 \\ G_1\left(\mathcal{C}_1\right) &= \left\|\mathcal{H} - \mathcal{C}_1 \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}\right\|_F^2 \\ G_2\left(\mathcal{C}_2\right) &= \gamma \left\|\mathbf{L} - \mathcal{C}_2 \times_1 \mathbf{I} \times_2 \mathbf{J} \times_3 \mathbf{D}^*\right\|_F^2 \end{aligned} \quad (23)$$
Equation (22) can be solved by the alternating direction method of multipliers (ADMM) algorithm [27], the details of which are described in [19]. The optimized $\mathbf{I}_{a,b}$, $\mathbf{J}_{a,b}$, $\mathbf{D}_{a,b}$, and $\mathcal{C}_{a,b}$ can be obtained after the execution of the PAO algorithm (the convergence of which is verified in Section 5). Then, we can compute the mixed-frequency maps $\mathcal{G}_a, \mathcal{G}_b$ using Equations (7) and (8). As can be seen in Figure 2(d1,d2), the mixed-frequency maps retain the texture information of the high-frequency maps as the basis and simultaneously introduce the features of the low-frequency maps. However, it is inevitable that information will be lost in the process of tensor optimization [28]. Thus, we use joint static and dynamic guidance (JSDG) and WLE and WSEML low-frequency fusion to complete the information and promote a better fusion of knowledge. We verify the importance of this process in the ablation analysis.
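As a small illustration of one ingredient of the ADMM solution of Equation (22), the sketch below implements the proximal step associated with $G(\mathcal{C})$ under a standard augmented-Lagrangian penalty $\rho$; the full splitting, dual updates, and the quadratic subproblems for $\mathcal{C}_1$ and $\mathcal{C}_2$ follow [19,27] and are omitted, and the variable grouping shown is our own simplification.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise proximal operator of tau * ||X||_1 (soft-thresholding)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def prox_G(V, C_prev, lam, beta, rho):
    """Proximal step for G(C) = lam*||C||_1 + beta*||C - C_prev||_F^2 at an ADMM point V.

    Minimises lam*||C||_1 + beta*||C - C_prev||_F^2 + (rho/2)*||C - V||_F^2, which
    reduces to soft-thresholding a weighted average of C_prev and V.
    """
    denom = 2.0 * beta + rho
    anchor = (2.0 * beta * C_prev + rho * V) / denom
    return soft_threshold(anchor, lam / denom)
```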

4.4. High-Frequency Completion

4.4.1. Joint Static and Dynamic Guidance

We use joint static and dynamic guidance (JSDG) to complete the high-frequency information in the mixed-frequency maps $\mathcal{G}_a, \mathcal{G}_b$. JSDG is able to jointly leverage the structural information of the guidance and input images, so we achieve our purpose of preserving the results of mixed-frequency fusion and perform high-frequency completion at the same time. We also demonstrate its advantages over ordinary guided filtering in the ablation analysis. Denoting the completed mixed-frequency maps as $\mathcal{U}_{a,b} \in \mathbb{R}^{I \times J \times D}$, the problem can be written as:
$$\mathcal{U}_a = \mathrm{JSDG}\left(\mathcal{H}_1, \mathcal{G}_a\right) \quad (24)$$
$$\mathcal{U}_b = \mathrm{JSDG}\left(\mathcal{H}_2, \mathcal{G}_b\right) \quad (25)$$
where the mixed-frequency maps $\mathcal{G}_{a,b}$ serve as the static guidance, and the high-frequency maps $\mathcal{H}_{1,2}$ serve as the input image and dynamic guidance. Since Equations (24) and (25) share the same solving process, we omit the subscripts to show the generality. Thus, the purpose of our model is to solve the following problem:
$$\mathcal{U} = \mathrm{JSDG}\left(\mathcal{H}, \mathcal{G}\right) \quad (26)$$
Since all variables in the problem above are three-dimensional tensors, we extend JSDG from the 2D signals in [21] to 3D signals. Utilizing the definition of the m-mode unfolding vector in Section 3, the cost function of Equation (26) can be denoted as:
$$\underset{\mathcal{U}}{\operatorname{argmin}}\ E(\mathcal{U}) = \sum_d \left\|\mathbf{u}_d - \mathbf{h}_d\right\|_F^2 + \alpha \sum_d \sum_{1 \le m, n \le IJ} \phi_\delta\left(g_d(m) - g_d(n)\right)\,\psi_v\left(u_d(m) - u_d(n)\right) \quad (27)$$
where $\phi_\delta(x) = e^{-\delta x^2}$ and $\psi_v(x) = \left(1 - \phi_v(x)\right)/v$; $\psi_v(x)$ is Welsch's function, which is nonconvex. $\alpha$ is a regularization parameter, and $\delta, v$ are parameters that control the smoothness bandwidth. $\mathbf{u}_d \in \mathbb{R}^{1 \times IJ}$ ($d = 1, \ldots, D$) is the three-mode unfolding vector of the tensor $\mathcal{U}$, and $\mathbf{h}_d, \mathbf{g}_d$ have the same meaning. Without loss of generality, we give the same confidence to every pixel. The problem can also be written in an equivalent form:
$$\underset{\mathcal{U}}{\operatorname{argmin}}\ E(\mathcal{U}) = \mathbf{1}_D \left(\mathbf{U}_{(3)} - \mathbf{H}_{(3)}\right)\left(\mathbf{U}_{(3)} - \mathbf{H}_{(3)}\right)^T \mathbf{1}_D^T + \frac{\alpha}{v}\,\mathbf{1}_{IJ}\left(\sum_d \left(\mathbf{W}^g_d - \mathbf{W}^g_d \otimes \mathbf{W}^u_d\right)\right)\mathbf{1}_{IJ}^T \quad (28)$$
where $\mathbf{W}^g_d \in \mathbb{R}^{IJ \times IJ}$ with entries $\mathbf{W}^g_d(m,n) = \phi_\delta\left(g_d(m) - g_d(n)\right)$ and $\mathbf{W}^u_d \in \mathbb{R}^{IJ \times IJ}$ with entries $\mathbf{W}^u_d(m,n) = \phi_v\left(u_d(m) - u_d(n)\right)$ ($m, n = 1, \ldots, IJ$); here $\otimes$ represents the Hadamard product of matrices, and $\mathbf{1}_x \in \mathbb{R}^{1 \times x}$ denotes a vector filled with ones. Equation (28) is a nonconvex optimization problem, which can be solved by the majorization-minimization algorithm presented in [21]. Firstly, we build the surrogate objective function $Q(\mathcal{U})$:
$$Q(\mathcal{U}) = \mathbf{1}_D\left(\delta\,\mathbf{U}_{(3)}\mathbf{Z}^{pre}\mathbf{U}_{(3)}^T - 2\,\mathbf{H}_{(3)}\mathbf{U}_{(3)}^T + \mathbf{H}_{(3)}\mathbf{H}_{(3)}^T - \delta\,\mathbf{U}_{(3)}^{pre}\mathbf{Z}^{pre}\left(\mathbf{U}_{(3)}^{pre}\right)^T\right)\mathbf{1}_D^T + \frac{\alpha}{v}\,\mathbf{1}_{IJ}\,\Omega^{pre}\,\mathbf{1}_{IJ}^T \quad (29)$$
$$\mathbf{Z} = \sum_d \left(\mathbf{O}_d - \mathbf{W}^g_d \otimes \mathbf{W}^u_d\right) \quad (30)$$
$$\mathbf{O}_d = \operatorname{diag}\left(o^d_1, \ldots, o^d_{IJ}\right) \quad (31)$$
$$o^d_m = \sum_{n=1}^{IJ} \phi_\delta\left(g_d(m) - g_d(n)\right)\,\phi_v\left(u_d(m) - u_d(n)\right) \quad (32)$$
The surrogate function $Q(\mathcal{U})$ is a convex approximation of the nonconvex objective $E(\mathcal{U})$. Given the previous estimate $\mathcal{U}^{pre}$, the next $\mathcal{U}$ can be obtained by minimizing Equation (29).
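A toy sketch of the per-slice weights in Equations (30)-(32) is given below, based on our reconstruction of those formulas; it builds the dense $IJ \times IJ$ affinity matrices explicitly, which is only feasible for very small patches, whereas a practical implementation would exploit sparsity or restrict the neighbourhood as in [21].

```python
import numpy as np

def phi(x, bandwidth):
    """Gaussian affinity phi_delta(x) = exp(-delta * x^2)."""
    return np.exp(-bandwidth * x ** 2)

def jsdg_weights(g_d, u_d, delta, v):
    """Per-slice weights of Eqs. (30)-(32) for one three-mode unfolding vector.

    g_d, u_d: 1D arrays of length I*J (static guidance and current estimate).
    Returns the diagonal matrix O_d and this slice's contribution to Z.
    """
    W_g = phi(g_d[:, None] - g_d[None, :], delta)   # phi_delta(g_d(m) - g_d(n))
    W_u = phi(u_d[:, None] - u_d[None, :], v)       # phi_v(u_d(m) - u_d(n))
    o_d = (W_g * W_u).sum(axis=1)                   # Eq. (32)
    O_d = np.diag(o_d)                              # Eq. (31)
    Z_d = O_d - W_g * W_u                           # slice term of Eq. (30)
    return O_d, Z_d
```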
After the iteration, we obtain the completed mixed-frequency maps $\mathcal{U}_{a,b}$. A comparison between the high-frequency maps $\mathcal{H}_{1,2}$ (a1,a2), the original mixed-frequency maps $\mathcal{G}_{a,b}$ (b1,b2), and the completed mixed-frequency maps $\mathcal{U}_{a,b}$ (c1,c2) can be seen in Figure 3. The blue frames show that $\mathcal{U}_{a,b}$ completes the lost high-frequency information in $\mathcal{G}_{a,b}$, while the red frames show that $\mathcal{U}_{a,b}$ preserves and consolidates the product of mixed-frequency fusion. In a word, our JSDG method achieves the purpose of retaining the structure of both the input image and the static guidance.

4.4.2. Fusion of Complete Mixed-Frequency Maps

To retain as much detail as possible and reduce randomness and redundancy, we choose the components with a relatively high activity level in $\mathcal{U}_{a/b} \in \mathbb{R}^{I \times J \times D}$ to form our fused mixed-frequency map $\mathcal{H}_f \in \mathbb{R}^{I \times J \times D}$. For each coefficient, we use the activity fusion strategy in [29] with independent parameter settings. For notation simplicity, we use $k$ ($k \in \{a, b\}$) to universally denote the source mixed-frequency maps. Given the source sparse maps $\mathcal{U}_k$, the corresponding initial activity level maps $\mathbf{M}_k \in \mathbb{R}^{I \times J}$ can be calculated as follows:
$$\mathbf{M}_k(i,j) = \sum_{d=1}^{D} \left|u_k(i,j,d)\right|, \quad (1 \le i \le I,\ 1 \le j \le J) \quad (33)$$
The final activity level map can be calculated by taking the average process:
$$\mathbf{M}_k^f(i,j) = \frac{\sum_{p=-r}^{r} \sum_{q=-r}^{r} \mathbf{M}_k(i+p, j+q)}{(2r+1)^2}, \quad (1 \le i \le I,\ 1 \le j \le J) \quad (34)$$
where the parameter $r$ is the window radius. Then, the fused mixed-frequency map $\mathcal{H}_f$ can be calculated by:
$$\mathcal{H}_f(i,j,:) = \mathcal{U}_{k^*}(i,j,:), \quad k^* = \underset{k}{\operatorname{argmax}}\ \mathbf{M}_k^f(i,j), \quad (1 \le i \le I,\ 1 \le j \le J) \quad (35)$$
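A minimal NumPy sketch of the activity-level selection in Equations (33)-(35) is shown below; it follows the formulas above (an l1-style sum over the third dimension and a box average of radius r), and the boundary handling of the averaging window is our assumption.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_mixed_frequency(U_a, U_b, r=3):
    """Pixel-wise selection of the completed mixed-frequency map with the higher
    activity level, following Eqs. (33)-(35).

    U_a, U_b: arrays of shape (I, J, D); r is the averaging window radius.
    """
    activity = []
    for U in (U_a, U_b):
        M = np.abs(U).sum(axis=2)                                # Eq. (33)
        M_f = uniform_filter(M, size=2 * r + 1, mode="reflect")  # Eq. (34), boundary handling assumed
        activity.append(M_f)
    choose_b = activity[1] > activity[0]                         # Eq. (35), pixel-wise argmax
    return np.where(choose_b[..., None], U_b, U_a)
```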

4.5. Low-Frequency Completion

We process the low-frequency maps $\mathbf{L}_{1,2}$ by the WLE and WSEML method in [7] to obtain a fused low-frequency map $\mathbf{L}_f \in \mathbb{R}^{I \times J}$. This is used as the base of the inverse nonsubsampled shearlet transform (INSST) in order to complete the lost low-frequency information. According to [7], because the number of NSST decomposition levels is usually limited, the low-frequency maps $\mathbf{L}_{1,2}$ still contain some detailed information. Thus, to fully utilize the details from the source images, we use the WLE method to fuse the low-frequency information and use the WSEML method to extract the remaining high-frequency information after the NSST. Thus, the fused low-frequency map $\mathbf{L}_f$ can be calculated by the following equations:
$$\mathbf{L}_f(i,j) = \mathbf{L}_{n^*}(i,j), \quad n^* = \underset{n}{\operatorname{argmax}}\left(\mathrm{WLE}_n(i,j) \cdot \mathrm{WSEML}_n(i,j)\right), \quad (1 \le i \le I,\ 1 \le j \le J) \quad (36)$$
where
$$\begin{aligned} \mathrm{WLE}_n(i,j) &= \sum_{p=-t}^{t} \sum_{q=-t}^{t} \mathbf{V}(p+t+1, q+t+1)\, \mathbf{L}_n(i+p, j+q) \\ \mathrm{WSEML}_n(i,j) &= \sum_{p=-t}^{t} \sum_{q=-t}^{t} \mathbf{V}(p+t+1, q+t+1)\, \mathrm{EML}_n(i+p, j+q) \end{aligned} \quad (37)$$
where $\mathbf{V}$ is a weighting matrix, $t$ is the window radius, and EML denotes the eight-neighborhood-based modified Laplacian.
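The sketch below transcribes Equations (36) and (37) directly; the weighting matrix $\mathbf{V}$ is passed in because its exact values follow [7], and the eight-neighbourhood modified Laplacian `eml` is our reading of the EML used there and may differ from that reference in detail.

```python
import numpy as np
from scipy.ndimage import correlate

def eml(L):
    """Eight-neighbourhood modified Laplacian (our reading of the EML in [7])."""
    pad = np.pad(L, 1, mode="edge")
    c = pad[1:-1, 1:-1]
    horiz = np.abs(2 * c - pad[1:-1, :-2] - pad[1:-1, 2:])
    vert = np.abs(2 * c - pad[:-2, 1:-1] - pad[2:, 1:-1])
    diag1 = np.abs(2 * c - pad[:-2, :-2] - pad[2:, 2:]) / np.sqrt(2)
    diag2 = np.abs(2 * c - pad[:-2, 2:] - pad[2:, :-2]) / np.sqrt(2)
    return horiz + vert + diag1 + diag2

def fuse_low_frequency(L1, L2, V):
    """Fuse the low-frequency maps by Eq. (36), with WLE and WSEML from Eq. (37).

    V is the (2t+1) x (2t+1) weighting matrix of [7]; it is passed in because its
    exact values follow that reference.
    """
    def score(L):
        wle = correlate(L, V, mode="nearest")         # Eq. (37), weighted local sum of L
        wseml = correlate(eml(L), V, mode="nearest")  # Eq. (37), weighted sum of the EML
        return wle * wseml
    choose_2 = score(L2) > score(L1)
    return np.where(choose_2, L2, L1)                 # Eq. (36), pixel-wise argmax
```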

4.6. Reconstruction Fused Image by the INSST

The INSST can reconstruct the fused image $\mathbf{F}$ in two steps, which are the inverse processes of the multidirectional representation and multiscale decomposition in Section 4.1. We use the fused mixed-frequency map $\mathcal{H}_f \in \mathbb{R}^{I \times J \times D}$ and the fused low-frequency map $\mathbf{L}_f \in \mathbb{R}^{I \times J}$ as inputs. Firstly, the nonsubsampled pyramid $\mathbf{H}_f^k \in \mathbb{R}^{I \times J}$ ($k = 1, \ldots, K$) is generated by accumulating the filtered results in all directions. Secondly, the image is reconstructed from coarse to fine. The whole INSST can be represented by the following equation:
$$\mathbf{F} = \mathrm{INSST}\left(\mathbf{L}_f, \mathcal{H}_f\right) \quad (38)$$

5. Experiments

In this section, we first introduce the experimental settings, objective metrics, and comparison methods in detail. The proposed method is compared with other approaches in two aspects: a fused result analysis and a quantitative metrics analysis. To enrich the experimental content and improve the readability, we also carried out a parameter analysis, a convergence analysis, and an ablation analysis. All experiments were carried out using MATLAB R2016b on a computer with a dual-core Intel Core i5 processor (1.8 GHz) and 8GB 1600 MHz DDR3.

5.1. Experimental Settings

In this section, we discuss the experimental setup, including the images of the dataset used for the experiment, the selection of the comparison method, and the setting of the parameters. Other experimental details are shown in Table 2.

5.1.1. Experimental Images

Medical image fusion technology widely uses four kinds of medical imaging, including computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and single photon emission computed tomography (SPECT). In our experiment, the five most common multimodal medical image fusion problems based on different modes were selected: T1–T2, T2–PD, CT–MRI, MRI–PET, and MRI–SPECT. T1, T2, and PD are all MRI images based on different weightings. The source images of our experiment were selected from the Whole Brain Atlas database [30], and the spatial resolution was 256 × 256.

5.1.2. Objective Metrics

To verify the performance superiority of our method, we selected a total of seven metrics to analyze the image fusion effect from different aspects. These metrics are divided into four categories in [29], namely information-theory-based metrics, image-feature-based metrics, image-structure-similarity-based metrics, and human-perception-inspired metrics. ChenBlum is a human visual system (HVS)-based metric [31] belonging to the human-perception-inspired metrics, the purpose of which is to obtain the average value of the quality map of the whole image. Image-feature-based metrics include $Q^{AB/F}$ [32] and spatial frequency (SF) [33]. $Q^{AB/F}$ reflects the quality of the visual information obtained from the input image fusion, mainly the degree of edge information preservation, while the SF measures the overall activity level of an image. The multi-scale structural similarity metric (MS-SSIM) [34] is an improved version of the SSIM [35]. Information-theory-based metrics include feature mutual information (FMI-pixel) and the nonlinear correlation coefficient (NCC) [36]: FMI-pixel calculates the mutual information of the image features, and the NCC measures the general relationships among a group of images. The last one is the standard deviation (STD), which measures the contrast of the fused image.
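For the two simplest metrics, SF and STD, the sketch below gives implementations under their usual definitions (first-difference energy for SF, pixel standard deviation for STD); the remaining metrics should be computed with their reference implementations [31,32,34,36].

```python
import numpy as np

def spatial_frequency(img):
    """Spatial frequency under its usual definition: sqrt(RF^2 + CF^2), where RF and CF
    are the root-mean-square horizontal and vertical first differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))

def standard_deviation(img):
    """STD metric: contrast of the fused image measured by the pixel standard deviation."""
    return float(np.std(img.astype(np.float64)))
```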

5.1.3. Comparison Methods

Our TDFusion model was compared with six existing medical image fusion methods: ASR [37], NSST-PAPCNN [7], NSCT-PCDC [38], GFF [5], CS-MCA [29], and FCFusion [17]. Since CS-MCA is not designed for color image fusion, CS-MCA was not included in the comparison of color image fusion. In order to compare the fusion results of each method objectively, all parameters in these methods were set to their default values.

5.2. Visual Effects Analysis

Our comparative experiment was carried out on five medical image fusion problems, and three groups of results are selected for each problem. Then, we compared and analyzed the visual effects of various fusion methods in detail.

5.2.1. Fusion Analysis on T1-T2

The anatomical structure can be better observed in T1 images and T2 images are able to show tissue lesions more effectively. Figure 4(a1–a3) contains T1 images and T2 images are shown in Figure 4(b1–b3). The fusion of T1 and T2 images can lead to a more comprehensive anatomical structure and soft tissue information, which is of great significance in the clinical diagnosis and treatment of soft tissue lesions.
As shown in Figure 4, our TDFusion (i1–i3) model achieves the best visual effect of the fusion image, clearly shows the texture information of the original image, and excellently extracts the detailed information. The ASR (f1–f3) and FCFusion (h1–h3) methods achieved good results in brightness and structure processing with the source image, but compared with our TDFusion model, the details are still not clear enough. The CS-MCA (g1–g3) and GFF (e1–e3) methods lead to a lot of energy loss while extracting the corresponding details. In addition, the NSCT-PCDC (d1–d3) and NSST-PAPCNN (c1–c3) methods not only have the problem of energy loss, but the detailed information extracted is also insufficient, resulting in a non-ideal fusion image effect.

5.2.2. Fusion Analysis on T2-PD

PD images mainly reflect the proton content of different tissues in the image; thus, the fusion with T2 images can better preserve the edge information and textural features of the source images and improve the efficiency of medical images for diagnosis.
As we can see in Figure 5, T2 source images and PD source images are shown in Figure 5(a1–a3,b1–b3), respectively. The fused images of NSST-PAPCNN (c1–c3) and NSCT-PCDC (d1–d3) are seriously interfered with by noise, many artifacts are produced, and the edges and details of the source image are not retained well, where c1 and d1 show these problems more obviously. This is not conducive to medical observation and research. Moreover, ASR (f1–f3) does not extract the information of the source image well, and the visual information quality of the fused image is not very good. In addition, the GFF (e1–e3) method suffers from information loss, the contrast of fusion image quality is poor, and the overall acquisition level is low. Although CS-MCA (g1–g2) and FCFusion (h1–h3) are relatively close to our TDFusion (i1–i3) fused image, our model has a better performance in terms of detail processing. In general, our method is very effective at removing noise and avoiding artifacts and achieves the best visual effect in all methods.

5.2.3. Fusion Analysis on CT-MRI

Both MR images and CT images belong to anatomical imaging technology. CT images have a high-density spatial resolution, which can better reflect bone and other dense structures, while MRI can clearly reflect the soft tissue information. The fusion of MRI and CT images solves the problem of the poor representation of soft tissue lesions, which greatly improves the efficiency and accuracy of medical diagnosis.
Figure 6 shows the fusion results of three sets of CT (a1–a3) and MR (b1–b3) images. As can be seen from Figure 6, NSCT-PCDC (d1-d3) and GFF (e1–e2) produce a large amount of energy loss, which greatly reduces the intensity and contrast of many fused images. The ASR (f1–f3) method overcomes most of the noise interference, but the problem of losing information still exists. In addition, NSST-PAPCNN (c1–c3) performs well in detail extraction and energy preservation, but it is not as good as CS-MCA (g1–g3) and TDFusion (i1–i3) at preserving texture information. Our TDFusion method has achieved good results in structure information extraction and contrast.

5.2.4. Fusion Analysis on MRI-PET

PET belongs to functional imaging technology, which can well reflect the metabolic information of the human body, but the spatial resolution of functional images is often low. Therefore, the fusion of MR and PET can better combine anatomical and functional information, which is more conducive to medical observation and diagnosis. In addition, PET images convey information through color changes, so they are regarded as color images and MRI–PET fusion is handled as a color fusion problem.
In Figure 7, the first two columns are the MR and PET source images. NSCT-PCDC and GFF suffer from severe color distortion, resulting in a low color fidelity. In Figure 7e2, no color information is extracted. The structural and textural information provided by ASR is not sufficient for the final fused results, leading to much information loss and reduced brightness. In addition, the NSST-PAPCNN and FCFusion methods handle the color information well and achieve a higher visual quality than the other methods, but they are still insufficient compared with TDFusion. Although our TDFusion is not particularly good at retaining source image information, it greatly improves the extraction of details and structural information.

5.2.5. Fusion Analysis on MR-SPECT

Similar to PET images, SPECT produces a three-dimensional color medical image that can reveal metabolic changes without structural information. Therefore, a color image fusion strategy combined with the MR tissue structure information is used to localize the functional information.
The three sets of fusion results are shown in Figure 8, and the visual effects of the fused images are also similar to MRI-PET fusion in the previous section. NSCT-PCDC, GFF, and ASR have different degrees of color distortion. Some important functional information contained in SPECT is lost, making medical diagnosis extremely difficult. In addition, FCFusion and our TDFusion have good structure and color information retention; however, NSST-PAPCNN is not as good as our model in edge information protection. The performance proves that our TDFusion has a good fusion effect on MRI and SPECT images.

5.3. Objective Metrics Analysis

In order to objectively analyze the performance of the fusion method, we randomly selected a total of 95 sets of image pairs, including 28 sets of T1–T2 image pairs, 23 sets of T2–PD image pairs, 12 sets of CT–MRI image pairs, 10 sets of MRI–PET image pairs, and 22 sets of MRI–SPECT image pairs. From Table 3, we can clearly see the performance results of the seven selected indicators for each fusion method. The top three results for each indicator are marked with red, blue, and green, respectively.
In grayscale fusion (T1–T2, T2–PD, CT–MRI), the performance of our TDFusion is the best in all indicators, whether this is in the extraction of details and structural information or in the processing of noise and energy loss. The similarity between the fusion images and the source images is very high, the visual effect is better, and the reliability of medical observations and diagnoses is improved.
In color fusion (MRI–PET and MRI–SPECT), ChenBlum, MS-SSIM, and SF are lower than the other indexes, but remain in the top three. Regarding ChenBlum, the GFF method achieves the best results for whole-image quality, but at the same time, its color fidelity is low and the fused image is not ideal. MS-SSIM is an improvement of the SSIM method, in which the contrast is the maximum of all layers and the structure is related to all layers; the gap between our TDFusion and NSCT-PCDC on this metric is also very small. In addition, when SF is used as a measure of the overall level of activity, our model shows a certain lack of information, which is largely remedied by JSDG. From the previous visual effect analysis, we can see that the images fused by our model achieve good results in detail extraction, denoising, and color fidelity, and the visual quality is also excellent. Although our model does not reach the best results on these indicators, it still remains the second or third best.

5.4. Analysis and Discussion

5.4.1. Analysis of Computational Running Time

We further compare the computational efficiency of the task using the average running time presented in Table 4. As shown in Table 4, the proposed TDFusion achieves the best performance regarding the average running time, which demonstrates its superiority in efficiency.

5.4.2. Convergence Analysis

As can be seen from Figure 9a,b, which show the cost functions of TD and JSDG, respectively, our objective function converges after six iterations in both grayscale fusion and color fusion, indicating that our TDFusion model has good convergence.

5.4.3. Ablation Analysis

A total of 91 groups of images (including the five kinds of mode fusion) were selected. The results are shown in Figure 10. The common guided filter (GF), JSDG, and no filter were chosen for the ablation experiment as a comparison. The experimental results show that after supplementing the high-frequency information with JSDG, each index is significantly better than that without JSDG, whereas the ordinary guided filter performs somewhat worse than using no filter on these indexes, which indicates that it cannot compensate well for the high-frequency information of the image.

5.4.4. Parameter Analysis

There are two influential free parameters in our model, $\alpha$ and $\gamma$, which are the regularization parameter and the fusion control parameter, respectively. They appear in the main cost functions of the two main optimization processes in our model: JSDG and TD. We used the control variable method to determine the parameter values and tested the impact of these values on the seven selected metrics. As can be seen from Table 5, except for $Q^{AB/F}$, the top three performances are obtained when $\gamma = 1$, which shows that it is reasonable to fix $\gamma$ as 1. We then compared the results with $\gamma$ fixed to 1. It is easy to observe that the performance is similar in all indicators except $Q^{AB/F}$. Although the parameter combination of $\gamma = 1$ and $\alpha = 1$ performs slightly better than the other combinations on the other indicators, its deterioration in $Q^{AB/F}$ is too significant, which leads to a decline in visual quality. Therefore, considering these data comprehensively, we set $\gamma$ as 1 and $\alpha$ as 2. In the selection of parameters, we used the average over 10 image pairs rather than an individual image, and small changes in $\alpha$ and $\gamma$ have little impact on the overall results, so the chosen setting does not lead to overfitting.

6. Conclusions

A tensor decomposition approach in the nonsubsampled shearlet transform domain (TDFusion) for fusing medical images is proposed in this research. The source image is split into high-frequency and low-frequency components using the NSST method, and the two components are fused using two different strategies: the high-frequency and low-frequency parts are first cross-fused using the TD method, and the low-frequency part is subsequently fused using the WLE and WSEML methods from NSST-PAPCNN. Since the HF and LF components have different structural characteristics, some information will be lost during the fusion process. To lessen the negative impact of this information loss and enhance the fusion quality, we introduce JSDG. Finally, the inverse NSST is used to obtain the final fusion results. The model exhibits notable advantages in detail and structural information extraction, energy preservation, and denoising according to a large number of experiments, including a parameter analysis, comparative experiments, and a visual effect analysis. In future research, we will consider using different methods to achieve low- and high-frequency separation to further optimize the model.

Author Contributions

R.Z.: Conceptualization, investigation, methodology, software, writing—original draft preparation; Z.W.: conceptualization, software, writing—review and editing; H.S.: software, writing—review and editing; L.D.: methodology, writing—review and editing; H.Z.: conceptualization, software, writing—review and editing, supervision, project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China under Grant 62072256.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Azam, M.A.; Khan, K.B.; Salahuddin, S.; Rehman, E.; Khan, S.A.; Khan, M.A.; Kadry, S.; Gandomi, A.H. A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics. Comput. Biol. Med. 2022, 144, 105253. [Google Scholar] [CrossRef] [PubMed]
  2. Jiao, D.; Li, W.; Ke, L.; Xiao, B. An Overview of Multi-Modal Medical Image Fusion. Neurocomputing 2016, 215, 3–20. [Google Scholar]
  3. James, A.P.; Dasarathy, B.V. Medical image fusion: A survey of the state of the art. Inf. Fusion 2014, 19, 4–19. [Google Scholar] [CrossRef] [Green Version]
  4. Wang, Z.; Ma, Y. Medical image fusion using m-PCNN. Inf. Fusion 2008, 9, 176–185. [Google Scholar] [CrossRef]
  5. Li, S.; Kang, X.; Hu, J. Image fusion with guided filtering. IEEE Trans. Image Process. 2013, 22, 2864–2875. [Google Scholar]
  6. Yu, L.; Liu, S.; Wang, Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf. Fusion 2015, 24, 147–164. [Google Scholar]
  7. Yin, M.; Liu, X.; Liu, Y.; Chen, X. Medical Image Fusion With Parameter-Adaptive Pulse Coupled Neural Network in Nonsubsampled Shearlet Transform Domain. IEEE Trans. Instrum. Meas. 2018, 68, 49–64. [Google Scholar] [CrossRef]
  8. Qi, S.; Calhoun, V.D.; Erp, T.V.; Bustillo, J.; Damaraju, E.; Turner, J.A.; Du, Y.; Yang, J.; Chen, J.; Yu, Q. Multimodal Fusion with Reference: Searching for Joint Neuromarkers of Working Memory Deficits in Schizophrenia. IEEE Trans. Med. Imaging 2017, 37, 93–105. [Google Scholar] [CrossRef] [Green Version]
  9. Yin, H. Tensor Sparse Representation for 3D Medical Image Fusion Using Weighted Average Rule. IEEE Trans. Biomed. Eng. 2018, 65, 2622–2633. [Google Scholar] [CrossRef]
  10. Zhang, Q.; Guo, B.L. Multifocus image fusion using the nonsubsampled contourlet transform. Signal Process. 2009, 89, 1334–1346. [Google Scholar] [CrossRef]
  11. Easley, G.; Labate, D.; Lim, W.Q. Sparse directional image representations using the discrete shearlet transform. Appl. Comput. Harmon. Anal. 2008, 25, 25–46. [Google Scholar] [CrossRef] [Green Version]
  12. Khan, M.; Khan, M.K.; Khan, M.A.; Ibrahim, M.T. Endothelial Cell Image Enhancement using Nonsubsampled Image Pyramid. Inf. Technol. J. 2007, 6, 1057–1062. [Google Scholar] [CrossRef] [Green Version]
  13. Yu, G.; Yang, Y.; Yan, Y.; Guo, M.; Zhang, X.; Wang, J. DeepIDA: Predicting isoform-disease associations by data fusion and deep neural networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 19, 2166–2176. [Google Scholar] [CrossRef]
14. Wang, J.; Zhang, L.; Zeng, A.; Xia, D.; Yu, J.; Yu, G. DeepIII: Predicting isoform-isoform interactions by deep neural networks and data fusion. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 19, 2177–2187.
15. Xu, G.; He, C.; Wang, H.; Zhu, H.; Ding, W. DM-Fusion: Deep Model-Driven Network for Heterogeneous Image Fusion. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–15.
16. Das, S.; Kundu, M.K. A Neuro-Fuzzy Approach for Medical Image Fusion. IEEE Trans. Biomed. Eng. 2013, 60, 3347–3353.
17. Xu, G.; Deng, X.; Zhou, X.; Pedersen, M.; Cimmino, L.; Wang, H. FCFusion: Fractal Componentwise Modeling with Group Sparsity for Medical Image Fusion. IEEE Trans. Ind. Inform. 2022, 18, 9141–9150.
18. Tucker, L.R. Some mathematical notes on three-mode factor analysis. Psychometrika 1966, 31, 279–311.
19. Li, S.; Dian, R.; Fang, L.; Bioucas-Dias, J.M. Fusing hyperspectral and multispectral images via coupled sparse tensor factorization. IEEE Trans. Image Process. 2018, 27, 4118–4130.
20. Liu, X.; Mei, W.; Du, H. Structure tensor and nonsubsampled shearlet transform based algorithm for CT and MRI image fusion. Neurocomputing 2017, 235, 131–139.
21. Ham, B.; Cho, M.; Ponce, J. Robust image filtering using joint static and dynamic guidance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4823–4831.
22. Stimpel, B.; Syben, C.; Schirrmacher, F.; Hoelter, P.; Dörfler, A.; Maier, A. Multi-modal deep guided filtering for comprehensible medical image processing. IEEE Trans. Med. Imaging 2019, 39, 1703–1711.
23. Diwakar, M.; Singh, P.; Singh, R.; Sisodia, D.; Singh, V.; Maurya, A.; Kadry, S.; Sevcik, L. Multimodality Medical Image Fusion Using Clustered Dictionary Learning in Non-Subsampled Shearlet Transform. Diagnostics 2023, 13, 1395.
24. Jakhongir, N.; Abdusalomov, A.B.; Whangbo, T.K. Attention 3D U-Net with Multiple Skip Connections for Segmentation of Brain Tumor Images. Sensors 2022, 22, 6501.
25. Arif, M.; Ajesh, F.; Shamsudheen, S.; Geman, O.; Izdrui, D.R.; Vicoveanu, D.I. Brain Tumor Detection and Classification by MRI Using Biologically Inspired Orthogonal Wavelet Transform and Deep Learning Techniques. J. Healthc. Eng. 2022, 2022, 2693621.
26. Li, X.; Huang, T.Z.; Zhao, X.L.; Ji, T.Y.; Zheng, Y.B.; Deng, L.J. Adaptive total variation and second-order total variation-based model for low-rank tensor completion. Numer. Algorithms 2021, 86, 1–24.
27. Boyd, S.P.; Parikh, N.; Chu, E.K.W.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
28. Da Silva, C.; Herrmann, F.J. Optimization on the hierarchical Tucker manifold–applications to tensor completion. Linear Algebra Its Appl. 2015, 481, 131–173.
29. Liu, Y.; Chen, X.; Ward, R.K.; Wang, Z.J. Medical image fusion via convolutional sparsity based morphological component analysis. IEEE Signal Process. Lett. 2019, 26, 485–489.
30. Johnson, C. The whole brain atlas. BMJ 1999, 319, 1507.
31. Yin, C.; Blum, R.S. A new automated quality assessment algorithm for image fusion. Image Vis. Comput. 2009, 27, 1421–1432.
32. Xydeas, C.S.; Petrović, V. Objective image fusion performance measure. Mil. Tech. Cour. 2000, 56, 181–193.
33. Zheng, Y.; Essock, E.A.; Hansen, B.C.; Haun, A.M. A new metric based on extended spatial frequency and its application to DWT based fusion algorithms. Inf. Fusion 2007, 8, 177–192.
34. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 9–12 November 2003.
35. Zhou, W.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
36. Wang, Q.; Shen, Y. Performances evaluation of image fusion techniques based on nonlinear correlation measurement. In Proceedings of the 21st IEEE Instrumentation and Measurement Technology Conference, Como, Italy, 18–20 May 2004; Volume 1, pp. 472–475.
37. Liu, Y.; Wang, Z. Simultaneous image fusion and denoising with adaptive sparse representation. IET Image Process. 2015, 9, 347–357.
38. Bhatnagar, G.; Wu, Q.J.; Liu, Z. Directive contrast based multimodal medical image fusion in NSCT domain. IEEE Trans. Multimed. 2013, 15, 1014–1024.
Figure 1. The image fusion process of the TDFusion model. First, the source images S1 and S2 are decomposed into low-frequency sub-bands L1 and L2 and high-frequency sub-bands H1 and H2 by the NSST. Then, high-frequency H1 and low-frequency L2 are fused by tensor decomposition to obtain Ga, and low-frequency L1 and high-frequency H2 are fused in the same way to obtain Gb. The obtained Ga and Gb are passed to the JSDG guided filter, which supplements them with the corresponding information from the high-frequency bands H1 and H2, while the WLE and WSEML methods are used to complete the low-frequency information. Finally, the fused image is reconstructed by applying the inverse NSST to the fused high-frequency band Hf and low-frequency band Lf.
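To make the dataflow described in the Figure 1 caption easier to follow, the schematic Python sketch below traces the same steps. It is only an illustration, not the authors' implementation: the NSST, tensor-decomposition fusion, JSDG filtering, WLE/WSEML weighting, and reconstruction are all replaced by crude placeholder operations (Gaussian low-pass decomposition, averaging, simple detail re-injection, and an absolute-maximum rule), so only the overall structure is meaningful.

import numpy as np
from scipy.ndimage import gaussian_filter

def decompose(img, sigma=2.0):
    # Placeholder for the NSST: Gaussian low-pass as the LF band, residual as the HF band.
    lf = gaussian_filter(img, sigma)
    return lf, img - lf

def mix_fuse(band_a, band_b):
    # Placeholder for the tensor-decomposition fusion of an HF band with an LF band.
    return 0.5 * (band_a + band_b)

def guided_complement(mixed, guide):
    # Placeholder for JSDG guidance: re-inject high-frequency detail from the guide band.
    return mixed + 0.5 * guide

def tdfusion_sketch(s1, s2):
    l1, h1 = decompose(s1)            # LF/HF sub-bands of S1
    l2, h2 = decompose(s2)            # LF/HF sub-bands of S2
    g_a = mix_fuse(h1, l2)            # mixed-frequency map Ga
    g_b = mix_fuse(l1, h2)            # mixed-frequency map Gb
    u_a = guided_complement(g_a, h1)  # complete the HF information of Ga
    u_b = guided_complement(g_b, h2)  # complete the HF information of Gb
    h_f = np.where(np.abs(u_a) >= np.abs(u_b), u_a, u_b)  # placeholder HF fusion rule
    l_f = 0.5 * (l1 + l2)             # placeholder for the WLE/WSEML LF fusion
    return l_f + h_f                  # placeholder for the inverse NSST reconstruction

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s1, s2 = rng.random((64, 64)), rng.random((64, 64))
    print(tdfusion_sketch(s1, s2).shape)  # (64, 64)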
Figure 2. This figure displays the intermediate results of the fusion process. (a1,a2) are the source images; (b1,b2) are slices of the high-frequency maps H1, H2; (c1,c2) are the low-frequency maps L1, L2; (d1,d2) are slices of the mixed-frequency maps.
Figure 3. A comparison between the slices of the high-frequency maps U1,2 (a1,a2), the original mixed-frequency maps Ga,b (b1,b2), and the complete mixed-frequency maps Ua,b (c1,c2). The blue and red boxes show the enlarged details of each frequency map.
Figure 4. The visual effects of various fusion methods on T1 and T2 images. Each image has two close-ups. The first two columns of each group are the source images T1 (a1–a3) and T2 (b1–b3). The fused images are obtained by NSST-PAPCNN (c1–c3), NSCT-PCDC (d1–d3), GFF (e1–e3), ASR (f1–f3), CS-MCA (g1–g3), FCFusion (h1–h3), and TDFusion (i1–i3). The red boxes show the enlarged details of the fused results.
Figure 5. The visual effects of various fusion methods on T2 and PD images. Each image has two close-ups. The first two columns of each group are the source images T2 (a1–a3) and PD (b1–b3). The fused images are obtained by NSST-PAPCNN (c1–c3), NSCT-PCDC (d1–d3), GFF (e1–e3), ASR (f1–f3), CS-MCA (g1–g3), FCFusion (h1–h3), and TDFusion (i1–i3). The red boxes show the enlarged details of the fused results.
Figure 6. The visual effects of various fusion methods on CT and MR-T2 images. Each image has two close-ups. The first two columns of each group are the source images CT (a1–a3) and T2 (b1–b3). The fused images are obtained by NSST-PAPCNN (c1–c3), NSCT-PCDC (d1–d3), GFF (e1–e3), ASR (f1–f3), CS-MCA (g1–g3), FCFusion (h1–h3), and TDFusion (i1–i3). The red boxes show the enlarged details of the fused results.
Figure 7. The visual effects of various fusion methods on MRI and PET images. Each image has two close-ups. The first two columns of each group are the source images MRI (a1–a3) and PET (b1–b3). The fused images are obtained by NSCT-PCDC (c1–c3), GFF (d1–d3), ASR (e1–e3), NSST-PAPCNN (f1–f3), FCFusion (g1–g3), and TDFusion (h1–h3). The red boxes show the enlarged details of the fused results.
Figure 8. The visual effects of various fusion methods on MRI and SPECT images. Each image has two close-ups. The first two columns of each group are the source images MRI (a1–a3) and SPECT (b1–b3). The fused images are obtained by GFF (c1–c3), ASR (d1–d3), NSST-PAPCNN (e1–e3), NSCT-PCDC (f1–f3), FCFusion (g1–g3), and TDFusion (h1–h3). The red boxes show the enlarged details of the fused results.
Figure 9. The convergence diagrams of grayscale fusion and color fusion. Following Equation (10), the converged value of F is shown for grayscale fusion in (a) and for color fusion in (b).
Figure 10. Results of the ablation experiments. The indicators are ChenBlum, FMI-pixel, MS-SSIM, NCC, Q^{AB/F}, SF, and Std.
Table 1. Symbols and meanings.
Symbols | Meanings
a_ijk | Scalar
A | Matrix
A^T | Conjugate transpose of a matrix
𝒜 | Third-order tensor
𝒜(i,:,:) | Horizontal slice of the tensor 𝒜
𝒜(:,j,:) | Side slice of the tensor 𝒜
𝒜(:,:,k) | Front slice of the tensor 𝒜
𝒜(i,j,:) | Tube of the tensor 𝒜
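As a concrete illustration of the slice and tube notation in Table 1, the short NumPy snippet below indexes a third-order array in the same way (Python indices are zero-based; the array contents and sizes are arbitrary).

import numpy as np

# A third-order tensor of size 3 x 4 x 5, used only to illustrate the indexing
# conventions of Table 1.
A = np.arange(3 * 4 * 5).reshape(3, 4, 5)

horizontal_slice = A[0, :, :]  # A(i, :, :): horizontal slice, shape (4, 5)
side_slice       = A[:, 1, :]  # A(:, j, :): side (lateral) slice, shape (3, 5)
front_slice      = A[:, :, 2]  # A(:, :, k): front slice, shape (3, 4)
tube             = A[0, 1, :]  # A(i, j, :): tube (mode-3 fiber), shape (5,)

print(horizontal_slice.shape, side_slice.shape, front_slice.shape, tube.shape)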
Table 2. Experimental detailed specifications.
Experimental Environment | Parameters
Experimental equipment | Intel Core i5 dual-core processor; 8 GB 1600 MHz DDR3
Compiling software | MATLAB 2016b
Table 3. Objective performance of different fusion methods on seven metrics over T1–T2, T2–PD, CT–MRI, MRI–PET, and MRI–SPECT. The top three for each metric are marked in red, blue, and green.
Methods | ChenBlum | FMI-Pixel | MS-SSIM | NCC | Q^{AB/F} | SF | Std
T1–T2:
TDFusion | 0.6514 | 0.8676 | 0.9667 | 0.8106 | 0.6605 | −0.0776 | 0.3277
FCFusion | 0.6260 | 0.8422 | 0.9146 | 0.8080 | 0.6390 | −0.1547 | 0.2533
ASR | 0.6370 | 0.8351 | 0.9261 | 0.8054 | 0.5960 | −0.2086 | 0.2292
CS-MCA | 0.6261 | 0.8430 | 0.9525 | 0.8058 | 0.6399 | −0.1457 | 0.2550
GFF | 0.6106 | 0.8477 | 0.9234 | 0.8067 | 0.6438 | −0.1526 | 0.2606
NSCT-PCDC | 0.5449 | 0.8275 | 0.8792 | 0.8042 | 0.5467 | −0.1383 | 0.2427
NSST-PAPCNN | 0.4459 | 0.8064 | 0.8679 | 0.8050 | 0.4031 | −0.3536 | 0.2956
T2–PD:
TDFusion | 0.7622 | 0.8924 | 0.9813 | 0.8069 | 0.6450 | −0.0882 | 0.2249
FCFusion | 0.7486 | 0.8790 | 0.9622 | 0.8054 | 0.6330 | −0.1447 | 0.1933
ASR | 0.7606 | 0.8745 | 0.9669 | 0.8054 | 0.6167 | −0.2392 | 0.1842
CS-MCA | 0.7480 | 0.8821 | 0.9785 | 0.8055 | 0.6447 | −0.1495 | 0.1976
GFF | 0.7325 | 0.8793 | 0.9584 | 0.8056 | 0.6332 | −0.1951 | 0.1881
NSCT-PCDC | 0.6627 | 0.8674 | 0.9441 | 0.8046 | 0.5675 | −0.1309 | 0.1838
NSST-PAPCNN | 0.5787 | 0.8608 | 0.9520 | 0.8051 | 0.5434 | −0.2473 | 0.2126
CT–MRI:
TDFusion | 0.7122 | 0.9101 | 0.9328 | 0.8065 | 0.5957 | −0.1014 | 0.3282
FCFusion | 0.6860 | 0.8922 | 0.9146 | 0.8063 | 0.5390 | −0.2547 | 0.2533
ASR | 0.7114 | 0.9033 | 0.9081 | 0.8058 | 0.5468 | −0.2691 | 0.2621
CS-MCA | 0.6936 | 0.9080 | 0.9256 | 0.8059 | 0.5468 | −0.2342 | 0.2887
GFF | 0.6956 | 0.9054 | 0.8263 | 0.8061 | 0.5835 | −0.2906 | 0.2529
NSCT-PCDC | 0.6075 | 0.8961 | 0.8369 | 0.8052 | 0.5350 | −0.1789 | 0.2695
NSST-PAPCNN | 0.5449 | 0.8850 | 0.8911 | 0.8058 | 0.5168 | −0.2833 | 0.3264
MRI–PET:
TDFusion | 0.6398 | 0.8970 | 0.999994 | 0.8115 | 0.8188 | −0.0148 | 0.2526
FCFusion | 0.6365 | 0.8852 | 0.99994 | 0.8114 | 0.8090 | −0.0127 | 0.2536
ASR | 0.6386 | 0.8533 | 0.999947 | 0.8043 | 0.7545 | −0.1419 | 0.1656
GFF | 0.6569 | 0.8928 | 0.999996 | 0.8112 | 0.8136 | −0.0263 | 0.2427
NSCT-PCDC | 0.5970 | 0.8952 | 0.999996 | 0.8106 | 0.8146 | −0.0136 | 0.2470
NSST-PAPCNN | 0.6438 | 0.8958 | 0.999995 | 0.8113 | 0.7811 | −0.0120 | 0.2524
MRI–SPECT:
TDFusion | 0.6588 | 0.8972 | 0.999983 | 0.8094 | 0.7719 | −0.0302 | 0.2627
FCFusion | 0.6510 | 0.8962 | 0.999980 | 0.8086 | 0.7310 | −0.0547 | 0.2413
ASR | 0.6511 | 0.8655 | 0.999955 | 0.8049 | 0.6826 | −0.1884 | 0.1803
GFF | 0.6845 | 0.8959 | 0.999986 | 0.8091 | 0.7682 | −0.0469 | 0.2438
NSCT-PCDC | 0.6261 | 0.8967 | 0.999992 | 0.8084 | 0.7315 | −0.0310 | 0.2521
NSST-PAPCNN | 0.6454 | 0.8960 | 0.999987 | 0.8088 | 0.7215 | −0.0278 | 0.2618
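Among the metrics in Table 3, spatial frequency (SF) has the simplest closed form; the sketch below computes the classical definition for a grayscale image. Note that the SF column in the table appears to be reported on a different, signed scale, so its values are not directly comparable to this raw definition.

import numpy as np

def spatial_frequency(img):
    # Classical spatial frequency: root-mean-square of the row and column first differences.
    img = np.asarray(img, dtype=np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(spatial_frequency(rng.random((128, 128))))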
Table 4. The average running time of different methods.
Methods | CS-MCA | GFF | NSCT-RPCNN | NSCT-PCDC | NSST-PAPCNN | FCFusion | TDFusion
Time | 137.38 | 0.06 | 8.43 | 15.14 | 6.86 | 56.89 | 20.66
Table 5. The comparison of metric values with different parameter values.
Parameters | ChenBlum | FMI-Pixel | MS-SSIM | NCC | Q^{AB/F} | SF | Std
γ = 1, α = 1 | 0.6167 | 0.89140 | 0.965741 | 0.80886 | 0.5990 | −0.0855 | 0.25175
γ = 1, α = 2 | 0.6137 | 0.89136 | 0.965740 | 0.80877 | 0.6033 | −0.0865 | 0.25174
γ = 1, α = 3 | 0.6117 | 0.89124 | 0.965731 | 0.80874 | 0.6065 | −0.0871 | 0.25173
γ = 10, α = 1 | 0.6054 | 0.89081 | 0.965602 | 0.80850 | 0.6295 | −0.0911 | 0.25164
γ = 10, α = 2 | 0.5981 | 0.89045 | 0.965476 | 0.80815 | 0.6309 | −0.0978 | 0.25149
γ = 10, α = 3 | 0.5927 | 0.89007 | 0.965344 | 0.80793 | 0.6321 | −0.1043 | 0.25132
γ = 0.1, α = 1 | 0.5919 | 0.89053 | 0.965139 | 0.80802 | 0.6145 | −0.0941 | 0.25155
γ = 0.01, α = 1 | 0.5867 | 0.89038 | 0.964926 | 0.80791 | 0.6115 | −0.0957 | 0.25150
γ = 0.1, α = 2 | 0.5788 | 0.89026 | 0.964749 | 0.80778 | 0.6052 | −0.0980 | 0.25142
γ = 0.01, α = 2 | 0.5704 | 0.89008 | 0.964429 | 0.80766 | 0.6022 | −0.1003 | 0.25132
γ = 0.1, α = 3 | 0.5702 | 0.89006 | 0.964499 | 0.80765 | 0.6009 | −0.1008 | 0.25130
γ = 0.01, α = 3 | 0.5603 | 0.88983 | 0.964045 | 0.80752 | 0.5985 | −0.1036 | 0.25117
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
