Article

Two-Branch Feature Interaction Fusion Method Based on Generative Adversarial Network

1 Yuxi Power Supply Bureau of Yunnan Power Grid Co., Ltd., Yuxi 651100, China
2 The School of Information Science and Technology, Yunnan Normal University, Kunming 650000, China
3 The Network and Information Center, Yunnan Normal University, Kunming 650000, China
* Authors to whom correspondence should be addressed.
Electronics 2023, 12(16), 3442; https://doi.org/10.3390/electronics12163442
Submission received: 31 May 2023 / Revised: 25 June 2023 / Accepted: 29 June 2023 / Published: 15 August 2023

Abstract

This study proposes a fusion method for infrared and visible images based on feature interaction. Existing fusion methods fall into two categories: those built on single-branch networks and those built on two-branch networks. Generative adversarial networks are widely used in single-branch fusion methods, which ignore the differences in feature extraction required by different input images. Most two-branch fusion methods use convolutional neural networks, which do not exploit the inverse promotion of the fusion results and lack interaction between features from different inputs. To remedy these shortcomings and better utilize the features of the source images, this study proposes a two-branch feature interaction method based on a generative adversarial network for visible and infrared image fusion. In the generator, a two-branch feature interaction approach was designed to extract features from the different inputs and to realize feature interaction through network connections between the branches. In the discriminator, a double-classification discriminator was used for visible and infrared images. Extensive comparison experiments with state-of-the-art methods demonstrate the advantages of the proposed network, which enhances the texture details of objects in the fusion results and reduces the interference of noise information from the source inputs. These advantages were also confirmed in generalization experiments on object detection.

1. Introduction

Infrared images and visible images are different types of images obtained from different sensors that capture different types of information about the same scene. Infrared images provide texture detail and contrast based on the intensity of the heat emitted by objects [1]. Conversely, visible images are strongly affected by strong light, weak light, and smoke, which may make it difficult to capture effective information and introduce a large amount of noise interference. Feature complementation and feature fusion are therefore very necessary for these two classes of images [2]. Accordingly, novel fusion methods have been designed according to the differences in image information, which can filter out noise and improve the diversity of information in the fused images.
Infrared and visible image fusion was first performed with traditional methods such as sparse representation (SR) methods [3], multi-scale transform (MST) methods [4,5,6], low-rank representation (LRR) [7,8], and other methods [9,10]. With the development of deep learning, weakly supervised deep learning networks have been used more and more frequently in image fusion. These methods can be roughly divided into convolutional-neural-network-based fusion methods [11,12,13,14,15] and generative-adversarial-network-based fusion methods [1,2,16,17,18,19].
All of the above traditional and deep-learning-based approaches address some problems and provide new ideas, but some challenges have not yet been overcome:
  • (a) Existing fusion methods based on generative adversarial networks and two-branch fusion methods based on convolutional neural networks have inspired new fusion ideas, but they also introduce new challenges.
  • (b) Existing fusion methods based on generative adversarial networks adopt a single-branch mode for feature extraction [1,2,16,17]. These methods cannot avoid the loss of infrared features caused by interference information from the visible images, which reduces the contrast of the scene and weakens the texture information in occluded regions.
  • (c) Some existing fusion methods based on convolutional neural networks use a two-branch mode to extract features [11,12,13,14,15]. We draw inspiration from the way these methods extract features individually, but they also face problems. Feature correlation between the different source images is reduced in this simple two-branch design. Moreover, restricting the feature extraction and reconstruction of the network only through loss functions may retain too much useless information in the fusion results when the source images are disturbed by smoke or strong light.
To address the three aforementioned issues, this study designed a two-branch fusion model for infrared and visible images. The main contributions of the proposed model are as follows:
(1) In terms of issue (a): This study designed a generative adversarial network based on two-branch feature interaction for infrared and visible image fusion. The advantages of two-branch feature extraction and the adversarial mechanism of generative adversarial networks were both exploited in the proposed model.
(2) In terms of issue (b): This study designed a generator based on two-branch feature extraction to extract features from the visible and infrared images. The two-branch extraction mode addresses the loss of texture features in infrared images caused by large areas of interference information in the visible images.
(3) In terms of issue (c): This study designed a two-branch feature extraction mode with feature interaction and enhanced the feature correlation between the branches through layer-hopping connections. The feature similarity between the source images and the fused results was further enhanced by the discriminator, which optimizes the generator.

2. Related Works

Over the past few years, visible and infrared image fusion algorithms designed with traditional methods have become well established. With the wide application of deep learning, deep networks have also been widely used in the design of infrared and visible image fusion algorithms. In the following, this paper first introduces some traditional methods for image fusion, then introduces algorithms based on convolutional neural networks (CNNs), and finally introduces fusion algorithms based on generative adversarial networks (GANs).

2.1. The Traditional Fusion Method

In traditional methods, feature extraction, feature fusion, and feature reconstruction are designed with different mathematical tools to optimize the quality of the fused image. Four representative categories of traditional methods are described in detail below.
These include sparse representation fusion methods [3], multi-scale transform (MST) fusion methods [4,5,6], low-rank representation fusion methods [7,8] and other fusion methods [9,10].
Regarding multi-scale transform (MST) fusion methods [4,5,6], the main idea of MST is to obtain multi-scale representations of the input images through multi-scale transformations. Specific fusion rules are then designed to obtain the multi-scale fusion coefficients, which are related to the correlation and activity between pixels in the multi-scale representations. Finally, the fused image is obtained by applying the inverse multi-scale transform to the fused coefficients.
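To make this pipeline concrete, the following Python sketch fuses a pair of grayscale infrared and visible images with a Laplacian pyramid as the multi-scale transform, a max-absolute-coefficient rule for the detail layers, and averaging for the base layer. The pyramid depth, the fusion rule, and the file paths are illustrative assumptions rather than the settings of any of the cited methods.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    # Gaussian pyramid first, then band-pass (detail) layers plus the low-frequency residual
    gaussian = [img.astype(np.float32)]
    for _ in range(levels):
        gaussian.append(cv2.pyrDown(gaussian[-1]))
    pyramid = []
    for i in range(levels):
        up = cv2.pyrUp(gaussian[i + 1], dstsize=gaussian[i].shape[1::-1])
        pyramid.append(gaussian[i] - up)          # multi-scale detail representation
    pyramid.append(gaussian[-1])                  # base layer
    return pyramid

def fuse_pyramids(pyr_a, pyr_b):
    # fusion rule: keep the coefficient with larger activity (absolute value) per pixel
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(pyr_a[:-1], pyr_b[:-1])]
    fused.append(0.5 * (pyr_a[-1] + pyr_b[-1]))   # average the base layers
    return fused

def reconstruct(pyramid):
    # inverse multi-scale transform of the fused coefficients
    img = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        img = cv2.pyrUp(img, dstsize=band.shape[1::-1]) + band
    return np.clip(img, 0, 255).astype(np.uint8)

ir = cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE)     # placeholder paths
vis = cv2.imread("vis.png", cv2.IMREAD_GRAYSCALE)
fused = reconstruct(fuse_pyramids(laplacian_pyramid(ir), laplacian_pyramid(vis)))
```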
Sparse representation (SR) fusion methods [3] transform the two source images into single-scale feature vectors through a linear combination over a dictionary, and then fuse and reconstruct the feature vectors to obtain the fused result.
For low-rank representation fusion methods [7,8], the fusion process is divided into three steps: the low-rank parts and salient features of the source images are extracted first, the different extracted features are then fused separately, and, finally, the fused features are reconstructed. The authors in [20] designed an image decomposition strategy, MDLatLRR, based on the inexact augmented Lagrange multiplier method, and the weighted average method was used to fuse the different features.

2.2. Convolutional-Neural-Network-Based Fusion Methods

The convolutional neural network is a type of neural network that is commonly used in computer vision tasks. It consists of multiple layers, including convolutional layers, pooling layers, and fully connected layers. The convolutional layer applies a set of filters to input images, which helps to extract features such as edges and textures. The pooling layer then reduces the spatial size of the feature maps by taking the maximum or average value in each region. The fully connected layer connects all neurons from the previous layers to the output layer for classification or regression.
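As a minimal illustration of these three layer types, the PyTorch snippet below stacks one convolutional layer, one pooling layer, and one fully connected layer; the channel widths, the 32 × 32 input size, and the 10-class output are arbitrary placeholders, not a network used in this paper.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # filters extract edges and textures
        self.pool = nn.MaxPool2d(2)                              # halves the spatial size of the feature maps
        self.fc = nn.Linear(16 * 16 * 16, num_classes)           # maps all neurons to the output layer

    def forward(self, x):                 # x: (batch, 3, 32, 32)
        x = torch.relu(self.conv(x))
        x = self.pool(x)                  # -> (batch, 16, 16, 16)
        return self.fc(x.flatten(1))

logits = TinyCNN()(torch.randn(4, 3, 32, 32))     # shape (4, 10)
```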
Convolutional neural networks have become the first choice of image fusion methods as networks have evolved. The fusion method of infrared and visible images also uses convolutional neural networks more widely [21,22,23,24,25,26]. The single-branch fusion method and two-branch fusion method are two kinds of convolutional-neural-network-based fusion methods.
The authors in [27] proposed U2Fusion, a unified framework for multiple types of image fusion based on a single-branch network. An adaptive method is used to estimate the significance of features from both source images during feature extraction and feature retention. Although it achieves good fusion results for visible and infrared images, the training set of the model was derived solely from multi-focus image data, which leads to poorer performance in extracting and preserving feature information from other types of images. Moreover, the choice of an appropriate fusion strategy is also an issue, since different types of image fusion require different fusion strategies.
Most fusion methods based on convolutional neural networks use two-branch networks to obtain features from the source images. An image fusion framework based on CNNs was proposed by Zhang et al. [12] and proved suitable for various types of image fusion, such as visible and infrared images, images with different exposures, and medical images of different modalities. First, fully convolutional neural networks were used, so that training was end-to-end without the need for manual intervention or post-processing. However, the model was trained only on multi-focus image datasets. Finally, it also set a precedent for CNN-based image fusion models with perceptual loss. The authors in [11] created RFN, which uses different encoders to encode the two source images, fuses the encoded features, and then transmits them to a decoder to obtain the fusion results. The authors in [13] created DIDFuse, which uses different encoders to decompose the source images into background parts and detail parts to obtain different feature maps, and then uses decoders to restore the features to visible or infrared images; the network is well optimized through this encoding and decoding. The authors in [14] designed SeAFusion, which uses different branches to extract different features from the source images and introduces gradient residual dense blocks in each branch to improve the ability of the network convolutions to acquire features. Such feature extraction methods provided us with good inspiration. In the method designed by Li et al. [28], base parts and detail parts of the source images are obtained by different branches, and a weighted average and a deep learning network are then used to fuse them; the fused image consists of these two parts.
The above fusion methods based on convolutional neural networks solve some challenges in the fusion field, but some problems remain open. These methods rely only on the design of the loss function, which limits the feature extraction ability of the network and introduces noise into the feature reconstruction, resulting in a large number of details being covered by noise information.

2.3. Generative-Adversarial-Network-Based Methods

The generative adversarial network constitutes one of the most promising methods of weakly supervised learning. This model can be divided into generator and discriminator parts. The generator in GANs typically consists of one or more deep neural networks that take random noise as the input and produce fake data samples as the output. The discriminator, on the other hand, is a binary classifier that takes data samples as input and outputs a probability score indicating whether the input is real or fake. These two parts of the adversarial game contribute to the optimization of the entire fusion network. Goodfellow et al. [29] designed the GAN model. This network concept based on adversarial learning has brought new vitality to the deep learning field.
The first GAN-based fusion model, FusionGAN, was proposed by Ma et al. [2]; it is a fusion framework based on adversarial learning for visible and infrared image fusion tasks. The function of the generator is to obtain fusion results that mainly include intensity features from the infrared images as well as gradient features from the visible images. Meanwhile, the function of the discriminator is to make the fused images retain more detail features from the visible images. A specific discriminator was designed to judge the authenticity of visible images and fusion results, thereby promoting the corresponding generator to produce more feature-rich results.
DDcGAN [17] was designed for fusing source images with different resolutions. The source images with different resolutions are passed into the encoder and decoder parts of the generator to obtain fused images. The fused images are fed into different discriminators, both of which simultaneously promote the generator. Ma et al. [1] used multiple classifiers in generative adversarial networks (GANMcC) to balance the degree of information retention from the different source images in the fused results. In addition, a new content loss function was introduced in GANMcC, which includes two types of loss, gradient loss and intensity loss, and different content loss terms were adopted for the different source images.
Unfortunately, existing generative adversarial networks used in the fusion domain all adopt a single-branch mode, and such feature extraction methods have difficulty in specialized feature extraction for different source images.

3. Proposed Method

The proposed fusion model based on two-branch feature complementation is described in this section. First, the problem formulation is introduced. Second, the proposed fusion network is described in detail, including the generator based on two-branch feature complementarity and the double-classification discriminator based on layer-hopping connections. The loss functions used to optimize the discriminator and generator are then introduced. Finally, this section provides certain details of model training and testing.

3.1. Problem Formulation

A two-branch feature complementary generator is proposed. Visible images $I_v \in \mathbb{R}^{h \times w \times 3}$ are passed into one branch of the proposed generator, and infrared images $I_r \in \mathbb{R}^{h \times w \times 1}$ are passed into the other branch. Feature extraction is performed via convolution on both source images. The features obtained from each convolution layer of the infrared branch are passed into the corresponding layer of the other branch for feature complementarity. The features from both branches generate fused images $I_f \in \mathbb{R}^{h \times w \times 3}$ after feature fusion and feature reconstruction. The fusion results obtained from the two-branch feature complementary generator, the visible images, and the infrared images are respectively fed to a dedicated discriminator based on layer-hopping connections. The identification results guide the optimization of the network through the loss functions.

3.2. The Fusion Model Based on Two-Branch Feature Complementation

In this section, the proposed fusion network is described in detail, including the generator based on two-branch feature complementarity and the double-classification discriminator based on layer-hopping connections, as shown in Figure 1. Visible and infrared images are fed into the two branches of the proposed generator separately. In each branch, the features of the corresponding image are extracted by convolution. It is worth noting that the features in each layer of the infrared branch are passed into the other branch. Then, the features of the two branches are fused and reconstructed to obtain the fusion results. The fusion results, visible images, and infrared images are passed into the double-classification discriminator based on layer-hopping connections in turn. The discriminator result is used to promote the fusion performance of the two-branch generator through the updated loss functions. The two-branch generator and the updated discriminator enhance the performance of the entire network through this interaction.

3.2.1. The Architecture of the Generator Based on Two-Branch Feature Complementarity

The design of the two-branch network improves the ability to extract features from the different source images. Unlike previous two-branch networks, this study designed pixel superposition between the corresponding convolution layers of the different branches and proposed the idea of feature complementarity between different features during feature extraction. This design promotes feature complementarity between the two source images in the process of extracting features from the visible and infrared images.
Visible images and infrared images are passed into the two branches of the proposed generator, respectively, as shown in Figure 2. The result of each convolution layer in the infrared branch is not only passed into the next convolution layer of its own branch, but is also added pixel-wise to the result of the corresponding convolution layer in the visible branch and passed into the next convolution layer of the visible branch. Subsequently, the results of the last convolution layers of the two branches are added and passed into four further convolution layers for feature fusion and feature reconstruction. All convolution layers in the proposed generator backbone consist of a convolution layer, a batch normalization layer, and a ReLU layer.
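A minimal PyTorch sketch of this generator is given below. The number of extraction layers per branch, the channel widths, and the output activation are not specified in the text and are therefore assumptions taken for illustration; only the cross-branch pixel addition and the four fusion/reconstruction layers follow the description above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # every backbone layer: convolution + batch normalization + ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TwoBranchGenerator(nn.Module):
    def __init__(self, channels=(32, 64, 64, 32)):   # assumed depth and widths
        super().__init__()
        vis_in, ir_in = 3, 1
        self.vis_layers, self.ir_layers = nn.ModuleList(), nn.ModuleList()
        for out_ch in channels:
            self.vis_layers.append(conv_block(vis_in, out_ch))
            self.ir_layers.append(conv_block(ir_in, out_ch))
            vis_in = ir_in = out_ch
        # four further convolution layers for feature fusion and reconstruction
        self.reconstruct = nn.Sequential(
            conv_block(channels[-1], 64),
            conv_block(64, 32),
            conv_block(32, 16),
            nn.Conv2d(16, 3, kernel_size=3, padding=1),  # output activation is an assumption
        )

    def forward(self, vis, ir):
        f_vis, f_ir = vis, ir
        for vis_layer, ir_layer in zip(self.vis_layers, self.ir_layers):
            ir_out = ir_layer(f_ir)
            # each infrared layer result is added pixel-wise to the visible layer result,
            # and the sum feeds the next visible-branch layer (feature interaction)
            f_vis = vis_layer(f_vis) + ir_out
            f_ir = ir_out
        # after the last layer, f_vis already equals the sum of the two branch outputs
        return torch.tanh(self.reconstruct(f_vis))
```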

3.2.2. The Architecture of Double-Classification Discriminator Based on Layer-Hopping Connections

Fusion models using a single infrared-image discriminator may reduce the degree of feature retention from the visible images. Therefore, this study designed a double-classification discriminator to solve the problem of missing features from the visible images. In addition, the feature extraction capability of the discriminator on the input images was enhanced, and its ability to distinguish images was improved, by introducing layer-hopping connections.
As shown in Figure 3, the double-classification discriminator consists of three 3 × 3 convolution layers, a 5 × 5 convolution layer, and an 8 × 8 convolution layer, all of which are equipped with batch normalization layers and ReLU layers. The output of the five convolution layers is classified by an activation layer.
The classification results of the different images are used to guide the generator to retain different reconstruction features through the loss function. The two-branch generator and double-classification discriminator enhance the feature extraction and reconstruction capabilities of the network through the action of loss functions.
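The sketch below mirrors this discriminator at a high level: three 3 × 3 blocks, one 5 × 5 block, and one 8 × 8 block, each with batch normalization and ReLU, followed by an activation that yields the classification score. Strides, channel widths, and the placement of the layer-hopping connection are not given in the text (Figure 3 shows two such connections; only one is sketched here), so these choices are assumptions, and the 1-channel infrared input is assumed to be replicated to three channels before being fed in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleClassDiscriminator(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        def block(i, o, k, s):
            return nn.Sequential(
                nn.Conv2d(i, o, kernel_size=k, stride=s, padding=k // 2),
                nn.BatchNorm2d(o),
                nn.ReLU(inplace=True),
            )
        self.b1 = block(in_ch, 32, 3, 2)   # three 3x3 convolution blocks
        self.b2 = block(32, 64, 3, 2)
        self.b3 = block(64, 64, 3, 2)
        self.b4 = block(64, 128, 5, 2)     # 5x5 convolution block
        self.b5 = block(128, 128, 8, 2)    # 8x8 convolution block
        self.hop = nn.Conv2d(64, 128, kernel_size=1)   # assumed layer-hopping projection
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, 1), nn.Sigmoid())   # activation layer

    def forward(self, x):
        f3 = self.b3(self.b2(self.b1(x)))
        deep = self.b5(self.b4(f3))
        # layer-hopping connection: shallow features are projected, resized, and added
        skip = F.adaptive_avg_pool2d(self.hop(f3), deep.shape[-2:])
        return self.head(deep + skip)      # probability-like classification score
```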

3.3. Loss Functions

This section first introduces the lightweight loss functions of the two-branch generator, which include a traditional loss function, a content loss function, and an adversarial loss function. Then, the proposed loss function of the double-classification discriminator is introduced.

3.3.1. Loss Functions of the Generator Based on Two-Branch Feature Complementarity

Recently, GAN-based image fusion methods have introduced content loss into the loss function. However, this is not sufficient to solve some remaining problems, such as the insufficient ability to extract gradient information from both the visible and infrared images, which results in a large gradient difference between the source images and the fused images. To solve these problems, this study designed the traditional loss function $L_{tra}$, the updated content loss function $L_{con}^{*}$, and the adversarial loss function $L_{Adv}^{*}$, which together improve the similarity between the source images and the fused results.
The global loss function of the two-branch generator, $L_G$, is defined as
$$L_G = a \cdot L_{tra} + b \cdot L_{con}^{*} + c \cdot L_{Adv}^{*},$$
where $L_{tra}$ denotes the traditional loss, $L_{con}^{*}$ denotes the content loss, and $L_{Adv}^{*}$ denotes the adversarial loss. $a$, $b$, and $c$ are the corresponding weighting parameters of the three parts. The three parts of the loss function are described in detail next.
$$L_{tra} = \frac{1}{T} \sum_{t=1}^{T} \frac{\left( I_f^{t}(m,n) - \beta \cdot I_r^{t}(m,n) - \alpha \cdot I_v^{t}(m,n) \right)}{m \cdot n}.$$
On the right-hand side of $L_{tra}$, $I_f^{t}(m,n)$ denotes the pixel in row $m$ and column $n$ of the $t$th fused image, $I_r^{t}(m,n)$ denotes the corresponding pixel of the $t$th infrared image, and $I_v^{t}(m,n)$ denotes the corresponding pixel of the $t$th visible image. $\alpha$ and $\beta$ are the corresponding weighting parameters.
$$L_{con}^{*} = \frac{1}{HL} \left( \left\| \nabla I_f - \nabla I_r \right\|_F^{2} + \xi \left\| \nabla I_f - \nabla I_v \right\|_F^{2} \right).$$
The second part is the content loss $L_{con}^{*}$. In the formula above, $L$ and $H$ denote the width and height of the input data, respectively. $\left\| \cdot \right\|_F$ denotes the matrix Frobenius norm, and $\nabla$ denotes the Laplace gradient operator. $\xi$ balances the proportions of the two terms in the updated content loss.
$$L_{Adv}^{*} = \frac{1}{2T} \sum_{t=1}^{T} \left( \left( D(I_f^{t}) - d_1 \right)^{2} + \left( D(I_f^{t}) - d_2 \right)^{2} \right).$$
The third part is the updated adversarial loss $L_{Adv}^{*}$, where $I_f^{t}$ denotes the $t$th fused image and $T$ is the number of fused images. $d_1$ denotes the value at which the proposed generator expects discriminator $D$ to regard the input as fake data, and $d_2$ denotes the value at which the proposed generator expects discriminator $D$ to regard the input as real data.
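The snippet below is a hedged sketch of how these three generator loss terms could be computed in PyTorch. The Laplace operator kernel, the use of a mean absolute difference in the traditional loss (the printed formula leaves the aggregation over pixels ambiguous), and broadcasting the 1-channel infrared image against the 3-channel fusion are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

LAPLACE = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian(x):
    # apply the Laplace gradient operator channel-wise
    k = LAPLACE.to(x.device, x.dtype).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, k, padding=1, groups=x.shape[1])

def loss_tra(i_f, i_r, i_v, alpha=0.5, beta=0.5):
    # traditional loss: difference between the fusion and the weighted source images
    return (i_f - beta * i_r - alpha * i_v).abs().mean()

def loss_con(i_f, i_r, i_v, xi=5.0):
    # updated content loss: gradient consistency with the infrared and visible images
    h, w = i_f.shape[-2:]
    g_ir = (laplacian(i_f) - laplacian(i_r)).pow(2).sum(dim=(-3, -2, -1))
    g_vis = (laplacian(i_f) - laplacian(i_v)).pow(2).sum(dim=(-3, -2, -1))
    return ((g_ir + xi * g_vis) / (h * w)).mean()

def loss_adv(d_fused, d1, d2):
    # adversarial loss: push the discriminator scores of fused images toward d1 and d2
    return 0.5 * ((d_fused - d1).pow(2).mean() + (d_fused - d2).pow(2).mean())

def loss_generator(i_f, i_r, i_v, d_fused, d1, d2, a=1.0, b=50.0, c=50.0):
    # global generator loss L_G = a*L_tra + b*L_con* + c*L_Adv*
    return a * loss_tra(i_f, i_r, i_v) + b * loss_con(i_f, i_r, i_v) + c * loss_adv(d_fused, d1, d2)
```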

3.3.2. Loss Functions of the Double-Classification Discriminator Based on Layer-Hopping Connections

A discriminator that judges only the infrared images guides the corresponding generator to retain mainly infrared features; under such a discriminator, the semantic features of the visible images cannot realize the complementary function of the different types of features. Consequently, the proposed model should apply the same processing steps and calculation methods in the discriminator to the two different input images. Therefore, this study designed a double-classification discriminator with the redefined loss function $L_{Dis}^{*}$, which is defined as
$$L_{Dis}^{*} = V_{Dis}(I_v) + V_{Dis}(I_f) + V_{Dis}(I_r).$$
$L_{Dis}^{*}$ can be divided into three parts. In order to apply the same processing steps and calculation methods to the infrared images as to the visible images, this study designed $V_{Dis}(I_r)$, which represents the estimated value for the infrared images. $V_{Dis}(I_v)$ and $V_{Dis}(I_f)$ represent the estimated values for the visible images and the fused images, respectively.
$$V_{Dis}(I_r) = \frac{1}{N} \sum_{n=1}^{N} \left( D(I_r^{n}) - d_1 \right)^{2}.$$
The first part is $V_{Dis}(I_r)$, where $I_r^{n}$ denotes the $n$th input infrared image, $N$ denotes the total number of input infrared images, and $D(I_r^{n})$ denotes the estimated value given by the discriminator. $d_1$ is the target value assigned to the infrared images in the discriminator loss.
$$V_{Dis}(I_v) = \frac{1}{N} \sum_{n=1}^{N} \left( D(I_v^{n}) - d_2 \right)^{2}.$$
The next part is $V_{Dis}(I_v)$, where $I_v^{n}$ denotes the $n$th input visible image, $N$ denotes the total number of input visible images, and $D(I_v^{n})$ denotes the output of the discriminator. $d_2$ is the target value assigned to the visible images in the discriminator loss.
$$V_{Dis}(I_f) = \frac{1}{N} \sum_{n=1}^{N} \left( D(I_f^{n}) - d_3 \right)^{2}.$$
The last part is $V_{Dis}(I_f)$, where $I_f^{n}$ denotes the $n$th fused image, $N$ denotes the total number of fused images, and $D(I_f^{n})$ denotes the output of the discriminator. $d_3$ is the target value assigned to the fused images, which the discriminator is expected to judge as fake data.
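A corresponding sketch of the discriminator loss is shown below; the three least-squares terms follow the equations above, and the target labels are drawn from the ranges given in Section 3.4 (treating them as uniform random draws is an assumption).

```python
import torch

def loss_discriminator(d_ir, d_vis, d_fused):
    # d_ir, d_vis, d_fused: discriminator scores for infrared, visible, and fused patches
    n = d_ir.shape[0]
    d1 = torch.empty(n, 1).uniform_(0.6, 1.1).to(d_ir.device)     # target for infrared images
    d2 = torch.empty(n, 1).uniform_(0.7, 1.2).to(d_vis.device)    # target for visible images
    d3 = torch.empty(n, 1).uniform_(0.0, 0.3).to(d_fused.device)  # target for fused images
    v_ir = (d_ir - d1).pow(2).mean()
    v_vis = (d_vis - d2).pow(2).mean()
    v_fused = (d_fused - d3).pow(2).mean()
    return v_ir + v_vis + v_fused
```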

3.4. Training Details

This study randomly selected 50 groups of visible and infrared images to train and optimize the proposed model in Python 3.7. By clipping each group of visible and infrared images with a step size of 14, training image patches of size 136 × 136 were obtained. In each step, t groups of 136 × 136 source image patches from the training dataset were selected as the inputs of the two-branch generator, and 136 × 136 fused image patches were obtained as its outputs. Then, the t patches of fused images, together with the t corresponding infrared patches and t corresponding visible patches, were fed into the discriminator. The two-branch generator and double-classification discriminator were trained for n steps with the Adam optimizer [30] to obtain the most efficient generator, as shown in Algorithm 1.
In practice, this study empirically set E = 10, m = 8, and k as the ratio of the total number of patches to m. The parameters of the two-branch generator and the double-classification discriminator were set as follows: a = 1, b = 50, and c = 50; α = 0.5, β = 0.5, and ξ = 5. d1 was a random number between 0.6 and 1.1, d2 was a random number between 0.7 and 1.2, and d3 was a random number between 0 and 0.3.
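The patch preparation described above can be sketched as follows; the sliding-window cropper uses the stated 136 × 136 patch size and 14-pixel step, while the image loading and array layout are placeholders.

```python
import numpy as np

def extract_patches(img, size=136, stride=14):
    """Crop size x size patches from img (H, W[, C]) with the given step size."""
    h, w = img.shape[:2]
    patches = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patches.append(img[top:top + size, left:left + size])
    return np.stack(patches)

# vis_img, ir_img: one of the 50 aligned training pairs (assumed already loaded as arrays)
# vis_patches = extract_patches(vis_img)   # (num_patches, 136, 136, 3)
# ir_patches = extract_patches(ir_img)     # (num_patches, 136, 136)
```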
During testing, four common datasets were selected to verify the proposed approach through cropping images without overlapping them. Then, a batch of image patches generated by a two-branch generator was spliced together to produce the fusion image.
Algorithm 1 Training procedure of the proposed method
1: for E epochs do
2:   for all n steps do
3:     Select t fusion patches {$I_f^{(1)}, \dots, I_f^{(t)}$} from G;
4:     Select t infrared patches {$I_r^{(1)}, \dots, I_r^{(t)}$};
5:     Select t visible patches {$I_v^{(1)}, \dots, I_v^{(t)}$};
6:     Update the discriminator Dis by $\frac{1}{N}\sum_{n=1}^{N}(D(I_r^{n}) - d_1)^2 + \frac{1}{N}\sum_{n=1}^{N}(D(I_v^{n}) - d_2)^2 + \frac{1}{N}\sum_{n=1}^{N}(D(I_f^{n}) - d_3)^2$;
7:   end for
8:   Select t infrared patches {$I_r^{(1)}, \dots, I_r^{(t)}$} and t visible patches {$I_v^{(1)}, \dots, I_v^{(t)}$} from the training data;
9:   Update the generator by $L_G$;
10: end for
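For reference, the loop below mirrors Algorithm 1 in PyTorch, assuming the TwoBranchGenerator, DoubleClassDiscriminator, loss_generator, and loss_discriminator sketches given earlier in this section are in scope; the data loader, learning rate, and the fixed stand-ins for the random labels d1 and d2 are placeholders.

```python
import torch

def train(generator, discriminator, loader, epochs=10, lr=1e-4, device="cuda"):
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    generator.to(device); discriminator.to(device)
    for _ in range(epochs):                              # for E epochs do
        for vis, ir in loader:                           # for all n steps do
            vis, ir = vis.to(device), ir.to(device)
            fused = generator(vis, ir).detach()          # fusion patches from G
            # step 6: update the discriminator on infrared, visible, and fused patches
            d_loss = loss_discriminator(discriminator(ir.repeat(1, 3, 1, 1)),  # 1 -> 3 channels (assumption)
                                        discriminator(vis),
                                        discriminator(fused))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # steps 8-9: select fresh patches and update the generator by L_G
        vis, ir = next(iter(loader))
        vis, ir = vis.to(device), ir.to(device)
        fused = generator(vis, ir)
        d1, d2 = 0.85, 0.95                              # fixed stand-ins for the random labels
        g_loss = loss_generator(fused, ir, vis, discriminator(fused), d1, d2)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```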

4. Experiments

In this section, the dataset, comparison method, and evaluation index designed in the comparison experiment are first detailed. Subsequently, comparative tests with twelve methods are carried out on four public datasets, and object detection task analysis and time complexity analysis are introduced in detail. Finally, this study designed an ablation experiment to verify the necessity of each part of the network and summarized the experimental part.

4.1. Experimental Configurations

4.1.1. Datasets and Comparison Method

We selected four public datasets: the TNO dataset, the RoadScene dataset [27], the M3FD dataset [16], and the MFNet dataset. Twelve state-of-the-art methods were used for comparison: ten deep-learning-based methods, namely FusionGAN [2], GANMcC [1], MFEIF [15], U2Fusion [27], IFCNN [12], DIDFuse [13], RFN [11], DDcGAN [17], TarDAL [16], and SeAFusion [14], as well as two traditional methods, MDLatLRR [20] and GTF [9].

4.1.2. Metrics

Six evaluation metrics were selected to quantitatively compare the proposed model with the other methods. They include measures of pixel-level similarity between the input and output images as well as measures computed on the output image alone, such as its entropy. The six evaluation metrics are as follows:
EN [31]: Information entropy is used to measure the information richness of fusion results. For fusion images obtained by different methods, the larger the numerical value of information entropy, the more the features from both source images are described in fusion images.
$$EN = -\sum_{f=0}^{F-1} p_f \log_2 p_f.$$
SF [32]: Spatial frequency reflects the gray change rate of the images. The larger the numerical value, the sharper the fusion image.
$$SF = \sqrt{ \frac{1}{KL} \sum_{m=1}^{K} \sum_{n=1}^{L} \left( x_{m,n} - x_{m,n-1} \right)^{2} + \frac{1}{KL} \sum_{m=1}^{K} \sum_{n=1}^{L} \left( x_{m,n} - x_{m-1,n} \right)^{2} }.$$
PSNR: One of the quality evaluations based on the noise calculation of the image is the peak signal-to-noise ratio; the higher the numerical value, the higher the pixel quality of the fusion images.
$$PSNR = 10 \log_{10} \frac{\left( MAX_f \right)^{2}}{\omega_m MSE_{mf} + \omega_n MSE_{nf}}.$$
VIF [33]: Visual information fidelity is an evaluation index based on information fidelity, which models the relationship between the image and human visual distortion. The VIF value always lies in the range [0, 1]; the closer it is to 1, the lower the image distortion, the more information is retained, and the more consistent the result is with human visual perception.
AG [34]: The average gradient calculates the gradient information of the images so as to measure their sharpness; the higher the value, the clearer the fusion result.
$$AG = \frac{1}{(I-1)(J-1)} \sum_{i=1}^{I-1} \sum_{j=1}^{J-1} \sqrt{ \frac{\left( H(i+1,j) - H(i,j) \right)^{2} + \left( H(i,j+1) - H(i,j) \right)^{2}}{2} }.$$
MSE: The mean square error calculates the difference between the source images and the fused results. The lower the value, the higher the similarity between the source images and the fused results.
$$MSE = \frac{1}{IJ} \sum_{i=0}^{I} \sum_{j=0}^{J} \left( f_{ij} - s_{ij} \right)^{2}.$$
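The reference-free metrics EN, SF, and AG can be sketched in NumPy as follows for an 8-bit grayscale fusion result; the exact normalizations mirror the formulas as reconstructed above and should be treated as illustrative rather than the authors' evaluation code.

```python
import numpy as np

def entropy(img):
    # EN: information entropy of the gray-level histogram
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def spatial_frequency(img):
    # SF: root of the mean squared row- and column-direction gray-level differences
    x = img.astype(np.float64)
    rf = np.mean((x[:, 1:] - x[:, :-1]) ** 2)
    cf = np.mean((x[1:, :] - x[:-1, :]) ** 2)
    return float(np.sqrt(rf + cf))

def average_gradient(img):
    # AG: mean local gradient magnitude
    x = img.astype(np.float64)
    dx = x[1:, :-1] - x[:-1, :-1]
    dy = x[:-1, 1:] - x[:-1, :-1]
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2)))
```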

4.2. Comparison Experiment

In this part, we conducted comparative tests with twelve comparison methods on four public datasets. The following are the quantitative analysis and qualitative analysis.

4.2.1. Qualitative Analysis

It can be clearly observed that TarDAL, MFEIF, DDcGAN, SeAFusion, U2Fusion, and IFCNN could not avoid retaining excessive useless information from the visible images, such as the smoke interference in Figure 4 and the strong light interference in Figure 5. Although the GTF, MDLatLRR, FusionGAN, GANMcC, and RFN methods could retain useful scene information from the infrared images under the interference of smoke or strong light, they only retained contrast information and lacked a lot of contour and texture information.
Conversely, the proposed method did a good job of preserving useful contrast and texture details from both source images when there was a lot of useless information in the visible images. In addition, the proposed method could extract and retain the weak texture and brightness information of infrared images well when the visible images could not capture effective information, as shown in Figure 4.
In order to better illustrate the advantages of the proposed approach, this study also conducted comparative tests on the TNO dataset and RoadScene dataset. The proposed method could also avoid interference caused by useless smoke information to generate the fusion results well, and it effectively used the information from the infrared images, as shown in Figure 6. Fortunately, the proposed method also showed good contrast information, as well as rich texture information, in scenes where the road had a strong light at night, as shown in Figure 7.

4.2.2. Quantitative Analysis

We randomly selected 70 pairs of visible and infrared images from each of the MFNet, M3FD, and RoadScene datasets, as well as 25 pairs from the TNO dataset. The images were quantitatively compared and analyzed across the six evaluation metrics.
As shown in Table 1, Table 2, Table 3 and Table 4, the proposed method outperformed all the comparison methods in the SF and AG values on all datasets, which indicates that the proposed fusion images have the highest spatial frequency and average gradient and therefore the sharpest details. The other evaluation indicators also ranked in the top four among the various methods, which indicates that the proposed model can not only retain different features from both source images, but also reconstruct the gradient information in the fused images.
In summary, the proposed fused image not only had more similar information from both source images, but the gradient information was also more satisfying for human visual observation.

4.3. Comparative Experiments Based on Object Detection

This study tested the performance of the different fusion results in object detection using the MFNet dataset, in order to better verify the advantage of the proposed fusion images in weakening noise information. The publicly available YOLOv5 was used as the object detection network, and the results are divided into visual results and quantitative results, which are introduced separately below.
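The detection experiment can be reproduced at a sketch level with the publicly released YOLOv5 weights via torch.hub; the model variant (yolov5s) and the image path below are placeholders and not necessarily the configuration used in this study.

```python
import torch

# load a pretrained YOLOv5 model from the official repository
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("fused_00001.png")   # a fusion result saved to disk (placeholder path)
results.print()                      # per-class counts and confidences
boxes = results.xyxy[0]              # tensor of (x1, y1, x2, y2, confidence, class)
```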
Firstly, two groups of test results were randomly selected for visual comparison, as shown in Figure 8. From the first set of data, it is not difficult to find that object detection based on the fused images of this study was superior to that based on the other twelve methods in night images with strong light interference. Similarly, there were no false detections in the fusion results of this study, which is also an advantage over the source images. In the second set of data, the fusion results of this study allowed not only large targets, but also all the small targets in the images, to be detected with higher accuracy.
Subsequently, 90 groups of object detection results from the MFNet dataset were randomly selected for quantitative analysis using the three indicators Recall, mAP@0.5, and mAP@[0.5:0.95] in Table 5. It is obvious that, for these indicators, the fusion images of this study achieved higher numerical results than the other fusion images. The outstanding ability of the infrared images to capture heat information allows them to achieve the highest detection accuracy and recall rate, but they lack the necessary texture, as illustrated in Figure 8.
In summary, the proposed method has been convincingly proven to be able to comprehensively retain specific information from different sources, and the interference of noise information was reduced according to the high detection accuracy of the object detection.

4.4. Efficiency Comparison

This study calculated the time complexity of the proposed method and the twelve comparison methods on the four datasets. All methods were tested on a GPU. It is evident from Table 6 that the proposed method and TarDAL performed well on all datasets. In particular, the proposed method ran fastest on the RoadScene dataset and was second only to TarDAL on the M3FD, MFNet, and TNO datasets. The faster runtime of the proposed approach makes it more likely to meet the needs of subsequent high-level vision tasks.

4.5. Ablation Experiment

The functional necessities of each part of the proposed fusion method were well validated by the ablation experiment conducted on the M3FD dataset. The related qualitative and quantitative analyses are discussed below.
Qualitative analysis: As shown in Figure 9, only a double-classification discriminator was designed in Module 1, a two-branch feature extraction was added to the generator part in the model of Module 2, and a two-branch feature interaction was introduced in Module 3.
It can be clearly observed that the introduction of the double-classification discriminator overcame the problem of weak edge information caused by a single discriminator. However, the richness of the texture information and the contrast differences still needed to be improved. Meanwhile, the two-branch feature extraction generator was designed to solve the problem of important information being overwritten due to the channel-wise concatenation of the two source images. Finally, the global contrast between different objects was further enhanced, and the detailed texture inside objects was further improved, through the design of the feature interactions.
Quantitative analysis: In order to more convincingly prove the necessity of each component, this study randomly selected 70 pairs of images from the M3FD dataset for quantitative analysis, as shown in Table 7.
Although the introduction of the double-classification discriminator did not significantly improve information entropy and other indicators, it greatly improved the visual quality of the images. The design of the two-branch feature extraction generator enhanced the gradient information of the fused images but also retained the useless noise information. Fortunately, the design of feature interactions filtered noise information to avoid the texture feature weakening and contrast disappearance caused by the excessive retention of useless information from visible images in the final network.

4.6. Discussion

The proposed method achieved the highest scores in the SF and AG indicators, with the other four indicators also averaging in the top three. Furthermore, the visualization results demonstrated the strong robustness of the proposed method, even when there was noise in the visible images. In the object detection experiment, quantitative comparison with digital evidence confirmed the advantage of the proposed method in preserving contour information. Additionally, visual analysis provided direct evidence that the proposed method could output fusion results that were more suitable for object detection, especially when there was a significant amount of noise in the input images.
In conclusion, the proposed method enhanced the otherwise neglected texture detail and contrast when the visible images were disturbed by a large amount of noise information, which was well demonstrated by the comparison experiments on different datasets and the object detection performance tests against existing advanced methods.

5. Conclusions and Future Work

In this paper, a two-branch feature interaction fusion network based on generative adversarial networks was proposed for visible and infrared images. First, the generator was designed with a two-branch network to strengthen the preservation of edge features and texture information from both the visible and infrared images. Second, the feature interaction design of the two-branch network filtered the influence of noise on the fusion results. Meanwhile, the design of the double-classification discriminator enhanced its ability to guide the feature extraction of the generator in the proposed approach. However, the proposed method is currently only suitable for infrared and visible image fusion; it is difficult to apply it widely to other multimodal image fusion tasks due to the lack of datasets, which requires further study.

Author Contributions

Conceptualization, R.C. and S.Z.; Methodology, J.D., S.H., L.X. and C.Z.; Software, J.D. and C.Z.; Validation, N.Z.; Formal analysis, J.D., S.H. and L.X.; Investigation, R.C., N.Z., H.B., C.Z. and Y.Y.; Resources, L.X.; Data curation, R.C., S.H., H.B. and Y.Y.; Writing—original draft, N.Z., S.Z., S.H. and C.Z.; Writing—review & editing, S.Z.; Supervision, S.Z. and H.B.; Funding acquisition, R.C., J.D., N.Z. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The science and technology program “Research on remote safety control technology of power field operation based on infrared and visible multi-source image fusion” funded by China Southern Power Grid provided funding for this effort. This work was also partially supported by Yunnan Province Ten thousand Talents Program and Yunnan Normal University PhD Research Initiation Project 01000205020503148.

Data Availability Statement

Data is not available due to privacy or ethical restrictions.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Ma, J.; Zhang, H.; Shao, Z.; Liang, P.; Xu, H. GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion. IEEE Trans. Instrum. Meas. 2020, 70, 1–14. [Google Scholar] [CrossRef]
  2. Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11–26. [Google Scholar] [CrossRef]
  3. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 210–227. [Google Scholar] [CrossRef]
  4. Liu, Y.; Liu, S.; Wang, Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf. Fusion 2015, 24, 147–164. [Google Scholar] [CrossRef]
  5. Li, S.; Kang, X.; Fang, L.; Hu, J.; Yin, H. Pixel-level image fusion: A survey of the state of the art. Inf. Fusion 2017, 33, 100–112. [Google Scholar] [CrossRef]
  6. Dogra, A.; Goyal, B.; Agrawal, S. From multi-scale decomposition to non-multi-scale decomposition methods: A comprehensive survey of image fusion techniques and its applications. IEEE Access 2017, 5, 16040–16067. [Google Scholar] [CrossRef]
  7. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 171–184. [Google Scholar] [CrossRef]
  8. Liu, G.; Lin, Z.; Yu, Y. Robust subspace segmentation by low-rank representation. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 663–670. [Google Scholar]
  9. Ma, J.; Chen, C.; Li, C.; Huang, J. Infrared and visible image fusion via gradient transfer and total variation minimization. Inf. Fusion 2016, 31, 100–109. [Google Scholar] [CrossRef]
  10. Ma, J.; Zhou, Z.; Wang, B.; Zong, H. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Phys. Technol. 2017, 82, 8–17. [Google Scholar] [CrossRef]
  11. Li, H.; Wu, X.J.; Kittler, J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Inf. Fusion 2021, 73, 72–86. [Google Scholar] [CrossRef]
  12. Zhang, Y.; Liu, Y.; Sun, P.; Yan, H.; Zhao, X.; Zhang, L. IFCNN: A general image fusion framework based on convolutional neural network. Inf. Fusion 2020, 54, 99–118. [Google Scholar] [CrossRef]
  13. Zhao, Z.; Xu, S.; Zhang, C.; Liu, J.; Li, P.; Zhang, J. DIDFuse: Deep image decomposition for infrared and visible image fusion. arXiv 2020, arXiv:2003.09210. [Google Scholar]
  14. Tang, L.; Yuan, J.; Ma, J. Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Inf. Fusion 2022, 82, 28–42. [Google Scholar] [CrossRef]
  15. Liu, J.; Fan, X.; Jiang, J.; Liu, R.; Luo, Z. Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 105–119. [Google Scholar] [CrossRef]
  16. Liu, J.; Fan, X.; Huang, Z.; Wu, G.; Liu, R.; Zhong, W.; Luo, Z. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5802–5811. [Google Scholar]
  17. Ma, J.; Xu, H.; Jiang, J.; Mei, X.; Zhang, X.P. DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans. Image Process. 2020, 29, 4980–4995. [Google Scholar] [CrossRef]
  18. Han, M.; Yu, K.; Qiu, J.; Li, H.; Wu, D.; Rao, Y.; Yang, Y.; Xing, L.; Bai, H.; Zhou, C. Boosting target-level infrared and visible image fusion with regional information coordination. Inf. Fusion 2023, 92, 268–288. [Google Scholar] [CrossRef]
  19. Rao, Y.; Wu, D.; Han, M.; Wang, T.; Yang, Y.; Lei, T.; Zhou, C.; Bai, H.; Xing, L. AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion. Inf. Fusion 2023, 92, 336–349. [Google Scholar] [CrossRef]
  20. Li, H.; Wu, X.J.; Kittler, J. MDLatLRR: A novel decomposition method for infrared and visible image fusion. IEEE Trans. Image Process. 2020, 29, 4733–4746. [Google Scholar] [CrossRef]
  21. Zhao, Z.; Xu, S.; Zhang, J.; Liang, C.; Zhang, C.; Liu, J. Efficient and model-based infrared and visible image fusion via algorithm unrolling. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1186–1196. [Google Scholar] [CrossRef]
  22. Li, H.; Wu, X.J.; Durrani, T. NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Trans. Instrum. Meas. 2020, 69, 9645–9656. [Google Scholar] [CrossRef]
  23. Wang, D.; Liu, J.; Fan, X.; Liu, R. Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration. arXiv 2022, arXiv:2205.11876. [Google Scholar]
  24. Zhang, H.; Ma, J. SDNet: A versatile squeeze-and-decomposition network for real-time image fusion. Int. J. Comput. Vis. 2021, 129, 2761–2785. [Google Scholar] [CrossRef]
  25. Li, H.; Wu, X.J. DenseFuse: A fusion approach to infrared and visible images. IEEE Trans. Image Process. 2018, 28, 2614–2623. [Google Scholar] [CrossRef] [PubMed]
  26. Xie, H.; Zhang, Y.; Qiu, J.; Zhai, X.; Liu, X.; Yang, Y.; Zhao, S.; Luo, Y.; Zhong, J. Semantics lead all: Towards unified image registration and fusion from a semantic perspective. Inf. Fusion 2023, 98, 101835. [Google Scholar] [CrossRef]
  27. Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 502–518. [Google Scholar] [CrossRef]
  28. Li, H.; Wu, X.J.; Kittler, J. Infrared and visible image fusion using a deep learning framework. In Proceedings of the 2018 IEEE 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 2705–2710. [Google Scholar]
  29. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  30. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  31. Roberts, J.W.; Van Aardt, J.A.; Ahmed, F.B. Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J. Appl. Remote. Sens. 2008, 2, 023522. [Google Scholar]
  32. Eskicioglu, A.M.; Fisher, P.S. Image quality measures and their performance. IEEE Trans. Commun. 1995, 43, 2959–2965. [Google Scholar] [CrossRef]
  33. Sheikh, H.R.; Bovik, A.C. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444. [Google Scholar] [CrossRef]
  34. Zhao, W.; Wang, D.; Lu, H. Multi-focus image fusion with a natural enhancement via a joint multi-level deeply supervised convolutional neural network. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 1102–1115. [Google Scholar] [CrossRef]
Figure 1. Structure of two-branch-feature-complementarity-based fusion method. All source images are transferred to respective branches for feature extraction and feature fusion in generator. Then, fusion results are obtained through feature reconstruction. Finally, fusion images, visible images and infrared images are used as the input of discriminator respectively, and identification results are guided by loss functions to optimize fusion network.
Figure 2. The generator structure based on two-branch feature complementarity. Visible images are fed into one branch, and infrared images are fed into another branch in the proposed generator. Feature complementation is carried out by pixel superposition in the process of feature extraction of different branches. Different features obtained from both branches are fused and reconstructed to obtain final fused images.
Figure 3. Structure of double-classification discriminator based on layer-hopping connections. Both source and fused images are fed to discriminator containing two skip connection layers, respectively, to calculate the discriminant results of all images.
Figure 4. The fused results of the proposed fusion method and twelve other comparison methods used on the M3FD dataset. Three different groups of contrast images are listed as a, b, c. Distinct features are marked with GREEN boxes, and enlarged features are in the corner of the images.
Figure 5. The fused results of the proposed fusion method and twelve other comparison methods used on the MFNet dataset. Distinct features are marked with GREEN boxes, and enlarged features are in the lower left corner of images.
Figure 6. The fused results of the proposed fusion method and twelve other comparison methods used on TNO dataset. Distinct features are marked with GREEN boxes.
Figure 7. The fused results of the proposed fusion method and twelve other comparison methods used on RoadScene dataset. Distinct features are marked with GREEN boxes, and enlarged features are in upper corner of images.
Figure 8. Visual comparison of different fusion results based on the MFNet dataset in target detection task.
Figure 9. The visualization results of different modules in ablation experiment conducted on the M3FD dataset. The fusion model with double-classification discriminator added is Module 1, the fusion model with double-classification discriminator and two-branch generator is Module 2, and Module 3 is the fusion model with information interactions added in two-branch generator. Areas with large differences are highlighted by GREEN boxes.
Table 1. The average values of the proposed and twelve other comparison methods for six evaluation indicators evaluated using the M3FD dataset. GREEN highlights the third best result, BLUE highlights the second best result, and RED highlights the best result.
Methods Name   EN↑      SF↑     PSNR↑    VIF↑    AG↑      MSE↓
GTF            7.334    0.059   62.886   0.836   6.505    0.039
MDLatLRR       6.362    0.031   62.183   0.658   2.560    0.041
FusionGAN      6.737    0.036   63.810   0.589   4.155    0.031
GANMcC         6.860    0.034   63.861   0.809   3.924    0.030
IFCNN          6.794    0.061   65.707   0.792   6.683    0.020
RFN            7.343    0.021   61.612   1.090   2.244    0.052
U2Fusion       7.051    0.065   64.795   0.978   7.923    0.024
DDcGAN         7.473    0.043   57.656   0.087   4.049    0.112
DIDFuse        7.066    0.047   61.132   0.789   3.917    0.051
MFEIF          6.526    0.029   61.505   0.688   2.561    0.048
SeAFusion      6.649    0.048   59.165   0.765   3.896    0.083
TarDAL         7.028    0.045   60.823   0.734   3.550    0.057
OURS           7.170    0.072   64.553   0.993   8.878    0.025
Table 2. The average values of the proposed and twelve other comparison methods for six evaluation indicators evaluated using the MFNet dataset. GREEN highlights the third best result, BLUE highlights the second best result, and RED highlights the best result.
Methods Name   EN↑      SF↑     PSNR↑    VIF↑    AG↑     MSE↓
GTF            6.053    0.033   63.190   0.708   2.569   0.034
MDLatLRR       6.262    0.028   65.253   0.793   2.389   0.021
FusionGAN      5.879    0.019   63.703   0.724   1.652   0.030
GANMcC         6.463    0.023   64.049   0.794   2.055   0.031
IFCNN          6.423    0.037   65.945   0.786   3.119   0.018
RFN            6.019    0.014   62.590   0.848   1.315   0.040
U2Fusion       6.698    0.044   63.624   0.664   3.898   0.031
DDcGAN         7.269    0.042   57.866   0.100   4.038   0.108
DIDFuse        5.915    0.040   62.193   0.680   2.886   0.039
MFEIF          6.373    0.026   64.732   0.860   2.248   0.025
SeAFusion      6.693    0.045   63.598   0.920   3.758   0.031
TarDAL         6.615    0.038   62.961   0.830   3.004   0.035
OURS           6.943    0.048   64.459   0.940   4.936   0.024
Table 3. The average values of the proposed method and twelve other comparison methods for six evaluation indicators evaluated using the RoadScene dataset. GREEN highlights the third best result, BLUE highlights the second best result, and RED highlights the best result.
Methods Name   EN↑      SF↑     PSNR↑    VIF↑    AG↑      MSE↓
GTF            7.654    0.035   59.796   0.718   3.728    0.074
MDLatLRR       6.752    0.041   64.255   0.738   4.183    0.035
FusionGAN      7.131    0.035   59.796   0.609   3.729    0.074
GANMcC         7.299    0.038   60.424   0.754   4.242    0.069
IFCNN          7.043    0.062   64.195   0.769   6.054    0.035
RFN            7.316    0.036   60.757   0.787   4.172    0.067
U2Fusion       7.095    0.069   62.720   0.754   7.172    0.043
DDcGAN         7.628    0.049   56.243   0.078   5.313    0.161
DIDFuse        7.250    0.063   62.010   0.805   6.540    0.049
MFEIF          6.882    0.041   63.109   0.789   4.236    0.044
SeAFusion      7.288    0.077   61.357   0.855   7.991    0.063
TarDAL         7.225    0.055   62.187   0.761   5.195    0.050
OURS           7.414    0.095   62.304   0.671   10.592   0.046
Table 4. The average values of the proposed method and twelve other comparison methods for six evaluation indicators evaluated using the TNO dataset. GREEN highlights the third best result, BLUE highlights the second best result, and RED highlights the best result.
Methods Name   EN↑      SF↑     PSNR↑    VIF↑    AG↑     MSE↓
GTF            6.863    0.041   62.043   0.682   3.892   0.042
MDLatLRR       6.298    0.028   63.333   0.731   2.901   0.033
FusionGAN      6.651    0.026   61.058   0.712   2.587   0.054
GANMcC         6.803    0.026   61.869   0.784   2.858   0.049
IFCNN          6.590    0.047   63.774   0.736   4.551   0.022
RFN            7.020    0.025   62.362   0.903   3.016   0.038
U2Fusion       7.060    0.051   63.072   0.913   5.523   0.030
DDcGAN         7.490    0.043   57.090   0.071   5.063   0.122
DIDFuse        6.922    0.042   61.445   0.811   4.253   0.048
MFEIF          6.578    0.027   62.577   0.817   3.012   0.040
SeAFusion      7.196    0.054   62.193   1.063   5.716   0.039
TarDAL         7.095    0.045   61.183   0.838   4.225   0.050
OURS           7.072    0.061   63.479   0.819   6.845   0.029
Table 5. Fusion results of this study were compared quantitatively with fusion results of twelve other advanced methods in object detection. GREEN highlights the third best result, BLUE highlights the second best result, and RED highlights the best result.
Methods Name      Recall   mAP@0.5   mAP@[0.5:0.95]
Visible images    0.287    0.326     0.153
Infrared images   0.386    0.436     0.243
GTF               0.336    0.373     0.208
MDLatLRR          0.362    0.408     0.223
FusionGAN         0.297    0.376     0.205
GANMcC            0.353    0.403     0.223
IFCNN             0.353    0.409     0.229
RFN               0.251    0.22      0.126
U2Fusion          0.358    0.404     0.220
DDcGAN            0.239    0.300     0.145
DIDFuse           0.326    0.380     0.206
MFEIF             0.355    0.410     0.228
SeAFusion         0.358    0.399     0.223
TarDAL            0.353    0.404     0.229
OURS              0.355    0.417     0.237
Table 6. The running time of thirteen comparison methods on M3FD dataset, MFNet dataset, RoadScene dataset, and TNO dataset. GREEN highlights the third best result, BLUE highlights the second best result, and RED highlights the best result (unit: s).
Methods Name   M3FD      MFNet    RoadScene   TNO
GTF            14.921    14.015   6.174       4.450
MDLatLRR       115.347   35.710   16.908      37.584
FusionGAN      0.401     0.165    1.058       0.660
GANMcC         0.672     0.301    2.023       1.278
IFCNN          3.507     45.131   38.934      25.620
RFN            26.730    11.691   6.823       10.206
U2Fusion       9.081     3.240    1.717       3.039
DDcGAN         1.069     0.659    3.453       2.218
DIDFuse        1.045     0.382    0.110       0.406
MFEIF          0.364     0.165    0.092       0.182
SeAFusion      6.039     0.235    0.058       0.175
TarDAL         0.144     0.086    0.056       0.121
OURS           0.211     0.096    0.052       0.125
Table 7. The mean value of each module in ablation experiment under six different evaluation indices conducted on M3FD dataset. The fusion model with double-classification discriminator added is Module 1, the fusion model with double-classification discriminator and two-branch generator is Module 2, and Module 3 is the fusion model with information interaction added in two-branch generator. RED indicates the best result, and BLUE represents the second best result.
Module     EN↑      SF↑     PSNR↑    VIF↑    AG↑     MSE↓
NO         6.737    0.036   63.810   0.589   4.155   0.031
Module 1   6.668    0.026   59.592   1.035   3.364   0.047
Module 2   6.443    0.042   61.132   0.796   5.960   0.143
Module 3   7.170    0.072   64.553   0.993   8.878   0.025
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chang, R.; Dang, J.; Zhang, N.; Zhao, S.; Hu, S.; Xing, L.; Bai, H.; Zhou, C.; Yang, Y. Two-Branch Feature Interaction Fusion Method Based on Generative Adversarial Network. Electronics 2023, 12, 3442. https://doi.org/10.3390/electronics12163442


