Article

Fusion of Infrared and Visible Light Images Based on Improved Adaptive Dual-Channel Pulse Coupled Neural Network

Bin Feng, Chengbo Ai and Haofei Zhang
1 School of Optoelectronic Engineering, Xi’an Technological University, Xi’an 710021, China
2 No. 208 Research Institute of China Ordnance Industries, Beijing 102202, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(12), 2337; https://doi.org/10.3390/electronics13122337
Submission received: 3 May 2024 / Revised: 3 June 2024 / Accepted: 7 June 2024 / Published: 14 June 2024
(This article belongs to the Special Issue Machine Learning Methods for Solving Optical Imaging Problems)

Abstract

The pulse-coupled neural network (PCNN), due to its effectiveness in simulating the mammalian visual system to perceive and understand visual information, has been widely applied in the fields of image segmentation and image fusion. To address the issues of low contrast and the loss of detail information in infrared and visible light image fusion, this paper proposes a novel image fusion method based on an improved adaptive dual-channel PCNN model in the non-subsampled shearlet transform (NSST) domain. Firstly, NSST is used to decompose the infrared and visible light images into a series of high-pass sub-bands and a low-pass sub-band, respectively. Next, the PCNN models are stimulated using the weighted sum of the eight-neighborhood Laplacian of the high-pass sub-bands and the energy activity of the low-pass sub-band. The high-pass sub-bands are fused using local structural information as the basis for the linking strength for the PCNN, while the low-pass sub-band is fused using a linking strength based on multiscale morphological gradients. Finally, the fused high-pass and low-pass sub-bands are reconstructed to obtain the fused image. Comparative experiments demonstrate that, subjectively, this method effectively enhances the contrast of scenes and targets while preserving the detail information of the source images. Compared to the best mean values of the objective evaluation metrics of the compared methods, the proposed method shows improvements of 2.35%, 3.49%, and 11.60% in information entropy, mutual information, and standard deviation, respectively.

1. Introduction

Visible light imaging sensors often struggle to produce high-quality images of targets and scenes in complex environments such as concealment, low light, and night-time conditions. Solely relying on visible light images is insufficient for machine vision tasks that require all-weather functionality, high robustness, and environmental adaptability [1]. In contrast, infrared imaging is unaffected by lighting conditions and environmental factors, effectively highlighting concealed targets and resolving issues of target occlusion in these challenging environments. However, infrared images suffer from drawbacks such as background blurring and loss of texture information, whereas visible light images provide a more detailed and layered depiction of target and scene information. Fusion images that combine the advantages of both visible light and infrared images not only effectively highlight targets in infrared images but also incorporate background details from both sources. This improves the accuracy, robustness, and environmental adaptability of target detection and recognition. The fused image offers richer information for subsequent target detection and tracking tasks. This allows detection and tracking algorithms to more accurately determine the category and trajectory of targets, thereby enhancing the overall performance and efficiency of the target tracking system.
At present, image fusion methods can be primarily classified into four main approaches: multi-scale geometric analysis (MGA) [2,3,4], saliency detection [5], sparse representation (SR) [6], and neural networks (NN) [7]. MGA methods effectively capture the spatial structures of edges and detailed texture information across various scales. Saliency detection methods often excessively emphasize prominent areas in images, leading to the loss of information in other regions. SR methods typically face challenges in constructing overcomplete dictionaries with good data representation, thereby limiting the development of this approach. Methods based on neural networks usually demand significant computational resources and large-scale datasets, making them less practical for low-cost image fusion. Each method has its own strengths and limitations. Hence, a hybrid approach combining the advantages of two methods can alleviate the inherent limitations of individual methods and further enhance the quality of fused images.
The Non-Subsampled Shearlet Transform (NSST), as an advanced MGA method, effectively exploits the intrinsic geometric regularities of image structures to achieve an optimal representation of images with rich edges across various scales and directions. However, the presence of linear transformations in NSST may result in pixel distortion in fused images. On the contrary, the non-linear Pulse-Coupled Neural Network (PCNN) [8] can effectively mitigate this phenomenon. PCNN does not require extensive training datasets, overcoming the limitations of traditional neural network methods dependent on datasets and computational resources. Influenced by the advantages of PCNN and NSST, this study explores the utilization of a PCNN-based method in conjunction with NSST to enhance fusion performance.
The original Pulse-Coupled Neural Network (PCNN) model, with its single input channel, necessitates the manual tuning of multiple parameters that influence fusion outcomes. Wang et al. [9] devised a dual-channel PCNN model for medical image fusion, surpassing the limitations of the original model in concurrently handling diverse image types. Nevertheless, this model lacks the direct output of fused images, requiring the introduction of time matrices and linear transformations. Chen et al. [10] proposed a single-channel parameter adaptive PCNN (PAPCNN) model, aiming to streamline the complexity and inefficiency of manual parameter adjustment. However, the PAPCNN model assigns a uniform linking strength to all neurons, potentially limiting its ability to effectively capture image feature information. Gu et al. [11] introduced unit linking into PCNNs, markedly reducing the number of PCNN parameters. Panigrahy et al. [12] proposed a parameter-adaptive unit linking PCNN (PAUDPCNN) fusion method within the NSCT domain. Despite this, the adaptive approach in the PAUDPCNN model allocates varying linking strengths to neurons, with other parameters remaining constant akin to the PAPCNN model. Additionally, the fusion process within the NSCT domain operates at a slower pace, resulting in increased time costs for Panigrahy’s fusion approach. These methods primarily employ normalized grayscale values of input images as PCNN model inputs. However, human visual perception typically prioritizes image texture and edges over individual pixels, potentially leading to suboptimal fusion outcomes. Cheng et al. [13] proposed an innovative fusion framework for visible light and infrared images, leveraging singular value decomposition and adaptive dual-channel PCNN within the NSST domain. While this method employs identical connection strengths for fusing high-pass and low-pass sub-bands, such an approach may not be suitable given the differing information contained within these sub-bands.
Drawing inspiration from the advantages and limitations of PCNN models mentioned above, this study optimizes the parameter setting method and spatial correlation of PCNN models, proposing a new model for adaptive dual-channel PCNN based on regional structural information operators and multi-scale morphological gradients. Subsequently, NSST decomposes infrared and visible light source images into high and low-pass sub-bands, with the normalized high-pass sub-band’s weighted sum of eight-neighborhood Laplacian and the low-pass sub-band’s energy activity used as feedback inputs for the new model’s high and low-pass sub-band fusion, respectively. Finally, the fused high and low-pass sub-bands are reconstructed to obtain the fused image. The ablation experiments demonstrate the superior performance of the proposed model in this paper. Fusion experiments comparing our method with five other methods demonstrate that, subjectively, our method effectively enhances scene and target contrast while maintaining rich texture and detail information. Objectively, our method outperforms the compared methods in information entropy, mutual information, and standard deviation by 2.35%, 3.49%, and 11.60%, respectively. Therefore, our method exhibits excellent comprehensive performance both subjectively and objectively.

2. Background

2.1. Dual-Channel Pulse Coupled Neural Network

The Dual-Channel Pulse-Coupled Neural Network (DCPCNN) exhibits advantages over its single-channel counterpart in information processing and feature extraction [14]. By simultaneously leveraging multiple channels of information, the network can capture richer and more complex features. The DCPCNN model comprises three components: the receptive field, information fusion pool, and pulse generation field [15]. The model architecture and mathematical representation of the DCPCNN are shown in Figure 1 and Equation (1), respectively.
$$
\begin{aligned}
F_n^{I}(i,j) &= S^{I}(i,j), \qquad F_n^{V}(i,j) = S^{V}(i,j) \\
L_n(i,j) &= \begin{cases} 1, & \text{if } \sum_{(k,l)\in N(i,j)} Y_{n-1}(k,l) > 0 \\ 0, & \text{otherwise} \end{cases} \\
U_n^{I}(i,j) &= F_n^{I}(i,j)\bigl[1+\beta^{I}(i,j)\,L_n(i,j)\bigr] \\
U_n^{V}(i,j) &= F_n^{V}(i,j)\bigl[1+\beta^{V}(i,j)\,L_n(i,j)\bigr] \\
U_n(i,j) &= e^{-\alpha_u}\,U_{n-1}(i,j) + \max\bigl\{U_n^{I}(i,j),\,U_n^{V}(i,j)\bigr\} \\
Y_n(i,j) &= \begin{cases} 1, & \text{if } U_n(i,j) \ge \theta_{n-1}(i,j) \\ 0, & \text{otherwise} \end{cases} \\
\theta_n(i,j) &= e^{-\alpha_\theta}\,\theta_{n-1}(i,j) + V_\theta\, Y_n(i,j)
\end{aligned}
\tag{1}
$$
In Equation (1), $F_n^{I}(i,j)$, $F_n^{V}(i,j)$, $L_n(i,j)$, $Y_n(i,j)$, and $\theta_n(i,j)$ represent the two feedback inputs (one per channel), the link input, the firing pulse output, and the dynamic threshold of the neuron at position $(i,j)$ at the $n$th iteration, respectively; $S^{I}(i,j)$ and $S^{V}(i,j)$ are the normalized grayscale values of the infrared and visible light images at position $(i,j)$; $U_n^{I}(i,j)$, $U_n^{V}(i,j)$, and $U_n(i,j)$ represent, respectively, the internal activities of the $(i,j)$ neuron within the infrared and visible light channels at the $n$th iteration and the internal activity of the model output; $\beta^{I}(i,j)$ and $\beta^{V}(i,j)$ represent the linking strengths corresponding to $S^{I}(i,j)$ and $S^{V}(i,j)$, respectively; $\alpha_u$ and $\alpha_\theta$ are the exponential decay coefficients of the internal activity and the dynamic threshold, respectively; and $V_\theta$ is the threshold amplification coefficient.
The working principle of the DCPCNN is as follows: The two channels respectively receive pixel values from two types of images as feedback input to the model and take the activation status of surrounding neurons as the connection input. If any of the eight neighboring neurons is activated, the connection input is set to 1; otherwise, it is set to 0. The internal activity term of a neuron is determined based on the internal activity terms of the two channels and the computation from the previous iteration. If the internal activity term exceeds the dynamic threshold, the neuron emits a pulse signal, indicating that the neuron is activated.
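To make the iteration concrete, the following is a minimal NumPy sketch of one DCPCNN step following Equation (1) and the description above. The array names, the 3 × 3 unit-linking neighborhood, and the function signature are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def dcpcnn_step(S_I, S_V, beta_I, beta_V, U_prev, theta_prev, Y_prev,
                alpha_u, alpha_theta, V_theta):
    """One iteration of the dual-channel PCNN of Equation (1).

    S_I, S_V       : normalized infrared / visible images (feedback inputs)
    beta_I, beta_V : per-pixel linking strengths
    U_prev, theta_prev, Y_prev : state from the previous iteration
    """
    # Unit-linking term: 1 if any of the 8 neighbours fired last iteration.
    kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    L = (convolve(Y_prev, kernel, mode="constant") > 0).astype(float)

    # Channel-wise internal activities.
    U_I = S_I * (1.0 + beta_I * L)
    U_V = S_V * (1.0 + beta_V * L)

    # Combined internal activity with exponential decay of the previous state.
    U = np.exp(-alpha_u) * U_prev + np.maximum(U_I, U_V)

    # Pulse generation and dynamic-threshold update.
    Y = (U >= theta_prev).astype(float)
    theta = np.exp(-alpha_theta) * theta_prev + V_theta * Y
    return U, U_I, U_V, theta, Y
```

In the proposed method of Section 3, the grayscale feedback inputs and the per-pixel linking strengths in this sketch are replaced by the feature operators defined there.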

2.2. Non-Subsampled Shearlet Transform

The shearlet transform is a method combining geometric and multiscale analysis, proposed based on synthetically dilated affine systems [4]. When the dimensionality is 2, the shearlet functions are defined as follows:
$$
\Omega_{AB}(\psi) = \left\{ \psi_{j,l,k}(x) = \left|\det A\right|^{j/2}\, \psi\!\left(B^{l}A^{j}x - k\right);\ j,l \in \mathbb{Z},\ k \in \mathbb{Z}^{2} \right\}
\tag{2}
$$
In Equation (2), $\psi$ represents a basis function satisfying $\psi \in L^{2}(\mathbb{R}^{2})$; $A$ denotes the anisotropic matrix used for multiscale partitioning and $B$ is the shear matrix used for directional analysis, both of which are $2\times 2$ invertible matrices with $|\det B| = 1$. $\mathbb{Z}$ denotes the set of integers. $j$, $l$, and $k$ denote the decomposition scale, the shear (directional) parameter, and the translation parameter, respectively. $A$ and $B$ can be expressed as follows:
$$
A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\tag{3}
$$
NSST maps the pseudo-polar coordinate system to the Cartesian coordinate system and demonstrates through the inverse Fourier transform that its operation can be achieved via two-dimensional convolution, thus avoiding down-sampling operations. NSST consists of two main components [16]: the Non-Subsampled Pyramid Filter Banks (NSPFB) and the Shift-Invariant Shearlet Filter Banks (SFB). The NSST process (Figure 2) unfolds as follows: initially, the NSPFB conducts a multiscale decomposition of the source image at level k, yielding a single low-pass sub-band and k high-pass sub-bands; subsequently, the SFB applies a directional transformation at level j to the high-pass sub-bands, yielding $2^{j}$ directional high-pass components. This effectively achieves a multidirectional decomposition of the high-pass sub-bands.
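As a rough sketch of the decomposition bookkeeping just described, the function below assumes two hypothetical helpers, `nspfb_split` (one level of the non-subsampled pyramid, returning a low-pass and a band-pass image) and `shear_filter` (one directional filter of the SFB); neither is a real library call. The point is only how k pyramid levels and $2^{j}$ shear directions produce the sub-band lists used in the rest of the paper.

```python
from typing import Callable, List, Tuple
import numpy as np

def nsst_decompose(img: np.ndarray, k_levels: int, j_dirs: List[int],
                   nspfb_split: Callable, shear_filter: Callable
                   ) -> Tuple[np.ndarray, List[List[np.ndarray]]]:
    """Book-keeping of an NSST decomposition: one low-pass sub-band plus k
    high-pass scales, each split into 2**j directional components."""
    low = img
    high_pass = []
    for level in range(k_levels):
        low, band = nspfb_split(low, level)        # hypothetical NSPFB stage
        n_dirs = 2 ** j_dirs[level]                # 2^j directions at this scale
        high_pass.append([shear_filter(band, level, d)   # hypothetical SFB stage
                          for d in range(n_dirs)])
    return low, high_pass
```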

3. Proposed Method

3.1. Improved Adaptive Dual-Channel Pulse Coupled Neural Network

From Equation (1), it is evident that in the DCPCNN model, the parameters $\beta^{I}$, $\beta^{V}$, $\alpha_u$, $V_\theta$, and $\alpha_\theta$ need to be set. Owing to the varying ranges and step sizes of these five parameters, researchers must manually try each combination of parameter values to find the optimal settings, which is nearly impossible in practice. Hence, adaptive adjustment of these parameters is required. In the traditional dual-channel PCNN model, the same linking strength is applied to every neuron. To better express the characteristic information of the image, the linking strength should vary with the input to each neuron.
Since the model’s feedback input is normalized image data, failure to normalize the connection strength would result in drastic fluctuations in the internal activity terms of the two channels. Therefore, the linking strengths can be normalized based on the input to the neuron. In this study, the Sigmoid function is employed to normalize the linking strengths.
$$
\beta^{I,V}(i,j) = \frac{1}{1 + e^{-u^{I,V}(i,j)}}
\tag{4}
$$
In Equation (4), $u^{I,V}(i,j)$ represents the data at position $(i,j)$, calculated from Equations (12) and (16) for the infrared and visible light images, respectively.
Parameter $\alpha_u$ is the exponential decay coefficient for the internal activity term, which affects the distribution range of the internal activity term. A lower value of $\alpha_u$ results in a wider distribution range of the internal activity term, indicating an inverse relationship between the value of $\alpha_u$ and the internal activity term. This paper adopts the adaptive parameter setting method for the single-channel PCNN [10], combined with the data of the infrared and visible light images, to define parameter $\alpha_u$, as given in Equation (5):
$$
\alpha_u = \log\!\left(\frac{1}{\sigma(I)\,\sigma(V)}\right)
\tag{5}
$$
In Equation (5), $\sigma(I)$ and $\sigma(V)$, respectively, represent the standard deviations of the normalized pixel values of the infrared and visible light images.
In the adaptive parameter setting method for the single-channel PCNN, parameters $V_\theta$ and $\alpha_\theta$ can be calculated from $\alpha_u$ and the linking strengths $\beta^{I,V}$. Since the linking strength varies with the input to each neuron, parameters $V_\theta$ and $\alpha_\theta$ should also vary with the input to each neuron. Parameters $V_\theta$ and $\alpha_\theta$ in the PCNN model can be adaptively represented as follows:
$$
V_\theta(i,j) = e^{-\alpha_u} + \frac{1}{2}\bigl(\beta^{I}(i,j) + \beta^{V}(i,j)\bigr)
\tag{6}
$$
$$
\alpha_\theta(i,j) = \ln\!\left(\frac{V_\theta(i,j)}{S^{I}S^{V}\,\dfrac{1-e^{-3\alpha_u}}{1-e^{-\alpha_u}} + \dfrac{1}{2}\bigl(\beta^{I}(i,j)+\beta^{V}(i,j)\bigr)e^{-\alpha_u}}\right)
\tag{7}
$$
In Equation (7), $S^{I}$ and $S^{V}$, respectively, represent the Otsu thresholds of the normalized pixel values of the infrared and visible light images.
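A minimal sketch of the adaptive parameter computation, following Equations (4)–(7) as reconstructed above; `u_I` and `u_V` stand for the LSI or MSMG maps of Equations (12)/(16), and the use of scikit-image's Otsu threshold for $S^{I}$ and $S^{V}$ is an assumption about how those thresholds would be obtained in practice.

```python
import numpy as np
from skimage.filters import threshold_otsu

def adaptive_parameters(I, V, u_I, u_V):
    """Adaptive DCPCNN parameters following Eqs. (4)-(7) (as reconstructed).

    I, V     : normalized infrared / visible inputs in [0, 1]
    u_I, u_V : feature maps driving the linking strengths (LSI or MSMG)
    """
    # Eq. (4): sigmoid-normalized, per-pixel linking strengths.
    beta_I = 1.0 / (1.0 + np.exp(-u_I))
    beta_V = 1.0 / (1.0 + np.exp(-u_V))
    beta_mean = 0.5 * (beta_I + beta_V)

    # Eq. (5): decay coefficient of the internal activity.
    alpha_u = np.log(1.0 / (np.std(I) * np.std(V)))

    # Eq. (6): per-pixel threshold amplification coefficient.
    V_theta = np.exp(-alpha_u) + beta_mean

    # Eq. (7): per-pixel threshold decay, using the Otsu thresholds of I and V.
    S_I, S_V = threshold_otsu(I), threshold_otsu(V)
    denom = (S_I * S_V * (1.0 - np.exp(-3.0 * alpha_u)) / (1.0 - np.exp(-alpha_u))
             + beta_mean * np.exp(-alpha_u))
    alpha_theta = np.log(V_theta / denom)
    return beta_I, beta_V, alpha_u, V_theta, alpha_theta
```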
The number of iterations in traditional PCNN models is typically determined empirically or through repeated experiments. If the number of iterations n in the PCNN model is too small, it may lead to poor fusion results, while if n is too large, it may result in higher computational costs and performance degradation due to increased memory usage. To adaptively set a reasonable number of iterations, a time matrix model of the same size as the image is used in PCNN. The time matrix T is defined as:
$$
T_{ij}[n] = \begin{cases} n, & \text{if } Y_{ij}[n] = 1 \text{ for the first time} \\ T_{ij}[n-1], & \text{otherwise} \end{cases}
\tag{8}
$$
In Equation (8), the time matrix values can exhibit three scenarios (a short update sketch follows the list):
(1) If Yij has not yet fired, the value of Tij remains unchanged.
(2) If Yij fires for the first time, Tij is set to the current iteration number.
(3) Once Yij has fired, Tij retains the iteration number at which Yij first fired.
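A short illustration of the Equation (8) update, assuming the time matrix is initialized to zeros (zero meaning "not yet fired"):

```python
import numpy as np

def update_time_matrix(T, Y, n):
    """Eq. (8): record the iteration n at which each neuron first fires.
    Assumes T was initialized to zeros, with 0 meaning 'not yet fired'."""
    T = T.copy()
    first_fire = (Y == 1) & (T == 0)
    T[first_fire] = n          # already-fired neurons keep their old value
    return T
```

A natural stopping rule, implied by the adaptive-iteration discussion above, is to iterate until every entry of T is non-zero; this is an inference rather than a rule stated explicitly here.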

3.2. Proposed Fusion Framework

This paper integrates the multiscale transform NSST with the PCNN, leveraging their complementary spatial and spectral information advantages. The method optimizes the parameter settings and spatial correlation of the PCNN model and designs separate fusion strategies for the high-frequency and low-frequency sub-bands obtained from the NSST decomposition, enhancing energy preservation and detail extraction. The fusion method proposed in this paper is illustrated in Figure 3, with the following workflow (a sketch of the overall data flow follows the list):
(1) Perform multiscale decomposition on the registered infrared and visible light images using NSST to obtain a low-pass sub-band and a series of high-pass sub-band images.
(2) Use the Weighted Sum of Eight-neighborhood Laplacian (WSEML) [17] of the normalized high-pass sub-bands as the feedback input for the LSI-PCNN, and use the energy attribute (EA) [18] and WSEML of the normalized low-pass sub-band as the feedback input for the MSMG-PCNN.
(3) Initialize the PCNN parameters, adaptively compute the parameters and iteration counts of the LSI-PCNN and MSMG-PCNN, and fuse the high- and low-pass sub-bands according to the magnitude of the internal activity terms output by the PCNN.
(4) Perform the inverse NSST to reconstruct the fused image from the fused high- and low-pass sub-bands.
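The sketch below strings the four steps together. Everything after the two image arguments is a callable placeholder for an operator defined in this section (NSST forward/inverse transform, WSEML, energy activity, LSI, MSMG, and the two dual-channel PCNNs), so it fixes only the order of operations and the fusion rule; it is not a reference implementation, and the high-pass sub-bands are treated as a flat list for brevity.

```python
import numpy as np

def fuse_ir_visible(ir, vis, nsst, inv_nsst, wseml, energy_activity,
                    lsi, msmg, lsi_pcnn, msmg_pcnn):
    """Data flow of steps (1)-(4); all arguments after `vis` are callables
    standing in for the operators defined in this section."""
    # Step (1): NSST decomposition of both registered source images.
    low_ir, highs_ir = nsst(ir)
    low_vis, highs_vis = nsst(vis)

    # Steps (2)-(3): fuse each high-pass sub-band with the LSI-PCNN, comparing
    # the internal activities of the visible and infrared channels per pixel.
    fused_highs = []
    for h_ir, h_vis in zip(highs_ir, highs_vis):
        u_ir, u_vis = lsi_pcnn(wseml(h_ir), wseml(h_vis), lsi(h_ir), lsi(h_vis))
        fused_highs.append(np.where(u_vis >= u_ir, h_vis, h_ir))

    # ... and the low-pass sub-band with the MSMG-PCNN.
    u_low_ir, u_low_vis = msmg_pcnn(energy_activity(low_ir), energy_activity(low_vis),
                                    msmg(low_ir), msmg(low_vis))
    fused_low = np.where(u_low_vis >= u_low_ir, low_vis, low_ir)

    # Step (4): inverse NSST reconstruction of the fused image.
    return inv_nsst(fused_low, fused_highs)
```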

3.2.1. Fusion of High-Pass Sub-Bands

The high-pass sub-bands reflect the detailed texture and edge information of the image. Traditional PCNN-based fusion rules for high-pass sub-bands typically use the grayscale values of individual pixels as feedback inputs. However, human visual perception is more sensitive to image textures and edges, making pixel-based methods less effective in achieving satisfactory fusion outcomes. The Weighted Sum of Eight-neighborhood Laplacian (WSEML) [17] more effectively extracts gradient information from the eight directions within a neighborhood, thereby better reflecting the details of image edges. In this paper, we use the WSEML, normalized to [0, 1], to stimulate the LSI-PCNN. The mathematical definition of WSEML is as follows:
$$
WSEML(i,j) = \sum_{m=-r}^{r}\sum_{n=-r}^{r} W(m+r+1,\, n+r+1)\, EML(i+m,\, j+n)
\tag{9}
$$
$$
\begin{aligned}
EML(i,j) = {} & \bigl|2S_L(i,j) - S_L(i-1,j) - S_L(i+1,j)\bigr| + \bigl|2S_L(i,j) - S_L(i,j-1) - S_L(i,j+1)\bigr| \\
& + \frac{1}{\sqrt{2}}\bigl|2S_L(i,j) - S_L(i-1,j-1) - S_L(i+1,j+1)\bigr| + \frac{1}{\sqrt{2}}\bigl|2S_L(i,j) - S_L(i-1,j+1) - S_L(i+1,j-1)\bigr|
\end{aligned}
\tag{10}
$$
In Equations (9) and (10), $S_L(i,j)$ represents the high-pass sub-band coefficient at scale $L$ and position $(i,j)$, while EML represents the Laplacian energy calculated from the gradients in eight directions within a $3\times 3$ region (i.e., $r = 1$). Compared to SML [19], which extracts the energy sum of gradients in four directions within the neighborhood, EML provides a more comprehensive representation of neighborhood information. To emphasize the central pixel of the window, the weighting matrix $W$ is defined as:
$$
W = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}
\tag{11}
$$
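A compact NumPy sketch of Equations (9)–(11) for a 3 × 3 window (r = 1); the $1/\sqrt{2}$ diagonal weights follow Equation (10) as reconstructed above, and the edge padding is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import convolve

def wseml(S):
    """Weighted Sum of Eight-neighborhood Laplacian, Eqs. (9)-(11), 3x3 window."""
    P = np.pad(S.astype(float), 1, mode="edge")
    c = P[1:-1, 1:-1]
    up, down = P[:-2, 1:-1], P[2:, 1:-1]
    left, right = P[1:-1, :-2], P[1:-1, 2:]
    ul, lr = P[:-2, :-2], P[2:, 2:]
    ur, ll = P[:-2, 2:], P[2:, :-2]

    # Eq. (10): Laplacian energy from the two axis and two diagonal directions.
    eml = (np.abs(2 * c - up - down)
           + np.abs(2 * c - left - right)
           + np.abs(2 * c - ul - lr) / np.sqrt(2)
           + np.abs(2 * c - ur - ll) / np.sqrt(2))

    # Eq. (9) with the 3x3 weight matrix of Eq. (11).
    W = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
    return convolve(eml, W, mode="reflect")
```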
Since high-pass and low-pass sub-bands reflect distinct image features and hold varying degrees of significance within the image, using identical linking strengths may result in an undue emphasis on high-frequency information or an excessive attenuation of low-frequency details, thereby compromising the image quality and visual fidelity. Consequently, it is generally recommended to employ distinct linking strengths for high-pass and low-pass sub-band fusion to better preserve both the fine details and overall structural integrity of the image.
The regional singular values of an image reflect the structural characteristics and energy information within a local area: they represent the main features and variation trends of that region, capturing the primary directions and intensity changes of the pixels within it. If the regional singular values are replaced with an identity diagonal matrix, these characteristic details are removed, resulting in the loss of the image's details and textures. As shown in Figure 4, the infrared and visible light source images in Figure 4a are decomposed using regional singular value decomposition and the singular values are then replaced with an identity diagonal matrix, yielding Figure 4b. Compared to Figure 4a, the two images in Figure 4b lose their original structure and local area features, becoming blurred and flattened.
The region-wise singular values of an image form a diagonal matrix, which cannot be directly used as the linking strengths of a pulse-coupled neural network. Computing the Frobenius norm of the regional singular value matrix therefore provides a measure of the overall magnitude of the singular values within each local area; it quantifies the overall structural characteristics of the singular value matrix and thus captures the structural information of the regional image. The region-wise information operator obtained from this computation is referred to as the Local Structure Information (LSI) operator. LSI is defined as:
$$
LSI^{I,V}(i,j) = u^{I,V}(i,j) = \bigl\|\sigma^{I,V}(i,j)\bigr\|_F^{2}
\tag{12}
$$
In Equation (12), $\sigma^{I,V}(i,j)$ represents the singular value matrix of the infrared or visible light image within a $3\times 3$ window centered at $(i,j)$, and $\|\cdot\|_F$ denotes the Frobenius norm of the singular value matrix.
The LSI operator, obtained by applying the Frobenius norm to the regional singular values, effectively quantifies the overall structural characteristics of the singular value matrix. Consequently, the local structural information represented by the LSI operator is introduced as the linking strength for high-pass sub-band fusion, building upon the improved adaptive unit-linking dual-channel PCNN. This type of PCNN is referred to as LSI-PCNN.
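A straightforward (if slow) sketch of the LSI operator of Equation (12): take the singular values of each 3 × 3 neighborhood and sum their squares. The window size and edge padding follow the description above; note that the sum of squared singular values equals the squared Frobenius norm of the patch itself.

```python
import numpy as np

def lsi(S, win=3):
    """Local Structure Information, Eq. (12): squared Frobenius norm of the
    singular values of each (win x win) neighbourhood."""
    pad = win // 2
    P = np.pad(S.astype(float), pad, mode="edge")
    out = np.zeros_like(S, dtype=float)
    H, W = S.shape
    for i in range(H):
        for j in range(W):
            patch = P[i:i + win, j:j + win]
            sv = np.linalg.svd(patch, compute_uv=False)  # regional singular values
            out[i, j] = np.sum(sv ** 2)                  # ||sigma||_F^2
    return out
```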
According to Equations (1), (4)–(8), and (12), LSI-PCNN-related parameters are adaptively calculated. The fusion of high-pass sub-band coefficients is based on the magnitude of the model’s output internal activity terms. The fusion formula for high-pass sub-band coefficients is as follows:
$$
H_F^{l,k}(i,j) = \begin{cases} H_V^{l,k}(i,j), & \text{if } U_V^{H,l,k}(i,j) \ge U_I^{H,l,k}(i,j) \\ H_I^{l,k}(i,j), & \text{otherwise} \end{cases}
\tag{13}
$$
In Equation (13), $H_F^{l,k}$, $H_V^{l,k}$, and $H_I^{l,k}$, respectively, represent the fused high-pass sub-band coefficient and the high-pass sub-band coefficients of the visible light and infrared images. $U_V^{H,l,k}(i,j)$ and $U_I^{H,l,k}(i,j)$ denote the internal activity terms of the high-pass sub-bands of the visible light and infrared images output by the LSI-PCNN. If $U_V^{H,l,k}(i,j)$ is greater than or equal to $U_I^{H,l,k}(i,j)$, the pixel at position $(i,j)$ in the high-pass sub-band of the visible light image exhibits more prominent features than the corresponding position in the high-pass sub-band of the infrared image, so the former is chosen as the pixel of the fused sub-image; otherwise, the latter is chosen.
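Equation (13), and identically Equation (17) for the low-pass sub-band in Section 3.2.2, reduces to a per-pixel selection, sketched below with illustrative argument names:

```python
import numpy as np

def fuse_coefficients(c_vis, c_ir, u_vis, u_ir):
    """Eqs. (13)/(17): keep the coefficient whose channel shows the larger
    internal activity in the (LSI- or MSMG-) PCNN output."""
    return np.where(u_vis >= u_ir, c_vis, c_ir)
```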

3.2.2. Fusion of Low-Pass Sub-Band

The low-pass sub-bands reflect the primary energy information of the image while also containing some detail information. The Energy Attribute (EA) [18] can describe the global characteristic energy information of the image, while the WSEML measures local details such as contrast and texture changes. A new indicator for describing image information is proposed based on these two metrics, aiming to comprehensively extract both global and local information from the image to achieve better fusion results. In this paper, the energy activity E, defined by EA and WSEML, is used as the feedback input for the MSMG-PCNN. EA is defined as follows:
$$
EA^{I,V}(i,j) = \exp\!\bigl(\alpha\, L^{I,V}(i,j)\,\bigl(\mu^{I,V} - Me^{I,V}\bigr)/2\bigr)
\tag{14}
$$
In Equation (14), $L^{I,V}(i,j)$ represents the pixel at position $(i,j)$ of the infrared or visible light image, $\mu^{I,V}$ and $Me^{I,V}$ are the mean and median of the infrared or visible light image, and $\alpha$ is the modulation parameter.
The energy activity E is defined as:
$$
E = WSEML + EA
\tag{15}
$$
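A short sketch of the energy activity, following Equations (14) and (15) as reconstructed above; the WSEML map is passed in (see the sketch after Equation (11)), and the default value of the modulation parameter `alpha` is an arbitrary placeholder rather than a value given in the paper.

```python
import numpy as np

def energy_activity(L, wseml_map, alpha=0.1):
    """Energy activity E = WSEML + EA, Eqs. (14)-(15) (EA as reconstructed)."""
    mu, me = np.mean(L), np.median(L)
    ea = np.exp(alpha * L * (mu - me) / 2.0)   # Eq. (14)
    return wseml_map + ea                      # Eq. (15)
```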
Figure 5b shows the multiscale morphological gradient images extracted from the source images in Figure 5a. Compared with the source images in Figure 5a, the gradient images in Figure 5b effectively reflect the boundary information of the image, demonstrating the spatial correlation within the image. Therefore, the multiscale morphological gradient (MSMG) [20], which possesses strong predictive and discriminative capabilities, serves as the linking strength for the low-pass sub-band. This further extracts the gradient information of the low-pass sub-band and increases the spatial correlation within the image. MSMG is defined as:
$$
MSMG^{I,V}(i,j) = u^{I,V}(i,j) = \sum_{t=1}^{n} w_t \times G_t(i,j), \qquad w_t = \frac{1}{2t+1}
\tag{16}
$$
In Equation (16), $G_t(i,j)$ represents the single-scale morphological gradient at scale $t$ and $w_t$ denotes the weight of the gradient at that scale. Building on the improved adaptive dual-channel PCNN, the MSMG is introduced as the linking strength for the low-pass sub-band; this type of PCNN is referred to as MSMG-PCNN.
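A sketch of the MSMG of Equation (16) using SciPy's grayscale morphology; taking the single-scale gradient $G_t$ as dilation minus erosion with a $(2t+1)\times(2t+1)$ structuring element, and using three scales, are assumptions about details not spelled out here.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def msmg(S, n_scales=3):
    """Multiscale morphological gradient, Eq. (16), with w_t = 1/(2t+1)."""
    out = np.zeros_like(S, dtype=float)
    for t in range(1, n_scales + 1):
        size = (2 * t + 1, 2 * t + 1)                    # assumed structuring element
        grad = grey_dilation(S, size=size) - grey_erosion(S, size=size)
        out += grad / (2 * t + 1)
    return out
```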
According to Equations (1), (4)–(8), and Equation (16), the MSMG-PCNN-related parameters are computed. Similarly, the fusion of low-pass sub-band coefficients is based on the magnitude of the model’s output internal activity terms. The fusion formula for low-pass sub-band coefficients is as follows:
$$
L_F^{l,k}(i,j) = \begin{cases} L_V^{l,k}(i,j), & \text{if } U_V^{L,l,k}(i,j) \ge U_I^{L,l,k}(i,j) \\ L_I^{l,k}(i,j), & \text{otherwise} \end{cases}
\tag{17}
$$
In Equation (17), $L_F^{l,k}$, $L_V^{l,k}$, and $L_I^{l,k}$, respectively, represent the fused low-pass sub-band coefficient and the low-pass sub-band coefficients of the visible light and infrared images. $U_V^{L,l,k}(i,j)$ and $U_I^{L,l,k}(i,j)$ denote the internal activity terms of the low-pass sub-bands of the visible light and infrared images output by the MSMG-PCNN. The fusion principle for the low-pass sub-band is consistent with that of the high-pass sub-bands.

4. Experimental Results and Discussion

4.1. Experimental Settings

To validate the feasibility, environmental adaptability, and performance advantages of the proposed infrared and visible light image fusion algorithm, this study compares it with five fusion methods: NSST, NSST-DCPCNN [21], VSM-WLS [22], NSCT-PAUDPCNN [12], and NSST-PAPCNN [23]. NSCT-PAUDPCNN uses the "9/7" pyramid filter and the "pkva" directional filter, and the parameters of NSST-DCPCNN are set to commonly used values established from prior experience: αL = 0.06931, αh = 0.2, VL = 1, Vh = 20, h = 0.2, n = 200, and W = [0.707, 1, 0.707; 1, 0, 1; 0.707, 1, 0.707]. The experimental objects are four sets of infrared and visible light images from the TNO dataset, and all fusion experiments are conducted in MATLAB 2023a. This study evaluates the fusion performance of the proposed method and the comparative methods through a combination of subjective and objective evaluation.

4.2. Subjective Evaluation

The fusion results obtained by applying NSST, NSST-DCPCNN, VSM-WLS, NSCT-PAUDPCNN, NSST-PAPCNN, and the proposed algorithm to four sets of source images are illustrated in Figure 6a–h, Figure 7a–h, Figure 8a–h and Figure 9a–h, respectively.
From the above four groups of fused images, it is apparent that the NSST method exhibits pixel distortion, leading to an inability to effectively distinguish infrared targets from the background; its image clarity is the poorest, severely impacting perceptual quality. The NSST-DCPCNN method performs relatively well, although it lags behind our proposed algorithm in terms of image detail and contrast. The VSM-WLS method presents an overall dull visual perception, resulting in relatively lower contrast in the fused images. The NSST-PAPCNN method introduces some black pseudo-noise due to the suboptimal selection of linking strength, resulting in blurry fused images and compromised perceptual quality. The NSCT-PAUDPCNN method achieves good fusion results, but the contrast between targets and scenes in its fused images is lower than that obtained by our algorithm. Compared to the aforementioned five methods, the fused images generated by our algorithm effectively highlight target information, achieve the highest contrast, and provide the best visual quality to a human observer.

4.3. Objective Evaluation

Subjective visual assessment may be influenced by individual differences such as visual acuity, emotional state, and psychological factors. In contrast, objective assessment calculates relevant quality metrics based on image data, thereby reflecting the quality of the fused image. Therefore, objective quality assessment can effectively complement the shortcomings of subjective assessment. This study selected five mainstream evaluation metrics for quantitative evaluation: spatial frequency (SF) [24], information entropy (IE) [25], edge preservation (QAB/F) [26], mutual information (MI) [27], and standard deviation (SD) [28].
SF measures the overall activity level of an image in the spatial domain. The greater the overall activity level of the fused image, the clearer the fused image. IE quantifies the richness of information contained in the fused image; a higher IE value indicates that the fused image contains more information. QAB/F is an indicator of how much edge information from the source images is retained in the fused image; a higher QAB/F value indicates that more edge information is preserved in the fused image. MI measures the amount of information acquired from the source images in the fused image; a higher MI value indicates that the fused image acquires more information from the source images. SD quantifies the variation and dispersion of pixel grayscale values in the fused image, thus measuring the contrast of the fused image; a higher SD value indicates higher contrast in the fused image.
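For concreteness, the snippet below computes three of these metrics with their usual textbook definitions; the exact formulations in [24,25,28] may differ in detail (e.g., normalization), so this is only an illustration.

```python
import numpy as np

def entropy_sd_sf(img):
    """Information entropy, standard deviation, and spatial frequency of an
    8-bit fused image, using common textbook definitions."""
    img = img.astype(float)

    # IE: Shannon entropy of the grey-level histogram (bits).
    hist, _ = np.histogram(img, bins=256, range=(0, 255))
    p = hist / hist.sum()
    ie = -np.sum(p[p > 0] * np.log2(p[p > 0]))

    # SD: dispersion of grey levels around the mean (contrast).
    sd = np.std(img)

    # SF: combined row and column frequency of first differences.
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    sf = np.sqrt(rf ** 2 + cf ** 2)
    return ie, sd, sf
```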
Table 1, Table 2, Table 3 and Table 4 present a quantitative evaluation of the objective metrics for three PCNN models and six image fusion algorithms. The bold values represent the best objective evaluation results for the same metric.
To evaluate the performance of various PCNN models, ablation studies were conducted on PAPCNN, PAUDPCNN, and the proposed model. Ablation studies were performed using NSST as the decomposition tool, applying the aforementioned PCNN models to ten sets of infrared and visible light images for fusion. The average objective evaluation of the various PCNN models in the ablation studies is shown in Table 1.
As shown in Table 1, the proposed model achieved improvements in the evaluation metrics of SF, IE, MI, and SD. Only the QAB/F value was slightly lower than that of the PAUDPCNN. Therefore, the overall performance of the proposed PCNN model was superior to that of previous PCNN models.
Table 2 and Table 3 summarize the objective evaluation metrics for the first four sets of experiments. Analysis of Table 2 reveals that in the first set of experiments, the proposed method exhibited a slightly lower SF value than the VSM-WLS algorithm. This was due to the pseudo-noise introduced during image fusion by the VSM-WLS algorithm, which artificially inflates the SF value. In the second set of experiments, because NSCT describes the straight edge information of the trenches more effectively, whereas NSST better describes curved edge information, the SF and QAB/F values of our algorithm were slightly lower than those of NSCT-PAUDPCNN; the remaining metrics, however, were higher than those of the comparison algorithms.
In the third set of experiments, all objective evaluation metrics of our method surpassed those of other algorithms, effectively demonstrating the performance superiority of our proposed algorithm. In the fourth set of experiments, our proposed algorithm exhibited only slightly lower QAB/F values compared to the NSST-DCPCNN algorithm, while the remaining metrics were higher than those of the comparison algorithms.
To further validate the performance superiority and comprehensive effectiveness of the proposed method, this study extended the fusion subjects to ten sets of infrared and visible light images from different scenarios. The experiments were conducted using the six aforementioned methods. The objective evaluation metrics of the fused images are statistically presented in line chart format, as illustrated in Figure 10. The mean values of the objective evaluation metrics for the ten sets of fused images are listed in Table 4.
The objective metrics comparison indicated that, relative to the other five algorithms, the fused images obtained by our algorithm exhibited superior performance across the aforementioned objective evaluation metrics. Specifically, the SD, IE, and MI values of the fusion results from our algorithm consistently outperformed those of the comparison algorithms, with the other metrics also ranking among the top two. This demonstrates that our algorithm achieved optimal contrast, information content, and overall performance, consistent with the previous subjective evaluations. As shown in Table 4, compared to the best mean values of the objective evaluation metrics of the contrasted methods, the proposed method showed improvements of 2.35%, 3.49%, and 11.60% in information entropy, mutual information, and standard deviation, respectively.
Hence, both subjective and objective assessments affirmed the superior overall performance of our proposed algorithm over the alternative approaches. The last column in Table 4 lists the average runtime of each method. Comparative analysis revealed that the NSST algorithm ran the fastest, notwithstanding the comparatively inferior visual quality of its fused images. Compared with our proposed algorithm and NSST-DCPCNN, NSST-PAPCNN took more time because it uses a single-channel PCNN model for fusion. Furthermore, the NSCT-PAUDPCNN method exhibited elevated time complexity, attributable to the prolonged NSCT decomposition and reconstruction. Consequently, our proposed method offers good timeliness while ensuring optimal fusion outcomes. Constrained by the processing speed of the MATLAB experimental platform, our approach still requires a certain amount of time to perform image fusion; faster execution could be achieved by adopting more efficient programming languages and parallel computing.
Based on the above subjective and objective evaluations, it can be concluded that the infrared and visible light image fusion method studied in this paper effectively highlights the contours and brightness information of targets, while enhancing the information content, mutual information, and contrast of the fused images. When targets are situated in complex environments such as concealment, low light, and night-time, the target recognition and tracking system based on fused images can fully leverage the advantage of infrared images in highlighting target information, while combining the high-resolution background information from visible light images—thus achieving precise target recognition and tracking in various environments. Subsequently, integrating deep learning-based target recognition and tracking techniques with the fusion method proposed in this paper could be explored. This involves initially acquiring and processing fused image datasets using the method proposed in this paper, followed by constructing target recognition neural network models and target tracking algorithm frameworks, and finally conducting model training to achieve precise target recognition and tracking. However, the performance of the target recognition and tracking system based on fused images still requires further research.

5. Conclusions

To address the issues of low contrast and significant loss of detail and texture information in previous infrared and visible light image fusion algorithms, this paper proposes a novel fusion approach based on an adaptive dual-channel pulse-coupled neural network (PCNN) in the NSST domain. Firstly, the parameter adaptive settings and spatial correlation of the pulse-coupled neural network model were optimized, and a new model based on local structure information and multiscale morphological gradients was introduced. Secondly, NSST was employed as the multiscale decomposition tool to obtain a low-pass sub-band and a series of high-pass sub-band images. The Weighted Sum of Eight-neighborhood Laplacian of the high-pass sub-bands and the energy activity of the low-pass sub-band were used as feedback inputs for the fusion of the high- and low-pass sub-bands in the new model. Finally, the fused image was reconstructed from the fused high- and low-pass sub-bands. The ablation experiments demonstrated the superior performance of the proposed model. Comparative fusion experiments with five other methods indicated, subjectively, that the proposed approach can enhance the contrast of scenes and targets while preserving rich texture and detail information. Moreover, compared to the best mean values of the objective evaluation metrics of the comparison methods, our method improved information entropy, mutual information, and standard deviation by 2.35%, 3.49%, and 11.60%, respectively. Thus, our method demonstrates excellent comprehensive performance both subjectively and objectively.

Author Contributions

Conceptualization, B.F.; Methodology, B.F.; Investigation, C.A.; Data curation, H.Z.; writing and editing, C.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by 2022 “Insight Action” Achievement Transformation and Application Project (Program No. 628020320).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors thank all the anonymous reviewers for their very helpful comments for improving the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cao, Y.; Guan, D.; Huang, W.; Yang, J.; Cao, Y.; Qiao, Y. Pedestrian detection with unsupervised multispectral feature learning using deep neural networks. Inf. Fusion 2019, 46, 206–217. [Google Scholar] [CrossRef]
  2. Kutyniok, G.; Labate, D. Construction of regular and irregular shearlet frames. Wavelet Theory 2007, 12, 1–12. [Google Scholar]
  3. Cunha, A.L.; Zhou, J.P.; Do, M.N. The nonsubsampled contourlet transform: Theory, design, and applications. IEEE Trans. Image Process. 2006, 15, 3089–3101. [Google Scholar] [CrossRef] [PubMed]
  4. Easley, G.; Labate, D.; Lim, W.Q. Sparse directional image representation using the discrete shearlet transforms. Appl. Comput. Harmon. Anal. 2008, 25, 103626. [Google Scholar] [CrossRef]
  5. Zhang, S.; Li, X.; Zhang, X.; Zhang, S. Infrared and visible image fusion based on saliency detection and two-scale transform decomposition. Infrared Phys. Technol. 2021, 10, 1350–1479. [Google Scholar] [CrossRef]
  6. Ding, W.; Bi, D.; He, L.; Fan, Z. Infrared and Visible Image Fusion Based on Sparse Feature. Acta Photonica Sin. 2018, 47, 910002. [Google Scholar] [CrossRef]
  7. Feng, X.; Yang, M.J.; Zhang, H.D.; Qiu, G. Infrared and Visible Image Fusion Based on Dual Channel Residual Dense Network. Acta Photonica Sin. 2023, 52, 1110003. [Google Scholar]
  8. Eckhorn, R.; Reitboeck, H.J.; Arndt, M.; Dicke, P. Feature Linking via Synchronization among Distributed Assemblies: Simulations of Results from Cat Visual Cortex. Neural Comput. 1999, 2, 293–307. [Google Scholar] [CrossRef]
  9. Wang, Z.B.; Ma, Y.D. Dual-channel PCNN and its application in the field of image fusion. In Proceedings of the Third IEEE International Conference on Natural Computation (ICNC 2007), Haikou, China, 24–27 August 2007; pp. 755–759. [Google Scholar]
  10. Chen, Y.; Park, S.K.; Ma, Y.; Ala, R. A new automatic parameter setting method of a simplified PCNN for image segmentation. IEEE Trans. Neural Netw. 2011, 22, 880–892. [Google Scholar] [CrossRef] [PubMed]
  11. Gu, X.D.; Guo, S.D.; Yu, D.H. A new approach for automated image segmentation based on unit linking PCNN. In Proceedings of the International Conference on Machine Learning and Cybernetics, Beijing, China, 4–5 November 2002; pp. 175–178. [Google Scholar]
  12. Panigrahy, C.; Seal, A.; Mahato, N.K. Parameter adaptive unit-linking dual-channel PCNN based infrared and visible image fusion. Neurocomputing 2022, 514, 21–38. [Google Scholar] [CrossRef]
  13. Cheng, B.Y.; Jin, L.X.; Li, G.N. A novel fusion framework of visible light and infrared images based on singular value decomposition and adaptive DUAL-PCNN in NSST domain. Infrared Phys. Technol. 2018, 91, 153–163. [Google Scholar] [CrossRef]
  14. Yang, Z.; Dong, M.; Guo, Y.; Gao, X.; Wang, K.; Shi, B.; Ma, Y. A new method of micro-calcifications detection in digitized mammograms based on improved simplified PCNN. Neurocomputing 2016, 218, 79–90. [Google Scholar] [CrossRef]
  15. Panigrahy, C.; Seal, A.; Mahato, N.K. MRI and SPECT image fusion using a weighted parameter adaptive dual channel PCNN. IEEE Signal Process. Lett. 2020, 27, 690–694. [Google Scholar] [CrossRef]
  16. Baruah, H.G.; Nath, V.K.; Hazarika, D.; Hatibaruah, R. LocalBit-Plane Neighbour Dissimilarity Patternin Non-subsampled ShearletTransform Domain for Bio-medical Image Retrieval. Math. Biosci. Eng. MBE 2022, 19, 1609–1632. [Google Scholar] [CrossRef] [PubMed]
  17. Gong, X.Q.; Hou, Z.Y.; Lv, K.Y.; Lu, T.; Xia, Y.; Li, W. Remote sensing image fusion method combining improved Laplacian energy and parameter adaptive dual-channel unit-linking pulse coupled neural network. Acta Geod. Cartogr. Sin. 2023, 52, 1892–1905. [Google Scholar]
  18. Ruan, L.N.; Dong, Y. Non-Subsampling Shearlet Transform Remote Sensing Image Fusion with Improved Dual-channel Adaptive Pulse Coupled Neural Network. Laser Optoelectron. Prog. 2023, 60, 374–384. [Google Scholar]
  19. Liao, L.N.; Li, H.T.; Xiang, Y. Multi-focus image fusion algorithm based on SML and difference image. Chin. J. Liq. Cryst. Disp. 2023, 88, 524–533. [Google Scholar] [CrossRef]
  20. Tan, W.; Tiwari, P.; Pandey, H.M.; Moreira, C.; Jaiswal, A.K. Multimodal medical image fusion algorithm in the era of big data. Neural Comput. Appl. 2020, 56, 86–105. [Google Scholar] [CrossRef]
  21. Gao, G.R.; Xu, L.P.; Feng, D.Z. Multi-focus image fusion based on non-subsampled shearlet transform. IET Image Process. 2013, 6, 633–639. [Google Scholar]
  22. Ma, J.; Zhou, Z.Q.; Wang, B.; Zong, H. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Phys. Technol. 2017, 82, 8–17. [Google Scholar] [CrossRef]
  23. Yin, M.; Liu, X.; Liu, Y.; Chen, X. Medical Image Fusion with Parameter-Adaptive Pulse Coupled Neural Network in Nonsubsampled Shearlet Transform Domain. IEEE Trans. Instrum. Meas. 2019, 68, 49–64. [Google Scholar] [CrossRef]
  24. Helmy, A.K.; Taweel, G.S. Image segmentation scheme based on SOM–PCNN in frequency domain. Appl. Soft Comput. 2016, 40, 405–415. [Google Scholar] [CrossRef]
  25. Ma, Y.; Chen, J.; Chen, C.; Fan, F.; Ma, J. Infrared and visible image fusion using total variation model. Neurocomputing 2016, 202, 12–19. [Google Scholar] [CrossRef]
  26. Xydeas, C.; Petrovic, V. Objective image fusion performance measure. Electron. Lett. 2000, 36, 308–309. [Google Scholar] [CrossRef]
  27. Qu, G.H.; Zhang, D.; Yan, P.F. Information measure for performance of image fusion. Electron. Lett. 2002, 38, 313–315. [Google Scholar] [CrossRef]
  28. Bai, X.; Zhang, Y.; Zhou, F.; Xue, B. Quadtree-based multi-focus image fusion using a weighted focus-measure. Inf. Fusion 2015, 22, 105–111. [Google Scholar] [CrossRef]
Figure 1. Dual-channel PCNN model structure.
Figure 2. The dual-level decomposition process of NSST [4].
Figure 3. Flow chart of the fusion algorithm [17,18].
Figure 4. Source image and extracted images. (a) Infrared and visible light source images; (b) decomposed singular value images.
Figure 5. Source image and extracted images; (a) Infrared and visible light source images; (b) Multiscale morphological gradients images.
Figure 6. The first group of original images and six fused images. (a) Visible light image; (b) Infrared image; (c) NSST; (d) NSST-DCPCNN; (e) VSM-WLS; (f) NSST-PAPCNN; (g) NSCT-PAUDPCNN; (h) Proposed.
Figure 7. The second group of original images and six fused images; (a) Visible light image; (b) Infrared image; (c) NSST; (d) NSST-DCPCNN; (e) VSM-WLS; (f) NSST-PAPCNN; (g) NSCT-PAUDPCNN; (h) Proposed.
Figure 8. The third group of original images and six fused images; (a) Visible light image; (b) Infrared image; (c) NSST; (d) NSST-DCPCNN; (e) VSM-WLS; (f) NSST-PAPCNN; (g) NSCT-PAUDPCNN; (h) Proposed.
Figure 9. The fourth group of original images and six fused images; (a) Visible light image; (b) Infrared image; (c) NSST; (d) NSST-DCPCNN; (e) VSM-WLS; (f) NSST-PAPCNN; (g) NSCT-PAUDPCNN; (h) Proposed.
Figure 10. Comparison of objective evaluation metrics for the ten sets; (a) SF; (b) IE; (c) QAB/F; (d) MI; (e) SD.
Table 1. Average objective evaluation of various PCNN models.

Model          | SF      | IE     | QAB/F  | MI     | SD
PAPCNN         | 10.7979 | 6.9631 | 0.4631 | 2.2402 | 38.1145
PAUDPCNN       | 12.7758 | 6.9690 | 0.5453 | 2.3793 | 39.7973
Proposed model | 12.7894 | 7.1298 | 0.5373 | 2.7001 | 44.4765
Table 2. Objective evaluation results of the first and second groups of fused images.

Images            | Fusion Methods | SF      | IE     | QAB/F  | MI     | SD
man_in_doorway    | NSST           | 14.7155 | 7.0060 | 0.5331 | 1.2807 | 32.1183
                  | NSST-DCPCNN    | 14.7589 | 7.1909 | 0.5178 | 2.2632 | 44.8060
                  | VSM-WLS        | 15.3315 | 7.1122 | 0.4700 | 1.3837 | 34.9360
                  | NSST-PAPCNN    | 10.8719 | 7.2229 | 0.3915 | 2.0190 | 43.8874
                  | NSCT-PAUDPCNN  | 15.1683 | 7.2810 | 0.5397 | 1.9261 | 46.6804
                  | Proposed       | 14.7953 | 7.3488 | 0.5435 | 2.4537 | 48.3090
soldier_in_trench | NSST           | 14.0219 | 6.7115 | 0.6173 | 1.8442 | 27.9298
                  | NSST-DCPCNN    | 14.1562 | 7.1674 | 0.6261 | 3.7013 | 43.2693
                  | VSM-WLS        | 13.8672 | 6.9739 | 0.5813 | 2.2776 | 34.1787
                  | NSST-PAPCNN    | 10.1671 | 7.2644 | 0.2992 | 2.6234 | 43.4790
                  | NSCT-PAUDPCNN  | 14.3915 | 7.2143 | 0.6431 | 3.2565 | 45.4065
                  | Proposed       | 14.2752 | 7.3065 | 0.6359 | 3.9647 | 47.2504
Table 3. Objective evaluation results of the third and fourth groups of fused images.

Images    | Fusion Methods | SF      | IE     | QAB/F  | MI     | SD
lake      | NSST           | 11.8991 | 6.6783 | 0.5420 | 1.5819 | 27.0315
          | NSST-DCPCNN    | 11.8709 | 7.0916 | 0.5296 | 2.7760 | 42.5150
          | VSM-WLS        | 11.2331 | 7.0032 | 0.4807 | 2.0636 | 35.9001
          | NSST-PAPCNN    | 8.3470  | 7.1576 | 0.2877 | 2.2542 | 43.3220
          | NSCT-PAUDPCNN  | 12.0028 | 7.1325 | 0.5442 | 2.2309 | 42.6051
          | Proposed       | 12.8852 | 7.3796 | 0.5482 | 3.0847 | 47.1004
Nato_camp | NSST           | 14.8734 | 6.9426 | 0.2255 | 1.3181 | 30.6080
          | NSST-DCPCNN    | 14.9887 | 7.1945 | 0.5847 | 2.5462 | 38.6822
          | VSM-WLS        | 15.0639 | 7.1097 | 0.5037 | 1.7524 | 36.1192
          | NSST-PAPCNN    | 10.6633 | 7.2408 | 0.3723 | 2.0394 | 39.5377
          | NSCT-PAUDPCNN  | 15.1718 | 7.2773 | 0.6056 | 2.3053 | 40.3856
          | Proposed       | 15.0520 | 7.4603 | 0.6120 | 3.0231 | 44.8374
Table 4. Average metrics for the ten sets of fused images.

Fusion Methods | SF      | IE     | QAB/F  | MI     | SD      | Time/s
NSST           | 12.4762 | 6.5961 | 0.5323 | 1.5555 | 27.6832 | 5.6818
NSST-DCPCNN    | 12.5387 | 6.8856 | 0.5336 | 2.6090 | 38.3319 | 26.2608
VSM-WLS        | 13.1418 | 6.7652 | 0.4903 | 1.8798 | 33.6529 | 1.3170
NSST-PAPCNN    | 9.4344  | 6.9296 | 0.3595 | 2.2046 | 38.0021 | 40.8346
NSCT-PAUDPCNN  | 12.7722 | 6.9658 | 0.5487 | 2.3941 | 39.8544 | 106.5191
Proposed       | 12.7894 | 7.1298 | 0.5373 | 2.7001 | 44.4765 | 23.1243
