Article

Vehicle Target Recognition Method Based on Visible and Infrared Image Fusion Using Bayesian Inference

1
School of Defense, Xi’an Technological University, Xi’an 710021, China
2
School of Electronic and Information Engineering, Xi’an Technological University, Xi’an 710021, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(14), 8334; https://doi.org/10.3390/app13148334
Submission received: 21 June 2023 / Revised: 13 July 2023 / Accepted: 17 July 2023 / Published: 19 July 2023

Abstract

The accuracy of single-mode optical imaging systems for vehicle target recognition is limited by ambient illumination and by the resolution of the imaging equipment. In this paper, a vehicle target recognition method based on visible and infrared image fusion using Bayesian inference is proposed. Salient-region detection combining the Spectral Residual (SR) and EdgeBox algorithms marks the target area on the visible light image, and the maximum between-class variance method together with the GrabCut algorithm is used to segment vehicle targets in the marked images. An opening-operation filter is employed to extract vehicle target features from the infrared image, and the visible light and infrared results are fused by introducing the Intersection Over Union (IOU). The fusion result serves as the parameter of Bayesian inference: its class and attributes are defined and substituted into the established naive Bayesian classification model, and the class probability is calculated to decide whether the vehicle target is recognized. Experiments under different test conditions were carried out. The accuracy of image target recognition reached 77% when the vehicle target was not occluded and 74% when the vehicle target was partially occluded; the results verify that the proposed method can recognize vehicle targets in different scenarios.

1. Introduction

Target recognition has always been a research hotspot in the field of digital image processing. In the military domain it is widely applied to intelligence reconnaissance and precision guidance; in the civil domain it is commonly used in video surveillance and medical diagnosis [1]. Most current target recognition methods are based on visible light images. Visible light covers wavelengths between 390 and 780 nm; within this range the detailed texture of a scene can be captured, the picture is clear, and the amount of information is rich. However, visible light imaging cannot recognize targets that are occluded or camouflaged [2], and it is easily affected by illumination, for example when recognizing targets at night or in rainy and foggy weather. Target recognition based on infrared images can overcome the influence of illumination. Infrared radiation covers wavelengths between 760 nm and 1000 µm; infrared imaging reflects the difference between the energy radiated by the target and by the background, works over long distances, and has strong anti-interference ability. Even in smoke, rain, and snow, infrared image features can still be extracted and recognized. Compared with visible light imaging, however, infrared imaging is not sensitive to scene brightness, so the resulting images have lower contrast, poorer resolution, and less detailed texture information [3]. In view of the respective advantages and disadvantages of visible light and infrared image recognition, this paper proposes a vehicle target recognition method based on visible and infrared image fusion using Bayesian inference. The method makes full use of both modalities, combining the rich spectral and textural information of visible light images with the thermal radiation information of infrared images [4], so as to improve the accuracy of vehicle target recognition under various weather conditions.

2. Related Work

In recent years, a growing number of researchers have studied target recognition methods based on the fusion of visible light and infrared images. He et al. [5] proposed a visible light target detection algorithm that adaptively integrates infrared features; the algorithm combines deep convolutional neural networks with multi-source information to fully fuse the multi-modal features of the target and improves detection performance. Wang et al. [6] introduced a method for infrared and visible light image fusion and multiple target recognition that first extracts the local region of interest in the image, then establishes feature point positioning using multi-modal image scale and partially grayscale-invariant image features, and finally establishes registration relationships to detect and recognize multiple targets in the fused image. Liu et al. [7] presented a feature-fusion-based method for infrared and visible target detection; the method employs a convolutional neural network to achieve accurate recognition and localization of various targets. Li et al. [8] used temporal and spatial information constraints to obtain flight parameters from visible light equipment. Bai et al. [9] put forward a decision-level fusion-based object detection algorithm that uses the YOLOv3 network to detect visible and infrared images and achieves rapid target detection. Zhou et al. [10] used a semantic-segmentation-based method for infrared and visible image fusion that obtains better fusion results. Zhang et al. [11] introduced a new unsupervised deep-learning-based algorithm for infrared and visible image fusion, which can enrich the information available for subsequent processing of a target. Ning et al. [12] used a decision-level fusion strategy based on model reliability. Ding et al. [13] proposed a multi-scale transformation-based infrared and visible image fusion algorithm with object enhancement; the algorithm reconstructs the fused image through an inverse Laplacian pyramid. Luo et al. [14] presented a visual-detection-based method for matching regions in infrared and visible light images. Li et al. [15] used a saliency-detection-based infrared and visible image fusion method under different viewing angles, in which Mask R-CNN extracts the salient regions of the infrared image and these regions are locally fused with the visible light image based on a visual field transformation model. Zhu et al. [16] put forward an infrared and visible image fusion method based on compound decomposition and intuitionistic fuzzy sets. Han et al. [17] applied an image fusion algorithm that preserves the infrared contour and gradient information; the algorithm performs color space transformation and adaptive morphological denoising on the source images. Wang et al. [18] advanced an infrared and visible image fusion method guided by self-attention. Ren et al. [19] created a multi-scale entropy-based method for infrared and visible image fusion using the Non-Subsampled Shearlet Transform (NSST); the method restores images at the original scale through the inverse NSST transform. Sun et al. [20] put forward a cross-modal pedestrian re-identification method for visible–infrared images. Huang et al. [21] used an unmanned aerial vehicle detection method based on the fusion of infrared and visible images; the method trains and tests with YOLOv3 to improve the accuracy of unmanned aerial vehicle recognition. Hao et al. [22] suggested an infrared–visible image fusion and target recognition algorithm based on a region-of-interest mask convolutional neural network. Shen et al. [23] presented a fusion method based on a multi-scale geometric transformation model. Zhao et al. [24] developed a multi-feature-extraction-based infrared and visible light image object tracking method that tracks image targets stably in complex environments. Tang et al. [25] advanced a deep-learning-based decision-level fusion tracking method for infrared and visible light; the results showed that dual-band fusion tracking has better robustness.
Significant developments have been achieved by the aforementioned researchers, most of whose research is based on deep learning methods and mainly focuses on improving fusion methods. However, deep-learning-based object recognition methods usually require a large number of data samples, and the trained networks perform poorly in recognizing images of poor quality, exhibiting insufficient generalization capability. Therefore, in this paper, a vehicle target recognition method based on visible and infrared image fusion using Bayesian inference is proposed. The primary contributions and innovations of this paper are as follows:
(1)
In order to better extract target features from visible and infrared images, information fusion based on IOU judgment is used to obtain real target features by comparing the relationship between IOU and the set threshold.
(2)
The IOU judgment results are used as the parameters of Bayesian inference to achieve vehicle target recognition, and the feasibility of the method is verified through experiments.
The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 elaborates the visible light and infrared image feature fusion algorithm. Section 4 describes fusion image target recognition based on the Bayesian inference method. Section 5 provides the verification method and experimental results. Finally, Section 6 summarizes this paper.

3. Visible Light and Infrared Image Feature Fusion Algorithm

Considering the disadvantages of single-mode optical equipment, this study works with image information captured by both visible light and infrared equipment, and a feature fusion algorithm for visible light and infrared images is proposed. First, an image of the vehicle target is captured by visible light equipment, and the vehicle target in this image is recognized and segmented. Then, infrared equipment captures an image of the vehicle target, and the features of the vehicle target in the formed infrared image are extracted. Finally, the vehicle target features extracted from the visible light and infrared images are fused; the fusion result provides technical support for subsequent target recognition.

3.1. Visible Light Image Recognition Algorithm

When the imaging system is shaken by external influences during shooting, the quality of the captured images degrades, information is lost, and the pictures become blurred, which negatively affects the accuracy of target recognition. The Wiener filter is used to restore the blurred image; the model is given in Formula (1), where H(u, v) is the degradation function and G(u, v) is the Fourier transform of the degraded image.
F̂(u, v) = [G(u, v)/H(u, v)] · |H(u, v)|^2 / (|H(u, v)|^2 + K)    (1)
In Formula (1), H(u, v) = T/[π(ua + vb)] · sin[π(ua + vb)] · e^(−jπ(ua + vb)) and G(u, v) = F(u, v)·H(u, v).
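As an illustration of Formula (1), the following sketch restores a motion-blurred image in the frequency domain using NumPy. The linear-motion parameters a, b, and T and the noise-to-signal ratio K are assumed values for the example, not values given in the paper.

```python
import numpy as np

def motion_blur_H(shape, a, b, T=1.0):
    """Degradation function H(u, v) of Formula (1) for linear motion blur."""
    M, N = shape
    u = np.fft.fftfreq(M)[:, None]
    v = np.fft.fftfreq(N)[None, :]
    s = np.pi * (u * a + v * b)
    s = np.where(s == 0, 1e-12, s)               # avoid division by zero at the DC term
    return (T / s) * np.sin(s) * np.exp(-1j * s)

def wiener_restore(blurred, H, K=0.01):
    """Wiener restoration F_hat(u, v) of Formula (1); K is the noise-to-signal ratio."""
    G = np.fft.fft2(blurred.astype(np.float64))  # Fourier transform of the degraded image
    # G/H * |H|^2 / (|H|^2 + K), rewritten as G * conj(H) / (|H|^2 + K)
    F_hat = G * np.conj(H) / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(F_hat))
```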
Methods for recognizing visible light images usually start from the upper left corner of the image and traverse the entire image to search for the target, which is time-consuming and computationally expensive. As a result, saliency region detection is performed before target recognition to locate suspected real targets. That is, information lacking basic target features is filtered out of the salient regions, and the structural edge features of the vehicle target are used as constraints to improve the accuracy of real target recognition.
Saliency region detection methods are divided into spatial domain detection methods and frequency domain detection methods. Among these, frequency domain detection methods are prone to losing target edge information, and although the detection method based on the spatial domain can retain the details of image objects, it is difficult to remove interference factors such as ground shadows. Based on the advantages and disadvantages of two detection methods, this paper combines the spatial domain detection method and the frequency domain detection method to detect suspected vehicle target areas.
The SR algorithm, a frequency domain detection method, is chosen: the visible light image is transformed to the frequency domain and the low-frequency components are suppressed with a high-pass filter to highlight high-frequency information such as target edge contours. The specific steps of the SR algorithm (Algorithm 1) are as follows:
Algorithm 1 The SR Algorithm
Step 1: The visible light image g(x, y) is transformed into the frequency domain g(u, v) through the Fourier transform.
Step 2: The amplitude spectrum A(u, v), phase spectrum X(u, v), and log-amplitude spectrum L(u, v) in the frequency domain are calculated: A(u, v) = |F(g)|, X(u, v) = angle(F(g)), L(u, v) = log(A(u, v)).
Step 3: The spectral residual C(u, v) is calculated: C(u, v) = L(u, v) − h_n ∗ L(u, v), where h_n is a local-average filter.
Step 4: The saliency map is obtained: S(x, y) = |F^(−1)[exp(C(u, v) + iX(u, v))]|^2.
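A minimal NumPy/OpenCV sketch of Algorithm 1 is given below. The 3 × 3 averaging filter standing in for h_n and the final Gaussian smoothing are common choices for the SR method and are assumptions here rather than settings reported in the paper.

```python
import cv2
import numpy as np

def spectral_residual_saliency(gray):
    """Spectral Residual saliency map of Algorithm 1 for a grayscale image."""
    f = np.fft.fft2(gray.astype(np.float64))
    amplitude = np.abs(f)                                   # A(u, v)
    phase = np.angle(f)                                     # X(u, v)
    log_amp = np.log(amplitude + 1e-8)                      # L(u, v)
    residual = log_amp - cv2.blur(log_amp, (3, 3))          # C(u, v) = L - h_n * L
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2   # S(x, y)
    saliency = cv2.GaussianBlur(saliency, (9, 9), 2.5)      # smooth for a cleaner map
    return cv2.normalize(saliency, None, 0.0, 1.0, cv2.NORM_MINMAX)
```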
Based on the spatial domain detection method, the EdgeBox algorithm is selected. Its basic idea is to perform structural edge detection in visible light images, score all candidate regions based on the detection results, and output structurally significant regions through scores [26].
In the process of image detection, the detection boxes on the same target are closer or even overlapping, while the detection boxes on different targets are farther apart. This paper designs an iterative automatic clustering algorithm for target boxes. The specific steps of the SR EdgeBox algorithm (Algorithm 2) are as follows:
Algorithm 2 The SR EdgeBox Algorithm
Step 1: The n candidate boxes box are input.
Step 2: All boxes are placed in a queue, and the first candidate box box_1 is selected as the benchmark; it is checked in sequence whether box_1 overlaps each of the other n − 1 boxes. If two candidate boxes overlap, the minimum rectangle box_min enclosing both is computed and replaces candidate box box_1, the overlapping candidate box is deleted, and the check continues between box_1 and the remaining boxes;
Step 3: Once the overlap between candidate box box_1 and all other boxes has been computed and the corresponding minimum bounding rectangles obtained, box_1 is moved to the end of the queue, candidate box box_2 becomes box_1, candidate box box_3 becomes box_2, and so on. Step 2 is then repeated, n times in total;
Step 4: The final m possible target boxes (m ≤ n) are output.
In each iteration, candidate boxes with overlapping relationships are replaced by their minimum bounding rectangle until no overlap remains between the target boxes, thereby achieving automatic clustering. This paper combines the EdgeBox algorithm in the spatial domain with the SR algorithm in the frequency domain. Firstly, the EdgeBox algorithm extracts edges in the spatial domain. Secondly, the SR algorithm extracts high-frequency information in the frequency domain and filters out low-frequency information, finally yielding the target area of the visible light image.
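A sketch of this clustering is shown below. Instead of the explicit queue rotation of Algorithm 2, it repeatedly merges any pair of overlapping boxes into their minimum bounding rectangle until no overlaps remain, which reaches the same fixed point; boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes intersect."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def bounding_rectangle(a, b):
    """Minimum rectangle enclosing two boxes (box_min in Algorithm 2)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def cluster_boxes(candidates):
    """Merge overlapping candidate boxes until no two boxes overlap."""
    boxes = list(candidates)
    merged_something = True
    while merged_something:
        merged_something = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_overlap(boxes[i], boxes[j]):
                    boxes[j] = bounding_rectangle(boxes[i], boxes[j])
                    del boxes[i]                 # the merged box replaces both candidates
                    merged_something = True
                    break
            if merged_something:
                break
    return boxes                                 # the m <= n clustered target boxes
```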
By giving a box that represents the vehicle target area, the background can be better removed and the target body segmented; the interactive, iterative GrabCut algorithm is used to separate target and background in the image. GrabCut does not require training: once the vehicle target box is given, the pixels inside the box are automatically treated as possible target and background, and the pixels outside the box are treated as background [27]. Gaussian mixture models are then established for the target and background data based on the similarity of the vehicle target pixels, and the unlabeled pixels are inferred by minimizing the energy function. The energy function is expressed by Formula (2).
E(o, p, q) = ψ(o, p, q) + ε(o, q)    (2)
In Formula (2), o represents the labels for each pixel segmentation in the image, p represents the parameters of the segmentation model, q represents the pixel values of the image, ψ o , p , q represents the main data items of the target image, and ε o , q represents the smoothing terms of the similarity between pixels within the target.
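For reference, a rectangle-initialised GrabCut call in OpenCV might be sketched as follows; the rectangle would come from the detection stage above, and the iteration count is an assumed value.

```python
import cv2
import numpy as np

def grabcut_segment(image_bgr, rect, iterations=5):
    """Segment the target inside `rect` (x, y, w, h); pixels outside are background."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)     # background GMM parameters
    fgd_model = np.zeros((1, 65), np.float64)     # foreground GMM parameters
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)
    # Keep pixels labelled as definite or probable foreground
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return image_bgr * fg[:, :, None]
```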
The advantage of the GrabCut algorithm is that it effectively exploits the continuity between adjacent pixels for segmentation, but it requires continuous pixel labeling and performs poorly when the boundary between target and background is unclear and the contrast between them is small. The threshold segmentation algorithm selects a grayscale threshold from the grayscale histogram of the image, divides the grayscale levels into several parts through one or more thresholds, and treats pixels belonging to the same part as the same object; however, this algorithm is sensitive to noise such as lighting. Based on a comparison of the advantages and disadvantages of the two methods, this paper combines them to segment vehicle targets. Firstly, the visual saliency algorithm detects the suspected target area as a whole, and the maximum between-class variance method provides an appropriate threshold for binary segmentation. Secondly, a line-marking operation of a given length is performed within the same area of the binary image. Finally, the obtained lines and their labels (foreground or background) are fed to the GrabCut algorithm, which continues iterating until the real target is segmented [28]. The steps for obtaining the segmentation threshold using the maximum between-class variance method (Algorithm 3) are as follows:
Algorithm 3 Segmentation Threshold Algorithm
Step 1: The image is defined as an M × N matrix of pixels, with pixel values in the range (0, 255).
Step 2: The segmentation threshold between target and background is set to T_threshold; the proportion of pixels belonging to the target is ω_0 with average grayscale μ_0, and the proportion of background pixels is ω_1 with average grayscale μ_1, where
ω_0 = N_0/(M × N), ω_1 = N_1/(M × N), N_0 + N_1 = M × N, ω_0 + ω_1 = 1.
Step 3: The total average grayscale of the image is denoted μ and the between-class variance h, where μ = ω_0·μ_0 + ω_1·μ_1 and
h = ω_0·(μ_0 − μ)^2 + ω_1·(μ_1 − μ)^2 = ω_0·ω_1·(μ_0 − μ_1)^2.
Step 4: N_0 denotes the number of pixels with grayscale value less than the threshold T_threshold, and N_1 denotes the number of pixels with grayscale value greater than or equal to T_threshold; the optimal threshold is
T_threshold = arg max_T h.
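Algorithm 3 amounts to an exhaustive search over grayscale thresholds for the one maximising the between-class variance h. A minimal NumPy sketch, assuming an 8-bit grayscale image:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold T maximising h = w0 * w1 * (mu0 - mu1)^2."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_h = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()          # class proportions
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0         # target mean grayscale
        mu1 = (levels[t:] * prob[t:]).sum() / w1         # background mean grayscale
        h = w0 * w1 * (mu0 - mu1) ** 2                   # between-class variance
        if h > best_h:
            best_t, best_h = t, h
    return best_t
```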

3.2. Infrared Image Feature Extraction

Any object with a temperature above absolute zero produces thermal radiation, and for objects below about 1725 degrees Celsius the radiated spectrum is concentrated in the infrared region, so all objects in nature radiate infrared energy outward, and the wavelength of that energy depends on the object's temperature. The infrared wavelength is therefore related to temperature: the wavelength and range of the infrared radiation produced by an object depend on its physical and chemical properties and its temperature. Compared with visible light images, infrared images show a clear difference between the target and the environment, but they also have poor resolution, a low signal-to-noise ratio, blurred visual effects, and a nonlinear relationship between the grayscale distribution and the reflective features of the target. Therefore, making use of the local invariant features of infrared images, our feature extraction method adopts connected-area feature extraction: the image is divided into several components, the features of each part are extracted, and finally regions with the same features are merged.
The acquired infrared images are converted to grayscale, and a grayscale-level filter is applied. The grayscale range is divided into 17 levels corresponding to multiples of 15 from 0 to 255, and the grayscale value within each interval is set equal to the highest value of that level, highlighting the sense of hierarchy in the grayscale image. The detection intensity of infrared images is divided into three levels: grayscale values between 210 and 240 represent strong radiation, values between 120 and 180 represent moderate radiation, and values below 90 represent ordinary radiation. Strong radiation mainly comes from vehicle targets, moderate radiation mainly comes from buildings, and ordinary radiation corresponds to the normal infrared intensity of the surrounding environment. To erode and dilate without damaging the original target information of the infrared image, a 3 × 3 opening-operation filter is used to partially eliminate the interference of ground infrared noise. The infrared feature extraction is expressed by Formula (3).
γ(R_r, θ_gray, θ_open) = {R_r1, R_r2, …, R_rj},   R_ri = (G_i, S_i),   i = 1, 2, 3, …, j    (3)
In Formula (3), R_r represents the infrared image, θ_gray the grayscale filter, θ_open the opening-operation filter, R_ri the i-th recognized infrared target image block, G_i the position information of the image block, and S_i the radiation level of the image block [29].
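To make the processing chain concrete, the following sketch quantises the grayscale into 17 levels, keeps the strong-radiation band described above, applies a 3 × 3 opening, and reads connected components off as candidate vehicle blocks. OpenCV is assumed, and the exact quantisation boundaries are an interpretation of the description rather than values specified in the paper.

```python
import cv2
import numpy as np

def extract_infrared_features(ir_image_bgr):
    """Return the opened strong-radiation mask and the (x, y, w, h) candidate blocks."""
    gray = cv2.cvtColor(ir_image_bgr, cv2.COLOR_BGR2GRAY)
    # 17-level quantisation: each interval is mapped to the top of its level
    quantised = np.minimum((gray.astype(np.int32) // 15 + 1) * 15, 255).astype(np.uint8)
    strong = cv2.inRange(quantised, 210, 240)                   # strong radiation: vehicle candidates
    kernel = np.ones((3, 3), np.uint8)
    opened = cv2.morphologyEx(strong, cv2.MORPH_OPEN, kernel)   # erosion then dilation
    num, _, stats, _ = cv2.connectedComponentsWithStats(opened)
    blocks = [tuple(stats[i, :4]) for i in range(1, num)]       # G_i: position of each block
    return opened, blocks
```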

3.3. Information Fusion Based on IOU Judgment

While visible light images have high resolution and good recognition contrast, infrared images are less affected by the environment, have high accuracy, and can expose camouflage. The effectiveness of target recognition is improved by fusing infrared and visible light images. Because the resolution of visible and infrared images differs, the mask boxes extracted from the infrared images are scaled and mapped into the RGB image recognition results, yielding box information that combines the two recognition methods. IOU is used to evaluate the similarity between two rectangular boxes: the larger the IOU, the closer the predicted result is to the actual result, and the smaller the IOU, the worse the prediction. Let the visible light recognition target box be A, the infrared recognition target box be B, the intersection of the two boxes be C, and their union be D; then IOU = (A ∩ B)/(A ∪ B) = C/D. The diagram of IOU is shown in Figure 1.
The process of multi-source information fusion and IOU judgment method is shown in Figure 2.
When the visible light image detection box overlaps with the infrared image detection box, the IOU value between two detection boxes is calculated. When the IOU exceeds a certain threshold, it is determined that there is a recognition target in the area, and the visible light image detection box is used as an image block for target recognition. When the IOU falls below this threshold, both detection boxes are discarded. When there is only an infrared image detection box and no visible light image detection box, it is determined as a ground infrared interference object and discarded. When there is only a visible light image detection box without an infrared image detection box, it is determined that the visible light image detection box has misidentified the target and is discarded.
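The IOU computation and the decision rules above can be sketched as follows. The threshold of 0.5 is an assumed value, since the paper does not report the threshold it uses; boxes are (x1, y1, x2, y2) tuples.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(visible_boxes, infrared_boxes, threshold=0.5):
    """Keep a visible-light box only when an infrared box confirms it (IOU >= threshold);
    boxes seen by only one sensor are discarded as interference or misdetection."""
    return [vis for vis in visible_boxes
            if any(iou(vis, ir) >= threshold for ir in infrared_boxes)]
```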

4. Fusion Image Target Recognition Based on Bayesian Inference Method

Bayesian inference is an optimization method based on statistical probability. It is robust, is little affected by noise and other factors, associates the prior probability of an event with its posterior probability, and represents the causal relationship between attributes through probability; it is a classic classification reasoning technique based on Bayes' theorem. Bayesian classification algorithms are mainly divided into the Bayesian Network (BN), the Naive Bayes Classifier (NBC), and the Dynamic Bayesian Network (DBN). Occlusion, jitter, and other phenomena may occur while the imaging system is shooting, so that the target information in the image is incomplete or the image is blurred and unclear. To simplify the problem, naive Bayesian inference is adopted. The naive Bayesian model combines Bayes' theorem with the conditional independence assumption: it contains only one root node and multiple leaf nodes, and there are no associations among the leaf nodes. Owing to the conditional independence of the leaf nodes, its structure is the simplest Bayesian classification model; the naive Bayesian network structure is shown in Figure 3.
If the target image taken by the test system is occluded, the Bayesian inference method is used to further recognize the vehicle. Events f and s are defined. Event f indicates whether the captured image contains vehicle target information and takes a value in {0, 1}: f = 1 means event f is true, that is, the captured image contains vehicle target information; otherwise f = 0 and event f is false. Event s represents the attributes of the targets contained in the image to be recognized, including vehicles, buildings, and grass, denoted s = {car, building, grass}. The vehicle attribute takes a value in {0, 1, 2}: 0 means the image to be recognized does not contain a vehicle target, 1 means it contains a vehicle target that is occluded, and 2 means it contains a vehicle target that is not occluded. Similarly, the building and grass attributes take values in {0, 1}: 0 means buildings or grass are not detected in the captured image, and 1 means they are detected.
P(f | s) = P(s | f)·P(f)/P(s) ∝ P(s | f)·P(f) = P(car | f)·P(building | f)·P(grass | f)·P(f)    (4)
In Formula (4), the posterior probability P(f | s) is the probability that event f holds given that event s has occurred, the likelihood P(s | f) is the probability of event s occurring when event f is true, the prior probability P(f) is the probability that event f holds before event s is observed, determined by the specific scene and the accuracy of the detection algorithm [30], and the marginal likelihood P(s) is nonzero and independent of any hypothesis.
By comparing P(f = 1 | s) and P(f = 0 | s), it is deduced whether the captured image contains a vehicle target: if the former is larger, the vehicle target is considered detected; otherwise it is considered not detected. The probability of detecting a vehicle is given by Formula (5).
P(f = 1 | s) = P(s | f = 1)·P(f = 1)/P(s) ∝ P(s | f = 1)·P(f = 1) = P(car | f = 1)·P(building | f = 1)·P(grass | f = 1)·P(f = 1)    (5)
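Under the conditional independence assumption, comparing P(f = 1 | s) with P(f = 0 | s) reduces to multiplying per-attribute likelihoods by the prior and normalising. The sketch below assumes a likelihood table and a prior of 0.5, both of which would in practice be determined by the scene and the accuracy of the detection algorithm, as noted above.

```python
def vehicle_posterior(likelihoods, prior_f1=0.5):
    """likelihoods maps each attribute ('car', 'building', 'grass') to a dict
    {0: P(attr | f=0), 1: P(attr | f=1)} for the observed attribute values."""
    score = {0: 1.0 - prior_f1, 1: prior_f1}
    for f in (0, 1):
        for attr in ("car", "building", "grass"):
            score[f] *= likelihoods[attr][f]          # leaf nodes are conditionally independent
    total = score[0] + score[1]
    p_f1 = score[1] / total if total > 0 else 0.0     # P(f = 1 | s)
    return p_f1, p_f1 > 0.5                           # True: vehicle target recognised
```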

5. Experimental Analysis

During the experiment, if environmental disturbances cause the imaging system to shake during shooting and the image is blurred, the image is restored first and then detected. Specifically, the SR algorithm and the EdgeBox algorithm are applied to detect visible light images. The SR algorithm cannot preserve image target details well, whereas the EdgeBox algorithm can; however, the EdgeBox algorithm cannot remove interference factors such as ground shadows. Thus, this paper combines the spatial domain and frequency domain detection methods to detect suspected vehicle target areas. The results are shown in Figure 4, Figure 5 and Figure 6.
If the image captured by the imaging system does not have blur, there is no need to restore the image, and the test is performed directly. The results are shown in Figure 7, Figure 8 and Figure 9.
Comparing the detection results of Figure 6 and Figure 9, it can be seen that for both the deblurred vehicle image and the originally sharp vehicle image, the SR algorithm can produce the vehicle area mask, the EdgeBox algorithm can retain the image target details, and the combined spatial and frequency domain method can detect the suspected vehicle target areas.
To verify the effectiveness of the proposed algorithm, tests were carried out with the vehicle image information under different conditions; the test results when the vehicle image is fully captured are shown in Figure 10, and the test results when the vehicle image is not fully captured are shown in Figure 11.
From Figure 10 and Figure 11, it can be seen that the frequency-domain SR saliency detection method highlights high-frequency information but also retains non-target areas such as leaves. By using structural edges as constraints to filter out non-structural regions in the spatial domain, a series of candidate boxes is obtained; the 15 most likely structural regions are shown as boxes in different colors in Figure 10c and Figure 11c. Finally, the vehicle target is detected, as shown in Figure 10d and Figure 11d.
The vehicle target is set as the target and the surrounding environment as the background. The GrabCut segmentation algorithm and the maximum between-class variance threshold segmentation method are combined to segment the target and background within the detected vehicle target area. The vehicle targets in different scenes are segmented separately; the segmentation results for the complete vehicle image are shown in Figure 12, and those for the incomplete vehicle image are shown in Figure 13.
After target detection and segmentation of the visible light images, they are fused with the infrared images to reduce the probability of false and missed detections. The infrared images are converted into grayscale images; for the different targets and environments, the vehicle targets in the infrared images are assigned to strong radiation and the surrounding buildings to moderate radiation. A 3 × 3 opening-operation filter is selected to preserve the infrared image details as far as possible. The feature extraction results of the complete vehicle image are shown in Figure 14, and those of the incomplete vehicle image are shown in Figure 15.
The visible light image and infrared image are fused based on IOU judgment; the fusion result is shown in Figure 16a for a complete vehicle image, and the fusion result is shown in Figure 16b for an incomplete vehicle image. In these figures, the green box represents the infrared image detection results, and the blue box represents the visible light image detection results.
Naive Bayesian inference is used to calculate the target recognition results for both complete and incomplete vehicle images. When the vehicle image is complete, the recognition rate of vehicle targets is about 76.8%, the recognition rate of buildings is about 0, and the recognition rate of grass is about 57.4%; when the vehicle image is incomplete, the recognition rate of vehicle targets decreases to 73.7%, the recognition rate of buildings remains 0, and the recognition rate of grass increases to 59.7%. The detection results of vehicle targets based on the recognition method in this paper and that of Reference [24] are shown in Table 1.
When the imaging device is shooting at different distances, the change curve of the vehicle target recognition rate is shown in Figure 17. In Figure 17, it can be observed that as the shooting distance of optical imaging equipment increases, the recognition rates of both the method in this paper and the method in Reference [24] gradually decrease. However, the recognition results of the method in this paper are still higher than those of the method in Reference [24], indicating that the vehicle target recognition method based on visible light and infrared image fusion in this paper has stronger adaptability and better detection and recognition results.

6. Conclusions

This paper first performs saliency region detection on visible light images captured by optical imaging equipment: by combining the SR and EdgeBox detection methods, vehicles in the visible light images are detected and labeled, and the GrabCut algorithm and the maximum between-class variance method are combined to segment the vehicle targets within the labeled recognition boxes. Secondly, the infrared images of vehicles captured by infrared devices are filtered and processed to extract the feature information of the vehicle target. Thirdly, based on the processed visible light and infrared images, the authenticity of the target is judged by the IOU criterion, and Bayesian inference is used to obtain the final vehicle target recognition result. Finally, the effectiveness of the proposed method was verified through experimental results and comparative analysis.

Author Contributions

Conceptualization, methodology, validation, and writing—original draft, J.W.; methodology, software, validation, and writing—original draft, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China, grant number 62073256.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, M.; Zhang, T.; Cui, W. Research of infrared small pedestrian target detection based on YOLOv3. Infrared Technol. 2020, 42, 176–181. [Google Scholar]
  2. Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178. [Google Scholar] [CrossRef]
  3. Han, J.; Bhanu, B. Fusion of color and infrared video for moving human detection. Pattern Recognit. 2007, 40, 1771–1784. [Google Scholar] [CrossRef]
  4. Chen, C.; Meng, X.; Shao, F.; Fu, R. Infrared and visible image fusion method based on multiscale low-rank decomposition. Acta Opt. Sin. 2020, 40, 72–80. [Google Scholar]
  5. He, N.; Shi, S. Research on visible image object detection algorithm based on infrared features fusion. Microelectron. Comput. 2023, 40, 29–36. [Google Scholar]
  6. Wang, N.; Zhou, M.; Du, Q. An infrared visible image fusion and target recognition method. J. Air Space Early Warn. Res. 2019, 33, 328–332. [Google Scholar]
  7. Liu, J.; Yi, G.; Huang, D. Object detection in visible light and infrared images based on feature fusion. Laser Infrared 2023, 53, 394–401. [Google Scholar]
  8. Li, H.; Zhang, X. Flight parameter calculation method of multi-projectiles using temporal and spatial information constraint. Def. Technol. 2023, 19, 63–75. [Google Scholar] [CrossRef]
  9. Bai, Y.; Hou, Z.; Liu, X.; Ma, S.; Yu, W.; Pu, L. An object detection algorithm based on decision-level fusion of visible light image and infrared image. J. Air Force Eng. Univ. 2020, 21, 53–59+100. [Google Scholar]
  10. Zhou, H.; Hou, J.; Wu, W.; Zhang, Y.; Wu, Y.; Ma, J. Infrared and visible image fusion based on semantic segmentation. J. Comput. Res. Dev. 2021, 58, 436–443. [Google Scholar]
  11. Zhang, Y.; Wu, X.; Li, H.; Xu, T. Infrared image and visible image fusion algorithm based on unsupervised deep learning. J. Nanjing Norm. Univ. (Eng. Technol. Ed.) 2023, 23, 1–9. [Google Scholar]
  12. Ning, D.; Zheng, S. An object detection algorithm based on decision-level fusion of visible and infrared images. Infrared Technol. 2023, 45, 282–291. [Google Scholar]
  13. Ding, Q.; Qi, H.; Zhao, J.; Li, J. Research on infrared and visible image fusion algorithm based on target enhancement. Laser Infrared 2023, 53, 457–463. [Google Scholar]
  14. Luo, W.; Liu, M.; Li, L.; Wang, C. Research on region matching of infrared and visible images based on vision detection. Laser J. 2023, 44, 186–190. [Google Scholar]
  15. Li, Y.; Wang, Y.; Yan, Y.; Hu, M.; Liu, B.; Chen, P. Infrared and visible images fusion from different views based on saliency detection. Laser Infrared 2021, 51, 465–470. [Google Scholar]
  16. Zhu, Y.; Gao, L. Infrared and visible image fusion method based on compound decomposition and intuitionistic fuzzy set. J. Northwestern Polytech. Univ. 2021, 39, 930–936. [Google Scholar] [CrossRef]
  17. Han, L.; Yao, J.; Wang, K. Visible and infrared image fusion by preserving gradients and contours. J. Comput. Appl. 2023. [Google Scholar] [CrossRef]
  18. Wang, T.; Luo, X.; Zhang, Z. Infrared and visible image fusion based on self-attention learning. Infrared Technol. 2023, 45, 171–177. [Google Scholar]
  19. Ren, Y.; Zhang, J. Infrared and visible image fusion based on NSST multi-scale entropy. J. Ordnance Equip. Eng. 2022, 43, 278–285. [Google Scholar]
  20. Sun, Y.; Wang, R.; Zhang, Q.; Lin, R. A cross-modality person re-identification method for visible-infrared images. J. Beijing Univ. Aeronaut. Astronaut. 2022. [Google Scholar] [CrossRef]
  21. Huang, Y.; Mei, L.; Wang, Y.; He, P.; Lian, B.; Wang, Y. Research on UAV detection based on infrared and visible image fusion. Comput. Knowl. Technol. 2022, 18, 1–8. [Google Scholar]
  22. Hao, Y.; Cao, Z.; Bai, F.; Sun, H.; Wang, X.; Qin, J. Research on infrared visible image fusion and target recognition algorithm based on region of interest mask convolution neural network. Acta Photonica Sin. 2021, 50, 84–98. [Google Scholar]
  23. Shen, Y.; Chen, X.; Yuan, Y.; Wang, L.; Zhang, H. Infrared and visible image fusion based on significant matrix and neural network. Laser Optoelectron. Prog. 2020, 57, 76–86. [Google Scholar]
  24. Zhao, G.; Fu, Y.; Chen, Y. A method for tracking object in infrared and visible image based on multiple features. Acta Armamentarii 2011, 32, 445–451. [Google Scholar]
  25. Tang, C.; Ling, Y.; Yang, H.; Yang, X.; Tong, W. Decision-level fusion tracking for infrared and visible spectra based on deep learning. Laser Optoelectron. Prog. 2019, 56, 217–224. [Google Scholar]
  26. Wang, F.; Song, Y.; Zhao, Y.; Yang, X.; Zhang, Z.S. IR saliency detection based on a GCF-SB visual attention framework. Aerosp. Control. Appl. 2020, 46, 28–36. [Google Scholar]
  27. Yang, J.; Li, Z. Infrared dim small target detection algorithm based on Bayesian estimation. Foreign Electron. Meas. Technol. 2021, 40, 19–23. [Google Scholar]
  28. Shen, Y.; Jin, T.; Dan, J. Semi-supervised infrared image object detection algorithm based on key points. Laser Optoelectron. Prog. 2023, 59, 1–18. [Google Scholar]
  29. Lan, Y.; Yang, L. Application research of infrared image target tracking in intelligent network vehicle. Laser J. 2019, 40, 60–64. [Google Scholar]
  30. Miao, X.; Wang, C. Single Frame infrared (IR) dim small target detection based on improved sobel operator. Opto-Electron. Eng. 2016, 43, 119–125. [Google Scholar]
Figure 1. The diagram of IOU.
Figure 2. Multi source information fusion and IOU judgment.
Figure 3. Naive Bayesian network structure.
Figure 4. Detection results based on SR algorithm. (a) Original vehicle image. (b) Restored image. (c) Extract high frequencies based on SR algorithm. (d) Thresholding processing. (e) Divided the vehicle area Mask part.
Figure 5. Detection results based on EdgeBox algorithm. (a) Original vehicle image. (b) Restored image. (c) Extract based on EdgeBox algorithm. (d) Thresholding processing.
Figure 6. Detection results by combining SR and EdgeBox algorithms (image blurring). (a) Remove low-frequency areas. (b) Obtain the final target area.
Figure 7. Detection results based on SR algorithm. (a) Original vehicle image. (b) Extract high frequencies based on SR algorithm. (c) Thresholding processing. (d) Divided the vehicle area Mask part.
Figure 8. Detection results based on EdgeBox algorithm. (a) Original vehicle image. (b) Extract based on EdgeBox algorithm. (c) Thresholding processing.
Figure 9. Detection results by combining SR and EdgeBox algorithms. (a) Remove low-frequency areas. (b) Obtain the final target area.
Figure 10. Test results of complete vehicle image. (a) Original image of complete vehicle image. (b) SR algorithm detection results. (c) Structural area detection results. (d) Final detection results.
Figure 11. Test results of incomplete vehicle image. (a) Original image of incomplete vehicle image. (b) SR algorithm detection results. (c) Structural area detection results. (d) Final detection results.
Figure 12. Comparison of segmentation results for complete vehicle image. (a) Complete vehicle image. (b) GrabCut algorithm. (c) Maximum inter class variance algorithm. (d) The method proposed in this paper. (e) Extracting targets.
Figure 13. Comparison of segmentation results for incomplete vehicle image. (a) Incomplete vehicle image. (b) GrabCut algorithm. (c) Maximum inter class variance algorithm. (d) The method proposed in this paper. (e) Extracting targets.
Figure 14. Feature extraction results of complete vehicle image. (a) Infrared image of complete vehicle image. (b) Infrared image containing Gaussian noise. (c) 3 × 3 filter. (d) Infrared image segmentation. (e) Box selection of infrared target. (f) Extraction of infrared target.
Figure 15. Feature extraction results of incomplete vehicle image. (a) Infrared image of incomplete vehicle image. (b) Infrared image containing Gaussian noise. (c) 3 × 3 filter. (d) Infrared image segmentation. (e) Box selection of infrared target. (f) Extraction of infrared target.
Figure 16. Fusion results of visible light image and infrared image. (a) Fusion results when the vehicle image is complete. (b) Fusion results when the vehicle image is incomplete.
Figure 17. The change curve of vehicle target recognition rate at different distances.
Table 1. Recognition probability of vehicle target.

Method                                        Vehicle Image Complete    Vehicle Image Incomplete
Recognition probability in this paper         0.77                      0.74
Recognition probability of Reference [24]     0.68                      0.65