1. Introduction
Automatic apple picking in natural environments can reduce the intensity of heavy manual labor and is an inevitable choice for modern agriculture [1]. Natural light in northwest China is strong, and visible light images collected in the natural environment are vulnerable to changing illumination and complex backgrounds, so recognition lacks robustness [2]. In the complex environment of orchard operations, the most promising direction in vision research on picking robots is heterogeneous image fusion (IF) between time-of-flight (ToF) images and visible light images. The collected images have a variety of attributes, including light invariance, spatial hierarchy, infrared perception, and reliability of discrimination data [2]. The ToF image is generated indirectly from depth information; it reflects the near–far relationship and infrared reflection characteristics of the objects in the scene and is not affected by changes in light [2]. Image fusion combines different source images into a new interpretation of the scene that cannot be obtained from a single sensor [2,3]. How to fuse, with high quality, ToF and visible light images that differ in wavelength range and imaging mechanism is currently a topic of great interest in image fusion research.
A non-subsampled shearlet transform (NSST) is a multi-scale, multi-directional, translation-invariant transform domain image decomposition method that is widely used in image fusion [4]. An NSST avoids the down-sampling operation and offers translation invariance, simple operation, and low time complexity [5]. Compared with transforms such as the discrete wavelet transform (DWT), stationary wavelet transform (SWT), discrete cosine transform (DCT), curvelet transform, and contourlet transform, an NSST is effective at capturing edges and contours. Deep learning methods rely on large numbers of network layers, which can lead to low efficiency and high cost. The advantage of an NSST is that it can fully fuse the source image information, and the fused image has a good correlation coefficient and information entropy. This makes it well suited to the natural orchard environment, where the image background is complex and contour and texture information need to be fused at the same time.
Related works are summarized as follows. A pulse coupled neural network (PCNN) is a neural network model established by simulating the activities of visual nerve cells in the cerebral cortex. Similar pattern features are classified into categories based on the principles of similarity clustering and capture characteristics [6]. In terms of image fusion in a transform domain, Cheng et al. used an adaptive dual-channel pulse coupled neural network with triple connection strength in the local non-subsampled shearlet transform domain to address the spectral difference between infrared and visible light [7]. Panigrahy et al. proposed a new medical image fusion method in the non-subsampled shearlet transform domain based on a weighted parameter adaptive dual-channel PCNN [8]. In terms of image fusion with saliency attention models, Liu et al. proposed a saliency detection model that combines a global saliency map with a local saliency map [9]. Yang et al. designed a new fuzzy logic rule based on global saliency measurements to fuse the details extracted from panchromatic images with high spatial resolution and multispectral images with low spatial resolution [10]. Li et al. used a segmentation-driven low-rank matrix recovery model to detect the saliency of each individual image in an image set and highlight the regions with sparse features in each image [11]. In terms of the optimization of image fusion parameters, Zhu et al. tuned PCNN parameters with an improved quantum-behaved particle swarm optimization and applied the model to infrared and visible image fusion [12]. Huang et al. used a non-subsampled contourlet transform (NSCT) to independently decompose the intensity, hue, and saturation of the image, a PCNN to fuse the high-frequency sub-band images and low-frequency images, and a hybrid leapfrog algorithm to optimize the PCNN parameters [13]. Dharini et al. proposed a nature-inspired optimal feature selection method using ant colony optimization to reduce the complexity of the PCNN fusion of infrared and visible images [14]. In research on overexposure problems related to ongoing climate-change-driven environmental changes over mountainous areas, Muhuri et al. studied polarization fraction variation using temporal RADARSAT-2 C-band full-polarimetric SAR data [15]. Raskar et al. introduced a novel technique that allows a user to interact with projected information and to update it [16].
A PCNN classifies similar pattern features into categories based on the principles of similarity aggregation and capture characteristics. The segmentation combination has the advantages of the grayscale aggregation lighting mechanism and the same grayscale attribute priority lighting. This is consistent with the basic idea of cluster analysis. Qiu et al. proposed a new density peaks-based clustering method, called clustering with local density peaks-based minimum spanning tree [17]. Huang et al. proposed new adaptive spatial regularization for the representation coefficients to improve the robustness of the model to noise [18]. Huang et al. proposed ultra-scalable spectral clustering and ultra-scalable ensemble clustering methods [19].
Although scholars have studied the optimization and improvement of PCNN parameters, there are still cases of pixel artifacts, region blurring and unclear edges due to ignoring the impact of image changes and fluctuations on the results during the ignition process.
This paper introduces the concept of entropy [20] from information theory and proposes a PCNN model guided by a saliency mechanism (SMPCNN). The ToF low-frequency component obtained after multiple lighting segmentation using a PCNN is simplified to a first-order Markov situation, and the significance function is defined as the first-order Markov mutual information. On this basis, a transform domain image fusion method based on the saliency-guided PCNN model (NSST-SMPCNN) is proposed to fuse ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment.
We summarize our main contributions below.
First, we aim to solve the following existing problems:
The traditional method of space domain fusion is to create a fusion model in the image gray space, which has the disadvantage that it is not easy to find the source image texture and boundary features.
A PCNN model has the defects of empirically set parameters, nonadaptive termination, and a tendency toward over-segmentation. In the ignition process, it ignores the impact of image change fluctuation on the results, resulting in pixel artifacts, area blurring, and unclear edges.
The differences in imaging mechanisms between ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment lead to the problem of low fusion quality.
Second, the innovations and novelties of this paper are as follows:
A PCNN model guided by a saliency mechanism is proposed and applied to the fusion of ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment.
The ToF low-frequency component is simplified after multiple lighting segmentation using a PCNN into a first-order Markov situation. The significance function is defined as first-order Markov mutual information.
The significance function is used as the termination condition of a PCNN model iteration, and Kullback–Leibler (KL) divergence is used to measure the dynamic threshold amplification coefficient of the PCNN model.
A new momentum-driven multi-objective artificial bee colony algorithm is proposed to optimize the parameters of link channel feedback, link strength, and dynamic threshold attenuation factor. The momentum update strategy of employing bees and observing bees is used. The grid density construction is used to ensure that the optimal solution distribution is not too dense. The absolute value of the difference between the grid index values of the same dimension of the nondominated solution is used as the deletion selection probability of the nondominated solution to construct the optimal solution set. Cross entropy (CE) and mutual information (MI), two image fusion quality evaluation functions, are selected as multi-objective fitness functions.
The low-frequency components of ToF and color image after multiple lighting segmentation using a PCNN are fused using the weighted average rule, and the high-frequency components are fused using improved bilateral filters.
Third, the advantage of our work is as follows: the proposed NSST-SMPCNN method combines the saliency mechanism, the saliency function, and the PCNN clustering segmentation mechanism. It has the advantages of a grayscale clustering lighting mechanism and the same grayscale attribute first lighting, which makes it suitable for heterogeneous image fusion in complex orchard environments in Gansu.
The paper structure is summarized as follows:
Section 1 contains the introduction, a description of the related works, and the highlights and contributions of this paper. Section 2 introduces the basic concepts of an NSST and a PCNN and defines the proposed significance function. In Section 3, a PCNN model guided by a saliency mechanism is proposed. A PCNN transform domain image fusion method guided by a saliency mechanism is then constructed in Section 4. Lastly, the final sections describe the experiment and present the conclusions.
3. PCNN Model Guided by Saliency Mechanism
PCNN region segmentation and a saliency mechanism can locate the most interesting object region in the image well. Combining the saliency mechanism, saliency function and PCNN clustering segmentation mechanism, a PCNN model guided by a saliency mechanism is proposed, which has the advantages of a grayscale clustering lighting mechanism and the same grayscale attribute lighting priority, and is suitable for heterogeneous image fusion in complex orchard environments in Gansu.
The PCNN model has certain shortcomings, including that the parameters are limited by manual experience settings and cannot be terminated adaptively, and ignoring the impact of image changes and fluctuations on the results during the ignition process results in pixel artifacts, area blurring, and unclear edges. The iteration termination conditions and dynamic threshold amplification coefficients and the feedback items of the link channel αL, link strength β, and dynamic threshold attenuation factor αθ are improved adaptively. A new momentum-driven multi-objective artificial bee colony algorithm (MMOABC) is used for parameter optimization and is applied to the proposed PCNN model guided by a saliency mechanism. The improved SMPCNN model has the characteristics of enhancing the same type of pulse connection, reducing the difficulty of parameter integration, and improving the performance of image segmentation.
3.1. Adaptive Iteration Termination Conditions
The traditional PCNN model has the defects of nonadaptive termination and over-segmentation. The authors of [23] used the maximum information entropy as the termination condition, but over-segmentation often occurs when the entropy is at its maximum, and the background with the same gray value will be mistaken for the target area and segmented together.
In this paper, the significance function is used as the criterion for model iteration termination, which is expressed as Equation (11). For a low-frequency ignition segmentation map, the greater the significance of the first-order Markov mutual information, the better the regional consistency.
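The termination idea can be illustrated with a minimal sketch. Equation (11) is not reproduced here, so the code below assumes the significance is the plain mutual information between two consecutive ignition segmentation maps (a first-order dependency), and the stopping rule `should_stop` and its tolerance are hypothetical choices made only for illustration.

```python
import math
from collections import Counter


def mutual_information(prev_map, curr_map):
    """Mutual information in bits between two equally sized label maps.

    Assumption: this stands in for the paper's significance function,
    whose exact form (Equation (11)) is not shown in this section.
    """
    a = [v for row in prev_map for v in row]
    b = [v for row in curr_map for v in row]
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_a[x] * count_b[y])
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi


def should_stop(history, tol=1e-6):
    """Hypothetical rule: stop once the significance no longer increases."""
    return len(history) >= 2 and history[-1] <= history[-2] + tol
```

In use, the significance of each new segmentation map against the previous one would be appended to `history` after every PCNN iteration, and the loop would end the first time `should_stop(history)` returns true.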
3.2. Adaptive Dynamic Threshold Amplification Coefficient
In the ToF image, the fruit target often appears as a region with high gray values that follow a normal distribution. Two ignition segmentation images are used to determine the PCNN dynamic threshold amplification coefficient, as expressed in Equation (12). The probability distributions corresponding to the two ignition segmentation states are calculated, together with the KL divergence between them. The KL divergence measures the similarity between the probability distributions of the two ignition segmentation maps. The closer the two distributions are, the smaller the dynamic threshold amplification coefficient becomes, which enables the PCNN model to ignite when the target region tends to be stable during continuous iteration.
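As a rough illustration of the similarity measure, the sketch below estimates the label distribution of two ignition segmentation maps and computes their KL divergence. The helper names, the small smoothing constant `eps`, and the use of label histograms are assumptions; how the divergence is mapped to the amplification coefficient is given by Equation (12) and is not reproduced here.

```python
import math
from collections import Counter


def distribution(seg_map, levels, eps=1e-12):
    """Smoothed probability of each label in a segmentation map.

    eps is an assumed smoothing term that keeps every probability positive
    so the KL divergence stays finite.
    """
    flat = [v for row in seg_map for v in row]
    counts = Counter(flat)
    total = len(flat) + eps * len(levels)
    return [(counts.get(level, 0) + eps) / total for level in levels]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A divergence near zero indicates that two successive segmentation maps have almost the same label distribution, which is the situation in which the text says the amplification coefficient should be small.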
3.3. Parameter Optimization of Momentum Driven Multi-Objective Artificial Bee Colony Algorithm
An artificial bee colony (ABC) algorithm [24] is a swarm intelligence optimization algorithm that simulates the foraging behavior of bee swarms. It has the advantages of strong global optimization ability, few parameters, high accuracy, and strong robustness. However, its search strategy is simple and random, which can cause premature convergence, convergence stagnation, and other problems. To accelerate the convergence of the artificial bee colony algorithm, the concept of momentum [25,26] from deep learning is introduced, and a new momentum-driven multi-objective artificial bee colony algorithm is proposed to optimize three parameters: the feedback from the link channel αL, the link strength β, and the dynamic threshold attenuation factor αθ.
3.3.1. Initial Population
The three parameters, including feedback from the link channel αL, link strength β, and dynamic threshold attenuation factor αθ, are used as the initial population of the momentum-driven multi-objective artificial bee colony algorithm. NP food sources were randomly generated according to Formula (13).
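A minimal sketch of the initialization step follows. Formula (13) is not reproduced in this section, so the code assumes the usual ABC rule of uniform sampling between per-parameter bounds. The bounds in the usage note are hypothetical and not taken from the paper.

```python
import random


def init_food_sources(num_sources, lower, upper, rng=None):
    """Draw num_sources parameter vectors uniformly inside [lower, upper].

    Each vector holds one candidate (alpha_L, beta, alpha_theta) triple.
    Assumption: uniform sampling stands in for Formula (13).
    """
    rng = rng or random.Random()
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(num_sources)
    ]
```

For example, `init_food_sources(20, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])` would produce 20 candidate triples in the unit cube; the actual search ranges would have to be chosen for the PCNN at hand.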
3.3.2. Hiring Bees Momentum Updating Strategy
NP food sources were randomly generated. During a food update evolution, a randomly selected food source was attached to a hired bee (an employed bee in standard ABC terminology) in the bee colony. In the d-dimensional space, the randomly selected jth dimension component of each food source in the food source information space database was evolved through the hired bee momentum update strategy shown in Equations (14) and (15) to obtain a new food source. Equations (14) and (15) involve the update step size of the previous update evolution, the update step size obtained after the current momentum update evolution, and the momentum, whose value is 0.9.
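The momentum update can be sketched as below. Equations (14) and (15) are not reproduced in this section, so the code assumes the classical ABC neighbor search with a momentum term added to the step: the new step is 0.9 times the previous step plus a random multiple of the difference to a randomly chosen neighbor. The function name and the way the previous steps are stored in `velocity` are hypothetical.

```python
import random


def employed_bee_step(foods, i, velocity, mu=0.9, rng=None):
    """One momentum update of a random dimension j of food source i.

    foods:    list of parameter vectors (at least two sources)
    velocity: previous update step for every source and dimension
    mu:       momentum, 0.9 as stated in the text
    Returns the candidate vector, its updated step vector, and j.
    """
    rng = rng or random.Random()
    j = rng.randrange(len(foods[i]))
    k = rng.choice([idx for idx in range(len(foods)) if idx != i])
    phi = rng.uniform(-1.0, 1.0)
    # Assumed form: previous step scaled by mu plus the usual ABC perturbation.
    step = mu * velocity[i][j] + phi * (foods[i][j] - foods[k][j])
    candidate = list(foods[i])
    candidate[j] += step
    new_velocity = list(velocity[i])
    new_velocity[j] = step
    return candidate, new_velocity, j
```

In a full loop, the candidate would replace food source `i` only if it is not worse under the fitness comparison, which is the greedy selection used by standard ABC.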
3.3.3. Observation Bees Nesterov Momentum Updating Strategy
In a food update evolution, the selection probability of the observation bees (onlooker bees in standard ABC terminology) was calculated according to Formula (16), and a randomly selected food source was attached to an observation bee in the bee colony. In the d-dimensional space, the randomly selected jth dimension component of each food source in the food source information space database was evolved through the observation bee Nesterov momentum updating strategy shown in Formulas (17) and (18) to obtain a new food source. Formulas (17) and (18) involve the update step size of the previous update evolution, the update step size obtained after the current Nesterov momentum update evolution, and the momentum, whose value is 0.9.
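The following sketch shows one way such a Nesterov-style update could look. Formulas (16) to (18) are not reproduced in this section, so two assumptions are made: the selection probability is the fitness-proportional rule of standard ABC, and the Nesterov variant evaluates the neighbor difference at a look-ahead position (current value plus 0.9 times the previous step). All function names are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Fitness-proportional selection (assumed stand-in for Formula (16))."""
    total = sum(fitness)
    return [f / total for f in fitness]


def onlooker_nesterov_step(foods, i, velocity, mu=0.9, rng=None):
    """One Nesterov-style momentum update of a random dimension j of source i."""
    rng = rng or random.Random()
    j = rng.randrange(len(foods[i]))
    k = rng.choice([idx for idx in range(len(foods)) if idx != i])
    phi = rng.uniform(-1.0, 1.0)
    # Look ahead along the previous step before measuring the neighbor gap.
    look_ahead = foods[i][j] + mu * velocity[i][j]
    step = mu * velocity[i][j] + phi * (look_ahead - foods[k][j])
    candidate = list(foods[i])
    candidate[j] += step
    new_velocity = list(velocity[i])
    new_velocity[j] = step
    return candidate, new_velocity, j
```

The only difference from a plain momentum step is where the perturbation is measured: at the look-ahead point rather than at the current position.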
3.3.4. Pareto Grid Density Construction Method
In multi-objective optimization problems, individuals are judged by dominance and density information. In this paper, a grid density construction method is used to ensure that the distribution of optimal solutions in the Pareto optimal solution set (also known as the Pareto set) is not too dense. The grid is a dynamic partition of the range (−inf, +inf) into equal intervals. Here, nGrid is a variable that represents the number of divided grids, and inf denotes a large finite number rather than true infinity.
The maximum and minimum values of each dimension of the median value of the nondominated solution were determined. The predefined nGrid was used to divide the current interval, which was divided into nGrid + 1. The minimum interval starts from negative infinity −inf, and the maximum interval ends at positive infinity +inf, to prevent the nondominated solutions from crossing the boundary and to make the nondominated solutions fall in the grid. The formula for solving the grid index value is shown in (19). The value low_i represents the minimum boundary value of the grid, and Target represents the number of objective functions.
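The grid construction can be sketched as follows. Formula (19) is not reproduced in this section, so the code assumes the common scheme in which each objective range is split by nGrid + 1 equally spaced boundaries and padded with unbounded intervals on both sides. The function names are hypothetical, and `math.inf` is used for the padding where the text uses a large finite number.

```python
import math
from bisect import bisect_right


def build_grid(costs, n_grid):
    """Per-objective boundary lists: [-inf, lo, ..., hi, +inf].

    costs is a list of objective vectors of the nondominated solutions.
    """
    grid = []
    for d in range(len(costs[0])):
        lo = min(c[d] for c in costs)
        hi = max(c[d] for c in costs)
        edges = [lo + t * (hi - lo) / n_grid for t in range(n_grid + 1)]
        grid.append([-math.inf] + edges + [math.inf])
    return grid


def grid_index(cost, grid):
    """Index of the interval that contains each objective value."""
    return [bisect_right(edges, value) - 1 for value, edges in zip(cost, grid)]
```

Because the outermost intervals are unbounded, every solution receives a valid index even if its objective values later move beyond the range used to build the grid.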
3.3.5. Pareto Optimal Solution Set Construction Method
Constructing the optimal solution set requires randomly deleting redundant nondominated solutions with a certain probability. The deletion selection probability is built from the absolute value of the difference between the grid index values of the nondominated solutions in the same dimension. The formulas are shown in (20) and (21). The larger the value given by Formula (20) for a nondominated solution, the harder that solution is to delete. The advantage of this approach is that it reduces the preference for a particular optimization objective introduced by the nondominated solution interval, so that all optimization objectives are treated uniformly and each solution receives a relatively fair chance of deletion.
3.3.6. Calculation Method of Multi-Objective Fitness
To solve the problem of the diversity of image fusion quality evaluation functions, two image fusion quality evaluation functions, cross entropy (CE) and mutual information (MI), are selected to form a multi-objective optimization problem for two objectives. The formula is shown in (22).
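A minimal sketch of the two objectives is given below. Formula (22) is not reproduced in this section, so the code assumes the definitions commonly used in image fusion evaluation: CE is the average histogram cross entropy between each source image and the fused image (lower is better), and MI is the sum of the mutual information between each source image and the fused image (higher is better). The function names, the smoothing term `eps`, and the `dominates` helper are hypothetical.

```python
import math
from collections import Counter


def _flat(img):
    return [v for row in img for v in row]


def cross_entropy(src, fused, eps=1e-12):
    """Histogram cross entropy of a source image relative to the fused image."""
    s, f = _flat(src), _flat(fused)
    hs, hf = Counter(s), Counter(f)
    return sum(
        (c / len(s)) * math.log2((c / len(s)) / (hf.get(level, 0) / len(f) + eps))
        for level, c in hs.items()
    )


def mutual_information(src, fused):
    """Mutual information in bits between a source image and the fused image."""
    s, f = _flat(src), _flat(fused)
    n = len(s)
    joint, cs, cf = Counter(zip(s, f)), Counter(s), Counter(f)
    return sum((c / n) * math.log2(c * n / (cs[x] * cf[y])) for (x, y), c in joint.items())


def fitness(tof, visible, fused):
    """Return the (CE, MI) pair used as the two objectives."""
    ce = 0.5 * (cross_entropy(tof, fused) + cross_entropy(visible, fused))
    mi = mutual_information(tof, fused) + mutual_information(visible, fused)
    return ce, mi


def dominates(a, b):
    """a dominates b if CE is no larger and MI no smaller, one strictly."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])
```

Each candidate parameter vector would be scored by running the fusion once and calling `fitness` on the result, and `dominates` would then decide which candidates enter the nondominated set.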
3.4. PCNN Model Structure Guided by Saliency Mechanism
The model structure is shown in Figure 1.
6. Conclusions
The traditional method of spatial domain fusion is to create a fusion model in the image gray space, which has the disadvantage of not finding the texture and boundary characteristics of the source image easily. A PCNN model has the defects of parameter experience setting, nonadaptive termination, and easy over-segmentation. This paper proposes a PCNN model guided by the saliency mechanism and applies it to the fusion of ToF and visible light heterogeneous images collected by a binocular acquisition system in an orchard environment. The iteration termination conditions and dynamic threshold amplification coefficients Vθ, the feedback items of the link channel αL, link strength β, and dynamic threshold attenuation factor αθ are improved adaptively. A new momentum-driven multi-objective artificial bee colony algorithm (MMOABC) is used for parameter optimization. The proposed NSST-SMPCNN method combines the saliency mechanism, saliency function and PCNN clustering segmentation mechanism, and has the advantages of a grayscale clustering lighting mechanism and the same grayscale attribute first lighting, which is suitable for heterogeneous image fusion in complex orchard environments in Gansu. The data results show that the NSST-SMPCNN algorithm described in this paper has the best fusion effect on the ToF confidence image and the corresponding visible light image collected in the natural environment. The values of nine indicators, including AG, ES, IE, SD, PSNR, SF, IC, MI, and SSI, indicated excellent performance.
However, some data test results in the public dataset still have the disadvantage of a poor fusion effect, which needs further improvement. In future work, it is necessary to introduce a deep learning convolutional neural network to further explore the algorithm structure to capture better image features and improve the fusion effect.