#### **1. Introduction**

Traditional plant classification relies mainly on manual identification, which is time-consuming, susceptible to subjective judgment, and of limited accuracy, falling far short of the requirements for rapid and accurate plant identification. Rapid and accurate identification of plants is therefore both challenging and meaningful. Plants play an irreplaceable role in human life, and plant recognition has been a challenging subject of study since early last century. In recent decades, many researchers in image processing and pattern recognition have paid extensive attention to plant recognition, using images of plant organs (e.g., leaf, flower, fruit, and bark).

Although images of flowers, fruits, and bark have been employed for plant recognition, they yield low recognition rates. These organs also impose practical limits; for instance, the flowering period is short and the texture of bark is unstable. Compared with flowers, fruits, and bark, leaf images can be collected easily throughout the year, and leaf shape and texture are stable. The leaf is therefore used as one of the most important cues for identifying plants, and most image-based plant recognition methods rely on leaf images. In other words, plant species are recognized by leaf recognition.

In pattern recognition, shape, texture, and color features are widely used for classification. Soumyabrata et al. [1] proposed an improved texture-based classification method that integrates color and texture information, and compared and evaluated different color components and other parameters. Kristin et al. [2] introduced pattern recognition and computer vision as well as the application of texture features to pattern recognition. However, most leaves have small inter-class color differences, and some leaves have large intra-class color differences. Because illumination may be uneven under natural conditions, color features can degrade recognition results. Therefore, the proposed method uses shape and texture features, which are more robust.

Both shape and texture features are used for leaf recognition. In 2012, Kumar et al. designed the mobile application Leafsnap, where histograms of curvature over scale (HoCS) [3] served as a single (shape) feature for plant identification. Other shape features have also been used for leaf recognition, such as centroid-contour distance (CCD) [4], aspect ratio [5], Hu invariant moments [6], polar Fourier transform (PFT) [6], inner distance shape context (IDSC) [7], sinuosity coefficients [8], and multiscale region transform (MReT) [9]. However, some leaves from different kinds of plants are very similar; their shapes cannot even be differentiated by the naked eye. Hence, it is reasonable to use both shape and texture features for leaf recognition. The most commonly used texture features include entropy sequence (EnS) [10], histogram of oriented gradients (HOG) [11], Zernike moments [12], scale-invariant feature transform (SIFT) [13,14], gray-level co-occurrence matrix (GLCM) [15], and local binary patterns (LBP) [15]. Fu et al. [16] proposed a hybrid framework for plant recognition with complicated backgrounds; they extracted block LBP operators as texture features and calculated Fourier descriptors as shape features. Saleem et al. [17] combined 11 shape features, 7 statistical features, and 5 vein features for leaf recognition. Chaki et al. [18] used a Gabor filter and the GLCM to model texture and used a set of curvelet transform coefficients together with invariant moments to capture shape. Shao [19] proposed a new manifold learning method, namely supervised global-locality preserving projection (SGLP), for plant leaf recognition. Chaki et al. [20] proposed a novel approach that combines fuzzy-color and edge-texture histograms to recognize fragmented leaf images. Features based on Gabor filters [21,22], fractal dimension [23], locality projection analysis (SLPA) [24], kernel principal component analysis (KPCA) [25], bag of words (BOW) [22,26], and convolutional neural networks (CNN) [27] have also been used for leaf recognition.

In this paper, a new leaf feature called BOF\_DP, based on the dual-output pulse-coupled neural network (DPCNN) and BOF, is proposed, and an improved shape context called BOF\_SC is also used in our plant image recognition system. The rest of the paper is organized as follows. Section 2 briefly introduces the related basic theories, including DPCNN and BOF. Section 3 introduces the theories related to feature extraction. Section 4 details our proposed recognition method. Section 5 presents comparative experimental results on several representative leaf image datasets.

#### **2. Theory for Plant Recognition**

#### *2.1. Dual-Output Pulse-Coupled Neural Network*

DPCNN was proposed by Li in 2012 for geometry-invariant texture retrieval [28]. The structure of the DPCNN model is shown in Figure 1.

**Figure 1.** Structure of DPCNN.

The mathematical expressions of the DPCNN model are as follows:

$$F\_{ij}[n] = fF\_{ij}[n-1] + S\_{ij}[n]\left(V\_F \sum\_{k,l} M\_{ijkl} Y\_{kl}^{U}[n-1] + \gamma\right) \tag{1}$$

$$Y\_{ij}^{F}[n] = \begin{cases} 1, & F\_{ij}[n] > T\_{ij}[n] \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$U\_{ij}[n] = F\_{ij}[n] + V\_U S\_{ij}[n] \sum\_{k,l} W\_{ijkl} Y\_{kl}^{F}[n] \tag{3}$$

$$Y\_{ij}^{U}[n] = \begin{cases} 1, & U\_{ij}[n] > T\_{ij}[n] \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

$$T\_{ij}[n+1] = gT\_{ij}[n] + V\_E Y\_{ij}^{U}[n] \tag{5}$$

$$S\_{ij}[n+1] = \left(1 - Y\_{ij}^{U}[n] + Y\_{ij}^{F}[n]\right)S\_{ij}[n] + \left(Y\_{ij}^{U}[n] - Y\_{ij}^{F}[n]\right)A\_{ij} \tag{6}$$

where *S* is the external stimulus, which changes depending on the current outputs *Y<sup>F</sup>* and *Y<sup>U</sup>*; *VE*, *VU*, *VF*, *f*, and *g* are fixed constants between 0 and 1; *W* and *M* are the connection weights through which the current neuron communicates with its neighbors; *Y<sup>F</sup>* is the feeding output; and *Y<sup>U</sup>* is the compensating output.

Each neuron of DPCNN is an active neuron, which can be fired by the feedback input or the internal activity of the neuron to generate an output pulse. First, the feedback input (*Fij*) changes under the influence of the external stimulus and the compensating outputs from neighboring neurons. Once the feedback input (*Fij*) exceeds the threshold, the neuron generates a feeding output pulse. Then, the feeding outputs from the neighboring neurons, the feedback input, and the external stimulus work together to change the internal activity (*Uij*). Once the neuron's internal activity exceeds its threshold, a compensating output pulse is generated. Finally, the threshold (*Tij*) and external stimulus (*Sij*) are updated.

The pulse sequence generated by a pulse-coupled neural network (PCNN) can represent image edge and texture information and can thus serve as an effective image feature. However, PCNN still has limitations in feature extraction; for example, there is only one pulse generator in the entire neuron model, and the excitation of neurons lacks a compensation mechanism. DPCNN improves on the PCNN model. Compared with PCNN, DPCNN has the following advantages: (1) each neuron of DPCNN has two chances to be excited; (2) DPCNN can adaptively change the external stimulus of each neuron; and (3) the local stimuli received from peripheral neurons are modulated by the input stimulus. In addition, DPCNN offers translation, rotation, and scale invariance as well as robustness.

When DPCNN is used for feature extraction, the input image must be a grayscale image with pixel intensities between 0 and 1. In our tests, the parameters were the same as in Ref. [28], except for the number of iterations. The output of each iteration is a binary image, called a pulse image. The entropy of each pulse image is used as a feature, so after *n* iterations the feature EnS, a vector of length *n*, is obtained.
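To make this procedure concrete, the following is a minimal sketch of EnS extraction with DPCNN, implementing Equations (1)–(6) directly. The 3 × 3 weight kernel, the initial values, the parameter values, the choice of *A* as a local average of the stimulus, and the use of the compensating output for the entropy are illustrative assumptions, not the exact configuration of Ref. [28].

```python
import numpy as np
from scipy.signal import convolve2d

def dpcnn_ens(img, n_iter=40, f=0.8, g=0.7, V_F=0.1, V_U=1.0, V_E=10.0, gamma=1.0):
    """Entropy sequence (EnS) of a grayscale image in [0, 1] via DPCNN.

    Parameter values and kernels are illustrative assumptions; Ref. [28]
    specifies its own settings.
    """
    K = np.array([[0.5, 1.0, 0.5],       # 3x3 synaptic kernel shared by M and W (assumed)
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])
    S = img.astype(float).copy()         # external stimulus, adapted by Eq. (6)
    F = np.zeros_like(S)                 # feedback input
    T = np.ones_like(S)                  # dynamic threshold (initial value assumed)
    Y_U = np.zeros_like(S)               # compensating output
    A = convolve2d(img, K / K.sum(), mode='same')  # local average stimulus (assumed)
    ens = np.zeros(n_iter)
    for n in range(n_iter):
        F = f * F + S * (V_F * convolve2d(Y_U, K, mode='same') + gamma)  # Eq. (1)
        Y_F = (F > T).astype(float)                                      # Eq. (2)
        U = F + V_U * S * convolve2d(Y_F, K, mode='same')                # Eq. (3)
        Y_U = (U > T).astype(float)                                      # Eq. (4)
        T = g * T + V_E * Y_U                                            # Eq. (5)
        S = (1 - Y_U + Y_F) * S + (Y_U - Y_F) * A                        # Eq. (6)
        p1 = Y_U.mean()                  # fraction of firing pixels in the pulse image
        ens[n] = -sum(p * np.log2(p) for p in (p1, 1 - p1) if p > 0)     # binary entropy
    return ens
```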

#### *2.2. Bag of Features*

The BOF model represents an image as an orderless collection of local features and has been widely used in pattern recognition. Thanks to the efforts of many researchers, the BOF model combined with spatial pyramid matching (SPM) [29] and locality-constrained linear coding (LLC) [30] has performed well in many studies. The flow chart of the BOF classification model is shown in Figure 2.

**Figure 2.** BOF classification system.

LLC is a linear coding scheme with a locality constraint. The locality constraint makes the coding results more accurate, yields sparse codes, and improves the speed of training and classification. The mathematical expression of LLC is as follows:

$$\min\_{C} \sum\_{i=1}^{N} \|\mathbf{x}\_{i} - B\mathbf{c}\_{i}\|^{2} + \lambda \|\mathbf{d}\_{i} \odot \mathbf{c}\_{i}\|^{2} \tag{7}$$

$$\text{s.t. } \mathbf{1}^{T}\mathbf{c}\_{i} = 1, \ \forall i$$

where *X* = {*x*1, *x*2, ... , *xN*} is the set of feature descriptors obtained by partitioning the original image into blocks; *B* is the codebook; *ci* is the coding result of *xi*; ⊙ denotes element-wise multiplication; and *di* is derived from the Euclidean distances between *xi* and the codewords of *B*.
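As an illustration, the following is a minimal sketch of the approximated LLC coding of Ref. [30], which restricts each descriptor to its *k* nearest codewords and solves the constrained least-squares problem in closed form; the value of *k* and the regularization constant are assumptions.

```python
import numpy as np

def llc_code(x, B, k=5):
    """Approximated LLC code of one descriptor (after Wang et al. [30]).

    x: (D,) descriptor; B: (M, D) codebook, one codeword per row.
    Returns an (M,) code that is non-zero only on the k nearest codewords.
    """
    M, _ = B.shape
    dist = np.linalg.norm(B - x, axis=1)     # locality: distance to every codeword
    idx = np.argsort(dist)[:k]               # keep the k nearest codewords
    z = B[idx] - x                           # shift the local base to the origin
    C = z @ z.T                              # local covariance
    C += np.eye(k) * 1e-4 * np.trace(C)      # conditioning (value assumed)
    w = np.linalg.solve(C, np.ones(k))       # solve C w = 1
    w /= w.sum()                             # enforce the constraint 1^T c = 1
    code = np.zeros(M)
    code[idx] = w
    return code
```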

SPM is an algorithm for image matching, recognition, and classification using a spatial pyramid; it captures the spatial information of an image by statistically distributing image feature points over sub-regions at different resolutions. Generally, it has two steps: (1) the image is divided into increasingly fine sub-regions over several pyramid levels; and (2) the coded features falling in each sub-region are pooled, and the pooled results of all sub-regions are concatenated into the final feature, as sketched below.
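The pooling step can be sketched as follows, assuming LLC codes, known patch centers, and a 1 × 1 / 2 × 2 / 4 × 4 pyramid with max-pooling; the grid levels and the final L2 normalization are common choices rather than fixed requirements.

```python
import numpy as np

def spm_pool(codes, xy, img_shape, levels=(1, 2, 4)):
    """SPM max-pooling of patch codes (sketch; grid levels are assumptions).

    codes: (N, M) codes of N patches; xy: (N, 2) patch centers (row, col);
    img_shape: (H, W). Returns the concatenated pooled feature vector.
    """
    H, W = img_shape
    M = codes.shape[1]
    pooled = []
    for L in levels:                                       # e.g. 1x1, 2x2, 4x4 grids
        r = np.minimum((xy[:, 0] * L // H).astype(int), L - 1)
        c = np.minimum((xy[:, 1] * L // W).astype(int), L - 1)
        cell = r * L + c                                   # sub-region index per patch
        for j in range(L * L):
            m = cell == j
            # max-pool the codes falling in this sub-region (zeros if empty)
            pooled.append(codes[m].max(axis=0) if m.any() else np.zeros(M))
    feat = np.concatenate(pooled)
    return feat / (np.linalg.norm(feat) + 1e-12)           # L2-normalize (common choice)
```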
When BOF works, each image in the dataset is divided into many blocks, typically of size 8 × 8. To get a better effect, neighboring blocks are combined into a patch, and the collection of patches can be regarded as a bag of components. Generally, this collection is too large and many patches are similar, so it is reasonable to unify similar patches into a standard component. In practice, this operation is carried out in a feature space such as SIFT space or HOG space: each patch is expressed by the feature extracted from it, the collection of standard components computed by K-means (sketched below) is called the codebook, and each standard component is called a code. Each patch is then described over its neighborhood (the codes in the codebook) by a histogram, which is the function of LLC. Finally, the histogram is pooled by SPM, and a sparse, smooth feature is obtained.
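For the codebook step described above, a plain K-means sketch is given below; the codebook size and iteration count are assumptions, and in practice a library implementation would normally be used.

```python
import numpy as np

def build_codebook(X, n_codes=1024, n_iter=20, seed=0):
    """K-means codebook over patch descriptors X of shape (N, D), N >= n_codes.

    Codebook size and iteration count are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    B = X[rng.choice(len(X), size=n_codes, replace=False)].copy()  # random init
    for _ in range(n_iter):
        # squared distance of every descriptor to every codeword
        d2 = (X ** 2).sum(1)[:, None] - 2 * X @ B.T + (B ** 2).sum(1)[None, :]
        a = d2.argmin(1)                       # assign to the nearest codeword
        for j in range(n_codes):
            if np.any(a == j):
                B[j] = X[a == j].mean(0)       # recenter non-empty clusters
    return B
```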
