#### *4.2. Feature Fusion*

The feature extraction is introduced in Section 3. In this section, the two features are fused into a single feature vector. Suppose *F* and *T* are the BOF\_SC and BOF\_DP features, respectively. First, different weights α and β (subject to α + β = 1) are assigned to *F* and *T*, so the fused feature vector can be expressed as *FV* = [α*F*, β*T*]. The larger the weight, the greater the role of the corresponding feature in the fused vector. Because these weights strongly influence the final recognition result, α and β are usually determined through extensive experiments. As *F* and *T* are sparse matrices, *FV* is still a sparse matrix; it may be easy to classify, but it requires a large amount of memory. Thus, a direct linear discriminant analysis (LDA) algorithm [31] is used for dimensionality reduction, and the final dimensionality of the feature vector is 1000.
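As a brief sketch, the weighted fusion *FV* = [α*F*, β*T*] can be written as a weighted concatenation. The feature dimensions and weight values below are illustrative assumptions, not values from the paper, and the subsequent direct-LDA reduction to 1000 dimensions is omitted:

```python
import numpy as np

# Illustrative dimensions only; the actual BOF_SC/BOF_DP codebook
# sizes are not specified in this section.
F = np.random.rand(2048)   # BOF_SC feature of one leaf (assumed size)
T = np.random.rand(2048)   # BOF_DP feature of one leaf (assumed size)

alpha, beta = 0.6, 0.4     # example weights, subject to alpha + beta = 1
FV = np.concatenate([alpha * F, beta * T])  # fused vector FV = [aF, bT]

print(FV.shape)  # (4096,)
```

In practice α and β would be tuned experimentally, as the paper notes, before the fused vectors are reduced by direct LDA.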

#### *4.3. Classification*

There are many classifiers for leaf recognition, such as support vector machine (SVM) [10], probabilistic neural network (PNN) [5], K nearest neighbor (KNN) [32], and random forests [33]. The most commonly used is SVM, for its high accuracy and ease of use. Liblinear [34] and Libsvm [35] are two popular SVM tools for classification. Although Libsvm and Liblinear can achieve similar results in linear classification, Liblinear is much more efficient than Libsvm in both training and prediction. When the number of samples is large, Liblinear is significantly faster than Libsvm [36]. Thus, we use Liblinear rather than Libsvm. Given a set of training leaf features *Fvi* with labels *yi* ∈ [1, ... , *N*], where *N* is the number of leaf species, when Liblinear is used for leaf recognition, the problem can be defined as:

$$r\_{i} = \mathop{\arg\max}\_{n\in[1,\dots,N],\,n\neq y\_{i}} \boldsymbol{w}\_{n}^{T} Fv\_{i}\tag{8}$$

$$\min\_{\boldsymbol{w}\_1, \dots, \boldsymbol{w}\_N} \left\{ \sum\_{n=1}^N \|\boldsymbol{w}\_n\|^2 + c \sum\_i \max\left(0,\, 1 + \boldsymbol{w}\_{r\_i}^T Fv\_i - \boldsymbol{w}\_{y\_i}^T Fv\_i\right) \right\} \tag{9}$$

$$\hat{y} = \mathop{\arg\max}\_{n \in [1, \dots, N]} \boldsymbol{w}\_{n}^{T} Fv\_{i} \tag{10}$$

When Liblinear is trained, it learns a multi-class weight matrix whose rows are the class weight vectors *wn*. In Equation (8), *ri* denotes the incorrect class that receives the highest score for the *i*th training sample. In Equation (9), the first part is the regularization term of the linear model, and *c* is the penalty parameter that weights the hinge loss against the regularization. For the testing data, the predicted labels are given by Equation (10).
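A minimal sketch of the prediction rule in Equation (10), assuming the class weight vectors *wn* have already been learned; the toy weight matrix and feature vector below are illustrative, not a trained model:

```python
import numpy as np

def predict(W, fv):
    # Equation (10): choose the class n whose weight vector w_n
    # gives the largest score w_n^T Fv.
    return int(np.argmax(W @ fv))

# Toy example: N = 3 classes, 4-dimensional fused features (illustrative).
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
fv = np.array([0.1, 0.9, 0.2, 0.0])
print(predict(W, fv))  # → 1
```

In an actual system, *W* would come from solving Equation (9) with Liblinear's multi-class solver rather than being set by hand.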
