Article

Transfer Learning for Leaf Small Dataset Using Improved ResNet50 Network with Mixed Activation Functions

College of Science, Northeast Forestry University, Hexing Road 26, Harbin 150040, China
*
Author to whom correspondence should be addressed.
Forests 2022, 13(12), 2072; https://doi.org/10.3390/f13122072
Submission received: 24 October 2022 / Revised: 23 November 2022 / Accepted: 1 December 2022 / Published: 5 December 2022
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract

Taxonomic study of leaves is one of the most effective means of correctly identifying plant species. In this paper, mixed activation functions are used to improve the ResNet50 network in order to further improve the accuracy of leaf recognition. First, leaf images of 15 common tree species in northern China were collected from the Urban Forestry Demonstration Base of Northeast Forestry University (45°43′–45°44′ N, 126°37′–126°38′ E; the forest type is artificial forest), and a small leaf dataset was established. Then, seven commonly used activation functions were selected to improve the ResNet50 network structure, and the improved networks were applied to transfer learning on the small leaf dataset. On this basis, five activation functions with better performance were selected for the study of mixed activation functions in deep learning. Two of these five activation functions were selected for each combination, giving twenty combinations in total. In each combination, the first activation function replaced the first ReLU function after every addition operation in the ResNet50 residual block structure, and the other activation function replaced the ReLU functions at all other positions. The experimental results show that, in transfer learning on the small leaf dataset with the ResNet50 deep residual network, an appropriate combination of mixed activation functions can improve network performance to a certain extent. Among them, the ELU-Swish1 combination improves network performance most significantly, with a final validation accuracy of 98.17%. Furthermore, comparison with GoogLeNet and VGG-16 also demonstrates the excellent performance of the improved ELU-Swish1 ResNet50 (ES-ResNet50) architecture. Finally, tests on two other small leaf datasets, Flavia and Swedish, also demonstrate the performance improvement of ES-ResNet50: the validation accuracy of the improved algorithm reaches 99.30% and 99.39% on these two datasets, respectively. These experiments show that the recognition performance of leaf transfer learning using the ES-ResNet50 network is indeed improved, which may be attributed to the complementary exponential gradients of the ELU and Swish1 activation functions in the negative region.

1. Introduction

Plant species identification is of great significance to research in forestry, agriculture, gardens, and plant diversity. Traditional manual classification methods are very cumbersome, and it is difficult for researchers without a rich background of specialized knowledge to classify plants with highly similar appearance quickly and accurately [1]. Therefore, using computer-aided technology to automatically classify and recognize plant species has long been an important research topic. Leaves play an important role in plant species recognition because of their ubiquity and morphological stability. Improvements in leaf recognition technology will also promote the development of plant recognition [2,3].
Leaf geometric features have become the most commonly used feature parameters in plant leaf recognition because they are convenient for visual observation [4,5,6]. Hu et al. [7] proposed a leaf feature extraction method based on a multi-scale distance matrix, which makes the captured leaf features invariant to translation, rotation, symmetry, and scaling. Zhang et al. [5] reviewed leaf characterization methods based on local and global contour features, and then proposed a leaf representation that uses area ratios to quantify the concavity and convexity of contour points at different scales, which effectively captures both the global information and the contour details of leaves; a multi-granularity fusion method combining edge features and global shape features was used to better represent leaf feature information. After reviewing and discussing existing plant leaf recognition methods based on shape or texture features, Yang [8] introduced a novel method that represents leaf shape with a multi-scale triangular descriptor, uses the local binary pattern histogram Fourier (LBP-HF) as the texture feature, and finally combines shape and texture features with a weighted distance. Although many advanced mathematical theories have been applied to leaf feature extraction, as the number of species and samples grows, the feature parameters required to accurately identify plant species become more complex, and the limitations of leaf recognition algorithms based on geometric feature parameters become more apparent.
With the maturity of neural networks and deep learning technology, research on object recognition using deep transfer learning has become increasingly popular [9,10,11]. Deep neural networks can fully capture the shape and texture features of objects through autonomous learning, avoiding the incomplete features that result from extracting geometric feature information in advance. Transfer learning can directly transfer trained model parameters to new tasks, helping to train new models and avoiding the time cost of training deep neural networks from scratch [12]. Based on these advantages, various deep transfer learning models have quickly been adopted in the field of leaf recognition. Pereira et al. [13] removed the last three classification layers of the AlexNet network and replaced them with a new feature classification layer, achieving a transfer learning recognition rate of 89.75% for grape leaves. Jiang et al. [14] realized transfer learning for diseased rice and wheat leaves by adjusting the classification layer and fully connected layer of a VGG16 model pre-trained on the ImageNet dataset. Chen et al. [15] improved the traditional VGGNet by expanding the final convolution layer and applying batch normalization; tested on a public rice leaf dataset, the recognition accuracy exceeded 90%. All of these studies create a new transfer learning model as a fixed feature extractor for leaf recognition by replacing the last fully connected layer of the deep network, or by fine-tuning part of the convolutional layers near the output according to the new classification requirements. To some extent, this reflects the limitations of current research on deep neural network transfer learning.
The activation function is an indispensable part of a neural network model [16]. For a specific research object, an appropriate activation function can significantly improve network performance, such as precision, accuracy, and convergence speed [17,18,19,20]. At present, only one type of activation function is usually used when constructing a neural network model [12,14], and the optimal network is selected by replacing all activation functions in the whole network, which effectively expands the adjustment range of transfer learning. However, there are few studies on mixing multiple activation functions within the same network, and the existing cases of mixed activation functions combine them only at the last activation layer of the network [21].
This study investigates the mixed use of activation functions in the overall structure of the ResNet50 deep network, with the following main objectives:
(1) Explore how replacing all activation functions in the ResNet50 network with a single nonlinear activation function affects transfer learning on small leaf datasets, and select several well-performing functions from among the commonly used nonlinear activation functions.
(2) According to the characteristics of the residual structure, construct mixed activation function combinations to improve the ResNet50 network architecture, and further test their performance in transfer learning on small leaf datasets.
(3) Verify the performance stability of the improved ResNet50 network by using different leaf small datasets, comparing with other networks, and comparing with algorithms in other references.
The rest of this paper is summarized as follows: Section 2 introduces the materials, methods, and training parameters. Section 3 presents the experimental results and discussion. Section 4 summarizes the content of the paper.

2. Materials and Methods

2.1. Leaf Dataset

The leaf dataset used in this experiment is a small dataset composed of leaf images of 15 common native tree species in Northeast China, with a total of 369 samples. All samples were collected from the Urban Forestry Demonstration Base of Northeast Forestry University, located at 45°43′–45°44′ N, 126°37′–126°38′ E. The base was built in 1948, with a total area of 43.95 hectares and a tree planting area of nearly 35 hectares, including 2.5 hectares of tree specimen park, 25 hectares of various plantations, and more than 180 introduced tree species [22]. Figure 1 shows a satellite map of the Urban Forestry Demonstration Base of Northeast Forestry University. Leaves with a complete outline, a glossy surface, and no disease or insect pests were collected. Table 1 lists the scientific names and sample sizes of the 15 leaf species. All leaf images were captured using a Nikon D850 digital camera (Nikon Image Instrument Sales Co., Ltd., Songjiang District, Shanghai, China) in a laboratory with stable lighting. Each leaf was placed on the experimental platform, the camera lens was positioned vertically 40 cm above the leaf, and the sample image was captured with automatic exposure. The collected original images are 24-bit RGB color images with a resolution of 3840 × 2160 pixels. All sample images were reduced to 224 × 224 pixels to facilitate classification with a deep neural network; the bicubic interpolation algorithm was used for image compression, and the spatial resolution of the output images is 300 dpi. Figure 2 shows the 15 leaf species in this dataset. In the experiments, the ratio of the training set to the test set is 7:3.
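The experiments themselves were carried out in MATLAB (Section 2.5). As a hedged illustration only, the following Python sketch shows how the same preprocessing and 7:3 split could be reproduced with torchvision; the folder name "leaf_dataset" and the use of a random split with a fixed seed are assumptions, not part of the original pipeline.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
from torchvision.transforms import InterpolationMode

# Resize the 3840 x 2160 RGB captures to 224 x 224 with bicubic interpolation
# (as described above) and convert them to tensors.
preprocess = transforms.Compose([
    transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
])

# "leaf_dataset/" is a hypothetical folder with one sub-directory per species.
full_set = datasets.ImageFolder("leaf_dataset", transform=preprocess)

# 7:3 split of the 369 samples into training and test subsets.
n_train = int(0.7 * len(full_set))
train_set, test_set = random_split(
    full_set, [n_train, len(full_set) - n_train],
    generator=torch.Generator().manual_seed(0))

train_loader = DataLoader(train_set, batch_size=25, shuffle=True)
test_loader = DataLoader(test_set, batch_size=25)
```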

2.2. ResNet50

With the rapid development of artificial neural network theory and the continuous improvement of computer hardware, the application of deep learning in the field of image recognition has made new breakthroughs [23]. A neural network integrates different classification features in the form of a layered structure, and the number of features can be enriched by increasing the number of stacked layers (the depth). However, as network depth increases, recognition accuracy gradually saturates and then degrades rapidly. This degradation is caused by the increase in training error as the layers deepen, rather than by overfitting. To solve the gradient degradation problem of deep networks, He et al. proposed the residual network structure, which transmits more accurate feature information to deeper network levels through shortcut connections [24].
ResNet50 is a typical deep residual network composed of 50 layers, including 49 convolutional layers and one fully connected layer. The 49 convolutional layers can be divided into 5 parts: the first part, for input preprocessing, contains one convolutional layer, and the second to fifth parts are composed of bottleneck building blocks [24]. Each bottleneck building block consists of several convolutional layers, batch normalizations, rectified linear unit activation functions, and a shortcut [24]. The overall architecture of the ResNet50 network is shown in Table 2, and the structure of the basic bottleneck building block is shown in Figure 3. The residual building block can be expressed by the following formula:
y = F(x) + x    (1)
where F(x) is called the residual function, and x and y are the input and output of the residual block, respectively. It is worth noting that y serves as the input x of the next residual block.
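As a minimal sketch (written in PyTorch rather than the authors' MATLAB toolchain), the bottleneck block of Figure 3 and Formula (1) can be expressed as follows; the 1 × 1, 3 × 3, 1 × 1 channel pattern and the projection shortcut follow the standard design of He et al. [24].

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut when the input and output shapes differ.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + self.shortcut(x)   # y = F(x) + x, Formula (1)
        return self.relu(out)          # activation applied after the addition
```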

2.3. Transfer Learning

Transfer learning is a machine learning technique that quickly applies previously learned knowledge to similar new problems [10]. To achieve high prediction performance, deep learning requires a large number of pre-labeled samples. On the one hand, training deep neural networks from scratch on large datasets is usually very time-consuming [25]. On the other hand, in practical research it is usually difficult to obtain a large amount of labeled image data. Therefore, transfer learning is often applied to train neural networks on relatively small datasets and has been proven to be a very effective method [26].
Since the leaf datasets used in this experiment are all small sample sets, training ResNet50 on them directly would easily lead to overfitting. Therefore, the ResNet50 network pre-trained on ImageNet was selected for transfer learning to achieve accurate classification of the small leaf datasets.
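A minimal sketch of this setup, assuming a PyTorch/torchvision environment instead of the MATLAB toolbox actually used: the ImageNet-pretrained ResNet50 is loaded and only its final fully connected layer is replaced to match the 15 leaf classes.

```python
import torch.nn as nn
from torchvision import models

# Load ResNet50 with ImageNet weights as the starting point for fine-tuning.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the 1000-class classifier with a 15-class layer for the leaf dataset.
model.fc = nn.Linear(model.fc.in_features, 15)

# All layers remain trainable; the pre-trained weights only initialize
# the network before transfer learning on the small leaf dataset.
```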

2.4. Activation Function

An activation function is a nonlinear function used to introduce nonlinearity between the output of one layer and the input of the next in a multilayer neural network [27]. For a specific training model, selecting an appropriate activation function can effectively improve the performance of the neural network [17].
To improve the performance of ResNet50 for transfer learning on the small leaf dataset as much as possible, this paper selects seven commonly used activation functions (Tanh, ReLU, clipped ReLU, leaky ReLU, Softplus, ELU, and Swish) to modify and test the ResNet50 network. The details of these seven activation functions are as follows:
Tanh is a typical activation function with a vanishing gradient problem, and its saturation regions are symmetric in the positive and negative intervals [28]. The Tanh function is calculated by Formula (2).
Tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))    (2)
The Rectified Linear Unit (ReLU) can effectively suppress the vanishing gradient problem of deep neural networks, and its introduction enabled great progress in deep learning [29]. ReLU is defined as Equation (3).
ReLU(x) = max(0, x)    (3)
When the learning rate is relatively large, the inputs to ReLU may all become negative. In that case, the output of the ReLU function is always 0, resulting in neuron death. Therefore, the Leaky Rectified Linear Unit (LReLU) activation function was proposed, which is defined as follows [29]:
LReLU(x) = max(αx, x)    (4)
where α is an initialization constant, usually taken as 0.01.
The Exponential Linear Unit (ELU) was also proposed to address the problems of the ReLU function. By representing the negative part of the function in exponential form, the mean of the output is pushed closer to 0 and convergence becomes easier [30]. ELU is described as:
ELU(x) = { x,  x > 0;  α(e^x − 1),  x ≤ 0 }    (5)
where α is also an initialization constant, generally taken as α = 1, so Formula (5) simplifies to:
ELU(x) = { x,  x > 0;  e^x − 1,  x ≤ 0 }    (6)
The Clipped Rectified Linear Unit (CReLU) sets an upper limit on the ReLU function and is expressed as follows [31]:
CReLU(x) = { 0,  x < 0;  x,  0 ≤ x < c;  c,  c ≤ x }    (7)
where c is the ceiling factor, whose value is set according to the actual situation, generally between 6 and 10.
Softplus is a nonlinear, continuous, and differentiable function that can be regarded as a smooth form of ReLU [32]; it has all the advantages of the ReLU activation function. The specific formula is defined as:
Softplus(x) = ln(1 + e^x)    (8)
Swish is a new type of composite activation function [33], and its expression is defined as follows:
Swish(x) = x / (1 + e^(−βx))    (9)
where β is a trainable parameter. When β = 0, the Swish activation function becomes the linear function Swish(x) = x/2; when β approaches positive infinity, it approximates ReLU. Thus, Swish can be regarded as a smooth function interpolating between the linear function and ReLU. When β = 1, the Swish function reduces to the following special form:
Swish1(x) = x / (1 + e^(−x))    (10)
Figure 4 shows the waveforms of five of the seven activation functions mentioned above: Tanh, ReLU, LReLU, ELU, and Softplus. The curves show that ReLU and LReLU are the closest, the trends of ELU and Softplus are similar, and the Tanh function differs significantly from the other four. Figure 5 shows the CReLU waveforms for two common cases in which the ceiling factor c is set to 6 and 10, respectively; the flexible value of c makes the CReLU activation function adjustable. Figure 6 shows the waveforms of the Swish function when the parameter β is set to positive infinity, 1, 0.1, and 0, respectively. It can be seen that the Swish waveforms lie between ReLU and the linear function x/2. The f(x) in Figure 4, Figure 5 and Figure 6 refers to the above seven activation functions.
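For reference, the following NumPy sketch (an illustration, not the paper's MATLAB code) implements Formulas (2)–(10) and could be used to reproduce the curves in Figures 4–6.

```python
import numpy as np

def tanh(x):            return np.tanh(x)                                  # Formula (2)
def relu(x):            return np.maximum(0.0, x)                          # Formula (3)
def lrelu(x, a=0.01):   return np.maximum(a * x, x)                        # Formula (4)
def elu(x, a=1.0):      return np.where(x > 0, x, a * (np.exp(x) - 1.0))   # Formulas (5)/(6)
def crelu(x, c=6.0):    return np.clip(x, 0.0, c)                          # Formula (7)
def softplus(x):        return np.log1p(np.exp(x))                         # Formula (8)
def swish(x, beta=1.0): return x / (1.0 + np.exp(-beta * x))               # Formulas (9)/(10)

# Evaluate the functions on a grid, e.g. to plot the waveforms of Figures 4-6.
x = np.linspace(-5.0, 5.0, 201)
curves = {"Tanh": tanh(x), "ReLU": relu(x), "LReLU": lrelu(x),
          "ELU": elu(x), "CReLU6": crelu(x, 6.0), "Softplus": softplus(x),
          "Swish1": swish(x, 1.0)}
print({name: float(y.min()) for name, y in curves.items()})  # minimum of each curve
```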

2.5. Server Configuration and Training Parameter Settings

All network construction, training, and transfer learning in this paper were implemented with the MATLAB 2021 deepNetworkDesigner toolbox (MathWorks, Natick, MA, USA). All computations and network training were performed on a server with a 3.2 GHz Intel Core i5 processor and 16 GB of RAM. After repeated testing, and taking into account both training efficiency and model accuracy, we determined the basic training parameters for the small leaf dataset. The basic network training parameters (epochs, batch size, and learning rate) used in the experiments are listed in Table 3.
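As an illustrative analogue only (the experiments were run in MATLAB, and the choice of optimizer below is an assumption since it is not specified here), a training loop using the Table 3 hyperparameters might look as follows in PyTorch.

```python
import torch
import torch.nn as nn

def train(model, train_loader, device="cpu", epochs=6, lr=1e-4):
    """Fine-tune a model with the Table 3 settings: 6 epochs, lr = 0.0001.
    The batch size of 25 is set when building train_loader."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    # Adam is an assumed choice for this sketch; the paper does not state it.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}: last batch loss = {loss.item():.4f}")
```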

2.6. Evaluation Index System

Three evaluation metrics, validation accuracy (A), validation loss (L), and training time (T), are used to evaluate the performance of the ResNet50 network structure before and after improvement. Further, precision (P), recall (R), and F1-score (F1) are used to evaluate the test accuracy of deep learning algorithms, where F1-score represents the harmonic mean of precision and recall [34,35]. These parameters are defined as follows:
A = (TP + TN) / (TP + FP + TN + FN)    (11)
P = TP / (TP + FP) × 100%    (12)
R = TP / (TP + FN) × 100%    (13)
F1 = 2 × P × R / (P + R)    (14)
where TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative, respectively.
L = −(1/n) Σ_{i=1}^{n} [ y_i(x) log(p_i(x)) + (1 − y_i(x)) log(1 − p_i(x)) ]    (15)
where the validation loss is characterized by the cross entropy [36], n denotes the number of samples, y_i denotes the true class, and p_i denotes the predicted probability for sample i. Cross entropy reflects the difference between the real probability distribution and the predicted probability distribution: the smaller the cross-entropy value, the better the prediction of the model.
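A minimal NumPy sketch of these metrics, assuming per-class values are macro-averaged from a confusion matrix (the averaging scheme is an assumption; it is not stated explicitly in the text).

```python
import numpy as np

def classification_metrics(conf):
    """conf[i, j] counts samples of true class i predicted as class j.
    Returns overall accuracy and macro-averaged P, R, F1 (Formulas 11-14)."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp            # predicted as the class but wrong
    fn = conf.sum(axis=1) - tp            # belonging to the class but missed
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall    = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / conf.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary form of the cross-entropy loss in Formula (15)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```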

3. Results and Discussion

3.1. Improving ResNet50 with Single Activation Function

All activation functions in the classic ResNet50 network structure are ReLU. In this set of experiments, six activation functions (Tanh, LReLU, ELU, CReLU6, Softplus, and Swish1) are used, in turn, to replace all ReLU activation functions in the ResNet50 network pre-trained on ImageNet. The small leaf dataset described in Section 2.1 is then used for transfer learning on the seven ResNet50 architectures (the original and the six modified ones). Figure 7 and Figure 8 show the corresponding validation accuracy and validation loss curves, respectively. Network performance is measured by the best validation accuracy (BVA), final validation accuracy (FVA), best validation loss (BVL), final validation loss (FVL), training time to reach optimal performance (OT), and time to complete the entire training process (ET). The detailed performance of transfer learning on the small leaf dataset with these seven ResNet50 architectures is shown in Table 4. From the perspective of classification performance, only the ResNet50 architecture improved with the LReLU activation function outperforms the ReLU architecture overall: although the validation accuracy of the two architectures is the same, the LReLU-based ResNet50 has a lower validation loss and faster convergence. The ResNet50 architectures improved with ELU or Softplus both have slightly lower losses than ReLU at the best or final validation accuracy. The best validation accuracy of the ResNet50 network improved with the Swish1 function reaches 92.66%, while the networks improved with Tanh or CReLU perform poorly.
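A hedged sketch of this replacement step in PyTorch (the paper performed it in MATLAB's deepNetworkDesigner): every nn.ReLU module in a pre-trained torchvision ResNet50 is swapped for a chosen alternative before the final layer is adapted to 15 classes.

```python
import torch.nn as nn
from torchvision import models

def replace_activation(module, new_act=nn.ELU):
    """Recursively replace every nn.ReLU child module with new_act()."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, new_act())
        else:
            replace_activation(child, new_act)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
replace_activation(model, nn.ELU)   # or nn.LeakyReLU, nn.SiLU (Swish1), ...
model.fc = nn.Linear(model.fc.in_features, 15)
```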

3.2. Improving ResNet50 with Mixed Activation Function

Based on the training results in Section 3.1 and considering both validation accuracy and validation loss, we selected the five best-performing activation functions (ReLU, LReLU, ELU, Softplus, and Swish1) to improve the ResNet50 architecture and to explore how mixing activation functions affects the performance of the deep network. Considering the structural characteristics of the residual block, two of these five activation functions are selected each time to modify the ResNet50 architecture. Figure 9 is a schematic diagram of the residual block structure with two mixed activation functions, in which activation 1 denotes the activation function used after the addition (in particular, the first activation function in the input part of the ResNet50 network is also counted as activation 1), and activation 2 denotes the activation function used at all other positions. With each of the five selected functions used as activation 1, there are 20 combinations in total. The transfer learning results on the small leaf dataset for the ResNet50 structures modified with these 20 mixed activation function combinations are shown in Table 5, and Figure 10 shows their validation accuracy and validation loss curves.
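The following PyTorch sketch illustrates the mixed-activation residual block of Figure 9 under stated assumptions (it mirrors the idea, not the authors' MATLAB implementation): act1 is the post-addition activation and act2 covers the other positions, so act1 = ELU and act2 = Swish1 (nn.SiLU) corresponds to the ES-ResNet50 variant.

```python
import torch.nn as nn

class MixedBottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, act1=nn.ELU, act2=nn.SiLU, stride=1):
        super().__init__()
        self.act1 = act1()   # activation 1: applied after the addition
        self.act2 = act2()   # activation 2: applied at the other positions
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.act2(self.bn1(self.conv1(x)))     # activation 2 inside the block
        out = self.act2(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.act1(out + self.shortcut(x))     # activation 1 after the addition
```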
The experimental results show that the performance of the ResNet50 network modified with some mixed activation function combinations improves to a certain extent, and none of the mixed models suffers a significant drop in training efficiency, although for most of the hybrid combinations the training performance is somewhat compromised. The best validation accuracy of the ReLU-Swish1, LReLU-Swish1, and Softplus-LReLU combinations remains as high as 99.08%, the same as when the LReLU (or ReLU) activation function is used alone. Both the best and final validation accuracy of the Softplus-ELU combination are the same as when Softplus is used alone. The best validation loss of the ReLU-Softplus and LReLU-Softplus combinations decreases slightly. The validation accuracy of the Swish1-ELU combination improves significantly. Among all the improved ResNet50 structures in this experiment, the ELU-Swish1 combination shows the best performance: its best validation accuracy, final validation accuracy, and the corresponding validation losses are 100%, 98.17%, 0.1604, and 0.1885, respectively.
It is not difficult to find that, among the activation function combinations that improve network performance when mixed, at least one of the two functions involves the exponential function, mainly ELU, Softplus, or Swish1. As shown in Figure 11, when the three activation function curves are drawn in the same coordinate system, the areas enclosed by the ELU-Swish1 and ELU-Softplus pairs are mainly concentrated in the region where the input x is less than 0. This may be because the gradient of the exponential function is closer to the natural gradient: when ELU is used at the first activation position after the addition in the residual block, more complete negative input information can be preserved, and using Swish1 or Softplus at the other positions can then further refine this information. When the input is negative, the outputs of the ELU and Swish1 activation functions are also negative, and they show the best complementary performance in transfer learning on the small leaf dataset.

3.3. Comparison with Other Deep Neural Networks

To further demonstrate the good performance of the ES-ResNet50 architecture improved with the ELU-Swish1 activation functions, three networks (GoogLeNet, VGG-16, and ResNet50) were selected for a more comprehensive comparison [14,23,37]. GoogLeNet has high computing power because of its parallel computation, VGG-16 has strong fitting ability because of its rich parameters, and ResNet50 can alleviate the vanishing gradient problem of deep networks to a certain extent thanks to its residual structure. In this experiment, GoogLeNet, VGG-16, and ResNet50 networks pre-trained on ImageNet were used for transfer learning on the small leaf dataset; during transfer learning, only the last fully connected layer and the output layer were changed according to the number of leaf classes. Figure 12 shows histograms of the average precision, average recall, average F1-score, final validation loss, final validation accuracy, and training time of these four networks on the small leaf dataset. It can be seen from Figure 12 that the ES-ResNet50 algorithm is the best in precision, F1-score, and validation accuracy. In addition, its recall and validation loss also perform well, both improving over the unmodified ResNet50 network. Although the computation time of the ES-ResNet50 network increases slightly compared with the original, this is negligible compared with the improvement in its overall performance. Figure 13 shows the validation accuracy and validation loss convergence curves of these four network structures. The validation accuracy of the ES-ResNet50 network is excellent; although its validation loss is not as good as that of GoogLeNet and VGG-16, it is greatly improved compared with the original ResNet50 network.
Figure 14, Figure 15, Figure 16 and Figure 17 show the confusion matrices of the validation results for the four networks in Figure 12. GoogLeNet has four misclassified samples: one Ulmus pumila Linn misclassified as Amygdalus triloba (Lindl.) Ricker, one Tilia mandshurica Rup et Maxim misclassified as Tilia mandshurica Rup et Maxim, and two Fraxinus mandshurica Rupr. misclassified as Lonicera maackii (Rupr.) Maxim and Phellodendron amurense Rupr, respectively. VGG-16 has five misclassified samples: one Salix matsudana Koidz misclassified as Phellodendron amurense Rupr, two Fraxinus mandshurica Rupr. misclassified as Phellodendron amurense Rupr, one White birch misclassified as Populus davidiana Dode, and one Juglans mandshurica misclassified as Lonicera maackii (Rupr.) Maxim. ResNet50 has three misclassified samples: one Amygdalus triloba (Lindl.) Ricker misidentified as Armeniaca sibirica (L.) Lam, and two Fraxinus mandshurica Rupr. misidentified as Lonicera maackii (Rupr.) Maxim and Phellodendron amurense Rupr, respectively. ES-ResNet50 has only two misclassified samples: one Fraxinus mandshurica Rupr. misclassified as Lonicera maackii (Rupr.) Maxim, and one Phellodendron amurense Rupr misclassified as Fraxinus mandshurica Rupr.
It can be seen that the validation accuracy of Fraxinus mandshurica Rupr. is the lowest; it is easily misidentified as Lonicera maackii (Rupr.) Maxim or Phellodendron amurense Rupr. However, confusion among the other leaf types is not common across these four networks; in particular, the improved ES-ResNet50 network makes no misidentifications for the other leaf types. In conclusion, compared with the other three deep learning models, the ES-ResNet50 network has better classification performance.

3.4. Tests on Other Leaf Datasets

In this part of the test, the Flavia [38] and Swedish [8] leaf datasets, which are widely used in leaf recognition research, are used to further verify the performance of the ES-ResNet50 network. The Flavia leaf dataset contains 32 species with a total of 1907 leaf samples, ranging from 50 to 77 samples per category. The Swedish dataset contains 15 species with 75 samples per class, giving 1125 leaf images in total. Figure 18 and Figure 19 show sample images from these two datasets, one for each category. Both of these typical leaf datasets contain many similar classes and relatively few samples, which gives them characteristics similar to the small leaf dataset established in Section 2.1. Table 6 and Table 7 compare the proposed ES-ResNet50 transfer learning algorithm with other published results on the Flavia and Swedish leaf datasets, respectively. Nine algorithms based on deep learning, machine learning, shape features, and transfer learning are selected for comparison. The results in these two tables show that, compared with the methods based on deep learning, machine learning, and shape features, the transfer learning algorithms also achieve good recognition performance on small leaf datasets. In particular, the improved ES-ResNet50 algorithm achieves the highest validation accuracy on both the Flavia and Swedish leaf datasets. On the Flavia dataset, the validation accuracy of the ES-ResNet50 method is 99.30%, which is 0.02% higher than that of the Dual-path CNN algorithm. On the Swedish dataset, its validation accuracy is 99.39%, which is 0.28% higher than that of the CNN-RNN algorithm. Since the transfer learning algorithms are based on pre-trained deep network models, they also have an obvious advantage in operation speed compared with the other three types of algorithms. These results further validate the effectiveness of the improved ES-ResNet50 algorithm in the classification and recognition of small leaf datasets.

4. Conclusions

This study attempts to improve the ResNet50 network pre-trained on ImageNet by mixing activation functions and explores its effectiveness in transfer learning on small leaf datasets. First, a small leaf dataset composed of 15 common native tree species in northern China, collected from the Urban Forestry Demonstration Base of Northeast Forestry University, was established. Second, seven mainstream activation functions were each used to modify the ResNet50 network structure, and the five with good performance were selected for further research on mixed activation functions. Then, according to the structural characteristics of the residual block, the first activation function after the addition and the activation functions at the other positions were treated as two categories, and the twenty combinations formed from the five selected activation functions were used to replace them. The experimental results show that mixing activation functions does not cause a significant drop in network performance. Among the combinations, ELU-Swish1 shows the best network performance in transfer learning on the established small leaf dataset: its best validation accuracy, final validation accuracy, and the corresponding validation losses are 100%, 98.17%, 0.1604, and 0.1885, respectively. Furthermore, comparative experiments with the GoogLeNet and VGG-16 networks also prove the excellent performance of the improved ES-ResNet50. Finally, the Flavia and Swedish leaf datasets, which are widely used in the field of leaf recognition, were selected to compare the improved ES-ResNet50 algorithm with other published algorithms; its recognition accuracy reaches 99.30% and 99.39% on these two datasets, respectively, which further proves the effectiveness of the improved algorithm. The improvement in network performance may be due to the complementary exponential gradients of the ELU and Swish1 activation functions in the negative region. This study demonstrates that the proper mixed use of activation functions in a deep neural network structure can effectively improve network performance to a certain extent, and it also provides useful references for leaf recognition and other related transfer learning studies.

Author Contributions

Conceptualization, H.N.; methodology, R.Z.; software, H.N.; validation, Y.Z.; formal analysis, R.Z.; investigation, Z.G.; resources, H.M.; data curation, H.M.; writing—original draft preparation, Y.Z.; writing—review and editing, R.Z.; visualization, H.N.; supervision, D.Q. and H.N.; project administration, H.N.; funding acquisition, H.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Fundamental Research Funds for the Central Universities (No. 2572022BC03), the Innovation Training program for college students of Northeast Forestry University (No. 202110225485), and the Project of National Natural Science Foundation of China (No. 31570712).

Data Availability Statement

Not applicable.

Acknowledgments

We are highly grateful to the anonymous reviewers and handling editor for their insightful comments, which greatly improved an earlier version of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Shao, Y. Supervised global-locality preserving projection for plant leaf recognition. Comput. Electron. Agric. 2019, 158, 102–108.
2. Siwar, J.; Didier, C.; Reda, B. Evidential two-step tree species recognition approach from leaves and bark. Expert Syst. Appl. 2020, 146, 113154.
3. Jozsef, S. Plant leaf recognition with shallow and deep learning: A comprehensive study. Intell. Data Anal. 2020, 24, 1311–1328.
4. Shanwen, Z.; Yingke, L. Modified locally linear discriminant embedding for plant leaf recognition. Neurocomputing 2011, 74, 2284–2290.
5. Xiang, Z.; Wanqing, Z.; Hangzai, L.; Long, C.; Jinye, P.; Jianping, F. Plant recognition via leaf shape and margin features. Multimed. Tools Appl. 2019, 78, 27463–27489.
6. Xin, C.; Bin, W. Invariant leaf image recognition with histogram of Gaussian convolution vectors. Comput. Electron. Agric. 2020, 178, 105714.
7. Rongxiang, H.; Wei, J.; Haibin, L.; Deshuang, H. Multiscale distance matrix for fast plant leaf recognition. IEEE Trans. Image Process. 2012, 21, 4667–4672.
8. Chengzhuan, Y. Plant leaf recognition by integrating shape and texture features. Pattern Recognit. 2021, 112, 107809.
9. Lei, H.; Yangyang, Z.; Haonan, C.; Chandrasekar, V. Advancing Radar Nowcasting Through Deep Transfer Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4100609.
10. Yao, Z.; Jian, H.; Qiming, Q.; Yuanheng, S.; Tianyuan, Z.; Hong, S. Transfer-learning-based approach for leaf chlorophyll content estimation of winter wheat from hyperspectral data. Remote Sens. Environ. 2021, 267, 112724.
11. Liangtian, W.; Rong, L.; Lu, S.; Hansong, N.; Xianpeng, W. UAV swarm based radar signal sorting via multi-source data fusion: A deep transfer learning framework. Inf. Fusion 2022, 78, 90–101.
12. Krishnamoorthy, N.; Prasad, L.N.; Kumar, C.P.; Subedi, B.; Abraha, H.B.; Sathishkumar, V.E. Rice leaf diseases prediction using deep neural networks with transfer learning. Environ. Res. 2021, 198, 111275.
13. Pereira, C.S.; Morais, R.; Reis, M.J.C.S. Deep learning techniques for grape plant species identification in natural images. Sensors 2019, 19, 4850.
14. Zhencun, J.; Zhengxin, D.; Wenping, J.; Yuze, Y. Recognition of rice leaf diseases and wheat leaf diseases based on multi-task deep transfer learning. Comput. Electron. Agric. 2021, 186, 106184.
15. Chen, J.; Chen, J.; Zhang, D.; Sun, Y.; Nanehkaran, Y. Using deep transfer learning for image-based plant disease identification. Comput. Electron. Agric. 2020, 173, 105393.
16. Sheng, Q.; Hua, L.; Cheng, L.; Si, W.; San, W.H. Adaptive activation functions in convolutional neural networks. Neurocomputing 2018, 272, 204–212.
17. Andrea, A.; Francesco, D.; Francesco, I.; Roberto, P. A survey on modern trainable activation functions. Neural Netw. 2021, 138, 14–32.
18. Munender, V.; Pravendra, S. Optimizing nonlinear activation function for convolutional neural networks. Signal Image Video Process. 2021, 15, 1323–1330.
19. Agyepong Jonas, T.; Mostafa, S.; Yasutaka, W.; Keiji, K.; Mahdy Ahmed, E.L. Secure image inference using pairwise activation functions. IEEE Access 2021, 9, 118271–118290.
20. Singh, B.V.; Vinay, K. Linearized sigmoidal activation: A novel activation function with tractable non-linear characteristics to boost representation capability. Expert Syst. Appl. 2019, 120, 346–356.
21. Privietha, P.; Raj, V.J. Hybrid Activation Function in Deep Learning for Gait Analysis. In Proceedings of the 2022 International Virtual Conference on Power Engineering Computing and Control: Developments in Electric Vehicles and Energy Sector for Sustainable Future (PECCON), Chennai, India, 5–6 May 2022; Volume 5, p. 21989965.
22. Yuan, S.; Xiaoping, L.; Hongtao, S.; Guiqiang, W. Horizontal distribution of collembola in urban forest foundation of Northeast Forest University. J. Eng. Heilongjiang Univ. 2013, 4, 51–54.
23. Korznikov, K.A.; Kislov, D.E.; Altman, J.; Doležal, J.; Vozmishcheva, A.S.; Krestov, P.V. Using U-Net-Like Deep Convolutional Neural Networks for Precise Tree Recognition in Very High Resolution RGB (Red, Green, Blue) Satellite Images. Forests 2021, 12, 66.
24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
25. Al-gaashani, M.S.; Shang, F.; Muthanna, M.S.; Khayyat, M.; Abd El-Latif, A.A. Tomato leaf disease classification by exploiting transfer learning and feature concatenation. IET Image Process. 2022, 16, 913–925.
26. Khan, A.; Nawaz, U.; Ulhaq, A.; Robinson, R.W. Real-time plant health assessment via implementing cloud-based scalable transfer learning on AWS DeepLens. PLoS ONE 2020, 15, e0243243.
27. Ohn, I.; Kim, Y. Smooth Function Approximation by Deep Neural Networks with General Activation Functions. Entropy 2019, 21, 627.
28. Xin, W.; Yi, Q.; Yi, W.; Sheng, X.; Haizhou, C. ReLTanh: An activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis. Neurocomputing 2019, 363, 88–98.
29. Shuihua, W.; Preetha, P.; Yuxiu, S.; Bin, L.; Ming, Y.; Hong, C. Classification of Alzheimer’s Disease Based on Eight-Layer Convolutional Neural Network with Leaky Rectified Linear Unit and Max Pooling. J. Med. Syst. 2018, 42, 85.
30. Ying, Y.; Su, J.; Shan, P.; Miao, L.; Wang, X.; Peng, S. Rectified Exponential Units for Convolutional Neural Networks. IEEE Access 2019, 7, 101633.
31. Wang, S.H.; Muhammad, K.; Hong, J.; Sangaiah, A.K.; Zhang, Y.D. Alcoholism identification via convolutional neural network based on parametric ReLU, dropout, and batch normalization. Neural Comput. Appl. 2020, 32, 665–680.
32. Shi, P.; Li, G.; Yuan, Y.; Huang, G.; Kuang, L. Prediction of dissolved oxygen content in aquaculture using Clustering-based Softplus Extreme Learning Machine. Comput. Electron. Agric. 2019, 157, 329–338.
33. Jinsakul, N.; Tsai, C.F.; Tsai, C.E.; Wu, P. Enhancement of Deep Learning in Image Classification Performance Using Xception with the Swish Activation Function for Colorectal Polyp Preliminary Screening. Mathematics 2019, 7, 1170.
34. Thangaraj, R.; Anandamurugan, S.; Kaliappan, V.K. Automated tomato leaf disease classification using transfer learning-based deep convolution neural network. J. Plant Dis. Prot. 2021, 128, 73–86.
35. Yan, Q.; Yang, B.; Wang, W.; Wang, B.; Chen, P.; Zhang, J. Apple Leaf Diseases Recognition Based on An Improved Convolutional Neural Network. Sensors 2020, 20, 3535.
36. Zhang, B.; Mu, H.; Gao, M.; Ni, H.; Chen, J.; Yang, H.; Qi, D. A Novel Multi-Scale Attention PFE-UNet for Forest Image Segmentation. Forests 2021, 12, 937.
37. Mohammed, B.; Kamel, B.; Abdelouahab, M. Deep Learning for Tomato Diseases: Classification and Symptoms Visualization. Appl. Artif. Intell. 2017, 31, 299–315.
38. Abdul, K.; Edi, N.L.; Adhi, S.; Insap, S.P. Foliage plant retrieval using polar Fourier transform, color moments and vein features. Signal Image Process. 2011, 2, 1–13.
39. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
40. Karen, S.; Andrew, Z. Very Deep Convolutional Networks for Large-Scale Image Recognition. Int. Conf. Learn. Represent. 2015, 1, 1097–1105.
41. Han, L.S.; Seng, C.C.; Joseph, M.S.; Paolo, R. How deep learning extracts and learns leaf features for plant classification. Pattern Recognit. 2017, 71, 1–13.
42. Meet, S.; Singha Sougata, P.; Suyash, A. Leaf classification using marginalized shape context and shape+texture dual-path deep convolutional neural network. IEEE Int. Conf. Image Process. (ICIP) 2017, 1, 860–864.
43. Sujith, A. Plant Leaf Classification and Comparative Analysis of Combined Feature Set Using Machine Learning Techniques. Traitement Du Signal 2021, 38, 1587–1598.
44. Aydin, K.; Seydi, K.A.; Cagatay, C.; Yalin, Y.H.; Huseyin, T. Analysis of transfer learning for deep neural network based plant classification models. Comput. Electron. Agric. 2019, 158, 20–29.
Figure 1. The satellite image of the Urban Forestry Demonstration Base of Northeast Forestry University and its geographical location in China.
Figure 2. Leaf images used for classification, one sample for each type, with the number corresponding to the number in Table 1.
Figure 3. The basic residual building block for ResNet50.
Figure 4. The waveforms of Tanh, ReLU, LReLU, ELU, and Softplus.
Figure 5. The waveform of the CReLU activation function (where CReLU6 denotes a ceiling factor c of 6, and CReLU denotes a ceiling factor c of 10).
Figure 6. The waveform of the Swish activation function when the parameter β takes different values.
Figure 7. Trends in validation accuracy for transfer learning on ResNet50 with seven different activation functions.
Figure 8. Trends in validation loss for transfer learning on ResNet50 with seven different activation functions.
Figure 9. Residual block of mixed activation functions.
Figure 10. Validation accuracy and loss curves for the twenty ResNet50 networks improved by mixed activation functions, where (a–e) are the training results with ReLU, LReLU, ELU, Softplus, and Swish1 as the first activation function after the addition operation and the other four functions used at the other positions of the network structure.
Figure 11. Activation function curves of ELU, Softplus, and Swish1, which involve the exponential function.
Figure 12. Performance comparison of the four networks, where (a–f) are precision, recall, F1-score, validation accuracy, validation loss, and training time, respectively.
Figure 13. Comparison of convergence speeds of the four network structures, where (a) is the validation accuracy curve and (b) is the validation loss curve.
Figure 14. The confusion matrix of the leaf small dataset using GoogLeNet.
Figure 15. The confusion matrix of the leaf small dataset using VGG-16.
Figure 16. The confusion matrix of the leaf small dataset using ResNet50.
Figure 17. The confusion matrix of the leaf small dataset using ES-ResNet50.
Figure 18. Sample images of the Flavia leaf dataset, one image per species.
Figure 19. Sample images of the Swedish leaf dataset, one image per class.
Table 1. Details of the 15 tree leaves.
Number | Scientific Name | Size
1 | Acer negundo Linn | 22
2 | Acer pictum Thunb. ex Murray | 22
3 | Amygdalus triloba (Lindl.) Ricker | 19
4 | Armeniaca sibirica (L.) Lam | 27
5 | Fraxinus mandshurica Rupr. | 25
6 | Juglans mandshurica | 30
7 | Lonicera maackii (Rupr.) Maxim | 21
8 | Phellodendron amurense Rupr | 22
9 | Populus davidiana Dode | 25
10 | Quercus mongolica Fisch. ex Ledeb | 17
11 | Salix matsudana Koidz | 32
12 | Tilia amurensis Rupr. | 33
13 | Tilia mandshurica Rup et Maxim | 24
14 | Ulmus pumila Linn | 29
15 | White birch | 21
Table 2. The architecture of ResNet50.
Layer Name | Output Size | 50-Layer
Conv1 | 112 × 112 | 7 × 7, 64, stride 2
Conv2_x | 56 × 56 | 3 × 3 max pool, stride 2; [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3
Conv3_x | 28 × 28 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4
Conv4_x | 14 × 14 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 6
Conv5_x | 7 × 7 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3
 | 1 × 1 | average pool, 1000-d fc, softmax
Table 3. Basic training parameters in the experiments.
Number | Parameter Name | Value | Meaning
1 | Epoch | 6 | Training times
2 | Batch size | 25 | Iterations per epoch
3 | Learning rate | 0.0001 | Initial learning rate
4 | Maximum iterations | 150 | The maximum iterations for the entire training
Table 4. Training results of seven ResNet50 network architectures improved by a single activation function.
Activation Function | BVA (%) | FVA (%) | BVL | FVL | OT (s) | ET (s)
ReLU | 99.08 | 97.25 | 0.2489 | 0.2762 | 910 | 943
Tanh | 63.30 | 55.96 | 1.4648 | 1.5435 | 835 | 1049
LReLU | 99.08 | 97.25 | 0.2278 | 0.2554 | 802 | 834
ELU | 95.41 | 92.66 | 0.2116 | 0.2395 | 959 | 1042
CReLU6 | 77.98 | 72.48 | 1.4361 | 1.3994 | 794 | 862
Softplus | 97.25 | 96.33 | 0.1929 | 0.1849 | 965 | 1401
Swish1 | 92.66 | 91.74 | 0.6687 | 0.3352 | 846 | 880
Table 5. Training results of twenty ResNet50 network architectures improved by mixed activation functions.
Activation Function | BVA (%) | FVA (%) | BVL | FVL | OT (s) | ET (s)
ReLU-LReLU | 99.08 | 96.33 | 0.2282 | 0.2572 | 842 | 882
ReLU-ELU | 95.41 | 92.66 | 0.2264 | 0.2413 | 888 | 1019
ReLU-Softplus | 98.17 | 95.41 | 0.1723 | 0.2113 | 1034 | 1076
ReLU-Swish1 | 99.08 | 96.33 | 0.2171 | 0.2425 | 845 | 879
LReLU-ReLU | 99.08 | 97.25 | 0.2480 | 0.2765 | 859 | 894
LReLU-ELU | 95.41 | 92.66 | 0.2233 | 0.2378 | 982 | 1128
LReLU-Softplus | 98.17 | 95.41 | 0.1706 | 0.2085 | 994 | 1033
LReLU-Swish1 | 99.08 | 97.25 | 0.2382 | 0.2882 | 1011 | 1045
ELU-ReLU | 98.17 | 97.25 | 0.2525 | 0.2138 | 836 | 993
ELU-LReLU | 98.17 | 97.25 | 0.2085 | 0.1974 | 680 | 983
ELU-Softplus | 98.17 | 93.58 | 0.1772 | 0.1899 | 1030 | 1157
ELU-Swish1 | 100.00 | 98.17 | 0.1604 | 0.1885 | 1005 | 1045
Softplus-ReLU | 98.17 | 97.25 | 0.2757 | 0.2526 | 850 | 1251
Softplus-LReLU | 99.08 | 97.25 | 0.2498 | 0.2358 | 871 | 1269
Softplus-ELU | 97.25 | 96.33 | 0.1968 | 0.1902 | 1189 | 1332
Softplus-Swish1 | 97.25 | 96.33 | 0.2166 | 0.2032 | 1058 | 1247
Swish1-ReLU | 96.33 | 94.50 | 0.4198 | 0.3352 | 683 | 885
Swish1-LReLU | 96.33 | 93.58 | 0.3537 | 0.3079 | 616 | 891
Swish1-ELU | 96.33 | 94.50 | 0.2357 | 0.3352 | 849 | 958
Swish1-Softplus | 96.33 | 93.58 | 0.2031 | 0.3079 | 1132 | 1188
Table 6. Comparison with other methods on the Flavia leaf dataset.
Method | Algorithm Type | Validation Accuracy (%)
AlexNet+L2 [8,39] | deep learning | 88.99
VGG-16+L2 [8,40] | deep learning | 93.53
Deep-Plant [8,41] | deep learning | 98.22
Dual-path CNN [42] | deep learning | 99.28
ANN LBP+GIST+PHOG [43] | machine learning | 97.50
MTD and LBP-HF [8] | shape-based | 99.16
CNN-RNN [44] | transfer learning | 92.65
ResNet50 | transfer learning | 99.10
ES-ResNet50 | transfer learning | 99.30
Table 7. Comparison with other methods on the Swedish leaf dataset.
Method | Algorithm Type | Validation Accuracy (%)
AlexNet+L2 [8,39] | deep learning | 95.75
VGG-16+L2 [8,40] | deep learning | 95.84
Deep-Plant [8,41] | deep learning | 97.54
Dual-path CNN [42] | deep learning | 96.28
ANN LBP+GIST+PHOG [43] | machine learning | 98.99
MTD and LBP-HF [8] | shape-based | 98.48
CNN-RNN [44] | transfer learning | 99.11
ResNet50 | transfer learning | 99.09
ES-ResNet50 | transfer learning | 99.39