1. Introduction
Allergic diseases are commonly encountered in clinical practice [1]. The World Health Organization lists them among the three major diseases that need to be prevented and controlled in the 21st century. Pollen allergens are one of the main causes and affect up to 30% of the population in industrialized countries [2]. Sufficient concentrations of airborne plant pollen allergens can induce a range of allergic diseases, such as allergic rhinitis, bronchial asthma and dermatitis. Allergic diseases that are caused by plant pollen are also called hay fever. With intensifying urbanization and the expansion of planting areas, the levels of pollen allergens have increased, which has turned pollen allergies into seasonal epidemic diseases with high incidence rates. In the United States, pollen allergies affect approximately 5% of the population and as much as 15% in some regions. In Europe, they affect 20% of the population, and it has been estimated that 30% could be affected within the next 20 years [3]. In China, the incidence rate of pollen allergies is generally 0.5% to 1%, but it reaches 5% in high-incidence areas [4], and it has been increasing in recent years. Nearly 30% of allergic rhinitis cases in China are caused by pollen allergens. In Beijing, hay fever patients account for about one third of all respiratory allergy patients.
Effective pollen forecasting allows hay fever patients to take timely countermeasures, so identifying and counting the main allergenic pollen grains in the air is of great practical significance. According to our survey, few studies in China have made significant progress on this topic so far. In Beijing, an international metropolis, the incidence rate of pollen allergies among the urban population has risen in recent years, and the proportion of the population with pollen allergies is much higher than in other cities. We therefore constructed a dataset of the allergenic pollen that is mainly found in the Beijing area and explored the automatic classification and identification of allergenic pollen grains.
As pollen grains vary greatly in size and shape, identifying their morphological structures is a complex task [5]. Traditional pollen identification relies on manual observation of the surface characteristics of the pollen grains, which requires experts with relevant knowledge and practical experience in pollen morphology. This process takes considerable time and resources, which limits research progress [6]. To overcome these drawbacks, the automatic classification of pollen images using computer vision and pattern recognition has attracted increasing attention [7].
There are two main types of computer vision methods for the classification of pollen images: traditional methods based on manually extracted features and deep learning-based methods. The traditional methods rely on manually extracted features such as color, texture and shape. Commonly used texture descriptors include the gray-level co-occurrence matrix (GLCM) [8], local binary patterns (LBPs) [9], log-Gabor filters (LGFs) [10] and discrete Tchebichef moments (DTMs) [11]. In [12], Marcos et al. combined these four texture features and reached a classification accuracy of 94.83% on a specific pollen dataset. To integrate richer pollen information, some studies have combined shape and texture features for pollen grain classification [7,13,14,15]. For example, Tello-Mijares et al. [14] used geometric descriptors, first-order texture statistics and second-order GLCM-based texture statistics obtained from the L*a*b* color space, and reached an accuracy of 95.6% on a 12-class pollen dataset.
Although methods based on manual feature extraction have received a great deal of attention, their classification performance is affected by many factors, such as image quality and descriptor selection. Owing to the success of convolutional neural networks (CNNs) in computer vision, deep learning-based pollen image classification methods have gradually become mainstream. In [16], Daood et al. proposed an automatic feature learning network for pollen image classification using fine-tuned transfer learning, achieving an accuracy of 94% on a 30-class pollen dataset. In [17], André et al. studied two methods for training CNNs on a large-scale pollen image dataset: random initialization and fine-tuning of a pre-trained network. Their results showed that fine-tuning a pre-trained CNN achieved an accuracy of 96.24%, which was much higher than that of random initialization. In [18], Gallardo-Caballero et al. focused on the three-dimensional visual information of pollen grains. The analysis of each sample was recorded in MJPEG video format in order to exploit the 3D information provided by several focal planes. They used a training set of 251 videos, and their detection results reached 98.54% recall and 99.75% precision.
However, these deep learning methods simply apply transfer learning to CNNs and do not take the characteristics of pollen images into account. The shape characteristics of different pollen grains vary greatly, so the resolutions of pollen grain images span a large range. In [19], Sevillano et al. studied the influence of pollen image size and proposed a pre-processing algorithm that cropped the pollen grains from the original images while keeping a minimum amount of padding around them and maintaining their size, so that all of the images could be reframed into 227 × 227 images. However, some pollen grains were too small, which resulted in low image resolutions and unclear details, and the algorithm in [19] did not improve the quality of such images. These unfavorable factors affect the accuracy of image classification. Therefore, directly fine-tuning a pre-trained CNN on a self-collected pollen image dataset at a single normalized resolution is not sufficient.
To solve the above limitations, we studied an automatic classification method for allergenic pollen grains in the Beijing area. The proposed work produced the following main contributions:
The construction of a multi-class and large-scale dataset of pollen grain images under a light microscope, which is the first large-scale dataset of pollen images from the Beijing area;
The proposal of a deblurring pipeline for super-resolution reconstruction based on a GAN, which effectively improved the quality of the images by reconstructing the semantic information of the pollen images;
The proposal of an easy-to-implement and efficient multi-scale deep learning architecture with a multi-branch network for pollen image classification so that images in different resolution ranges could be trained using the corresponding branches, which significantly improved the classification performance.
This paper is organized as follows. In Section 2, we review several previous related works about pollen datasets and introduce the main current automatic classification methods. In Section 3, we describe the specific process for the construction of the large-scale dataset of pollen images from the Beijing area. Then, the super-resolution reconstruction module that enhances the quality of blurred pollen images is proposed. Next, a multi-scale classification method is introduced for the automatic classification of the proposed dataset. The experimental results and ablation study are shown in Section 4, and we present our conclusions and directions for future research in Section 5.
4. Experiments
In the experiments, we used the VGGNet [36] and ResNet [37] architectures pre-trained on the ImageNet 2012 dataset for transfer learning. In [17], the authors tested ResNet with different numbers of layers, from 18 to 152, and found that the accuracy did not improve as the number of layers increased. Thus, we adopted VGG-19 and ResNet-50 as the backbone networks for the experiments. Accuracy and F1-score were used as the metrics to measure the performance of the different classification methods with 5-fold cross-validation. In each fold, we chose 80% of the images from each category for training and 20% for testing.
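The per-category 80%/20% split with 5-fold cross-validation can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the experiments; the function name `stratified_kfold`, the seed and the toy label list are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which every category is
    spread as evenly as possible over the k folds."""
    rng = random.Random(seed)

    # Group sample indices by category.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each category and deal its samples round-robin into the folds,
    # so each fold holds roughly 1/k of every category.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the testing set (about 20% when k = 5).
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels (hypothetical counts, not the sizes of the real dataset).
labels = ["Artemisia"] * 50 + ["Cupressaceae"] * 25
splits = list(stratified_kfold(labels, k=5))
```

With k = 5, each fold holds out one fifth of every category for testing, which matches the 80%/20% proportion described above.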
For the parameter settings, the learning rate was set to 0.001, the batch size was 64 and the number of epochs was 1000. Adagrad [41] was used as the optimizer and the model performance on the testing set was evaluated every 10 epochs. To avoid overfitting [42], we introduced an early stopping function to stop training when the performance of the model was no longer improving. The model performance was evaluated using the testing set every five epochs. The training process stopped when the accuracy on the testing set did not increase within 50 epochs.
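The early stopping rule can be sketched as below. This is only an illustrative sketch: the class `EarlyStopping`, the helper `run_training` and the callback `accuracy_at` are hypothetical names, the actual training step is omitted, and the evaluation interval is left as the parameter `eval_every` rather than fixed.

```python
class EarlyStopping:
    """Stop when accuracy has not improved for `patience_epochs` epochs."""

    def __init__(self, patience_epochs=50):
        self.patience_epochs = patience_epochs
        self.best_accuracy = float("-inf")
        self.best_epoch = 0

    def should_stop(self, epoch, accuracy):
        # A strict improvement resets the patience window.
        if accuracy > self.best_accuracy:
            self.best_accuracy = accuracy
            self.best_epoch = epoch
            return False
        return epoch - self.best_epoch >= self.patience_epochs


def run_training(accuracy_at, max_epochs=1000, eval_every=5, patience_epochs=50):
    """Run a placeholder loop; `accuracy_at(epoch)` stands in for evaluating
    the model on the testing set (an assumption made for this sketch)."""
    stopper = EarlyStopping(patience_epochs)
    for epoch in range(1, max_epochs + 1):
        # A real loop would train the model for one epoch here.
        if epoch % eval_every == 0:
            if stopper.should_stop(epoch, accuracy_at(epoch)):
                return epoch, stopper.best_accuracy
    return max_epochs, stopper.best_accuracy


# Synthetic accuracy curve that rises until epoch 100 and then plateaus.
stopped_epoch, best = run_training(lambda epoch: min(epoch, 100) / 100)
```

In this synthetic run the accuracy stops improving at epoch 100, so training halts at the first evaluation that is at least 50 epochs later.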
4.1. Results and Analysis
In this section, we first present the performance results of the proposed multi-scale classifier. The number of branches was set to two. Following the selector addition and deblurring processes, the samples were transferred to the corresponding branch network, according to their resolution. The backbone networks of the two branches were the same. The resolution ranges of the two branches were 0–280 and 281–650. Their normalized resolutions were then set to 224 × 224 and 448 × 448, respectively.
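The routing of a sample to its branch can be illustrated with a short sketch. The branch ranges and normalized sizes are taken from the text; measuring the resolution as the longer image side, as well as the names `BRANCHES` and `select_branch`, are assumptions made for this example.

```python
# (lowest resolution, highest resolution, normalized side length) per branch.
BRANCHES = (
    (0, 280, 224),    # Branch 1
    (281, 650, 448),  # Branch 2
)


def select_branch(width, height):
    """Return (branch_number, normalized_side) for an image.

    Assumption: the resolution of an image is taken as its longer side.
    """
    resolution = max(width, height)
    for number, (low, high, side) in enumerate(BRANCHES, start=1):
        if low <= resolution <= high:
            return number, side
    raise ValueError(f"resolution {resolution} is outside all branch ranges")


small = select_branch(120, 100)
large = select_branch(300, 410)
```

Each image would then be resized to the returned side length before being passed to the backbone of its branch.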
Table 4 shows the number of samples in each category for the two branches. Because the number of samples per category in our dataset was imbalanced, we used undersampling to address this problem. Specifically, the numbers of Artemisia samples in Branch 1 and of Chenopodiaceae and Cupressaceae samples in Branch 2 were relatively large. To eliminate the impact of this imbalance on the classifier, we randomly picked 400 Artemisia samples for Branch 1 and 300 Chenopodiaceae and Cupressaceae samples for Branch 2 (see Table 4).
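Random undersampling of the over-represented categories can be sketched as follows. The function `undersample`, the seed and the sample counts in the toy dictionary are hypothetical; only the cap of 400 Artemisia samples for Branch 1 comes from the text.

```python
import random


def undersample(samples_by_class, caps, seed=0):
    """Randomly keep at most caps[name] samples for each capped category;
    categories without a cap are left unchanged."""
    rng = random.Random(seed)
    balanced = {}
    for name, samples in samples_by_class.items():
        cap = caps.get(name)
        if cap is not None and len(samples) > cap:
            # Sampling without replacement, so no image is duplicated.
            balanced[name] = rng.sample(samples, cap)
        else:
            balanced[name] = list(samples)
    return balanced


# Toy Branch 1 data: integers stand in for image identifiers.
branch1 = {
    "Artemisia": list(range(1000)),
    "Other category": list(range(120)),
}
balanced_branch1 = undersample(branch1, caps={"Artemisia": 400})
```

Fixing the seed keeps the selected subset reproducible across runs, which matters when the same subset must be reused in every cross-validation fold.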
We used five metrics to evaluate the performance of the multi-scale classifier with the VGG-19 and ResNet-50 backbones: precision, recall (or true positive rate, TPR), specificity (or true negative rate, TNR), F1-score and accuracy. Of these metrics, F1-score and accuracy were selected as the primary criteria. In the proposed multi-scale classifier, we adopted a selector to pick out the images with low resolutions and transfer them to the deblurring process. The whole process was named “Multi-Scale + Selector + Deblur”. The classification results for the VGG-19 and ResNet-50 backbones are shown in Table 5 and Table 6, respectively. They include the precision, recall, specificity and F1-score for each pollen category, the average F1-score and the total accuracy. From the tables, it can be seen that the multi-scale classifier with the ResNet-50 backbone achieved a total accuracy of 0.977 and an average F1-score of 0.966 on our proposed dataset. Meanwhile, the multi-scale classifier with the VGG-19 backbone achieved a total accuracy of 0.968 and an average F1-score of 0.949.
Figure 8 shows the confusion matrices of the proposed method, based on the VGG-19 and ResNet-50 backbones. It can be seen that the ResNet-50 backbone performed better overall than the VGG-19 backbone across the categories.
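For reference, the five metrics can be derived from a confusion matrix as in the sketch below. The function name `per_class_metrics` and the 2 × 2 toy matrix are assumptions, and the average F1-score is computed here as an unweighted mean over categories, which the text does not specify.

```python
def per_class_metrics(confusion):
    """Compute per-category metrics from a square confusion matrix, where
    confusion[i][j] counts samples of true category i predicted as j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed samples of c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # wrongly assigned to c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
        specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results.append(
            {
                "precision": precision,
                "recall": recall,
                "specificity": specificity,
                "f1": f1,
            }
        )
    accuracy = sum(confusion[c][c] for c in range(n)) / total
    average_f1 = sum(r["f1"] for r in results) / n  # unweighted mean (assumption)
    return results, accuracy, average_f1


# Toy two-category confusion matrix (made-up counts).
toy_confusion = [
    [8, 2],
    [1, 9],
]
metrics, accuracy, average_f1 = per_class_metrics(toy_confusion)
```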
4.2. Ablation Study
In this paper, we mainly used our deblurring pipeline and multi-scale classifier for the automatic pollen grain classification task. In order to demonstrate the effectiveness of each aspect clearly, we conducted ablation experiments to illustrate the contributions of each process.
To verify the effectiveness of the deblurring pipeline, we tested its effect on classification performance using single-scale classifiers. Results were recorded for the “Single-Scale” and “Single-Scale + Deblur” processes. “Single-Scale” means that a single-scale classifier based on the VGG-19 or ResNet-50 backbone was used, and “Single-Scale + Deblur” means that the single-scale classifier was used together with the deblurring pipeline. The image resolution distributions for the two processes are shown in Figure 9a,b. Accordingly, we set the normalized resolution to 112 × 112 for the “Single-Scale” process. For “Single-Scale + Deblur”, all of the images in our proposed dataset were deblurred using SRGAN and the normalized resolution was set to 448 × 448.
To verify the effectiveness of our multi-scale classifier, we compared the results from the “Single-Scale + Selector + Deblur” and “Multi-Scale + Selector + Deblur” processes. The normalized resolution was set to 336 × 336 for the single-scale classifier, while the settings for the multi-scale classifier were those introduced in Section 4.1. All of the samples in our proposed dataset were processed using our selector addition and deblurring pipeline, and the resulting resolution distribution is shown in Figure 9c. It can be seen that the image resolutions in Figure 9a were concentrated under 100 × 100 and that the image resolutions in Figure 9b were concentrated under 400 × 400. Meanwhile, the resolution distributions of the samples for these two classifiers were consistent with each other. In contrast, the resolution distribution in Figure 9c could be divided into two resolution ranges, with each range covering samples of all of the pollen categories, which was more suitable for training the multi-scale classifier. The samples in Figure 9c not only had a higher image quality but also ensured that the features of all of the categories could be learned in each branch.
Table 7 shows the classification results from the four cases, with the two groups of ablation experiments for each of the VGG-19 and ResNet-50 backbones. In the first group of ablation experiments, the accuracy and F1-score of the “Single-Scale + Deblur” process were higher than those of the “Single-Scale” process. The “Single-Scale + Deblur” process with the ResNet-50 backbone achieved an accuracy of 0.962 and an F1-score of 0.941, which indicates that using SRGAN to deblur images was effective for improving the classification performance. In the second group, the performance of the multi-scale classifier was generally better than that of the single-scale classifier. The highest accuracy was 0.977, which was achieved using the two-branch multi-scale classifier with the ResNet-50 backbone. The F1-score for this process was 0.966, which indicates that the multi-scale classification structure was effective for improving the performance of pollen image classification.
5. Conclusions and Future Works
In this paper, we constructed a large-scale, high-quality pollen image dataset for the automatic pollen grain classification task in the Beijing area. We used a color extraction method to locate pollen grains in scanning images, which reduced the required human resources. Moreover, we designed a deblurring pipeline to enhance the image quality by learning semantic features. Because of the large resolution range of the training dataset, we also proposed an easy-to-implement and efficient multi-scale classifier. We evaluated this classifier on the pollen grain classification task and demonstrated its strong performance. The proposed work could be of great significance for pollen grain classification research in the Beijing area and could offer useful guidance for improving the image quality and classification performance of self-collected pollen grain datasets.
In future studies, we aim to extend the number of categories and samples in our dataset for the Beijing area. During image cropping in the data pre-processing, we could not ensure that a pollen grain would not be split between two or more images, so we plan to use object detection algorithms to locate the pollen grains in the scanning images and reduce the loss of pollen samples during pre-processing. In terms of experiments, we aim to introduce weighted averages to better handle data imbalances. We also hope to improve the training efficiency of the classifier by reducing the number of parameters in the classifier model.