Simple Yet Effective Fine-Tuning of Deep CNNs Using an Auxiliary Classification Loss for Remote Sensing Scene Classification
Abstract
1. Introduction
- Provide best practices for transferring knowledge from pre-trained CNNs with a small number of weights;
- Show that deeper CNNs suffer from the vanishing gradient problem, then propose a simple yet efficient solution to combat this effect using an auxiliary loss function (a minimal code sketch follows this list);
- Confirm experimentally that this simple trick sets new state-of-the-art results on several benchmark datasets at low computational cost (fine-tuning for a maximum of 40 epochs).
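To make the auxiliary-loss idea concrete, the following minimal tf.keras sketch (not the authors' released code) attaches a second softmax classifier to an intermediate layer of a pre-trained Inception-v3 and trains both heads jointly. The attachment point ("mixed7"), image size, optimizer, loss weight, batch size, and epoch budget follow the settings reported in Section 5.2; the width of the auxiliary branch is an assumption.

```python
# Minimal sketch: fine-tuning Inception-v3 with an auxiliary classification loss.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 21  # e.g., the Merced dataset

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(256, 256, 3))

# Main branch: global average pooling + softmax on the final feature map.
x = layers.GlobalAveragePooling2D()(base.output)
main_out = layers.Dense(NUM_CLASSES, activation="softmax", name="main")(x)

# Auxiliary branch on the intermediate "mixed7" map ("Conv + Softmax" variant);
# the 1x1 convolution width (128) is an assumption, not taken from the paper.
aux = base.get_layer("mixed7").output
aux = layers.Conv2D(128, 1, activation="relu")(aux)
aux = layers.GlobalAveragePooling2D()(aux)
aux_out = layers.Dense(NUM_CLASSES, activation="softmax", name="aux")(aux)

model = Model(base.input, [main_out, aux_out])
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4, rho=0.9),
    loss="categorical_crossentropy",
    loss_weights={"main": 1.0, "aux": 0.5})  # auxiliary loss contribution of 0.5

# Fine-tune for up to 40 epochs with mini-batches of 50 images:
# model.fit(x_train, {"main": y_train, "aux": y_train}, epochs=40, batch_size=50)
```

At inference time only the main softmax is used; the auxiliary head serves solely to inject a stronger gradient signal into the earlier layers during fine-tuning.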
2. Related Works
3. Inception Networks and EfficientNets
3.1. Inception Networks
3.1.1. GoogLeNet (Inception-v1)
3.1.2. Inception-v3
3.2. EfficientNets
4. Proposed Fine-Tuning Method
5. Experiments
5.1. Dataset Description
5.2. Experimental Set-Up
5.3. Experiments on Inception-v3
5.4. Experiments on GoogLeNet
5.5. Sensitivity Analysis with Respect to the Training Size
5.6. Experiments using EfficientNets
6. Discussions
7. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
2. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
3. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
4. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016, arXiv:1602.07261.
5. Sun, H.; Li, S.; Zheng, X.; Lu, X. Remote Sensing Scene Classification by Gated Bidirectional Network. IEEE Trans. Geosci. Remote Sens. 2019, 1–15.
6. Castelluccio, M.; Poggi, G.; Sansone, C.; Verdoliva, L. Land Use Classification in Remote Sensing Images by Convolutional Neural Networks. arXiv 2015, arXiv:1508.00092.
7. Cheng, G.; Yang, C.; Yao, X.; Guo, L.; Han, J. When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2811–2821.
8. Nogueira, K.; Penatti, O.A.B.; Santos, J.A. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognit. 2017, 61, 539–556.
9. Boualleg, Y.; Farah, M.; Farah, I.R. Remote Sensing Scene Classification Using Convolutional Features and Deep Forest Classifier. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1944–1948.
10. Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883.
11. Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Scene Classification With Recurrent Attention of VHR Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1155–1167.
12. Singh, P.; Komodakis, N. Improving Recognition of Complex Aerial Scenes Using a Deep Weakly Supervised Learning Paradigm. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1932–1936.
13. Liu, Y.; Huang, C. Scene Classification via Triplet Networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 220–237.
14. Wu, H.; Liu, B.; Su, W.; Zhang, W.; Sun, J. Deep Filter Banks for Land-Use Scene Classification. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1895–1899.
15. Zhang, F.; Du, B.; Zhang, L. Scene Classification via a Gradient Boosting Random Convolutional Network Framework. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1793–1802.
16. Cheng, G.; Li, Z.; Yao, X.; Li, K.; Wei, Z. Remote Sensing Image Scene Classification Using Bag of Convolutional Features. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1735–1739.
17. Othman, E.; Bazi, Y.; Alajlan, N.; Alhichri, H.; Melgani, F. Using convolutional features and a sparse autoencoder for land-use scene classification. Int. J. Remote Sens. 2016, 37, 2149–2167.
18. Weng, Q.; Mao, Z.; Lin, J.; Guo, W. Land-Use Classification via Extreme Learning Classifier Based on Deep Convolutional Features. IEEE Geosci. Remote Sens. Lett. 2017, 14, 704–708.
19. Liu, Q.; Hang, R.; Song, H.; Li, Z. Learning Multiscale Deep Features for High-Resolution Satellite Image Scene Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 117–126.
20. Liu, Y.; Zhong, Y.; Fei, F.; Zhu, Q.; Qin, Q. Scene Classification Based on a Deep Random-Scale Stretched Convolutional Neural Network. Remote Sens. 2018, 10, 444.
21. Alhichri, H.; Alajlan, N.; Bazi, Y.; Rabczuk, T. Multi-Scale Convolutional Neural Network for Remote Sensing Scene Classification. In Proceedings of the 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 3–5 May 2018; pp. 1–5.
22. Wang, J.; Liu, W.; Ma, L.; Chen, H.; Chen, L. IORN: An Effective Remote Sensing Image Scene Classification Framework. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1695–1699.
23. Liu, Y.; Zhong, Y.; Qin, Q. Scene Classification Based on Multiscale Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 7109–7121.
24. Gong, Z.; Zhong, P.; Yu, Y.; Hu, W. Diversity-Promoting Deep Structural Metric Learning for Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 371–390.
25. Yu, Y.; Liu, F. Aerial Scene Classification via Multilevel Fusion Based on Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 287–291.
26. Liu, Y.; Liu, Y.; Ding, L. Scene Classification Based on Two-Stage Deep Feature Fusion. IEEE Geosci. Remote Sens. Lett. 2018, 15, 183–186.
27. Marmanis, D.; Datcu, M.; Esch, T.; Stilla, U. Deep Learning Earth Observation Classification Using ImageNet Pretrained Networks. IEEE Geosci. Remote Sens. Lett. 2016, 13, 105–109.
28. Wang, G.; Fan, B.; Xiang, S.; Pan, C. Aggregating Rich Hierarchical Features for Scene Classification in Remote Sensing Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4104–4115.
29. Li, E.; Xia, J.; Du, P.; Lin, C.; Samat, A. Integrating Multilayer Features of Convolutional Neural Networks for Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5653–5665.
30. Chaib, S.; Liu, H.; Gu, Y.; Yao, H. Deep Feature Fusion for VHR Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4775–4784.
31. Hasanpour, S.H.; Rouhani, M.; Fayyaz, M.; Sabokrou, M. Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures. arXiv 2016, arXiv:1608.06037.
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
33. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
34. Yang, Y.; Newsam, S. Bag-of-visual-words and Spatial Extensions for Land-use Classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; ACM: New York, NY, USA, 2010; pp. 270–279.
35. Xia, G.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981.
36. Othman, E.; Bazi, Y.; Melgani, F.; Alhichri, H.; Alajlan, N.; Zuair, M. Domain Adaptation Network for Cross-Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4441–4456.
37. Othman, E.; Bazi, Y.; Alhichri, H. Remote_Sensing_Dataset-Google Drive. Available online: http://bit.ly/ksa_dataset (accessed on 5 May 2019).
38. Ullmann, J.R. Experiments with the n-tuple Method of Pattern Recognition. IEEE Trans. Comput. 1969, 100, 1135–1137.
39. He, N.; Fang, L.; Li, S.; Plaza, A.; Plaza, J. Remote Sensing Scene Classification Using Multilayer Stacked Covariance Pooling. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6899–6910.
40. Zhou, Y.; Liu, X.; Zhao, J.; Ma, D.; Yao, R.; Liu, B.; Zheng, Y. Remote sensing scene classification based on rotation-invariant feature learning and joint decision making. J. Image Video Proc. 2019, 2019, 3.
41. Yang, Z.; Mu, X.; Zhao, F. Scene classification of remote sensing image based on deep network and multi-scale features fusion. Optik 2018, 171, 287–293.
42. Liang, Y.; Monteiro, S.T.; Saber, E.S. Transfer learning for high resolution aerial image classification. In Proceedings of the 2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 18–20 October 2016; pp. 1–8.
43. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
Work | Method | CNN | Acc [%] | Train [%]
---|---|---|---|---
Merced dataset | | | |
Castelluccio et al. [6] 2015 | SGD, learning rate 0.001, 20,000 iterations | GoogLeNet | 97.10 | 80
Cheng et al. [7] 2018 | Adam, learning rate 0.001 for the classification layer and 0.001 for the other layers; iteration number varied from 1000 to 15,000 with a stride of 1000 | VGG16+SVM / GoogLeNet+SVM / AlexNet+SVM | 96.82 ± 0.20 / 97.14 ± 0.10 / 94.58 ± 0.11 | 80
Nogueira et al. [8] 2017 | SGD, learning rate 0.001, 20,000 iterations | GoogLeNet | 97.78 ± 0.97 | 80
Sun et al. [5] 2019 | SGD, learning rate 0.0001, 50 iterations | VGG16 | 97.14 ± 0.48 / 96.57 ± 0.48 | 80 / 50
AID dataset | | | |
Sun et al. [5] 2019 | SGD, learning rate 0.0001, 50 iterations | VGG16 | 93.60 ± 0.64 / 89.49 ± 0.34 | 50 / 20
NWPU dataset | | | |
Boualleg et al. [9] 2019 | Learning rate 0.01 for the last layer, 0.001 for the other layers, 15,000 iterations | VGGNet16 / GoogLeNet / AlexNet | 87.15 ± 0.45 / 82.57 ± 0.12 / 81.22 ± 0.19 | 10
Boualleg et al. [9] 2019 | Learning rate 0.01 for the last layer, 0.001 for the other layers, 15,000 iterations | VGGNet16 / GoogLeNet / AlexNet | 90.36 ± 0.18 / 86.02 ± 0.18 / 85.16 ± 0.18 | 20
Cheng et al. [10] 2017 | --- | VGG16 | 84.56 | 10
Optimal-31 dataset | | | |
Wang et al. [11] 2018 | No training details provided | VGG16 / GoogLeNet / AlexNet | 87.45 ± 0.45 / 82.57 ± 0.12 / 81.22 ± 0.19 | 80
Sun et al. [5] 2019 | SGD, learning rate 0.0001, 50 iterations | VGG16 | 89.52 ± 0.26 | 80
Network | #Parameters |
---|---|
AlexNet | 60.97 M |
VGG16 | 138.36 M |
GoogLeNet [1] | 7 M |
Inception-v3 [3] | 23.83 M |
Inception-v4 [4] | 42.71 M |
Incep-Res-v2 [4] | 55.97 M |
CNN | Placement of the Auxiliary Softmax | Layer Output |
---|---|---|
GoogLeNet | Mixed_4f_Concatenated | 16 × 16 × 832 |
Inception-v3 | Mixed7 | 14 × 14 × 768 |
EfficientNet-B0 | Swish34 | 16 × 16 × 672 |
EfficientNet-B3 | Swish54 | 16 × 16 × 816 |
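The placement points above correspond to named intermediate activations of each backbone. A quick way to enumerate candidate attachment layers and their output shapes is sketched below for EfficientNet-B0; the layer names follow the tf.keras.applications implementation (an assumption here) and will not match the table's naming exactly.

```python
import tensorflow as tf

# Sketch: list the swish activation layers of EfficientNet-B0 and their output
# shapes in order to pick an attachment point for the auxiliary classifier.
backbone = tf.keras.applications.EfficientNetB0(
    weights="imagenet", include_top=False, input_shape=(256, 256, 3))

for layer in backbone.layers:
    if "activation" in layer.name:  # MBConv blocks use swish activations
        print(layer.name, tuple(layer.output.shape[1:]))
```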
Parameter | Settings |
---|---|
Optimizer | RMSprop |
Learning rate | Initial: 0.0001, then decreased by a factor of 10 after every 20 epochs |
Moving average | 0.9 |
Maximum number of epochs | 40 |
Auxiliary loss contribution | 0.5 |
Image size | 256 × 256 pixels |
Mini-batch size | 50 |
Number of trials for each experiment | 5 |
Data augmentation | Not used |
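For readers who want to reproduce this schedule, a small sketch is given below (assuming the "20 epochs" in the learning-rate row refers to training epochs, consistent with the 40-epoch budget); it is illustrative only, not the authors' code.

```python
import tensorflow as tf

# Step decay: start at 1e-4 and divide the learning rate by 10 every 20 epochs.
def step_decay(epoch, lr):
    return 1e-4 * (0.1 ** (epoch // 20))

lr_schedule = tf.keras.callbacks.LearningRateScheduler(step_decay)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4, rho=0.9)  # moving average 0.9

# model.compile(optimizer=optimizer, loss="categorical_crossentropy")
# model.fit(x_train, y_train, epochs=40, batch_size=50, callbacks=[lr_schedule])
```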
Inception Mixed Layer | Layer Size | Merced 50% Train | AID 20% Train | NWPU 10% Train | Optimal-31 80% Train | KSA 20% Train |
---|---|---|---|---|---|---|
Mixed4 | (12,12,768) | 95.33 ± 0.43 | 90.17 ± 0.23 | 84.20 ± 0.25 | 90.57 ± 0.30 | 94.11 ± 0.42 |
Mixed5 | (12,12,768) | 96.57 ± 0.28 | 91.16 ± 0.31 | 85.75 ± 0.30 | 91.66 ± 0.34 | 94.58 ± 0.38 |
Mixed6 | (12,12,768) | 97.06 ± 0.36 | 91.72 ± 0.10 | 86.92 ± 0.28 | 92.02 ± 0.87 | 94.80 ± 0.33 |
Mixed7 | (12,12,768) | 97.67 ± 0.29 | 92.28 ± 0.35 | 88.24 ± 0.43 | 93.11 ± 1.04 | 95.03 ± 0.50 |
Mixed8 | (5,5,1280) | 96.78 ± 0.67 | 91.17 ± 0.37 | 87.11 ± 0.31 | 91.88 ± 1.03 | 93.38 ± 0.66 |
Mixed9 | (5,5,2048) | 96.62 ± 0.38 | 91.87 ± 0.37 | 87.89 ± 0.10 | 92.79 ± 1.02 | 94.38 ± 0.55 |
Mixed10 | (5,5,2048) | 95.41 ± 0.42 | 90.39 ± 0.29 | 87.23 ± 0.36 | 91.45 ± 1.10 | 92.27 ± 1.65 |
Dataset | Without Auxiliary | Auxiliary Softmax | Auxiliary Conv + Softmax |
---|---|---|---|
Merced | 95.41 ± 0.42 | 97.35 ± 0.43 | 97.63 ± 0.20 |
AID | 90.39 ± 0.29 | 92.69 ± 0.34 | 93.52 ± 0.21 |
NWPU | 87.23 ± 0.36 | 89.28 ± 0.29 | 89.32 ± 0.33 |
Optimal31 | 91.45 ± 1.10 | 93.81 ± 0.51 | 94.13 ± 0.35 |
KSA | 92.27 ± 1.65 | 95.55 ± 0.25 | 96.36 ± 0.24 |
Average | 91.35 ± 0.76 | 93.73 ± 0.36 | 94.19 ± 0.27 |
Dataset | Mixed_4f | Without Auxiliary | With Auxiliary Softmax | With Auxiliary Conv + Softmax
---|---|---|---|---
Merced | 97.12 ± 0.21 | 97.04 ± 0.26 | 97.52 ± 0.28 | 97.90 ± 0.34 |
AID | 91.09 ± 0.20 | 92.09 ± 0.14 | 92.58 ± 0.22 | 93.25 ± 0.33 |
NWPU | 85.52 ± 0.17 | 88.16 ± 0.13 | 88.06 ± 0.34 | 89.22 ± 0.25 |
Optimal-31 | 90.64 ± 0.90 | 92.36 ± 0.60 | 92.63 ± 0.36 | 93.11 ± 0.55 |
KSA | 94.32 ± 0.68 | 95.10 ± 0.53 | 95.46 ± 0.50 | 96.14 ± 0.39 |
Average | 91.74 ± 0.43 | 92.95 ± 0.33 | 93.25 ± 0.34 | 93.92 ± 0.37 |
Dataset | EfficientNet-B0 Without Auxiliary | EfficientNet-B0 With Auxiliary | EfficientNet-B3 Without Auxiliary | EfficientNet-B3 With Auxiliary
---|---|---|---|---
Merced (Train 50%) | 97.69 ± 0.41 | 98.01 ± 0.45 | 97.33 ± 0.48 | 98.22 ± 0.49 |
AID (Train 20%) | 93.61 ± 0.27 | 93.69 ± 0.11 | 92.64 ± 0.24 | 94.19 ± 0.15 |
NWPU (Train 10%) | 89.83 ± 0.15 | 89.96 ± 0.27 | 89.46 ± 0.17 | 91.08 ± 0.14 |
Optimal-31 (Train 80%) | 92.58 ± 0.92 | 93.97 ± 0.13 | 93.92 ± 0.73 | 94.51 ± 0.75 |
KSA (Train 20%) | 95.71 ± 0.31 | 96.26 ± 0.35 | 95.35 ± 0.76 | 96.29 ± 0.49 |
Average | 94.08 ± 0.26 | 94.58 ± 0.24 | 93.74 ± 0.47 | 94.85 ± 0.40 |
Method | 80% Train | 50% Train |
---|---|---|
ARCNet-VGG16 [11] | 99.12 ± 0.40 | 96.81 ± 0.14 |
VGG16+MSCP [39] | 98.36 ± 0.58 | --- |
Siamese ResNet50+RD [40] | 94.50 | 91.71 |
OverfeatL+IFK [41] | 98.91 | --- |
Triplet networks [13] | 97.99 ± 0.53 | --- |
MCNN [23] | 96.66 ± 0.90 | ---
GoogLeNet+SVM [35] | 94.31 ± 0.89 | 92.70 ± 0.60
AlexNet [42] | 95.00 ± 1.74 | ---
VGG16+IFK [25] | 98.57 ± 0.34 | ---
D-DSML-CaffeNet [24] | 95.76 ± 1.70 | ---
ResNet [42] | 97.19 ± 0.57 | ---
Fusion by addition [30] | 97.42 ± 1.79 | ---
VGG16+EMR [28] | 98.14 | ---
Fine-tuning VGG16 [5] | 97.14 ± 0.48 | 96.57 ± 0.38 |
GBNet [5] | 96.90 ± 0.23 | 95.71 ± 0.19 |
GBNet+global feature [5] | 98.57 ± 0.48 | 97.05 ± 0.19 |
Fine-tuning GoogLeNet [6] | 97.10 | --- |
Inception-v3-aux [ours] | 98.80 ± 0.26 | 97.63 ± 0.20 |
GoogLeNet-aux [ours] | 99.00 ± 0.46 | 97.90 ± 0.34 |
EfficientNet-B0-aux [ours] | 99.04 ± 0.33 | 98.01 ± 0.45 |
EfficientNet-B3-aux [ours] | 99.09 ± 0.17 | 98.22 ± 0.49 |
Method | 50% Train | 20% Train |
---|---|---|
ARCNet-VGG16 [11] | 93.10 ± 0.55 | 88.75 ± 0.40 |
VGG16+MSCP [39] | 94.42 ± 0.17 | 91.52 ± 0.21 |
MCNN [23] | 91.80 ± 0.22 | --- |
Fusion by addition [30] | 91.87 ± 0.36 | ---
Multilevel fusion [25] | 95.36 ± 0.22 | ---
VGG16 (fine-tuning) [5] | 93.60 ± 0.64 | 89.49 ± 0.34 |
GBNet+global feature [5] | 95.48 ± 0.12 | 92.20 ± 0.23 |
GoogLeNet+SVM [35] | 86.39 ± 0.55 | 83.44 ± 0.40 |
CaffeNet [35] | 89.53 ± 0.31 | 86.86 ± 0.47 |
VGG16 [35] | 89.64 ± 0.36 | 86.59 ± 0.29 |
Inception-v3-aux [ours] | 95.64 ± 0.20 | 93.52 ± 0.21 |
GoogLeNet-aux [ours] | 95.54 ± 0.12 | 93.25 ± 0.33 |
EfficientNet-B0-aux [ours] | 96.17 ± 0.16 | 93.69 ± 0.11 |
EfficientNet-B3-aux [ours] | 96.56 ± 0.14 | 94.19 ± 0.15 |
Method | 10% Train | 20% Train |
---|---|---|
VGG16+MSCP [39] | 85.33 ± 0.17 | 88.93 ± 0.14 |
Triplet networks [13] | --- | 92.33 ± 0.20 |
Fine-tuning VGG16 [10] | 87.15 ± 0.45 | 90.36 ± 0.18 |
Fine-tuning GoogLeNet [10] | 82.57 ± 0.12 | 86.02 ± 0.18 |
Inception-v3-aux [ours] | 89.32 ± 0.33 | 92.18 ± 0.11 |
GoogLeNet-aux [ours] | 89.22 ± 0.25 | 91.63 ± 0.11 |
EfficientNet-B0-aux [ours] | 89.96 ± 0.27 | 92.89 ± 0.16 |
EfficientNet-B3-aux [ours] | 91.08 ± 0.14 | 93.81 ± 0.07 |
Method | 80% Train |
---|---|
ARCNet-VGG16 [11] | 92.70 ± 0.35 |
ARCNet-AlexNet [11] | 85.75 ± 0.35
ARCNet-ResNet [11] | 91.28 ± 0.45
Fine-tuning GoogLeNet [11] | 82.57 ± 0.12 |
Fine-tuning VGG16 [11] | 87.45 ± 0.45 |
Fine-tuning AlexNet [11] | 81.22 ± 0.19 |
VGG16 [35] | 89.12 ± 0.35 |
Fine-tuning VGG16 [5] | 89.52 ± 0.26 |
GBNet [5] | 91.40 ± 0.27 |
GBNet+global feature [5] | 93.28 ± 0.27 |
Inception-v3-aux [ours] | 94.13 ± 0.35 |
GoogLeNet-aux [ours] | 93.11 ± 0.55 |
EfficientNet-B0-aux [ours] | 93.97 ± 0.13 |
EfficientNet-B3-aux [ours] | 94.51 ± 0.75 |
Method | Merced 50% Train | Optimal-31 80% Train | AID 20% Train
---|---|---|---
EfficientNet-B3-aux | 98.22 ± 0.49 (14 min) | 94.51 ± 0.75 (20 min) | 94.19 ± 0.15 (27 min)
EfficientNet-B3-aux-aug | 98.38 ± 0.30 (42 min) | 95.26 ± 0.46 (46 min) | 95.56 ± 0.23 (1 h)
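The "-aug" rows denote training with data augmentation, which improves accuracy at the cost of longer training times. The exact transforms are not listed in this excerpt, so the pipeline below (random flips and small rotations) is only an illustrative assumption of a typical on-the-fly augmentation set-up.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative augmentation (the specific transforms are assumptions, not taken
# from the paper): random horizontal/vertical flips and small random rotations.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),  # up to roughly +/- 36 degrees
])

# augmented_batch = augment(image_batch, training=True)
```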
Method | Merced 50% Train | Optimal-31 80% Train
---|---|---
Softmax | 93.90 ± 0.62 | 85.86 ± 1.67 |
Dense(128)+Softmax | 94.34 ± 0.46 | 85.06 ± 1.70 |
Dense(128)+Dense(128)+Softmax | 94.03 ± 0.71 | 86.02 ± 1.57 |
EfficientNet-B0-aux | 98.01 ± 0.45 | 94.51 ± 0.75 |
Method | Merced 50% Train | Optimal-31 80% Train | AID 20% Train
---|---|---|---
DenseNet-169 | 98.15 ± 0.35 (11 min) | 93.76 ± 1.06 (15 min) | 94.38 ± 0.26 (20 min)
DenseNet-169-aux | 98.22 ± 0.24 (17 min) | 94.67 ± 0.31 (19 min) | 94.88 ± 0.19 (30 min)
DenseNet-169-aux-aug | 98.64 ± 0.33 (48 min) | 95.37 ± 0.69 (44 min) | 95.82 ± 0.12 (1.20 h)
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).