A Class-Incremental Detection Method of Remote Sensing Images Based on Selective Distillation
Abstract
1. Introduction
- We propose a class-incremental detection method for remote sensing images based on selective distillation. It avoids knowledge conflicts on new-class objects while retaining the ability to detect historical classes, keeps the knowledge that benefits incremental learning, and integrates old and new knowledge effectively.
- We propose a novel evaluation metric, C-SP, for incremental detection methods. It removes the influence that the inherent learning difficulty of the old and new classes has on the incremental learning result, evaluates model stability and plasticity directly, and combines the two with a harmonic mean into a single score.
- We conduct performance comparison, visualization analysis, method validity analysis, and hyperparameter analysis experiments on widely used remote sensing object detection datasets; the results verify the superiority and validity of the method.
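The C-SP metric described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, S and P are assumed to be the ratios of the AP50 after incremental learning to the corresponding reference AP50, and the harmonic mean is assumed to be weighted by the number of old and new classes. These assumptions are inferred from the values reported in the result tables.

```python
def stability(ap_old_after: float, ap_old_reference: float) -> float:
    """S (%): share of the reference old-class AP50 retained after the increment."""
    return 100.0 * ap_old_after / ap_old_reference


def plasticity(ap_new_after: float, ap_new_reference: float) -> float:
    """P (%): share of the reference new-class AP50 reached after the increment."""
    return 100.0 * ap_new_after / ap_new_reference


def c_sp(s: float, p: float, n_old: int, n_new: int) -> float:
    """Harmonic mean of S and P, weighted by class counts (an assumption)."""
    return (n_old + n_new) / (n_old / s + n_new / p)


# Joint fine-tuning row of the 15-class table: 8 old classes, 7 new classes.
s = stability(57.21, 55.54)
p = plasticity(54.94, 63.63)
score = c_sp(s, p, n_old=8, n_new=7)
print(round(s, 2), round(p, 2), round(score, 2))
```

With equal class counts the weighted form reduces to the ordinary harmonic mean, which is why a 10/10 class split gives the same value either way.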
2. Related works
2.1. Object Detection for Remote Sensing Image
2.2. Incremental Learning Based on Knowledge Distillation
3. Methods
3.1. Preliminary Knowledge
3.2. Selective Distillation Class Incremental Detection Framework
3.3. Selective Distillation Strategy
3.3.1. Selection of Input Image Distillation
3.3.2. Selection of Region Proposal Box
3.3.3. Calculation of Distillation Loss
4. Experiment Settings
4.1. Baselines
- Joint fine-tuning: updates the parameters of the original detection model using all data, i.e., both new-class and historical-class data.
- Fine-tuning: updates the parameters of the original detection model using only the newly added data, i.e., new-class data.
- ILOD method: a class-incremental detection method based on knowledge distillation that computes the distillation loss on the prediction results for the region proposal boxes.
- Faster ILOD method: a class-incremental detection method based on knowledge distillation that computes the distillation loss on the full-image features, the RPN output, and the prediction results for the region proposal boxes.
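To make the idea of a distillation loss on region proposal predictions concrete, the sketch below compares teacher and student classification logits over the old classes only. It is a simplified illustration in the spirit of the distillation baselines above, not their exact loss: the function name is hypothetical, mean-centring of the logits is an assumption, and the box-regression, feature, and RPN terms are omitted.

```python
from typing import Sequence


def logit_distillation_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    n_old: int,
) -> float:
    """Mean squared difference of mean-centred old-class logits.

    Each row holds the logits for one region proposal. The student may
    have extra new-class logits; only the first n_old entries are compared.
    """
    total = 0.0
    count = 0
    for t_row, s_row in zip(teacher_logits, student_logits):
        t_old = t_row[:n_old]
        s_old = s_row[:n_old]
        t_mean = sum(t_old) / n_old
        s_mean = sum(s_old) / n_old
        for t, s in zip(t_old, s_old):
            total += ((t - t_mean) - (s - s_mean)) ** 2
            count += 1
    return total / count if count else 0.0


# One proposal, two old classes; the student has one extra new-class logit.
loss = logit_distillation_loss([[1.0, 3.0]], [[1.0, 5.0, 9.0]], n_old=2)
print(loss)
```

Because the loss ignores the new-class logits, it constrains only what the student inherits from the teacher and leaves the new-class outputs to the ordinary detection loss.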
4.2. Datasets
4.3. Evaluation
4.3.1. The Detection Accuracy of Model after Incremental Learning
4.3.2. Stability and Plasticity of Model after Incremental Learning
4.4. Training Settings
5. Results
5.1. Performance Comparison
5.1.1. Experimental Results on the DOTA Dataset
5.1.2. Experimental Results on the DIOR Dataset
5.2. Prediction Results Analysis
5.3. Efficiency Analysis
5.3.1. Distillation Input Step
5.3.2. Region Proposal Box Selection Step
5.3.3. Distillation Loss Calculation Step
5.4. Hyperparameter Analysis
6. Discussion
6.1. Advantages
- Remote sensing images cover a wide range of surface objects. Existing class-incremental detection methods consider only the new-class objects in new data and overlook the fact that new-class objects may also appear in historical data; this paper fills that gap. The ultimate goal of class-incremental detection is that, when new data arrive, the target model keeps its ability to detect historical classes correctly in previous data while also correcting its earlier experience of treating new classes as background, using the new-class labels in the new data to detect new-class objects as the corresponding foreground classes.
- We summarize the distillation strategies of existing knowledge-distillation-based class-incremental detection methods, together with their stability and plasticity, as the basis of the proposed method. In these methods, the student model learns new-class knowledge while receiving knowledge distilled from the teacher model. When new-class objects appear unlabeled in the historical data, the teacher model treats them as background rather than as foreground with semantic information. Distilling from the teacher model indiscriminately therefore makes learning new-class knowledge inefficient.
- The proposed method uses a selective distillation strategy to limit the scope of knowledge distillation, which mitigates the negative effect of the original distillation strategy on model plasticity. The performance comparison shows that the method balances plasticity and stability and achieves the highest detection accuracy over all classes (historical and new classes together) among the distillation-based methods compared.
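One way to picture how limiting the scope of distillation could work at the input stage is to hide the annotated new-class objects before the image is used for distillation, so the teacher's "background" response to those objects is not passed on to the student. The sketch below is only an assumption-based illustration: the function name, the single-channel list-of-lists image, and the end-exclusive box format are hypothetical, and the mask image stands in for whichever selective mask is used (for example a blurred copy of the input or another image).

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]        # rows of single-channel pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


def apply_selective_mask(
    image: Image, new_class_boxes: Sequence[Box], mask: Image
) -> Image:
    """Return a copy of image with new-class box regions taken from mask."""
    height, width = len(image), len(image[0])
    masked = [row[:] for row in image]  # leave the input untouched
    for x0, y0, x1, y1 in new_class_boxes:
        # Clip each box to the image bounds before overwriting pixels.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                masked[y][x] = mask[y][x]
    return masked


image = [[1.0] * 4 for _ in range(4)]
mask = [[0.0] * 4 for _ in range(4)]
distill_input = apply_selective_mask(image, [(1, 1, 3, 3)], mask)
for row in distill_input:
    print(row)
```

Only the pixels inside the new-class boxes change, so the historical-class content that the teacher is meant to transfer is left intact.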
6.2. Deficiency and Prospect
- For algorithm evaluation, we tested the method on only two publicly available remote sensing object detection datasets, which has two limitations. First, the datasets are small, and we expect class-incremental conflicts to be more severe in larger-scale object detection tasks. Second, the datasets contain few conflicting classes, which limits the gains the algorithm can show.
- Regarding algorithm theory, we found that the design of the mask has a substantial impact on the algorithm. However, the intrinsic factors through which the mask affects performance remain unclear and need further exploration.
- The proposed method targets incremental scenarios in which the added data contain new classes. Real-world incremental data are complex and diverse and may contain both new instances of known classes and new-class data. Studying incremental detection methods that adapt to multiple types of remote sensing data increments is an important future direction for meeting detection demands more comprehensively.
- This paper focuses on one-time incremental learning at the current moment. However, remote sensing data keep growing over long periods. If the existing knowledge fusion mechanism for one-time incremental learning is applied repeatedly, every increment requires knowledge fusion, and the stability of the learned knowledge may degrade when the data differ greatly between increments. It is therefore necessary to study incremental detection methods designed for long sequences of increments.
- Many studies have designed improvement modules for the difficulties of detection in remote sensing images, such as small objects, multiple scales, complex backgrounds, dense distributions, and shape differences, and the resulting models are more accurate than the classical Faster R-CNN model. All object detection models in this paper are classical Faster R-CNN models, without such accuracy-improving modules. Studying incremental detection methods that address these detection difficulties is another future research direction.
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AlexNet | ImageNet classification with deep convolutional neural networks |
AP | Average precision |
AP50 | AP at an IoU threshold of 0.5 |
CNN | Convolutional neural network |
C-SP | Combined stability-plasticity performance |
Fast R-CNN | Upgraded version of R-CNN |
Faster R-CNN | Upgraded version of Fast R-CNN |
GoogleNet | Going deeper with convolutions |
GPU | Graphics processing unit |
LwF | Learning without forgetting |
R-CNN | Region-based CNN |
ResNet | Residual network |
RPN | Region proposal network |
SDCID | Class-incremental Detection method based on Selective Distillation |
VGG | Visual geometry group |
Setting | Method | AP50, first 8 classes | AP50, last 7 classes | AP50, all classes | S (%) | P (%) | C-SP (%) |
---|---|---|---|---|---|---|---|
Learn all data at once | | 55.54 | 63.63 | | | | |
Incremental learning | Joint fine-tuning | 57.21 | 54.94 | 56.15 | 103.01 | 86.34 | 94.50 |
Incremental learning | Fine-tuning | 31.90 | 59.30 | 44.69 | 57.44 | 93.20 | 69.97 |
Incremental learning | ILOD + Faster R-CNN | 55.51 | 48.30 | 52.14 | 99.95 | 75.91 | 87.08 |
Incremental learning | Faster ILOD | 56.07 | 47.27 | 51.96 | 100.95 | 74.29 | 86.47 |
Incremental learning | SDCID(M1) | 55.44 | 53.53 | 54.55 | 99.82 | 84.13 | 91.83 |
Incremental learning | SDCID(M2) | 53.63 | 54.42 | 54.00 | 96.56 | 85.53 | 91.08 |
Class | ILOD + Faster R-CNN | Faster ILOD | SDCID (M1) | SDCID (M2) |
---|---|---|---|---|
Large vehicle | 67.14 | 66.91 | 67.79 | 67.53 |
Track and field | 63.17 | 63.53 | 60.47 | 61.35 |
Helicopter | 48.94 | 49.48 | 46.18 | 36.03 |
Bridge | 42.16 | 42.84 | 40.89 | 39.91 |
Roundabout | 59.79 | 63.48 | 64.13 | 63.03 |
Soccer field | 40.99 | 40.48 | 42.43 | 39.03 |
Ship | 66.51 | 66.51 | 66.36 | 67.05 |
Oil tank | 55.39 | 55.31 | 55.25 | 55.09 |
Tennis court | 78.84 | 77.55 | 84.02 | 84.83 |
Aircraft | 74.72 | 74.21 | 79.46 | 79.96 |
Diamond | 43.39 | 43.72 | 52.69 | 54.13 |
Small vehicle | 42.65 | 42.56 | 45.39 | 46.04 |
Basketball Court | 27.94 | 25.74 | 30.86 | 34.58 |
Port | 38.15 | 36.52 | 46.65 | 46.57 |
Swimming pool | 32.38 | 30.59 | 35.62 | 34.82 |
Setting | Method | AP50, first 10 classes | AP50, last 10 classes | AP50, all classes | S (%) | P (%) | C-SP (%) |
---|---|---|---|---|---|---|---|
Learn all data at once | | 59.74 | 51.21 | | | | |
Incremental learning | Joint fine-tuning | 61.97 | 39.40 | 50.69 | 103.74 | 76.93 | 88.35 |
Incremental learning | Fine-tuning | 48.81 | 44.56 | 46.68 | 81.70 | 87.02 | 84.27 |
Incremental learning | ILOD + Faster R-CNN | 59.68 | 31.13 | 45.41 | 99.91 | 60.79 | 75.58 |
Incremental learning | Faster ILOD | 60.10 | 30.14 | 45.12 | 100.60 | 58.86 | 74.27 |
Incremental learning | SDCID(M1) | 59.35 | 37.70 | 48.52 | 99.34 | 73.62 | 84.57 |
Incremental learning | SDCID(M2) | 59.27 | 37.54 | 48.41 | 99.22 | 73.31 | 84.32 |
Class | ILOD + Faster R-CNN | Faster ILOD | SDCID(M1) | SDCID(M2) |
---|---|---|---|---|
Aircraft | 48.43 | 49.01 | 48.16 | 49.56 |
Airport | 73.19 | 74.01 | 73.45 | 71.94 |
Baseball field | 65.59 | 65.74 | 65.61 | 65.88 |
Basketball field | 83.82 | 83.98 | 83.34 | 83.70 |
Bridge | 10.55 | 11.20 | 11.72 | 10.22 |
Chimney | 73.95 | 73.85 | 74.01 | 74.26 |
Dam | 50.36 | 51.67 | 50.41 | 50.36 |
Expressway service area | 68.41 | 69.11 | 67.45 | 67.39 |
Expressway toll-station | 51.35 | 51.03 | 47.67 | 48.43 |
Golf course | 71.19 | 71.41 | 71.64 | 71.00 |
Track and field | 41.08 | 35.96 | 51.69 | 51.96 |
Port | 21.62 | 21.93 | 23.86 | 24.01 |
Flyover | 33.75 | 33.96 | 35.59 | 36.81 |
Ship | 27.19 | 26.51 | 41.67 | 39.93 |
Court | 42.53 | 40.19 | 44.86 | 44.87 |
Oil tank | 14.78 | 14.50 | 17.43 | 17.85 |
Tennis court | 67.02 | 66.26 | 68.95 | 68.57 |
Railway station | 10.66 | 11.86 | 18.37 | 16.58 |
Vehicle | 16.98 | 16.38 | 21.37 | 20.65 |
Windmill | 35.67 | 33.86 | 53.23 | 54.20 |
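The per-class values above can be related to the group-level AP50 figures by simple averaging. The sketch below does this for the ILOD + Faster R-CNN column, assuming that the first ten rows are the historical classes and the last ten the new classes, and that a group AP50 is the unweighted mean of its per-class values. Both assumptions are inferred from the numbers rather than stated in the text, and the function name is hypothetical.

```python
from typing import Sequence, Tuple

# ILOD + Faster R-CNN column of the 20-class table, in row order.
ilod_ap50 = [
    48.43, 73.19, 65.59, 83.82, 10.55, 73.95, 50.36, 68.41, 51.35, 71.19,
    41.08, 21.62, 33.75, 27.19, 42.53, 14.78, 67.02, 10.66, 16.98, 35.67,
]


def group_means(per_class_ap: Sequence[float], n_old: int) -> Tuple[float, float, float]:
    """Mean AP50 over the old classes, the new classes, and all classes."""
    old = per_class_ap[:n_old]
    new = per_class_ap[n_old:]
    return (
        sum(old) / len(old),
        sum(new) / len(new),
        sum(per_class_ap) / len(per_class_ap),
    )


old_ap, new_ap, all_ap = group_means(ilod_ap50, n_old=10)
print(round(old_ap, 2), round(new_ap, 2), round(all_ap, 2))
```

Under these assumptions the three means agree, to rounding, with the 59.68, 31.13, and 45.41 reported for ILOD + Faster R-CNN in the summary table.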
Selective Mask | AP50 | S (%) | P (%) | C-SP (%) |
---|---|---|---|---|
Gaussian blur mask (M1) | 54.55 | 99.82 | 84.13 | 91.83 |
Natural image mask (M2) | 54.00 | 96.56 | 85.53 | 91.08 |
Object overlay with natural images (M3) | 53.50 | 96.45 | 83.97 | 90.20 |
Remote sensing image mask (M4) | 53.34 | 94.98 | 84.90 | 89.99 |
Object overlay with remote sensing images (M5) | 53.74 | 96.65 | 84.55 | 90.60 |
Region proposal boxes from teacher model | Region proposal boxes from student model | AP50 | S (%) | P (%) | C-SP (%) |
---|---|---|---|---|---|
0 | 64 | 54.13 | 100.31 | 82.24 | 90.98 |
8 | 56 | 53.61 | 101.85 | 78.95 | 89.71 |
16 | 48 | 53.43 | 100.96 | 79.21 | 89.49 |
32 | 32 | 53.58 | 101.58 | 79.10 | 89.69 |
48 | 16 | 53.51 | 100.66 | 79.80 | 89.72 |
56 | 8 | 53.27 | 100.34 | 79.29 | 89.28 |
64 | 0 | 54.55 | 99.82 | 84.13 | 91.83 |
Region proposal boxes from teacher model | Region proposal boxes from student model | AP50 | S (%) | P (%) | C-SP (%) |
---|---|---|---|---|---|
0 | 64 | 53.88 | 96.96 | 84.72 | 90.84 |
8 | 56 | 53.01 | 96.91 | 81.86 | 89.25 |
16 | 48 | 53.41 | 98.01 | 82.10 | 89.88 |
32 | 32 | 53.46 | 98.60 | 81.68 | 89.90 |
48 | 16 | 53.41 | 98.01 | 82.10 | 89.88 |
56 | 8 | 53.27 | 100.34 | 79.29 | 89.28 |
64 | 0 | 54.00 | 96.56 | 85.53 | 91.08 |
Calculation of Classification Results for Student Models in Selective Distillation Loss | AP50 | S (%) | P (%) | C-SP (%) |
---|---|---|---|---|
Original | 53.89 | 98.04 | 83.69 | 90.77 |
Reconstructed | 54.55 | 99.82 | 84.13 | 91.83 |
Calculation of Classification Results for Student Models in Selective Distillation Loss | AP50 | S (%) | P (%) | C-SP (%) |
---|---|---|---|---|
Original | 53.41 | 95.25 | 84.85 | 90.09 |
Reconstructed | 54.00 | 96.56 | 85.53 | 91.08 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ruan, H.; Peng, J.; Chen, Y.; He, S.; Zhang, Z.; Li, H. A Class-Incremental Detection Method of Remote Sensing Images Based on Selective Distillation. Symmetry 2022, 14, 2100. https://doi.org/10.3390/sym14102100