JPSSL: SAR Terrain Classification Based on Jigsaw Puzzles and FC-CRF
Abstract
1. Introduction
- Considering that SAR images contain a variety of terrain types, extracting effective features of the different terrains from them is not easy. A jigsaw puzzle pretext task for SAR images is designed. This task can be learned from the image itself using a large amount of unlabeled data, so the features extracted by the network are more discriminative. Models trained on this task learn rich data representations with strong generalization capability (a minimal sketch of the jigsaw construction follows this list).
- A jigsaw puzzle self-supervised learning framework (JPSSL) for the SAR image terrain classification task is proposed. This framework has a low dependence on data. With a small amount of patch-level data obtained at negligible cost, JPSSL automatically captures image feature representations in the pretext task and effectively transfers them to the downstream task, achieving superior terrain classification performance compared with supervised methods under the same conditions.
- The proposed framework can perform terrain classification at different granularities and achieves excellent experimental results on SAR images of different resolutions and scenes.
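For concreteness, the following minimal sketch (illustrative names, not the authors' released code) shows how a jigsaw sample for the pretext task can be constructed: the SAR patch is split into a 3 × 3 grid of blocks, the blocks are reordered by a permutation drawn from the pre-defined permutation set, and the index of that permutation in the set serves as the pseudo-label.

```python
import numpy as np

def make_jigsaw_sample(patch: np.ndarray, permutation, grid: int = 3):
    """Split a square SAR patch into grid*grid tiles and reorder them.

    patch: 2-D array whose side length is divisible by `grid`
           (e.g., a 75 x 75 patch yields nine 25 x 25 tiles).
    permutation: tuple of tile indices from the pre-defined permutation set;
                 its position in that set is the pseudo-label to predict.
    """
    side = patch.shape[0] // grid
    tiles = [patch[r * side:(r + 1) * side, c * side:(c + 1) * side]
             for r in range(grid) for c in range(grid)]
    shuffled = [tiles[i] for i in permutation]
    return np.stack(shuffled)  # (9, side, side): network input for the pretext task
```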
2. Method
2.1. Pretext Task
2.1.1. Data Collection
2.1.2. Pseudo-Label Acquisition
2.1.3. Pretext Task Process
2.2. Downstream Terrain Classification Task
2.2.1. Task Process
2.2.2. Fully Connected CRFs
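Section 2.2.2 applies fully connected CRF (FC-CRF) post-processing to the network output. A minimal sketch using the pydensecrf implementation of Krähenbühl and Koltun's efficient inference is shown below; the kernel parameters (sxy, srgb, compat) and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def fc_crf_refine(probs, image, n_iters=5):
    """Refine per-pixel class probabilities with a fully connected CRF.

    probs: (n_classes, H, W) float32 softmax output of the classifier.
    image: (H, W, 3) uint8 image for the appearance kernel; a single-channel
           SAR amplitude image can be replicated to three channels.
    """
    n_classes, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))  # -log(p) unary potentials
    d.addPairwiseGaussian(sxy=3, compat=3)       # spatial smoothness kernel
    d.addPairwiseBilateral(sxy=80, srgb=13,      # appearance-driven kernel
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(n_iters)
    return np.asarray(q).argmax(axis=0).reshape(h, w)  # refined label map
```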
Algorithm 1: The training process of JPSSL.
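Algorithm 1 itself is not reproduced here; the condensed sketch below shows the two-stage flow it describes: pretext pre-training on unlabeled jigsaw samples, then downstream fine-tuning on a small number of labeled patches. The optimizer, learning rates, and epoch counts are illustrative assumptions, and the classification head is assumed to be swapped between the two stages.

```python
import torch
import torch.nn as nn

def train_jpssl(model, pretext_loader, downstream_loader, device="cuda"):
    """Two-stage training: jigsaw pretext task, then terrain classification."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()

    # Stage 1: pretext task -- predict the permutation index of shuffled tiles.
    opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    for _ in range(100):  # illustrative epoch count
        for tiles, perm_idx in pretext_loader:
            loss = criterion(model(tiles.to(device)), perm_idx.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2: downstream task -- early layers frozen (see Section 3.2.2),
    # so only parameters with requires_grad=True are optimized.
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
    for _ in range(50):  # illustrative epoch count
        for patches, labels in downstream_loader:
            loss = criterion(model(patches.to(device)), labels.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```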
3. Experiments
3.1. Experimental Data and Evaluation Indicators
3.1.1. The 25-Class SAR Scene Data
3.1.2. Large-Scene High-Resolution SAR Image Data
3.1.3. Evaluation Indicators
- Pretext Task: The evaluation indicators of the pretext task are the same in the two experiments. First, a permutation is randomly selected from the pre-defined permutation set, and the image blocks are shuffled according to this permutation. The shuffled image blocks are input into the network to obtain a probability vector; the index of the largest value in this vector is the predicted result, and the accuracy is calculated against the true index of the permutation. The jigsaw puzzle pretext task can therefore be regarded as a multi-class classification problem, and its results are reported as classification accuracy, defined as Accuracy = (TP + TN)/N, where TP is the number of positive samples predicted as positive, TN is the number of negative samples predicted as negative, and N is the total amount of data. (A minimal evaluation sketch follows this list.)
- Downstream Task
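A minimal sketch of the pretext-task evaluation described above, assuming a PyTorch model and a data loader that yields shuffled tiles together with the true permutation index:

```python
import torch

@torch.no_grad()
def pretext_accuracy(model, loader, device="cuda"):
    """Fraction of shuffled samples whose permutation index is predicted
    correctly, i.e., correct predictions / N over the evaluation set."""
    model.eval().to(device)
    correct, total = 0, 0
    for tiles, perm_idx in loader:
        logits = model(tiles.to(device))   # (batch, permutation_set_size)
        pred = logits.argmax(dim=1)        # index of the largest probability
        correct += (pred == perm_idx.to(device)).sum().item()
        total += perm_idx.numel()
    return correct / total
```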
3.2. 25-Class SAR Scene Data Classification
3.2.1. Experimental Settings for Pretext and Downstream Tasks
3.2.2. Analysis of Influencing Factors
- Permutation Set Size: As the method of pseudo-label acquisition, the size of the permutation set affects both the pretext and downstream tasks. To explore its effect on the 25-class SAR image scene classification task, experiments on the permutation set size are conducted. The training data for the downstream task are ten randomly selected samples per category, and the weights of all convolutional layers are frozen. The accuracy rates of the pretext and downstream tasks for different permutation set sizes are shown in Table 1; the highest downstream accuracy, 0.635, is shown in bold. As the permutation set size increases, the jigsaw puzzle pretext task becomes harder, and the accuracy of the pretext task decreases. However, the main measure of a self-supervised task is how well the downstream task performs, and the ultimate goal is to determine the parameters that perform best there. Although the pretext task accuracy is highest when the permutation set size is 50, the downstream accuracy is lower at that point; the best downstream result is achieved with a permutation set size of 80 (a sketch of permutation set generation follows Table 1).
- Frozen Layers: Using the model obtained from the pretext task for the downstream task is a transfer learning process. In the experiments of Table 1, the weights of all convolutional layers are transferred to the downstream task and frozen, meaning their gradients are not updated during training. Table 2 shows the downstream task accuracy with different freezing schemes when the permutation set size is 80; the training data are consistent with those in Table 1. In Table 2, Method I freezes all convolutional layer weights; Method II freezes all weights before the last convolutional layer; Method III, all weights before the last two convolutional layers; and Method IV, all weights before the last three convolutional layers. Bold indicates the highest accuracy. Table 2 shows that the downstream task achieves the highest accuracy when all weights before the last convolutional layer are frozen (a transfer sketch follows Table 2). In the subsequent experiments, the permutation set size is set to 80, and all weights before the last convolutional layer are frozen during transfer learning.
- Amount of Labeled Training Data
Table 1. Pretext and downstream task accuracy for different permutation set sizes (25-class data).

| Permutation Set Size | 50 | 80 | 100 | 120 | 200 | 300 | 500 |
|---|---|---|---|---|---|---|---|
| Pretext task | 0.81 | 0.76 | 0.72 | 0.71 | 0.61 | 0.52 | 0.45 |
| Downstream task | 0.51 | **0.635** | 0.63 | 0.57 | 0.59 | 0.43 | 0.37 |
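The permutation sets in Table 1 can be generated with a greedy maximum-Hamming-distance selection, as the criterion in Section 2.1.2 suggests; the sketch below follows the common procedure of Noroozi and Favaro and may differ in detail from the authors' implementation.

```python
import itertools
import numpy as np

def build_permutation_set(size=80, n_tiles=9, seed=0):
    """Greedily pick `size` permutations of the nine tile positions that are
    mutually far apart in Hamming distance (cf. Section 2.1.2)."""
    rng = np.random.default_rng(seed)
    all_perms = np.array(list(itertools.permutations(range(n_tiles))))  # (9!, 9)
    chosen = [int(rng.integers(len(all_perms)))]
    min_dist = np.full(len(all_perms), n_tiles + 1)
    for _ in range(size - 1):
        # Update each candidate's distance to its nearest chosen permutation,
        # then take the candidate farthest from the current set.
        min_dist = np.minimum(min_dist,
                              (all_perms != all_perms[chosen[-1]]).sum(axis=1))
        chosen.append(int(min_dist.argmax()))
    return all_perms[chosen]  # (size, 9); the row index is the pseudo-label
```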
Table 2. Downstream task accuracy with different parameter freezing schemes.

| Parameter Freezing Scheme | Permutation Set Size | Downstream Task Accuracy |
|---|---|---|
| Method I | 80 | 0.635 |
| Method II | 80 | **0.726** |
| Method III | 80 | 0.67 |
| Method IV | 80 | 0.65 |
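A sketch of the Method II transfer in Table 2, assuming an AlexNet-style backbone (the backbone is not named in this excerpt; the layer indices below refer to torchvision's AlexNet, whose last convolutional layer is features[10]):

```python
import torch
from torchvision.models import alexnet  # backbone choice is an assumption

def build_downstream_model(pretext_ckpt: str, n_classes: int = 25):
    """Load pretext weights and freeze all weights before the last conv layer
    (Method II in Table 2); the classification head stays trainable."""
    model = alexnet(num_classes=n_classes)
    state = torch.load(pretext_ckpt, map_location="cpu")
    model.load_state_dict(state, strict=False)  # head shapes may differ
    for name, param in model.features.named_parameters():
        if int(name.split(".")[0]) < 10:  # everything before features[10]
            param.requires_grad = False   # frozen: no gradient updates
    return model
```

The downstream optimizer should then be given only the parameters that still require gradients, as in the training sketch after Algorithm 1.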
Table 3. Downstream task accuracy for different amounts of labeled training data per class (25-class data).

| Amount of Data per Class | 3 | 10 | 25 | 35 | 50 |
|---|---|---|---|---|---|
| JPSSL (no pretraining) | 0.304 | 0.479 | 0.715 | 0.82 | 0.9 |
| JPSSL | 0.501 | 0.731 | 0.854 | 0.874 | 0.9 |
3.2.3. Image Retrieval
3.3. Terrain Classification of High-Resolution Large-Scene SAR Images
3.3.1. Experimental Settings for Pretext and Downstream Tasks
3.3.2. Analysis of Influencing Factors
- Cut Image Patch Size: When the low-cost data acquisition method is used, patch-level data need to be selected in each type of large-scale aggregation area in the SAR image. The size of the cut image patches affects the results of the pretext and downstream tasks. The JiuJiang SAR image is 8000 × 8000 pixels with a resolution of 3 m. Image patches cut too small cannot effectively provide the context information of the image, while patches cut too large degrade classification performance. Experiments on the size of the cut image patches are therefore conducted. Due to the relatively small size of the JiuJiang SAR image, five different cut image patch sizes are used; bold in Table 4 indicates the highest values of the different indicators. As can be seen from Table 4, the accuracy of the pretext task is higher when the image patch sizes are 75 × 75, 120 × 120, and 150 × 150; in these three cases, the images can be divided into nine blocks without gaps. In contrast, images cannot be evenly divided into nine blocks when the patch sizes are 50 × 50 and 100 × 100. The performance of a self-supervised method is mainly measured by the performance of the downstream task. Table 4 shows that when the cut image patch size is 75 × 75, the PA reaches 85.3%, the MIoU reaches 68.8%, and the overall classification performance is the best. Therefore, the image patch size is set to 75 × 75 for the JiuJiang SAR image.
- Permutation Set Size and Normalization Method: As the pseudo-label acquisition method of this task, the size of the permutation set affects SAR image terrain classification. The permutations are selected according to the maximum Hamming distance criterion of Section 2.1.2 and combined to form the permutation set. Cut image patches of 75 × 75 pixels are used for the experiments on the JiuJiang SAR image, with 20 labeled training data selected for each terrain category. For the downstream task in this experiment, scene classification is used as a proxy: its performance is positively correlated with that of the final terrain classification and it is simpler to evaluate. Table 5 shows the performance of the pretext and downstream tasks for different permutation set sizes. As the size of the permutation set increases, the difficulty of the pretext task increases and its accuracy decreases; however, the accuracy of the downstream task remains almost constant, and a slight change in scene classification accuracy causes little change in the terrain classification result. Across multiple experiments, the downstream task accuracy is consistent under different permutation set sizes, so the permutation set size has little impact on the terrain classification result. Considering the computational cost and the final performance, and for consistency with Section 3.2.2, the permutation set size in the terrain classification experiment is set to 80. To better solve the SAR image jigsaw puzzle problem, normalization methods for the image blocks are also investigated; normalization reduces internal covariate shift so that the model can be trained effectively. The normalization methods for the nine image blocks are listed in Table 6 (a sketch of Mode I follows Table 6), and the results of the pretext and downstream tasks under the different normalization methods are shown in Table 7, where bold indicates the highest accuracy for each task. As seen in Table 7, the best results in both tasks are obtained with Mode I, so Mode I is used for the jigsaw puzzle task. The same pattern holds for the other data as well.
- Amount of Labeled Training Data: The amount of labeled training data affects the performance of the final terrain classification task. Self-supervised learning aims to reduce reliance on labeled data and achieve better results with a small amount of it. The effect of the amount of labeled data is explored under the best hyperparameters from the experiments above. For the JiuJiang SAR image, cut image patches of 75 × 75 pixels are used, and the central area selected when cropping from the SAR image is 25 × 25. Table 8 shows the results with and without the transferred pre-trained model; bold indicates the highest values of PA and MIoU. When 20 labeled training data are selected for each category, the model performs best, and the classification performance is improved to a certain extent relative to the supervised method. The same pattern holds for the other data as well.
- Central Prediction Area Size: The choice of the central prediction area size of the cut image patch affects the model's classification accuracy. The larger the central area, the coarser but more efficient the classification; the smaller the central area, the finer but less efficient the classification (see the sliding-window sketch after this list). Choosing an appropriate central prediction area size is therefore important. Based on the above experiments, experiments on different central prediction area sizes are carried out on the JiuJiang data; Table 9 shows the classification performance, with bold indicating the highest values of the different indicators. The experimental results are consistent with the theoretical analysis: when the central prediction area is reduced from 25 × 25 to 15 × 15, the classification performance improves by less than 1%, but the training time and memory consumption increase significantly. Considering both classification performance and efficiency, the central area is set to 25 × 25 pixels.
- Selection of Labeled Data: When selecting a small amount of labeled data for the downstream task, images from the different aggregation areas of each category should be selected evenly; if the selected data are not sufficiently representative, the performance of the model decreases. In the JiuJiang data set, the images of water in different large-scale aggregation areas differ slightly, so experiments on the selection of labeled data are conducted. Figure 5a shows the randomly selected data and Figure 5b the manually selected data; bold in Table 10 indicates the highest values of the different indicators. Table 10 shows that all performance indicators improve to a certain extent when manually selected data are used: selecting representative data improves the performance of the model. Considering model performance, more representative data for each category are selected in the subsequent SAR image experiments.
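A minimal sketch of the central-area prediction described above: a 75 × 75 window slides over the scene with a 25 × 25 stride, and the class predicted for each window labels only its central 25 × 25 region. Batching and border handling are omitted for clarity; names are illustrative.

```python
import numpy as np
import torch

@torch.no_grad()
def classify_scene(model, sar_image, patch=75, center=25, device="cuda"):
    """Assign the class predicted for each window to its central region only."""
    h, w = sar_image.shape
    labels = np.zeros((h, w), dtype=np.int64)
    margin = (patch - center) // 2
    model.eval().to(device)
    for top in range(0, h - patch + 1, center):
        for left in range(0, w - patch + 1, center):
            window = sar_image[top:top + patch, left:left + patch]
            x = torch.from_numpy(window).float()[None, None].to(device)  # (1, 1, 75, 75)
            cls = int(model(x).argmax(dim=1))
            labels[top + margin:top + margin + center,
                   left + margin:left + margin + center] = cls
    return labels
```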
3.3.3. Large-Scene High-Resolution SAR Images Terrain Classification
- JiuJiang Data: The experiments are conducted under the best settings explored in Section 3.3.2. Figure 6 shows the visualization results of terrain classification on the JiuJiang data for the present method and the comparison methods, and Table 11 reports the corresponding evaluation indicators (bold indicates the highest values). Since the Deeplabv3+ method requires a large amount of training data to achieve good results, it overfits severely when a small amount of training data is used, resulting in poor final classification performance. The indicators obtained with JPSSL improve to a certain extent over the different comparison methods. The forest and farmland categories of the JiuJiang image are very similar and not easily distinguished from each other; compared with the baseline, the present method improves the F1 scores of the forest and farmland classes by 17% and 23%, respectively, and the comparison of Figure 6f,g shows that it better distinguishes forest from farmland. As Table 11 shows, the present method achieves the highest values in all indicators: compared with the baseline, there is a 10% improvement in PA, MIoU, and FWIoU, and a 14% improvement in Kappa. After FC-CRF post-processing, the overall metrics improve by approximately 2%, and the classification of forest and farmland improves further. A comparison of Figure 6b,h shows the excellent result achieved by the present method.
- Napoli Data: The SAR image of the Napoli area of Italy is 18,332 × 16,000 pixels, a large image with a higher resolution than the JiuJiang SAR image. Owing to differences in image size, resolution, and imaging method, the hyperparameters suitable for the JiuJiang SAR image may not suit the Napoli SAR image; the cut image patch size is the most significant of these. Based on the patch-size experiment on the JiuJiang image, the experiment on the cut image patch size is repeated for the Napoli image with additional sizes. Table 12 shows the impact of different cut image patch sizes on the pretext task and the downstream terrain classification task (bold indicates the highest values). When the cut image patch size is 120 × 120 pixels, both the pretext task and the downstream terrain classification task achieve the highest indicators, so the cut image patch size for the Napoli image is set to 120 × 120 pixels; the remaining parameters are universal and consistent with those of the JiuJiang SAR image. The experiments are conducted with these best parameter settings. Figure 7 shows the visualization results of terrain classification on the Napoli data for the present method and the comparison methods, and Table 13 reports the corresponding evaluation indicators (bold indicates the highest values). Deeplabv3+ overfits severely when only 20 training data are used per class. The indicators obtained with JPSSL improve to a certain extent over the comparison methods: relative to the baseline, the F1 scores of the forest and building categories increase by 20% and 10%, respectively, showing that the present method improves the discriminability of forest and building features. The present method achieves the highest overall indicators: compared with the baseline, PA improves by approximately 6%, and Kappa, MIoU, and FWIoU by roughly 8%. After FC-CRF post-processing, the overall metrics are improved by approximately 1.5%, and the classification of forest and farmland improves overall. A comparison of Figure 7b,h shows the excellent result achieved by the present method.
- PoDelta Data
Table 11. Terrain classification results on the JiuJiang data.

| Method | PA | Kappa | MIoU | FWIoU | F1 (Water) | F1 (Forest) | F1 (Building) | F1 (Farmland) |
|---|---|---|---|---|---|---|---|---|
| Deeplabv3+ | 0.610 | 0.432 | 0.352 | 0.501 | 0.884 | 0.187 | 0.411 | 0.403 |
| Segformer | 0.652 | 0.511 | 0.402 | 0.516 | 0.941 | 0.372 | 0.622 | 0.078 |
| SimCLR | 0.827 | 0.748 | 0.645 | 0.752 | 0.956 | 0.480 | 0.859 | 0.747 |
| JPSSL (no pre-training) | 0.767 | 0.663 | 0.582 | 0.702 | 0.963 | 0.338 | 0.867 | 0.603 |
| JPSSL (pre-training) | 0.867 | 0.803 | 0.699 | 0.807 | 0.964 | 0.503 | **0.899** | 0.834 |
| JPSSL (pre-training + FC-CRF) | **0.884** | **0.827** | **0.725** | **0.829** | **0.973** | **0.567** | 0.892 | **0.858** |
Table 12. Impact of the cut image patch size on the Napoli data (pretext task accuracy and downstream terrain classification indicators).

| Image Patch Size | Pretext Task Accuracy | PA | Kappa | MIoU | FWIoU |
|---|---|---|---|---|---|
| 50 × 50 | 0.728 | 0.788 | 0.710 | 0.618 | 0.674 |
| 75 × 75 | 0.856 | 0.786 | 0.707 | 0.620 | 0.675 |
| 100 × 100 | 0.656 | 0.784 | 0.709 | 0.631 | 0.679 |
| 120 × 120 | **0.894** | **0.804** | **0.733** | **0.649** | **0.697** |
| 150 × 150 | 0.805 | 0.787 | 0.711 | 0.630 | 0.675 |
| 200 × 200 | 0.740 | 0.789 | 0.712 | 0.625 | 0.673 |
| 255 × 255 | 0.890 | 0.768 | 0.682 | 0.596 | 0.644 |
Table 13. Terrain classification results on the Napoli data.

| Method | PA | Kappa | MIoU | FWIoU | F1 (Water) | F1 (Forest) | F1 (Building) | F1 (Farmland) |
|---|---|---|---|---|---|---|---|---|
| Deeplabv3+ | 0.660 | 0.548 | 0.498 | 0.545 | 0.952 | 0.393 | 0.785 | 0.321 |
| Segformer | 0.740 | 0.654 | 0.577 | 0.620 | 0.974 | 0.531 | 0.861 | 0.384 |
| SimCLR | 0.770 | 0.693 | 0.622 | 0.664 | 0.954 | 0.525 | 0.825 | 0.684 |
| JPSSL (no pre-training) | 0.745 | 0.650 | 0.562 | 0.620 | 0.978 | 0.301 | 0.778 | 0.648 |
| JPSSL (pre-training) | 0.804 | 0.733 | 0.649 | 0.697 | 0.981 | 0.501 | **0.872** | 0.688 |
| JPSSL (pre-training + FC-CRF) | **0.820** | **0.754** | **0.665** | **0.719** | **0.986** | **0.540** | 0.853 | **0.731** |
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, W.; Zheng, L.; Wang, J.; Wang, G.; Qi, J.; Zhang, T. Application of Flood Disaster Monitoring Based on Dual Polarization of Gaofen-3 SAR Image. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3382–3385.
- Souza, W.d.O.; Reis, L.G.d.M.; Ruiz-Armenteros, A.M.; Veleda, D.; Ribeiro Neto, A.; Fragoso, C.R., Jr.; Cabral, J.J.d.S.P.; Montenegro, S.M.G.L. Analysis of environmental and atmospheric influences in the use of SAR and optical imagery from Sentinel-1, Landsat-8, and Sentinel-2 in the operational monitoring of reservoir water level. Remote Sens. 2022, 14, 2218.
- Gao, G.; Yao, L.; Li, W.; Zhang, L.; Zhang, M. Onboard information fusion for multisatellite collaborative observation: Summary, challenges, and perspectives. IEEE Geosci. Remote Sens. Mag. 2023, 11, 40–59.
- Zhang, Z.; Yang, J.; Du, Y. Deep convolutional generative adversarial network with autoencoder for semisupervised SAR image classification. IEEE Geosci. Remote Sens. Lett. 2020, 19, 4000405.
- Wang, H.; Magagi, R.; Goita, K. Polarimetric decomposition for monitoring crop growth status. IEEE Geosci. Remote Sens. Lett. 2016, 13, 870–874.
- Tombak, A.; Turkmenli, I.; Aptoula, E.; Kayabol, K. Pixel-based classification of SAR images using feature attribute profiles. IEEE Geosci. Remote Sens. Lett. 2018, 16, 564–567.
- Bai, Y.; Gao, C.; Singh, S.; Koch, M.; Adriano, B.; Mas, E.; Koshimura, S. A framework of rapid regional tsunami damage recognition from post-event TerraSAR-X imagery using deep neural networks. IEEE Geosci. Remote Sens. Lett. 2017, 15, 43–47.
- Passah, A.; Sur, S.N.; Paul, B.; Kandar, D. SAR image classification: A comprehensive study and analysis. IEEE Access 2022, 10, 20385–20399.
- Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
- Hong, Z.Q. Algebraic feature extraction of image for recognition. Pattern Recognit. 1991, 24, 211–219.
- Dai, D.; Yang, W.; Sun, H. Multilevel local pattern histogram for SAR image classification. IEEE Geosci. Remote Sens. Lett. 2010, 8, 225–229.
- Ansari, R.A.; Buddhiraju, K.M.; Malhotra, R. Urban change detection analysis utilizing multiresolution texture features from polarimetric SAR images. Remote Sens. Appl. Soc. Environ. 2020, 20, 100418.
- Xiang, D.; Tang, T.; Zhao, L.; Su, Y. Superpixel generating algorithm based on pixel intensity and location similarity for SAR image classification. IEEE Geosci. Remote Sens. Lett. 2013, 10, 1414–1418.
- Yao, J.; Krolak, P.; Steele, C. The generalized Gabor transform. IEEE Trans. Image Process. 1995, 4, 978–988.
- Lu, C.S.; Chung, P.C.; Chen, C.F. Unsupervised texture segmentation via wavelet transform. Pattern Recognit. 1997, 30, 729–742.
- Xu, Y.; Bai, T.; Yu, W.; Chang, S.; Atkinson, P.M.; Ghamisi, P. AI security for geoscience and remote sensing: Challenges and future trends. IEEE Geosci. Remote Sens. Mag. 2023, 11, 60–85.
- Datcu, M.; Huang, Z.; Anghel, A.; Zhao, J.; Cacoveanu, R. Explainable, physics-aware, trustworthy artificial intelligence: A paradigm shift for synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 2023, 11, 8–25.
- Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379.
- Su, S.; Cui, Z.; Guo, W.; Zhang, Z.; Yu, W. Explainable Analysis of Deep Learning Methods for SAR Image Classification. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022.
- Wang, N.; Wang, Y.; Liu, H.; Zuo, Q.; He, J. Feature-fused SAR target discrimination using multiple convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1695–1699.
- Geng, J.; Wang, H.; Fan, J.; Ma, X. Deep supervised and contractive neural network for SAR image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2442–2459.
- Atteia, G.; Collins, M.J.; Algarni, A.D.; Samee, N.A. Deep-learning-based feature extraction approach for significant wave height prediction in SAR mode altimeter data. Remote Sens. 2022, 14, 5569.
- Yue, Z.; Gao, F.; Xiong, Q.; Wang, J.; Huang, T.; Yang, E.; Zhou, H. A novel semi-supervised convolutional neural network method for synthetic aperture radar image recognition. Cogn. Comput. 2021, 13, 795–806.
- Liu, X.; Zhang, F.; Hou, Z.; Mian, L.; Wang, Z.; Zhang, J.; Tang, J. Self-supervised learning: Generative or contrastive. IEEE Trans. Knowl. Data Eng. 2021, 35, 857–876.
- Ericsson, L.; Gouk, H.; Loy, C.C.; Hospedales, T.M. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Process. Mag. 2022, 39, 42–62.
- Jing, L.; Tian, Y. Self-supervised visual feature learning with deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4037–4058.
- Wang, Y.; Albrecht, C.M.; Braham, N.A.A.; Mou, L.; Zhu, X.X. Self-supervised learning in remote sensing: A review. arXiv 2022, arXiv:2206.13188.
- Tao, C.; Qi, J.; Lu, W.; Wang, H.; Li, H. Remote sensing image scene classification with self-supervised paradigm under limited labeled samples. IEEE Geosci. Remote Sens. Lett. 2020, 19, 8004005.
- Sun, X.; Wang, P.; Lu, W.; Zhu, Z.; Lu, X.; He, Q.; Li, J.; Rong, X.; Yang, Z.; Chang, H.; et al. RingMo: A remote sensing foundation model with masked image modeling. IEEE Trans. Geosci. Remote Sens. 2022, 61, 5612822.
- Jung, H.; Oh, Y.; Jeong, S.; Lee, C.; Jeon, T. Contrastive self-supervised learning with smoothed representation for remote sensing. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8010105.
- Ji, H.; Gao, Z.; Zhang, Y.; Wan, Y.; Li, C.; Mei, T. Few-shot scene classification of optical remote sensing images leveraging calibrated pretext tasks. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5625513.
- Markaki, S.; Panagiotakis, C. Jigsaw puzzle solving techniques and applications: A survey. Vis. Comput. 2023, 39, 4405–4421.
- Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1422–1430.
- Noroozi, M.; Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 69–84.
- Du, R.; Chang, D.; Bhunia, A.K.; Xie, J.; Ma, Z.; Song, Y.Z.; Guo, J. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 153–168.
- Li, R.; Liu, S.; Wang, G.; Liu, G.; Zeng, B. JigsawGAN: Auxiliary learning for solving jigsaw puzzles with generative adversarial networks. IEEE Trans. Image Process. 2021, 31, 513–524.
- Du, W.S. Subtraction and division operations on intuitionistic fuzzy sets derived from the Hamming distance. Inf. Sci. 2021, 571, 206–224.
- Ruby, U.; Yendapalli, V. Binary cross entropy with deep learning technique for image classification. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 5393–5397.
- Krähenbühl, P.; Koltun, V. Efficient inference in fully connected CRFs with Gaussian edge potentials. Adv. Neural Inf. Process. Syst. 2011, 24, 109–117.
- Zhang, S.; Xing, J.; Wang, X.; Fan, J. Improved YOLOX-S Marine Oil Spill Detection Based on SAR Images. In Proceedings of the 2022 12th International Conference on Information Science and Technology (ICIST), Kaifeng, China, 14–16 October 2022; pp. 184–187.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607.
Table 4. Impact of the cut image patch size on the JiuJiang data (pretext task accuracy and downstream terrain classification indicators).

| Image Patch Size | Pretext Task Accuracy | PA | Kappa | MIoU | FWIoU |
|---|---|---|---|---|---|
| 50 × 50 | 0.758 | 0.835 | 0.754 | 0.630 | 0.764 |
| 75 × 75 | 0.88 | **0.853** | **0.784** | **0.688** | **0.798** |
| 100 × 100 | 0.68 | 0.782 | 0.689 | 0.616 | 0.715 |
| 120 × 120 | **0.913** | 0.724 | 0.616 | 0.567 | 0.651 |
| 150 × 150 | 0.838 | 0.754 | 0.653 | 0.578 | 0.698 |
Table 5. Pretext task and scene classification accuracy for different permutation set sizes (JiuJiang data).

| Permutation Set Size | 50 | 80 | 100 | 125 | 150 | 175 | 200 | 300 |
|---|---|---|---|---|---|---|---|---|
| Pretext task accuracy | 0.935 | 0.885 | 0.855 | 0.844 | 0.767 | 0.709 | 0.68 | 0.59 |
| Scene classification accuracy | 0.948 | 0.945 | 0.949 | 0.944 | 0.948 | 0.95 | 0.944 | 0.936 |
Table 6. Normalization methods for the nine image blocks.

| Mode | Normalization Method |
|---|---|
| Mode I | Individual normalization for each image block |
| Mode II | All image blocks are normalized using a uniform mean and standard deviation |
| Mode III | Each image block uses a uniform mean; each block's standard deviation is handled separately |
| Mode IV | Each image block uses a uniform standard deviation; each block's mean is handled separately |
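A minimal sketch of Mode I, assuming the nine blocks of one patch are stacked along the first tensor dimension; each block is standardized with its own mean and standard deviation:

```python
import torch

def normalize_blocks_mode1(tiles: torch.Tensor, eps: float = 1e-6):
    """Mode I in Table 6: normalize each of the nine image blocks
    independently with its own mean and standard deviation.

    tiles: (9, H, W) tensor of jigsaw blocks from one image patch.
    """
    mean = tiles.mean(dim=(1, 2), keepdim=True)
    std = tiles.std(dim=(1, 2), keepdim=True)
    return (tiles - mean) / (std + eps)  # eps guards against flat blocks
```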
Table 7. Pretext and downstream task results under the different normalization methods.

| Normalization Mode | Mode I | Mode II | Mode III | Mode IV |
|---|---|---|---|---|
| Pretext task accuracy | **0.937** | 0.784 | 0.933 | 0.918 |
| Scene classification accuracy | **0.948** | 0.933 | 0.932 | 0.925 |
Table 8. Results for different amounts of labeled training data per class (JiuJiang data).

| Amount of Data per Class | PA (no pretraining) | MIoU (no pretraining) | PA (JPSSL) | MIoU (JPSSL) |
|---|---|---|---|---|
| 10 | 0.745 | 0.533 | 0.827 | 0.638 |
| 20 | 0.767 | 0.582 | **0.850** | **0.686** |
| 30 | **0.822** | **0.641** | 0.824 | 0.664 |
Table 9. Classification performance for different central prediction area sizes (JiuJiang data).

| Central Prediction Area Size | PA | Kappa | MIoU | FWIoU |
|---|---|---|---|---|
| 15 × 15 | **0.870** | **0.808** | **0.705** | **0.812** |
| 25 × 25 | 0.867 | 0.803 | 0.699 | 0.807 |
| 75 × 75 | 0.834 | 0.756 | 0.650 | 0.767 |
Table 10. Impact of the selection of labeled data (JiuJiang data).

| Data Selection | PA | Kappa | MIoU | FWIoU |
|---|---|---|---|---|
| Randomly selected data | 0.853 | 0.784 | 0.688 | 0.798 |
| Manually selected data | **0.867** | **0.803** | **0.699** | **0.807** |
Impact of the cut image patch size on the PoDelta data (pretext task accuracy and downstream terrain classification indicators).

| Image Patch Size | Pretext Task Accuracy | PA | Kappa | MIoU | FWIoU |
|---|---|---|---|---|---|
| 50 × 50 | 0.755 | 0.907 | 0.800 | 0.522 | 0.881 |
| 75 × 75 | 0.873 | 0.911 | 0.808 | 0.520 | 0.887 |
| 100 × 100 | 0.678 | 0.908 | 0.799 | 0.522 | 0.879 |
| 120 × 120 | 0.873 | **0.929** | **0.846** | 0.588 | **0.904** |
| 150 × 150 | 0.822 | 0.913 | 0.813 | 0.526 | 0.885 |
| 200 × 200 | 0.748 | 0.920 | 0.828 | **0.595** | 0.897 |
| 255 × 255 | **0.918** | 0.913 | 0.813 | 0.571 | 0.885 |
Terrain classification results on the PoDelta data.

| Method | PA | Kappa | MIoU | FWIoU | F1 (Water) | F1 (Forest) | F1 (Building) | F1 (Farmland) |
|---|---|---|---|---|---|---|---|---|
| Deeplabv3+ | 0.773 | 0.575 | 0.373 | 0.727 | 0.900 | 0.073 | 0.123 | 0.727 |
| Segformer | 0.925 | 0.838 | 0.603 | 0.901 | **0.989** | 0.287 | 0.665 | 0.868 |
| SimCLR | 0.923 | 0.834 | 0.595 | 0.889 | 0.980 | 0.323 | 0.627 | 0.870 |
| JPSSL (no pre-training) | 0.842 | 0.672 | 0.442 | 0.814 | 0.987 | **0.365** | 0.125 | 0.664 |
| JPSSL (pre-training) | 0.929 | 0.846 | 0.588 | 0.904 | 0.988 | 0.299 | 0.582 | 0.881 |
| JPSSL (pre-training + FC-CRF) | **0.948** | **0.885** | **0.662** | **0.924** | 0.988 | 0.337 | **0.766** | **0.918** |