CTHNet: A CNN–Transformer Hybrid Network for Landslide Identification in Loess Plateau Regions Using High-Resolution Remote Sensing Images
Abstract
1. Introduction
- (1) A database of 1500 landslide and non-landslide samples was constructed from HRSIs interpreted in Liulin County, a region within the Loess Plateau.
- (2) A hybrid network framework integrating the strengths of the CNN and the transformer was developed for loess landslide identification. An improved VGG-16 variant was used to construct the CNN module for detail mapping, and a multi-scale lightweight transformer (MLT) module was proposed to capture global-scale information.
- (3) The loess landslide identification results and performance of the proposed network were compared with those of CNN-based and transformer-based methods.
2. Related Works
2.1. CNN-Based Methods for Image Segmentation
2.2. Transformer-Based Methods for Image Segmentation
2.3. Remote Sensing Image Segmentation for Landslides
3. Study Area
4. Materials and Methods
4.1. Data and Preprocessing
4.2. Network Architecture
4.2.1. CNN-Transformer Hybrid as Encoder
- (1) CNN Module
- (2) Multi-Scale Lightweight Transformer Module
4.2.2. Decoder
4.2.3. Model Training
4.3. Accuracy Evaluation
5. Results
6. Discussion
6.1. Comparative Result Analysis
6.2. Ablation Analysis
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Adam | Adaptive Moment Estimation |
ALOS | Advanced Land Observing Satellite |
CNN | Convolutional Neural Network |
CTHNet | CNN–Transformer Hybrid Network |
DC-FFN | Depth-Wise Convolution Feed-Forward Network |
DEM | Digital Elevation Model |
FC | Fully Connected Layer |
FCN | Fully Convolutional Network |
FN | False Negative |
FP | False Positive |
HCNet | Hierarchical Context Network |
HRSI | High-Resolution Remote Sensing Image |
MLT | Multi-Scale Lightweight Transformer |
SC-MSA | Spatial Condensed Multi-Head Attention |
TN | True Negative |
TP | True Positive |
ViT | Vision Transformer |
WCE | Weighted Cross-Entropy Loss Function |
Data Type | Data Sources |
---|---|
VHSR Image Data | Google Earth |
Terrain Data (DEM) | ALOS |
Relevant Parameter | Parameter Value |
---|---|
Epoch | 60 |
Batch size | 8 |
Learning rate | 0.0001 |
Zero-padded convolution for patch embedding | Kernel sizes = [7, 5, 3, 3]; strides = [4, 2, 2, 1]; C = 256 |
Number of MSA heads | [1, 1, 2, 2] |
Reduction rate | [8,4,2,1] |
Optimizer | Adam |
Loss function | WCE |
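To make the patch-embedding and reduction settings in the table above more concrete, the following minimal sketch computes the feature-map side length and token counts they would imply at each stage. Several things here are assumptions rather than facts from the table: the 256 × 256 input size, a zero padding of `kernel // 2`, and a PVT-style interpretation in which the reduction rate shrinks each spatial side of the key/value map. The function names are hypothetical.

```python
# Hypothetical sketch: feature-map sizes implied by the patch-embedding settings.
# Assumed (not stated in the table): input size 256, padding = kernel // 2,
# and a reduction rate that divides each spatial side of the key/value map.

KERNEL_SIZES = [7, 5, 3, 3]      # from the parameter table
STRIDES = [4, 2, 2, 1]           # from the parameter table
REDUCTION_RATES = [8, 4, 2, 1]   # from the parameter table


def conv_output_size(size, kernel, stride):
    """Spatial size after a zero-padded convolution (padding = kernel // 2)."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


def stage_summary(input_size=256):
    """Return side length, query tokens, and key/value tokens for each stage."""
    stages = []
    size = input_size
    for kernel, stride, rate in zip(KERNEL_SIZES, STRIDES, REDUCTION_RATES):
        size = conv_output_size(size, kernel, stride)
        kv_side = size // rate  # assumed spatial condensation of keys/values
        stages.append({
            "side": size,
            "query_tokens": size * size,
            "kv_tokens": kv_side * kv_side,
        })
    return stages


if __name__ == "__main__":
    for index, stage in enumerate(stage_summary(), start=1):
        print(f"stage {index}: {stage}")
```

Under these assumptions, the decreasing reduction rates keep the number of key/value tokens small in the early, high-resolution stages, which is where full self-attention would be most expensive.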
Predicted Value | Truth Value: Positive | Truth Value: Negative |
---|---|---|
Positive | TP | FP (Type I error) |
Negative | FN (Type II error) | TN |
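The four evaluation metrics reported in the result tables can be derived from the confusion-matrix counts. The sketch below assumes the standard pixel-wise definitions of precision, recall, F1-score, and IoU for the landslide class; the function names and the example counts are hypothetical and are not taken from the paper.

```python
# Minimal sketch of pixel-wise metrics from confusion-matrix counts.
# Assumes the standard definitions; helper names are illustrative only.

def confusion_counts(predicted, truth):
    """Count TP, FP, FN, TN over two equal-length sequences of 0/1 labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(predicted, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(tp, fp, fn):
    """Return precision, recall, F1, and IoU as fractions in [0, 1]."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    union = tp + fp + fn
    iou = tp / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(segmentation_metrics(tp=80, fp=20, fn=40))
```

TN does not enter any of the four metrics, which is why they remain informative when non-landslide pixels dominate the image.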
Methods | Precision (%) | Recall (%) | F1-Score (%) | IoU (%) |
---|---|---|---|---|
U-Net | 76.26 | 56.92 | 65.19 | 48.35 |
DeepLabV2 | 74.56 | 60.01 | 66.50 | 49.81 |
Attention-UNet | 79.16 | 49.53 | 60.93 | 43.82 |
HCNet | 74.31 | 64.39 | 69.00 | 52.67 |
CTHNet | 80.42 | 66.50 | 72.81 | 57.23 |
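As a quick sanity check on the table above, F1 can be recomputed from precision and recall, and IoU follows from F1 through the identity IoU = F1 / (2 − F1). That identity holds only when all metrics come from the same pooled pixel counts, which is an assumption here. The sketch below applies both relations to the reported rows; the tolerance of 0.05 percentage points is an arbitrary allowance for rounding, and the function names are hypothetical.

```python
# Consistency check for the reported rows: (precision, recall, F1, IoU) in %.
# Values are copied from the CNN-based comparison table.
REPORTED = {
    "U-Net": (76.26, 56.92, 65.19, 48.35),
    "DeepLabV2": (74.56, 60.01, 66.50, 49.81),
    "Attention-UNet": (79.16, 49.53, 60.93, 43.82),
    "HCNet": (74.31, 64.39, 69.00, 52.67),
    "CTHNet": (80.42, 66.50, 72.81, 57.23),
}


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    return 2 * precision * recall / (precision + recall)


def iou_from_f1(f1_percent):
    """IoU in percent from F1 in percent, using IoU = F1 / (2 - F1)."""
    return 100 * f1_percent / (200 - f1_percent)


def check_rows(rows, tolerance=0.05):
    """Map each method to True if its F1 and IoU match the recomputed values."""
    report = {}
    for name, (precision, recall, f1, iou) in rows.items():
        f1_calc = f1_from_pr(precision, recall)
        iou_calc = iou_from_f1(f1_calc)
        report[name] = (abs(f1_calc - f1) <= tolerance
                        and abs(iou_calc - iou) <= tolerance)
    return report


if __name__ == "__main__":
    for name, ok in check_rows(REPORTED).items():
        print(f"{name}: {'consistent' if ok else 'check rounding'}")
```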
Methods | Precision (%) | Recall (%) | F1-Score (%) | IoU (%) |
---|---|---|---|---|
SETR | 66.68 | 47.17 | 55.25 | 38.17 |
Swin-UNet | 70.94 | 60.10 | 65.07 | 48.22 |
TransUNet | 75.87 | 62.62 | 68.61 | 52.22 |
SegFormer | 61.94 | 57.58 | 59.68 | 42.53 |
CTHNet | 80.42 | 66.50 | 72.81 | 57.23 |
Methods | Precision (%) | Recall (%) | F1-Score (%) | IoU (%) |
---|---|---|---|---|
Scheme 1 | 74.96 | 61.59 | 67.62 | 51.08 |
Scheme 2 | 74.92 | 63.80 | 68.91 | 52.57 |
Scheme 3 | 80.42 | 66.50 | 72.81 | 57.23 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, J.; Zhang, J.; Fu, Y. CTHNet: A CNN–Transformer Hybrid Network for Landslide Identification in Loess Plateau Regions Using High-Resolution Remote Sensing Images. Sensors 2025, 25, 273. https://doi.org/10.3390/s25010273