1. Introduction
Semi-arid shrublands, which cover over 18% of the Earth’s terrestrial surface [1,2], play a crucial role in global carbon cycling, biodiversity conservation, and soil stabilization. However, these ecosystems are highly vulnerable to climate change and desertification, making their monitoring essential for sustainable land management [3]. Effective monitoring requires the accurate delineation of shrub crown boundaries, which is vital for assessing vegetation dynamics, ecosystem health, and conservation strategies. Despite their importance, semi-arid shrublands remain difficult to map with traditional remote sensing methods. Techniques such as object-based image analysis (OBIA) and edge detection algorithms often struggle to detect shrubs because of their small size and spectral similarity to surrounding grasses and low-lying plants [4]. These challenges are further exacerbated by seasonal spectral variations and limited annotated training data, making precise shrub delineation difficult [5].
Recent advances in artificial intelligence (AI) have transformed remote sensing by improving feature extraction and segmentation accuracy in high-resolution imagery [6]. Unlike traditional machine learning, which relies on extensive feature engineering and preprocessing for unstructured data (e.g., images, sound), deep learning methods such as convolutional neural networks (CNNs) automatically learn hierarchical features, improving object recognition in complex landscapes. Several studies have considered deep learning as an alternative to OBIA for classification and object detection. Guirado et al. [7] applied both OBIA and CNN methods to the object detection of the Ziziphus lotus shrub, achieving an average F1-score of 95% and showing that the best CNN detector attained up to 12% better precision, up to 30% better recall, and up to 20% better balance between precision and recall than the OBIA method. James & Bradshaw [8] demonstrated the feasibility of combining deep learning and drone technology for real-time plant species detection on a farm southwest of Grahamstown, in the Eastern Cape, South Africa, achieving an F1-score of 83% for shrub detection and highlighting the potential of deep learning for efficient and scalable vegetation monitoring. Khaldi et al. [9] presented a deep learning approach for the individual mapping of large polymorphic shrubs in high-mountain ecosystems of the Sierra Nevada National Park, on the southern fringe of the Iberian Peninsula, using high-resolution Google Earth satellite imagery (13 cm resolution), and achieved an F1-score of 87.87% for shrub delineation against photo-interpreted reference data, demonstrating deep learning’s effectiveness for regional-scale detection of medium-to-large shrubs.
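For reference, the F1-score used to compare these studies is the harmonic mean of precision and recall, computed from true positives (TP), false positives (FP), and false negatives (FN) at the object or pixel level:

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_{1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\]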
Despite the notable results achieved with deep learning in shrub detection, training deep learning models typically requires large, annotated datasets [10]. This requirement is often impractical in remote sensing applications, where field data collection and manual labeling are labor-intensive and resource-demanding. Consequently, there is a growing need for methodologies that enable robust model performance using limited training data. Moreover, in the broader field of computer vision, small object detection remains a well-known challenge because the limited number of pixels representing small objects makes them more susceptible to misclassification or being overlooked [11]. This issue is particularly evident in shrublands, where shrubs occupy only a small portion of high-altitude unmanned aerial system (UAS) imagery, further complicating detection. Accurate shrub delineation therefore remains challenging, as current methods have limited ability to handle variability in shrub morphology, density, and environmental conditions.
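To make the scale issue concrete, the short sketch below estimates how few pixels a small shrub crown occupies in an orthorectified UAS tile. The ground sampling distance (GSD), crown areas, and tile size are illustrative assumptions, not values measured in this study.

```python
# Rough, illustrative estimate of how many pixels a shrub crown covers in UAS
# imagery. The GSD, crown areas, and tile size below are assumed for
# illustration only; they are not parameters of this study.

def crown_pixels(crown_area_m2: float, gsd_m: float) -> int:
    """Approximate pixel count of a crown with the given area at a given GSD."""
    return round(crown_area_m2 / gsd_m ** 2)

TILE_PIXELS = 512 * 512  # pixels in a hypothetical 512 x 512 training tile

for area in (0.5, 1.0, 2.0):                 # small crown areas in square meters
    px = crown_pixels(area, gsd_m=0.03)      # assume roughly 3 cm/pixel GSD
    share = 100 * px / TILE_PIXELS
    print(f"{area:.1f} m^2 crown ~ {px} px ({share:.2f}% of the tile)")
```

Even under these generous assumptions, a sub-square-meter crown fills well under one percent of a single tile, which helps explain why such objects are easily lost during downsampling inside a deep network.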
To address the challenges posed by complex environmental conditions and limited training data, transfer learning and data augmentation have proven effective in alleviating data constraints [12,13,14]. Specifically, transfer learning reduces reliance on large, labeled datasets by transferring knowledge from pre-trained models, while data augmentation improves model robustness, prevents overfitting, and expands limited datasets by generating synthetic samples through transformations of the original data. This study explored the potential of using pre-trained models that were initially trained for tree crown delineation [15,16] for shrub detection. Given the shared structural traits between trees and shrubs, such as the presence of canopies and vegetative textures, these models offer a promising starting point for accurate shrub delineation. However, a key challenge lies in the domain shift between the source domain (trees) and the target domain (shrubs). Domain shift refers to the difference in data distributions between the domains, which can degrade model performance when features learned in the source domain do not generalize well to the target [17]. In this context, morphological differences are particularly relevant: trees often exhibit larger, more distinct, and vertically prominent crowns, whereas shrubs have smaller, irregular, and more horizontally distributed canopies that often overlap and blend with background elements such as soil or grass. These structural differences affect spatial, spectral, and contextual features, potentially reducing the effectiveness of the pre-trained representations. To address this, our study employed fine-tuning to refine these models. Fine-tuning with labeled shrubland data helps the model recalibrate its feature detectors toward shrub-specific traits, such as irregular crown structures, overlapping canopies, and increased background noise [18]. This targeted adaptation is essential for overcoming domain shift and achieving accurate shrub delineation in complex environments.
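As an illustration of the augmentation step, the sketch below applies geometric and radiometric transformations jointly to an image tile and its shrub mask using the Albumentations library. The specific transforms and probabilities are example choices, not the exact pipeline used in this study.

```python
# Illustrative image/mask augmentation for a shrub segmentation dataset.
# The transform list and probabilities are example settings, not the exact
# augmentation pipeline used in this study.
import albumentations as A

augment = A.Compose([
    A.HorizontalFlip(p=0.5),            # geometric transforms are applied
    A.VerticalFlip(p=0.5),              #   identically to image and mask
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),  # radiometric jitter (image only)
])

def augment_pair(image, mask):
    """image: H x W x 3 uint8 RGB tile; mask: H x W binary shrub mask."""
    out = augment(image=image, mask=mask)
    return out["image"], out["mask"]
```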
Although previous research has addressed shrub detection, much of it has focused on local-scale applications that demand extensive data annotation and significant computational resources. The objective of this study was to test whether transfer learning can enhance the generalizability and efficiency of shrub crown detection by reducing annotation and computational demands while maintaining competitive accuracy. To address this objective, we investigated whether pre-trained models from related domains can be effectively adapted to shrub delineation through fine-tuning, evaluating the performance of the transferred models at two fine-tuning levels, as sketched below. The first level involves feature extraction, where only the final classification layers are retrained, while the second level applies more extensive fine-tuning, with additional layers unfrozen for retraining.
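A minimal PyTorch sketch of the two fine-tuning levels follows. The attribute names model.head and model.backbone.layer4 are placeholders for illustration; the actual layer grouping depends on the architecture being transferred (e.g., Attention U-Net or Mask R-CNN).

```python
# Minimal sketch of the two fine-tuning levels, assuming a PyTorch model with
# a `backbone` feature extractor and a final prediction `head`. These attribute
# names are illustrative placeholders, not the study's actual implementation.
import torch

def configure_finetuning(model: torch.nn.Module, level: int, lr: float = 1e-4):
    """Level 1: feature extraction (retrain only the final layers).
    Level 2: deeper fine-tuning (also unfreeze the last backbone stage)."""
    for p in model.parameters():               # start with everything frozen
        p.requires_grad = False

    for p in model.head.parameters():          # both levels retrain the head
        p.requires_grad = True

    if level >= 2:                             # level 2 unfreezes deeper layers too
        for p in model.backbone.layer4.parameters():
            p.requires_grad = True

    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)  # optimize only unfrozen weights
```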
5. Conclusions
This study evaluated the transferability of pre-trained deep learning models across ecologically similar but structurally different areas for shrub segmentation using UAS-based remote sensing imagery. Our results highlight that fine-tuning is essential, especially when variations in shrub size, texture, and seasonal conditions are present. Of the two tested models, Attention U-Net and Mask R-CNN, we found that Attention U-Net is more effective when training resources are limited, offering better generalization with minimal annotated data; it is particularly suited for practitioners seeking rapid deployment with limited annotation effort. Both models performed well in detecting mature shrubs during the late summer (July to September). However, segmentation accuracy dropped significantly for small shrubs with crown areas of less than 2 m² when imagery was captured at a 120 m flight elevation. These small objects often lacked distinct spatial features at this scale, leading to poor recall for both models. The challenge was even more pronounced for early-season (April to June) vegetation, where low contrast and undeveloped crowns made detection difficult. Future work should consider multi-scale feature fusion, denser anchor box strategies, and hybrid CNN–Transformer designs to improve small object segmentation. Additionally, integrating multispectral bands beyond RGB, such as near-infrared (NIR) and red-edge, could improve model performance, especially for small or early-stage shrubs that are difficult to resolve using RGB alone. Flying at lower altitudes or using higher-resolution sensors could also help overcome the current limitations in small shrub detection. These findings emphasize the need for improved model architectures and imaging strategies to accurately detect and monitor fine-scale vegetation structures, particularly small or early-stage shrubs, in complex and dynamic ecosystems.