1. Introduction
Synthetic aperture radar (SAR) is an active microwave remote sensing imaging radar that offers high resolution and the ability to monitor continuously within a short imaging period. It is largely unaffected by weather and can observe ground targets independent of sunlight illumination, so its applications extend across military and civilian domains [1]. The SAR terrain classification task, which predicts semantic labels pixel by pixel, finds extensive use in environmental protection [2], disaster monitoring [3], ocean observation [4], resource preservation [5], land-cover analysis [6], precision agriculture [7], urban area detection [8], and geographic mapping [9].
In recent years, with the rapid growth in the number of available SAR images and the development of artificial intelligence technology, the exponentially increasing demand for SAR image interpretation, especially for terrain classification, has pushed classification algorithms to shift toward automatic, end-to-end, deep learning-based [10] methods. Existing deep learning (DL) models integrate the essence of traditional methods: they can learn multi-scale and multi-level feature representations, such as FPN [11], and can effectively capture the contextual information of the image, such as ViT [12]. With their powerful feature representation ability, DL methods have made great progress in the field of SAR image terrain classification [13,14,15]. However, the progress of deep learning methods in this domain remains limited in two respects. On the one hand, unlike optical images, which are highly consistent with human visual cognition, SAR images display many abstract effects such as foreshortening, shadow, and layover because of the active microwave imaging process. This makes SAR image labels arduous, infrequent, and costly to acquire. On the other hand, owing to the distinctive imaging mechanism of SAR, radar imaging parameters have a pronounced influence on SAR images, resulting in variations in data distribution, gray levels, and texture across different sensors, bands, polarizations, resolutions, and angles of incidence.
Figure 1 presents SAR images produced at various resolutions (the imaging area is Hanzhong, Shaanxi) and polarizations, where the data for the different polarizations are taken from [16]. Notably, substantial variations in grayscale are apparent across these SAR images, particularly across distinct imaging modes. Such significant domain differences weaken classification performance during model migration. As a result, the training and testing of deep methods are often limited to the same scene, resulting in a deficiency in model generalization. Therefore, the focus of this paper is to circumvent the constraints imposed by the scarcity of manually labeled samples and by the significant domain differences that challenge prevailing deep terrain classification algorithms.
From the above description, the complexity of SAR data and its strong dependence on imaging parameters hinder model generalizability and feature learning. As a special case of transfer learning (TL), domain adaptation (DA) uses the label learning of the source domain to execute new tasks in the target domain. As a paradigm of cross-domain learning, deep DA methods, particularly unsupervised DA (UDA), play a crucial role in enhancing the generalizability of SAR image terrain classification methods. UDA depends solely on source domain labels for model training, while autonomously adapting to the feature distribution of the target domain. This approach significantly mitigates the requirement for manual labeling, minimizes inconsistencies across domain features, and leverages commonalities between the source and target domains. It has been extensively investigated and applied in both the natural image and remote sensing domains. However, existing UDA methods tend to consider only domain consistency in feature space, leading to a limited ability to perceive uncommon terrain features. Li et al. [17] proposed the BDL framework, which reduces domain differences through bidirectional learning in two sequential steps of image translation and segmentation adaptation in both directions. It is essential to highlight that the quality of the image translation network in this framework significantly influences the segmentation quality. The study demonstrates that maintaining consistency among low-level features within the image space helps the model capture fundamental terrain features, consequently enhancing domain alignment within the feature space.
Drawing inspiration from the above research, a multi-step unsupervised domain adaptive SAR image terrain classification framework based on style transfer and domain metrics (STDM-UDA) is proposed for unlabeled scenes, taking into account the unique characteristics and present state of SAR images. In STDM-UDA, a two-step independent domain adaptive network is constructed to reduce domain differences in image space and semantic space and to migrate annotation information from the source domain, thereby achieving terrain classification in the target domain. In the first step, the style transfer network migrates SAR image style characteristics from the source to the target domain, diminishing disparities in low-level statistical attributes such as brightness and contrast among images. This narrows the domain gaps within the image space and aids the network in acquiring shared feature representations. In the second step, the network extracts semantic features from the images and employs adversarial learning and domain metrics to align the translated source and target domains within the feature space. STDM-UDA enhances model generalization and accomplishes target domain terrain classification by leveraging source domain information, thus obviating the need for target domain image annotation.
In summary, the main contributions of this paper are as follows:
A multi-step unsupervised domain adaptive SAR image terrain classification model framework based on Style Transformation and Domain Metrics (STDM-UDA) is proposed. The framework reduces the domain differences in both image space and feature space through two independent domain adaptation networks to enhance the generalization of the model.
STDM-UDA transfers source domain knowledge to an unlabeled target domain, avoiding the need for labeled data in the target domain.
The effectiveness of STDM-UDA is convincingly demonstrated by the terrain classification results in three high-resolution broad scenes without labels.
The rest of this paper is organized as follows. The related work is introduced in Section 2. Section 3 details the method adopted in this paper. Section 4 presents the classification experimental results on SAR images. Finally, discussion and conclusions are summarized in Section 5 and Section 6, respectively.
3. Methods
The technical details of STDM-UDA are described in this section. The complete structure of the proposed framework is shown in Figure 2, in which STDM-UDA consists of two independent steps based on adversarial DA: an image style transfer network and an adaptive segmentation network. The networks in both steps employ the generative adversarial network (GAN) training strategy for domain adaptive learning, and the training process for each network is conducted independently. In the first step, the network is directed to achieve SAR image style transfer from the source to the target domain. This process diminishes differences between the two domains at the low level of image space, encompassing brightness, contrast, dynamic range, and texture. Consequently, it eases the visual discrepancies between domains and facilitates knowledge transfer. The generated intermediate domain images, along with the target domain images, are used to train the adaptive image segmentation network in the second step, and the category of each pixel of the broad scenes in the target domain is acquired during testing.
In the remainder of this section, the data preprocessing of raw SAR images is described in Section 3.1. The image translation network and adaptive segmentation network are detailed in Section 3.2 and Section 3.3, respectively.
3.1. Data Preprocessing
This section describes the preprocessing of raw SAR images of a broad scene. SAR images have a large dynamic range due to their spatial resolution. As a result, raw SAR images generally consist of 16-bit unsigned integer data with a highly asymmetric distribution, where the majority of pixels are located in the low amplitude range (0 to 500). Standard CNNs cannot handle such a large dynamic range, so dynamic range compression is necessary. Applying a simple linear stretch to the SAR images compresses the majority of pixels into a narrow range of small gray levels, leading to a significant loss of detail: the image cannot be accurately displayed, and the recognition of the neural network is misdirected. To mitigate this issue, we employ linear stretching with truncation for processing raw SAR images. Specifically, the gray levels of the raw SAR image and their frequencies of occurrence are counted in ascending order and accumulated into a distribution function. Once the distribution function accumulates to a threshold, the gray values of pixels with higher gray levels are overwritten by 255, and the remaining pixel gray values are linearly stretched between 0 and 255. The definition is as follows:

$$O(i,j)=\begin{cases}255, & I(i,j)\ge g_{t},\\[4pt] \dfrac{I(i,j)}{g_{t}}\times 255, & I(i,j)< g_{t},\end{cases}$$

where $g_{t}$ represents the truncated gray value, and $I(i,j)$ and $O(i,j)$ represent the input and output gray levels at pixel point $(i,j)$, respectively.
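As an illustration, a minimal NumPy sketch of this truncated linear stretch is given below; the cumulative-distribution threshold (98% here) is an assumed value, since the exact percentile is not specified above.

```python
import numpy as np

def truncated_linear_stretch(img, threshold=0.98):
    """Compress a 16-bit SAR amplitude image to 8 bits.

    Gray levels are accumulated in ascending order; the gray value g_t at
    which the cumulative distribution reaches `threshold` is the truncation
    point. Brighter pixels are overwritten with 255 and the remaining range
    is stretched linearly to [0, 255].
    """
    g_t = np.percentile(img, threshold * 100)       # truncated gray value g_t
    out = np.clip(img.astype(np.float64) / g_t, 0.0, 1.0) * 255.0
    return out.astype(np.uint8)
```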
The dynamic-range-transformed SAR image is divided into a series of subimages using dilated sampling. Figure 3 illustrates the difference between direct and dilated sampling. Unlike direct sampling, the dilated sampling approach discards the prediction results of boundary pixels, which are deprived of contextual information. Although this increases the prediction overhead, it significantly enhances the consistency of the predicted terrain at the boundaries of the subimage prediction results.
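The sketch below shows one way to implement dilated sampling at prediction time: tiles overlap by a margin, and only each tile's central, fully-contextualized region is written back into the scene-level prediction. The tile size and margin are illustrative values, not taken from the paper.

```python
import numpy as np

def predict_dilated(image, predict_tile, tile=512, margin=64):
    """Predict a broad scene tile-by-tile, discarding context-starved borders.

    `predict_tile` maps a (tile, tile) array to a (tile, tile) label map.
    Tiles are sampled with stride (tile - 2*margin), so neighbours overlap;
    only the central region of each prediction is kept, except where a tile
    touches the scene border and no inner neighbour can supply the margin.
    """
    h, w = image.shape[:2]
    assert h >= tile and w >= tile, "scene must be at least one tile large"
    stride = tile - 2 * margin
    labels = np.zeros((h, w), dtype=np.int64)
    tops = sorted(set(list(range(0, h - tile + 1, stride)) + [h - tile]))
    lefts = sorted(set(list(range(0, w - tile + 1, stride)) + [w - tile]))
    for top in tops:
        for left in lefts:
            pred = predict_tile(image[top:top + tile, left:left + tile])
            # Keep the margin only where the tile touches the scene border.
            y0 = 0 if top == 0 else margin
            x0 = 0 if left == 0 else margin
            y1 = tile if top == h - tile else tile - margin
            x1 = tile if left == w - tile else tile - margin
            labels[top + y0:top + y1, left + x0:left + x1] = pred[y0:y1, x0:x1]
    return labels
```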
3.2. Image Style Transfer Network
The GAN-based image style transfer network (STN) is used as the first step of STDM-UDA, acting as an adversarial domain adaptive network that achieves the directional transfer of image style from the source to the target domain. Reducing the low-level domain differences between the two domains in image space provides a good starting point for the second step, the adaptive segmentation network.
In STDM-UDA, we adopt CycleGAN [38] as the unpaired SAR image style transfer model. Rather than necessitating a direct correspondence between SAR images in the source and target domains, CycleGAN encourages the generator to achieve SAR image style transfer through adversarial learning, with the discriminator evaluating the stylistic similarity between the generated (translated) image and the target domain image. Its structure is shown in Figure 4 and consists of two generator-discriminator pairs. The generators $G$ and $F$ establish a bi-directional mapping relationship between the images of the source domain $S$ and the target domain $T$. The discriminators $D_{S}$ and $D_{T}$ separately discriminate between the source images $s$ and the translated images $F(t)$, and between the target images $t$ and the translated images $G(s)$. For convenience, $X_{S}$ is defined as the sample space of the source domain $S$, and $X_{T}$ as the sample space of the target domain $T$. In summary, the loss function of STN has the following parts: an adversarial loss $\mathcal{L}_{GAN}$, a cycle consistency loss $\mathcal{L}_{cyc}$, and an identity loss $\mathcal{L}_{idt}$.
3.2.1. Adversarial Loss
The generator $G$ learns the mapping from $S$ to $T$, and $F$ learns the mapping from $T$ to $S$. The adversarial loss [40] is applied to both mapping functions so that the mapped data distribution is close to the real data distribution of the corresponding domain.
The adversarial loss of $G$ is defined as follows:

$$\mathcal{L}_{GAN}(G,D_{T},S,T)=\mathbb{E}_{t\sim X_{T}}\left[\log D_{T}(t)\right]+\mathbb{E}_{s\sim X_{S}}\left[\log\left(1-D_{T}(G(s))\right)\right],$$

where $G$ aims to generate plausible images of $T$, and $D_{T}$ is dedicated to distinguishing between real and generated images of $T$. The optimization objective for this loss is therefore $\min_{G}\max_{D_{T}}\mathcal{L}_{GAN}(G,D_{T},S,T)$.
The adversarial loss of $F$ is

$$\mathcal{L}_{GAN}(F,D_{S},T,S)=\mathbb{E}_{s\sim X_{S}}\left[\log D_{S}(s)\right]+\mathbb{E}_{t\sim X_{T}}\left[\log\left(1-D_{S}(F(t))\right)\right].$$

Similar to the process for $G$, the goal is $\min_{F}\max_{D_{S}}\mathcal{L}_{GAN}(F,D_{S},T,S)$.
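For concreteness, the sketch below expresses this min-max game for the mapping $G$ in PyTorch. It uses the least-squares form adopted by the public CycleGAN implementation rather than the log form above (both realize the same adversarial objective); the names and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as nnF

def adversarial_step(G, D_T, s, t):
    """One evaluation of the adversarial losses for the mapping G: S -> T."""
    fake_t = G(s)                      # translated source image G(s)
    # Generator objective: make D_T score G(s) as a real target image.
    pred = D_T(fake_t)
    loss_G = nnF.mse_loss(pred, torch.ones_like(pred))
    # Discriminator objective: score real t high and G(s) low.
    pred_real, pred_fake = D_T(t), D_T(fake_t.detach())
    loss_D_T = 0.5 * (nnF.mse_loss(pred_real, torch.ones_like(pred_real))
                      + nnF.mse_loss(pred_fake, torch.zeros_like(pred_fake)))
    return loss_G, loss_D_T
```

The losses for $F$ and $D_{S}$ are obtained by swapping the roles of the two domains.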
3.2.2. Cycle Consistency Loss
Relying solely on an adversarial loss does not guarantee that the generator will consistently map the input to the desired output, especially in the case of larger models. To ensure that the learned $G$ and $F$ remain consistent without contradicting each other, the cycle consistency loss [38] is added, which aims for $G(F(t))$ to resemble $t$ and $F(G(s))$ to resemble $s$ as closely as possible. The cycle consistency loss incorporates the $L_{1}$ loss to measure the consistency between SAR images during the learning of the two mappings. It is defined as follows:

$$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{s\sim X_{S}}\left[\left\|F(G(s))-s\right\|_{1}\right]+\mathbb{E}_{t\sim X_{T}}\left[\left\|G(F(t))-t\right\|_{1}\right].$$

Under the constraint of the cycle consistency loss, both $G$ and $F$ satisfy forward cycle consistency for image $s$: the image style transfer cycle brings $s$ back to the original image after one cycle ($s\rightarrow G(s)\rightarrow F(G(s))\approx s$). The same holds for image $t$.
3.2.3. Identity Loss
The identity loss [38] is designed to train the network to recognize image styles: when real samples of a generator's output domain are used as input, the generator's output should be an identity mapping. Its expression is as follows:

$$\mathcal{L}_{idt}(G,F)=\mathbb{E}_{t\sim X_{T}}\left[\left\|G(t)-t\right\|_{1}\right]+\mathbb{E}_{s\sim X_{S}}\left[\left\|F(s)-s\right\|_{1}\right].$$
3.2.4. Full Objective
The full objective of the image translation network is

$$\mathcal{L}_{STN}(G,F,D_{S},D_{T})=\mathcal{L}_{GAN}(G,D_{T},S,T)+\mathcal{L}_{GAN}(F,D_{S},T,S)+\lambda\mathcal{L}_{cyc}(G,F)+\mathcal{L}_{idt}(G,F),$$

where $\lambda$ is the cycle consistency loss coefficient. The ultimate goal is to optimize

$$G^{*},F^{*}=\arg\min_{G,F}\max_{D_{S},D_{T}}\mathcal{L}_{STN}(G,F,D_{S},D_{T}).$$
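Assuming the standard CycleGAN building blocks, the full STN objective can be assembled as in the sketch below. The weight `lam = 10` matches the CycleGAN default; the coefficients actually used in the paper are not stated here and are therefore assumptions.

```python
import torch
import torch.nn.functional as nnF

def stn_objective(G, F_net, D_S, D_T, s, t, lam=10.0):
    """Full STN loss: adversarial + lam * cycle consistency + identity."""
    mse, l1 = nnF.mse_loss, nnF.l1_loss
    fake_t, fake_s = G(s), F_net(t)
    # Adversarial terms (generator side, least-squares form).
    p_t, p_s = D_T(fake_t), D_S(fake_s)
    adv = mse(p_t, torch.ones_like(p_t)) + mse(p_s, torch.ones_like(p_s))
    # Cycle consistency: F(G(s)) ~ s and G(F(t)) ~ t, measured with L1.
    cyc = l1(F_net(fake_t), s) + l1(G(fake_s), t)
    # Identity: a generator fed an image of its output domain should be a no-op.
    idt = l1(G(t), t) + l1(F_net(s), s)
    return adv + lam * cyc + idt
```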
3.3. Adversarial Adaptive Segmentation Network
As shown in the second stage in Figure 2, based on the intermediate domain outputs of the first step, the adversarial domain adaptive segmentation network aligns the semantic features of the two domains and outputs the terrain classification results. The whole training process of STDM-UDA is listed in Algorithm 1.
Algorithm 1 The training process of STDM-UDA.
1: Input: the source domain images $X_{S}$, the target domain images $X_{T}$, and the source domain labels.
2: Step 1:
3: Initialize the image translation network.
4: for number of image translation iterations do
5:   train the image translation network with Formula (7).
6: end for
7: Get the translated source domain images by the generator $G$.
8: Step 2:
9: Initialize the $M$ and the $D$.
10: for number of segmentation iterations do
11:   train $M$ with Formula (10).
12:   train $D$ with Formula (11).
13: end for
The adaptive classification network consists of a segmentation network $M$ and a domain discrimination network $D$. $M$ outputs the segmentation probability map $P_{s\rightarrow t}$ of the translated source domain image and the segmentation probability map $P_{t}$ of the target domain image. The predicted segmentation map of the image is obtained by applying a softmax operation to $P_{s\rightarrow t}$ and $P_{t}$. $D$ discriminates the domain features learned by $M$ and measures the similarity between the two domain distributions to reduce the difference between the source and target domains. On this basis, we add a measure of domain feature similarity as an explicit metric to complement the adversarial loss and further facilitate the learning of domain-invariant features by $M$.
The objective function of $M$ mainly consists of the adversarial loss and the segmentation loss [41], which are expressed as

$$\mathcal{L}_{seg}(M)=-\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C}\mathbb{1}\left[y_{s}^{(h,w)}=c\right]\log P_{s\rightarrow t}^{(h,w,c)},\qquad \mathcal{L}_{adv}(M)=-\sum_{h=1}^{H}\sum_{w=1}^{W}\log D\left(P_{t}\right)^{(h,w)},$$

where $y_{s}$ is the label map of $s$, $C$ is the number of classes, and $\mathbb{1}[\cdot]$ is an indicator function with a value of 1 if the condition is true and 0 otherwise. $H$ and $W$ are the height and width of the output probability map, and $P_{s\rightarrow t}$ is the translated source domain probability map output by $M$, defined as $P_{s\rightarrow t}=M(G(s))$. The full objective of $M$ is

$$\mathcal{L}_{M}=\mathcal{L}_{seg}(M)+\lambda_{adv}\mathcal{L}_{adv}(M)+\lambda_{sim}\,d\left(P_{s\rightarrow t},P_{t}\right),$$

where $d(\cdot,\cdot)$ represents a function that measures the similarity of features or distributions, here a linear combination of the KL divergence and SSIM [42]. $\lambda_{adv}$ and $\lambda_{sim}$ represent the adversarial loss coefficient and the similarity loss coefficient, respectively.
The objective function of $D$ is the adversarial loss, which is expressed as follows:

$$\mathcal{L}_{D}=-\sum_{h=1}^{H}\sum_{w=1}^{W}\left[\log D\left(P_{s\rightarrow t}\right)^{(h,w)}+\log\left(1-D\left(P_{t}\right)^{(h,w)}\right)\right],$$

where $D$ is trained to assign the translated source predictions to the source class and the target predictions to the target class.
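Putting the two objectives together, a PyTorch sketch of one training step of the adaptive segmentation stage might look as follows. The loss coefficients and the KL/SSIM mixture weights are assumptions (only the loss structure is given above), and `ssim` stands in for any differentiable SSIM implementation, e.g. the third-party pytorch_msssim package.

```python
import torch
import torch.nn.functional as nnF

def segmentation_losses(M, D, s2t, y_s, t, lam_adv=1e-3, lam_sim=0.1):
    """Losses of M and D for one batch: segmentation + adversarial + metric."""
    logits_s2t, logits_t = M(s2t), M(t)        # s2t: translated source batch
    P_s2t, P_t = logits_s2t.softmax(1), logits_t.softmax(1)
    bce = nnF.binary_cross_entropy_with_logits

    # Supervised segmentation loss on the translated source domain.
    loss_seg = nnF.cross_entropy(logits_s2t, y_s)
    # Adversarial loss: fool D into scoring target predictions as source-like.
    d_t = D(P_t)
    loss_adv = bce(d_t, torch.ones_like(d_t))
    # Explicit domain metric d(.,.): KL divergence plus an SSIM term.
    loss_sim = nnF.kl_div(P_t.log(), P_s2t, reduction="batchmean")
    # loss_sim += 1.0 - ssim(P_s2t, P_t)       # requires an SSIM implementation
    loss_M = loss_seg + lam_adv * loss_adv + lam_sim * loss_sim

    # Discriminator: translated-source maps labelled 1, target maps 0.
    d_s2t, d_t = D(P_s2t.detach()), D(P_t.detach())
    loss_D = 0.5 * (bce(d_s2t, torch.ones_like(d_s2t))
                    + bce(d_t, torch.zeros_like(d_t)))
    return loss_M, loss_D
```

In terms of Algorithm 1, `loss_M` corresponds to training $M$ with Formula (10) and `loss_D` to training $D$ with Formula (11).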
Additionally, in the testing phase, we employ dilated prediction to mitigate errors caused by insufficient contextual information for pixels at the edges of image blocks.
5. Discussion
In this section, the terrain classification results are critically analyzed and discussed in order to place them in a broader context.
STDM-UDA comprises two distinct components. The style transfer network considers only the coherence of low-level statistics across images from the source and target domains, allowing flexible selection of domains. The domain adaptive segmentation network, by contrast, must not only reconcile domain disparities in the semantic feature space of an image but also accomplish image feature classification, thereby restricting the choice of both source and target domains. As depicted in the ground truth of PoDelta in Figure 6, forest and buildings are notably scarce compared to water and farmland in this scene. When PoDelta is used as the source domain, the qualitative classification outcomes on Rosenheim and JiuJiang indicate that this substantial class imbalance within the source domain significantly diminishes the model's capability to discern the boundaries of these sparse terrains in the target domain. Conversely, with the Pohang source data, characterized by a more equitable terrain distribution, STDM-UDA exhibits a marked advantage in both qualitative and quantitative classification evaluations. These analyses demonstrate that STDM-UDA effectively transfers terrain information from source domains with balanced terrain distributions, while still retaining a certain sparse-terrain recognition capability on an unbalanced source.
Moreover, the efficacy of style transfer and domain metrics in STDM-UDA is showcased through both qualitative and quantitative terrain classification across the above three experimental scenes in Section 4.4. The findings reveal that, under single-step DA, features extracted by the model solely in feature space are inevitably influenced by low-level statistical disparities among image domains, diminishing the model's capability to discern distinct terrain edges. Introducing domain metrics atop this framework can enhance the model's edge perception, yet it may diminish the feature disparities between rare and other terrains. STDM-UDA incorporates style transfer and extends single-step DA into multi-step DA. This approach increases the resemblance between source- and target-domain SAR images in image space, thus mitigating low-level feature interference and yielding superior classification performance across the experiment's entire dataset. This indicates that aligning domain differences in both image space and feature space plays a crucial role in the effectiveness of STDM-UDA for terrain classification tasks.
6. Conclusions
In this paper, a multi-step unsupervised domain adaptive framework called STDM-UDA is proposed. STDM-UDA transfers the labeling information of source domain SAR images to an unlabeled target domain and implements terrain classification there, reducing the dependence on labeled samples and improving model generalization. First, the source domain image undergoes a style transformation via the style transfer network to produce a translated source domain with the target domain's style, thus minimizing domain gaps in image space. Then, an adaptive segmentation network extracts image features while simultaneously aligning domain differences in feature space and performing terrain classification. Additionally, domain metrics are integrated to offer supplementary feature distribution information to the model. Experiments conducted across three SAR scenes showcase the efficacy of STDM-UDA's learning and classification under the domain disparities induced by the various imaging parameters examined. The findings further indicate that STDM-UDA exhibits enhanced capabilities in feature learning and migration, particularly on source domain data characterized by balanced terrain classes. Furthermore, the ablation analysis of STDM-UDA reveals that the terrain classification task is more sensitive to domain disparities in image space than to those in feature space; in other words, classification efficacy is significantly influenced by the quality of the style transfer network. This underscores the necessity and effectiveness of concurrently constraining domain differences in both image space and feature space.
In the future, we aim to address the bias stemming from the source domain's terrain distribution, thereby enhancing the adaptability of the proposed method across a broader spectrum of source domain data. Furthermore, integrating multimodal data can offer a more comprehensive and complementary view of the terrain. Hence, our future work will explore the use of multimodal semantic information to improve SAR terrain classification.