Article

Single-Scene SAR Image Data Augmentation Based on SBR and GAN for Target Recognition

1 Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3 Key Laboratory of Target Cognition and Application Technology (TCAT), Beijing 100190, China
4 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(23), 4427; https://doi.org/10.3390/rs16234427
Submission received: 22 October 2024 / Revised: 18 November 2024 / Accepted: 22 November 2024 / Published: 26 November 2024

Abstract:
High-performance neural networks for synthetic aperture radar (SAR) automatic target recognition (ATR) often encounter the challenge of data scarcity. The lack of sufficient labeled SAR image datasets leads to the consideration of using simulated data to supplement the dataset. On the one hand, electromagnetic computation simulations provide high amplitude accuracy but are inefficient for large-scale datasets due to their complex computations and physical models. On the other hand, ray tracing simulations offer high geometric accuracy and computational efficiency but struggle with low amplitude correctness, hindering accurate numerical feature extraction. Furthermore, the emergence of generative adversarial networks (GANs) provides a way to generate simulated datasets, trying to balance computational efficiency with image quality. Nevertheless, the simulated SAR images generated based on random noise lack constraints, and it is also difficult to generate images that exceed the parameter conditions of the real image’s training set. Hence, it is essential to integrate physics-based simulation techniques into GANs to enhance the generalization ability of the imaging parameters. In this paper, we present the SingleScene-SAR Simulator, an efficient framework for SAR image simulation that operates under limited real SAR data. This simulator integrates rasterized shooting and bouncing rays (SBR) with cycle GAN, effectively achieving both amplitude correctness and geometric accuracy. The simulated images are appropriate for augmenting datasets in target recognition networks. Firstly, the SingleScene-SAR Simulator employs a rasterized SBR algorithm to generate radar cross section (RCS) images of target models. Secondly, a specific training pattern for cycle GAN is established to translate noisy RCS images into simulated SAR images that closely resemble real ones. Finally, these simulated images are utilized for data augmentation. Experimental results based on the constructed dataset show that with only one scene SAR image containing 30 target chips, the SingleScene-SAR Simulator can efficiently produce simulated SAR images that exhibit high similarity in both spatial and statistical distributions compared with real images. By employing simulated SAR images for data augmentation, the accuracy of target recognition networks can be consistently and significantly enhanced.

1. Introduction

Synthetic aperture radar (SAR) automatic target recognition (ATR) systems are designed to detect and classify targets of interest within SAR data [1]. Automatic target recognition systems based on deep learning algorithms have demonstrated satisfactory performance. Typically, the development of a high-performance and robust deep learning model necessitates a comprehensive dataset that is ideally large-scale and diverse. However, obtaining a large number of SAR image samples of specific targets with different configurations is both expensive and challenging. Consequently, effectively augmenting SAR datasets when faced with limited data sources is crucial for advancing SAR ATR technology.
Traditional augmentation methods primarily rely on techniques such as rotation, flipping, zooming, and shifting. These methods create new training samples by modifying geometric shapes or adjusting pixel values. Since these techniques do not generate new information beyond the original image, they are generally not considered effective data augmentation techniques. In contrast, image simulation technology enables the efficient and cost-effective generation of a substantial number of simulated SAR images with varying imaging configurations and specific targets, while imposing few limitations. According to research conducted by Timo Balz and Stefan Auer [2], there are primarily two categories of SAR simulation methods:
  • Simulation methods focusing on amplitude correctness:
    These methods typically necessitate the establishment of appropriate physical scattering models and employ approximate electromagnetic computation techniques to derive the final simulated SAR images. As early as 1991, J. M. Nasr and D. Vidal-Madjar utilized high-frequency approximation techniques to compute the RCS of electrically large targets, subsequently modulating the selected SAR pulse response to generate simulated images [3]. Feng Xu and Ya-Qiu Jin applied mapping and projection principles alongside a vector radiative transfer model and integral equation method to calculate the scattering field, successfully simulating polarized SAR images of complex terrain scenes [4]. Wenna Fan et al. [5] used a combination of geometrical optics (GO) and physical optics (PO) methods to obtain spatially distributed scattering fields, employing a modified frequency scaling algorithm (AFSA) for echo data processing, which resulted in highly squinted spotlight simulated SAR images. Lamei Zhang et al. [6] integrated the Kirchhoff approximation with GO and PO methodologies to compute the frequency domain echo of linear frequency modulation (LFM) signals; they then employed inverse fast Fourier transform (IFFT) techniques to produce simulated polarized SAR images. While these methods offer well-defined physical mechanisms and high numerical accuracy, they inherently demand significant computational resources due to their reliance on detailed physical modeling and numerical techniques. Additionally, SAR imaging involves multiple signal processing steps, each introducing further computational overhead. These challenges become particularly pronounced when generating large-scale datasets, significantly limiting the practicality of these approaches in deep learning applications.
  • Simulation methods focusing on geometric accuracy:
    These methods typically necessitate precise CAD models and employ various ray tracing algorithms to simulate the interaction between electromagnetic waves and target models, thereby generating simulated SAR images. RaySAR [7] utilizes the ray tracing algorithm from the open-source software POV-ray to simulate radar signals in 3D—specifically, azimuth, range, and elevation. However, RaySAR primarily emphasizes the geometric accuracy of the simulated signals while neglecting random scattering effects. It also employs Phong shading’s diffuse reflection model and specular reflection model to simplify the scattering processes of targets. Due to significant differences in the physical properties of light compared to electromagnetic waves, as well as a lack of precise analytical expressions for modeling scattering effects such as specular reflection coefficients and diffuse reflection coefficients, the pixel amplitude values derived from SAR images generated by RaySAR do not possess exact physical significance. Nonetheless, RaySAR remains a commendable tool for SAR image simulation. The coherent raytracing SAR simulator [8], developed based on the ray tracing concept introduced by Amanatides and Woo [9], serves as a coherent SAR image simulator. Similar to RaySAR, this simulator models scattering effects using diffuse reflection coefficients and specular reflection coefficients. Additionally, it mandates that all polygons within the 3D model be convex; this requirement limits its applicability in certain scenarios. On another front, SARViz [10] is a rasterization method implemented on graphics processing units (GPUs), capable of generating approximately 20 simulated images with dimensions of 1024 × 768 pixels per second [2]. However, in pursuit of enhanced simulation efficiency, SARViz compromises the geometric and radiometric accuracy to some extent; notably, it is unable to simulate third-order reflections or higher.
Overall, both of the above SAR image simulation methods have limitations in that they cannot simultaneously ensure geometric correctness and amplitude correctness. Consequently, it is challenging for them to be directly employed in generating large-scale and high-quality datasets applicable to neural networks.
In addition, simulation data generation methods based on generative adversarial networks (GANs) provide a new approach. These methods use specific probability density distributions to fit the distribution of real SAR data, enabling the generation of images that are close to realistic. For instance, Xianjie Bao et al. [11] employed DCGAN, WGAN, and WGAN-GP techniques to translate random noise into simulated SAR images resembling samples in MSTAR. Changjie Cao et al. [12] inserted label information into the random noise and used label-directed generative adversarial networks (LDGANs) to translate it into simulated SAR images. Unfortunately, the data generation seeds of these generative networks are random noise, with little or weak constraint on the generated results, making the configuration parameters of the generated images relatively random and making it difficult to ensure that every generated image is usable. Moreover, although these methods can generate simulated images with good visual effects, the high sensitivity of SAR images to changes in target azimuth, incidence angle, and other parameters [13] means that they can only generate simulated images under the same parameter conditions as the real training data. For example, when the MSTAR dataset [14] provides samples only at incidence angles of 15° and 17°, the generated images share those parameter configurations, which limits their value for data augmentation.
In recent years, diffusion models [15] have demonstrated exceptional performance in generating high-quality synthetic data, achieving superior results compared to GANs in the visual domain. Diffusion models are a type of generative model based on Markov chains, which generate images by gradually introducing noise into the data and iteratively learning to recover the original data from the noise. Compared with GANs, diffusion models have distinct advantages in both generation quality and training stability. Although diffusion models exhibit outstanding generative performance, their potential to generate simulated SAR images has not been fully explored. Denisa Qosja et al. [16] utilized denoising diffusion probabilistic models (DDPMs) to achieve both conditional and unconditional simulated SAR image generation. While diffusion models currently excel in generating simulated SAR images based on class labels or semantic constraints, their future potential for generating simulated SAR images based on “unpaired image-to-image translation” should not be overlooked.
In conclusion, it is necessary and meaningful to incorporate appropriate physics-based simulation techniques into GANs to enhance the generalization ability of the imaging parameters.
Fortunately, a simulation method known as shooting and bouncing rays (SBR) [17] combines GO and PO in a concise and efficient manner, providing this possibility. Ref. [18] employed the SBR algorithm to simulate one SAR image of urban structures, and [19] applied a time-domain SBR algorithm to simulate imaging of ship models. These methods have verified the effectiveness of the algorithm in handling targets with complex geometric structures. Moreover, Y.-L. Chang et al. [20] implemented the SBR algorithm to simulate spaceborne SAR imaging systems, successfully obtaining simulated SAR images and exploring their potential application in target classification network training. However, this method simulates the RD imaging algorithm, entailing a considerable computational burden, and the simulation dataset employed contains only 180 images with a resolution of 3 m. In summary, by combining the SBR algorithm with GANs, there is hope to address the two key issues mentioned above: firstly, enabling the simulated images generated by GANs to have better controllability and to be less limited by the imaging parameters of the real training dataset; secondly, reducing the computational load of the imaging algorithm while maintaining satisfactory simulation imaging quality.
Compared with other GANs, cycle GAN [21] for unpaired image-to-image translation facilitates its combination with the SBR algorithm. Yunpeng Chang et al. [22] translated simulated data with cycle GAN and 232 real samples with an incidence angle of 17°, producing realistic simulated images with the same incidence angle. However, this research did not generate simulated images with different imaging parameters, so its effect on data augmentation was limited. Lei Liu et al. [23] translated simulated data with cycle GAN and the RaySAR simulator, thereby enhancing the performance of the target classification network. Nonetheless, this method requires the preparation of 40 real samples with different parameters and target types, which contradicts the original intention of data augmentation technology: to expand information under limited real data conditions. Additionally, RaySAR does not consider changes in the target during the synthetic aperture process, and its amplitude values are derived from simplified rendering models. None of these methods can generate diverse simulated data under highly restricted real data conditions.
This paper proposes a data augmentation framework (referred to as SingleScene–SAR simulator) encompassing the SBR algorithm and cycle GAN. The SingleScene–SAR simulator merely demands a single-scene SAR image. It is capable of generating simulation data with diverse parameters and is sufficient for training high-performance neural networks. Specifically, the SingleScene–SAR simulator can simultaneously meet the following four requirements:
  • Develop RCS image simulation technology based on the SBR algorithm, which meets the needs of large-scale high-resolution SAR image training sets while ensuring geometric accuracy and amplitude correctness.
  • Use cycle GAN to obtain simulated images that are closer to real data. The training dataset strictly limits the number of samples and sample parameters, i.e., the dataset contains only 30 images, all from a single scene SAR image.
  • This framework can simulate and generate a large number of realistic and usable simulated images with other imaging parameters, such as data with different incidence angles and different aircraft models.
  • Based on public data, construct a 0.5-m resolution aircraft target dataset Aircraft-VariedSAR containing different imaging parameters and random aircraft models and verify the improvement effect of the simulated data generated by this framework on the performance of the target recognition network on this dataset.
This paper will provide an SAR image simulation framework, SingleScene–SAR simulator, which allows for training a neural network with satisfactory recognition accuracy using only a single scene SAR image containing a few targets for any SAR target recognition task. This further alleviates the limitations of data scarcity on deep learning SAR target recognition.
The remainder of this article is structured as follows. Section 2 introduces the SBR algorithm and cycle GAN for unpaired image-to-image translation. Section 3 details the proposed framework for the SingleScene–SAR Simulator. Section 4 presents data descriptions, experimental results, and analyses. Finally, Section 5 concludes the article.

2. Background

In Section 2.1, we briefly introduce the basic concepts of SBR, including its main principles, structure, and calculation process to clarify the method discussed in Section 3. In Section 2.2, we present the unpaired image-to-image translation based on cycle GAN, along with the rationale for choosing cycle GAN.

2.1. Shooting and Bouncing Rays

Shooting and bouncing rays (SBR) is a high-frequency approximation technique used to compute the electromagnetic scattering characteristics of complex targets. Originally developed for calculating the RCS of open cavity interiors like cylinders, SBR simulates electromagnetic wave propagation and scattering through multiple reflections, as shown in Figure 1. The SBR methodology includes three main components: ray tracing, amplitude tracking, and far-field integration. This section presents key derivations and formulaic conclusions related to SBR, with detailed references from Hao Ling et al.’s work [17].

2.1.1. Ray Tracing

SBR employs a set of parallel ray tubes, called a ray pool, to model incident plane waves. For each ray tube, we define the departure point $\mathbf{r}_0(x_0, y_0, z_0)$ on the aperture plane $A$ and its propagation direction vector $\mathbf{s}(s_x, s_y, s_z)$. In a homogeneous isotropic medium, the ray tube travels in a straight line. Letting the propagation time be $t$ ($t > 0$), we have:
$$\mathbf{r}(x, y, z) = \mathbf{r}_0(x_0, y_0, z_0) + \mathbf{s}(s_x, s_y, s_z)\, t \tag{1}$$
When the ray tube intersects with the target body, it reflects. The plane formed by the incident direction $\mathbf{s}_i$ and the normal $\mathbf{n}$ of the intersection is called the incident plane, as shown in Figure 2a. The angle between the normal unit vector $\mathbf{n}$ and the incident direction $\mathbf{s}_i$ is $\theta_i$, while the angle between $\mathbf{n}$ and the reflected direction $\mathbf{s}_r$ is $\theta_r$. According to Snell's law, $\theta_i = \theta_r$; thus, the direction of the reflected ray tube $\mathbf{s}_r$ is:
$$\mathbf{s}_r = \mathbf{s}_i - 2\left(\mathbf{s}_i \cdot \mathbf{n}\right)\mathbf{n} \tag{2}$$
The reflected ray tube continues in a straight line. If it intersects the target body again, the process repeats until the ray tube extends to infinity. Thus, all ray paths can be determined.
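A minimal numerical sketch may help make Equations (1) and (2) concrete; the function names below are illustrative and are not part of the original simulator.

```python
import numpy as np

def propagate(r0, s, t):
    """Equation (1): straight-line propagation of a ray tube for time parameter t."""
    return r0 + s * t

def reflect(s_i, n):
    """Equation (2): reflected direction s_r = s_i - 2 (s_i . n) n, following Snell's law."""
    s_i = s_i / np.linalg.norm(s_i)
    n = n / np.linalg.norm(n)
    return s_i - 2.0 * np.dot(s_i, n) * n

# Example: a downward-travelling ray hitting a horizontal facet is reflected upward.
print(reflect(np.array([0.0, 0.0, -1.0]), np.array([0.0, 0.0, 1.0])))  # [0. 0. 1.]
```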

2.1.2. Amplitude Tracking

During the propagation of the ray tube, both field strength and phase change accordingly. The field strength at the $(i+1)$-th intersection $\mathbf{r}_{i+1}$ with the target can be derived from that at the $i$-th intersection $\mathbf{r}_i$:
$$\mathbf{E}_{i+1} = DF_i \cdot \Gamma_i \cdot \mathbf{E}_i \cdot \exp\!\left(-jk\left|\mathbf{r}_{i+1} - \mathbf{r}_i\right|\right) \tag{3}$$
where $DF_i$ and $\Gamma_i$ are the divergence factor and the reflection coefficient matrix at $\mathbf{r}_i$, respectively.
The divergence factor D F i characterizes the variation in the internal field strength of the ray tube when its cross-sectional area changes following reflection off a curved surface, as illustrated in Figure 2b. In scenarios where the model comprises a substantial number of triangular facets and the cross-sectional area of the ray tube is sufficiently small, alterations in this cross-sectional area can be considered negligible. Consequently, D F i simplifies to a constant value of 1.
The reflection coefficient matrix $\Gamma_i$ describes the change in electromagnetic wave intensity upon reflection. According to geometric optics, the reflection coefficients for vertically and parallel polarized waves differ. Therefore, the electric field must be decomposed into TM and TE components at the reflection point, with the reflected electric field calculated separately. Thus, Equation (3) can be expressed as:
$$\begin{bmatrix} E_{i+1}^{\perp} \\ E_{i+1}^{\parallel} \end{bmatrix} = \begin{bmatrix} \Gamma_{\perp} & 0 \\ 0 & \Gamma_{\parallel} \end{bmatrix}_i \begin{bmatrix} E_{i}^{\perp} \\ E_{i}^{\parallel} \end{bmatrix} \exp\!\left(-jk\left|\mathbf{r}_{i+1} - \mathbf{r}_i\right|\right) \tag{4}$$
The reflection coefficients for vertically and parallel polarized waves, $\Gamma_{\perp}$ and $\Gamma_{\parallel}$, are defined by Fresnel's theorem. Once the incident wave $\mathbf{E}_0$ is established, iterative calculations using Equation (4) can determine the field strength at all intersection points of the ray tubes with the target body.
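As a rough illustration of the amplitude-tracking recursion in Equation (4), the sketch below updates the TE/TM components for a perfectly conducting facet, assuming $DF_i = 1$ and the sign convention $\Gamma_{\perp} = -1$, $\Gamma_{\parallel} = 1$; it is a sketch under these assumptions, not the authors' implementation.

```python
import numpy as np

def update_field(E_perp, E_par, r_i, r_next, k, gamma_perp=-1.0, gamma_par=1.0):
    """Equation (4): propagate the TE/TM field components to the next intersection.

    Assumes an ideal conductor and a divergence factor DF_i = 1, as discussed
    for finely faceted models; Fresnel sign conventions may differ between
    references."""
    phase = np.exp(-1j * k * np.linalg.norm(np.asarray(r_next) - np.asarray(r_i)))
    return gamma_perp * E_perp * phase, gamma_par * E_par * phase
```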

2.1.3. Far-Field Integration

The far-field scattering of a target is the sum of the scattering field contributions of all ray tubes reflected from the target's surface. In this context, $kr \to \infty$. According to [17], the far-field scattering formula is:
$$\mathbf{E}(r, \theta, \phi) = \frac{\exp(-jkr)}{r}\left(\hat{\theta}\, A_\theta + \hat{\phi}\, A_\phi\right) \tag{5}$$
where
$$\begin{bmatrix} A_\theta \\ A_\phi \end{bmatrix} = \frac{jk}{2\pi} \iint_A \exp\!\left(jk(ux + vy)\right) \begin{bmatrix} E_x \cos\phi + E_y \sin\phi \\ \left(-E_x \sin\phi + E_y \cos\phi\right)\cos\theta \end{bmatrix} dx\, dy \tag{6}$$
In this equation, $u = \sin\theta\cos\phi$, $v = \sin\theta\sin\phi$, and $E_x$ and $E_y$ are the $x$ and $y$ components of the field on the aperture plane $A$. Let the direction of each ray tube be $(s_x, s_y, s_z)$, and its intersection with the aperture plane be $(x_i, y_i)$. Then, Formula (6) can be approximated as:
$$\begin{bmatrix} A_\theta \\ A_\phi \end{bmatrix} = \frac{jk}{2\pi} \sum_{i \in \text{all rays}} \iint_{i\text{th exit ray tube}} dx\, dy\, \exp\!\left(jk(ux + vy)\right) \exp\!\left(jk\big(s_x(x - x_i) + s_y(y - y_i)\big)\right) \begin{bmatrix} E_x \cos\phi + E_y \sin\phi \\ \left(-E_x \sin\phi + E_y \cos\phi\right)\cos\theta \end{bmatrix} \tag{7}$$

2.1.4. Radar Cross Section (RCS)

RCS is a physical quantity that quantitatively characterizes the scattering capability of a scatterer in a specific direction. For far-field scattering, the formula for calculating RCS is as follows:
$$\sigma = \lim_{R \to \infty} 4\pi R^2 \frac{\left|E^s\right|^2}{\left|E^i\right|^2} \tag{8}$$
where $E^i$ and $E^s$ are the incident and scattered electric fields, respectively. Based on the polarization direction of the incident electric field and the polarization mode (co-polarization or cross-polarization), [17] defines that RCS can be calculated using the variables in Formula (7) as follows:
$$\sigma_\theta = 4\pi \left|A_\theta\right|^2 \quad \text{or} \quad \sigma_\phi = 4\pi \left|A_\phi\right|^2 \tag{9}$$

2.2. Cycle GAN-Based Unpaired Image-to-Image Translation

Image-to-image translation aims to map input images to different domains, such as generating photographs from sketches. Traditionally, training such neural networks necessitated a substantial amount of paired training data. However, acquiring paired training data can be both challenging and costly. To address this issue, unpaired image-to-image translation has been proposed. Rosales et al. [24] introduced a Bayesian framework that integrates prior information derived from Markov random fields computed on source images in order to infer the most probable output image while also learning the rendering style. Liu et al. [25] combined variational autoencoders with coupled GANs, yielding high-quality results in image translation. Taigman et al. [26] optimized a distance loss within a specific feature space between input and output images, ensuring that outputs closely resemble inputs. These methods require pre-defining similarity functions or assuming that the input and output lie in the same low-dimensional embedding space. In contrast, cycle GAN [21] does not require these preset constraints, making it more versatile.
In the SingleScene–SAR Simulator, cycle GAN is used to achieve the translation from the RCS image domain to the SAR image domain, a typical unpaired image-to-image task. Specifically, we aim for the designed simulation image generator to operate under conditions where real data are extremely scarce, i.e., providing only one SAR scene image. In such cases, the chips obtained from the SAR scene image are often fewer than 40. Compared with the extensive and varied simulated images, real images are scarce and difficult to pair. Additionally, traditional GANs face challenges in generating controllable images for unpaired image translation tasks. The lack of strong constraints on the simulation image generation process means the final trained generator may ignore the input simulation image’s information and mimic the distribution of real images instead. In this case, although the generated image may closely resemble a real image, it has little relation to the input information. Considering these issues, cycle GAN was ultimately chosen as the generator architecture for image translation.
Cycle GAN is a stable and versatile method for unpaired image-to-image translation. Its network architecture is illustrated in Figure 3. This approach requires only datasets from two distinct domains without the necessity for one-to-one alignment between the data points, thereby enabling the learning of probability distributions for each domain and establishing a mapping between these distributions to facilitate image translation. This property greatly facilitates the preparation of simulated RCS image sets and real SAR image sets. Additionally, cycle GAN consists of two generators, G X and G Y , and two discriminators, D X and D Y . This setup creates a bidirectional link between the image domains X and Y. The GAN loss ensures that synthetic images closely resemble the target domain, while the cycle-consistency loss guarantees reversible mappings between the two domains. This unique bidirectional structure enforces the generator to utilize the input image information when generating simulated images, effectively applying additional constraints to ensure that the generated simulated SAR images are diverse and usable.

3. Proposed Method

In Section 3.1, we outline the overall framework of the method and its constituent modules. In Section 3.2, we provide a detailed description of the principles, structures, and computational processes of each functional module. In Section 3.3, the neural network used to verify the effectiveness of data augmentation for simulated SAR images is discussed. Finally, Section 3.4 introduces the hardware platform and software information used to implement the entire method.

3.1. Framework of the SingleScene–SAR Simulator

Figure 4 illustrates the overall framework of the proposed SingleScene–SAR simulator. This framework only necessitates a small number of SAR images from a single scene, typically around 30 cropped real SAR chips. Furthermore, the SingleScene–SAR simulator is capable of generating realistic simulated SAR target images for various target models under any incidence angle and target azimuth angle.
The SingleScene–SAR simulator consists of two primary components: RCS image simulation and SAR image translation. The operation process of RCS image simulation is depicted in Figure 4a, which encompasses the parameter parsing module, SBR simulation module, and RCS synthesis module. The parameter parsing module conducts a comprehensive analysis of the SAR image metadata alongside the target model coordinate data, subsequently generating scanned parameters for SBR simulation that align with specified requirements. Following this, the SBR simulation module scans the target CAD model based on the generated scanning parameter data to produce scanned data. However, the scanned data cannot be utilized directly; this necessitates further processing by the RCS synthesis module to yield the final simulated RCS image. The operational process of SAR image translation is depicted in Figure 4b. In this segment, it is important to note that the data required during the training phase differ from that needed during the prediction phase. Simply put, during the training phase, it is necessary that real data and simulated RCS images have similar imaging parameters. The following sections will provide detailed operational details of these two parts.

3.2. SAR Image Simulation

3.2.1. Parameter Parsing Module

The parameter parsing module is specifically designed to supply all essential parameters for the SBR algorithm. These include the position coordinates $\mathbf{P}$ as well as the shape vectors $\mathbf{v}_{right}$ and $\mathbf{v}_{down}$, which are utilized during ray pool sampling. Furthermore, it encompasses the initial position $\mathbf{P}_{tube}$, the propagation direction $\mathbf{d}$, and the polarization direction $\mathbf{p}$ of each ray tube within the ray pool.
To achieve realistic simulated SAR images, it is essential that the scanning parameters of the SBR algorithm align with those of actual SAR platforms. The current main imaging modes of spaceborne SAR platforms are stripmap mode, spotlight mode, and sliding spotlight mode. The parameter parsing module can change the position $\mathbf{P}_{tube}$ and propagation direction $\mathbf{d}$ of the ray tubes through different constraints to simulate the three imaging modes. As the simulation modeling of imaging modes is not within the scope of this paper, we will focus on the spotlight mode as a representative example.
Figure 5a illustrates the motion model of the spaceborne SAR platform operating in spotlight imaging mode. Given that the target's volume is significantly smaller than the orbital scale of the satellite, it is treated as a point target. A Cartesian coordinate system is established with the Earth's tangent plane at the target point serving as the x-y plane. The x-axis and y-axis are aligned with the range and azimuth directions, respectively, while the z-axis points upwards. At this juncture, the radar trajectory height is denoted as $H$, and the slant range to the target is represented by $R$. Specifically, $R_0$ denotes the shortest slant range, corresponding to zero Doppler time. The radar traverses along the azimuth direction at a speed of $v$ and scans over the target in a "stop-and-go" manner. The antenna's azimuth scanning angle range relative to the target point is $[-\theta_{az}, \theta_{az}]$. The resolutions in the azimuth and range directions are $\Delta x$ and $\Delta y$, respectively, with a central operating frequency of $f_0$. Let the incidence angle of the antenna to the target be $\theta_{in}$; then, the number of samples of the simulated radar's ray pool along the azimuth direction is:
$$N = \frac{2 f_{PRF}\, R_0 \tan\theta_{az}}{v} \tag{10}$$
where $f_{PRF}$ denotes the pulse repetition frequency. The SingleScene–SAR simulator acquires the target scattering characteristics throughout the synthetic aperture process by conducting multiple samplings along the azimuth direction.
Figure 5b shows the grid sampling of the target model by the SBR ray pool. To simplify calculations, the grid is aligned with both azimuth and range directions, with a unit size defined as $\Delta x \times \Delta y$. The grid must encompass the entire model, defined by the minimum coordinate point $\mathbf{P}_{\min} = (x_{\min}, y_{\min}, 0)^T$ and the maximum coordinate point $\mathbf{P}_{\max} = (x_{\max}, y_{\max}, z_{\max})^T$. Let $\mathbf{P}_{cen} = (x_{cen}, y_{cen}, z_{cen})^T$ represent the center of the target. The coordinates $\mathbf{P}_n$ for sampling in the azimuth direction during the $n$-th iteration are:
$$\mathbf{P}_n = \begin{bmatrix} x_{sar} \\ y_{sar} \\ z_{sar} \end{bmatrix} = \begin{bmatrix} \sigma_{scan} \cdot R_0 \sin\theta_{in} + x_{cen} \\ \dfrac{n v}{f_{PRF}} - R_0 \tan\theta_{az} + y_{cen} \\ R_0 \cos\theta_{in} \end{bmatrix} \tag{11}$$
where $n = 1, 2, \ldots, N$, and $\sigma_{scan}$ takes 1 for left-looking radar and −1 for right-looking radar.
To obtain simulated images of the target at various azimuth angles, the coordinate $\mathbf{P}_n$ can be rotated in the opposite direction to achieve an equivalent rotational effect on the target. When the target undergoes a counterclockwise rotation by an angle $\alpha$ around the z-axis, the coordinate $\mathbf{P}_{n,\alpha}$ of the ray pool is:
$$\mathbf{P}_{n,\alpha} = \begin{bmatrix} x'_{sar} \\ y'_{sar} \\ z'_{sar} \end{bmatrix} = \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{P}_n \tag{12}$$
The ray pool at $\mathbf{P}_n$ will effectively sample each grid unit of the target grid. Consequently, to determine the propagation direction of the ray pool, it is essential to first define the position coordinates for each grid unit. For the target grid illustrated in Figure 5b, both the boundary range and minimum unit size have been established. Let $N_{az}$ and $N_{rg}$ denote the number of grid units along the azimuth and range directions, respectively. The coordinates $\mathbf{G}_{n_{az}, n_{rg}}$ of a specific grid unit can then be expressed as:
$$\mathbf{G}_{n_{az}, n_{rg}} = \begin{bmatrix} x_{\min} + (n_{rg} - 1)\Delta x - z_{\max}\tan\theta_{in} \\ y_{\min} + (n_{az} - 1)\Delta y - z_{\max}\tan\theta_{in} \\ 0 \end{bmatrix} \tag{13}$$
where $n_{az} = 1, 2, \ldots, N_{az}$ denotes the arrangement position of the grid unit along the azimuth direction and $n_{rg} = 1, 2, \ldots, N_{rg}$ indicates the arrangement position along the range direction. The term $z_{\max}\tan\theta_{in}$ ensures that the rays, incident at angle $\theta_{in}$, cover the entire target model. At this point, the propagation direction $\mathbf{d}_{obs}(n, n_{az}, n_{rg})$ of the ray pool can be readily determined as:
$$\mathbf{d}_{obs}(n, n_{az}, n_{rg}) = \begin{bmatrix} x_{obs} \\ y_{obs} \\ z_{obs} \end{bmatrix} = \frac{\mathbf{G}_{n_{az}, n_{rg}} - \mathbf{P}_n}{\left\| \mathbf{G}_{n_{az}, n_{rg}} - \mathbf{P}_n \right\|} \tag{14}$$
To ensure pixel-level accuracy of the RCS image, the projection plane of each ray pool on the grid unit should strictly correspond to the size of the grid unit $\Delta x \times \Delta y$. Since the ray pool usually performs grid sampling at oblique incidence, the cross-section of the ray pool is a parallelogram in three-dimensional space. As illustrated in Figure 5b, a spatial parallelogram can be defined by a pair of shape vectors $\mathbf{v}_{right}(n, n_{az}, n_{rg})$ and $\mathbf{v}_{down}(n, n_{az}, n_{rg})$. These two shape vectors originate from a common starting point, with their respective magnitudes representing the side lengths of the corresponding parallelogram. The calculation formulas are:
$$\mathbf{v}_{right} = \Delta x \left( \mathbf{e}_x - \left(\mathbf{e}_x \cdot \mathbf{d}_{obs}\right) \mathbf{d}_{obs} \right) \tag{15}$$
$$\mathbf{v}_{down} = \Delta y \left( \mathbf{e}_y + \left(\mathbf{e}_y \cdot \mathbf{d}_{obs}\right) \mathbf{d}_{obs} \right) \tag{16}$$
where $\mathbf{e}_x$ and $\mathbf{e}_y$ represent the unit basis vectors of the coordinate system, and $\mathbf{d}_{obs}$ denotes the propagation direction associated with the ray pool. The relative density of the ray tubes, $\rho_{tube}$, is defined as the number of linearly arranged ray tubes per unit wavelength $\lambda$. The initial position of each ray tube, denoted as $\mathbf{P}_{tube}(n, n_r, n_d)$ in ray pool $\mathbf{P}_n$, is:
$$\mathbf{P}_{tube} = \mathbf{P} + \left( \frac{n_r}{N_{right}} - \frac{1}{2} \right) \mathbf{v}_{right} + \left( \frac{n_d}{N_{down}} - \frac{1}{2} \right) \mathbf{v}_{down} \tag{17}$$
with
$$N_{right} = \frac{\rho_{tube}}{\lambda} \left\| \mathbf{v}_{right} \right\|, \qquad N_{down} = \frac{\rho_{tube}}{\lambda} \left\| \mathbf{v}_{down} \right\| \tag{18}$$
where $N_{right}$ and $N_{down}$ denote the number of ray tubes along the respective shape vectors; $n_r = 1, 2, \ldots, N_{right}$ and $n_d = 1, 2, \ldots, N_{down}$.
Each ray tube should include the polarization direction $\mathbf{p}$ of the corresponding electromagnetic wave to accurately compute the scattered field in the far-field integration. The vertical polarization direction $\mathbf{p}_V(n, n_{az}, n_{rg})$ and horizontal polarization direction $\mathbf{p}_H(n, n_{az}, n_{rg})$ can be derived from spherical coordinate decomposition:
$$\mathbf{p}_V = \begin{bmatrix} \cos\varphi\cos\theta & \sin\varphi\cos\theta & -\sin\theta \end{bmatrix}^T \tag{19}$$
$$\mathbf{p}_H = \begin{bmatrix} -\sin\varphi & \cos\varphi & 0 \end{bmatrix}^T \tag{20}$$
with
$$\varphi = \arctan\frac{y_{obs}}{x_{obs}}, \qquad \theta = \frac{\pi}{2} - \arctan\frac{z_{obs}}{\sqrt{x_{obs}^2 + y_{obs}^2}} \tag{21}$$
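To make the geometry handled by the parameter parsing module more tangible, the following sketch computes the propagation direction, shape vectors, and ray-tube origins of Equations (14)–(18) with NumPy; the helper names and the rounding of the tube counts are assumptions made for illustration rather than details taken from the paper.

```python
import numpy as np

def ray_pool_geometry(P_n, G, dx, dy):
    """Equations (14)-(16): propagation direction and shape vectors of the ray pool
    aimed at one grid unit. P_n is the radar sampling position, G the grid-unit
    coordinate; dx and dy are the grid-unit side lengths."""
    d_obs = (G - P_n) / np.linalg.norm(G - P_n)              # Equation (14)
    e_x, e_y = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    v_right = dx * (e_x - np.dot(e_x, d_obs) * d_obs)        # Equation (15)
    v_down = dy * (e_y + np.dot(e_y, d_obs) * d_obs)         # Equation (16)
    return d_obs, v_right, v_down

def tube_origins(P_n, v_right, v_down, rho_tube, wavelength):
    """Equations (17)-(18): regularly spaced starting points of the ray tubes."""
    N_r = max(1, round(rho_tube / wavelength * np.linalg.norm(v_right)))
    N_d = max(1, round(rho_tube / wavelength * np.linalg.norm(v_down)))
    origins = np.array([P_n + (nr / N_r - 0.5) * v_right + (nd / N_d - 0.5) * v_down
                        for nr in range(1, N_r + 1) for nd in range(1, N_d + 1)])
    return origins, N_r, N_d
```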

3.2.2. SBR Simulation Module

The original SBR algorithm is limited in that it can only provide the RCS value for the entire target, lacking geometric information within the spatial domain. To address this limitation, a simulation module based on the SBR algorithm has been developed. This module conducts rasterized sampling of the target to generate RCS images that possess both geometric accuracy and amplitude correctness.
Once all of the necessary scanning parameters are provided to the SBR simulation module, ray tubes are generated, and an iterative process begins. Each ray tube is assigned a unique starting position $\mathbf{P}_{tube}$, propagation direction $\mathbf{d}$, and polarization direction $\mathbf{p}$. The tubes then propagate linearly according to Equation (1). Upon intersecting with the target, the propagation direction $\mathbf{d}$ and polarization direction $\mathbf{p}$ are updated based on Equations (2) and (4). The updated ray tubes continue their linear propagation, repeatedly intersecting with the target model and updating their parameters until they propagate to infinity. Ultimately, each ray tube's iteration concludes with a final propagation direction $\mathbf{d}_{last}$ and polarization direction $\mathbf{p}_{last}$. Meanwhile, the number of collisions between the ray tube and the target is denoted as $n_b$.
For a ray tube originating from the position $\mathbf{P}_{tube}(n, n_r, n_d)$, the resultant reflected electric and magnetic fields are as follows:
$$\mathbf{E}_r(n, n_{az}, n_{rg}, n_r, n_d) = \mathbf{p}_{last}\, \Gamma^{n_b} E_0 \exp\!\left(-j(kr + \phi)\right) \tag{22}$$
$$\mathbf{H}_r(n, n_{az}, n_{rg}, n_r, n_d) = \mathbf{E}_r \times \mathbf{d}_{last} \tag{23}$$
where $\phi$ represents the initial phase, $E_0$ denotes the initial amplitude of the electric field, and $\Gamma$ signifies the reflection coefficient of the electric field corresponding to its polarization direction. For an ideal conductor, $\eta = 0$; according to Fresnel's theorem, we then have $\Gamma_{\perp} = -1$ and $\Gamma_{\parallel} = 1$.
The far-field scattering of a target results from the cumulative contributions of the scattering fields associated with all ray tubes that interact with it. In the context of SAR, the primary interest lies in the backscattering field. For the ray tube denoted as $\mathbf{P}_{tube}(n, n_r, n_d)$, let the position vector of the observation point for the scattering field be $\mathbf{r}_{obs} = \mathbf{P}_{tube}$, and let the observation direction correspond to the initial propagation direction $\mathbf{d}_{obs}$ of the ray tube. Based on Equations (5)–(7), the far-field scattering characteristics of the ray tube can be computed using the following equations:
$$\mathbf{E}_s(n, n_{az}, n_{rg}, n_r, n_d) = \frac{\exp(-jkr)}{r}\left( \mathbf{u}_V A_{right} + \mathbf{u}_H A_{down} \right) \tag{24}$$
$$A_{right}(n, n_{az}, n_{rg}, n_r, n_d) = \sum_{n_r=1}^{N_{right}} \sum_{n_d=1}^{N_{down}} \frac{jk A_{tube}}{4\pi} \left[ \left( \mathbf{E}_r \times \mathbf{u}_V + \mathbf{H}_r \times \mathbf{u}_V \right) \cdot \mathbf{e}_{dir} \right] \exp\!\left(-j\mathbf{k} \cdot \mathbf{r}\right) \tag{25}$$
$$A_{down}(n, n_{az}, n_{rg}, n_r, n_d) = \sum_{n_r=1}^{N_{right}} \sum_{n_d=1}^{N_{down}} \frac{jk A_{tube}}{4\pi} \left[ \left( \mathbf{E}_r \times \mathbf{u}_H + \mathbf{H}_r \times \mathbf{u}_V \right) \cdot \mathbf{e}_{dir} \right] \exp\!\left(-j\mathbf{k} \cdot \mathbf{r}\right) \tag{26}$$
In these equations, $\mathbf{u}_V$ and $\mathbf{u}_H$ are unit vectors representing the vertical and horizontal polarization directions of the observation; they are calculated analogously to Equations (19) and (20). $\mathbf{k}$ is the wave vector, which can be expressed in Cartesian coordinates using $\varphi$ and $\theta$ from Equation (21) as follows:
$$\mathbf{k} = k\left[ \left( \mathbf{e}_x \cos\varphi + \mathbf{e}_y \sin\varphi \right)\sin\theta + \mathbf{e}_z \cos\theta \right] \tag{27}$$
$A_{tube}$ is the cross-sectional area of a single ray tube, calculated as:
$$A_{tube} = \frac{\left\| \mathbf{v}_{right} \times \mathbf{v}_{down} \right\|}{N_{right} N_{down}} \tag{28}$$

3.2.3. RCS Synthesis Module

To achieve high-resolution RCS images, the SBR simulation module performs extensive data computations. When the radar samples the target $N$ times in the azimuth direction and there are $N_{az} \times N_{rg}$ grid cells for the target, the total number of ray tubes requiring iterative computation is:
$$N_{tube} = \sum_{n=1}^{N} \sum_{n_{az}=1}^{N_{az}} \sum_{n_{rg}=1}^{N_{rg}} N_{right}(n, n_{az}, n_{rg}) \times N_{down}(n, n_{az}, n_{rg}) \tag{29}$$
The RCS synthesis module is developed to synthesize the final RCS image. It projects $A(n, n_{az}, n_{rg}, n_r, n_d)$ onto the RCS image through a specific mapping relationship. This mapping process is executed in two distinct steps. The first step completes the integration of RCS values for a single grid cell, calculated as follows:
$$\sigma_{RCS}(n, n_{az}, n_{rg}) = \sum_{n_r=1}^{N_{right}} \sum_{n_d=1}^{N_{down}} 4\pi \left| A(n, n_{az}, n_{rg}, n_r, n_d) \right|^2 \tag{30}$$
At this stage, RCS images are generated at each sampling point along the azimuth direction. These images provide high-resolution geometric details and accurate amplitude information of the target from various squint angles. The next step synthesizes all RCS image components into a final RCS image, calculated as:
$$\sigma_{RCS}(n_{az}, n_{rg}) = \sum_{n=1}^{N} \frac{w(n, n_{az}, n_{rg})}{N} \cdot \sigma_{RCS}(n, n_{az}, n_{rg}) \tag{31}$$
The weighting factor $w(n, n_{az}, n_{rg})$ simulates the attenuation of the radiated values with propagation distance. In high-accuracy scenarios, this factor can be made inversely proportional to the fourth power of the propagation distance [27]. Generally, $w = 1$ suffices for most applications.
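A compact sketch of the two synthesis steps in Equations (30) and (31), assuming the complex tube amplitudes have already been stacked into a single array; the array layout is an assumption made for illustration.

```python
import numpy as np

def synthesize_rcs_image(A, w=None):
    """A has shape (N, N_az, N_rg, N_right, N_down) and holds the complex far-field
    amplitudes of the ray tubes; w is an optional (N, N_az, N_rg) weighting factor."""
    # Equation (30): integrate tube contributions of each grid cell per azimuth sample.
    sigma_n = 4.0 * np.pi * np.sum(np.abs(A) ** 2, axis=(-2, -1))
    # Equation (31): weighted average over the N azimuth samples (w = 1 by default).
    if w is None:
        w = np.ones_like(sigma_n)
    return np.mean(w * sigma_n, axis=0)
```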

3.2.4. Cycle GAN-Based Translation Model

Although high-resolution RCS images provide detailed geometric and amplitude information, they differ significantly from real SAR images due to imaging processing and noise interference. These differences complicate the direct use of simulated datasets for neural network training, necessitating translation from RCS to SAR images.
In SAR image simulation methods based on echo simulation, the obtained RCS images can be interpreted as the scattering coefficients of the target. For instance, we can establish a cylindrical coordinate system $(x, r, \theta)$ centered on the radar flight trajectory. When the radar operates at coordinate $x$, the echo from a point target located at coordinates $(X, R_0, \pi/2)$ can be formulated as follows [3]:
$$s(x, \tau) = \sigma \cdot \exp\!\left( j\pi K \left( \tau - \frac{2R(x)}{c} \right)^2 - j\frac{4\pi R(x)}{\lambda} \right) \cdot w_a(x) \cdot w_r(x) \tag{32}$$
with
$$R(x) = \sqrt{(x - X)^2 + R_0^2} \tag{33}$$
The target's echo can be expressed as a combination of three components: the scattering coefficient $\sigma$, the phase variation term $\exp(\cdot)$, and the signal envelope $w(\cdot)$ derived from the rectangular window functions $w_a(x)$ and $w_r(x)$. Here, $\tau$ is the range time, $K$ is the chirp rate of the transmitted signal, $w_a(x)$ represents the azimuth gain, and $w_r(x)$ is the envelope of the transmitted signal. By summing the echo signals from all scattering points, the echo of the entire target can be obtained. The target echo, after undergoing imaging processing, becomes the simulated SAR image. This handling of the RCS information can be regarded as a strictly linear system. However, the echo simulation-based method requires simulating the entire synthetic aperture distance, which is time-consuming even for small targets. Moreover, due to the differences between the simulation environment and real-world conditions, the images generated by this method often exhibit discrepancies from real images. This requires additional nonlinear transformations for reconciliation [1].
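For reference, a minimal sketch of the point-target echo of Equations (32) and (33); the rectangular envelopes $w_a$ and $w_r$ are simply set to 1 here, which is an assumption made for brevity.

```python
import numpy as np

def point_target_echo(x, tau, sigma, X, R0, K, wavelength, c=3.0e8):
    """Equations (32)-(33): echo of one point target at (X, R0) seen from azimuth
    position x at range time tau; envelopes are omitted (assumed equal to 1)."""
    R = np.sqrt((x - X) ** 2 + R0 ** 2)                                    # Equation (33)
    return sigma * np.exp(1j * np.pi * K * (tau - 2.0 * R / c) ** 2
                          - 1j * 4.0 * np.pi * R / wavelength)             # Equation (32)
```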
Cycle GAN is an image translation method designed for unpaired datasets, and it is appropriate for our objectives. As illustrated in Figure 4b, the datasets from the RCS image domain $\{x_i\}_{i=1}^{N} \in X_{RCS}$ and the SAR image domain $\{y_i\}_{i=1}^{M} \in Y_{SAR}$ are provided to cycle GAN. This enables the network to learn the mapping relationships $G: X_{RCS} \rightarrow Y_{SAR}$ and $F: Y_{SAR} \rightarrow X_{RCS}$ between the two image domains $X_{RCS}$ and $Y_{SAR}$. Cycle GAN utilizes adversarial loss and cycle-consistency loss as the loss functions for the model, with their respective influences modulated by the weight $\lambda$ [21]. The corresponding formula is:
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F) \tag{34}$$
with
$$\begin{aligned} \mathcal{L}_{GAN}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{data}(y)}\!\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{data}(x)}\!\left[\log\!\left(1 - D_Y(G(x))\right)\right] \\ \mathcal{L}_{GAN}(F, D_X, Y, X) &= \mathbb{E}_{x \sim p_{data}(x)}\!\left[\log D_X(x)\right] + \mathbb{E}_{y \sim p_{data}(y)}\!\left[\log\!\left(1 - D_X(F(y))\right)\right] \\ \mathcal{L}_{cyc}(G, F) &= \mathbb{E}_{x \sim p_{data}(x)}\!\left[\left\| F(G(x)) - x \right\|_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\!\left[\left\| G(F(y)) - y \right\|_1\right] \end{aligned} \tag{35}$$
Cycle GAN determines the optimal generators by finding the generators $G$ and $F$ that minimize the loss function and the discriminators $D_X$ and $D_Y$ that maximize it. Firstly, the adversarial loss $\mathcal{L}_{GAN}$ compels the generator to continuously enhance output quality through adversarial training with the discriminator, ensuring that the generated images resemble real images. It enables the generator to perform effectively under a wide range of target parameters, reducing the risk of overfitting; consequently, the adversarial loss has an effect similar to a regularization constraint. Secondly, the cycle-consistency loss $\mathcal{L}_{cyc}(G, F)$ imposes constraints on the bidirectional mappings to ensure their reversibility. This additional constraint significantly narrows down the space of potential mapping functions. Lastly, in the training phase, all real samples originate from the single-scene SAR image, while the simulated RCS images employ the same imaging parameters. Such constraints reduce uncertainty in both the inputs and outputs of the image transformation system, consequently reducing the complexity of the linear-transformation component of the mapping. Under the combined effect of these three constraints, the fitting process of the mapping in cycle GAN is stable, convergent, and efficient.
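A hedged PyTorch-style sketch of the generator-side objective in Equations (34) and (35), assuming the discriminators output probabilities; practical cycle GAN implementations usually replace the log terms with a least-squares loss and add image buffers and identity terms, none of which are shown here.

```python
import torch
import torch.nn.functional as F

def generator_loss(G, F_net, D_X, D_Y, x, y, lam=10.0, eps=1e-8):
    """Generator-side version of Equations (34)-(35): the adversarial terms push
    D_Y(G(x)) and D_X(F(y)) toward 'real', and the cycle-consistency term keeps
    F(G(x)) close to x and G(F(y)) close to y."""
    fake_y, fake_x = G(x), F_net(y)
    loss_gan = -(torch.log(D_Y(fake_y) + eps).mean() + torch.log(D_X(fake_x) + eps).mean())
    loss_cyc = F.l1_loss(F_net(fake_y), x) + F.l1_loss(G(fake_x), y)
    return loss_gan + lam * loss_cyc
```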

3.2.5. Noise Model

Speckle noise in SAR images arises from the interference of backscattered signals from ground targets, reflecting their scattering characteristics. However, mapping this backscattering information to speckle noise is often challenging. The limited parameters of the generative network struggle to achieve both the mutual mapping between image domains and the simulation of complex coherent speckle noise. Thus, adding simulated noise to the image domain $X_{RCS}$ is necessary to ease the difficulty of noise mapping.
For single-look SAR images, the amplitude of speckle noise follows a Rayleigh distribution, making it a suitable choice for noise modeling in the image domain [28,29]. This distribution effectively captures the statistical properties of speckle noise without introducing unnecessary complexity. Given the superior fitting performance of generative adversarial networks, the noise model incorporated into the image domain $X_{RCS}$ does not require high precision. Through the translation, a relatively simple noise model can ultimately evolve into one that closely resembles real-world speckle noise. The probability density function of the noise is:
$$f(x; s) = \frac{x}{s^2} \exp\!\left( -\frac{x^2}{2s^2} \right), \quad x \geq 0 \tag{36}$$
where s is the scale parameter of the Rayleigh distribution.
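A small NumPy sketch of the noise model in Equation (36); whether the Rayleigh sample is added to or combined multiplicatively with the RCS amplitude is not specified here, so the additive form below is an assumption.

```python
import numpy as np

def add_rayleigh_noise(rcs_image, s=25.04, rng=None):
    """Equation (36): draw Rayleigh-distributed amplitude noise with scale s
    (the value later estimated from homogeneous regions of the real SAR scene)
    and add it to the simulated RCS image."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.rayleigh(scale=s, size=rcs_image.shape)
    return rcs_image + noise
```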

3.3. Aircraft Recognition with Data Augmentation

It is well known that SAR image simulation technology significantly enhances the generalization capability of networks by supplementing real SAR image datasets. However, differences between simulated and real images pose challenges for traditional target recognition networks. Moreover, the quantitative metrics used to evaluate SAR image simulation often fail to intuitively reflect its data augmentation effectiveness. To verify the enhancement effect of simulated SAR images on target recognition performance and to explore the adaptability of different network architectures to simulated data, it is necessary to select appropriate neural networks for evaluation.
Previous studies, such as [1], utilized neural networks with traditional architectures to assess the performance of simulated data. However, alternative neural network architectures may exist that are more suitable for processing simulated images. Deep networks such as ResNet exhibit stronger representation capabilities, but the domain gap between simulated and real SAR images may amplify the risk of overfitting. In contrast, basic networks better demonstrate the actual gain effect of simulated data. Transformer-based architectures excel in long-range dependency modeling but require a large number of training samples, making it difficult to achieve optimal performance with the current data scale. Since our focus is on evaluating the impact of simulated SAR images, using basic networks allows us to highlight and analyze the effects of data augmentation without being confounded by specific architecture optimizations.
Neural networks based on the ConvNet [14] architecture replace the original fully connected layers with convolutional layers, which sacrifices some feature representation capability but enhances robustness. A-ConvNet [30] exemplified a successful application of ConvNet in SAR image target classification. Inspired by this approach, we designed a lightweight target recognition network tailored for simulated SAR images based on the A-ConvNet architecture. This network comprises seven convolutional layers and five max-pooling layers, as illustrated in Figure 6a. Additionally, a dropout layer is incorporated before the output layer to mitigate the excessive influence of a limited number of weight parameters, thereby ensuring the generalization capability of the network. The classic cross-entropy function will be employed as the loss function.
To further evaluate the applicability of different network structures to simulated data, we also implemented a neural network utilizing the traditional VGG architecture for performance testing on our simulated image set, as depicted in Figure 6b. VGGNet is a classic deep network capable of extracting multi-level features. As a basic network, it has a wide range of applicability and can serve as a benchmark for evaluating new data. Except for the first layer, the first nine layers of this VGG-based network align with our ConvNet design. To maintain consistency, this network also employs dropout and cross-entropy loss functions.
By selecting two architectures with different characteristics, the lightweight, robust ConvNet and the feature-rich VGGNet, we can ensure a balanced evaluation of the applicability of simulated data for SAR target recognition.
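As an illustration of the kind of all-convolutional classifier described above, the following PyTorch sketch uses seven convolutional layers, five max-pooling layers, and a dropout layer before the output; the channel widths and kernel sizes are assumptions, not the exact configuration of Figure 6a.

```python
import torch.nn as nn

class AllConvClassifier(nn.Module):
    """Illustrative all-convolutional aircraft/clutter classifier in the spirit of
    the described ConvNet: no fully connected layers, dropout before the output."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 128, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 128, 3), nn.ReLU(),
            nn.Dropout(0.5),
        )
        # The final convolution acts as the classifier; global pooling removes
        # the remaining spatial dimensions.
        self.classifier = nn.Sequential(
            nn.Conv2d(128, num_classes, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten()
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```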

3.4. Program Implementation

In terms of hardware configuration, the proposed method utilized a 13th Gen Intel Core i5 CPU, an NVIDIA GeForce RTX 4070 Ti S GPU, and 64 GB of memory. In practice, lower hardware configurations can also meet the requirements. For the software environment, the proposed method used C++, Python 3.11, PyTorch 2.3, and CUDA 12.1.
Apart from the configuration of simulation parameters, provision of 3D models, and acquisition of real SAR data, the entire program workflow is fully automated using C++ or Python. Specifically, the program can be divided into five functional modules:
  • Parameter parsing module;
  • SBR simulation module;
  • RCS synthesis module;
  • Cycle GAN-based translation;
  • Aircraft recognition network.
Modules (1) to (3) are implemented in C++. Firstly, the parameter parsing module generates and stores a binary file based on the provided simulation parameters and metadata. This file stores parameter vectors for each scan point, with detailed variable lists and calculation methods described in Section 3.2.1. Secondly, the SBR simulation module reads the binary file and 3D model files to perform electromagnetic simulation. This module includes both ray tracing and amplitude tracking functions. The ray tracing function is developed based on RaytrAMP [31], while the amplitude tracking function is automatically calculated by the program, with detailed calculation methods provided in Section 3.2.2. Upon completion, this module generates a binary file of the SBR scanned data. Lastly, the RCS synthesis module parses the SBR scanned data and reconstructs it into simulated RCS images, with detailed calculation methods in Section 3.2.3.
Modules (4) and (5) are implemented in Python. Details of the network training and prediction processes are discussed in the next section.

4. Experimental Results

Firstly, Section 4.1 introduces the dataset specifically constructed for this experiment and its data distribution. Then, Section 4.2 provides a detailed description of the initial parameters of the experiment, including the core parameters of the simulated SAR platform. Section 4.3 presents the various metrics used to evaluate the experimental results. Moreover, Section 4.4 discusses the quality of the simulated SAR images and their respective indicators. Finally, Section 4.5 sets up ablation experiments for key functional modules and discusses the experimental results. Additionally, information on the hardware platform and software environment for the experiment can be found in Section 3.4.

4.1. Data Description

Due to the scarcity of datasets that meet the experimental requirements, this study has constructed a dataset for aircraft target recognition based on the Umbra synthetic aperture radar (SAR) open data [32], referred to as Aircraft-VariedSAR. The open data program (ODP) features over twenty diverse time-series locations that are frequently updated, including 40 spotlight mode SAR images with a resolution of 0.5 m, X-band frequency, and VV polarization from Suvarnabhumi International Airport in Thailand, collected between 1 February 2023 and 8 May 2023. We extracted aircraft from these 40 SAR images into chips measuring 300 × 300 pixels and randomly cropped clutter areas outside the aircraft to obtain clutter chips of identical dimensions. The types of clutter included in this dataset encompass grasslands, rivers, farmlands, terminals, airport runways, and urban buildings, among others. All cropped chips were stored directly in a lossless image format without any preprocessing steps applied. The dataset comprises a total of 1107 aircraft images and 4488 clutter images. Figure 7 illustrates the imaging parameters for SAR aircraft images within Aircraft-VariedSAR; incidence angles range from 19.67° to 53.05°, while squint angles vary from −42.10° to +56.57°. In contrast to the MSTAR dataset [33], which focuses on sample data with limited incidence angles, the Aircraft-VariedSAR dataset encompasses most imaging parameters typical of spaceborne SAR imagery. Furthermore, approximately 41.6% of samples within this dataset are concentrated within an incidence angle range of [30°, 50°] and a squint angle range of [−2°, 2°], thereby ensuring its broad applicability across various scenarios. A portion of the Aircraft-VariedSAR dataset will be utilized as training samples for cycle GAN, as well as training and testing samples for the aircraft target recognition network.
In addition to Aircraft-VariedSAR, this experiment will involve two types of simulated image datasets: the simulated RCS image dataset and the simulated SAR image dataset. Part of the simulated RCS dataset will serve as training samples for cycle GAN, while the simulated SAR dataset will be used for aircraft target recognition. These datasets are also experimental results and will be detailed in Section 4.4.
The experiment will utilize eight different passenger aircraft models, as shown in Figure 8, for electromagnetic simulation scanning. Key parameters of these models are listed in Table 1, each featuring distinct geometric structures, dimensions, and detail levels. These models will enhance the testing of the SingleScene–SAR simulator’s performance under various modeling conditions.

4.2. Implementation Details

When simulating RCS images, the authenticity of the simulation parameters directly affects the accuracy of the simulation results. Therefore, the initial input parameters of the parameter parsing module should refer to the parameters of the real SAR platform and the metadata of the SAR images. The main parameters are shown in Table 2; unless otherwise specified, the parameters are given in SI units. The incidence angle $\theta_{in}$ is sampled in steps of 10°, giving five values from 20° to 60°. The rotation angle $\alpha$ of the aircraft model around the z-axis is sampled in steps of 30°, giving 12 values from 0° to 330°. Using eight different aircraft models for scanning simulation, a total of 480 RCS images can be obtained. These RCS images are stored at a uniform size of 300 × 300 pixels by zero-padding.
The simulated RCS image dataset needs to undergo image translation by cycle GAN to generate simulated SAR images that closely resemble real ones. During the training phase of cycle GAN, in addition to providing simulated data, both simulated images and real SAR images are required. To evaluate the data augmentation performance of the proposed method under varying imaging parameters, three SAR scene images with distinct imaging parameters will be utilized independently to train the cycle GAN model. In this study, we focus on variations in incidence angles within a commonly observed range, while the squint angle is fixed near 0°. This choice is motivated by two considerations: on the one hand, these parameters are more representative and frequently encountered in practical applications. On the other hand, although squint angles significantly influence SAR image characteristics, research in [23] has demonstrated that simulated images remain effective for data augmentation even under different squint angle conditions. The details of the selected SAR scene images are presented in Table 3. The incidence angles for the SAR scene images employed in these three parallel experiments are 30°, 40°, and 50°, all of which fall within the primary incidence angle range illustrated in Figure 7.
Since each experiment utilizes only one SAR scene image, the number of SAR chips extracted from the scene image is limited to 30 to standardize variables. These 30 samples are subsequently augmented through shifting and cropping to generate new training samples, resulting in a final count of 720. There are 96 RCS images corresponding to each specific incidence angle (30°, 40°, or 50°). Noise is first added to these 96 simulated samples according to the Rayleigh noise model described in Equation (36). The homogeneous noise region within the real SAR images is estimated using the maximum likelihood method, which determines the noise model parameter s = 25.04. Subsequently, the noise-added RCS images undergo augmentation through shifting and cropping to generate new training samples, culminating in a total of 768 samples. With the exception of variations in input and output data dimensions, all other hyperparameters of cycle GAN remain consistent with those outlined in [21]. Following 80 epochs and an additional 60 decay epochs, model training is completed. At this stage, three SAR image translation models corresponding to different incidence angles have been developed. During the prediction phase, these three models are employed to translate the RCS image dataset containing 480 samples, resulting in three distinct simulated SAR image datasets. For ease of reference, these datasets are designated as RCS2SAR-30, RCS2SAR-40, and RCS2SAR-50; each dataset comprises 480 simulated SAR image samples.
To evaluate the data augmentation performance of three simulated SAR image datasets, we designed a comparative experiment using aircraft target recognition networks. Previously, three independent simulated SAR image datasets RCS2SAR-xx were created, where “xx” represents “30”, “40”, or “50”. For each RCS2SAR-xx, the following experiments were conducted:
  • Original SAR Images D/T: The training dataset for these experiments comprises unprocessed original real SAR images. Specifically, the aircraft samples consist of 30 SAR image chips corresponding to Table 3. Additionally, the clutter samples are derived from the same set of SAR images, with quantities of either 60 or 90, respectively, identified by the suffixes “D (double)” and “T (triple)”.
  • Enhanced SAR Images D/T: In these experiments, we augment the size of the training dataset through shifting and cropping techniques based on real data D/T to improve the network’s generalization capability. Consequently, the number of aircraft samples increases from 30 to 510; similarly, clutter samples expand from either 60 or 90 to a total of 1020 or 1530, respectively, distinguished by the suffixes “D (double)” and “T (triple)”.
  • Data Augmentation D/T: These experiments incorporate simulated SAR images sourced from the RCS2SAR-xx dataset alongside real data D/T. The aircraft sample set includes both 30 real SAR images and an additional 480 simulated SAR images; meanwhile, clutter samples remain consistent with those in enhanced real data D/T.
To ensure comparability of quantitative indicators, all experiments used the same test data. The test set includes 500 aircraft chips and 500 clutter chips randomly selected from the Aircraft-VariedSAR dataset, with no overlap with the training data. This setup allows for the accurate evaluation of the recognition network’s performance in various unknown scenarios.
Both aircraft recognition networks employ identical hyperparameters: weights were initialized using a zero-mean Gaussian distribution with a standard deviation of \sqrt{2/n}, where n represents the number of inputs for each unit. The optimization process utilized mini-batch stochastic gradient descent with momentum, featuring a batch size of 10 and a momentum parameter set to 0.9. The learning rate was set to 1 × 10^{-3} for the initial 30 epochs, followed by a reduction to 5 × 10^{-4} for the subsequent 40 epochs. Additionally, a weight decay parameter of 4 × 10^{-3} was implemented alongside a dropout probability of 0.5 to mitigate overfitting risks.
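The configuration above maps onto a standard supervised training loop. The sketch below is a minimal PyTorch illustration of those hyperparameters; the paper does not name a framework, and the tiny ConvNet, the 128 × 128 chip size, and the synthetic data loader are placeholders. Only the weight initialization, optimizer, learning-rate schedule, batch size, weight decay, and dropout values follow the text.

```python
import math
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class ConvNet(nn.Module):
    """Placeholder two-class (aircraft vs. clutter) recognition network."""
    def __init__(self, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(dropout), nn.Linear(32 * 29 * 29, 2),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def init_weights(module):
    # Zero-mean Gaussian with std sqrt(2 / n), n = number of inputs per unit.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        n = module.weight[0].numel()
        nn.init.normal_(module.weight, mean=0.0, std=math.sqrt(2.0 / n))
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = ConvNet(dropout=0.5)
model.apply(init_weights)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=4e-3)
# lr = 1e-3 for the first 30 epochs, then 5e-4 for the remaining 40 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30], gamma=0.5)
criterion = nn.CrossEntropyLoss()

# Synthetic stand-in for the real training chips (batch size 10).
train_loader = DataLoader(
    TensorDataset(torch.randn(40, 1, 128, 128), torch.randint(0, 2, (40,))),
    batch_size=10, shuffle=True)

for epoch in range(70):
    for chips, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(chips), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```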

4.3. Evaluation Metrics

The experiments employed two categories of evaluation metrics: the first category assessed the similarity between the simulated image domain and the real image domain; the second category evaluated the effectiveness of simulated images in facilitating data augmentation.

4.3.1. Similarity Evaluation Metrics Between Image Domains

In other cases involving paired simulated data [6,34,35,36], most methods employ the structural similarity index (SSIM) and its variants as evaluation metrics for image similarity. However, in the context of unpaired image translation techniques, SSIM is particularly sensitive to pixel-level differences, rendering it unsuitable for assessing similarity between distinct image domains. As illustrated in Figure 9, even a simple rotation can lead to a significant loss of reference capability in SSIM. Consequently, there is a pressing need for new metrics that are more appropriate for quantifying the similarity between two image domains. Given that the primary differences between these domains reside in their statistical characteristics at the pixel level, methods such as those described in [12] have utilized statistical histograms to represent domain similarity. This experiment employs three specific metrics to quantify histogram distribution similarity: Bhattacharyya distance, Chi-square distance, and Kullback–Leibler (KL) pseudo-distance.
The Bhattacharyya distance quantifies the similarity between two probability distributions, with identical distributions yielding a distance of zero. Here, H_1(I) and H_2(I) denote the normalized grey-level histograms of the two image domains and N is the number of histogram bins. The formula is as follows:
d_B(H_1, H_2) = -\ln\left( \sum_{I=0}^{N-1} \sqrt{H_1(I)\, H_2(I)} \right)
Chi-square distance serves as a statistical measure for comparing observed and expected values within a dataset by calculating the squared differences between them. The formula is as follows:
d_C(H_1, H_2) = \sum_{I=0}^{N-1} \frac{\left( H_1(I) - H_2(I) \right)^2}{H_1(I)}
The Kullback–Leibler pseudo-distance, which originates from the concept of relative entropy in information theory, assesses the additional information needed to distinguish events based on a reference distribution. The formula is as follows:
d_{KL}(H_1, H_2) = \sum_{I=0}^{N-1} H_1(I)\, \log \frac{H_1(I)}{H_2(I)}
As a complement to the aforementioned objective evaluation metrics, LPIPS (learned perceptual image patch similarity) [37] is adopted as a subjective evaluation metric. LPIPS measures the perceptual similarity between simulated SAR images and real SAR images. When the evaluation involves two image sets X = {x_1, x_2, …, x_n} and Y = {y_1, y_2, …, y_n} from different domains, the average LPIPS score can be used. A lower average LPIPS score indicates a higher perceptual quality of the generated images. The formula for the average LPIPS is:
\overline{LPIPS} = \frac{1}{n} \sum_{i=1}^{n} \sum_{l} w_l \cdot \left\| \phi_l(x_i) - \phi_l(y_i) \right\|_2^2
where \phi_l(\cdot) represents the feature extractor at the l-th layer of a perceptual feature extraction network, and w_l denotes the weights of the l-th layer.
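For concreteness, the three histogram distances above can be computed directly from pooled, normalized grey-level histograms. The NumPy sketch below is one possible implementation; the epsilon guard and the 256-bin / 8-bit pooling are implementation choices, not details taken from the paper.

```python
import numpy as np

def bhattacharyya(h1, h2, eps=1e-12):
    return -np.log(np.sum(np.sqrt(h1 * h2)) + eps)

def chi_square(h1, h2, eps=1e-12):
    return np.sum((h1 - h2) ** 2 / (h1 + eps))

def kl_pseudo_distance(h1, h2, eps=1e-12):
    return np.sum(h1 * np.log((h1 + eps) / (h2 + eps)))

def pooled_histogram(images, bins=256):
    """Normalized grey-level histogram pooled over a set of image chips (8-bit assumed)."""
    values = np.concatenate([np.asarray(img).ravel() for img in images])
    hist, _ = np.histogram(values, bins=bins, range=(0, 255))
    return hist / hist.sum()

# Usage: compare the pooled histogram of 480 simulated chips with that of the real chips, e.g.
# d_b = bhattacharyya(pooled_histogram(sim_chips), pooled_histogram(real_chips))
```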

4.3.2. Usability Evaluation Metrics of Data Augmentation

To quantitatively assess the impact of the proposed data augmentation on aircraft recognition performance, we use three metrics: precision, recall, and F1-score. The F1-score indicates overall performance changes in the network. The calculation methods for these metrics are outlined as follows:
\text{Precision} = \frac{TP}{TP + FP}
\text{Recall} = \frac{TP}{TP + FN}
F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
True positive (TP) indicates a correctly identified aircraft, false positive (FP) refers to actual clutter misidentified as aircraft, true negative (TN) denotes correctly recognized clutter, and false negative (FN) signifies an actual aircraft incorrectly predicted as clutter. The neural network's output is taken as the recognition result with the highest confidence.
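A minimal sketch of how these three metrics follow from the confusion counts defined above; labels and predictions are assumed to be encoded as 1 for aircraft and 0 for clutter, taking the highest-confidence class as the prediction, and the names are illustrative.

```python
def classification_metrics(predictions, labels):
    # Accumulate confusion counts (1 = aircraft, 0 = clutter).
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```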

4.4. Performance of SingleScene–SAR Simulator

Figure 10 shows samples from the Aircraft-VariedSAR dataset, the simulated RCS image dataset, and the RCS2SAR-30/40/50 datasets.
In the real SAR images, strong scattering effects from aircraft primarily arise at the leading and trailing edges of the wings and winglets, while the fuselage appears relatively dark. This is due to modern aircraft using composite materials that differ between fuselage skin and wings, resulting in varying scattering intensities. However, under certain placements and observation angles, significant scattering from the fuselage can also be observed, as seen in samples within the green box. Given the complex physical properties of aircraft components, modeling them individually poses challenges in practical applications. Thus, treating the entire aircraft model as a perfect electric conductor (PEC) material with adequate post-processing is a reasonable approach.
In the RCS images, strong scattering is predominantly observed at the leading edges of the wings and winglets, consistent with the scattering patterns in real SAR images. Additionally, scattering effects are also visible on the fuselage due to its modeling as PEC material. While this differs slightly from real-world SAR images, it highlights the ability of the proposed method to capture the general scattering characteristics of metallic surfaces under comparable conditions. These observations validate the effectiveness of the SBR method in generating physically consistent results. Furthermore, compared with other aircraft models, the RCS images of Aircraft H exhibit certain unique features. Specifically, the scattering intensity distribution is more uniform rather than concentrated at specific points. This could be attributed to Aircraft H’s significantly higher number of triangular facets and its larger model volume (detailed parameters are shown in Table 1). These factors contribute to a broader scattering distribution and lower relative contrast. This difference also impacts the final image translation results.
In the simulated SAR image datasets RCS2SAR-30/40/50, the simulated SAR images obtained after image translation exhibit richer texture features and speckle noise closer to that of real SAR images. This is because the SBR algorithm computes the target's RCS precisely and the noise model is reasonable, making it easier for cycle GAN to map from the RCS image domain to the SAR image domain based on the given geometric and amplitude information. Additionally, some simulated SAR images show irregular, short, linear distributions of strong scattering noise. This is because the background of real SAR images is not always homogeneous noise but can include strong scattering caused by interfering objects such as runways and terminal bridges. These interfere with cycle GAN's learning of the SAR image background environment, so similar elements appear in the backgrounds of the generated simulated SAR images. Fortunately, such random interference in the background helps improve the robustness of the target recognition network.
In the samples from the RCS2SAR-30 dataset, side lobe effects in both azimuth and range directions closely resemble those found in real SAR images, as illustrated by the samples within the yellow box. This phenomenon can be attributed to the substantial side lobe effects present in the training data of the generation network model used for the RCS2SAR-30 dataset; these effects are consequently retained in the final fitted mapping function. In contrast, the RCS2SAR-40 and RCS2SAR-50 datasets, whose generation-network training data do not have significant side lobe effects, yield better imaging quality in the final simulated SAR images. Furthermore, image translation results exhibiting combinations of discrete scattering points can be observed in the simulated images of small aircraft (e.g., Aircraft A and E, within blue boxes). These results closely resemble the typical scattering characteristics exhibited by aircraft targets in low-resolution SAR images [38,39,40]. Even though the training data for cycle GAN are high-resolution SAR images, the final translated simulated SAR image still has a scattering structure similar to real SAR images when the aircraft occupies only a few pixels in the given RCS image. This indicates that the SBR simulation method, with its clear physical mechanism, has high generalizability.
The histograms of 480 samples from the different datasets are presented in Figure 11. Table 4 displays the average LPIPS and the calculated results of the similarity metrics between the different histogram curves shown in Figure 11. The first three metrics, Bhattacharyya distance, Chi-square distance, and KL pseudo-distance, are denoted by d_B, d_C, and d_KL, respectively.
The histogram curve of the simulated RCS images differs significantly from that of real SAR images. In the histogram of the simulated RCS image dataset, the aircraft target occupies only a small region, resulting in a large number of background pixels with zero values. Additionally, the normalization of the RCS images and the presence of a few strong RCS values compress the dynamic range of the images, leading to a very low histogram curve. In contrast, the histograms of the three simulated SAR datasets exhibit distribution characteristics similar to those of real SAR images. On the one hand, the simulated SAR images include background noise that closely resembles the environmental noise in real SAR images, significantly improving the histogram distribution. On the other hand, the mapping process from the RCS image domain to the SAR image domain also affects the final histogram curve. The similarity metrics of the histograms, presented in Table 4, further confirm this observation.
In addition to the histogram-based distance metrics, the average LPIPS score is noticeably reduced after translation by cycle GAN. This reduction highlights both the perceptual dissimilarity between RCS images and real SAR images and the substantial improvement achieved in the simulated SAR images. However, the average LPIPS scores of the three simulated SAR datasets did not reach extremely low values (such as 0.3), primarily due to the inherent challenges of the unpaired image-to-image translation task. Further improvements in the translation model or refinements of the physical simulation model could enhance the perceptual similarity of simulated SAR images. Nevertheless, when simulated SAR images are used for data augmentation, the objective accuracy metrics of the target recognition networks are often more critical than perceptual similarity metrics.
Table 5 presents the accuracy metrics for aircraft target recognition across 36 experiments. The F1-score metrics of all experimental groups exhibited a significant improvement following the data augmentation by the SingleScene–SAR simulator. This is because the simulated SAR images not only resemble real SAR images but also provide more features of aircraft targets for the training set. Since the simulator can obtain a substantial number of training samples with different imaging parameters and target parameters at a very low cost, it can greatly enhance the generalization ability of the target recognition networks. This improvement is evidenced by consistent performance gains on a test set comprising 1000 SAR image samples characterized by 40 distinct imaging parameters. The maximum difference in metric improvement can reach 12.10%.
Further analysis indicates that precision remains relatively stable, while recall shows significant improvement across all data augmentation groups. This phenomenon can be attributed to the fact that, in the absence of simulated data for data augmentation, both the original SAR images group and the enhanced SAR images group rely solely on a single SAR scene image for their training data. Consequently, this limited dataset provides little information and restricts generalization capabilities. As a result, both ConvNet and VGGNet can only recognize aircraft targets with obvious features, and their performance is poor for targets with more interference or less obvious scattering features. In contrast, using simulated data for data augmentation enables the training set to acquire comprehensive features pertaining to aircraft targets under diverse imaging and target parameter conditions. This approach significantly enhances the neural network's recognition performance for aircraft targets that exhibit less pronounced features, yielding recall improvements ranging from 7.5% to 16.75%.
An interesting observation in Table 5 is the presence of negative values in the precision differences. In other words, in half of the experimental groups, the precision metric slightly decreased after applying data augmentation techniques. This phenomenon can be explained from four perspectives. Firstly, the fluctuation in precision may be related to the imbalance between positive and negative data. The experimental groups with a negative precision difference are primarily those with double the clutter, marked with the suffix "D". When data augmentation techniques were applied, the imbalance between positive and negative samples was further exacerbated. In contrast, the experimental groups with triple the clutter, marked with the suffix "T", were able to maintain more stable precision. Secondly, the target recognition network is driven by cross-entropy loss rather than precision. When a large number of positive samples were introduced through data augmentation, precision exhibited minor fluctuations while recall increased significantly; this optimization objective slightly sacrificed the ability to constrain false positives in exchange for a significant improvement in overall performance. Thirdly, the domain gap between simulated SAR images and real SAR images may have slightly reduced the network's adaptability to the distribution of real data, potentially leading to instability in precision under certain conditions. Lastly, the fluctuation in precision may be related to uncertainty in the training data. The absolute values of the negative precision differences were all less than 0.01, and such minor performance fluctuations are tolerable. When the precision differences are averaged across all 36 groups, precision actually increased by approximately 0.0037. This phenomenon suggests two insights. On the one hand, in practical applications, providing as much real data as possible (especially negative data) within allowable limits will help mitigate the instability of precision. On the other hand, in future work, improving the quality of simulated SAR images and narrowing the gap between them and real SAR images will also be beneficial.
Specifically, augmentation with the RCS2SAR-30 simulated dataset yielded the most significant improvement in the target recognition network. The average F1-score improvement of the corresponding four groups was 9.49%, significantly better than the other two simulated datasets. This is because the real SAR images used in this group of experiments have severe side lobe effects and poor imaging quality, resulting in unimpressive performance metrics for the original SAR images group and the enhanced SAR images group. The introduction of the RCS2SAR-30 dataset largely compensates for this disadvantage, resulting in the highest performance improvement. In other words, the data augmentation method based on the SingleScene–SAR simulator is particularly suitable for application scenarios where real SAR images are scarce and of poor imaging quality. Correspondingly, data augmentation with the RCS2SAR-40 simulated dataset shows a relatively smaller improvement in the metric differences, but the final absolute F1-score reaches up to 93.58%. This is because the real SAR images used in this group of experiments have an incidence angle of 40°. As shown in Figure 7, many samples in the test set have incidence angles around 40°, meaning the difference between the training set and the test set is relatively small. Even though the enhanced SAR images group in the RCS2SAR-40 experiments already achieves good accuracy, data augmentation with the SingleScene–SAR simulator still provides an effective performance improvement.
Finally, the comparative results between ConvNet and VGGNet indicate that, regardless of the architecture of the target recognition network, data augmentation techniques utilizing simulated data can achieve stable metric improvement. Furthermore, it is evident that ConvNet performs better than VGGNet. This is because the fully convolutional layer structure of ConvNet sacrifices a small amount of expressive ability but has better generalization performance. Consequently, this strategy makes it more suitable for training datasets containing simulated SAR images.
Figure 12 illustrates the variation of the F1-score with the number of iterations during the 70 epochs of training for the ConvNet and VGGNet neural networks. Setting aside fluctuations caused by dataset differences in some experimental groups, the F1-score curves of most data augmentation groups rise rapidly to near their maximum within 10 epochs and stabilize by around 20 epochs. This stabilization generally occurs before the learning rate is reduced at the 30th epoch. This indicates that the convergence speed and stability of the target recognition networks are improved when combined with data augmentation.

4.5. Ablation Experiments

To evaluate the impact of various functional modules within the SingleScene–SAR simulator on data augmentation performance, we conducted a series of ablation experiments. As presented in Table 5, six groups of data augmentation experiments utilized the simulated datasets RCS2SAR-30/40/50. These six groups served as baseline experiments for the ablation study. For each baseline experiment, the following three ablation experiments were set up:
  • SBR only: These experiments use only the unprocessed simulated RCS images as training data for data augmentation. The rest of the samples from real SAR images in the training set remain consistent with the baseline experiment.
  • SBR and noise estimator only: These experiments add noise to the simulated RCS images using the noise estimator module based on the SBR-only D/T but do not perform image translation.
  • SBR and cycle GAN only: These experiments perform image translation on the simulated RCS images based on the SBR-only D/T but do not add noise.
In order to ensure comparability of metrics, all ablation experiments used the same test datasets as the baseline experiments, and all network hyperparameters remained unchanged.
Figure 13 shows some of the simulated data used for data augmentation in the ablation experiments. The SBR and noise estimator-only experimental group, lacking image translation by cycle GAN and using a simplified noise model that fails to capture realistic background noise, produced simulated images with significant differences from real SAR images. Additionally, the SBR and cycle GAN-only experimental group experienced mode collapse during cycle GAN training. This is because it is difficult to establish a simple mapping between targets and background noise in real SAR images. Under the limited network capacity of cycle GAN, simultaneously learning the mapping to SAR targets, G: X_RCS → Y_SAR-Target, and to background noise, F: X_RCS → Y_SAR-Noise, is challenging. In other words, adding noise with an appropriate distribution model, such as Rayleigh-distributed noise, can significantly reduce the difficulty of fitting the mapping between image domains for cycle GAN.
As shown in Table 6, the F1-score metric for all experimental groups decreased after ablating some or all modules. The SBR and cycle GAN-only group had an average decrease of 9.92%, the SBR and noise estimator-only group had an average decrease of 13.43%, and the SBR-only group had the largest average decrease of 15.34%. This indicates that both the noise estimator and cycle GAN are indispensable for enhancing the training data of the target recognition network; used individually they are less effective, and they must be combined to achieve the best performance. Compared with the baseline experiments, the precision metric of the ablation experiments remained largely unchanged, while the recall metric decreased significantly. The SBR and cycle GAN-only group had an average recall decrease of 13.13%, while the SBR and noise estimator-only group had an average recall decrease of 19.38%. The decrease in recall indicates a reduced ability of the target recognition network to recognize difficult-to-detect aircraft targets. The larger recall drop of the SBR and noise estimator-only group, which lacks image translation, shows that recall is more sensitive to the image translation module, which refines more aircraft features than the noise model alone.

5. Conclusions

This paper investigates a data augmentation approach for SAR target recognition tasks to address the scarcity of training data. It proposes and elaborates a technical framework, termed the SingleScene–SAR simulator, for training high-performance target recognition networks through data augmentation when only a single SAR scene image of an airport is available. Firstly, the simulated RCS images of the targets are acquired using the rasterized SBR electromagnetic simulation algorithm. Secondly, cycle GAN is trained in a specific pattern, and the RCS images are translated into simulated SAR images that closely resemble the real ones. Finally, the simulated images are used for data augmentation. Experimental results show that the proposed method can efficiently generate 480 simulated SAR images with spatial and grayscale statistical features close to reality using only 30 SAR aircraft chips from a single scene image. After data augmentation with simulated SAR images, all target recognition networks, across different real sample configurations and network architectures, achieve stable and significant improvements in accuracy, and their generalization and robustness are also enhanced. This method efficiently expands the acquisition channels for simulated SAR image data and effectively mitigates the restrictions imposed by the scale and quality of real training sets on target recognition networks. The proposed method also offers the potential for extending the application of deep learning to SAR images.

Author Contributions

Conceptualization, S.F.; data curation, S.F. and X.F.; formal analysis, S.F.; funding acquisition, X.L. and X.F.; investigation, S.F.; methodology, S.F. and Y.F.; project administration, X.F.; resources, S.F. and X.L.; software, S.F. and Y.F.; supervision, X.L. and X.F.; validation, S.F. and X.F.; visualization, S.F.; writing—original draft, S.F.; writing—review and editing, S.F. and X.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the LuTan-1 L-Band Spaceborne Bistatic SAR data processing program, grant number E0H2080702.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cha, M.; Majumdar, A.; Kung, H.T.; Barber, J. Improving SAR automatic target recognition using simulated images under deep residual refinements. In Proceedings of the ICASSP, Calgary, AB, Canada, 15–20 April 2018; pp. 2606–2610. [Google Scholar]
  2. Balz, T.; Hammer, H.; Auer, S. Potentials and limitations of SAR image simulators—A comparative study of three simulation approaches. ISPRS J. Photogramm. Remote Sens. 2015, 101, 102–109. [Google Scholar] [CrossRef]
  3. Nasr, J.M.; Vidal-Madjar, D. Image simulation of geometric targets for spaceborne synthetic aperture radar. IEEE Trans. Geosci. Remote Sens. 1991, 29, 986–996. [Google Scholar] [CrossRef]
  4. Xu, F.; Jin, Y.-Q. Imaging Simulation of Polarimetric SAR for a Comprehensive Terrain Scene Using the Mapping and Projection Algorithm. IEEE Trans. Geosci. Remote Sens. 2006, 44, 3219–3234. [Google Scholar] [CrossRef]
  5. Fan, W.; Zhang, M.; Li, J. Numerical simulation of highly squinted spotlight SAR images from complex targets using an amendatory frequency scaling algorithm. Int. J. Remote Sens. 2018, 39, 3306–3319. [Google Scholar] [CrossRef]
  6. He, W.; Yokoya, N. Multi-Temporal Sentinel-1 and -2 Data Fusion for Optical Image Simulation. ISPRS Int. J. Geo-Inf. 2018, 7, 389. [Google Scholar] [CrossRef]
  7. Auer, S.; Hinz, S.; Bamler, R. Ray-Tracing Simulation Techniques for Understanding High-Resolution SAR Images. IEEE Trans. Geosci. Remote Sens. 2010, 48, 1445–1456. [Google Scholar] [CrossRef]
  8. Hammer, H.; Schulz, K. Coherent simulation of SAR images. Image Signal Process. Remote Sens. XV 2009, 7477, 406–414. [Google Scholar]
  9. Amanatides, J.; Woo, A. A fast voxel traversal algorithm for ray tracing. Eurographics 1987, 87, 3–10. [Google Scholar]
  10. Balz, T.; Stilla, U. Hybrid GPU-Based Single- and Double-Bounce SAR Simulation. IEEE Trans. Geosci. Remote Sens. 2009, 47, 3519–3529. [Google Scholar] [CrossRef]
  11. Bao, X.; Pan, Z.; Liu, L.; Lei, B. SAR Image Simulation by Generative Adversarial Networks. In Proceedings of the IGARSS, Yokohama, Japan, 28 July–2 August 2019; pp. 9995–9998. [Google Scholar]
  12. Cao, C.; Cao, Z.; Cui, Z. LDGAN: A Synthetic Aperture Radar Image Generation Method for Automatic Target Recognition. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3495–3508. [Google Scholar] [CrossRef]
  13. Zhang, M.; Cui, Z.; Wang, X.; Cao, Z. Data Augmentation Method of SAR Image Dataset. In Proceedings of the IGARSS, Valencia, Spain, 22–27 July 2018; pp. 5292–5295. [Google Scholar]
  14. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the NIPS, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  15. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 2256–2265. [Google Scholar]
  16. Qosja, D.; Wagner, S.; O’Hagan, D. SAR Image Synthesis with Diffusion Models. In Proceedings of the 2024 IEEE Radar Conference, Denver, CO, USA, 6–10 May 2024; pp. 1–6. [Google Scholar]
  17. Ling, H.; Chou, R.-C.; Lee, S.-W. Shooting and bouncing rays: Calculating the RCS of an arbitrarily shaped cavity. IEEE Trans. Antennas Propag. 1989, 37, 194–205. [Google Scholar] [CrossRef]
  18. Xia, W.; Li, H.; Wang, F.; Li, H.; Zhang, J. SAR image simulation for urban structures based on SBR. In Proceedings of the IEEE Radar Conference, Cincinnati, OH, USA, 19–23 May 2014; pp. 0792–0795. [Google Scholar]
  19. Dong, C.-L.; Meng, X.; Guo, L.-X. Research on SAR Imaging Simulation Based on Time-Domain Shooting and Bouncing Ray Algorithm. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1519–1530. [Google Scholar] [CrossRef]
  20. Chang, Y.-L.; Chiang, C.-Y.; Chen, K. SAR image simulation with application to target recognition. Progr. Electromagn. Res. 2011, 119, 35–57. [Google Scholar] [CrossRef]
  21. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the ICCV, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  22. Chang, Y.; Liu, C.; Cao, L.; Yu, W.; Chen, H.; Cui, T. Generating SAR Images Based on Neural Network. In Proceedings of the ICCEM, Shanghai, China, 20–22 March 2019; pp. 1–3. [Google Scholar]
  23. Liu, L.; Pan, Z.; Qiu, X.; Peng, L. SAR Target Classification with Cycle GAN Transferred Simulated Samples. In Proceedings of the IGARSS, Valencia, Spain, 22–27 July 2018; pp. 4411–4414. [Google Scholar]
  24. Rosales, R.; Achan, K.; Frey, B. Unsupervised image translation. In Proceedings of the ICCV, Nice, France, 13–16 October 2003; pp. 472–478. [Google Scholar]
  25. Liu, M.-Y.; Breuel, T.; Kautz, J. Unsupervised image-to-image translation networks. In Proceedings of the NIPS, Long Beach, CA, USA, 4–9 December 2017; pp. 700–708. [Google Scholar]
  26. Taigman, Y.; Polyak, A.; Wolf, L. Unsupervised Cross-Domain Image Generation. arXiv 2016, arXiv:1611.02200. [Google Scholar]
  27. Wu, K.; Jin, G.; Xiong, X.; Zhang, H.; Wang, L. SAR Image Simulation Based on Effective View and Ray Tracing. Remote Sens. 2022, 14, 5754. [Google Scholar] [CrossRef]
  28. Xu, Z. Wavelength-Resolution SAR Speckle Model. IEEE Geosci. Remote Sens. Lett. 2022, 9, 4504005. [Google Scholar] [CrossRef]
  29. Voigt, G.H.M.; Alves, D.I.; Muller, C.; Machado, R.; Ramos, L.P.; Vu, V.T.; Pettersson, M.I. A Statistical Analysis for Intensity Wavelength-Resolution SAR Difference Images. Remote Sens. 2023, 15, 2401. [Google Scholar] [CrossRef]
  30. Chen, S.; Wang, H.; Xu, F.; Jin, Y.-Q. Target Classification Using the Deep Convolutional Networks for SAR Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4806–4817. [Google Scholar] [CrossRef]
  31. RaytrAMP. Available online: https://github.com/RedBlight/RaytrAMP (accessed on 30 March 2024).
  32. Umbra Synthetic Aperture Radar (SAR) Open Data. Available online: https://registry.opendata.aws/umbra-open-data (accessed on 30 March 2024).
  33. Keydel, E.R.; Lee, S.W.; Moore, J.T. MSTAR extended operating conditions: A tutorial. In Proceedings of the 3rd SPIE Conference Algorithms SAR Imagery, Orlando, FL, USA, 10 June 1996; SPIE: Bellingham, WA, USA, 1996; Volume 2757, pp. 228–242. [Google Scholar]
  34. Niu, S.; Qiu, X.; Lei, B.; Ding, C.; Fu, K. Parameter Extraction Based on Deep Neural Network for SAR Target Simulation. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4901–4914. [Google Scholar] [CrossRef]
  35. Niu, S.; Qiu, X.; Lei, B.; Fu, K. A SAR Target Image Simulation Method With DNN Embedded to Calculate Electromagnetic Reflection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2593–2610. [Google Scholar] [CrossRef]
  36. Sampat, M.P.; Wang, Z.; Gupta, S.; Bovik, A.C.; Markey, M.K. Complex Wavelet Structural Similarity: A New Image Similarity Index. IEEE Trans. Image Process. 2009, 18, 2385–2401. [Google Scholar] [CrossRef] [PubMed]
  37. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  38. Chen, J.; Wang, H.; Lu, H. Aircraft Detection in SAR Images via Point Features. IEEE Geosci. Remote Sens. Lett. 2024, 21, 4004205. [Google Scholar] [CrossRef]
  39. Zhao, Y.; Zhao, L.; Liu, Z.; Hu, D.; Kuang, G.; Liu, L. Attentional Feature Refinement and Alignment Network for Aircraft Detection in SAR Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5220616. [Google Scholar] [CrossRef]
  40. Chen, Y.; Cong, Y.; Zhang, L. Deformable Scattering Feature Correlation Network for Aircraft Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4007205. [Google Scholar] [CrossRef]
Figure 1. Shooting and bouncing rays.
Figure 2. (a) Snell’s Law. (b) Ray tube reflection on the triangular facet.
Figure 3. The network architecture of cycle GAN.
Figure 4. Framework of the SingleScene–SAR simulator. (a) The component of RCS image simulation. (b) The component of SAR image translation.
Figure 5. (a) Spaceborne SAR platform motion model. (b) The grid sampling of the target model and the shape vectors of the ray pool.
Figure 6. Aircraft recognition network architecture. (a) ConvNet. (b) VGGNet.
Figure 7. Parameter of SAR images from the Aircraft-VariedSAR dataset.
Figure 8. Aircraft models.
Figure 9. The real SAR image (a) and its 90° rotated counterpart (b). The SSIM of these two images is 0.1921.
Figure 10. Real SAR images, simulated RCS images, and simulated SAR images generated from the RCS2SAR-30/40/50 datasets. The 1st and 2nd rows display real SAR images; the 3rd and 4th rows showcase simulated RCS images. The final 6 rows consist of simulated SAR images derived from the RCS2SAR-30/40/50 datasets, with dataset names indicated on the left side. In addition to the real SAR images, corresponding CAD models and incident angles for each image are labeled above each column. The angle values located in the bottom right corner of the images denote the rotation angle of the CAD model around the z-axis.
Figure 11. Normalized histogram results.
Figure 12. F1-score curves. The experiments are organized from top to bottom according to the simulated datasets RCS2SAR-30, RCS2SAR-40, and RCS2SAR-50. From left to right, the experiments utilize ConvNet and VGGNet architectures with double-sized clutters (D) and triple-sized clutters (T), respectively.
Figure 13. Simulated images derived from the ablation experiments. The 1st row shows images from “SBR and Noise Estimator only”; the remaining 3 rows show images from “SBR and cycle GAN only”, with the names of the corresponding simulated datasets of baseline labeled on the left. The angle values in the bottom right corner of each image represent the clockwise rotation angle of the CAD model around the z-axis.
Table 1. Parameters of aircraft models.

Name | Length (Meter) | Width (Meter) | Number of Triangular Faces
Aircraft A | 39.5 | 35.8 | 152,633
Aircraft B | 76.3 | 64.7 | 5706
Aircraft C | 69.8 | 66.3 | 79,517
Aircraft D | 63.0 | 60.0 | 9250
Aircraft E | 37.6 | 33.7 | 371,197
Aircraft F | 63.6 | 62.7 | 196,994
Aircraft G | 66.8 | 66.1 | 72,200
Aircraft H | 77.2 | 79.8 | 545,862
Table 2. SAR simulation parameters for SingleScene–SAR simulator.

Symbol | Value | Description
p | VV | Polarization mode of the antenna
σ_scan | 1 | Antenna left-right viewing parameter (1 for left view and −1 for right view)
θ_az | 2.49 | Scanning angle of the antenna along the azimuth direction (degree)
θ_sq | 0 | Squint angle (rad)
Δx | 0.5 | Range resolution
Δy | 0.5 | Azimuth resolution
f_0 | 9.6 × 10^9 | Central operating frequency of the SAR platform
f_PRF | 5.708 × 10^3 | Pulse repetition frequency
ρ_tube | 10 | Line density of the ray tube per unit wavelength
Table 3. SAR images for cycle GAN translation.

Name | Incidence Angle (Degree) | SAR Image Serial Number
RCS2SAR-30 | 30° | 2023-02-01-03-05-24
RCS2SAR-40 | 40° | 2023-03-05-14-25-30
RCS2SAR-50 | 50° | 2023-03-25-14-19-09
Table 4. Similarity evaluation metrics between image domains.

Metric | d_B | d_C | d_KL | Average LPIPS
Real SAR vs. Real SAR | 0.000 | 0.000 | 0.000 | 0.000
Real SAR vs. RCS Images | 0.528 | 0.533 | 2.245 | 0.845
Real SAR vs. RCS2SAR-30 | 0.049 | 0.014 | 0.200 | 0.479
Real SAR vs. RCS2SAR-40 | 0.038 | 0.014 | 0.175 | 0.487
Real SAR vs. RCS2SAR-50 | 0.046 | 0.018 | 0.257 | 0.463
Table 5. Target recognition network performance metrics. The difference is calculated by subtracting the metrics of the enhanced SAR images item from those of the data augmentation item above it. Negative values indicate that the latter is lower than the former.

Dataset | Training Set | ConvNet Precision | ConvNet Recall | ConvNet F1 | VGGNet Precision | VGGNet Recall | VGGNet F1
RCS2SAR-30 | Original SAR Images D | 0.9944 | 0.5551 | 0.7124 | 0.9971 | 0.5325 | 0.6942
RCS2SAR-30 | Enhanced SAR Images D | 0.9908 | 0.6294 | 0.7696 | 0.9933 | 0.5544 | 0.7116
RCS2SAR-30 | Data Augmentation D | 0.9813 | 0.7538 | 0.8520 | 0.9835 | 0.7219 | 0.8326
RCS2SAR-30 | Difference | −0.0095 | 0.1244 | 0.0824 | −0.0098 | 0.1675 | 0.1210
RCS2SAR-30 | Original SAR Images T | 0.9945 | 0.5657 | 0.7210 | 0.9971 | 0.5163 | 0.6802
RCS2SAR-30 | Enhanced SAR Images T | 0.9918 | 0.5778 | 0.7302 | 0.9922 | 0.5491 | 0.7068
RCS2SAR-30 | Data Augmentation T | 0.9938 | 0.6900 | 0.8136 | 0.9897 | 0.6712 | 0.7996
RCS2SAR-30 | Difference | 0.0020 | 0.1122 | 0.0834 | −0.0025 | 0.1221 | 0.0928
RCS2SAR-40 | Original SAR Images D | 0.9846 | 0.6899 | 0.8110 | 0.9820 | 0.6249 | 0.7638
RCS2SAR-40 | Enhanced SAR Images D | 0.9757 | 0.8030 | 0.8808 | 0.9625 | 0.7305 | 0.8306
RCS2SAR-40 | Data Augmentation D | 0.9730 | 0.9014 | 0.9358 | 0.9563 | 0.7933 | 0.8672
RCS2SAR-40 | Difference | −0.0027 | 0.0984 | 0.0550 | −0.0061 | 0.0627 | 0.0366
RCS2SAR-40 | Original SAR Images T | 0.9962 | 0.5751 | 0.7290 | 0.9942 | 0.5267 | 0.6886
RCS2SAR-40 | Enhanced SAR Images T | 0.9780 | 0.7536 | 0.8512 | 0.9860 | 0.6901 | 0.8118
RCS2SAR-40 | Data Augmentation T | 0.9803 | 0.8895 | 0.9326 | 0.9783 | 0.7791 | 0.8674
RCS2SAR-40 | Difference | 0.0023 | 0.1359 | 0.0814 | −0.0076 | 0.0889 | 0.0556
RCS2SAR-50 | Original SAR Images D | 0.9812 | 0.7607 | 0.8568 | 0.9700 | 0.6839 | 0.8022
RCS2SAR-50 | Enhanced SAR Images D | 0.9528 | 0.7998 | 0.8694 | 0.9489 | 0.6406 | 0.7648
RCS2SAR-50 | Data Augmentation D | 0.9596 | 0.9008 | 0.9292 | 0.9729 | 0.7751 | 0.8628
RCS2SAR-50 | Difference | 0.0068 | 0.1011 | 0.0598 | 0.0240 | 0.1345 | 0.0980
RCS2SAR-50 | Original SAR Images T | 0.9830 | 0.6840 | 0.8066 | 0.9858 | 0.6019 | 0.7474
RCS2SAR-50 | Enhanced SAR Images T | 0.9647 | 0.7455 | 0.8410 | 0.9650 | 0.6431 | 0.7718
RCS2SAR-50 | Data Augmentation T | 0.9834 | 0.8224 | 0.8954 | 0.9839 | 0.7181 | 0.8302
RCS2SAR-50 | Difference | 0.0187 | 0.0769 | 0.0544 | 0.0189 | 0.0750 | 0.0584
Table 6. Difference in performance metrics among baseline and ablation experiments. Negative values indicate that the performance of the ablation experiment was lower than the baseline experiment. The maximum and minimum values in each column are highlighted in bold.

Dataset | Ablation Setting | ConvNet Precision | ConvNet Recall | ConvNet F1 | VGGNet Precision | VGGNet Recall | VGGNet F1
Real & Simulated Group D/T with RCS2SAR-30 Dataset | SBR D | 0.0117 | −0.2196 | −0.1622 | 0.0107 | −0.2737 | −0.2148
 | SBR + Noise Estimator D | 0.0158 | −0.2402 | −0.1742 | 0.0123 | −0.2393 | −0.1826
 | SBR + Image Translation D | −0.0464 | −0.1782 | −0.1396 | −0.0421 | −0.1405 | −0.1140
 | SBR T | 0.0045 | −0.1919 | −0.1504 | 0.0090 | −0.2395 | −0.1968
 | SBR + Noise Estimator T | 0.0056 | −0.1918 | −0.1490 | 0.0097 | −0.1972 | −0.1570
 | SBR + Image Translation T | −0.0484 | −0.1471 | −0.1248 | −0.0478 | −0.1137 | −0.0992
Real & Simulated Group D/T with RCS2SAR-40 Dataset | SBR D | 0.0156 | −0.3299 | −0.2118 | 0.0347 | −0.2954 | −0.2044
 | SBR + Noise Estimator D | 0.0072 | −0.2927 | −0.1848 | 0.0321 | −0.2724 | −0.1850
 | SBR + Image Translation D | −0.0385 | −0.1455 | −0.1004 | −0.0323 | −0.1507 | −0.1096
 | SBR T | 0.0095 | −0.3056 | −0.1982 | 0.0133 | −0.2794 | −0.2030
 | SBR + Noise Estimator T | 0.0117 | −0.2961 | −0.1900 | 0.0102 | −0.2483 | −0.1768
 | SBR + Image Translation T | −0.0333 | −0.1946 | −0.1318 | −0.0134 | −0.1828 | −0.1304
Real & Simulated Group D/T with RCS2SAR-50 Dataset | SBR D | 0.0161 | −0.0970 | −0.0480 | 0.0091 | −0.0874 | −0.0540
 | SBR + Noise Estimator D | 0.0148 | −0.0932 | −0.0460 | 0.0080 | −0.0379 | −0.0212
 | SBR + Image Translation D | −0.0331 | −0.1202 | −0.0824 | −0.0165 | −0.0503 | −0.0382
 | SBR T | −0.0049 | −0.1476 | −0.0970 | 0.0057 | −0.1480 | −0.1068
 | SBR + Noise Estimator T | −0.0074 | −0.1229 | −0.0806 | 0.0048 | −0.0933 | −0.0646
 | SBR + Image Translation T | −0.0328 | −0.0844 | −0.0646 | −0.0254 | −0.0675 | −0.0552