1. Introduction
Signal detection under a low signal-to-noise ratio (SNR) and complex clutter is a challenging and important task in signal processing [1]. Because of complex clutter, radar target echoes are usually weak, so the detection performance often fails to meet application requirements [2]. The classical approach to this problem is a fast Fourier transform (FFT)-based constant false alarm rate (CFAR) detector. However, this method suffers from severe performance degradation due to poor resolution and leakage of the spectral energy, which motivates the search for a new theoretical framework.
Information geometry, a theory based on statistical manifolds, applies differential geometry to problems in information science. It has been applied in numerous areas, e.g., neural networks [3], image processing [4,5,6], information geometric detection [7,8,9,10,11,12], and dictionary learning and sparse coding [13]. Signal detection based on information geometry was first proposed in 1989, when a multisource statistical inference problem was analyzed and the hypothesis testing problem was explained using a statistical manifold [14], which highlighted the fundamental role that manifold theory plays in statistical information. After that, a series of statistical inference theories was investigated via information geometry [15,16]. From the perspective of information geometry, the distribution of the observed sample data, parameterized by θ, is regarded as a point on a statistical manifold, with a cylindrical confidence zone R that is centered on θ0, where the parameter θ0 represents the null hypothesis sample data and M denotes the statistical manifold. After statistical modeling of the observed sample data, we can determine whether θ is equal to θ0 or not.
Figure 1 illustrates the basic principle of the statistical hypothesis problem.
This novel method, which is based on a Riemannian manifold, provides a new approach to complex signal processing problems. In recent years, Barbaresco proposed a CFAR detector based on Cartan's geometry of the Hermitian positive-definite (HPD) product manifold [17,18,19]. The CFAR detector obtains the maximum detection probability while keeping the false alarm rate constant [17], which has become a seminal result in target detection. Furthermore, several studies have established that this detector has a large performance advantage in signal processing [9,20]. However, a potential drawback is that the algorithm contains many matrix exponential, logarithm, and inverse operations, which strongly impact the computational efficiency, as detailed in Section 2. Thus, it is imperative to find an efficient way to optimize this CFAR detector algorithm, which is also known as the matrix information geometric signal detection (MIGSD) algorithm.
Recently, additional studies and applications that combine signal detection with high-performance computing (HPC) methods have been conducted. The practicality of HPC has also been demonstrated, especially for marine and atmospheric numerical calculations. The message passing interface (MPI) has been widely used to parallelize the semi-Lagrangian shallow-water model [21], the parallel ocean model (POM) [22], and the finite-volume coastal ocean circulation model (FVCOM) [23]. Open Multi-Processing (OpenMP) [24] is extensively applied in the coastal ocean circulation model [25], the fifth-generation mesoscale numerical weather prediction model (MM5) [24], and many other weather forecast, wave, and ocean models [26]. In addition, HPC parallel methods are no longer limited to marine meteorology and are now used in many other scientific areas, e.g., large-scale image data processing and pattern recognition [27], molecular dynamics [28], computational fluid dynamics [29,30], and the simulation of celestial motion [31].
In the HPC area, OpenMP achieves superior parallel performance in shared-memory environments [32], while MPI is the standard for parallel programming on distributed-memory computers [33]. However, as the communication between nodes grows rapidly, the bandwidth limits efficiency; in this case, a single parallel technology (e.g., MPI or OpenMP alone) does not yield the desired performance [34,35]. Therefore, we need a parallel programming scheme that lets applications use this hybrid hardware structure efficiently, with minimal overhead and higher performance. The hybrid MPI/OpenMP programming model realizes two levels of parallelism: message passing between nodes and shared-memory programming within a node. Its basic strategy is to run multiple MPI processes on each node, with OpenMP threads executing inside each MPI process [36,37], which can significantly improve the efficiency of the program.
This hybrid MPI/OpenMP parallel method has been applied extensively in scientific computing. Combining hybrid MPI/OpenMP with the weather research and forecasting (WRF) model yielded better performance than pure MPI or pure OpenMP [38]. Duan Geng [39] used the hybrid MPI/OpenMP programming model to improve the KMP algorithm. Phu Luong [40] applied dual-level parallelism in coastal ocean circulation modeling. Furthermore, the hybrid parallel method has been applied in many emerging fields, e.g., machine learning [41,42,43], data mining [44], and cloud computing [45]. Hybrid MPI/OpenMP is therefore a well-established and promising candidate for scientific applications.
The remainder of this paper is organized as follows. Section 2 introduces information geometry and the background of the MIGSD method, and then analyzes the critical computational components and the computational complexity of the serial algorithm. In Section 3, we present our high-performance computing (HPC)-based MIGSD algorithm, which uses hybrid MPI/OpenMP, and detail our efforts to realize high computational and parallel efficiency on the Tianhe-2 supercomputer. Our experimental results are presented in Section 4, and our ongoing work to overcome the limitations of the current implementation is discussed in Section 5, followed by the conclusions of this study.
2. The Matrix Information Geometric Signal Detection Method
In this section, we describe in detail how to map the sample data to a high-dimensional manifold. Then, we derive the Riemannian mean matrix. Finally, we analyze the computational complexity of the MIGSD algorithm. Recall that a manifold extends the concepts of curves and surfaces to higher-dimensional spaces; if a Riemannian metric can be established on the (local) space of a manifold, the manifold is a Riemannian manifold.
2.1. Mapping from the Sample Data to an HPD Manifold
The main strategy of the MIGSD method is illustrated in Figure 2. The sample data obey a zero-mean complex Gaussian distribution. Since the mean is zero, the information in the sample data is contained in the covariance matrix, and the covariance matrices of all range cells constitute a nonlinear HPD manifold. As an extension of the statistical hypothesis problem, by comparing the geometric distance between the unit matrix and the Riemannian mean matrix with a specified threshold γ, we can judge whether the test cell corresponds to a signal or noise.
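For received sample data $z = (z_0, z_1, \ldots, z_{n-1})^{T}$, where $n$ is the length of the pulse data, the matrix information geometric detector distinguishes the signal from the clutter. Assume that $z$ satisfies a zero-mean complex Gaussian distribution, namely, $z \sim \mathcal{CN}(0, H)$, with the probability density function expressed as follows:
$$p(z \mid H) = \frac{1}{\pi^{n} |H|} \exp\left(-z^{\dagger} H^{-1} z\right),$$
where $|H|$ represents the determinant of the covariance matrix and $z^{\dagger}$ is the conjugate transpose of $z$. The covariance matrix $H$ is formulated as:
$$H = E\left[z z^{\dagger}\right] = \begin{pmatrix} h_0 & h_1 & \cdots & h_{n-1} \\ \bar{h}_1 & h_0 & \cdots & h_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{h}_{n-1} & \bar{h}_{n-2} & \cdots & h_0 \end{pmatrix}, \qquad h_k = E\left[z_i \bar{z}_{i+k}\right], \quad 0 \le k \le n-1,$$
in which the parameter $h_k$ represents the correlation coefficients, where $\bar{z}$ is the complex conjugate of $z$. $H$ is essentially a Toeplitz HPD matrix. According to the ergodicity of the stationary Gaussian process, we can calculate $h_k$ by replacing the statistical expectation with a time average:
$$\hat{h}_k = \frac{1}{n} \sum_{i=0}^{n-1-k} z_i \bar{z}_{i+k}, \quad 0 \le k \le n-1.$$
As alluded to above, all the covariance matrices $H$ corresponding to the range cells constitute a matrix manifold, which contains the correlation information between the sample data. Thus, the $n$-dimensional vector of the sample data is mapped into the space of $n \times n$ matrices, which can be written as:
$$\psi: \mathbb{C}^{n} \to \mathcal{M}, \qquad z \mapsto H,$$
where $\mathcal{M}$ represents a Riemannian manifold with nonpositive curvature and $\mathbb{C}^{n}$ represents the $n$-dimensional complex vector space.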
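The mapping from one range cell to an HPD matrix can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' MATLAB or Fortran code: the function name `toeplitz_covariance` and the test vector are hypothetical, and it assumes the biased time-average estimate (division by n) of the correlation coefficients, with entry (i, j) of the matrix equal to the lag j - i coefficient above the diagonal and its conjugate below.

```python
import numpy as np

def toeplitz_covariance(z):
    """Estimate the Toeplitz HPD covariance matrix of one range cell (sketch)."""
    z = np.asarray(z, dtype=complex)
    n = z.size
    # Biased time average: h[k] = (1/n) * sum_i z[i] * conj(z[i + k])
    h = np.array([np.sum(z[:n - k] * np.conj(z[k:])) / n for k in range(n)])
    # lag[i, j] = j - i; use h[lag] above the diagonal and conj(h[-lag]) below it
    lag = np.arange(n)[None, :] - np.arange(n)[:, None]
    return np.where(lag >= 0, h[np.abs(lag)], np.conj(h[np.abs(lag)]))

# Hypothetical input: 8 pulses of unit-variance complex Gaussian noise
rng = np.random.default_rng(0)
z = (rng.standard_normal(8) + 1j * rng.standard_normal(8)) / np.sqrt(2)
H = toeplitz_covariance(z)
```

Dividing by n rather than by n - k keeps the estimate positive semidefinite, which is why this variant is assumed here; the diagonal of `H` is the average pulse power.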
2.2. Derivation of the Riemannian Mean Matrix
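Now, we are ready to derive the Riemannian mean matrix. A manifold $\mathcal{M}$ contains a set of points endowed with a curved structure, and $H_i$ represents an HPD matrix on the manifold $\mathcal{M}$. Between two points $H_1$ and $H_2$ on $\mathcal{M}$, there are infinitely many paths, and the shortest one is the geodesic. In this paper, we measure the distance between $H_1$ and $H_2$ by the geodesic distance, which is formulated as:
$$d(H_1, H_2) = \left\| \log\left(H_1^{-1/2} H_2 H_1^{-1/2}\right) \right\|_F = \sqrt{\sum_{i=1}^{n} \log^2 \lambda_i},$$
where $\log(\cdot)$ is the logarithm map on the Riemannian manifold, $\|\cdot\|_F$ is the Frobenius norm and $\lambda_i$ represents the $i$-th eigenvalue of $H_1^{-1/2} H_2 H_1^{-1/2}$. The objective function that is used to calculate the mean of data $x_1, \ldots, x_N$ in the Euclidean space is:
$$\bar{x} = \arg\min_{x} \frac{1}{N} \sum_{i=1}^{N} \left\| x - x_i \right\|^2.$$
For the HPD manifold, we employ the geodesic distance instead of the Euclidean distance. Let $\{H_1, \ldots, H_N\}$ denote a set of HPD matrices, with the mean matrix $\bar{H}$ defined as follows:
$$\bar{H} = \arg\min_{H \in \mathcal{M}} \frac{1}{N} \sum_{i=1}^{N} d^2(H_i, H).$$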
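As an illustration, the geodesic distance between two HPD matrices can be computed from the eigenvalues of the whitened matrix, i.e., the second matrix multiplied on both sides by the inverse square root of the first. The sketch below assumes this standard affine-invariant form; the function name `geodesic_distance` and the example matrices are hypothetical, and the inverse square root is built from an eigendecomposition instead of a dedicated matrix-function routine.

```python
import numpy as np

def geodesic_distance(H1, H2):
    """Affine-invariant geodesic distance between two HPD matrices (sketch)."""
    # Inverse square root of H1 through its eigendecomposition H1 = V diag(w) V^H
    w, V = np.linalg.eigh(H1)
    H1_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T
    # Whitened matrix H1^{-1/2} H2 H1^{-1/2}, symmetrized against round-off
    M = H1_inv_sqrt @ H2 @ H1_inv_sqrt
    M = (M + M.conj().T) / 2
    lam = np.linalg.eigvalsh(M)
    # Frobenius norm of log(M) equals sqrt(sum_i log^2(lambda_i))
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

I2 = np.eye(2)
A = np.diag([1.0, 4.0])
d = geodesic_distance(I2, A)   # eigenvalues are 1 and 4, so d = log(4)
```

The distance is symmetric in its arguments and vanishes when both matrices coincide, which gives two quick sanity checks for any implementation.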
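The subgradient algorithm is used to run the iteration via the fixed-point method [46,47]. Its convergence can be proved as follows: For a set of HPD matrices $\{H_1, \ldots, H_N\}$, the objective function $F(H)$ can be expressed as:
$$F(H) = \frac{1}{N} \sum_{i=1}^{N} \left\| \log\left(H_i^{-1/2} H H_i^{-1/2}\right) \right\|_F^2.$$
Then, the gradient $\nabla F(H)$ is derived:
$$\nabla F(H) = \frac{2}{N} \sum_{i=1}^{N} \log\left(H_i^{-1} H\right) H^{-1}.$$
Let $\nabla F(H) = 0$. Since $H$ is a positive-definite matrix and hence invertible,
$$\sum_{i=1}^{N} \log\left(H_i^{-1} H\right) = 0.$$
For these $N$ HPD matrices $\{H_1, \ldots, H_N\}$, both sides are multiplied by $H_1^{-1/2}$, and then we have $P_i = H_1^{-1/2} H_i H_1^{-1/2}$. In this way, the above sequence can be rewritten as $\{P_1, \ldots, P_N\}$. According to the congruent transformation in Riemannian geometry, these matrices are still on the HPD manifold, and the Riemannian mean is transformed in the same way, $P = H_1^{-1/2} H H_1^{-1/2}$. Then,
$$\sum_{i=1}^{N} \log\left(P_i^{-1} P\right) = 0.$$
Since $P_1 = H_1^{-1/2} H_1 H_1^{-1/2}$ is the identity matrix $I$,
$$\log P + \sum_{i=2}^{N} \log\left(P_i^{-1} P\right) = 0.$$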
For simplicity, let
, namely,
. Then,
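According to the fixed-point method, and applying the logarithm map, the iteration can be formulated as follows:
$$\bar{H}_{t+1} = \bar{H}_t^{1/2} \exp\left( \varepsilon \sum_{i=1}^{N} \log\left( \bar{H}_t^{-1/2} H_i \bar{H}_t^{-1/2} \right) \right) \bar{H}_t^{1/2},$$
where $H_1, \ldots, H_N$ are the HPD matrices being averaged, $\varepsilon$ denotes the stepsize, and $t$ is the iteration index. For a suitable stepsize, the iteration converges to the Riemannian mean $\bar{H}$. As discussed above, our objective is to calculate the geodesic distance between the test cell and the Riemannian mean matrix $\bar{H}$; signal detection is then performed by comparing this distance with a threshold.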
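The iterative computation of the Riemannian mean can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the widely used gradient-type update in which the current estimate is moved along the sum of the log-maps of all matrices, the names `herm_fun` and `riemannian_mean` are hypothetical, and the default stepsize of one over the number of matrices, the arithmetic-mean starting point, and the stopping rule are assumptions.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix through its eigenvalues."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * f(w)) @ V.conj().T

def riemannian_mean(mats, step=None, tol=1e-10, max_iter=200):
    """Iterate H <- H^{1/2} exp(step * sum_i log(H^{-1/2} H_i H^{-1/2})) H^{1/2}."""
    mats = [np.asarray(H, dtype=complex) for H in mats]
    N = len(mats)
    step = 1.0 / N if step is None else step    # assumed default stepsize
    H = sum(mats) / N                            # assumed start: arithmetic mean
    for _ in range(max_iter):
        S = herm_fun(H, np.sqrt)
        S_inv = herm_fun(H, lambda w: 1.0 / np.sqrt(w))
        # Sum of the log-maps of every H_i at the current estimate
        G = sum(herm_fun(S_inv @ Hi @ S_inv, np.log) for Hi in mats)
        H = S @ herm_fun(step * G, np.exp) @ S
        if np.linalg.norm(G) < tol:              # stationarity: the log-maps sum to zero
            break
    return H

# For commuting matrices the Riemannian mean is the entrywise geometric mean
mean = riemannian_mean([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])])
```

Each pass needs a square root, an inverse square root, one matrix logarithm per averaged matrix, and one matrix exponential, which is where the cubic cost in the matrix dimension comes from.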
2.3. Computational Complexity of the Algorithm
To set the stage for the algorithm complexity analysis, we recall useful information regarding computational complexity, which is shown in Table 1. Then, we detail the serial MIGSD algorithm based on MATLAB (R2019a) to analyze the computational complexity. Arithmetic with individual elements has complexity O(1). Pd_D is the signal detection rate. PFA denotes the desired probability of false alarm, through which the threshold is determined. The signal-to-noise ratio (SNR) is defined as follows:
$$\mathrm{SNR} = 10 \log_{10} \frac{P_l}{\sigma^2} \ \text{(dB)},$$
where $P_l$ is the signal power received by the radar and $\sigma^2$ represents the noise variance. The higher the SNR, the less noise is mixed with the signal (i.e., the higher the signal quality).
Now, we are ready to describe the serial method by presenting the main pseudocode in Algorithm 1, from which it is clear that the MIGSD algorithm involves a double-loop: Pd_D is initialized in the outer loop, while the inner loop executes many matrix operations. The calculation task in the inner loop can be divided into three main parts: the estimation of the Toeplitz matrix, the calculation of the Riemannian mean matrix, and the calculation of the geodesic distance.
Algorithm 1 MIGSD (M, K, PFA, Pd_D)
Each test cell is mapped into an HPD matrix on the Riemannian manifold. The threshold is determined from a number of iterations according to PFA. To obtain the signal detection rate Pd_D, the Monte Carlo method is employed inside the double loop. This method, also known as random sampling, uses random samples to approximate a probability distribution and thereby transforms an integral into a summation. Each Monte Carlo trial compares the geodesic distance Geodesic_Matrix (H_Cell, H_D) with the threshold to estimate Pd_D.
The computational complexity of the mean matrix can be upper bounded by counting the number of multiplication operations. In this case, the Riemannian mean can be evaluated via an iterative procedure with $(3an^3 + an^2)t_k$ multiplications, where $a$ denotes the number of HPD matrices for averaging, $t_k$ is the number of iterations, and $n$ is the length of the pulse data in the range cell. This cost grows cubically with $n$, so the Riemannian mean is expensive to compute. The high computational cost limits its practical application and makes it much more difficult to evaluate the detection performance as the dimension increases, which is discussed in Section 4.1.
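To give a feel for how quickly this bound grows, the short sketch below evaluates the multiplication count (3an^3 + an^2)t_k for a few pulse lengths. The values a = 10 and t_k = 50 are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def mean_multiplications(a, n, t_k):
    """Upper bound on multiplications for the Riemannian mean: (3*a*n^3 + a*n^2) * t_k."""
    return (3 * a * n**3 + a * n**2) * t_k

# Hypothetical setting: 10 matrices averaged, 50 iterations
for n in (8, 16, 32):
    print(n, mean_multiplications(a=10, n=n, t_k=50))
# 8 800000
# 16 6272000
# 32 49664000
```

Doubling the pulse length multiplies the count by close to eight, which is the cubic growth that motivates the parallel implementation.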
3. High-Performance Computing-Based MIGSD Method
As discussed above, the high computational cost of the MIGSD algorithm makes it difficult to analyze the relationship between the detection performance and the dimension of the HPD matrices, which motivates us to use a high-performance computing (HPC) method to accelerate the algorithm. Moreover, as there is no data dependence between iterations of the iterative procedure, the hybrid MPI/OpenMP model can be employed effectively to improve the MIGSD algorithm. However, these methods are not supported in MATLAB. Thus, we translate the MATLAB program into a Fortran 90 version to apply the HPC methods. From the perspective of HPC, our primary objective is to identify the hotspots of the program, from which we can obtain the largest performance improvement. The hotspots of the serial MIGSD program are concentrated in the double loop: the Monte Carlo iterations; the iterative method for calculating the mean matrix; and the matrix inverse, logarithm, and eigenvalue operations for calculating both the mean matrix and the geodesic distance.
Now, we present our parallel algorithm, which is detailed as follows:
The HPC-based MIGSD algorithm is divided into training and working steps: the training step provides the threshold, while the signal detection rate Pd_D is calculated in the working step. As discussed above, since each geodesic distance on the HPD manifold is computed independently, the current calculated distance does not affect the next calculated distance in the main loop. In this case, MPI can be applied in the outer layer to create processes, while OpenMP is used in the inner layer to create threads. This framework of the hybrid MPI/OpenMP programming model fully utilizes the bandwidth, as illustrated in Figure 3. MPI_INIT initializes the MPI environment and establishes links between the MPI processes, while OMP PARALLEL opens the multithreaded region. MPI_FINALIZE terminates the MPI runtime environment, while OMP END PARALLEL closes the multithreaded parallel region. These calls form the basic parallel framework of MPI and OpenMP programs.
3.1. Our Efforts in the Training Step
In the training step, our objective is to identify a threshold from the sequence of geodesic distances according to the PFA. To this end, MPI is used for task partitioning, while OpenMP is applied to the MPI-divided loop for further computation; that is, the task is divided among MPI processes, and the OpenMP threads inside each MPI process compute the local geodesic distances via the functions involved. After every process has completed its assigned calculation tasks, namely, after the OpenMP threads have computed all the local distances, we gather all the local distances into the main process using the function MPI_GATHER. The main process then contains the full sequence of geodesic distances, which we sort into descending order. The threshold is determined from this descending sequence according to the PFA. To facilitate comparison with the threshold during the working step, the threshold is broadcast to the other MPI processes using the function MPI_BCAST, so that each MPI process has a copy of the threshold. This concludes the training step. Since each computation in the training step is independent and incurs almost no process communication overhead, the HPC-based MIGSD algorithm realizes high parallel performance.
The pseudocode of the training step in the HPC-based MIGSD program is presented as Algorithm 2. The parameter myrank indicates the process number, npros is the number of MPI processes, distlist_descent represents distlist sorted into descending order, and t represents the maximum number of training iterations.
Algorithm 2 Training (M, K, PFA, threshold)
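The threshold selection at the end of the training step can be sketched as follows. This is a serial illustration only: the gathering of local distances (MPI_GATHER in the text) is emulated by concatenating per-process lists, the function name `threshold_from_pfa` and the synthetic distances are hypothetical, and the indexing rule (take the value exceeded by a fraction PFA of the training distances) is an assumption about how the descending sequence is read.

```python
import numpy as np

def threshold_from_pfa(distlist, pfa):
    """Pick the detection threshold from noise-only geodesic distances (sketch)."""
    # Sort the gathered distances into descending order
    distlist_descent = np.sort(np.asarray(distlist, dtype=float))[::-1]
    # Assumed rule: the value exceeded by a fraction `pfa` of the training distances
    idx = min(int(pfa * distlist_descent.size), distlist_descent.size - 1)
    return distlist_descent[idx]

# Emulate MPI_GATHER: four "processes" each contribute their local distances
local_lists = [np.arange(r, 100, 4, dtype=float) for r in range(4)]  # hypothetical data
distlist = np.concatenate(local_lists)
threshold = threshold_from_pfa(distlist, pfa=0.1)
false_alarm_rate = np.mean(distlist > threshold)
```

With 100 distinct training distances and PFA = 0.1, exactly 10 distances exceed the selected threshold, so the empirical false alarm rate on the training set matches the requested PFA.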
3.2. Our Efforts in the Working Step
The working step is similar to the training step. The main difference is that we use the Monte Carlo method for all SNRs through a double loop. Since the threshold obtained from the training step has already been passed to each process by the function MPI_BCAST, we can immediately employ the Monte Carlo method in each process to compare dist and threshold. The input data to the working step are a clutter-containing signal matrix that is generated by a random function, and the MPI environment exists until the instruction MPI_FINALIZE is encountered.
In each Monte Carlo iteration, the correlation matrix, the mean matrix, and the geodesic distance between these two matrices are calculated without interference from other iterations, which is known as task-level parallelism. Moreover, since the operations of each iteration are independent, MPI can be used outside the Monte Carlo loop; that is, each MPI process executes a part of the Monte Carlo iterations, while multicore OpenMP is used inside the MPI process to calculate the functions involved. In each OpenMP thread, dist is compared with threshold to decide whether a signal is present, and each detection is counted in cnt. The OpenMP parallel region is not closed until all cnt values have been obtained. At this point, the Monte Carlo simulation is also complete. Outside the Monte Carlo loop, MPI_REDUCE is used to sum the detections counted in the subprocesses, from which the signal detection rate, namely Pd_D, is obtained. The basic parallel implementation of the working step is presented in Algorithm 3. The parameter Num_Montecarlo is the number of Monte Carlo iterations, myrank indicates the process number, npros is the number of MPI processes, and cnt_total represents the total number of signals detected after comparison with the threshold.
Algorithm 3 Working (M, K, threshold, Montecarlo)
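The division of the Monte Carlo iterations among processes and the final summation can be illustrated with a serial emulation. This sketch does not use MPI: the loop over ranks stands in for the MPI processes, the Python sum stands in for MPI_REDUCE, and the function names, the block-partition rule, and the synthetic distances are assumptions made for illustration.

```python
import numpy as np

def local_range(num_montecarlo, npros, myrank):
    """Block partition of the Monte Carlo iterations for one (emulated) MPI rank."""
    base, extra = divmod(num_montecarlo, npros)
    start = myrank * base + min(myrank, extra)
    stop = start + base + (1 if myrank < extra else 0)
    return start, stop

def local_count(dists, threshold, start, stop):
    """Count detections (dist > threshold) in this rank's share of the trials."""
    return int(np.sum(dists[start:stop] > threshold))

num_montecarlo, npros, threshold = 1000, 6, 1.5   # hypothetical values
rng = np.random.default_rng(1)
# Stand-in for the geodesic distances computed in each Monte Carlo trial
dists = rng.gamma(shape=2.0, scale=1.0, size=num_montecarlo)

# Emulate MPI_REDUCE with a sum over the per-rank counts
cnt_total = sum(local_count(dists, threshold, *local_range(num_montecarlo, npros, r))
                for r in range(npros))
Pd_D = cnt_total / num_montecarlo
```

Because the trials are independent, the per-rank counts add up to the serial count regardless of how the iterations are split, which is why a single reduction at the end is the only communication needed.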
In this application, since the program incurs little communication overhead, our HPC-based MIGSD program achieves high parallel performance both within nodes and between nodes. The hybrid scheme performs well across combinations of MPI processes and OpenMP threads, with nearly linear speed-up, as detailed in Section 4.
5. Discussion
The influence of the noise that is mixed into the signal on signal processing is a subject that merits further study. Because of the variety of clutter and the large amount of data, accelerating the detection and processing of radar signals is a difficult problem. In this paper, we use the hybrid MPI/OpenMP parallel model to overcome the high complexity and the large computational cost of the MIGSD method. In addition, we explore the detection performance of the MIGSD method for a variety of dimensions, which is especially important for practical applications.
The experimental results demonstrate the following: (1) Parallel tools can accelerate the MIGSD algorithm, which shows that HPC techniques and signal detection can be combined effectively. (2) The detection performance of the MIGSD algorithm varies with the dimension of the HPD matrix: the higher the dimension of the HPD matrix, the better the detection performance of the matrix information geometric detector.
For future research, we may focus on selecting a suitable PFA in the MIGSD method and on using other HPC methods (e.g., acceleration methods based on GPU hardware) to extend our HPC-based MIGSD program. Additionally, since high memory usage becomes a substantial problem as the dimension of the HPD matrix increases, we will consider optimizing the serial MIGSD algorithm, possibly by using other Riemannian distance metrics.