1. Introduction
Time delay estimation is a basic problem in signal parameter estimation. It is required in many applications, such as speech analysis, channel estimation, radar navigation, and sonar signal processing [1,2,3,4,5,6,7,8]. In passive sonar systems, the distance and depth estimation of underwater targets is often affected by the performance of time delay estimation. When estimating the distance, the accuracy of the time delay is the main concern; in this case, a single dominant time delay is assumed and the multipath problem is not considered [9]. When estimating the depth, however, the multipath problem is very important. Therefore, not only an accurate time delay estimate but also the number of time delays is needed [10].
The cross-correlation method is commonly used to estimate the time delay [11]; it determines the time delay by locating the peak of the cross-correlation function. The method is classical and easy to implement in practical systems, but its resolution is limited by the wide main lobe and high sidelobes of the cross-correlation function [12]. Moreover, its estimation accuracy is limited by the sampling frequency.
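As a rough illustration of the peak-picking idea and of its sampling-limited accuracy, the following Python sketch estimates a delay from the lag of the cross-correlation peak. The helper name `cross_correlation_delay`, the white-noise test signal, and the 10 kHz sampling rate are assumptions made for this example only; they are not taken from the cited works.

```python
import random


def cross_correlation_delay(x, y, max_lag):
    """Return the integer lag (in samples) that maximizes sum_i x[i] * y[i + lag]."""
    best_lag, best_value = 0, float("-inf")
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                value += x[i] * y[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


fs = 10_000.0            # sampling rate in Hz (assumed for this toy example)
true_delay_samples = 7   # y lags x by 7 samples

rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
x = source
y = [0.0] * true_delay_samples + source[:-true_delay_samples]

lag = cross_correlation_delay(x, y, max_lag=20)
delay_seconds = lag / fs  # the estimate can only be a multiple of 1 / fs
```

Because the search runs over integer lags, the estimate can only move in steps of 1/fs (0.1 ms in this sketch), which is the sampling-frequency limit mentioned above.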
To mitigate the poor resolution of the cross-correlation method, many high-resolution time delay estimation methods have been proposed, including look-ahead backtracking orthogonal matching pursuit (LABOMP) [4,5], multiple signal classification (MUSIC) [13], the weighted ℓ1-norm minimization algorithm based on the cross-correlation method (CC-WL1) [14], and the deconvolution cross-correlation method (Dec-CC) [14]. These algorithms achieve higher time delay resolution. However, they share a common shortcoming: their performance is still limited by the grid. When the true time delay does not lie on the grid, a grid mismatch arises, which degrades the time delay estimation performance.
A simple way to reduce grid mismatch is to refine the grid and solve for the time delay on a denser grid. Although this can improve the accuracy of time delay estimation, dense grids often require enormous data storage and a heavy computational load. Another approach is to adaptively adjust the grid to match the true time delay, which is known as off-grid time delay estimation. As described in reference [15], sparse Bayesian inference can be used to iteratively refine an initial coarse grid to obtain a more accurate time delay estimate. However, this method still requires a coarse grid to be provided in advance.
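The grid mismatch effect can be sketched with a small, noise-free example. The code below is an illustration under assumed settings (a single unit-amplitude delay, a 1–4 kHz band sampled every 10 Hz, and a simple matched-filter grid search named `grid_delay_estimate`); it is not one of the cited algorithms. It shows that the estimation error of a grid search is bounded by the grid spacing and that reducing it requires many more candidate delays.

```python
import cmath
import math


def grid_delay_estimate(spectrum, freqs, grid):
    """Pick the grid delay whose steering vector correlates best with the spectrum."""
    def score(tau):
        return abs(sum(y * cmath.exp(2j * math.pi * f * tau)
                       for y, f in zip(spectrum, freqs)))
    return max(grid, key=score)


freqs = [1000.0 + 10.0 * n for n in range(301)]   # assumed band: 1-4 kHz, 10 Hz steps
true_tau = 0.001234                               # lies on neither grid below
spectrum = [cmath.exp(-2j * math.pi * f * true_tau) for f in freqs]

coarse_grid = [k * 1e-4 for k in range(51)]       # 0.1 ms spacing, 51 candidates
fine_grid = [k * 1e-5 for k in range(501)]        # 0.01 ms spacing, 501 candidates

coarse_error = abs(grid_delay_estimate(spectrum, freqs, coarse_grid) - true_tau)
fine_error = abs(grid_delay_estimate(spectrum, freqs, fine_grid) - true_tau)
```

In this sketch the finer grid reduces the error but needs roughly ten times as many candidates, and a residual mismatch remains because the true delay still falls between grid points.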
Solving the time delay estimation problem in the continuous time domain, rather than the discrete one, may be the best way to overcome the mismatch problem completely; this approach is referred to as grid-less time delay estimation. In reference [16], inspired by grid-free compressed beamforming [17], the time delay estimation problem was transformed into a sparse reconstruction problem, and the atomic-norm minimization (ANM) method was then used to estimate the time delay in the continuous time domain. A major limitation of the ANM method is that it requires sufficient separation between adjacent time delays for them to be recovered. The reweighted atomic-norm minimization (RAM) method can avoid this limitation. According to reference [18], the RAM method has a higher resolution than the ANM method. Nevertheless, the RAM and ANM methods share a common issue: when the noise variance is unknown, their parameters must be adjusted manually to obtain the best estimation result, a process that is often difficult and complex.
A VAriational Time Delay Estimation (VATDE) method is proposed in this paper. The VATDE method estimates the posterior probability density functions (PDFs) of the time delays using variational Bayesian inference and calculates their expectations, rather than keeping only point estimates of the time delays. The proposed method is grid-less and, unlike the existing work in references [16,18], it does not need parameter adjustment and can automatically estimate the number of time delays, the noise variance, and the amplitude variance. Simulations demonstrate that the proposed method achieves better time delay estimation performance than other state-of-the-art methods.
2. Signal Model
Consider two spatially separated sensors in a passive system, which receive signals
and
from the same target source
. Assume that there are no other interference sources, and the target source signal
is affected by Gaussian white noise during the propagation process. Then, the cross-correlation function
of the received signals
and
can be obtained as follows:
where
is the autocorrelation function of the source signal
;
is the number of time delays;
is the amplitude of each time delay;
represents the
time delay; and
is the noise-related term. Ideally, the received signal and noise are independent of each other, at which point
.
Assume that the source signal
is a broadband signal. Then, the Fourier transform is performed on the cross-correlation function
, and the frequency form of Formula (1) can be obtained as follows:
where
is the cross-power spectrum between the received signal
and
;
is the auto-power spectrum of the source signal
;
is the cross-power spectrum of the noise;
is a sampling of the frequency band of the source signal
;
is the sampling rate; and
is the signal sampling length. The aim of this paper is to estimate the time delays accurately from the cross-power spectrum
of the received signal.
Before applying the proposed VATDE algorithm, the model in Formula (2) is further parameterized according to the definition in Formula (3). Let
where
. Note that
and
are in a one-to-one correspondence: when
is evaluated correctly,
can be obtained by
. Therefore, Formula (2) can be further expressed as follows:
where
, because the phase component of the auto-power spectrum is 0.
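To make the parameterization more concrete, the sketch below builds a normalized cross-power spectrum as a sum of complex exponentials and maps between a delay and its angular parameter. The exact form of Formula (3) is not reproduced in this text, so the mapping used here (angle = -2π · fs · delay / N) and its sign convention are assumptions, as are the function names and the example values.

```python
import cmath
import math


def delay_to_angle(tau, fs, n_samples):
    """Assumed mapping from a delay in seconds to an angular parameter in radians."""
    return -2.0 * math.pi * fs * tau / n_samples


def angle_to_delay(theta, fs, n_samples):
    """Inverse of delay_to_angle; unambiguous only while the angle stays in (-pi, pi]."""
    return -theta * n_samples / (2.0 * math.pi * fs)


def normalized_cross_spectrum(delays, amplitudes, bin_indices, fs, n_samples):
    """Noise-free cross-power spectrum divided by the source auto-power spectrum."""
    angles = [delay_to_angle(t, fs, n_samples) for t in delays]
    return [sum(a * cmath.exp(1j * m * th) for a, th in zip(amplitudes, angles))
            for m in bin_indices]


fs, n_samples = 10_000.0, 1000          # assumed sampling rate and record length
bins = list(range(100, 401))            # assumed frequency bins covering 1-4 kHz
tau = 0.0012                            # hypothetical delay in seconds
theta = delay_to_angle(tau, fs, n_samples)
spectrum = normalized_cross_spectrum([tau], [1.0], bins, fs, n_samples)
```

With a single delay, the phase of the spectrum in this sketch changes by the same angle from one frequency bin to the next, which is why recovering the angle is equivalent to recovering the delay.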
In fact, the value of
is difficult to obtain, so this paper used dimension
of the source signal frequency band as the number of time delays in the overcomplete model. Of course, it is necessary to ensure
. At this point, the final modeling result of the VATDE algorithm can be given as follows:
where
and
represent the weight of the time delay; and
represents the Fourier transform of the time delay value. Since the goal of this paper was to obtain
non-zero weights, sparse prior information was introduced for the weights. Ideally, the model shown in Formula (5) would finally obtain
true time delays with their corresponding non-zero weights, and
zero weights. Specifically, this paper introduces binary latent variables
to control the weight
to promote sparsity. When
, then
is active, that is,
is not equal to 0; otherwise,
is equal to 0. This can be expressed as the following formula:
where
follows the Bernoulli Gaussian distribution [
19,
20];
;
is the probability that the
weight component is active; and
is the variance of the zero-mean Gaussian distribution. In this paper,
denotes the complex multivariate Gaussian PDF with mean
and variance
. For the
time delay, let its prior distribution be
. This is a von Mises distribution, one of the most important distributions on the unit circle [21]. In most cases, let
, which represents a lack of prior knowledge [22,23].
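The priors described above can be sketched by drawing samples from them. In the following Python example, the names `rho` (activation probability) and `weight_var` (variance of the active weights) are placeholders chosen for this illustration, and the angular prior is taken to be a von Mises distribution with zero concentration, which reduces to a uniform distribution on the circle. This is a minimal sketch of the generative model, not the inference procedure.

```python
import math
import random


def sample_prior(m, rho, weight_var, rng):
    """Draw a support vector, Bernoulli-Gaussian weights, and angular parameters."""
    support, weights, angles = [], [], []
    # Circular complex Gaussian: the variance is split evenly between real and imaginary parts.
    std = math.sqrt(weight_var / 2.0)
    for _ in range(m):
        active = 1 if rng.random() < rho else 0
        weight = complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) if active else 0j
        support.append(active)
        weights.append(weight)
        # A concentration of zero gives a uniform angle in [0, 2*pi).
        angles.append(rng.vonmisesvariate(0.0, 0.0))
    return support, weights, angles


rng = random.Random(1)
support, weights, angles = sample_prior(m=2000, rho=0.1, weight_var=1.0, rng=rng)
active_fraction = sum(support) / len(support)
```

Most weights drawn this way are exactly zero, which is how the binary latent variables promote sparsity in the overcomplete model.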
According to the overcomplete model (Formula (5)), the likelihood function
can be represented as follows:
where
represents the variance of background noise. The maximum likelihood (ML) estimation of the core parameters in this paper can then be represented as follows:
where
is the joint PDF, which is the product of the likelihood function (Formula (7)) and the prior PDFs:
However, evaluating Formula (8) directly is intractable; thus, an iterative algorithm is proposed in the next section to solve it.
3. Variational Bayesian Inference Time Delay Estimation Algorithm
The proposed VATDE algorithm relies on the mean-field theory of variational Bayesian inference. The goal is to determine the approximate PDFs
that minimize the Kullback–Leibler (KL) divergence
, which is equivalent to maximizing the function
[24] (pp. 732–733),
where
represents the expectation operator. Furthermore, all time delays and other variables are assumed to be independent of each other, and
, so
can be represented as follows:
According to Formula (11), the following estimation can be obtained:
For the detailed derivation, see reference [21] (p. 26).
Due to
, the approximate posterior PDFs of the weights
can be represented as follows:
According to Formula (14), the mean and variance of the weights
can be estimated as follows:
Then, this paper defines
as the estimation of the number of time delays
, which is a set of non-zero parameters in the variable
:
Therefore,
is equivalent to the cardinality of
,
. Finally, the estimation of the cross-power spectrum
is given as follows:
Compared with Formula (5), Formula (17) represents the cross-power spectrum without noise.
Now, let us consider maximizing the function
. Naturally, it is difficult to maximize the function
on all variables simultaneously. Therefore, this section adopts an alternating optimization scheme, that is, the function
is maximized for each variable, while other variables are fixed. Let
be the set of all variables, and
represent each variable in the set. The alternating optimization scheme can then be expressed as follows [24] (p. 735, Equations (21) and (25)):
where
represents the expectation of all the variables except variable
. In the following sections, more optimization details are presented.
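The alternating scheme can be illustrated on a much simpler problem than the variational objective. The sketch below applies coordinate-wise maximization to a toy concave quadratic function of two variables; the objective, the update rules, and the function names are all assumptions chosen for illustration and are unrelated to the actual update formulas of this paper.

```python
def coordinate_ascent(objective, updates, start, tol=1e-12, max_sweeps=1000):
    """Maximize `objective` by updating one variable at a time with the others fixed."""
    values = dict(start)
    history = [objective(values)]
    for _ in range(max_sweeps):
        for name, update in updates.items():
            values[name] = update(values)   # exact maximizer for this coordinate
        history.append(objective(values))
        if history[-1] - history[-2] < tol:
            break
    return values, history


def toy_objective(v):
    x, y = v["x"], v["y"]
    return -(x * x + y * y + x * y) + 3.0 * x + 3.0 * y


# Setting each partial derivative to zero gives the coordinate-wise maximizers.
toy_updates = {
    "x": lambda v: (3.0 - v["y"]) / 2.0,
    "y": lambda v: (3.0 - v["x"]) / 2.0,
}

solution, history = coordinate_ascent(toy_objective, toy_updates, {"x": 0.0, "y": 0.0})
```

Each sweep can only keep or increase the objective, so the recorded history is non-decreasing. In this toy case the iterations approach the unique maximizer at x = y = 1; a non-concave objective would only guarantee a local maximum.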
3.1. Inference Parameter
According to Formula (18), for each variable in the parameter
, the optimization factor
can be represented as follows [
25] (Chapter 10, p. 466):
By substituting Formula (9), the following can be obtained:
When
, Formula (20) is further expressed as follows:
Then, the following can be obtained using Formula (15):
where
When
, then
, which means that only the prior information plays a role in Formula (22).
It is difficult to obtain the analytical results of Formula (22) directly. Fortunately, according to reference [
22] (Heuristic 2), the
can be approximated as a von Mises distribution:
The von Mises distribution
is expressed as follows:
where
is the mean direction,
is the concentration parameter, and
is the modified Bessel function of the first kind of order
[
21] (p. 348). According to Formula (25), the following can be obtained [
21] (p. 26):
Then, parameters
and
in Formulas (12) and (13) can be accordingly calculated.
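Once a factor has been approximated by a von Mises distribution, its circular moments have a closed form: the m-th moment E[exp(j·m·θ)] equals (I_m(κ) / I_0(κ)) · exp(j·m·μ). The following sketch checks this standard identity numerically. The function names are hypothetical, and the Bessel functions are evaluated with a truncated power series that is adequate only for moderate concentration values.

```python
import cmath
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind (integer order) via its power series."""
    return sum((x / 2.0) ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))


def von_mises_pdf(theta, mu, kappa):
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i(0, kappa))


def circular_moment(m, mu, kappa):
    """Closed-form E[exp(1j * m * theta)] for a von Mises distribution."""
    return bessel_i(m, kappa) / bessel_i(0, kappa) * cmath.exp(1j * m * mu)


def circular_moment_numeric(m, mu, kappa, steps=20000):
    """The same expectation computed by numerical integration over one period."""
    h = 2.0 * math.pi / steps
    return sum(cmath.exp(1j * m * t) * von_mises_pdf(t, mu, kappa) * h
               for t in (-math.pi + i * h for i in range(steps)))


closed_form = circular_moment(1, mu=0.5, kappa=2.0)
numeric = circular_moment_numeric(1, mu=0.5, kappa=2.0)
```

The magnitude of the moment shrinks toward zero as the concentration decreases, so an uncertain angle estimate contributes less to the expected steering vector.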
3.2. Inference Parameters and
Then, fix
, and maximize the function
with respect to
. We define matrix
and
as follows:
According to Formula (18),
can be calculated with the following formula:
where
represents the element of parameter
indexed by
;
and
represent the submatrix with
as the row and column index; and
represents the number of non-zero elements in
. This yields the following formula:
According to Formula (14), the approximate posterior PDFs
of the weight
can be represented as the product of
and
. Therefore, to calculate
, the value of
must be estimated. By substituting the approximate PDFs
from Formula (11) into Formula (10), Formula (32) can be obtained as follows:
Consequently, when we maximize
, it is possible to obtain
:
The globally optimal Bernoulli sequence
of the problem (Formula (33)) can be determined using an enumeration algorithm. However, for M-dimensional sequences, the computational cost of this operation is
, which grows exponentially as
increases. For a given
, to reduce the complexity, a greedy iterative search strategy is proposed to determine a locally optimal value. The specific update policy is as follows: For every
, calculate
, where
is the same as
, except that the
element is flipped. Let
, if
, update
by flipping the
th element; otherwise,
remains unchanged and the algorithm terminates. As for the computation of
, a detailed method is given in references [22,23] and is not repeated here. If the algorithm does not stop early and
is initialized to
, it usually only takes
steps to determine the local maximum of
, in which
is the estimate of the number of time delays.
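The greedy update policy can be sketched independently of the specific objective. In the code below, `objective` stands in for the function being maximized over binary sequences; the toy objective used in the demonstration is an assumption made purely for illustration and is not the expression in Formula (32).

```python
def greedy_flip_search(objective, initial):
    """Flip, one at a time, the bit giving the largest positive gain; stop when none helps."""
    s = list(initial)
    current = objective(s)
    while True:
        best_gain, best_index = 0.0, None
        for k in range(len(s)):
            s[k] ^= 1                          # tentatively flip element k
            gain = objective(s) - current
            s[k] ^= 1                          # undo the tentative flip
            if gain > best_gain:
                best_gain, best_index = gain, k
        if best_index is None:                 # no flip increases the objective
            return s, current
        s[best_index] ^= 1
        current += best_gain


def toy_objective(s):
    gains = [3.0, -1.0, 2.0, -0.5]             # hypothetical per-element rewards
    coupling = 1.0 if s[0] and s[2] else 0.0   # hypothetical interaction penalty
    return sum(g * b for g, b in zip(gains, s)) - coupling


best_sequence, best_value = greedy_flip_search(toy_objective, [0, 0, 0, 0])
```

Each pass evaluates one candidate flip per element instead of enumerating all binary sequences, so starting from the all-zero sequence the search needs about as many passes as there are active elements. It guarantees only a local maximum.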
3.3. Inference Parameters , , and
After updating parameters
and
, parameters
are then estimated by maximizing the lower bound of function
. First, we substitute the approximate PDFs
from Formula (11) into Formula (10):
When only terms that depend on parameters
are considered, Formula (34) can be further expressed as follows:
Based on the likelihood function (Formula (7)) defined in Section 2 and the form of the prior PDFs, Formula (36) can be obtained as follows:
If we set
and
, then Formula (37) can be obtained as follows:
Clearly,
is given by the number of non-zero elements of
, and
is the variance of the corresponding weights of these non-zero elements. Similarly, set
, and Formula (38) can be obtained as follows:
In summary, is composed of three parts: the fitting of the residual, the error of the weight estimation, and the uncertainty of the parameter estimation. The better fits, the smaller the values of the above three parts will be.
3.4. The VATDE Algorithm
This section gives the details of the update process of the VATDE algorithm, as shown in Algorithm 1.
Algorithm 1. Description of the VATDE algorithm.
Input: The cross-power spectrum of the received signal.
Output: The estimated number of delays , the time delay estimates , the weight estimates , and the reconstructed cross-power spectrum .
(1) initialize , , and , calculate
(2) repeat
(3) update , and (see Section 3.2)
(4) update , , and (as shown in Formulas (37) and (38))
(5) When , update , and (see Section 3.1)
(6) until the stopping criterion is met
(7) according to , obtain
(8) return ,, and
In Algorithm 1, each step increases the value of function , and the iterations eventually converge to a local maximum of function ; thus, initialization is very important for Algorithm 1. Next, a specific initialization scheme is described.
First, initialize
as
by defining the following:
Then,
can be simplified to the form of Formula (22). By defining
,
will be re-expressed as follows:
Second, can be calculated from Formula (28), and can be calculated using Formula (29). According to Formula (31), can be calculated. Then, update ; at this time, . According to the above description, , , and are fully initialized.
Finally, we give the initialization of the parameters , and, . Let be initialized to the average of the final quarter of the eigenvalues of matrix ; ; and is initialized to .
4. Simulation
Since the proposed algorithm is a grid-less time delay estimation algorithm, the simulations in this section mainly compare it with existing grid-less time delay estimation algorithms. The target source signal was assumed to be Gaussian white noise band-limited to 1–4 kHz, with a signal length of 0.1 s. A passive system was considered: the sampling rate was 10 kHz, the sampling interval was 0.0001 s (the reciprocal of the sampling rate), and two spatially separated synchronous sensors received the signal from the target source at the same time. The background noise was zero-mean Gaussian white noise.
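The quantities implied by these settings follow from simple arithmetic, sketched below. The variable names are illustrative, and the assumptions that the frequency bins are spaced by fs/N and that both band edges are included are ours; the paper's exact frequency sampling may differ. The helper `noise_variance` shows the usual conversion from an SNR in decibels to a noise power for a given signal power.

```python
fs = 10_000.0                   # sampling rate in Hz
duration = 0.1                  # signal length in seconds
band = (1_000.0, 4_000.0)       # source band in Hz

n_samples = round(fs * duration)            # samples per record
sampling_interval = 1.0 / fs                # seconds between samples
bin_spacing = fs / n_samples                # frequency resolution in Hz (assumed DFT bins)

first_bin = round(band[0] / bin_spacing)
last_bin = round(band[1] / bin_spacing)
n_bins = last_bin - first_bin + 1           # in-band bins, both edges included (assumption)


def noise_variance(signal_power, snr_db):
    """Noise power that yields the requested SNR (in dB) for a given signal power."""
    return signal_power / (10.0 ** (snr_db / 10.0))
```

Under these assumptions a record holds 1000 samples, the bins are 10 Hz apart, and 301 bins fall inside the source band.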
First, the ideal case was considered, in which there was only one time delay between the signals received by the two sensors; the time delay value was s, and the normalized amplitude was 1. The signal-to-noise ratio (SNR) was set to 10 dB, and the stopping condition of the VATDE algorithm was , where represents the maximum number of iterations. Next, the time delay estimation performance of the proposed VATDE algorithm was compared with that of the ANM and RAM algorithms.
It can be seen from Figure 1a that all three algorithms, i.e., the ANM and RAM methods and the proposed VATDE method, obtained good time delay estimation results in the case of a single time delay. The enlarged window in Figure 1a further shows that the VATDE estimate was closer to the true time delay than the RAM and ANM estimates. Therefore, the VATDE algorithm had the highest time delay estimation accuracy, while the accuracy of the RAM algorithm was slightly better than that of the ANM algorithm.
Figure 1b shows the number of time delays estimated by the VATDE algorithm. Since the ANM and RAM algorithms cannot provide an estimate of the number of time delays, for them the number of time delays can only be determined manually. It can be seen from Figure 1b that the VATDE algorithm provided an accurate estimate of the number of time delays after only a few iterations, which avoids the interference caused by false peaks when determining the true time delay estimate.
Figure 2a compares the phase of the cross-power spectrum between the received signals in the noise-free case with the phase of the cross-power spectrum when the SNR was 10 dB. Figure 2b shows the performance of the different algorithms in reconstructing the original cross-power spectrum phase from the noisy cross-power spectrum phase. It can be clearly seen that the reconstructed cross-power spectrum phase obtained by the proposed VATDE algorithm fitted the original cross-power spectrum phase perfectly. In addition, the reconstruction performance of the ANM algorithm was similar to that of the RAM algorithm: although the trend of their reconstructed cross-power spectrum phases was roughly the same as that of the original cross-power spectrum phase, there was still a certain deviation. Clearly, the time delay estimation performance of an algorithm is related to its reconstruction performance for the cross-power spectrum: the more accurate the time delay estimation, the more closely the reconstructed cross-power spectrum phase matches the original one.
Second, a multipath influence was assumed: there were two time delays between the received signals of the two sensors, which means that three time delays appear in the cross-power spectrum. The three time delays were set to , with amplitudes of , respectively, and the SNR was again set to 10 dB. The time delay estimation performance of the proposed VATDE algorithm was compared with those of the ANM and RAM algorithms.
Figure 3a shows the time delay estimation results of the VATDE, ANM, and RAM algorithms in the presence of three time delays. It can be seen that, compared with the ANM and RAM algorithms, the VATDE algorithm not only provided a more accurate time delay estimate but also yielded a more accurate amplitude estimate. The time delay estimation results of the RAM algorithm were similar to those of the ANM algorithm. However, under the same parameter selection, the RAM algorithm had fewer false peaks than the ANM algorithm, which reduced the interference of false peaks with the time delay estimation results.
Figure 3b again shows the performance of the VATDE algorithm in estimating the number of time delays. It still took only a few iterations to estimate the true number of time delays. This is valuable for time delay estimation because it avoids the problem of distinguishing the true delays from false ones in the obtained estimation results.
Figure 4a compares the cross-power spectrum phase between the received signals in the noise-free case with the cross-power spectrum phase at an SNR of 10 dB when there were three time delays. Compared with Figure 2a, the multiple time delays made the original cross-power spectrum phase more irregular; thus, it was more difficult to reconstruct the original cross-power spectrum phase from the noisy cross-power spectrum phase.
Figure 4b shows the performance of different algorithms in reconstructing the original cross-power spectrum phase in the presence of three time delays. It is clear that the reconstructed cross-power spectrum phases of the ANM and RAM algorithms exhibited significant deviations from the original cross-power spectrum phase, due to their inaccurate time delay estimation and poor time delay amplitude estimation. Compared with the ANM algorithm, the reconstructed cross-power spectrum phase of the RAM algorithm could still reflect the original cross-power spectrum phase trend to a certain extent. This was because the reconstructed results of the ANM algorithm were greatly influenced by false peaks, and, thus, could not reflect the general trend of the original cross-power spectrum phase. Compared with the ANM and RAM algorithms, the reconstructed cross-power spectrum phase of the VATDE algorithm proposed in this paper still perfectly fitted the original cross-power spectrum phase, which once again reflects the superiority of the VATDE algorithm.
Next, the computational efficiency of the different algorithms was compared. In this study, 100 Monte Carlo simulations were carried out under the multipath assumption. The ANM and RAM algorithms were solved using the SDPT3 [26] solver in the CVX toolbox [27]. Based on reference [28], the time complexity of the CVX toolbox for solving the semidefinite programming problem was approximately
. Thus, the time complexity of ANM was approximately
. Since the estimation result of the RAM algorithm was obtained using multiple iterations of the ANM algorithm, the time complexity of RAM was approximately
, where
denotes the number of iterations. The time complexity of VATDE was approximately
. In theory, VATDE was therefore the most computationally efficient.
The average running times of the ANM, RAM, and VATDE algorithms in MATLAB 2020a are shown in Table 1. VATDE clearly had the shortest running time, and the running times of the ANM and RAM algorithms were much longer. The running time of the RAM algorithm was a multiple of that of the ANM algorithm, with the specific value depending on the number of iterations. In this paper, the number of iterations was set to three, and it can be seen that RAM took three times as long as ANM to complete. The simulation results were consistent with the theory and confirm the high computational efficiency of the VATDE algorithm.
Of course, for the ANM, RAM, and VATDE algorithms, the running time was affected by the number of sampling points: the more sampling points, the longer the running time. However, as shown by the results in Figure 5, when the number of sampling points varied from 30 to 300, the running time did not grow linearly with the number of sampling points; it grew by a larger factor than the number of sampling points did. Therefore, when the running time is long due to a large number of sampling points, the computational efficiency can be improved by dividing the dataset into multiple small sets of sampling points. In addition, the results in Figure 5 show that the VATDE algorithm possessed computational advantages over the ANM and RAM algorithms at all tested numbers of sampling points, and its computational efficiency was more than 10 times that of the ANM algorithm.
Based on the above comparison of computational efficiency, it can be concluded that the proposed VATDE algorithm is better suited to real-time data processing than the ANM and RAM algorithms.
Finally, curves of the root-mean-squared error (RMSE) versus SNR were plotted to show the time delay estimation accuracy of the three algorithms. Considering the different uses of time delay estimation, this paper divided the accuracy of delay estimation into two categories. The first is the time delay estimation accuracy in the case of a single time delay without considering the multipath effect, which usually applies to the geometric positioning problem in passive sonars [9]. The second is the time delay estimation accuracy with multiple time delays in a multipath environment. The multipath effect will lead to more than three peaks in the cross-correlation function. The depth estimation problem of underwater targets can be solved by estimating multiple time delays [10].
The signal simulation remained unchanged, and the settings of the single time delay and multiple time delays were the same as those in the above simulations. The RMSE of the ANM, RAM, and VATDE algorithms was compared as the SNR varied from 0 dB to 20 dB, with each result obtained from 500 Monte Carlo simulations. The simulation results are shown in Figure 6.
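For reference, the RMSE over Monte Carlo trials can be computed as in the following sketch. The function names are hypothetical, and pairing estimated and true delays by sorting them is an assumption made for this illustration; the paper does not state how estimates are matched to true delays in the multiple-delay case.

```python
import math


def rmse(estimates, true_value):
    """Root-mean-squared error of repeated estimates of a single delay."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def multi_delay_rmse(trials, true_delays):
    """RMSE over trials that each return one estimate per true delay.

    Estimates and true delays are paired after sorting (an assumed matching rule).
    """
    reference = sorted(true_delays)
    squared_errors = [
        (estimate - truth) ** 2
        for trial in trials
        for estimate, truth in zip(sorted(trial), reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


single_case = rmse([1.0, 3.0], true_value=2.0)
multi_case = multi_delay_rmse([[0.9, 2.1], [2.0, 1.0]], true_delays=[1.0, 2.0])
```

Repeating this computation at each SNR value produces one point of an RMSE versus SNR curve.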
Figure 6a shows the RMSE versus SNR curves under a single time delay, while Figure 6b shows the RMSE versus SNR curves under multiple time delays. It can be seen that the RMSE of the VATDE algorithm was lower than that of the ANM and RAM algorithms in the cases of both single and multiple time delays. Especially in a low-SNR environment, the time delay estimation accuracy of the ANM and RAM algorithms deteriorated significantly, while VATDE still had high time delay estimation accuracy. In addition, the RMSE of the ANM and RAM algorithms was approximately equal with a single time delay, but with multiple time delays, the RMSE of the RAM algorithm was lower than that of the ANM algorithm. This may be because the RAM algorithm had fewer false peaks than the ANM algorithm. Moreover, it can be seen from Figure 6a,b that the RMSE values of the ANM, RAM, and VATDE algorithms in Figure 6b were far more than three times those in Figure 6a, which may have been due to mutual interference between the multiple time delays reducing the accuracy of time delay estimation. If a post-processing method based on the cross-correlation function is adopted, that is, if only the peak regions of the cross-correlation function are searched separately rather than the whole continuous time region, then the accuracy of time delay estimation with multiple time delays may be further improved. This is the next step of this study and is not discussed here.