1. Introduction
Linear system identification problems have to be worked out in the context of various applications [1,2], including echo cancellation, active noise control, interference reduction, and channel modeling, among others. A benchmark technique used to address such problems is the well-known Wiener filter, which basically relies on solving a linear system (namely, the Wiener–Hopf equations) using a set of statistics. The Wiener–Hopf equations involve a set of estimates for the covariance matrix of the input signal and the cross-correlation vector between the input and reference sequences. The problem is formulated following an optimization criterion in terms of minimizing the mean-squared error (MSE), where the error is defined as the difference between the reference sequence and the output signal. The resulting optimal filter also represents an important basis for the development of other related tools for system identification problems, such as adaptive filtering algorithms [3,4].
There are some inherent limitations associated with the conventional Wiener filter solution, which is obtained by directly solving (using matrix inversion) the Wiener–Hopf equations. First, the accuracy of the solution is highly influenced by the accuracy of the statistics' estimates; however, obtaining a reliable set of these estimates requires a large amount of data, i.e., much larger than the filter length. This could represent a significant shortcoming when dealing with a limited (or incomplete) amount of data and/or a long-length filter. Second, the external noise (that is part of the reference signal) biases the Wiener filter solution, which becomes less accurate when the signal-to-noise ratio (SNR) decreases. This could be the case in noisy environments, where different types of perturbations are likely to emerge. Third, the conventional solution involves the inversion of the covariance matrix, which is a very challenging operation in terms of both computational complexity and numerical accuracy [5,6]. The difficulty could increase significantly when operating with long-length filters, which further entail large-dimension matrices.
Most of the previously discussed limitations are connected to the length of the filter, which could be very large in many scenarios. For example, in applications like echo cancellation and noise reduction [7], the acoustic impulse responses to be identified have hundreds or thousands of coefficients when using the common sampling rates of 8 or 16 kHz. Therefore, dealing with such long-length filters could lead to significant limitations in terms of both the accuracy and the complexity of the solution. In order to reformulate such high-dimension system identification problems (with a large parameter space) more efficiently, a recently developed decomposition-based technique can be employed [8]. The main idea behind this technique is to exploit the low-rank feature of the system impulse response, in conjunction with its nearest Kronecker product (NKP) decomposition. As a result, a system identification problem featuring a large parameter space is recast as a combination of two shorter filters, with a significantly reduced number of coefficients. This further implies operating with smaller matrices/vectors and, consequently, leads to improved robustness in terms of the accuracy of the final solution, even in the challenging cases mentioned above (e.g., a limited amount of data and/or low SNRs). Due to these important gains, the NKP-based approach has been involved in a wide range of applications, among which can be mentioned echo cancellation, adaptive beamforming, linear prediction, speech dereverberation, and microphone arrays, e.g., see [9,10,11,12,13,14,15,16,17] and the references therein.
Recently, the NKP technique has been applied in conjunction with a third-order tensor (TOT) decomposition of the impulse response [18], leading to a higher efficiency in terms of reducing the dimensionality of the system identification problem. This was not a straightforward extension of the low-rank approach presented in [8], since handling the rank of a tensor is a sensitive issue that usually involves different approximation techniques [19,20,21,22,23,24,25,26,27,28]. On the other hand, the solution proposed in [18] avoids such an approximation by controlling and limiting the tensor rank to very small values. However, the resulting Wiener filter based on the TOT decomposition solves the involved sets of Wiener–Hopf equations using the conventional approach, which relies on matrix inversion. Alternatively, different iterative techniques could be used to avoid such an operation [29,30,31], like the conjugate gradient (CG) method [32]. In [33], the CG algorithm has been applied in conjunction with the NKP-based technique from [8], showing improved performance. However, applying the CG method together with the TOT decomposition is a more challenging task, due to the particular connection between the three (shorter) component filters and the need for auxiliary variables within the algorithm.
Motivated by these aspects, in the current paper, we design an improved iterative version of the Wiener filter. The proposed algorithm involves the TOT-based decomposition, together with the CG method, to solve three sets of Wiener–Hopf equations. As a result, it outperforms the counterpart version from [18], which uses direct matrix inversion for solving the Wiener–Hopf equations, and also the CG-based solution from [33], which exploits the second-order NKP decomposition. Following this introduction, Section 2 provides some background on the conventional Wiener filter, the CG method (to avoid matrix inversion), and the TOT-based decomposition. Next, in Section 3, the proposed algorithm is developed. Simulation results provided in Section 4 support its performance and advantages compared to the existing solutions. The paper is summarized in Section 5, outlining the main conclusions and several perspectives for future work.
2. Conventional Wiener Filter, Conjugate Gradient Method, and Impulse Response Decomposition Based on a Third-Order Tensor
In this section, several background elements related to the upcoming developments are provided. First, we present the conventional Wiener filter for solving linear system identification problems. Next, the CG method is introduced as an efficient (iterative) alternative that avoids the matrix inversion required by the direct solution of the Wiener–Hopf equations. Finally, the TOT-based decomposition of the impulse response is presented, outlining the main idea recently introduced in [18].
The main framework considered in this paper is related to a single-input single-output (SISO) linear system identification scenario, where all the involved signals are zero-mean and real-valued. In this context, the available signals are the input $x(t)$ and the reference $d(t)$, where $t$ represents the discrete-time index. Under this scenario, there is a correlation between these two sequences, since the reference signal is obtained at the output of an unknown system driven by the input signal, while the output is corrupted by an additive noise, as shown in Figure 1. Thus,
$$\begin{aligned} d(t) &= \mathbf{h}^{T} \mathbf{x}(t) + w(t) \\ &= y(t) + w(t), \end{aligned} \qquad (1)$$
where the vector $\mathbf{h}$ contains the $L$ coefficients of the unknown impulse response (with the superscript $^{T}$ denoting transposition), $\mathbf{x}(t) = \left[\, x(t) \;\; x(t-1) \;\; \cdots \;\; x(t-L+1) \,\right]^{T}$ is a vector that contains the $L$ most recent time samples of the input signal $x(t)$, and $w(t)$ is an additive noise, which is uncorrelated with $x(t)$. In the second line of (1), $y(t) = \mathbf{h}^{T} \mathbf{x}(t)$ represents the output signal.
Based on the correlation between the reference sequence and the input signal, and following the MSE optimization criterion, an estimate of $\mathbf{h}$ can be obtained by solving the Wiener–Hopf equations [2], i.e.,
$$\mathbf{R}\, \widehat{\mathbf{h}}_{\mathrm{W}} = \mathbf{p},$$
where $\mathbf{R} = E\!\left[ \mathbf{x}(t)\, \mathbf{x}^{T}(t) \right]$ and $\mathbf{p} = E\!\left[ \mathbf{x}(t)\, d(t) \right]$ represent the covariance matrix of the input signal and the cross-correlation vector between the input and reference sequences, respectively, $\widehat{\mathbf{h}}_{\mathrm{W}}$ contains the coefficients of the Wiener filter (i.e., $L$ parameters), while $E[\cdot]$ denotes mathematical expectation. Thus, the conventional Wiener filter results by using the matrix inversion operation, so that
$$\widehat{\mathbf{h}}_{\mathrm{W}} = \mathbf{R}^{-1} \mathbf{p}.$$
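As a minimal illustration (not taken from the original paper), the following Python/NumPy sketch shows how the conventional Wiener filter can be obtained from a finite data record: the statistics $\mathbf{R}$ and $\mathbf{p}$ are estimated by time averaging and the Wiener–Hopf system is then solved directly; all variable names are hypothetical.

```python
import numpy as np

def wiener_filter_direct(x, d, L, delta=1e-8):
    """Conventional Wiener filter: estimate R and p by time averaging over the
    available data, then solve the Wiener-Hopf equations R @ h_W = p
    (via a linear solver, which plays the role of the matrix inversion)."""
    N = len(x)
    R = np.zeros((L, L))
    p = np.zeros(L)
    for t in range(L - 1, N):
        xt = x[t - L + 1:t + 1][::-1]      # L most recent input samples
        R += np.outer(xt, xt)
        p += xt * d[t]
    R /= (N - L + 1)
    p /= (N - L + 1)
    R += delta * np.eye(L)                 # small diagonal regularization
    return np.linalg.solve(R, p)
```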
In order to avoid matrix inversion, several alternative methods for solving (5) can be applied. The basic idea is to obtain the final solution in an iterative manner. In this context, the CG method [32] represents a popular choice, being included in the category of exact line search methods [34,35]. Hence, considering the initialization $\widehat{\mathbf{h}}_{0} = \mathbf{0}_{L}$ (an all-zeros vector of length $L$), an initial residual $\mathbf{r}_{0} = \mathbf{p} - \mathbf{R}\,\widehat{\mathbf{h}}_{0}$ can be computed. Also, in the initial step, it requires a conjugate vector $\mathbf{c}_{1} = \mathbf{r}_{0}$ and an auxiliary scalar $\rho_{0} = \mathbf{r}_{0}^{T}\mathbf{r}_{0}$. Using this initialization, the CG algorithm runs for $k = 1, 2, \ldots, K$ steps, each one involving the relations:
$$\begin{aligned}
\mathbf{z}_{k} &= \mathbf{R}\,\mathbf{c}_{k}, \\
\alpha_{k} &= \rho_{k-1} / \left( \mathbf{c}_{k}^{T} \mathbf{z}_{k} \right), \\
\widehat{\mathbf{h}}_{k} &= \widehat{\mathbf{h}}_{k-1} + \alpha_{k}\, \mathbf{c}_{k}, \\
\mathbf{r}_{k} &= \mathbf{r}_{k-1} - \alpha_{k}\, \mathbf{z}_{k}, \\
\rho_{k} &= \mathbf{r}_{k}^{T} \mathbf{r}_{k}, \\
\beta_{k} &= \rho_{k} / \rho_{k-1}, \\
\mathbf{c}_{k+1} &= \mathbf{r}_{k} + \beta_{k}\, \mathbf{c}_{k}.
\end{aligned}$$
The stopping criterion can be related to a maximum number of updates (i.e., for $k = K$) or a predefined threshold for the residual.
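For illustration, a minimal Python sketch of such a CG-based solver (with hypothetical names, not the paper's reference implementation) is given below; it also includes the small diagonal regularization discussed in the next paragraph and an optional starting point, which will be convenient for the warm-started cycles of Section 3.

```python
import numpy as np

def wiener_filter_cg(R, p, K=None, tol=1e-10, delta=1e-8, h0=None):
    """Iterative Wiener filter using the conjugate gradient (CG) method:
    solves R @ h = p without explicitly inverting R.
    An optional starting point h0 can be supplied."""
    L = len(p)
    K = K or L                               # at most L updates in exact arithmetic
    R = R + delta * np.eye(L)                # small diagonal loading (see the text below)
    h = np.zeros(L) if h0 is None else np.asarray(h0, dtype=float).copy()
    r = p - R @ h                            # initial residual
    c = r.copy()                             # initial conjugate (search) direction
    rho = r @ r                              # auxiliary scalar
    for _ in range(K):
        z = R @ c
        alpha = rho / (c @ z)                # step size along the current direction
        h = h + alpha * c                    # update the solution
        r = r - alpha * z                    # update the residual
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:           # stopping criterion on the residual norm
            break
        c = r + (rho_new / rho) * c          # update the conjugate direction
        rho = rho_new
    return h
```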
The convergence of the CG algorithm is influenced by the condition number of $\mathbf{R}$ [5], i.e., the larger this number, the slower the convergence. In order to improve the convergence rate, a preconditioning procedure could be applied to this matrix. There are different methods for choosing the so-called preconditioner (i.e., a matrix that multiplies $\mathbf{R}$), like Jacobi, Gauss–Seidel, etc. Basically, the algorithm from (6)–(12) is reformulated, while incorporating the preconditioning directly into the iteration. On the other hand, this procedure involves an additional computational cost. Nevertheless, the purpose of this paper is not to analyze the influence of different preconditioners on the overall performance of the CG algorithm. Our primary goal is to develop the decomposition-based approach in conjunction with the CG method for solving the Wiener–Hopf equations. In this context, the main challenges are related to the connection between the component filters and the specific initialization (as will be shown in the next section), and not to the performance of the CG algorithm itself. Consequently, in the following, the basic CG algorithm from (6)–(12) is considered without preconditioning. However, in order to keep the positive-definite character of the covariance matrix and to avoid any potential numerical/stability problems [6], it is recommended to add a very small positive constant to the elements of its main diagonal.
The maximum number of updates required by the Wiener filter using the CG method (namely, WF-CG) to reach the solution of the conventional Wiener filter (WF) is generally much smaller than the filter length. This is supported in Figure 2 and Figure 3, where the performances of the conventional WF and WF-CG are analyzed in two different scenarios for the identification of a network echo path of length $L$ (using a sampling rate of 8 kHz). This impulse response results from the first cluster of the ITU-T G.168 Recommendation [36] concerning digital network echo cancellers. It contains 64 coefficients padded with zeros up to the full length $L$. The required statistics ($\mathbf{R}$ and $\mathbf{p}$) are estimated by averaging across $N$ data samples of $x(t)$ and $d(t)$, with $N = ML$. The reference signal is obtained according to (1), using a first-order autoregressive [AR(1)] process as input, which results from filtering white Gaussian noise through an AR(1) model whose pole is set to 0.8. The additive noise is white and Gaussian, with the SNR defined as $\sigma_{y}^{2} / \sigma_{w}^{2}$, where $\sigma_{y}^{2}$ and $\sigma_{w}^{2}$ stand for the variances of $y(t)$ and $w(t)$, respectively. The results are shown using a common performance measure involved in system identification scenarios, namely the normalized misalignment (in dB). It is defined as $20 \log_{10} \left( \left\| \mathbf{h} - \widehat{\mathbf{h}} \right\|_{2} / \left\| \mathbf{h} \right\|_{2} \right)$ (where $\| \cdot \|_{2}$ denotes the Euclidean norm) and basically shows the "difference" between the true impulse response and the estimated one. The lower this quantity, the better the accuracy of the estimate.
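To make the setup concrete, a small Python sketch of such an experiment (with hypothetical parameter values; the true echo path h is assumed to be available for evaluation purposes only) could look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_input(n_samples, pole=0.8):
    """AR(1) input: white Gaussian noise filtered through a one-pole model."""
    w = rng.standard_normal(n_samples)
    x = np.zeros(n_samples)
    for t in range(1, n_samples):
        x[t] = pole * x[t - 1] + w[t]
    return x

def misalignment_db(h, h_hat):
    """Normalized misalignment (in dB) between the true and estimated responses."""
    return 20 * np.log10(np.linalg.norm(h - h_hat) / np.linalg.norm(h))

# Hypothetical example: reference generated according to (1) at a given SNR
L, M = 512, 20                         # assumed filter length and data factor (N = M * L)
h = rng.standard_normal(L)             # placeholder for the true echo path
x = ar1_input(M * L)
y = np.convolve(x, h)[:M * L]          # output of the unknown system
snr_db = 20.0
w = rng.standard_normal(M * L) * np.sqrt(np.var(y) / 10 ** (snr_db / 10))
d = y + w                              # reference signal
```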
In the first scenario considered in Figure 2, we evaluate the impact of using different amounts of data ($N = ML$) to estimate the statistics, by varying the value of $M$. It can be noticed that a low amount of data significantly influences the accuracy of the Wiener solution. Nevertheless, the WF-CG converges toward the conventional WF after a small number of CG iterations (as compared to the filter length). Second, in Figure 3, the influence of the SNR is outlined. As expected, a lower SNR reduces the accuracy of the Wiener estimate. Similarly to the previous experiment, the WF-CG reaches the conventional WF after a number of CG updates that is much smaller than the filter length. Both analyzed scenarios outline the influence of the main factors that affect the behavior of the Wiener filter, i.e., the amount of available data for estimating the statistics and the SNR level. Thus, it is of great importance to improve the overall performance and robustness related to these aspects.
In terms of computational complexity, the conventional Wiener solution based on matrix inversion requires $\mathcal{O}(L^{3})$ operations, while the iterative version that uses the CG method needs an amount of order $\mathcal{O}(KL^{2})$, with $K \ll L$ (where $K$ denotes the number of CG updates). Nevertheless, when identifying a long-length impulse response, a large value of $K$ could be required for the CG iterations. This also represents a motivation for the dimensionality reduction of the problem, by reformulating a system identification scenario with a large parameter space (i.e., a large number of coefficients) as a combination of the estimates provided by shorter filters.
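As a rough, purely illustrative comparison (the values below are assumptions, not results from the paper), consider a filter of length $L = 512$ solved with $K = 50$ CG updates:
$$L^{3} = 512^{3} \approx 1.3 \times 10^{8}, \qquad K L^{2} = 50 \cdot 512^{2} \approx 1.3 \times 10^{7},$$
so the CG-based solver already reduces the cost by roughly an order of magnitude, while still scaling with $L^{2}$ per update; the decomposition described next reduces the dimensions themselves.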
In this regard, the recent solution from [18] is based on a third-order tensor decomposition of the impulse response, namely the TOT decomposition. As a result, the final estimate is obtained as a combination (via the Kronecker product) of the coefficients associated with three sets of filters, which are significantly shorter (as compared to the original impulse response). This idea is briefly explained in the following. First, following the approach from [8], the impulse response of the system can be written as a sum of Kronecker products between two sets of shorter impulse responses, provided that the filter length can be factorized accordingly; here, $\otimes$ denotes the Kronecker product [37]. At this point, let us assume that the impulse response is low rank [8], so that only a small number of terms is required in this sum. Moreover, the length of the shorter component can be further factorized, which leads to a third level of decomposition. Consequently, considering the factorization $L = L_{1} L_{2} L_{3}$ (with $L_{3} \leq L_{2} \leq L_{1}$), the global impulse response results in
$$\mathbf{h} = \sum_{p=1}^{P} \mathbf{h}_{3p} \otimes \mathbf{h}_{2p} \otimes \mathbf{h}_{1p}, \qquad (14)$$
where the component impulse responses $\mathbf{h}_{1p}$, $\mathbf{h}_{2p}$, and $\mathbf{h}_{3p}$ have the lengths $L_{1}$, $L_{2}$, and $L_{3}$, respectively. It can be noticed that the coefficients of $\mathbf{h}$ can be "rearranged" in the form of a third-order tensor (of dimensions $L_{1} \times L_{2} \times L_{3}$), written as a sum of outer products of the component impulse responses, where $\circ$ stands as the notation for the outer product. Furthermore, this tensor is in fact a sum of $P$ third-order tensors, each one of rank 1 [19]. As indicated in [18], the recommended values of $P$ are small (e.g., 2 or 3).
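As an illustration of this construction, the following Python sketch (with hypothetical names and dimensions, assuming the factor ordering used in (14) above) assembles a long impulse response from the three sets of short component filters.

```python
import numpy as np

def assemble_impulse_response(h1, h2, h3):
    """Combine the component filters via Kronecker products, as in (14).
    h1, h2, h3 are lists (of equal length P) of vectors with lengths L1, L2, L3."""
    P = len(h1)
    L = len(h1[0]) * len(h2[0]) * len(h3[0])
    h = np.zeros(L)
    for p in range(P):
        h += np.kron(np.kron(h3[p], h2[p]), h1[p])
    return h

# Illustrative dimensions: L = L1 * L2 * L3 = 16 * 8 * 4 = 512, with P = 2 terms
rng = np.random.default_rng(1)
L1, L2, L3, P = 16, 8, 4, 2
h1 = [rng.standard_normal(L1) for _ in range(P)]
h2 = [rng.standard_normal(L2) for _ in range(P)]
h3 = [rng.standard_normal(L3) for _ in range(P)]
h = assemble_impulse_response(h1, h2, h3)   # length 512, built from 2*(16+8+4)=56 coefficients
```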
Summarizing, the identification of the global impulse response $\mathbf{h}$ of length $L$ (i.e., with $L = L_{1} L_{2} L_{3}$ coefficients) is transformed into a combination of three (shorter) sets of impulse responses, i.e., $\mathbf{h}_{1p}$, $\mathbf{h}_{2p}$, and $\mathbf{h}_{3p}$ (for $p = 1, 2, \ldots, P$). As a result, the new parameter space of the filter involves only $PL_{1}$, $PL_{2}$, and $PL_{3}$ coefficients, respectively. Since usually $P\left( L_{1} + L_{2} + L_{3} \right) \ll L_{1} L_{2} L_{3}$ [18], the TOT-based decomposition approach leads to a significant dimensionality reduction, especially when dealing with long-length filters (i.e., large values of $L$). While the conventional Wiener filter using matrix inversion involves a computational complexity proportional to $L^{3}$, the decomposition-based solution using the CG method combines the estimates of three shorter filters, which results in a computational complexity proportional to the squares of the (much smaller) component filter lengths $PL_{1}$, $PL_{2}$, and $PL_{3}$.
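Using the illustrative (assumed) factorization from the sketch above, i.e., $L = 16 \cdot 8 \cdot 4 = 512$ and $P = 2$, the reduction can be quantified as
$$P\left( L_{1} + L_{2} + L_{3} \right) = 56 \ll L = 512, \qquad \left( PL_{1} \right)^{2} + \left( PL_{2} \right)^{2} + \left( PL_{3} \right)^{2} = 32^{2} + 16^{2} + 8^{2} = 1344 \ll L^{2} = 262{,}144,$$
i.e., the three CG cycles operate on systems of sizes $32 \times 32$, $16 \times 16$, and $8 \times 8$, instead of a single $512 \times 512$ system.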
3. Iterative Wiener Filter Based on TOT and CG
The current section is dedicated to the development of the proposed solution, which results in the form of an iterative Wiener filter based on the TOT decomposition and using the CG method for solving the associated sets of Wiener–Hopf equations. For this purpose, and for a better readability of the upcoming developments, several preliminary elements from [18] and the specific notation are presented at the beginning of this section. These preliminaries are related to the TOT-based decomposition framework and the associated Wiener–Hopf equations. Further, the proposed solution is developed. The differences between the version from [18] and the current proposal based on the CG method are mainly related to (i) the specific initialization that involves auxiliary matrices and (ii) the connection between the component filters from one CG cycle to another within the main iterations of the proposed CG-based Wiener filter. Moreover, since the impulse responses from (14) have different lengths, the CG cycles corresponding to the component filters use different numbers of updates.
As shown in [18], the estimates of the component impulse responses from (14) can be obtained based on a multilinear optimization approach [38,39]. In other words, two of the component impulse responses are considered fixed, while optimizing the third (remaining) one. This approach leads to three sets of Wiener–Hopf equations, i.e., (16)–(18), one for each component filter. The corresponding data structures and the associated notation are shown in Table 1, where the hat notation generally denotes the estimate of the corresponding impulse response from (14), while $\mathbf{I}$ (with a subscript indicating its size) is the identity matrix.
At this point, (16)–(18) are going to be solved with the CG method. The resulting solutions will then be sequentially iterated and combined (via the Kronecker product). Finally, the Wiener filter $\widehat{\mathbf{h}}$, which represents an estimate of $\mathbf{h}$, will be obtained as
$$\widehat{\mathbf{h}} = \sum_{p=1}^{P} \widehat{\mathbf{h}}_{3p} \otimes \widehat{\mathbf{h}}_{2p} \otimes \widehat{\mathbf{h}}_{1p}, \qquad (19)$$
where the component estimates $\widehat{\mathbf{h}}_{1p}$, $\widehat{\mathbf{h}}_{2p}$, and $\widehat{\mathbf{h}}_{3p}$ are obtained from the solutions of (16), (17), and (18), respectively. All these steps of the designed algorithm are detailed in the following.
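In terms of the illustrative sketch given after (14), this final combination corresponds to a call such as the one below (with hypothetical variable names), where the component estimates are the sub-vectors extracted from the solutions of the three CG cycles.

```python
# Hypothetical usage: h1_est, h2_est, h3_est are lists of the P estimated
# sub-filters extracted from the solutions of (16)-(18), respectively.
h_est = assemble_impulse_response(h1_est, h2_est, h3_est)   # estimate of h, as in (19)
```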
As mentioned before, the developed iterative Wiener filter is based on the TOT decomposition of the global impulse response, while the CG updates are used to efficiently solve (16), (17), and (18), respectively. To this purpose, the main iterations of the Wiener filter are denoted by superscripts, while the CG updates appear as subscripts. The initialization of the algorithm concerns the three component filters, which are initially defined as in (20)–(22), where $K_{1}$, $K_{2}$, and $K_{3}$ represent the maximum numbers of CG updates (for the component filters), a very small positive number is involved in these initial settings, while $\mathbf{0}$ (with a subscript) denotes an all-zeros vector with the length indicated in the subscript. The reason for using different maximum numbers of CG updates is that we are dealing with different lengths for the component filters. Among them, the length of the third component filter (which has $PL_{3}$ coefficients) could be significantly smaller, taking into account that the first and second component filters have $PL_{1}$ and $PL_{2}$ coefficients, respectively, while $L_{3} \leq L_{2} \leq L_{1}$.
At this point, we need to introduce the auxiliary matrices from (23), which will further facilitate the definition of the matrices involved in the Wiener–Hopf equations of the component filters. Hence, in each main iteration of the algorithm, we first construct, using (23), the structures from (24)–(27). These allow us to compute the quantities from (28) and (29), which are used to process (16) with the CG method. Consequently, using (20), the initial settings of this CG cycle are defined [see (30)]. Next, for $k = 1, 2, \ldots, K_{1}$, we perform updates similar to (6)–(12).
The final solution of this cycle will represent the initialization for the CG cycle associated with this filter [similar to (30)] in the next main iteration of the algorithm. Also, it is decomposed as in (41), which further allows the evaluation of the corresponding data structures, so that we can compute the quantities from (45) and (46). The notation from (45) and (46) is used to process (17) with the CG updates. Hence, in this step, we follow the initial settings from (21) [see (47)]. Consequently, the CG cycle for the second filter is defined by relations similar to (6)–(12), for $k = 1, 2, \ldots, K_{2}$. The final solution of this cycle will represent the initial setting in the next main iteration of the algorithm [similar to (47)]. The decomposition of this impulse response is performed in two steps, leading to the components from (59).
At this point, having the components from (41) and (59), we continue with the development associated with the last component filter, starting with the evaluation of the corresponding data structures. Therefore, introducing the associated notation, we can further process (18) with the CG method. To this purpose, the initialization relies on (22), so that the initial settings of this cycle are defined [see (65)]. Thus, for $k = 1, 2, \ldots, K_{3}$, the CG cycle for the third filter consists of relations similar to (6)–(12). The decomposition of the final solution results in the components from (77) and provides the final elements for evaluating the estimated impulse response based on (19). Also, this final solution represents the initialization for the next main iteration of the algorithm, according to (65).
Summarizing, using (41), (59), and (77), we obtain the estimate of the global impulse response, as in (19). Finally, using the same components, we evaluate the auxiliary matrices, which will be used in the next main iteration of the algorithm in order to compute the structures from (25) and (42), respectively.
The resulting iterative Wiener filter (IWF), based on the TOT decomposition and using the CG method, will be referred to as IWF-TOT-CG. Its main steps are summarized in Table 2, while the CG cycles for solving (16)–(18) are detailed in Table 3.
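To visualize the overall flow, a high-level Python sketch of the proposed structure is given below. It is only a schematic illustration under the assumptions made in this rewrite: the `build_system_*` callables are placeholders for the data structures of Table 1 (not reproduced here), and the CG solver is the `wiener_filter_cg` sketch from Section 2, warm-started with the solution of the previous main iteration.

```python
def iwf_tot_cg(build_system_1, build_system_2, build_system_3,
               h1, h2, h3, n_iter=10, K=(None, None, None)):
    """Schematic main loop of the IWF-TOT-CG approach: in each main iteration,
    the three sets of Wiener-Hopf equations (16)-(18) are solved sequentially
    with CG cycles, keeping the other two (composite) component filters fixed,
    and each cycle is initialized with the solution from the previous main
    iteration. Reuses wiener_filter_cg() from the Section 2 sketch."""
    for _ in range(n_iter):
        R1, p1 = build_system_1(h2, h3)                   # h2, h3 fixed
        h1 = wiener_filter_cg(R1, p1, K=K[0], h0=h1)      # CG cycle for filter 1
        R2, p2 = build_system_2(h1, h3)                   # h1, h3 fixed
        h2 = wiener_filter_cg(R2, p2, K=K[1], h0=h2)      # CG cycle for filter 2
        R3, p3 = build_system_3(h1, h2)                   # h1, h2 fixed
        h3 = wiener_filter_cg(R3, p3, K=K[2], h0=h3)      # CG cycle for filter 3
    return h1, h2, h3                                     # combined via (19) afterwards
```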