Article

An Effective Adaptive Combination Strategy for Distributed Learning Network

Chundong Xu, Qinglin Li and Dongwen Ying

1 School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
2 Institute of Acoustics, Chinese Academy of Sciences, Beijing 100000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(12), 5723; https://doi.org/10.3390/app11125723
Submission received: 9 March 2021 / Revised: 25 May 2021 / Accepted: 31 May 2021 / Published: 20 June 2021
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

In this paper, we develop a modified adaptive combination strategy for the distributed estimation problem over diffusion networks. We consider the online estimation of adaptive combiners from the perspective of minimum variance unbiased estimation. In contrast with the classic adaptive combination strategy, which exploits an orthogonal projection technique, we formulate an unconstrained mean-square deviation (MSD) cost function by introducing Lagrange multipliers. Based on the Karush–Kuhn–Tucker (KKT) conditions, we derive a fixed-point iteration scheme for the adaptive combiners. Illustrative simulations validate the improved transient and steady-state performance of the diffusion least-mean-square (LMS) algorithm incorporated with the proposed adaptive combination strategy.

1. Introduction

It is generally beneficial to exploit diffusion strategies for distributed parameter estimation over adaptive networks [1,2,3,4,5,6]. Specifically, diffusion least-mean-square (LMS)-based methods have already been used in many contexts, such as biological behavior modeling [7,8], distributed detection [9], distributed localization [10], and target tracking and escaping from predators [11], where scalability, robustness, and low power consumption are desirable features [1]. In the diffusion strategy, each node of the network is allowed to receive the intermediate estimates of its neighboring nodes to improve the accuracy of its local estimate. Such cooperation enables each node to leverage the spatial diversity of the noise profile across the entire network. From this point of view, the performance of distributed diffusion methods can be further enhanced by using suitable combination weights (combiners).
Several static combination rules exist [1,12], e.g., the Metropolis, Laplacian, Uniform and Relative-degree rules. However, these static combiners are designed solely from the network topology, so they generally cannot adapt to the spatial variation of signal and noise statistics. To address this problem, many studies resort to adaptive combination (AC) strategies [12,13,14,15,16,17,18], most of which are developed for the adapt-then-combine (ATC) diffusion LMS algorithm [1].
Based on minimum variance unbiased estimation (MVUE), the classic AC strategy [12] outperforms the existing static combiners when applied to diffusion LMS algorithms. An optimal adaptive combination scheme is derived in [13] by adaptively estimating the variances of the measurement noises. Simulation results validate the superior steady-state performance of the diffusion LMS algorithm with the optimal combiners of [13], as compared to previous static and adaptive combiners. Based on the adaptive combination rule in [13], an optimal combination rule that accounts for channel distortion is also proposed in [16]. To achieve both an accelerated convergence rate and good steady-state network performance, combination switching mechanisms [14,15] have been proposed, i.e., a static combination scheme in the converging stage and an AC scheme when approaching the steady state.
In addition, a decoupled adapt-then-combine (D-ATC) algorithm has been proposed, for which a least-squares (LS)-based AC scheme is developed [17,18]; in homogeneous networks, it achieves performance close to that of the ATC algorithm with the classic AC.

Motivation and Contribution

As mentioned above, the classic AC strategy is derived based on MVUE, which is validated to be a feasible criterion [12]. In fact, one of the key techniques in the classic AC strategy [12] is an orthogonal projection, which is exploited to guarantee that the combiners add up to 1. However, the orthogonal projection technique actually limits the update direction of the combiners at each iteration; as discussed in Section 3.2, this restriction can be relaxed to further improve the performance of the diffusion LMS algorithm.
In this paper, we again formulate the online estimation of the adaptive combiners of the ATC algorithm from the perspective of MVUE. Instead of directly exploiting the orthogonal projection technique of [12], we present an unconstrained mean-square deviation (MSD) cost function based on Lagrange multipliers. Using the fixed-point iteration methodology and the KKT necessary conditions, we develop an effective adaptive combination strategy, which relies solely on the previous instantaneous intermediate weight estimates without resorting to knowledge of the measurement data and noises. The proposed AC strategy can be seen as a modified and extended version of the classic AC in [12]. Simulations validate the superior performance of the diffusion LMS algorithm when using the proposed AC strategy.
Notation: ℝ and ℂ denote the fields of real and complex numbers, respectively. Scalars are denoted by lower-case letters, and vectors and matrices by lower- and upper-case boldface letters, respectively. The transpose and conjugate transpose are denoted by (·)^T and (·)^H, respectively. E{·} represents expectation, and ℜ(·) denotes the real part. col{·} stands for the vector obtained by stacking its arguments on top of one another. diag{·} generates a diagonal matrix from the given diagonal arguments. [·]_i stands for the i-th element of a vector, and min{·} denotes the minimum element of a vector. The eigenvalue set of a square matrix F is denoted by {λ(F)}, with λ_max(F) denoting the maximum eigenvalue. The spectral radius of a square matrix F is denoted by ϱ(F) ≜ max{|λ(F)|}.

2. The ATC Diffusion LMS Algorithm

2.1. Model Assumption

Consider a network containing N nodes, which collectively estimate an M-dimensional unknown parameter w^o ∈ ℂ^M. N_k denotes the set of neighbors of node k, including k itself, and its cardinality is n_k. For each node k, at time instant t, the regressor u_k(t) ∈ ℂ^M and the measurement signal d_k(t) ∈ ℂ are available. The signal model is given by
$d_k(t) = w^{oH} u_k(t) + v_k(t),$  (1)
where v_k(t) denotes the additive zero-mean white Gaussian measurement noise at node k, with variance σ_{v,k}^2. For any k and t, v_k(t) is independent of u_k(t), and v_k(i) is independent of v_l(j) for all k ≠ l or i ≠ j.
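For concreteness, the following minimal NumPy sketch generates synthetic data according to a real-valued version of the signal model (1); the horizon, per-node noise variances and random seed are illustrative assumptions rather than the exact setup used later in the simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, T = 5, 15, 2000                          # illustrative sizes
w_o = np.ones(M) / M                           # unknown parameter vector w^o
sigma_v2 = rng.uniform(0.01, 0.1, size=N)      # hypothetical per-node noise variances

# u[k, t] is the M-dimensional regressor of node k at time t; d[k, t] the measurement.
u = rng.standard_normal((N, T, M))
v = rng.standard_normal((N, T)) * np.sqrt(sigma_v2)[:, None]
d = np.einsum('m,ntm->nt', w_o, u) + v         # d_k(t) = w^o^T u_k(t) + v_k(t), real case
```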

2.2. ATC Algorithm

The main target of a distributed estimation algorithm is to generate an estimate w_k(t) of w^o at each node k and time t in a distributed manner. For the diffusion strategy, each node k first executes a local adaptation step to obtain an intermediate estimate ψ_k(t); then all nodes share their intermediate estimates with their neighbors; finally, each node k linearly combines the intermediate estimates received from its neighbors using a set of combination weights. The detailed steps of the ATC diffusion LMS algorithm are
$\psi_k(t) = w_k(t-1) + \mu_k u_k(t) e_k^*(t),$  (2)
$w_k(t) = \sum_{l \in \mathcal{N}_k} a_{l,k} \psi_l(t),$  (3)
where e_k(t) ≜ d_k(t) − w_k^H(t−1) u_k(t) is the a priori estimation error and μ_k > 0 is the step size for k ∈ {1, 2, …, N}. The combiner a_{l,k} is the weight given to the intermediate estimate from node l during the combination step of node k. Moreover, the non-negative combination matrix A = [a_{l,k}] satisfies [1,12]
$a_{l,k} \geq 0 \ \text{if} \ l \in \mathcal{N}_k, \quad a_{l,k} = 0 \ \text{if} \ l \notin \mathcal{N}_k, \quad a_k^T \mathbf{1} = 1,$  (4)
with a_k denoting the k-th column of A. Notice that A is left-stochastic, since the entries of each column are non-negative and sum to 1 [19,20].
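To make the two-step structure concrete, the following minimal real-valued sketch runs the adaptation step (2) and the combination step (3) with simple uniform combiners a_{l,k} = 1/n_k; the neighbor representation and the function signature are our own assumptions, not code from the paper.

```python
import numpy as np

def atc_lms(d, u, neighbors, mu=0.01):
    """ATC diffusion LMS (real-valued): adaptation step (2), then combination
    step (3) with uniform combiners a_{l,k} = 1/n_k."""
    N, T, M = u.shape
    w = np.zeros((N, M))                # local estimates w_k(t-1)
    psi = np.zeros((N, M))              # intermediate estimates psi_k(t)
    for t in range(T):
        for k in range(N):              # adapt: psi_k(t) = w_k(t-1) + mu * e_k(t) * u_k(t)
            e = d[k, t] - w[k] @ u[k, t]
            psi[k] = w[k] + mu * e * u[k, t]
        for k in range(N):              # combine: w_k(t) = sum_{l in N_k} a_{l,k} psi_l(t)
            w[k] = psi[sorted(neighbors[k])].mean(axis=0)
    return w
```

For instance, a simple ring topology could be described as neighbors = [{(k - 1) % N, k, (k + 1) % N} for k in range(N)].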

3. Adaptive Combination Scheme

3.1. Minimum Variance Unbiased Estimation

Consider the signal model and the ATC diffusion LMS algorithm in Section 2. Assume that, for each node k ∈ {1, …, N}, the intermediate estimate ψ_k(t) in the diffusion LMS algorithm satisfies
$E\{\psi_k(t)\} = w^o, \quad \text{for all } k \in \{1, \ldots, N\}.$  (5)
We define
$\Psi(t) \triangleq [\psi_1(t), \ldots, \psi_N(t)].$
Following [12], we have the minimum variance unbiased estimation problem for each k ∈ {1, …, N},
$\min_{a_k \in \mathbb{R}^N} \ a_k^T \Re(Q_\Psi)\, a_k \quad \text{s.t.} \ \mathbf{1}_N^T a_k = 1; \ a_{l,k} = 0 \ \text{for all} \ l \notin \mathcal{N}_k,$  (6)
where $Q_\Psi \triangleq E\{(\Psi(t) - E\{\Psi(t)\})^H (\Psi(t) - E\{\Psi(t)\})\}$ and 1_N denotes the N × 1 vector with unit entries. Applying the convex combination strategy for all k ∈ {1, …, N}, we also require a_{l,k} ≥ 0.
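As a side remark, if the sparsity and non-negativity constraints are ignored and only the sum-to-one constraint is kept, problem (6) admits the familiar closed-form minimizer a_k = ℜ(Q_Ψ)^{-1} 1_N / (1_N^T ℜ(Q_Ψ)^{-1} 1_N). The sketch below illustrates only this simplified case (assuming ℜ(Q_Ψ) is invertible); it is not the combiner used in this paper, which enforces the full constraint set iteratively in Section 3.2.

```python
import numpy as np

def mvue_sum_to_one(Q_real):
    """Minimizer of a^T Q a subject only to 1^T a = 1 (no non-negativity or
    sparsity constraints); shown for intuition about problem (6)."""
    ones = np.ones(Q_real.shape[0])
    x = np.linalg.solve(Q_real, ones)   # Q^{-1} 1
    return x / (ones @ x)               # normalize so the entries sum to 1
```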

3.2. Fixed-Point Iteration Solution

First, we introduce a transform matrix P_k, defined as $P_k \triangleq [\,l\text{-th column of } I_N\,]_{l \in \mathcal{N}_k}$. Then, a_k in (6) can be expressed as a_k = P_k b_k, with b_k ∈ ℝ^{n_k}. Therefore, the minimization problem (6) can be transformed into
$\min_{b_k} \ J(b_k) \triangleq b_k^T \Re(Q_{\Psi_k})\, b_k \quad \text{s.t.} \ \mathbf{1}_{n_k}^T b_k = 1; \ b_{k,i} \geq 0, \ i = 1, \ldots, n_k,$  (7)
where $Q_{\Psi_k} \triangleq P_k^T Q_\Psi P_k = E\{(\Psi_k(t) - E\{\Psi_k(t)\})^H (\Psi_k(t) - E\{\Psi_k(t)\})\}$ with $\Psi_k(t) \triangleq \Psi(t) P_k$, and b_{k,i} is the i-th element of the vector b_k. By introducing the Lagrange multipliers α and ω, we obtain the cost function
$L(b_k, \omega, \alpha) \triangleq J(b_k) - \omega^T b_k + \alpha\,(\mathbf{1}_{n_k}^T b_k - 1).$  (8)
Taking the gradient of (8) with respect to b_k yields
$\nabla_{b_k} L(b_k, \omega, \alpha) = \nabla_{b_k} J(b_k) - \omega + \mathbf{1}_{n_k} \alpha = \Re(Q_{\Psi_k})\, b_k - \omega + \mathbf{1}_{n_k} \alpha.$  (9)
According to the Karush–Kuhn–Tucker (KKT) conditions [21], the optimal tuple (b_k, ω, α) should obey
$\Re(Q_{\Psi_k})\, b_k - \omega + \mathbf{1}_{n_k} \alpha = 0, \qquad \omega_i\, b_{k,i} = 0,$  (10)
or equivalently,
$\left[\Re(Q_{\Psi_k})\, b_k + \mathbf{1}_{n_k} \alpha\right]_i b_{k,i} = 0.$  (11)
We introduce a positive definite diagonal matrix D_k(t−1), whose i-th diagonal element is an arbitrary positive function of b_k(t−1), denoted by f_i(b_k(t−1)). Obviously, $-D_k(t-1)\, \nabla_{b_k} L(b_k, \omega, \alpha)$ is still a descent direction of the cost function (8) [21]. Therefore, we obtain the adaptive solution of problem (7) through the fixed-point iteration method,
$b_{k,i}(t) = b_{k,i}(t-1) - \eta_{k,i}(t)\, f_i(b_k(t-1))\, b_{k,i}(t-1)\, \left[\Re(Q_{\Psi_k})\, b_k(t-1) + \mathbf{1}_{n_k} \alpha\right]_i,$  (12)
where b_{k,i}(t) is the estimate of b_{k,i} at time instant t and η_{k,i}(t) is the learning factor of b_{k,i}(t).
To simplify the problem, we choose the same learning factor η_{k,i}(t) = η_k(t) for all i at any time t. Hence, we can rewrite (12) in vector form,
$b_k(t) = b_k(t-1) - \eta_k(t)\, \Gamma_k(t-1) \left[\Re(Q_{\Psi_k})\, b_k(t-1) + \mathbf{1}_{n_k} \alpha\right],$  (13)
where $\Gamma_k(t-1) \triangleq \mathrm{diag}\{D_k(t-1)\, b_k(t-1)\}$.
Applying the constraint $\mathbf{1}_{n_k}^T b_k(t) = 1$ and pre-multiplying both sides of (13) by $\mathbf{1}_{n_k}^T$ yields the Lagrange multiplier α,
$\alpha = \dfrac{\mathbf{1}_{n_k}^T \big(b_k(t-1) - \eta_k(t)\, \Gamma_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1)\big) - 1}{\eta_k(t)\, \mathbf{1}_{n_k}^T \Gamma_k(t-1)\, \mathbf{1}_{n_k}}.$  (14)
Substituting α into (13) and using the constraint $\mathbf{1}_{n_k}^T b_k(t-1) = 1$ again, we obtain the update of the combiners b_k(t),
$b_k(t) = b_k(t-1) - \eta_k(t)\, G_k(t-1)\, \Gamma_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1),$  (15)
where $G_k(t-1) \triangleq I_{n_k} - \dfrac{D_k(t-1)\, b_k(t-1)\, \mathbf{1}_{n_k}^T}{\mathbf{1}_{n_k}^T D_k(t-1)\, b_k(t-1)}$, with $I_{n_k}$ denoting the $n_k \times n_k$ identity matrix.
The adaptive combiners (15) can be updated in two incremental steps,
$g_k(t) = \zeta_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1),$  (16)
$b_k(t) = b_k(t-1) - \eta_k(t)\, g_k(t),$  (17)
where
$\zeta_k(t-1) \triangleq G_k(t-1)\, \Gamma_k(t-1).$  (18)
Then, we can obtain the combiner a_k(t),
$a_k(t) = P_k\, b_k(t),$  (19)
which can be used in (3) to update the local weight estimate adaptively. We also define the adaptive combination matrix A(t) ∈ ℝ^{N×N}, whose k-th column is a_k(t).
Please note that g_k(t) in (16) can be seen as the product of ζ_k(t−1) and the gradient ∇_{b_k} J(b_k(t−1)) = ℜ(Q_{Ψ_k}) b_k(t−1), which means that ζ_k(t−1) acts as an auxiliary matrix adjusting the update direction ∇_{b_k} J(b_k(t−1)). On the one hand, G_k(t−1) in (18) is a projection matrix [22], which makes the update direction g_k(t) orthogonal to the vector 1_{n_k}, i.e., 1_{n_k}^T g_k(t) = 0. Then we have 1_{n_k}^T b_k(t) = 1_{n_k}^T b_k(t−1) = ⋯ = 1_{n_k}^T b_k(0) = 1, provided that the initial combiners satisfy 1_{n_k}^T b_k(0) = 1. On the other hand, since ζ_k(t−1) as a whole is a positive semi-definite symmetric matrix, the update −η_k(t) g_k(t) is still along a descent direction of the cost function (7) [21]. Instead of using the positive semi-definite symmetric matrix ζ_k(t−1) = G_k(t−1) Γ_k(t−1), the classic AC [12] replaces ζ_k(t−1) with the orthogonal projection matrix characterized by 1_{n_k}, i.e., I_{n_k} − 1_{n_k} 1_{n_k}^T / n_k, which actually restricts the update direction of the adaptive combiners to lie in the plane spanned by 1_{n_k} and ∇_{b_k} J(b_k(t−1)). In fact, we find that (16) and (17) reduce to the classic AC [12] when D_k(t) = diag{b_k(t)}^{−1}.
We now consider optimizing the learning factor η_k(t). Substituting (16) and (17) into (7) yields a cost function with respect to the learning factor η_k(t),
$h(\eta_k(t)) = g_k^T(t)\, \Re(Q_{\Psi_k})\, g_k(t)\, \eta_k^2(t) - 2\, g_k^T(t)\, \Re(Q_{\Psi_k})\, b_k(t-1)\, \eta_k(t) + b_k^T(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1).$  (20)
Obviously, h(η_k(t)) is a quadratic (convex) function of η_k(t). Thus, its minimum is attained if and only if
$\eta_k^o(t) = \dfrac{g_k^T(t)\, \Re(Q_{\Psi_k})\, b_k(t-1)}{g_k^T(t)\, \Re(Q_{\Psi_k})\, g_k(t)} = \dfrac{b_k^T(t-1)\, \Re(Q_{\Psi_k})\, \zeta_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1)}{g_k^T(t)\, \Re(Q_{\Psi_k})\, g_k(t)}.$  (21)
Note that the optimal learning factor η_k^o(t) is non-negative, since ζ_k(t−1) is a positive semi-definite matrix.
To guarantee that the combiners remain non-negative, we set the upper bound of η_k(t) as [12]
$\eta_k^{\max}(t) = \dfrac{\min\{b_k(t-1)\}}{\|g_k(t)\|_\infty + \varepsilon},$  (22)
where ε > 0 is a small constant and ‖·‖_∞ denotes the maximum norm. Thus, the learning factor in (17) at time instant t is chosen as
$\eta_k(t) = \min\{\eta_k^{\max}(t), \eta_k^o(t)\}.$  (23)
Please note that Q_{Ψ_k} is usually unavailable in practical applications. As done in [12], Q_{Ψ_k} can be replaced by its approximation
$\hat{Q}_{\Psi_k}(t) \triangleq \tfrac{1}{2}\, \Delta\Psi_k^H(t)\, \Delta\Psi_k(t),$  (24)
where $\Delta\Psi_k(t) = \Psi_k(t) - \Psi_k(t-1)$. To make the estimate smoother, we introduce a forgetting factor λ. Then, the iterative expression of $\hat{Q}_{\Psi_k}$ can be written as
$\hat{Q}_{\Psi_k}(t) = \lambda\, \hat{Q}_{\Psi_k}(t-1) + \tfrac{1}{2}\, \Delta\Psi_k^H(t)\, \Delta\Psi_k(t).$  (25)
In practical applications, we use $\hat{Q}_{\Psi_k}(t)$ in (25) to replace the statistical quantity $Q_{\Psi_k}$ for each node k at time instant t.
Finally, the implementation of the ATC algorithm with the proposed AC strategy is summarized in Algorithm 1.
Algorithm 1 ATC with the proposed AC strategy
For each node k, set ψ_k(0) = w_k(0) = 0_M and choose b_k(0) ∈ ℝ^{n_k} such that 1_{n_k}^T b_k(0) = 1.
Given a small positive constant ε and step size μ k , at each time instant t > 0 , compute at each node k:
1. Update the intermediate weight estimate ψ k ( t ) through (2).
2. Update combiner a k ( t ) consecutively through (25), (16), (23), (17) and (19).
3. Update the local weight estimate w k ( t ) through (3).
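The following NumPy sketch is one possible real-valued implementation of Algorithm 1 under the later simulation choices (a common step size μ and D_k(t) = diag{b_k(t)}^γ); the array shapes, the neighbor representation and all variable names are our own assumptions, and the real-part operation is trivial here because the data are real.

```python
import numpy as np

def atc_proposed_ac(d, u, neighbors, mu=0.01, lam=0.95, gamma=0, eps=0.5e-4):
    """Sketch of Algorithm 1 (real-valued): ATC diffusion LMS whose combiners b_k
    follow the fixed-point update (16)-(17), the learning factor rule (21)-(23)
    and the recursive estimate (25) of Q_{Psi_k}."""
    N, T, M = u.shape
    nbrs = [np.array(sorted(neighbors[k])) for k in range(N)]
    w = np.zeros((N, M))
    psi = np.zeros((N, M))
    psi_prev = np.zeros((N, M))
    b = [np.ones(len(nbrs[k])) / len(nbrs[k]) for k in range(N)]     # 1^T b_k(0) = 1
    Qh = [np.zeros((len(nbrs[k]),) * 2) for k in range(N)]           # \hat{Q}_{Psi_k}
    for t in range(T):
        for k in range(N):                        # step 1: adaptation (2)
            e = d[k, t] - w[k] @ u[k, t]
            psi[k] = w[k] + mu * e * u[k, t]
        for k in range(N):                        # steps 2-3: combiner and weight update
            idx = nbrs[k]
            Psi_k = psi[idx].T                    # M x n_k, columns psi_l(t), l in N_k
            dPsi = Psi_k - psi_prev[idx].T
            Qh[k] = lam * Qh[k] + 0.5 * dPsi.T @ dPsi                # (25)
            bk = b[k]
            Db = (bk ** gamma) * bk               # D_k b_k with D_k = diag{b_k}^gamma
            Gk = np.eye(len(bk)) - np.outer(Db, np.ones(len(bk))) / Db.sum()   # G_k(t-1)
            zeta = Gk @ np.diag(Db)               # zeta_k = G_k Gamma_k, cf. (18)
            g = zeta @ Qh[k] @ bk                 # (16)
            denom = g @ Qh[k] @ g
            eta_o = (g @ Qh[k] @ bk) / denom if denom > 0 else 0.0   # (21)
            eta_max = bk.min() / (np.abs(g).max() + eps)             # (22)
            b[k] = bk - min(eta_o, eta_max) * g   # (23) and (17)
            w[k] = Psi_k @ b[k]                   # (19) and (3): w_k(t) = Psi_k(t) b_k(t)
        psi_prev[:] = psi
    return w
```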

4. Mean Convergence

We now analyze the mean convergence of the diffusion LMS algorithm with the proposed adaptive combiners.
We now introduce the following independence assumptions:
Assumption 1
(Independence). All regressors u k ( t ) are spatially and temporally independent ([1], Assumption 1).
Assumption 2.
The combination matrix A ( t ) is independent of all regressors u k ( t ) and all local weight estimates w k ( t 1 ) at time t 1 ([12], Assumption 4.3).
Theorem 1.
Under Assumptions 1 and 2, a sufficient condition to guarantee the mean convergence of the diffusion LMS algorithm is given by
$0 < \mu_k < \dfrac{2}{\varrho(R)},$  (26)
where ϱ(R) denotes the spectral radius of the matrix R.
The proof is given in Appendix A. Please note that the sufficient condition (26) is consistent with ([1], Equation (37)) and ([12], Equation (25)).

5. Simulation Results

We evaluate herein the MSD performance of the proposed algorithm. Without loss of generality, the unknown weight vector w^o is set to 1_M/M with M = 5. The initial weight estimates are w_k(0) = 0_M for each node k. The constant ε used in (22) is set to 0.5 × 10^{−4}, and the forgetting factor is λ = 0 or λ = 0.95. For the proposed AC, we consider D_k(t) = diag{b_k(t)}^γ, where γ can be 0, −1 or −2.
We use the empirical MSD as the performance metric. Both the transient and steady-state empirical network MSDs hereinafter are obtained by averaging over L = 500 independent trials and over all nodes of the network,
$\mathrm{MSD}(t) \triangleq \dfrac{1}{LN} \sum_{k=1}^{N} \sum_{\ell=1}^{L} \big\| \tilde{w}_k^{(\ell)}(t) \big\|_2^2,$
$\mathrm{MSD}(\infty) \triangleq \dfrac{1}{LN} \sum_{k=1}^{N} \sum_{\ell=1}^{L} \big\| \tilde{w}_k^{(\ell)}(\infty) \big\|_2^2,$
where $\tilde{w}_k^{(\ell)}(t) \triangleq w^o - w_k^{(\ell)}(t)$, with $w_k^{(\ell)}(t)$ denoting the transient weight estimate of the ℓ-th trial, and $\|\tilde{w}_k^{(\ell)}(\infty)\|_2^2$ is obtained by averaging $\|\tilde{w}_k^{(\ell)}(t)\|_2^2$ over 100 iterations after convergence.
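Assuming the per-trial estimates are stored in an array of shape (L, N, T, M), the empirical transient network MSD defined above can be computed as in the following sketch (reported in dB, as in the figures); the array layout is our own convention.

```python
import numpy as np

def network_msd_db(w_hist, w_o):
    """Empirical network MSD(t) in dB.
    w_hist: shape (L, N, T, M), holding the estimates w_k^{(l)}(t) of L trials."""
    err = w_hist - w_o                                      # error vectors w^o - w_k^{(l)}(t), up to sign
    msd = np.mean(np.sum(err ** 2, axis=-1), axis=(0, 1))   # average squared norm over trials and nodes
    return 10 * np.log10(msd)
```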
Example 1.
We consider the same network topology with N = 15 nodes as in ([12], Figure 5) and ([17], Figure 6a), illustrated in Figure 1a. The measurement noise is real white Gaussian noise whose variance σ_{v,k}^2 at each node k is presented in Figure 1b. According to the mean convergence condition, we set the step size to μ_k = 0.01 for each node k, and we consider γ = 0 here. Each regressor u_k(t) at each node k is a real white Gaussian sequence with covariance matrix R_{u,k} = σ_{u,k}^2 I_M and σ_{u,k}^2 = 1 for all k. The noise power of nodes 5, 12 and 13 suddenly increases to 5 at the 1500th iteration.
As demonstrated in Figure 2, the proposed AC strategy outperforms the classic AC [12] and the uniform combination [1] in terms of steady-state performance with similar convergence rates, and outperforms the LS-based AC [17] and the optimal AC [13] in terms of convergence rate. We also observe that the forgetting factor can further enhance the performance of the proposed AC. After the sudden change of the noise power of some nodes, the proposed AC exhibits a rather fast reconvergence rate and good steady-state performance.
Example 2.
The initial measurement noise variance σ_{v,k}^2 at each node k is presented in Figure 1b. In this simulation, we compare the proposed AC scheme with the uniform combination and the classic AC in terms of the steady-state MSD at different noise variances τ σ_{v,k}^2 at each node k. The other simulation conditions are the same as in Example 1. The steady-state MSDs with respect to the noise variances are listed in Table 1.
The fairness of this experiment is supported by the closely matched convergence behavior of the transient MSD curves plotted in Figure 3. As shown in Table 1, as the noise variances increase, the ATC algorithm with the proposed AC exhibits superior performance compared to the ATC algorithm with the uniform combination or the classic AC strategy. We can also see that the performance gain brought by the forgetting factor becomes limited once the noise variances increase beyond a certain level.
Example 3.
We now consider the target tracking model ([17], Equation (52)), namely
$w^o(t) = w^o + \theta(t),$
$\theta(t) = 0.99\, \theta(t-1) + \xi(t),$
where ξ(t) is a sequence of independent identically distributed perturbations with zero mean and covariance matrix Ξ, independent of the input regressors and the measurement noise at every iteration. We consider ξ(t) to be white Gaussian with Ξ = σ_ξ^2 I_M and σ_ξ^2 = 1 × 10^{−7}, i.e., the unknown weight vector varies slowly. The other simulation conditions are the same as in Example 1.
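A minimal sketch of this slowly varying target, under the parameter values stated above (σ_ξ^2 = 10^{−7} and w^o = 1_M/M), could look as follows; the horizon and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, T = 5, 3000
sigma_xi2 = 1e-7
w_o = np.ones(M) / M                  # static component of the target
theta = np.zeros(M)
w_o_t = np.empty((T, M))              # w^o(t) = w^o + theta(t)
for t in range(T):
    theta = 0.99 * theta + np.sqrt(sigma_xi2) * rng.standard_normal(M)
    w_o_t[t] = w_o + theta
```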
As illustrated in Figure 4, similar to Example 1, the proposed AC strategy outperforms the uniform combination rule, the classic AC and the LS-based AC in terms of steady-state performance, and outperforms the optimal AC in terms of convergence rate, under tracking scenarios. We also observe that the forgetting factor can further enhance the performance of the proposed AC in the tracking scenario.
Example 4.
We now consider the impact of the factors λ and γ on the performance of the proposed AC strategy. Without loss of generality, we consider step sizes μ_k = 0.02 for each node k. The simulation result is shown in Figure 5.
It can be seen from Figure 5 that, for a given γ, a larger λ brings a higher performance gain for the proposed AC. We also observe that choosing γ = −2 yields better steady-state performance than the other two choices, especially for a small forgetting factor λ. In particular, compared to Example 1, for the two settings (γ = 0, λ = 0) and (γ = 0, λ = 0.95), we find that the larger step sizes in this example lead to an accelerated convergence rate at the cost of degraded steady-state performance.
Example 5.
We consider a sparse network with N = 15 nodes whose topology, the same as ([12], Figure 8), is depicted in Figure 6a. The measurement noise is real white Gaussian noise whose variance σ_{v,k}^2 at each node is presented in Figure 6c. We consider a heterogeneous network with step sizes μ_k = 0.004 for the orange shaded nodes and μ_k = 0.02 for the rest. Each regressor u_k(t) at each node k is a real white Gaussian sequence with covariance matrix R_{u,k} = σ_{u,k}^2 I_M, where σ_{u,k}^2 is presented in Figure 6b for all k.
As illustrated in Figure 7, for the sparse network, the proposed AC scheme outperforms the classic AC and the LS-based AC in terms of steady-state performance while keeping a comparable convergence rate. It also outperforms the optimal AC scheme in terms of convergence rate. Moreover, the introduction of the forgetting factor can further enhance the performance of the proposed AC scheme.

6. Conclusions

In this paper, we present a modified adaptive combination strategy for the distributed estimation problem over diffusion networks to improve robustness against the spatial variation of signal and noise statistics over the network. Considering the Karush–Kuhn–Tucker conditions and fixed-point iteration methodology, we derive an effective adaptive combination strategy for the ATC diffusion LMS algorithm. We also invoke the forgetting factor and optimize the learning factor to further enhance the performance of the proposed adaptive combination strategy. Illustrative simulations validate the improved performance of the diffusion LMS algorithm with the proposed adaptive combination strategy.

Author Contributions

Data curation, Q.L.; Funding acquisition, C.X.; Methodology, Q.L.; Software, C.X.; Supervision, C.X. and D.Y.; Writing—Original draft, Q.L.; Writing—Review & editing, Q.L. and D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was jointly funded by the National Natural Science Foundation of China under Grants 11864016 and 61671442, and by the Social Science Key Research Base Project of Jiangxi Province under Grant JD19042.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.


Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Mean Convergence Analysis

We define the network weight error vector
$\tilde{w}(t) \triangleq \mathrm{col}\{\tilde{w}_1(t), \ldots, \tilde{w}_N(t)\} \in \mathbb{C}^{MN},$  (A1)
where $\tilde{w}_k(t) \triangleq w^o - w_k(t)$, k = 1, 2, …, N. We define the global regressor,
$u(t) \triangleq \mathrm{col}\{u_1(t), \ldots, u_N(t)\},$  (A2)
and the global covariance matrix,
$R \triangleq \mathrm{diag}\{R_1, \ldots, R_N\},$  (A3)
where the covariance matrix $R_k \triangleq E\{u_k(t)\, u_k^H(t)\}$, k = 1, 2, …, N.
We also introduce the following diagonal matrices,
$V(t) \triangleq \mathrm{diag}\{v_1(t), \ldots, v_N(t)\} \otimes I_M,$  (A4)
$D \triangleq \mathrm{diag}\{\mu_1, \ldots, \mu_N\} \otimes I_M,$  (A5)
and the extended combination matrix
$\mathcal{A}(t) \triangleq A(t) \otimes I_M.$  (A6)
Following [1,12], we can obtain the recursion of the network weight error,
$\tilde{w}(t) = \hat{F}(t)\, \tilde{w}(t-1) - \mathcal{A}^T(t)\, D\, z(t),$  (A7)
where $\hat{F}(t) \triangleq \mathcal{A}^T(t) \left(I_{MN} - D\, \hat{R}(t)\right)$ with $\hat{R}(t) \triangleq \mathrm{diag}\{\hat{R}_1(t), \ldots, \hat{R}_N(t)\}$ denoting the instantaneous estimate of the global covariance matrix R, $\hat{R}_k(t) \triangleq u_k(t)\, u_k^H(t)$, and $z(t) \triangleq V^*(t)\, u(t)$.
Obviously, we have $E\{z(t)\} = 0$. According to Assumptions 1 and 2, taking the mathematical expectation of (A7) yields
$E\{\tilde{w}(t)\} = F(t)\, E\{\tilde{w}(t-1)\},$  (A8)
where
$F(t) \triangleq E\{\hat{F}(t)\} = E\{\mathcal{A}^T(t)\}\,(I_{MN} - D R).$  (A9)
Equation (A8) can be further expressed as
$E\{\tilde{w}(t)\} = \mathcal{F}(t)\, E\{\tilde{w}(0)\},$  (A10)
where $\mathcal{F}(t) \triangleq \prod_{i=t}^{1} F(i) = F(t)\, F(t-1) \cdots F(1)$. To facilitate our analysis, we introduce a submultiplicative matrix norm (a submultiplicative matrix norm satisfies $\|AB\| \leq \|A\|\,\|B\|$ [23]). For any square matrix X and any ϵ > 0, there exists a submultiplicative matrix norm ‖·‖_ϱ such that ϱ(X) ≤ ‖X‖_ϱ ≤ ϱ(X) + ϵ, where ϱ(X) denotes the spectral radius of X [23,24]. Accordingly, we have $\|\mathcal{F}(t)\|_\varrho \leq \prod_{i=t}^{1} \|F(i)\|_\varrho$. Notice that the diffusion LMS algorithm converges in the mean if and only if $\lim_{t \to \infty} \|\mathcal{F}(t)\|_\varrho = 0$. Hence, a sufficient condition for the diffusion LMS algorithm to converge is $\|F(t)\|_\varrho \leq \varrho(F(t)) + \epsilon < 1$. Therefore, the diffusion LMS algorithm converges if ϱ(F(t)) < 1 for all t with a sufficiently small ϵ chosen. Thus, the diffusion LMS algorithm converges if F(t) in (A8) is stable at each t.
E{A(t)} is left-stochastic, since each of its columns sums to 1. Thus, according to ([1], Appendix I), F(t) in (A9) is stable if and only if $I_{MN} - DR$ is stable, i.e., $\max\{|1 - \lambda(DR)|\} < 1$. Notice that DR is positive semi-definite Hermitian, since R is block diagonal and D is positive definite diagonal. In light of [25], we have $\lambda_{\max}(DR) \leq \lambda_{\max}(D)\, \lambda_{\max}(R) = \mu_{\max}\, \lambda_{\max}(R)$, where μ_max is the maximum step size used in the network. Thus, $\max\{|1 - \lambda(DR)|\} < 1$ holds if and only if $\mu_{\max}\, \lambda_{\max}(R) < 2$, i.e., $0 < \mu_k < 2/\lambda_{\max}(R)$ for each node k.
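As a quick numerical sanity check of this argument (not part of the paper), one can draw a random left-stochastic combination matrix, random per-node covariances and step sizes satisfying μ_k < 2/λ_max(R), and verify that the resulting F(t) has spectral radius below 1; all sizes and distributions below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 6
R_blocks = []
for k in range(N):
    X = rng.standard_normal((M, 10 * M))
    R_blocks.append(X @ X.T / (10 * M))              # per-node covariance R_k
R = np.zeros((M * N, M * N))
for k, Rk in enumerate(R_blocks):
    R[k * M:(k + 1) * M, k * M:(k + 1) * M] = Rk     # block-diagonal global covariance

lam_max = max(np.linalg.eigvalsh(Rk).max() for Rk in R_blocks)
mu = rng.uniform(0.1, 1.9, size=N) / lam_max         # step sizes satisfying (26)
D = np.kron(np.diag(mu), np.eye(M))

A = rng.random((N, N))
A /= A.sum(axis=0, keepdims=True)                    # left-stochastic: columns sum to 1
F = np.kron(A.T, np.eye(M)) @ (np.eye(M * N) - D @ R)
print(np.max(np.abs(np.linalg.eigvals(F))))          # spectral radius, expected below 1
```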

References

1. Cattivelli, F.S.; Sayed, A.H. Diffusion LMS Strategies for Distributed Estimation. IEEE Trans. Signal Process. 2010, 58, 1035–1048.
2. Lee, H.S.; Kim, S.E.; Lee, J.W.; Song, W.J. A Variable Step-Size Diffusion LMS Algorithm for Distributed Estimation. IEEE Trans. Signal Process. 2015, 63, 1808–1820.
3. Ahn, D.C.; Lee, J.W.; Shin, S.J.; Song, W.J. A new robust variable weighting coefficients diffusion LMS algorithm. Signal Process. 2017, 131, 300–306.
4. Huang, W.; Li, L.; Li, Q. Diffusion Robust Variable Step-Size LMS Algorithm Over Distributed Networks. IEEE Access 2018, 6, 47511–47520.
5. Ashkezari-Toussi, S.; Sadoghi-Yazdi, H. Robust diffusion LMS over adaptive networks. Signal Process. 2019, 158, 201–209.
6. Nassif, R.; Vlaski, S.; Sayed, A.H. Adaptation and Learning Over Networks Under Subspace Constraints—Part I: Stability Analysis. IEEE Trans. Signal Process. 2020, 68, 1346–1360.
7. Tu, S.Y.; Sayed, A.H. Mobile adaptive networks with self-organization abilities. In Proceedings of the 2010 7th International Symposium on Wireless Communication Systems, York, UK, 19–22 September 2010; pp. 379–383.
8. Cattivelli, F.S.; Sayed, A.H. Modeling Bird Flight Formations Using Diffusion Adaptation. IEEE Trans. Signal Process. 2011, 59, 2038–2051.
9. Cattivelli, F.S.; Sayed, A.H. Distributed detection over adaptive networks using diffusion adaptation. IEEE Trans. Signal Process. 2011, 59, 1917–1932.
10. Chen, J.; Sayed, A.H. Diffusion Adaptation Strategies for Distributed Optimization and Learning Over Networks. IEEE Trans. Signal Process. 2012, 60, 4289–4305.
11. Tu, S.; Sayed, A.H. Mobile Adaptive Networks. IEEE J. Sel. Top. Signal Process. 2011, 5, 649–664.
12. Takahashi, N.; Yamada, I.; Sayed, A.H. Diffusion Least-Mean Squares with adaptive combiners: Formulation and performance analysis. IEEE Trans. Signal Process. 2010, 58, 4795–4810.
13. Tu, S.; Sayed, A.H. Optimal combination rules for adaptation and learning over networks. In Proceedings of the 2011 4th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, San Juan, Puerto Rico, 13–16 December 2011; pp. 317–320.
14. Yu, C.; Sayed, A.H. A strategy for adjusting combination weights over adaptive networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 4579–4583.
15. Fernandez-Bes, J.; Arenas-García, J.; Sayed, A.H. Adjustment of combination weights over adaptive diffusion networks. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, 4–9 May 2014; pp. 6409–6413.
16. Abdolee, R.; Vakilian, V. An Iterative Scheme for Computing Combination Weights in Diffusion Wireless Networks. IEEE Wireless Commun. Lett. 2017, 6, 510–513.
17. Fernandez-Bes, J.; Azpicueta-Ruiz, L.A.; Arenas-García, J. Distributed estimation in diffusion networks using affine least-squares combiners. Digit. Signal Process. 2015, 36, 1–14.
18. Fernandez-Bes, J.; Arenas-García, J.; Silva, M.T.M. Adaptive Diffusion Schemes for Heterogeneous Networks. IEEE Trans. Signal Process. 2017, 65, 5661–5674.
19. Sayed, A.H. Adaptive Networks. Proc. IEEE 2014, 102, 460–497.
20. Zhao, X.; Sayed, A.H. Asynchronous Adaptation and Learning Over Networks—Part I: Modeling and Stability Analysis. IEEE Trans. Signal Process. 2015, 63, 811–826.
21. Chen, J.; Richard, C.; Bermudez, J.C.M. Nonnegative Least-Mean-Square Algorithm. IEEE Trans. Signal Process. 2011, 59, 5225–5235.
22. Behrens, R.T.; Scharf, L.L. Signal processing applications of oblique projection operators. IEEE Trans. Signal Process. 1994, 42, 1413–1424.
23. Kailath, T.; Sayed, A.H.; Hassibi, B. Linear Estimation; Prentice-Hall: Englewood Cliffs, NJ, USA, 2000.
24. Sayed, A.H. Fundamentals of Adaptive Filtering; Wiley: New York, NY, USA, 2003.
25. Marshall, A.W.; Olkin, I.; Arnold, B.C. Matrix Theory. In Inequalities: Theory of Majorization and Its Applications; Springer: New York, NY, USA, 2011; Chapter 9; pp. 338–347.
Figure 1. Network topology and noise profile: (a) network topology; (b) noise profile.
Figure 2. The MSD learning curves.
Figure 3. The MSD learning curves under different noise levels.
Figure 4. The MSD learning curves under tracking scenarios.
Figure 5. The MSD learning curves.
Figure 6. The topology, regressor power and noise profile: (a) network topology; (b) regressor power; (c) noise profile.
Figure 7. The MSD learning curves for the sparse network.
Table 1. The steady-state MSD (dB) with respect to the variation of the noise power.

Noise τ | Uniform | Classic AC | Proposed AC (λ = 0) | Proposed AC (λ = 0.95)
τ = 0.5 | −49.0 | −50.2 | −51.2 | −53.0
τ = 1 | −46.0 | −48.0 | −49.1 | −50.0
τ = 1.5 | −44.3 | −46.6 | −47.7 | −48.3
τ = 2 | −43.3 | −45.9 | −46.9 | −47.1
τ = 2.5 | −42.1 | −45.1 | −46.1 | −46.1
τ = 3 | −41.2 | −44.3 | −45.2 | −45.3
τ = 3.5 | −40.6 | −43.7 | −44.4 | −44.4
τ = 4 | −40.2 | −43.5 | −44.1 | −44.0
τ = 4.5 | −39.6 | −42.9 | −43.5 | −43.3
τ = 5 | −39.2 | −42.7 | −43.3 | −43.2
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
