Article

An Effective Adaptive Combination Strategy for Distributed Learning Network

Chundong Xu, Qinglin Li and Dongwen Ying

1 School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
2 Institute of Acoustics, Chinese Academy of Sciences, Beijing 100000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(12), 5723; https://doi.org/10.3390/app11125723
Submission received: 9 March 2021 / Revised: 25 May 2021 / Accepted: 31 May 2021 / Published: 20 June 2021
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

In this paper, we develop a modified adaptive combination strategy for the distributed estimation problem over diffusion networks. We consider the online estimation of adaptive combiners from the perspective of minimum variance unbiased estimation. In contrast with the classic adaptive combination strategy, which exploits an orthogonal projection technique, we formulate an unconstrained mean-square deviation (MSD) cost function by introducing Lagrange multipliers. Based on the Karush–Kuhn–Tucker (KKT) conditions, we derive a fixed-point iteration scheme for the adaptive combiners. Illustrative simulations validate the improved transient and steady-state performance of the diffusion least-mean-square (LMS) algorithm incorporated with the proposed adaptive combination strategy.

1. Introduction

It is generally beneficial to exploit diffusion strategies for distributed parameter estimation over adaptive networks [1,2,3,4,5,6]. Specifically, diffusion least-mean-square (LMS)-based methods have already been used in many contexts, such as biological behavior modeling [7,8], distributed detection [9], distributed localization [10], and target tracking and escaping from predators [11], where scalability, robustness, and low power consumption are desirable features [1]. In the diffusion strategy, each node of the network is allowed to receive the intermediate estimates of its neighboring nodes to improve the accuracy of its local estimate. Such cooperation enables each node to leverage the spatial diversity of the noise profile across the entire network. From this point of view, the performance of distributed diffusion methods can be further enhanced by using suitable combination weights (combiners).
Several static combination rules exist [1,12], e.g., the Metropolis, Laplacian, Uniform and Relative-degree rules. However, these static combiners are designed solely from the network topology, so they generally cannot adapt to the spatial variation of signal and noise statistics. To address this problem, many studies resort to adaptive combination (AC) strategies [12,13,14,15,16,17,18], most of which are developed for the adapt-then-combine (ATC) diffusion LMS algorithm [1].
Based on minimum variance unbiased estimation (MVUE), the classic AC strategy [12] outperforms the existing static combiners when applied to diffusion LMS algorithms. An optimal adaptive combination scheme is derived in [13] by adaptively estimating the variances of the measurement noises. Simulation results validate the superior steady-state performance of the diffusion LMS algorithm with the optimal combiners of [13], as compared to previous static and adaptive combiners. Based on the adaptive combination rule in [13], an optimal combination rule that accounts for channel distortion is also proposed in [16]. To achieve both an accelerated convergence rate and good steady-state network performance, combination switching mechanisms [14,15] have been proposed, i.e., a static combination scheme in the converging stage and an AC scheme when approaching the steady state.
In addition, a decoupled adapt-then-combine (D-ATC) algorithm has been proposed, for which a least-squares (LS)-based AC scheme is developed [17,18]; in homogeneous networks, it achieves performance close to that of the ATC algorithm with the classic AC.

Motivation and Contribution

As mentioned above, the classic AC strategy is derived based on MVUE, which is validated to be a feasible criterion [12]. In fact, one of the key techniques in the classic AC strategy [12] is an orthogonal projection, which is exploited to guarantee that the combiners add up to 1. However, the orthogonal projection technique actually limits the update direction of the combiners at each iteration; as discussed in Section 3.2, this restriction can be relaxed to further improve the performance of the diffusion LMS algorithm.
In this paper, we again formulate the online estimation of the adaptive combiners of the ATC algorithm from the perspective of MVUE. Instead of directly exploiting the orthogonal projection technique of [12], we present an unconstrained mean-square deviation (MSD) cost function based on Lagrange multipliers. Using the fixed-point iteration methodology and the KKT necessary conditions, we develop an effective adaptive combination strategy, which relies solely on the previous instantaneous intermediate weight estimates without resorting to knowledge of the measurement data and noises. The proposed AC strategy can be seen as a modified and extended version of the classic AC in [12]. Simulations validate the superior performance of the diffusion LMS algorithm when using the proposed AC strategy.
Notation: ℝ and ℂ denote the fields of real and complex numbers, respectively. Scalars are denoted by lower-case letters, and vectors and matrices by lower- and upper-case boldface letters, respectively. The transpose and conjugate transpose are denoted by (·)^T and (·)^H, respectively. E{·} represents expectation, and ℜ(·) denotes the real part. col{·} stands for the vector obtained by stacking its arguments on top of one another. diag{·} generates a diagonal matrix from the given diagonal arguments. [·]_i stands for the i-th element of a vector, and min{·} denotes the minimum element of a vector. The eigenvalue set of a square matrix F is denoted by {λ(F)}, with λ_max(F) denoting the maximum eigenvalue. The spectral radius of a square matrix F is denoted by ϱ(F) ≜ max{|λ(F)|}.

2. The ATC Diffusion LMS Algorithm

2.1. Model Assumption

Consider a network containing N nodes, which collectively estimate an M-dimensional unknown parameter w^o ∈ ℂ^M. N_k denotes the set of neighbors of node k, including k itself, and its cardinality is n_k. For each node k, at time instant t, the regressor u_k(t) ∈ ℂ^M and the measurement signal d_k(t) ∈ ℂ are available. The signal model is given by
$d_k(t) = w^{oH} u_k(t) + v_k(t),$  (1)
where v_k(t) denotes the additive zero-mean white Gaussian measurement noise at node k, with variance σ_{v,k}^2. For any k and t, v_k(t) is independent of u_k(t), and v_k(i) is independent of v_l(j) for all k ≠ l or i ≠ j.
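For concreteness, the following minimal NumPy sketch generates synthetic data according to a real-valued version of the signal model (1); the horizon, per-node noise variances and random seed are illustrative assumptions rather than the exact setup used later in the simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N, T = 5, 15, 2000                          # illustrative sizes
w_o = np.ones(M) / M                           # unknown parameter vector w^o
sigma_v2 = rng.uniform(0.01, 0.1, size=N)      # hypothetical per-node noise variances

# u[k, t] is the M-dimensional regressor of node k at time t; d[k, t] the measurement.
u = rng.standard_normal((N, T, M))
v = rng.standard_normal((N, T)) * np.sqrt(sigma_v2)[:, None]
d = np.einsum('m,ntm->nt', w_o, u) + v         # d_k(t) = w^o^T u_k(t) + v_k(t), real case
```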

2.2. ATC Algorithm

The main target of a distributed estimation algorithm is to generate an estimate w_k(t) of w^o at each node k and time t in a distributed manner. For the diffusion strategy, each node k first executes a local adaptation step to obtain an intermediate estimate ψ_k(t); then all nodes share their intermediate estimates with their neighbors; finally, each node k linearly combines the intermediate estimates received from its neighbors using a set of combination weights. The detailed steps of the ATC diffusion LMS algorithm are
$\psi_k(t) = w_k(t-1) + \mu_k u_k(t) e_k^*(t),$  (2)
$w_k(t) = \sum_{l \in \mathcal{N}_k} a_{l,k} \psi_l(t),$  (3)
where e_k(t) ≜ d_k(t) − w_k^H(t−1) u_k(t) is the a priori estimation error and μ_k > 0 is the step size for k ∈ {1, 2, …, N}. The combiner a_{l,k} is the weight given to the intermediate estimate from node l during the combination step of node k. Moreover, the non-negative combination matrix A = [a_{l,k}] satisfies [1,12]
$a_{l,k} \geq 0 \ \text{if} \ l \in \mathcal{N}_k, \quad a_{l,k} = 0 \ \text{if} \ l \notin \mathcal{N}_k, \quad a_k^T \mathbf{1} = 1,$  (4)
with a_k denoting the k-th column of A. Notice that A is left-stochastic, since the entries of each column are non-negative and sum to 1 [19,20].
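To make the two-step structure concrete, the following minimal real-valued sketch runs the adaptation step (2) and the combination step (3) with simple uniform combiners a_{l,k} = 1/n_k; the neighbor representation and the function signature are our own assumptions, not code from the paper.

```python
import numpy as np

def atc_lms(d, u, neighbors, mu=0.01):
    """ATC diffusion LMS (real-valued): adaptation step (2), then combination
    step (3) with uniform combiners a_{l,k} = 1/n_k."""
    N, T, M = u.shape
    w = np.zeros((N, M))                # local estimates w_k(t-1)
    psi = np.zeros((N, M))              # intermediate estimates psi_k(t)
    for t in range(T):
        for k in range(N):              # adapt: psi_k(t) = w_k(t-1) + mu * e_k(t) * u_k(t)
            e = d[k, t] - w[k] @ u[k, t]
            psi[k] = w[k] + mu * e * u[k, t]
        for k in range(N):              # combine: w_k(t) = sum_{l in N_k} a_{l,k} psi_l(t)
            w[k] = psi[sorted(neighbors[k])].mean(axis=0)
    return w
```

For instance, a simple ring topology could be described as neighbors = [{(k - 1) % N, k, (k + 1) % N} for k in range(N)].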

3. Adaptive Combination Scheme

3.1. Minimum Variance Unbiased Estimation

Consider the signal model and the ATC diffusion LMS algorithm in Section 2. Assume that, for each node k ∈ {1, …, N}, the intermediate estimate ψ_k(t) in the diffusion LMS algorithm satisfies
$E\{\psi_k(t)\} = w^o, \quad \text{for all } k \in \{1, \ldots, N\}.$  (5)
We define
$\Psi(t) \triangleq [\psi_1(t), \ldots, \psi_N(t)].$
Following [12], we have the minimum variance unbiased estimation problem for each k ∈ {1, …, N},
$\min_{a_k \in \mathbb{R}^N} \ a_k^T \Re(Q_\Psi)\, a_k \quad \text{s.t.} \ \mathbf{1}_N^T a_k = 1; \ a_{l,k} = 0 \ \text{for all} \ l \notin \mathcal{N}_k,$  (6)
where $Q_\Psi \triangleq E\{(\Psi(t) - E\{\Psi(t)\})^H (\Psi(t) - E\{\Psi(t)\})\}$ and 1_N denotes the N × 1 vector with unit entries. Applying the convex combination strategy for all k ∈ {1, …, N}, we also require a_{l,k} ≥ 0.
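As a side remark, if the sparsity and non-negativity constraints are ignored and only the sum-to-one constraint is kept, problem (6) admits the familiar closed-form minimizer a_k = ℜ(Q_Ψ)^{-1} 1_N / (1_N^T ℜ(Q_Ψ)^{-1} 1_N). The sketch below illustrates only this simplified case (assuming ℜ(Q_Ψ) is invertible); it is not the combiner used in this paper, which enforces the full constraint set iteratively in Section 3.2.

```python
import numpy as np

def mvue_sum_to_one(Q_real):
    """Minimizer of a^T Q a subject only to 1^T a = 1 (no non-negativity or
    sparsity constraints); shown for intuition about problem (6)."""
    ones = np.ones(Q_real.shape[0])
    x = np.linalg.solve(Q_real, ones)   # Q^{-1} 1
    return x / (ones @ x)               # normalize so the entries sum to 1
```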

3.2. Fixed-Point Iteration Solution

First, we introduce a transform matrix P_k, defined as $P_k \triangleq [\,l\text{-th column of } I_N\,]_{l \in \mathcal{N}_k}$. Then, a_k in (6) can be expressed as a_k = P_k b_k, with b_k ∈ ℝ^{n_k}. Therefore, the minimization problem (6) can be transformed into
$\min_{b_k} \ J(b_k) \triangleq b_k^T \Re(Q_{\Psi_k})\, b_k \quad \text{s.t.} \ \mathbf{1}_{n_k}^T b_k = 1; \ b_{k,i} \geq 0, \ i = 1, \ldots, n_k,$  (7)
where $Q_{\Psi_k} \triangleq P_k^T Q_\Psi P_k = E\{(\Psi_k(t) - E\{\Psi_k(t)\})^H (\Psi_k(t) - E\{\Psi_k(t)\})\}$ with $\Psi_k(t) \triangleq \Psi(t) P_k$, and b_{k,i} is the i-th element of the vector b_k. By introducing the Lagrange multipliers α and ω, we obtain the cost function
$L(b_k, \omega, \alpha) \triangleq J(b_k) - \omega^T b_k + \alpha\,(\mathbf{1}_{n_k}^T b_k - 1).$  (8)
Taking the gradient of (8) with respect to b_k yields
$\nabla_{b_k} L(b_k, \omega, \alpha) = \nabla_{b_k} J(b_k) - \omega + \mathbf{1}_{n_k} \alpha = \Re(Q_{\Psi_k})\, b_k - \omega + \mathbf{1}_{n_k} \alpha.$  (9)
According to the Karush–Kuhn–Tucker (KKT) conditions [21], the optimal tuple (b_k, ω, α) should obey
$\Re(Q_{\Psi_k})\, b_k - \omega + \mathbf{1}_{n_k} \alpha = 0, \qquad \omega_i\, b_{k,i} = 0,$  (10)
or equivalently,
$\left[\Re(Q_{\Psi_k})\, b_k + \mathbf{1}_{n_k} \alpha\right]_i b_{k,i} = 0.$  (11)
We introduce a positive definite diagonal matrix D_k(t−1), whose i-th diagonal element is an arbitrary positive function of b_k(t−1), denoted by f_i(b_k(t−1)). Obviously, $-D_k(t-1)\, \nabla_{b_k} L(b_k, \omega, \alpha)$ is still a descent direction of the cost function (8) [21]. Therefore, we obtain the adaptive solution of problem (7) through the fixed-point iteration method,
$b_{k,i}(t) = b_{k,i}(t-1) - \eta_{k,i}(t)\, f_i(b_k(t-1))\, b_{k,i}(t-1)\, \left[\Re(Q_{\Psi_k})\, b_k(t-1) + \mathbf{1}_{n_k} \alpha\right]_i,$  (12)
where b_{k,i}(t) is the estimate of b_{k,i} at time instant t and η_{k,i}(t) is the learning factor of b_{k,i}(t).
To simplify the problem, we choose the same learning factor η_{k,i}(t) = η_k(t) for all i at any time t. Hence, we can rewrite (12) in vector form,
$b_k(t) = b_k(t-1) - \eta_k(t)\, \Gamma_k(t-1) \left[\Re(Q_{\Psi_k})\, b_k(t-1) + \mathbf{1}_{n_k} \alpha\right],$  (13)
where $\Gamma_k(t-1) \triangleq \mathrm{diag}\{D_k(t-1)\, b_k(t-1)\}$.
Applying the constraint $\mathbf{1}_{n_k}^T b_k(t) = 1$ and pre-multiplying both sides of (13) by $\mathbf{1}_{n_k}^T$ yields the Lagrange multiplier α,
$\alpha = \dfrac{\mathbf{1}_{n_k}^T \big(b_k(t-1) - \eta_k(t)\, \Gamma_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1)\big) - 1}{\eta_k(t)\, \mathbf{1}_{n_k}^T \Gamma_k(t-1)\, \mathbf{1}_{n_k}}.$  (14)
Substituting α into (13) and using the constraint $\mathbf{1}_{n_k}^T b_k(t-1) = 1$ again, we obtain the update of the combiners b_k(t),
$b_k(t) = b_k(t-1) - \eta_k(t)\, G_k(t-1)\, \Gamma_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1),$  (15)
where $G_k(t-1) \triangleq I_{n_k} - \dfrac{D_k(t-1)\, b_k(t-1)\, \mathbf{1}_{n_k}^T}{\mathbf{1}_{n_k}^T D_k(t-1)\, b_k(t-1)}$, with $I_{n_k}$ denoting the $n_k \times n_k$ identity matrix.
The adaptive combiners (15) can be updated in two incremental steps,
$g_k(t) = \zeta_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1),$  (16)
$b_k(t) = b_k(t-1) - \eta_k(t)\, g_k(t),$  (17)
where
$\zeta_k(t-1) \triangleq G_k(t-1)\, \Gamma_k(t-1).$  (18)
Then, we can obtain the combiner a_k(t),
$a_k(t) = P_k\, b_k(t),$  (19)
which can be used in (3) to update the local weight estimate adaptively. We also define the adaptive combination matrix A(t) ∈ ℝ^{N×N}, whose k-th column is a_k(t).
Please note that g_k(t) in (16) can be seen as the product of ζ_k(t−1) and the gradient ∇_{b_k} J(b_k(t−1)) = ℜ(Q_{Ψ_k}) b_k(t−1), which means that ζ_k(t−1) acts as an auxiliary matrix adjusting the update direction ∇_{b_k} J(b_k(t−1)). On the one hand, G_k(t−1) in (18) is a projection matrix [22], which makes the update direction g_k(t) orthogonal to the vector 1_{n_k}, i.e., 1_{n_k}^T g_k(t) = 0. Then we have 1_{n_k}^T b_k(t) = 1_{n_k}^T b_k(t−1) = ⋯ = 1_{n_k}^T b_k(0) = 1, provided that the initial combiners satisfy 1_{n_k}^T b_k(0) = 1. On the other hand, since ζ_k(t−1) as a whole is a positive semi-definite symmetric matrix, the update −η_k(t) g_k(t) is still along a descent direction of the cost function (7) [21]. Instead of using the positive semi-definite symmetric matrix ζ_k(t−1) = G_k(t−1) Γ_k(t−1), the classic AC [12] replaces ζ_k(t−1) with the orthogonal projection matrix characterized by 1_{n_k}, i.e., I_{n_k} − 1_{n_k} 1_{n_k}^T / n_k, which actually restricts the update direction of the adaptive combiners to lie in the plane spanned by 1_{n_k} and ∇_{b_k} J(b_k(t−1)). In fact, we find that (16) and (17) reduce to the classic AC [12] when D_k(t) = diag{b_k(t)}^{−1}.
We now consider optimizing the learning factor η_k(t). Substituting (16) and (17) into (7) yields a cost function with respect to the learning factor η_k(t),
$h(\eta_k(t)) = g_k^T(t)\, \Re(Q_{\Psi_k})\, g_k(t)\, \eta_k^2(t) - 2\, g_k^T(t)\, \Re(Q_{\Psi_k})\, b_k(t-1)\, \eta_k(t) + b_k^T(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1).$  (20)
Obviously, h(η_k(t)) is a quadratic (convex) function of η_k(t). Thus, its minimum is attained if and only if
$\eta_k^o(t) = \dfrac{g_k^T(t)\, \Re(Q_{\Psi_k})\, b_k(t-1)}{g_k^T(t)\, \Re(Q_{\Psi_k})\, g_k(t)} = \dfrac{b_k^T(t-1)\, \Re(Q_{\Psi_k})\, \zeta_k(t-1)\, \Re(Q_{\Psi_k})\, b_k(t-1)}{g_k^T(t)\, \Re(Q_{\Psi_k})\, g_k(t)}.$  (21)
Note that the optimal learning factor η_k^o(t) is non-negative, since ζ_k(t−1) is a positive semi-definite matrix.
To guarantee that the combiners remain non-negative, we set the upper bound of η_k(t) as [12]
$\eta_k^{\max}(t) = \dfrac{\min\{b_k(t-1)\}}{\|g_k(t)\|_\infty + \varepsilon},$  (22)
where ε > 0 is a small constant and ‖·‖_∞ denotes the maximum norm. Thus, the learning factor in (17) at time instant t is chosen as
$\eta_k(t) = \min\{\eta_k^{\max}(t), \eta_k^o(t)\}.$  (23)
Please note that Q_{Ψ_k} is usually unavailable in practical applications. As done in [12], Q_{Ψ_k} can be replaced by its approximation
$\hat{Q}_{\Psi_k}(t) \triangleq \tfrac{1}{2}\, \Delta\Psi_k^H(t)\, \Delta\Psi_k(t),$  (24)
where $\Delta\Psi_k(t) = \Psi_k(t) - \Psi_k(t-1)$. To make the estimate smoother, we introduce a forgetting factor λ. Then, the iterative expression of $\hat{Q}_{\Psi_k}$ can be written as
$\hat{Q}_{\Psi_k}(t) = \lambda\, \hat{Q}_{\Psi_k}(t-1) + \tfrac{1}{2}\, \Delta\Psi_k^H(t)\, \Delta\Psi_k(t).$  (25)
In practical applications, we use $\hat{Q}_{\Psi_k}(t)$ in (25) to replace the statistical quantity $Q_{\Psi_k}$ for each node k at time instant t.
Finally, the implementation of the ATC algorithm with the proposed AC strategy is summarized in Algorithm 1.
Algorithm 1 ATC with the proposed AC strategy
For each node k, set ψ_k(0) = w_k(0) = 0_M and choose b_k(0) ∈ ℝ^{n_k} such that 1_{n_k}^T b_k(0) = 1.
Given a small positive constant ε and step size μ k , at each time instant t > 0 , compute at each node k:
1. Update the intermediate weight estimate ψ k ( t ) through (2).
2. Update combiner a k ( t ) consecutively through (25), (16), (23), (17) and (19).
3. Update the local weight estimate w k ( t ) through (3).
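The following NumPy sketch is one possible real-valued implementation of Algorithm 1 under the later simulation choices (a common step size μ and D_k(t) = diag{b_k(t)}^γ); the array shapes, the neighbor representation and all variable names are our own assumptions, and the real-part operation is trivial here because the data are real.

```python
import numpy as np

def atc_proposed_ac(d, u, neighbors, mu=0.01, lam=0.95, gamma=0, eps=0.5e-4):
    """Sketch of Algorithm 1 (real-valued): ATC diffusion LMS whose combiners b_k
    follow the fixed-point update (16)-(17), the learning factor rule (21)-(23)
    and the recursive estimate (25) of Q_{Psi_k}."""
    N, T, M = u.shape
    nbrs = [np.array(sorted(neighbors[k])) for k in range(N)]
    w = np.zeros((N, M))
    psi = np.zeros((N, M))
    psi_prev = np.zeros((N, M))
    b = [np.ones(len(nbrs[k])) / len(nbrs[k]) for k in range(N)]     # 1^T b_k(0) = 1
    Qh = [np.zeros((len(nbrs[k]),) * 2) for k in range(N)]           # \hat{Q}_{Psi_k}
    for t in range(T):
        for k in range(N):                        # step 1: adaptation (2)
            e = d[k, t] - w[k] @ u[k, t]
            psi[k] = w[k] + mu * e * u[k, t]
        for k in range(N):                        # steps 2-3: combiner and weight update
            idx = nbrs[k]
            Psi_k = psi[idx].T                    # M x n_k, columns psi_l(t), l in N_k
            dPsi = Psi_k - psi_prev[idx].T
            Qh[k] = lam * Qh[k] + 0.5 * dPsi.T @ dPsi                # (25)
            bk = b[k]
            Db = (bk ** gamma) * bk               # D_k b_k with D_k = diag{b_k}^gamma
            Gk = np.eye(len(bk)) - np.outer(Db, np.ones(len(bk))) / Db.sum()   # G_k(t-1)
            zeta = Gk @ np.diag(Db)               # zeta_k = G_k Gamma_k, cf. (18)
            g = zeta @ Qh[k] @ bk                 # (16)
            denom = g @ Qh[k] @ g
            eta_o = (g @ Qh[k] @ bk) / denom if denom > 0 else 0.0   # (21)
            eta_max = bk.min() / (np.abs(g).max() + eps)             # (22)
            b[k] = bk - min(eta_o, eta_max) * g   # (23) and (17)
            w[k] = Psi_k @ b[k]                   # (19) and (3): w_k(t) = Psi_k(t) b_k(t)
        psi_prev[:] = psi
    return w
```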

4. Mean Convergence

We now analyze the mean convergence of the diffusion LMS algorithm with the proposed adaptive combiners.
We now introduce the following independence assumptions:
Assumption 1
(Independence). All regressors u k ( t ) are spatially and temporally independent ([1], Assumption 1).
Assumption 2.
The combination matrix A ( t ) is independent of all regressors u k ( t ) and all local weight estimates w k ( t 1 ) at time t 1 ([12], Assumption 4.3).
Theorem 1.
Under Assumptions 1 and 2, a sufficient condition to guarantee the mean convergence of the diffusion LMS algorithm is given by
$0 < \mu_k < \dfrac{2}{\varrho(R)},$  (26)
where ϱ(R) denotes the spectral radius of the matrix R.
The proof is given in Appendix A. Please note that the sufficient condition (26) is consistent with ([1], Equation (37)) and ([12], Equation (25)).

5. Simulation Results

We evaluate herein the MSD performance of the proposed algorithm. Without loss of generality, the unknown weight vector w^o is set to 1_M/M with M = 5. The initial weight estimates are w_k(0) = 0_M for each node k. The constant ε used in (22) is set to 0.5 × 10^{−4}, and the forgetting factor is λ = 0 or λ = 0.95. For the proposed AC, we consider D_k(t) = diag{b_k(t)}^γ, where γ can be 0, −1 or −2.
We use the empirical MSD as the performance metric. Both the transient and steady-state empirical network MSDs hereinafter are obtained by averaging over L = 500 independent trials and over all nodes of the network,
$\mathrm{MSD}(t) \triangleq \dfrac{1}{LN} \sum_{k=1}^{N} \sum_{\ell=1}^{L} \big\| \tilde{w}_k^{(\ell)}(t) \big\|_2^2,$
$\mathrm{MSD}(\infty) \triangleq \dfrac{1}{LN} \sum_{k=1}^{N} \sum_{\ell=1}^{L} \big\| \tilde{w}_k^{(\ell)}(\infty) \big\|_2^2,$
where $\tilde{w}_k^{(\ell)}(t) \triangleq w^o - w_k^{(\ell)}(t)$, with $w_k^{(\ell)}(t)$ denoting the transient weight estimate of the ℓ-th trial, and $\|\tilde{w}_k^{(\ell)}(\infty)\|_2^2$ is obtained by averaging $\|\tilde{w}_k^{(\ell)}(t)\|_2^2$ over 100 iterations after convergence.
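Assuming the per-trial estimates are stored in an array of shape (L, N, T, M), the empirical transient network MSD defined above can be computed as in the following sketch (reported in dB, as in the figures); the array layout is our own convention.

```python
import numpy as np

def network_msd_db(w_hist, w_o):
    """Empirical network MSD(t) in dB.
    w_hist: shape (L, N, T, M), holding the estimates w_k^{(l)}(t) of L trials."""
    err = w_hist - w_o                                      # error vectors w^o - w_k^{(l)}(t), up to sign
    msd = np.mean(np.sum(err ** 2, axis=-1), axis=(0, 1))   # average squared norm over trials and nodes
    return 10 * np.log10(msd)
```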
Example 1.
We consider the same network topology with N = 15 nodes as in ([12], Figure 5) and ([17], Figure 6a), illustrated in Figure 1a. The measurement noise is real white Gaussian noise whose variance σ_{v,k}^2 at each node k is presented in Figure 1b. According to the mean convergence condition, we set the step size to μ_k = 0.01 for each node k, and we consider γ = 0 here. Each regressor u_k(t) at each node k is a real white Gaussian sequence with covariance matrix R_{u,k} = σ_{u,k}^2 I_M and σ_{u,k}^2 = 1 for all k. The noise power of nodes 5, 12 and 13 suddenly increases to 5 at the 1500th iteration.
As demonstrated in Figure 2, the proposed AC strategy outperforms the classic AC [12] and the uniform combination [1] in terms of steady-state performance with similar convergence rates, and outperforms the LS-based AC [17] and the optimal AC [13] in terms of convergence rate. We also observe that the forgetting factor can further enhance the performance of the proposed AC. After the sudden change of the noise power of some nodes, the proposed AC exhibits a rather fast reconvergence rate and good steady-state performance.
Example 2.
The initial measurement noise variance σ_{v,k}^2 at each node k is presented in Figure 1b. In this simulation, we compare the proposed AC scheme with the uniform combination and the classic AC in terms of the steady-state MSD at different noise variances τ σ_{v,k}^2 at each node k. The other simulation conditions are the same as in Example 1. The steady-state MSDs with respect to the noise variances are listed in Table 1.
The fairness of this experiment is supported by the closely matched convergence behavior of the transient MSD curves plotted in Figure 3. As shown in Table 1, as the noise variances increase, the ATC algorithm with the proposed AC exhibits superior performance compared to the ATC algorithm with the uniform combination or the classic AC strategy. We can also see that the performance gain brought by the forgetting factor becomes limited once the noise variances increase beyond a certain level.
Example 3.
We now consider the target tracking model ([17], Equation (52)), namely
$w^o(t) = w^o + \theta(t),$
$\theta(t) = 0.99\, \theta(t-1) + \xi(t),$
where ξ(t) is a sequence of independent identically distributed perturbations with zero mean and covariance matrix Ξ, independent of the input regressors and the measurement noise at every iteration. We consider ξ(t) to be white Gaussian with Ξ = σ_ξ^2 I_M and σ_ξ^2 = 1 × 10^{−7}, i.e., the unknown weight vector varies slowly. The other simulation conditions are the same as in Example 1.
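A minimal sketch of this slowly varying target, under the parameter values stated above (σ_ξ^2 = 10^{−7} and w^o = 1_M/M), could look as follows; the horizon and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, T = 5, 3000
sigma_xi2 = 1e-7
w_o = np.ones(M) / M                  # static component of the target
theta = np.zeros(M)
w_o_t = np.empty((T, M))              # w^o(t) = w^o + theta(t)
for t in range(T):
    theta = 0.99 * theta + np.sqrt(sigma_xi2) * rng.standard_normal(M)
    w_o_t[t] = w_o + theta
```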
As illustrated in Figure 4, similar to Example 1, the proposed AC strategy outperforms the uniform combination rule, the classic AC and the LS-based AC in terms of steady-state performance, and outperforms the optimal AC in terms of convergence rate, under tracking scenarios. We also observe that the forgetting factor can further enhance the performance of the proposed AC in the tracking scenario.
Example 4.
We now consider the impact of the factors λ and γ on the performance of the proposed AC strategy. Without loss of generality, we consider step sizes μ_k = 0.02 for each node k. The simulation result is shown in Figure 5.
It can be seen from Figure 5 that, for a given γ, a larger λ brings a higher performance gain for the proposed AC. We also observe that choosing γ = −2 yields better steady-state performance than the other two choices, especially for a small forgetting factor λ. In particular, compared to Example 1, for the two settings (γ = 0, λ = 0) and (γ = 0, λ = 0.95), we find that the larger step sizes in this example lead to an accelerated convergence rate at the cost of degraded steady-state performance.
Example 5.
We consider a sparse network with N = 15 nodes whose topology, the same as ([12], Figure 8), is depicted in Figure 6a. The measurement noise is real white Gaussian noise whose variance σ_{v,k}^2 at each node is presented in Figure 6c. We consider a heterogeneous network with step sizes μ_k = 0.004 for the orange shaded nodes and μ_k = 0.02 for the rest. Each regressor u_k(t) at each node k is a real white Gaussian sequence with covariance matrix R_{u,k} = σ_{u,k}^2 I_M, where σ_{u,k}^2 is presented in Figure 6b for all k.
As illustrated in Figure 7, for the sparse network, the proposed AC scheme outperforms the classic AC and the LS-based AC in terms of steady-state performance while keeping a comparable convergence rate. It also outperforms the optimal AC scheme in terms of convergence rate. Moreover, the introduction of the forgetting factor can further enhance the performance of the proposed AC scheme.

6. Conclusions

In this paper, we present a modified adaptive combination strategy for the distributed estimation problem over diffusion networks to improve robustness against the spatial variation of signal and noise statistics over the network. Considering the Karush–Kuhn–Tucker conditions and fixed-point iteration methodology, we derive an effective adaptive combination strategy for the ATC diffusion LMS algorithm. We also invoke the forgetting factor and optimize the learning factor to further enhance the performance of the proposed adaptive combination strategy. Illustrative simulations validate the improved performance of the diffusion LMS algorithm with the proposed adaptive combination strategy.

Author Contributions

Data curation, Q.L.; Funding acquisition, C.X.; Methodology, Q.L.; Software, C.X.; Supervision, C.X. and D.Y.; Writing—Original draft, Q.L.; Writing—Review & editing, Q.L. and D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was jointly funded by the National Natural Science Foundation of China under Grants 11864016 and 61671442, and by the Social Science Key Research Base Project of Jiangxi Province under Grant JD19042.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.


Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Mean Convergence Analysis

We define the network weight error vector
$\tilde{w}(t) \triangleq \mathrm{col}\{\tilde{w}_1(t), \ldots, \tilde{w}_N(t)\} \in \mathbb{C}^{MN},$  (A1)
where $\tilde{w}_k(t) \triangleq w^o - w_k(t)$, k = 1, 2, …, N. We define the global regressor,
$u(t) \triangleq \mathrm{col}\{u_1(t), \ldots, u_N(t)\},$  (A2)
and the global covariance matrix,
$R \triangleq \mathrm{diag}\{R_1, \ldots, R_N\},$  (A3)
where the covariance matrix $R_k \triangleq E\{u_k(t)\, u_k^H(t)\}$, k = 1, 2, …, N.
We also introduce the following diagonal matrices,
$V(t) \triangleq \mathrm{diag}\{v_1(t), \ldots, v_N(t)\} \otimes I_M,$  (A4)
$D \triangleq \mathrm{diag}\{\mu_1, \ldots, \mu_N\} \otimes I_M,$  (A5)
and the extended combination matrix
$\mathcal{A}(t) \triangleq A(t) \otimes I_M.$  (A6)
Following [1,12], we can obtain the recursion of the network weight error,
$\tilde{w}(t) = \hat{F}(t)\, \tilde{w}(t-1) - \mathcal{A}^T(t)\, D\, z(t),$  (A7)
where $\hat{F}(t) \triangleq \mathcal{A}^T(t) \left(I_{MN} - D\, \hat{R}(t)\right)$ with $\hat{R}(t) \triangleq \mathrm{diag}\{\hat{R}_1(t), \ldots, \hat{R}_N(t)\}$ denoting the instantaneous estimate of the global covariance matrix R, $\hat{R}_k(t) \triangleq u_k(t)\, u_k^H(t)$, and $z(t) \triangleq V^*(t)\, u(t)$.
Obviously, we have $E\{z(t)\} = 0$. According to Assumptions 1 and 2, taking the mathematical expectation of (A7) yields
$E\{\tilde{w}(t)\} = F(t)\, E\{\tilde{w}(t-1)\},$  (A8)
where
$F(t) \triangleq E\{\hat{F}(t)\} = E\{\mathcal{A}^T(t)\}\,(I_{MN} - D R).$  (A9)
Equation (A8) can be further expressed as
$E\{\tilde{w}(t)\} = \mathcal{F}(t)\, E\{\tilde{w}(0)\},$  (A10)
where $\mathcal{F}(t) \triangleq \prod_{i=t}^{1} F(i) = F(t)\, F(t-1) \cdots F(1)$. To facilitate our analysis, we introduce a submultiplicative matrix norm (a submultiplicative matrix norm satisfies $\|AB\| \leq \|A\|\,\|B\|$ [23]). For any square matrix X and any ϵ > 0, there exists a submultiplicative matrix norm ‖·‖_ϱ such that ϱ(X) ≤ ‖X‖_ϱ ≤ ϱ(X) + ϵ, where ϱ(X) denotes the spectral radius of X [23,24]. Accordingly, we have $\|\mathcal{F}(t)\|_\varrho \leq \prod_{i=t}^{1} \|F(i)\|_\varrho$. Notice that the diffusion LMS algorithm converges in the mean if and only if $\lim_{t \to \infty} \|\mathcal{F}(t)\|_\varrho = 0$. Hence, a sufficient condition for the diffusion LMS algorithm to converge is $\|F(t)\|_\varrho \leq \varrho(F(t)) + \epsilon < 1$. Therefore, the diffusion LMS algorithm converges if ϱ(F(t)) < 1 for all t with a sufficiently small ϵ chosen. Thus, the diffusion LMS algorithm converges if F(t) in (A8) is stable at each t.
E{A(t)} is left-stochastic, since each of its columns sums to 1. Thus, according to ([1], Appendix I), F(t) in (A9) is stable if and only if $I_{MN} - DR$ is stable, i.e., $\max\{|1 - \lambda(DR)|\} < 1$. Notice that DR is positive semi-definite Hermitian, since R is block diagonal and D is positive definite diagonal. In light of [25], we have $\lambda_{\max}(DR) \leq \lambda_{\max}(D)\, \lambda_{\max}(R) = \mu_{\max}\, \lambda_{\max}(R)$, where μ_max is the maximum step size used in the network. Thus, $\max\{|1 - \lambda(DR)|\} < 1$ holds if and only if $\mu_{\max}\, \lambda_{\max}(R) < 2$, i.e., $0 < \mu_k < 2/\lambda_{\max}(R)$ for each node k.
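As a quick numerical sanity check of this argument (not part of the paper), one can draw a random left-stochastic combination matrix, random per-node covariances and step sizes satisfying μ_k < 2/λ_max(R), and verify that the resulting F(t) has spectral radius below 1; all sizes and distributions below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 6
R_blocks = []
for k in range(N):
    X = rng.standard_normal((M, 10 * M))
    R_blocks.append(X @ X.T / (10 * M))              # per-node covariance R_k
R = np.zeros((M * N, M * N))
for k, Rk in enumerate(R_blocks):
    R[k * M:(k + 1) * M, k * M:(k + 1) * M] = Rk     # block-diagonal global covariance

lam_max = max(np.linalg.eigvalsh(Rk).max() for Rk in R_blocks)
mu = rng.uniform(0.1, 1.9, size=N) / lam_max         # step sizes satisfying (26)
D = np.kron(np.diag(mu), np.eye(M))

A = rng.random((N, N))
A /= A.sum(axis=0, keepdims=True)                    # left-stochastic: columns sum to 1
F = np.kron(A.T, np.eye(M)) @ (np.eye(M * N) - D @ R)
print(np.max(np.abs(np.linalg.eigvals(F))))          # spectral radius, expected below 1
```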

References

1. Cattivelli, F.S.; Sayed, A.H. Diffusion LMS Strategies for Distributed Estimation. IEEE Trans. Signal Process. 2010, 58, 1035–1048.
2. Lee, H.S.; Kim, S.E.; Lee, J.W.; Song, W.J. A Variable Step-Size Diffusion LMS Algorithm for Distributed Estimation. IEEE Trans. Signal Process. 2015, 63, 1808–1820.
3. Ahn, D.C.; Lee, J.W.; Shin, S.J.; Song, W.J. A new robust variable weighting coefficients diffusion LMS algorithm. Signal Process. 2017, 131, 300–306.
4. Huang, W.; Li, L.; Li, Q. Diffusion Robust Variable Step-Size LMS Algorithm Over Distributed Networks. IEEE Access 2018, 6, 47511–47520.
5. Ashkezari-Toussi, S.; Sadoghi-Yazdi, H. Robust diffusion LMS over adaptive networks. Signal Process. 2019, 158, 201–209.
6. Nassif, R.; Vlaski, S.; Sayed, A.H. Adaptation and Learning Over Networks Under Subspace Constraints—Part I: Stability Analysis. IEEE Trans. Signal Process. 2020, 68, 1346–1360.
7. Tu, S.Y.; Sayed, A.H. Mobile adaptive networks with self-organization abilities. In Proceedings of the 2010 7th International Symposium on Wireless Communication Systems, York, UK, 19–22 September 2010; pp. 379–383.
8. Cattivelli, F.S.; Sayed, A.H. Modeling Bird Flight Formations Using Diffusion Adaptation. IEEE Trans. Signal Process. 2011, 59, 2038–2051.
9. Cattivelli, F.S.; Sayed, A.H. Distributed detection over adaptive networks using diffusion adaptation. IEEE Trans. Signal Process. 2011, 59, 1917–1932.
10. Chen, J.; Sayed, A.H. Diffusion Adaptation Strategies for Distributed Optimization and Learning Over Networks. IEEE Trans. Signal Process. 2012, 60, 4289–4305.
11. Tu, S.; Sayed, A.H. Mobile Adaptive Networks. IEEE J. Sel. Top. Signal Process. 2011, 5, 649–664.
12. Takahashi, N.; Yamada, I.; Sayed, A.H. Diffusion Least-Mean Squares with adaptive combiners: Formulation and performance analysis. IEEE Trans. Signal Process. 2010, 58, 4795–4810.
13. Tu, S.; Sayed, A.H. Optimal combination rules for adaptation and learning over networks. In Proceedings of the 2011 4th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, San Juan, Puerto Rico, 13–16 December 2011; pp. 317–320.
14. Yu, C.; Sayed, A.H. A strategy for adjusting combination weights over adaptive networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 4579–4583.
15. Fernandez-Bes, J.; Arenas-García, J.; Sayed, A.H. Adjustment of combination weights over adaptive diffusion networks. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, 4–9 May 2014; pp. 6409–6413.
16. Abdolee, R.; Vakilian, V. An Iterative Scheme for Computing Combination Weights in Diffusion Wireless Networks. IEEE Wireless Commun. Lett. 2017, 6, 510–513.
17. Fernandez-Bes, J.; Azpicueta-Ruiz, L.A.; Arenas-García, J. Distributed estimation in diffusion networks using affine least-squares combiners. Digit. Signal Process. 2015, 36, 1–14.
18. Fernandez-Bes, J.; Arenas-García, J.; Silva, M.T.M. Adaptive Diffusion Schemes for Heterogeneous Networks. IEEE Trans. Signal Process. 2017, 65, 5661–5674.
19. Sayed, A.H. Adaptive Networks. Proc. IEEE 2014, 102, 460–497.
20. Zhao, X.; Sayed, A.H. Asynchronous Adaptation and Learning Over Networks—Part I: Modeling and Stability Analysis. IEEE Trans. Signal Process. 2015, 63, 811–826.
21. Chen, J.; Richard, C.; Bermudez, J.C.M. Nonnegative Least-Mean-Square Algorithm. IEEE Trans. Signal Process. 2011, 59, 5225–5235.
22. Behrens, R.T.; Scharf, L.L. Signal processing applications of oblique projection operators. IEEE Trans. Signal Process. 1994, 42, 1413–1424.
23. Kailath, T.; Sayed, A.H.; Hassibi, B. Linear Estimation; Prentice-Hall: Englewood Cliffs, NJ, USA, 2000.
24. Sayed, A.H. Fundamentals of Adaptive Filtering; Wiley: New York, NY, USA, 2003.
25. Marshall, A.W.; Olkin, I.; Arnold, B.C. Matrix Theory. In Inequalities: Theory of Majorization and Its Applications; Springer: New York, NY, USA, 2011; Chapter 9; pp. 338–347.
Figure 1. Network topology and noise profile: (a) network topology; (b) noise profile.
Figure 2. The MSD learning curves.
Figure 3. The MSD learning curves under different noise levels.
Figure 4. The MSD learning curves under tracking scenarios.
Figure 5. The MSD learning curves.
Figure 6. The topology, regressor power and noise profile: (a) network topology; (b) regressor power; (c) noise profile.
Figure 7. The MSD learning curves for the sparse network.
Table 1. The steady-state MSD (dB) with respect to the variation of the noise power.

Noise τ | Uniform | Classic AC | Proposed AC (λ = 0) | Proposed AC (λ = 0.95)
τ = 0.5 | −49.0 | −50.2 | −51.2 | −53.0
τ = 1 | −46.0 | −48.0 | −49.1 | −50.0
τ = 1.5 | −44.3 | −46.6 | −47.7 | −48.3
τ = 2 | −43.3 | −45.9 | −46.9 | −47.1
τ = 2.5 | −42.1 | −45.1 | −46.1 | −46.1
τ = 3 | −41.2 | −44.3 | −45.2 | −45.3
τ = 3.5 | −40.6 | −43.7 | −44.4 | −44.4
τ = 4 | −40.2 | −43.5 | −44.1 | −44.0
τ = 4.5 | −39.6 | −42.9 | −43.5 | −43.3
τ = 5 | −39.2 | −42.7 | −43.3 | −43.2
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
