Article

Optimization of MIMO Systems Capacity Using Large Random Matrix Methods

by Florian Dupuy 1,* and Philippe Loubaton 2,*

1 Thales Communications & Security, 4 av. des Louvresses, 92622 Gennevilliers, France
2 Université Paris-Est/Marne la Vallée, LIGM, UMR CNRS 8049, 5 Bd. Descartes, Champs/Marne, 77454 Marne la Vallée Cedex 2, France
* Authors to whom correspondence should be addressed.
Entropy 2012, 14(11), 2122-2142; https://doi.org/10.3390/e14112122
Submission received: 12 September 2012 / Revised: 19 October 2012 / Accepted: 24 October 2012 / Published: 1 November 2012
(This article belongs to the Special Issue Information Theory Applied to Communications and Networking)

Abstract:
This paper provides a comprehensive introduction to large random matrix methods for the optimization of the input covariance matrix maximizing the mutual information of MIMO systems. It is first recalled informally how large system approximations of the mutual information can be derived. Then, the optimization of the approximations is discussed, and important methodological points that are not necessarily covered by the existing literature are addressed, including the strict concavity of the approximation, the structure of the argument of its maximum, the accuracy of the large system approach with regard to the number of antennas, and the justification of iterative water-filling optimization algorithms. While the existing papers have developed methods adapted to a specific model, this contribution tries to provide a unified view of the large system approximation approach.


1. Introduction

It is now recognized that multiple-input multiple-output (MIMO) systems have the potential to increase the capacity of wireless digital communication systems. If t and r denote the numbers of transmit and receive antennas, the channel capacity gain can in fact be of the order of min(t, r) under certain (often rather unrealistic) assumptions [1]. Although the use of MIMO systems is becoming standardized, a number of related theoretical issues remain to be solved. This paper addresses the design of input covariance matrices chosen to maximize the (Gaussian) mutual information of the system.
This problem was extensively addressed in the past under various assumptions on the channel state information. In this paper, we assume that the channel is known at the receiver side, but that only its probability distribution, which is assumed complex Gaussian, is available at the transmitter side. This context is recognized to be realistic in wireless communications, as the channels encountered are mostly fast fading whereas their statistics vary much more slowly. In this context of fast fading channels, we need to consider the expectation of the mutual information w.r.t. the channel probability distribution instead of the mutual information itself. The optimization of this average mutual information is of course more difficult than the optimization of the mutual information itself, because the analytical expressions of the cost function and of its first- and second-order derivatives are extremely complicated. The optimal input covariance matrix thus cannot be expressed in closed form, and one has to resort to descent algorithms in which the gradient and the Hessian of the cost function have to be evaluated using computationally intensive Monte-Carlo simulations (see e.g., [2] in the context of Rician channels). We note that for certain channel distributions (Rayleigh bi-correlated and uncorrelated Rician channels), the singular vectors of the optimal precoder have closed form expressions, which, of course, simplifies the optimization algorithms (see e.g., [3,4,5]).
Recently, it was proposed to use large random matrix tools to approximate the average mutual information by a large system approximation, and to optimize the approximation versus the input covariance matrix. This optimization problem is easier because the approximation can very often be nearly expressed in closed form, so that its maximization does not need Monte-Carlo simulations. Large system approximations of the average mutual information were derived in the past using the replica method (see e.g., [6] for bi-correlated MIMO Rayleigh channels, [7] for frequency selective Rayleigh channels, [8] for bi-correlated MIMO Rician channels, [9] for jointly correlated Rician channels) or rigorous random matrix methods ([10] for bi-correlated MIMO Rayleigh channels, [11] for bi-correlated MIMO Rician channels, [12] for frequency selective Rayleigh channels, [13] for multiple access Rayleigh MIMO channels). Optimization algorithms for the large system approximation were first proposed in [14] in the context of bi-correlated MIMO Rayleigh channels. In [11,12] the authors addressed in more detail the Rician bi-correlated channels and the frequency selective Rayleigh channels respectively, while [15] considered the Rician channel with interference. In [9] the authors considered the case of the jointly correlated Rician channel. Finally, in the context of multiple access Rayleigh MIMO channels, the authors of [13] studied the optimization of the multiple users input covariance matrices of the related large system approximation. In [16] the authors generalized this analysis to the context of multiple access jointly correlated Rician MIMO channels. Interestingly, the mutual information provided by the optimization of the approximation appears numerically to be quite close to the true optimum mutual information, even for realistic values of r and t. These observations were confirmed by the theoretical results of [11,12], which showed that the relative error is a O(1/t²) term.
In this paper, we give a comprehensive introduction to the optimization of large system approximations of the average mutual information of MIMO channels. We note that the main results of this paper were already presented in [11,12]. However, the focus of [11,12] is on the detailed derivation of the large system approximations, while the present paper concentrates on their optimization. In particular, we emphasize methodological issues that are often addressed only partially, if at all. Another benefit of this paper is to provide, as much as possible, a unified view of the optimization of the large system approximation in the context of various kinds of MIMO channels, whereas the above referenced papers develop tools that seem specific to each particular model. Generally speaking, the writing style of this paper is rather informal, and hopefully easier to follow. The interested reader may refer to [11,12] for a rigorous presentation.

2. General Principles

2.1. Optimization of the Average Mutual Information

We consider a flat fading MIMO system equipped with t transmit antennas and r receive antennas. In this context, the received observation at time n is an r-variate signal y(n) given by:

$$\mathbf{y}(n)=\mathbf{H}\,\mathbf{x}(n)+\mathbf{v}(n) \qquad (1)$$

where x(n) represents the t-dimensional signal sent at time n at the transmitter side, and where H is the r × t channel matrix. (v(n))_{n∈ℤ} is a complex white Gaussian noise such that E[v(n)v(n)^H] = σ²I_r.
Channel matrix H is assumed to be a complex Gaussian random matrix (matrix H is said to be complex Gaussian if the rt-dimensional vector h = vec(H) is complex Gaussian, i.e., if the probability distribution of (Re(h)^T, Im(h)^T)^T is Gaussian and if E[(h − E(h))(h − E(h))^T] = 0) whose current value is known at the receiver side and whose probability distribution is known at the transmitter side. If Q denotes the covariance matrix of x(n), normalized in such a way that (1/t)Tr(Q) = 1, the (Gaussian) average mutual information of model (1) is given by [1]:

$$I(\mathbf{Q})=\mathbb{E}_{\mathbf{H}}\Big[\log\det\Big(\mathbf{I}_r+\frac{\mathbf{H}\,\mathbf{Q}\,\mathbf{H}^H}{\sigma^2}\Big)\Big] \qquad (2)$$

where E_H represents the mathematical expectation operator w.r.t. the distribution of matrix H.
In the following, we denote by C the cone of all non-negative (positive semi-definite) matrices Q, and by C₁ the set of all matrices Q ∈ C for which (1/t)Tr(Q) = 1. The ergodic capacity of the channel model (1) is defined as the maximum of I(Q) over the set C₁, and is known to represent the maximum rate that can be transmitted reliably when H is known at the receiver and when its probability distribution is known at the transmitter side. In particular, using as input covariance matrix the argmax of I(Q) allows one to transmit reliably at the largest possible rate. This explains why the maximization of I(Q) over the set C₁ is important.
The function Q ↦ I(Q) is concave and, under some additional assumptions, strictly concave on the convex set C₁. Therefore ([17]), the maximization of I(Q) over C₁ has a unique argmax, denoted Q* in the following. In general, Q* has no closed form expression. Its evaluation therefore requires descent algorithms which, due to the concavity of I and the convexity of C₁, are convergent. In order to implement such algorithms, it is necessary to be able to evaluate I and its first and second derivatives at each point. As the analytical expressions of these functions are in general extremely complicated, the simplest solution consists in using Monte-Carlo evaluations. This, of course, leads to computationally intensive algorithms.
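To make the cost of this direct approach concrete, the following sketch estimates I(Q) by Monte-Carlo simulation with numpy. The i.i.d. Rayleigh channel sampler, the trial count and the function names are illustrative assumptions, not taken from the paper; any sampler matching the assumed distribution of H could be substituted.

```python
import numpy as np

def mc_mutual_info(Q, sigma2, r, t, n_trials=10_000, seed=0):
    """Monte-Carlo estimate of I(Q) = E_H[ log det(I_r + H Q H^H / sigma^2) ].

    H is drawn here with i.i.d. CN(0, 1/t) entries (illustrative choice).
    """
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(n_trials):
        H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) \
            / np.sqrt(2 * t)
        M = np.eye(r) + H @ Q @ H.conj().T / sigma2
        acc += np.linalg.slogdet(M)[1]  # log|det M|, M is Hermitian positive
    return acc / n_trials

# Uniform power allocation Q = I at 10 dB SNR (sigma2 = 0.1)
print(mc_mutual_info(np.eye(4), sigma2=0.1, r=4, t=4))
```

Each evaluation of I, and a fortiori of its gradient and Hessian, requires such an average over many channel draws, which is precisely what makes the direct descent algorithms computationally intensive.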

2.2. Large System Approximation of I(Q)

2.2.1. The Case Q = I

In this paragraph, we first introduce informally how it is possible to derive a large system approximation of the mutual information I(I) corresponding to the case where the covariance matrix of the transmitted signal is Q = I. In order to simplify the notations, we denote I(I) by I.
Background on the behaviour of the eigenvalue distribution of H H^H. Let H_{r,t} be an r × t complex Gaussian random matrix representing the channel matrix of a MIMO system (we indicate that H is an r × t matrix by denoting H by H_{r,t}), and consider its associated Gram matrix H_{r,t}H_{r,t}^H. Under certain assumptions on the probability distribution of H_{r,t} (see the examples below), the empirical eigenvalue distribution μ̂_{r,t} of H_{r,t}H_{r,t}^H, defined as μ̂_{r,t} = (1/r) Σ_{i=1}^r δ_{λ_i(H_{r,t}H_{r,t}^H)}, tends to have a deterministic behaviour when the dimensions r and t converge to +∞ in such a way that t/r → c, where 0 < c < +∞. In order to shorten the notations, t → +∞ stands in the following for "r and t converge to +∞ in such a way that t/r → c". This is equivalent to saying that there exist deterministic probability measures μ_{r,t} (which depend on the dimensions r and t) for which
$$\frac{1}{r}\sum_{i=1}^{r}\phi\big(\lambda_i(\mathbf{H}_{r,t}\mathbf{H}_{r,t}^H)\big)-\int\phi(\lambda)\,d\mu_{r,t}(\lambda)\ \longrightarrow\ 0 \qquad (3)$$
almost surely when t → +∞ for each well chosen test function φ. Sometimes, the measures μ_{r,t} also converge towards a probability distribution μ*, which does not depend on the values of (r, t) but only on c = lim t/r. In this case, the eigenvalue distribution converges towards μ*, which is called the limit eigenvalue distribution of H_{r,t}H_{r,t}^H. In order to illustrate the above phenomenon, we consider the simplest situation in which the entries of H_{r,t} are independent identically distributed (i.i.d.) zero mean random variables of variance 1/t, and where c ≥ 1. In this case, the measure μ_{r,t} is absolutely continuous, and its density p_{r,t}(λ) is given by:
$$p_{r,t}(\lambda)=\frac{t}{2\pi r\lambda}\sqrt{\Big[\big(1+\sqrt{r/t}\big)^2-\lambda\Big]\Big[\lambda-\big(1-\sqrt{r/t}\big)^2\Big]}\ \ \text{if }\lambda\in\Big[\big(1-\sqrt{r/t}\big)^2,\ \big(1+\sqrt{r/t}\big)^2\Big],\qquad p_{r,t}(\lambda)=0\ \text{otherwise}$$
The corresponding distribution is called the Marčenko–Pastur distribution, and was introduced for the first time in [18].
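This convergence is easy to observe numerically. The short sketch below (dimensions, bin count and seed are arbitrary choices of ours) compares the empirical eigenvalue histogram of HH^H with the Marčenko–Pastur density recalled above.

```python
import numpy as np

r, t = 200, 400                      # r <= t, so the density has no atom at 0
rng = np.random.default_rng(0)
H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2 * t)
eigs = np.linalg.eigvalsh(H @ H.conj().T)

y = r / t                            # inverse of the ratio c = t / r
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
hist, edges = np.histogram(eigs, bins=40, range=(a, b), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
density = np.sqrt((b - centers) * (centers - a)) / (2 * np.pi * y * centers)
print(np.max(np.abs(hist - density)))   # small, and shrinks as r, t grow
```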
In order to simplify the notations, we no longer mention the dimensions r and t in the following, and denote H_{r,t}, μ̂_{r,t}, μ_{r,t} by H, μ̂, μ respectively. In order to establish that μ̂ has a deterministic behaviour when t → +∞, it is well known that it is sufficient to establish that (3) holds for φ(λ) = 1/(λ + σ²) for each σ² > 0 (see e.g., [19,20]). In order to reformulate (3) for φ(λ) = 1/(λ + σ²), we introduce the Stieltjes transform s_ν of a positive measure ν carried by ℝ⁺, which is the function defined for each z ∈ ℂ ∖ ℝ⁻ as:
$$s_\nu(z)=\int_{\mathbb{R}^+}\frac{1}{\lambda+z}\,d\nu(\lambda)$$
Equation (3) for φ(λ) = 1/(λ + σ²) for each σ² > 0 is thus equivalent to:

$$\lim_{t\to+\infty}\ s_{\hat\mu}(\sigma^2)-s_{\mu}(\sigma^2)=0 \qquad (5)$$

for each σ² > 0, because

$$s_{\hat\mu}(\sigma^2)=\frac{1}{r}\sum_{i=1}^{r}\frac{1}{\lambda_i(\mathbf{H}\mathbf{H}^H)+\sigma^2}$$

and

$$s_{\mu}(\sigma^2)=\int_{\mathbb{R}^+}\frac{1}{\lambda+\sigma^2}\,d\mu(\lambda)$$
In order to prove (5), a possible approach is to establish that s_μ̂(σ²) has the same asymptotic behaviour as a deterministic quantity which appears to coincide with the value at point σ² of the Stieltjes transform of a certain probability measure μ carried by ℝ⁺. For this, it is useful to introduce the resolvent of matrix HH^H, defined as the r × r matrix-valued function S given for each σ² > 0 by:

$$\mathbf{S}(\sigma^2)=\big(\mathbf{H}\mathbf{H}^H+\sigma^2\mathbf{I}_r\big)^{-1}$$

It is important to notice that

$$s_{\hat\mu}(\sigma^2)=\frac{1}{r}\mathrm{Tr}\,\mathbf{S}(\sigma^2)$$
Under suitable hypotheses on H, the random entries of S(σ²) turn out to have the same behaviour as the entries of a deterministic r × r matrix T(σ²), which, in particular, implies that

$$\frac{1}{r}\mathrm{Tr}\,\mathbf{S}(\sigma^2)-\frac{1}{r}\mathrm{Tr}\,\mathbf{T}(\sigma^2)\ \longrightarrow\ 0$$

The entries of matrix T(σ²) can be expressed in closed form in terms of the (unique) positive solutions (δ_j(σ²))_{j=1,…,n}, (δ̃_i(σ²))_{i=1,…,m} of a nonlinear system of n + m equations depending on r, t, σ² and the statistical properties of H. This nonlinear system of equations is called in [21] the canonical equations associated with the random matrix model H. The number of unknowns m + n depends on r, t and the statistics of H. Then, it is shown that the function σ² ↦ (1/r)Tr T(σ²) coincides with the Stieltjes transform of a certain probability measure μ, which, in turn, establishes that (5) holds for each σ². It is also useful to notice that the entries of the matrix S̃(σ²) defined as

$$\tilde{\mathbf{S}}(\sigma^2)=\big(\mathbf{H}^H\mathbf{H}+\sigma^2\mathbf{I}_t\big)^{-1}$$

have the same behaviour as the entries of a t × t deterministic matrix T̃(σ²). These entries can be evaluated in terms of the solutions of the above-mentioned canonical equations.
We also mention that it is often possible to evaluate the accuracy of the approximations. In the context of a number of complex Gaussian random matrix models, it holds that

$$\mathbb{E}\Big|\frac{1}{r}\mathrm{Tr}\,\mathbf{S}(\sigma^2)-\mathbb{E}\Big[\frac{1}{r}\mathrm{Tr}\,\mathbf{S}(\sigma^2)\Big]\Big|^2=O\Big(\frac{1}{t^2}\Big)$$

$$\Big|\mathbb{E}\Big[\frac{1}{r}\mathrm{Tr}\,\mathbf{S}(\sigma^2)\Big]-\frac{1}{r}\mathrm{Tr}\,\mathbf{T}(\sigma^2)\Big|=O\Big(\frac{1}{t^2}\Big) \qquad (10)$$
Large system approximation of I. We finally consider the problem of approximating I = E[log det(I_r + (1/σ²) H H^H)]. We first note that

$$\frac{1}{r}\log\det\Big(\mathbf{I}_r+\frac{1}{\sigma^2}\mathbf{H}\mathbf{H}^H\Big)=\frac{1}{r}\sum_{i=1}^{r}\log\big(1+\lambda_i(\mathbf{H}\mathbf{H}^H)/\sigma^2\big)$$

and thus coincides with a functional of the eigenvalues of HH^H. It can therefore be approximated by a deterministic term that can very often be expressed in an explicit way. In order to explain this, we use the following well-known identity:

$$\frac{1}{r}\log\det\Big(\mathbf{I}+\frac{1}{\sigma^2}\mathbf{H}\mathbf{H}^H\Big)=\int_{\sigma^2}^{+\infty}\Big(\frac{1}{\omega^2}-\frac{1}{r}\mathrm{Tr}\,\mathbf{S}(\omega^2)\Big)\,d\omega^2$$

As (1/r)Tr S(ω²) can be approximated by (1/r)Tr T(ω²), it can be shown that

$$\frac{1}{r}\log\det\Big(\mathbf{I}+\frac{1}{\sigma^2}\mathbf{H}\mathbf{H}^H\Big)-\int_{\sigma^2}^{+\infty}\Big(\frac{1}{\omega^2}-\frac{1}{r}\mathrm{Tr}\,\mathbf{T}(\omega^2)\Big)\,d\omega^2\ \longrightarrow\ 0 \qquad (12)$$

But, in the context of the various usual MIMO models, it turns out that

$$\int_{\sigma^2}^{+\infty}\Big(\frac{1}{\omega^2}-\frac{1}{r}\mathrm{Tr}\,\mathbf{T}(\omega^2)\Big)\,d\omega^2=\frac{1}{r}\Big[\log\det\big(\sigma^2\mathbf{T}(\sigma^2)\big)^{-1}+\log\det\big(\sigma^2\tilde{\mathbf{T}}(\sigma^2)\big)^{-1}-g\big(\sigma^2,\boldsymbol{\delta}(\sigma^2),\tilde{\boldsymbol{\delta}}(\sigma^2)\big)\Big] \qquad (13)$$

where δ(σ²) = (δ₁(σ²), …, δ_n(σ²))^T and δ̃(σ²) = (δ̃₁(σ²), …, δ̃_m(σ²))^T represent the solutions of the nonlinear system of m + n equations governing the entries of T(σ²), and where g is a well chosen function that is expressed in closed form. Taking the mathematical expectation of (12), and using that the error between E[(1/r)Tr S(ω²)] and (1/r)Tr T(ω²) is a O(1/t²) term for each ω² (see Equation (10)), it can be shown that

$$\frac{I}{r}=\mathbb{E}\Big[\frac{1}{r}\log\det\Big(\mathbf{I}+\frac{1}{\sigma^2}\mathbf{H}\mathbf{H}^H\Big)\Big]=\frac{1}{r}\Big[\log\det\big(\sigma^2\mathbf{T}(\sigma^2)\big)^{-1}+\log\det\big(\sigma^2\tilde{\mathbf{T}}(\sigma^2)\big)^{-1}-g\big(\sigma^2,\boldsymbol{\delta}(\sigma^2),\tilde{\boldsymbol{\delta}}(\sigma^2)\big)\Big]+O\Big(\frac{1}{t^2}\Big) \qquad (14)$$

or equivalently

$$I=\bar{I}+O\Big(\frac{1}{t}\Big)$$

where the large system approximation Ī of I is defined by:

$$\bar{I}=\log\det\big(\sigma^2\mathbf{T}(\sigma^2)\big)^{-1}+\log\det\big(\sigma^2\tilde{\mathbf{T}}(\sigma^2)\big)^{-1}-g\big(\sigma^2,\boldsymbol{\delta}(\sigma^2),\tilde{\boldsymbol{\delta}}(\sigma^2)\big)$$

As I and Ī scale linearly with t when t increases, (14) implies that the relative error between I and its large system approximation is a O(1/t²) term, and thus decreases rather fast towards 0 when t increases. This partially explains why large system approximations of the mutual information are very accurate even for moderate numbers of transmit and receive antennas.
Examples 
In order to illustrate these results, we provide the expressions of T, T̃ and Ī in the context of useful random matrix models. We first consider the case where H can be written as:

$$\mathbf{H}=\frac{1}{\sqrt{t}}\,\mathbf{C}^{1/2}\,\mathbf{X}\,\tilde{\mathbf{C}}^{1/2} \qquad (15)$$

where the entries of X are complex Gaussian i.i.d. random variables with zero mean and variance 1, and where C and C̃ are r × r and t × t deterministic positive matrices such that

$$\sup_r\|\mathbf{C}\|<+\infty,\qquad\sup_t\|\tilde{\mathbf{C}}\|<+\infty \qquad (16)$$

Then, it can be shown that the nonlinear system of equations:

$$\kappa=\frac{1}{t}\mathrm{Tr}\Big[\mathbf{C}\,\big(\sigma^2(\mathbf{I}_r+\tilde\kappa\,\mathbf{C})\big)^{-1}\Big] \qquad (17)$$

$$\tilde\kappa=\frac{1}{t}\mathrm{Tr}\Big[\tilde{\mathbf{C}}\,\big(\sigma^2(\mathbf{I}_t+\kappa\,\tilde{\mathbf{C}})\big)^{-1}\Big] \qquad (18)$$

has a unique pair of positive solutions, denoted (δ(σ²), δ̃(σ²)). The matrices T(σ²) and T̃(σ²) introduced above are given by:

$$\mathbf{T}(\sigma^2)=\Big[\sigma^2\big(\mathbf{I}_r+\tilde\delta(\sigma^2)\,\mathbf{C}\big)\Big]^{-1} \qquad (19)$$

$$\tilde{\mathbf{T}}(\sigma^2)=\Big[\sigma^2\big(\mathbf{I}_t+\delta(\sigma^2)\,\tilde{\mathbf{C}}\big)\Big]^{-1} \qquad (20)$$

The two canonical equations defining (δ(σ²), δ̃(σ²)) can therefore be written as:

$$\delta(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\mathbf{C}\,\mathbf{T}(\sigma^2)\big] \qquad (21)$$

$$\tilde\delta(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\tilde{\mathbf{C}}\,\tilde{\mathbf{T}}(\sigma^2)\big] \qquad (22)$$

The corresponding large system approximation Ī of I is given by:

$$\bar I=\log\det\big(\mathbf{I}+\tilde\delta(\sigma^2)\,\mathbf{C}\big)+\log\det\big(\mathbf{I}+\delta(\sigma^2)\,\tilde{\mathbf{C}}\big)-t\,\sigma^2\,\delta(\sigma^2)\,\tilde\delta(\sigma^2) \qquad (23)$$
so that the function g defined by Equation (13) is given by g(σ², δ, δ̃) = tσ²δδ̃. It is useful to justify the above expression of Ī, because the proof below extends to the other models. For this, it is necessary to establish that Equation (13) holds for the above function g. It can be shown that the right-hand side of (23) converges towards 0 when σ² → +∞. Therefore, it remains to check that the derivative w.r.t. σ² of the right-hand side of (23) coincides with −r(1/σ² − (1/r)Tr T(σ²)). We introduce the following function V(σ², κ, κ̃) of three variables:
$$V(\sigma^2,\kappa,\tilde\kappa)=\log\det\big(\mathbf{I}+\tilde\kappa\,\mathbf{C}\big)+\log\det\big(\mathbf{I}+\kappa\,\tilde{\mathbf{C}}\big)-t\,\sigma^2\,\kappa\,\tilde\kappa \qquad (24)$$

which is obtained by replacing (δ(σ²), δ̃(σ²)) with the fixed parameters (κ, κ̃) in the expression of the right-hand side of (23). A very important property of function V is:

$$\left.\frac{\partial V}{\partial\kappa}\right|_{\kappa=\delta(\sigma^2),\,\tilde\kappa=\tilde\delta(\sigma^2)}=0 \qquad (25)$$

$$\left.\frac{\partial V}{\partial\tilde\kappa}\right|_{\kappa=\delta(\sigma^2),\,\tilde\kappa=\tilde\delta(\sigma^2)}=0 \qquad (26)$$
In fact, it is straightforward that

$$\left.\frac{\partial V}{\partial\kappa}\right|_{\kappa=\delta(\sigma^2),\,\tilde\kappa=\tilde\delta(\sigma^2)}=\mathrm{Tr}\Big[\tilde{\mathbf{C}}\,\big(\mathbf{I}+\delta(\sigma^2)\,\tilde{\mathbf{C}}\big)^{-1}\Big]-t\,\sigma^2\,\tilde\delta(\sigma^2)$$

and vanishes because δ̃(σ²) satisfies the canonical Equation (22). We obtain similarly that (26) holds because δ(σ²) is the solution of the canonical Equation (21). In sum, it appears that (25, 26) are equivalent to the canonical Equations (21) and (22).
The derivative w.r.t. σ² of the right-hand side of (23) is given by:

$$\left.\frac{\partial V}{\partial\kappa}\right|_{\kappa=\delta(\sigma^2),\,\tilde\kappa=\tilde\delta(\sigma^2)}\frac{d\delta}{d\sigma^2}+\left.\frac{\partial V}{\partial\tilde\kappa}\right|_{\kappa=\delta(\sigma^2),\,\tilde\kappa=\tilde\delta(\sigma^2)}\frac{d\tilde\delta}{d\sigma^2}-t\,\delta(\sigma^2)\,\tilde\delta(\sigma^2)$$

and thus coincides with −tδ(σ²)δ̃(σ²) by (25, 26). Using (21, 22) as well as the expressions (19, 20) of T(σ²) and T̃(σ²), this is easily seen to be equal to −r(1/σ² − (1/r)Tr T(σ²)); indeed,

$$\frac{r}{\sigma^2}-\mathrm{Tr}\,\mathbf{T}(\sigma^2)=\mathrm{Tr}\Big[\Big(\big(\sigma^2\,\mathbf{T}(\sigma^2)\big)^{-1}-\mathbf{I}_r\Big)\mathbf{T}(\sigma^2)\Big]=\mathrm{Tr}\Big[\big(\mathbf{I}_r+\tilde\delta(\sigma^2)\,\mathbf{C}-\mathbf{I}_r\big)\mathbf{T}(\sigma^2)\Big]=t\,\delta(\sigma^2)\,\tilde\delta(\sigma^2)$$

Hence, both sides of (13) have the same derivative w.r.t. σ². Moreover, both sides converge towards 0 when σ² → +∞, so that Equation (13) is verified for g(σ², δ, δ̃) = tσ²δδ̃. This, together with (12), establishes that (23) holds.
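For model (15), the computation of Ī therefore reduces to a two-unknown fixed-point problem. Below is a minimal sketch that solves the canonical equations (21, 22) by the fixed-point iteration discussed in Section 5 (assumed convergent here) and then evaluates (23); the exponential correlation profiles in the example are our own illustrative choice.

```python
import numpy as np

def i_bar_model15(C, C_tilde, sigma2, tol=1e-12, max_iter=10_000):
    """Large system approximation (23) for model (15). Solves the canonical
    equations (21, 22) by fixed-point iteration, then evaluates (23).
    Returns (I_bar, delta, delta_tilde)."""
    r, t = C.shape[0], C_tilde.shape[0]
    delta, delta_t = 1.0, 1.0
    for _ in range(max_iter):
        T = np.linalg.inv(sigma2 * (np.eye(r) + delta_t * C))        # Eq. (19)
        T_t = np.linalg.inv(sigma2 * (np.eye(t) + delta * C_tilde))  # Eq. (20)
        nd, ndt = np.trace(C @ T).real / t, np.trace(C_tilde @ T_t).real / t
        done = max(abs(nd - delta), abs(ndt - delta_t)) < tol
        delta, delta_t = nd, ndt
        if done:
            break
    I_bar = (np.linalg.slogdet(np.eye(r) + delta_t * C)[1]
             + np.linalg.slogdet(np.eye(t) + delta * C_tilde)[1]
             - t * sigma2 * delta * delta_t)
    return I_bar, delta, delta_t

# Illustrative check (exponential correlation profiles are our assumption):
r = t = 8
idx = np.arange(t)
C = 0.5 ** np.abs(idx[:, None] - idx[None, :])
C_t = 0.3 ** np.abs(idx[:, None] - idx[None, :])
print(i_bar_model15(C, C_t, sigma2=0.1)[0])
```

Its output can be compared with a Monte-Carlo estimate of E[log det(I + HH^H/σ²)] for H drawn according to (15); in line with the O(1/t²) relative error, the two agree closely already for small r and t.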
We now consider the case where channel H is given as:

$$\mathbf{H}=\sum_{l=1}^{L}\frac{1}{\sqrt{t}}\,\mathbf{C}_l^{1/2}\,\mathbf{X}_l\,\tilde{\mathbf{C}}_l^{1/2} \qquad (27)$$

where the matrices (X_l)_{l=1,…,L} are mutually independent complex Gaussian random matrices with i.i.d. entries of mean 0 and variance 1, and where C_l and C̃_l are r × r and t × t deterministic positive matrices such that

$$\sup_r\|\mathbf{C}_l\|<+\infty,\qquad\sup_t\|\tilde{\mathbf{C}}_l\|<+\infty \qquad (28)$$

for l = 1, …, L. It is important to notice that the parameter L is independent of r and t, and thus does not scale with the number of antennas. As shown in [7,12], model (27) is useful in the context of multipath channels with independent paths. Matrices T and T̃ are given by:

$$\mathbf{T}(\sigma^2)=\Big[\sigma^2\Big(\mathbf{I}_r+\sum_{l=1}^{L}\tilde\delta_l(\sigma^2)\,\mathbf{C}_l\Big)\Big]^{-1} \qquad (29)$$

$$\tilde{\mathbf{T}}(\sigma^2)=\Big[\sigma^2\Big(\mathbf{I}_t+\sum_{l=1}^{L}\delta_l(\sigma^2)\,\tilde{\mathbf{C}}_l\Big)\Big]^{-1} \qquad (30)$$

where the (δ_l(σ²), δ̃_l(σ²))_{l=1,…,L} are the unique positive solutions of the system of 2L equations:

$$\delta_l(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\mathbf{C}_l\,\mathbf{T}(\sigma^2)\big] \qquad (31)$$

$$\tilde\delta_l(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\tilde{\mathbf{C}}_l\,\tilde{\mathbf{T}}(\sigma^2)\big] \qquad (32)$$

for l = 1, …, L. The large system approximation of I is this time given by [7,12]:

$$\bar I=\log\det\Big(\mathbf{I}+\sum_{l=1}^{L}\tilde\delta_l(\sigma^2)\,\mathbf{C}_l\Big)+\log\det\Big(\mathbf{I}+\sum_{l=1}^{L}\delta_l(\sigma^2)\,\tilde{\mathbf{C}}_l\Big)-t\,\sigma^2\sum_{l=1}^{L}\delta_l(\sigma^2)\,\tilde\delta_l(\sigma^2) \qquad (33)$$

and thus corresponds to Equation (14) when g(σ², δ, δ̃) = tσ²δᵀδ̃ = tσ² Σ_{l=1}^L δ_l δ̃_l. (33) is proved in the same way as (23), by introducing the function V(σ², κ, κ̃) of 2L + 1 variables obtained by replacing in the right-hand side of (33) the functions δ(σ²), δ̃(σ²) by fixed vectors κ and κ̃, and by observing that the canonical Equations (31, 32) are equivalent to:

$$\left.\frac{\partial V}{\partial\kappa_l}\right|_{\boldsymbol\kappa=\boldsymbol\delta(\sigma^2),\,\tilde{\boldsymbol\kappa}=\tilde{\boldsymbol\delta}(\sigma^2)}=0 \qquad (34)$$

$$\left.\frac{\partial V}{\partial\tilde\kappa_l}\right|_{\boldsymbol\kappa=\boldsymbol\delta(\sigma^2),\,\tilde{\boldsymbol\kappa}=\tilde{\boldsymbol\delta}(\sigma^2)}=0 \qquad (35)$$
We now consider a third random matrix model, modeling bi-correlated Rician channels. We assume that H can be written as:

$$\mathbf{H}=\mathbf{A}+\frac{1}{\sqrt{t}}\,\mathbf{C}^{1/2}\,\mathbf{X}\,\tilde{\mathbf{C}}^{1/2} \qquad (36)$$

where the entries of X are complex Gaussian i.i.d. random variables with zero mean and variance 1, and where C and C̃ are r × r and t × t deterministic positive matrices such that

$$\sup_r\|\mathbf{C}\|<+\infty,\qquad\sup_t\|\tilde{\mathbf{C}}\|<+\infty \qquad (37)$$

Matrix A is a deterministic r × t matrix satisfying:

$$\sup_{r,t}\|\mathbf{A}\|<+\infty \qquad (38)$$

This time, matrices T(σ²) and T̃(σ²) are given by:

$$\mathbf{T}(\sigma^2)=\Big[\sigma^2\big(\mathbf{I}_r+\tilde\delta(\sigma^2)\,\mathbf{C}\big)+\mathbf{A}\big(\mathbf{I}+\delta(\sigma^2)\,\tilde{\mathbf{C}}\big)^{-1}\mathbf{A}^H\Big]^{-1} \qquad (39)$$

$$\tilde{\mathbf{T}}(\sigma^2)=\Big[\sigma^2\big(\mathbf{I}_t+\delta(\sigma^2)\,\tilde{\mathbf{C}}\big)+\mathbf{A}^H\big(\mathbf{I}+\tilde\delta(\sigma^2)\,\mathbf{C}\big)^{-1}\mathbf{A}\Big]^{-1} \qquad (40)$$

where (δ(σ²), δ̃(σ²)) satisfy the same canonical Equations (21, 22) as in the case of the Rayleigh bi-correlated MIMO channel, i.e.,

$$\delta(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\mathbf{C}\,\mathbf{T}(\sigma^2)\big] \qquad (41)$$

$$\tilde\delta(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\tilde{\mathbf{C}}\,\tilde{\mathbf{T}}(\sigma^2)\big] \qquad (42)$$

It can be shown as previously that the corresponding large system approximation Ī of I is now given by [11,22]:

$$\bar I=\log\det\big(\mathbf{I}+\tilde\delta(\sigma^2)\,\mathbf{C}\big)+\log\det\Big(\mathbf{I}+\delta(\sigma^2)\,\tilde{\mathbf{C}}+\frac{1}{\sigma^2}\,\mathbf{A}^H\big(\mathbf{I}+\tilde\delta(\sigma^2)\,\mathbf{C}\big)^{-1}\mathbf{A}\Big)-t\,\sigma^2\,\delta(\sigma^2)\,\tilde\delta(\sigma^2) \qquad (43)$$

and corresponds to Equation (14) when g(σ², δ, δ̃) is equal to:

$$g(\sigma^2,\delta,\tilde\delta)=\log\det\Big(\mathbf{I}+\frac{1}{\sigma^2}\,\big(\mathbf{I}+\tilde\delta\,\mathbf{C}\big)^{-1}\mathbf{A}\big(\mathbf{I}+\delta\,\tilde{\mathbf{C}}\big)^{-1}\mathbf{A}^H\Big)+t\,\sigma^2\,\delta\,\tilde\delta \qquad (44)$$
A slightly more general non-zero mean MIMO model is:

$$\mathbf{H}=\mathbf{A}+\frac{1}{\sqrt{t}}\,\mathbf{U}\,\big(\mathbf{D}\circ\mathbf{X}\big)\,\mathbf{V}^H \qquad (45)$$

where ∘ represents the Schur–Hadamard product, where D represents an r × t matrix such that D_{i,j} ≥ 0 and sup_{r,t}‖D‖ < +∞, and where U and V are deterministic unitary r × r and t × t matrices. A and X have the same properties as in the context of model (36). This MIMO channel model was introduced in [23], studied in the large system regime in the zero mean case in [10] and in the non-zero mean case in [24] and, using the replica method, more recently in [9,16] in the context of the MIMO multiple access channel. As we are essentially interested in functionals of the eigenvalues of HH^H, it is possible to assume without restriction that U and V are reduced to I_r and I_t respectively. In this case, matrices T and T̃ are given by:

$$\mathbf{T}(\sigma^2)=\Big[\sigma^2\big(\mathbf{I}_r+\tilde{\boldsymbol\Delta}(\sigma^2)\big)+\mathbf{A}\big(\mathbf{I}+\boldsymbol\Delta(\sigma^2)\big)^{-1}\mathbf{A}^H\Big]^{-1} \qquad (46)$$

$$\tilde{\mathbf{T}}(\sigma^2)=\Big[\sigma^2\big(\mathbf{I}_t+\boldsymbol\Delta(\sigma^2)\big)+\mathbf{A}^H\big(\mathbf{I}+\tilde{\boldsymbol\Delta}(\sigma^2)\big)^{-1}\mathbf{A}\Big]^{-1} \qquad (47)$$

where Δ(σ²) = Diag(δ₁(σ²), …, δ_t(σ²)) and Δ̃(σ²) = Diag(δ̃₁(σ²), …, δ̃_r(σ²)) are two diagonal t × t and r × r matrices whose entries are the positive solutions of the equations:

$$\delta_j(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\mathbf{E}_j\,\mathbf{T}(\sigma^2)\big] \qquad (48)$$

$$\tilde\delta_i(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\tilde{\mathbf{E}}_i\,\tilde{\mathbf{T}}(\sigma^2)\big] \qquad (49)$$

for i = 1, …, r and j = 1, …, t, where E_j = Diag(D²_{1,j}, …, D²_{r,j}) and Ẽ_i = Diag(D²_{i,1}, …, D²_{i,t}). The large system approximation Ī is now given by:

$$\bar I=\log\det\big(\mathbf{I}+\tilde{\boldsymbol\Delta}(\sigma^2)\big)+\log\det\Big(\mathbf{I}+\boldsymbol\Delta(\sigma^2)+\frac{1}{\sigma^2}\,\mathbf{A}^H\big(\mathbf{I}+\tilde{\boldsymbol\Delta}(\sigma^2)\big)^{-1}\mathbf{A}\Big)-\sigma^2\,\mathrm{Tr}\big[\tilde{\boldsymbol\Delta}(\sigma^2)\,\mathbf{T}(\sigma^2)\big] \qquad (50)$$
and corresponds to Equation (14) for a suitable function g(σ², δ, δ̃). We finally consider the model representing the MIMO multiple access channel addressed in [13]. H is now an r × Lt matrix given by:

$$\mathbf{H}=\Big[\frac{1}{\sqrt{t}}\,\mathbf{C}_1^{1/2}\mathbf{X}_1\tilde{\mathbf{C}}_1^{1/2},\ \ldots,\ \frac{1}{\sqrt{t}}\,\mathbf{C}_L^{1/2}\mathbf{X}_L\tilde{\mathbf{C}}_L^{1/2}\Big] \qquad (51)$$

where the various matrices (C_l, C̃_l, X_l)_{l=1,…,L} satisfy the same conditions as in the context of model (27). T and T̃ are the r × r and Lt × Lt matrices given respectively by:

$$\mathbf{T}(\sigma^2)=\Big[\sigma^2\Big(\mathbf{I}_r+\sum_{l=1}^{L}\tilde\delta_l(\sigma^2)\,\mathbf{C}_l\Big)\Big]^{-1} \qquad (52)$$

$$\tilde{\mathbf{T}}(\sigma^2)=\mathrm{diag}\big(\tilde{\mathbf{T}}_1(\sigma^2),\ \ldots,\ \tilde{\mathbf{T}}_L(\sigma^2)\big) \qquad (53)$$

where the matrices (T̃_l(σ²))_{l=1,…,L} are defined by:

$$\tilde{\mathbf{T}}_l(\sigma^2)=\Big[\sigma^2\big(\mathbf{I}_t+\delta_l(\sigma^2)\,\tilde{\mathbf{C}}_l\big)\Big]^{-1} \qquad (54)$$

and where the (δ_l(σ²), δ̃_l(σ²))_{l=1,…,L} are the unique positive solutions of the system of 2L equations:

$$\delta_l(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\mathbf{C}_l\,\mathbf{T}(\sigma^2)\big] \qquad (55)$$

$$\tilde\delta_l(\sigma^2)=\frac{1}{t}\mathrm{Tr}\big[\tilde{\mathbf{C}}_l\,\tilde{\mathbf{T}}_l(\sigma^2)\big] \qquad (56)$$

for each l = 1, …, L. The large system approximation Ī is equal to:

$$\bar I=\log\det\Big(\mathbf{I}+\sum_{l=1}^{L}\tilde\delta_l(\sigma^2)\,\mathbf{C}_l\Big)+\sum_{l=1}^{L}\log\det\big(\mathbf{I}+\delta_l(\sigma^2)\,\tilde{\mathbf{C}}_l\big)-t\,\sigma^2\sum_{l=1}^{L}\delta_l(\sigma^2)\,\tilde\delta_l(\sigma^2) \qquad (57)$$

and thus corresponds to Equation (14) when g(σ², δ, δ̃) = tσ²δᵀδ̃ = tσ² Σ_{l=1}^L δ_l δ̃_l.
Note that this model can be obtained from model (27), which provides a unified view. If we consider T_l^{1/2} = diag(0, …, 0, C̃_l^{1/2}, 0, …, 0) (the non-zero block being the l-th one) and X_l ∈ ℂ^{r×tL}, then H = Σ_l (1/√t) C_l^{1/2} X_l T_l^{1/2} has the same expression as Equation (51); the large system approximation (57) is then easily obtained from (33).

2.2.2. The General Case Q ≠ I

In order to address the general case Q ≠ I, it is sufficient to replace the random matrix model H by the random matrix model HQ^{1/2}, and to evaluate the large system approximation Ī(Q) of the corresponding average mutual information I(Q). This is straightforward if the right multiplication of H by Q^{1/2} leads to a random matrix model for which the large system approximations are easy to evaluate. This turns out to be the case in the context of models (15, 27 and 36), and also in the context of model (51) if the Lt × Lt matrix Q is block diagonal, i.e., Q = Diag(Q₁, …, Q_L), a condition that fortunately holds in the multiuser precoding schemes presented in [13]. For these models, H and HQ^{1/2} belong to the same class of random matrix models, and it is then sufficient to replace:
  • for model (15), matrix C̃ by Q^{1/2}C̃Q^{1/2},
  • for model (27), matrices (C̃_l)_{l=1,…,L} by (Q^{1/2}C̃_lQ^{1/2})_{l=1,…,L},
  • for model (36), matrices A and C̃ by AQ^{1/2} and Q^{1/2}C̃Q^{1/2} respectively,
  • for model (51), matrices (C̃_l)_{l=1,…,L} by (Q_l^{1/2}C̃_lQ_l^{1/2})_{l=1,…,L} (where Q = Diag(Q₁, …, Q_L))
in the various equations defining the large system approximations of I (note that the various matrices C̃ are multiplied on both sides by Q^{1/2} because they are covariance matrices; e.g., the matrix C̃ is the transmit covariance matrix of channel H, so the transmit covariance matrix of the channel HQ^{1/2} is Q^{1/2}C̃Q^{1/2}). A code illustration of this substitution is given below. In the context of model (45), the right multiplication by Q^{1/2} unfortunately modifies the structure of the model. Using the replica trick, [9] nevertheless derived a large system approximation Ī(Q) of I(Q) (see also [16] in the case of the corresponding multi-user MIMO channel). It is however not clear that the optimization of Ī(Q) completely fits the unified presentation of the present paper. Therefore, we do not elaborate on model (45) in the following.
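In code, the substitution rule above is a one-line change. The sketch below is for model (15); it assumes the i_bar_model15 helper of the earlier sketch is in scope.

```python
import numpy as np

def psd_sqrt(Q):
    """Hermitian square root of a positive semi-definite matrix Q."""
    w, U = np.linalg.eigh((Q + Q.conj().T) / 2)
    return (U * np.sqrt(np.maximum(w, 0.0))) @ U.conj().T

def i_bar_model15_Q(Q, C, C_tilde, sigma2):
    """I_bar(Q) for model (15): replace C~ by Q^{1/2} C~ Q^{1/2} in the
    equations defining I_bar (i_bar_model15 is reused unchanged)."""
    Qh = psd_sqrt(Q)
    return i_bar_model15(C, Qh @ C_tilde @ Qh, sigma2)[0]
```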
In the context of models (15, 27 and 36), the matrices T and T̃ as well as the solutions δ, δ̃ of the canonical equations still depend on σ², but also on the covariance matrix Q. As the dependence on σ² does not play any role in the following, we now denote T, T̃, δ and δ̃ by T(Q), T̃(Q), δ(Q) and δ̃(Q) respectively. It is easily seen that for the above three models the large system approximation of I(Q) can be written as:

$$\bar I(\mathbf{Q})=\log\det\Big(\mathbf{I}+\mathbf{Q}\,\mathbf{G}\big(\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})\big)\Big)+\bar\imath\big(\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})\big) \qquad (58)$$

where G(δ, δ̃) is a certain t × t positive matrix-valued function given in closed form, and ī(δ, δ̃) is a function also given in closed form. As previously, it is useful to introduce the function V(Q, κ, κ̃) defined by:

$$V(\mathbf{Q},\boldsymbol\kappa,\tilde{\boldsymbol\kappa})=\log\det\big(\mathbf{I}+\mathbf{Q}\,\mathbf{G}(\boldsymbol\kappa,\tilde{\boldsymbol\kappa})\big)+\bar\imath(\boldsymbol\kappa,\tilde{\boldsymbol\kappa}) \qquad (59)$$

which corresponds to the expression (58) of Ī(Q) in which (δ(Q), δ̃(Q)) is replaced by the fixed parameter (κ, κ̃). In other words, Ī(Q) can be written as:

$$\bar I(\mathbf{Q})=V\big(\mathbf{Q},\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})\big) \qquad (60)$$

In the context of models (15, 27 and 36), it holds that

$$\left.\frac{\partial V}{\partial\kappa_i}\right|_{\boldsymbol\kappa=\boldsymbol\delta(\mathbf{Q}),\,\tilde{\boldsymbol\kappa}=\tilde{\boldsymbol\delta}(\mathbf{Q})}=0 \qquad (61)$$

$$\left.\frac{\partial V}{\partial\tilde\kappa_j}\right|_{\boldsymbol\kappa=\boldsymbol\delta(\mathbf{Q}),\,\tilde{\boldsymbol\kappa}=\tilde{\boldsymbol\delta}(\mathbf{Q})}=0 \qquad (62)$$

for each pair (i, j). As previously, these relations follow directly from the canonical equations verified by the components of δ(Q) and δ̃(Q). It is important to notice that the boundedness assumptions (28, 38) imply that Ī(Q) represents a valid large system approximation of I(Q) provided that the mean and the covariance matrices associated with model HQ^{1/2} satisfy such assumptions. For this, it is sufficient to assume that matrix Q satisfies:

$$\sup_t\|\mathbf{Q}\|<+\infty \qquad (63)$$

In particular, the property Ī(Q) − I(Q) = O(1/t) holds if matrix Q satisfies (63). As explained below, this induces some technical difficulties in justifying the relevance of the approach consisting in replacing the optimization of I(Q) by the optimization of Ī(Q).
We finally mention that in the context of model (51), if Q is block diagonal, it holds that

$$\bar I(\mathbf{Q})=\sum_{l=1}^{L}\log\det\Big(\mathbf{I}+\mathbf{Q}_l\,\mathbf{G}_l\big(\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})\big)\Big)+\bar\imath\big(\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})\big) \qquad (64)$$

for certain matrix-valued functions (G_l)_{l=1,…,L}. In [13], it is proposed to optimize (64) w.r.t. the matrices (Q_l)_{l=1,…,L} under the constraints (1/t)Tr(Q_l) = 1 for each l = 1, …, L. As the formulation of this problem slightly differs from the optimization of Ī in models (15, 27 and 36), we do not discuss this issue in the next sections. However, the results presented below can easily be adapted to the context of model (51).

3. Concavity of the Large System Approximation

As the general approach of this paper is to optimize Ī(Q) w.r.t. Q, it is crucial to be able to prove that Q ↦ Ī(Q) is a strictly concave function on C₁. We point out that this important point was not addressed in [7,14,15]. While strict concavity needs rather intricate arguments, specific to each model, the concavity of Ī in the context of (15, 27 and 36) can be derived in a unified way. For this, we use the scheme proposed in [11,12], and introduce for each integer m an augmented mr × mt-dimensional model H^(m) of the same kind, in which the various covariance matrices and mean matrices are replaced by their Kronecker products with the matrix I_m (the new matrices are denoted C_l^(m), C̃_l^(m), A^(m)), and in which the mutually independent i.i.d. r × t complex Gaussian random matrices (X_l)_{l=1,…,L} are replaced by mutually independent i.i.d. mr × mt complex Gaussian random matrices (X_l^(m))_{l=1,…,L}. If Q is a t × t covariance matrix defined on the usual cone C, we consider the function I_m defined on C by:

$$I_m(\mathbf{Q})=\mathbb{E}\Big[\log\det\Big(\mathbf{I}_{mr}+\frac{1}{\sigma^2}\,\mathbf{H}^{(m)}\big(\mathbf{Q}\otimes\mathbf{I}_m\big)\mathbf{H}^{(m)H}\Big)\Big] \qquad (65)$$

When m → +∞ while r and t remain fixed, the dimensions mr and mt of matrix H^(m) both converge to +∞ in such a way that the ratio mt/mr = t/r remains constant. For each model, the mutual information I_m(Q) has a large system approximation Ī_m(Q ⊗ I_m), which can be expressed explicitly in terms of Q and of the solutions of the associated canonical equations, denoted δ^(m)(Q ⊗ I_m) and δ̃^(m)(Q ⊗ I_m). However, looking at the canonical equations characterizing models (15, 27 and 36), it is easy to check that δ^(m)(Q ⊗ I_m) and δ̃^(m)(Q ⊗ I_m) coincide with the solutions δ(Q) and δ̃(Q) of the canonical equations associated with the non-augmented model, and that

$$\bar I_m(\mathbf{Q}\otimes\mathbf{I}_m)=m\,\bar I(\mathbf{Q}) \qquad (66)$$

for each m. Moreover, it is clear that

$$\lim_{m\to+\infty}\frac{1}{m}\Big[I_m(\mathbf{Q})-\bar I_m(\mathbf{Q}\otimes\mathbf{I}_m)\Big]=0 \qquad (67)$$

which implies that for each Q ∈ C, it holds that

$$\lim_{m\to+\infty}\frac{I_m(\mathbf{Q})}{m}=\bar I(\mathbf{Q}) \qquad (68)$$

The function Q ↦ (1/m) I_m(Q) is clearly concave on C. As the pointwise limit of concave functions is still concave, (68) implies that Ī is concave on C.
As mentioned above, establishing the strict concavity of the large system approximation is much more difficult; it was done in [11,12] in the context of models (15, 27 and 36). Therefore, the function Ī is maximized on C₁ by a unique matrix, denoted Q̄* in the following.

4. Properties of Ī(Q̄*)

4.1. Comparison between I(Q*) and I(Q̄*)

The optimization of Ī in place of I is justified if it is possible to establish that the optimum mutual information I(Q*) has the same asymptotic behaviour as I(Q̄*) when t increases. As the function Ī is an approximation of the function I, it seems reasonable to expect that the arguments of the maxima of both functions behave similarly, and thus that I(Q*) − I(Q̄*) → 0 when t → +∞. We recall however that Ī(Q) is a valid approximation of I(Q) provided that sup_t ‖Q‖ < +∞. It is therefore intuitive that if Q* and/or Q̄* do not satisfy this condition, then the approach consisting in approximating Q* by Q̄* may not be successful. Fortunately, the following result was established in [11,12] in the context of models (15, 27 and 36).
Theorem 1.
In the context of models (15, 27 and 36), under certain extra assumptions on the second order statistics of H, it holds that
(1) sup_t ‖Q*‖ < +∞
(2) sup_t ‖Q̄*‖ < +∞
(3) I(Q*) − I(Q̄*) = O(1/t)
This result is proved in [11,12], where the above-mentioned technical extra assumptions are specified (they are related to the infima, over the considered matrix dimensions, of the smallest eigenvalues of the second order statistics; these infima need to be greater than 0). The proofs of items (1) and (2) are not obvious, and the interested reader may refer to [11,12] for more details. We only justify here that item (3) holds if (1) and (2) are verified. We write that

$$\big[I(\mathbf{Q}_*)-I(\bar{\mathbf{Q}}_*)\big]+\big[\bar I(\bar{\mathbf{Q}}_*)-\bar I(\mathbf{Q}_*)\big]=\big[I(\mathbf{Q}_*)-\bar I(\mathbf{Q}_*)\big]+\big[\bar I(\bar{\mathbf{Q}}_*)-I(\bar{\mathbf{Q}}_*)\big]$$

As (1) and (2) hold, I(Q*) − Ī(Q*) and Ī(Q̄*) − I(Q̄*) are both O(1/t) terms. Moreover, I(Q*) − I(Q̄*) and Ī(Q̄*) − Ī(Q*) are both non-negative by the very definition of Q* and Q̄*. Therefore, both terms are O(1/t), which, in particular, implies that item (3) holds.

4.2. Structure of Q̄*

In the context of model (15), it is well known that the eigenvectors of the optimal matrix Q* coincide with the eigenvectors of matrix C̃. Models (27, 36) are more complicated, and the structure of Q* is unknown. The asymptotic approach presented in this paper allows one to obtain insights on matrix Q̄*, and thus on an approximate capacity-achieving covariance matrix. In order to present the corresponding result, we recall that in the context of models (27, 36), Ī(Q) is given by (58). We denote by δ* and δ̃* the vectors δ(Q̄*) and δ̃(Q̄*), and by G* the matrix G(δ*, δ̃*). Then, the following result holds.
Theorem 2.
Matrix Q̄* coincides with the maximizer of the following classical optimization problem:

$$\sup_{\mathbf{Q}\in\mathcal{C}_1}\ \log\det\big(\mathbf{I}+\mathbf{G}_*\,\mathbf{Q}\big) \qquad (69)$$

Before giving the proof, we provide some comments on Theorem 2. It is first important to notice that this result cannot be used in practice to evaluate matrix Q̄*, because the parameters δ* and δ̃*, which depend on Q̄*, are of course unknown. However, Theorem 2 is useful in that it gives some insights on the structure of the nearly capacity-achieving covariance matrix Q̄*. More precisely, it is well known that the eigenvectors of the maximizer of (69) coincide with the eigenvectors of G*, and that its eigenvalues are the solutions of a water-filling algorithm based on the eigenvalues of G*. It is useful to express matrix G* in the context of models (27, 36). In the first case (27), G* is equal to:

$$\mathbf{G}_*=\sum_{l=1}^{L}\delta_{l,*}\,\tilde{\mathbf{C}}_l$$

If model (27) reduces to the Rayleigh bi-correlated MIMO channel, i.e., L = 1, one obtains that the eigenvectors of Q̄* coincide with the eigenvectors of the transmit correlation matrix, a result consistent with what is known about Q* in this context. Interestingly, if L > 1, we obtain that the eigenvectors of the approximate capacity-achieving covariance matrix Q̄* coincide with the eigenvectors of a weighted sum of the covariance matrices (C̃_l)_{l=1,…,L}. Moreover, the eigenvalues of Q̄* are the solutions of a classical water-filling algorithm based on the eigenvalues of Σ_{l=1}^L δ_{l,*} C̃_l.
In the context of model (36), matrix G* is given by:

$$\mathbf{G}_*=\delta_*\,\tilde{\mathbf{C}}+\frac{1}{\sigma^2}\,\mathbf{A}^H\big(\mathbf{I}+\tilde\delta_*\,\mathbf{C}\big)^{-1}\mathbf{A}$$

In the case of an uncorrelated Rician channel (C and C̃ both reduced to I), matrix G* is a linear combination of I and of matrix A^HA. The eigenvectors of Q̄* thus coincide with the right singular vectors of A, a result in accordance with the structure of Q* in this context (see [5]). If C = I, the eigenvectors of Q̄* coincide with the eigenvectors of a linear combination of C̃ and A^HA, which is an intuitively pleasing result. However, in the general case where C ≠ I, this is no longer true, and a linear combination of C̃ and the more surprising matrix A^H(I + δ̃*C)^{-1}A comes into play.
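For completeness, here is a sketch of the classical water-filling step that solves (69) once G* is frozen; the routine and its parameter names are our own, it assumes G* is Hermitian positive semi-definite, and the constraint (1/t)Tr(Q) = 1 is written as Tr(Q) = t.

```python
import numpy as np

def waterfill(G, total_power):
    """Maximize log det(I + Q G) over Q >= 0 with Tr(Q) = total_power.

    Classical water-filling: Q shares the eigenvectors of G, and its
    eigenvalues are max(w - 1/lambda_i, 0) for a common water level w.
    """
    lam, U = np.linalg.eigh(G)            # G Hermitian positive semi-definite
    lam = np.maximum(lam, 1e-15)          # guard against zero eigenvalues
    inv = 1.0 / lam
    order = np.argsort(inv)               # strongest modes first
    inv_sorted = inv[order]
    # With the k strongest modes active, w = (P + sum of their 1/lambda) / k;
    # take the largest consistent active set.
    for k in range(len(lam), 0, -1):
        w = (total_power + inv_sorted[:k].sum()) / k
        if w > inv_sorted[k - 1]:
            break
    q = np.maximum(w - inv, 0.0)
    return (U * q) @ U.conj().T           # Q = U diag(q) U^H

# Example: maximizer of (69) for a given G_*, with Tr(Q) = t
t = 4
Q = waterfill(np.diag([2.0, 1.0, 0.5, 0.1]), total_power=t)
print(np.trace(Q).real)                   # = t
```

The active modes are the eigenvalues λ_i of G* for which the water level w exceeds 1/λ_i; the loop simply finds the largest active set for which w is consistent.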
Proof of Theorem 2.
We first introduce some classical concepts and results needed for the optimization of Q ↦ Ī(Q).
Definition 1.
Let φ be a function defined on the convex set C₁, and let P, Q ∈ C₁. Then φ is said to be differentiable in the Gâteaux sense (or Gâteaux differentiable) at point Q in the direction P − Q if the following limit exists:

$$\lim_{\lambda\to 0^+}\frac{\varphi\big(\mathbf{Q}+\lambda(\mathbf{P}-\mathbf{Q})\big)-\varphi(\mathbf{Q})}{\lambda}$$

In this case, this limit is denoted ⟨φ′(Q), P − Q⟩.
Note that φ(Q + λ(P − Q)) makes sense for λ ∈ [0, 1], as Q + λ(P − Q) = (1 − λ)Q + λP naturally belongs to C₁.
Proposition 1.
Let φ : C₁ → ℝ be a strictly concave function. Then,
  • φ is Gâteaux differentiable at Q in the direction P − Q for each P, Q ∈ C₁;
  • Q_opt is the unique maximizer of φ on C₁ if and only if it verifies:

$$\langle\varphi'(\mathbf{Q}_{opt}),\ \mathbf{Q}-\mathbf{Q}_{opt}\rangle\le 0\quad\text{for each }\mathbf{Q}\in\mathcal{C}_1$$

This proposition is standard (see for example Chapter 2 of [25]).
In order to prove Theorem 2, we apply the above proposition to function Ī. For this, we use the function V introduced in Equation (59). It is clear that maximizing the function Q ↦ log det(I + QG*) over C₁ is equivalent to maximizing the function Q ↦ V(Q, δ*, δ̃*), because ī(δ*, δ̃*) does not depend on Q. We denote by ⟨V′(Q̄*, δ*, δ̃*), P − Q̄*⟩ the Gâteaux differential of the function Q ↦ V(Q, δ*, δ̃*) at point Q̄* in direction P − Q̄*. The proof then relies on the following relation, proven hereafter: for each P ∈ C₁,

$$\langle\bar I'(\bar{\mathbf{Q}}_*),\ \mathbf{P}-\bar{\mathbf{Q}}_*\rangle=\langle V'(\bar{\mathbf{Q}}_*,\boldsymbol\delta_*,\tilde{\boldsymbol\delta}_*),\ \mathbf{P}-\bar{\mathbf{Q}}_*\rangle \qquad (71)$$

Assuming (71) is verified, the left-hand side of (71) is non-positive for each P, because Q̄* maximizes Ī over C₁. Therefore, we obtain that ⟨V′(Q̄*, δ*, δ̃*), P − Q̄*⟩ ≤ 0 for each matrix P ∈ C₁. As the function Q ↦ V(Q, δ*, δ̃*) is strictly concave on C₁, its unique maximizer on C₁ coincides with Q̄*.
It now remains to prove (71). We first mention that the functions Q ↦ δ(Q) and Q ↦ δ̃(Q) are Gâteaux differentiable (this is proved in [11,12] in the context of models (27, 36)). Consider P and Q ∈ C₁. Then, (60) implies that

$$\langle\bar I'(\mathbf{Q}),\mathbf{P}-\mathbf{Q}\rangle=\langle V'(\mathbf{Q},\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})),\mathbf{P}-\mathbf{Q}\rangle+\sum_{l=1}^{L}\left.\frac{\partial V}{\partial\kappa_l}\right|_{\boldsymbol\kappa=\boldsymbol\delta(\mathbf{Q}),\,\tilde{\boldsymbol\kappa}=\tilde{\boldsymbol\delta}(\mathbf{Q})}\langle\delta_l'(\mathbf{Q}),\mathbf{P}-\mathbf{Q}\rangle+\sum_{l=1}^{L}\left.\frac{\partial V}{\partial\tilde\kappa_l}\right|_{\boldsymbol\kappa=\boldsymbol\delta(\mathbf{Q}),\,\tilde{\boldsymbol\kappa}=\tilde{\boldsymbol\delta}(\mathbf{Q})}\langle\tilde\delta_l'(\mathbf{Q}),\mathbf{P}-\mathbf{Q}\rangle \qquad (72)$$

Using relations (61, 62) and letting Q = Q̄* in Equation (72) yields, as expected,

$$\langle\bar I'(\bar{\mathbf{Q}}_*),\ \mathbf{P}-\bar{\mathbf{Q}}_*\rangle=\langle V'(\bar{\mathbf{Q}}_*,\boldsymbol\delta(\bar{\mathbf{Q}}_*),\tilde{\boldsymbol\delta}(\bar{\mathbf{Q}}_*)),\ \mathbf{P}-\bar{\mathbf{Q}}_*\rangle$$

We mention that a number of authors have used, at least implicitly, the result stated in Theorem 2, but sometimes with partial or unconvincing justifications. We hope that the above proof is more convincing.

5. Optimization Algorithms of Ī(Q)

The most straightforward way to optimize the strictly concave function Q ↦ Ī(Q) on C₁ consists in implementing a descent algorithm. The constraint Q ≥ 0 may be taken into account by using, e.g., barrier interior point methods ([26]). However, a tempting alternative consists in an iterative water-filling algorithm, based on the observation that for fixed (κ, κ̃), the optimization w.r.t. Q of Q ↦ V(Q, κ, κ̃) can be achieved by a classical water-filling. On the other hand, for fixed Q, the solutions (δ(Q), δ̃(Q)) of the canonical equations can easily be evaluated. Therefore, a tempting choice is to update Q and (δ(Q), δ̃(Q)) separately. In order to present the corresponding algorithms, we assume that the canonical equations defining (δ(Q), δ̃(Q)) can be written as:

$$\boldsymbol\delta(\mathbf{Q})=f\big(\mathbf{Q},\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})\big) \qquad (73)$$

$$\tilde{\boldsymbol\delta}(\mathbf{Q})=\tilde f\big(\mathbf{Q},\boldsymbol\delta(\mathbf{Q}),\tilde{\boldsymbol\delta}(\mathbf{Q})\big) \qquad (74)$$
In order to evaluate numerically the solutions of system (73, 74), it is possible to use the following fixed-point algorithm:
  • Initialization: choose δ^(0) and δ̃^(0) as vectors with strictly positive components.
  • Evaluation of (δ^(k+1), δ̃^(k+1)) from (δ^(k), δ̃^(k)):

$$\boldsymbol\delta^{(k+1)}=f\big(\mathbf{Q},\boldsymbol\delta^{(k)},\tilde{\boldsymbol\delta}^{(k)}\big) \qquad (75)$$

$$\tilde{\boldsymbol\delta}^{(k+1)}=\tilde f\big(\mathbf{Q},\boldsymbol\delta^{(k)},\tilde{\boldsymbol\delta}^{(k)}\big) \qquad (76)$$
It is proved in [12] that this algorithm converges towards (δ(Q), δ̃(Q)) in the context of model (27); the approach used in [12], based on analytic continuation methods, can however be extended to the other kinds of canonical equations.
The iterative water-filling algorithm proposed in [11,12] is as follows (a code sketch is given after the list):
  • Initialization: Q^(0) = I.
  • Evaluation of Q^(m+1) from Q^(m): δ^(m+1) and δ̃^(m+1) are defined as the solutions of the canonical Equations (73, 74) for Q = Q^(m), and Q^(m+1) coincides with the argmax w.r.t. Q of log det(I + QG(δ^(m+1), δ̃^(m+1))).
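The following sketch implements this frozen water-filling loop for model (15), for which G(δ, δ̃) = δC̃. It assumes the waterfill() and psd_sqrt() helpers of the previous sketches are in scope, and contains no safeguard against the oscillation phenomenon discussed below.

```python
import numpy as np

def frozen_waterfilling(C, C_tilde, sigma2, n_outer=100, tol=1e-6):
    """Frozen water-filling of [11,12], sketched for model (15). Each outer
    iteration (i) solves the canonical equations to convergence for the
    current Q (channel H Q^{1/2}, i.e., C~ replaced by Q^{1/2} C~ Q^{1/2}),
    then (ii) water-fills on log det(I + Q G) with G = delta * C~."""
    r, t = C.shape[0], C_tilde.shape[0]
    Q = np.eye(t)
    for _ in range(n_outer):
        Qh = psd_sqrt(Q)
        CtQ = Qh @ C_tilde @ Qh
        delta, delta_t = 1.0, 1.0
        for _ in range(10_000):        # fixed point (75, 76) run to convergence
            nd = np.trace(C @ np.linalg.inv(sigma2 * (np.eye(r) + delta_t * C))).real / t
            ndt = np.trace(CtQ @ np.linalg.inv(sigma2 * (np.eye(t) + delta * CtQ))).real / t
            done = max(abs(nd - delta), abs(ndt - delta_t)) < 1e-12
            delta, delta_t = nd, ndt
            if done:
                break
        Q_new = waterfill(delta * C_tilde, total_power=t)
        if np.linalg.norm(Q_new - Q) < tol * max(np.linalg.norm(Q), 1.0):
            return Q_new
        Q = Q_new
    return Q
```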
It is proved in [11,12] that if this algorithm converges, then it converges towards the optimal matrix Q̄*. We do not reproduce the proof here, but mention that it is again based on Equation (71). While the algorithm seems to converge very often, an example was quite cleverly constructed in [15] for which the sequence (Q^(m))_{m≥0} oscillates between two non-full-rank positive matrices Q₁ and Q₂ such that Rank(Q₁) ≠ Rank(Q₂). Interestingly, [15] suggested a very simple modification of the above scheme and proved its convergence in the context of model (15):
  • Initialization: choose δ^(0) and δ̃^(0) as vectors with strictly positive components.
  • Evaluation of δ^(n+1) and δ̃^(n+1) from δ^(n) and δ̃^(n): Q^(n) coincides with the argmax w.r.t. Q of log det(I + QG(δ^(n), δ̃^(n))), and δ^(n+1) and δ̃^(n+1) are defined by:

$$\boldsymbol\delta^{(n+1)}=f\big(\mathbf{Q}^{(n)},\boldsymbol\delta^{(n)},\tilde{\boldsymbol\delta}^{(n)}\big) \qquad (77)$$

$$\tilde{\boldsymbol\delta}^{(n+1)}=\tilde f\big(\mathbf{Q}^{(n)},\boldsymbol\delta^{(n)},\tilde{\boldsymbol\delta}^{(n)}\big) \qquad (78)$$
The two schemes differ in that [15] does not solve the canonical equations at each iteration, but implements (77, 78), which is similar to a single iteration of the fixed-point algorithm; a code sketch of this variant is given below.
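A sketch of the modified scheme for model (15) follows; it again reuses the waterfill() and psd_sqrt() helpers from the sketches above, and performs a single fixed-point update (77, 78) per iteration.

```python
import numpy as np

def modified_waterfilling(C, C_tilde, sigma2, n_iter=200):
    """Variant of [15], sketched for model (15): per iteration, one
    water-filling step followed by a single fixed-point update (77, 78)."""
    r, t = C.shape[0], C_tilde.shape[0]
    delta, delta_t = 1.0, 1.0
    for _ in range(n_iter):
        Q = waterfill(delta * C_tilde, total_power=t)   # Q^(n)
        Qh = psd_sqrt(Q)
        CtQ = Qh @ C_tilde @ Qh
        # (77, 78): both updates use the current (delta^(n), delta~^(n))
        new_delta = np.trace(C @ np.linalg.inv(sigma2 * (np.eye(r) + delta_t * C))).real / t
        new_delta_t = np.trace(CtQ @ np.linalg.inv(sigma2 * (np.eye(t) + delta * CtQ))).real / t
        delta, delta_t = new_delta, new_delta_t
    return Q
```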
We finally provide some simulation results to evaluate the performance of the optimization algorithms in the context of model (27) with L = 5. In order to generate the matrices C_l and C̃_l, we use the propagation model introduced in [27]; for every l, the matrices C_l and C̃_l correspond to a path, characterized by a mean angle of departure, a mean angle of arrival and an angle spread for each of these two angles (these parameters are given in Table 1, in radians).
Table 1. Paths angular parameters (in radians).

                          l = 1   l = 2   l = 3   l = 4   l = 5
  mean departure angle     4.44    0.20    1.74    0.29    0.61
  departure angle spread   0.09    0.08    0.04    0.11    0.01
  mean arrival angle       4.12    0.22    5.34    5.87    4.26
  arrival angle spread     0.09    0.08    0.05    0.08    0.03
In the simulations featured in Figure 1(a) (respectively in Figure 1(b)), we consider a MIMO system with r = t = 4 (respectively r = t = 8). In both figures, we have represented the average mutual information I(I_t) (i.e., without optimization), the optimal average mutual information I(Q̄*) (i.e., with an input covariance matrix maximizing the approximation Ī) obtained by the iterative water-filling introduced in [11,12] (also called frozen water-filling in [15]), and the same quantity I(Q̄*) obtained this time by the water-filling introduced in [15]. All the mutual informations are evaluated by Monte-Carlo simulations with 20,000 channel realizations. All the values given by the iterative algorithms are calculated with a relative precision of 10⁻³. The average mutual information I(Q*) obtained with the Vu–Paulraj direct approach [2] is also represented as a reference.
Figure 1. Comparison with the Vu–Paulraj algorithm: (a) r = t = 4; (b) r = t = 8.
Vu–Paulraj's algorithm is a direct approach to optimize I(Q). It is composed of two nested iterative loops. The inner loop evaluates Q*^(n) = argmax_Q [I(Q) + k_barrier log det Q] thanks to the Newton algorithm under the constraint (1/t)Tr(Q) = 1, for a given value of k_barrier and a given starting point Q₀^(n). Maximizing I(Q) + k_barrier log det Q instead of I(Q) ensures that Q remains positive semi-definite through the steps of the Newton algorithm; this is the so-called barrier interior-point method. The outer loop then decreases k_barrier by a constant factor μ and gives the inner loop the next starting point Q₀^(n+1) = Q*^(n). The algorithm stops when the desired precision is obtained or, as the Newton algorithm requires heavy Monte-Carlo simulations for the evaluation of the gradient and of the Hessian of I(Q), when the number of iterations of the outer loop reaches a given number N_max. As in [2], we chose N_max = 10, μ = 100, 20,000 trials for the Monte-Carlo simulations, and we started with k_barrier = 1/100.
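As a rough illustration of this barrier approach, the sketch below implements a simplified first-order variant: plain Monte-Carlo gradient ascent on I(Q) + k_barrier log det(Q) instead of the Newton steps of [2], with an eigenvalue clipping and a trace renormalization as crude substitutes for a proper constrained update. It is only meant to show the structure (and the cost) of the two nested loops; the i.i.d. channel sampler and all parameter values are our own illustrative assumptions.

```python
import numpy as np

def mc_grad_I(Q, sigma2, r, t, n_trials=500, seed=0):
    """Monte-Carlo estimate of the gradient of I(Q), namely
    E[H^H (sigma^2 I + H Q H^H)^{-1} H], for an i.i.d. CN(0, 1/t) channel."""
    rng = np.random.default_rng(seed)
    G = np.zeros((t, t), dtype=complex)
    for _ in range(n_trials):
        H = (rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))) / np.sqrt(2 * t)
        G += H.conj().T @ np.linalg.inv(sigma2 * np.eye(r) + H @ Q @ H.conj().T) @ H
    return G / n_trials

def barrier_ascent(sigma2, r, t, k_barrier=1e-2, mu=100.0, n_outer=3, n_inner=50, step=0.05):
    """Crude first-order stand-in for the barrier method: ascend
    I(Q) + k_barrier log det(Q), keep Q in the cone, renormalize the trace,
    then shrink k_barrier by the factor mu."""
    Q = np.eye(t, dtype=complex)
    for _ in range(n_outer):
        for _ in range(n_inner):
            grad = mc_grad_I(Q, sigma2, r, t) + k_barrier * np.linalg.inv(Q)
            Q = Q + step * grad
            Q = (Q + Q.conj().T) / 2                     # keep Q Hermitian
            w, U = np.linalg.eigh(Q)
            Q = (U * np.maximum(w, 1e-8)) @ U.conj().T   # stay positive definite
            Q *= t / np.trace(Q).real                    # enforce (1/t) Tr(Q) = 1
        k_barrier /= mu
    return Q

print(np.round(barrier_ascent(sigma2=0.1, r=4, t=4), 3))
```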
Both Figure 1(a) and Figure 1(b) show that maximizing Ī(Q) over the input covariance matrix leads to a significant improvement of I(Q). Moreover, the two water-filling procedures presented in this section and Vu–Paulraj's direct approach all produce the same results, confirming the theoretical results stated above.
Nonetheless, the execution times differ considerably from one algorithm to another. The indirect approaches are computationally much more efficient: in Vu–Paulraj's algorithm, the evaluation of the gradient and of the Hessian of I(Q) requires heavy Monte-Carlo simulations, which drastically slows down the calculations. Besides, the water-filling introduced in [15] is computationally slightly more efficient than the so-called frozen water-filling: instead of two nested iterative loops (the outer one optimizing the input covariance matrix Q, the inner one solving the canonical equations), there is only one loop performing both calculations. For the present case with r = t = 8, an average of 39.7 fixed-point iterations is needed for the frozen water-filling to converge, whereas the water-filling introduced in [15] needs only 7.9 fixed-point iterations on average. Table 2 illustrates these comparisons; for the three algorithms, the average execution time needed to obtain the input covariance matrix is given, on a 2.93 GHz Intel Core 2 Duo CPU with 4 GB of RAM, for both r = t = 4 and r = t = 8.
Table 2. Average execution time.

                           r = t = 4    r = t = 8
  Vu–Paulraj               503 s        1877 s
  Frozen water-filling     17.3 ms      20.1 ms
  Water-filling of [15]    5.10 ms      7.13 ms

6. Concluding Remarks

In this paper, we have given a comprehensive introduction to the evaluation of the capacity achieving covariance matrices of MIMO channels using large system approximation approaches. When the MIMO channel is known at the receiver and its probability distribution is available at the transmitter, the direct maximization of the mutual information w.r.t. the input covariance matrix leads to high complexity algorithms relying on intensive Monte-Carlo simulations. An alternative approach consists in approximating the mutual information in the large number of antennas regime and optimizing the approximation. We have recalled how the large system approximations can be derived in the context of a number of MIMO channels, and have addressed their optimization. We have emphasized important methodological points that are not necessarily covered by the existing literature, such as the strict concavity of the approximation, the structure of the argument of its maximum, the accuracy of the large system approach w.r.t. the number of antennas, and the justification of iterative water-filling optimization algorithms. We finally mention that the mutual information considered in this paper provides insights on the performance of MIMO systems equipped with maximum likelihood receivers. As these receivers may be complex to implement, reduced complexity schemes such as minimum mean square error (MMSE) receivers are often considered. In this context, the relevant mutual information to be optimized versus the input precoder is different, and is defined as the sum over the transmit antennas of the terms E[log(1 + β_j)], where β_j represents the output MMSE signal-to-interference-plus-noise ratio associated with the stream sent by antenna j. The corresponding optimization problems are more difficult because, in contrast with the problems considered in the present paper, they are not convex. However, high accuracy large system approximation techniques can still be developed (see e.g., [28,29]), and provide both insights on the optimal precoders and low complexity maximization algorithms. The case of Rayleigh bi-correlated MIMO channels was recently addressed in [30], where the structure of the optimal precoders was analyzed. However, important work remains to be done in the context of the other usual MIMO channels.

References

  1. Telatar, E. Capacity of multi-antenna Gaussian channels. Eur. Trans. Telecommun. 1999, 10, 585–595.
  2. Vu, M.; Paulraj, A. Capacity Optimization for Rician Correlated MIMO Wireless Channels. In Proceedings of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, November 2005; pp. 133–138.
  3. Goldsmith, A.; Jafar, S.; Jindal, N.; Vishwanath, S. Capacity limits of MIMO channels. IEEE J. Select. Areas Commun. 2003, 21, 684–702.
  4. Jafar, S.; Goldsmith, A. Transmitter optimization and optimality of beamforming for multiple antenna systems. IEEE Trans. Wirel. Commun. 2004, 3, 1165–1175.
  5. Hosli, D.; Kim, Y.; Lapidoth, A. Monotonicity results for coherent MIMO Rician channels. IEEE Trans. Inf. Theory 2005, 51, 4334–4339.
  6. Moustakas, A.; Simon, S.; Sengupta, A. MIMO capacity through correlated channels in the presence of correlated interferers and noise: A (not so) large N analysis. IEEE Trans. Inf. Theory 2003, 49, 2545–2561.
  7. Moustakas, A.; Simon, S. On the outage capacity of correlated multiple-path MIMO channels. IEEE Trans. Inf. Theory 2007, 53, 3887.
  8. Taricco, G. Asymptotic mutual information statistics of separately correlated Rician fading MIMO channels. IEEE Trans. Inf. Theory 2008, 54, 3490–3504.
  9. Wen, C.; Jin, S.; Wong, K.; Chen, J.; Ting, P. On the Ergodic Capacity of Jointly-Correlated Rician Fading MIMO Channels. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 2011; pp. 3224–3227.
  10. Tulino, A.; Verdú, S. Random Matrix Theory and Wireless Communications; Now Publishers: Hanover, MA, USA, 2004.
  11. Dumont, J.; Hachem, W.; Lasaulce, S.; Loubaton, P.; Najim, J. On the capacity achieving covariance matrix for Rician MIMO channels: An asymptotic approach. IEEE Trans. Inf. Theory 2010, 56, 1048–1069.
  12. Dupuy, F.; Loubaton, P. On the capacity achieving covariance matrix for frequency selective MIMO channels using the asymptotic approach. IEEE Trans. Inf. Theory 2011, 57, 5737–5753.
  13. Couillet, R.; Debbah, M.; Silverstein, J. A deterministic equivalent for the analysis of correlated MIMO multiple access channels. IEEE Trans. Inf. Theory 2011, 57, 3493–3514.
  14. Wen, C.; Ting, P.; Chen, J. Asymptotic analysis of MIMO wireless systems with spatial correlation at the receiver. IEEE Trans. Commun. 2006, 54, 349–363.
  15. Taricco, G.; Riegler, E. On the ergodic capacity of correlated Rician fading MIMO channels with interference. IEEE Trans. Inf. Theory 2011, 57, 4123–4137.
  16. Wen, C.; Jin, S.; Wong, K. On the sum-rate of multiuser MIMO uplink channels with jointly-correlated Rician fading. IEEE Trans. Commun. 2011, 59, 2883–2895.
  17. Luenberger, D. Optimization by Vector Space Methods; John Wiley and Sons: New York, NY, USA, 1969.
  18. Marčenko, V.; Pastur, L. Distribution of eigenvalues for some sets of random matrices. Math. USSR Sb. 1967, 1, 457.
  19. Pastur, L.; Shcherbina, M. Eigenvalue Distribution of Large Random Matrices. In Mathematical Surveys and Monographs; American Mathematical Society: Providence, RI, USA, 2011.
  20. Bai, Z.; Silverstein, J. Spectral Analysis of Large Dimensional Random Matrices (Springer Series in Statistics), 2nd ed.; Springer: Heidelberg, Germany, 2009.
  21. Girko, V. Theory of Stochastic Canonical Equations, Volume I; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2001; Chapter 7.
  22. Taricco, G. On the Capacity of Separately-Correlated MIMO Rician Channels. In Proceedings of the Global Telecommunications Conference, San Francisco, CA, USA, December 2006.
  23. Sayeed, A. Deconstructing multiantenna fading channels. IEEE Trans. Signal Process. 2002, 50, 2563–2579.
  24. Hachem, W.; Loubaton, P.; Najim, J. Deterministic equivalents for certain functionals of large random matrices. Ann. Appl. Probab. 2007, 17, 875–930.
  25. Borwein, J.; Lewis, A. Convex Analysis and Nonlinear Optimization: Theory and Examples; Springer: New York, NY, USA, 2000.
  26. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  27. Bolcskei, H.; Gesbert, D.; Paulraj, A. On the capacity of OFDM-based spatial multiplexing systems. IEEE Trans. Commun. 2002, 50, 225–234.
  28. Kumar, K.; Caire, G.; Moustakas, A. Asymptotic performance of linear receivers in MIMO fading channels. IEEE Trans. Inf. Theory 2009, 55, 4398–4418.
  29. Moustakas, A.; Kumar, K.; Caire, G. Performance of MMSE MIMO Receivers: A Large N Analysis for Correlated Channels. In Proceedings of the IEEE Vehicular Technology Conference, Barcelona, Spain, 2009.
  30. Artigue, C.; Loubaton, P. On the precoder design of flat fading MIMO systems equipped with MMSE receivers: A large system approach. IEEE Trans. Inf. Theory 2011, 57, 4138–4155.
