Article

Machine Learning-Based Dimension Optimization for Two-Stage Precoder in Massive MIMO Systems with Limited Feedback

Jinho Kang, Jung Hoon Lee and Wan Choi
1 School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
2 Department of Electronics Engineering and Applied Communications Research Center, Hankuk University of Foreign Studies, Yongin 17035, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(14), 2894; https://doi.org/10.3390/app9142894
Submission received: 12 June 2019 / Revised: 15 July 2019 / Accepted: 17 July 2019 / Published: 19 July 2019

Abstract
A two-stage precoder is widely considered in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) systems to resolve the channel feedback overhead problem. In massive MIMO systems, the users on a network can be divided into several user groups with similar spatial antenna correlations. With the two-stage precoder, the outer precoder reduces the channel dimensions and mitigates inter-group interference at the first stage, while the inner precoder eliminates intra-group interference within the reduced dimensions at the second stage. In this case, the dimension of the effective channel reduced by the outer precoder is important, as it trades off the inter-group interference, the intra-group interference, and the performance loss from the quantized channel feedback. In this paper, we propose a machine learning framework to find the dimensions reduced by the outer precoder that maximize the average sum rate, where the original problem is NP-hard. Our machine learning framework uses a deep neural network whose inputs are the channel statistics and whose outputs are the effective channel dimensions after outer precoding. The numerical results show that our proposed machine learning-based dimension optimization achieves an average sum rate comparable to the optimal performance obtained by brute-force search, which is not feasible in practice.

1. Introduction

Massive multiple-input multiple-output (MIMO) is one of the most promising technologies for next-generation wireless mobile communication systems [1,2,3,4]. The large-scale antenna array equipped at a base station (BS) can considerably improve the data rate, energy efficiency, and reliability; theoretical results show that simple linear transceivers can almost perfectly cancel inter-user interference. In this case, accurate knowledge of channel state information (CSI) at the BS is crucial to achieve the potential gain from large-scale antennas [1,2,3,4]. In time division duplex (TDD) systems, the BS can acquire CSI during uplink training thanks to channel reciprocity [4,5], and hence many existing works on massive MIMO systems consider TDD systems. However, many current wireless mobile communication systems are frequency division duplex (FDD) systems [6,7], whose uplink and downlink channels are independent of each other, which motivates research to achieve the potential gains also in FDD systems [8,9,10,11,12,13].
In FDD systems, a BS obtains the CSI from the users' channel feedback due to the lack of reciprocity between the uplink and the downlink channels [6,7,8,9,10,11,12,13]. In massive MIMO systems, the CSI feedback overhead problem becomes more severe because of the large number of antennas; it has been shown that the feedback size should scale linearly with the number of antennas to fully obtain the multiplexing gain [6,7]. To resolve the feedback overhead problem, a two-stage precoder, which consists of an outer and an inner precoder, has been widely used for FDD massive MIMO systems [10,11,12,13]. In the two-stage precoder, the outer precoder projects the original channel space of large dimension onto a smaller-dimensional subspace, and then the inner precoder controls the inter-user interference as in multiuser MIMO precoding. There are various types of two-stage precoder designs for massive MIMO systems; among them, the hybrid architecture is widely considered for mmWave bands due to hardware implementation limitations [14,15,16,17]. As a fully digital architecture has many benefits in sub-6 GHz bands, there are also many papers that consider the fully digital architecture in massive MIMO systems [10,11,12,13]. Thus, we mainly consider joint spatial division and multiplexing (JSDM) [10] with a fully digital architecture in massive MIMO systems with a correlated channel environment.
The key idea of JSDM is to divide all users into multiple user groups according to their channel covariance matrices, and then the outer precoder and the inner precoder sequentially mitigate inter-group and inter-user interference, respectively [10,18]. At the first stage, the outer precoder mitigates the inter-group interference (IGI) by projecting the original channel onto a smaller-dimensional subspace. Then, the inner precoder cancels the same-group interference (SGI) using the dimension-reduced effective channels produced by the outer precoder; in this case, the BS exploits the quantized versions of the dimension-reduced effective channels obtained via limited feedback. The outer precoder design is therefore important, because it balances several performance-determining factors: the inter-group interference, the intra-group interference, and the channel quantization error. This motivates a sophisticated outer precoder design taking all of these factors into account. In this context, we optimized the dimension of the outer precoder in [19] for a downlink massive MIMO system with limited feedback based on a lower-bound analysis.
Meanwhile, machine learning has recently attracted considerable attention in wireless communication systems [20,21]. Machine learning techniques have shown good performance in many applications, from image processing to economics [20,21,22]. In addition, machine learning has been applied to physical layer processing, such as antenna selection and beamforming design in MIMO systems [23,24], and channel estimation and hybrid precoding for massive MIMO systems [25,26]. In particular, deep learning [27], one of the key machine learning techniques, tackles complicated nonlinear and computationally intensive problems in many areas [22] and outperforms many existing schemes [22,24,25,26].
In this paper, we extend our initial work [19] on the dimension optimization for the outer precoder design to a machine learning framework. Our contributions can be summarized as follows:
  • We introduce our two-stage precoder design with limited feedback, where only the quantized channel direction information (CDI) of the dimension-reduced effective channel is fed back to the BS. We first derive a lower bound of the average sum rate and then optimize the dimension of the outer precoder to maximize the average sum rate.
  • We propose the machine learning framework for the dimension optimization based on a deep neural network (DNN); we determine the DNN architecture of the input, hidden, and output layers as well as the training procedure. Our DNN architecture takes the eigenvalues of the covariance matrices of the user groups as inputs and returns the structure of the outer precoder, i.e., the dimensions allocated to all user groups.
  • We evaluate our DNN model and show that our proposed machine learning-based outer precoder dimension optimization improves the average sum rate and achieves near-optimal performance.
The rest of this paper is organized as follows. We introduce our system model in Section 2 and describe our problem in Section 3. We review our previous work on the lower bound of the achievable sum rate in Section 4 and propose a machine learning framework for the dimension optimization in Section 5. We evaluate the proposed DNN model in Section 6 and conclude the paper in Section 7.
Notations: We use upper- and lower-case boldface letters to denote matrices and vectors, respectively. The notations $(\cdot)^T$ and $(\cdot)^\dagger$ represent the transpose and the complex conjugate transpose, respectively. In addition, $\mathbb{E}[\cdot]$ and $\Pr[\cdot]$ denote the expectation and the probability, respectively.

2. System Model

Our system model is illustrated in Figure 1. We consider a single-cell multi-user massive MIMO downlink system with limited feedback, where a BS with M transmit antennas simultaneously serves K single-antenna users. Let $\mathbf{F}\,(\triangleq[\mathbf{f}_1,\ldots,\mathbf{f}_K])\in\mathbb{C}^{M\times K}$ be a linear precoding matrix and $\mathbf{d}\,(\triangleq[d_1,\ldots,d_K]^T)\in\mathbb{C}^{K\times1}$ be the data symbol vector for users $k\in\{1,\ldots,K\}$. Then, the transmit signal vector at the BS, denoted by $\mathbf{x}\in\mathbb{C}^{M\times1}$, is obtained as $\mathbf{x}=\mathbf{F}\mathbf{d}$. The received signal vector $\mathbf{y}\in\mathbb{C}^{K\times1}$ becomes

$$\mathbf{y} = \mathbf{H}^\dagger\mathbf{x} + \mathbf{n} = \mathbf{H}^\dagger\mathbf{F}\mathbf{d} + \mathbf{n}, \quad (1)$$

where $\mathbf{H}\,(\triangleq[\mathbf{h}_1,\ldots,\mathbf{h}_K])\in\mathbb{C}^{M\times K}$ is the concatenated channel matrix, $\mathbf{h}_k\in\mathbb{C}^{M\times1}$ is user k's channel vector, and $\mathbf{n}\,(\triangleq[n_1,\ldots,n_K]^T)\in\mathbb{C}^{K\times1}$ is additive white Gaussian noise.
In this paper, we assume that the BS utilizes only the channel direction information (CDI) for the beamforming vector design, which saves the additional feedback overhead required for power allocation [6,7]. Therefore, when the total transmit signal power is P, the BS allocates equal power to each user, i.e., $\mathbb{E}[|d_k|^2]=P/K$. Meanwhile, the beamforming vector for user k should satisfy $\mathbf{f}_k^\dagger\mathbf{f}_k=1$.
In our channel model, we consider the correlated Rayleigh channel such that $\mathbf{h}_k\sim\mathcal{CN}(\mathbf{0},\mathbf{R}_k)$, where $\mathbf{R}_k\in\mathbb{C}^{M\times M}$ is a positive semi-definite channel covariance matrix represented by $\mathbf{R}_k=\mathbf{U}_k\boldsymbol{\Lambda}_k\mathbf{U}_k^\dagger$ through the singular value decomposition (SVD). We denote by $r_k$ the number of non-zero singular values in $\boldsymbol{\Lambda}_k$. Then, with the Karhunen–Loève representation, $\mathbf{h}_k$ can be represented by [9,10,19]

$$\mathbf{h}_k = \mathbf{U}_k\boldsymbol{\Lambda}_k^{1/2}\mathbf{e}_k, \quad (2)$$

where $\mathbf{U}_k\in\mathbb{C}^{M\times r_k}$ is the tall unitary matrix whose $r_k$ columns are the eigenvectors of $\mathbf{R}_k$ corresponding to its non-zero singular values, $\boldsymbol{\Lambda}_k\in\mathbb{R}^{r_k\times r_k}$ is the diagonal matrix of the $r_k$ non-zero positive eigenvalues, and $\mathbf{e}_k\in\mathbb{C}^{r_k\times1}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{r_k})$.
Assuming the one-ring scattering model with the spatial correlation of a uniform linear array (ULA) at the transmitter (as shown in Figure 1), the $(p,q)$-th element of the channel covariance matrix of the g-th group is given by [9,10,19]

$$[\mathbf{R}_g]_{p,q} = \frac{1}{2\Delta_g}\int_{\theta_g-\Delta_g}^{\theta_g+\Delta_g} e^{-j2\pi\frac{\delta_a}{\lambda_c}(p-q)\sin\theta}\,d\theta, \quad (3)$$

where $\lambda_c$ is the carrier wavelength, $\delta_a$ is the spacing between adjacent antennas, and $\theta_g$ and $\Delta_g$ represent the azimuth center angle and the angular spread of the g-th group, respectively. With this model, the total K users are divided into G groups according to their channel covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$. We denote by $K_g$ the number of users in the g-th group, so that $K=\sum_{g=1}^G K_g$. We assume that all users in the same group have the same channel covariance matrix and that the BS perfectly knows the covariance matrices of all user groups, i.e., $\mathbf{R}_1,\ldots,\mathbf{R}_G$.
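As an illustration of (3), the following Python sketch builds the one-ring covariance matrix by numerical integration over the angular spread; the function name and parameter values are ours and only indicative, not the simulation settings of this paper.

```python
import numpy as np

def one_ring_covariance(M, theta_g, delta_g, spacing_ratio=0.5, n_points=2000):
    """Build the one-ring covariance matrix of (3) for an M-antenna ULA.

    theta_g: azimuth center angle (rad); delta_g: angular spread (rad);
    spacing_ratio: antenna spacing over carrier wavelength (delta_a / lambda_c).
    The (1 / 2*delta_g) * integral is evaluated as a uniform-grid average.
    """
    theta = np.linspace(theta_g - delta_g, theta_g + delta_g, n_points)
    p = np.arange(M)
    diff = p[:, None] - p[None, :]                       # (p - q) for all pairs
    integrand = np.exp(-1j * 2 * np.pi * spacing_ratio
                       * diff[..., None] * np.sin(theta))  # shape (M, M, n_points)
    return integrand.mean(axis=-1)

# Example: one group covariance and its rank (number of non-negligible eigenvalues)
R = one_ring_covariance(M=64, theta_g=np.pi / 4, delta_g=np.pi / 12)
eigvals = np.linalg.eigvalsh(R)[::-1]                    # descending order
r_g = int(np.sum(eigvals > 1e-6 * eigvals[0]))           # effective rank
print(r_g)
```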

3. Limited Feedback with Two-Stage Precoder

In this section, we first give an overview of the structure of the two-stage precoder and briefly explain the limited feedback method for it. Then, we formulate our problem.

3.1. Two-Stage Precoder

We adopt the two-stage precoder to reduce the complexity and feedback overhead induced by the large number of antennas [10,19], as illustrated in Figure 2. In this case, the linear precoder becomes $\mathbf{F}=\mathbf{T}\mathbf{V}$, where $\mathbf{T}\in\mathbb{C}^{M\times S}$ is the outer precoder for spatial division and $\mathbf{V}\in\mathbb{C}^{S\times K}$ is the inner precoder for spatial multiplexing in each group. The outer and the inner precoders are given by $\mathbf{T}\triangleq[\mathbf{T}_1,\ldots,\mathbf{T}_G]$ and $\mathbf{V}\triangleq\mathrm{bldiag}\{\mathbf{V}_1,\ldots,\mathbf{V}_G\}$, respectively, where $\mathbf{T}_g\in\mathbb{C}^{M\times s_g}$ and $\mathbf{V}_g\in\mathbb{C}^{s_g\times K_g}$ are the outer and inner precoders of the g-th group, and "bldiag" denotes a block-diagonal matrix. Here, the total dimension of the effective channels, i.e., $S\triangleq\sum_{g=1}^G s_g$, is a design parameter, where $s_g$ is the reduced dimension of the effective channel of the g-th group to be optimized.
We denote by $\mathbf{H}_g\triangleq[\mathbf{h}_{g,1},\ldots,\mathbf{h}_{g,K_g}]$ the channel matrix of the g-th group, so the concatenated channel matrix is $\mathbf{H}=[\mathbf{H}_1,\ldots,\mathbf{H}_G]$. For given channel covariance matrices, the dimension-reduced effective channel after outer precoding is represented as

$$\mathbf{H}^{\mathrm{eff}} \triangleq \mathbf{T}^\dagger\mathbf{H} = \begin{bmatrix} \mathbf{T}_1^\dagger\mathbf{H}_1 & \mathbf{T}_1^\dagger\mathbf{H}_2 & \cdots & \mathbf{T}_1^\dagger\mathbf{H}_G \\ \mathbf{T}_2^\dagger\mathbf{H}_1 & \mathbf{T}_2^\dagger\mathbf{H}_2 & \cdots & \mathbf{T}_2^\dagger\mathbf{H}_G \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{T}_G^\dagger\mathbf{H}_1 & \mathbf{T}_G^\dagger\mathbf{H}_2 & \cdots & \mathbf{T}_G^\dagger\mathbf{H}_G \end{bmatrix}. \quad (4)$$
Note that it is difficult for the BS to acquire the CDI of the whole effective channel $\mathbf{H}^{\mathrm{eff}}$ due to the heavy channel feedback overhead. Thus, we focus on a more practical approach with low computational complexity, which quantizes and feeds back only the dimension-reduced effective channel of each group, i.e., $\mathbf{H}_g^{\mathrm{eff}}\,(\triangleq\mathbf{T}_g^\dagger\mathbf{H}_g)\in\mathbb{C}^{s_g\times K_g}$, whose dimension is determined by $s_g$. Consequently, the received signal at user k in the g-th group is given by

$$y_{g,k} = \sqrt{\tfrac{P}{K}}\,\mathbf{h}_{g,k}^\dagger\mathbf{T}_g\mathbf{v}_{g,k}d_{g,k} + \underbrace{\sqrt{\tfrac{P}{K}}\,\mathbf{h}_{g,k}^\dagger\sum_{i\neq k}^{K_g}\mathbf{T}_g\mathbf{v}_{g,i}d_{g,i}}_{\text{same-group interference (SGI)}} + \underbrace{\sqrt{\tfrac{P}{K}}\,\mathbf{h}_{g,k}^\dagger\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}\mathbf{T}_c\mathbf{v}_{c,j}d_{c,j}}_{\text{inter-group interference (IGI)}} + n_{g,k}, \quad (5)$$

where $\mathbf{v}_{g,k}$ and $\mathbf{v}_{g,i}$ are the beamforming vectors of user k and user $i\,(\neq k)$ in the g-th group, respectively, which constitute the inner precoder of the g-th group $\mathbf{V}_g=[\mathbf{v}_{g,1},\ldots,\mathbf{v}_{g,K_g}]$; $\mathbf{v}_{c,j}$ is the beamforming vector of the j-th user in group $c\,(\neq g)$, i.e., $\mathbf{V}_c=[\mathbf{v}_{c,1},\ldots,\mathbf{v}_{c,K_c}]$; and $n_{g,k}\sim\mathcal{CN}(0,\sigma^2)$ is complex Gaussian noise with zero mean and variance $\sigma^2$. The second and third terms on the right-hand side of (5) are the SGI and the IGI, respectively.
Since each group is treated separately and only the CDI of its dimension-reduced effective channel is exploited, the IGI should be cancelled by the outer precoder based only on the channel covariance matrices $\mathbf{R}_1,\ldots,\mathbf{R}_G$. Hence, the outer precoder of the g-th group is designed with the criterion

$$\mathbf{H}_c^\dagger\mathbf{T}_g \approx \mathbf{0} \quad \text{for all } c\neq g. \quad (6)$$
According to the approximation (6), we adopt the block diagonalization (BD) method proposed in [10] for the design of the outer precoder in order to cancel the IGI among the other groups; it constructs the precoder from the null-space of the channels of the other groups. Note that the BD method is a generalization of zero-forcing channel inversion for multi-user MIMO channels with linear processing [10,28]. With an approach similar to [10], the outer precoder of the g-th group is designed as follows. We define the matrix

$$\mathbf{Y}_g \triangleq [\mathbf{U}_1',\ldots,\mathbf{U}_{g-1}',\mathbf{U}_{g+1}',\ldots,\mathbf{U}_G'], \quad (7)$$

where $\mathbf{U}_c'$ is an $M\times r_c'$ matrix comprised of the $r_c'\,(\leq r_c)$ dominant eigenvectors in $\mathbf{U}_c$, the eigenmatrix of the c-th group covariance matrix such that $\mathbf{R}_c=\mathbf{U}_c\boldsymbol{\Lambda}_c\mathbf{U}_c^\dagger$. The dimension of $\mathbf{Y}_g$ is $M\times\sum_{c\neq g}^G r_c'$, which must satisfy $\sum_{c\neq g}^G r_c'\leq M$. Note that, if $\sum_{g=1}^G r_g\leq M$, we can choose $r_c'=r_c$ to reflect the eigenvectors of the other groups exactly. Thus, we assume $r_c'=r_c$ $(c\neq g)$ in the definition (7) to construct $\mathbf{Y}_g$ for each group $g\in\{1,2,\ldots,G\}$. Using the SVD, $\mathbf{Y}_g$ can be expressed as

$$\mathbf{Y}_g = \boldsymbol{\Psi}_g\boldsymbol{\Sigma}_g\big[\boldsymbol{\Phi}_g^{(1)},\boldsymbol{\Phi}_g^{(0)}\big]^\dagger, \quad (8)$$

where $\boldsymbol{\Phi}_g^{(0)}$ is an $M\times(M-\sum_{c\neq g}^G r_c')$ sub-unitary matrix ($(\boldsymbol{\Phi}_g^{(0)})^\dagger\boldsymbol{\Phi}_g^{(0)}=\mathbf{I}_{M-\sum_{c\neq g}^G r_c'}$) comprised of the orthonormal bases of the null-space of $\mathbf{Y}_g$. After projecting onto this null-space, i.e., $\mathrm{Span}(\boldsymbol{\Phi}_g^{(0)})$, the covariance matrix of the projected channel is obtained by

$$\bar{\mathbf{R}}_g = (\boldsymbol{\Phi}_g^{(0)})^\dagger\mathbf{R}_g\boldsymbol{\Phi}_g^{(0)} = \bar{\mathbf{U}}_g\bar{\boldsymbol{\Lambda}}_g\bar{\mathbf{U}}_g^\dagger, \quad (9)$$

where the right-hand side of (9) is obtained from the SVD. Selecting the $s_g\,(\leq r_g)$ dominant eigenmodes of $\bar{\mathbf{R}}_g$, we can construct the dimension-reduced effective channel of group g according to the BD method. Consequently, the outer precoder of the g-th group, i.e., $\mathbf{T}_g\in\mathbb{C}^{M\times s_g}$, is given by

$$\mathbf{T}_g = \boldsymbol{\Phi}_g^{(0)}\bar{\mathbf{U}}_g', \quad (10)$$

where $\bar{\mathbf{U}}_g'$ contains the $s_g$ dominant eigenvectors of $\bar{\mathbf{U}}_g$, and $s_g$ is the design parameter to be optimized. Note that $\mathbf{T}_g^\dagger\mathbf{T}_g = (\bar{\mathbf{U}}_g')^\dagger(\boldsymbol{\Phi}_g^{(0)})^\dagger\boldsymbol{\Phi}_g^{(0)}\bar{\mathbf{U}}_g' = (\bar{\mathbf{U}}_g')^\dagger\bar{\mathbf{U}}_g' = \mathbf{I}_{s_g}$.
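The construction in (7)–(10) can be summarized in a short NumPy sketch; this is a minimal illustration of the BD outer precoder under the assumption $r_c'=r_c$, with function names of our own choosing.

```python
import numpy as np

def bd_outer_precoder(R_list, s_list, rank_tol=1e-6):
    """Minimal sketch of the BD outer precoders T_g of (7)-(10).

    R_list: list of G group covariance matrices (M x M);
    s_list: list of reduced dimensions s_g per group.
    Returns a list of M x s_g outer precoders with T_g^H T_g = I.
    """
    G, M = len(R_list), R_list[0].shape[0]
    # Dominant eigenvectors U_g (compact eigenbasis) of every group covariance
    U_list = []
    for R in R_list:
        lam, U = np.linalg.eigh(R)
        U_list.append(U[:, lam > rank_tol * lam.max()])
    T_list = []
    for g in range(G):
        # (7): stack the eigenbases of all other groups
        Y_g = np.hstack([U_list[c] for c in range(G) if c != g])
        # (8): orthonormal basis of the null-space of Y_g via the full SVD
        _, sv, Vh = np.linalg.svd(Y_g.conj().T, full_matrices=True)
        rank = int(np.sum(sv > rank_tol * sv.max()))
        Phi0 = Vh[rank:, :].conj().T                    # M x (M - rank)
        # (9): covariance of the projected channel and its eigenmodes
        R_bar = Phi0.conj().T @ R_list[g] @ Phi0
        lam_bar, U_bar = np.linalg.eigh(R_bar)
        order = np.argsort(lam_bar)[::-1]               # descending eigenvalues
        # (10): keep the s_g dominant eigenmodes
        T_list.append(Phi0 @ U_bar[:, order[:s_list[g]]])
    return T_list
```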
For the inner precoder, we adopt zero-forcing (ZF) beamforming to mitigate the multiuser interference among the users in each group (i.e., the SGI). The beamforming vector of user k in the g-th group is constructed from the null-space of the effective channel vectors of all the other users in the g-th group. Obviously, a minimum mean squared error (MMSE)-type precoder can achieve better performance than the ZF precoder. However, with the MMSE precoder, the optimal regularization parameter of the inner precoder depends on the outer precoder design, so it is not easy to find; moreover, in the limited feedback environment, the channel quantization errors make the optimal regularization factor even more difficult to find. Thus, we adopt the ZF precoder for the inner precoder design thanks to its simplicity and analytical tractability. Note that ZF beamforming is asymptotically optimal among all downlink beamforming strategies in the high SNR region [6,29], and it guarantees high spectral efficiency for large-scale antennas with low-complexity linear processing [2,9]. The inner precoder of the g-th group is then given by

$$\mathbf{V}_g = [\mathbf{v}_{g,1},\ldots,\mathbf{v}_{g,K_g}] \in \mathbb{C}^{s_g\times K_g}, \quad (11)$$

where $\mathbf{v}_{g,k}$ is the ZF beamforming vector of user k in the g-th group. Note that each beamforming vector is normalized such that $\|\mathbf{v}_{g,k}\|^2=1$, since the two-stage beamforming vector of user k in the g-th group with the BD-based outer precoder, i.e., $\mathbf{f}_k=\mathbf{T}_g\mathbf{v}_{g,k}$, should satisfy $\mathbf{f}_k^\dagger\mathbf{f}_k=1$:

$$\mathbf{f}_k^\dagger\mathbf{f}_k = \mathbf{v}_{g,k}^\dagger\mathbf{T}_g^\dagger\mathbf{T}_g\mathbf{v}_{g,k} = \mathbf{v}_{g,k}^\dagger\mathbf{I}_{s_g}\mathbf{v}_{g,k} = \mathbf{v}_{g,k}^\dagger\mathbf{v}_{g,k} = \|\mathbf{v}_{g,k}\|^2 = 1, \quad (12)$$

which ensures the equal power allocation at the BS in (1). The construction of the ZF beamforming vector of user k in the g-th group is detailed in the following subsection.

3.2. Limited Feedback Method with a Two-Stage Precoder

In the previous section, the dimension of the effective channel of each user in group g was reduced to $s_g$ by the outer precoder. Hence, we focus on the limited feedback system used to acquire the CDI of the dimension-reduced effective channel of each group, i.e., $\mathbf{H}_g^{\mathrm{eff}}$. We define the dimension-reduced effective channel of user k in the g-th group as

$$\mathbf{h}_{g,k}^{\mathrm{eff}} \triangleq \mathbf{T}_g^\dagger\mathbf{h}_{g,k} \in \mathbb{C}^{s_g\times1}. \quad (13)$$

Given the covariance matrix $\mathbf{R}_g$ and the outer precoder $\mathbf{T}_g$, each user only needs to quantize its effective channel $\mathbf{h}_{g,k}^{\mathrm{eff}}$ and feed the quantized channel back to the BS.
For the quantization of the effective channel, we adopt the random vector quantizer (RVQ), which is widely used to analyze the effects of quantization error and performs close to optimal quantization in a Rayleigh fading channel environment [6,7,9]. Using the RVQ, the Rayleigh-fading component of the effective channel in (13) should be quantized. By the Karhunen–Loève representation, the effective channel in (13) can be decomposed by the SVD [13] as

$$\mathbf{h}_{g,k}^{\mathrm{eff}} = \mathbf{T}_g^\dagger\mathbf{U}_g\boldsymbol{\Lambda}_g^{1/2}\mathbf{e}_{g,k} = \boldsymbol{\Omega}_g\bar{\boldsymbol{\Sigma}}_g\boldsymbol{\Gamma}_g^\dagger\mathbf{e}_{g,k} = \boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\mathbf{w}_{g,k}, \quad (14)$$

where $\boldsymbol{\Omega}_g\bar{\boldsymbol{\Sigma}}_g\boldsymbol{\Gamma}_g^\dagger$ ($\boldsymbol{\Omega}_g\in\mathbb{C}^{s_g\times s_g}$, $\bar{\boldsymbol{\Sigma}}_g\in\mathbb{C}^{s_g\times r_g}$, $\boldsymbol{\Gamma}_g\in\mathbb{C}^{r_g\times r_g}$) is the SVD of $\mathbf{T}_g^\dagger\mathbf{U}_g\boldsymbol{\Lambda}_g^{1/2}$; $\boldsymbol{\Sigma}_g\in\mathbb{C}^{s_g\times s_g}$ is the matrix comprised of the first $s_g$ columns of $\bar{\boldsymbol{\Sigma}}_g$, i.e., the diagonal matrix with $s_g$ non-zero positive singular values; and $\mathbf{w}_{g,k}\in\mathbb{C}^{s_g\times1}$ is the vector of the first $s_g$ elements of $\boldsymbol{\Gamma}_g^\dagger\mathbf{e}_{g,k}$. Note that $\mathbf{w}_{g,k}$ follows the distribution $\mathcal{CN}(\mathbf{0},\mathbf{I}_{s_g})$, and we have

$$\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g = \boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g(\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g)^\dagger \quad (15)$$

due to the facts that $\mathbb{E}[\mathbf{h}_{g,k}^{\mathrm{eff}}(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger]=\mathbf{T}_g^\dagger\mathbb{E}[\mathbf{h}_{g,k}\mathbf{h}_{g,k}^\dagger]\mathbf{T}_g=\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g$ from (13), and $\mathbb{E}[\mathbf{h}_{g,k}^{\mathrm{eff}}(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger]=\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\mathbb{E}[\mathbf{w}_{g,k}\mathbf{w}_{g,k}^\dagger](\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g)^\dagger=\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g(\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g)^\dagger$ from (14) and $\mathbf{w}_{g,k}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{s_g})$.
Allocating B bits to each user's feedback, the RVQ codebook of user k in the g-th group, i.e., $\mathcal{C}_{g,k}=\{\mathbf{c}_{g,k,1},\ldots,\mathbf{c}_{g,k,2^B}\}$, consists of $2^B$ randomly chosen isotropic $s_g$-dimensional unit-norm vectors. The quantized CDI of $\mathbf{w}_{g,k}$ in (14), i.e., $\hat{\mathbf{w}}_{g,k}$, is then obtained by

$$\hat{\mathbf{w}}_{g,k} = \arg\max_{\mathbf{c}\in\mathcal{C}_{g,k}}\cos^2\big(\angle(\tilde{\mathbf{w}}_{g,k},\mathbf{c})\big) = \arg\max_{\mathbf{c}\in\mathcal{C}_{g,k}}|\tilde{\mathbf{w}}_{g,k}^\dagger\mathbf{c}|^2, \quad (16)$$

where $\tilde{\mathbf{w}}_{g,k}=\mathbf{w}_{g,k}/\|\mathbf{w}_{g,k}\|$. The quantization error, denoted by $Z_{g,k}^{\hat{w}}\in[0,1]$, is defined as

$$Z_{g,k}^{\hat{w}} \triangleq 1 - |\tilde{\mathbf{w}}_{g,k}^\dagger\hat{\mathbf{w}}_{g,k}|^2. \quad (17)$$

For an arbitrary codeword $\mathbf{c}\in\mathcal{C}_{g,k}$ in (16), the value $1-|\tilde{\mathbf{w}}_{g,k}^\dagger\mathbf{c}|^2$ follows the beta distribution with parameters $(s_g-1,1)$ because $|\tilde{\mathbf{w}}_{g,k}^\dagger\mathbf{c}|^2$ is the squared inner product of two independent and isotropic unit-norm random vectors in $\mathbb{C}^{s_g}$ [6,7]. Consequently, the quantization error of the B-bit RVQ, i.e., $Z_{g,k}^{\hat{w}}$ in (17), is the minimum of $2^B$ independent beta-distributed random variables, whose complementary cumulative distribution function (CCDF) is given by $\Pr[Z_{g,k}^{\hat{w}}>z]=(1-z^{s_g-1})^{2^B}$, and whose expectation is bounded as [6,7]

$$\mathbb{E}[Z_{g,k}^{\hat{w}}] < 2^{-\frac{B}{s_g-1}}. \quad (18)$$
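A minimal NumPy sketch of the RVQ step in (16)–(17) is given below; the function name is ours, the codebook is regenerated per call, and the empirical mean of the quantization error can be checked against the bound (18).

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq_quantize(w, B, rng):
    """RVQ of a complex s_g-dimensional direction as in (16)-(17).

    Returns the selected codeword w_hat and the quantization error Z.
    """
    s = w.shape[0]
    # 2^B isotropic unit-norm codewords (complex Gaussian, normalized)
    C = rng.standard_normal((2**B, s)) + 1j * rng.standard_normal((2**B, s))
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    w_tilde = w / np.linalg.norm(w)
    gains = np.abs(C.conj() @ w_tilde) ** 2       # |w_tilde^H c|^2 per codeword
    idx = int(np.argmax(gains))
    return C[idx], 1.0 - gains[idx]

# Empirical check of the bound (18): E[Z] < 2^(-B / (s_g - 1))
s_g, B, trials = 5, 8, 2000
Z = [rvq_quantize(rng.standard_normal(s_g) + 1j * rng.standard_normal(s_g), B, rng)[1]
     for _ in range(trials)]
print(np.mean(Z), 2 ** (-B / (s_g - 1)))
```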
After receiving the feedback of $\hat{\mathbf{w}}_{g,k}$, the BS obtains the quantized CDI of the dimension-reduced effective channel according to (14) as

$$\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}} = \frac{\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\hat{\mathbf{w}}_{g,k}}{\|\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\hat{\mathbf{w}}_{g,k}\|}, \quad (19)$$

and the quantization error of the dimension-reduced effective channel, denoted by $Z_{g,k}^{\hat{h}^{\mathrm{eff}}}\in[0,1]$, is defined as

$$Z_{g,k}^{\hat{h}^{\mathrm{eff}}} \triangleq 1 - |(\tilde{\mathbf{h}}_{g,k}^{\mathrm{eff}})^\dagger\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}}|^2, \quad (20)$$

where $\tilde{\mathbf{h}}_{g,k}^{\mathrm{eff}}=\mathbf{h}_{g,k}^{\mathrm{eff}}/\|\mathbf{h}_{g,k}^{\mathrm{eff}}\|$. Note that the distribution of $Z_{g,k}^{\hat{h}^{\mathrm{eff}}}$ in (20) can differ from the (beta) distribution of $Z_{g,k}^{\hat{w}}$ in (17), since the quantized CDI of the dimension-reduced effective channel is projected through the covariance matrix and the outer precoder.
Based on the quantized CDI of the dimension-reduced effective channels of the g-th group, i.e., $\hat{\mathbf{H}}_g^{\mathrm{eff}}=[\hat{\mathbf{h}}_{g,1}^{\mathrm{eff}},\ldots,\hat{\mathbf{h}}_{g,K_g}^{\mathrm{eff}}]\in\mathbb{C}^{s_g\times K_g}$, the BS constructs the inner precoder of the g-th group, $\mathbf{V}_g=[\mathbf{v}_{g,1},\ldots,\mathbf{v}_{g,K_g}]$, where the ZF beamforming vector of user k in the g-th group is obtained as

$$\mathbf{v}_{g,k} = \frac{\mathbf{A}_{g,k}\mathbf{A}_{g,k}^\dagger\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}}}{\|\mathbf{A}_{g,k}\mathbf{A}_{g,k}^\dagger\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}}\|}. \quad (21)$$

Here, $\mathbf{A}_{g,k}\mathbf{A}_{g,k}^\dagger\in\mathbb{C}^{s_g\times s_g}$ is the projector onto the null-space of the quantized CDIs of the effective channels of all other users in group g, where $\mathbf{A}_{g,k}$ is an $s_g\times(s_g-K_g+1)$ submatrix comprised of orthonormal column vectors, obtained from the SVD of $\hat{\mathbf{H}}_{g,-k}^{\mathrm{eff}}$ such that

$$\hat{\mathbf{H}}_{g,-k}^{\mathrm{eff}} = \big[\mathbf{A}_{g,k}^{(1)},\mathbf{A}_{g,k}\big]\boldsymbol{\Xi}_{g,k}\mathbf{L}_{g,k}^\dagger, \quad (22)$$

where $\hat{\mathbf{H}}_{g,-k}^{\mathrm{eff}}=[\hat{\mathbf{h}}_{g,1}^{\mathrm{eff}},\ldots,\hat{\mathbf{h}}_{g,k-1}^{\mathrm{eff}},\hat{\mathbf{h}}_{g,k+1}^{\mathrm{eff}},\ldots,\hat{\mathbf{h}}_{g,K_g}^{\mathrm{eff}}]$ is the matrix of the quantized CDIs of the effective channels of all other users in group g [11,29].
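The ZF construction in (21)–(22) amounts to projecting each user's quantized direction onto the null-space of the other users' quantized directions; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def zf_inner_precoder(H_hat_eff):
    """ZF beamforming vectors of (21)-(22) from quantized effective CDIs.

    H_hat_eff: s_g x K_g matrix of unit-norm quantized effective channels.
    Returns V_g with unit-norm columns v_{g,k}.
    """
    s_g, K_g = H_hat_eff.shape
    V = np.zeros((s_g, K_g), dtype=complex)
    for k in range(K_g):
        H_others = np.delete(H_hat_eff, k, axis=1)      # all other users' CDIs
        # Null-space basis A_{g,k} from the full SVD of H_others, as in (22)
        U, _, _ = np.linalg.svd(H_others, full_matrices=True)
        A = U[:, K_g - 1:]                              # s_g x (s_g - K_g + 1)
        v = A @ (A.conj().T @ H_hat_eff[:, k])          # project onto null-space
        V[:, k] = v / np.linalg.norm(v)                 # normalization as in (12)
    return V
```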

3.3. Problem Formulation

With the two-stage precoder and the quantized CDI of the effective channel, the signal-to-interference-plus-noise ratio (SINR) of user k in the g-th group is obtained from (5) as

$$\mathrm{SINR}_{g,k} = \frac{\gamma|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,k}|^2}{\underbrace{\gamma\sum_{i\neq k}^{K_g}|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,i}|^2}_{\mathrm{SGI}} + \underbrace{\gamma\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}|\mathbf{h}_{g,k}^\dagger\mathbf{T}_c\mathbf{v}_{c,j}|^2}_{\mathrm{IGI}} + 1}, \quad (23)$$

where $\gamma\triangleq\frac{P}{K\sigma^2}$ is the signal-to-noise ratio (SNR) at each user. Then, the average sum rate, denoted by $R_{\mathrm{sum}}$, is given by

$$R_{\mathrm{sum}} = \sum_{g=1}^{G}\sum_{k=1}^{K_g}\mathbb{E}\left[\log_2\left(1+\mathrm{SINR}_{g,k}\right)\right]. \quad (24)$$
Analyzing (23) and (24), since only the quantized CDI of the effective channel $\mathbf{h}_{g,k}^{\mathrm{eff}}$ is fed back to the BS, the IGI term in $\mathrm{SINR}_{g,k}$ is not affected by the quantized CDI; it is determined by the outer precoders of the other groups, i.e., $\{\mathbf{T}_c\}_{c\neq g}^G$. Exploiting the BD method for the outer precoder design as in (10), the quality of IGI cancellation, i.e., $\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}|\mathbf{h}_{g,k}^\dagger\mathbf{T}_c\mathbf{v}_{c,j}|^2$, depends on the reduced dimensions of the effective channels of the other groups, i.e., $\{s_c\}_{c\neq g}^G$. On the other hand, the magnitude of the desired signal term, i.e., $|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,k}|^2$, and the quality of SGI cancellation, i.e., $\sum_{i\neq k}^{K_g}|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,i}|^2$, are restricted by the dimension of the outer precoder of the g-th group, i.e., $s_g$ [19]. Note that the quantization error of the effective channel, i.e., $Z_{g,k}^{\hat{h}^{\mathrm{eff}}}$ in (20), is also governed by $s_g$ for a given feedback allocation of B bits. Therefore, the reduced dimensions of the effective channels of all groups (i.e., $s_1,\ldots,s_G$) should be jointly optimized considering all the interactions among them in order to maximize the average sum rate in (24). Thus, we formulate the optimization problem to find the optimal $s_1,\ldots,s_G$ as follows [19]:
$$\mathcal{P}1:\quad \underset{s_1,\ldots,s_G}{\mathrm{maximize}}\ \sum_{g=1}^{G}\sum_{k=1}^{K_g}\mathbb{E}\left[\log_2\left(1+\mathrm{SINR}_{g,k}\right)\right] \quad \mathrm{subject\ to}\quad s_g\in\mathbb{Z}^{+},\ K_g\leq s_g\leq r_g,\ g=1,\ldots,G. \quad (25)$$
The optimization problem $\mathcal{P}1$ is difficult to solve directly because it is a mixed-integer problem, which is generally known to be NP-hard. Moreover, the effect of the dimensions is implicit in the objective function, and the optimal solution can be obtained only by numerical search over every channel realization, which is almost impossible in practice. Note that, once the reduced dimensions $\{s_g\}_{g=1}^G$ are determined, the BS informs the users in the g-th group of $s_g$ so that they can quantize their $s_g$-dimensional effective channels with the corresponding codebook.
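For reference, the Monte Carlo evaluation of (23)–(24) that a brute-force search over (25) would call in its inner loop can be sketched as follows. It reuses the hypothetical helpers from the earlier sketches (one_ring_covariance, bd_outer_precoder, rvq_quantize, zf_inner_precoder) and, for brevity, quantizes the effective channel direction directly rather than through the decomposition (14), which is a simplification.

```python
import numpy as np

def avg_sum_rate(R_list, K_list, s_list, B, gamma, n_drops=200, rng=None):
    """Monte Carlo estimate of the average sum rate (24) for given {s_g}."""
    rng = rng or np.random.default_rng()
    G, M = len(R_list), R_list[0].shape[0]
    T = bd_outer_precoder(R_list, s_list)
    # Covariance square roots for channel generation h ~ CN(0, R)
    R_sqrt = [np.linalg.cholesky(R + 1e-9 * np.eye(M)) for R in R_list]
    rate = 0.0
    for _ in range(n_drops):
        H = [R_sqrt[g] @ (rng.standard_normal((M, K_list[g]))
                          + 1j * rng.standard_normal((M, K_list[g]))) / np.sqrt(2)
             for g in range(G)]
        V = []
        for g in range(G):
            H_eff = T[g].conj().T @ H[g]                      # (13)
            H_hat = np.stack([rvq_quantize(H_eff[:, k], B, rng)[0]
                              for k in range(K_list[g])], axis=1)
            V.append(zf_inner_precoder(H_hat))                # (21)
        for g in range(G):
            H_eff = T[g].conj().T @ H[g]
            for k in range(K_list[g]):
                sig = gamma * np.abs(H_eff[:, k].conj() @ V[g][:, k]) ** 2
                sgi = gamma * sum(np.abs(H_eff[:, k].conj() @ V[g][:, i]) ** 2
                                  for i in range(K_list[g]) if i != k)
                igi = gamma * sum(np.abs(H[g][:, k].conj() @ T[c] @ V[c][:, j]) ** 2
                                  for c in range(G) if c != g
                                  for j in range(K_list[c]))
                rate += np.log2(1 + sig / (sgi + igi + 1))    # (23)-(24)
    return rate / n_drops
```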

4. Our Previous Work on Dimension Optimization

As explained in the previous section, problem $\mathcal{P}1$ is hard to solve because it is NP-hard and the effect of the dimensions is implicit in the objective function. In this section, we briefly explain how we solved problem $\mathcal{P}1$ in our previous work [19].
In [19], we showed that the objective function of problem $\mathcal{P}1$ can be approximated [3] and then lower bounded as follows:

$$\sum_{g=1}^{G}\sum_{k=1}^{K_g}\mathbb{E}\left[\log_2\left(1+\mathrm{SINR}_{g,k}\right)\right] \approx \sum_{g=1}^{G}\sum_{k=1}^{K_g}\log_2\left(1+\frac{\mathbb{E}\left[\gamma|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,k}|^2\right]}{\mathbb{E}\left[\gamma\sum_{i\neq k}^{K_g}|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,i}|^2+\gamma\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}|\mathbf{h}_{g,k}^\dagger\mathbf{T}_c\mathbf{v}_{c,j}|^2\right]+1}\right) \quad (26)$$

$$> \sum_{g=1}^{G} R_g(s_1,\ldots,s_G), \quad (27)$$

where

$$R_g(s_1,\ldots,s_G) \triangleq \log_2\left(1+\frac{\gamma\left(\mathrm{Tr}(\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g)-\sum_{j=1}^{K_g-1}\lambda_j(\mathbf{R}_g)\right)}{\gamma\frac{K_g-1}{s_g-1}2^{-\frac{B}{s_g-1}}\mathrm{Tr}(\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g)+\gamma\sum_{c\neq g}^{G}K_c\,\mathrm{Tr}(\mathbf{T}_c^\dagger\mathbf{R}_g\mathbf{T}_c)+1}\right), \quad (28)$$

with $\lambda_j(\mathbf{R}_g)$ the j-th largest eigenvalue of $\mathbf{R}_g$. Thus, the effect of the dimensions becomes explicit in (27). Note that, given $\{s_g\}_{g=1}^G$, the BS can compute the lower bound (27) based only on the covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$ and the outer precoders $\{\mathbf{T}_g\}_{g=1}^G$, without the CDI of the $s_g$-dimensional effective channels (i.e., $\mathbf{h}_{g,k}^{\mathrm{eff}}$ for all $g\in\{1,\ldots,G\}$ and $k\in\{1,\ldots,K_g\}$).
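Because (28) depends only on the covariance matrices and the outer precoders, it is cheap to evaluate without any instantaneous CSI; a minimal NumPy sketch, assuming our reconstruction of (28) above and the helpers from the earlier snippets:

```python
import numpy as np

def rate_lower_bound(g, R_list, T_list, K_list, B, gamma):
    """Evaluate the per-group lower bound R_g of (28)."""
    G = len(R_list)
    R_g, T_g = R_list[g], T_list[g]
    K_g, s_g = K_list[g], T_list[g].shape[1]
    tr_g = np.real(np.trace(T_g.conj().T @ R_g @ T_g))
    # The K_g - 1 largest eigenvalues of R_g (eigvalsh returns ascending order)
    lam = np.linalg.eigvalsh(R_g)[::-1]
    signal = gamma * (tr_g - np.sum(lam[:K_g - 1]))
    sgi = gamma * (K_g - 1) / (s_g - 1) * 2 ** (-B / (s_g - 1)) * tr_g
    igi = gamma * sum(K_list[c]
                      * np.real(np.trace(T_list[c].conj().T @ R_g @ T_list[c]))
                      for c in range(G) if c != g)
    return np.log2(1 + signal / (sgi + igi + 1))
```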
For a practical design, we can establish an alternative optimization problem using the lower bound (28) as follows:

$$\mathcal{P}2:\quad \underset{s_1,\ldots,s_G}{\mathrm{maximize}}\ \sum_{g=1}^{G} R_g(s_1,\ldots,s_G) \quad \mathrm{subject\ to}\quad s_g\in\mathbb{Z}^{+},\ K_g\leq s_g\leq r_g,\ g=1,\ldots,G. \quad (29)$$
Problem $\mathcal{P}2$ is also a mixed-integer problem, and the objective function of (29) is not available in closed form because the outer precoders $\{\mathbf{T}_g\}_{g=1}^G$ are constructed by the procedure in (7)–(10) for each choice of $\{s_g\}_{g=1}^G$. Thus, the optimal solution of problem $\mathcal{P}2$ requires a combinatorial joint optimization of $s_1,\ldots,s_G$, i.e., a G-dimensional numerical search, which is still complex.
To reduce the complexity of the G-dimensional numerical search in problem $\mathcal{P}2$, we can assume that the dimensions of the effective channels are the same across all groups, i.e., $s_1=\cdots=s_G=s$. Then, problem $\mathcal{P}2$ reduces to

$$\mathcal{P}3:\quad \underset{s}{\mathrm{maximize}}\ \sum_{g=1}^{G} R_g(s) \quad \mathrm{subject\ to}\quad s\in\mathbb{Z}^{+},\ \max\{K_1,\ldots,K_G\}\leq s\leq\min\{r_1,\ldots,r_G\}, \quad (30)$$

and the optimal solution of problem $\mathcal{P}3$ can be obtained by a one-dimensional numerical search [19]. The detailed procedure to obtain the optimal solution of problem $\mathcal{P}3$ is described in Algorithm 1.
Algorithm 1 Finding the optimal solution of the problem $\mathcal{P}3$
 1: Input: Channel covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$, numbers of users in the groups $\{K_g\}_{g=1}^G$, SNR $\gamma$, and feedback size B
 2: Obtain $\mathbf{U}_g$ and $r_g$ for all $g\in\{1,\ldots,G\}$ by the SVD
 3: Define $s_{\min}\triangleq\max\{K_1,\ldots,K_G\}$ and $s_{\max}\triangleq\min\{r_1,\ldots,r_G\}$
 4: for $s=s_{\min},\ldots,s_{\max}$ do
 5:    Construct the outer precoder $\mathbf{T}_g\in\mathbb{C}^{M\times s}$ for all $g\in\{1,\ldots,G\}$ (see Section 3.1)
 6:    Compute $R_g(s)$ based on (28) for all $g\in\{1,\ldots,G\}$
 7:    Compute $\sum_{g=1}^{G} R_g(s)$
 8: end for
 9: Obtain $s^\star=\arg\max_{s_{\min}\leq s\leq s_{\max}}\sum_{g=1}^{G} R_g(s)$
10: Output: The optimal dimension $s^\star$
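Algorithm 1 translates directly into a one-dimensional search; a sketch reusing the hypothetical bd_outer_precoder and rate_lower_bound helpers from the earlier snippets:

```python
import numpy as np

def optimize_common_dimension(R_list, K_list, B, gamma, rank_tol=1e-6):
    """One-dimensional search of Algorithm 1 for problem P3 (s_1 = ... = s_G = s)."""
    G = len(R_list)
    # Step 2: effective ranks r_g from the eigenvalues of each covariance matrix
    ranks = []
    for R in R_list:
        lam = np.linalg.eigvalsh(R)
        ranks.append(int(np.sum(lam > rank_tol * lam.max())))
    # Step 3: search interval
    s_min, s_max = max(K_list), min(ranks)
    best_s, best_rate = s_min, -np.inf
    # Steps 4-9: evaluate the lower-bound sum rate for every candidate s
    for s in range(s_min, s_max + 1):
        T_list = bd_outer_precoder(R_list, [s] * G)      # Section 3.1
        total = sum(rate_lower_bound(g, R_list, T_list, K_list, B, gamma)
                    for g in range(G))
        if total > best_rate:
            best_s, best_rate = s, total
    return best_s
```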

5. Machine Learning Framework for Dimension Optimization

In this section, we propose the machine learning framework for the dimension optimization (i.e., problem $\mathcal{P}1$). Note that our machine learning framework is based on a deep neural network (DNN) and tackles problem $\mathcal{P}1$ directly.

5.1. Preliminary: The General DNN Architecture

Our proposed machine learning-based dimension optimization utilizes a DNN [27], so in this subsection we briefly review a general DNN model. A DNN is one of the most popular models in machine learning and can be considered a multilayer perceptron (MLP) [20,21,22]. The DNN is comprised of one input layer, $L-2$ hidden layers, and one output layer. A DNN with many hidden layers has enhanced learning and mapping abilities, so it is capable of handling complicated nonlinear problems [20,21,22,25,26].
Let $\mathbf{w}\in\mathbb{R}^{N_0}$ be the input vector of the input layer and $\mathbf{o}\in\mathbb{R}^{N_{L-1}}$ be the output vector of the output layer. Then, the mapping between them can be mathematically represented as

$$\mathbf{o} = f(\mathbf{w};\theta) = f^{(L-1)}\Big(f^{(L-2)}\big(\cdots f^{(2)}\big(f^{(1)}(\mathbf{w};\theta_1)\big)\cdots\big)\Big), \quad (31)$$

where $f^{(l)}(\cdot;\theta_l)$ for $l\in\{1,2,\ldots,L-1\}$ is the activation function of the l-th layer, which maps the input vector of the l-th layer to its output vector, and $\theta=\{\theta_1,\ldots,\theta_{L-1}\}$ is the set of parameters, such as the weights and biases used to compute the weighted sums at the nodes, which are adjusted during the training procedure [20,21,22]. The activation functions in (31) are essential for tackling nonlinear problems with the DNN model; the nodes of each layer apply activation functions so that the layer maps its input vector to its output vector through nonlinear operations. Several activation functions are listed in Table 1 and illustrated in Figure 3.
The parameter set $\theta$ in (31) is adjusted during the training procedure to minimize a loss function [20,21,22]. In supervised learning, each training sample is labeled with the desired output $\bar{\mathbf{o}}$ (i.e., the correct answer), and hence the loss between the DNN model output $\mathbf{o}$ and the desired output $\bar{\mathbf{o}}$ becomes

$$loss(\theta) = \frac{1}{\bar{n}}\sum_{j=1}^{\bar{n}}\mathcal{L}_f\left(\bar{\mathbf{o}}_j,\mathbf{o}_j\right), \quad (32)$$

where $\bar{\mathbf{o}}_j$ and $\mathbf{o}_j$ are the desired output and the predicted output of the j-th training sample, respectively, and $\bar{n}$ is the batch size, i.e., the total number of training samples in a mini-batch.
Two types of loss functions are mainly used for supervised learning with a DNN [20,21,22], as shown in Table 2.
For supervised learning, the ultimate goal of the DNN design is to map $\mathbf{w}$ to $\bar{\mathbf{o}}$ through (31) based on the training samples. Training is thus the process of adjusting $\theta$ to minimize the loss function in (32). A widely used training algorithm is stochastic gradient descent (SGD) [20,21,22,24], which updates the parameters of each layer along the gradient at each step, i.e., for each mini-batch, to minimize the loss function. There are also many variants, such as SGD with momentum (SGDM), adaptive gradient (AdaGrad), and root mean square propagation (RMSProp) [20,21,22,30,31,32]. Adaptive moment estimation (Adam) is also widely used for DNN training.

5.2. The Proposed DNN Framework

The proposed DNN architecture is illustrated in Figure 4. It is comprised of one input layer with $(M+1)G+2$ input nodes, one output layer with M output nodes, and $\bar{L}$ hidden layers with $(N_1,\ldots,N_{\bar{L}})$ nodes. The input layer of the proposed DNN model takes each user's SNR, the feedback size, the number of users, and the eigenvalues of each user group's covariance matrix. The output layer returns the dimension for each group. Although the number of input nodes depends on the number of user groups, it is fixed in our design because the number of groups is given by the system model according to the channel statistics. Meanwhile, the numbers of hidden layers and nodes are adjustable variables of the DNN architecture.

5.2.1. Input Layer

To establish the DNN-based supervised learning framework, the input data of the learning system, i.e., $\mathbf{w}$ in (31), should be determined considering the system model and problem $\mathcal{P}1$. Based on the insight from the lower bound $\sum_{g=1}^G R_g(s_1,\ldots,s_G)$ in Section 4, our proposed DNN model takes as input the SNR at each user ($\gamma$), the feedback bit allocation (B), and the number of users ($K_g$) and the eigenvalues of the covariance matrix of every group:

$$\mathbf{w} = \Big[B,\gamma,\underbrace{K_1,\lambda_{1,1},\ldots,\lambda_{1,M}}_{\text{Group 1}},\ldots,\underbrace{K_G,\lambda_{G,1},\ldots,\lambda_{G,M}}_{\text{Group G}}\Big]^T \in \mathbb{R}^{(M+1)G+2}, \quad (33)$$

where $\{\lambda_{g,1},\ldots,\lambda_{g,M}\}\in\mathbb{R}^M$ for the g-th group consists of the $r_g$ non-zero eigenvalues of the covariance matrix of the g-th group followed by $(M-r_g)$ zeros. As the feature information of the input, the eigenvalues characterize the optimal dimensions because the covariance matrices affect the objective function of problem $\mathcal{P}1$, which is statistically determined by the effective channels and the quantization codebooks. Note that we adopt the eigenvalues rather than the covariance matrices themselves in the spirit of principal component analysis (PCA) for machine learning [22]; this reduces the complexity of the proposed DNN model for large-scale antennas, since taking the covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$ as the input would require $(M^2+1)G+2$ input nodes.
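Assembling the input vector of (33) from the channel statistics is straightforward; a small NumPy sketch (function name ours), zero-padding each group's eigenvalue list to length M:

```python
import numpy as np

def build_input_vector(B, gamma, K_list, R_list):
    """Assemble the DNN input w of (33) from the system statistics."""
    M = R_list[0].shape[0]
    w = [float(B), float(gamma)]
    for K_g, R_g in zip(K_list, R_list):
        lam = np.sort(np.linalg.eigvalsh(R_g))[::-1]    # descending eigenvalues
        lam[lam < 0] = 0.0                              # clip tiny negative round-off
        w.extend([float(K_g)] + lam.tolist())           # K_g, then M eigenvalues
    return np.asarray(w)                                # length (M + 1) * G + 2
```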

5.2.2. Output Layer

The proposed DNN framework aims at obtaining the solution of problem $\mathcal{P}1$, i.e., $\{s_1,\ldots,s_G\}$. The number of eigenvalues of the g-th group varies with the channel statistics (i.e., the covariance matrices) in the range $r_g\leq M$. Thus, the maximum number of classes for the optimal dimensions over all groups becomes $M^G$, which makes it practically impossible to establish a single DNN model for massive MIMO systems. We therefore adopt a parallel DNN framework with one DNN per group, as shown in Figure 4. The desired output vector $\bar{\mathbf{o}}^g$ of the g-th group is then expressed as

$$\bar{\mathbf{o}}^g = \big[0,\ldots,0,\underbrace{1}_{s_g\text{-th component}},0,\ldots,0\big]^T \in \mathbb{R}^M, \quad (34)$$

i.e., (34) is the one-hot vector whose $s_g$-th component is one and whose other components are zero. For the output layer, the Softmax activation function is used so that the outputs lie in the interval $(0,1)$. Thus, the position of the one in the g-th output vector encodes the dimension of the g-th group.

5.2.3. Hidden Layer

The hidden layers are designed to learn the features of the training samples; they are dimensioned according to the system model to characterize the relationship between the input and output layers. We adopt the tanh function as the activation function and choose the number of hidden layers and the number of nodes per layer to maximize performance during the training procedure.
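A minimal PyTorch sketch of one per-group DNN with this structure (input width $(M+1)G+2$, tanh hidden layers, M-way softmax output); the default layer sizes follow Section 6, and the function and variable names are ours:

```python
import torch.nn as nn

def make_group_dnn(M=64, G=3, n_hidden_layers=6, hidden_width=200):
    """One per-group classifier: (M+1)G+2 inputs -> M-way output (Section 5.2)."""
    layers = [nn.Linear((M + 1) * G + 2, hidden_width), nn.Tanh()]
    for _ in range(n_hidden_layers - 1):
        layers += [nn.Linear(hidden_width, hidden_width), nn.Tanh()]
    layers.append(nn.Linear(hidden_width, M))   # logits; softmax is applied in the loss
    return nn.Sequential(*layers)

# Parallel framework: one DNN per group, as in Figure 4
models = [make_group_dnn() for _ in range(3)]
```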

5.2.4. DNN Training

For precise classification, i.e., to map the input of (33) to the desired output of (34) during training, we need enough training samples to reflect many situations. Thus, we generate the training samples by varying the SNR, the feedback size of each user, and the eigenvalues of the covariance matrices. In particular, we randomly generate covariance matrices by varying the angular spread in (3) so that the model learns the feature information of the eigenvalues, i.e., the channel statistics of the system model. With the azimuth center angle in (3) fixed, the number of eigenvalues of the covariance matrix increases as the angular spread increases, which also changes the magnitudes of the eigenvalues. Note that the number of groups and the number of users in each group are system model parameters, so they are fixed during DNN training.
Then, we obtain the optimal dimensions of problem $\mathcal{P}1$ by numerical search over every channel realization and label each sample according to (34) for the given SNR, feedback size, and group covariance matrices. For the proposed DNN training, we adopt the cross entropy as the loss function in (32), which is widely used for multi-class classification with DNN architectures [20,21,22]. Thus, the loss function of the g-th group DNN model is given by

$$loss(\theta^g) = \frac{1}{\bar{n}}\sum_{j=1}^{\bar{n}}\mathcal{L}_f\left(\bar{\mathbf{o}}_j^g,\mathbf{o}_j^g\right) = -\frac{1}{\bar{n}}\sum_{j=1}^{\bar{n}}\sum_{i=1}^{M}\left[\bar{o}_{j,i}^g\log o_{j,i}^g + (1-\bar{o}_{j,i}^g)\log(1-o_{j,i}^g)\right], \quad (35)$$

where $\bar{n}$ is the batch size of the training samples and j indexes the j-th training sample; $\bar{\mathbf{o}}_j^g$ is the desired output vector determined by $s_g$ and (34), and $\mathbf{o}_j^g$ is the predicted output of the output layer obtained by the Softmax activation function of the g-th group; $\bar{o}_{j,i}^g$ and $o_{j,i}^g$ denote the i-th elements of $\bar{\mathbf{o}}_j^g$ and $\mathbf{o}_j^g$, respectively.
For our DNN model training, we divided the samples into training and validation sets with ratios of 0.85 and 0.15, respectively. We use the Adam algorithm [32,33] for training, which combines and improves upon AdaGrad [30] and RMSProp [31]. To prevent under-fitting and over-fitting, we set the maximum number of epochs to 3000, the batch size to 1000, and the initial learning rate to 0.001; the learning rate is multiplied by a drop factor of 0.5 every 1000 epochs during training. In addition, we use an $L_2$-regularization factor of 0.01, a gradient decay factor of 0.98, and a squared gradient decay factor of 0.99. To prevent over-fitting, we also adopt an early stopping strategy, which terminates training when the maximum number of epochs or the validation patience criterion is reached. The validation patience is the number of times that the loss on the validation set is allowed to be larger than or equal to the previously obtained smallest loss [33]. We run a validation check every five epochs with a validation patience of eight.
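These hyperparameters map onto a standard training loop. A hedged PyTorch sketch follows; the paper's experiments used the MATLAB Deep Learning Toolbox [33], so this is only an illustrative translation, with dataset tensors X (inputs of (33)) and y (class indices of (34)) assumed.

```python
import torch
import torch.nn as nn

def train_group_dnn(model, X, y, max_epochs=3000, batch_size=1000,
                    check_every=5, patience=8):
    """Training loop mirroring Section 5.2.4 (illustrative PyTorch translation)."""
    n_train = int(0.85 * len(X))                       # 0.85 / 0.15 split
    X_tr, y_tr, X_va, y_va = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    # Adam: gradient decay 0.98 -> beta1, squared gradient decay 0.99 -> beta2,
    # L2 regularization 0.01 -> weight_decay
    opt = torch.optim.Adam(model.parameters(), lr=1e-3,
                           betas=(0.98, 0.99), weight_decay=0.01)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.5)
    loss_fn = nn.CrossEntropyLoss()                    # softmax + cross entropy
    best_val, strikes = float("inf"), 0
    for epoch in range(max_epochs):
        for i in range(0, n_train, batch_size):
            opt.zero_grad()
            loss = loss_fn(model(X_tr[i:i + batch_size]), y_tr[i:i + batch_size])
            loss.backward()
            opt.step()
        sched.step()                                   # learning-rate drop schedule
        if (epoch + 1) % check_every == 0:             # validation check
            with torch.no_grad():
                val = loss_fn(model(X_va), y_va).item()
            strikes = strikes + 1 if val >= best_val else 0
            best_val = min(best_val, val)
            if strikes >= patience:                    # early stopping
                break
    return model
```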

6. DNN Performance and Numerical Results

In this section, we evaluate our proposed DNN framework. We consider a single-cell massive MIMO system with limited feedback, where the BS is equipped with a 64-antenna ULA (i.e., M = 64). The BS serves 15 single-antenna users (i.e., K = 15) divided into three groups (i.e., G = 3) with 5 users each (i.e., $K_g=5$ for $g\in\{1,2,3\}$). For the channel model, we consider the correlated Rayleigh channel with $\theta_g=-\frac{\pi}{4}+\frac{\pi}{4}(g-1)$ and $\delta_a/\lambda_c=0.5$ in (3). As mentioned in Section 5.2.4, we generate covariance matrices by varying the angular spread, where $\Delta_g$ is randomly drawn from the range $\Delta_g\in\left[\frac{2\pi}{45},\frac{\pi}{9}\right]$. We denote the number of eigenvalues (i.e., the rank) of the g-th group covariance matrix by $r_g$. With this setting, the ranges of each group's rank are $r_1\in\{10,\ldots,20\}$, $r_2\in\{13,\ldots,27\}$, and $r_3\in\{10,\ldots,20\}$, respectively. To obtain the average sum rate $R_{\mathrm{sum}}$ in (24), we average over channel realizations based on RVQ codebooks, varying the small-scale Rayleigh fading $\mathbf{e}_{g,k}$ in (2) with the covariance matrices $\{\mathbf{R}_g\}_{g=1}^3$ fixed.
First, we establish our proposed DNN framework and train the machine learning model for this system setting. As explained in Section 5.2, our DNN models operate in parallel and are trained per group, so there are three DNN models of the same structure at the BS. The DNN model of each group is comprised of an input layer with 197 nodes, an output layer with 64 nodes, and six hidden layers with 200 nodes each. For the DNN training, we generate $4\times10^5$ training samples by varying each user's SNR, the feedback size, and the covariance matrices, where the optimal dimensions (i.e., $\{s_g\}_{g=1}^3$) of every training sample are obtained by numerical search according to problem $\mathcal{P}1$. To train the DNN model of each group, each sample of the g-th group is labeled with its own desired output, i.e., $\bar{\mathbf{o}}^g$ in (34), according to $s_g$. To improve the training performance, the samples are divided into the training set and the validation set, and we adopt the Adam algorithm with the parameter settings explained in Section 5.2.4.
In Figure 5, we show the training state and performance of the proposed DNN model for every group. Figure 5a shows the cross entropy loss on the training and validation sets with respect to the number of iterations. The losses of all groups decrease as the number of iterations increases, but the gap between the training and validation sets also grows; thus, the early stopping strategy of Section 5.2.4 is required to terminate training and avoid over-fitting. The losses of group 1 and group 3 are similar, while the loss of group 2 is larger than both; however, the gap between them decreases as the number of iterations increases.
Figure 5b shows the accuracy of our machine learning model on the training and validation sets with respect to the number of iterations. The accuracy is measured as the ratio of the number of correct answers (i.e., $\mathbf{o}=\bar{\mathbf{o}}$) to the total number of answers. The accuracy increases with the number of iterations, as expected from Figure 5a. The final accuracies on the training and validation sets are $(84.7, 82.0, 85.7)\%$ and $(83.2, 78.4, 84.3)\%$, respectively. Figure 5c shows the accuracy over all samples in the validation set with respect to the rank of the covariance matrix of each group. We can therefore conclude that the proposed DNN model is well trained and learns the features of the channel statistics.
Figure 6 compares the average sum rate of the proposed DNN-based scheme with those of other reference schemes, where the feedback sizes are 6 bits and 10 bits in Figure 6a,b, respectively. For comparison, we consider the following five reference schemes:
  • Optimal scheme: The optimal dimensions of the outer precoder are obtained via brute-force numerical search.
  • Full-rank-based scheme: The dimensions of the outer precoder are set to the ranks of the covariance matrices.
  • Lower-bound-based scheme: The dimensions of the outer precoder are obtained by the lower-bound analysis, i.e., the solution of problem $\mathcal{P}3$.
  • Fixed dimension scheme 1: The outer precoder dimensions are fixed to five, i.e., $s_1=\cdots=s_G=5$.
  • Fixed dimension scheme 2: The outer precoder dimensions are fixed to eight, i.e., $s_1=\cdots=s_G=8$.
For the performance comparison, we average over 100 covariance realizations with varying angular spread. In Figure 6a, the proposed scheme outperforms the reference schemes and is comparable to the optimal scheme, because the dimensions obtained by the proposed DNN framework are close to the optimal dimensions. Note that the lower-bound-based scheme increases the average sum rate compared with the other reference schemes (i.e., the full-rank-based scheme and fixed dimension schemes 1 and 2), and it lower bounds both the optimal and the proposed schemes. In Figure 6b, as expected, the proposed scheme again outperforms the reference schemes and achieves near-optimal performance. The gaps between the proposed scheme and the reference schemes differ from those in Figure 6a because the optimal dimensions are affected by the feedback bit allocation. Consequently, we conclude that the dimension of the outer precoder should be optimized to maximize the average sum rate, and that the proposed DNN-based scheme performs well, comparable to the optimal scheme.

7. Conclusions

In this paper, we optimized the dimension of the outer precoder of the two-stage precoder to maximize the average sum rate in massive MIMO systems with limited feedback. We proposed a DNN framework to find the dimensions that maximize the average sum rate, where the original problem is NP-hard. We established a DNN-based supervised learning framework that takes the SNR at each user, the feedback bit allocation, and the eigenvalues of the covariance matrices of the user groups as inputs and returns the optimal dimensions allocated to all user groups. The numerical results showed that the proposed machine learning-based outer precoder dimension optimization improves the average sum rate and achieves near-optimal performance, comparable to brute-force search, which is not feasible in practice. Although we considered single-antenna users in our system model, the proposed scheme can be extended to multi-antenna users. In this case, for the inner precoder design, we can use block diagonalization (BD) instead of the ZF precoder to cancel the inter-user interference. However, it is not easy to design the matrix codebooks shared between the transmitter and the users for limited feedback. Extensions of the proposed scheme to more general system models are our ongoing research topics.

Author Contributions

Conceptualization, J.K., J.H.L. and W.C.; methodology, J.K. and J.H.L.; software, J.K.; validation, J.K., J.H.L. and W.C.; formal analysis, J.K.; investigation, J.K.; resources, J.H.L. and W.C.; data curation, J.K.; writing—original draft preparation, J.K.; writing—review and editing, J.H.L. and W.C.; visualization, J.K.; supervision, W.C.; project administration, W.C.; funding acquisition, W.C.

Funding

This work has been supported by the Future Combat System Network Technology Research Center program of Defense Acquisition Program Administration and Agency for Defense Development (UD160070BD).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Larsson, E.G.; Edfors, O.; Tufvesson, F.; Marzetta, T.L. Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 2014, 52, 186–195.
  2. Lu, L.; Li, G.Y.; Swindlehurst, A.L.; Ashikhmin, A.; Zhang, R. An overview of massive MIMO: Benefits and challenges. IEEE J. Sel. Top. Signal Process. 2014, 8, 742–758.
  3. Zhang, Q.; Jin, S.; Wong, K.K.; Zhu, H.; Matthaiou, M. Power scaling of uplink massive MIMO systems with arbitrary-rank channel means. IEEE J. Sel. Top. Signal Process. 2014, 8, 966–981.
  4. Ngo, H.Q.; Larsson, E.G.; Marzetta, T.L. Energy and spectral efficiency of very large multiuser MIMO systems. IEEE Trans. Commun. 2013, 61, 1436–1449.
  5. Marzetta, T.L. How much training is required for multiuser MIMO? In Proceedings of the 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 29 October–1 November 2006; pp. 359–363.
  6. Jindal, N. MIMO broadcast channels with finite-rate feedback. IEEE Trans. Inf. Theory 2006, 52, 5045–5060.
  7. Lee, J.H.; Choi, W. Unified codebook design for vector channel quantization in MIMO broadcast channels. IEEE Trans. Signal Process. 2015, 63, 2509–2519.
  8. Xie, H.; Gao, F.; Zhang, S.; Jin, S. A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model. IEEE Trans. Veh. Technol. 2017, 66, 3170–3184.
  9. Wagner, S.; Couillet, R.; Debbah, M.; Slock, D.T.M. Large system analysis of linear precoding in correlated MISO broadcast channels under limited feedback. IEEE Trans. Inf. Theory 2012, 58, 4509–4537.
  10. Adhikary, A.; Nam, J.; Ahn, J.-Y.; Caire, G. Joint spatial division and multiplexing—The large-scale array regime. IEEE Trans. Inf. Theory 2013, 59, 6441–6463.
  11. Kim, D.; Lee, G.; Sung, Y. Two-stage beamformer design for massive MIMO downlink by trace quotient formulation. IEEE Trans. Commun. 2015, 63, 2200–2211.
  12. Park, J.; Clerckx, B. Multi-user linear precoding for multi-polarized massive MIMO system under imperfect CSIT. IEEE Trans. Wirel. Commun. 2015, 14, 2532–2547.
  13. Jeon, Y.-S.; Min, M. Large system analysis of two-stage beamforming with limited feedback in FDD massive MIMO systems. IEEE Trans. Veh. Technol. 2018, 67, 4984–4997.
  14. Sohrabi, F.; Yu, W. Hybrid digital and analog beamforming design for large-scale antenna arrays. IEEE J. Sel. Top. Signal Process. 2016, 10, 501–513.
  15. Lin, Y.-P. Hybrid MIMO-OFDM beamforming for wideband mmWave channels without instantaneous feedback. IEEE Access 2017, 5, 21806–21817.
  16. Castanheira, D.; Lopes, P.; Silva, A.; Gameiro, A. Hybrid beamforming designs for massive MIMO millimeter-wave heterogeneous systems. IEEE Access 2017, 5, 21806–21817.
  17. Magueta, R.; Castanheira, D.; Silva, A.; Dinis, R.; Gameiro, A. Hybrid iterative space-time equalization for multi-user mmW massive MIMO systems. IEEE Trans. Commun. 2017, 65, 608–620.
  18. Castañeda, E.; Castanheira, D.; Silva, A.; Gameiro, A. Parametrization and applications of precoding reuse and downlink interference alignment. IEEE Trans. Wirel. Commun. 2017, 16, 2641–2650.
  19. Kang, J.; Lee, J.H.; Choi, W. Two-stage precoder for massive MIMO systems with limited feedback. In Proceedings of the 2018 2nd International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom), Ho Chi Minh, Vietnam, 29–31 January 2018; pp. 91–96.
  20. Klaine, P.V.; Imran, M.A.; Onireti, O.; Souza, R.D. A survey of machine learning techniques applied to self-organizing cellular networks. IEEE Commun. Surv. Tutor. 2017, 19, 2392–2431.
  21. Mao, Q.; Hu, F.; Hao, Q. Deep learning for intelligent wireless networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2018, 20, 2595–2621.
  22. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2017.
  23. Joung, J. Machine learning-based antenna selection in wireless communications. IEEE Commun. Lett. 2016, 20, 2241–2244.
  24. Kwon, H.J.; Lee, J.H.; Choi, W. Machine learning-based beamforming in two-user MISO interference channels. In Proceedings of the 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Okinawa, Japan, 11–12 February 2019; pp. 496–499.
  25. Huang, H.; Yang, J.; Huang, H.; Song, Y.; Gui, G. Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system. IEEE Trans. Veh. Technol. 2018, 67, 8549–8560.
  26. Huang, H.; Song, Y.; Yang, J.; Gui, G.; Adachi, F. Deep-learning-based millimeter-wave massive MIMO for hybrid precoding. IEEE Trans. Veh. Technol. 2019, 68, 3027–3032.
  27. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
  28. Spencer, Q.H.; Swindlehurst, A.L.; Haardt, M. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans. Signal Process. 2004, 52, 461–471.
  29. Li, P.; Paul, D.; Narasimhan, R.; Cioffi, J. On the distribution of SINR for the MMSE MIMO receiver and performance analysis. IEEE Trans. Inf. Theory 2006, 52, 271–286.
  30. Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159.
  31. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31.
  32. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  33. MathWorks. Deep Learning Toolbox. Available online: https://www.mathworks.com/products/deep-learning.html (accessed on 18 July 2019).
Figure 1. Illustration of our system model.
Figure 2. Illustration of the two-stage precoder with limited feedback.
Figure 3. Illustration of activation functions. (a) sigmoid; (b) hyperbolic tangent (tanh); (c) rectified linear unit (ReLU); (d) symmetric saturated linear unit (SSaLU).
Figure 4. The proposed DNN framework.
Figure 5. Training state and performance of the proposed DNN model. (a) the cross entropy loss with respect to the number of iterations; (b) the accuracy with respect to the number of iterations; (c) the accuracy with respect to the number of ranks.
Figure 6. Comparison of the average sum rates for various schemes. (a) feedback bit allocation: 6 bits; (b) feedback bit allocation: 10 bits.
Table 1. Mathematical expressions for activation functions.
Sigmoid: $f(w) = \frac{1}{1+e^{-w}}$
tanh: $f(w) = \frac{e^{w}-e^{-w}}{e^{w}+e^{-w}}$
ReLU: $f(w) = \max(0,w)$
SSaLU: $f(w) = \max(-1,w)$ for $w<0$; $f(w) = \min(1,w)$ for $w\geq0$
Softmax: $f(w_i) = \frac{e^{w_i}}{\sum_j e^{w_j}}$
Table 2. Mathematical expressions for loss functions (i denotes the i-th element of a vector).
Mean square error: $\mathcal{L}_f(\bar{\mathbf{o}},\mathbf{o}) = \|\bar{\mathbf{o}}-\mathbf{o}\|^2$
Cross entropy: $\mathcal{L}_f(\bar{\mathbf{o}},\mathbf{o}) = -\sum_i\left[\bar{o}_i\log o_i + (1-\bar{o}_i)\log(1-o_i)\right]$

