Article

Machine Learning-Based Dimension Optimization for Two-Stage Precoder in Massive MIMO Systems with Limited Feedback

Jinho Kang, Jung Hoon Lee and Wan Choi
1 School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
2 Department of Electronics Engineering and Applied Communications Research Center, Hankuk University of Foreign Studies, Yongin 17035, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(14), 2894; https://doi.org/10.3390/app9142894
Submission received: 12 June 2019 / Revised: 15 July 2019 / Accepted: 17 July 2019 / Published: 19 July 2019

Abstract
A two-stage precoder is widely considered in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) systems to resolve the channel feedback overhead problem. In massive MIMO systems, the users on a network can be divided into several user groups with similar spatial antenna correlations. With the two-stage precoder, the outer precoder reduces the channel dimensions and mitigates inter-group interference at the first stage, while the inner precoder eliminates intra-group interference within the reduced dimensions at the second stage. In this case, the dimension of the effective channel reduced by the outer precoder is important, as it trades off the inter-group interference, the intra-group interference, and the performance loss from the quantized channel feedback. In this paper, we propose a machine learning framework to find the dimensions reduced by the outer precoder that maximize the average sum rate, where the original problem is NP-hard. Our machine learning framework uses a deep neural network whose inputs are the channel statistics and whose outputs are the effective channel dimensions after outer precoding. The numerical results show that our proposed machine learning-based dimension optimization achieves an average sum rate comparable to the optimal performance obtained by brute-force search, which is not feasible in practice.

1. Introduction

Massive multiple-input multiple-output (MIMO) is one of the most promising technologies for next-generation wireless mobile communication systems [1,2,3,4]. The large-scale antenna array equipped at a base station (BS) can considerably improve the data rate, energy efficiency, and reliability; theoretical results show that simple linear transceivers can almost perfectly cancel inter-user interference. In this case, accurate knowledge of channel state information (CSI) at the BS is crucial to achieve the potential gain from large-scale antennas [1,2,3,4]. In time division duplex (TDD) systems, the BS can acquire CSI during uplink training thanks to channel reciprocity [4,5], and hence many existing works on massive MIMO systems consider TDD systems. However, many current wireless mobile communication systems are frequency division duplex (FDD) systems [6,7], whose uplink and downlink channels are independent of each other, which motivates research to achieve the potential gains also in FDD systems [8,9,10,11,12,13].
In FDD systems, a BS obtains the CSI from the users' channel feedback due to the lack of reciprocity between the uplink and the downlink channels [6,7,8,9,10,11,12,13]. In massive MIMO systems, the CSI feedback overhead problem becomes more severe because of the large number of antennas; it has been shown that the feedback size should scale linearly with the number of antennas to fully obtain the multiplexing gain [6,7]. To resolve the feedback overhead problem, a two-stage precoder, which consists of an outer and an inner precoder, has been widely used for FDD massive MIMO systems [10,11,12,13]. In the two-stage precoder, the outer precoder projects the original channel space of large dimension onto a smaller-dimensional subspace, and then the inner precoder controls the inter-user interference as in multiuser MIMO precoding. There are various types of two-stage precoder designs for massive MIMO systems; among them, the hybrid architecture is widely considered for mmWave bands due to hardware implementation limitations [14,15,16,17]. As a fully digital architecture has many benefits in sub-6 GHz bands, there are also many papers that consider the fully digital architecture in massive MIMO systems [10,11,12,13]. Thus, we mainly consider joint spatial division and multiplexing (JSDM) [10] with a fully digital architecture in massive MIMO systems with a correlated channel environment.
The key idea of JSDM is to divide all users into multiple user groups according to their channel covariance matrices, and then the outer precoder and the inner precoder sequentially mitigate inter-group and inter-user interference, respectively [10,18]. At the first stage, the outer precoder mitigates the inter-group interference (IGI) by projecting the original channel onto a smaller-dimensional subspace. Then, the inner precoder cancels the same-group interference (SGI) using the dimension-reduced effective channels produced by the outer precoder; in this case, the BS exploits the quantized versions of the dimension-reduced effective channels obtained via limited feedback. The outer precoder design is therefore important, because it balances several performance-determining factors: the inter-group interference, the intra-group interference, and the channel quantization error. This motivates a sophisticated outer precoder design taking all of these factors into account. In this context, we optimized the dimension of the outer precoder in [19] for a downlink massive MIMO system with limited feedback based on a lower-bound analysis.
Meanwhile, machine learning has recently attracted considerable attention in wireless communication systems [20,21]. Machine learning techniques have shown good performance in many applications, from image processing to economics [20,21,22]. In addition, machine learning has been applied to physical layer processing, such as antenna selection and beamforming design in MIMO systems [23,24], and channel estimation and hybrid precoding for massive MIMO systems [25,26]. In particular, deep learning [27], one of the key machine learning techniques, tackles complicated nonlinear and computationally intensive problems in many areas [22] and outperforms many existing schemes [22,24,25,26].
In this paper, we extend our initial work [19] on the dimension optimization for the outer precoder design to a machine learning framework. Our contributions can be summarized as follows:
  • We introduce our two-stage precoder design with limited feedback, where only the quantized channel direction information (CDI) of the dimension-reduced effective channel is fed back to the BS. We first derive a lower bound of the average sum rate and then optimize the dimension of the outer precoder to maximize the average sum rate.
  • We propose the machine learning framework for the dimension optimization based on a deep neural network (DNN); we determine the DNN architecture of the input, hidden, and output layers as well as the training procedure. Our DNN architecture takes the eigenvalues of the covariance matrices of the user groups as inputs and returns the structure of the outer precoder, i.e., the dimensions allocated to all user groups.
  • We evaluate our DNN model and show that our proposed machine learning-based outer precoder dimension optimization improves the average sum rate and achieves near-optimal performance.
The rest of this paper is organized as follows. We introduce our system model in Section 2 and describe our problem in Section 3. We review our previous work on the lower bound of the achievable sum rate in Section 4 and propose a machine learning framework for the dimension optimization in Section 5. We evaluate the proposed DNN model in Section 6 and conclude the paper in Section 7.
Notations: We use upper- and lower-case boldface letters to denote matrices and vectors, respectively. The notations $(\cdot)^T$ and $(\cdot)^\dagger$ represent the transpose and the complex conjugate transpose, respectively. In addition, $\mathbb{E}[\cdot]$ and $\Pr[\cdot]$ denote the expectation and the probability, respectively.

2. System Model

Our system model is illustrated in Figure 1. We consider a single-cell multi-user massive MIMO downlink system with limited feedback, where a BS with M transmit antennas simultaneously serves K single-antenna users. Let $\mathbf{F}\,(\triangleq[\mathbf{f}_1,\ldots,\mathbf{f}_K])\in\mathbb{C}^{M\times K}$ be a linear precoding matrix and $\mathbf{d}\,(\triangleq[d_1,\ldots,d_K]^T)\in\mathbb{C}^{K\times1}$ be the data symbol vector for users $k\in\{1,\ldots,K\}$. Then, the transmit signal vector at the BS, denoted by $\mathbf{x}\in\mathbb{C}^{M\times1}$, is obtained as $\mathbf{x}=\mathbf{F}\mathbf{d}$. The received signal vector $\mathbf{y}\in\mathbb{C}^{K\times1}$ becomes

$$\mathbf{y} = \mathbf{H}^\dagger\mathbf{x} + \mathbf{n} = \mathbf{H}^\dagger\mathbf{F}\mathbf{d} + \mathbf{n}, \quad (1)$$

where $\mathbf{H}\,(\triangleq[\mathbf{h}_1,\ldots,\mathbf{h}_K])\in\mathbb{C}^{M\times K}$ is the concatenated channel matrix, $\mathbf{h}_k\in\mathbb{C}^{M\times1}$ is user k's channel vector, and $\mathbf{n}\,(\triangleq[n_1,\ldots,n_K]^T)\in\mathbb{C}^{K\times1}$ is additive white Gaussian noise.
In this paper, we assume that the BS utilizes only the channel direction information (CDI) for the beamforming vector design, which saves the additional feedback overhead required for power allocation [6,7]. Therefore, when the total transmit signal power is P, the BS allocates equal power to each user, i.e., $\mathbb{E}[|d_k|^2]=P/K$. Meanwhile, the beamforming vector for user k should satisfy $\mathbf{f}_k^\dagger\mathbf{f}_k=1$.
In our channel model, we consider the correlated Rayleigh channel such that $\mathbf{h}_k\sim\mathcal{CN}(\mathbf{0},\mathbf{R}_k)$, where $\mathbf{R}_k\in\mathbb{C}^{M\times M}$ is a positive semi-definite channel covariance matrix represented by $\mathbf{R}_k=\mathbf{U}_k\boldsymbol{\Lambda}_k\mathbf{U}_k^\dagger$ through the singular value decomposition (SVD). We denote by $r_k$ the number of non-zero singular values in $\boldsymbol{\Lambda}_k$. Then, with the Karhunen–Loève representation, $\mathbf{h}_k$ can be represented by [9,10,19]

$$\mathbf{h}_k = \mathbf{U}_k\boldsymbol{\Lambda}_k^{1/2}\mathbf{e}_k, \quad (2)$$

where $\mathbf{U}_k\in\mathbb{C}^{M\times r_k}$ is the tall unitary matrix whose $r_k$ columns are the eigenvectors of $\mathbf{R}_k$ corresponding to its non-zero singular values, $\boldsymbol{\Lambda}_k\in\mathbb{R}^{r_k\times r_k}$ is the diagonal matrix of the $r_k$ non-zero positive eigenvalues, and $\mathbf{e}_k\in\mathbb{C}^{r_k\times1}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{r_k})$.
Assuming the one-ring scattering model with the spatial correlation of a uniform linear array (ULA) at the transmitter (as shown in Figure 1), the $(p,q)$-th element of the channel covariance matrix of the g-th group is given by [9,10,19]

$$[\mathbf{R}_g]_{p,q} = \frac{1}{2\Delta_g}\int_{\theta_g-\Delta_g}^{\theta_g+\Delta_g} e^{-j2\pi\frac{\delta_a}{\lambda_c}(p-q)\sin\theta}\,d\theta, \quad (3)$$

where $\lambda_c$ is the carrier wavelength, $\delta_a$ is the spacing between adjacent antennas, and $\theta_g$ and $\Delta_g$ represent the azimuth center angle and the angular spread of the g-th group, respectively. With this model, the total K users are divided into G groups according to their channel covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$. We denote by $K_g$ the number of users in the g-th group, so that $K=\sum_{g=1}^G K_g$. We assume that all users in the same group have the same channel covariance matrix and that the BS perfectly knows the covariance matrices of all user groups, i.e., $\mathbf{R}_1,\ldots,\mathbf{R}_G$.
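As an illustration of (3), the following Python sketch builds the one-ring covariance matrix by numerical integration over the angular spread; the function name and parameter values are ours and only indicative, not the simulation settings of this paper.

```python
import numpy as np

def one_ring_covariance(M, theta_g, delta_g, spacing_ratio=0.5, n_points=2000):
    """Build the one-ring covariance matrix of (3) for an M-antenna ULA.

    theta_g: azimuth center angle (rad); delta_g: angular spread (rad);
    spacing_ratio: antenna spacing over carrier wavelength (delta_a / lambda_c).
    The (1 / 2*delta_g) * integral is evaluated as a uniform-grid average.
    """
    theta = np.linspace(theta_g - delta_g, theta_g + delta_g, n_points)
    p = np.arange(M)
    diff = p[:, None] - p[None, :]                       # (p - q) for all pairs
    integrand = np.exp(-1j * 2 * np.pi * spacing_ratio
                       * diff[..., None] * np.sin(theta))  # shape (M, M, n_points)
    return integrand.mean(axis=-1)

# Example: one group covariance and its rank (number of non-negligible eigenvalues)
R = one_ring_covariance(M=64, theta_g=np.pi / 4, delta_g=np.pi / 12)
eigvals = np.linalg.eigvalsh(R)[::-1]                    # descending order
r_g = int(np.sum(eigvals > 1e-6 * eigvals[0]))           # effective rank
print(r_g)
```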

3. Limited Feedback with Two-Stage Precoder

In this section, we first give an overview of the structure of the two-stage precoder and briefly explain the limited feedback method for it. Then, we formulate our problem.

3.1. Two-Stage Precoder

We adopt the two-stage precoder to reduce the complexity and feedback overhead induced by the large number of antennas [10,19], as illustrated in Figure 2. In this case, the linear precoder becomes $\mathbf{F}=\mathbf{T}\mathbf{V}$, where $\mathbf{T}\in\mathbb{C}^{M\times S}$ is the outer precoder for spatial division and $\mathbf{V}\in\mathbb{C}^{S\times K}$ is the inner precoder for spatial multiplexing in each group. The outer and the inner precoders are given by $\mathbf{T}\triangleq[\mathbf{T}_1,\ldots,\mathbf{T}_G]$ and $\mathbf{V}\triangleq\mathrm{bldiag}\{\mathbf{V}_1,\ldots,\mathbf{V}_G\}$, respectively, where $\mathbf{T}_g\in\mathbb{C}^{M\times s_g}$ and $\mathbf{V}_g\in\mathbb{C}^{s_g\times K_g}$ are the outer and inner precoders of the g-th group, and "bldiag" denotes a block-diagonal matrix. Here, the total dimension of the effective channels, i.e., $S\triangleq\sum_{g=1}^G s_g$, is a design parameter, where $s_g$ is the reduced dimension of the effective channel of the g-th group to be optimized.
We denote by $\mathbf{H}_g\triangleq[\mathbf{h}_{g,1},\ldots,\mathbf{h}_{g,K_g}]$ the channel matrix of the g-th group, so the concatenated channel matrix is $\mathbf{H}=[\mathbf{H}_1,\ldots,\mathbf{H}_G]$. For given channel covariance matrices, the dimension-reduced effective channel after outer precoding is represented as

$$\mathbf{H}^{\mathrm{eff}} \triangleq \mathbf{T}^\dagger\mathbf{H} = \begin{bmatrix} \mathbf{T}_1^\dagger\mathbf{H}_1 & \mathbf{T}_1^\dagger\mathbf{H}_2 & \cdots & \mathbf{T}_1^\dagger\mathbf{H}_G \\ \mathbf{T}_2^\dagger\mathbf{H}_1 & \mathbf{T}_2^\dagger\mathbf{H}_2 & \cdots & \mathbf{T}_2^\dagger\mathbf{H}_G \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{T}_G^\dagger\mathbf{H}_1 & \mathbf{T}_G^\dagger\mathbf{H}_2 & \cdots & \mathbf{T}_G^\dagger\mathbf{H}_G \end{bmatrix}. \quad (4)$$
Note that it is difficult for the BS to acquire the CDI of the whole effective channel $\mathbf{H}^{\mathrm{eff}}$ due to the heavy channel feedback overhead. Thus, we focus on a more practical approach with low computational complexity, which quantizes and feeds back only the dimension-reduced effective channel of each group, i.e., $\mathbf{H}_g^{\mathrm{eff}}\,(\triangleq\mathbf{T}_g^\dagger\mathbf{H}_g)\in\mathbb{C}^{s_g\times K_g}$, whose dimension is determined by $s_g$. Consequently, the received signal at user k in the g-th group is given by

$$y_{g,k} = \sqrt{\tfrac{P}{K}}\,\mathbf{h}_{g,k}^\dagger\mathbf{T}_g\mathbf{v}_{g,k}d_{g,k} + \underbrace{\sqrt{\tfrac{P}{K}}\,\mathbf{h}_{g,k}^\dagger\sum_{i\neq k}^{K_g}\mathbf{T}_g\mathbf{v}_{g,i}d_{g,i}}_{\text{same-group interference (SGI)}} + \underbrace{\sqrt{\tfrac{P}{K}}\,\mathbf{h}_{g,k}^\dagger\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}\mathbf{T}_c\mathbf{v}_{c,j}d_{c,j}}_{\text{inter-group interference (IGI)}} + n_{g,k}, \quad (5)$$

where $\mathbf{v}_{g,k}$ and $\mathbf{v}_{g,i}$ are the beamforming vectors of user k and user $i\,(\neq k)$ in the g-th group, respectively, which constitute the inner precoder of the g-th group $\mathbf{V}_g=[\mathbf{v}_{g,1},\ldots,\mathbf{v}_{g,K_g}]$; $\mathbf{v}_{c,j}$ is the beamforming vector of the j-th user in group $c\,(\neq g)$, i.e., $\mathbf{V}_c=[\mathbf{v}_{c,1},\ldots,\mathbf{v}_{c,K_c}]$; and $n_{g,k}\sim\mathcal{CN}(0,\sigma^2)$ is complex Gaussian noise with zero mean and variance $\sigma^2$. The second and third terms on the right-hand side of (5) are the SGI and the IGI, respectively.
Since each group is treated separately and only the CDI of its dimension-reduced effective channel is exploited, the IGI should be cancelled by the outer precoder based only on the channel covariance matrices $\mathbf{R}_1,\ldots,\mathbf{R}_G$. Hence, the outer precoder of the g-th group is designed with the criterion

$$\mathbf{H}_c^\dagger\mathbf{T}_g \approx \mathbf{0} \quad \text{for all } c\neq g. \quad (6)$$
According to the approximation (6), we adopt the block diagonalization (BD) method proposed in [10] for the design of the outer precoder in order to cancel the IGI among the other groups; it constructs the precoder from the null-space of the channels of the other groups. Note that the BD method is a generalization of zero-forcing channel inversion for multi-user MIMO channels with linear processing [10,28]. With an approach similar to [10], the outer precoder of the g-th group is designed as follows. We define the matrix

$$\mathbf{Y}_g \triangleq [\mathbf{U}_1',\ldots,\mathbf{U}_{g-1}',\mathbf{U}_{g+1}',\ldots,\mathbf{U}_G'], \quad (7)$$

where $\mathbf{U}_c'$ is an $M\times r_c'$ matrix comprised of the $r_c'\,(\leq r_c)$ dominant eigenvectors in $\mathbf{U}_c$, the eigenmatrix of the c-th group covariance matrix such that $\mathbf{R}_c=\mathbf{U}_c\boldsymbol{\Lambda}_c\mathbf{U}_c^\dagger$. The dimension of $\mathbf{Y}_g$ is $M\times\sum_{c\neq g}^G r_c'$, which must satisfy $\sum_{c\neq g}^G r_c'\leq M$. Note that, if $\sum_{g=1}^G r_g\leq M$, we can choose $r_c'=r_c$ to reflect the eigenvectors of the other groups exactly. Thus, we assume $r_c'=r_c$ $(c\neq g)$ in the definition (7) to construct $\mathbf{Y}_g$ for each group $g\in\{1,2,\ldots,G\}$. Using the SVD, $\mathbf{Y}_g$ can be expressed as

$$\mathbf{Y}_g = \boldsymbol{\Psi}_g\boldsymbol{\Sigma}_g\big[\boldsymbol{\Phi}_g^{(1)},\boldsymbol{\Phi}_g^{(0)}\big]^\dagger, \quad (8)$$

where $\boldsymbol{\Phi}_g^{(0)}$ is an $M\times(M-\sum_{c\neq g}^G r_c')$ sub-unitary matrix ($(\boldsymbol{\Phi}_g^{(0)})^\dagger\boldsymbol{\Phi}_g^{(0)}=\mathbf{I}_{M-\sum_{c\neq g}^G r_c'}$) comprised of the orthonormal bases of the null-space of $\mathbf{Y}_g$. After projecting onto this null-space, i.e., $\mathrm{Span}(\boldsymbol{\Phi}_g^{(0)})$, the covariance matrix of the projected channel is obtained by

$$\bar{\mathbf{R}}_g = (\boldsymbol{\Phi}_g^{(0)})^\dagger\mathbf{R}_g\boldsymbol{\Phi}_g^{(0)} = \bar{\mathbf{U}}_g\bar{\boldsymbol{\Lambda}}_g\bar{\mathbf{U}}_g^\dagger, \quad (9)$$

where the right-hand side of (9) is obtained from the SVD. Selecting the $s_g\,(\leq r_g)$ dominant eigenmodes of $\bar{\mathbf{R}}_g$, we can construct the dimension-reduced effective channel of group g according to the BD method. Consequently, the outer precoder of the g-th group, i.e., $\mathbf{T}_g\in\mathbb{C}^{M\times s_g}$, is given by

$$\mathbf{T}_g = \boldsymbol{\Phi}_g^{(0)}\bar{\mathbf{U}}_g', \quad (10)$$

where $\bar{\mathbf{U}}_g'$ contains the $s_g$ dominant eigenvectors of $\bar{\mathbf{U}}_g$, and $s_g$ is the design parameter to be optimized. Note that $\mathbf{T}_g^\dagger\mathbf{T}_g = (\bar{\mathbf{U}}_g')^\dagger(\boldsymbol{\Phi}_g^{(0)})^\dagger\boldsymbol{\Phi}_g^{(0)}\bar{\mathbf{U}}_g' = (\bar{\mathbf{U}}_g')^\dagger\bar{\mathbf{U}}_g' = \mathbf{I}_{s_g}$.
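The construction in (7)–(10) can be summarized in a short NumPy sketch; this is a minimal illustration of the BD outer precoder under the assumption $r_c'=r_c$, with function names of our own choosing.

```python
import numpy as np

def bd_outer_precoder(R_list, s_list, rank_tol=1e-6):
    """Minimal sketch of the BD outer precoders T_g of (7)-(10).

    R_list: list of G group covariance matrices (M x M);
    s_list: list of reduced dimensions s_g per group.
    Returns a list of M x s_g outer precoders with T_g^H T_g = I.
    """
    G, M = len(R_list), R_list[0].shape[0]
    # Dominant eigenvectors U_g (compact eigenbasis) of every group covariance
    U_list = []
    for R in R_list:
        lam, U = np.linalg.eigh(R)
        U_list.append(U[:, lam > rank_tol * lam.max()])
    T_list = []
    for g in range(G):
        # (7): stack the eigenbases of all other groups
        Y_g = np.hstack([U_list[c] for c in range(G) if c != g])
        # (8): orthonormal basis of the null-space of Y_g via the full SVD
        _, sv, Vh = np.linalg.svd(Y_g.conj().T, full_matrices=True)
        rank = int(np.sum(sv > rank_tol * sv.max()))
        Phi0 = Vh[rank:, :].conj().T                    # M x (M - rank)
        # (9): covariance of the projected channel and its eigenmodes
        R_bar = Phi0.conj().T @ R_list[g] @ Phi0
        lam_bar, U_bar = np.linalg.eigh(R_bar)
        order = np.argsort(lam_bar)[::-1]               # descending eigenvalues
        # (10): keep the s_g dominant eigenmodes
        T_list.append(Phi0 @ U_bar[:, order[:s_list[g]]])
    return T_list
```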
For the inner precoder, we adopt zero-forcing (ZF) beamforming to mitigate the multiuser interference among the users in each group (i.e., the SGI). The beamforming vector of user k in the g-th group is constructed from the null-space of the effective channel vectors of all the other users in the g-th group. Obviously, a minimum mean squared error (MMSE)-type precoder can achieve better performance than the ZF precoder. However, with the MMSE precoder, the optimal regularization parameter of the inner precoder depends on the outer precoder design, so it is not easy to find; moreover, in the limited feedback environment, the channel quantization errors make the optimal regularization factor even more difficult to find. Thus, we adopt the ZF precoder for the inner precoder design thanks to its simplicity and analytical tractability. Note that ZF beamforming is asymptotically optimal among all downlink beamforming strategies in the high SNR region [6,29], and it guarantees high spectral efficiency for large-scale antennas with low-complexity linear processing [2,9]. The inner precoder of the g-th group is then given by

$$\mathbf{V}_g = [\mathbf{v}_{g,1},\ldots,\mathbf{v}_{g,K_g}] \in \mathbb{C}^{s_g\times K_g}, \quad (11)$$

where $\mathbf{v}_{g,k}$ is the ZF beamforming vector of user k in the g-th group. Note that each beamforming vector is normalized such that $\|\mathbf{v}_{g,k}\|^2=1$, since the two-stage beamforming vector of user k in the g-th group with the BD-based outer precoder, i.e., $\mathbf{f}_k=\mathbf{T}_g\mathbf{v}_{g,k}$, should satisfy $\mathbf{f}_k^\dagger\mathbf{f}_k=1$:

$$\mathbf{f}_k^\dagger\mathbf{f}_k = \mathbf{v}_{g,k}^\dagger\mathbf{T}_g^\dagger\mathbf{T}_g\mathbf{v}_{g,k} = \mathbf{v}_{g,k}^\dagger\mathbf{I}_{s_g}\mathbf{v}_{g,k} = \mathbf{v}_{g,k}^\dagger\mathbf{v}_{g,k} = \|\mathbf{v}_{g,k}\|^2 = 1, \quad (12)$$

which ensures the equal power allocation at the BS in (1). The construction of the ZF beamforming vector of user k in the g-th group is detailed in the following subsection.

3.2. Limited Feedback Method with a Two-Stage Precoder

In the previous section, the dimension of the effective channel of each user in group g was reduced to $s_g$ by the outer precoder. Hence, we focus on the limited feedback system used to acquire the CDI of the dimension-reduced effective channel of each group, i.e., $\mathbf{H}_g^{\mathrm{eff}}$. We define the dimension-reduced effective channel of user k in the g-th group as

$$\mathbf{h}_{g,k}^{\mathrm{eff}} \triangleq \mathbf{T}_g^\dagger\mathbf{h}_{g,k} \in \mathbb{C}^{s_g\times1}. \quad (13)$$

Given the covariance matrix $\mathbf{R}_g$ and the outer precoder $\mathbf{T}_g$, each user only needs to quantize its effective channel $\mathbf{h}_{g,k}^{\mathrm{eff}}$ and feed the quantized channel back to the BS.
For the quantization of the effective channel, we adopt the random vector quantizer (RVQ), which is widely used to analyze the effects of quantization error and performs close to optimal quantization in a Rayleigh fading channel environment [6,7,9]. Using the RVQ, the Rayleigh-fading component of the effective channel in (13) should be quantized. By the Karhunen–Loève representation, the effective channel in (13) can be decomposed by the SVD [13] as

$$\mathbf{h}_{g,k}^{\mathrm{eff}} = \mathbf{T}_g^\dagger\mathbf{U}_g\boldsymbol{\Lambda}_g^{1/2}\mathbf{e}_{g,k} = \boldsymbol{\Omega}_g\bar{\boldsymbol{\Sigma}}_g\boldsymbol{\Gamma}_g^\dagger\mathbf{e}_{g,k} = \boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\mathbf{w}_{g,k}, \quad (14)$$

where $\boldsymbol{\Omega}_g\bar{\boldsymbol{\Sigma}}_g\boldsymbol{\Gamma}_g^\dagger$ ($\boldsymbol{\Omega}_g\in\mathbb{C}^{s_g\times s_g}$, $\bar{\boldsymbol{\Sigma}}_g\in\mathbb{C}^{s_g\times r_g}$, $\boldsymbol{\Gamma}_g\in\mathbb{C}^{r_g\times r_g}$) is the SVD of $\mathbf{T}_g^\dagger\mathbf{U}_g\boldsymbol{\Lambda}_g^{1/2}$; $\boldsymbol{\Sigma}_g\in\mathbb{C}^{s_g\times s_g}$ is the matrix comprised of the first $s_g$ columns of $\bar{\boldsymbol{\Sigma}}_g$, i.e., the diagonal matrix with $s_g$ non-zero positive singular values; and $\mathbf{w}_{g,k}\in\mathbb{C}^{s_g\times1}$ is the vector of the first $s_g$ elements of $\boldsymbol{\Gamma}_g^\dagger\mathbf{e}_{g,k}$. Note that $\mathbf{w}_{g,k}$ follows the distribution $\mathcal{CN}(\mathbf{0},\mathbf{I}_{s_g})$, and we have

$$\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g = \boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g(\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g)^\dagger \quad (15)$$

due to the facts that $\mathbb{E}[\mathbf{h}_{g,k}^{\mathrm{eff}}(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger]=\mathbf{T}_g^\dagger\mathbb{E}[\mathbf{h}_{g,k}\mathbf{h}_{g,k}^\dagger]\mathbf{T}_g=\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g$ from (13), and $\mathbb{E}[\mathbf{h}_{g,k}^{\mathrm{eff}}(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger]=\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\mathbb{E}[\mathbf{w}_{g,k}\mathbf{w}_{g,k}^\dagger](\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g)^\dagger=\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g(\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g)^\dagger$ from (14) and $\mathbf{w}_{g,k}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{s_g})$.
Allocating B bits to each user's feedback, the RVQ codebook of user k in the g-th group, i.e., $\mathcal{C}_{g,k}=\{\mathbf{c}_{g,k,1},\ldots,\mathbf{c}_{g,k,2^B}\}$, consists of $2^B$ randomly chosen isotropic $s_g$-dimensional unit-norm vectors. The quantized CDI of $\mathbf{w}_{g,k}$ in (14), i.e., $\hat{\mathbf{w}}_{g,k}$, is then obtained by

$$\hat{\mathbf{w}}_{g,k} = \arg\max_{\mathbf{c}\in\mathcal{C}_{g,k}}\cos^2\big(\angle(\tilde{\mathbf{w}}_{g,k},\mathbf{c})\big) = \arg\max_{\mathbf{c}\in\mathcal{C}_{g,k}}|\tilde{\mathbf{w}}_{g,k}^\dagger\mathbf{c}|^2, \quad (16)$$

where $\tilde{\mathbf{w}}_{g,k}=\mathbf{w}_{g,k}/\|\mathbf{w}_{g,k}\|$. The quantization error, denoted by $Z_{g,k}^{\hat{w}}\in[0,1]$, is defined as

$$Z_{g,k}^{\hat{w}} \triangleq 1 - |\tilde{\mathbf{w}}_{g,k}^\dagger\hat{\mathbf{w}}_{g,k}|^2. \quad (17)$$

For an arbitrary codeword $\mathbf{c}\in\mathcal{C}_{g,k}$ in (16), the value $1-|\tilde{\mathbf{w}}_{g,k}^\dagger\mathbf{c}|^2$ follows the beta distribution with parameters $(s_g-1,1)$ because $|\tilde{\mathbf{w}}_{g,k}^\dagger\mathbf{c}|^2$ is the squared inner product of two independent and isotropic unit-norm random vectors in $\mathbb{C}^{s_g}$ [6,7]. Consequently, the quantization error of the B-bit RVQ, i.e., $Z_{g,k}^{\hat{w}}$ in (17), is the minimum of $2^B$ independent beta-distributed random variables, whose complementary cumulative distribution function (CCDF) is given by $\Pr[Z_{g,k}^{\hat{w}}>z]=(1-z^{s_g-1})^{2^B}$, and whose expectation is bounded as [6,7]

$$\mathbb{E}[Z_{g,k}^{\hat{w}}] < 2^{-\frac{B}{s_g-1}}. \quad (18)$$
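A minimal NumPy sketch of the RVQ step in (16)–(17) is given below; the function name is ours, the codebook is regenerated per call, and the empirical mean of the quantization error can be checked against the bound (18).

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq_quantize(w, B, rng):
    """RVQ of a complex s_g-dimensional direction as in (16)-(17).

    Returns the selected codeword w_hat and the quantization error Z.
    """
    s = w.shape[0]
    # 2^B isotropic unit-norm codewords (complex Gaussian, normalized)
    C = rng.standard_normal((2**B, s)) + 1j * rng.standard_normal((2**B, s))
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    w_tilde = w / np.linalg.norm(w)
    gains = np.abs(C.conj() @ w_tilde) ** 2       # |w_tilde^H c|^2 per codeword
    idx = int(np.argmax(gains))
    return C[idx], 1.0 - gains[idx]

# Empirical check of the bound (18): E[Z] < 2^(-B / (s_g - 1))
s_g, B, trials = 5, 8, 2000
Z = [rvq_quantize(rng.standard_normal(s_g) + 1j * rng.standard_normal(s_g), B, rng)[1]
     for _ in range(trials)]
print(np.mean(Z), 2 ** (-B / (s_g - 1)))
```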
After receiving the feedback of $\hat{\mathbf{w}}_{g,k}$, the BS obtains the quantized CDI of the dimension-reduced effective channel according to (14) as

$$\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}} = \frac{\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\hat{\mathbf{w}}_{g,k}}{\|\boldsymbol{\Omega}_g\boldsymbol{\Sigma}_g\hat{\mathbf{w}}_{g,k}\|}, \quad (19)$$

and the quantization error of the dimension-reduced effective channel, denoted by $Z_{g,k}^{\hat{h}^{\mathrm{eff}}}\in[0,1]$, is defined as

$$Z_{g,k}^{\hat{h}^{\mathrm{eff}}} \triangleq 1 - |(\tilde{\mathbf{h}}_{g,k}^{\mathrm{eff}})^\dagger\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}}|^2, \quad (20)$$

where $\tilde{\mathbf{h}}_{g,k}^{\mathrm{eff}}=\mathbf{h}_{g,k}^{\mathrm{eff}}/\|\mathbf{h}_{g,k}^{\mathrm{eff}}\|$. Note that the distribution of $Z_{g,k}^{\hat{h}^{\mathrm{eff}}}$ in (20) can differ from the (beta) distribution of $Z_{g,k}^{\hat{w}}$ in (17), since the quantized CDI of the dimension-reduced effective channel is projected through the covariance matrix and the outer precoder.
Based on the quantized CDI of the dimension-reduced effective channels of the g-th group, i.e., $\hat{\mathbf{H}}_g^{\mathrm{eff}}=[\hat{\mathbf{h}}_{g,1}^{\mathrm{eff}},\ldots,\hat{\mathbf{h}}_{g,K_g}^{\mathrm{eff}}]\in\mathbb{C}^{s_g\times K_g}$, the BS constructs the inner precoder of the g-th group, $\mathbf{V}_g=[\mathbf{v}_{g,1},\ldots,\mathbf{v}_{g,K_g}]$, where the ZF beamforming vector of user k in the g-th group is obtained as

$$\mathbf{v}_{g,k} = \frac{\mathbf{A}_{g,k}\mathbf{A}_{g,k}^\dagger\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}}}{\|\mathbf{A}_{g,k}\mathbf{A}_{g,k}^\dagger\hat{\mathbf{h}}_{g,k}^{\mathrm{eff}}\|}. \quad (21)$$

Here, $\mathbf{A}_{g,k}\mathbf{A}_{g,k}^\dagger\in\mathbb{C}^{s_g\times s_g}$ is the projector onto the null-space of the quantized CDIs of the effective channels of all other users in group g, where $\mathbf{A}_{g,k}$ is an $s_g\times(s_g-K_g+1)$ submatrix comprised of orthonormal column vectors, obtained from the SVD of $\hat{\mathbf{H}}_{g,-k}^{\mathrm{eff}}$ such that

$$\hat{\mathbf{H}}_{g,-k}^{\mathrm{eff}} = \big[\mathbf{A}_{g,k}^{(1)},\mathbf{A}_{g,k}\big]\boldsymbol{\Xi}_{g,k}\mathbf{L}_{g,k}^\dagger, \quad (22)$$

where $\hat{\mathbf{H}}_{g,-k}^{\mathrm{eff}}=[\hat{\mathbf{h}}_{g,1}^{\mathrm{eff}},\ldots,\hat{\mathbf{h}}_{g,k-1}^{\mathrm{eff}},\hat{\mathbf{h}}_{g,k+1}^{\mathrm{eff}},\ldots,\hat{\mathbf{h}}_{g,K_g}^{\mathrm{eff}}]$ is the matrix of the quantized CDIs of the effective channels of all other users in group g [11,29].
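The ZF construction in (21)–(22) amounts to projecting each user's quantized direction onto the null-space of the other users' quantized directions; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def zf_inner_precoder(H_hat_eff):
    """ZF beamforming vectors of (21)-(22) from quantized effective CDIs.

    H_hat_eff: s_g x K_g matrix of unit-norm quantized effective channels.
    Returns V_g with unit-norm columns v_{g,k}.
    """
    s_g, K_g = H_hat_eff.shape
    V = np.zeros((s_g, K_g), dtype=complex)
    for k in range(K_g):
        H_others = np.delete(H_hat_eff, k, axis=1)      # all other users' CDIs
        # Null-space basis A_{g,k} from the full SVD of H_others, as in (22)
        U, _, _ = np.linalg.svd(H_others, full_matrices=True)
        A = U[:, K_g - 1:]                              # s_g x (s_g - K_g + 1)
        v = A @ (A.conj().T @ H_hat_eff[:, k])          # project onto null-space
        V[:, k] = v / np.linalg.norm(v)                 # normalization as in (12)
    return V
```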

3.3. Problem Formulation

With the two-stage precoder and the quantized CDI of the effective channel, the signal-to-interference-plus-noise ratio (SINR) of user k in the g-th group is obtained from (5) as

$$\mathrm{SINR}_{g,k} = \frac{\gamma|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,k}|^2}{\underbrace{\gamma\sum_{i\neq k}^{K_g}|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,i}|^2}_{\mathrm{SGI}} + \underbrace{\gamma\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}|\mathbf{h}_{g,k}^\dagger\mathbf{T}_c\mathbf{v}_{c,j}|^2}_{\mathrm{IGI}} + 1}, \quad (23)$$

where $\gamma\triangleq\frac{P}{K\sigma^2}$ is the signal-to-noise ratio (SNR) at each user. Then, the average sum rate, denoted by $R_{\mathrm{sum}}$, is given by

$$R_{\mathrm{sum}} = \sum_{g=1}^{G}\sum_{k=1}^{K_g}\mathbb{E}\left[\log_2\left(1+\mathrm{SINR}_{g,k}\right)\right]. \quad (24)$$
Analyzing (23) and (24), since only the quantized CDI of the effective channel $\mathbf{h}_{g,k}^{\mathrm{eff}}$ is fed back to the BS, the IGI term in $\mathrm{SINR}_{g,k}$ is not affected by the quantized CDI; it is determined by the outer precoders of the other groups, i.e., $\{\mathbf{T}_c\}_{c\neq g}^G$. Exploiting the BD method for the outer precoder design as in (10), the quality of IGI cancellation, i.e., $\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}|\mathbf{h}_{g,k}^\dagger\mathbf{T}_c\mathbf{v}_{c,j}|^2$, depends on the reduced dimensions of the effective channels of the other groups, i.e., $\{s_c\}_{c\neq g}^G$. On the other hand, the magnitude of the desired signal term, i.e., $|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,k}|^2$, and the quality of SGI cancellation, i.e., $\sum_{i\neq k}^{K_g}|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,i}|^2$, are restricted by the dimension of the outer precoder of the g-th group, i.e., $s_g$ [19]. Note that the quantization error of the effective channel, i.e., $Z_{g,k}^{\hat{h}^{\mathrm{eff}}}$ in (20), is also governed by $s_g$ for a given feedback allocation of B bits. Therefore, the reduced dimensions of the effective channels of all groups (i.e., $s_1,\ldots,s_G$) should be jointly optimized considering all the interactions among them in order to maximize the average sum rate in (24). Thus, we formulate the optimization problem to find the optimal $s_1,\ldots,s_G$ as follows [19]:
$$\mathcal{P}1:\quad \underset{s_1,\ldots,s_G}{\mathrm{maximize}}\ \sum_{g=1}^{G}\sum_{k=1}^{K_g}\mathbb{E}\left[\log_2\left(1+\mathrm{SINR}_{g,k}\right)\right] \quad \mathrm{subject\ to}\quad s_g\in\mathbb{Z}^{+},\ K_g\leq s_g\leq r_g,\ g=1,\ldots,G. \quad (25)$$
The optimization problem $\mathcal{P}1$ is difficult to solve directly because it is a mixed-integer problem, which is generally known to be NP-hard. Moreover, the effect of the dimensions is implicit in the objective function, and the optimal solution can be obtained only by numerical search over every channel realization, which is almost impossible in practice. Note that, once the reduced dimensions $\{s_g\}_{g=1}^G$ are determined, the BS informs the users in the g-th group of $s_g$ so that they can quantize their $s_g$-dimensional effective channels with the corresponding codebook.
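For reference, the Monte Carlo evaluation of (23)–(24) that a brute-force search over (25) would call in its inner loop can be sketched as follows. It reuses the hypothetical helpers from the earlier sketches (one_ring_covariance, bd_outer_precoder, rvq_quantize, zf_inner_precoder) and, for brevity, quantizes the effective channel direction directly rather than through the decomposition (14), which is a simplification.

```python
import numpy as np

def avg_sum_rate(R_list, K_list, s_list, B, gamma, n_drops=200, rng=None):
    """Monte Carlo estimate of the average sum rate (24) for given {s_g}."""
    rng = rng or np.random.default_rng()
    G, M = len(R_list), R_list[0].shape[0]
    T = bd_outer_precoder(R_list, s_list)
    # Covariance square roots for channel generation h ~ CN(0, R)
    R_sqrt = [np.linalg.cholesky(R + 1e-9 * np.eye(M)) for R in R_list]
    rate = 0.0
    for _ in range(n_drops):
        H = [R_sqrt[g] @ (rng.standard_normal((M, K_list[g]))
                          + 1j * rng.standard_normal((M, K_list[g]))) / np.sqrt(2)
             for g in range(G)]
        V = []
        for g in range(G):
            H_eff = T[g].conj().T @ H[g]                      # (13)
            H_hat = np.stack([rvq_quantize(H_eff[:, k], B, rng)[0]
                              for k in range(K_list[g])], axis=1)
            V.append(zf_inner_precoder(H_hat))                # (21)
        for g in range(G):
            H_eff = T[g].conj().T @ H[g]
            for k in range(K_list[g]):
                sig = gamma * np.abs(H_eff[:, k].conj() @ V[g][:, k]) ** 2
                sgi = gamma * sum(np.abs(H_eff[:, k].conj() @ V[g][:, i]) ** 2
                                  for i in range(K_list[g]) if i != k)
                igi = gamma * sum(np.abs(H[g][:, k].conj() @ T[c] @ V[c][:, j]) ** 2
                                  for c in range(G) if c != g
                                  for j in range(K_list[c]))
                rate += np.log2(1 + sig / (sgi + igi + 1))    # (23)-(24)
    return rate / n_drops
```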

4. Our Previous Work on Dimension Optimization

As explained in the previous section, problem $\mathcal{P}1$ is hard to solve because it is NP-hard and the effect of the dimensions is implicit in the objective function. In this section, we briefly explain how we solved problem $\mathcal{P}1$ in our previous work [19].
In [19], we showed that the objective function of problem $\mathcal{P}1$ can be approximated [3] and then lower bounded as follows:

$$\sum_{g=1}^{G}\sum_{k=1}^{K_g}\mathbb{E}\left[\log_2\left(1+\mathrm{SINR}_{g,k}\right)\right] \approx \sum_{g=1}^{G}\sum_{k=1}^{K_g}\log_2\left(1+\frac{\mathbb{E}\left[\gamma|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,k}|^2\right]}{\mathbb{E}\left[\gamma\sum_{i\neq k}^{K_g}|(\mathbf{h}_{g,k}^{\mathrm{eff}})^\dagger\mathbf{v}_{g,i}|^2+\gamma\sum_{c\neq g}^{G}\sum_{j=1}^{K_c}|\mathbf{h}_{g,k}^\dagger\mathbf{T}_c\mathbf{v}_{c,j}|^2\right]+1}\right) \quad (26)$$

$$> \sum_{g=1}^{G} R_g(s_1,\ldots,s_G), \quad (27)$$

where

$$R_g(s_1,\ldots,s_G) \triangleq \log_2\left(1+\frac{\gamma\left(\mathrm{Tr}(\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g)-\sum_{j=1}^{K_g-1}\lambda_j(\mathbf{R}_g)\right)}{\gamma\frac{K_g-1}{s_g-1}2^{-\frac{B}{s_g-1}}\mathrm{Tr}(\mathbf{T}_g^\dagger\mathbf{R}_g\mathbf{T}_g)+\gamma\sum_{c\neq g}^{G}K_c\,\mathrm{Tr}(\mathbf{T}_c^\dagger\mathbf{R}_g\mathbf{T}_c)+1}\right), \quad (28)$$

with $\lambda_j(\mathbf{R}_g)$ the j-th largest eigenvalue of $\mathbf{R}_g$. Thus, the effect of the dimensions becomes explicit in (27). Note that, given $\{s_g\}_{g=1}^G$, the BS can compute the lower bound (27) based only on the covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$ and the outer precoders $\{\mathbf{T}_g\}_{g=1}^G$, without the CDI of the $s_g$-dimensional effective channels (i.e., $\mathbf{h}_{g,k}^{\mathrm{eff}}$ for all $g\in\{1,\ldots,G\}$ and $k\in\{1,\ldots,K_g\}$).
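Because (28) depends only on the covariance matrices and the outer precoders, it is cheap to evaluate without any instantaneous CSI; a minimal NumPy sketch, assuming our reconstruction of (28) above and the helpers from the earlier snippets:

```python
import numpy as np

def rate_lower_bound(g, R_list, T_list, K_list, B, gamma):
    """Evaluate the per-group lower bound R_g of (28)."""
    G = len(R_list)
    R_g, T_g = R_list[g], T_list[g]
    K_g, s_g = K_list[g], T_list[g].shape[1]
    tr_g = np.real(np.trace(T_g.conj().T @ R_g @ T_g))
    # The K_g - 1 largest eigenvalues of R_g (eigvalsh returns ascending order)
    lam = np.linalg.eigvalsh(R_g)[::-1]
    signal = gamma * (tr_g - np.sum(lam[:K_g - 1]))
    sgi = gamma * (K_g - 1) / (s_g - 1) * 2 ** (-B / (s_g - 1)) * tr_g
    igi = gamma * sum(K_list[c]
                      * np.real(np.trace(T_list[c].conj().T @ R_g @ T_list[c]))
                      for c in range(G) if c != g)
    return np.log2(1 + signal / (sgi + igi + 1))
```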
For a practical design, we can establish an alternative optimization problem using the lower bound (28) as follows:

$$\mathcal{P}2:\quad \underset{s_1,\ldots,s_G}{\mathrm{maximize}}\ \sum_{g=1}^{G} R_g(s_1,\ldots,s_G) \quad \mathrm{subject\ to}\quad s_g\in\mathbb{Z}^{+},\ K_g\leq s_g\leq r_g,\ g=1,\ldots,G. \quad (29)$$
Problem $\mathcal{P}2$ is also a mixed-integer problem, and the objective function of (29) is not available in closed form because the outer precoders $\{\mathbf{T}_g\}_{g=1}^G$ are constructed by the procedure in (7)–(10) for each choice of $\{s_g\}_{g=1}^G$. Thus, the optimal solution of problem $\mathcal{P}2$ requires a combinatorial joint optimization of $s_1,\ldots,s_G$, i.e., a G-dimensional numerical search, which is still complex.
To reduce the complexity of the G-dimensional numerical search in problem $\mathcal{P}2$, we can assume that the dimensions of the effective channels are the same across all groups, i.e., $s_1=\cdots=s_G=s$. Then, problem $\mathcal{P}2$ reduces to

$$\mathcal{P}3:\quad \underset{s}{\mathrm{maximize}}\ \sum_{g=1}^{G} R_g(s) \quad \mathrm{subject\ to}\quad s\in\mathbb{Z}^{+},\ \max\{K_1,\ldots,K_G\}\leq s\leq\min\{r_1,\ldots,r_G\}, \quad (30)$$

and the optimal solution of problem $\mathcal{P}3$ can be obtained by a one-dimensional numerical search [19]. The detailed procedure to obtain the optimal solution of problem $\mathcal{P}3$ is described in Algorithm 1.
Algorithm 1 Finding the optimal solution of the problem $\mathcal{P}3$
 1: Input: Channel covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$, numbers of users in the groups $\{K_g\}_{g=1}^G$, SNR $\gamma$, and feedback size B
 2: Obtain $\mathbf{U}_g$ and $r_g$ for all $g\in\{1,\ldots,G\}$ by the SVD
 3: Define $s_{\min}\triangleq\max\{K_1,\ldots,K_G\}$ and $s_{\max}\triangleq\min\{r_1,\ldots,r_G\}$
 4: for $s=s_{\min},\ldots,s_{\max}$ do
 5:    Construct the outer precoder $\mathbf{T}_g\in\mathbb{C}^{M\times s}$ for all $g\in\{1,\ldots,G\}$ (see Section 3.1)
 6:    Compute $R_g(s)$ based on (28) for all $g\in\{1,\ldots,G\}$
 7:    Compute $\sum_{g=1}^{G} R_g(s)$
 8: end for
 9: Obtain $s^\star=\arg\max_{s_{\min}\leq s\leq s_{\max}}\sum_{g=1}^{G} R_g(s)$
10: Output: The optimal dimension $s^\star$
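Algorithm 1 translates directly into a one-dimensional search; a sketch reusing the hypothetical bd_outer_precoder and rate_lower_bound helpers from the earlier snippets:

```python
import numpy as np

def optimize_common_dimension(R_list, K_list, B, gamma, rank_tol=1e-6):
    """One-dimensional search of Algorithm 1 for problem P3 (s_1 = ... = s_G = s)."""
    G = len(R_list)
    # Step 2: effective ranks r_g from the eigenvalues of each covariance matrix
    ranks = []
    for R in R_list:
        lam = np.linalg.eigvalsh(R)
        ranks.append(int(np.sum(lam > rank_tol * lam.max())))
    # Step 3: search interval
    s_min, s_max = max(K_list), min(ranks)
    best_s, best_rate = s_min, -np.inf
    # Steps 4-9: evaluate the lower-bound sum rate for every candidate s
    for s in range(s_min, s_max + 1):
        T_list = bd_outer_precoder(R_list, [s] * G)      # Section 3.1
        total = sum(rate_lower_bound(g, R_list, T_list, K_list, B, gamma)
                    for g in range(G))
        if total > best_rate:
            best_s, best_rate = s, total
    return best_s
```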

5. Machine Learning Framework for Dimension Optimization

In this section, we propose the machine learning framework for the dimension optimization (i.e., problem $\mathcal{P}1$). Note that our machine learning framework is based on a deep neural network (DNN) and tackles problem $\mathcal{P}1$ directly.

5.1. Preliminary: The General DNN Architecture

Our proposed machine learning-based dimension optimization utilizes a DNN [27], so in this subsection we briefly review a general DNN model. A DNN is one of the most popular models in machine learning and can be considered a multilayer perceptron (MLP) [20,21,22]. The DNN is comprised of one input layer, $L-2$ hidden layers, and one output layer. A DNN with many hidden layers has enhanced learning and mapping abilities, so it is capable of handling complicated nonlinear problems [20,21,22,25,26].
Let $\mathbf{w}\in\mathbb{R}^{N_0}$ be the input vector of the input layer and $\mathbf{o}\in\mathbb{R}^{N_{L-1}}$ be the output vector of the output layer. Then, the mapping between them can be mathematically represented as

$$\mathbf{o} = f(\mathbf{w};\theta) = f^{(L-1)}\Big(f^{(L-2)}\big(\cdots f^{(2)}\big(f^{(1)}(\mathbf{w};\theta_1)\big)\cdots\big)\Big), \quad (31)$$

where $f^{(l)}(\cdot;\theta_l)$ for $l\in\{1,2,\ldots,L-1\}$ is the activation function of the l-th layer, which maps the input vector of the l-th layer to its output vector, and $\theta=\{\theta_1,\ldots,\theta_{L-1}\}$ is the set of parameters, such as the weights and biases used to compute the weighted sums at the nodes, which are adjusted during the training procedure [20,21,22]. The activation functions in (31) are essential for tackling nonlinear problems with the DNN model; the nodes of each layer apply activation functions so that the layer maps its input vector to its output vector through nonlinear operations. Several activation functions are listed in Table 1 and illustrated in Figure 3.
The parameter set $\theta$ in (31) is adjusted during the training procedure to minimize a loss function [20,21,22]. In supervised learning, each training sample is labeled with the desired output $\bar{\mathbf{o}}$ (i.e., the correct answer), and hence the loss between the DNN model output $\mathbf{o}$ and the desired output $\bar{\mathbf{o}}$ becomes

$$loss(\theta) = \frac{1}{\bar{n}}\sum_{j=1}^{\bar{n}}\mathcal{L}_f\left(\bar{\mathbf{o}}_j,\mathbf{o}_j\right), \quad (32)$$

where $\bar{\mathbf{o}}_j$ and $\mathbf{o}_j$ are the desired output and the predicted output of the j-th training sample, respectively, and $\bar{n}$ is the batch size, i.e., the total number of training samples in a mini-batch.
Two types of loss functions are mainly used for supervised learning with a DNN [20,21,22], as shown in Table 2.
For supervised learning, the ultimate goal of the DNN design is to map $\mathbf{w}$ to $\bar{\mathbf{o}}$ through (31) based on the training samples. Training is thus the process of adjusting $\theta$ to minimize the loss function in (32). A widely used training algorithm is stochastic gradient descent (SGD) [20,21,22,24], which updates the parameters of each layer along the gradient at each step, i.e., for each mini-batch, to minimize the loss function. There are also many variants, such as SGD with momentum (SGDM), adaptive gradient (AdaGrad), and root mean square propagation (RMSProp) [20,21,22,30,31,32]. Adaptive moment estimation (Adam) is also widely used for DNN training.

5.2. The Proposed DNN Framework

The proposed DNN architecture is illustrated in Figure 4. It is comprised of one input layer with $(M+1)G+2$ input nodes, one output layer with M output nodes, and $\bar{L}$ hidden layers with $(N_1,\ldots,N_{\bar{L}})$ nodes. The input layer of the proposed DNN model takes each user's SNR, the feedback size, the number of users, and the eigenvalues of each user group's covariance matrix. The output layer returns the dimension for each group. Although the number of input nodes depends on the number of user groups, it is fixed in our design because the number of groups is given by the system model according to the channel statistics. Meanwhile, the numbers of hidden layers and nodes are adjustable variables of the DNN architecture.

5.2.1. Input Layer

To establish the DNN-based supervised learning framework, the input data of the learning system, i.e., $\mathbf{w}$ in (31), should be determined considering the system model and problem $\mathcal{P}1$. Based on the insight from the lower bound $\sum_{g=1}^G R_g(s_1,\ldots,s_G)$ in Section 4, our proposed DNN model takes as input the SNR at each user ($\gamma$), the feedback bit allocation (B), and the number of users ($K_g$) and the eigenvalues of the covariance matrix of every group:

$$\mathbf{w} = \Big[B,\gamma,\underbrace{K_1,\lambda_{1,1},\ldots,\lambda_{1,M}}_{\text{Group 1}},\ldots,\underbrace{K_G,\lambda_{G,1},\ldots,\lambda_{G,M}}_{\text{Group G}}\Big]^T \in \mathbb{R}^{(M+1)G+2}, \quad (33)$$

where $\{\lambda_{g,1},\ldots,\lambda_{g,M}\}\in\mathbb{R}^M$ for the g-th group consists of the $r_g$ non-zero eigenvalues of the covariance matrix of the g-th group followed by $(M-r_g)$ zeros. As the feature information of the input, the eigenvalues characterize the optimal dimensions because the covariance matrices affect the objective function of problem $\mathcal{P}1$, which is statistically determined by the effective channels and the quantization codebooks. Note that we adopt the eigenvalues rather than the covariance matrices themselves in the spirit of principal component analysis (PCA) for machine learning [22]; this reduces the complexity of the proposed DNN model for large-scale antennas, since taking the covariance matrices $\{\mathbf{R}_g\}_{g=1}^G$ as the input would require $(M^2+1)G+2$ input nodes.
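Assembling the input vector of (33) from the channel statistics is straightforward; a small NumPy sketch (function name ours), zero-padding each group's eigenvalue list to length M:

```python
import numpy as np

def build_input_vector(B, gamma, K_list, R_list):
    """Assemble the DNN input w of (33) from the system statistics."""
    M = R_list[0].shape[0]
    w = [float(B), float(gamma)]
    for K_g, R_g in zip(K_list, R_list):
        lam = np.sort(np.linalg.eigvalsh(R_g))[::-1]    # descending eigenvalues
        lam[lam < 0] = 0.0                              # clip tiny negative round-off
        w.extend([float(K_g)] + lam.tolist())           # K_g, then M eigenvalues
    return np.asarray(w)                                # length (M + 1) * G + 2
```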

5.2.2. Output Layer

The proposed DNN framework aims at obtaining the solution of problem $\mathcal{P}1$, i.e., $\{s_1,\ldots,s_G\}$. The number of eigenvalues of the g-th group varies with the channel statistics (i.e., the covariance matrices) in the range $r_g\leq M$. Thus, the maximum number of classes for the optimal dimensions over all groups becomes $M^G$, which makes it practically impossible to establish a single DNN model for massive MIMO systems. We therefore adopt a parallel DNN framework with one DNN per group, as shown in Figure 4. The desired output vector $\bar{\mathbf{o}}^g$ of the g-th group is then expressed as

$$\bar{\mathbf{o}}^g = \big[0,\ldots,0,\underbrace{1}_{s_g\text{-th component}},0,\ldots,0\big]^T \in \mathbb{R}^M, \quad (34)$$

i.e., (34) is the one-hot vector whose $s_g$-th component is one and whose other components are zero. For the output layer, the Softmax activation function is used so that the outputs lie in the interval $(0,1)$. Thus, the position of the one in the g-th output vector encodes the dimension of the g-th group.

5.2.3. Hidden Layer

The hidden layers are designed to learn the features of the training samples; they are dimensioned according to the system model to characterize the relationship between the input and output layers. We adopt the tanh function as the activation function and choose the number of hidden layers and the number of nodes per layer to maximize performance during the training procedure.
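A minimal PyTorch sketch of one per-group DNN with this structure (input width $(M+1)G+2$, tanh hidden layers, M-way softmax output); the default layer sizes follow Section 6, and the function and variable names are ours:

```python
import torch.nn as nn

def make_group_dnn(M=64, G=3, n_hidden_layers=6, hidden_width=200):
    """One per-group classifier: (M+1)G+2 inputs -> M-way output (Section 5.2)."""
    layers = [nn.Linear((M + 1) * G + 2, hidden_width), nn.Tanh()]
    for _ in range(n_hidden_layers - 1):
        layers += [nn.Linear(hidden_width, hidden_width), nn.Tanh()]
    layers.append(nn.Linear(hidden_width, M))   # logits; softmax is applied in the loss
    return nn.Sequential(*layers)

# Parallel framework: one DNN per group, as in Figure 4
models = [make_group_dnn() for _ in range(3)]
```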

5.2.4. DNN Training

For precise classification, i.e., to map the input of (33) to the desired output of (34) during training, we need enough training samples to reflect many situations. Thus, we generate the training samples by varying the SNR, the feedback size of each user, and the eigenvalues of the covariance matrices. In particular, we randomly generate covariance matrices by varying the angular spread in (3) so that the model learns the feature information of the eigenvalues, i.e., the channel statistics of the system model. With the azimuth center angle in (3) fixed, the number of eigenvalues of the covariance matrix increases as the angular spread increases, which also changes the magnitudes of the eigenvalues. Note that the number of groups and the number of users in each group are system model parameters, so they are fixed during DNN training.
Then, we obtain the optimal dimensions of problem $\mathcal{P}1$ by numerical search over every channel realization and label each sample according to (34) for the given SNR, feedback size, and group covariance matrices. For the proposed DNN training, we adopt the cross entropy as the loss function in (32), which is widely used for multi-class classification with DNN architectures [20,21,22]. Thus, the loss function of the g-th group DNN model is given by

$$loss(\theta^g) = \frac{1}{\bar{n}}\sum_{j=1}^{\bar{n}}\mathcal{L}_f\left(\bar{\mathbf{o}}_j^g,\mathbf{o}_j^g\right) = -\frac{1}{\bar{n}}\sum_{j=1}^{\bar{n}}\sum_{i=1}^{M}\left[\bar{o}_{j,i}^g\log o_{j,i}^g + (1-\bar{o}_{j,i}^g)\log(1-o_{j,i}^g)\right], \quad (35)$$

where $\bar{n}$ is the batch size of the training samples and j indexes the j-th training sample; $\bar{\mathbf{o}}_j^g$ is the desired output vector determined by $s_g$ and (34), and $\mathbf{o}_j^g$ is the predicted output of the output layer obtained by the Softmax activation function of the g-th group; $\bar{o}_{j,i}^g$ and $o_{j,i}^g$ denote the i-th elements of $\bar{\mathbf{o}}_j^g$ and $\mathbf{o}_j^g$, respectively.
For our DNN model training, we divided the samples into training and validation sets with ratios of 0.85 and 0.15, respectively. We use the Adam algorithm [32,33] for training, which combines and improves upon AdaGrad [30] and RMSProp [31]. To prevent under-fitting and over-fitting, we set the maximum number of epochs to 3000, the batch size to 1000, and the initial learning rate to 0.001; the learning rate is multiplied by a drop factor of 0.5 every 1000 epochs during training. In addition, we use an $L_2$-regularization factor of 0.01, a gradient decay factor of 0.98, and a squared gradient decay factor of 0.99. To prevent over-fitting, we also adopt an early stopping strategy, which terminates training when the maximum number of epochs or the validation patience criterion is reached. The validation patience is the number of times that the loss on the validation set is allowed to be larger than or equal to the previously obtained smallest loss [33]. We run a validation check every five epochs with a validation patience of eight.
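These hyperparameters map onto a standard training loop. A hedged PyTorch sketch follows; the paper's experiments used the MATLAB Deep Learning Toolbox [33], so this is only an illustrative translation, with dataset tensors X (inputs of (33)) and y (class indices of (34)) assumed.

```python
import torch
import torch.nn as nn

def train_group_dnn(model, X, y, max_epochs=3000, batch_size=1000,
                    check_every=5, patience=8):
    """Training loop mirroring Section 5.2.4 (illustrative PyTorch translation)."""
    n_train = int(0.85 * len(X))                       # 0.85 / 0.15 split
    X_tr, y_tr, X_va, y_va = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    # Adam: gradient decay 0.98 -> beta1, squared gradient decay 0.99 -> beta2,
    # L2 regularization 0.01 -> weight_decay
    opt = torch.optim.Adam(model.parameters(), lr=1e-3,
                           betas=(0.98, 0.99), weight_decay=0.01)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.5)
    loss_fn = nn.CrossEntropyLoss()                    # softmax + cross entropy
    best_val, strikes = float("inf"), 0
    for epoch in range(max_epochs):
        for i in range(0, n_train, batch_size):
            opt.zero_grad()
            loss = loss_fn(model(X_tr[i:i + batch_size]), y_tr[i:i + batch_size])
            loss.backward()
            opt.step()
        sched.step()                                   # learning-rate drop schedule
        if (epoch + 1) % check_every == 0:             # validation check
            with torch.no_grad():
                val = loss_fn(model(X_va), y_va).item()
            strikes = strikes + 1 if val >= best_val else 0
            best_val = min(best_val, val)
            if strikes >= patience:                    # early stopping
                break
    return model
```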

6. DNN Performance and Numerical Results

In this section, we evaluate our proposed DNN framework. We consider a single-cell massive MIMO system with limited feedback, where the BS is equipped with a 64-antenna ULA (i.e., M = 64). The BS serves 15 single-antenna users (i.e., K = 15) divided into three groups (i.e., G = 3) with 5 users each (i.e., $K_g=5$ for $g\in\{1,2,3\}$). For the channel model, we consider the correlated Rayleigh channel with $\theta_g=-\frac{\pi}{4}+\frac{\pi}{4}(g-1)$ and $\delta_a/\lambda_c=0.5$ in (3). As mentioned in Section 5.2.4, we generate covariance matrices by varying the angular spread, where $\Delta_g$ is randomly drawn from the range $\Delta_g\in\left[\frac{2\pi}{45},\frac{\pi}{9}\right]$. We denote the number of eigenvalues (i.e., the rank) of the g-th group covariance matrix by $r_g$. With this setting, the ranges of each group's rank are $r_1\in\{10,\ldots,20\}$, $r_2\in\{13,\ldots,27\}$, and $r_3\in\{10,\ldots,20\}$, respectively. To obtain the average sum rate $R_{\mathrm{sum}}$ in (24), we average over channel realizations based on RVQ codebooks, varying the small-scale Rayleigh fading $\mathbf{e}_{g,k}$ in (2) with the covariance matrices $\{\mathbf{R}_g\}_{g=1}^3$ fixed.
First, we establish our proposed DNN framework and train the machine learning model for this system setting. As explained in Section 5.2, our DNN models operate in parallel and are trained per group, so there are three DNN models of the same structure at the BS. The DNN model of each group is comprised of an input layer with 197 nodes, an output layer with 64 nodes, and six hidden layers with 200 nodes each. For the DNN training, we generate $4\times10^5$ training samples by varying each user's SNR, the feedback size, and the covariance matrices, where the optimal dimensions (i.e., $\{s_g\}_{g=1}^3$) of every training sample are obtained by numerical search according to problem $\mathcal{P}1$. To train the DNN model of each group, each sample of the g-th group is labeled with its own desired output, i.e., $\bar{\mathbf{o}}^g$ in (34), according to $s_g$. To improve the training performance, the samples are divided into the training set and the validation set, and we adopt the Adam algorithm with the parameter settings explained in Section 5.2.4.
In Figure 5, we show the training state and performance of the proposed DNN model for every group. Figure 5a shows the cross entropy loss on the training and validation sets with respect to the number of iterations. The losses of all groups decrease as the number of iterations increases, but the gap between the training and validation sets also grows; thus, the early stopping strategy of Section 5.2.4 is required to terminate training and avoid over-fitting. The losses of group 1 and group 3 are similar, while the loss of group 2 is larger than both; however, the gap between them decreases as the number of iterations increases.
Figure 5b shows the accuracy of our machine learning model on the training and validation sets with respect to the number of iterations. The accuracy is measured as the ratio of the number of correct answers (i.e., $\mathbf{o}=\bar{\mathbf{o}}$) to the total number of answers. The accuracy increases with the number of iterations, as expected from Figure 5a. The final accuracies on the training and validation sets are $(84.7, 82.0, 85.7)\%$ and $(83.2, 78.4, 84.3)\%$, respectively. Figure 5c shows the accuracy over all samples in the validation set with respect to the rank of the covariance matrix of each group. We can therefore conclude that the proposed DNN model is well trained and learns the features of the channel statistics.
Figure 6 compares the average sum rate of the proposed DNN-based scheme with those of other reference schemes, where the feedback sizes are 6 bits and 10 bits in Figure 6a,b, respectively. For comparison, we consider the following five reference schemes:
  • Optimal scheme: The optimal dimensions of the outer precoder are obtained via brute-force numerical search.
  • Full-rank-based scheme: The dimensions of the outer precoder are set to the ranks of the covariance matrices.
  • Lower-bound-based scheme: The dimensions of the outer precoder are obtained by the lower-bound analysis, i.e., the solution of problem $\mathcal{P}3$.
  • Fixed dimension scheme 1: The outer precoder dimensions are fixed to five, i.e., $s_1=\cdots=s_G=5$.
  • Fixed dimension scheme 2: The outer precoder dimensions are fixed to eight, i.e., $s_1=\cdots=s_G=8$.
For the performance comparison, we average over 100 covariance realizations with varying angular spread. In Figure 6a, the proposed scheme outperforms the reference schemes and is comparable to the optimal scheme, because the dimensions obtained by the proposed DNN framework are close to the optimal dimensions. Note that the lower-bound-based scheme increases the average sum rate compared with the other reference schemes (i.e., the full-rank-based scheme and fixed dimension schemes 1 and 2), and it lower bounds both the optimal and the proposed schemes. In Figure 6b, as expected, the proposed scheme again outperforms the reference schemes and achieves near-optimal performance. The gaps between the proposed scheme and the reference schemes differ from those in Figure 6a because the optimal dimensions are affected by the feedback bit allocation. Consequently, we conclude that the dimension of the outer precoder should be optimized to maximize the average sum rate, and that the proposed DNN-based scheme performs well, comparable to the optimal scheme.

7. Conclusions

In this paper, we optimized the dimension of the outer precoder of the two-stage precoder to maximize the average sum rate in massive MIMO systems with limited feedback. We proposed a DNN framework to find the dimensions that maximize the average sum rate, where the original problem is NP-hard. We established a DNN-based supervised learning framework that takes the SNR at each user, the feedback bit allocation, and the eigenvalues of the covariance matrices of the user groups as inputs and returns the optimal dimensions allocated to all user groups. The numerical results showed that the proposed machine learning-based outer precoder dimension optimization improves the average sum rate and achieves near-optimal performance, comparable to brute-force search, which is not feasible in practice. Although we considered single-antenna users in our system model, the proposed scheme can be extended to multi-antenna users. In this case, for the inner precoder design, we can use block diagonalization (BD) instead of the ZF precoder to cancel the inter-user interference. However, it is not easy to design the matrix codebooks shared between the transmitter and the users for limited feedback. Extensions of the proposed scheme to more general system models are our ongoing research topics.

Author Contributions

Conceptualization, J.K., J.H.L. and W.C.; methodology, J.K. and J.H.L.; software, J.K.; validation, J.K., J.H.L. and W.C.; formal analysis, J.K.; investigation, J.K.; resources, J.H.L. and W.C.; data curation, J.K.; writing—original draft preparation, J.K.; writing—review and editing, J.H.L. and W.C.; visualization, J.K.; supervision, W.C.; project administration, W.C.; funding acquisition, W.C.

Funding

This work has been supported by the Future Combat System Network Technology Research Center program of Defense Acquisition Program Administration and Agency for Defense Development (UD160070BD).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Larsson, E.G.; Edfors, O.; Tufvesson, F.; Marzetta, T.L. Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 2014, 52, 186–195.
  2. Lu, L.; Li, G.Y.; Swindlehurst, A.L.; Ashikhmin, A.; Zhang, R. An overview of massive MIMO: Benefits and challenges. IEEE J. Sel. Top. Signal Process. 2014, 8, 742–758.
  3. Zhang, Q.; Jin, S.; Wong, K.K.; Zhu, H.; Matthaiou, M. Power scaling of uplink massive MIMO systems with arbitrary-rank channel means. IEEE J. Sel. Top. Signal Process. 2014, 8, 966–981.
  4. Ngo, H.Q.; Larsson, E.G.; Marzetta, T.L. Energy and spectral efficiency of very large multiuser MIMO systems. IEEE Trans. Commun. 2013, 61, 1436–1449.
  5. Marzetta, T.L. How much training is required for multiuser MIMO? In Proceedings of the 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 29 October–1 November 2006; pp. 359–363.
  6. Jindal, N. MIMO broadcast channels with finite-rate feedback. IEEE Trans. Inf. Theory 2006, 52, 5045–5060.
  7. Lee, J.H.; Choi, W. Unified codebook design for vector channel quantization in MIMO broadcast channels. IEEE Trans. Signal Process. 2015, 63, 2509–2519.
  8. Xie, H.; Gao, F.; Zhang, S.; Jin, S. A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model. IEEE Trans. Veh. Technol. 2017, 66, 3170–3184.
  9. Wagner, S.; Couillet, R.; Debbah, M.; Slock, D.T.M. Large system analysis of linear precoding in correlated MISO broadcast channels under limited feedback. IEEE Trans. Inf. Theory 2012, 58, 4509–4537.
  10. Adhikary, A.; Nam, J.; Ahn, J.-Y.; Caire, G. Joint spatial division and multiplexing—The large-scale array regime. IEEE Trans. Inf. Theory 2013, 59, 6441–6463.
  11. Kim, D.; Lee, G.; Sung, Y. Two-stage beamformer design for massive MIMO downlink by trace quotient formulation. IEEE Trans. Commun. 2015, 63, 2200–2211.
  12. Park, J.; Clerckx, B. Multi-user linear precoding for multi-polarized massive MIMO system under imperfect CSIT. IEEE Trans. Wirel. Commun. 2015, 14, 2532–2547.
  13. Jeon, Y.-S.; Min, M. Large system analysis of two-stage beamforming with limited feedback in FDD massive MIMO systems. IEEE Trans. Veh. Technol. 2018, 67, 4984–4997.
  14. Sohrabi, F.; Yu, W. Hybrid digital and analog beamforming design for large-scale antenna arrays. IEEE J. Sel. Top. Signal Process. 2016, 10, 501–513.
  15. Lin, Y.-P. Hybrid MIMO-OFDM beamforming for wideband mmWave channels without instantaneous feedback. IEEE Access 2017, 5, 21806–21817.
  16. Castanheira, D.; Lopes, P.; Silva, A.; Gameiro, A. Hybrid beamforming designs for massive MIMO millimeter-wave heterogeneous systems. IEEE Access 2017, 5, 21806–21817.
  17. Magueta, R.; Castanheira, D.; Silva, A.; Dinis, R.; Gameiro, A. Hybrid iterative space-time equalization for multi-user mmW massive MIMO systems. IEEE Trans. Commun. 2017, 65, 608–620.
  18. Castañeda, E.; Castanheira, D.; Silva, A.; Gameiro, A. Parametrization and applications of precoding reuse and downlink interference alignment. IEEE Trans. Wirel. Commun. 2017, 16, 2641–2650.
  19. Kang, J.; Lee, J.H.; Choi, W. Two-stage precoder for massive MIMO systems with limited feedback. In Proceedings of the 2018 2nd International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom), Ho Chi Minh, Vietnam, 29–31 January 2018; pp. 91–96.
  20. Klaine, P.V.; Imran, M.A.; Onireti, O.; Souza, R.D. A survey of machine learning techniques applied to self-organizing cellular networks. IEEE Commun. Surv. Tutor. 2017, 19, 2392–2431.
  21. Mao, Q.; Hu, F.; Hao, Q. Deep learning for intelligent wireless networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2018, 20, 2595–2621.
  22. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2017.
  23. Joung, J. Machine learning-based antenna selection in wireless communications. IEEE Commun. Lett. 2016, 20, 2241–2244.
  24. Kwon, H.J.; Lee, J.H.; Choi, W. Machine learning-based beamforming in two-user MISO interference channels. In Proceedings of the 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Okinawa, Japan, 11–12 February 2019; pp. 496–499.
  25. Huang, H.; Yang, J.; Huang, H.; Song, Y.; Gui, G. Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system. IEEE Trans. Veh. Technol. 2018, 67, 8549–8560.
  26. Huang, H.; Song, Y.; Yang, J.; Gui, G.; Adachi, F. Deep-learning-based millimeter-wave massive MIMO for hybrid precoding. IEEE Trans. Veh. Technol. 2019, 68, 3027–3032.
  27. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554.
  28. Spencer, Q.H.; Swindlehurst, A.L.; Haardt, M. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans. Signal Process. 2004, 52, 461–471.
  29. Li, P.; Paul, D.; Narasimhan, R.; Cioffi, J. On the distribution of SINR for the MMSE MIMO receiver and performance analysis. IEEE Trans. Inf. Theory 2006, 52, 271–286.
  30. Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159.
  31. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31.
  32. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  33. MathWorks. Deep Learning Toolbox. Available online: https://www.mathworks.com/products/deep-learning.html (accessed on 18 July 2019).
Figure 1. Illustration of our system model.
Figure 2. Illustration of the two-stage precoder with limited feedback.
Figure 3. Illustration of activation functions. (a) sigmoid; (b) hyperbolic tangent (tanh); (c) rectified linear unit (ReLU); (d) symmetric saturated linear unit (SSaLU).
Figure 4. The proposed DNN framework.
Figure 5. Training state and performance of the proposed DNN model. (a) the cross entropy loss with respect to the number of iterations; (b) the accuracy with respect to the number of iterations; (c) the accuracy with respect to the number of ranks.
Figure 6. Comparison of the average sum rates for various schemes. (a) feedback bit allocation: 6 bits; (b) feedback bit allocation: 10 bits.
Table 1. Mathematical expressions for activation functions.
Sigmoid: $f(w) = \frac{1}{1+e^{-w}}$
tanh: $f(w) = \frac{e^{w}-e^{-w}}{e^{w}+e^{-w}}$
ReLU: $f(w) = \max(0,w)$
SSaLU: $f(w) = \max(-1,w)$ for $w<0$; $f(w) = \min(1,w)$ for $w\geq0$
Softmax: $f(w_i) = \frac{e^{w_i}}{\sum_j e^{w_j}}$
Table 2. Mathematical expressions for loss functions (i denotes the i-th element of a vector).
Mean square error: $\mathcal{L}_f(\bar{\mathbf{o}},\mathbf{o}) = \|\bar{\mathbf{o}}-\mathbf{o}\|^2$
Cross entropy: $\mathcal{L}_f(\bar{\mathbf{o}},\mathbf{o}) = -\sum_i\left[\bar{o}_i\log o_i + (1-\bar{o}_i)\log(1-o_i)\right]$

