**Massive MIMO Systems**

Special Issue Editors

**Kazuki Maruta, Francisco Falcone**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Special Issue Editors* Kazuki Maruta, Chiba University, Japan

Francisco Falcone, Public University of Navarre, Spain

*Editorial Office* MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Electronics* (ISSN 2079-9292) (available at: https://www.mdpi.com/journal/electronics/special_issues/mimo).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Article Number*, Page Range.

**ISBN 978-3-03936-016-1 (Hbk) ISBN 978-3-03936-017-8 (PDF)**

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**



Reprinted from: *Electronics* **2019**, *8*, 642, doi:10.3390/electronics8060642

## **About the Special Issue Editors**

**Kazuki Maruta** completed his Bachelor of Engineering and Master of Engineering degrees, as well as his PhD, at Kyushu University, Japan, in 2006, 2008 and 2016, respectively. From 2008 to 2017, he worked on the research and development of interference compensation techniques for future wireless communication systems at NTT Access Network Service Systems Laboratories. From 2017 to 2020, he was an Assistant Professor at the Graduate School of Engineering, Chiba University. He is currently a Specially Appointed Associate Professor at the Academy for Super Smart Society, Tokyo Institute of Technology. His research interests include MIMO, adaptive array signal processing, channel estimation, medium access control protocols and moving networks. He is a member of IEEE and IEICE. He won the IEICE Young Researcher's Award in 2012, the IEICE Radio Communication Systems (RCS) Active Researcher Award in 2014, the Asia-Pacific Microwave Conference (APMC) Prize in 2014, and the IEICE RCS Outstanding Researcher Award in 2018. He was also co-recipient of the IEICE Best Paper Award in 2018.

**Francisco Falcone** completed his degree in telecommunication engineering and his PhD in communication engineering at the Public University of Navarra (UPNA), Spain, in 1999 and 2005, respectively. From 1999 to 2000, he was a Microwave Commissioning Engineer with Siemens-Italtel, where he deployed microwave access systems. From 2000 to 2008, he was a Radio Access Engineer at Telefónica Móviles, performing radio network planning and optimization tasks in mobile network deployment. In 2009, he was a co-founding member and the Director of Tafco Metawireless, a spin-off company from UPNA. He was an Assistant Lecturer at the Electrical and Electronic Engineering Department, UPNA, from 2003 to 2009. In 2009, he became an Associate Professor with the EE Department, and was the Head of Department from 2012 to 2018. From January 2018 to May 2018, he was a Visiting Professor at the Kuwait College of Science and Technology, Kuwait. He is also affiliated with the Institute for Smart Cities (ISC), UPNA, which hosts around 140 researchers, and is the Head of its ICT section. He has more than 500 contributions in indexed international journals, book chapters and conferences. His research interests are related to computational electromagnetics applied to the analysis of complex electromagnetic scenarios, with a focus on the analysis, design and implementation of heterogeneous wireless networks to enable context-aware environments. Prof. Falcone was a recipient of the CST 2003 and CST 2005 Best Paper Awards, the PhD Award from the Colegio Oficial de Ingenieros de Telecomunicación (COIT) in 2006, the Doctoral Award UPNA, 2010, the 1st Juan Gomez Peñalver Research Award from the Royal Academy of Engineering of Spain in 2010, the XII Talgo Innovation Award 2012, the IEEE 2014 Best Paper Award, the ECSA-3 Best Paper Award, 2016, and the ECSA-4 Best Paper Award, 2017. He is a Senior Member of the IEEE.

### *Editorial* **Massive MIMO Systems: Present and Future**

#### **Kazuki Maruta <sup>1,\*</sup> and Francisco Falcone <sup>2</sup>**


Received: 17 February 2020; Accepted: 21 February 2020; Published: 26 February 2020

#### **1. Introduction**

Nearly a decade has passed since the fundamental concept of massive multiple-input multiple-output (MIMO), also called large-scale MIMO, emerged [1]. Massive MIMO is expected to be one of the most promising technologies for fifth-generation mobile communications (5G) and beyond. Implementations [2,3] and trials [4,5] are actively under way. In particular, massive array beamforming is well suited to millimeter-wave communication [6], which suffers from a link budget shortfall due to its high carrier frequency. Further, thanks to its excess degrees of freedom (DoF), massive MIMO has considerable potential to further enhance system capabilities [7], and it continues to open up a wide range of research topics. It merits further discussion and is expected to overcome limitations in wireless communications, such as spectral and energy efficiency, to better support continuously increasing mobile data traffic as well as terminals driven by the Internet of Things (IoT). The key aim of this special issue is to provide readers with new insights and to stimulate extensive discussion in this field.

#### **2. The Present Issue**

This special issue consists of nineteen papers covering a wide range of important topics in the field of massive MIMO systems, including fundamental areas such as computational complexity, energy efficiency, pilot contamination, channel estimation, antenna design, non-orthogonal multiple access (NOMA) and millimeter-wave beamforming, as well as emerging topics such as the incorporation of machine learning. In terms of system models, a variety of scenarios are covered, such as single-/multi-cell, distributed antennas, heterogeneous networks, and the IEEE 802.11ac and Long Term Evolution (LTE) standards.

Distributed antenna systems (DAS), or base station (BS) cooperation, have been actively investigated since they can provide array diversity or multiplexing gain owing to the low spatial correlation of distributed antennas. Their extension to massive MIMO was analyzed in terms of spectral and energy efficiency, considering hardware impairments such as phase noise [8] and analog-to-digital converter (ADC) resolution [9]. For the distributed massive MIMO structure, a sounding reference signal (SRS) design and a channel estimation scheme were proposed to mitigate the impact of pilot contamination [10].

The work in [11] proposed a path-loss-based pilot allocation strategy and a pseudo-random code pilot design. In [12], a modified heuristic pilot assignment algorithm was proposed whose optimization criterion is to maximize the minimum uplink signal-to-interference-plus-noise ratio (SINR). An efficient channel state information (CSI) estimation scheme was proposed in [13]; it exploits the CSI of the previous time slot, which is temporally correlated in the angular domain. Differential modulation removes the need for channel estimation and is therefore attractive in massive MIMO systems. In [14], noncoherent detection of differential modulation was extended to multiple symbols in the single-cell scenario. For further capacity enhancement, power-domain multiplexing, i.e., NOMA enabled by successive interference cancellation (SIC), was introduced [15].

In millimeter-wave communication, near line-of-sight (LoS) or Ricean fading channels are expected. Exploiting the CSI of the LoS component, the spectral efficiency of equal gain transmission and combining (EGT/EGC) was analyzed over frequency-selective Ricean fading channels in cooperative relaying scenarios [16]. Such a relaying approach is also effective in heterogeneous networks, where small-cell BSs act as relays between the macro-cell BS and user terminals. Reference [17] proposed eigenvector-decomposition-based hybrid beamforming for this scenario.

From a practical viewpoint, a limited statistical CSI feedback constraint was considered, and hybrid beamforming aided by machine-learning-based user grouping was proposed [18]. Further, an approach that eliminates CSI estimation by applying blind adaptive array signal processing was proposed, and its practical performance was evaluated considering the medium access control (MAC) layer overhead of the IEEE 802.11ac and frequency division duplex (FDD)-based LTE standards [19].

The computational complexity of pre-/post-coding is also a significant problem in massive MIMO systems. For uplink transmission, new iteration-based detection algorithms were proposed: one extends linear minimum mean squared error (MMSE) post-coding and log-likelihood ratio (LLR) calculation [20], and the other is based on maximum likelihood (ML) detection and iterative discrete estimation [21].

Focusing on energy efficiency, reference [22] proposed simplified beamforming and power allocation strategies for the scenario in which unicast and multicast users are non-orthogonally multiplexed. Discontinuous reception can also help improve energy efficiency. The authors of [23] introduced an artificial intelligence (AI) approach, i.e., a recurrent neural network (RNN), to adapt the sleep cycles of user terminals.

In realizing large-scale antenna arrays, attention must be paid to antenna manufacturing. Reference [24] developed a Bayesian-compressive-sensing-based planar array diagnostic tool for efficient and reliable testing. New antenna structures were also designed: a dual-polarized diamond-ring slot antenna array exhibiting wide bandwidth [25], and a leaky-wave antenna array incorporating a metamaterial shield [26] that suppresses mutual coupling.

#### **3. Future**

Discussions towards 6G have now started. Massive MIMO is still expected to be a promising contributor to 6G [27–29], e.g., in a form referred to as 'ultra-massive MIMO'. Its potential will be fully realized only through sustained R&D effort, including advances in hardware performance. The variety of massive MIMO technologies widely addressed in this special issue could be among the most important solutions for a breakthrough towards beyond-5G or 6G systems.

**Author Contributions:** K.M. and F.F. worked together throughout the editorial process of the special issue 'Massive MIMO Systems', published by the journal *Electronics*. K.M. drafted this editorial summary. K.M. and F.F. reviewed, edited and finalized the manuscript. All authors have read and agreed to the published version of the manuscript.

**Acknowledgments:** First of all, we would like to thank all researchers who submitted articles to this special issue for their excellent contributions. We are also grateful to all reviewers who evaluated the scientific merit and quality of the manuscripts and provided countless valuable suggestions to improve their quality and their overall value for the scientific community. We would like to acknowledge the editorial board of the *Electronics* journal, who invited us to guest edit this special issue. We are also grateful to the *Electronics* Editorial Office staff, who worked thoroughly to maintain the rigorous peer-review schedule and ensure timely publication.

**Conflicts of Interest:** The authors declare no conflicts of interest.

*Electronics* **2020**, *9*, 385

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Downlink Spectral Efficiency Analysis in Distributed Massive MIMO with Phase Noise**

#### **Qian Lv, Jiamin Li \*, Pengcheng Zhu, Dongming Wang and Xiaohu You**

National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China; seulvqian@seu.edu.cn (Q.L.); p.zhu@seu.edu.cn (P.Z.); wangdm@seu.edu.cn (D.W.); xhyu@seu.edu.cn (X.Y.)

**\*** Correspondence: lijiamin@seu.edu.cn; Tel.: +86-025-5209-1635

Received: 25 October 2018; Accepted: 10 November 2018; Published: 12 November 2018

**Abstract:** To achieve the advantages provided by massive multiple-input multiple-output (MIMO), a large number of antennas need to be deployed at the base station. However, for cost reasons, inexpensive hardware is employed in realistic scenarios, which makes the system susceptible to distortion from hardware impairments. Hence, in this paper, we analyze the downlink spectral efficiency in distributed massive MIMO with phase noise and amplified thermal noise. We provide an effective channel model considering large-scale fading, small-scale fast fading and phase noise. Based on this model, the estimated channel state information (CSI) is obtained during the pilot phase. Under imperfect CSI, closed-form expressions of the downlink achievable rates with maximum ratio transmission (MRT) and zero-forcing (ZF) precoders in distributed massive MIMO are derived. Furthermore, we give the ultimate achievable rates of the users with both precoders when the number of antennas tends to infinity. Based on these expressions, we analyze the impact of phase noise on the spectral efficiency. We conclude that, in the presence of phase noise, both precoders achieve the same limiting rate, and phase noise limits the spectral efficiency. Numerical results show that the ZF precoder outperforms the MRT precoder in spectral efficiency but is more affected by phase noise.

**Keywords:** distributed massive MIMO; phase noise; amplified thermal noise; spectral efficiency

#### **1. Introduction**

Massive multiple-input multiple-output (MIMO) is becoming a promising technology for providing significant gains [1–6]. Since it was first proposed, massive MIMO has been studied extensively. The main feature of massive MIMO is that hundreds (or even thousands) of antennas are employed at each base station to simultaneously serve tens of users in the same time-frequency resource, which offers substantial advantages over conventional MIMO. First, it brings unprecedented spatial degrees of freedom, which improve spectral efficiency and energy efficiency even with simple linear receivers or precoders [7]. In addition, user channels in massive MIMO systems become nearly orthogonal, so fast fading and intra-cell interference can be averaged out. Massive MIMO can be divided into two categories: co-located massive MIMO and distributed massive MIMO [8]. The latter promises higher energy efficiency, wider system coverage and higher spectral efficiency, which result from increased macro-diversity gain and reduced access distance [9–15]. Considering these advantages, we analyze the spectral efficiency of distributed massive MIMO in this paper. Notably, because the access distances between each user and the remote antenna units (RAUs) differ, the channel vectors are non-isotropic, which makes performance analysis in distributed massive MIMO more complex.
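The near-orthogonality of user channels mentioned above can be illustrated numerically. The following sketch assumes i.i.d. $\mathcal{CN}(0,1)$ channel entries, a co-located Rayleigh-fading idealization rather than the distributed model of Section 2, and shows that the normalized inner product of two users' channels shrinks as the number of antennas grows while the normalized channel gain stays close to one. The function names are hypothetical.

```python
import math
import random


def cn01(rng):
    """One sample of CN(0, 1): real and imaginary parts each with variance 1/2."""
    return complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))


def normalized_stats(num_antennas, seed=0):
    """Return (|h1^H h2| / n, ||h1||^2 / n) for two i.i.d. CN(0, I_n) channel vectors."""
    rng = random.Random(seed)
    h1 = [cn01(rng) for _ in range(num_antennas)]
    h2 = [cn01(rng) for _ in range(num_antennas)]
    cross = sum(a.conjugate() * b for a, b in zip(h1, h2))  # h1^H h2
    gain = sum(abs(a) ** 2 for a in h1)                     # ||h1||^2
    return abs(cross) / num_antennas, gain / num_antennas


for n in (10, 100, 1000, 10000):
    cross, gain = normalized_stats(n)
    print(f"n={n:6d}  |h1^H h2|/n={cross:.4f}  ||h1||^2/n={gain:.4f}")
```

Since $\mathbb{E}|\mathbf{h}_1^{\mathrm{H}}\mathbf{h}_2|^2 = n$, the normalized cross term decays roughly as $1/\sqrt{n}$, which is the sense in which the channels become "nearly orthogonal".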

In practical communication systems, hardware impairments are inevitable and cannot be fully eliminated even after applying calibration and compensation techniques [16,17]. These impairments can be divided into two categories: multiplicative distortion and additive distortion. Phase noise introduced by the local oscillators of the transceivers is a multiplicative distortion. It causes random rotations of the transmitted data symbols, which degrade system performance. Furthermore, phase noise makes the estimated channel state information (CSI) less accurate and introduces a phenomenon called channel aging, whereby the CSI estimated during the pilot phase differs from the channel encountered during downlink transmission. It is pointed out in [8] that the deployment cost and circuit power consumption of massive MIMO scale linearly with the number of antennas. Therefore, inexpensive, hardware-constrained components may be deployed to reduce cost, which makes hardware impairments more severe in massive MIMO.

Analyzing the spectral efficiency is a fundamental method to evaluate the impact of phase noise. The impact of phase noise on uplink transmission has been studied in [18–21], and on downlink transmission in [22–24]. The impact of phase noise on physical layer security for downlink massive MIMO was investigated in [22]. The achievable rate was derived in [23] considering a frequency-selective multipath fading channel. The capacity of downlink transmission with linear precoders was analyzed in [24], but that work assumed asymptotically large numbers of antennas and users and considered only a co-located MIMO system.

Herein, considering a distributed massive MIMO system with phase noise and amplified thermal noise, we analyze the downlink spectral efficiency for any number of antennas and users. The key contributions of this paper are as follows:


The rest of this paper is organized as follows. The system model, including the system configuration, a phase noise model and an effective channel model, is introduced in Section 2. In Section 3, we obtain the estimated CSI during the uplink pilot training phase and analyze the spectral efficiency with linear precoders. Numerical results are given in Section 4, and conclusions are drawn in Section 5.

*Notation:* Column vectors $\mathbf{x}$ and matrices $\mathbf{X}$ are denoted by bold lower-case and upper-case letters, respectively. $\mathbf{I}_N$ is an $N \times N$ identity matrix. $(\cdot)^{\mathrm{H}}$ and $(\cdot)^{\mathrm{T}}$ denote the conjugate transpose and transpose operators, respectively. Scalars $x$ are denoted by italic letters. $|x|$ represents the absolute value of $x$, and $\|\mathbf{X}\|$ denotes the spectral norm of $\mathbf{X}$. $\mathbb{E}[\cdot]$ and $\mathrm{var}(\cdot)$ represent the expectation and variance operators, respectively. $\mathcal{CN}(0, \sigma^2)$ represents the circularly symmetric complex Gaussian distribution with zero mean and variance $\sigma^2$. $\Gamma(k, \theta)$ denotes the Gamma distribution with shape parameter $k$ and scale parameter $\theta$. Similarly, $\mathrm{Nakagami}(m, \Omega)$ denotes the Nakagami distribution with shape parameter $m$ and spread parameter $\Omega$.

#### **2. System Model**

Considering a distributed massive MIMO system, we first describe the system configuration and give the conventional channel model. Next, we present a model describing phase noise and give an effective channel model incorporating the impacts of phase noise.

We consider the downlink transmission of a single-cell multi-user distributed massive MIMO system comprising *M* RAUs and *K* single-antenna users as in Figure 1. Each RAU is equipped with an array of *N* antennas. All users and RAUs are randomly distributed in the cell.

**Figure 1.** System Model.

Frequency-flat fading channels are assumed, and the system operates in time-division duplex (TDD) mode. The channel vector between all RAUs and the $k$-th user is given by

$$\tilde{\mathbf{g}}_k \triangleq \left[\tilde{g}_k^1, \cdots, \tilde{g}_k^{MN}\right]^{\mathrm{T}} = \boldsymbol{\Lambda}_k^{1/2}\mathbf{h}_k, \tag{1}$$

where $\boldsymbol{\Lambda}_k = \mathbb{E}\left[\tilde{\mathbf{g}}_k\tilde{\mathbf{g}}_k^{\mathrm{H}}\right] = \mathrm{diag}(\lambda_{1,k}, \cdots, \lambda_{M,k}) \otimes \mathbf{I}_N$ is the covariance matrix, $\lambda_{m,k} \triangleq c\,d_{m,k}^{-\alpha}$ denotes the path loss between the $m$-th RAU and the $k$-th user, $d_{m,k}$ is the corresponding distance, $\alpha$ is the path loss exponent, $c$ is the median of the mean path gain at a reference distance $d_{m,k} = 1$ km, and $\mathbf{h}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{MN})$ is the small-scale fast fading vector.
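As a sketch of the channel model in (1), the code below draws $\tilde{\mathbf{g}}_k = \boldsymbol{\Lambda}_k^{1/2}\mathbf{h}_k$ for one user: each block of $N$ consecutive entries belongs to one RAU and is scaled by $\sqrt{\lambda_{m,k}}$ with $\lambda_{m,k} = c\,d_{m,k}^{-\alpha}$. The values of `c`, `alpha`, the distances and the helper names are hypothetical placeholders, not the paper's simulation parameters.

```python
import math
import random


def path_loss(distance_km, c, alpha):
    """lambda_{m,k} = c * d_{m,k}^(-alpha), with d in km (reference distance 1 km)."""
    return c * distance_km ** (-alpha)


def channel_vector(distances_km, n_antennas, c, alpha, rng):
    """Draw g~_k = Lambda_k^{1/2} h_k with Lambda_k = diag(lambda_1k..lambda_Mk) kron I_N.

    Returns a flat list of length M*N; entries m*N .. m*N+N-1 belong to RAU m.
    """
    g = []
    for d in distances_km:
        scale = math.sqrt(path_loss(d, c, alpha))
        for _ in range(n_antennas):
            h = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
            g.append(scale * h)  # same large-scale gain for all N antennas of RAU m
    return g


rng = random.Random(7)
# Hypothetical example: M = 4 RAUs, N = 8 antennas each, alpha = 3.7, c = 1.
g_tilde = channel_vector([0.2, 0.5, 0.9, 1.4], n_antennas=8, c=1.0, alpha=3.7, rng=rng)
print(len(g_tilde))  # M * N = 32
```

The Kronecker product with $\mathbf{I}_N$ is implemented implicitly by reusing the same scale factor for the $N$ antennas of each RAU, which keeps the sketch free of explicit matrix operations.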

In this paper, we consider a more realistic scenario in which the antennas deployed at each RAU are inexpensive and hardware-constrained. Specifically, each antenna experiences phase noise, which distorts the communication. Phase noise is a multiplicative phase drift in the signal that originates from the local oscillators (LOs) of the RAUs and users. We assume that the LOs are free-running without a phase-locked loop (PLL), in which case the phase noise is commonly modeled as a discrete-time independent Wiener process [8,25]. Mathematically, the phase noise processes at the LOs of the $n$-th antenna and the $k$-th user are modeled as

$$\phi\_n\left(t\right) \sim \mathcal{N}\left(\phi\_n\left(t-1\right), \sigma\_{\phi,n}^2\right),\tag{2}$$

$$
\varphi\_k(t) \sim \mathcal{N}\left(\varphi\_k\left(t-1\right), \sigma\_{\varphi,k}^2\right),
\tag{3}
$$

which equal their previous realizations $\phi_n(t-1)$ and $\varphi_k(t-1)$ plus independent zero-mean Gaussian increments with variances $\sigma_{\phi,n}^2$ and $\sigma_{\varphi,k}^2$, respectively. The variances depend on the carrier frequency and the symbol time [25].
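The Wiener model in (2) and (3) is easy to simulate. Because the increment $\phi_n(t)-\phi_n(s)$ is zero-mean Gaussian with variance $\sigma_{\phi,n}^2|t-s|$, one has $\mathbb{E}\left[e^{j(\phi_n(t)-\phi_n(s))}\right] = e^{-\sigma_{\phi,n}^2|t-s|/2}$, which is where the exponential decay factors in the LMMSE estimate (6) and in (7) come from. The sketch below checks this identity numerically; the variance, the lags and the trial count are arbitrary illustrative choices.

```python
import math
import random


def wiener_phase(num_steps, sigma2, rng, phi0=0.0):
    """Discrete-time Wiener phase: phi(t) = phi(t-1) + N(0, sigma2)."""
    phases = [phi0]
    for _ in range(num_steps):
        phases.append(phases[-1] + rng.gauss(0.0, math.sqrt(sigma2)))
    return phases


def empirical_coherence(lag, sigma2, trials=5000, seed=1):
    """Monte Carlo estimate of E[cos(phi(lag) - phi(0))]; the sine part averages to 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        phases = wiener_phase(lag, sigma2, rng)
        total += math.cos(phases[lag] - phases[0])
    return total / trials


sigma2 = 0.01  # hypothetical per-symbol increment variance
for lag in (1, 10, 50):
    print(lag, round(empirical_coherence(lag, sigma2), 3), round(math.exp(-sigma2 * lag / 2), 3))
```

The growing gap between a phase and its earlier value is exactly the channel-aging effect: the longer the delay since the pilot, the weaker the correlation with the estimated channel.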

The phase noise can be either independent or correlated across the antennas of each RAU. In our analysis, we assume that the phase noise is correlated across the antennas of one RAU and independent across different RAUs. Then, by expressing the total phase noise as a multiplicative factor, we can rewrite the channel vector as

$$\mathbf{g}_k(t) = \boldsymbol{\Theta}_k(t)\,\tilde{\mathbf{g}}_k, \tag{4}$$

where $\boldsymbol{\Theta}_k(t) \triangleq \mathrm{diag}\left(e^{j\theta_k^1(t)}, \cdots, e^{j\theta_k^{MN}(t)}\right) = e^{j\varphi_k(t)}\boldsymbol{\Phi}(t) \in \mathbb{C}^{MN\times MN}$ is the total phase noise, wherein $\boldsymbol{\Phi}(t) \triangleq \mathrm{diag}\left(e^{j\phi_1(t)}, \cdots, e^{j\phi_M(t)}\right) \otimes \mathbf{I}_N$ is the phase noise induced by all RAUs and, similarly, $e^{j\varphi_k(t)}$ is the phase drift produced by the $k$-th user. Notably, because of phase noise, the effective channel becomes time-dependent.
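To make the Kronecker structure of $\boldsymbol{\Theta}_k(t)$ concrete, the sketch below applies one time slot of phase noise to a channel vector stored as a flat list of $MN$ entries: every antenna of RAU $m$ is rotated by the common angle $\phi_m(t)+\varphi_k(t)$, and the magnitude of each entry is unchanged. The helper names are hypothetical, and the per-RAU and user phases are taken as given inputs (for instance from a Wiener simulation of (2) and (3)).

```python
import cmath


def total_phase_angles(rau_phases, user_phase, n_antennas):
    """Diagonal angles of Theta_k(t) = e^{j varphi_k(t)} (diag(e^{j phi_m(t)}) kron I_N)."""
    return [phi_m + user_phase for phi_m in rau_phases for _ in range(n_antennas)]


def apply_phase_noise(g_tilde, rau_phases, user_phase, n_antennas):
    """Effective channel g_k(t) = Theta_k(t) g~_k of Eq. (4), as an elementwise rotation."""
    angles = total_phase_angles(rau_phases, user_phase, n_antennas)
    if len(angles) != len(g_tilde):
        raise ValueError("g_tilde must have M*N entries")
    return [cmath.exp(1j * a) * g for a, g in zip(angles, g_tilde)]


# Hypothetical example: M = 2 RAUs with N = 3 antennas each, unit channel entries.
g_t = apply_phase_noise([1.0 + 0j] * 6, rau_phases=[0.1, -0.4], user_phase=0.05, n_antennas=3)
print([round(cmath.phase(x), 2) for x in g_t])  # [0.15, 0.15, 0.15, -0.35, -0.35, -0.35]
```

Because $\boldsymbol{\Theta}_k(t)$ is diagonal, it never needs to be stored as an $MN \times MN$ matrix; a vector of angles is sufficient.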

**Remark 1.** *The conventional channel model without phase noise is obtained when* $\sigma_{\phi,n}^2 = \sigma_{\varphi,k}^2 = 0,\ \forall n, k$.

#### **3. Downlink Spectral Efficiency Analysis**

In this section, we first assume that pilot-aided transmission is employed and derive the channel estimate based on the effective channel model given above. Next, since the channel vectors are non-isotropic in distributed massive MIMO and phase noise destroys the correlation between each user's channel vector and its intended beam, we give approximate distributions of the desired signal and interference powers. Finally, we derive closed-form expressions of the ergodic achievable downlink rates with both MRT and ZF precoders.

#### *3.1. Channel Estimation*

As mentioned before, TDD operation is assumed. Each coherence block of $T$ channel uses is split into two parts: one for uplink pilot symbols and the other for downlink data symbols. To guarantee that the pilot sequences of the $K$ users are mutually orthogonal, $\tau \geq K$ symbols must be allocated for pilot transmission. The remaining $T - \tau$ channel uses are then used for downlink data transmission.

During the pilot training phase, the pilot sequence $\mathbf{x}_k \triangleq [x_k(1), \cdots, x_k(\tau)]^{\mathrm{T}}$ is assigned to user $k$. Incorporating the hardware impairments, the pilot vector $\mathbf{y}_{\mathrm{p}}(t)$ received at the base station at time $t \in [1, \tau]$ is given by

$$\mathbf{y}_{\mathrm{p}}(t) = \sum_{k=1}^{K}\mathbf{g}_k(t)\,x_k(t) + \mathbf{n}^{\mathrm{BS}}(t), \tag{5}$$

where $\mathbf{n}^{\mathrm{BS}}(t) \sim \mathcal{CN}\left(\mathbf{0}, \xi^{\mathrm{BS}}\mathbf{I}_{MN}\right)$ is the amplified thermal noise at time slot $t$. Its variance $\xi^{\mathrm{BS}}$ is larger than the thermal noise variance $\sigma^2$ because of the noise added by low-noise amplifiers, mixers and other components.

Let $\boldsymbol{\Psi} \triangleq \left[\mathbf{y}_{\mathrm{p}}^{\mathrm{T}}(1), \cdots, \mathbf{y}_{\mathrm{p}}^{\mathrm{T}}(\tau)\right]^{\mathrm{T}} \in \mathbb{C}^{\tau MN\times 1}$. Following [8,26], the linear minimum mean square error (LMMSE) estimate of the $k$-th user's channel obtained from pilot training is given by

$$\hat{\mathbf{g}}_k(t) = \boldsymbol{\Lambda}_k\mathbf{H}_k(t)\boldsymbol{\Sigma}^{-1}\boldsymbol{\Psi}, \tag{6}$$

where

$$\begin{aligned}
\mathbf{H}_k(t) &= \left[\mathbf{H}_{k,1}(t), \dots, \mathbf{H}_{k,\tau}(t)\right], \\
\boldsymbol{\Sigma} &\triangleq \sum_{j=1}^{K}\mathbf{B}_j + \xi^{\mathrm{BS}}\mathbf{I}_{\tau MN}, \\
\mathbf{H}_{k,i}(t) &= x_k^*(i)\,\mathbf{D}_{k,i}(t), \\
\mathbf{D}_{k,i}(t) &= \mathrm{diag}\left(e^{-\frac{\sigma_{\phi,1}^2+\sigma_{\varphi,k}^2}{2}|t-i|}, \dots, e^{-\frac{\sigma_{\phi,M}^2+\sigma_{\varphi,k}^2}{2}|t-i|}\right)\otimes\mathbf{I}_N, \\
\left[\mathbf{B}_j\right]_{u,v} &\triangleq \boldsymbol{\Lambda}_j\,x_j(u)\,x_j^*(v)\,\mathrm{diag}\left(e^{-\frac{\sigma_{\phi,1}^2+\sigma_{\varphi,j}^2}{2}|u-v|}, \dots, e^{-\frac{\sigma_{\phi,M}^2+\sigma_{\varphi,j}^2}{2}|u-v|}\right)\otimes\mathbf{I}_N.
\end{aligned}$$

The pilot sequences can be designed in different ways. Without loss of generality, in this paper we assume that the number of pilot symbols equals the number of users, i.e., $\tau = K$. More specifically, we assume that the matrix of orthogonal pilot sequences $\mathbf{X}_{\mathrm{P}} \triangleq [\mathbf{x}_1, \cdots, \mathbf{x}_K]$ is diagonal with every diagonal element equal to $\sqrt{\rho_{\mathrm{p}}}$, where $\rho_{\mathrm{p}}$ is the average transmit power of the pilot symbols; that is, user $k$ transmits its pilot only in slot $k$. This is equivalent to the assumption made in [18,20].

Under these assumptions, we define

$$\beta_{m,k}(t) = \frac{e^{-\frac{\sigma_{\varphi,k}^2+\sigma_{\phi,m}^2}{2}|t-k|}\sqrt{\rho_{\mathrm{p}}}\,\lambda_{m,k}}{\sqrt{\rho_{\mathrm{p}}\lambda_{m,k}+\xi^{\mathrm{BS}}}}, \tag{7}$$

then the LMMSE estimate $\hat{\mathbf{g}}_k(t)$ in (6) can be rewritten as

$$\begin{aligned} \hat{\mathbf{g}}_k(t) &= \boldsymbol{\Lambda}_k\mathbf{H}_k(t)\boldsymbol{\Sigma}^{-1/2}\left(\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Psi}\right) \\ &= \left(\beta_{1,k}(t)\hat{\mathbf{h}}_{1,k}^{\mathrm{T}}, \dots, \beta_{M,k}(t)\hat{\mathbf{h}}_{M,k}^{\mathrm{T}}\right)^{\mathrm{T}}, \end{aligned} \tag{8}$$

where $\beta_{m,k}(t)$ is the equivalent large-scale fading coefficient from user $k$ to RAU $m$, and $\hat{\mathbf{h}}_k = \left[\hat{\mathbf{h}}_{1,k}^{\mathrm{T}}, \cdots, \hat{\mathbf{h}}_{M,k}^{\mathrm{T}}\right]^{\mathrm{T}} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{MN})$, the $k$-th $MN \times 1$ block of $\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Psi}$, represents the equivalent small-scale fast fading part.
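Equation (7) is straightforward to evaluate. The sketch below computes $\beta_{m,k}(t)$ under the diagonal-pilot assumption. At $t=k$ the exponential factor equals one and $\beta_{m,k}^2(k) = \rho_{\mathrm{p}}\lambda_{m,k}^2/(\rho_{\mathrm{p}}\lambda_{m,k}+\xi^{\mathrm{BS}})$, the familiar LMMSE estimate variance; for $t \neq k$ phase noise shrinks it geometrically in $|t-k|$. The function name and all numerical values are hypothetical placeholders.

```python
import math


def beta(t, k, lam_mk, rho_p, xi_bs, sigma2_user, sigma2_rau):
    """beta_{m,k}(t) of Eq. (7): equivalent large-scale gain of the LMMSE estimate.

    t, k        -- current time slot and the pilot slot of user k (diagonal pilots)
    lam_mk      -- path loss lambda_{m,k}
    sigma2_*    -- Wiener increment variances of user k and RAU m
    """
    decay = math.exp(-(sigma2_user + sigma2_rau) / 2.0 * abs(t - k))
    return decay * math.sqrt(rho_p) * lam_mk / math.sqrt(rho_p * lam_mk + xi_bs)


# Placeholder numbers: lambda = 1e-3, rho_p = 100, xi_BS = 0.1, sigma^2 = 1e-3 for each LO.
b_at_pilot = beta(t=3, k=3, lam_mk=1e-3, rho_p=100.0, xi_bs=0.1, sigma2_user=1e-3, sigma2_rau=1e-3)
b_later = beta(t=53, k=3, lam_mk=1e-3, rho_p=100.0, xi_bs=0.1, sigma2_user=1e-3, sigma2_rau=1e-3)
print(b_later / b_at_pilot)  # exp(-0.05), approximately 0.9512
```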

Because of the orthogonality principle of LMMSE estimation theory, the channel vector **g***k*(*t*) can be decomposed as

$$\mathbf{g}\_k(t) = \mathbf{\hat{g}}\_k(t) + \mathbf{e}\_k(t),\tag{9}$$

where $\mathbf{e}_k(t)$ is the estimation error, which is uncorrelated with $\hat{\mathbf{g}}_k(t)$ and is treated as statistically independent of it.

During the pilot transmission phase, we obtain the channel estimate given in (8). In our analysis, the beamforming vectors are designed once, using the CSI estimated during the pilot transmission phase, and are then applied for the entire duration of the downlink transmission phase.

#### *3.2. Downlink Signal Model*

For downlink transmission, the received signal of user *k* at time *t* ∈ [*τ* + 1, *T*] is given as

$$r\_k(t) = \sqrt{\rho\_{\text{dl}}} \tilde{\mathbf{g}}\_k^{\text{H}} \Theta\_k^\* \left( t \right) \mathbf{x} + n^{\text{UE}} \left( t \right), \tag{10}$$

where $\rho_{\mathrm{dl}}$ is the downlink transmit power, $n^{\mathrm{UE}}(t) \sim \mathcal{CN}\left(0, \xi^{\mathrm{UE}}\right)$ is the amplified thermal noise at the user at time slot $t$ with variance $\xi^{\mathrm{UE}}$, and $\mathbf{x} \in \mathbb{C}^{MN\times 1}$ is the signal vector transmitted by all $M$ RAUs. Specifically, $\mathbf{x}$ is given by

$$\mathbf{x} = \sum_{l=1}^{K}\mathbf{w}_l s_l, \tag{11}$$

where $s_l \sim \mathcal{CN}(0, 1)$ is the data symbol intended for user $l$, and $\mathbf{w}_l$ is the beamforming vector designed at time slot $\tau$. MRT and ZF precoders are considered in our analysis. Mathematically, these two linear precoders are defined as

$$\mathbf{w}_l = \begin{cases} \dfrac{\hat{\mathbf{g}}_l(\tau)}{\|\hat{\mathbf{g}}_l(\tau)\|}, & \text{MRT}, \\[2mm] \dfrac{\mathbf{a}_l(\tau)}{\|\mathbf{a}_l(\tau)\|}, & \text{ZF}, \end{cases} \tag{12}$$

where $\mathbf{a}_l(\tau)$ is the $l$-th column of $\hat{\mathbf{G}}(\tau)\left(\hat{\mathbf{G}}^{\mathrm{H}}(\tau)\hat{\mathbf{G}}(\tau)\right)^{-1}$, wherein $\hat{\mathbf{G}}(\tau) = \left[\hat{\mathbf{g}}_1(\tau), \cdots, \hat{\mathbf{g}}_K(\tau)\right]$. Considering (4), we can rewrite (10) as

$$r\_k\left(t\right) = \sqrt{\rho\_{\text{dl}}} \mathbf{g}\_k^{\text{H}}\left(\tau\right) \tilde{\Theta}\_k\left(t\right) \mathbf{x} + n^{\text{UE}}\left(t\right),\tag{13}$$

where

$$\tilde{\boldsymbol{\Theta}}_k(t) \triangleq \mathrm{diag}\left(e^{-j\left(\theta_k^1(t)-\theta_k^1(\tau)\right)}, \dots, e^{-j\left(\theta_k^{MN}(t)-\theta_k^{MN}(\tau)\right)}\right). \tag{14}$$
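A minimal pure-Python sketch of the precoders in (12) follows, assuming the estimated channel matrix $\hat{\mathbf{G}}(\tau)$ is given as a list of $MN$ rows with $K$ complex entries each. A small Gauss-Jordan routine stands in for a numerical library's matrix inverse, and all helper names are hypothetical. The check at the end illustrates the defining property of ZF: since $\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}}\left(\hat{\mathbf{G}}^{\mathrm{H}}\hat{\mathbf{G}}\right)^{-1}=\mathbf{I}_K$, each ZF beam is orthogonal to the other users' estimated channels.

```python
import math
import random


def herm(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][s] * B[s][j] for s in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (adequate for small K x K)."""
    n = len(A)
    M = [list(A[i]) + [complex(i == j) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def unit_columns(A):
    """Split A into its columns and normalize each one to unit Euclidean norm."""
    out = []
    for l in range(len(A[0])):
        col = [row[l] for row in A]
        norm = math.sqrt(sum(abs(x) ** 2 for x in col))
        out.append([x / norm for x in col])
    return out


def precoders(G_hat, scheme):
    """Beamformers w_1..w_K of Eq. (12); G_hat is the MN x K estimate at slot tau."""
    if scheme == "MRT":
        return unit_columns(G_hat)
    if scheme == "ZF":
        return unit_columns(matmul(G_hat, inverse(matmul(herm(G_hat), G_hat))))
    raise ValueError("scheme must be 'MRT' or 'ZF'")


def gain(G_hat, k, w):
    """Inner product g_k^H w, with g_k the k-th column of G_hat."""
    return sum(row[k].conjugate() * x for row, x in zip(G_hat, w))


rng = random.Random(3)
MN, K = 8, 3  # hypothetical sizes
G_hat = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(K)] for _ in range(MN)]
W_zf = precoders(G_hat, "ZF")
print(abs(gain(G_hat, 0, W_zf[1])))  # numerically zero: ZF nulls inter-user interference
```

In a full simulation, `G_hat` would be assembled from the estimates in (8); with phase noise, the nulls are only exact at $t=\tau$, because $\tilde{\boldsymbol{\Theta}}_k(t)$ rotates the channel afterwards.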

*Electronics* **2018**, *7*, 317

It is assumed that the users know only the statistical properties of the channel and do not perform channel estimation, so only statistical CSI can be used by the downlink users to detect the signal. Following [27], we rewrite the received signal as

$$r_k(t) = \sqrt{\rho_{\mathrm{dl}}}\,\mathbb{E}\left[\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_k\right]s_k + n', \tag{15}$$

where

$$n' = \sqrt{\rho_{\mathrm{dl}}}\left(\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_k - \mathbb{E}\left[\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_k\right]\right)s_k + \sqrt{\rho_{\mathrm{dl}}}\sum_{i\neq k}^{K}\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_i s_i + n^{\mathrm{UE}}(t).$$

Treating $\sqrt{\rho_{\mathrm{dl}}}\,\mathbb{E}\left[\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_k\right]s_k$ as the only desired signal at user $k$, and $n'$ as uncorrelated Gaussian additive noise [28,29], the achievable downlink rate of user $k$ is given by

$$R_k = \log_2\left(1 + \frac{\rho_{\mathrm{dl}}\left|\mathbb{E}\left[\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_k\right]\right|^2}{A(t) + B(t) + \xi^{\mathrm{UE}}}\right), \tag{16}$$

where

$$\begin{aligned} A(t) &= \rho_{\mathrm{dl}}\,\mathrm{var}\left(\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_k\right), \\ B(t) &= \sum_{i\neq k}^{K}\rho_{\mathrm{dl}}\,\mathbb{E}\left[\left|\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_i\right|^2\right]. \end{aligned}$$
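The rate in (16) needs only the mean and variance of the effective gain $\mathbf{g}_k^{\mathrm{H}}(\tau)\tilde{\boldsymbol{\Theta}}_k(t)\mathbf{w}_k$ and the mean interference powers. When a closed form is not at hand (for example, to cross-check one), these moments can be estimated from Monte Carlo samples, as in the following sketch. The helper `rate_from_samples` is hypothetical and assumes the samples have already been generated.

```python
import math


def rate_from_samples(desired, interferers, rho_dl, xi_ue):
    """Achievable rate R_k of Eq. (16) estimated from Monte Carlo samples.

    desired     -- samples of g_k^H(tau) Theta~_k(t) w_k (complex)
    interferers -- one list of samples of g_k^H(tau) Theta~_k(t) w_i per i != k
    """
    n = len(desired)
    mean = sum(desired) / n
    var = sum(abs(x - mean) ** 2 for x in desired) / n  # A(t) / rho_dl
    interference = sum(sum(abs(x) ** 2 for x in s) / len(s) for s in interferers)  # B(t) / rho_dl
    sinr = rho_dl * abs(mean) ** 2 / (rho_dl * var + rho_dl * interference + xi_ue)
    return math.log2(1.0 + sinr)


# Sanity check: deterministic unit gain, no interference, rho_dl = xi_UE = 1 -> SINR = 1 -> 1 bit.
print(rate_from_samples([1 + 0j] * 4, [], rho_dl=1.0, xi_ue=1.0))  # 1.0
```

The variance term makes explicit how phase noise costs rate: random rotations of the effective gain move power from the coherent mean $|\mathbb{E}[\cdot]|^2$ into $A(t)$, where it acts as self-interference.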

#### *3.3. Downlink Achievable Rates*

From (16), we can see that phase noise destroys the correlation between $\mathbf{g}_k$ and $\mathbf{w}_k$, and that the powers of the non-isotropic channel vectors projected onto the precoder subspace are needed to obtain closed-form expressions. Hence, we first present some preliminary lemmas that help us obtain approximate isotropic results.

**Lemma 1** ([28])**.** *For an isotropic random vector* $\mathbf{x} \in \mathbb{C}^{N\times 1}$ *whose elements are independent and each distributed as* $\mathcal{CN}(0, \sigma^2)$*, the distribution of* $\mathbf{x}^{\rm H}\mathbf{x}$ *is* $\Gamma(N, \sigma^2)$*.*
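Lemma 1 is easy to check by simulation. The sketch below (Python standard library only) draws vectors with i.i.d. $\mathcal{CN}(0,\sigma^2)$ entries, i.e. real and imaginary parts each $\mathcal{N}(0,\sigma^2/2)$, and compares the sample mean and variance of $\mathbf{x}^{\rm H}\mathbf{x}$ with $N\sigma^2$ and $N\sigma^4$, the moments of $\Gamma(N,\sigma^2)$ in the shape–scale convention used in this section. The values of $N$, $\sigma^2$, the seed and the number of trials are arbitrary illustration choices.

```python
import random
import statistics

def sample_energy(n, sigma2, trials, seed=1):
    """Draw `trials` samples of x^H x, where x has n i.i.d. CN(0, sigma2) entries."""
    rng = random.Random(seed)
    std = (sigma2 / 2) ** 0.5  # standard deviation of each real/imaginary part
    return [
        sum(rng.gauss(0.0, std) ** 2 + rng.gauss(0.0, std) ** 2 for _ in range(n))
        for _ in range(trials)
    ]

# Gamma(N, sigma2) (shape N, scale sigma2) has mean N*sigma2 and variance N*sigma2**2.
N, sigma2 = 8, 0.5
samples = sample_energy(N, sigma2, trials=20000)
print(statistics.fmean(samples), N * sigma2)           # both close to 4.0
print(statistics.pvariance(samples), N * sigma2 ** 2)  # both close to 2.0
```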

The strength of the estimated channel from user *k* to all RAUs is

$$\hat{\mathbf{g}}\_k^{\rm H}(t)\hat{\mathbf{g}}\_k(t) = \sum\_{m=1}^{M} \beta\_{m,k}^2(t)\hat{\mathbf{h}}\_{m,k}^{\rm H}\hat{\mathbf{h}}\_{m,k}. \tag{17}$$

According to Lemma 1, $\beta\_{m,k}^2(t)\hat{\mathbf{h}}\_{m,k}^{\rm H}\hat{\mathbf{h}}\_{m,k}$ is distributed as $\Gamma(N, \beta\_{m,k}^2(t))$. Hence, (17) is a sum of *M* independent but non-identically distributed terms. Its distribution can be approximated using Lemma 2 below.

**Lemma 2** ([15])**.** *Let* $\{x\_i\}$ *be a set of mutually independent random variables with* $x\_i \sim \Gamma(\chi\_i, \theta\_i)$*. Then the distribution of the sum* $\sum\_i x\_i$ *can be approximated as* $\sum\_i x\_i \sim \Gamma(\chi, \theta)$*, where*

$$\chi = \frac{\left(\sum\_{i} \chi\_{i} \theta\_{i}\right)^{2}}{\sum\_{i} \chi\_{i} \theta\_{i}^{2}}, \; \theta = \frac{\sum\_{i} \chi\_{i} \theta\_{i}^{2}}{\sum\_{i} \chi\_{i} \theta\_{i}}.\tag{18}$$
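The approximation (18) matches the first two moments: $\chi\theta = \sum\_i \chi\_i\theta\_i$ (the mean) and $\chi\theta^2 = \sum\_i \chi\_i\theta\_i^2$ (the variance). A minimal standard-library sketch of this moment matching, with a Monte Carlo reference, is shown below; the shapes, scales and seed are arbitrary test values.

```python
import random
import statistics

def gamma_sum_approx(shapes, scales):
    """Moment-matched Gamma(chi, theta) for a sum of independent Gamma(chi_i, theta_i), eq. (18)."""
    mean = sum(k * t for k, t in zip(shapes, scales))     # sum_i chi_i * theta_i
    var = sum(k * t * t for k, t in zip(shapes, scales))  # sum_i chi_i * theta_i^2
    return mean ** 2 / var, var / mean                    # (chi, theta)

shapes = [4.0, 4.0, 4.0]
scales = [1.0, 0.3, 0.05]
chi, theta = gamma_sum_approx(shapes, scales)

# Monte Carlo reference; random.gammavariate(alpha, beta) uses shape alpha and scale beta.
rng = random.Random(7)
sums = [sum(rng.gammavariate(k, t) for k, t in zip(shapes, scales)) for _ in range(20000)]
print(chi * theta, statistics.fmean(sums))           # matched means (5.4 analytically)
print(chi * theta ** 2, statistics.pvariance(sums))  # matched variances (4.37 analytically)
```

Because only two moments are matched, the approximation is exact in mean and variance but not in higher-order moments, which is why Figure 2 is needed to confirm its accuracy in the regime of interest.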

**Remark 2.** *From Lemma 2, the distribution of* $\hat{\mathbf{g}}\_k^{\rm H}(t)\hat{\mathbf{g}}\_k(t)$ *can be approximated as* $\Gamma(\chi\_k(t), \theta\_k(t))$*, where*

$$\chi\_k(t) = N \frac{\left(\sum\_{m=1}^{M} \beta\_{m,k}^2(t)\right)^2}{\sum\_{m=1}^{M} \beta\_{m,k}^4(t)},\tag{19}$$

$$\theta\_k(t) = \frac{\sum\_{m=1}^{M} \beta\_{m,k}^4(t)}{\sum\_{m=1}^{M} \beta\_{m,k}^2(t)}. \tag{20}$$

Similarly, the distribution of $\mathbf{e}\_k^{\rm H}(t)\mathbf{e}\_k(t)$ can be approximated as $\Gamma(\chi\_{\rm e}(t), \theta\_{\rm e}(t))$, where

$$\chi\_{\mathbf{e}}(t) = N \frac{\left(\sum\_{m=1}^{M} \eta\_{m,k}^{2}(t)\right)^{2}}{\sum\_{m=1}^{M} \eta\_{m,k}^{4}(t)},\tag{21}$$

$$\theta\_{\mathbf{e}}(t) = \frac{\sum\_{m=1}^{M} \eta\_{m,k}^{4}(t)}{\sum\_{m=1}^{M} \eta\_{m,k}^{2}(t)},\tag{22}$$

where $\eta\_{m,k}^2(t) = \lambda\_{m,k} - \beta\_{m,k}^2(t)$.
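As a sketch, (19)–(22) are Lemma 2 applied with $\chi\_i = N$ and $\theta\_i = \beta\_{m,k}^2(t)$ (or $\eta\_{m,k}^2(t)$) for each of the $M$ RAUs. The large-scale values below are made-up numbers used only to show the computation. Two sanity checks follow from the formulas: the mean $\chi\_k\theta\_k = N\sum\_m \beta\_{m,k}^2(t)$ is preserved, and with equal powers at all RAUs $\chi\_k = MN$, which recovers the exact isotropic result of Lemma 1.

```python
def gamma_params(n, powers):
    """(chi, theta) from eqs. (19)-(20), or (21)-(22), given per-RAU powers p_m."""
    s2 = sum(powers)                 # sum_m p_m
    s4 = sum(p * p for p in powers)  # sum_m p_m^2
    return n * s2 ** 2 / s4, s4 / s2

N = 16
lam = [1.0, 0.4, 0.1, 0.02]      # hypothetical lambda_{m,k}, M = 4 RAUs
beta2 = [0.9, 0.3, 0.05, 0.005]  # hypothetical beta_{m,k}^2(t), each below lambda_{m,k}
eta2 = [l - b for l, b in zip(lam, beta2)]  # eta_{m,k}^2(t) = lambda_{m,k} - beta_{m,k}^2(t)

chi_k, theta_k = gamma_params(N, beta2)  # eqs. (19)-(20)
chi_e, theta_e = gamma_params(N, eta2)   # eqs. (21)-(22)
print(chi_k * theta_k, N * sum(beta2))   # the mean N * sum_m beta^2 is preserved
print(gamma_params(N, [0.2] * 4))        # equal powers: approximately (64.0, 0.2), i.e. chi = MN
```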

Based on the lemmas and analysis above, the following lemma gives the projection principle for non-isotropic vectors.

**Lemma 3** ([28])**.** *When an MN-dimensional non-isotropic estimated channel vector* $\hat{\mathbf{g}}\_k \in \mathbb{C}^{MN\times 1}$ *is projected onto a p-dimensional subspace, the power of the projection is approximately distributed as* $\Gamma(p\chi\_k/(MN), \theta\_k)$*.*

**Remark 3.** *The dimension is* $p = MN$ *for the MRT precoder and* $p = MN - K + 1$ *for the ZF precoder, and* $p = 1$ *for any independent beam [30,31].*
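Lemma 3 is stated for non-isotropic vectors, where it is an approximation. In the isotropic special case ($\chi\_k = MN$) the projected power is exactly $\Gamma(p, \sigma^2)$, which gives a quick way to see where the ZF dimension $p = MN - K + 1$ of Remark 3 comes from: zero-forcing removes $K - 1$ directions. The sketch below uses numpy and draws the $K - 1$ interfering directions as independent complex Gaussian vectors; this is an illustrative assumption, not the simulation setup of the paper.

```python
import numpy as np

def projected_power(dim, k_users, sigma2, trials, seed=0):
    """Power of an isotropic CN(0, sigma2 I) vector after projection onto the orthogonal
    complement of k_users - 1 independent random directions (dimension dim - k_users + 1)."""
    rng = np.random.default_rng(seed)
    powers = np.empty(trials)
    for t in range(trials):
        x = np.sqrt(sigma2 / 2) * (rng.standard_normal(dim) + 1j * rng.standard_normal(dim))
        b = rng.standard_normal((dim, k_users - 1)) + 1j * rng.standard_normal((dim, k_users - 1))
        q, _ = np.linalg.qr(b)               # orthonormal basis of the interfering directions
        residual = x - q @ (q.conj().T @ x)  # projection onto the complement subspace
        powers[t] = np.vdot(residual, residual).real
    return powers

M, N, K, sigma2 = 4, 4, 5, 0.5
zf = projected_power(M * N, K, sigma2, trials=4000)
print(zf.mean(), (M * N - K + 1) * sigma2)  # both close to 6.0, the mean of Gamma(MN - K + 1, sigma2)
```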

When the MRT and ZF precoders are employed, the analysis above gives the distributions of the signal power and of the interference power at user *k*:

$$\left| \hat{\mathbf{g}}\_k^{\rm H}(\tau) \frac{\hat{\mathbf{g}}\_k(\tau)}{\|\hat{\mathbf{g}}\_k(\tau)\|} \right|^2 \sim \Gamma\left(\chi\_k(\tau), \theta\_k(\tau)\right),\tag{23}$$

$$\left| \hat{\mathbf{g}}\_k^{\rm H}(\tau) \frac{\mathbf{a}\_k(\tau)}{\|\mathbf{a}\_k(\tau)\|} \right|^2 \sim \Gamma\left( \frac{MN - K + 1}{MN} \chi\_k(\tau), \theta\_k(\tau) \right),\tag{24}$$

$$\left|\mathbf{e}\_{k}^{\mathrm{H}}(\tau)\mathbf{w}\_{i}\right|^{2}\sim\Gamma\left(\frac{1}{MN}\chi\_{\mathrm{e}}(\tau),\theta\_{\mathrm{e}}(\tau)\right).\tag{25}$$

Notably, $\mathbf{w}\_i$ in (25) can be either the MRT or the ZF precoder, and (25) still holds when $i = k$ because $\mathbf{e}\_k(\tau)$ is independent of $\mathbf{w}\_i$.

Based on the analysis above, the distribution of $\left|\hat{\mathbf{g}}\_k^{\rm H}(\tau)\tilde{\Theta}\_k(t)\mathbf{w}\_k\right|^2$ can be approximated as

$$\left| \hat{\mathbf{g}}\_k^{\rm H}(\tau) \,\tilde{\Theta}\_k(t) \, \frac{\hat{\mathbf{g}}\_k(\tau)}{\|\hat{\mathbf{g}}\_k(\tau)\|} \right|^2 \sim \Gamma\left( \chi\_k(\tau), \theta\_k(\tau) \right), \tag{26}$$

$$\left| \hat{\mathbf{g}}\_k^{\rm H}(\tau) \,\tilde{\Theta}\_k(t) \, \frac{\mathbf{a}\_k(\tau)}{\|\mathbf{a}\_k(\tau)\|} \right|^2 \sim \Gamma \left( \frac{MN - K + 1}{MN} \chi\_k(\tau), \theta\_k(\tau) \right). \tag{27}$$

Figure 2 verifies the accuracy of the approximations (26) and (27). It shows the cumulative distribution function (CDF) of $\left|\hat{\mathbf{g}}\_k^{\rm H}(\tau)\tilde{\Theta}\_k(t)\mathbf{w}\_k\right|^2$ with the MRT precoder for phase noise variances $\sigma\_{\phi}^2 = \sigma\_{\varphi}^2 = 10^{-2}$. Although the random matrix $\tilde{\Theta}\_k(t)$ destroys the correlation between $\hat{\mathbf{g}}\_k$ and $\mathbf{w}\_k$, the approximation is accurate for a phase noise variance of $10^{-2}$, and it becomes more accurate for lower variances. The same conclusion holds for the ZF precoder. Since typical phase noise variances are $\sigma\_{\phi}^2 = \sigma\_{\varphi}^2 = 1.58 \times 10^{-4}$ [8,20,26], it is reasonable to use (26) and (27) to analyze the downlink spectral efficiency.

**Figure 2.** Cumulative distribution function of signal power with MRT precoder under different *M* and *N*.

Using the lemmas above, we now analyze the downlink spectral efficiency with MRT and ZF precoders under hardware impairments. The following theorems and corollary give closed-form expressions for the downlink achievable rates and the asymptotic system performance.

**Theorem 1.** *When the MRT precoder is used, the downlink achievable rate under hardware impairments is given in closed form by*

$$R^{\rm mrt}(t) = \log\_2\left(1 + \frac{D^{\rm mrt}(t)}{A^{\rm mrt}(t) + B^{\rm mrt}(t) + \xi^{\rm UE}/\rho\_{\rm dl}}\right),\tag{28}$$

*where*

$$\begin{split} D^{\mathrm{mrt}}(t) &= \left(\frac{\Gamma\left(\chi\_{k}^{\prime}(t) + \frac{1}{2}\right)}{\Gamma\left(\chi\_{k}^{\prime}(t)\right)}\right)^{2} \theta\_{k}^{\prime}(t), \\ \chi\_{k}^{\prime}(t) &= N \frac{\left(\sum\_{m=1}^{M} e^{-\left(\sigma\_{\varphi,k}^{2} + \sigma\_{\phi,m}^{2}\right)|t-\tau|} \beta\_{m,k}^{2}(t)\right)^{2}}{\sum\_{m=1}^{M} e^{-2\left(\sigma\_{\varphi,k}^{2} + \sigma\_{\phi,m}^{2}\right)|t-\tau|} \beta\_{m,k}^{4}(t)}, \\ \theta\_{k}^{\prime}(t) &= \frac{\sum\_{m=1}^{M} e^{-2\left(\sigma\_{\varphi,k}^{2} + \sigma\_{\phi,m}^{2}\right)|t-\tau|} \beta\_{m,k}^{4}(t)}{\sum\_{m=1}^{M} e^{-\left(\sigma\_{\varphi,k}^{2} + \sigma\_{\phi,m}^{2}\right)|t-\tau|} \beta\_{m,k}^{2}(t)}, \\ A^{\mathrm{mrt}}(t) &= N \sum\_{m=1}^{M} \beta\_{m,k}^{2}(\tau) + \frac{1}{M} \sum\_{m=1}^{M} \eta\_{m,k}^{2}(\tau) - D^{\mathrm{mrt}}(t), \\ B^{\mathrm{mrt}}(t) &= \sum\_{i \neq k}^{K} \frac{N \sum\_{m=1}^{M} \beta\_{m,i}^{2}(\tau) \lambda\_{m,k}}{\theta\_{i}(\tau) (\chi\_{i}(\tau) - 1)}. \end{split}$$

**Proof of Theorem 1.** Please refer to Appendix A.
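Theorem 1 can be evaluated directly. The sketch below is one possible implementation for a single user $k$: it follows (16) in mapping the ratio of (28) to a rate through $\log\_2(1+\cdot)$, computes the Gamma-function ratio with `math.lgamma` to avoid overflow for large $\chi'\_k(t)$, and uses $\eta\_{m,k}^2 = \lambda\_{m,k} - \beta\_{m,k}^2$. The helper name, its argument layout and all numbers in the usage example are hypothetical, and a single set of $\beta\_{m,k}^2$ values is used for both $t$ and $\tau$ (an assumption made for brevity).

```python
import math

def gamma_params(n, powers):
    """(chi, theta) of eqs. (19)-(20) for per-RAU powers."""
    s2 = sum(powers)
    s4 = sum(p * p for p in powers)
    return n * s2 ** 2 / s4, s4 / s2

def gamma_ratio_sq(x):
    """(Gamma(x + 1/2) / Gamma(x))**2 via lgamma, safe for large x."""
    return math.exp(2.0 * (math.lgamma(x + 0.5) - math.lgamma(x)))

def mrt_rate(k, n, beta2, lam, sig_user, sig_rau, dt, xi_over_rho):
    """Hypothetical helper evaluating Theorem 1 for user k.

    beta2[m][i] = beta_{m,i}^2, lam[m][i] = lambda_{m,i} (M x K nested lists),
    sig_user[k] / sig_rau[m] = phase-noise variances, dt = |t - tau|.
    """
    m_rau = len(beta2)
    b2k = [beta2[m][k] for m in range(m_rau)]
    eta2k = [lam[m][k] - beta2[m][k] for m in range(m_rau)]
    decay = [math.exp(-(sig_user[k] + sig_rau[m]) * dt) for m in range(m_rau)]
    # chi'_k(t), theta'_k(t): eqs. (19)-(20) with beta^2 scaled by the phase-noise decay.
    chi_p, theta_p = gamma_params(n, [d * b for d, b in zip(decay, b2k)])
    d_term = gamma_ratio_sq(chi_p) * theta_p
    a_term = n * sum(b2k) + sum(eta2k) / m_rau - d_term
    b_term = 0.0
    for i in range(len(beta2[0])):
        if i == k:
            continue
        b2i = [beta2[m][i] for m in range(m_rau)]
        chi_i, theta_i = gamma_params(n, b2i)
        b_term += n * sum(b2i[m] * lam[m][k] for m in range(m_rau)) / (theta_i * (chi_i - 1.0))
    return math.log2(1.0 + d_term / (a_term + b_term + xi_over_rho))

beta2 = [[0.8, 0.1], [0.2, 0.6]]  # hypothetical: M = 2 RAUs (rows) x K = 2 users (columns)
lam = [[1.0, 0.15], [0.3, 0.8]]
sig = [1.58e-4, 1.58e-4]
r_near = mrt_rate(0, 8, beta2, lam, sig, sig, dt=1, xi_over_rho=0.1)
r_far = mrt_rate(0, 8, beta2, lam, sig, sig, dt=100, xi_over_rho=0.1)
print(r_near, r_far)  # the rate falls as |t - tau| grows
```

Averaging `mrt_rate` over all users $k$ gives the average rate plotted in the numerical results.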

*Electronics* **2018**, *7*, 317

**Theorem 2.** *When the ZF precoder is used, the downlink achievable rate under hardware impairments is given in closed form by*

$$R^{\rm zf}(t) = \log\_2\left(1 + \frac{D^{\rm zf}(t)}{A^{\rm zf}(t) + B^{\rm zf}(t) + \xi^{\rm UE}/\rho\_{\rm dl}}\right),\tag{29}$$

*where*

$$\begin{aligned} D^{\mathrm{zf}}(t) &= \left(\frac{\Gamma\left(\kappa(t) + \frac{1}{2}\right)}{\Gamma\left(\kappa(t)\right)}\right)^2 \theta'\_k(t), \\ A^{\mathrm{zf}}(t) &= \frac{MN - K + 1}{M} \sum\_{m=1}^M \beta^2\_{m,k}(\tau) - D^{\mathrm{zf}}(t), \\ B^{\mathrm{zf}}(t) &= \frac{K}{M} \sum\_{m=1}^M \eta^2\_{m,k}(\tau), \\ \kappa(t) &= \frac{MN - K + 1}{MN} \chi'\_k(t). \end{aligned}$$

**Proof of Theorem 2.** Please refer to Appendix B.
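Theorem 2 can be evaluated in the same way. The sketch below is a hypothetical, self-contained helper for one user; as with (16), the ratio in (29) is mapped to a rate with $\log\_2(1+\cdot)$, and $\eta\_{m,k}^2 = \lambda\_{m,k} - \beta\_{m,k}^2$. It requires $MN \geq K$ so that $\kappa(t) > 0$. Unlike MRT, no statistics of the other users are needed, because the ZF interference term $B^{\rm zf}(t)$ depends only on the estimation-error powers of user $k$. All numbers are illustrative.

```python
import math

def zf_rate(n, beta2_k, lam_k, k_users, sig_user_k, sig_rau, dt, xi_over_rho):
    """Hypothetical helper evaluating Theorem 2 for one user.

    beta2_k[m] = beta_{m,k}^2, lam_k[m] = lambda_{m,k}, dt = |t - tau|.
    """
    m_rau = len(beta2_k)
    eta2_k = [l - b for l, b in zip(lam_k, beta2_k)]
    decay = [math.exp(-(sig_user_k + s) * dt) for s in sig_rau]
    p = [d * b for d, b in zip(decay, beta2_k)]
    s2, s4 = sum(p), sum(x * x for x in p)
    chi_p, theta_p = n * s2 ** 2 / s4, s4 / s2     # chi'_k(t), theta'_k(t)
    dof = (m_rau * n - k_users + 1) / (m_rau * n)  # (MN - K + 1) / (MN)
    kappa = dof * chi_p
    d_term = math.exp(2.0 * (math.lgamma(kappa + 0.5) - math.lgamma(kappa))) * theta_p
    a_term = (m_rau * n - k_users + 1) / m_rau * sum(beta2_k) - d_term
    b_term = k_users / m_rau * sum(eta2_k)
    return math.log2(1.0 + d_term / (a_term + b_term + xi_over_rho))

beta2_k = [0.8, 0.2]  # hypothetical beta_{m,k}^2 for M = 2 RAUs
lam_k = [1.0, 0.3]
for dt in (1, 100):
    print(dt, zf_rate(8, beta2_k, lam_k, 2, 1.58e-4, [1.58e-4] * 2, dt, 0.1))
```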

To study the effects of phase noise further, we next consider the case where the number of antennas at each RAU tends to infinity while the numbers of RAUs and users are fixed. The asymptotic performance given in Corollary 1 is obtained from (28) and (29).

**Corollary 1.** *As N* → ∞*, the ultimate rate of user k with both the MRT and ZF precoders is given by*

$$R\_k^{\infty}(t) = \log\_2\left(1 + \frac{\sum\_{m=1}^{M} e^{-\left(\sigma\_{\varphi,k}^2 + \sigma\_{\phi,m}^2\right)|t-\tau|} \beta\_{m,k}^2(\tau)}{\sum\_{m=1}^{M} \beta\_{m,k}^2(\tau) - \sum\_{m=1}^{M} e^{-\left(\sigma\_{\varphi,k}^2 + \sigma\_{\phi,m}^2\right)|t-\tau|} \beta\_{m,k}^2(\tau)}\right)\tag{30}$$

**Proof.** Since the proof is similar for both precoders, we only provide it for the MRT precoder. As $N \to \infty$, $\chi'\_k(t) \to \infty$, and therefore $\lim\_{N\to\infty} \left(\Gamma(\chi'\_k(t)+\frac{1}{2})/\Gamma(\chi'\_k(t))\right)^2/\chi'\_k(t) = 1$ [29]. The limiting rate of user *k* then follows directly by dividing the numerator and denominator of (28) by *N*. Corollary 1 shows that without phase noise, i.e., when all phase noise variances are zero, the rate grows without bound as *N* → ∞; hence, phase noise limits the downlink spectral efficiency.
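The proof relies on the behavior of $\left(\Gamma(\chi+\frac{1}{2})/\Gamma(\chi)\right)^2$ for large $\chi$. A quick numerical check, sketched with Python's `math.lgamma`, shows that this quantity divided by $\chi$ tends to 1, while its difference from $\chi$ settles near $-1/4$ (consistent with the expansion $\chi - \frac{1}{4} + O(1/\chi)$). That constant offset vanishes once the numerator and denominator of (28) are divided by $N$, so it does not affect the limit.

```python
import math

def gamma_ratio_sq(x):
    """(Gamma(x + 1/2) / Gamma(x))**2, computed with lgamma."""
    return math.exp(2.0 * (math.lgamma(x + 0.5) - math.lgamma(x)))

for x in (10.0, 100.0, 1000.0, 10000.0):
    r = gamma_ratio_sq(x)
    print(f"chi={x:>8.0f}  ratio/chi={r / x:.6f}  ratio-chi={r - x:+.6f}")
```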

#### **4. Numerical Results**

In this section, a series of Monte Carlo simulations is used to verify the theoretical results obtained in Section 3. A circular single-cell massive MIMO system is considered. All of the RAUs and users are randomly distributed in the cell and the minimum access distance between RAUs and users is set as *r*<sup>0</sup> = 30 m. The channels are generated by (4), and other simulation parameters are presented in Table 1.


**Table 1.** Basic simulation parameters.

Figure 3 illustrates the theoretical and simulated spectral efficiency with MRT and ZF precoders versus the number of antennas per RAU. The spectral efficiency is the rate averaged over all users. The phase noise variances are set to $\sigma\_{\phi,m}^2 = \sigma\_{\varphi,k}^2 = 1.58 \times 10^{-4}$, $\forall m, k$, the amplified thermal noise variances to $\xi^{\rm UE} = \xi^{\rm BS} = 1.58\sigma^2$, and $t = \tau + 1$. The closed-form expressions (28) and (29) match the Monte Carlo simulations of (16) well. For both precoders, the spectral efficiency increases with *N* and approaches the limiting average rate. When *N* = 100, the system achieves 80% of the ultimate rate with the ZF precoder and 76% with the MRT precoder. Furthermore, the ZF precoder outperforms MRT.

**Figure 3.** Spectral efficiency against *M* with MRT and ZF precoders.

Next, we investigate the effects of phase noise. Figure 4 illustrates the theoretical spectral efficiency with MRT and ZF precoders against the phase noise variance, which reflects the strength of the phase noise. The number of antennas per RAU is *N* = 50, and the other system parameters are the same as in Figure 3. Figure 4 shows that the spectral efficiency decreases monotonically as the phase noise variance increases. In addition, phase noise has a greater impact on the ZF precoder, because ZF is more sensitive to CSI errors. As the variance increases, the performance gap between the MRT and ZF precoders narrows: when the phase noise is severe, the loss caused by the CSI uncertainty at the user side dominates over inter-user interference.

**Figure 4.** Spectral efficiency against $\sigma\_{\phi}^2 = \sigma\_{\varphi}^2$ using MRT and ZF precoders with $\sigma\_{\phi,m}^2 = \sigma\_{\varphi,k}^2 = \sigma\_{\phi}^2 = \sigma\_{\varphi}^2$, $\forall m, k$.

Figure 5 illustrates the instantaneous spectral efficiency during the downlink transmission phase, with *N* = 40 antennas per RAU, phase noise variances $\sigma\_{\phi,m}^2 = \sigma\_{\varphi,k}^2 = 1.58 \times 10^{-4}$, $\forall m, k$, and a channel coherence time of *T* = 200. As shown in Figure 5, the spectral efficiency degrades as *t* increases, because the uncertainty of the phase drift between the downlink transmission phase and the pilot training phase grows with *t*. Figure 5 thus shows that it is inappropriate to use the CSI estimated during the pilot phase for the whole data transmission phase.

**Figure 5.** Spectral efficiency against *t* with MRT and ZF precoders.
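To illustrate why reusing the pilot-phase CSI over the whole coherence interval is costly, the sketch below evaluates the $N \to \infty$ limit (30) as a function of $|t - \tau|$, taking the rate as $\log\_2(1+\cdot)$ of the ratio in (30), in line with (16). This is the asymptotic ceiling, not the finite-*N* curves of Figure 5; the $\beta$ values are hypothetical, and the variance $1.58 \times 10^{-4}$ is the value used above. With identical variances at the user and all RAUs, (30) reduces to $-\log\_2\left(1 - e^{-2\sigma^2|t-\tau|}\right)$ regardless of the $\beta$ values.

```python
import math

def limiting_rate(beta2_k, sig_user_k, sig_rau, dt):
    """N -> infinity rate of eq. (30) for one user, with dt = |t - tau| >= 1."""
    kept = sum(math.exp(-(sig_user_k + s) * dt) * b for s, b in zip(sig_rau, beta2_k))
    total = sum(beta2_k)
    return math.log2(1.0 + kept / (total - kept))  # mapped to a rate as in (16)

beta2_k = [0.5, 0.3, 0.2]     # hypothetical beta_{m,k}^2(tau), M = 3 RAUs
sig = 1.58e-4                 # phase-noise variance used in the numerical results
for dt in (1, 50, 100, 200):  # T = 200 is the coherence time used for Figure 5
    print(dt, round(limiting_rate(beta2_k, sig, [sig] * 3, dt), 2))
```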

#### **5. Conclusions**

In this paper, we analyzed the downlink spectral efficiency of distributed massive MIMO with hardware impairments. First, using pilot-symbol-assisted transmission, we obtained the estimated CSI in a more realistic scenario where transmission is distorted by phase noise and amplified thermal noise. Next, we used the imperfect CSI to derive closed-form expressions for the downlink achievable rates with MRT and ZF precoders. In addition, we obtained the ultimate rate as *N* → ∞, which shows that the rate is limited by phase noise. Numerical results confirmed the accuracy of the theoretical analysis. They also revealed that ZF achieves higher spectral efficiency than MRT, while hardware impairments have a greater impact on the ZF precoder. Finally, the spectral efficiency degrades as the phase noise variance and the downlink transmission time increase.

In future work, we intend to extend this research to a more refined phase noise model, which could lead to finer precoding strategies that improve the theoretical rates.

**Author Contributions:** Formal analysis, Q.L.; Supervision, J.L., P.Z., D.W. and X.Y.; Validation, Q.L.; Writing—original draft, Q.L.; Writing—review & editing, Q.L. and J.L.

**Funding:** This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant No. 61501113, 61571120, 61871122), the Jiangsu Provincial Natural Science Foundation (Grant No. BK20150630, BK20180011), the Six Talent Peaks Project in Jiangsu Province, and the National Key Special Program No. 2018ZX03001008-002.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of Theorem 1**

When the MRT precoder is used, three terms in (16) need to be calculated: $\left|\mathbb{E}\left[\mathbf{g}\_k^{\rm H}(\tau)\tilde{\Theta}\_k(t)\mathbf{w}\_k\right]\right|^2$, $A(t)$ and $B(t)$.

For the first term, we obtain

$$\begin{array}{l} \left| \mathbb{E} \left[ \mathbf{g}\_k^{\mathrm{H}} (\tau) \tilde{\Theta}\_k (t) \mathbf{w}\_k \right] \right|^2 \\ \overset{(a)}{=} \left| \mathbb{E} \left[ \mathbf{g}\_k^{\mathrm{H}} (\tau) \mathbb{E} \left[ \tilde{\Theta}\_k (t) \right] \mathbf{w}\_k \right] \right|^2 \\ \overset{(b)}{=} \left| \mathbb{E} \left[ \hat{\mathbf{g}}\_k^{\mathrm{H}} (\tau) \mathbf{D}\_{k,\tau} (t) \mathbf{w}\_k \right] \right|^2 \\ \overset{(c)}{=} \left( \frac{\Gamma \left( \chi\_k' (t) + \frac{1}{2} \right)}{\Gamma \left( \chi\_k' (t) \right)} \right)^2 \theta\_k' (t), \end{array} \tag{A1}$$

where (*a*) holds because $\tilde{\Theta}\_k(t)$ is independent of $\mathbf{g}\_k(\tau)$ and $\mathbf{w}\_k$; (*b*) follows from $\mathbb{E}\left[e^{-j(\theta\_k^m(t)-\theta\_k^m(\tau))}\right] = e^{-\frac{\sigma\_{\varphi,k}^2+\sigma\_{\phi,m}^2}{2}|t-\tau|}$ and the independence of $\mathbf{e}\_k(\tau)$ and $\mathbf{w}\_k$; and (*c*) results from Lemma 2, Lemma 3 and the relationship between the Gamma and Nakagami distributions.
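Steps (*b*) and (*c*) rest on two standard facts, checked numerically below with the standard library. The sketch assumes, consistent with the expectation quoted above, that the accumulated phase drift over $|t-\tau|$ symbols is a zero-mean Gaussian sum of i.i.d. increments (a Wiener model) with total variance $(\sigma\_{\varphi,k}^2+\sigma\_{\phi,m}^2)|t-\tau|$; for Gaussian $X$, $\mathbb{E}[e^{-jX}] = e^{-\mathrm{var}(X)/2}$. For (*c*), if $Y \sim \Gamma(\chi,\theta)$ then $\sqrt{Y}$ is Nakagami distributed with $\mathbb{E}[\sqrt{Y}] = \frac{\Gamma(\chi+1/2)}{\Gamma(\chi)}\sqrt{\theta}$, which is why (A1) ends with the squared Gamma-function ratio times $\theta'\_k(t)$. The variance $10^{-2}$ and the Gamma parameters are arbitrary test values.

```python
import cmath
import math
import random

rng = random.Random(3)
trials = 20000

# (i) Phase drift over dt symbols, assumed to be a sum of dt i.i.d. Gaussian increments
#     (Wiener model), each with variance s2 = sigma_{varphi,k}^2 + sigma_{phi,m}^2.
s2, dt = 1e-2, 20
acc = 0j
for _ in range(trials):
    drift = sum(rng.gauss(0.0, math.sqrt(s2)) for _ in range(dt))
    acc += cmath.exp(-1j * drift)
print(acc.real / trials, math.exp(-s2 * dt / 2))  # both close to exp(-0.1)

# (ii) Gamma -> Nakagami: if Y ~ Gamma(chi, theta), E[sqrt(Y)] = Gamma(chi + 1/2) / Gamma(chi) * sqrt(theta).
chi, theta = 6.0, 0.4
mc = sum(math.sqrt(rng.gammavariate(chi, theta)) for _ in range(trials)) / trials
exact = math.exp(math.lgamma(chi + 0.5) - math.lgamma(chi)) * math.sqrt(theta)
print(mc, exact)  # close to each other
```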

For the term *A*(*t*), we obtain

$$\begin{split} & \text{var}\left(\mathbf{g}\_{k}^{\text{H}}(\tau)\tilde{\Theta}\_{k}(t)\mathbf{w}\_{k}\right) \\ &= \mathbb{E}\left[\left|\hat{\mathbf{g}}\_{k}^{\text{H}}(\tau)\tilde{\Theta}\_{k}(t)\mathbf{w}\_{k}\right|^{2}\right] + \mathbb{E}\left[\left|\mathbf{e}\_{k}^{\text{H}}(\tau)\tilde{\Theta}\_{k}(t)\mathbf{w}\_{k}\right|^{2}\right] \\ & - \left|\mathbb{E}\left[\mathbf{g}\_{k}^{\text{H}}(\tau)\tilde{\Theta}\_{k}(t)\mathbf{w}\_{k}\right]\right|^{2} . \end{split} \tag{A2}$$

The first term of (A2) can be calculated as

$$\mathbb{E}\left[\left|\hat{\mathbf{g}}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\frac{\hat{\mathbf{g}}\_k(\tau)}{\|\hat{\mathbf{g}}\_k(\tau)\|}\right|^2\right] \stackrel{(a)}{=} N\sum\_{m=1}^M \beta\_{m,k}^2(\tau), \tag{A3}$$

where (*a*) results from (26).

Next, the second term is given by

$$\mathbb{E}\left[\left|\mathbf{e}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\frac{\hat{\mathbf{g}}\_k(\tau)}{\|\hat{\mathbf{g}}\_k(\tau)\|}\right|^2\right] \stackrel{(a)}{=} \frac{1}{M} \sum\_{m=1}^M \eta\_{m,k}^2(\tau),\tag{A4}$$

where (*a*) follows from (25) and the fact that $\left\|\mathbf{e}\_k^{\rm H}(\tau)\tilde{\Theta}\_k(t)\right\|^2 \sim \Gamma(\chi\_{\rm e}(\tau), \theta\_{\rm e}(\tau))$. For the last term, $B(t)$, we first calculate

$$\begin{split} \mathbb{E}\left[\frac{1}{\|\hat{\mathbf{g}}\_{i}\|^{2}}\right] & \stackrel{(a)}{=} \int\_{0}^{\infty} \frac{1}{x} x^{\chi\_{i}-1} \frac{e^{-x/\theta\_{i}}}{\theta\_{i}^{\chi\_{i}} \Gamma(\chi\_{i})} dx \\ &= \frac{1}{\theta\_{i}^{\chi\_{i}} \Gamma(\chi\_{i})} \theta\_{i}^{\chi\_{i}-1} \Gamma(\chi\_{i}-1) \\ &= \frac{1}{\theta\_{i}(\chi\_{i}-1)}, \end{split} \tag{A5}$$

where (*a*) results from Remark 2, and the time index (*t*) is omitted in (A5) for brevity. Based on (A5), we have

$$\begin{aligned} &\mathbb{E}\left[\left|\mathbf{g}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\mathbf{w}\_i\right|^2\right] \\ &\overset{(a)}{\asymp} \mathbb{E}\left[\left|\mathbf{g}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\hat{\mathbf{g}}\_i(\tau)\right|^2\right] \mathbb{E}\left[\frac{1}{\|\hat{\mathbf{g}}\_i(\tau)\|^2}\right] \\ &= \frac{\mathbb{E}\left[\sum\_{m=1}^M \beta\_{m,i}^2(\tau)\lambda\_{m,k}\hat{\mathbf{h}}\_{m,k}^{\rm H}\hat{\mathbf{h}}\_{m,k}\right]}{\theta\_i(\tau)(\chi\_i(\tau)-1)} \\ &= \frac{N\sum\_{m=1}^M \beta\_{m,i}^2(\tau)\lambda\_{m,k}}{\theta\_i(\tau)(\chi\_i(\tau)-1)},\end{aligned} \tag{A6}$$

where $x \asymp y$ means $\lim\_{N\to\infty}(x - y) = 0$, and (*a*) follows from Lemma 4 (ii) of [32]. Finally, combining (A1)–(A6) concludes the proof.
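A short Monte Carlo check of the inverse moment (A5), using the standard library only. The result requires $\chi\_i > 1$ (otherwise the integral diverges); by (19), $\chi\_i \geq N$ because $\left(\sum\_m \beta\_{m,i}^2\right)^2 \geq \sum\_m \beta\_{m,i}^4$, so the condition holds whenever $N \geq 2$. The parameter values are arbitrary.

```python
import random
import statistics

chi, theta = 12.0, 0.3  # arbitrary test values with chi > 1
rng = random.Random(11)
inv = [1.0 / rng.gammavariate(chi, theta) for _ in range(50000)]
print(statistics.fmean(inv), 1.0 / (theta * (chi - 1.0)))  # both close to 0.303
```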

#### **Appendix B. Proof of Theorem 2**

To derive the closed-form expression of (16) with the ZF precoder, $\mathbf{w}\_k = \mathbf{a}\_k(\tau)/\|\mathbf{a}\_k(\tau)\|$, three terms need to be calculated: $\left|\mathbb{E}\left[\mathbf{g}\_k^{\rm H}(\tau)\tilde{\Theta}\_k(t)\mathbf{w}\_k\right]\right|^2$, $A(t)$ and $B(t)$. For the first term, we obtain

$$\begin{aligned} & \left| \mathbb{E} \left[ \mathbf{g}\_k^{\mathrm{H}} (\tau) \tilde{\Theta}\_k (t) \mathbf{w}\_k \right] \right|^2 \\ & \stackrel{(a)}{=} \left| \mathbb{E} \left[ \mathbf{g}\_k^{\mathrm{H}} (\tau) \mathbb{E} \left[ \tilde{\Theta}\_k (t) \right] \mathbf{w}\_k \right] \right|^2 \\ & \stackrel{(b)}{=} \left| \mathbb{E} \left[ \hat{\mathbf{g}}\_k^{\mathrm{H}} (\tau) \mathbf{D}\_{k,\tau} (t) \frac{\mathbf{a}\_k (\tau)}{\left\| \mathbf{a}\_k (\tau) \right\|} \right] \right|^2 \\ & \stackrel{(c)}{=} \left( \frac{\Gamma \left( \kappa (t) + \frac{1}{2} \right)}{\Gamma \left( \kappa (t) \right)} \right)^2 \theta\_k'(t), \end{aligned} \tag{A7}$$

where (*a*) holds because $\tilde{\Theta}\_k(t)$ is independent of $\mathbf{g}\_k(\tau)$ and $\mathbf{w}\_k$, (*b*) follows from the independence of $\mathbf{e}\_k(\tau)$ and $\mathbf{w}\_k$, and (*c*) results from Lemma 2, Lemma 3 and the relationship between the Gamma and Nakagami distributions.

As in the proof of Theorem 1, obtaining the closed-form expression of $A(t)$ requires the following two terms:

$$\mathbb{E}\left[\left|\hat{\mathbf{g}}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\frac{\mathbf{a}\_k(\tau)}{\|\mathbf{a}\_k(\tau)\|}\right|^2\right],\tag{A8}$$

$$\mathbb{E}\left[\left|\mathbf{e}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\frac{\mathbf{a}\_k(\tau)}{\|\mathbf{a}\_k(\tau)\|}\right|^2\right].\tag{A9}$$

The first term (A8) can be given by

$$\mathbb{E}\left[\left|\hat{\mathbf{g}}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\frac{\mathbf{a}\_k(\tau)}{\|\mathbf{a}\_k(\tau)\|}\right|^2\right] = \frac{MN - K + 1}{M} \sum\_{m=1}^M \beta\_{m,k}^2(\tau),\tag{A10}$$

which results from (27).

Next, the second term is given by

$$\mathbb{E}\left[\left|\mathbf{e}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\frac{\mathbf{a}\_k(\tau)}{\|\mathbf{a}\_k(\tau)\|}\right|^2\right] = \frac{1}{M}\sum\_{m=1}^M \eta\_{m,k}^2(\tau),\tag{A11}$$

which results from (25) and the fact that $\left\|\mathbf{e}\_k^{\rm H}(\tau)\tilde{\Theta}\_k(t)\right\|^2 \sim \Gamma(\chi\_{\rm e}(\tau), \theta\_{\rm e}(\tau))$.

For the term *B*(*t*), we can have

$$\begin{aligned} &\mathbb{E}\left[\left|\mathbf{g}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\mathbf{w}\_i\right|^2\right] \\ &\overset{(a)}{=} \mathbb{E}\left[\left|\mathbf{e}\_k^{\rm H}(\tau)\,\tilde{\Theta}\_k(t)\,\mathbf{w}\_i\right|^2\right] \\ &= \frac{1}{M}\sum\_{m=1}^M \eta\_{m,k}^2(\tau), \end{aligned} \tag{A12}$$

where (*a*) results from the property of ZF precoder.

Substituting (A7) and (A10)–(A12) into (16) completes the proof.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Spectral and Energy Efficiency of Distributed Massive MIMO with Low-Resolution ADC**

#### **Jiamin Li, Qian Lv \*, Jing Yang, Pengcheng Zhu and Xiaohu You**

National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China; lijiamin@seu.edu.cn (J.L.); jingyang@seu.edu.cn (J.Y.); p.zhu@seu.edu.cn (P.Z.); xhyu@seu.edu.cn (X.Y.)

**\*** Correspondence: seulvqian@seu.edu.cn; Tel.: +25-5209-1635

Received: 15 November 2018; Accepted: 3 December 2018; Published: 4 December 2018

**Abstract:** In this paper, considering a more realistic case where low-resolution analog-to-digital converters (ADCs) are employed at the receiver antennas, we investigate the spectral and energy efficiency of multi-cell multi-user distributed massive multi-input multi-output (MIMO) systems with two linear receivers. An additive quantization noise model is first provided to study the effects of quantization noise. Using this model, closed-form expressions for the uplink achievable rates with a zero-forcing (ZF) receiver and a maximum ratio combination (MRC) receiver under quantization noise and pilot contamination are derived. Furthermore, the asymptotic achievable rates are given when the number of quantization bits, the per-user transmit power, and the number of antennas per remote antenna unit (RAU) go to infinity, respectively. Numerical results confirm that the theoretical analysis is accurate and show that quantization noise degrades the spectral efficiency, but increasing the number of antennas can compensate for the degradation. Furthermore, low-resolution ADCs with 3 or 4 bits outperform perfect ADCs in energy efficiency. These results imply that it is preferable to use low-resolution ADCs in distributed massive MIMO systems.

**Keywords:** distributed massive MIMO; energy efficiency; spectral efficiency; pilot contamination; quantization noise

#### **1. Introduction**

Massive multi-input multi-output (MIMO) systems are an essential technology for fifth-generation (5G) mobile networks because they can significantly improve spectral efficiency and energy efficiency [1–6]. In massive MIMO systems, hundreds or thousands of base station antennas serve a relatively small number of users in the same time-frequency resource. The large number of antennas provides many degrees of freedom, which favors low-complexity receivers, such as maximum ratio combination (MRC) and zero-forcing (ZF), and low-complexity precoders, such as ZF and maximum ratio transmission (MRT) [2,7,8]. Therefore, we consider MRC and ZF receivers for uplink transmission in this paper. Massive MIMO systems fall into two categories: co-located massive MIMO and distributed massive MIMO [9]. Compared to co-located massive MIMO, distributed massive MIMO offers higher spectral efficiency, energy efficiency, and system coverage owing to the reduced access distance [8,10–12]. Hence, a distributed massive MIMO system is considered in this paper.

Although massive MIMO systems offer significant performance gains, they also face new challenges: high total power consumption, expensive hardware, and massive data processing [13]. Specifically, in massive MIMO systems each antenna is equipped with a radio frequency (RF) chain that includes an analog-to-digital converter (ADC). As the number of antennas increases, so does the number of ADCs, and the hardware complexity and power consumption of each ADC increase exponentially with the number of quantization bits [14]. Therefore, one promising solution is to employ low-resolution ADCs in massive MIMO systems, and the study of low-resolution ADCs in MIMO and massive MIMO systems has attracted widespread attention.

Spectral and energy efficiency are two fundamental metrics for analyzing the impact of low-resolution ADCs. Spectral efficiency was investigated in [15–17]. The performance of 1-bit ADCs in MIMO systems was studied in [15], considering the nonlinear characteristics of the quantizer. In massive MIMO systems with low-resolution ADCs, the uplink achievable rate with MRC receivers was investigated in [16], and that with ZF receivers was studied in [17]. However, both papers assumed that the base station had perfect channel state information (CSI) and considered only a single-cell massive MIMO system. In practice, perfect CSI is not available at the base station. On the other hand, energy efficiency was studied in [13,18–20]. The optimal number of quantization bits and antenna selection were considered to maximize the energy efficiency of general MIMO systems with low-resolution ADCs in [13]. It was pointed out in [18] that very low bit resolution is not preferable from the perspective of energy efficiency. An expression relating energy efficiency to the number of quantization bits was obtained in [19].

Previous works mainly studied single-cell systems, assumed that the base station had perfect CSI, and did not analyze spectral efficiency and energy efficiency jointly. Hence, in this paper, a multi-cell multi-user massive MIMO system with low-resolution ADCs is considered, and we assume that the base stations estimate CSI during the uplink pilot transmission phase. Furthermore, both the uplink spectral efficiency and the energy efficiency are analyzed. The key contributions of this paper are as follows:


#### **2. System Model**

We consider a distributed massive MIMO system with *L* adjacent cells, each consisting of *M* remote antenna units (RAUs) and *K* single-antenna users. Each RAU is equipped with an array of *N* antennas. Each antenna is equipped with a low-resolution ADC, so system performance is degraded by quantization noise. RAUs in the same cell transmit or receive signals simultaneously, while the beamforming design and signal processing are performed in a baseband processing unit. Figure 1 shows an example with *L* = 7 adjacent cells, where cell 1 contains *K* = 6 users and *M* = 6 RAUs.

**Figure 1.** System configuration.

#### *2.1. Quantization Noise Model*

For uplink transmission, the signal vector received by all RAUs in cell *l* can be given by

$$\mathbf{y}\_l = \sqrt{p\_{\rm u}} \sum\_{i=1}^{L} \mathbf{G}\_{l,i} \mathbf{x}\_i + \mathbf{n}\_{l}, \tag{1}$$

where $\mathbf{x}\_i$ is the $K \times 1$ signal vector transmitted by the *K* users in cell *i*, $p\_{\rm u}$ is the uplink transmit power, $\mathbf{n}\_l \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}\_{MN})$ is the additive white Gaussian noise, and $\mathbf{G}\_{l,i} = [\mathbf{g}\_{l,i,1}, \ldots, \mathbf{g}\_{l,i,K}]$ is the $MN \times K$ channel matrix between the *K* users in cell *i* and the *M* RAUs in cell *l*, with

$$\mathbf{g}\_{l,i,k} = \left[ \sqrt{\lambda\_{l,1,i,k}}\, \mathbf{h}\_{l,1,i,k}^{\mathrm{T}}, \cdots, \sqrt{\lambda\_{l,M,i,k}}\, \mathbf{h}\_{l,M,i,k}^{\mathrm{T}} \right]^{\mathrm{T}}, \tag{2}$$

where *λl*,*m*,*i*,*<sup>k</sup>* is the path loss between the *k*-th user in the *i*-th cell and the *m*-th RAU in the *l*-th cell, which is dependent on the corresponding distance, and **h***l*,*m*,*i*,*<sup>k</sup>* ∼ CN (0,**I***N*) denotes the small scale fading.

We assume that the CSI is unknown at the base station and is acquired through pilot training. Following [21], with minimum mean square error (MMSE) channel estimation, the equivalent estimated channel is

$$\hat{\mathbf{g}}\_{i,l,k} = \left[ \sqrt{\beta\_{i,1,l,k}}\, \hat{\mathbf{h}}\_{i,k,1}^{\mathrm{T}}, \cdots, \sqrt{\beta\_{i,M,l,k}}\, \hat{\mathbf{h}}\_{i,k,M}^{\mathrm{T}} \right]^{\mathrm{T}}, \tag{3}$$

where

$$\beta\_{i,m,l,k} = \frac{\lambda\_{i,m,l,k}^2}{\sum\_{j=1}^L \lambda\_{i,m,j,k} + 1/(\tau p\_u)}. \tag{4}$$

*τ* denotes the length of the pilot sequences, $\beta\_{i,m,l,k}$ denotes the equivalent path loss between the *k*-th user in the *l*-th cell and the *m*-th RAU in the *i*-th cell, and $\hat{\mathbf{h}}\_{i,k} \triangleq [\hat{\mathbf{h}}\_{i,k,1}^{\rm T}, \cdots, \hat{\mathbf{h}}\_{i,k,M}^{\rm T}]^{\rm T} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}\_{MN})$ represents the equivalent small-scale fading part of the estimated channel. By the orthogonality principle of MMSE estimation, $\mathbf{g}\_{i,l,k}$ can be decomposed as

$$\mathbf{g}\_{i,l,k} = \hat{\mathbf{g}}\_{i,l,k} + \tilde{\mathbf{g}}\_{i,l,k}, \tag{5}$$

where $\tilde{\mathbf{g}}\_{i,l,k} \sim \mathcal{CN}\left(\mathbf{0}, \mathrm{diag}\left(\eta\_{i,1,l,k}, \cdots, \eta\_{i,M,l,k}\right) \otimes \mathbf{I}\_N\right)$ is the estimation error, which is uncorrelated with and statistically independent of $\hat{\mathbf{g}}\_{i,l,k}$, and $\eta\_{i,m,l,k} \triangleq \lambda\_{i,m,l,k} - \beta\_{i,m,l,k}$.
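A minimal sketch of (4) and of $\eta\_{i,m,l,k} = \lambda\_{i,m,l,k} - \beta\_{i,m,l,k}$ for a single RAU; only the formulas come from the text, and the path-loss values are hypothetical. It also shows the effect of pilot contamination: because the denominator of (4) sums $\lambda\_{i,m,j,k}$ over all *L* cells, $\beta$ stays strictly below $\lambda$ even at very high pilot power, whereas in a single cell it approaches $\lambda$.

```python
def mmse_params(lam_m, own, tau, p_u):
    """beta and eta of eqs. (4)-(5) for one RAU m of cell i and user index k.

    lam_m[j] = lambda_{i,m,j,k}: path loss from user k of cell j to RAU m of cell i;
    own = l, the cell whose user is estimated; tau = pilot length; p_u = transmit power.
    """
    denom = sum(lam_m) + 1.0 / (tau * p_u)  # contamination from all L cells + noise
    beta = lam_m[own] ** 2 / denom
    eta = lam_m[own] - beta                 # error variance, eta = lambda - beta
    return beta, eta

lam_m = [1.0, 0.05, 0.02]  # hypothetical path losses, L = 3 cells, own cell first
print(mmse_params(lam_m, 0, tau=10, p_u=1.0))
print(mmse_params(lam_m, 0, tau=10, p_u=1e6))  # beta stays below lambda: pilot contamination
print(mmse_params([1.0], 0, tau=10, p_u=1e6))  # single cell: beta -> lambda as p_u grows
```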

After the received analog signals pass through the low-resolution ADCs, the quantized digital signal vector can be obtained as

$$\mathbf{y}\_{l,{\rm q}} = \mathcal{Q}(\mathbf{y}\_{l}) = \mathcal{Q}\left(\sqrt{p\_{\rm u}} \sum\_{i=1}^{L} \mathbf{G}\_{l,i} \mathbf{x}\_{i} + \mathbf{n}\_{l}\right), \tag{6}$$

where $\mathcal{Q}(\cdot)$ denotes the quantization function. Assuming that the automatic gain control is appropriately set, the additive quantization noise model (AQNM) can be used to rewrite the quantized signal vector as

$$\mathbf{y}\_{l,{\rm q}} = \alpha\mathbf{y}\_l + \mathbf{n}\_{l,{\rm q}} = \alpha\sqrt{p\_{\rm u}}\sum\_{i=1}^{L} \mathbf{G}\_{l,i}\mathbf{x}\_i + \alpha\mathbf{n}\_l + \mathbf{n}\_{l,{\rm q}}, \tag{7}$$

where *α* = 1 − *ρ*, *ρ* is the inverse of the signal-to-quantization-noise ratio, and $\mathbf{n}\_{l,\mathrm{q}}$ denotes the additive quantization noise vector, which is uncorrelated with $\mathbf{y}\_l$ and Gaussian distributed. The parameter *ρ* is a constant that depends on the number of quantization bits *b*. According to [16], the covariance matrix of the quantization noise $\mathbf{n}\_{l,\mathrm{q}}$ for a fixed channel realization is given by
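The text only states that *ρ* depends on *b*. A common choice in AQNM-based massive MIMO studies, assumed here because the exact values used in this paper are not given in this section, is a table for an optimal non-uniform quantizer when *b* ≤ 5 and the high-resolution approximation $\rho \approx \frac{\pi\sqrt{3}}{2} 2^{-2b}$ otherwise. The sketch below maps *b* to (*α*, *ρ*); the table values and the function name are assumptions.

```python
import math

# Assumed rho values for b = 1..5 (optimal non-uniform quantizer, Gaussian input),
# as commonly tabulated in AQNM-based massive MIMO studies.
RHO_TABLE = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}


def aqnm_alpha(b):
    """Return (alpha, rho) with alpha = 1 - rho for a b-bit ADC."""
    if b < 1:
        raise ValueError("b must be a positive integer")
    if b in RHO_TABLE:
        rho = RHO_TABLE[b]
    else:
        # High-resolution approximation for b > 5.
        rho = math.pi * math.sqrt(3) / 2 * 2.0 ** (-2 * b)
    return 1.0 - rho, rho


for b in (1, 3, 6, 10):
    print(b, aqnm_alpha(b))
```

The rapid approach of *α* to 1 is what makes only a few bits sufficient in the later numerical results.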

$$\mathbf{R}\_{\mathbf{n}\_{l,\mathrm{q}}} = \alpha (1 - \alpha)\, \mathrm{diag} \left( p\_{\mathrm{u}} \sum\_{i=1}^{L} \mathbf{G}\_{l,i} \mathbf{G}\_{l,i}^{\rm H} + \mathbf{I} \right). \tag{8}$$
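For one channel realization, Equation (8) only needs the diagonal of $p\_{\mathrm{u}} \sum\_i \mathbf{G}\_{l,i}\mathbf{G}\_{l,i}^{\rm H} + \mathbf{I}$, i.e., the received power at each antenna. A minimal sketch, assuming each $\mathbf{G}\_{l,i}$ is stored as an $MN \times K$ complex array (the function name is hypothetical):

```python
import numpy as np


def quantization_noise_cov(G_list, p_u, alpha):
    """Eq. (8): alpha * (1 - alpha) * diag(p_u * sum_i G_i G_i^H + I).

    G_list : list of the L channel matrices G_{l,i}, each of shape (MN, K).
    The n-th diagonal entry of G G^H is sum_k |G[n, k]|^2, so the full
    MN x MN products are never formed.
    """
    per_antenna = 1.0 + p_u * sum(np.sum(np.abs(G) ** 2, axis=1) for G in G_list)
    return alpha * (1 - alpha) * np.diag(per_antenna)


rng = np.random.default_rng(0)
Gs = [rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2)) for _ in range(3)]
R = quantization_noise_cov(Gs, p_u=0.5, alpha=0.9)
print(np.diag(R))
```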

#### *2.2. The Energy Efficiency Model*

(1) Achievable uplink rates: In the uplink transmission phase, the quantized signal processed by the linear detector of user *k* in cell *l* is presented as

$$\begin{split} \sigma\_{l,k} &= \mathbf{a}\_{l,k}^{\mathrm{H}} \mathbf{y}\_{l,\mathrm{q}} \\ &= \alpha\sqrt{p\_{\mathrm{u}}} \sum\_{i=1}^{L} \sum\_{j=1}^{K} \mathbf{a}\_{l,k}^{\mathrm{H}} \hat{\mathbf{g}}\_{l,i,j} x\_{i,j} + \alpha\sqrt{p\_{\mathrm{u}}} \sum\_{i=1}^{L} \sum\_{j=1}^{K} \mathbf{a}\_{l,k}^{\mathrm{H}} \tilde{\mathbf{g}}\_{l,i,j} x\_{i,j} + \alpha\mathbf{a}\_{l,k}^{\mathrm{H}} \mathbf{n}\_{l} + \mathbf{a}\_{l,k}^{\mathrm{H}} \mathbf{n}\_{l,\mathrm{q}} \end{split} \tag{9}$$

where $\mathbf{a}\_{l,k}$ is the linear receive vector for user *k* in cell *l*, and $x\_{i,j} \sim \mathcal{CN}(0, 1)$ is the *j*-th element of $\mathbf{x}\_i$. In this paper, we focus on two linear receivers, namely MRC and ZF, for which $\mathbf{a}\_{l,k}$ is given by

$$\mathbf{a}\_{l,k} = \begin{cases} \hat{\mathbf{g}}\_{l,l,k}, & \text{for MRC} \\ \mathbf{f}\_{l,l,k}, & \text{for ZF} \end{cases} \tag{10}$$

where $\mathbf{f}\_{l,l,k}$ is the *k*-th column of $\hat{\mathbf{G}}\_{l,l}\left(\hat{\mathbf{G}}\_{l,l}^{\mathrm{H}}\hat{\mathbf{G}}\_{l,l}\right)^{-1}$, and $\hat{\mathbf{G}}\_{l,l} = [\hat{\mathbf{g}}\_{l,l,1}, \cdots, \hat{\mathbf{g}}\_{l,l,K}]$.
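A short NumPy sketch of Equation (10): MRC uses the channel estimates directly, while ZF uses the columns of $\hat{\mathbf{G}}\_{l,l}(\hat{\mathbf{G}}\_{l,l}^{\mathrm{H}}\hat{\mathbf{G}}\_{l,l})^{-1}$, computed here with a linear solve rather than an explicit inverse. The function name and the random test matrix are illustrative assumptions.

```python
import numpy as np


def linear_receivers(G_hat, kind="mrc"):
    """Return a matrix whose k-th column is the receive vector a_{l,k} of Eq. (10).

    G_hat : (MN, K) estimated channel matrix of the own cell.
    """
    if kind == "mrc":
        return G_hat.copy()
    if kind == "zf":
        gram = G_hat.conj().T @ G_hat   # K x K Gram matrix
        # F = G_hat @ inv(gram), obtained from (gram^T) F^T = G_hat^T.
        return np.linalg.solve(gram.T, G_hat.T).T
    raise ValueError("kind must be 'mrc' or 'zf'")


rng = np.random.default_rng(0)
G_hat = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
F = linear_receivers(G_hat, "zf")
# F^H G_hat is (numerically) the identity: ZF removes intra-cell interference.
print(np.round(F.conj().T @ G_hat, 10))
```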

Motivated by [22,23], by treating the interference as worst-case uncorrelated additive noise, a lower bound on the achievable uplink rate of the *k*-th user in the *l*-th cell is given by

$$R\_{l,k}(\mathbf{p}) = \mathbb{E}\left[\log\_2\left(1 + \frac{p\_{\rm u}\alpha^2|\mathbf{a}\_{l,k}^{\rm H}\hat{\mathbf{g}}\_{l,l,k}|^2}{\mathbb{E}\left[\mathbf{a}\_{l,k}^{\rm H}\left(p\_{\rm u}\alpha^2\sum\_{(i,j)\neq(l,k)}\hat{\mathbf{g}}\_{l,i,j}\hat{\mathbf{g}}\_{l,i,j}^{\rm H} + p\_{\rm u}\alpha^2\sum\_{(i,j)}\tilde{\mathbf{g}}\_{l,i,j}\tilde{\mathbf{g}}\_{l,i,j}^{\rm H} + \alpha^2\mathbf{I} + \mathbf{R}\_{\mathbf{n}\_{l,\mathrm{q}}}\right)\mathbf{a}\_{l,k} \,\middle|\, \hat{\mathbf{G}}\_{l,l}\right]}\right)\right] \tag{11}$$

$$\stackrel{(a)}{=} \mathbb{E}\left[\log\_2\left(1+\frac{p\_{\rm u} \alpha^2 |\mathbf{a}\_{l,k}^{\rm H} \hat{\mathbf{g}}\_{l,l,k}|^2}{\mathbb{E}\left[\mathcal{I}\_{l,k} + p\_{\rm u} \alpha^2 \sum\_{i \neq l} |\mathbf{a}\_{l,k}^{\rm H} \hat{\mathbf{g}}\_{l,i,k}|^2 + \alpha^2 \|\mathbf{a}\_{l,k}^{\rm H}\|^2\right]}\right)\right] \tag{12}$$

where **p** is the transmit power vector of the *K* users. Since the denominator of Equation (11) is a conditional expectation and the estimation error vector is independent of the estimated channel vector, $\mathcal{I}\_{l,k}$ is given by

$$\mathcal{I}\_{l,k} = p\_{\rm u}\alpha^2 \sum\_{i=1}^{L} \sum\_{j \neq k} \mathbf{a}\_{l,k}^{\rm H} \mathbb{E} \left[ \hat{\mathbf{g}}\_{l,i,j} \hat{\mathbf{g}}\_{l,i,j}^{\rm H} \right] \mathbf{a}\_{l,k} + p\_{\rm u}\alpha^2 \sum\_{(i,j)} \mathbf{a}\_{l,k}^{\rm H} \mathbb{E} \left[ \tilde{\mathbf{g}}\_{l,i,j} \tilde{\mathbf{g}}\_{l,i,j}^{\rm H} \right] \mathbf{a}\_{l,k} + \mathbf{a}\_{l,k}^{\rm H} \mathbb{E} \left[ \mathbf{R}\_{\mathbf{n}\_{l,\mathrm{q}}} \right] \mathbf{a}\_{l,k} \tag{13}$$

wherein

$$\begin{aligned} \mathbb{E}\left[\hat{\mathbf{g}}\_{l,i,j}\hat{\mathbf{g}}\_{l,i,j}^{\rm H}\right] &= \mathrm{diag}\left(\beta\_{l,1,i,j}\mathbf{I}\_{N}, \dots, \beta\_{l,M,i,j}\mathbf{I}\_{N}\right) \\ \mathbb{E}\left[\tilde{\mathbf{g}}\_{l,i,j}\tilde{\mathbf{g}}\_{l,i,j}^{\rm H}\right] &= \mathrm{diag}\left(\eta\_{l,1,i,j}\mathbf{I}\_{N}, \dots, \eta\_{l,M,i,j}\mathbf{I}\_{N}\right) \\ \mathbb{E}\left[\mathbf{R}\_{\mathbf{n}\_{l,\mathrm{q}}}\right] &= \alpha(1-\alpha)\left(\mathrm{diag}\left(p\_{\rm u}\hat{\mathbf{G}}\_{l,l}\hat{\mathbf{G}}\_{l,l}^{\rm H}\right) + \mathbf{D}\_{\rm R}\right) \\ \mathbf{D}\_{\rm R} &= \mathrm{diag}\left(\left(p\_{\rm u}\sum\_{j=1}^{K}\eta\_{l,1,l,j} + p\_{\rm u}\sum\_{i\neq l}\sum\_{j=1}^{K}\lambda\_{l,1,i,j} + 1\right)\mathbf{I}\_{N}, \dots, \left(p\_{\rm u}\sum\_{j=1}^{K}\eta\_{l,M,l,j} + p\_{\rm u}\sum\_{i\neq l}\sum\_{j=1}^{K}\lambda\_{l,M,i,j} + 1\right)\mathbf{I}\_{N}\right) \\ \eta\_{l,m,i,j} &= \lambda\_{l,m,i,j} - \beta\_{l,m,i,j}. \end{aligned} \tag{14}$$

(2) Power consumption model: According to [24–26], the total power consumption of cell *l* can be modeled as

$$P\_{l} = P\_{\rm TC} + P\_{\rm LP} + P\_{\rm T} + P\_{\rm BH}.\tag{15}$$

The first term *P*TC is the power consumption of transceiver chains, which can be given by

$$P\_{\rm TC} = M(NP\_{\rm BS} + \rho P\_{\rm SYN}) + (1 - \rho)P\_{\rm SYN} + KP\_{\rm UE} + MP\_{\rm ADC} \tag{16}$$

where *P*BS and *P*UE are the circuit power consumption per base station antenna and per user, respectively, *P*SYN is the power consumed by a local oscillator, and $P\_{\rm ADC} = a\_0 N 2^{b} + a\_1$ is the power consumed by the ADCs, where $a\_0$ and $a\_1$ are constant parameters. The indicator *ρ* = 1 for the distributed antenna system (DAS) and *ρ* = 0 for the co-located antenna system (CAS). This follows from the assumption that, in the DAS, the antennas at the same RAU share a common oscillator while different RAUs use different oscillators, whereas in the CAS all antennas share a single oscillator.
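The transceiver-chain power of Equation (16) can be transcribed directly, as in the sketch below. Table 1 is not reproduced in this section, so the parameter values in the usage example are placeholders, not the paper's settings; the function names are hypothetical. The oscillator indicator is renamed `osc` to avoid confusion with the quantization parameter *ρ*.

```python
def adc_power(N, b, a0, a1):
    """P_ADC = a0 * N * 2**b + a1, as defined below Eq. (16)."""
    return a0 * N * 2**b + a1


def transceiver_power(M, N, K, b, P_BS, P_SYN, P_UE, a0, a1, das=True):
    """P_TC of Eq. (16); das=True sets the oscillator indicator to 1 (DAS), else 0 (CAS)."""
    osc = 1 if das else 0  # called rho in the text (not the SQNR-related rho)
    return (M * (N * P_BS + osc * P_SYN) + (1 - osc) * P_SYN
            + K * P_UE + M * adc_power(N, b, a0, a1))


# Placeholder parameters for illustration only.
params = dict(M=7, N=4, K=6, b=3, P_BS=0.1, P_SYN=2.0, P_UE=0.1, a0=1e-4, a1=0.02)
extra = transceiver_power(**params, das=True) - transceiver_power(**params, das=False)
print(extra)  # the DAS pays for (M - 1) additional oscillators
```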

The second term *P*LP is the power consumption of the MRC/ZF receiver at the base station, which can be given by

$$P\_{\rm LP} = B \frac{T - \tau}{T} \frac{2MNK}{L\_{\rm BS}} + \frac{B}{T} \left( \frac{3MNK}{L\_{\rm BS}} (1 - d) + d \left( \frac{K^3}{3L\_{\rm BS}} + \frac{MNK(3K + 1)}{L\_{\rm BS}} \right) \right) \tag{17}$$

where *d* = 0 for MRC and *d* = 1 for ZF, *B* is the bandwidth, *T* is the length of the channel coherence interval in symbols, and *L*BS is the computational efficiency of the base station, i.e., the number of complex-valued arithmetic operations it performs per Joule.
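A direct transcription of Equation (17) is sketched below. Reading the first summand as the cost of applying the receiver to the $T - \tau$ data symbols and the second as the cost of computing it once per coherence interval is our interpretation of the two terms; the function name is hypothetical.

```python
def linear_processing_power(B, T, tau, M, N, K, L_BS, zf=False):
    """P_LP of Eq. (17); zf=True sets d = 1 (ZF), otherwise d = 0 (MRC)."""
    d = 1 if zf else 0
    # Applying the receive vectors to the (T - tau) data symbols per coherence interval.
    receive = B * (T - tau) / T * 2 * M * N * K / L_BS
    # Computing the receive vectors once per coherence interval.
    compute = B / T * (3 * M * N * K / L_BS * (1 - d)
                       + d * (K**3 / (3 * L_BS) + M * N * K * (3 * K + 1) / L_BS))
    return receive + compute


# Toy values (not from Table 1): ZF costs more than MRC for the same configuration.
print(linear_processing_power(1, 1, 0, 1, 1, 1, 1),
      linear_processing_power(1, 1, 0, 1, 1, 1, 1, zf=True))
```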

The third term *P*<sup>T</sup> is transmit power, which can be represented as

$$P\_{\rm T} = \frac{T - \tau}{T} \frac{K}{\xi} p\_{\rm u} \tag{18}$$

where *ξ* is the power amplifier efficiency.

For the last term, *P*BH is the backhaul power consumption in the DAS, which can be neglected in the CAS. Specifically, *P*BH in the DAS is given by

$$P\_{\rm BH} = M \left( P\_0 + B P\_{\rm BT} \sum\_{k=1}^{K} R\_{l,k}(\mathbf{p}) \right) \tag{19}$$

where *P*<sup>0</sup> and *P*BT are the fixed and traffic-dependent power consumption at each backhaul, respectively.

(3) Global energy efficiency model: Based on the above analysis, the total power consumption for all cells can be given by

$$P\_{\text{Total}}(\mathbf{p}) = L P\_{\text{IND}} + \frac{T - \tau}{T\xi} L K p\_{\text{u}} + P\_{\text{BT}} M B \sum\_{l=1}^{L} \sum\_{k=1}^{K} R\_{l,k}(\mathbf{p}) \tag{20}$$

where *P*IND is the power consumption independent of **p** and can be given by

$$P\_{\rm IND} = P\_{\rm TC} + P\_{\rm LP} + MP\_0. \tag{21}$$

According to [24], the global energy efficiency is defined as the ratio of the achievable sum rate to the total power consumption in Watts. Mathematically, it can be defined as

$$\varphi(\mathbf{p}) = \frac{B \sum\_{l=1}^{L} \sum\_{k=1}^{K} R\_{l,k}(\mathbf{p})}{P\_{\text{Total}}(\mathbf{p})}.\tag{22}$$
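Equations (20) to (22) combine into a few lines of code. The sketch below assumes the per-user rates $R\_{l,k}(\mathbf{p})$ (in bit/s/Hz) are already available as an $L \times K$ array and that $P\_{\rm IND}$ has been computed from Equation (21); the function name and the toy numbers are illustrative. For the CAS, setting `P_BT = 0` drops the backhaul term.

```python
import numpy as np


def global_energy_efficiency(rates, B, P_IND, T, tau, xi, p_u, M, P_BT):
    """Eqs. (20)-(22): return (phi, P_total).

    rates : (L, K) array of R_{l,k}(p) in bit/s/Hz.
    """
    rates = np.asarray(rates, dtype=float)
    L, K = rates.shape
    sum_rate = B * rates.sum()                          # numerator of Eq. (22), bit/s
    P_total = (L * P_IND                                # power independent of p, Eq. (21)
               + (T - tau) / (T * xi) * L * K * p_u     # Eq. (18) summed over the L cells
               + P_BT * M * sum_rate)                   # traffic-dependent backhaul, Eq. (19)
    return sum_rate / P_total, P_total


phi, P_total = global_energy_efficiency(np.ones((2, 3)), B=1.0, P_IND=1.0,
                                        T=2, tau=1, xi=0.5, p_u=1.0, M=1, P_BT=0.0)
print(phi, P_total)
```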

#### **3. Energy Efficiency Analysis**

From Equations (22) and (A4), it is difficult to analyze the energy efficiency directly, because Equation (A4) involves expectations over the channel realizations. Therefore, we first derive closed-form expressions for the uplink achievable rates, which are given in the following theorems.

**Theorem 1.** *Using MRC receiver with low-resolution ADCs and pilot contamination, the closed-form expression for the uplink achievable rate of the k-th user in the l-th cell is given by*

$$R\_{l,k}^{\rm mrc} = \log\_2\left(1 + \frac{p\_u \alpha\left[\left(N\sum\_{m=1}^M \beta\_{l,m,l,k}\right)^2 + N\sum\_{m=1}^M \beta\_{l,m,l,k}^2\right]}{p\_u \alpha\Omega\_{l,k} + p\_u \alpha\Xi\_{l,k} + (1-\alpha)\Phi\_{l,k} + \alpha\sum\_{m=1}^M \beta\_{l,m,l,k}^2}\right) \tag{23}$$

*where*

$$\begin{split} \Omega\_{l,k} &= N \sum\_{m=1}^{M} \sum\_{i=1}^{L} \sum\_{j=1}^{K} \beta\_{l,m,l,k} \eta\_{l,m,i,j} \\ \Xi\_{l,k} &= N \sum\_{m=1}^{M} \beta\_{l,m,l,k} \left( \sum\_{i=1}^{L} \sum\_{j \neq k} \beta\_{l,m,i,j} + \sum\_{i \neq l} \beta\_{l,m,i,k} \right) + \sum\_{i \neq l} \left( N \sum\_{m=1}^{M} \beta\_{l,m,l,k}^{1/2} \beta\_{l,m,i,k}^{1/2} \right)^{2} \\ \Phi\_{l,k} &= p\_{u} \left( \left( N \sum\_{m=1}^{M} \beta\_{l,m,l,k} \right)^{2} + N \sum\_{m=1}^{M} \beta\_{l,m,l,k}^{2} + N \sum\_{j \neq k} \sum\_{m=1}^{M} \beta\_{l,m,l,k} \beta\_{l,m,l,j} \right) + \Upsilon\_{l,k} \\ \Upsilon\_{l,k} &= N \sum\_{m=1}^{M} \beta\_{l,m,l,k} \left( 1 + p\_{u} \sum\_{j=1}^{K} \eta\_{l,m,l,j} + p\_{u} \sum\_{i \neq l} \sum\_{j=1}^{K} \lambda\_{l,m,i,j} \right). \end{split} \tag{24}$$

**Proof.** The proof is given in Appendix B.

**Theorem 2.** *Using ZF receiver with low-resolution ADCs and pilot contamination, the closed-form expression for the uplink achievable rate of user k in the l-th cell is given by*

$$R\_{l,k}^{\rm zf} = \log\_2\left(1 + \frac{p\_u \alpha \zeta \sum\_{m=1}^M \beta\_{l,m,l,k}}{\alpha p\_u \sum\_{m=1}^M \left(\sum\_{i \neq l}\sum\_{j \neq k} \beta\_{l,m,i,j} + \sum\_{i=1}^L \sum\_{j=1}^K \eta\_{l,m,i,j} + \zeta \sum\_{i \neq l} \beta\_{l,m,i,k}\right) + (1-\alpha)\left(\zeta p\_u \sum\_{m=1}^M \beta\_{l,m,l,k} + \Delta\_{l,k}\right) + \alpha M}\right) \tag{25}$$

*where ζ* = *MN* − *K* + 1*, and*

$$
\Delta\_{l,k} = \sum\_{m=1}^{M} \left( 1 + p\_{\text{u}} \sum\_{j=1}^{K} \eta\_{l,m,l,j} + p\_{\text{u}} \sum\_{i \neq l} \sum\_{j=1}^{K} \lambda\_{l,m,i,j} \right). \tag{26}
$$

**Proof.** The proof is given in Appendix C.
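To make Theorem 2 concrete, the sketch below transcribes Equations (25) and (26) into NumPy for a single user. It evaluates the closed-form expression as written rather than re-deriving it. The array layout `[m, i, j]` for a fixed serving cell *l* and the function name are assumptions for illustration.

```python
import numpy as np


def zf_rate(beta, eta, lam, l, k, N, p_u, alpha):
    """Closed-form ZF rate R^zf_{l,k} of Eqs. (25)-(26).

    beta, eta, lam : arrays of shape (M, L, K) holding beta_{l,m,i,j},
    eta_{l,m,i,j} and lambda_{l,m,i,j} for the fixed serving cell l
    (assumed layout: [RAU m, user cell i, user j]).
    """
    M, L, K = beta.shape
    zeta = M * N - K + 1
    others = [i for i in range(L) if i != l]    # cells i != l
    not_k = [j for j in range(K) if j != k]     # users j != k
    own = beta[:, l, k].sum()                    # sum_m beta_{l,m,l,k}

    interference = (beta[:, others][:, :, not_k].sum()   # i != l, j != k
                    + eta.sum()                          # all estimation errors
                    + zeta * beta[:, others, k].sum())   # pilot contamination
    delta = np.sum(1.0 + p_u * eta[:, l, :].sum(axis=1)
                   + p_u * lam[:, others, :].sum(axis=(1, 2)))  # Eq. (26)

    numerator = p_u * alpha * zeta * own
    denominator = (alpha * p_u * interference
                   + (1 - alpha) * (zeta * p_u * own + delta)
                   + alpha * M)
    return np.log2(1.0 + numerator / denominator)


# Sanity check: one cell, no estimation error, alpha = 1, so the rate reduces to
# log2(1 + p_u * zeta * sum_m beta / M) = log2(1 + 7 * 2 / 2) = 3.
b, e, lm = np.ones((2, 1, 2)), np.zeros((2, 1, 2)), np.ones((2, 1, 2))
r_ideal = zf_rate(b, e, lm, l=0, k=0, N=4, p_u=1.0, alpha=1.0)
r_quant = zf_rate(b, e, lm, l=0, k=0, N=4, p_u=1.0, alpha=0.9)
print(r_ideal, r_quant)
```

Lowering *α* reduces the rate in this toy configuration, as expected from the extra $(1-\alpha)$ term in the denominator.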

From Equations (23) and (25), it can be concluded that the quantization noise affects both the numerator and the denominator of the fraction in Equation (A4), unlike additive noise, which affects only the denominator.

*Electronics* **2018**, *7*, 391

Based on the theorems above, we analyze the asymptotic performance with respect to the number of quantization bits, the per-user transmit power, and the number of antennas per RAU. The results are given below.

**Case 1:** With a fixed transmit power per user *p*u and a fixed total number of antennas per cell *MN*, when the number of quantization bits *b* → ∞, the inverse of the signal-to-quantization-noise ratio *ρ* tends toward zero, so *α* in Equations (23) and (25) tends toward 1. Replacing *α* with 1 in Equations (23) and (25) then gives

$$\bar{R}\_{l,k}^{\text{mrc}} = \log\_2 \left( 1 + \frac{p\_u \left[ \left( N \sum\_{m=1}^M \beta\_{l,m,l,k} \right)^2 + N \sum\_{m=1}^M \beta\_{l,m,l,k}^2 \right]}{p\_u \Omega\_{l,k} + p\_u \Xi\_{l,k} + \sum\_{m=1}^M \beta\_{l,m,l,k}^2} \right) \tag{27}$$

$$\bar{R}\_{l,k}^{\text{zf}} = \log\_2 \left( 1 + \frac{p\_u \zeta \sum\_{m=1}^{M} \beta\_{l,m,l,k}}{p\_u \sum\_{m=1}^{M} \left( \sum\_{i \neq l} \sum\_{j \neq k} \beta\_{l,m,i,j} + \sum\_{i=1}^{L} \sum\_{j=1}^{K} \eta\_{l,m,i,j} + \zeta \sum\_{i \neq l} \beta\_{l,m,i,k} \right) + M} \right) . \tag{28}$$

Case 1 gives the achievable uplink rates without the quantization noise caused by the ADCs. In this case, the spectral efficiency is limited only by pilot contamination and channel estimation error. Moreover, since the ADC power consumption $P\_{\rm ADC} = a\_0 N 2^{b} + a\_1$ grows exponentially with *b*, *P*ADC tends toward infinity as *b* → ∞, and, by Equation (20), so does the total power consumption. Because the achievable rates remain bounded while the power consumption grows without bound, the global energy efficiency tends toward zero, that is, *φ*(**p**) → 0 as *b* → ∞ for fixed *p*u and *MN*.
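A one-line bound makes this argument explicit. It is a sketch that uses only Equations (16) and (20) to (22) and assumes all power parameters are nonnegative, so that $P\_{\text{Total}}(\mathbf{p}) \ge L P\_{\rm IND} \ge L P\_{\rm TC} \ge L M P\_{\rm ADC}$:

$$\varphi(\mathbf{p}) \le \frac{B \sum\_{l=1}^{L} \sum\_{k=1}^{K} R\_{l,k}(\mathbf{p})}{L M \left(a\_0 N 2^{b} + a\_1\right)} \xrightarrow{\,b \to \infty\,} 0,$$

since the rates in the numerator converge to the finite limits of Equations (27) and (28) while the denominator grows as $2^{b}$.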

**Case 2:** With a fixed number of quantization bits *b* and a fixed number of antennas per cell *MN*, when *p*u → ∞, the limiting rates of user *k* in cell *l* with both receivers are obtained directly by dividing the numerators and denominators of Equations (23) and (25) by *p*u:

$$\tilde{R}\_{l,k}^{\text{mrc}} = \log\_2 \left( 1 + \frac{\alpha \left[ \left( N \sum\_{m=1}^{M} \beta\_{l,m,l,k} \right)^2 + N \sum\_{m=1}^{M} \beta\_{l,m,l,k}^2 \right]}{\alpha \Omega\_{l,k} + \alpha \Xi\_{l,k} + (1-\alpha) \Phi\_{l,k}'} \right) \tag{29}$$

$$\tilde{R}\_{l,k}^{\text{zf}} = \log\_2 \left( 1 + \frac{\alpha \zeta \sum\_{m=1}^{M} \beta\_{l,m,l,k}}{\alpha \sum\_{m=1}^{M} \left( \sum\_{i \neq l} \sum\_{j \neq k} \beta\_{l,m,i,j} + \sum\_{i=1}^{L} \sum\_{j=1}^{K} \eta\_{l,m,i,j} + \zeta \sum\_{i \neq l} \beta\_{l,m,i,k} \right) + (1-\alpha) \left( \zeta \sum\_{m=1}^{M} \beta\_{l,m,l,k} + \Delta\_{l,k}' \right)} \right) \tag{30}$$

where

$$\begin{split} \Phi\_{l,k}' &= \left( N \sum\_{m=1}^{M} \beta\_{l,m,l,k} \right)^{2} + N \sum\_{m=1}^{M} \beta\_{l,m,l,k}^{2} + N \sum\_{j \neq k} \sum\_{m=1}^{M} \beta\_{l,m,l,k} \beta\_{l,m,l,j} + \Upsilon\_{l,k}', \\ \Upsilon\_{l,k}' &= N \sum\_{m=1}^{M} \beta\_{l,m,l,k} \left( \sum\_{j=1}^{K} \eta\_{l,m,l,j} + \sum\_{i \neq l} \sum\_{j=1}^{K} \lambda\_{l,m,i,j} \right), \\ \Delta\_{l,k}' &= \sum\_{m=1}^{M} \left( \sum\_{j=1}^{K} \eta\_{l,m,l,j} + \sum\_{i \neq l} \sum\_{j=1}^{K} \lambda\_{l,m,i,j} \right). \end{split}$$

Case 2 indicates that, as *p*u grows without bound, the achievable uplink rates approach finite limits that depend on the ADC resolution. Hence, the performance degradation caused by low-resolution ADCs cannot be compensated by increasing the transmit power. Furthermore, as the transmit power increases, the system energy efficiency tends toward zero: the total power consumption in Equation (20) grows without bound as *p*u → ∞, whereas the achievable uplink rates saturate.
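As a worked check of this statement, keeping only the transmit-power term of Equation (20) (a sketch assuming all other power terms are nonnegative) gives

$$\varphi(\mathbf{p}) \le \frac{T \xi B \sum\_{l=1}^{L} \sum\_{k=1}^{K} R\_{l,k}(\mathbf{p})}{(T-\tau) L K p\_{\rm u}} \xrightarrow{\,p\_{\rm u} \to \infty\,} 0,$$

because the rates approach the finite limits of Equations (29) and (30) while the denominator grows linearly in $p\_{\rm u}$.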

**Case 3:** With a fixed number of quantization bits *b*, number of RAUs per cell *M*, and transmit power per user *p*u, when *N* → ∞, the limiting rates of user *k* in cell *l* with both receivers are obtained by dividing the numerators and denominators of Equations (23) and (25) by *N*² and *N*, respectively:

$$\hat{R}\_{l,k}^{\text{mrc}} = \log\_2\left(1 + \frac{\alpha\left(\sum\_{m=1}^{M} \beta\_{l,m,l,k}\right)^2}{\alpha\sum\_{i\neq l} \left(\sum\_{m=1}^{M} \beta\_{l,m,l,k}^{1/2} \beta\_{l,m,i,k}^{1/2}\right)^2 + (1-\alpha)\left(\sum\_{m=1}^{M} \beta\_{l,m,l,k}\right)^2}\right) \tag{31}$$

$$\hat{R}\_{l,k}^{\rm zf} = \log\_2\left(1 + \frac{\alpha\sum\_{m=1}^{M} \beta\_{l,m,l,k}}{\alpha\sum\_{i\neq l} \sum\_{m=1}^{M} \beta\_{l,m,i,k} + (1-\alpha)\sum\_{m=1}^{M} \beta\_{l,m,l,k}}\right). \tag{32}$$

Case 3 shows that, when the number of antennas per RAU grows without bound, the impacts of quantization noise vanish. However, the achievable rates with both receivers converge to finite values as *N* goes to infinity because of pilot contamination. Furthermore, since the power consumption of the transceiver chains and of the linear processing at the base station grows linearly with *N*, it tends toward infinity as *N* → ∞, and, by Equation (20), so does the total power consumption. Because the achievable rates remain bounded while the power consumption grows without bound, the global energy efficiency tends toward zero, that is, *φ*(**p**) → 0 as *N* → ∞ for fixed *b*, *M*, and *p*u.
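The corresponding bound for Case 3 follows from Equation (16), which gives $P\_{\rm TC} \ge M N P\_{\rm BS} + M a\_0 N 2^{b}$ (again a sketch assuming nonnegative power parameters):

$$\varphi(\mathbf{p}) \le \frac{B \sum\_{l=1}^{L} \sum\_{k=1}^{K} R\_{l,k}(\mathbf{p})}{L M N \left(P\_{\rm BS} + a\_0 2^{b}\right)} \xrightarrow{\,N \to \infty\,} 0,$$

where the numerator stays bounded by the limits in Equations (31) and (32).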

#### **4. Numerical Results**

In this section, we verify the accuracy of the theoretical results in Section 3 by a series of Monte Carlo simulations. A multi-cell distributed massive MIMO system is considered, which consists of $L = 7$ cells, $M = 7$ RAUs per cell, and $K = 6$ users per cell, with the cell radius $D$ normalized to 1. In each cell, all users are uniformly distributed, while the RAUs have fixed locations with radii $r\_1 = 0$, $r\_2 = \cdots = r\_7 = (3 - \sqrt{3})/2$ and angles $\theta\_1 = 0$, $\theta\_2 = \pi/6$, $\theta\_3 = \pi/2$, $\theta\_4 = 5\pi/6$, $\theta\_5 = 7\pi/6$, $\theta\_6 = 3\pi/2$, and $\theta\_7 = 11\pi/6$. The path loss $\lambda\_{l,m,i,k}$ between the $k$-th user in the $i$-th cell and the $m$-th RAU in the $l$-th cell is modeled as $\lambda\_{l,m,i,k} = d\_{l,m,i,k}^{-\iota}$, where $d\_{l,m,i,k}$ is the corresponding distance and $\iota = 3.7$ is the path loss exponent. Moreover, the length of the pilot sequences is $\tau = K$, the channel coherence time is $T = 196$ symbols, and the power consumption parameters are given in Table 1.
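The RAU layout above can be reproduced with a few lines of NumPy. This sketch places the seven RAUs of one cell and evaluates $\lambda = d^{-\iota}$. Representing positions as complex numbers, the function names, and the absence of a minimum user-to-RAU distance are assumptions; the arrangement of the seven cells and the user drop are not specified in this section and are left out.

```python
import numpy as np

# RAU radii and angles from Section 4 (cell radius D normalized to 1).
RAU_RADII = np.array([0.0] + [(3 - np.sqrt(3)) / 2] * 6)
RAU_ANGLES = np.array([0, np.pi / 6, np.pi / 2, 5 * np.pi / 6,
                       7 * np.pi / 6, 3 * np.pi / 2, 11 * np.pi / 6])


def rau_positions(cell_center=0j, D=1.0):
    """RAU coordinates of one cell as complex numbers x + 1j*y."""
    return cell_center + D * RAU_RADII * np.exp(1j * RAU_ANGLES)


def path_gain(user_pos, rau_pos, iota=3.7):
    """lambda = d**(-iota) for every user/RAU pair (users along rows).

    No minimum distance is enforced, so a user exactly at an RAU gives inf.
    """
    d = np.abs(np.asarray(user_pos)[:, None] - np.asarray(rau_pos)[None, :])
    return d ** (-iota)


pos = rau_positions()
lam = path_gain(np.array([0.5 + 0.2j]), pos)
print(np.round(pos, 3))
print(lam.shape)  # (1 user, 7 RAUs)
```

With these radii, neighboring outer RAUs are separated by an angle of π/3, so they sit at the same distance from each other as from the central RAU.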

**Table 1.** Power consumption parameters.


We first verify the accuracy of the theoretical results given in Theorems 1 and 2. Figure 2 illustrates the uplink spectral efficiency per cell versus the number of quantization bits for different numbers of antennas per RAU. The closed-form expressions and the simulation results match well for both the MRC and ZF receivers. As the number of antennas per RAU increases, the uplink spectral efficiency grows noticeably for both receivers. Furthermore, for both receivers, the uplink spectral efficiency increases rapidly with the number of quantization bits *b* when *b* is small, whereas increasing *b* further brings little improvement when *b* is large. It can be concluded that low-resolution ADCs are acceptable in massive MIMO systems, and employing a large number of antennas at each RAU can compensate for the performance degradation. In the following, the closed-form expressions are used for the numerical results.

**Figure 2.** Spectral efficiency versus the number of quantization bits with different numbers of antennas per RAU.

Next, Figure 3 illustrates the energy efficiency versus the spectral efficiency for different numbers of quantization bits and antennas per RAU. As the number of antennas and the number of quantization bits increase, the energy efficiency first increases and then decreases. This is because both the power consumption and the spectral efficiency increase with the number of antennas and quantization bits, but the improvement in spectral efficiency dominates at first, after which the power consumption dominates. The results show that the spectral efficiency and the energy efficiency cannot both be improved without bound, so a tradeoff between them is needed, as investigated in [27–29]. Moreover, Figure 3b shows that *b* = 3 or *b* = 4 is preferable under the system configuration described above; increasing *b* further improves the spectral efficiency only slightly, while the energy efficiency decreases rapidly. Note that the optimal number of bits depends on the system configuration and parameters. Figure 3 also indicates that low-resolution ADCs (*b* = 3 or 4 bits in our simulation results) are preferable in distributed massive MIMO systems.

Finally, Figure 4 presents the energy efficiency versus the spectral efficiency for different numbers of quantization bits and transmit powers per user. Figure 4 leads to the same conclusion about the relationship between energy efficiency and spectral efficiency for different *b*. As the transmit power per user increases, the energy efficiency first increases and then decreases, because the power consumption grows linearly with the transmit power, whereas the spectral efficiency first increases and then saturates.

**Figure 3.** Energy efficiency versus spectral efficiency with different numbers of quantization bits *b* and different numbers of antennas *N* per RAU. (**a**) Each line corresponds to different numbers of quantization bits with *b* = [1, 2, 3, 6, 8, 9, 11] and the points on each line correspond to different numbers of antennas per RAU with *N*= [1:1:6, 8, 10, 12, 15, 20, 30, 40, 50, 80, 100]. (**b**) Each line corresponds to different numbers of antennas per RAU with *N*= [4, 6, 8, 12, 16, 22, 30] and the points on each line correspond to different numbers of quantization bits with *b* = [1:1:12].

**Figure 4.** Energy efficiency versus spectral efficiency with the number of quantization bits *b* = [1:1:6] bits and transmitted power per user *pu* = [0.01, 0.02, 0.05, 0.08, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8, 1] W, where each line corresponds to different *b* and the points on each line correspond to different *p*u.

#### **5. Conclusions**

In this paper, we analyzed the uplink spectral and energy efficiency of distributed massive MIMO systems with low-resolution ADCs. Furthermore, we considered a more realistic scenario in which the base station has no prior CSI and obtains estimated CSI during the pilot phase, so pilot contamination arises and degrades the system performance. We first presented an additive quantization noise model and obtained the estimated CSI under pilot contamination. Under imperfect CSI, we derived closed-form expressions for the achievable uplink rates with MRC and ZF receivers. We then obtained the asymptotic performance with respect to the number of quantization bits, the per-user transmit power, and the number of antennas per RAU. The theorems were verified by simulation. Increasing the number of antennas can compensate for the spectral efficiency degradation caused by quantization noise. Furthermore, the energy efficiency with low-resolution ADCs is better than that with perfect ADCs. The numerical results imply that it is preferable to use low-resolution ADCs in distributed massive MIMO systems.

We intend to extend our research considering the tradeoff between spectral efficiency and energy efficiency, which involves multi-objective optimization. Furthermore, in order to make the system more energy-efficient, we plan to extend our research considering RAU selection.

**Author Contributions:** Formal analysis, J.L.; Methodology, J.L.; Supervision, P.Z. and X.Y.; Validation, J.L. and J.Y.; Writing original draft, J.L. and Q.L.; Writing review & editing, J.L. and Q.L.

**Funding:** This work was supported in part by National Natural Science Foundation of China (NSFC) (Grant No. 61501113, 61571120) and the Jiangsu Provincial Natural Science Foundation (Grant No. BK20150630, BK20180011).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Lemmas for Proof**

In order to derive the closed-form expressions with both receivers, we provide the following preliminary lemmas first.

**Lemma A1** ([22])**.** *Suppose* {*X*i} *are independent Gamma-distributed random variables, i.e.,* *X*i ∼ Γ(*k*i, *θ*i)*. Then the first two moments of the sum* $\sum\_i X\_i$ *are given by*

$$\mathbb{E}\left[\sum\_{i} X\_{i}\right] = \sum\_{i} k\_{i} \theta\_{i}, \tag{A1}$$

$$\mathbb{E}\left[\left(\sum\_{i} X\_{i}\right)^{2}\right] = \sum\_{i} k\_{i} \theta\_{i}^{2} + \left(\sum\_{i} k\_{i} \theta\_{i}\right)^{2}.\tag{A2}$$
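Lemma A1 is easy to check by simulation. The sketch below draws independent Gamma variables with arbitrary illustrative shape and scale values and compares the sample moments of their sum with Equations (A1) and (A2).

```python
import numpy as np

rng = np.random.default_rng(0)
k = np.array([2.0, 3.0, 0.5])        # shape parameters k_i (illustrative)
theta = np.array([1.0, 0.2, 4.0])    # scale parameters theta_i (illustrative)

# 200,000 draws of (X_1, X_2, X_3) with X_i ~ Gamma(k_i, theta_i), summed per draw.
S = rng.gamma(shape=k, scale=theta, size=(200_000, 3)).sum(axis=1)

m1_closed = np.sum(k * theta)                      # Eq. (A1)
m2_closed = np.sum(k * theta**2) + m1_closed**2    # Eq. (A2)
print(S.mean(), m1_closed)
print(np.mean(S**2), m2_closed)
```

Equation (A2) is simply "variance plus squared mean", where the variance of a sum of independent variables is the sum of the individual variances $k\_i\theta\_i^2$.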

**Lemma A2** ([21])**.** *For a p-dimensional non-isotropic channel vector* **x** *whose strength is distributed as* $\mathbf{x}^{\mathrm{H}}\mathbf{x} \sim \Gamma(k, \theta)$*, when it is projected onto an s-dimensional subspace, the distribution of the projection power can be approximated as* Γ(*sk*/*p*, *θ*)*.*

**Lemma A3** ([30])**.** *Let* **x** *be an N* × 1 *unit-norm isotropic random vector and* **A** *a constant matrix. Then*

$$\mathbb{E}\_{\mathbf{x}}\left[\mathbf{x}^{\mathrm{H}}\mathbf{A}\mathbf{x}\right] = \frac{\mathrm{tr}(\mathbf{A})}{N}.\tag{A3}$$
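Lemma A3 can be checked numerically by drawing **x** uniformly on the complex unit sphere. Normalizing a complex Gaussian vector is one standard way to do this, and it is the assumption used in the sketch below; the matrix **A** is an arbitrary example.

```python
import numpy as np


def mc_quadratic_form(A, n_trials=100_000, seed=0):
    """Monte Carlo estimate of E[x^H A x] for a unit-norm isotropic x in C^N."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    z = rng.standard_normal((n_trials, N)) + 1j * rng.standard_normal((n_trials, N))
    x = z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform on the unit sphere
    q = np.einsum("ti,ij,tj->t", x.conj(), A, x)       # x^H A x for each trial
    return q.mean()


A = np.diag([1.0, 2.0, 3.0, 4.0]) + 0.5 * np.ones((4, 4))
print(mc_quadratic_form(A), np.trace(A) / A.shape[0])
```

The unit-norm assumption matters: in Appendix C the lemma is applied to the normalized receive vector $\mathbf{a}\_{l,k}/\|\mathbf{a}\_{l,k}^{\rm H}\|$.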

#### **Appendix B. Proof of Theorem 1**

From Lemma 4 of [31], we can obtain the approximation of Equation (11) as follows:

$$R\_{l,k}(\mathbf{p}) \approx \log\_2 \left( 1 + \frac{p\_{\rm u} \alpha^2 \mathbb{E} \left[ |\mathbf{a}\_{l,k}^{\rm H} \hat{\mathbf{g}}\_{l,l,k}|^2 \right]}{\mathbb{E} \left[ \mathcal{I}\_{l,k} + p\_{\rm u} \alpha^2 \sum\_{i \neq l} |\mathbf{a}\_{l,k}^{\rm H} \hat{\mathbf{g}}\_{l,i,k}|^2 + \alpha^2 \|\mathbf{a}\_{l,k}^{\rm H}\|^2 \right]} \right) . \tag{A4}$$

For the MRC receiver, it can be seen from Equation (A4) that the following terms need to be evaluated:

$$\mathbb{E}\left[\|\hat{\mathbf{g}}\_{l,l,k}\|^{4}\right] = \hat{k}\_{l,l,k}\hat{\theta}\_{l,l,k}^2 + (\hat{k}\_{l,l,k}\hat{\theta}\_{l,l,k})^2 \tag{A5}$$

where

$$\hat{k}\_{l,l,k} = \frac{N(\sum\_{m=1}^{M} \beta\_{l,m,l,k})^2}{\sum\_{m=1}^{M} \beta\_{l,m,l,k}^2} \quad \hat{\theta}\_{l,l,k} = \frac{\sum\_{m=1}^{M} \beta\_{l,m,l,k}^2}{\sum\_{m=1}^{M} \beta\_{l,m,l,k}}.\tag{A6}$$

This can be obtained by exploiting the fact that $\hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}}\hat{\mathbf{g}}\_{l,l,k} \sim \Gamma(\hat{k}\_{l,l,k}, \hat{\theta}\_{l,l,k})$ and Lemma A1.
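Because $\hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}}\hat{\mathbf{g}}\_{l,l,k} = \sum\_m \beta\_{l,m,l,k}\|\hat{\mathbf{h}}\_{l,k,m}\|^2$ is a sum of independent Γ(*N*, *β*l,m,l,k) variables, the parameters in Equation (A6) match its first two moments, so Equation (A5) can be checked by simulation. The sketch below draws $\hat{\mathbf{h}}\_{l,k,m} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}\_N)$ for each RAU; the *β* values are illustrative, not taken from the paper.

```python
import numpy as np


def mc_norm_moments(beta, N, n_trials=100_000, seed=0):
    """Sample ||g_hat||^2 with g_hat = [sqrt(beta_m) h_m], h_m ~ CN(0, I_N)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    M = beta.size
    h = (rng.standard_normal((n_trials, M, N))
         + 1j * rng.standard_normal((n_trials, M, N))) / np.sqrt(2)
    norm2 = (beta[None, :, None] * np.abs(h) ** 2).sum(axis=(1, 2))
    return norm2.mean(), np.mean(norm2**2)


beta, N = np.array([1.0, 0.3, 0.05]), 4
k_hat = N * beta.sum() ** 2 / np.sum(beta**2)     # Eq. (A6)
theta_hat = np.sum(beta**2) / beta.sum()          # Eq. (A6)
closed_m1 = k_hat * theta_hat                     # equals N * sum(beta)
closed_m2 = k_hat * theta_hat**2 + closed_m1**2   # Eq. (A5)
mc_m1, mc_m2 = mc_norm_moments(beta, N)
print((mc_m1, mc_m2), (closed_m1, closed_m2))
```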

Due to the independence between $\hat{\mathbf{g}}\_{l,l,k}$ and $\hat{\mathbf{g}}\_{l,i,j}$ when $j \neq k$, we have

$$\mathbb{E}\left[\hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}}\mathbb{E}\left[\hat{\mathbf{g}}\_{l,i,j}\hat{\mathbf{g}}\_{l,i,j}^{\mathrm{H}}\right]\hat{\mathbf{g}}\_{l,l,k}\right] \stackrel{(a)}{=} N\sum\_{m=1}^{M} \beta\_{l,m,l,k}\beta\_{l,m,i,j} \tag{A7}$$

where (*a*) results from the fact that the channel strength is Gamma-distributed.

Because of pilot contamination, $\hat{\mathbf{g}}\_{l,l,k}$ and $\hat{\mathbf{g}}\_{l,i,k}$ are dependent, and we have

$$\mathbb{E}\left[|\hat{\mathbf{g}}\_{l,l,k}^{\rm H}\hat{\mathbf{g}}\_{l,i,k}|^2\right] \stackrel{(a)}{=} \left(N\sum\_{m=1}^{M} \beta\_{l,m,l,k}^{1/2} \beta\_{l,m,i,k}^{1/2}\right)^2 + N\sum\_{m=1}^{M} \beta\_{l,m,l,k}\beta\_{l,m,i,k} \tag{A8}$$

where (a) is obtained due to the fact that the channel strength is Gamma-distributed and due to Lemma A1.

Using the fact that $\hat{\mathbf{g}}\_{l,l,k}$ and $\tilde{\mathbf{g}}\_{l,i,j}$ are independent, we have

$$\sum\_{(i,j)} \mathbb{E}\left[\hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}} \mathbb{E}\left[\tilde{\mathbf{g}}\_{l,i,j}\tilde{\mathbf{g}}\_{l,i,j}^{\mathrm{H}}\right] \hat{\mathbf{g}}\_{l,l,k}\right] = N \sum\_{m=1}^{M} \sum\_{i=1}^{L} \sum\_{j=1}^{K} \beta\_{l,m,l,k} \eta\_{l,m,i,j}. \tag{A9}$$

For the last term, we first calculate

$$\begin{split} & \mathbb{E} \left[ \hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}} \mathrm{diag} \left( p\_{\mathrm{u}} \hat{\mathbf{G}}\_{l,l} \hat{\mathbf{G}}\_{l,l}^{\mathrm{H}} \right) \hat{\mathbf{g}}\_{l,l,k} \right] \\ &= p\_{\mathrm{u}} \mathbb{E} \left[ \hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}} \hat{\mathbf{g}}\_{l,l,k} \hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}} \hat{\mathbf{g}}\_{l,l,k} \right] + p\_{\mathrm{u}} \sum\_{j \neq k} \mathbb{E} \left[ \hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}} \hat{\mathbf{g}}\_{l,l,j} \hat{\mathbf{g}}\_{l,l,j}^{\mathrm{H}} \hat{\mathbf{g}}\_{l,l,k} \right] \\ &= p\_{\mathrm{u}} \left( \hat{k}\_{l,l,k} \hat{\theta}\_{l,l,k}^{2} + (\hat{k}\_{l,l,k} \hat{\theta}\_{l,l,k})^{2} + N \sum\_{j \neq k} \sum\_{m=1}^{M} \beta\_{l,m,l,k} \beta\_{l,m,l,j} \right). \end{split} \tag{A10}$$

Then we can obtain

$$\begin{split} & \mathbb{E}\left[\hat{\mathbf{g}}\_{l,l,k}^{\rm H} \mathbb{E}\left[\mathrm{diag}\left(p\_{\rm u} \sum\_{i=1}^{L} \mathbf{G}\_{l,i} \mathbf{G}\_{l,i}^{\rm H} + \mathbf{I}\right)\right] \hat{\mathbf{g}}\_{l,l,k}\right] \\ &= \mathbb{E}\left[\hat{\mathbf{g}}\_{l,l,k}^{\rm H} \mathrm{diag}\left(p\_{\rm u} \hat{\mathbf{G}}\_{l,l} \hat{\mathbf{G}}\_{l,l}^{\rm H}\right) \hat{\mathbf{g}}\_{l,l,k}\right] + \mathbb{E}\left[\sum\_{n=1}^{MN} \left|[\hat{\mathbf{g}}\_{l,l,k}]\_{n}\right|^{2} [\mathbf{D}\_{\rm R}]\_{n,n}\right] \\ &= p\_{\rm u} \left(\hat{k}\_{l,l,k} \hat{\theta}\_{l,l,k}^{2} + (\hat{k}\_{l,l,k} \hat{\theta}\_{l,l,k})^{2} + N \sum\_{j \neq k} \sum\_{m=1}^{M} \beta\_{l,m,l,k} \beta\_{l,m,l,j}\right) + \Upsilon\_{l,k}. \end{split} \tag{A11}$$

Substituting Equations (A5), (A7)–(A9), and (A11) into Equation (A4) yields the closed-form expression expressed by Equation (23). This completes the proof.

#### **Appendix C. Proof of Theorem 2**

For the ZF receiver, similarly to the proof of Theorem 1, the following terms need to be calculated.

For the term $\mathbb{E}\left[1/\|\mathbf{a}\_{l,k}^{\rm H}\|^2\right]$, we have

$$\frac{1}{\|\mathbf{a}\_{l,k}^{\rm H}\|^2} = \left| \frac{\mathbf{a}\_{l,k}^{\rm H}}{\|\mathbf{a}\_{l,k}^{\rm H}\|} \hat{\mathbf{g}}\_{l,l,k} \right|^2 \sim \Gamma\left( \frac{MN - K + 1}{MN} \hat{k}\_{l,l,k}, \hat{\theta}\_{l,l,k} \right), \tag{A12}$$

which follows from $\mathbf{a}\_{l,k}^{\rm H}\hat{\mathbf{g}}\_{l,l,k} = 1$ for the ZF receiver, from Lemma A2, and from the fact that, from the perspective of each user, the intended beam lies in a subspace of dimension *s* = *MN* − *K* + 1 with ZF receivers. Thus,

$$\mathbb{E}\left[\frac{1}{\|\mathbf{a}\_{l,k}^{\rm H}\|^2}\right] = \frac{MN - K + 1}{MN} \hat{k}\_{l,l,k} \hat{\theta}\_{l,l,k}.\tag{A13}$$

Next, due to the independence between $\mathbf{a}\_{l,k}$ and $\hat{\mathbf{g}}\_{l,i,j}$, we have

$$\begin{split} & \sum\_{i \neq l} \sum\_{j \neq k} \mathbb{E} \left[ \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \mathbb{E} \left[ \hat{\mathbf{g}}\_{l,i,j} \hat{\mathbf{g}}\_{l,i,j}^{\mathrm{H}} \right] \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \right] \\ & \overset{(a)}{=} \sum\_{i \neq l} \sum\_{j \neq k} \mathbb{E} \left[ \mathbb{E} \left[ \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \hat{\mathbf{g}}\_{l,i,j} \hat{\mathbf{g}}\_{l,i,j}^{\mathrm{H}} \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \right] \right] \\ & \overset{(b)}{=} \frac{1}{MN} \sum\_{i \neq l} \sum\_{j \neq k} \mathbb{E} \left[ \hat{\mathbf{g}}\_{l,i,j}^{\mathrm{H}} \hat{\mathbf{g}}\_{l,i,j} \right] \\ & \overset{(c)}{=} \frac{1}{M} \sum\_{i \neq l} \sum\_{j \neq k} \sum\_{m=1}^{M} \beta\_{l,m,i,j} . \end{split} \tag{A14}$$

where (a) results from the independence of $\mathbf{a}\_{l,k}$ and $\hat{\mathbf{g}}\_{l,i,j}$, (b) results from Lemma A3, and (c) results from the fact that $\hat{\mathbf{g}}\_{l,i,j}^{\mathrm{H}}\hat{\mathbf{g}}\_{l,i,j} \sim \Gamma(\hat{k}\_{l,i,j}, \hat{\theta}\_{l,i,j})$.

Similarly, we have

$$\sum\_{(i,j)} \mathbb{E}\left[\frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \mathbb{E}\left[\tilde{\mathbf{g}}\_{l,i,j} \tilde{\mathbf{g}}\_{l,i,j}^{\mathrm{H}}\right] \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|}\right] = \frac{1}{M} \sum\_{i=1}^{L} \sum\_{j=1}^{K} \sum\_{m=1}^{M} \eta\_{l,m,i,j}.\tag{A15}$$

Due to pilot contamination, $\mathbf{a}\_{l,k}$ and $\hat{\mathbf{g}}\_{l,i,k}$ are dependent, and we have

$$\sum\_{i\neq l} \mathbb{E}\left[ \left| \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \hat{\mathbf{g}}\_{l,i,k} \right|^2 \right] = \frac{MN - K + 1}{M} \sum\_{i\neq l} \sum\_{m=1}^{M} \beta\_{l,m,i,k} \tag{A16}$$

For the last term, we first calculate

$$\begin{split} & \mathbb{E} \left[ \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \mathrm{diag} \left( p\_{\mathrm{u}} \hat{\mathbf{G}}\_{l,l} \hat{\mathbf{G}}\_{l,l}^{\mathrm{H}} \right) \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \right] \\ &= p\_{\mathrm{u}} \mathbb{E} \left[ \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \hat{\mathbf{g}}\_{l,l,k} \hat{\mathbf{g}}\_{l,l,k}^{\mathrm{H}} \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \right] \\ &= \frac{MN - K + 1}{MN} p\_{\mathrm{u}} \hat{k}\_{l,l,k} \hat{\theta}\_{l,l,k} . \end{split} \tag{A17}$$

Then we can obtain

$$\begin{split} & \mathbb{E} \left[ \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \mathbb{E} \left[ \text{diag} \left( p\_{\mathrm{u}} \sum\_{i=1}^{L} \mathbf{G}\_{l,i} \mathbf{G}\_{l,i}^{\mathrm{H}} + \mathbf{I} \right) \right] \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \right] \\ &= \mathbb{E} \left[ \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \text{diag} \left( p\_{\mathrm{u}} \hat{\mathbf{G}}\_{l,l} \hat{\mathbf{G}}\_{l,l}^{\mathrm{H}} \right) \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \right] + \mathbb{E} \left[ \frac{\mathbf{a}\_{l,k}^{\mathrm{H}}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \mathbf{D}\_{\mathrm{R}} \frac{\mathbf{a}\_{l,k}}{\|\mathbf{a}\_{l,k}^{\mathrm{H}}\|} \right] \\ & \overset{(a)}{=} \frac{MN - K + 1}{MN} p\_{\mathrm{u}} \hat{k}\_{l,k} \hat{\boldsymbol{\theta}}\_{l,l,k} + \frac{1}{M} \Delta\_{l,k} \end{split} \tag{A18}$$

where (*a*) follows from Lemma A3.

Substituting Equations (A13)–(A16) and (A18) into Equation (A4) yields the closed-form expression in Equation (25). This completes the proof.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Channel Sounding for Multi-User Massive MIMO in Distributed Antenna System Environment**

**Seoyoung Yu † and Jeong Woo Lee \*,†**

School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Korea; holywillow@naver.com

**\*** Correspondence: jwlee2@cau.ac.kr; Tel.: +82-2-820-5734

† Current address: 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea.

Received: 20 November 2018; Accepted: 21 December 2018; Published: 1 January 2019

**Abstract:** We propose a generation scheme for a sounding reference signal (SRS) suitable for supporting a large number of users in a massive multi-input multi-output (MIMO) system in a distributed antenna system (DAS) environment. The proposed SRS can alleviate the pilot contamination problem, which arises inherently in multi-user systems because the number of orthogonal sequences is limited. The proposed SRS sequence is generated by applying a well-chosen phase rotation to the conventional LTE/LTE-A SRS sequences without requiring additional resources. We also propose a correlation-aided channel estimation algorithm as a supplemental scheme to obtain more reliable and refined channel estimates. It is shown that the proposed SRS sequence and the supplemental channel estimation scheme significantly improve the channel estimation performance in multi-user massive MIMO systems.

**Keywords:** massive multi-input multi-output (MIMO); distributed antenna systems (DAS); sounding reference signal (SRS); channel estimation

#### **1. Introduction**

It is expected that the amount of mobile wireless traffic in 2020 will be 1000 times higher than that in 2010 [1–3]. Along with the dramatic growth in demand for wireless communications, the performance requirements for data rate, spectral efficiency and energy efficiency are also becoming more stringent [4–6]. To meet this growing demand and these performance requirements, massive multi-input multi-output (m-MIMO) technology was proposed as one of the key technologies for next-generation cellular networks, known as fifth generation (5G) systems [7–10]. It is known that m-MIMO systems, whose transmitter or receiver is equipped with a massive number of antennas, can improve spectral efficiency and save energy in wireless communication systems [10–12]. As a result, m-MIMO systems have recently attracted the attention of many researchers and engineers. Multi-user m-MIMO technology, in which a base station (BS) uses a large number of antennas to serve many pieces of user equipment (UE) simultaneously on the same time-frequency resource, is one example that is actively being studied for practical adoption in 5G systems [13–15]. The distributed antenna system (DAS) has also been considered a key technology for the feasible deployment of 5G systems [16–18]. In a DAS configuration, a cell contains a digital unit (DU) and multiple radio units (RUs), each connected to the DU via fiber optic links. The DU manages the RUs centrally, which allows the RUs to transmit and receive signals cooperatively.

The key requirement for exploiting the benefits of m-MIMO technology is to obtain accurate channel state information (CSI) for each link at the BS, or at the RUs in a DAS environment. In the frequency-division duplex (FDD) approach, UEs estimate downlink (DL) channels by using DL pilot signals, or sounding reference signals (SRS), transmitted from the BS [19]. The number of DL pilots required in an FDD-based approach is proportional to the number of BS antennas multiplied by the number of served UEs, which complicates the adoption of such DL channel estimation in massive MIMO environments. Thus, m-MIMO systems typically employ the time-division duplex (TDD) approach to estimate the DL channel. In the TDD approach, the UEs send mutually orthogonal uplink (UL) pilot signals (SRS) to the BS, and the BS estimates the DL channels from them by exploiting DL/UL channel reciprocity within the channel coherence interval [9]. The total number of UL pilots required in such a TDD-based approach is proportional to the number of served UEs, irrespective of the number of BS antennas [7,9].

For a given sequence length, say *M*, we can generate at most *M* orthogonal sequences. Conventionally, pilot sequences are mutually orthogonal, so the maximum number of pilot sequences is limited to *M*. If the number of UEs in simultaneous service exceeds *M*, some or all of the already generated orthogonal sequences must be reused. This results in the so-called pilot contamination problem [20], caused by the loss of orthogonality between pilot sequences. Pilot contamination is a major factor limiting the performance gains of m-MIMO systems [7,15]. Most prior works that address this problem consider the use of mutually orthogonal SRS sequences; they include pilot signal coordination [21], blind channel estimation with data samples [22] and cooperative multi-cell precoding in m-MIMO systems [23]. The number of mutually orthogonal SRS sequences is mainly limited by the length of the base sequence. In Long Term Evolution (LTE)/Long Term Evolution-Advanced (LTE-A) systems, the number of orthogonal SRS sequences is 16, which is not sufficient for serving a large number of users in m-MIMO environments [24,25]. Using longer pilot sequences may reduce pilot contamination, but the pilots then replace data symbols, which reduces spectral efficiency and throughput. If pilot sequences are made too long, they may even occupy the restricted band, which must be prevented. Thus, it is desirable to generate a larger number of SRS sequences without increasing the sequence length.

For this purpose, we propose a mechanism for generating SRS sequences with a lower level of pilot contamination that is suitable for serving a large number of UEs. In the proposed mechanism, a phase rotation is applied to the base sequence without increasing the sequence length. The resulting SRS sequences may be mutually correlated and thus still incur pilot contamination once the number of UEs exceeds the length of the base sequence. This results in a high channel estimation error when a linear estimation that relies on the orthogonality of SRS sequences is used. To resolve this problem, we propose a two-step channel estimation algorithm in which least squares (LS) estimation [26] is applied first, and minimum mean squared error (MMSE) estimation [27] is additionally applied only to the group of UEs using mutually correlated SRS sequences. The proposed *correlation-aided channel estimation* improves channel estimation performance. It is shown that the proposed SRS combined with the supplemental channel estimation algorithm achieves a lower mean squared error (MSE) in channel estimation, which alleviates the pilot contamination problem.

The rest of the paper is organized as follows. In Section 2, we introduce the system model. In Section 3, we briefly review SRS sequences in conventional LTE/LTE-A systems. In Section 4, we introduce the generation of the proposed SRS sequences and analyze the resulting correlation. In Section 5, we propose the two-step channel estimation algorithm, composed of LS estimation followed by supplemental MMSE estimation, and we formulate and analyze the MSE obtained for SRS sequences with LS estimation. We evaluate the performance in various aspects by computer simulations in Section 6 and conclude the paper in Section 7.

#### **2. System Model**

Consider a cell having a DU and *R* RUs, each of which serves *K* UEs as shown in Figure 1, where each RU has $N\_T$ transmit and receive antennas. We index the RUs by $r \in \{0, \cdots, R-1\}$ and index the UEs served by RU *r* as $r(k)$, $k \in \{0, \cdots, K-1\}$. We consider orthogonal frequency division multiplexing (OFDM) communications with $N\_c$ subcarriers between UE and RU. We assume channel reciprocity, by which the DL channel from an RU to a UE can be estimated by using the UL pilots sent from the UEs, under the constraint that the time delay from UL channel estimation to DL transmission is less than the coherence time of the channel [7]. Then, the UL channel estimated by sending an SRS sequence from each UE to the RU is used as the DL channel estimate. Let $\mathbf{s}\_{r(k)} = [s\_{r(k)}[0] \cdots s\_{r(k)}[M-1]]^T$ denote the SRS sequence of UE $r(k)$, where the superscript $T$ denotes the transpose of a vector. We also let $\mathbf{h}\_{r(k),r}[m] \in \mathbb{C}^{N\_T \times 1}$ denote the channel gain between UE $r(k)$ and RU *r* over the *m*-th subcarrier. Then, the channel gains corresponding to *M* subcarriers, $\mathbf{h}\_{r(k),r}[m]$, $m = 0, \cdots, M-1$, are estimated by using an SRS sequence.

The signal received by RU *r* over subcarrier *m* is denoted by $\mathbf{y}\_r[m] \in \mathbb{C}^{N\_T \times 1}$ and given by

$$\mathbf{y}\_r[m] = \sum\_{k=0}^{K-1} \mathbf{h}\_{r(k),r}[m] s\_{r(k)}[m] + \sum\_{r'=0, r' \neq r}^{R-1} \sum\_{k=0}^{K-1} \mathbf{h}\_{r'(k),r}[m] s\_{r'(k)}[m] + \mathbf{n}\_r[m], \tag{1}$$

where $\mathbf{n}\_r[m] \in \mathbb{C}^{N\_T \times 1}$ is the zero-mean additive white Gaussian noise vector with covariance matrix $\sigma\_n^2 \mathbf{I}\_{N\_T \times N\_T}$. Note that $\mathbf{h}\_{r(k),r}[m] = \beta\_{r(k),r} \mathbf{g}\_{r(k),r}[m]$, where $\beta\_{r(k),r}$ represents the large scale fading, while each entry of $\mathbf{g}\_{r(k),r}[m] \in \mathbb{C}^{N\_T \times 1}$ represents the small scale fading as an independent and identically distributed (i.i.d.) zero-mean complex Gaussian random variable with unit variance. The large scale fading factor is given by $\beta\_{r(k),r}^2 = d\_{r(k),r}^{-a}$, where $d\_{r(k),r}$ is the distance between UE $r(k)$ and RU *r*, and *a* is an attenuation factor.

**Figure 1.** Multi-user m-MIMO configuration in DAS environments.

#### **3. Conventional Channel Sounding Reference Signal in LTE/LTE-A Systems**

In the following, we briefly introduce the generation of SRS in conventional LTE/LTE-A systems [24,25]. The structure of the SRS symbol is illustrated in Figure 2. Basically, the SRS sequence is generated by a cyclic shift of a base sequence, which is obtained from a Zadoff–Chu sequence [24,25] as presented below. Let $N\_{sc}^{RB}$ be the number of subcarriers per resource block (RB), where $N\_{sc}^{RB} = 12$ in LTE/LTE-A systems, and let *L* be the number of subcarriers assigned for SRS, i.e., the sounding bandwidth. Let *D* be the decimation factor, which is the number of SRS sequences sharing the allocated sounding bandwidth, so that the length of an SRS sequence is $M = L/D$. Note that *L* is a multiple of $N\_{sc}^{RB}$, i.e., $L = n \cdot N\_{sc}^{RB}$, $1 \le n \le N\_{RB}^{UL}$, where $N\_{RB}^{UL}$ is the uplink system bandwidth in terms of RBs. Let $L\_z$ denote the length of the Zadoff–Chu sequence used to generate the base sequence of length *M*, where $L\_z$ is the largest prime number such that $L\_z < M$.

Let $x\_q[m]$ denote the *q*-th root Zadoff–Chu sequence, defined by

$$x\_q[m] = \exp\left\{-j\frac{\pi q m(m+1)}{L\_z}\right\}, \quad 0 \le m < L\_z, \tag{2}$$

where

$$\begin{aligned} q &= \lfloor \bar{q} + 0.5 \rfloor + v(-1)^{\lfloor 2\bar{q} \rfloor}, \\ \bar{q} &= L\_z(u + 1) / 31, \end{aligned} \tag{3}$$

with $u \in \{0, 1, \cdots, 29\}$, and $v = 0$ if $M = nN\_{sc}^{RB}$ with $n \le 5$, or $v \in \{0, 1\}$ if $M = nN\_{sc}^{RB}$ with $n \ge 6$. The base sequence $\bar{x}[m]$ is obtained by

$$\bar{x}[m] = x\_q[m \bmod L\_z], \quad 0 \le m < M. \tag{4}$$
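To make Equations (2)–(4) concrete, the following is a minimal standard-library Python sketch of the base sequence generation. The function names (`zc_length`, `zc_root`, `base_sequence`) are hypothetical helpers, and the example length $M = 48$ with $u = 0$ is an illustrative choice, not a parameter taken from this paper.

```python
import cmath
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for SRS lengths."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def zc_length(M: int) -> int:
    """L_z: the largest prime number strictly smaller than M."""
    return next(n for n in range(M - 1, 1, -1) if is_prime(n))


def zc_root(M: int, u: int, v: int = 0) -> int:
    """Root index q of Eq. (3) for group number u and base sequence number v."""
    L_z = zc_length(M)
    q_bar = L_z * (u + 1) / 31
    return math.floor(q_bar + 0.5) + v * (-1) ** math.floor(2 * q_bar)


def base_sequence(M: int, u: int, v: int = 0) -> list[complex]:
    """Base sequence x_bar[m], 0 <= m < M, from Eqs. (2)-(4)."""
    L_z = zc_length(M)
    q = zc_root(M, u, v)
    # Eq. (2): q-th root Zadoff-Chu sequence of prime length L_z
    x_q = [cmath.exp(-1j * math.pi * q * m * (m + 1) / L_z) for m in range(L_z)]
    # Eq. (4): cyclic extension from L_z to M samples
    return [x_q[m % L_z] for m in range(M)]


x_bar = base_sequence(48, u=0)  # example: M = 48 gives L_z = 47 and q = 2
```

For $M = 48$, the largest prime below *M* is $L\_z = 47$, so the last sample wraps around to $x\_q[0]$; every sample has unit magnitude, which is the property used below Equation (5).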

Note that base sequences are divided into groups, where *u* is the group number associated with the physical cell ID and the length of the SRS sequence, and *v* is the base sequence number within the group. The SRS sequence $x^{(\alpha)}[m]$ of length *M* is defined by applying a cyclic shift $\alpha$, $\alpha \in \{0, 1, \cdots, 7\}$, to the base sequence $\bar{x}[m]$ as

$$x^{(\alpha)}[m] = e^{j2\pi\frac{\alpha}{L\_c}m}\, \bar{x}[m], \quad 0 \le m < M, \tag{5}$$

where $L\_c > 7$ so that $e^{j2\pi\frac{\alpha}{L\_c}}$ takes distinct values for different $\alpha \in \{0, 1, \cdots, 7\}$. For any $\alpha\_i$ and $\alpha\_j$ chosen from $\{0, \cdots, 7\}$, the two sequences $\mathbf{x}^{(\alpha\_i)} = [x^{(\alpha\_i)}[0] \cdots x^{(\alpha\_i)}[M-1]]^T$ and $\mathbf{x}^{(\alpha\_j)} = [x^{(\alpha\_j)}[0] \cdots x^{(\alpha\_j)}[M-1]]^T$ are orthogonal if $\frac{1}{M}\mathbf{x}^{(\alpha\_i)H}\mathbf{x}^{(\alpha\_j)} = \delta\_{i-j}$, where $\delta\_{i-j} = 1$ if $i = j$ and 0 otherwise, and the superscript *H* denotes the conjugate transpose of a vector. By using Equation (5) and the property $|\bar{x}[m]|^2 = 1$ for all *m*, which is clear from Equations (2)–(4), this condition reduces to $\frac{1}{M}\sum\_{m=0}^{M-1} e^{j\frac{2\pi}{L\_c}(\alpha\_j - \alpha\_i)m} = \delta\_{i-j}$.

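For $\alpha\_i \neq \alpha\_j \in \{0, \cdots, 7\}$, the sum is a geometric series with ratio $\rho = e^{j2\pi(\alpha\_j - \alpha\_i)/L\_c} \neq 1$, so it equals $\frac{1}{M}\frac{1 - \rho^M}{1 - \rho}$ and vanishes if and only if $(\alpha\_j - \alpha\_i)\frac{M}{L\_c}$ is an integer, i.e., if $L\_c$ is a factor of $(\alpha\_j - \alpha\_i)M$. Requiring this for every pair, including $|\alpha\_j - \alpha\_i| = 1$, means that $L\_c$ must divide *M*. Consequently, $L\_c$ is chosen as an integer that is greater than 7 and is a factor of $\frac{1}{2}M$, which in particular makes $L\_c$ a factor of *M*.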
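Because $|\bar{x}[m]|^2 = 1$ cancels, the orthogonality test above depends only on $\alpha\_i$, $\alpha\_j$, *M* and $L\_c$. The standard-library Python sketch below (hypothetical helper names) evaluates the reduced condition for all pairs of the eight cyclic shifts. The values are illustrative assumptions: with $L\_c = 8$, $M = 48$ satisfies the divisibility requirement ($L\_c$ divides $\frac{1}{2}M = 24$), whereas $M = 44$ does not, because $M/L\_c = 5.5$ is not an integer.

```python
import cmath
import math


def shift_correlation(alpha_i: int, alpha_j: int, M: int, L_c: int) -> complex:
    """(1/M) x^(alpha_i)H x^(alpha_j) after cancelling |x_bar[m]|^2 = 1 in Eq. (5)."""
    return sum(cmath.exp(2j * math.pi * (alpha_j - alpha_i) * m / L_c)
               for m in range(M)) / M


def shifts_orthogonal(M: int, L_c: int, n_shifts: int = 8, tol: float = 1e-9) -> bool:
    """True if all pairs of distinct cyclic shifts alpha in {0, ..., n_shifts - 1} are orthogonal."""
    return all(abs(shift_correlation(a, b, M, L_c)) < tol
               for a in range(n_shifts) for b in range(n_shifts) if a != b)


print(shifts_orthogonal(48, 8))  # -> True  (L_c = 8 divides M/2 = 24)
print(shifts_orthogonal(44, 8))  # -> False (M / L_c = 5.5 is not an integer)
```

A numerical tolerance is used instead of exact zero because the sum of unit phasors is evaluated in floating point.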

**Figure 2.** SRS symbol structure.

Multiple SRS sequences are defined from a single base sequence by using different values of $\alpha$ and the decimation factor. In LTE/LTE-A, a decimation factor of two is used, and the signal occupies every second subcarrier within the allocated sounding bandwidth. By using distinct SRS sequences obtained with different values of $\alpha$, and distinct sets of subcarriers resulting from decimation, the channel gains of multiple UEs can be estimated, and the UEs can be served by an RU simultaneously. In conventional LTE/LTE-A systems using $\alpha \in \{0, \cdots, 7\}$ and a decimation factor of two, we can obtain only $8 \times 2 = 16$ orthogonal sequences. In multi-user systems with more than 16 UEs, we need to design a larger set of SRS sequences with low cross-correlation.

#### **4. Proposed Channel Sounding Reference Signal for Multi-User Systems**

The lack of orthogonal SRS sequences may cause the pilot contamination problem. A simple way to alleviate this is to use longer SRS sequences or a wider sounding bandwidth. However, this approach may degrade channel estimation performance in frequency-selective environments and lower the spectral efficiency and throughput, because SRS sequences replace data sequences. Moreover, if the sounding bandwidth is too wide, SRS sequences may occupy the restricted band, which must be avoided. Thus, we aim to generate a new set of SRS sequences with reduced pilot contamination without increasing the sounding bandwidth or the sequence length. We apply phase rotations to the LTE/LTE-A SRS sequences introduced in Section 3 to generate a new SRS sequence as

$$x^{(\alpha,s,p)}[m] \triangleq e^{j2\pi\frac{s}{M}m}\, e^{j2\pi\frac{p}{L\_p}m}\, x^{(\alpha)}[m] = e^{j2\pi\left(\frac{s}{M} + \frac{p}{L\_p} + \frac{\alpha}{L\_c}\right)m}\, \bar{x}[m], \quad 0 \le m < M, \tag{6}$$

where $0 \le s < \frac{M}{L\_c}$ and $0 \le p < L\_p$, and the last equality comes from Equation (5). Note that $L\_c$ is an integer greater than 7 that divides $\frac{1}{2}M$, as introduced in Section 3, and $L\_p$ is a prime number smaller than $L\_c$. For a given SRS sequence length *M*, we may generate up to *M* orthogonal sequences. However, by using Equation (5) alone, we can generate only eight orthogonal sequences by varying $\alpha = 0, \cdots, 7$. Thus, we use the phase rotation $e^{j2\pi\frac{s}{M}m}$, $0 \le s < \frac{M}{L\_c}$, together with $e^{j2\pi\frac{\alpha}{L\_c}m}$, $0 \le \alpha < L\_c$, to generate $\frac{M}{L\_c} \cdot L\_c = M$ orthogonal SRS sequences without incurring pilot contamination. If the number of UEs exceeds *M*, we need to generate extra SRS sequences instead of reusing already generated ones. For this purpose, we apply an additional phase rotation $e^{j2\pi\frac{p}{L\_p}m}$, $0 \le p < L\_p$, where $L\_p$ must be coprime with *M*, and thus also with $L\_c$, so that the resulting sequences are distinct from the *M* sequences already generated. We empirically found that a prime number $L\_p$ smaller than $L\_c$ results in good performance.

By Equations (2)–(4), we can rewrite Equation (6) as

$$x^{(\alpha,s,p)}[m] = \begin{cases} \exp\left\{j2\pi\left(\left(\frac{s}{M} + \frac{p}{L\_p} + \frac{\alpha}{L\_c}\right)m - \frac{qm(m+1)}{2L\_z}\right)\right\}, & \text{for } 0 \le m < L\_z, \\ \exp\left\{j2\pi\left(\left(\frac{s}{M} + \frac{p}{L\_p} + \frac{\alpha}{L\_c}\right)m - \frac{q(m-L\_z)(m-L\_z+1)}{2L\_z}\right)\right\}, & \text{for } L\_z \le m < M. \end{cases} \tag{7}$$

The correlation of SRS sequences $\mathbf{s}\_{r(k)}$ and $\mathbf{s}\_{r'(j)}$ is defined and expanded as

$$C\_{r(k),r'(j)} \triangleq \frac{1}{M} \mathbf{s}\_{r(k)}^H \mathbf{s}\_{r'(j)} = \frac{1}{M} \sum\_{m=0}^{M-1} s\_{r(k)}^{\*}[m]\, s\_{r'(j)}[m] = \frac{1}{M} \sum\_{m=0}^{M-1} e^{-j2\pi\left(\frac{s-s'}{M} + \frac{p-p'}{L\_p} + \frac{\alpha-\alpha'}{L\_c}\right)m}, \tag{8}$$

where $s\_{r(k)}[m] = x^{(\alpha,s,p)}[m]$, $s\_{r'(j)}[m] = x^{(\alpha',s',p')}[m]$, and the superscript ∗ denotes the complex conjugate. The detailed derivation of Equation (8) is provided in Appendix A. Clearly, $C\_{r(k),r'(j)}$ obtained with $s = s'$, $p = p'$ and $\alpha = \alpha'$ corresponds to the auto-correlation of $\mathbf{s}\_{r(k)}$ because $\mathbf{s}\_{r'(j)} = \mathbf{s}\_{r(k)}$, and $C\_{r(k),r(k)} = \frac{1}{M}\sum\_{m=0}^{M-1} e^{j\cdot 0} = 1$. The cross-correlation satisfies $C\_{r(k),r'(j)} = 0$ if $p = p'$ and either $s \neq s'$ or $\alpha \neq \alpha'$, while $C\_{r(k),r'(j)} \neq 0$ if $p \neq p'$, for the following reason.

Consider $f(\phi) = \frac{1}{M}\sum\_{m=0}^{M-1} e^{j2\pi\phi m}$, where $f(\phi) = 0$ if $\phi M$ is an integer but $\phi$ is not, and $f(\phi) \neq 0$ otherwise. If $p = p'$, the rightmost side of Equation (8) becomes $f\left(\frac{s'-s}{M} + \frac{\alpha'-\alpha}{L\_c}\right)$. If $s \neq s'$ or $\alpha \neq \alpha'$, then $\left(\frac{s'-s}{M} + \frac{\alpha'-\alpha}{L\_c}\right)M = (s'-s) + (\alpha'-\alpha)\frac{M}{L\_c}$ is a nonzero integer because *M* is a multiple of $L\_c$, as introduced in Section 3, and its magnitude is less than *M* because $|s'-s| < \frac{M}{L\_c}$ and $|\alpha'-\alpha| < L\_c$; thus, $C\_{r(k),r'(j)} = f\left(\frac{s'-s}{M} + \frac{\alpha'-\alpha}{L\_c}\right) = 0$. On the other hand, if $p \neq p'$, then $\left(\frac{s'-s}{M} + \frac{p'-p}{L\_p} + \frac{\alpha'-\alpha}{L\_c}\right)M$ cannot be an integer because $(p'-p)M$ is not divisible by $L\_p$, given that $L\_p$ and *M* are coprime and $0 < |p'-p| < L\_p$. Thus, $C\_{r(k),r'(j)} = f\left(\frac{s'-s}{M} + \frac{p'-p}{L\_p} + \frac{\alpha'-\alpha}{L\_c}\right) \neq 0$.
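To gauge how large the residual correlation between different sets $p \neq p'$ is, the geometric series gives, for non-integer $\phi$,

$$|f(\phi)| = \frac{1}{M}\left|\frac{1 - e^{j2\pi\phi M}}{1 - e^{j2\pi\phi}}\right| = \frac{|\sin(\pi\phi M)|}{M\,|\sin(\pi\phi)|}.$$

As a worked example with assumed values $M = 48$, $L\_c = 8$ and $L\_p = 7$ (chosen only for illustration), take $s = s'$, $\alpha' - \alpha = 1$ and $p' - p = 1$, so that $\phi = \frac{1}{8} + \frac{1}{7} = \frac{15}{56}$ and $\phi M = \frac{90}{7}$:

$$|C\_{r(k),r'(j)}| = \frac{|\sin(90\pi/7)|}{48\,|\sin(15\pi/56)|} = \frac{\sin(\pi/7)}{48\sin(15\pi/56)} \approx \frac{0.434}{48 \times 0.746} \approx 0.012.$$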

Consequently, for each *p*, we can generate a set of *M* orthogonal SRS sequences, where $0 \le s < \frac{M}{L\_c}$ and $0 \le \alpha < L\_c$ enable the generation of the *M* orthogonal sequences. Applying the phase rotation $e^{j2\pi\frac{p}{L\_p}m}$, $0 \le p < L\_p$, in Equation (7) yields $L\_p$ sets of *M* orthogonal sequences. Any two sequences obtained with different values of *p* are mutually correlated.
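The following standard-library Python sketch generates the proposed sequences directly from Equation (7) and checks the two properties stated above numerically: zero cross-correlation within one set of *M* sequences and nonzero correlation across sets. The helper names and the parameters $M = 48$, $L\_c = 8$, $L\_p = 7$ and $q = 2$ (which Equation (3) gives for $u = 0$, $v = 0$ at $M = 48$) are illustrative assumptions, not values reported in this paper.

```python
import cmath
import math


def largest_prime_below(n: int) -> int:
    """L_z: the largest prime strictly smaller than n."""
    def is_prime(k: int) -> bool:
        return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))
    return next(k for k in range(n - 1, 1, -1) if is_prime(k))


def proposed_srs(alpha: int, s: int, p: int, M: int, L_c: int, L_p: int, q: int) -> list[complex]:
    """x^(alpha,s,p)[m] of Eq. (7): linear phase ramp times the cyclically extended ZC sequence."""
    L_z = largest_prime_below(M)
    seq = []
    for m in range(M):
        m_z = m % L_z  # m for m < L_z, and m - L_z for L_z <= m < M (since M < 2 L_z)
        phase = (s / M + p / L_p + alpha / L_c) * m - q * m_z * (m_z + 1) / (2 * L_z)
        seq.append(cmath.exp(2j * math.pi * phase))
    return seq


def correlation(a: list[complex], b: list[complex]) -> complex:
    """C = (1/M) a^H b, as in Eq. (8)."""
    return sum(x.conjugate() * y for x, y in zip(a, b)) / len(a)


# Illustrative parameters: L_c = 8 divides M/2 = 24; L_p = 7 is prime, below L_c, coprime with M.
M, L_c, L_p, q = 48, 8, 7, 2
set_p0 = [proposed_srs(a, s, 0, M, L_c, L_p, q) for s in range(M // L_c) for a in range(L_c)]
worst = max(abs(correlation(x, y)) for i, x in enumerate(set_p0)
            for j, y in enumerate(set_p0) if i != j)
cross = abs(correlation(proposed_srs(0, 0, 0, M, L_c, L_p, q),
                        proposed_srs(1, 0, 1, M, L_c, L_p, q)))
print(len(set_p0), worst < 1e-9, round(cross, 4))  # -> 48 True 0.0121
```

The quadratic Zadoff–Chu phase is identical for all sequences with the same root *q*, so it cancels in every correlation and only the linear phase ramps determine $C\_{r(k),r'(j)}$.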

Distinct SRS sequences generated with different $\alpha$, *s* and *p* by Equation (7) are assigned to different UEs. The first set of *M* orthogonal sequences, generated with $p = 0$, is assigned to the first *M* UEs. Then, the next set of *M* orthogonal sequences, generated with $p = 1$, is assigned to the next *M* UEs. This procedure is repeated with increasing *p* until all UEs are assigned SRS sequences.
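The assignment rule can be sketched as a simple index mapping. In this hypothetical Python helper, the set index *p* follows the rule stated above, while the ordering of $(s, \alpha)$ inside one orthogonal set is an assumption, since the text fixes only the order of the sets.

```python
def assign_srs(num_ues: int, M: int, L_c: int, L_p: int) -> list[tuple[int, int, int]]:
    """Return (p, s, alpha) for each UE index: set p = 0 is filled first, then p = 1, and so on.

    The order of (s, alpha) inside one orthogonal set is a hypothetical choice.
    """
    if num_ues > M * L_p:
        raise ValueError("more UEs than distinct (alpha, s, p) triples")
    triples = []
    for ue in range(num_ues):
        p, idx = divmod(ue, M)       # orthogonal set index and position inside the set
        s, alpha = divmod(idx, L_c)  # 0 <= s < M / L_c and 0 <= alpha < L_c
        triples.append((p, s, alpha))
    return triples


plan = assign_srs(100, M=48, L_c=8, L_p=7)
print(plan[0], plan[47], plan[48], plan[99])  # -> (0, 0, 0) (0, 5, 7) (1, 0, 0) (2, 0, 3)
```

With the illustrative values $M = 48$ and $L\_p = 7$, at most $48 \times 7 = 336$ UEs can receive distinct sequences, and UEs 48 to 95 form the first group whose sequences are correlated with those of UEs 0 to 47.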

#### **5. Channel Estimation**

#### *5.1. Least Square Channel Estimation*

Let $\hat{\mathbf{h}}\_{r(k),r}[m]$ denote the estimate of the channel gain $\mathbf{h}\_{r(k),r}[m]$. We apply the LS estimation algorithm [26] to the received SRS signal for channel estimation. In this process, we assume block fading with length *M*, i.e., the channel is considered invariant over *M* consecutive subcarriers. Then, the channel estimate at the *m*-th subcarrier in a fading block, $\hat{\mathbf{h}}\_{r(k),r}[m]$, can also be denoted by $\hat{\mathbf{h}}\_{r(k),r}$ and obtained as

$$
\hat{\mathbf{h}}\_{r(k),r}[m] = \hat{\mathbf{h}}\_{r(k),r} = \frac{1}{M} \sum\_{i=0}^{M-1} s\_{r(k)}^{\*}[i] \mathbf{y}\_{r}[i]. \tag{9}
$$

By using Equation (1), we can rewrite Equation (9) as

$$\begin{split} \hat{\mathbf{h}}\_{r(k),r} &= \frac{1}{M} \sum\_{i=0}^{M-1} s\_{r(k)}^{\*} [i] \left\{ \sum\_{j=0}^{K-1} \mathbf{h}\_{r(j),r} [i] s\_{r(j)} [i] + \sum\_{r'=0, r' \neq r}^{R-1} \sum\_{j=0}^{K-1} \mathbf{h}\_{r'(j),r} [i] s\_{r'(j)} [i] + \mathbf{n}\_{r} [i] \right\} \\ &= \frac{1}{M} \sum\_{i=0}^{M-1} \left\{ \mathbf{h}\_{r(k),r} [i] + \sum\_{j=0, j \neq k}^{K-1} \mathbf{h}\_{r(j),r} [i] s\_{r(k)}^{\*} [i] s\_{r(j)} [i] + \sum\_{r'=0, r' \neq r}^{R-1} \sum\_{j=0}^{K-1} \mathbf{h}\_{r'(j),r} [i] s\_{r(k)}^{\*} [i] s\_{r'(j)} [i] + s\_{r(k)}^{\*} [i] \mathbf{n}\_{r} [i] \right\}, \end{split} \tag{10}$$

where $s\_{r(k)}^{\*}[i]\, s\_{r(k)}[i] = |s\_{r(k)}[i]|^2 = 1$ is used. If the channel is actually block faded with length *M*, the channel gain $\mathbf{h}\_{r(k),r}[m]$ is identical for all $m = 0, \cdots, M-1$, and thus we can represent the channel gain by $\mathbf{h}\_{r(k),r}$. Then, Equation (10) can be rewritten as

$$\hat{\mathbf{h}}\_{r(k),r} = \mathbf{h}\_{r(k),r} + \sum\_{j=0, j\neq k}^{K-1} \mathbf{h}\_{r(j),r} C\_{r(k),r(j)} + \sum\_{r'=0, r'\neq r}^{R-1} \sum\_{j=0}^{K-1} \mathbf{h}\_{r'(j),r} C\_{r(k),r'(j)} + \frac{1}{M} \sum\_{i=0}^{M-1} s\_{r(k)}^{\*} [i] \mathbf{n}\_{r}[i]. \tag{11}$$

If all SRS sequences of the UEs in the cell are mutually orthogonal, the cross-correlation between any two SRS sequences is zero, and Equation (11) simplifies to

$$
\hat{\mathbf{h}}\_{r(k),r} = \mathbf{h}\_{r(k),r} + \frac{1}{M} \sum\_{i=0}^{M-1} \mathbf{s}\_{r(k)}^{\*} [i] \mathbf{n}\_{r}[i]. \tag{12}
$$
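The LS step of Equation (9) and the leakage term of Equation (11) can be checked with a small noise-free simulation. This standard-library Python sketch assumes block fading over the *M* subcarriers and uses DFT-type phase ramps as a stand-in set of orthogonal sequences; the function names, channel values and sequences are hypothetical and are not the simulation setup of Section 6.

```python
import cmath
import math
import random


def received_signal(channels, sequences, noise_std=0.0, seed=0):
    """Block-fading version of Eq. (1): y[m] = sum_u h_u s_u[m] + n[m].

    channels[u] is the N_T-entry channel vector of UE u (constant over the M subcarriers),
    sequences[u] is its SRS sequence, and the noise is circular complex Gaussian with
    variance noise_std**2 per entry.
    """
    rng = random.Random(seed)
    n_t, M = len(channels[0]), len(sequences[0])
    y = []
    for m in range(M):
        y_m = []
        for a in range(n_t):
            signal = sum(h[a] * s[m] for h, s in zip(channels, sequences))
            noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std)) / math.sqrt(2)
            y_m.append(signal + noise)
        y.append(y_m)
    return y


def ls_estimate(y, s_k):
    """Eq. (9): h_hat = (1/M) sum_i s_k*[i] y[i], computed per receive antenna."""
    M, n_t = len(y), len(y[0])
    return [sum(s_k[i].conjugate() * y[i][a] for i in range(M)) / M for a in range(n_t)]


# Orthogonal phase-ramp sequences, noise-free: Eq. (12) gives h_hat = h.
M = 16
seqs = [[cmath.exp(2j * math.pi * u * m / M) for m in range(M)] for u in range(3)]
H = [[1 + 1j, 0.5], [2 - 1j, -0.3j], [0.2, 1j]]  # 3 UEs, N_T = 2 antennas
h_hat = ls_estimate(received_signal(H, seqs), seqs[0])

# A correlated second sequence (extra rotation with L_p = 7) leaks h_1 * C_01 into h_hat_0, per Eq. (11).
s_corr = [cmath.exp(2j * math.pi * (1 / M + 1 / 7) * m) for m in range(M)]
C01 = sum(a.conjugate() * b for a, b in zip(seqs[0], s_corr)) / M
h_hat_0 = ls_estimate(received_signal(H[:2], [seqs[0], s_corr]), seqs[0])
```

Setting `noise_std` to a positive value adds the last term of Equation (11), whose per-entry variance is $\sigma\_n^2/M$.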

We define the normalized mean squared error (MSE) of the channel estimate between UE $r(k)$ and RU *r* at subcarrier *m* as

$$\sigma\_{MSE,r(k)}^2[m] = \frac{E\|\hat{\mathbf{h}}\_{r(k),r} - \mathbf{h}\_{r(k),r}[m]\|^2}{\beta\_{r(k),r}^2 N\_T}. \tag{13}$$

Then, we define the average MSE by

$$
\bar{\sigma}\_{MSE}^2 = \frac{1}{KRM} \sum\_{k=0}^{K-1} \sum\_{r=0}^{R-1} \sum\_{m=0}^{M-1} \sigma\_{MSE,r(k)}^2 [m] \tag{14}
$$

and use this as the performance metric of channel estimation. Note that, if the channel is actually block faded with length *M*, the normalized MSE does not depend on *m* and can be denoted by $\sigma\_{MSE,r(k)}^2$, so Equation (14) simplifies to $\bar{\sigma}\_{MSE}^2 = \frac{1}{KR}\sum\_{k=0}^{K-1}\sum\_{r=0}^{R-1}\sigma\_{MSE,r(k)}^2$. Under the assumption of block fading with length *M*, the average MSE can be obtained from Equations (11), (13) and (14) as

$$\begin{split} \bar{\sigma}\_{MSE}^{2} &= \frac{1}{KR} \sum\_{k=0}^{K-1} \sum\_{r=0}^{R-1} \frac{E\|\hat{\mathbf{h}}\_{r(k),r} - \mathbf{h}\_{r(k),r}\|^2}{\beta\_{r(k),r}^2 N\_T} \\ &= \frac{1}{KR} \sum\_{k=0}^{K-1} \sum\_{r=0}^{R-1} \frac{1}{\beta\_{r(k),r}^2} \left\{ \sum\_{j=0, j\neq k}^{K-1} \beta\_{r(j),r}^2 |C\_{r(k),r(j)}|^2 + \sum\_{r'=0, r'\neq r}^{R-1} \sum\_{j=0}^{K-1} \beta\_{r'(j),r}^2 |C\_{r(k),r'(j)}|^2 + \frac{1}{M} \sigma\_n^2 \right\}, \end{split} \tag{15}$$

where the detailed derivation is given in Appendix B.

We now analyze the average MSE under block fading. The large scale fading factors are assumed to be $\beta\_{r(k),r} = \beta\_1$ and $\beta\_{r'(k),r} = \beta\_2$ for all *k*, *r* and $r' \neq r$, which means that the large scale fading between a UE and its serving RU is represented by $\beta\_1$, while the large scale fading between a UE and the other neighboring RUs is represented by $\beta\_2$. We suppose $\beta\_2 < \beta\_1$ because a UE is usually served by a nearby RU. Then, Equation (15) can be rewritten as

$$\bar{\sigma}\_{MSE}^2 = \frac{1}{KR} \sum\_{k=0}^{K-1} \sum\_{r=0}^{R-1} \left\{ \sum\_{j=0, j\neq k}^{K-1} |C\_{r(k),r(j)}|^2 + \frac{\beta\_2^2}{\beta\_1^2} \sum\_{r'=0, r'\neq r}^{R-1} \sum\_{j=0}^{K-1} |C\_{r(k),r'(j)}|^2 + \frac{\sigma\_n^2}{\beta\_1^2 M} \right\}.\tag{16}$$

We consider reusing the *M* orthogonal SRS sequences repeatedly across UEs without applying the phase rotation $e^{j2\pi\frac{p}{L\_p}m}$ in Equation (7). We suppose that each RU in the cell serves the same number of UEs and that a distinct set of $\frac{M}{R}$ orthogonal sequences is assigned repeatedly to the UEs in each RU. Then, we obtain a correlation matrix $\mathbf{C} \in \mathbb{C}^{KR\times KR}$ of SRS sequences whose $(i, j)$-th entry is defined by

$$\mathbf{C}\_{ij} = \begin{cases} 1, & \text{if } |i-j| = kM, \quad k = 0, \cdots, \lfloor KR/M \rfloor, \\ 0, & \text{otherwise}. \end{cases} \tag{17}$$

Every other set of $\frac{M}{R}$ columns and rows of $\mathbf{C}$ corresponds to UEs in the same RU. Because each RU reuses its own set of $\frac{M}{R}$ orthogonal sequences, the SRS sequences used in different RUs are always mutually orthogonal. It follows that $C\_{r(k),r'(j)} = 0$ for all *k*, *j* and $r' \neq r$, and Equation (16) becomes

$$
\bar{\sigma}\_{MSE}^2 = \frac{1}{KR} \sum\_{k=0}^{K-1} \sum\_{r=0}^{R-1} \sum\_{j=0, j \neq k}^{K-1} \left| C\_{r(k), r(j)} \right|^2 + \frac{\sigma\_n^2}{\beta\_1^2 M}. \tag{18}
$$

From Equation (17) and the allocation rule of orthogonal sequences to UEs introduced above, we obtain

$$\frac{1}{KR} \sum\_{k=0}^{K-1} \sum\_{r=0}^{R-1} \sum\_{j=0, j \neq k}^{K-1} |C\_{r(k), r(j)}|^2 = \begin{cases} 0, & \text{if } KR \le M, \\ \frac{1}{KR} \sum\_{i=1}^{\lfloor \frac{KR}{M} \rfloor} 2(KR - iM), & \text{if } KR > M. \end{cases} \tag{19}$$
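The closed form that follows is an arithmetic series. In the $KR \times KR$ matrix of Equation (17), the two off-diagonals at distance $iM$ together hold $2(KR - iM)$ unit entries. Writing $n = \lfloor KR/M \rfloor$ for $KR > M$,

$$\frac{1}{KR}\sum\_{i=1}^{n} 2(KR - iM) = \frac{1}{KR}\left(2nKR - 2M\frac{n(n+1)}{2}\right) = 2n - \frac{M}{KR}n(n+1) = n\left(2 - \frac{M}{KR}(n+1)\right).$$

For example, with the illustrative values $KR = 40$ and $M = 16$, we have $n = 2$ and the term equals $\frac{2(40-16) + 2(40-32)}{40} = \frac{64}{40} = 1.6 = 2\left(2 - \frac{16}{40}\cdot 3\right)$.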

Consequently, we can rewrite Equation (18) as

$$\bar{\sigma}\_{MSE}^2 = \begin{cases} \frac{\sigma\_n^2}{\beta\_1^2 M}, & \text{if } KR \le M, \\ \left\lfloor \frac{KR}{M} \right\rfloor \left( 2 - \frac{M}{KR} \left( \left\lfloor \frac{KR}{M} \right\rfloor + 1 \right) \right) + \frac{\sigma\_n^2}{\beta\_1^2 M}, & \text{if } KR > M. \end{cases} \tag{20}$$

Note that $\bar{\sigma}\_{MSE}^2$ is independent of $\beta\_2^2$, while it depends on $\beta\_1^2$. Recall that $\beta\_2^2$ is determined by the distance between a UE and its neighboring RUs. Thus, the channel estimation performance obtained by repeatedly reusing orthogonal sequences is not affected by how far the neighboring RUs are located from the UE. It is also clear from Equation (20) that the average MSE decreases as *M* grows. Since the number of orthogonal sequences is limited for a given sequence length, the number of UEs that can be served simultaneously in a cell without pilot contamination is limited. Multi-user systems must accommodate a larger number of UEs, so non-orthogonal sequences may be needed for channel sounding.
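The analysis of Equations (17)–(20) can be checked numerically by building the reuse correlation matrix explicitly and comparing the brute-force interference term with the closed form. In this standard-library Python sketch (hypothetical function names), all off-diagonal unit entries of $\mathbf{C}$ are counted as same-RU interference, which is what the right-hand side of Equation (19) counts under the per-RU reuse assumption above; the noise and fading values in the check are arbitrary.

```python
import math


def correlation_matrix(KR: int, M: int) -> list[list[int]]:
    """Eq. (17): C_ij = 1 if |i - j| is a multiple of M (sequence reused), else 0."""
    return [[1 if abs(i - j) % M == 0 else 0 for j in range(KR)] for i in range(KR)]


def interference_term(KR: int, M: int) -> float:
    """Left side of Eq. (19), counting every off-diagonal |C_ij|^2 of Eq. (17)."""
    C = correlation_matrix(KR, M)
    return sum(C[i][j] ** 2 for i in range(KR) for j in range(KR) if i != j) / KR


def avg_mse(KR: int, M: int, noise_var: float, beta1_sq: float) -> float:
    """Closed form of Eq. (20)."""
    noise = noise_var / (beta1_sq * M)
    if KR <= M:
        return noise
    n = KR // M
    return n * (2 - M * (n + 1) / KR) + noise


# Brute force vs. closed form for an illustrative case (KR = 40, M = 16, noise-free).
print(interference_term(40, 16), avg_mse(40, 16, 0.0, 1.0))
```

Sweeping *KR* for a fixed *M* reproduces the staircase behavior of Equation (20): the interference term stays at zero up to $KR = M$ and then grows each time *KR* passes another multiple of *M*.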

#### *5.2. Supplemental Correlation-Aided Channel Estimation*

As the number of UEs in the cell grows, it becomes impossible to assign orthogonal SRS sequences to all UEs, even with the proposed SRS sequences, because the number of orthogonal sequences is limited to *M*. Thus, we propose a supplemental channel estimation scheme to enhance the channel estimation performance even with correlated SRS sequences. Suppose the DU knows the SRS sequences and their cross-correlations. We rewrite Equation (11) as

$$\hat{\mathbf{h}}\_{r(k),r} = \sum\_{r'=0}^{R-1} \sum\_{j=0}^{K-1} \mathbf{h}\_{r'(j),r} C\_{r(k),r'(j)} + \frac{1}{M} \sum\_{i=0}^{M-1} s\_{r(k)}^{\*} [i] \mathbf{n}\_{r}[i], \tag{21}$$

where $0 \le r < R$ and $0 \le k < K$. This can be expressed in matrix form as

$$
\hat{\mathbf{H}} = \mathbf{H}\mathbf{C} + \mathbf{N},\tag{22}
$$

where $\hat{\mathbf{H}} \in \mathbb{C}^{N\_T\times KR}$, $\mathbf{H} \in \mathbb{C}^{N\_T\times KR}$ and $\mathbf{N} \in \mathbb{C}^{N\_T\times KR}$ are matrices whose columns are $\hat{\mathbf{h}}\_{r(k),r}$, $\mathbf{h}\_{r(k),r}$ and $\mathbf{n}\_{r(k),r}$, respectively, for $0 \le r < R$ and $0 \le k < K$. We let $\mathbf{C} \in \mathbb{C}^{KR\times KR}$ denote the correlation matrix whose entries are $C\_{r(k),r'(j)}$, where $0 \le r, r' < R$ and $0 \le k, j < K$. We define two classes of UEs. The first class is the set of UEs that are assigned mutually orthogonal SRS sequences, and the UEs assigned mutually correlated SRS sequences compose the second class. We reorder $\hat{\mathbf{H}}$, $\mathbf{H}$, $\mathbf{C}$ and $\mathbf{N}$ and partition them as $\hat{\mathbf{H}} = [\hat{\mathbf{H}}\_u \;\; \hat{\mathbf{H}}\_c]$, $\mathbf{H} = [\mathbf{H}\_u \;\; \mathbf{H}\_c]$, $\mathbf{C} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{A} \end{bmatrix}$ and $\mathbf{N} = [\mathbf{N}\_u \;\; \mathbf{N}\_c]$, where the submatrices with subscripts *u* and *c* correspond to the first and the second class of UEs, respectively, and $\mathbf{A}$ is a non-diagonal matrix. Then, Equation (22) can be rewritten in partitioned form as

$$
\begin{bmatrix}
\hat{\mathbf{H}}\_{u} & \hat{\mathbf{H}}\_{c}
\end{bmatrix} = \begin{bmatrix}
\mathbf{H}\_{u} & \mathbf{H}\_{c}
\end{bmatrix} \begin{bmatrix}
\mathbf{I} & \mathbf{0} \\
\mathbf{0} & \mathbf{A}
\end{bmatrix} + \begin{bmatrix}
\mathbf{N}\_{u} & \mathbf{N}\_{c}
\end{bmatrix} \tag{23}
$$

which implies

$$
\hat{\mathbf{H}}\_{u} = \mathbf{H}\_{u} + \mathbf{N}\_{u} \tag{24}
$$

$$
\hat{\mathbf{H}}\_{c} = \mathbf{H}\_{c}\mathbf{A} + \mathbf{N}\_{c}.\tag{25}
$$

It is clear from Equation (24) that $\hat{\mathbf{h}}\_{r(k),r} = \mathbf{h}\_{r(k),r} + \mathbf{n}\_{r(k),r}$ for the first class of UEs, so $\hat{\mathbf{h}}\_{r(k),r}$ can be regarded as a sufficient estimate of the actual channel $\mathbf{h}\_{r(k),r}$ of UE $r(k)$ in the first class. On the other hand, as can be seen from Equation (25), for the second class of UEs, $\hat{\mathbf{h}}\_{r(k),r}$ also includes a linear combination of other UEs' channels. Thus, a supplemental procedure is needed to obtain more reliable channel estimates for UEs in the second class. For this purpose, we propose *correlation-aided channel estimation*, which applies the MMSE algorithm [27] to the estimated channels of the second class of UEs as the supplemental procedure. We obtain the refined channel estimate $\tilde{\hat{\mathbf{H}}}\_{c}$ for UEs in the second class by multiplying $\hat{\mathbf{H}}\_{c}$ by the MMSE nulling matrix $\mathbf{W}$ as

$$
\tilde{\hat{\mathbf{H}}}\_{c} = \hat{\mathbf{H}}\_{c} \mathbf{W} \tag{26}
$$

where

$$\mathbf{W} = \mathbf{A}^H \left(\mathbf{A}\mathbf{A}^H + \mathbf{B}^{-1}\mathbf{I}\right)^{-1} \tag{27}$$

and $\mathbf{B}$ is a diagonal matrix whose diagonal entries are $\beta^2\_{r(k),r}$ corresponding to UEs in the second class. Then, after $\hat{\mathbf{H}}\_{c}$ is refined to $\tilde{\hat{\mathbf{H}}}\_{c}$ by Equation (26), the channel estimate $\hat{\mathbf{H}}$ is replaced by $\hat{\mathbf{H}} = \begin{bmatrix} \hat{\mathbf{H}}\_{u} & \tilde{\hat{\mathbf{H}}}\_{c} \end{bmatrix}$.
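To illustrate Equations (25)–(27), the following sketch simulates a hypothetical second class of UEs and applies the nulling matrix $\mathbf{W}$. It assumes that the noise in $\hat{\mathbf{H}}\_c$ has unit variance, so that $\mathbf{B}^{-1}$ acts as an inverse per-UE SNR. The random unit-modulus sequences, the sizes and the helper names `cn`, `mmse_nulling` and `mse` are illustrative, not taken from the paper. When all entries of $\mathbf{B}$ are equal, this $\mathbf{W}$ coincides with the linear MMSE combiner for the model of Equation (25).

```python
import numpy as np

rng = np.random.default_rng(0)

def cn(*shape):
    """i.i.d. circularly symmetric CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mmse_nulling(A, b2):
    """W = A^H (A A^H + B^{-1})^{-1} as in Eq. (27); b2 is the diagonal of B."""
    return A.conj().T @ np.linalg.inv(A @ A.conj().T + np.diag(1.0 / b2))

# Hypothetical second class: n = 12 UEs with length-8 unit-modulus sequences,
# so the sequences cannot all be orthogonal and A is rank-deficient.
M, n, NT = 8, 12, 64
S = np.exp(2j * np.pi * rng.random((M, n)))
A = S.conj().T @ S / M                      # correlation submatrix with unit diagonal
b2 = np.ones(n)                             # beta^2 relative to unit noise power (assumed 0 dB)

H_c = cn(NT, n) * np.sqrt(b2)               # true channels of the second class
H_hat_c = H_c @ A + cn(NT, n)               # LS output, Eq. (25), unit-variance noise
H_tilde_c = H_hat_c @ mmse_nulling(A, b2)   # refined estimate, Eq. (26)

def mse(E):
    """Mean squared magnitude of an error matrix."""
    return np.mean(np.abs(E) ** 2)

print(mse(H_hat_c - H_c), mse(H_tilde_c - H_c))
```

Because $\mathbf{A}$ is rank-deficient here (12 sequences of length 8), $\mathbf{A}$ itself cannot be inverted, whereas the regularization term keeps $\mathbf{A}\mathbf{A}^H + \mathbf{B}^{-1}$ invertible.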

In summary, we first apply LS estimation to obtain $\hat{\mathbf{h}}\_{r(k),r}$ for all UEs. Next, we define the first and second classes of UEs based on the correlation matrix $\mathbf{C}$: the first class consists of UEs whose SRS sequences are mutually orthogonal, and the remaining UEs form the second class. Then, for the second class of UEs, we additionally apply the MMSE algorithm to the LS channel estimates as the supplemental procedure. Finally, we use the LS estimates for the first class and the supplemental estimates for the second class as the channel estimate. If the number of UEs is less than or equal to the number of mutually orthogonal SRS sequences, only the LS estimation of Equation (9) is needed. Otherwise, the proposed correlation-aided supplemental estimation is applied after the LS estimation.
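The classification step can be sketched directly from the correlation matrix $\mathbf{C}$. In this minimal sketch, which is ours rather than the authors' implementation, a UE is placed in the first class only if its sequence is orthogonal to every other UE's sequence, which is the condition under which reordering yields the block form with an identity block. The DFT sequences, the function name `split_ue_classes` and the tolerance `tol` are hypothetical choices.

```python
import numpy as np

def split_ue_classes(C, tol=1e-9):
    """Return (first_class, second_class) index arrays from the correlation matrix C.

    A UE is in the first class only if its sequence is orthogonal to the sequence
    of every other UE, so that reordering gives C = [[I, 0], [0, A]].
    """
    off_diag = np.abs(C - np.diag(np.diag(C)))
    correlated = off_diag.max(axis=1) > tol
    return np.flatnonzero(~correlated), np.flatnonzero(correlated)

# Hypothetical example: length-4 DFT sequences, with UEs 2 and 3 reusing the same one.
M = 4
m = np.arange(M)
dft = np.exp(2j * np.pi * np.outer(m, m) / M)   # column q is an orthogonal sequence
S = dft[:, [0, 1, 2, 2, 3]]                     # one column per UE
C = S.conj().T @ S / M                          # C[u, v] = (1/M) s_u^H s_v

first, second = split_ue_classes(C)
A = C[np.ix_(second, second)]                   # submatrix used by the MMSE step
print(first, second)   # [0 1 4] [2 3]
```

Only the second-class submatrix $\mathbf{A}$ enters the matrix inversion, so the supplemental cost grows with the number of correlated UEs rather than with $KR$.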

#### **6. Numerical Results**

We evaluate the performance of the proposed SRS and the proposed supplemental channel estimation algorithm in terms of MSE in a multi-user m-MIMO DAS environment. For comparison, the MSE obtained with the conventional LTE/LTE-A SRS scheme is also shown. For the proposed scheme, we consider the following three types:


We denote the large-scale fading factor between UEs and their serving RU by $\beta\_1$ and that between UEs and other neighboring RUs by $\beta\_2$, i.e., $\beta\_{r(k),r} = \beta\_1$ and $\beta\_{r(k),r'} = \beta\_2$ for all $k$, $r$ and $r' \neq r$. We let $\beta\_2/\beta\_1 = 0.7692$, which follows from assuming that the distance from a UE to other neighboring RUs is 1.3 times its distance to the serving RU, with the attenuation factor $a = 2$. Each RU serves the same number of UEs. We choose $L\_c = 8$ in Equations (5) and (7), and $L\_p = 3$ when applying the phase rotation $e^{j2\pi \frac{p}{L\_p} m}$ in types (b) and (c) of the proposed scheme. We consider a block fading channel and the International Telecommunication Union Radiocommunication Sector (ITU-R) Ped A and ITU-R Veh A channels [28]; the ITU-R Ped A [29] and ITU-R Veh A channels are examples of frequency-selective channels. The simulation parameters used for performance evaluation are listed in Table 1.
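The value $\beta\_2/\beta\_1 = 0.7692$ can be reproduced with a one-line calculation. Since the path-loss model is not restated in this section, the sketch below assumes that the squared large-scale fading factor decays as $d^{-a}$, so that $\beta \propto d^{-a/2}$; under that assumption a distance ratio of 1.3 with $a = 2$ gives the quoted number.

```python
# Assumption: beta^2 is proportional to d**(-a), i.e. beta is proportional to d**(-a/2).
a = 2.0                 # attenuation factor used in the simulations
distance_ratio = 1.3    # distance to a neighbouring RU / distance to the serving RU
beta_ratio = distance_ratio ** (-a / 2)   # beta_2 / beta_1
power_ratio = beta_ratio ** 2             # beta_2^2 / beta_1^2
print(round(beta_ratio, 4))    # 0.7692
print(round(power_ratio, 4))   # 0.5917
```

The power ratio corresponds to roughly 2.28 dB of extra attenuation toward the neighboring RUs.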

We depict the average MSE of channel estimation with respect to the total number of UEs in the cell for the block fading, ITU-R Ped A and ITU-R Veh A channels in Figures 3–5, respectively. In each figure, we plot the average MSE obtained with the conventional SRS of LTE/LTE-A systems and with the three types of the proposed scheme.

We also plot the average MSE obtained by applying the supplemental MMSE estimation to all UEs, instead of applying it selectively only to UEs in the second class; in this case, the phase rotation $e^{j2\pi \frac{p}{L\_p} m}$ is applied, and the corresponding curves are labeled 'nonselective supplemental MMSE'. In Figure 3, we also include the average MSE predicted analytically by Equation (20) for type (a) of the proposed scheme over the block fading channel.


**Table 1.** Parameters used in performance evaluation.

**Figure 3.** Average MSE of channel estimation with respect to the total number of UEs over block fading channel.

**Figure 4.** Average MSE of channel estimation with respect to the total number of UEs over ITU-R Ped A channel.

**Figure 5.** Average MSE of channel estimation with respect to the total number of UEs over ITU-R Veh A channel.

It is observed that the channel estimation MSE of every SRS scheme increases rapidly once the number of UEs in service exceeds the number of orthogonal SRS sequences, which is 16 for the conventional LTE/LTE-A scheme and *M* (=48) for the proposed scheme. For more than 48 UEs, the proposed scheme still provides much lower channel estimation MSE than the LTE/LTE-A scheme. Even repeatedly reusing the *M* (=48) orthogonal SRS sequences generated in type (a) significantly lowers the channel estimation MSE for any number of UEs. Applying the phase rotation $e^{j2\pi \frac{p}{L\_p} m}$ in SRS generation lowers the MSE further, and the selective supplemental MMSE estimation improves the channel estimation performance even more, at the cost of increased complexity. Note that the computational complexity of the selective supplemental MMSE estimation is, in general, $O(n^3)$ because it requires a matrix inversion, where $n$ is the dimension of the square matrix $\mathbf{A}$. Taking as a reference the case in which the *M* orthogonal sequences are repeatedly reused to generate SRS sequences, the dimension of $\mathbf{A}$ follows from Equation (17) as $n = \min(KR, 2\max(0, KR - M))$. Applying the supplemental MMSE estimation nonselectively to all UEs degrades the performance when the number of UEs is not sufficiently high. In Figure 3, the analytic prediction of the MSE for the block fading channel matches the numerical result very well. The proposed scheme of type (c) shows the best channel estimation performance in terms of MSE for all numbers of UEs. When 120 UEs are served through four RUs, the MSE gain of type (c) over the LTE/LTE-A system is about 6 dB in the block fading and ITU-R Ped A channels and about 8 dB in the ITU-R Veh A channel.
The performance gain of the proposed scheme is thus observed in block fading as well as frequency-selective channels.
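The dimension formula $n = \min(KR, 2\max(0, KR - M))$ can be checked by counting. Equation (17) is not reproduced in this section, so the sketch assumes a hypothetical cyclic assignment in which UE $u$ (numbered $0, \ldots, KR-1$ across all RUs) receives orthogonal sequence $u \bmod M$. Every UE whose sequence is shared with another UE belongs to the second class and contributes one row and one column to $\mathbf{A}$.

```python
from collections import Counter

def dim_A_formula(KR, M):
    """n = min(KR, 2 max(0, KR - M)) for repeated reuse of M orthogonal sequences."""
    return min(KR, 2 * max(0, KR - M))

def dim_A_count(KR, M):
    """Count UEs whose sequence index (u mod M, an assumed rule) is shared."""
    seq = [u % M for u in range(KR)]
    uses = Counter(seq)
    return sum(1 for s in seq if uses[s] > 1)

M = 48
for KR in (40, 48, 60, 96, 120):
    n = dim_A_formula(KR, M)
    assert n == dim_A_count(KR, M)
    print(KR, n, n ** 3)   # n**3 is only a rough proxy for the O(n^3) inversion cost
```

For the 120-UE, four-RU case discussed above, every sequence is reused, so $n = 120$ and the supplemental step inverts a $120 \times 120$ matrix.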

Conventional LTE/LTE-A systems may not be able to employ m-MIMO transmission effectively to serve a large number of UEs, mainly because of the pilot contamination problem. The proposed SRS and supplemental channel estimation, however, can relieve pilot contamination to a significant degree. It is therefore expected that the proposed scheme can be effectively adopted in multi-user m-MIMO systems.

#### **7. Conclusions**

In this paper, we proposed a method of generating SRS sequences that improves channel estimation performance without increasing the sequence length. The proposed SRS sequences can easily be generated by applying a phase rotation to the base sequence. Even though the proposed SRS sequences are mutually correlated when the number of UEs exceeds the sequence length, they significantly lower the MSE of channel estimation and thus alleviate the pilot contamination problem. We also proposed a supplemental correlation-aided channel estimation scheme that further improves the channel estimation performance of multi-user m-MIMO technology in DAS environments. It is expected that the proposed SRS scheme and the supplemental channel estimation scheme can be effectively adopted in m-MIMO systems.

**Author Contributions:** Writing, J.W.L.; Software, S.Y.; Conceptualization, J.W.L.; Supervision, J.W.L.

**Funding:** This research was supported by the Chung-Ang University Graduate Research Scholarship in 2017 and by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education (NRF-2016R1D1A1B03933174).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Derivation of Equation (8)**

With $s\_{r(k)}[m] = x^{(\alpha,s,p)}[m]$ and $s\_{r'(j)}[m] = x^{(\alpha',s',p')}[m]$, we obtain Equation (8) by using Equation (7) as

$$\begin{split}
C\_{r(k),r'(j)} & \triangleq \frac{1}{M} \mathbf{s}\_{r(k)}^H \mathbf{s}\_{r'(j)} = \frac{1}{M} \sum\_{m=0}^{M-1} s\_{r(k)}^{\*}[m]\, s\_{r'(j)}[m] \\
&= \frac{1}{M} \sum\_{m=0}^{L\_c-1} e^{-j2\pi\left(\left(\frac{\alpha}{M} + \frac{p}{L\_p} + \frac{s}{L\_c}\right)m - \frac{qm(m+1)}{2L\_c}\right)} e^{j2\pi\left(\left(\frac{\alpha'}{M} + \frac{p'}{L\_p} + \frac{s'}{L\_c}\right)m - \frac{qm(m+1)}{2L\_c}\right)} \\
&\quad + \frac{1}{M} \sum\_{m=L\_c}^{M-1} e^{-j2\pi\left(\left(\frac{\alpha}{M} + \frac{p}{L\_p} + \frac{s}{L\_c}\right)m - \frac{q(m-L\_c)(m-L\_c+1)}{2L\_c}\right)} e^{j2\pi\left(\left(\frac{\alpha'}{M} + \frac{p'}{L\_p} + \frac{s'}{L\_c}\right)m - \frac{q(m-L\_c)(m-L\_c+1)}{2L\_c}\right)} \\
&= \frac{1}{M} \sum\_{m=0}^{M-1} e^{-j2\pi\left(\frac{\alpha-\alpha'}{M} + \frac{p-p'}{L\_p} + \frac{s-s'}{L\_c}\right) m}
\end{split}$$

#### **Appendix B. Derivation of Equation (15)**

By using Equations (11), (13) and (14), we obtain Equation (15) as

$$\begin{split}
\bar{\sigma}^2\_{MSE} &= \frac{1}{KR}\sum\_{k=0}^{K-1}\sum\_{r=0}^{R-1}\frac{E\left[\left\|\hat{\mathbf{h}}\_{r(k),r}-\mathbf{h}\_{r(k),r}\right\|^2\right]}{\beta^2\_{r(k),r}N\_T} \\
&= \frac{1}{KRN\_T}\sum\_{k=0}^{K-1}\sum\_{r=0}^{R-1}\frac{1}{\beta^2\_{r(k),r}}E\left[\left\|\sum\_{j=0,j\neq k}^{K-1}\mathbf{h}\_{r(j),r}C\_{r(k),r(j)}+\sum\_{r'=0,r'\neq r}^{R-1}\sum\_{j=0}^{K-1}\mathbf{h}\_{r'(j),r}C\_{r(k),r'(j)}+\frac{1}{M}\sum\_{i=0}^{M-1}s\_{r(k)}^{\*}[i]\,\mathbf{n}\_{r}[i]\right\|^2\right] \\
&= \frac{1}{KRN\_T}\sum\_{k=0}^{K-1}\sum\_{r=0}^{R-1}\frac{1}{\beta^2\_{r(k),r}}\left[\sum\_{j=0,j\neq k}^{K-1}E\left[\left\|\mathbf{h}\_{r(j),r}\right\|^2\right]\left|C\_{r(k),r(j)}\right|^2+\sum\_{r'=0,r'\neq r}^{R-1}\sum\_{j=0}^{K-1}E\left[\left\|\mathbf{h}\_{r'(j),r}\right\|^2\right]\left|C\_{r(k),r'(j)}\right|^2+\frac{1}{M^2}\sum\_{i=0}^{M-1}E\left[\left\|\mathbf{n}\_{r}[i]\right\|^2\right]\right] \\
&= \frac{1}{KRN\_T}\sum\_{k=0}^{K-1}\sum\_{r=0}^{R-1}\frac{1}{\beta^2\_{r(k),r}}\left[\sum\_{j=0,j\neq k}^{K-1}N\_T\beta^2\_{r(j),r}\left|C\_{r(k),r(j)}\right|^2+\sum\_{r'=0,r'\neq r}^{R-1}\sum\_{j=0}^{K-1}N\_T\beta^2\_{r'(j),r}\left|C\_{r(k),r'(j)}\right|^2+\frac{1}{M^2}\sum\_{i=0}^{M-1}N\_T\sigma\_n^2\right] \\
&= \frac{1}{KR}\sum\_{k=0}^{K-1}\sum\_{r=0}^{R-1}\frac{1}{\beta^2\_{r(k),r}}\left[\sum\_{j=0,j\neq k}^{K-1}\beta^2\_{r(j),r}\left|C\_{r(k),r(j)}\right|^2+\sum\_{r'=0,r'\neq r}^{R-1}\sum\_{j=0}^{K-1}\beta^2\_{r'(j),r}\left|C\_{r(k),r'(j)}\right|^2+\frac{\sigma\_n^2}{M}\right].
\end{split}$$
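A short Monte Carlo run can be used to sanity-check the closed form above. The sketch assumes a small hypothetical system (two RUs, three UEs per RU, sequence length four) with random unit-modulus sequences, so that $|s[i]| = 1$ as the derivation requires, and uses the ratio $\beta\_2/\beta\_1 = 0.7692$ from Section 6; the noise variance, the helper `cn` and all sizes are illustrative. It forms the LS estimate of Equation (11) from simulated received SRS samples and compares the empirical normalized MSE with the last line of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
R, K, M, NT, trials = 2, 3, 4, 16, 2000   # hypothetical small system: KR = 6 > M
sigma2 = 0.1                               # noise variance per antenna (assumed)
beta1, beta2 = 1.0, 0.7692                 # serving / neighbouring RU, as in Section 6

def cn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

U = R * K                                         # UE index u = r'*K + j, i.e. UE r'(j)
S = np.exp(2j * np.pi * rng.random((M, U)))       # hypothetical unit-modulus SRS, column u
C = S.conj().T @ S / M                            # C[u, v] = (1/M) s_u^H s_v
owner = np.repeat(np.arange(R), K)                # serving RU of each UE
beta = np.where(owner[:, None] == np.arange(R), beta1, beta2)   # beta[u, r]

mc, analytic = [], []
for r in range(R):
    H = cn(trials, NT, U) * beta[:, r]            # column u ~ CN(0, beta[u, r]^2 I)
    Y = H @ S.T + np.sqrt(sigma2) * cn(trials, NT, M)   # received SRS samples at RU r
    H_hat = Y @ S.conj() / M                      # LS estimate, Eq. (11)
    for k in range(K):
        u = r * K + k
        err = np.sum(np.abs(H_hat[..., u] - H[..., u]) ** 2, axis=-1).mean()
        mc.append(err / (beta[u, r] ** 2 * NT))
        interf = sum(beta[v, r] ** 2 * abs(C[u, v]) ** 2 for v in range(U) if v != u)
        analytic.append((interf + sigma2 / M) / beta[u, r] ** 2)

print(np.mean(mc), np.mean(analytic))
```

The two printed values should agree to within Monte Carlo error; increasing `trials` tightens the agreement.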

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Efficient Pilot Decontamination Schemes in 5G Massive MIMO Systems**

#### **Omar A. Saraereh <sup>1</sup>, Imran Khan <sup>2</sup>, Byung Moo Lee <sup>3,\*</sup> and Ashraf Tahat <sup>1</sup>**


Received: 26 November 2018; Accepted: 27 December 2018; Published: 3 January 2019

**Abstract:** Massive multiple-input multiple-output (MIMO) is an emerging technology for 5G wireless communication systems that has the potential to provide high spectral efficiency and improved link reliability and to accommodate a large number of users. To address the problem of pilot contamination in massive MIMO systems, this paper proposes two algorithms to mitigate it. The first algorithm uses path loss to perform user grouping (PLUG), dividing users into center and edge user groups according to their level of pilot contamination. It assigns the same pilot sequences to center users, which suffer only slightly from pilot contamination, and orthogonal pilot sequences to edge users, which suffer severely from it. However, PLUG assumes that the number of edge users is the same in every cell. To overcome this limitation, we propose an improved PLUG (IPLUG) algorithm, which provides decision parameters for user grouping and selects the numbers of center and edge users in each cell dynamically. The algorithm thus prevents users in good channel conditions from being wrongly classified as edge users, which would cause large pilot overhead, and also identifies the users with the worst channel conditions and prevents them from being wrongly placed in the center user group. The second algorithm for pilot decontamination uses pseudo-random codes assigned to different cells: each cell scrambles its users' pilots with its code to obtain the transmitted pilots. Pilot contamination arises because different cells reuse the same set of orthogonal pilots, and pseudo-random sequences have good cross-correlation characteristics; this paper exploits that property to improve the orthogonality of pilots between different cells. Simulation results show that the proposed algorithms effectively improve channel estimation performance and achievable rate compared with other schemes.

**Keywords:** Massive MIMO; pilot decontamination; MSE; dynamic user scheduling; dynamic pilot allocation

#### **1. Introduction**

With the advent of big data and the explosive growth in the number of subscribers, the demand for communication networks has exploded, and existing fourth-generation (4G) mobile networks are increasingly unable to meet users' needs [1]. Massive MIMO is a key 5G wireless technology for increasing spectral efficiency [2–4]. It can be deployed in various communication paradigms, such as multicarrier communication, Orthogonal Frequency Division Multiplexing (OFDM) and cooperative communications [5–12]. Because the uplink and downlink use different frequencies in a Frequency Division Duplex (FDD) system, current research on massive MIMO is generally based on a Time Division Duplex (TDD) system [13], which uses channel reciprocity to obtain the required channel state information (CSI) [14]. However, the limited coherence interval limits the number of orthogonal pilots that can be allocated to users, so pilot reuse is inevitable. When users in different cells use the same pilot, the Base Station (BS) cannot distinguish between them; this is pilot contamination [15].

In view of the fact that pilot contamination is caused by different cells reusing the same pilots, this paper proposes two algorithms for pilot decontamination in massive MIMO systems. The first algorithm is based on path loss user grouping (PLUG), while the second is based on pseudo-random codes [16]. In the PLUG algorithm, users are divided into a central user group and an edge user group according to their distance from the BS, and pilots are then allocated to each group according to the corresponding principle. At the cost of a slight performance loss for central users, the communication outage probability of cell-edge users is significantly reduced and their quality of service (QoS) is significantly improved. Building on the PLUG strategy, we further propose an improved PLUG (IPLUG) strategy, in which the decision parameters are selected dynamically to determine the number of edge users; this realizes a dynamic division of central and edge users and improves the flexibility of the strategy. The second algorithm is based on pseudo-random codes: each cell is assigned a code sequence with a different delay, which distinguishes it from other cells, and the user pilots in each cell are synchronously scrambled with the cell's code to obtain new user pilots, enhancing the orthogonality of user pilots between different cells. We also derive and analyze the mean square error (MSE) of channel estimation under the proposed pilot design. The proposed strategy does not require large-scale cooperation among BSs; it only needs the distances between the users and the BS in the current cell and the decision parameter, and it completes the dynamic grouping using the division principle. The proposed model has low complexity and requires few parameters, which makes it suitable for deployment in practical massive MIMO scenarios. It is shown that the proposed pilot design can effectively improve channel estimation performance, and numerical simulation results verify that the proposed pilot design algorithms based on PLUG and pseudo-random codes greatly improve channel estimation performance and effectively reduce pilot contamination in the system.

#### **2. State-of-the-Art Pilot Decontamination Algorithms**

The authors in [8] pointed out that in multi-cell massive MIMO systems, when the number of BS antennas tends to infinity, system performance is mainly limited by pilot contamination, and research on pilot contamination in massive MIMO systems has continued ever since. The authors in [17] analyzed the influence of pilots on the performance of massive multi-cell MIMO systems and proposed a minimum mean square error (MMSE) multi-cell precoding method. Compared with traditional zero-forcing (ZF) precoding, the MMSE scheme significantly improves the system gain in the absence of pilot contamination, but when pilot contamination is severe, the increase in system gain is limited. In [18], a pilot contamination precoding (PCP) scheme is proposed in which the precoding matrix is designed with ZF to obtain an infinite signal-to-interference-plus-noise ratio (SINR), but its performance is not ideal when the number of antennas is limited. Building on [18], the authors of [19] proposed an optimal algorithm for finding the optimal pilot contamination precoding matrix and a simple suboptimal algorithm, and studied both for a limited number of BS antennas; compared with traditional MIMO, the system gain can be greatly improved. In [20], an intelligent pilot allocation scheme is proposed that maximizes the uplink SINR of all users in the target cell under large-scale fading. The authors in [21,22] proposed a pilot power control method, which was successfully applied to classified cells, effectively reducing pilot contamination and improving the downlink achievable rate of the entire system. Pilot contamination is an inherent problem of massive MIMO systems, and no single method solves it in all situations.

Ideally, massive MIMO would use Fully Orthogonal Pilot Scheduling (FOPS), assigning an orthogonal pilot to each user, but the pilot sequence length and the pilot set size are limited by the channel coherence time. In a typical scenario, the maximum number of orthogonal pilot sequences within a 1 ms coherence time is about 200 [23]. Therefore, massive MIMO systems generally adopt Fully Reused Pilot Scheduling (FRPS). Since the pilots of users in different cells are then non-orthogonal or identical, pilot contamination is unavoidable [16]. When the number of BS antennas tends to infinity and there is no cooperation, the main factor limiting system performance is the inter-cell interference (ICI) caused by pilot contamination [23], so mitigating pilot contamination is critical to improving the performance of massive MIMO systems.

At present, pilot decontamination in massive MIMO is mainly approached from three aspects: channel estimation methods [24–28], precoding methods [29,30] and pilot allocation strategies [21]. The authors in [25] utilize a diagonal Jacket matrix for pilot reduction, which offers low complexity, excellent eigenvector properties, a constant diagonal treatment and energy harvesting. The drawback of this method is that it assumes perfect, fast multipath channel estimation. In [26], the authors propose a pilot contamination mitigation algorithm that uses low-rate coordination between cells during the channel estimation phase itself. The coordination exploits additional second-order statistical information about the user channels, which is shown to offer a powerful way of discriminating between interfering users, even with strongly correlated pilot sequences. The authors in [27] propose a highly efficient Discrete Fourier Transform (DFT)-based approximation of the Linear Minimum Mean Square Error (LMMSE) estimator to reduce pilot contamination. The authors in [28] propose an eigenvalue-decomposition-based channel estimation approach, which estimates the channel blindly from the received data. The approach exploits the asymptotic orthogonality of the channel vectors in massive MIMO systems and uses a short training sequence to resolve the multiplicative ambiguity of the received signal covariance. In [29], the authors propose a ZF time-shifted pilot scheme, which is known to mitigate pilot contamination effectively with conjugate beamforming. The authors in [30] proposed an interference-cancellation (IC) precoding scheme for pilot contamination mitigation in massive MIMO systems and investigated quality-of-service (QoS)-guaranteed user scheduling, which is improved by their proposed scheme. The authors in [31] proposed a pilot contamination reduction scheme based on complex exponential basis expansions. The Linear Time-Varying (LTV) channel is estimated, the optimal pilot symbols are derived under the minimum mean square error (MMSE) criterion, and it is shown that the optimal pilot strategy is to group consecutive pilot tones into pilot clusters and to distribute all pilot clusters uniformly in the frequency domain.

Regarding pilot allocation strategies, in [30], under the principle of maximizing the Signal-to-Leakage-and-Noise Ratio (SLNR) in precoding, the same pilots are adopted for users with small ICI, while users with high mutual interference use orthogonal pilots, improving the overall performance of the system under pilot contamination. However, this pilot scheduling scheme requires large-scale cooperation between BSs and needs the large-scale fading factor of each user, which is very difficult to obtain in a massive MIMO system given its complex structure. The authors in [32] proposed a pilot allocation strategy based on power control, which staggers the pilot transmission time slots of cells with relatively large cross gains; however, it is not easy to ensure dynamic pilot synchronization across several cells, and the choice of mechanism directly affects the performance of the strategy. The authors in [33] proposed a coordinated pilot allocation scheme, which allocates pilot sequences by identifying pilot usage conditions and selects the user-multiplexed pilots that are least affected by pilot contamination, thereby reducing pilot contamination; however, the required inter-cell cooperation imposes additional burden and overhead on the system. The authors in [34] proposed an improved strategy based on soft pilot multiplexing, in which grouping parameters are introduced for secondary grouping; however, the path loss factor, shadow fading and large-scale fading factor of each user must be known. Obtaining such parameters is complicated, and the computational complexity of the secondary grouping is higher.

#### **3. System Model**

#### *3.1. The System Model*

Figure 1 shows the massive MIMO system model. The channel propagation matrix from the *K* users in the *j*th cell to the BS of the *i*th cell is:

$$\mathbf{H}\_{ij} = \mathbf{G}\_{ij} \sqrt{\mathbf{D}\_{ij}} \tag{1}$$

where $\mathbf{G}\_{ij} = [\mathbf{g}\_{ij1}, \mathbf{g}\_{ij2}, \ldots, \mathbf{g}\_{ijK}]$ and $\mathbf{g}\_{ijk} \in \mathbb{C}^{M \times 1}$ is a small-scale fading vector; these vectors are mutually independent, and each follows a zero-mean complex Gaussian distribution with covariance $\mathbf{I}\_M$, i.e., $\mathbf{g}\_{ijk} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}\_M)$. $\mathbf{D}\_{ij} = \mathrm{diag}(\beta\_{ij1}, \beta\_{ij2}, \ldots, \beta\_{ijK})$ is a $K \times K$ diagonal matrix describing the large-scale fading from each user in the *j*th cell to the BS of the *i*th cell, where $\beta\_{ijk} = z\_{ijk}/(r\_{ijk}/R)^{\alpha}$ is the large-scale fading factor from user *k* in the *j*th cell to the *i*th BS. Here, the shadow fading $z\_{ijk}$ follows a log-normal distribution, i.e., $10\lg z\_{ijk}$ is Gaussian with zero mean and standard deviation $\sigma\_{\text{shadow}}$; *R* is the cell radius; $r\_{ijk}$ is the distance from user *k* in the *j*th cell to the *i*th BS; $\alpha$ is the path loss exponent; and $z\_{ijk}$, $r\_{ijk}$ and $\alpha$ are mutually independent. It is further assumed that the antennas of each BS are arranged compactly enough that the large-scale fading of a given user is the same on all propagation paths, whereas the large-scale fading of different users is independent, and that the channel is reciprocal, i.e., the uplink propagation matrix is the conjugate transpose of the downlink propagation matrix [22]. In a multi-cell multi-user massive MIMO system, the number of BS antennas is large, and the channel vectors become asymptotically orthogonal, namely:

$$
\frac{\mathbf{H}\_{ij}^{H}\mathbf{H}\_{ij}}{M} = \sqrt{\mathbf{D}\_{ij}} \left(\frac{\mathbf{G}\_{ij}^{H}\mathbf{G}\_{ij}}{M}\right) \sqrt{\mathbf{D}\_{ij}} \approx \mathbf{D}\_{ij}, \quad M \gg K \tag{2}
$$

**Figure 1.** Massive MIMO multi-cell multi-user TDD system model.
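Equations (1) and (2) can be illustrated numerically. The sketch below draws large-scale fading factors from the log-normal shadowing model described above and small-scale fading from $\mathcal{CN}(0, 1)$, then measures how far $\mathbf{H}\_{ij}^H\mathbf{H}\_{ij}/M$ is from $\mathbf{D}\_{ij}$ as the number of antennas grows. The user distances, the path loss exponent of 3.8, the 8 dB shadowing standard deviation and the function names are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def large_scale_fading(r, R, alpha, sigma_shadow_db):
    """beta = z / (r/R)^alpha with 10*lg(z) ~ N(0, sigma_shadow_db^2)."""
    r = np.asarray(r, dtype=float)
    z = 10 ** (sigma_shadow_db * rng.standard_normal(r.shape) / 10)
    return z / (r / R) ** alpha

def channel(M, beta):
    """H = G sqrt(D), Eq. (1), with G having i.i.d. CN(0, 1) entries."""
    K = beta.size
    G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    return G * np.sqrt(beta)            # column k scaled by sqrt(beta_k)

# Hypothetical drop: K = 4 users at 0.3R, 0.6R, 0.9R and 1.2R from the BS.
beta = large_scale_fading([0.3, 0.6, 0.9, 1.2], R=1.0, alpha=3.8, sigma_shadow_db=8.0)
D = np.diag(beta)

def hardening_error(M):
    """Relative Frobenius distance between H^H H / M and D, cf. Eq. (2)."""
    H = channel(M, beta)
    return np.linalg.norm(H.conj().T @ H / M - D) / np.linalg.norm(D)

for M in (16, 256, 4096):
    print(M, hardening_error(M))
```

The printed error shrinks roughly like $1/\sqrt{M}$, which is the sense in which the approximation in Equation (2) holds for $M \gg K$.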

#### *3.2. Causes of Pilot Contamination*

In a massive MIMO multi-cell multi-user TDD system, in each coherence time the BS estimates the channels from the uplink pilot signals sent by the users and then performs signal detection and downlink precoding. At the beginning of each coherence time, all users in all cells simultaneously transmit pilot sequences. Suppose $\boldsymbol{\psi}\_i = (\boldsymbol{\psi}\_{1i}, \boldsymbol{\psi}\_{2i}, \ldots, \boldsymbol{\psi}\_{Ki})^T$ is the $K \times \tau$ pilot sequence matrix of all users in the *i*th cell ($\tau$ is the sequence length), which satisfies $\boldsymbol{\psi}\_i\boldsymbol{\psi}\_i^H = \mathbf{I}\_K$, where $\mathbf{I}\_K$ is the $K \times K$ identity matrix. Under the FRPS policy, all cells reuse the same pilot matrix, and the pilot matrix received in the uplink by the BS of the *i*th cell is:

$$\boldsymbol{y}\_{i}^{p} = \sqrt{p\_{p}} \left(\sum\_{j=1}^{L} \boldsymbol{H}\_{ij} \boldsymbol{\Psi}\_{i}\right) + \boldsymbol{n}\_{i}^{p} \tag{3}$$

where $p\_p$ is the pilot transmit power and $\boldsymbol{n}\_i^p$ is the $M \times \tau$ additive white Gaussian noise matrix. After receiving the pilot signal, the BS performs uplink channel estimation. The channel estimate $\hat{\boldsymbol{H}}\_{ii}$ of the target cell is obtained by LS channel estimation as:

$$\hat{\boldsymbol{H}}\_{ii} = \frac{1}{\sqrt{p\_p}} \boldsymbol{y}\_i^p \boldsymbol{\Psi}\_i^H = \boldsymbol{H}\_{ii} + \sum\_{j \neq i} \boldsymbol{H}\_{ij} + \frac{1}{\sqrt{p\_p}} \boldsymbol{n}\_i^p \boldsymbol{\Psi}\_i^H \tag{4}$$

It can be seen from Equation (4) that, in addition to the target channel and noise, the channel estimate $\hat{\boldsymbol{H}}\_{ii}$ of the *i*th cell contains the superposition of the channel propagation matrices from the other cells to the target BS; this is pilot contamination.
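The following sketch reproduces Equations (3) and (4) for a hypothetical three-cell system under FRPS, where every cell reuses the same $K$ orthonormal pilot rows. It checks that the LS estimate equals the target channel plus all co-pilot channels from the other cells plus the filtered noise term, which is exactly the contamination described above. The sizes, the DFT-based pilots, the noise level and the helper `cn` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, M, tau, p_p = 3, 4, 64, 8, 1.0        # hypothetical sizes; target cell i = 0

def cn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# K orthonormal pilot rows of length tau (psi @ psi^H = I_K), reused by every cell (FRPS).
t = np.arange(tau)
psi = np.exp(2j * np.pi * np.outer(np.arange(K), t) / tau) / np.sqrt(tau)

beta = rng.uniform(0.05, 1.0, size=(L, K))              # beta_{0jk}, hypothetical
H = [cn(M, K) * np.sqrt(beta[j]) for j in range(L)]     # H_{0j} = G_{0j} sqrt(D_{0j}), Eq. (1)
n_p = 0.1 * cn(M, tau)                                  # pilot-phase noise
y_p = np.sqrt(p_p) * sum(Hj @ psi for Hj in H) + n_p    # Eq. (3)
H_hat = y_p @ psi.conj().T / np.sqrt(p_p)               # LS estimate, Eq. (4)

# The estimate is the target channel plus every co-pilot channel plus filtered noise.
contamination = sum(H[1:])
residual = H_hat - (H[0] + contamination)
print(np.allclose(residual, n_p @ psi.conj().T / np.sqrt(p_p)))   # True
print(np.linalg.norm(contamination) / np.linalg.norm(H[0]))
```

The second printed value gives a rough sense of how strong the contamination is relative to the desired channel for this particular draw.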

After the pilot phase, all users transmit data signals to the BS over the same time-frequency resources. The signal received by the BS of the *i*th cell is:

$$y\_i^u = \sqrt{p\_u} \sum\_{j=1}^{L} \sum\_{k=1}^{K} h\_{ijk} \mathbf{x}\_{jk}^u + \mathbf{n}\_i^u \tag{5}$$

where $x\_{jk}^u$ is the data symbol transmitted by user *k* in the *j*th cell; $p\_u$ is the average uplink transmit power of the user data symbols; $\mathbf{h}\_{ijk}$ is the channel vector from user *k* in the *j*th cell to the BS of the *i*th cell, i.e., the *k*th column of $\mathbf{H}\_{ij}$; and $\mathbf{n}\_i^u$ is the $M \times 1$ additive white Gaussian noise vector. Using the channel estimate $\hat{\mathbf{H}}\_{ii}$ of Equation (4) and the received signal vector $\mathbf{y}\_i^u$, the BS applies matched filter (MF) detection, and the estimate $\hat{x}\_{ik}^u$ of the data symbol transmitted by user *k* in the *i*th cell is:

$$\hat{x}\_{ik}^{u} = \hat{\mathbf{h}}\_{iik}^{H} \mathbf{y}\_i^{u} = \left(\sum\_{j=1}^{L} \mathbf{h}\_{ijk}^{H} + \mathbf{v}\_{ik}^{H}\right) \left(\sqrt{p\_{u}} \sum\_{l=1}^{L} \sum\_{k'=1}^{K} \mathbf{h}\_{ilk'} x\_{lk'}^{u} + \mathbf{n}\_i^{u}\right) \tag{6}$$

where $\mathbf{v}\_{ik}$ is the *k*th column of the matrix $\mathbf{n}\_i^p \boldsymbol{\psi}\_i^H/\sqrt{p\_p}$. When the number of BS antennas *M* tends to infinity, Equation (2) shows that the channels of the massive MIMO system become asymptotically orthogonal, and Equation (6) asymptotically becomes:

$$\hat{x}\_{ik}^{u} \approx M \sqrt{p\_{u}} \left( \beta\_{iik} x\_{ik}^{u} + \sum\_{j \neq i} \beta\_{ijk} x\_{jk}^{u} \right) \tag{7}$$
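The asymptotic behavior in Equation (7) can be checked with a large but finite $M$. This sketch assumes hypothetical large-scale fading factors for a three-cell, two-user system, unit-modulus data symbols, and a contaminated estimate formed as the sum of the co-pilot channels plus a small independent error standing in for $\mathbf{v}\_{ik}$. It compares the MF output of Equation (6) with the right-hand side of Equation (7); all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, p_u = 3, 2, 1.0                         # hypothetical: 3 cells, 2 users each; target cell i = 0
beta = np.array([[1.0, 0.8],                  # beta[j, k] = beta_{0jk}
                 [0.2, 0.1],
                 [0.05, 0.3]])
x = np.exp(2j * np.pi * rng.random((L, K)))   # unit-modulus data symbols x_{jk}

def cn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mf_detect(M, k, sigma_n=1.0, sigma_v=0.1):
    """MF output h_hat^H y of Eq. (6) for user k of cell 0."""
    h = cn(L, K, M) * np.sqrt(beta)[:, :, None]                          # h_{0jk}, shape (L, K, M)
    y = np.sqrt(p_u) * np.einsum('jkm,jk->m', h, x) + sigma_n * cn(M)    # Eq. (5)
    h_hat = h[:, k, :].sum(axis=0) + sigma_v * cn(M)                    # contaminated estimate
    return np.vdot(h_hat, y)                                            # vdot conjugates h_hat

k, M = 0, 50_000
target = M * np.sqrt(p_u) * np.sum(beta[:, k] * x[:, k])   # right-hand side of Eq. (7)
x_hat = mf_detect(M, k)
print(abs(x_hat - target) / M)   # per-antenna deviation; shrinks roughly like 1/sqrt(M)
```

The interfering terms $\beta\_{ijk} x\_{jk}^u$ remain in `target` no matter how large $M$ is, which is why pilot contamination is not averaged out by adding antennas.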

It follows from Equation (7) that as $M \to \infty$, the detected symbol $\hat{x}\_{ik}^u$ is no longer affected by small-scale fading or noise. Therefore, as $M \to \infty$, the uplink signal-to-interference-plus-noise ratio (SINR) of user *k* in the *i*th cell can be defined as:

$$SINR\_{ik}^{u} = \frac{\beta\_{iik}^{2}}{\sum\_{j \neq i} \beta\_{ijk}^{2}} \tag{8}$$

It can be seen from Equation (8) that, owing to pilot contamination, the uplink SINR is limited by the large-scale fading factors of the users in interfering cells that share the same pilot. Based on Equation (8), the uplink achievable rate of user *k* in the target *i*th cell is given by:

$$C\_{ik}^{u} = (1 - \mu\_0)\, E\left\{\log\_2\left(1 + SINR\_{ik}^{u}\right)\right\} \tag{9}$$

where $\mu\_0$ is the pilot overhead coefficient under full pilot reuse, i.e., the spectral efficiency loss caused by the pilot sequences used for channel estimation. When other pilot allocation algorithms are used, Equation (9) must be adjusted accordingly.
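Equations (8) and (9) can be evaluated in a few lines. This is a minimal sketch: the expectation in Equation (9) is approximated by a sample mean over hypothetical user drops, and the function names are illustrative. The value $\mu\_0 = 0.05$ used in the example is the one listed in Section 4.1.

```python
import math


def asymptotic_sinr(beta_target, betas_interf):
    """Eq. (8): beta_iik^2 / sum_{j != i} beta_ijk^2, valid as M -> infinity."""
    return beta_target ** 2 / sum(b ** 2 for b in betas_interf)


def achievable_rate(sinr_samples, mu0):
    """Eq. (9): (1 - mu0) * E{log2(1 + SINR)}; the expectation over user
    drops is approximated by the sample mean of the given SINR values."""
    mean_se = sum(math.log2(1.0 + s) for s in sinr_samples) / len(sinr_samples)
    return (1.0 - mu0) * mean_se
```

For example, `asymptotic_sinr(1.0, [0.5, 0.5])` gives 2.0, and `achievable_rate([3.0], 0.05)` gives about 1.9 bit/s/Hz, i.e., 5% of the 2 bit/s/Hz is lost to pilot overhead.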

#### *3.3. IPLUG Algorithm*

The PLUG algorithm significantly improves the QoS of edge users but lacks flexibility: under the PLUG policy every cell selects the same number of edge users, based only on a comparison of the users' distances to the BS, and the specific environment of each user is not considered. Therefore, we propose an improved pilot scheduling algorithm based on path loss for user grouping (IPLUG), which dynamically selects the number of edge users per cell by introducing a decision parameter. Figure 2 is a schematic diagram of the IPLUG decision parameter. IPLUG builds on the PLUG algorithm and is designed to make user grouping more accurate and reasonable. In the IPLUG algorithm, once the decision parameter $\lambda$ is selected, the BS performs dynamic grouping by comparing the user distance $d$ with $\lambda \times R$, where $R$ is the cell radius. The specific rule is:

$$d \overset{?}{>} \lambda \times R \to \begin{cases} \text{Yes} \to \text{Edge user} \\ \text{No} \to \text{Center user} \end{cases} \tag{10}$$
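The grouping rule of Equation (10) amounts to a threshold test per user. The sketch below assumes that a user exactly at $d = \lambda R$ is treated as a center user, a tie-breaking choice the text does not specify; the function name is hypothetical.

```python
def group_users(distances, lam, cell_radius):
    """Split the users of one cell by the IPLUG rule of Eq. (10).

    A user whose distance d to its BS exceeds lam * R is an edge user;
    all other users are center users. Returns (center_ids, edge_ids).
    """
    threshold = lam * cell_radius
    center = [k for k, d in enumerate(distances) if d <= threshold]
    edge = [k for k, d in enumerate(distances) if d > threshold]
    return center, edge
```

With $R = 500$ m and $\lambda = 0.9$, the threshold is 450 m, so users at 100 m and 449 m are center users and users at 460 m and 480 m are edge users. Raising $\lambda$ shrinks the edge group and with it the extra pilot overhead.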

Let the central user set be $U\_c$, the edge user set be $U\_e$, the central user pilot set be $\psi\_c$, and the edge user pilot set be $\psi\_e$, and let the pilot set of the $j$th cell be $\psi\_j = \left[\psi\_{1j}, \psi\_{2j}, \ldots, \psi\_{Kj}\right]^{T}$, $j = 1, 2, \ldots, L$. Then:

$$\psi\_{ki}\psi\_{kj}^{H} = \begin{cases} 1, & \psi\_{ki} \in \psi\_{c} \text{ and } \psi\_{kj} \in \psi\_{c} \\ 0, & \text{otherwise} \end{cases} \tag{11}$$
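One way to realize the pilot sets behind Equation (11) is sketched below. It is a hypothetical indexing scheme, not the paper's: center users of every cell reuse the shared indices $0, 1, \ldots$ (the set $\psi\_c$), while edge users receive indices that are unique across cells (the set $\psi\_e$). With orthonormal pilots, two cross-cell users are contaminated exactly when their indices coincide, which is the "1" case of Equation (11).

```python
def assign_pilots(center_counts, edge_counts):
    """Hypothetical IPLUG-style pilot plan for L cells.

    Center users of every cell reuse the shared pilot indices 0, 1, ...;
    edge users receive indices that are unique across all cells.
    Returns (per-cell (center, edge) index lists, total pilots needed).
    """
    next_edge = max(center_counts)  # first index after the shared set
    plan = []
    for n_c, n_e in zip(center_counts, edge_counts):
        center = list(range(n_c))
        edge = list(range(next_edge, next_edge + n_e))
        next_edge += n_e
        plan.append((center, edge))
    return plan, next_edge


def contaminated_pairs(plan):
    """Cross-cell pilot collisions, i.e. psi_ki psi_kj^H = 1 for orthonormal
    pilots. Returns (cell_a, cell_b, pilot_index) tuples."""
    pairs = []
    for a in range(len(plan)):
        for b in range(a + 1, len(plan)):
            pilots_a = set(plan[a][0] + plan[a][1])
            pilots_b = set(plan[b][0] + plan[b][1])
            for p in sorted(pilots_a & pilots_b):
                pairs.append((a, b, p))
    return pairs
```

For two cells with (3, 2) center users and (1, 2) edge users, six orthogonal pilots are needed and only the shared center pilots 0 and 1 collide; every additional edge user costs one more pilot, which is the overhead trade-off discussed in Section 4.2.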

Then, the pilot vector received by the target *i*th cell BS is:

$$\bar{y}\_i^{p} = \sqrt{p\_p}\sum\_{j=1}^{L}\left(\sum\_{k \in U\_c} h\_{ijk}\psi\_{kj} + \sum\_{k \in U\_e} h\_{ijk}\psi\_{kj}\right) + \bar{n}\_i^{p} \tag{12}$$

For user $k$ in the target $i$th cell, least squares (LS) channel estimation gives the estimate of the target channel:

$$\hat{h}\_{iik} = \frac{1}{\sqrt{p\_p}}\,\bar{y}\_i^{p}\psi\_{ki}^{H} \tag{13}$$

Combining Equations (12) and (13), as the number of antennas $M \to \infty$, the SINR of a central user under the IPLUG algorithm is:

$$\mathrm{SINR}\_{ik}^{uc} = \frac{\left|h\_{iik}^{H}h\_{iik}\right|^{2}}{\sum\_{j \neq i,\ \psi\_{ki} \in \psi\_c,\ \psi\_{kj} \in \psi\_c}\left|h\_{ijk}^{H}h\_{ijk}\right|^{2}} \tag{14}$$

Edge users are not affected by pilot contamination, so as $M \to \infty$, the edge user SINR $\mathrm{SINR}\_{ik}^{ue} \to \infty$.

Figure 3 illustrates the flowchart for the IPLUG algorithm.

**Figure 2.** IPLUG policy decision parameters for user grouping.

**Figure 3.** Proposed IPLUG algorithm flowchart.

#### *3.4. Pseudo-Random Code-Based Pilot Scheduling*

#### 3.4.1. Uplink

Assume that all users in every cell send a pilot sequence of length $\tau$. Let the pilots transmitted by the $K$ users in the $j$th cell be $\psi\_j = \left[\phi\_{j1}, \phi\_{j2}, \ldots, \phi\_{jK}\right]$, where $\phi\_{jk} \in \mathbb{C}^{\tau \times 1}$ is the pilot transmitted by the $k$th user in the cell ($\phi\_{jk}^{H}\phi\_{jk} = 1$). When the average transmit power of each user is $\rho\_r$, the pilot signal received by the BS of the $l$th cell is:

$$Y\_l = \sqrt{\rho\_r\tau}\sum\_{j=1}^{L} G\_{jl}\psi\_j^{T} + n\_l \tag{15}$$

where $G\_{jl} = H\_{jl}D\_{jl}^{1/2}$ is the channel between all users of the $j$th cell and the BS of the $l$th cell. Let $\beta\_{jlk}$ denote the large-scale fading coefficient from user $k$ of the $j$th cell to the BS of the $l$th cell; then $D\_{jl}$ is a diagonal matrix whose diagonal elements are $\beta\_{jl} = \left[\beta\_{jl1}, \beta\_{jl2}, \ldots, \beta\_{jlK}\right]$, and $n\_l$ denotes the additive white Gaussian noise of the $l$th cell. The matrix $H\_{jl}$ is given by:

$$H\_{jl} = \begin{bmatrix} h\_{j1,l1} & \cdots & h\_{jK,l1} \\ \vdots & \ddots & \vdots \\ h\_{j1,lM} & \cdots & h\_{jK,lM} \end{bmatrix} \tag{16}$$

The elements of $H\_{jl}$ are independent and identically distributed (i.i.d.), and $h\_{jk,lm}$ is the small-scale fading coefficient between the $k$th user in the $j$th cell and the $m$th antenna of the BS in the $l$th cell.

The BS estimates the channel from the received pilot signal. From standard results in estimation theory [24], when the number of BS antennas tends to infinity, the MMSE estimate of the channel is:

$$\hat{G}\_{jl} = \sqrt{\rho\_r\tau}\,Y\_l\left(I + \rho\_r\tau\sum\_{i=1}^{L}\psi\_i^{\*}D\_{il}\psi\_i^{T}\right)^{-1}\psi\_j^{\*}D\_{jl} \tag{17}$$

#### 3.4.2. Downlink

Consider the BS of the $l$th cell, and assume that the information symbols it transmits to its users are $S\_l = \left[S\_{l1}, S\_{l2}, \ldots, S\_{lK}\right]^{T}$ and that $A\_l = f(\hat{G}\_{ll})$ is an $M \times K$ linear precoding matrix, where $f(\cdot)$ represents a specific linear precoding technique at the BS. After precoding, the signal sent by the BS is $A\_lS\_l$, and the BS satisfies the average power constraint $\mathrm{tr}\left(E\left\{A\_lS\_lS\_l^{H}A\_l^{H}\right\}\right) \le P$. The signal received by the users of the $j$th cell can then be expressed as:

$$X\_j = \sqrt{\rho\_f}\sum\_{l=1}^{L} G\_{jl}^{T}A\_lS\_l + n\_j \tag{18}$$

where $\rho\_f$ is the downlink transmit power and $n\_j$ is the AWGN of the corresponding cell. Assuming simple MF precoding at the BS, i.e., $A\_l = \hat{G}\_{ll}^{\*}$, Equation (18) can be expressed as:

$$X\_j = \sqrt{\rho\_f}\sum\_{l=1}^{L} G\_{jl}^{T}\hat{G}\_{ll}^{\*}S\_l + n\_j = \sqrt{\rho\_f\rho\_r\tau}\sum\_{l=1}^{L} G\_{jl}^{T}\left(\sqrt{\rho\_r\tau}\sum\_{j'=1}^{L} G\_{j'l}\psi\_{j'}^{T} + n\_l\right)^{\*}Z^{\*}S\_l + n\_j \tag{19}$$

where $Z = \left(I + \rho\_r\tau\sum\_{i=1}^{L}\psi\_i^{\*}D\_{il}\psi\_i^{T}\right)^{-1}\psi\_l^{\*}D\_{ll}$. As the number of BS antennas tends to infinity:

$$\frac{1}{M\sqrt{\rho\_f\rho\_r\tau}}X\_j \to \sqrt{\rho\_r\tau}\sum\_{l=1}^{L} D\_{jl}\left(\psi\_l^{H}\psi\_j + \rho\_r\tau\sum\_{i=1}^{L}\psi\_l^{H}\psi\_iD\_{il}^{\*}\psi\_i^{H}\psi\_j\right)^{-1}D\_{ll}^{\*}S\_l \tag{20}$$

When all cells reuse the same set of orthogonal pilots, the right-hand side of Equation (20) reduces to:

$$\sum\_{l=1}^{L} D\_{jl}\left(I + \rho\_r\tau\sum\_{i=1}^{L} D\_{il}^{\*}\right)^{-1}D\_{ll}^{\*}S\_l \tag{21}$$

In this case, the signal received in the $j$th cell is interfered with by the signals transmitted by the other cells, especially when the corresponding large-scale fading coefficients are large; this phenomenon is called pilot contamination. When all cells use mutually orthogonal pilots, the right-hand side of Equation (20) reduces to:

$$D\_{jj}\left(I + \rho\_r\tau D\_{jj}^{\*}\right)^{-1}D\_{jj}^{\*}S\_j \tag{22}$$

In this case, the signal received by the user is a scaled version of the signal transmitted by the BS of its own cell, and there is no pilot contamination. In practice, however, the limited coherence interval makes it impossible to guarantee orthogonal pilots for all cells. Pilot contamination has therefore become an important factor limiting the performance of massive MIMO systems.

#### 3.4.3. Pseudo-Random Code Scheme Description

A pseudo-random code has noise-like properties: it is a seemingly random but actually deterministic periodic binary sequence. Common families include *m*-sequences, Gold sequences, *M*-sequences, and combined sequences [35]. Pseudo-random codes have good autocorrelation and cross-correlation properties and can be used as address codes in CDMA communication. Owing to their pseudo-random characteristics, they are also widely used in information encryption [36]. The pseudo-random sequence is generated by the linear feedback shift register shown in Figure 4, where $c\_i = 0$ indicates that the corresponding feedback connection is off and $c\_i = 1$ indicates that it is on. The characteristic polynomial of the generated pseudo-random sequence is:

$$f(\mathbf{x}) = \sum\_{i=0}^{n} c\_i \mathbf{x}^i = c\_0 + c\_1 \mathbf{x} + \dots + c\_n \mathbf{x}^n \tag{23}$$

where $x^{i}$ corresponds to the $i$th shift register stage; $c\_0 = c\_n = 1$, and for $i \neq 0, n$, $c\_i \in \{0, 1\}$.
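A minimal Fibonacci LFSR sketch for Equation (23) follows. It assumes the recurrence $a\_k = c\_1a\_{k-1} \oplus c\_2a\_{k-2} \oplus \cdots \oplus c\_na\_{k-n}$ implied by $f(x)$ with $c\_0 = 1$; the polynomials $1 + x + x^3$ and $1 + x^2 + x^5$ used below are standard primitive polynomials chosen for illustration, not values from the paper. A primitive polynomial of degree $n$ produces an *m*-sequence with period $2^n - 1$.

```python
def lfsr(coeffs, seed, length):
    """Fibonacci LFSR for f(x) = 1 + c_1 x + ... + c_n x^n (c_0 = 1 implied).

    coeffs -- [c_1, ..., c_n] with c_n = 1
    seed   -- initial state [a_{-1}, a_{-2}, ..., a_{-n}], not all zero
    Each output bit satisfies a_k = c_1 a_{k-1} XOR ... XOR c_n a_{k-n}.
    """
    n = len(coeffs)
    if coeffs[-1] != 1:
        raise ValueError("c_n must be 1")
    if len(seed) != n or not any(seed):
        raise ValueError("seed must hold n bits and must not be all zero")
    state = list(seed)  # state[i] holds a_{k-1-i}
    out = []
    for _ in range(length):
        bit = 0
        for c, a in zip(coeffs, state):
            bit ^= c & a  # only taps with c_i = 1 (connection "on") contribute
        out.append(bit)
        state = [bit] + state[:-1]  # shift the register by one stage
    return out


def period(seq):
    """Smallest p with seq[i] == seq[i + p] over the observed window."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return len(seq)
```

For $f(x) = 1 + x + x^3$ (coefficients `[1, 0, 1]`) and seed `[1, 0, 0]`, one period is `[1, 1, 0, 1, 0, 0, 1]`; with $f(x) = 1 + x^2 + x^5$ the period is 31, and each period contains 16 ones and 15 zeros, the balance property of *m*-sequences.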

**Figure 4.** Pseudo-random code linear feedback shift register.

Pilot contamination arises because different cells reuse the same set of orthogonal pilots, and pseudo-random sequences have good cross-correlation properties; this paper exploits that property to improve the orthogonality of pilots between different cells. Based on these considerations, this paper proposes a pilot design scheme based on pseudo-random codes: while all cells reuse the same set of orthogonal pilots, a pseudo-random code serves as the code sequence that distinguishes the cells. Each cell is allocated a pseudo-random code with a different delay, and this code is synchronously scrambled onto the user pilots of the corresponding cell to obtain new pilots.

The pilot design scheme based on the pseudo-random code is shown in Figure 5. Since the pilot design procedure is the same for every cell, it is described here for a single cell without loss of generality. Assume that the pilot of the $k$th user in a cell is $\phi\_k = \left[\phi\_{k1}, \phi\_{k2}, \ldots, \phi\_{k\tau}\right]^{T}$, and that the pseudo-random sequence generated by the linear feedback shift register of that cell is $m = \left[m\_1, m\_2, \ldots, m\_N\right]$. The scrambler in Figure 5 multiplies the pseudo-random matrix by the pilot of the corresponding user. In general, a pseudo-random sequence is a 0–1 bit stream whose length is much larger than the pilot length.

**Figure 5.** Pseudo-random code pilot design scheme.

To make the generated pseudo-random sequence usable for scrambling, it must be processed as follows:


The pseudo-random matrix used for scrambling after the above processing can be expressed as:

$$P = \begin{bmatrix} m\_1 & 0 & \cdots & 0 \\ 0 & m\_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & m\_{\tau} \end{bmatrix} \tag{24}$$

After the scrambler applies the pseudo-random matrix to the user pilot, the scrambled pilot of the $k$th user is:

$$\Psi\_k = P\phi\_k = \left[m\_1\phi\_{k1}, m\_2\phi\_{k2}, \ldots, m\_{\tau}\phi\_{k\tau}\right]^{T} \tag{25}$$

In this paper, the same pseudo-random sequence is used to scramble the pilots of all users in a cell synchronously, so the pilots of the $K$ users in the cell after the pilot design are:

$$\Psi = \begin{bmatrix} m\_1 \phi\_{11} & \cdots & m\_1 \phi\_{K1} \\ \vdots & \cdots & \vdots \\ m\_\tau \phi\_{1\tau} & \cdots & m\_\tau \phi\_{K\tau} \end{bmatrix} \tag{26}$$
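Equations (24)–(26) can be checked numerically with a short sketch. It assumes real orthonormal pilots taken from the columns of a normalized Walsh-Hadamard matrix, the mapping 0 → +1 and 1 → −1 for the processed pseudo-random bits (the text only states that the diagonal entries of $P$ are ±1), and a hypothetical 8-chip code segment. Because $P^{H}P = I$, scrambling keeps the pilots within one cell orthonormal.

```python
def hadamard(n):
    """Sylvester-construction Walsh-Hadamard matrix of size n (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def orthonormal_pilots(tau, k):
    """K real pilots of length tau: the first K columns of H_tau / sqrt(tau)."""
    h = hadamard(tau)
    scale = tau ** -0.5
    return [[h[t][col] * scale for t in range(tau)] for col in range(k)]


def to_bipolar(bits):
    """Assumed mapping of processed pseudo-random bits: 0 -> +1, 1 -> -1."""
    return [1 - 2 * b for b in bits]


def scramble(m, pilot):
    """Eq. (25): Psi_k = P phi_k with P = diag(m_1, ..., m_tau)."""
    return [mt * pt for mt, pt in zip(m, pilot)]


def gram(vectors):
    """Matrix of inner products between real pilot vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


pilots = orthonormal_pilots(8, 4)                 # tau = 8, K = 4 (illustrative)
m = to_bipolar([1, 1, 0, 1, 0, 0, 1, 1])          # hypothetical code segment
scrambled = [scramble(m, p) for p in pilots]      # columns of Eq. (26)
```

`gram(scrambled)` is the 4 × 4 identity up to rounding, which is the numerical counterpart of the statement that scrambling introduces no intra-cell interference.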

The designed pilot is transmitted to the BS through the channel, and the BS can obtain the required Channel State Information (CSI) by using channel estimation.

When designing pilots for the users of other cells, note that pilot contamination depends on the distance between users: when two users are far enough apart, its influence is negligible. Therefore, the proposed pseudo-random code pilot scheduling scheme considers only the pilot contamination between the target cell and its neighboring cells and ignores that from farther cells. Under this assumption, distinct pseudo-random sequences need to be allocated only to the target cell and its neighbors, and the sequences can be reused in farther cells; this greatly reduces the number of sequences required and the implementation complexity, and makes it easier to design good pilots. Accordingly, the pilots of all cells can be designed with the proposed scheme; the only requirement is that the target cell and the neighboring cells use different pseudo-random sequences, which raises a sequence selection problem. Because different pairs of pseudo-random sequences have different cross-correlations, if the selected sequences are strongly cross-correlated, inter-cell pilot contamination remains severe even after the pilot design, which defeats its purpose. Certain criteria must therefore be followed when selecting the pseudo-random sequences that distinguish the cells. In theory, the closer the cross-correlation values between the pseudo-random sequences are to 0, the more pronounced the effect of the pilot design, but the complexity of finding such sequences becomes very high. To obtain the desired effect at an acceptable complexity, this paper adopts the following criterion to select the required pseudo-random sequences.
Assuming that the number of cells requiring different pseudo-random sequences is $M$, the corresponding $M$ pseudo-random sequences must satisfy:

$$\left|\rho(m\_i, m\_j)\right|\_{i \neq j} < \gamma, \quad i, j = 1, 2, \ldots, M \tag{27}$$

where $\rho(m\_i, m\_j) = \frac{1}{n}\sum\_{k=1}^{n} m\_{ik}m\_{jk}$ is the normalized cross-correlation between two pseudo-random sequences of length $n$, and $\gamma \in [0, 1]$ is a constant giving the upper limit of the normalized cross-correlation between any two selected pseudo-random sequences.
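The criterion of Equation (27) can be sketched as a greedy search. The candidates are cyclic shifts of one ±1 *m*-sequence, following the text's "pseudo-random code with different delays"; the greedy strategy, the 7-chip sequence (one period generated by $f(x) = 1 + x + x^3$ with seed `[1, 0, 0]`), and the 0 → +1, 1 → −1 mapping are illustrative assumptions rather than the paper's procedure. Over a full period, distinct cyclic shifts of an *m*-sequence of length $n$ have normalized cross-correlation $-1/n$.

```python
def norm_xcorr(a, b):
    """Eq. (27) correlation: rho(m_i, m_j) = (1/n) * sum_k m_ik * m_jk."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def cyclic_shift(seq, delay):
    """Delay a periodic sequence by `delay` chips (right rotation)."""
    delay %= len(seq)
    return seq[-delay:] + seq[:-delay]


def select_sequences(candidates, gamma, count):
    """Greedily pick `count` sequences whose pairwise |rho| < gamma."""
    chosen = []
    for c in candidates:
        if all(abs(norm_xcorr(c, s)) < gamma for s in chosen):
            chosen.append(c)
            if len(chosen) == count:
                return chosen
    raise ValueError("not enough candidates satisfy the correlation bound")


BITS = [1, 1, 0, 1, 0, 0, 1]            # one period of an m-sequence, f(x) = 1 + x + x^3
BASE = [1 - 2 * b for b in BITS]         # assumed mapping 0 -> +1, 1 -> -1
CANDIDATES = [cyclic_shift(BASE, d) for d in range(len(BASE))]
```

Here every pair of distinct shifts has $|\rho| = 1/7 \approx 0.143$, so `select_sequences(CANDIDATES, 0.2, 3)` succeeds, while $\gamma = 0.1$ cannot be met with this short code; longer sequences lower $|\rho|$, which matches the remark below that longer pilots help.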

Equation (27) shows that, once the pseudo-random sequence family is fixed, the choice of $\gamma$ is an important factor in the effectiveness of the pilot design scheme. The smaller $\gamma$ is, the smaller the cross-correlation between the selected pseudo-random sequences; after the pilot design, the user pilots of different cells are then closer to orthogonal, and pilot contamination is reduced more effectively. Otherwise, pilot contamination cannot be effectively reduced. The BS and the users in each cell share a pseudo-random sequence, so the BS can distinguish pilots from different cells by using detection techniques. Moreover, since each cell uses a single pseudo-random sequence to scramble the set of orthogonal pilots of all its users, the user pilots within each cell remain orthogonal after the pilot design, and no additional intra-cell interference is introduced. Increasing the pilot length also makes the pilot design more effective: the length of the pseudo-random sequence equals the pilot length, and longer pseudo-random sequences are more nearly orthogonal to one another, so increasing the pilot length improves the orthogonality between the pseudo-random sequences and, in turn, the orthogonality of user pilots between different cells.

#### 3.4.4. Mean Square Error (MSE) Performance Analysis of Expected Channel Estimation

This section uses the new pilots to derive and analyze the MSE of the channel estimate and to examine the pilot design method described in Section 3.4.3. It is known from [15] that when pilot contamination is severe, the channel estimation performance of the system drops sharply; in this case, even a complex multi-cell MMSE precoding scheme yields only a limited improvement for a massive MIMO system. To simplify the derivation and analysis, this section considers only simple single-cell ZF precoding, whose precoding matrix is defined as:

$$A\_l = \frac{\hat{G}\_{ll}\left(\hat{G}\_{ll}^{H}\hat{G}\_{ll}\right)^{-1}}{\sqrt{\mathrm{tr}\left[\left(\hat{G}\_{ll}^{H}\hat{G}\_{ll}\right)^{-1}\right]}} \tag{28}$$

Equation (28) shows that a single-cell ZF precoding matrix can be designed as long as the desired channel is known. Therefore, the subsequent derivation and analysis in this section are based on the estimate of the desired channel.

As described in Section 3.4.3, the user pilots of the $j$th cell are $\Psi\_j = P\_j\psi$, so Equation (15) can be expressed as:

$$Y\_l = \sqrt{\rho\_r\tau}\sum\_{j=1}^{L} G\_{jl}\Psi\_j^{T} + n\_l = \sqrt{\rho\_r\tau}\sum\_{j=1}^{L} G\_{jl}\psi^{T}P\_j^{T} + n\_l \tag{29}$$

When the number of BS antennas is finite, the MMSE estimate of the desired channel is:

$$\hat{G}\_{ll} = \sqrt{\rho\_r\tau}\,Y\_l\left(C\_n + \rho\_r\tau\sum\_{i=1}^{L} P\_i^{\*}\psi^{\*}C\_{il}\psi^{T}P\_i^{T}\right)^{-1}P\_l^{\*}\psi^{\*}C\_{ll} \tag{30}$$

where $C\_n = E\left\{n\_l^{H}n\_l\right\}$ is the autocorrelation matrix of the received noise $n\_l$, and $C\_{jl} = E\left\{G\_{jl}^{H}G\_{jl}\right\}$ is the autocorrelation matrix of the channel matrix $G\_{jl}$.

The MSE of the desired-channel estimate at the BS of the $l$th cell is defined as:

$$M^{\mathrm{mse}} \triangleq E\left\{\left\|\hat{G}\_{ll} - G\_{ll}\right\|\_F^{2}\right\} \tag{31}$$

Let

$$R = \sqrt{\rho\_r\tau}\left(C\_n + \rho\_r\tau\sum\_{i=1}^{L} P\_i^{\*}\psi^{\*}C\_{il}\psi^{T}P\_i^{T}\right)^{-1}P\_l^{\*}\psi^{\*}C\_{ll}$$

so that $\hat{G}\_{ll} = Y\_lR$. Substituting into Equation (31), we get:

$$\begin{aligned} M^{\mathrm{mse}} &= E\left\{\mathrm{tr}\left\{\left(Y\_lR - G\_{ll}\right)^{H}\left(Y\_lR - G\_{ll}\right)\right\}\right\} \\ &= \mathrm{tr}\left\{R^{H}E\left\{Y\_l^{H}Y\_l\right\}R - R^{H}E\left\{Y\_l^{H}G\_{ll}\right\} - E\left\{G\_{ll}^{H}Y\_l\right\}R + E\left\{G\_{ll}^{H}G\_{ll}\right\}\right\} \end{aligned} \tag{32}$$

Using the model in Equation (29), we have:

$$E\left\{Y\_l^{H}Y\_l\right\} = C\_n + \rho\_r\tau\sum\_{i=1}^{L} P\_i^{\*}\psi^{\*}C\_{il}\psi^{T}P\_i^{T} \tag{33}$$

$$E\left\{Y\_l^{H}G\_{ll}\right\} = \sqrt{\rho\_r\tau}\,P\_l^{\*}\psi^{\*}C\_{ll} \tag{34}$$

$$E\left\{G\_{ll}^{H}Y\_l\right\} = \sqrt{\rho\_r\tau}\,C\_{ll}\psi^{T}P\_l^{T} \tag{35}$$

Substituting Equations (33)–(35) into Equation (32), with $E\{G\_{ll}^{H}G\_{ll}\} = C\_{ll}$, we can rewrite $M^{\mathrm{mse}}$ as:

$$\begin{aligned} M^{\mathrm{mse}} &= \mathrm{tr}\left\{R^{H}\left(C\_n + \rho\_r\tau\sum\_{i=1}^{L} P\_i^{\*}\psi^{\*}C\_{il}\psi^{T}P\_i^{T}\right)R - \sqrt{\rho\_r\tau}\,R^{H}P\_l^{\*}\psi^{\*}C\_{ll} - \sqrt{\rho\_r\tau}\,C\_{ll}\psi^{T}P\_l^{T}R + C\_{ll}\right\} \\ &= \mathrm{tr}\left\{C\_{ll} - \sqrt{\rho\_r\tau}\,C\_{ll}\psi^{T}P\_l^{T}R\right\} \\ &= \mathrm{tr}\left\{C\_{ll} - \rho\_r\tau\,C\_{ll}\psi^{T}P\_l^{T}\left(C\_n + \rho\_r\tau\sum\_{i=1}^{L} P\_i^{\*}\psi^{\*}C\_{il}\psi^{T}P\_i^{T}\right)^{-1}P\_l^{\*}\psi^{\*}C\_{ll}\right\} \end{aligned} \tag{36}$$

Since $P\_i$ is a diagonal matrix whose diagonal elements are 1 or −1, and $\psi^{H}\psi = I$, it follows that $(P\_i\psi)^{H}(P\_i\psi) = \psi^{H}P\_i^{H}P\_i\psi = I$. Using the matrix inversion lemma, Equation (36) can be expressed as:

$$M^{\mathrm{mse}} = \mathrm{tr}\left\{C\_{ll} - \rho\_r\tau\,C\_{ll}\left(\Psi\_l^{T}C\_n\Psi\_l^{\*} + \rho\_r\tau\sum\_{i=1}^{L}\Psi\_l^{T}\Psi\_i^{\*}C\_{il}\Psi\_i^{T}\Psi\_l^{\*}\right)^{-1}C\_{ll}\right\} \tag{37}$$

where *Ψ<sup>i</sup>* = *Piψ*, *i* = 1, 2, . . . , *L*. When *M* → ∞ then:

$$C\_{ll} = E\left\{G\_{ll}^{H}G\_{ll}\right\} = \sqrt{D\_{ll}}\,E\left\{H\_{ll}^{H}H\_{ll}\right\}\sqrt{D\_{ll}} = MD\_{ll} \tag{38}$$

$$C\_n = E\left\{n\_l^{H}n\_l\right\} = MI\_{\tau} \tag{39}$$

Substituting Equations (38) and (39) into Equation (37), we get:

$$\frac{1}{M}M^{\mathrm{mse}} = \mathrm{tr}\left\{D\_{ll} - \rho\_r\tau\,D\_{ll}\left(I + \rho\_r\tau\sum\_{i=1}^{L}\Psi\_l^{T}\Psi\_i^{\*}D\_{il}\Psi\_i^{T}\Psi\_l^{\*}\right)^{-1}D\_{ll}\right\} \tag{40}$$

When each cell has $K = 1$ user, $\Psi\_l^{T}\Psi\_i^{\*} = (P\_l\psi)^{T}(P\_i\psi)^{\*} = \gamma\_{li}$, where $\gamma\_{li}$ is the normalized cross-correlation between the pseudo-random codes of the $l$th and $i$th cells, as given by $\rho(\cdot,\cdot)$ in Equation (27). Assuming that the large-scale fading coefficient is $D\_{il} = [\beta\_{il1}]$ in this case, Equation (40) simplifies to:

$$\frac{1}{M}M^{\mathrm{mse}} = \beta\_{ll1} - \frac{\rho\_r\tau\beta\_{ll1}^{2}}{1 + \rho\_r\tau\beta\_{ll1} + \rho\_r\tau\sum\_{i \neq l}\gamma\_{li}^{2}\beta\_{il1}} \tag{41}$$

Equation (41) shows that when the large-scale fading coefficients, the uplink transmit power, and the pilot length are fixed, the MSE of the channel estimate obtained with the pseudo-random-code pilot scheme is mainly limited by the cross-correlation between the pseudo-random sequences used. When their normalized cross-correlation is large ($|\gamma\_{li}|$ close to 1), the channel estimation MSE is relatively poor; when it is small ($|\gamma\_{li}|$ close to 0), the MSE is better. It can be concluded that, if suitable pseudo-random sequences are selected, the proposed pilot design scheme can effectively improve channel estimation performance and thus reduce pilot contamination. Figure 6 shows the flowchart of the proposed pseudo-random pilot code algorithm.
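Equation (41) can be evaluated directly to see how the code cross-correlation drives the MSE. The sketch assumes real correlations (so $\gamma\_{li}^{2} = |\gamma\_{li}|^{2}$), and the parameter values in the example are illustrative only. With all $\gamma\_{li} = 0$ the expression reduces to $\beta\_{ll1}/(1 + \rho\_r\tau\beta\_{ll1})$, the contamination-free value, while $\gamma\_{li} = 1$ corresponds to full pilot reuse.

```python
def normalized_mse(beta_ll, beta_il, gamma_li, rho_r, tau):
    """Eq. (41): (1/M) * MSE of the desired-channel estimate for K = 1.

    beta_ll  -- large-scale fading from the target user to its own BS l
    beta_il  -- large-scale fading from the co-pilot user of each other cell i to BS l
    gamma_li -- normalized cross-correlation between the codes of cells l and i
    """
    snr = rho_r * tau
    interference = sum(g ** 2 * b for g, b in zip(gamma_li, beta_il))
    return beta_ll - snr * beta_ll ** 2 / (1.0 + snr * beta_ll + snr * interference)
```

For example, with $\beta\_{ll1} = 1$, one interferer with $\beta\_{il1} = 0.5$, and $\rho\_r\tau = 1$, the normalized MSE is 0.5 for $\gamma\_{li} = 0$ and 0.6 for $\gamma\_{li} = 1$; it increases monotonically with $|\gamma\_{li}|$, as stated above.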

**Figure 6.** Proposed pseudo-random algorithm flowchart.

#### **4. Simulation Results**

#### *4.1. Simulation Scenario and Parameters*

MATLAB (R2017b, The MathWorks, Natick, MA, USA) is used as the simulation platform. The massive MIMO multi-cell multi-user TDD system is assumed to consist of seven regular hexagonal cells, as shown in Figure 7.

Table 1 lists the simulation parameters used to analyze the proposed pilot decontamination schemes. The antenna spacing is assumed to be half the carrier wavelength; although this spacing implies some correlation between antennas, that correlation is not considered in this paper, and all antennas are assumed to be omnidirectional. Each cell contains $K = 10$ uniformly distributed users; the pilot length is 128, the path loss exponent is 3, the pilot overhead coefficient is $\mu\_0 = 0.05$, the cell radius is $R = 500$ m, the number of co-located BS antennas ranges over $16 \le M \le 1024$, and the lognormal shadow fading is 4 dB (only in the last case is it set to 2 dB). Each result is averaged over 2000 simulation runs.
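The user drop described above can be sketched for a single cell under stated assumptions: a flat-top hexagonal cell centered on its BS, uniform placement by rejection sampling, and $\beta = z/d^{3}$ with lognormal shadowing $10\log\_{10}z \sim \mathcal{N}(0, 4^2)$ dB. The guard distance `r_min = 35` m and all function names are hypothetical, since Table 1 is not reproduced here; the resulting $\beta$ values are the kind of inputs used in Equations (8) and (41).

```python
import math
import random


def in_hexagon(x, y, radius):
    """True if (x, y) lies inside a flat-top regular hexagon of circumradius
    `radius` centred on the BS at the origin."""
    s3 = math.sqrt(3.0)
    return abs(y) <= s3 / 2.0 * radius and s3 * abs(x) + abs(y) <= s3 * radius


def drop_users(k, radius, r_min, rng):
    """Drop k users uniformly in the hexagon by rejection sampling, keeping only
    points at least r_min from the BS (r_min is an assumed guard distance)."""
    users = []
    while len(users) < k:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if in_hexagon(x, y, radius) and math.hypot(x, y) >= r_min:
            users.append((x, y))
    return users


def large_scale_fading(d, path_loss_exp, shadow_db, rng):
    """beta = 10**(s/10) / d**eta, with shadowing s ~ N(0, shadow_db^2) in dB."""
    s = rng.gauss(0.0, shadow_db)
    return 10.0 ** (s / 10.0) / d ** path_loss_exp


rng = random.Random(0)
users = drop_users(10, radius=500.0, r_min=35.0, rng=rng)
betas = [large_scale_fading(math.hypot(x, y), 3.0, 4.0, rng) for x, y in users]
```

Repeating the drop for each of the seven cells, and computing $\beta$ from every user to every BS, would give the full set of large-scale coefficients for one Monte Carlo trial.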

**Figure 7.** Simulation scenario of massive MIMO system of seven cells.



With the above simulation parameters, the Monte Carlo method is used to simulate the uplink SINR, the normalized channel estimation MSE (NMSE), and the achievable rate of the target cell, in order to compare the PLUG, IPLUG, and FRPS algorithms.

#### *4.2. Analysis of PLUG Algorithm*

Figures 8 and 9 show, respectively, the channel estimation NMSE and the uplink SINR of the central and edge users as functions of the number of antennas.

**Figure 8.** NMSE comparison of FRPS and PLUG Algorithms.

**Figure 9.** Uplink SINR comparison of FRPS and PLUG algorithms.

It can be seen from Figures 8 and 9 that, under both algorithms, the channel estimation NMSE of both edge and central users decreases as the number of antennas increases. The NMSE and SINR curves of the central users under the two algorithms overlap, indicating that central users reuse pilots and suffer the same pilot contamination in both cases. Under the PLUG algorithm, edge users are allocated orthogonal pilots, so they experience no pilot contamination. With 256 antennas, the NMSE of the edge users improves by 10.88 dB and their SINR by 3.23 dB. Figure 10 shows the average achievable sum-rate of the center and edge users as the number of antennas increases when the number of edge users is one.

Figure 12 shows the uplink achievable rate of the target cell for the central and edge users when the number of edge users is 1. It can be seen from Figure 10 that the average achievable rate of the central users is slightly reduced under the PLUG algorithm. This is because the additional orthogonal pilot set increases the pilot overhead and thus reduces spectral efficiency (SE); the edge users, however, gain considerably, with their average achievable rate increasing by 0.42 bps/Hz at 256 antennas. Figure 12 shows that the uplink achievable rate curves of the target cell essentially coincide under the two algorithms, which indicates that the PLUG algorithm does not degrade the overall performance of the target cell but improves fairness between central and edge users. The results also indicate that the proposed PLUG algorithm reduces the outage probability of edge users. Figure 13 compares the uplink achievable rate of the target cell for FRPS and PLUG with different numbers of edge users and 256 antennas. As can be seen from Figure 13, the achievable rate of the PLUG algorithm gradually decreases as the number of edge users increases, because with more than one edge user, the cost of the additional pilot overhead exceeds the performance gain of the PLUG algorithm.

**Figure 10.** Comparison achievable rate of target cell of FRPS and PLUG algorithms.

#### *4.3. Analysis of IPLUG Algorithm*

Figure 11 compares the NMSE of the central and edge users versus the number of antennas for the FRPS, PLUG, and IPLUG algorithms, with decision parameter $\lambda = 0.9$ for IPLUG and one edge user for PLUG. As can be seen from Figure 11, with 256 antennas, the NMSE of the edge users under the IPLUG algorithm is 2.95 dB better than under the PLUG algorithm, and the NMSE of the central users improves by 4.68 dB. This better channel estimation substantially benefits uplink signal detection and downlink precoding, and it results from the flexibility of the IPLUG grouping.

**Figure 11.** NMSE comparison of IPLUG, FRPS, and PLUG algorithms.

Figure 14 compares the uplink achievable rate of the target cell for FRPS and for IPLUG with different decision parameters, with 256 antennas. As can be seen from Figure 14, the achievable rate of the IPLUG algorithm gradually increases once the decision parameter reaches $\lambda \geq 0.68$, and it significantly exceeds that of the FRPS algorithm. This advantage arises because IPLUG avoids wasting pilot overhead on users with good channel conditions that would otherwise be misclassified as edge users, and avoids outages of users with poor channel conditions that would otherwise be misclassified as central users.

#### *4.4. Analysis of Pseudo-Random Pilot Scheme*

Figure 15 shows the probability density function (PDF) of the channel estimation MSE for the proposed pseudo-random scheme and compares it with the cases without and with pilot contamination. The parameters used for these results are $K = 4$ users per cell, $M = 100$ BS antennas, and pilot length $\tau = 8$. As Figure 15 shows, the channel estimation MSE obtained when all cells reuse the same set of orthogonal pilots is concentrated around 1.63, whereas with the proposed pseudo-random pilot design it is concentrated around 0.82. Compared with all cells reusing the same set of orthogonal pilots, the proposed pseudo-random code scheme therefore reduces the channel estimation MSE by nearly a factor of two. When all cells use mutually orthogonal pilots, there is no pilot contamination, but the MSE is still nonzero because of uncorrelated noise and fast fading with a finite number of antennas; as the number of BS antennas increases, these effects are gradually averaged out and the channel estimation MSE approaches 0.

**Figure 12.** Achievable rate comparison of FRPS and PLUG algorithms for user fairness analysis.

**Figure 13.** Comparison of achievable sum rate against the number of edge users for FRPS and PLUG algorithms.

**Figure 14.** Comparison of achievable sum rate against the user grouping factor *λ* for IPLUG and FRPS algorithms.

**Figure 15.** PDF comparison of the proposed pseudo-random code scheme with other cases.

Figure 16 depicts how the system's downlink BER evolves as the number of antennas increases. The number of users per cell is *K* = 8, the number of BS antennas *M* = 100 and the pilot length *τ* = 32. As Figure 16 shows, the performance of the system improves as the number of BS antennas increases. Under full pilot contamination, however, the contamination is so severe that adding antennas brings little improvement. It can therefore be expected that, beyond a certain number of antennas, the system performance will no longer improve.

**Figure 16.** Comparison of the BER of the proposed pseudo-random pilot design scheme with other schemes against the number of BS antennas.

When the proposed pilot design scheme is used, or when there is no contamination, the performance of the system improves significantly as the number of antennas increases. Figure 17 depicts the downlink BER of the system as the average BS transmit power *ρ<sup>f</sup>* increases. The results are analyzed for different pilot lengths, with *K* = 8 users per cell and *M* = 100 BS antennas. As Figure 17 shows, when the number of BS antennas is large, increasing the BS transmit power improves system performance, but only up to a certain value, and this value depends on the severity of pilot contamination. Under complete pilot contamination, the system performance no longer improves once the BS transmit power reaches 25 dB; with the proposed pilot design scheme (pseudo-random code) or without contamination, this saturation point lies above 25 dB. Figure 17 also shows that under complete pilot contamination, increasing the pilot length *τ* does not improve system performance. Without pilot contamination, a longer pilot improves performance only slightly, and this small gain does not justify the additional pilot overhead. With the proposed pseudo-random pilot design scheme, by contrast, the performance increases significantly with the pilot length; when *τ* = 64, the proposed scheme performs very close to the case without pilot contamination. Figure 18 compares the BER of the proposed pseudo-random pilot design scheme with other cases versus the number of users per cell *K*, with *M* = 100 BS antennas and pilot length *τ* = 128.
As Figure 18 shows, when the number of BS antennas is large, increasing the number of users per cell may degrade system performance, but the performance remains far better with the proposed pseudo-random pilot design or without pilot contamination. Moreover, when the number of users is small and the pilot length is *τ* = 128, the proposed pilot design scheme achieves almost the same performance as the no-contamination case. A gap remains for larger numbers of users because increasing the length of the pseudo-random code only brings the codes arbitrarily close to orthogonality; it cannot remove pilot contamination completely. With few users, the residual contamination is too small to noticeably affect the performance of the whole system, whereas with many users the superposition of the residual contamination has a considerable impact. Overall, the proposed pilot design scheme clearly improves system performance.
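The claim that a longer pseudo-random code only approaches orthogonality can be checked numerically. The sketch below assumes independent, equiprobable ±1 codes (the paper's exact code family is not specified here) and estimates the mean squared normalized cross-correlation between two codes of length τ. For such codes the expected value is exactly 1/τ: it shrinks as τ grows but never reaches zero, so some contamination always remains and accumulates as more users share the resource.

```python
import random


def mean_sq_xcorr(tau, pairs, seed=1):
    """Average of |<a, b> / tau|^2 over random pairs of +/-1 codes of length tau."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(pairs):
        a = [rng.choice((-1, 1)) for _ in range(tau)]
        b = [rng.choice((-1, 1)) for _ in range(tau)]
        c = sum(x * y for x, y in zip(a, b)) / tau  # normalized cross-correlation
        acc += c * c
    return acc / pairs


results = {tau: mean_sq_xcorr(tau, 2000) for tau in (8, 32, 128)}
for tau, value in results.items():
    print(f"tau={tau:4d}  mean |c|^2={value:.4f}  1/tau={1 / tau:.4f}")
```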

**Figure 17.** Comparison of the BER of the proposed pseudo-random code and other schemes versus the average BS power *ρ<sup>f</sup>*.

**Figure 18.** BER comparison against the number of cell users for the proposed pseudo-random pilot design and other cases.

Figure 19 compares the NMSE of the proposed IPLUG algorithm, with a decision parameter of 0.9, against the conventional state-of-the-art algorithms [29,30,32]. As the figure shows, the proposed IPLUG algorithm achieves better NMSE performance than the conventional algorithms as the number of base station antennas increases.

**Figure 19.** NMSE comparison of IPLUG and conventional algorithms.

**Figure 20.** Comparison of the BER of the proposed pseudo-random pilot design scheme with other schemes against the number of BS antennas.

Figure 20 illustrates the BER of the proposed pseudo-random code scheme and the conventional schemes as the number of antennas increases, with *K* = 8 users per cell, *M* = 160 BS antennas and pilot length *τ* = 32. As Figure 20 shows, the performance of the system improves as the number of BS antennas increases, and the proposed pseudo-random code scheme achieves better BER performance than the conventional algorithms as the number of antennas grows.

#### **5. Conclusions**

This paper proposes a robust approach for effective pilot decontamination in massive MIMO systems. Two efficient pilot decontamination schemes are proposed: the first uses Path Loss to perform User Grouping (PLUG), and the second is based on pseudo-random codes. The PLUG scheme divides users into central and edge users. Edge users are allocated orthogonal pilots, and central users are assigned reused (multiplexed) pilots, which improves the performance of edge users. The Improved PLUG (IPLUG) scheme is further proposed to overcome a deficiency of PLUG: it dynamically selects and correctly classifies edge and central users, avoiding misclassification and thereby improving the communication quality of service. The analytical and simulation results show that IPLUG avoids both the waste of pilot overhead caused by users with good channel conditions being misclassified as edge users and the communication interruptions caused by users with poor channel conditions being misclassified as central users; IPLUG therefore increases the fairness of communication for each user. The expected channel estimation MSE under the proposed pseudo-random code pilot design scheme is derived and analyzed. This scheme not only effectively improves channel estimation performance, but also yields more accurate channel estimates when an appropriate pseudo-random code and pilot length are selected, thereby improving the performance of the entire downlink system. These conclusions are verified by numerical simulation, which also shows that the proposed pseudo-random pilot design scheme can greatly improve the performance of a system limited by pilot contamination. The proposed PLUG and IPLUG schemes consider only cells with omnidirectional antennas; sectorized cells are not considered.
In addition, pilot allocation is performed within the target cell, and the allocations in different cells are independent of each other, which is a limitation. Future research will therefore extend the PLUG and IPLUG pilot decontamination schemes to sectorized cells and account for the coupling between cells so that pilots can be allocated sensibly across the whole system.

**Author Contributions:** Conceptualization, O.A.S., I.K. and B.M.L.; Data curation, O.A.S. and I.K.; Formal analysis, O.A.S., I.K. and A.T.; Funding acquisition, I.K. and B.M.L.; Investigation, O.A.S., I.K. and A.T.; Methodology, O.A.S., I.K., B.M.L. and A.T.; Project administration, O.A.S. and I.K.; Software, O.A.S., I.K. and A.T.; Writing—original draft, O.A.S., I.K. and B.M.L.; Writing—review & editing, O.A.S., I.K., B.M.L. and A.T.

**Funding:** This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grant number: NRF-2017R1D1A1B03028350).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **An Efficient Pilot Assignment Scheme for Addressing Pilot Contamination in Multicell Massive MIMO Systems**

**Ahmed S. Al-hubaishi 1,\*, Nor Kamariah Noordin 1,2, Aduwati Sali 1,2, Shamala Subramaniam 3,4 and Ali Mohammed Mansoor 5**


Received: 25 February 2019; Accepted: 21 March 2019; Published: 27 March 2019

**Abstract:** Reusing the same pilot group across cells to cope with bandwidth limitations leads to pilot contamination, which causes severe inter-cell interference at the target cell. Pilot contamination is inherent to multicell massive multiple-input multiple-output (MIMO) systems and degrades system performance even when more antennas are added at the base station. In this paper, we propose an efficient pilot assignment (EPA) scheme that addresses this issue by maximizing the minimum uplink rate of the target cell's users. To achieve this, we exploit the large-scale characteristics of the fading channel to minimize the outgoing inter-cell interference at the target cell. Simulation results show that the EPA scheme outperforms both the conventional and the smart pilot assignment (SPA) schemes by reducing the effect of inter-cell interference. In particular, the EPA scheme significantly improves the achievable uplink rate and the cumulative distribution functions (CDFs) of both the signal-to-interference-plus-noise ratio (SINR) and the uplink rate.

**Keywords:** pilot contamination; massive MIMO; pilot assignment; large-scale fading coefficients

#### **1. Introduction**

Equipping the base station (BS) with a large number of antennas, known as massive multiple-input multiple-output (MIMO), is considered one of the fundamental technologies enabling 5G [1]. The technology was introduced to meet the increasing demand for mobile data in 5G [2]. Although massive MIMO increases spectral efficiency, enhances energy efficiency, and reduces the effect of small-scale fading [3–7], it inevitably gives rise to pilot contamination. In massive MIMO, the time-division duplex (TDD) protocol is preferred over frequency-division duplex (FDD) [8,9], because channel reciprocity allows the channel to be estimated in one direction (i.e., the uplink) without a separate estimation in the other (i.e., the downlink). In other words, exploiting channel reciprocity in TDD minimizes the overhead signaling used for channel estimation, which saves considerable network bandwidth. Although uplink channel estimation makes TDD massive MIMO efficient, the channel coherence block has a limited size, so orthogonal pilot sequences cannot be allocated to all users in all cells. To overcome this problem, the orthogonal pilot sequences have to be reused across cells. Pilot reuse addresses the shortage of pilots, but the channel estimate obtained in a given cell is then contaminated by pilots transmitted by users in other cells. Specifically, the inter-cell interference increases the estimation error and also makes the channel estimates of two or more users sharing the same pilot sequence correlated at a given cell [10]. As a result, the performance of multicell massive MIMO systems deteriorates in both uplink and downlink transmission. This issue is referred to as pilot contamination and is depicted in Figure 1.

**Figure 1.** The effect of pilot contamination in multicell massive MIMO systems at a cell *a*, where the solid line represents the direct gain and the dotted line represents the inter-cell interference.

Several methods have been proposed to eliminate or mitigate pilot contamination. Among them, pilot assignment has been identified as a promising technique. The smart pilot assignment (SPA) method proposed in [11] focuses on adjusting the mapping between users and pilot sequences, but it does not consider the inter-cell interference that causes pilot contamination. In this paper, we propose an efficient pilot assignment mechanism to improve the performance of users under intense pilot contamination in multicell massive MIMO systems. We summarize our contributions below:


The rest of this paper is organized as follows. The related work is summarized in Section 2, the system model is described in Section 3, the pilot contamination phenomenon and the achievable uplink rate are illustrated in Section 4, the EPA scheme is explained in Section 5, the simulation results are depicted in Section 6, and finally, this paper is concluded in Section 7.

Notation: Throughout this paper, bold lower-case letters denote vectors and bold upper-case letters denote matrices. $\mathbf{I}\_M$ denotes the $M \times M$ identity matrix. The operators $(\cdot)^{-1}$, $(\cdot)^{T}$, and $(\cdot)^{H}$ denote the inverse, transpose, and conjugate transpose operations, respectively. The expectation operator is denoted by $E\{\cdot\}$.

#### **2. Related Work**

Different traditional algorithms based on pilot assignment have been proposed for pilot decontamination [12,13]. In the vertex graph-coloring-based pilot assignment proposed in [12], pilot sequences are allocated to users according to an inter-cell interference (ICI) graph, which is built from both the angle of arrival (AoA) correlation and the distances between users. However, this scheme requires second-order channel information to construct the ICI graph. A deep learning-based pilot allocation scheme (DL-PAS) is proposed in [13] to address the pilot contamination problem in massive MIMO systems. This algorithm learns the relationship between pilot assignment and users' locations. However, the DL algorithm requires a large amount of data and consequently a long processing time.

The authors in [14,15] developed location-based pilot assignment approaches for pilot decontamination. In [14], a new expression for line-of-sight (LOS) interference is derived and used as the criterion for pilot allocation. Although the sum spectral efficiency (SE) improves, the pilot assignment process takes a long time, especially in large networks. The work in [15] characterizes the angular region of the targeted user and performs pilot assignment with the aim of making this region interference-free. This angular region is determined by both the number of BS antennas and the location of the targeted user. However, the pilot assignment problem is formulated as a joint optimization problem, which introduces high computational complexity.

In [16,17], pilot allocation based on pilot reuse with a reuse factor greater than 1 is considered for eliminating pilot contamination. A systematically constructed pilot reuse method is proposed in [16], in which neighboring cells use different sets of pilot sequences according to a tree division. To improve performance, the method increases the distance between cells that share the same pilot set, and the depth of the tree is increased as pilot contamination becomes more severe. This approach performs well when the ratio of the channel coherence time to the number of users in each cell is relatively large. To improve the quality of service (QoS) of edge users, a soft pilot reuse (SPR) scheme was proposed in [17]. The channel quality of each user is first compared with a threshold before pilot allocation, but finding the optimal threshold value adds computational cost and therefore complexity.

Pilot allocation schemes that consider fairness among users in order to mitigate pilot contamination were proposed in [18,19]. Specifically, a pilot allocation scheme that maximizes the sum rate of the system while guaranteeing fairness among users was proposed in [18]. An optimization problem is formulated based on a max-product criterion, and both a min-leakage algorithm and a user-exchange algorithm based on greedy (UEBG) pilot allocation were suggested to solve it. Although this scheme achieves almost the same performance as the optimal exhaustive search algorithm (ESA), it suffers from high complexity. To mitigate pilot contamination, the pilot assignment scheme in [19] uses a harmonic SINR utility function to regulate fairness among users. However, its complexity increases as the number of users and the network size grow (beyond two cells).

In the pilot assignment scheme proposed in [20], which is based on user performance degradation, the degradation of every user is first evaluated from its uplink achievable rate. The optimal pilot sequences are then assigned in a greedy manner to the users suffering the highest degradation. However, this scheme is not effective under bad channel conditions.

In [11,21], the pilot allocation approaches aim at enhancing the performance of users with poor SINR. The pilot allocation in [21] focuses on maximizing the sum capacity of the whole system for pilot decontamination; the pilot sequences are first assigned to the users with bad channel conditions. However, the complexity of the pilot assignment procedure increases with the network size. An SPA scheme is proposed in [11] to improve the performance of users with poor SINR: users with low channel quality are assigned the pilot sequences that experience low interference. However, the gain of this scheme is limited because it does not consider the inter-cell interference that causes pilot contamination.

Some authors have combined two schemes to improve performance [22,23]. A joint pilot assignment scheme proposed in [22] combines the time-shifted [24] and SPA [11] schemes to mitigate the effect of pilot contamination: inter-group interference is suppressed using the strategy of [24], whereas SPA is used to reduce intra-group interference. Although the overall performance improves, the mutual interference between downlink data and uplink pilot signals cannot be eliminated despite the use of SPA. In [23], new pilot assignment schemes, such as greedy-based and swapping-based assignment, were implemented together with a pilot contamination precoding (PCP) design for the massive MIMO downlink. This combination offers a considerable improvement over random pilot assignment, but the PCP matrix must be updated whenever the pilot assignment changes.

By exploiting the channel sparsity of wideband massive MIMO systems, pilot contamination can be removed with the help of the pilot assignment policy in [25], which is designed to help identify the subspace of the desired channel. The difficulty of this approach lies in subspace estimation, which can be achieved over multiple frames after randomizing the pilot contamination.

Unlike the aforementioned works [11,20], we consider the source of inter-cell interference, which is essentially the cause of pilot contamination, throughout pilot assignment. Some other works [12–15] require information such as user location, AoA, or LOS interference for pilot assignment, which is not always easy to estimate, while our approach requires only large-scale fading coefficients, which can be tracked easily because they change slowly compared with the coherence interval. Moreover, compared with previous works [17–19,21], our algorithm is not computationally intensive and can therefore be applied to large-scale networks.

#### **3. System Model**

In this section, we describe the system model of the TDD massive MIMO system. The uplink comprises $L$ cells, each containing a BS equipped with $M$ antennas. In each cell, $K$ single-antenna users communicate simultaneously with their serving BS, assuming that $M \gg K$ [2,5]. The propagation channel from the $k$-th user in the $j$-th cell to the BS in the $i$-th cell is modeled as Rayleigh block fading [26], and the channel vector $\mathbf{h}\_{jk}^{i} \in \mathbb{C}^{M \times 1}$ is given by:

$$\mathbf{h}\_{j\_k}^i = \mathbf{g}\_{j\_k}^i \sqrt{\boldsymbol{\beta}\_{j\_k}^i}. \tag{1}$$

where $\mathbf{g}\_{jk}^{i}$ and $\beta\_{jk}^{i}$ denote the small-scale fading vector and the large-scale fading coefficient, respectively. The small-scale fading vector has i.i.d. complex Gaussian entries with zero mean and unit variance, i.e., $\mathbf{g}\_{jk}^{i} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}\_M)$, while the large-scale fading coefficient captures both path loss and shadowing; it can be tracked easily because it changes slowly compared with the coherence interval $\tau\_c = B\_c T\_c$ [27–29], where $B\_c$ and $T\_c$ denote the coherence bandwidth and the coherence time, respectively. Figure 2 illustrates the coherence block for the TDD protocol. We also assume that the large-scale fading coefficient is the same for all antenna elements, since the distance between user $k$ and the BS is much larger than the distances between antenna elements.
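Equation (1), with a single large-scale coefficient shared by all antennas, can be sketched as follows; the value of `beta`, the antenna counts and the seed are arbitrary choices for illustration. The per-antenna channel power $\|\mathbf{h}\|^2/M$ concentrates around $\beta$ as $M$ grows, which is the averaging effect that the later asymptotic expressions rely on.

```python
import math
import random


def rayleigh_channel(M, beta, rng):
    """h = sqrt(beta) * g with g ~ CN(0, I_M): one large-scale coefficient for all M antennas."""
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2, so E|g_m|^2 = 1
    return [math.sqrt(beta) * complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(M)]


def per_antenna_power(h):
    """||h||^2 / M."""
    return sum(abs(x) ** 2 for x in h) / len(h)


rng = random.Random(7)
beta = 0.3
for M in (8, 64, 1024):
    print(M, round(per_antenna_power(rayleigh_channel(M, beta, rng)), 4))
```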

**Figure 2.** Time-division duplex (TDD) Protocol.

#### **4. Pilot Contamination and Achievable Uplink Rate**

Since the size of a channel coherence block is limited, it is difficult to assign orthogonal pilot sequences to all users in order to prevent pilot contamination. The pilot sequences must therefore be reused in all cells to overcome this limitation [2]. The pilot sequences $\boldsymbol{\Phi} = [\boldsymbol{\phi}\_1, \boldsymbol{\phi}\_2, \dots, \boldsymbol{\phi}\_K]^{T} \in \mathbb{C}^{K \times \tau\_p}$, each of length $\tau\_p$, are assumed mutually orthogonal, i.e., $\boldsymbol{\Phi}\boldsymbol{\Phi}^{H} = \tau\_p \mathbf{I}\_K$. During the pilot phase, the pilot sequences are distributed randomly to the users. The received signal $\mathbf{U}\_i^{\phi} \in \mathbb{C}^{M \times \tau\_p}$ at the BS in the $i$-th cell can then be written as:

$$\mathbf{U}\_i^{\phi} = \sqrt{\rho\_{\phi}} \sum\_{j=1}^{L} \sum\_{k=1}^{K} \mathbf{h}\_{jk}^{i} \boldsymbol{\phi}\_k^{T} + \mathbf{N}\_i^{\phi} \tag{2}$$

$$\mathbf{U}\_{i}^{\phi} = \sqrt{\rho\_{\phi}} \sum\_{k=1}^{K} \mathbf{h}\_{ik}^{i} \boldsymbol{\phi}\_{k}^{T} + \sqrt{\rho\_{\phi}} \sum\_{\substack{j=1 \\ j \neq i}}^{L} \sum\_{k=1}^{K} \mathbf{h}\_{jk}^{i} \boldsymbol{\phi}\_{k}^{T} + \mathbf{N}\_{i}^{\phi} \tag{3}$$

where $\rho\_{\phi}$ denotes the pilot transmission power, and $\mathbf{N}\_i^{\phi} \in \mathbb{C}^{M \times \tau\_p}$ denotes the additive white Gaussian noise (AWGN) matrix, whose elements are independent and identically distributed (i.i.d.) random variables with zero mean and variance $\sigma\_N^2$. The received signal $\mathbf{U}\_i^{\phi}$ is called the observation; the BS in cell $i$ uses it to estimate the channel responses. The first term in (3) represents the pilot signals received from users in the serving cell, whereas the second term represents the inter-cell interference from the neighboring cells, which causes pilot contamination. Correspondingly, the received uplink data signal $\mathbf{u}\_i^{d} \in \mathbb{C}^{M}$ at the BS in the $i$-th cell can be represented by:
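To see how the second term of (3) turns into pilot contamination, the sketch below builds the received pilot matrix for a toy network and correlates it with the conjugate pilot $\boldsymbol{\phi}\_k^{*}$ (the processing used by the estimator that follows). It assumes DFT sequences as the orthogonal pilots and sets the noise to zero for clarity; the sizes, seed and data layout `H[j][k]` are hypothetical. After despreading, the target user's channel is summed with the channels of every user that reuses $\boldsymbol{\phi}\_k$ in the other cells, scaled by $\sqrt{\rho\_\phi}\,\tau\_p$.

```python
import cmath
import math
import random


def dft_pilots(K, tau):
    """K mutually orthogonal pilots of length tau >= K: phi_k[t] = exp(j*2*pi*k*t/tau)."""
    return [[cmath.exp(2j * math.pi * k * t / tau) for t in range(tau)] for k in range(K)]


def received_pilot(H, pilots, rho, noise):
    """Eq. (2): U = sqrt(rho) * sum_j sum_k h_jk phi_k^T + N.
    H[j][k] is the M-vector from user k of cell j to the target BS; noise is M x tau."""
    L, K = len(H), len(pilots)
    M, tau = len(H[0][0]), len(pilots[0])
    return [[math.sqrt(rho) * sum(H[j][k][m] * pilots[k][t] for j in range(L) for k in range(K))
             + noise[m][t] for t in range(tau)] for m in range(M)]


def despread(U, phi):
    """u^p = U phi^*: correlate each antenna's received pilot with the conjugated pilot."""
    return [sum(row[t] * phi[t].conjugate() for t in range(len(phi))) for row in U]


rng = random.Random(0)
L, K, M, tau, rho = 3, 2, 4, 4, 2.0
H = [[[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)] for _ in range(K)]
     for _ in range(L)]
pilots = dft_pilots(K, tau)
U = received_pilot(H, pilots, rho, [[0j] * tau for _ in range(M)])  # noise-free
u_p = despread(U, pilots[0])
# Pilot 0 is reused in every cell, so u_p mixes h_i0 with all h_j0 (pilot contamination).
contaminated = [math.sqrt(rho) * tau * sum(H[j][0][m] for j in range(L)) for m in range(M)]
```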

$$\mathbf{u}\_{i}^{d} = \sqrt{\rho\_{u}} \sum\_{j=1}^{L} \sum\_{k=1}^{K} \mathbf{h}\_{jk}^{i} x\_{jk}^{u} + \mathbf{n}\_{i}^{u} \tag{4}$$

where $x\_{jk}^{u}$ denotes the uplink symbol transmitted by user $k$ in the $j$-th cell, $\rho\_u$ denotes the uplink transmit power with $E\{|x\_{jk}^{u}|^2\} = 1$, and $\mathbf{n}\_i^{u} \in \mathbb{C}^{M}$ denotes the AWGN vector with zero mean and variance $\sigma\_n^2$. The minimum mean square error (MMSE) estimator is used to obtain the channel estimate $\hat{\mathbf{h}}\_{jk}^{i} \in \mathbb{C}^{M \times 1}$ [9]. Based on the observation $\mathbf{U}\_i^{\phi}$ in (2), the MMSE estimate of the channel vector is given by [10]:

$$
\hat{\mathbf{h}}\_{jk}^{i} = \sqrt{\rho\_{\phi}}\, \mathbf{R}\_{jk}^{i} \boldsymbol{\Psi}\_{jk}^{i} \mathbf{u}\_{ijk}^{p} \tag{5}
$$

and

$$\boldsymbol{\Psi}\_{jk}^{i} = \left(\sum\_{l=1}^{L} \rho\_{\phi} \tau\_p \, \mathbf{R}\_{lk}^{i} + \sigma\_N^2 \mathbf{I}\_M\right)^{-1} \tag{6}$$

where $\mathbf{u}\_{ijk}^{p} = \mathbf{U}\_i^{\phi} \boldsymbol{\phi}\_k^{*}$ is called the processed received signal, $\boldsymbol{\Psi}\_{jk}^{i}$ denotes the inverse of the normalized correlation matrix of the processed signal, and $\mathbf{R}\_{jk}^{i} = E\{\mathbf{h}\_{jk}^{i} \mathbf{h}\_{jk}^{iH}\}$ denotes the spatial correlation matrix of the channel to be estimated.
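For the special case of uncorrelated Rayleigh fading, $\mathbf{R}\_{jk}^{i} = \beta\_{jk}^{i}\mathbf{I}\_M$ (an assumption made only for this sketch), the MMSE estimator collapses to a scalar gain, $\hat{\mathbf{h}}\_{jk}^{i} = c\_{jk}\,\mathbf{u}\_{ijk}^{p}$ with $c\_{jk} = \sqrt{\rho\_{\phi}}\,\beta\_{jk}^{i} / (\sum\_{l} \rho\_{\phi}\tau\_p\beta\_{lk}^{i} + \sigma\_N^2)$, where the sum runs over the users that share pilot $k$. The helper names and numbers below are hypothetical. Because every co-pilot user's estimate is the same vector $\mathbf{u}^{p}$ times a different scalar, the estimates are perfectly correlated, which is the correlation described in the introduction.

```python
import math


def mmse_gain(beta_own, betas_same_pilot, rho, tau, sigma2):
    """Scalar c with h_hat = c * u_p when R = beta * I_M (uncorrelated Rayleigh fading).
    betas_same_pilot lists beta_lk^i for every user l sharing pilot k, target included."""
    return math.sqrt(rho) * beta_own / (rho * tau * sum(betas_same_pilot) + sigma2)


def mmse_mse(beta_own, betas_same_pilot, rho, tau, sigma2):
    """Per-antenna estimation error E|h - h_hat|^2 / M for the same model."""
    return beta_own - rho * tau * beta_own ** 2 / (rho * tau * sum(betas_same_pilot) + sigma2)


betas = [1.0, 0.2, 0.1]          # target user first, then co-pilot users in two other cells
rho, tau, sigma2 = 1.0, 8, 0.5
c_own = mmse_gain(betas[0], betas, rho, tau, sigma2)
c_int = mmse_gain(betas[1], betas, rho, tau, sigma2)
u_p = [1 + 2j, -0.5 + 0.25j, 3 - 1j]   # any processed pilot signal (M = 3 here)
h_own = [c_own * x for x in u_p]
h_int = [c_int * x for x in u_p]       # a scaled copy of h_own
```

With one equally strong co-pilot user and negligible noise, `mmse_mse(1.0, [1.0, 1.0], 1.0, 8, 1e-9)` is about 0.5: half of the channel power cannot be recovered, however large $M$ is.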

The estimated channel is then used to detect the uplink data symbols and to precode the downlink data. Herein, we consider both maximum ratio combining (MRC) and zero forcing (ZF) as linear detectors at the BS, which are given by [30]:

$$\mathbf{A}\_{i} = \begin{cases} \mathbf{\hat{H}}\_{i}^{i} & \text{MRC} \\ \mathbf{\hat{H}}\_{i}^{i} \left( \mathbf{\hat{H}}\_{i}^{i\,H} \, \mathbf{\hat{H}}\_{i}^{i} \right)^{-1} & \text{ZF} \end{cases} \tag{7}$$

The detected signal is obtained by multiplying the received uplink data signal $\mathbf{u}\_i^{d}$ by $\mathbf{a}\_{ik}^{iH}$, where the combining vector $\mathbf{a}\_{ik}^{i}$ is the $k$-th column of the matrix $\mathbf{A}\_i$, and $\mathbf{h}\_{ik}^{i}$ is the $k$-th column of the matrix $\mathbf{H}\_i^{i}$. Therefore, the detected symbol of user $k$ at the BS in cell $i$ can be expressed as:
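The two combiners in (7) can be sketched in plain Python; a small Gauss-Jordan routine stands in for a linear-algebra library, and the matrix sizes and seed are arbitrary. The check shows the defining property of ZF, $\mathbf{A}\_i^{H}\hat{\mathbf{H}}\_i^{i} = \mathbf{I}\_K$: each column of $\mathbf{A}\_i$ nulls the other users' estimated channels, whereas MRC only maximizes the gain toward its own user and leaves nonzero cross terms.

```python
import random


def conj_t(A):
    """Conjugate transpose of a list-of-rows complex matrix."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries)."""
    n = len(A)
    aug = [list(row) + [1 + 0j if r == c else 0j for c in range(n)] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def combiner(H_hat, kind):
    """Eq. (7): MRC uses H_hat itself; ZF uses H_hat (H_hat^H H_hat)^{-1}."""
    if kind == "MRC":
        return H_hat
    if kind == "ZF":
        return matmul(H_hat, inverse(matmul(conj_t(H_hat), H_hat)))
    raise ValueError(f"unknown combiner: {kind}")


rng = random.Random(11)
M, K = 8, 3
H_hat = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(K)] for _ in range(M)]
A_zf = combiner(H_hat, "ZF")
G_zf = matmul(conj_t(A_zf), H_hat)                      # K x K, should be the identity
G_mrc = matmul(conj_t(combiner(H_hat, "MRC")), H_hat)   # Gram matrix: off-diagonals nonzero
```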

$$z\_{ik}^{u} = \mathbf{a}\_{ik}^{iH} \mathbf{u}\_i^{d} = \mathbf{a}\_{ik}^{iH} \left( \sqrt{\rho\_u} \sum\_{j=1}^{L} \sum\_{n=1}^{K} \mathbf{h}\_{jn}^{i} x\_{jn}^{u} + \mathbf{n}\_i^{u} \right) \tag{8}$$

$$z\_{ik}^{u} = \sqrt{\rho\_u}\, \mathbf{a}\_{ik}^{iH} \mathbf{h}\_{ik}^{i} x\_{ik}^{u} + \sqrt{\rho\_u} \sum\_{\substack{n=1\\n\neq k}}^{K} \mathbf{a}\_{ik}^{iH} \mathbf{h}\_{in}^{i} x\_{in}^{u} + \sqrt{\rho\_u} \sum\_{\substack{j=1\\j\neq i}}^{L} \sum\_{n=1}^{K} \mathbf{a}\_{ik}^{iH} \mathbf{h}\_{jn}^{i} x\_{jn}^{u} + \mathbf{a}\_{ik}^{iH} \mathbf{n}\_{i}^{u} \tag{9}$$

The first term in (9) represents the desired signal, the second one represents the intra-cell interference, the third term is the effect of pilot contamination (inter-cell interference), and the last one represents the uncorrelated noise.

Consequently, the average SINR of the *k-th* user in the target cell *i* can be evaluated as:

$$SINR\_{ik}^u = \frac{\rho\_u \left|\mathbf{a}\_{ik}^{iH} \mathbf{h}\_{ik}^i\right|^2}{\rho\_u \sum\_{\substack{j=1\\j\neq i}}^L \sum\_{n=1}^{K} \left|\mathbf{a}\_{ik}^{iH} \mathbf{h}\_{jn}^i\right|^2 + \frac{\upsilon\_{ik}^i}{\rho\_u}} \tag{10}$$

and

$$\upsilon\_{ik}^{i} = \rho\_u^{2} \sum\_{\substack{n=1\\n \neq k}}^{K} \left|\mathbf{a}\_{ik}^{iH} \mathbf{h}\_{in}^{i}\right|^2 + \rho\_u \left\|\mathbf{a}\_{ik}^{i}\right\|^2$$

where $\upsilon\_{ik}^{i}$ collects the intra-cell interference and the uncorrelated noise, whose effect becomes negligible as the number of antennas grows ($M \to \infty$) [5]. The uplink SINR can then be described by the large-scale fading coefficients $\beta\_{jk}^{i}$ as follows:

$$SINR\_{ik}^{u} = \frac{\left(\beta\_{ik}^{i}\right)^{2}}{\sum\_{j \neq i} \left(\beta\_{jk}^{i}\right)^{2}}, \quad \text{when } M \to \infty \tag{11}$$

The above expression shows that the effects of small-scale fading and thermal noise are averaged out as the number of antennas increases [5]. Therefore, the ergodic achievable uplink rate of user $k$ according to [26] is:

$$R\_{ik}^{u} = \frac{\tau\_u}{\tau\_c} \log\_2\left(1 + SINR\_{ik}^{u}\right) \tag{12}$$

where $R\_{ik}^{u}$ is measured in bit/channel use and $\tau\_u$ denotes the uplink duration. From (11) and (12), the uplink rate of multicell massive MIMO systems is limited by pilot contamination, and it cannot be increased by adding serving antennas or by raising the transmit powers $\rho\_u$ and $\rho\_{\phi}$.
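The asymptotic SINR $(\beta\_{ik}^{i})^2 / \sum\_{j\neq i}(\beta\_{jk}^{i})^2$ and a rate of the form $(\tau\_u/\tau\_c)\log\_2(1+\mathrm{SINR})$ (the usual pre-log form, assumed here) can be evaluated directly from large-scale gains. The gains and slot lengths below are hypothetical. Neither $M$ nor any transmit power appears, and scaling every received gain by a common factor leaves the SINR unchanged, which is the saturation effect stated above.

```python
import math


def asymptotic_sinr(beta_own, beta_cross):
    """(beta_ik^i)^2 / sum_{j != i} (beta_jk^i)^2: the M -> infinity SINR under pilot reuse."""
    return beta_own ** 2 / sum(b ** 2 for b in beta_cross)


def uplink_rate(sinr, tau_u, tau_c):
    """(tau_u / tau_c) * log2(1 + SINR) in bit/channel use (assumed pre-log form)."""
    return (tau_u / tau_c) * math.log2(1 + sinr)


sinr = asymptotic_sinr(1e-3, [1e-4, 2e-4])   # hypothetical large-scale gains
rate = uplink_rate(sinr, tau_u=100, tau_c=200)
print(sinr, rate)
```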

#### **5. Proposed Scheme**

In this section, an efficient heuristic algorithm is developed to address pilot contamination in multicell massive MIMO systems. To this end, the assignment and reuse of the pilot group across the cells of the network is formulated as an optimization problem.

#### *5.1. Problem Formulation*

Formally, the optimization problem is formulated as:

$$\mathcal{P}: \quad \max\_{\text{pilot assignment}} \; \min\_{\forall k \in i} \frac{\left(\beta\_{ik}^{i}\right)^{2}}{\sum\_{j \neq i} \left(\beta\_{jk}^{i}\right)^{2}}, \quad M \to \infty \tag{13}$$

The above optimization problem is based on the method proposed in [11], which assumes a very large number of antennas and therefore relies only on the large-scale fading coefficients $\beta\_{jk}^{i}$. To address pilot contamination, this study concentrates on assigning the pilot sequences within a specific cell of a multicell massive MIMO system. In the target cell, the number of possible assignments is determined by the number of users $K$ (there are $K!$ ways to map $K$ pilots to $K$ users), which is usually very large. In contrast, the conventional scheme assigns the pilot sequences $\boldsymbol{\Phi} = [\boldsymbol{\phi}\_1, \boldsymbol{\phi}\_2, \dots, \boldsymbol{\phi}\_K]^{T}$ to the $K$ users randomly.
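To make the size of the search concrete, the max-min problem in (13) can be solved by brute force for a toy cell. The sketch assumes that the interference power on each pilot, $\sum\_{j\neq i}(\beta\_{jk}^{i})^2$ in the denominator of (13), is fixed by the neighboring cells and passed in as `zeta`; all numbers and function names are hypothetical. Enumerating all $K!$ permutations already means 3,628,800 candidates for $K = 10$, which is why a heuristic is needed.

```python
import itertools
import math


def worst_sinr(assign, beta_own, zeta):
    """Smallest asymptotic SINR in the target cell when user k uses pilot assign[k]."""
    return min(beta_own[k] ** 2 / zeta[p] for k, p in enumerate(assign))


def exhaustive_assignment(beta_own, zeta):
    """Solve the max-min problem by trying every pilot permutation (K! candidates)."""
    K = len(beta_own)
    return max(itertools.permutations(range(K)), key=lambda a: worst_sinr(a, beta_own, zeta))


beta_own = [1.0, 0.2, 0.5]      # target-cell users' own large-scale gains
zeta = [0.04, 0.01, 0.09]       # interference power on pilots 0, 1, 2
best = exhaustive_assignment(beta_own, zeta)
print(best, worst_sinr(best, beta_own, zeta), math.factorial(10))
```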

The performance of multicell massive MIMO systems is severely degraded by strong inter-cell interference from neighboring cells, and the degradation is worse when the users in the target cell have poor channel quality. In the SPA scheme, the users with the worst channel quality are assigned the pilot sequences with the lowest inter-cell interference. Although these pilot sequences have the lowest interference, that interference can still be high relative to the weak desired signal of users with bad channel quality. Therefore, the interference associated with such pilot sequences must be minimized.

#### *5.2. Proposed Solution*

To minimize the outgoing inter-cell interference from neighboring cells at the target cell, we ensure that the cross gains of the interfering users toward the desired users are weak. The large-scale fading coefficients are used to measure the effect of inter-cell interference at the target BS; this is possible because they change slowly compared with the coherence interval $\tau\_c$ and each user reports its measurements to its serving BS. The conditions required for obtaining the large-scale fading coefficients can be met in long term evolution-advanced (LTE-A) systems, where the serving BSs hold channel information for the available neighboring BSs and each user keeps tracking these BSs until a reliable BS is identified for handover. To enhance cooperation among the BSs, we assume coordinated multipoint (CoMP) operation. Furthermore, a mobility management entity (MME) is connected to the BSs through the S1 interface and has substantial computing capability, so it can collect the large-scale fading coefficients from the connected BSs [31,32].

To mitigate the degradation suffered by users with poor channel quality or high interference, the SINR is optimized by assigning the pilot sequences associated with low interference to the users with poor channel quality.

To achieve this, we propose a heuristic algorithm based on SPA to solve the optimization problem in (13). Before presenting the algorithm, we define the parameters $\eta_{jk}$, which characterize the squared cross gain of the interfering users in the neighboring cells toward the target BS *i*:

$$
\eta_{jk} = \left|\beta_{jk}^{i}\right|^2, \quad k = 1, 2, \dots, K, \; j = 1, 2, \dots, L, \; j \neq i.
$$

The interference produced at the target cell by the users that share the same pilot sequence $\boldsymbol{\phi}_k$ can be evaluated as

$$\xi_k = \sum_{j \neq i} \eta_{jk} \tag{14}$$

In addition, the parameters $\mathcal{O}_k$ characterize the squared channel quality of the target cell's users:

$$
\mathcal{O}_k = \left|\beta_{ik}^{i}\right|^2, \quad k = 1, 2, \dots, K
$$

The optimization problem in (13) can therefore be rewritten as

$$\mathcal{P} \xrightarrow{M \to \infty} \mathcal{P}' : \; \max_{\Phi} \; \min_{k = 1, \dots, K} \frac{\mathcal{O}_k}{\sum_{j \neq i} \eta_{jk}} \tag{15}$$
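To make the large-$M$ objective in (15) concrete, the following Python sketch evaluates, for a given pilot assignment, the ratio of channel quality to pilot interference seen by each target-cell user and returns its minimum. It is an illustration under our own conventions, not the authors' code: `beta` is assumed to be an $(L, K)$ NumPy array with `beta[j, k]` equal to $\beta_{jk}^{i}$ in linear scale (row `i` holds the target cell), and `pilots` is a hypothetical $(L, K)$ integer array whose row `j` lists the pilot index used by each user of cell `j`.

```python
import numpy as np

def pilot_interference(beta, pilots, i):
    """xi[p]: sum over neighbor cells j != i of eta_jk for the user of cell j on pilot p."""
    eta = beta ** 2                          # squared cross gains toward target BS i
    L, K = beta.shape
    xi = np.zeros(K)
    for j in range(L):
        if j == i:
            continue                         # the target cell does not interfere with itself
        for k in range(K):
            xi[pilots[j, k]] += eta[j, k]    # user k of cell j contaminates its pilot
    return xi

def asymptotic_min_sinr(beta, pilots, i):
    """Minimum over target-cell users of O_k / xi(pilot of user k), as in (15)."""
    xi = pilot_interference(beta, pilots, i)
    quality = beta[i] ** 2                   # O_k = |beta^i_ik|^2
    sinr = quality / xi[pilots[i]]           # each user sees the interference on its own pilot
    return sinr.min()
```

In this representation, the conventional scheme corresponds to filling each row of `pilots` with a random permutation, whereas SPA and EPA choose the permutations deliberately.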

The proposed efficient pilot assignment (EPA) algorithm for solving the optimization problem in (15) is summarized in Algorithm 1.

**Algorithm 1.** Efficient Pilot Assignment (EPA).

1: **Input**:

2: $\beta_{jk}^{i}$, $\forall i, j, k$

3: **Output**:

The available large-scale fading coefficients are exploited to measure the interference from the neighbor cells. In Algorithm 1, the users in the neighbor cells are classified into different levels according to their squared cross gain $\eta_{jk}$, which indicates the strength of the interference they cause at the target cell *i*. The users that cause the highest interference (those with the largest $\eta_{jk}$) form level $\mathcal{V}_1$; this level contains the worst interfering user from each neighbor cell. The second level, $\mathcal{V}_2$, contains the users that cause less interference than those in $\mathcal{V}_1$. The classification continues until the last level, $\mathcal{V}_K$, which contains the users that produce the smallest interference. The *k*-th interference level can be represented by

$$\mathcal{V}\_k = [\eta\_{1k}, \eta\_{2k}, \dots, \eta\_{(L-1)k}] \tag{16}$$

The amount of interference produced by the users in each level is given by (14). The interfering users in each level are then assigned the same pilot sequence. For instance, the users in $\mathcal{V}_1$ and $\mathcal{V}_K$ are assigned the pilot sequences $\boldsymbol{\phi}_1$ and $\boldsymbol{\phi}_K$, respectively. As a result, $\boldsymbol{\phi}_1$ suffers from the highest interference, whereas $\boldsymbol{\phi}_K$ suffers from the lowest; the remaining pilot sequences have interference levels between those of $\boldsymbol{\phi}_1$ and $\boldsymbol{\phi}_K$. After the inter-cell interference at the serving BS has been minimized, the second step assigns pilot sequences to the target cell's users by solving
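The level classification and pilot assignment in the neighbor cells can be sketched as follows. This is a minimal illustration, assuming an $(L, K)$ NumPy array `beta` of linear-scale $\beta_{jk}^{i}$ values toward the target BS `i`; ranks are 0-based, so rank `r` corresponds to level $\mathcal{V}_{r+1}$ and pilot $\boldsymbol{\phi}_{r+1}$. Ties are broken by user index, which the text does not specify, and the function name is hypothetical.

```python
import numpy as np

def assign_neighbor_pilots(beta, i):
    """Rank the users of every neighbor cell by eta_jk and give rank r the pilot index r.

    Returns (pilots, xi): pilots[j, k] is the pilot of user k in cell j (-1 for the
    not-yet-assigned target cell i), and xi[p] is the interference on pilot p, Eq. (14).
    """
    eta = beta ** 2
    L, K = beta.shape
    pilots = np.full((L, K), -1, dtype=int)
    for j in range(L):
        if j == i:
            continue
        order = np.argsort(-eta[j], kind="stable")   # strongest interferer first
        pilots[j, order] = np.arange(K)              # r-th strongest user -> level V_{r+1}
    xi = np.zeros(K)
    for j in range(L):
        if j != i:
            np.add.at(xi, pilots[j], eta[j])         # accumulate interference per pilot
    return pilots, xi
```

Because each neighbor cell contributes its *r*-th largest $\eta$ to pilot *r*, the resulting $\xi_k$ is non-increasing in *k*, matching the ordering from $\boldsymbol{\phi}_1$ (highest interference) to $\boldsymbol{\phi}_K$ (lowest). Per cell, the work is one sort of *K* values.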

$$\mathcal{P} \xrightarrow{M \to \infty} \mathcal{P}' : \; \max_{\Phi} \; \min_{k = 1, \dots, K} \frac{\mathcal{O}_k}{\xi_k} \tag{17}$$

In the EPA algorithm, the pilot assignment for the users of the target cell therefore depends on both the squared channel quality $\mathcal{O}_k$ and the minimized outgoing interference $\xi_k$ caused by the users that share the same pilot sequence in level $\mathcal{V}_k$. The optimization problem in (17) can be solved with the SPA algorithm, in which users with bad channel quality are kept away from the pilot sequences that would cause them severe interference. Thus, the users with the worst channel quality are assigned the pilot sequences with the lowest inter-cell interference. For the remaining cells, the process continues sequentially, excluding the cells that have already been processed together with the target cell.
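The SPA-based second step, which pairs the weakest target-cell users with the least-interfered pilots, can be sketched as below. It assumes (hypothetically) that the per-pilot interference $\xi_k$ of (14) has already been computed for the neighbor-cell assignment and that `quality[k]` holds $\mathcal{O}_k$ for user *k* of the target cell; only a single target cell is handled, and the function name is our own.

```python
import numpy as np

def spa_assign_target(quality, xi):
    """Return pilot_of_user: the worst-quality user gets the pilot with the lowest xi."""
    users_worst_first = np.argsort(quality, kind="stable")   # ascending O_k
    pilots_cleanest_first = np.argsort(xi, kind="stable")    # ascending interference
    pilot_of_user = np.empty(len(quality), dtype=int)
    # the r-th worst user receives the r-th least interfered pilot
    pilot_of_user[users_worst_first] = pilots_cleanest_first
    return pilot_of_user
```

For example, with `quality = [1.0, 0.25]` and `xi = [0.20, 0.10]`, the weaker user 1 receives pilot 1, giving ratios of 5 and 2.5, whereas the opposite pairing would leave a minimum of 1.25. The two sorts dominate the cost, in line with the $O(K\log K)$ complexity quoted for SPA.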

Furthermore, our algorithm is not computationally intensive: it essentially relies on sorting, so its time complexity is $O(LK\log K)$, which makes it faster than recent schemes. For example, EPA has lower computational complexity than the schemes in [19,21], which incur $O(LK^3)$ and $O(L^2K\log K)$, respectively. The scheme in [17] incurs $O\big(M(K_e^2 + K_{CS}^2)\big)$, where $K_e$ denotes the number of edge users in the network and $K_{CS}$ represents the number of users in the largest cell; it is therefore considerably more computationally intensive than EPA. Because the SPA scheme [11] optimizes only the target cell, it unsurprisingly incurs only $O(K\log K)$.

#### **6. Simulation Results**

The simulation code is based on [26], and Monte Carlo simulation is used to evaluate the performance of the EPA scheme. A typical hexagonal cellular network of *L* cells is considered. Each cell comprises a BS equipped with *M* antennas and *K* single-antenna users in its coverage area [2,5]. The center cell, surrounded by all the other cells, is considered the target cell. The system parameters are summarized in Table 1. The large-scale fading coefficient $\beta_{jk}^{i}$ is modeled in decibels as [10]:

$$
\beta_{jk}^{i} = \Upsilon - 10\,\alpha\,\log_{10}\left(\frac{d_{jk}^{i}}{1\text{ km}}\right) + F_{jk}^{i} \tag{18}
$$

where $d_{jk}^{i}$ (km) is the distance between the *k*-th user in the *j*-th cell and the BS in the *i*-th cell, $\alpha$ is the path-loss exponent, $\Upsilon$ determines the median channel gain at the reference distance of 1 km and can be calculated from various propagation models [33], and $F_{jk}^{i} \sim \mathcal{N}(0, \sigma_{sf}^2)$ is the shadow fading, which creates log-normal random variations around the nominal value $\Upsilon - 10\,\alpha\,\log_{10}(d_{jk}^{i}/1\text{ km})$.
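A minimal sketch of drawing large-scale fading coefficients according to (18) is given below. It writes the model with the gain decreasing in distance (positive path-loss exponent $\alpha$) and draws the shadowing as zero-mean Gaussian in dB; the function names are hypothetical, and no values from Table 1 are assumed.

```python
import numpy as np

def large_scale_fading_db(d_km, upsilon_db, alpha, sigma_sf_db, rng):
    """beta [dB] = Upsilon - 10*alpha*log10(d / 1 km) + F, with F ~ N(0, sigma_sf^2) in dB."""
    d_km = np.asarray(d_km, dtype=float)
    shadowing_db = rng.normal(0.0, sigma_sf_db, size=d_km.shape)   # log-normal shadowing
    return upsilon_db - 10.0 * alpha * np.log10(d_km / 1.0) + shadowing_db

def db_to_linear(x_db):
    """Convert a dB value to the linear scale used in the SINR expressions."""
    return 10.0 ** (np.asarray(x_db, dtype=float) / 10.0)
```

With the shadowing switched off (`sigma_sf_db = 0`), a user at 10 km sees a gain $10\alpha$ dB below the 1 km reference value $\Upsilon$.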


**Table 1.** Simulation parameters.

We compare the EPA scheme against the SPA [11] and conventional [2,5] schemes. Figure 3 depicts the average uplink rate per user of the EPA, SPA and conventional schemes versus the number of BS antennas, using ZF as the linear detector. The EPA scheme clearly achieves a higher average uplink rate than the other schemes. This improvement can be attributed to the pilot assignment policy applied in the neighbor cells, which significantly reduces the inter-cell interference at the serving BS and thus yields higher throughput. Because it assigns the pilots in the target cell according to the users' channel quality, the SPA scheme performs better than the conventional scheme. However, the performance of both the SPA and conventional schemes changes only slightly once the number of antennas exceeds a certain point (e.g., beyond about 150 antennas).

**Figure 3.** The average uplink rate per user with zero forcing (ZF) for different numbers of antennas.

Figure 4 shows the performance of the EPA scheme when MRC is used as the linear detector. The average uplink rate per user (bits/channel use) is substantially enhanced by the EPA scheme as the number of antennas increases. The advantage of EPA over the other schemes arises from minimizing the inter-cell interference from the neighbor cells, which is achieved by letting the users in each interference level $\mathcal{V}_k$ share the same pilot sequence. Consequently, the EPA scheme experiences lower interference from the neighbor cells than the other schemes.

**Figure 4.** The average uplink rate per user with maximum ratio combining (MRC) for different numbers of antennas.

Figure 5 compares the EPA scheme with the conventional and SPA schemes in terms of the cumulative distribution function (CDF) of the average SINR. With 64 BS antennas and the ZF detector, the probabilities that the average uplink SINR is below −10 dB are approximately 80%, 26.25%, and 10% for the conventional, SPA, and EPA schemes, respectively. The improvement arises because the interference associated with the pilot sequences has only a slight effect on the users in the target cell, which effectively increases the SINR of the system.

Figure 6 depicts the CDF of the minimum SINR when *M* = 64. The minimum SINR of the EPA scheme is significantly improved compared with the SPA and conventional schemes. For example, the probability that the minimum SINR is below −20 dB is approximately 16.25% for the EPA scheme, compared with about 34.6% and 79.6% for the SPA and conventional schemes, respectively. This improvement results from assigning the pilots of the neighbor-cell users that cause the lowest interference to the target-cell users with bad channel quality. Consequently, the performance of these users improves because their interference is reduced.
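Probabilities such as those quoted above are read off empirical CDFs of Monte Carlo samples. A small helper for that step might look as follows; it is a generic sketch with hypothetical function names and does not reproduce the paper's simulation.

```python
import numpy as np

def empirical_cdf(samples):
    """Return the sorted samples x and F(x), the fraction of samples <= x."""
    x = np.sort(np.asarray(samples, dtype=float))
    F = np.arange(1, x.size + 1) / x.size
    return x, F

def prob_below(samples, threshold):
    """Empirical P(X < threshold), e.g. the probability that the SINR in dB is below -10 dB."""
    return float(np.mean(np.asarray(samples, dtype=float) < threshold))

def sinr_db(sinr_linear):
    """Convert linear SINR values to dB before building the CDF."""
    return 10.0 * np.log10(sinr_linear)
```

Plotting `F` against `x` for each scheme gives curves of the kind shown in Figures 5 and 6, and `prob_below` evaluates a single point on such a curve.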

**Figure 5.** The cumulative distribution function (CDF) of the average signal-to-interference-plus-noise ratio (SINR) when *M* = 64 using ZF.

**Figure 6.** The CDF of the min. SINR when *M* = 64 using ZF.

Figures 7 and 8 depict the CDFs of the average and minimum SINR, respectively, with the MRC detector when *M* = 64. As these figures show, the EPA scheme outperforms the SPA and conventional schemes: it increases the average SINR by 1.8 dB over the SPA scheme (Figure 7), and the gain rises to 4.69 dB for the minimum SINR (Figure 8).

**Figure 7.** The CDF of the average SINR when *M* = 64 using MRC.

**Figure 8.** The CDF of the min. SINR when *M* = 64 using MRC.

Figures 5–8 show that EPA consistently achieves a better minimum SINR; in other words, the performance of edge users is significantly enhanced. This is because the inter-cell interference at the target cell is greatly reduced while the users with poor channel quality are assigned suitable pilot sequences that maximize their SINR. Moreover, the results obtained with ZF and MRC detection are approximately comparable under the same parameter settings, because the EPA scheme, which runs before signal detection, already greatly reduces the inter-cell interference.

Figure 9 shows the CDF of the average uplink rate of the EPA scheme with the ZF detector when *M* = 64. The conventional scheme is strongly affected by pilot contamination: its random pilot assignment leads to the worst performance among the three schemes. The EPA scheme outperforms both the SPA and conventional schemes because the users that cause the highest interference share their pilot sequence with target-cell users that have good channel quality, for whom this interference is comparatively weak. As a result, these strongly interfering users are excluded from sharing pilots with users that have bad channel quality.

**Figure 9.** The CDF of the average uplink rate when *M* = 64 using ZF.

Figure 10 shows the CDF of the minimum uplink rate. The EPA scheme performs better than the other schemes; for example, its minimum uplink rate is approximately double that of the SPA scheme. This improvement is achieved because EPA effectively reduces the interference associated with the pilot sequences allocated to users with bad channel quality.

Figures 11 and 12 show the CDFs of the average and minimum uplink rates, respectively, when MRC is used and *M* = 64. The EPA scheme achieves the highest performance among the schemes, especially in terms of the minimum uplink rate: compared with SPA, it roughly doubles the minimum uplink rate and increases the average uplink rate by a factor of about 1.2. The improvement in the minimum uplink rate is due to the priority given to the users with the worst channel quality during pilot assignment.

**Figure 10.** The CDF of min. uplink rate when *M* = 64 using ZF.

**Figure 11.** The CDF of average uplink rate when *M* = 64 using MRC.

To further verify the effectiveness of the EPA scheme, Figure 13 evaluates the average uplink rate versus the number of antennas with the ZF detector under a different parameter set, shown in Table 1, which increases the interference severity at the target cell. The EPA scheme still achieves a higher average uplink rate than the other schemes despite the stronger interference.

**Figure 12.** The CDF of min. uplink rate when *M* = 64 using MRC.

**Figure 13.** The average uplink rate per user with ZF for different numbers of antennas, *K* = 20 and *R* = 300 m.

#### **7. Conclusions**

In this paper, we propose a new pilot assignment approach to address pilot contamination in multicell massive MIMO systems. An optimization problem is formulated to improve the minimum uplink rate of the users in the target cell. Our approach to solving this problem reduces the outgoing inter-cell interference from the neighbor cells by assigning pilot sequences to the neighbor cells' users, and it maximizes the minimum uplink rate of the target cell's users based on the SPA algorithm. The numerical results show that the EPA scheme is more effective than the other schemes with both MRC and ZF linear detection. With the typical setting of *M* = 64, EPA achieves significant gains over the SPA and conventional schemes, and it improves the minimum uplink rate considerably more than the SPA scheme does. Furthermore, the proposed scheme remains effective even in severe interference environments.

**Author Contributions:** Conceptualization, A.S.A.-h. and N.K.N.; Formal analysis, A.S.A.-h. and N.K.N.; Methodology, A.S.A.-h., N.K.N., A.S., S.S. and A.M.M.; Writing—review & editing, A.S.A.-h. and N.K.N.

**Funding:** This work was funded by [Advancing the State of the Art of MIMO: The Key to Successful Evolution of Wireless Networks (ATOM)] grant number [690750-ATOM-H2020-MSCA-RISE-2015, UPM: 6388800-10801], and the APC was funded by [Research Management Centre (RMC)].

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**




© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

## **Downlink Channel Estimation in Massive Multiple-Input Multiple-Output with Correlated Sparsity by Overcomplete Dictionary and Bayesian Inference**

#### **Wei Lu 1,\*, Yongliang Wang 1, Xiaoqiao Wen 1, Shixin Peng <sup>2</sup> and Liang Zhong <sup>3</sup>**


Received: 24 March 2019; Accepted: 24 April 2019; Published: 28 April 2019

**Abstract:** We exploited the temporal correlation of channels in the angular domain for downlink channel estimation in a massive multiple-input multiple-output (MIMO) system. Because the channel supports in the angular domain vary slowly over time, we incorporated the channel support information of the downlink angular channel from the previous timeslot into the channel estimation in the current timeslot. We proposed a downlink channel estimation method based on variational Bayesian inference (VBI) and an overcomplete dictionary, in which the support prior information from the previous timeslot was merged into the VBI. The VBI was formulated for the complex-valued system model, and the structured sparsity was exploited in the Bayesian inference. The Bayesian Cramér–Rao bound on the channel estimation mean square error (MSE) was also presented. In simulations, the proposed algorithm with an overcomplete dictionary achieved a lower channel estimation MSE than the compared algorithms.

**Keywords:** massive MIMO; channel estimation; Bayesian inference; overcomplete dictionary

#### **1. Introduction**

Massive multiple-input multiple-output (MIMO) is a key technology for next-generation wireless communication. The large number of antennas enables high spectral efficiency and lower power consumption [1]. To obtain these benefits, the base station (BS) needs to acquire channel state information (CSI) for both the uplink and the downlink. Pilot-based channel estimation is widely used in wireless communication systems. In time division duplex (TDD) systems, channel reciprocity allows the BS to obtain the CSI by estimating only the uplink channel. In frequency division duplex (FDD) systems, channel reciprocity cannot be used directly, and it is challenging to obtain the downlink CSI of a massive MIMO system with the conventional feedback scheme, in which each user estimates its channel and then feeds the estimated CSI back to the BS. The pilot and feedback overheads are high for massive MIMO, since they scale linearly with the number of antennas. Hence, it is important to design an efficient downlink channel estimation and feedback scheme for FDD massive MIMO systems.

By exploiting the sparsity of massive MIMO channels, compressed sensing (CS) has been applied to channel estimation and feedback. In [2], the users fed the compressed training measurements back to the BS, and orthogonal matching pursuit (OMP) was used for downlink CSI recovery. In [3], modified basis pursuit (MBP) was proposed, which uses partial prior information on the signal support to improve recovery performance. In [4], the support information of a signal in the discrete Fourier transform domain was incorporated into weighted *l1* minimization for CS recovery, which reduced the number of required measurements by the size of the known part of the support. In [5], a three-level weighting scheme based on support information was used for weighted *l1* minimization, and the simulation results showed its superiority. In [6], we exploited the reciprocity between the uplink and downlink channels in the angular domain, diagnosed the supports of the downlink channel from the estimated uplink channel, and proposed a weighted subspace pursuit (SP) channel estimation algorithm for FDD massive MIMO. These works show that CS is effective for massive MIMO channel estimation.

However, most of these algorithms require the sparsity level to be known, which is not practical in engineering scenarios. The Bayesian framework can instead be applied to compressive channel estimation. In [7], Bayesian estimation of sparse massive MIMO channels was developed in which neighboring antennas share their information about the channel support. In [8], a variational expectation maximization strategy was used for massive MIMO channel estimation, with a Gaussian mixture prior designed to capture both the individual sparsity of each channel and the joint sparsity among users. In [9], a sparse Bayesian learning algorithm was proposed for FDD massive MIMO channel estimation with an arbitrary 2D array. In the Bayesian framework for compressive channel estimation, the sparsity level need not be known, and the recovery performance is relatively better. Additionally, massive MIMO exhibits angular reciprocity; for example, in [10], the uplink and downlink channel covariance matrices were reconstructed by exploiting the angle reciprocity between the uplink and downlink channels. Hence, it is promising to apply angular reciprocity and the Bayesian framework to compressive massive MIMO channel estimation.

Besides angular reciprocity, the channels in FDD massive MIMO also exhibit temporal correlation. In [11], differential compressive feedback for FDD massive MIMO was proposed based on the observation that the channel impulse responses (CIRs) vary slowly and are sparse, so that the difference between the CIRs of adjacent timeslots is also sparse. Inspired by the angular-domain sparsity and the temporal correlation of channels, the correlated angular sparsity can also be exploited for massive MIMO channel estimation.

In this paper, we proposed a downlink channel estimation method for TDD/FDD massive MIMO systems. The timeslots were divided into groups, and within each group the channel support information estimated in the previous timeslot was used in the following timeslot. The correlated angular sparsity of the downlink channel across timeslots was exploited in Bayesian inference for channel recovery. We transformed the complex sparse vector recovery problem into a real sparse vector recovery problem solved by Bayesian inference, and we exploited the structured sparsity of the transformed real vector. Meanwhile, the prior support information from the channel estimated in the previous timeslot was used to model the hidden hyperparameters of the Bayesian model. A Bayesian Cramér–Rao bound analysis is presented, and simulations verify the performance of the proposed algorithm. The main contributions are as follows: (1) a group-based channel estimation scheme was proposed, in which previously estimated channel support information is used as prior information in the following timeslot because of the sparsity correlation; (2) this prior information was merged into the Bayesian inference algorithm for channel recovery; (3) the Bayesian Cramér–Rao bound on the channel estimation mean square error (MSE) was analyzed.

The system model is described in Section 2, and the proposed channel estimation algorithm based on Bayesian inference is presented in Section 3. The Bayesian Cramér–Rao bound (BCRB) on the channel estimation MSE is given in Section 4. Simulations and conclusions are presented in Sections 5 and 6, respectively.

In this paper, we use the following notation. Scalars, vectors and matrices are denoted by lower-case, boldface lower-case and boldface upper-case symbols, respectively. The probability density function (PDF) of a random variable is denoted by *p*(·). Gamma(*x*|*a*, *b*) is the Gamma PDF of *x* with shape parameter *a* and rate parameter *b*, while Normal(*x*|*c*, *d*) is the Gaussian PDF of *x* with mean *c* and variance *d*. Γ(·) is the Gamma function, and ln(·) is the natural logarithm. Tr(·) stands for the trace operator. $\mathbb{E}_a[\cdot]$ denotes the expectation with respect to the PDF of the variable *a*.

#### **2. System Model**

We considered a TDD/FDD massive MIMO system with a single user, in which the BS was equipped with *N* antennas and the user terminal (UT) had a single antenna. For downlink channel estimation, the BS transmitted pilots to the UT, which received them and fed the received signal back to the BS directly. The received signal $\mathbf{y}^d(t)$ at the UT in the *t*-th timeslot was written as

$$\mathbf{y}^d(t) = \sqrt{\rho^d} \mathbf{A} \mathbf{h}^d(t) + \mathbf{n}^d(t) \tag{1}$$

where $\mathbf{h}^d(t) \in \mathbb{C}^{N \times 1}$ is the downlink channel, $\mathbf{A} \in \mathbb{C}^{T_d \times N}$ is the downlink pilot matrix, $T_d$ is the pilot length, $\rho^d$ is the downlink received power, $\mathbf{n}^d(t) \in \mathbb{C}^{T_d \times 1}$ is the received noise, whose elements are i.i.d. Gaussian with zero mean and variance $\sigma^2$, and $\mathbf{y}^d(t) \in \mathbb{C}^{T_d \times 1}$ is the received signal at the UT.

Massive MIMO channels are sparse in the angular domain. Let $\mathbf{D}^d \in \mathbb{C}^{N \times M}$ be the downlink channel dictionary, which can be unitary or overcomplete ($M > N$); its columns have the form of steering vectors at different sampling angles. Then $\mathbf{h}^d(t) = \mathbf{D}^d \mathbf{h}_a^d(t)$, where $\mathbf{h}_a^d(t)$ is the sparse representation of the channel. In this paper, we used the overcomplete dictionary to represent the sparse angular channel and obtain better recovery performance. The goal of downlink channel estimation is to obtain $\hat{\mathbf{h}}_a^d(t)$, the estimated downlink channel in the angular domain in the *t*-th timeslot.

By utilizing the sparse channel representation we then had

$$\mathbf{y}^d(t) = \sqrt{\rho^d} \mathbf{A} \mathbf{D}^d \mathbf{h}_a^d(t) + \mathbf{n}^d(t) \tag{2}$$
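The signal model in (2) can be simulated with a few lines of NumPy. The sketch below assumes a half-wavelength uniform linear array, a dictionary whose $M$ columns are unit-norm steering vectors on a uniform grid of $\sin\theta$, and circularly symmetric complex Gaussian noise of variance $\sigma^2$; these choices, and the function names, are our assumptions rather than details given in the paper.

```python
import numpy as np

def steering_dictionary(N, M):
    """N x M dictionary of ULA steering vectors on a uniform grid of sin(theta) in [-1, 1)."""
    grid = -1.0 + 2.0 * np.arange(M) / M                       # sampled values of sin(theta)
    n = np.arange(N)[:, None]
    # half-wavelength spacing gives phase pi * n * sin(theta); columns have unit norm
    return np.exp(-1j * np.pi * n * grid[None, :]) / np.sqrt(N)

def received_pilots(A, D, h_a, rho, sigma2, rng):
    """y = sqrt(rho) * A D h_a + n, with complex Gaussian noise of total variance sigma2."""
    T = A.shape[0]
    noise = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
    return np.sqrt(rho) * (A @ (D @ h_a)) + noise
```

With $M = N$ this grid yields a unitary (DFT-like) dictionary, while $M > N$ refines the angular sampling, which is what the overcomplete dictionary exploits.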

For simplicity, the timeslot index is dropped from some of the following equations. Since $\mathbf{y}^d(t)$, $\mathbf{h}_a^d(t)$, $\mathbf{n}^d(t)$ and $\sqrt{\rho^d}\mathbf{A}\mathbf{D}^d$ are complex-valued, Equation (2) can be rewritten in terms of real-valued vectors as

$$
\begin{bmatrix}
\operatorname{Re}(\mathbf{y}^d) \\
\operatorname{Im}(\mathbf{y}^d)
\end{bmatrix} = \begin{bmatrix}
\operatorname{Re}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d) & -\operatorname{Im}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d) \\
\operatorname{Im}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d) & \operatorname{Re}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d)
\end{bmatrix} \begin{bmatrix}
\operatorname{Re}(\mathbf{h}_a^d(t)) \\
\operatorname{Im}(\mathbf{h}_a^d(t))
\end{bmatrix} + \begin{bmatrix}
\operatorname{Re}(\mathbf{n}^d(t)) \\
\operatorname{Im}(\mathbf{n}^d(t))
\end{bmatrix} \tag{3}
$$

where Re(·) and Im(·) denote the real and imaginary parts respectively. For simplicity, we rewrote Equation (3) as

$$
\overline{\mathbf{y}} = \overline{\mathbf{A}} \overline{\mathbf{h}} + \overline{\mathbf{n}} \tag{4}
$$

where

$$
\overline{\mathbf{y}} = \begin{bmatrix} \operatorname{Re}(\mathbf{y}^d) \\ \operatorname{Im}(\mathbf{y}^d) \end{bmatrix}, \quad
\overline{\mathbf{A}} = \begin{bmatrix} \operatorname{Re}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d) & -\operatorname{Im}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d) \\ \operatorname{Im}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d) & \operatorname{Re}(\sqrt{\rho^d} \mathbf{A} \mathbf{D}^d) \end{bmatrix},
$$

$$
\overline{\mathbf{h}} = \begin{bmatrix} \operatorname{Re}(\mathbf{h}_a^d(t)) \\ \operatorname{Im}(\mathbf{h}_a^d(t)) \end{bmatrix}, \quad
\overline{\mathbf{n}} = \begin{bmatrix} \operatorname{Re}(\mathbf{n}^d(t)) \\ \operatorname{Im}(\mathbf{n}^d(t)) \end{bmatrix}.
$$
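The stacking in (3) and (4) is easy to check numerically: the real matrix $\overline{\mathbf{A}}$ applied to $\overline{\mathbf{h}}$ must reproduce the real and imaginary parts of the complex product. A minimal sketch, in which `B` stands for $\sqrt{\rho^d}\mathbf{A}\mathbf{D}^d$ and the function name is hypothetical:

```python
import numpy as np

def to_real_system(B, y, h):
    """Build the real-valued quantities of Eq. (4) from a complex model y = B h (+ noise)."""
    A_bar = np.block([[B.real, -B.imag],
                      [B.imag,  B.real]])      # [Re -Im; Im Re]
    y_bar = np.concatenate([y.real, y.imag])
    h_bar = np.concatenate([h.real, h.imag])
    return A_bar, y_bar, h_bar
```

The identity follows from $\operatorname{Re}(\mathbf{B}\mathbf{h}) = \operatorname{Re}\mathbf{B}\operatorname{Re}\mathbf{h} - \operatorname{Im}\mathbf{B}\operatorname{Im}\mathbf{h}$ and $\operatorname{Im}(\mathbf{B}\mathbf{h}) = \operatorname{Im}\mathbf{B}\operatorname{Re}\mathbf{h} + \operatorname{Re}\mathbf{B}\operatorname{Im}\mathbf{h}$; note that a nonzero complex entry of $\mathbf{h}$ at index *i* generally appears at both positions *i* and *i* + (length of $\mathbf{h}$) of $\overline{\mathbf{h}}$.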

Next, consider the meaning of the sparse angular channel representation $\mathbf{h}_a^d(t)$. If the transmission angles lay exactly on the sampling points of the channel dictionary $\mathbf{D}^d$, only the corresponding coefficients of $\mathbf{h}_a^d(t)$ would be nonzero, so $\mathbf{h}_a^d(t)$ is sparse when the number of paths is smaller than the number of antennas. However, the leakage effect induced by dictionary mismatch degrades the sparsity of the angular channel representation [12]. When the UT moves at a moderate speed, e.g., *v* = 12 km/h, and the typical timeslot duration is τ = 0.5 ms, the UT moves about 0.0017 m in one timeslot. For a UT at a distance of 200 m from the BS, the resulting angle change of the line-of-sight (LoS) path in one timeslot is at most about 0.00048°, which is much smaller than the sampling interval of the dictionary. For non-LoS (NLoS) transmission, the angle change is also small, as discussed in Section 4.1. Hence, the transmission angles change very little between two timeslots unless the propagation environment changes dramatically, and the angular channel sparsity is correlated between adjacent timeslots. In other words, information about the angular channel estimated in the previous timeslot can be used in the current channel estimation.
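The displacement and angle figures in this argument follow from simple arithmetic. The sketch below computes them, assuming the worst case in which the UT moves perpendicular to the LoS direction, so that the angle change is $\arctan(v\tau/r)$; the function name is hypothetical.

```python
import math

def max_angle_change(speed_kmh, slot_s, distance_m):
    """Return (distance moved in one slot in m, worst-case LoS angle change in degrees)."""
    step_m = speed_kmh / 3.6 * slot_s                    # km/h -> m/s, times slot duration
    angle_deg = math.degrees(math.atan(step_m / distance_m))
    return step_m, angle_deg
```

For *v* = 12 km/h, τ = 0.5 ms and a distance of 200 m, this gives a step of about 1.67 mm and an angle change of about 4.8 × 10⁻⁴ degrees.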

It has been shown that prior support information can improve channel recovery performance [3–6]. Hence, in this paper we used the prior support information from the previous timeslot to improve Bayesian channel estimation. The following section discusses how this prior information is merged into the Bayesian inference algorithm for channel estimation.

#### **3. Proposed Algorithm**

We designed a three-layer hierarchical graphical model, as shown in Figure 1. In the first layer, $\overline{\mathbf{h}}$ was assigned a Gaussian prior distribution

$$p(\overline{\mathbf{h}}|\alpha) = \prod\_{i=1}^{2N} p(\overline{h}\_i|\alpha\_i) \tag{5}$$

where $\overline{h}_i$ and $\alpha_i$ are the *i*-th entries of $\overline{\mathbf{h}}$ and $\boldsymbol{\alpha}$, respectively, $p(\overline{h}_i|\alpha_i) = \text{Normal}(\overline{h}_i|0, \alpha_i^{-1})$, and $\alpha_i$ is the inverse variance (precision) of the Gaussian distribution. When $\overline{h}_i$ is close to 0, $\alpha_i$ is very large, and vice versa.

In the second layer, we assumed Gamma hyperpriors over the hyperparameters $\alpha_i$:

$$p(\boldsymbol{\alpha}) = \prod\_{i=1}^{2N} \text{Gamma}(\alpha\_i | a\_i, b\_i) \tag{6}$$

where Gamma(·) is the Gamma PDF, and the parameters $a_i$ and $b_i$ characterize its shape and rate, respectively. For fixed $a_i$, the larger $b_i$ is, the smaller $\alpha_i$ tends to be, so $\overline{h}_i$ tends to be nonzero. In standard sparse Bayesian learning, $a_i$ and $b_i$ are set to very small values to obtain a non-informative hyperprior over $\alpha_i$ [13].

In our model, we set $a_i$ to a predefined constant and modeled $b_i$ as random parameters. As shown in Figure 1, the entries of $\overline{\mathbf{h}}$ are divided by their indices into two sets, $\mathcal{S} \cup \mathcal{S}_{+N}$ and its complement $(\mathcal{S} \cup \mathcal{S}_{+N})^c$, where $\mathcal{S}$ is the set of channel support indices from the previous timeslot, and $\mathcal{S}_{+N}$ is obtained by adding *N* to each index in $\mathcal{S}$, because the complex system model was converted to the real system model in Equation (3). For example, if the positions of the nonzero entries (the supports) of $\mathbf{h}_a^d(t-1)$ in the $(t-1)$-th timeslot were $\mathcal{S} = \{4, 5, 6\}$, then $\mathcal{S}_{+N} = \{4+N, 5+N, 6+N\}$. For simplicity, the probable supports of $\mathbf{h}_a^d(t)$ in the current *t*-th timeslot can be assumed to be the same as those in the previous $(t-1)$-th timeslot. Alternatively, the probable channel supports can be refined by taking angle deviation and leakage effects into account; in this paper, we adopted the support diagnosis algorithm of [6].

For the indices $i \in \mathcal{S} \cup \mathcal{S}_{+N}$, we placed a Gamma distribution over the hyperparameters $b_i$ in the third layer:

$$\text{Gamma}(b_i|c,d) = \Gamma(c)^{-1} d^{c} b_i^{c-1} e^{-d b_i} \tag{7}$$

where *c* and *d* are the shape and rate parameters of this Gamma PDF. With this system model and these assumptions for massive MIMO, Bayesian inference can be used to perform sparse channel recovery.
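Ancestral sampling from the three-layer model in (5)–(7) shows how the support set shapes the prior. The sketch below is an illustration under stated assumptions: Gamma distributions use the shape/rate convention of (7) (NumPy takes scale = 1/rate), indices are 0-based (so the example $\mathcal{S} = \{4, 5, 6\}$ becomes `[3, 4, 5]`), entries outside $\mathcal{S} \cup \mathcal{S}_{+N}$ keep a fixed, hypothetical value `b_fixed`, and all numerical values are illustrative.

```python
import numpy as np

def sample_hierarchical_prior(N, support, a, b_fixed, c, d, rng):
    """Draw (h_bar, alpha, b) from the three-layer model; `support` holds 0-based indices of S."""
    S = np.asarray(support, dtype=int)
    idx = np.concatenate([S, S + N])                  # S together with S+N (imaginary parts)
    b = np.full(2 * N, float(b_fixed))                # b_i fixed outside S and S+N
    b[idx] = rng.gamma(shape=c, scale=1.0 / d, size=idx.size)   # third layer: b_i ~ Gamma(c, d)
    alpha = rng.gamma(shape=a, scale=1.0 / b)         # second layer: alpha_i ~ Gamma(a, b_i)
    h_bar = rng.normal(0.0, 1.0 / np.sqrt(alpha))     # first layer: h_i ~ Normal(0, 1/alpha_i)
    return h_bar, alpha, b
```

Choosing a small `b_fixed` makes $\alpha_i$ large, and hence $\overline{h}_i$ close to zero, outside the prior support, while the random $b_i$ on $\mathcal{S} \cup \mathcal{S}_{+N}$ allows those entries to take larger values.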

**Figure 1.** Graphical model for channel estimation with Bayesian inference. Nodes drawn as double circles, single circles, and squares correspond to the observed data, hidden variables, and parameters, respectively.

Following standard Bayesian inference [14], let $\mathbf{z} \triangleq \{\overline{\mathbf{h}}, \boldsymbol{\alpha}, \mathbf{b}\}$. Then

$$\begin{aligned}\ln p(\mathbf{z}\_i) &= \mathbb{E}\_{\mathbf{z}\_j, j \neq i} [\ln p(\overline{\mathbf{y}}, \mathbf{z})] + \text{constant} \\ &\propto \mathbb{E}\_{\mathbf{z}\_j, j \neq i} [\ln p(\overline{\mathbf{y}}, \mathbf{z})] \end{aligned} \tag{8}$$

Here the constant normalizes $p(\mathbf{z}\_i)$, $p(\overline{\mathbf{y}}, \mathbf{z})$ is the joint PDF of $\overline{\mathbf{y}}$ and $\mathbf{z}$, and $\mathbf{z}\_i$ can be $\overline{\mathbf{h}}$, $\boldsymbol{\alpha}$, or $\mathbf{b}$. The joint PDF factors as $p(\overline{\mathbf{y}}, \mathbf{z}) = p(\mathbf{z}|\overline{\mathbf{y}})p(\overline{\mathbf{y}})$. We assume posterior independence among the hidden variables in $\mathbf{z}$. The posterior is then approximated as $p(\mathbf{z}|\overline{\mathbf{y}}) \approx p(\mathbf{z})$, where $p(\mathbf{z})$ is the product of the PDFs of $\overline{\mathbf{h}}$, $\boldsymbol{\alpha}$, and $\mathbf{b}$.

To exploit both the prior support information from the previous timeslot and the structured sparsity in Equation (4), we modified the standard Bayesian inference. The main considerations were as follows:

(I) Since Equation (2) was rewritten as Equation (4), a nonzero $h^d\_{a,i}$ means that $h\_i$ and $h\_{i+N}$ are nonzero simultaneously. Hence, it is reasonable to assume that $b\_i$ and $b\_{i+N}$ are equal;

(II) In standard sparse Bayesian learning, $a\_i$ and $b\_i$ are set to very small values to obtain a non-informative hyperprior over $\alpha\_i$. This choice is appropriate when no prior information is available. Prior support information may be available, however. For example, the support of the previous timeslot can be used for channel estimation in the coming timeslot through sparsity correlation. In that case, it is reasonable to assume that adjacent timeslots partially share their supports. If the $i$-th element of the angular channel vector is nonzero, the hyperparameters $b\_i$ and $b\_{i+N}$ are treated as random variables rather than as fixed small numbers. In other words, the third-layer prior model is adopted only for the indices in the prior support set $\mathcal{S}$.

Consideration (II) is similar to [15]. However, our algorithm extends it to complex-valued systems, exploits the structured sparsity, and adopts an overcomplete dictionary.

The proposed uplink-aided downlink channel estimation based on Bayesian inference was as follows:

(i) Update of p(*h*)

According to Equation (8), by ignoring the terms which are independent of *h*, we have

$$\begin{aligned} \ln p(\overline{\mathbf{h}}) &\propto \mathbb{E}\_{\boldsymbol{\alpha}, \mathbf{b}} \Big[ \ln p(\overline{\mathbf{y}}|\overline{\mathbf{h}}) + \ln p(\overline{\mathbf{h}}|\boldsymbol{\alpha}) + \ln p(\mathbf{b}) \Big] \\ &\propto \mathbb{E}\_{\boldsymbol{\alpha}, \mathbf{b}} \Big[ \ln p(\overline{\mathbf{y}}|\overline{\mathbf{h}}) + \ln p(\overline{\mathbf{h}}|\boldsymbol{\alpha}) \Big] \\ &= -\frac{1}{2\sigma^{2}} (\overline{\mathbf{y}} - \overline{\mathbf{A}}\overline{\mathbf{h}})^{T} (\overline{\mathbf{y}} - \overline{\mathbf{A}}\overline{\mathbf{h}}) - \frac{1}{2} \overline{\mathbf{h}}^{T} \boldsymbol{\Lambda} \overline{\mathbf{h}} \end{aligned} \tag{9}$$

where $\boldsymbol{\Lambda} = \mathrm{diag}(\mathbb{E}\_{\boldsymbol{\alpha}}[\alpha\_i])$, $\sigma^2$ is the noise variance in the system model, and the vectors $\mathbf{b}$ and $\boldsymbol{\alpha}$ consist of the entries $b\_i$ and $\alpha\_i$, respectively. Since $p(\overline{\mathbf{y}}|\overline{\mathbf{h}})$ and $p(\overline{\mathbf{h}}|\boldsymbol{\alpha})$ are both Gaussian, $p(\overline{\mathbf{h}})$ is Gaussian with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Phi}$ given by

$$
\mu = \frac{1}{\sigma^2} \boldsymbol{\Phi} \overline{\mathbf{A}}^T \overline{\mathbf{y}} \tag{10}
$$

$$\boldsymbol{\Phi} = \left(\frac{1}{\sigma^2} \overline{\mathbf{A}}^T \overline{\mathbf{A}} + \boldsymbol{\Lambda}\right)^{-1} \tag{11}$$
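The posterior update of Equations (10) and (11) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code:

- Matrices are lists of lists.
- The inverse is computed by Gauss–Jordan elimination, a generic choice; a numerical library would normally be used instead.
- The function also returns the second moments $\mathbb{E}(\overline{h}\_i^2) = \mu\_i^2 + \Phi\_{i,i}$, which step 3.2 of Algorithm 1 needs.

The function names are hypothetical.

```python
def mat_inv(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gaussian_posterior(A, y, sigma2, e_alpha):
    """Eqs. (10)-(11): Phi = (A^T A / sigma2 + diag(E[alpha]))^-1, mu = Phi A^T y / sigma2."""
    cols = len(A[0])
    precision = [[sum(row[i] * row[j] for row in A) / sigma2 + (e_alpha[i] if i == j else 0.0)
                  for j in range(cols)] for i in range(cols)]
    phi = mat_inv(precision)
    aty = [sum(row[i] * yk for row, yk in zip(A, y)) for i in range(cols)]
    mu = [sum(phi[i][j] * aty[j] for j in range(cols)) / sigma2 for i in range(cols)]
    e_h2 = [mu[i] ** 2 + phi[i][i] for i in range(cols)]  # E[h_i^2], used in Eq. (14)
    return mu, phi, e_h2


mu, phi, e_h2 = gaussian_posterior([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0],
                                   sigma2=1.0, e_alpha=[1.0, 1.0])
print(mu, e_h2)
```

As a check, take $\overline{\mathbf{A}} = \mathbf{I}$, $\sigma^2 = 1$ and $\mathbb{E}[\alpha\_i] = 1$. Each $\mu\_i$ is then half of the corresponding observation, and each $\Phi\_{i,i} = 0.5$.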

(ii) Update of p(α)

According to Equation (8), by ignoring the terms which are independent of α, we have

$$\begin{aligned}
\ln p(\boldsymbol{\alpha}) &\propto \mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}} \left[ \ln p(\overline{\mathbf{y}}|\overline{\mathbf{h}}) + \ln p(\overline{\mathbf{h}}|\boldsymbol{\alpha}) + \ln p(\boldsymbol{\alpha}|\mathbf{a},\mathbf{b}) + \ln p(\mathbf{b}) \right] \\
&\propto \mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}} \left[ \ln p(\overline{\mathbf{h}}|\boldsymbol{\alpha}) + \ln p(\boldsymbol{\alpha}|\mathbf{a},\mathbf{b}) \right] \\
&= \sum\_{i=1}^{2N} \mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}} \left[ (a\_i - 0.5) \ln \alpha\_i - \left(0.5 \overline{h}\_i^2 + b\_i\right) \alpha\_i \right] \\
&= \sum\_{i \in \{\mathcal{S},\mathcal{S}\_{+N}\}} \mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}} \left[ (a\_i + 0.5 - 1) \ln \alpha\_i - \left(0.5 \overline{h}\_i^2 + b\_i\right) \alpha\_i \right] + \sum\_{i \in \{\mathcal{S},\mathcal{S}\_{+N}\}^c} \mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}} \left[ (a\_i + 0.5 - 1) \ln \alpha\_i - \left(0.5 \overline{h}\_i^2 + b\_i\right) \alpha\_i \right] \\
&= \sum\_{i \in \{\mathcal{S},\mathcal{S}\_{+N}\}} \left\{ (a\_i + 0.5 - 1) \ln \alpha\_i - \left( \frac{\mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}}(b\_i + b\_{i+N})}{2} + \frac{\mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}}(\overline{h}\_i^2 + \overline{h}\_{i+N}^2)}{4} \right) \alpha\_i \right\} \\
&\quad + \sum\_{i \in \{\mathcal{S},\mathcal{S}\_{+N}\}^c} \left\{ (a\_i + 0.5 - 1) \ln \alpha\_i - \left( b\_i + \frac{\mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}}(\overline{h}\_i^2 + \overline{h}\_{i+N}^2)}{4} \right) \alpha\_i \right\}
\end{aligned} \tag{12}$$

where $\mathcal{S}$ is the estimated support set from the previous timeslot. Because the complex system model was converted into the real-valued model in Equation (4), considerations (I) and (II) imply that $\mathcal{S}\_{+N} \triangleq \{s\_i + N\}$ is also a support set of the converted model. For $i \in \{\mathcal{S}, \mathcal{S}\_{+N}\}$, $b\_i$ is a random variable. Since $b\_i$ and $b\_{i+N}$ are assumed to be equal, we used $0.5\,\mathbb{E}\_{\mathbf{h},\mathbf{b}}(b\_i + b\_{i+N})$ to represent $\mathbb{E}\_{\mathbf{h},\mathbf{b}}(b\_i)$. The same assumption was applied to $h\_i$ and $h\_{i+N}$, with $\mathbb{E}\_{\mathbf{h},\mathbf{b}}(h\_i^2) = 0.5\,\mathbb{E}\_{\mathbf{h},\mathbf{b}}(h\_i^2 + h\_{i+N}^2)$. In this way, the structured sparsity was exploited.

Since $p(\boldsymbol{\alpha}|\mathbf{a}, \mathbf{b})$ is Gamma and $p(\overline{\mathbf{h}}|\boldsymbol{\alpha})$ is Gaussian (a conjugate pair), $p(\boldsymbol{\alpha})$ is also Gamma. Each $p(\alpha\_i)$ is therefore a Gamma distribution with the updated parameters $\widetilde{a\_i}$ and $\widetilde{b\_i}$ given by

$$
\widetilde{a\_i} = a\_i + 0.5\tag{13}
$$

$$\widetilde{b\_{i}} = \begin{cases} \dfrac{\mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}}(b\_{i} + b\_{i+N})}{2} + \dfrac{\mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}}(\overline{h}\_{i}^{2} + \overline{h}\_{i+N}^{2})}{4}, & i \in \{\mathcal{S}, \mathcal{S}\_{+N}\} \\ b\_{i} + \dfrac{\mathbb{E}\_{\overline{\mathbf{h}},\mathbf{b}}(\overline{h}\_{i}^{2} + \overline{h}\_{i+N}^{2})}{4}, & i \in \{\mathcal{S}, \mathcal{S}\_{+N}\}^{c} \end{cases} \tag{14}$$
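The hyperparameter update in Equations (13) and (14) can be sketched as below, including the averaging that couples index $i$ with its partner $i+N$. The sketch rests on three assumptions:

- The prior shape $a$ is the same for all indices and stays fixed across iterations, as in standard variational Bayes.
- The partner of an index $i \ge N$ is taken to be $i - N$.
- The function and argument names are hypothetical.

```python
def update_alpha(a, b_prior, e_b, e_h2, in_prior, n):
    """Eqs. (13)-(14) for all 2N indices; returns (a_tilde, b_tilde, E[alpha])."""
    two_n = 2 * n
    a_tilde, b_tilde, e_alpha = [], [], []
    for i in range(two_n):
        partner = (i + n) % two_n              # i <-> i+N (assumed: i-N for i >= N)
        h2_term = (e_h2[i] + e_h2[partner]) / 4.0
        if i in in_prior:                      # three-layer model: b_i is random
            rate = (e_b[i] + e_b[partner]) / 2.0 + h2_term
        else:                                  # two-layer model: b_i is a fixed constant
            rate = b_prior[i] + h2_term
        a_tilde.append(a + 0.5)                # Eq. (13)
        b_tilde.append(rate)                   # Eq. (14)
        e_alpha.append((a + 0.5) / rate)       # mean of Gamma(shape, rate)
    return a_tilde, b_tilde, e_alpha


_, b_t, e_a = update_alpha(a=1.0, b_prior=[0.5, 0.5], e_b=[1.0, 3.0],
                           e_h2=[2.0, 2.0], in_prior={0, 1}, n=1)
print(b_t, e_a)
```

Because the rate is shared by $i$ and $i+N$, both entries always receive the same $\mathbb{E}[\alpha\_i]$. This is how the structured sparsity enters the update.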

(iii) Update of $p(\mathbf{b}\_{\{\mathcal{S},\mathcal{S}\_{+N}\}})$

According to Equation (8), by ignoring the terms which are independent of **b**, we have

$$\begin{aligned} \ln p(\mathbf{b}\_{\{\mathcal{S},\mathcal{S}\_{+N}\}}) &\propto \mathbb{E}\_{\boldsymbol{\alpha},\overline{\mathbf{h}}} \Big[ \ln p(\overline{\mathbf{y}}|\overline{\mathbf{h}}) + \ln p(\overline{\mathbf{h}}|\boldsymbol{\alpha}) + \ln p(\boldsymbol{\alpha}|\mathbf{a},\mathbf{b}) + \ln p(\mathbf{b}|\mathbf{c},\mathbf{d}) \Big] \\ &\propto \mathbb{E}\_{\boldsymbol{\alpha},\overline{\mathbf{h}}} \Big[ \ln p(\boldsymbol{\alpha}|\mathbf{a},\mathbf{b}) + \ln p(\mathbf{b}|\mathbf{c},\mathbf{d}) \Big] \\ &= \sum\_{i \in \{\mathcal{S}, \mathcal{S}\_{+N}\}} \left\{ a\_i \ln b\_i - b\_{i}\mathbb{E}\_{\boldsymbol{\alpha}}(\alpha\_{i}) + (c\_{i} - 1)\ln b\_{i} - d\_{i}b\_{i} \right\} \end{aligned} \tag{15}$$

where $\mathbf{b}\_{\{\mathcal{S},\mathcal{S}\_{+N}\}}$ consists of the entries of $\mathbf{b}$ indexed by $\{\mathcal{S}, \mathcal{S}\_{+N}\}$. In (15), $\boldsymbol{\alpha}$, $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{d}$ are likewise restricted to the indices in $\{\mathcal{S}, \mathcal{S}\_{+N}\}$, and the subscript is omitted for simplicity. As shown in Figure 1, $\mathbf{b}\_{\{\mathcal{S},\mathcal{S}\_{+N}\}}$ was modeled with a Gamma prior. Since $p(\alpha\_i|a\_i, b\_i)$ and $p(b\_i|c\_i, d\_i)$ are both Gamma distributions, the updated $p(b\_i)$ for $i \in \{\mathcal{S}, \mathcal{S}\_{+N}\}$ is $\mathrm{Gamma}(b\_i \mid \widetilde{c\_i}, \widetilde{d\_i})$, with the updated parameters $\widetilde{c\_i}$ and $\widetilde{d\_i}$ given by

$$
\widetilde{c\_i} = a\_i + c\_i \tag{16}
$$

$$
\widetilde{d\_i} = d\_i + \mathbb{E}\_{\boldsymbol{\alpha}}(\alpha\_i) \tag{17}
$$
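Equations (16) and (17) reduce to a one-line update for each index in the prior set. The sketch below returns $\widetilde{c\_i}$, $\widetilde{d\_i}$, and the mean $\mathbb{E}(b\_i) = \widetilde{c\_i}/\widetilde{d\_i}$, which feeds back into the first case of Equation (14). The function name is hypothetical. Using one shared prior shape $a$ and one shared $c$ for all indices is also an assumption.

```python
def update_b(a, c, d, e_alpha, in_prior):
    """Eqs. (16)-(17) for i in {S, S+N}; returns {i: (c_tilde, d_tilde, E[b_i])}."""
    out = {}
    for i in sorted(in_prior):
        c_tilde = a + c                 # Eq. (16): shape a_i + c_i
        d_tilde = d[i] + e_alpha[i]     # Eq. (17): rate d_i + E[alpha_i]
        out[i] = (c_tilde, d_tilde, c_tilde / d_tilde)
    return out


print(update_b(a=1.0, c=1.0, d=[1.0, 1.0], e_alpha=[3.0, 1.0], in_prior={0}))
```

A large $\mathbb{E}[\alpha\_i]$ (an entry the data suggest is zero) pulls $\mathbb{E}(b\_i)$ down. A smaller $b\_i$ in turn favors a larger $\alpha\_i$ in Equation (14).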


The Bayesian inference for channel estimation then iterates over steps (i), (ii), and (iii). The details are summarized in step 3 of Algorithm 1. Once the real-valued channel vector $\overline{\mathbf{h}}$ has been recovered, it is converted back to the complex vector $\mathbf{h}^d\_a$ according to Equation (3).

**Algorithm 1** Downlink channel estimation with variational inference algorithm and overcomplete dictionary.

Input: $\overline{\mathbf{A}}$, $\overline{\mathbf{y}}$, $\sigma^2$; Output: $\overline{\mathbf{h}}$

	- 3.1. Initialize $\boldsymbol{\alpha}$, $a$, $b$, $c$, $d$.
	- 3.2. Compute $\boldsymbol{\mu} = \frac{1}{\sigma^2}\boldsymbol{\Phi}\overline{\mathbf{A}}^T\overline{\mathbf{y}}$, $\boldsymbol{\Phi} = \left(\frac{1}{\sigma^2}\overline{\mathbf{A}}^T\overline{\mathbf{A}} + \boldsymbol{\Lambda}\right)^{-1}$, and $\mathbb{E}\_{\mathbf{h},\mathbf{b}}(\overline{h}\_i^2) = \mu\_i^2 + \Phi\_{i,i}$. Here $\boldsymbol{\Lambda} = \mathrm{diag}(\mathbb{E}\_{\boldsymbol{\alpha}}[\alpha\_i])$, $\mu\_i$ is the $i$-th entry of $\boldsymbol{\mu}$, and $\Phi\_{i,i}$ is the $i$-th diagonal entry of $\boldsymbol{\Phi}$.
	- 3.3. Update $\widetilde{a\_i}$ and $\widetilde{b\_i}$ according to Equations (13) and (14) in (ii). Here $\widetilde{a\_i}$ and $\widetilde{b\_i}$ are the updated values, and $a\_i$ and $b\_i$ are the results from the last iteration. Then, by the property of the Gamma distribution, $\mathbb{E}\_{\boldsymbol{\alpha}}(\alpha\_i) = \widetilde{a\_i}/\widetilde{b\_i}$.
	- 3.4. Update $\widetilde{c}$ and $\widetilde{d}$ according to Equations (16) and (17) in (iii). Here $\widetilde{c}$ and $\widetilde{d}$ are the updated values, and $c$ and $d$ are the results from the last iteration. Then, by the property of the Gamma distribution, $\mathbb{E}(b\_i) = \widetilde{c}/\widetilde{d}$.
	- 3.5. Return to step 3.2 until the stopping criterion is met.
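Putting steps 3.1 to 3.5 together, the loop below is a compact, self-contained sketch of step 3 of Algorithm 1 in plain Python. It is not the authors' implementation. The following are assumptions made for illustration:

- the initial values $\mathbb{E}[\alpha\_i] = 1$;
- the fixed prior hyperparameters $a = b = c = d = 10^{-6}$, with $a$ held constant rather than re-updated;
- the relative change of $\boldsymbol{\mu}$ between adjacent iterations as the stopping criterion;
- all function names.

```python
import math


def mat_inv(m):
    """Gauss-Jordan inverse with partial pivoting (matrices as lists of lists)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vb_channel_estimate(A, y, sigma2, prior_support, n, a=1e-6, b0=1e-6,
                        c=1e-6, d0=1e-6, tol=1e-6, max_iter=200):
    """Sketch of step 3 of Algorithm 1 on the real-valued model (2N unknowns).

    A: list of rows (2T_d x 2N), y: length-2T_d list, prior_support: S, 0-based, < n.
    Returns the posterior mean mu and the number of iterations used.
    """
    two_n = 2 * n
    in_prior = set(prior_support) | {i + n for i in prior_support}
    e_alpha = [1.0] * two_n            # step 3.1: E[alpha_i]
    e_b = [b0] * two_n                 # E[b_i]; only entries in {S, S+N} are updated
    ata = [[sum(row[i] * row[j] for row in A) for j in range(two_n)] for i in range(two_n)]
    aty = [sum(row[i] * yk for row, yk in zip(A, y)) for i in range(two_n)]
    mu_prev = None
    for it in range(1, max_iter + 1):
        # step 3.2 / Eqs. (10)-(11): Gaussian posterior of h
        phi = mat_inv([[ata[i][j] / sigma2 + (e_alpha[i] if i == j else 0.0)
                        for j in range(two_n)] for i in range(two_n)])
        mu = [sum(phi[i][j] * aty[j] for j in range(two_n)) / sigma2 for i in range(two_n)]
        e_h2 = [mu[i] ** 2 + phi[i][i] for i in range(two_n)]
        # step 3.3 / Eqs. (13)-(14): Gamma posterior of alpha with pairing i <-> i+N
        for i in range(two_n):
            p = (i + n) % two_n
            h2 = (e_h2[i] + e_h2[p]) / 4.0
            rate = ((e_b[i] + e_b[p]) / 2.0 if i in in_prior else b0) + h2
            e_alpha[i] = (a + 0.5) / rate
        # step 3.4 / Eqs. (16)-(17): Gamma posterior of b on {S, S+N}
        for i in in_prior:
            e_b[i] = (a + c) / (d0 + e_alpha[i])
        # step 3.5: stop when the relative change of mu is below tol
        if mu_prev is not None:
            diff = math.sqrt(sum((u - v) ** 2 for u, v in zip(mu, mu_prev)))
            base = math.sqrt(sum(v ** 2 for v in mu_prev))
            if base > 0.0 and diff / base < tol:
                return mu, it
        mu_prev = mu
    return mu, max_iter


n = 2
A = [[float(i == j) for j in range(2 * n)] for i in range(2 * n)]
h_true = [1.0, 0.0, 0.5, 0.0]   # complex entry 0 = 1 + 0.5j, complex entry 1 = 0
mu, iters = vb_channel_estimate(A, h_true, sigma2=0.01, prior_support=[0], n=n)
print([round(v, 3) for v in mu], iters)
```

This toy run uses an identity measurement matrix and noise-free observations. The entries outside the support stay at zero, and the nonzero pair $(\overline{h}\_0, \overline{h}\_{N})$ is recovered up to a small bias introduced by the prior.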



In a practical massive MIMO system, the transmission environment may change suddenly. The sparsity correlation between adjacent timeslots then deteriorates, and the previous channel support information can no longer be exploited. Moreover, errors accumulate if the previous channel support information is reused timeslot after timeslot. Hence, initialization is important for the robustness and efficiency of the algorithm. As shown in Figure 2, we divided the timeslots into groups, each comprising several timeslots. Within each group, VBI was used for channel estimation in the first timeslot. The proposed algorithm was executed in the remaining timeslots, where the current timeslot exploits the channel support information of the previous timeslot. This procedure is detailed in steps 1, 2, and 4 of Algorithm 1.
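The grouping in Figure 2 amounts to a simple per-timeslot schedule. The following minimal sketch assumes a fixed group size and 0-based timeslot indices. The group size of 3 and the function name are hypothetical.

```python
def estimation_schedule(num_slots, group_size):
    """Label each timeslot with the estimator used for it.

    The first slot of every group runs plain VBI (no prior support); the remaining
    slots run the proposed algorithm with the support from the previous slot.
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return ["VBI" if t % group_size == 0 else "proposed" for t in range(num_slots)]


print(estimation_schedule(7, 3))
```

Restarting with VBI at the start of each group stops support errors from propagating indefinitely. This comes at the cost of one slot per group without prior information.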

**Figure 2.** Channel estimations by group. Each block represents one timeslot, and the block filled with grey is the timeslot with variational Bayesian inference (VBI) for the channel estimation, while the blank blocks are the timeslots with the proposed algorithm for channel estimation.

#### **4. Discussion**

#### *4.1. Sparsity Correlation Analysis*

When the UT velocity is low and the timeslot duration is 0.5 ms, the UT moves only a very small distance within one timeslot, and the reflector can be regarded as static while the UT moves between timeslots. The ellipse geometry channel model is shown in Figure 3. The model uses the following distances:

- $d\_{\mathrm{LoS}}$: the line-of-sight (LoS) distance between the BS and the UT;
- $d\_{\mathrm{NLoS}}$: the non-LoS (NLoS) distance between the BS and the UT via the reflector;
- $d\_1$: the distance between the reflector and the BS;
- $d\_\Delta$: the distance the UT moves in one timeslot.

If the transmission path is still reflected by the same reflector, as shown in Figure 3, the maximum and minimum NLoS distances from the BS to the UT between timeslots are $d\_{\mathrm{NLoS}} + d\_\Delta$ and $d\_{\mathrm{NLoS}} - d\_\Delta$, respectively. Let $\theta$ denote the angle at the BS between the LoS path and the path to the reflector, and let $\Delta\_\theta$ denote its change within one timeslot. The derivation in Appendix A gives

$$
\Delta\_\theta \approx \frac{2d\_\Delta (d\_{\rm NLoS} - d\_1)}{2d\_1 d\_{\rm LoS}} \frac{1}{\sqrt{1 - \cos^2 \theta}} \tag{18}
$$

**Figure 3.** Ellipse geometry channel model for line of sight (LoS) and non-LoS (NLoS) transmission.

To illustrate the angle change $\Delta\_\theta$ during one timeslot, we assumed that $d\_{\mathrm{NLoS}}$ was 800 m, the UT velocity was 14.4 km/h (4 m/s), and the typical timeslot duration was $\tau = 0.5$ ms. The UT therefore moved $4~\text{m/s} \times 0.5~\text{ms} = 0.002$ m in one timeslot. As shown in Figure 4, the angle change did not exceed 0.025° as the distance between the BS and the reflector was varied. Note that when the LoS distance and $d\_{\mathrm{NLoS}}$ are fixed, the distance between the BS and the reflector cannot take arbitrary values because of the triangle inequality. Hence, the angle of arrival or departure changes slowly, and the angular channels in adjacent timeslots exhibit sparsity correlation.
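The magnitude above can be checked numerically from Equations (A1), (A2), and (18). The sketch below compares the first-order approximation with the exact angle change obtained from the law of cosines. The LoS distance of 500 m and the BS-to-reflector distance of 300 m are hypothetical values chosen to satisfy the triangle inequality. They are not taken from Figure 4.

```python
import math


def angle_change(d_los, d_nlos, d1, d_delta):
    """Return (first-order Delta_theta from Eq. (18)/(A5), exact value from (A2)), in radians.

    theta is the angle at the BS between the LoS path (length d_los) and the path
    to the reflector (length d1); the reflector-UT leg has length d_nlos - d1.
    """
    d2 = d_nlos - d1
    if not (d1 + d2 > d_los and d1 + d_los > d2 and d2 + d_los > d1):
        raise ValueError("distances violate the triangle inequality")
    cos_t = (d1**2 + d_los**2 - d2**2) / (2 * d1 * d_los)                    # (A1)
    approx = 2 * d_delta * d2 / (2 * d1 * d_los) / math.sqrt(1 - cos_t**2)   # (A5)
    cos_plus = (d1**2 + d_los**2 - (d2 + d_delta)**2) / (2 * d1 * d_los)     # (A2), "+" branch
    exact = math.acos(cos_plus) - math.acos(cos_t)
    return approx, exact


d_delta = 14.4 / 3.6 * 0.5e-3   # 4 m/s during 0.5 ms = 0.002 m
approx, exact = angle_change(d_los=500.0, d_nlos=800.0, d1=300.0, d_delta=d_delta)
print(math.degrees(approx), math.degrees(exact))
```

For these values, the approximation and the exact value agree closely. The angle change is far below the 0.025° bound read from Figure 4.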

**Figure 4.** Transmission angle change during one timeslot with a different LoS distance and different distances between the base station (BS) and reflector.

#### *4.2. Bayesian Cramér-Rao Bound Analysis*

In this section, we discuss the Bayesian Cramér–Rao bound (BCRB) for channel estimation with the proposed algorithm. Let $\mathbf{z} \triangleq \{\overline{\mathbf{h}}, \sigma\}$. The BCRB for the channel vector $\overline{\mathbf{h}}$ is then given by the inverse of the Fisher information matrix $\mathbf{J}$:

$$\mathbf{J} = \mathbb{E}\_{\mathbf{z}} \left\{ -\frac{\partial^2 \log p(\overline{\mathbf{y}}, \mathbf{z})}{\partial z\_i \partial z\_j} \right\} \tag{19}$$

According to the system model in Section 2, $\overline{\mathbf{h}}$ and $\sigma$ are independent, so the Fisher information matrix $\mathbf{J}$ is block diagonal. The joint PDF $p(\overline{\mathbf{y}}, \mathbf{z})$ can be factorized as

$$p(\overline{\mathbf{y}}, \mathbf{z}) = p(\overline{\mathbf{y}}|\mathbf{z})p(\overline{\mathbf{h}}|\boldsymbol{\alpha})p(\boldsymbol{\alpha}|\mathbf{b})p(\mathbf{b})\tag{20}$$

Then the BCRB on the MSE of the estimated channel vector **h** is given by

$$\mathbb{E}\left\{\left\|\overline{\mathbf{h}}^{\prime}-\overline{\mathbf{h}}\right\|^2\right\} \geq \mathrm{tr}\Big(\mathbf{J}\_{\overline{h}\_i\overline{h}\_j}^{-1}\Big)\tag{21}$$

where $\mathbf{J}\_{\overline{h}\_i\overline{h}\_j} = \mathbb{E}\_{\mathbf{z}}\left\{-\frac{\partial^2 \log p(\overline{\mathbf{y}}, \mathbf{z})}{\partial \overline{h}\_i \partial \overline{h}\_j}\right\}$ is the Fisher information sub-matrix. The BCRB on the MSE of the estimated channel $\overline{\mathbf{h}}$ is stated in Proposition 1.

**Proposition 1.** *The BCRB of MSE for the channel estimation h is represented as*

$$\mathbb{E}\left\{\left\|\overline{\mathbf{h}}^{\prime} - \overline{\mathbf{h}}\right\|^2\right\} \ge \mathrm{tr}\left(\left(\mathrm{diag}\left(\mathbb{E}\left(\frac{1}{\alpha\_i}\right)\right) + \frac{1}{\sigma^2} \overline{\mathbf{A}}^T \overline{\mathbf{A}}\right)^{-1}\right) = \sum\_{i \in \mathcal{S}} \frac{1}{\frac{1+c}{a d\_i} + \frac{\lambda\_i}{\sigma}} + \sum\_{i \notin \mathcal{S}} \frac{1}{\frac{b\_i}{a} + \frac{\lambda\_i}{\sigma}}\tag{22}$$

*where $\mathcal{S}$ is the diagnosed support set, $\lambda\_i$ are the eigenvalues of $\overline{\mathbf{A}}^T \overline{\mathbf{A}} \in \mathbb{R}^{2N \times 2N}$, and $a$, $b\_i$, $c$, and $d\_i$ are the parameters of the Bayesian model in* Figure 1*. When $T\_d, M \to \infty$ and $T\_d/M = \beta$, random matrix theory gives*

$$\begin{aligned} \mathbb{E}\left\{ \left( \widehat{\overline{\mathbf{h}}} - \overline{\mathbf{h}} \right)^{H} \left( \widehat{\overline{\mathbf{h}}} - \overline{\mathbf{h}} \right) \right\} &\geq |\mathcal{S}| \cdot \frac{1}{|\mathcal{S}|} \sum\_{i \in \mathcal{S}} \frac{1}{\frac{1+c}{a \min(\mathbf{d})} + \frac{\lambda\_{i}}{\sigma}} + \left( N - |\mathcal{S}| \right) \cdot \frac{1}{N-|\mathcal{S}|} \sum\_{i \notin \mathcal{S}} \frac{1}{\frac{\max(\mathbf{b})}{a} + \frac{\lambda\_{i}}{\sigma}} \\ &\to |\mathcal{S}| \frac{\min(\mathbf{d})}{1+c} \left( 1 - \frac{F(\mathrm{snr}\_{1}, \beta)}{4\,\mathrm{snr}\_{1}} \right) + \left( N - |\mathcal{S}| \right) \frac{a}{\max(\mathbf{b})} \left( 1 - \frac{F(\mathrm{snr}\_{2}, \beta)}{4\,\mathrm{snr}\_{2}} \right) \end{aligned} \tag{23}$$

*where $\mathrm{snr}\_1 = \frac{a\min(\mathbf{d})}{(1+c)\sigma}$, $\mathrm{snr}\_2 = \frac{a}{\sigma\max(\mathbf{b})}$, $F(x, z) = \left(\sqrt{x(1+\sqrt{z})^2 + 1} - \sqrt{x(1-\sqrt{z})^2 + 1}\right)^2$, and $\min(\mathbf{d})$ and $\max(\mathbf{b})$ are the minimum and maximum entries of $\mathbf{d}$ and $\mathbf{b}$, respectively.*

The proof of Proposition 1 is given in Appendix B. Proposition 1 shows that, for massive MIMO channel estimation, the MSE lower bound depends on the prior support size $|\mathcal{S}|$, on $(1 + c)/\min(\mathbf{d})$, and on $\max(\mathbf{b})$.
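To make this dependence concrete, the function below evaluates the right-hand side of Equation (22) exactly as written. It takes the eigenvalues $\lambda\_i$ of $\overline{\mathbf{A}}^T\overline{\mathbf{A}}$ and the hyperparameters as inputs. It pairs the $i$-th eigenvalue with the $i$-th hyperparameter, following the equation's indexing. The function name and argument layout are hypothetical.

```python
def bcrb_eq22(eigvals, support, a, c, d, b, sigma):
    """Right-hand side of Eq. (22) as stated.

    eigvals: eigenvalues lambda_i; support: indices in S (three-layer prior);
    d, b: per-index hyperparameters; a, c: shared shape parameters; sigma as in Eq. (22).
    """
    total = 0.0
    for i, lam in enumerate(eigvals):
        if i in support:
            total += 1.0 / ((1.0 + c) / (a * d[i]) + lam / sigma)
        else:
            total += 1.0 / (b[i] / a + lam / sigma)
    return total


print(bcrb_eq22([1.0, 3.0], {0}, a=1.0, c=1.0, d=[2.0, 2.0], b=[1.0, 1.0], sigma=1.0))
```

Increasing $\max(\mathbf{b})$ or $(1+c)/\min(\mathbf{d})$ in the inputs lowers the value returned, consistent with the discussion above.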

#### **5. Simulations**

In the simulation, the support diagnosis algorithm in [6] was adopted, and the transmission angle change between timeslots was assumed to be within 1 degree. The pilot length was 50, and the number of antennas at the BS was 100. The channel was generated according to the spatial channel model defined in 3GPP TR 25.996. The proposed algorithm was evaluated with a unitary dictionary of size 100 and with overcomplete dictionaries of sizes 150, 200, and 250. It was compared with the following algorithms:

- Bayesian sparse learning (SL) [16];
- weighted subspace pursuit (WSP) [6];
- weighted l1 minimization (W-l1 min) [5];
- weighted iteratively reweighted least squares (W-IRLS);
- IRLS [17];
- compressive sampling matched pursuit (COSAMP) [11];
- l1 minimization (l1 min) [18].

In order to evaluate the channel estimation performance, we used a normalized mean-square error (MSE) between true and estimated channel vectors as follows:

$$MSE = \frac{1}{T} \sum\_{k=1}^{T} \frac{\left\| \hat{\mathbf{h}}^{d}\_{k} - \mathbf{h}^{d}\_{k} \right\|^2}{\left\| \mathbf{h}^{d}\_{k} \right\|^2} \tag{24}$$

where $T$ is the number of trials, and $\hat{\mathbf{h}}^{d}\_{k}$ and $\mathbf{h}^{d}\_{k}$ are the estimated and true channel vectors in the $k$-th trial, respectively. In the simulations, $T = 250$.
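Equation (24) can be computed directly from the per-trial estimates. The following is a minimal sketch in plain Python, where each channel vector is a list of complex numbers. The function name is hypothetical.

```python
def nmse(estimates, truths):
    """Eq. (24): mean over trials of ||h_hat - h||^2 / ||h||^2 for complex vectors."""
    if not truths or len(estimates) != len(truths):
        raise ValueError("need one estimate per trial")
    total = 0.0
    for h_hat, h in zip(estimates, truths):
        err = sum(abs(x - y) ** 2 for x, y in zip(h_hat, h))
        ref = sum(abs(y) ** 2 for y in h)
        total += err / ref
    return total / len(truths)


est = [[1 + 1j, 0j], [1 + 0j, 0j]]
truth = [[1 + 0j, 0j], [2 + 0j, 0j]]
print(nmse(est, truth))
```

Each trial is normalized by its own channel energy before averaging, so trials with strong and weak channels contribute equally.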

In Figure 5, the overcomplete dictionary size in the proposed algorithm was 150. With the unitary dictionary, the proposed algorithm outperformed WSP, COSAMP, and IRLS, but was slightly worse than W-l1 min. With the overcomplete dictionary, the proposed algorithm outperformed the other algorithms. Its performance was nearly the same as that of SL, with a small improvement visible in the zoomed-in subfigure. The overcomplete dictionary substantially improves the MSE performance of the proposed algorithm because it contains more atoms than the unitary dictionary, which improves the sparsity of the angular channel. However, a larger overcomplete dictionary does not necessarily yield better performance, as shown in Figure 6.

**Figure 5.** Comparisons of channel estimation mean square error (MSE) for different algorithms.

**Figure 6.** Comparison of channel estimation MSE for the proposed algorithm with different dictionary sizes.

Figure 6 compares the performance of the proposed algorithm with different dictionary sizes. In the high-SNR region, using an overcomplete dictionary improved the performance, but the MSE gain did not keep growing as the dictionary size increased. For example, the algorithm performed better with a dictionary size of 150 than with a dictionary size of 100, whereas dictionary sizes of 200 and 250 gave almost the same performance as a size of 150. This is because a larger dictionary increases the correlation between atoms, which induces angle ambiguity. Hence, a very large dictionary is not recommended in practice: it is computationally expensive and its benefit is limited. Note also that in the low-SNR region, a larger dictionary did not always outperform a smaller one. For example, at an SNR of 0 dB, the different dictionary sizes gave similar performance. There are two reasons. In the low-SNR region, the channel support estimated in the previous timeslot is not accurate enough. In addition, a larger dictionary degrades the dictionary incoherence.

Figure 7 compares the runtime and convergence of the proposed algorithm with different dictionary sizes. The relative error was defined as the difference between the results of adjacent iterations, divided by the result of the previous iteration. The proposed algorithm converged faster with a dictionary size of 150 than with a dictionary size of 100. However, the runtime with a dictionary size of 150 was longer, so this improvement comes at the price of higher computational complexity. Based on the simulation results in Figures 6 and 7, when the BS has 100 antennas, a dictionary size of about 150 is recommended to balance performance improvement and computational complexity.

**Figure 7.** Comparison of runtime and convergence of the proposed algorithm with an orthogonal dictionary (size 100) and an overcomplete dictionary (size 150).

#### **6. Conclusions**

In this paper, we proposed a downlink channel estimation algorithm based on an overcomplete dictionary and variational Bayesian inference. We converted the complex system model into a real-valued model and exploited the correlation of angular channel sparsity between adjacent timeslots. The algorithm divides the timeslots into groups. Within each group, the channel support information of the previous timeslot is used for channel estimation in the current timeslot. The sparsity correlation and the Bayesian Cramér–Rao bound on the MSE of channel estimation were analyzed. Compared with other recovery algorithms, such as WSP, IRLS, W-IRLS, l1 min, W-l1 min, and COSAMP, the proposed algorithm with an overcomplete dictionary achieved better performance. A moderately sized overcomplete dictionary improves the MSE performance of channel estimation while keeping the computational complexity in balance with the performance gain.

**Author Contributions:** Conceptualization and methodology, W.L.; validation, X.W., S.P. and L.Z.; formal analysis, W.L.; writing—original draft preparation, W.L.; supervision, Y.W.

**Funding:** This work was supported in part by the National Science Foundation of China (Nos. 61601509 and 61601334), the China Postdoctoral Science Foundation Grant (Nos. 2016M603045 and 2018M632889), and the self-determined research funds of CCNU (CCNU18QN007) from the colleges' basic research and operation of MOE.

**Acknowledgments:** Thank you to the reviewers for their valuable comments.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **Appendix A**

#### **Proof of angle change with UT movement.**

According to the cosine law, we have

$$\cos\theta = \frac{d\_1^2 + d\_{\rm LoS}^2 - \left(d\_{\rm NLoS} - d\_1\right)^2}{2d\_1 d\_{\rm LoS}},\tag{A1}$$

$$\cos(\theta \pm \Delta\_{\theta}) = \frac{d\_1^2 + d\_{\text{LoS}}^2 - \left(d\_{\text{NLoS}} \pm d\_\Delta - d\_1\right)^2}{2d\_1 d\_{\text{LoS}}}.\tag{A2}$$

Then we can get

$$\theta \pm \Delta\_{\theta} = \arccos\left(\frac{d\_1^2 + d\_{\rm LoS}^2 - (d\_{\rm NLoS} - d\_1)^2 - d\_\Delta^2 \mp 2d\_\Delta(d\_{\rm NLoS} - d\_1)}{2d\_1 d\_{\rm LoS}}\right). \tag{A3}$$

Since $d\_\Delta^2$ is very small compared with $d\_1$ and $d\_{\mathrm{LoS}}$, we drop it. The first-order approximation $\arccos(x \mp \epsilon) \approx \arccos(x) \pm \epsilon/\sqrt{1 - x^2}$ then gives

$$\begin{aligned} \theta \pm \Delta\_{\theta} &\approx \arccos\left(\frac{d\_{1}^{2} + d\_{\mathrm{LoS}}^{2} - (d\_{\mathrm{NLoS}} - d\_{1})^{2} \mp 2d\_{\Delta}(d\_{\mathrm{NLoS}} - d\_{1})}{2d\_{1}d\_{\mathrm{LoS}}}\right) \\ &\approx \arccos\left(\frac{d\_{1}^{2} + d\_{\mathrm{LoS}}^{2} - (d\_{\mathrm{NLoS}} - d\_{1})^{2}}{2d\_{1}d\_{\mathrm{LoS}}}\right) \pm \frac{2d\_{\Delta}(d\_{\mathrm{NLoS}} - d\_{1})}{2d\_{1}d\_{\mathrm{LoS}}} \frac{1}{\sqrt{1 - \cos^{2}\theta}} \\ &= \theta \pm \frac{2d\_{\Delta}(d\_{\mathrm{NLoS}} - d\_{1})}{2d\_{1}d\_{\mathrm{LoS}}} \frac{1}{\sqrt{1 - \cos^{2}\theta}} \end{aligned} \tag{A4}$$

Then we have

$$
\Delta\_{\theta} \approx \frac{2d\_{\Delta}(d\_{\mathrm{NLoS}} - d\_1)}{2d\_1 d\_{\mathrm{LoS}}} \frac{1}{\sqrt{1 - \cos^2 \theta}}.\tag{A5}
$$

#### **Appendix B**

#### **Proof of Proposition 1.**

Let $\mathbf{z} \triangleq \{\overline{\mathbf{h}}, \sigma\}$. Then

$$\mathbb{E}\_{\mathbf{z}}\left\{ \left( \mathbf{z} - \hat{\mathbf{z}} \right) \left( \mathbf{z} - \hat{\mathbf{z}} \right)^{T} \right\} \geq \mathbf{J}^{-1}. \tag{A6}$$

Since $\overline{\mathbf{h}}$ and $\sigma$ are independent, the Fisher information matrix $\mathbf{J}$ is block diagonal and can be written as

$$\mathbf{J} = \begin{bmatrix} \mathbf{J}\_{\overline{\mathbf{h}}, \overline{\mathbf{h}}} & \mathbf{0} \\ \mathbf{0} & \mathbf{J}\_{\sigma, \sigma} \end{bmatrix}. \tag{A7}$$

Then the inverse of matrix **J** is

$$\mathbf{J}^{-1} = \begin{bmatrix} \mathbf{J}\_{\overline{\mathbf{h}}, \overline{\mathbf{h}}}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{J}\_{\sigma, \sigma}^{-1} \end{bmatrix} . \tag{A8}$$

Because *p*(*y*, *z*) = *p y z p*(*h* <sup>α</sup>)*p*(<sup>α</sup> *<sup>b</sup>*)*p*(*b*)*p*(σ), we have **J** = E**<sup>z</sup>** −∂<sup>2</sup> log *<sup>p</sup>*(**y**,**z**) ∂*zi*∂*zj* = E**<sup>z</sup>** <sup>−</sup> <sup>∂</sup><sup>2</sup> log *<sup>p</sup>*(**y**|**z**) ∂*zi*∂*zj* + E**<sup>z</sup>** ( <sup>−</sup> <sup>∂</sup><sup>2</sup> log *<sup>p</sup>*(**<sup>h</sup>** α) ∂*zi*∂*zj* ) + E**<sup>z</sup>** <sup>−</sup> <sup>∂</sup><sup>2</sup> log *<sup>p</sup>*(α|**b**) ∂*zi*∂*zj* + E**z** −∂<sup>2</sup> log *<sup>p</sup>*(**b**) ∂*zi*∂*zj* + E**<sup>z</sup>** −∂<sup>2</sup> log *<sup>p</sup>*(σ) ∂*zi*∂*zj* (A9)

Since we mainly focus on the MSE of $\overline{\mathbf{h}}$, we only need to analyze $\mathbf{J}\_{\overline{\mathbf{h}},\overline{\mathbf{h}}}$. We discuss the above expression term by term as follows:

1) Let $\mathbf{J}\_{\overline{\mathbf{h}},\overline{\mathbf{h}}}(\overline{\mathbf{y}}) = \mathbb{E}\_{\mathbf{z}}\left\{-\frac{\partial^{2} \log p(\overline{\mathbf{y}}|\mathbf{z})}{\partial z\_{i} \partial z\_{j}}\right\}$. According to the Bayesian model in Figure 1, $p(\overline{\mathbf{y}}|\mathbf{z}) \sim \mathcal{N}(\overline{\mathbf{y}} \mid \overline{\mathbf{A}}\overline{\mathbf{h}}, \sigma \mathbf{I})$. Then

$$\mathbf{J}\_{\overline{\mathbf{h}},\overline{\mathbf{h}}}(\overline{\mathbf{y}}) = \mathbb{E}\_{\mathbf{z}}\left\{\frac{\overline{\mathbf{A}}^{T}\overline{\mathbf{A}}}{\sigma}\right\} = \frac{\overline{\mathbf{A}}^{T}\overline{\mathbf{A}}}{\sigma}.$$


The proposed algorithm uses the prior support set information. A three-layer model is therefore constructed for the elements belonging to the prior support set, and a two-layer model is used for the remaining elements. As a result, $\mathbb{E}\_{\mathbf{z}}\{1/\alpha\_i\}$ takes a different form in each case. The two cases are discussed as follows:

1) When $i$ belongs to the prior support set, the three-layer graphical model gives

$$p(\boldsymbol{\alpha}) = \prod\_{i=1}^{2N} \mathrm{Gamma}(\alpha\_i \mid a, b\_i), \tag{A10}$$

$$p(b\_i) = \text{Gamma}(b\_i | c, d\_i) = \Gamma(c)^{-1} d\_i^c b\_i^{c-1} e^{-d\_i b\_i}.\tag{A11}$$

Then we get

$$\begin{aligned} p(\alpha\_i) &= \int\_0^\infty p(\alpha\_i|b\_i)p(b\_i)\,db\_i \\ &= \int\_0^\infty \Gamma(a)^{-1}b\_i^a \alpha\_i^{a-1} e^{-b\_i \alpha\_i} \Gamma(c)^{-1} d\_i^c b\_i^{c-1} e^{-d\_i b\_i}\,db\_i \\ &= \Gamma(a)^{-1} \Gamma(c)^{-1} \alpha\_i^{a-1} d\_i^c \frac{\Gamma(a+c)}{(\alpha\_i+d\_i)^{a+c}} \end{aligned} \tag{A12}$$

Accordingly, we have

$$\begin{aligned} \mathbb{E}\left\{\frac{1}{\alpha\_i}\right\} &= \int\_0^\infty \frac{1}{\alpha\_i} \Gamma(a)^{-1} \Gamma(c)^{-1} \alpha\_i^{a-1} d\_i^c \frac{\Gamma(a+c)}{(\alpha\_i+d\_i)^{a+c}}\,d\alpha\_i \\ &= \int\_0^\infty \frac{1}{\alpha\_i} \frac{\Gamma(a+c)}{\Gamma(a)\Gamma(c)} \left(\frac{\alpha\_i}{d\_i}\right)^{a-1} \left(\frac{\alpha\_i}{d\_i}+1\right)^{-a-c} d\frac{\alpha\_i}{d\_i} \end{aligned} \tag{A13}$$

where $\frac{\Gamma(a+c)}{\Gamma(a)\Gamma(c)} \left(\frac{\alpha\_i}{d\_i}\right)^{a-1} \left(\frac{\alpha\_i}{d\_i} + 1\right)^{-a-c}$ is the probability density function of a beta prime distribution in the variable $\alpha\_i/d\_i$. According to the properties of the beta prime distribution, when $-a < -1 < c$, we have

$$\begin{aligned} \mathbb{E}\left\{ \left( \widehat{\overline{\mathbf{h}}} - \overline{\mathbf{h}} \right)^{H} \left( \widehat{\overline{\mathbf{h}}} - \overline{\mathbf{h}} \right) \right\} &\geq |\mathcal{S}| \cdot \frac{1}{|\mathcal{S}|} \sum\_{i \in \mathcal{S}} \frac{1}{\frac{1+c}{a \min(\mathbf{d})} + \frac{\lambda\_{i}}{\sigma}} + \left( N - |\mathcal{S}| \right) \cdot \frac{1}{N-|\mathcal{S}|} \sum\_{i \notin \mathcal{S}} \frac{1}{\frac{\max(\mathbf{b})}{a} + \frac{\lambda\_{i}}{\sigma}} \\ &\to |\mathcal{S}| \frac{\min(\mathbf{d})}{1+c} \left( 1 - \frac{F(\mathrm{snr}\_{1}, \beta)}{4\,\mathrm{snr}\_{1}} \right) + \left( N - |\mathcal{S}| \right) \frac{a}{\max(\mathbf{b})} \left( 1 - \frac{F(\mathrm{snr}\_{2}, \beta)}{4\,\mathrm{snr}\_{2}} \right) \end{aligned} \tag{A14}$$

2) When $i$ does not belong to the prior support set, the moment properties of the Gamma distribution give

$$\mathbb{E}\left\{\frac{1}{\alpha\_i}\right\} = \frac{b\_i}{a}.\tag{A15}$$

Then in summary, we have

$$\mathbb{E}\left\{\left\|\overline{\mathbf{h}}^{'} - \overline{\mathbf{h}}\right\|^2\right\} \ge \mathrm{tr}\left(\left(\mathrm{diag}\left(\mathbb{E}\left(\frac{1}{\alpha\_i}\right)\right) + \frac{1}{\sigma^2} \overline{\mathbf{A}}^T \overline{\mathbf{A}}\right)^{-1}\right) = \sum\_{i \in \mathcal{S}} \frac{1}{\frac{1+c}{a d\_i} + \frac{\lambda\_i}{\sigma}} + \sum\_{i \notin \mathcal{S}} \frac{1}{\frac{b\_i}{a} + \frac{\lambda\_i}{\sigma}},\tag{A16}$$

where $\mathcal{S}$ is the diagnosed support set, $\lambda\_i$ are the eigenvalues of $\overline{\mathbf{A}}^T \overline{\mathbf{A}}$, and $\overline{\mathbf{A}}^T \overline{\mathbf{A}} \in \mathbb{R}^{2M \times 2M}$.

Suppose the overcomplete dictionary is $\mathbf{D}^d = \left[\frac{1}{\sqrt{N}} e^{-j\frac{2\pi}{M}kn}\right]\_{n,k}$, with $k \in \{1, \cdots, M\}$ and $n \in \{1, \cdots, N\}$. Suppose also that $\mathbf{A}$ is a Gaussian random matrix whose elements have mean 0 and variance $1/T\_d$. Then $\mathbf{A}\mathbf{D}^d$ is a complex Gaussian random matrix, and hence $\overline{\mathbf{A}}$ is a Gaussian random matrix with mean 0 and variance $\rho^d/(2T\_d)$.

According to random matrix theory, let $\mathbf{H}$ be an $N \times K$ random matrix whose elements are independent with mean 0 and variance $1/N$. When $K, N \to \infty$ and $K/N \to \beta$, the empirical distribution of the eigenvalues of $\mathbf{H}^T\mathbf{H}$ converges almost surely to $f\_\beta(x) = \left(1 - \frac{1}{\beta}\right)^+ \delta(x) + \frac{\sqrt{(x-a)^+(b-x)^+}}{2\pi\beta x}$, where $(x)^+ = \max(0, x)$, $a = (1 - \sqrt{\beta})^2$, and $b = (1 + \sqrt{\beta})^2$.

Here $\overline{\mathbf{A}} \in \mathbb{R}^{2T\_d \times 2M}$, and its elements are Gaussian random variables with mean 0 and variance $\rho^d/(2T\_d)$. Applying the above result for the empirical eigenvalue distribution of $\mathbf{H}^T\mathbf{H}$, when $T\_d, M \to \infty$ and $T\_d/M = \beta$, the empirical distribution of the eigenvalues $\lambda$ of $\overline{\mathbf{A}}^T\overline{\mathbf{A}}$ converges almost surely to

$$f\_{\beta}(\lambda) = \left(1 - \frac{1}{\beta}\right)^{+} \delta(\lambda) + \frac{\sqrt{(\lambda - a)^{+}(b - \lambda)^{+}}}{2\pi\beta\lambda\sqrt{\rho^{d}}}\tag{A17}$$

where $a = \sqrt{\rho^d}\,(1 - \sqrt{\beta})^2$ and $b = \sqrt{\rho^d}\,(1 + \sqrt{\beta})^2$. When $s, M \to \infty$ and $s/M = \mu$, we have

$$\begin{split} \mathbb{E}\left\{ \left( \widehat{\mathbf{h}} - \mathbf{h} \right)^{H} \left( \widehat{\mathbf{h}} - \mathbf{h} \right) \right\} &\geq |\mathsf{S}| \cdot \frac{1}{|\mathsf{S}|} \sum\_{i \in \mathsf{S}} \frac{1}{\frac{1+c}{\min(\mathbf{d})} + \frac{\lambda\_{i}}{\sigma}} + \left( N - |\mathsf{S}| \right) \cdot \frac{1}{N-|\mathsf{S}|} \sum\_{i \notin \mathsf{S}} \frac{1}{\frac{\max(\mathbf{b})}{a} + \frac{\lambda\_{i}}{\sigma}} \\ &\to |\mathsf{S}| \frac{\min(\mathbf{d})}{1+c} \left( 1 - \frac{F(snr\_{1}, \beta)}{4\, snr\_{1}} \right) + \left( N - |\mathsf{S}| \right) \frac{a}{\max(\mathbf{b})} \left( 1 - \frac{F(snr\_{2}, \beta)}{4\, snr\_{2}} \right), \end{split} \tag{A18}$$

where $snr\_1 = \frac{a \min(\mathbf{d})}{(1+c)\sigma}$, $snr\_2 = \frac{a}{\sigma \max(\mathbf{b})}$, and $F(x, z) = \left(\sqrt{x(1+\sqrt{z})^2 + 1} - \sqrt{x(1-\sqrt{z})^2 + 1}\right)^2$. This completes the proof.
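The bracketed factors in (A18) have the form of the η-transform of a Marchenko–Pastur law, $\mathbb{E}\{1/(1+\gamma\lambda)\}$. As a sanity check of that ingredient only, the following Python sketch integrates the density $f\_\beta$ stated above (unit-variance normalization, $a=(1-\sqrt{\beta})^2$, $b=(1+\sqrt{\beta})^2$) numerically and compares the result with the standard closed form $1 - F(\gamma,\beta)/(4\beta\gamma)$. The function names are illustrative, and the check does not reproduce the particular scaling constants used in (A18).

```python
import math


def F(x, z):
    """F(x, z) as defined after (A18)."""
    return (math.sqrt(x * (1 + math.sqrt(z)) ** 2 + 1)
            - math.sqrt(x * (1 - math.sqrt(z)) ** 2 + 1)) ** 2


def eta_closed_form(gamma, beta):
    """Standard closed form of E{1 / (1 + gamma * lambda)} under the Marchenko-Pastur law."""
    return 1 - F(gamma, beta) / (4 * beta * gamma)


def eta_numeric(gamma, beta, n=4000):
    """Integrate 1 / (1 + gamma * x) against f_beta(x) (unit-variance normalization)."""
    a = (1 - math.sqrt(beta)) ** 2
    b = (1 + math.sqrt(beta)) ** 2
    centre, radius = (a + b) / 2, (b - a) / 2
    # Point mass (1 - 1/beta)^+ at x = 0, where the integrand 1 / (1 + gamma * x) equals 1.
    total = max(0.0, 1 - 1 / beta)
    # x = centre + radius * cos(theta) removes the square-root edges of the density,
    # so a midpoint rule over theta in (0, pi) converges quickly.
    step = math.pi / n
    for k in range(n):
        theta = (k + 0.5) * step
        x = centre + radius * math.cos(theta)
        density = (radius * math.sin(theta)) ** 2 / (2 * math.pi * beta * x)  # f_beta(x) |dx/dtheta|
        total += density / (1 + gamma * x) * step
    return total


for beta in (0.5, 1.0, 2.0):
    for gamma in (0.1, 1.0, 10.0):
        print(f"beta={beta}, gamma={gamma}: numeric={eta_numeric(gamma, beta):.6f}, "
              f"closed form={eta_closed_form(gamma, beta):.6f}")
```

The change of variables is a design choice: it turns the edge singularities of the density into a smooth integrand, so a plain midpoint rule is sufficient.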

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Multiple-Symbol Non-Coherent Detection for Differential QAM Modulation in Uplink Massive MIMO Systems**

#### **Hieu Trong Dao and Sunghwan Kim \***

School of Electrical Engineering, University of Ulsan, 93 Daehak-ro, Nam-gu, Ulsan 44610, Korea; hieu.dtvt10@gmail.com

**\*** Correspondence: sungkim@ulsan.ac.kr; Tel.: +82-52-259-1401

Received: 21 May 2019; Accepted: 17 June 2019; Published: 20 June 2019

**Abstract:** In this paper, we propose a novel multiple-symbol detector based on the maximum likelihood metric for differential quadrature amplitude modulation in massive multiple-input multiple-output (MIMO) systems. While current research on differential modulation in massive MIMO has focused on two consecutive symbols, our proposed detector operates on blocks of two or more symbols. Moreover, we derive a new distance based on the proposed detector. To encode and decode data, we apply the existing look-up table algorithm, known as the optimum encoding algorithm for differential modulation, with the proposed distance. Simulation results show an improvement in bit-error-rate performance, since the proposed detector and distance adapt to the channel statistics.

**Keywords:** 5G wireless networks; massive MIMO; non-coherent detection; QAM

#### **1. Introduction**

The massive multiple-input multiple-output (MIMO) transmission technique has gained considerable attention in recent years [1–13], since it can achieve significant improvements in energy and spectral efficiency with simple signal processing [1–4]. Massive MIMO systems typically operate in time division duplex (TDD) mode, in which users must synchronously send mutually orthogonal pilot signals to the base station (BS) so that the BS can estimate the channels. Because signal processing relies on these estimated channels [5–13], pilot signals occupy a significant part of the total coherence interval, which decreases the spectral efficiency. In addition, when the number of users is large, the orthogonal pilot set has to be reused in every cell, which leads to pilot contamination, a well-known performance bottleneck in massive MIMO systems.

The authors of [8] investigated power allocation to improve the spectral efficiency; however, this requires a large exchange of information over the backhaul between BSs or between BSs and users. Moreover, the power optimization algorithm is quite complex. Several semi-blind and blind channel estimation methods have also been proposed for uplink massive MIMO. In [9], the authors proposed an eigenvalue decomposition-based method to blindly estimate the uplink channel from the data signal. However, they assumed that the number of antennas was so large that the channel vectors become mutually orthogonal. The authors of [10] derived a new channel estimator based on subspace projection; however, this estimator relies heavily on the eigenvalues of the channel matrix. Interestingly, the authors of [11] proposed an energy detection scheme in which data symbols can be detected without estimated channels, but the scheme requires designing a unique signal constellation for each user in the system. Another promising technique that does not require estimated channels is differential modulation, but it has received little attention in massive MIMO research until now. In [12], differential quadrature amplitude modulation (DQAM) was proposed for massive MIMO systems. The modulation scheme in [12] was based on the asymptotic behavior of the channel as the number of BS antennas goes to infinity; however, the authors of [12] did not provide a detector for a finite number of BS antennas. The authors of [13] generalized the QAM detector in [12] and proposed a new detector and non-coherent distance that perform better when the number of BS antennas is not very large. The differential encoding in [13] was performed via the look-up table algorithm of [14,15], which is known as the optimum encoding algorithm for differential modulation. Moreover, both [12] and [13] can only detect two consecutive symbols at a time. Recently, the authors of [16] developed a differential detector based on multiple-symbol differential detection (MSDD) and the generalized likelihood ratio test (GLRT) criterion. However, [16] only considers M-ary phase shift keying (M-ary PSK) modulation.

In this paper, we propose a novel multiple-symbol detector for DQAM based on the maximum likelihood metric, which can detect two or more symbols at a time and adapts to the channel conditions. In addition, we propose a novel distance that can be used with the look-up table algorithm in [14,15] to encode and decode DQAM data. Because the proposed scheme follows changes in the channel statistics, whereas the schemes in [12,13] are fixed, it adapts better to changes in the environment and shows significantly better performance than the previous works [12,13].

#### **2. System Model and Previous Works**

Similar to [12,13], we consider an uplink massive MIMO system consisting of a single-antenna user and a base station equipped with a large number of antennas $M$ ($M \gg 1$). We consider the TDD mode and the block fading model, which are widely used in research on massive MIMO systems [1,3,5,7,8]; the channel remains unchanged within one coherence interval $T$. The received signal vector at the $m$th BS antenna is modeled as [1]

$$\mathbf{y}\_m = \sqrt{\rho}\, h\_m \mathbf{x} + \mathbf{n}\_{m}, \tag{1}$$

where $\rho$ denotes the average signal-to-noise ratio (SNR); $\mathbf{x} = [x\_t, x\_{t+1}, \ldots, x\_{t+L-1}]^T$ is the transmit signal vector of length $L$ ($L \leq T$) with $E[\|\mathbf{x}\|^2] = L$, whose elements are taken from the conventional QAM constellation shown in Figure 1a; $\mathbf{n}\_m$ is the additive white Gaussian noise vector at the $m$th BS antenna, whose entries follow $\mathcal{CN}(0, 1)$; and $h\_m \sim \mathcal{CN}(\mu\_h, \sigma\_h^2)$ is the channel coefficient. An example of the block fading model and the signal vector is illustrated in Figure 1b.

**Figure 1.** Conventional 16-QAM constellation and illustration of block fading. (**a**) 16-QAM constellation; (**b**) an illustration of block fading length *T* = 7 and signal vector length *L* = 3.

For simplicity, we normalize the channel so that $\mu\_h^2 + \sigma\_h^2 = 1$. Since Rayleigh and Rician fading models are widely used to evaluate the performance of both massive and conventional MIMO systems [17,18], we focus on these two models with $\mu\_h = \sqrt{\frac{K\_r}{1+K\_r}}$ and $\sigma\_h = \sqrt{\frac{1}{1+K\_r}}$ [12]. Thus, the channel vector from a user to the BS can be modeled as

$$\mathbf{h} = \mu\_h \mathbf{h}\_{\text{LOS}} + \sigma\_h \mathbf{h}\_{\text{NLOS}}, \tag{2}$$

where $K\_r$ is the Rician factor; in the special case $K\_r = 0$, the channel becomes a Rayleigh fading channel. Additionally, $\mathbf{h}\_{\text{LOS}} = [1, e^{-j\pi \sin(\theta)}, \ldots, e^{-j\pi(M-1)\sin(\theta)}]^T \in \mathbb{C}^{M \times 1}$ is the line-of-sight (LOS) component for the arrival angle $\theta$ when the antenna spacing is half a wavelength, and $\mathbf{h}\_{\text{NLOS}} \in \mathbb{C}^{M \times 1}$ denotes the non-line-of-sight (NLOS) component, whose elements are i.i.d. Gaussian variables with zero mean and unit variance.
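To make the channel model concrete, the following Python sketch draws one realization of Equation (2). It assumes that the NLOS entries are circularly symmetric complex Gaussian, $\mathcal{CN}(0,1)$, and that antenna indices start at 0 so that the first LOS entry equals 1; the function names are illustrative and not taken from the paper.

```python
import cmath
import math
import random


def rician_params(k_r):
    """(mu_h, sigma_h) with mu_h**2 + sigma_h**2 = 1, as in the text."""
    return math.sqrt(k_r / (1 + k_r)), math.sqrt(1 / (1 + k_r))


def los_vector(m_antennas, theta):
    """h_LOS for half-wavelength spacing and arrival angle theta (radians)."""
    return [cmath.exp(-1j * math.pi * m * math.sin(theta)) for m in range(m_antennas)]


def channel_vector(m_antennas, k_r, theta, rng):
    """One draw of h = mu_h * h_LOS + sigma_h * h_NLOS, Equation (2)."""
    mu_h, sigma_h = rician_params(k_r)
    # Assumption: h_NLOS entries are CN(0, 1), i.e. real and imaginary parts each have variance 1/2.
    h_nlos = [complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
              for _ in range(m_antennas)]
    return [mu_h * a + sigma_h * b for a, b in zip(los_vector(m_antennas, theta), h_nlos)]


rng = random.Random(0)
h = channel_vector(128, k_r=1.0, theta=math.radians(30), rng=rng)
print(len(h), sum(abs(v) ** 2 for v in h) / len(h))  # per-antenna power, close to 1 on average
```

Setting `k_r=0.0` gives `mu_h = 0`, which reproduces the Rayleigh special case mentioned above.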

In [12,13], the authors considered two consecutive instants $t$ and $t-1$ with channel vectors $\mathbf{h}\_t$ and $\mathbf{h}\_{t-1}$, and assumed $\mathbf{h}\_t \approx \mathbf{h}\_{t-1}$. The received signal vector at the $t$th instant is given as

$$\mathbf{y}\_t = \sqrt{\rho} \mathbf{h}\_t \mathbf{x}\_t + \mathbf{n}\_t, \tag{3}$$

where $x\_t$ is taken from a 16-DQAM constellation based on [19]. For a very large number of BS antennas $M$, they have

$$\begin{aligned} \lim\_{M \to \infty} \frac{1}{M} \mathbf{h}\_t^H \mathbf{h}\_{t-1} &= 1, \lim\_{M \to \infty} \frac{1}{M} \mathbf{n}\_t^H \mathbf{h}\_{t-1} = 0, \\\lim\_{M \to \infty} \frac{1}{M} \mathbf{h}\_t^H \mathbf{n}\_{t-1} &= 0, \lim\_{M \to \infty} \frac{1}{M} \mathbf{n}\_t^H \mathbf{n}\_{t-1} = 0, \end{aligned} \tag{4}$$

Consequently, the signal symbol at the $t$th instant can be detected as

$$\sigma\_t = \frac{1}{M} \mathbf{y}\_t^T \mathbf{y}\_{t-1}^\* = \rho \mathbf{x}\_t \mathbf{x}\_{t-1}^\* \text{ for very large } M. \tag{5}$$

in which $x\_t x\_{t-1}^\*$ can be mapped back to the information symbol by the encoding rule of [19]. However, the authors of [12] did not propose a detector for the case in which the number of BS antennas $M$ is finite. The authors of [13] generalized the detector in [12] as ([13], Equation (6)), which can be applied for any value of $M$. They then proposed a new two-consecutive-symbol detector based on the conditional probability and a new non-coherent distance, given in ([13], Equations (9) and (10)). The new distance in [13] is used by the look-up table algorithm of [14,15] for differential encoding.
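The asymptotic argument of Equations (4) and (5) can be checked numerically. The Python sketch below assumes a Rayleigh channel that stays constant over the two instants, unit-average-energy 16-QAM points as an illustrative stand-in for the 16-DQAM constellation of [19], and hypothetical helper names; it only shows how the statistic of Equation (5) approaches $\rho x\_t x\_{t-1}^\*$ as $M$ grows.

```python
import math
import random

# Unit-average-energy 16-QAM points (illustrative stand-in for the 16-DQAM constellation of [19]).
QAM16 = [complex(i, q) / math.sqrt(10) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]


def cn(rng):
    """One CN(0, 1) sample."""
    return complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))


def simulate_pair(x_t, x_prev, m_antennas, rho, rng):
    """Received vectors at instants t-1 and t, Equation (3), with h_t = h_{t-1} (Rayleigh)."""
    h = [cn(rng) for _ in range(m_antennas)]
    y_prev = [math.sqrt(rho) * hm * x_prev + cn(rng) for hm in h]
    y_t = [math.sqrt(rho) * hm * x_t + cn(rng) for hm in h]
    return y_t, y_prev


def decision_statistic(y_t, y_prev):
    """(1/M) y_t^T y_{t-1}^*, the statistic of Equation (5)."""
    return sum(a * b.conjugate() for a, b in zip(y_t, y_prev)) / len(y_t)


rng = random.Random(7)
rho = 10.0
x_prev, x_t = QAM16[5], QAM16[10]
for m in (16, 128, 20000):
    y_t, y_prev = simulate_pair(x_t, x_prev, m, rho, rng)
    print(m, decision_statistic(y_t, y_prev), "target:", rho * x_t * x_prev.conjugate())
```

For small $M$ the residual channel and noise terms in Equation (4) remain visible, which is the regime in which [12] shows an error floor.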

In particular, the authors of [14] proved that any differential encoding technique can be equivalently transformed into differential encoding via a look-up table. The algorithm in [14,15] creates such a look-up table for encoding and decoding 16-DQAM signals; a brief explanation of the look-up table is as follows. Readers should refer to [14,15] for the details of the algorithm.


Some properties of the look-up table are listed below.


Moreover, this look-up table can be optimized using the algorithms proposed in [14,15]. In this paper, we propose a new non-coherent detector and a new non-coherent distance, and apply the algorithm in [14,15] to generate the look-up table. We compare the performance of the proposed detector and distance with the existing detectors and distances proposed in [12,13]. Owing to space limitations, we omit the details of the look-up table algorithm and refer interested readers to [14,15].
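Because the table details are deferred to [14,15], the following Python sketch illustrates only the underlying idea with a simpler, well-known case: differential QPSK (DQPSK) rewritten as a look-up table in which each group collects all length-2 vectors that carry the same information symbol. The constellation, the group construction, and the function names are illustrative assumptions; this is not the optimized 16-DQAM table of [14,15].

```python
import cmath
import math

# Illustrative stand-in: DQPSK, not the optimized 16-DQAM table of [14,15].
QPSK = [cmath.exp(1j * math.pi * (2 * k + 1) / 4) for k in range(4)]


def build_table():
    """Group s collects every length-2 vector (x, x * j**s); 4 groups x 4 vectors cover all 16 pairs."""
    return {s: [(x, x * 1j ** s) for x in QPSK] for s in range(4)}


def encode(symbols, x0=QPSK[0]):
    """Differential encoding by table look-up: pick the vector of group s that starts with the previous symbol."""
    table = build_table()
    out = [x0]
    for s in symbols:
        prev = out[-1]
        out.append(next(b for a, b in table[s] if abs(a - prev) < 1e-9))
    return out


def decode_pair(x_prev, x_t):
    """Non-coherent decoding: the phase of x_t * conj(x_prev) identifies the group."""
    return round(cmath.phase(x_t * x_prev.conjugate()) / (math.pi / 2)) % 4


sent = encode([0, 1, 2, 3, 3, 1])
print([decode_pair(a, b) for a, b in zip(sent, sent[1:])])  # [0, 1, 2, 3, 3, 1]
```

In this toy case the groups are fixed by the phase rotation; the algorithm of [14,15] instead chooses the grouping of 16-QAM vector pairs to maximize a non-coherent distance.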

#### **3. New Differential Detector and Non-Coherent Distance**

Considering the received signal at the $m$th BS antenna in Equation (1), the conditional probability density of the received signal vector $\mathbf{y}\_m$, given the transmitted signal vector $\mathbf{x}$, is

$$p(\mathbf{y}\_m|\mathbf{x}) = \frac{1}{\pi^{L} \det(R\_y)} \exp\left\{-(\mathbf{y}\_m - \bar{\mathbf{y}}\_m)^{H} R\_y^{-1} (\mathbf{y}\_m - \bar{\mathbf{y}}\_m)\right\} \tag{6}$$

where $\bar{\mathbf{y}}\_m$ is the mean of $\mathbf{y}\_m$, given as

$$\bar{\mathbf{y}}\_m = E\{\mathbf{y}\_m\} = \sqrt{\rho}\, \mu\_h \mathbf{x}, \tag{7}$$

where $\det(R\_y)$ is the determinant of $R\_y$, and $R\_y$ is the covariance matrix of $\mathbf{y}\_m$, which can be calculated as

$$\begin{split} R\_{y} &= E\left\{ (\mathbf{y}\_{m} - \bar{\mathbf{y}}\_{m})(\mathbf{y}\_{m} - \bar{\mathbf{y}}\_{m})^{H} \right\} \\ &= E\left\{ \rho |h\_{m} - \mu\_{h}|^{2} \mathbf{x} \mathbf{x}^{H} + \sqrt{\rho} (h\_{m} - \mu\_{h}) \mathbf{x} \mathbf{n}\_{m}^{H} \\ &\quad + \sqrt{\rho} (h\_{m} - \mu\_{h})^{\*} \mathbf{n}\_{m} \mathbf{x}^{H} + \mathbf{n}\_{m} \mathbf{n}\_{m}^{H} \right\} = \rho \sigma\_{h}^{2} \mathbf{x} \mathbf{x}^{H} + I\_{L}. \end{split} \tag{8}$$
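Because $R\_y$ in Equation (8) is a rank-one update of the identity, its inverse and determinant have closed forms by the Sherman–Morrison formula and the matrix determinant lemma. This step is not part of the original derivation; it is added only to show that the quadratic form and the log-determinant in the detection metric can be evaluated without a general $L \times L$ matrix inversion:

$$R\_y^{-1} = I\_L - \frac{\rho\sigma\_h^2\, \mathbf{x}\mathbf{x}^H}{1 + \rho\sigma\_h^2 \|\mathbf{x}\|^2}, \qquad \det(R\_y) = 1 + \rho\sigma\_h^2 \|\mathbf{x}\|^2,$$

so that, for any vector $\mathbf{u}$,

$$\mathbf{u}^H R\_y^{-1} \mathbf{u} = \|\mathbf{u}\|^2 - \frac{\rho\sigma\_h^2 \left|\mathbf{x}^H \mathbf{u}\right|^2}{1 + \rho\sigma\_h^2 \|\mathbf{x}\|^2}.$$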

Assuming that the received vectors at different antennas are conditionally independent given $\mathbf{x}$, the proposed detector maximizes the joint likelihood of the received signal vectors $\mathbf{y}\_m$ at all BS antennas $1 \leq m \leq M$, given the transmitted signal vector $\mathbf{x}$. The estimated signal vector $\hat{\mathbf{x}}$ is therefore

$$\hat{\mathbf{x}} = \arg\max\_{\mathbf{x}\in\mathcal{X}} \prod\_{m=1}^{M} p(\mathbf{y}\_m|\mathbf{x}),\tag{9}$$

where $\mathcal{X}$ denotes the set of all possible transmitted signal vectors $\mathbf{x}$. Since the natural logarithm is monotonically increasing, maximizing this product is equivalent to maximizing its logarithm, $\sum\_{m=1}^{M} \ln p(\mathbf{y}\_m|\mathbf{x})$. Dropping constant terms, the proposed detector is given as

$$\hat{\mathbf{x}} = \arg\max\_{\mathbf{x}\in\mathcal{X}} \left\{ \sum\_{m=1}^{M} \left[ - \left( \mathbf{y}\_{m} - \bar{\mathbf{y}}\_{m} \right)^{H} R\_{y}^{-1} \left( \mathbf{y}\_{m} - \bar{\mathbf{y}}\_{m} \right) - \ln \left( \det(R\_{y}) \right) \right] \right\},\tag{10}$$
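A brute-force implementation makes the metric of Equation (10) concrete. The Python sketch below assumes unit-average-energy 16-QAM, a channel $h\_m \sim \mathcal{CN}(\mu\_h, \sigma\_h^2)$ as stated for Equation (1), and reads the $\ln\det(R\_y)$ term as appearing once per antenna (the joint complex-Gaussian log-likelihood over the $M$ antennas). It evaluates $R\_y^{-1}$ and $\det(R\_y)$ through the rank-one identities (Sherman–Morrison and the matrix determinant lemma) rather than a general inverse. Function names are hypothetical, and exhaustive search over all $16^L$ candidates is practical only for small $L$.

```python
import itertools
import math
import random

QAM16 = [complex(i, q) / math.sqrt(10) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]


def cn(rng, mean=0j, var=1.0):
    """One CN(mean, var) sample."""
    s = math.sqrt(var / 2)
    return mean + complex(rng.gauss(0, s), rng.gauss(0, s))


def metric(x, ys, rho, mu_h, sigma_h):
    """Sum over antennas of -(y_m - ybar_m)^H R_y^{-1} (y_m - ybar_m) - ln det(R_y)."""
    g = rho * sigma_h ** 2
    energy = sum(abs(v) ** 2 for v in x)
    log_det = math.log(1 + g * energy)  # matrix determinant lemma
    total = 0.0
    for y in ys:
        u = [a - math.sqrt(rho) * mu_h * b for a, b in zip(y, x)]  # y_m - ybar_m, Equation (7)
        proj = sum(b.conjugate() * a for a, b in zip(u, x))        # x^H u
        quad = sum(abs(a) ** 2 for a in u) - g * abs(proj) ** 2 / (1 + g * energy)  # Sherman-Morrison
        total += -quad - log_det
    return total


def detect(ys, length, rho, mu_h, sigma_h):
    """Exhaustive search of Equation (10) over all 16**length candidate codewords."""
    return max(itertools.product(QAM16, repeat=length),
               key=lambda x: metric(x, ys, rho, mu_h, sigma_h))


# Usage: K_r = 10, rho = 10 (10 dB), M = 128 antennas, L = 2.
rng = random.Random(1)
k_r, rho, m_antennas = 10.0, 10.0, 128
mu_h, sigma_h = math.sqrt(k_r / (1 + k_r)), math.sqrt(1 / (1 + k_r))
x_true = (QAM16[6], QAM16[13])
ys = []
for _ in range(m_antennas):
    h_m = cn(rng, mean=complex(mu_h), var=sigma_h ** 2)  # h_m ~ CN(mu_h, sigma_h^2)
    ys.append([math.sqrt(rho) * h_m * v + cn(rng) for v in x_true])
print(detect(ys, 2, rho, mu_h, sigma_h) == x_true)
```

With `mu_h = 0` (Rayleigh fading) the metric depends on a candidate only through $\mathbf{x}\mathbf{x}^H$, so multiplying a candidate by $j$ leaves its score unchanged.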

In particular, when the channel is Rayleigh fading ($\mu\_h = 0$), we have $\bar{\mathbf{y}}\_m = 0$, and the proposed detector becomes

$$\hat{\mathbf{x}} = \arg\max\_{\mathbf{x} \in \mathcal{X},\, x\_t \in Q\_{1}} \left\{ \sum\_{m=1}^{M} \left[ -\mathbf{y}\_{m}^{H} R\_{y}^{-1} \mathbf{y}\_{m} - \ln(\det(R\_{y})) \right] \right\}, \tag{11}$$

where $Q\_1$ is the first quadrant of the constellation. This reduces the number of decision values that need to be calculated by a factor of four, from $N^L$ to $N^L/4$ for $N$-QAM. The reduction occurs because, when $\bar{\mathbf{y}}\_m = 0$, the metric in Equation (11) depends on $\mathbf{x}$ only through $\mathbf{x}\mathbf{x}^H$, which is unchanged when $\mathbf{x}$ is multiplied by $j$, $-1$, or $-j$. Since the square QAM constellation is invariant to these rotations, every codeword $\mathbf{x} = [x\_t, x\_{t+1}, \ldots, x\_{t+L-1}]^T$ whose first symbol $x\_t$ lies in the first quadrant has three rotated counterparts, with first symbols in the other three quadrants, that yield the same decision value. In other words, for Rayleigh fading we only need to calculate the decision values of codewords whose first symbol $x\_t$ lies in the first quadrant of the constellation. To obtain the proposed non-coherent distance, we define the distance from $\mathbf{x}\_1$ to $\mathbf{x}\_2$ and the distance from $\mathbf{x}\_2$ to $\mathbf{x}\_1$ in Equations (12) and (13), based on the proposed detector in Equation (10), as follows:

$$d(\mathbf{x}\_1 \rightarrow \mathbf{x}\_2) = \left|\left[-\rho(c\_h \mathbf{x}\_1)^H R\_{\mathbf{x}\_1}^{-1}(c\_h \mathbf{x}\_1) - \ln(\det(R\_{\mathbf{x}\_1}))\right] - \left[-\rho(c\_h \mathbf{x}\_1)^H R\_{\mathbf{x}\_2}^{-1}(c\_h \mathbf{x}\_1) - \ln(\det(R\_{\mathbf{x}\_2}))\right]\right|;\tag{12}$$

$$d(\mathbf{x}\_2 \rightarrow \mathbf{x}\_1) = \left|\left[-\rho(c\_h \mathbf{x}\_2)^H R\_{\mathbf{x}\_2}^{-1}(c\_h \mathbf{x}\_2) - \ln(\det(R\_{\mathbf{x}\_2}))\right] - \left[-\rho(c\_h \mathbf{x}\_2)^H R\_{\mathbf{x}\_1}^{-1}(c\_h \mathbf{x}\_2) - \ln(\det(R\_{\mathbf{x}\_1}))\right]\right|.\tag{13}$$

$R\_{\mathbf{x}\_1}$, $R\_{\mathbf{x}\_2}$, and $c\_h$ in Equations (12) and (13) are calculated as

$$R\_{\mathbf{x}\_1} = \rho \sigma\_h^2 \mathbf{x}\_1 \mathbf{x}\_1^H + I\_L;\\ R\_{\mathbf{x}\_2} = \rho \sigma\_h^2 \mathbf{x}\_2 \mathbf{x}\_2^H + I\_L;\\ c\_h = \sigma\_h + \mu\_h;\tag{14}$$

Equation (12) is the non-coherent distance from $\mathbf{x}\_1$ to $\mathbf{x}\_2$; it assumes that the codeword $\mathbf{x}\_1$ was sent but the detector wrongly decided that $\mathbf{x}\_2$ was sent. Conversely, Equation (13) is the non-coherent distance from $\mathbf{x}\_2$ to $\mathbf{x}\_1$, in which $\mathbf{x}\_2$ was actually sent but the detector wrongly decided that $\mathbf{x}\_1$ was sent. In other words, Equations (12) and (13) measure the separation between the two codewords $\mathbf{x}\_1$ and $\mathbf{x}\_2$ as seen by the likelihood-based detector: the larger their values, the smaller the chance that the detector confuses $\mathbf{x}\_1$ and $\mathbf{x}\_2$. Finally, the proposed non-coherent distance is calculated as

$$d(\mathbf{x}\_1, \mathbf{x}\_2) = \min(d(\mathbf{x}\_1 \to \mathbf{x}\_2), d(\mathbf{x}\_2 \to \mathbf{x}\_1)). \tag{15}$$

In the differential encoding part, we apply the look-up table algorithm for DQAM as in [14,15] by using the proposed non-coherent distance in Equation (15).
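The distance computations of Equations (12)–(15) can be sketched directly in Python. This is a minimal illustration, assuming unit-average-energy 16-QAM and evaluating $R\_{\mathbf{x}}^{-1}$ and $\det(R\_{\mathbf{x}})$ through the rank-one identities for $R\_{\mathbf{x}} = \rho\sigma\_h^2\mathbf{x}\mathbf{x}^H + I\_L$; the function names are hypothetical, and the look-up table construction of [14,15] that consumes these distances is not reproduced.

```python
import math

QAM16 = [complex(i, q) / math.sqrt(10) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]


def bracket_term(x_sent, x_hyp, rho, mu_h, sigma_h):
    """-rho (c_h x_sent)^H R_{x_hyp}^{-1} (c_h x_sent) - ln det(R_{x_hyp}), R_x as in Equation (14)."""
    c_h = sigma_h + mu_h
    g = rho * sigma_h ** 2
    energy = sum(abs(v) ** 2 for v in x_hyp)
    u = [c_h * v for v in x_sent]
    proj = sum(b.conjugate() * a for a, b in zip(u, x_hyp))  # x_hyp^H (c_h x_sent)
    quad = sum(abs(a) ** 2 for a in u) - g * abs(proj) ** 2 / (1 + g * energy)
    return -rho * quad - math.log(1 + g * energy)


def directed_distance(x1, x2, rho, mu_h, sigma_h):
    """d(x1 -> x2), Equation (12); Equation (13) is the same call with the arguments swapped."""
    return abs(bracket_term(x1, x1, rho, mu_h, sigma_h) - bracket_term(x1, x2, rho, mu_h, sigma_h))


def noncoherent_distance(x1, x2, rho, mu_h, sigma_h):
    """d(x1, x2), Equation (15)."""
    return min(directed_distance(x1, x2, rho, mu_h, sigma_h),
               directed_distance(x2, x1, rho, mu_h, sigma_h))


# Usage with K_r = 1 and an average SNR of -4 dB (the setting quoted for Table 1).
k_r, rho = 1.0, 10 ** (-4 / 10)
mu_h, sigma_h = math.sqrt(k_r / (1 + k_r)), math.sqrt(1 / (1 + k_r))
x1, x2 = (QAM16[5], QAM16[5]), (QAM16[5], QAM16[15])
print(noncoherent_distance(x1, x2, rho, mu_h, sigma_h))
```

Because $R\_{\mathbf{x}}$ depends on $\mathbf{x}$ only through $\mathbf{x}\mathbf{x}^H$, this sketch returns zero distance (up to rounding) between $\mathbf{x}$ and $j\mathbf{x}$.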

The main contributions of the above steps and equations are summarized as follows.


#### **4. Numerical Results**

In simulations, we use the conventional 16-QAM constellation and apply the look-up table algorithm in [14,15] to differentially encode the information. In particular, one 4-bit information symbol is encoded into two consecutive 16-QAM points; thus, the non-coherent distance in Equation (15) is calculated for transmitted signal vectors $\mathbf{x}\_1$, $\mathbf{x}\_2$ of length 2. The resulting look-up table for 16-DQAM has 16 rows representing 16 different groups; each group contains 16 different vectors $\mathbf{x}$, and all transmitted signal vectors in the same group correspond to the same information symbol. The look-up table for the proposed 16-DQAM scheme with a Rician fading channel with $K\_r = 1$ and an average SNR of −4 dB is given in Table 1 as an example. Note that in [14,15], after the look-up table is generated, one more step maps the information bits to each group. This step slightly improves the bit error rate, since groups with a small non-coherent distance are mapped to information bit symbols that differ in only a few bits. However, since this simulation focuses on comparing the proposed detector and distance with those in [12,13], we skip this step and assign the information bit symbols sequentially from the first group to the last group of the look-up table.


**Table 1.** Look-up table for 16-DQAM using proposed distance, *L* = 2.

Figure 2 shows the simulation results of the proposed 16-DQAM scheme with estimated signal vector lengths $L = 2$ and 3, together with the 16-DQAM schemes in [12,13], for $M = 128$ and 500 BS antennas and a coherence length $T = 7$. Since the authors of [12] did not provide a detector for finite $M$, we assume that the 16-DQAM scheme in [12] uses the generalized detector of ([13], Equation (6)) and the corresponding non-coherent distance in [13]. As previously shown in [12,13], the 16-DQAM scheme of [12] exhibits an error floor when $M$ is not very large. The proposed scheme significantly outperforms the schemes in [12,13] for both $M = 128$ and 500.

**Figure 2.** Performance comparison between the proposed 16-DQAM scheme and the previous works of References [12,13] under Rayleigh fading.

Since the schemes in [12,13] can only detect two consecutive symbols at a time, we compare them with the proposed scheme at the same signal vector length, $L = 2$. With the same channel condition and signal vector length, the proposed scheme outperforms the other schemes by nearly 3 dB at BER = 10<sup>−4</sup> for $M = 128$. Because this gain is achieved with a moderate number of BS antennas ($M = 128$), the proposed scheme shows strong potential for deployment in real systems. When $M$ increases to 500, the BER performance improves even further, by nearly 5 dB at BER = 10<sup>−4</sup>, which also shows the advantage of massive MIMO with a very large number of BS antennas.

Notably, the performance of the proposed scheme improves significantly as the length $L$ of the estimated signal vector increases, regardless of the value of $M$: the gain is nearly 1.5 dB at BER = 10<sup>−5</sup> when $L$ increases from 2 to 3. When $M = 500$, the scheme of [13] performs better than that of [12] only at low BER (≤ 10<sup>−4</sup>), whereas the proposed scheme is remarkably better than both [12,13], with a gain of approximately 3 dB at BER = 10<sup>−5</sup> over [13].

Figure 3 shows the simulation results for the aforementioned schemes under Rician fading with $T = 7$, $L = 2$, $M = 128$, and Rician factors $K\_r = 0$, 1, and 10. Note that when $K\_r = 0$, the Rician channel becomes a Rayleigh channel. The scheme of [12] still exhibits an error floor when $K\_r = 1$. However, when the LOS component of the Rician channel becomes stronger ($K\_r = 10$), the error floor appears to vanish, and the performance of [12] improves much more than that of [13], with a gap of nearly 1 dB at BER = 10<sup>−5</sup>. The performance of the scheme of [13] is nearly the same for all values of $K\_r$, because the detector in [13] cancels out the channel coefficient between two consecutively received symbols. The proposed scheme performs best among the three schemes. Even with $K\_r = 0$ (i.e., Rayleigh fading), the proposed scheme outperforms the other schemes with $K\_r = 1$ or 10. When $K\_r$ increases from 0 to 1 and 10, the performance of the proposed scheme improves significantly, with gains of 5.5 dB and 9.5 dB, respectively. In summary, the proposed scheme performs much better in Rician channels than in Rayleigh channels.

**Figure 3.** Performance comparison between the proposed 16-DQAM scheme and the previous works of References [12,13] under Rician fading with different Rician factors.

#### **5. Conclusions**

In this paper, we proposed a new detector and non-coherent distance for differential QAM modulation in massive MIMO systems, and applied the well-known look-up table algorithm for DQAM encoding using the proposed non-coherent distance. The proposed detector can detect multiple symbols (≥ 2) at a time. Because the proposed scheme follows changes in the channel statistics, it adapts better to changes in the environment. Additionally, it can be applied to a wide class of channels with a moderate number of base station antennas. This paper focuses on single-cell massive MIMO systems. As future work, it would be attractive to investigate the performance of the proposed scheme, and ways to improve it, in multi-cell environments with interference between users in nearby cells.

**Author Contributions:** All authors discussed the contents of the manuscript and contributed to its presentation. H.T.D. designed and implemented the proposed scheme, analyzed the simulation results and wrote the paper under the supervision of S.K.

**Funding:** This research was funded by the Research Program through the National Research Foundation of Korea (NRF-2016R1D1A1B03934653, NRF-2019R1A2C1005920).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Throughput Enhancement in Downlink MU-MIMO Using Multiple Dimensions**

#### **Jong-Gyu Ha, Jae-Hyun Ro and Hyoung-Kyu Song** *∗*

Department of Information and Communication Engineering, uT Communication Research Institute, Sejong University, Gunja-dong 98, Gwangjin-gu, Seoul 05006, Korea

**\*** Correspondence: songhk@sejong.ac.kr; Tel.: +82-2-3408-3890

Received: 13 May 2019; Accepted: 4 July 2019; Published: 5 July 2019

**Abstract:** This paper focuses on throughput enhancement in the single-cell multi-user MIMO (MU-MIMO) downlink system model. For better quality of service, this paper proposes a scheme that increases system throughput and improves spectral efficiency. Specifically, signal transmission and detection schemes that use multiple dimensions are proposed. At the transmitter side, two dimensions (power and space) are adopted at the same time: to achieve multiple access (MA), the space domain is exploited by a block diagonalization (BD) precoding technique, and the power domain is exploited to transmit more data symbols. At the receiver, a signal detection structure corresponding to the transmitter is also proposed. Simulation results compare throughput performance under various conditions and show that the proposed scheme outperforms conventional schemes that use only one dimension. This paper thus shows that adopting multiple dimensions yields strong performance in MU-MIMO scenarios.

**Keywords:** multi-user MIMO; space division multiple access (SDMA); block diagonalization (BD); non-orthogonal multiple access (NOMA); broadcast channel

#### **1. Introduction**

To handle explosive data traffic in the future, studies will likely aim to increase channel capacity and data rate across the overall wireless communication system [1]. Multiple-input multiple-output (MIMO) has been studied in wireless systems because it offers dramatic gains in channel capacity [2]. Multiuser MIMO (MU-MIMO) has also been studied widely for its potential to improve the overall throughput [3–5]. In the downlink broadcast channel (BC), MU-MIMO is accomplished by multiuser beamforming that eliminates the multiuser interference (MUI) completely [6–9]. Many users can be served by one base station (BS) simultaneously, and the spectral efficiency can be increased. Therefore, the use of space-division multiple access (SDMA) in the downlink channel provides a considerable gain in system capacity. The sum capacity of the MU-MIMO broadcast channel is achieved by dirty paper coding (DPC); however, the critical drawback of DPC is its extremely high implementation complexity [10]. Another promising technique in an MU-MIMO system is block diagonalization (BD). In this paper, the proposed scheme considers BD as a generalization of channel inversion [11,12]. BD supports multiple data streams with low complexity and approaches the sum capacity of DPC when combined with user selection algorithms [13].

On the other hand, non-orthogonal multiple access (NOMA) is one of the most promising techniques for improving the overall spectral efficiency [14,15]. NOMA lets multiple users share the same resources by exploiting additional domains. Well-known NOMA schemes can be divided into power-domain and code-domain NOMA. Power-domain NOMA multiplexes users by allocating different power levels to them according to their channel conditions: the users' symbols are superposed, and the receivers perform successive interference cancellation (SIC) [16]. Furthermore, MIMO-NOMA is another popular technique to increase sum capacity. In MIMO-NOMA, signal processing techniques have been investigated along two main routes: single-cluster and multi-cluster MIMO-NOMA [17–19]. With MIMO-NOMA, the full benefits in system capacity and user fairness are achieved through well-chosen power allocation, user clustering, and beamforming [20,21]. Most of these papers consider user grouping and clustering. The proposed scheme, however, fundamentally does not consider user grouping or clustering, and this fundamental difference gives it additional advantages. In this paper, the proposed signal transmission and detection scheme uses superposition coding (SC) and SIC in conjunction with MU-MIMO to improve the system spectral efficiency. Within the SDMA scheme, multiple data symbols can be delivered to each user by SC and SIC. The power and space domains are fully exploited at the same time, and the overall system throughput can be improved in a single-cell MU-MIMO scenario. The main contributions of this paper are summarized as follows:


The remainder of this paper is organized as follows. Section 2 describes the problems of the existing schemes and explains the solution and rationale of the proposed scheme. Section 3 describes the overall system model and the proposed scheme, and also presents the performance of the proposed scheme. Section 4 presents simulation results and compares the proposed scheme with the conventional scheme under various conditions. Section 5 discusses some extensions related to the implementation of the proposed scheme. Finally, Section 6 concludes the paper.

#### **2. Motivation**

The novelty of the proposed scheme lies in exploiting multiple dimensions as fully as possible. Figure 1a,b show examples of resource usage in the NOMA and BD schemes, respectively, each of which uses only one dimension, and Figure 1c shows an example of resource usage for the proposed scheme. The existing techniques in Figure 1a,b cannot simply be combined, since each of them uses its own dimension for multiple access. In the proposed scheme, however, SC is used to separate each symbol in the power domain. Consequently, by changing how SC is applied, both the power and space domains can be utilized at the same time and performance can be improved. In addition, through this combination, the proposed scheme compensates for the shortcomings of existing schemes and offers several advantages. First, the overhead at the BS is reduced. A dynamic user scheduling and grouping strategy requires feedback information at the BS, whereas the proposed scheme can reduce the feedback overhead significantly because it does not need to consider intra-beam interference. Second, proper fairness among users is assured. Treating interference as noise at the users does not guarantee fairness among them; in the proposed scheme, a certain error probability is ensured for all users. Third, the complexity for each user is reduced. In existing NOMA schemes, a user must decode signals even if they are not its own; in the proposed scheme, all decoded symbols are the user's own data, which avoids the complexity of demodulating unnecessary data. As a result, the proposed scheme allows more flexibility in spreading user signals over multiple users.
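The power-domain separation described above relies on superposition coding at the transmitter and SIC at the receiver. The Python sketch below illustrates only that principle for a single received stream carrying two QPSK symbols with a fixed power split; the power fraction `alpha`, the small fixed perturbation standing in for noise, and the function names are assumptions for illustration, not the proposed scheme's power allocation or its BD-precoded signal model.

```python
import math

QPSK = [complex(i, q) / math.sqrt(2) for i in (-1, 1) for q in (-1, 1)]


def nearest(point, alphabet):
    """Minimum-distance hard decision."""
    return min(alphabet, key=lambda s: abs(point - s))


def superpose(s_high, s_low, alpha):
    """Superposition coding: a fraction alpha of the power to s_high, the rest to s_low."""
    return math.sqrt(alpha) * s_high + math.sqrt(1 - alpha) * s_low


def sic_detect(r, alpha):
    """Detect the high-power symbol, cancel it, then detect the low-power symbol."""
    s_high = nearest(r / math.sqrt(alpha), QPSK)
    residual = r - math.sqrt(alpha) * s_high
    s_low = nearest(residual / math.sqrt(1 - alpha), QPSK)
    return s_high, s_low


alpha = 0.8
for s1 in QPSK:
    for s2 in QPSK:
        r = superpose(s1, s2, alpha) + complex(0.02, -0.01)  # small fixed perturbation in place of noise
        assert sic_detect(r, alpha) == (s1, s2)
print("all 16 symbol pairs recovered")
```

The power split matters: the high-power symbol must dominate the low-power one so that the first hard decision is reliable before cancellation.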


**Figure 1.** Examples of resource usage in various dimensions (**a**) Example of resource usage for conventional NOMA; (**b**) Example of resource usage for conventional BD; (**c**) Example of resource usage for proposed scheme.

By adopting multiple dimensions, the same resources can be shared in each domain. In this paper, both the power and space domains are exploited. Figure 2 shows an example of additional dimension usage: four dimensions (frequency, time, power, and space) are used, and the same resources can be appropriately shared. Therefore, more data symbols can be transmitted to more users.

**Figure 2.** The spectral efficiency using multiple dimensions.

This paper considers a downlink single-cell MU-MIMO system, as shown in Figure 3. In the proposed scheme, multiple access is accomplished by SDMA, and the power domain is exploited to transmit more data symbols to each user. As a result, the proposed scheme has significant potential to improve spectral efficiency and provide better wireless services to many users. By using multiple dimensions, it also offers flexibility in various design aspects to meet the requirements and characteristics of the system.

**Figure 3.** The downlink single-cell MU-MIMO system.

#### **3. Proposed Scheme**

This section describes the proposed scheme, from data transmission to signal detection. With the proposed system, more data symbols can be transmitted and more users can be served simultaneously. First, the transmission model explains how two additional domains can be used to improve spectral efficiency. Second, the signal detection model explains how each signal is detected and how reliability can be satisfied. Third, the received SINR is analyzed and some considerations for the proposed scheme are presented. Finally, the sum throughput of the proposed scheme is compared with that of conventional schemes, showing that the proposed scheme achieves higher throughput.

#### *3.1. System Model*

This paper considers a downlink MU-MIMO broadcast system consisting of one BS and *K* users, as shown in Figure 4. The BS is equipped with *Nt* transmitting antennas and each user has *Nr* receiving antennas. The MIMO channel of each user is assumed to be flat fading, since a frequency-selective fading channel can easily be handled by orthogonal frequency division multiplexing (OFDM) modulation. The system model can be further extended to frequency-selective fading MIMO channels by considering all subcarriers. In this system model, the transmit signal for the *k*-th user can be written as follows,

$$\mathbf{x}\_k = \mathbf{W}\_k \mathbf{s}\_k. \tag{1}$$

The received signal at the *k*-th user is given by

$$\mathbf{y}\_k = \underbrace{\mathbf{H}\_k \mathbf{W}\_k \mathbf{s}\_k}\_{\text{desired signal}} + \underbrace{\sum\_{j=1, j \neq k}^{K} \mathbf{H}\_k \mathbf{W}\_j \mathbf{s}\_j}\_{\text{undesired signals}} + \mathbf{n}\_k, \quad k = 1, \dots, K,\tag{2}$$

where *k* and *j* are user indices, **H**<sub>*k*</sub> is the *Nr* × *Nt* channel matrix of user *k*, **W**<sub>*k*</sub> is the *Nt* × *Nr* precoding matrix for user *k*, **s**<sub>*k*</sub> is an *Nr* × 1 data symbol vector, and **x**<sub>*k*</sub> is an *Nt* × 1 precoded signal vector for the *k*-th user. **y**<sub>*k*</sub> is the received signal vector of the *k*-th user and **n**<sub>*k*</sub> is an *Nr* × 1 zero-mean additive white Gaussian noise (AWGN) vector with variance *σ*<sup>2</sup>. In Equation (2), the first term denotes the signal in the intended direction (desired user **s**<sub>*k*</sub>) and the second term denotes the multi-user interference (MUI) caused by the undesired signals (undesired users **s**<sub>*j*</sub>).

**Figure 4.** The downlink MU-MIMO broadcasting model.

#### *3.2. Data Transmission Model*

To exploit both the power and space domains, BD and SC are used. The overall transmission model is shown in Figure 5. First, SC is used to exploit the power domain: the transmit symbols are superposed with different powers into one signal, so the transmitter can transmit more data streams at the same time. In the existing NOMA scheme, SC and SIC are used to suppress the MUI by allocating different powers to different users. In contrast, the proposed scheme uses SC to superpose several symbols intended for the same user into one signal transmitted by the BS. In the proposed scheme, the transmit signal for the *i*-th receiving antenna can be written as follows,

$$s\_i = \sqrt{P\_1}\tilde{s}\_{i,1} + \sqrt{P\_2}\tilde{s}\_{i,2} + \dots + \sqrt{P\_N}\tilde{s}\_{i,N}, \tag{3}$$

where *i* is the receiving antenna index, *N* is the number of symbols in one superposed signal, and *s̃*<sub>*i*,*n*</sub> (*n* = 1, ..., *N*) are the symbols in one superposed signal. To make each symbol detectable, the proposed scheme allocates a different power to each symbol such that *P*<sub>1</sub> < *P*<sub>2</sub> < ··· < *P*<sub>*N*</sub>. Then, the transmit signals for the *k*-th user are defined as follows,

$$\mathbf{s}\_{k} = \begin{bmatrix} s\_{1}(k) \\ s\_{2}(k) \\ \vdots \\ s\_{N\_{r}}(k) \end{bmatrix} = \begin{bmatrix} \sqrt{P\_{1}}\tilde{s}\_{1,1}(k) + \sqrt{P\_{2}}\tilde{s}\_{1,2}(k) + \dots + \sqrt{P\_{N}}\tilde{s}\_{1,N}(k) \\ \sqrt{P\_{1}}\tilde{s}\_{2,1}(k) + \sqrt{P\_{2}}\tilde{s}\_{2,2}(k) + \dots + \sqrt{P\_{N}}\tilde{s}\_{2,N}(k) \\ \vdots \\ \sqrt{P\_{1}}\tilde{s}\_{N\_{r},1}(k) + \sqrt{P\_{2}}\tilde{s}\_{N\_{r},2}(k) + \dots + \sqrt{P\_{N}}\tilde{s}\_{N\_{r},N}(k) \end{bmatrix}. \tag{4}$$

**Figure 5.** The proposed signal transmission model.
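As a minimal sketch of Equation (3), the snippet below superposes two unit-power QPSK symbols with *P*<sub>1</sub> = 0.2 and *P*<sub>2</sub> = 0.8, which is the 8:2 split used in Section 4. The QPSK mapping and the helper name `superpose` are illustrative assumptions, not the authors' implementation. The snippet enumerates all symbol pairs and checks two things: the superposed points are distinct, and their average power equals *P*<sub>1</sub> + *P*<sub>2</sub>.

```python
import math

# Unit-power QPSK constellation (an illustrative assumption; any unit-power
# constellation could be used for each superposed symbol).
QPSK = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)]


def superpose(symbols, powers):
    """Eq. (3): s_i = sum_n sqrt(P_n) * s~_{i,n}, with P_1 < P_2 < ... < P_N."""
    if len(symbols) != len(powers):
        raise ValueError("one power level is required per symbol")
    return sum(math.sqrt(p) * s for p, s in zip(powers, symbols))


powers = [0.2, 0.8]  # P_1 (weak symbol) and P_2 (strong symbol), P_1 + P_2 = 1
points = [superpose([weak, strong], powers) for weak in QPSK for strong in QPSK]

# Round to suppress floating-point noise before counting distinct points
distinct = {(round(p.real, 9), round(p.imag, 9)) for p in points}
avg_power = sum(abs(p) ** 2 for p in points) / len(points)
print(len(distinct), round(avg_power, 6))  # 16 1.0
```

Each superposed point corresponds to a unique (weak, strong) pair. The receiver can therefore, in principle, recover both symbols from one superposed signal, and the SIC stage relies on this.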

To exploit the space domain, the precoding matrix **W** should be designed for SDMA. In the proposed scheme, BD beamforming is adopted to suppress the MUI, since BD performs well in terms of capacity and offers good flexibility. The objective of BD is to completely eliminate the MUI through the precoding matrix **W**. The precoding matrix design for MUI elimination can then be written as follows,

$$
\tilde{\mathbf{H}}\_k \mathbf{W}\_k = \mathbf{0}, \quad k = 1, \dots, K, \tag{5}
$$

where **H̃**<sub>*k*</sub> denotes the channel matrix of all users except user *k*,

$$\tilde{\mathbf{H}}\_k = \left[ \mathbf{H}\_1^T, \dots, \mathbf{H}\_{k-1}^T, \mathbf{H}\_{k+1}^T, \dots, \mathbf{H}\_K^T \right]^T. \tag{6}$$

The precoding matrix that eliminates the MUI is designed with the help of the singular value decomposition (SVD), which decomposes a matrix into matrices representing rotation and scaling. Applying the SVD, **H̃**<sub>*k*</sub> can be written as

$$\tilde{\mathbf{H}}\_k = \mathbf{U}\_k \mathbf{\Lambda}\_k \left[\mathbf{V}\_k^{(1)} \ \mathbf{V}\_k^{(0)}\right]^H, \tag{7}$$

where **Λ**<sub>*k*</sub> is the diagonal matrix whose diagonal elements are the singular values of **H̃**<sub>*k*</sub>, **V**<sub>*k*</sub><sup>(1)</sup> contains the right singular vectors corresponding to the non-zero singular values, and **V**<sub>*k*</sub><sup>(0)</sup> contains those corresponding to the zero singular values. **V**<sub>*k*</sub><sup>(0)</sup> therefore forms an orthonormal basis for the null space of **H̃**<sub>*k*</sub> and provides the required precoding matrix, i.e., **W**<sub>*k*</sub> = **V**<sub>*k*</sub><sup>(0)</sup>. As a result, the intended user's signal is projected onto the null space of the other users' channels, so the transmission satisfies the zero-interference constraint (i.e., **H̃**<sub>*k*</sub>**V**<sub>*k*</sub><sup>(0)</sup> = **0**, *k* = 1, ..., *K*).
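The null-space construction in Equations (5) to (7) can be sketched with NumPy as follows. This minimal illustration assumes i.i.d. Rayleigh user channels and the (4, 2, 2) configuration used later in Section 4. The function name `bd_precoders` is hypothetical. If the null space has more than *Nr* dimensions, the sketch keeps the first *Nr* basis vectors. That choice is a simplification and is not part of the described scheme.

```python
import numpy as np


def bd_precoders(H_list, tol=1e-10):
    """Return W_k spanning the null space of H~_k for every user (Eqs. (5)-(7))."""
    K = len(H_list)
    Nr, Nt = H_list[0].shape
    W = []
    for k in range(K):
        # Eq. (6): stack the channels of all users except user k
        H_tilde = np.vstack([H_list[j] for j in range(K) if j != k])
        # Eq. (7): right singular vectors beyond the rank span the null space
        _, sv, Vh = np.linalg.svd(H_tilde)  # full_matrices=True -> Vh is Nt x Nt
        rank = int(np.sum(sv > tol * sv.max()))
        V0 = Vh[rank:].conj().T  # Nt x (Nt - rank), orthonormal columns
        if V0.shape[1] < Nr:
            raise ValueError("BD needs Nt >= K * Nr for Nr streams per user")
        W.append(V0[:, :Nr])  # simplification: keep the first Nr null-space vectors
    return W


rng = np.random.default_rng(0)
Nt, Nr, K = 4, 2, 2
H = [(rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
     for _ in range(K)]
W = bd_precoders(H)

# Residual MUI: H_j W_k should vanish for every j != k (Eq. (5))
mui = max(np.abs(H[j] @ W[k]).max() for k in range(K) for j in range(K) if j != k)
H_eff = [H[k] @ W[k] for k in range(K)]  # Nr x Nr effective channels of Eq. (8)
print(mui < 1e-10)  # True
```

The residual MUI is at floating-point level. Each user is therefore left with a square effective channel **H**<sub>*k*</sub>**W**<sub>*k*</sub>, which is the point-to-point MIMO link assumed in the detection stage.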

The received signal of the *k*-th user after eliminating MUI is defined as follows,

$$\begin{aligned} \mathbf{y}\_k &= \mathbf{H}\_k \mathbf{W}\_k \mathbf{s}\_k + \mathbf{n}\_k = \mathbf{H}\_{eff,k} \mathbf{s}\_k + \mathbf{n}\_k \\ &= \mathbf{H}\_{eff,k} \begin{bmatrix} s\_1(k) \\ s\_2(k) \\ \vdots \\ s\_{N\_r}(k) \end{bmatrix} + \begin{bmatrix} n\_1(k) \\ n\_2(k) \\ \vdots \\ n\_{N\_r}(k) \end{bmatrix} \\ &= \mathbf{H}\_{eff,k} \begin{bmatrix} \sqrt{P\_1} \tilde{s}\_{1,1}(k) + \sqrt{P\_2} \tilde{s}\_{1,2}(k) + \dots + \sqrt{P\_N} \tilde{s}\_{1,N}(k) \\ \sqrt{P\_1} \tilde{s}\_{2,1}(k) + \sqrt{P\_2} \tilde{s}\_{2,2}(k) + \dots + \sqrt{P\_N} \tilde{s}\_{2,N}(k) \\ \vdots \\ \sqrt{P\_1} \tilde{s}\_{N\_r,1}(k) + \sqrt{P\_2} \tilde{s}\_{N\_r,2}(k) + \dots + \sqrt{P\_N} \tilde{s}\_{N\_r,N}(k) \end{bmatrix} + \begin{bmatrix} n\_1(k) \\ n\_2(k) \\ \vdots \\ n\_{N\_r}(k) \end{bmatrix}. \end{aligned} \tag{8}$$

where **H**<sub>*eff*,*k*</sub> = **H**<sub>*k*</sub>**W**<sub>*k*</sub> denotes the *Nr* × *Nr* effective channel of the *k*-th user. According to Equation (8), the MUI is perfectly eliminated and the *k*-th user receives only its own data, so each user can be treated as a point-to-point MIMO link.

As a result, by additionally exploiting both the spatial and power domains, more data symbols can be transmitted in the same resource (frequency/time). In the proposed scheme, the sum throughput of each user can increase linearly with the number of symbols in each superposed signal. Although the power allocated to each symbol is reduced, the total throughput improves because more symbols are transmitted. Unlike techniques that use only one dimension, such as the space or power domain, the proposed scheme can achieve significant gains in overall system throughput by using additional domains.

#### *3.3. Signal Detection Model*

The MUI is perfectly eliminated by the precoding matrix. Since the *k*-th user receives its own data without MUI, the appropriate receiver structure for each user is similar to that of point-to-point MIMO, and MIMO detection is performed on the received signal in Equation (8). MIMO detection algorithms can be either linear or non-linear. In the proposed scheme, linear detection algorithms such as zero-forcing (ZF) and minimum mean squared error (MMSE) can be applied straightforwardly. If ZF detection is used, the filter matrix for the *k*-th user is as follows,

$$\mathbf{G}\_k = \mathbf{H}\_{eff,k}^H (\mathbf{H}\_{eff,k} \mathbf{H}\_{eff,k}^H)^{-1}. \tag{9}$$

The detected superposed signals of the *k*-th user using ZF MIMO detection can be represented as follows,

$$\begin{aligned} \hat{\mathbf{s}}\_k &= \mathbf{G}\_k \mathbf{y}\_k = \mathbf{H}\_{eff,k}^H (\mathbf{H}\_{eff,k} \mathbf{H}\_{eff,k}^H)^{-1} (\mathbf{H}\_{eff,k} \mathbf{s}\_k + \mathbf{n}\_k) \\ &= \mathbf{H}\_{eff,k}^H \left( \mathbf{H}\_{eff,k}^H \right)^{-1} \left( \mathbf{H}\_{eff,k} \right)^{-1} \mathbf{H}\_{eff,k} \mathbf{s}\_k + \mathbf{H}\_{eff,k}^H (\mathbf{H}\_{eff,k} \mathbf{H}\_{eff,k}^H)^{-1} \mathbf{n}\_k = \mathbf{s}\_k + \mathbf{G}\_k \mathbf{n}\_k. \end{aligned} \tag{10}$$

In Equation (10), **s**<sub>*k*</sub> can be recovered because **G**<sub>*k*</sub>**H**<sub>*eff*,*k*</sub> = **I** (the effective channel is square and assumed invertible). However, with linear detection the bit error rate (BER) is poor, because each symbol in a superposed signal is allocated only a fraction of the transmit power. Detection errors at this stage also degrade the subsequent SIC. Therefore, non-linear detection algorithms such as maximum likelihood (ML), ordered successive interference cancellation (OSIC), decision feedback equalization (DFE), and QRD-M can be applied instead.
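Below is a short NumPy sketch of the ZF filter in Equation (9) and the noise term in Equation (10). It assumes a square, invertible *Nr* × *Nr* effective channel drawn at random. The helper name `zf_filter`, the example symbols, and the noise variance are illustrative assumptions. The last computed quantity is the per-stream noise gain of **G**<sub>*k*</sub>, which offers one way to see why low-power superposed symbols suffer under linear detection.

```python
import numpy as np


def zf_filter(H_eff):
    """Eq. (9): G = H^H (H H^H)^{-1}; equals H^{-1} for a square invertible H."""
    return H_eff.conj().T @ np.linalg.inv(H_eff @ H_eff.conj().T)


rng = np.random.default_rng(1)
Nr = 2
H_eff = (rng.standard_normal((Nr, Nr)) + 1j * rng.standard_normal((Nr, Nr))) / np.sqrt(2)
G = zf_filter(H_eff)

# Example superposed symbols (approximately points of the P_1 = 0.2, P_2 = 0.8 constellation)
s = np.array([0.9487 + 0.3162j, -0.3162 - 0.9487j])
sigma2 = 0.01
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))

# Eq. (10): G y = s + G n, so the estimate is unbiased but the noise is filtered by G
s_hat = G @ (H_eff @ s + n)
noise_gain = np.real(np.diag(G @ G.conj().T))  # per-stream noise enhancement factor
print(np.allclose(G @ H_eff, np.eye(Nr)), noise_gain)
```

Whenever an entry of `noise_gain` exceeds one, the noise on that stream is amplified. The weak symbol, which carries only *P*<sub>1</sub> of the power, is affected first.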

From Equation (10), the estimated superposed signals of the *k*-th user after performing MIMO detection can be reconstructed as follows,

$$\hat{\mathbf{s}}\_{k} = \begin{bmatrix} \hat{s}\_{1}(k) \\ \hat{s}\_{2}(k) \\ \vdots \\ \hat{s}\_{N\_{r}}(k) \end{bmatrix} = \begin{bmatrix} \sqrt{P\_{1}}\hat{\tilde{s}}\_{1,1}(k) + \sqrt{P\_{2}}\hat{\tilde{s}}\_{1,2}(k) + \dots + \sqrt{P\_{N}}\hat{\tilde{s}}\_{1,N}(k) \\ \sqrt{P\_{1}}\hat{\tilde{s}}\_{2,1}(k) + \sqrt{P\_{2}}\hat{\tilde{s}}\_{2,2}(k) + \dots + \sqrt{P\_{N}}\hat{\tilde{s}}\_{2,N}(k) \\ \vdots \\ \sqrt{P\_{1}}\hat{\tilde{s}}\_{N\_{r},1}(k) + \sqrt{P\_{2}}\hat{\tilde{s}}\_{N\_{r},2}(k) + \dots + \sqrt{P\_{N}}\hat{\tilde{s}}\_{N\_{r},N}(k) \end{bmatrix},\tag{11}$$

where the hatted quantities are the estimates of the symbols *s̃*<sub>*i*,*n*</sub>(*k*) in each superposed signal. Each estimated superposed signal in Equation (11) is decoded by SIC, which is performed separately at each receiving antenna. In each estimated superposed signal, the strongest symbol, i.e., √*P*<sub>*N*</sub> *s̃*<sub>*i*,*N*</sub>(*k*), is decoded first. The first decoded symbols can be represented as follows,

$$\hat{\mathbf{s}}\_{k}^{d} = \begin{bmatrix} \hat{s}\_{1,N}^{d}(k) \\ \hat{s}\_{2,N}^{d}(k) \\ \vdots \\ \hat{s}\_{N\_{r},N}^{d}(k) \end{bmatrix}. \tag{12}$$

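The decoded symbols, scaled by √*P*<sub>*N*</sub>, are then subtracted from the superposed signals. Assuming correct decisions, this leaves superposed signals that contain only the remaining *N* − 1 symbols: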

$$\hat{\mathbf{s}}\_{k} - \sqrt{P\_N}\,\hat{\mathbf{s}}\_{k}^{d} = \begin{bmatrix} \sqrt{P\_1}\hat{\tilde{s}}\_{1,1}(k) + \dots + \sqrt{P\_{N-1}}\hat{\tilde{s}}\_{1,N-1}(k) \\ \sqrt{P\_1}\hat{\tilde{s}}\_{2,1}(k) + \dots + \sqrt{P\_{N-1}}\hat{\tilde{s}}\_{2,N-1}(k) \\ \vdots \\ \sqrt{P\_1}\hat{\tilde{s}}\_{N\_r,1}(k) + \dots + \sqrt{P\_{N-1}}\hat{\tilde{s}}\_{N\_r,N-1}(k) \end{bmatrix}. \tag{13}$$

This procedure is repeated for the remaining symbols, so that all the symbols in each superposed signal are decoded by SIC. The receiver model is described in Figure 6.
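The per-antenna SIC of Equations (12) and (13) can be illustrated with a minimal pure-Python sketch. The sketch assumes QPSK for every superposed symbol, hard decisions, and power levels sorted in ascending order. `sic_decode` and `qpsk_decide` are hypothetical helper names. At each stage the sketch decides the strongest remaining symbol, scales the decision by √*P*<sub>*n*</sub>, and subtracts it from the residual.

```python
import math

QPSK = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)]


def qpsk_decide(z):
    """Hard decision onto the unit-power QPSK constellation (quadrant test)."""
    return complex(math.copysign(1, z.real), math.copysign(1, z.imag)) / math.sqrt(2)


def sic_decode(r, powers):
    """Decode one superposed signal r by SIC.

    powers = [P_1, ..., P_N] in ascending order. The strongest symbol is decided
    first, scaled by sqrt(P_n) and subtracted (Eqs. (12)-(13)); the result is
    returned in the order s~_1, ..., s~_N.
    """
    decided = []
    residual = r
    for p in reversed(powers):
        s = qpsk_decide(residual)     # the quadrant test does not need rescaling
        decided.append(s)
        residual -= math.sqrt(p) * s  # remove the decoded layer
    return decided[::-1]


powers = [0.2, 0.8]
weak, strong = QPSK[1], QPSK[2]  # (1 - 1j)/sqrt(2) and (-1 + 1j)/sqrt(2)
r = math.sqrt(0.2) * weak + math.sqrt(0.8) * strong + complex(0.05, -0.03)  # mild noise
print(sic_decode(r, powers) == [weak, strong])  # True
```

Suppose the noise pushes the strong symbol across a quadrant boundary. The wrong decision is then subtracted, and the residual used for the weak symbol becomes corrupted. The sketch shows this error-propagation mechanism directly.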

**Figure 6.** The proposed signal detection model.

#### *3.4. Received SINR*

In this subsection, the signal-to-interference-plus-noise ratio (SINR) of the symbols in a superposed signal is derived to analyze the performance of the proposed scheme. Assuming that two symbols are superposed in one signal, i.e., *s*<sub>*i*</sub> = √*P*<sub>1</sub> *s̃*<sub>*i*,1</sub> + √*P*<sub>2</sub> *s̃*<sub>*i*,2</sub>, the received signal at the *i*-th receiving antenna of each user after MUI elimination can be written as follows,

$$y\_i = \lambda\_i(\sqrt{P\_1}\tilde{s}\_{i,1} + \sqrt{P\_2}\tilde{s}\_{i,2}) + n\_i,\tag{14}$$

where *i* is the receiving antenna index, *λ*<sub>*i*</sub> is the channel gain of one parallel MIMO subchannel, and *n*<sub>*i*</sub> is the noise with variance *σ*<sup>2</sup>. In this case, *P*<sub>1</sub> < *P*<sub>2</sub> subject to the total power constraint; therefore, *s̃*<sub>*i*,1</sub> is the weak symbol and *s̃*<sub>*i*,2</sub> is the strong symbol. The received SINR of the strong symbol is then defined as follows,

$$\text{SINR}\_{\tilde{s}\_2} = \frac{\lambda\_i P\_2}{\sigma^2 + \lambda\_i P\_1}. \tag{15}$$

The weak symbol can be decoded by employing SIC. Assuming perfect SIC decoding, the received SINR of the weak symbol is defined as follows,

$$\text{SINR}\_{\tilde{s}\_1} = \frac{\lambda\_i P\_1}{\sigma^2}. \tag{16}$$

In the proposed scheme, the SINR reveals two negative effects on system throughput. First, the weak symbol acts as interference to the strong symbol: as shown in Equation (15), the term *λ*<sub>*i*</sub>*P*<sub>1</sub> appears in the denominator together with the noise. Second, the strong symbol may be decoded incorrectly, and an incorrectly decoded strong symbol degrades the decoding of the weak symbol. These negative effects are revisited in the simulation results. Although exploiting the power domain causes some throughput degradation, the system throughput still increases at high SNR because the number of transmitted symbols increases linearly.
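Equations (15) and (16) can be evaluated numerically with the sketch below. It assumes *λ*<sub>*i*</sub> = 1, the 8:2 power split, and perfect SIC for the weak symbol. The function name is hypothetical, and *λ*<sub>*i*</sub> enters exactly as in Equations (15) and (16). The sketch makes the first negative effect visible: as the noise vanishes, the strong-symbol SINR saturates at *P*<sub>2</sub>/*P*<sub>1</sub>, because the weak symbol remains as interference.

```python
import math


def sinr_two_layer(lam, p1, p2, sigma2):
    """Eqs. (15)-(16): SINR of the strong and weak symbol on one parallel subchannel.

    The weak-symbol SINR assumes perfect SIC, as in Eq. (16).
    """
    sinr_strong = lam * p2 / (sigma2 + lam * p1)
    sinr_weak = lam * p1 / sigma2
    return sinr_strong, sinr_weak


def to_db(x):
    return 10 * math.log10(x)


for snr_db in (0, 10, 20, 30):
    sigma2 = 10 ** (-snr_db / 10)  # total power P_t = 1, so SNR = 1 / sigma^2
    strong, weak = sinr_two_layer(1.0, 0.2, 0.8, sigma2)
    print(f"SNR {snr_db:2d} dB: strong {to_db(strong):5.2f} dB, weak {to_db(weak):6.2f} dB")

# High-SNR ceiling of the strong symbol: 10*log10(P_2 / P_1) = 10*log10(4), about 6.02 dB
ceiling_db = to_db(0.8 / 0.2)
```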

#### **4. Simulation Results**

This section presents simulation results that demonstrate the throughput gain of the proposed scheme and compares them with those of conventional schemes. The simulation results also provide some considerations for the proposed scheme. The simulations use a time-invariant Rayleigh fading channel model with seven multipath components. OFDM overcomes the frequency selectivity of the wideband channel, and multiple carriers enable high-rate data transmission [22,23]. Based on the IEEE 802.11n specification, each OFDM symbol uses an FFT size of 128 with four pilot subcarriers and 108 data subcarriers; the remaining 16 subcarriers are zero-padded, and the OFDM symbol duration is 4 µs. All simulations use quadrature phase-shift keying (QPSK) modulation (2 data bits per subcarrier).

Figure 7 compares the sum throughput and BER of the proposed scheme and the conventional schemes for two users as a function of SNR. The throughput *T* is calculated as follows,

$$\begin{aligned} T &= N\_b \times (1 - E)^L \times K \div T\_s, \\ N\_b &= N\_s \times O \times N\_r \times s, \\ L &= N\_s \times O, \end{aligned} \tag{17}$$

where *N*<sub>b</sub> is the number of transmitted data bits and *L* is the number of data bits in one OFDM symbol. *E* is the BER of each user and *N*<sub>s</sub> is the number of data subcarriers. *O* is the number of data bits per subcarrier and *s* is the number of symbols in one superposed signal. *T*<sub>s</sub> is the OFDM symbol duration. In the simulation, the proposed scheme and conventional BD are MIMO systems in which the BS and each user have multiple antennas. The conventional NOMA scheme, on the other hand, has one antenna at the BS and at each user, since multiple access is accomplished in the power domain. The proposed scheme and the BD scheme are simulated with *Nt* = 4, *Nr* = 2, *K* = 2 (4, 2, 2), and the NOMA scheme with *Nt* = 1, *Nr* = 1, *K* = 2 (1, 1, 2). In the proposed scheme, the number of symbols in each superposed signal is two (*N* = 2), so the total number of transmitted symbols per subcarrier is 8 (four per user). The two symbols are allocated power in a ratio of 8:2 of the total power *P*<sub>t</sub> = 1, i.e., *P*<sub>2</sub> = 0.8 and *P*<sub>1</sub> = 0.2. For the MIMO schemes, ML MIMO detection is applied before SIC. The simulation parameters are summarized in Table 1.
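Equation (17) can be coded directly with the parameter values stated above, as the following sketch shows: 108 data subcarriers, QPSK with 2 bits per subcarrier, *Nr* = 2, *N* = 2, *K* = 2, and a 4 µs OFDM symbol. The function name and keyword arguments are hypothetical. With *E* = 0 the function gives the error-free ceiling of the proposed (4, 2, 2) configuration. Setting `n_sym=1` gives the corresponding ceiling for a scheme that sends one symbol per stream, such as conventional BD.

```python
def throughput(ber, n_sub=108, bits_per_sub=2, n_r=2, n_sym=2, k_users=2, t_sym=4e-6):
    """Eq. (17): T = N_b (1 - E)^L K / T_s in bit/s.

    N_b = N_s * O * N_r * s (transmitted data bits) and L = N_s * O
    (data bits in one OFDM symbol), with E the BER of each user.
    """
    n_b = n_sub * bits_per_sub * n_r * n_sym
    l_bits = n_sub * bits_per_sub
    return n_b * (1 - ber) ** l_bits * k_users / t_sym


print(f"{throughput(0.0) / 1e6:.0f} Mbit/s")           # 432 Mbit/s (proposed, E = 0)
print(f"{throughput(0.0, n_sym=1) / 1e6:.0f} Mbit/s")  # 216 Mbit/s (one symbol per stream)
print(f"{throughput(1e-3) / 1e6:.1f} Mbit/s")          # with E = 10^-3
```

The factor (1 − *E*)<sup>*L*</sup> penalizes BER sharply. With *L* = 216 bits, even *E* = 10<sup>−3</sup> removes a noticeable share of the ideal throughput. This explains why the throughput curves only reach their maximum at high SNR.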

**Table 1.** Simulation Parameters.


The simulation results are presented for two cases: with and without SIC error. Figure 7a,b show the results with SIC error and Figure 7c,d show the results without SIC error. In both cases, the proposed scheme outperforms the conventional schemes in terms of maximum throughput, since it exploits both the spatial and power dimensions. The conventional BD shows better BER performance than the proposed scheme; however, the proposed scheme achieves higher throughput because it transmits more symbols. In Figure 7b, the BER performance of the proposed scheme is better than that of NOMA, since MIMO detection and SIC decoding become more effective at high SNR. Figure 7c,d show the impact of SIC error on the proposed scheme: without SIC error, both throughput and BER performance are better than with SIC error. If the strong symbol is decoded incorrectly at low SNR, the weak symbol is also decoded incorrectly; that is, error propagation occurs. Therefore, the system needs to be designed to avoid error propagation and reduce SIC error. As the impact of SIC error is reduced, the performance of the proposed scheme approaches that of the case without SIC error.

**Figure 7.** Throughput and BER performance of conventional and proposed schemes: (**a**) Throughput performance with SIC error; (**b**) BER performance with SIC error; (**c**) Throughput performance without SIC error; (**d**) BER performance without SIC error.

Figure 8a,b show how the performance of the proposed scheme depends on the number of users and antennas (*Nt*, *Nr*, *K*). The (6, 3, 2) and (6, 2, 3) cases achieve higher throughput than the (4, 2, 2) case, since they transmit more symbols by exploiting both the space and power dimensions. The (6, 2, 3) case has lower throughput at low SNR than the (6, 3, 2) case, since the (6, 3, 2) case allocates more symbols to each user; on the other hand, the (6, 2, 3) case serves one more user. In terms of BER, (6, 3, 2) performs better than (6, 2, 3), since (6, 3, 2) transmits more symbols to each user than (6, 2, 3). (4, 2, 2) has better BER performance than (6, 2, 3), since more power is allocated to each symbol. However, (6, 2, 3) has higher throughput than (4, 2, 2), since it transmits more symbols.
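A small sketch can tally the symbol counts behind this comparison. It assumes *N* = 2 symbols per superposed signal for every configuration, which is the value stated for the (4, 2, 2) simulation. It also assumes generic channels, for which the null space of **H̃**<sub>*k*</sub> has dimension *Nt* − (*K* − 1)*Nr*. The function name is hypothetical.

```python
def bd_config(nt, nr, k, n_layers=2):
    """Symbols per subcarrier and BD feasibility for an (Nt, Nr, K) configuration."""
    null_dim = nt - (k - 1) * nr  # null-space dimension of H~_k for generic channels
    return {
        "bd_feasible": null_dim >= nr,      # Nr streams per user need Nt >= K * Nr
        "symbols_per_user": nr * n_layers,  # Nr superposed signals, N symbols each
        "symbols_total": k * nr * n_layers,
    }


for cfg in [(4, 2, 2), (6, 3, 2), (6, 2, 3)]:
    print(cfg, bd_config(*cfg))
```

Under these assumptions, both six-antenna configurations carry 12 symbols per subcarrier, compared with 8 for (4, 2, 2). (6, 3, 2) gives 6 symbols to each of two users, whereas (6, 2, 3) gives 4 symbols to each of three users, which is consistent with the discussion above.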

**Figure 8.** Throughput and BER performance of proposed scheme according to the number of users and antennas: (**a**) Throughput performance; (**b**) BER performance.

Figure 9a,b compare the performance with linear (ZF) and non-linear (ML) MIMO detection. The scheme with ML detection outperforms the scheme with ZF detection. ML provides better detection performance than ZF before SIC is performed at each antenna, and it also mitigates error propagation compared with ZF detection.
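To make the ML/ZF comparison concrete, the sketch below performs exhaustive ML detection over the superposed constellation before SIC. It compares the result with ZF followed by nearest-point quantization. The sketch assumes *Nr* = 2, QPSK symbols with the 8:2 split, a random effective channel, and an arbitrary noise variance. The helper names are hypothetical. The printed error counts depend on the random seed and do not reproduce Figure 9.

```python
import itertools

import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)


def composite_constellation(powers):
    """All superposed points sqrt(P_1) s_1 + ... + sqrt(P_N) s_N (Eq. (3))."""
    return np.array([sum(np.sqrt(p) * s for p, s in zip(powers, combo))
                     for combo in itertools.product(QPSK, repeat=len(powers))])


def ml_detect(y, H_eff, points):
    """Exhaustive ML search: argmin over candidate vectors x of ||y - H_eff x||^2."""
    candidates = np.array(list(itertools.product(points, repeat=H_eff.shape[1])))
    metrics = np.sum(np.abs(y[None, :] - candidates @ H_eff.T) ** 2, axis=1)
    return candidates[np.argmin(metrics)]


rng = np.random.default_rng(2)
points = composite_constellation([0.2, 0.8])  # 16 superposed points
H_eff = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
G = np.linalg.inv(H_eff)  # ZF filter for a square effective channel
sigma2 = 0.05
errors = {"ML": 0, "ZF": 0}
for _ in range(200):
    x = points[rng.integers(len(points), size=2)]
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
    y = H_eff @ x + n
    x_ml = ml_detect(y, H_eff, points)
    z = G @ y
    x_zf = points[np.argmin(np.abs(z[:, None] - points[None, :]), axis=1)]
    errors["ML"] += int(np.sum(~np.isclose(x_ml, x)))
    errors["ZF"] += int(np.sum(~np.isclose(x_zf, x)))
print(errors)  # superposed-symbol errors over 400 symbols
```

The ML search grows as 4<sup>*N*·*Nr*</sup> candidates, here 256. This growth is the complexity concern raised in Section 5.1.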

Overall, the simulation results show that the proposed scheme performs better than the conventional schemes by using two dimensions simultaneously. By exploiting both the power and space domains at the same time, the proposed scheme increases the number of transmitted symbols. The results demonstrate the superiority of the proposed scheme and show that it uses the available dimensions appropriately.

**Figure 9.** Throughput and BER performance of ML detection and ZF detection: (**a**) Throughput performance; (**b**) BER performance.

#### **5. Implementation Issue**

This section presents some additional considerations for system application and limitations under practical constraints. It also gives some ideas for additional performance gains and presents methods to reduce the negative effects in the proposed scheme.

#### *5.1. Complexity*

In the proposed scheme, the additional implementation complexity of typical NOMA, in which users must also decode other users' signals, is not needed, because SIC is performed only on the user's own data. However, even though the superposed symbols are the user's own data, the symbols allocated low power in a superposed signal have poor BER performance. To solve this problem, a detection scheme with better performance should be used; however, non-linear algorithms have high complexity. A complexity-reduced detection algorithm can therefore be considered, with the main goal of achieving higher throughput at lower complexity.

#### *5.2. SIC-Error Propagation*

SIC is often assumed to decode perfectly. However, in systems with practical modulation and coding, decoding errors inevitably occur, causing error propagation and significant performance degradation. As shown in the simulation results, performance differs with and without SIC errors. Ultimately, a system design that avoids decoding errors and error propagation should be considered for the proposed scheme to achieve optimum performance.

#### *5.3. Power Allocation*

The achievable throughput is affected by the transmit power allocation. If two symbols are superposed in one signal, the power allocated to each symbol must be chosen. Allocating more power to the strong symbol can reduce error propagation; however, it increases the error probability of the weak symbol, whose power becomes too low. Therefore, the power allocation should be designed according to the number of symbols in the superposed signal, subject to the total power constraint.
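One simple way to see this trade-off for *N* = 2 is geometric. The sketch below sweeps the strong-symbol power *P*<sub>2</sub>, with *P*<sub>1</sub> = 1 − *P*<sub>2</sub>, and computes the minimum distance of the superposed QPSK constellation. This is an illustrative criterion for uncoded QPSK layers, not the method of the paper. Too little power on the strong symbol pushes composite points together near the quadrant boundaries. Too much power starves the weak symbol. Under this criterion, the 8:2 split used in Section 4 turns out to be the best point of the sweep, because it makes the composite constellation a uniform 16-QAM grid.

```python
import itertools
import math

QPSK = [complex(i, q) / math.sqrt(2) for i in (1, -1) for q in (1, -1)]


def composite_dmin(p2):
    """Minimum pairwise distance of the superposed constellation with P_2 = p2, P_1 = 1 - p2."""
    a1, a2 = math.sqrt(1 - p2), math.sqrt(p2)  # weak and strong amplitudes
    pts = [a1 * w + a2 * s for w in QPSK for s in QPSK]
    return min(abs(p - q) for p, q in itertools.combinations(pts, 2))


fractions = [round(0.55 + 0.05 * i, 2) for i in range(9)]  # P_2 = 0.55, 0.60, ..., 0.95
dmins = {f: composite_dmin(f) for f in fractions}
best = max(dmins, key=dmins.get)
print(best, round(dmins[best], 4))  # 0.8 0.6325
```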

#### *5.4. Optimal Parameters*

In the proposed scheme, the sum throughput can increase linearly with the number of symbols in a superposed signal. However, it cannot increase without limit, because the number of symbols that a receiver can reliably detect is limited. As more symbols are transmitted, the power allocated to each symbol is reduced, and if too many symbols are transmitted, the receiver cannot detect each symbol. Furthermore, interference during SIC also degrades the sum throughput. Therefore, an important issue is to find a near-optimal trade-off between the number of users and the number of symbols in a superposed signal for the overall system.
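A hypothetical allocation rule, used here only for illustration, can quantify how the per-symbol power shrinks: *P*<sub>*n*</sub> ∝ 4<sup>*n*−1</sup>, normalized to a total power of 1. For QPSK layers this rule keeps the composite constellation a uniform 4<sup>*N*</sup>-QAM grid, and for *N* = 2 it reduces to the 8:2 split. The sketch shows that the weakest symbol loses roughly 6 dB of power for every additional superposed symbol, which is one reason *N* cannot grow without limit.

```python
import math


def grid_powers(n_layers, p_total=1.0):
    """Hypothetical rule P_n = P_1 * 4**(n-1), scaled so that sum(P_n) = p_total."""
    weights = [4 ** n for n in range(n_layers)]
    scale = p_total / sum(weights)
    return [w * scale for w in weights]


for n in range(1, 5):
    p = grid_powers(n)
    print(f"N = {n}: P_1 = {p[0]:.4f} ({10 * math.log10(p[0]):.1f} dB), P_N = {p[-1]:.4f}")
```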

#### **6. Conclusions**

This paper proposes exploiting multiple dimensions to improve the throughput of an MU-MIMO system and presents the transceiver structure of the proposed scheme. The multiple dimensions (space and power) are exploited at the same time, and the overall system spectral efficiency is improved. If more dimensions can be used without degradation, or with only a small performance trade-off, the system throughput can be increased further. In addition, various system models can be implemented by using additional dimensions.

**Author Contributions:** J.-G.H. proposed the throughput enhancement scheme for the MU-MIMO downlink channel and performed the simulations; J.-H.R. analyzed the simulation results and prepared the figures; H.-K.S. reviewed the algorithm, provided the experimental materials for better computational simulations, and revised critical errors in the manuscript.

**Funding:** This research was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-00217, Development of Immersive Signage Based on Variable Transparency and Multiple Layers) and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-2018-0-01423) supervised by the IITP (Institute for Information & communications Technology Promotion).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **LOS-Based Equal Gain Transmission and Combining in General Frequency-Selective Ricean Massive MIMO Channels**

#### **Qiuna Yan <sup>1,2</sup>, Yu Sun <sup>1</sup> and Dian-Wu Yue <sup>1,\*</sup>**


Received: 13 December 2018; Accepted: 7 January 2019; Published: 10 January 2019

**Abstract:** In general frequency-selective Ricean fading environments with doubly-ended spatial correlation, this paper investigates the spectral efficiency of a broadband massive multiple-input multiple-output (MIMO) system. In particular, to reduce the channel estimation overhead effectively, it proposes an equal gain transmission and combining scheme that is based only on the line-of-sight (LOS) component and has low hardware complexity. With this scheme, several interesting transmit power scaling properties without and with spatial correlation are derived when the number of antennas at the transmitter or at the receiver grows without bound. Furthermore, the asymptotic rate analysis is extended to cooperative relaying scenarios with the decode-and-forward and amplify-and-forward protocols, respectively, and two novel power scaling laws are given.

**Keywords:** massive MIMO; beamforming; line-of-sight; Ricean fading; frequency-selective; power scaling

#### **1. Introduction**

Recently, massive multiple-input multiple-output (MIMO) has attracted great interest in both academia and industry and has become a promising solution to meet the demanding spectral efficiency requirements of 5G systems [1]. Its promising benefits include significant increases in both spectral and energy efficiency [2–4]. Interestingly, both benefits of massive MIMO can be achieved with maximum-ratio transmission/maximum-ratio combining (MRT/MRC) or zero-forcing (ZF) precoding/detection [3,5].

With MRT/MRC and ZF linear processing, many researchers have carried out various asymptotic performance analyses. In particular, the power scaling law in the limit of a large number of antennas has been widely studied in order to quantify the power savings. For Rayleigh and Ricean fading environments with MRC and ZF detectors, the authors of [3,6] analyzed uplink massive MIMO system performance. They showed that, if perfect channel state information (CSI) is available, when the number of base station (BS) antennas grows large and the transmit power of each user is scaled down proportionally to it, the ergodic achievable rate asymptotically approaches a positive constant.

To obtain the needed CSI, channel estimation must be carried out [7]. However, channel estimation results not only in heavy overhead but also in pilot contamination in multi-cell systems, which becomes a serious problem [2,8]. For a point-to-point massive MIMO system in Ricean fading, to reduce the heavy overhead of CSI estimation, we investigated an equal gain transmission/equal gain combining (EGT/EGC) scheme that is based only on the line-of-sight (LOS) component (also called the specular component) and has low hardware complexity [9]. It was shown that, with this scheme, the ergodic achievable rate converges to that of the corresponding MRT/MRC based on perfect CSI as the numbers of antennas at the transmitter and receiver go to infinity. After that, we further considered this linear processing scheme for downlink and uplink multiuser massive MIMO systems [10,11] and showed that each user in the downlink or uplink system can asymptotically achieve the same rate as in the single-user case when the number of BS antennas grows without bound.

It should be pointed out that the above-mentioned results for this scheme only consider uncorrelated Ricean frequency-flat fading channels without a relay [12]. Recently, we extended our analysis to frequency-selective Ricean fading channels [13], but only for a very simple and special scenario [14]. The EGT/EGC linear transmission scheme is very attractive for massive MIMO systems, since it enables low-complexity and inexpensive hardware [15–18]. Motivated by these facts, in this paper we use a comparatively complicated and general frequency-selective Ricean fading channel model [14,19,20] to further investigate the LOS-based EGT/EGC scheme. We first derive several interesting power scaling properties for broadband massive MIMO systems with and without spatial correlation. In particular, it is shown that the ergodic achievable rate of the LOS-based EGT/EGC scheme can have the same asymptotic value as that of the full-CSI-based MRT/MRC scheme if the numbers of transmit and receive antennas grow without bound with a fixed ratio. Then, we extend our asymptotic performance analysis to cooperative relaying scenarios with the decode-and-forward (DF) and amplify-and-forward (AF) protocols, respectively, and obtain two novel power scaling laws for the two scenarios. In particular, it is also shown that the ergodic achievable rate of the LOS-based scheme can have the same asymptotic value as that of the CSI-based scheme if the number of source antennas and the numbers of transmit and receive relay antennas grow without bound with two fixed ratios, respectively.

The manuscript is organized as follows. In Section 2, the system model is introduced. In Section 3, the proposed LOS-based transmission scheme is presented and its power scaling laws without and with correlation are derived. The analysis is extended to a cooperative relaying system in Section 4. In Section 5, the analytical results are verified by simulation. Finally, concluding remarks are given in Section 6.

*Notation:* Boldface lower- and upper-case letters denote column vectors and matrices, respectively. The superscripts (·)<sup>†</sup> and (·)<sup>T</sup> stand for the conjugate-transpose and transpose operations, respectively. The expectation operator is denoted by E{·}. *α* ∼ CN(0, *δ*<sup>2</sup>) stands for a circularly symmetric complex Gaussian variable *α* with zero mean and variance *δ*<sup>2</sup>.

#### **2. System Model**

Since a frequency-selective MIMO channel can be described by a set of parallel independent frequency-flat MIMO channels, we start by introducing frequency-flat channels [14,19].

For a point-to-point MIMO system over frequency-flat channels, we assume *N* transmit antennas and *M* receive antennas. The *M* × 1 received signal vector can then be represented as

$$\mathbf{y} = \sqrt{p}\mathbf{H}\_0 \mathbf{x} + \mathbf{z},\tag{1}$$

where **z** denotes the additive white Gaussian noise (AWGN) vector with zero mean and covariance matrix *σ*<sup>2</sup>**I**<sub>*M*</sub>, with **I**<sub>*M*</sub> being the *M* × *M* identity matrix; **x** denotes the transmitted signal vector; **H**<sub>0</sub> = [*h*<sub>*mn*</sub>]<sub>*m*,*n*=1</sub><sup>*M*,*N*</sup> stands for the *M* × *N* channel matrix whose element *h*<sub>*mn*</sub> denotes the channel gain between the *m*-th antenna at the receiver and the *n*-th antenna at the transmitter; and *p* is the average transmitted power. Under Ricean fading, the channel matrix **H**<sub>0</sub> consists of a LOS matrix and a scattered matrix, i.e.,

$$\mathbf{H}\_0 = \sqrt{\bar{\kappa}\_0}\, \overline{\mathbf{H}}\_0 + \sqrt{\tilde{\kappa}\_0}\, \tilde{\mathbf{H}}\_0, \tag{2}$$

where $\bar{\kappa}\_0 = \frac{\kappa\_0}{1+\kappa\_0}$ and $\tilde{\kappa}\_0 = \frac{1}{1+\kappa\_0}$. Note that $\kappa\_0 > 0$ represents the Ricean $K$-factor. The LOS matrix $\overline{\mathbf{H}}\_0$ can be written as

$$
\overline{\mathbf{H}}\_0 = \mathbf{r}\_0 \mathbf{t}\_0^\mathsf{T}.\tag{3}
$$

Here, $\mathbf{r}\_0$ denotes the specular array response at the receiver and can be expressed as

$$\mathbf{r}\_0 = \left[1, e^{j2\pi d\_r \sin(\theta)}, \dots, e^{j2\pi (M-1)d\_r \sin(\theta)}\right]^{\mathrm{T}}, \tag{4}$$

where $\theta$ is the angle of arrival of the LOS component and $d\_r$ is the antenna spacing at the receiver normalized by the wavelength. Similarly, $\mathbf{t}\_0$ denotes the specular array response at the transmitter and is given by

$$\mathbf{t}\_0 = \left[1, e^{j2\pi d\_t \sin(\phi)}, \dots, e^{j2\pi (N-1)d\_t \sin(\phi)}\right]^{\mathrm{T}}, \tag{5}$$

where $\phi$ is the angle of departure of the LOS component and $d\_t$ is the antenna spacing at the transmitter normalized by the wavelength. The entries of the scattering matrix $\tilde{\mathbf{H}}\_0$ satisfy $[\tilde{\mathbf{H}}\_0]\_{mn} \sim \mathcal{CN}(0, 1)$, i.e., they are circularly symmetric complex Gaussian random variables with zero mean and unit variance. Furthermore, we assume that they are independent and identically distributed (i.i.d.).
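To make the model in (2) to (5) concrete, the following sketch draws one realization of $\mathbf{H}\_0$ in plain Python. It is a minimal illustration rather than the authors' simulator: the function names are hypothetical, angles are given in radians, and the default spacings $d\_r = d\_t = 0.5$ follow the value used in Section 5.

```python
import cmath
import math
import random


def steering_vector(num_antennas, spacing, angle):
    """Array response of (4)/(5): n-th entry is exp(j*2*pi*n*spacing*sin(angle))."""
    return [cmath.exp(1j * 2 * math.pi * n * spacing * math.sin(angle))
            for n in range(num_antennas)]


def ricean_channel(M, N, kappa, theta, phi, d_r=0.5, d_t=0.5, rng=None):
    """One M x N draw of H_0 = sqrt(kbar) r t^T + sqrt(ktilde) Htilde, as in (2)-(5)."""
    rng = rng or random.Random()
    k_bar = kappa / (1 + kappa)    # power fraction of the LOS part
    k_tilde = 1 / (1 + kappa)      # power fraction of the scattered part
    r = steering_vector(M, d_r, theta)
    t = steering_vector(N, d_t, phi)
    s = math.sqrt(0.5)             # per-dimension std of a CN(0, 1) entry
    return [[math.sqrt(k_bar) * r[m] * t[n]
             + math.sqrt(k_tilde) * complex(rng.gauss(0, s), rng.gauss(0, s))
             for n in range(N)]
            for m in range(M)]


# Example: 5 dB K-factor, 64 x 64 array, theta = phi = pi/6
kappa = 10 ** (5 / 10)
H0 = ricean_channel(64, 64, kappa, math.pi / 6, math.pi / 6, rng=random.Random(7))
avg_power = sum(abs(h) ** 2 for row in H0 for h in row) / (64 * 64)
print(round(avg_power, 2))  # should be close to 1, since kbar + ktilde = 1
```

Because $\bar{\kappa}\_0 + \tilde{\kappa}\_0 = 1$, the average power per entry stays at one for any K-factor; only its split between the deterministic and random parts changes.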

We now consider a broadband orthogonal frequency-division multiplexing (OFDM)-MIMO system with $K$ subcarriers, assuming ideal OFDM transmission with a proper cyclic prefix. For the $k$-th subcarrier, the input–output relationship is

$$\mathbf{y} = \sqrt{p}\mathbf{H}\mathbf{x} + \mathbf{z},\tag{6}$$

where $\mathbf{x}$ is the normalized signal vector, $\mathbf{z}$ is the AWGN vector, and $\mathbf{H}$ is the channel matrix, given by

$$\mathbf{H} = \sum\_{\ell=0}^{L-1} \rho\_{\ell} \mathbf{H}\_{\ell} \exp(-j2\pi \frac{k}{K} \ell),\tag{7}$$

where $L$ represents the channel delay spread (the number of channel taps), $\{\rho\_\ell^2\}$ is the power delay profile satisfying $\sum\_{\ell=0}^{L-1}\rho\_\ell^2 = 1$, and $\mathbf{H}\_\ell$ stands for the channel matrix at time delay $\ell$. Furthermore, the matrices $\mathbf{H}\_\ell$, $\ell = 0, 1, \dots, L-1$, are mutually uncorrelated and Ricean distributed, and, as in (2), can be expressed as

$$\mathbf{H}\_{\ell} = \sqrt{\bar{\kappa}\_{\ell}}\, \overline{\mathbf{H}}\_{\ell} + \sqrt{\tilde{\kappa}\_{\ell}}\, \tilde{\mathbf{H}}\_{\ell}. \tag{8}$$

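In particular, $\overline{\mathbf{H}}\_\ell = \mathbf{r}\_\ell \mathbf{t}\_\ell^{\mathrm{T}}$ is defined as in (3), with angles of arrival and departure $\theta\_\ell$ and $\phi\_\ell$, and $\tilde{\mathbf{H}}\_\ell$ is likewise modeled as a random matrix with i.i.d. elements.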
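The per-subcarrier channel in (7) is a discrete Fourier transform of the tap matrices across the delay index. A minimal sketch, assuming the $L$ tap matrices $\mathbf{H}\_\ell$ are already available as nested lists; the function name is hypothetical:

```python
import cmath
import math


def subcarrier_channel(taps, rho, k, K):
    """Channel matrix of subcarrier k per (7): sum_l rho_l * H_l * exp(-j*2*pi*k*l/K).

    taps: list of L matrices (lists of rows), all M x N, one per delay l.
    rho:  list of L tap amplitudes whose squares sum to 1 (power delay profile).
    """
    M, N = len(taps[0]), len(taps[0][0])
    H = [[0j] * N for _ in range(M)]
    for l, (H_l, rho_l) in enumerate(zip(taps, rho)):
        phase = cmath.exp(-1j * 2 * math.pi * k * l / K)  # delay-l phase rotation
        for m in range(M):
            for n in range(N):
                H[m][n] += rho_l * H_l[m][n] * phase
    return H


# Two-tap toy example with equal tap powers (rho_0^2 = rho_1^2 = 1/2)
taps = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
rho = [math.sqrt(0.5)] * 2
H_dc = subcarrier_channel(taps, rho, k=0, K=4)    # every tap adds in phase
H_half = subcarrier_channel(taps, rho, k=2, K=4)  # tap 1 is rotated by exp(-j*pi) = -1
```

Looping `k` over `range(K)` yields all $K$ subcarrier matrices. Since the taps are mutually uncorrelated and $\sum\_\ell \rho\_\ell^2 = 1$, each subcarrier keeps the per-entry average power of a single tap.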

#### **3. LOS-Based EGT/EGC and Power Scaling Laws**

#### *3.1. The Scenario without Correlation*

The scattered component of the $k$-th subcarrier's channel matrix can be described as

$$
\tilde{\mathbf{H}} = \sum\_{\ell=0}^{L-1} \rho\_{\ell} \sqrt{\tilde{\kappa}\_{\ell}}\, \tilde{\mathbf{H}}\_{\ell} \exp(-j2\pi \frac{k}{K} \ell). \tag{9}
$$

Now, assume that $\tilde{\mathbf{H}}$ is not available, but the LOS component

$$\overline{\mathbf{H}} = \sum\_{\ell=0}^{L-1} \rho\_{\ell} \sqrt{\bar{\kappa}\_{\ell}}\, \overline{\mathbf{H}}\_{\ell} \exp \left( -j2\pi \frac{k}{K} \ell \right) \tag{10}$$

is available. In what follows, using only $\overline{\mathbf{H}}$, we present a linear processing scheme with EGT/EGC and then compare it with the MRT/MRC scheme based on perfect CSI.

Since $\tilde{\mathbf{H}}$ is not available, a pair of normalized weighting vectors $\mathbf{w}\_t$ and $\mathbf{w}\_r$ is chosen to maximize the effective output signal-to-noise ratio (SNR) of the LOS component. The largest eigenvalue of $\overline{\mathbf{H}}^\dagger \overline{\mathbf{H}}$ is denoted by $\lambda\_{\text{max}}(\overline{\mathbf{H}}^\dagger \overline{\mathbf{H}})$. Since $\mathbf{w}\_r^\dagger \mathbf{z}\_k \sim \mathcal{CN}(0, \sigma^2)$ and the unknown scattered component acts as interference, the effective output signal-to-interference-plus-noise ratio (SINR) can be described as

$$\gamma\_S^{(k)} = \frac{p|\mathbf{w}\_r^\dagger \overline{\mathbf{H}} \mathbf{w}\_t^{\mathrm{T}}|^2}{p|\mathbf{w}\_r^\dagger \tilde{\mathbf{H}} \mathbf{w}\_t^{\mathrm{T}}|^2 + \sigma^2} = \frac{p\lambda\_{\text{max}}(\overline{\mathbf{H}}^\dagger \overline{\mathbf{H}})}{p|\mathbf{w}\_r^\dagger \tilde{\mathbf{H}} \mathbf{w}\_t^{\mathrm{T}}|^2 + \sigma^2}. \tag{11}$$

We denote by $R\_S$ the ergodic achievable rate of the LOS-based scheme. Then,

$$R\_S = \mathbb{E}\{\frac{1}{K}\sum\_{k=0}^{K-1} \log\_2(1 + \gamma\_S^{(k)})\} = \frac{1}{K}\sum\_{k=0}^{K-1} R\_S^{(k)}\,. \tag{12}$$

where $R\_S^{(k)} = \mathbb{E}\{\log\_2(1 + \gamma\_S^{(k)})\}$. The derivation yields the following results.

**Lemma 1.** *Define $\lambda\_{\text{max}}^{(k)} = \lambda\_{\text{max}}(\overline{\mathbf{H}}^\dagger \overline{\mathbf{H}})$ and $\tilde{\kappa}\_S = \sum\_{\ell=0}^{L-1} \rho\_\ell^2 \tilde{\kappa}\_\ell$. Then,*

$$\log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{p\tilde{\kappa}\_S + \sigma^2}) \le R\_S^{(k)} \le \log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{\sigma^2}).\tag{13}$$

**Proof of Lemma 1.** Since $\log\_2(1 + 1/x)$ is convex in $x > 0$, Jensen's inequality readily yields the following lower bound on the ergodic achievable rate of the $k$-th subcarrier:

$$R\_S^{(k)} \ge \log\_2(1 + \frac{1}{\mathbb{E}(1/\gamma\_S^{(k)})}) = \log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{p\sum\_{\ell=0}^{L-1} \rho\_\ell^2 \mathcal{E}\_\ell + \sigma^2}),\tag{14}$$

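where $\mathcal{E}\_\ell = \tilde{\kappa}\_\ell\, \mathbb{E}|\mathbf{w}\_r^\dagger \tilde{\mathbf{H}}\_\ell \mathbf{w}\_t^{\mathrm{T}}|^2 = \tilde{\kappa}\_\ell$ for $0 \le \ell \le L-1$, because the entries of $\tilde{\mathbf{H}}\_\ell$ are i.i.d. with unit variance and the weighting vectors have unit norm. Thus,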

$$R\_S^{(k)} \ge \log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{p\tilde{\kappa}\_S + \sigma^2}).\tag{15}$$

Moreover, dropping the nonnegative interference term in (11) gives

$$R\_S^{(k)} = \mathbb{E}\log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{p|\mathbf{w}\_r^\dagger \tilde{\mathbf{H}} \mathbf{w}\_t^{\mathrm{T}}|^2 + \sigma^2}) \le \log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{\sigma^2}).\tag{16}$$

Thus, Lemma 1 holds.
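The bounds of Lemma 1 are easy to check numerically. The sketch below estimates $R\_S^{(k)}$ by Monte Carlo and compares it with (13) under two stated assumptions: (i) the weighting vectors have unit norm and depend only on $\overline{\mathbf{H}}$, so the interference term $\mathbf{w}\_r^\dagger \tilde{\mathbf{H}} \mathbf{w}\_t^{\mathrm{T}}$ is $\mathcal{CN}(0, \tilde{\kappa}\_S)$ and its power is exponentially distributed with mean $\tilde{\kappa}\_S$; (ii) $\lambda\_{\text{max}}^{(k)}$ is treated as a given number. All names and parameter values are illustrative.

```python
import math
import random


def lemma1_bounds(p, lam_max, kappa_tilde_S, sigma2):
    """Lower and upper bounds of (13) on the per-subcarrier ergodic rate."""
    lower = math.log2(1 + p * lam_max / (p * kappa_tilde_S + sigma2))
    upper = math.log2(1 + p * lam_max / sigma2)
    return lower, upper


def rate_monte_carlo(p, lam_max, kappa_tilde_S, sigma2, trials=20000, seed=0):
    """Monte Carlo estimate of E log2(1 + SINR) with the SINR of (11).

    Assumes |w_r^H Htilde w_t^T|^2 ~ Exponential(mean = kappa_tilde_S).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        interference = rng.expovariate(1 / kappa_tilde_S)  # expovariate takes the rate
        total += math.log2(1 + p * lam_max / (p * interference + sigma2))
    return total / trials


lo, hi = lemma1_bounds(p=1.0, lam_max=10.0, kappa_tilde_S=1.0, sigma2=1.0)
est = rate_monte_carlo(p=1.0, lam_max=10.0, kappa_tilde_S=1.0, sigma2=1.0)
print(f"{lo:.3f} <= {est:.3f} <= {hi:.3f}")
```

The estimate lands inside the interval, as Jensen's inequality guarantees for the lower bound. Making $p\tilde{\kappa}\_S$ small relative to $\sigma^2$ collapses the gap between the two bounds, which is what happens in the large-array regime with $p \to 0$.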

**Lemma 2.** *Let $\bar{\kappa}\_U = \sum\_{\ell=0}^{L-1} \rho\_\ell^2 \bar{\kappa}\_\ell$ and $\bar{\kappa}\_L = \max\{\rho\_\ell^2 \bar{\kappa}\_\ell, 0 \le \ell \le L-1\}$. Then,*

$$\bar{\kappa}\_L \le \lim\_{MN \to \infty} \frac{\lambda\_{\text{max}}^{(k)}}{MN} \le \bar{\kappa}\_U.\tag{17}$$

**Proof of Lemma 2.** For $0 \le \ell \le L-1$ and $0 \le b \le L-1$, we define $\varphi\_{\ell b} = 2\pi d\_r(\sin(\theta\_b) - \sin(\theta\_\ell))$ and then have

$$\frac{\overline{\mathbf{H}}\_{\ell}^{\dagger}\overline{\mathbf{H}}\_{b}}{MN} = \varrho\_{\ell b}\mathbf{U}\_{\ell b}, \tag{18}$$

where

$$\varrho\_{\ell b} = \frac{\mathbf{r}\_{\ell}^{\dagger} \mathbf{r}\_{b}}{M} = \begin{cases} 1, & \text{for } \ell = b; \\\frac{1 - e^{jM\varphi\_{\ell b}}}{M(1 - e^{j\varphi\_{\ell b}})}, & \text{for } \ell \neq b \end{cases} \tag{19}$$

and

$$\mathbf{U}\_{\ell b} = \frac{\mathbf{t}\_{\ell}^{\mathrm{T}\dagger} \mathbf{t}\_{b}^{\mathrm{T}}}{N} = [u\_{an}]\_{a,n=1}^{N,N} \tag{20}$$

with $u\_{an} = \frac{1}{N} e^{j2\pi d\_t((n-1)\sin(\phi\_b) - (a-1)\sin(\phi\_\ell))}$. Now, suppose that $M \ge N$. Noting that $\lim\_{M \to \infty} \varrho\_{\ell b} = 0$ for $\ell \neq b$ (distinct angles of arrival) and $\operatorname{tr}(\mathbf{U}\_{\ell\ell}) = 1$, we obtain

$$\lim\_{MN \to \infty} \frac{\lambda\_{\text{max}}(\overline{\mathbf{H}}^{\dagger} \overline{\mathbf{H}})}{MN} \le \sum\_{\ell=0}^{L-1} \rho\_{\ell}^{2} \bar{\kappa}\_{\ell} \lim\_{MN \to \infty} \lambda\_{\text{max}}(\frac{\overline{\mathbf{H}}\_{\ell}^{\dagger} \overline{\mathbf{H}}\_{\ell}}{MN}) = \bar{\kappa}\_{U}.\tag{21}$$

Moreover, we also have

$$\lim\_{MN \to \infty} \frac{\lambda\_{\text{max}}(\overline{\mathbf{H}}^{\dagger} \overline{\mathbf{H}})}{MN} \ge \bar{\kappa}\_L \tag{22}$$

since, for any $\ell$, choosing $\mathbf{w}\_t = \mathbf{t}\_\ell/\sqrt{N}$ and $\mathbf{w}\_r = \mathbf{r}\_\ell/\sqrt{M}$ gives

$$\lim\_{MN \to \infty} \frac{\lambda\_{\text{max}}(\overline{\mathbf{H}}^{\dagger}\overline{\mathbf{H}})}{MN} \ge \lim\_{MN \to \infty} \frac{|\mathbf{w}\_r^{\dagger}\overline{\mathbf{H}}\mathbf{w}\_t^{\mathrm{T}}|^2}{MN} = \rho\_\ell^2\bar{\kappa}\_\ell. \tag{23}$$

Therefore, Lemma 2 holds when $M \ge N$. When $N \ge M$, Lemma 2 can be proved similarly, based on the fact that $\lambda\_{\text{max}}(\overline{\mathbf{H}}^\dagger \overline{\mathbf{H}}) = \lambda\_{\text{max}}(\overline{\mathbf{H}}\,\overline{\mathbf{H}}^\dagger)$.
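Lemma 2 can also be checked numerically. The sketch below builds $\overline{\mathbf{H}}$ from (10) with the parameters of Section 5 ($L = 3$, $\rho\_\ell^2 = 1/3$, $\kappa = 5$ dB on every tap, angles $\pi/6$, $\pi/4$, $\pi/3$, spacing 0.5), forms $\overline{\mathbf{H}}^\dagger\overline{\mathbf{H}}$, and estimates its largest eigenvalue by power iteration. The helper names, the subcarrier index, and the array size $M = N = 64$ are illustrative assumptions; for finite arrays the ratio obeys the asymptotic bounds (17) only approximately.

```python
import cmath
import math
import random


def steering(n_ant, d, angle):
    return [cmath.exp(1j * 2 * math.pi * i * d * math.sin(angle)) for i in range(n_ant)]


def los_channel(M, N, rho, kappa_bar, theta, phi, k, K, d=0.5):
    """LOS matrix of (10): sum_l rho_l sqrt(kbar_l) r_l t_l^T exp(-j 2 pi k l / K)."""
    H = [[0j] * N for _ in range(M)]
    for l in range(len(rho)):
        r, t = steering(M, d, theta[l]), steering(N, d, phi[l])
        c = rho[l] * math.sqrt(kappa_bar[l]) * cmath.exp(-1j * 2 * math.pi * k * l / K)
        for m in range(M):
            for n in range(N):
                H[m][n] += c * r[m] * t[n]
    return H


def gram(H):
    """H^dagger H for H given as a list of rows."""
    cols = list(zip(*H))
    return [[sum(a.conjugate() * b for a, b in zip(cu, cv)) for cv in cols] for cu in cols]


def lambda_max(G, iters=200, seed=0):
    """Largest eigenvalue of a Hermitian PSD matrix via power iteration (Rayleigh quotient)."""
    rng = random.Random(seed)
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(len(G))]
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in G]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    w = [sum(g * x for g, x in zip(row, v)) for row in G]
    return sum(a.conjugate() * b for a, b in zip(v, w)).real


kappa = 10 ** (5 / 10)
rho = [math.sqrt(1 / 3)] * 3
kappa_bar = [kappa / (1 + kappa)] * 3
angles = [math.pi / 6, math.pi / 4, math.pi / 3]
M = N = 64
H_bar = los_channel(M, N, rho, kappa_bar, angles, angles, k=5, K=256)
ratio = lambda_max(gram(H_bar)) / (M * N)
kappa_L = max(r ** 2 * kb for r, kb in zip(rho, kappa_bar))
kappa_U = sum(r ** 2 * kb for r, kb in zip(rho, kappa_bar))
print(f"kappa_L={kappa_L:.4f}  ratio={ratio:.4f}  kappa_U={kappa_U:.4f}")
```

With well-separated angles the steering vectors are nearly orthogonal already at 64 antennas, so the ratio sits close to $\bar{\kappa}\_L$ rather than $\bar{\kappa}\_U$, in line with (34).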

**Proposition 1.** *If $E = MNp$ is fixed as $MN \to \infty$, then*

$$\log\_2(1 + \frac{E\bar{\kappa}\_L}{\sigma^2}) \le \lim\_{MN \to \infty} R\_S \le \log\_2(1 + \frac{E\bar{\kappa}\_U}{\sigma^2}).\tag{24}$$

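**Proof of Proposition 1.** If $E = MNp$ is fixed as $MN \to \infty$, then $p = E/(MN) \to 0$, so the term $p\tilde{\kappa}\_S$ in (13) vanishes, while $p\lambda\_{\text{max}}^{(k)} = E\lambda\_{\text{max}}^{(k)}/(MN)$. Combining Lemma 1 with the limits in Lemma 2 then yields Proposition 1.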
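As a worked example of Proposition 1, the sketch below evaluates the two limiting bounds $\log\_2(1 + E\bar{\kappa}\_L/\sigma^2)$ and $\log\_2(1 + E\bar{\kappa}\_U/\sigma^2)$ that follow from Lemmas 1 and 2, using the Section 5 parameters ($\rho\_\ell^2 = 1/3$, $\kappa = 5$ dB, $\sigma^2 = 1$) and $E = 20$ dB. Applying the same K-factor to all taps is an assumption, and the function name is hypothetical.

```python
import math


def prop1_bounds(E_dB, kappa_dB, rho_sq, sigma2=1.0):
    """Limiting lower/upper bounds on R_S (bits/s/Hz) when E = MNp is held fixed."""
    E = 10 ** (E_dB / 10)
    kappa = 10 ** (kappa_dB / 10)
    kappa_bar = kappa / (1 + kappa)                    # same K-factor on every tap
    kappa_L = max(r2 * kappa_bar for r2 in rho_sq)     # strongest single LOS tap
    kappa_U = sum(r2 * kappa_bar for r2 in rho_sq)     # all LOS taps combined
    return (math.log2(1 + E * kappa_L / sigma2),
            math.log2(1 + E * kappa_U / sigma2))


lower, upper = prop1_bounds(E_dB=20, kappa_dB=5, rho_sq=[1 / 3] * 3)
print(f"{lower:.2f} <= lim R_S <= {upper:.2f}")  # 4.72 <= lim R_S <= 6.27
```

With equal tap powers, $\bar{\kappa}\_L = \bar{\kappa}\_U/3$, so the interval is wide; Corollaries 1 and 2 identify which end is attained in the single-sided and two-sided large-array cases.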

**Remark 1.** *This proposition gives lower and upper bounds on the ergodic achievable rate of the LOS-based scheme. In the following special cases, exact expressions for the ergodic achievable rate can be obtained.*

**Corollary 1.** *When $N = 1$ and $E = Mp$ is fixed as $M \to \infty$, we have*

$$\lim\_{M \to \infty} R\_S = \log\_2(1 + \frac{E\bar{\kappa}\_U}{\sigma^2}). \tag{25}$$

*Similarly, when $M = 1$ and $E = Np$ is fixed as $N \to \infty$, we have*

$$\lim\_{N \to \infty} R\_S = \log\_2(1 + \frac{E\bar{\kappa}\_U}{\sigma^2}). \tag{26}$$

**Proof of Corollary 1.** When $M = 1$ or $N = 1$, we have $\lim\_{MN \to \infty} \lambda\_{\text{max}}(\overline{\mathbf{H}}^\dagger \overline{\mathbf{H}})/(MN) = \bar{\kappa}\_U$, from which it readily follows that $\lim\_{MN \to \infty} R\_S = \log\_2(1 + E\bar{\kappa}\_U/\sigma^2)$.
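For $N = 1$, the step behind Corollary 1 can be written out explicitly. With scalar transmit responses $\mathbf{t}\_\ell = 1$, the LOS channel (10) reduces to the vector $\bar{\mathbf{h}} = \sum\_{\ell=0}^{L-1}\rho\_\ell\sqrt{\bar{\kappa}\_\ell}\,\mathbf{r}\_\ell\, e^{-j2\pi k\ell/K}$, so that $\lambda\_{\text{max}}^{(k)} = \|\bar{\mathbf{h}}\|^2$. Expanding the norm with $\varrho\_{\ell b}$ from (19), and assuming distinct angles of arrival so that $\varrho\_{\ell b} \to 0$ for $\ell \neq b$, gives

$$\frac{\lambda\_{\text{max}}^{(k)}}{M} = \sum\_{\ell=0}^{L-1}\rho\_\ell^2\bar{\kappa}\_\ell + \sum\_{\ell \neq b}\rho\_\ell\rho\_b\sqrt{\bar{\kappa}\_\ell\bar{\kappa}\_b}\,\varrho\_{\ell b}\, e^{-j2\pi\frac{k}{K}(b-\ell)} \;\xrightarrow{M \to \infty}\; \bar{\kappa}\_U.$$

Since $p = E/M$ when $N = 1$, the term $p\tilde{\kappa}\_S$ in Lemma 1 vanishes and $\log\_2(1 + p\lambda\_{\text{max}}^{(k)}/\sigma^2)$ tends to $\log\_2(1 + E\bar{\kappa}\_U/\sigma^2)$. The case $M = 1$ follows in the same way with the transmit responses $\mathbf{t}\_\ell$.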

**Corollary 2.** *If $E = MNp$ is fixed as $M \to \infty$ and $N \to \infty$, then*

$$\lim\_{M,N \to \infty} R\_S = \log\_2(1 + \frac{E\bar{\kappa}\_L}{\sigma^2}).\tag{27}$$

**Proof of Corollary 2.** Without loss of generality, $\overline{\mathbf{H}}$ can be rewritten as

$$\overline{\mathbf{H}} = \sum\_{\ell=0}^{L-1} \rho\_{\ell} \sqrt{\bar{\kappa}\_{\ell}}\, \mathbf{r}\_{\ell} \mathbf{t}\_{\ell}^{\mathrm{T}} \exp(-j2\pi \frac{k}{K} \ell),\tag{28}$$

where the taps are ordered so that $\rho\_0\sqrt{\bar{\kappa}\_0} \ge \rho\_1\sqrt{\bar{\kappa}\_1} \ge \cdots \ge \rho\_{L-1}\sqrt{\bar{\kappa}\_{L-1}}$. We can rewrite $\overline{\mathbf{H}}$ in matrix form as

$$\overline{\mathbf{H}} = \mathbf{A}\_{r} \mathbf{D} \mathbf{A}\_{t}^{\mathrm{T}}, \tag{29}$$

where $\mathbf{D}$ is an $L \times L$ diagonal matrix with $[\mathbf{D}]\_{\ell\ell} = \rho\_\ell\sqrt{MN\bar{\kappa}\_\ell}$, and $\mathbf{A}\_r$ and $\mathbf{A}\_t$ are defined as follows:

$$\mathbf{A}\_{\mathbf{r}} = \frac{1}{\sqrt{M}} [\mathbf{r}\_0, \mathbf{r}\_1, \dots, \mathbf{r}\_{L-1}] \tag{30}$$

and

$$\mathbf{A}\_{t} = \frac{1}{\sqrt{N}} [\mathbf{t}\_{0}, \mathbf{t}\_{1} \exp(-j2\pi \frac{k}{K}), \dots, \mathbf{t}\_{L-1} \exp(-j2\pi \frac{k}{K}(L-1))].\tag{31}$$

Since both $\{\mathbf{r}\_0, \mathbf{r}\_1, \dots, \mathbf{r}\_{L-1}\}$ and $\{\mathbf{t}\_0, \mathbf{t}\_1, \dots, \mathbf{t}\_{L-1}\}$ become orthogonal vector sets as $M \to \infty$ and $N \to \infty$ [21], $\mathbf{A}\_r$ and $\mathbf{A}\_t$ asymptotically have orthonormal columns. Thus, $\overline{\mathbf{H}}$ admits the singular value decomposition (SVD)

$$\overline{\mathbf{H}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\dagger} = [\mathbf{A}\_{r}|\mathbf{A}\_{r}^{\perp}]\boldsymbol{\Sigma}[\mathbf{A}\_{t}|\mathbf{A}\_{t}^{\perp}]^{\mathsf{T}},\tag{32}$$

where **Σ** is a diagonal matrix including all singular values on its diagonal, i.e.,

$$[\boldsymbol{\Sigma}]\_{\ell\ell} = \begin{cases} \rho\_{\ell} \sqrt{MN\bar{\kappa}\_{\ell}}, & \text{for } 0 \le \ell \le L - 1, \\\ 0, & \text{for } \ell > L - 1. \end{cases} \tag{33}$$

Then,

$$\lim\_{M,N \to \infty} \frac{\lambda\_{\text{max}}^{(k)}}{MN} = \rho\_0^2 \bar{\kappa}\_0 = \bar{\kappa}\_L. \tag{34}$$

Thus, we finally obtain the desired result.

On the other hand, suppose that perfect CSI is available, i.e., both the LOS and the scattered components are known at the transmitter and the receiver. Then, the weighting vectors $\mathbf{w}\_t$ and $\mathbf{w}\_r$ are chosen to maximize the exact output SNR, which can be written as [11]

$$
\gamma\_P^{(k)} = \frac{p}{\sigma^2} \lambda\_{\text{max}}(\mathbf{H}^\dagger \mathbf{H}),
\tag{35}
$$

where $\lambda\_{\text{max}}(\mathbf{H}^\dagger\mathbf{H})$ stands for the largest eigenvalue of $\mathbf{H}^\dagger\mathbf{H}$. For the MRT/MRC scheme based on perfect CSI, let $R\_P = \mathbb{E}\{\frac{1}{K}\sum\_{k=0}^{K-1} \log\_2(1 + \gamma\_P^{(k)})\}$ denote its ergodic achievable rate. We now obtain the following power scaling law.

**Proposition 2.** *When $M \to \infty$ and $N \to \infty$, suppose that $E = MNp$ is fixed and $N/M \to \mu$. Then*

$$\lim\_{M,N \to \infty} R\_P = \lim\_{M,N \to \infty} R\_S. \tag{36}$$

**Proof of Proposition 2.** Since $\mathbf{H} = \overline{\mathbf{H}} + \tilde{\mathbf{H}}$, we have

$$\begin{aligned} \frac{1}{M}[\mathbf{H}^{\dagger}\mathbf{H}] &= \frac{1}{M}[(\overline{\mathbf{H}} + \tilde{\mathbf{H}})^{\dagger}(\overline{\mathbf{H}} + \tilde{\mathbf{H}})] \\ &= \frac{\overline{\mathbf{H}}^{\dagger}\overline{\mathbf{H}}}{M} + \frac{\overline{\mathbf{H}}^{\dagger}\tilde{\mathbf{H}}}{M} + \frac{\tilde{\mathbf{H}}^{\dagger}\overline{\mathbf{H}}}{M} + \frac{\tilde{\mathbf{H}}^{\dagger}\tilde{\mathbf{H}}}{M}. \end{aligned}\tag{37}$$

If we let

$$\mathbf{G} = \frac{\overline{\mathbf{H}}^{\dagger}\tilde{\mathbf{H}}}{M} = [g\_{uv}]\_{u,v=1}^{N,N}, \tag{38}$$

it follows that

$$g\_{uv} = \frac{1}{M} \sum\_{k=1}^{M} [\overline{\mathbf{H}}^{\dagger}]\_{uk} [\tilde{\mathbf{H}}]\_{kv}. \tag{39}$$

From (9) and (10), we have $|[\overline{\mathbf{H}}^\dagger]\_{uk}|^2 \le \bar{\kappa}\_S \le 1$ and $[\tilde{\mathbf{H}}]\_{kv} \sim \mathcal{CN}(0, \delta^2)$ with $\delta^2 = \tilde{\kappa}\_S \le 1$. Since the entries $[\tilde{\mathbf{H}}]\_{kv}$, $1 \le k \le M$, are mutually independent, we know that

$$g\_{uv} \sim \mathcal{CN}(0, \sigma\_{uv}^2), \quad \sigma\_{uv}^2 \le \frac{1}{M}. \tag{40}$$

Thus, as $M \to \infty$, $\mathbf{G} \to \mathbf{Q}$, where $\mathbf{Q}$ denotes the all-zero matrix. Similarly, as $M \to \infty$,

$$\mathbf{G}^{\dagger} = \frac{\tilde{\mathbf{H}}^{\dagger}\overline{\mathbf{H}}}{M} \to \mathbf{Q}.\tag{41}$$

Now, assume that $M$ is sufficiently large, so that the cross terms in (37) are negligible. Then,

$$\begin{split} \lambda\_{\text{max}}(\frac{\mathbf{H}^{\dagger}\mathbf{H}}{M}) &= \quad \lambda\_{\text{max}}(\frac{\overline{\mathbf{H}}^{\dagger}\overline{\mathbf{H}}}{M} + \frac{\tilde{\mathbf{H}}^{\dagger}\tilde{\mathbf{H}}}{M}) \\ &\leq \quad \lambda\_{\text{max}}(\frac{\overline{\mathbf{H}}^{\dagger}\overline{\mathbf{H}}}{M}) + \lambda\_{\text{max}}(\frac{\tilde{\mathbf{H}}^{\dagger}\tilde{\mathbf{H}}}{M}). \end{split} \tag{42}$$

As $M \to \infty$, suppose that $N/M \to \mu$. Then, noting that $[\tilde{\mathbf{H}}]\_{mn} \sim \mathcal{CN}(0, \tilde{\kappa}\_S)$, we readily obtain from ([22], Theorem 2.37) that

$$
\lambda\_{\text{max}}(\frac{1}{M}\tilde{\mathbf{H}}^{\dagger}\tilde{\mathbf{H}}) \to \tilde{\kappa}\_{S}(1+\sqrt{\mu})^2. \tag{43}
$$

Thus, we further get

$$
\lambda\_{\text{max}}(\frac{1}{MN}\mathbf{H}^{\dagger}\mathbf{H}) \le \lambda\_{\text{max}}(\frac{1}{MN}\overline{\mathbf{H}}^{\dagger}\overline{\mathbf{H}}) + \tilde{\kappa}\_{S}(1+\sqrt{\mu})^{2}/N.\tag{44}
$$

In addition, we can obtain

$$
\lambda\_{\text{max}}(\frac{1}{MN}\mathbf{H}^{\dagger}\mathbf{H}) \ge \lambda\_{\text{max}}(\frac{1}{MN}\overline{\mathbf{H}}^{\dagger}\overline{\mathbf{H}}).\tag{45}
$$

As $M \to \infty$ and $N \to \infty$, the last term in (44) vanishes, so combining (44) with (45) gives

$$
\lambda\_{\text{max}}(\frac{1}{MN}\mathbf{H}^{\dagger}\mathbf{H}) \to \lim\_{M,N \to \infty} \frac{\lambda\_{\text{max}}^{(k)}}{MN}.\tag{46}
$$

Note that

$$\begin{aligned} R\_P &= \frac{1}{K} \sum\_{k=0}^{K-1} \mathbb{E} \log\_2(1 + \frac{p}{\sigma^2} \lambda\_{\text{max}}(\mathbf{H}^\dagger(k)\mathbf{H}(k))) \\ &= \frac{1}{K} \sum\_{k=0}^{K-1} \mathbb{E} \log\_2(1 + \frac{pMN}{\sigma^2} \lambda\_{\text{max}}(\frac{\mathbf{H}^\dagger(k)\mathbf{H}(k)}{MN})). \end{aligned}\tag{47}$$

Therefore, when $M \to \infty$ and $N \to \infty$ with $E = pMN$ fixed, we finally have

$$\lim\_{M,N \to \infty} R\_P = \lim\_{M,N \to \infty} R\_S. \tag{48}$$

Thus, Proposition 2 holds.
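The key random-matrix step in the proof of Proposition 2 is (43). The sketch below checks it empirically for one draw, assuming i.i.d. $\mathcal{CN}(0, \tilde{\kappa}\_S)$ entries, $M = 160$, $N = 80$ (so $\mu = 1/2$, as in Section 5), and $\tilde{\kappa}\_S = 1/(1 + 10^{0.5})$ from a 5 dB K-factor on every tap. The function names are hypothetical, and for finite $M$ the two values agree only approximately.

```python
import math
import random


def gram_over_M(H):
    """(1/M) H^dagger H for an M x N matrix H given as a list of rows."""
    M = len(H)
    cols = list(zip(*H))
    return [[sum(a.conjugate() * b for a, b in zip(cu, cv)) / M for cv in cols]
            for cu in cols]


def largest_eigenvalue(G, iters=300, seed=0):
    """Power iteration on a Hermitian PSD matrix; returns the final Rayleigh quotient."""
    rng = random.Random(seed)
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(len(G))]
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in G]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    w = [sum(g * x for g, x in zip(row, v)) for row in G]
    return sum(a.conjugate() * b for a, b in zip(v, w)).real


def edge_check(M, N, kappa_tilde_S, seed=1):
    """Empirical lambda_max(Htilde^H Htilde / M) vs. kappa_tilde_S (1 + sqrt(N/M))^2."""
    rng = random.Random(seed)
    s = math.sqrt(kappa_tilde_S / 2)  # per-dimension std of a CN(0, kappa_tilde_S) entry
    H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)] for _ in range(M)]
    empirical = largest_eigenvalue(gram_over_M(H))
    predicted = kappa_tilde_S * (1 + math.sqrt(N / M)) ** 2
    return empirical, predicted


kappa_tilde_S = 1 / (1 + 10 ** 0.5)   # 5 dB K-factor on every tap, sum(rho^2) = 1
empirical, predicted = edge_check(M=160, N=80, kappa_tilde_S=kappa_tilde_S)
print(f"empirical {empirical:.3f} vs. predicted {predicted:.3f}")
```

Dividing both values by $N$, as in (44), shows why the scattered contribution disappears: it stays bounded while the normalization grows.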

**Remark 2.** *This proposition implies that, when the numbers of transmit and receive antennas grow large with a fixed ratio, the ergodic achievable rate of the LOS-based scheme has the same asymptotic value as that of the whole CSI-based scheme.*

#### *3.2. The Scenario with Correlation*

We now extend the proposed LOS-based EGT/EGC from the uncorrelated case to the scenario with doubly-ended spatial correlation, which requires modifying the MIMO system model of Section 2. The scattered matrices $\tilde{\mathbf{H}}\_\ell$, $\ell = 0, 1, \dots, L-1$, are now modeled as doubly-correlated Rayleigh fading with receive and transmit correlation matrices $\Phi\_\ell$ and $\Psi\_\ell$, i.e., [12],

$$
\tilde{\mathbf{H}}\_{\ell} = [\Phi\_{\ell}]^{1/2} \tilde{\mathbf{H}}\_{\ell}^{\omega} [\Psi\_{\ell}]^{1/2},\tag{49}
$$

where $\tilde{\mathbf{H}}\_\ell^{\omega}$ is an i.i.d. matrix with entries distributed as $\mathcal{CN}(0, 1)$. Since the LOS component of the channel matrix remains unchanged, the LOS-based weighting vectors $\mathbf{w}\_t$ and $\mathbf{w}\_r$ also remain unchanged. The ergodic achievable rate in the presence of spatial correlation is denoted by

$$R\_C = \frac{1}{K} \sum\_{k=0}^{K-1} R\_C^{(k)}, \quad R\_C^{(k)} = \mathbb{E}\{\log\_2(1 + \gamma\_C^{(k)})\}. \tag{50}$$

For $R\_C$, a similar derivation yields the following results.

**Lemma 3.**

$$\log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{p\tilde{\kappa}\_C + \sigma^2}) \le R\_C^{(k)} \le \log\_2(1 + \frac{p\lambda\_{\text{max}}^{(k)}}{\sigma^2}),\tag{51}$$

*where $\tilde{\kappa}\_C = \sum\_{\ell=0}^{L-1} \rho\_\ell^2 \tilde{\kappa}\_\ell \|\mathbf{w}\_r^\dagger [\Phi\_\ell]^{1/2}\|^2 \|[\Psi\_\ell]^{1/2} \mathbf{w}\_t^{\mathrm{T}}\|^2$.*
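The constant $\tilde{\kappa}\_C$ in Lemma 3 only needs quadratic forms, because $\|\mathbf{w}\_r^\dagger[\Phi\_\ell]^{1/2}\|^2 = \mathbf{w}\_r^\dagger\Phi\_\ell\mathbf{w}\_r$ for a Hermitian square root (and likewise for $\Psi\_\ell$), so no matrix square root is required. A minimal sketch; the names are hypothetical, `w_tT` stands for the column vector $\mathbf{w}\_t^{\mathrm{T}}$, and the steering-vector weights are only an example choice:

```python
import cmath
import math


def quad_form(A, x):
    """x^H A x for a Hermitian matrix A (list of rows) and vector x; returns a real number."""
    return sum(x[i].conjugate() * A[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x))).real


def kappa_tilde_C(rho, kappa_tilde, Phi, Psi, w_r, w_tT):
    """kappa_tilde_C = sum_l rho_l^2 kappa_tilde_l (w_r^H Phi_l w_r)(w_tT^H Psi_l w_tT)."""
    return sum(r ** 2 * kt * quad_form(P, w_r) * quad_form(S, w_tT)
               for r, kt, P, S in zip(rho, kappa_tilde, Phi, Psi))


def unit_steering(n, d, angle):
    """Unit-norm array response r/sqrt(n), used here as an example weighting vector."""
    return [cmath.exp(1j * 2 * math.pi * i * d * math.sin(angle)) / math.sqrt(n)
            for i in range(n)]


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


M, N, L = 8, 4, 3
rho = [math.sqrt(1 / 3)] * L
kappa_tilde = [1 / (1 + 10 ** 0.5)] * L
w_r = unit_steering(M, 0.5, math.pi / 6)
w_tT = unit_steering(N, 0.5, math.pi / 6)
# Without correlation (identity matrices), kappa_tilde_C reduces to kappa_tilde_S.
k_C = kappa_tilde_C(rho, kappa_tilde, [eye(M)] * L, [eye(N)] * L, w_r, w_tT)
print(round(k_C, 4))
```

Replacing the identity matrices with correlation matrices shows how correlation reshapes the interference term seen through fixed LOS-based weights.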

**Proposition 3.** *If $E = MNp$ is fixed as $MN \to \infty$, then*

$$\lim\_{MN \to \infty} R\_C = \lim\_{MN \to \infty} R\_S.\tag{52}$$

**Remark 3.** *This proposition implies that the ergodic achievable rates with and without spatial correlation have the same asymptotic value as $MN$ grows without bound.*

#### **4. Cooperative Relaying Systems**

#### *4.1. The Scenario with Decode-and-Forward Protocol*

Ricean fading often arises in cooperative MIMO systems [23]. Therefore, a Ricean MIMO channel model can be used to describe both the source-relay and relay-destination links. Again considering frequency-selective Ricean fading, we study a classical cooperative relay system with a source node, a destination node, and a relay node. The relay node, as well as the source and destination nodes, can be equipped with a large-scale antenna array. The three-node cooperative system is assumed to operate in half-duplex mode, and the relay node employs the decode-and-forward (DF) protocol. Each transmission is completed in two stages, so the cooperative system is a composite of two massive MIMO subsystems: one operating in the first stage and the other in the second stage. Therefore, the rate analysis above can be applied to the cooperative relay system.

We suppose that the relay uses $M$ antennas to receive and transmit data, the source has $N\_1$ antennas, and the destination has $N\_2$ antennas. We denote the average transmitted powers at the source and the relay by $p\_1$ and $p\_2$, respectively. In addition, we still let $R\_P$ and $R\_S$ represent the ergodic achievable rates of the whole CSI-based MRT/MRC scheme and the LOS-based EGT/EGC scheme, respectively. Then, we obtain the following power scaling property for the cooperative system.

**Proposition 4.** *For the source-relay link, when $M \to \infty$ and $N\_1 \to \infty$, let $E\_1 = MN\_1p\_1$ be fixed and $N\_1/M \to \mu\_1$. For the relay-destination link, when $M \to \infty$ and $N\_2 \to \infty$, let $E\_2 = MN\_2p\_2$ be fixed and $N\_2/M \to \mu\_2$. Then,*

$$\lim\_{M,N\_1,N\_2\to\infty} R\_P = \lim\_{M,N\_1,N\_2\to\infty} R\_S.\tag{53}$$

**Proof of Proposition 4.** From [24], the ergodic achievable rate with perfect CSI-based MRT/MRC is

$$R\_P = \min\{R\_P^{(1)}/2, R\_P^{(2)}/2\},\tag{54}$$

where $R\_P^{(1)}$ and $R\_P^{(2)}$ are the ergodic achievable rates of the source-relay and relay-destination links, respectively. Similarly, the ergodic achievable rate with the LOS-only EGT/EGC is given by

$$R\_S = \min\{R\_S^{(1)} / 2, R\_S^{(2)} / 2\},\tag{55}$$

where $R\_S^{(1)}$ and $R\_S^{(2)}$ are the ergodic achievable rates of the source-relay and relay-destination links, respectively. Under the conditions of Proposition 4, Proposition 2 gives

$$\lim\_{M,N\_1 \to \infty} R\_P^{(1)} = \lim\_{M,N\_1 \to \infty} R\_S^{(1)} \tag{56}$$

and

$$\lim\_{M,N\_2 \to \infty} R\_P^{(2)} = \lim\_{M,N\_2 \to \infty} R\_S^{(2)}.\tag{57}$$

Thus, it is easy to obtain the desired result (53).
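Combining (54) and (55) with Corollary 2 gives a closed-form value for the common limit in (53): each hop tends to $\log\_2(1 + E\_i\bar{\kappa}\_{L,i}/\sigma\_i^2)$, and the two-hop DF rate is half the smaller of the two. The sketch below evaluates this under the assumption that both hops satisfy the conditions of Corollary 2; the per-hop quantities $\bar{\kappa}\_{L,i}$ and $\sigma\_i^2$ and the function name are introduced here for illustration.

```python
import math


def df_rate_limit(E, kappa_L, sigma2):
    """Asymptotic two-hop DF rate: min over hops i of log2(1 + E_i kappa_L_i / sigma2_i) / 2.

    E, kappa_L, sigma2: length-2 sequences (source-relay hop, relay-destination hop).
    """
    hop_rates = [math.log2(1 + e * k / s) for e, k, s in zip(E, kappa_L, sigma2)]
    return min(hop_rates) / 2  # half-duplex: each hop occupies half of the time


kL = (1 / 3) * (10 ** 0.5 / (1 + 10 ** 0.5))   # rho_l^2 = 1/3, 5 dB K-factor
balanced = df_rate_limit(E=(100, 100), kappa_L=(kL, kL), sigma2=(1, 1))
weak_second_hop = df_rate_limit(E=(100, 10), kappa_L=(kL, kL), sigma2=(1, 1))
print(f"{balanced:.3f} {weak_second_hop:.3f}")
```

The second call shows the bottleneck behaviour of DF: lowering $E\_2$ alone pulls the end-to-end limit down to half the rate of the weaker relay-destination hop.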

**Remark 4.** *This proposition implies that, when the numbers of source, relay, and destination antennas grow large with fixed ratios, the ergodic achievable rate of the LOS-based scheme also has the same asymptotic value as that of the whole CSI-based scheme.*

#### *4.2. The Scenario with Amplify-and-Forward Protocol*

DF is a regenerative relaying strategy. We now replace DF with the nonregenerative amplify-and-forward (AF) strategy and obtain the following power scaling law.

**Proposition 5.** *Suppose that $N\_1 = N\_2 = 1$. When $M \to \infty$, let $E\_1 = Mp\_1$ and $E\_2 = MN\_2p\_2$ be fixed. Then,*

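$$\lim\_{M \to \infty} R\_S = \log\_2(1 + (\frac{E\_1 \bar{\kappa}\_{U1}}{\sigma\_1^2} \cdot \frac{E\_2 \bar{\kappa}\_{U2}}{\sigma\_2^2}) / (\frac{E\_1 \bar{\kappa}\_{U1}}{\sigma\_1^2} + \frac{E\_2 \bar{\kappa}\_{U2}}{\sigma\_2^2} + 1)), \tag{58}$$

*where $\bar{\kappa}\_{U1}$ and $\bar{\kappa}\_{U2}$ denote $\bar{\kappa}\_U$ for the source-relay and relay-destination links, and $\sigma\_1^2$ and $\sigma\_2^2$ are the noise variances at the relay and the destination, respectively.*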

**Proof of Proposition 5.** We denote by $\gamma\_{S1}^{(k)}$ and $\gamma\_{S2}^{(k)}$ the instantaneous output SNRs of the source-relay and relay-destination links, respectively. From [25], we obtain

$$\begin{aligned} R\_S &= \frac{1}{K} \sum\_{k=0}^{K-1} \mathbb{E} \{ \log\_2(1 + \gamma\_S^{(k)}) \} \\ &= \frac{1}{K} \sum\_{k=0}^{K-1} \mathbb{E} \{ \log\_2(1 + (\gamma\_{S1}^{(k)} \cdot \gamma\_{S2}^{(k)}) / (\gamma\_{S1}^{(k)} + \gamma\_{S2}^{(k)} + 1)) \}. \end{aligned} \tag{59}$$

Based on the proof of Lemma 2, we obtain the following asymptotic SNR expressions:

$$\lim\_{M \to \infty} \gamma\_{S1}^{(k)} = \frac{E\_1 \bar{\kappa}\_{U1}}{\sigma\_1^2} \tag{60}$$

and

$$\lim\_{M \to \infty} \gamma\_{S2}^{(k)} = \frac{E\_2 \bar{\kappa}\_{U2}}{\sigma\_2^2}. \tag{61}$$

Thus, the power scaling law (58) holds.
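The AF limit combines the two per-hop asymptotic SNRs $\gamma\_1$ and $\gamma\_2$ (the right-hand sides of (60) and (61)) through $\gamma\_1\gamma\_2/(\gamma\_1 + \gamma\_2 + 1)$, as in (59). A minimal sketch, with a hypothetical function name, that evaluates the limit from those two SNRs:

```python
import math


def af_rate_limit(gamma1, gamma2):
    """Limit of R_S for AF relaying: log2(1 + g1*g2 / (g1 + g2 + 1)), cf. (58)-(59)."""
    return math.log2(1 + gamma1 * gamma2 / (gamma1 + gamma2 + 1))


rate = af_rate_limit(10.0, 10.0)   # both hops at an asymptotic SNR of 10 (10 dB)
print(round(rate, 3))
```

Because $\gamma\_1\gamma\_2/(\gamma\_1 + \gamma\_2 + 1) < \min(\gamma\_1, \gamma\_2)$, the AF limit never exceeds the single-hop rate of the weaker link.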

#### **5. Simulation Results**

In this section, we present analytical and simulation results for OFDM-MIMO systems in frequency-selective Ricean fading channels. In all simulations, the spacing between adjacent antennas at both the transmitter and the receiver is 0.5 (normalized by the wavelength). We set the number of subcarriers to $K = 256$, the channel delay spread to $L = 3$, and the noise variance to $\sigma^2 = 1$. In addition, we let $\rho\_0^2 = \rho\_1^2 = \rho\_2^2 = 1/3$, $\theta\_0 = \phi\_0 = \pi/6$, $\theta\_1 = \phi\_1 = \pi/4$, and $\theta\_2 = \phi\_2 = \pi/3$. In Figures 1 and 2, the Ricean K-factor $\kappa$ is fixed at 5 dB.

To verify Propositions 1 and 3, we first consider the scenario with spatial correlation when $N = 3$ and $E = 20$ dB. The spatial correlation among antennas is assumed to follow the exponential model, i.e., the correlation magnitude between antennas $p$ and $q$ is $c(p, q) = g^{|p-q|}$, where $g$ denotes the correlation coefficient [12]. Accordingly, the $(i, j)$-th entries of the correlation matrices are $[\Phi\_\ell]\_{ij} = (g\_r^{(\ell)})^{|i-j|}$ and $[\Psi\_\ell]\_{ij} = (g\_t^{(\ell)})^{|i-j|}$, $\ell = 0, 1, 2$. Moreover, we set $g\_r^{(0)} = g\_r^{(1)} = g\_r^{(2)} = g\_r$ and $g\_t^{(0)} = g\_t^{(1)} = g\_t^{(2)} = g\_t$. For the correlation coefficients $g\_t = g\_r = g = 0, 0.3, 0.6, 0.9$, as $M$ increases from 6 to 60, Figure 1 shows the exact average rate $R\_C$ together with the upper and lower bounds of $R\_S$. The exact ergodic rate $R\_C$ increases as the number of receive antennas $M$ grows and always lies between the two bounds of $R\_S$. As the correlation coefficients $(g\_t, g\_r)$ increase, $R\_C$ moves closer to the upper bound and increasingly exceeds $R\_S$. This indicates that, compared with the uncorrelated scenario, spatial correlation improves the rate performance of the LOS-based EGT/EGC scheme. Therefore, when the LOS-based scheme is employed, spatial correlation yields performance benefits, which contrasts with the conventional view. This suggests that compactly arranged large-scale antenna arrays are practical in this setting.
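The exponential correlation model and the doubly-correlated scattering matrix in (49) can be generated as follows. This sketch uses Cholesky factors $\mathbf{L}\_\Phi$ and $\mathbf{L}\_\Psi$ in place of the Hermitian square roots, drawing $\mathbf{L}\_\Phi\tilde{\mathbf{H}}^{\omega}\mathbf{L}\_\Psi^{\mathrm{T}}$; for the real symmetric matrices of this model the result has the same distribution as (49). The function names are hypothetical, and the coefficient is assumed to satisfy $0 \le g < 1$ so that the matrix is positive definite.

```python
import math
import random


def exp_corr(n, g):
    """Exponential correlation matrix with entries g**|i - j|."""
    return [[g ** abs(i - j) for j in range(n)] for i in range(n)]


def cholesky(A):
    """Lower-triangular L with L L^T = A for a real symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def correlated_scattering(M, N, g_r, g_t, rng):
    """One draw of L_Phi H_w L_Psi^T with i.i.d. CN(0, 1) entries in H_w."""
    s = math.sqrt(0.5)
    H_w = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)] for _ in range(M)]
    L_phi = cholesky(exp_corr(M, g_r))
    L_psi_T = [list(col) for col in zip(*cholesky(exp_corr(N, g_t)))]  # transpose
    return matmul(matmul(L_phi, H_w), L_psi_T)


Phi = exp_corr(4, 0.6)
L_phi = cholesky(Phi)
H_tilde = correlated_scattering(6, 3, g_r=0.9, g_t=0.9, rng=random.Random(3))
```

Setting `g_r = g_t = 0` reproduces the uncorrelated case, since the exponential matrix then reduces to the identity.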

**Figure 1.** The ergodic achievable rate versus the number of receive antennas for comparing the case with correlation and the case without correlation.

**Figure 2.** The ergodic achievable rate versus the number of receive antennas for comparing the LOS-based scheme with the whole CSI-based scheme.

Next, we validate Proposition 2 by comparing the ergodic achievable rate of the proposed LOS-based EGT/EGC scheme with that of the perfect CSI-based MRT/MRC scheme. We set $\mu = 1/2$ as the numbers of antennas at the transmitter and receiver grow large. For $E = 10, 20, 30$ dB, as $M$ increases from 6 to 60, Figure 2 plots the two ergodic achievable rates $R\_S$ and $R\_P$. Figure 2 shows that both ergodic achievable rates tend to the same limit for each given value of $E$. However, as $E$ increases, the rates appear to converge more slowly.

Finally, we turn to the classical DF cooperative relay system consisting of the source-relay and relay-destination links, using identical parameters in the two links. With $N\_1 = N\_2 = 6$, as $M$ increases from 6 to 60, Figure 3 plots the two average rates $R\_S$ and $R\_P$ for $\kappa = 5, 15$ dB. Figure 3 shows that, as $\kappa$ increases, both $R\_S$ and $R\_P$ improve and $R\_S$ approaches $R\_P$. Note that $R\_P$ denotes the average rate of the traditional linear processing scheme based on the whole CSI, as considered in [6]. For a comprehensive comparison with the whole CSI-based scheme in Rayleigh fading discussed in [24], Figure 3 also includes the rate curve for $\kappa = -\infty$ dB. Interestingly, with $\kappa = 5$ dB, the LOS-based scheme consistently outperforms the whole CSI-based scheme in Rayleigh fading.

**Figure 3.** The ergodic achievable rate versus the number of relay antennas for comparing the case with Ricean fading and the case with Rayleigh fading.

#### **6. Conclusions**

In this paper, we have developed a LOS-based EGT/EGC transmission scheme for point-to-point massive MIMO systems in frequency-selective Ricean fading channels, both without and with spatial correlation. In particular, we have derived expressions for the system achievable rate and established several power scaling laws. In addition, we have generalized our analysis to cooperative relaying scenarios with the DF and AF protocols. Our simulation results show that spatial correlation can improve the system performance and is thus an advantage, contrary to the conventional view. Compared with Rayleigh fading environments, Ricean fading environments appear better suited to the deployment of large-scale antenna arrays; for instance, massive MIMO can be applied in microwave backhaul links [26].

**Author Contributions:** Formal Analysis, Q.Y.; Validation, Y.S.; Writing—Original Draft Preparation, Q.Y., Y.S.; Writing—Review and Editing, D.-W.Y.; Supervision, D.-W.Y.

**Funding:** This research was funded by the open research fund of State Key Laboratory of Integrated Services Networks, the Fundamental Research Funds for the Central Universities Grant No. 3132016347, and the Natural Science Foundation of Liaoning Province Grant No. 201602086.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Hybrid Beamforming for Millimeter-Wave Heterogeneous Networks**

#### **Mostafa Hefnawi**

Department of Electrical and Computer Engineering, Royal Military College of Canada, Kingston, ON K7K 7B4, Canada; hefnawi@rmc.ca

Received: 23 December 2018; Accepted: 23 January 2019; Published: 28 January 2019

**Abstract:** Heterogeneous networks (HetNets) employing massive multiple-input multiple-output (MIMO) and millimeter-wave (mmWave) technologies have emerged as a promising solution to enhance the network capacity and coverage of next-generation 5G cellular networks. However, traditional fully-digital MIMO beamforming methods, which require one radio frequency (RF) chain per antenna element, are not practical for large-scale antenna arrays due to their high cost and power consumption. To reduce the number of RF chains, hybrid analog and digital beamforming has been proposed as an alternative structure. In this paper, therefore, we consider a HetNet formed by one macro-cell base station (MBS) and multiple small-cell base stations (SBSs) equipped with large-scale antenna arrays that employ hybrid analog and digital beamforming. The analog beamforming weight vectors of the MBS and the SBSs correspond to the best fixed multi-beams obtained by eigendecomposition schemes. The digital beamforming weights, on the other hand, are optimized to maximize the receive signal-to-interference-plus-noise ratio (SINR) of the effective channels, which consist of the cascade of the analog beamforming weights and the actual channel. The performance is evaluated in terms of the beampatterns and the ergodic channel capacity, and the results show that the proposed hybrid beamforming scheme achieves near-optimal performance with only four RF chains while requiring considerably less computational complexity.

**Keywords:** hybrid beamforming; massive MIMO; HetNets; mmWaves

#### **1. Introduction**

Recently, heterogeneous networks (HetNets) that use massive multiple-input multiple-output (MIMO) and millimeter-wave (mmWave) technologies have emerged as a promising solution to enhance the network capacity and coverage of next-generation 5G cellular networks [1–6]. Small-cell deployment in HetNets can achieve a high signal-to-interference-plus-noise ratio (SINR) and dense spectrum reuse, mmWave can address the current shortage of bandwidth, and large-scale antenna arrays [7–10] are essential for mmWave links to compensate for the high channel attenuation. In Reference [11], we applied the concept of massive multiuser (MU)-MIMO to enhance both the access and the backhaul links in HetNets, and showed that it can significantly improve system performance in terms of link reliability, spectral efficiency, and energy efficiency. Traditional MIMO beamforming systems require a dedicated radio frequency (RF) chain for each antenna element, which becomes impractical for massive MIMO systems because of both cost and power consumption. To reduce the number of RF chains, hybrid beamforming (HBF), which combines RF analog and baseband digital beamformers, has been proposed as a promising solution [12–17]. Figure 1 shows a general hybrid configuration that connects $N\_a$ antenna elements to $N\_d$ RF chains, where $N\_d < N\_a$, using an analog RF beamforming matrix built only from phase shifters. Two widely used analog beamformer architectures for hybrid beamforming are shown in Figure 2. The fully-connected structure of Figure 2a provides the full beamforming gain per transceiver, but at the price of high complexity, by connecting each RF chain to all antennas through a network of $N\_d \times N\_a$ phase shifters [12–15]. Figure 2b, on the other hand, shows a partially-connected structure, where each RF chain is connected to its own sub-array of $N\_a/N\_d$ antennas. Such a structure has lower hardware complexity at the price of reduced beamforming gain.
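The hardware difference between the two structures of Figure 2 can be made concrete by counting phase shifters. The following is a minimal NumPy sketch, assuming ideal unit-modulus phase shifters, the sizes $N\_a = 64$ and $N\_d = 4$ used later in Section 4, and random phases as placeholders for designed weights; the helper names are hypothetical.

```python
import numpy as np


def fully_connected_mask(n_a: int, n_d: int) -> np.ndarray:
    """Every RF chain reaches every antenna: an N_a x N_d grid of phase shifters."""
    return np.ones((n_a, n_d), dtype=bool)


def partially_connected_mask(n_a: int, n_d: int) -> np.ndarray:
    """RF chain j drives only its own sub-array of N_a / N_d adjacent antennas."""
    if n_a % n_d:
        raise ValueError("N_a must be a multiple of N_d")
    m = n_a // n_d
    mask = np.zeros((n_a, n_d), dtype=bool)
    for j in range(n_d):
        mask[j * m:(j + 1) * m, j] = True
    return mask


def random_analog_matrix(mask: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Unit-modulus entries (ideal phase shifters) where a connection exists, zero elsewhere."""
    phases = rng.uniform(0.0, 2 * np.pi, size=mask.shape)
    return np.where(mask, np.exp(1j * phases), 0.0)


rng = np.random.default_rng(0)
n_a, n_d = 64, 4
A_full = random_analog_matrix(fully_connected_mask(n_a, n_d), rng)
A_part = random_analog_matrix(partially_connected_mask(n_a, n_d), rng)
print(np.count_nonzero(A_full), np.count_nonzero(A_part))  # -> 256 64
```

The fully-connected matrix needs $N\_a N\_d = 256$ phase shifters, whereas the partially-connected one needs only $N\_a = 64$; this is the hardware saving that is traded against beamforming gain.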

**Figure 1.** Hybrid beamforming.

**Figure 2.** The architecture of analog beamformers: (**a**) Fully-connected structure; (**b**) partially-connected structure.

Previous studies on massive hybrid MIMO mainly focused on single-user systems [12–14], whereas MU-MIMO cases were studied in References [15–17]. In Reference [15], a scheme called Joint Spatial Division Multiplexing (JSDM) was proposed to create multiple virtual sectors, which reduces the overhead and computational complexity of downlink training and uplink feedback. In References [16,17], it was shown that the number of RF chains only needs to be twice the number of data streams to achieve the same performance as any fully-digital beamforming scheme. These studies, however, did not consider HBF in the context of HetNets and focused primarily on macro-cellular systems. In this paper, we consider a HetNet where both the macro-cell base stations (MBSs) and the small-cell base stations (SBSs) are equipped with fully-connected massive hybrid antenna arrays, while all mobile users have a single antenna. We propose a low-complexity HBF that is based entirely on eigenbeamforming. The MBSs and the SBSs select the best fixed multi-beams by eigendecomposition of the access and backhaul channels. The selected beams are then used by the digital beamformers, which maximize the receive SINR of the effective channels consisting of the cascade of the analog beamforming weights and the actual channel [18,19].

#### **2. System Model**

We consider the access and backhaul uplinks in the HetNet of Figure 3, where $K$ cognitive small cells and their $L\_s$ small-cell users (SUs) concurrently share the same frequency band with one MBS and its $L\_p$ macro-cell primary users (PUs). Both the MBS and the SBSs are assumed to be equipped with massive hybrid antenna arrays, while the SUs and PUs are equipped with a single antenna. The SBSs act as smart relays between the SUs and the MBS, with $N\_a$-element transmit/receive antenna arrays and $N\_d$ RF chains. The MBS is equipped with an $M\_a$-element antenna array and $M\_d$ RF chains. It is also assumed that both the SBSs and the MBS use OFDM-based transmission, and that the analog beamformers are common to all subcarriers, whereas the digital beamformers are adapted on each subcarrier.

Let $\mathbf{x}\_s[f\_n] = \left[x\_1^s, x\_2^s, \cdots, x\_{L\_s}^s\right]^T$ and $\mathbf{x}\_p[f\_n] = \left[x\_1^p, x\_2^p, \cdots, x\_{L\_p}^p\right]^T$ denote, respectively, the signals of the $L\_s$ SUs and the $L\_p$ PUs transmitted on subcarrier $f\_n$, $n = 1, \cdots, N\_c$, where $N\_c$ denotes the number of subcarriers per OFDM symbol. The analysis is done separately on each subcarrier; for brevity, we therefore drop the frequency index $f\_n$.

**Figure 3.** System model: hybrid beamforming-based HetNet with one macro-cell and *K* small-cells.

#### *2.1. Access Link*

The $N\_a \times 1$ received signal vector $\mathbf{y}\_{k,SBS}$ at the $k$th SBS is given by

$$\mathbf{y}\_{k,SBS} = \mathbf{G}\_{k,SU}\mathbf{x}\_s + \mathbf{G}\_{k,PU}\mathbf{x}\_p + \mathbf{n}\_{k,SBS} \tag{1}$$

where $\mathbf{G}\_{k,SU} \in \mathbb{C}^{N\_a \times L\_s}$ is the channel matrix between the $k$th SBS and its $L\_s$ users, $\mathbf{G}\_{k,PU} \in \mathbb{C}^{N\_a \times L\_p}$ is the channel matrix between the $k$th SBS and the $L\_p$ PUs, $\mathbf{x}\_s \in \mathbb{C}^{L\_s \times 1}$ is the transmitted signal vector of the $L\_s$ users in the $k$th small cell, $\mathbf{x}\_p \in \mathbb{C}^{L\_p \times 1}$ is the transmitted signal vector of the $L\_p$ PUs in the HetNet, and $\mathbf{n}\_{k,SBS} \in \mathbb{C}^{N\_a \times 1}$ is the complex additive white Gaussian noise (AWGN) vector received at the $k$th SBS.

Note that Equation (1) neglects the interference between small cells. This is justified by the fact that the SBSs use a large number of antennas, which enables sharp beamforming towards their own users without harming neighboring small cells.

The received signal of the $k$th SBS, $\mathbf{y}\_{k,SBS}$, is first applied to an $N\_a \times N\_d$ receive analog beamforming matrix $\mathbf{A}\_{R,k,l\_s}^{SBS}$, whose output is fed to an $N\_d \times 1$ receive digital beamforming weight vector $\mathbf{D}\_{R,k,l\_s}^{SBS}$. If we denote the combined digital-analog receive beamformer for the $l\_s$th user as $\mathbf{w}\_{R,k,l\_s} = \mathbf{A}\_{R,k,l\_s}^{SBS}\mathbf{D}\_{R,k,l\_s}^{SBS}$, then the detection of the $l\_s$th user signal by its $k$th SBS can be expressed as

$$\begin{split} \hat{x}\_{k,l\_s} &= \mathbf{w}\_{R,k,l\_s}^{H} \mathbf{y}\_{k,SBS} = \mathbf{w}\_{R,k,l\_s}^{H} \mathbf{G}\_{k,SU} \mathbf{x}\_{s} + \mathbf{w}\_{R,k,l\_s}^{H} \mathbf{G}\_{k,PU} \mathbf{x}\_{p} + \mathbf{w}\_{R,k,l\_s}^{H} \mathbf{n}\_{k,SBS} \\ &= \mathbf{w}\_{R,k,l\_s}^{H} \mathbf{g}\_{k,l\_s} \mathbf{x}\_{l\_s}^{s} + \mathbf{w}\_{R,k,l\_s}^{H} \sum\_{i=1, i \neq l\_s}^{L\_s} \mathbf{g}\_{k,i} \mathbf{x}\_{i}^{s} + \mathbf{w}\_{R,k,l\_s}^{H} \mathbf{G}\_{k,PU} \mathbf{x}\_{p} + \mathbf{w}\_{R,k,l\_s}^{H} \mathbf{n}\_{k,SBS} \end{split} \tag{2}$$

where $\mathbf{g}\_{k,l\_s}$ is the $l\_s$th column of $\mathbf{G}\_{k,SU}$, which represents the channel between the $k$th SBS and its $l\_s$th user.
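Because the detection in Equation (2) is linear in $\mathbf{y}\_{k,SBS}$, the output of the combined beamformer $\mathbf{w}\_{R,k,l\_s} = \mathbf{A}\_{R,k,l\_s}^{SBS}\mathbf{D}\_{R,k,l\_s}^{SBS}$ splits exactly into the desired, intra-cell interference, PU interference and noise terms. The sketch below checks this numerically; the i.i.d. Gaussian channels, QPSK symbols, random phase-only analog matrix and random digital weights are placeholder assumptions, not the designed beamformers.

```python
import numpy as np

rng = np.random.default_rng(1)
N_a, N_d, L_s, L_p = 16, 4, 4, 4
l_s = 0  # index of the SU being detected


def crandn(*shape):
    """Circularly-symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def qpsk(n):
    """n unit-power QPSK symbols."""
    return np.exp(1j * np.pi / 4 * (2 * rng.integers(0, 4, n) + 1))


G_su, G_pu = crandn(N_a, L_s), crandn(N_a, L_p)    # G_{k,SU}, G_{k,PU}
x_s, x_p = qpsk(L_s), qpsk(L_p)                    # SU and PU symbols
n = 0.1 * crandn(N_a)                              # n_{k,SBS}

A = np.exp(1j * rng.uniform(0, 2 * np.pi, (N_a, N_d)))  # phase-only analog matrix
D = crandn(N_d)                                         # N_d x 1 digital weights
w = A @ D                                               # combined beamformer w_{R,k,l_s}

y = G_su @ x_s + G_pu @ x_p + n                    # Equation (1)
x_hat = np.vdot(w, y)                              # w^H y (vdot conjugates its first argument)

others = [i for i in range(L_s) if i != l_s]
desired = np.vdot(w, G_su[:, l_s]) * x_s[l_s]      # w^H g_{k,l_s} x_{l_s}
intra = np.vdot(w, G_su[:, others] @ x_s[others])  # the L_s - 1 other SUs
pu = np.vdot(w, G_pu @ x_p)                        # the L_p PUs
noise = np.vdot(w, n)
```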

If we denote $\mathbf{H}\_{AL,k,l\_s} = \left(\mathbf{A}\_{R,k,l\_s}^{SBS}\right)^H \mathbf{g}\_{k,l\_s}$ as the effective access channel between the $k$th SBS and its $l\_s$th user, then for a set of selected beams, i.e., for known $\mathbf{A}\_{R,k,l\_s}^{SBS}$, the SINR can be expressed in terms of the digital beamformer $\mathbf{D}\_{R,k,l\_s}^{SBS}$ as

$$\gamma\_{k,l\_s}^{SBS} = \frac{\left(\mathbf{D}\_{R,k,l\_s}^{SBS}\right)^H \mathbf{H}\_{AL,k,l\_s} \mathbf{x}\_{l\_s}^{s} \left(\mathbf{x}\_{l\_s}^{s}\right)^H \mathbf{H}\_{AL,k,l\_s}^H \mathbf{D}\_{R,k,l\_s}^{SBS}}{\left(\mathbf{D}\_{R,k,l\_s}^{SBS}\right)^H \mathbf{B}\_{AL} \mathbf{D}\_{R,k,l\_s}^{SBS}}\tag{3}$$

where **B***AL* is the covariance matrix of the interference-plus-noise given by

$$\mathbf{B}\_{AL} = \underbrace{\sum\_{i=1, i \neq l\_s}^{L\_s} \left(\mathbf{A}\_{R,k,l\_s}^{SBS}\right)^H \mathbf{g}\_{k,i}\mathbf{x}\_i^s \left(\mathbf{x}\_i^s\right)^H \mathbf{g}\_{k,i}^H \mathbf{A}\_{R,k,l\_s}^{SBS}}\_{\text{Interference from } L\_s - 1 \text{ SUs}} + \underbrace{\left(\mathbf{A}\_{R,k,l\_s}^{SBS}\right)^H \mathbf{G}\_{k,PU} \mathbf{x}\_p \mathbf{x}\_p^H \mathbf{G}\_{k,PU}^H \mathbf{A}\_{R,k,l\_s}^{SBS}}\_{\text{Interference from } L\_p \text{ PUs}} + \sigma\_n^2 \left(\mathbf{A}\_{R,k,l\_s}^{SBS}\right)^H \mathbf{A}\_{R,k,l\_s}^{SBS} \tag{4}$$
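A minimal NumPy sketch of the access-link SINR of Equation (3) with the interference-plus-noise covariance $\mathbf{B}\_{AL}$ defined above. It assumes independent unit-power symbols, so each $\mathbf{x}\mathbf{x}^H$ term is replaced by its expectation; the random channels, analog matrix and digital weights are placeholders. As a consistency check, the same value follows from applying the combined weight $\mathbf{w} = \mathbf{A}\mathbf{D}$ directly to the full-dimensional channels.

```python
import numpy as np


def access_link_sinr(D, A, G_su, G_pu, l_s, noise_var):
    """SINR of Equation (3) with the covariance B_AL built in the reduced N_d-dimensional space.

    Assumes independent unit-power symbols, so each x x^H is replaced by its expectation.
    """
    Ah = A.conj().T
    h_eff = Ah @ G_su[:, l_s]                          # H_{AL,k,l_s} = A^H g_{k,l_s}
    F_su = Ah @ np.delete(G_su, l_s, axis=1)           # the L_s - 1 interfering SUs
    F_pu = Ah @ G_pu                                   # the L_p PUs
    B = F_su @ F_su.conj().T + F_pu @ F_pu.conj().T + noise_var * (Ah @ A)
    signal = np.abs(np.vdot(D, h_eff)) ** 2            # |D^H H_AL|^2
    interference_noise = np.real(np.vdot(D, B @ D))    # D^H B_AL D
    return signal / interference_noise, B


rng = np.random.default_rng(2)
N_a, N_d, L_s, L_p, sigma2 = 16, 4, 4, 4, 0.1


def crandn(*shape):
    """Circularly-symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


G_su, G_pu = crandn(N_a, L_s), crandn(N_a, L_p)
A = np.exp(1j * rng.uniform(0, 2 * np.pi, (N_a, N_d)))   # phase-only analog matrix
D = crandn(N_d)                                          # digital weights
sinr, B_AL = access_link_sinr(D, A, G_su, G_pu, l_s=0, noise_var=sigma2)
```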

#### *2.2. Backhaul Link*

The hybrid beamforming weights of the backhaul link are obtained from orthogonal pilot signals transmitted by each SBS to the MBS. The $k$th SBS applies its pilot symbol $s\_k^p$ to an $N\_d \times 1$ transmit digital beamforming weight vector $\mathbf{D}\_{T,k}^{SBS}$, followed by an $N\_a \times N\_d$ transmit analog beamforming matrix $\mathbf{A}\_{T,k}^{SBS}$. If we denote the combined digital-analog transmit beamformer of the $k$th SBS as $\mathbf{w}\_{T,k} = \mathbf{A}\_{T,k}^{SBS}\mathbf{D}\_{T,k}^{SBS}$, then the array output of the MBS can be written as

$$\mathbf{y}\_{MBS}^p = \sum\_{k=1}^K \mathbf{H}\_{k,MBS} \mathbf{w}\_{T,k} s\_k^p + \mathbf{H}\_{PU,MBS} \mathbf{x}\_p + \mathbf{n}\_{MBS} \tag{5}$$

where $\mathbf{y}\_{MBS}^p$ is the $M\_a \times 1$ output vector of the $M\_a$-element antenna array of the MBS, $\mathbf{H}\_{k,MBS}$ is the $M\_a \times N\_a$ channel matrix representing the transfer functions from the $N\_a$-element antenna array of the $k$th SBS to the $M\_a$-element antenna array of the MBS, $\mathbf{H}\_{PU,MBS}$ is the $M\_a \times L\_p$ channel matrix from the $L\_p$ PUs to the MBS antenna array, and $\mathbf{n}\_{MBS}$ is the $M\_a \times 1$ complex AWGN vector received at the MBS.

The MBS detects the $k$th SBS signal by applying the array output $\mathbf{y}\_{MBS}^p$ to the $M\_a \times M\_d$ receive analog weight matrix $\mathbf{A}\_{R,k}^{MBS}$, followed by the $M\_d \times 1$ receive digital beamforming weight vector $\mathbf{D}\_{R,k}^{MBS}$. If we denote the combined digital-analog receive beamformer for the $k$th SBS as $\mathbf{c}\_k = \mathbf{A}\_{R,k}^{MBS}\mathbf{D}\_{R,k}^{MBS}$, then the detection of the $k$th SBS signal by the MBS can be expressed as

$$\hat{s}\_k = \mathbf{c}\_k^H \mathbf{y}\_{MBS}^p = S\_k^p + S\_{I\_k} + S\_{I\_p} + \mathbf{c}\_k^H \mathbf{n}\_{MBS} \tag{6}$$

where $S\_k^p = \mathbf{c}\_k^H \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k} s\_k^p$ is the $k$th SBS signal, $S\_{I\_k} = \mathbf{c}\_k^H \sum\_{i=1, i \neq k}^{K} \mathbf{H}\_{i,MBS}\mathbf{w}\_{T,i} s\_i^p$ is the interference from the $K-1$ other SBSs, and $S\_{I\_p} = \mathbf{c}\_k^H \mathbf{H}\_{PU,MBS}\mathbf{x}\_p$ is the interference from the $L\_p$ PUs.

Assuming that the $s\_k^p$ are complex-valued random variables with unit power, i.e., $\mathbb{E}\left[|s\_k^p|^2\right] = 1$, and denoting $\mathbf{H}\_{BL,k} = \left(\mathbf{A}\_{R,k}^{MBS}\right)^H \mathbf{H}\_{k,MBS}\mathbf{A}\_{T,k}^{SBS}$ as the effective channel between the $k$th SBS and the MBS, the SINR at the MBS for the $k$th SBS can be expressed as

$$\gamma\_k^{MBS} = \frac{\left(\mathbf{D}\_{R,k}^{MBS}\right)^H \mathbf{H}\_{BL,k} \mathbf{D}\_{T,k}^{SBS} \left(\mathbf{D}\_{T,k}^{SBS}\right)^H \mathbf{H}\_{BL,k}^H \mathbf{D}\_{R,k}^{MBS}}{\mathbf{c}\_k^H \mathbf{B}\_{BL} \mathbf{c}\_k} \tag{7}$$

where $\mathbf{B}\_{BL} = \sum\_{i=1, i \neq k}^{K} \mathbf{H}\_{i,MBS}\mathbf{w}\_{T,i}\mathbf{w}\_{T,i}^H\mathbf{H}\_{i,MBS}^H + \mathbf{H}\_{PU,MBS}\mathbf{x}\_p\mathbf{x}\_p^H\mathbf{H}\_{PU,MBS}^H + \sigma\_n^2\mathbf{I}\_{M\_a}$ is the covariance matrix of the interference-plus-noise at the backhaul link.

#### *2.3. End-to-End SINR and Channel Capacity*

Once the hybrid beamforming weights of the backhaul link are obtained, they can be used to forward the SUs' signals to the MBS. The $k$th SBS applies the received $l\_s$th user signal, $\mathbf{w}\_{R,k,l\_s}^H \mathbf{g}\_{k,l\_s} \mathbf{x}\_{l\_s}^s$, to the hybrid beamformer $\mathbf{w}\_{T,k}$. The $N\_a \times 1$ transmitted signal $\mathbf{s}\_{k,l\_s}$ at the output of the antenna array can then be expressed as

$$\mathbf{s}\_{k,l\_s} = \mathbf{w}\_{T,k} \mathbf{w}\_{R,k,l\_s}^H \mathbf{g}\_{k,l\_s} \mathbf{x}\_{l\_s}^s,\tag{8}$$

and the expression for the array output of the MBS can be written as

$$\mathbf{y}\_{MBS} = \mathbf{H}\_{PU,MBS}\mathbf{x}\_p + \mathbf{y}\_{k,MBS} + \sum\_{i=1,\ i \neq k}^{K} \mathbf{y}\_{i,MBS} + \mathbf{n}\_{MBS} \tag{9}$$

where $\mathbf{y}\_{k,MBS} = \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \mathbf{y}\_{k,SBS}$ is the array output of the MBS due to the $k$th SBS.

Using Equation (2), $\mathbf{y}\_{k,MBS}$ can be expressed in terms of the $l\_s$th user signal, $\mathbf{x}\_{l\_s}^s$, as follows:

$$\begin{array}{c} \mathbf{y}\_{k,MBS} = \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \mathbf{g}\_{k,l\_s} \mathbf{x}\_{l\_s}^s\\ + \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \sum\_{i=1, i \neq l\_s}^{L\_s} \mathbf{g}\_{k,i} \mathbf{x}\_i^s\\ + \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \mathbf{G}\_{k,PU} \mathbf{x}\_p\\ + \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \mathbf{n}\_{k,SBS} \end{array} \tag{10}$$

When the MBS applies the array output $\mathbf{y}\_{MBS}$ to the hybrid weight $\mathbf{c}\_k^H$, the detection of the $l\_s$th user signal of the $k$th SBS by the MBS can be expressed as

$$\hat{x}\_{k,l\_s} = \mathbf{c}\_k^H \mathbf{y}\_{MBS} = \mathbf{c}\_k^H \left( \mathbf{S}\_{k,l\_s} + \mathbf{S}\_{k,I}^{SU} + \mathbf{S}\_{I}^{SBS} + \mathbf{S}\_{I}^{PU} + \mathbf{N}\_{MBS} \right) \tag{11}$$

where

$\mathbf{S}\_{k,l\_s} = \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \mathbf{g}\_{k,l\_s}\mathbf{x}\_{l\_s}^s$ is the $l\_s$th user signal of the $k$th SBS, $\mathbf{S}\_{k,I}^{SU} = \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \sum\_{i=1, i \neq l\_s}^{L\_s} \mathbf{g}\_{k,i}\mathbf{x}\_i^s$ is the interference from the $L\_s - 1$ other SUs of the $k$th SBS, $\mathbf{S}\_I^{SBS} = \sum\_{i=1, i \neq k}^{K} \mathbf{H}\_{i,MBS}\mathbf{w}\_{T,i}\mathbf{w}\_{R,i,l\_s}^H \mathbf{y}\_{i,SBS}$ is the interference from the $K-1$ other SBSs, $\mathbf{S}\_I^{PU} = \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \mathbf{G}\_{k,PU}\mathbf{x}\_p + \mathbf{H}\_{PU,MBS}\mathbf{x}\_p$ is the interference from the PUs, and $\mathbf{N}\_{MBS} = \mathbf{H}\_{k,MBS}\mathbf{w}\_{T,k}\mathbf{w}\_{R,k,l\_s}^H \mathbf{n}\_{k,SBS} + \mathbf{n}\_{MBS}$ is the noise forwarded by the $k$th SBS plus the noise at the MBS.

The end-to-end SINR at the MBS for the $l\_s$th user of the $k$th SBS can be expressed as

$$\gamma\_{k,l\_s}^{MBS} = \frac{\mathbf{c}\_k^H \mathbf{H}\_{k,MBS} \mathbf{w}\_{T,k} \mathbf{w}\_{R,k,l\_s}^H \mathbf{g}\_{k,l\_s} \mathbf{x}\_{l\_s}^s \left(\mathbf{x}\_{l\_s}^s\right)^H \mathbf{g}\_{k,l\_s}^H \mathbf{w}\_{R,k,l\_s} \mathbf{w}\_{T,k}^H \mathbf{H}\_{k,MBS}^H \mathbf{c}\_k}{\mathbf{c}\_k^H \mathbf{B}\_{AL-BL} \mathbf{c}\_k} \tag{12}$$

where $\mathbf{B}\_{AL-BL} = \mathbf{S}\_{k,I}^{SU}\left(\mathbf{S}\_{k,I}^{SU}\right)^H + \mathbf{S}\_I^{SBS}\left(\mathbf{S}\_I^{SBS}\right)^H + \mathbf{S}\_I^{PU}\left(\mathbf{S}\_I^{PU}\right)^H + \mathbf{N}\_{MBS}\mathbf{N}\_{MBS}^H$ is the covariance matrix of the interference-plus-noise for the $l\_s$th user's end-to-end link.

The ergodic channel capacity of each user $l\_s$ is given by [19]

$$C = \mathbb{E}\left[\log\_2\left(1 + \frac{\mathbf{c}\_k^H \mathbf{H}\_{k,MBS} \mathbf{w}\_{T,k} \mathbf{w}\_{R,k,l\_s}^H \mathbf{g}\_{k,l\_s} \mathbf{x}\_{l\_s}^s \left(\mathbf{x}\_{l\_s}^s\right)^H \mathbf{g}\_{k,l\_s}^H \mathbf{w}\_{R,k,l\_s} \mathbf{w}\_{T,k}^H \mathbf{H}\_{k,MBS}^H \mathbf{c}\_k}{\mathbf{c}\_k^H \mathbf{B}\_{AL-BL} \mathbf{c}\_k}\right)\right] \tag{13}$$

where $\mathbb{E}[\cdot]$ denotes the expectation operator.
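Equation (13) is an expectation over channel realizations and is typically estimated by Monte Carlo averaging. The sketch below shows only that averaging step. As a stand-in for the end-to-end SINR of Equation (12), it uses the noise-limited SINR $\|\mathbf{h}\|^2/\sigma\_n^2$ of maximum-ratio combining over an i.i.d. Rayleigh channel; this placeholder SINR, the 16 antennas and the number of trials are assumptions made for illustration only.

```python
import numpy as np


def ergodic_capacity(sinr_samples):
    """Monte Carlo estimate of E[log2(1 + SINR)] from one SINR value per channel draw."""
    sinr = np.asarray(sinr_samples, dtype=float)
    return float(np.mean(np.log2(1.0 + sinr)))


rng = np.random.default_rng(3)
n_trials, n_ant, noise_var = 10_000, 16, 1.0
h = (rng.standard_normal((n_trials, n_ant)) + 1j * rng.standard_normal((n_trials, n_ant))) / np.sqrt(2)
sinr = np.sum(np.abs(h) ** 2, axis=1) / noise_var   # placeholder SINR for each channel draw
C = ergodic_capacity(sinr)                           # bits/s/Hz
```

By Jensen's inequality the estimate never exceeds $\log\_2(1 + \overline{\mathrm{SINR}})$, which is a quick sanity check for whatever SINR model is plugged in.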

#### *2.4. Channel Model*

For the access and backhaul links, we consider mmWave propagation channels with limited scattering, which can be modeled on each subcarrier by the clustered channel representation [13]. We assume a scattering environment with $N\_{cl}$ scattering clusters randomly distributed in space, each containing $N\_{ray}$ closely located scatterers.

For the backhaul link, the channel matrix at subcarrier *fn* between the *k*th SBS and the MBS can be expressed as

$$\mathbf{H}\_{k,MBS,f\_n} = \sqrt{\frac{N\_d M\_d}{N\_{cl} N\_{ray}}} \sum\_{i=1}^{N\_{cl}} \sum\_{l=1}^{N\_{ray}} \alpha\_{il,f\_n} \mathbf{a}\_{MBS,f\_n}\left(\phi\_{i,l}^{r}, \theta\_{i,l}^{r}\right) \mathbf{a}\_{k,SBS,f\_n}^{H}\left(\phi\_{i,l}^{t}, \theta\_{i,l}^{t}\right) \tag{14}$$

where $\alpha\_{il,f\_n} = \tilde{\alpha}\_{il} e^{-j2\pi i f\_n/N\_c}$ is the complex gain of the $l$th ray in the $i$th scattering cluster, and the $\tilde{\alpha}\_{il}$ are assumed i.i.d. $\mathcal{CN}(0, \sigma\_{\alpha,i}^2)$, with $\sigma\_{\alpha,i}^2$ representing the average power of the $i$th cluster. In addition, $\phi\_{i,l}^r$ and $\phi\_{i,l}^t$ are the azimuth angles of arrival and departure, respectively, $\theta\_{i,l}^r$ and $\theta\_{i,l}^t$ are the elevation angles of arrival and departure, respectively, and $\mathbf{a}\_{MBS,f\_n}\left(\phi\_{i,l}^r, \theta\_{i,l}^r\right)$ and $\mathbf{a}\_{k,SBS,f\_n}\left(\phi\_{i,l}^t, \theta\_{i,l}^t\right)$ represent, respectively, the normalized receive and transmit array response vectors of the MBS and the $k$th SBS.

For the access link, the channel matrix at subcarrier *fn* between the *k*th SBS and its *Ls* users can be written as

$$\mathbf{G}\_{k,SU,f\_n} = \sqrt{\frac{L\_s M\_d}{N\_{cl} N\_{ray}}} \sum\_{i=1}^{N\_{cl}} \sum\_{l=1}^{N\_{ray}} \alpha\_{il,f\_n} \mathbf{a}\_{k,SBS,f\_n}\left(\phi\_{i,l}^{r}, \theta\_{i,l}^{r}\right) \mathbf{a}\_{k,SU,f\_n}^{H}\left(\phi\_{i,l}^{t}, \theta\_{i,l}^{t}\right) \tag{15}$$

where $\mathbf{a}\_{k,SU,f\_n}\left(\phi\_{i,l}^{t}, \theta\_{i,l}^{t}\right)$ represents the spatial signature vector of the $L\_s$ single-antenna users.

The $N\_{ray}$ azimuth and elevation angles, $\phi\_{i,l}^{r,t}$ and $\theta\_{i,l}^{r,t}$, within cluster $i$ are assumed to be randomly distributed around uniformly random mean cluster angles $\phi\_i^{r,t}$ and $\theta\_i^{r,t}$, respectively, with constant angular spreads $\sigma\_{\phi^{r,t}}$ and $\sigma\_{\theta^{r,t}}$, respectively.
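A minimal sketch of drawing one realization of a clustered channel in the spirit of Equation (14). To keep it short, it assumes a half-wavelength uniform linear array with azimuth angles only (elevation is dropped), Laplacian ray offsets around each cluster mean, equal unit cluster powers, the subcarrier index $n$ in the phase term, and the antenna-count normalization $\sqrt{N\_{rx}N\_{tx}/(N\_{cl}N\_{ray})}$ common in the clustered-channel literature instead of the prefactor printed in Equation (14). None of these specific choices is stated in the text, and the function names are hypothetical.

```python
import numpy as np


def ula_response(n_ant, phi):
    """Normalized response of a half-wavelength ULA to azimuth angle phi (radians)."""
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(phi)) / np.sqrt(n_ant)


def clustered_channel(n_rx, n_tx, n_cl, n_ray, spread, rng, n=0, n_c=256):
    """One draw of an azimuth-only clustered channel on subcarrier index n."""
    H = np.zeros((n_rx, n_tx), dtype=complex)
    for i in range(n_cl):
        mean_r, mean_t = rng.uniform(-np.pi / 2, np.pi / 2, size=2)   # uniformly random cluster means
        for _ in range(n_ray):
            phi_r = rng.laplace(mean_r, spread / np.sqrt(2))          # Laplacian offsets with std = spread
            phi_t = rng.laplace(mean_t, spread / np.sqrt(2))
            alpha = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # CN(0, 1) gain
            alpha *= np.exp(-2j * np.pi * i * n / n_c)                # per-subcarrier phase of cluster i
            H += alpha * np.outer(ula_response(n_rx, phi_r), ula_response(n_tx, phi_t).conj())
    return np.sqrt(n_rx * n_tx / (n_cl * n_ray)) * H


rng = np.random.default_rng(4)
H = clustered_channel(64, 64, n_cl=4, n_ray=5, spread=np.deg2rad(7.5), rng=rng)
```

With only $N\_{cl}N\_{ray}$ rays, the channel has rank at most $N\_{cl}N\_{ray}$, which is the limited-scattering property that makes a few analog beams sufficient.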

For simplicity, the channels between the MBS and its $L\_p$ users ($\mathbf{H}\_{PU,MBS}$) and between the PUs and the $k$th SBS ($\mathbf{G}\_{k,PU}$) are modeled as conventional i.i.d. MIMO channels.

Note that in this per-subcarrier representation, it is assumed that the carrier frequency $f\_c$ is much larger than the system bandwidth, so that $f\_c \pm f\_n \approx f\_c$ and the array response vectors $\mathbf{a}\_{MBS,f\_n}\left(\phi\_{i,l}^{r}, \theta\_{i,l}^{r}\right)$ and $\mathbf{a}\_{k,SBS,f\_n}\left(\phi\_{i,l}^{t}, \theta\_{i,l}^{t}\right)$ can be considered approximately equal for all subcarriers. Consequently, the channel covariance matrices are approximately the same, with the same set of eigenvectors, for all subcarriers, and they can be replaced by their averages over the subcarriers, i.e., $\mathbf{H}\_{AL,k,l\_s}^H\mathbf{H}\_{AL,k,l\_s} = \frac{1}{N\_c}\sum\_{n=1}^{N\_c}\mathbf{H}\_{AL,k,l\_s,f\_n}^H\mathbf{H}\_{AL,k,l\_s,f\_n}$, $\mathbf{H}\_{BL,k}^H\mathbf{H}\_{BL,k} = \frac{1}{N\_c}\sum\_{n=1}^{N\_c}\mathbf{H}\_{BL,k,f\_n}^H\mathbf{H}\_{BL,k,f\_n}$, and $\mathbf{H}\_{PU,MBS}^H\mathbf{H}\_{PU,MBS} = \frac{1}{N\_c}\sum\_{n=1}^{N\_c}\mathbf{H}\_{PU,MBS,f\_n}^H\mathbf{H}\_{PU,MBS,f\_n}$.
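The subcarrier averaging above can be illustrated with a short NumPy sketch. When every subcarrier sees the same array responses and only the path phases change with $n$, the averaged covariance $\frac{1}{N\_c}\sum\_n \mathbf{H}\_{f\_n}^H\mathbf{H}\_{f\_n}$ has a few dominant eigenvectors that serve all subcarriers, which is what allows a single analog beamformer per link. The three-path channel, the ULA geometry and the sizes are assumptions for illustration; the generic matrix here stands in for any of the averaged covariances listed above.

```python
import numpy as np


def ula(n_ant, phi):
    """Normalized half-wavelength ULA response (assumed array geometry)."""
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(phi)) / np.sqrt(n_ant)


def averaged_covariance(channels):
    """(1/N_c) * sum_n H_n^H H_n over a list of per-subcarrier channel matrices."""
    return np.mean([Hn.conj().T @ Hn for Hn in channels], axis=0)


def top_eigenvectors(R, n_d):
    """Eigenvectors of Hermitian R for its n_d largest eigenvalues (eigh sorts ascending)."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:n_d]
    return vecs[:, order], vals[order]


rng = np.random.default_rng(6)
N_a, M_a, N_c, n_paths, N_d = 32, 32, 256, 3, 3
phi_t, phi_r = rng.uniform(-np.pi / 2, np.pi / 2, (2, n_paths))
gains = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)

# Same array responses on every subcarrier (f_c >> bandwidth); only the path phases vary with n.
channels = [
    sum(gains[i] * np.exp(-2j * np.pi * i * n / N_c) * np.outer(ula(M_a, phi_r[i]), ula(N_a, phi_t[i]).conj())
        for i in range(n_paths))
    for n in range(N_c)
]
R = averaged_covariance(channels)
U, lam = top_eigenvectors(R, N_d)
captured = np.real(np.trace(U.conj().T @ R @ U)) / np.real(np.trace(R))   # fraction of energy in the beams
```

Here the $N\_d = 3$ common eigenvectors capture essentially all of the averaged channel energy, because every subcarrier's channel lies in the same three-dimensional subspace.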

#### **3. Proposed Hybrid Beamforming**

#### *3.1. Access Link*

The *k*th SBS communicates with each SU through a set of selected beams that corresponds to a set of weight vectors. These weight vectors are obtained using the eigenbeamforming scheme and can be expressed as

$$\begin{array}{c} \mathbf{A}\_{R,k,l\_s}^{SBS} = \left[ \mathbf{a}\_{R,k,l\_s,1}^{SBS}, \mathbf{a}\_{R,k,l\_s,2}^{SBS}, \dots, \mathbf{a}\_{R,k,l\_s,N\_d}^{SBS} \right] \\ \text{subject to } \left| \mathbf{A}\_{R,k,l\_s}^{SBS}(i, j) \right|^2 = 1 \end{array} \tag{16}$$

where $\mathbf{a}\_{R,k,l\_s,i}^{SBS}$ denotes the $i$th selected $N\_a \times 1$ eigenvector, corresponding to the $i$th largest eigenvalue of $\mathbf{g}\_{k,l\_s}\mathbf{g}\_{k,l\_s}^H$.

Since the analog beamforming matrix $\mathbf{A}\_{R,k,l\_s}^{SBS}$ is realized using phase shifters only, its elements $\mathbf{A}\_{R,k,l\_s}^{SBS}(i, j)$ satisfy $\left|\mathbf{A}\_{R,k,l\_s}^{SBS}(i, j)\right|^2 = 1$. Note that each SBS uses a different analog matrix for each user, and that the system model shown in Figure 1 focuses on the detection of the $l\_s$th user of the $k$th SBS, showing the analog beamformer and the RF chains for one user only. The analog beamformer can be implemented using a Butler matrix, as shown in Figure 4, where four users ($L\_s = 4$) and four RF chains per user ($N\_d = 4$) are assumed. Depending on which four ports are activated, four beams are produced in the corresponding directions. Since the four users have different channels, a different set of four ports is expected for each user.
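The unit-modulus constraint in Equation (16) means that the selected eigenvectors cannot be used directly as phase-shifter settings. One simple heuristic, assumed here and not necessarily the author's procedure, keeps only the phase of each eigenvector entry. The sketch applies it to the $N\_a \times N\_a$ matrix $\mathbf{g}\_{k,l\_s}\mathbf{g}\_{k,l\_s}^H$ of a single random channel; with such a rank-one matrix only the first beam is meaningful, whereas a subcarrier-averaged covariance would generally provide more than one significant eigenvector.

```python
import numpy as np


def eigen_analog_beams(R, n_d):
    """Top-n_d eigenvectors of Hermitian R, projected onto unit-modulus (phase-only) entries."""
    vals, vecs = np.linalg.eigh(R)
    U = vecs[:, np.argsort(vals)[::-1][:n_d]]
    return np.exp(1j * np.angle(U))          # keep phases only, so |A(i, j)|^2 = 1


rng = np.random.default_rng(7)
N_a, N_d = 64, 4
g = (rng.standard_normal(N_a) + 1j * rng.standard_normal(N_a)) / np.sqrt(2)  # g_{k,l_s}
A = eigen_analog_beams(np.outer(g, g.conj()), N_d)

a1 = A[:, 0]
# Gain of the first phase-only beam relative to the ideal (unconstrained) eigenbeam.
gain = np.abs(np.vdot(a1, g)) ** 2 / (np.vdot(a1, a1).real * np.vdot(g, g).real)
```

For i.i.d. Rayleigh entries this ratio concentrates around $\pi/4 \approx 0.785$ (about 1 dB below the unconstrained eigenbeam) as $N\_a$ grows.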

**Figure 4.** Hybrid beamforming based on Butler matrix for the access link.
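An $N$-port Butler matrix forms $N$ fixed, mutually orthogonal beams and behaves like a DFT of the array signals, so the port selection of Figure 4 can be idealized as choosing $N\_d$ columns of a DFT matrix. The sketch below makes that idealization (real Butler matrices add fixed phase offsets) and, as an assumed selection rule, picks for each user the $N\_d$ ports with the largest received power; the helper names are hypothetical.

```python
import numpy as np


def dft_beams(n_a):
    """n_a x n_a matrix whose unit-modulus columns are orthogonal beams (idealized Butler ports)."""
    k = np.arange(n_a)
    return np.exp(2j * np.pi * np.outer(k, k) / n_a)


def select_ports(g, n_d):
    """Indices and columns of the n_d beams with the largest power |F[:, p]^H g|^2."""
    F = dft_beams(g.size)
    power = np.abs(F.conj().T @ g) ** 2
    ports = np.argsort(power)[::-1][:n_d]
    return ports, F[:, ports]


rng = np.random.default_rng(8)
N_a, N_d, L_s = 64, 4, 4
users = [(rng.standard_normal(N_a) + 1j * rng.standard_normal(N_a)) / np.sqrt(2) for _ in range(L_s)]
selected = [select_ports(g, N_d)[0] for g in users]   # one set of four ports per user
```

Because the beams are fixed, only the port indices change from user to user, which is why a single Butler network can be shared while each user activates its own set of ports.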

Once the analog beams are selected, the optimal receive digital weights $\mathbf{D}\_{R,k,l\_s}^{SBS}$ are obtained by maximizing the access link receive SINR $\gamma\_{k,l\_s}^{SBS}$ given by Equation (3):

$$\mathbf{D}\_{R,k,l\_s}^{SBS} = \mathbf{B}\_{AL,k,l\_s}^{-1} \mathbf{V}\_{AL,k,l\_s} \tag{17}$$

where $\mathbf{V}\_{AL,k,l\_s}$ denotes the eigenvector corresponding to the maximum eigenvalue of the effective access channel covariance $\mathbf{H}\_{AL,k,l\_s}^H\mathbf{H}\_{AL,k,l\_s}$.
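Equation (17) has the form of the classical maximum-SINR solution. For a single-column effective channel $\mathbf{h}$, the dominant eigenvector of the $N\_d \times N\_d$ matrix $\mathbf{h}\mathbf{h}^H$ is $\mathbf{h}/\|\mathbf{h}\|$, and $\mathbf{D} = \mathbf{B}^{-1}\mathbf{h}$ (up to a scale) maximizes the generalized Rayleigh quotient of Equation (3) for unit-power symbols, reaching $\mathbf{h}^H\mathbf{B}^{-1}\mathbf{h}$. The sketch checks this numerically; the random $\mathbf{h}$ and the synthetic interference-plus-noise covariance $\mathbf{B}$ are assumptions.

```python
import numpy as np


def max_sinr_weights(h_eff, B):
    """D = B^{-1} v, where v = h / ||h|| is the dominant eigenvector of h h^H."""
    v = h_eff / np.linalg.norm(h_eff)
    return np.linalg.solve(B, v)              # solve B D = v instead of inverting B explicitly


def sinr(D, h_eff, B):
    """Generalized Rayleigh quotient |D^H h|^2 / (D^H B D), as in Equation (3) with |x|^2 = 1."""
    return np.abs(np.vdot(D, h_eff)) ** 2 / np.real(np.vdot(D, B @ D))


rng = np.random.default_rng(9)
N_d = 4
h = (rng.standard_normal(N_d) + 1j * rng.standard_normal(N_d)) / np.sqrt(2)   # H_{AL,k,l_s}
J = (rng.standard_normal((N_d, 6)) + 1j * rng.standard_normal((N_d, 6))) / np.sqrt(2)
B = J @ J.conj().T + 0.1 * np.eye(N_d)     # synthetic interference-plus-noise covariance

D = max_sinr_weights(h, B)
best = sinr(D, h, B)
bound = np.real(np.vdot(h, np.linalg.solve(B, h)))   # h^H B^{-1} h
```

Scaling $\mathbf{D}$ does not change the SINR, so normalizing $\mathbf{v}$ only fixes the output level, not the detection performance.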

#### *3.2. Backhaul Link*

The transmit analog weights of the *k*th SBS are based on the eigenbeamforming scheme and are given by

$$\begin{array}{l} \mathbf{A}\_{T,k}^{SBS} = \left[ \mathbf{a}\_{T,k,1}^{SBS}, \mathbf{a}\_{T,k,2}^{SBS}, \dots, \mathbf{a}\_{T,k,N\_d}^{SBS} \right] \\ \text{subject to } \left| \mathbf{A}\_{T,k}^{SBS}(i,j) \right|^2 = 1 \end{array} \tag{18}$$

where $\mathbf{a}\_{T,k,i}^{SBS}$ denotes the $i$th selected $N\_a \times 1$ eigenvector, corresponding to the $i$th largest eigenvalue of $\mathbf{H}\_{k,MBS}^H\mathbf{H}\_{k,MBS}$.

Assuming channel reciprocity with $N\_a = M\_a$, the receive analog weight vectors of the MBS are given by $\mathbf{A}\_{R,k}^{MBS} = \mathbf{A}\_{T,k}^{SBS}$. Note also that the MBS uses a different analog matrix for each SBS, which can be implemented using the Butler matrix of Figure 4, with the mobile users replaced by SBSs.

For fixed analog beamforming weights $\mathbf{A}\_{T,k}^{SBS}$ and $\mathbf{A}\_{R,k}^{MBS}$, the optimal transmit digital weight vector of the $k$th SBS, $\mathbf{D}\_{T,k}^{SBS}$, and the optimal receive digital weight vector of the MBS, $\mathbf{D}\_{R,k}^{MBS}$, are obtained by maximizing the backhaul link receive SINR and are given by

$$\mathbf{D}\_{T,k}^{SBS} = \mathbf{D}\_{R,k}^{MBS} = \mathbf{B}\_{BL,k}^{-1} \mathbf{H}\_{BL,k} \mathbf{V}\_{BL} \tag{19}$$

where $\mathbf{V}\_{BL}$ is the eigenvector corresponding to the maximum eigenvalue of $\mathbf{H}\_{BL,k}^H\mathbf{H}\_{BL,k}$, with $\mathbf{H}\_{BL,k} = \left(\mathbf{A}\_{R,k}^{MBS}\right)^H \mathbf{H}\_{k,MBS}\mathbf{A}\_{T,k}^{SBS}$ denoting the effective backhaul channel.
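A minimal sketch of the MRT/MRC eigenbeamforming pairing named in the Conclusions, applied to the effective backhaul channel: the SBS transmits along $\mathbf{V}\_{BL}$ and the MBS combines with a vector proportional to $\mathbf{H}\_{BL,k}\mathbf{V}\_{BL}$, which is the receive weight of Equation (19) when $\mathbf{B}\_{BL,k}$ is white noise (an assumption that ignores interference). This pairing attains a gain equal to the largest eigenvalue of $\mathbf{H}\_{BL,k}^H\mathbf{H}\_{BL,k}$. The i.i.d. channel, the sizes, and the phase-only projection of the eigenvectors are assumptions; the text's choice $\mathbf{A}\_{R,k}^{MBS} = \mathbf{A}\_{T,k}^{SBS}$ is followed.

```python
import numpy as np


def top_eig(R, n):
    """n largest eigenvalues of Hermitian R and their eigenvectors."""
    vals, vecs = np.linalg.eigh(R)
    idx = np.argsort(vals)[::-1][:n]
    return vals[idx], vecs[:, idx]


rng = np.random.default_rng(10)
N_a = M_a = 32
N_d = 4
H = (rng.standard_normal((M_a, N_a)) + 1j * rng.standard_normal((M_a, N_a))) / np.sqrt(2)  # H_{k,MBS}

_, U = top_eig(H.conj().T @ H, N_d)
A_T = np.exp(1j * np.angle(U))          # phase-only SBS transmit beams (Equation (18))
A_R = A_T                               # MBS receive beams, per the reciprocity assumption (N_a = M_a)
H_bl = A_R.conj().T @ H @ A_T           # effective N_d x N_d backhaul channel H_{BL,k}

lam, V = top_eig(H_bl.conj().T @ H_bl, 1)
v = V[:, 0]                             # transmit digital weights (MRT direction V_BL)
u = H_bl @ v                            # receive digital weights (MRC), proportional to H_BL V_BL
gain = np.abs(np.vdot(u, H_bl @ v)) ** 2 / (np.vdot(u, u).real * np.vdot(v, v).real)
```

The digital stage therefore only has to handle an $N\_d \times N\_d$ problem, which is where the complexity saving over a fully-digital $M\_a \times N\_a$ design comes from.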

#### **4. Simulation Results**

In our simulation setup, we considered a HetNet consisting of one macro-cell and four SBSs ($K = 4$). The SBSs and the MBS used the same number of antennas, $N\_a = M\_a = 64$, and the same number of RF chains, $N\_d = M\_d = 2$ or $4$. Each SBS serves $L\_s = 4$ users and the macro-cell serves four users, each transmitting with a single antenna. QPSK modulation was assumed. For the OFDM configuration, we assumed a 256-subcarrier OFDM system ($N\_c = 256$), which is widely deployed in broadband wireless access services.

Figure 5 shows the beampatterns of the proposed HBF with four RF chains and of the optimal fully-digital beamformer for the access link. The optimal beamformer has about five dominant beams, three of which are similar to the beams selected by the proposed HBF. This suggests that the data streams can be successfully transmitted through those three beams using the proposed HBF, and that near-optimal performance could be achieved by bringing the number of RF chains close to the number of dominant beams of the optimal beamformer. For the backhaul link, Figure 6 shows very similar beampatterns, with more dominant beams.

**Figure 5.** Beampattern of the access link: (**a**) Proposed HBF, 4 RF chains; (**b**) fully-digital beamforming (optimal).

**Figure 6.** Beampattern of the backhaul link: (**a**) Proposed HBF, 4 RF chains; (**b**) fully-digital beamforming (optimal).

Figure 7 compares the ergodic channel capacity of the proposed HBF with that of the optimal fully-digital beamformer. In both cases, the optimal beamformer outperforms the proposed HBF. However, as the number of RF chains increases, the performance gap between the two schemes narrows, and the proposed HBF achieves near-optimal performance with four RF chains. By comparison, for the single-cell MU-MIMO case presented in References [12–14], near-optimal performance was achieved with only five RF chains, and for the MU-MIMO case in [16,17], it was shown that the number of RF chains only needs to be twice the number of data streams to achieve fully-digital beamforming performance. However, unlike our case, where we assumed a HetNet with a macro cell and multiple cognitive small cells, these studies focused primarily on macro-cellular systems and did not consider HBF in the context of HetNets.

**Figure 7.** Ergodic channel capacity of the proposed HBF for different numbers of RF chains.

#### **5. Conclusions**

In this paper, we employed hybrid beamforming at the access and backhaul links of a mmWave HetNet system. We proposed a low-complexity HBF based entirely on MRT/MRC eigenbeamforming schemes. The performance evaluation in terms of the beampatterns and the ergodic channel capacity showed that the proposed HBF scheme achieves near-optimal performance with only four RF chains while requiring considerably less computational complexity.

**Funding:** This research received no external funding.

**Acknowledgments:** The author would like to thank the Canadian Microelectronics Corporation (CMC) for providing the Heterogeneous Parallel Platform used to run the computationally intensive Monte Carlo simulations.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Spatio-Radio Resource Management and Hybrid Beamforming for Limited Feedback Massive MIMO Systems**

#### **Hedi Khammari <sup>1</sup>, Irfan Ahmed <sup>2,\*</sup>, Ghulam Bhatti <sup>3</sup> and Masoud Alajmi <sup>1</sup>**


Received: 5 August 2019; Accepted: 17 September 2019; Published: 20 September 2019

**Abstract:** In this paper, a joint spatio–radio frequency resource allocation and hybrid beamforming scheme for massive multiple-input multiple-output (MIMO) systems is proposed. We consider limited-feedback two-stage hybrid beamforming to decompose the precoding matrix at the base station. To reduce the channel state information (CSI) feedback of massive MIMO, we use channel covariance-based RF precoding and beam selection. This beam selection process minimizes the inter-group interference. Regularized block diagonalization can mitigate the inter-group interference, but it requires substantial feedback overhead. We use channel covariance-based eigenmodes and discrete Fourier transforms (DFT) to reduce the feedback overhead and design a simplified analog precoder. The columns of the analog beamforming matrix are selected based on user grouping performed by the K-means unsupervised machine learning algorithm. The digital precoder is designed by jointly optimizing an intra-group user utility function. It is shown that the eigenmodes-based analog precoder design reduces the feedback overhead by more than 50%. The joint beam and user scheduling with limited-feedback-based hybrid precoding increases the sum rate by 27.6% compared with the one-group case and reduces the feedback overhead by 62.5% compared with full CSI feedback.

**Keywords:** hybrid beamforming; massive MIMO; resource allocation

#### **1. Introduction**

The scarcity of available frequency bands for wireless communications has led to the inclusion of millimeter-wave (mmWave) frequencies in cellular communications. This has opened the door to massive multiple-input multiple-output (MIMO) systems: because of the short wavelengths at these high frequencies, a large number of antennas can be fabricated in a small form factor. The mmWave band has inherent drawbacks, such as high path loss and absorption loss. It is well known that the advantages of MIMO systems (spatial multiplexing or diversity gain) scale up with the number of antennas. In summary, one can enjoy the benefits of the large bandwidth available at mmWave frequencies by combating the high path and absorption losses with massive MIMO directional beamforming. A future mmWave massive MIMO-based cellular network is illustrated in Figure 1. Owing to the high path loss on the one hand and the high directional gain on the other, the notions of inter-cell interference and cell boundaries will become meaningless, and the fixed-size cell boundaries of traditional cellular systems will probably no longer exist in future mmWave massive MIMO systems. Narrow beams can serve distant user equipment (UE) without interfering with other UEs, provided that there is no obstacle between the BS and the intended UE, whereas a nearby UE may be deprived of connection because of obstacles.

**Figure 1.** MmWave massive MIMO based cellular system.

The cost of massive MIMO lies in the excessive feedback overhead for channel estimation and in the hardware complexity of the increased number of radio-frequency (RF) chains. The feedback overhead has been tackled separately for frequency division duplex (FDD) and time division duplex (TDD) systems. In FDD systems, uplink channel estimation incurs less overhead than downlink channel estimation because, generally, the number of transmit antennas *Nt* is much larger than both the number of users *K* and the number of receive antennas per user *nk* (*Nt* ≫ *K* and *Nt* ≫ *nk*). The most common technique to reduce the downlink channel estimation overhead is joint spatial division multiplexing (JSDM) [1]. JSDM uses two-stage precoding: user grouping based on second-order channel statistics (covariance), followed by traditional MU-MIMO linear precoding (zero-forcing) on the low-dimensional effective channel to mitigate inter-user interference. In TDD, only uplink channel estimation is performed, and the downlink channel estimates are obtained by transposing the uplink channel according to the channel reciprocity principle. TDD massive MIMO systems suffer from pilot contamination when the BS receives non-orthogonal pilot signals from neighboring cells. Pilot contamination degrades the channel estimates and hence affects both uplink combining and downlink precoding.

In traditional MIMO systems, a separate RF chain (analog-to-digital converter/digital-to-analog converter, serial-to-parallel/parallel-to-serial converter, up/down converter, etc.) is required for each antenna, but the resulting power consumption makes this infeasible for massive MIMO systems. Hybrid beamforming resolves this problem by dividing the precoding/combining into baseband digital processing and RF analog processing. Hybrid precoding and combining offer extra degrees of freedom in the spatial domain through a large number of antennas and analog beamforming [2]. Hybrid beamforming can be realized by using MU-MIMO precoding as the baseband digital precoder and statistical CSI-based pre-beamforming as the RF analog precoder. This limited feedback configuration (only average CSI is needed for the analog stage) is particularly suited to massive MIMO mmWave systems with a large number of antennas but a relatively small number of RF chains [3]. It has been shown in [4] that covariance-based limited feedback works well for mmWave massive MIMO systems, where the number of users is small relative to the number of BS antennas and the channels are formed by a few multipath components (MPCs) with small angular spread.

Limited work has been done on joint multiuser massive MIMO resource allocation and hybrid beamforming design. Although mmWave massive MIMO systems have the potential to increase spectral efficiency tremendously, the cost and power consumption of power-hungry RF chains (analog-to-digital converter (ADC)/digital-to-analog converter (DAC), parallel-to-serial converter, serial-to-parallel converter, up converter/down converter) make it impractical to build a complete RF chain for each antenna. A promising solution to this problem is hybrid beamforming, where the precoder at the transmitter is divided into two parts: an analog precoder and a digital precoder. The analog precoder (usually a network of phase shifters) at the RF stage reduces the number of RF chains required by the digital precoder. To configure these precoders, the transmitter requires CSI in the form of uplink feedback from the users; with massive antenna arrays, this feedback becomes a huge load on the wireless uplink, especially in FDD mode. JSDM [4] is a technique used to reduce the feedback overhead. It uses slowly varying average channel statistics to implement the analog precoder, and the digital precoder is then realized on a low-dimensional effective channel. To date, different variants of JSDM have been proposed. Li et al. [5] generalize the JSDM scheme to support non-orthogonal virtual sectorization with multiple RF chains at both link ends. Their scheme uses the Kronecker channel model to decouple the transmit and receive beamforming. Under this channel, the analog beamformer is obtained by stacking the strongest eigenbeams of the channel covariance matrix, and the digital beamformer is based on weighted minimum mean squared error (MMSE) with the effective channel. However, the Kronecker model does not characterize the mmWave channel, where the transmitter and receiver have coupling effects due to highly directional transmission. 
In [6], the authors apply JSDM with a geometrical channel model and find the hybrid precoder and combiner at the transmitter and receivers, respectively. Hybrid beamforming with switches (HBwS) was introduced in [7], where an *L* × *Nt* analog beamformer is controlled by *NRF* × *L* instantaneous CSI-based switches. Here, *Nt* is the number of transmit antennas, *NRF* is the number of RF chains, and *NRF* < *L* < *Nt*. Another switch-based analog beamformer is proposed in [8], but it requires instantaneous CSI for both the switching network and the phase-shifter network, and it assumes *L* = *Nt*. The JSDM implementation also requires downlink training to estimate the channel covariance matrix. Most works assume that the CSI is known at both ends. In [9], the authors consider the joint optimization of training resource allocation and channel statistics-based analog beamformer design using user-centric virtual sectorization. There are different structures for the phase shifter-based analog beamformer, namely fully connected, sub-connected, and dynamically connected [10]. Park [11] investigates JSDM with these analog beamformer architectures; the dynamic architecture gives better results at the cost of added complexity. In [12], the authors propose a hybrid beamforming method with a unified analog beamformer obtained by subspace construction (SC) based on partial CSI in a massive MIMO OFDM system. In [13], a statistical CSI-based analog beamformer uses regularized block diagonalization to mitigate the inter-group interference, and an instantaneous CSI-based digital beamformer uses weighted MMSE to suppress the intra-group interference. Jiang et al. [14] jointly optimize user selection and beam selection during analog beamformer design, using a Lyapunov-drift optimization framework to obtain the optimal solution. Their work focuses only on the design of the statistical CSI-based analog precoder and user/beam selection. 
Our previous work [15] on resource allocation for transmit beamforming develops digital and analog precoders that maximize the sum rate under a total power constraint and a constraint on the desired number of RF chains. Those solutions require full instantaneous CSI at the transmitter and receiver, which, in massive MIMO, involves a large number of pilot transmissions in the downlink and channel information feedback in the uplink. In this work, we exploit channel similarities by grouping the users based on their location information with K-means machine learning. A low-complexity DFT matrix-based analog precoder is derived from statistical CSI, which greatly reduces the feedback overhead for the design of the zero-forcing digital precoder.

Machine learning (ML) applications for the physical layer of wireless communication systems are widely reported in [16], whose authors suggest that most conventional transmitter and receiver blocks can be replaced by an ML-based autoencoder. The large number of antennas in massive MIMO makes channel estimation a challenging issue in mmWave communications. A common practice in TDD massive MIMO systems is to use channel reciprocity to obtain the downlink CSI from uplink channel estimates. In FDD, however, channel reciprocity does not apply, and downlink CSI estimation is very difficult. Channel estimation is also known to be hampered by the pilot contamination effect (user to base station): the quality of the channel estimates deteriorates because of the mutual interference caused by non-orthogonal pilots. In [17], a supervised learning-based pilot decontamination scheme for the massive MIMO uplink is reported, in which the users' locations in all cells and the pilot assignments serve as the input features and output labels, respectively. In [18], a deep learning network, CsiNet, is used to learn the CSI-to-codeword transformation at the users' terminals (a codebook approach is usually adopted to reduce the feedback overhead) and the inverse transformation at the base station. The authors of [19] suggest learning-based antenna selection for massive MIMO systems, using multiclass K-nearest neighbors (K-NN) and a support vector machine (SVM) for data-driven optimal antenna selection. Wang et al. [20] employ K-NN supervised learning to allocate *N* beams among *K* users. In [21], a reinforcement learning-based framework for radio resource management in radio access networks is proposed. 
In our previous work [22], we used neural networks to reduce the execution time of the computationally intensive resource allocation part of the joint resource allocation and hybrid beamforming design in [15]. In this work, in contrast, we use a K-means-based unsupervised machine learning scheme to group the users based on their spatial locations. To the best of our knowledge, no existing work jointly considers spatio–radio resource management and hybrid beamforming in massive MIMO systems.
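The K-means grouping step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it runs plain Lloyd iterations on hypothetical 2-D user positions (in meters, relative to the BS), the helper name `kmeans` is ours, and the deterministic farthest-point initialization is an assumption (randomized k-means++ seeding is a common alternative).

```python
import math


def kmeans(points, g, iters=100):
    """Group 2-D user positions into g clusters with Lloyd's K-means.

    Returns (labels, centroids); labels[i] is the group index of user i.
    """
    # Deterministic farthest-point initialization (an assumption for this sketch)
    centroids = [points[0]]
    while len(centroids) < g:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(iters):
        # Assignment step: each user joins the group with the nearest centroid
        new_labels = [min(range(g), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no user changed group
        labels = new_labels
        # Update step: move each centroid to the mean position of its members
        for c in range(g):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a group becomes empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


# Hypothetical user positions (x, y) in meters around a BS at the origin
users = [(50, 30), (55, 35), (52, 28), (-40, 60), (-45, 65), (-42, 58)]
labels, centroids = kmeans(users, g=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Clustering Cartesian positions is only one option; clustering the users' azimuth angles seen from the BS would align the groups more directly with the angular structure used by the analog precoder.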

In this work, we use spatial channel covariance matrices for the analog beamforming design. We also consider the user-to-RF-beam mapping. This mapping requires CSI and a search over all possible beam combinations at the base station, and the size of this search grows exponentially with the number of users [23]. Because of this exponential complexity, we use DFT-based eigenmode beams with RF switches.
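To make the complexity argument concrete, the sketch below assigns each user independently to the DFT beam with the largest gain toward that user's direction. This costs *K*·*Nt* gain evaluations, whereas jointly enumerating every user-to-beam map grows as *Nt*^*K*. It assumes a half-wavelength ULA and single-path (LOS-only) users at hypothetical angles, and the helper names (`steering`, `dft_column`, `select_beam`) are ours; it illustrates DFT-based beam selection only and is not the paper's joint selection algorithm.

```python
import cmath
import math


def steering(theta, nt, d_over_lambda=0.5):
    """Unit-norm ULA steering vector with elements exp(-j2π i (d/λ) sin θ), i = 0..nt-1."""
    return [cmath.exp(-2j * math.pi * i * d_over_lambda * math.sin(theta)) / math.sqrt(nt)
            for i in range(nt)]


def dft_column(m, nt):
    """m-th column of the unitary nt-point DFT matrix."""
    return [cmath.exp(-2j * math.pi * i * m / nt) / math.sqrt(nt) for i in range(nt)]


def select_beam(theta, nt):
    """Index of the DFT beam w with the largest gain |w^H b(θ)| toward angle theta."""
    b = steering(theta, nt)
    gains = [abs(sum(w.conjugate() * x for w, x in zip(dft_column(m, nt), b)))
             for m in range(nt)]
    return max(range(nt), key=lambda m: gains[m])


nt = 16
user_angles = [math.asin(0.25), math.asin(-0.25), math.radians(40)]  # hypothetical
print([select_beam(t, nt) for t in user_angles])  # [2, 14, 5]
```

Beam 14 serves the user at a negative angle because DFT indices above *Nt*/2 correspond to negative spatial frequencies.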

Contribution: In this paper, we develop joint spatio–radio resource allocation and hybrid precoding algorithms for limited feedback wideband massive MIMO systems. The contributions of this paper are summarized as follows.


The rest of the paper is organized as follows. The system, signal, and channel models, along with the problem formulation, are described in Section 2. Section 3 introduces the relaxed convex transformation of the formulated mixed-integer optimization problem. A suboptimal solution to the joint resource allocation and hybrid beamforming problem based on eigenmodes and the discrete Fourier transform is given in Section 4. Section 5 proposes machine learning-based user grouping and beam selection for the joint optimization problem. Simulation results are given in Section 6, followed by the conclusions in Section 7.

*Notations:* Bold upper and lower case letters denote matrices and vectors, respectively. The notations $\mathbf{X}^{-1}$, $\mathbf{X}^{\dagger}$, $\mathbf{X}^{T}$, $\mathbf{X}^{H}$, and $tr(\mathbf{X})$ denote the inverse, pseudo-inverse, transpose, Hermitian transpose, and trace of a matrix **X**, respectively. *vec*{·} is the vectorization operator, *diag*{*x*1, ..., *xn*} is a diagonal matrix, and ⊗ is the Kronecker product. $\lVert\cdot\rVert\_F$ denotes the Frobenius norm. The *n* × *n* identity matrix is denoted by $\mathbf{I}\_n$. $\mathbb{E}\{\cdot\}$ represents the expectation with respect to the random variable within the brackets.

#### **2. System Model**

Consider an FDD MU-MIMO downlink system in which a base station (BS) with *Nt* antennas is located at the cell center and transmits to *K* single-antenna users, as shown in Figure 2. The users are divided into *G* groups indexed by *g* ∈ G = {1, ..., *G*}, and group *g* contains *Kg* users.

**Figure 2.** System Model.

Assume that the BS and users have knowledge of the channel. We consider multi-carrier OFDM transmission with a narrow-band block-fading channel. The BS is equipped with *Nt* antennas in a uniform linear array (ULA) configuration. The information signal block $\mathbf{S} \in \mathbb{C}^{K \times N\_f}$ at the input of the BS transceiver for user *k* is given as

$$\mathbf{s}\_{k} = [\mathbf{s}\_{1}, \mathbf{s}\_{2}, \dots, \mathbf{s}\_{N\_{f}}], \quad \forall k, \tag{1}$$

and for the subchannel *n*,

$$\mathbf{s}\_{n} = [\mathbf{s}\_{1}, \mathbf{s}\_{2}, \dots, \mathbf{s}\_{K}]^{T}, \quad \forall n,\tag{2}$$

where *Nf* and *Ns* are the number of subchannels and the number of symbols per subchannel, respectively. In subchannel *n*, the information symbol vector is $\mathbf{s}\_n \in \mathbb{C}^{N\_s \times 1}$. We assume *Ns* = *K*, such that the transmit signal in subchannel *n* satisfies $\mathbb{E}\{\mathbf{s}\_n\mathbf{s}\_n^H\} = \frac{P\_n}{K}\mathbf{I}\_K$, where $P\_n = P\_T/N\_f$ is the transmit power per subchannel and *PT* is the total transmit power of the BS. The transmit signal **X** is obtained as $\mathbf{F}^{B}\mathbf{S}$, where $\mathbf{F}^{B} \in \mathbb{C}^{N\_t \times N\_s}$ is the precoding matrix. Hybrid beamforming divides the precoding matrix into the baseband digital precoding matrix $\mathbf{F}^{DB} \in \mathbb{C}^{N\_{RF} \times N\_s}$ and the RF analog precoding matrix $\mathbf{F}^{AB} \in \mathbb{C}^{N\_t \times N\_{RF}}$, where *NRF* is the number of RF chains, as shown in Figure 3. The transmit signal $\mathbf{X} \in \mathbb{C}^{N\_t \times N\_f}$ is given by

$$\begin{array}{rcl} \mathbf{X} &=& \mathbf{F}^{B} \mathbf{S} \\ &=& \mathbf{F}^{AB} \mathbf{F}^{DB} \mathbf{S} \end{array} \tag{3}$$

Also, the precoding matrix must satisfy

$$\begin{array}{rcl} \mathbb{E}\{tr(\mathbf{X}\mathbf{X}^{H})\} & \leq & P\_T\\ \mathbb{E}\{tr(\mathbf{F}^{AB}\mathbf{F}^{DB}\mathbf{S}\mathbf{S}^{H}\mathbf{F}^{DB^{H}}\mathbf{F}^{AB^{H}})\} & \leq & P\_T\\ tr(\mathbf{F}^{AB}\mathbf{F}^{DB}\mathbb{E}\{\mathbf{S}\mathbf{S}^{H}\}\mathbf{F}^{DB^{H}}\mathbf{F}^{AB^{H}}) & \leq & P\_T \end{array} \tag{4}$$

since $\mathbb{E}\{\mathbf{S}\mathbf{S}^H\} = \frac{P\_T}{N\_f K}\mathbf{I}\_{K N\_f}$, it follows that

$$\text{tr}(\mathbf{F}^{AB}\mathbf{F}^{DB}\mathbf{F}^{DB^H}\mathbf{F}^{AB^H}) \le N\_f K \tag{5}$$
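Constraint (5) is a Frobenius-norm budget on the cascaded precoder, since $tr(\mathbf{F}\mathbf{F}^H) = \lVert\mathbf{F}\rVert\_F^2$. A minimal sketch of enforcing it with equality, assuming the budget is met by scaling only the digital stage, could look as follows; the helper names (`matmul`, `frob2`, `normalize_digital`) and the matrix sizes and entries are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def frob2(m):
    """Squared Frobenius norm, i.e., tr(M M^H)."""
    return sum(abs(x) ** 2 for row in m for x in row)


def normalize_digital(f_ab, f_db, budget):
    """Scale F^DB so that tr(F^AB F^DB F^DB^H F^AB^H) equals `budget`."""
    scale = math.sqrt(budget / frob2(matmul(f_ab, f_db)))
    return [[scale * x for x in row] for row in f_db]


nt, n_rf, k, n_f = 4, 2, 2, 1
# Analog stage: two DFT columns (constant-modulus phases scaled by 1/sqrt(Nt))
f_ab = [[cmath.exp(-2j * math.pi * i * m / nt) / math.sqrt(nt) for m in range(n_rf)]
        for i in range(nt)]
f_db = [[1.0, 0.5j], [0.2, 1.0]]          # hypothetical digital precoder
f_db = normalize_digital(f_ab, f_db, budget=n_f * k)
print(frob2(matmul(f_ab, f_db)))           # ≈ 2.0 (= N_f K)
```

Scaling only $\mathbf{F}^{DB}$ is a design choice: it leaves the constant-modulus entries of the phase-shifter network untouched.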

The transmit signal in subchannel *n* is $\mathbf{x}\_n \in \mathbb{C}^{N\_t \times 1}$. Thus, the received signal vector $\mathbf{y}\_n \in \mathbb{C}^{K \times 1}$ at the *K* users in subchannel *n* is given by

$$\begin{array}{rcl} \mathbf{y}\_n &=& \mathbf{H}\_n^H \mathbf{x}\_n + \mathbf{w}\_n \\ &=& \mathbf{H}\_n^H \mathbf{F}^{AB} \mathbf{F}\_n^{DB} \mathbf{s}\_n + \mathbf{w}\_n \end{array} \tag{6}$$

where $\mathbf{H}\_n \triangleq [\mathbf{h}\_{1,n}, \dots, \mathbf{h}\_{K,n}] \in \mathbb{C}^{N\_t \times K}$ is the channel matrix, with $\mathbf{h}\_{k,n} = [h\_{1,k}, \dots, h\_{N\_t,k}]^H$ being the channel vector from the BS to user *k* in subchannel *n*, $\mathbf{x}\_n = \mathbf{F}^{AB}\mathbf{F}\_n^{DB}\mathbf{s}\_n$, and $\mathbf{w}\_n \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}\_K)$ is the additive white Gaussian noise (AWGN) in subchannel *n* at the users. The RF beamforming $\mathbf{F}^{AB}$ is performed in the time domain, and the same beamforming is applied to all subchannels, whereas the digital beamforming $\mathbf{F}\_n^{DB}$ is performed in the frequency domain on a per-subchannel basis [11]. In the *n*th subchannel, the *j*th UE receives the sum of all transmitted signals for the *K* UEs over its channel $\mathbf{h}\_{j,n}$ as

$$\mathbf{y}\_{j,n} = \sum\_{k=1}^{K} \mathbf{h}\_{j,n}^{H} \mathbf{x}\_{k,n} + w\_{j,n} \tag{7}$$

where $\mathbf{h}\_{j,n}$ is the *Nt* × 1 channel vector. We denote the rank of the channel matrix $\mathbf{H}\_{j,n}$ by $r\_{j,n}$, where $0 \le r\_{j,n} \le \min(K, N\_t)$, ∀*n*. In matrix form, the above equation is given as

$$\mathbf{y}\_{j,n} = \mathbf{h}\_{j,n}^H \mathbf{x}\_n + \mathbf{w}\_{j,n} \tag{8}$$

The 1 × *Nf* received signal at the *k*th UE is given by

$$\mathbf{y}\_k = [\mathbf{h}\_{k,1}^H \mathbf{F}^{AB} \mathbf{F}\_1^{DB} \mathbf{s}\_{1}, \dots, \mathbf{h}\_{k,N\_f}^H \mathbf{F}^{AB} \mathbf{F}\_{N\_f}^{DB} \mathbf{s}\_{N\_f}] + \mathbf{w}\_{k}, \tag{9}$$

Stacking the received signals of all *K* UEs row-wise into $\mathbf{Y} = [\mathbf{y}\_1^T, \dots, \mathbf{y}\_K^T]^T$, we obtain the system equation

$$\mathbf{Y} = \mathbf{H}^{H} \mathbf{F}^{AB} \mathbf{F}^{DB} \mathbf{S} + \mathbf{W},\tag{10}$$

where $\mathbf{Y}, \mathbf{W} \in \mathbb{C}^{K \times N\_f}$.

**Figure 3.** Block diagram of mmWave massive MIMO BS and UEs.

#### *2.1. Channel Model*

Generally, massive MIMO channel models are categorized into two types: (i) analytical models and (ii) physical models [4]. Analytical models are commonly used for the theoretical analysis of wireless communication systems. The most commonly used analytical model is the Kronecker channel model, a correlation-based model that characterizes the MIMO channel matrix in terms of separate transmit-side and receive-side spatial correlation matrices [24],

$$\mathbf{R}\_{H} = \mathbb{E}\{\text{vec}\{\mathbf{H}\}\text{vec}\{\mathbf{H}\}^{H}\} = \mathbf{R}\_{tx} \otimes \mathbf{R}\_{rx} \tag{11}$$

Under this assumption, the channel matrix **H** simplifies to the Kronecker model,

$$\mathbf{H} = \mathbf{R}\_{rx}^{1/2} \mathbf{K} \mathbf{R}\_{tx}^{1/2} \tag{12}$$

where **K** is a MIMO channel matrix with i.i.d. unit-variance $\mathcal{CN}(0, 1)$ entries, and $\mathbf{R}\_{tx}$ and $\mathbf{R}\_{rx}$ are the transmit and receive correlation matrices, respectively. The transmit and receive correlation matrices are given as [24]

$$\mathbf{R}\_{\rm tx} = \mathbb{E}\{\mathbf{H}^H \mathbf{H}\}, \qquad \mathbf{R}\_{\rm rx} = \mathbb{E}\{\mathbf{H} \mathbf{H}^H\} \tag{13}$$

Physical models explicitly model wave propagation parameters, such as the complex amplitude, direction of departure (DoD), direction of arrival (DoA), and delay of an MPC [24,25]. MmWave propagation leads to limited spatial scattering because of the high free-space path loss. In addition, large, tightly packed antenna arrays lead to high levels of antenna correlation. Sparse scattering and spatial antenna correlation make many of the commonly used statistical fading distributions inaccurate for mmWave channel modeling. Therefore, we use the extended Saleh–Valenzuela model, which accurately describes the mathematical structure present in mmWave channels [26,27]. For simplicity, we assume that each scattering cluster around the transmitter and receiver contributes a single propagation path [28].

In general, the mmWave MIMO channel matrix between the BS with *Nt* transmit antennas and user *k* with *nr* receive antennas in subchannel *n* can be modeled as a double-directional channel,

$$\mathbf{H}\_{k,n} = \sqrt{\frac{N\_t n\_r}{\rho\_{k,n} L}} \sum\_{l=0}^{L} \alpha\_{k,n,l} \mathbf{a}(\phi\_{k,n,l}) \mathbf{b}^H(\theta\_{k,n,l}), \tag{14}$$

where *L* is the total number of multipaths, $\alpha\_{k,n,l}$ is the complex gain of the *l*th path, distributed i.i.d. as $\mathcal{CN}(0, 1)$, and $\rho\_{k,n}$ is the distance-dependent path loss between the BS and user *k* [29]. The LOS path is included as *l* = 0. Moreover, **a** and **b** are the receive and transmit steering vectors, respectively. The variables $\phi\_{k,n,l} \in [0, 2\pi)$ and $\theta\_{k,n,l} \in [0, 2\pi)$ are the *l*th path's azimuth angles of arrival and departure (boresight angles at the receive and transmit arrays), respectively. The steering vectors are given by

$$\mathbf{a}(\phi\_{k,n,l}) = \frac{1}{\sqrt{n\_r}} [a\_1(\phi\_{k,n,l}), \dots, a\_{n\_r}(\phi\_{k,n,l})]^T \tag{15}$$

$$\mathbf{b}(\theta\_{k,n,l}) = \frac{1}{\sqrt{N\_t}} [b\_1(\theta\_{k,n,l}), \dots, b\_{N\_t}(\theta\_{k,n,l})]^T \tag{16}$$

The elements of transmit and receive steering vectors are given by

$$b\_{i}(\theta\_{k,n,l}) = e^{-j\omega\tau\_{i,l}} = e^{-j2\pi(i-1)\frac{d\_t}{\lambda}\sin(\theta\_{k,n,l})}, \quad i = 1, 2, \dots, N\_t \tag{17}$$

$$a\_{i}(\phi\_{k,n,l}) = e^{-j\omega\tau\_{i,l}} = e^{-j2\pi(i-1)\frac{d\_r}{\lambda}\sin(\phi\_{k,n,l})}, \quad i = 1, 2, \dots, n\_r \tag{18}$$

where *λ* is the wavelength, $\omega = 2\pi/\lambda$, $\tau\_{i,l}$ is the beamforming delay, and *dt* and *dr* are the antenna spacings at the transmitter and receiver, respectively.
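Equations (16) and (17) translate directly into code. The sketch below builds the transmit steering vector for hypothetical values ($N\_t = 8$, $d\_t = \lambda/2$, $\theta = 30^\circ$) and checks two properties that follow from the definitions: the $1/\sqrt{N\_t}$ factor gives unit norm, and consecutive elements differ by the constant phase factor $e^{-j2\pi (d\_t/\lambda)\sin\theta}$. The helper name `tx_steering` is ours, and indices are 0-based in the code (the text uses $i - 1$).

```python
import cmath
import math


def tx_steering(theta, nt, dt_over_lambda=0.5):
    """b(θ) from Eqs. (16)-(17): element i (0-based here) is
    exp(-j 2π i (d_t/λ) sin θ) / sqrt(N_t)."""
    return [cmath.exp(-2j * math.pi * i * dt_over_lambda * math.sin(theta)) / math.sqrt(nt)
            for i in range(nt)]


b = tx_steering(math.radians(30), nt=8)
norm = math.sqrt(sum(abs(x) ** 2 for x in b))
step = b[1] / b[0]                           # phase progression between adjacent elements
print(round(norm, 12))                       # 1.0
print(cmath.phase(step) / math.pi)           # ≈ -0.5, i.e., -2π·0.5·sin(30°) = -π/2
```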

The channel matrix in (14) can also be written in a more compact form as

$$\mathbf{H}\_{k,n} = \sqrt{\nu} \mathbf{A}\_{k,n} \mathbf{D}\_{k,n} \mathbf{B}\_{k,n}^H \tag{19}$$

where $\nu = \frac{N\_t n\_r}{\rho L}$, and $\mathbf{A}\_{k,n}$ and $\mathbf{B}\_{k,n}$ consist of the stacked steering vectors of the AoAs and AoDs, respectively, i.e., $\mathbf{A}\_{k,n} = [\mathbf{a}(\phi\_{k,n,1}), \mathbf{a}(\phi\_{k,n,2}), \dots, \mathbf{a}(\phi\_{k,n,L})]$ and $\mathbf{B}\_{k,n} = [\mathbf{b}(\theta\_{k,n,1}), \mathbf{b}(\theta\_{k,n,2}), \dots, \mathbf{b}(\theta\_{k,n,L})]$. The matrix $\mathbf{D}\_{k,n}$ is diagonal, given as $\mathbf{D}\_{k,n} = diag\{\alpha\_{k,n,1}, \alpha\_{k,n,2}, \dots, \alpha\_{k,n,L}\}$. The small-scale fading of the *l*th multipath component (MPC) at user *k* in subchannel *n* is given by $\alpha\_{k,n,l}$, with zero mean and variance $\sigma\_{k,n,l}^2$. Assume that the MPCs are independent, with $\sum\_{l=1}^{L}\sigma\_{k,n,l}^2 = 1$. We can then express the channel model in (19) as

$$\mathbf{H}\_{k,n} = \sqrt{\nu} \mathbf{A}\_{k,n} \boldsymbol{\Sigma}\_{k,n} \bar{\mathbf{D}}\_{k,n} \mathbf{B}\_{k,n}^{H}, \tag{20}$$

where $\boldsymbol{\Sigma}\_{k,n} = diag\{\sigma\_{k,n,1}, \sigma\_{k,n,2}, \dots, \sigma\_{k,n,L}\}$ and $\bar{\mathbf{D}}\_{k,n} = diag\{\bar{\alpha}\_{k,n,1}, \bar{\alpha}\_{k,n,2}, \dots, \bar{\alpha}\_{k,n,L}\}$ with $\bar{\alpha}\_{k,n,l} = \alpha\_{k,n,l}/\sigma\_{k,n,l}$, such that $\mathbb{E}\{\bar{\alpha}\_{k,n,l}\} = 0$ and $\mathbb{E}\{|\bar{\alpha}\_{k,n,l}|^2\} = 1$.

Substituting (20) into (13) and averaging over the small-scale fading, we obtain the transmit and receive correlation matrices for user *k* in subchannel *n* as

$$\mathbf{R}\_{tx,k,n} = \nu \mathbf{B}\_{k,n} \boldsymbol{\Sigma}\_{k,n}^2 \mathbf{B}\_{k,n}^H \tag{21}$$

$$\mathbf{R}\_{rx,k,n} = \nu \mathbf{A}\_{k,n} \boldsymbol{\Sigma}\_{k,n}^2 \mathbf{A}\_{k,n}^H \tag{22}$$
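Equation (21) can be evaluated without forming the diagonal matrix explicitly, because $\mathbf{B}\_{k,n}\boldsymbol{\Sigma}\_{k,n}^2\mathbf{B}\_{k,n}^H = \sum\_{l} p\_l\,\mathbf{b}(\theta\_{k,n,l})\mathbf{b}^H(\theta\_{k,n,l})$, where $p\_l$ denotes the *l*th diagonal entry of $\boldsymbol{\Sigma}\_{k,n}^2$ (the power of path *l*). The sketch below does this for hypothetical path angles and powers, assuming a half-wavelength ULA; the helper names are ours. Since each steering vector has unit norm and the path powers are normalized to sum to one, the trace of the result equals $\nu$, which serves as a quick sanity check.

```python
import cmath
import math


def tx_steering(theta, nt, d_over_lambda=0.5):
    """Unit-norm transmit steering vector b(θ) of a ULA (Eqs. (16)-(17))."""
    return [cmath.exp(-2j * math.pi * i * d_over_lambda * math.sin(theta)) / math.sqrt(nt)
            for i in range(nt)]


def tx_covariance(thetas, path_powers, nt, nu):
    """R_tx = ν Σ_l p_l b(θ_l) b(θ_l)^H, i.e., ν B Σ² B^H with Σ² = diag(p_l)."""
    cols = [tx_steering(t, nt) for t in thetas]
    return [[nu * sum(p * b[i] * b[j].conjugate() for p, b in zip(path_powers, cols))
             for j in range(nt)] for i in range(nt)]


# Hypothetical 3-path channel: AoDs in degrees and normalized path powers
thetas = [math.radians(a) for a in (10.0, 25.0, -35.0)]
powers = [0.6, 0.3, 0.1]
nu = 2.0
R = tx_covariance(thetas, powers, nt=8, nu=nu)
trace = sum(R[i][i] for i in range(8))
print(abs(trace - nu) < 1e-12)   # True: each |b|^2 = 1 and the powers sum to 1
```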

For mmWave massive MIMO systems with a large number of antennas, the steering vectors are asymptotically orthogonal to each other [6]:

$$\mathbf{a}^{H}(\phi\_{k,n,l})\mathbf{a}(\phi\_{k,n,l'}) \approx 0, \qquad \mathbf{b}^{H}(\theta\_{k,n,l})\mathbf{b}(\theta\_{k,n,l'}) \approx 0, \quad l \neq l'. \tag{23}$$
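The approximation (23) can be checked numerically. For a half-wavelength ULA, $|\mathbf{b}^H(\theta)\mathbf{b}(\theta')| = |\sin(N\_t\pi\psi)| / (N\_t|\sin(\pi\psi)|)$ with $\psi = \frac{1}{2}(\sin\theta - \sin\theta')$, which is bounded by $1/(N\_t|\sin(\pi\psi)|)$ and therefore vanishes as $N\_t$ grows. The sketch below, with two hypothetical departure angles and helper names of our own, prints the inner-product magnitude for increasing array sizes.

```python
import cmath
import math


def tx_steering(theta, nt, d_over_lambda=0.5):
    """Unit-norm ULA steering vector b(θ)."""
    return [cmath.exp(-2j * math.pi * i * d_over_lambda * math.sin(theta)) / math.sqrt(nt)
            for i in range(nt)]


def correlation(theta1, theta2, nt):
    """|b^H(θ1) b(θ2)| for an nt-element half-wavelength ULA."""
    b1, b2 = tx_steering(theta1, nt), tx_steering(theta2, nt)
    return abs(sum(x.conjugate() * y for x, y in zip(b1, b2)))


t1, t2 = math.radians(20), math.radians(35)   # hypothetical AoDs of two paths
for nt in (16, 64, 256):
    print(nt, round(correlation(t1, t2, nt), 4))
```

The decay is not monotone in $N\_t$ (the Dirichlet kernel oscillates), but its envelope falls as $1/N\_t$, which is what (23) relies on.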

Moreover, in mmWave massive MIMO, acquiring the instantaneous full CSI is not practical. Instead, average CSI in terms of $[\mathbf{A}\_{k,n}]$, $[\mathbf{B}\_{k,n}]$, and $[\boldsymbol{\Sigma}\_{k,n}]$ is a practical basis for beamforming design, because the channel statistics remain valid over coherence times on the order of a few seconds or more, compared with the order of milliseconds for small-scale fading [6].

#### *2.2. Problem Formulation*

Hybrid beamforming divides the beamforming matrix into two parts: the covariance-based pre-beamforming matrix $\mathbf{F}^{AB}$, realized by analog beamformers, and the reduced-dimension MU-MIMO digital precoder based on the effective channel $\mathbf{H}^H\mathbf{F}^{AB}$ (the subchannel subscript is omitted for simplicity). We assume that the *K* users are divided into *G* groups such that group *g* contains *Kg* users. Since the users are near ground level and surrounded by scatterers, whereas the elevated base station is free of local scatterers, we adopt the one-ring model [1] and assume that all users in group *g* experience the same azimuth center angle ($\theta\_g$) and angular spread ($\Delta\_g$). In this case, $\mathbf{R}\_{rx} = \mathbf{I}$ in (12); therefore, the channel covariance matrix of each user in group *g* is given by [30]

$$\mathbf{R}\_{g} = \mathbb{E}\{\mathbf{h}\_{g\_k} \mathbf{h}\_{g\_k}^H\} \tag{24}$$

for which the eigenvalue decomposition gives

$$\mathbf{R}\_{g} = \mathbf{U}\_{g} \boldsymbol{\Lambda}\_{g} \mathbf{U}\_{g}^H \tag{25}$$

where $\mathbf{U}\_g \in \mathbb{C}^{N\_t \times r\_g}$ is a tall matrix with orthonormal columns ($\mathbf{U}\_g^H\mathbf{U}\_g = \mathbf{I}\_{r\_g}$) comprising the eigenvectors of $\mathbf{R}\_g$, and $\boldsymbol{\Lambda}\_g \in \mathbb{R}^{r\_g \times r\_g}$ is a diagonal matrix with the $r\_g$ nonzero positive eigenvalues along its diagonal. The (*i*, *j*)th element of the covariance matrix $\mathbf{R}\_g$ represents the correlation between the channel coefficients of antenna elements *i* and *j*:

$$[\mathbf{R}\_{g}]\_{i,j} = \frac{1}{2\Delta\_{g}} \int\_{\theta\_{g} - \Delta\_{g}}^{\theta\_{g} + \Delta\_{g}} e^{-j2\pi \frac{d}{\lambda}(i-j)\sin\theta} \, d\theta, \tag{26}$$

where *d* is the distance between the antenna elements of the ULA and *λ* is the wavelength of the carrier frequency. Using the Karhunen–Loève representation, the channel vector of user *k* in group *g* is given as

$$\mathbf{h}\_{g\_k} = \mathbf{U}\_{g} \boldsymbol{\Lambda}\_{g}^{1/2} \mathbf{z}\_{g\_k} = \mathbf{U}\_{g} \tilde{\mathbf{h}}\_{g\_k} \tag{27}$$

where $\mathbf{z}\_{g\_k} \in \mathbb{C}^{r\_g \times 1} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}\_{r\_g})$ and $\tilde{\mathbf{h}}\_{g\_k}$ is the beam-domain channel. For large *Nt*, the columns of $\mathbf{U}\_g$ tend to columns of the discrete Fourier transform (DFT) matrix $\Delta\_{N\_t} \in \mathbb{C}^{N\_t \times N\_t}$ [31]. Each column of $\mathbf{U}\_g$ represents one angle-of-departure (AoD) direction, i.e., a *beam*.
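The DFT limit can be illustrated directly. With half-wavelength spacing, the *m*th column of the unitary $N\_t$-point DFT matrix equals the steering vector (17) at the angle satisfying $\sin\theta = 2m/N\_t$ (taken modulo 2), and distinct columns are orthonormal, so each column acts as one beam. The sketch below checks both facts for a hypothetical $N\_t = 16$, using helper names of our own; it illustrates the DFT structure only and does not compute the eigenvectors of a particular $\mathbf{R}\_g$.

```python
import cmath
import math


def dft_matrix(nt):
    """Unitary DFT matrix; entry [i][m] = exp(-j2π i m / nt) / sqrt(nt)."""
    return [[cmath.exp(-2j * math.pi * i * m / nt) / math.sqrt(nt) for m in range(nt)]
            for i in range(nt)]


def tx_steering(theta, nt, d_over_lambda=0.5):
    """Unit-norm ULA steering vector b(θ), Eqs. (16)-(17)."""
    return [cmath.exp(-2j * math.pi * i * d_over_lambda * math.sin(theta)) / math.sqrt(nt)
            for i in range(nt)]


def column(matrix, m):
    """m-th column of a list-of-rows matrix."""
    return [row[m] for row in matrix]


nt = 16
U = dft_matrix(nt)

# Gram matrix U^H U should be the identity (orthonormal beams)
gram_err = max(abs(sum(x.conjugate() * y for x, y in zip(column(U, p), column(U, q))) - (p == q))
               for p in range(nt) for q in range(nt))

# Column m = 3 points toward sin(θ) = 2·3/16 = 0.375
beam_err = max(abs(x - y) for x, y in zip(column(U, 3), tx_steering(math.asin(2 * 3 / nt), nt)))
print(gram_err < 1e-12, beam_err < 1e-12)   # True True
```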

Alternatively, when only $\hat{r}\_g \le r\_g$ dominant eigenvalues are considered, the channel vector can be written as ([13], Equation (5))

$$\mathbf{h}\_{g\_k} = N\_t^{1/2} \mathbf{R}\_{g}^{1/2} \mathbf{z}\_{g\_k} \tag{28}$$
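The one-ring covariance (26) has no simple closed form for a general angular spread, but it is easy to evaluate numerically. Because $[\mathbf{R}\_g]\_{i,j}$ depends only on the lag $i-j$, the sketch below computes one midpoint-rule integral per lag and fills a Hermitian Toeplitz matrix. The spacing $d = \lambda/2$, the number of integration points, the example angles, and the helper name `one_ring_covariance` are assumptions.

```python
import cmath
import math


def one_ring_covariance(nt, theta_g, delta_g, d_over_lambda=0.5, steps=2000):
    """[R_g]_{i,j} = (1/2Δ) ∫_{θg-Δ}^{θg+Δ} exp(-j2π (d/λ)(i-j) sin θ) dθ, midpoint rule."""
    width = 2 * delta_g / steps
    grid = [theta_g - delta_g + (s + 0.5) * width for s in range(steps)]
    # The integrand depends only on the lag i - j, so compute each lag once;
    # averaging over the grid equals (1/2Δ) times the midpoint-rule integral.
    lag = {m: sum(cmath.exp(-2j * math.pi * d_over_lambda * m * math.sin(t)) for t in grid) / steps
           for m in range(-(nt - 1), nt)}
    return [[lag[i - j] for j in range(nt)] for i in range(nt)]


# Hypothetical group: center angle 30°, angular spread ±10°, 8 antennas
R = one_ring_covariance(8, math.radians(30), math.radians(10))
print(abs(R[0][0]))                                # 1.0: unit power per antenna
print(abs(R[0][1] - R[1][0].conjugate()) < 1e-12)  # True: Hermitian
```

A narrow angular spread drives the off-diagonal magnitudes toward 1 (a nearly rank-one covariance), whereas a wide spread decorrelates the antennas and raises the effective rank $r\_g$.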

The limited feedback-based hybrid beamforming consists of the analog pre-beamforming matrix $\mathbf{F}\_g^{AB} \in \mathbb{C}^{N\_t \times N\_{RF,g}}$, responsible for spatial group formation and inter-group interference mitigation, and the digital multiuser precoding matrix $\mathbf{F}\_g^{DB} \in \mathbb{C}^{N\_{RF,g} \times S\_g}$, responsible for spatial multiplexing inside the group and inter-user interference mitigation. Here, $N\_{RF,g}$ is the number of RF chains for group *g*, such that $S\_g < N\_{RF,g} < \hat{r}\_g$, and $S\_g$ is the number of multi-carrier information symbol vectors for group *g*, with $N\_{RF} = \sum\_{g=1}^{G} N\_{RF,g}$ and $S = \sum\_{g=1}^{G} S\_g$. The overall analog pre-beamforming matrix $\mathbf{F}^{AB} \in \mathbb{C}^{N\_t \times N\_{RF}}$ is given by

$$\mathbf{F}^{AB} = [\mathbf{F}\_1^{AB}, \dots, \mathbf{F}\_G^{AB}], \tag{29}$$

and the overall digital beamforming matrix $\mathbf{F}^{DB} \in \mathbb{C}^{N\_{RF} \times N\_s}$ is given by

$$\mathbf{F}^{DB} = \operatorname{diag} [\mathbf{F}\_1^{DB}, \dots, \mathbf{F}\_G^{DB}] \tag{30}$$

and the overall channel matrix is

$$\mathbf{H} = [\mathbf{H}\_1, \dots, \mathbf{H}\_G],\tag{31}$$

where the channel matrix of group *g* is defined as $\mathbf{H}\_g \triangleq [\mathbf{h}\_{g\_1}, \dots, \mathbf{h}\_{g\_{K\_g}}]$.

The analog pre-beamformer $\mathbf{F}\_g^{AB}$ is based on the slowly varying channel covariance matrix $\mathbf{R}\_g$ and can be implemented by the DFT matrix (when *Nt* is large), whereas the digital beamformer $\mathbf{F}\_g^{DB}$ is based on the instantaneous channel information of the reduced-dimension effective channel $\mathbf{H}\_g^H\mathbf{F}\_g^{AB}$. The overall effective channel is given by

$$\mathbf{H}\_{eff}^{H} = \mathbf{H}^{H} \mathbf{F}^{AB} = \begin{bmatrix} \mathbf{H}\_1^{H} \mathbf{F}\_1^{AB} & \mathbf{H}\_1^{H} \mathbf{F}\_2^{AB} & \cdots & \mathbf{H}\_1^{H} \mathbf{F}\_G^{AB} \\ \mathbf{H}\_2^{H} \mathbf{F}\_1^{AB} & \mathbf{H}\_2^{H} \mathbf{F}\_2^{AB} & \cdots & \mathbf{H}\_2^{H} \mathbf{F}\_G^{AB} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{H}\_G^{H} \mathbf{F}\_1^{AB} & \mathbf{H}\_G^{H} \mathbf{F}\_2^{AB} & \cdots & \mathbf{H}\_G^{H} \mathbf{F}\_G^{AB} \end{bmatrix} \tag{32}$$

The excessive pilot transmission in the downlink and feedback in the uplink of an FDD system can be reduced by sending only group-wise, average CSI-based channel estimates in the uplink. This is accomplished by using the diagonal blocks $\mathbf{H}\_g^H\mathbf{F}\_g^{AB}$ of (32), each of size $K\_g \times N\_{RF,g}$ for *g* = 1, ..., *G*, as the feedback information. The analog pre-beamforming is designed such that the off-diagonal blocks of (32) satisfy $\mathbf{H}\_g^H\mathbf{F}\_{g'}^{AB} \approx \mathbf{0}$ for all $g' \neq g$. This group-wise division creates virtual sectors, with each group corresponding to one virtual sector [30].
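The feedback saving follows from simple counting: with full instantaneous CSI, each of the *K* users reports an $N\_t$-dimensional channel, whereas with the group-wise scheme, user *k* in group *g* reports only its $N\_{RF,g}$-dimensional effective channel (one row of $\mathbf{H}\_g^H\mathbf{F}\_g^{AB}$). The sketch below computes the relative reduction in the number of fed-back complex coefficients for hypothetical dimensions; the helper name `feedback_reduction` is ours, and the count ignores quantization and the overhead of estimating $\mathbf{R}\_g$, so its numbers are not the results reported in this paper.

```python
def feedback_reduction(nt, users_per_group, rf_per_group):
    """Fraction of fed-back complex coefficients saved by group-wise effective CSI."""
    full = nt * sum(users_per_group)                    # K users x N_t coefficients
    effective = sum(k_g * n_rf_g                        # sum over g of K_g x N_RF,g
                    for k_g, n_rf_g in zip(users_per_group, rf_per_group))
    return 1 - effective / full


# Hypothetical setup: N_t = 64, two groups of 4 users, 6 RF chains per group
print(feedback_reduction(64, [4, 4], [6, 6]))  # 0.90625
```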

The second-order channel statistics-based RF beamformer $\mathbf{F}^{AB}$ remains the same across multiple coherence blocks, which gives the effective instantaneous channel between the BS and user *k* as

$$\mathbf{h}\_{n,g\_k,eff}^H \triangleq \mathbf{h}\_{n,g\_k}^H \mathbf{F}\_{g}^{AB}, \tag{33}$$

with $\mathbf{h}\_{n,g\_k,eff} \in \mathbb{C}^{N\_{RF,g} \times 1}$. Therefore, channel statistics-based CSI substantially reduces the feedback overhead of each user; with instantaneous CSI, each user would have to send an *Nt* × 1 channel estimate on the uplink. Using (13) and (21), the covariance of the effective channel $\mathbf{h}\_{n,g\_k,eff}$ is given by

$$\begin{aligned} \mathbb{E}\{\mathbf{h}\_{n,g\_k,eff}\mathbf{h}\_{n,g\_k,eff}^{H}\} &= \mathbb{E}\{\mathbf{F}\_{g}^{AB^{H}}\mathbf{h}\_{n,g\_k}\mathbf{h}\_{n,g\_k}^{H}\mathbf{F}\_{g}^{AB}\} \\ &= \nu \mathbf{F}\_{g}^{AB^{H}}\mathbf{B}\_{k,n}\boldsymbol{\Sigma}\_{k,n}^{2}\mathbf{B}\_{k,n}^{H}\mathbf{F}\_{g}^{AB} \end{aligned} \tag{34}$$

The analog beamformer consists of columns of the DFT matrix, which can easily be implemented with a phase-shifter network; since these columns approximate the eigenvectors of the channel covariance matrix for large *Nt*, $\mathbf{F}\_g^{AB}$ can be obtained from the eigenvalue decomposition of the channel covariance matrix. With group-wise hybrid beamforming, the received signal $\mathbf{y}\_{g,n}$ for group *g* in subchannel *n* becomes

$$\mathbf{y}\_{g,n} = \mathbf{H}\_{g,n}^{H} \mathbf{F}\_{g}^{AB} \mathbf{F}\_{g,n}^{DB} \mathbf{s}\_{g,n} + \mathbf{H}\_{g,n}^{H} \sum\_{g' \neq g} \mathbf{F}\_{g'}^{AB} \mathbf{F}\_{g',n}^{DB} \mathbf{s}\_{g',n} + \mathbf{w}\_{g,n} \tag{35}$$

and the received signal of user *k* in group *g* in subchannel *n* is given by

$$\begin{aligned} y\_{g\_k,n} = {} & \mathbf{h}\_{g\_k,n}^{H} \mathbf{F}\_{g}^{AB} \mathbf{f}\_{g\_k,n}^{DB} \mathbf{s}\_{g\_k,n} + \underbrace{\mathbf{h}\_{g\_k,n}^{H} \sum\_{k' \neq k} \mathbf{F}\_{g}^{AB} \mathbf{f}\_{g\_{k'},n}^{DB} \mathbf{s}\_{g\_{k'},n}}\_{\text{Inter-user interference}} \\ & + \underbrace{\mathbf{h}\_{g\_k,n}^{H} \sum\_{g' \neq g} \sum\_{j} \mathbf{F}\_{g'}^{AB} \mathbf{f}\_{g'\_j,n}^{DB} \mathbf{s}\_{g'\_j,n}}\_{\text{Inter-group interference}} + w\_{g\_k,n} \end{aligned} \tag{36}$$
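The inter-user term in (36) vanishes if the digital precoder of group *g* inverts that group's effective channel. A minimal zero-forcing sketch, assuming $K\_g \le N\_{RF,g}$ and a full-row-rank effective channel $\mathbf{G}\_g = \mathbf{H}\_g^H\mathbf{F}\_g^{AB}$ (the hypothetical entries below stand in for the fed-back effective CSI), sets $\mathbf{F}\_g^{DB} = \mathbf{G}\_g^H(\mathbf{G}\_g\mathbf{G}\_g^H)^{-1}$ so that $\mathbf{G}\_g\mathbf{F}\_g^{DB} = \mathbf{I}$. The helper names are ours; power normalization and the inter-group term are left out.

```python
def matmul(a, b):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def herm(m):
    """Hermitian (conjugate) transpose."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


def inverse(m):
    """Inverse of a square complex matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [[complex(x) for x in row] + [1 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def zf_digital_precoder(g_eff):
    """F^DB = G^H (G G^H)^{-1} for the K_g x N_RF,g effective channel G."""
    g_h = herm(g_eff)
    return matmul(g_h, inverse(matmul(g_eff, g_h)))


# Hypothetical effective channel of one group: K_g = 2 users, N_RF,g = 3
G = [[1 + 1j, 0.5, 0.2j],
     [0.3, 1 - 0.5j, 0.4]]
F_db = zf_digital_precoder(G)
residual = matmul(G, F_db)   # should equal the 2 x 2 identity
print([[round(abs(x), 9) for x in row] for row in residual])  # [[1.0, 0.0], [0.0, 1.0]]
```

Zero-forcing is used here because the text names it as the digital precoder enabled by the reduced feedback; a regularized or MMSE variant would trade residual interference for noise robustness.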

The received signal-to-interference-plus-noise ratio (SINR) of user *k* in group *g* on subchannel *n* is given by

$$\mathrm{SINR}_{g_k,n} = \frac{|\mathbf{h}_{g_k,n}^H \mathbf{F}_{g}^{AB} \mathbf{f}_{g_k,n}^{DB}|^2}{\sum_{k' \neq k} |\mathbf{h}_{g_k,n}^H \mathbf{F}_{g}^{AB} \mathbf{f}_{g_{k'},n}^{DB}|^2 + \sum_{g' \neq g} \sum_{j} |\mathbf{h}_{g_k,n}^H \mathbf{F}_{g'}^{AB} \mathbf{f}_{g'_j,n}^{DB}|^2 + \sigma^2} \tag{37}$$

The spectral efficiency of user *k* in group *g* and subchannel *n* is expressed as

$$R_{g_k,n} = \Psi_{g_k,n} \log_2(1 + \mathrm{SINR}_{g_k,n}), \tag{38}$$

where $\Psi_{g_k,n}$ is a binary variable that equals 1 if user *k* in group *g* is scheduled on subchannel *n* and 0 otherwise. To balance throughput and fairness [32], we use proportional fairness (PF)-based throughput maximization and define the per-user PF metric as

$$\mathcal{U}(\mathbf{F}_{g}^{AB}, \mathbf{f}_{g_k,n}^{DB}) = \frac{R_{g_k,n}(t)}{\bar{R}_{g_k,n}(t)}, \quad \forall n, g_k \tag{39}$$

where $\bar{R}_{g_k,n}(t)$ is the average (moving-average) throughput over a past window of length $T_w = 1/\alpha$ [33], updated as

$$
\bar{R}_{g_k,n}(t) = \alpha R_{g_k,n}(t-1) + (1-\alpha)\bar{R}_{g_k,n}(t-1), \tag{40}
$$
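The SINR (37), the rate (38), the PF metric (39) and the moving-average update (40) can be sketched together in plain Python. This is a toy illustration with hypothetical function names and made-up numbers: the composite beamformers $\mathbf{F}_g^{AB}\mathbf{f}^{DB}$ are passed in directly instead of being computed, and the interference terms are written as $|\mathbf{h}^H\mathbf{F}\mathbf{f}|^2$ in line with (36).

```python
import math

def h_inner(h, w):
    """h^H w for equal-length sequences of (complex) numbers."""
    return sum(a.conjugate() * b for a, b in zip(h, w))

def user_sinr(h, w_own, w_intra, w_inter, noise_var):
    """SINR with the structure of (37); each w is a composite beamformer F^AB f^DB."""
    signal = abs(h_inner(h, w_own)) ** 2
    intra = sum(abs(h_inner(h, w)) ** 2 for w in w_intra)   # inter-user (same group)
    inter = sum(abs(h_inner(h, w)) ** 2 for w in w_inter)   # inter-group
    return signal / (intra + inter + noise_var)

def user_rate(sinr, scheduled):
    """Spectral efficiency (38): Psi * log2(1 + SINR), with Psi = 1 if scheduled, else 0."""
    return (1 if scheduled else 0) * math.log2(1.0 + sinr)

def pf_metric(rate, avg_rate, eps=1e-9):
    """PF metric (39); eps is an added guard against a zero average, not part of (39)."""
    return rate / max(avg_rate, eps)

def update_avg_rate(avg_rate, served_rate, alpha):
    """Moving average (40): alpha * R(t-1) + (1 - alpha) * R_bar(t-1)."""
    return alpha * served_rate + (1.0 - alpha) * avg_rate

# Toy two-antenna SINR example (illustrative numbers): SINR = 1 / (0 + 0.25 + 0.75).
sinr = user_sinr(h=[1.0, 0.0], w_own=[1.0, 0.0], w_intra=[[0.0, 1.0]],
                 w_inter=[[0.5, 0.0]], noise_var=0.75)

# PF scheduling of two users with hypothetical constant rates; u1 always has the better channel.
rates = {"u1": 2.0, "u2": 1.0}
avg = {"u1": 1.0, "u2": 1.0}
served = []
for t in range(6):
    best = max(rates, key=lambda u: pf_metric(rates[u], avg[u]))
    served.append(best)
    for u in rates:
        # An unscheduled user (Psi = 0) is credited with zero rate in this slot.
        avg[u] = update_avg_rate(avg[u], rates[u] if u == best else 0.0, alpha=0.5)
print(sinr, user_rate(sinr, True), served)
```

In the PF run, u1 always has twice the instantaneous rate of u2, yet the scheduler alternates between the two users, because the metric divides each rate by that user's moving-average throughput.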

The large number of antennas in massive MIMO systems allows the eigenmodes of the channel covariance matrix to be approximated by DFT beams, i.e., $\mathbf{B}_{k,n}$ consists of columns of the DFT matrix [6]. DFT-based beams with *Nt* = 16 and *Nt* = 64 are shown in Figure 4a,b, respectively.

**Figure 4.** DFT-based beams in a 120° sector. (**a**) DFT-based beams in a 120° sector with *Nt* = 16; (**b**) DFT-based beams in a 120° sector with *Nt* = 64.
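As a rough numerical companion to Figure 4, the sketch below computes the normalized power pattern of a DFT beam and its half-power width inside the 120° sector. It assumes a half-wavelength-spaced ULA, which the text does not state explicitly; under that assumption, DFT column $b$ points toward $\sin\theta = 2b/N_t$ for $b < N_t/2$. The function names are hypothetical.

```python
import cmath
import math

def dft_beam_gain(n_t, beam_idx, theta_deg):
    """Normalized power gain (peak 1) of DFT column `beam_idx` toward angle theta_deg.

    Assumes a half-wavelength-spaced ULA: antenna m has steering phase pi*m*sin(theta),
    and the DFT column applies phase 2*pi*m*beam_idx/n_t.
    """
    s = math.sin(math.radians(theta_deg))
    acc = sum(cmath.exp(1j * (2 * math.pi * m * beam_idx / n_t - math.pi * m * s))
              for m in range(n_t))
    return abs(acc) ** 2 / n_t ** 2

def half_power_width_deg(n_t, beam_idx=0, step=0.05):
    """Width (degrees) of the angles in the 120-degree sector with gain >= 0.5.

    Assumes the main lobe is the only region above half power (true for the broadside beam).
    """
    angles = [-60.0 + i * step for i in range(int(round(120.0 / step)) + 1)]
    inside = [a for a in angles if dft_beam_gain(n_t, beam_idx, a) >= 0.5]
    return max(inside) - min(inside)

for n_t in (16, 64):
    print(n_t, round(half_power_width_deg(n_t), 2))
```

For the broadside beam, the half-power width shrinks from roughly 6° with $N_t = 16$ to roughly 1.5° with $N_t = 64$, which is the resolution gain visible between Figure 4a and Figure 4b.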

The beam steering matrix $\mathbf{B}_{k,n}$ consists of selected columns of the $N_t \times N_t$ DFT matrix $\boldsymbol{\Delta}_{N_t}$ such that

$$\mathbf{F}_{g}^{AB} = \mathbf{B}_{k,n} = \boldsymbol{\Delta}_{N_t} \boldsymbol{\Upsilon}_n \tag{41}$$

where $\boldsymbol{\Delta}_{N_t} = [\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_{N_t}]$ contains all eigenmodes and $\boldsymbol{\Upsilon}_n$ is an $N_t \times r_R$ binary beam selection matrix, where $r_R$ is the rank of the channel covariance matrix. The selection matrix $\boldsymbol{\Upsilon}_n \in \{0,1\}^{N_t \times r_R}$ has exactly one 1 in each column and at most one 1 in each row, i.e., $\sum_i [\boldsymbol{\Upsilon}_n]_{i,j} = 1$, $\forall j$. We now formulate the optimization problem for joint spatio–radio resource allocation and precoder design with the objective of maximizing the utility function as

$$\max_{\mathbf{F}_{g}^{AB},\, \mathbf{f}_{g_k,n}^{DB}} \quad \sum_{g=1}^{G} \sum_{n=1}^{N_f} \sum_{k=1}^{K_g} \mathcal{U}\left(\mathbf{F}_{g}^{AB}, \mathbf{f}_{g_k,n}^{DB}\right) \tag{42}$$

subject to

$$\begin{array}{lll} \text{C1}: & \mathrm{tr}\big(\mathbf{F}_{n}^{DB^H}\mathbf{F}^{AB^H}\mathbf{F}^{AB}\mathbf{F}_{n}^{DB}\big) \leq P_{n}, & \forall n \\ \text{C2}: & \mathrm{rank}(\mathbf{F}^{AB}\mathbf{F}^{DB}) \leq N_{RF}, & \\ \text{C3}: & \sum_{k=1}^{K_g} \Psi_{k,n} \leq K_g, & \forall g, n \\ \text{C4}: & \mathrm{tr}(\boldsymbol{\Upsilon}_{n}) \leq G, & \forall n \\ \text{C5}: & \Psi_{k,n},\ [\boldsymbol{\Upsilon}_{n}]_{i,j} \in \{0,1\}. & \end{array}$$

The above optimization problem is a mixed integer programming (MIP) problem with coupling between the digital and RF precoders in the power constraint. This MIP problem is NP-hard [14].
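The binary selection matrix $\boldsymbol{\Upsilon}_n$ in (41), constrained by C5, can be illustrated with a small NumPy sketch. It assumes the unitary $1/\sqrt{N_t}$ DFT normalization and an arbitrary, hypothetical beam choice. It shows that $\boldsymbol{\Delta}_{N_t}\boldsymbol{\Upsilon}_n$ simply picks DFT columns, that the resulting analog precoder has orthonormal columns, and that all of its entries have the same modulus, which is what allows a phase-shifter implementation.

```python
import numpy as np

def dft_matrix(n_t):
    """Unitary N_t x N_t DFT matrix Delta_{N_t} (1/sqrt(N_t) normalization assumed)."""
    m = np.arange(n_t)
    return np.exp(2j * np.pi * np.outer(m, m) / n_t) / np.sqrt(n_t)

def selection_matrix(n_t, beam_indices):
    """Binary N_t x r_R matrix Upsilon_n: one 1 per column, at most one 1 per row (C5)."""
    if len(set(beam_indices)) != len(beam_indices):
        raise ValueError("a beam can be selected at most once")
    upsilon = np.zeros((n_t, len(beam_indices)))
    for j, i in enumerate(beam_indices):
        upsilon[i, j] = 1.0
    return upsilon

n_t, beams = 16, [2, 3, 5]            # hypothetical beam choice with r_R = 3
delta = dft_matrix(n_t)
upsilon = selection_matrix(n_t, beams)
F_ab = delta @ upsilon                # (41): the selected DFT columns
print(np.allclose(F_ab, delta[:, beams]), np.allclose(np.abs(F_ab), 1 / np.sqrt(n_t)))
```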

#### **3. Relaxed-Convex Transformation**

Although the above MIP problem is NP-hard, it can be transformed into a relaxed convex optimization problem by (i) relaxing the binary integer constraints to real numbers between 0 and 1 [14], and (ii) decoupling the digital and analog precoders. For the decoupling, we use the change of variables $\mathbf{F}_n^{DB} = (\mathbf{F}^{AB^H}\mathbf{F}^{AB})^{-\frac{1}{2}}\tilde{\mathbf{F}}_n^{DB}$, where $\tilde{\mathbf{F}}_n^{DB}$ is the *equivalent* digital precoder [34]. With this substitution, $\mathbf{F}_n^{DB^H}\mathbf{F}^{AB^H}\mathbf{F}^{AB}\mathbf{F}_n^{DB} = \tilde{\mathbf{F}}_n^{DB^H}\tilde{\mathbf{F}}_n^{DB}$, so the power constraint no longer depends on the analog precoder. Thus, the problem in (42) can be written as

$$\begin{array}{lll}\max_{\mathbf{F}_{g}^{AB},\,\mathbf{f}_{g_k,n}^{DB}} & \sum_{g=1}^{G} \sum_{n=1}^{N_f} \sum_{k=1}^{K_g} \mathcal{U}\left(\mathbf{F}_{g}^{AB},\mathbf{f}_{g_k,n}^{DB}\right) & \\ \text{subject to} & & \\ \text{C1}: & \mathrm{tr}\big(\tilde{\mathbf{F}}_{n}^{DB^H}\tilde{\mathbf{F}}_{n}^{DB}\big) \leq P_{n}, & \forall n \\ \text{C2}: & \mathrm{rank}(\tilde{\mathbf{F}}^{DB}) \leq N_{RF}, & \\ \text{C3}: & \sum_{k=1}^{K_g} \Psi_{k,n} \leq K_g, & \forall g, n \\ \text{C4}: & \mathrm{tr}(\boldsymbol{\Upsilon}_{n}) \leq G, & \forall n \\ \text{C5}: & 0 \leq \Psi_{k,n} \leq 1, \quad 0 \leq [\boldsymbol{\Upsilon}_{n}]_{i,j} \leq 1. & \end{array}$$
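The change of variables used for decoupling removes the analog precoder from the power constraint, because $(\mathbf{F}^{AB^H}\mathbf{F}^{AB})^{-1/2}$ is Hermitian. The NumPy sketch below checks this numerically for a random phase-only $\mathbf{F}^{AB}$ (an illustrative choice, not the DFT design); `inv_sqrtm_hermitian` is a hypothetical helper that forms the inverse matrix square root from an eigendecomposition.

```python
import numpy as np

def inv_sqrtm_hermitian(a):
    """A^{-1/2} of a Hermitian positive-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v / np.sqrt(w)) @ v.conj().T

rng = np.random.default_rng(0)
n_t, n_rf, k = 16, 4, 3
F_ab = np.exp(1j * rng.uniform(0, 2 * np.pi, (n_t, n_rf)))      # phase-only, not orthogonal
F_tilde = rng.standard_normal((n_rf, k)) + 1j * rng.standard_normal((n_rf, k))

# Change of variables: F_n^DB = (F^AB^H F^AB)^(-1/2) F~_n^DB
F_db = inv_sqrtm_hermitian(F_ab.conj().T @ F_ab) @ F_tilde

power_coupled = np.trace(F_db.conj().T @ F_ab.conj().T @ F_ab @ F_db).real   # C1 of the original problem
power_equivalent = np.trace(F_tilde.conj().T @ F_tilde).real                # C1 of the relaxed problem
print(np.isclose(power_coupled, power_equivalent))
```

When the analog precoder is built from distinct columns of the unitary DFT matrix, $\mathbf{F}^{AB^H}\mathbf{F}^{AB} = \mathbf{I}$, so the change of variables leaves the digital precoder unchanged.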

For a given RF precoder $\mathbf{F}^{AB}$ and perfect CSI at the base station, the digital precoder can be obtained with conventional MU-MIMO techniques, e.g., *zero-forcing* (ZF) and *block diagonalization* (BD) [15].

For the digital precoder, we adopt the ZF precoder to eliminate multiuser interference among the users within each group: the beamforming vector of user $k$ is chosen to be orthogonal to the effective channel vectors of all other users in the group. ZF is a suboptimal but low-complexity linear precoder. It is asymptotically optimal among all downlink beamforming techniques in the high-SNR region and guarantees high spectral efficiency for large-scale antenna arrays with low-complexity linear processing [35]. For $N_t \gg N_r$, it has been shown that ZF beamforming can achieve up to 98% of the nonlinear dirty paper coding (DPC) capacity [36].

To make this paper self-contained, we briefly describe block diagonalization. Since the digital precoder is used to mitigate the multiuser interference within a group and all groups are independent, we omit the subscript $g$. We first consider downlink transmission over one subchannel $n$ in the general case of a BS with $N_t$ antennas and $K_n$ users with $n_k$ antennas each, such that $\sum_{k=1}^{K_n} n_k = N_r$. The downlink channel on subchannel $n$ is expressed as the $N_r \times N_t$ matrix

$$\mathbf{H}\_{n,eff} = [\mathbf{H}\_{1,n,eff}^T, \dots, \mathbf{H}\_{K\_n,n,eff}^T]^T \tag{44}$$

For user *k*, we define the following (*Nr* − *nk*) × *Nt* channel matrix

$$\mathbf{H}'_{k,n,\mathrm{eff}} = [\mathbf{H}^T_{1,n,\mathrm{eff}}, \dots, \mathbf{H}^T_{k-1,n,\mathrm{eff}}, \mathbf{H}^T_{k+1,n,\mathrm{eff}}, \dots, \mathbf{H}^T_{K_n,n,\mathrm{eff}}]^T \tag{45}$$

Let $r'_{k,n}$ denote the rank of $\mathbf{H}'_{k,n,\mathrm{eff}}$; the null space of $\mathbf{H}'_{k,n,\mathrm{eff}}$ then has dimension $N_t - r'_{k,n}$, which must satisfy $N_t - r'_{k,n} \ge n_k$. For each user, performing the SVD of $\mathbf{H}'_{k,n,\mathrm{eff}}$ on subchannel $n$ gives

$$\mathbf{H}'\_{k,n,eff} = \mathbf{U}'\_{k,n} \boldsymbol{\Sigma}'\_{k,n} \mathbf{V}'^{H}\_{k,n} = \mathbf{U}'\_{k,n} \boldsymbol{\Sigma}'\_{k,n} [\mathbf{V}'^{(1)}\_{k,n} \mathbf{V}'^{(0)}\_{k,n}]^{H},\tag{46}$$

where $\mathbf{U}'_{k,n}$ and $\mathbf{V}'_{k,n}$ are unitary matrices. The columns of $\mathbf{U}'_{k,n}$ are the left singular vectors of $\mathbf{H}'_{k,n,\mathrm{eff}}$, the columns of $\mathbf{V}'_{k,n}$ are its right singular vectors, and $\boldsymbol{\Sigma}'_{k,n}$ is a diagonal matrix whose diagonal entries are the singular values of $\mathbf{H}'_{k,n,\mathrm{eff}}$. In the last equality of (46), $\mathbf{V}'^{(1)}_{k,n}$ holds the first $r'_{k,n}$ right singular vectors of $\mathbf{H}'_{k,n,\mathrm{eff}}$, and $\mathbf{V}'^{(0)}_{k,n}$ contains the remaining $N_t - r'_{k,n}$ right singular vectors, which span the null space of $\mathbf{H}'_{k,n,\mathrm{eff}}$. The columns of $\mathbf{V}'^{(0)}_{k,n}$ are therefore well suited to build the beamforming matrix $\mathbf{F}^{DB}_{k,n}$ of user $k$, because they cause zero interference at the other UEs. Usually, $\mathbf{V}'^{(0)}_{k,n}$ has more than $n_k$ columns, so we use linear combinations of its columns to form at most $n_k$ columns, as follows:

$$\mathbf{H}_{k,n,\mathrm{eff}} \mathbf{V}'^{(0)}_{k,n} = \mathbf{U}_{k,n} \begin{bmatrix} \boldsymbol{\Sigma}_{k,n} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{V}_{k,n}^{(1)} & \mathbf{V}_{k,n}^{(0)} \end{bmatrix}^H \tag{47}$$

Here, $\mathbf{H}_{k,n,\mathrm{eff}}\mathbf{V}'^{(0)}_{k,n}$ is the channel of user $k$ restricted to the null space of the other users' channels, and the right-hand side of (47) is its SVD, where $\boldsymbol{\Sigma}_{k,n}$ is an $r_{k,n} \times r_{k,n}$ diagonal matrix and $\mathbf{V}^{(1)}_{k,n}$ holds the $r_{k,n}$ right singular vectors associated with the nonzero singular values of $\mathbf{H}_{k,n,\mathrm{eff}}\mathbf{V}'^{(0)}_{k,n}$. Equation (47) can also be written as

$$\begin{aligned} \mathbf{H}_{k,n,\mathrm{eff}} \mathbf{V}'^{(0)}_{k,n} &= \mathbf{U}_{k,n} \boldsymbol{\Sigma}_{k,n} \mathbf{V}_{k,n}^{(1)H} \\ \mathbf{H}_{k,n,\mathrm{eff}} \mathbf{V}'^{(0)}_{k,n} \mathbf{V}_{k,n}^{(1)} &= \mathbf{U}_{k,n} \boldsymbol{\Sigma}_{k,n} \end{aligned} \tag{48}$$

The transmit beamforming matrix that maximizes the user *k* throughput without any inter-user interference is obtained as,

$$\mathbf{F}_{k,n}^{DB} = \mathbf{V}'^{(0)}_{k,n} \mathbf{V}_{k,n}^{(1)} \tag{49}$$

The transmit digital beamforming matrix for subchannel *n* is defined as

$$\mathbf{F}\_n^{DB} = [\mathbf{F}\_{1,n}^{DB}, \dots, \mathbf{F}\_{K\_n,n}^{DB}] \mathbf{P}\_n^{1/2},\tag{50}$$

where $\mathbf{F}_{k,n}^{DB^H}\mathbf{F}_{k,n}^{DB} = \mathbf{I}$ for $1 \le k \le K_n$, and $\mathbf{P}_n$ is a block diagonal matrix whose elements scale the power allocated to each interference-free virtual subchannel of all UEs. The receive combining matrix for user $k$ is $\mathbf{U}_{k,n}$ [37].
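The BD steps (45)–(49) can be sketched with NumPy's SVD. This is an illustrative implementation under our own assumptions (random i.i.d. channels, a fixed numerical rank tolerance, and no power allocation $\mathbf{P}_n$); the function name is hypothetical. Each user's precoder is built from the null space of the other users' stacked channels and then from the dominant right singular vectors of the user's channel restricted to that null space.

```python
import numpy as np

def block_diagonalization(h_list, tol=1e-10):
    """BD precoders following (45)-(49); h_list[k] is user k's n_k x N channel matrix.

    Returns one N x r_k precoder per user with orthonormal columns such that
    h_list[j] @ F_k = 0 for every j != k.
    """
    precoders = []
    for k, h_k in enumerate(h_list):
        # (45): stack the channels of all other users.
        h_other = np.vstack([h for j, h in enumerate(h_list) if j != k])
        # (46): right singular vectors beyond rank(h_other) span its null space, V'^(0).
        _, s, vh = np.linalg.svd(h_other)
        rank = int(np.sum(s > tol))
        v0 = vh[rank:].conj().T
        if v0.shape[1] < h_k.shape[0]:
            raise ValueError("null space too small: BD needs N - rank >= n_k")
        # (47)-(48): SVD of the restricted channel; keep vectors with nonzero singular values, V^(1).
        _, s_k, vh_k = np.linalg.svd(h_k @ v0)
        r_k = int(np.sum(s_k > tol))
        v1 = vh_k[:r_k].conj().T
        precoders.append(v0 @ v1)   # (49)
    return precoders

rng = np.random.default_rng(0)
n, n_k, num_users = 8, 2, 3
h_list = [(rng.standard_normal((n_k, n)) + 1j * rng.standard_normal((n_k, n))) / np.sqrt(2)
          for _ in range(num_users)]
F = block_diagonalization(h_list)
leakage = max(np.abs(h_list[j] @ F[k]).max()
              for k in range(num_users) for j in range(num_users) if j != k)
print(leakage < 1e-10)
```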

In the case of single-antenna users, complete diagonalization is achieved entirely at the BS by channel inversion, i.e., $\mathbf{F}_n^{DB} = (\mathbf{H}_{n,\mathrm{eff}}^H)^\dagger$, where $(\mathbf{H}_{n,\mathrm{eff}}^H)^\dagger$ is the pseudo-inverse of $\mathbf{H}_{n,\mathrm{eff}}^H$ [38]. Including a normalization factor $\beta_n$, the precoder becomes

$$\mathbf{F}\_n^{DB} = \boldsymbol{\beta}\_n \left( \mathbf{H}\_{n,eff}^H \right)^\dagger$$

$$\left[ \mathbf{f}_{1,n}^{DB}, \dots, \mathbf{f}_{K,n}^{DB} \right] = \beta_n \left( [\mathbf{h}_{1,n}, \dots, \mathbf{h}_{K,n}]^H \mathbf{F}^{AB} \right)^\dagger \tag{51}$$

where *β<sup>n</sup>* is a normalization factor chosen to satisfy the power constraint and is given by

$$\beta\_n^2 = \frac{K}{||\mathbf{F}^{AB}\mathbf{F}\_n^{DB}||\_F^2} \tag{52}$$

Using the definition of the pseudo-inverse, we get,

$$\mathbf{F}_n^{DB} = \beta_n \left(\mathbf{H}_{n,\mathrm{eff}}\mathbf{H}_{n,\mathrm{eff}}^H + N_{RF}\,\varrho\,\mathbf{I}_{N_{RF}}\right)^{-1}\mathbf{H}_{n,\mathrm{eff}} \tag{53}$$

where $\varrho$ is the regularization parameter, with $\varrho = 0$ for ZF precoding and $\varrho = \frac{N_s}{N_{RF}\,\eta}$ for regularized ZF, where $\eta = P_{T,n}/\sigma^2$. Finally, reintroducing the group subscript, the SINR of user $g_k$ on subchannel $n$ is given by

$$\mathrm{SINR}_{g_k,n} = \frac{\frac{P_{T,n}}{K_g}\,\beta_n^2\, |\mathbf{h}_{g_k,n}^H \mathbf{F}_{g}^{AB} \mathbf{f}_{g_k,n}^{DB}|^2}{\sum_{k' \neq k} |\mathbf{h}_{g_k,n}^H \mathbf{F}_{g}^{AB} \mathbf{f}_{g_{k'},n}^{DB}|^2 + \sum_{g' \neq g} \sum_{j} |\mathbf{h}_{g_k,n}^H \mathbf{F}_{g'}^{AB} \mathbf{f}_{g'_j,n}^{DB}|^2 + \sigma^2} \tag{54}$$

and the PF sum rate is calculated as

$$
\hat{\mathcal{U}} = \sum_{g=1}^{G} \sum_{n=1}^{N_f} \sum_{k=1}^{K_g} \hat{\mathcal{U}}_{g_k} . \tag{55}
$$
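A minimal sketch of the digital precoder in (53) with the normalization (52) follows. It assumes that the columns of `H` are the user channels $\mathbf{h}_k$, so the effective channel is $\mathbf{F}^{AB^H}\mathbf{H}$, and it uses four orthogonal DFT beams and random channels purely for illustration; `rzf_precoder` is a hypothetical name. With $\varrho = 0$ and $K = N_{RF}$, the product $\mathbf{H}_{\mathrm{eff}}^H\mathbf{F}^{DB}$ is a scaled identity, i.e., there is no intra-group interference; the regularized version accepts some residual interference in exchange for robustness to noise.

```python
import numpy as np

def rzf_precoder(F_ab, H, rho):
    """Digital precoder of the form (53), scaled by beta_n from (52).

    F_ab: N_t x N_RF analog precoder; H: N_t x K matrix whose columns are the channels h_k.
    rho = 0 gives ZF, which needs H_eff H_eff^H to be invertible (e.g. K = N_RF).
    """
    H_eff = F_ab.conj().T @ H                                   # N_RF x K effective channel
    n_rf, k = H_eff.shape
    G = np.linalg.solve(H_eff @ H_eff.conj().T + n_rf * rho * np.eye(n_rf), H_eff)
    beta = np.sqrt(k / np.linalg.norm(F_ab @ G, "fro") ** 2)    # (52)
    return beta * G, beta

rng = np.random.default_rng(1)
n_t, n_rf, k = 16, 4, 4
m = np.arange(n_t)
F_ab = np.exp(2j * np.pi * np.outer(m, [0, 4, 8, 12]) / n_t) / np.sqrt(n_t)  # 4 orthogonal DFT beams
H = (rng.standard_normal((n_t, k)) + 1j * rng.standard_normal((n_t, k))) / np.sqrt(2)
H_eff = F_ab.conj().T @ H

F_zf, beta_zf = rzf_precoder(F_ab, H, rho=0.0)
eta = 10.0                                                      # P_T,n / sigma^2 (10 dB)
F_rzf, beta_rzf = rzf_precoder(F_ab, H, rho=k / (n_rf * eta))   # N_s = K assumed
print(np.allclose(H_eff.conj().T @ F_zf / beta_zf, np.eye(k)))
```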

#### **4. Suboptimal Solution**

Joint optimization of analog and digital beamformers is challenging because they use different channel information for the design of analog and digital beamformers. Hybrid beamforming methods consider decoupled designs of analog and digital beamforming to reduce the complexity of joint optimization, but the main challenge remains the use of different channel information. To approximate the optimal solution to this mixed integer programming problem, we summarize our proposed algorithm below:

The analog precoder is formed by selecting $K_g$ columns of the DFT matrix in (41), which approximates the eigenvectors of the channel covariance matrix $\mathbf{R}_g$ of user group $g$, so as to minimize the inter-group interference $\mathbf{I}_g$:

$$\min_{\boldsymbol{\Upsilon}} \ \mathbf{I}_g(\boldsymbol{\Upsilon}) \quad \text{subject to} \quad \mathrm{tr}(\boldsymbol{\Upsilon}_n) \le G, \ \forall n, \tag{56}$$

where $\mathbf{I}_g = \sum_{i=1}^{G}\sum_{j=1, j\neq i}^{G} |\mathbf{H}_i^H \mathbf{F}_j^{AB}|^2$. To solve the MIP problem, we divide the solution into two parts.

In the first part, we obtain the analog precoder from the selected columns of the DFT matrix that maximize the PF sum-rate. An inherent benefit of the DFT matrix is its constant modulus, which enables the use of analog phase shifters and RF switches to realize the analog beamforming. In the second part, for the given analog precoder, intra-group user scheduling is performed and a ZF digital precoder is designed to maximize the sum-rate utility function. Decoupling the analog and digital precoder designs makes the solution suboptimal but tractable [34]. The joint hybrid beamforming and user scheduling algorithm (Algorithm 1) takes $K_g$, $N_f$, $N_{RF}$, $N_t$ and $K$ as inputs and generates the analog beamforming matrix $\mathbf{F}^{AB}$, the digital beamforming matrix $\mathbf{F}^{DB}$, and the set of users in each group. The first part of the algorithm (lines 9 to 19) forms the DFT/eigenmode-based analog beamformer using limited statistical CSI feedback from the users; beam and user pairing within each group also takes place in this part. The *while* loop at line 12 executes until all binary combinations in $N_t \times N_t$ are exhausted, under the condition that each column contains exactly one 1 and the total number of 1s equals the number of streams (or the number of RF chains). The second part (starting from line 20) assigns the radio resources to users to maximize the utility function.
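The inter-group interference $\mathbf{I}_g$ minimized in (56) can be evaluated directly for a candidate beam assignment. In this NumPy sketch, $|\mathbf{H}_i^H\mathbf{F}_j^{AB}|^2$ is read as a squared Frobenius norm (our interpretation), and the group channels are hypothetical: each group's channel lies in the span of two DFT beams. Giving each group the beams that span its own channel yields zero leakage, because distinct DFT columns are orthogonal, whereas reusing a beam that overlaps another group's channel does not.

```python
import numpy as np

def inter_group_interference(h_groups, f_groups):
    """Leakage I_g as in (56): sum over i != j of ||H_i^H F_j^AB||_F^2.

    Reading |H_i^H F_j^AB|^2 as a squared Frobenius norm is our assumption.
    """
    g = len(h_groups)
    return sum(np.linalg.norm(h_groups[i].conj().T @ f_groups[j], "fro") ** 2
               for i in range(g) for j in range(g) if i != j)

rng = np.random.default_rng(2)
n_t = 16
m = np.arange(n_t)
delta = np.exp(2j * np.pi * np.outer(m, m) / n_t) / np.sqrt(n_t)   # unitary DFT matrix
# Hypothetical group channels (3 users each): group 0 spans beams {0, 1}, group 1 spans beams {8, 9}.
c0, c1 = rng.standard_normal((2, 3)), rng.standard_normal((2, 3))
h0, h1 = delta[:, [0, 1]] @ c0, delta[:, [8, 9]] @ c1

matched = inter_group_interference([h0, h1], [delta[:, [0, 1]], delta[:, [8, 9]]])
overlapping = inter_group_interference([h0, h1], [delta[:, [0, 1]], delta[:, [1, 9]]])
print(matched, overlapping)
```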

#### **Algorithm 1** Joint Resource Allocation and Hybrid Beamforming Design Algorithm.

```
1:  Inputs
    ...
8:  𝒦_g = {1, ..., K_g}, U(F_n^DB)^last = 0, Υ_g^last = 0, 𝒦_n = {}, ∀n
    {Average CSI-based Beam Selection and RF Precoder Design}
    ...
11: while g ≤ G do
    ...
19: Get F_g^AB = Δ_Nt Υ_g^last
    {Instantaneous CSI-based Users Selection and Digital Precoder Design}
20: while n ≤ N_f do
21:   while b ≤ N_RF,g do
22:     while k ≤ K_g do
23:       Compute U(F_n^DB) = U(F_n^DB) + U(f_k,n^DB), ∀ k ∈ 𝒦_g
24:       k* = arg max_k {U(F_n^DB)}
25:       Ψ_k*,n = 1
26:       Update F_n^DB and U(F_n^DB)^updated with user k*
27:       if U(F_n^DB)^updated ≥ U(F_n^DB)^last then
28:         𝒦_n = 𝒦_n ∪ {k*}
29:         𝒦_g = 𝒦_g − {k*}
30:         U(F_n^DB)^last = U(F_n^DB)^updated
31:         k++
32:       else
33:         break
34:       end if
35:     end while
36:     b++
37:   end while
38:   n++
39: end while
40: Stack the beamforming matrices F^DB = [F_1^DB, ..., F_Nf^DB]
41: Output
42:   F^AB = [F_1^AB, ..., F_G^AB]
43:   F^DB = [F_1^DB, ..., F_Nf^DB]
44:   [𝒦_1, ..., 𝒦_Nf]
```

Algorithm 1 is illustrated by the flowchart in Figure 5.

**Figure 5.** Flowchart illustration of Algorithm 1.
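The greedy intra-group user selection in lines 22–34 of Algorithm 1 can be sketched in plain Python. The utility is passed in as a hypothetical callable: in the paper it would be the PF utility obtained after recomputing the ZF digital precoder for the candidate user set, whereas the toy utility here simply subtracts a penalty that grows with the number of co-scheduled users.

```python
def greedy_schedule(candidates, utility):
    """Greedy intra-group user selection modeled on lines 22-34 of Algorithm 1.

    candidates : user ids in the group (the set K_g)
    utility    : hypothetical callable returning the utility of a tuple of scheduled users
    """
    remaining = sorted(candidates)
    scheduled = []
    last = utility(())
    while remaining:
        # Line 24: the user whose addition yields the largest utility.
        k_star = max(remaining, key=lambda k: utility(tuple(scheduled + [k])))
        updated = utility(tuple(scheduled + [k_star]))
        if updated >= last:                 # line 27
            scheduled.append(k_star)        # line 28: add to K_n
            remaining.remove(k_star)        # line 29: remove from K_g
            last = updated                  # line 30
        else:
            break                           # line 33
    return scheduled, last

# Toy utility: per-user rates minus a penalty that grows with the number of
# co-scheduled users (a stand-in for intra-group interference and power sharing).
rates = {0: 3.0, 1: 2.5, 2: 1.0}

def toy_utility(users):
    return sum(rates[u] for u in users) - len(users) * (len(users) - 1)

print(greedy_schedule(rates, toy_utility))
```

With the toy utility, users 0 and 1 are scheduled (utility 3.5); adding user 2 would drop the utility to 0.5, so the loop stops, as in line 33.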

#### **5. Machine Learning: K-Means Based Optimal Users Grouping for Analog Beamforming**

In this section, we use a machine learning technique to group the users. The DFT-based fixed switched beams are then used to realize the analog beamforming matrix. The joint user scheduling and hybrid beamforming architecture with ML-based user grouping is shown in Figure 6.

**Figure 6.** Joint users scheduling and hybrid beamforming architecture with ML-based users grouping.

Machine learning algorithms can broadly be divided into two main categories: supervised and unsupervised learning algorithms. Supervised algorithms learn by training on labeled input examples, called the training dataset, $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), (x^{(3)}, y^{(3)}), \dots, (x^{(m)}, y^{(m)})\}$, where the $i$th example $(x^{(i)}, y^{(i)})$ consists of the $i$th feature vector $x^{(i)}$ and the corresponding label $y^{(i)}$. Given a labeled training dataset, these algorithms try to find the decision boundary that separates the positively and negatively labeled examples by fitting a hypothesis to the input dataset. Unsupervised machine learning algorithms, on the other hand, are given an unlabeled input dataset and are used to extract information or features from it. These features may relate to, but are not confined to, the underlying structures or patterns in the input data, relationships between data items, or grouping/clustering of data items. The discovered features provide deeper insight into the input dataset and can subsequently be exploited to achieve specific goals. Clustering algorithms form an important part of unsupervised learning: the input examples are grouped into two or more separate clusters based on some features. The K-Means (KM) algorithm is probably the most popular clustering algorithm. It is an iterative algorithm that starts with a set of initial centroids given to it as input. During each iteration, it performs two steps: (i) it assigns each input example to the cluster whose centroid is closest, and (ii) it recomputes each centroid as the mean of the examples assigned to its cluster.


For an example network of thirty users grouped into five clusters, Figure 7a depicts how the cluster centroids move across iterations until the system stabilizes. The system becomes stable after only five iterations, and the final cluster layout is shown in Figure 7b.

Let us define the following notations to be used later in this section.

*K* = Total number of clusters being formed.

$x^{(i)}$ = Location coordinates of user $u^{(i)}$. In our case, $x^{(i)} \in \mathbb{R}^2$.

$c^{(i)}$ = Cluster to which user $u^{(i)}$ is currently associated.

$\mu_k$ = Centroid of the $k$th cluster, $\mu_k \in \mathbb{R}^2$.

$\mu_{c^{(i)}}$ = Centroid of the cluster to which user $u^{(i)}$ is currently associated.

Now the cost function *J* can be defined as

$$J(c^{(1)}, c^{(2)}, \dots, c^{(m)}, \mu_1, \mu_2, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2 \tag{57}$$

with the following optimization objective function.

$$\min_{c^{(1)},\dots,c^{(m)},\,\mu_1,\dots,\mu_K} J(c^{(1)}, c^{(2)}, \dots, c^{(m)}, \mu_1, \mu_2, \dots, \mu_K)$$

Equation (57) allows us to compare multiple clustering layouts by their cost and select the one with the lowest cost. The above optimization problem is non-convex and NP-hard because it has many possible local minima and integer optimization variables $c^{(i)}$. The KM algorithm optimizes this function heuristically by alternating minimization: it iterates between two steps (assign clusters and recompute centroids) as described above.

**Figure 7.** Change in position of centroids as K-Means clustering algorithm progresses. (**a**) shows the transition of cluster centroids (shown as crosses) up to iteration 5, whereas, (**b**) shows only the final stable state after iteration 5. In the figures, cross-sign represents cluster centroids and the colored-circle-sign represents the user associated with the same group or cluster.
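The two alternating steps of the KM algorithm and the cost $J$ in (57) fit in a few lines of plain Python. The sketch below takes the initial centroids as an argument, as described for the KM algorithm above, and stops when the centroids no longer move. Keeping an empty cluster's centroid in place is our own choice, since the text does not address empty clusters; all names are hypothetical.

```python
import math

def assign(points, centroids):
    """Assign step: index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])) for p in points]

def recompute(points, labels, centroids):
    """Update step: move each centroid to the mean of its points (keep it if the cluster is empty)."""
    new = []
    for j, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else c)
    return new

def cost(points, labels, centroids):
    """Cost J in (57): mean squared distance of each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)) / len(points)

def kmeans(points, init_centroids, max_iter=100):
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = recompute(points, labels, centroids)
        if new == centroids:          # converged: the centroids stopped moving
            break
        centroids = new
    labels = assign(points, centroids)
    return centroids, labels, cost(points, labels, centroids)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, J = kmeans(points, init_centroids=[(0, 0), (10, 10)])
print(centroids, labels, J)
```

For these six points, the algorithm converges after one update to the centroids (1/3, 1/3) and (31/3, 31/3), with $J = 4/9$.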

In this section, we use the KM algorithm for optimal clustering of $m$ users competing for resources in a particular cell. The clustering is based on the users' geographic locations, so our input dataset $\{u^{(1)}, u^{(2)}, u^{(3)}, \dots, u^{(m)}\}$ has $m$ vectors $u^{(i)}$, $1 \le i \le m$, each consisting of the location coordinates of the $i$th user. For simplicity, we assume that the users are deployed in a two-dimensional area, i.e., a plane, so $u^{(i)} = (x_1^{(i)}, x_2^{(i)})$ is an ordered pair of location coordinates. Our clustering algorithm is summarized in Algorithm 2.

The proposed algorithm takes the location coordinates of $m$ users as input, together with two numbers $min_k$ and $max_k$. It outputs the best number of clusters $k$, with $min_k \le k \le max_k$, and the members of each cluster. It starts with $k = min_k$ and randomly selects $k$ user locations as the initial centroids (line 5). It assigns each user to its closest centroid (line 8) and then computes new centroids as the average location of all users in each cluster (line 11), so the centroids move in successive iterations. These two steps are repeated until the change in centroid positions is zero or negligible. The test is repeated $max_t$ times, each time with a new set of randomly chosen initial centroids. During every test, the discovered centroids, the corresponding assignment of users to centroids, and the cost are saved (lines 14–16) for later comparison. After $max_t$ tests, we store the centroids, assignment and cost of the test with the lowest cost and discard the rest (lines 18–21). The same procedure is repeated for $k = k + 1$ until $k > max_k$. At the end, we have $cnt = max_k - min_k + 1$ centroid vectors $\mu_k$, one for each value of $k$, together with the corresponding assignment vectors $\mathbf{a}_k$ and costs $c_k$. Finally, among the $cnt$ stored cases, we choose the centroid vector $\mu$ with the lowest cost and its assignment vector $\mathbf{a}$; these give the best number of clusters and the corresponding centroids found by the algorithm.

**Algorithm 2** K-Means based users grouping algorithm.

1: $cnt = 0$

2: **for** $k = min_k : max_k$ **do**

3: $cnt = cnt + 1$

4: **for** $t = 1 : max_t$ **do**

5: Randomly choose $k$ initial centroids $\mu_1, \mu_2, \mu_3, \dots, \mu_k$

6: **repeat**

7: **for** $i = 1 : m$ **do**

8: $a^{(i)} = j$, $1 \le j \le k$, such that $\mu_j$ is the centroid closest to $u^{(i)}$

9: **end for**

10: **for** $l = 1 : k$ **do**

11: $\mu_l$ = mean of all users/points $u^{(i)}$ assigned to the $l$th centroid

12: **end for**

13: **until** converged

14: $\mu^{(t)} = (\mu_1, \mu_2, \mu_3, \dots, \mu_k)$

15: $\mathbf{a}^{(t)} = (a^{(1)}, a^{(2)}, a^{(3)}, \dots, a^{(m)})$

16: $c^{(t)} = cost(\mu_1, \mu_2, \mu_3, \dots, \mu_k)$

17: **end for**

18: $idx = \arg\min\{c^{(t)},\ 1 \le t \le max_t\}$

19: $\mu_k^{(k)} = \mu^{(idx)}$

20: $\mathbf{a}_k^{(k)} = \mathbf{a}^{(idx)}$

21: $c_k^{(k)} = c^{(idx)}$

22: **end for**

23: $index = \arg\min\{c_k^{(k)},\ min_k \le k \le max_k\}$

24: $\mu = \mu_k^{(index)}$

25: $\mathbf{a} = \mathbf{a}_k^{(index)}$

26: $n = index$
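A compact plain-Python sketch of Algorithm 2, with random restarts for each $k$ and selection of the lowest-cost run, is given below. The function names, the fixed random seed and the toy user locations are assumptions for illustration only.

```python
import math
import random

def kmeans_once(points, k, rng, max_iter=100):
    """Lines 5-16 of Algorithm 2: one K-means run from k randomly chosen user locations."""
    centroids = rng.sample(points, k)                                   # line 5
    for _ in range(max_iter):                                           # lines 6-13
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    cost = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)) / len(points)
    return centroids, labels, cost

def group_users(points, min_k, max_k, max_t, seed=0):
    """Lines 1-26 of Algorithm 2: best of max_t restarts for each k, then the lowest-cost k."""
    rng = random.Random(seed)
    best_per_k = {}
    for k in range(min_k, max_k + 1):
        runs = [kmeans_once(points, k, rng) for _ in range(max_t)]
        best_per_k[k] = min(runs, key=lambda run: run[2])              # lines 18-21
    best_k = min(best_per_k, key=lambda k: best_per_k[k][2])           # lines 23-26
    return best_k, best_per_k

# Toy layout: three well-separated groups of four users each.
blobs = [(0, 0), (20, 0), (0, 20)]
points = [(bx + dx, by + dy) for bx, by in blobs for dx in (0, 1) for dy in (0, 1)]
best_k, per_k = group_users(points, min_k=3, max_k=3, max_t=30)
centroids, labels, cost = per_k[best_k]
print(best_k, cost)
```

One design point is worth noting: for a fixed dataset, the lowest achievable $J$ does not increase as $k$ grows (as long as there are more distinct locations than clusters), so when $min_k < max_k$ a pure lowest-cost rule tends to favour $max_k$. Returning the best run for every $k$ lets the caller inspect the whole cost curve.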

After the groups are formed, the BS sends the grouping information to all users, which use it to form reduced average statistical CSI. For example, a user in a group of five needs to send average statistical CSI only after 1/5 of the regular feedback interval.

#### **6. Simulation Results**

Consider the downlink of a multiuser massive MIMO single cell with three 120° sectors. We neglect inter-sector interference and focus on a single 120° sector served by a ULA of *Nt* = 64 isotropic antennas at the BS. User grouping forms virtual sectors inside the 120° physical sector.

In simulation, the results are obtained by averaging over 100 drops. In each drop we randomly generate spatial correlation matrices **R***g*. For each realization of spatial correlation matrix **R***g*, we simulate 1000 realizations of instantaneous channel **H**.

The joint spatio–radio scheduling and hybrid precoding scheme first forms the user groups and then selects the beams that maximize the sum-rate through a downlink training process. Second, it calculates the ZF-based digital precoder using low-dimensional effective channel feedback from the users.

Figure 8 shows the CDF of the non-zero eigenvalues of the channel covariance matrix. Approximately 50% of the non-zero eigenvalues are close to zero. As shown in Figure 9, the sum-rate increases with the number of groups at the cost of increased feedback overhead. Using the machine learning technique of Section 5, we obtain the optimal number of groups from the channel covariance feedback, which increases the sum-rate with substantially reduced feedback. The optimal *G* = 3 gives a 27.6% increase in sum-rate compared with *G* = 1 and a 62.5% decrease in feedback overhead compared with *G* = 8. A performance comparison of ML-based user grouping with previous work cannot be provided because no previous work uses an ML-based technique to reduce the CSI feedback overhead in massive MIMO systems. Many papers use user grouping in massive MIMO hybrid beamforming [3,5,39,40], but they do not use ML-based user grouping. Therefore, we compare our proposed solution with two benchmarks: full CSI (*G* = *K*) and coarse CSI (*G* = 1).

Figure 10 shows the sum-rate versus the number of users at an SNR of 10 dB. For a fixed number of groups, *G* = 3, increasing the number of users increases the number of users per group, while the feedback overhead remains constant because the number of groups is fixed. The sum-rate increases with the number of users because we assume *NRF* = *Ns* = *K*; if the number of RF chains were fixed at some hardware limit, the sum-rate would saturate at a specific number of users. Figure 10 also shows that increasing the number of users per group decreases the slope of the sum-rate curve for the limited-CSI schemes, owing to the increased intra-group interference.

The sum-rate also depends on the number of RF chains, but this dependence is not linear, as shown in Figure 11, which plots the sum-rate versus the number of RF chains *NRF* for *Ns* = 8, *K* = 8, *Nt* = 64, and *SNR* = 10 dB. The sum-rate increases with the number of RF chains because more RF chains yield a better-conditioned effective channel matrix, but the spectral efficiency does not grow proportionally with *NRF* and saturates at *NRF* = *Nt*, where hybrid precoding becomes pure digital precoding. The increase in spectral efficiency with the number of RF chains comes at the cost of higher-dimensional effective channel feedback and higher power consumption in the RF chains.

The spectral efficiency of the proposed scheme also varies with the number of transmit antennas, as shown in Figure 12. In the figure, *NRF* = *Ns* = *K* = 8, *SNR* = 10 dB, and the BS has a ULA with 16, 64, 128 or 256 antennas. The performance gain increases with the number of transmit antennas because a larger antenna array increases the resolution of the transmit beams (see also Figure 4) and hence reduces inter-beam interference.

In general, the spectral efficiency is a function of SNR. At *SNR* = 10 dB, our ML-based user grouping and hybrid beamforming scheme gives a 27.6% higher sum-rate at the cost of 33.3% extra feedback overhead compared with the coarse-CSI case (G = 1). Compared with the full-CSI case (G = K), our proposed scheme reduces the feedback by 62.5% at the cost of a 25.2% reduction in sum-rate.

**Figure 8.** CDF of non-zero eigenvalues of channel covariance matrix **R***<sup>g</sup>* for *Nt* = 64.

**Figure 9.** Sum-rate vs. SNR for different numbers of groups, *K* = 8.

**Figure 10.** Sum-rate vs. number of users for different numbers of groups and CSI, *SNR* = 10 dB.

**Figure 11.** Sum-rate vs. number of RF chains. Number of users *K* = 8, number of transmit antennas *Nt* = 64, and *SNR* = 10 dB.

**Figure 12.** Sum-rate vs. number of transmit antennas. The number of RF chains at the BS is fixed at 8, the number of users is 8, and the results are obtained for *SNR* = 10 dB and *SNR* = 20 dB.

#### **7. Conclusions**

This paper studied limited-feedback two-stage hybrid beamforming that decomposes the precoding matrix at the base station. The large channel state information feedback of massive MIMO is reduced by channel covariance-based RF precoding and beam selection. The well-known regularized block diagonalization can mitigate inter-group interference but requires substantial feedback. We used an unsupervised machine learning technique based on the K-means algorithm for user grouping, together with channel covariance-based eigenmodes/discrete Fourier transform beams, to reduce the feedback overhead and design a simplified analog precoder. The digital precoder is designed by jointly optimizing the intra-group user utility function. It has been shown that the eigenmode-based analog precoder design reduces the feedback overhead by more than 50%. The spatio–radio resource scheduling and limited feedback-based hybrid precoding increase the sum-rate by 27.6% compared with the one-group case at the cost of 33.3% extra feedback overhead, and reduce the feedback overhead by 62.5% at the cost of a 25.2% reduction in sum-rate compared with full CSI feedback.

**Author Contributions:** H.K. and I.A. contributed the key idea and defined the main system model. G.B. and M.A. assisted with the system model and the mathematical analysis. I.A. developed the problem formulation. G.B. and H.K. developed the model for user grouping. H.K., I.A., G.B. and M.A. analysed all the results and added the relevant discussions. All authors contributed to writing the paper.

**Funding:** This research was supported by King Abdul Aziz City for Science and Technology Project under Grant PC-37-66.

**Conflicts of Interest:** The authors declare no conflict of interest.


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Evaluation of Multi-Beam Massive MIMO Considering MAC Layer Using IEEE802.11ac and FDD-LTE**

#### **Fumiya Muramatsu <sup>1</sup>, Kentaro Nishimori <sup>1,\*</sup>, Ryotaro Taniguchi <sup>1</sup>, Takefumi Hiraguri <sup>2</sup> and Tsutomu Mitsui <sup>2</sup>**


Received: 26 January 2019; Accepted: 14 February 2019; Published: 18 February 2019

**Abstract:** Massive multiple-input multiple-output (MIMO) transmission has attracted attention as a key technology for fifth-generation mobile communication systems. Multi-beam massive MIMO systems, which apply beam selection in the analog components and blind algorithms in the digital components to eliminate the need for channel state information, have been proposed as a way to reduce overhead. In this study, we developed an adaptive modulation scheme for multi-beam massive MIMO and used computer simulation to compare it with digital and analog–digital hybrid beam-forming methods. The effectiveness of the proposed system was verified in the medium access control layer of two representative wireless communication standards: IEEE802.11ac and frequency division duplex (FDD)-LTE.

**Keywords:** massive MIMO; analog multi-beam; hybrid beam-forming; PHY layer; MAC layer

#### **1. Introduction**

Cellular network data traffic volumes have increased significantly with the advent of smart devices. This trend will continue as Internet of Things (IoT) equipment and big-data services become more common. To meet this demand, recent research and development have focused on achieving data rates of 20 Gbps or more for future wireless communication [1,2].

Multiple-input multiple-output (MIMO) systems can be used to improve the transmission rate within a limited frequency band [3]. To this end, multi-user MIMO (MU-MIMO) has been developed to enable MIMO transmission to multiple users [4]. MU-MIMO technologies have been standardized in LTE-Advanced and IEEE802.11ac [5,6]. Building on these developments, massive MIMO is viewed as a fifth-generation (5G) technology and is expected to play an important role in achieving 5G targets [1,7].

In a MIMO system, a base station (BS) acquires channel state information (CSI) between itself and a user terminal (UT) [8]. As CSI acquisition under massive MIMO involves numerous UT–BS antenna pair channels, one of the most important challenges in implementing it is the acquisition of large amounts of CSI with a small overhead. To achieve this, it is necessary to evaluate not only the physical (PHY) layer but also the medium access control (MAC) layer. In IEEE802, the MAC layer is a sublayer of the data link layer, which is the second layer of the OSI reference model, and it is located directly above the PHY layer.

Implicit beamforming (BF) has been proposed as an approach for eliminating CSI feedback in MU-MIMO systems [9]. However, even when implicit BF is applied in a massive MIMO system, the efficiency of short-packet communication decreases [10].

CSI estimation and initial UT tracking, which require a significant amount of overhead, are necessary in hybrid BF massive MIMO systems [1]. A hybrid BF approach referred to as multi-beam massive MIMO transmission has been proposed to solve this problem. CSI estimation is unnecessary in this approach. The effectiveness of multi-beam massive MIMO has been demonstrated through computer simulation [11–13]. In multi-beam massive MIMO systems, a number of analog multi-beams are created for initial UT tracking and a subset of these beams with high received power are selected for use. As multi-beams have narrow beam widths, interference signals can be mitigated and residual interference can be cancelled by applying a blind array algorithm to digital beam components using only the information pertaining to a received signal [14–16]. This paper proposes a multi-beam massive MIMO approach that achieves high transmission efficiency and flexible communication using asynchronous UT transmission. As the proposed system does not require timing synchronization among UTs, all UTs can freely transmit signals while avoiding collision between their signals. The diminished signal transmission time resulting from this configuration reduces the total power consumption by the UTs.

In our previous studies, we proposed multi-beam massive MIMO beam-selection methods [13] that utilized received-signal information such as the power difference and amplitude correlation between beams. This approach was capable of appropriately performing beam selection at a high signal-to-interference-plus-noise ratio (SINR). Furthermore, this method could be performed through signal processing with a simple configuration and was highly suitable for hybrid analog-digital massive MIMO.

Channel quality indicator (CQI) values, which are normally used to determine the modulation scheme, cannot be obtained under multi-beam massive MIMO because it does not perform CSI estimation. In this paper, we propose a simple adaptive modulation scheme for multi-beam massive MIMO transmission based on amplitude correlation and received power. Under this method, an appropriate modulation scheme can be simply determined from the relationship between amplitude correlation and the SINR. We assessed the proposed method by evaluating the transmission rates it achieved under the IEEE802.11ac and frequency division duplex (FDD)-LTE standards. In addition, we evaluated the throughput of the method in the MAC layer of each standard to account for overhead.

The rest of this paper is organized as follows. The general massive MIMO transmission approach and the proposed system are described in Section 2. In Section 3, we describe the use of amplitude correlation for modulation and propose a simple adaptive modulation scheme for multi-beam massive MIMO. In Section 4, we present our simulation model, apply it to evaluate the simple adaptive modulation scheme, and report the performance of the proposed multi-beam massive MIMO method in computer simulations of the IEEE802.11ac and FDD-LTE standard environments. Finally, Section 5 concludes the paper.

#### **2. Massive MIMO Transmission and Proposed System**

The bandwidth required by 5G systems is achieved using frequencies of 20 GHz or higher. Massive MIMO uses BF technology to compensate for propagation loss, which is one of the more serious problems in high-frequency bands. Typically, a technique known as digital BF (DBF) is adopted, as shown on the right-hand side of Figure 1. In this technique, weight values are calculated through digital signal processing. However, massive MIMO systems that apply DBF suffer from high power consumption and implementation cost [1,17,18].

The left-hand side of Figure 1 shows the configuration of a typical hybrid BF massive MIMO with sub-arrays [19]. In the figure, *NL* and *NK* indicate the number of elements and receivers in a sub-array, respectively. This hybrid analog-digital BF configuration tracks a desired signal through analog control while removing interference using digital signal processing, and it has attracted significant interest [20–22]. By combining analog and digital processing, hybrid BF massive MIMO is known to be highly effective in addressing the power consumption and implementation cost problems.

Multi-beam massive MIMO eliminates the requirement for CSI estimation by forming numerous narrow directional beams and spatially separating the signals of each UT. As a result, the synchronization between a BS and a UT, which is required for CSI estimation, becomes unnecessary. Under standard MU-MIMO, UTs must transmit signals to a BS simultaneously to employ CSI feedback prior to communication. As multi-beam massive MIMO does not require CSI feedback, a UT can communicate with a BS at any time.

Figure 2 shows the configuration of analog multi-beam massive MIMO in the uplink, along with its 16 multi-beam pattern. The proposed system generates *N* orthogonal beams in the analog component and uses a Butler matrix circuit to form the multi-beams [23,24]. The Butler matrix circuit, introduced in [24], forms orthogonal beams by performing a fast Fourier transform (FFT) across the spatially arranged array. This system is also effective against the pilot contamination problem in multi-cell scenarios, because the multi-beam circuit can receive the signals of multiple UTs without CSI estimation.
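To make the FFT view of the Butler matrix concrete, the following sketch models the analog multi-beam network as a unitary *N*-point DFT across a uniform linear array with half-wavelength spacing. It is an idealized numerical model under our own assumptions (an ideal, lossless network; hypothetical function names), not a circuit description. A UT located exactly in the direction of beam *k* puts all of its energy into output *k*, which is what allows UTs to be separated spatially without CSI.

```python
import numpy as np


def steering_vector(n_elements: int, theta_deg: float, spacing: float = 0.5) -> np.ndarray:
    """Response of a uniform linear array (element spacing in wavelengths) to a plane wave."""
    n = np.arange(n_elements)
    return np.exp(1j * 2 * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))


def multibeam_outputs(x: np.ndarray) -> np.ndarray:
    """Outputs of N orthogonal beams, modeled as a unitary N-point DFT across the array.

    np.fft.fft applies exp(-j*2*pi*k*n/N), so beam k collects the full array gain
    for a plane wave whose phase progression is exp(+j*2*pi*k*n/N).
    """
    return np.fft.fft(x) / np.sqrt(len(x))


def beam_direction_deg(k: int, n_elements: int) -> float:
    """Pointing angle of beam k for half-wavelength spacing (k >= N/2 maps to negative angles)."""
    k_signed = k if k < n_elements // 2 else k - n_elements
    return float(np.rad2deg(np.arcsin(2 * k_signed / n_elements)))


# Usage: 16 beams, as in the 16 multi-beam pattern; one UT located exactly in beam 3.
N = 16
y = multibeam_outputs(steering_vector(N, beam_direction_deg(3, N)))
print(np.round(np.abs(y), 3))  # all of the energy lands in output 3
```

A UT between two beam directions leaks into neighbouring outputs and sidelobes. This is why the beam selection and blind digital processing described next are still needed.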

**Figure 1.** Uplink configurations of typical analog–digital hybrid (**left**) and full-digital (**right**) beamforming (BF) massive multiple-input multiple-output (MIMO) systems.

**Figure 2.** Configuration of multi-beam massive MIMO in uplink and its 16 multi-beam pattern.

The proposed method performs beam selection using an amplitude detector and a processor. Through beam selection, it is possible to spatially separate a specific UT using beam directivity. However, interference signals from the other UTs, which are also received by side lobes, cannot be rejected through only the multi-beam forming network. A method that uses only amplitude information, such as amplitude correlation, must be used to achieve beam selection with low interference power.

The system developed in this study applies robust independent component analysis (ICA) using a blind adaptive algorithm [16,25]. ICA is a blind signal processing technique that is commonly used in image processing and medicine. In this technique, an observed random vector is decomposed into statistically independent variables [16]. ICA does not require CSI because it utilizes only received signals. The use of ICA allows for a hybrid configuration for efficient transmission in massive MIMO systems in which multi-beams are applied to analog components and a blind algorithm is applied to digital components.

#### **3. Adaptive Modulation for Multi-Beam Massive MIMO**

As discussed in the previous section, multi-beam massive MIMO does not require CSI estimation, which eliminates the process of obtaining channel information at the BS. As a result, it is not possible to derive a modulation and coding scheme (MCS) index, which combines the modulation scheme and coding rate, from a CQI value, as is done under current wireless communication standards [5,6]. Therefore, we developed an adaptive modulation scheme based on the received power and amplitude correlation obtained during beam selection on the uplink.

Here, we explain the beam-selection method and amplitude correlation in multi-beam massive MIMO. Beam selection is one of the major challenges in multi-beam massive MIMO. Because the directivity peaks of the fixed beams cannot be steered toward a given UT, one beam may receive signals from multiple UTs. It has been confirmed that in such situations, signal separation cannot be performed perfectly in the digital components, even by a blind algorithm. To address this, the authors previously proposed a beam-selection method that utilizes the power differences between the multi-beams and correlation values obtained from amplitude information [13]. By setting thresholds on these values, this method achieves beam selection with a high SINR.

Under the proposed beam-selection method, amplitude correlation is calculated from the covariance between the amplitudes of the received beam and an adjacent beam. The amplitude of a received signal is defined as a vector, **x**, whose number of elements equals the number of transmitted symbols. The correlation coefficient, *ρ*(**x**, **y**), between the signal amplitudes **x** and **y** is given by

$$\rho(\mathbf{x}, \mathbf{y}) = \frac{\sum\_{i=1}^{N} (x\_i - \overline{x})(y\_i - \overline{y})}{\sqrt{\sum\_{i=1}^{N} (x\_i - \overline{x})^2} \sqrt{\sum\_{i=1}^{N} (y\_i - \overline{y})^2}},\tag{1}$$

where x̄ and ȳ denote the means of all elements in **x** and **y**, respectively, and *N* is the number of symbols.
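Equation (1) is the ordinary Pearson correlation coefficient applied to amplitude sequences. The following is a minimal sketch that assumes each beam output is available as one complex baseband sample per symbol. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def amplitude_correlation(beam_a: np.ndarray, beam_b: np.ndarray) -> float:
    """Equation (1): correlation coefficient between the amplitude sequences of two beams.

    beam_a, beam_b: complex baseband samples (one per symbol) of the received beam
    and an adjacent beam.
    """
    x = np.abs(beam_a)  # amplitude vector x
    y = np.abs(beam_b)  # amplitude vector y
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2))))


# Usage with synthetic, fading-like samples (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
a = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
b = 0.3 * a + 0.05 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
print(amplitude_correlation(a, b))  # high, because both beams mainly carry the same UT
```

For a constant-envelope signal without fading or noise, the amplitude variance is zero and Equation (1) is undefined. In practice, fading, noise and interference make the amplitudes fluctuate.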

Figure 3 shows the relationship between amplitude correlation and the SINR. The SINR of the received signal tends to be high when the amplitude correlation is high. Based on this finding, we examined a simple adaptive modulation method that selects the downlink modulation scheme from the amplitude correlation and received power obtained on the uplink.

**Figure 3.** Relationship between amplitude correlation and signal-to-interference-plus-noise ratio (SINR).

#### **4. Computer Simulation**

The proposed system was evaluated by computer simulation of quadrature amplitude modulation (QAM) signals under Rayleigh fading with angular spread [26,27]. The simulation conditions are listed in Table 1. Figure 4 shows the simulation model, which is a scattering ring model [27] with a specific angular spread. The model applies 101 paths per UT with an angular spread of 1.0 degree. We assume flat fading in narrowband signals and do not consider delay spread. Sixty-four elements are arranged in the horizontal direction at intervals of 0.5 wavelengths, and the beam width is approximately 1.60 degrees. Because the assumed transmission method uses a single carrier for narrowband signals, delayed waves have no influence on signal separation. For actual broadband signals transmitted with orthogonal frequency division multiplexing, each sub-carrier can be regarded as such a narrowband signal. In the UT distribution, the center direction of the BS is set as 0 degrees, and individual UTs are placed at random angles within a range of −60 to 60 degrees. A constant distance is set between the BS and all UTs so that each UT is received at the BS with a signal-to-noise ratio (SNR) of 20 [dB].
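The beam width of about 1.60 degrees quoted above can be checked numerically. The sketch below evaluates the array factor of a uniform linear array steered to broadside and measures the width of the main lobe at half power. It assumes isotropic elements and uniform weighting; both are our assumptions.

```python
import numpy as np


def half_power_beamwidth_deg(n_elements: int, spacing: float = 0.5,
                             span_deg: float = 30.0, step_deg: float = 0.001) -> float:
    """Numerically estimate the -3 dB beam width of a uniform linear array steered to broadside."""
    theta = np.arange(-span_deg, span_deg + step_deg / 2, step_deg)
    psi = 2 * np.pi * spacing * np.sin(np.deg2rad(theta))  # phase shift between adjacent elements
    af = np.zeros(theta.size, dtype=complex)
    for n in range(n_elements):  # array factor = sum of the element contributions
        af += np.exp(1j * n * psi)
    power = np.abs(af) ** 2 / n_elements**2  # normalized: 1 at broadside
    # For these array sizes the sidelobes stay far below -3 dB, so only the main lobe remains.
    main_lobe = theta[power >= 0.5]
    return float(main_lobe.max() - main_lobe.min())


print(half_power_beamwidth_deg(64))  # 64 elements: about 1.6 degrees
print(half_power_beamwidth_deg(8))   # an 8-element sub-array: roughly eight times wider
```

The 8-element case corresponds to the sub-array size of the hybrid BF configuration evaluated later, which is why that configuration forms much wider beams.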

**Figure 4.** Simulation model.


**Table 1.** Simulation conditions.

First, the uplink was simulated by transmitting QPSK signals from all 20 UTs to the BS, which recorded the amplitude correlation and received power for each beam. On the downlink, the BS transmitted signals with modulation schemes from QPSK to 1024-QAM to the eight users with the highest received power. The UTs then received the signals from the BS and calculated the bit error rate (BER). We assumed that a modulation scheme yielding a BER of less than 10<sup>−2</sup> could be applied [28,29]. To simulate an adaptive modulation scheme, we evaluated the applicable modulation schemes in terms of the SINR and received power recorded on the uplink.
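The BER criterion used to decide which modulation schemes are applicable can be expressed as a small selection rule. In this sketch, the following are illustrative assumptions: the intermediate modulation orders between QPSK and 1024-QAM, the BER values, and the function name. A BER of at most 10<sup>−2</sup> is treated as applicable, matching the condition used in Equation (2).

```python
from typing import Dict, Optional

# Bits per symbol of the candidate schemes. The intermediate orders between QPSK and
# 1024-QAM are an assumption; the text only names the two end points.
BITS_PER_SYMBOL: Dict[str, int] = {
    "QPSK": 2, "16-QAM": 4, "64-QAM": 6, "256-QAM": 8, "1024-QAM": 10,
}
BER_LIMIT = 1e-2  # a scheme is treated as applicable when its BER is within this limit


def highest_applicable_scheme(ber_by_scheme: Dict[str, float]) -> Optional[str]:
    """Return the highest-order scheme whose measured BER meets the limit, or None."""
    applicable = [name for name, ber in ber_by_scheme.items() if ber <= BER_LIMIT]
    if not applicable:
        return None
    return max(applicable, key=lambda name: BITS_PER_SYMBOL[name])


# Usage with made-up BER values for one UT (illustrative only).
measured = {"QPSK": 1e-6, "16-QAM": 2e-4, "64-QAM": 4e-3, "256-QAM": 3e-2, "1024-QAM": 2e-1}
print(highest_applicable_scheme(measured))  # 64-QAM
```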

#### *4.1. Adaptive Modulation*

The results produced by the simulation procedure described in Section 4 are shown in Figure 5, in which the left-hand and right-hand panels show the applicable modulation scheme versus amplitude correlation and versus received power, respectively. Although amplitude correlation increases with the modulation order, it is not appropriate to select a modulation scheme using amplitude correlation alone, because the standard deviation of the results is large. Therefore, the received power relationship shown in the right-hand panel was also used to develop an indicator for determining the modulation scheme (Table 2). In the table, *ρ* and *P* represent amplitude correlation and received power, respectively.

**Figure 5.** Relationships between applicable modulation scheme and amplitude correlation (**left**) and received power (**right**).


**Table 2.** Adaptive modulation scheme for eight user terminals (UTs).

#### *4.2. IEEE802.11ac*

We evaluated the indicator shown in Table 2 and the downlink effectiveness of multi-beam massive MIMO transmission based on the IEEE802.11ac procedures [5,30]. The simulation conditions and model were the same as those used in Section 4. Equation (2) was used as an index of evaluation based on the BER calculated on the UTs.

$$R = \begin{cases} M \left( 1 - \text{BER} \right) & \text{(BER} \le 10^{-2}) \\ 0 & \text{(otherwise)} \end{cases} \text{ [bit/symbol]},\tag{2}$$

where *M* denotes the number of bits per symbol of the modulation scheme and *R* denotes the effective number of bits per symbol, evaluated independently of the coding rate.
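Equation (2) translates directly into code. The following is a minimal sketch; the function name is ours.

```python
def bits_per_symbol_r(m_bits: int, ber: float, ber_limit: float = 1e-2) -> float:
    """Equation (2): R = M(1 - BER) if BER <= 1e-2, otherwise 0 [bit/symbol]."""
    if ber <= ber_limit:
        return m_bits * (1.0 - ber)
    return 0.0


# Usage: 64-QAM (M = 6) received with a BER of 4e-3, and 256-QAM (M = 8) with a BER of 3e-2.
print(bits_per_symbol_r(6, 4e-3))  # about 5.976
print(bits_per_symbol_r(8, 3e-2))  # 0.0, because the BER exceeds 1e-2
```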

The assessment was divided into two parts: the evaluation of the PHY layer without considering the overhead arising from control signals, and the evaluation of the MAC layer considering that overhead. Table 3 shows the relationship between the MCS index and the transmission rate under IEEE802.11ac 20 MHz operation [5]. The values of *R*′ in the table are obtained by multiplying the modulation order (bits per symbol) by the coding rate, so they are directly comparable with *R*, the number of bits per symbol. Accordingly, it was assumed that any MCS index satisfying *R*′ ≤ *R* could be applied for the *R* obtained at the UTs. On this basis, we could evaluate the transmission rate corresponding to the MCS index.
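An MCS index is applicable when its value *R*′ (bits per symbol multiplied by the coding rate) does not exceed the measured *R*. The sketch below implements that rule. Table 3 is not reproduced here, so its entries are an assumption: we use the standard IEEE802.11ac single-stream, 20 MHz rates with a 400 ns guard interval, which are consistent with the 86.7 Mbps maximum mentioned in the results.

```python
from typing import List, Optional, Tuple

# Assumed stand-in for Table 3: (MCS index, bits per symbol, coding rate, rate in Mbps)
# for one spatial stream, 20 MHz and a 400 ns guard interval.
MCS_TABLE: List[Tuple[int, int, float, float]] = [
    (0, 1, 1 / 2, 7.2),
    (1, 2, 1 / 2, 14.4),
    (2, 2, 3 / 4, 21.7),
    (3, 4, 1 / 2, 28.9),
    (4, 4, 3 / 4, 43.3),
    (5, 6, 2 / 3, 57.8),
    (6, 6, 3 / 4, 65.0),
    (7, 6, 5 / 6, 72.2),
    (8, 8, 3 / 4, 86.7),
]


def select_mcs(r: float) -> Optional[Tuple[int, float]]:
    """Return (MCS index, rate) of the highest MCS whose R' = bits x coding rate <= R."""
    best = None
    for index, bits, code_rate, rate_mbps in MCS_TABLE:  # table is sorted by increasing R'
        if bits * code_rate <= r:
            best = (index, rate_mbps)
    return best


# Usage: a UT that measured R = 5.976 bit/symbol (64-QAM with BER = 4e-3).
print(select_mcs(5.976))  # (7, 72.2)
```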


**Table 3.** Relationship between MCS index and transmission rate under IEEE802.11ac at 20 MHz.

In the MAC layer, throughput is calculated according to the downlink MU-MIMO procedure shown in Figure 6 [5]. As seen in the figure, the BS first transmits request signals to all UTs to obtain channel information. Once the BS has obtained channel feedback from each UT, it transmits the data signals, and data reception processing is performed for all UTs. Throughput is obtained by dividing the aggregated frame size, which was set to 63,000 bytes in this evaluation, by the total time required for these processes. As multi-beam massive MIMO does not require procedures such as CSI estimation or feedback, we assumed that it requires no time other than that necessary for data transmission.
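The throughput definition above (aggregated frame size divided by the total time of the procedure) can be written as follows. The overhead time is a free, hypothetical parameter standing in for the sounding, feedback and acknowledgement exchanges of Figure 6. For multi-beam massive MIMO it is set to zero, as assumed in the text.

```python
def mac_throughput_mbps(payload_bytes: int, phy_rate_mbps: float, overhead_us: float) -> float:
    """Throughput = aggregated payload / (data airtime + protocol overhead), in Mbit/s."""
    payload_bits = 8 * payload_bytes
    data_time_us = payload_bits / phy_rate_mbps  # bits divided by Mbit/s gives microseconds
    return payload_bits / (data_time_us + overhead_us)  # bits per microsecond equals Mbit/s


FRAME_BYTES = 63_000  # aggregated frame size used in the evaluation
# Multi-beam massive MIMO: no sounding or feedback, so throughput equals the PHY rate.
print(mac_throughput_mbps(FRAME_BYTES, 86.7, overhead_us=0.0))
# Hypothetical 2 ms of sounding, feedback and acknowledgement time (not a value from the paper).
print(mac_throughput_mbps(FRAME_BYTES, 86.7, overhead_us=2000.0))
```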

**Figure 6.** Downlink multi-user (MU)-MIMO procedure under IEEE802.11ac.

We compared the performance of multi-beam massive MIMO with that of the typical sub-array hybrid BF and full-digital BF massive MIMO configurations discussed in Section 2. Table 4 shows the precoding and decoding methods used by each configuration on the downlink. For hybrid BF, we assumed that maximum ratio combining (MRC) was performed in the analog component using phase shifters [31]. In the digital component, the minimum mean square error (MMSE) weight calculated on the uplink was used as the precoder [31]. Eight sub-arrays (*NK* = 8) with eight elements each (*NL* = 8) were used, for a total of 64 elements. For full-digital BF, transmit precoding was performed using the block diagonalization (BD) method with 64 elements [32].
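As a rough illustration of BD precoding for the full-digital configuration, the sketch below computes, for each UT, a precoder in the null space of all other UTs' channels via the SVD. It assumes single-antenna UTs, i.i.d. Rayleigh channels and no power allocation. The function name and setup are ours, and the method in [32] may differ in details such as regularization.

```python
import numpy as np


def block_diagonalization(channels):
    """Block diagonalization precoders.

    channels: list of per-user channel matrices H_k with shape (n_rx_k, n_tx).
    Returns precoders W_k with shape (n_tx, n_rx_k) such that H_j @ W_k = 0 for j != k.
    """
    precoders = []
    for k, h_k in enumerate(channels):
        # Stack the channels of all other users: their interference must be nulled.
        h_other = np.vstack([h for j, h in enumerate(channels) if j != k])
        _, s, vh = np.linalg.svd(h_other)  # full SVD: vh is (n_tx, n_tx)
        rank = int(np.sum(s > 1e-10 * s.max()))
        null_basis = vh[rank:].conj().T  # orthonormal basis of the null space of h_other
        # Inside that null space, transmit along the dominant modes of the user's own channel.
        _, _, vh_eff = np.linalg.svd(h_k @ null_basis)
        n_streams = h_k.shape[0]
        precoders.append(null_basis @ vh_eff[:n_streams].conj().T)
    return precoders


# Usage: 64 BS antennas and 8 single-antenna UTs with i.i.d. Rayleigh channels.
rng = np.random.default_rng(1)
n_tx, n_users = 64, 8
H = [(rng.standard_normal((1, n_tx)) + 1j * rng.standard_normal((1, n_tx))) / np.sqrt(2)
     for _ in range(n_users)]
W = block_diagonalization(H)
leak = max(np.abs(H[j] @ W[k]).max() for j in range(n_users) for k in range(n_users) if j != k)
print(leak)  # numerically zero: no inter-user interference
```

With single-antenna UTs, BD reduces to zero-forcing-style nulling. The SVD formulation also covers multi-antenna UTs.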

**Table 4.** Precoding and decoding on downlink.


Figure 7 shows the cumulative distribution function (CDF) characteristics of the transmission rate and throughput. The adaptive modulation results show the characteristics obtained when, in each trial, the highest modulation order with a BER of 10<sup>−2</sup> or less is selected. The proposed modulation results are obtained using the indicator in Table 2. In terms of transmission rate, the full-digital BF configuration achieves 86.7 [Mbps], the maximum transmission rate obtainable under IEEE802.11ac, because it forms sharp beams using 64 elements. The median transmission rate of multi-beam massive MIMO is approximately 70 [Mbps], and with the proposed simple adaptive modulation it closely approaches the ideal characteristic. The hybrid BF configuration achieves only low modulation orders because its beams are wide with only eight elements per sub-array. In terms of throughput, the full-digital configuration achieves only approximately 40 [Mbps] because of the channel estimation and feedback overhead required for 64 antenna elements. In hybrid BF, the overhead corresponds to eight elements, the number of sub-arrays. However, because IEEE802.11ac requires the synchronization of all UTs, as shown in Figure 6, throughput decreases when the transmission rate of even one UT is low. This reduces the throughput of hybrid BF. Multi-beam massive MIMO achieves the best throughput of the three configurations because it requires neither synchronization nor CSI overhead and can therefore make effective use of the transmission rate obtained in the PHY layer.

**Figure 7.** Characteristics of transmission rate and throughput under IEEE802.11ac.

#### *4.3. FDD-LTE*

We evaluated the proposed method under FDD-LTE (Release 15) [6]. We assumed a bandwidth of 20 [MHz] per UT and 100 resource blocks (RBs). Table 5 shows the relationship between the MCS index and the transmission rate in LTE for a single antenna. The transport block size (TBS) is based on [6], and the transmission rate is calculated from the efficiency per bandwidth. The TBS corresponds to the actual data, which is divided into RBs and transmitted, while the user data size indicates the resources available for a given control-signal configuration. As in the IEEE802.11ac simulation, we compared the values of *R* and *R*′ obtained for the respective approaches and evaluated the results in terms of the corresponding transmission rates.

**Table 5.** Relationship between MCS index and transmission rate under frequency division duplex (FDD)-LTE (100 RBs, 20 MHz).


The throughput in LTE, denoted here by *T*, can be calculated as follows [6,33]:

$$\begin{aligned} T &= TBS \times N\_{\rm str} \\ &\doteq N\_{\rm subc} \times N\_{\rm slot} \times N\_{\rm sym} \times M \times N\_{\rm str} \times \text{CR} \ [\text{bps}], \end{aligned}$$

where *N*subc is the number of sub-carriers, *N*slot is the number of slots per second, *N*sym is the number of symbols per slot, *N*str is the number of streams, and CR is the coding rate.
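As a sanity check of the approximation above, the sketch below uses standard LTE numerology for 100 RBs: 12 sub-carriers per RB, 0.5 ms slots, and 7 symbols per slot with a normal cyclic prefix. The coding rate of 3/4 and the choice of 64-QAM are hypothetical inputs. Control-signal overhead is ignored, which is the limitation that motivates Equation (3).

```python
def lte_throughput_approx_bps(n_subc: int, n_slot: int, n_sym: int,
                              bits_per_symbol: int, n_str: int, code_rate: float) -> float:
    """T ~ N_subc x N_slot x N_sym x M x N_str x CR [bps], ignoring control-signal overhead."""
    return n_subc * n_slot * n_sym * bits_per_symbol * n_str * code_rate


# 100 RBs x 12 sub-carriers, 0.5 ms slots (2000 per second), 7 symbols per slot with a
# normal cyclic prefix, 64-QAM (M = 6), one stream and a hypothetical coding rate of 3/4.
t = lte_throughput_approx_bps(100 * 12, 2000, 7, 6, 1, 0.75)
print(t / 1e6)  # 75.6 Mbit/s
```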

We apply Equation (3) to account for control signals, because the expression above cannot represent the control signals that can be eliminated under multi-beam massive MIMO.

$$T = TBS \times \frac{Usrdata'}{Usrdata} \times N\_{\text{subf}} \times N\_{\text{str}} \text{ [bps]},\tag{3}$$

where *N*subf is the number of sub-frames per second. The value of *Usrdata*′ is obtained by increasing or decreasing the number of control signals relative to the baseline *Usrdata*, which is defined in [6]. For example, when *I*MCS = 27 in Table 5, *Usrdata* is 120,000 [bits]. This corresponds to *Usrdata*′ = 128,000 [bits] for multi-beam massive MIMO, which is obtained by removing the control signals from *Usrdata*. As there is no complete definition of control signals for 64 elements, we assumed that one antenna port would be added to handle the additional control signals each time the number of antennas was doubled. Therefore, *Usrdata*′ was 83,200 [bits] for 64 antennas under the full-digital configuration.
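Equation (3) scales the TBS by the ratio of the user data with and without the configuration-specific control signals. The sketch uses the user-data sizes given above (120,000, 128,000 and 83,200 bits) and a hypothetical TBS to show how this ratio alone separates the configurations. *N*subf = 1000 corresponds to LTE's 1 ms sub-frames.

```python
def lte_throughput_bps(tbs_bits: float, usrdata_prime_bits: float, usrdata_bits: float,
                       n_subf: int = 1000, n_str: int = 1) -> float:
    """Equation (3): T = TBS x (Usrdata' / Usrdata) x N_subf x N_str [bps]."""
    return tbs_bits * (usrdata_prime_bits / usrdata_bits) * n_subf * n_str


USRDATA = 120_000  # [bits] for I_MCS = 27, as quoted in the text
TBS = 70_000       # hypothetical transport block size, for illustration only
t_multibeam = lte_throughput_bps(TBS, 128_000, USRDATA)    # control signals removed
t_full_digital = lte_throughput_bps(TBS, 83_200, USRDATA)  # extra control for 64 antennas
print(t_multibeam / t_full_digital)  # about 1.54
```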

Figure 8 shows the relationship between the transmission rate and throughput based on Equation (3). We used this figure to evaluate throughput based on overhead for each configuration.

**Figure 8.** Transmission rate versus throughput from Equation (3).

Figure 9 shows the CDF characteristics of the transmission rate and throughput under FDD-LTE. The modulation scheme for each characteristic is determined in the same manner as in the IEEE802.11ac evaluation. The transmission rate follows the same trend as under IEEE802.11ac. However, unlike IEEE802.11ac, a high transmission rate can be achieved even with a low modulation order, so the performance of hybrid BF approaches that of multi-beam massive MIMO. As Figure 8 shows, throughput increases linearly with transmission rate for all configurations. Because multi-beam massive MIMO has the fewest control signals of all configurations, it achieves the highest throughput at a given transmission rate. This verifies the effectiveness of multi-beam massive MIMO, which can make effective use of the transmission rate in the PHY layer.

**Figure 9.** Characteristics of transmission rate and throughput under FDD-LTE.

#### **5. Conclusions**

In this paper, we proposed a simple adaptive modulation method for multi-beam massive MIMO and evaluated its performance in the MAC layer. The effectiveness of the proposed transmission method was validated through computer simulation. Owing to the simple adaptive modulation based on amplitude correlation and received power, the performance of the method closely approached the ideal characteristic. The proposed method was also shown to achieve higher throughput than a hybrid approach with sub-arrays and a full-digital beamforming configuration under the IEEE802.11ac and FDD-LTE environments. Future work includes further theoretical analysis and experimental evaluation of analog multi-beam forming using a Butler matrix circuit.

**Author Contributions:** This study was led by F.M. while K.N., R.T., T.H. and T.M. assisted with the computer simulations.

**Funding:** This work was partially supported by the SCOPE #165004002, #185004002 and KAKENHI, Grant-in-Aid for Scientific Research (B) (17H01738, 17H03262).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Computationally Efficient Channel Estimation in 5G Massive Multiple-Input Multiple-output Systems**

#### **Imran Khan <sup>1</sup>, Mohammad Haseeb Zafar <sup>1</sup>, Majid Ashraf <sup>1</sup> and Sunghwan Kim <sup>2,\*</sup>**


Received: 9 November 2018; Accepted: 28 November 2018; Published: 3 December 2018

**Abstract:** Traditional channel estimation algorithms such as minimum mean square error (MMSE) are widely used in massive multiple-input multiple-output (MIMO) systems, but they require a matrix inversion and an enormous amount of computation, which results in high computational complexity and makes them impractical to implement. To overcome the matrix inversion problem, we propose a computationally efficient hybrid steepest descent Gauss–Seidel (SDGS) joint detection, which directly estimates the users' transmitted symbol vector and quickly converges to an ideal estimate within a few simple iterations. Moreover, signal detection performance was further improved by utilizing the bit log-likelihood ratio (LLR) for soft channel decoding. Simulation results showed that the proposed algorithm had better channel estimation performance, improving signal detection by 31.68% while reducing complexity by 45.72% compared with existing algorithms.

**Keywords:** 5G; massive MIMO; computational efficiency; precoding algorithms; channel estimation

#### **1. Introduction**

Multiple-input multiple-output (MIMO) technology has matured considerably, especially in combination with orthogonal frequency division multiplexing (OFDM) [1–5], and has been successfully applied in wireless communication standards such as Long-Term Evolution (LTE) and LTE-Advanced. However, traditional MIMO technology only scales to 4 × 4 or 8 × 8 systems [6], which makes it difficult to meet the explosive growth in mobile data services. Massive MIMO has therefore been proposed in recent years as an extension of traditional MIMO technology [7]. Massive MIMO systems deploy up to hundreds of antennas at the base station to serve multiple single-antenna users simultaneously [8], which can improve spectrum utilization and power utilization in wireless communication systems by two to three orders of magnitude [9–11]. Massive MIMO has become one of the most promising enabling technologies and one of the most active research directions in 5G [12]. The maximum likelihood (ML) algorithm is the optimal massive MIMO detection algorithm, but its computational complexity increases exponentially with the number of antennas and the modulation order of the baseband signals, which makes fast and efficient practical implementation difficult. Linear detection methods, such as the zero-forcing (ZF) and minimum mean square error (MMSE) algorithms, achieve near-optimal detection performance in massive MIMO systems. Their complexity is much lower than that of the ML algorithm, but they require a high-dimensional matrix inversion, so a low-cost, efficient engineering implementation remains an open problem. To address this problem, many simplified algorithms based on MMSE detection have been proposed in recent years. They can be roughly divided into three types: series-expansion approximation methods [13,14], iterative approximation methods [15,16], and gradient-based search for an approximate solution [17–20].

The authors in [13] used a Neumann series expansion to approximate the inverse of the MMSE filter matrix, but when the number of expansion stages was increased (*i* > 2), the computational complexity remained high, equal to or even exceeding that of exact MMSE. Moreover, this approximate filter matrix inversion incurs a considerable loss in detection performance. The authors in [14] applied the Newton iteration, derived from a first-order Taylor series expansion (similar to the Neumann series expansion), to massive MIMO signal detection and used iterations to improve the accuracy of the MMSE filter matrix inverse. However, the Newton-based algorithm offered no clear advantage in either detection performance or computational complexity. Both series-expansion algorithms above still estimate the user's transmitted signal vector through an approximate matrix inverse. In contrast, iterative algorithms based on solving linear equations, such as the Richardson iterative (RI) algorithm [15] and the successive over-relaxation (SOR) algorithm [16], exploit the symmetric positive definite property of the MMSE filter matrix. By solving the linear equations, they directly estimate the transmitted vector and thus avoid the inversion of high-dimensional matrices.
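The Neumann-series idea attributed to [13] can be illustrated as follows. This minimal sketch assumes the common choice of splitting the MMSE filter matrix *A* = *H*<sup>T</sup>*H* + *σ*<sup>2</sup>**I** into its diagonal *D* and the remainder, in the real-valued model. The function name and the test dimensions (*N* = 128, *K* = 8, *σ*<sup>2</sup> = 0.1) are ours, not from [13]. The series converges because *A* is strongly diagonally dominant when *N* ≫ *K*.

```python
import numpy as np


def neumann_inverse(a: np.ndarray, n_terms: int) -> np.ndarray:
    """Truncated Neumann series: inv(A) ~ sum_{n=0}^{n_terms-1} (I - D^-1 A)^n D^-1,
    where D is the diagonal part of A (converges when A is diagonally dominant)."""
    d_inv = np.diag(1.0 / np.diag(a))
    x = np.eye(a.shape[0]) - d_inv @ a
    term = d_inv.copy()              # n = 0 term
    approx = np.zeros_like(a)
    for _ in range(n_terms):
        approx += term
        term = x @ term              # next term: (I - D^-1 A)^(n+1) D^-1
    return approx


# Real-valued MMSE filter matrix A = H^T H + sigma^2 I for N = 128 antennas, K = 8 users.
rng = np.random.default_rng(2)
H = rng.standard_normal((2 * 128, 2 * 8))
A = H.T @ H + 0.1 * np.eye(2 * 8)
exact = np.linalg.inv(A)
for i in (1, 2, 3, 10):
    err = np.linalg.norm(neumann_inverse(A, i) - exact) / np.linalg.norm(exact)
    print(i, err)  # the relative error shrinks as more terms are added
```

Each additional term costs a full matrix-matrix product. This is why the complexity quickly approaches that of an exact inversion as *i* grows.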

The RI and SOR algorithms mentioned above have low computational complexity for a fixed number of iterations, but RI converges slowly and requires more iterations to meet a given detection performance requirement. Although SOR achieves good detection performance, its sequential internal iteration structure prevents parallel implementation in practical applications. The third type of algorithm is based on the matrix gradient and includes the conjugate-gradient (CG) method [17] and the steepest descent (SD) method [18]. These algorithms use a gradient search and thus avoid the high-dimensional matrix inversion. Compared with the series-expansion methods, the CG and SD algorithms greatly improve detection performance, but computing the gradient at every iteration increases their complexity.

In this paper, a low-complexity joint detection algorithm is proposed. Because the SD algorithm has a good convergence direction in its first iterations, it is combined with the low-complexity Gauss–Seidel (GS) algorithm mentioned in [19]; the combination is called SDGS. The SD step provides an effective search direction for the GS iterations, which speeds up convergence and improves detection performance. Furthermore, applying SDGS to soft-output detection yields an approximate method for calculating the bit log-likelihood ratio (LLR) at the channel decoder input. The algorithm achieves a good compromise between detection performance and computational complexity.
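The SD-then-GS idea described above can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' SDGS algorithm, which is detailed later in the article. We assume a zero initial vector, an exact line search for the SD steps, and plain GS sweeps on the real-valued MMSE equations (*H*<sup>T</sup>*H* + *σ*<sup>2</sup>**I**)**s** = *H*<sup>T</sup>**y**. The function name and iteration counts are hypothetical, and the soft-output LLR computation is not included.

```python
import numpy as np


def sd_gs_detect(h: np.ndarray, y: np.ndarray, noise_var: float,
                 n_sd: int = 1, n_gs: int = 3) -> np.ndarray:
    """Estimate s by solving (H^T H + sigma^2 I) s = H^T y without matrix inversion:
    a few steepest-descent (SD) steps for the start, then Gauss-Seidel (GS) sweeps."""
    a = h.T @ h + noise_var * np.eye(h.shape[1])  # MMSE filter matrix (symmetric positive definite)
    b = h.T @ y                                   # matched-filter output
    s = np.zeros_like(b)
    for _ in range(n_sd):
        r = b - a @ s                             # residual = negative gradient of the quadratic cost
        rr = r @ r
        if rr == 0.0:
            break
        s = s + (rr / (r @ (a @ r))) * r          # exact line search along the residual
    for _ in range(n_gs):
        for i in range(s.size):                   # each update already uses the new s[:i]
            s[i] = (b[i] - a[i, :i] @ s[:i] - a[i, i + 1:] @ s[i + 1:]) / a[i, i]
    return s


# Usage: real-valued model with N = 128 antennas, K = 8 users and QPSK symbols.
rng = np.random.default_rng(3)
n_ant, n_users, noise_var = 128, 8, 0.1
H = rng.standard_normal((2 * n_ant, 2 * n_users))
s_true = rng.choice([-1.0, 1.0], size=2 * n_users)  # real and imaginary QPSK components
y = H @ s_true + np.sqrt(noise_var) * rng.standard_normal(2 * n_ant)
s_hat = sd_gs_detect(H, y, noise_var)
print(np.array_equal(np.sign(s_hat), s_true))  # hard decisions after 1 SD step + 3 GS sweeps
```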

The rest of this paper is organized as follows: Section 2 presents the system model and analytical derivations, Section 3 reviews low-complexity signal detection, and Section 4 describes the hybrid iterative structure and the proposed algorithm. Section 5 provides the simulation results, and Section 6 concludes the paper.

#### **2. System Model**

This paper considers the uplink of a massive MIMO system in which a base station equipped with *N* antennas serves *K* single-antenna users, where *N* ≫ *K*, as shown in Figure 1. Let *s<sub>c</sub>* = [*s*<sub>1</sub>, *s*<sub>2</sub>, ..., *s<sub>K</sub>*]<sup>*T*</sup> denote the *K* × 1 symbol vector sent by all users simultaneously, where *s<sub>k</sub>* ∈ *ε* is the symbol transmitted by the *k*th user and *ε* is the modulation symbol set.

Let *H<sub>c</sub>* ∈ ℂ<sup>*N*×*K*</sup> represent the Rayleigh fading channel matrix; then, the *N* × 1 signal vector received by the base station can be written as:

$$\mathbf{y}_c = H_c \mathbf{s}_c + \mathbf{n}_c \tag{1}$$

*Electronics* **2018**, *7*, 382

where *n<sub>c</sub>* is an *N* × 1 additive white Gaussian noise (AWGN) vector with zero mean and covariance matrix *σ*<sup>2</sup>*I<sub>N</sub>*. Converting the complex model of Equation (1) into an equivalent real-valued model gives:

$$\mathbf{y} = H\mathbf{s} + \mathbf{n} \tag{2}$$

Here, *s* ∈ ℝ<sup>2*K*</sup>, *H* ∈ ℝ<sup>2*N*×2*K*</sup>, *y* ∈ ℝ<sup>2*N*</sup>, and *n* ∈ ℝ<sup>2*N*</sup> are given by:

$$\begin{aligned} H &= \begin{bmatrix} \Re(H_c) & -\Im(H_c) \\ \Im(H_c) & \Re(H_c) \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} \Re(\mathbf{y}_c) \\ \Im(\mathbf{y}_c) \end{bmatrix}, \\ \mathbf{s} &= \begin{bmatrix} \Re(\mathbf{s}_c) \\ \Im(\mathbf{s}_c) \end{bmatrix}, \quad \mathbf{n} = \begin{bmatrix} \Re(\mathbf{n}_c) \\ \Im(\mathbf{n}_c) \end{bmatrix} \end{aligned} \tag{3}$$

where ℜ(·) and ℑ(·) denote the real and imaginary parts, respectively.

**Figure 1.** Massive multiple-input multiple-output system model.
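As an illustration of Equations (1)–(3), the following pure-Python sketch (our own, not taken from the paper) builds the real-valued model from a complex channel and checks that the stacked vectors satisfy *y* = *Hs* in the noiseless case. The helper names `complex_to_real` and `matvec` and the small example values are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def complex_to_real(Hc, yc, sc):
    """Real-valued model of Equation (3).

    Hc: N x K complex channel (list of rows); yc: length N; sc: length K.
    Returns H (2N x 2K) and the stacked vectors y = [Re; Im], s = [Re; Im].
    """
    N, K = len(Hc), len(Hc[0])
    H = [[0.0] * (2 * K) for _ in range(2 * N)]
    for r in range(N):
        for c in range(K):
            h = Hc[r][c]
            H[r][c] = h.real            # top-left block:     Re(Hc)
            H[r][c + K] = -h.imag       # top-right block:   -Im(Hc)
            H[r + N][c] = h.imag        # bottom-left block:  Im(Hc)
            H[r + N][c + K] = h.real    # bottom-right block: Re(Hc)
    y = [v.real for v in yc] + [v.imag for v in yc]
    s = [v.real for v in sc] + [v.imag for v in sc]
    return H, y, s


# Noiseless check with a small 2 x 2 example (values chosen arbitrarily)
Hc = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
sc = [1 - 1j, -1 + 1j]
yc = matvec(Hc, sc)                     # Equation (1) without the noise term
H, y, s = complex_to_real(Hc, yc, sc)
print(max(abs(a - b) for a, b in zip(matvec(H, s), y)))
```

The block structure doubles the problem size to 2*N* × 2*K*, which is why the dimensions in the rest of the section are written in terms of 2*K*.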

#### *2.1. Minimum Mean Square Error Signal Detection*

The main task of signal detection is to accurately determine the user transmission vector *s* at the base station from the received signal vector *y*. The MMSE estimate *ŝ* of the transmitted signal vector can be expressed as:

$$\hat{\mathbf{s}} = \left(H^H H + \sigma^2 I_{2K}\right)^{-1} H^H \mathbf{y} = W^{-1} \hat{\mathbf{y}} \tag{4}$$

where *ŷ* = *H*<sup>*H*</sup>*y* is the matched-filter output. The filter matrix *W* of the MMSE detector can be expressed as:

$$W = G + \sigma^2 I_{2K} \tag{5}$$

where *G* = *H*<sup>*H*</sup>*H* is the Gram matrix. In massive MIMO systems, computing *W*<sup>−1</sup> has a computational complexity of *O*(*K*<sup>3</sup>), which makes the MMSE algorithm very complex to implement.
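For reference, the exact MMSE estimate of Equation (4) can be obtained without forming *W*<sup>−1</sup> explicitly by solving *Wŝ* = *ŷ*. The sketch below uses a Cholesky factorization, which is our choice of exact solver (the paper does not specify one); it still costs *O*(*K*<sup>3</sup>) operations and is the baseline that the low-complexity methods try to approximate. It assumes the real-valued model, so *H*<sup>*H*</sup> = *H*<sup>*T*</sup>, and the helper names are hypothetical.

```python
import math


def mmse_filter(H, y, sigma2):
    """Return W = H^T H + sigma2 * I (Equation (5)) and y_hat = H^T y."""
    cols = list(zip(*H))                          # columns of the real-valued H
    n = len(cols)
    W = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (sigma2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    y_hat = [sum(a * b for a, b in zip(col, y)) for col in cols]
    return W, y_hat


def cholesky_solve(W, b):
    """Solve W x = b for symmetric positive definite W via W = C C^T."""
    n = len(W)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = W[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(acc) if i == j else acc / C[j][j]
    z = [0.0] * n                                  # forward substitution: C z = b
    for i in range(n):
        z[i] = (b[i] - sum(C[i][k] * z[k] for k in range(i))) / C[i][i]
    x = [0.0] * n                                  # back substitution: C^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(C[k][i] * x[k] for k in range(i + 1, n))) / C[i][i]
    return x


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]           # toy real-valued channel
W, y_hat = mmse_filter(H, [1.0, 2.0, 3.0], 0.5)
print(cholesky_solve(W, y_hat))                    # exact MMSE estimate s_hat
```

The triple loop of the factorization is where the *O*(*K*<sup>3</sup>) cost comes from; the methods of Section 3 replace it with cheaper approximations.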

#### *2.2. Log Likelihood Ratio Calculation*

Channel coding is commonly employed in wireless communication systems to improve error performance, and the channel decoder can exploit reliability (soft) information about each bit. Conventional MIMO signal detection generally uses hard decisions, i.e., it makes symbol decisions directly on the estimate of the user-transmitted signal vector, *ŝ* in Equation (4). To pass soft information to the channel decoder, after the MMSE detector estimates *ŝ*, the LLRs used for channel decoding can be calculated as follows. First, the estimate *ŝ* and the inverse *W*<sup>−1</sup> are converted back to the equivalent complex domain to obtain *ŝ<sub>c</sub>* and *W<sub>c</sub>*<sup>−1</sup>. Let *U* = *W<sub>c</sub>*<sup>−1</sup>*G<sub>c</sub>* = *W<sub>c</sub>*<sup>−1</sup>*H<sub>c</sub>*<sup>*H*</sup>*H<sub>c</sub>* denote the equalized channel matrix. From Equations (1) and (4), the equalized signal at the output of the MMSE filter is:

$$\begin{aligned} \hat{\mathbf{s}}_c &= W_c^{-1} G_c \mathbf{s}_c + W_c^{-1} H_c^H \mathbf{n}_c \\ &= U \mathbf{s}_c + W_c^{-1} H_c^H \mathbf{n}_c \end{aligned} \tag{6}$$

The estimate of the symbol transmitted by the *i*th user is then *ŝ<sub>c,i</sub>* = *μ<sub>i</sub>s<sub>c,i</sub>* + *e<sub>i</sub>*, where *μ<sub>i</sub>* = [*U*]<sub>*ii*</sub> = *U<sub>ii</sub>* is the effective channel gain after equalization and *e<sub>i</sub>* is the noise-plus-interference (NPI) term contained in *ŝ<sub>c,i</sub>*. The NPI variance is *v<sub>i</sub>*<sup>2</sup> = ∑<sub>*j*≠*i*</sub><sup>*K*</sup> |*U<sub>ji</sub>*|<sup>2</sup> + *E<sub>ii</sub>σ*<sup>2</sup>, where *U<sub>ji</sub>* is the (*j*, *i*)th element of *U* and *E<sub>ii</sub>* is the *i*th diagonal element of *E* = *W<sub>c</sub>*<sup>−1</sup>*H<sub>c</sub>*<sup>*H*</sup>(*W<sub>c</sub>*<sup>−1</sup>*H<sub>c</sub>*<sup>*H*</sup>)<sup>*H*</sup> = *W<sub>c</sub>*<sup>−1</sup>*G<sub>c</sub>W<sub>c</sub>*<sup>−1</sup>. Using the max-log approximation given in [11], the LLR *L<sub>i,b</sub>* of the *b*th bit transmitted by the *i*th user is:

$$L_{i,b} = \Upsilon_i \left( \min_{a \in O_b^0} \left| \frac{\hat{s}_{c,i}}{\mu_i} - a \right|^2 - \min_{a' \in O_b^1} \left| \frac{\hat{s}_{c,i}}{\mu_i} - a' \right|^2 \right) \tag{7}$$

where Υ<sub>*i*</sub> = *μ<sub>i</sub>*<sup>2</sup>/*v<sub>i</sub>*<sup>2</sup> is the signal-to-interference-plus-noise ratio (SINR), and *O<sub>b</sub>*<sup>0</sup> and *O<sub>b</sub>*<sup>1</sup> are the subsets of the modulation symbol set whose *b*th bit is 0 and 1, respectively.
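Equation (7) can be evaluated directly once *ŝ<sub>c,i</sub>*, *μ<sub>i</sub>*, and *v<sub>i</sub>*<sup>2</sup> are known. The sketch below assumes a Gray-mapped QPSK alphabet with unit average energy; this mapping and the name `maxlog_llr` are illustrative assumptions, not necessarily the modulation used in the simulations. With the sign convention of Equation (7), a negative LLR favors bit value 0.

```python
import math

# Hypothetical Gray-mapped QPSK alphabet with unit average energy.
# Keys are bit labels (b0, b1); this mapping is an illustrative assumption.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(1, -1) / math.sqrt(2),
    (1, 0): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
}


def maxlog_llr(s_hat, mu, v2, alphabet, b):
    """Max-log LLR of Equation (7) for bit b of one user.

    s_hat: equalized symbol s_hat_{c,i}; mu: effective gain mu_i (real);
    v2: NPI variance v_i^2; alphabet: dict mapping bit tuples to symbols.
    """
    sinr = mu ** 2 / v2                  # Upsilon_i = mu_i^2 / v_i^2
    z = s_hat / mu                       # remove the effective channel gain
    d0 = min(abs(z - a) ** 2 for bits, a in alphabet.items() if bits[b] == 0)
    d1 = min(abs(z - a) ** 2 for bits, a in alphabet.items() if bits[b] == 1)
    return sinr * (d0 - d1)


# A received point near the (0, 0) symbol yields negative LLRs for both bits
print([maxlog_llr(0.6 + 0.8j, 1.0, 0.5, QPSK, b) for b in (0, 1)])
```

The SINR factor scales the distance difference, so a user with a poorly conditioned channel produces smaller-magnitude (less confident) LLRs.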

#### **3. Low Complexity Signal Detection**

#### *3.1. Neumann Series Expansion*

In a massive MIMO system, MMSE signal detection involves inverting the high-dimensional matrix *W*, with a computational complexity of *O*(*K*<sup>3</sup>). To reduce this complexity, the authors in [11] proposed approximating the matrix inverse with a Neumann series expansion. If an invertible matrix *X* approximates *W* and satisfies

$$\lim_{n \to \infty} \left(I - X^{-1} W\right)^n = 0 \tag{8}$$

then *W*<sup>−1</sup> can be expanded as the Neumann series

$$W^{-1} = \sum_{n=0}^{\infty} \left( X^{-1} (X - W) \right)^{n} X^{-1} \tag{9}$$

Decompose *W* = *D* + *E*, where *D* is the diagonal part of *W* and *E* is the corresponding hollow (off-diagonal) part. Since the number of base station antennas is much larger than the number of single-antenna users (*N* ≫ *K*), *W* is diagonally dominant [3]; that is, *W* ≈ *D*. Substituting *D* for *X* in Equation (9), so that *X*<sup>−1</sup>(*X* − *W*) = −*D*<sup>−1</sup>*E*, gives:

$$W^{-1} = \sum_{n=0}^{\infty} \left( -D^{-1}E \right)^{n} D^{-1} \tag{10}$$

The series in Equation (10) converges when lim<sub>*n*→∞</sub>(−*D*<sup>−1</sup>*E*)<sup>*n*</sup> = 0. Keeping only the first *i* terms of the Neumann series gives:

$$W_i^{-1} = \sum_{n=0}^{i-1} \left( -D^{-1}E \right)^n D^{-1} \tag{11}$$

When *i* is small, the truncated Neumann series approximates *W*<sup>−1</sup> with lower complexity. For example, for *i* = 2, *W*<sub>2</sub><sup>−1</sup> = *D*<sup>−1</sup> − *D*<sup>−1</sup>*ED*<sup>−1</sup>; because *D* is diagonal, this only rescales the entries of *E*, so the complexity is *O*(*K*<sup>2</sup>).
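The truncated series of Equation (11) can be sketched as follows. For clarity, the loop forms each power of (−*D*<sup>−1</sup>*E*) with generic matrix products, which illustrates why *i* ≥ 3 costs *O*(*K*<sup>3</sup>); it does not exploit the *O*(*K*<sup>2</sup>) shortcut available for *i* = 2. The function name `neumann_inverse` and the test matrix are hypothetical.

```python
def neumann_inverse(W, terms):
    """Approximate W^{-1} with the first `terms` terms of Equation (11).

    Uses D = diag(W) and E = W - D, accumulating sum_{n<terms} (-D^{-1}E)^n D^{-1}.
    """
    n = len(W)
    d_inv = [1.0 / W[i][i] for i in range(n)]
    # n = 0 term of the series: D^{-1}
    term = [[d_inv[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    approx = [row[:] for row in term]
    for _ in range(terms - 1):
        # term <- (-D^{-1} E) @ term; E excludes the diagonal, hence k != i
        term = [[-d_inv[i] * sum(W[i][k] * term[k][j] for k in range(n) if k != i)
                 for j in range(n)] for i in range(n)]
        approx = [[approx[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return approx


W = [[4.0, 1.0], [1.0, 3.0]]             # small diagonally dominant example
for i in (1, 2, 4, 8):
    print(i, neumann_inverse(W, i)[0][1])  # exact (W^{-1})_{01} = -1/11
```

The off-diagonal entry moves from 0 (only *D*<sup>−1</sup>) to −1/12 with two terms and approaches −1/11 as more terms are added, at the cost of one extra matrix product per term.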

#### *3.2. Gauss–Seidel Algorithm*

In the Neumann series expansion algorithm, when the number of expansion terms is *i* ≥ 3, matrix–matrix products are required and the computational complexity is again *O*(*K*<sup>3</sup>), which equals or even exceeds the complexity of computing the exact inverse of the MMSE filter matrix. Unlike the Neumann series expansion, which approximates *W*<sup>−1</sup>, the GS algorithm [19] solves *N*-dimensional linear equations of the form *Ax* = *b* without inverting the matrix, where *A* is an *N* × *N* symmetric positive definite matrix, *x* is the *N* × 1 solution vector, and *b* is the *N* × 1 measurement vector. Decomposing *A* into its diagonal part *D<sub>A</sub>*, its strictly lower triangular part *L<sub>A</sub>*, and its strictly upper triangular part *L<sub>A</sub>*<sup>*H*</sup>, the GS algorithm estimates *x* iteratively as follows:

$$\hat{\mathbf{x}}^{(i)} = \left(D_A + L_A\right)^{-1} \left(\mathbf{b} - L_A^H \hat{\mathbf{x}}^{(i-1)}\right) \tag{12}$$

where *i* = 1, 2, ... is the iteration index of the GS algorithm. In a massive MIMO system, when the number of base station antennas is much larger than the number of single-antenna users (*N* ≫ *K*), the column vectors of the uplink channel matrix *H* become asymptotically orthogonal [20], and *W* = *G* + *σ*<sup>2</sup>*I*<sub>2*K*</sub> is a symmetric positive definite matrix, for which the GS iteration is guaranteed to converge. Similarly, *W* can be decomposed into:

$$W = D + L + L^H \tag{13}$$

where *D*, *L*, and *L*<sup>*H*</sup> are the diagonal part, the strictly lower triangular part, and the strictly upper triangular part of *W*, respectively. The GS algorithm thus avoids inverting the high-dimensional matrix and directly estimates the transmitted signal vector *ŝ*:

$$
\hat{\mathbf{s}}^{(i)} = \left(D + L\right)^{-1} \left(\hat{\mathbf{y}} - L^H \hat{\mathbf{s}}^{(i-1)}\right) \tag{14}
$$

where *ŝ*<sup>(0)</sup> is the initial solution, which is usually set to the zero vector.
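A minimal sketch of the GS iteration in Equation (14) is given below. Each sweep solves (*D* + *L*)*ŝ*<sup>(*i*)</sup> = *ŷ* − *L*<sup>*H*</sup>*ŝ*<sup>(*i*−1)</sup> by updating the entries in place, so no matrix inverse is formed. The function name and the small test system are illustrative assumptions.

```python
def gauss_seidel(W, y_hat, iters, s0=None):
    """Run `iters` Gauss-Seidel sweeps of Equation (14) for W s = y_hat."""
    n = len(W)
    s = list(s0) if s0 is not None else [0.0] * n    # s^(0): zero vector by default
    for _ in range(iters):
        for m in range(n):
            # s[k] for k < m already holds the new iterate (lower part L),
            # s[k] for k > m still holds the previous iterate (upper part L^H)
            acc = sum(W[m][k] * s[k] for k in range(n) if k != m)
            s[m] = (y_hat[m] - acc) / W[m][m]         # divide by the diagonal D
    return s


W = [[4.0, 1.0], [1.0, 3.0]]
y_hat = [1.0, 2.0]                                  # exact solution: [1/11, 7/11]
for i in (1, 2, 5):
    print(i, gauss_seidel(W, y_hat, i))
```

Updating in place is what makes GS a forward substitution with (*D* + *L*); it is also why each sweep is inherently sequential over the entries.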

#### **4. Proposed Algorithm**

#### *4.1. Hybrid Iterative Algorithm Structure*

The SD algorithm, which is based on matrix gradient search, has a good convergence direction at the beginning of the iteration [18], whereas the GS iterative algorithm has lower complexity. Exploiting these characteristics, this paper proposes a hybrid iteration of the SD and GS algorithms. The joint algorithm (called the SDGS algorithm) speeds up convergence without increasing complexity and achieves error performance close to that of MMSE detection with exact matrix inversion. Its steps are listed in the SDGS Algorithm.

#### **SDGS Algorithm**


$$s^{(0)} = D^{-1} \hat{y} \tag{15}$$

Because *D* is diagonal, computing *D*<sup>−1</sup> requires little computation, and Equation (15) gives the initial value *s*<sup>(0)</sup> of the SD algorithm.

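• **Step 3:** The first two GS iterations are expressed in terms of an SD iteration. Taking the SD update *s*<sup>(1)</sup> = *s*<sup>(0)</sup> + *μr*<sup>(0)</sup> as the first iterate, where *r*<sup>(0)</sup> = *ŷ* − *Ws*<sup>(0)</sup> is the initial residual and the step size *μ* is given in Equation (17), the second GS iteration can be expressed as: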

$$\begin{aligned} s^{(2)} &= (D+L)^{-1} \left( \hat{y} - L^H s^{(1)} \right) = (D+L)^{-1} \left[ \left( (D+L) - W \right) s^{(1)} + \hat{y} \right] \\ &= s^{(1)} + (D+L)^{-1} \left( \hat{y} - W s^{(1)} \right) = s^{(1)} + (D+L)^{-1} r^{(1)} \end{aligned} \tag{16}$$

$$\text{where } r^{(1)} = \hat{y} - W\left(s^{(0)} + \mu r^{(0)}\right) = \hat{y} - W s^{(0)} - \mu W r^{(0)} = r^{(0)} - \mu p^{(0)}; \quad \mu = \frac{\left(r^{(0)}\right)^H r^{(0)}}{\left(p^{(0)}\right)^H r^{(0)}}, \quad p^{(0)} = W r^{(0)} \tag{17}$$

• **Step 4:** Combine a single SD iteration and a single GS iteration into one hybrid iteration by substituting Equation (17) and the SD update into Equation (16):

$$s^{(1)} = s^{(0)} + \mu r^{(0)} \to s^{(2)} = s^{(0)} + \mu r^{(0)} + (D + L)^{-1} \left(r^{(0)} - \mu p^{(0)}\right) \tag{18}$$

Equation (18) thus replaces the first two GS iterations. Set the hybrid iterate to *ŝ*<sup>(1)</sup> = *s*<sup>(2)</sup> and then perform the subsequent GS iterations.

• **Step 5:** Continue with GS iterations according to Equation (14). After an appropriate number of iterations *i*, the estimate *ŝ*<sup>(*i*)</sup> of the transmitted signal vector *s* is obtained as:

$$\hat{\mathbf{s}}^{(i)} = \left(D + L\right)^{-1} \left(\hat{\mathbf{y}} - L^H \hat{\mathbf{s}}^{(i-1)}\right) \tag{19}$$

Finally, *ŝ*<sup>(*i*)</sup> is converted back to the complex domain for the subsequent soft decision. Because the SD step provides a good initial search direction, the hybrid iterative algorithm converges after only a small number of iterations.
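Steps 3–5 can be put together in the following sketch of the SDGS iteration for the real-valued model. It is our reading of Equations (15)–(19), not the authors' code: the first iteration performs the combined SD and GS update of Equation (18), and each further iteration is a GS sweep of Equation (19). The function name `sdgs_detect` and the convention that `iters` counts the hybrid iteration plus the subsequent GS sweeps are assumptions.

```python
def sdgs_detect(W, y_hat, iters):
    """Sketch of the SDGS hybrid iteration; `iters` >= 1 includes the hybrid step."""
    n = len(W)

    def matvec(A, x):
        return [sum(a * b for a, b in zip(row, x)) for row in A]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    s0 = [y_hat[m] / W[m][m] for m in range(n)]                 # Eq. (15): D^{-1} y_hat
    r0 = [a - b for a, b in zip(y_hat, matvec(W, s0))]          # r^(0) = y_hat - W s^(0)
    p0 = matvec(W, r0)                                          # p^(0) = W r^(0)
    denom = dot(p0, r0)
    if denom == 0.0:                                            # r^(0) = 0: already exact
        return s0
    mu = dot(r0, r0) / denom                                    # SD step size, Eq. (17)
    s1 = [a + mu * b for a, b in zip(s0, r0)]                   # SD update s^(1)
    r1 = [a - mu * b for a, b in zip(r0, p0)]                   # r^(1) = r^(0) - mu p^(0)
    delta = [0.0] * n                                           # solve (D + L) delta = r^(1)
    for m in range(n):
        delta[m] = (r1[m] - sum(W[m][k] * delta[k] for k in range(m))) / W[m][m]
    s = [a + d for a, d in zip(s1, delta)]                      # Eq. (18): hybrid iterate
    for _ in range(iters - 1):                                  # Eq. (19): further GS sweeps
        for m in range(n):
            acc = sum(W[m][k] * s[k] for k in range(n) if k != m)
            s[m] = (y_hat[m] - acc) / W[m][m]
    return s


W = [[4.0, 1.0], [1.0, 3.0]]
y_hat = [1.0, 2.0]                                              # exact: [1/11, 7/11]
print(sdgs_detect(W, y_hat, 1), sdgs_detect(W, y_hat, 3))
```

Because *r*<sup>(1)</sup> is updated from *p*<sup>(0)</sup>, which is already available for the step size, the hybrid step needs no extra matrix–vector product with *W* beyond those used by SD.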

#### *4.2. Approximate Log-Likelihood Ratio Calculation*

The low-complexity MMSE detection algorithms described in [13–16] estimate the transmitted signal vector *ŝ* without calculating *W*<sup>−1</sup>. However, the exact LLR calculation for the channel decoder input described in Section 2.2 uses the exact matrix inverse *W*<sup>−1</sup>: as Equation (7) shows, computing the LLR *L<sub>i,b</sub>* of the *b*th bit transmitted by the *i*th user requires the SINR of the *i*th user, which in turn depends on the inverse *W*<sup>−1</sup> of the MMSE filter matrix *W*. Exploiting the diagonal dominance of *W*, we replace *W*<sup>−1</sup> with *D*<sup>−1</sup>, that is, *W̃*<sup>−1</sup> ≈ *D*<sup>−1</sup>, and convert it to the complex domain to obtain *W̃<sub>c</sub>*<sup>−1</sup>. The approximate channel gain and NPI variance are then:

$$\tilde{\mu}_i = \tilde{U}_{ii} \tag{20}$$

$$\tilde{v}_i^{2} = \sum_{j \neq i}^{K} \left| \tilde{U}_{ji} \right|^{2} + \tilde{E}_{ii} \sigma^{2} \tag{21}$$

where *Ũ* = *W̃<sub>c</sub>*<sup>−1</sup>*G<sub>c</sub>* = *D<sub>c</sub>*<sup>−1</sup>*G<sub>c</sub>* approximates *U*, and *Ẽ* = *W̃<sub>c</sub>*<sup>−1</sup>*G<sub>c</sub>W̃<sub>c</sub>*<sup>−1</sup> = *ŨW̃<sub>c</sub>*<sup>−1</sup> = *ŨD<sub>c</sub>*<sup>−1</sup> approximates *E*, so the approximate SINR is Υ̃<sub>*i*</sub> = *μ̃<sub>i</sub>*<sup>2</sup>/*ṽ<sub>i</sub>*<sup>2</sup>.
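The approximation of Equations (20) and (21) only needs the complex Gram matrix *G<sub>c</sub>* and the diagonal of *W<sub>c</sub>*. The following sketch computes the approximate gain, NPI variance, and SINR for every user; as in Equation (21), the interference term implicitly assumes unit average symbol energy. The function name and the example Gram matrix are hypothetical.

```python
def approx_sinr(Gc, sigma2):
    """Approximate (mu_i, v_i^2, SINR_i) of Equations (20)-(21) for each user.

    Gc: K x K Gram matrix H_c^H H_c (list of rows). W_c^{-1} is replaced by
    D_c^{-1}, where D_c is the diagonal of W_c = G_c + sigma2 * I.
    """
    K = len(Gc)
    d = [Gc[j][j].real + sigma2 for j in range(K)]               # diagonal of W_c
    U = [[Gc[j][i] / d[j] for i in range(K)] for j in range(K)]   # U~ = D_c^{-1} G_c
    result = []
    for i in range(K):
        mu = U[i][i].real                                         # Eq. (20)
        e_ii = mu / d[i]                                          # diag of E~ = U~ D_c^{-1}
        v2 = sum(abs(U[j][i]) ** 2 for j in range(K) if j != i) + e_ii * sigma2  # Eq. (21)
        result.append((mu, v2, mu ** 2 / v2))
    return result


Gc = [[2.0, 1.0 + 0.5j], [1.0 - 0.5j, 3.0]]   # small Hermitian example
print(approx_sinr(Gc, 0.1))
```

Only the diagonal of *Ẽ* is needed, which is why this part costs far less than forming *Ẽ* in full.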

#### *4.3. Complexity Analysis*

The computational complexity of the proposed SDGS detection algorithm is analyzed in terms of the number of real multiplications. Since all linear MMSE detection algorithms, including the proposed one, must compute the filter matrix *W* = *G* + *σ*<sup>2</sup>*I*<sub>2*K*</sub> and the matched-filter output *ŷ* = *H*<sup>*H*</sup>*y*, only the remaining operations are analyzed; they consist of the following three parts.

#### 4.3.1. Initial Value and First Iteration Calculation

Equation (15) requires 2*K* multiplications. The first iteration mainly computes *r*<sup>(0)</sup> = *ŷ* − *Ws*<sup>(0)</sup>, *p*<sup>(0)</sup> = *Wr*<sup>(0)</sup>, and the scalar *μ*, which require 4*K*<sup>2</sup>, 4*K*<sup>2</sup>, and 4*K* multiplications, respectively. Combining the first iteration of Equation (18), a total of 2*K*<sup>2</sup> + 10*K* multiplications are required.

#### 4.3.2. GS Iteration

Equation (19) can be written as (*D* + *L*)*ŝ*<sup>(*i*)</sup> = *ŷ* − *L*<sup>*H*</sup>*ŝ*<sup>(*i*−1)</sup> = *c*. The computation of *ŝ*<sup>(*i*)</sup> in each iteration consists of two steps. First, computing *c* requires multiplying the 2*K* × 2*K* strictly upper triangular matrix *L*<sup>*H*</sup> by the 2*K* × 1 vector *ŝ*<sup>(*i*−1)</sup>, which takes 2*K*<sup>2</sup> − *K* multiplications. Second, the *m*th element *ŝ<sub>m</sub>*<sup>(*i*)</sup> in Equation (19) is obtained by forward substitution as:

$$\hat{s}_m^{(i)} = \begin{cases} \dfrac{c_1}{L_{11}}, & m = 1 \\[2mm] \dfrac{c_m - \sum_{k=1}^{m-1} \hat{s}_k^{(i)} L_{mk}}{L_{mm}}, & m = 2, \dots, 2K \end{cases} \tag{22}$$

where *c<sub>m</sub>* is the *m*th element of *c*, and *L<sub>mk</sub>* is the (*m*, *k*)th element of the lower triangular matrix (*D* + *L*). For *m* = 1, *ŝ*<sub>1</sub><sup>(*i*)</sup> requires 2*K* multiplications, and all *ŝ<sub>m</sub>*<sup>(*i*)</sup> (*m* = 2, ..., 2*K*) require 2*K*<sup>2</sup> − *K* multiplications, so a total of 2*K*<sup>2</sup> multiplications are required for each iteration.

#### 4.3.3. LLR Calculation

The computational complexity of this part mainly comes from the calculation of the effective channel gain and the NPI variance after equalization. From Equations (20) and (21), all elements of the matrix *Ũ* and the diagonal elements of the matrix *Ẽ* need to be calculated. The former requires 2*K*<sup>2</sup> multiplications, while the latter requires only 2*K* multiplications. Therefore, a total of 2*K*<sup>2</sup> + 2*K* multiplications are required for this step.

In summary, the total complexity of the joint iterations applied to soft-decision detection is 2*K*<sup>2</sup>(*i* + 2) + 12*Ki*, an order of magnitude lower than that of the traditional MMSE algorithm, and the complexity remains *O*(*K*<sup>2</sup>) for a fixed number of iterations. In addition, for hard-decision detection, Table 1 compares the computational complexity of the four detection algorithms.

**Table 1.** Complexity comparison of four kinds of detection algorithms for hard decision calculation.


#### **5. Simulation Results**

We used Matlab (R2017a, Mathworks, Natick, MA, USA) for analysis and experimentation. To verify the soft- and hard-decision detection performance of the proposed SDGS algorithm, this section presents Monte Carlo simulation results obtained in Matlab. The main simulation parameters are listed in Table 2.


**Table 2.** Simulation parameters.

Figure 2 compares the bit error rate (BER) of the Neumann series (NS) expansion, the conjugate gradient (CG) detection algorithm, the Gauss–Seidel (GS) iterative detection algorithm, MMSE detection with exact inversion, and the proposed SDGS joint algorithm under different antenna configurations. Hard decisions are used; that is, decisions are made directly on the estimated signal vector *ŝ*. The simulation results show that the detection performance of all algorithms improves as the number of iterations or the number of Neumann series terms increases. For example, with *i* = 2 iterations, the SDGS algorithm achieves a much lower BER than the Neumann series expansion with two terms. Comparing Figure 2a,b shows that the BER performance of all algorithms improves greatly as the ratio of base station antennas to users (*N*/*K*) increases. For example, to reach a BER of 10<sup>−3</sup>, the MMSE algorithm and the proposed algorithm require an SNR of about 13 dB with the 64 × 16 antenna configuration, but only 8 dB with the 128 × 16 configuration.

**Figure 2.** Hard decision bit error rate (BER) performance comparison. (**a**) Analysis at 64 × 16 antenna configuration; (**b**) Analysis at 128 × 16 antenna configuration.

Figure 3 shows the soft-decision simulation results for the two antenna configurations, comparing the BER of the NS, CG, and GS iterative detection algorithms, MMSE detection with exact inversion, and the SDGS joint algorithm. The convolutional code rate was set to 1/2, and the LLRs were computed with the approximate method described in this paper. The simulation results showed that, for every type of MMSE receiver, soft-decision detection performed much better than hard-decision detection. For example, to reach a BER of 10<sup>−4</sup> with the same antenna configuration, the MMSE algorithm and the proposed algorithm required an SNR of 10 dB with hard decisions but only 5 dB with soft decisions. In addition, for the same number of iterations, the SDGS algorithm achieved better BER performance than the other three simplified algorithms, and after a few iterations its detection performance quickly approached that of MMSE detection with ideal matrix inversion.

**Figure 3.** Soft decision bit error rate (BER) performance comparison. (**a**) Analysis at 64 × 16 antenna configuration; (**b**) Analysis at 128 × 16 antenna configuration.

Figure 4 compares the hard-decision BER of the proposed SDGS algorithm with NS, CG, GS, and MMSE in a high-fading scenario with the 128 × 16 antenna configuration. As Figure 4 shows, the SDGS algorithm achieved a lower BER than NS, CG, and GS and approached the MMSE performance as the SNR and the number of iterations increased. However, because strong fading degrades the effective SNR, a gap remained between the SDGS and MMSE algorithms at high SNR.

Figure 5 compares the BER of the proposed SDGS algorithm with NS, CG, GS, and MMSE at a low fading level with the 128 × 16 antenna configuration. All algorithms show a lower BER than in the hard-decision results of Figures 2 and 4. Both the fading level and the number of iterations therefore have a clear impact on overall system performance and should be taken into account to keep performance at a suitable level. Furthermore, the SDGS algorithm in Figure 5 achieved a BER close to that of MMSE, indicating that it performs well at low fading levels.

**Figure 4.** Hard decision bit error rate (BER) performance comparison with high fading level at 128 × 16 antenna configuration.

**Figure 5.** Soft decision bit error rate (BER) performance comparison with slow fading level at 128 × 16 antenna configuration.

#### **6. Conclusions**

Signal detection methods based on MMSE filtering are widely used in massive MIMO systems, but the high complexity of matrix inversion makes them difficult to implement in practice. Some approximate inversion methods, such as Neumann series expansion, reduce the detection complexity at the cost of a considerable loss in detection performance; others avoid the complex matrix inversion and estimate the signal vector directly. Although these reduce the computational complexity by orders of magnitude, their detection performance still needs improvement. Based on the MMSE criterion, this paper proposes a low-complexity hybrid iterative SDGS joint detection algorithm that directly estimates the users' transmitted symbol vector and converges quickly to an accurate estimate within a few simple iterations. The matrix inversion operation is avoided, and the algorithm complexity is kept at *O*(*K*<sup>2</sup>). In addition, to make full use of soft information, the algorithm is applied to soft-decision detection, and an approximate LLR calculation method for channel decoding is given, which further improves the signal detection performance. Theoretical derivation and simulation results show that the SDGS algorithm can serve as one of the most effective solutions for signal detection in massive MIMO systems.

**Author Contributions:** I.K. conceived and designed the presented idea and developed the theory, performed the simulations, and wrote the paper. M.H.Z. analyzed the research and performed experimentations, and M.A. provided extensive technical support throughout the research. S.K. provided extensive support in the theoretical analysis and provided funding support.

**Funding:** This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF-2016R1D1A1B03934653).

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **A Novel Iterative Discrete Estimation Algorithm for Low-Complexity Signal Detection in Uplink Massive MIMO Systems**

#### **Hui Feng <sup>1</sup>, Xiaoqing Zhao <sup>2</sup>, Zhengquan Li <sup>2,\*</sup> and Song Xing <sup>3</sup>**


Received: 1 August 2019; Accepted: 29 August 2019; Published: 2 September 2019

**Abstract:** In this paper, a novel iterative discrete estimation (IDE) algorithm, called the modified IDE (MIDE), is proposed to reduce the computational complexity of MIMO detection in uplink massive MIMO systems. MIDE is a revision of the alternating direction method of multipliers (ADMM)-based algorithm in which a self-updating method estimates and updates the damping factor at each iteration, based on the Euclidean distance between the iterative solutions of the IDE-based algorithm, in order to accelerate convergence. Compared to the existing ADMM-based detection algorithm, the overall computational complexity of the proposed MIDE algorithm is reduced from *O*(*N<sub>t</sub>*<sup>3</sup>) + *O*(*N<sub>r</sub>N<sub>t</sub>*<sup>2</sup>) to *O*(*N<sub>t</sub>*<sup>2</sup>) + *O*(*N<sub>r</sub>N<sub>t</sub>*) in terms of the number of complex-valued multiplications, where *N<sub>t</sub>* and *N<sub>r</sub>* are the number of users and the number of receiving antennas at the base station (BS), respectively. Simulation results show that the proposed MIDE algorithm achieves a better bit error rate (BER) than several recently proposed approximation algorithms for MIMO detection in uplink massive MIMO systems.

**Keywords:** massive MIMO systems; MIDE algorithm; low computational complexity; BER

#### **1. Introduction**

With the development of the mobile Internet and the Internet of Things, much higher data rates are required in new-generation cellular networks such as 5G [1]. By equipping the base station (BS) with hundreds of antennas to serve tens of users, massive multiple-input multiple-output (MIMO) is deemed a key technology for meeting the 5G requirements, owing to its higher data throughput, link reliability, and spectral efficiency, and its better communication quality compared with traditional MIMO [2–4].

However, a major computational challenge in massive MIMO is uplink data detection, owing to the large increase in system dimensions [5]. Maximum likelihood (ML) detection is optimal, but its computational complexity grows exponentially with the number of user equipment (UE) and the modulation order [6,7]. Suboptimal detection alternatives have been proposed to reduce the computational complexity while retaining good bit error rate (BER) performance. For example, the linear minimum mean squared error (LMMSE) algorithm is a widely used suboptimal detector with near-optimal BER performance and reduced computational complexity [8]. However, the LMMSE algorithm still involves computing the Gram matrix and a matrix inversion, with computational complexities of *O*(*N<sub>r</sub>N<sub>t</sub>*<sup>2</sup>) and *O*(*N<sub>t</sub>*<sup>3</sup>), respectively, where *N<sub>t</sub>* denotes the number of single-antenna UEs and *N<sub>r</sub>* the number of antennas at the BS.

Several approaches that approximate the matrix inversion have been proposed to reduce the computational complexity [9–19]. For example, the Neumann series (NS) approximation replaces the matrix inverse with a truncated NS expansion [9,10]. However, as more NS terms are used, the reduction in computational complexity becomes marginal. Hence, various classical iterative algorithms have been used to approximate the inverse matrix in LMMSE detection with low computational complexity, including the Richardson (RI) algorithm [11], the Jacobi algorithm [12–15], the successive over-relaxation (SOR) algorithm [16], the symmetric successive over-relaxation (SSOR) algorithm [17], and the Gauss–Seidel (GS) algorithm [18,19]. These approximation-based detection algorithms reduce the computational complexity of the matrix inversion from *O*(*N<sub>t</sub>*<sup>3</sup>) to *O*(*N<sub>t</sub>*<sup>2</sup>). Algorithms that approximate the inverse matrix in LMMSE detection (also known as approximated LMMSE) thus achieve near-LMMSE performance with lower computational complexity.

It is well known that the approximated LMMSE detection algorithms suffer a substantial performance loss when *N<sub>r</sub>* > *N<sub>t</sub>* in multiuser massive MIMO systems. Various algorithms, including non-convex and convex optimization algorithms, have been proposed to achieve better BER performance than LMMSE detection in multiuser massive MIMO systems [20–22].

For example, the alternating minimization (AltMin) algorithm, a non-convex optimization algorithm, has been applied to data detection in multiuser massive MIMO systems [20]. Specifically, AltMin converts the ML detection problem into a sum of convex functions by decomposing the received vector into multiple sub-vectors, thereby transforming the non-convex problem into a convex one. AltMin shows better BER performance than the LMMSE detector in overloaded network scenarios with relatively low computational complexity. However, when the ratio of the number of BS antennas to the number of single-antenna users is large, it achieves only near-LMMSE performance at an even higher computational complexity.

Similarly, convex optimization algorithms, such as the alternating direction method of multipliers (ADMM) detection algorithm [21,22], are used to solve non-convex optimization problems. In multiuser massive MIMO systems, ADMM achieves better BER performance than LMMSE detection with a relatively low computational complexity in its iterative procedure. However, the preprocessing in the ADMM algorithm includes computing the Gram matrix and an LDL decomposition [23,24], which results in very high computational complexity for massive MIMO systems.

To trade off performance against computational complexity across different antenna configurations, the iterative discrete estimation (IDE) algorithm has been integrated into the ADMM algorithm [25]; it has low computational complexity because it avoids computing the Gram matrix and the LDL decomposition. Motivated by these algorithms, we propose a modified IDE (MIDE) algorithm that achieves better BER performance and lower computational complexity than the ADMM algorithms. The main contributions of this work are as follows.

• A novel iterative data detection algorithm for uplink multiuser massive MIMO systems is designed by exploiting the IDE-based algorithmic framework. The proposed MIDE algorithm refactors the detection algorithm as a series of simpler subproblems with closed-form solutions.

• A heuristic damping factor is defined based on the Euclidean distance instead of a fixed factor. Compared with the fixed damping factor, this self-updated damping factor contributes to a faster convergence in the proposed MIDE algorithm.

• The computational complexity analysis indicates that, for the same BER performance, the proposed algorithm has lower computational complexity than traditional approximate detection algorithms (LMMSE, AltMin, and ADMM). Specifically, the complexity of the proposed MIDE detection algorithm is only *O*(*Nt*<sup>2</sup>) + *O*(*NrNt*).

• Simulation results reveal that for the typical independent and identically distributed (i.i.d.) frequency flat Rayleigh fading channel in massive MIMO systems, the proposed MIDE detection algorithm performs better than the ADMM and AltMin-based detection algorithms and the LMMSE detection algorithm in terms of BER performance with various system configurations.

The rest of the paper is organized as follows. In Section 2, we briefly introduce the system model. Section 3 specifies the proposed low-complexity signal detection based on the IDE algorithm and performs the computational complexity analysis of the algorithms. In Section 4, the numerical simulation results of the BER performance are presented. Finally, Section 5 concludes the paper.

Notation: Throughout the paper, lowercase and uppercase boldface letters denote vectors (e.g., **a**) and matrices (e.g., **A**), respectively. The superscripts (·)<sup>−1</sup>, (·)<sup>*T*</sup>, and (·)<sup>*H*</sup> denote matrix inversion, transpose, and conjugate transpose, respectively. The *L*<sub>2</sub> norm of a vector is denoted by ‖·‖<sub>2</sub>. ℜ(·) and ℑ(·) denote the real part and the imaginary part of a complex-valued signal, respectively.

#### **2. System Model**

The typical uplink massive MIMO system shown in Figure 1 is considered in this work, where *Nt* single-antenna UE devices communicate with a BS equipped with *Nr* receiving antennas [26]. In general, *Nr* is larger than *Nt* in an uplink massive MIMO communication system [27].

**Figure 1.** MIMO system signal detection structure diagram.

At the transmitter side, the source information is denoted by **s** = [*s*<sub>1</sub>, ···, *s*<sub>*i*</sub>, ···, *s*<sub>*Nt*</sub>]<sup>*T*</sup>, where each symbol *s*<sub>*i*</sub> is mapped onto a point of the constellation alphabet Ω. The transmitted signal **x** = [*x*<sub>1</sub>, ···, *x*<sub>*i*</sub>, ···, *x*<sub>*Nt*</sub>]<sup>*T*</sup> is constructed from the modulated symbols, where *x*<sub>*i*</sub> denotes the signal transmitted by the *i*th UE device. The vector **y** = [*y*<sub>1</sub>, ···, *y*<sub>*i*</sub>, ···, *y*<sub>*Nr*</sub>]<sup>*T*</sup> represents the received signal at the BS, where *y*<sub>*i*</sub> denotes the signal received by the *i*th receive antenna, and:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} \tag{1}$$

where **w** is the *Nr* × 1 additive white Gaussian noise (AWGN) vector following 𝒞𝒩(**0**, *σ*<sup>2</sup>**I**), with *σ*<sup>2</sup> representing the average noise power. In Equation (1), **H** denotes the *Nr* × *Nt* flat-fading channel matrix, which can be expressed as:

$$\mathbf{H} = \begin{bmatrix} h\_{11} & h\_{12} & \dots & h\_{1N\_t} \\ h\_{21} & h\_{22} & \dots & h\_{2N\_t} \\ \vdots & \vdots & \ddots & \vdots \\ h\_{N\_r1} & h\_{N\_r2} & \dots & h\_{N\_rN\_t} \end{bmatrix} \tag{2}$$

where the element *h*<sub>*ij*</sub>, *i* ∈ {1, 2, ···, *Nr*}, *j* ∈ {1, 2, ···, *Nt*}, denotes the channel coefficient between the *i*th receiving antenna and the *j*th user. Each *h*<sub>*ij*</sub> is assumed to be an i.i.d. complex Gaussian random variable with zero mean and unit variance. In addition, the channel matrix **H** is assumed to be perfectly known at the BS [28,29].
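To make the model of Equations (1) and (2) concrete, the following minimal Python sketch draws one channel realization and one received vector. It assumes unnormalized 16-QAM with per-dimension levels {−3, −1, +1, +3} (the alphabet used in the 16-QAM example of Section 3.2) and a fixed random seed; the function names `crandn`, `qam16_symbol`, and `simulate_uplink` are illustrative, not from the original work.

```python
import math
import random

def crandn(rng: random.Random, var: float = 1.0) -> complex:
    """One CN(0, var) sample: real and imaginary parts each have variance var / 2."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def qam16_symbol(rng: random.Random) -> complex:
    """Uniformly drawn, unnormalized 16-QAM point with levels {-3, -1, +1, +3} (assumed)."""
    levels = (-3, -1, 1, 3)
    return complex(rng.choice(levels), rng.choice(levels))

def simulate_uplink(n_r: int, n_t: int, noise_var: float, seed: int = 0):
    """Return (H, x, y) with y = H x + w, H i.i.d. CN(0, 1), w ~ CN(0, noise_var I)."""
    rng = random.Random(seed)
    H = [[crandn(rng) for _ in range(n_t)] for _ in range(n_r)]
    x = [qam16_symbol(rng) for _ in range(n_t)]
    y = [sum(H[i][j] * x[j] for j in range(n_t)) + crandn(rng, noise_var)
         for i in range(n_r)]
    return H, x, y

if __name__ == "__main__":
    H, x, y = simulate_uplink(n_r=128, n_t=16, noise_var=0.5)
    print(len(H), len(H[0]), len(y))  # 128 16 128
```

Plain lists keep the sketch dependency-free; a practical simulator would use an array library instead.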

#### **3. Proposed MIDE Algorithm**

We first propose a low-complexity signal detection algorithm, which converts the ML problem into the constrained non-convex problem and utilizes the IDE-based algorithm to solve the problem iteratively. Then, a self-updating method is proposed to update the damping factor based on the Euclidean distance, which accelerates the convergence in the proposed MIDE algorithm. Finally, the computational complexity analysis of the proposed algorithm is provided with the comparison to the conventional algorithms.

#### *3.1. ML Problem Formulation and IDE-Based Algorithm*

Detecting the transmitted symbol vector **x** at the BS can be done by minimizing the squared Euclidean distance between the received signal vector **y** and the hypothesized received signal **Hx**, with **x** constrained to the modulation constellation Ω<sup>*Nt*</sup>, which can be represented as:

$$\hat{\mathbf{x}} = \underset{\mathbf{x} \in \Omega^{N\_t}}{\arg\min} \left\| \mathbf{y} - \mathbf{H} \mathbf{x} \right\|\_2^2 \tag{3}$$
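For very small systems, Equation (3) can be solved exactly by exhaustive search over all *M*<sup>*Nt*</sup> candidate vectors, which illustrates why exact ML detection is impractical for massive MIMO: the search space grows exponentially in *Nt*. The sketch below assumes the same unnormalized 16-QAM alphabet and list-based complex matrices; `ml_detect` and `squared_residual` are illustrative names.

```python
import itertools

# Unnormalized 16-QAM alphabet (assumed): real and imaginary levels in {-3, -1, +1, +3}.
QAM16 = [complex(a, b) for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)]

def squared_residual(H, x, y):
    """||y - H x||_2^2 for list-based complex matrices and vectors."""
    return sum(abs(y_i - sum(h_ij * x_j for h_ij, x_j in zip(row, x))) ** 2
               for row, y_i in zip(H, y))

def ml_detect(H, y, alphabet=QAM16):
    """Exhaustive-search ML detector of Eq. (3): len(alphabet) ** Nt residual evaluations."""
    n_t = len(H[0])
    best = min(itertools.product(alphabet, repeat=n_t),
               key=lambda cand: squared_residual(H, cand, y))
    return list(best)
```

With *Nt* = 2 this evaluates 16<sup>2</sup> = 256 candidates; with *Nt* = 16 it would need 16<sup>16</sup> ≈ 1.8 × 10<sup>19</sup>, which motivates the approximate detectors discussed in this paper.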

It is noted that in Equation (3), the finite-alphabet constraint **x** ∈ Ω<sup>*Nt*</sup> can be converted into the indicator function **I**<sub>Ω</sub>(**x̂**), which is given by:

$$\mathbf{I}\_{\Omega}\left(\mathbf{\hat{x}}\right) = \begin{cases} 0, & \text{if } \mathbf{\hat{x}} \in \Omega^{N\_t} \\ \infty, & \text{otherwise} \end{cases} \tag{4}$$

By combining Equations (3) and (4), the signal detection problem can be converted into the following constrained optimization problem:

$$\begin{aligned} \underset{\mathbf{z}, \mathbf{\hat{x}}}{\text{minimize}} \quad & \left\| \mathbf{y} - \mathbf{H} \mathbf{z} \right\|\_{2}^{2} + \mathbf{I}\_{\Omega}\left(\mathbf{\hat{x}}\right) \\ \text{s.t.} \quad & \mathbf{z} - \mathbf{\hat{x}} = \mathbf{0} \end{aligned} \tag{5}$$

where **z** is an auxiliary variable that carries the unconstrained least-squares term and **x̂** is the estimated transmitted symbol vector. The augmented Lagrangian of the complex-valued optimization problem in Equation (5) can be written as:

$$L\_{\gamma}\left(\mathbf{z}, \mathbf{\hat{x}}, \mathbf{u}\right) = \left\|\mathbf{y} - \mathbf{H}\mathbf{z}\right\|\_{2}^{2} + \mathbf{I}\_{\Omega}\left(\mathbf{\hat{x}}\right) + \frac{\gamma}{2}\left\|\mathbf{z} - \mathbf{\hat{x}}\right\|\_{2}^{2} + \mathbf{u}^{H}\left(\mathbf{z} - \mathbf{\hat{x}}\right) \tag{6}$$

where *γ* > 0 is the penalty parameter and **u** is the dual vector. To solve this problem efficiently, we decompose it into three subproblems: first, **z** is updated with **x̂** and **u** fixed; then, **x̂** is updated with **z** and **u** fixed; finally, **u** is updated with **x̂** and **z** fixed. Specifically, the following steps are repeated at each iteration *k*:

$$\mathbf{z}^{k} = \underset{\mathbf{z}}{\arg\min} \, L\_{\gamma} \left( \mathbf{z}, \mathbf{\hat{x}}^{k-1}, \mathbf{u}^{k-1} \right) \tag{7}$$

$$\mathbf{\hat{x}}^{k} = \underset{\mathbf{\hat{x}}}{\arg\min} \, L\_{\gamma} \left( \mathbf{z}^{k}, \mathbf{\hat{x}}, \mathbf{u}^{k-1} \right) \tag{8}$$

$$\mathbf{u}^{k} = \mathbf{u}^{k-1} + \gamma \left(\mathbf{z}^{k} - \mathbf{\hat{x}}^{k}\right) \tag{9}$$

Note that the **z**-minimization is convex, whereas the **x̂**-update is a projection onto the non-convex (finite) set Ω<sup>*Nt*</sup>. To make the iterative procedure converge, the IDE-based algorithm removes the dual vector **u** at each iteration and drives the **z**-update and the **x̂**-update toward a consensus. After some manipulation, the **x̂**-update involves solving a linearly constrained minimum Euclidean-norm problem, and the **z**-update in IDE is given by:

$$\mathbf{z}^{k} = \mathbf{\hat{x}}^{k-1} + \left[ \operatorname{diag} \left( \mathbf{H}^{H} \mathbf{H} \right) \right]^{-1} \mathbf{H}^{H} \left( \mathbf{y} - \mathbf{H} \mathbf{\hat{x}}^{k-1} \right) \tag{10}$$

Hence, the **xˆ**-update step can be represented as:

$$\mathbf{\hat{x}}^{k+1} = \Pi\_{\Omega} \left( \mathbf{z}^{k+1} \right) \tag{11}$$

where Π<sub>Ω</sub>(·) denotes the projection onto Ω<sup>*Nt*</sup>, which can be implemented by simply rounding each element of **z**<sup>*k*+1</sup> to the closest point in Ω.
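For square QAM, the projection Π<sub>Ω</sub> reduces to rounding the real and imaginary parts independently to the nearest odd integer and clipping to the outermost level. The sketch below assumes unnormalized square *M*-QAM with odd-integer levels (for example, {−3, −1, +1, +3} for 16-QAM); `project_pam` and `project_qam` are illustrative names.

```python
import math

def project_pam(v: float, side: int) -> float:
    """Nearest odd integer to v, clipped to [-(side - 1), side - 1]."""
    n = math.floor((v - 1.0) / 2.0 + 0.5)   # odd integers are 2n + 1
    return float(max(-(side - 1), min(side - 1, 2 * n + 1)))

def project_qam(z, M: int = 16):
    """Pi_Omega for square M-QAM: round real and imaginary parts independently."""
    side = math.isqrt(M)                     # levels per dimension, e.g. 4 for 16-QAM
    return [complex(project_pam(zi.real, side), project_pam(zi.imag, side)) for zi in z]
```

For instance, `project_qam([2.4 + 5.7j])` maps the real part to 3 and clips the imaginary part to 3. Exact ties between two levels are broken toward the larger level.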

Additionally, to make the iteration converge, a fixed damping factor *α* is employed in the IDE-based algorithm to update the iterative solution **xˆ**, i.e.,

$$\mathbf{\hat{x}}\_d^{k+1} = \left(1 - \alpha^k\right) \mathbf{\hat{x}}\_d^k + \alpha^k \mathbf{\hat{x}}^{k+1} \tag{12}$$

where **x̂**<sub>*d*</sub><sup>*k*+1</sup> denotes the damped solution after updating. By applying **x̂**<sub>*d*</sub> in the **z**-update, the IDE-based detection algorithm can be expressed as:

$$\mathbf{z}^{k} = \mathbf{\hat{x}}\_{d}^{k-1} + \left[ \operatorname{diag} \left( \mathbf{H}^{H} \mathbf{H} \right) \right]^{-1} \mathbf{H}^{H} \left( \mathbf{y} - \mathbf{H} \mathbf{\hat{x}}\_{d}^{k-1} \right) \tag{13}$$

$$\mathbf{\hat{x}}^k = \Pi\_\Omega \left( \mathbf{z}^k \right) \tag{14}$$

$$\mathbf{\hat{x}}\_d^k = \left(1 - \alpha^k\right) \mathbf{\hat{x}}\_d^{k-1} + \alpha^k \mathbf{\hat{x}}^k \tag{15}$$
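The IDE iterations of Equations (13) to (15) can be sketched in Python as follows, with a fixed damping factor (0.05 by default, the value cited from [25] in Section 3.2). Assumptions not stated in the text: unnormalized square *M*-QAM, **x̂**<sub>*d*</sub><sup>0</sup> = **0**, and the last hard estimate returned as the detector output; `ide_detect` and its helpers are illustrative names. The preprocessing computes [diag(**H**<sup>*H*</sup>**H**)]<sup>−1</sup> as reciprocal column energies, so neither the full Gram matrix nor a matrix inverse is formed.

```python
import math

def project_qam(z, M=16):
    """Pi_Omega for unnormalized square M-QAM (odd-integer levels), per component."""
    side = math.isqrt(M)
    def pam(v):
        n = math.floor((v - 1.0) / 2.0 + 0.5)
        return float(max(-(side - 1), min(side - 1, 2 * n + 1)))
    return [complex(pam(v.real), pam(v.imag)) for v in z]

def matvec(A, v):
    """A v for a list-based matrix A."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def hermitian_matvec(A, v):
    """A^H v for an Nr x Nt list-based matrix A and a length-Nr vector v."""
    return [sum(A[i][j].conjugate() * v[i] for i in range(len(A)))
            for j in range(len(A[0]))]

def ide_detect(H, y, num_iter, alpha=0.05, M=16):
    """IDE iterations of Eqs. (13)-(15) with a fixed damping factor alpha."""
    n_t = len(H[0])
    # Preprocessing: diagonal of (diag(H^H H))^{-1}, i.e. 1 / ||h_j||^2 per column.
    d_inv = [1.0 / sum(abs(row[j]) ** 2 for row in H) for j in range(n_t)]
    x_d = [0j] * n_t                                          # x_d^0 = 0
    x_hat = x_d
    for _ in range(num_iter):
        r = [yi - hi for yi, hi in zip(y, matvec(H, x_d))]    # y - H x_d
        g = hermitian_matvec(H, r)                            # H^H (y - H x_d)
        z = [xd + dj * gj for xd, dj, gj in zip(x_d, d_inv, g)]              # Eq. (13)
        x_hat = project_qam(z, M)                                            # Eq. (14)
        x_d = [(1 - alpha) * xd + alpha * xh for xd, xh in zip(x_d, x_hat)]  # Eq. (15)
    return x_hat, x_d
```

With a channel whose columns are orthogonal, the first **z**-update equals the transmitted vector in the noiseless case, which makes a convenient sanity check; with a fixed *α*, **x̂**<sub>*d*</sub> then approaches it only geometrically, at rate (1 − *α*) per iteration.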

#### *3.2. Modified IDE-Based Detection Algorithm with Self-Update Damping*

The performance of the IDE-based detection algorithm depends on the choice of the damping factor. In earlier studies, this parameter was fixed, e.g., to 0.05 in [25]. However, a fixed damping factor does not suit all cases, and an optimal value is not easy to obtain. We therefore propose MIDE, which determines a proper damping factor *α* adaptively. To this end, the **x̂**<sub>*d*</sub>-update step is first unrolled as follows.

$$\begin{aligned} \mathbf{\hat{x}}\_d^1 &= \left(1 - \alpha^0\right) \mathbf{\hat{x}}\_d^0 + \alpha^0 \mathbf{\hat{x}}^1, \\ \mathbf{\hat{x}}\_d^2 &= \left(1 - \alpha^1\right) \mathbf{\hat{x}}\_d^1 + \alpha^1 \mathbf{\hat{x}}^2 \\ &= \left(1 - \alpha^1\right) \left(\left(1 - \alpha^0\right) \mathbf{\hat{x}}\_d^0 + \alpha^0 \mathbf{\hat{x}}^1\right) + \alpha^1 \mathbf{\hat{x}}^2 \\ &= \left(1 - \alpha^1\right) \left(1 - \alpha^0\right) \mathbf{\hat{x}}\_d^0 + \left(1 - \alpha^1\right) \alpha^0 \mathbf{\hat{x}}^1 + \alpha^1 \mathbf{\hat{x}}^2, \\ &\vdots \\ \mathbf{\hat{x}}\_d^{k+1} &= \left(1 - \alpha^k\right) \mathbf{\hat{x}}\_d^k + \alpha^k \mathbf{\hat{x}}^{k+1} \\ &= \left(1 - \alpha^k\right) \cdots \left(1 - \alpha^0\right) \mathbf{\hat{x}}\_d^0 + \left(1 - \alpha^k\right) \cdots \left(1 - \alpha^1\right) \alpha^0 \mathbf{\hat{x}}^1 + \cdots + \alpha^k \mathbf{\hat{x}}^{k+1} \end{aligned} \tag{16}$$
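Because the damping recursion acts element-wise, its unrolled form can be checked numerically on scalars. The sketch below compares repeated application of Equation (12) with the closed-form weighted sum in which **x̂**<sub>*d*</sub><sup>0</sup> has weight (1 − *α*<sup>*k*</sup>)···(1 − *α*<sup>0</sup>) and **x̂**<sup>*j*+1</sup> has weight *α*<sup>*j*</sup>(1 − *α*<sup>*j*+1</sup>)···(1 − *α*<sup>*k*</sup>); the function names are illustrative, and `xs[j]` stands for **x̂**<sup>*j*+1</sup>.

```python
from math import prod

def damped_recursive(xd0, xs, alphas):
    """Apply x_d^{m+1} = (1 - a^m) x_d^m + a^m x^{m+1} for m = 0, ..., k (Eq. (12))."""
    xd = xd0
    for a, x in zip(alphas, xs):
        xd = (1 - a) * xd + a * x
    return xd

def damped_unrolled(xd0, xs, alphas):
    """x_d^{k+1} as a weighted sum of x_d^0, x^1, ..., x^{k+1}."""
    k = len(alphas) - 1
    total = prod(1 - alphas[m] for m in range(k + 1)) * xd0
    for j in range(k + 1):
        # Weight of x^{j+1}: a^j (1 - a^{j+1}) ... (1 - a^k); the product is 1 when j = k.
        total += alphas[j] * prod(1 - alphas[m] for m in range(j + 1, k + 1)) * xs[j]
    return total
```

The weights always sum to one, so **x̂**<sub>*d*</sub> is a convex combination of the initial value and the intermediate hard estimates.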

Since no prior information about the final result **x̂**<sub>*d*</sub> is available, **x̂**<sub>*d*</sub><sup>0</sup> can be initialized as the zero vector. The first term in the last line of Equation (16) then vanishes, so **x̂**<sub>*d*</sub> is a weighted combination of the intermediate solutions **x̂**<sup>1</sup>, ···, **x̂**<sup>*k*+1</sup>, with weights determined by the damping factors. Based on Equation (16), the convergence of the iterations can be measured by the difference between **x̂**<sub>*d*</sub><sup>*k*</sup> and **x̂**<sup>*k*+1</sup>. Specifically, the Euclidean distance, one of the most widely used measures of the distance between two vectors [30], is defined as:

$$d^k \left( \mathbf{\hat{x}}\_d^k, \mathbf{\hat{x}}^{k+1} \right) = \sqrt{\sum\_{i=1}^{N\_t} \left| \mathbf{\hat{x}}\_d^k(i,1) - \mathbf{\hat{x}}^{k+1}(i,1) \right|^2} \tag{17}$$

Equation (17) shows that the smaller the distance *d*<sup>*k*</sup>, the closer the two vectors, which indicates that the IDE iterations are converging. Based on this observation, the following heuristic damping factor is defined at the *k*th iteration:

$$\alpha^k = \frac{d^k}{d^k + q} \tag{18}$$

where *q* is a positive constant. A larger *d*<sup>*k*</sup> leads to a larger *α*<sup>*k*</sup>, and vice versa: *α*<sup>*k*</sup> → 0 as *d*<sup>*k*</sup> → 0, and *α*<sup>*k*</sup> → 1 when *d*<sup>*k*</sup> ≫ *q*. We use the first-iteration result *d*<sup>1</sup> to determine the value of *q*. According to Equation (17), with **x̂**<sub>*d*</sub><sup>0</sup> = **0**, *d*<sup>1</sup> can be calculated as:

$$\begin{split} d^1 &= \sqrt{\sum\_{i=1}^{N\_t} \left| \mathbf{\hat{x}}\_d^0(i,1) - \mathbf{\hat{x}}^1(i,1) \right|^2} \\ &= \sqrt{\sum\_{i=1}^{N\_t} \left| \mathbf{\hat{x}}^1(i,1) \right|^2} \\ &= \sqrt{\sum\_{i=1}^{N\_t} \left( \Re\left(\mathbf{\hat{x}}^1(i,1)\right) \right)^2 + \left( \Im\left(\mathbf{\hat{x}}^1(i,1)\right) \right)^2} \end{split} \tag{19}$$
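The distance of Equation (17) and the damping rule of Equation (18) amount to a few lines of code. The sketch below assumes complex-valued Python lists for the two vectors and a positive *q* (for example, the value derived from *α*<sup>1</sup> = 0.8 later in this subsection); `euclidean_distance` and `adaptive_damping` are illustrative names.

```python
import math

def euclidean_distance(a, b):
    """Eq. (17) for complex vectors: sqrt(sum_i |a_i - b_i|^2)."""
    return math.sqrt(sum(abs(ai - bi) ** 2 for ai, bi in zip(a, b)))

def adaptive_damping(x_d, x_new, q):
    """Eq. (18): alpha^k = d^k / (d^k + q); tends to 0 as d^k -> 0 and to 1 when d^k >> q."""
    if q <= 0:
        raise ValueError("q must be positive")
    d = euclidean_distance(x_d, x_new)
    return d / (d + q)
```

For example, *d*<sup>*k*</sup> = 4*q* gives *α*<sup>*k*</sup> = 0.8, and identical vectors give *α*<sup>*k*</sup> = 0, which freezes **x̂**<sub>*d*</sub>.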

The value of *d*<sup>1</sup> varies with several factors, such as the modulation scheme and the noise in the system. For ease of calculation, the expectation of *d*<sup>1</sup> is used instead of its direct calculation. Since the real and imaginary parts of **x̂**<sup>1</sup> are assumed to follow the same uniform distribution, the expectation of *d*<sup>1</sup> is approximated as:

$$E\left(d^1\right) = 2E\left(\sqrt{\sum\_{i=1}^{N\_t} \left(\Re\left(\mathbf{\hat{x}}^1\left(i, 1\right)\right)\right)^2}\right) \tag{20}$$

The expectation of *d*<sup>1</sup> depends on the constellation points of the modulation scheme. With 16-QAM, for example, ℜ(**x̂**<sup>1</sup>(*i*, 1)) takes values in {−3, −1, +1, +3}, and without prior information each value is equally likely, i.e., with probability *p* = 0.25. Hence, Equation (20) can be rewritten as:

$$E\left(d^1\right) = 2E\left(\sqrt{\sum\_{i=1}^{N\_t} \left(\Re\left(\hat{\mathbf{x}}^1\left(i, 1\right)\right)\right)^2}\right) = 2N\_t\sqrt{\frac{\sum cand^2}{\sqrt{M}}}\tag{21}$$

where *cand* denotes the set of candidate values of ℜ(**x̂**<sup>1</sup>(*i*, 1)) and *M* denotes the modulation cardinality. Taking 16-QAM as an example again, *cand* = {−3, −1, +1, +3}, so ∑*cand*<sup>2</sup> = (−3)<sup>2</sup> + (−1)<sup>2</sup> + 1<sup>2</sup> + 3<sup>2</sup> = 20 and √*M* = 4. Empirically, *α*<sup>1</sup> is set to 0.8. Then, substituting *α*<sup>1</sup> = 0.8 and Equation (21) into Equation (18) yields the estimate of *q*:

$$q = \frac{E\left(d^1\right)}{4} = \frac{N\_t}{2} \sqrt{\frac{\sum cand^2}{\sqrt{M}}}\tag{22}$$
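Equations (21) and (22) can be evaluated directly for any square QAM order. The sketch below transcribes Equation (21) as written and solves *α*<sup>1</sup> = *E*(*d*<sup>1</sup>)/(*E*(*d*<sup>1</sup>) + *q*) for *q*; it assumes unnormalized odd-integer levels per dimension, and `pam_levels`, `expected_d1`, and `damping_constant_q` are illustrative names.

```python
import math

def pam_levels(M: int):
    """Per-dimension candidates of unnormalized square M-QAM, e.g. [-3, -1, 1, 3] for M = 16."""
    side = math.isqrt(M)
    return [2 * i - (side - 1) for i in range(side)]

def expected_d1(n_t: int, M: int) -> float:
    """Eq. (21) as written: E(d^1) = 2 Nt sqrt(sum(cand^2) / sqrt(M))."""
    cand = pam_levels(M)
    return 2 * n_t * math.sqrt(sum(c * c for c in cand) / math.isqrt(M))

def damping_constant_q(n_t: int, M: int, alpha1: float = 0.8) -> float:
    """Solve alpha1 = E(d^1) / (E(d^1) + q) for q; alpha1 = 0.8 gives q = E(d^1) / 4."""
    e = expected_d1(n_t, M)
    return e * (1.0 - alpha1) / alpha1
```

For *Nt* = 16 and 16-QAM, ∑*cand*<sup>2</sup>/√*M* = 5, so *E*(*d*<sup>1</sup>) = 32√5 ≈ 71.55 and *q* = 8√5 ≈ 17.89. Exposing *α*<sup>1</sup> as a parameter makes it easy to test choices other than 0.8.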

The simulation results in Section 4 show that the proposed MIDE algorithm can improve the BER performance significantly compared to the IDE-based algorithm, which employs the fixed damping factor. The procedure of the proposed MIDE detection algorithm is illustrated in Algorithm 1.


	- 2) **y**, the received signal vector
	- 3) *K*, the number of iterations
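The MIDE procedure can be sketched in Python by combining Equations (13) to (15), (17), (18), and (22). This is an illustrative reconstruction rather than the authors' listing, and it assumes: unnormalized square *M*-QAM with odd-integer levels, **x̂**<sub>*d*</sub><sup>0</sup> = **0**, the damping factor computed from the distance between the current **x̂**<sub>*d*</sub> and the new hard estimate before the damping update, and the last hard estimate returned as the output. `mide_detect` and `_pam` are hypothetical names.

```python
import math

def _pam(v, side):
    """Nearest odd integer to v, clipped to the outermost constellation level."""
    n = math.floor((v - 1.0) / 2.0 + 0.5)
    return float(max(-(side - 1), min(side - 1, 2 * n + 1)))

def mide_detect(H, y, num_iter, M=16):
    """Sketch of MIDE: IDE steps (13)-(15) with the self-updating damping of (17), (18), (22)."""
    n_r, n_t = len(H), len(H[0])
    side = math.isqrt(M)
    cand = [2 * i - (side - 1) for i in range(side)]
    # q from Eq. (22), i.e. alpha^1 = 0.8 for the expected first-iteration distance.
    q = (n_t / 2.0) * math.sqrt(sum(c * c for c in cand) / side)
    # Preprocessing: diagonal of (diag(H^H H))^{-1}.
    d_inv = [1.0 / sum(abs(H[i][j]) ** 2 for i in range(n_r)) for j in range(n_t)]
    x_d = [0j] * n_t                                   # no prior information: x_d^0 = 0
    x_hat = x_d
    for _ in range(num_iter):
        Hx = [sum(H[i][j] * x_d[j] for j in range(n_t)) for i in range(n_r)]
        r = [y[i] - Hx[i] for i in range(n_r)]
        g = [sum(H[i][j].conjugate() * r[i] for i in range(n_r)) for j in range(n_t)]
        z = [x_d[j] + d_inv[j] * g[j] for j in range(n_t)]                      # Eq. (13)
        x_hat = [complex(_pam(v.real, side), _pam(v.imag, side)) for v in z]    # Eq. (14)
        d = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(x_d, x_hat)))        # Eq. (17)
        alpha = d / (d + q)                                                     # Eq. (18)
        x_d = [(1 - alpha) * a + alpha * b for a, b in zip(x_d, x_hat)]        # Eq. (15)
    return x_hat, x_d
```

Compared with the fixed-damping IDE loop, the only additional per-iteration work is the distance computation, which costs *Nt* multiplications.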

#### *3.3. Analysis of the Complexity of the Algorithm*

In this subsection, we analyze the computational complexity of the proposed MIDE algorithm, which is dominated by multiplication operations. Hence, the number of complex-valued multiplications is used as the measure of computational complexity [31].


As can be seen from Algorithm 1, the computational complexity consists of three parts: (1) preprocessing, (2) the **x̂**-update procedure, and (3) the **x̂**<sub>*d*</sub>-update procedure.

(1) Preprocessing: The first part comes from the computations performed before the iterative process. Its complexity is dominated by the computation of **D** = [diag(**H**<sup>*H*</sup>**H**)]<sup>−1</sup> and the multiplication of the *Nt* × *Nt* diagonal matrix **D** by the *Nt* × *Nr* matrix **H**<sup>*H*</sup>. Let **h**<sub>*i*</sub> be the *i*th column of the complex-valued channel matrix **H**. Then, the diagonal matrix **D** can be computed as:

$$\mathbf{D}\_{ij} = \begin{cases} \dfrac{1}{\mathbf{h}\_i^H \mathbf{h}\_i}, & i = j \\ 0, & i \neq j \end{cases} \tag{23}$$

Therefore, the complexity of the preprocessing is counted as *Nt*<sup>2</sup> + *NrNt*.


and two scalar multiplications with *Nt* × 1 vectors. Then, the complexity in this part is counted as 3*Nt*.

Therefore, the overall number of complex multiplications required by the MIDE algorithm is *NtNr* + *Nt*<sup>2</sup> + *K*(2*NrNt* + 3*Nt*). For comparison, Table 1 lists the corresponding counts for typical detection algorithms (LMMSE with full matrix inversion, AltMin, and ADMM) together with that of the proposed MIDE.

**Table 1.** Complexity comparison in terms of complex multiplication operations. LMMSE, linear minimum mean squared error; AltMin, alternating minimization; ADMM, alternating direction method of multipliers; MIDE, modified iterative discrete estimation.


Note that all of these algorithms solve the ML problem approximately. Among the compared iterative approaches, the proposed MIDE and AltMin algorithms have the lowest per-iteration computational complexity. Because the total complexity depends on the number of iterations *K*, a numerical comparison is presented in Section 4.
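The MIDE multiplication count from Section 3.3 can be tabulated with a one-line function. The sketch below uses the setting of Section 4.2 (*Nr* = 128, *K* = 10, *Nt* swept from 16 to 84); `mide_multiplications` is an illustrative name, and the counts of the other detectors are not reproduced because they are only given in Table 1.

```python
def mide_multiplications(n_r: int, n_t: int, num_iter: int) -> int:
    """Complex multiplications of MIDE: Nt*Nr + Nt^2 + K (2 Nr Nt + 3 Nt)."""
    preprocessing = n_t * n_r + n_t ** 2
    per_iteration = 2 * n_r * n_t + 3 * n_t
    return preprocessing + num_iter * per_iteration

if __name__ == "__main__":
    # Setting of Section 4.2: Nr = 128, K = 10, Nt swept from 16 to 84.
    for n_t in (16, 32, 48, 64, 84):
        print(n_t, mide_multiplications(128, n_t, 10))
```

For *Nr* = 128, *Nt* = 16, and *K* = 10, this gives 43,744 multiplications, of which 40,960 come from the 2*KNrNt* term, so the iterative matrix-vector products dominate the cost.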

#### **4. Simulation Results**

#### *4.1. BER Performance Evaluation*

The BER performance of the proposed algorithm was evaluated by numerical simulations and compared with that of other detection algorithms. The simulation parameters are listed in Table 2. The typical detection algorithms introduced in Section 1, namely LMMSE, AltMin, and ADMM, were selected for comparison.

We first examined the number of iterations required for the antenna configurations *Nt* × *Nr* = 16 × 128, 32 × 128, and 64 × 128. Figure 2 illustrates the BER performance of the proposed MIDE detection algorithm versus the number of iterations at an SNR of 3 dB. It is observed that the number of iterations the proposed MIDE needed to converge was small and nearly independent of *Nr*/*Nt*, about 10 in all simulations. This indicates reliable BER performance and fast convergence of the proposed MIDE algorithm. Moreover, as the number of transmitting antennas increased, the diversity gain of the MIMO system decreased, degrading the BER performance of both the proposed algorithm and the compared MIMO detectors.



**Figure 2.** BER performance of the proposed MIDE algorithm versus the number of iterations, SNR = 3 dB.

Figure 3 depicts the BER performance comparison of the IDE-based detection algorithm with the fixed damping factor *α* = 0.05 and the proposed MIDE detection with the self-updated damping factor. It is clear that the proposed MIDE-based detection with the self-updated damping factor showed better BER performance than the conventional IDE-based detection algorithm with all antenna configurations.

**Figure 3.** BER performance of the conventional damping factor and the proposed damping factor.

Moreover, Figure 4 compares the BER performance of the proposed MIDE algorithm and the conventional ADMM algorithm. The BER performance of both algorithms degraded as the number of users increased. However, the proposed MIDE-based detection outperformed the conventional ADMM detection for all antenna configurations. Furthermore, Figure 4 shows that, at a target BER of 10<sup>−3</sup>, the proposed algorithm required at least 0.5 dB lower SNR than the conventional ADMM algorithm.

**Figure 4.** BER performance comparison of the conventional ADMM and the proposed algorithm.

Finally, we show the BER performance comparison of the conventional AltMin algorithm, the LMMSE algorithm, and the proposed MIDE algorithm in Figure 5.

**Figure 5.** BER performance comparison of the AltMin algorithm, the LMMSE algorithm, and the proposed algorithm.

For all tested ratios *Nr*/*Nt*, the proposed MIDE algorithm achieved better BER performance than all the compared algorithms.

#### *4.2. Computational Complexity Comparison*

The computational complexity of the ADMM, AltMin, and proposed MIDE algorithms depends on the number of iterations *K*, and the compared algorithms required different numbers of iterations to converge under different antenna configurations. We fixed the number of receiving antennas at *Nr* = 128 and increased the number of transmitting antennas *Nt* from 16 to 84. Based on the convergence simulations, we set *K* = 5, 14, and 10 for the conventional ADMM algorithm, the AltMin algorithm [20], and the proposed MIDE algorithm, respectively. Figure 6 illustrates the total number of multiplications versus the number of transmitting antennas, computed from the complexity analysis in Section 3.3.

**Figure 6.** Computational complexity comparison against the number of users.

Figure 6 shows that the computational complexity of all compared algorithms increased with the number of users. However, the proposed MIDE algorithm achieved the lowest computational complexity among all compared algorithms under the considered antenna configurations. Specifically, MIDE required fewer multiplications than the AltMin algorithm, which was shown in [20] to be a low-complexity detector for uplink massive MIMO systems. In addition, the proposed MIDE algorithm required far fewer multiplications than the LMMSE and ADMM detection algorithms as the dimension of the MIMO system grew. The proposed MIDE detection is therefore well suited to massive MIMO systems owing to its low computational complexity.

#### **5. Conclusions**

In this paper, we proposed a low-complexity, IDE-based detection algorithm in uplink massive MIMO systems. The proposed MIDE algorithm avoided the calculation of the Gram matrix, the matrix inversion, and LDL decomposition to reach a low computational complexity. In addition, a self-updating damping method was provided with the damping factor estimated and updated at each iteration based on the Euclidean distance between the latest two detection solutions, which accelerated the convergence of the IDE-based detection algorithms. Simulation results showed that the proposed MIDE algorithm performed better than the conventional LMMSE, AltMin, and ADMM detection algorithms in terms of the BER performance and the computational complexity.

**Author Contributions:** H.F. and X.Z. contributed the idea generation; S.X. and X.Z. conducted the study design and paper writing; Z.L. helped the analysis of the simulation results.

**Funding:** This work is supported in part by the National Natural Science Foundation of China (No. 61571108, No. 61701197, and No. 61801193), the open research fund of National Mobile Communications Research Laboratory of millimeter wave, Southeast University (No. 2018D15), the open research fund of the National Key Laboratory of millimeter wave, Southeast University (No.K201918), the Open Foundation of Key Laboratory of Wireless Communication, Nanjing University of Posts and Telecommunication (No. 2017WICOM01), and the Wuxi Science and Technology Development Fund (No. H20191001).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Energy Efficiency Optimization for Massive MIMO Non-Orthogonal Unicast and Multicast Transmission with Statistical CSI**

#### **Wenjin Wang \*, Yufei Huang, Li You, Jiayuan Xiong, Jiamin Li and Xiqi Gao**

National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China **\*** Correspondence: wangwj@seu.edu.cn; Tel.: +86-025-8379-0506

Received: 16 July 2019; Accepted: 30 July 2019; Published: 1 August 2019

**Abstract:** We study the energy efficiency (EE) optimization problem in non-orthogonal unicast and multicast transmission for massive multiple-input multiple-output (MIMO) systems, where the statistical channel state information of all receivers is available at the transmitter. First, we formulate the EE maximization problem. We reduce the number of variables to be solved and simplify this large-dimensional matrix-valued problem into a real-vector-valued problem. Next, we reduce the computational complexity significantly by replacing the objective with its deterministic equivalent, which avoids the computationally expensive expectation operation. We then propose an iterative beam domain power allocation algorithm with guaranteed convergence, based on the minorize-maximize algorithm and Dinkelbach's transform, and derive the locally optimal power allocation strategy to achieve the optimal EE. Finally, numerical simulations illustrate the significant EE performance gain of our EE maximization algorithm over the conventional approach.

**Keywords:** energy efficiency; non-orthogonal unicast and multicast transmission; statistical channel state information; massive MIMO; beam domain

#### **1. Introduction**

As mobile data expands rapidly, it is expected that global wireless data traffic will surpass 100 exabytes per month by 2023 [1]. A considerable proportion of the data traffic, such as massive software updating and sports broadcasting, is of common interest, which stimulates the demand for services that can deliver the same data to a group of user terminals (UTs) efficiently. Since physical layer multicasting can provide efficient point-to-multipoint wireless transmission, it has great potential for future mobile communication systems [2–4].

Recently, non-orthogonal unicast and multicast (NOUM) transmission has attracted increasing interest [5–7]. At the transmitter, the unicast and multicast signals are precoded and then sent to the receivers simultaneously over the same time-frequency resources. Compared with conventional orthogonal unicast and multicast (OUM) transmission, NOUM transmission is more spectrally efficient and better suited to scenarios in which a UT needs both multicast and unicast signals. Massive multiple-input multiple-output (MIMO) has become one of the core technologies of the fifth generation wireless system owing to its significant gains in energy efficiency (EE) and spectral efficiency [8,9]. Therefore, there has been considerable research on combining multicast transmission with massive MIMO systems [6,10,11]. Note that mutual coupling is a major concern in massive MIMO because it can degrade system performance [12–16]; we assume perfect isolation between the antennas without loss of generality.

EE has become a significant design criterion for wireless communication systems [17–19]. The large-scale antenna arrays at the base station (BS) increase power consumption in massive MIMO systems, and the energy consumed by wireless communications is responsible for greenhouse gas emissions [20], which motivates the need to design energy-efficient systems [8,21,22]. EE of a massive MIMO system was considered in [8], but the power consumed by the BS circuit was ignored, whereas [21] analyzed the maximization of EE and power transfer efficiency for wireless-powered systems, taking the circuit power consumption into account. In [22], the effect of system parameters (number of antennas, transmit power, and number of UTs) on the EE of a multi-user MIMO system was investigated.

Previous works have also studied energy-efficient NOUM transmission in massive MIMO systems [23–25]. In [23], energy-efficient NOUM beamforming in a multi-cell multi-user MIMO scenario was studied. A beamforming optimization algorithm was proposed in [24] to optimize the EE in a multi-cell multicast system, and [25] extended the problem to take antenna selection into consideration.

Note that most previous works assumed that the UTs' instantaneous channel state information (CSI) is available at the BS. However, in realistic systems, obtaining accurate estimates of instantaneous CSI is challenging [26–28]. Compared with instantaneous CSI, statistical CSI is easier to acquire and can be obtained more precisely. The rate maximization problem for NOUM massive MIMO transmission was considered in [11], and the EE maximization problem for physical-layer multicast transmission was investigated in [29], both assuming that the BS only has access to the UTs' statistical CSI.

To our knowledge, EE optimization of NOUM transmission for massive MIMO systems with statistical CSI at the transmitter has not yet been studied. We investigate this problem in this work; our major contributions are listed as follows:


The remainder of the paper is organized as follows. The channel model is introduced in Section 2. The EE maximization problem is formulated and investigated in Section 3. Numerical simulations are presented in Section 4. Section 5 concludes the paper.

Column vectors and matrices are represented by lower and upper case boldface letters, respectively, whereas italic letters stand for scalars, and the following are other notations used in this paper.


#### **2. System Model**

Consider a single-cell massive MIMO system in which an *M*-antenna BS jointly serves *K* UTs. Denote by 𝒦 ≜ {1, 2, . . . , *K*} the UT set, where the *k*th UT is equipped with *Nk* antennas. The multicast and unicast services share the same time-frequency resources. During the downlink transmission, the BS sends a multicast signal of common interest to all UTs in the cell while delivering unique messages to the UTs according to each UT's demand, as shown in Figure 1.

**Figure 1.** System model of NOUM.

Assume the downlink signal sent by the BS is denoted by

$$\mathbf{x} = \mathbf{x}^{\mathbf{m}} + \sum\_{k \in \mathcal{K}} \mathbf{x}\_k^{\mathbf{u}} \in \mathbb{C}^{M \times 1},\tag{1}$$

where **x**<sup>m</sup> ∈ ℂ<sup>*M*×1</sup> represents the multicast signal and **x**<sub>*k*</sub><sup>u</sup> ∈ ℂ<sup>*M*×1</sup> denotes the unicast signal sent to the *k*th UT. Assume that **x**<sup>m</sup> and **x**<sub>*k*</sub><sup>u</sup> are mutually uncorrelated and zero-mean, with covariance matrices **Q**<sup>m</sup> and **Q**<sub>*k*</sub><sup>u</sup>, respectively. Define tr{**Q**<sup>m</sup>} as the multicast transmission power and tr{**Q**<sub>*k*</sub><sup>u</sup>} as the unicast transmission power of the *k*th UT. At the *k*th UT, the received signal is given by

$$\mathbf{y}\_k = \mathbf{H}\_k \mathbf{x} + \mathbf{n}\_k \in \mathbb{C}^{N\_k \times 1},\tag{2}$$

where **H**<sub>*k*</sub> is the *Nk* × *M* downlink channel matrix, and **n**<sub>*k*</sub> ∼ 𝒞𝒩(**0**, *σ*<sup>2</sup>**I**<sub>*Nk*</sub>) represents the additive circularly symmetric complex Gaussian noise with variance *σ*<sup>2</sup>.

We adopt Weichselberger's channel model [30,31] in this work because it characterizes the correlation properties of the transmit and receive ends jointly, rather than separately as in the Kronecker model. The downlink channel matrix in (2) can then be written as

$$\mathbf{H}\_k = \mathbf{U}\_k \mathbf{G}\_k \mathbf{V}\_k^H \in \mathbb{C}^{N\_k \times M},\tag{3}$$

where **U**<sub>*k*</sub> ∈ ℂ<sup>*Nk*×*Nk*</sup> and **V**<sub>*k*</sub> ∈ ℂ<sup>*M*×*M*</sup> are deterministic unitary matrices, and **G**<sub>*k*</sub> ∈ ℂ<sup>*Nk*×*M*</sup> represents the downlink channel matrix in the beam domain [26,27,32], whose elements are independently distributed zero-mean random variables. Denote by **Ω**<sub>*k*</sub> the beam domain channel power matrix

$$\boldsymbol{\Omega}\_k = \mathbb{E}\left\{ \mathbf{G}\_k \odot \mathbf{G}\_k^\* \right\} \in \mathbb{R}^{N\_k \times M},\tag{4}$$

where the average power of [**G**<sub>*k*</sub>]<sub>*i*,*j*</sub> is given by [**Ω**<sub>*k*</sub>]<sub>*i*,*j*</sub>. Since **Ω**<sub>*k*</sub> remains approximately constant over a wide frequency range, the statistical CSI can be obtained accurately and efficiently [32].

The large number of antennas at the BS gives rise to new channel properties in massive MIMO systems. For example, as the number of BS antennas $M$ tends to infinity, the eigenvector matrices of the transmit correlation matrices between the BS and all UTs become identical and depend only on the BS array topology [32,33]. Denoting this common deterministic unitary matrix by $\mathbf{V}$, the downlink channel matrix in the massive MIMO regime becomes

$$\mathbf{H}\_k \stackrel{M \to \infty}{=} \mathbf{U}\_k \mathbf{G}\_k \mathbf{V}^H. \tag{5}$$

Note that the channel model in (5) has been adopted in many previous works on massive MIMO, such as [26,29,34], and has been shown to be quite accurate [34].
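As a sanity check of definition (4), the sketch below draws beam domain matrices $\mathbf{G}\_k$ and verifies by Monte-Carlo averaging that the elementwise second moment recovers $\boldsymbol{\Omega}\_k$. It assumes circularly symmetric complex Gaussian entries, which the text does not require (it only states that the entries are independent and zero-mean); the function names and the matrix `omega` are hypothetical.

```python
import random


def sample_beam_channel(omega, rng):
    """Draw one N x M beam-domain matrix G with independent zero-mean circularly
    symmetric complex Gaussian entries (an assumption), E|G[i][j]|^2 = omega[i][j]."""
    return [[complex(rng.gauss(0.0, (w / 2) ** 0.5), rng.gauss(0.0, (w / 2) ** 0.5))
             for w in row] for row in omega]


def estimate_power_matrix(omega, num_samples=20000, seed=1):
    """Monte-Carlo estimate of Omega = E{G ⊙ G*}, i.e. the mean of |G[i][j]|^2."""
    rng = random.Random(seed)
    acc = [[0.0] * len(omega[0]) for _ in omega]
    for _ in range(num_samples):
        G = sample_beam_channel(omega, rng)
        for i, row in enumerate(G):
            for j, g in enumerate(row):
                acc[i][j] += abs(g) ** 2
    return [[a / num_samples for a in row] for row in acc]


# Hypothetical 2 x 3 beam-domain power matrix: power is concentrated in a few beams.
omega = [[1.0, 0.2, 0.0], [0.05, 0.5, 1.5]]
est = estimate_power_matrix(omega)
```

Only $\boldsymbol{\Omega}\_k$, not the realization of $\mathbf{G}\_k$, is needed at the BS, which is why the statistical CSI is cheap to track.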

#### **3. NOUM Transmission in Massive MIMO**

#### *3.1. Problem Formulation*

Consider a NOUM massive MIMO system with, without loss of generality, a single multicast group. Assume that each UT $k$ knows its own instantaneous CSI through proper pilot design [33], while the BS has access only to the statistical CSI of all UTs.

Substituting (1) into (2), the received signal at the $k$th UT can be rewritten as

$$\mathbf{y}\_k = \mathbf{H}\_k \mathbf{x}^m + \mathbf{H}\_k \mathbf{x}\_k^u + \sum\_{k' \neq k} \mathbf{H}\_k \mathbf{x}\_{k'}^u + \mathbf{n}\_k. \tag{6}$$

Each UT decodes the common multicast signal first and then its desired unicast signal, using successive interference cancellation (SIC).

When decoding the multicast signal, the $k$th UT treats $\mathbf{H}\_k\mathbf{x}^{\mathrm{m}}$ in (6) as the desired signal and all other terms as interference. The covariance matrix of the interference plus noise is

$$\mathbf{K}\_k^{\mathrm{m}} = \underbrace{\sigma^2 \mathbf{I}\_{N\_k}}\_{\mathrm{noise}} + \underbrace{\sum\_{k' \in \mathcal{K}} \mathbb{E}\left\{ \mathbf{H}\_k \mathbf{Q}\_{k'}^{\mathrm{u}} \mathbf{H}\_k^{H} \right\}}\_{\mathrm{interference}} \in \mathbb{C}^{N\_k \times N\_k}.\tag{7}$$

Since UT $k$ knows its own instantaneous CSI and the covariance matrix $\mathbf{K}\_k^{\mathrm{m}}$, its ergodic rate during multicast transmission is

$$R\_k^{\mathrm{m}} = \mathbb{E}\left\{ \log \det \left\{ \mathbf{K}\_k^{\mathrm{m}} + \mathbf{H}\_k \mathbf{Q}^{\mathrm{m}} \mathbf{H}\_k^H \right\} \right\} - \log \det \left\{ \mathbf{K}\_k^{\mathrm{m}} \right\}.\tag{8}$$

The multicast ergodic rate is $\min\_k R\_k^{\mathrm{m}}$. Substituting the massive MIMO channel model (5) into (8) and applying Sylvester's determinant identity $\det\{\mathbf{I} + \mathbf{M}\mathbf{N}\} = \det\{\mathbf{I} + \mathbf{N}\mathbf{M}\}$, the rate $R\_k^{\mathrm{m}}$ in (8) becomes

$$R\_k^{\mathrm{m}} = \mathbb{E}\left\{ \log \det \left\{ \overline{\mathbf{K}}\_k^{\mathrm{m}} + \mathbf{G}\_k \mathbf{V}^H \mathbf{Q}^{\mathrm{m}} \mathbf{V} \mathbf{G}\_k^H \right\} \right\} - \log \det \left\{ \overline{\mathbf{K}}\_k^{\mathrm{m}} \right\},\tag{9}$$

where $\overline{\mathbf{K}}\_k^{\mathrm{m}}$ is defined as

$$\begin{split} \overline{\mathbf{K}}\_{k}^{\mathrm{m}} & \triangleq \mathbf{U}\_{k}^{H} \mathbf{K}\_{k}^{\mathrm{m}} \mathbf{U}\_{k} \\ &= \sigma^{2} \mathbf{I}\_{N\_{k}} + \sum\_{k' \in \mathcal{K}} \mathbb{E} \left\{ \mathbf{G}\_{k} \mathbf{V}^{H} \mathbf{Q}\_{k'}^{\mathrm{u}} \mathbf{V} \mathbf{G}\_{k}^{H} \right\} \in \mathbb{C}^{N\_{k} \times N\_{k}}. \end{split} \tag{10}$$

Define the matrix-valued function $\mathbf{A}\_k(\mathbf{X}) \triangleq \mathbb{E}\left\{\mathbf{G}\_k \mathbf{X} \mathbf{G}\_k^H\right\}$. Since the elements of $\mathbf{G}\_k$ are zero-mean and independently distributed, the off-diagonal elements of $\mathbf{A}\_k(\mathbf{X})$ are zero, so $\mathbf{A}\_k(\mathbf{X})$ is an $N\_k \times N\_k$ diagonal matrix whose $i$th diagonal element is

$$\left[\mathbf{A}\_{k}(\mathbf{X})\right]\_{i,i} = \mathrm{tr}\left\{\mathrm{diag}\left\{\left(\left[\boldsymbol{\Omega}\_{k}\right]\_{i,:}\right)^{T}\right\}\mathbf{X}\right\}.\tag{11}$$

The terms $\mathbb{E}\left\{\mathbf{G}\_k \mathbf{V}^H \mathbf{Q}\_{k'}^{\mathrm{u}} \mathbf{V} \mathbf{G}\_k^H\right\}$ in (10) can therefore be written as $\mathbf{A}\_k\left(\mathbf{V}^H \mathbf{Q}\_{k'}^{\mathrm{u}} \mathbf{V}\right)$.
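Because $\mathbf{A}\_k(\cdot)$ in (11) only reads the diagonal of its argument, it can be evaluated as a plain matrix-vector product with $\boldsymbol{\Omega}\_k$. A minimal sketch (the function name and the example values are hypothetical):

```python
def apply_A(omega, x_diag):
    """Diagonal of A_k(X) = E{G_k X G_k^H} from eq. (11).

    omega  : N x M beam-domain power matrix Omega_k (list of rows).
    x_diag : the M diagonal entries of X; off-diagonal entries of X do not
             matter because the entries of G_k are independent and zero-mean.
    Returns [A_k(X)]_{i,i} = sum_m [Omega_k]_{i,m} [X]_{m,m} for each row i.
    """
    if any(len(row) != len(x_diag) for row in omega):
        raise ValueError("omega needs one column per diagonal entry of X")
    return [sum(w * x for w, x in zip(row, x_diag)) for row in omega]


omega = [[1.0, 0.25, 0.0], [0.5, 0.5, 1.5]]   # hypothetical Omega_k, N_k = 2, M = 3
print(apply_A(omega, [2.0, 4.0, 0.5]))  # [3.0, 3.75]
```

In particular, $\mathbf{A}\_k\left(\mathbf{V}^H \mathbf{Q}\_{k'}^{\mathrm{u}} \mathbf{V}\right)$ only needs the diagonal of $\mathbf{V}^H \mathbf{Q}\_{k'}^{\mathrm{u}} \mathbf{V}$, i.e., the powers in the beam domain.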

For unicast decoding, SIC has already removed the multicast signal, so the interference contains only the unicast signals intended for the other UTs. The covariance matrix of the interference plus noise at the $k$th UT is

$$\mathbf{K}\_{k}^{\mathrm{u}} = \underbrace{\sigma^{2}\mathbf{I}\_{N\_{k}}}\_{\text{noise}} + \underbrace{\sum\_{k' \neq k} \mathbb{E}\left\{\mathbf{H}\_{k}\mathbf{Q}\_{k'}^{\mathrm{u}}\mathbf{H}\_{k}^{H}\right\}}\_{\text{interference}} \in \mathbb{C}^{N\_{k} \times N\_{k}}.\tag{12}$$

The ergodic rate of the $k$th UT during unicast transmission is then

$$R\_k^{\mathrm{u}} = \mathbb{E}\left\{ \log \det \left\{ \mathbf{K}\_k^{\mathrm{u}} + \mathbf{H}\_k \mathbf{Q}\_k^{\mathrm{u}} \mathbf{H}\_k^H \right\} \right\} - \log \det \left\{ \mathbf{K}\_k^{\mathrm{u}} \right\}.\tag{13}$$

Substituting (5) into (13) and again applying Sylvester's determinant identity, the unicast rate $R\_k^{\mathrm{u}}$ at the $k$th UT becomes

$$R\_k^{\mathrm{u}} = \mathbb{E}\left\{\log \det\left\{\overline{\mathbf{K}}\_k^{\mathrm{u}} + \mathbf{G}\_k \mathbf{V}^H \mathbf{Q}\_k^{\mathrm{u}} \mathbf{V} \mathbf{G}\_k^H \right\} \right\} - \log \det\left\{\overline{\mathbf{K}}\_k^{\mathrm{u}} \right\},\tag{14}$$

where $\overline{\mathbf{K}}\_k^{\mathrm{u}}$ is defined as

$$\overline{\mathbf{K}}\_{k}^{\mathrm{u}} \triangleq \sigma^{2} \mathbf{I}\_{N\_{k}} + \sum\_{k' \neq k} \mathbf{A}\_{k} \left( \mathbf{V}^{H} \mathbf{Q}\_{k'}^{\mathrm{u}} \mathbf{V} \right) \in \mathbb{C}^{N\_{k} \times N\_{k}},\tag{15}$$

and $\mathbf{A}\_k(\mathbf{X})$ is defined as in (11).

Next, we consider the system power consumption, using the same power consumption model as in [29,35]:

$$P = \mu \left( \mathrm{tr} \left\{ \mathbf{Q}^{\mathrm{m}} \right\} + \sum\_{k \in \mathcal{K}} \mathrm{tr} \left\{ \mathbf{Q}\_{k}^{\mathrm{u}} \right\} \right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}},\tag{16}$$

where the constant coefficient $\mu \ge 1$ is the reciprocal of the transmit amplifier drain efficiency, $\mathrm{tr}\{\mathbf{Q}^{\mathrm{m}}\}$ is the multicast transmit power, and $\sum\_{k\in\mathcal{K}} \mathrm{tr}\{\mathbf{Q}\_k^{\mathrm{u}}\}$ is the total unicast transmit power. $P\_{\mathrm{c}}$ is the constant circuit power consumption per antenna, which does not depend on the actual transmit power, and $P\_{\mathrm{s}}$ is the static power consumption of the BS, which does not depend on the number of antennas.
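The power model (16) is simple enough to state directly in code. The sketch below is illustrative only: the function name is hypothetical and the numbers are made up, not the values of Table 1.

```python
def total_power(tr_q_m, tr_q_u, mu, M, p_c, p_s):
    """Total BS power consumption, eq. (16).

    tr_q_m : multicast transmit power tr{Q^m}
    tr_q_u : list of unicast transmit powers tr{Q_k^u}, one per UT
    mu     : reciprocal of the amplifier drain efficiency (mu >= 1)
    M      : number of BS antennas; p_c: circuit power per antenna; p_s: static power
    """
    if mu < 1:
        raise ValueError("mu is the reciprocal of a drain efficiency, so mu >= 1")
    return mu * (tr_q_m + sum(tr_q_u)) + M * p_c + p_s


# Illustrative numbers only.
print(total_power(1.0, [0.5, 0.5], mu=2.0, M=64, p_c=0.25, p_s=10.0))  # 30.0
```

Doubling $M$ in this example adds another 16 units through the $MP\_{\mathrm{c}}$ term while the radiated power stays the same.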

We now formulate the EE optimization problem for the NOUM massive MIMO system. Our aim is to find the multicast and unicast transmit covariance matrices $\mathbf{Q}^{\mathrm{m}}$ and $\mathbf{Q}\_k^{\mathrm{u}}$ that maximize the system EE. We define a weight vector $\mathbf{u} = [u\_0, u\_1, \ldots, u\_K]$, where $u\_0$ is the weight of the multicast rate and $u\_k$ is the weight of the $k$th unicast rate. The weighted sum rate $R$ is then

$$R \triangleq u\_0 K \left(\min\_k R\_k^{\mathrm{m}}\right) + \sum\_{k \in \mathcal{K}} u\_k R\_{k}^{\mathrm{u}},\tag{17}$$

and the EE of the considered system with bandwidth *W* is given by

$$EE = \frac{WR}{P} = \frac{W\left(u\_0 K \left(\min\_k R\_k^{\mathrm{m}}\right) + \sum\_{k \in \mathcal{K}} u\_k R\_k^{\mathrm{u}}\right)}{\mu \left(\mathrm{tr}\left\{\mathbf{Q}^{\mathrm{m}}\right\} + \sum\_{k \in \mathcal{K}} \mathrm{tr}\left\{\mathbf{Q}\_k^{\mathrm{u}}\right\}\right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}}}.\tag{18}$$
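A minimal sketch of (17) and (18), assuming the per-UT rates and the total power are already known. The multicast rate is the minimum over UTs because every UT must decode the common stream, and, as written in (17), it is weighted by $u\_0 K$. Function names and values are hypothetical.

```python
def weighted_sum_rate(u, rates_m, rates_u):
    """Weighted sum rate R of eq. (17).

    u       : weights [u_0, u_1, ..., u_K]
    rates_m : multicast rates R_k^m of all K UTs (the common rate is their minimum)
    rates_u : unicast rates R_k^u of all K UTs
    """
    K = len(rates_u)
    if len(u) != K + 1 or len(rates_m) != K:
        raise ValueError("need K multicast rates, K unicast rates and K + 1 weights")
    return u[0] * K * min(rates_m) + sum(uk * rk for uk, rk in zip(u[1:], rates_u))


def energy_efficiency(W, u, rates_m, rates_u, power):
    """EE = W * R / P from eq. (18); power is P from eq. (16)."""
    return W * weighted_sum_rate(u, rates_m, rates_u) / power


# Hypothetical rates for K = 2 UTs and total power P = 4.0.
print(energy_efficiency(1.0, [1.0, 1.0, 1.0], [2.0, 1.5], [3.0, 2.5], 4.0))  # 2.125
```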

Therefore, the EE maximization problem is stated as

$$\begin{aligned} \max\_{\mathbf{Q}^{\mathrm{m}}, \mathbf{Q}\_{k}^{\mathrm{u}}, \forall k \in \mathcal{K}} \quad & \frac{W\left(u\_{0} K \left(\min\_{k} R\_{k}^{\mathrm{m}}\right) + \sum\_{k \in \mathcal{K}} u\_{k} R\_{k}^{\mathrm{u}}\right)}{\mu\left(\mathrm{tr}\left\{\mathbf{Q}^{\mathrm{m}}\right\} + \sum\_{k \in \mathcal{K}} \mathrm{tr}\left\{\mathbf{Q}\_{k}^{\mathrm{u}}\right\}\right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}}}\\ \text{s.t.} \quad & \mathrm{tr}\left\{\mathbf{Q}^{\mathrm{m}}\right\} + \sum\_{k \in \mathcal{K}} \mathrm{tr}\left\{\mathbf{Q}\_{k}^{\mathrm{u}}\right\} \le P\_{\max},\\ & \mathbf{Q}^{\mathrm{m}} \succeq \mathbf{0}, \ \mathbf{Q}\_{k}^{\mathrm{u}} \succeq \mathbf{0} \ \left(\forall k \in \mathcal{K}\right), \end{aligned} \tag{19}$$

where $P\_{\max}$ is the power budget at the BS.

#### *3.2. Optimal Transmit Directions*

Problem (19) requires designing the large-dimensional complex matrices $\mathbf{Q}^{\mathrm{m}}$ and $\mathbf{Q}\_k^{\mathrm{u}}$ ($\forall k$), which can be computationally very expensive. To simplify it, we first decompose the transmit covariance matrices as $\mathbf{Q}^{\mathrm{m}} = \boldsymbol{\Phi}^{\mathrm{m}} \boldsymbol{\Lambda}^{\mathrm{m}} \left(\boldsymbol{\Phi}^{\mathrm{m}}\right)^H$ and $\mathbf{Q}\_k^{\mathrm{u}} = \boldsymbol{\Phi}\_k^{\mathrm{u}} \boldsymbol{\Lambda}\_k^{\mathrm{u}} \left(\boldsymbol{\Phi}\_k^{\mathrm{u}}\right)^H$. The columns of $\boldsymbol{\Phi}^{\mathrm{m}}$ and $\boldsymbol{\Phi}\_k^{\mathrm{u}}$ are the eigenvectors of $\mathbf{Q}^{\mathrm{m}}$ and $\mathbf{Q}\_k^{\mathrm{u}}$, respectively, and represent the directions of the transmitted signals. $\boldsymbol{\Lambda}^{\mathrm{m}}$ and $\boldsymbol{\Lambda}\_k^{\mathrm{u}}$ are diagonal matrices whose diagonal elements are the corresponding eigenvalues, i.e., the power allocated to each direction.

The following theorem gives the optimal eigenvectors of $\mathbf{Q}^{\mathrm{m}}$ and $\mathbf{Q}\_k^{\mathrm{u}}$.

**Theorem 1.** *The optimal multicast and unicast transmit covariance matrices of problem* (19) *are*

$$\mathbf{Q}^{\mathrm{m,opt}} = \mathbf{V} \boldsymbol{\Lambda}^{\mathrm{m}} \mathbf{V}^{H}, \quad \mathbf{Q}\_{k}^{\mathrm{u,opt}} = \mathbf{V} \boldsymbol{\Lambda}\_{k}^{\mathrm{u}} \mathbf{V}^{H}, \forall k,\tag{20}$$

*where* $\boldsymbol{\Lambda}^{\mathrm{m}}$ *and* $\boldsymbol{\Lambda}\_k^{\mathrm{u}}$ ($\forall k$) *are diagonal matrices, and* $\mathbf{V}$ *is the common eigenvector matrix of the transmit correlation matrices between the BS and all UTs, which depends only on the BS array topology. That is, the eigenvectors of* $\mathbf{Q}^{\mathrm{m}}$ *and* $\mathbf{Q}\_k^{\mathrm{u}}$ *are given by the columns of* $\mathbf{V}$*.*

**Proof.** See Appendix A.

Theorem 1 shows that, because the optimal eigenvectors are deterministic, solving problem (19) only requires determining the power allocation matrices $\boldsymbol{\Lambda} \triangleq \left\{\boldsymbol{\Lambda}^{\mathrm{m}}, \boldsymbol{\Lambda}\_1^{\mathrm{u}}, \boldsymbol{\Lambda}\_2^{\mathrm{u}}, \ldots, \boldsymbol{\Lambda}\_K^{\mathrm{u}}\right\}$. This significantly reduces the number of optimization variables and the computational complexity: the large-dimensional complex-matrix-valued EE maximization problem becomes a real-vector-valued power allocation problem in the beam domain.
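To see why only the beam domain powers remain as variables, the sketch below builds $\mathbf{Q} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^H$ as in (20) and checks that $\mathrm{tr}\{\mathbf{Q}\} = \mathrm{tr}\{\boldsymbol{\Lambda}\}$ and $\mathbf{V}^H\mathbf{Q}\mathbf{V} = \boldsymbol{\Lambda}$, so the power terms in (19) depend only on the diagonal of $\boldsymbol{\Lambda}$. Using a unitary DFT matrix for $\mathbf{V}$ is an assumption made for illustration (a commonly used approximation for a large uniform linear array); the text only states that $\mathbf{V}$ depends on the BS array topology. The powers in `lam` are hypothetical.

```python
import cmath


def dft_matrix(M):
    """Unitary M x M DFT matrix, used here as an illustrative choice of V."""
    return [[cmath.exp(-2j * cmath.pi * m * n / M) / M ** 0.5 for n in range(M)]
            for m in range(M)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def hermitian(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def covariance_from_beam_powers(V, lam):
    """Q = V diag(lam) V^H, the structure of eq. (20)."""
    L = [[lam[i] if i == j else 0.0 for j in range(len(lam))] for i in range(len(lam))]
    return matmul(matmul(V, L), hermitian(V))


V = dft_matrix(4)
lam = [0.5, 0.0, 1.25, 0.25]                   # hypothetical beam-domain powers
Q = covariance_from_beam_powers(V, lam)
trace_Q = sum(Q[i][i] for i in range(4)).real  # equals sum(lam) up to rounding
back = matmul(matmul(hermitian(V), Q), V)      # recovers diag(lam) up to rounding
```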

We rewrite $\overline{\mathbf{K}}\_k^{\mathrm{m}}$ and $\overline{\mathbf{K}}\_k^{\mathrm{u}}$ as functions of $\boldsymbol{\Lambda}$:

$$\overline{\mathbf{K}}\_{k}^{\mathrm{m}}(\boldsymbol{\Lambda}) \triangleq \sigma^{2} \mathbf{I}\_{N\_{k}} + \sum\_{k' \in \mathcal{K}} \mathbf{A}\_{k} \left(\boldsymbol{\Lambda}\_{k'}^{\mathrm{u}}\right), \tag{21}$$

$$\overline{\mathbf{K}}\_{k}^{\mathrm{u}}(\boldsymbol{\Lambda}) \triangleq \sigma^{2}\mathbf{I}\_{N\_{k}} + \sum\_{k' \neq k} \mathbf{A}\_{k}\left(\boldsymbol{\Lambda}\_{k'}^{\mathrm{u}}\right), \tag{22}$$

and, without loss of optimality, problem (19) simplifies to

$$\begin{aligned} \max\_{\boldsymbol{\Lambda}} \quad & \frac{W\left(u\_{0}K\left(\min\_{k}R\_{k}^{\mathrm{m}}\left(\boldsymbol{\Lambda}\right)\right)+\sum\_{k\in\mathcal{K}}u\_{k}R\_{k}^{\mathrm{u}}\left(\boldsymbol{\Lambda}\right)\right)}{\mu\left(\mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\}+\sum\_{k\in\mathcal{K}}\mathrm{tr}\left\{\boldsymbol{\Lambda}\_{k}^{\mathrm{u}}\right\}\right)+MP\_{\mathrm{c}}+P\_{\mathrm{s}}}\\ \text{s.t.} \quad & \mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\}+\sum\_{k\in\mathcal{K}}\mathrm{tr}\left\{\boldsymbol{\Lambda}\_{k}^{\mathrm{u}}\right\}\leq P\_{\max},\\ & \boldsymbol{\Lambda}^{\mathrm{m}}\succeq\mathbf{0},\ \boldsymbol{\Lambda}^{\mathrm{m}}\text{ diagonal},\ \boldsymbol{\Lambda}\_{k}^{\mathrm{u}}\succeq\mathbf{0},\ \boldsymbol{\Lambda}\_{k}^{\mathrm{u}}\text{ diagonal}\ \left(\forall k\in\mathcal{K}\right),\end{aligned} \tag{23}$$

with

$$R\_k^{\mathrm{m}}(\boldsymbol{\Lambda}) = \underbrace{\mathbb{E}\left\{\log \det \left\{ \overline{\mathbf{K}}\_k^{\mathrm{m}}(\boldsymbol{\Lambda}) + \mathbf{G}\_k \boldsymbol{\Lambda}^{\mathrm{m}} \mathbf{G}\_k^H \right\} \right\}}\_{\triangleq s\_k^+(\boldsymbol{\Lambda})} - \underbrace{\log \det \left\{ \overline{\mathbf{K}}\_k^{\mathrm{m}}(\boldsymbol{\Lambda}) \right\}}\_{\triangleq s\_k^-(\boldsymbol{\Lambda})},\tag{24}$$

$$R\_k^{\mathrm{u}}(\boldsymbol{\Lambda}) = \underbrace{\mathbb{E}\left\{\log \det \left\{ \overline{\mathbf{K}}\_k^{\mathrm{u}}(\boldsymbol{\Lambda}) + \mathbf{G}\_k \boldsymbol{\Lambda}\_k^{\mathrm{u}} \mathbf{G}\_k^H \right\} \right\}}\_{\triangleq t\_k^+(\boldsymbol{\Lambda})} - \underbrace{\log \det \left\{ \overline{\mathbf{K}}\_k^{\mathrm{u}}(\boldsymbol{\Lambda}) \right\}}\_{\triangleq t\_k^-(\boldsymbol{\Lambda})}.\tag{25}$$
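Because $\overline{\mathbf{K}}\_k^{\mathrm{m}}(\boldsymbol{\Lambda})$ and $\overline{\mathbf{K}}\_k^{\mathrm{u}}(\boldsymbol{\Lambda})$ in (21) and (22) are diagonal, the subtrahends $s\_k^-(\boldsymbol{\Lambda})$ and $t\_k^-(\boldsymbol{\Lambda})$ in (24) and (25) reduce to sums of scalar logarithms and need no expectation. A sketch, assuming natural logarithms (rates in nats) and storing each $\boldsymbol{\Lambda}\_{k'}^{\mathrm{u}}$ as its diagonal; the helper names and values are hypothetical:

```python
import math


def kbar_diag(omega_k, lam_u, sigma2, exclude=None):
    """Diagonal of K̄_k(Λ) = sigma2*I + sum_{k'} A_k(Λ^u_{k'}), eqs. (21)-(22).

    omega_k : N_k x M beam-domain power matrix of UT k
    lam_u   : list of K diagonals [Λ^u_1, ..., Λ^u_K], each of length M
    exclude : UT whose own unicast signal is not interference
              (None for the multicast stage (21), k for the unicast stage (22))
    """
    diag = [sigma2] * len(omega_k)
    for kp, lam in enumerate(lam_u):
        if kp == exclude:
            continue
        for n, row in enumerate(omega_k):
            diag[n] += sum(w * p for w, p in zip(row, lam))
    return diag


def s_minus(omega_k, lam_u, sigma2):
    """s_k^-(Λ) = log det K̄_k^m(Λ), eq. (24)."""
    return sum(math.log(d) for d in kbar_diag(omega_k, lam_u, sigma2))


def t_minus(omega_k, lam_u, sigma2, k):
    """t_k^-(Λ) = log det K̄_k^u(Λ), eq. (25)."""
    return sum(math.log(d) for d in kbar_diag(omega_k, lam_u, sigma2, exclude=k))


omega_1 = [[1.0, 0.5], [0.25, 2.0]]  # hypothetical Omega_1 (N_1 = M = 2)
lam_u = [[0.3, 0.7], [1.0, 0.2]]     # diagonals of Λ^u_1 and Λ^u_2
print(kbar_diag(omega_1, lam_u, 0.1, exclude=0))  # unicast-stage K̄ for UT 1
```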

Introducing an auxiliary variable $\eta$ as a common lower bound on $R\_k^{\mathrm{m}}(\boldsymbol{\Lambda})$ ($\forall k$), problem (23) can be equivalently expressed as

$$\begin{aligned} \max\_{\boldsymbol{\Lambda}} \quad & \frac{W\left(u\_{0}K\eta + \sum\_{k\in\mathcal{K}}u\_{k}R\_{k}^{\mathrm{u}}\left(\boldsymbol{\Lambda}\right)\right)}{\mu\left(\mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}}\mathrm{tr}\left\{\boldsymbol{\Lambda}\_{k}^{\mathrm{u}}\right\}\right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}}}\\ \text{s.t.} \quad & R\_{k}^{\mathrm{m}}\left(\boldsymbol{\Lambda}\right) \ge \eta \ \left(\forall k \in \mathcal{K}\right), \\ & \mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}}\mathrm{tr}\left\{\boldsymbol{\Lambda}\_{k}^{\mathrm{u}}\right\} \le P\_{\max}, \\ & \boldsymbol{\Lambda}^{\mathrm{m}} \succeq \mathbf{0}, \ \boldsymbol{\Lambda}^{\mathrm{m}} \text{ diagonal}, \ \boldsymbol{\Lambda}\_{k}^{\mathrm{u}} \succeq \mathbf{0}, \ \boldsymbol{\Lambda}\_{k}^{\mathrm{u}} \text{ diagonal} \ \left(\forall k \in \mathcal{K}\right). \end{aligned} \tag{26}$$

#### *3.3. Energy-Efficient Power Allocation for NOUM Transmission*

In problem (26), the numerator of the objective function is a difference of concave functions. We therefore adopt the MM algorithm, an iterative procedure that, at each iteration, replaces the objective function with a lower bound that is tight at the current iterate.

Specifically, we replace $s\_k^-(\boldsymbol{\Lambda})$ in (24) and $t\_k^-(\boldsymbol{\Lambda})$ in (25) with their first-order Taylor expansions. Since $s\_k^-$ and $t\_k^-$ are concave, these expansions are global upper bounds, so the resulting surrogate numerator is a concave lower bound on the original one and each sub-problem is a concave-linear fractional program. Problem (26) can thus be solved by iteratively solving a series of surrogate problems. At the $p$th iteration, with $\boldsymbol{\Lambda}\_{(p)} = \left\{\boldsymbol{\Lambda}\_{(p)}^{\mathrm{m}}, \boldsymbol{\Lambda}\_{1,(p)}^{\mathrm{u}}, \ldots, \boldsymbol{\Lambda}\_{K,(p)}^{\mathrm{u}}\right\}$, the sub-problem is

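$$\begin{aligned} \boldsymbol{\Lambda}\_{(p+1)} = \arg\max\_{\boldsymbol{\Lambda}} \quad & \frac{W\left(u\_0 K \eta + \sum\_{k\in\mathcal{K}} u\_k \left( t\_k^+(\boldsymbol{\Lambda}) - t\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right) - \sum\_{a\neq k} \mathrm{tr}\left\{ \left( \frac{\partial t\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \left( \boldsymbol{\Lambda}\_a^{\mathrm{u}} - \boldsymbol{\Lambda}\_{a,(p)}^{\mathrm{u}} \right) \right\} \right) \right)}{\mu\left(\mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\left\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\right\}\right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}}} \\ \text{s.t.} \quad & s\_k^+(\boldsymbol{\Lambda}) - s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right) - \sum\_{a\in\mathcal{K}} \mathrm{tr}\left\{ \left( \frac{\partial s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \left( \boldsymbol{\Lambda}\_a^{\mathrm{u}} - \boldsymbol{\Lambda}\_{a,(p)}^{\mathrm{u}} \right) \right\} - \eta \ge 0 \ (\forall k \in \mathcal{K}), \\ & \mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\left\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\right\} \le P\_{\max}, \\ & \boldsymbol{\Lambda}^{\mathrm{m}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}^{\mathrm{m}} \text{ diagonal},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \text{ diagonal} \ (\forall k \in \mathcal{K}), \end{aligned} \tag{27}$$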

where the gradients of $s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)$ and $t\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)$ with respect to $\boldsymbol{\Lambda}\_a^{\mathrm{u}}$ are denoted by $\Delta s\_k^{(p)}$ and $\Delta t\_k^{(p)}$, respectively, and their diagonal elements are

$$
\left[\Delta s\_k^{(p)}\right]\_{i,i} = \left[\frac{\partial s\_k^{-}\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_{a}^{\mathrm{u}}}\right]\_{i,i} = \sum\_{n=1}^{N\_k} \frac{[\boldsymbol{\Omega}\_k]\_{n,i}}{\sigma^2 + \sum\_{k' \in \mathcal{K}} \sum\_{m=1}^M [\boldsymbol{\Omega}\_k]\_{n,m} \left[\boldsymbol{\Lambda}\_{k',(p)}^{\mathrm{u}}\right]\_{m,m}},\tag{28}
$$

$$\left[\Delta t\_k^{(p)}\right]\_{i,i} = \left[\frac{\partial t\_k^- \left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_{a}^{\mathrm{u}}}\right]\_{i,i} = \begin{cases} \displaystyle\sum\_{n=1}^{N\_k} \frac{[\boldsymbol{\Omega}\_k]\_{n,i}}{\sigma^2 + \sum\_{k'\neq k} \sum\_{m=1}^{M} [\boldsymbol{\Omega}\_k]\_{n,m} \left[\boldsymbol{\Lambda}\_{k',(p)}^{\mathrm{u}}\right]\_{m,m}}, & a \neq k, \\ 0, & a = k, \end{cases} \tag{29}$$

respectively.
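The closed-form gradient (29) and the MM surrogate can be checked numerically. The sketch below evaluates $t\_k^-$ for diagonal power allocations, computes (29), compares it with a finite difference, and confirms that the first-order Taylor expansion used in (27) upper-bounds the concave $t\_k^-$, so subtracting it gives a lower bound on $R\_k^{\mathrm{u}}$ that is tight at $\boldsymbol{\Lambda}\_{(p)}$. The gradient (28) of $s\_k^-$ follows the same pattern without excluding UT $k$. Function names, natural logarithms, and all values are assumptions.

```python
import math


def _denominators(omega_k, lam_u, sigma2, k):
    """sigma2 + sum_{k' != k} sum_m [Omega_k]_{n,m} [Λ^u_{k'}]_{m,m}, for each n."""
    return [sigma2 + sum(sum(w * p for w, p in zip(row, lam))
                         for kp, lam in enumerate(lam_u) if kp != k)
            for row in omega_k]


def t_minus(omega_k, lam_u, sigma2, k):
    return sum(math.log(d) for d in _denominators(omega_k, lam_u, sigma2, k))


def grad_t_minus(omega_k, lam_u, sigma2, k, a):
    """Diagonal of the gradient of t_k^- with respect to Λ^u_a, eq. (29)."""
    M = len(omega_k[0])
    if a == k:
        return [0.0] * M
    den = _denominators(omega_k, lam_u, sigma2, k)
    return [sum(row[i] / d for row, d in zip(omega_k, den)) for i in range(M)]


def taylor_t_minus(omega_k, lam_ref, lam_new, sigma2, k):
    """First-order expansion of t_k^- around lam_ref, evaluated at lam_new."""
    value = t_minus(omega_k, lam_ref, sigma2, k)
    for a, (new, ref) in enumerate(zip(lam_new, lam_ref)):
        grad = grad_t_minus(omega_k, lam_ref, sigma2, k, a)
        value += sum(g * (x - x0) for g, x, x0 in zip(grad, new, ref))
    return value


omega = [[1.0, 0.5], [0.25, 2.0]]            # hypothetical Omega_k (N_k = M = 2)
lam = [[0.3, 0.7], [1.0, 0.2], [0.4, 0.4]]   # diagonals of Λ^u_1..Λ^u_3; UT k is index 0
h = 1e-6
lam_h = [row[:] for row in lam]
lam_h[1][0] += h                              # perturb the first entry of Λ^u_2
finite_diff = (t_minus(omega, lam_h, 0.1, 0) - t_minus(omega, lam, 0.1, 0)) / h
analytic = grad_t_minus(omega, lam, 0.1, 0, 1)[0]
new = [[0.0, 1.5], [2.0, 0.0], [0.1, 0.9]]
gap = taylor_t_minus(omega, lam, new, 0.1, 0) - t_minus(omega, new, 0.1, 0)  # >= 0 by concavity
```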

Since $t\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)$ and $\boldsymbol{\Lambda}\_{a,(p)}^{\mathrm{u}}$ in (27) are constant within each iteration, we drop the corresponding terms from the objective and obtain the optimization problem

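$$\begin{aligned} \boldsymbol{\Lambda}\_{(p+1)} = \arg\max\_{\boldsymbol{\Lambda}} \quad & \frac{W\left(u\_0 K \eta + \sum\_{k\in\mathcal{K}} u\_k \left( t\_k^+(\boldsymbol{\Lambda}) - \sum\_{a\neq k} \mathrm{tr}\left\{ \left( \frac{\partial t\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \boldsymbol{\Lambda}\_a^{\mathrm{u}} \right\} \right) \right)}{\mu\left(\mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\left\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\right\}\right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}}} \\ \text{s.t.} \quad & s\_k^+(\boldsymbol{\Lambda}) - s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right) - \sum\_{a\in\mathcal{K}} \mathrm{tr}\left\{ \left( \frac{\partial s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \left( \boldsymbol{\Lambda}\_a^{\mathrm{u}} - \boldsymbol{\Lambda}\_{a,(p)}^{\mathrm{u}} \right) \right\} - \eta \ge 0 \ (\forall k \in \mathcal{K}), \\ & \mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\left\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\right\} \le P\_{\max}, \\ & \boldsymbol{\Lambda}^{\mathrm{m}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}^{\mathrm{m}} \text{ diagonal},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \text{ diagonal} \ (\forall k \in \mathcal{K}). \end{aligned} \tag{30}$$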

Although the numerator of the objective function and the constraint functions of sub-problem (30) are concave, the computational complexity can still be high if the expectations are evaluated by Monte-Carlo averaging. Using large-dimensional random matrix theory [36,37], we further reduce the complexity by replacing the minuends of $R\_k^{\mathrm{m}}(\boldsymbol{\Lambda})$ and $R\_k^{\mathrm{u}}(\boldsymbol{\Lambda})$, i.e., $s\_k^+(\boldsymbol{\Lambda})$ and $t\_k^+(\boldsymbol{\Lambda})$, with their deterministic equivalents (DEs).

First, for an $N\_k \times N\_k$ matrix $\mathbf{X}$, we define the $M \times M$ diagonal matrix-valued function $\mathbf{Y}\_k(\mathbf{X})$ whose $i$th diagonal element is

$$\left[\mathbf{Y}\_k\left(\mathbf{X}\right)\right]\_{i,i} = \text{tr}\left\{\text{diag}\left\{\left[\boldsymbol{\Omega}\_k\right]\_{:,i}\right\}\mathbf{X}\right\}.\tag{31}$$
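Like $\mathbf{A}\_k(\cdot)$, the map $\mathbf{Y}\_k(\cdot)$ in (31) only reads the diagonal of its argument, but it weights column $i$ of $\boldsymbol{\Omega}\_k$ instead of row $i$. The sketch below (hypothetical names and values) implements both maps on diagonals and checks the identity $\mathrm{tr}\{\mathbf{Y}\_k(\mathbf{X})\mathbf{B}\} = \mathrm{tr}\{\mathbf{X}\mathbf{A}\_k(\mathbf{B})\}$ for diagonal $\mathbf{X}$ and $\mathbf{B}$, which follows directly from (11) and (31) because both sides equal $\sum\_{n,m}[\boldsymbol{\Omega}\_k]\_{n,m}[\mathbf{X}]\_{n,n}[\mathbf{B}]\_{m,m}$.

```python
def apply_A(omega, x_diag):
    """[A_k(X)]_{i,i} = sum_m [Omega_k]_{i,m} [X]_{m,m}, eq. (11); X is M x M."""
    return [sum(w * x for w, x in zip(row, x_diag)) for row in omega]


def apply_Y(omega, x_diag):
    """[Y_k(X)]_{i,i} = sum_n [Omega_k]_{n,i} [X]_{n,n}, eq. (31); X is N_k x N_k."""
    return [sum(row[i] * x for row, x in zip(omega, x_diag)) for i in range(len(omega[0]))]


omega = [[1.0, 0.25, 0.0], [0.5, 0.5, 1.5]]   # hypothetical Omega_k, N_k = 2, M = 3
x = [2.0, 4.0]                                # diagonal of an N_k x N_k matrix X
b = [1.0, 2.0, 0.5]                           # diagonal of an M x M matrix B
lhs = sum(y * bi for y, bi in zip(apply_Y(omega, x), b))   # tr{Y_k(X) B}
rhs = sum(xi * a for xi, a in zip(x, apply_A(omega, b)))   # tr{X A_k(B)}
print(apply_Y(omega, x), lhs, rhs)  # [4.0, 2.5, 6.0] 12.0 12.0
```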

Then, the DE of $s\_k^+(\boldsymbol{\Lambda})$ can be written as

$$\begin{split} S\_k^+ \left( \boldsymbol{\Lambda} \right) &= \log \det \left\{ \mathbf{I}\_M + \boldsymbol{\Gamma}\_k^{\mathrm{m}} \boldsymbol{\Lambda}^{\mathrm{m}} \right\} \\ &+ \log \det \left\{ \widetilde{\boldsymbol{\Gamma}}\_k^{\mathrm{m}} + \overline{\mathbf{K}}\_k^{\mathrm{m}} \left( \boldsymbol{\Lambda} \right) \right\} - \mathrm{tr} \left\{ \mathbf{I}\_{N\_k} - \left( \widetilde{\boldsymbol{\Phi}}\_k^{\mathrm{m}} \right)^{-1} \right\}, \end{split} \tag{32}$$

where $\boldsymbol{\Gamma}\_k^{\mathrm{m}}$, $\widetilde{\boldsymbol{\Gamma}}\_k^{\mathrm{m}}$ and $\widetilde{\boldsymbol{\Phi}}\_k^{\mathrm{m}}$ are given by

$$\begin{aligned} \boldsymbol{\Gamma}\_k^{\mathrm{m}} &= \mathbf{Y}\_k \left( \left( \widetilde{\boldsymbol{\Phi}}\_k^{\mathrm{m}} \overline{\mathbf{K}}\_k^{\mathrm{m}} \left( \boldsymbol{\Lambda} \right) \right)^{-1} \right) \in \mathbb{C}^{M \times M}, \\ \widetilde{\boldsymbol{\Gamma}}\_k^{\mathrm{m}} &= \mathbf{A}\_k \left( \boldsymbol{\Lambda}^{\mathrm{m}} \left( \mathbf{I}\_M + \boldsymbol{\Lambda}^{\mathrm{m}} \boldsymbol{\Gamma}\_k^{\mathrm{m}} \right)^{-1} \right) \in \mathbb{C}^{N\_k \times N\_k}, \\ \widetilde{\boldsymbol{\Phi}}\_k^{\mathrm{m}} &= \mathbf{I}\_{N\_k} + \widetilde{\boldsymbol{\Gamma}}\_k^{\mathrm{m}} \left( \overline{\mathbf{K}}\_k^{\mathrm{m}} \left( \boldsymbol{\Lambda} \right) \right)^{-1} \in \mathbb{C}^{N\_k \times N\_k}, \end{aligned} \tag{33}$$

and $\mathbf{A}\_k(\mathbf{X})$ is defined as in (11).
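Since every matrix in (32) and (33) is diagonal in the beam domain, the DE can be evaluated by a scalar fixed-point iteration. The sketch below is one way to do it for a generic diagonal $\mathbf{K}$ and beam powers $\boldsymbol{\lambda}$: with $\mathbf{K} = \overline{\mathbf{K}}\_k^{\mathrm{m}}(\boldsymbol{\Lambda})$ and $\boldsymbol{\lambda} = \mathrm{diag}(\boldsymbol{\Lambda}^{\mathrm{m}})$ it returns $S\_k^+(\boldsymbol{\Lambda})$, and with $\overline{\mathbf{K}}\_k^{\mathrm{u}}(\boldsymbol{\Lambda})$ and $\boldsymbol{\Lambda}\_k^{\mathrm{u}}$ it returns $T\_k^+(\boldsymbol{\Lambda})$ of (34) and (35). The initialization $\widetilde{\boldsymbol{\Phi}} = \mathbf{I}$, the stopping rule, the natural logarithm, and the function name are assumptions, not details given in the text.

```python
import math


def deterministic_equivalent(omega, lam, kappa, tol=1e-12, max_iter=10000):
    """DE of E{log det(K + G diag(lam) G^H)} following eqs. (32)-(33).

    omega : N x M beam-domain power matrix Omega_k
    lam   : M beam-domain powers (diagonal of Λ^m or Λ^u_k)
    kappa : N positive diagonal entries of K (K̄^m_k(Λ) or K̄^u_k(Λ))
    """
    N, M = len(omega), len(lam)

    def gammas(phi):
        # Γ = Y_k((Φ̃ K)^{-1}): M diagonal entries
        gamma = [sum(omega[n][m] / (phi[n] * kappa[n]) for n in range(N)) for m in range(M)]
        # Γ̃ = A_k(Λ (I + ΛΓ)^{-1}): N diagonal entries
        gamma_t = [sum(omega[n][m] * lam[m] / (1.0 + lam[m] * gamma[m]) for m in range(M))
                   for n in range(N)]
        return gamma, gamma_t

    phi = [1.0] * N  # Φ̃ = I, its value at zero transmit power
    for _ in range(max_iter):
        _, gamma_t = gammas(phi)
        new_phi = [1.0 + gamma_t[n] / kappa[n] for n in range(N)]  # Φ̃ = I + Γ̃ K^{-1}
        converged = max(abs(a - b) for a, b in zip(new_phi, phi)) < tol
        phi = new_phi
        if converged:
            break
    else:
        raise RuntimeError("fixed-point iteration did not converge")

    gamma, gamma_t = gammas(phi)
    return (sum(math.log1p(g * p_m) for g, p_m in zip(gamma, lam))
            + sum(math.log(gt + k) for gt, k in zip(gamma_t, kappa))
            - sum(1.0 - 1.0 / p for p in phi))


# Hypothetical scalar case: omega = lam = kappa = 1.
value = deterministic_equivalent([[1.0]], [1.0], [1.0])
```

In this scalar case the fixed point of $\widetilde{\boldsymbol{\Phi}}$ is the golden ratio $\varphi$ and the DE equals $2\log\varphi + \varphi - 2 \approx 0.5805$ nats, close to but not equal to $\mathbb{E}\{\log(1+|g|^2)\} \approx 0.596$ for a unit-power Rayleigh-fading scalar; the DE is designed for large dimensions, where the gap shrinks. At zero transmit power the sketch returns $\log\det\mathbf{K}$, as (32) requires.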

Likewise, the DE of $t\_k^+(\boldsymbol{\Lambda})$ is

$$\begin{split} T\_k^+ \left( \boldsymbol{\Lambda} \right) &= \log \det \left\{ \mathbf{I}\_M + \boldsymbol{\Gamma}\_k^{\mathrm{u}} \boldsymbol{\Lambda}\_k^{\mathrm{u}} \right\} \\ &+ \log \det \left\{ \widetilde{\boldsymbol{\Gamma}}\_k^{\mathrm{u}} + \overline{\mathbf{K}}\_k^{\mathrm{u}} \left( \boldsymbol{\Lambda} \right) \right\} - \mathrm{tr} \left\{ \mathbf{I}\_{N\_k} - \left( \widetilde{\boldsymbol{\Phi}}\_k^{\mathrm{u}} \right)^{-1} \right\}, \end{split} \tag{34}$$

where $\boldsymbol{\Gamma}\_k^{\mathrm{u}}$, $\widetilde{\boldsymbol{\Gamma}}\_k^{\mathrm{u}}$ and $\widetilde{\boldsymbol{\Phi}}\_k^{\mathrm{u}}$ are given by

$$\begin{aligned} \boldsymbol{\Gamma}\_{k}^{\mathrm{u}} &= \mathbf{Y}\_{k} \left( \left( \widetilde{\boldsymbol{\Phi}}\_{k}^{\mathrm{u}} \overline{\mathbf{K}}\_{k}^{\mathrm{u}} \left( \boldsymbol{\Lambda} \right) \right)^{-1} \right) \in \mathbb{C}^{M \times M}, \\ \widetilde{\boldsymbol{\Gamma}}\_{k}^{\mathrm{u}} &= \mathbf{A}\_{k} \left( \boldsymbol{\Lambda}\_{k}^{\mathrm{u}} \left( \mathbf{I}\_{M} + \boldsymbol{\Lambda}\_{k}^{\mathrm{u}} \boldsymbol{\Gamma}\_{k}^{\mathrm{u}} \right)^{-1} \right) \in \mathbb{C}^{N\_{k} \times N\_{k}}, \\ \widetilde{\boldsymbol{\Phi}}\_{k}^{\mathrm{u}} &= \mathbf{I}\_{N\_{k}} + \widetilde{\boldsymbol{\Gamma}}\_{k}^{\mathrm{u}} \left( \overline{\mathbf{K}}\_{k}^{\mathrm{u}} \left( \boldsymbol{\Lambda} \right) \right)^{-1} \in \mathbb{C}^{N\_{k} \times N\_{k}}. \end{aligned} \tag{35}$$

With the DEs of $s\_k^+(\boldsymbol{\Lambda})$ and $t\_k^+(\boldsymbol{\Lambda})$ defined above, problem (30) becomes

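$$\begin{aligned} \boldsymbol{\Lambda}\_{(p+1)} = \arg\max\_{\boldsymbol{\Lambda}} \quad & \frac{W\left(u\_0 K \eta + \sum\_{k\in\mathcal{K}} u\_k \left( T\_k^+(\boldsymbol{\Lambda}) - \sum\_{a\neq k} \mathrm{tr}\left\{ \left( \frac{\partial t\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \boldsymbol{\Lambda}\_a^{\mathrm{u}} \right\} \right) \right)}{\mu\left(\mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\left\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\right\}\right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}}} \\ \text{s.t.} \quad & S\_k^+(\boldsymbol{\Lambda}) - s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right) - \sum\_{a\in\mathcal{K}} \mathrm{tr}\left\{ \left( \frac{\partial s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \left( \boldsymbol{\Lambda}\_a^{\mathrm{u}} - \boldsymbol{\Lambda}\_{a,(p)}^{\mathrm{u}} \right) \right\} - \eta \ge 0 \ (\forall k \in \mathcal{K}), \\ & \mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\left\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\right\} \le P\_{\max}, \\ & \boldsymbol{\Lambda}^{\mathrm{m}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}^{\mathrm{m}} \text{ diagonal},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \text{ diagonal} \ (\forall k \in \mathcal{K}). \end{aligned} \tag{36}$$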

In problem (36), the numerator of the objective function is concave in $\boldsymbol{\Lambda}$ and the denominator is affine. We therefore invoke Dinkelbach's transform [38] to handle this concave-linear fractional program: the solution to (36) is obtained by iteratively solving the following problems

$$\begin{aligned} \left[\boldsymbol{\Lambda}\_{(p)}^{(q+1)}, \eta^{(q+1)}\right] = \arg\max\_{\boldsymbol{\Lambda}} \quad & W\left(u\_0 K \eta + \sum\_{k\in\mathcal{K}} u\_k \left( T\_k^+(\boldsymbol{\Lambda}) - \sum\_{a\neq k} \mathrm{tr}\left\{ \left( \frac{\partial t\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \boldsymbol{\Lambda}\_a^{\mathrm{u}} \right\} \right) \right) - \chi\_{(p)}^{(q)} P(\boldsymbol{\Lambda}) \\ \text{s.t.} \quad & S\_k^+(\boldsymbol{\Lambda}) - s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right) - \sum\_{a\in\mathcal{K}} \mathrm{tr}\left\{ \left( \frac{\partial s\_k^-\left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_a^{\mathrm{u}}} \right)^T \left( \boldsymbol{\Lambda}\_a^{\mathrm{u}} - \boldsymbol{\Lambda}\_{a,(p)}^{\mathrm{u}} \right) \right\} - \eta \ge 0 \ (\forall k \in \mathcal{K}), \\ & \mathrm{tr}\left\{\boldsymbol{\Lambda}^{\mathrm{m}}\right\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\left\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\right\} \le P\_{\max}, \\ & \boldsymbol{\Lambda}^{\mathrm{m}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}^{\mathrm{m}} \text{ diagonal},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \succeq \mathbf{0},\ \boldsymbol{\Lambda}\_k^{\mathrm{u}} \text{ diagonal} \ (\forall k \in \mathcal{K}), \end{aligned} \tag{37}$$

where $P(\boldsymbol{\Lambda}) = \mu\left(\mathrm{tr}\{\boldsymbol{\Lambda}^{\mathrm{m}}\} + \sum\_{k\in\mathcal{K}} \mathrm{tr}\{\boldsymbol{\Lambda}\_k^{\mathrm{u}}\}\right) + MP\_{\mathrm{c}} + P\_{\mathrm{s}}$, $q$ is the inner iteration index, and $\chi\_{(p)}^{(q)}$ is an auxiliary variable that is updated at each iteration as

$$\chi\_{(p)}^{(q)} = \frac{W\left(u\_{0}K\eta^{(q)} + \sum\_{k \in \mathcal{K}} u\_{k} \left(T\_{k}^{+} \left(\boldsymbol{\Lambda}\_{(p)}^{(q)}\right) - \sum\_{a \neq k} \mathrm{tr}\left\{\left(\frac{\partial t\_{k}^{-} \left(\boldsymbol{\Lambda}\_{(p)}\right)}{\partial \boldsymbol{\Lambda}\_{a}^{\mathrm{u}}}\right)^{T} \boldsymbol{\Lambda}\_{a,(p)}^{(q)}\right\}\right)\right)}{P\left(\boldsymbol{\Lambda}\_{(p)}^{(q)}\right)}.\tag{38}$$
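The inner loop follows the standard Dinkelbach pattern: for a fixed $\chi$, maximize the numerator minus $\chi$ times the denominator as in (37), reset $\chi$ to the achieved ratio as in (38), and stop once the parametric optimum is numerically zero. The sketch below applies this pattern to a hypothetical single-beam toy problem, $\max\_{0 \le x \le P\_{\max}} W\log(1+ax)/(\mu x + P\_0)$, whose inner problem has a closed-form solution. It is not a solver for (37), which has many variables and the constraints of (36); the function name and all values are assumptions.

```python
import math


def dinkelbach_single_beam(a, W, mu, p0, p_max, tol=1e-12, max_iter=100):
    """Maximize W*log(1 + a*x) / (mu*x + p0) over 0 <= x <= p_max (toy problem).

    Returns (x_opt, ee_opt). chi plays the role of chi_(p)^(q) in (37)-(38).
    """
    chi = 0.0
    for _ in range(max_iter):
        # Inner problem: max_x W*log(1+a*x) - chi*(mu*x + p0). It is concave, so the
        # stationary point W*a/(1+a*x) = chi*mu, clipped to [0, p_max], is optimal.
        x = p_max if chi == 0.0 else min(max(W / (chi * mu) - 1.0 / a, 0.0), p_max)
        num, den = W * math.log1p(a * x), mu * x + p0
        if num - chi * den <= tol:  # parametric optimum F(chi) ~ 0: chi is optimal
            return x, num / den
        chi = num / den              # ratio update, cf. (38)
    raise RuntimeError("Dinkelbach iteration did not converge")


x_opt, ee_opt = dinkelbach_single_beam(a=4.0, W=1.0, mu=2.0, p0=1.0, p_max=5.0)
x_small, ee_small = dinkelbach_single_beam(a=4.0, W=1.0, mu=2.0, p0=1.0, p_max=0.1)
```

With a generous budget the optimum lies strictly inside $[0, P\_{\max}]$, so spending the whole budget would lower the EE; with a tight budget ($P\_{\max} = 0.1$ here) the whole budget is used, which mirrors the low-$P\_{\max}$ behavior discussed with Figure 4.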

The proposed EE optimization algorithm thus involves two layers of iteration. In the outer iteration, the MM algorithm replaces the numerator of the objective function in (26) with a concave lower bound; MM-based algorithms are guaranteed to converge to a locally optimal solution [39–41]. In the inner iteration, Dinkelbach's transform converts the fractional problem (36) into the solvable convex optimization problems (37), which yields the globally optimal solution to (36) with guaranteed convergence [42]. After convergence, we obtain the beam domain power allocation matrix $\boldsymbol{\Lambda}$, which is locally optimal owing to the local optimality of the MM algorithm. The procedure is summarized in Algorithm 1.

**Algorithm 1** Energy-Efficient Power Allocation Algorithm in the Beam Domain for Massive MIMO NOUM Transmission

**Input:** Beam domain channel statistics $\boldsymbol{\Omega}\_k$, initial power allocation matrix $\boldsymbol{\Lambda}\_{(0)}$, outer iteration threshold $\varepsilon\_1$ and inner iteration threshold $\varepsilon\_2$

**Output:** Power allocation matrix $\boldsymbol{\Lambda}$ in the beam domain

$$\overline{EE}\_{(p)} = \frac{W\left(u\_0 K\left(\min\_k \left\{ S\_k^+ \left( \boldsymbol{\Lambda}\_{(p)} \right) - s\_k^- \left( \boldsymbol{\Lambda}\_{(p)} \right) \right\} \right) + \sum\_{k \in \mathcal{K}} u\_k \left( T\_k^+ \left( \boldsymbol{\Lambda}\_{(p)} \right) - t\_k^- \left( \boldsymbol{\Lambda}\_{(p)} \right) \right) \right)}{P \left( \boldsymbol{\Lambda}\_{(p)} \right)} \tag{39}$$

7: Calculate $\boldsymbol{\Lambda}\_{(p)}^{(q)}$ by solving problem (37) with $\chi\_{(p)}^{(q-1)}$

10: Let $p = p + 1$
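The two-layer structure described above (an MM outer loop with an outer threshold and a Dinkelbach inner loop with an inner threshold) can be sketched on a deliberately small toy model: two UTs with one beam each, deterministic channel gains, unicast only, and an exhaustive search over a power grid in place of a convex solver for the inner problem. Everything in this model (the gains, the grid, and the absence of a multicast stream and of DEs) is an assumption made to keep the example short. It only shows how the outer linearization and the inner ratio updates interact, and that the true EE does not decrease across outer iterations.

```python
import itertools
import math


def two_layer_toy(gains, sigma2=1.0, W=1.0, mu=2.0, p_c=1.0, p_max=4.0, step=0.1,
                  max_outer=50, eps1=1e-9, eps2=1e-12):
    """Toy MM (outer) + Dinkelbach (inner) EE maximization for two single-beam UTs.

    gains[k][j]: hypothetical deterministic power gain from UT j's beam to UT k.
    eps1 / eps2: outer / inner iteration thresholds.
    Returns the final power pair and the true EE after each outer iteration.
    """
    grid = [i * step for i in range(int(round(p_max / step)) + 1)]
    feasible = [p for p in itertools.product(grid, grid) if sum(p) <= p_max + 1e-12]
    pairs = ((0, 1), (1, 0))  # (desired UT k, interfering UT j)

    def power(p):             # denominator, cf. P(Λ) in (37)
        return mu * sum(p) + p_c

    def true_ee(p):
        rate = sum(math.log(sigma2 + gains[k][j] * p[j] + gains[k][k] * p[k])
                   - math.log(sigma2 + gains[k][j] * p[j]) for k, j in pairs)
        return W * rate / power(p)

    def surrogate_rate(p, ref):
        # MM step: replace the concave subtrahend log(sigma2 + g_kj p_j) by its tangent at
        # ref, which upper-bounds it, so this lower-bounds the rate and is tight at ref.
        total = 0.0
        for k, j in pairs:
            d0 = sigma2 + gains[k][j] * ref[j]
            total += (math.log(sigma2 + gains[k][j] * p[j] + gains[k][k] * p[k])
                      - math.log(d0) - gains[k][j] * (p[j] - ref[j]) / d0)
        return W * total

    p = feasible[1]           # small nonzero initial point taken from the grid
    history = [true_ee(p)]
    for _ in range(max_outer):          # outer (MM) loop
        ref, chi = p, 0.0
        while True:                     # inner (Dinkelbach) loop, exact on the grid
            cand = max(feasible, key=lambda x: surrogate_rate(x, ref) - chi * power(x))
            if surrogate_rate(cand, ref) - chi * power(cand) <= eps2:
                break
            chi = surrogate_rate(cand, ref) / power(cand)
        p = cand
        history.append(true_ee(p))
        if history[-1] - history[-2] < eps1:
            break
    return p, history


p_final, ee_history = two_layer_toy([[2.0, 0.5], [0.3, 1.5]])
```

Because the surrogate is tight at the previous iterate and the inner loop maximizes the surrogate ratio exactly over the same grid, each outer step cannot lower the true EE, which is the property the MM argument relies on.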

#### **4. Numerical Results**

We provide numerical simulation results to demonstrate the performance of the proposed EE optimization algorithm for NOUM transmission in massive MIMO with statistical CSI. The simulation parameters are listed in Table 1.

First, Figure 2 illustrates the convergence of the proposed EE optimization algorithm under different transmit power budgets $P\_{\max}$. The horizontal axis is the outer iteration index. The EE converges after only a few iterations, and it converges faster for lower power budgets than for higher ones.


**Table 1.** Simulation parameters.

**Figure 2.** The convergence performance of the proposed EE optimization algorithm for different power budgets *P*max.

Figure 3 shows the EE of NOUM transmission versus the power budget $P\_{\max}$ for different numbers of BS antennas $M$. The EE decreases as $M$ increases because, in the power consumption model (16), the total circuit power consumption $MP\_{\mathrm{c}}$ grows linearly with $M$.

Next, Figure 4 compares the EE of the proposed power allocation algorithm with that of the rate maximization approach [11]. The two approaches achieve similar EE when the transmit power budget is low. As the power budget increases, however, the EE of the rate maximization approach decreases, while that of our EE maximization approach remains high. Thus, the rate maximization approach is nearly EE-optimal when $P\_{\max}$ is low, but our EE maximization approach outperforms it in the high power budget regime.

**Figure 3.** The EE performance of the NOUM transmission versus the power budget *P*max for different numbers of BS antennas *M*.

**Figure 4.** The EE performance of the proposed beam domain power allocation algorithm compared with the rate maximization approach.

Finally, Figure 5 compares the EE of our power allocation approach with that of the full CSI approach, which assumes that instantaneous CSI is known at the BS. Since full CSI is an ideal case, it achieves better EE than cases with imperfect CSI. However, acquiring full CSI incurs pilot overhead, and, as Figure 5 shows, our proposed algorithm outperforms the full CSI approach with 3/7 pilot overhead [43] in terms of EE.

**Figure 5.** Comparison of the EE performance of the proposed algorithm, the full CSI case, and the full CSI case with 3/7 pilot overhead.

#### **5. Conclusions**

To conclude, we considered the EE optimization problem in NOUM transmission systems with statistical CSI available at the BS. We first formulated the EE maximization problem and then derived closed-form optimal eigenvectors of the multicast and unicast transmit covariance matrices. Next, we proposed a beam domain power allocation algorithm with guaranteed convergence that combines the MM algorithm, DEs and Dinkelbach's transform, and obtained a locally optimal power allocation strategy for EE maximization. Finally, numerical results demonstrated the performance gain of our EE maximization algorithm over the conventional approach.

**Author Contributions:** W.W. and Y.H. conceived the idea and wrote the manuscript. L.Y., J.X., J.L. and X.G. gave valuable suggestions on the structure of the paper and assisted in revising and proofreading.

**Funding:** This work was supported by the National Key R&D Program of China under Grant 2018YFB1801103, the National Natural Science Foundation of China under Grants 61631018, 61801114, 61761136016, 61771264, 61871465, and 61501113, the Natural Science Foundation of Jiangsu Province under Grant BK20170688, the Fundamental Research Funds for the Central Universities, and the Huawei Cooperation Project.

**Acknowledgments:** The authors would like to thank the Editor and the anonymous reviewers for their valuable comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of Theorem 1**

First, (11) shows that $\mathbf{A}\_k(\mathbf{X})$ depends only on the diagonal elements of $\mathbf{X}$, so $\overline{\mathbf{K}}\_k^{\mathrm{m}}$ in (10) and $\overline{\mathbf{K}}\_k^{\mathrm{u}}$ in (15) do not depend on the off-diagonal elements of $\mathbf{V}^H\mathbf{Q}\_k^{\mathrm{u}}\mathbf{V}$. Likewise, the power consumption in (16) depends only on $\mathrm{tr}\{\mathbf{Q}^{\mathrm{m}}\} = \mathrm{tr}\{\mathbf{V}^H\mathbf{Q}^{\mathrm{m}}\mathbf{V}\}$ and $\mathrm{tr}\{\mathbf{Q}\_k^{\mathrm{u}}\} = \mathrm{tr}\{\mathbf{V}^H\mathbf{Q}\_k^{\mathrm{u}}\mathbf{V}\}$, and hence not on the off-diagonal elements of $\mathbf{V}^H\mathbf{Q}^{\mathrm{m}}\mathbf{V}$ or $\mathbf{V}^H\mathbf{Q}\_k^{\mathrm{u}}\mathbf{V}$. Moreover, using a proof method similar to that in [44], $\mathbf{V}^H\mathbf{Q}^{\mathrm{m}}\mathbf{V}$ and $\mathbf{V}^H\mathbf{Q}\_k^{\mathrm{u}}\mathbf{V}$ should be diagonal to maximize $R\_k^{\mathrm{m}}$ and $R\_k^{\mathrm{u}}$, respectively. Therefore, to maximize the objective function in (19), $\mathbf{V}^H\mathbf{Q}^{\mathrm{m}}\mathbf{V}$ and $\mathbf{V}^H\mathbf{Q}\_k^{\mathrm{u}}\mathbf{V}$ should both be diagonal, i.e., $\mathbf{Q}^{\mathrm{m}} = \mathbf{V}\boldsymbol{\Lambda}^{\mathrm{m}}\mathbf{V}^H$ and $\mathbf{Q}\_k^{\mathrm{u}} = \mathbf{V}\boldsymbol{\Lambda}\_k^{\mathrm{u}}\mathbf{V}^H$ with diagonal $\boldsymbol{\Lambda}^{\mathrm{m}}$ and $\boldsymbol{\Lambda}\_k^{\mathrm{u}}$. This concludes the proof.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Artificial Intelligence-Based Discontinuous Reception for Energy Saving in 5G Networks**

**Mudasar Latif Memon <sup>1</sup>, Mukesh Kumar Maheshwari <sup>2</sup>, Navrati Saxena <sup>3,\*</sup>, Abhishek Roy <sup>4</sup> and Dong Ryeol Shin <sup>3,\*</sup>**


Received: 17 June 2019; Accepted: 9 July 2019; Published: 11 July 2019

**Abstract:** 5G is expected to deal with high data rates for different types of wireless traffic. To enable high data rates, 5G employs a beam searching operation to align the best beam pairs. Beam searching, along with high-order modulation techniques in 5G, exhausts the battery power of user equipment (UE). LTE networks use discontinuous reception (DRX) with fixed sleep cycles to save UE energy. LTE-DRX in its current form cannot work in 5G networks, as it does not consider multiple beam communication and its sleep cycle length is fixed. On the other hand, artificial intelligence (AI) can learn and predict packet arrival times from real wireless traffic traces. In this paper, we present an AI-based DRX (AI-DRX) mechanism for energy efficiency in 5G-enabled devices. We propose an AI-DRX algorithm for multiple beam communications that enables dynamic short and long sleep cycles in DRX. AI-DRX saves UE energy while considering the delay requirements of different services. We train a recurrent neural network (RNN) on two real wireless traces, achieving a minimum root mean square error (RMSE) of 5 ms for trace 1 and 6 ms for trace 2. We then use the trained RNN model in the AI-DRX algorithm to set dynamic short or long sleep cycles. Compared with LTE-DRX, AI-DRX achieves 69% higher energy efficiency on trace 1 and 55% higher energy efficiency on trace 2. AI-DRX also attains a 70% improvement in energy efficiency on trace 2 compared with the Poisson packet arrival model for *λ* = 1/20.

**Keywords:** discontinuous reception; multiple beam communications; artificial intelligence; energy efficiency; 5G; wireless communications

#### **1. Introduction**

The use of cellular devices, such as smartphones, notebooks and tablets, has made our lives more comfortable. The Ericsson mobility report predicts that the number of cellular subscriptions will grow to 8.8 billion by 2024 [1]. This rapidly growing population of cellular users requires improved data rates and heterogeneous services in next generation networks. 5G is expected to handle various types of traffic, including periodic and delay-tolerant traffic for IoT devices and bursty traffic for delay-intolerant services [2,3]. The 3rd Generation Partnership Project (3GPP) has planned the standardization of the millimeter-wave (mm-wave) spectrum to meet users' demand for high bandwidth. However, communication over the high-frequency mm-wave bands requires a directional air interface with narrow beams to compensate for the high path loss. With a directional air interface, the UE has to search for the best beam pairs and align its beams with the next generation NodeB (gNB) [4]. In addition to beam searching, massive MIMO, higher-order modulation schemes and advanced coding techniques also increase the energy consumption of the UE in 5G networks.

Long term evolution (LTE) networks use the DRX mechanism to reduce the energy consumption of the UE [2,5]. DRX lets a UE save energy by switching off its radio circuitry when there is no incoming data. LTE-DRX turns off the radio during long and short sleep cycles, which have fixed durations. Packets that arrive during a sleep cycle are buffered at the evolved NodeB (eNB), which serves them after the sleep cycle is complete. Hence, LTE-DRX saves UE energy at the cost of delay [6,7]. LTE-DRX in its current form is not suitable for energy saving in 5G-enabled cellular devices [8,9] for two main reasons: it does not consider the beam searching required by multiple beam (directional) communication, and the length of its sleep cycles is fixed.


The authors of [3] propose a DRX scheme for 5G networks that requires the UE to search for the best beam pairs after each sleep cycle before packets can be served. This additional beam searching after every sleep cycle increases the energy consumption of the UE. The work in [10] suggests a beam-aware DRX approach for 5G-enabled machine-to-machine communications, in which the UE has prior information about the best beam pairs. Similarly, a beam-aware DRX mechanism for energy saving in 5G networks is proposed in [11]. However, beam-aware mechanisms may not perform well under UE mobility and beam misalignment [12]. Kwon et al. [12] show that the probability of beam misalignment increases with UE velocity: it is 0.1 at 30 km/h and 0.38 at 60 km/h.

Liu et al. [13] propose a DRX with a built-in beamforming state. The authors present the concept of DRX for multiple beam communications and use a semi-Markov model to design an eleven-state DRX. Their approach performs the beam training process only in case of beam misalignment and after a long sleep period. In this way, they save UE energy while minimizing the delay for 5G services. However, the fixed duration of the sleep cycles in their approach may lead to higher energy consumption.

On the other hand, recurrent neural networks (RNNs) have shown impressive results in predicting the next value of a given sequence [14]. Long short-term memory (LSTM) is a popular type of RNN that is specifically designed to learn the long-term dependencies of a sequence in order to predict its next value [15]. A sequence has long-term dependencies when its current output (prediction) depends on a long run of previous inputs rather than on only the single previous input.

Motivated by the success of RNNs in learning and predicting long-term dependent sequences in various applications [11,14,16], we use an RNN to extract the pattern of packet arrival times from real wireless traffic traces and to predict the next packet arrival time. Based on the prediction results, we propose an AI-DRX algorithm that works on a ten-state DRX model to enable energy saving. For multiple beam communications in 5G networks, AI-DRX saves UE energy by enabling dynamic short and long sleep cycles. Specifically, our key contributions to saving UE energy in the multiple beam communication scenario of 5G networks are as follows:


• We evaluate the performance of AI-DRX in terms of energy efficiency and mean delay. AI-DRX achieves an energy efficiency of 59% on trace 1 and 95% on trace 2 while respecting the mean delay requirements of different services.

The remainder of the paper is organized as follows. Section 2 presents an overview of the existing DRX mechanism and introduces RNNs, with a focus on the LSTM neural network. Section 3 proposes the AI-DRX algorithm with a ten-state DRX model to enable dynamic short or long sleep cycles in multiple beam communications of 5G networks. Section 4 analyzes the performance of AI-DRX in terms of energy efficiency and mean delay. Finally, Section 5 concludes our work.

#### **2. Related Work**

#### *2.1. DRX in LTE*

LTE networks use the DRX mechanism to minimize the power consumption of the UE. The energy expenditure of the UE can be reduced by switching off radio components when no incoming data packets are available [17]. LTE-DRX is configured and controlled by the eNB, which informs the UE to turn off its radio components when there is no data in the buffer. LTE-DRX is regulated by the radio resource control (RRC) layer at the eNB, and the RRC sends information about the packets to the UE via the physical downlink control channel (PDCCH). After a UE is turned on, RRC operates in one of two modes [18]: (1) RRC\_connected mode and (2) RRC\_idle mode. RRC\_idle mode is responsible only for paging operations; the UE neither receives nor transmits data but only monitors paging signals during paging occasions. In contrast, all data exchange between the UE and the eNB takes place in RRC\_connected mode. Since all transmissions take place in RRC\_connected mode and this mode consumes more UE energy, our work focuses on DRX in RRC\_connected mode. Figure 1 shows LTE-DRX in RRC\_connected mode, which works with two types of states: the active state and the sleep state.


**Figure 1.** LTE-DRX in RRC\_connected mode.

Based on these two states, different DRX parameters are configured in RRC\_connected mode according to the delay requirements of various services.

#### 2.1.1. Active State

A UE sends or receives data packets during the DRX active state. This state does not allow the UE to save energy, because the UE has to keep its radio circuitry on throughout the active state to receive data packets. The active state of DRX is controlled by two parameters: the active time and the inactivity timer.


All data packets are transmitted or received during the active time. The inactivity timer is a countdown timer in the DRX active state. It restarts every time a new data packet arrives at the eNB, which then serves the received packets to the UE. If no new packet arrives, the inactivity timer expires and the UE switches to the sleep state for a certain period.

#### 2.1.2. Sleep State

A UE turns off its radio components during the sleep state to save power. The UE cannot receive or send data packets during the DRX sleep state; it can only monitor the PDCCH for incoming data. DRX in the sleep state is controlled by the following parameters: ON time, short sleep cycle, short sleep timer, long sleep cycle and long sleep timer.


During the ON time, a UE monitors the PDCCH. The ON time always starts after each sleep cycle is completed. A short sleep cycle is a short period during which the UE saves energy by switching off its transceiver. The short sleep cycle is repeated up to *Nsc* times (the short sleep timer); after these *Nsc* cycles expire, the UE transits to the long sleep cycle. The long sleep cycle and long sleep timer are analogous to the short sleep cycle and short sleep timer, respectively, but last longer.

Figure 2 shows the timing diagram of LTE-DRX. As shown in Figure 2, all data transmission and reception take place during the active state, and the inactivity timer restarts after the reception of each new packet. If no new packet arrives before the inactivity timer expires, the UE switches to the short sleep cycle. After every short sleep cycle, the UE monitors the PDCCH for incoming packets during the ON time. If no new packet arrives before the short sleep timer expires, the UE switches to the long sleep cycle and remains there until it is notified of a new packet. If no new packet arrives before the long sleep timer expires, the UE transits from long sleep to the idle state.
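The fixed LTE-DRX timeline described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a standard-compliant DRX model: times are in milliseconds, no packet arrives before the inactivity timer expires, every sleep cycle is followed by an ON duration of fixed length, and the transition to idle after the long sleep timer is ignored. The function and parameter names are hypothetical.

```python
from itertools import chain, repeat


def lte_drx_on_starts(last_packet, t_in, t_on, short_cycle, n_sc, long_cycle, horizon):
    """Start times of the ON durations after the last served packet (fixed LTE-DRX).

    Hypothetical helper: the inactivity timer t_in runs first, then n_sc short
    sleep cycles, then long sleep cycles until `horizon`; each sleep cycle is
    followed by an ON duration of length t_on. long_cycle must be positive.
    """
    starts = []
    t = last_packet + t_in  # inactivity timer expires, first sleep begins
    for sleep in chain(repeat(short_cycle, n_sc), repeat(long_cycle)):
        t += sleep  # sleep cycle ends, ON duration begins
        if t > horizon:
            break
        starts.append(t)
        t += t_on  # ON duration ends, next sleep cycle begins
    return starts


def wakeup_delay(arrival, on_starts, t_on):
    """Buffering delay of a packet until the UE can notice it in an ON duration."""
    for s in on_starts:
        if arrival < s:
            return s - arrival  # buffered at the eNB until the next ON duration
        if arrival < s + t_on:
            return 0.0  # arrives while the UE is already monitoring the PDCCH
    return None  # no ON duration scheduled after this arrival


starts = lte_drx_on_starts(0, t_in=10, t_on=2, short_cycle=20, n_sc=2, long_cycle=100, horizon=300)
print(starts)                            # [30, 52, 154, 256]
print(wakeup_delay(60, starts, t_on=2))  # 94
```

The example shows the trade-off in the text: once the UE reaches the long sleep cycle, a packet arriving just after an ON duration waits almost a full long cycle.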

**Figure 2.** LTE-DRX timing diagram.

#### *2.2. Non-Compatibility of LTE-DRX for 5G Networks*

The LTE-DRX mechanism in its existing form is not suitable for 5G networks. The main reason is the directional communication in 5G networks, which requires a beamforming operation prior to data transmission [4], whereas the LTE-DRX mechanism does not include a beam searching operation [19,20]. Figure 3 shows the concept of directional communication with multiple beams in a 5G network. The gNB transmits *K* beams and the UE has *L* beams. Prior to the start of communication, the UE needs to search for and align the best beam pair from the *K* × *L* possible beam pairs [4]. This beam searching and alignment process is not included in LTE-DRX. Moreover, beam searching keeps the UE in the active state longer than in LTE-DRX, which also increases its power consumption [21]. Furthermore, the beamforming process in 5G requires additional time to search for and align the best beam pairs [22], which causes more delay than in LTE-DRX. Hence, directional communication in 5G is one reason why LTE-DRX, in its existing form, is not a suitable power-saving solution for 5G networks [21,22].
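To see why beam searching adds time and energy, consider a plain exhaustive search over all *K* × *L* beam pairs. The Python sketch below is illustrative only: the quality matrix is random placeholder data standing in for per-pair measurements, and real systems may use faster hierarchical procedures instead of this brute-force scan.

```python
import numpy as np


def exhaustive_beam_search(quality):
    """Return the best (gNB beam k, UE beam l) pair and the number of measurements.

    quality[k, l] is an assumed link-quality metric (e.g., received power in dB)
    measured for gNB beam k and UE beam l; every one of the K x L pairs is probed.
    """
    k, l = np.unravel_index(np.argmax(quality), quality.shape)
    return (int(k), int(l)), quality.size


rng = np.random.default_rng(0)
K, L = 8, 4
pair, n_meas = exhaustive_beam_search(rng.normal(size=(K, L)))
print(pair, n_meas)  # n_meas is K * L = 32
```

The number of measurements grows as *K* × *L*, which is why repeating this search after every sleep cycle is costly for the UE.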

The second main reason for the non-compatibility of the existing LTE-DRX mechanism with 5G networks is the fixed length of its sleep cycles [11]. LTE-DRX uses fixed-length short and long sleep cycles to reduce the power consumption of the UE, whereas 5G networks are expected to handle different types of services simultaneously [4]. These services may have different packet sizes, variable packet arrival times and variable transmission time intervals (TTIs) [23]. The fixed-length sleep cycles of LTE-DRX may not suit these 5G services, as they may under-utilize or over-utilize the TTI. Moreover, fixed-length sleep cycles may yield only small power savings in 5G-enabled devices [23]. Nevertheless, the DRX design can still be used in 5G [9,24,25].

**Figure 3.** Directional communication with multiple beams in 5G network.

#### *2.3. Recurrent Neural Network for Predicting Sequential Data*

Recurrent neural networks (RNNs) are a class of AI models widely used for prediction over sequential data [26]. The long short-term memory (LSTM) network is a popular kind of RNN that performs well in predicting long-term dependent time sequences [15]. LSTM differs from other kinds of RNN in its gated structure and the internal cell state in every unit, as shown in Figure 4. An LSTM network uses previous inputs to determine the values of its internal gates (forget gate, input gate, input modulation gate and output modulation gate), which contribute to the cell state. The cell state lets the LSTM remember or forget the impact of past inputs when producing its predictions. This is the main reason why LSTM performs better in predicting long-term dependent sequences [14,15]. Table 1 shows the notations used in our work.

An LSTM unit includes the forget gate $F\_T$, given by Equation (1). The output of the forget gate varies from 0 to 1 due to the sigmoid function $\mathrm{Sig}(T) = \frac{1}{1+e^{-T}}$. The weights $w\_x$ and $w\_h$ in Equation (1) are associated with the current input $x\_T$ and the previous output $h\_{T-1}$, respectively, and $b\_F$ is the bias of the forget gate. Similarly, the input gate $I\_T$ and the input modulation gate $G\_T$ are calculated by Equations (2) and (3), respectively. The hyperbolic tangent function used in the input and output modulation gates can be written as $\tanh(T) = 2\,\mathrm{Sig}(2T) - 1$ and ranges from −1 to +1. Moreover, $b\_I$ and $b\_C$ are the biases of the input gate and the cell state, respectively.

$$F\_T = \mathrm{Sig}(w\_x x\_T + w\_h h\_{T-1} + b\_F) \tag{1}$$

$$I\_T = \mathrm{Sig}(w\_x x\_T + w\_h h\_{T-1} + b\_I) \tag{2}$$

$$G\_T = \tanh(w\_x x\_T + w\_h h\_{T-1} + b\_C) \tag{3}$$

**Figure 4.** Gated structure of a single long short-term memory (LSTM) unit.


**Table 1.** List of abbreviations.

The cell state $C\_T$ and the output gate $O\_T$ are calculated by Equations (4) and (5), respectively. The final prediction result is given by the output modulation gate $h\_T$ in Equation (6).

$$C\_T = F\_T \otimes C\_{T-1} + I\_T \otimes G\_T \tag{4}$$

$$O\_T = \mathrm{Sig}(w\_x x\_T + w\_h h\_{T-1} + b\_O) \tag{5}$$

$$h\_T = O\_T \otimes \tanh(C\_T) \tag{6}$$
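A single LSTM time step following Equations (1)–(6) can be sketched in NumPy. This is an illustrative sketch, not the MATLAB network used in this work. One assumption differs from the notation above: as in the standard LSTM formulation, each gate gets its own input weights, recurrent weights and bias (otherwise all gates would share the same pre-activation up to the bias). A regression layer mapping $h\_T$ to a predicted arrival time would still be needed and is omitted; the weights here are random placeholders.

```python
import numpy as np


def sig(t):
    """Sigmoid Sig(T) = 1 / (1 + e^{-T})."""
    return 1.0 / (1.0 + np.exp(-t))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following Equations (1)-(6).

    W[g], U[g] and b[g] are the input weights, recurrent weights and bias of
    gate g in {"F", "I", "G", "O"} (separate weights per gate, an assumption).
    """
    pre = {g: W[g] @ x_t + U[g] @ h_prev + b[g] for g in "FIGO"}
    f_t = sig(pre["F"])             # forget gate, Eq. (1)
    i_t = sig(pre["I"])             # input gate, Eq. (2)
    g_t = np.tanh(pre["G"])         # input modulation gate, Eq. (3)
    c_t = f_t * c_prev + i_t * g_t  # cell state, Eq. (4); * is element-wise
    o_t = sig(pre["O"])             # output gate, Eq. (5)
    h_t = o_t * np.tanh(c_t)        # output, Eq. (6)
    return h_t, c_t


# Tiny example: 1-D input (an arrival time), 2 hidden units, random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 1, 2
W = {g: rng.normal(size=(n_hid, n_in)) for g in "FIGO"}
U = {g: rng.normal(size=(n_hid, n_hid)) for g in "FIGO"}
b = {g: np.zeros(n_hid) for g in "FIGO"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in [0.1, 0.2, 0.3]:
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
print(h.shape)  # (2,)
```

The cell state `c` is carried from step to step, which is what lets the unit keep information from inputs far back in the sequence.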

During training, the LSTM learns the relationship between the input and the desired output (the next value of the sequence) by adjusting the weight values in Equations (1)–(3) and (5). Once the model is trained to a minimal error, the learned weights are used in Equations (1)–(6) to compute the output values of the sequence (prediction). The error between the predicted and observed (actual) values is computed by the root mean square error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum\_{T=1}^{N} \left(\text{Predicted}\_T - \text{Observed}\_T\right)^2} \tag{7}$$

where *T* is the sample index and *N* is the total number of samples.
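Equation (7) translates directly into Python. The sample values below are placeholders (arrival times in seconds), chosen only to show that a single 12 ms error on one of three samples yields an RMSE of about 6.9 ms.

```python
import math


def rmse(predicted, observed):
    """Root mean square error, Eq. (7): sqrt(sum_T (Predicted_T - Observed_T)^2 / N)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length N")
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Arrival times in seconds; a 12 ms error on one of three samples.
print(rmse([0.100, 0.250, 0.400], [0.100, 0.250, 0.412]))  # about 0.0069 s
```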

#### **3. AI-DRX for Multiple Beam Communications**

#### *3.1. System Model*

In the 5G directional air interface, beam alignment is crucial whenever a UE wakes up to receive data or control packets [3]. Hence, our work considers DRX for multiple beam communications [13] with dynamic long and short sleep cycles instead of fixed-length sleep cycles. Dynamic sleep cycles have a different sleep time in each cycle rather than one static value for every cycle. These varying dynamic sleep times can be predicted from previous packet arrival times.

We model DRX as a ten-state model. These states range from *S*<sub>0</sub> to *S*<sub>9</sub> and are shown in Figure 5. Let us elaborate on the states below.


#### *3.2. Proposed AI-DRX Algorithm*

We propose an artificial intelligence-based DRX (AI-DRX) algorithm for multiple beam communications in 5G networks. The proposed algorithm enables dynamic long or short sleep cycles in DRX to reduce the power consumption of the UE. The AI-DRX algorithm works on the ten-state DRX model shown in Figure 5. Algorithm 1 describes the AI-DRX mechanism, which is elaborated below:


**Figure 5.** State diagram for AI-DRX.

**Algorithm 1** AI based DRX for Multiple Beam Communications.

```
 1: Input TON, ThMin, ThMax
 2: Examine buffer for incoming packets
 3: if Packets in buffer > 0 then
 4:     Serve the packets and Predict TDY (S1)
 5: else
 6:     if No Beam Misalignment then
 7:         Check TDY (S1)
 8:         if TDY < ThMin then
 9:             Go to Step 4 (S0)
10:         else
11:             if ThMin ≤ TDY < ThMax then
12:                 Go to Dynamic Short Sleep up to TDY (S2)
13:                 if Beam Misalignment then
14:                     Execute Beam Training (S7)
15:                     Feedback (S8)
16:                     ON (S9)
17:                     if No Beam Aligned then
18:                         Go to Step 14 (S7)
19:                     end if
20:                 else
21:                     if New packet arrives before completion of ON Time TDY then
22:                         Go to Step 4 (S9 to S0)
23:                     else
24:                         Go to Dynamic sleep for previous predicted time TDY (S2/S3)
25:                     end if
26:                 end if
27:             else
28:                 if TDY > ThMax then
29:                     Go to Dynamic Long sleep (S3)
30:                     Go to Step 14 (S7)
31:                 end if
32:             end if
33:         end if
34:     else
35:         if Beam Misalignment then
36:             Execute Beam Training (S4)
37:             Feedback (S5)
38:             Active after Beam Training (S6)
39:             if No Beam Aligned then
40:                 Go to Step 36 (S4)
41:             else
42:                 Go to Step 4 (S0)
43:             end if
44:         end if
45:     end if
46: end if
```
#### *3.3. AI-DRX for Enabling Dynamic Long and Short Sleep Cycles*

Algorithm 1 (AI-DRX) demonstrates the use of artificial intelligence in the implementation of DRX for 5G networks. AI-DRX creates dynamic short and long sleep cycles in DRX by using a trained LSTM model to predict the next packet arrival time. The training is conducted offline on two traces of real wireless traffic acquired from the University of Massachusetts (UMass) trace repository [27] and the Crawdad data set repository [28]. During training, the network learns the pattern of the increasing packet arrival time sequence in both traces. Once the LSTM network is trained offline to a minimal prediction error, the trained model predicts the next packet arrival time of real wireless traffic, and the AI-DRX algorithm uses this prediction to calculate the dynamic sleep cycles.

AI-DRX enables a UE to reduce its power consumption in the multiple beam communication scenario of 5G networks by enabling dynamic short or dynamic long sleep cycles, based on two threshold values, *ThMin* and *ThMax*. The values of *ThMin* and *ThMax* can be varied according to the delay requirements of various services. We discuss three cases of AI-DRX: (1) dynamic short sleep cycles; (2) dynamic long sleep cycles; (3) dynamic inactivity timer.

During the active time of AI-DRX, packets are served to the UE, and the trained LSTM model predicts the next packet arrival time (*TDY*) while the packets are being served. The UE checks for new packets in the buffer via the PDCCH during the ON state of AI-DRX. If no new packet is observed in the buffer, the UE continues to sleep up to *TDY*. However, if a new packet is observed in the ON state, the UE transits to the active time and receives the packet(s). Meanwhile, the inactivity timer in the active mode is restarted on the reception of every new packet. AI-DRX also handles the case of an empty buffer in the active state: the UE counts down the dynamic inactivity timer (the third case below) and then transits to a sleep cycle for the predicted sleep time (*TDY*).

In the first case, a dynamic short sleep cycle is enabled if *TDY* is greater than or equal to *ThMin* and less than *ThMax*; the UE then sleeps for the predicted time *TDY*. Beam misalignment is unlikely after a short sleep period [12,13], so AI-DRX performs the beam training process after a dynamic short sleep cycle only if the beams are misaligned. Figure 6 shows the timing diagram of a dynamic short sleep cycle using the AI-DRX algorithm.

**Figure 6.** Timing diagram of AI-DRX based dynamic short sleep cycle.

In the second case, a dynamic long sleep cycle lasting up to the predicted value *TDY* is enabled if *TDY* is greater than *ThMax*, as depicted in Figure 7. The dynamic long sleep cycle saves more energy than the short sleep cycle. However, beam misalignment is more likely after a dynamic long sleep cycle, so AI-DRX performs beam training and feedback after every dynamic long sleep cycle.

The third case deals with the dynamic inactivity timer. If the predicted value *TDY* is less than *ThMin*, the UE remains active until *TDY* elapses or a new packet arrives. The dynamic inactivity timer can be seen in Figures 6 and 7. AI-DRX also addresses beam misalignment during the active time by performing beam training and feedback whenever the beams become misaligned.
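The threshold logic of the three cases can be summarized in a few lines of Python. This is a hypothetical helper, not the authors' implementation: the text specifies long sleep only for *TDY* > *ThMax* and short sleep for *ThMin* ≤ *TDY* < *ThMax*, so treating *TDY* = *ThMax* as long sleep is an assumption, and *TDY* is assumed to be in the same unit as the thresholds.

```python
def ai_drx_mode(t_dy, th_min, th_max):
    """Map a predicted time T_DY to the AI-DRX case of Section 3.3.

    Returns (mode, always_beam_train): always_beam_train is True when beam
    training and feedback follow every such cycle; otherwise beam training
    runs only if the beams are found to be misaligned.
    """
    if t_dy < th_min:
        # Case 3: stay active up to T_DY or until a new packet arrives
        return "dynamic_inactivity_timer", False
    if t_dy < th_max:
        # Case 1: ThMin <= T_DY < ThMax, sleep for T_DY
        return "dynamic_short_sleep", False
    # Case 2: long sleep for T_DY (T_DY == ThMax placed here by assumption)
    return "dynamic_long_sleep", True


for t in (10, 50, 250):
    print(t, ai_drx_mode(t, th_min=20, th_max=100))
```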

Note that AI-DRX uses dynamic short and long sleep cycles instead of static, fixed-time sleep cycles. Moreover, the inactivity timer value is also dynamic, according to the prediction results of the trained LSTM model. The proposed algorithm keeps updating the predicted time value based on the most recently received packets and their arrival times.

**Figure 7.** Timing diagram of AI-DRX based dynamic long sleep cycle.

DRX saves UE power at the cost of delay. Hence, we consider two performance parameters: energy efficiency (EE) and mean delay. The EE of the UE is the ratio of the total dynamic (short and long) sleep time to the sum of the active time (*TAC*), ON time (*TON*), beam training time, feedback time, dynamic short sleep time and dynamic long sleep time. Since the beam training and feedback processes are counted in the active time (*TAC*) and both dynamic short and long sleep cycles take place during *TDY*, EE is calculated by Equation (8).

$$EE = \frac{T\_{DY}}{T\_{AC} + T\_{ON} + T\_{DY}}\tag{8}$$

Similarly, for a packet arrival rate $\lambda\_k$ and inactivity timer $T\_{IN}$, the mean holding period of the active state can be calculated as [5]:

$$T\_{AC} = \frac{1 - e^{-\lambda\_k T\_{IN}}}{e^{-\lambda\_k T\_{IN}}(1 - e^{-\lambda\_k})} \tag{9}$$

Furthermore, packets that arrive during the sleep state and ON duration are stored in the buffer until the next ON period. Since packet arrivals act as random observers of the sleep period and the ON duration, a packet waits on average half of each. Therefore, the mean delay is the sum of half the mean sleep time and half the ON duration:

$$\text{Mean Delay} = \frac{T\_{DY}}{2} + \frac{T\_{ON}}{2} \tag{10}$$
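Equations (8)–(10) can be evaluated with a few helper functions. This is a minimal sketch with hypothetical names and placeholder inputs; times are assumed to be in milliseconds, and `lam` in Equation (9) is read as an arrival rate per unit of the same time base as `t_in`.

```python
import math


def energy_efficiency(t_dy, t_ac, t_on):
    """Eq. (8): share of time spent in dynamic (short + long) sleep."""
    return t_dy / (t_ac + t_on + t_dy)


def mean_active_time(lam, t_in):
    """Eq. (9): mean holding period of the active state for arrival rate lam
    and inactivity timer t_in (time in the unit implied by lam)."""
    p_none = math.exp(-lam * t_in)  # Poisson probability of no arrival during t_in
    return (1 - p_none) / (p_none * (1 - math.exp(-lam)))


def mean_delay(t_dy, t_on):
    """Eq. (10): a random arrival waits on average half of T_DY and half of T_ON."""
    return t_dy / 2 + t_on / 2


print(energy_efficiency(t_dy=800, t_ac=150, t_on=50))  # 0.8
print(mean_delay(t_dy=200, t_on=10))                   # 105.0
print(round(mean_active_time(lam=0.1, t_in=10), 2))
```

The three functions make the trade-off explicit: a longer *TDY* raises EE in Equation (8) but also raises the mean delay in Equation (10).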

#### **4. Performance Analysis**

We use MATLAB R2019a to train and test the RNN on two different traces (data sets). These bursty traffic traces are taken from the Crawdad dataset repository (trace 1) [28] and the UMass trace repository (trace 2) [27]. Trace 2 shows the traffic pattern of HTTP and video streaming applications. The video streaming trace is used because over three-quarters of mobile data traffic is expected to be video traffic by 2024 [1]. Trace 1 and trace 2 include seven parameters, as shown in [11]: (1) serial number; (2) packet arrival time in seconds; (3) source address of the packet; (4) destination address of the packet; (5) protocol used; (6) packet length in bytes; and (7) additional information. We use the time parameter, which contains the packet arrival time, from both traces to train the LSTM network. Our purpose is to train the LSTM network until it learns the time interval pattern of the packet arrival times in a given sequence. Data pre-processing simplifies training and prevents it from diverging [11,29]. Hence, we standardize the training data to zero mean and unit variance during training, and we also standardize the test set at prediction time.
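The standardization step can be sketched in NumPy as a Python equivalent of the MATLAB pre-processing (not the authors' code). The text does not say which statistics are used for the test set; this sketch assumes the common practice of reusing the training mean and standard deviation, and the arrival times are placeholders.

```python
import numpy as np


def standardize(train, test):
    """Z-score standardization with statistics taken from the training set only."""
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma, (mu, sigma)


def destandardize(z, stats):
    """Map standardized predictions back to arrival times in seconds."""
    mu, sigma = stats
    return z * sigma + mu


train = np.array([0.10, 0.20, 0.30, 0.40])  # placeholder arrival times (s)
test = np.array([0.45, 0.50])
z_train, z_test, stats = standardize(train, test)
restored = destandardize(z_test, stats)  # recovers the original test values
```

Keeping `stats` lets the predicted (standardized) arrival times be converted back to seconds before they are used to compute sleep durations.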

Hyper-parameters of the LSTM network are selected manually to make training more efficient. They include the number of hidden units, the learning rate drop factor, the maximum number of epochs, the initial learning rate and the optimizer. A good selection of hyper-parameters leads to learning with a low prediction error. Training performance is measured in terms of RMSE: the lower the RMSE, the better the trained model predicts unseen data (test set). We select 200 and 125 hidden units in the LSTM network for trace 1 and trace 2, respectively. The learning rate drop factor is 0.2 for both traces. The maximum number of epochs is 600 for trace 1 and 1000 for trace 2. The initial learning rate is 0.004 for both traces, and the Adam optimizer is used for both traces. These hyper-parameters reduce the RMSE during the training and testing processes.

We consider a total of up to 130,820 samples (323.886 s) of trace 1, while trace 2 has 77,470 time samples (417.642 s) [11]. For both traces, 10% of the samples form the training set, and several test sets are randomly selected from the remaining 90%. Each test set has an equal number of samples. Figure 8a shows that the RMSE on the initial test set of trace 1 is as small as 12 ms, whereas Figure 8b shows an RMSE of 10 ms on the first random test set of trace 2.
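One way to realize this split is sketched below in Python. It rests on assumptions the text does not state: the training set is taken as the first 10% of the trace, and each test set is a contiguous window of 1454 samples, the length of the trace 1 test sets in Figure 9 (samples 70,500 to 71,953). The arrival times are uniformly random placeholders, not the real trace.

```python
import numpy as np


def split_trace(times, train_frac=0.1, test_len=1454, n_tests=3, seed=0):
    """Training set = first train_frac of the trace (assumption); test sets =
    random contiguous windows of test_len samples from the remaining samples."""
    n_train = int(len(times) * train_frac)
    rng = np.random.default_rng(seed)
    # upper bound is exclusive, so every window ends inside the trace
    starts = rng.integers(n_train, len(times) - test_len + 1, size=n_tests)
    tests = [times[s:s + test_len] for s in starts]
    return times[:n_train], tests


times = np.sort(np.random.default_rng(1).uniform(0, 323.886, 130_820))  # placeholder trace 1
train, tests = split_trace(times)
print(len(train), [len(t) for t in tests])  # 13082 [1454, 1454, 1454]
```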

**Figure 8.** Prediction result and root mean square error (RMSE) for first test set.

Figure 9 shows the RMSE values for random test sets from the remaining 90% of the samples of trace 1 and trace 2. Figure 9a,c show that the RMSE is as low as 6 ms and 5 ms on the random test sets spanning samples 70,500 to 71,953 and 90,700 to 92,153 of trace 1, respectively, whereas Figure 9b,d show RMSE values of 6 ms and 8 ms on samples 10,000 to 10,899 and 65,000 to 65,899 of trace 2, respectively.

AI-DRX can be implemented at the gNB of a 5G network. In the execution of AI-DRX, we consider the packet generation event, active event, dynamic long sleep event, dynamic short sleep event, ON duration event, beam searching event and feedback event, so that our approach enables dynamic sleep cycles in the multiple beam communication scenario of 5G networks. In our simulation, the packet generation event scans the time column of each trace and generates data packets of identical length at the same instants as in the respective trace. The generated data packets are collected in the buffer and served to the UE during the active event. The UE checks the buffer during the ON period; if there is any packet in the buffer, the UE switches to the active event, otherwise it continues to sleep. During an active event, the buffered packets are served to the UE once the beam pairs of the UE and gNB are aligned. At the same time, the packet arrival time is fed to the trained LSTM model to predict the next packet arrival time. The dynamic sleep duration is obtained by subtracting the previous packet arrival time from the predicted dynamic time value *TDY* (the next packet arrival time).
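The last step, turning predicted arrival times into sleep durations, can be sketched as follows. The function name and values are hypothetical placeholders, and clamping negative results to zero (a prediction earlier than the last observed arrival) is an assumption not stated in the text.

```python
def dynamic_sleep_durations(arrivals, predicted_next):
    """Sleep time = predicted next arrival time (T_DY) - previous arrival time.

    arrivals[i] is the observed arrival time of packet i and predicted_next[i]
    is the model's prediction, made at packet i, of when packet i + 1 arrives.
    Negative values are clamped to 0 (an assumption).
    """
    return [max(0.0, p - a) for a, p in zip(arrivals, predicted_next)]


arrivals = [0.000, 0.030, 0.200]        # seconds, placeholder values
predicted_next = [0.025, 0.180, 0.150]  # placeholder LSTM outputs
print(dynamic_sleep_durations(arrivals, predicted_next))
```

Each resulting duration would then be compared with *ThMin* and *ThMax* to choose between the dynamic inactivity timer, a dynamic short sleep or a dynamic long sleep.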

**Figure 9.** Prediction results and RMSE for random test sets. (**a**) Test set from trace 1 (samples: 70,500 to 71,953); (**b**) Test set from trace 2 (samples: 10,000 to 10,899).

Figure 10a,b show the energy efficiency and mean delay, respectively, of AI-DRX and LTE-DRX on trace 1 and trace 2 with varying ON period (*TON*). In Figure 10, *TON* is varied from 1 ms to 160 ms, with *ThMin* = 20 and *ThMax* = 100. Figure 10a shows that energy efficiency decreases as *TON* increases, because the UE waits longer in the ON state before switching to the active state to serve packets. Figure 10b shows that the mean delay for trace 1 rises from 150 ms to 194 ms as the ON period increases, because a longer ON duration increases the time the UE spends in the ON state, which results in higher delay. Moreover, the delay observed by the UE can be minimized by selecting the optimum value of *TON*. Figure 10a also shows that the energy efficiency on trace 2 is higher than on trace 1, due to the higher arrival rate of trace 1 compared with trace 2.

We compare the performance of AI-DRX with that of LTE-DRX. To implement LTE-DRX, we set the short sleep cycle to *ThMin* and the long sleep cycle to *ThMax* and feed trace 1 and trace 2 to the trained model. Figure 10a shows that the energy efficiency of AI-DRX on trace 1 is 69% higher than that of LTE-DRX, at the cost of higher delay. This is because AI-DRX calculates the sleep time from the real wireless traffic trace based on the arrival rate, while LTE-DRX uses *ThMin* and *ThMax* to set the short and long sleep times. For AI-DRX, small values of *ThMin* and *ThMax* achieve higher energy efficiency, as the UE easily transits to long sleep, which results in a larger delay. In contrast, for small values of *ThMin* and *ThMax* in LTE-DRX, the UE sleeps for a short period, which results in lower energy efficiency and a smaller mean delay. On trace 2, AI-DRX achieves 55% higher energy efficiency than LTE-DRX.

**Figure 10.** AI-DRX energy efficiency and mean delay with varying *TON* (*ThMin* = 20, *ThMax* = 100).

Figures 11 and 12 present the energy efficiency and mean delay with varying ON period *TON* for *ThMin* = 200 and *ThMax* = 1000, and for *ThMin* = 300 and *ThMax* = 1600, respectively. Figures 11 and 12 show that the energy efficiency and mean delay of LTE-DRX increase with *ThMin* and *ThMax*, as the UE sleeps longer. In contrast, the energy efficiency of AI-DRX decreases as *ThMin* and *ThMax* increase, because the UE cannot transit to the sleep state if *TDY* < *ThMin*. With a small sleep time *TDY* or large values of *ThMin* and *ThMax*, the UE remains in the active state, which results in lower energy efficiency.

**Figure 11.** AI-DRX energy efficiency and mean delay with varying *TON* (*ThMin* = 200, *ThMax* = 1000).

**Figure 12.** AI-DRX energy efficiency and mean delay with varying *TON* (*ThMin* = 300, *ThMax* = 1600).

To validate our proposal, we compared it with the traditional Poisson arrival model. We generated three data sets with Poisson arrival rates of mean *λ* = 1/20, *λ* = 1/10, and *λ* = 1, and trained the model on these Poisson arrivals. The generated traces were fed to the AI-DRX algorithm to analyze the energy efficiency and mean delay. Figure 13a,b show, respectively, the energy efficiency and mean delay for AI-DRX on trace 2 and for Poisson arrivals with varying *TON*. The energy efficiency of AI-DRX is 70% higher than that obtained with Poisson arrivals at *λ* = 1/20, while its mean delay is, on average, 100 ms higher. The gain in energy efficiency arises because AI-DRX uses the actual arrival rate of real traffic to select the sleep cycles and inactivity timer, whereas the Poisson model considers only the mean arrival rate. Both the energy efficiency and the mean delay are zero for the highest Poisson arrival rate, *λ* = 1, because at such a high arrival rate the UE never transitions to the sleep state to save power.

(**a**) Energy Efficiency.

(**b**) Mean Delay.

**Figure 13.** AI-DRX energy efficiency and mean delay comparison with Poisson Arrival with varying *TON* (*ThMin* = 300, *ThMax* = 1600).
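The Poisson baseline can be reproduced in outline by drawing exponentially distributed inter-arrival times. The sketch below is a minimal illustration, assuming *λ* is expressed in packets per millisecond (so *λ* = 1/20 means one packet every 20 ms on average); the helper `poisson_trace` is hypothetical and does not reproduce the authors' trace format or the AI-DRX training pipeline.

```python
import numpy as np

def poisson_trace(lam, n_packets, seed=0):
    """Arrival times (ms) of a Poisson process with rate `lam` (packets per ms, assumed).

    Inter-arrival times of a Poisson process are i.i.d. exponential with mean 1/lam.
    """
    rng = np.random.default_rng(seed)
    inter_arrivals = rng.exponential(scale=1.0 / lam, size=n_packets)
    return np.cumsum(inter_arrivals)

# The three rates considered in the text; the unit (packets per ms) is an assumption.
rates = {"1/20": 1 / 20, "1/10": 1 / 10, "1": 1.0}
traces = {name: poisson_trace(lam, n_packets=10_000) for name, lam in rates.items()}

for name, t in traces.items():
    print(f"lambda = {name}: mean inter-arrival time = {np.mean(np.diff(t)):.2f} ms")
```

Under this unit assumption, *λ* = 1 leaves a mean gap of only about 1 ms between packets, which helps explain why the UE never reaches the sleep state at that rate.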

Different services in a wireless network can tolerate different delay levels without compromising quality of service (QoS). The QoS class identifier (QCI) identifies the characteristics of traffic and specifies QoS through two parameters: (1) the packet loss rate (PLR) and (2) the packet delay budget (PDB). The PDB is the maximum time a packet may wait during its delivery from the eNB to the UE. Standardized QCI characteristics are listed in Table 2 [6,30,31]. For various non-real-time services, such as email and web browsing, the UE does not need to monitor the Physical Downlink Control Channel (PDCCH) continuously after a certain period [6,32]. Hence, these services can tolerate delays of up to 300 ms [6,30,31]. They call for DRX with larger sleep times (*TDY* > *ThMin* and *TDY* > *ThMax*) and smaller ON timers for better energy efficiency. Real-time services such as voice and live video streaming, by contrast, cannot tolerate delay [33], so for them delay should take priority over energy saving. If the mean delay is limited to 125 ms, an energy efficiency of 95.02% is obtained by selecting *ThMin* = 200, *ThMax* = 1000, and *TON* = 6 ms. Increasing *TON* reduces the energy efficiency of the UE because the UE remains in the active state longer. The mean delay also increases with *TON*, because the UE does not receive data during the ON period but only monitors the PDCCH. When *TON* = 80 ms, the mean delay observed by the UE increases to 150 ms, with an energy efficiency of 80.2%. The network can therefore maximize the energy efficiency of the UE by selecting optimum values of *ThMin*, *ThMax*, and *TON* according to the QCI of each service.


**Table 2.** Standardized QoS class identifier (QCI) characteristics in LTE/LTE-A [6,30,31].
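Choosing *ThMin*, *ThMax*, and *TON* per service can be viewed as a constrained selection: among measured operating points, keep those whose mean delay fits the service's delay budget and pick the most energy-efficient one. The sketch below is hypothetical (the `DrxConfig` type and `select_config` helper are not from this work). It uses mean delay as a stand-in for the QCI packet delay budget, and its two candidate points are the values quoted above, with the thresholds of the *TON* = 80 ms point assumed to be *ThMin* = 200 and *ThMax* = 1000 because the text does not restate them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class DrxConfig:
    th_min: int            # ThMin threshold
    th_max: int            # ThMax threshold
    t_on_ms: float         # ON period T_ON (ms)
    energy_eff: float      # measured energy efficiency (%)
    mean_delay_ms: float   # measured mean delay (ms)

def select_config(candidates: Sequence[DrxConfig],
                  delay_budget_ms: float) -> Optional[DrxConfig]:
    """Most energy-efficient configuration whose mean delay fits the budget, else None."""
    feasible = [c for c in candidates if c.mean_delay_ms <= delay_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.energy_eff)

# Operating points quoted in the text (thresholds of the second point assumed).
candidates = [
    DrxConfig(200, 1000, 6.0, 95.02, 125.0),
    DrxConfig(200, 1000, 80.0, 80.2, 150.0),
]

print(select_config(candidates, delay_budget_ms=125.0))
print(select_config(candidates, delay_budget_ms=100.0))  # None: no operating point meets a 100 ms budget
```

In practice the candidate list would come from sweeps such as Figures 10 to 12, and the budget from the QCI of the service.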

#### **5. Conclusions**

In this work, we have proposed an AI-based DRX mechanism for energy saving in a multi-beam communication scenario. We modeled DRX as a ten-state model and proposed the AI-DRX algorithm based on these ten states. AI-DRX reduces the power consumption of the UE in 5G networks by enabling dynamic short and long sleep cycles. We trained an LSTM network, a popular type of RNN, to extract the packet arrival time pattern from real wireless traffic traces, and then used the learned model in the AI-DRX algorithm for energy saving in 5G-enabled devices. Extensive training with selected hyper-parameters achieves a minimum RMSE of 5 ms on a random test set from trace 1 and 6 ms on a random test set from trace 2. The energy efficiency obtained with AI-DRX is approximately 60% and 95% for trace 1 and trace 2, respectively. Compared with LTE-DRX, AI-DRX achieves 69% higher energy efficiency on trace 1 and 55% higher energy efficiency on trace 2. We also validated the performance of AI-DRX against the traditional Poisson packet arrival model: on trace 2, AI-DRX attains 70% higher energy efficiency than with Poisson packet arrivals at *λ* = 1/20.

**Author Contributions:** Conceptualization, M.L.M. and M.K.M.; methodology, A.R.; investigation, M.L.M. and M.K.M.; resources, M.L.M.; data curation, M.L.M.; writing—original draft preparation, M.L.M. and M.K.M.; writing—review and editing, N.S. and A.R.; supervision, N.S., A.R. and D.R.S.; project administration, D.R.S.; funding acquisition, D.R.S.

**Funding:** This research received no external funding.

**Acknowledgments:** This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1B03935633).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Planar Array Diagnostic Tool for Millimeter-Wave Wireless Communication Systems**

#### **Oluwole J. Famoriji <sup>1,\*</sup>, Zhongxiang Zhang <sup>2</sup>, Akinwale Fadamiro <sup>1</sup>, Rabiu Zakariyya <sup>1</sup> and Fujiang Lin <sup>1</sup>**


Received: 8 October 2018; Accepted: 23 November 2018; Published: 3 December 2018

**Abstract:** In this paper, a diagnostic procedure based on Bayesian compressive sensing (BCS) is proposed for identifying failed elements in millimeter-wave planar antenna arrays. Given adequate a priori knowledge of the radiation pattern of the reference antenna array, the diagnostic problem of identifying faulty elements was formulated. Sparse recovery algorithms available in the literature, including total variation (TV), the mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm, and ℓ<sub>1</sub>-norm minimization, have been used to diagnose the array under test (AUT) from measurement points, providing faster and better diagnostic schemes than traditional mechanisms such as the back propagation algorithm and the matrix method algorithm. However, these approaches lose effectiveness and reliability with noisy data and still require a large number of measurement points. To overcome these problems, a BCS-based methodology was adopted in this paper. Planar array diagnosis from far-field radiation pattern samples was formulated as a sparse signal recovery problem, in which BCS recovers the locations of the faults using a relevance vector machine (RVM). The resulting BCS approach was validated through simulations and experiments to provide suitable guidelines for users, as well as insight into the features and potential of the proposed procedure. A *Ka*-band (28.9 GHz) 10 × 10 rectangular microstrip patch antenna array that emulates failures with zero excitation was designed for far-field measurements in an anechoic chamber, and both simulated and measured far-field samples were used to test the proposed approach. The technique is shown to diagnose the array with fewer measurements, provided that the radiation pattern of the reference array is known a priori and the number of faults is relatively small compared with the array size. Its effectiveness and reliability are verified both experimentally and via simulation. In addition to faster diagnosis and better reconstruction accuracy, the BCS-based technique is more robust to additive noise than other compressive sensing methods. The proposed procedure can be applied to next-generation transceivers, aerospace systems, radar systems, and other communication systems.

**Keywords:** far-field; antenna array; diagnosis procedure; noisy data; BCS; millimeter-wave

#### **1. Introduction**

Antenna arrays are a key technology component in various communication systems, such as radar, radio astronomy, remote sensing, satellite communications, and next-generation (fifth-generation, 5G) wireless communications [1], where very large numbers (in the hundreds) of radiating elements are used to meet increasing demands for high radiation performance and reconfigurability [2]. However, the more radiating elements there are in the beamforming configuration, the higher the probability of element failure. Failures cause abrupt field variations across the array aperture and distort the radiation features (e.g., beamwidth, peak sidelobe level, and boresight). Reliable and effective diagnosis tools for large arrays are therefore valuable, because manual dismantling and replacement are time-consuming and costly, and even infeasible in satellite-borne installations. Failure identification in antenna arrays is currently an important research domain, both theoretically and practically, and the detection of faulty elements is of great interest in both military and civilian markets. Upcoming technologies adopt active or passive antenna arrays with a large number of elements [1–5]. For instance, millimeter-wave transceivers implement multiple-input multiple-output (MIMO) features and beamforming for future 5G applications, as shown in Figure 1. The block diagram shows the location of the AWMF-0108 in a 5G MIMO system [6]; the integrated circuit (IC) blocks in the circle provide gain and phase control with amplification and RX/TX switching. The AWMF-0108 is the first industrial and commercial millimeter-wave quad-core IC transceiver for 5G applications [6], and many antennas compatible with it have been designed. Thus, some communication systems will evolve toward 5G technology even before full deployment, which is not expected until 2020. The many elements of the planar antenna required by such a transceiver must all function correctly, because element failures degrade the far field of the antenna system. Detecting failed elements from field measurements taken at a suitable observation point is therefore essential for re-calibrating the feeding network and restoring the required radiation characteristics by reconfiguring the excitations of the healthy elements [1,3]. Antenna testing becomes necessary whenever elements fail, so fast diagnosis of complex antenna structures is a fundamental need.

Far-field measurements are a very powerful approach for antenna array testing: the measured data are processed to identify probable failures in the array under test (AUT). The matrix method algorithm (MMA) and back propagation algorithm (BPA) are the most commonly used mechanisms for detecting the number and positions of defective elements, using a healthy reference antenna and the defective AUT. BPA [7] is based on the Fourier relationship between the radiated far field and the field on the array aperture, and it is applicable to planar antenna arrays. MMA [8,9] is a generalized form of BPA that uses standard linear-algebra tools to stabilize the inversion of the matrix relating the array aperture field to the radiated far field. However, MMA and BPA demand a large number of measurements, which leads to long post-processing for large arrays. One way to mitigate this problem is to use a priori knowledge of the failure-free array, so that only the defective array elements need to be identified. The resulting diagnostic problem is solved with available routines whose computation time is slightly longer than that of the standard methods. However, the total time needed to diagnose an antenna array depends mainly on the measurement time, which is orders of magnitude longer than the post-processing time. This is why sparse recovery-based methods, which require fewer measurements, provide faster antenna array diagnosis.
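The Fourier relationship exploited by BPA can be illustrated with an idealized sketch. It assumes a uniform planar array with half-wavelength spacing, far-field samples placed exactly on the grid *u<sub>p</sub>* = 2*p*/*M*, *v<sub>q</sub>* = 2*q*/*N* (where the array factor reduces to a 2-D DFT), a uniform reference excitation, and hypothetical failure locations and noise level. It is not the implementation of [7]; it only shows why a direct inversion needs as many far-field samples as there are aperture unknowns (here *M* × *N* = 100).

```python
import numpy as np

def far_field_samples(excitations):
    """Array factor sum_mn a_mn exp(j*pi*(m*u + n*v)) of an M x N array with
    lambda/2 spacing, sampled at u_p = 2p/M, v_q = 2q/N (a 2-D DFT)."""
    M, N = excitations.shape
    return M * N * np.fft.ifft2(excitations)   # numpy's ifft2 uses exp(+j...) and 1/(MN)

def back_propagate(far_field):
    """BPA-style inversion: recover the aperture excitations from the far-field samples."""
    M, N = far_field.shape
    return np.fft.fft2(far_field) / (M * N)

rng = np.random.default_rng(1)
M = N = 10
reference = np.ones((M, N), dtype=complex)     # healthy reference array (assumed uniform)
aut = reference.copy()
for m, n in [(2, 7), (6, 3)]:                  # hypothetical failed elements
    aut[m, n] = 0.0                            # complete failure (zero excitation)

noise = 0.01 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
measured = far_field_samples(aut) + noise

recovered = back_propagate(measured)
diff = np.abs(recovered - reference)           # differential aperture field
detected = [(int(m), int(n)) for m, n in np.argwhere(diff > 0.5)]
print(detected)  # [(2, 7), (6, 3)]
```

In a real measurement only the visible region *u*<sup>2</sup> + *v*<sup>2</sup> ≤ 1 is accessible, so the full DFT grid used here is an idealization.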

Recently, compressive sensing has rapidly emerged as a promising technique for solving sparse recovery problems [10–20]. Within this context, the suitability of compressive sensing for the array diagnosis problem was examined in References [3,12,14,16,18,21]. In practice, the distribution of faulty elements in an array is highly sparse, since failures account for only a few non-null entries in the excitation vector of the transmit/receive modules. Starting from this hypothesis, ℓ<sub>1</sub>-norm minimization was successfully applied to detect failures in planar arrays from a small number of near-field [9] or far-field [14] measurements. However, deterministic compressive sensing techniques require the measurement matrix to satisfy the restricted isometry property (RIP), whose assessment for large matrices remains an open challenge [3,14,21]. An alternative is the probabilistic compressive sensing approach reported in Reference [21] for diagnosing linear arrays from far-field measurements. However, most of these techniques have not been tested experimentally.

**Figure 1.** *Ka*-band active antenna array formation which integrates multiple-input multiple-output (MIMO) and beamforming.

In this work, the excitation levels of the antenna elements were not retrieved directly; instead, we estimated the field distribution on the array aperture. This allows us to identify modifications of the aperture field caused by factors that cannot be described as simple element failures, such as reflections from the array and its feed. The drawback of retrieving more information about the AUT is the larger number of unknowns. However, in sparse recovery methods, the required number of measurements increases only logarithmically with the number of unknowns [10–13], so field reconstruction benefits more from sparse recovery-based mechanisms. Different sparse recovery algorithms for antenna array diagnosis were presented in [14] and compared with the traditional BPA and MMA. In particular, the total variation (TV) norm, the mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm, and ℓ<sub>1</sub>-norm minimization were used to solve the resulting inversion problems. From the field reconstructed on the antenna aperture, Fuchs et al. [20] obtained an accurate antenna diagnosis. The approach was applied to far-field simulation data generated from a 100-element antenna array, and its performance was evaluated and compared with the two standard techniques (BPA and MMA) under different conditions. The approaches were also applied to far-field measurement data of an antenna array with failures to demonstrate the practical applicability of the proposed schemes. Although there have been many more works on sparse recovery methods for antenna array diagnosis in applied electromagnetics and microwaves [13,14], only a few of them reported experimental data, which are fundamental for testing any procedure.

In References [15–20], differential scenarios with sparse recovery algorithms were employed to perform antenna diagnosis and retrieve element excitations. Reference [21] proposed a joint scheme for adaptive diagnosis of antenna arrays that fuses communication signals (a radar-communication scheme) with the echoes of probe signals received at the same antenna. This method also addressed the antenna diagnosis problem at low signal-to-noise ratios (SNRs) to ensure optimal performance of smart sensors in wireless sensor networks. Reference [22] addressed array diagnosis at millimeter waves using compressive sensing. That work considered both full and partial blockages caused by a variety of particles (such as ice, water droplets, salt, and dirt), and the technique jointly estimated the locations of the blocked elements and the induced phase shifts and attenuations, given prior knowledge of the angles of departure/arrival. Reference [23] proposed a deterministic sampling strategy for failure detection in uniform linear arrays via compressed sensing. The strategy extends the Weyl formula, which applies to a prime number of elements, so that it is valid for any number of array elements, and it is suited to sparse electromagnetic (EM) problems involving Fourier matrices. Reference [24] reviews the capabilities of sparse recovery by analyzing how compressive sensing can be applied to antenna array synthesis, diagnosis, and processing, and illustrates a set of application examples, including direction-of-arrival estimation, along with present challenges and current trends in applying compressive sensing to innovative and traditional antenna array problems. In general, compressive sensing deals with only a few (nonzero) unknowns; however, it needs a comprehensive array model with exact knowledge of the radiating element patterns to produce useful results. The formulation exploits sparsity with respect to the whole array structure and requires a priori information to recast the problem as an ℓ<sub>1</sub>-norm minimization. Moreover, these techniques are applicable only if the relationship between the data and the unknowns satisfies the restricted isometry property (RIP). To overcome this challenge, the Bayesian compressive sensing (BCS) approach is adopted. BCS has been explored in many electromagnetic problems, such as antenna design and synthesis [25,26], microwave imaging [27–30], and direction-of-arrival estimation [31–33]. In this paper, it is employed to estimate the number, magnitude, and location of failures in antenna arrays from far-field measurements. The BCS approach has been applied to diagnose large linear arrays [11] and, more recently, planar array configurations [34]; however, no experimental validation was reported. Hence, there is a need for a more reliable procedure tested both experimentally and via simulation, because experiments are fundamental tests of any given procedure.

Specifically, this work extends that described in References [15,16]. The BCS method is applied to both simulated and measured far-field data of a millimeter-wave 100-element microstrip patch antenna array in which failures were intentionally introduced. A new regularization technique was introduced and applied to the field distribution to enhance the efficiency and reliability of antenna array diagnosis. The proposed BCS-based approach is preferable because of its speed and robustness under different noise conditions. The key contributions of this paper are summarized as follows:


However, the approach has some limiting conditions: it diagnoses the array with few measurements provided that the radiation pattern of the reference array is known a priori and the number of faults is relatively small compared with the array size. The remainder of this paper is organized as follows: Section 2 formulates the antenna array diagnosis problem. Section 3 presents compressed sparse recovery methods. Resolution via the BCS-based approach is given in Section 4. Numerical simulations are presented in Section 5. Diagnoses from experimental data are presented and discussed in Section 6. Finally, conclusions are drawn in Section 7.

#### **2. Antenna Array Diagnosis Problem Formulation**

Consider an antenna array in space (Figure 2a). Its radiated far field is characterized by amplitude and phase. The AUT is depicted in Figure 2b, and all parameters associated with it are marked with the superscript "*u*". Specifically, *E<sup>u</sup>*(*x*, *y*) is the tangential field on the antenna aperture, i.e.,

$$E^u(x, y) = E^u\_x(x, y)\,\hat{\mathbf{x}} + E^u\_y(x, y)\,\hat{\mathbf{y}},\tag{1}$$

where *E<sup>u</sup><sub>x</sub>*(*x*, *y*)**x̂** and *E<sup>u</sup><sub>y</sub>*(*x*, *y*)**ŷ** are the *x*- and *y*-components of the aperture electric field, respectively. The far field *F<sup>u</sup>*(*r*, *θ*, *φ*) is measured on part of the hemispherical surface (0 ≤ *θ* ≤ *π*/2, 0 ≤ *φ* ≤ 2*π*) at radius *r* from the phase center of the AUT, with *r* > 2*D*<sup>2</sup>/*λ*, where *D* is the diameter of the aperture. The amplitude and phase of a reference array (RA; the array without failures), shown in Figure 2a, are also assumed to be available; the associated quantities are marked with the superscript "*o*". *E<sup>o</sup>*(*x*, *y*) is the field on the aperture Σ of the RA, and *F<sup>o</sup>*(*r*, *θ*, *φ*) is its far-field radiation. For the differential antenna (DA) shown in Figure 2c, the tangential distribution *E*(*x*, *y*) on the aperture Σ equals the difference between the field distributions of the AUT and the RA, and the corresponding far field *F*(*r*, *θ*, *φ*) is the difference between the far fields of the AUT and the RA:

$$E(x, y) = E^{u}(x, y) - E^{o}(x, y),\tag{2}$$

$$F(r, \theta, \phi) = F^{u}(r, \theta, \phi) - F^{o}(r, \theta, \phi). \tag{3}$$

**Figure 2.** Antenna array: (**a**) reference antenna without failures; (**b**) antenna under test (AUT); (**c**) differential antenna (DA). Two of the *N* = 21 elements have failed.

In the differential antenna, only the region of the aperture whose field has been modified by a failure radiates. Faulty elements in the AUT can therefore be identified by inspecting the field distribution on the DA.
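The differential formulation works because the far field is linear in the excitations: subtracting the reference far field from that of the AUT leaves only the contribution of the failed elements. The minimal sketch below uses a linear array and assumes *N* = 21 elements as in Figure 2, half-wavelength spacing, uniform reference excitations, and two hypothetical failed elements (indices 4 and 15); `array_factor` is an illustrative helper, not code from this work.

```python
import numpy as np

def array_factor(excitations, positions_wl, u):
    """Far field of a linear array: sum_n a_n * exp(j * 2*pi * (x_n / lambda) * u)."""
    phase = np.exp(1j * 2 * np.pi * np.outer(u, positions_wl))  # shape (len(u), N)
    return phase @ excitations

N = 21
x_wl = 0.5 * np.arange(N)               # element positions in wavelengths (lambda/2, assumed)
u = np.linspace(-1, 1, 41)              # observation directions u = sin(theta)

reference = np.ones(N, dtype=complex)   # reference array excitations (assumed uniform)
aut = reference.copy()
aut[[4, 15]] = 0.0                      # two hypothetical complete failures

F_ref = array_factor(reference, x_wl, u)
F_aut = array_factor(aut, x_wl, u)

# Differential antenna, Eq. (2): the excitation difference is nonzero only at the failures ...
e_diff = aut - reference
print(np.flatnonzero(e_diff))           # [ 4 15]
# ... and, by linearity, its far field equals the difference of the far fields, Eq. (3).
print(np.allclose(array_factor(e_diff, x_wl, u), F_aut - F_ref))  # True
```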

#### *2.1. Number of Far-Field Measurement Points Required*

BPA and MMA require a large number of measurement points. In the differential antenna, the field is assumed to be localized, i.e., the unknown to be retrieved is very sparse, as shown in Figure 2c, since in practice there are far fewer failures than the total number of elements *N*. Sparse recovery algorithms estimate **x** from fewer measurement points than the standard mechanisms require, so the number of measurement points can, in theory, be reduced. Fuchs et al. [20] demonstrated this using total variation (TV), the mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm, and ℓ<sub>1</sub>-norm minimization. However, better algorithms that require fewer far-field measurement points for faster array diagnosis are still in demand.

#### *2.2. Signal-to-Noise Ratios (SNRs)*

Total variation (TV), mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm, and ℓ<sub>1</sub>-norm minimization are the leading compressive techniques, but they exhibit low efficiency and reliability in antenna diagnosis at low SNRs. This drawback motivates a diagnosis procedure that is more robust to noisy data.

#### **3. Compressed Sparse Recovery Methods**

The essence of regularizing the matrix inversion is to introduce a priori information into the inversion. This is achieved by minimizing a selected ℓ<sub>*q*</sub> norm of the solution **X** while keeping the data misfit below a tolerance. The optimization problem to be solved is then

$$\min\_{\mathbf{X}} \|\mathbf{X}\|\_{q} \quad \text{subject to } \|\mathbf{y} - \mathbf{A}\mathbf{X}\|\_{2} \le \gamma, \tag{4}$$

where ‖·‖<sub>*q*</sub> denotes the ℓ<sub>*q*</sub> norm and *γ* depends on the noise and other factors affecting the data. Various routines are available to solve the convex optimization problem of Equation (4) efficiently, such as those in References [25–27]. Three norms, each selected to encode a priori knowledge of the differential antenna set-up for the diagnosis problem, are described below for regularizing the inversion. We applied them to diagnose both the simulated and the measured radiating antennas.

#### *3.1. Total Variation (TV) Norm*

Based on a priori knowledge, the solution **X** is expected to be flat and nearly zero, apart from a few discontinuities caused by the failures. Hence, the TV norm is a suitable regularizer for **X** [27]: minimizing the TV norm minimizes the gradient of **X**, which has a smoothing effect. For a two-dimensional complex dataset **X** ∈ ℂ<sup>*M*×*N*</sup>, the TV norm is

$$\|\mathbf{X}\|\_{TV} = \sum\_{m,n} \left( |X\_{m+1,n} - X\_{m,n}| + |X\_{m,n+1} - X\_{m,n}| \right) = \|\text{vec}(\nabla\_{x}\mathbf{X})\|\_{1} + \|\text{vec}(\mathbf{X}\nabla\_{y})\|\_{1}. \tag{5}$$

Here, vec(**X**) is the *MN* × 1 vector obtained by stacking the columns of **X** beneath each other. The gradient matrices ∇<sub>*x*</sub> and ∇<sub>*y*</sub>, of size (*M* − 1) × *M* and *N* × (*N* − 1), respectively, are

$$
\nabla\_x = \begin{bmatrix}
-1 & 1 & & 0 \\
 & \ddots & \ddots & \\
0 & & -1 & 1
\end{bmatrix}, \quad \text{and}
$$

$$
\nabla\_y = \begin{bmatrix}
-1 & & 0 \\
1 & \ddots & \\
 & \ddots & -1 \\
0 & & 1
\end{bmatrix}.
$$

Then, the optimization problem in Equation (4) transforms to

$$\min\_{\mathbf{X}} \|\mathbf{X}\|\_{TV} \quad \text{subject to } \|\mathbf{y} - \mathbf{A}\,\text{vec}(\mathbf{X})\|\_2 \le \varepsilon. \tag{6}$$
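Equation (5) can be checked numerically. The sketch below builds the gradient operators as forward-difference matrices (so ∇<sub>*x*</sub> is (*M* − 1) × *M* and ∇<sub>*y*</sub> is *N* × (*N* − 1) here) and evaluates the anisotropic TV norm. The 8 × 8 test aperture containing a single nonzero 3 × 3 patch is a hypothetical example, not data from this work.

```python
import numpy as np

def grad_x(M):
    """(M-1) x M forward-difference matrix: rows [-1, 1] shifted along the diagonal."""
    return np.eye(M - 1, M, k=1) - np.eye(M - 1, M)

def tv_norm(X):
    """Anisotropic TV norm of Eq. (5): ||vec(grad_x X)||_1 + ||vec(X grad_y)||_1."""
    M, N = X.shape
    Dx = grad_x(M)          # differences between consecutive rows (index m)
    Dy = grad_x(N).T        # N x (N-1): differences between consecutive columns (index n)
    return np.abs(Dx @ X).sum() + np.abs(X @ Dy).sum()

# Flat (zero) differential aperture with one faulty 3 x 3 patch.
X = np.zeros((8, 8), dtype=complex)
X[2:5, 4:7] = 1.0
print(tv_norm(X))  # 12.0: one unit jump per edge sample along the patch perimeter

# Cross-check against the explicit double sum of Eq. (5).
explicit = np.abs(np.diff(X, axis=0)).sum() + np.abs(np.diff(X, axis=1)).sum()
print(np.isclose(tv_norm(X), explicit))  # True
```

A TV solver penalizes exactly these jumps, which is why a flat aperture with a few compact faulty patches has a small TV norm.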

#### *3.2. The ℓ<sub>1</sub> Norm*

Since the solution **X** is sparse, the search space can be drastically reduced by introducing this a priori knowledge into the inversion. The ℓ<sub>1</sub> norm (‖**X**‖<sub>1</sub> = ∑<sub>*k*</sub>|*x<sub>k</sub>*|) is the standard convex surrogate of the ℓ<sub>0</sub> quasi-norm, which counts the nonzero entries of a vector and thus measures its sparsity. As a result, ℓ<sub>1</sub>-norm minimization is an efficient way to promote sparse solutions [2,5,10,19]. The regularized problem is then

$$\min\_{\mathbf{X}} \|\mathbf{X}\|\_1 \quad \text{subject to } \|\mathbf{y} - \mathbf{A}\mathbf{X}\|\_2 \le \varepsilon. \tag{7}$$

Minimizing the ℓ<sub>1</sub> norm enforces pointwise sparsity on each sample *x<sub>k</sub>* of the field on the aperture of the DA.
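Equation (7) is usually handed to a convex solver. As a lightweight stand-in, the sketch below applies the iterative shrinkage-thresholding algorithm (ISTA) to the closely related Lagrangian form min ½‖**y** − **AX**‖<sub>2</sub><sup>2</sup> + *μ*‖**X**‖<sub>1</sub>, which is not the constrained problem of Equation (7) itself. The random matrix **A**, the failure indices, the noise level, and *μ* are all hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding (prox of t*||.||_1): shrink magnitudes by t, keep phases."""
    mag = np.abs(z)
    return np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12) * z

def ista(A, y, mu, n_iter=3000):
    """ISTA for the Lagrangian problem min_x 0.5*||y - A x||_2^2 + mu*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - y)             # gradient of the data-fit term
        x = soft_threshold(x - step * grad, step * mu)
    return x

rng = np.random.default_rng(0)
n_meas, n_unknowns = 50, 100                        # fewer measurements than unknowns
A = (rng.standard_normal((n_meas, n_unknowns))
     + 1j * rng.standard_normal((n_meas, n_unknowns))) / np.sqrt(2 * n_meas)
x_true = np.zeros(n_unknowns, dtype=complex)
x_true[[7, 42, 81]] = -1.0                          # hypothetical failures in the DA field
y = A @ x_true + 0.01 * (rng.standard_normal(n_meas) + 1j * rng.standard_normal(n_meas))

x_hat = ista(A, y, mu=0.05)
detected = np.flatnonzero(np.abs(x_hat) > 0.5)      # indices of the detected failures
print(detected)
```

The soft-thresholding step is where sparsity enters: every entry whose magnitude stays below *μ* times the step size is set exactly to zero at each iteration.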

#### *3.3. Mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> Norm*

The positions and dimensions of the radiating apertures can also be taken into account. The solution **X** is divided into *G* groups **X**<sup>*g*</sup>, each corresponding to the aperture of an individual radiating element *g*. For a faulty element, all discretization samples *x<sub>k</sub><sup>g</sup>* on its aperture are faulty (nonzero). Let the vector **X** of dimension *M* × *N* be divided into *G* non-overlapping groups **X**<sup>*g*</sup> of size *N<sub>g</sub>*, such that ∑<sub>*g*=1</sub><sup>*G*</sup> *N<sub>g</sub>* = *MN*. The mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm is then

$$\|\mathbf{X}\|\_{1,2} = \sum\_{g=1}^{G} \|\mathbf{X}^{g}\|\_{2} = \sum\_{g=1}^{G} \sqrt{\left|x\_{1}^{g}\right|^{2} + \dots + \left|x\_{N\_g}^{g}\right|^{2}}.\tag{8}$$

The mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm acts as an ℓ<sub>1</sub> norm on the vector of group norms (‖**X**<sup>1</sup>‖<sub>2</sub>, …, ‖**X**<sup>*g*</sup>‖<sub>2</sub>, …, ‖**X**<sup>*G*</sup>‖<sub>2</sub>); it therefore induces group sparsity at the level of the radiating apertures. The regularized inversion problem is then expressed as

$$\min\_{\mathbf{X}} \|\mathbf{X}\|\_{1,2} \quad \text{subject to } \|\mathbf{y} - \mathbf{A}\mathbf{X}\|\_2 \le \varepsilon. \tag{9}$$
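To see how the mixed norm promotes group sparsity, the sketch below evaluates Equation (8) and applies its proximal operator (block soft-thresholding), a building block of common proximal solvers for problems like Equation (9). The grouping into four element apertures of four samples each and the field values are hypothetical.

```python
import numpy as np

def mixed_l1_l2(x, groups):
    """Mixed l1/l2 norm of Eq. (8): sum over groups of the l2 norm of each group."""
    return sum(np.linalg.norm(x[idx]) for idx in groups)

def group_soft_threshold(x, groups, t):
    """Prox of t*||x||_{1,2}: shrink each group's l2 norm by t, zeroing it as a block."""
    out = np.zeros_like(x)
    for idx in groups:
        norm = np.linalg.norm(x[idx])
        if norm > t:
            out[idx] = (1 - t / norm) * x[idx]
    return out

# Hypothetical DA aperture: 4 element apertures, each discretized into 4 samples.
groups = [np.arange(4 * g, 4 * g + 4) for g in range(4)]
x = np.zeros(16, dtype=complex)
x[4:8] = [0.9, 1.0, 1.1, 1.0]          # faulty element: the whole group is nonzero
x[13] = 0.2                            # isolated small perturbation in another group

print(mixed_l1_l2(x, groups))
y = group_soft_threshold(x, groups, t=0.5)
print([bool(np.any(y[idx])) for idx in groups])  # [False, True, False, False]
```

Unlike the pointwise ℓ<sub>1</sub> shrinkage, the decision to keep or discard is made per element aperture, so an isolated small sample cannot survive on its own.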

#### **4. Resolution via Bayesian Compressive Sensing**

Consider a planar antenna configuration of *N* elements positioned at coordinates (*x<sub>n</sub>*, *y<sub>n</sub>*), *n* = 1, ..., *N*, with error-free excitations *α<sub>n</sub>*, *n* = 1, ..., *N*, radiating a known field *E*(*u*, *v*), where *u* = sin*θ* cos*φ* and *v* = sin*θ* sin*φ*. In the noisy case with element failures, the far-field radiation pattern of the AUT is expressed as

$$\check{E}(u\_l, v\_l) = \sum\_{n=1}^{N} \beta\_n e^{j\frac{2\pi}{\lambda}(x\_n u\_l + y\_n v\_l)} + \nu\_l, \tag{10}$$

where (*u<sub>l</sub>*, *v<sub>l</sub>*), *l* = 1, ..., *L*, is the angular location of the *l*-th sample, and *ν<sub>l</sub>* is the noise, modeled as Gaussian with zero mean and variance *σ*<sup>2</sup>. The coefficients *β<sub>n</sub>*, *n* = 1, ..., *N*, are the excitations of the failed array:

$$\beta\_n = \begin{cases} h\,\alpha\_n & \text{with probability } \Phi \\ \alpha\_n & \text{otherwise} \end{cases}, \quad n = 1, \ldots, N, \tag{11}$$

where *h* ∈ [0, 1) is the failure factor, Φ is the failure rate, and *α<sub>n</sub>* are the weighting coefficients of the reference array. From the difference between the field patterns, *W*(*u<sub>l</sub>*, *v<sub>l</sub>*) = *E*(*u<sub>l</sub>*, *v<sub>l</sub>*) − *Ě*(*u<sub>l</sub>*, *v<sub>l</sub>*), *l* = 1, ..., *L*, the array failures can be estimated by determining the minimum ℓ<sub>0</sub>-norm vector

$$\underline{\Upsilon} = \{ \Upsilon\_n = \alpha\_n - \beta\_n;\ n = 1, \dots, N \}, \tag{12}$$

which satisfies

$$
\underline{W} - \underline{\underline{\Psi}}\,\underline{\Upsilon} = -\underline{\nu}, \tag{13}
$$

where the sign of the noise term follows from *W* = *E* − *Ě* and is immaterial for zero-mean Gaussian noise. The aim is to determine the entries of the "failure" vector Υ from the differential data *W* = {*W*(*u<sub>l</sub>*, *v<sub>l</sub>*); *l* = 1, ..., *L*}, i.e., from the difference between the field samples of the *golden* (reference) antenna, with coefficients *α<sub>n</sub>*, *n* = 1, ..., *N*, and those of the AUT in Equation (10); *ν* = {*ν<sub>l</sub>*; *l* = 1, ..., *L*} is the noise vector. Unlike deterministic approaches, which directly retrieve the vector Υ with minimum ℓ<sub>0</sub> norm satisfying Equation (13), BCS treats this retrieval probabilistically. In Equation (13), Ψ is the *L* × *N* radiation measurement matrix

$$\underline{\underline{\Psi}} = \begin{bmatrix} e^{j\frac{2\pi}{\lambda}(x\_1 u\_1 + y\_1 v\_1)} & \cdots & e^{j\frac{2\pi}{\lambda}(x\_N u\_1 + y\_N v\_1)} \\ \vdots & \ddots & \vdots \\ e^{j\frac{2\pi}{\lambda}(x\_1 u\_L + y\_1 v\_L)} & \cdots & e^{j\frac{2\pi}{\lambda}(x\_N u\_L + y\_N v\_L)} \end{bmatrix}. \tag{14}$$

Hence, the BCS technique (summarized in Figure 3) can be employed to determine the sparsest solution Υ̂ to the problem

$$\hat{\underline{\Upsilon}} = \arg\max\_{\underline{\Upsilon}} \left[ P\left(\underline{\Upsilon} \mid \underline{W}\right) \right], \tag{15}$$

which gives

$$\hat{\underline{\Upsilon}} = \frac{1}{\sigma\_{NP}} \left[ \frac{\underline{\underline{\Psi}}^{T}\underline{\underline{\Psi}}}{\sigma\_{NP}} + \text{diag}\left(\underline{f}\_{NP}\right) \right]^{-1} \underline{\underline{\Psi}}^{T}\underline{W}, \tag{16}$$

where *T* denotes the transpose operator, and *σ<sub>NP</sub>* and *f<sub>NP</sub>* are the values of the noise parameter *σ* and the hyperparameter vector *f* that maximize the log-likelihood function

$$\mathcal{L}\left(\sigma, \underline{f}\right) = -\frac{1}{2}\left[ L \log 2\pi + \log\left|\underline{\underline{C}}\right| + \underline{W}^{T}\underline{\underline{C}}^{-1}\underline{W} \right]. \tag{17}$$

Equation (17) is maximized using the relevance vector machine (RVM) [29], with *C* = *σI* + Ψ*F*<sup>−1</sup>Ψ<sup>*T*</sup>, where *F* = diag(*f*) and *I* is the *L* × *L* identity matrix. The implementation of the BCS technique (shown in Figure 3) is summarized in Algorithm 1.

**Algorithm 1.** Proposed diagnostic procedure


**Figure 3.** Flowchart of the proposed Bayesian compressive sensing (BCS) procedure.
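Equations (14) to (17) can be prototyped with sparse Bayesian learning. The sketch below is not the fast RVM algorithm of [29] and not the authors' Algorithm 1: it uses the simpler expectation-maximization (EM) re-estimation of the hyperparameters *f*, keeps the noise variance fixed at its true value, and works directly with complex data (so the transpose in Equation (16) becomes a conjugate transpose). The 10 × 10 geometry with half-wavelength spacing, the 70 random samples in the visible disc, the failed indices, and the noise level are hypothetical.

```python
import numpy as np

def measurement_matrix(xy_wl, uv):
    """L x N matrix of Eq. (14): Psi[l, n] = exp(j*2*pi*(x_n*u_l + y_n*v_l)/lambda).

    xy_wl: (N, 2) element positions in wavelengths; uv: (L, 2) angular samples (u_l, v_l).
    """
    return np.exp(1j * 2 * np.pi * (uv @ xy_wl.T))

def sbl_em(Psi, W, noise_var, n_iter=500, f_max=1e6):
    """Sparse Bayesian learning with EM updates of the prior precisions f.

    Prior: Upsilon_n ~ CN(0, 1/f_n). Likelihood: W ~ CN(Psi @ Upsilon, noise_var * I).
    Returns the posterior mean, the complex counterpart of Eq. (16).
    """
    Psi_H = Psi.conj().T
    f = np.ones(Psi.shape[1])
    for _ in range(n_iter):
        Sigma = np.linalg.inv(Psi_H @ Psi / noise_var + np.diag(f))   # posterior covariance
        mu = Sigma @ (Psi_H @ W) / noise_var                          # posterior mean
        # EM step: f_n <- 1 / E[|Upsilon_n|^2]; capped to keep the inversion well conditioned.
        f = np.minimum(1.0 / (np.abs(mu) ** 2 + Sigma.diagonal().real), f_max)
    return mu

rng = np.random.default_rng(3)
P = 10                                              # 10 x 10 array, lambda/2 spacing (assumed)
gx, gy = np.meshgrid(0.5 * np.arange(P), 0.5 * np.arange(P), indexing="ij")
xy_wl = np.column_stack([gx.ravel(), gy.ravel()])   # N = 100 positions in wavelengths

L = 70                                              # fewer far-field samples than elements
rho = np.sqrt(rng.uniform(0.0, 1.0, L))             # uniform over the visible disc u^2 + v^2 <= 1
phi = rng.uniform(0.0, 2 * np.pi, L)
uv = np.column_stack([rho * np.cos(phi), rho * np.sin(phi)])
Psi = measurement_matrix(xy_wl, uv)

alpha = np.ones(P * P, dtype=complex)               # golden-antenna excitations (assumed uniform)
failed = [12, 57, 90]                               # hypothetical failed elements
beta = alpha.copy()
beta[failed] = 0.0                                  # complete failure: h = 0 in Eq. (11)

noise_var = 0.05 ** 2
nu = np.sqrt(noise_var / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
W = Psi @ alpha - (Psi @ beta + nu)                 # golden field minus noisy AUT field
Upsilon = alpha - beta                              # Eq. (12): nonzero only at the failures

Upsilon_hat = sbl_em(Psi, W, noise_var)
detected = np.flatnonzero(np.abs(Upsilon_hat) > 0.5)
print("detected failures:", detected)
```

The prior precision *f<sub>n</sub>* of every healthy element keeps growing across iterations, which drives the corresponding posterior means toward zero; this automatic pruning is what makes the estimate sparse without an explicit ℓ<sub>1</sub> penalty.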

#### **5. Simulations and Analysis**

To assess the performance of the BCS algorithm, we consider an RA of *N* = 316 elements with a Taylor taper and a peak sidelobe level of −25 dB. A complete failure (*h* = 0) is assumed, with a percentage of failed elements Φ = 4%. To quantify the detection error, the detection index is defined as

$$\zeta = 100 \times \frac{\sum\_{n=1}^{N} \left| \Upsilon\_n - \hat{\Upsilon}\_n \right|^2}{\sum\_{n=1}^{N} \left| \Upsilon\_n \right|^2},\tag{18}$$

where Υ<sub>*n*</sub> and Υ̂<sub>*n*</sub>, *n* = 1, ..., *N*, are the actual and estimated entries of the failure vector Υ. The far-field radiation pattern is uniformly sampled in (*u*, *v*) space with 316 samples and a signal-to-noise ratio of *SNR* = 30 dB. Figure 4 shows the AUT configuration together with the excitation coefficients *β<sub>n</sub>* of the failed array, and Figure 5 shows the entries Υ̂<sub>*n*</sub> of the failure vector estimated with the proposed technique. As expected, the locations and number of the estimated failed elements agree well with the actual ones. The accuracy of the estimation is confirmed by a very small detection index, *ζ* = 3.83 × 10<sup>−3</sup>.
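The detection index of Equation (18) is a normalized squared reconstruction error expressed in percent. A direct transcription is sketched below; the 316-entry failure vector with three failures and one underestimated entry is a hypothetical example, not the data behind the reported value of *ζ*.

```python
import numpy as np

def detection_index(upsilon_true, upsilon_est):
    """Detection index of Eq. (18), in percent: normalized squared reconstruction error."""
    err = np.sum(np.abs(upsilon_true - upsilon_est) ** 2)
    return 100.0 * err / np.sum(np.abs(upsilon_true) ** 2)

true = np.zeros(316, dtype=complex)
true[[10, 100, 200]] = 1.0                     # hypothetical failure vector
est = true.copy()
est[100] = 0.9                                 # one slightly underestimated entry
print(detection_index(true, est))              # 100 * 0.01 / 3 ≈ 0.333
```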

**Figure 4.** Normalized array excitation of the array under test (AUT).

**Figure 5.** Normalized array excitation with estimated failure vector Υˆ and location and amplitude of entries.

To analyze how noise on the far-field patterns affects the performance of the technique, we varied the SNR from 0 dB to 100 dB. Figure 6 shows the result. The estimation error *ζ* is high at low SNRs, irrespective of the percentage of failed elements. As the SNR increases, the robustness of the approach improves, and the best performance is attained at Φ = 3%. The impact of the percentage of failed elements on the performance of the proposed method was also assessed by varying it from Φ = 2% to Φ = 20% (see Figure 7). As expected, the performance of the approach degrades as the percentage of failed elements increases, even at very low noise levels. Nevertheless, the approach achieves acceptable accuracy up to about 10% of damaged elements for all SNRs considered. This result validates the effectiveness of the BCS technique in diagnosing sparse failures in arrays.

**Figure 6.** Detection index against signal-to-noise ratio (SNR) for different failure percentages.

**Figure 7.** Detection index against failure rate for various SNRs.

#### *Example Using Full-Wave Simulation Set-Up*

A 10 × 10 microstrip patch antenna array with an aperture size of 31 × 31 mm<sup>2</sup> operating at 28.9 GHz was designed, as shown in Figure 8. The radiation pattern and S-parameters (Figure 9) of the antenna were computed using the full-wave three-dimensional (3D) electromagnetic software Ansys HFSS v. 17. The elements were uniformly spaced along the *x* and *y* directions, and each element had its own excitation port. In practice, measurements are corrupted by noise; hence, Gaussian noise was added to both radiation patterns as *y*<sup>*q*</sup><sub>*n*</sub> = *y*<sup>*q*</sup> + *n*<sup>*q*</sup>, with *q* ∈ {*r*, *d*}. The noise level was set by the SNR, defined relative to the maximum received field magnitude so as to fit the dynamic range of the measurement. The noise was generated as

$$n^{q} = \frac{N(0,1) + jN(0,1)}{\sqrt{2}} \max \left| y^{q} \right| \times 10^{-\mathrm{SNR}\_{\mathrm{dB}}/20},\tag{19}$$

where N (0, 1) is a Gaussian random vector of mean 0 and standard deviation 1.
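Equation (19) scales unit-power complex Gaussian noise to the peak field magnitude, so the noise power equals max|*y*<sup>*q*</sup>|<sup>2</sup> × 10<sup>−SNR<sub>dB</sub>/10</sup>. A minimal NumPy sketch, assuming the field samples are held in a complex array (the function name is hypothetical):

```python
import numpy as np


def add_measurement_noise(y, snr_db, rng=None):
    """Return y + n, with n generated as in Eq. (19).

    y      : complex far-field samples y^q
    snr_db : SNR in dB, defined relative to the peak field magnitude
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=complex)
    # Unit-power circular complex Gaussian samples: (N(0,1) + jN(0,1)) / sqrt(2)
    g = (rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)) / np.sqrt(2)
    # Scale to the peak magnitude and the requested SNR
    n = g * np.max(np.abs(y)) * 10 ** (-snr_db / 20)
    return y + n
```

Defining the SNR relative to the peak rather than the average field means that low-level sidelobe samples see a much lower local SNR, which is what makes the test demanding.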

Faulty elements cause reduced gain, a higher sidelobe level, a wider beamwidth, a lower front-to-back ratio, and a boresight pointing error; these effects are shown in Figure 9b. A Rogers 5880 dielectric substrate with 20 mm thickness and a dielectric constant of 3.48 was used for the antenna design because of its low signal loss, low dielectric loss, low outgassing (which is good for space applications), and low-cost circuit fabrication. The array comprised 100 elements. The length and width of a single patch were estimated to be 7.4 mm and 9.5 mm, respectively, and the elements were uniformly spaced by 8.947 mm and 6.847 mm along *x* and *y*, respectively.

All elements were fed with the same unit excitation to emulate the array without failures (the reference array). Then, *K* failures (corresponding to the assumed failure rate) were introduced intentionally by setting the excitations of the affected elements to zero, to model the AUT. First, we considered the reference array, i.e., the array without failures. Its excitation coefficients are depicted in Figure 10, and the reconstructed excitation coefficients are presented in Figure 11. To quantify the estimation error, the excitation error in dB is shown in Figure 12. The results indicate an exact reconstruction for the reference array and a low probability of false alarm.
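The reference and AUT excitations described above can be set up as follows. This is a sketch under stated assumptions: the failed elements are drawn at random (the text does not say which elements failed), and the excitation error in dB is assumed to be 20 log<sub>10</sub> of the magnitude error normalized to the peak reference excitation, clipped at a floor so that exact reconstruction does not yield −∞. Both function names are hypothetical.

```python
import numpy as np


def make_excitations(rows=10, cols=10, k_failures=5, rng=None):
    """Unit-excitation reference array and an AUT with K zeroed elements."""
    rng = np.random.default_rng() if rng is None else rng
    beta_ref = np.ones((rows, cols))
    beta_aut = beta_ref.copy()
    # Hypothetical choice: failed elements are picked uniformly at random
    idx = rng.choice(rows * cols, size=k_failures, replace=False)
    beta_aut.flat[idx] = 0.0
    return beta_ref, beta_aut


def excitation_error_db(beta_true, beta_est, floor_db=-60.0):
    """Excitation error in dB, normalized to the peak true excitation (assumed definition)."""
    beta_true = np.asarray(beta_true)
    err = np.abs(beta_true - np.asarray(beta_est)) / np.max(np.abs(beta_true))
    # Clip to the floor so that an exact reconstruction maps to floor_db, not -inf
    err = np.maximum(err, 10 ** (floor_db / 20))
    return 20 * np.log10(err)
```

With the defaults, Φ = *K*/(rows × cols) = 5/100 = 5%, matching the case studied next; a failed element that is missed by the estimate shows up as a 0 dB error, while an exactly reconstructed element sits at the floor.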

**Figure 8.** Designed microstrip patch antenna array in Ansys HFSS for diagnosis.

Next, we considered an AUT with *K* = 5 failed elements (zero excitation, Φ = 5%), since the number of failed elements is usually small in practice. The resulting excitation coefficients are presented in Figure 13, the excitations estimated from 30 random noisy measurement points are presented in Figure 14, and the excitation error in dB is depicted in Figure 15. The results indicate good estimation of the RA excitations and of the locations of the faulty elements.

**Figure 9.** The antenna array for diagnostic purposes: (**a**) S-parameter; (**b**) RA radiation pattern; (**c**) AUT radiation pattern.

**Figure 10.** Excitation field of reference array.

**Figure 11.** Reconstructed excitation field of reference array employing 30 random noisy measurement points.

**Figure 12.** Reference array reconstructed excitation error in dB by 30 random noisy measurement points.

**Figure 13.** Reference excitation field of AUT with Φ = 5%.

**Figure 14.** AUT with *K* = 5 failures Φ = 5%: reconstructed excitation error field by 30 random noisy measurement points.

**Figure 15.** AUT with *K* = 5 failures Φ = 5%: reconstructed excitation error in dB by 30 random noisy measurement points.

#### **6. Antenna Array Diagnosis from Measured Data**

#### *6.1. Measurement Set-Up*

The proposed diagnostic technique was tested experimentally using the set-up presented in Figure 16. Although this was a controlled environment (an anechoic chamber), the same set-up in Figure 16 could also be used without the chamber under more practical (uncontrolled) conditions. The AUT was a 10 × 10 microstrip patch antenna array (see Figure 17a) radiating a signal tilted in both planes. Figure 17b,c show the measured radiation patterns of the ideal and defective antennas, respectively. It can be seen that the failures cause a higher sidelobe level, reduced gain, a lower front-to-back ratio (FBR), a wider beamwidth, and a boresight pointing error.

**Figure 16.** Schematic of experimental set-up.

The antenna was specifically designed and fabricated for this purpose; each element had its own feeding port, excited through a power divider. Five radiating elements in the array were left unexcited (zero excitation) to emulate element failures. The AUT set-up is depicted in Figure 18. About 1000 co-polar and cross-polar measurements were taken on the far-field half-sphere at 28.9 GHz in an anechoic chamber.

Radiation pattern measurements obtained from the array with five faulty elements were fed into the proposed algorithm for post-processing. The reconstructed excitation error, which identifies the faulty elements, is depicted in Figure 19, and its dB equivalent is shown in Figure 20. The performance of the BCS-based approach was also assessed experimentally, as shown in Figure 21. Figure 21a shows the reconstruction error versus the number of measurements for different failure rates: the error decreases as the number of measurement points increases, irrespective of the failure percentage, and increases with the failure rate. Figure 21b shows how the reconstruction error changes with SNR for different failure rates: the error decreases exponentially with increasing SNR, independent of the failure rate. Figure 21c depicts the reconstruction error versus failure rate for various SNRs: the error increases with the failure rate and decreases as the SNR increases.

**Figure 17.** Photograph of (**a**) the fabricated antenna array, (**b**) measured radiation pattern without fault, and (**c**) measured radiation pattern with emulated failures.

**Figure 18.** Measurement set-up of AUT for diagnostic purpose.

The imperfections of the measured curves (compared with the simulations) can be attributed to measurement errors and to experimental set-ups that provide different conditions from those in the simulations. The experiment was conducted in an anechoic chamber, which is a controlled environment; hence, the results presented here may vary slightly if the experiment is conducted in a more practical (uncontrolled) environment. Nevertheless, the results indicate that the BCS-based procedure can effectively and reliably address sparse recovery problems, particularly the detection of faulty radiators in planar arrays for next-generation 5G wireless communications. Once suitable data have been collected and used to diagnose the array, the array feeding network can be recalibrated to restore the required radiation characteristics by reconfiguring the excitations of the healthy antenna elements. However, prior knowledge of the *golden* (reference) array must be available, and the number of failures must be small relative to the array size. Therefore, the simulation results, verified by the experiment, show that the BCS-based approach is adequate and reliable for noisy data. The technique also overcomes a shortcoming of methods such as BPA and MMA, which demand an off-line training phase to form an accurate mapping between the response of the AUT and the failure locations. Hence, the proposed procedure should be highly useful for maintaining optimal performance of millimeter-wave planar arrays.

**Figure 19.** Reconstructed excitation error field by 30 random noisy measurement points with 20 dB signal-to-noise ratio.

**Figure 20.** Reconstructed excitation error in dB by 30 random noisy measurement points.

#### *6.2. Antenna Array Diagnosis from Simulated and Measured Far-Field Radiation Patterns*

There are differences between the simulated and measured antenna patterns due to measurement errors, uncontrollable array fabrication errors, and experimental set-ups that give different conditions from the simulations. For example, in our design, a finite flange was employed to feed the ground plane. Hence, the current induced on the flange was rescattered and redistributed differently from that on the very large ground plane used in the simulations. Antenna array diagnosis procedures based only on simulated radiation patterns (such as References [3,16–18,27,35]) may not be reliable or accurate unless they are also tested with the corresponding measured data. In this work, the simulated and measured data show only small differences in the field intensity of the identified faulty elements (Figures 14 and 19, respectively), caused by the electromechanical coupling effect. In general, the proposed BCS-based approach shows good reliability and accuracy with both simulated and measured far-field radiation patterns.

#### *6.3. Number of Measurement Points versus Noisy Measured Data*

The number of measurement points affects the fidelity of field reconstruction and, hence, the diagnosis. The proposed BCS-based procedure performed well despite a significantly reduced number of measurements, because it exploits prior information on the sparsity of the failures. According to the study and experiments of Fuchs et al. [14], the total variation (TV), mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm, and ℓ<sub>1</sub> minimization techniques require 64 measurement points for accurate reconstruction and diagnosis, whereas the BCS technique requires 30 or fewer. The proposed method therefore significantly reduces the number of measurements needed for diagnosis compared with those three approaches. Since the diagnosis time grows with the number of required measurement points, the BCS approach enables faster diagnosis of antenna arrays.

Moreover, the diagnostic procedures proposed by Fuchs et al. [14] (TV, mixed ℓ<sub>1</sub>/ℓ<sub>2</sub> norm, and ℓ<sub>1</sub> minimization) can only handle measured data with an SNR of at least 40 dB; lower SNRs result in poor diagnosis for all three procedures. However, some practical measurements that require diagnosis exhibit SNRs below 40 dB. In contrast, the proposed BCS approach was used, both theoretically and experimentally, to diagnose antenna arrays from data with 20 dB SNR, and it can also accommodate measured data with lower SNR. The comparison is summarized in Table 1. The BCS approach requires a few seconds more computation time; however, this is negligible compared with the measurement time. Therefore, the proposed method is robust to noisy measured data, and a reliable diagnosis was obtained at low SNRs.

**Figure 21.** Experimental performance assessment of the proposed BCS-based diagnostic procedure: (**a**) reconstruction error versus number of measurements, (**b**) reconstruction error versus SNR, and (**c**) reconstruction error for different failure rates.


**Table 1.** Comparison between Bayesian compressive sensing (BCS)-based approach and previous compressive sensing techniques. SNR—signal-to-noise ratio.

#### **7. Conclusions**

A fast and robust antenna array diagnosis procedure based on far-field radiation pattern measurements and Bayesian compressive sensing (BCS) was proposed in this paper. Previous compressive sensing procedures are unreliable with noisy data and require a large number of far-field measurement points. The proposed method addresses these problems by formulating planar array diagnosis within the BCS framework and solving it with the fast relevance vector machine (RVM). We are not the first to apply BCS to antenna array diagnosis: it was previously applied only to linear configurations [15,16], without the practical measurements that are fundamental for validating any procedure. To the best of the authors' knowledge, this is the first application of BCS to planar antenna array diagnosis from far-field measurement points validated with experimental measurements. Diagnoses from simulated and measured far-field points of a designed microstrip patch antenna array demonstrate the method's robustness to additive data noise, its reconstruction accuracy, and its faster diagnosis, all of which are desirable in practical applications. Hence, the proposed method is a better practical choice whenever efficient, fast, and reliable antenna array diagnosis (testing) is needed.

It is also important to comment on the choice of sampling strategy. We considered a random selection of measurement points from a uniform lattice. The choice of sampling technique is not critical here, because in the far field it affects all the techniques in the same manner. However, it has been shown for non-uniform near-field lattices that proper non-uniform random sampling (NURS) using a priori information on the problem provides a meaningful reduction in the cardinality of the set of measured data compared with uniform random sampling and random sampling from a λ/2-equispaced dataset [39]. Moreover, the BCS-based technique was compared with other methods using data reported in the literature. In future work, we will compare different techniques using experimental data from controlled and uncontrolled environments with identical parameters, in order to quantify the error affecting the result of each technique. For example, we will determine what happens when ℓ<sub>1</sub> minimization is applied to 30 rather than 64 experimental measurements, i.e., the same number of data used by BCS, and how its error compares with that of BCS. A complete comparison among the techniques using real data is still absent from the literature.

**Author Contributions:** Conceptualization, methodology, writing-review and editing, original draft preparation, O.J.F.; resources, Z.Z.; visualization, software, A.F. and R.Z.; supervision, F.L.

**Funding:** This research received no external funding. The APC was funded by the University of Science and Technology of China.

**Acknowledgments:** The majority of the work was conducted at MESIC (a joint lab of USTC and IMECAS), and partially carried out at the USTC Center for Micro and Nanoscale Research and Fabrication. The authors would like to thank the Information Science Laboratory Center of USTC for software and hardware services. The authors acknowledge the Applied Electromagnetics and Microwave Engineering Group of Hefei Normal University for the provision of the anechoic chamber. The support of the Chinese Academy of Science and the World Academy of Science (CAS-TWAS) is appreciated. The authors appreciate Zhang [40] and the CVX research group [41] for making useful codes accessible online.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Mobile-Phone Antenna Array with Diamond-Ring Slot Elements for 5G Massive MIMO Systems**

#### **Naser Ojaroudi Parchin <sup>1,\*</sup>, Haleh Jahanbakhsh Basherlou <sup>2</sup>, Mohammad Alibakhshikenari <sup>3</sup>, Yasser Ojaroudi Parchin <sup>4</sup>, Yasir I. A. Al-Yasir <sup>1</sup>, Raed A. Abd-Alhameed <sup>1</sup> and Ernesto Limiti <sup>3</sup>**


Received: 31 March 2019; Accepted: 7 May 2019; Published: 10 May 2019

**Abstract:** A mobile-phone antenna array with diamond-ring slot elements is proposed for fifth generation (5G) massive multiple-input/multiple-output (MIMO) systems. The design consists of four double-fed diamond-ring slot antenna elements placed at the corners of the mobile-phone printed circuit board (PCB). A low-cost FR-4 dielectric with overall dimensions of 75 × 150 mm<sup>2</sup> is used as the substrate. The antenna elements are fed by 50-Ohm L-shaped microstrip lines. Owing to the orthogonal placement of the microstrip feed lines, the diamond-ring slot elements exhibit polarization and radiation-pattern diversity. A good impedance bandwidth (S11 ≤ −10 dB) of 3.2–4 GHz has been achieved for each antenna radiator; for S11 ≤ −6 dB, the bandwidth extends to 3–4.2 GHz. The proposed design provides the radiation coverage required for 5G smartphones. The performance of the proposed MIMO antenna design is examined in both simulation and experiment. High isolation, high efficiency, and sufficient gain have been obtained for the proposed MIMO smartphone antenna. In addition, the calculated total active reflection coefficient (TARC) and envelope correlation coefficient (ECC) of the antenna elements are very low over the whole band of interest, which verifies the capability of the proposed multi-antenna system for massive MIMO and diversity applications. Furthermore, the properties of the design in Data-mode/Talk-mode are investigated and presented.

**Keywords:** 5G; diamond-ring slot; dual-polarized antenna; massive MIMO; mobile-phone antenna; pattern diversity

#### **1. Introduction**

Nowadays, there is increasing research interest in MIMO systems for wireless communication [1,2]. MIMO offers major advantages in improving the transmission capacity and reliability of wireless links. In MIMO systems, multiple antennas are deployed at both the transmitter and receiver sides [3]. This technology is a key component, and probably the most established one, for reaching the data rates promised by fifth generation (5G) communication systems [4,5]. MIMO antennas are important for increasing channel capacity and link reliability [6,7]. Standard MIMO networks tend to use two or four antennas in a single physical package, whereas massive MIMO systems use an especially large number of antennas [8]. A 2 × 2 MIMO system has been successfully applied in fourth generation (4G) mobile communications, and massive MIMO with a large number of antennas is expected to be very promising for 5G wireless communications [9]. A greater number of elements also makes the network more resistant to interference and intentional jamming [10].

Among the antennas used for MIMO applications, printed antennas are particularly appropriate owing to their low cost, easy fabrication, and ease of integration into small terminal devices [11]. However, placing multiple antennas in the limited space of a transceiver poses a significant challenge for incorporating MIMO. To meet the requirements of cellular communications, compact, wideband, high-isolation MIMO antennas are urgently needed for future mobile terminals and portable applications [12–17]. Recently, several techniques have been introduced to design massive MIMO antennas for 4G and sub-6 GHz 5G mobile terminals [18–27].

We propose here a new eight-port mobile-phone antenna design with compact dual-polarized radiation elements providing wide impedance bandwidth for 5G applications. An eight-element MIMO smartphone antenna can achieve a channel capacity of 37 bps/Hz, which is close to eight times that of a single antenna in single-input/single-output (SISO) operation. With such a channel capacity and a wide frequency spectrum (at least 200 MHz), the data rate can be much higher than 1 Gbps. The antenna is designed to operate at 3.6 GHz, a candidate frequency band for sub-6 GHz 5G cellular networks proposed by Ofcom, UK [28]. The design contains four double-fed, dual-polarized slot-ring antenna elements placed at the corners of the printed circuit board (PCB). The antenna elements exhibit wide impedance bandwidth and low mutual coupling, and provide pattern and polarization diversity at different sides of the mobile-phone PCB. As a result, the design not only provides full radiation coverage but also supports different polarizations. Computer Simulation Technology (CST) software was used to investigate the antenna characteristics [29]. The fundamental properties of the single element and of its MIMO design, in terms of S parameters, efficiency, radiation pattern, envelope correlation coefficient (ECC), and total active reflection coefficient (TARC), are investigated. In addition, the performance of the designed mobile-phone antenna in Data-Mode/Talk-Mode is studied.
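The capacity comparison above can be reproduced in spirit with a Monte Carlo estimate of the ergodic capacity *C* = E[log<sub>2</sub> det(**I** + (SNR/*N*<sub>t</sub>)**HH**<sup>H</sup>)]. This sketch rests on assumptions not stated in the text: an i.i.d. Rayleigh channel, equal power per transmit antenna, no channel knowledge at the transmitter, and ideal (lossless, uncorrelated) antennas. The 20 dB SNR used in the example is likewise an assumed value. Under these idealized conditions the 8 × 8 result comes out above the 37 bps/Hz quoted, since real antennas with finite efficiency and nonzero correlation reduce the capacity. The function name is hypothetical.

```python
import numpy as np


def ergodic_capacity(n_tx, n_rx, snr_db, trials=2000, rng=None):
    """Monte Carlo ergodic capacity (bps/Hz) of an i.i.d. Rayleigh MIMO channel.

    Assumes equal power per transmit antenna and no channel state
    information at the transmitter.
    """
    rng = np.random.default_rng() if rng is None else rng
    snr = 10 ** (snr_db / 10)
    caps = np.empty(trials)
    for t in range(trials):
        # Unit-variance circular complex Gaussian channel matrix
        h = (rng.standard_normal((n_rx, n_tx))
             + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
        m = np.eye(n_rx) + (snr / n_tx) * (h @ h.conj().T)
        # slogdet is numerically safer than log(det(...)); m is Hermitian PD
        _, logdet = np.linalg.slogdet(m)
        caps[t] = logdet / np.log(2)
    return caps.mean()
```

For example, `ergodic_capacity(8, 8, 20.0)` can be compared with `ergodic_capacity(1, 1, 20.0)` to see the near-linear growth with the number of antennas. The data-rate claim follows from simple arithmetic: 37 bps/Hz × 200 MHz = 7.4 Gbit/s, well above 1 Gbps.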

#### **2. Single-Element Diamond-Ring Slot Antenna**

Figure 1 depicts the configuration of the dual-polarized diamond-ring antenna. The antenna is designed on an FR-4 substrate (thickness h = 1.6 mm, relative permittivity ε = 4.4, and loss tangent δ = 0.025).

Its configuration contains a diamond-ring slot radiator with a pair of L-shaped microstrip feed lines. The parameter values of the designed antenna and its MIMO configuration are specified in Table 1. The motivation behind the presented design is to achieve a compact, dual-polarized, wideband antenna that can be integrated onto a smartphone PCB. This has been achieved by using a diamond-ring slot antenna with L-shaped microstrip feed lines. The slot antenna is a type of printed antenna that has been investigated extensively for different wireless systems for several decades because of its attractive features, including light weight, compactness, and ease of integration with radio frequency (RF) circuits [30]. The ring-slot antenna can excite two orthogonal polarizations when it is fed from different ports [31], which makes printed ring-slot antennas attractive. The resonant frequency of the antenna is mainly determined by the circumference of the diamond-ring slot. Therefore, the circumference of the diamond-ring slot needs to match the dielectric wavelength at the corresponding frequency, where Wx/2 + g = λ. The length of the feed line (Lf + L1) also has a small effect on the resonant frequency and impedance bandwidth of the design.

**Figure 1.** The antenna schematic, (**a**) side view, (**b**) top and (**c**) bottom layers.


**Table 1.** Parameter values of the single-element antenna and its MIMO array design.

The configurations and S parameters of a square-ring slot with a rectangular feed line, a diamond-ring slot with a rectangular feed line, and the proposed diamond-ring slot radiator with L-shaped feed lines are illustrated and compared in Figure 2a–c, respectively. The proposed design (Figure 2c) not only provides a wider impedance bandwidth but also exhibits high isolation, with mutual coupling below −20 dB in the desired operating band. The operating band of the slot radiator with L-shaped feed lines spans 3.2–4 GHz (800 MHz bandwidth) for S11 ≤ −10 dB, and 3–4.2 GHz for S11 ≤ −6 dB.

The S11 characteristics for varying design parameters Wx, L1, x, and W1 are illustrated in Figure 3. Figure 3a shows the effect of the diamond-ring size (Wx) on the resonance frequency: when Wx decreases from 8 to 6 mm, the antenna resonance shifts from 3.2 to 4.6 GHz. The resonance frequency is also affected by the length of the L-shaped feed-line arm (L1): as shown in Figure 3b, varying L1 tunes the operating frequency toward lower frequencies without changing the bandwidth or isolation. Figure 3c illustrates the S11 results for various values of x (the width of the diamond-ring slot line), which mainly affects the impedance bandwidth: when x changes from 0.5 to 2 mm, the operating bandwidth varies from 0.3 to 1.2 GHz. Another important parameter of the dual-polarized diamond-ring slot antenna is W1, which tunes the isolation characteristic: as can be observed from Figure 3d, the reflection coefficient tunes from −18 dB to below −40 dB.

**Figure 2.** Various structures and S parameter results of: (**a**) square-ring slot with rectangular feed line, (**b**) diamond-ring slot with rectangular feed line, and (**c**) the proposed diamond-ring slot radiator with L-shaped feed line.

**Figure 3.** S11 results of the diamond-ring antenna for various values of (**a**) Wx, (**b**) L1, (**c**) x, and (**d**) W1.

Figure 4 shows the surface current distributions on the ground plane of the antenna at 3.6 GHz. The surface currents are mainly concentrated around the diamond-ring slot radiator. In addition, the current densities excited from the two feeding ports are equal and opposite, owing to the polarization diversity function [32–34]. Figure 5 illustrates the 3D radiation patterns of the antenna when it is fed from Port 1 and from Port 2. The antenna exhibits similar radiation patterns with orthogonal polarizations and a realized gain of more than 3 dB. The radiation characteristics of the dual-polarized diamond-ring slot antenna in terms of radiation efficiency, total efficiency, and maximum gain are illustrated in Figure 6. The antenna provides high efficiencies: more than 80% radiation and total efficiency over the entire operating band (3.2–4 GHz), with nearly identical radiation and total efficiencies. In addition, the antenna has a directivity of around 2.5 dBi.

**Figure 4.** Simulated current densities at 3.6 GHz for (**a**) 1st feeding port and (**b**) 2nd feeding port.

**Figure 5.** 3D views of the dual-polarized radiation patterns from (**a**) feeding port 1 and (**b**) feeding port 2.

A prototype of the design was fabricated and its S parameters were measured. Figure 7 shows a photograph of the fabricated prototype in the measurement setup, and Figure 8 compares the measured and simulated S parameters of the fabricated antenna. The fabricated antenna works properly in the desired frequency range.

**Figure 6.** Radiation, total efficiencies, maximum gain of the diamond-ring slot antenna.

**Figure 7.** Photograph of the fabricated prototype in the measurement setup.

**Figure 8.** Measured and simulated S parameter results of the fabricated antenna.

#### **3. Mobile-Phone Antenna Design**

The simulated layout of the mobile-phone antenna design is shown in Figure 9. It was implemented on an FR4 substrate with overall dimensions of 75 × 150 mm<sup>2</sup>. Four dual-polarized diamond-ring slot radiators are placed at the corners of the mobile-phone PCB. Owing to the compact size of the radiators, there is enough space on the smartphone PCB to add other antennas covering the 3G/4G mobile frequency bands. Figure 10 illustrates the simulated S parameters (Snn and Smn) of the design over its operating band. The proposed mobile-phone antenna exhibits good S parameters, with wide bandwidth and low mutual coupling. As mentioned above, the dual-polarized radiation elements provide similar performance. The 3D radiation patterns of Antennas 1 and 2 at 3.6 GHz are shown in Figure 11. The antenna elements have quasi-omnidirectional radiation patterns covering the top and bottom portions of the mobile-phone PCB.

**Figure 9.** Designed mobile-phone antenna (**a**) transparent view, (**b**) top-layer and (**c**) bottom layer.

**Figure 10.** (**a**) Snn and (**b**) Smn results of the mobile-phone antenna.

**Figure 11.** 3D transparent views of the radiation patterns for (**a**) feeding port 1 and (**b**) feeding port 2.

Top views of the radiation patterns of the proposed mobile-phone antenna design are displayed in Figure 12. Each side of the mobile-phone PCB is covered by differently polarized radiation patterns. Thus, the MIMO antenna exhibits good radiation coverage and supports different polarizations, which makes it well suited for future smartphones. Furthermore, the antenna provides high radiation and total efficiencies over the operating band, as illustrated in Figure 13: more than 70% radiation and total efficiencies were obtained for the radiation elements at 3.6 GHz.

The proposed mobile-phone antenna was fabricated and its characteristics were tested in the Antenna Laboratory at the University of Bradford. Top and bottom views of the prototype are shown in Figure 14a,b, respectively. The mobile-phone antenna is built on a low-cost FR4 dielectric with overall dimensions of 75 × 150 × 1.6 mm<sup>3</sup>. During the measurements, 50-Ω RF loads were connected to the elements not under test to avoid their effects, as shown in Figure 14c. The measured and simulated S parameters (Snn: S11–S88 and Smn: S21–S81) of the fabricated design are illustrated in Figure 15. The diamond-ring slot resonators achieve good S parameters, with sufficient impedance bandwidth and low mutual coupling in the desired frequency range. The deviations between the measured and simulated results arise from errors in the fabrication, feeding, and measurement processes.

**Figure 12.** 3D radiation patterns of the fifth generation (5G) mobile-phone antenna.

**Figure 13.** Efficiencies of the antenna elements (Ant. 1–Ant. 8) for the proposed design.

**Figure 14.** (**a**) Top, (**b**) bottom views of the fabricated design and (**c**) the prototype connected to the cables and 50-Ohm loads.

**Figure 15.** Measured and simulated (**a**) Snn (S11–S88) and (**b**) Smn (S21–S81) of the fabricated prototype.

Because radiation elements with the same placement and polarization provide similar radiation patterns, the 2D polar radiation patterns of the adjacent resonators (Ant. 1 and Ant. 2) were measured at the center operating frequency (3.6 GHz) and are illustrated in Figure 16. The prototype exhibits good radiation patterns in acceptable agreement with the simulations. In addition, the antenna elements with different polarizations provide sufficient gain at the center frequency of the operating band.

**Figure 16.** Measured and simulated 2D radiation patterns for (**a**) Ant.1 and (**b**) Ant.2.

ECC and TARC are two important parameters that must be considered to ensure that a MIMO antenna works properly [35,36]. The ECC and TARC of two elements m and n can be calculated from the S parameters as:

$$ECC = \frac{\left| S_{mm}^{*} S_{mn} + S_{nm}^{*} S_{nn} \right|^{2}}{\left(1 - \left| S_{mm} \right|^{2} - \left| S_{nm} \right|^{2}\right)\left(1 - \left| S_{nn} \right|^{2} - \left| S_{mn} \right|^{2}\right)} \tag{1}$$

$$TARC = \sqrt{\frac{\left| S_{mm} + S_{mn} \right|^{2} + \left| S_{nm} + S_{nn} \right|^{2}}{2}} \tag{2}$$
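Both expressions are easy to evaluate from a single-frequency S-parameter snapshot. The Python sketch below is a minimal illustration, assuming a two-port with elements m = 1 and n = 2 and complex, linear (not dB) S parameters; the function names are hypothetical and the sample values are arbitrary, not measured data. The S-parameter form of the ECC strictly assumes lossless antennas, and Equation (2) corresponds to driving both ports in phase with equal amplitude.

```python
import math


def ecc_two_port(s11: complex, s12: complex, s21: complex, s22: complex) -> float:
    """Envelope correlation coefficient from S parameters, Eq. (1) with m=1, n=2."""
    num = abs(s11.conjugate() * s12 + s21.conjugate() * s22) ** 2
    den = (1 - abs(s11) ** 2 - abs(s21) ** 2) * (1 - abs(s22) ** 2 - abs(s12) ** 2)
    return num / den


def tarc_two_port(s11: complex, s12: complex, s21: complex, s22: complex) -> float:
    """Total active reflection coefficient, Eq. (2): equal-amplitude, in-phase excitation."""
    return math.sqrt((abs(s11 + s12) ** 2 + abs(s21 + s22) ** 2) / 2)


# Arbitrary illustrative values: |S11| = |S22| = 0.1, |S12| = |S21| = 0.05
s11, s12, s21, s22 = 0.1 + 0j, 0.05 + 0j, 0.05 + 0j, 0.1 + 0j
ecc = ecc_two_port(s11, s12, s21, s22)
tarc = tarc_two_port(s11, s12, s21, s22)
print(f"ECC = {ecc:.3e}")                                     # ECC = 1.025e-04
print(f"TARC = {tarc:.3f} ({20 * math.log10(tarc):.1f} dB)")  # TARC = 0.150 (-16.5 dB)
```

TARC is usually plotted in dB, i.e., 20·log10 of the value returned above.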

Figure 17 shows the ECC and TARC calculated from the simulated and measured S parameters of the mobile-phone antenna design. As evident from the figure, the calculated ECC and TARC values are very low over the whole band of interest. The ECC is less than 0.01 over the entire operating band, indicating that two adjacent antenna elements are almost uncorrelated. In addition, the TARC is less than −30 dB at 3.6 GHz. Table 2 summarizes and compares the fundamental characteristics of the presented mobile-phone antenna with recently reported 5G mobile-phone antenna designs [16–25]. It can be observed that the proposed design provides better performance in terms of efficiency, isolation and ECC. In addition, it exhibits a wider bandwidth, with pattern and polarization diversity to cover different sides of the mobile-phone PCB.

**Figure 17.** Calculated (**a**) envelope correlation coefficient (ECC) and (**b**) total active reflection coefficient (TARC) from measured S parameters.



#### **4. User-Hand/User-Head Impacts on the Mobile-Phone Antenna Performance**

This section investigates the impact of the user's hand and head on the total efficiency and realized gain of the design [37–39]. As illustrated in Figure 18, different scenarios, including right-hand and left-hand modes for the top layer and back layer of the design, are studied in simulation. According to the results, the mobile-phone antenna and its radiators perform well and provide sufficient total efficiency in the presence of the user's hand. Owing to the symmetric configuration of the proposed design, it performs almost identically in the different hand scenarios. The largest reductions in the radiation properties are observed for the radiation elements partially covered by the user's hand, because hand tissue strongly absorbs the radiated power. As can be observed, the antenna elements provide total efficiencies of 25–55% over the 3.2–4 GHz operation band.
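Total efficiency is also commonly quoted in decibels. As a small arithmetic aid (not part of the original analysis), the 25–55% range quoted above converts as follows; the helper name is hypothetical.

```python
import math


def efficiency_to_db(eta: float) -> float:
    """Convert a linear total efficiency (0 < eta <= 1) to dB as 10*log10(eta)."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    return 10.0 * math.log10(eta)


for eta in (0.25, 0.55):
    print(f"{eta:.0%} -> {efficiency_to_db(eta):.2f} dB")
# 25% -> -6.02 dB
# 55% -> -2.60 dB
```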

**Figure 18.** Placement and total efficiencies of the design for different user-hand scenarios (**a**) right-hand/top-layer, (**b**) right-hand/back-layer, (**c**) left-hand/top-layer and (**d**) left-hand/back-layer.

Figure 19a illustrates the 3D radiation patterns of the mobile-phone antenna in Talk-Mode at 3.6 GHz. Note that the radiation performance of each antenna element depends mainly on its location in the Talk-Mode scenario. As shown, the realized gain of the design varies from 3.2 to 4.9 dB. Compared with the free-space radiation patterns (Figure 12), the patterns are slightly distorted and weaker owing to the presence of the user's head and hand. In the presented Talk-Mode, the antenna elements are touched by different parts of the hand and head phantoms.

**Figure 19.** (**a**) 3D and (**b**) 2D linearly-scaled radiation patterns of the mobile-phone antenna in Talk-Mode scenario.

The largest reductions in the radiation properties are observed for the elements located near the user's head (Ant. 3 and Ant. 4) [40]. However, the difference is not very significant. The 2D polar (linearly scaled) radiation patterns of the design are illustrated in Figure 19b. As can be observed, the radiation pattern has its maximum directivity in the direction opposite the head, which is the most important part of the body to protect from radiation. The main lobe of each element carries most of the power, while the other lobes should be negligible.

Figure 20 depicts the total efficiencies and reflection coefficients (Snn) of the antenna elements in the presence of the user's head and hand in the Talk-Mode scenario. As seen, the deviation of the Snn characteristic is not significant. In addition, the proposed MIMO design exhibits sufficient efficiency over its operation bandwidth. Based on the above analysis, we conclude that the MIMO design with diamond-ring slot radiators provides sufficient efficiency, radiation coverage and gain levels.

**Figure 20.** (**a**) Snn and (**b**) total efficiencies of the proposed smartphone antenna in Talk-Mode scenario.

#### **5. Conclusions**

A mobile-phone antenna design with dual-polarized radiators is proposed for 5G massive MIMO communications. The antenna configuration contains four diamond-ring slot radiators (eight ports) with L-shaped microstrip feed lines, deployed at the four corners of the PCB. The antenna elements exhibit a wide bandwidth centered at 3.6 GHz. The S parameters, radiation patterns, efficiency, ECC and TARC of the design are studied and satisfactory results are achieved. In addition, a prototype of the mobile-phone antenna was fabricated and measured. Moreover, the performance of the antenna in Hand-Mode and Talk-Mode scenarios is investigated. The obtained results demonstrate that the proposed smartphone antenna provides good characteristics and meets the requirements for use in future mobile handsets.

**Author Contributions:** Writing—original draft preparation, N.O.P., H.J.B., M.A., Y.O.P., Y.I.A.A.-Y., R.A.A.-A., and E.L.; writing—review and editing, N.O.P. and R.A.A.-A.; investigation, N.O.P., H.J.B., M.A., Y.O.P., Y.I.A.A.-Y.; resources, N.O.P., R.A.A.-A. and E.L.; all authors participated in the remaining tasks.

**Funding:** This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement H2020-MSCA-ITN-2016 SECRET-722424.

**Acknowledgments:** The authors wish to express their thanks for the support provided by the innovation programme under grant agreement H2020-MSCA-ITN-2016 SECRET-722424.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **High-Isolation Leaky-Wave Array Antenna Based on CRLH-Metamaterial Implemented on SIW with ±30° Frequency Beam-Scanning Capability at Millimetre-Waves**

**Mohammad Alibakhshikenari 1,\*, Bal Singh Virdee 2, Chan H. See 3,4, Raed A. Abd-Alhameed 5, Francisco Falcone 6 and Ernesto Limiti 1**


Received: 22 April 2019; Accepted: 5 June 2019; Published: 6 June 2019

**Abstract:** The paper presents a feasibility study on the design of a new metamaterial leaky-wave antenna (MTM-LWA) used to construct a 1 × 2 array implemented in substrate-integrated waveguide (SIW) technology for millimetre-wave beamforming applications. The proposed 1 × 2 array antenna consists of two LWAs with metamaterial unit-cells etched on the top surface of the SIW. The metamaterial unit-cell, an E-shaped transverse slot, causes leakage loss and interrupts the current flow over the SIW to enhance the array's performance. The dimensions of the LWA are 40 × 10 × 0.75 mm³. Mutual coupling between the array elements is suppressed by incorporating a metamaterial shield (MTM-shield) between the two antennas in the array. The LWA operates over a frequency range of 55–65 GHz, corresponding to a 16.66% fractional bandwidth. The array is shown to exhibit beam-scanning of ±30° over its operating frequency range. The radiation gains in the backward (−30°), broadside (0°), and forward (+30°) directions are 8.5 dBi, 10.1 dBi, and 9.5 dBi, respectively. The decoupling slab is shown to have minimal effect on the array's impedance bandwidth and radiation specifications. The MTM-shield is shown to suppress the mutual coupling by ~25 dB and to improve the radiation gain and efficiency by ~1 dB and ~13% on average, respectively.

**Keywords:** Metamaterials (MTM); leaky-wave antenna (LWA); antenna arrays; substrate integrated waveguide (SIW); transverse slots; beam-scanning; mutual coupling isolation; millimetre-wave; composite right/left-handed transmission line (CRLH-TL)

#### **1. Introduction**

Leaky-wave antennas (LWAs) are travelling-wave antennas with an electrically large radiating aperture [1,2]. Such antennas can provide a high-gain directive beam without a complex feeding network [3,4]. The advantage of an LWA over a conventional array antenna is its simple, low-loss feed structure [5–7]. Conventional planar LWAs radiate higher-order modes in the forward direction [8]. However, periodic-structure-based LWAs can radiate in both the forward and backward directions. It has been shown that metamaterial-based LWA designs can achieve continuous main-beam scanning from the backward to the forward direction as a function of frequency [9]. Various LWA designs based on metamaterial structures have been reported. These include (1) an LWA with a composite right/left-handed (CRLH) folded substrate-integrated waveguide (SIW) structure that provides beam scanning from −58° to 65° with a gain of 1 dBi [10]; (2) an interdigital-shaped slotted-SIW LWA reported to achieve a scanning angle of −60° to 70° with a gain of around 8 dBi [11]; (3) a CRLH LWA based on a rectangular waveguide that demonstrates continuous main-beam scanning from −70° to 70° with a gain of 8.64 dBi [12]; and (4) a planar slotted-SIW LWA that provides a scanning range of −66° to 78° with consistent gain [13].

In this paper, a new type of LWA in an array configuration is proposed, based on SIW with metamaterial inclusions, for millimetre-wave applications. Mutual coupling between closely spaced antennas in an array can undermine the array's performance. To circumvent this, a metamaterial shield is embedded between the two LWAs. With this approach, mutual coupling is shown to be reduced by an average of ~25 dB over the array's operating frequency range. Engraved on the upper layer of each SIW LWA are several metamaterial unit-cells comprising transverse E-shaped slots. The dimensions of the LWAs were optimized for array performance.

#### **2. Design Process of the Proposed MTM-LWA Array Based on SIW**

The proposed 1 × 2 array antenna, based on MTM-LWAs implemented in SIW technology, is designed on a RO3003 dielectric substrate with ε<sub>r</sub> = 3.0, tan δ = 0.0010 and a thickness of 0.75 mm. Figure 1 displays the layout of the proposed array, which is constructed from two MTM-LWAs on SIW. Engraved on the top of each SIW antenna are several metamaterial unit-cells consisting of transverse E-shaped slots. Leakage at the slots interrupts the current flow over the SIW, which is shown to enhance the array's impedance bandwidth and to produce a beam that scans as a function of frequency. In this structure, the transverse slots behave as series left-handed capacitances and the grounded via-holes act as shunt left-handed inductances.


**Figure 1.** Proposed 1 × 2 array antenna based on MTM-LWA using SIW technology. (**a**) Top view; (**b**) view showing the substrate-integrated waveguide slots; (**c**) back side showing the ground plane.

The structural parameters of the MTM-LWA array are summarized in Table 1. Each antenna has dimensions of 40 × 10 × 0.75 mm³, and the overall ground plane measures 50 × 35 × 0.75 mm³. The S-parameter responses of the proposed array antenna are shown in Figure 2, which confirms that it operates throughout the 55–65 GHz frequency range, corresponding to a 16.66% fractional bandwidth.
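The fractional bandwidth follows from the band edges using the common definition FBW = 2(f_H − f_L)/(f_H + f_L), i.e., the bandwidth divided by the centre frequency. A minimal check, with a hypothetical helper name:

```python
def fractional_bandwidth(f_low_hz: float, f_high_hz: float) -> float:
    """Bandwidth divided by the arithmetic centre frequency."""
    if not 0 < f_low_hz < f_high_hz:
        raise ValueError("require 0 < f_low < f_high")
    return 2.0 * (f_high_hz - f_low_hz) / (f_high_hz + f_low_hz)


fbw = fractional_bandwidth(55e9, 65e9)
print(f"{100 * fbw:.2f} %")  # 16.67 %
```

The exact value is 1/6; the 16.66% quoted in the text is the same figure truncated rather than rounded.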


**Table 1.** Structural parameters of the array.

**Figure 2.** S-parameter responses of the proposed array antenna. Because the structure is symmetrical, not all curves are plotted.

Radiation gain patterns of the proposed array antenna at three spot frequencies within its operating range are plotted in Figure 3. The array is capable of beam-scanning from −30° to +30°, with backward radiation at −30° (55 GHz), broadside radiation at 0° (60 GHz), and forward radiation at +30° (65 GHz). In the backward, broadside, and forward directions, the gain is 8.5, 10.1, and 9.5 dBi, respectively.


**Figure 3.** Radiation characteristics of the proposed 1 × 2 MTM-LWA array at 55 GHz, 60 GHz, and 65 GHz. (**a**) Backward-radiation at 55 GHz; (**b**) broadside-radiation at 60 GHz; (**c**) forward-radiation at 65 GHz.

#### **3. Suppressing the Mutual Coupling Between the Closely Spaced MTM-LWA Arrays**

Mutual coupling between closely spaced radiation elements can severely undermine an array's radiation performance. Here, the isolation between the two MTM-LWAs is increased by introducing an SIW-based metamaterial shield, as indicated in Figure 4. The shield comprises tapered transverse slots. The slots have a width of 0.5 mm and essentially act as series left-handed capacitances (*C*<sub>L</sub>), while the metallic via-holes, with a diameter of 0.25 mm, act as shunt left-handed inductances (*L*<sub>L</sub>). The MTM-shield suppresses the surface waves created by the LWAs, increasing the isolation between the two antennas in the array. The overall dimensions of the shield are 40 × 4 mm².

Figure 5 shows the S-parameter response before and after applying the MTM-shield. With the MTM-shield, the minimum, average and maximum suppression observed are 8 dB, ~25 dB, and 42.5 dB, respectively, which shows the effectiveness of the proposed isolation technique. The shield has no influence on the reflection-coefficient response, which remains S11 ≤ −10 dB.
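The minimum, average and maximum suppression figures can be obtained by differencing the transmission coefficients without and with the shield, in dB, at each frequency point. The sketch below assumes the average is taken over the dB differences (the text does not state how it was computed), and the sample sweeps are invented placeholders, not the data of Figure 5.

```python
from statistics import mean


def isolation_improvement(s21_without_db, s21_with_db):
    """Per-frequency suppression (dB) reduced to (min, mean, max).

    Both inputs are |S21| in dB on the same frequency grid; a positive
    value means the shield lowered the coupling at that frequency.
    """
    if len(s21_without_db) != len(s21_with_db):
        raise ValueError("both sweeps must share the same frequency grid")
    delta = [wo - w for wo, w in zip(s21_without_db, s21_with_db)]
    return min(delta), mean(delta), max(delta)


# Placeholder sweeps (dB) at three frequency points only
lo, avg, hi = isolation_improvement([-12.0, -15.0, -20.0], [-30.0, -40.0, -35.0])
print(lo, round(avg, 2), hi)  # 15.0 19.33 25.0
```

Averaging linear power ratios instead of dB values would generally give a different "average" figure, so the convention should be stated alongside the result.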

Radiation patterns of the proposed antenna array without and with the MTM-shield, at spot frequencies of 55, 60, and 65 GHz across its operating band, are plotted in Figure 6. The figure shows that the shield substantially reduces the cross-polarized radiation over the operating range, while the average gain of the co-polarized radiation is only marginally affected. All details are tabulated in Table 2.

In addition, the radiation gain and efficiency curves over the frequency band for both antennas, without and with the proposed shield, are shown in Figure 7. With the SIW-based metamaterial shield, the radiation gain and efficiency improve by ~1 dB and ~13% on average, respectively.

Surface current density distributions without and with the MTM-shield are shown in Figure 8. The figure shows that the MTM-shield acts as an effective electromagnetic band-gap structure, blocking the surface currents that would otherwise couple adjacent radiation elements in the array. As a result, the detrimental influence of surface currents on the far-field of the array is strongly suppressed.

**Figure 4.** Proposed SIW-based leaky-wave antenna array with MTM-shield. (**a**) Proposed MTM-SIW shield located between the array antennas; (**b**) top-view of the leaky-wave array antennas; (**c**) back-side to show ground plane.

**Figure 5.** Reflection and transmission coefficients of the proposed antenna array before and after applying the MTM-shield. Because the structure is symmetrical, not all curves are plotted.

**Figure 6.** Co- and cross-polarized radiation gain patterns of the proposed structure without (WO) and with (W) metamaterial-shield.


**Table 2.** Radiation properties.

**Figure 7.** Radiation gain and efficiency curves over the frequency band without and with the MTM shield. (**a**) Radiation gain; (**b**) radiation efficiency.

**Figure 8.** Surface current density distributions without and with the metamaterial shield at 60 GHz. (**a**) without the metamaterial shield; (**b**) with the metamaterial shield.

#### **4. Circuit Model of the Proposed MTM-LWA Array and Its Dispersion Phenomenon**

Metamaterials (MTMs) can be explained using transmission-line theory in terms of circuit models. The concept of composite right/left-handed metamaterial transmission lines (CRLH-MTM TLs) has been investigated and realized on the basis of this approach, which is broadly recognized and adopted as a powerful analysis tool for understanding and modelling MTM devices. By including right-handed (RH) effects in a purely left-handed (LH) circuit, it describes the general configuration of a practical MTM-TL. The circuit model of a generic symmetrical CRLH transmission-line unit-cell is shown in Figure 9, where losses are neglected for simplicity. The series capacitance (*C*<sub>L</sub>) and the shunt inductance (*L*<sub>L</sub>), realized by the slots and via-holes, respectively, provide the left-handedness, while the series inductance (*L*<sub>R</sub>) and the shunt capacitance (*C*<sub>R</sub>), which arise from the unwanted currents flowing on the patches and from the gap between the patches and the ground plane, respectively, form its right-handed (RH) dual counterpart. The model in Figure 9a is called the *T*-type model, with the LH capacitance placed at the two ends; the mushroom unit cell belongs to this type [14]. The circuit in Figure 9b is called the π-type model, with the LH capacitance in the centre; one example is the CRLH-SIW unit cell [15–17]. Each unit cell of the proposed leaky-wave array antenna is based on the π-type model, as identified in Figure 4b.

**Figure 9.** Equivalent circuit models for the symmetrical CRLH-metamaterial unit-cells. (**a**) *T*-type circuit model. (**b**) π-type circuit model.

Applying periodic boundary conditions according to the Bloch–Floquet theorem, the two CRLH transmission-line unit-cells become equivalent, and their dispersion relation is given by [9]:

$$\beta(\omega) = \frac{1}{p}\cos^{-1}\left(1 - \frac{1}{2}\left(\frac{\omega_L^2}{\omega^2} + \frac{\omega^2}{\omega_R^2} - \frac{\omega_L^2}{\omega_{se}^2} - \frac{\omega_L^2}{\omega_{sh}^2}\right)\right) \tag{1}$$

where *p* = 5 mm is the length of the unit-cell and

$$\omega_L = \frac{1}{\sqrt{C_L L_L}} \tag{2}$$

$$\omega_R = \frac{1}{\sqrt{C_R L_R}} \tag{3}$$

$$\omega_{se} = \frac{1}{\sqrt{C_L L_R}} \tag{4}$$

$$\omega_{sh} = \frac{1}{\sqrt{C_R L_L}} \tag{5}$$
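Equations (1)–(5) can be evaluated numerically to see the two β = 0 points and the gap between them. The Python sketch below assumes a lossless unit cell and uses deliberately unbalanced, hypothetical element values (not the extracted values reported later in this section). It returns NaN where |cos(βp)| > 1, i.e., where no propagating Bloch wave exists, and, following common CRLH practice, assigns β < 0 on the left-handed branch below the gap.

```python
import math


def crlh_beta(omega, L_R, C_R, L_L, C_L, p):
    """Bloch phase constant beta (rad/m) of a lossless symmetric CRLH cell, Eq. (1)."""
    w_L = 1.0 / math.sqrt(C_L * L_L)   # Eq. (2)
    w_R = 1.0 / math.sqrt(C_R * L_R)   # Eq. (3)
    w_se = 1.0 / math.sqrt(C_L * L_R)  # Eq. (4), series resonance
    w_sh = 1.0 / math.sqrt(C_R * L_L)  # Eq. (5), shunt resonance
    x = (w_L**2 / omega**2 + omega**2 / w_R**2
         - w_L**2 / w_se**2 - w_L**2 / w_sh**2)
    cos_bp = 1.0 - 0.5 * x
    if abs(cos_bp) > 1.0 + 1e-12:
        return math.nan  # stop band: the Bloch wave is evanescent
    bp = math.acos(max(-1.0, min(1.0, cos_bp)))  # clamp rounding noise
    sign = -1.0 if omega < min(w_se, w_sh) else 1.0  # LH branch below the gap
    return sign * bp / p


# Hypothetical, unbalanced element values (H, F) and the 5 mm cell length
L_R, C_R, L_L, C_L, p = 1e-9, 0.5e-12, 3e-9, 1e-12, 5e-3
w_se = 1.0 / math.sqrt(C_L * L_R)
w_sh = 1.0 / math.sqrt(C_R * L_L)
print(f"f_sh = {w_sh / (2 * math.pi) / 1e9:.2f} GHz, f_se = {w_se / (2 * math.pi) / 1e9:.2f} GHz")
for w in (1e10, w_sh, 0.5 * (w_sh + w_se), w_se, 5e10):
    beta = crlh_beta(w, L_R, C_R, L_L, C_L, p)
    print(f"{w / (2 * math.pi) / 1e9:6.2f} GHz  beta = {beta:.2f} rad/m")
```

Setting *L*<sub>R</sub>*C*<sub>L</sub> = *L*<sub>L</sub>*C*<sub>R</sub> makes ω<sub>se</sub> = ω<sub>sh</sub>, and the NaN gap between them disappears, which is the balanced case described below.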

Equation (1) has two frequencies at which β = 0 (ω = ω<sub>se</sub> and ω = ω<sub>sh</sub>), referred to as the infinite-wavelength points, with a bandgap between them. In the balanced case (ω<sub>se</sub> = ω<sub>sh</sub>), the bandgap vanishes. Generally, only one particular zeroth-order resonance is excited, depending on the boundary conditions and the circuit values: for the short-ended resonator it is determined by ω<sub>se</sub>, while for the open-ended case it is determined by ω<sub>sh</sub> [9]. Multiple resonances, comprising negative-, zeroth-, and positive-order resonances, can be produced by cascading more than one unit-cell. The resonance frequencies of the different-order modes of an M-stage CRLH transmission line can be found on the dispersion diagram where the following condition is satisfied [9]:

$$\theta_M = \beta M p = n\pi \tag{6}$$

$$\beta p = \frac{n\pi}{M}, \quad \begin{cases} n = 0, \pm 1, \ldots, \pm (M-1), & \text{for a } T\text{-type unit cell} \\ n = 0, \pm 1, \ldots, \pm M, & \text{for a } \pi\text{-type unit cell} \end{cases} \tag{7}$$
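For a given number of cascaded cells M, Equation (7) simply enumerates the phase shifts per cell at which resonances can occur; the corresponding frequencies are then read off the dispersion curve. A minimal sketch of that enumeration (the function name is hypothetical):

```python
import math


def resonance_phases(M: int, cell_type: str) -> list[float]:
    """Phase shifts per cell, beta*p = n*pi/M, allowed by Eq. (7) for an M-cell CRLH line."""
    if M < 1:
        raise ValueError("M must be a positive integer")
    if cell_type == "T":
        n_max = M - 1
    elif cell_type == "pi":
        n_max = M
    else:
        raise ValueError("cell_type must be 'T' or 'pi'")
    return [n * math.pi / M for n in range(-n_max, n_max + 1)]


for cell in ("T", "pi"):
    orders = [round(bp / math.pi, 3) for bp in resonance_phases(2, cell)]
    print(cell, orders)
# T [-0.5, 0.0, 0.5]
# pi [-1.0, -0.5, 0.0, 0.5, 1.0]
```

Each nonzero entry maps to one negative- or positive-order resonance, and n = 0 is the zeroth-order resonance, which does not depend on M.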

The proposed leaky-wave antenna (LWA) array, which applies the CRLH-SIW unit-cell shown in Figures 1 and 4, has a π-type circuit model whose two ends are terminated by the LH inductances (*L*<sub>L</sub>) realized by the metallic via-holes. The E-shaped transverse slots and the tapered transverse slots are represented in the circuit model as the LH capacitances (*C*<sub>L</sub>). The RH contribution comes from the distributed shunt capacitance (*C*<sub>R</sub>) and the series inductance (*L*<sub>R</sub>). Figure 10 shows the dispersion curves of the proposed leaky-wave array antenna based on the CRLH-SIW unit-cell, extracted with the CST Microwave Studio package and from the equivalent circuit model of Figure 9b. The results obtained from CST Microwave Studio and from the circuit model are in excellent agreement, and both describe the dispersion relation of the unit-cell well. A zeroth-order resonance occurs at 61 GHz, which defines the upper and lower edges of the stop-band. This LWA array falls into the short-ended case because it operates below the original waveguide cutoff frequency and the metallic via-holes provide the short-ended condition. According to Equations (6) and (7), only one resonance can be excited: the zeroth-order resonance at *f*<sub>se</sub> = 61 GHz. The reflection coefficient in Figure 5 verifies the proposed model. This resonance frequency can be controlled by engineering the dispersion diagram. The values of the equivalent circuit parameters, determined from full-wave EM simulation using the CST Microwave Studio package, are *L*<sub>L</sub> = 6.45 nH, *C*<sub>L</sub> = 8.69 pF, *L*<sub>R</sub> = 1.53 nH, and *C*<sub>R</sub> = 4.12 pF. These data were then used to determine the equivalent circuit model of the antenna displayed in Figure 9b, which was verified using ADS (an RF circuit solver).

**Figure 10.** Dispersion diagrams for the proposed LWA array based on SIW-MTM extracted by CST Microwave Studio package and the corresponding equivalent circuit shown in Figure 9b.

The proposed LWA is a travelling-wave antenna in which the current propagates along a guiding structure. Because perturbations, namely the E-shaped transverse slots, are introduced along the structure, the travelling wave leaves the structure and radiates into free space. Therefore, in the ideal case, no energy reaches the end of the structure; in a practical scenario, any energy that reaches the end is absorbed by a matched load. An LWA is usually designed so that at least 90% of the power leaks away before the travelling wave reaches the end of the antenna. The leaky-wave phenomenon occurs with fast waves only. The propagating wavenumber *K*<sub>p</sub> is defined by [18,19]

$$K\_p = \sqrt{K\_0^2 - K\_z^2} \tag{8}$$

Here, *K*<sub>z</sub> is the longitudinal wavenumber and *K*<sub>0</sub> is the free-space wavenumber; *K*<sub>z</sub> = −*j*α corresponds to a surface (slow) wave, and *K*<sub>z</sub> = β to a leaky (fast) wave. In general, the longitudinal wavenumber of the radiating wave is complex:

$$K_z = \beta - j\alpha \tag{9}$$

where α and β are the attenuation and phase constants, respectively. Assuming that the wave satisfies the standard free-space wave equation, the field outside the leaking structure is given by:

$$\Psi(r) = \Psi_0 \, e^{-j(K_p \rho + \beta z)} \, e^{-\alpha z} \tag{10}$$

If |β| > *K*<sub>0</sub>, i.e., if the phase velocity is smaller than the free-space velocity of light (*V*<sub>p</sub> < *c*), the wave is a slow wave and *K*<sub>p</sub> is imaginary; its field decays exponentially away from the structure, so it is a bound wave. If |β| < *K*<sub>0</sub>, i.e., if the phase velocity is greater than the free-space velocity of light (*V*<sub>p</sub> > *c*), the wave is a fast wave and *K*<sub>p</sub> is purely real; real power is then radiated at an angle θ with respect to the normal, given by [20]:

$$\theta = \sin^{-1}\left(\frac{\beta}{K_0}\right) = \sin^{-1}\left(\frac{c\,\beta}{\omega}\right) \tag{11}$$
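Equation (11), together with the fast-wave condition |β| < *K*<sub>0</sub>, can be turned into a small helper that returns the main-beam angle or flags a bound (slow) wave. The names are hypothetical, and β is an input here rather than a value extracted from the paper. Read backwards, the ±30° scan reported at 55 and 65 GHz corresponds to β/*K*<sub>0</sub> = −0.5 and +0.5 at those frequencies.

```python
import math
from typing import Optional

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def free_space_wavenumber(f_hz: float) -> float:
    """K_0 = omega / c in rad/m."""
    return 2.0 * math.pi * f_hz / C0


def beam_angle_deg(beta: float, f_hz: float) -> Optional[float]:
    """Main-beam angle from broadside, Eq. (11); None for a non-radiating slow wave."""
    ratio = beta / free_space_wavenumber(f_hz)
    if abs(ratio) >= 1.0:
        return None  # |beta| >= K_0: K_p is zero or imaginary, so no leaky radiation
    return math.degrees(math.asin(ratio))


# beta chosen so that beta/K_0 = -0.5 at 55 GHz and +0.5 at 65 GHz
for f, frac in ((55e9, -0.5), (65e9, 0.5)):
    beta = frac * free_space_wavenumber(f)
    print(f"{f / 1e9:.0f} GHz: beam at {beam_angle_deg(beta, f):+.1f} deg")
# 55 GHz: beam at -30.0 deg
# 65 GHz: beam at +30.0 deg
```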

Since β and *K*<sub>0</sub> are both functions of the angular frequency, the beam angle changes with frequency, which gives rise to the frequency-scanning behaviour. The main-beam width is

$$\Delta\theta_0 = \frac{0.91}{\left(l/\lambda_0\right)\cos\theta_0} \tag{12}$$
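As a worked illustration of Equation (12), the sketch below evaluates the beamwidth for a 40 mm aperture (the LWA length given in Section 2) at 60 GHz, once at broadside and once at 30° at the same frequency, purely to show the 1/cos θ<sub>0</sub> broadening. The results are only what the formula predicts for that length, not values reported for the fabricated design; the helper name is hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def beamwidth_deg(length_m: float, f_hz: float, theta0_deg: float) -> float:
    """Main-beam width from Eq. (12); theta0 is the scan angle from broadside."""
    lam0 = C0 / f_hz
    return math.degrees(0.91 / ((length_m / lam0) * math.cos(math.radians(theta0_deg))))


for theta0 in (0.0, 30.0):
    print(f"theta0 = {theta0:4.1f} deg -> beamwidth = {beamwidth_deg(0.040, 60e9, theta0):.2f} deg")
# theta0 =  0.0 deg -> beamwidth = 6.51 deg
# theta0 = 30.0 deg -> beamwidth = 7.52 deg
```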

For long antennas, high directivity can be obtained; it can be expressed in terms of the effective aperture *A*<sub>e</sub> as

$$D = \frac{4\pi A_e}{\lambda_0^2} \tag{13}$$

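However, increasing the length yields a negligible improvement in directivity if no power is left near the end of the structure. To account for this, the leakage (attenuation) constant required for an aperture distribution *A*(*z*), with *e*<sub>r</sub> denoting the fraction of the input power to be radiated, is determined as [20]: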

$$\alpha(z) = \frac{\tfrac{1}{2}\, e_r A^2(z)}{\int_0^{l} A^2(z')\,dz' - e_r \int_0^{z} A^2(z')\,dz'} \tag{14}$$
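Equation (14) can be evaluated for any desired aperture amplitude *A*(*z*). The sketch below reads *e*<sub>r</sub> as the fraction of input power to be radiated and integrates *A*² with the trapezoidal rule; the function name and the uniform test aperture are illustrative assumptions. For a uniform aperture the result reduces to α(z) = (e<sub>r</sub>/2)/(l − e<sub>r</sub>z), which grows towards the end of the antenna so that the power radiated per unit length stays constant; integrating it analytically gives 1 − e<sup>−2∫α dz</sup> = e<sub>r</sub>, as intended.

```python
from typing import Callable


def leakage_constant(A: Callable[[float], float], z: float, l: float,
                     e_r: float, n: int = 2000) -> float:
    """alpha(z) in Np/m from Eq. (14) for aperture amplitude A over 0 <= z <= l."""
    def integral_A2(upper: float) -> float:
        # Trapezoidal rule for the integral of A^2 from 0 to upper
        if upper <= 0.0:
            return 0.0
        h = upper / n
        total = 0.5 * (A(0.0) ** 2 + A(upper) ** 2)
        total += sum(A(k * h) ** 2 for k in range(1, n))
        return total * h

    return 0.5 * e_r * A(z) ** 2 / (integral_A2(l) - e_r * integral_A2(z))


def uniform(z: float) -> float:
    """Hypothetical uniform aperture amplitude."""
    return 1.0


# 40 mm antenna radiating 90% of the input power
for z in (0.0, 0.02, 0.04):
    print(f"z = {z * 1e3:4.1f} mm: alpha = {leakage_constant(uniform, z, 0.04, 0.9):.2f} Np/m")
```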

Thus, if α is sufficiently small that the radiated fraction 1 − e<sup>−2α*l*</sup> remains noticeably below unity (i.e., some power still reaches the end of the structure), the directivity improves perceptibly as the length *l* increases.

#### **5. Comparison Between This Work and the Literature**

The performance parameters of the proposed 1 × 2 MTM-LWA array antenna based on SIW are compared with those of recent works employing various mutual-coupling suppression techniques. The comparison in Table 3 is for array antennas composed of two radiation elements. Most of the arrays listed in Table 3 exhibit narrowband performance, and, to increase the isolation between the radiation elements, they employ defected ground structures (DGS) to complement their suppression technique. The proposed array antenna has the advantages of (i) symmetry; (ii) a very wide bandwidth, from 55 GHz to 65 GHz; (iii) a simple design; (iv) improved radiation patterns; (v) enhanced radiation gain; (vi) low cross-polarization levels; and (vii) mutual-coupling suppression of ~25 dB on average over its operating band.



#### **6. Conclusion**

A feasibility study of a new metamaterial leaky-wave array antenna based on substrate-integrated waveguide (SIW) technology, with transverse slots and metallic via-holes, for operation from 55 GHz to 65 GHz has been presented. The array antenna provides a beam-scanning capability of ±30°, with gains of 8.5, 10.1, and 9.5 dBi in the backward (−30°), broadside (0°), and forward (+30°) directions, respectively. To increase the isolation between the array's elements, an SIW-based metamaterial shield was introduced between the antennas, reducing the mutual coupling by ~25 dB on average. In addition, the proposed MTM shield increases the radiation gain and efficiency by ~1 dB and ~13% on average, respectively.

**Author Contributions:** Conceptualization, M.A.; B.S.V.; F.F.; E.L.; methodology, M.A.; C.H.S.; F.F.; E.L.; software, M.A.; B.S.V.; C.H.S.; validation, M.A.; B.S.V.; C.H.S.; R.A.A.-A.; F.F.; E.L.; formal analysis, M.A.; B.S.V.; F.F.; E.L.; investigation, M.A.; B.S.V.; C.H.S.; R.A.A.-A.; F.F.; E.L.; resources, M.A.; C.H.S.; R.A.A.-A.; E.L.; data curation, M.A.; C.H.S.; R.A.A.-A.; F.F.; writing—original draft preparation, M.A.; writing—review and editing, B.S.V.; C.H.S.; R.A.A.-A.; F.F.; E.L.; visualization, M.A.; B.S.V.; C.H.S.; F.F.; E.L.; supervision, E.L.; project administration, R.A.A.-A.; F.F.; E.L.; funding acquisition, R.A.A.-A.; F.F.; E.L.

**Funding:** This work is partially supported by grant agreement H2020-MSCA-ITN-2016 SECRET-722424 and the UK EPSRC under grant EP/E022936/1.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Electronics* Editorial Office E-mail: electronics@mdpi.com www.mdpi.com/journal/electronics
