Efficient QC-LDPC Encoder for 5G New Radio

Nguyen, Tram Thi Bao; Nguyen Tan, Tuy; Lee, Hanho

doi:10.3390/electronics8060668


by Tram Thi Bao Nguyen, Tuy Nguyen Tan and Hanho Lee *

Department of Information and Communication Engineering, Inha University, Incheon 22212, Korea

* Author to whom correspondence should be addressed.

Electronics 2019, 8(6), 668; https://doi.org/10.3390/electronics8060668

Submission received: 22 May 2019 / Revised: 7 June 2019 / Accepted: 11 June 2019 / Published: 13 June 2019

(This article belongs to the Special Issue VLSI Architecture Design for Digital Signal Processing)


Abstract: This paper presents a novel efficient encoding method and a high-throughput low-complexity encoder architecture for quasi-cyclic low-density parity-check (QC-LDPC) codes for the 5th-generation (5G) New Radio (NR) standard. By storing the quantized value of the permutation information for each submatrix instead of the whole parity check matrix, the required memory storage size is considerably reduced. In addition, sharing techniques are employed to reduce the hardware complexity. An analysis of the encoding complexity of the proposed method indicated a substantial reduction in the required area as well as memory storage when compared with existing state-of-the-art encoding approaches. The proposed method requires only 61% of the gate area and 11% of the ROM storage of a similar LDPC encoder using the Richardson–Urbanke method. Synthesis on TSMC 65-nm complementary metal-oxide semiconductor (CMOS) technology with different submatrix sizes confirmed that the design methodology is flexible and can be adapted for multiple submatrix sizes. For all the considered submatrix sizes, the throughput ranged from 22.1 to 202.4 Gbps, which meets the throughput requirement for the 5G NR standard.

Keywords: quasi-cyclic LDPC code; channel codes; 5G New Radio; encoding

1. Introduction

Low-density parity-check (LDPC) codes [1], which were first proposed by Gallager in the early 1960s and rediscovered by MacKay and Neal [2] in 1996, have attracted widespread attention thanks to their remarkable error correction capabilities near the Shannon limit and to advancements in very large-scale integration (VLSI). Moreover, LDPC codes are among the most widely used types of forward error correction (FEC) codes in several communications standards such as the wireless local area network (WLAN, IEEE 802.11n), wireless radio access network (WRAN, IEEE 802.22), digital video broadcast (DVB), and the Advanced Television Systems Committee (ATSC). Recently, fifth generation (5G) communication has been a hotspot of research and development [3]. More specifically, LDPC codes play an important role in 5G communication and have been selected as the coding scheme for the 5G enhanced Mobile Broadband (eMBB) data channel [4]. To support rate-compatible and scalable data transmission, the 3rd Generation Partnership Project (3GPP) has agreed to consider two rate-compatible base graphs, BG1 and BG2, for the channel coding [5]. Accordingly, several studies have been conducted on the 5G LDPC codes. In [6], a low-cost and flexible demonstration platform is designed and implemented to evaluate the real-time performance of LDPC over the air interface as defined by 5G New Radio (NR) specifications. An algebra-assisted method for constructing 5G LDPC codes is presented in [7].

Over recent years, research on LDPC codes has been focused on structured LDPC codes known as quasi-cyclic low-density parity-check (QC-LDPC) codes [8,9,10,11,12], which exhibit advantages over other types of LDPC codes with respect to the hardware implementations of encoding and decoding using simple shift registers and logic circuits. A low-complexity encoder can be realized by using QC-LDPC codes, due to the sparseness of the parity check matrix. However, it is not straightforward to encode with low complexity, as LDPC codes are defined by their parity check matrix and the generator matrix is generally unknown. Various approaches have been suggested to reduce the hardware complexity of LDPC encoders [13,14,15,16,17,18,19,20,21]. One of the most conventional approaches is systematic encoding, in which the generator matrix is derived from the parity check matrix by Gaussian elimination. The main drawback of this method is that the storage overhead increases dramatically for large block sizes, which limits its practical applicability. The Richardson–Urbanke (RU) algorithm is a widely used LDPC encoding scheme developed by Richardson and Urbanke [13]. The underlying principle of the method is the transformation of the parity check matrix into an approximate lower triangular (ALT) form by using only row and column permutations, which preserves the sparseness of the matrix. This method suffers from a long critical path, which could make the LDPC encoder unsuitable for high throughput applications. To overcome the limitations of the previous approaches, the design proposed in this paper, a low-complexity high-throughput LDPC encoder architecture for the 5G standard, requires significantly less area and memory storage while maintaining a high throughput.

This paper targets the design of low-complexity high-throughput QC-LDPC encoders for the 5G NR standard. In LDPC encoders, the memory and interconnecting blocks are considered the major influencing factors of the overall area, delay, and power performance of the hardware design. Hence, the size of the read only memory (ROM) was decreased by storing the quantized value of the permutation information for each submatrix instead of the entire parity check matrix H. By exploiting the characteristics of the 5G NR base matrix, the proposed architecture requires fewer matrix multiplications than the RU method. In addition, the proposed algorithm does not require the inverse of the component matrix, which is its primary advantage over the RU method. Moreover, block memories are not required to store the generator matrix G, and the number of required components is reduced. The ROM size of the proposed method is 98.2% and 88.9% lower than those of the G matrix method and RU method, respectively.

To assess the benefits of the proposed encoding approach, we further implement and synthesize several QC-LDPC encoder architectures with different submatrix sizes Z = 30, 64, 96, 144, and 352. The application specific integrated circuit (ASIC) post synthesis implementation results on TSMC 65-nm complementary metal-oxide semiconductor (CMOS) technology revealed an area efficiency up to 597 Gbps/mm$^{2}$ when the proposed encoding method was implemented. Hence, it can be concluded that a promising encoding architecture design for 5G NR LDPC codes was developed in this study.

The remainder of this paper is organized as follows. Section 2 gives a brief overview of the characteristics of 5G NR QC-LDPC codes. In Section 3, two conventional LDPC encoding algorithms from the literature are outlined. A novel 5G NR QC-LDPC encoding approach and a low-complexity high-throughput QC-LDPC encoder architecture are described in Section 4. Section 5 presents the implementation and comparison results, followed by the conclusions in Section 6.

2. 5G NR QC-LDPC Codes

The NR access technology marks a transition in FEC coding for the 3GPP of cellular technologies [22]. In this section, the QC-LDPC codes are reviewed, and the characteristics of standard 5G QC-LDPC codes are summarized. In addition, procedures are presented for the construction of the parity check matrix of the target LDPC codes.

2.1. Preliminary

Let $Z$ be the size of a circulant permutation matrix and $P_{i,j}$ be the shift value. For any integer value $P_{i,j}$, $0 \leq P_{i,j} \leq Z$, a $Z \times Z$ circulant permutation matrix shifts the $Z \times Z$ identity matrix $I$ to the right by $P_{i,j}$ times for the $(i,j)$-th non-zero element in a base matrix. This binary circulant permutation matrix is denoted as $Q(P_{i,j})$. Considering $Q(1)$ as an example,

$$Q(1) = \begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 1 & 0 & 0 & \dots & 0 \end{bmatrix}. \tag{1}$$

For simple notation, $Q(-1)$ denotes the null matrix (all elements equal to zero) of the same size.
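The definition above can be sketched in a few lines of Python. This is only an illustration: the function names `circulant` and `matvec_gf2` are hypothetical, and the matrix is held as a plain list of rows rather than in any hardware-friendly form.

```python
def circulant(p, z):
    """Return Q(p) as a list of rows; Q(-1) is the z x z null matrix."""
    if p == -1:
        return [[0] * z for _ in range(z)]
    # Row r of the identity has its 1 in column r; shifting the identity
    # to the right by p moves that 1 to column (r + p) mod z.
    return [[1 if c == (r + p) % z else 0 for c in range(z)] for r in range(z)]


def matvec_gf2(m, v):
    """Multiply matrix m by column vector v over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in m]


q1 = circulant(1, 4)                       # the Z = 4 instance of Equation (1)
shifted = matvec_gf2(q1, [1, 0, 0, 1])     # multiplying by Q(1) cyclically shifts the vector
```

Multiplying a vector by `circulant(p, z)` only reorders its entries cyclically, which is why an encoder can replace the matrix product with a shifter.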

2.2. Introduction to QC-LDPC Codes

A binary QC-LDPC code can be characterized by the null space of an array of sparse circulants of the same size [7,23,24]. Taking into account the implementation, the parity-check matrix $H$ of a QC-LDPC code can be defined by its base graph and shift coefficients $(P_{i,j})$. Elements 1s and 0s in the base graph are replaced by a circulant permutation matrix and a zero matrix of size $Z \times Z$, respectively. For two positive integers $m_{b}$ and $n_{b}$, with $m_{b} \leq n_{b}$, consider the QC-LDPC code expressed by the following $m_{b} \times n_{b}$ array of $Z \times Z$ circulants over GF(2):

$$H = \begin{bmatrix} Q(P_{1,1}) & Q(P_{1,2}) & \dots & Q(P_{1,n_{b}}) \\ Q(P_{2,1}) & Q(P_{2,2}) & \dots & Q(P_{2,n_{b}}) \\ \vdots & \vdots & \ddots & \vdots \\ Q(P_{m_{b},1}) & Q(P_{m_{b},2}) & \dots & Q(P_{m_{b},n_{b}}) \end{bmatrix}. \tag{2}$$

The exponent matrix of $H$, which is $E(H)$, has the following form:

$$E(H) = \begin{bmatrix} P_{1,1} & P_{1,2} & \dots & P_{1,n_{b}} \\ P_{2,1} & P_{2,2} & \dots & P_{2,n_{b}} \\ \vdots & \vdots & \ddots & \vdots \\ P_{m_{b},1} & P_{m_{b},2} & \dots & P_{m_{b},n_{b}} \end{bmatrix}. \tag{3}$$

Each entry in the matrix $E$ is referred to as a shift value. It should be noted that the parity check matrix $H$ in Equation (2) can be constructed by expanding the $m_{b} \times n_{b}$ exponent matrix $E(H)$. This procedure is referred to as protograph construction [25].
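As an illustration of the expansion from Equation (3) to Equation (2), the following sketch lifts a small exponent matrix into a binary parity check matrix. The function name `expand` and the 2 x 3 exponent matrix `E` are made up for the example and are not taken from the 5G NR tables.

```python
def expand(exponents, z):
    """Expand an m_b x n_b exponent matrix into an (m_b*z) x (n_b*z) binary matrix."""
    h = []
    for block_row in exponents:
        for r in range(z):                  # row r inside each Z x Z block
            row = []
            for p in block_row:
                if p == -1:                 # Q(-1): null block
                    row.extend([0] * z)
                else:                       # Q(p): the 1 sits in column (r + p) mod z
                    row.extend(1 if c == (r + p) % z else 0 for c in range(z))
            h.append(row)
    return h


# Hypothetical exponent matrix with Z = 3.
E = [[0, 1, -1],
     [2, -1, 0]]
H = expand(E, 3)
```

Every row of `H` has as many ones as its block row has non-negative exponents, so the lifted matrix stays as sparse as the base graph.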

2.3. 5G NR QC-LDPC Characteristics

As mentioned above, QC-LDPC codes play an important role in 5G communications and have been accepted as the channel coding scheme for the 5G eMBB data channel in 3GPP standard meeting. Figure 1 illustrates the general structure of the NR QC-LDPC base graph. The columns are divided into three parts: information columns, core parity columns, and extension parity columns. The rows are partitioned into two parts: core check rows and extension check rows. As shown in the figure, the base matrix is composed of five submatrices, namely, A, B, O, C, and I [22]. Submatrix A corresponds to systematic bits. In addition, B corresponds to the first set of parity bits and is a square matrix with a dual-diagonal structure: its first column is of weight 3, whereas the submatrix composed of other columns after the first column has an upper dual-diagonal structure. Submatrix O is an all-zero matrix. For the efficient support of incremental redundancy hybrid automatic repeat request (IR-HARQ), a single parity-check (SPC) based extension is used to support lower rates, as shown in Figure 1. Submatrix C corresponds to SPC rows, and I is an identity matrix that corresponds to the second set of parity bits, i.e., the SPC extension. The combination of A and B is referred to as the kernel, and the other parts (O, C, and I) are referred to as extensions. This code structure is similar to the Raptor-like extension, as described in [26].

The 3GPP agreed to consider two rate-compatible base graphs, denoted by BG1 and BG2, for the channel coding. Base graphs BG1 and BG2 have similar structures. However, BG1 is targeted for larger block lengths $(500 \leq K \leq 8448)$ and higher rates $(1/3 \leq R \leq 8/9)$, whereas BG2 is targeted for smaller block lengths $(40 \leq K \leq 2560)$ and lower rates $(1/5 \leq R \leq 2/3)$. The actual base graph usage and the definition of the two matrices are detailed in the NR standard specification TS 38.212 [27]. The base graph that supports $K_{max}$ should support the following set of shift sizes $Z$, where $Z = a \times 2^{j}$ for $a \in \{2, 3, 5, 7, 9, 11, 13, 15\}$ and $0 \leq j \leq 7$.

For base graphs BG1 and BG2, the number of shift coefficient designs is 8. All lifting sizes are divided into eight sets based on parameter $a$, where $a$ is used for the definition of the lifting size $a \times 2^{j}$. The sets of shift coefficients are listed in Table 1.

The shift value $P_{i,j}$ can be calculated using the function $P_{i,j} = f(V_{i,j}, Z)$, where $V_{i,j}$ is the shift coefficient of the $(i,j)$-th element in the corresponding shift design. The function $f$ is defined as Equation (4), in which $\operatorname{mod}$ denotes the modulo arithmetic:

$$P_{i,j} = f(V_{i,j}, Z) = \begin{cases} -1, & \text{if } V_{i,j} = -1, \\ \operatorname{mod}(V_{i,j}, Z), & \text{else}. \end{cases} \tag{4}$$
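Equation (4) translates directly into code. In the sketch below the function name `shift_value` and the row of shift coefficients are illustrative assumptions, not entries copied from Table 1.

```python
def shift_value(v, z):
    """Equation (4): map a shift coefficient v to the shift value for lifting size z."""
    return -1 if v == -1 else v % z


# Hypothetical row of shift coefficients, reduced for Z = 16.
coefficients = [250, 69, -1, 0]
row = [shift_value(v, 16) for v in coefficients]
```

The same stored coefficient therefore serves every lifting size in its set; only the modulus changes.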

The following procedures are the steps of constructing the parity check matrix of the target $(N, K)$ QC-LDPC code with a given information block size $K$ and code rate $R = K/N$. For a base graph, $k_{b}$ denotes the number of information circulant columns; thus, if the lifting size is $Z$, $K = Z \times k_{b}$ nominally.

Step 1: Obtain the base graph BG1 or BG2 and determine the value of $k_{b}$ for the given $K$ and $R$.

–: For BG1: $k_{b} = 22$.
–: For BG2: $k_{b} = 10$ if $K > 640$; $k_{b} = 9$ if $560 < K \leq 640$; $k_{b} = 8$ if $192 < K \leq 560$; and $k_{b} = 6$ elsewhere.

Step 2: Determine $Z$ by selecting the minimum $Z$ value in Table 2, such that $k_{b} \times Z \geq K$.

Step 3: After the lifting size $Z$ is determined, the corresponding shift coefficient matrix is then selected from Table 1 {Set 1, Set 2, ⋯, Set 8} according to set $Z$.

Step 4: Calculate the shifting coefficient value $P_{i,j}$ by the modular $Z$ operation, as discussed in Equation (4).

Step 5: Replace each entry in the final exponent matrix with the corresponding circulant permutation matrix or zero matrix of size $Z \times Z$. The QC-LDPC code construction is completed and a parity check matrix $H$ of size $m_{b} Z \times n_{b} Z$ is obtained. In 5G QC-LDPC codes, shortening and puncturing are carried out to obtain the desired information lengths and rate adaption. Figure 2 presents an illustration of the encoding process of these codes.
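Steps 1 and 2 above can be sketched as follows. Since Table 2 is not reproduced here, the candidate lifting sizes are generated from the formula $Z = a \times 2^{j}$ given earlier; the cap `max_z=384` is an assumption based on TS 38.212 rather than on this text, and all function names are hypothetical.

```python
def lifting_sizes(max_z=384):
    """Candidate Z = a * 2**j for a in {2,3,5,7,9,11,13,15}, 0 <= j <= 7.

    max_z is an assumed cap (largest lifting size in TS 38.212).
    """
    sizes = {a * 2 ** j for a in (2, 3, 5, 7, 9, 11, 13, 15) for j in range(8)}
    return sorted(z for z in sizes if z <= max_z)


def info_columns(k, base_graph):
    """Step 1: number of information circulant columns k_b."""
    if base_graph == "BG1":
        return 22
    if k > 640:
        return 10
    if k > 560:
        return 9
    if k > 192:
        return 8
    return 6


def select_lifting_size(k, base_graph):
    """Step 2: smallest Z with k_b * Z >= K; returns (k_b, Z)."""
    kb = info_columns(k, base_graph)
    for z in lifting_sizes():
        if kb * z >= k:
            return kb, z
    raise ValueError("K is too large for the available lifting sizes")
```

For instance, `select_lifting_size(500, "BG2")` picks $k_{b} = 8$ and then the smallest candidate $Z$ with $8Z \geq 500$.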

3. LDPC Encoding Algorithms

Given a parity check matrix $H$, the objective of LDPC encoding is to solve parity equations:

$$H C^{T} = 0^{T}, \tag{5}$$

where $C$ is the systematic codeword, which consists of the information bit vector $S$ and parity code vector $P$.

This section presents a review on two generic encoding methodologies for the implementation of the LDPC encoder: the Gaussian elimination method and the RU method.

3.1. LDPC Encoding with Gaussian Elimination

Gaussian elimination is the most conventional method of encoding LDPC codes. Encoding is carried out by multiplication with the generator matrix $G$ and has a complexity that is quadratic in the block length [19]. The unknown generator matrix $G$ can be derived from the parity check matrix $H$. A generator matrix for a code with parity check matrix $H$ can be obtained by carrying out Gauss–Jordan elimination on $H$ to bring it into the following form:

$$H = \begin{bmatrix} A & I_{N-K} \end{bmatrix}, \tag{6}$$

where $A$ is an $(N-K) \times K$ binary matrix and $I_{N-K}$ is the identity matrix of order $(N-K)$. The generator matrix is as follows:

$$G = \begin{bmatrix} I_{K} & A^{T} \end{bmatrix}. \tag{7}$$

The codeword $C$ is then obtained by multiplying the systematic bits $S$ by the generator matrix $G$ as follows:

$$C = S G. \tag{8}$$

The sequential LDPC encoder based on the multiplication of the $G$ matrix requires a ROM to store the generator matrix used to compute the codeword $C$. The main drawback of this approach is that, unlike the parity check matrix $H$, the corresponding generator matrix $G$ will most likely not be sparse. The complexity of this straightforward encoding algorithm is $O(N^{2})$, where $N$ is the number of bits in a codeword. Therefore, the implementation of the matrix multiplication at the encoder results in a very high complexity. For an arbitrary parity check matrix, the construction of $G$ should be avoided and encoding should be carried out using back substitution with $H$.
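A toy instance of Equations (6)–(8) may make the procedure concrete. The sketch below assumes $H$ is already in the form $[A \; I_{N-K}]$; the 3 x 4 matrix `A`, the message `S`, and the function names are illustrative only.

```python
def encode_with_generator(s, a):
    """Systematic encoding C = S G with G = [I_K | A^T], over GF(2)."""
    # Column c of A^T is row c of A, so parity bit c is the GF(2) dot product
    # of the message with row c of A.
    parity = [sum(x & y for x, y in zip(s, row)) % 2 for row in a]
    return list(s) + parity


def syndrome(a, c):
    """Compute H C^T with H = [A | I_{N-K}], over GF(2)."""
    k = len(a[0])
    return [(sum(x & y for x, y in zip(row, c[:k])) + c[k + i]) % 2
            for i, row in enumerate(a)]


# Hypothetical (7, 4) example.
A = [[1, 1, 0, 1],
     [0, 1, 1, 1],
     [1, 0, 1, 1]]
S = [1, 0, 1, 1]
C = encode_with_generator(S, A)
```

Because $A$ is generally dense after elimination, every parity bit touches most of the message, which is the quadratic cost noted above.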

3.2. LDPC Encoding with the RU Method

Instead of determining a generator matrix for $H$, an LDPC code can be directly encoded using the parity check matrix by transforming it into a lower triangular form and applying back substitution. The RU encoding method, which was proposed by Richardson and Urbanke [13], is a linear time encoding method for sparse parity check matrices. The underlying principle is to transform the parity check matrix $H$ using only row and column permutations, so that the transformed matrix remains sparse. Therefore, this approach has a lower complexity than the $G$ matrix multiplication method. The RU algorithm consists of two steps: a pre-processing step and the actual encoding step.

First, in the pre-processing step, the parity check matrix $H$ is converted into the approximate lower triangular (ALT) form, as shown in Figure 3. The parity check matrix $H$ is given by the $M \times N$ matrix, where $N$ is the block length of the code and $M$ is the number of parity check equations. Given that the matrix transformation is realized solely by row and column permutations, the $H$ matrix remains a sparse matrix:

$$H_{T} = \begin{bmatrix} A & B & T \\ C & D & E \end{bmatrix}. \tag{9}$$

Here, the matrix $T$ has a lower triangular form with 1s along the diagonal, and all the entries above the diagonal are 0s. By multiplying $H_{T}$ from the left by

$$\begin{bmatrix} I & 0 \\ -E T^{-1} & I \end{bmatrix}, \tag{10}$$

the following is obtained:

$$\tilde{H} = \begin{bmatrix} A & B & T \\ \tilde{C} & \tilde{D} & 0 \end{bmatrix}, \tag{11}$$

where

$$\begin{aligned} \tilde{C} &= -E T^{-1} A + C, \\ \tilde{D} &= -E T^{-1} B + D, \\ \tilde{E} &= -E T^{-1} T + E = 0. \end{aligned} \tag{12}$$

The actual encoding step is performed by matrix-multiplication, forward-substitution and vector addition operations. Let the codeword $C = [s \; p_{1} \; p_{2}]$, where $s$ represents the information bits, $p_{1}$ denotes the first $G$ parity check bits, and $p_{2}$ contains the remaining $(M - G)$ parity check bits. The codeword $C$ must satisfy the parity check equation $\tilde{H} C^{T} = 0^{T}$, which yields two equations:

$$\begin{aligned} A s^{T} + B p_{1}^{T} + T p_{2}^{T} &= 0^{T}, \\ \tilde{C} s^{T} + \tilde{D} p_{1}^{T} + 0 p_{2}^{T} &= 0^{T}. \end{aligned} \tag{13}$$

Using the RU method, the calculation of the parity bits in the first parity portion $p_{1}$ is only dependent on the information bits, given that $E$ was cleared. Hence, it can be calculated independently of the parity bits in $p_{2}$. If $\tilde{D}$ is non-singular, then $p_{1}^{T}$ can be obtained from Equation (13):

$$p_{1}^{T} = -\tilde{D}^{-1} \tilde{C} s^{T}. \tag{14}$$

If $\tilde{D}$ is singular in GF(2), then it is necessary to further permute the columns of $\tilde{H}$ to eliminate this singularity. Once $p_{1}$ is known, $p_{2}$ can be determined using Equation (13):

$$p_{2}^{T} = -T^{-1} (A s^{T} + B p_{1}^{T}). \tag{15}$$

In GF(2) the minus signs in Equations (14) and (15) have no effect, since addition and subtraction coincide. Given that $T$ is lower triangular, $p_{2}$ can be found using back substitution. The complexity of this encoding procedure can be kept low since $A$, $B$ and $T$ are sparse. Table 3 and Table 4 present the complexity of calculation of $p_{1}^{T}$ and $p_{2}^{T}$, respectively. The complexity of the RU algorithm is given by $O(N + G^{2})$, where $N$ is the block length and $G$ is the gap to linear encoding. The gap is the number of rows of the parity check matrix that cannot be brought into triangular form using only row and column permutations. The smaller the gap $G$, the lower the encoding complexity of the code.
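The substitution step used to obtain $p_{2}$ from Equation (15) can be sketched as below: rather than inverting $T$, the system $T p_{2}^{T} = A s^{T} + B p_{1}^{T}$ is solved row by row from the top. The function name and the 3 x 3 example are hypothetical, and the right-hand side `b` stands for an already computed $A s^{T} + B p_{1}^{T}$.

```python
def solve_lower_triangular_gf2(t, b):
    """Solve T x = b over GF(2) for lower triangular T with 1s on the diagonal."""
    x = []
    for i, row in enumerate(t):
        # x_i = b_i + sum_{j < i} T[i][j] * x_j (mod 2); the diagonal entry is 1.
        acc = b[i]
        for j in range(i):
            acc ^= row[j] & x[j]
        x.append(acc)
    return x


# Hypothetical example.
T = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]
b = [1, 0, 1]
p2 = solve_lower_triangular_gf2(T, b)
```

The work per row is proportional to the number of ones in that row of $T$, which is why sparsity of $T$ keeps this step cheap.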

The disadvantage of encoding using the RU method is that there is no exact programmable step-by-step algorithm. The multiple matrix calculations in this algorithm significantly limit the development of a rapid flexible encoder [28]. In addition, the RU method is subject to a long critical path and odd constraints, which could render the LDPC encoder non-systematic [19].

4. Proposed 5G NR QC-LDPC Encoder Design

4.1. Proposed QC-LDPC Encoding Algorithm

This section presents an efficient scheme developed in this study for the construction of efficient encoders for 5G NR QC-LDPC codes. The proposed encoding method is based on the special characteristics of 5G NR QC-LDPC codes, which are presented in Figure 1. The proposed architectures target low-complexity, while ensuring high-throughput. As reported in the literature review, base graphs BG1 and BG2 have similar structures. In this paper, we focus our description on BG1 with a size of $m_{b} \times n_{b}$ ($m_{b} = 46$ and $n_{b} = 68$), which is the main 5G NR high rate base graph.

Let the codeword $C = [s \; p_{a} \; p_{c}]$, where $s$ denotes the systematic portion, which is divided into 22 groups of $Z$ bits, since the base graph BG1 has $k_{b} = n_{b} - m_{b} = 22$ information bit columns. Moreover, $s = [s_{1}, s_{2}, \dots, s_{k_{b}}]$, where each element of $s$ is a vector of length $Z$. The information messages received by the encoder are stored in registers that are organized as $k_{b}$ blocks, denoted by $s_{i}$ $(i = 1, 2, \dots, k_{b})$, which correspond to the systematic blocks, where each consists of $Z$ bits. Given that the encoder was designed to read $Z$ bits per clock cycle, it requires $k_{b}$ cycles to store all the information blocks. Moreover, the parity sequence can be grouped into sets of $Z$ bits. Suppose that the parity portion $p$ of each message is split into two sub-components as follows: the first $g = 4$ parity blocks $p_{a} = [p_{a_{1}}, p_{a_{2}}, \dots, p_{a_{g}}]$, and the remaining $(m_{b} - g) = 42$ parity blocks $p_{c} = [p_{c_{1}}, p_{c_{2}}, \dots, p_{c_{m_{b} - g}}]$. More precisely, the encoded codeword can be expressed as:

$$C = \begin{bmatrix} s_{1}, s_{2}, \dots, s_{k_{b}}, p_{a_{1}}, p_{a_{2}}, \dots, p_{a_{g}}, p_{c_{1}}, p_{c_{2}}, \dots, p_{c_{m_{b} - g}} \end{bmatrix}. \tag{16}$$

The parity check matrix $H$ of 5G NR QC-LDPC codes can be partitioned into six matrices and presented in the following form:

$$H = \begin{bmatrix} A & B & 0 \\ C_{1} & C_{2} & I \end{bmatrix}, \tag{17}$$

where $A$ is $g \times k_{b}$, $B$ is $g \times g$, $C_{1}$ is $(m_{b} - g) \times k_{b}$, and $C_{2}$ is $(m_{b} - g) \times g$, all counted in $Z \times Z$ blocks. Moreover, $I$ is an identity matrix with dimensions of $(m_{b} - g) \times (m_{b} - g)$. The encoding of LDPC codes is carried out using the following defining equation:

$$H C^{T} = 0^{T}. \tag{18}$$

Equation (18) can also be expressed as:

$$\begin{bmatrix} A & B & 0 \\ C_{1} & C_{2} & I \end{bmatrix} \begin{bmatrix} s^{T} \\ p_{a}^{T} \\ p_{c}^{T} \end{bmatrix} = 0^{T}. \tag{19}$$

Equation (19) is then naturally split into two equations, as follows:

$$A s^{T} + B p_{a}^{T} + 0 p_{c}^{T} = 0^{T}, \tag{20}$$

$$C_{1} s^{T} + C_{2} p_{a}^{T} + I p_{c}^{T} = 0^{T}. \tag{21}$$

The proposed algorithm is performed in two steps. In the initial step, the parity bits in the first portion $p_{a}$ are computed by solving Equation (20). The second step in the encoding process includes the computation of the $p_{c}$ parity portions using Equation (21).

The first step in the encoder implementation is the determination of the $p_{a}$ part. Initially, Equation (20) is re-written in block form as follows:

$$\begin{bmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,k_{b}} \\ a_{2,1} & a_{2,2} & \dots & a_{2,k_{b}} \\ a_{3,1} & a_{3,2} & \dots & a_{3,k_{b}} \\ a_{4,1} & a_{4,2} & \dots & a_{4,k_{b}} \end{bmatrix} \begin{bmatrix} s_{1} \\ s_{2} \\ \vdots \\ s_{k_{b}} \end{bmatrix} + \begin{bmatrix} 1 & 0 & -1 & -1 \\ 0 & 0 & 0 & -1 \\ -1 & -1 & 0 & 0 \\ 1 & -1 & -1 & 0 \end{bmatrix} \begin{bmatrix} p_{a_{1}} \\ p_{a_{2}} \\ p_{a_{3}} \\ p_{a_{4}} \end{bmatrix} = 0. \tag{22}$$

Here the entries of the second matrix are shift values, with $-1$ denoting the null matrix as in Section 2.1. This can then be expanded into the following set of equations:

$$\sum_{j=1}^{k_{b}} a_{1,j} s_{j} + p_{a_{1}}^{(1)} + p_{a_{2}} = 0, \tag{23}$$

$$\sum_{j=1}^{k_{b}} a_{2,j} s_{j} + p_{a_{1}} + p_{a_{2}} + p_{a_{3}} = 0, \tag{24}$$

$$\sum_{j=1}^{k_{b}} a_{3,j} s_{j} + p_{a_{3}} + p_{a_{4}} = 0, \tag{25}$$

$$\sum_{j=1}^{k_{b}} a_{4,j} s_{j} + p_{a_{1}}^{(1)} + p_{a_{4}} = 0, \tag{26}$$

where $p_{a_{1}}^{(α)}$ denotes the $α^{th}$ (right) cyclic shifted version of $p_{a_{1}}$ for $0 \leq α \leq Z$. By adding up all the above equations, the following is obtained:

$$p_{a_{1}} = \sum_{i=1}^{4} \sum_{j=1}^{k_{b}} a_{i,j} s_{j}. \tag{27}$$
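The summation step can be written out explicitly. The following is a worked version of the argument, using only Equations (23)–(26) and the fact that every term added to itself vanishes in GF(2):

```latex
\begin{aligned}
0 &= \sum_{i=1}^{4}\sum_{j=1}^{k_{b}} a_{i,j}s_{j}
     + \underbrace{p_{a_{1}}^{(1)} + p_{a_{1}}^{(1)}}_{=\,0}
     + p_{a_{1}}
     + \underbrace{p_{a_{2}} + p_{a_{2}}}_{=\,0}
     + \underbrace{p_{a_{3}} + p_{a_{3}}}_{=\,0}
     + \underbrace{p_{a_{4}} + p_{a_{4}}}_{=\,0} \\
  &= \sum_{i=1}^{4}\sum_{j=1}^{k_{b}} a_{i,j}s_{j} + p_{a_{1}}
  \quad\Longrightarrow\quad
  p_{a_{1}} = \sum_{i=1}^{4}\sum_{j=1}^{k_{b}} a_{i,j}s_{j}.
\end{aligned}
```

Only the unshifted $p_{a_{1}}$ from Equation (24) appears an odd number of times, which is what allows it to be isolated without any matrix inversion.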

It should be noted that a straightforward implementation of $a_{i,j} s_{j}$ can be done with the use of $Z$-bit cyclic shifters. Since $a_{i,j} s_{j}$ is a circular right shift of $s_{j}$ with the shift coefficient defined by $a_{i,j}$, the hardware complexity is trivial. Based on the definition below,

$$λ_{i} = \sum_{j=1}^{k_{b}} a_{i,j} s_{j} \quad \text{for } i = 1, 2, 3, 4, \tag{28}$$

the following can be obtained:

$$p_{a_{1}} = \sum_{i=1}^{4} λ_{i}, \tag{29}$$

$$p_{a_{2}} = λ_{1} + p_{a_{1}}^{(1)}, \tag{30}$$

$$p_{a_{3}} = λ_{3} + p_{a_{4}}, \tag{31}$$

$$p_{a_{4}} = λ_{4} + p_{a_{1}}^{(1)}. \tag{32}$$

From Equation (28), each $λ_{i}$ value is computed by accumulating all the $a_{i,j} s_{j}$ values. In modulo-2 arithmetic, $λ_{i}$ is obtained by carrying out XOR operations on all the elements of $a_{i,j} s_{j}$. One $λ_{i}$ value can be computed per clock cycle, so all of them are available after $g = 4$ cycles. The first block of the parity bits $p_{a_{1}}$ is then calculated by accumulating all the $λ_{i}$ values. The remaining parity blocks $p_{a_{i}}$ follow directly from Equations (30)–(32). This process takes two clock cycles, since $p_{a_{3}}$ depends on $p_{a_{4}}$. All the parity bits $p_{a}$ in the first parity portion are stored in registers.
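The first encoding step can be sketched in software as follows. This is a bit-level illustration of Equations (28)–(32), not the hardware design: the helper names `rot`, `xor` and `first_parity` are hypothetical, the toy dimensions ($Z = 4$, three information blocks) are chosen for brevity, and the shift direction in `rot` is an assumed convention that only needs to be used consistently.

```python
def rot(v, p):
    """Cyclic shift of block v by p positions; p == -1 yields the all-zero block."""
    if p == -1:
        return [0] * len(v)
    p %= len(v)
    return v[p:] + v[:p]        # same result as multiplying by Q(p) from Section 2.1


def xor(*blocks):
    """Bitwise XOR (GF(2) addition) of equally sized blocks."""
    out = [0] * len(blocks[0])
    for b in blocks:
        out = [x ^ y for x, y in zip(out, b)]
    return out


def first_parity(a_exp, s):
    """Compute lambda_1..lambda_4 and p_a1..p_a4 from Equations (28)-(32)."""
    lam = [xor(*(rot(s[j], a_exp[i][j]) for j in range(len(s)))) for i in range(4)]
    pa1 = xor(*lam)                  # Eq. (29)
    pa1_shift = rot(pa1, 1)          # p_a1^(1)
    pa2 = xor(lam[0], pa1_shift)     # Eq. (30)
    pa4 = xor(lam[3], pa1_shift)     # Eq. (32)
    pa3 = xor(lam[2], pa4)           # Eq. (31), needs p_a4 first
    return lam, [pa1, pa2, pa3, pa4]


# Toy example with hypothetical shift values.
A_EXP = [[0, 1, -1], [2, -1, 3], [-1, 0, 1], [1, 2, 0]]
S = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
LAM, PA = first_parity(A_EXP, S)
```

The ordering inside `first_parity` mirrors the dependency noted above: `pa3` cannot be formed until `pa4` is available.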

In a second step, the $p_{c}$ portion can be easily determined based on Equation (21), where matrices $C_{1}$ and $C_{2}$ are given by

$$C_{1} = \begin{bmatrix} c_{1,1} & c_{1,2} & \dots & c_{1,k_{b}} \\ c_{2,1} & c_{2,2} & \dots & c_{2,k_{b}} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m_{b}-g,1} & c_{m_{b}-g,2} & \dots & c_{m_{b}-g,k_{b}} \end{bmatrix}; \quad C_{2} = \begin{bmatrix} c_{1,k_{b}+1} & c_{1,k_{b}+2} & \dots & c_{1,k_{b}+g} \\ c_{2,k_{b}+1} & c_{2,k_{b}+2} & \dots & c_{2,k_{b}+g} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m_{b}-g,k_{b}+1} & c_{m_{b}-g,k_{b}+2} & \dots & c_{m_{b}-g,k_{b}+g} \end{bmatrix}. \tag{33}$$

Upon the application of Equation (21), the elements of $p_{c}$ can be computed using the following equations:

$$\begin{aligned} p_{c_{1}} &= \sum_{j=1}^{k_{b}} c_{1,j} s_{j} + \sum_{j=1}^{g} c_{1,k_{b}+j} p_{a_{j}}, \\ p_{c_{2}} &= \sum_{j=1}^{k_{b}} c_{2,j} s_{j} + \sum_{j=1}^{g} c_{2,k_{b}+j} p_{a_{j}}, \\ &\;\;\vdots \\ p_{c_{m_{b}-g}} &= \sum_{j=1}^{k_{b}} c_{m_{b}-g,j} s_{j} + \sum_{j=1}^{g} c_{m_{b}-g,k_{b}+j} p_{a_{j}}. \end{aligned} \tag{34}$$

Similarly, $c_{i,j} s_{j}$ represents a circular shift of $s_{j}$ with the shift coefficient defined by $c_{i,j}$, and $c_{i,k_{b}+j} p_{a_{j}}$ represents a circular shift of $p_{a_{j}}$ with the shift coefficient defined by $c_{i,k_{b}+j}$. As soon as $c_{i,j} s_{j}$ and $c_{i,k_{b}+j} p_{a_{j}}$ have been obtained, they can be used to determine the value of the corresponding parity bits in the second parity portion $p_{c}$. Each block $p_{c_{i}}$ can be computed in a single clock cycle. Hence, all the $p_{c}$ parity bits can be acquired in $(m_{b} - g)$ clock cycles. The encoded codeword is then a combination of the original message $s$ and the two calculated parity portions $p_{a}$ and $p_{c}$.
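The second step, Equation (34), amounts to one XOR accumulation per row of $[C_{1} \; C_{2}]$. The sketch below uses hypothetical names (`rot`, `second_parity`) and a deliberately tiny toy case ($Z = 3$, two information blocks, two first-portion parity blocks) rather than the BG1 dimensions; the shift direction is again an assumed convention.

```python
def rot(v, p):
    """Cyclic shift of block v by p positions; p == -1 yields the all-zero block."""
    if p == -1:
        return [0] * len(v)
    p %= len(v)
    return v[p:] + v[:p]


def second_parity(c1_exp, c2_exp, s, pa):
    """Equation (34): one Z-bit block p_c_i per row of [C1 C2]."""
    z = len(s[0])
    pc = []
    for row1, row2 in zip(c1_exp, c2_exp):
        acc = [0] * z
        # Shift every information block and every p_a block, then XOR them all.
        for p, block in list(zip(row1, s)) + list(zip(row2, pa)):
            acc = [x ^ y for x, y in zip(acc, rot(block, p))]
        pc.append(acc)
    return pc


# Toy example with hypothetical shift values and blocks.
C1_EXP = [[0, 2], [-1, 1]]
C2_EXP = [[1, -1], [0, 0]]
S = [[1, 0, 0], [0, 1, 1]]
PA = [[1, 1, 0], [0, 0, 1]]
PC = second_parity(C1_EXP, C2_EXP, S, PA)
```

Because the last block column of $H$ is an identity, each $p_{c_{i}}$ depends only on $s$ and $p_{a}$, so the rows can be processed in any order or in parallel.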

4.2. Proposed QC-LDPC Encoder Architecture

Figure 4 details the overall block diagram for the proposed low complexity 5G NR QC-LDPC code encoder. The hardware architectures were designed to conduct the encoding process through the steps defined in Equations (29)–(32) and (34). In the first step, the computation of the parity bits in the first portion $p_{a}$ is carried out. From Equation (28), each $λ_{i}$ value is computed by accumulating all the cyclic shift results of $s_{j}$. Since the information message $s$ consists of $k_{b}$ blocks of $Z$ bits, a total of $k_{b} = 22$ barrel shifters of size $Z$, which are denoted by $CS_{j}$, $j = 1, 2, \dots, k_{b}$, are required for the circular shift process. The vector addition of all the components of $λ_{i}$ is then carried out by the XOR trees. One intermediate $λ_{i}$ value corresponding to Equation (28) can be computed per clock cycle and stored in the λ_memory to be used later. Thus, the value of $p_{a_{1}}$ can be obtained in $g = 4$ clock cycles, when all $λ_{i}$ values have been obtained and stored in memory. The remaining parity bits of $p_{a}$ can be obtained in 2 clock cycles with the use of XOR gates.

The objective of the second step is the calculation of the parity bits in the second portion $p_{c}$. According to Equation (34), the parity blocks $p_{c_{i}}$ can be obtained by the vector addition of $c_{i,j} s_{j}$ and $c_{i,k_{b}+j} p_{a_{j}}$. The value of $c_{i,j} s_{j}$ is also computed by accumulating all the cyclic shift results of $s_{j}$. In this step, the overall hardware complexity can be further decreased by exploiting the sharing technique. More specifically, the barrel shifters and XOR trees are reused for the computation of $p_{c}$ in this step. Control signals are generated by the controller block. The value of $c_{i,k_{b}+j} p_{a_{j}}$ is computed by accumulating all the cyclic shift results of $p_{a_{j}}$. The required number of $Z$-bit barrel shifters is $g = 4$. The main blocks of the proposed architecture can be described as follows.

(1) Input/Output Buffer: the input buffer, which is implemented as a number of serial input parallel output shift registers, is exploited to store the input systematic bits $s_{i}$ received by the encoder. The output buffer is used to store the encoded codeword.

(2) Memory Blocks: two kinds of memory blocks are utilized, namely, one for the submatrix permutation values, and the other for the accumulated values $λ$ that correspond to matrix $A$. In Figure 4, the AROM, C$_{1}$ROM, and C$_{2}$ROM correspond to the ROMs that store the coefficients of matrix $A$, matrix $C_{1}$, and matrix $C_{2}$, respectively. Under the assumption that $\tilde{q} = \lceil \log_{2} Z \rceil$ bits represent the required word length to store the permutation information for each submatrix, $\tilde{q} g k_{b}$, $\tilde{q} (m_{b} - g) k_{b}$, and $\tilde{q} (m_{b} - g) g$ bits are required to store matrices $A$, $C_{1}$, and $C_{2}$, respectively. A significant portion of the hardware complexity of the LDPC encoder consists of the memory required to store the parity check matrix. Unlike the RU method, the proposed algorithm does not require the inverse of the component matrix, which is its primary advantage over the RU method. Compared with the Gaussian method, the proposed architecture does not require block memories to store the generator matrix $G$, which further decreases the number of required components. The λ_memory is implemented as a dual port random access memory (RAM) for storing the $λ_{i}$ messages $(i = 1, 2, \dots, g)$. Each memory word $λ_{i}$ consists of $Z$ bits, corresponding to one accumulated message of matrix $A$. In total, $(g \times Z)$ bits of λ_memory are required for the proposed encoder.

(3) Barrel Shifters: barrel shifters are used to implement the cyclic shift permutations, according to the shift values provided by the cyclic shifter controllers. It should be noted that the number of cyclic shifters is equal to the number of message blocks, and the size of the barrel shifters is equal to submatrix size Z.

(4) XOR Trees: in Modulo 2, the addition implementation is obtained by carrying out an XOR operation on all the elements.

(5) Controller: this block generates control signals such as data_sel, which indicates the step being processed, and mem_en, which enables write access to the λ_memory.
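The interplay of the barrel shifters and XOR trees can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware design: the function names, the use of -1 to mark an all-zero submatrix, and the left-rotation direction are assumptions made for illustration. It shifts each Z-bit message block by its permutation value and XORs the results, which is one way a single accumulated Z-bit word could be formed.

```python
from functools import reduce
from operator import xor


def cyclic_shift(block, shift):
    """Barrel shifter: rotate a Z-element bit list left by `shift` positions."""
    z = len(block)
    shift %= z
    return block[shift:] + block[:shift]


def xor_tree(blocks):
    """XOR tree: modulo-2 sum of equally sized bit lists, element by element."""
    return [reduce(xor, column) for column in zip(*blocks)]


def block_row_product(shifts, message_blocks):
    """Accumulate one block row: XOR of the shifted message blocks.

    A shift of -1 marks an all-zero submatrix (assumed convention), so the
    corresponding message block does not contribute.
    """
    z = len(message_blocks[0])
    shifted = [cyclic_shift(blk, s)
               for s, blk in zip(shifts, message_blocks) if s >= 0]
    return xor_tree(shifted) if shifted else [0] * z


# Toy example with Z = 4 and three message blocks (values are illustrative).
msg = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 1, 1]]
lam = block_row_product([1, -1, 2], msg)
print(lam)  # [1, 1, 1, 0]
```

In hardware the shifts of one block row happen in parallel and the XOR tree combines them in a single step, whereas this sketch evaluates them sequentially.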

5. Performance Analysis and Comparison

This section reports the implementation results of the proposed LDPC encoder architecture and compares the proposed method in detail with other encoder implementations in terms of area and speed for the 5G NR standard. First, the design characteristics of different LDPC encoders are discussed. Thereafter, the ASIC implementation of the proposed LDPC encoder is analyzed.

Table 5 compares the area and speed of the proposed encoding method with those of other state-of-the-art approaches. As shown in Table 5, the ROM storage is determined by the matrix size, and the gate count is computed from the Hamming weights of the matrices. Since all the systematic bits and the parity bits of the first parity portion are stored in registers, the number of flip-flops is estimated from the bit sizes of K and $p_{a}$. In Table 5, processing speed is compared using the time interval between input frames, measured as the number of clock cycles from the arrival of the first Z bits of a frame until the cycle in which the encoder is ready to receive another frame.

To make the comparison concrete, Table 6 considers a target LDPC code with base graph BG1 and submatrix size $Z = 16$. As Table 6 shows, the proposed encoder significantly reduces the storage overhead. In the Gaussian elimination method, the entire generator matrix G is stored in memory. In the RU method, the locations of the edges (ones) of each row are stored, with an extra bit indicating the end of a row. By storing only the shift coefficient of each submatrix, the proposed method reduces the ROM size by 98.2% and 88.9% compared with the G matrix method and the RU method, respectively. Moreover, the proposed encoder requires 1.65 times fewer XOR gates than the RU method. Because these components are the main consumers of logic resources in the encoder architecture, this significantly reduces the hardware complexity of the proposed encoder relative to other LDPC encoding methods. As can be seen from Table 6, the Gaussian elimination approach requires only 23 clock cycles to generate the encoded codeword for the given LDPC code. However, its significant storage overhead makes it less than ideal for implementation. The RU design requires 471 clock cycles per codeword, significantly more than the 70 clock cycles required by the proposed encoder; that is, the proposed design needs only about 14.9% of the clock cycles of the RU method.
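The ratios quoted above follow directly from the Table 6 entries. As a quick check, the short Python sketch below recomputes them; the dictionary and function names are illustrative only, and the input values are copied from Table 6.

```python
# Values copied from Table 6 (BG1, Z = 16).
rom_bits = {"gaussian": 259_072, "ru": 42_488, "proposed": 4_720}
xor_gates = {"ru": 764, "proposed": 464}
cycles = {"gaussian": 23, "ru": 471, "proposed": 70}


def reduction_percent(baseline, new):
    """Relative saving of `new` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - new / baseline)


rom_vs_gaussian = reduction_percent(rom_bits["gaussian"], rom_bits["proposed"])
rom_vs_ru = reduction_percent(rom_bits["ru"], rom_bits["proposed"])
xor_ratio = xor_gates["ru"] / xor_gates["proposed"]
cycle_share = 100.0 * cycles["proposed"] / cycles["ru"]

print(f"ROM reduction vs. Gaussian: {rom_vs_gaussian:.1f}%")
print(f"ROM reduction vs. RU:       {rom_vs_ru:.1f}%")
print(f"XOR gate ratio RU/proposed: {xor_ratio:.2f}")
print(f"Clock cycles relative to RU: {cycle_share:.1f}%")
```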

The ASIC post-synthesis implementation results on TSMC 65-nm CMOS technology are shown in Table 7 for QC-LDPC encoders with expansion factors Z = 30, 64, 96, 144, and 352, denoted in the table as BG1-Z30, BG1-Z64, BG1-Z96, BG1-Z144, and BG1-Z352, respectively. In Table 7, the $\tilde{q}$ size denotes the word length required to store the shift sizes, and $CPC$ stands for the number of clock cycles required per codeword for encoding. Note that all input data bits were assumed to be available for encoding, and serialization factors are not included in the results. In the proposed design, the $CPC$ is the total number of clock cycles required to calculate the $p_{a}$ and $p_{c}$ parity bits. The computation of $p_{a}$ requires $(g + 2)$ clock cycles: g clock cycles to compute all the $λ$ values and $p_{a_{1}}$, and two extra clock cycles to compute the remaining parity bits of the $p_{a}$ portion. The computation of $p_{c}$ requires $(m_{b} - g)$ clock cycles. Hence, this method requires $(m_{b} + 2)$ clock cycles in total. The information throughput reported in Table 7 is given by

$$Throughput = \frac{m_{b} \times Z \times f_{max}}{CPC}, \qquad (35)$$

where $f_{max}$ is the maximum operating frequency (post synthesis). Across the considered submatrix sizes, the throughput ranged from 22.1 to 202.4 Gbps. Table 7 also reports the occupied areas. The core area increases significantly for larger submatrix sizes, because an encoder architecture with a larger Z requires a larger $\tilde{q}$ size and therefore additional memory and hardware components. The results indicate that the encoding complexity of the proposed design is approximately linearly proportional to the submatrix size Z of the code. To compare throughput on an equal basis, the throughput-to-area ratio is defined as $TAR = Throughput / Area$ (Gbps/mm$^{2}$). For all the submatrix sizes considered in Table 7, the $TAR$ ranged from 520 to 597 Gbps/mm$^{2}$.
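Equation (35) and the cycle count can be checked numerically. The Python sketch below is an illustration under stated assumptions: the BG1 parameters $m_{b} = 46$ and $g = 4$ are not given in this section and are assumed here, the function names are hypothetical, and the frequencies and areas are copied from Table 7. Because the tabulated areas are rounded to three decimals, the TAR is computed from the rounded throughput and should be read as approximate.

```python
def clock_cycles_per_codeword(m_b, g):
    """CPC = (g + 2) cycles for p_a plus (m_b - g) cycles for p_c."""
    return (g + 2) + (m_b - g)


def throughput_gbps(m_b, z, f_max_mhz, cpc):
    """Equation (35), with f_max in MHz and the result in Gbps."""
    return m_b * z * f_max_mhz / cpc / 1000.0


M_B, G = 46, 4  # assumed BG1 parameters (block rows, gap)

# (Z, max. frequency in MHz, area in mm^2), copied from Table 7.
TABLE7 = [(30, 769, 0.037), (64, 714, 0.077), (96, 667, 0.117),
          (144, 645, 0.171), (352, 600, 0.389)]

cpc = clock_cycles_per_codeword(M_B, G)
results = []
for z, f_mhz, area in TABLE7:
    thr = round(throughput_gbps(M_B, z, f_mhz, cpc), 1)
    tar = round(thr / area)  # from the rounded throughput and rounded area
    results.append((z, thr, tar))
    print(f"Z={z:3d}  CPC={cpc}  throughput={thr:.1f} Gbps  TAR={tar} Gbps/mm^2")
```

With these assumed parameters, the sketch yields a CPC of 48 for every lifting size, since the cycle count depends only on the base graph and not on Z.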

The implementation results above show that the design methodology is applicable to different submatrix sizes and offers high area efficiency and high information throughput, which is more than enough to satisfy the throughput requirement of the 5G NR standard.

6. Conclusions

In this paper, a novel low-complexity, high-throughput encoding approach for the 5G NR standard is proposed. Based on the proposed encoding algorithm, five encoder architectures with different submatrix sizes were implemented. The derived architecture exhibits significantly lower hardware complexity because it reduces the memory and logic requirements: for BG1 with Z = 16, it needs 88.9% less ROM and 1.65 times fewer XOR gates than the RU method. Moreover, the synthesis results, with throughputs of 22.1 to 202.4 Gbps, show that the proposed design is suitable for the high-throughput 5G NR standard.

Author Contributions

T.T.B.N. conceptualized the idea of this research, conducted experiments, collected data, and prepared the original version. T.N.T. reviewed, analyzed data, and updated the manuscript. H.L. supervised, validated, reviewed, and supported the research with funding.

Funding

This work was supported by the INHA UNIVERSITY Research Grant.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. Sketch of base parity check structure for the 5G NR QC-LDPC codes.

Figure 2. Shortening by zero padding and puncturing of standard 5G QC-LDPC codes.

Figure 3. The parity check matrix H in approximate lower triangular form.

Figure 4. Low-complexity high-throughput encoder architecture for 5G NR QC-LDPC code.

Table 1. Relationship between exponent matrices and sets of lifting size.

Exponent Matrix	Lifting Size Set
Set 1	$Z = 2 \times 2^{j}, j = 0, 1, 2, 3, 4, 5, 6, 7$
Set 2	$Z = 3 \times 2^{j}, j = 0, 1, 2, 3, 4, 5, 6, 7$
Set 3	$Z = 5 \times 2^{j}, j = 0, 1, 2, 3, 4, 5, 6$
Set 4	$Z = 7 \times 2^{j}, j = 0, 1, 2, 3, 4, 5$
Set 5	$Z = 9 \times 2^{j}, j = 0, 1, 2, 3, 4, 5$
Set 6	$Z = 11 \times 2^{j}, j = 0, 1, 2, 3, 4, 5$
Set 7	$Z = 13 \times 2^{j}, j = 0, 1, 2, 3, 4$
Set 8	$Z = 15 \times 2^{j}, j = 0, 1, 2, 3, 4$

Table 2. Lifting size Z supported by standard 5G QC-LDPC codes.

j	a = 2	a = 3	a = 5	a = 7	a = 9	a = 11	a = 13	a = 15
0	2	3	5	7	9	11	13	15
1	4	6	10	14	18	22	26	30
2	8	12	20	28	36	44	52	60
3	16	24	40	56	72	88	104	120
4	32	48	80	112	144	176	208	240
5	64	96	160	224	288	352	–	–
6	128	192	320	–	–	–	–	–
7	256	384	–	–	–	–	–	–
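The lifting sizes of Tables 1 and 2 all follow the rule $Z = a \times 2^{j}$. The Python sketch below regenerates them and looks up the set to which a given Z belongs; the function names and the explicit upper bound of 384 (the largest entry in Table 2) are assumptions made for this illustration.

```python
def lifting_size_sets(z_max=384):
    """Return {set index: [Z values]} with Z = a * 2**j and Z <= z_max."""
    bases = [2, 3, 5, 7, 9, 11, 13, 15]  # the values of a, Sets 1 to 8
    sets = {}
    for index, a in enumerate(bases, start=1):
        sizes = []
        z = a
        while z <= z_max:
            sizes.append(z)
            z *= 2
        sets[index] = sizes
    return sets


def find_set(z, sets):
    """Return the index of the set containing lifting size z, or None."""
    for index, sizes in sets.items():
        if z in sizes:
            return index
    return None


sets = lifting_size_sets()
for z in (30, 64, 96, 144, 352):
    print(f"Z = {z:3d} belongs to Set {find_set(z, sets)}")
```

For the five lifting sizes implemented in Table 7, the lookup returns Sets 8, 1, 2, 5, and 6, matching the Subset row of that table.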

Table 3. Complexity analysis of $p_{1}^{T}$ calculation.

Operation	Comment	Complexity
$A s^{T}$	Multiplication by sparse matrix	$O (N)$
$T^{- 1} A s^{T}$	Back substitution, T is lower triangular matrix	$O (N)$
$- E T^{- 1} [A s^{T}]$	Multiplication by sparse matrix	$O (N)$
$C s^{T}$	Multiplication by sparse matrix	$O (N)$
$\tilde{C} = - E T^{- 1} [A s^{T}] + C s^{T}$	Addition	$O (N)$
$- {\tilde{D}}^{- 1} \tilde{C} s^{T}$	Multiplication by $g \times g$ matrix	$O (g^{2})$

Table 4. Complexity analysis of $p_{2}^{T}$ calculation.

Operation	Comment	Complexity
$A s^{T}$	Multiplication by sparse matrix	$O (N)$
$B p_{1}^{T}$	Multiplication by sparse matrix	$O (N)$
$[A s^{T}] + [B p_{1}^{T}]$	Addition	$O (N)$
$- T^{- 1} (A s^{T} + B p_{1}^{T})$	Back substitution, T is lower triangular	$O (N)$

Table 5. Comparison between Gaussian method, RU method, and proposed method.

		Gaussian	RU	Proposed
Area	Flip-flops	$k_{b} Z$	$(k_{b} + g) Z$	$(k_{b} + g) Z$
	XOR gates	$(k_{b} Z - 1) m_{b} Z$	$2 m_{b} + (m_{b} - g) Z$	$(k_{b} + 2 g - 1) Z$
	AND gates	$k_{b} m_{b} Z^{2}$	–	–
	Barrel shifter (Z bits)	–	–	$k_{b} + g + 1$
	Memory (bits)	ROM = $k_{b} m_{b} Z^{2}$	ROM = $(245 x + 29 y + 274) Z$	ROM = $\tilde{q} [m_{b} (k_{b} + g) - g^{2}]$; λ_memory = $g Z$
Speed (clock cycles)		$k_{b} + 1$	$28 Z + k_{b} + 1$	$n_{b} + 2$

where $\tilde{q} = \lceil \log_{2} Z \rceil$; $x = \lceil \log_{2} (k_{b} Z) \rceil$; $y = \lceil \log_{2} (g Z) \rceil$.

Table 6. Comparison between Gaussian method, RU method, and proposed method for submatrix size $Z = 16$.

		Gaussian	RU	Proposed
Area	Flip-flops	352	416	416
	XOR gates	258,336	764	464
	AND gates	259,072	–	–
	Barrel shifter (Z bits)	–	–	27
	Memory (bits)	ROM = 259,072	ROM = 42,488	ROM = 4720; λ_memory = 64
Speed (clock cycles)		23	471	70
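The "Proposed" column of Table 6 can be reproduced from the Table 5 formulas. The Python sketch below does so under stated assumptions: the BG1 parameters ($m_{b} = 46$, $n_{b} = 68$, $k_{b} = 22$, $g = 4$) are not listed in this section and are assumed here, and the class and function names are hypothetical. The ROM is also split into the A, $C_{1}$, and $C_{2}$ parts described for the memory blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseGraph:
    m_b: int  # number of block rows
    n_b: int  # number of block columns
    k_b: int  # number of information block columns
    g: int    # gap of the approximate lower triangular form


BG1 = BaseGraph(m_b=46, n_b=68, k_b=22, g=4)  # assumed BG1 parameters


def shift_word_length(z):
    """q~ = ceil(log2(Z)), computed with integer arithmetic."""
    return (z - 1).bit_length()


def rom_breakdown(bg, z):
    """ROM bits needed for the shift values of A, C1, and C2."""
    q = shift_word_length(z)
    return {
        "A": q * bg.g * bg.k_b,
        "C1": q * (bg.m_b - bg.g) * bg.k_b,
        "C2": q * (bg.m_b - bg.g) * bg.g,
    }


def proposed_cost(bg, z):
    """Evaluate the 'Proposed' column of Table 5 for lifting size z."""
    q = shift_word_length(z)
    return {
        "flip_flops": (bg.k_b + bg.g) * z,
        "xor_gates": (bg.k_b + 2 * bg.g - 1) * z,
        "barrel_shifters": bg.k_b + bg.g + 1,
        "rom_bits": q * (bg.m_b * (bg.k_b + bg.g) - bg.g ** 2),
        "lambda_mem_bits": bg.g * z,
        "clock_cycles": bg.n_b + 2,
    }


cost = proposed_cost(BG1, 16)
parts = rom_breakdown(BG1, 16)
print(cost)
print(parts, "total =", sum(parts.values()))
```

The three ROM parts sum to the closed-form expression in Table 5, which is a useful consistency check when adapting the formulas to another base graph or lifting size.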

Table 7. ASIC implementation results of LDPC encoders for different lifting sizes $Z =$ 30, 64, 96, 144, and 352.

Encoder	BG1-Z30	BG1-Z64	BG1-Z96	BG1-Z144	BG1-Z352
CMOS technology	65-nm	65-nm	65-nm	65-nm	65-nm
Base graph	BG1	BG1	BG1	BG1	BG1
Subset	8	1	2	5	6
Submatrix size Z	30	64	96	144	352
$\tilde{q}$ size (bits)	5	6	7	8	9
$CPC$ (clock cycles)	48	48	48	48	48
Max. frequency (MHz)	769	714	667	645	600
Throughput (Gbps)	22.1	43.8	61.4	89.0	202.4
Area (mm$^{2}$)	0.037	0.077	0.117	0.171	0.389
Gate counts	45.9 K	96 K	146.3 K	214 K	486.4 K
$TAR$ $^{†}$ (Gbps/mm$^{2}$)	597	569	525	520	520

$^{†}$ $TAR = Throughput / Area$.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
