1. Introduction
Forward error correcting codes counteract random errors over a noisy channel by inserting redundant bits into code words [1]. In order to balance the quality and rate of communication, different coding schemes have been proposed over the past few decades, and the corresponding decoding schemes have increasingly attracted the attention of researchers. In a traditional communication system, only a decoder that knows the encoding parameters can decode accurately. Under non-cooperative communication conditions [2], however, such as cognitive radio, it is impossible for the non-cooperative receiver to decode without prior knowledge of the code parameters. Hence, coding blind recognition is urgently required and has attracted extensive research interest [3].
Among the existing coding schemes, the Low Density Parity Check (LDPC) code, first proposed by Gallager in the 1960s [4], has been widely used in modern communication systems, and has been adopted as the long-code coding scheme for the 5G enhanced mobile broadband scenario due to its long code length, rich combinations, and sparse check matrix. These same characteristics, however, also pose a challenge to decoding and to coding blind recognition. In practical terms, LDPC codes are usually too long to reconstruct the parity-check matrix directly.
In order to reduce the high time and computational complexity, most existing methods of LDPC coding blind recognition proposed in recent years use closed-set identification. Identification methods based on a closed set utilize a known set which contains all probable parameters [5,6,7,8]. The log likelihood ratio (LLR) used in [5,6] for coding blind recognition performs well at low SNR. In [7], the LDPC code is identified by the average likelihood difference (LD) of parity checks. Since the LLR of the syndrome a posteriori probability is widely used in these methods, they are often limited by channel conditions. Wu proposed calculating the average cosine conformity (CC) for recognition, which not only has an explicit probability density but also low computational complexity [8]. The code parameters within a given closed set can be recognized using the methods above. Identification methods without a candidate set are more universal, but take longer to compute. The minimum-Hamming-weight vector is used in the search algorithm proposed by Valembois [9], but its fault tolerance is weak. Cluzeau used iterative decoding [10] to improve the tolerance of the method in [9], at the cost of a longer time for finding sparse parity-check vectors.
Deep learning, as an emerging technique, has been widely applied in many fields, such as image classification [11] and natural language processing [12]. The best solutions to specific problems are found through the connection of multi-layer networks. Different network models have been proposed by scholars in different fields; e.g., the Transformer model proposed by Vaswani et al. [13] is a well-known sequence-to-sequence (seq2seq) architecture that performs well in natural language processing (NLP) by treating a sentence as a sequence of words. The self-attention layer structure in the Transformer greatly reduces computation time through parallel calculation.
In recent years, the combination of neural networks and coding blind recognition has progressed rapidly [14,15,16,17]. It has been shown that both the coding scheme and the coding parameters can be identified using neural networks [14]. Two types of LDPC codes are identified in [15], while a 2-dimensional convolutional neural network is used to identify the parity-check matrix of LDPC codes with the help of a candidate set [16]. Moreover, a joint modulation and channel coding recognition framework is proposed in [17] for practical 5G-PDSCH protocols, using a novel Res-Inception convolutional neural network and an algorithm based on the maximal ALLR. Inspired by the deep learning models used in NLP, we propose a novel Transformer-based method for recognizing the coding parameters of LDPC codes.
Furthermore, the channel condition is a key factor in coding blind recognition. Almost all traditional coding blind recognition methods for LDPC codes make full use of the channel condition. In contrast, most existing recognition methods based on deep learning pay little attention to the channel condition, so these methods identify accurately at high SNR but fare badly when the channel condition is poor. Channel noise resembles image noise in some respects. Deep learning methods for two-dimensional image denoising have been widely studied in computer vision [18] and have made considerable progress over the last couple of years [19,20], although these designs target two-dimensional data. Denoising networks are also widely used in other fields. In the area of decoding, a novel receiver architecture [21] concatenates a belief propagation (BP) decoder for decoding with a convolutional neural network (CNN) for denoising; the iteration between BP decoding and the CNN gradually improves the SNR and achieves better decoding performance. In [22], a double-CNN denoiser is designed to suppress the noise by estimating the channel state information (CSI) under the Rayleigh fading channel. However, these two methods can hardly be applied in practice due to the constraint of the information bit length [23], which can cause a dimension explosion of the neural network.
In this paper, we design a deep-learning-based architecture for coding blind recognition of LDPC codes over an additive white Gaussian noise channel with different SNRs. Briefly, the main contributions of this paper are as follows: (1) the code words are treated as a sequence of words which is sent to the proposed Transformer-based neural network for coding blind recognition; (2) a denoising network and new loss functions are proposed in order to obtain a reliable recovery of the bit stream for recognition; (3) the denoising network and the blind recognition network are cascaded, and the simulation results show that the accuracy of the cascade structure is better than that of the non-cascade structure; (4) we compare the proposed method with traditional methods and show that, at low SNR, our method performs better.
3. Proposed Cascade Neural Network
This section contains the main innovation of this work: a cascade neural network for LDPC coding recognition. In order to better present the proposed design, we first describe the system framework. Then, the CNN structure and the coding blind recognition network are introduced, and their functions are explained in detail.
The input of the denoising CNN is a 1-D vector x, while the output vector is y; both x and y are of the same size. As shown in Figure 5, the received code words are uniformly divided into N-bit segments and then sent into the CNN. After denoising, y is used to recalculate the LLR L. The segments of L are then concatenated, and the blind recognition network based on the Transformer takes the concatenated vector as input. The output of the recognition network is the label corresponding to the parity-check matrix H.
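As a rough illustration of this data flow (not the authors' implementation), the cascade can be sketched as follows; here `denoise` is a hypothetical placeholder for the trained CNN of Section 3.1, and `recognize` stands in for the Transformer-based classifier:

```python
import numpy as np

def denoise(r):
    """Placeholder for the denoising CNN: an identity map here.
    In the real system this is the trained 1-D CNN of Section 3.1."""
    return r

def recalc_llr(y, sigma2):
    """Recalculate per-bit LLRs assuming Gaussian residual noise (Section 3.2)."""
    return 2.0 * y / sigma2

def cascade_forward(r, sigma2, recognize):
    """Cascade: denoise -> LLR recalculation -> recognition network."""
    y = denoise(r)               # denoised soft values
    llr = recalc_llr(y, sigma2)  # per-bit log-likelihood ratios
    return recognize(llr)        # label index of the H matrix

# toy usage with a dummy recognizer that always returns label 0
r = np.random.randn(648)
label = cascade_forward(r, sigma2=0.5, recognize=lambda llr: 0)
```

The point of the sketch is only the ordering of the three stages; each stage is swapped for its trained counterpart in the actual system.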
3.1. Denoising Networks
Figure 6 shows the denoising network using a CNN. The input of our network is a 1-D vector. The feature map of the $j$th channel at the $i$th layer, $y_j^i$, can be expressed in (11) as
$$y_j^i = \mathrm{ReLU}\Big(\sum_{k} w_{j,k}^{i} * y_k^{i-1} + b_j^{i}\Big), \qquad (11)$$
where $*$ represents the convolution operation, $w_{j,k}^{i}$ is the $j$th convolution kernel in layer $i$, and $b_j^{i}$ represents the corresponding bias. The activation function is ReLU. In addition, the structure parameters of the proposed denoising network are given in Table 1.
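A minimal numpy sketch of one such 1-D convolutional layer, assuming the standard feature-map form ReLU(sum_k w_jk * x_k + b_j); the kernel shapes below are illustrative, not the parameters of Table 1:

```python
import numpy as np

def conv1d_layer(x, kernels, biases):
    """One 1-D convolutional layer: feature_j = ReLU(sum_k w_jk * x_k + b_j).

    x: (in_channels, length); kernels: (out_channels, in_channels, ksize)."""
    out = []
    for w_j, b_j in zip(kernels, biases):
        # sum the convolutions over all input channels
        acc = sum(np.convolve(x_k, w_jk, mode="same")
                  for x_k, w_jk in zip(x, w_j))
        out.append(np.maximum(acc + b_j, 0.0))  # ReLU activation
    return np.array(out)

x = np.random.randn(1, 648)          # a 1-D input vector of 648 samples
w = np.random.randn(4, 1, 3) * 0.1   # 4 illustrative kernels of size 3
b = np.zeros(4)
fmap = conv1d_layer(x, w, b)         # fmap.shape == (4, 648)
```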
In addition to the network framework, the loss function is also important for designing the network, since it guides the training in the correct direction. The $L_2$ loss, which is also called the mean squared error (MSE), is widely used in the area of image denoising. Suppose that $\hat{y}$ is the forward-calculation output of the network and $y$ is the expected output; the MSE can be represented by
$$L_{2} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n - y_n\right)^2.$$
Inspired by image denoising, the $L_2$ loss is also used in this design. Furthermore, in [22], the authors propose a method for recalculating the LLR which obtains the empirical probability distribution function (EPDF) $F$ of the residual noise through histogram statistics. In order to recalculate the LLR more easily, we instead assume that the residual noise preserves a Gaussian distribution with various values of $\sigma$. Thus, a normality test [25] is used to evaluate whether the residual noise meets the Gaussian distribution. The residual noise is defined as $e = \hat{y} - y$, and its numerical expectation as $\mu = E[e]$. The test statistics are expressed by
$$S = \frac{E\left[(e-\mu)^3\right]}{\sigma^3}, \qquad C = \frac{E\left[(e-\mu)^4\right]}{\sigma^4},$$
where $S$ is called skewness and $C$ is called kurtosis. For a Gaussian distribution, $S = 0$ and $C = 3$. Thus, the loss function combines the $L_2$ term with a penalty on the deviation of $S$ from 0 and of $C$ from 3.
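The skewness/kurtosis check can be sketched as follows; the weighting `lam` between the MSE term and the Gaussianity penalty is an assumption for illustration, not the paper's exact loss:

```python
import numpy as np

def skewness(e):
    m, s = e.mean(), e.std()
    return np.mean((e - m) ** 3) / s ** 3

def kurtosis(e):
    m, s = e.mean(), e.std()
    return np.mean((e - m) ** 4) / s ** 4

def gaussian_penalty(e):
    """Zero when the residual noise is exactly Gaussian:
    skewness S -> 0 and kurtosis C -> 3."""
    return skewness(e) ** 2 + (kurtosis(e) - 3.0) ** 2

def total_loss(y_hat, y, lam=1.0):
    """MSE plus Gaussianity penalty; lam is an assumed weighting factor."""
    mse = np.mean((y_hat - y) ** 2)
    return mse + lam * gaussian_penalty(y_hat - y)

# for purely Gaussian samples the penalty is close to zero
e = np.random.default_rng(0).normal(size=200_000)
```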
In this paper, the candidate set of LDPC codes has the code lengths n ∈ {648, 1296, 1944} and rates R ∈ {1/2, 2/3, 3/4, 5/6}. IEEE 802.11 elaborates on these LDPC codes specifically, and they are used for Wireless Fidelity (Wi-Fi). The H matrix of one of these LDPC codes is shown in Figure 7. The H matrix is constituted by two basic elements, i.e., a zero matrix, represented by −1, and an identity matrix. The elements in the H matrix which are not equal to −1 represent the number of positions by which the identity matrix is cyclically rotated to the right. In order to train the denoising network, we consider AWGN with different SNRs. Three code words with different lengths are sent to the channel. In order to prevent a dimension explosion of the neural network and to reduce the training time, an appropriate input size of the network is chosen, namely 648 bits, the shortest code word length; the code words are accordingly divided into 648-bit segments c. After the processing of the sending end and the channel, we obtain the noisy segments with white Gaussian noise added. The denoising network uses c and its noisy counterpart for training.
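A sketch of how such (clean, noisy) training pairs could be generated; the BPSK mapping and the Es/N0 noise-scaling convention are assumptions here, and 1944 bits is used only as an example code-word length:

```python
import numpy as np

def make_denoiser_samples(codeword_bits, snr_db, seg=648, rng=None):
    """Generate (clean, noisy) training pairs for the denoising CNN.

    BPSK-modulate the code word, add AWGN at the given SNR, and cut the
    stream into seg-bit segments (648 here, the shortest code length)."""
    rng = rng or np.random.default_rng()
    s = 1.0 - 2.0 * np.asarray(codeword_bits, float)  # bit 0 -> +1, bit 1 -> -1
    sigma = np.sqrt(0.5 * 10 ** (-snr_db / 10))       # assumed Es/N0 convention
    r = s + sigma * rng.normal(size=s.shape)
    n = (len(s) // seg) * seg
    return s[:n].reshape(-1, seg), r[:n].reshape(-1, seg)

bits = np.random.randint(0, 2, 1944)  # e.g. the longest 802.11n code word
clean, noisy = make_denoiser_samples(bits, snr_db=2.0)
# clean.shape == noisy.shape == (3, 648)
```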
3.2. Recalculation of LLR
As mentioned in Section 3.1, the residual noise is assumed to preserve a Gaussian distribution with various values of $\sigma$. It is then easy to recalculate the LLR; for a Gaussian channel with BPSK signaling, the LLR of a denoised sample $y$ is
$$\tilde{L} = \frac{2y}{\sigma^2}.$$
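A minimal sketch of this recalculation, assuming the BPSK mapping 0 → +1, 1 → −1 and Gaussian residual noise of variance `sigma2`:

```python
import numpy as np

def recalculate_llr(y, sigma2):
    """LLR of each bit, assuming the residual noise after denoising is
    Gaussian with variance sigma2 and BPSK mapping 0 -> +1, 1 -> -1:
        LLR(y) = log p(y|0)/p(y|1) = 2*y / sigma2
    """
    return 2.0 * np.asarray(y) / sigma2

llr = recalculate_llr([+0.9, -1.1, +0.2], sigma2=0.5)
# a positive LLR favours bit 0, a negative LLR favours bit 1
```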
3.3. Recognition Networks
After the above operations, the LLR vectors are calculated at sizes of (1, 648). Then, 15 such vectors are concatenated into 1-D data of size (1, 9720), which forms the input of the recognition network. However, because of the positional encoding layer in the Transformer, the input of the network must be of a fixed size. In order to identify the three different lengths of LDPC code and to reduce the number of parameters, ideas from the Swin Transformer are used in the recognition network.
Figure 8 shows the architecture of the network, which takes the concatenated vector as input. Since the token size of this network is 324, the input is divided into 30 × 324 tokens by the Patch Partition layer. The window size is 10 tokens, and the relative position encoding (RPE) B is a learnable variable. B is calculated by substituting its index into an RPE matrix whose size is determined by the number of tokens in the window; the relationship between tokens and windows is shown in Figure 9. The Swin Transformer block contains two layers: one for MHA calculation, and another for shifted-window MHA (SW-MHA) calculation.
Figure 10 shows the details of the Swin Transformer block. Each window first calculates MHA, which is introduced in Section 2.3. In order to calculate SW-MHA, the feature map must be shifted circularly, as shown in Figure 9; then, the SW-MHA can be calculated in the same way as the MHA. It should be noted that the formula differs slightly from (8) because of the different linear embedding strategy: B is the RPE mentioned before, and the mask is not needed in this task.
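The window partition and the cyclic shift used for SW-MHA can be sketched with numpy; the 30 × 324 token layout and the 10-token window follow the description above, while the half-window shift amount is an assumption borrowed from the original Swin Transformer design:

```python
import numpy as np

tokens = np.arange(30 * 324).reshape(30, 324).astype(float)  # 30 tokens, dim 324

def window_partition(t, win=10):
    """Group consecutive tokens into non-overlapping windows for MHA."""
    n, d = t.shape
    return t.reshape(n // win, win, d)  # (num_windows, win, dim)

def shifted_windows(t, win=10):
    """Cyclically shift tokens by half a window before partitioning (SW-MHA)."""
    return window_partition(np.roll(t, -win // 2, axis=0), win)

w = window_partition(tokens)   # shape (3, 10, 324)
sw = shifted_windows(tokens)   # first shifted window now starts at token 5
```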
The function of the patch merging layer is similar to the max-pooling layer in a CNN, which is used to expand the receptive field. The output of stage 1 has the same size as its input. The outputs are then downsampled by taking one of every three pieces of data from the output of stage 1. After a normalization in each row and a fully connected layer, the result is passed to stage 2. The Swin Transformer block in stage 2 is similar to that in stage 1; the only difference is the dimension, which is 10 in stage 2. Finally, the outputs of stage 2 are sent to the fully connected layer and processed using the softmax function for classification.
There are three different types of datasets for the recognition network. The main difference between them is the impact of noise on the data they contain: the data in dataset 1 are noiseless, the data in dataset 2 are the output of the recalculation and combination block and thus retain the residual noise, and the data in dataset 3 come directly from the demodulator without denoising. The labels of these datasets are 12-bit one-hot codes, each corresponding to a parity-check matrix H. The details of the datasets are given in Table 2. Each dataset included in Table 2 is randomly divided into 3 groups, i.e., the training set, validation set, and test set; the training set accounts for the majority of the data, while the validation and test sets account for the remainder. Note that there is no other mechanism for accessing channel information, so dataset 1 and dataset 3 are sent to the network for training directly. The parameters of the recognition network are given in Table 3.
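The random three-way split can be sketched as follows; the 80/10/10 fractions are assumed for illustration only, since the paper's exact proportions are not reproduced here:

```python
import numpy as np

def split_dataset(x, labels, fracs=(0.8, 0.1, 0.1), rng=None):
    """Randomly split into train / validation / test sets.

    The 80/10/10 fractions are an illustrative assumption, not the
    proportions used in the paper."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(x))
    n_tr = int(fracs[0] * len(x))
    n_va = int(fracs[1] * len(x))
    parts = np.split(idx, [n_tr, n_tr + n_va])
    return [(x[p], labels[p]) for p in parts]

x = np.random.randn(100, 9720)                 # 100 recognition inputs
y = np.eye(12)[np.random.randint(0, 12, 100)]  # 12-bit one-hot labels (one per H)
train, val, test = split_dataset(x, y)
```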
5. Conclusions and Discussion
In this paper, we proposed a cascade network structure for LDPC coding blind recognition. Compared with traditional blind recognition algorithms, it performs better under low-SNR conditions.
The original intention of our denoising network design was to filter out part of the noise and obtain code words whose residual noise follows a Gaussian distribution. The results show that the denoising network can provide accurate channel information when the SNR gap within a dataset is not large, and it performs better when the SNR is small. For the recognition network, training across widely varying SNRs is not reliable; we prefer focalized training on a narrow SNR range, which coincides with the function of the noise-reduction network.
The generator matrix of the LDPC codes in the datasets is described in IEEE 802.11n, which is widely used in short-range wireless communication technology. The proposed cascade network can identify the parity-check matrix of the LDPC codes in the datasets with high probability when the SNR is larger than 0 dB. Furthermore, this network maintains considerable accuracy at SNRs below 0 dB, which is better than other traditional methods.
After the division of the dataset and the optimization of the structure, this architecture performed better than the traditional method and another deep learning method. However, due to the limitation of the cascade structure, the training cost of the network is high, and the data division is also critical. Hence, the fusion of this structure with traditional algorithms may achieve better results. Here, we suppose a hypothetical structure with two different data paths, one for data with a low SNR and the other for data with a high SNR. The demodulated data are sent to both paths, and the final output is the one with the higher confidence of the two.
Both paths have a recognition network, but the training sets of the two networks are different. The recognition network on the low-SNR path uses the output of the denoising network for training; the datasets of the denoising network have a low SNR, aimed at handling bad channel conditions. The high-SNR path, in contrast, first estimates the channel condition blindly with the help of a traditional method, and the resulting soft information is subsequently used for training. Note that this recognition network can also be replaced by a traditional blind identification method, since traditional methods already give good results when the SNR is high. At the end of the two paths, a discriminant function must be designed to determine which path's output is taken as the final result.
In short, our work confirmed that it is feasible to use a pure deep learning method for LDPC blind recognition, and that the performance is better after using a CNN for denoising. In subsequent work, we will rebuild the dataset with different channel conditions and different parity-check matrices of LDPC codes that are widely used in modern communication, and verify the accuracy of the existing network on the rebuilt dataset. In addition, a hardware acceleration architecture for the network is also being designed in parallel.