With the development of information technology, fifth-generation (5G) mobile communication has become one of the most popular technologies worldwide and has fundamentally changed the role of telecommunication technology in society [1]. By the end of 2025, 5G connections are expected to account for about a quarter of all users [2]. One of the key enabling technologies of 5G is the massive multiple-input multiple-output (MIMO) system, which deploys hundreds or even thousands of antennas at the base station (BS) to increase system capacity, reduce inter-user interference, and raise network throughput through spatial multiplexing. To realize these gains, the BS needs downlink channel state information (CSI). In a time division duplexing (TDD) massive MIMO system, the BS can obtain the downlink CSI from the uplink channel through channel reciprocity. In a frequency division duplexing (FDD) massive MIMO system, however, the user equipment (UE) must first estimate the downlink CSI and then feed it back to the BS. Because downlink CSI is required for every antenna, the amount of CSI grows sharply with the number of antennas, resulting in additional feedback overhead and transmission delay. Reducing the CSI feedback overhead is therefore very important for FDD massive MIMO systems.
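To give a sense of scale, the following minimal sketch counts the real-valued numbers a UE would have to feed back if it returned the full downlink CSI without any compression. The function name, the single-antenna UE, and the antenna and subcarrier counts are illustrative assumptions, not values taken from this paper.

```python
def full_csi_real_values(num_bs_antennas: int, num_subcarriers: int) -> int:
    """Real values in an uncompressed CSI matrix for a single-antenna UE.

    Each complex channel coefficient (one per BS antenna and subcarrier)
    contributes a real part and an imaginary part.
    """
    return 2 * num_bs_antennas * num_subcarriers


# Hypothetical configurations: 1024 subcarriers and growing antenna arrays.
for antennas in (8, 32, 128):
    print(antennas, full_csi_real_values(antennas, num_subcarriers=1024))
```

Under these assumptions the feedback payload scales linearly with the number of BS antennas, which is the growth the paragraph above refers to.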
Initially, codebook-based methods were used to reduce the feedback overhead, but their overhead grows with the number of antennas, so they are not suitable for massive MIMO systems. To further reduce the CSI feedback overhead while preserving the accuracy of CSI acquisition, compressed sensing (CS) was proposed. CS methods reduce the feedback overhead by transforming the CSI into a domain in which it is sparse [3,4], but the iterative algorithms used to solve the CS reconstruction problem have high complexity and do not achieve good reconstruction results.
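The idea of moving CSI into a sparse domain can be illustrated with a small sketch. It assumes a half-wavelength uniform linear array and uses a discrete Fourier transform across antennas as the sparsifying transform, which is one common choice and not necessarily the one used in [3,4]. The path angles, gains, array size, and all function names are hypothetical.

```python
import cmath
import math


def steering_vector(angle_deg: float, num_antennas: int) -> list[complex]:
    """Response of a half-wavelength-spaced uniform linear array (assumed)."""
    phase_step = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * phase_step * n) for n in range(num_antennas)]


def dft(x: list[complex]) -> list[complex]:
    """Plain O(N^2) discrete Fourier transform, kept dependency-free."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def top_k_energy_fraction(coeffs: list[complex], k: int) -> float:
    """Share of the total energy held by the k strongest coefficients."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[:k]) / sum(energies)


NUM_ANTENNAS = 32
# Hypothetical paths within a small angular spread: (angle in degrees, gain).
PATHS = [(28.0, 0.6 + 0.0j), (30.0, 1.0 + 0.0j), (32.0, 0.0 + 0.5j)]

# Spatial-domain channel: superposition of the per-path steering vectors.
h_spatial = [0j] * NUM_ANTENNAS
for angle, gain in PATHS:
    a = steering_vector(angle, NUM_ANTENNAS)
    h_spatial = [h + gain * a_n for h, a_n in zip(h_spatial, a)]

# Angular-domain channel: DFT across the antenna dimension.
h_angular = dft(h_spatial)

print("top-8 energy share, spatial domain:", top_k_energy_fraction(h_spatial, 8))
print("top-8 energy share, angular domain:", top_k_energy_fraction(h_angular, 8))
```

In this toy setting most of the energy ends up in a few angular bins, so only those coefficients would need to be conveyed; recovering them from compressed measurements is the iterative step whose cost is noted above.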
With the development of artificial intelligence, deep learning (DL) has been applied to many communication tasks, including base station beamforming design [5], channel estimation [6], symbol detection [7], and CSI feedback [8,9,10,11]. Wen et al. [8] proposed CSINet, a DL-based CSI feedback method that does not rely on knowledge of the channel distribution. By learning from training data to exploit the channel structure, it achieves a higher reconstruction speed, which shows that DL is better suited to the CSI feedback task than CS methods. However, CSINet focuses only on the sparsity of the angular-delay domain and ignores spatial correlation, so its reconstruction accuracy drops sharply at low compression ratios (CR). The authors of [9] considered the numerical characteristics of the CSI matrix, stored its real and imaginary parts separately for training, and introduced a convolutional block attention module (CBAM) to suppress noise interference during channel transmission, obtaining good accuracy. However, because a single-resolution convolution kernel cannot adapt to different compression ratios at the same time, feedback accuracy was reduced at high CR. The authors of [10] proposed the channel reconstruction network (CRNet), a multi-resolution neural network that can extract information at multiple scales. CRNet uses two types of convolution kernels to handle high and low CR differently, ensuring reconstruction accuracy. At the same time, the large convolution kernel is divided into two asymmetric parts, with the aim of reducing the amount of computation and improving recovery accuracy. However, this optimization of kernel complexity also brings about a decrease in accuracy. The authors of [11] used dilated convolution to enlarge the receptive field without increasing the kernel size and thereby obtain higher recovery accuracy. However, dilated convolution is not well matched to CSI: the CSI consists of arrays tied to the physical factors of the channel, and dilated convolution inherently skips part of this information, which reduces reconstruction accuracy. The authors of [12] proposed CSINet+, a neural network structure for multi-rate compressed sensing that can effectively compress and quantize the CSI matrix. Its multi-rate compression method solves the problem that DL-based methods need to store different trained parameters for different compression ratios. Although CSINet+ improves feedback accuracy, its network structure remains complex. The authors of [13] proposed a fully convolutional neural network model called DeepCMC, which forgoes the fully connected layer for compression and instead uses convolutional compression throughout, effectively reducing the training complexity of the network, and adds a quantization and entropy coding block to reduce codeword redundancy during transmission. A multi-user version that supports joint training was also proposed. However, as the number of users increases, the complexity of the model grows rapidly. To further improve the coding efficiency of CSI feedback, the authors of [14] proposed an efficient DL-based compression framework, dubbed CQNet, to address CSI compression, codeword quantization, and recovery under bandwidth constraints. CQNet is directly compatible with other DL-based CSI feedback approaches, and combining the two reduces codeword redundancy. CQNet uses a new non-uniform quantization module that effectively reduces the number of bits without reducing recovery accuracy. However, because CQNet is only a quantization framework, its performance varies with the underlying CSI feedback model, and it cannot be applied to all methods.
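Two of the kernel design choices discussed above, splitting a large kernel into two asymmetric parts and dilating a kernel, come down to simple arithmetic. The sketch below is illustrative only: the kernel sizes, channel counts, and function names are assumptions and do not reproduce the exact configurations of [10] or [11].

```python
def conv_weights(kernel_h: int, kernel_w: int, in_ch: int, out_ch: int) -> int:
    """Weight count of one 2-D convolution layer (bias terms ignored)."""
    return kernel_h * kernel_w * in_ch * out_ch


def factorized_weights(k: int, channels: int) -> int:
    """Weights when a k x k kernel is replaced by a 1 x k and a k x 1 layer."""
    return conv_weights(1, k, channels, channels) + conv_weights(k, 1, channels, channels)


def effective_kernel_size(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel whose taps are `dilation` samples apart."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer: 9 x 9 kernel with 8 input and 8 output channels.
print("full 9 x 9 weights:      ", conv_weights(9, 9, 8, 8))
print("1 x 9 plus 9 x 1 weights:", factorized_weights(9, 8))

# A 3-tap kernel with dilation 2 spans 5 samples but reads only 3 of them.
print("span of dilated kernel:  ", effective_kernel_size(3, 2))
```

The factorized pair needs far fewer weights than the full kernel, while the dilated kernel covers a wider span by skipping the samples between its taps, which is the information loss noted for CSI matrices.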
Several observations can be exploited to reduce the overhead. First, as mentioned in [15,16,17], the wireless channel between the BS and the user has only a small angular spread (AS). Owing to the small AS and the large dimensionality of the channel, massive MIMO channels exhibit sparsity in the angular domain. Second, building on the angular reciprocity between uplink and downlink [17], the authors of [18] separated the real and imaginary parts of the CSI and showed that the uplink and downlink magnitudes are strongly correlated and that the absolute values of the uplink and downlink sparse coefficients are also positively correlated. Since uplink CSI is convenient to obtain in massive MIMO systems, it can be used to assist feature extraction from the downlink CSI, thereby reducing the downlink feedback overhead. Whereas [18] uses only single-resolution convolution kernels, we propose to use multi-resolution convolution kernels to extract features from CSI matrices with different degrees of sparsity, which captures more information and achieves higher feedback accuracy. Finally, because the UE is the less capable device, the depth of the encoder-side network needs to be reduced. Considering the above three points, this paper proposes an uplink-assisted channel feedback method (Complex Uplink Net, CUNet) to improve CSI feedback accuracy.
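The two operations mentioned above, separating the real and imaginary parts of a CSI matrix and measuring how strongly uplink and downlink magnitudes are correlated, can be sketched as follows. The matrices are toy data constructed to share a magnitude pattern while having unrelated phases; they are not measurements, the helper names are hypothetical, and the snippet illustrates the computation rather than the result reported in [18].

```python
import cmath
import math


def split_real_imag(h: list[list[complex]]) -> list[list[list[float]]]:
    """Turn a complex matrix into two real-valued planes: [real, imag]."""
    real = [[c.real for c in row] for row in h]
    imag = [[c.imag for c in row] for row in h]
    return [real, imag]


def magnitudes(h: list[list[complex]]) -> list[float]:
    """Flattened list of coefficient magnitudes."""
    return [abs(c) for row in h for c in row]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (inputs must not be constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data (assumption): same magnitude pattern, different phases and scale.
pattern = [[1.0, 0.1], [0.05, 0.8]]
uplink = [[cmath.rect(m, 0.3 * (i + j)) for j, m in enumerate(row)]
          for i, row in enumerate(pattern)]
downlink = [[cmath.rect(0.9 * m, 1.7 - 0.5 * (i + j)) for j, m in enumerate(row)]
            for i, row in enumerate(pattern)]

# Two-plane real representation, as typically fed to a convolutional network.
downlink_planes = split_real_imag(downlink)

print("magnitude correlation:", pearson(magnitudes(uplink), magnitudes(downlink)))
```

Representing the CSI as separate real and imaginary planes lets a real-valued network process it, and a magnitude correlation close to one is the kind of relationship that makes uplink CSI useful as side information at the BS.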