1. Introduction
Recently, efficient modern technologies in wireless communication systems, including Massive Multiple-Input Multiple-Output (Massive MIMO) and the Intelligent Reflecting Surface (IRS), have gained significant attention because they improve the spectrum utilization of communication systems, making them more efficient [1]. Despite the high cost and energy consumption inherent in wireless communication systems, several wireless communication protocols have integrated Massive MIMO to improve both their Spectral Efficiency (SE) and Energy Efficiency (EE) [2]. This enhances transmitter/receiver connectivity by enabling more reliable connections and higher data transmission rates. One of the major obstacles facing Massive MIMO technology is channel estimation [3], since many operations, such as beamforming and resource allocation, require precise Channel State Information (CSI) and suffer substantial performance degradation without it. Additionally, in Frequency Division Duplex (FDD)-based Massive MIMO, obtaining precise downlink CSI is a significant challenge because of the excessive overhead needed for the feedback link to the Base Station (BS), which is an essential step in learning the downlink channel characteristics [4].
To further decrease communication losses resulting from Non-Line-of-Sight (NLoS) propagation and increase the SE/EE of Massive MIMO systems, they have recently been integrated with the IRS. The latter has attracted significant interest as a potential technology due to its ability to establish a communication channel between User Equipment (UE) and the BS in the presence of towering structures or other obstacles [5]. The IRS consists of many reflecting elements characterized by passivity and low cost [6]. The passive IRS reflects the incident signal with an appropriate phase shift, despite its inability to perform signal processing and amplification [7]. This technological advancement has the potential to enhance network coverage, lessen interference, and improve the accuracy of communication systems [8]. However, because the numerous passive IRS components cannot transmit or receive data, channel estimation in wireless communication aided by a passive IRS is difficult. Since the BS-IRS and IRS-UE channels cannot be estimated individually, an estimate of the cascaded BS-IRS-UE channel is obtained instead. Cascaded channels, however, face two significant challenges in channel estimation, namely limited accuracy and a huge training overhead [9].
The literature has mostly focused on CSI estimation with Deep Learning (DL) techniques under the Time Division Duplex (TDD) system, as seen in [10,11,12,13,14]. These studies employ DL-based techniques in their models, leveraging channel reciprocity: the BS can obtain the uplink CSI and use it to estimate the downlink CSI. In contrast, studies of IRS-aided Massive MIMO systems under the FDD system are limited, despite FDD being essential for applications requiring low latency and high reliability. Since the FDD system has no channel reciprocity, the UE must relay the downlink CSI to the BS via the feedback link. This introduces challenges in FDD-based IRS-aided Massive MIMO systems due to the increased overhead and the potential for inaccurate estimation of the cascaded CSI. Compressive Sensing (CS) techniques have been employed to reduce the feedback overhead, but they suffer from noise sensitivity and sub-optimal performance in time-varying and complex environments. Therefore, recent research is directed toward leveraging DL-based models for FDD channel estimation to enhance the accuracy and efficiency of CSI estimation. Despite these advancements, adapting these techniques to dynamic channel conditions and balancing the trade-offs between estimation accuracy and feedback overhead remain ongoing challenges.
Advanced DL-based models, including the Gated Recurrent Unit (GRU) and the Dropout (DO) technique, have been developed to improve the accuracy and efficiency of CSI estimation. GRUs [15,16] use built-in memory cells to store information over time, whereas DO [17,18] increases model robustness by deactivating neurons at random during training to prevent overfitting.
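As a concrete illustration of the DO mechanism, the inverted-dropout masking commonly used in practice can be sketched in a few lines of NumPy; the rate `p` and the array shape below are illustrative choices, not parameters taken from the paper.

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p during
    training and rescale the survivors by 1/(1-p); identity at inference."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each neuron with probability 1-p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((4, 8))
y = dropout(x, p=0.5, rng=rng)        # surviving entries are rescaled to 2.0
```

At inference time the function returns its input unchanged, which is what makes the randomly thinned training-time networks act as an implicit ensemble.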
Building on these advancements, this paper proposes a DL-based model in the feedback link of FDD-based IRS-aided Massive MIMO, with the following essential contributions:
Proposing a regularization technique, Dropout (DO), within the framework of our model proposed in [19], the Channel state information Network combined with a Denoising Convolutional Neural Network (CsiNet-DeCNN). This addition mitigates the overfitting that can occur during the learning process, so that reducing the training overhead is accompanied by an improvement in system accuracy.
Leveraging a Recurrent Neural Network (RNN), known as the Gated Recurrent Unit (GRU), in our proposed model enhances the ability to discriminate and preserve essential signal features throughout the denoising process. The GRU improves the channel estimation accuracy and facilitates the learning of spatial structures in conjunction with time correlation in time-varying channels.
Validating the proposed contributions through a comprehensive analysis of the proposed DGD-CNet model over several metrics: the Normalized Mean Square Error (NMSE), the correlation coefficient, the system accuracy, the Signal-to-Noise Ratio (SNR), and the computational complexity. The obtained results demonstrate that the proposed model achieves higher system performance by reducing the training overhead and providing more accurate channel estimation at the BS.
The rest of this paper is organized as follows: Section 2 reviews previous work in the field. Section 3 presents the IRS-aided Massive MIMO communication system model and the CSI feedback process. Section 4 describes the proposed channel estimation model. Section 5 discusses the results and analysis. Finally, Section 6 concludes the research and gives recommendations for future work.
2. Related Work
To address the challenge of channel estimation, researchers have proposed various techniques in the literature for its enhancement in Massive MIMO [20] and IRS-aided Massive MIMO systems [21]. The authors in [22,23] introduced models based on CS, which decrease CSI feedback overhead and improve accuracy in FDD-based Massive MIMO systems. In [22], a CS model reduced feedback overhead by using the spatial representation of signals, which improved efficiency in managing CSI. However, this model may face challenges in noisy or rapidly changing environments, where the assumption of spatial sparsity may not accurately capture real-world complexities. The work in [23] utilized CS techniques to estimate CSI in slow-varying channels, which helped reduce feedback overhead and improve efficiency; however, this model has limitations in fully addressing system complexity and enhancing accuracy in more dynamic or noisy environments.
Recently developed DL-based models offer a notable benefit when applied to FDD-based Massive MIMO CSI estimation and the feedback link, due to their ability to autonomously learn the characteristics of a given problem, eliminating the requirement for extensive prior knowledge. In [24], researchers proposed, for the first time, a DL-based feedback model, CsiNet, which treats the Massive MIMO channel matrix as an image. A Convolutional Neural Network (CNN) is used to compress the CSI and receive its feedback. CsiNet is composed of a number of Neural Network (NN) layers [25] designed to minimize CSI feedback while recovering the channel with high accuracy. An NN comprises an input layer, several hidden layers, and an output layer; it is responsible for training and learning the model, allowing it to generate accurate predictions.
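To make the compress-and-feed-back idea concrete, the following NumPy sketch flattens a truncated channel matrix, compresses it to an M-dimensional codeword with a fully connected layer, and expands it back at the receiver. The dimensions and the random weight matrices are illustrative stand-ins, not CsiNet's trained layers.

```python
import numpy as np

rng = np.random.default_rng(1)
Nc, Nt, CR = 32, 32, 1 / 16            # truncated delay taps, antennas, compression ratio
N = 2 * Nc * Nt                        # real and imaginary parts stacked as two planes
M = int(CR * N)                        # codeword length fed back to the BS

H = rng.standard_normal((Nc, Nt, 2))   # stand-in for a truncated channel "image"
W_enc = rng.standard_normal((M, N)) / np.sqrt(N)   # encoder FC layer (random stand-in)
W_dec = rng.standard_normal((N, M)) / np.sqrt(M)   # decoder FC layer (random stand-in)

s = W_enc @ H.reshape(-1)              # compressed codeword sent over the feedback link
H_hat = (W_dec @ s).reshape(Nc, Nt, 2) # coarse reconstruction, refined by later layers
```

The point of the sketch is the bookkeeping: a lower CR means a shorter codeword `s` and less feedback overhead, at the price of a harder reconstruction problem for the decoder.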
Due to concerns related to network training, inaccurate channel estimation, and the neglect of the channel's time correlation, the adoption of Recurrent Neural Networks (RNNs) has increased in recent years. A key feature of the RNN is a hidden layer capable of retaining previously processed information, which is a structural advantage when processing time series data. Consequently, the RNN works as a channel estimator to improve the accuracy and learning of the CSI estimation. To estimate the channel for the feedback link, the authors in [26,27] integrated the CsiNet used in [24] with a Long Short-Term Memory (LSTM) network, i.e., an RNN, which utilizes time-variant parameters. In [26], the authors proposed CsiNet-LSTM, which exploits the temporal and frequency correlation of wireless channels to learn spatial structure coupled with time correlation; samples from time-varying Massive MIMO channels were used for training. In [27], a CsiNet architecture integrated a Convolutional (Conv) layer with an LSTM-based compression and decompression model (Conv-LSTMCsiNet) to improve CSI prediction and independently extract the spatial and temporal features. Although both models in [26,27] achieved better results than CsiNet, limitations arose from the complexity of LSTMs, whose multiple gates and separate memory cells increase the space complexity of the model. The proposed model, however, uses the GRU to mitigate these issues.
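The space-complexity gap can be made concrete by counting gate parameters: an LSTM layer learns four input/recurrent transformations (input, forget, and output gates plus the candidate state), while a GRU learns three (update, reset, candidate), so the GRU needs 25% fewer gate parameters per layer. The dimensions below are illustrative, not taken from the paper.

```python
def rnn_params(input_dim, hidden_dim, n_gates):
    # each gate-like transformation: input weights + recurrent weights + bias
    return n_gates * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

d_in, d_h = 512, 512
lstm = rnn_params(d_in, d_h, n_gates=4)   # input, forget, output, candidate
gru = rnn_params(d_in, d_h, n_gates=3)    # update, reset, candidate
saving = 1 - gru / lstm                   # fraction of gate parameters saved by the GRU
```

For equal input and hidden sizes, `saving` is exactly 0.25 regardless of the chosen dimensions, which is why replacing the LSTM with a GRU reduces model size without changing the gating principle.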
In [28], a two-stage model was introduced, where the first stage utilized uplink channel estimation employing an Adaptive Deep Neural Network (ADNN), optimizing channel amplitude estimation and reconstruction. The second stage involved hybrid precoding for downlink data transmission using an Adaptive LSTM (ALSTM), with performance improved by optimizing the hidden neurons. While this approach enhances accuracy in both channel estimation and data recovery, it also increases the risk of overfitting. As a result, the model performs well in controlled environments requiring high precision but may struggle in more dynamic environments due to these limitations. In [29], the authors proposed an information detection and selection network (IdasNet), a DL-based framework for compressing CSI and providing transmitter feedback in an FDD-based Massive MIMO system. However, its complex design may complicate implementation in real-world applications. Moreover, its performance can be affected if the pre-compression and self-information selection steps are not optimally executed.
To enhance both the accuracy and efficiency of the communication network, the IRS was used. Researchers in [11,12,13,14] introduced enhanced channel estimation models for the TDD-based IRS-aided Massive MIMO system. In [11], an Improved Deep Residual Shrinkage Network (IDRSN) was introduced to improve the pilot design by effectively reducing noise, making it advantageous in stable channel conditions. However, it may perform worse in dynamic environments due to its potential difficulty in adapting to sudden changes, leading to less accurate channel estimation. The authors in [12] used a Residual U-shaped Network (ResU-Net) and a deep CS-based channel estimation model to identify the cascaded channel matrix with minimal pilot overhead. However, this model may struggle in scenarios with rapidly changing channels, potentially impacting the channel estimation accuracy. In [13], researchers presented a hybrid IRS structure and a DL-based CNN for sparse channel amplitude determination. However, the added complexity of the model may decrease channel estimation accuracy, making it less suitable for real-time applications. In [14], a Convolutional Deep Residual Network (CDRN) was used to denoise channel estimation in IRS-Multi-User Communication systems (MUCs). However, its performance may degrade in highly dynamic channels.
Currently, limited research addresses DL-based CSI feedback for FDD-based IRS-aided Massive MIMO systems. Our model proposed in [19] uses a DL technique to determine accurate CSI in FDD-based IRS-aided Massive MIMO systems, specifically through the feedback link. This model reduces channel noise through the integration of CsiNet and a Deep Denoising Convolutional Neural Network (CsiNet-DeCNN). The DeCNN exploits the spatial characteristics of noisy channel matrices and subtracts the additive noise, improving estimation accuracy. The results obtained with CsiNet-DeCNN were better than those achieved by the CsiNet model [24] in channel reconstruction. However, it does not address the time correlation in time-varying channels, which is crucial for real-life applications. In [30], the authors introduced an attention mechanism-based CsiNet (ACNet), which uses a limited number of parameters and considers the time correlation in time-varying channels. Although this model outperforms other DL-based CsiNet models in the literature, its performance decreases with a decreased Compression Ratio (CR), defined as the ratio of the size of the compressed data to the size of the original data.
Hence, this paper proposes a model that focuses on enhancing channel estimation for the feedback link in FDD-based IRS-aided Massive MIMO systems. It incorporates the time correlation of time-varying channels, resulting in improved performance even at decreased CRs.
Table 1 summarizes the aforementioned works relating to channel estimation models and the proposed model.
Notation: Throughout this paper, scalar variables, vectors, and matrices are represented by normal-face letters, bold-face lowercase letters, and bold-face uppercase letters, respectively. $\mathbb{C}$ represents the complex field, while $\mathbb{R}$ is the real field. The real and imaginary parts of a matrix $\mathbf{A}$ are represented by Re($\mathbf{A}$) and Im($\mathbf{A}$), respectively. The superscript $(\cdot)^{H}$ denotes the Hermitian (conjugate transpose) of a matrix or vector. The notation $\mathbb{E}\{\cdot\}$ represents the expectation operation. Additionally, the complex space of $m \times n$ dimensional matrices is represented by $\mathbb{C}^{m \times n}$. The function diag($\cdot$) denotes the diagonal matrix formed from the input vector. The Euclidean norm is returned by the operator $\|\cdot\|_{2}$, the Frobenius norm is denoted by $\|\cdot\|_{F}$, the superscript $(\cdot)^{T}$ denotes the transpose of a matrix, and the operator $\circ$ is the Hadamard product.
4. Proposed DGD-CNet Channel Estimation Model
This paper aimed to improve channel estimation accuracy by introducing the DO technique and GRU unit to the CsiNet-DeCNN model. Therefore, the following sub-sections provide a brief illustration of the CsiNet-DeCNN model, GRU, and DO, as well as the proposed DGD-CNet model architecture and the key performance indexes used to evaluate the proposed model.
4.1. CsiNet-DeCNN Model
Previously, our CsiNet-DeCNN model was published in [19], in which the denoising encoder–decoder model was proposed. It showed the effects of integrating the denoising module into the autoencoder CsiNet model. During the process of CSI sensing and reconstruction, CsiNet-DeCNN performed exceptionally well. As shown in Figure 3, the UE employs the denoising encoder to compress the channel matrix $\mathbf{H}$ of size $\tilde{N}_c \times N_t \times 2$. The denoising encoder includes the Leaky Rectified Linear Unit (LeakyReLU) as a nonlinear activation function and a $3 \times 3$ kernel Conv filter layer [33]. This filter is a small matrix applied to the input data to extract local features by performing element-wise multiplications and summations. It captures spatial relationships between neighboring elements, balancing detailed feature extraction with the preservation of spatial resolution. This filter size is commonly used to maintain fine-grained details and enhance the accuracy of the reconstructed output. Furthermore, each layer undergoes Batch Normalization (BN), a technique that standardizes the inputs to a layer by recentering and rescaling them, which helps stabilize and accelerate the training process. Additionally, the feature compression of the denoising encoder transforms the channel matrix into a reshaped vector. This step is followed by a split of the channel matrix into two separate flows, namely a Fully Connected Network (FCN) and the DeCNN. The FCN performs as an integrated network that accelerates convergence and mitigates the vanishing gradient problem [34]. The DeCNN utilizes a denoising module that effectively eliminates the noise in the noisy channel matrix [19].
The DeCNN block has two layers: input and output, along with three denoising blocks. The three denoising blocks are connected in sequence, which enhances the effectiveness of the denoising process. Each of the three denoising blocks consists of a residual subnetwork with layers and an element-wise subtraction operation. The first layer of the residual subnetwork utilizes the “Conv+BN+ReLU” operation. To analyze the spatial characteristics of the channel vector, the two operations of Conv and ReLU are used together. To ensure that network stability is enhanced and that network training is accelerated, BN is integrated between the two operations. A Conv operation is employed in the final layer of the residual subnetwork. This operation merges the extracted features and generates the residual noise vector, which is then used in subsequent element-wise subtraction. To utilize the additive character of noise, an element-wise subtraction is finally used to denoise the noisy channel vector, then the two vectors are added to generate the final codeword. The codeword is subsequently delivered to the BS via the feedback link for CSI recovery.
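The element-wise subtraction at the heart of each denoising block can be sketched as follows. The residual estimator here is an oracle stand-in for the trained "Conv+BN+ReLU" subnetwork, used only to show the additive-noise mechanics, not the learned behavior.

```python
import numpy as np

rng = np.random.default_rng(2)

def denoise_block(y, R):
    """Residual denoising: a subnetwork R estimates the noise present in y,
    which is then removed by element-wise subtraction, y - R(y)."""
    return y - R(y)

clean = rng.standard_normal(64)        # stand-in for a clean channel vector
noise = 0.1 * rng.standard_normal(64)  # additive noise
noisy = clean + noise

# Oracle residual estimator: returns exactly the noise component.  A trained
# residual subnetwork approximates this mapping from data.
R_oracle = lambda y: y - clean
denoised = denoise_block(noisy, R_oracle)
```

Chaining three such blocks in sequence, as in the DeCNN, simply applies this estimate-and-subtract step three times, each pass peeling off part of the remaining residual noise.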
At the BS, the reconstructed channel coefficient matrix, $\hat{\mathbf{H}}$, is recovered through the use of the received codeword. The denoising decoder involves feature decompression and channel recovery. The feature decompression module is designed to include two flows, mirroring the compression method of the denoising encoder with an FCN and a DeCNN block. The two flows are combined to produce the reconstructed output: two $\tilde{N}_c \times N_t$ matrices, which estimate the real and imaginary components of $\mathbf{H}$ as an initial point. The channel recovery module reconstructs the channel matrix using two RefineNet units, which progressively refine the reconstruction. The RefineNet unit consists of two layers, one input and one output, along with three further Conv layers, which utilize $3 \times 3$ kernels, as seen in Figure 3. The subsequent steps involve passing the refined channel matrix through the final Conv and BN layer, where the sigmoid function is employed to scale the values within the range [0,1] and produce the final reconstructed channel matrix $\hat{\mathbf{H}}$ with the dimension $\tilde{N}_c \times N_t \times 2$.
The CsiNet-DeCNN model, introduced in [19] and briefly described in this subsection, significantly enhanced the performance of CsiNet in reconstructing the channel matrix, with a low NMSE and high channel estimation accuracy. However, it did not consider the time correlation in time-varying channels.
We improved the performance of this model by leveraging the DO technique and GRU unit, which will be detailed in the following subsection.
4.2. Gated Recurrent Unit (GRU)
To enhance the performance of the system, the GRU is adopted. The GRU [35] accelerates the model and improves accuracy by simplifying the architecture of LSTM networks [36]. Both the GRU and LSTM use gating mechanisms to regulate the flow of information. In the LSTM, three gates, namely the forget gate, input gate, and output gate, manage data flow to and from the cell state, supporting long-term dependency management. The GRU, on the other hand, streamlines the same process with only two gates: the update gate, which combines the functions of the forget and input gates, and the reset gate, as shown in Figure 4.
To control the amount of information saved between the current and previous states, the update gate, $z_t$, is used, which is defined as [35]
$$z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right), \tag{10}$$
where $W_z$ denotes the input weight matrix for $z_t$, $b_z$ represents the corresponding bias term, $h_{t-1}$ represents the previous hidden state at time $t-1$, $U_z$ denotes the recurrent weight matrix, $x_t$ represents the GRU unit's input vector, and $\sigma$ represents the activation function for the two gates, update and reset. The latter, $r_t$, is responsible for specifying how much information about the previous moment is retained, and can be defined as [35]
$$r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right), \tag{11}$$
where $W_r$ is the input weight matrix for the reset gate, $b_r$ represents the corresponding bias term, and $U_r$ denotes the recurrent weight matrix for the reset gate. The candidate and output hidden states, $\tilde{h}_t$ and $h_t$, respectively, are given by [35]
$$\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \circ h_{t-1}\right) + b_h\right), \tag{12}$$
$$h_t = \left(1 - z_t\right) \circ h_{t-1} + z_t \circ \tilde{h}_t, \tag{13}$$
where $W_h$ represents the weight matrix of the candidate hidden state, $b_h$ is the corresponding bias term, and $U_h$ is the recurrent weight matrix. tanh represents the activation function for the candidate gate.
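One GRU step following the update/reset/candidate structure described above can be sketched directly in NumPy; the weight values below are small random stand-ins, not trained parameters, and the dimensions are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x_t, h_prev, P):
    """One GRU step: update gate z, reset gate r, candidate state, new state."""
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev + P["bz"])       # update gate
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev + P["br"])       # reset gate
    h_cand = np.tanh(P["Wh"] @ x_t + P["Uh"] @ (r * h_prev) + P["bh"])
    return (1.0 - z) * h_prev + z * h_cand                        # gated blend

rng = np.random.default_rng(3)
d_in, d_h = 8, 16
P = {k: 0.1 * rng.standard_normal((d_h, d_in if k[0] == "W" else d_h))
     for k in ("Wz", "Wr", "Wh", "Uz", "Ur", "Uh")}
P.update({k: np.zeros(d_h) for k in ("bz", "br", "bh")})

h = np.zeros(d_h)
for t in range(5):                     # unroll over a short input sequence
    h = gru_cell(rng.standard_normal(d_in), h, P)
```

Because the new state is a convex combination of the previous state and a tanh-bounded candidate, every entry of `h` stays in (-1, 1), which contributes to the GRU's training stability.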
To enhance the robustness and generalization capability of the proposed model, a DO technique [17] is applied to the final reconstructed channel matrix $\hat{\mathbf{H}}$ from the CsiNet-DeCNN decoder. This is denoted by
$$\tilde{\mathbf{H}} = \mathrm{DO}\left(\hat{\mathbf{H}}, p\right), \tag{14}$$
where $p$ is the dropout rate. Subsequently, the processed sequence $\tilde{\mathbf{H}}$ is fed into the GRU unit. The GRU architecture captures the temporal dependencies within the decoded features across $T$ time steps. At each time step $t$, the GRU computes the hidden state $h_t$ based on the input $x_t$ and the previous hidden state $h_{t-1}$, for $t = 1, \ldots, T$, as follows:
$$h_t = \mathrm{GRU}\left(x_t, h_{t-1}\right). \tag{15}$$
4.3. The Proposed DGD-CNet Model Architecture
Inspired by RNN’s superior performance in channel spatial–temporal feature extraction, our proposed model improves CsiNet-DeCNN by leveraging DO and GRU to enhance the trade-off between CR and recovery quality.
Figure 5 shows the proposed DGD-CNet model.
The two stages that comprise our model are feature extraction in the angular-delay domain and correlation representation with final reconstruction. For feature extraction in the angular-delay domain, the model uses CsiNet-DeCNN with two distinct CRs to learn the structure of the angular-delay domain. The first channel matrix, $\mathbf{H}_1$, is converted by CsiNet-DeCNN with a High-Compression Ratio (High-CR) into a codeword vector, $s_1$, of length $M_1$, which can be used for high-resolution recovery since it retains enough structural information. At the Low-Compression Ratio (Low-CR), the CsiNet-DeCNN encoder processes the remaining channel matrices $\mathbf{H}_2, \ldots, \mathbf{H}_T$, producing a sequence of $T-1$ codewords of length $M_2$ ($M_1 > M_2$), since channel correlation reduces the amount of information needed. Each of these $T-1$ codewords is joined with the initial codeword $s_1$ before being input into the Low-CR CsiNet-DeCNN decoder. This ensures that the feedback information is fully used. As features are extracted from the angular-delay domain, each CsiNet-DeCNN decoder produces two matrices of size $\tilde{N}_c \times N_t$.
Since all the Low-CR CsiNet-DeCNNs in Figure 5 have the same job, they share identical weights and biases; that is, their network parameters are identical. As the UE speed and feedback frequency change, the value of $T$ also changes, and the architecture can easily be rescaled, which preserves performance across channel groups with a varying $T$ and decreases the parameter overhead. Rather than producing $T-1$ copies, a single Low-CR CsiNet-DeCNN is reapplied $T-1$ times in operation.
To enhance generalization and mitigate overfitting, the reconstructed channel matrix from the CsiNet-DeCNN decoder incorporates the DO technique, as described in Equation (14). The proposed model is further improved by integrating the GRU units. Specifically, the GRU receives lengthy sequences as input from the DO, as in Equation (15). At each step, the previous inputs implicitly train the GRU so that it learns the time correlation; they are then merged with the current inputs to increase the recovery quality at Low-CR.
The number of hidden units in the GRU equals the output dimension. For the final reconstructed channel matrix, $\hat{\mathbf{H}}$, two $\tilde{N}_c \times N_t$ matrices are reshaped from the final outputs. This matrix is then transformed into the spatial-frequency domain using the 2D-IDFT to obtain the CSI representation.
The procedure of the proposed DGD-CNet model is summarized as follows. At the UE, CsiNet-DeCNN encoders with several CRs are established, whereas the BS holds the CsiNet-DeCNN decoders as well as the DO-GRU system. Each side has a counter. $\mathbf{H}_1$ is first compressed at the UE with the High-CR and is subsequently recovered at the BS by the High-CR CsiNet-DeCNN decoder and DO-GRU. For the following time steps $t = 2, \ldots, T$, the UE transforms $\mathbf{H}_t$ into a lower-dimensional codeword $s_t$, which should hold the learned correlation information. At the BS, the lower-dimensional codeword $s_t$ is received and concatenated with $s_1$ (received at $t = 1$); they are then inversely transformed using the Low-CR CsiNet-DeCNN decoder and DO-GRU. The counter increments after each step, and the same operation repeats until the counter reaches $T$. Then, the GRU is reset to recover subsequent channel groups.
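The feedback procedure just described can be sketched as a loop over a channel group. The encoders and decoders below are fixed random linear maps standing in for the trained High-CR/Low-CR CsiNet-DeCNN networks (and the DO-GRU refinement is omitted); the sizes are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, M_hi, M_lo = 4, 2048, 512, 128   # group length and vector sizes (illustrative)

# Fixed random linear maps as stand-ins for the trained encoders/decoders.
We_hi = rng.standard_normal((M_hi, N))
We_lo = rng.standard_normal((M_lo, N))
Wd_hi = rng.standard_normal((N, M_hi))
Wd_lo = rng.standard_normal((N, M_hi + M_lo))   # Low-CR decoder sees [s1; st]

channels = [rng.standard_normal(N) for _ in range(T)]

s1 = We_hi @ channels[0]               # t = 1: High-CR compression at the UE
recovered = [Wd_hi @ s1]               # High-CR recovery at the BS
for t in range(1, T):                  # t = 2, ..., T: Low-CR compression
    st = We_lo @ channels[t]           # lower-dimensional codeword at the UE
    recovered.append(Wd_lo @ np.concatenate([s1, st]))  # concat with s1, decode
# After T steps the counters reset before the next channel group is processed.
```

The concatenation step is the structural point: every Low-CR decoding reuses the high-resolution codeword `s1`, so the feedback spent on the first channel keeps paying off across the whole group.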
The overall network can be defined as an autoencoder. For this, it is assumed that fully differentiable channel models are used for training. This holds for all kernels and biases of the encoders and decoders, denoted collectively by $\theta$. The function of this autoencoder is denoted by $f(\cdot; \theta)$, which takes the input $\mathbf{H}$ and produces the reconstructed channel matrix $\hat{\mathbf{H}}$:
$$\hat{\mathbf{H}} = f\left(\mathbf{H}; \theta\right), \tag{16}$$
where $\theta$ represents the complete set of encoder–decoder parameters, $f(\cdot; \theta)$ represents the function of the network, and $\hat{\mathbf{H}}$ is the reconstructed channel coefficient matrix of the cascaded CSI. For a fair comparison with the other models in the literature, end-to-end training is performed on the network: the parameters are updated by the Adaptive Moment Estimation algorithm (ADAM) to minimize the Mean Squared Error (MSE) [37]. The loss function is defined as the MSE between the reconstructed channel $\hat{\mathbf{H}}$ and the original channel $\mathbf{H}$. It is calculated using the following equation [26]:
$$L\left(\theta\right) = \frac{1}{S} \sum_{s=1}^{S} \left\| f\left(\mathbf{H}_s; \theta\right) - \mathbf{H}_s \right\|_2^2, \tag{17}$$
where $S$ is the total number of samples in the training set.
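The per-sample-averaged MSE loss described above reduces to a one-line NumPy computation; the batch of channels below is random and its sizes are illustrative.

```python
import numpy as np

def mse_loss(H_hat, H):
    """MSE between reconstructed and true channels, averaged over the
    S training samples stacked along the first axis."""
    S = H.shape[0]
    return np.sum((H_hat - H) ** 2) / S

rng = np.random.default_rng(5)
H = rng.standard_normal((10, 32, 32, 2))   # S = 10 samples (illustrative sizes)
loss = mse_loss(H + 0.1, H)                # constant 0.1 error in every entry
```

With a constant error of 0.1 in each of the 32 x 32 x 2 entries, the loss per sample is 0.01 x 2048 = 20.48, a quick sanity check that the averaging is over samples, not over entries.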
4.4. Key Performance Indexes
To evaluate the performance of the proposed DGD-CNet model in enhancing channel estimation, several key performance indexes are utilized. The NMSE measures the deviation between the reconstructed channel matrix $\hat{\mathbf{H}}$ and the original channel matrix $\mathbf{H}$, and can be calculated as [26]
$$\mathrm{NMSE} = \mathbb{E}\left\{ \frac{\left\| \mathbf{H} - \hat{\mathbf{H}} \right\|_2^2}{\left\| \mathbf{H} \right\|_2^2} \right\}. \tag{18}$$
The correlation coefficient $\rho$ measures the similarity between the original channel vector $\mathbf{h}_n$ and the reconstructed channel vector $\hat{\mathbf{h}}_n$ of the $n$-th sub-carrier at a specific time $t$. Equation (19) is used to calculate $\rho$ [26]:
$$\rho = \mathbb{E}\left\{ \frac{1}{N} \sum_{n=1}^{N} \frac{\left| \hat{\mathbf{h}}_n^{H} \mathbf{h}_n \right|}{\left\| \hat{\mathbf{h}}_n \right\|_2 \left\| \mathbf{h}_n \right\|_2} \right\}, \tag{19}$$
where $\hat{\mathbf{h}}_n$ is the $n$-th sub-carrier's reconstructed channel vector at time $t$ and $N$ is the number of sub-carriers. The ratio of the reconstructed to the original channel vector indicates the accuracy, which can be described using the following equation [38]:
$$\mathrm{Accuracy} = \frac{\left\| \hat{\mathbf{h}} \right\|_2}{\left\| \mathbf{h} \right\|_2}. \tag{20}$$
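The NMSE (in dB) and the correlation coefficient can be sketched in NumPy as below; the array shapes are illustrative, and the per-subcarrier channel vectors are taken as the rows of the channel matrix.

```python
import numpy as np

def nmse_db(H, H_hat):
    """Normalized MSE between true and reconstructed channels, in dB."""
    nmse = np.linalg.norm(H - H_hat) ** 2 / np.linalg.norm(H) ** 2
    return 10.0 * np.log10(nmse)

def corr_coeff(H, H_hat):
    """Mean cosine similarity between per-subcarrier channel vectors
    (rows of H), following the standard CsiNet-style definition."""
    num = np.abs(np.sum(np.conj(H_hat) * H, axis=1))
    den = np.linalg.norm(H_hat, axis=1) * np.linalg.norm(H, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(6)
H = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
noise = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
H_hat = H + 0.01 * noise               # a near-perfect reconstruction
```

A reconstruction off by ~1% in amplitude lands around -40 dB NMSE with a correlation coefficient very close to 1, which matches the intuition that more negative NMSE and higher rho both mean better recovery.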
These key performance indexes collectively provide a comprehensive assessment of the proposed DGD-CNet model’s effectiveness in various scenarios and conditions. The overall training and testing process for the proposed model can be summarized in the flowchart seen in
Figure 6.
5. Numerical Results
This section provides the validation and analysis of the proposed DGD-CNet model and its comparison with the other models in the literature. Two types of channel matrices were generated using the COST 2100 channel model [39] to create the training and testing samples. After switching to the angular-delay domain and truncating, the channel matrix $\mathbf{H}$ was reduced from size $N_c \times N_t$ to size $\tilde{N}_c \times N_t$. Table 2 lists the channel setup parameters for both the indoor and outdoor situations, as well as the numbers of samples employed in training, validation, and testing to ensure a fair comparison with the existing models in the literature. These parameters include the number of epochs, batch size, and learning rate.
Collaboratory (Python 3.7) is used to implement the proposed DGD-CNet model. The performance of the proposed model is compared with other DL-based models that use CsiNet as their baseline. CsiNet offers a robust framework for channel estimation due to its advanced DL architecture, which enhances the channel estimation accuracy.
As such, the comparison undertaken in this paper includes CsiNet [24], CsiNet-LSTM [26], Conv-LSTMCsiNet [27], CsiNet-DeCNN [19], and ACNet [30]. Using different CRs, the NMSE is one of the parameters tested, along with the correlation coefficient $\rho$, the SNR, and the accuracy, in both indoor and outdoor situations. The DGD-CNet model's performance is evaluated for various CRs; $\mathbf{H}_1$ is compressed under a 1/4 CR for all evaluations.
Figure 7 shows the comparison between NMSE (in dB) and different CRs for the proposed DGD-CNet model and other DL-based models in both indoor and outdoor situations at SNR 5 dB. This investigation used an SNR value of 5 dB to reflect a realistic and challenging scenario for channel estimation and compression ratio trade-offs.
For High-CR, the proposed model has better performance than other DL-based models, with the lowest NMSE at −51.15 dB and −16.86 dB for indoor and outdoor situations, respectively. When compared to previous DL-based models, the proposed model had the lowest performance loss for both situations at Low-CR.
To clarify the conducted comparison,
Table 3 shows the percentage improvement in NMSE of the proposed model for Low-CR in comparison with other models from the literature (DL-based models). The model discussed in this paper shows that its performance improvement increases as CR decreases (Low-CR) for indoor situations. Also, for outdoor situations, our proposed model still shows better performance when CR decreases. For indoor situations, the improvements reach up to 437% over CsiNet and 433% over CsiNet-DeCNN at CR 1/64. In outdoor situations, the model shows up to 360% better NMSE compared to CsiNet at CR 1/64, although the gains are relatively lower due to the more complex and dynamic nature of outdoor environments. Nonetheless, the proposed model still outperforms other DL-based models, demonstrating its robustness and effectiveness in compressing CSI without compromising accuracy, making it a promising solution for CSI feedback in FDD-based IRS-aided Massive MIMO systems.
Figure 8 compares the NMSE performance versus SNR (in dB) in outdoor situations at a 1/16 CR. The proposed DGD-CNet model consistently performs better than the other models across the different SNR levels, an indication of its performance stability. The results also show that DGD-CNet can adapt to a variety of noise levels and is resilient in different communication scenarios.
For an effective comparison between the DGD-CNet model and the other DL-based models, CsiNet is employed as a baseline. For every tested scenario, the proposed model performed better than CsiNet. The same result was reached when comparing with CsiNet-DeCNN, which includes an IRS component; this validates the design that includes both DO and GRU. Additionally, DGD-CNet surpasses both CsiNet-LSTM and Conv-LSTMCsiNet in channel reconstruction, with the GRU playing a significant role in this improvement.
Regarding indoor and outdoor situations, Table 4 shows the relationship linking the CR to the correlation coefficient $\rho$. The analysis suggests that the proposed DGD-CNet model achieves high-quality compression, evidenced by its ability to maintain high correlation coefficients in both indoor and outdoor situations.
At a higher CR of 1/4, DGD-CNet establishes a strong baseline with ρ values of 0.99 for indoor and 0.90 for outdoor situations. As the CR decreases to 1/16, 1/32, and 1/64, representing an increasingly higher level of data compression, DGD-CNet consistently demonstrates improvements over other models. Compared to its counterparts, DGD-CNet shows notable enhancement in both indoor and outdoor situations: at 1/16 CR, it achieves a 2% improvement indoors and 1% outdoors, maintaining robust performance across varying levels of data retention. Even at the lowest CR of 1/64, DGD-CNet exhibits a 2% enhancement indoors and 1% outdoors, highlighting its resilience in preserving channel information integrity under more severe data compression conditions. These findings underscore DGD-CNet’s effectiveness and suitability for enhancing channel estimation accuracy in FDD-based IRS-aided Massive MIMO systems, promising advancements in practical wireless communication applications.
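The correlation coefficient ρ reported in Table 4 measures how well the reconstructed channel vectors align with the true ones. One common CsiNet-style definition — a sketch only, since the paper's exact averaging convention may differ — is the magnitude of the normalized inner product, averaged over subcarriers:

```python
import numpy as np

def correlation_coefficient(h_true, h_est):
    """rho = E[ |h_est^H h_true| / (||h_est|| ||h_true||) ],
    averaged over the first axis (e.g., subcarriers)."""
    num = np.abs(np.sum(np.conj(h_est) * h_true, axis=-1))
    den = np.linalg.norm(h_est, axis=-1) * np.linalg.norm(h_true, axis=-1)
    return float(np.mean(num / den))
```

Under this definition, ρ = 1 means the reconstruction is perfectly aligned with the true channel (up to a scaling and common phase), so the indoor value of 0.99 indicates near-lossless reconstruction.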
Table 5 illustrates the relationship between different CRs and the accuracy of the DL-based models for both indoor and outdoor situations. The proposed DGD-CNet model outperforms the models from previously published works in both situations, achieving higher accuracy across all CRs. At a CR of 1/4, DGD-CNet achieves the highest accuracy, with 0.91 for indoor (an improvement of 10% over the next best model) and 0.75 for outdoor situations (an improvement of 4% over the next best model). As the CR decreases to 1/16, 1/32, and 1/64, indicating higher compression and lower data retention, the accuracy of all models declines. However, DGD-CNet consistently maintains superior performance, showing notable resilience. At a CR of 1/16, DGD-CNet achieves an accuracy of 0.65 indoors (an improvement of 4% over the next best model) and 0.53 outdoors (an improvement of 1% over the next best model), outperforming its counterparts. Even at the lowest CR of 1/64, DGD-CNet achieves higher accuracy (0.61 indoors and 0.40 outdoors) than the other models, which struggle to maintain comparable performance. These results highlight the robustness and effectiveness of the proposed DGD-CNet in high-compression scenarios, making it a promising solution for improving channel estimation accuracy in FDD-based IRS-aided Massive MIMO systems.
The superior performance of DGD-CNet can be attributed to the integration of GRU and DO techniques within the CsiNet-DeCNN framework. The ability of the GRU to capture temporal dependencies in channel data, coupled with its simpler architecture, results in a more accurate CSI estimation. Additionally, the inclusion of DO plays a critical role in preventing overfitting and enhancing the generalization capabilities of the model. These elements work together to deliver significant improvements in NMSE and correlation coefficient, ultimately leading to a more accurate and robust performance compared to other DL-based models in the literature.
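To make the GRU-plus-dropout mechanism concrete, the following NumPy sketch shows a single GRU time step and an inverted-dropout mask. The shapes, names, and standalone formulation are illustrative only, not the exact DGD-CNet layers:

```python
import numpy as np

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step: the update gate z decides how much of the
    previous hidden state to keep, and the reset gate r decides how
    much of it feeds the candidate state h_tilde."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(x @ Wz + h_prev @ Uz)                  # update gate
    r = sig(x @ Wr + h_prev @ Ur)                  # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h_prev) @ Uh)  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde

def dropout(a, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units during
    training and rescale so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return a
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)
```

The GRU uses fewer gates than an LSTM (no separate cell state), which is what gives the "simpler architecture" mentioned above and contributes to the reduced parameter count.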
Table 6 displays the reconstructed images of the channel matrix for the proposed DGD-CNet model and the other models in the literature, compared with the original pseudo-gray image at different CRs. The images clearly illustrate the superior performance of the DGD-CNet model across all compression levels. At a higher CR of 1/4, all models produce reconstructions that closely resemble the original image, but the DGD-CNet retains slightly more detail. As the CR decreases to 1/16, the difference becomes more pronounced, with DGD-CNet maintaining more of the original features and providing a clearer image than CsiNet, CsiNet-LSTM, and ConvLSTM-CsiNet. This trend continues at a CR of 1/32, where the DGD-CNet’s reconstruction still holds more structural details compared to the more blurred and degraded image produced by other models.
At the lowest CR of 1/64, the DGD-CNet significantly outperforms all the other models, preserving more of the essential features and minimizing information loss, whereas the reconstructions of all the other models show considerable degradation and loss of critical details. These visual differences underscore the effectiveness of the proposed DGD-CNet model in retaining higher fidelity to the original image, especially at lower CRs.
Table 7 compares the proposed DGD-CNet model and other models in the literature, specifically CsiNet and ConvLSTM-CsiNet, in terms of the number of parameters and floating-point operations (FLOPs). The number of parameters refers to the number of learnable weights within a model; it directly affects the model size, memory requirements, and the potential for overfitting. FLOPs represent the total number of operations the model performs during computation and provide insight into the computational resources required to run the model.
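Parameter counts and FLOPs follow standard layer-level formulas. As a rough sketch, counting one multiply plus one add per multiply-accumulate (the figures in Table 7 may use a different counting convention):

```python
def dense_params(n_in, n_out, bias=True):
    """Learnable weights of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

def dense_flops(n_in, n_out):
    """~2 FLOPs (one multiply + one add) per weight."""
    return 2 * n_in * n_out

def conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Learnable weights of a 2-D convolution kernel bank."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)

def conv2d_flops(k_h, k_w, c_in, c_out, out_h, out_w):
    """The kernel cost is paid at every output spatial position."""
    return 2 * k_h * k_w * c_in * c_out * out_h * out_w
```

Note that a dense layer performs roughly two FLOPs per parameter, while a convolution reuses each weight at every output position; this is one reason a model can have fewer parameters yet more FLOPs, as observed for DGD-CNet below.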
The proposed DGD-CNet model presents notable advantages regarding the number of model parameters. Compared to CsiNet and ConvLSTM-CsiNet, which have significantly more parameters, the proposed model demonstrates a substantial reduction in the number of parameters. This is due to its streamlined model design, the use of GRUs, and the integration of DeCNN for effective feature processing. Additionally, incorporating DO helps prevent overfitting and enhances the model's generalization performance.
However, this improvement in the number of parameters comes with a trade-off: the model exhibits more FLOPs than ConvLSTM-CsiNet. The FLOPs are increased by the complex operations performed by the GRUs and the DeCNN component. While the proposed model design minimizes the number of parameters and benefits from the regularization effect of DO, the higher FLOPs highlight the need to balance parameter efficiency with computational complexity.
Additionally, the computational complexity (FLOPs) of the proposed model decreases as the CR decreases (less data information), aligning with the model's objective of efficiently managing reduced data information. Specifically, the proposed model achieves a 71.41% reduction in complexity compared to CsiNet at High-CR, and a 95.5% reduction in complexity at Low-CR.
6. Conclusions
This paper proposes a new model named DGD-CNet for channel estimation to address feedback compression in FDD-based IRS-aided Massive MIMO systems. The DGD-CNet model was evaluated alongside various DL-based models, including CsiNet, ConvLSTM-CsiNet, CsiNet-LSTM, CsiNet-DeCNN, and ACNet. The evaluation focused on the NMSE, correlation coefficient, system accuracy, SNR, and computational complexity.
Compared to the conventional CsiNet, the DGD-CNet achieves a 437% improvement in NMSE at Low-CR for indoor scenarios and an 8% enhancement in the correlation coefficient over the CsiNet-DeCNN model. This significant improvement in NMSE is primarily attributed to the integration of GRU and DO within the CsiNet-DeCNN framework. The GRU, with its simpler architecture, effectively captures temporal dependencies in the channel data while mitigating the risk of overfitting, resulting in improved generalization and more accurate CSI estimation. DO, in turn, helps prevent overfitting by randomly setting a fraction of the units to zero during training; this keeps the model from becoming too dependent on any single neuron, thereby enhancing its robustness and accuracy and contributing to the reduction in NMSE.
Despite its advantages, including a low NMSE, high accuracy, a high correlation coefficient, and a lower number of parameters, the DGD-CNet model has higher FLOPs than the ConvLSTM-CsiNet model; reducing these FLOPs is therefore an important direction for future work.
The proposed model is particularly well-suited for 5G and future 6G networks, where precise channel estimation is crucial for optimizing performance and managing complex environments with high user densities. Future efforts may also involve collaborating with implementation teams involved in the manufacturing of reflecting surfaces to obtain experimental results.