1. Introduction
The multi-input multi-output (MIMO) system is a key technology in modern communication and can significantly improve channel capacity and communication reliability by using multiple antennas [1,2,3,4,5,6,7]. Spatial multiplexing and diversity gain are representative schemes for this improvement [1,2]. Notably, the channel capacity increases linearly with the minimum of the numbers of transmit and receive antennas. However, this increase is based on the unrealistic assumption of perfect channel state information (PCSI) at both the transmitter and receiver.
Many studies have proposed improving the channel estimation accuracy with limited time and frequency resources [8,9,10,11,12,13,14,15]. A representative method is pilot-aided channel estimation, which exploits the information shared between a transmitter and receiver. Linear minimum mean-square-error (LMMSE) channel estimation is a well-known pilot-aided method owing to its simple structure [8]. However, LMMSE channel estimation exhibits unsatisfactory performance with a limited number of pilots. Thus, many pilots are required to satisfy the performance requirement, which decreases the spectral efficiency.
To overcome this problem, data-aided channel estimation has been investigated, in which the detected data symbols are exploited as additional pilot symbols [16,17,18,19,20,21,22,23,24,25,26]. However, the detected data symbols may contain errors that degrade the accuracy of channel estimation. The iterative turbo equalizer can overcome this degradation by increasing the maximum a posteriori probability (MAP) [16,17,18,19,20,21,22]. However, such an iterative turbo equalizer incurs considerable complexity and latency at the receiver.
As a non-iterative approach, the reinforcement learning (RL)-aided channel estimator was introduced in [27,28,29,30,31,32,33]. The basic concept of this approach is the sequential selection of detected data symbols to minimize the channel estimation error. Hence, a Markov decision process (MDP) was defined to solve the sequential selection, and the corresponding optimal policy was derived in closed form in [31]. In [32], a low-complexity algorithm was investigated by introducing sub-blocks and finite backup samples, which significantly reduced the computational complexity and latency without performance loss. Recently, a general framework for RL-aided channel estimation based on Monte Carlo tree search was studied in [33]. However, the RL-aided channel estimators in [31,32,33] were originally designed for time-invariant channels; they perform poorly in time-varying channels.
In this paper, we propose an RL-aided channel estimator for time-varying MIMO channels. To achieve this, we first introduce an optimization problem for an RL-aided channel estimator in time-varying channels. We then formulate an MDP to solve the optimization problem, and propose an RL algorithm for the MDP that considers the time-varying nature of the channel. The main contributions of this paper are as follows:
We propose an RL-aided channel estimator for time-varying channels modeled using a first-order Gaussian–Markov process. First, we define the optimization problem in time-varying channels to select the detected data symbols and minimize the estimation error between the estimated and current channels. This optimization problem differs from those in [31,32,33], where the selection of the detected data symbols is unchanged because the current channel remains unchanged with the time slot index.
We propose an RL algorithm for the optimization problem that captures the time-varying nature of a channel. Because the optimization problem minimizes the estimation error between the estimated and current channels, we adjust the weights of the data symbols to improve the estimation accuracy of the current channel. Using this adjustment, we derive the optimal policy as a closed-form solution. Note that the proposed optimal policy differs from those in [31,32,33] because the influence of soft-decision symbols in the virtual state for future rewards gradually diminishes as the time slot index increases.
We propose a further performance improvement scheme to refine the state elements. This is because the previously selected data symbol degrades the estimation accuracy of the current channel. To improve the estimation accuracy, we refine the previously selected data symbol by reflecting the channel variation. In addition, we remove selected data symbols that are too old by introducing a sliding window, because they have a large noise variance to estimate the current channel. Through simulations, we demonstrated the effectiveness of the proposed channel estimator compared with conventional channel estimators in time-varying channels.
The remainder of this paper is organized as follows. In Section 2, we introduce the system model, optimization problem, and the MDP. The proposed channel estimator, which determines the optimal policy for time-varying channels, is described in Section 3. We propose a further performance improvement scheme in Section 4. In Section 5, we present simulation results to demonstrate the effectiveness of the proposed channel estimator. Finally, we provide our conclusions in Section 6.
2. Preliminaries
This section describes the system model of a data-aided channel estimator for time-varying MIMO channels. We present the considered channel estimation and data detection schemes based on the model and introduce an optimization problem for data-aided channel estimation.
2.1. System Model
We consider MIMO systems in which a transmitter with a number of transmit antennas
communicates with a receiver with a number of receive antennas
(
Figure 1). The information is first encoded and mapped to the symbol constellation where
is the symbol constellation set. The transmitted symbol at time
n denoted by
is then sent over a wireless channel. We model the wireless channel using a first-order Gaussian–Markov process as a time-varying channel model [34,35,36,37,38], where the channel matrix
has its
-th component between the
t-th and
r-th antennas following a Rayleigh fading
distribution. The temporal correlation of the wireless channel, denoted by
, increases with velocity. Based on this model, the channel matrix
at time slot
n is given by
where
follows a
distribution.
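The first-order Gaussian–Markov recursion described above can be sketched as follows. This is an illustrative simulation (the antenna counts, the correlation value eta, and the number of slots are arbitrary assumptions); it also checks that the empirical lag-one correlation matches the temporal-correlation parameter:

```python
import numpy as np

def gauss_markov_channel(n_r, n_t, eta, n_slots, rng):
    """Simulate H[n] = eta * H[n-1] + sqrt(1 - eta^2) * W[n],
    with W[n] having i.i.d. CN(0, 1) entries, so each H[n] keeps
    unit-variance entries while decorrelating over time."""
    def cn(shape):
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    h = cn((n_r, n_t))
    channels = [h]
    for _ in range(1, n_slots):
        h = eta * h + np.sqrt(1.0 - eta**2) * cn((n_r, n_t))
        channels.append(h)
    return channels

rng = np.random.default_rng(1)
eta = 0.99  # temporal correlation; closer to 1 means slower fading
chans = gauss_markov_channel(4, 4, eta, 2000, rng)
# The empirical lag-1 correlation should be close to eta.
corr = np.mean([np.vdot(chans[n - 1], chans[n]).real
                / np.vdot(chans[n - 1], chans[n - 1]).real
                for n in range(1, len(chans))])
print(round(corr, 2))  # ≈ 0.99
```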
When the transmitter sends the symbol
to the receiver over the wireless channel
, the received symbol
is given by
where
denotes the conjugate transpose.
is the additive white Gaussian noise (AWGN) at time slot
n, with distribution
, where
and
respectively denote
zero and identity matrices.
The frame consists of one pilot and
data blocks (
Figure 1). The pilot block contains
symbols, whereas each data block contains
symbols.
is defined as the pilot index set and
is defined as the data index set. We consider data-aided channel estimation, where the receiver obtains the initial channel estimates using pilot symbols, and the accuracy of the initial channel estimates is improved by exploiting data symbols.
We adopt the LMMSE method as the basic channel estimation method because it has a simple structure and provides a reasonable performance. Based on the LMMSE method,
of the
r-th row for the initial channel estimate
can be obtained as
where
is the inverse operation.
and
are the pilot and corresponding received symbols in the pilot block, respectively.
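A minimal sketch of pilot-based LMMSE estimation is shown below. It assumes i.i.d. CN(0, 1) channel entries (identity prior covariance) and, for notational simplicity, the orientation Y = HX + W rather than the paper's conjugate-transpose convention; the pilot shapes and noise level are illustrative assumptions.

```python
import numpy as np

def lmmse_channel_estimate(x_p, y_p, noise_var):
    """LMMSE estimate of H from pilot observations under Y = H X + W.
    With an identity channel prior covariance, the estimator reduces to
    Y X^H (X X^H + sigma^2 I)^{-1}, applied row by row implicitly.
    x_p: (n_t, n_p) known pilots; y_p: (n_r, n_p) received pilots."""
    n_t = x_p.shape[0]
    return y_p @ x_p.conj().T @ np.linalg.inv(
        x_p @ x_p.conj().T + noise_var * np.eye(n_t))

rng = np.random.default_rng(2)
n_t, n_r, n_p, noise_var = 4, 4, 16, 0.01
h = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
x_p = np.exp(2j * np.pi * rng.random((n_t, n_p)))  # unit-modulus pilots
w = np.sqrt(noise_var / 2) * (rng.standard_normal((n_r, n_p))
                              + 1j * rng.standard_normal((n_r, n_p)))
y_p = h @ x_p + w
h_hat = lmmse_channel_estimate(x_p, y_p, noise_var)
nmse = np.linalg.norm(h_hat - h) ** 2 / np.linalg.norm(h) ** 2
print(nmse)  # small at this pilot length and SNR
```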
The conventional channel estimator performs data detection at the receiver using the initial channel estimates
. Because the MAP rule guarantees optimal performance, we adopt it for data detection, which is given by
where
is the cardinality of a set.
where
k belongs to the index set of the symbol vector candidate
.
denotes a posteriori probability (APP), which is given by
where the likelihood probability in (5) is calculated by assuming the AWGN channel as
where
denotes the norm operation and
is the probability of an event. The a priori probability in (5) is also assumed to have an equal probability for possible candidate transmitted symbol
.
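The MAP rule above, with a uniform prior over the candidate transmitted vectors, can be sketched as an exhaustive search over the candidate set. The 4-QAM alphabet, antenna counts, and noiseless received vector below are illustrative assumptions:

```python
import numpy as np
from itertools import product

def map_detect(y, h, noise_var, constellation, n_t):
    """Exhaustive MAP detection over all candidate transmit vectors.
    With a uniform prior, the APP is proportional to the AWGN likelihood
    exp(-||y - H^H x||^2 / noise_var). Returns the MAP candidate and the
    normalized APPs of all candidates."""
    candidates = [np.array(c) for c in product(constellation, repeat=n_t)]
    log_like = np.array([-np.linalg.norm(y - h.conj().T @ x) ** 2 / noise_var
                         for x in candidates])
    log_like -= log_like.max()      # subtract max for numerical stability
    app = np.exp(log_like)
    app /= app.sum()                # a posteriori probabilities
    return candidates[int(np.argmax(app))], app

rng = np.random.default_rng(3)
n_t = n_r = 2
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
h = (rng.standard_normal((n_t, n_r))
     + 1j * rng.standard_normal((n_t, n_r))) / np.sqrt(2)
x = qam4[rng.integers(0, 4, n_t)]
y = h.conj().T @ x                  # noiseless for clarity of the example
x_hat, app = map_detect(y, h, noise_var=0.01, constellation=qam4, n_t=n_t)
print(np.allclose(x_hat, x))        # True: the true vector maximizes the APP
```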
2.2. Problem
In a time-varying channel, the accuracy of the initial channel estimate gradually decreases as the time slot index n increases. This degradation results in poor detection performance at the receiver. Because the detected data symbols may contain errors owing to the channel, incorrect use of a detected data symbol severely degrades performance. To overcome this degradation, we consider a data-aided channel estimator that selects which detected data symbols to use for channel estimation.
For the selection, we define action
where the detected data symbol is used in channel estimation when
; otherwise, the detected data symbol is not used. When we define
as a set of actions, the considered data-aided channel estimation can be obtained using this set as
where
and
. The time slot index of the
i-th nonzero element is denoted as
. We then define the optimization problem as
where
is the expectation of a random variable.
Compared with previous studies [31,32,33], the optimization problem in (8) considers the selection to minimize the MSE between the estimated channel and
. Because the channel is variant with time slot index
n, the best action
may be different with time slot index
n. That is, the best action in the previous time slot index may be invalid in the next time slot index. In addition, the optimization problem is difficult to solve because the number of candidate actions increases exponentially with the data symbol length. An exhaustive search for action candidates is not feasible in practical applications. To resolve these difficulties, we introduce a sequential selection of the detected data symbols and a refinement of the selected data.
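The gap between exhaustive and sequential selection can be made concrete with a toy count. The reliability threshold below is a stand-in for the policy derived later, not the paper's decision rule:

```python
import numpy as np

def exhaustive_count(n):
    """Number of candidate action sets over n detected data symbols."""
    return 2 ** n

def sequential_select(reliabilities, threshold):
    """One pass over the detected symbols: keep a symbol when its
    reliability exceeds the threshold (a stand-in decision rule)."""
    return [int(r > threshold) for r in reliabilities]

n = 64
print(exhaustive_count(n))   # 2^64 candidate action sets: infeasible
rng = np.random.default_rng(4)
actions = sequential_select(rng.random(n), threshold=0.5)
print(len(actions))          # only n binary decisions in the sequential pass
```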
2.3. Markov Decision Process
We formulate an MDP that solves the optimization problem in (8). To achieve this, we define state
, transition function
, action
, and reward [39]. Subsequently, the Q-value function
and the optimal policy
will be presented. The basic definitions for the MDP are adopted from those in [31,32,33]; however, the RL solution for the MDP differs from those in previous studies, as explained in the next section.
The state set
is defined as
where
is the set of time slot indices where the symbol is used in channel estimation, and
is the
i-th smallest element.
is the transmitted symbol index at time slot
n. Based on the expression, we can obtain the proposed channel estimate using the state
as
where
. Note that
is the set of all states and
is the state.
The action set is defined as
. As explained in the previous subsection, the detected data symbol is used in the proposed channel estimation when
; otherwise, the detected data symbol is not used. The transition function
from state
is defined as
where
equals one when the event is true and zero otherwise.
and
.
is a possible candidate for the next state from state
and is defined as
The reward
is defined as the difference between the MSEs at the current state
and the next state
, which is given by
where
is the error covariance. Unlike in [31,32,33], the error covariance is defined between the estimated channel
and
at time slot index
n.
The Q-value function
is the sum of the rewards, which is given by
where
is a trace operation.
is the optimal sum of future reward after
.
is a discounting factor, whose value is set to one because the proposed channel estimator also considers the effect of future rewards at the ending state [31].
The optimal policy maximizes the Q-value function, which is expressed as
Solving the optimization problem in (15) is highly difficult because the transition probability
is unknown, and the number of candidate states increases exponentially with the data length. An effective method to solve this problem is a reinforcement learning algorithm. Therefore, the proposed channel estimator also adopts a reinforcement learning algorithm, but additionally considers the effect of the time-varying channel in comparison with [31,32,33].
A deep reinforcement learning (DRL) approach is a promising solution for dealing with the dimension explosion of the states by leveraging deep neural networks. To apply the DRL approach to our MDP, an agent needs to interact with an environment to obtain an action-value function for a given action and state. However, both the states and rewards of our MDP are not observable at the receiver. This means that the agent cannot acquire training samples, each of which consists of the state (or the state transition) and the corresponding reward. Consequently, the DRL approach and other data-driven approaches are not directly applicable to solving our MDP.
3. Proposed Optimal Policy
This section describes the proposed optimal policy. The basic concept of the derivation is similar to that in [31,32,33]. However, a direct extension to time-varying channels is difficult because capturing channel variations using previously selected data symbols is challenging. To address this, we approximate the first-order Gaussian–Markov process and propose a computationally efficient algorithm.
We employ the approximation in [31,32,33] for the transition function, which is given by
where
as
.
The main difficulty in analyzing the time-varying channel model is solving element
. To resolve this difficulty, we approximate the first-order Gaussian–Markov process in (1) as follows:
where
. This approximation is often adopted in studies because it provides analytical tractability [36,37,38]. Using this approximation, the received symbol
for
can be expressed in terms of
as follows:
From approximation (18), the virtual state in [31] that mimics the optimal behavior from state
can be obtained as follows:
where
The soft-decision symbol
for
is defined as
In (19), because
, the effect of soft decision symbol
for estimating
is diminished as
m increases. Based on the virtual state, the state-action diagram for the proposed channel estimator is shown in
Figure 2. In this figure, the number of state transitions at state
are one and
K for
and
, respectively. However, after
, the state transition is simplified to one because the virtual state mimics the behavior of state
.
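A soft-decision symbol of the kind used in the virtual state can be sketched as an APP-weighted average of the constellation points; the APP values below are illustrative, and the exact weighting in (20) may differ:

```python
import numpy as np

# 4-QAM constellation and example a posteriori probabilities (APPs)
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
app = np.array([0.85, 0.05, 0.05, 0.05])  # illustrative APP values

soft = np.sum(app * qam4)        # APP-weighted soft-decision symbol
hard = qam4[np.argmax(app)]      # hard decision for comparison
# The soft symbol shrinks toward the most likely constellation point,
# hedging against detection errors instead of committing to a hard decision.
print(soft)
```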
Using the definition of virtual state (19), we can compute the future reward
as
By applying (13) to the future reward, the future reward is simplified as
Using the approximations (16) and (22), the Q-value function in (14) is obtained as follows:
The error covariance matrix
can be computed as
where the distribution of
is given by
and
is applied in
.
is used in
.
By applying (24) to the Q-value function, the optimal policy at
is computed as
where
.
and
are defined as
Similar to [31],
and
satisfy
. In addition,
and
satisfy
where
, and
.
Finally, similar to [32], by applying the results in (23) and (24) to (25), we obtain the proposed optimal policy in closed form as
When we define
and
, vectors are computed as
,
,
, and
. In addition, the constants are computed as
,
, and
. Note that the expression of the optimal policy in (26) is similar to that in [32]. However, the vectors and constants in the optimal policy are different from those in [32] because the temporal correlation
is considered in
and
. When
, the optimal policy in (26) is equivalent to that in [32].
4. Further Performance Improvement
In this section, we propose a practical method to improve the estimation accuracy of the proposed channel estimator. The proposed method refines state elements to capture the time-varying nature of the channel.
4.1. State Element Refinement
Elements
and
in state
are updated when the detected data symbol is selected based on the optimal policy. However, the elements gradually lose their effectiveness in estimating
as time slot index
n increases. To address this, we first represent the received symbol for
in terms of
as
Using (27), we refine the elements
and
in state as the time slot index increases, which is given by
Regardless of the above refinement, the previously selected data symbols lose their effectiveness as the time slot index increases, particularly for large data lengths. This is because the term
in (1) becomes dominant, increasing the uncertainty in estimating the channel. To overcome this, we remove too-old selected data symbols in state by introducing a window size
. In other words, we maintain the size of the set of time slot indices as
. Thus, when the optimal action is one at time slot index
n,
n is included, whereas the first index
is removed from set
, which can be expressed as
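The sliding-window bookkeeping described above can be sketched as follows; this is a minimal illustration in which the paper's index-set notation and window-size symbol are abstracted into a deque:

```python
from collections import deque

def update_window(window, new_index, max_size):
    """Add a newly selected time-slot index; if the window is full,
    evict the oldest index so only recent selections remain."""
    if len(window) == max_size:
        window.popleft()             # discard the oldest selected index
    window.append(new_index)
    return window

w = deque([3, 7, 12])                # previously selected time-slot indices
update_window(w, 15, max_size=3)
print(list(w))                       # [7, 12, 15]: index 3 was evicted
```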
4.2. Algorithm
Using the proposed optimal policy and performance improvement strategy, the proposed channel estimator is summarized in Algorithm 1. The receiver obtains the initial channel estimation during pilot transmission. Subsequently, during the data transmission, the receiver sequentially selects a data symbol based on the optimal policy. When the optimal action
, the state
is updated using the most-probable state transition [31]. In addition, the state element refinement is performed based on this condition. After each data block ends, the channel estimate is updated using the state
.
Algorithm 1: Proposed channel estimator
1. Obtain the initial channel estimate from (3).
2. Initialize the state.
3. for each data block do
4.    for each data symbol in the block do
5.        Compute the optimal policy from (26).
6.        Set the optimal action and the corresponding state values.
7.        Update the state from (12).
8.        if the optimal action is one and the window is full then
9.            Remove the oldest elements from the state.
10.       end
11.       Refine the state elements.
12.   end
13.   Update the channel estimate from (10).
14. end
In
Figure 3, we show a block diagram of the proposed channel estimator, which consists of the LMMSE channel estimator, optimal policy calculator, and state element refinement. The LMMSE channel estimator obtains the initial estimate at pilot transmission and updates the estimate at data transmission using state
. The optimal policy calculator obtains the optimal action of (26) from the channel estimates and APP from the data detector. The state elements are then refined based on the obtained optimal action, and the refined state is used to estimate the channel and optimal policy for the next step.
Application of other data detection: The proposed RL-aided channel estimator can be universally applied to any other soft-output data detection method. To achieve this, the proposed RL-aided channel estimator relies on the availability of APPs, which can be directly derived from the MAP data detection method. In the case of using other soft-output data detections, the proposed RL-aided channel estimator can utilize the APPs that are computed from the log-likelihood ratios.
Complexity analysis: Complexity is analyzed in terms of real multiplications to provide an implementation perspective.
Figure 3 shows the hardware structure of the proposed RL-aided channel estimator, which consists of the LMMSE channel estimator, state element refinement, and optimal policy calculator. Because the exact complexity can vary depending on the implementation details, the complexity order (
) of each component is analyzed.
The complexity order of the LMMSE channel estimator in (7) is
where
is the set of selected data symbol vectors. The complexity order of state element refinement in
Section 4.1 is
. The complexity of the optimal policy in (25) is primarily determined by the computation of
. Consequently, the complexity order of the optimal policy in (25) is
. It is important to note that among the components of the proposed channel estimator, the optimal policy calculator has the highest complexity because it performs every data symbol index
n, while the other components perform every data block index
d.
5. Simulation Results
This section presents the effectiveness of the proposed channel estimator using simulations. The numbers of transmit and receive antennas used were and . The transmission frame consisted of one pilot block with symbols and data blocks with symbols. Each symbol used 4-quadrature amplitude modulation (QAM) symbol mapping. We adopted turbo channel code with a rate of and 16 cyclic redundancy check bits. For the proposed channel estimator, the window size was set to . The signal-to-noise ratio (SNR) was defined as under the power constraint . The proposed channel estimator was compared with the following methods.
PCSI: This method is ideal for time-invariant channels in which a perfect initial channel estimate is available at the receiver. Because the initial channel changes during data transmission, it is not optimal for time-varying channels.
Pilot: This method uses a conventional pilot-aided channel estimator using (3).
Soft: This method is a data-aided channel estimator when all symbols in (20) are used as additional pilot symbols.
Conv-RL [31]: This method is a data-aided channel estimator in which the detected data symbol is selected using the RL approach developed for time-invariant channels.
The performance of these methods was compared with that of the proposed channel estimator in terms of the block-error rate (BLER) and normalized MSE (NMSE). In addition, we considered a time-invariant channel and time-varying channels with and . Note that the channel varied more severely when than when .
Figure 4 shows the BLERs for the proposed and other channel estimators in the time-invariant channel, i.e.,
. The conventional pilot-aided channel estimator exhibited poor performance when the number of pilots was small. Data-aided channel estimators can overcome the performance degradation of the pilot-aided channel estimator. In particular, the RL-based channel estimator [31] showed outstanding performance compared with the other channel estimators. The BLER of the proposed channel estimator was slightly worse than that of [31] because of its reduced window size.
In
Figure 5, the proposed channel estimator is compared with the other channel estimators in time-varying channels. The proposed channel estimator achieved a larger BLER improvement than the conventional pilot-aided channel estimator. In particular, the performance improvement is more prominent at
than that at
. This is because the proposed channel estimator can efficiently capture channel variations by selecting and refining detected data symbols. In addition, in the time-invariant channel, the proposed channel estimator had a slightly higher BLER than the RL-based channel estimator, primarily because of its reduced window size (see
Figure 4). However, in time-varying channels, this reduction in window size actually improved the BLER by effectively leveraging the most recent data symbols. Consequently, the proposed channel estimator achieved a lower BLER than the RL-based channel estimator. In
Figure 6, we show the BLER of the proposed channel estimator for different window sizes
in time-varying channels with
The BLER of the proposed channel estimator gradually degraded as
increased. This is because older selected data symbols are undesirable as additional pilot symbols in fast-fading channels; therefore, using only the most recently selected data symbols improves the performance.
To further investigate the effect of window size, we investigated the NMSE of the proposed channel estimator for different window sizes
at
and
dB (
Figure 7). We observed that the NMSE improved until
but degraded as the data block length increased further. This is because old selected data symbols are ineffective for estimating the current channel. Thus, by discarding old data symbols, we can further improve the estimation accuracy (
Figure 7).
6. Conclusions
A data-aided channel estimator was proposed for time-varying channels, which involves selecting the detected data symbols. To facilitate efficient selection of the detected data symbols, an optimization problem was first formulated to minimize the channel estimation error. Subsequently, the MDP for this optimization problem was formulated, and its optimal policy was derived using an RL algorithm. In the derivation process, approximations for the transition probability and the first-order Gaussian–Markov process were utilized. To improve the estimation accuracy, a state element refinement was introduced to capture the time-varying nature of the channel by incorporating a window size. Simulation results demonstrated that the proposed channel estimator provides similar performance to the conventional RL-based channel estimator in time-invariant channels when , while showing improved performance in time-varying channels when and compared to the conventional RL-based channel estimator.
An interesting direction for further research involves optimizing the frame structure in terms of spectral efficiency. In this study, the frame structure comprises one pilot block and D data blocks. The proposed RL-aided channel estimator is applied to the data blocks to capture the time-varying nature of the channel. However, in fast-fading channels, it can be challenging for the proposed channel estimator to accurately track channel variations. In such cases, reducing the value of D in the frame structure can potentially improve the performance. However, this reduction also degrades the spectral efficiency. To find an appropriate value of D in time-varying channels, an optimization problem that maximizes spectral efficiency while maintaining acceptable performance levels becomes a suitable criterion. To address this, one approach is to first derive the performance of the RL-aided channel estimator. Subsequently, the solution to the optimization problem can be obtained using the derived performance.
Author Contributions
Conceptualization, M.M.; Methodology, T.-K.K.; Software, T.-K.K.; Validation, M.M.; Formal analysis, T.-K.K.; Investigation, T.-K.K.; Resources, T.-K.K.; Data curation, T.-K.K.; Writing—original draft, T.-K.K.; Writing—review & editing, M.M.; Visualization, M.M.; Supervision, M.M.; Project administration, M.M.; Funding acquisition, M.M. All authors have read and agreed to the published version of the manuscript.
Funding
The work of Tae-Kyoung Kim was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MIST) (No. 2021R1F1A1063273). The work of Moonsik Min was supported in part by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (No. 2023R1A2C1004034), and in part by the BK21 FOUR Project funded by the Ministry of Education, Korea (4199990113966).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Goldsmith, A.; Jafar, S.A.; Jindal, N.; Vishwanath, S. Capacity Limits of MIMO Channels. IEEE J. Sel. Areas Commun. 2003, 21, 684–702.
- Zheng, L.; Tse, D.N.C. Diversity and Multiplexing: A Fundamental Tradeoff in Multiple-Antenna Channels. IEEE Trans. Inf. Theory 2003, 49, 1073–1096.
- Paulraj, A.J.; Gore, D.A.; Nabar, R.U.; Bolcskei, H. An Overview of MIMO Communications-a Key to Gigabit Wireless. Proc. IEEE 2004, 92, 198–218.
- Sanayei, S.; Nosratinia, A. Antenna Selection in MIMO Systems. IEEE Commun. Mag. 2004, 42, 68–73.
- Larsson, E.G.; Edfors, O.; Tufvesson, F.; Marzetta, T.L. Massive MIMO for Next Generation Wireless Systems. IEEE Commun. Mag. 2014, 52, 186–195.
- Zheng, K.; Zhao, L.; Mei, J.; Shao, B.; Xiang, W.; Hanzo, L. Survey of Large-Scale MIMO Systems. IEEE Commun. Surv. Tutor. 2015, 17, 1738–1760.
- Yang, S.; Hanzo, L. Fifty Years of MIMO Detection: The Road to Large-Scale MIMOs. IEEE Commun. Surv. Tutor. 2015, 17, 1941–1988.
- Morelli, M.; Mengali, U. A Comparison of Pilot-Aided Channel Estimation Methods for OFDM Systems. IEEE Trans. Signal Process. 2001, 49, 3065–3073.
- Coleri, S.; Ergen, M.; Puri, A.; Bahai, A. Channel Estimation Techniques Based on Pilot Arrangement in OFDM Systems. IEEE Trans. Broadcast. 2002, 48, 223–229.
- Mostofi, Y.; Cox, D.C. ICI Mitigation for Pilot-Aided OFDM Mobile Systems. IEEE Trans. Wirel. Commun. 2005, 4, 765–774.
- Biguesh, M.; Gershman, A.B. Training-Based MIMO Channel Estimation: A Study of Estimator Tradeoffs and Optimal Training Signals. IEEE Trans. Signal Process. 2006, 54, 884–893.
- Ozdemir, M.K.; Arslan, H. Channel Estimation for Wireless OFDM Systems. IEEE Commun. Surv. Tutor. 2007, 9, 18–48.
- Soltani, M.; Pourahmadi, V.; Mirzaei, A.; Sheikhzadeh, H. Deep Learning-Based Channel Estimation. IEEE Commun. Lett. 2019, 23, 652–655.
- Le, H.A.; Van Chien, T.; Nguyen, T.H.; Choo, H.; Nguyen, V.D. Machine Learning-Based 5G-and-Beyond Channel Estimation for MIMO-OFDM Communication Systems. Sensors 2021, 21, 4861.
- Yuan, J.; Ngo, H.Q.; Matthaiou, M. Machine Learning-Based Channel Prediction in Massive MIMO with Channel Aging. IEEE Trans. Wirel. Commun. 2020, 19, 2960–2973.
- Valenti, M.C.; Woerner, B.D. Iterative Channel Estimation and Decoding of Pilot Symbol Assisted Turbo Codes over Flat-Fading Channels. IEEE J. Sel. Areas Commun. 2001, 19, 1697–1705.
- Dowler, A.; Nix, A.; McGeehan, J. Data-Derived Iterative Channel Estimation with Channel Tracking for a Mobile Fourth Generation Wide Area OFDM System. In Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM), San Francisco, CA, USA, 1–5 December 2003.
- Cozzo, C.; Hughes, B.L. Joint Channel Estimation and Data Detection in Space-Time Communications. IEEE Trans. Commun. 2003, 51, 1266–1270.
- Song, S.; Singer, A.C.; Sung, K.M. Soft Input Channel Estimation for Turbo Equalization. IEEE Trans. Signal Process. 2004, 52, 2885–2894.
- Nicoli, M.; Ferrara, S.; Spagnolini, U. Soft-Iterative Channel Estimation: Methods and Performance Analysis. IEEE Trans. Signal Process. 2007, 55, 2993–3006.
- Zhao, M.; Shi, Z.; Reed, M.C. Iterative Turbo Channel Estimation for OFDM System over Rapid Dispersive Fading Channel. IEEE Trans. Wirel. Commun. 2008, 7, 3174–3184.
- Guo, Q.; Ping, L.; Huang, D. A Low-Complexity Iterative Channel Estimation and Detection Technique for Doubly Selective Channels. IEEE Trans. Wirel. Commun. 2009, 8, 4340–4349.
- Ma, J.; Ping, L. Data-Aided Channel Estimation in Large Antenna Systems. IEEE Trans. Signal Process. 2014, 62, 3111–3124.
- Wen, C.K.; Wang, C.J.; Jin, S.; Wong, K.K.; Ting, P. Bayes-Optimal Joint Channel-and-Data Estimation for Massive MIMO with Low-Precision ADCs. IEEE Trans. Signal Process. 2015, 64, 2541–2556.
- Park, S.; Shim, B.; Choi, J.W. Iterative Channel Estimation Using Virtual Pilot Signals for MIMO-OFDM Systems. IEEE Trans. Signal Process. 2015, 63, 3032–3045.
- Huang, C.; Liu, L.; Yuen, C.; Sun, S. Iterative Channel Estimation Using LSE and Sparse Message Passing for mmWave MIMO Systems. IEEE Trans. Signal Process. 2018, 67, 245–259.
- Li, X.; Wang, Q.; Yang, H.; Ma, X. Data-Aided MIMO Channel Estimation by Clustering and Reinforcement-Learning. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022.
- Naeem, M.; De Pietro, G.; Coronato, A. Application of Reinforcement Learning and Deep Learning in Multiple-Input and Multiple-Output (MIMO) Systems. Sensors 2022, 22, 309.
- Oh, M.S.; Hosseinalipour, S.; Kim, T.; Brinton, C.G.; Love, D.J. Channel Estimation via Successive Denoising in MIMO OFDM Systems: A Reinforcement Learning Approach. In Proceedings of the IEEE International Conference on Communications (ICC), Montreal, QC, Canada, 14–23 June 2021.
- Chu, M.; Liu, A.; Lau, V.K.N.; Jiang, C.; Yang, T. Deep Reinforcement Learning Based End-to-End Multi-User Channel Prediction and Beamforming. IEEE Trans. Wirel. Commun. 2022, 21, 10271–10285.
- Jeon, Y.S.; Li, J.; Tavangaran, N.; Poor, H.V. Data-Aided Channel Estimator for MIMO Systems via Reinforcement Learning. In Proceedings of the IEEE International Conference on Communications (ICC), Prayagraj, India, 27–29 November 2020.
- Kim, T.K.; Min, M. A Low-Complexity Algorithm for Reinforcement Learning-Based Channel Estimator for MIMO Systems. Sensors 2022, 22, 4379.
- Kim, T.K.; Jeon, Y.S.; Li, J.; Tavangaran, N.; Poor, H.V. Semi-Data-Aided Channel Estimation for MIMO Systems via Reinforcement Learning. IEEE Trans. Wirel. Commun. 2022; early access.
- Dong, M.; Tong, L.; Sadler, B.M. Optimal Insertion of Pilot Symbols for Transmissions over Time-Varying Flat Fading Channels. IEEE Trans. Signal Process. 2004, 52, 1403–1418.
- Kim, T.K.; Jeon, Y.S.; Min, M. Training Length Adaptation for Reinforcement Learning-Based Detection in Time-Varying Massive MIMO Systems with One-Bit ADCs. IEEE Trans. Veh. Technol. 2021, 70, 6999–7011.
- Li, C.C.; Lin, Y.P. Predictive Coding of Bit Loading for Time Correlated MIMO Channels with a Decision Feedback Receiver. IEEE Trans. Signal Process. 2015, 63, 3376–3386.
- Kim, H.; Yu, H.; Lee, Y. Limited Feedback for Multicell Zero-Forcing Coordinated Beamforming in Time-Varying Channels. IEEE Trans. Veh. Technol. 2015, 64, 2349–2359.
- Mirza, J.; Dmochowski, P.A.; Smith, P.J.; Shafi, M. A Differential Codebook with Adaptive Scaling for Limited Feedback MU MISO Systems. IEEE Wirel. Commun. Lett. 2014, 3, 2–5.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 2018.