1. Introduction
In order to expand the coverage of wireless networks and improve the communication quality of edge users, relays have been widely used in wireless networks such as long term evolution (LTE) and 5G [1,2]. However, dense deployment of relays can lead to problems such as high energy consumption and greenhouse gas emissions [3,4]. Energy harvesting, which collects and utilizes renewable energy such as solar and wind energy, is the most promising technology for addressing the economic and environmental problems caused by dense relay deployment [5,6,7,8,9]. In addition, energy harvesting can reduce the dependence of wireless networks on grid energy: deploying green relays in areas where grid energy is scarce can effectively expand the coverage of wireless communication networks. However, the intermittent and random nature of renewable energy may degrade network performance. Therefore, it is critical to establish a renewable energy allocation and link selection mechanism that ensures the performance of wireless relay networks.
Optimal power control strategies to alleviate the network performance degradation caused by a lack of renewable energy have been proposed in [10,11,12,13]. In [10], both the source and relay nodes are powered by renewable energy, and an off-line power control strategy is designed to minimize the data transmission time under throughput constraints; the strategy is also proved to have a water-filling structure. Differing from [10], the authors of [11] consider a source node powered by the grid while the relay node is powered by renewable energy, and propose an off-line power control strategy to maximize the grid energy efficiency of the source node. In [12], the research is extended to a Gaussian fading channel, and both off-line and on-line power control schemes are proposed. In addition to the traditional relay, [13] considers a relay with a caching function, and off-line and on-line power control strategies are developed to maximize network throughput. All of the above studies develop off-line or on-line power control strategies under constraints on the user's quality of service (QoS).
Besides power control schemes, link selection strategies are also critical. In [14], the source node is powered by grid energy while the relay is powered by renewable energy; with causal side information, a link selection strategy is designed to maximize the average throughput over all time slots. In [15], the relay is powered by radio frequency (RF) energy with a finite battery capacity, and only one data packet needs to be transmitted in each slot; a link selection scheme based on the battery level is proposed to minimize the outage probability of the relay. Furthermore, [16] jointly optimizes power and link selection to reduce the outage probability of the relay. In [17], the research is extended to multiple relays with data caching, and on-line and off-line power allocation and link selection mechanisms are developed to minimize the data transmission time under battery capacity constraints.
Most previous studies on hybrid energy supply (HES) wireless relay systems assume that the amount of data transmitted by the source node is fixed in each slot, and then optimize the outage probability or transmission time. However, in practical applications, the source node needs to serve many different users and deliver a varying number of data bits in each slot. Hence, as many data bits as possible should be transmitted to reduce network congestion, subject to grid energy consumption constraints. Consequently, in this paper, we consider a source node that always has bits available for transmission. Our goal is to maximize network throughput while minimizing grid energy consumption.
The rest of this paper is organized as follows. Section 2 describes the system model. In Section 3, the Markov decision process (MDP) problem is formulated, for which a low-complexity algorithm is proposed. Simulations are shown in Section 4. Finally, Section 5 concludes the paper.
3. Optimal Control for Expected Total Rewards
In this section, we assume that the channel states and energy arrivals are causally known. Consequently, we aim to adapt the transmission power and link selection to maximize the expected total rewards. Thus, the problem can be formulated as:
In the formulated problem, the objective is the expected total reward over N time slots. Equation (12) is the energy constraint: the energy consumed by the relay cannot exceed that stored in the battery. Equation (13) is the throughput constraint: the bits delivered in the two stages of the relay link should be equal. Equations (14) and (15) are the power constraint and the link selection constraint, respectively.
3.1. Problem Simplification
From Equation (10), we can see that the optimization variables include a 0–1 link selection variable and several continuous power variables. It is very difficult to optimize so many variables of different types at the same time. Therefore, we simplify the problem to reduce the number of optimization variables, and the resulting problem can be expressed as:
The 0–1 link selection variable is determined as follows: when the transmission power of the relay is non-zero, the relay link is chosen to transmit data, while when the power of the relay is zero, the direct link is selected to deliver bits. In addition, the network total reward when the relay link is chosen is given by:
where the first term is the throughput and the second term is the grid energy consumed by the source node. We assume that the amounts of data bits transmitted in the two stages of the relay link are equal, which gives Equation (21). From Equation (21), the source power can be expressed in terms of the relay power, and according to Equations (20)–(22), the relay-link reward can be further given as:
The reward when the direct link is selected is obtained by solving:
Proposition 1. The reward of the direct link is concave in the transmission power.
Proof: The second derivative of formula (24) is non-positive, which means that the direct-link reward is a concave function. □
According to the properties of concave functions, the direct-link reward attains its maximum at the stationary point of its first derivative. Therefore, the optimal power and the corresponding reward are known once the channel state is given. However, the transmission power has practical significance only on the interval from zero to the maximum transmit power in this work. Consequently, when the stationary point lies below zero, the reward decreases monotonically on this interval and its maximum is attained at zero power; when the stationary point lies above the maximum transmit power, the reward increases monotonically on the interval and its maximum is attained at the maximum transmit power.
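This single-slot power choice on a bounded interval can be sketched numerically. The weighted reward form below (throughput gain minus grid energy cost, with weights `w1`, `w2`, bandwidth `B`, and normalized channel gain `gamma`) is an assumed illustration, not the paper's exact Equation (24):

```python
import math

def optimal_direct_power(w1, w2, B, gamma, p_max):
    """Maximize an assumed concave direct-link reward
    r(p) = w1*B*log2(1 + p*gamma) - w2*p over p in [0, p_max].

    Setting dr/dp = 0 gives the stationary point; clipping it to
    [0, p_max] reproduces the three cases in the text: interior
    optimum, monotone decrease (optimum at zero power), and
    monotone increase (optimum at p_max).
    """
    p_star = w1 * B / (w2 * math.log(2)) - 1.0 / gamma
    return min(max(p_star, 0.0), p_max)
```

For example, with w1 = w2 = B = gamma = 1 the stationary point is 1/ln 2 − 1 ≈ 0.44, an interior optimum; with a much larger energy weight w2 the reward decreases over the whole interval and zero power is optimal.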
It can be seen from Equations (16)–(26) that the values of the link selection variable and the source power are determined by the relay power, and that the maximum direct-link reward is a fixed value in each time slot that depends only on the channel state. Therefore, the only optimization variable of the simplified problem is the relay transmission power.
3.2. MDP Model for Expected Total Rewards
Our goal is to maximize the total rewards over N slots through a relay power control scheme. However, due to the limited battery capacity, the relay power selected in each slot affects the initial battery level of the next slot, so power decisions in different time slots are mutually dependent. The MDP is a useful model for handling such decision problems, and backward induction is an effective algorithm for solving it [24].
Therefore, we formulate the problem as a Markov decision process (MDP), which can also be expressed as an optimization, over the set of all feasible relay power policies, of the expected sum of per-slot rewards, where the reward of slot i is as defined above.
3.2.1. MDP Basics
A sequential decision-making process selects one of several actions in each time slot during the operation of the system [25]. If the transition of the system state obeys a known probability law and is independent of the earlier history, the sequential decision problem is called an MDP [26]. An MDP model consists of a reward function, system states, actions, state transition probabilities and an objective, each of which is described in detail below.
3.2.2. Reward Function
In an MDP model, the reward function gives the reward the system obtains when a given action is taken in state j [27]. The rule for selecting the relay power in slot i is a decision rule; the decision rules over the N slots form a strategy, and the set of all possible strategies is the policy space. Given the initial state k and a strategy, the expected total rewards can also be written as a sum, over the N slots, of the per-slot rewards weighted by the conditional probability that, under the given strategy and starting from state k, the system selects the corresponding action and occupies state j in slot i. Our aim is to find the optimal action selection scheme that maximizes this expected total reward.
3.2.3. Discretization of System States and Actions
The values of the system states and actions must be finite in an MDP model. However, in the wireless relay network, the system states include the channel and battery states, and the relay power actions take continuous values. Therefore, it is necessary to discretize the system states and the relay power actions. The relay system states consist of channel fading values and battery levels. We discretize the channel states following the method in [28]: the set of channel fading values is an equal-difference (arithmetic) sequence, and the probability of each discrete channel fading value is calculated as:
We divide the battery into M + 1 energy levels, and the real-time energy level of the battery is obtained by mapping the stored energy to a level index. The action set is likewise an equal-difference sequence. In practice, the value of the relay transmit power is constrained by the battery level, so the feasible action set in each state is truncated at a maximum index computed with the floor function, which rounds the variable x down.
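The battery-constrained action set can be sketched as follows; the level width `e_max/M` and slot length `T` are assumed notation, not the paper's symbols:

```python
def feasible_actions(power_levels, battery_level, e_max, M, T):
    """Return the discrete relay powers the current battery can support.

    Each of the M + 1 battery levels represents e_max/M joules, and an
    action p costs p*T joules in one slot, so p is feasible only when
    p*T does not exceed the energy represented by battery_level.
    """
    delta = e_max / M                    # energy width of one battery level
    stored = battery_level * delta       # energy represented by the level index
    return [p for p in power_levels if p * T <= stored]
```

For instance, with e_max = 4, M = 4, T = 1 and candidate powers [0, 1, 2, 3], a battery at level 2 supports only [0, 1, 2].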
3.2.4. State Transition Probability
After an action is selected, the system state migrates to the next state, with a transition probability composed of a channel part and a battery part. Since the channel fading values are equiprobable, the channel part of the state transition probability is uniform. For the battery part, we assume the current and next battery states occupy particular levels of the battery, which must satisfy the battery evolution equation. As mentioned in Section 2.2, the energy harvesting process obeys a Poisson distribution with mean λ. Therefore, Equation (32) can be further given as:
where the lower and upper level indices are obtained with the floor and ceiling functions, respectively; the ceiling function rounds the variable x up.
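A hedged sketch of this battery transition rule follows. Here energy arrivals are assumed to be already quantized to whole battery levels (the paper maps energies to levels with floor/ceiling operations), and overflow beyond the finite capacity is absorbed at the full level:

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def battery_transition_probs(m, a_levels, M, lam):
    """Distribution of the next battery level given current level m (0..M),
    an action consuming a_levels battery levels, and Poisson(lam)
    energy arrivals per slot. Arrivals that would overflow the finite
    battery are absorbed at the full level M.
    """
    base = m - a_levels                    # level left after spending the action's energy
    probs = [0.0] * (M + 1)
    for k in range(M - base):              # arrival counts that stay below the cap
        probs[base + k] = poisson_pmf(k, lam)
    probs[M] = 1.0 - sum(probs)            # remaining probability mass: battery saturates
    return probs
```

The rows of the resulting battery transition matrix sum to one by construction, since all overflow mass collapses onto level M.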
3.3. The Backward Induction Algorithm for MDP Problem
The backward induction algorithm is an effective way to obtain the optimal strategy and the value function in finite-horizon Markov decision problems [26]. Based on the backward induction algorithm, a stage value function is formulated recursively, as in Equation (34), with the terminal condition specified in [26]. According to Equation (34), the optimal value of the expected total rewards can be calculated by iterating the recursion backwards from the last slot, and the decision sequence obtained along the way is the optimal strategy.
With the backward induction algorithm, the number of states that must be traversed grows with the sizes of the state and action sets. The state space may be very large if some of these sets are large, and the algorithm may encounter the curse of dimensionality [29]. An effective method to reduce the computational complexity of the MDP model is proposed in [28]. Following [28], we likewise eliminate some states that need not be searched, according to the properties of the wireless relay network in our model.
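The finite-horizon recursion of Equation (34), with battery-feasible action pruning, can be sketched generically; the state and action encodings and the `reward`/`trans` callbacks are assumptions for illustration, not the paper's exact notation:

```python
def backward_induction(states, actions_of, reward, trans, N):
    """Finite-horizon backward induction (Bellman recursion).

    actions_of(s) -> feasible actions in state s (battery pruning),
    reward(s, a)  -> immediate reward, trans(s, a) -> {s_next: prob}.
    Returns the value function at slot 1 and policy[i][s] for i = 1..N.
    """
    V = {s: 0.0 for s in states}                  # terminal values beyond slot N
    policy = [None] * (N + 1)                     # index 0 unused
    for i in range(N, 0, -1):                     # sweep the slots backwards
        V_new, rule = {}, {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions_of(s):
                q = reward(s, a) + sum(p * V[s2] for s2, p in trans(s, a).items())
                if q > best_q:
                    best_a, best_q = a, q
            V_new[s], rule[s] = best_q, best_a
        V, policy[i] = V_new, rule
    return V, policy
```

In a toy two-state example where state 1 pays a reward equal to the chosen action and every transition returns to state 1, the slot-1 values accumulate over the horizon as expected.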
Proposition 2. When the other system quantities are fixed values and a given action is optimal at a given state, the same action is optimal for any state satisfying the stated ordering of the varying component.
Proof: If the optimal action for the state is as stated, then according to Equation (34), the corresponding optimality inequality holds. From Equation (21), the quantity involved becomes larger as its argument grows, so the inequality is preserved for the states covered by the proposition. Since the remaining quantities are fixed values, the corresponding relations follow, and the optimality inequality also holds at those states, which proves that the same action is optimal there. □
Proposition 3. When the other system quantities are fixed values and a given action is optimal at a given state, the same action is optimal for any state satisfying the reverse ordering of the varying component.
Proof: If the optimal action for the state is as stated, then according to Equation (34), the corresponding optimality inequality holds. Here the quantity concerned becomes smaller as its argument grows, so the inequality is preserved for the states covered by the proposition. Since the remaining quantities are fixed values, the corresponding relations follow, and the optimality inequality also holds at those states, which indicates that the same action is optimal there. □
Algorithm 1. Backward Induction Algorithm Based on States Elimination
Input: , , , , , , , , , , , , , ,
Output:
1: Initialize ,
2: While N
3: , ;
4: For m = 1 to M, = 1 to K
5: = K, = 1;
6: While ,
7: ;
8: For j = 1 to length()
9: Calculate ;
10: End For
11: If
12: , ;
13: = − 1, = + 1;
14: Else
15: If = K
16: = − 1, = 1;
17: Else
18: = + 1;
19: End If
20: End If
21: End While
22: End For
23: N = N − 1;
24: End While
4. Numerical Simulations
In this section, we run numerical simulations to analyze the total reward, grid energy consumption and throughput in two-hop wireless relay networks. In the simulations, we set B = 10 MHz, T = 1 ms, the maximum transmit powers of the source and relay to 2 W and 0.5 W, the noise power to −97.5 dBm, the reference path loss to −40 dB, and the path-loss exponent to 4 [28], together with energy parameters of 0.01 mJ and 1.6 mJ, K = 10, L = 20, and a source–destination distance of 80 m. The detailed numerical results are shown as follows.
4.1. Baseline Schemes
Joint Power Control and Link Selection Algorithm (JPLA): The JPLA only considers the current system state and calculates the maximum rewards of the relay link and the direct link, respectively. Then, the optimal access link is selected by comparing the rewards.
Power Control Algorithm (PCA): The PCA maximizes the reward in a single slot by adjusting the power of the relay; no link selection scheme is taken into account [11].
When the energy of the relay is sufficient, the system is in the ideal state. In order to compare the ideal results with our results in different situations, we introduce JPLA-F and BIABoSE-F, which are the JPLA and the BIABoSE with sufficient renewable energy.
4.2. Parameter Analysis
Figure 2 demonstrates the total rewards for different numbers of battery levels at different time slots. In any slot, the total rewards increase as the number of battery levels rises. In fact, the energy between two adjacent levels is represented by the lower level, and the interval between adjacent levels shrinks as the number of levels grows; the error between the true value and the represented value therefore becomes smaller, which yields a more accurate result. When the number of battery levels reaches 80 or 160, the rewards are close and maximal. Consequently, to reduce the computational complexity, M = 80 is used in the subsequent simulation analysis.
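The level-width argument above can be illustrated with a short sketch; the stored energy value and capacity below are illustrative, not taken from the paper:

```python
def representation_error(energy, capacity, M):
    """Error when a stored energy is represented by the lower of its two
    adjacent battery levels, with M levels spanning the capacity."""
    width = capacity / M                 # energy interval of one level
    return energy - int(energy / width) * width

# more levels -> narrower intervals -> smaller representation error
err_coarse = representation_error(0.35, 1.6, 20)   # level width 0.08
err_fine = representation_error(0.35, 1.6, 80)     # level width 0.02
```

The worst-case error equals one level width, so quadrupling the number of levels cuts the bound by a factor of four.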
We assume the source–destination distance is fixed, and Figures 3 and 4 show how the total rewards vary with the relay–destination distance. As this distance increases, the rewards of all algorithms first grow and then decrease. When it is small, the source–relay distance is large and the path loss between the source node and the relay is high; the source node then delivers few bits to the relay at high grid energy consumption, which leads to low total rewards. As the relay–destination distance increases, the path loss between the source node and the relay decreases, and the total rewards rise. Once the distance exceeds a certain threshold, the path loss between the relay and the destination becomes high, and the number of bits that can be transmitted by the relay falls below that deliverable by the source node; the total reward is then gradually reduced as the relay throughput tapers off.
In addition, the JPLA-F and BIABoSE-F achieve their maximum value near 40 m in both figures, while the PCA, JPLA and BIABoSE attain their maxima at different positions in the two figures. Unlike the JPLA-F and BIABoSE-F, the other three algorithms are affected by the energy harvesting intensity. The energy that the relay needs for data transmission grows as the distance increases; therefore, when energy is more plentiful, the total rewards come closer to the optimal result. The total rewards reach their maximum value at 40. Thus, we choose this value for subsequent simulations to better observe the improvement of system performance under energy shortage.
4.3. Total Reward Maximization
Figure 5 shows how the total rewards change over the time slots. Compared with the PCA, the JPLA adds a link selection mechanism; it can therefore transmit data through the direct link when the battery is very low, which increases the total rewards. The BIABoSE takes future system states into account, which yields a more efficient green energy allocation over the N slots than the JPLA. However, all of these algorithms can only alleviate the system performance degradation caused by insufficient energy and cannot replace the green energy supply. Therefore, the JPLA-F and the BIABoSE-F always achieve the highest total rewards.
Figure 6 displays how the total rewards vary as the energy harvesting intensity increases. When the intensity is low, the system is in a green energy-deficient state; the relay can deliver more bits as the intensity increases, which leads to a higher reward. However, the rewards become constant once the intensity reaches a certain threshold, because the battery capacity is limited. It should be noted that the rewards of the BIABoSE are lower than those of the other algorithms when green energy is plentiful, due to the discretization of states. However, the BIABoSE achieves better performance in our main application scenario, which is a lack of green energy.
4.4. Grid Energy Consumption and Throughput Trade-Off
Figure 7 shows the grid energy consumption and throughput for different values of the reward weight. When the weight is very small, the energy consumption and throughput of all schemes are similar; in this case, the system places a high price on grid energy, which strictly limits energy consumption. As the weight increases, the throughput plays an increasingly important role in the reward; although our scheme consumes slightly more energy than the JPLA and PCA, it greatly improves the throughput. When the weight is large, all schemes pursue maximum throughput regardless of the energy cost, so the throughput gains are very close. However, the BIABoSE consumes the least energy and is closest to the JPLA-F and BIABoSE-F. In addition, once the throughput constraints are given, we can find the weight value that yields the minimum grid energy consumption.
The energy consumption and throughput are shown in Figure 8. As can be seen from the graph, the BIABoSE consumes less grid energy than the JPLA when achieving the same throughput, and it can transmit more bits than the JPLA with the same grid energy supply. In short, the BIABoSE offers a better trade-off between energy consumption and throughput, coming closer to the ideal schemes JPLA-F and BIABoSE-F.