*3.2. Problem Formulation*

The forward Euler method is used in this paper to approximate the time derivatives. During each time segment Δ*t*, the motor power of the sUAV is *w*<sup>1</sup> and the climb angle is *w*2.

The following updated equations approximately model the sUAV hybrid propulsion system:

$$\begin{aligned} SOC(t\_{n+1}) &= SOC(t\_n) + \frac{dSOC}{dt}(t\_n) \Delta t, \\ M\_{FR}(t\_{n+1}) &= M\_{FR}(t\_n) + \frac{dM\_{FR}}{dt}(t\_n) \Delta t, \end{aligned}$$

where *SOC*(*tn*) and *MFR*(*tn*) are the state of charge and the mass of hydrogen remaining at *t* = *tn*.

The system is controlled by the change of the fuel cell load demand power Δ*PFC* at each discrete time instant. Thus, the fuel cell power is modeled as:

$$P\_{\rm FC}(t\_{n+1}) = P\_{\rm FC}(t\_n) + \mu \Delta P\_{\rm FC}(t\_n).$$

The motor power and climb angle are typically unknown a priori. In this paper, a Markov chain model is used to describe the evolution of *w*<sup>1</sup> and *w*<sup>2</sup> with the transition probabilities identified from the historical data. Once particular *w*<sup>1</sup> and *w*<sup>2</sup> values are encountered, a prediction of their probability distribution over the next time segment will be made using the Markov chain model.

The objective of the stochastic endurance maximization problem is to determine a control law that maximizes the time the sUAV can travel before the system states exit a prescribed set,

$$\mathcal{G} = \left\{ (\text{SOC}, M\_{FR}, P\_{\text{FC}}) : \text{SOC}\_{\text{min}} \le \text{SOC} \le \text{SOC}\_{\text{max}} \right\}$$

$$M\_{FR,\text{min}} \le M\_{FR} \le M\_{FR,\text{max}}, \quad 0 \le P\_{\text{FC}} \le P\_{\text{FC},\text{max}} \right\}. \tag{33}$$

The constraints on the *SOC* and *MFR* in Equation (33) reflect the minimum and maximum values of the battery state of charge and the mass of fuel, respectively. The constraints on *PFC* are reflective of the fact that the fuel cell load demand power cannot (i) exceed the maximum power of the fuel cell and (ii) be negative.

The optimal control policy developed in this paper through the application of DCOC specifies the change in fuel cell load power over one step, Δ*PFC*(*t*) = *PFC*(*t* + 1) − *PFC*(*t*), as a function of *SOC*(*t*), the mass of hydrogen fuel left, *MFR*(*t*), and the current fuel cell load power, *PFC*(*t*). The battery power complements fuel cell power in matching propeller requested power.

#### *3.3. Markov Chain Modeling*

A Markov chain model [39] is used to represent the evolution of *w* (in our case, *w* = [*w*1, *w*2]). The transition probabilities of the Markov chain are defined as:

$$p\_{ij} = prob\{\mathfrak{w}(t\_{n+1}) \in \mathcal{W}\_j \mid \mathfrak{w}(t\_n) \in \mathcal{W}\_i\},\tag{34}$$

where *Wi* and *Wj* (*i*, *j* = 1, ··· , *N*) are cells partitioning the feasible range of the operating conditions. The state dependence of the transition probabilities adds flexibility in reflecting the typical motor power and climb angle profiles of an sUAV.

The *pij*'s can be obtained from the statistical analysis of the historical flight data,

$$p\_{ij} = \frac{\mathcal{M}\_{ij}}{\mathcal{M}\_i} \, \tag{35}$$

where *Mij* is the total number of transitions from the cell *Wi* to the cell *Wj* (i.e., *<sup>w</sup>*(*tn*) <sup>∈</sup> *Wi*, *<sup>w</sup>*(*tn*+1) <sup>∈</sup> *Wj*), while *Mi* is the total number of transitions from *Wi* to any other cell, including *Wi* [21].

#### **4. Control Law Construction**

Here, we adopt the SDCOC framework from [16], which is applied to a discrete-time model with the following form,

$$\mathbf{x}(t\_{n+1}) = f(\mathbf{x}(t\_n), \mathbf{u}(t\_n), \mathbf{w}(t\_n)), \tag{36}$$

where *x*(*tn*) is the state vector, *u*(*tn*) is the control vector, and *w*(*tn*) is the vector of operating variables, which is not known until the time instant *tn*. The system has both control constraints and state constraints imposed as *<sup>u</sup>*(*tn*) <sup>∈</sup> *<sup>U</sup>* and {*x*(*tn*), *<sup>w</sup>*(*tn*)} ∈ *<sup>G</sup>*, respectively, where *U* and *G* are specified sets. A Markov chain with a finite number of states is used to represent transitions in *<sup>w</sup>*(*tn*) <sup>∈</sup> *<sup>W</sup>* <sup>=</sup> {*w<sup>p</sup>* : *<sup>p</sup>* <sup>∈</sup> *<sup>P</sup>*}. Here, *<sup>P</sup>* is the size of the grid for *<sup>w</sup>*. The transition probability from *<sup>w</sup>*(*tn*) = *<sup>w</sup><sup>i</sup>* <sup>∈</sup> *<sup>W</sup>* to *<sup>w</sup>*(*tn*+1) = *<sup>w</sup><sup>j</sup>* <sup>∈</sup> *<sup>W</sup>* is denoted by *pij*, expressed in (34). In a discounted variant of SDCOC, the objective is to determine a control function *u*(*x*, *w*) such that, with *u*(*tn*) = *u*(*x*(*tn*), *w*(*tn*)), a cost functional of the form,

$$J^{\mathbf{x}\_{0,\mathbf{w}\_0,\mathbf{u}}} = \mathbb{E}\_{\mathbf{x}\_{0,\mathbf{w}\_0}} \left[ \sum\_{t=0}^{\tau^{\mathbf{x}\_0,\mathbf{w}\_0,\mathbf{u}}(G) - 1} \delta^t \cdot 1 \right],\tag{37}$$

is maximized. Here, *<sup>τ</sup>x*0,*w*0,*u*(*G*) <sup>∈</sup> <sup>Z</sup><sup>+</sup> represents the first time instant when the trajectories of *<sup>x</sup>*(*tn*) and *<sup>w</sup>*(*tn*), which are denoted by {*xu*, *<sup>w</sup>u*} and result from applying the control *u*(*tn*) = *u*(*x*(*tn*), *w*(*tn*)) with values in the set *U*, exit the prescribed compact set *G*. *δ* is a discount factor [40]. For *δ* = 1, (37) maximizes the exit time, i.e., the time till the prescribed constraints become violated. The use of the discount factor 0 < *δ* < 1 facilitates faster convergence of the value iterations. Note that {*xu*, *<sup>w</sup>u*} is a random process, *<sup>τ</sup>x*0,*w*0,*u*(*G*) is a random variable, and <sup>E</sup>*x*0,*w*<sup>0</sup> [·] denotes the conditional expectation given the initial values of *x* and *w*.

To solve (37), the value iterations approach is used, which produces a sequence of value function approximations, *Vn*, at specified grid-points *<sup>x</sup>* ∈ {*x<sup>k</sup>* : *<sup>k</sup>* <sup>∈</sup> *<sup>K</sup>*},

$$\begin{aligned} V\_0(\mathbf{x}^k, \mathbf{w}^i) &\equiv 0, \\ V\_n(\mathbf{x}^k, \mathbf{w}^i) &= \max\_{\mathbf{u}^m, \mathbf{w} \in M} \left\{ \sum\_{j \in \mathcal{J}} F\_{n-1}(f(\mathbf{x}^k, \mathbf{u}^m, \mathbf{w}^i), \mathbf{w}^j) \cdot p\_{ij} \cdot \delta^t + 1 \right\}, \end{aligned}$$

where *<sup>u</sup>* ∈ {*u<sup>m</sup>* : *<sup>m</sup>* <sup>∈</sup> *<sup>M</sup>*} is a specified grid for *<sup>u</sup>*. Here, *<sup>K</sup>* and *<sup>M</sup>* are the size of the grid for *<sup>x</sup>* and *<sup>u</sup>*, respectively. In each iteration, once the values of *Vn*−<sup>1</sup> at the grid-points have been determined, linear or cubic interpolation is employed to approximate *Vn*−1(*f*(*xk*, *<sup>u</sup>m*, *<sup>w</sup><sup>i</sup>* ), *w<sup>j</sup>* ) as *Fn*−1(*x*, *<sup>w</sup><sup>i</sup>* ) = Interpolate[*Vn*−1](*x*, *<sup>w</sup><sup>i</sup>* ), if (*x*, *w<sup>i</sup>* ) ∈ *G*, and *Fn*−1(*x*, *<sup>w</sup><sup>i</sup>* ) = 0, if (*x*, *w<sup>i</sup>* ) <sup>∈</sup>/ *<sup>G</sup>*. A termination criterion of the form <sup>|</sup>*Vn*(*x*, *<sup>w</sup><sup>i</sup>* ) − *Vn*−1(*x*, *<sup>w</sup><sup>i</sup>* )| ≤ for all *<sup>x</sup>* ∈ {*x<sup>k</sup>* : *<sup>k</sup>* <sup>∈</sup> *<sup>K</sup>*} and *<sup>i</sup>* <sup>∈</sup> *<sup>P</sup>*, where <sup>&</sup>gt; 0 is sufficiently small, is used.

Once an approximation of the value function, *V*∗, is available, an optimal control law is determined as:

$$\mu\_\*(\mathbf{x}, \mathfrak{w}^i) \in \left\{ \mathfrak{u} \,:\, V\_\*\left(\mathbf{x}, \mathfrak{w}^i\right) - \sum\_{j \in J} V\_\*\left(f\left(\mathbf{x}, \mathfrak{u}, \mathfrak{w}^i\right), \mathfrak{w}^j\right) \cdot p\_{ij} \cdot \delta - 1 \le \epsilon \right\}.$$

#### **5. Control Law Computations and Results**

*5.1. sUAV Configuration and Model Parameters*

The model was parameterized for a 1.5 kg sUAV [23] that can be used for aerial photography and environmental monitoring applications. The minimum and maximum *SOC* values were set to *SOCmin* = 0.2 and *SOCmax* = 0.8. The minimum and maximum values of *MFR* were set as *MFRmin* = 2 g and *MFRmax* = 9 g. For the value iterations, the *SOC* grid was chosen with a step size of 0.05, and the *MFR* grid was chosen with a step size of 0.5 g. The grid for the control variable *u* was set as {−1, 0, 1}.

The transition probabilities for the operating variables (motor power and climb angle) were obtained from the time histories of the sUAV motor power and climb angle using (35) and assuming a time step Δ*t* = 1 s. These time histories were based on a scenario in which an sUAV follows a moving ground vehicle that sUAV operators are interested in monitoring. In this scenario, the ground vehicle, and consequently the sUAV, is assumed to be traveling with the velocity profile defined by concatenating the EPA Highway Cycle [41] nine times. For the sUAV, the speed profile is modified to remain above the stall speed while avoiding extreme acceleration values.

The climb angle time history, shown in Figure 3, was obtained from the Google Earth elevation profile for a path from Monroe, West Virginia, to Princeton, West Virginia, with the help of GPS visualizing software [42]. See [43] for the assessment of the accuracy of such extracted profiles.

**Figure 3.** Time histories of the sUAV climb angle.

Figure 4 provides the time histories of the sUAV motor power calculated based on the equations in Section 2.3. The trajectories in Figures 3 and 4 were used to compute the transition probabilities.

**Figure 4.** Time histories of the sUAV motor power.

#### *5.2. Control Law Computation*

Cython was used for control law computations as it is more efficient than MATLAB in handling nested for loops and two-dimensional interpolation. In our numerical experiments with dynamic programming, Cython was about 10 times faster than MATLAB.

To further speed up the value iterations, a discount factor was introduced. When testing the effect of the discount factor on the optimal policy, a zero climb angle (*γ* = 0) was assumed, which means that the only operating variable was the motor power. Table 2 shows the average exit time based on 100 random simulations for discount factors from 0.91 to 0.99. The stopping criterion was chosen with = 10−<sup>10</sup> for all *δ*. Computations were performed on a Hasee K780G-i7 laptop with a CORE i7-4710MQ (2.5–3.5 GHz) processor and 24 GB of RAM.Note that the number of iterations and the computing time decrease as the discount factor decreases, but so does the exit time. The discount factor *δ* = 0.95 was ultimately chosen as a compromise between value iteration convergence speed and solution accuracy. Figure 5 shows that the value iterations with a discount factor of *δ* = 0.95 converge much faster than those with *δ* = 1.


**Table 2.** Average exit time for different discount factor.

**Figure 5.** The effect of the discount factor in the value iteration approach.

### *5.3. Endurance Maximization Results*

We used = 10−<sup>10</sup> in the stopping criterion for the value iterations. Figure 6 illustrates the resulting control policy. Note that when *SOC* is low, the control policy calls for an increase in *PFC* to charge the battery. This is reasonable given that the fuel cell cannot alone respond rapidly to fast changes in motor power request, and hence, the battery has to be charged to do so.

**Figure 6.** A cross-section of the control policy in the endurance maximization problem for *PM* = 132.52 W and with (**a**) *γ* = 0 deg, *PFC* = 0 W, (**b**) *γ* = 0 deg, *PFC* = 302.1 W, (**c**) *γ* = 20 deg, *PFC* = 0 W, and (**d**) *γ* = 20 deg, *PFC* = 302.1 W.

The simulation results are given for three cases in Figures 7–18. The first case (Scenario I) corresponds to a higher initial *SOC*, and the second case (Scenario II) considers a lower initial *SOC*. The third scenario is for a mid-range initial *SOC* and is used to confirm the *SOC* behavior observed in the first two scenarios. In all cases, the initial fuel mass and initial fuel cell power are the same: *MFR*,0 = 6 g and *PFC*,0 = 0 W. The dashed lines in Figures 8, 12, and 16 indicate the constraints mentioned in Section 5.1. The spikes of power in Figures 7, 11, and 15 correspond to the time instants when the sUAV starts to accelerate while the positive and negative spikes of climb angle represent the time when the sUAV starts to climb or descend.

**Figure 7.** sUAV *PM* and *γ* versus time, Simulation Scenario I.

**Figure 8.** *SOC* and remaining *MFR* versus time, Simulation Scenario I; dashed lines show constraints.

**Figure 9.** Fuel cell load demand power and split fraction versus time, Simulation Scenario I. The dashed and dashed-dotted lines in the top sub-figure indicate the maximum *PFC* with |*γ*| = 0 deg and |*γ*| = 20 deg, respectively.

**Figure 10.** Battery power, Simulation Scenario I.

**Figure 11.** sUAV *PM* and *γ* versus time, Simulation Scenario II.

**Figure 12.** *SOC* and remaining *MFR* versus time, Simulation Scenario II; dashed lines show constraints.

**Figure 13.** Fuel cell load demand power and split fraction versus time, Simulation Scenario II. The dashed and dashed-dotted lines in the top figure indicate the maximum *PFC* with |*γ*| = 0 deg and |*γ*| = 20 deg, respectively.

**Figure 14.** Battery power, Simulation Scenario II.

**Figure 15.** sUAV *PM* and *γ* versus time, Simulation Scenario III.

**Figure 16.** *SOC* and remaining *MFR* versus time, Simulation Scenario III; dashed lines show constraints.

**Figure 17.** Fuel cell load demand power and split fraction versus time, Simulation Scenario III. The dashed and dashed-dotted lines in the top figure indicate the maximum *PFC* with |*γ*| = 0 deg and |*γ*| = 20 deg, respectively.

**Figure 18.** Battery power, Simulation Scenario III.

Figures 7–10 illustrate the closed-loop response for the first simulation scenario. The initial *SOC* is 0.8, and it decreases rapidly until it reaches a value of about 0.5. Then, it stays near this value between 2000 and 5000 s. Finally, when the mass of hydrogen reaches a relatively low value, the *SOC* starts to decrease and continues to decrease until the constraints are violated. The fuel cell load demand power keeps a relatively low value during the whole flight, and the mean value of the split fraction is negative during the 2000 to 5000 s time interval, which is the period when the *SOC* is kept at about 0.5.

Figures 11–13 illustrate the closed-loop response for the second simulation scenario. The initial *SOC* is 0.2. The battery is charged until it reaches a value of about 0.5 to enable the battery to sustain rapid propeller power fluctuations. Then, the *SOC* stays near that value of 0.5 between 500 and 1500 s. Finally, when the mass of hydrogen reaches a relatively low value, the *SOC* starts to decrease and continues to decrease until the constraints are violated. The fuel cell load demand power increases rapidly at first to charge the battery, then it keeps a relatively low value during the rest of the flight. The mean value of the split fraction is negative from the beginning to about 1500 s, which is the period when the battery is charged from *SOC* = 0.2 to about 0.5.

According to the results from Scenarios I and II, a turnpike behavior of the battery *SOC* is observed, with the *SOC* converging to about 0.5 and staying at that value for a while before decaying. To confirm this turnpike behavior, we additionally considered the responses with the developed policy to the initial SOC of 0.6. These are shown in Figures 15–18. The exit times for Scenarios I, II, and III were, respectively, 9890 s, 6019 s, and 8478 s.

#### **6. Conclusions**

This paper considers an endurance maximization problem for a small unmanned aerial vehicle (sUAV) with a hybrid propulsion system consisting of a polymer electrolyte fuel cell and a battery, both driving an electric motor connected to a propeller. A stochastic drift counteraction optimal control (SDCOC) approach is employed to develop control policies for optimally coordinating the fuel cell and the battery while enforcing the constraints on the fuel cell power output rate of change. Cython is used to implement value iterations and demonstrated an order of magnitude speedup versus MATLAB without increasing the code complexity, due to its efficiency in handling nested for-loops. Additionally, the use of a discount factor is shown to significantly speed up the value iterations at the price of decreased performance. The results illustrate the effectiveness of the SDCOC strategy in regulating the charging behavior of the battery by the fuel cell to provide the capability to respond to rapidly varying motor power demand.

The proposed approach based on SDCOC is particularly suitable for handling stochastic disturbances and can be applied to sUAVs exposed to headwinds with the headwind modeled as a stochastic disturbance. Accounting for such wind disturbances, extensions to include thermal dynamics, systematic and comprehensive comparison with other energy management approaches and propulsion system choices, the systematic study of the robustness to model uncertainties, as well as actual flight experiments represent directions for continuing research. In particular, our study of the discount factor impact on the computation time and exit time suggests that the flight time is sensitive to the choice of the energy management strategy; our approach based on SDCOC is optimal in the sense of maximizing expected flight time within a stochastically modeled environment.

**Author Contributions:** Conceptualization, J.Z., I.K., and M.R.A.; methodology, J.Z. and I.K.; software, J.Z. and I.K.; validation, J.Z. and I.K.; formal analysis, J.Z., I.K., and M.R.A.; investigation, J.Z., I.K., and M.R.A.; resources, J.Z., I.K., and M.R.A.; data curation, J.Z., I.K., and M.R.A.; writing—original draft preparation, J.Z., I.K., and M.R.A.; writing—review and editing, I.K. and M.R.A.; visualization, J.Z.; supervision, I.K.; project administration, I.K. All authors read and agreed to the published version of the manuscript.

**Funding:** This research was supported in part by the U.S. National Science Foundation (NSF) under Award Number ECCS-1931738.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

