Article

Safety-Aware Optimal Attitude Pointing for Low-Thrust Satellites

Helen Henninger 1,*, James Biggs 2 and Karl von Ellenrieder 1
1 Facoltà di Scienze e Tecnologie, Free University of Bozen-Bolzano, Piazza Università 5, 39100 Bolzano, Italy
2 Dipartimento di Scienze e Tecnologie Aerospaziali, Politecnico di Milano, Via La Masa 34, 20156 Milano, Italy
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(7), 3002; https://doi.org/10.3390/app11073002
Submission received: 24 February 2021 / Revised: 22 March 2021 / Accepted: 24 March 2021 / Published: 27 March 2021
(This article belongs to the Special Issue Advances in Aerial, Space, and Underwater Robotics)

Abstract: In geostationary orbit, long eclipses and the seasonal variations in the direction and intensity of the solar input can damage sensitive equipment during attitude maneuvers, which may inadvertently point the equipment towards the Sun. The requirement that transmitting and receiving antennae remain pointed towards the Earth further restricts the admissible pointing directions. The aim of this study is to construct a novel geometric and reinforcement-learning-based method to determine attitude guidance maneuvers that keep the equipment in safe and operational orientations throughout an attitude maneuver. The attitude trajectory is computed numerically using the geometric framing of Pontryagin's maximum principle applied to the vehicle kinematics, using the global matrix Lie group representation on $SO(3)$, and the angular velocities are shaped using free parameters. The values of these free parameters are determined by a reinforcement learning algorithm so as to avoid the forbidden areas while keeping the pointing in the operational areas (modeled as subsets of the two-sphere of all possible pointing directions of a particular axis). The method is applied to a model geosynchronous satellite and demonstrated in simulation.

1. Introduction

Earth-pointing satellites are sometimes required to perform attitude maneuvers from one direction in space to another in such a way that, en route, payloads such as telescopes do not see bright objects such as the Sun or Moon, while communication with the ground is maintained via the beam pencil of a communications antenna. To respect these pointing constraints, we require the satellite orientation to remain within a particular pointing set. The pointing set is a subset of the satellite's configuration set that guarantees safety (in the case of bright-spot avoidance) or achieves a mission outcome (in the case of the pointing of communications antennae). The parallel concept of a safe set in robot guidance traditionally refers solely to the robot's $[x, y, z]$ position, but in this paper, we extend this idea to orientation.
Several approaches to the overall constrained attitude motion planning problem have been developed. For instance, Reference [1] developed attitude commands from a geometrical perspective, defining exclusion (or keep-out) zones on a unit sphere and determining ideal tangential paths around these zones. In [2], the problem was approached by applying the potential function method: artificial potential functions guide the satellite during the attitude maneuver and avoid the violation of pointing constraints by overlaying regions of high potential around the forbidden regions. In [3], the unit sphere was discretized into a graph, and an admissible path between attitude keep-out zones was found with the A* pathfinding algorithm. Randomized path planning algorithms were used by [4]: solution paths were chosen at random, a tree of possible paths was evaluated towards the target direction, and the lowest-cost admissible path was chosen. In contrast to most other methods in the literature, Reference [4] treated the problem directly on the special orthogonal group $SO(3)$. Using $SO(3)$ to represent the spacecraft's rotation allows pointing constraints to be imposed on multiple body-fixed directions. Biggs [5] found reference motions for reorienting a spinning satellite; the path-planning problem with pointing constraints was addressed by optimizing an analytically defined cost function containing free parameters. The analytical expression was given explicitly on the special orthogonal group $SO(3)$ (instead of employing a local parameterization such as Euler angles). However, no explicit way of determining the parameters so as to respect the pointing constraints was presented.
One way of determining the parameters required to preserve the pointing set is to "teach" the satellite's guidance system, via a learning algorithm, to respect the boundaries of the pointing set. In reinforcement learning (RL), an agent (here, the vehicle) maps observations (states) to actions via a policy function, generally expressed as a neural network, so as to maximize its reward, a scalar feedback signal. The goal of RL is to determine actions that generate a desired system behavior.
The application of RL to obstacle avoidance is not new: the study [6] used an artificial potential field to design the positive rewards of the RL algorithm, and the rewards were modified during training according to the different requirements of the task in order to obtain different paths. A control strategy with learning capabilities for unknown environments was demonstrated in [7] using RL with only sparse reward information; a discrete coding of the input space using a neural network structure was presented, which enabled faster and more efficient convergence of the RL process. Preservation of a safe set is the fundamental concept of motion planning, which avoids catastrophic outcomes by keeping the motion within constraints defined by safety; collision avoidance is the most commonly applied example of this kind of motion planning. A method for generating plan-like, reflexive obstacle-avoidance behavior in a mobile robot was described in [8]; the paper showed experiments using a simulated vehicle with a primitive range sensor which, as it explored its surroundings, adapted its responses to sensory stimuli so as to minimize the negative reinforcement arising from collisions. The problem of a mobile robot learning to navigate an unknown environment while avoiding collisions was considered in [9], using an uncertainty-aware, model-based learning algorithm that estimates the probability of collision together with a statistical estimate of the uncertainty; in order to learn collision avoidance, the robot must experience collisions at training time, and the vehicle then moves at higher speed in settings where it has high confidence. In [10], a tailored design of state and action spaces and a dueling deep Q-network (a deep RL network for autonomous navigation and obstacle avoidance) were proposed for unmanned surface vehicles. In [11], a method was given for enabling a quadrotor equipped with a monocular camera to autonomously avoid collisions with obstacles in unstructured environments; the authors proposed a deep RL-based method for obstacle avoidance that uses recurrent neural networks with temporal attention and provides better results than prior works in terms of distance covered without collisions. The paper [12] focused on the application of RL to obstacle avoidance in dynamic environments for the control of a mobile robot; an intelligent controller was proposed by integrating RL with a behavior-based control architecture and applied to obstacle avoidance. Finally, [13] considered the problem that many autonomous systems are designed with a strong reliance on black-box predictions from deep neural networks, which tend to be overconfident on unseen data; robustness to this is clearly important for safety-critical applications such as collision avoidance, and the paper used an RL framework to form uncertainty-aware navigation around pedestrians.
Our aim in this paper is to use RL in the planning layer of the architecture to produce a trajectory between two orientations that keeps the spacecraft in the pointing set throughout the maneuver. The approach combines and extends that described in [5]: we describe the system kinematics on $SO(3)$ and then formulate the problem as an optimal kinematic control problem on a matrix Lie group. An application of Pontryagin's maximum principle (PMP) gives the necessary conditions for optimality in the form of extremal equations, which, as in [5], contain a number of unknown parameters. RL is applied to adapt the shape of the curve to avoid a forbidden region, by selecting the parameter values such that the path remains in the pointing set. These amended parameter values are then used to construct a new trajectory that keeps the satellite in the pointing set.

2. Methods

We considered the case of a satellite in geostationary orbit equipped with a communications antenna with a relatively small beam pencil that needed to be pointed towards the communications ground station at all times and a sensitive on-board telescope that needed to avoid a bright celestial object. Expressing the attitude kinematics in terms of $R(t) \in SO(3)$, we have:
$$\frac{dR(t)}{dt} = R(t)\left(\omega_1(t) A_1 + \omega_2(t) A_2 + \omega_3(t) A_3\right)$$
where $R(t) \in SO(3)$, and:
$$A_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
is the standard basis of the Lie algebra $\mathfrak{so}(3)$. We developed a three-step method that solves the attitude guidance problem of determining the desired $\omega_1, \omega_2, \omega_3$ and solving Equation (1) numerically such that the boundary conditions:
$$R(0) = R_0, \quad R(t_f) = R_D$$
(where $R_0$ is the prescribed initial attitude and $R_D$ the desired final attitude) are satisfied.
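To make the propagation of Equation (1) concrete, the following sketch integrates the kinematics for a given angular-velocity profile. It is illustrative only, not code from the paper: it assumes NumPy, a fixed-step RK4 scheme, and function names of our choosing, and it uses the basis matrices above with their skew-symmetric signs.

```python
import numpy as np

# Basis of so(3) in the ordering used above.
A1 = np.array([[0., 0., 1.], [0., 0., 0.], [-1., 0., 0.]])
A2 = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
A3 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])

def attitude_rhs(R, w):
    """Right-hand side of Equation (1): dR/dt = R (w1 A1 + w2 A2 + w3 A3)."""
    return R @ (w[0] * A1 + w[1] * A2 + w[2] * A3)

def propagate(R0, omega_fn, tf, n_steps=1000):
    """Fixed-step RK4 integration of the attitude kinematics."""
    R, dt = R0.copy(), tf / n_steps
    for k in range(n_steps):
        t = k * dt
        k1 = attitude_rhs(R, omega_fn(t))
        k2 = attitude_rhs(R + 0.5 * dt * k1, omega_fn(t + 0.5 * dt))
        k3 = attitude_rhs(R + 0.5 * dt * k2, omega_fn(t + 0.5 * dt))
        k4 = attitude_rhs(R + dt * k3, omega_fn(t + dt))
        R = R + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return R

# Example: constant slow spin about the third body axis.
R_final = propagate(np.eye(3), lambda t: np.array([0.0, 0.0, 0.1]), tf=1.0)
```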

2.1. Bright Spot Avoidance

Figure 1 shows the local vertical and local horizontal (LVLH) reference frame, which has its origin at the satellite center of mass and orthogonal axes (denoted by $X, Y, Z$), with the Z-axis directed from the origin of the frame towards the Earth center, the X-axis in the orbital plane in the direction of orbital motion and perpendicular to the Z-axis, and the Y-axis completing the frame by the right-hand rule. A second, body-fixed set of axes (denoted by $x, y, z$), also centered at the satellite center of mass, moves with the body, and the spacecraft attitude is then defined as the relative rotation from the LVLH coordinates to the body frame.
The line-of-sight unit vector $\ell_0$ is along the initial direction of the boresight, and $\ell_f$ denotes its final desired direction. The figure depicts a bright object to be avoided by the telescope. The centroid of the bright spot, $s$ (from the Sun), is specified by an azimuth angle $\alpha^*$ in the slew plane containing $\ell_0$ and $\ell_f$ and an elevation angle $\epsilon^*$ that measures its angle from the Y-axis. Although the Sun is moving with respect to the Earth-centered frame, we may assume for our purposes that, since the maneuver time is significantly shorter than a solar day, the Sun's position is fixed in this frame. The celestial object is avoided by a minimum specified angle denoted $\epsilon_{spec}$; that is, the dot product of the Sun's position vector $s$ with the boresight axis direction $a$ must fall within the set:
$$S_1 = \{\, v \in \mathbb{R}^3 \;|\; s \cdot v \leq \sin \epsilon_{spec},\ \|v\| = 1 \,\}.$$
This set describes a region bounded by a cone in $\mathbb{R}^3$, and the intersection of this cone with the unit sphere describes the boundary of the forbidden region of orientations of the boresight axis. In this paper, the boresight axis is taken along $a = [2, 2, 1]$ (normalized to unit length), and the trajectory traced by the boresight is then given by:
$$a(t) = R(t)\, a.$$
This trajectory $a(t)$ must lie within the set $S_1$ at all times.

2.2. Communications Pointing Maintenance

In addition to the telescope of the previous section, we considered that the spacecraft has a communications antenna with a beam pencil centered on the body z-axis with pitch $\phi$. We wished to ensure that the beam pencil does not pass out of contact with a ground station directly at the nadir; i.e., the z-axis of the body-fixed frame must remain within $\sin 2\phi$ of the Z-axis of the LVLH frame. This criterion requires that the dot product of the antenna axis fall within the set defined by the inequality:
$$S_2 = \{\, v \in \mathbb{R}^3 \;|\; [0, 0, 1] \cdot v \geq \sin 2\phi,\ \|v\| = 1 \,\}.$$
The boundary of this region is a cone in $\mathbb{R}^3$, and the intersection of this cone with the unit sphere describes the boundary of the region in which the antenna axis must remain. In this paper, the antenna axis is denoted by $b = [0, 0, 1]$, and the trajectory traced by the antenna is then given by:
$$b(t) = R(t)\, b.$$
This trajectory $b(t)$ must lie within the set $S_2$ at all times.
In what follows, the maintenance of the pointing set means that:
$$a(t) \in S_1 \quad \text{and} \quad b(t) \in S_2$$
for all $t \in [0, 1]$.
The boundaries of these two sets are shown on the sphere in Figure 2.
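As an illustration, membership in the pointing set can be checked pointwise along a candidate trajectory. The sketch below is ours, not the paper's implementation; it assumes NumPy, and the names in_S1, in_S2, and pointing_set_ok are hypothetical.

```python
import numpy as np

def in_S1(v, s, eps_spec):
    """Bright-spot safety: s . v <= sin(eps_spec), with v a unit vector."""
    return np.dot(s, v) <= np.sin(eps_spec)

def in_S2(v, phi):
    """Antenna pointing: [0, 0, 1] . v >= sin(2 phi), with v a unit vector."""
    return v[2] >= np.sin(2.0 * phi)

def pointing_set_ok(R, a, b, s, eps_spec, phi):
    """Condition (8): a(t) = R a must lie in S1 and b(t) = R b in S2."""
    return in_S1(R @ a, s, eps_spec) and in_S2(R @ b, phi)
```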

2.3. Trajectory Generation

We determined the solution $R(t) \in SO(3)$ of the left-invariant differential system (1) that satisfies the boundary conditions:
$$R(0) = R_0, \quad R(T_f) = R_D$$
while minimizing the expression:
$$J = \frac{1}{2} \int_0^{T_f} \left( c_1 \omega_1^2 + c_2 \omega_2^2 + c_3 \omega_3^2 \right) dt.$$
This cost $J$ in Equation (10) allows one to define a large class of trajectories, where the weights of the cost function can be chosen to alter the shape of the path between two prescribed boundary configurations. A standard geometric control theory approach was used to generate these trajectories minimizing the cost $J$; for context and for the nomenclature used in this section, the reader (in particular, any reader unfamiliar with geometric control theory) is directed to Appendix A.
As stated in Appendix A, to determine the optimal trajectory minimizing the cost $J$, the kinematic equations (Equation (1)) and the extremal equations (Equation (A2)) must be integrated simultaneously over the time interval $[0, T_f]$. To do this, we used numerical integration in the following way: setting:
$$P_1 = M_1, \quad P_2 = M_2, \quad P_3 = M_3$$
then by the PMP, the optimal values of the velocities are given by:
$$\omega_1 = P_1 / c_1, \quad \omega_2 = P_2 / c_2, \quad \omega_3 = P_3 / c_3$$
and for $R \in SO(3)$, the solution of the kinematic equation, we may assign:
$$\begin{pmatrix} P_4 & P_5 & P_6 \\ P_7 & P_8 & P_9 \\ P_{10} & P_{11} & P_{12} \end{pmatrix} = \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix}.$$
Using the fact that from (1):
$$\dot{R} = R\left(\omega_1 A_1 + \omega_2 A_2 + \omega_3 A_3\right),$$
(where $R(t)$ denotes the desired trajectory), we may express the kinematic and extremal equations together in vector form:
$$\frac{dP_1}{dt} = -\frac{P_2 P_3}{c_2} + \frac{P_2 P_3}{c_3}, \quad \frac{dP_2}{dt} = \frac{P_1 P_3}{c_1} - \frac{P_1 P_3}{c_3}, \quad \frac{dP_3}{dt} = -\frac{P_1 P_2}{c_1} + \frac{P_1 P_2}{c_2},$$
with the components $dP_4/dt, \ldots, dP_{12}/dt$ given entrywise by the kinematics (14) applied to the matrix assembled from $P_4, \ldots, P_{12}$.
The initial value is given by:
$$P_0 = [\lambda_0, R_{0_{11}}, R_{0_{12}}, \ldots, R_{0_{33}}]^T$$
where $R_{0_{ij}}$ are the $i,j$ components of the matrix $R_0$ in Equation (9), and $\lambda_0 = [M_1^0, M_2^0, M_3^0]$ is the unknown initial vector for the extremal equations. Integrating (15) starting from $P_0$ with an initial guess for $\lambda_0$ and rearranging the solution $P_4, \ldots, P_{12}$ using the function:
$$\Phi : \mathbb{R}^9 \to \mathbb{R}^{3 \times 3}, \quad \Phi([x_1, x_2, \ldots, x_9]) = \begin{pmatrix} x_1 & x_2 & x_3 \\ x_4 & x_5 & x_6 \\ x_7 & x_8 & x_9 \end{pmatrix}$$
we then obtained $\Phi([P_4(t), \ldots, P_{12}(t)]) = R(t)$, the matrix that gives the optimal solution in the configuration space.
Note: we are aware that, for integrating Equation (1) developed on a matrix Lie group, symplectic integration is preferred over a direct numerical integration of (15). However, we calculated the deviation from the Lie group over the time interval $[0, T_f]$ used in the simulations presented in Section 3 by evaluating the norm of the difference between $R(t) R(t)^T$ and the identity matrix along the whole trajectory, to determine whether the structure of $SO(3)$ was preserved. Figure 3 shows that the difference is of the order of magnitude of $10^{-13}$, and Figures 4 and 5 also strongly suggest this result, since the curves remain on the surface of the sphere throughout the maneuver.
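The coupled system (15) can be integrated with an off-the-shelf ODE solver. The following sketch is our illustration (assuming NumPy/SciPy; the function names are ours, and the attitude block of the state is flattened row-wise, matching the assignment of $P_4, \ldots, P_{12}$ above).

```python
import numpy as np
from scipy.integrate import solve_ivp

A1 = np.array([[0., 0., 1.], [0., 0., 0.], [-1., 0., 0.]])
A2 = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
A3 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])

def rhs(t, P, c):
    """P = [P1, P2, P3, R11, ..., R33]: extremal equations plus kinematics."""
    P1, P2, P3 = P[:3]
    R = P[3:].reshape(3, 3)                        # row-wise flattening
    dP1 = P2 * P3 * (1.0 / c[2] - 1.0 / c[1])      # extremal (Euler) equations
    dP2 = P1 * P3 * (1.0 / c[0] - 1.0 / c[2])
    dP3 = P1 * P2 * (1.0 / c[1] - 1.0 / c[0])
    w = np.array([P1 / c[0], P2 / c[1], P3 / c[2]])  # optimal velocities
    dR = R @ (w[0] * A1 + w[1] * A2 + w[2] * A3)     # kinematics (14)
    return np.concatenate([[dP1, dP2, dP3], dR.ravel()])

def solve_extremal(lambda0, R0, c, tf=1.0):
    """Integrate (15) from P0 = [lambda0, R0] over [0, tf]."""
    P0 = np.concatenate([lambda0, R0.ravel()])
    return solve_ivp(rhs, (0.0, tf), P0, args=(c,), rtol=1e-10, atol=1e-12)

# Orthogonality drift along the trajectory (cf. Figure 3):
# drift(t) = || R(t) R(t)^T - I ||_F should stay near machine precision.
```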
The following steps were then used to solve the optimal control problem of moving the vehicle from the initial to the final configuration:
  • Step 1: Determine the solution $R(1, \lambda_0, c_i^0)$ of (1) that minimizes the cost (10) by integrating the equations using (15), in terms of a chosen value $c_i^0 = [1, 1, 1]$ and the unknown boundary conditions $\lambda_0$, on the (imaginary) time domain $[0, 1]$.
  • Step 2: Construct the "error matrix" $\tilde{R}(1, \lambda_0, c_i^0) = R(1, \lambda_0, c_i^0) - R_D$ and the function:
    $$S(\lambda_0) = \| \tilde{R}(1, \lambda_0, c_i^0) \|$$
    for the fixed choice of $c_i^0$.
  • Step 3: Find $\lambda_0$ that minimizes $S(\lambda_0)$ for the particular choice of $c_i^0$, $T_f = 1$, using a local minimizer.
After Step 3 was completed, an initial trajectory was determined in terms of the particular choice of variables $c_i^0$. We then applied RL to allow the vehicle to reactively choose a new optimal path that avoids the obstacles by adapting the $c_i$ values away from the chosen initial values $c_i^0$. The initial choice of one for each $c_i$ value corresponds, in the case that energy is being evaluated, to equal inertia values about each axis.
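Steps 1–3 amount to a shooting method: guess the initial extremal vector $\lambda_0$, integrate (15), and drive the terminal error $S(\lambda_0)$ to zero with a local minimizer. A minimal sketch follows, building on the solve_extremal sketch above; the Nelder–Mead choice, the starting guess, and the example boundary attitudes are our assumptions, not choices stated in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def shooting_error(lambda0, R0, RD, c):
    """S(lambda0) = || R(1; lambda0, c) - R_D ||_F  (Step 2)."""
    sol = solve_extremal(lambda0, R0, c, tf=1.0)
    R1 = sol.y[3:, -1].reshape(3, 3)
    return np.linalg.norm(R1 - RD, ord='fro')

c0 = [1.0, 1.0, 1.0]                     # initial weights from Step 1
R0 = np.eye(3)                           # example boundary attitudes:
RD = solve_extremal(np.array([0.3, 0.2, 0.1]), R0, c0).y[3:, -1].reshape(3, 3)

# Step 3: minimize S over the unknown initial extremal vector lambda0.
res = minimize(shooting_error, x0=np.array([0.1, 0.1, 0.1]),
               args=(R0, RD, c0), method='Nelder-Mead',
               options={'xatol': 1e-10, 'fatol': 1e-12})
lambda0_star = res.x                     # should recover ~[0.3, 0.2, 0.1]
```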

2.4. Reinforcement Learning

In this subsection, we specify the observation and reward functions that we used in conjunction with MATLAB's Reinforcement Learning Toolbox. In this software, the policy that selects the best action to take is represented as a neural network. In our case, the action is an amendment of the values $c_1^0, c_2^0, c_3^0$: the action output is added to the $c_i^0$ values selected in Step 1 in order to choose a new path such that the obstacle is avoided.
The observation sent to the RL block consists of detecting the boundaries of the sets $S_1$ and $S_2$, and the reward function $r(t)$ is set up to ensure that the condition in (8) is maintained. In particular, when the antenna intersects the boundary of the pointing set $S_2$, the condition (denoting $b = [x_b, y_b, z_b]$) is expressed by a logical value:
$$obs_2(t) = \begin{cases} 1 & \text{if } x_b^2 + y_b^2 \geq r_2^2 \text{ and } z_b \geq 0.791 \\ 0 & \text{otherwise} \end{cases}$$
where $r_2 = \sin(2\phi)$ and $\phi = 0.175$. Since the antenna axis is taken as $[0, 0, 1]$, the boundary of the region is a circle lying in a plane parallel to the x-y plane, and the value $z_b = 0.791$ at which this plane intersects the z-axis arises from choosing $\phi = 0.174$. The coordinates of the boundary projected onto the plane are given by $X_b = \sin(2\phi) \cos(\theta)$, $Y_b = \sin(2\phi) \sin(\theta)$ (for $\theta \in [0, 2\pi]$), and $z_b$ can be calculated from the inverse stereographic projection through the south pole,
$$z_b = \frac{1 - X_b^2 - Y_b^2}{1 + X_b^2 + Y_b^2}.$$
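As a quick numerical check (ours, not from the paper), parameterizing the boundary circle and applying the inverse stereographic map reproduces the threshold $z_b \approx 0.791$ quoted above:

```python
import numpy as np

phi = 0.174                                  # value quoted for the boundary height
theta = np.linspace(0.0, 2.0 * np.pi, 100)
Xb = np.sin(2 * phi) * np.cos(theta)         # boundary circle in the plane
Yb = np.sin(2 * phi) * np.sin(theta)
zb = (1 - Xb**2 - Yb**2) / (1 + Xb**2 + Yb**2)   # inverse stereographic map
print(zb[0])                                 # ~0.7916, the threshold in obs_2
```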
To determine the boundary of the set $S_1$, we used the stereographic projection from sphere to plane,
$$[X, Y] = \left[ \frac{x}{1 - z}, \frac{y}{1 - z} \right]$$
to map the vector $a = [x_a, y_a, z_a]$ to the plane, where it has the form $[X_a, Y_a]$, and the Sun vector $s = [x_s, y_s, z_s]$ to the plane, where it has the form $[X_s, Y_s]$, and we calculated the value:
$$A = (X_a - X_s)^2 + (Y_a - Y_s)^2.$$
The intersection of $a$ with the boundary of the set $S_1$ is then described by:
$$obs_1(t) = \begin{cases} 1 & \text{if } A \leq r_1^2 \\ 0 & \text{otherwise} \end{cases}$$
where $r_1 = 0.25$. The reward is correspondingly calculated as:
$$r(t) = \begin{cases} -10 & \text{if } obs_1(t) = 1 \text{ or } obs_2(t) = 1 \\ \phantom{-}10 & \text{if } obs_1(t) = 0 \text{ and } obs_2(t) = 0. \end{cases}$$
The RL block then uses a neural network to calculate the three action outputs based on the values of the reward and observation functions. We chose the simple reward function (20) to avoid the issues of "reward shaping", which may lead the agent to make unnecessarily elaborate maneuvers to increase the reward. A drawback of the Reinforcement Learning Toolbox is that the neural network defining the policy function is a "black box", so we had no real insight into the scaling involved. In this case, it was noticed that the policy was outputting actions on the order of $10^{-3}$, so a scaling of the actions was required to produce reasonable results. After trial and error, we determined that a scaling factor of 100 was effective. Further insight into the neural networks used in the policy function could have assisted in choosing the scaling term less heuristically.
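Putting the observation and reward logic together, the following sketch is our reading of the observation flags and the sparse reward (20); it assumes NumPy, and the function names are hypothetical.

```python
import numpy as np

def stereographic(v):
    """Project a unit vector (with z != 1) onto the plane: [x, y] / (1 - z)."""
    return np.array([v[0], v[1]]) / (1.0 - v[2])

def observations(a_t, b_t, s, r1=0.25, phi=0.175):
    """Boundary-violation flags for the telescope (obs1) and antenna (obs2)."""
    Xa, Ya = stereographic(a_t)
    Xs, Ys = stereographic(s)
    A = (Xa - Xs) ** 2 + (Ya - Ys) ** 2
    obs1 = 1 if A <= r1 ** 2 else 0                   # telescope near bright spot
    r2 = np.sin(2.0 * phi)
    obs2 = 1 if (b_t[0] ** 2 + b_t[1] ** 2 >= r2 ** 2
                 and b_t[2] >= 0.791) else 0          # antenna at the boundary
    return obs1, obs2

def reward(obs1, obs2):
    """Sparse reward (20): penalize any violation, reward safe pointing."""
    return -10.0 if (obs1 == 1 or obs2 == 1) else 10.0

# RL action scaling as described in the text:
# c_rl = c0 + 100.0 * action   # action: the three policy outputs
```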

2.5. Choosing a New Optimal Path Using Reinforcement Learning

After the initial guess, the values $c_i^0$ were amended to the obstacle-avoidance values chosen by RL (which we denote by $c_i^{rl}$; as noted in the previous section, $c_i^{rl} = c_i^0 + \text{action}$), and we constructed a new trajectory that avoided the obstacle. We followed Steps (1)–(3) with the following amendments in order to construct the final trajectory:
  • Step (1a): Determine the solution $R(1, \lambda_0, c_i^{rl})$ of (1) that minimizes the cost (10) using (15), in terms of the unknown boundary conditions $\lambda_0$, on the imaginary time domain $[0, 1]$.
  • Step (2a): Construct the "error matrix" $\tilde{R}(1, \lambda_0, c_i^{rl}) = R(1, \lambda_0, c_i^{rl}) - R_D$ and the function:
    $$S(\lambda_0) = \| \tilde{R}(1, \lambda_0, c_i^{rl}) \|.$$
  • Step (3a): Find $\lambda_0$ that minimizes $S(\lambda_0)$ using a local minimizer.

3. Results

We applied the steps described in Section 2 to create attitude maneuvers for a geostationary satellite from an initial configuration $R_0$ to a final configuration $R_D$. We show two examples:
$$\text{Path 1:} \quad R_0 = \begin{pmatrix} 0.9933 & -0.0997 & 0.0584 \\ 0.0998 & 0.9950 & 0 \\ -0.0581 & 0.0058 & 0.9983 \end{pmatrix}, \quad R_D = \begin{pmatrix} 0.5817 & -0.6517 & 0.4868 \\ 0.8134 & 0.4660 & -0.3481 \\ 0 & 0.5985 & 0.8011 \end{pmatrix}$$
$$\text{Path 2:} \quad R_0 = \begin{pmatrix} 0.6504 & -0.0653 & 0.7568 \\ 0.0998 & 0.9950 & 0 \\ -0.7530 & 0.0756 & 0.6536 \end{pmatrix}, \quad R_D = \begin{pmatrix} 0.21078 & 0.9761 & 0.0525 \\ 0.2947 & -0.0123 & -0.9555 \\ -0.9320 & 0.2170 & -0.2903 \end{pmatrix}$$
Rather than simply plotting these two maneuvers, we ran a series of simulations on an incremented set of the final positions of Paths 1 and 2 to create a class of curves starting from the initial point $R_0$ and going to a set of perturbations of the end point $R_D$. This showed that the method successfully adjusted each curve in the class to remain within the pointing set, rather than showing a single curve from $R_0$ to $R_D$, which could represent an exceptional case.
The incremented sets of the final positions for Paths 1 and 2 are given by:
$$\mathcal{R}_1 = \{\, \exp(A_2 p_i)\, R_D \;|\; p_i = 0.5 + i/18,\ i = 0, 1, \ldots, 9 \,\},$$
$$\mathcal{R}_2 = \left\{ \exp(2 A_3) \exp\!\left(\tfrac{1}{2} A_1\right) \exp\!\left(\tfrac{\pi}{2} A_2\right) \exp(A_1 p_i)\, R_D \;\middle|\; p_i = 0.2 + i/22.5,\ i = 0, 1, \ldots, 9 \right\}$$
where the rotation matrices used to determine the iterations were chosen such that the curves from the initial point $R_0$ to the final points in the sets $\mathcal{R}_i$ would violate the conditions stated for each path.
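The incremented endpoint sets (22) and (23) can be generated directly with matrix exponentials. A sketch (ours), assuming SciPy's expm and the $\mathfrak{so}(3)$ basis matrices defined earlier:

```python
import numpy as np
from scipy.linalg import expm

A1, A2, A3 = (np.array(m, float) for m in
              ([[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
               [[0, 0, 0], [0, 0, -1], [0, 1, 0]],
               [[0, -1, 0], [1, 0, 0], [0, 0, 0]]))

def endpoint_set_1(RD, n=10):
    """Perturbed final attitudes for Path 1 (Eq. (22))."""
    return [expm(A2 * (0.5 + i / 18.0)) @ RD for i in range(n)]

def endpoint_set_2(RD, n=10):
    """Perturbed final attitudes for Path 2 (Eq. (23))."""
    pre = expm(2.0 * A3) @ expm(0.5 * A1) @ expm((np.pi / 2.0) * A2)
    return [pre @ expm(A1 * (0.2 + i / 22.5)) @ RD for i in range(n)]
```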
The plots of the maneuvers can be seen in Figure 4a,b and Figure 5a,b. Running 10 examples also allowed us to calculate the average CPU time for each of the two cases: for Path 1, the average CPU time was 1.437 s, and for Path 2, it was 2.113 s. The CPU time was calculated as the sum of the CPU time for the MATLAB script used to calculate the curves in Figure 4a and Figure 5a (using Steps 1–3), the Simulink run used to determine the new $c_i$ values, and the second MATLAB script used to calculate the curves in Figure 4b and Figure 5b (using Steps (1a)–(3a)).
We chose Path 1 such that the original optimal trajectory caused the telescope to pass across the bright spot S, while in Path 2, the original optimal trajectory designed for the antenna did not remain in the desired pointing region. One can see from the adapted trajectories in Figure 4b and Figure 5b that the $c_i^0$ values were successfully updated so as to point the telescope trajectory $a(t)$ away from the forbidden region at all times while maintaining the communications antenna curve $b(t)$ in the desired region.

4. Discussion and Conclusions

This paper presented a method to obtain optimal attitude guidance controls for a spacecraft in geostationary orbit that avoid pointing sensitive instrumentation towards the Sun during an attitude maneuver while maintaining the direction of a beam pencil. The method uses RL to produce an optimal trajectory that reacts to the presence of obstacles so as to keep the configuration in the pointing set. The simulations carried out demonstrated that the designed trajectories avoided the forbidden pointing direction while maintaining the antenna in the operational pointing direction for two classes of paths, one that violated the bright-spot condition and a second that violated the antenna-pointing condition. When comparing the average CPU times for the two classes of solutions shown here (Path 1, 1.437 s, and Path 2, 2.113 s) with those reported in [5] for Lie group-based pointing with bright-spot avoidance (between 0.36 and 0.406 s), the comparison seems unfavorable. However, in that method, the user is required to first run the program with a "guessed" $c_i$ value and then increment the $c_i$ values manually, each time running the program and visually confirming that the trajectory avoids the forbidden region, a process that is certainly less efficient even if it is less computationally demanding. A comparison of CPU times between the two papers is therefore not entirely meaningful, and in terms of ease of operation, ours is certainly preferable.
A possible drawback of our method is that we assumed the attitude to be precisely known; in a real mission, there will be delays in the pose estimation. We also assumed that the initial and final orientations were chosen such that the Sun-avoidance and beam pencil vectors fell in the pointing set at the initial and final positions, which is reasonable, but a refinement could be to select a suitable replacement orientation whenever the prescribed initial or final orientation falls outside the pointing set. Another weakness is the "black box" nature of the policy function neural network. In addition, we constructed a simple reward function and did not perform any reward shaping: this was seen as preferable, since a poorly shaped reward function can lead to a solution that is not ideal even if it obtains the highest reward. This can be seen particularly in the fact that the vehicle passes the obstacle relatively closely and then overcompensates through the rest of the trajectory. Refinement of the reward function and the inclusion of time and sensor delays are areas to be addressed in future work.

Author Contributions

H.H. carried out the research during her Postdoc with J.B. on attitude control for satellites and developed a novel geometric method for motion planning on the frame bundle of connected surfaces of arbitrary constant cross-sectional curvature. K.v.E. and H.H. noticed that an improvement could be made to the constrained planning approach used by J.B. in the paper [5] by combining this geometric method with reinforcement learning. H.H. developed the concept, wrote the paper, and carried out the simulations, and J.B. and K.v.E. provided constructive criticism and technical input. All authors read and agreed to the published version of the manuscript.

Funding

This work was sponsored by the European Regional Development Fund (FiRST Lab Project #FESR 1084).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The lead author can be contacted for the data generated during the simulations. No external data sets were used.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
RL    Reinforcement learning
PMP   Pontryagin's maximum principle
LVLH  Local vertical and local horizontal

Appendix A. Optimal Control Approach

PMP [14] is a necessary condition for optimality, which is most naturally expressed in the language of the geometry of the cotangent bundle $T^*G$ of a matrix Lie group $G$, where $G$ is the configuration space of the particular motion (in this paper, $G = SO(3)$) [15]. The cotangent bundle $T^*G$ can be trivialized (from the left) such that $T^*G = G \times \mathfrak{g}^*$, where $\mathfrak{g}^*$ is the dual space of the Lie algebra $\mathfrak{g}$. PMP associates with the system in Equation (1) and the cost $J$ in (10) an optimal Hamiltonian function $H$ on $T^*G = G \times \mathfrak{g}^*$. The cotangent bundle of any manifold $G$ is endowed with a canonical symplectic two-form $\chi_\xi$ that associates to each function $H$ on $T^*G$ a vector field $\vec{H}$ defined through the relation:
$$dH_\xi(v) = \chi_\xi(\vec{H}_\xi, v)$$
for each tangent vector $v \in T_\xi(T^*G)$.
Since, in the case of $SO(3)$, $\mathfrak{so}(3)^*$ can be identified with $\mathfrak{so}(3)$ [15], the Hamiltonian function is in this case a function on $SO(3) \times \mathfrak{so}(3)$. It follows that the Hamiltonian vector fields are given through the Poisson bracket $\{\cdot, \cdot\}$ on $T^*G$. Let $\hat{p}$ denote an element of $\mathfrak{so}(3)^*$. Then, denoting the lift of the basis $A_1, A_2, A_3$ as $M_i = \hat{p}(A_i)$, an optimal trajectory $R(\cdot) : [0, t] \to SO(3)$ is a projection of an integral curve $(R(\cdot), M(\cdot))$ of the time-varying Hamiltonian vector field $\vec{H}$ that satisfies the boundary conditions given in (1) and (10). This is called the projection "downstairs", and $M(\cdot)$ is the extremal curve, or the projection "upstairs", of $(R(\cdot), M(\cdot))$. The projection downstairs satisfies the differential equation:
$$R^{-1}(t) \frac{dR}{dt} = \mathbf{d}H,$$
(where $\mathbf{d}H = \sum_{i=1}^3 \frac{\partial H}{\partial M_i} A_i$ is the unique vector field satisfying $dH(v) = \chi(\vec{H}, v)$), while the projection upstairs can be determined by solving the extremal equations given in the following theorem.
Theorem [16]: The non-canonical Hamiltonian vector fields giving the necessary conditions for optimality under the PMP are defined by:
$$\frac{dM_1}{dt} = -\frac{M_2 M_3}{c_2} + \frac{M_2 M_3}{c_3}, \quad \frac{dM_2}{dt} = \frac{M_1 M_3}{c_1} - \frac{M_1 M_3}{c_3}, \quad \frac{dM_3}{dt} = -\frac{M_1 M_2}{c_1} + \frac{M_1 M_2}{c_2}.$$
While in our particular case optimality was used to produce a certain class of useful trajectories, we note from [16] that when the weights $c_i$ are fixed to the values $c_i = I_i$, where $I_i$ is the moment of inertia about the $i$-th axis, the trajectory $(R(\cdot), M(\cdot))$ is an energy-minimal trajectory. Here, however, we used the optimization process to access a class of smooth trajectories described using a global representation of the attitude on the matrix Lie group $SO(3)$.
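As a consistency check (ours, not part of the original appendix), the extremal equations conserve the Casimir $\|M\|^2$, since $M_1 \dot{M}_1 + M_2 \dot{M}_2 + M_3 \dot{M}_3$ vanishes term by term; this can be verified numerically on any solution of (15):

```python
import numpy as np

def casimir_drift(sol):
    """Max drift of ||M||^2 along a solution returned by solve_extremal."""
    M = sol.y[:3, :]                    # extremal components M1, M2, M3
    casimir = np.sum(M ** 2, axis=0)    # conserved along extremals
    return np.max(np.abs(casimir - casimir[0]))
```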

References

  1. Hablani, H.B. Attitude Commands Avoiding Bright Objects and Maintaining Communication with Ground Station. J. Guid. Control Dyn. 1999, 22, 759–767.
  2. McInnes, C.R. Large Angle Slew Manoeuvres with Autonomous Sun Vector Avoidance. J. Guid. Control Dyn. 1994, 17, 875–877.
  3. Kjellberg, H.C.; Lightsey, E.G. Discretized Constrained Attitude Pathfinding and Control for Satellites. J. Guid. Control Dyn. 2013, 36, 1301–1309.
  4. Frazzoli, E.; Dahleh, M.A.; Feron, E.; Kornfeld, R. A Randomized Attitude Slew Planning Algorithm for Autonomous Spacecraft. In Proceedings of the AIAA Guidance, Navigation and Control Conference, Minneapolis, MN, USA, 13–16 August 2012.
  5. Biggs, J.D.; Colley, L. Geometric Attitude Motion Planning for Spacecraft with Pointing and Actuator Constraints. J. Guid. Control Dyn. 2016, 1672–1677.
  6. Sun, Y.; Ran, X.; Zhang, G.; Xu, H.; Wang, X. AUV 3D Path Planning Based on the Improved Hierarchical Deep Q Network. J. Mar. Sci. Eng. 2020, 8, 145.
  7. Maček, K.; Petrović, I.; Perić, N. A reinforcement learning approach to obstacle avoidance of mobile robots. In Proceedings of the 7th International Workshop on Advanced Motion Control, Maribor, Slovenia, 3–5 July 2002.
  8. Prescott, T.; Mayhew, J. Obstacle avoidance through reinforcement learning. Adv. Neural Inf. Process. Syst. 1991, 4, 523–530.
  9. Kahn, G.; Villaflor, A.; Pong, V.; Abbeel, P.; Levine, S. Uncertainty-aware reinforcement learning for collision avoidance. arXiv 2017, arXiv:1702.01182.
  10. Wu, X.; Chen, H.; Chen, C.; Zhong, M.; Xie, S.; Guo, Y.; Fujita, H. The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method. Knowl. Based Syst. 2020, 196, 105201.
  11. Singla, A.; Padakandla, S.; Bhatnagar, S. Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge. IEEE Trans. Intell. Transp. Syst. 2019.
  12. Qiao, J.; Hou, Z.; Ruan, X. Application of reinforcement learning based on neural network to dynamic obstacle avoidance. In Proceedings of the IEEE 2008 International Conference on Information and Automation, Changsha, China, 20–23 June 2008.
  13. Lütjens, B.; Everett, M.; How, J.P. Safe reinforcement learning with model uncertainty estimates. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019.
  14. Boltyanskii, V.G.; Gamkrelidze, R.V.; Pontryagin, L.S. The Theory of Optimal Processes: I. The Maximum Principle; TRW Space Technology Labs: Los Angeles, CA, USA, 1960.
  15. Jurdjevic, V. Geometric Control Theory; Cambridge University Press: Cambridge, UK, 1997.
  16. Biggs, J.D. Integrable Hamiltonian Systems on Six Dimensional Lie Groups. Ph.D. Thesis, The University of Reading, Reading, UK, 2007.
Figure 1. The inertial LVLH frame (X, Y, Z) and the body-fixed frame ((x, y, z), shown in grey) with the telescope boresight initial and final directions $\ell_0$ and $\ell_f$.
Figure 2. The boundaries of the pointing set. The dashed curve bounds the bright spot region that the telescope must avoid (the complement of $S_1$), while the solid curve bounds the region in which the antenna must remain ($S_2$).
Figure 3. Frobenius norm of the difference between $R(t) R(t)^T$ and the identity matrix for the trajectories shown in Figure 4a (dashed) and Figure 5a (solid).
Figure 4. Correcting for a bright spot. (a) The initial trajectories constructed between the points when determining the pointing-constrained trajectory described by the parameters in Path 1; (b) the same trajectories after choosing the $c_i$ values using reinforcement learning. The initial points are indicated by + and the final points by * (initial and final points from Equation (22)).
Figure 5. Correcting for antenna pointing. (a) The initial trajectories constructed between the points; (b) the same trajectories after choosing the $c_i$ values using reinforcement learning. The initial points are indicated by + and the final points by * (initial and final points from Equation (23)).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
