Calibration Method for Relativistic Navigation System Using Parallel Q-Learning Extended Kalman Filter

Xiong, Kai; Zhao, Qin; Yuan, Li

doi:10.3390/s24196186


by Kai Xiong ¹,*, Qin Zhao ¹ and Li Yuan ²

¹ Science and Technology on Space Intelligent Control Laboratory, Beijing Institute of Control Engineering, Beijing 100094, China

² China Academy of Space Technology, Beijing 100094, China

(*) Author to whom correspondence should be addressed.

Sensors 2024, 24(19), 6186; https://doi.org/10.3390/s24196186

Submission received: 27 August 2024 / Revised: 20 September 2024 / Accepted: 20 September 2024 / Published: 24 September 2024

(This article belongs to the Section Navigation and Positioning)


Abstract

For a relativistic navigation system, in which the position and velocity of a spacecraft are determined by observing relativistic perturbations such as stellar aberration and starlight gravitational deflection, a novel parallel Q-learning extended Kalman filter (PQEKF) is presented to calibrate the measurement bias. The relativistic perturbations are extracted from inter-star angle measurements made with a group of high-accuracy star sensors on the spacecraft. The inter-star angle measurement bias caused by misalignment of the star sensors is one of the main error sources in the relativistic navigation system. To suppress the adverse effect of this bias on navigation performance, the PQEKF estimates the position and velocity together with the calibration parameters and uses Q-learning to tune the process noise covariance matrix of the filter automatically. The performance of the presented method is illustrated via numerical simulations of medium Earth orbit (MEO) satellite navigation. The simulation results show that, for the considered MEO satellite and an inter-star angle measurement accuracy of about 1 mas, the positioning error of the relativistic navigation system after calibration is less than 300 m.

Keywords:

spacecraft; autonomous navigation; relativistic navigation; Q-learning; extended Kalman filter

1. Introduction

Spacecraft navigation is an enabling technology for a wide variety of space missions, including Earth satellites and deep space explorers. Currently, the most commonly used approach is radio navigation based on signals transmitted from beacons such as ground stations and the global navigation satellite system (GNSS) [1,2]. To reduce mission cost and improve autonomous survivability, an autonomous navigation system that determines the position and velocity of the spacecraft with onboard instruments in a radio-signal-denied environment is required [3,4,5]. Achieving precise navigation without the support of man-made beacons is critical for the development of future intelligent unmanned systems [6,7].

Over the past few decades, several autonomous navigation techniques with different observation sources have been studied, such as optical navigation (OPNAV) using optical images of nearby celestial bodies [8,9,10], X-ray pulsar-based navigation (XNAV) [11,12,13] and star navigation based on the Doppler effect of starlight (StarNAV-DE) [14,15,16]. In on-orbit demonstrations, the positioning accuracy of OPNAV based on observations of Earth is on the order of a few kilometers, and the positioning error of XNAV is below 10 km, which is insufficient for the high-precision navigation requirements of certain space missions. The StarNAV-DE technique has been demonstrated on the Chinese Hα Solar Explorer (CHASE), where the reported accuracy of the solar velocimeter observing the starlight Doppler effect is about 2 m/s.

The spacecraft autonomous navigation method based on relativistic perturbations of starlight was introduced in [17] and developed in [18,19]. Research on relativistic navigation has recently attracted increasing attention. A practical mathematical model describing the relativistic perturbations to space-based starlight observations is derived in [20]. Optical instruments for observing the relativistic perturbations are discussed in [21]. Relativistic navigation is suggested in [22] for high-velocity interstellar spacecraft, for which the relativistic perturbations are not negligible. To enhance navigation accuracy and speed, information fusion schemes combining relativistic navigation and OPNAV are designed in [23,24]. The extended Kalman filter (EKF) and the unscented Kalman filter (UKF) are designed and evaluated for implementing relativistic navigation in [25,26].

Among the autonomous navigation techniques mentioned above, relativistic navigation based on the relativistic perturbations to inter-star angle measurements has the potential to achieve higher performance with current technology. In general, relativistic navigation performance depends on the accuracy of the inter-star angle measurement and the precision of the star catalog. Since inter-star angles can be measured with an accuracy of a few mas using state-of-the-art instruments, and the error of modern star catalogs is less than 0.1 mas, relativistic navigation is considered a promising high-performance method. Compared with OPNAV, an advantage of relativistic navigation is that high-accuracy observation of starlight is generally easier than that of a nearby celestial body. Compared with XNAV, relativistic navigation is competitive because the number of visible stars is much larger than the number of X-ray pulsars suitable for navigation. In addition, because the inter-star angle calculated from the star catalog is highly stable, the main difficulty of the StarNAV-DE technique, namely the poor stability of stellar spectra, is avoided.

In the relativistic navigation system, at least two star sensors separated by a large angle are required to measure the inter-star angles, which reveal variations in the relativistic perturbations, including stellar aberration and starlight gravitational deflection. The inter-star angle measurement bias caused by misalignment of the star sensors strongly affects the performance of the relativistic navigation system. The main motivation of this study is to calibrate this measurement bias accurately with a carefully designed navigation filter. A common approach is to model the measurement bias as calibration parameters, which can be estimated together with the position and velocity of the spacecraft by the EKF.

It is well known that the state estimation accuracy of the EKF depends on the tuning of the process and measurement noise covariance matrices [27,28]. Since the measurement noise covariance matrix can be determined from the specification of the star sensors, the remaining problem is to determine the process noise covariance matrix, especially the elements related to the calibration parameters. In general, it is difficult to obtain the optimal noise covariance matrices without exact statistical knowledge of the process noise. Several adaptive filters have been developed in the literature [29,30,31]. In autonomous navigation, the most widely used method is the adaptive extended Kalman filter (AEKF), in which the noise covariance matrices are estimated together with the state vector. However, it is often difficult to guarantee the accuracy of the estimated noise covariance matrix in the presence of state estimation error. To cope with this problem, a potential approach is to combine Q-learning with the EKF to tune the process noise covariance matrix automatically [32,33,34,35,36]. The key idea of the parallel Q-learning extended Kalman filter (PQEKF) is to integrate the EKF with Q-learning: the process noise covariance matrix of the EKF is selected by Q-learning, whose reward is constructed from the EKF innovation, so that an appropriate covariance matrix is determined and the filtering accuracy is improved.

This paper studies a measurement bias calibration method for a relativistic navigation system. The main contributions are as follows: (1) The PQEKF is presented to adjust, separately, the process noise covariance sub-matrices related to the calibration parameters and to the position and velocity vectors. The PQEKF differs from its original version presented in [24] in that two learning agents work in parallel, which improves the flexibility of the algorithm. (2) The PQEKF is shown to be effective for calibrating the inter-star angle measurement bias of the star sensors. The simulations show that, after calibration, the relativistic navigation accuracy for the MEO satellite is on the order of 300 m when the standard deviation of the measurement noise is about 1 mas. (3) The principle of the presented method can be applied to other state estimation problems that require autonomous parameter tuning.

The remaining part of the paper is organized as follows: Section 2 formulates the mathematical model of the relativistic navigation system. Section 3 presents the PQEKF algorithm for the relativistic navigation system to estimate the calibration parameters. Section 4 evaluates the performance of the navigation filter via simulations. Finally, Section 5 concludes the paper.

2. Relativistic Navigation System Model

2.1. Basic Principle of Relativistic Navigation

The concept of relativistic navigation is illustrated in Figure 1.

The basic principle is to estimate the position and velocity of the spacecraft from inter-star angle measurements, which are affected by relativistic perturbations including stellar aberration and starlight gravitational deflection. Since the effects of stellar aberration and starlight gravitational deflection on the inter-star angle measurement can be written as functions of the spacecraft velocity and position, respectively, the velocity and position of the spacecraft can be extracted from the inter-star angle measurements. The derivation of the effect of relativistic perturbations on starlight is given in [17]. As a basis for the navigation filter design, the observability analysis of the relativistic navigation system can be found in [24]. Spacecraft attitude determination based on stellar measurements is omitted here, as it has been widely studied in the literature, e.g., in [37].

As shown in Figure 2, a group of star sensors mounted on a rigid platform is used to obtain the inter-star angle measurements. The navigation filter then combines the orbital dynamics with the inter-star angle measurements to estimate the position and velocity of the spacecraft. Because the inter-star angle measurement bias caused by misalignment of the star sensors may seriously degrade navigation performance, a calibration method is required to estimate and compensate for it. With accurate star sensors, a precise star catalog and a high-fidelity orbital dynamics model, high navigation performance is achievable if the calibration method is designed appropriately.

2.2. Dynamic Model

The dynamic model and the measurement model of the relativistic navigation system are constructed for the design of the navigation filter. The state vector consists of the position vector $r_k = [r_{xk}\ r_{yk}\ r_{zk}]^T$, the velocity vector $v_k = [v_{xk}\ v_{yk}\ v_{zk}]^T$ and the calibration parameter vector $b_k = [\cdots\ b_{ijk}\ \cdots]^T$:

$$x_k = \begin{bmatrix} r_k^T & v_k^T & b_k^T \end{bmatrix}^T \tag{1}$$

where the subscript $k$ denotes the discrete time, and $i$ and $j$ distinguish different stars. The position and velocity vectors are defined in the Earth-centered inertial coordinate system.

For an Earth satellite, the dynamic model describing the time evolution of the state vector is

$$x_k = f(x_{k-1}) + w_k \tag{2}$$

with

$$f(x_k) = x_k + \phi(x_k)\,\tau \tag{3}$$

$$\phi(x_k) = \begin{bmatrix} v_k \\ -\dfrac{\mu_E r_k}{\|r_k\|^3} + p(r_k) \\ 0_{m \times 1} \end{bmatrix} \tag{4}$$

where $\tau$ is the step size, $\mu_E$ is the gravitational parameter of Earth, $\|\cdot\|$ denotes the Euclidean norm of a vector, and $m$ is the dimension of $b_k$. The function $p(r_k)$ denotes the accelerations of the spacecraft generated by perturbations other than Earth's central gravity, such as Earth's non-spherical gravity, atmospheric drag, solar radiation pressure and lunar and solar gravity; its expression can be found in [1]. The unmodeled perturbations are considered relatively small and are absorbed into the process noise $w_k$, which is assumed to be zero-mean noise with covariance matrix $Q_k$, where $Q_k$ is symmetric and positive definite.
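A minimal NumPy sketch of the one-step propagation in (3) and (4) is given below, under stated assumptions: only central gravity is modeled unless the caller supplies a function standing in for $p(r_k)$, `MU_E` takes the standard value of Earth's gravitational parameter, and the helper `jacobian_F` (a hypothetical name) differentiates the central-gravity term only. The MEO-like initial state in the usage lines is illustrative, not taken from the paper's simulations.

```python
import numpy as np

MU_E = 3.986004418e14  # Earth's gravitational parameter [m^3/s^2]


def phi(x, n_bias, perturbation=None):
    """Right-hand side phi(x_k) of Eq. (4) for the state x = [r, v, b]."""
    r, v = x[0:3], x[3:6]
    accel = -MU_E * r / np.linalg.norm(r) ** 3       # central gravity
    if perturbation is not None:                      # optional stand-in for p(r_k)
        accel = accel + perturbation(r)
    return np.concatenate([v, accel, np.zeros(n_bias)])  # biases are constant


def f(x, tau, n_bias, perturbation=None):
    """One-step transition of Eq. (3): x + phi(x) * tau (explicit Euler step)."""
    return x + phi(x, n_bias, perturbation) * tau


def jacobian_F(x, tau, n_bias):
    """F = df/dx for the central-gravity part only (p(r_k) is ignored here)."""
    r = x[0:3]
    rn = np.linalg.norm(r)
    gravity_grad = -MU_E * (np.eye(3) / rn**3 - 3.0 * np.outer(r, r) / rn**5)
    n = 6 + n_bias
    A = np.zeros((n, n))
    A[0:3, 3:6] = np.eye(3)       # d(r_dot)/dv
    A[3:6, 0:3] = gravity_grad    # d(v_dot)/dr
    return np.eye(n) + A * tau


# Illustrative MEO-like circular-orbit state with three bias parameters (hypothetical values).
x0 = np.concatenate([[2.656e7, 0.0, 0.0], [0.0, 3.874e3, 0.0], np.zeros(3)])
x1 = f(x0, tau=1.0, n_bias=3)
F1 = jacobian_F(x0, tau=1.0, n_bias=3)
```

The bias rows of `F1` reduce to the identity, reflecting the zero block $0_{m \times 1}$ in (4); a flight implementation would use a higher-order integrator and the full perturbation model.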

2.3. Measurement Model

Considering stellar aberration, starlight gravitational deflection and the measurement bias, when multiple stars are observed simultaneously, the measurement model relating the inter-star angle measurements to the state vector is

$$y_k = h(x_k) + \nu_k \tag{5}$$

with

$$y_k = \begin{bmatrix} \vdots \\ y_{ijk} \\ \vdots \end{bmatrix}, \quad h(x_k) = \begin{bmatrix} \vdots \\ h_{ij}(x_k) \\ \vdots \end{bmatrix} \tag{6}$$

where $y_k$ is the measurement vector, $\nu_k$ is the measurement noise with a symmetric and positive definite covariance matrix $R_k$, and $y_{ijk}$ is the measurement of the angle between the two stars indexed by $i$ and $j$. The elements $h_{ij}(x_k)$ of the function vector $h(x_k)$ are expressed as

$$\begin{aligned}
h_{ij}(x_k) ={}& (u'_{Iik})^T u'_{Ijk} + \frac{1}{c}\big[1 - (u'_{Iik})^T u'_{Ijk}\big]\big[(v_k + v_{E,k})^T u'_{Iik} + (v_k + v_{E,k})^T u'_{Ijk}\big] \\
&- \frac{1}{c^2}\big[1 - (u'_{Iik})^T u'_{Ijk}\big]\Big[\big((v_k + v_{E,k})^T u'_{Iik}\big)^2 + \big((v_k + v_{E,k})^T u'_{Ijk}\big)^2 \\
&\qquad + \big((v_k + v_{E,k})^T u'_{Iik}\big)\big((v_k + v_{E,k})^T u'_{Ijk}\big) - (v_k + v_{E,k})^T (v_k + v_{E,k})\Big] + b_{ijk}
\end{aligned} \tag{7}$$

where $c$ is the speed of light, $v_{E,k}$ is the velocity of Earth's center relative to the solar system barycenter (SSB), and $u'_{Iik}$ is the line-of-sight (LOS) vector of the $i$th star as seen by a fictitious stationary observer, given by

$$u'_{Iik} = u_{Iik} + \delta u_{Iik} \tag{8}$$

where $u_{Iik}$ is the unit LOS vector of the $i$th star in the absence of the gravitational field, calculated from the star catalog, and $\delta u_{Iik}$ denotes the effect of Earth's gravitational field on the unit LOS vector of the $i$th star:

$$\delta u_{Iik} = \frac{2\mu_E}{c^2}\,\frac{\left(1 - u_{Iik}^T r_k / \|r_k\|\right)\left(I_{3\times 3} - u_{Iik} u_{Iik}^T\right) r_k}{\left\|\left(I_{3\times 3} - u_{Iik} u_{Iik}^T\right) r_k\right\|^2} \tag{9}$$

Note that the measurement bias $b_{ijk}$ is modeled as a calibration parameter. The derivation of the Jacobian matrix $H_k$ of the measurement model (5) is similar to that in [24].
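The measurement model (7)–(9) can be transcribed into NumPy as follows. This is an illustrative sketch only: the function names `deflect` and `h_ij` are hypothetical, the star directions are assumed to be unit vectors in the same inertial frame as $r_k$, and the singular case in which a star lies along $r_k$ (zero denominator in (9)) is not handled. As written, the leading term of (7) is the dot product of the two LOS vectors, so the function returns a cosine-like quantity rather than an angle in radians.

```python
import numpy as np

MU_E = 3.986004418e14   # Earth's gravitational parameter [m^3/s^2]
C = 299792458.0         # speed of light [m/s]


def deflect(u, r):
    """Eqs. (8)-(9): star LOS corrected for Earth's gravitational deflection."""
    P = np.eye(3) - np.outer(u, u)      # projector onto the plane normal to u
    rp = P @ r
    du = 2.0 * MU_E / C**2 * (1.0 - u @ r / np.linalg.norm(r)) * rp / (rp @ rp)
    return u + du


def h_ij(r, v, v_earth, u_i, u_j, b_ij):
    """Eq. (7): predicted inter-star measurement for stars i and j."""
    ui, uj = deflect(u_i, r), deflect(u_j, r)
    w = v + v_earth                     # observer velocity relative to the SSB
    cos0 = ui @ uj
    wi, wj = w @ ui, w @ uj
    first = (1.0 - cos0) * (wi + wj) / C                            # 1/c term
    second = (1.0 - cos0) * (wi**2 + wj**2 + wi * wj - w @ w) / C**2  # 1/c^2 term
    return cos0 + first - second + b_ij


# Illustrative geometry (hypothetical values): MEO position, orbital and Earth velocities.
r = np.array([2.656e7, 0.0, 0.0])
v = np.array([0.0, 3.874e3, 0.0])
v_earth = np.array([0.0, 2.98e4, 0.0])   # roughly Earth's orbital speed
u1 = np.array([0.0, 1.0, 0.0])
u2 = np.array([0.0, 0.0, 1.0])
y_pred = h_ij(r, v, v_earth, u1, u2, b_ij=0.0)
```

In this geometry the two stars are 90° apart, so the prediction is dominated by the first-order aberration term $(v_k + v_{E,k})^T u'_{Iik}/c$, while the deflection $\delta u_{Iik}$ is on the order of $10^{-10}$ rad.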

3. Navigation Filtering Algorithm

3.1. Extended Kalman Filter

The extended Kalman filter (EKF) is one of the most widely applied navigation filtering algorithms. The standard EKF can be adopted as the navigation filter to estimate the state vector $x_k$ from the measurement vector $y_k$. For clarity, the EKF equations are summarized in Algorithm 1.

Algorithm 1: Extended Kalman filter.

1: function EKF($\hat{x}_{k-1}$, $P_{k-1}$, $y_k$, $Q_k$, $R_k$)

2:   $\hat{x}_{k|k-1} \leftarrow f(\hat{x}_{k-1})$   ⊳ prediction

3:   $P_{k|k-1} \leftarrow F_k P_{k-1} F_k^T + Q_k$

4:   $K_k \leftarrow P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}$

5:   $\tilde{y}_k \leftarrow y_k - h(\hat{x}_{k|k-1})$

6:   $\hat{x}_k \leftarrow \hat{x}_{k|k-1} + K_k \tilde{y}_k$   ⊳ update

7:   $P_k \leftarrow (I - K_k H_k) P_{k|k-1} (I - K_k H_k)^T + K_k R_k K_k^T$

8:   return $\hat{x}_k$, $P_k$, $\tilde{y}_k$

9: end function

In the algorithm, $\hat{x}_{k|k-1}$ and $\hat{x}_k$ are the prediction and the estimate of the state vector, $P_{k|k-1}$ and $P_k$ are the corresponding estimation error covariance matrices, $K_k$ is the Kalman gain, $\tilde{y}_k$ is the measurement innovation, and

$$F_k = \left.\frac{\partial f}{\partial x}\right|_{x = \hat{x}_{k-1}}, \quad H_k = \left.\frac{\partial h}{\partial x}\right|_{x = \hat{x}_{k|k-1}}$$

are the Jacobian matrices.
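Algorithm 1 maps almost line by line onto NumPy. The sketch below is one possible transcription, assuming the caller passes the model functions $f$ and $h$ and their Jacobians as callables (an interface choice not specified in the paper). Line 7 is the Joseph-form covariance update, which helps keep $P_k$ symmetric and positive semi-definite under rounding error.

```python
import numpy as np


def ekf_step(x_prev, P_prev, y, Q, R, f, F_jac, h, H_jac):
    """One cycle of Algorithm 1; returns the estimate, its covariance and the innovation."""
    x_pred = f(x_prev)                              # line 2: state prediction
    F = F_jac(x_prev)                               # F_k evaluated at x_{k-1}
    P_pred = F @ P_prev @ F.T + Q                   # line 3
    H = H_jac(x_pred)                               # H_k evaluated at x_{k|k-1}
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = np.linalg.solve(S.T, (P_pred @ H.T).T).T    # line 4: K = P H^T S^{-1}
    innov = y - h(x_pred)                           # line 5: innovation
    x_new = x_pred + K @ innov                      # line 6
    I_KH = np.eye(len(x_new)) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T    # line 7: Joseph form
    return x_new, P_new, innov


# Scalar sanity check: prior N(0, 1), measurement y = 2 with unit noise variance.
x, P, innov = ekf_step(np.array([0.0]), np.eye(1), np.array([2.0]),
                       np.zeros((1, 1)), np.eye(1),
                       f=lambda x: x, F_jac=lambda x: np.eye(1),
                       h=lambda x: x, H_jac=lambda x: np.eye(1))
```

For this scalar case the gain is 0.5, so the estimate moves halfway toward the measurement (to 1.0) and the variance halves (to 0.5), matching the textbook Kalman update.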

In the EKF algorithm, the state prediction $\hat{x}_{k|k-1}$ is updated with the measurement $y_k$, and the strength of the update is controlled by the Kalman gain $K_k$. In essence, the estimation accuracy of the EKF depends on the system model and the filter parameters. As Algorithm 1 shows, the tuning of filter parameters such as the noise covariance matrices $Q_k$ and $R_k$ plays an important role in shaping the Kalman gain $K_k$, so that the information in $y_k$ is fully exploited while the measurement noise $\nu_k$ is effectively suppressed. In practice, it is often difficult to design the optimal noise covariance matrices without exact statistical knowledge of $w_k$ and $\nu_k$. To guarantee navigation performance, it is therefore worthwhile to study how to set and adjust the noise covariance matrices appropriately.

Practical experience suggests that tuning the process noise covariance matrix together with the measurement noise covariance matrix is often inefficient. For the considered navigation system, the measurement noise covariance matrix can be determined from the measurement accuracy specification of the star sensors. However, aerospace engineers have little prior experience to draw on when tuning the process noise covariance elements related to the calibration parameters. Therefore, this paper studies the adaptation of the process noise covariance matrix $Q_k$.

3.2. Q-Learning Approach

In recent years, reinforcement learning (RL) has received considerable attention, with many successful applications in fields such as computer science, robotics and control engineering [38,39]. RL is becoming a major tool in artificial intelligence, allowing an agent to make its own decisions in an operational environment without an environmental model or labeled data. Q-learning is a representative RL approach, and many studies have described its use in solving different problems [40,41,42,43]. Combining the EKF with Q-learning is a promising direction, as both are familiar to aerospace engineers and easy to implement on spacecraft with limited computational power.

This paper presents a parallel Q-learning extended Kalman filter (PQEKF), in which Q-learning selects an appropriate process noise covariance matrix through its trial-and-error mechanism, which helps to improve filtering performance. The method differs from the Q-learning extended Kalman filter (QLEKF) presented in [24] in that two parallel learning agents, each with its own state space, adjust the process noise covariance corresponding to the kinematic state and to the calibration parameters, respectively, which improves the flexibility of the algorithm.

In Q-learning, the agent interacts with the environment iteratively to learn the optimal strategy. The agent's strategy is stored in a Q-table composed of the Q-function $Q^{(i)}(s, a)$ for the state $s \in S$ and the action $a \in A$, where $S$ and $A$ are the state space and the action space, $i \in \mathbb{Z}_+$ is the iteration index and $\mathbb{Z}_+$ is the set of positive integers. In each iteration, the agent performs an action $a$ in the state $s$ according to the current strategy and receives feedback from the environment, such as a utility function $U(s, a)$, which indicates how good the strategy is. The Q-function $Q^{(i)}(s, a)$ in the Q-table is then updated based on the utility function. The strategy contained in the Q-table is optimized once sufficient iterations have been performed.

For convenience, the iterative update equation of the Q-function is

$$Q^{(i)}(s, a) = U(s, a) + \gamma\, Q^{(i-1)}\big(s', a^{(i-1)}(s')\big) \tag{10}$$

with

$$a^{(i-1)}(s') = \arg\min_{a' \in A} Q^{(i-1)}(s', a') \tag{11}$$

where $Q^{(i)}(s, a)$ is the Q-function obtained by the agent at the $i$th iteration, $U(s, a)$ is the current utility function obtained from the environment, $\gamma \in [0, 1)$ is the discount factor, which trades off the current utility against the accumulated utility, and $s'$ is the state reached after action $a$ is performed. Since the next action is chosen by minimization in (11), the utility acts as a cost: smaller values are better.
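To make the min-form update concrete, the sketch below applies (10) and (11) repeatedly to a small deterministic problem. The state-transition table `next_state`, the utilities `U` and the discount factor are hypothetical values chosen only for illustration; the initial table is set to the constant $\max U/(1-\gamma)$, which satisfies condition (12) of Theorem 1.

```python
import numpy as np


def q_iteration_step(Q_prev, U, next_state, gamma):
    """Eqs. (10)-(11): Q(s,a) = U(s,a) + gamma * min_a' Q_prev(s', a')."""
    v_next = Q_prev.min(axis=1)          # min over actions, one value per state
    return U + gamma * v_next[next_state]


# Hypothetical 3-state, 2-action deterministic problem (costs are illustrative).
U = np.array([[1.0, 2.0],
              [0.5, 3.0],
              [2.0, 0.2]])
next_state = np.array([[0, 1],
                       [2, 1],
                       [2, 0]])          # s' reached from state s after action a
gamma = 0.9
Q = np.full(U.shape, U.max() / (1.0 - gamma))   # satisfies condition (12)
for i in range(200):
    Q = q_iteration_step(Q, U, next_state, gamma)
greedy = Q.argmin(axis=1)                # minimizing action a*(s), as in (11)
```

For these values the greedy actions settle at `[1, 0, 1]`: the cheapest long-run strategy is the cycle 0 → 1 → 2 → 0, even though staying in state 0 has the smaller immediate cost there.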

The convergence of Q-learning and the stability of the QLEKF algorithm are analyzed in [32] and [34], respectively. In this paper, a concise error-bound analysis of the Q-function $Q^{(i)}(s, a)$ that accounts for a finite number of iterations and calculation error is summarized in Theorems 1–3, which help convey the key idea of the iterative update process (10).

Theorem 1.

Consider the iterative update equation (10). If the initial Q-function $Q^{(0)}(s, a)$ satisfies, for all $s \in S$ and $a \in A$,

$$Q^{(0)}(s, a) \ge U(s, a) + \gamma\, Q^{(0)}\big(s', a^{(0)}(s')\big) \tag{12}$$

and

$$Q^{(i)}(s, a) \ge 0, \tag{13}$$

then

$$\lim_{i \to \infty} Q^{(i)}(s, a) = Q^{*}(s, a) \tag{14}$$

where

$$Q^{*}(s, a) = U(s, a) + \gamma\, Q^{*}\big(s', a^{*}(s')\big) \tag{15}$$

with

$$a^{*}(s') = \arg\min_{a' \in A} Q^{*}(s', a') \tag{16}$$

The proof of the theorem is given in Appendix A. It indicates that, under certain conditions, the current Q-function $Q^{(i)}(s, a)$ converges to the optimal Q-function $Q^{*}(s, a)$ as the number of iterations tends to infinity. In fact, the iterative update equations (10) and (11) can be combined as

$$Q^{(i)}(s, a) = U(s, a) + \gamma \min_{a' \in A} Q^{(i-1)}(s', a'). \tag{17}$$

Correspondingly, the optimal Q-function can be rewritten as

$$Q^{*}(s, a) = U(s, a) + \gamma \min_{a' \in A} Q^{*}(s', a'). \tag{18}$$

The following lemmas are useful for analyzing the error bounds of the current Q-function with a finite number of iterations and in the presence of calculation error.

Lemma 1.

Consider the Q-function sequences $\{Q_1^{(i)}(s, a)\}$ and $\{Q_2^{(i)}(s, a)\}$ obtained from (10). If the following inequality holds for all $s \in S$ and $a \in A$:

$$Q_1^{(0)}(s, a) \le Q_2^{(0)}(s, a), \tag{19}$$

then, for $i \in \mathbb{Z}_+$,

$$Q_1^{(i)}(s, a) \le Q_2^{(i)}(s, a). \tag{20}$$

Lemma 2.

Consider the current Q-function $Q^{(i)}(s, a)$ in (17) and a scalar $\epsilon$. Let

$$\bar{Q}^{(i)}(s, a) = Q^{(i)}(s, a) + \epsilon \tag{21}$$

and define the Q-learning operator $L^{(1)}$ as

$$\big(L^{(1)} \bar{Q}^{(i)}\big)(s, a) = U(s, a) + \gamma \min_{a' \in A} \bar{Q}^{(i)}(s', a'). \tag{22}$$

The composition of this mapping with itself $t$ times is denoted by $L^{(t)}$. Then, for $i, t \in \mathbb{Z}_+$,

$$\big(L^{(t)} \bar{Q}^{(i)}\big)(s, a) = Q^{(i+t)}(s, a) + \gamma^{t} \epsilon. \tag{23}$$
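The shift property (23) can be checked numerically: because the minimum of a row shifted by a constant is shifted by the same constant, each application of (22) multiplies the offset by $\gamma$. The snippet below applies the operator $t$ times to a table offset by $\epsilon$ and to the unshifted table; the random utilities, transitions and table values are arbitrary test data, not quantities from the navigation problem.

```python
import numpy as np


def q_operator(Q, U, next_state, gamma):
    """Operator L^(1) of Eq. (22): U(s,a) + gamma * min_a' Q(s', a')."""
    return U + gamma * Q.min(axis=1)[next_state]


rng = np.random.default_rng(0)
n_s, n_a, gamma, eps, t = 4, 3, 0.8, 0.37, 5
U = rng.uniform(0.0, 1.0, (n_s, n_a))            # arbitrary utilities
next_state = rng.integers(0, n_s, (n_s, n_a))    # arbitrary deterministic s'
Q_i = rng.uniform(0.0, 5.0, (n_s, n_a))          # stands in for Q^(i)

shifted = Q_i + eps        # \bar{Q}^(i) of Eq. (21)
plain = Q_i.copy()
for _ in range(t):
    shifted = q_operator(shifted, U, next_state, gamma)   # L^(t) \bar{Q}^(i)
    plain = q_operator(plain, U, next_state, gamma)       # Q^(i+t)

offset = shifted - plain   # Eq. (23) predicts gamma**t * eps in every entry
```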

The proofs of Lemmas 1 and 2 are given in Appendices B and C, respectively. With these prerequisites, the error bound of $Q^{(i)}(s, a)$ with a finite number of iterations is summarized in the following theorem.

Theorem 2.

For the current Q-function $Q^{(i)}(s, a)$ in (17) and the optimal Q-function $Q^{*}(s, a)$ in (18), if the conditions of Theorem 1 hold, then the following inequality is satisfied for all $s \in S$, $a \in A$ and $i \in \mathbb{Z}_+$:

$$\frac{\gamma}{1-\gamma}\,\underline{\epsilon}_i \le Q^{*}(s, a) - Q^{(i)}(s, a) \le \frac{\gamma}{1-\gamma}\,\bar{\epsilon}_i \tag{24}$$

where

$$\underline{\epsilon}_i = \min_{s \in S,\, a \in A} \big[Q^{(i)}(s, a) - Q^{(i-1)}(s, a)\big] \tag{25}$$

$$\bar{\epsilon}_i = \max_{s \in S,\, a \in A} \big[Q^{(i)}(s, a) - Q^{(i-1)}(s, a)\big]. \tag{26}$$

The proof of Theorem 2 is given in Appendix D. Theorem 2 shows that, with a finite number of iterations, the bound on the error between the current and optimal Q-functions is determined by the difference between $Q^{(i)}(s, a)$ and $Q^{(i-1)}(s, a)$. According to Theorem 1, the error bounds defined by (25) and (26) tend to zero as $i \to \infty$.
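A small numerical experiment can illustrate (24). The sketch below, built on hypothetical random test data, runs the update (17) from an initial table satisfying (12), computes the bounds from two consecutive iterates, and compares them with the actual error relative to a reference $Q^*$ obtained by running the same update until it has converged to machine precision.

```python
import numpy as np


def q_step(Q, U, next_state, gamma):
    """Update of Eq. (17)."""
    return U + gamma * Q.min(axis=1)[next_state]


def theorem2_bounds(Q_i, Q_prev, gamma):
    """Lower and upper bounds of Eq. (24) from two consecutive iterates."""
    diff = Q_i - Q_prev
    k = gamma / (1.0 - gamma)
    return k * diff.min(), k * diff.max()


rng = np.random.default_rng(1)
n_s, n_a, gamma = 5, 3, 0.9
U = rng.uniform(0.0, 1.0, (n_s, n_a))
next_state = rng.integers(0, n_s, (n_s, n_a))

Q_star = np.zeros((n_s, n_a))
for _ in range(2000):                  # reference solution of Eq. (18)
    Q_star = q_step(Q_star, U, next_state, gamma)

Q_hist = [np.full((n_s, n_a), U.max() / (1.0 - gamma))]   # Q^(0), satisfies (12)
for _ in range(5):                     # stop after only a few iterations
    Q_hist.append(q_step(Q_hist[-1], U, next_state, gamma))

lower, upper = theorem2_bounds(Q_hist[-1], Q_hist[-2], gamma)
error = Q_star - Q_hist[-1]            # should lie in [lower, upper] entrywise
```

Because the initial table satisfies (12), the iterates decrease monotonically (Lemma 1), so the upper bound is non-positive and $Q^*$ is approached from above.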

In addition, consider a calculated Q-function $\hat{Q}^{(i)}(s, a)$ that contains calculation error. Based on Theorem 2, the bound on the error between $\hat{Q}^{(i)}(s, a)$ and the optimal Q-function $Q^{*}(s, a)$ with a finite number of iterations is given in the following theorem.

Theorem 3.

Consider the calculated Q-function sequence $\{\hat{Q}^{(i)}(s, a)\}$. If, for all $s \in S$, $a \in A$ and $i \in \mathbb{Z}_+$,

$$\big|\hat{Q}^{(i)}(s, a) - Q^{(i)}(s, a)\big| \le \varepsilon_i(s, a) \tag{27}$$

where $\varepsilon_i(s, a)$ is the function that bounds the error between the calculated and the current Q-function, and the conditions of Theorem 1 hold, then

$$-\underline{c}_i \le Q^{*}(s, a) - \hat{Q}^{(i)}(s, a) \le \bar{c}_i \tag{28}$$

with

$$\underline{c}_i = \frac{\bar{\varepsilon}_i + \gamma \bar{\varepsilon}_{i-1}}{1-\gamma} - \frac{\gamma}{1-\gamma} \min_{s \in S,\, a \in A} \big[\hat{Q}^{(i)}(s, a) - \hat{Q}^{(i-1)}(s, a)\big] \tag{29}$$

$$\bar{c}_i = \frac{\bar{\varepsilon}_i + \gamma \bar{\varepsilon}_{i-1}}{1-\gamma} + \frac{\gamma}{1-\gamma} \max_{s \in S,\, a \in A} \big[\hat{Q}^{(i)}(s, a) - \hat{Q}^{(i-1)}(s, a)\big] \tag{30}$$

where

$$\bar{\varepsilon}_i = \max_{s \in S,\, a \in A} \varepsilon_i(s, a). \tag{31}$$

The proof of Theorem 3 is given in Appendix E. Theorem 3 shows that, with a finite number of iterations, the error bound of the calculated Q-function $\hat{Q}^{(i)}(s, a)$ is determined by the calculation error $\bar{\varepsilon}_i$ and the difference between $\hat{Q}^{(i)}(s, a)$ and $\hat{Q}^{(i-1)}(s, a)$. The error bounds defined by (29) and (30) tend to zero if the calculation error is zero and $i \to \infty$.
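Theorem 3 can be exercised the same way by perturbing exact iterates with bounded noise that mimics calculation error. In the sketch below, the noise amplitude `noise`, the random problem data and the iteration count are arbitrary assumptions, and both $\bar{\varepsilon}_i$ and $\bar{\varepsilon}_{i-1}$ are set to the noise amplitude, which bounds the perturbation by construction.

```python
import numpy as np


def q_step(Q, U, next_state, gamma):
    """Update of Eq. (17)."""
    return U + gamma * Q.min(axis=1)[next_state]


def theorem3_bounds(Qh_i, Qh_prev, eps_i, eps_prev, gamma):
    """c_lower and c_upper of Eqs. (29)-(30) from two calculated iterates."""
    base = (eps_i + gamma * eps_prev) / (1.0 - gamma)
    diff = Qh_i - Qh_prev
    k = gamma / (1.0 - gamma)
    return base - k * diff.min(), base + k * diff.max()


rng = np.random.default_rng(2)
n_s, n_a, gamma, noise = 5, 3, 0.9, 1e-3
U = rng.uniform(0.0, 1.0, (n_s, n_a))
next_state = rng.integers(0, n_s, (n_s, n_a))

Q_star = np.zeros((n_s, n_a))
for _ in range(2000):                  # reference solution of Eq. (18)
    Q_star = q_step(Q_star, U, next_state, gamma)

Q_exact = [np.full((n_s, n_a), U.max() / (1.0 - gamma))]   # satisfies (12)
for _ in range(8):
    Q_exact.append(q_step(Q_exact[-1], U, next_state, gamma))

# Calculated iterates: exact values plus calculation error bounded by `noise`.
Qh_prev = Q_exact[-2] + rng.uniform(-noise, noise, (n_s, n_a))
Qh_i = Q_exact[-1] + rng.uniform(-noise, noise, (n_s, n_a))

c_lower, c_upper = theorem3_bounds(Qh_i, Qh_prev, noise, noise, gamma)
error = Q_star - Qh_i                  # Eq. (28): -c_lower <= error <= c_upper
```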

3.3. Parallel Q-Learning Extended Kalman Filter

In this subsection, the PQEKF algorithm is presented based on the EKF in Algorithm 1 and the Q-learning update (10). To fine-tune the process noise covariance matrix of the filter, $Q_k$ is assumed to be a diagonal matrix with the following structure:

$$Q_k = \begin{bmatrix} Q_{rv,k} & 0 \\ 0 & Q_{b,k} \end{bmatrix} \tag{32}$$

where $Q_{rv,k}$ is the sub-matrix corresponding to the position $r_k$ and velocity $v_k$ in the state vector, and $Q_{b,k}$ is the sub-matrix corresponding to the calibration parameter vector $b_k$. To avoid the curse of dimensionality, two parallel learning agents tune the sub-matrices $Q_{rv,k}$ and $Q_{b,k}$ individually within an episode. Therefore, two Q-tables, one per agent, are updated in the algorithm. The Q-functions of the two agents are $Q_{rv}(s_{rv}, a_{rv})$ ($s_{rv} \in S_{rv}$, $a_{rv} \in A_{rv}$) and $Q_b(s_b, a_b)$ ($s_b \in S_b$, $a_b \in A_b$), respectively, where $S_{rv}$, $S_b$ and $A_{rv}$, $A_b$ are the corresponding state spaces and action spaces. The superscript $(i)$ of the Q-function is omitted here for simplicity.

Our task is to obtain a reliable strategy for selecting an appropriate process noise covariance matrix, specifically $Q_{rv,k}$ and $Q_{b,k}$, to enhance filtering performance. Hence, the state space of the Q-learning approach is constructed from different design values of the process noise covariance matrix: each state of the two agents corresponds to an element of the predetermined set $\{\cdots, Q_{rv}^{(s_{rv})}, \cdots\}$ or $\{\cdots, Q_b^{(s_b)}, \cdots\}$, where $Q_{rv}^{(s_{rv})}$ and $Q_b^{(s_b)}$ are candidate design values for $Q_{rv,k}$ and $Q_{b,k}$, respectively.

The action space describes the possible state transitions in the state space. To simplify the formulation of the algorithm and focus on the main task, the only action considered in this paper is to remain in the current state. To reduce the learning time, the deterministic Q-learning approach presented in [44] is adopted, in which the Q-function is updated for all states and actions in each iteration. Alternative learning methods that explore the state space randomly with a given action-selection strategy, such as the $\varepsilon$-greedy or Softmax strategy, are described in [33].
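With the single "remain" action $a_0$, the successor state is $s' = s$ and the minimization over $a'$ in (17) becomes trivial. The short derivation below is a sketch that additionally assumes the utility $U(s, a_0)$ is held fixed across iterations (in Algorithm 2 it is refreshed at every time step $k$); under that assumption, selecting the state with the smallest Q-value amounts, in the limit, to selecting the candidate covariance with the smallest utility.

```latex
% Single action a_0 = "remain": s' = s, so the min over a' is trivial
Q^{(i)}(s, a_0) = U(s, a_0) + \gamma\, Q^{(i-1)}(s, a_0)
\quad\Longrightarrow\quad
Q^{*}(s, a_0) = \frac{U(s, a_0)}{1 - \gamma},
\qquad
\arg\min_{s \in S} Q^{*}(s, a_0) = \arg\min_{s \in S} U(s, a_0).
```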

The utility functions of the two agents, $U_{rv}(s_{rv}, a_{rv})$ and $U_b(s_b, a_b)$, are designed based on the measurement innovations of tentative EKFs that use the corresponding process noise covariance matrices $Q_{rv}^{(s_{rv})}$ and $Q_b^{(s_b)}$. Since the measurement innovation is an effective indicator of filtering performance, the values of $U_{rv}(s_{rv}, a_{rv})$ and $U_b(s_b, a_b)$ are used to evaluate the quality of the design values $Q_{rv}^{(s_{rv})}$ and $Q_b^{(s_b)}$ and to guide the agents toward the best possible strategy. The update laws of the two agents follow the iterative update equation of Q-learning.

Following the above description, for the navigation system model (2) and (5), the detailed implementation of the PQEKF algorithm in one episode is presented in Algorithm 2.

Algorithm 2: Parallel Q-learning extended Kalman filter.

Input: initial state estimate $\hat{x}_0$ and its error covariance matrix $P_0$, process noise covariance matrix $Q_k$, measurement noise covariance matrix $R_k$, measurement sequence $\{y_k\}$, and initial Q-functions $Q_{rv}(s_{rv}, a_{rv})$ and $Q_b(s_b, a_b)$
Output: $\{\hat{x}_k\}$ and $\{P_k\}$
1: for all $s_{rv} \in S_{rv}$, $a_{rv} \in A_{rv}$ do
2: $\hat{x}_0^{(s_{rv})} \leftarrow \hat{x}_0$, $P_0^{(s_{rv})} \leftarrow P_0$, $U_{rv}(s_{rv}, a_{rv}) \leftarrow 0$
3: $\hat{Q}_k^{(s_{rv})} \leftarrow Q_k$, $\hat{Q}_k^{(s_{rv})}(1{:}6, 1{:}6) \leftarrow Q_{rv}^{(s_{rv})}$
4: end
5: for all $s_b \in S_b$, $a_b \in A_b$ do
6: $\hat{x}_0^{(s_b)} \leftarrow \hat{x}_0$, $P_0^{(s_b)} \leftarrow P_0$, $U_b(s_b, a_b) \leftarrow 0$
7: $\hat{Q}_k^{(s_b)} \leftarrow Q_k$, $\hat{Q}_k^{(s_b)}(7{:}9, 7{:}9) \leftarrow Q_b^{(s_b)}$
8: end
9: for $k = 1, 2, \dots, K$
10: for all $s_{rv} \in S_{rv}$, $a_{rv} \in A_{rv}$ do
11: $[\hat{x}_k^{(s_{rv})}, P_k^{(s_{rv})}, \tilde{y}_k^{(s_{rv})}] \leftarrow \mathrm{EKF}(\hat{x}_{k-1}^{(s_{rv})}, P_{k-1}^{(s_{rv})}, y_k, \hat{Q}_k^{(s_{rv})}, R_k)$
12: $U_{rv}(s_{rv}, a_{rv}) \leftarrow (1 - K^{-1}) U_{rv}(s_{rv}, a_{rv}) + K^{-1} (\tilde{y}_k^{(s_{rv})})^{T} \tilde{y}_k^{(s_{rv})}$
13: $Q_{rv}(s_{rv}, a_{rv}) \leftarrow U_{rv}(s_{rv}, a_{rv}) + \gamma \min_{a'_{rv} \in A_{rv}} Q_{rv}(s'_{rv}, a'_{rv})$
14: end
15: for all $s_{rv} \in S_{rv}$ do
16: $V_{rv}(s_{rv}) \leftarrow \min_{a_{rv} \in A_{rv}} Q_{rv}(s_{rv}, a_{rv})$
17: end
18: $s_{rv,\min} = \arg\min_{s_{rv} \in S_{rv}} V_{rv}(s_{rv})$
19: for all $s_b \in S_b$, $a_b \in A_b$ do
20: $[\hat{x}_k^{(s_b)}, P_k^{(s_b)}, \tilde{y}_k^{(s_b)}] \leftarrow \mathrm{EKF}(\hat{x}_{k-1}^{(s_b)}, P_{k-1}^{(s_b)}, y_k, \hat{Q}_k^{(s_b)}, R_k)$
21: $U_b(s_b, a_b) \leftarrow (1 - K^{-1}) U_b(s_b, a_b) + K^{-1} (\tilde{y}_k^{(s_b)})^{T} \tilde{y}_k^{(s_b)}$
22: $Q_b(s_b, a_b) \leftarrow U_b(s_b, a_b) + \gamma \min_{a'_b \in A_b} Q_b(s'_b, a'_b)$
23: end
24: for all $s_b \in S_b$ do
25: $V_b(s_b) \leftarrow \min_{a_b \in A_b} Q_b(s_b, a_b)$
26: end
27: $s_{b,\min} = \arg\min_{s_b \in S_b} V_b(s_b)$
28: $Q_k(1{:}6, 1{:}6) \leftarrow Q_{rv}^{(s_{rv,\min})}$, $Q_k(7{:}9, 7{:}9) \leftarrow Q_b^{(s_{b,\min})}$
29: $[\hat{x}_k, P_k, \tilde{y}_k] \leftarrow \mathrm{EKF}(\hat{x}_{k-1}, P_{k-1}, y_k, Q_k, R_k)$
30: end
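The `EKF(·)` calls in steps 11, 20 and 29 each return an updated estimate, its covariance and the measurement innovation. A minimal sketch of that interface follows, assuming a *linear* Kalman filter as a stand-in: an EKF would propagate the nonlinear dynamics of (2) and linearize the measurement model (5) through Jacobians. The function name `kf_step` and the toy constant-velocity model are hypothetical and not taken from the paper.

```python
import numpy as np

def kf_step(x_prev, P_prev, y, Q, R, F, H):
    """One predict/update cycle returning (x, P, innovation).

    Linear stand-in for EKF(x, P, y, Q, R) in Algorithm 2: an EKF would use the
    nonlinear models and their Jacobians in place of the constant matrices F and H.
    """
    # Time update
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    # Innovation y~ and its covariance S
    innov = y - H @ x_pred
    S = H @ P_pred @ H.T + R
    # Kalman gain P H^T S^-1 via a linear solve (valid because P_pred and S are symmetric)
    K_gain = np.linalg.solve(S, H @ P_pred).T
    # Measurement update; the Joseph form keeps P symmetric positive semidefinite
    x_new = x_pred + K_gain @ innov
    I_KH = np.eye(len(x_prev)) - K_gain @ H
    P_new = I_KH @ P_pred @ I_KH.T + K_gain @ R @ K_gain.T
    return x_new, P_new, innov

# Usage: toy 1D constant-velocity model observed in position (hypothetical values)
dt = 10.0                                  # 0.1 Hz update rate, as in Table 1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = np.diag([1e-5, 1e-5])
R = np.array([[1.0]])
x, P = np.zeros(2), 100.0 * np.eye(2)
x, P, innov = kf_step(x, P, np.array([5.0]), Q, R, F, H)
```

The returned innovation is what the agents square and accumulate in steps 12 and 21.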

In the algorithm, $\hat{x}_k^{(s_{rv})}$, $P_k^{(s_{rv})}$, $\hat{x}_k^{(s_b)}$ and $P_k^{(s_b)}$ are the state estimates and estimation error covariance matrices of the tentative EKFs of the two agents; $\tilde{y}_k^{(s_{rv})}$ and $\tilde{y}_k^{(s_b)}$ are the corresponding measurement innovations; $\hat{Q}_k^{(s_{rv})}$ and $\hat{Q}_k^{(s_b)}$ are the trial process noise covariance matrices; and $K$ is the length of the measurement sequence in one episode. The Q-functions of the two parallel agents are updated with the utility functions $U_{rv}(s_{rv}, a_{rv})$ and $U_b(s_b, a_b)$, which accumulate the squared innovation norms over a time window to suppress the unfavorable effect of the measurement noise. Generally, if a relatively small utility value is obtained, the corresponding trial process noise covariance matrix is considered satisfactory; otherwise, the related $\hat{Q}_k^{(s_{rv})}$ or $\hat{Q}_k^{(s_b)}$ is considered unsatisfactory. Through this trial-and-error process, the process noise covariance matrix that is most beneficial to the filtering performance is expected to be selected according to the Q-functions $Q_{rv}(s_{rv}, a_{rv})$ and $Q_b(s_b, a_b)$. The algorithm can be run iteratively over multiple episodes to fine-tune $Q_k$ in the navigation filter during the space mission.
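Steps 12, 13, 16 and 18 of Algorithm 2 reduce to a few table operations per agent. The sketch below is a hypothetical illustration, not the authors' code: the transition table `next_state` and the action set {-1, 0, +1} (move to an adjacent design value) are assumptions, since the state transition is defined elsewhere in the paper, and all table entries are updated synchronously rather than in the loop order of Algorithm 2.

```python
import numpy as np

def neighbour_transitions(n_states):
    """Hypothetical transition table: actions {-1, 0, +1} move to an adjacent design value."""
    moves = np.array([-1, 0, 1])
    return np.clip(np.arange(n_states)[:, None] + moves[None, :], 0, n_states - 1)

def agent_step(U, Qf, innov_energy, K, gamma, next_state):
    """Steps 12, 13, 16 and 18 of Algorithm 2 for one agent (tables updated in place).

    U, Qf        : (n_states, n_actions) utility and Q-function tables
    innov_energy : (n_states,) value of y~^T y~ for each tentative filter at step k
    next_state   : (n_states, n_actions) integer table giving s' for each (s, a)
    Returns the index s_min of the design value with the smallest V(s).
    """
    # Step 12: moving average with weight 1/K suppresses measurement noise
    U[:] = (1.0 - 1.0 / K) * U + (1.0 / K) * innov_energy[:, None]
    # Step 13: Q(s, a) <- U(s, a) + gamma * min_a' Q(s', a')  (cost is minimized)
    Qf[:] = U + gamma * Qf[next_state].min(axis=-1)
    # Step 16: V(s) = min_a Q(s, a)
    V = Qf.min(axis=1)
    # Step 18: select the design value with the lowest expected cost
    return int(np.argmin(V))

# Usage with seven design values; the innovation energies are made up for illustration
energy = (np.arange(7) - 3.0) ** 2 + 1.0
U = np.zeros((7, 3))
Qf = np.zeros((7, 3))
nxt = neighbour_transitions(7)
for _ in range(200):
    s_min = agent_step(U, Qf, energy, K=50, gamma=0.9, next_state=nxt)
print(s_min)  # 3, the design value with the smallest innovation energy
```

Because the utility is a moving average, a single noisy innovation shifts it by only $1/K$ of its value, which is the noise-suppression effect described above.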

For clarity, the structure of the presented PQEKF algorithm is shown in Figure 3.

For the implementation of the PQEKF, the measurement $y_k$ is acquired from the star sensors. Driven by the measurement data, the matrices $Q_{rv}^{(s_{rv,\min})}$ and $Q_b^{(s_{b,\min})}$ are selected by two parallel learning agents, which are designed based on the EKF algorithm and the Q-function update equation. With the process noise covariance matrix $Q_k$ composed of the sub-matrices $Q_{rv}^{(s_{rv,\min})}$ and $Q_b^{(s_{b,\min})}$, the navigation filter provides the optimized state estimate $\hat{x}_k$ to the spacecraft control system. Although the PQEKF presented here contains only two parallel learning agents, the algorithm can readily be extended to multiple learning agents to deal with more complicated problems.
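Step 28 of Algorithm 2 writes each agent's selected sub-matrix into its diagonal block of $Q_k$; with more agents, each agent simply owns another index range. A minimal sketch follows, with the helper name `assemble_process_noise` hypothetical and the paper's 1-based ranges (1:6, 7:9) converted to Python's 0-based half-open slices.

```python
import numpy as np

def assemble_process_noise(Q_base, blocks):
    """Return a copy of Q_base with each agent's selected sub-matrix written on the diagonal.

    blocks: list of (start, stop, Q_sub) using 0-based, half-open index ranges,
            e.g. (0, 6, Q_rv) for Q_k(1:6, 1:6) and (6, 9, Q_b) for Q_k(7:9, 7:9).
    """
    Q = Q_base.copy()
    for start, stop, Q_sub in blocks:
        n = stop - start
        if Q_sub.shape != (n, n):
            raise ValueError(f"block {start}:{stop} expects a {n}x{n} sub-matrix")
        Q[start:stop, start:stop] = Q_sub
    return Q

# Usage with the Table 1 base values; the selected scale factors are illustrative
Q_k = np.diag([1e-5] * 6 + [0.03] * 3)
Q_rv_sel = 5.0**2 * Q_k[0:6, 0:6]     # agent 1 picks 5^2 Q_rv,k
Q_b_sel = 100.0**-2 * Q_k[6:9, 6:9]   # agent 2 picks 100^-2 Q_b,k
Q_new = assemble_process_noise(Q_k, [(0, 6, Q_rv_sel), (6, 9, Q_b_sel)])
```

A third agent would add one more `(start, stop, Q_sub)` tuple; the base matrix is copied so that the previous $Q_k$ stays available to the tentative filters.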

The presented PQEKF algorithm is suitable for navigation systems in which prior knowledge of the statistical characteristics of the process noise or the measurement noise is insufficient. For example, in the relativistic navigation system, it is difficult to specify in advance the magnitude of the process noise covariance related to the calibration parameter vector. To ensure the feasibility of the algorithm, all design values of the process noise covariance matrix can be tested through numerical simulations on the ground before the PQEKF is implemented on orbit. For the considered system, a small state space in the Q-learning approach, with a few design values of the process noise covariance spanning different orders of magnitude, is sufficient to improve the filtering performance.

4. Simulations

4.1. Simulation Conditions

In this section, comparisons are performed to demonstrate the efficiency of the calibration method for the relativistic navigation system using the PQEKF. The reference trajectory of the spacecraft is generated by a high-precision orbit propagator that considers the non-spherical Earth gravity perturbation, the lunar–solar gravitational perturbation and the solar radiation pressure perturbation. The spacecraft is assumed to be an MEO satellite in a near-circular orbit with a semi-major axis of 21,528 km and an inclination of 55°. The measurement data are generated according to the reference trajectory and the measurement model described in Section 2. Navigation filters based on the EKF and on the PQEKF presented in Section 3 are implemented separately to process the measurement data. The position and velocity estimation errors are obtained by comparing the state estimates with the reference trajectory.

For a fair comparison, the EKF and the PQEKF share the same measurement noise covariance matrix $R_k$ and the same initial estimation error covariance matrix $P_0$. The parameter settings for the simulation are listed in Table 1.

For the PQEKF, the range of the state space is not known in advance, so when the state space is discretized, the upper and lower limits of the process noise covariance matrix are determined through experiments. The performance of the presented method is evaluated via the position and velocity estimation errors, which are critical for the orbital control of the spacecraft.

4.2. Simulation Results

First, the navigation performance of the presented method is compared with that of the traditional EKF without measurement bias calibration [24]. The three-axis position and velocity estimation errors of the EKF without measurement bias calibration are shown as solid lines in Figures 4 and 5, where the dashed lines represent the theoretical error bounds calculated from the estimation error covariance matrix of the navigation filter.

Figures 4 and 5 show that the error curves frequently exceed the theoretical error bounds because of the measurement bias. In contrast, the position and velocity estimation errors of the PQEKF-based calibration method are shown in Figures 6 and 7.

Figures 6 and 7 show that all of the error curves remain within the corresponding error bounds, which indicates the effectiveness of the navigation filter design.
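One way to quantify whether error curves stay within the theoretical bounds is to count the fraction of epochs inside $\pm n\sigma$, with $\sigma$ taken from the diagonal of $P_k$. The sketch assumes 3σ bounds, a common convention; the bound multiplier used for the figures is not restated in this section, and the helper name is hypothetical.

```python
import numpy as np

def fraction_within_bounds(errors, P_hist, n_sigma=3.0):
    """Per-axis fraction of epochs whose error lies inside +/- n_sigma * sqrt(P_ii).

    errors : (T, n) estimation errors (estimate minus reference)
    P_hist : (T, n, n) filter covariance matrices at the same epochs
    """
    # Standard deviations from the covariance diagonals, shape (T, n)
    sigma = np.sqrt(np.diagonal(P_hist, axis1=1, axis2=2))
    inside = np.abs(errors) <= n_sigma * sigma
    return inside.mean(axis=0)

# Usage with two epochs and unit covariance: the second axis leaves the bound once
errors = np.array([[0.5, 4.0], [-2.9, 1.0]])
P_hist = np.stack([np.eye(2), np.eye(2)])
frac = fraction_within_bounds(errors, P_hist)
print(frac)  # [1.  0.5]
```

A persistent bias like the one in Figures 4 and 5 shows up as a fraction well below the level expected for a consistent filter.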

Second, to compare the algorithms under different simulation conditions, the average root mean square (RMS) position and velocity errors of the EKF without bias calibration, the EKF with bias calibration and the presented method are plotted against the measurement bias in Figures 8 and 9.

Figures 8 and 9 show that the estimation error of the EKF without bias calibration grows as the measurement bias increases, whereas the effect of the measurement bias on the navigation performance is efficiently suppressed when the EKF with bias calibration is adopted. Moreover, the PQEKF-based calibration method achieves superior performance owing to its ability to select a suitable process noise covariance matrix. This indicates that the presented method is insensitive to the inter-star angle measurement bias.

In addition, the effect of the measurement noise on the PQEKF algorithm is examined through simulations. When the standard deviation of the measurement noise is varied over the range [0.6, 1.6] mas, the RMS position errors of the EKF and the PQEKF are as illustrated in Figure 10. The results show that, compared with the EKF, the PQEKF is more effective at suppressing the unfavorable effect of the measurement noise.

As Algorithm 2 shows, the PQEKF contains multiple EKFs. In the simulation, the execution time of the PQEKF is several times longer than that of the EKF. Nevertheless, one iteration of the PQEKF can easily be completed within one observation period of the star sensors. For the considered system, the state space or the action space of the Q-learning approach could be further optimized to reduce the computational cost of the PQEKF. For a complicated practical system with a large state space or action space, artificial neural network approximation or dedicated hardware can be introduced to implement the algorithm. In addition, a dynamic state space whose bounds are adjusted automatically is expected to be designed in future work.

Next, since the PQEKF is an improved version of the QLEKF, it is compared with the QLEKF, as well as with the EKF and the AEKF, for the relativistic navigation system through Monte Carlo trials. The position RMS error curves of the calibration methods based on the EKF, the AEKF, the QLEKF and the PQEKF are plotted in Figure 11, and the statistical values of the navigation accuracy for the different methods are summarized in Table 2.
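RMS error curves and average RMS values of this kind can be computed from Monte Carlo error histories as sketched below. The definition used here (RMS of the three-axis position error norm across runs at each epoch, then averaged over time) is an assumption, because the exact averaging convention is not restated in this section; the helper names are hypothetical.

```python
import numpy as np

def rms_curve(pos_errors):
    """Position RMS error at each epoch over Monte Carlo runs.

    pos_errors : (n_runs, T, 3) three-axis position errors
    Returns a (T,) array: sqrt(mean over runs of |e|^2).
    """
    return np.sqrt(np.mean(np.sum(pos_errors**2, axis=-1), axis=0))

def average_rms(pos_errors, skip=0):
    """Time-averaged RMS error, optionally skipping an initial convergence transient."""
    return float(np.mean(rms_curve(pos_errors)[skip:]))

# Usage: two runs, two epochs; only the first epoch has a nonzero error
e = np.zeros((2, 2, 3))
e[0, 0] = [3.0, 4.0, 0.0]   # run 1, epoch 1: |e| = 5
e[1, 0] = [0.0, 0.0, 5.0]   # run 2, epoch 1: |e| = 5
print(rms_curve(e), average_rms(e))  # [5. 0.] 2.5
```

The `skip` argument matters in practice: including the initial convergence transient can dominate a time average.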

Figure 11 and Table 2 show that the PQEKF outperforms the QLEKF (average position RMS error of 215.5 m versus 328.3 m), as two learning agents select the appropriate $Q_{rv}^{(s_{rv,\min})}$ and $Q_b^{(s_{b,\min})}$ in parallel, whereas the QLEKF searches over the whole $Q_k$. We believe that the design of the PQEKF is also more flexible than that of the QLEKF, as different scale factors can be adopted to tune the different sub-matrices of the process noise covariance matrix.

Finally, the influence of the state space discretization on the positioning accuracy of the PQEKF algorithm for the relativistic navigation system is analyzed. The position RMS error curves of the PQEKF with the number of states set to five, seven and nine are shown in Figure 12.

Figure 12 shows that the position estimation error varies little across the different numbers of states over most of the simulation. This indicates that, within this range, the number of states has no evident influence on the estimation accuracy of the PQEKF. In the considered scenario, two predetermined sets, each with a small number of design values for $Q_{rv}$ and $Q_b$, are beneficial for improving the performance of the calibration method.

The above simulation analysis confirms that the presented method is well suited to the relativistic navigation system that requires calibration of the inter-star angle measurement bias. Under the simulation conditions described in Section 4.1, the achievable spacecraft navigation accuracy is on the order of a few hundred meters, which is sufficient for most orbital control missions.

5. Conclusions

This paper presents an inter-star angle measurement bias calibration method for the spacecraft relativistic navigation system. The proper design of the process noise covariance matrix is critical for accurate calibration. To improve the calibration accuracy and the navigation performance, the Q-learning approach is combined with the EKF to tune the process noise covariance matrix adaptively and online, based on the measurement data acquired from onboard star sensors. The PQEKF algorithm is developed as the navigation filter, in which two learning agents are implemented in parallel to select the appropriate sub-matrices related to the kinematic state and the calibration parameters, respectively. The simulation results show that the navigation performance of the presented method is superior to that of the EKF, the AEKF and the QLEKF in the presence of measurement bias, demonstrating the efficiency of the calibration method and the PQEKF algorithm. This study introduces a hybrid framework that incorporates reinforcement learning into the navigation filter, which can serve as a foundation for improving the state estimation accuracy in potential applications of relativistic navigation for Earth satellites or deep space explorers.

Author Contributions

Conceptualization, K.X. and Q.Z.; methodology, K.X.; software, Q.Z.; validation, K.X., Q.Z. and L.Y.; formal analysis, K.X.; writing—original draft preparation, K.X.; writing—review and editing, Q.Z.; supervision, L.Y.; project administration, K.X.; funding acquisition, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62394354.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of Theorem 1.

First, mathematical induction is adopted to prove the following inequality

$$Q^{(i)}(s,a) \leq Q^{(i-1)}(s,a). \tag{A1}$$

For $i = 1$, from (10), we have

$$Q^{(1)}(s,a) = U(s,a) + \gamma Q^{(0)}(s', a^{(0)}(s')). \tag{A2}$$

Considering the condition shown in (12), we obtain

$$Q^{(1)}(s,a) \leq Q^{(0)}(s,a). \tag{A3}$$

It is seen from (A3) that inequality (A1) holds for $i = 1$. Assume that inequality (A1) holds for $i = l$, i.e.,

$$Q^{(l)}(s,a) \leq Q^{(l-1)}(s,a). \tag{A4}$$

It can be derived from (10) and (11) that

$$\begin{aligned} Q^{(l)}(s,a) &= U(s,a) + \gamma Q^{(l-1)}(s', a^{(l-1)}(s')) \\ &\geq U(s,a) + \gamma Q^{(l)}(s', a^{(l-1)}(s')) \\ &\geq U(s,a) + \gamma \min_{a' \in A} Q^{(l)}(s', a') \\ &= U(s,a) + \gamma Q^{(l)}(s', a^{(l)}(s')) \\ &= Q^{(l+1)}(s,a). \end{aligned} \tag{A5}$$

From (A5), inequality (A1) holds for $i = l + 1$. Thus, inequality (A1) holds for $i = 1, 2, \dots$, and the mathematical induction is complete.

Second, considering that $\{Q^{(i)}(s,a)\}$ is a bounded non-increasing sequence, let

$$Q^{(\infty)}(s,a) = \lim_{i \to \infty} Q^{(i)}(s,a) \tag{A6}$$

$$a^{(\infty)}(s') = \arg\min_{a' \in A} Q^{(\infty)}(s', a'). \tag{A7}$$

Taking the limit of both sides of (10) yields

$$Q^{(\infty)}(s,a) = U(s,a) + \gamma Q^{(\infty)}(s', a^{(\infty)}(s')). \tag{A8}$$

Equation (A8) has essentially the same form as (15). This completes the proof of Theorem 1. □
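Theorem 1 can be checked numerically on a small deterministic problem. The sketch below uses hypothetical random costs and transitions and runs iteration (10) from the constant initialization $Q^{(0)} = \max U/(1-\gamma)$, which satisfies $Q^{(1)} \leq Q^{(0)}$ as required in (A3). It then confirms that the sequence is non-increasing and converges to a fixed point of the form (A8).

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, gamma = 7, 3, 0.9
U = rng.uniform(0.5, 2.0, size=(n_s, n_a))     # hypothetical one-step costs U(s, a)
nxt = rng.integers(0, n_s, size=(n_s, n_a))    # hypothetical deterministic s' for (s, a)

def bellman(Q):
    """Iteration (10): Q_new(s, a) = U(s, a) + gamma * min_a' Q(s', a')."""
    return U + gamma * Q[nxt].min(axis=-1)

# Constant initialization with Q^(1) <= Q^(0), the property used in (A3)
Q = np.full((n_s, n_a), U.max() / (1.0 - gamma))
history = [Q]
for _ in range(300):
    Q = bellman(Q)
    history.append(Q)

# (A1): each iterate is no larger than the previous one (up to round-off)
monotone = all(np.all(b <= a + 1e-12) for a, b in zip(history, history[1:]))
# (A8): the limit is a fixed point of the iteration
residual = np.max(np.abs(bellman(Q) - Q))
print(monotone, residual < 1e-10)  # True True
```

The initialization matters: starting from zeros, the same iteration converges to the same fixed point but increases monotonically instead, so the non-increasing property of (A1) relies on condition (12).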

Appendix B

Proof of Lemma 1.

Mathematical induction is adopted to prove Lemma 1. For $i = 1$, let

$$a_1^{(0)}(s') = \arg\min_{a_1' \in A} Q_1^{(0)}(s', a_1') \tag{A9}$$

and

$$a_2^{(0)}(s') = \arg\min_{a_2' \in A} Q_2^{(0)}(s', a_2'). \tag{A10}$$

Considering the condition shown in (19), we have

$$Q_1^{(0)}(s', a_1^{(0)}(s')) \leq Q_1^{(0)}(s', a_2^{(0)}(s')) \leq Q_2^{(0)}(s', a_2^{(0)}(s')). \tag{A11}$$

It is derived from (10) and (A11) that

$$Q_1^{(1)}(s,a) = U(s,a) + \gamma Q_1^{(0)}(s', a_1^{(0)}(s')) \leq U(s,a) + \gamma Q_2^{(0)}(s', a_2^{(0)}(s')) = Q_2^{(1)}(s,a). \tag{A12}$$

It is seen from (A12) that inequality (20) holds for $i = 1$. Assume that inequality (20) holds for $i = l$, i.e.,

$$Q_1^{(l)}(s,a) \leq Q_2^{(l)}(s,a). \tag{A13}$$

Let

$$a_1^{(l)}(s') = \arg\min_{a_1' \in A} Q_1^{(l)}(s', a_1') \tag{A14}$$

and

$$a_2^{(l)}(s') = \arg\min_{a_2' \in A} Q_2^{(l)}(s', a_2'). \tag{A15}$$

From (A13), we obtain

$$Q_1^{(l)}(s', a_1^{(l)}(s')) \leq Q_1^{(l)}(s', a_2^{(l)}(s')) \leq Q_2^{(l)}(s', a_2^{(l)}(s')). \tag{A16}$$

It is derived from (10) and (A16) that

$$Q_1^{(l+1)}(s,a) = U(s,a) + \gamma Q_1^{(l)}(s', a_1^{(l)}(s')) \leq U(s,a) + \gamma Q_2^{(l)}(s', a_2^{(l)}(s')) = Q_2^{(l+1)}(s,a). \tag{A17}$$

From (A17), inequality (20) holds for $i = l + 1$. Thus, inequality (20) holds for $i = 1, 2, \dots$. This completes the proof of Lemma 1. □

Appendix C

Proof of Lemma 2.

Mathematical induction is adopted to prove Lemma 2. For $t = 1$, it follows from (21) and (22) that

$$(L^{(1)} \bar{Q}^{(i)})(s,a) = U(s,a) + \gamma \min_{a' \in A} \bar{Q}^{(i)}(s', a') = U(s,a) + \gamma \min_{a' \in A} Q^{(i)}(s', a') + \gamma \epsilon. \tag{A18}$$

From (17), we have

$$Q^{(i+1)}(s,a) = U(s,a) + \gamma \min_{a' \in A} Q^{(i)}(s', a'). \tag{A19}$$

Inserting (A19) into (A18), we obtain

$$(L^{(1)} \bar{Q}^{(i)})(s,a) = Q^{(i+1)}(s,a) + \gamma \epsilon. \tag{A20}$$

It is seen from (A20) that equality (23) holds for $t = 1$. Assume that equality (23) holds for $t = l$, i.e.,

$$(L^{(l)} \bar{Q}^{(i)})(s,a) = Q^{(i+l)}(s,a) + \gamma^{l} \epsilon. \tag{A21}$$

It follows from (22) and (A21) that

$$(L^{(l+1)} \bar{Q}^{(i)})(s,a) = U(s,a) + \gamma \min_{a' \in A} (L^{(l)} \bar{Q}^{(i)})(s', a') = U(s,a) + \gamma \min_{a' \in A} Q^{(i+l)}(s', a') + \gamma^{l+1} \epsilon. \tag{A22}$$

Considering that

$$Q^{(i+l+1)}(s,a) = U(s,a) + \gamma \min_{a' \in A} Q^{(i+l)}(s', a'), \tag{A23}$$

Equation (A22) becomes

$$(L^{(l+1)} \bar{Q}^{(i)})(s,a) = Q^{(i+l+1)}(s,a) + \gamma^{l+1} \epsilon. \tag{A24}$$

From (A24), equality (23) holds for $t = l + 1$. Thus, equality (23) holds for $t = 1, 2, \dots$. This completes the proof of Lemma 2. □

Appendix D

Proof of Theorem 2.

It is easy to see from (25) that

$$\bar{\epsilon}_i \geq Q^{(i)}(s,a) - Q^{(i-1)}(s,a) \tag{A25}$$

or

$$Q^{(i)}(s,a) \leq Q^{(i-1)}(s,a) + \bar{\epsilon}_i. \tag{A26}$$

According to Lemma 1, we obtain the following inequality

$$Q^{(i+1)}(s,a) \leq (L^{(1)}(Q^{(i-1)} + \bar{\epsilon}_i))(s,a). \tag{A27}$$

According to Lemma 2, the right-hand side of (A27) can be written as

$$(L^{(1)}(Q^{(i-1)} + \bar{\epsilon}_i))(s,a) = Q^{(i)}(s,a) + \gamma \bar{\epsilon}_i. \tag{A28}$$

Substituting (A28) into (A27) yields

$$Q^{(i+1)}(s,a) \leq Q^{(i)}(s,a) + \gamma \bar{\epsilon}_i. \tag{A29}$$

Furthermore, it follows from Lemmas 1 and 2 that

$$Q^{(i+2)}(s,a) \leq (L^{(1)}(Q^{(i)} + \gamma \bar{\epsilon}_i))(s,a) \tag{A30}$$

and

$$(L^{(1)}(Q^{(i)} + \gamma \bar{\epsilon}_i))(s,a) = Q^{(i+1)}(s,a) + \gamma^{2} \bar{\epsilon}_i \leq Q^{(i)}(s,a) + \gamma \bar{\epsilon}_i + \gamma^{2} \bar{\epsilon}_i. \tag{A31}$$

Substituting (A31) into (A30), we have

$$Q^{(i+2)}(s,a) \leq Q^{(i)}(s,a) + (\gamma + \gamma^{2}) \bar{\epsilon}_i. \tag{A32}$$

With a similar process, it is easy to verify that

$$Q^{(i+t)}(s,a) \leq Q^{(i)}(s,a) + \Big(\sum_{j=1}^{t} \gamma^{j}\Big) \bar{\epsilon}_i. \tag{A33}$$

Letting $t \to \infty$ in (A33) gives

$$\lim_{t \to \infty} Q^{(i+t)}(s,a) \leq Q^{(i)}(s,a) + \frac{\gamma}{1-\gamma} \bar{\epsilon}_i. \tag{A34}$$

According to Theorem 1, we obtain

$$Q^{*}(s,a) \leq Q^{(i)}(s,a) + \frac{\gamma}{1-\gamma} \bar{\epsilon}_i. \tag{A35}$$

Similarly, applying Lemmas 1 and 2, we obtain the following inequality from (26):

$$Q^{*}(s,a) \geq Q^{(i)}(s,a) + \frac{\gamma}{1-\gamma} \underline{\epsilon}_i. \tag{A36}$$

Combining inequalities (A35) and (A36), we conclude that inequality (24) holds. This completes the proof of Theorem 2. □
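The bound (24) can likewise be checked on a toy problem. The sketch computes $Q^*$ by iterating to convergence and then verifies $Q^{(i)} + \frac{\gamma}{1-\gamma}\underline{\epsilon}_i \leq Q^* \leq Q^{(i)} + \frac{\gamma}{1-\gamma}\bar{\epsilon}_i$, with $\bar{\epsilon}_i$ and $\underline{\epsilon}_i$ taken as the maximum and minimum of $Q^{(i)} - Q^{(i-1)}$ over all $(s, a)$. That reading of (25) and (26) is an assumption, consistent with (A37). The costs, transitions and zero initialization are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma = 7, 3, 0.9
U = rng.uniform(0.5, 2.0, size=(n_s, n_a))    # hypothetical costs
nxt = rng.integers(0, n_s, size=(n_s, n_a))   # hypothetical transitions

def bellman(Q):
    """Iteration (10): Q_new(s, a) = U(s, a) + gamma * min_a' Q(s', a')."""
    return U + gamma * Q[nxt].min(axis=-1)

# Reference fixed point Q* from a long run of the iteration
Q_star = np.zeros((n_s, n_a))
for _ in range(2000):
    Q_star = bellman(Q_star)

c = gamma / (1.0 - gamma)
Q_prev = np.zeros((n_s, n_a))
ok = True
for i in range(1, 31):
    Q_i = bellman(Q_prev)
    diff = Q_i - Q_prev
    eps_hi, eps_lo = diff.max(), diff.min()   # bar-epsilon_i and underline-epsilon_i
    lower = Q_i + c * eps_lo
    upper = Q_i + c * eps_hi
    ok &= bool(np.all(lower <= Q_star + 1e-9) and np.all(Q_star <= upper + 1e-9))
    Q_prev = Q_i
print(ok)  # True
```

Because the width of the bound is $\frac{\gamma}{1-\gamma}(\bar{\epsilon}_i - \underline{\epsilon}_i)$, it shrinks as successive iterates stop changing, which gives a practical stopping criterion.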

Appendix E

Proof of Theorem 3.

According to Theorem 2, we have

$$Q^{(i)}(s,a) + \frac{\gamma}{1-\gamma} \min_{s \in S, a \in A} [Q^{(i)}(s,a) - Q^{(i-1)}(s,a)] \leq Q^{*}(s,a). \tag{A37}$$

From (27), we obtain

$$Q^{(i)}(s,a) \geq \hat{Q}^{(i)}(s,a) - \varepsilon_i(s,a) \tag{A38}$$

and

$$Q^{(i-1)}(s,a) \leq \hat{Q}^{(i-1)}(s,a) + \varepsilon_{i-1}(s,a). \tag{A39}$$

Substituting (A38) and (A39) into (A37) yields

$$\hat{Q}^{(i)}(s,a) - \varepsilon_i(s,a) + \frac{\gamma}{1-\gamma} \min_{s \in S, a \in A} [\hat{Q}^{(i)}(s,a) - \varepsilon_i(s,a) - \hat{Q}^{(i-1)}(s,a) - \varepsilon_{i-1}(s,a)] \leq Q^{*}(s,a). \tag{A40}$$

From (31), it follows from (A40) that

$$\hat{Q}^{(i)}(s,a) - \bar{\varepsilon}_i + \frac{\gamma}{1-\gamma} \min_{s \in S, a \in A} [\hat{Q}^{(i)}(s,a) - \bar{\varepsilon}_i - \hat{Q}^{(i-1)}(s,a) - \bar{\varepsilon}_i] \leq Q^{*}(s,a) \tag{A41}$$

or

$$\hat{Q}^{(i)}(s,a) - \underline{c}_i \leq Q^{*}(s,a). \tag{A42}$$

It is derived with a similar process that

$$\hat{Q}^{(i)}(s,a) + \bar{c}_i \geq Q^{*}(s,a). \tag{A43}$$

Combining inequalities (A42) and (A43), we conclude that inequality (28) holds. This completes the proof of Theorem 3. □

References

1. Huang, J.; Yang, R.; Zhan, X. Constraint Navigation Filter for Space Vehicle Autonomous Positioning with Deficient GNSS Measurements. Aerosp. Sci. Technol. 2022, 120, 107291.
2. Ely, T.A.; Seubert, J.; Bradley, N.; Drain, T.; Bhaskaran, S. Radiometric Autonomous Navigation Fused with Optical for Deep Space Exploration. J. Astronaut. Sci. 2021, 68, 300–325.
3. Gallo, E.; Barrientos, A. Reduction of GNSS-Denied Inertial Navigation Errors for Fixed Wing Autonomous Unmanned Air Vehicles. Aerosp. Sci. Technol. 2022, 120, 107237.
4. Hu, J.; Liu, J.; Wang, Y.; Ning, X. INS/CNS/DNS/XNAV Deep Integrated Navigation in a Highly Dynamic Environment. Aircr. Eng. Aerosp. Technol. 2023, 95, 180–189.
5. Yang, Y.; Han, X.; Song, N.; Wang, Z. A New Method to Improve the Measurement Accuracy of Autonomous Astronomical Navigation. J. Math. 2022, 2022, 3649662.
6. Wang, Y.; Yan, T.; Wang, L. Development Situation and Trend of Space Intelligent Navigation Technology. Aerosp. Control Appl. 2022, 48, 9–17.
7. Zhou, B.; Li, Y.; Zhang, A.; Cui, S. Observability Analysis of Satellite Autonomous Orbit Determination with Modeling and Measurement Errors. Chin. Space Sci. Technol. 2023, 43, 25–34.
8. Christian, J.A. Optical Navigation Using Planet’s Centroid and Apparent Diameter in Image. J. Guid. Control Dyn. 2015, 38, 192–204.
9. Hou, B.; Wang, J.; Zhou, H.; He, Z.; Li, D.; Liu, X. Guidepost-based Autonomous Orbit Determination Method for GEO Satellite. Adv. Space Res. 2021, 67, 1090–1113.
10. Turan, E.; Speretta, S.; Gill, E. Autonomous navigation for deep space small satellites: Scientific and technological advances. Acta Astronaut. 2022, 193, 56–74.
11. Sheikh, S.I.; Pines, D.J. Spacecraft Navigation Using X-Ray Pulsars. J. Guid. Control Dyn. 2006, 29, 49–63.
12. Wang, Y.; Zheng, W.; Ge, M.; Zheng, S.; Zhang, S. Use of Statistical Linearization for Nonlinear Least-Squares Problems in Pulsar Navigation. J. Guid. Control Dyn. 2023, 46, 1850–1855.
13. Zoccarato, P.; Larese, S.; Naletto, G.; Zampieri, L.; Brotto, F. Deep Space Navigation by Optical Pulsars. J. Guid. Control Dyn. 2023, 46, 1501–1511.
14. Zhang, W. A Study of the Navigation Technology and Application Based on Astronomical Spectral Velocity Measurement. Navig. Control 2020, 19, 64–73.
15. Liu, J.; Wang, T.; Ning, X.; Kang, Z. Modelling and analysis of celestial Doppler difference velocimetry navigation considering solar characteristics. IET Radar Sonar Navig. 2020, 14, 1897–1904.
16. Gui, M.; Yang, H.; Ning, X.; Ye, W.; Wei, C. A Novel Sun Direction/Solar Disk Velocity Difference Integrated Navigation Method Against Installation Error of Spectrometer Array. IEEE Sens. J. 2023, 23, 17480–17490.
17. Christian, J.A. StarNAV: Autonomous Optical Navigation of a Spacecraft by the Relativistic Perturbation of Starlight. Sensors 2019, 19, 4064.
18. Bailer-Jones, C.A.L. Lost in Space? Relativistic Interstellar Navigation using an Astrometric Star Catalog. Publ. Astron. Soc. Pac. 2021, 133, 074502.
19. McKee, P.; Kowalski, J.; Christian, J. Navigation and star identification for an interstellar mission. Acta Astronaut. 2022, 192, 390–401.
20. Klioner, S. A Practical Relativistic Model for Microarcsecond Astrometry in Space. Astron. J. 2003, 125, 1580–1597.
21. McKee, P.; Nguyen, H.; Kudenov, M.W.; Christian, J.A. StarNAV with a wide field-of-view optical sensor. Acta Astronaut. 2022, 197, 220–234.
22. Yucalan, D.; Peck, M. Autonomous Navigation of Relativistic Spacecraft in Interstellar Space. J. Guid. Control Dyn. 2021, 44, 1106–1115.
23. Xiong, K.; Wei, C. Integrated Celestial Navigation for Spacecraft Using Interferometer and Earth Sensor. Proc. Inst. Mech. Eng. Part G: J. Aerosp. Eng. 2020, 234, 2248–2262.
24. Xiong, K.; Wei, C.; Zhou, P. Integrated Autonomous Optical Navigation Using Q-Learning Extended Kalman Filter. Aircr. Eng. Aerosp. Technol. 2022, 94, 848–861.
25. Gui, M.; Wei, Y.; Ning, X. Celestial angle measurement navigation for Mars probe considering relativistic effect. J. Deep Space Explor. 2023, 10, 126–132.
26. Liu, F.; Li, M.; Peng, Y.; Sun, J.; Liu, J. An autonomous navigation method for spacecraft in cislunar space using stellar aberration observation. J. Deep Space Explor. 2023, 10, 159–168.
27. Ullah, I.; Fayaz, M.; Naveed, N.; Kim, D. ANN Based Learning to Kalman Filter Algorithm for Indoor Environment Prediction in Smart Greenhouse. IEEE Access 2020, 8, 159371–159388.
28. Or, B.; Klein, I. A Hybrid Model and Learning-Based Adaptive Navigation Filter. IEEE Trans. Instrum. Meas. 2022, 71, 1–11.
29. Ning, X.; Li, Z.; Wu, W.; Yang, Y.; Fang, J.; Liu, G. Recursive Adaptive Filter Using Current Innovation for Celestial Navigation During the Mars Approach Phase. Sci. China-Inf. Sci. 2017, 60, 032205.
30. Li, W.; Sun, S.; Jia, Y.; Du, J. Robust unscented Kalman filter with adaptation of process and measurement noise covariances. Digit. Signal Process. 2016, 48, 93–103.
31. Jia, W.; Tian, Y.; Duan, H.; Luo, R.; Lian, J.; Ruan, C.; Zhao, D.; Li, C. Autonomous Navigation Control Based on Improved Adaptive Filtering for Agricultural Robot. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420925357.
32. Xiong, K.; Zhou, P.; Wei, C. Autonomous Navigation of Unmanned Aircraft Using Space Target LOS Measurements and QLEKF. Sensors 2022, 22, 6992.
33. Tao, W.; Zhang, J.; Hu, H.; Zhang, J.; Sun, H.; Zeng, Z.; Song, J.; Wang, J. Intelligent Navigation for the Cruise Phase of Solar System Boundary Exploration Based on Q-learning EKF. Complex Intell. Syst. 2024, 2, 2653–2672.
34. Xiong, K.; Wei, C.; Zhang, H. Q-learning for noise covariance adaptation in extended Kalman filter. Asian J. Control 2021, 23, 1803–1816.
35. Chen, C.; Wu, X.; Bo, Y.; Chen, Y.; Liu, Y.; Alsaadi, F.E. SARSA in extended Kalman Filter for complex urban environments positioning. Int. J. Syst. Sci. 2021, 52, 3044–3059.
36. Yin, Y.; Li, S.E.; Tang, K.; Cao, W.; Wu, W.; Li, H. Approximate optimal filter design for vehicle system through Actor-Critic reinforcement learning. Automot. Innov. 2022, 5, 415–426.
37. Crassidis, J.L.; Markley, F.L.; Cheng, Y. Survey of Nonlinear Attitude Estimation Methods. J. Guid. Control Dyn. 2007, 30, 12–28.
38. Hu, Z.; Gong, W. Constrained Evolutionary Optimization Based on Reinforcement Learning Using the Objective Function and Constraints. Knowl.-Based Syst. 2022, 237, 107731.
39. Jang, B.; Kim, M.; Harerimana, G.; Kim, J.W. Q-learning Algorithms: A Comprehensive Classification and Applications. IEEE Access 2019, 7, 133653–133667.
40. Li, Y.; Yang, C.; Hou, Z.; Feng, Y.; Yin, C. Data-driven approximate Q-learning stabilization with optimality error bound analysis. Automatica 2019, 103, 435–442.
41. Shi, H.; Li, X.; Hwang, K.; Pan, W.; Xu, G. Decoupled Visual Servoing with Fuzzy Q-learning. IEEE Trans. Ind. Inform. 2018, 14, 241–252.
42. Wu, G. UAV-Based Interference Source Localization: A Multi-model Q-learning Approach. IEEE Access 2019, 7, 137982–137991.
43. Maia, R.; Mendes, J.; Araujo, R.; Silva, M.; Nunes, U. Regenerative Braking System Modeling by Fuzzy Q-Learning. Eng. Appl. Artif. Intell. 2020, 93, 103712.
44. Wei, Q.; Lewis, F.L.; Sun, Q.; Yan, P.; Song, R. Discrete-time Deterministic Q-learning: A Novel Convergence Analysis. IEEE Trans. Cybern. 2017, 47, 1224–1237.

Figure 1. Concept of relativistic navigation.

Figure 2. Diagram of relativistic navigation and calibration method.

Figure 3. Diagram of PQEKF algorithm.

Figure 4. Position estimation error of traditional EKF without measurement bias calibration.

Figure 5. Velocity estimation error of traditional EKF without measurement bias calibration.

Figure 6. Position estimation error of calibration method based on PQEKF.

Figure 7. Velocity estimation error of calibration method based on PQEKF.

Figure 8. Position RMS errors of different methods vs. measurement bias.

Figure 9. Velocity RMS errors of different methods vs. measurement bias.

Figure 10. RMS errors as functions of measurement noise standard deviation.

Figure 11. Position RMS error curves of different navigation filters.

Figure 12. Position RMS error curves of PQEKF algorithms for different state numbers.

Table 1. Simulation parameter settings.

Category	Parameter	Value
Simulation conditions	Duration of simulation	2.5 days
	Measurement noise standard deviation	1 mas
	Measurement bias	0.3 mas
	Update frequency	0.1 Hz
EKF parameters	Initial estimation error covariance	$P_0 = \mathrm{diag}([p_r, p_r, p_r, p_v, p_v, p_v, p_b, p_b, p_b])$, $p_r = 300$ m, $p_v = 0.03$ m/s, $p_b = 0.1$ mas
	Process noise covariance	$Q_k = \mathrm{diag}([q_r, q_r, q_r, q_v, q_v, q_v, q_b, q_b, q_b])$, $q_r = 1 \times 10^{-5}$ m, $q_v = 1 \times 10^{-5}$ m/s, $q_b = 0.03$ mas
	Measurement noise covariance	$R_k = \mathrm{diag}([\sigma_{ISA}, \sigma_{ISA}, \sigma_{ISA}])$, $\sigma_{ISA} = 1$ mas
PQEKF parameters	State space for agent 1	$\{15^{-2} Q_{rv,k}, 10^{-2} Q_{rv,k}, 5^{-2} Q_{rv,k}, Q_{rv,k}, 5^{2} Q_{rv,k}, 10^{2} Q_{rv,k}, 15^{2} Q_{rv,k}\}$, $Q_{rv,k} = Q_k(1{:}6, 1{:}6)$
	State space for agent 2	$\{150^{-2} Q_{b,k}, 100^{-2} Q_{b,k}, 50^{-2} Q_{b,k}, Q_{b,k}, 50^{2} Q_{b,k}, 100^{2} Q_{b,k}, 150^{2} Q_{b,k}\}$, $Q_{b,k} = Q_k(7{:}9, 7{:}9)$
	Window size	$K = 50$
	Discount factor	$\gamma = 0.9$
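The candidate design values in Table 1, and the measurement bookkeeping they imply, can be generated as follows. This is a sketch: the 0-based slices mirror the paper's 1-based ranges, and treating each window of K = 50 measurements as one episode is our reading of Table 1.

```python
import numpy as np

# Diagonal process noise from Table 1 (values as listed there)
q_r, q_v, q_b = 1e-5, 1e-5, 0.03
Q_k = np.diag([q_r] * 3 + [q_v] * 3 + [q_b] * 3)
Q_rv_k = Q_k[0:6, 0:6]   # Q_k(1:6, 1:6)
Q_b_k = Q_k[6:9, 6:9]    # Q_k(7:9, 7:9)

# Scale factors defining the seven discrete states of each agent
rv_factors = [15.0**-2, 10.0**-2, 5.0**-2, 1.0, 5.0**2, 10.0**2, 15.0**2]
b_factors = [150.0**-2, 100.0**-2, 50.0**-2, 1.0, 50.0**2, 100.0**2, 150.0**2]
S_rv = [f * Q_rv_k for f in rv_factors]
S_b = [f * Q_b_k for f in b_factors]

# Measurement bookkeeping implied by the simulation settings
duration_s = 2.5 * 86400            # 2.5 days in seconds
n_updates = round(duration_s * 0.1) # 0.1 Hz update frequency
n_windows = n_updates // 50         # window size K = 50
print(len(S_rv), len(S_b), n_updates, n_windows)  # 7 7 21600 432
```

The candidate sets for agent 2 span eight orders of magnitude, wider than for agent 1, which reflects how uncertain the appropriate process noise for the calibration parameters is.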

Table 2. Comparison of calibration methods based on EKF, AEKF, QLEKF and PQEKF.

Calibration Method	Average RMS Position Error (m)	Average RMS Velocity Error (m/s)	Average RMS Measurement Bias Error (mas)
EKF	609.8	0.073	0.275
AEKF	371.1	0.045	0.187
QLEKF	328.3	0.036	0.088
PQEKF	215.5	0.025	0.055
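The relative improvements summarized in Table 2 follow from simple arithmetic on the average position RMS errors:

```python
# Average position RMS errors (m) from Table 2
pos_rms = {"EKF": 609.8, "AEKF": 371.1, "QLEKF": 328.3, "PQEKF": 215.5}

# Relative reduction achieved by the PQEKF with respect to each baseline
reduction = {name: (v - pos_rms["PQEKF"]) / v
             for name, v in pos_rms.items() if name != "PQEKF"}
for name, r in reduction.items():
    print(f"{name}: {100 * r:.1f}% lower position RMS error with PQEKF")
# EKF: 64.7%, AEKF: 41.9%, QLEKF: 34.4%
```

For example, relative to the QLEKF, the PQEKF reduces the average position RMS error by about one third.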

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xiong, K.; Zhao, Q.; Yuan, L. Calibration Method for Relativistic Navigation System Using Parallel Q-Learning Extended Kalman Filter. Sensors 2024, 24, 6186. https://doi.org/10.3390/s24196186

