**1. Introduction**

As important spacecraft, study of the reliability of artificial satellites is a hot topic at present. Generally, an artificial satellite consists of a structural system, temperature control system, attitude control system, measurement and control system, and power supply system. The mission of the attitude control system is to help the satellite achieve attitude stability or attitude maneuver, to further guarantee the normal operation of the satellite platform and the normal work of the payload.

Satellites have high requirements for attitude accuracy, which makes the task of attitude control systems very heavy. Health state and reliability are the basic guarantee for the normal operation of satellites [1]. Therefore, research on the theory and technology of automatic fault diagnosis and anomaly detection of satellite attitude control systems will further ensure the safe and reliable operation of on-orbit aircraft, reducing the possibility of space accidents.

In recent years, many scholars have conducted research on fault diagnosis technology or health management technology. These research contents can be roughly divided into three main aspects. First, when there is a specific research object, a feasible solution is to construct a simulation model of the object by analyzing the working mechanism and failure mode of the object. The data generated based on the simulation data is used as the

**Citation:** Li, Y.; Lei, M.; Liu, P.; Wang, R.; Xu, M. A Novel Framework for Anomaly Detection for Satellite Momentum Wheel Based on Optimized SVM and Huffman-Multi-Scale Entropy. *Entropy* **2021**, *23*, 1062. https://doi.org/10.3390/ e23081062

Academic Editor: Donald J. Jacobs

Received: 2 July 2021 Accepted: 14 August 2021 Published: 17 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**<sup>\*</sup>** Correspondence: hitleimingjia@163.com

theoretical prediction value, and then the judgment criterion is designed to complete the detection task. Luo et al. propose an improved phenomenological model based on meshing vibration to generate fault simulation data [2]. Li et al. established an INS/ADS fault detection model based on kinematic equations, and combined an unscented Kalman filter (UKF) with Runge-Kutta to deal with the non-linear and discretization problem [3]. Second, some research aims at extracting the fault features by constructing more effective signal processing methods, such as the feature extraction method based on entropy value [4,5], the feature extraction method based on spectral kurtosis time (Spectral Kurtosis, SK) [6], or the Frequency domain feature extraction method [7]. To fully excavate the features of the momentum wheel telemetry signal, this paper uses a combination of time domain features, frequency domain features and complexity features for feature extraction. Considering that, compared with permutation, dispersion, hierarchy, etc., sample entropy has better consistency for different parameters, this paper chooses a complex quantification method based on sample entropy. Third, for the fault recognition process, various pattern recognition methods are used to learn the mapping relationship between features and failure modes, so as to realize automatic fault recognition [8].

Due to the extremely complex structure and working principle of the spacecraft itself, and the strong coupling between the sub-systems, it is very difficult to construct an accurate simulation model of the spacecraft or its components [9,10]. As the spacecraft is affected by the special space environment during its orbiting operation, it is extremely prone to unpredictable failures, for example, the circuit signal disturbance caused by electromagnetic background [11], the sudden change of attitude caused by the impact of space debris [12], etc. In addition, during the process of the spacecraft downloading telemetry data to the ground-based measurement and control station, data jumps and even partial loss can occur [13]. Therefore, the data generated by the simulation model is often difficult to simulate the actual telemetry data of the spacecraft, and it becomes very difficult to use the spacecraft anomaly detection method based on the physical simulation model in practical applications.

The fault diagnosis method based on the data mode does not impose necessary restrictions on the prior knowledge of the object or system (including mathematical models and expert experience, etc.), such as artificial neural network (ANN), support vector machine (SVM), Bayesian network (BN) and other health assessment methods.

ANN is a method that is widely used in fault identification problems. Multilayer Perceptron (MLP) is the most typical type of feedforward neural network model, which usually uses a BP algorithm to learn the parameters of the model. Kumar et al. proposed a method based on principal component analysis (PCA) and MLP to detect and classify the three-phase current signals online [14]. In addition, probabilistic neural network (PNN) [15], RBF neural network [16], extension neural network (ENN) [17] and recurrent neural network (RNN) [18–20] have also been applied to fault detection and diagnosis problems.

For high-dimensional identification problems in fault diagnosis, the SVM method based on the principle of structural risk minimization has been widely used in recent years [21–23]. Compared with the ANN method based on the principle of empirical risk minimization, the learning goal of the SVM is to learn the optimal classification hyperplane in the feature space. The ANN has the ability to deal with pattern recognition problems, but the sample size is large, and it takes a long time to adjust the network structure parameters. Bayesian decision-making has significant execution ability under the premise of considering prior probability, but good accuracy is based on a prior model with appropriate assumptions. Compared with the above methods, the SVM only needs a small number of samples for training and has better generalization ability. Therefore, this paper chooses SVM as the means of pattern recognition. In the field of fault diagnosis, research on the SVM method mainly focuses on two aspects of obtaining more accurate recognition accuracy, i.e., by optimizing the hyperparameters of the model and constructing a new kernel function. For specific recognition tasks, to optimize the hyperparameters of the model to obtain better recognition performance, many optimization methods are applied [24–26].

Liu et al. proposed a novel small sample data missing filling method based on support vector regression (SVR) and genetic algorithm (GA) to improve the equipment health diagnosis effect [25]. Particle swarm optimization (PSO) is a hyperparameter optimization algorithm which is used by Cuong-Le et al. for damage identifications [26]. In terms of constructing a new kernel function, Wang et al. proposed a kernel function selection mechanism under sparse representation and the superiority of the selection mechanism was performed in simulations and engineering experiments involving high-speed bearing fault diagnosis [27]. Although both GA and PSO can solve high-dimensional complex optimization problems well, in the iterative process of PSO, the particles can retain the memory of the good solution, but the GA cannot, so PSO can often converge to a better solution more quickly. Based on the above analysis, this paper uses PSO to optimize the multi-class SVM.

From the above analysis, it can be seen that the following problems still exist in the direct application of existing anomaly detection or fault diagnosis methods to the anomaly detection problem of the satellite momentum wheel.


Therefore, in response to the above problems, this article proposes a new method based on multi-type features fusion and improved SVM to handle the problem of anomaly detection for the satellite. The main contributions of the proposed framework can be summarized as follows:


The rest of this paper is organized as follows. Section 2 presents the scheme of the proposed anomaly detection framework. The construction method of multi-type feature sequence HMSE-T/F is provided in Section 3. In Section 4, the anomaly detection method based on multi-class SVM model and the improved adaptive particle swarm optimization (APSO) are stated. In Section 5, the performance of the proposed method is evaluated from different aspects. Finally, in Section 6, a comprehensive summary of this paper and prospects for future work are given.

### **2. The Scheme of the Proposed Anomaly Detection Framework**

*2.1. Description of Difficulties in Spacecraft Anomaly Detection*

In fact, since satellites are at normal working conditions at most of the time during their orbits, the proportion of normal data in the telemetry data collected on the ground is very high. For most detection methods that rely on plenty of training data, satellite telemetry data can provide very few abnormal or fault samples, and there are very few effective samples that can be used for classification model training. Therefore, some adaptive improvements are needed when using the classification model to detect anomalies in spacecraft.

Figure 1a shows the momentum wheel voltage change of a certain type of satellite within 10 days, and its sampling frequency is 0.125 Hz. Figure 1b shows a sudden voltage change in a certain type of satellite. Figure 1c is the frequency spectrum of the telemetry signal in Figure 1a,d is the partially enlarged view of Figure 1c. According to Figure 1a–d, apart from the feature of less abnormal data, satellite telemetry data also exhibits the characteristics of extremely low sampling frequency, slow data change over a long period of time, and many sudden abnormalities. Therefore, anomaly detection methods that rely on time domain and frequency domain feature extraction often find it difficult to distinguish the health status of their telemetry data.

**Figure 1.** A satellite's momentum wheel voltage telemetry data: (**a**) 10-day data sampled at a frequency of 0.125 Hz, (**b**) sample with sudden change, (**c**) frequency spectrum, (**d**) partially enlarged view of (**c**).

#### *2.2. The Proposed Anomaly Detection Framework*

To effectively solve the problem of satellite momentum wheel anomaly detection, a new anomaly detection framework based on multi-type feature extraction and fusion is proposed in this paper. The overall procedure of the proposed anomaly detection framework is shown in Figure 2. Specifically, the descriptions of each Step are detailed as follows.

**Figure 2.** The overall procedure of the proposed anomaly detection framework.

Step 1: Telemetry data collection.

When the satellite is in orbit, to obtain its internal operating status and further provide real-time data for the remote-control object, the sensors in the satellite telemetry system need to measure the operating status of each key component and convert it into electrical signals. After the signals of each channel are combined according to a certain system, they are transmitted to the ground telemetry equipment (including receiver, antenna and splitter demodulator) using radio communication technology, and the ground terminal equipment restores and stores the original parameter information of each channel through signal demodulation technology.

Step 2: Data preprocessing.

The collection process of telemetry data is interfered with by sensors, converters, and wireless transmission. The data obtained by the ground receiving end often produces abnormal jump points. These kind of data points that deviate from the change law of the measured signal are usually called abnormal outliers. The abnormal outliers of the telemetry data will provide wrong information and affect the processing and analysis results of the telemetry signal. Outlier elimination is an important part of telemetry data preprocessing. By eliminating random measurement values with large errors, the authenticity of telemetry data can be guaranteed to a certain extent, and the reliability of data analysis can be improved. Commonly used methods to eliminate outliers include visual inspection, mean square method, point discrimination, Letts criterion, etc. Different outlier elimination methods should be used for different types of telemetry data. Considering that this article mainly analyzes the telemetry data of the satellite momentum wheel, the outlier elimination method based on the Letts criterion is adopted.

The premise of the Letts criterion is that the distribution of the measured data is close to the normal distribution. Based on this assumption, the given confidence probability is 99.7% as the standard, and the standard deviation of three times the measured quantity is used as the basis. Any measurement value exceeding this limit is judged for wild value. For a given sequence of telemetry measurement values. For a given telemetry sequence *x* = {*xi*}, *i* = 1, ··· , *N*, the specific process of the method is as follows.

(1) Calculate the mean of the series:

$$\overline{\mathbf{x}} = \sum\_{i=1}^{N} \mathbf{x}\_i \tag{1}$$

(2) Calculate the standard deviation of the series:

$$
\sigma = \frac{1}{N} \sqrt{\sum\_{i=1}^{N} (\mathbf{x}\_i - \overline{\mathbf{x}})} \tag{2}
$$

(3) Eliminate outliers:

$$\begin{cases} \left| \left| \mathbf{x}\_{i} - \overline{\mathbf{x}} \right| \leq 3\sigma\_{\prime} \quad \text{not\ outlineders}, \quad \begin{array}{l} \text{keep} \\ \left| \mathbf{x}\_{i} - \overline{\mathbf{x}} \right| > 3\sigma\_{\prime} \quad \text{outliers}, \quad \begin{array}{l} \text{elseep} \\ \text{deletze} \end{array} \right. \end{cases} \tag{3}$$

In addition to the problem of outliers, the process of satellite telemetry data transmission to the ground is affected by the ionosphere, and data may be missing during the signal decoding process. A telemetry sequence that has many data problems should be discarded and not used as training data, but the missing value at a certain point in the sequence can be handled by the filling method. From the distribution of the missing values, they can be divided into missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). MCAR means that the law of missing values in the data is completely random and does not affect the unbiasedness of the overall sample. MAR means that the mechanism of missing data is not completely random. The missing data of this type depends on other variables. Such missing values are relatively rare in telemetry data. MNAR means that the missing data is related to the value of the variable itself.

The missing values in satellite telemetry data are generally MCAR, so this paper uses an interpolation method based on two short sequences before and after the missing point to fill in the missing values. Given the data sequence to be filled is *y* = {*yi*}, *i* = 1, ··· , *M*, The missing value to be filled is *y*∗. The auxiliary variable used to construct the regression equation is *x* = {*xi*}, *j* = 1, ··· , *M*. The auxiliary variable value corresponding to the missing value of the variable to be filled is *x*∗, and *x*∗ is a known variable. Use *x*, *y* to construct the regression equation:

$$y\_i = f^\*(x\_i) \tag{4}$$

where *f* ∗ needs to choose different regression models according to different telemetry data. Then the missing value is *y*∗ = *f*(*x*∗).

Step 3: Features extraction.

Considering the difficulty of using satellite telemetry data for anomaly detection as mentioned above, this paper adopts a time-frequency domain feature extraction and selection method based on feature quality evaluation. At the same time, a complexity feature extraction method based on Hoffman multi-scale entropy is proposed, which enriches signal feature types and provides effective feature learning samples for training satellite telemetry data anomaly detection models. The specific method of feature extraction is described in detail in Section 3.

Step 4: Obtaining the anomaly detection model.

This paper takes support vector machine (SVM) as the basic unit and uses a directed acyclic graph (DAG) principle to construct a satellite momentum wheel anomaly detection model based on the support vector machine. This model can effectively solve the multiclassification problem when some categories are difficult to distinguish. In addition, to improve the classification accuracy of the anomaly detection model, an improved particle

swarm optimization (PSO) algorithm is proposed to train SVMs. The specific method of Obtaining anomaly detection model is described in detail in Section 4.

#### **3. Multi-Type Feature Sequence HMSE-T/F Construction Method**

*3.1. Time/Frequency Domain Feature Extraction and Selection*

3.1.1. Time/Frequency-Domain Feature

The time domain signal is a time series in which time is the independent variable to describe the change of a certain physical quantity, and it is the most basic and most intuitive form of expression of the signal. The time domain signal reflects the corresponding relationship between real physical information and time. The processing of filtering, amplifying, statistical feature calculation, and correlation analysis of signals in the time domain is collectively referred to as time domain analysis.

When a device fails, its spectrum distribution may change. Like the statistical analysis of time-domain signals, this type of change can be described by statistical analysis of the signal's frequency spectrum.

Given a period of time domain signal *x*(*t*), the frequency spectrum of this signal is *y*(*k*), *k* = 1, ··· , *k*, *fk* is the k-th line of the spectrum. Then the time-domain statistical characteristics and frequency-domain statistical features of *x*(*t*) are shown in Table 1 [28].


**Table 1.** The time/frequency-domain statistical features of *x*(*t*).

3.1.2. Feature Evaluation and Selection

In this paper, two commonly used feature evaluation methods, Laplacian Score (LS) [29] and Relief-F Score (RFS) [30], are used to evaluate the effectiveness of the timedomain and frequency-domain features of the satellite momentum wheel telemetry signal. Feature selection is based on two different feature evaluation results, and the feature with the higher evaluation score is taken as the effective feature in the time/frequency domain.

(1) Laplacian Score (LS).

In practical problems, data of the same type are generally close to each other. Under this premise, the importance of describing features can be transformed into evaluating the local retention of features. The Laplace score is based on this idea. Let the data set be *<sup>X</sup>* <sup>∈</sup> <sup>R</sup>*m*×*n*, *Lr* is the LS of the *<sup>r</sup>*-th feature, *<sup>f</sup>ri* is the the *<sup>r</sup>*-th feature of the *<sup>i</sup>*-th sample. *Lr* can be calculated as follows.

Step 1: Construct a neighbor graph *G* containing *n* nodes, the *i*-th node corresponds to the *i*-th sample *xi*, if *x<sup>i</sup>* and *x<sup>j</sup>* are close to each other, that is, *x<sup>i</sup>* is within the *k*-neighbor range of *xj*, then an edge is constructed between nodes *x<sup>i</sup>* and *xj*. When the data labels are known, edges can be constructed directly between samples of the same type.

Step 2: If nodes *x<sup>i</sup>* and *x<sup>j</sup>* are connected, put *Sij* = *e xi*−*xj <sup>t</sup>* , where *t* is a suitable constant. Otherwise, put *Sij* = 0. The weight matrix *S* of the graph models the local structure of the data space.

Step 3: For the *r*-th feature, the *f<sup>r</sup>* and *D* can be defined as *f<sup>r</sup>* = [ *fr*1, *fr*2, ··· , *frm*] T, *D*= diag(*S1*). The matrix *L* = *D* − *S* is often called graph Laplacian. Let

$$
\overrightarrow{f}\_r = f\_r - \frac{f\_r^\ddagger D1}{1^\ddagger D1} \mathbf{1} \tag{5}
$$

2

where 1 = [1, ··· , 1] T.

Step 4: Compute the LS of the *r*-th feature as follows:

$$L\_r = \frac{\hat{f}\_r^T L \hat{f}\_r}{\hat{f}\_r^T D \hat{f}\_r} \tag{6}$$

(2) Relief-F Score (RFS).

The Relief-F Score method is a multi-class variant of the Relief method. The Relief method designs a correlation statistic to measure the importance of features. The statistic is a vector, each component of which corresponds to an initial feature, and the importance of the feature subset is determined by the sum of the relevant statistic components corresponding to each feature in the subset. For each *<sup>x</sup><sup>i</sup>* in the data set *<sup>X</sup>* <sup>∈</sup> <sup>R</sup>*m*×*n*, first find its nearest neighbor *xi*,*nh* in the same sample of *xi*, which is called guessing nearest neighbor, and then find its nearest neighbor *xi*,*nm* from different type samples of *xi*, which is called guessing wrong neighbor. The component of the correlation statistic corresponding to the feature is:

$$\delta^{(r)} = \sum\_{i} \left( -diff\left(\mathbf{x}\_{i}^{(r)}, \mathbf{x}\_{i, \text{nb}}^{(r)}\right)^{2} + diff\left(\mathbf{x}\_{i}^{(r)}, \mathbf{x}\_{i, \text{nn}}^{(r)}\right)^{2} \right) \tag{7}$$

where *x* (*r*) *<sup>i</sup>* is the value of the *<sup>r</sup>*-th feature of *<sup>x</sup>i*. For *<sup>x</sup><sup>a</sup>* and *<sup>x</sup>b*, *di f f x* (*r*) *<sup>a</sup>* , *x* (*r*) *b* depends on the type of the *r*-th feature. If the *r*-th feature r is discrete, when *x* (*r*) *<sup>a</sup>* = *x* (*r*) *<sup>b</sup>* , *di f f x* (*r*) *<sup>a</sup>* , *x* (*r*) *b* = 0, otherwise *di f f x* (*r*) *<sup>a</sup>* , *x* (*r*) *b* <sup>=</sup> 1. If the *<sup>r</sup>*-th feature r is continuous, then *di f f x* (*r*) *<sup>a</sup>* , *x* (*r*) *b* = - - *x* (*r*) *<sup>a</sup>* − *x* (*r*) *b* - - -.

Relief is designed for two classification problems, while Relief-F can handle multiple classification problems. For the sample *xi*, if it belongs to the *k*-th class, the Relief-F method first finds its nearest neighbor *xi*,*nh* in the *k*-th class sample, and then finds a nearest neighbor *xi*,*l*,*nm*, *l* = *k* of *x<sup>i</sup>* in each class except the *k*-th class as a guessing wrong neighbor, so the correlation statistic corresponding to the component of the *r*-th feature is

$$\delta^{(r)} = \sum\_{i} -diff\left(\mathbf{x}\_{i}^{(r)}, \mathbf{x}\_{i,\text{nb}}^{(r)}\right)^{2} + \sum\_{l \neq k} \left(p\_{l} \times diff\left(\mathbf{x}\_{i}^{(r)}, \mathbf{x}\_{i,\text{nm}}^{(r)}\right)^{2}\right) \tag{8}$$

where *pl* is the proportion of the *l*-th class sample in the data set *X*.
