Article

Unsupervised Anomaly Detection for Time Series Data of Spacecraft Using Multi-Task Learning

1
Institute of Telecommunication and Navigation Satellites, China Academy of Space Technology, Beijing 100094, China
2
College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
3
School of Aerospace Science and Technology, Xidian University, Xi’an 710071, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(13), 6296; https://doi.org/10.3390/app12136296
Submission received: 28 April 2022 / Revised: 9 June 2022 / Accepted: 15 June 2022 / Published: 21 June 2022

Abstract

Although in-orbit anomaly detection is extremely important to ensure spacecraft safety, the complex spatial-temporal correlation and sparsity of anomalies in the data pose significant challenges. This study proposes a new multi-task learning-based time series anomaly detection (MTAD) method, which captures the spatial-temporal correlation of the data to learn the generalized normal patterns and hence facilitates anomaly detection. First, four proxy tasks are implemented for feature extraction through joint learning: (1) long short-term memory-based data prediction; (2) autoencoder-based latent representation learning and data reconstruction; (3) variational autoencoder-based latent representation learning and data reconstruction; and (4) joint latent representation-based data prediction. Proxy Tasks 1 and 4 capture the temporal correlation of the data by fusing the latent space, whereas Tasks 2 and 3 capture the spatial correlation of the data. The isolation forest algorithm then detects anomalies from the extracted features. Application to a real spacecraft dataset shows that our method outperforms existing techniques, and further ablation testing for each task confirms the effectiveness of fusing multiple tasks. The proposed MTAD method demonstrates promising potential for effective in-orbit anomaly detection for spacecraft.

1. Introduction

The number of orbiting spacecraft has increased in recent years, and with it the number of spacecraft failures. Because spacecraft are complex in-orbit systems, a single failure in a component or subsystem may cause irreparable damage, especially if it is not detected in time [1]. A reasonable approach to failure monitoring is anomaly detection on the vast amount of telemetry data continuously produced by the spacecraft system components; these data constitute a multi-dimensional time series [2].
By definition, anomalies are patterns in data that do not conform to a well-defined notion of normal behavior, and anomaly detection refers to the problem of finding such patterns [3]. Anomaly detection methods for spacecraft include the out-of-limits, expert system-based, and data-driven methods [4], which have been widely studied. Although some success has been achieved in detecting spacecraft failures, the tens of thousands of parameters pertaining to the complex spacecraft structure and spatial-temporal correlations between those parameters pose various challenges for the abovementioned anomaly detection methods, as detailed below.
  • The expert system-based method has clear rules and strong interpretability; however, it relies heavily on expert knowledge and, thus, anomalies beyond the scope of prior knowledge cannot be detected [5]. In addition, with the continuous increase in spacecraft parameters and their increasingly complex spatial and temporal dependencies, summarizing and refining a large volume of credible expert knowledge is becoming more difficult.
  • For data-driven methods, anomaly detection resembles a classification problem. However, abnormal conditions are rare for a spacecraft compared with normal conditions; thus, the available training data are extremely imbalanced. In addition, it is not easy for users to label each item of training data, especially those pertaining to anomalies. Thus, supervised and semi-supervised methods do not work well, and an unsupervised learning method is required to handle the imbalanced and unlabeled data in a principled manner.
  • To distinguish anomalies from the normal data through an unsupervised method, a proxy task is typically designed and implemented; this can be a prediction [6,7] or reconstruction [8,9] task. The prediction task detects anomalies based on the errors between the predicted and real values, whereas the reconstruction task detects anomalies based on the errors between the reconstructed and real values. However, a single proxy task usually yields a suboptimal result because the individual task is not well aligned with anomaly detection. For instance, the principal component analysis [10], k-means [11], one-class support vector machine [12], and autoencoder (AE) [13] methods are unable to capture spatial and temporal correlations simultaneously [14], whereas modeling the inter-metric and temporal dependencies of a multivariate time series simultaneously is necessary for better performance [15].
To overcome the above three challenges, this study proposes a multi-task learning-based time series anomaly detection (MTAD) method that trains models jointly on several different proxy tasks using high-dimensional time series data from the spacecraft. The overall method framework is shown in Figure 1 and includes a multi-task model ($\mathrm{MTAD}_{mt}$) and an error evaluation and anomaly detection model ($\mathrm{MTAD}_{ad}$). In the training stage, the following proxy tasks are first jointly trained to characterize the complex spatial-temporal patterns of the data for feature extraction: (1) the long short-term memory (LSTM)-based data prediction task; (2) the AE-based latent representation learning and data reconstruction task; (3) the variational autoencoder (VAE)-based latent representation learning and data reconstruction task; and (4) the joint latent representation-based data prediction task. Thereafter, the isolation forest (iForest)-based error evaluation and anomaly detection model is trained using the four-dimensional features extracted from the training data. Thus, in the detection stage, $\mathrm{MTAD}_{mt}$ first extracts the features of the spacecraft data via the four proxy tasks and then inputs those features into $\mathrm{MTAD}_{ad}$ to realize unsupervised and robust anomaly detection.
The main contributions of this study are as follows:
  • Multi-task learning is introduced to spacecraft anomaly detection, fusing original data and its joint latent representation to facilitate simultaneous reconstruction and prediction, to model the complex spatial and temporal patterns of the spacecraft data.
  • Proxy tasks that utilize LSTM networks to capture complex temporal dependencies are set (Proxy Tasks 1 and 4, as detailed in Section 3).
  • Proxy tasks to capture the complex spatial dependencies between different parameters are set (Proxy Tasks 2 and 3); these tasks use both a temporal convolutional AE and LSTM VAE to generate two latent representations.
  • An iForest-based error evaluation and anomaly detection model is constructed, which integrates features extracted by the above four tasks for anomaly detection.
  • Experiments performed on real spacecraft data demonstrate that the proposed MTAD exhibits superior performance to comparative methods. Moreover, ablation experiments performed for each task demonstrate the effectiveness of the employed combination of the four proxy tasks.
The rest of the paper is organized as follows. Section 2 revisits the related work. Section 3 presents the data preprocessing and our multi-task model, i.e., MTAD. Our approach is evaluated in Section 4. Finally, we conclude the paper and discuss future work in Section 5.

2. Related Work

2.1. Spacecraft Anomaly Detection

The existing autonomous anomaly detection methods for spacecraft can be grouped into three categories: model-based, signal processing-based, and knowledge-based methods [16]. Anomaly detection using expert system-based and machine learning methods has been applied to spacecraft such as Hayabusa, Geotail, the Space Shuttle, and the International Space Station [5]. For remote sensing satellites, independent health management systems have been designed [17]; these systems can diagnose and isolate anomalies and achieve appropriate recovery, thereby effectively improving the spacecraft reliability.
The random forest algorithm has been utilized for anomaly detection of spacecraft rolling bearings based on the frequency of the bearing fault signal [18]. Moreover, the hierarchical anomaly detection method has been proposed to identify failures in the attitude control system [19,20]. A method based on morphological variational modal decomposition and the Jensen–Rényi divergence distance has been proposed to overcome the problem of weak anomaly identification for on-orbit spacecraft failures [21]; this approach has proven to be effective when applied to spacecraft reaction wheel data. Additionally, a model-based fault detection, isolation, and identification scheme has been proposed, which uses unscented Kalman filters to detect and isolate faults in control moment gyros to facilitate a satellite attitude control subsystem [22].
Deep learning technology [23] has been successfully applied to various fields, and deep-learning-based anomaly detection in complex systems has received considerable attention [24]. Deep learning methods can mine hidden patterns in data and do not rely on expert knowledge. In the context of a spacecraft, a recurrent neural network (RNN)-based method has been developed to detect actuator failure in a satellite attitude control system [25]. Moreover, the effectiveness of the deep learning method for aircraft fault diagnosis has been verified through practical application to fault diagnosis and prediction of mechanical parts [26]. LSTM-based methods constitute a typical prediction-based approach, and the effectiveness of such a method for anomaly detection has been proven using expert-labeled telemetry anomaly data from the Soil Moisture Active Passive satellite and Curiosity, the Mars Science Laboratory rover, with a complementary unsupervised and nonparametric anomaly thresholding approach and false-positive mitigation strategies also being proposed [6]. A causal network and feature attention-based LSTM method has also been used to detect anomalies in satellite telemetry data [27].

2.2. Multi-Task Learning

Multi-task learning [28], which is applied in the present study, is a machine learning method based on shared representation that combines multiple related tasks to improve the model generalization ability, thereby yielding superior overall performance to that of a single task [29]. In contrast to single-task learning, multi-task learning trains multiple related tasks in parallel and backpropagates their gradients simultaneously. Multi-task learning is a type of transfer learning that uses prior knowledge to aid the learning of related or more complex tasks, and is regarded as being inspired by human learning behavior. It aims to leverage useful information contained in multiple learning tasks to learn a more robust learner, which helps alleviate the data sparsity problem [30] faced in the field of spacecraft anomaly detection.
Multi-task learning is widely used in natural language processing, speech recognition, drug discovery, and other fields [31]. With regard to anomaly detection, a video anomaly detection method based on self-supervised and multi-task learning has been proposed [32], which integrates multiple self-supervised proxy tasks and knowledge distillation algorithms, achieving good results on three benchmarks. To train a robust anomaly detector for railway track inspection from insufficient data and to detect multiple types of anomalies, multiple tasks such as material classification and fastener detection are combined within a multi-task learning framework, resulting in improved performance [33]. Similarly, both high-scale shape features and low-scale fine features are combined by using two complementary tasks of jigsaw puzzle and geometric transformation recognition in a multi-task framework, which greatly improves fine-grained anomaly detection performance over simple geometric tasks [34]. These works decompose the anomaly detection problem into multiple related but different proxy tasks and then train them jointly within a multi-task framework to obtain better performance or detect more failure modes than a single task. However, they mainly focus on video, image, and other fields rather than spacecraft anomaly detection.

2.3. Deep Learning Models

This subsection briefly presents selected deep learning models that are incorporated in the proposed MTAD method.

2.3.1. LSTM

An RNN is a type of neural network with short-term memory capability. Its neurons not only accept information from other neurons but can also use their own information, forming a network structure with loops. Using such self-feedback neurons, the RNN can process time series data of any length. However, RNNs suffer from exploding and vanishing gradients when the input sequence is relatively long; therefore, variants that overcome this problem, such as the LSTM network [35], have emerged.
In an LSTM network, gates are introduced to control the information transmission and temporal memory paths; namely, the input $i_t$, forget $f_t$, and output $o_t$ gates. In detail, $f_t$ controls the amount of information on the previous-step internal state $c_{t-1}$ that should be forgotten, $i_t$ controls the amount of information to be saved regarding the current-step candidate state $\tilde{c}_t$, and $o_t$ controls the amount of information on the current-step internal state $c_t$ that is output to the external state $h_t$. The gates, cell updates, and output are defined as follows:
$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh(c_t),
\end{aligned}
\tag{1}
$$
where $x_t$ is the input vector, $h_t$ is the output vector, $\odot$ is the element-wise product operator, the matrices $W$ and $U$ and the bias vectors $b$ contain the parameters to be trained, $\sigma(\cdot)$ is the sigmoid nonlinear function, and $\tanh$ is the hyperbolic tangent function.
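As a minimal sketch of Equation (1), the following standard-library Python code performs one LSTM update on plain lists. The dictionary layout of the weights and the `zero_params` helper are assumptions made for illustration only; they are not taken from the implementation used in this study.

```python
import math


def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following Equation (1); p maps names to weights."""
    i_t = sigmoid(affine(p["W_i"], x_t, p["U_i"], h_prev, p["b_i"]))
    f_t = sigmoid(affine(p["W_f"], x_t, p["U_f"], h_prev, p["b_f"]))
    o_t = sigmoid(affine(p["W_o"], x_t, p["U_o"], h_prev, p["b_o"]))
    c_tilde = [math.tanh(v)
               for v in affine(p["W_c"], x_t, p["U_c"], h_prev, p["b_c"])]
    # Element-wise products: forget part of the old state, admit part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero weights, so every gate equals 0.5."""
    p = {}
    for gate in "ifoc":
        p["W_" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["U_" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    return p


# Two inputs, one hidden unit, previous cell state 2.0.
h_t, c_t = lstm_step([1.0, -1.0], [0.0], [2.0], zero_params(2, 1))
print(h_t, c_t)
```

With all-zero weights every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is half of the previous one; this makes the roles of the forget and input gates easy to check by hand.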

2.3.2. TCN

A temporal convolutional network (TCN) [36] is a network based on a convolutional neural network, which can be used to model time series data. TCNs exhibit good performance when applied to time series data such as text and video, and have the advantage of reduced memory consumption for training. Beyond convolutional networks, TCNs incorporate causal convolution [37,38], dilated convolution, and residual networks [39]. Briefly, causal convolution sets the causal relationship between each layer of the network and is suitable for data involving time series characteristics; dilated convolution increases the receptive field while reducing the network depth; and the residual block inside a residual network alleviates the vanishing gradient problem using skip connections. These features are described in further detail below.
1. Causal convolution
As shown in Figure 2, causal convolution performs convolution operations in chronological order. For instance, for the time series data modeling task $f$, assuming that the input is $x(0), \ldots, x(T)$ and the target output is $y(0), \ldots, y(T)$, only the data up to time $t$ are used to compute the output $\hat{y}(t)$ at time $t$, such that
$$\hat{y}(t) = f(x(0), \ldots, x(t)). \tag{2}$$
2. Dilated convolution
Dilated convolution is incorporated to expand the TCN receptive field. In Figure 2, the convolution kernel has size 3 and the expansion coefficients $d$ of the first to third convolution layers are 1, 2, and 4, respectively. Because the expansion coefficient grows exponentially with the layer depth, the receptive field of the network expands while the number of calculations remains controllable, and more historical data can be used for time series data modeling.
If the input sequence is $x = \{x(0), \ldots, x(T)\} \in \mathbb{R}^{n}$ and the convolution kernel is $f: \{0, \ldots, k-1\} \to \mathbb{R}$, then, for any data sample $x(t)$, the convolution can be expressed as
$$F_d(x(t)) = \sum_{i=0}^{k-1} f(i) \cdot x(t - d \cdot i), \tag{3}$$
where $k$ is the convolution kernel size, $d$ is the expansion factor, and $t - d \cdot i$ indexes the past direction. The larger the expansion coefficient, the larger the receptive field of the convolutional network. When $d = 1$, the expanded convolution degenerates to an ordinary convolution.
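The dilated causal convolution of Equation (3) can be sketched in a few lines of standard-library Python. Zero padding for indices before the start of the sequence is an assumption made here, and the helper `receptive_field` assumes one convolution per layer; neither detail is specified in the text.

```python
def dilated_causal_conv(x, kernel, d):
    """Equation (3): out[t] = sum_i kernel[i] * x[t - d*i].

    Assumption: samples before the start of the sequence count as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(kernel):
            j = t - d * i
            if j >= 0:  # only present and past samples contribute
                acc += weight * x[j]
        out.append(acc)
    return out


def receptive_field(k, dilations):
    """Inputs seen by one output after stacking one layer per dilation."""
    return 1 + (k - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_causal_conv(x, [1.0, 1.0, 1.0], d=2))
print(receptive_field(3, [1, 2, 4]))
```

For the configuration described for Figure 2 (kernel size 3, expansion coefficients 1, 2, and 4), `receptive_field` returns 15, whereas three ordinary convolutions with the same kernel would cover only 7 time steps.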
3. Residual connection
When the TCN depth increases, a residual module is introduced to improve the network stability. When $x(t)$ is the input, the network output is expressed as follows:
$$o = \mathrm{Activation}(x(t) + F_d(x(t))). \tag{4}$$

2.3.3. VAE

The VAE is a generative model based on Bayesian variational inference [40]. The core concept involves the use of neural networks to model two complex conditional probability density functions, so that one can sample from the distribution $P(z)$ to obtain the latent variable $z$ and then generate data through $P(x \mid z, \theta)$.
The VAE consists of an inference network and a generating network. The inference network learns to map the independent and identically distributed variables $X = \{x^{(i)}\}_{i=1}^{N}$ to the latent variables $Z = \{z^{(i)}\}_{i=1}^{M}$, which follow the distribution $\mathcal{N}(0, I)$ in the latent space, and can be regarded as an encoder. The generating network learns to use $Z$ to sample the reconstructed original input data from the learned distribution. The difference between the VAE and traditional AE is that the output of the encoder and decoder in the VAE is a distribution or a parameter of the distribution, rather than a deterministic encoding.
Specifically, the generated data samples have the following probability density:
$$P(x) = \int P(x, z \mid \theta)\, dz = \int P(x \mid z, \theta)\, P(z)\, dz, \tag{5}$$
where $P(x \mid z, \theta)$ is the conditional probability density function for observation of the spatial data sample when $z$ is known. The probability of generating $x$ from some $z$ can be maximized by optimizing the parameter $\theta$, where $z$ is generated from $P(z)$. Then, the VAE learns to optimize the decoder and encoder parameters $\theta$ and $\phi$, respectively, by maximizing the evidence lower bound as follows:
$$
\begin{aligned}
\max_{\theta, \phi} \mathrm{ELBO}(Q, x \mid \theta, \phi)
&= \max_{\theta, \phi} \mathbb{E}_{z \sim Q(z \mid x, \phi)} \left[ \log \frac{P(x \mid z, \theta)\, P(z \mid \theta)}{Q(z \mid x, \phi)} \right] \\
&= \max_{\theta, \phi} \mathbb{E}_{z \sim Q(z \mid x, \phi)} \left[ \log P(x \mid z, \theta) \right] - D_{KL}\left[ Q(z \mid x, \phi) \,\|\, P(z \mid \theta) \right].
\end{aligned}
\tag{6}
$$
For sample $x$, the expectation $\mathbb{E}_{z \sim Q}[\log P(x \mid z, \theta)]$ can be estimated from $L$ samples $z^{(l)}$ taken from $Q(z \mid x, \phi)$ as follows:
$$\mathbb{E}_{z \sim Q}\left[ \log P(x \mid z, \theta) \right] \approx \frac{1}{L} \sum_{l=1}^{L} \log P(x \mid z^{(l)}, \theta). \tag{7}$$
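To make Equations (6) and (7) concrete, the sketch below estimates the evidence lower bound for a deliberately tiny model: a one-dimensional latent variable, a Gaussian encoder distribution $\mathcal{N}(\mu, \sigma^2)$, a standard normal prior, and a linear decoder with unit-variance Gaussian likelihood. All of these modeling choices, including the decoder weight `w`, are assumptions for illustration and are much simpler than the LSTM VAE used in this study.

```python
import math
import random


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL[N(mu, sigma^2) || N(0, 1)] for a scalar latent."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))


def log_likelihood(x, z, w):
    """log P(x | z) for a hypothetical decoder x ~ N(w * z, 1)."""
    return -0.5 * ((x - w * z) ** 2 + math.log(2.0 * math.pi))


def elbo_estimate(x, mu, sigma, w, n_samples, rng):
    """Monte Carlo expectation (Equation (7)) minus the KL term (Equation (6))."""
    total = 0.0
    for _ in range(n_samples):
        z = mu + sigma * rng.gauss(0.0, 1.0)  # sample z from Q(z | x)
        total += log_likelihood(x, z, w)
    return total / n_samples - kl_to_standard_normal(mu, sigma)


rng = random.Random(0)
print(elbo_estimate(x=1.0, mu=0.5, sigma=0.5, w=1.0, n_samples=1000, rng=rng))
```

Drawing `z` as `mu + sigma * noise` keeps the sample a differentiable function of the encoder outputs, which is the usual way such an estimate is made trainable by gradient descent.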

3. Materials and Method

In this section, we first present the idea and the overall process of our MTAD model. Then, the data preprocessing method is introduced. Finally, we show the details of each learning task and joint learning of the MTAD.

3.1. Overview

Hereafter, $X_t = \{x_1, x_2, \ldots, x_M\} \in \mathbb{R}^{M}$ indicates the $M$-dimensional data sample at time $t$, and $X = \{X_{t_1}, X_{t_2}, \ldots, X_{t_K}\} \in \mathbb{R}^{M \times K}$ indicates the data acquired over $K$ time steps. As stated above, this study proposes the MTAD method, which performs unsupervised anomaly detection by simultaneously learning the spatial-temporal patterns of the spacecraft data.
The overall process of the MTAD method is shown in Figure 3 and can be broadly divided into training and inference processes, as detailed below.
Training Process: First, the training data are standardized and the mean $\mu_{train}$ and standard deviation $\delta_{train}$ are obtained. Then, a sliding window is used to obtain the final input data $X$ and output $Y$, and the processed data are used to train the multi-task model $\mathrm{MTAD}_{mt}$, which incorporates Proxy Tasks 1–4 (detailed in Section 3.3). Finally, the feature vector $Err(X)$ extracted by $\mathrm{MTAD}_{mt}$ is used to train the error evaluation and anomaly detection model, $\mathrm{MTAD}_{ad}$, which is implemented by the iForest algorithm.
Inference Process: Parameters $\mu_{train}$ and $\delta_{train}$ are used to standardize the test data $S$, and the standardized data are fed into $\mathrm{MTAD}_{mt}$ to obtain the feature vector $Err(S)$. Then, $Err(S)$ is fed into $\mathrm{MTAD}_{ad}$ for anomaly detection.
The various steps of the training and inference processes are explained in detail in the following subsections.

3.2. Data Preprocessing

3.2.1. Data Standardization

The data are first standardized to facilitate rapid convergence during learning for the model. If the mean of the data is $\mu$ and the standard deviation is $\sigma$, the data sample $x^{(i)}$ can be standardized as follows:
$$x^{(i)} \leftarrow \frac{x^{(i)} - \mu}{\sigma}. \tag{8}$$
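A minimal sketch of Equation (8) is given below. It fits the mean and standard deviation on the training rows only and reuses them for new data, as the training and inference processes in Section 3.1 describe. The use of the population standard deviation and the handling of constant channels are assumptions, since the text does not specify them.

```python
import statistics


def fit_standardizer(train_rows):
    """Return the per-parameter mean and standard deviation of the training rows."""
    columns = list(zip(*train_rows))
    mu = [statistics.mean(col) for col in columns]
    sigma = [statistics.pstdev(col) for col in columns]  # population std (assumed)
    return mu, sigma


def standardize(rows, mu, sigma):
    """Apply Equation (8) column by column with previously fitted statistics."""
    # Assumption: a constant channel (sigma == 0) is only centred, not scaled.
    return [
        [(v - m) / (s if s > 0 else 1.0) for v, m, s in zip(row, mu, sigma)]
        for row in rows
    ]


train = [[1.0, 10.0], [3.0, 30.0]]   # two samples, two telemetry parameters
mu_train, sigma_train = fit_standardizer(train)
print(standardize([[5.0, 20.0]], mu_train, sigma_train))
```

Reusing the training statistics at test time keeps the test data on the scale the model was trained on, so a shifted test distribution shows up as larger errors rather than being normalized away.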

3.2.2. Sliding Window Processing

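As shown in Figure 4, using a sliding window, the input data can be formulated as an $M \times K$ matrix, where $K$ is the number of time steps in the window and $M$ is the number of parameters. Listing one time step per row and transposing gives
$$
X = \begin{bmatrix}
x_{t-K+1}^{(1)} & x_{t-K+1}^{(2)} & \cdots & x_{t-K+1}^{(M-1)} & x_{t-K+1}^{(M)} \\
x_{t-K+2}^{(1)} & x_{t-K+2}^{(2)} & \cdots & x_{t-K+2}^{(M-1)} & x_{t-K+2}^{(M)} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{t-1}^{(1)} & x_{t-1}^{(2)} & \cdots & x_{t-1}^{(M-1)} & x_{t-1}^{(M)} \\
x_{t}^{(1)} & x_{t}^{(2)} & \cdots & x_{t}^{(M-1)} & x_{t}^{(M)}
\end{bmatrix}^{\top} \in \mathbb{R}^{M \times K}. \tag{9}
$$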
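The windowing step can be sketched as follows. The function name `sliding_windows` and the prediction offset argument `P` are illustrative assumptions; each window is stored with one row per time step, in the same order as the rows listed in Equation (9), and is paired with the sample that lies `P` steps after the window ends.

```python
def sliding_windows(series, K, P=1):
    """Split a multivariate series into K-step windows and P-step-ahead targets.

    series: list of T samples, each a list of M parameter values.
    Returns (windows, targets); windows[n] holds K consecutive samples and
    targets[n] is the sample P steps after the last sample of that window.
    """
    windows, targets = [], []
    for t in range(K - 1, len(series) - P):
        windows.append(series[t - K + 1 : t + 1])
        targets.append(series[t + P])
    return windows, targets


series = [[0.0], [1.0], [2.0], [3.0], [4.0]]   # T = 5 samples, M = 1 parameter
windows, targets = sliding_windows(series, K=3, P=1)
for window, target in zip(windows, targets):
    print(window, "->", target)
```

Consecutive windows overlap in all but one time step, so a series of `T` samples yields `T - K - P + 1` training pairs.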

3.3. Multi-Tasks and Joint Learning

The multi-task model, i.e., $\mathrm{MTAD}_{mt}$, learns the spatial-temporal patterns of the spacecraft data simultaneously through joint training via the four proxy tasks described below. The input is $X \in \mathbb{R}^{M \times K}$, as shown in Equation (9), and the output is the four-dimensional vector $Err(X)$.

3.3.1. Task 1: LSTM-Based Data Prediction

Task 1 uses an LSTM network to model the temporal dependencies of the time series data for prediction. The input is an $M \times K$ matrix, as in Equation (9), and the output is a prediction of the sample $P$ steps ahead, denoted $o_{Task1} \in \mathbb{R}^{M}$. Task 1 is expressed as follows, where $\theta_1$ are the network parameters:
$$o_{Task1} = P_{\theta_1}(X). \tag{10}$$
The real data $P$ steps ahead are $Y_t \in \mathbb{R}^{M}$, where
$$Y_t = \left[ x_{t+P}^{(1)}, x_{t+P}^{(2)}, \ldots, x_{t+P}^{(M-1)}, x_{t+P}^{(M)} \right] \in \mathbb{R}^{M}. \tag{11}$$
The loss of the LSTM is derived using the mean squared error (MSE) determined from the differences between the predicted and real values, i.e., $o_{Task1}$ and $Y_t$, where
$$Loss_1 = L(\theta_1; X) = \sum_{i=1}^{N} \left( o_{Task1}^{(i)} - Y_t^{(i)} \right)^2. \tag{12}$$

3.3.2. Task 2: AE-Based Latent Representation Learning and Data Reconstruction

Task 2 uses a temporal convolutional AE neural network to generate a latent representation based on the spatial patterns of the data and to reconstruct the data. The AE learns the latent representation of the data in an unsupervised manner. For $M$-dimensional data samples $X^{(i)} \in \mathbb{R}^{M}$, $1 \le i \le N$, the AE reduces the data dimensionality to $k$ dimensions, yielding $Z^{(i)} \in \mathbb{R}^{k}$, $1 \le i \le N$, where $Z^{(i)}$ is the compressed representation of sample $i$. The optimization goal is to minimize the error between the reconstructed and original data.
The model is expressed in Equation (13), where $Encoder_{\theta_2}$ is the encoder network, $Decoder_{\varphi_2}$ is the decoder network, and $\theta_2$ and $\varphi_2$ are the parameters to be learned. $Encoder_{\theta_2}$ maps the original data $X \in \mathbb{R}^{M \times K}$ to a latent representation $z_{ae}$, as shown in Equation (14).
$$o_{Task2} = Decoder_{\varphi_2}(Encoder_{\theta_2}(X)), \tag{13}$$
$$z_{ae} = Encoder_{\theta_2}(X). \tag{14}$$
The errors of Task 2 are derived from the difference between the reconstructed values of the AE network, $o_{Task2}$, and the real values, $X$, and are calculated using the logcosh function such that
$$Loss_2 = L(\theta_2, \varphi_2; X) = \sum_{i=1}^{N} \log\left( \cosh\left( o_{Task2}^{(i)} - X^{(i)} \right) \right). \tag{15}$$
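The logcosh error of Equation (15) behaves like half the squared error for small residuals and like the absolute error for large ones, which makes it less sensitive to occasional large deviations than the MSE. The sketch below is one numerically stable way to evaluate it; the rewriting of log(cosh(e)) as |e| + log(1 + exp(-2|e|)) - log 2 is a standard identity, and the function names are illustrative.

```python
import math


def log_cosh(e):
    """Stable log(cosh(e)): equals |e| + log(1 + exp(-2|e|)) - log(2)."""
    a = abs(e)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def reconstruction_loss(reconstructed, original):
    """Sum of element-wise log-cosh errors between two equally shaped matrices."""
    return sum(
        log_cosh(r - x)
        for r_row, x_row in zip(reconstructed, original)
        for r, x in zip(r_row, x_row)
    )


for residual in (0.01, 1.0, 10.0):
    print(residual, log_cosh(residual), 0.5 * residual ** 2, abs(residual))
```

Evaluating `math.log(math.cosh(e))` directly overflows for large residuals, which is why the rewritten form is used here.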

3.3.3. Task 3: VAE-Based Latent Representation Learning and Data Reconstruction

Task 3 uses an LSTM VAE neural network to generate latent representation and to reconstruct the input to model the spatial patterns in the data. The VAE uses the inference and generating networks to model the conditional probability density function. A probability encoder models the latent variable distribution, and the latent space variability is considered in the sampling process; hence, the VAE has higher expressive ability than the AE. Therefore, on the basis of Task 2, Task 3 is designed to learn the VAE-based latent representation and to reconstruct the data, so as to improve the spatial-dependency capturing performance of the model.
The Task 3 input is given in Equation (9) and the output is denoted $o_{Task3} \in \mathbb{R}^{M \times K}$. Task 3 is expressed in Equation (16), where $Encoder_{\theta_3}$ is the encoder network and $\theta_3$ are the parameters to be trained. $Encoder_{\theta_3}$ maps the original data to the distribution of a latent representation, from which $z_{vae}$ is sampled, as detailed in Equation (17). $Decoder_{\varphi_3}$ is the decoder network and $\varphi_3$ are the parameters to be trained. $Decoder_{\varphi_3}$ maps the latent representation back to the original data space $\mathbb{R}^{M \times K}$.
$$o_{Task3} = Decoder_{\varphi_3}(Encoder_{\theta_3}(X)), \tag{16}$$
$$z_{vae} = Encoder_{\theta_3}(X). \tag{17}$$
The Task 3 error combines the reconstruction error between $o_{Task3}$ and $X$, calculated based on the MSE, with the Kullback-Leibler divergence between the encoder distribution and the prior, such that
$$Loss_3 = L(\theta_3, \varphi_3; X) = \sum_{i=1}^{N} \left( \frac{1}{L} \sum_{l=1}^{L} \left\| Decoder_{\varphi_3}(z^{(l)}) - X^{(i)} \right\|^2 + D_{KL}\left[ Q(z \mid X^{(i)}, \theta_3) \,\|\, P(z) \right] \right), \tag{18}$$
where $z^{(l)}$, $1 \le l \le L$, are samples drawn from the encoder distribution $Q(z \mid X^{(i)}, \theta_3)$ and $P(z)$ is the prior.

3.3.4. Task 4: Joint Latent Representation-Based Data Prediction

Task 4 uses an LSTM network to output predictions from the joint latent representation learned in Tasks 2 and 3. Here, $z_{input}$ is the Task 4 input, with $z_{input} = [z_{ae}, z_{vae}]$; $o_{Task4} \in \mathbb{R}^{M}$ denotes the output; and $\theta_4$ are the parameters to be trained for the Task 4 network, such that
$$o_{Task4} = P_{\theta_4}(z_{input}). \tag{19}$$
Then, the difference between the predicted and expected values of the network, $o_{Task4}$ and $Y_t$, respectively, is taken as the Task 4 error. Because $z_{input}$ is produced by the encoders of Tasks 2 and 3, this error also depends on $\theta_2$ and $\theta_3$. It can be calculated based on the MSE, where
$$Loss_4 = L(\theta_2, \theta_3, \theta_4; X) = \sum_{i=1}^{N} \left( o_{Task4}^{(i)} - Y_t^{(i)} \right)^2. \tag{20}$$
The above four tasks are jointly trained to capture the spatial and temporal correlations in the data. For the input $X$, the joint loss function is
$$Loss_{total} = Loss_1 + Loss_2 + Loss_3 + Loss_4. \tag{21}$$

3.4. iForest-Based Error Evaluation and Anomaly Detection

Based on the multi-task model, i.e., $\mathrm{MTAD}_{mt}$, the features of $X$ are extracted as $Err(X) = [Err_1(X), Err_2(X), Err_3(X), Err_4(X)]$, where $Err_i(X)$ is the loss of Proxy Task $i$ for input $X$.
Next, the iForest-based [41] error evaluation and anomaly detection model, i.e., $\mathrm{MTAD}_{ad}$, is trained to perform anomaly detection from $Err(X)$. In the iForest approach, a random hyperplane cuts the data space into two subspaces repeatedly, until no further subdivisions are possible. Hence, an isolation tree with only one data sample per leaf node is formed. In the present case, there is a low density of abnormal data samples and, thus, fewer splits are required to separate them individually for detection as anomalies. The MTAD method can be summarized as follows.
Training Process: The training dataset is input to $\mathrm{MTAD}_{mt}$, the error is calculated, and $Err(X)$ is obtained. Then, the vector is input to $\mathrm{MTAD}_{ad}$ for training.
Inference Process: The test data $S$ are input to $\mathrm{MTAD}_{mt}$ to obtain $Err(S)$. Then, $Err(S)$ is input to $\mathrm{MTAD}_{ad}$ and the anomalies are detected.
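The isolation idea can be illustrated with a deliberately small, standard-library sketch of an isolation forest applied to four-dimensional error vectors. It is a simplified teaching version (axis-parallel random splits, a depth limit, and the usual path-length normalization), not the implementation used in this study; in practice a library implementation such as scikit-learn's `IsolationForest` would normally be used. The synthetic error values are assumptions chosen only to make the effect visible.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    depth = 0
    while "size" not in node:
        node = node["left"] if point[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def fit_forest(points, n_trees, sample_size, rng):
    """Build n_trees isolation trees; needs at least two training points."""
    n = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(n))
    forest = [build_tree(rng.sample(points, n), 0, max_depth, rng)
              for _ in range(n_trees)]
    return forest, n


def anomaly_score(point, forest, n):
    """Score in (0, 1); values closer to 1 mean the point is easier to isolate."""
    mean_path = sum(path_length(point, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(n))


rng = random.Random(0)
# Synthetic Err(X) vectors: small errors on all four proxy tasks.
train_errors = [[rng.gauss(0.1, 0.02) for _ in range(4)] for _ in range(200)]
forest, n = fit_forest(train_errors, n_trees=100, sample_size=256, rng=rng)
print(anomaly_score([0.1, 0.1, 0.1, 0.1], forest, n))  # typical error vector
print(anomaly_score([1.0, 1.0, 1.0, 1.0], forest, n))  # large errors on all tasks
```

A vector with large errors on all four tasks falls outside the dense cluster of training errors, so it is separated after only a few random splits and receives a higher score than a typical vector.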
The MTAD Training Procedure is shown in Algorithm 1 as follows:
Algorithm 1: MTAD Training Procedure.
Input: Dataset X_train = $\{X^{(1)}, X^{(2)}, \ldots, X^{(N)}\}$
Output: Trained multi-task model, $\mathrm{MTAD}_{mt}$; trained error evaluation and anomaly detection model, $\mathrm{MTAD}_{ad}$
$\mathrm{MTAD}_{mt}$ Training
1: Standardize X_train and get parameters $\mu_{train}$ and $\delta_{train}$ // Equation (8); use a sliding window to transform each sample into $X \in \mathbb{R}^{M \times K}$
2: Randomly initialize parameters
3: while epoch < Total_Epoch_Num do
4:    for each $X$ in X_train do
5:       Get outputs $o_{Task1}$, $o_{Task2}$, $o_{Task3}$ from Tasks 1, 2, and 3 // Equations (10), (13) and (16)
6:       Get latent representations $z_{ae}$ and $z_{vae}$ from Tasks 2 and 3 // Equations (14) and (17)
7:       Combine $z_{ae}$ and $z_{vae}$ to obtain $z_{input} = [z_{ae}, z_{vae}]$
8:       Get output $o_{Task4}$ from Task 4 // Equation (19)
9:       Compute $Loss_1$, $Loss_2$, $Loss_3$, $Loss_4$ // Equations (12), (15), (18) and (20)
10:      Get final $Loss_{total} = Loss_1 + Loss_2 + Loss_3 + Loss_4$ // Equation (21)
11:      Update parameters using gradient descent
      end for
12: end while
13: Joint training to obtain $\mathrm{MTAD}_{mt} = [Task1_{\theta_1}, Task2_{\theta_2, \varphi_2}, Task3_{\theta_3, \varphi_3}, Task4_{\theta_2, \theta_3, \theta_4}]$
$\mathrm{MTAD}_{ad}$ Training
14: Get the feature vector based on $\mathrm{MTAD}_{mt}$ on the training dataset: $Err$(X_train) $= [Err_1, Err_2, Err_3, Err_4]$ // Equations (12), (15), (18) and (20)
15: Train $\mathrm{MTAD}_{ad}$ based on $Err$(X_train)
The MTAD Inference Procedure is shown in Algorithm 2 as follows:
Algorithm 2: MTAD Inference Procedure.
Input: Test data $S \in \mathbb{R}^{M \times K}$; trained multi-task model, $\mathrm{MTAD}_{mt}$; trained error evaluation and anomaly detection model, $\mathrm{MTAD}_{ad}$
Output: Anomaly detection result
1: Standardize $S$ based on $\mu_{train}$ and $\delta_{train}$
2: Derive errors $Err(S)$ based on $\mathrm{MTAD}_{mt}$
3: Input $Err(S)$ to the anomaly detection model, $\mathrm{MTAD}_{ad}$
4: if $\mathrm{MTAD}_{ad}(Err(S)) == 1$ then
5:    result = “anomaly”
6: else
7:    result = “normal”
8: end if
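The control flow of Algorithm 2 can be sketched in Python as below. The two callables stand in for the trained models and are purely hypothetical stubs; only the order of the steps and the convention that an output of 1 means "anomaly" are taken from the algorithm. Note that some libraries use a different label convention (for example, scikit-learn's `IsolationForest.predict` returns -1 for outliers), so the detector output may need to be mapped accordingly.

```python
def mtad_infer(window, mu, sigma, extract_errors, detector):
    """Follow Algorithm 2: standardize, extract Err(S), and label the window."""
    # Step 1: standardize with the statistics fitted on the training data.
    standardized = [
        [(v - m) / s for v, m, s in zip(row, mu, sigma)] for row in window
    ]
    # Step 2: derive the four-dimensional error vector from the multi-task model.
    err = extract_errors(standardized)
    # Steps 3-8: the detector returns 1 for an anomaly (convention of Algorithm 2).
    return "anomaly" if detector(err) == 1 else "normal"


def stub_extract_errors(window):
    """Hypothetical stand-in for the multi-task model: same crude error for all tasks."""
    total = sum(abs(v) for row in window for v in row)
    return [total, total, total, total]


def stub_detector(err):
    """Hypothetical stand-in for the trained detector: flag large errors."""
    return 1 if max(err) >= 3.0 else 0


print(mtad_infer([[0.1], [0.2]], [0.0], [1.0], stub_extract_errors, stub_detector))
print(mtad_infer([[5.0], [4.0]], [0.0], [1.0], stub_extract_errors, stub_detector))
```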

4. Results and Discussion

4.1. Dataset

An experiment was performed using a dataset obtained from the electrical power subsystem of an in-orbit spacecraft, which contained two and a half years of data with more than 0.6 million samples from 8 telemetry channels, such as the voltage, current, and temperature of the solar panels. The earliest 80% of the dataset was used for training and the remaining 20% for testing. Of the data samples, 0.15% were labeled anomalous by an expert and the rest were normal. The training dataset was preprocessed to remove errors in the data [42]. The detailed dataset statistics are summarized in Table 1.

4.2. Setup and Implementation Details

4.2.1. Evaluation Measures

Because accuracy can be a misleading measure when the data are highly imbalanced [43], we evaluated our model in terms of precision (P), recall (R), and F1-score, as follows:
$$P = \frac{TP}{TP + FP}, \tag{22}$$
$$R = \frac{TP}{TP + FN}, \tag{23}$$
$$F1 = \frac{2 \times P \times R}{P + R}, \tag{24}$$
where TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative) represent the numbers of samples correctly classified as anomalous, correctly classified as normal, incorrectly classified as anomalous, and incorrectly classified as normal, respectively.
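The following sketch computes the three measures from binary labels, with 1 denoting an anomalous sample, and contrasts them with accuracy on a small imbalanced example. The label encoding, the convention of returning 0 when a denominator is zero, and the toy data are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (P, R, F1) for binary labels where 1 marks an anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 if undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)


# 1000 samples with 2 anomalies; a detector that always answers "normal".
y_true = [1, 1] + [0] * 998
y_pred = [0] * 1000
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(accuracy)                             # high despite missing every anomaly
print(precision_recall_f1(y_true, y_pred))  # all three measures are zero
```

A detector that never raises an alarm reaches an accuracy of 0.998 on this toy data while detecting nothing, which is why accuracy is not used as an evaluation measure here.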

4.2.2. Implementation Details

We set the input length to 8 for each task. Thus, for the prediction task, the next time step of the time-series telemetry data was predicted based on the preceding eight steps. The model parameters are summarized in Table 2.
During the experiment, the contamination proportions in the training data were set to 0.005 and 0.002. In each task, the errors between the true and predicted values were ranked, and the error value at the contamination percentile was used as the threshold ($THR$). For test sample $S$, the prediction/reconstruction error was taken as $Err(S)$. If $Err(S)$ exceeded or was equal to $THR$, $S$ was identified as abnormal; otherwise, it was normal.
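The thresholding rule can be sketched as follows. Ranking the training errors in descending order and taking the value at the position given by the contamination proportion is one straightforward reading of the description; the exact percentile convention used in the experiments is not stated, so the rounding below is an assumption.

```python
def contamination_threshold(train_errors, contamination):
    """Return THR such that roughly `contamination` of the training errors are >= THR."""
    ranked = sorted(train_errors, reverse=True)
    # Assumption: round to the nearest whole number of samples, flag at least one.
    n_flagged = max(1, round(contamination * len(ranked)))
    return ranked[n_flagged - 1]


def is_abnormal(error, threshold):
    """A sample is abnormal if its error exceeds or equals the threshold."""
    return error >= threshold


train_errors = [float(i) for i in range(1000)]   # toy errors 0.0 ... 999.0
thr = contamination_threshold(train_errors, 0.005)
print(thr)
print(sum(is_abnormal(e, thr) for e in train_errors))
```

With 1000 training errors and a contamination proportion of 0.005, the threshold is the fifth-largest error, so five training samples lie at or above it.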

4.2.3. Methods for Comparison

The following alternative methods were also employed in the experiment, for comparison with the proposed MTAD:
  • iForest, which defines anomalous samples as sparsely distributed points that can easily be isolated. Each isolation tree is built on a random subsample by partitioning the feature space with random hyperplanes. In the experiment, the input length was set to 8 and the number of trees was set to 100.
  • LSTM-AE [44], which is a reconstruction-based method featuring LSTM layers in both the encoder and the decoder.
  • Fully connected neural network-based variational autoencoder (FCNN-VAE), which is a VAE anomaly detection model implemented using an FCNN.

4.3. Results and Discussion

We compared the proposed method with the existing alternatives in terms of P, R, and F1-score for the different contamination ratios; the results are listed in Table 3. The proposed MTAD method achieved the highest F1-score at both contamination ratios, which supports its effectiveness. We attribute this result to the fact that MTAD captures both the spatial and temporal correlations in the data, which is important for discriminating normal values from anomalies, whereas the alternative methods do not capture both.
Figure 5 shows the original telemetry data in blue and the detected results in green. For the detected results, “high” and “low” values denote normal and anomalous results, respectively. The red dotted line indicates the time at which the failure occurred. It is apparent that our method consistently returned a “low” value when the exception occurred, which indicates that our method is sensitive to unexpected data samples. Note that individual false alarms were reported, as shown in Figure 6a, because some data errors existed in the test dataset. This outcome is reasonable because data errors are a kind of anomaly that must be handled appropriately.
The performance of an unsupervised anomaly detection method depends on the ability of the method to learn data patterns well during the training process, without under- or overfitting. Therefore, we checked the fit of each task over the training and test datasets in Figure 6. Figure 6a,d show the prediction errors of Proxy Tasks 1 and 4, respectively, on the training and test datasets. Figure 6b,c show the reconstruction errors of Proxy Tasks 2 and 3, respectively. Clearly, none of the four proxy tasks was over- or underfitted; this was because of the shared representation of multiple tasks, which improved the model generalization.
To verify the effectiveness of each task implemented in the proposed method, ablation experiments were performed on the dataset; the results are shown in Figure 7. In the figure, “T1”, “T2”, and “T3” indicate that only Proxy Task 1, 2, or 3 was implemented, respectively, and “All” indicates that all tasks were retained. “T1 + T2” indicates that Proxy Tasks 1 and 2 were retained, and other combinations have similar meanings. Note that, as the Task 4 input depends on the latent representations of Tasks 2 and 3, there is no separate “T4” result. The experimental results demonstrate that the F1-score improved as the number of proxy tasks increased. Furthermore, the simultaneous implementation of Tasks 1 and 4 further improved the model performance. The F1-score values were lower when only a single prediction or reconstruction task was retained, which indicates that the effectiveness of the MTAD method for anomaly detection stems from the joint training of all proxy tasks.
The detailed ablation study results are summarized in Table 4.

5. Conclusions

In this study, we proposed the MTAD method, which is a new multi-task learning unsupervised anomaly detection technique that (i) captures the spatial-temporal correlations of multi-dimensional time series data from the spacecraft and extracts anomalous features and (ii) implements an iForest-based model to detect anomalies from those features. Experiments on a real spacecraft telemetry dataset demonstrated that the MTAD method outperforms comparative methods, and ablation studies highlighted the effectiveness of the employed multi-task learning.
In future work, we will evaluate our model on more real datasets with different characteristics, such as data from different satellite subsystems, with different percentages of anomalies, and from different kinds of spacecraft. Moreover, we will add weights to the various proxy tasks to investigate the effect of changes in these weights on the model performance. Overall, the results of this work indicate the promising potential of the proposed MTAD technique for application to anomaly detection for in-orbit spacecraft.

Author Contributions

Conceptualization, X.H.; methodology, K.Y. and J.G.; software, K.Y.; validation, L.G.; investigation, Y.C.; resources, Y.C.; data curation, Y.W.; writing—review and editing, Y.W. and K.Y.; visualization, L.G.; supervision, J.G.; project administration, X.H.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China under Grant 61972398 and Key Research Program under Grant 2019-JCJQ-ZD-342-00.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used are from an in-orbit spacecraft and have not yet been made public.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Barreyre, C.; Boussouf, L.; Cabon, B.; Laurent, B.; Loubes, J.-M. Statistical methods for outlier detection in space telemetries. In Space Operations: Inspiring Humankind’s Future; Springer: Berlin/Heidelberg, Germany, 2019; pp. 513–547.
  2. Fujimaki, R.; Yairi, T.; Machida, K. An approach to spacecraft anomaly detection problem using kernel feature space. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, IL, USA, 21 August 2005; pp. 401–410.
  3. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 1–58.
  4. Peng, X.; Pang, J.; Peng, Y.; Liu, D. Review on anomaly detection of spacecraft telemetry data. Chin. J. Sci. Instrum. 2016, 37, 1929–1945.
  5. Fuertes, S.; Picart, G.; Tourneret, J.-Y.; Chaari, L.; Ferrari, A.; Richard, C. Improving spacecraft health monitoring with automatic anomaly detection techniques. In Proceedings of the 14th International Conference on Space Operations, Daejeon, Korea, 16–20 May 2016; p. 2430.
  6. Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395.
  7. Ding, N.; Gao, H.; Bu, H.; Ma, H.; Si, H. Multivariate-time-series-driven real-time anomaly detection based on bayesian network. Sensors 2018, 18, 3367.
  8. Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837.
  9. Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. TadGAN: Time series anomaly detection using generative adversarial networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 33–43.
  10. Paffenroth, R.; Du Toit, P.; Nong, R.; Scharf, L.; Jayasumana, A.P.; Bandara, V. Space-time signal processing for distributed pattern detection in sensor networks. IEEE J. Sel. Top. Signal Process. 2013, 7, 38–49.
  11. Latecki, L.J.; Lazarevic, A.; Pokrajac, D. Outlier detection with kernel density functions. In International Workshop on Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2007; pp. 61–75.
  12. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.
  13. Sakurada, M.; Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, Australia, 2 December 2014; pp. 4–11.
  14. Zhang, Y.; Chen, Y.; Wang, J.; Pan, Z. Unsupervised deep anomaly detection for multi-sensor time-series signals. IEEE Trans. Knowl. Data Eng. 2021.
  15. Li, Z.; Zhao, Y.; Han, J.; Su, Y.; Jiao, R.; Wen, X.; Pei, D. Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Washington, DA, USA, 14–18 August 2021; pp. 3220–3230.
  16. Jiang, L.; Li, H.; Yang, G.; Yang, Q.; Huang, H. A survey of spacecraft autonomous fault diagnosis research. J. Astronaut. 2009, 30, 1320–1326.
  17. Wang, W.; Wang, X.; Xu, H. Design and Implementation of Autonomous Health Management System for GF-3 Satellite. Spacecr. Eng. 2017, 26, 40–46.
  18. Wang, X.-G.; Wang, C.; Han, K.-Z. Early Fault Diagnosis Method of Rolling Bearings Based on Optimization of VMD and MCKD. J. Northeast. Univ. (Nat. Sci.) 2021, 42, 373.
  19. Fei, Y.; Meng, T.; Jin, Z.-H. Hierarchical fault detection for nano-pico satellite attitude control system. J. ZheJiang Univ. (Eng. Sci.) 2020, 54, 824–832.
  20. Li, L.; Gao, Y.; Wu, Z.; Zhang, X. Small fault detection method for actuator of satellite attitude control system. J. Beijing Univ. Aeronaut. Astronaut. 2019, 45, 529.
  21. Jiang, H.; Zhang, K.; Wang, J.; Lü, M. Spacecraft Anomaly Recognition Based on Morphological Variational Mode Decomposition and JRD. Xibei Gongye Daxue Xuebao/J. Northwestern Polytech. Univ. 2018, 36, 20–27.
  22. Rahimi, A.; Kumar, K.D.; Alighanbari, H. Fault detection and isolation of control moment gyros for satellite attitude control subsystem. Mech. Syst. Signal Process. 2020, 135, 106419.
  23. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  24. Chalapathy, R.; Chawla, S. Deep learning for anomaly detection: A survey. arXiv 2019, arXiv:1901.03407.
  25. Ni, P.; Wen, X. Fault diagnosis of satellite attitude actuator based on recurrent neural network. Chin. Space Sci. Technol. 2021, 41, 121.
  26. Jiang, H.; Shao, H.; Li, X. Deep learning theory with application in intelligent fault diagnosis of aircraft. J. Mech. Eng. 2019, 55, 27–34.
  27. Zeng, Z.; Jin, G.; Xu, C.; Chen, S.; Zeng, Z.; Zhang, L. Satellite Telemetry Data Anomaly Detection Using Causal Network and Feature-Attention-Based LSTM. IEEE Trans. Instrum. Meas. 2022, 71, 1–21.
  28. Caruana, R. Multitask learning. Mach. Learn. 1997, 28, 41–75.
  29. Xu, D.; Shi, Y.; Tsang, I.W.; Ong, Y.-S.; Gong, C.; Shen, X. Survey on multi-output learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2409–2429.
  30. Zhang, Y.; Yang, Q. A survey on multi-task learning. IEEE Trans. Knowl. Data Eng. 2021.
  31. Ruder, S. An overview of multi-task learning in deep neural networks. arXiv 2017, arXiv:1706.05098.
  32. Georgescu, M.-I.; Barbalau, A.; Ionescu, R.T.; Khan, F.S.; Popescu, M.; Shah, M. Anomaly detection in video via self-supervised and multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12742–12752.
  33. Gibert, X.; Patel, V.M.; Chellappa, R. Deep multitask learning for railway track inspection. IEEE Trans. Intell. Transp. Syst. 2016, 18, 153–164.
  34. Jezequel, L.; Vu, N.-S.; Beaudet, J.; Histace, A. Fine-grained anomaly detection via multi-task self-supervision. In Proceedings of the 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Washington, DC, USA, 16–19 November 2021; pp. 1–8.
  35. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  36. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
  37. Oord, A.v.d.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499.
  38. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
  39. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  40. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
  41. Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
  42. Wang, Y.; Gong, J.; Zhang, J.; Han, X. A Deep Learning Anomaly Detection Framework for Satellite Telemetry with Fake Anomalies. Int. J. Aerosp. Eng. 2022, 2022, 1676933.
  43. Akosa, J. Predictive accuracy: A misleading performance measure for highly imbalanced data. In Proceedings of the SAS Global Forum, Orlando, FL, USA, 2–5 April 2017.
  44. Malhotra, P.; Ramakrishnan, A.; Anand, G.; Vig, L.; Agarwal, P.; Shroff, G. LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv 2016, arXiv:1607.00148.
Figure 1. Overall structure of proposed MTAD method.
Figure 2. Schematic of the TCN.
Figure 3. Overall process of the proposed MTAD method.
Figure 4. Schematic of the sliding window.
Figure 5. Detection results of the proposed MTAD method on test data set. “High” and “low” values indicate normal and anomalous results, respectively.
Figure 6. (a,d) Task 1 and 4 prediction errors, respectively, and (b,c) Task 2 and 3 reconstruction errors, respectively, on training (blue) and test (red) data.
Figure 7. F1-scores for anomaly detection obtained by adding one proxy task at a time: (a–c) Ablation test results based on Tasks 1–3, respectively.
Table 1. Detailed dataset statistics.
| Dataset | Instances | Dimension | Labeled Anomalies |
|---|---|---|---|
| All | 0.9994 | 8 | 20,125 |
| Train | 491,343 | 8 | 0 |
| Test | 126,062 | 8 | 20,125 |
Table 2. MTAD parameter settings.
| Task | Layers | Loss |
|---|---|---|
| Task 1 | LSTM (units = 32); Dense (units = 8) | Equation (12) |
| Task 2 | TCN (filters = 20, kernel size = 20); Conv1D (filters = 8); Pooling (size = 8); UpSampling1D (size = 8); TCN (filters = 20, kernel size = 20); Conv1D (filters = 8) | Equation (15) |
| Task 3 | LSTM (units = 32); Dense (units = 30); RepeatVector (units = 8); LSTM (units = 32) | Equation (18) |
| Task 4 | LSTM (units = 32); LSTM (units = 32); Dense (units = 8) | Equation (20) |
| iForest | / | / |

| Hyperparameter | Value |
|---|---|
| batch size | 32 |
| training epochs | 40 |
| optimizer | Adam |
| n_estimators (iForest) | 100 |
Table 3. Detection performance of alternative methods and the proposed MTAD method.
| Method | P (contamination = 0.005) | R (contamination = 0.005) | F1-Score (contamination = 0.005) | P (contamination = 0.002) | R (contamination = 0.002) | F1-Score (contamination = 0.002) |
|---|---|---|---|---|---|---|
| MTAD | 0.9966 | 1.0 | 0.9980 | 0.9957 | 0.9997 | 0.9977 |
| LSTM-AE | 0.7350 | 0.9998 | 0.8472 | 0.9223 | 0.9998 | 0.8043 |
| iForest | 0.9949 | 0.5315 | 0.6928 | 0.4139 | 0.9994 | 0.5854 |
| BP-VAE | 0.1596 | 1.0 | 0.2753 | 0.1596 | 1.0 | 0.2753 |
Table 4. Ablation study results.
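| Method | P (contamination = 0.005) | R (contamination = 0.005) | F1-Score (contamination = 0.005) | P (contamination = 0.002) | R (contamination = 0.002) | F1-Score (contamination = 0.002) |
|---|---|---|---|---|---|---|
| MTAD | 0.9966 | 1.0 | 0.9980 | 0.9957 | 0.9997 | 0.9977 |
| T1 (LSTM) | 0.9331 | 1.0 | 0.9653 | 0.8082 | 1.0 | 0.8939 |
| T2 (TCN-AE) | 0.9629 | 1.0 | 0.9811 | 0.7854 | 0.9564 | 0.8798 |
| T3 (LSTM-VAE) | 0.9357 | 1.0 | 0.9668 | 0.8344 | 1.0 | 0.9097 |
| T1 + T2 | 0.9948 | 0.9990 | 0.9969 | 0.9960 | 0.9927 | 0.9944 |
| T1 + T2 + T3 | 0.9959 | 0.9995 | 0.9977 | 0.9973 | 0.9970 | 0.9972 |
| T2 + T3 + T4 | 0.9953 | 0.9990 | 0.9971 | 0.9967 | 0.9970 | 0.9969 |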
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Yang, K.; Wang, Y.; Han, X.; Cheng, Y.; Guo, L.; Gong, J. Unsupervised Anomaly Detection for Time Series Data of Spacecraft Using Multi-Task Learning. Appl. Sci. 2022, 12, 6296. https://doi.org/10.3390/app12136296
