1. Introduction
In recent decades, embedded computers in aviation and aerospace systems have become more integrated and intelligent. Because of their complex structures, the development, production, and maintenance costs of embedded computer systems are increasing. Under the prolonged effects of stress factors such as heat, humidity, and vibration, computer functions in both working and storage conditions degrade gradually, which may eventually cause functional failure. During the maintenance of embedded computers, predicting the degradation trend of critical functional parameters is desirable. In this way, maintenance personnel can monitor the health status of the products in real time and perform timely maintenance before a failure occurs, thereby reducing downtime and maintenance costs. Hence, the degradation analysis of electronic systems based on critical parameters has emerged as a research hotspot.
The reliability of computers and circuits deployed on aerospace equipment should be the most important factor in maintenance work, so it is necessary to predict long-term computer degradation. Algorithms should predict failures expected to occur weeks or more in advance to facilitate early component replacement. Therefore, the long-term monitoring and prediction of key parameters of aerospace computers and their circuits can improve product reliability more effectively. For example, the flight control computers equipped in unmanned aerial vehicles (UAVs) undergo accelerated degradation in humid and hot climates and eventually fail to function. The power consumption and resistance of the entire computer rise significantly before failure, showing a noticeable degradation process. The timely repair and replacement of electronic components can improve the UAVs' operational capability.
The theoretical research and application of computer degradation prediction are in an exploratory stage. In the research process, scholars identify the key parameters that influence the computer's function by analyzing its weak parts, and then perform failure mode and mechanism analyses based on these parameters. Thus, studying the reliability of the entire electronic system is simplified to studying the degradation of several key characterization parameters. Many studies have conducted explorations in this direction. Mao et al. [
1] collected data on the key parameters of embedded computers during temperature-accelerated aging and evaluated real-time input data at normal working temperatures. They used the real input data under storage to update the acceleration factor and reasonably estimate its current degradation trend.
The current mainstream research methods mainly include physical-failure-model-based, mathematical–statistical-model-based, and machine learning-based approaches. A brief description of the methods and their advantages and disadvantages is presented in
Table 1. With the development of deep learning networks, data-driven methods have become a hot research topic. These methods can handle complex prediction problems that are difficult to describe with physical or statistical models.
Essentially, the long-term degradation data of key computer parameters are time series data. Time series data analysis has extensive applications in finance, meteorology, agriculture, industry, and medicine [
22]. Particularly in recent years, with advancements in sensor and network technology, maintenance personnel can more easily collect key computer parameters automatically at regular intervals. This means that a significant amount of time series data on key parameters is now available. Computer parameter degradation is thus a long-term time series prediction problem, and researchers can adopt time series methods to address it. These methods extract patterns from past degradation data and forecast future development trends. A critical-computer-parameter degradation method based on time series analysis offers guidance for maintenance work and has high academic significance.
Predicting computer parameter degradation is challenging and differs significantly from traditional time series forecasting. Standard time series prediction focuses on the correspondence between annotated inputs and expected outputs, optimizing the model by minimizing the difference between the model's output values and the annotated values. However, long-term computer degradation prediction must use existing data to forecast future trends iteratively. During testing, the prediction model uses the predicted value ŷ_t at time t as the input for time t + 1. In this way, the iterative inputs allow the prediction of more steps into the future, theoretically transforming a single-step model into a multi-step prediction model. However, there is inevitably some error in the predicted value at each step. Because the model uses these previous predictions as inputs for subsequent steps, errors accumulate over long iterations. Therefore, excessively long prediction horizons degrade the model's accuracy, which must be appropriately addressed.
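The iterative testing procedure described above can be sketched as follows (a minimal numpy illustration with a hypothetical one-step model, not the paper's network; the toy bias shows how per-step errors compound over the horizon):

```python
import numpy as np

def iterative_forecast(model, history, horizon):
    """Roll a one-step-ahead model forward `horizon` steps by feeding
    each prediction back in as the newest input."""
    window = list(history)
    predictions = []
    for _ in range(horizon):
        y_hat = model(np.asarray(window))
        predictions.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the prediction
    return predictions

# A toy one-step model with a small constant bias: the "true" process
# repeats its last value, so every iteration adds +0.01 of pure error.
biased_model = lambda w: w[-1] + 0.01
preds = iterative_forecast(biased_model, [1.0, 1.0, 1.0], horizon=10)
# The bias compounds linearly: after 10 steps the forecast has drifted by about 0.1.
```

Even a small, constant one-step error thus grows linearly (or worse, for state-dependent errors) with the prediction horizon, which is exactly the accumulation problem the proposed method targets.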
This paper proposes the curriculum and transfer learning methods to address the challenges of long-term computer parameter degradation prediction. Our contributions are as follows:
The network employs a Siamese architecture network, where one branch is trained with annotated data and the other with a curriculum learning approach. The curriculum learning branch combines annotated data and predicted output as the next-step inputs to train the model. The proportion of predicted values in the inputs gradually increases during the training process to avoid accumulative prediction error.
During the training phase, the network output trained with the ground-truth data is regarded as the source domain, while the network is trained iteratively with the predicted value as the target domain. Correlation alignment (CORAL) loss can facilitate time series features to align the covariance between domains. Our network incorporates CORAL loss within deep networks to learn a nonlinear transformation to align source and target domain distributions within feature space.
These improvements enable higher accuracy in the long-term iterative prediction of key computer parameters, providing more time for subsequent maintenance work. Our paper is organized as follows: The related works are discussed in
Section 2. We propose our method of long-term computer parameter degradation estimation in
Section 3.
Section 4 presents the experiments with our datasets. We conclude with a summary in
Section 5.
2. Related Works
This section discusses the existing electronic product degradation prediction methods, time series forecasting, and analysis methods.
2.1. Methods in Electronic Product Degradation Prediction
Physical-failure-based models describe the degradation mechanism of electronic systems. The degradation parameters of the physical model are related to the material properties and stress levels. They are identified by comparative experiments or finite element analysis.
The University of Maryland CALCE Center proposed the life consumption monitoring (LCM) methodology for electronic devices [
3]. The LCM method combines the monitoring of environmental and operational stresses with a physics-of-failure model of electronic devices to calculate cumulative damage and predict the product's remaining life. Renwick [
4] obtained the condition degradation pattern of capacitor devices by monitoring the electrical stress and performed failure prediction analysis. Rana [
5] developed physical failure models for different electronic components and completed a life prediction study. Liu [
6] proposed a reliability assessment based on integrating highly accelerated life testing and accelerated degradation testing. Rockwell [
7] used early warning circuits embedded in the product for the early fault diagnosis of welded and corroded parts with low-cycle-fatigue characteristics.
Statistical-model-based methods, also known as empirical-model-based methods, present degradation predictions as probability density functions by building statistical models from empirical knowledge. Commonly used statistical models include the auto-regressive model, the Wiener process model, the Weibull process model, the Gaussian process model, and the Markov model. Kim [
8] proposed a state-of-health prediction for lithium-ion batteries with the Seasonal ARIMA (SARIMA) and auto-regressive integrated moving average with exogenous variables (ARIMAX) models. Li [
9] addressed the degradation prediction of electronic products based on the Wiener degradation process model and Bayesian posterior estimation to realize real-time updates of parameters. Wei [
10] applied the Weibull distribution to model the distribution of each key component in a complex electronic system. It dynamically depicted the reliability changes in the entire system under the stress impacts. Wan [
11] proposed a stochastic model of thermal reliability analysis and prediction for a whole electronic system based on the Markov process to estimate the thermal reliability of an electronic system. Wang [
12] proposed a generalized Gaussian process to construct a one-stage maximum-likelihood method for parameter estimation for degradation procedures. Shi [
13] proposed a method for forecasting the remaining life of a multi-component computer based on Copula theory. Each component's degradation distribution function was derived via kernel density estimation, and the remaining life prediction model was generated from their correlation.
Data-driven approaches use artificial intelligence techniques to learn patterns of electronic degradation from existing data rather than building physical or statistical models. They can handle complex prediction problems that are difficult to describe with physical or statistical models. With the continuous development of deep learning, data-driven approaches are receiving more and more attention in electronic device degradation prediction. Fan [
14] utilized the existing constant-stress accelerated test data in storage conditions to assess the computer degradation trend using the support vector machine method. Jiang [
15] proposed a reliable cycling aging prediction based on a data-driven model to address the urgent issue of the adaptive and early prediction of lithium-ion batteries’ remaining useful life. Li [
16] introduced a deep learning-based battery health prognostics approach to predict the future degradation trajectory in one shot without iteration or feature extraction. Zhao [
17] constructed a probabilistic degradation prediction framework to estimate the probability density of target outputs based on parametric and non-parametric approaches. The method can naturally provide a confidence interval for the target prediction. Deng [
18] adopted a new multi-scale dilated convolution fusion unit with different dilation factors for remaining useful life (RUL) prediction. Liu [
19] proposed a novel fault diagnostic application of the Gaussian–Bernoulli deep belief network (GB-DBN) for electronics-rich analog systems, which can more effectively capture high-order semantic features from the raw output signals.
2.2. The Forecasting and Analysis Methods of Time Series
This paper employs a time series approach to address the degradation of crucial computer parameters and focuses on predictive analysis methods for time series. Traditional methods determine the time series parameter model and solve model parameters to complete the prediction. Typical methods include ARIMA (Auto-Regressive Integrated Moving Averages) [
23] and the Holt–Winters method. While traditional time series models can solve simple prediction problems, they may not suffice when the time series contains many variables and dimensions or when its change patterns are overly complex.
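As a concrete illustration of the classical approach, the auto-regressive part of such models can be fitted by ordinary least squares (a minimal numpy sketch; `fit_ar` and `predict_next` are illustrative helpers, not a full ARIMA implementation):

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p}
    by ordinary least squares (a minimal stand-in for ARIMA's AR part)."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Column k holds the lag-(k+1) values y_{t-1-k} for t = p .. n-1.
    lags = np.column_stack([y[p - k - 1 : n - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]

def predict_next(series, coef):
    """One-step-ahead forecast from the fitted coefficients."""
    p = len(coef) - 1
    recent = np.asarray(series[-p:], dtype=float)[::-1]  # most recent lag first
    return coef[0] + coef[1:] @ recent
```

For example, fitting AR(1) to a noiseless series generated by y_t = 2 + 0.5·y_{t−1} recovers c ≈ 2 and a₁ ≈ 0.5, and the one-step forecast follows the same recursion.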
Time series decomposition is a valuable method in time series analysis. This approach assumes that a time series results from the superposition or coupling of a long-term (secular) trend T, seasonal variation S, cyclical variation C, and irregular variation I. A typical method of this kind is Prophet [
24].
Time series data prediction is closely related to regression analysis in machine learning. Support Vector Regression (SVR) applies SVM to time series function regression by mapping x to a high-dimensional feature space using a nonlinear transformation and estimating the time series. Gradient Boosting Regression Tree (GBRT) introduces gradient descent [
25] to solve regression problems by calculating the negative gradient of the loss function to minimize it and ultimately obtain the optimal model.
With the tremendous achievements of deep learning methods in computer vision and natural language processing, deep learning approaches have gradually been introduced into time series prediction applications. By constructing various network architectures, deep neural networks can better represent high-dimensional data, avoiding manual feature engineering and model design. Deep learning facilitates end-to-end training by optimizing a predefined loss function.
Convolutional neural networks (CNNs) extract local features in the time dimension and gradually aggregate them through multiple layers to obtain hidden information from past time sequences. Borovykh [
26] constructed a stacked structure of dilated convolutions in the network to aggregate more historical records for future time series prediction. Chen [
27] proposed a temporal convolutional network (TCN) that treats the sequence as a one-dimensional input and captures long-term relationships through iterative multi-layer convolution. The TCN uses causal dilated convolutions [
28] and residual convolution skip connections to provide an extensive temporal receptive field for modeling.
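The causal dilated convolutions mentioned above can be sketched in a few lines (a numpy illustration following the standard TCN formulation; `causal_dilated_conv` and `receptive_field` are hypothetical helper names):

```python
import numpy as np

def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution with dilation: the output at time t depends
    only on x[t], x[t-d], x[t-2d], ...; inputs before the start of the
    series are zero-padded, so no future information leaks in."""
    k = len(weights)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(weights[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of causal dilated conv layers:
    1 + (k - 1) * sum of the dilation factors."""
    return 1 + (kernel_size - 1) * sum(dilations)

# A four-layer stack with kernel size 2 and dilations 1, 2, 4, 8
# sees 16 past time steps, doubling coverage with each layer.
```

The exponential growth of the receptive field with stacked dilations is what gives the TCN its extensive temporal context at modest depth.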
Recurrent neural networks (RNNs) learn the hidden states within all time series before prediction, serving as feature representations of past information. Combined with current inputs, they provide the next step prediction. Long Short-Term Memory (LSTM) [
29] and its variant Gate Recurrent Unit (GRU) [
30] are essential for RNN-based time series prediction. The DeepAR network [
31] employs an LSTM model to solve time series prediction problems; in the prediction phase, its previous predicted output replaces the annotated input at the next time step. Rangapuram [
32] proposed an RNN-based deep state-space model in which the current state depends only on the previous moment. Wen [
33] used an encoder–decoder structure to forecast multiple future time horizons simultaneously.
Self-attention networks, originating in natural language processing, can be readily applied to time series prediction due to the similarity of these tasks. When processing and analyzing a time series, RNNs must sequentially aggregate all the hidden information from t − n to t, so the connections between distant time steps are usually weak, and their sequential processing makes them less efficient. Models based on the attention mechanism can associate any units in the input time series, with association weighting passing representations from lower-layer features to upper layers. The self-attention mechanism can thus better realize contextual information interaction in time series. Wu [
34] tried the time series prediction task using the Transformer architecture with good results. The Transformer model has great potential to improve prediction performance. However, the Transformer also has limitations, such as high computational cost, a high memory footprint, and its encoder–decoder architecture, which prevent it from being applied to longer time series prediction problems. The Informer [
35] selects O(logL)-dominant queries based on query–key similarity, achieving a similar improvement in computational complexity to LogTrans. Researchers have explored frequency-domain self-attention mechanisms in time series modeling. Autoformer [
36] designs a short-term trend decomposition architecture with an autocorrelation mechanism as the attention module. It measures the delay similarity between input signals and aggregates the top k similar subsequences to produce an output with O(L log L) complexity. FEDformer [
37] applied the attention mechanism in the frequency domain using the Fourier transform and wavelet transform. It achieved linear complexity by randomly selecting a subset of fixed-size frequencies. Li [
38] proposed a combination of CNN and Transformer to enhance contextual information extraction. Lim [
39] used LSTM to model the input sequence as a preprocessing step and fed it into the upper transformer to compensate for the sequence information using attention.
Among neural network approaches, self-attention methods currently perform best in time series prediction because they can attend to the historical context of a series at a larger scale. Although the algorithms above have improved prediction accuracy on public time series datasets, reducing iterative error is more critical for long-term prediction in real applications. Moreover, deep time series networks depend on large-scale training data, and their performance can suffer in scenarios with limited samples.
3. Our Approach
3.1. Overview of Our Network Architecture
Our network employs a dual-branch structure during training, as shown in
Figure 1.
y_i is the real output at step i in the collected dataset, which is annotated in the training process. ŷ_i is the predicted output at step i, which may be regarded as an iterative input at step i + 1. One branch uses the curriculum learning training method and combines annotated and predicted values as inputs to train the model; the proportion of predicted-value inputs gradually increases during the training process. The other branch always uses annotated data for training to preserve the real data distribution. To ensure a better fit to the actual data distribution in curriculum learning training, a transfer learning approach compares the distance between the covariance matrices of the output sequences from the two branches. Deep transfer learning aligns the distributions of the two branches' hidden layers so that the distribution of predictions made from predicted values is as similar as possible to that of predictions made from real values.
During the end-to-end optimization process, the Mean Squared Error (MSE) loss function compares the output time series from each branch to their respective annotated data. In addition, the correlation alignment (CORAL) loss between the two branches is evaluated based on the covariance distance between their output sequences.
We adopt the dual-branch network architecture only during training. During testing, the data are fed into a single trained network branch, and the output from the previous step is used as input for the subsequent step.
The LSTNET [
40] structure is the backbone of our Siamese network architecture due to its effectiveness in accounting for the accumulation of linear trends and periodic fluctuations in time series data. This hypothesis is consistent with the changing trends observed in the degradation of key computer parameters.
3.2. Curriculum Learning Iterative Training in Degradation Prediction
The iterative training methodology employed in the curriculum learning branch introduces an exposure bias into time series forecasting. During training, the time series forecasting network can predict future steps based on currently available data. The predicted outputs from the network are compared with their respective annotated data. The network parameters are updated accordingly based on the difference between them. However, during subsequent training iterations, the predicted outputs from previous steps are utilized as input data in the following steps. This iterative forecasting approach can result in the accumulation of errors.
Traditional deep learning-based time series models exhibit limited predictive capability during the initial stages of training and cannot yet generate consistently accurate forecasts. When a significant deviation occurs at a particular step, the biased output value, used as input data, exacerbates subsequent forecast inaccuracies. This can result in insufficient network convergence and poor forecast accuracy during training.
Curriculum learning adopts a structured learning methodology, much like the human learning process. Initially, an instructor imparts knowledge to the student and waits until the student has acquired a foundational understanding before allowing them to engage in independent learning. The curriculum learning branch's input data combine the previously predicted outputs and annotated data. As illustrated in
Figure 2, during the training phase, the network ceases to rely solely on predicted outputs as inputs for subsequent steps. Instead, it selects its own predicted output with probability p and the annotated value with probability 1 − p as the input for the next step. Throughout this scheduled sampling process, the value of p varies between 0 and 1. When the network is still immature at the beginning of training, a smaller value of p is optimal, so a higher proportion of annotated data is used as input to train the network. As training progresses and the model matures, p should incrementally increase to maximize the use of the model's own predicted output as the iterative input during training. This gradual learning approach enables the model to adapt to training with predicted outputs as cyclic inputs while maintaining prediction accuracy.
During the training phase, the predicted output is selected with probability p, while the annotated data are selected with probability 1 − p. The value of p ranges from 0 to 1. As the number of training iterations increases, p incrementally increases from 0 to 1, and p should be set to 1 at the end of training.
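The scheduled sampling rule above can be sketched as follows (an illustrative Python fragment; the linear ramp is an assumed schedule, and the paper's actual schedule may differ):

```python
import random

def sampling_probability(epoch, total_epochs):
    """Linear curriculum schedule: p ramps from 0 (all annotated inputs)
    to 1 (fully iterative prediction) over the course of training.
    The linear ramp is one common choice; sigmoid or exponential
    schedules are alternatives."""
    return min(1.0, epoch / (total_epochs - 1))

def scheduled_sampling_input(annotated_prev, predicted_prev, p):
    """Pick the next-step input: the model's own previous prediction
    with probability p, the annotated value with probability 1 - p."""
    return predicted_prev if random.random() < p else annotated_prev
```

At epoch 0 the network trains purely on annotated inputs (teacher forcing); by the final epoch it runs fully iteratively, matching the test-time regime.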
3.3. Domain Adaptation between Two Branches
Iterative errors inevitably accumulate when training the network with predicted values. During iterative training, there is a discrepancy between the distribution of predicted data and that of annotated data. Aligning their feature spaces is essential to diminish this discrepancy, and domain transfer learning can accomplish this goal.
During the training phase, the network output trained with the ground-truth data is regarded as the source domain, while the network trained iteratively with predicted values is regarded as the target domain. Throughout the learning process, the outputs of both sub-networks are synchronized to their corresponding time series, keeping their sequence changes consistent in the time dimension. The similarity between the two branches is evaluated by measuring the distance between the covariances of their output sequences.
Covariance quantifies how each dimension deviates from its mean value. The covariance matrix can effectively measure the sequence correlation over time dimensions for time series data. The covariance matrix assesses the correlation of changes across the periods within the time series and serves as a representation of internal correlation. Comparing the correlation of two time series can evaluate their relationship to learn their joint distribution.
Our network adopts the CORAL loss function to measure the distance between two distributions. CORAL loss can facilitate feature transformation to align the covariance between different domains. Our network incorporates CORAL as a loss function within deep networks to learn a nonlinear transformation to align source and target domain distributions within feature space. The covariance matrices of the source and target domains, C_S and C_T, are calculated as shown in Formulas (1)–(3). Here, D_S and D_T are the feature representations of the source and target data, and n_S and n_T represent the sequence lengths in the source and target domains, respectively:

C_S = (1 / (n_S − 1)) (D_S^T D_S − (1 / n_S) (1^T D_S)^T (1^T D_S))   (1)

C_T = (1 / (n_T − 1)) (D_T^T D_T − (1 / n_T) (1^T D_T)^T (1^T D_T))   (2)

L_CORAL = (1 / (4d^2)) ‖C_S − C_T‖_F^2   (3)

where 1 denotes a column vector of all ones, d is the feature dimension, and ‖·‖_F is the Frobenius norm.
In our method, the distance calculation in the covariance matrix differs from the traditional method. In computer vision tasks, the samples in a batch are usually normalized, and then CORAL loss is used to minimize the discrepancy between the source and target domains. However, for the time-varying sequence of critical parameters, the covariance matrix mainly reflects the autocorrelation of different time series at each time step. Therefore, the output features should be normalized by the time dimension rather than by the samples in the batch, as shown in
Figure 3. Normalization can reduce the noise impact from data samples. The two predicted sequence matrices are compared along the time dimension.
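The time-normalized CORAL distance described above might be sketched as follows (a numpy illustration of the computation, not the paper's implementation; in actual training it would be computed on framework tensors so gradients can flow):

```python
import numpy as np

def coral_loss(src, tgt):
    """CORAL distance between two predicted sequences of shape (T, d):
    T time steps, d feature dimensions. Features are normalized along
    the time dimension (not across batch samples), then the squared
    Frobenius distance between the covariance matrices is taken."""
    def time_normalize(x):
        # Normalize each feature over the time axis.
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

    def covariance(x):
        # Unbiased covariance over the time dimension.
        centered = x - x.mean(axis=0)
        return centered.T @ centered / (x.shape[0] - 1)

    d = src.shape[1]
    c_src = covariance(time_normalize(np.asarray(src, dtype=float)))
    c_tgt = covariance(time_normalize(np.asarray(tgt, dtype=float)))
    return float(np.sum((c_src - c_tgt) ** 2) / (4 * d * d))

# Identical sequences give zero distance; the loss grows as the internal
# correlation structures of the two sequences diverge.
```

Because the covariance here is taken over the time axis, the loss compares how the two branches' outputs co-vary across time steps, rather than across samples in a batch.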
In the implementation, we feed the predicted values of the two sub-networks into two fixed-length queues. A first-in-first-out strategy stores the parallel outputs of the two networks separately, and the latest predicted values continuously replace previous values during training. By calculating the distance between the covariance matrices of the two networks' predicted outputs and minimizing the discrepancy between the source and target domains through this distance, the two branches learn to form the same distribution.
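The fixed-length first-in-first-out buffering can be realized with `collections.deque` (an illustrative sketch; the queue length is an assumed hyperparameter):

```python
from collections import deque

# Fixed-length FIFO buffers holding the latest parallel predictions from
# the two branches; QUEUE_LEN is an assumed hyperparameter.
QUEUE_LEN = 64
source_queue = deque(maxlen=QUEUE_LEN)  # branch trained on annotated data
target_queue = deque(maxlen=QUEUE_LEN)  # curriculum-learning branch

def push_predictions(source_pred, target_pred):
    """Append the newest parallel outputs; once full, deque silently
    evicts the oldest entry, implementing first-in-first-out replacement."""
    source_queue.append(source_pred)
    target_queue.append(target_pred)

# Simulate 100 training steps: only the 64 most recent values survive.
for step in range(100):
    push_predictions(float(step), float(step) + 0.5)
```

At each training step, the buffered contents of the two queues would then be passed to the covariance-distance computation described above.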
5. Conclusions
With the development of network and sensor technology, datasets can be collected more efficiently, making data-driven computer degradation analysis possible. Our paper combines curriculum learning and transfer learning to effectively reduce cumulative errors in long-term prediction scenarios for key computer parameters. We propose a Siamese network architecture oriented towards correlation alignment. During training, one branch uses annotated data, while the other uses curriculum learning in iterative prediction. Moreover, the correlation of the time series generated by the two branches is measured by optimizing the CORAL loss function, which aligns the distributions of the predicted time series. Compared to time series prediction methods developed in recent years, our approach can effectively address long-term prediction in embedded computers, improving the maintenance and reliability of electronic devices.
Current deep learning-based time series networks rely on complete datasets, but complete data are difficult to obtain in many applications. For incomplete data, missing values must be imputed during preprocessing. In simple cases, this can be done by interpolating between adjacent values; in complex cases, the missing parts can be generated using autoencoders and adversarial networks.