1. Introduction
Weapons such as missiles and torpedoes are characterized by long-term storage and one-time use, and the majority of their life cycle is spent in storage. During long-term storage, factors such as corrosion, aging, and material surface and interface reactions cause the material properties and physical parameters of products to change gradually, ultimately leading to product failure once functional performance requirements can no longer be met.
During the design phase, the weapon storage lifetime needs to be assessed, including that of components, raw materials, and parts, to ensure that the elements used in the equipment meet the storage lifetime criterion. At the design-finalization stage, the storage lifetime of the overall system needs to be verified, and after delivery, products in service still require periodic sampling and testing as well as prediction of their remaining life.
The accelerated storage test (AST) is one of the critical techniques used in the aforementioned storage lifetime assessment. AST is a testing technique that elevates a particular stress to obtain the crucial parameters of the product's performance degradation during storage while the failure mechanism remains constant [1,2,3]. It is practically useful in engineering because of its speed and efficiency. Since AST obtains performance degradation data by sharply increasing the corresponding stress, the failure mechanism may change if the stress is excessive [4]. That is, an invariant failure mechanism is a precondition for AST [5,6]. Once the failure mechanism changes, the assessment results will inevitably deviate from the real state of the product. Many factors may affect the credibility of AST results, such as test equipment errors, human error, fluctuations in the test environment, and sample dispersion. Even if all of these factors could be controlled precisely, there would still be disagreement between the parameters inferred from the AST model under normal stress, such as the storage lifetime and model credibility, and the system's real state. In addition, there is continuing debate as to whether the AST model is correct and reflects the acceleration process of various products.
In recent decades, extensive research has been carried out both domestically and abroad to investigate alteration of the failure mechanism during accelerated testing. In general, these studies fall into two categories. The first uses the perspective of failure physics to judge the consistency of the failure mechanism under different stresses. This is achieved mainly through a comprehensive analysis of chemical and microscopic structures, together with destructive physical analysis, to judge whether samples under different stresses are consistent in microscopic appearance, element distribution, material properties, and other factors [7,8,9,10]. This approach has a clear physical basis, but it can only qualitatively judge whether the failure mechanism has changed; it is difficult for it to provide quantitative consistency results. Spearman's rank correlation coefficient [11] and grey theory [12,13] have been used to identify the consistency of the failure mechanism from the shape of the degradation path. From the perspective of the pseudo-life distribution, the literature [14,15] has evaluated reliability and predicted lifetime under a log-normal or Weibull distribution using the F-statistic and the Bartlett statistic. The variation in the shape parameter under the Arrhenius model also provides a way to examine the consistency of the failure mechanism [16]. Cai et al. [17] proposed a change-point model for the coefficients of variation to fit the abrupt change behavior of failure mechanisms with a nonparametric empirical likelihood approach, which was applied to lifetime data of the metal oxide semiconductor transistors in the power distribution system of the Chinese Tiangong space station. Zhai et al. [18] proposed a method for consistency testing of ADT (accelerated degradation test) failure mechanisms based on activation energy invariance and the likelihood ratio test, accounting for the degradation dispersion caused by manufacturing technology.
The aforementioned techniques ensure the credibility of AST results to some extent, although defects remain. For example, the method based on failure physics only qualitatively determines whether the failure mechanism has changed. Methods based on experimental data, such as boundary consistency and failure-mechanism consistency discrimination, only determine the consistency of the failure mechanism; they cannot assess the loss of credibility caused by factors such as measurement error and the applicability of the acceleration model. For an AST result, the most reliable criterion is to test its consistency with the corresponding natural storage test data. However, how should the degree of consistency be judged, and to what extent? Good evaluation metrics are still lacking.
This paper proposes a new method for evaluating the credibility of accelerated storage test data based on the idea of an area metric, with small samples of natural storage test data as the benchmark. The remainder of this paper is organized as follows: Section 2 introduces the theory of the probability distribution area metric. Section 3 defines a credibility metric called CMADT, which is derived from this theory and applied to assess the credibility of AST results. An engineering use case demonstrating the validity of the metric is presented in Section 4, and Section 5 offers conclusions with a summary of the main findings.
2. CMADT Credibility Metric for the Accelerated Aging Test
The main stresses loaded in an accelerated storage test are high temperature (single stress) and combined temperature and humidity (double stress). According to the method of stress loading, there are four types of accelerated storage tests: constant stress, step stress, step-down stress, and sequential stress. Regardless of which stress and loading method is used, we can derive the degradation model of the key performance parameters under normal stress (e.g., a temperature of 25 °C) through performance degradation modeling and acceleration model solving [19,20,21,22].
In engineering practice, some natural storage test data are often available in addition to accelerated storage test data. For example, a natural storage test may be carried out during the initial sample stage of product development, and its data become available at the final evaluation; natural storage data can also be obtained each year during the service stage after the equipment is delivered. Generally, natural storage data have a higher confidence level, but they present two problems. First, the storage period is short: for example, it may be necessary to assess product storage reliability over 20 years, while natural storage data often cover only a few years. Second, the sample size is often small, so the natural storage data alone cannot provide high-confidence assessment results. For these reasons, storage life assessment in engineering practice is usually provided by accelerated storage tests, while natural storage test data are mainly used to verify the correctness of the accelerated storage tests.
2.1. Area Metric for a Single Parameter
The area metric was first proposed in 2008 by the American scholars Ferson and Oberkampf [23,24] and was developed by Li [25], Ji [26], Zhang [27], et al. It is a validation metric based on the distance between probability distributions and is mainly used in the field of modeling and simulation. As shown in Figure 1, the accuracy of a simulation model is quantified and evaluated by calculating the area between the cumulative distribution function of the simulation model response and the empirical cumulative distribution function of the experimental observations (the shaded part in Figure 1).
Suppose that the cumulative distribution function of the equivalent data of a product's key performance parameter, obtained from the accelerated storage test at time $t$, is $F(x,t)$, and that the empirical cumulative distribution function of the $r$ samples tested in the natural storage test at time $t$ is $S_r(x,t)$. The model validation area metric can then be borrowed: $F(x,t)$ can be seen as the cumulative distribution function of the simulation model response, and $S_r(x,t)$ as the empirical cumulative distribution function of the test observations. The area metric for the momentary credibility evaluation of the accelerated storage test at time $t$ is defined as
$$d(t) = \int_{-\infty}^{+\infty} \left| F(x,t) - S_r(x,t) \right| \mathrm{d}x. \quad (1)$$
From Equation (1), it can be seen that the area metric is smaller when the probability distribution of the equivalent data is closer to that of the benchmark data, and vice versa. Therefore, the area metric can be used to assess the credibility of the accelerated storage test.
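As a concrete illustration, Equation (1) can be evaluated numerically. The sketch below (Python) computes the area between a model CDF and the empirical CDF of the natural storage observations; the normal model parameters and the benchmark data are hypothetical, and the integration grid assumes the benchmark span plus a margin covers the region where the two CDFs differ.

```python
import numpy as np
from math import erf, sqrt

def area_metric(model_cdf, samples, n_grid=20001):
    """Area metric of Equation (1): the integral over x of
    |F(x, t) - S_r(x, t)|, where F is the model CDF and S_r is the
    empirical (step) CDF of the r benchmark observations."""
    s = np.sort(np.asarray(samples, dtype=float))
    grid = np.linspace(s[0] - 3.0, s[-1] + 3.0, n_grid)
    # Empirical CDF of the benchmark data evaluated on the grid.
    ecdf = np.searchsorted(s, grid, side="right") / s.size
    gap = np.abs(model_cdf(grid) - ecdf)
    # Trapezoidal integration of the gap between the two CDFs.
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid)))

# Hypothetical AST model response at time t: a normal CDF N(10.0, 0.5**2).
normal_cdf = np.vectorize(lambda x: 0.5 * (1.0 + erf((x - 10.0) / (0.5 * sqrt(2.0)))))
# Hypothetical r = 5 natural-storage observations at the same time.
benchmark = [9.6, 9.9, 10.1, 10.3, 10.8]
d_t = area_metric(normal_cdf, benchmark)
```

As expected from Equation (1), moving the benchmark data away from the model distribution increases the metric.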
2.2. Dimensionless Measurement of Area Metrics
The metric defined in Equation (1) carries the dimension of the measured parameter, so the gaps computed for multiple parameters of different scales on the same product, or for different products, cannot be compared directly; nor is it clear how small a gap indicates "excellent" quality or how large a gap indicates "poor" quality.
To obtain a unified evaluation criterion, the metric of Equation (1) is made dimensionless in this paper; its mathematical definition at time $t$ is
$$\rho(t) = 1 - \frac{d(t)}{\lambda(t)}. \quad (2)$$
$\rho(t)$ is called the Credibility Metric of the Accelerated Degradation Test (CMADT), where $\lambda(t)$ is a value with the same scale as the area metric $d(t)$ and is used to characterize the dispersion of the accelerated storage test data, as shown in Figure 2. Its expression is
$$\lambda(t) = \int_{-\infty}^{+\infty} \left| F(x,t) - H\!\left(x-\mu(t)\right) \right| \mathrm{d}x, \quad (3)$$
where $H(\cdot)$ is the unit step function and $\mu(t)$ and $\sigma(t)$ are the mean and standard deviation of the accelerated storage test degradation data at time $t$, respectively; for normally distributed data, $\lambda(t)=\sqrt{2/\pi}\,\sigma(t)$.
From Equation (2), it is clear that $\rho(t)$ is dimensionless, which is convenient for providing a unified credibility measure across subsequent accelerated storage tests of multiple key performance parameters.
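The normalization of Equation (2) can be sketched as follows. Taking the dispersion scale as $\sigma(t)\sqrt{2/\pi}$ (the area between a normal CDF and the unit step at its mean) is an assumption about the paper's dispersion term, and clamping the result at zero is an added convention so the metric stays in $[0, 1]$.

```python
import math

def cmadt(d, sigma):
    """Dimensionless CMADT of Equation (2): rho = 1 - d / lambda.
    lambda is the dispersion scale of the AST data; for normal data it is
    sigma * sqrt(2 / pi), i.e., the area between the normal CDF and the
    unit step function at the mean (an assumption, see the text)."""
    lam = sigma * math.sqrt(2.0 / math.pi)
    # Clamp at 0 so the metric stays in [0, 1] (added convention).
    return max(0.0, 1.0 - d / lam)
```

A zero area metric gives full credibility (1), and an area metric exceeding the dispersion scale gives zero.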
3. Credibility Metric of the Accelerated Aging Test
3.1. Probability Distributions of Key Performance Parameters Based on Natural Storage Tests with Small Samples
The sample size of a natural storage test is relatively small compared with that of an accelerated aging test; were it not, the degradation model could be obtained directly from the empirical data of the natural storage test.
Assume that the sample size of the natural storage test is $r$ and that the corresponding test times are $t_1, t_2, \ldots, t_L$, where $L$ is the number of tests. A set of key-parameter degradation data is $\{x_{i,l}\}$, where $x_{i,l}$ is the test datum of sample $i$ at moment $t_l$. Given the small sample, modeling with an assumed probability distribution easily introduces strong subjective assumptions that affect the accuracy of the assessment. This paper constructs the upper and lower bounds of a p-box for the natural storage data based on the belief and plausibility functions of D–S evidence theory; this processing makes no subjective assumption about the distribution type and can effectively retain the statistical characteristics of the original information.
Definition 1. Let $\Theta$ be the recognition frame (frame of discernment). A mapping $m: 2^{\Theta} \to [0,1]$ satisfying $m(\varnothing)=0$ and $\sum_{A \subseteq \Theta} m(A)=1$ is called a basic probability assignment (BPA); any $A$ with $m(A)>0$ is a focal element.
First, calculate the mean value of the data series at moment $t_l$:
$$\bar{x}_l = \frac{1}{r}\sum_{i=1}^{r} x_{i,l}.$$
Arrange the $r$ observations $x_{i,l}$ and the mean $\bar{x}_l$ together in ascending order. The resulting sequence of $r+1$ points forms a set $A=\{A_1, A_2, \ldots, A_r\}$ consisting of $r$ interval numbers, each interval bounded by two adjacent points of the sequence.
The distance from the mean value to each interval number in $A$ is calculated, and the basic probability assignment of each $A_i$ is obtained. The belief and plausibility functions are then constructed to obtain the CDF (cumulative distribution function) bounds of the p-box.
Definition 2. Let $D=[d^-, d^+]$ and $E=[e^-, e^+]$ be two interval numbers, and let
$$d_p(D,E) = \left[ \tfrac{1}{2}\left( |d^- - e^-|^p + |d^+ - e^+|^p \right) \right]^{1/p}$$
be the distance between the interval numbers $D$ and $E$. When $p=2$, $d_2(D,E)$ is called the Euclidean distance. The mean $\bar{x}_l$ can be viewed as the interval number $[\bar{x}_l, \bar{x}_l]$, so the Euclidean distance from an interval number $A_i=[a_i^-, a_i^+]$ to $\bar{x}_l$ is
$$d_i = \sqrt{\tfrac{1}{2}\left[ (a_i^- - \bar{x}_l)^2 + (a_i^+ - \bar{x}_l)^2 \right]}.$$
Normalizing $d_i$ gives the distance vector from the interval numbers to $\bar{x}_l$:
$$\tilde{d}_i = \frac{d_i}{\sum_{j=1}^{r} d_j}.$$
In turn, the similarity between the interval number $A_i$ and $\bar{x}_l$ is defined as
$$s_i = 1 - \tilde{d}_i.$$
$s_i$ expresses the extent to which the distribution interval of an individual test datum is similar to the expected value $\bar{x}_l$ and serves as the basis for assigning a confidence probability, i.e., a BPA, to $A_i$:
$$m(A_i) = \frac{s_i}{\sum_{j=1}^{r} s_j}.$$
Although the basic probability assignment of each $A_i$ has been obtained, there is no evidence to show what distribution the parameter obeys within the interval $A_i$. Therefore, the p-box of the parameter is constructed with the plausibility function as the upper bound and the belief function as the lower bound, such that any possible probability distribution of $x$ falls within this envelope.
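The construction above can be sketched in code. The interval formation (sorted observations plus their mean), the similarity-based BPA, and the belief/plausibility bounds follow the reconstruction in this section and should be read as an illustration under those assumptions, not as the paper's exact formulas; the routine assumes the observations are not all identical.

```python
import numpy as np

def pbox_from_samples(samples):
    """Sketch of the Section 3.1 p-box: sort the r observations together
    with their mean to obtain r interval numbers (focal elements), weight
    each interval by its similarity to the mean (Euclidean interval
    distance), and return belief/plausibility CDF bounds on a grid."""
    x = np.sort(np.asarray(samples, dtype=float))
    xbar = x.mean()
    pts = np.sort(np.append(x, xbar))      # r + 1 points -> r intervals
    lo, hi = pts[:-1], pts[1:]             # focal elements [lo_i, hi_i]
    # Euclidean distance from each interval to the point interval [xbar, xbar].
    dist = np.sqrt(0.5 * ((lo - xbar) ** 2 + (hi - xbar) ** 2))
    sim = 1.0 - dist / dist.sum()          # similarity to the mean
    bpa = sim / sim.sum()                  # normalised BPA
    grid = np.linspace(pts[0], pts[-1], 501)
    # Plausibility (upper CDF): mass of intervals that have started by x.
    upper = np.array([bpa[lo <= g].sum() for g in grid])
    # Belief (lower CDF): mass of intervals lying entirely below x.
    lower = np.array([bpa[hi <= g].sum() for g in grid])
    return grid, lower, upper
```

Both bounds rise from 0 to 1, with the belief curve never above the plausibility curve, so every distribution consistent with the evidence lies in the envelope.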
3.2. CMADT under the p-Box
The performance degradation model for a parameter $P$ under normal stress, derived from the accelerated storage test data, yields a set of time-section data $\{y_1, y_2, \ldots, y_N\}$ at moment $t$, where $N$ is determined by the sample size of the accelerated storage test and the performance degradation modeling method used. Generally, the sample size of the equivalent data obtained from the evaluation of accelerated storage test data is relatively large. For example, in a high-temperature accelerated storage test with three stress levels and five samples per stress level, if the performance degradation trajectory method is used, one degradation curve under normal stress can be obtained from each combination of one sample per stress level, giving $5^3 = 125$ curves; the sample size of the cross-sectional data at moment $t$ is then 125. Therefore, the probability distribution of the time-section data can be described well by hypothesis testing with commonly used distribution types such as the normal and log-normal distributions.
Since the p-box is used to express the small-sample natural storage data in this paper, calculating CMADT by Equation (2) yields an interval number $[\underline{\rho}(t), \overline{\rho}(t)]$, whose upper and lower bounds are obtained by evaluating the area metric against the lower (belief) and upper (plausibility) bounds of the p-box, respectively.
3.3. Overall Credibility Evaluation of a Single Key Parameter
The previous two sections discussed the credibility evaluation metrics of the equivalent data derived from accelerated storage tests at a single point in time. In general, benchmark data often exist for multiple time points of the test data, so an integrated overall credibility metric that combines the credibility of multiple time points needs to be investigated.
To ensure the normalized character of the overall credibility index, this paper performs probability statistics on the CMADT values of the n test points and takes the lower confidence limit of credibility at confidence level γ as the overall credibility index of an individual key performance parameter.
According to Equation (17), the CMADT value can be obtained for all $n$ test points, i.e., the series $\{\rho(t_1), \rho(t_2), \ldots, \rho(t_n)\}$. Let the confidence level of the credibility assessment be $\gamma$ and $\alpha = 1 - \gamma$ be the significance level. If the series can pass a normality test (e.g., a K–S test), the confidence interval for the $n$ CMADT values at confidence level $\gamma$ can be computed according to the normal distribution as follows:
$$\left[ \bar{\rho} - z_{\alpha/2}\frac{s}{\sqrt{n}},\; \bar{\rho} + z_{\alpha/2}\frac{s}{\sqrt{n}} \right],$$
where $\bar{\rho}$ and $s$ are the sample mean and standard deviation of the series, and $z_{\alpha/2}$ can be obtained from the standard normal distribution table; for the significance levels most widely used in practice, the values of $z_{\alpha/2}$ are provided in Table 1.
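A minimal sketch of the normal-theory interval follows; Python's `statistics.NormalDist` supplies the inverse CDF, so no quantile table is needed. The CMADT series below is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def cmadt_confidence_interval(values, gamma=0.8):
    """Two-sided normal-theory confidence interval for the mean of the
    n single-point CMADT values at confidence level gamma; assumes the
    series has passed a normality test (e.g., K-S)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    z = NormalDist().inv_cdf(1.0 - (1.0 - gamma) / 2.0)  # z_{alpha/2}
    half = z * s / math.sqrt(n)
    return m - half, m + half
```

For example, a hypothetical series of six CMADT values centred near 0.8 yields an interval tightly bracketing 0.8 at γ = 0.8.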
The calculation process for the other bound series of the interval CMADT is similar and is not repeated; finally, confidence limits for both the lower-bound and upper-bound series are obtained.
To be conservative, the credibility evaluation of this accelerated storage test is taken as the upper confidence limit; that is, the overall credibility of the single key performance parameter (CMADT of a single parameter, CSP) at confidence level $\gamma$ is given by this limit.
The index obtained by Equation (21) is dimensionless, which is convenient for horizontal comparison among multiple key performance parameters of the same product or of the credibility of accelerated storage test data for different products.
If the data series cannot pass the normality test, the kernel density estimation (KDE) method [28] can be used to calculate the lower confidence limit of credibility at confidence level γ. Kernel density estimation is a nonparametric method that can be used to estimate distribution characteristics when the distribution is nonnormal and nonstandard.
For a set of data $\{\rho(t_1), \ldots, \rho(t_n)\}$, the probability density function is estimated as
$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x-\rho(t_i)}{h}\right),$$
where $h$ is the window width (bandwidth) and $K(\cdot)$ is the kernel function, which satisfies $K(x)\ge 0$ and $\int_{-\infty}^{+\infty} K(x)\,\mathrm{d}x = 1$. In this paper, we choose the classical Gaussian kernel function.
For the data series $\{\rho(t_1), \ldots, \rho(t_n)\}$, after obtaining the KDE estimate, the confidence interval at confidence level $\gamma$ can be obtained as
$$\left[ \hat{\rho}_{\alpha/2},\; \hat{\rho}_{1-\alpha/2} \right],$$
where $\hat{\rho}_{\alpha/2}$ is the $\alpha/2$-quantile of the estimated probability distribution and $\hat{\rho}_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile; their meanings are analogous to those of the normal-theory limits above. In this case, the single key performance parameter CSP evaluation at confidence level $\gamma$ takes the corresponding limit of this interval.
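The KDE-based limits can be sketched as follows. The Gaussian-mixture CDF is inverted by bisection, and Silverman's normal-reference bandwidth is an assumption (the paper does not state its bandwidth rule).

```python
import numpy as np
from math import erf, sqrt

def kde_quantile(data, q, h=None):
    """Quantile of a Gaussian-kernel density estimate. The KDE CDF is a
    mixture of normal CDFs centred at the data; it is inverted here by
    bisection. Bandwidth h defaults to Silverman's normal-reference rule
    (an assumption)."""
    x = np.asarray(data, dtype=float)
    n = x.size
    if h is None:
        h = 1.06 * x.std(ddof=1) * n ** (-0.2)
    cdf = lambda t: float(np.mean(
        0.5 * (1.0 + np.vectorize(erf)((t - x) / (h * sqrt(2.0))))))
    lo, hi = x.min() - 5.0 * h, x.max() + 5.0 * h
    for _ in range(80):                    # bisection on the smooth CDF
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kde_ci(data, gamma=0.8):
    """[rho_{alpha/2}, rho_{1-alpha/2}] confidence interval from the KDE."""
    a = (1.0 - gamma) / 2.0
    return kde_quantile(data, a), kde_quantile(data, 1.0 - a)
```

For data symmetric about a point, the 0.5-quantile of the KDE coincides with that point, which gives a quick sanity check.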
3.4. Overall Credibility Evaluation of Multiple Key Parameters
The above discussion addresses the case in which the product has a single key parameter. In practice, however, multiparameter degradation is also common [29,30,31,32]. Let the product have m key performance parameters, denoted as P1, P2, …, Pm, and let the credibility of the accelerated storage test for each of the m parameters be obtained according to the method described in Section 3.3. To obtain the credibility of the whole accelerated storage test, the weights of the m parameters need to be calculated, and the total credibility index is obtained by the weighted-average method.
The dynamic time warping (DTW) distance [33,34] is a time-series similarity measure with good performance that suits this paper's scenario, in which different key performance parameters evolve over time. Therefore, the DTW distance is used to measure the similarity between the individual response quantities and thereby determine the contribution (i.e., weight) of each key performance parameter in the calculation of the overall credibility. Suppose there are two time series $X=(x_1, x_2, \ldots, x_p)$ and $Y=(y_1, y_2, \ldots, y_q)$. The DTW distance can then be defined recursively as
$$D(i,j) = d(x_i, y_j) + \min\{ D(i-1,j),\; D(i,j-1),\; D(i-1,j-1) \},$$
where $d(x_i, y_j)$ is the local distance between elements, $D(0,0)=0$, and $D(i,0)=D(0,j)=\infty$ for $i,j>0$; the DTW distance of $X$ and $Y$ is $D(p,q)$. From this definition, it can be seen that the DTW distance is obtained by finding the minimum-cost alignment path between the two time series, and it can be computed recursively directly using Equation (28).
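The recursion can be implemented directly by dynamic programming; the sketch below uses the absolute difference as the local distance $d(x_i, y_j)$, which is an assumption (the paper does not fix the local distance).

```python
import numpy as np

def dtw(x, y):
    """Dynamic time warping distance between 1-D series x and y,
    computed bottom-up by dynamic programming."""
    p, q = len(x), len(y)
    D = np.full((p + 1, q + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, p + 1):
        for j in range(1, q + 1):
            cost = abs(x[i - 1] - y[j - 1])       # local distance d(x_i, y_j)
            D[i, j] = cost + min(D[i - 1, j],     # warp x forward
                                 D[i, j - 1],     # warp y forward
                                 D[i - 1, j - 1]) # match both
    return D[p, q]
```

Unlike the pointwise Euclidean distance, DTW tolerates series of different lengths and locally stretched time axes, which is why identical shapes sampled at different rates still yield zero distance.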
The correlation calculation of the product's m key parameters based on the DTW distance proceeds as follows.
Step 1: Let the time series obtained during the degradation trials of two key parameters $P_u$ and $P_v$ be $X_u$ and $X_v$, respectively, and calculate the DTW distance of $X_u$ and $X_v$ according to the above method; denote the result as $\mathrm{DTW}(u,v)$.
Step 2: Repeat Step 1 to obtain the pairwise DTW distances of all m key parameters, forming the distance matrix $d=[\mathrm{DTW}(u,v)]_{m\times m}$.
Step 3: Normalize the distance matrix of Equation (26).
Step 4: Sum the rows of the normalized DTW distance matrix $d$ to obtain the weight coefficient of the $j$th key parameter.
Step 5: Let the credibility values of the m key parameters be $\rho_1, \rho_2, \ldots, \rho_m$, and use the weighted-average method to obtain a uniform credibility measure for the entire product's accelerated storage test.
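Steps 3–5 can be sketched as follows. Note the weighting direction is an assumption of this sketch: a parameter whose degradation behaviour is less similar to the others (larger row sum of the normalized DTW matrix) receives a larger weight.

```python
import numpy as np

def dtw_weights(dtw_matrix):
    """Weights of the m key parameters from the pairwise DTW distance
    matrix: normalise the matrix (Step 3), sum each row (Step 4), and
    rescale the row sums so that the weights add up to one."""
    d = np.asarray(dtw_matrix, dtype=float)
    d = d / d.sum()              # Step 3: normalisation
    row = d.sum(axis=1)          # Step 4: row sums
    return row / row.sum()

def overall_credibility(csp, weights):
    """Step 5: weighted-average credibility of the whole AST."""
    return float(np.dot(csp, weights))
```

With a symmetric 3 × 3 distance matrix and hypothetical per-parameter credibilities, the weighted average lands between the smallest and largest CSP value, as a weighted mean must.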
3.5. Expert Systems for Credibility Assessment
To help designers or decision-makers form a more intuitive judgment about the credibility of accelerated storage test results, this paper establishes the credibility evaluation levels of the accelerated storage test shown in Table 2 and divides the evaluation results into four levels: "excellent," "good," "medium," and "poor." In Table 2, each level corresponds to a reference interval of credibility values, denoted $B_1, B_2, B_3, B_4$ for "excellent," "good," "medium," and "poor," respectively. The credibility level of the accelerated storage test is determined by calculating the similarity between the evaluated credibility interval and each reference interval $B_k$. The similarity is calculated using the Euclidean interval distance of Equation (13), defined in Section 3.1; the four similarity values are obtained, and the level with the greatest similarity gives the credibility level of the key performance parameters.
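The level assignment can be sketched as a nearest-interval rule under the Euclidean interval distance; the reference interval bounds below are hypothetical placeholders, not the actual values of Table 2.

```python
from math import sqrt

def credibility_level(interval, levels=None):
    """Assign a qualitative level by comparing the evaluated credibility
    interval with four reference intervals via the Euclidean interval
    distance (Definition 2); the reference bounds are hypothetical."""
    if levels is None:
        levels = {"excellent": (90.0, 100.0), "good": (75.0, 90.0),
                  "medium": (60.0, 75.0), "poor": (0.0, 60.0)}
    lo, hi = interval
    def dist(a, b):
        # Euclidean distance between intervals [lo, hi] and [a, b].
        return sqrt(0.5 * ((lo - a) ** 2 + (hi - b) ** 2))
    # Greatest similarity = smallest interval distance.
    return min(levels, key=lambda k: dist(*levels[k]))
```

Under these placeholder bounds, an interval such as [80.04, 81.62] falls nearest to the "good" reference interval.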
4. Use Case
A quartz accelerometer is a typical inertial device that is widely used in the military field. The quartz accelerometer experiences accuracy drift during long-term storage, which affects its reliability in use. To evaluate the storage life of quartz accelerometers, a high-temperature accelerated storage test was conducted. The specimens were divided into three groups, and accelerated storage tests were carried out at 60 °C, 72 °C, and 85 °C for approximately 800 h, divided into test cycles of 80 h each, with sample sizes of three, five, and five, respectively. The key performance parameters of the quartz accelerometer were the output voltage degradation at the 0° position, the output voltage degradation at the 180° position, and the output voltage deviation of the centrifugal test, which were denoted as P1, P2, and P3, respectively; each parameter has a specified failure threshold. The performance degradation data for the three parameters are shown in Table 3, Table 4 and Table 5.
As seen from the line graphs of the test values under each stress group (Figure 3), the three key performance parameters show an overall linear decreasing degradation trend. The degradation curves of the quartz accelerometer are drawn from the degradation data of the three key performance parameters, namely, the output voltage degradation at the 0° position, the output voltage degradation at the 180° position, and the output voltage deviation of the centrifugal test, at the different stress levels. Furthermore, the degradation curves of the three key performance parameters under the normal stress level are obtained, as shown in Figure 4.
Additionally, a batch of eight quartz accelerometers manufactured in 2010 was tested at the factory and then tested once a year from 2016 to 2021. The test data were expressed as degraded quantities (i.e., the test value minus the initial value), and the results are shown in Table 6, Table 7 and Table 8.
An accelerated storage test evaluation of the quartz accelerometer was carried out, and the three key performance parameters for 6–11 years of equivalent storage obey normal distributions, whose parameters are shown in Table 9.
The probability distributions of the key performance parameters of the quartz accelerometer during natural storage are calculated. For P1, according to Equations (4)–(12), the p-box for its storage from 6 to 11 years can be constructed, as shown in Figure 5. Similarly, the p-boxes of the performance degradation data for P2 and P3 at each storage time can be obtained; they are not listed here due to space limitations.
The credibility indices of the individual key performance parameters at each natural storage moment are then calculated. First, the credibility index of P1 at t = 6a is computed. According to the p-box of P1 under natural storage at t = 6a and the probability distribution of the equivalent 6-year storage data (as shown in Figure 6), the area metric and dispersion values are obtained from Equations (1) and (2); these lead to the upper and lower bounds of the normalized credibility metric CMADT. Similarly, the credibility metrics of the 0°-position parameter P1 at the remaining five times, from t = 7a to t = 11a, can be calculated. The procedure for the other two key performance parameters, P2 and P3, is similar.
Table 10 shows the calculated credibility metrics for the three key performance parameters of the quartz accelerometer.
The overall credibility evaluation index CSP for each single key performance parameter is then calculated. According to the K–S test, the parameter P1 in Table 10 follows a normal distribution, whereas P2 and P3 do not, so kernel density estimation is used to calculate their statistical properties. Using the confidence level γ = 0.8, the required quantities are obtained according to Equations (15) and (16), and in turn the evaluation result for P1 is obtained. For P2 and P3, kernel density estimation yields their evaluation results.
From Equations (24)–(26), the pairwise DTW distances of all m key parameters are obtained. Furthermore, from Equations (26)–(28), the weight coefficients of the key parameters are computed, and from Equation (25), the weighted-average method yields a uniform credibility measure for the entire product's accelerated storage test. Then, from Equation (31), the overall credibility interval is [80.04, 81.62]. According to Table 2, the credibility level of the accelerated storage test is "good".
5. Conclusions
In many engineering projects involving AST, although great effort is expended and various mathematical methods are used to pursue accurate evaluation results, users still question whether results obtained under accelerated stress can truly reflect the life of the product. As a result, AST results are often questioned in engineering practice, yet there seems to be little discussion of this issue in academia.
To evaluate the credibility of ASTs, this paper adopts the idea of the area metric to construct an area metric for ASTs, using natural storage test data as the benchmark, and on this basis proposes CMADT, a normalized, dimensionless credibility metric for ASTs. A percentage from 100% to 0% represents model credibility from best to worst. Based on the concept of performance importance, multiple single-point CMADTs are combined into one metric that reflects the credibility of the AST results for the overall product.
The normalized, dimensionless index CMADT proposed in this paper is significant: it addresses the challenge that conventional methods cannot quantitatively assess AST results. CMADT not only allows designers and decision-makers to judge the credibility of an AST intuitively but also allows horizontal comparison of AST results across different products. The quantitative CMADT evaluation is mapped in Table 2 to expressions suited to human judgment, i.e., "excellent," "good," "medium," and "poor," which can help senior decision-makers make better judgments. In addition, CMADT is applicable to different situations, such as large samples, small samples, and very small samples, and it has good generality. Owing to the lack of similar published results, it is difficult to compare our results with those of existing methods. If there are any inaccuracies in the viewpoints of this article, we hope readers will point them out and explore them further.