Article

Uncertain Time Series Analysis for the Confirmed Case of Brucellosis in China

1
School of Medicine, Liaocheng University, Liaocheng 252000, China
2
School of Medical Science, Shandong Xiehe University, Jinan 250105, China
3
School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
4
Hangzhou International Innovation Institute, Beihang University, Beijing 100191, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Symmetry 2024, 16(9), 1160; https://doi.org/10.3390/sym16091160
Submission received: 28 May 2024 / Revised: 7 August 2024 / Accepted: 28 August 2024 / Published: 5 September 2024
(This article belongs to the Special Issue Symmetry Applications in Uncertain Differential Equations)

Abstract

Brucellosis, an infectious disease that affects both humans and livestock, poses a serious threat to human health and has a severe impact on economic development. Brucellosis transmission is essentially a process in a biological system, and the epistemic uncertainty in the data of confirmed brucellosis cases in China is significant and needs to be addressed. Therefore, this paper proposes an uncertain time series model to explore the confirmed brucellosis cases in China. Methods based on uncertain statistics and the symmetry of the biological system are then applied, including order estimation, parameter estimation, residual analysis, the uncertain hypothesis test, and forecast. The proposed model is applied to the data of confirmed brucellosis cases in China from January 2017 to December 2020, and the results show that the uncertain model fits the observed data better than the probabilistic model because of the frequency instability inherent in the data of confirmed brucellosis cases. Based on the proposed model and statistical method, this paper develops an approach to rapidly forecast the number of confirmed brucellosis cases in small sample scenarios, which can contribute to epidemic control in real applications.

1. Introduction

Brucellosis, a zoonotic infectious disease caused by the bacterium Brucella, is mainly transmitted to humans through direct contact with infected animals, such as cattle, sheep, and pigs, or through the consumption of unprocessed meat and dairy products from these animals. It is characterized by long-term fever, excessive sweating, joint pain, and hepatomegaly or splenomegaly. It is prevalent worldwide, with over 123 countries reporting cases. In China, the disease is prevalent in pastoral areas, especially in Inner Mongolia, Northeast China, and Northwest China, where livestock farming is common. The pathogenesis of brucellosis is complex. After entering the human body, the bacteria travel through the lymphatic system to the lymph nodes and multiply within cells, forming localized lesions. If the infected cells rupture, the bacteria enter the bloodstream and spread to multiple organs, causing systemic infection.
Data on brucellosis are publicly available in many countries. To achieve targets such as epidemic control from the given data, statistical methods should be applied. Uncertain statistics, a set of mathematical techniques for collecting, analyzing, and interpreting data by uncertainty theory [1], was started in 2010 [2]. It has since achieved fruitful results in both theory and practice. In order to estimate the values of unknown parameters in uncertain statistical models, various methods have been presented based on the given data, including the method of moments [3], the maximum likelihood estimation [4], the method of least squares [5], the least absolute deviations estimation [6], and Tukey's biweight estimation [7]. In addition, the uncertain hypothesis test was introduced to determine whether a statistical hypothesis is correct on the basis of observed data [8].
As an important branch of uncertain statistics, uncertain time series analysis was started in 2019 by assuming that the disturbance term is an uncertain variable [9]. It is a set of techniques in uncertain statistics to predict future values based on previously observed values. Nowadays, uncertain time series analysis has been applied in many fields such as China’s birth rate [10], China’s population [11], crude oil price [12], epidemic spread [13,14,15,16], grain yield [17], motion analysis [18], water demand [19], and so on [20,21].
Specifically, refs. [13,14,15,16] have utilized the uncertain logistic growth model to predict different infectious diseases in various countries. The logistic growth model, being an S-shaped regression function, has its forecast value relied on the nature and trend of the function itself. The time series model adopted in this paper relies on existing valid data for forecast, making it more consistent with the dynamics of brucellosis transmission. Consequently, the forecast values obtained are more precise, particularly for short-term predictions. Refs. [13,22] also employed an uncertain time series model to analyze and forecast the monkeypox epidemic in Congo, but it failed to distinguish between the training set and the test set within the existing data. In contrast, this paper categorizes the cumulative confirmed cases of brucellosis into two types: the training set is used to obtain parameters, and the test set is intentionally utilized to verify the model’s validity, further justifying the rationality of the forecast. Furthermore, this paper presents a more intuitive comparison graph to illustrate the frequency instability in the cumulative confirmed cases of brucellosis, which is a content not covered in the existing literature. This graphical representation provides a clear visualization of the variability in the data, enhancing understanding of the disease’s transmission dynamics.
The rest of this paper is organized as follows. Section 2 presents the preliminaries, and Section 3 presents the data of confirmed brucellosis cases in China from January 2017 to December 2020 and distinguishes the types of uncertainty by analogy with reliability modeling. In Section 4, an uncertain time series model is employed to examine the confirmed cases of brucellosis in China by considering the disturbance term as an uncertain variable. The analysis encompasses order estimation, parameter estimation, residual analysis, the uncertain hypothesis test, and the determination of the forecast value and confidence interval, in order to analyze and forecast the future number of confirmed brucellosis cases in China. Then, Section 5 explains why an alternative to the probabilistic time series model is needed by analyzing the frequency of the data of confirmed brucellosis cases. Finally, a concise summary and discussion are provided in Section 6.
The two primary innovations of this study are as follows:
1. We apply a recent theory, uncertain time series analysis, to explore the confirmed brucellosis cases in China with an uncertain disturbance term in a naturally symmetric biological system. Methods in uncertain statistics such as residual analysis, the hypothesis test, and forecast (i.e., the forecast value and confidence interval) are used to deal with the epistemic uncertainty.
2. In addition, a statistically simple model is proposed that describes brucellosis transmission well when the frequency stability of the observed data is insufficient, especially in small sample scenarios.

2. Preliminaries

As a branch of uncertain statistics, uncertain time series analysis is a set of statistical techniques that use uncertainty theory to predict future values based on the previously observed values. Assume $X_t$ are observed values at times $t$, $t = 1, 2, \ldots, n$, respectively. Then, the sequence of observed values
$$X_1, X_2, \ldots, X_n$$
is called a time series. A basic problem of uncertain time series analysis is to predict the value of $X_{n+1}$ based on previously observed values $X_1, X_2, \ldots, X_n$.
In order to model the time series, an uncertain time series model is addressed as
$$X_t = a_0 + \sum_{i=1}^{k} a_i X_{t-i} + \varepsilon,$$
where $k$ is called the order of the uncertain time series model, $a_0, a_1, \ldots, a_k$ are unknown parameters, and $\varepsilon$ is an uncertain disturbance term. After obtaining the parameter estimates based on actual data, the uncertain time series model is presented by
$$X_t = \hat{a}_0 + \sum_{i=1}^{k} \hat{a}_i X_{t-i} + \varepsilon,$$
where the uncertain disturbance term $\varepsilon$ is assumed to follow an uncertainty distribution such as the normal uncertainty distribution, i.e.,
$$X_t = \hat{a}_0 + \sum_{i=1}^{k} \hat{a}_i X_{t-i} + \mathcal{N}(\hat{e}, \hat{\sigma}),$$
in which $\hat{e}$ and $\hat{\sigma}^2$ are the expected value and variance of the uncertain disturbance term $\varepsilon$.
In order to test whether the normal uncertainty distribution fits the residuals, the uncertain hypothesis test was introduced [8]. The uncertain hypothesis test is given by
$$W = \left\{ (z_{k+1}, z_{k+2}, \ldots, z_n) : \text{there are at least } \alpha \text{ of indexes } t\text{'s with } k+1 \le t \le n \text{ such that } z_t < \Phi^{-1}\!\left(\tfrac{\alpha}{2}\right) \text{ or } z_t > \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right) \right\},$$
where
$$\Phi^{-1}(\alpha) = \hat{e} + \frac{\hat{\sigma}\sqrt{3}}{\pi} \ln \frac{\alpha}{1-\alpha}.$$
If the residuals pass the uncertain hypothesis test, i.e., the uncertain time series model is a good fit to the observed data, the forecast value and $\alpha$ confidence interval can be obtained by
$$\hat{X}_{n+1} = \hat{a}_0 + \sum_{i=1}^{k} \hat{a}_i X_{n+1-i} + \hat{e}$$
and
$$\hat{X}_{n+1} \pm \frac{\hat{\sigma}\sqrt{3}}{\pi} \ln \frac{1+\alpha}{1-\alpha},$$
respectively.
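The inverse distribution, rejection region, and confidence interval of this section translate into a few lines of code. The following Python sketch is illustrative only (the computations in this paper were carried out in Matlab); the function names are hypothetical, and the residuals are assumed to be precise numbers.

```python
import math


def inv_normal_uncertainty(alpha, e, sigma):
    """Inverse normal uncertainty distribution:
    e + sigma * sqrt(3) / pi * ln(alpha / (1 - alpha))."""
    return e + sigma * math.sqrt(3) / math.pi * math.log(alpha / (1 - alpha))


def in_rejection_region(residuals, e, sigma, alpha=0.05):
    """Return True if at least a proportion alpha of the residuals fall
    outside [Phi^-1(alpha/2), Phi^-1(1 - alpha/2)], i.e., the fit is rejected."""
    lower = inv_normal_uncertainty(alpha / 2, e, sigma)
    upper = inv_normal_uncertainty(1 - alpha / 2, e, sigma)
    outliers = sum(1 for z in residuals if z < lower or z > upper)
    return outliers >= alpha * len(residuals)


def confidence_interval(forecast, sigma, level=0.95):
    """Symmetric confidence interval around the forecast value:
    forecast +/- sigma * sqrt(3) / pi * ln((1 + level) / (1 - level))."""
    half = sigma * math.sqrt(3) / math.pi * math.log((1 + level) / (1 - level))
    return forecast - half, forecast + half
```

Note that the interval is symmetric about the forecast value by construction, which mirrors the symmetric assumption on the disturbance term.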

3. Data of the Confirmed Brucellosis Cases in China

The data of confirmed brucellosis cases in China from January 2017 to December 2020 are exhibited in Figure 1 and Table 1, and were reported by the National Bureau of Statistics of China (https://www.phsciencedata.cn/Share/ (accessed on 1 May 2024)). Let $t = 1, 2, \ldots, 48$ represent the months from January 2017 to December 2020, respectively. Then, the corresponding confirmed brucellosis cases can be denoted by
$$I_t, \quad t = 1, 2, \ldots, 48.$$
For example, $I_1 = 2778$ is the number of confirmed brucellosis cases in January 2017, and $I_{47} = 2914$ is the number of confirmed brucellosis cases in November 2020.
Essentially, the historical data on the number of confirmed brucellosis cases constitute a sequence. More generally, any historical data or samples form sequences when they are sorted in chronological order in the context of time series. These data possess certain statistical characteristics and are underlain by some scientific principles (e.g., in confirmed brucellosis cases, the principles are related to biological transmission patterns of the bacteria). The methods used to obtain these data are relatively objective, yet in practical application, we need to extract patterns through statistical modeling. At this point, human “subjectivity” can influence the estimation of the number of confirmed brucellosis cases.
Hence, when utilizing historical data on the number of confirmed brucellosis cases for statistical modeling, uncertainty arises. One type of uncertainty stems from the aleatoric uncertainty within the data of confirmed brucellosis cases themselves (such as disturbances caused by differences in transmission environments), necessitating the assumption of this sequence as a set of random variables. Another type arises due to epistemic uncertainty resulting from incomplete knowledge during the modeling process, requiring the assumption of this sequence as a set of uncertain variables. Therefore, while random variables or uncertain variables are used to abstract real-world phenomena, the disturbance terms tend to exhibit a bias towards aleatoric or epistemic uncertainties. This distinction leads to scenarios where random variables (probability theory) and uncertain variables (uncertainty theory) are utilized and differentiated.
In extensive studies of system reliability, upon obtaining relevant data for reliability parameters, the primary step is to analyze whether the disturbance terms are more suitable to be assumed as random variables or uncertain variables. As bacterial transmission fundamentally represents a manifestation of biological systems, the study of transmission patterns can be analogized to the reliability modeling of biological systems. Therefore, it is necessary to analyze whether the usage scenario leans more towards aleatoric uncertainty or epistemic uncertainty. The number of confirmed brucellosis cases used in this paper encompasses data from 48 months. It aims to achieve a relatively accurate estimation of future confirmed brucellosis cases within such a dataset. Due to the limited sample size, the modeling process is subject to significant epistemic uncertainty. In such circumstances, utilizing time series analysis based on uncertainty theory holds a substantial advantage for estimating the future number. Hence, in the following sections, we initially assume that the disturbance term generated by the time series model is an uncertain variable. Lastly, we compare this approach with some probabilistic model to further illustrate the advantages of uncertain time series analysis in handling epistemic uncertainty.

4. Uncertain Autoregressive Model for Confirmed Brucellosis Cases in China

The Autocorrelation Function (ACF) plot is a graphical representation that displays the correlation between a time series and its past values. Based on Figure 2, we can conclude that the data exhibit strong autocorrelation, especially at shorter lags. Even at longer lags, the data still show a degree of autocorrelation. Therefore, we decide to use an autoregressive model to describe the confirmed brucellosis cases in China.
Accordingly, in this section an uncertain autoregressive model is adopted to forecast the next number of confirmed brucellosis cases $I_{n+1}$ based on the previous numbers of confirmed brucellosis cases $I_1, I_2, \ldots, I_n$.
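For readers who wish to reproduce an ACF plot, the sample autocorrelation can be computed as in the minimal Python sketch below. It uses the common estimator in which the lag-$h$ covariance is divided by the lag-0 sum of squares; the function name is hypothetical, and the exact estimator behind Figure 2 is not stated in this paper.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)  # lag-0 sum of squares
    acf = []
    for h in range(max_lag + 1):
        # Sum of products of deviations that are h steps apart.
        num = sum((series[t] - mean) * (series[t + h] - mean) for t in range(n - h))
        acf.append(num / denom)
    return acf
```

Values of `sample_acf(cases, max_lag)` that decay slowly with the lag, as described above, suggest that past values carry information about the next value.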

4.1. Order Estimation

The uncertain autoregressive model is
$$I_t = \beta_0 + \sum_{i=1}^{k} \beta_i I_{t-i} + z,$$
where $(\beta_0, \beta_1, \ldots, \beta_k)$ is a vector of unknown parameters, and $z$ is an uncertain variable which represents a disturbance term.
Next, the foremost problem to be settled is how to determine the order of the uncertain autoregressive model. Several cross-validation methods are utilized to ascertain the order, for instance, fixed origin cross validation, rolling origin cross validation, and rolling window cross validation [22].
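  • Fixed origin cross validation is as follows:
    $$\mathrm{ATE}_T^{1}(k) = \frac{1}{n-T} \sum_{t=T+1}^{n} E\left[ \left( \hat{I}_t - \hat{\beta}_0 - \sum_{i=1}^{k} \hat{\beta}_i \hat{I}_{t-i} \right)^2 \right]$$
    where $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are least squares estimates using the observations in training set $\hat{I}_1, \hat{I}_2, \ldots, \hat{I}_T$.
  • Rolling origin cross validation is as follows:
    $$\mathrm{ATE}_T^{2}(k) = \sum_{m=0}^{n-T-1} \frac{1}{n-T-m} \sum_{t=T+m+1}^{n} E\left[ \left( \hat{I}_t - \hat{\beta}_{0m} - \sum_{i=1}^{k} \hat{\beta}_{im} \hat{I}_{t-i} \right)^2 \right]$$
    where $\hat{\beta}_{0m}, \hat{\beta}_{1m}, \ldots, \hat{\beta}_{km}$ are least squares estimates using the observations in training set $\hat{I}_1, \hat{I}_2, \ldots, \hat{I}_{m+T}$.
  • Rolling window cross validation is as follows:
    $$\mathrm{ATE}_T^{3}(k) = \sum_{m=0}^{n-T-1} \frac{1}{n-T-m} \sum_{t=T+m+1}^{n} E\left[ \left( \hat{I}_t - \hat{\beta}_{0m} - \sum_{i=1}^{k} \hat{\beta}_{im} \hat{I}_{t-i} \right)^2 \right]$$
    where $\hat{\beta}_{0m}, \hat{\beta}_{1m}, \ldots, \hat{\beta}_{km}$ are least squares estimates using the observations in training set $\hat{I}_{m+1}, \hat{I}_{m+2}, \ldots, \hat{I}_{m+T}$.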
Let the maximum order be 5 and the length of the training set be 47. Then, the average testing error (ATE) of each cross-validation method above is calculated and compared in Table 2, where $p$ denotes the candidate order. The fixed origin cross validation ($\mathrm{ATE}_{47}^{1}(p)$), rolling origin cross validation ($\mathrm{ATE}_{47}^{2}(p)$), and rolling window cross validation ($\mathrm{ATE}_{47}^{3}(p)$) all attain their smallest ATE at $p = 3$. Therefore, the order of the uncertain autoregressive model is chosen to be 3.
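The three cross-validation schemes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for Table 2: the observations are taken to be precise numbers (so the expected value reduces to the squared error itself), the least squares fit is obtained from the normal equations, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(train, k):
    """Least squares estimates [b0, b1, ..., bk] of an order-k autoregressive model."""
    rows = [[1.0] + [train[t - i] for i in range(1, k + 1)] for t in range(k, len(train))]
    ys = [train[t] for t in range(k, len(train))]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)] for i in range(k + 1)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k + 1)]
    return solve_linear(ata, aty)


def mean_sq_error(series, coeffs, start):
    """Average squared one-step error over 0-based positions start, ..., n-1."""
    k = len(coeffs) - 1
    errors = []
    for t in range(start, len(series)):
        pred = coeffs[0] + sum(coeffs[i] * series[t - i] for i in range(1, k + 1))
        errors.append((series[t] - pred) ** 2)
    return sum(errors) / len(errors)


def ate_fixed_origin(series, T, k):
    """Train once on the first T observations, test on the rest."""
    return mean_sq_error(series, fit_ar(series[:T], k), T)


def ate_rolling_origin(series, T, k):
    """Training set grows to the first T + m observations for each m."""
    n = len(series)
    return sum(mean_sq_error(series, fit_ar(series[:T + m], k), T + m) for m in range(n - T))


def ate_rolling_window(series, T, k):
    """Training set is the length-T window starting at position m for each m."""
    n = len(series)
    return sum(mean_sq_error(series, fit_ar(series[m:m + T], k), T + m) for m in range(n - T))
```

Given a list `cases` of monthly counts, the order would then be selected as the candidate `k` with the smallest ATE, for example `min(range(1, 6), key=lambda k: ate_fixed_origin(cases, T, k))` for a chosen training length `T`.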
A third-order uncertain autoregressive model is presented as follows:
$$I_t = \beta_0 + \beta_1 I_{t-1} + \beta_2 I_{t-2} + \beta_3 I_{t-3} + z,$$
where $\beta_0, \beta_1, \beta_2, \beta_3$ are unknown parameters, and $z$ is an uncertain variable which represents a disturbance term.

4.2. Parameter Estimation

An autoregressive model supposes that the next value is a linear combination of previous observed values. This is a fundamental and natural model in time series analysis, where previous observations are assumed to influence the current observations directly so as to predict or explain future changes in the observed values. To establish the relationship between the next value and previous observed values, historical data are used to specify the parameters of the autoregressive model, and all the unknown parameters in the model are determined. In particular, deterministic patterns of development of the number of confirmed brucellosis cases can be obtained by the uncertain autoregressive model.
To estimate the unknown parameters $\beta_0, \beta_1, \beta_2, \beta_3$ in the uncertain autoregressive model, the least squares method, as recast by Yang and Liu [9], solves the following minimization problem:
$$\min_{\beta_0, \beta_1, \beta_2, \beta_3} \sum_{t=4}^{48} \left( I_t - \left( \beta_0 + \beta_1 I_{t-1} + \beta_2 I_{t-2} + \beta_3 I_{t-3} \right) \right)^2 .$$
Based on the data provided in Table 1 and the least squares method, the parameter estimates are obtained as
$$\left( \hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3 \right) = \left( 1.4874 \times 10^{3},\ 0.9888,\ 0.1279,\ -0.5443 \right),$$
computed with the function “lsqnonlin” in Matlab. That is, the fitted autoregressive model is
$$I_t = 1.4874 \times 10^{3} + 0.9888\, I_{t-1} + 0.1279\, I_{t-2} - 0.5443\, I_{t-3}, \tag{12}$$
and the fitted autoregressive model is shown in Figure 3, which indicates a good fit to the historical data of the number of confirmed brucellosis cases.
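Because the objective is quadratic in the parameters, its minimizer can also be characterized in closed form. The following is a standard linear least squares identity given here as a cross-check, not the procedure used in this paper (which relies on the iterative solver “lsqnonlin”); the symbols $A$ and $y$ are introduced only for this illustration.

$$
A = \begin{pmatrix} 1 & I_3 & I_2 & I_1 \\ 1 & I_4 & I_3 & I_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & I_{47} & I_{46} & I_{45} \end{pmatrix}, \qquad
y = \begin{pmatrix} I_4 \\ I_5 \\ \vdots \\ I_{48} \end{pmatrix}, \qquad
A^{\top} A\, \hat{\beta} = A^{\top} y .
$$

If $A$ has full column rank, then $\hat{\beta} = (A^{\top} A)^{-1} A^{\top} y$ with $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3)^{\top}$, and an iterative solver should converge to the same values.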

4.3. Residual Analysis

The fundamental difference between uncertain time series analysis and traditional time series analysis lies in the assumption made about the models’ disturbance terms. In traditional methods, the disturbance terms are assumed to be random variables, whereas in uncertain autoregressive models, the disturbance terms are assumed to be uncertain variables.
Based on the distinction of uncertainty drawn from the system reliability modeling discussed in Section 3, this paper characterizes the number of confirmed brucellosis cases using an uncertain autoregressive model. Therefore, the disturbance term of the model should be treated as an uncertain variable. The next step is naturally to estimate the uncertain variable through the observed data, which constitutes the method of residual analysis [3] in uncertain statistics. Roughly speaking, the residuals are the differences between the observed values of confirmed brucellosis cases and the corresponding values computed by the fitted time series Model (12), i.e., the residuals are
$$\varepsilon_t = I_t - \left( 1.4874 \times 10^{3} + 0.9888\, I_{t-1} + 0.1279\, I_{t-2} - 0.5443\, I_{t-3} \right), \quad t = 4, 5, \ldots, 48.$$
Considering the symmetry in a natural biological system, the disturbance term $z$ is assumed to follow a normal uncertainty distribution $\mathcal{N}(e, \sigma)$. By using the method of moments, the estimated expected value of the uncertain variable $z$ is
$$\hat{e} = \frac{1}{45} \sum_{t=4}^{48} \varepsilon_t = -1.8190,$$
and the estimated variance is
$$\hat{\sigma}^2 = \frac{1}{45} \sum_{t=4}^{48} \left( \varepsilon_t - \hat{e} \right)^2 = 445.9398^2.$$
Thus, an uncertain autoregressive model is obtained as follows:
$$I_t = 1.4874 \times 10^{3} + 0.9888\, I_{t-1} + 0.1279\, I_{t-2} - 0.5443\, I_{t-3} + \mathcal{N}(-1.8190, 445.9398). \tag{13}$$
The residuals are shown in Figure 4, which implies that the residuals are not white noise in the sense of probability theory (further evidence and the resulting disadvantage are provided in Section 5), and the disturbance term (i.e., the population of the residuals) is indeed an uncertain variable when applying the autoregressive model.
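The residual and moment computations can be sketched in Python as follows. The coefficients are those of the fitted Model (12), rounded as printed, with the coefficient of the third lag taken as negative; the function names are hypothetical, and the moments use the divisor $n$ (the number of residuals), which is an assumption consistent with the method of moments.

```python
import math

# Coefficients (b0, b1, b2, b3) of the fitted Model (12), rounded as printed.
COEFFS = (1487.4, 0.9888, 0.1279, -0.5443)


def residuals(series, coeffs=COEFFS):
    """Residuals e_t = I_t - (b0 + b1*I_{t-1} + ... + bk*I_{t-k}) for all t with k lags."""
    b0, *lags = coeffs
    k = len(lags)
    return [
        series[t] - (b0 + sum(b * series[t - i] for i, b in enumerate(lags, start=1)))
        for t in range(k, len(series))
    ]


def moment_estimates(res):
    """Method-of-moments estimates (e_hat, sigma_hat) of a normal uncertainty distribution."""
    n = len(res)
    e_hat = sum(res) / n
    sigma_hat = math.sqrt(sum((r - e_hat) ** 2 for r in res) / n)
    return e_hat, sigma_hat
```

Applied to the 48 monthly counts of Table 1, `residuals` returns the 45 values $\varepsilon_4, \ldots, \varepsilon_{48}$, and `moment_estimates` returns the pair $(\hat{e}, \hat{\sigma})$.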

4.4. Uncertain Hypothesis Test

The assumption regarding whether the disturbance term is a random variable or an uncertain variable directly impacts subsequent decisions, with the most immediate effect being on the use of the hypothesis test. If uncertain variables are treated as random variables and the probabilistic hypothesis test is employed, the test is indeed invalid; the reverse is also true. The core problem of the hypothesis test lies in deriving rejection regions based on relevant mathematical axioms. If historical data fall within the rejection region, it indicates that the model used is inappropriate for the given data, and another model has to be applied. Conversely, if the data fall outside the rejection region, the model is deemed applicable for subsequent practical applications such as forecast.
Since the analysis above indicates that an uncertain autoregressive model is needed to characterize the epistemic uncertainty in the data, the uncertain hypothesis test [23] is used to determine whether the uncertain autoregressive Model (13) is suitable for the data of the confirmed brucellosis cases in China from January 2017 to December 2020, i.e., whether the normal uncertainty distribution $\mathcal{N}(-1.8190, 445.9398)$ fits the residuals $\varepsilon_4, \varepsilon_5, \ldots, \varepsilon_{48}$. Let the significance level $\alpha$ be 0.05. Then, the rejection region based on the uncertain hypothesis test is given by
$$W = \left\{ (\varepsilon_4, \varepsilon_5, \ldots, \varepsilon_{48}) : \text{there are at least 3 of indexes } t\text{'s with } 4 \le t \le 48 \text{ such that } \varepsilon_t < -902.5404 \text{ or } \varepsilon_t > 898.9024 \right\}.$$
Since there exist only two aberrant points (see Figure 5, where the upper and lower lines are, respectively, the upper bound and lower bound of the acceptable range), fewer than the three required for rejection, it is suitable to use the uncertain autoregressive Model (13) for characterizing the number of confirmed brucellosis cases in China. Essentially, the uncertain hypothesis test derives a theoretically symmetric acceptable range for the epistemic uncertainty existing in the data of the confirmed brucellosis cases.
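As a check on the rejection region, its bounds and threshold count can be recovered from the inverse normal uncertainty distribution of Section 2 evaluated at $\alpha/2 = 0.025$ and $1 - \alpha/2 = 0.975$, taking $\hat{e} = -1.8190$ and $\hat{\sigma} = 445.9398$ (all values rounded to four decimals):

$$
\begin{aligned}
\frac{\hat{\sigma}\sqrt{3}}{\pi} \ln \frac{0.975}{0.025} &= 445.9398 \times 0.5513 \times \ln 39 \approx 900.7214, \\
\Phi^{-1}(0.025) &= -1.8190 - 900.7214 = -902.5404, \\
\Phi^{-1}(0.975) &= -1.8190 + 900.7214 = 898.9024, \\
\alpha \times 45 &= 0.05 \times 45 = 2.25 .
\end{aligned}
$$

Since the number of aberrant residuals is an integer, "at least $\alpha$ of indexes" among the 45 residuals means at least 3 of them.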

4.5. Forecast

Based on the autoregressive Model (13), the forecast uncertain variable of the confirmed brucellosis cases in China in January 2021 is
$$\hat{I}_{49} = 1.4874 \times 10^{3} + 0.9888 \times 2925 + 0.1279 \times 2914 - 0.5443 \times 3043 + \mathcal{N}(-1.8190, 445.9398),$$
that is, $\hat{I}_{49}$ follows the normal uncertainty distribution $\mathcal{N}(3092.7315, 445.9398)$. Generally, $\hat{I}_{49}$ is referred to as a forecast uncertain variable, which is a particular type of uncertain variable. Its uncertainty distribution (i.e., $\hat{I}_{49} \sim \mathcal{N}(3092.7315, 445.9398)$) is obtained through the operational law of uncertainty theory. In other words, the basic information of $\hat{I}_{49}$ is known, and its epistemic uncertainty can also be described. Essentially, $\hat{I}_{49}$ can be regarded as an estimator for the number of confirmed brucellosis cases based on uncertain statistics.
The next crucial step for an estimator is to quantify the information it contains. Typically, this involves providing an appropriate point estimation based on its uncertainty distribution, thereby offering a rough estimation for the number of confirmed brucellosis cases at the next time ($t = 49$). For the number of confirmed brucellosis cases, a natural point estimation is the expected value of its forecast uncertain variable, i.e., the expected value of $\hat{I}_{49}$, which gives the number of confirmed brucellosis cases in China in January 2021 as
$$E\left[ \hat{I}_{49} \right] = 3092.7315.$$
Generally, the expected value of a forecast uncertain variable is referred to as a forecast value.
However, no one would expect the future number of confirmed brucellosis cases to match the forecast value exactly. This is why point estimation can only be a rough estimation in practical applications. In order to evaluate the accuracy of the estimation, a typical method is to specify a confidence level (e.g., 95%) and provide a corresponding interval estimation based on the forecast uncertain variable and forecast value at this confidence level. The interval estimation serves as a confidence interval for the future number of confirmed brucellosis cases. For uncertain time series analysis, the 95% confidence interval is
$$3092.7315 \pm \frac{445.9398 \sqrt{3}}{\pi} \ln \frac{1 + 0.95}{1 - 0.95},$$
i.e., $3092.7315 \pm 900.7214 = [2192.0101, 3993.4529]$. The symmetry of the confidence interval based on uncertain statistics is derived from the symmetric assumption for the uncertain disturbance term. In summary, the number of confirmed brucellosis cases in China in January 2021 is roughly 3092 (point estimation), and could fall within the range $[2192, 3993]$ with a 95% confidence level (interval estimation).

5. Why the Probabilistic Time Series Analysis Is Not Suitable

The above calculation drew upon reliability modeling to distinguish the types of uncertainty within the system and underscored the importance of addressing the epistemic uncertainty in the data of confirmed brucellosis cases. Consequently, uncertain time series analysis was applied to the data. In fact, the issue originates from the frequency instability of the data. This section demonstrates the frequency instability of the data of confirmed brucellosis cases and explains the disadvantages of probabilistic time series analysis.
(i) Two-sample t-test
The two-sample t-test (Matlab function “ttest2”) rejects the null hypothesis with a p-value of 0.0388 when the residuals are divided into the two groups
$$(\varepsilon_4, \varepsilon_5, \ldots, \varepsilon_{22}) \quad \text{and} \quad (\varepsilon_{23}, \varepsilon_{24}, \ldots, \varepsilon_{48}).$$
That is, the two groups of residuals come from populations with unequal means at the 5% significance level.
(ii) Wilcoxon rank sum test
The Wilcoxon rank sum test (Matlab function “ranksum”) rejects the null hypothesis with a p-value of 0.0428 when the residuals are divided into the two groups
$$(\varepsilon_6, \varepsilon_7, \ldots, \varepsilon_{21}) \quad \text{and} \quad (\varepsilon_{25}, \varepsilon_{26}, \ldots, \varepsilon_{47}).$$
That is, the two groups of residuals come from populations with unequal medians at the 5% significance level.
(iii) Ansari–Bradley test
The Ansari–Bradley test (Matlab function “ansaribradley”) rejects the null hypothesis with a p-value of $1.1215 \times 10^{-8}$ when the residuals are divided into the two groups
$$(\varepsilon_4, \varepsilon_5, \ldots, \varepsilon_{23}) \quad \text{and} \quad (\varepsilon_{24}, \varepsilon_{25}, \ldots, \varepsilon_{48}).$$
That is, the two groups of residuals come from populations with unequal dispersions at the 5% significance level.
(iv) Two-sample F-test
The two-sample F-test (Matlab function “vartest2”) rejects the null hypothesis with a p-value of $8.1286 \times 10^{-35}$ when the residuals are divided into the two groups
$$(\varepsilon_4, \varepsilon_5, \ldots, \varepsilon_{23}) \quad \text{and} \quad (\varepsilon_{24}, \varepsilon_{25}, \ldots, \varepsilon_{48}).$$
That is, the two groups of residuals come from populations with unequal variances at the 5% significance level.
The failure to pass the aforementioned tests indicates that there is a significant difference in the statistical features within the data. In other words, the residuals obtained based on the original data within the aforementioned groups do not come from the same population. This suggests that the data of confirmed brucellosis cases lack frequency stability. From a statistical perspective, such data need to be treated as uncertain variables, as considering them as random variables would pose systemic risks (see Figure 6 and Figure 7).
By treating the first half and the second half of the data as the training set and test set, respectively, we can fit the autoregressive model in the same way as described in Section 4. Residuals are then obtained on the test set and relabeled as $\epsilon_1, \epsilon_2, \ldots, \epsilon_{22}$, where their indices correspond to the x-axis of Figure 6 and Figure 7. A crucial indicator of frequency stability is the convergence of the partial sums of the data, denoted as $s_i = \epsilon_1 + \epsilon_2 + \cdots + \epsilon_i$, $i = 1, 2, \ldots, 22$. The y-axis of Figure 6 and Figure 7 represents the values of $s_1, s_2, \ldots, s_{22}$. If the residuals are assumed to be independent and identically distributed (i.e., from the same population), then the distribution functions of $s_1, s_2, \ldots, s_{22}$ can be computed based on the operational rule of probability theory or uncertainty theory, and the corresponding confidence intervals can be determined. The curves in Figure 6 and Figure 7 represent the calculated 95% confidence intervals.
As shown in Figure 6, the result based on probability theory (where the disturbance term is regarded as a random variable) exhibits significant errors for the test set data, with 19 points falling outside the 95% confidence interval. Even when the training set data are included, the accuracy rate is only 57.78%, which is unacceptable for a 95% confidence level. Conversely, the result based on uncertainty theory (where the disturbance term is regarded as an uncertain variable) in Figure 7 demonstrates a higher accuracy rate. In fact, all points in the test set fall within the 95% confidence intervals. Moreover, the results obtained by other probabilistic time series models can be compared in a similar way, and the comparison is summarized in Table 3, where PAR represents the probabilistic autoregressive model, PMA represents the probabilistic moving average model, and UAR represents the uncertain autoregressive model utilized in this paper.
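The contrast between the two kinds of confidence bands for the partial sums can be sketched in Python as follows. This is an illustration under explicit assumptions rather than the code behind Figure 6 and Figure 7: the residuals are taken to be normal in both frameworks, the probabilistic band uses the usual normal quantile 1.96, and the uncertain band uses the operational law that the sum of independent normal uncertain variables $\mathcal{N}(e_1, \sigma_1)$ and $\mathcal{N}(e_2, \sigma_2)$ is $\mathcal{N}(e_1 + e_2, \sigma_1 + \sigma_2)$. All function names are hypothetical.

```python
import math


def partial_sums(res):
    """Return s_1, s_2, ..., s_n with s_i = res[0] + ... + res[i-1]."""
    total, sums = 0.0, []
    for r in res:
        total += r
        sums.append(total)
    return sums


def probabilistic_band(i, e, sigma, z=1.96):
    """95% band for s_i when residuals are i.i.d. random N(e, sigma^2):
    the standard deviation of the sum grows like sqrt(i)."""
    half = z * sigma * math.sqrt(i)
    return i * e - half, i * e + half


def uncertain_band(i, e, sigma, level=0.95):
    """95% band for s_i when residuals are independent uncertain N(e, sigma):
    s_i follows N(i*e, i*sigma), so the half-width grows linearly in i."""
    half = i * sigma * math.sqrt(3) / math.pi * math.log((1 + level) / (1 - level))
    return i * e - half, i * e + half


def coverage(sums, band, e, sigma):
    """Fraction of partial sums s_i that fall inside band(i, e, sigma)."""
    inside = 0
    for i, s in enumerate(sums, start=1):
        lower, upper = band(i, e, sigma)
        if lower <= s <= upper:
            inside += 1
    return inside / len(sums)
```

Under these assumptions the probabilistic band widens like $\sqrt{i}$ while the uncertain band widens like $i$, so partial sums that drift because the residuals are not frequency-stable leave the former quickly but may remain inside the latter.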
In detail, the PAR model is
$$X_t = a_0 + \sum_{i=1}^{p} a_i X_{t-i} + \varepsilon,$$
where $p$ is called the order of the probabilistic autoregressive model, $a_0, a_1, \ldots, a_p$ are unknown parameters, and $\varepsilon$ is a random variable; the PMA model is
$$X_t = b_0 + \sum_{j=1}^{q} b_j \varepsilon_{t-j},$$
where $q$ is called the order of the probabilistic moving average model, $b_0, b_1, \ldots, b_q$ are unknown parameters, and $\varepsilon_t$ is the disturbance term (random variable) at time $t$.
Essentially, the greatest disadvantage caused by frequency instability is the failure of the data to satisfy the law of large numbers. As shown in Figure 8, the mean value of the number of confirmed brucellosis cases is far from reaching convergence. The operational rule of probability theory inevitably leads to the rapid convergence of confidence intervals. However, data with frequency instability cannot keep pace with this convergence rate, resulting in inaccurate forecasts based on probability theory. After a long period of data accumulation, the frequency may become relatively stable. However, for the issue studied in this paper, which concerns bacterial transmission in biological systems and epidemic control in social systems, there is evidently insufficient time to gather extensive data. In such situations, statistical methods based on uncertainty theory can achieve excellent predictive performance with a relatively simple mathematical model. Hence, they are conducive to numerical computations for subsequent decision-making and offer significant practical value in real-world applications.

6. Discussion

In this paper, the study for brucellosis transmission was regarded as reliability modeling for a biological system, and the epistemic uncertainty existing in the data of confirmed brucellosis cases in China was realized as a significant uncertainty that needs to be addressed. Therefore, an uncertain autoregressive model based on uncertain time series analysis was used to simulate the number of the confirmed brucellosis cases by considering the disturbance term in the model as an uncertain variable. Considering the symmetry in a natural biological system, the uncertain disturbance term was assumed to have a normal uncertainty distribution, and then the methods in uncertain statistics such as residual analysis, hypothesis test, and forecast (i.e., the forecast value and confidence interval) can be applied. By dividing original data as the training set and test set, the uncertain autoregressive model was verified to have higher accuracy compared to the probabilistic model. This is primarily due to the fact that the data of confirmed brucellosis cases have not yet reached a state of frequency stability. Hence, utilizing probability theory at this stage would lead to erroneous decisions.
Apart from achieving reliability modeling for the number of confirmed brucellosis cases based on the uncertain autoregressive model and the symmetry of the biological system, the main contribution of this paper is a statistically simple model that places no strict demands on frequency stability yet describes brucellosis transmission well. This enables rapid forecasting of the number of confirmed brucellosis cases in small sample scenarios, which can contribute to epidemic control in real applications. Additionally, this paper uses the uncertain autoregressive model to compute the corresponding $I_t$, which describes the evolution of the number of confirmed brucellosis cases over time. Essentially, $I_t$ is an uncertain process, which can also serve as a solution to an uncertain differential equation. Building upon these foundations, the modeling approach proposed in this paper can be applied to an extensive range of symmetric systems with significant epistemic uncertainty.
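The forecast step summarized in this discussion can be sketched as follows. The sketch assumes that the disturbance term follows a normal uncertainty distribution $\mathcal{N}(e, \sigma)$ with inverse distribution $\Phi^{-1}(\alpha) = e + \frac{\sigma\sqrt{3}}{\pi}\ln\frac{\alpha}{1-\alpha}$, that $e$ and $\sigma$ are estimated by the mean and the root mean squared deviation of the residuals, and that the fitted autoregressive value is supplied by the caller. The function names and the numbers in the usage lines are hypothetical and are not taken from the fitted model in the paper.

```python
import math


def normal_uncertainty_inverse(alpha, e, sigma):
    """Inverse of the normal uncertainty distribution N(e, sigma) at level alpha."""
    return e + sigma * math.sqrt(3) / math.pi * math.log(alpha / (1 - alpha))


def confidence_interval(fitted_value, residuals, alpha=0.95):
    """Interval [fitted + Phi^-1((1-alpha)/2), fitted + Phi^-1((1+alpha)/2)].

    The disturbance parameters are estimated from the residuals
    (assumed estimators: mean and root mean squared deviation).
    """
    n = len(residuals)
    e_hat = sum(residuals) / n
    sigma_hat = math.sqrt(sum((r - e_hat) ** 2 for r in residuals) / n)
    lower = fitted_value + normal_uncertainty_inverse((1 - alpha) / 2, e_hat, sigma_hat)
    upper = fitted_value + normal_uncertainty_inverse((1 + alpha) / 2, e_hat, sigma_hat)
    return lower, upper


def practical_rate(observations, intervals):
    """Fraction of observations that fall inside their confidence intervals."""
    inside = sum(lo <= x <= hi for x, (lo, hi) in zip(observations, intervals))
    return inside / len(observations)


# Hypothetical residuals and fitted value, for illustration only.
example_residuals = [-120.0, 80.0, 40.0, -60.0, 60.0]
low, high = confidence_interval(3000.0, example_residuals, alpha=0.95)
print(f"95% confidence interval: [{low:.1f}, {high:.1f}]")
```

Under these assumptions the interval is symmetric about the forecast value (the fitted value plus the estimated $e$), and its half-width is $\frac{\sigma\sqrt{3}}{\pi}\ln\frac{1+\alpha}{1-\alpha}$. Applying `practical_rate` to the test set corresponds to the "Practical Rate" column reported in Table 3.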

Author Contributions

S.Z.: Data Curation, Writing—Original Draft, Validation; Y.Z. (Co-first author): Software, Visualization, Investigation; W.L. (Corresponding author): Visualization, Writing—Review and Editing; R.K.: Peer-Review. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62203026) and the Funding of Science and Technology on Reliability and Environmental Engineering Laboratory, China (No. 6142004220101).

Institutional Review Board Statement

This paper does not contain any studies with human participants or animals performed by any of the authors.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Liu, B. Uncertainty Theory, 2nd ed.; Springer: Berlin, Germany, 2007; pp. 119–155. [Google Scholar]
  2. Liu, B. Uncertainty Theory: A Branch of Mathematics for Modeling Human Uncertainty; Springer: Berlin, Germany, 2010; pp. 146–153. [Google Scholar]
  3. Lio, W.; Liu, B. Residual and confidence interval for uncertain regression model with imprecise observations. J. Intell. Fuzzy Syst. 2018, 35, 2573–2583. [Google Scholar] [CrossRef]
  4. Lio, W.; Liu, B. Uncertain maximum likelihood estimation with application to uncertain regression analysis. Soft Comput. 2020, 24, 9351–9360. [Google Scholar] [CrossRef]
  5. Liu, Y.; Liu, B. Estimation of uncertainty distribution function by the principle of least squares. Commun. Stat.—Theory Methods 2023, 1–18. [Google Scholar] [CrossRef]
  6. Liu, Z.; Yang, Y. Least absolute deviations estimation for uncertain regression with imprecise observations. Fuzzy Opt. Decis. Mak. 2020, 19, 33–52. [Google Scholar] [CrossRef]
  7. Chen, D. Tukeys biweight estimation for uncertain regression model with imprecise observations. Soft Comput. 2020, 24, 16803–16809. [Google Scholar] [CrossRef]
  8. Ye, T.; Liu, B. Uncertain hypothesis test with application to uncertain regression analysis. Fuzzy Opt. Decis. Mak. 2022, 21, 157–174. [Google Scholar] [CrossRef]
  9. Yang, X.; Liu, B. Uncertain time series analysis with imprecise observations. Fuzzy Opt. Decis. Mak. 2019, 18, 263–278. [Google Scholar] [CrossRef]
  10. Ye, T.; Liu, B. Uncertain hypothesis test for uncertain differential equations. Fuzzy Opt. Decis. Mak. 2023, 23, 195–211. [Google Scholar] [CrossRef]
  11. Liu, Y. Analysis of China’s population with uncertain statistics. J. Uncertain Syst. 2022, 15, 2243001. [Google Scholar] [CrossRef]
  12. Zhang, Y.; Gao, J. Nonparametric uncertain time series models: Theory and application in Brent crude oil spot price analysis. Fuzzy Opt. Decis. Mak. 2024, 23, 239–252. [Google Scholar] [CrossRef]
  13. Chen, L.; Ding, C. Uncertain analysis of monkeypox outbreak in the Democratic Republic of the Congo. J. Ind. Manag. Opt. 2024, 20, 2842–2853. [Google Scholar] [CrossRef]
  14. Ding, C.; Liu, W. Analysis and prediction for confirmed COVID-19 cases in Czech Republic with uncertain logistic growth model. Symmetry 2021, 13, 2264. [Google Scholar] [CrossRef]
  15. Ding, C.; Ye, T. Uncertain logistic growth model for confirmed COVID-19 cases in Brazil. J. Uncertain Syst. 2022, 15, 2243008. [Google Scholar] [CrossRef]
  16. Liu, Z. Uncertain growth model for the cumulative number of COVID-19 infections in China. Fuzzy Opt. Decis. Mak. 2021, 20, 229–242. [Google Scholar] [CrossRef]
  17. Ye, T.; Kang, R. Modeling grain yield in China with uncertain time series model. J. Uncertain Syst. 2022, 15, 2243003. [Google Scholar] [CrossRef]
  18. Xie, J.; Lio, W. Uncertain nonlinear time series analysis with applications to motion analysis and epidemic spreading. Fuzzy Opt. Decis. Mak. 2024, 23, 279–294. [Google Scholar] [CrossRef]
  19. Li, W.; Wang, X. Analysis and prediction of urban household water demand with uncertain time series. Soft Comput. 2023, 28, 6199–6206. [Google Scholar] [CrossRef]
  20. Ye, T.; Liu, Y. Multivariate uncertain regression model with imprecise observations. J. Ambient Intell. Hum. Comput. 2020, 11, 4941–4950. [Google Scholar] [CrossRef]
  21. Zhang, X.; Ding, C. Uncertain analyze of the number of hospitals in China. J. Intell. Fuzzy Syst. 2024; accepted. [Google Scholar]
  22. Liu, Z.; Yang, X. Cross validation for uncertain autoregressive model. Commun. Stat.—Simul. Comput. 2020, 51, 4715–4726. [Google Scholar] [CrossRef]
  23. Ye, T.; Yang, X. Analysis and prediction of confirmed cases of COVID-19 in China by uncertain time series. Fuzzy Opt. Decis. Mak. 2021, 20, 209–228. [Google Scholar] [CrossRef]
Figure 1. Data of the confirmed brucellosis cases in China from January 2017 to December 2020.
Figure 2. ACF plot for the confirmed brucellosis cases in China.
Figure 3. Fitted autoregressive model with the data of confirmed brucellosis cases in China from January 2017 to December 2020.
Figure 4. Residual plot of uncertain autoregressive Model (11) corresponding to confirmed brucellosis cases.
Figure 5. Rejection region based on the uncertain hypothesis test for the residuals obtained by the data of confirmed brucellosis cases.
Figure 6. The result obtained by probability theory. The blue circles represent the residuals in the test set, while the red lines designate the upper and lower limits of the confidence interval, respectively.
Figure 7. The result obtained by uncertainty theory. The blue circles represent the residuals in the test set, while the red lines designate the upper and lower limits of the confidence interval, respectively.
Figure 8. The mean value of the number of confirmed brucellosis cases in China.
Table 1. Data of the confirmed brucellosis cases in China from January 2017 to December 2020.
Year  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2017  2778  2978  3901  4171  4710  4644  4172  3276  2070  1964  2031  1859
2018  2583  2533  3765  4236  4977  4542  3923  3252  2044  2026  2149  1917
2019  2722  2883  4085  4709  5438  5315  4916  3822  2883  2452  2409  2402
2020  2530  2064  4130  4977  5683  5887  5447  4321  3324  3043  2914  2925
Table 2. The ATEs for cross validations.
$p$                   1       2       3       4       5
$ATE_{47}^{1}(p)$     0.9967  0.6808  0.0600  0.7190  1.8498
$ATE_{47}^{2}(p)$     0.9967  0.6808  0.0600  0.7190  1.8498
$ATE_{47}^{3}(p)$     0.7190  0.6808  0.0600  0.7190  0.7190
Table 3. Comparison for the accuracy rates of 95% confidence intervals based on PAR, PMA, and UAR.
Model  Theoretical Rate  Practical Rate  Outlier  Deviation
PAR    95%               57.78%          19       37%
PMA    95%               54.17%          22       41%
UAR    95%               100%            0        5%
Share and Cite

MDPI and ACS Style

Zhang, S.; Zhang, Y.; Lio, W.; Kang, R. Uncertain Time Series Analysis for the Confirmed Case of Brucellosis in China. Symmetry 2024, 16, 1160. https://doi.org/10.3390/sym16091160
