Building on imprecise probability theory and its practical advantages, this study combines the imprecise Dirichlet model (IDM) with the Naive Credal Classifier (NCC).
3.1. Imprecise Dirichlet Model
The Imprecise Dirichlet Model (IDM) is a Bayesian statistical approach for handling uncertainty: it extends the Dirichlet distribution to estimate probability distributions under conditions of insufficient information. Consider a multinomial distribution with N possible outcomes, whose Dirichlet prior probability density function (PDF) [11] is shown in Equation (1):

f(θ) = [Γ(∑n αn) / ∏n Γ(αn)] · ∏n θn^(αn − 1)   (1)
In the formula, θ = (θ1, θ2, …, θN) represents the probabilities of the outcomes, such that 0 ≤ θn ≤ 1 (n = 1, 2, …, N) and θ1 + θ2 + … + θN = 1; α1, α2, …, αN are the positive parameters of the Dirichlet distribution; and Γ(·) denotes the gamma function, which is widely used in statistics to express the probability distributions of random variables.
When the sample observations M are acquired, the prior Dirichlet PDF is updated by Bayes' theorem to yield the posterior Dirichlet PDF, which re-evaluates the parameters in light of the actual observations [12]. The posterior PDF is shown in Equation (2):

f(θ | M) = [Γ(∑n (αn + mn)) / ∏n Γ(αn + mn)] · ∏n θn^(αn + mn − 1)   (2)
In the formula, M = {m1, m2, …, mN} is the set of sample observations, where mn is the number of occurrences of state n of the random variable.
After obtaining the posterior PDF, the parameter θn is estimated by the expected value of the posterior distribution [13], as defined in Equation (3):

E[θn | M] = (αn + mn) / (∑j αj + M)   (3)

where M = m1 + m2 + … + mN is the total number of observations.
When analyzing the estimates of a deterministic Dirichlet model, if observations are lacking, the probability θn of the nth outcome is determined entirely by the parameters α, i.e., θn = αn / ∑j αj. The sum of the parameters, often written as s = α1 + α2 + … + αN, is called the prior weight, or the equivalent sample size, of the Dirichlet distribution. In the probability-estimation process, s quantifies the influence of the prior distribution on the posterior probability: the larger s is, the more observations are needed to move the estimate away from the prior [14]. When few observations are available, the estimates of a deterministic Dirichlet model are dominated by the prior distribution, and if the prior is set unreasonably, the resulting estimates may become inaccurate, which can affect the final decision and prediction.
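As a numerical illustration of this prior influence, the following Python sketch computes the posterior mean of Equation (3) for a small and a large equivalent sample size s; the parameter values are illustrative, not from the paper.

```python
# Posterior mean of a deterministic Dirichlet model, Equation (3):
# E[theta_n | M] = (alpha_n + m_n) / (sum(alpha) + M).
# The prior parameters and counts below are illustrative values only.

def posterior_mean(alpha, counts):
    """alpha: Dirichlet prior parameters; counts: observed occurrences m_n."""
    total = sum(alpha) + sum(counts)
    return [(a + m) / total for a, m in zip(alpha, counts)]

# Same data (8 and 2 observations), different equivalent sample sizes s.
weak_prior = posterior_mean([1, 1], [8, 2])      # s = 2: data dominates
strong_prior = posterior_mean([10, 10], [8, 2])  # s = 20: prior dominates
print(weak_prior)    # [0.75, 0.25]
print(strong_prior)  # [0.6, 0.4]
```

With s = 20, the estimate is pulled toward the uniform prior (0.6 vs. 0.75), showing why an unreasonable prior distorts small-sample estimates.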
To overcome these shortcomings of the deterministic Dirichlet model, the authors of [15] proposed the IDM, which uses a family of Dirichlet prior distributions instead of a single Dirichlet distribution. In the IDM, the prior PDF is shown in Equation (4):

f(θ) = [Γ(s) / ∏n Γ(s·rn)] · ∏n θn^(s·rn − 1)   (4)
In the formula, rn (n = 1, 2, …, N) is the nth prior weight factor, and s·rn in Equation (4) plays the same role as αn. As rn varies within the interval [0, 1], f(θ) covers all possible prior PDFs for a given predetermined s, thus avoiding unreasonable effects of the prior values.
Then, following the Bayesian updating process, the posterior PDF of the IDM given the observations M can be calculated, as shown in Equation (5):

f(θ | M) = [Γ(s + M) / ∏n Γ(s·rn + mn)] · ∏n θn^(s·rn + mn − 1)   (5)
In the formula, M = m1 + m2 + … + mN represents the total number of observations.
Thus, the interval-valued probabilities of all outcomes in the IDM, [E̲(θ1), Ē(θ1)], [E̲(θ2), Ē(θ2)], …, [E̲(θN), Ē(θN)], can be estimated from the posterior PDF by calculating the bounds of the expected value, according to Equation (6):

E̲[θn | M] = mn / (M + s),   Ē[θn | M] = (mn + s) / (M + s)   (6)
The bounds of the expectation are obtained from the bounds of rn, i.e., 0 and 1. Thus, the imprecise probability of a random variable state occurring in a given case can be estimated from small-sample data. The IDM statistical model thereby eliminates the adverse effect of unreasonable prior settings on event-probability estimation when the sample size is insufficient.
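The interval estimate of Equation (6) is straightforward to compute from raw counts. The following Python sketch is illustrative; the variable names and the choice s = 2 are assumptions, not values from the paper.

```python
# IDM interval probabilities, Equation (6):
# lower = m_n / (M + s), upper = (m_n + s) / (M + s).
# s is the equivalent sample size; s = 2 here is an illustrative choice.

def idm_intervals(counts, s=2.0):
    """counts: occurrences m_n per outcome; returns (lower, upper) per outcome."""
    total = sum(counts)  # M, the total number of observations
    return [(m / (total + s), (m + s) / (total + s)) for m in counts]

# Three outcomes observed 4, 1, and 0 times: even the unseen outcome
# keeps a nonzero upper probability, unlike a precise zero-count estimate.
print(idm_intervals([4, 1, 0]))
```

Note that the width of every interval is s / (M + s), so the imprecision shrinks as more observations accumulate.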
3.2. Naive Credal Classifier
The Naive Credal Classifier (NCC) is a classifier based on Naive Bayes (NB) that enhances robustness by introducing imprecise probabilities. The core idea of the NCC is to provide more robust classification results by using a set of prior probabilities to model uncertainty when data sets are incomplete or small; that is, it can return multiple possible categories for uncertain instances. In the Bayesian framework, the prior is updated with the likelihood representing the data evidence to compute a posterior probability that can be used for decision-making [16]. Formally, a classifier is a function that maps instances of a set of variables (called attributes or features) to a state, or class, of the class variable.
Using Bayesian network theory, the credal network computes, for each state value xc of the class variable Xc, the probability P(xc | xE) given the specific values xE observed for the evidence variables XE [16], as shown in Equation (7):

P(xc | xE) = [∑_{XM} ∏_{i=1}^{I} P(xi | πi)] / [∑_{Xc, XM} ∏_{i=1}^{I} P(xi | πi)]   (7)

In the formula, I is the number of multi-state random variables in the Bayesian network; P(xi | πi) is the conditional probability mass function; xi is the observed value of the ith random variable Xi ∈ X, where X denotes all random variables in the network; πi is an observed value of Πi, the set of parent nodes of Xi; XM = X \ (XE ∪ {Xc}); and ∑_{XM} denotes summation over all states of the variables in XM.
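In the NCC, the network has a naive structure: the attributes are conditionally independent given the class, so Equation (7) reduces to P(c | a1, …, ak) ∝ P(c) · ∏i P(ai | c). A minimal Python sketch of this computation, with illustrative probability tables not taken from the paper:

```python
from math import prod

# Naive Bayes posterior: P(c | a) proportional to P(c) * prod_i P(a_i | c),
# normalized over the classes. The tables below are illustrative values.

def nb_posterior(prior, likelihood, evidence):
    """prior: {class: P(c)}; likelihood: {class: [table per attribute]}."""
    joint = {c: prior[c] * prod(likelihood[c][i][v]
                                for i, v in enumerate(evidence))
             for c in prior}
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

prior = {"C1": 0.6, "C2": 0.4}
likelihood = {"C1": [{"hot": 0.7, "cold": 0.3}],
              "C2": [{"hot": 0.2, "cold": 0.8}]}
print(nb_posterior(prior, likelihood, ["hot"]))  # ≈ {'C1': 0.84, 'C2': 0.16}
```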
Bayesian classifiers perform classification by comparing the calculated posterior probabilities; the category with the largest posterior probability is the classification result. When the number of samples is insufficient, however, a Bayesian classifier may return biased, prior-dependent results: depending on the prior employed, it may identify different classes as the most probable. Since any single choice of prior carries a degree of arbitrariness, such classifications are highly uncertain [17]. The credal network classifier relaxes the output of the Bayesian classifier by adopting imprecise probabilistic representations [18]. In a Bayesian classifier, each category of the class variable has a single-valued probability; in a credal network classifier, by contrast, the occurrence probability of each class is expressed as an interval-valued, i.e., imprecise, probability.
In the literature [18], the authors introduced the credal set (CS) for credal networks to handle the uncertainty of the node random variables. The credal set describes the imprecise probabilistic properties of a node random variable: mathematically, the credal set K(Xi) is defined as a closed convex set covering all possible probability mass functions P(Xi) of the random variable Xi, as shown in Equation (8):

K(Xi) = CH{ P(Xi) : ∑_{xi ∈ ΩXi} P(xi) = 1, P(xi) ≥ 0 }   (8)

In the formula, K(Xi) is the closed convex set consisting of all possible probability mass functions P(Xi) of the random variable Xi; CH denotes the convex hull; the constraint ∑ P(xi) = 1 means that the probabilities of all possible states must sum to 1; and ΩXi is the set of possible values of Xi.
As shown in Equation (8), there may be many combinations of the prior distribution and the observed data, so the credal set contains an infinite number of probability mass functions; however, it contains only a finite number of extreme mass functions, called the vertices of the credal set and denoted ext[K(Xi)]. These extreme mass functions correspond to the vertices of the convex hull and can be obtained by combining the endpoints of the probability intervals. Classification with a credal network classifier consists of calculating the upper and lower bounds of the conditional probability of Xc = xc given XE = xE, which can be achieved using Equations (9) and (10):

P̲(xc | xE) = min_{P(X) ∈ ext[K(X)]} P(xc | xE)   (9)

P̄(xc | xE) = max_{P(X) ∈ ext[K(X)]} P(xc | xE)   (10)
In the formulas, P(X) denotes a joint probability mass function of all random variables; K(X) is the credal set, i.e., the convex hull of a set of joint mass functions; ext[K(X)] denotes the extreme joint mass functions of K(X); and P(X) ∈ ext[K(X)] means that P(X) is selected from ext[K(X)].
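Because ext[K(X)] is finite, the bounds in Equations (9) and (10) can be obtained by brute-force enumeration of the vertex combinations. The following Python sketch does this for a one-attribute naive structure; the vertex lists are illustrative values, not from the paper.

```python
from itertools import product

# Equations (9)-(10): minimize/maximize P(xc | xE) over the extreme mass
# functions of each local credal set. The vertex lists are illustrative.

def posterior_bounds(target, prior_vertices, like_vertices, evidence):
    lo, hi = 1.0, 0.0
    for prior, like in product(prior_vertices, like_vertices):
        # one extreme joint mass function per combination of local vertices
        joint = {c: prior[c] * like[c][evidence] for c in prior}
        p = joint[target] / sum(joint.values())
        lo, hi = min(lo, p), max(hi, p)
    return lo, hi

prior_vertices = [{"C1": 0.7, "C2": 0.3}, {"C1": 0.5, "C2": 0.5}]
like_vertices = [{"C1": {"hot": 0.8}, "C2": {"hot": 0.3}},
                 {"C1": {"hot": 0.6}, "C2": {"hot": 0.4}}]
print(posterior_bounds("C1", prior_vertices, like_vertices, "hot"))
```

Exhaustive enumeration is exponential in the number of local credal sets; it is shown here only to make the min/max semantics of Equations (9) and (10) concrete.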
In this paper, the IDM is used to model the prior and return imprecise probabilities, which are then integrated into the credal network classifier, achieving an organic combination of the IDM and the credal classifier.
3.3. Classification Decision Criteria of the Naive Credal Classifier
Bayesian classifiers determine sample categories using the maximum-a-posteriori principle. Given the known input evidence X, the classifier applies Bayes' theorem over the Bayesian network to calculate the probability of each category, compares these probabilities, and selects the category with the highest posterior probability as its classification decision, as illustrated in Figure 1. In Figure 1, P(C1|X), P(C2|X), …, P(C5|X) are calculated by a Bayesian classifier, and the classification result is C1 because P(C1|X) is the greatest posterior probability.
Figure 2 and Figure 3 illustrate the diagnostic logic and output of the credal network classifier. As shown in Figure 2, after computation, the Naive Credal Classifier identifies category C1 as having a lower bound of posterior imprecise probability higher than the upper bounds of all other categories. Consequently, C1 is designated as the sole diagnostic result under evidence condition X. In Figure 3, however, the lower bound of the posterior imprecise probability of C1 is below the upper bound of the posterior imprecise probability of C2, which indicates that the two probability intervals overlap, as shown by the shaded area in Figure 3. In this case, the Naive Credal Classifier cannot determine a single classification result; instead, it provides a set of possible categories {C1, C2}, indicating that the sample may be classified as C1 or C2 given the evidence.
As can be seen, compared with Bayesian classifiers, the credal network classifier provides a wider probability margin when diagnosing sample categories. When samples are scarce, the credal network classifier can deliver higher judgment reliability [19]. When the maximum a posteriori imprecise probability intervals overlap, the credal network classifier generates a set encompassing multiple possible categories, effectively reducing the risk of misdiagnosis.
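The decision rule described above is interval dominance: a category is excluded only when some other category's lower bound exceeds its upper bound, and the undominated categories form the output set. A minimal Python sketch with illustrative probability intervals:

```python
# Interval dominance: class c is dominated if some other class c2 has a
# lower bound above c's upper bound; the undominated classes form the
# classifier's output set. The intervals below are illustrative values.

def undominated(intervals):
    """intervals: {class: (lower, upper)}; returns the set of candidate classes."""
    return {c for c, (_, hi) in intervals.items()
            if not any(lo2 > hi for c2, (lo2, _) in intervals.items() if c2 != c)}

# Overlapping C1/C2 intervals (as in Figure 3) yield a set of two classes.
print(undominated({"C1": (0.5, 0.7), "C2": (0.2, 0.55), "C3": (0.05, 0.15)}))
# A clearly dominant interval (as in Figure 2) yields a single class.
print(undominated({"C1": (0.6, 0.8), "C2": (0.1, 0.5)}))
```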