1. Introduction
A key to designing a reliable system is knowing the kinds of failures and errors that may occur during operation. One way to do this is to study the failures occurring in observed production systems. System planners and data center developers can use the insights obtained from such a study to improve system efficiency and increase the resistance to failure of future systems by identifying recurring issues and developing stronger techniques, operational policies, and application designs. Modeling failure times can also improve error-correction techniques and other mitigation strategies. For instance, if a particular type of memory failure occurs frequently, data scientists may develop algorithms that detect and correct errors in real time or implement hardware redundancy to ensure continuity of processing. Overall, modeling failure times is an essential aspect of ensuring the reliability of data science workflows. By accounting for the potential failures of hardware components, data scientists can develop more robust and resilient systems that are better equipped to handle the challenges of large-scale data processing and analysis.
One way to measure chip reliability and its predicted lifetime is to use statistical analysis tools, which estimate the lifetime of a processor from a large batch of manufactured chips.
Bivariate models are used to model the relationship between two variables and have a wide range of applications in fields such as finance, economics, technology, engineering, and medicine. They allow the marginal distributions of the variables to be separated from the dependence structure between them, which makes them a flexible method for modeling complex distributions, especially in multivariate contexts. Many authors have proposed and studied bivariate models and shown their broad applications in many fields of science. In the last decade, many studies have discussed the idea of generating new bivariate models based on various copula functions. A copula function is an effective tool for modeling the dependence structure of variables, regardless of their marginals. Copulas offer a rich class of dependence models, such as the Gaussian, t, Clayton, and Gumbel copulas, which can be used to model an immense range of dependence structures. Copulas are defined over the unit square. By using a copula function, it is possible to generate random numbers with a specified dependence structure; see Nelsen [
1].
Flores [
2] discussed various bivariate Weibull models based on different copula functions, such as the Farlie–Gumbel–Morgenstern, Clayton, Gumbel–Hougaard, Ali–Mikhail–Haq, and Gumbel–Barnett copulas. A new bivariate Gaussian–Weibull model was proposed by Verrill et al. [
3]. Later on, many authors focused their statistical analysis on bivariate models; see [
4], who developed a bivariate generalized Rayleigh distribution based on the Clayton copula function; [
5], who introduced the bivariate power Lomax distribution; and [
6], who created a new bivariate Fréchet distribution using the Farlie–Gumbel–Morgenstern and the Ali–Mikhail–Haq copulas. A bivariate Weibull distribution was proposed by [
7] based on the FGM copula function. In [
8], the authors studied some families of bivariate Kumaraswamy distribution using different copulas. Others used the Marshall–Olkin bivariate copulas; one may refer to ([
9,
10,
11,
12,
13,
14,
15,
16,
17,
18]). More recent work on bivariate models with copula functions with applications can be found in [
19,
20], among others.
Regarding methods for estimating parameters, it is worth noting that the choice depends on the specific application and on the assumptions, robustness requirements, and availability of the data. The performance of each method should be evaluated using appropriate statistical measures and, where possible, compared against other methods to confirm the results.
Yet, there remain many important practical cases where classical bivariate models do not provide a suitable fit to data from real-life experiments. Accordingly, there is an extensive need to develop new flexible bivariate distributions. One of the main motivations for using a bivariate distribution built from copulas is its ability to capture non-linear and asymmetric dependencies between variables. Copulas provide a rich class of dependence models, which can be used to model many kinds of dependence structures. In addition, copulas facilitate the computation of various probability measures, such as the probability of a joint event, the probability of one event conditional on another, and the probability of exceeding certain thresholds. Another important motivation is the use of copulas for risk assessment and portfolio optimization. The flexibility of copulas in modeling the dependence structure makes them a suitable tool for financial and insurance applications, where non-linear and asymmetric dependencies between variables are often present.
Sklar [
21] provided the cumulative distribution function (CDF) and the probability density function (pdf) for the two-dimensional copula function. As a result, if
X and
Y are two random variables with distribution functions $F_X(x; \Theta_1)$ and $F_Y(y; \Theta_2)$, respectively, the joint CDF and pdf for a bivariate copula are given as
$$F(x, y) = C\big(F_X(x; \Theta_1),\, F_Y(y; \Theta_2)\big) \qquad (1)$$
and
$$f(x, y) = c\big(F_X(x; \Theta_1),\, F_Y(y; \Theta_2)\big)\, f_X(x; \Theta_1)\, f_Y(y; \Theta_2). \qquad (2)$$
Different kinds of copula functions have been defined based on the above equations, such as the Farlie–Gumbel–Morgenstern copula. This copula is a well-known parametric family, first introduced by Gumbel [
22]. The joint CDF and pdf of the Farlie–Gumbel–Morgenstern copula are given, respectively, as
$$F(x, y) = F_X(x; \Theta_1)\, F_Y(y; \Theta_2)\big[1 + \theta\big(1 - F_X(x; \Theta_1)\big)\big(1 - F_Y(y; \Theta_2)\big)\big] \qquad (3)$$
and
$$f(x, y) = f_X(x; \Theta_1)\, f_Y(y; \Theta_2)\big[1 + \theta\big(1 - 2F_X(x; \Theta_1)\big)\big(1 - 2F_Y(y; \Theta_2)\big)\big], \qquad (4)$$
where $\Theta_1$ and $\Theta_2$ are vectors of the parameters for the variables
X and
Y, respectively, and $\theta$ represents the copula parameter, taking values in the range $[-1, 1]$.
In this work, we formulated a bivariate model based on a new extension of the modified exponential lifetime; see El-Damcese and Ramadan [
23]. This new extension of the exponential distribution is called the modified extended exponential (MExE) distribution. This model has three parameters, offers great flexibility in the density function, and has an increasing failure rate. Many of its properties and inferential analyses were discussed in [
24,
25,
26], where, in the latter, estimation under progressive hybrid Type-II and progressive hybrid Type-I censored samples was considered and applied to fit some mechanical data. The attractive properties of the MExE that have been discussed in the literature encourage us to use it in a bivariate model, as we expect it to fit many real data sets from different fields of science.
The suggested bivariate distribution is based on the Farlie–Gumbel–Morgenstern copula function applied to the modified extended exponential distribution, and it is denoted as the bivariate modified extended exponential (BMExE) distribution. Statistical properties of this distribution are discussed. Point and interval estimation of the parameters was performed using the maximum likelihood and Bayesian estimation methods, where a numerical technique, the Metropolis–Hastings algorithm, was utilized for evaluating the estimators. Three types of confidence intervals for the unknown parameters were considered, namely, asymptotic, Bayesian credible, and bootstrap intervals. Some goodness-of-fit measures were used to assess the bivariate model, such as the Akaike information criterion (AIC), the Kolmogorov–Smirnov test, the corrected Akaike information criterion (CAIC), the Bayesian information criterion (BIC), and the Hannan–Quinn information criterion (HQIC). The real data examples show the suitability of the suggested distribution compared with some competitive traditional distributions.
The remaining parts of this paper appear as follows: the BMExE distribution is defined in
Section 2. Statistical characteristics of the BMExE distribution are presented in
Section 3, while the maximum likelihood and Bayesian estimation are performed in
Section 4. In
Section 5, confidence intervals are discussed.
Section 6 illustrates the suitability of the new model through a simulation study. In
Section 7, an application to real data on computer processors and memories is discussed. Finally, the conclusion and some remarks on the BMExE distribution are presented in
Section 8.
2. The Bivariate Modified Extended Exponential Model
The modified extended exponential distribution was first studied by El-Damcese and Ramadan [
23], in which the pdf and CDF are given, respectively, by:
and
The reliability and hazard functions of the MExE distribution are given by
and
respectively.
According to Equations (
1) and (
2), and using the copula defined in Equations (
3) and (
4), where
X and
Y are two random variables following the MExE distribution, as in Equations (
5) and (
6); the joint pdf and CDF of the bivariate MExE distribution with the Farlie–Gumbel–Morgenstern copula function are given as follows:
and
where
and
3. Properties of the New Model
In this section, various important statistical properties of the new bivariate model are studied and presented.
3.1. Marginal and Conditional Distributions
Let
X and Y be two random variables following the MExE distribution; then, the marginal density functions are given by
and
respectively. For the conditional pdf of
X given
Y, it can be written as
where
and
Additionally, we can define the conditional CDF of
X given
Y as:
On the other hand, the conditional pdf of
Y given
X is given as:
The conditional CDF of
Y given
X is obtained as:
A bivariate sample can be generated from the BMExE distribution using the conditional approach, as summarized below:
Generate independent variables W and Z from the uniform (0,1) distribution;
Set $x = F_X^{-1}(W)$;
Set $F_{Y|X}(y \mid x) = Z$ in Equation (
12) and find
y by a numerical method such as Newton–Raphson;
Repeat the above steps n times to obtain the pairs $(x_i, y_i)$, $i = 1, \ldots, n$.
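The steps above can be sketched in code. For the FGM copula, the conditional CDF is quadratic in $v$, so the numerical root-finding step even admits a closed-form solution. The sketch below is our own illustration (not the authors' code): it takes user-supplied marginal quantile functions `qx` and `qy`, into which the paper's MExE quantile function would be plugged.

```python
import numpy as np

def fgm_conditional_sample(n, theta, qx, qy, rng=None):
    """Draw n pairs from an FGM copula by the conditional method:
    u = F_X(x) is sampled directly, then v solves the conditional
    CDF equation C_{2|1}(v | u) = z, which is quadratic in v."""
    rng = np.random.default_rng(rng)
    w = rng.uniform(size=n)            # plays the role of W: u = F_X(x)
    z = rng.uniform(size=n)            # plays the role of Z
    a = theta * (1.0 - 2.0 * w)        # quadratic coefficient in v
    # Solve a*v^2 - (1 + a)*v + z = 0 for the root in [0, 1];
    # when a ~ 0 the copula is (near) independence and v = z.
    safe = np.where(np.abs(a) < 1e-12, 1.0, 2.0 * a)
    root = ((1.0 + a) - np.sqrt((1.0 + a) ** 2 - 4.0 * a * z)) / safe
    v = np.where(np.abs(a) < 1e-12, z, root)
    return qx(w), qy(v)
```

For instance, with exponential marginal quantile functions `lambda u: -np.log(1 - u) / rate` this produces a bivariate exponential-FGM sample; the MExE quantile would replace these in the paper's setting.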
3.2. Product Moments
If the random variables
X and
Y are distributed as a BMExE distribution, then its $(r, s)$th product moment about zero can be obtained as follows:
Then,
where
,
and
The above result follows from standard integration techniques.
3.3. Moment Generating Function
Let
X and
Y be random variables with the pdf presented in Equation (
9). Then, the moment generating function (
) of
is given by
By using the exponential series expansion
, we observe
Using the product moment defined in the previous section in Equation (
13), the
of
is written as
where
and
3.4. Reliability Function
Osmetti and Chiodini [
27] demonstrated the idea of finding the reliability from a joint survival function and showed that it is more convenient to work with
X and
Y as random variables with survival functions
and
as follows:
The reliability functions of the marginal distributions are defined as
and
The joint survival function for copula is expressed as
Hence, the reliability function of the BMExE distribution is
where
and
Basu [
28] was the first to define the bivariate failure rate function, which is given as
Hence, the hazard rate function of the BMExE distribution is
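For reference, the corresponding standard FGM-copula expressions (our reconstruction from the survival-copula literature, not reproduced from the paper's equations) are:

```latex
% Joint survival function under the FGM copula,
% with S_X = 1 - F_X and S_Y = 1 - F_Y:
S(x, y) = S_X(x)\, S_Y(y)\left[1 + \theta\, F_X(x)\, F_Y(y)\right],
% and Basu's bivariate failure (hazard) rate:
h(x, y) = \frac{f(x, y)}{S(x, y)}.
```

Substituting the MExE marginal survival and distribution functions into these expressions yields the BMExE reliability and hazard functions referred to above.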
4. Inference under Complete and Censored Samples
In this section, classical and non-classical estimation problems are considered. For the classical approach, we used maximum likelihood estimation and discussed point and interval estimation for the unknown parameters of the BMExE distribution. Bayesian estimation, one of the most powerful non-classical methods, was used to obtain point estimates and credible intervals for the unknown parameters. These estimation problems were handled for both complete samples and Type-II censored samples.
4.1. The Maximum Likelihood Estimation
In this subsection, we explored the maximum likelihood estimation (MLE) for the unknown parameters
, subject to complete and Type-II censored samples. Suppose that
,
are the
n observed values from the BMExE distribution. The likelihood function for a bivariate model was discussed by Kim et al. [
29]. Therefore, the likelihood function for
under a complete sample is defined as follows:
Then,
where
and
The log-likelihood function
can be expressed as follows:
Differentiating Equation (
15) partially with respect to the model’s parameters, we have:
and
By equating the above partial derivatives to zero, we obtain a system of non-linear normal equations, which requires a numerical solution such as a nonlinear optimization algorithm.
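As an illustration of this numerical step, the following sketch maximizes the log-likelihood of an FGM-type bivariate model. Since the MExE marginal expressions are not reproduced here, exponential marginals stand in for them, and the function names, starting values, and optimizer choice are our own illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x, y):
    """Negative log-likelihood of an FGM bivariate model with
    exponential marginals (a stand-in for the MExE marginals)."""
    lx, ly, theta = params
    if lx <= 0 or ly <= 0 or not -1.0 < theta < 1.0:
        return np.inf                    # reject out-of-range proposals
    Fx = 1.0 - np.exp(-lx * x)           # marginal CDFs
    Fy = 1.0 - np.exp(-ly * y)
    log_fx = np.log(lx) - lx * x         # marginal log-densities
    log_fy = np.log(ly) - ly * y
    # FGM copula density factor; positive whenever |theta| < 1
    cop = 1.0 + theta * (1.0 - 2.0 * Fx) * (1.0 - 2.0 * Fy)
    return -np.sum(log_fx + log_fy + np.log(cop))

def fit_fgm(x, y):
    """Maximize the likelihood numerically (no closed-form MLE)."""
    start = [1.0 / x.mean(), 1.0 / y.mean(), 0.0]
    res = minimize(neg_loglik, start, args=(x, y), method="Nelder-Mead")
    return res.x, res.fun
```

The moment-based starting values and the derivative-free Nelder–Mead method make the sketch robust to the lack of closed-form normal equations; a gradient-based optimizer on the partial derivatives above would serve equally well.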
Now, for the censored sample case, the likelihood function under a Type-II censored sample is written as follows:
Substituting Equations (
6) and (
9) into Equation (
16), the log-likelihood function
becomes
where
and
The log-likelihood function
can be expressed as follows:
Differentiating Equation (
18) partially with respect to the model’s parameters, we have:
and
By equating these partial derivatives to zero, we obtain another system of non-linear (normal) equations.
Since there is no closed-form expression for the MLE
and
, their values were computed numerically using a nonlinear optimization algorithm. All numerical calculations and their results were obtained and are summarized in
Section 6.
4.2. Bayesian Estimation
The Bayesian method treats the parameters as random variables; uncertainty about the parameters is described by a joint prior distribution, specified before the failure times are collected. The ability to incorporate prior knowledge into the analysis makes the Bayesian approach very helpful in reliability analysis, where limited data availability is one of the main challenges; see [
30]. Bayesian estimates of the unknown parameters
were obtained with respect to the squared error loss (SEL) function.
The Bayesian approach requires an appropriate choice of prior(s) for each parameter. According to Bayesian estimation theory, no prior distribution for a parameter is considered best until it is tested and validated; moreover, most prior distributions are selected according to one's subjective knowledge and beliefs. Hence, if one has enough knowledge of the parameter(s), it is wise to select informative prior(s); otherwise, it is better to consider non-informative prior(s). For this research, we selected non-informative (uniform) priors and informative (gamma) priors. These assumed prior distributions have been used widely by several authors, such as [
31,
32,
33]. The above-listed prior distributions are defined as follows:
For informative prior, it is assumed that the parameters
are independent and follow the prior distributions,
For non-informative prior, it is assumed that the parameters
are independent and follow the uniform distributions,
For non-prior, it is assumed that the parameters
are independent and follow
The posterior distribution of
and
, denoted by
x̠,y̠), can be obtained by combining the likelihood function with the priors and it can be written as
A commonly used loss function is the SEL, a symmetric loss function that assigns equal losses to overestimation and underestimation. If
is the parameter to be estimated by an estimator
, then the square error loss function is defined as
Therefore, the Bayes estimate of any function of
, say
under the SEL function, can be obtained as
where
It was noticed that the ratio of multiple integrals in Equation (
22) cannot be obtained in an explicit form. Thus, the MCMC technique is used to generate samples from the joint posterior density function. To implement the MCMC technique, we considered the Gibbs within Metropolis–Hastings sampling procedure.
In the MCMC method, we estimate the posterior distribution and the multiple integrals via samples simulated from the posterior distribution, using Gibbs sampling together with the Metropolis–Hastings (M–H) algorithm. That algorithm was first studied by [
34,
35]. Similar to acceptance–rejection sampling, at each iteration the M–H algorithm draws a candidate value from a proposal distribution, which is then accepted with a suitable acceptance probability. This ensures the convergence of the Markov chain to the target density. For more information about the applications of the M–H algorithm, one may refer to [
36,
37,
38].
For the Type-II censoring data, Equation (
16) can be used to replace Equation (
14) to obtain the Bayes estimates of the unknown parameters
and
. At this point, we note that an advantage of the MCMC method over the MLE method is that it yields reasonable interval estimates of the parameters by constructing confidence intervals from the empirical posterior distribution, which is often inaccessible in the MLE case. Chen and Shao [
39] proposed a technique for constructing the highest posterior density intervals of the unknown parameters of a distribution. In this study, the M–H algorithm was used to draw the samples from which the failure-time estimates were generated.
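A minimal random-walk M–H sampler, in the spirit of the algorithm described above, might look as follows. The target log-posterior, step size, and chain length are illustrative assumptions, not the authors' implementation; in the BMExE setting, `log_post` would be the log of the posterior in Equation (21).

```python
import numpy as np

def metropolis_hastings(log_post, init, n_iter=10000, step=0.1, rng=None):
    """Random-walk Metropolis-Hastings: propose from a symmetric
    Gaussian kernel and accept with probability
    min(1, post(candidate) / post(current))."""
    rng = np.random.default_rng(rng)
    theta = np.asarray(init, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        cand = theta + step * rng.standard_normal(theta.size)
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:   # accept/reject step
            theta, lp = cand, lp_cand
        chain[i] = theta
    return chain
```

Under the SEL function, the Bayes estimate is then the posterior mean, approximated by `chain[burn_in:].mean(axis=0)` after discarding a burn-in period.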
5. Confidence Intervals
In this section, we introduce two methods for constructing confidence intervals for the unknown parameters of the BMExE distribution: the asymptotic confidence interval and the bootstrap confidence interval. The bootstrap method has two variants: percentile bootstrap and bootstrap-t.
5.1. Asymptotic Confidence Intervals
The asymptotic normality of the MLEs is the most commonly used basis for producing confidence bounds for the parameters. The Fisher information matrix, formed from the negative second derivatives of the log-likelihood function evaluated at the MLEs, yields the asymptotic variance-covariance matrix of the estimators.
Assume the parameter vector
’s asymptotic variance-covariance matrix is
where
is the variance-covariance matrix.
Using the asymptotic normality of the MLE, a confidence interval for the parameter vector can be constructed as
$$\hat{\Theta} \pm z_{\alpha/2} \sqrt{\widehat{\operatorname{Var}}(\hat{\Theta})},$$
where $z_{\alpha/2}$ is the percentile of the standard normal distribution with a right-tail probability of $\alpha/2$.
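As a sketch of this construction, the following code approximates the observed information matrix by central finite differences of the negative log-likelihood at the MLE and returns Wald-type intervals. The function name and step size `h` are our own illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def asymptotic_ci(neg_loglik, mle, alpha=0.05, h=1e-5):
    """Asymptotic confidence intervals from the observed information
    matrix, approximated by central finite differences of the
    negative log-likelihood evaluated at the MLE vector."""
    mle = np.asarray(mle, dtype=float)
    k = mle.size
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            # 4-point central difference for d^2 l / d_i d_j
            H[i, j] = (neg_loglik(mle + ei + ej) - neg_loglik(mle + ei - ej)
                       - neg_loglik(mle - ei + ej)
                       + neg_loglik(mle - ei - ej)) / (4 * h * h)
    cov = np.linalg.inv(H)          # asymptotic variance-covariance matrix
    se = np.sqrt(np.diag(cov))
    z = norm.ppf(1 - alpha / 2)     # z_{alpha/2}
    return mle - z * se, mle + z * se
```

The same `cov` matrix also supplies the asymptotic variances needed for the bootstrap-t statistics in the next section.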
5.2. Bootstrap Confidence Interval
Bootstrap is a re-sampling method used in statistical inference. It is frequently utilized to compute confidence intervals; for more information, see Efron [
40]. In this subsection, we used the parametric bootstrap method to compute confidence intervals for the unknown parameters
, where
and
. We provide two parametric bootstrap methods for confidence intervals, percentile bootstrap, and bootstrap-t confidence intervals.
5.2.1. Percentile Bootstrap Confidence Interval
The following steps summarize the algorithm for obtaining percentile bootstrap confidence intervals:
Compute the MLE and Bayesian estimates of the BMExE distribution's parameters;
Generate a bootstrap sample from the fitted model and obtain the corresponding bootstrap estimates of the parameters;
Repeat step (2) B times to obtain B bootstrap estimates;
Arrange the bootstrap estimates in ascending order;
The two-sided percentile bootstrap confidence intervals are given by the $\alpha/2$ and $1 - \alpha/2$ empirical quantiles of the ordered bootstrap estimates.
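The steps above can be sketched as follows; the function name and the generic `estimator` argument are illustrative assumptions, with nonparametric resampling of the observed data standing in for resampling from the fitted BMExE model.

```python
import numpy as np

def percentile_bootstrap_ci(data, estimator, B=1000, alpha=0.05, rng=None):
    """Percentile bootstrap CI: resample with replacement, re-estimate,
    and take the alpha/2 and 1 - alpha/2 empirical quantiles."""
    rng = np.random.default_rng(rng)
    n = len(data)
    boot = np.array([estimator(data[rng.integers(0, n, size=n)])
                     for _ in range(B)])
    boot.sort()
    lo = boot[int(np.floor((alpha / 2) * B))]
    hi = boot[int(np.ceil((1 - alpha / 2) * B)) - 1]
    return lo, hi
```

For the bootstrap-t variant, each bootstrap estimate would instead be standardized by its asymptotic standard error before taking quantiles.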
5.2.2. Bootstrap-t Confidence Intervals
The following steps summarize the algorithm for obtaining bootstrap-t confidence intervals:
Steps (1) and (2) are the same as in Boot-p;
Compute the t-statistic of each bootstrap estimate as its standardized deviation from the original estimate, where the asymptotic variance is obtained from the Fisher information matrix;
Repeat steps (2)-(3) B times to obtain the bootstrap t-statistics;
Arrange the t-statistics in ascending order;
The two-sided bootstrap-t confidence intervals are obtained from the appropriate empirical quantiles of the ordered t-statistics.
6. Simulation Study
In this section, a Monte Carlo simulation was performed using the copula function. The BMExE parameters were estimated using the R program. Nelsen [
1] described how to generate a sample from a given joint distribution. We can generate a bivariate sample using the conditional approach by following the procedure below.
Generate U and V independently from the uniform (0,1) distribution;
Set $x = F_X^{-1}(U)$;
Set $F_{Y|X}(y \mid x) = V$ in Equation (
12) and obtain
y by numerical analysis;
Repeat the above steps n times to obtain the pairs $(x_i, y_i)$, $i = 1, \ldots, n$.
A simulation algorithm: simulation experiments were performed with data generated from the BMExE distribution, where X and Y follow the MExE distribution; the parameters were estimated under the following cases for the random variables:
Case-I: and .
Case-II: and .
Case-III: and .
We chose several sample sizes, n = 40, 100, and 200, and different Type-II censored sample sizes: r = 30 and 40 for n = 40; r = 70 and 85 for n = 100; and r = 160 and 180 for n = 200. The bias, mean square error (MSE), length of asymptotic confidence intervals, length of bootstrap-p intervals, and length of bootstrap-t intervals were calculated for the MLE, while the bias, MSE, and length of credible confidence intervals were calculated for the Bayesian method. A comparison was performed between the different approaches with respect to the bias and MSE, averaged over the simulated samples.
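The bias, MSE, and interval-length criteria used in this comparison can be computed from the replicated estimates as in the following sketch; the function name and inputs are our own illustrative choices.

```python
import numpy as np

def simulation_summary(estimates, true_value, ci_lower, ci_upper):
    """Monte Carlo summary over M simulated samples: bias, mean
    square error, and average confidence interval length."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    mse = ((estimates - true_value) ** 2).mean()
    avg_len = (np.asarray(ci_upper) - np.asarray(ci_lower)).mean()
    return bias, mse, avg_len
```

Applying this routine to each parameter, estimation method, and censoring scheme produces the entries reported in the appendix tables.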
The confidence intervals were produced using asymptotic MLE intervals and Bayesian credible intervals, and they were evaluated against different criteria; the lengths of the resulting intervals were compared. In the Bayesian technique, parameter estimates were generated under three cases, informative priors, non-informative priors, and non-prior, in order to evaluate the effect of the prior type. In the case of informative priors, the hyper-parameters were elicited using the MLE information to display the outcomes of the estimated parameters.
From the simulation results, the following observations can be made:
As the sample size grows, the bias, mean square error, and lengths of confidence intervals of the computed MLE and Bayes estimates show a downward trend;
In terms of bias, mean square error, and length of the asymptotic confidence interval, the Bayes estimates consistently outperform the MLEs;
The performance improves as the censored sample size r rises, keeping the total sample size n fixed;
Among the confidence interval methods, the bootstrap confidence interval estimates are superior, as they have the shortest confidence lengths;
Among the estimation methods, the Bayesian estimates are superior.
Table A2,
Table A3 and
Table A4 are shown in
Appendix A, where the notation MSE refers to mean square error, LACI is the length of asymptotic confidence intervals, LBP is the length of bootstrap-p, LBT is the length of bootstrap-t, and LCCI refers to the length of the credible confidence intervals.
7. Application of Real Data
The data set contains n = 50 simulated primitive computer series systems, each with a processor and a memory. The computer works only if both components of the system are in good working order. We assume that the system undergoes a latent deterioration process, which progresses swiftly over a short period of time (in hours). As a result, the system becomes more susceptible to shocks, making it possible for a lethal shock to randomly destroy the first component, the second, or both. Because a fatal shock can destroy both components at once, the independence assumption may not be reliable; hence, we used the Farlie–Gumbel–Morgenstern copula to examine this issue. The processor lifetime is denoted by
X and the memory lifetime by
Y. The data set is shown in
Table A1 in
Appendix A.
To check for outliers in the processor and memory data,
Figure 2 was obtained, where a scatter plot with a boxplot was used to differentiate between the various groups. In contrast,
Figure 3 presents the dependence plot of processor and memory, which was examined to assess dependence in the data along with the density distribution of the numerical data.
Figure 2 and
Figure 3 show that the data have right-skewed, non-symmetric shapes and a low negative correlation.
The MLE estimates of the marginal parameters with standard errors (SE), and the Kolmogorov–Smirnov (KS) statistics with
p-values for all competing models, including the MExE, are presented in
Table 1.
Figure 4 displays the estimated CDF with empirical CDF, pdf with histogram, and PP-plots for each sample.
Table 1 includes the estimates’ results along with various goodness-of-fit indices. We note that the data are consistent with this distribution.
Table 2 discusses different measures of goodness-of-fit, including the AIC, CAIC, BIC, and HQIC, as well as the MLE estimates of the marginal parameters with SE. The comparison was performed for four models: the BMExE, the bivariate Lomax–Claim (BLC) distribution, which was introduced by [
41], bivariate Fréchet (BF) by [
6], and bivariate Lomax (BL) by [
42].
Table 2 displays the MLEs with SEs for the BMExE model parameters. Comparisons of the bivariate models with the BMExE model, using the AIC, CAIC, BIC, and HQIC metrics, are covered in
Table 2. These findings lead us to conclude that the BMExE distribution outperforms the other bivariate distributions in terms of the aforementioned goodness-of-fit measures. We also found that the Bayesian estimates outperform the MLEs in terms of SE.
The distributions of the parameters from the computer data set were trace plotted in
Figure 5 for
and
to follow the posterior density of the MCMC outputs when r = 50 (complete sample). The marginal posterior density estimates of the parameters of the BMExE distribution, along with their histograms based on 10,000 chain values, are also displayed in
Figure 6 for
and
. In the case of the copula parameter, we generated a plot of the MCMC, shown in
Figure 7. As is evident from these figures, all of the generated posteriors are approximately symmetric with respect to the theoretical posterior density functions.
Table 3 presents point and interval estimates of the unknown parameters using the MLE and Bayesian methods based on Type-II censored samples with different sample sizes and a 95% confidence level. The results show how well the MCMC algorithm converges and how narrowly spaced the 95% asymptotic and highest posterior density credible interval boundaries are. Also, the SE shows a downward trend as r increases.
8. Conclusions
In this paper, we proposed a new bivariate family called the BMExE, which is derived from the Farlie–Gumbel–Morgenstern copula function and the modified extended exponential distribution. The statistical properties of this family have been described; as a result, it can be used very efficiently for life-testing data, such as medical data and computer survival times. The maximum likelihood method and the Bayesian method with the M–H algorithm were used to estimate the parameters. Three types of prior distributions were assumed: the informative gamma, the non-informative, and the non-prior distributions. We conclude that Bayesian estimation is the best estimator for the BMExE distribution. Furthermore, the new bivariate model can be utilized in place of other classical bivariate distributions for a variety of applications. The BMExE model is efficient because the marginal functions are the same as the base distribution; in addition, the moment-generating function and product moments have closed forms. For estimating the unknown parameters, three types of confidence intervals were considered: asymptotic, bootstrap, and Bayesian credible intervals. A real data analysis based on computers' failure times for their processors and memories was performed using some goodness-of-fit tests; the analysis shows the suitability of the new bivariate model for the failure times of computers. Some limitations may appear in modeling bivariate data, such as skewness, the sample size, and the dependence between failure causes. Hence, more work is still needed in this field to develop more flexible bivariate models.